LLMs such as ChatGPT have changed how developers write, review, and test code. The biggest testament to this is the rise of the term "Vibe coding", which was coined by Andrej Karpathy in an X post. To quote the post:
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. ~ Andrej Karpathy on X
In this blog, we tried to see if we could apply this approach and have ChatGPT scrape some data from the web for us. The idea was to see how far we could get on a defined scraping task without getting hands-on with code. Briefly, we tried getting some iPhone product listings from the Apple website, first by asking directly and then by scraping some URLs. Let's see how it fared.
TL;DR: ChatGPT could write scraping code, but could not run it from the browser-based interface. The code worked after a few iterations, with some nudging and hinting.
The Task At Hand: Finding A Small iPhone
If you're like me - digitally minimal and spending a lot of time outdoors, with no social media apps on the phone and only occasionally taking it out to check the map - you have probably run into this problem lately: small phones are going extinct! The phones are quite literally getting out of hand. All I want is a GPS-enabled phone that can run WhatsApp and Google Maps and can be comfortably used with one hand.
With Android, there are some speciality brands like Ulefone and Unihertz, but since they do not retail in India, it would be hard to get them serviced if something goes wrong. The iPhone 16e might just cut it, but I do not want to spend so much for the minimal requirements I have from a phone. As it seems, the best option is for me to get a refurbished iPhone Mini or iPhone SE.
For this blog, the idea was to see if I could get ChatGPT to write a scraper and help me decide which iPhone to buy, and possibly even show me a refurbished phone listing that I can immediately order. My intention here was to be as lazy as possible and see how far it takes me. Ideally, ChatGPT could have suggested a phone and a purchase link, but for demonstrative purposes we asked it to write some scraper code.
Why Scrape When You Can Just Ask?
ChatGPT and other LLMs are usually presented in chat-based interfaces, and in an AI utopia, I should be able to ask for my requirements and get product listings. That was my first attempt:
Me: I'm looking for a small smartphone, something that is comfortable to be used with one hand. Please suggest Apple phones with official website links for phones that have a screen size of less than 6 inches. For each listing, mention the following details:
- Operating system version (current and upgradable upto)
- Screen size
- Phone dimensions (height, thickness and width)
- Battery capacity
- Availability of USB-C charging
- 5G availability
- Price and Stock status
- Year of release
ChatGPT: As of March 2025, Apple has discontinued the sale of smartphones with screen sizes under 6 inches. The smallest iPhones currently available directly from Apple feature 6.1-inch displays, such as the iPhone 15, iPhone 16e, and iPhone 16.
However, if you're seeking a compact iPhone suitable for one-handed use, you might consider the following models available through third-party retailers or as refurbished units:
iPhone 13 Mini
- Operating System: Originally shipped with iOS 15; upgradable to the latest iOS version as of 2025
- Screen Size: 5.4 inches
- Dimensions: Height: 131.5 mm; Width: 64.2 mm; Depth: 7.65 mm
- Battery Capacity: Approximately 2,438 mAh
- USB-C Charging: No; utilizes Lightning connector
- 5G Availability: Yes
- Price and Stock Status: Varies by retailer; discontinued by Apple in September 2023
- Year of Release: 2021

iPhone SE (3rd Generation, 2022)
- Operating System: Originally shipped with iOS 15; upgradable to the latest iOS version as of 2025
- Screen Size: 4.7 inches
- Dimensions: Height: 138.4 mm; Width: 67.3 mm; Depth: 7.3 mm
- Battery Capacity: Approximately 2,018 mAh
- USB-C Charging: No; utilizes Lightning connector
- 5G Availability: Yes
- Price and Stock Status: Varies by retailer; discontinued by Apple in February 2025
- Year of Release: 2022
Please note that while these models are no longer sold directly by Apple, they may still be available through authorized resellers or as refurbished units. For the most accurate information on pricing and availability, it's advisable to check with local retailers or Apple's official refurbished store.
The response was fairly accurate, but it missed the iPhone 12 mini. The next question was: Can ChatGPT go through a list of URLs and give me a list of phones that satisfy my expectations?
Me: Go through the listings on this sitemap: https://www.apple.com/shop/sitemaps/refurbished.xml
Tell me if there are any refurbished iPhone 13 mini, iPhone 12 mini, or iPhone SE units available for purchase.
ChatGPT: As of March 25, 2025, Apple's official refurbished store offers the following refurbished iPhone models:
Refurbished iPhone 13 mini 512GB - Midnight (Unlocked): This model features a 5.4-inch display and is available for purchase. Apple
Refurbished iPhone 13 256GB: Available in Starlight, Pink, Blue, and Green colors, these models are listed at $529.00, reflecting a $200.00 discount from the original price. Apple
Regarding the iPhone 12 mini and iPhone SE models, there are currently no listings for these devices in Apple's refurbished store. Availability of refurbished products is subject to change based on inventory, so it's advisable to check the store periodically for updates.
The response looks neat, but there was a catch: the first link showed an out-of-stock product, and the second one did not lead to a product page but redirected to the refurbished products listing page.
💡 Need help finding all the URLs on a website so you can scrape them? Check out our expert-level guide on How to find all URLs on a domain's website (multiple methods).
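Incidentally, pulling the URLs out of a sitemap like that one is only a few lines of Python if you want to do it yourself. Here's a minimal sketch, assuming the file follows the standard sitemap `<urlset>/<url>/<loc>` structure (worth verifying against the actual XML before relying on it):

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.apple.com/shop/sitemaps/refurbished.xml"
# Standard sitemap namespace; confirm it matches the actual file.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def get_sitemap_urls(sitemap_url: str) -> list[str]:
    """Download a sitemap and return every <loc> URL it lists."""
    response = requests.get(sitemap_url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]

if __name__ == "__main__":
    urls = get_sitemap_urls(SITEMAP_URL)
    # Keep only the iPhone listings for our purposes.
    print([u for u in urls if "iphone" in u.lower()])
```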
Resorting To Scraping: Initial Lazy Attempt
From the above response, I assumed ChatGPT was unable to access the Apple site in real time and produce the output I needed. I decided to have ChatGPT go through each URL and give me a response with the data I needed. So I asked:
Me: The first link is out of stock. Only the 2nd link works.
Instead, can you go through the phone available on this link: https://www.apple.com/shop/refurbished/iphone? Scrape each link to make a table with the following fields:
- Link
- Phone Model Name
- Operating system version (current and upgradable upto)
- Screen size
- Phone dimensions (height, thickness and width)
- Battery capacity
- Availability of USB-C charging
- 5G availability
- Price and Stock status
- Year of release
Finally, sort the table in the increasing order of screen size
ChatGPT returned a table with around 7 listings at first, but since I could see there were more listings on the Apple page, I nudged it to get all of them. The table it returned looked like this:
Model | Screen Size | Dimensions (H x W x D) | Battery Capacity | USB-C Charging | 5G Availability | Price (USD) | Year of Release |
---|---|---|---|---|---|---|---|
Refurbished iPhone 13 128GB - Midnight (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |
Refurbished iPhone 13 128GB - Starlight (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |
Refurbished iPhone 13 128GB - Pink (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |
... more rows... |
At first glance, the table looks like ChatGPT did a good job. The data is fairly accurate, and the list appears sorted in the order I specified. But once again, the links weren't working!
Me: The links you shared are broken. Will you be able to give me working links?
ChatGPT: I apologize for the earlier issues with the links. As of March 25, 2025, Apple's Certified Refurbished store offers a selection of refurbished iPhone models. You can view the available options here: APPLE
Please note that the availability of specific models, such as the iPhone 13 mini, iPhone 12 mini, or iPhone SE, may vary. I recommend visiting the provided link to check the current stock and pricing for these models.
ChatGPT took me back to square one, sharing the refurbished phone listings link that I shared with it to start with!
Writing Scraper Code With ChatGPT: Getting Links From A Webpage
All this time, I had tried to avoid getting involved with code in any way, but now I decided to give in to the temptation. So I prompted:
Me: Go through the source code of this webpage: https://www.apple.com/shop/refurbished/iphone
Write a scraper in Python that can extract all the product URLs from this page and return them as a list.
ChatGPT: Here's a Python scraper using requests and BeautifulSoup to extract all product URLs from the Apple Refurbished iPhone page:
ChatGPT showed a button in the chat, which opened up the generated code in a mini IDE setup. I could supposedly even run the code from there:
I immediately clicked on the "run" button. There was an error because the browser-based interface cannot send HTTP requests via the requests library, so I had to download the code and run it locally. The generated code is shown below:
import requests
from bs4 import BeautifulSoup
def get_refurbished_iphone_urls():
    url = "https://www.apple.com/shop/refurbished/iphone"
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
if response.status_code != 200:
print("Failed to retrieve page")
return []
soup = BeautifulSoup(response.text, "html.parser")
product_links = []
for link in soup.find_all("a", href=True):
href = link["href"]
if "/shop/product/" in href:
            full_url = f"https://www.apple.com{href}"
product_links.append(full_url)
return product_links
# Example usage
if __name__ == "__main__":
urls = get_refurbished_iphone_urls()
print(urls)
The code looks neat, and it worked well too: I got a list of URLs from the refurbished store page when I ran it. ChatGPT was smart enough to return only the product page URLs, and it helpfully prepended "https://www.apple.com" to the raw href attributes.
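One small hardening I would add myself (not something ChatGPT suggested): build the absolute URLs with `urljoin` instead of string concatenation, and de-duplicate the results, since listing pages often link to the same product more than once. A sketch of that variant:

```python
from urllib.parse import urljoin

def extract_product_urls(soup, base_url="https://www.apple.com"):
    """Collect unique, absolute product URLs from an already-parsed listing page."""
    seen = set()
    product_links = []
    for link in soup.find_all("a", href=True):
        href = link["href"]
        if "/shop/product/" not in href:
            continue
        full_url = urljoin(base_url, href)  # handles relative and absolute hrefs alike
        if full_url not in seen:
            seen.add(full_url)
            product_links.append(full_url)
    return product_links
```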
ChatGPT-Generated Code For Scraping Apple Product Data
Once the code to extract the links was done, the next step was to write code to scrape each of these links for the data I needed. The product listing page looks like this:
Me: Go through the source code of this webpage: https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked
Write a Python scraper function that takes in a URL like this and returns the following details as a dictionary with the following details:
- Input URL
- Phone Model Name
- Operating system version (current and upgradable upto)
- Screen size
- Phone dimensions (height, thickness and width)
- Battery capacity
- Availability of USB-C charging
- 5G availability
- Price and Stock status
- Year of release
Once this is done, write another function that can take a list of URLs, run the previous function for each URL (in parallel, using asyncio), gather the results, and make a Pandas DataFrame out of it.
Finally, the code must print the Pandas dataframe after sorting it in increasing order of screen size.
Here's the code ChatGPT returned:
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
async def fetch_page(session, url):
"""Fetches the HTML content of a given URL."""
async with session.get(url) as response:
return await response.text()
async def scrape_iphone_details(url):
"""Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
async with aiohttp.ClientSession() as session:
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')
try:
phone_name = soup.find("h1", class_="product-title").text.strip()
price = soup.find("span", class_="as-price-currentprice").text.strip()
stock_status = "In Stock" if "Add to Bag" in html else "Out of Stock"
details_section = soup.find("div", class_="tech-specs")
details_list = details_section.find_all("li") if details_section else []
details = {}
for item in details_list:
text = item.text.strip()
if "display" in text.lower():
details["Screen Size"] = text.split(":")[-1].strip()
elif "dimensions" in text.lower():
details["Dimensions"] = text.split(":")[-1].strip()
elif "battery" in text.lower():
details["Battery Capacity"] = text.split(":")[-1].strip()
elif "5g" in text.lower():
details["5G Availability"] = "Yes" if "5G" in text else "No"
elif "usb-c" in text.lower():
details["USB-C Charging"] = "Yes" if "USB-C" in text else "No"
elif "release year" in text.lower():
details["Year of Release"] = text.split(":")[-1].strip()
return {
"Input URL": url,
"Phone Model Name": phone_name,
"Operating System": "iOS (Check Apple site for current version)",
"Screen Size": details.get("Screen Size", "Unknown"),
"Dimensions": details.get("Dimensions", "Unknown"),
"Battery Capacity": details.get("Battery Capacity", "Unknown"),
"USB-C Charging": details.get("USB-C Charging", "No"),
"5G Availability": details.get("5G Availability", "Unknown"),
"Price": price,
"Stock Status": stock_status,
"Year of Release": details.get("Year of Release", "Unknown"),
}
except Exception as e:
return {"Error": f"Failed to scrape {url} - {str(e)}"}
async def scrape_multiple_iphones(urls):
"""Scrapes details of multiple iPhones in parallel using asyncio."""
tasks = [scrape_iphone_details(url) for url in urls]
return await asyncio.gather(*tasks)
def main(urls):
"""Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
results = asyncio.run(scrape_multiple_iphones(urls))
df = pd.DataFrame(results)
df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
df = df.sort_values(by="Screen Size", ascending=True)
print(df)
# Example usage
urls = [
"<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
"<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)
I ran the code without going through it. I first got a `KeyError` because "Screen Size" wasn't in some dict. I asked ChatGPT to fix it, and this is the fixed code it gave me:
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import re
async def fetch_page(session, url):
"""Fetches the HTML content of a given URL asynchronously."""
async with session.get(url) as response:
return await response.text()
async def scrape_iphone_details(url):
"""Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
async with aiohttp.ClientSession() as session:
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')
try:
phone_name = soup.find("h1", class_="product-title").text.strip() if soup.find("h1", class_="product-title") else "Unknown Model"
price = soup.find("span", class_="as-price-currentprice").text.strip() if soup.find("span", class_="as-price-currentprice") else "Unknown Price"
stock_status = "In Stock" if "Add to Bag" in html else "Out of Stock"
# Find the section that contains technical details
details_section = soup.find("div", class_="tech-specs")
details_list = details_section.find_all("li") if details_section else []
# Extract details dynamically
details = {
"Screen Size": "Unknown",
"Dimensions": "Unknown",
"Battery Capacity": "Unknown",
"5G Availability": "Unknown",
"USB-C Charging": "No",
"Year of Release": "Unknown"
}
for item in details_list:
text = item.text.strip().lower()
if "display" in text or "screen" in text:
                    match = re.search(r'(\d+(\.\d+)?)\s*inch', text)
details["Screen Size"] = match.group(1) + '"' if match else "Unknown"
if "dimensions" in text:
details["Dimensions"] = text.split(":")[-1].strip()
if "battery" in text:
                    match = re.search(r'(\d{3,4})\s*mAh', text)
details["Battery Capacity"] = match.group(1) + " mAh" if match else "Unknown"
if "5g" in text:
details["5G Availability"] = "Yes" if "5g" in text else "No"
if "usb-c" in text:
details["USB-C Charging"] = "Yes"
if "release year" in text:
                    match = re.search(r'(\d{4})', text)
details["Year of Release"] = match.group(1) if match else "Unknown"
return {
"Input URL": url,
"Phone Model Name": phone_name,
"Operating System": "iOS (Check Apple site for current version)",
"Screen Size": details["Screen Size"],
"Dimensions": details["Dimensions"],
"Battery Capacity": details["Battery Capacity"],
"USB-C Charging": details["USB-C Charging"],
"5G Availability": details["5G Availability"],
"Price": price,
"Stock Status": stock_status,
"Year of Release": details["Year of Release"],
}
except Exception as e:
return {"Error": f"Failed to scrape {url} - {str(e)}"}
async def scrape_multiple_iphones(urls):
"""Scrapes details of multiple iPhones in parallel using asyncio."""
tasks = [scrape_iphone_details(url) for url in urls]
return await asyncio.gather(*tasks)
def main(urls):
"""Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
results = asyncio.run(scrape_multiple_iphones(urls))
df = pd.DataFrame(results)
# Convert screen size to numeric for sorting
df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
df = df.sort_values(by="Screen Size", ascending=True)
print(df)
# Example usage
urls = [
"<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
"<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)
The fix seemed okay, but the phone name and stock status were not being scraped accurately. Upon inspecting the HTML on the Apple website, I figured out that the CSS selectors ChatGPT was using were wrong, and it couldn't get them right even after I asked for a fix. I tried pasting the source code of the webpage and asking ChatGPT to suggest the right selectors, but the message was too long for ChatGPT to accept. Finally, I asked for this:
Me: Still not working. Can you use the schema.org json for product name, price, and stock status instead of the HTML?
Schema.org JSON-LD objects embedded in a webpage are a more uniform representation of the data on it; they are easier to parse and are far less likely to change with a webpage redesign. Here's the modified bit from ChatGPT's new code:
...
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')
try:
# Extract JSON-LD script containing structured data
json_ld_script = soup.find("script", type="application/ld+json")
product_data = json.loads(json_ld_script.string) if json_ld_script else {}
# Extract product name, price, and stock status from JSON-LD
phone_name = product_data.get("name", "Unknown Model")
price = product_data.get("offers", {}).get("price", "Unknown Price")
stock_status = product_data.get("offers", {}).get("availability", "").split("/")[-1]
stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"
...
From this and the rest of the code, I could see two issues, which I asked ChatGPT to fix:
Me: There are still two issues:
- There may be multiple schema objects. Make sure you use the "Product" one.
- There is no "div.as-tech-specs" section to extract the details from. Use "div.TechSpecs-panel" instead. There are no "li" elements inside that either. Check only divs.
ChatGPT made some minor edits and returned the code. I tried running it, and it must have failed silently, because all I got were blank results. This most likely happened because ChatGPT had put the scraping code inside a `try` block and returned error dicts even when something went wrong. I asked it to return `None` instead and print the traceback whenever there is an error. Here's the fixed error-handling code I got:
except Exception as e:
print(f"β ERROR while scraping {url} β")
traceback.print_exc() # Print full error traceback for debugging
return None # Return None so failed entries are ignored
That code has emojis! Perhaps it's the best indicator of whether any given piece of code was generated with ChatGPT. I tried running the code and got an error printed on my screen, which I pasted directly into ChatGPT:
Me:
❌ ERROR while scraping https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked ❌
Traceback (most recent call last):
File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 40, in scrape_iphone_details
price = product_data.get("offers", {}).get("price", "Unknown Price")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get'
Traceback (most recent call last):
File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 124, in <module>
main(urls)
File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 115, in main
df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
~~^^^^^^^^^^^^^^^
File "/home/kd/.cache/pypoetry/virtualenvs/chatgpt-apple-scraper-C8P38fdX-py3.11/lib/python3.11/site-packages/pandas/core/frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kd/.cache/pypoetry/virtualenvs/chatgpt-apple-scraper-C8P38fdX-py3.11/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
raise KeyError(key)
KeyError: 'Screen Size'
ChatGPT analyzed these errors and listed the following issues along with fixes:
- Issue 1: `product_data["offers"]` is a list, not a dictionary
- Issue 2: The "Screen Size" column is missing in the Pandas DataFrame
After applying these fixes, here's the final working code produced by ChatGPT:
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import re
import json
import traceback
async def fetch_page(session, url):
"""Fetches the HTML content of a given URL asynchronously."""
async with session.get(url) as response:
return await response.text()
async def scrape_iphone_details(url):
"""Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
async with aiohttp.ClientSession() as session:
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')
try:
# Extract JSON-LD script containing "Product" data
json_ld_scripts = soup.find_all("script", type="application/ld+json")
product_data = {}
for script in json_ld_scripts:
try:
data = json.loads(script.string)
if isinstance(data, list): # JSON-LD sometimes is an array
for item in data:
if item.get("@type") == "Product":
product_data = item
break
elif data.get("@type") == "Product":
product_data = data
except json.JSONDecodeError:
continue # Skip invalid JSON sections
# Extract product details
phone_name = product_data.get("name", "Unknown Model")
# Fix for the 'list' object issue
offers = product_data.get("offers", [])
if isinstance(offers, list) and len(offers) > 0:
price = offers[0].get("price", "Unknown Price")
stock_status = offers[0].get("availability", "").split("/")[-1]
else:
price = "Unknown Price"
stock_status = "Unknown"
stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"
# Extract technical specifications from "TechSpecs-panel"
details_section = soup.find("div", class_="TechSpecs-panel")
details_divs = details_section.find_all("div") if details_section else []
details = {
"Screen Size": "Unknown",
"Dimensions": "Unknown",
"Battery Capacity": "Unknown",
"5G Availability": "Unknown",
"USB-C Charging": "No",
"Year of Release": "Unknown"
}
for div in details_divs:
text = div.get_text(strip=True).lower()
if "display" in text or "screen" in text:
                    match = re.search(r'(\d+(\.\d+)?)\s*inch', text)
details["Screen Size"] = match.group(1) + '"' if match else "Unknown"
if "dimensions" in text:
details["Dimensions"] = text.split(":")[-1].strip()
if "battery" in text:
                    match = re.search(r'(\d{3,4})\s*mAh', text)
details["Battery Capacity"] = match.group(1) + " mAh" if match else "Unknown"
if "5g" in text:
details["5G Availability"] = "Yes" if "5g" in text else "No"
if "usb-c" in text:
details["USB-C Charging"] = "Yes"
if "release year" in text:
                    match = re.search(r'(\d{4})', text)
details["Year of Release"] = match.group(1) if match else "Unknown"
return {
"Input URL": url,
"Phone Model Name": phone_name,
"Operating System": "iOS (Check Apple site for current version)",
"Screen Size": details["Screen Size"],
"Dimensions": details["Dimensions"],
"Battery Capacity": details["Battery Capacity"],
"USB-C Charging": details["USB-C Charging"],
"5G Availability": details["5G Availability"],
"Price": f"${price}" if price != "Unknown Price" else price,
"Stock Status": stock_status,
"Year of Release": details["Year of Release"],
}
except Exception as e:
print(f"β ERROR while scraping {url} β")
traceback.print_exc()
return None # Ignore failed scrapes
async def scrape_multiple_iphones(urls):
"""Scrapes details of multiple iPhones in parallel using asyncio."""
tasks = [scrape_iphone_details(url) for url in urls]
return await asyncio.gather(*tasks)
def main(urls):
"""Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
results = asyncio.run(scrape_multiple_iphones(urls))
# Remove None values (failed scrapes)
results = [r for r in results if r is not None]
df = pd.DataFrame(results)
# Fix for missing 'Screen Size' key
if "Screen Size" in df.columns:
df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
df = df.sort_values(by="Screen Size", ascending=True)
print(df)
# Example usage
urls = [
"<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
"<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)
The URLs from the example weren't working out of the box though, so I manually called the URL scraper we wrote first and then fed those URLs into the new function, along the lines of the sketch below.
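A minimal way to wire the two pieces together, assuming the link extractor was saved as `get_refurbished_iphone_urls.py` (the price monitor later imports it under that name) and the detail scraper as `get_data.py` (the filename visible in the traceback above):

```python
# run_scrape.py -- glue between the link extractor and the detail scraper.
from get_refurbished_iphone_urls import get_refurbished_iphone_urls  # the first script
from get_data import main                                            # the detail scraper above

if __name__ == "__main__":
    urls = get_refurbished_iphone_urls()
    main(urls)  # scrapes every product page and prints the sorted DataFrame
```

With that wiring in place, here's the final result I got: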
Phone Model Name | Operating System | Screen Size | Dimensions | Battery Capacity | USB-C Charging | 5G Availability | Price | Stock Status | Year of Release |
---|---|---|---|---|---|---|---|---|---|
Refurbished iPhone 13 128GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
Refurbished iPhone 13 128GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
Refurbished iPhone 13 128GB - Pink (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
Refurbished iPhone 13 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
Refurbished iPhone 13 128GB - Green (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
Refurbished iPhone 13 256GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 13 256GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 13 256GB - Pink (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 13 256GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 13 256GB - Green (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 14 128GB - Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 14 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 128GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 128GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 128GB - Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 128GB - Yellow (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
Refurbished iPhone 13 512GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $699.0 | In Stock | Unknown |
Refurbished iPhone 14 Plus 256GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $699.0 | In Stock | Unknown |
Refurbished iPhone 13 Pro Max 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
Refurbished iPhone 13 Pro Max 512GB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro 512GB - Space Black (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro 512GB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro Max 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1019.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro Max 512GB - Deep Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1019.0 | In Stock | Unknown |
Refurbished iPhone 13 Pro Max 1TB - Graphite (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1099.0 | In Stock | Unknown |
Refurbished iPhone 14 Pro 1TB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1099.0 | In Stock | Unknown |
Some columns like Screen Size and Dimensions do not have accurate values, but those details aren't presented in a structured format on the Apple website, so they're trickier to parse with code alone. An AI model reading through the product listing page could probably pick them out.
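If you'd rather not bring an LLM into the loop for those fields, one fallback is a looser regex over the whole page text instead of over specific spec divs. This is only a sketch of the idea; it assumes the page body mentions the size somewhere as plain text such as "6.1-inch display", which may not hold for every listing:

```python
import re

def guess_screen_size(html_text: str):
    """Best-effort screen size pulled from free-form page text, e.g. '6.1-inch Super Retina XDR display'."""
    match = re.search(r"(\d+(?:\.\d+)?)[\s-]*inch", html_text, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None
```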
Can ChatGPT Scrape Itself?
I thought checking if ChatGPT could scrape itself would be a fun exercise. So I decided to ask it to generate a changelog using the chat link.
Me: Can you scrape this chat and generate the changelog with updates that were made to the code?
Link: https://chatgpt.com/c/67e22699-4648-8000-8572-3e41a286d305
ChatGPT: I don't have direct access to external links, including past chat logs. However, I can summarize the changes based on our conversation here!
🔹 Changelog for iPhone Refurbished Store Scraper

Version 1.0 → Version 1.1
✅ Fixed AttributeError for product_data["offers"] being a list instead of a dictionary
✅ Updated price & stock status extraction to correctly handle multiple offers

Version 1.1 → Version 1.2
✅ Replaced missing "Screen Size" check before sorting in Pandas (fixed KeyError: 'Screen Size')
✅ Ensured proper JSON parsing by checking multiple schema.org Product objects

Version 1.2 → Version 1.3 (Final Stable Release)
✅ Updated TechSpecs-panel selector (instead of as-tech-specs)
✅ Improved error handling with traceback.print_exc() for full debugging
✅ Added conditional check for "Screen Size" in the DataFrame before processing

Now, the scraper works without errors and correctly extracts all details! 🚀 If you need a structured CHANGELOG.md file, let me know! 📝
Well, ChatGPT can't scrape itself, or so it said. But we got a nice changelog!
Generating Price Monitoring Code With ChatGPT
The next step was to see if we could make the code deployable. It has to run each day and actually notify me if there's something I can buy - either a price drop or a new model being listed. I asked ChatGPT to add this feature.
Me: I need some updates to the previous scraping code to make it a price monitor. Here's what I need:
- Filter the table, showing only the entries with the lowest price for that model name
- For the model name, exclude the color. So you can break the name at the hyphen and take only the first part.
- Make this a price monitor. This script needs to be run every day. So, write another program that calls the first program's function to get the URLs, scrapes all the URLs, and stores the result in memory. This must be in an infinite loop, running once per day. Any changes or new listings from the previous day must be dumped as a markdown file, with the date in the filename.
ChatGPT generated a `scraper.py` and a `price_monitor.py`. The first one was an update to the previous code, and the second one was to run the whole thing in an infinite loop, once per day, and print the updates I needed. I had to ask for some fixes:
Me: I just went through the code, need a few updates:
- Use the `get_refurbished_iphone_urls()` you previously generated, instead of `fetch_urls()`
- Add some emojis in the final markdown output
After it produced some updated code, I asked for the proverbial "one last change":
Me: Great, one last change: save the old_df to disk and load it from disk the next day for comparison. This way the data wouldn't be lost if the script is killed or exited for some reason.
Here's what the final code looked like:
# scraper.py
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import json
import traceback
async def fetch_page(session, url):
"""Fetches the HTML content of a given URL asynchronously."""
async with session.get(url) as response:
return await response.text()
async def scrape_iphone_details(url):
"""Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
async with aiohttp.ClientSession() as session:
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')
try:
# Extract JSON-LD script containing "Product" data
json_ld_scripts = soup.find_all("script", type="application/ld+json")
product_data = {}
for script in json_ld_scripts:
try:
data = json.loads(script.string)
if isinstance(data, list): # JSON-LD sometimes is an array
for item in data:
if item.get("@type") == "Product":
product_data = item
break
elif data.get("@type") == "Product":
product_data = data
except json.JSONDecodeError:
continue # Skip invalid JSON sections
# Extract product details
phone_name = product_data.get("name", "Unknown Model")
# Extract only the model name (remove color & storage info)
model_base_name = phone_name.split("-")[0].strip()
# Fix for the 'list' object issue
offers = product_data.get("offers", [])
if isinstance(offers, list) and len(offers) > 0:
price = offers[0].get("price", "Unknown Price")
stock_status = offers[0].get("availability", "").split("/")[-1]
else:
price = "Unknown Price"
stock_status = "Unknown"
stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"
return {
"Model Name": model_base_name,
"Full Name": phone_name,
"Input URL": url,
"Price": price,
"Stock Status": stock_status,
}
except Exception as e:
print(f"β ERROR while scraping {url} β")
traceback.print_exc()
return None # Ignore failed scrapes
async def scrape_multiple_iphones(urls):
"""Scrapes details of multiple iPhones in parallel using asyncio."""
tasks = [scrape_iphone_details(url) for url in urls]
return await asyncio.gather(*tasks)
def main(urls):
"""Runs the scraper on a list of URLs and filters the lowest price per model."""
results = asyncio.run(scrape_multiple_iphones(urls))
# Remove None values (failed scrapes)
results = [r for r in results if r is not None]
df = pd.DataFrame(results)
# Keep only the lowest price for each model
df = df.sort_values(by="Price").drop_duplicates(subset=["Model Name"], keep="first")
return df
# price_monitor.py
import time
import datetime
import os
import pandas as pd
import scraper # Import the scraper script
from get_refurbished_iphone_urls import get_refurbished_iphone_urls # Fetches URLs dynamically
PRICE_FILE = "price_data.csv"
def load_previous_data():
"""Loads the last recorded price data from disk (if available)."""
if os.path.exists(PRICE_FILE):
try:
return pd.read_csv(PRICE_FILE)
except Exception as e:
print(f"β οΈ Error loading previous data: {e}. Starting fresh.")
return None
return None
def save_new_data(df):
"""Saves the latest price data to disk."""
df.to_csv(PRICE_FILE, index=False)
print(f"β
Saved latest price data to {PRICE_FILE}")
def compare_prices(old_df, new_df):
"""Compare old and new price data and return changes as a markdown string with emojis."""
changes = []
for _, new_row in new_df.iterrows():
model_name = new_row["Model Name"]
new_price = new_row["Price"]
new_stock = new_row["Stock Status"]
old_row = old_df[old_df["Model Name"] == model_name]
if not old_row.empty:
old_price = old_row.iloc[0]["Price"]
old_stock = old_row.iloc[0]["Stock Status"]
if new_price != old_price or new_stock != old_stock:
                price_change_emoji = "🔻" if new_price < old_price else "🔺"
                stock_change_emoji = "🟢" if new_stock == "In Stock" else "🔴"
                changes.append(f"| {model_name} | {price_change_emoji} **${old_price} → ${new_price}** | {stock_change_emoji} {old_stock} → **{new_stock}** |")
else:
changes.append(f"| {model_name} | π **New Listing: ${new_price}** | π’ **{new_stock}** |")
if not changes:
return None
markdown_content = f"# π Price Changes - {datetime.date.today()}\n\n"
markdown_content += "| Model Name | Price Change | Stock Status |\n"
markdown_content += "|------------|-------------|--------------|\n"
markdown_content += "\n".join(changes)
return markdown_content
def save_markdown(content):
"""Save the markdown content to a file with today's date."""
filename = f"price_changes_{datetime.date.today()}.md"
with open(filename, "w") as file:
file.write(content)
print(f"β
Saved changes to {filename}")
def price_monitor():
"""Runs the price monitor indefinitely, checking for changes daily."""
previous_data = load_previous_data()
while True:
print("π Running daily price check...")
# Get the latest refurbished iPhone URLs dynamically
urls = get_refurbished_iphone_urls()
new_data = scraper.main(urls)
if previous_data is not None:
changes = compare_prices(previous_data, new_data)
if changes:
save_markdown(changes)
else:
print("β
No changes detected.")
# Save the new data for future comparison
save_new_data(new_data)
# Update previous_data for the next iteration
previous_data = new_data.copy()
print("β³ Sleeping for 24 hours...")
time.sleep(86400) # Sleep for one day
if __name__ == "__main__":
price_monitor()
On the first iteration, it gives the following output:
🔄 Running daily price check...
✅ Saved latest price data to price_data.csv
⏳ Sleeping for 24 hours...
The CSV file it saves looks like this:
Model Name | Full Name | Price | Stock Status |
---|---|---|---|
Refurbished iPhone 13 128GB | Refurbished iPhone 13 128GB - Starlight (Unlocked) | 449 | In Stock |
Refurbished iPhone 13 256GB | Refurbished iPhone 13 256GB - Midnight (Unlocked) | 529 | In Stock |
Refurbished iPhone 14 128GB | Refurbished iPhone 14 128GB - Purple (Unlocked) | 529 | In Stock |
Refurbished iPhone 14 Plus 128GB | Refurbished iPhone 14 Plus 128GB - Yellow (Unlocked) | 619 | In Stock |
Refurbished iPhone 14 256GB | Refurbished iPhone 14 256GB - Blue (Unlocked) | 619 | In Stock |
Refurbished iPhone 13 512GB | Refurbished iPhone 13 512GB - Midnight (Unlocked) | 699 | In Stock |
Refurbished iPhone 14 Plus 256GB | Refurbished iPhone 14 Plus 256GB - Midnight (Unlocked) | 699 | In Stock |
Refurbished iPhone 14 Plus 512GB | Refurbished iPhone 14 Plus 512GB - Midnight (Unlocked) | 869 | In Stock |
Refurbished iPhone 14 Pro 512GB | Refurbished iPhone 14 Pro 512GB - Silver (Unlocked) | 929 | In Stock |
Refurbished iPhone 14 Pro Max 512GB | Refurbished iPhone 14 Pro Max 512GB - Silver (Unlocked) | 1019 | In Stock |
Refurbished iPhone 13 Pro Max 1TB | Refurbished iPhone 13 Pro Max 1TB - Graphite (Unlocked) | 1099 | In Stock |
Refurbished iPhone 14 Pro 1TB | Refurbished iPhone 14 Pro 1TB - Gold (Unlocked) | 1099 | In Stock |
The next day, the output looks like:
🔄 Running daily price check...
✅ No changes detected.
✅ Saved latest price data to price_data.csv
⏳ Sleeping for 24 hours...
Looks like we'll have to wait a few days before seeing how it handles changed prices.
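If you don't want to wait for Apple to change a price, you can exercise the diffing logic by hand: load the saved CSV, bump one price, and run `compare_prices` against freshly scraped data. A rough sketch, using the module and column names from the code above and assuming the prices parse as numbers:

```python
import pandas as pd
import price_monitor
import scraper
from get_refurbished_iphone_urls import get_refurbished_iphone_urls

# Pretend yesterday's data had a higher price on the first model.
old_df = pd.read_csv("price_data.csv")
old_df.loc[0, "Price"] = old_df.loc[0, "Price"] + 50

# Scrape today's data and diff it against the doctored snapshot.
new_df = scraper.main(get_refurbished_iphone_urls())
markdown = price_monitor.compare_prices(old_df, new_df)
print(markdown or "No changes detected.")
```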
Conclusion
In this blog, we first tried to have ChatGPT do some scraping on its own, and then had it write a scraper that we could run on our own machine. ChatGPT doesn't seem to be able to iterate over URLs from a sitemap or a listing page. However, it could produce working code to extract links and structured data from a page.
ChatGPT wasn't great at picking CSS selectors for the relevant parts of the HTML, and its context window was too short to paste in the page source and ask it to pick the selectors itself. That bit had to be done manually. Overall, using ChatGPT definitely made drafting the code faster, but it took some prior scraping experience to debug the errors and ask ChatGPT for relevant fixes and workarounds. For example, it may not have thought of using schema.org JSON unless we mentioned it. So we can recommend ChatGPT for speeding up your workflow, but it isn't a substitute for knowing how scraping code works.
Finally, we did have to pick the CSS selectors. And there were some parts that we couldn't scrape because the data wasn't in a uniquely identifiable HTML element; these bits will need an LLM to go through the content and parse it into structured data. This is where ScrapingBee's AI scraping feature can be best used. With this feature, you can specify a CSS Selector and an AI prompt to run on the contents of that selector, alongside regular content scraping with CSS selectors, all in one API call. You can read more about it in our documentation or see it in action on our blog about How to Easily Scrape Shopify Stores With AI.
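As a taste of what that looks like, here's a rough sketch of calling the API with an AI prompt scoped to a selector. Treat the parameter names (`ai_query`, `ai_selector`) as assumptions from memory and confirm them against the documentation before using this:

```python
import requests

# Hypothetical sketch of ScrapingBee's AI extraction on one of the product pages above.
# The ai_query / ai_selector parameter names are assumptions -- check the docs.
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "YOUR_API_KEY",
        "url": "https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked",
        "ai_selector": "body",
        "ai_query": "Return the screen size in inches and the phone dimensions (height, width, thickness in mm).",
    },
)
print(response.text)
```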
P.S.: I'm not getting a new phone. Instead, I shrunk the display size of my current Android phone using `adb`. Now that's surely not something ChatGPT would think of as a solution to my stated problem.