ChatGPT Scraping - How to Vibe Scrape with ChatGPT

31 March 2025 | 30 min read

LLMs such as ChatGPT have changed how developers write, review, and test code. The biggest testament to this is the rise of the term "Vibe coding", which was coined by Andrej Karpathy in an X post. To quote the post:

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. ~ Andrej Karpathy on X

In this blog, we tried to see if we could apply this approach and have ChatGPT scrape some data from the web for us. The idea was to see how far we could get on a defined scraping task without getting hands-on with code. Briefly, we tried getting some iPhone product listings from the Apple website, first by asking directly and then by scraping some URLs. Let's see how it fared.

TL;DR: ChatGPT could write scraping code, but it could not run it from its browser-based interface. The code worked after a few iterations, with some nudging and hinting.

The Task At Hand: Finding A Small iPhone

If you're like me - digitally minimal and spending a lot of time outdoors, with no social media apps on the phone and only occasionally taking it out to check the map - you have probably run into this problem lately: small phones are going extinct! The phones are quite literally getting out of hand. All I want is a GPS-enabled phone that can run WhatsApp and Google Maps and can be comfortably used with one hand.

With Android, there are some speciality brands like Ulefone and Unihertz, but since they do not retail in India, it would be hard to get them serviced if something goes wrong. The iPhone 16e might just cut it, but I do not want to spend so much for the minimal requirements I have from a phone. As it seems, the best option is for me to get a refurbished iPhone Mini or iPhone SE.

For this blog, the idea was to see if I could get ChatGPT to write a scraper and help me decide which iPhone to buy, and possibly even show me a refurbished phone listing that I can immediately order. My intention here was to be as lazy as possible and see how far it takes me. Ideally, ChatGPT could have suggested a phone and a purchase link, but for demonstrative purposes we asked it to write some scraper code.

Why Scrape When You Can Just Ask?

ChatGPT and other LLMs are usually presented in chat-based interfaces, and in an AI utopia, I should be able to ask for my requirements and get product listings. That was my first attempt:

Me: I'm looking for a small smartphone, something that is comfortable to be used with one hand. Please suggest Apple phones with official website links for phones that have a screen size of less than 6 inches. For each listing, mention the following details:

  • Operating system version (current and upgradable upto)
  • Screen size
  • Phone dimensions (height, thickness and width)
  • Battery capacity
  • Availability of USB-C charging
  • 5G availability
  • Price and Stock status
  • Year of release

ChatGPT: As of March 2025, Apple has discontinued the sale of smartphones with screen sizes under 6 inches. The smallest iPhones currently available directly from Apple feature 6.1-inch displays, such as the iPhone 15, iPhone 16e, and iPhone 16

However, if you're seeking a compact iPhone suitable for one-handed use, you might consider the following models available through third-party retailers or as refurbished units:

iPhone 13 Mini

  • Operating System: Originally shipped with iOS 15; upgradable to the latest iOS version as of 2025.
  • Screen Size: 5.4 inches
  • Dimensions: Height: 131.5 mm; Width: 64.2 mm; Depth: 7.65 mm
  • Battery Capacity: Approximately 2,438 mAh
  • USB-C Charging: No; utilizes Lightning connector
  • 5G Availability: Yes
  • Price and Stock Status: Varies by retailer; discontinued by Apple in September 2023
  • Year of Release: 2021

iPhone SE (3rd Generation, 2022)

  • Operating System: Originally shipped with iOS 15; upgradable to the latest iOS version as of 2025.
  • Screen Size: 4.7 inches
  • Dimensions: Height: 138.4 mm; Width: 67.3 mm; Depth: 7.3 mm
  • Battery Capacity: Approximately 2,018 mAh
  • USB-C Charging: No; utilizes Lightning connector
  • 5G Availability: Yes
  • Price and Stock Status: Varies by retailer; discontinued by Apple in February 2025
  • Year of Release: 2022

Please note that while these models are no longer sold directly by Apple, they may still be available through authorized resellers or as refurbished units. For the most accurate information on pricing and availability, it's advisable to check with local retailers or Apple's official refurbished store.

The response was fairly accurate, but it missed the iPhone 12 mini. The next question was: Can ChatGPT go through a list of URLs and give me a list of phones that satisfy my expectations?

Me: Go through the listings on this sitemap: https://www.apple.com/shop/sitemaps/refurbished.xml

Tell me if there are any refurbished iPhone 13 mini, iPhone 12 mini, or iPhone SE units available for purchase.

ChatGPT: As of March 25, 2025, Apple's official refurbished store offers the following refurbished iPhone models:

Refurbished iPhone 13 mini 512GB - Midnight (Unlocked): This model features a 5.4-inch display and is available for purchase. Apple

Refurbished iPhone 13 256GB: Available in Starlight, Pink, Blue, and Green colors, these models are listed at $529.00, reflecting a $200.00 discount from the original price. Apple

Regarding the iPhone 12 mini and iPhone SE models, there are currently no listings for these devices in Apple's refurbished store. Availability of refurbished products is subject to change based on inventory, so it's advisable to check the store periodically for updates.

The response looks neat, but there was one issue: the first link showed an out-of-stock product, and the second one did not lead to a product page but redirected to the refurbished products listings page.
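
As an aside, if you want to see for yourself what that sitemap actually contains, fetching and parsing it locally takes only a few lines of Python. Here's a minimal sketch, assuming the file follows the standard sitemap <urlset>/<loc> structure (the live file may be paginated or sit behind bot protection):

import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.apple.com/shop/sitemaps/refurbished.xml"
NAMESPACE = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

response = requests.get(SITEMAP_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
root = ET.fromstring(response.content)

# Each <url><loc>...</loc></url> entry in the sitemap is one page URL
urls = [loc.text for loc in root.findall(".//sm:loc", NAMESPACE)]
iphone_urls = [u for u in urls if "iphone" in u.lower()]
print(f"{len(iphone_urls)} iPhone-related URLs out of {len(urls)} total")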

πŸ€– Need help finding all the URLs on a website so you can scrape them? Check out our expert-level guide on How to find all URLs on a domain's website (multiple methods).

Resorting To Scraping: Initial Lazy Attempt

From the above response, I assumed ChatGPT was unable to access the Apple site in real time and produce the output I wanted. So I decided to have it go through each URL and respond with the data I needed. I asked:

Me: The first link is out of stock. Only the 2nd link works.

Instead, can you go through the phone available on this link: https://www.apple.com/shop/refurbished/iphone? Scrape each link to make a table with the following fields:

  • Link
  • Phone Model Name
  • Operating system version (current and upgradable upto)
  • Screen size
  • Phone dimensions (height, thickness and width)
  • Battery capacity
  • Availability of USB-C charging
  • 5G availability
  • Price and Stock status
  • Year of release

Finally, sort the table in the increasing order of screen size

ChatGPT returned a table with around 7 listings at first, but since I could see there were more listings on the Apple page, I nudged it to get all of them. The table it returned looked like this:

| Model | Screen Size | Dimensions (H x W x D) | Battery Capacity | USB-C Charging | 5G Availability | Price (USD) | Year of Release |
|---|---|---|---|---|---|---|---|
| Refurbished iPhone 13 128GB - Midnight (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |
| Refurbished iPhone 13 128GB - Starlight (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |
| Refurbished iPhone 13 128GB - Pink (Unlocked) | 6.1 inches | 146.7 x 71.5 x 7.65 mm | 3,227 mAh | No | Yes | $449.00 | 2021 |

... more rows ...

At first glance, it looks like ChatGPT did a good job: the data is fairly accurate, and the list is sorted in the order I specified. But once again, the links weren't working!

Me: The links you shared are broken. Will you be able to give me working links?

ChatGPT: I apologize for the earlier issues with the links. As of March 25, 2025, Apple's Certified Refurbished store offers a selection of refurbished iPhone models. You can view the available options here: APPLE

Please note that the availability of specific models, such as the iPhone 13 mini, iPhone 12 mini, or iPhone SE, may vary. I recommend visiting the provided link to check the current stock and pricing for these models.

ChatGPT took me back to square one, sharing the refurbished phone listings link that I shared with it to start with!

Up to this point, I had tried to avoid getting involved with code in any way, but now I gave in and decided to use some. So I prompted:

Me: Go through the source code of this webpage: https://www.apple.com/shop/refurbished/iphone

Write a scraper in Python that can extract all the product URLs from this page and return them as a list.

ChatGPT: Here's a Python scraper using requests and BeautifulSoup to extract all product URLs from the Apple Refurbished iPhone page:

This script fetches the refurbished iPhone page, parses it with BeautifulSoup, and extracts product URLs. Let me know if you need modifications! πŸš€

ChatGPT showed a button in the chat, which opened up the generated code in a mini IDE setup. I could supposedly even run the code from there:

ChatGPT Code Editor

I immediately clicked the "run" button, but got an error because the browser-based interface cannot send HTTP requests via the requests library. So I had to download the code and run it locally. The generated code is shown below:

import requests
from bs4 import BeautifulSoup

def get_refurbished_iphone_urls():
    url = "<https://www.apple.com/shop/refurbished/iphone>"
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers)

    if response.status_code != 200:
        print("Failed to retrieve page")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    product_links = []

    for link in soup.find_all("a", href=True):
        href = link["href"]
        if "/shop/product/" in href:
            full_url = f"<https://www.apple.com>{href}"
            product_links.append(full_url)

    return product_links

# Example usage
if __name__ == "__main__":
    urls = get_refurbished_iphone_urls()
    print(urls)

The code looks neat, and it worked well too: I got a list of URLs from the refurbished store page when I ran it. ChatGPT was smart enough to add logic that returns only the product page URLs, and it helpfully prepended "https://www.apple.com" to the raw href attributes.
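
One small addition of my own (not something ChatGPT suggested): since the price monitor towards the end of this post imports this function from a module named get_refurbished_iphone_urls, it makes sense to save the script under that name and optionally dump the URLs to disk so the next step can reuse them. A minimal sketch:

import json

# Assumes the URL extractor above is saved as get_refurbished_iphone_urls.py,
# matching the import used by the price monitor later in this post
from get_refurbished_iphone_urls import get_refurbished_iphone_urls

if __name__ == "__main__":
    urls = get_refurbished_iphone_urls()
    with open("product_urls.json", "w") as f:
        json.dump(urls, f, indent=2)
    print(f"Saved {len(urls)} product URLs to product_urls.json")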

ChatGPT-Generated Code For Scraping Apple Product Data

Once the code to extract the links was done, the next step was to write code to scrape each of these links for the data I needed. The product listing page looks like this:

Apple Product Page

Me: Go through the source code of this webpage: https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked

Write a Python scraper function that takes in a URL like this and returns the following details as a dictionary with the following details:

  • Input URL
  • Phone Model Name
  • Operating system version (current and upgradable upto)
  • Screen size
  • Phone dimensions (height, thickness and width)
  • Battery capacity
  • Availability of USB-C charging
  • 5G availability
  • Price and Stock status
  • Year of release

Once this is done, write another function that can take a list of URLs, run the previous function for each URL (in parallel, using asyncio), gather the results, and make a Pandas DataFrame out of it.

Finally, the code must print the Pandas dataframe after sorting it in increasing order of screen size.

Here's the code ChatGPT returned:

import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

async def fetch_page(session, url):
    """Fetches the HTML content of a given URL."""
    async with session.get(url) as response:
        return await response.text()

async def scrape_iphone_details(url):
    """Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
    async with aiohttp.ClientSession() as session:
        html = await fetch_page(session, url)
        soup = BeautifulSoup(html, 'html.parser')

        try:
            phone_name = soup.find("h1", class_="product-title").text.strip()
            price = soup.find("span", class_="as-price-currentprice").text.strip()
            stock_status = "In Stock" if "Add to Bag" in html else "Out of Stock"

            details_section = soup.find("div", class_="tech-specs")
            details_list = details_section.find_all("li") if details_section else []

            details = {}
            for item in details_list:
                text = item.text.strip()
                if "display" in text.lower():
                    details["Screen Size"] = text.split(":")[-1].strip()
                elif "dimensions" in text.lower():
                    details["Dimensions"] = text.split(":")[-1].strip()
                elif "battery" in text.lower():
                    details["Battery Capacity"] = text.split(":")[-1].strip()
                elif "5g" in text.lower():
                    details["5G Availability"] = "Yes" if "5G" in text else "No"
                elif "usb-c" in text.lower():
                    details["USB-C Charging"] = "Yes" if "USB-C" in text else "No"
                elif "release year" in text.lower():
                    details["Year of Release"] = text.split(":")[-1].strip()

            return {
                "Input URL": url,
                "Phone Model Name": phone_name,
                "Operating System": "iOS (Check Apple site for current version)",
                "Screen Size": details.get("Screen Size", "Unknown"),
                "Dimensions": details.get("Dimensions", "Unknown"),
                "Battery Capacity": details.get("Battery Capacity", "Unknown"),
                "USB-C Charging": details.get("USB-C Charging", "No"),
                "5G Availability": details.get("5G Availability", "Unknown"),
                "Price": price,
                "Stock Status": stock_status,
                "Year of Release": details.get("Year of Release", "Unknown"),
            }

        except Exception as e:
            return {"Error": f"Failed to scrape {url} - {str(e)}"}

async def scrape_multiple_iphones(urls):
    """Scrapes details of multiple iPhones in parallel using asyncio."""
    tasks = [scrape_iphone_details(url) for url in urls]
    return await asyncio.gather(*tasks)

def main(urls):
    """Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
    results = asyncio.run(scrape_multiple_iphones(urls))
    df = pd.DataFrame(results)
    df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
    df = df.sort_values(by="Screen Size", ascending=True)
    print(df)

# Example usage
urls = [
    "<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
    "<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)

I ran the code without going through it and immediately got a KeyError, because "Screen Size" was missing from the scraped results. I asked ChatGPT to fix it, and this is the fixed code it gave me:

import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import re

async def fetch_page(session, url):
    """Fetches the HTML content of a given URL asynchronously."""
    async with session.get(url) as response:
        return await response.text()

async def scrape_iphone_details(url):
    """Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
    async with aiohttp.ClientSession() as session:
        html = await fetch_page(session, url)
        soup = BeautifulSoup(html, 'html.parser')

        try:
            phone_name = soup.find("h1", class_="product-title").text.strip() if soup.find("h1", class_="product-title") else "Unknown Model"
            price = soup.find("span", class_="as-price-currentprice").text.strip() if soup.find("span", class_="as-price-currentprice") else "Unknown Price"
            stock_status = "In Stock" if "Add to Bag" in html else "Out of Stock"

            # Find the section that contains technical details
            details_section = soup.find("div", class_="tech-specs")
            details_list = details_section.find_all("li") if details_section else []

            # Extract details dynamically
            details = {
                "Screen Size": "Unknown",
                "Dimensions": "Unknown",
                "Battery Capacity": "Unknown",
                "5G Availability": "Unknown",
                "USB-C Charging": "No",
                "Year of Release": "Unknown"
            }

            for item in details_list:
                text = item.text.strip().lower()

                if "display" in text or "screen" in text:
                    match = re.search(r'(\d+(\.\d+)?)\s*inch', text)
                    details["Screen Size"] = match.group(1) + '"' if match else "Unknown"

                if "dimensions" in text:
                    details["Dimensions"] = text.split(":")[-1].strip()

                if "battery" in text:
                    match = re.search(r'(\d{3,4})\s*mAh', text)
                    details["Battery Capacity"] = match.group(1) + " mAh" if match else "Unknown"

                if "5g" in text:
                    details["5G Availability"] = "Yes" if "5g" in text else "No"

                if "usb-c" in text:
                    details["USB-C Charging"] = "Yes"

                if "release year" in text:
                    match = re.search(r'(\d{4})', text)
                    details["Year of Release"] = match.group(1) if match else "Unknown"

            return {
                "Input URL": url,
                "Phone Model Name": phone_name,
                "Operating System": "iOS (Check Apple site for current version)",
                "Screen Size": details["Screen Size"],
                "Dimensions": details["Dimensions"],
                "Battery Capacity": details["Battery Capacity"],
                "USB-C Charging": details["USB-C Charging"],
                "5G Availability": details["5G Availability"],
                "Price": price,
                "Stock Status": stock_status,
                "Year of Release": details["Year of Release"],
            }

        except Exception as e:
            return {"Error": f"Failed to scrape {url} - {str(e)}"}

async def scrape_multiple_iphones(urls):
    """Scrapes details of multiple iPhones in parallel using asyncio."""
    tasks = [scrape_iphone_details(url) for url in urls]
    return await asyncio.gather(*tasks)

def main(urls):
    """Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
    results = asyncio.run(scrape_multiple_iphones(urls))
    df = pd.DataFrame(results)

    # Convert screen size to numeric for sorting
    df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
    df = df.sort_values(by="Screen Size", ascending=True)

    print(df)

# Example usage
urls = [
    "<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
    "<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)

The fix seemed okay, but the phone name and stock status were still not being scraped accurately. Upon inspecting the HTML on the Apple website, I found that the CSS selectors ChatGPT was using were wrong, and it couldn't get them right even after I asked for a fix. I tried pasting the page's source code and asking ChatGPT to suggest the right selectors, but the message was too long for it to accept. Finally, I asked for this:

Me: Still not working. Can you use the schema.org json for product name, price, and stock status instead of the HTML?

Schema.org JSON-LD objects embedded in the webpage are a more uniform representation of the page's data; they are easier to parse and less likely to break with a webpage redesign.
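
Before looking at the code, here's a hand-written sketch (mine, not ChatGPT's) of roughly what such an embedded "Product" object looks like and how it can be pulled out with BeautifulSoup. The field names follow schema.org conventions, the values are made up, and the real payload on Apple's pages has more fields. Note that "offers" can be a list rather than a single object, which becomes relevant shortly:

import json
from bs4 import BeautifulSoup

# Hypothetical, simplified page snippet (not Apple's actual markup) showing the
# kind of JSON-LD block embedded in a product page
sample_html = """
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Refurbished iPhone 13 128GB - Midnight (Unlocked)",
  "offers": [
    {
      "@type": "Offer",
      "price": "449.00",
      "availability": "https://schema.org/InStock"
    }
  ]
}
</script>
"""

soup = BeautifulSoup(sample_html, "html.parser")
script = soup.find("script", type="application/ld+json")
data = json.loads(script.string)

# Note that "offers" is a list here, not a dictionary
print(data["name"], data["offers"][0]["price"])

Here's the modified bit from ChatGPT's new code: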

...
html = await fetch_page(session, url)
soup = BeautifulSoup(html, 'html.parser')

try:
    # Extract JSON-LD script containing structured data
    json_ld_script = soup.find("script", type="application/ld+json")
    product_data = json.loads(json_ld_script.string) if json_ld_script else {}

    # Extract product name, price, and stock status from JSON-LD
    phone_name = product_data.get("name", "Unknown Model")
    price = product_data.get("offers", {}).get("price", "Unknown Price")
    stock_status = product_data.get("offers", {}).get("availability", "").split("/")[-1]
    stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"
...

From this and the rest of the code, I could see two issues, which I asked ChatGPT to fix:

Me: There are still two issues:

  1. There may be multiple schema objects. Make sure you use the "Product" one.
  2. There is no "div.as-tech-specs" section to extract the details from. Use "div.TechSpecs-panel" instead. There are no "li" elements inside that either. Check only divs.

ChatGPT made some minor edits and returned the code. When I ran it, all I got were blank results, which suggested the scraping was silently failing. The most likely reason was that ChatGPT had wrapped the scraping code in a try block and returned error dicts whenever something went wrong. I asked it to return None instead and print the traceback whenever there is an error. Here's the fixed error handling code I got:

except Exception as e:
    print(f"❌ ERROR while scraping {url} ❌")
    traceback.print_exc()  # Print full error traceback for debugging
    return None  # Return None so failed entries are ignored

That code has emojis! Perhaps it's the best indicator of whether any given piece of code was generated with ChatGPT. I tried running the code and got an error printed on my screen, which I pasted directly into ChatGPT:

Me:

❌ ERROR while scraping https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked ❌
Traceback (most recent call last):
  File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 40, in scrape_iphone_details
    price = product_data.get("offers", {}).get("price", "Unknown Price")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get'
Traceback (most recent call last):
  File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 124, in <module>
    main(urls)
  File "/home/kd/scrapingbee/chatgpt-apple-scraper/get_data.py", line 115, in main
    df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
                                      ~~^^^^^^^^^^^^^^^
  File "/home/kd/.cache/pypoetry/virtualenvs/chatgpt-apple-scraper-C8P38fdX-py3.11/lib/python3.11/site-packages/pandas/core/frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kd/.cache/pypoetry/virtualenvs/chatgpt-apple-scraper-C8P38fdX-py3.11/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 417, in get_loc
    raise KeyError(key)
KeyError: 'Screen Size'

ChatGPT analyzed these errors and listed the following issues along with fixes:

  • Issue 1: product_data["offers"] is a list, not a dictionary
  • Issue 2: Screen Size column is missing in Pandas DataFrame

After applying these fixes, here's the final working code produced by ChatGPT:

import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import re
import json
import traceback

async def fetch_page(session, url):
    """Fetches the HTML content of a given URL asynchronously."""
    async with session.get(url) as response:
        return await response.text()

async def scrape_iphone_details(url):
    """Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
    async with aiohttp.ClientSession() as session:
        html = await fetch_page(session, url)
        soup = BeautifulSoup(html, 'html.parser')

        try:
            # Extract JSON-LD script containing "Product" data
            json_ld_scripts = soup.find_all("script", type="application/ld+json")
            product_data = {}

            for script in json_ld_scripts:
                try:
                    data = json.loads(script.string)
                    if isinstance(data, list):  # JSON-LD sometimes is an array
                        for item in data:
                            if item.get("@type") == "Product":
                                product_data = item
                                break
                    elif data.get("@type") == "Product":
                        product_data = data
                except json.JSONDecodeError:
                    continue  # Skip invalid JSON sections

            # Extract product details
            phone_name = product_data.get("name", "Unknown Model")

            # Fix for the 'list' object issue
            offers = product_data.get("offers", [])
            if isinstance(offers, list) and len(offers) > 0:
                price = offers[0].get("price", "Unknown Price")
                stock_status = offers[0].get("availability", "").split("/")[-1]
            else:
                price = "Unknown Price"
                stock_status = "Unknown"

            stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"

            # Extract technical specifications from "TechSpecs-panel"
            details_section = soup.find("div", class_="TechSpecs-panel")
            details_divs = details_section.find_all("div") if details_section else []

            details = {
                "Screen Size": "Unknown",
                "Dimensions": "Unknown",
                "Battery Capacity": "Unknown",
                "5G Availability": "Unknown",
                "USB-C Charging": "No",
                "Year of Release": "Unknown"
            }

            for div in details_divs:
                text = div.get_text(strip=True).lower()

                if "display" in text or "screen" in text:
                    match = re.search(r'(\d+(\.\d+)?)\s*inch', text)
                    details["Screen Size"] = match.group(1) + '"' if match else "Unknown"

                if "dimensions" in text:
                    details["Dimensions"] = text.split(":")[-1].strip()

                if "battery" in text:
                    match = re.search(r'(\d{3,4})\s*mAh', text)
                    details["Battery Capacity"] = match.group(1) + " mAh" if match else "Unknown"

                if "5g" in text:
                    details["5G Availability"] = "Yes" if "5g" in text else "No"

                if "usb-c" in text:
                    details["USB-C Charging"] = "Yes"

                if "release year" in text:
                    match = re.search(r'(\d{4})', text)
                    details["Year of Release"] = match.group(1) if match else "Unknown"

            return {
                "Input URL": url,
                "Phone Model Name": phone_name,
                "Operating System": "iOS (Check Apple site for current version)",
                "Screen Size": details["Screen Size"],
                "Dimensions": details["Dimensions"],
                "Battery Capacity": details["Battery Capacity"],
                "USB-C Charging": details["USB-C Charging"],
                "5G Availability": details["5G Availability"],
                "Price": f"${price}" if price != "Unknown Price" else price,
                "Stock Status": stock_status,
                "Year of Release": details["Year of Release"],
            }

        except Exception as e:
            print(f"❌ ERROR while scraping {url} ❌")
            traceback.print_exc()
            return None  # Ignore failed scrapes

async def scrape_multiple_iphones(urls):
    """Scrapes details of multiple iPhones in parallel using asyncio."""
    tasks = [scrape_iphone_details(url) for url in urls]
    return await asyncio.gather(*tasks)

def main(urls):
    """Runs the scraper on a list of URLs and sorts the results by increasing screen size."""
    results = asyncio.run(scrape_multiple_iphones(urls))

    # Remove None values (failed scrapes)
    results = [r for r in results if r is not None]

    df = pd.DataFrame(results)

    # Fix for missing 'Screen Size' key
    if "Screen Size" in df.columns:
        df["Screen Size"] = pd.to_numeric(df["Screen Size"].str.replace('"', ''), errors='coerce')
        df = df.sort_values(by="Screen Size", ascending=True)

    print(df)

# Example usage
urls = [
    "<https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked>",
    "<https://www.apple.com/shop/product/FLMJ3LL/A/refurbished-iphone-13-256gb-starlight-unlocked>"
]
main(urls)

The example URLs hard-coded in the script weren't working out of the box, though. I ran the URL extractor we wrote first and then fed those URLs into the scraper. Here's the final result I got:

| Phone Model Name | Operating System | Screen Size | Dimensions | Battery Capacity | USB-C Charging | 5G Availability | Price | Stock Status | Year of Release |
|---|---|---|---|---|---|---|---|---|---|
| Refurbished iPhone 13 128GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
| Refurbished iPhone 13 128GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
| Refurbished iPhone 13 128GB - Pink (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
| Refurbished iPhone 13 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
| Refurbished iPhone 13 128GB - Green (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $449.0 | In Stock | Unknown |
| Refurbished iPhone 13 256GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 13 256GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 13 256GB - Pink (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 13 256GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 13 256GB - Green (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 14 128GB - Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 14 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $529.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 128GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 128GB - Starlight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 128GB - Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 128GB - Blue (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 128GB - Yellow (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $619.0 | In Stock | Unknown |
| Refurbished iPhone 13 512GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $699.0 | In Stock | Unknown |
| Refurbished iPhone 14 Plus 256GB - Midnight (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $699.0 | In Stock | Unknown |
| Refurbished iPhone 13 Pro Max 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
| Refurbished iPhone 13 Pro Max 512GB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro 512GB - Space Black (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro 512GB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $929.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro Max 512GB - Silver (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1019.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro Max 512GB - Deep Purple (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1019.0 | In Stock | Unknown |
| Refurbished iPhone 13 Pro Max 1TB - Graphite (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1099.0 | In Stock | Unknown |
| Refurbished iPhone 14 Pro 1TB - Gold (Unlocked) | iOS (Check Apple site for current version) | nan | Unknown | Unknown | No | Yes | $1099.0 | In Stock | Unknown |

Some columns, like Screen Size and Dimensions, never got filled in. Those details aren't presented in a structured format on the Apple website, so they're trickier to extract with code alone; an AI reading through the product listing page could probably pull them out.

Can ChatGPT Scrape Itself?

I thought checking if ChatGPT could scrape itself would be a fun exercise. So I decided to ask it to generate a changelog using the chat link.

Me: Can you scrape this chat and generate the changelog with updates that were made to the code?

Link: https://chatgpt.com/c/67e22699-4648-8000-8572-3e41a286d305

ChatGPT: I don't have direct access to external links, including past chat logs. However, I can summarize the changes based on our conversation here!

πŸ”Ή Changelog for iPhone Refurbished Store Scraper

Version 1.0 β†’ Version 1.1
βœ… Fixed AttributeError for product_data["offers"] being a list instead of a dictionary
βœ… Updated price & stock status extraction to correctly handle multiple offers

Version 1.1 β†’ Version 1.2
βœ… Replaced missing "Screen Size" check before sorting in Pandas (fixed KeyError: 'Screen Size')
βœ… Ensured proper JSON parsing by checking multiple schema.org Product objects

Version 1.2 β†’ Version 1.3 (Final Stable Release)
βœ… Updated TechSpecs-panel selector (instead of as-tech-specs)
βœ… Improved error handling with traceback.print_exc() for full debugging
βœ… Added conditional check for "Screen Size" in the DataFrame before processing

Now, the scraper works without errors and correctly extracts all details! πŸš€ If you need a structured CHANGELOG.md file, let me know! 😊

Well, ChatGPT can't scrape itself, or so it said. But we got a nice changelog!

Generating Price Monitoring Code With ChatGPT

The next step was to see if we could make the code deployable. It needs to run every day and actually notify me when there's something worth buying: either a price drop or a new model being listed. I asked ChatGPT to add this feature.

Me: I need some updates to the previous scraping code to make it a price monitor. Here's what I need:

  • Filter the table, showing only the entries with the lowest price for that model name
  • For the model name, exclude the color. So you can break the name at the hyphen and take only the first part.
  • Make this a price monitor. This script needs to be run every day. So, write another program that calls the first program's function to get the URLs, scrapes all the URLs, and stores the result in memory. This must be in an infinite loop, running once per day. Any changes or new listings from the previous day must be dumped as a markdown file, with the date in the filename.

ChatGPT generated a scraper.py and price_monitor.py. The first one was an update to the previous code, and the second one was to run the whole thing in an infinite loop, once per day, and print the updates I needed. I had to ask for some fixes:

Me: I just went through the code, need a few updates:

  • Use the get_refurbished_iphone_urls() you previously generated, instead of fetch_urls()
  • Add some emojis in the final markdown output

After it produced some updated code, I asked for the proverbial "one last change":

Me: Great, one last change: save the old_df to disk and load it from disk the next day for comparison. This way the data wouldn't be lost if the script is killed or exited for some reason.

Here's what the final code looked like:

# scraper.py
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup
import json
import traceback

async def fetch_page(session, url):
    """Fetches the HTML content of a given URL asynchronously."""
    async with session.get(url) as response:
        return await response.text()

async def scrape_iphone_details(url):
    """Scrapes details of an iPhone from the given Apple Refurbished Store product URL."""
    async with aiohttp.ClientSession() as session:
        html = await fetch_page(session, url)
        soup = BeautifulSoup(html, 'html.parser')

        try:
            # Extract JSON-LD script containing "Product" data
            json_ld_scripts = soup.find_all("script", type="application/ld+json")
            product_data = {}

            for script in json_ld_scripts:
                try:
                    data = json.loads(script.string)
                    if isinstance(data, list):  # JSON-LD sometimes is an array
                        for item in data:
                            if item.get("@type") == "Product":
                                product_data = item
                                break
                    elif data.get("@type") == "Product":
                        product_data = data
                except json.JSONDecodeError:
                    continue  # Skip invalid JSON sections

            # Extract product details
            phone_name = product_data.get("name", "Unknown Model")

            # Extract only the model name (remove color & storage info)
            model_base_name = phone_name.split("-")[0].strip()

            # Fix for the 'list' object issue
            offers = product_data.get("offers", [])
            if isinstance(offers, list) and len(offers) > 0:
                price = offers[0].get("price", "Unknown Price")
                stock_status = offers[0].get("availability", "").split("/")[-1]
            else:
                price = "Unknown Price"
                stock_status = "Unknown"

            stock_status = "In Stock" if stock_status.lower() == "instock" else "Out of Stock"

            return {
                "Model Name": model_base_name,
                "Full Name": phone_name,
                "Input URL": url,
                "Price": price,
                "Stock Status": stock_status,
            }

        except Exception as e:
            print(f"❌ ERROR while scraping {url} ❌")
            traceback.print_exc()
            return None  # Ignore failed scrapes

async def scrape_multiple_iphones(urls):
    """Scrapes details of multiple iPhones in parallel using asyncio."""
    tasks = [scrape_iphone_details(url) for url in urls]
    return await asyncio.gather(*tasks)

def main(urls):
    """Runs the scraper on a list of URLs and filters the lowest price per model."""
    results = asyncio.run(scrape_multiple_iphones(urls))
    
    # Remove None values (failed scrapes)
    results = [r for r in results if r is not None]

    df = pd.DataFrame(results)

    # Keep only the lowest price for each model
    df = df.sort_values(by="Price").drop_duplicates(subset=["Model Name"], keep="first")

    return df

# price_monitor.py

import time
import datetime
import os
import pandas as pd
import scraper  # Import the scraper script
from get_refurbished_iphone_urls import get_refurbished_iphone_urls  # Fetches URLs dynamically

PRICE_FILE = "price_data.csv"

def load_previous_data():
    """Loads the last recorded price data from disk (if available)."""
    if os.path.exists(PRICE_FILE):
        try:
            return pd.read_csv(PRICE_FILE)
        except Exception as e:
            print(f"⚠️ Error loading previous data: {e}. Starting fresh.")
            return None
    return None

def save_new_data(df):
    """Saves the latest price data to disk."""
    df.to_csv(PRICE_FILE, index=False)
    print(f"βœ… Saved latest price data to {PRICE_FILE}")

def compare_prices(old_df, new_df):
    """Compare old and new price data and return changes as a markdown string with emojis."""
    changes = []
    
    for _, new_row in new_df.iterrows():
        model_name = new_row["Model Name"]
        new_price = new_row["Price"]
        new_stock = new_row["Stock Status"]
        
        old_row = old_df[old_df["Model Name"] == model_name]
        if not old_row.empty:
            old_price = old_row.iloc[0]["Price"]
            old_stock = old_row.iloc[0]["Stock Status"]

            if new_price != old_price or new_stock != old_stock:
                price_change_emoji = "πŸ”»" if new_price < old_price else "πŸ”Ί"
                stock_change_emoji = "🟒" if new_stock == "In Stock" else "πŸ”΄"
                changes.append(f"| {model_name} | {price_change_emoji} **${old_price} β†’ ${new_price}** | {stock_change_emoji} {old_stock} β†’ **{new_stock}** |")
        else:
            changes.append(f"| {model_name} | πŸŽ‰ **New Listing: ${new_price}** | 🟒 **{new_stock}** |")

    if not changes:
        return None

    markdown_content = f"# πŸ“‰ Price Changes - {datetime.date.today()}\n\n"
    markdown_content += "| Model Name | Price Change | Stock Status |\n"
    markdown_content += "|------------|-------------|--------------|\n"
    markdown_content += "\n".join(changes)
    
    return markdown_content

def save_markdown(content):
    """Save the markdown content to a file with today's date."""
    filename = f"price_changes_{datetime.date.today()}.md"
    with open(filename, "w") as file:
        file.write(content)
    print(f"βœ… Saved changes to {filename}")

def price_monitor():
    """Runs the price monitor indefinitely, checking for changes daily."""
    previous_data = load_previous_data()

    while True:
        print("πŸ”„ Running daily price check...")

        # Get the latest refurbished iPhone URLs dynamically
        urls = get_refurbished_iphone_urls()
        new_data = scraper.main(urls)

        if previous_data is not None:
            changes = compare_prices(previous_data, new_data)
            if changes:
                save_markdown(changes)
            else:
                print("βœ… No changes detected.")

        # Save the new data for future comparison
        save_new_data(new_data)

        # Update previous_data for the next iteration
        previous_data = new_data.copy()

        print("⏳ Sleeping for 24 hours...")
        time.sleep(86400)  # Sleep for one day

if __name__ == "__main__":
    price_monitor()

On the first iteration, it gives the following output:

πŸ”„ Running daily price check...
βœ… Saved latest price data to price_data.csv
⏳ Sleeping for 24 hours...

The CSV file it saves looks like this:

| Model Name | Full Name | Price | Stock Status |
|---|---|---|---|
| Refurbished iPhone 13 128GB | Refurbished iPhone 13 128GB - Starlight (Unlocked) | 449 | In Stock |
| Refurbished iPhone 13 256GB | Refurbished iPhone 13 256GB - Midnight (Unlocked) | 529 | In Stock |
| Refurbished iPhone 14 128GB | Refurbished iPhone 14 128GB - Purple (Unlocked) | 529 | In Stock |
| Refurbished iPhone 14 Plus 128GB | Refurbished iPhone 14 Plus 128GB - Yellow (Unlocked) | 619 | In Stock |
| Refurbished iPhone 14 256GB | Refurbished iPhone 14 256GB - Blue (Unlocked) | 619 | In Stock |
| Refurbished iPhone 13 512GB | Refurbished iPhone 13 512GB - Midnight (Unlocked) | 699 | In Stock |
| Refurbished iPhone 14 Plus 256GB | Refurbished iPhone 14 Plus 256GB - Midnight (Unlocked) | 699 | In Stock |
| Refurbished iPhone 14 Plus 512GB | Refurbished iPhone 14 Plus 512GB - Midnight (Unlocked) | 869 | In Stock |
| Refurbished iPhone 14 Pro 512GB | Refurbished iPhone 14 Pro 512GB - Silver (Unlocked) | 929 | In Stock |
| Refurbished iPhone 14 Pro Max 512GB | Refurbished iPhone 14 Pro Max 512GB - Silver (Unlocked) | 1019 | In Stock |
| Refurbished iPhone 13 Pro Max 1TB | Refurbished iPhone 13 Pro Max 1TB - Graphite (Unlocked) | 1099 | In Stock |
| Refurbished iPhone 14 Pro 1TB | Refurbished iPhone 14 Pro 1TB - Gold (Unlocked) | 1099 | In Stock |

The next day, the output looks like:

πŸ”„ Running daily price check...
βœ… No changes detected.
βœ… Saved latest price data to price_data.csv
⏳ Sleeping for 24 hours...

Looks like we'll have to wait a few days before seeing how it handles changed prices.

Conclusion

In this blog, we first tried to have ChatGPT do some scraping itself, and then had it write a scraper that we could run on our own machine. ChatGPT doesn't seem to be able to iterate over the URLs in a sitemap or a listing page on its own. However, it could produce working code to extract links and structured data from a page.

ChatGPT wasn't great at picking out CSS selectors for the relevant parts of the HTML, and its context window was too small to paste in the page source and ask it to pick the selectors; that bit had to be done manually. Overall, using ChatGPT definitely made drafting the code faster, but it took some prior scraping experience to debug the errors and ask ChatGPT for relevant fixes and workarounds. For example, it may not have thought of using the schema.org JSON unless we had mentioned it. So we can recommend ChatGPT for speeding up your workflow, but it's not a substitute for knowing how scraping code works.

Finally, we did have to pick the CSS selectors. And there were some parts that we couldn't scrape because the data wasn't in a uniquely identifiable HTML element; these bits will need an LLM to go through the content and parse it into structured data. This is where ScrapingBee's AI scraping feature can be best used. With this feature, you can specify a CSS Selector and an AI prompt to run on the contents of that selector, alongside regular content scraping with CSS selectors, all in one API call. You can read more about it in our documentation or see it in action on our blog about How to Easily Scrape Shopify Stores With AI.
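
As a rough illustration of what that could look like for this task, the sketch below combines a CSS selector with an AI prompt in a single API call. Treat the ai_selector and ai_query parameter names as assumptions on my part and check the documentation for the exact interface:

import requests

# Hedged sketch of a ScrapingBee AI-scraping call; "ai_selector" and "ai_query"
# are assumed parameter names -- verify them against the official docs
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "YOUR_API_KEY",
        "url": "https://www.apple.com/shop/product/FLML3LL/A/refurbished-iphone-13-128gb-midnight-unlocked",
        "ai_selector": "div.TechSpecs-panel",  # limit the AI to the tech-specs section
        "ai_query": "Return the screen size, dimensions and battery capacity as JSON",
    },
    timeout=60,
)
print(response.text)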

P.S.: I'm not getting a new phone. Instead, I shrunk the display size of my current Android phone using adb. Now that's surely not something ChatGPT can think of as a solution to my stated problem.

Karthik Devan

I work freelance on full-stack development of apps and websites, and I'm also trying to work on a SaaS product. When I'm not working, I like to travel, play board games, hike and climb rocks.