How to download a file with Playwright and Python?

You can download a file with Playwright by targeting the download button on the page with any Locator and clicking it. Alternatively, you can extract the link from an anchor tag with the get_attribute method and then download the file with requests. The second approach is often more reliable, because PDFs and other downloadable files sometimes open natively in the browser instead of triggering a download when the button is clicked.

Here is some sample code that downloads a paper from arXiv using Playwright and requests:

import requests
from urllib.parse import urljoin
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)

    page = browser.new_page()
    page.goto('https://arxiv.org/abs/2301.05226')

    # Get the href from the "download pdf" button
    href = page.locator("a.download-pdf").get_attribute('href')
    # Resolve the href against the page URL (handles relative and absolute links)
    absolute_url = urljoin(page.url, href)

    browser.close()

# Download the file using requests
response = requests.get(absolute_url)
response.raise_for_status()  # stop if the server returned an error status
with open('output.pdf', 'wb') as f:
    f.write(response.content)
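
The first approach described above, clicking the button and letting the browser handle the download, can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the click really triggers a download rather than opening the file in a tab. The helper name download_by_click, the example URL, and the selector are hypothetical; expect_download, suggested_filename, and save_as are part of Playwright's Python API.

```python
from pathlib import Path


def download_by_click(page, selector, dest_dir="."):
    """Click the element matching `selector` and save the file it downloads.

    `page` is expected to be a Playwright sync Page object.
    """
    # expect_download() must wrap the action that triggers the download
    with page.expect_download() as download_info:
        page.locator(selector).click()
    download = download_info.value

    # Save under the file name suggested by the browser
    target = Path(dest_dir) / download.suggested_filename
    download.save_as(target)
    return target


def run_example():
    # Imported here so the helper above has no import-time dependency
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com/downloads")  # hypothetical page
        saved = download_by_click(page, "a#download")  # hypothetical selector
        print(f"Saved to {saved}")
        browser.close()
```

Calling run_example() with a real page URL and selector saves the file into the current directory. If the click opens the file in the browser instead, no download event is fired and expect_download will time out, which is the case where extracting the href and using requests works better.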
