You can block resources in Playwright by calling the route method of a Page or BrowserContext object to register a handler that aborts requests matching certain criteria. For instance, you can block every request whose resource type is image, or filter on the URL to block specific addresses.
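Because route() takes a URL pattern, the pattern can do the filtering on its own. The minimal sketch below aborts requests whose URL ends in a common image extension. The helper name block_image_urls and the list of extensions are illustrative assumptions. The glob syntax with ** and {a,b} alternatives is the one Playwright's route() accepts. Registered on a BrowserContext instead of a Page, the rule would apply to every page opened from that context.

```python
# Glob for URLs ending in common image extensions. The extension list is an
# illustrative assumption; "**" and "{a,b}" are Playwright glob syntax.
IMAGE_URL_GLOB = "**/*.{png,jpg,jpeg,gif,webp,svg}"


def block_image_urls(target) -> None:
    """Abort every request whose URL matches IMAGE_URL_GLOB.

    `target` may be a Playwright Page or BrowserContext, since both expose
    route(). Requests whose URL does not match the glob never reach this
    handler and are sent normally.
    """
    target.route(IMAGE_URL_GLOB, lambda route: route.abort())
```

For example, after context = browser.new_context(), calling block_image_urls(context) before context.new_page() would cover every page created in that context. Unlike a resource-type check, this pattern only looks at the URL, so an image served from a URL without one of these extensions at the end (for instance, one followed by a query string) would still load.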
Here is some sample code that navigates to the ScrapingBee homepage while blocking all images and all URLs containing "google":
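```python
from playwright.sync_api import sync_playwright


def route_intercept(route):
    # Abort every image request (PNG, JPEG, SVG, ...) before it is sent
    if route.request.resource_type == "image":
        print(f"Blocking the image request to: {route.request.url}")
        return route.abort()
    # Abort any request whose URL contains "google"
    if "google" in route.request.url:
        print(f"blocking {route.request.url} as it contains Google")
        return route.abort()
    # Let all other requests through unchanged
    return route.continue_()


with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    # Intercept all requests
    page.route("**/*", route_intercept)
    page.goto("https://scrapingbee.com/")
    # Keep the browser open for 30 seconds. page.wait_for_timeout() is used
    # instead of time.sleep() so Playwright can keep handling events, such as
    # route callbacks, while waiting.
    page.wait_for_timeout(30_000)
    browser.close()
```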
It should print output similar to this:
```
blocking https://www.googleoptimize.com/optimize.js?id=OPT-MXVDJM6 as it contains Google
Blocking the image request to: https://www.scrapingbee.com/images/logo.svg
Blocking the image request to: https://www.scrapingbee.com/images/landing/hero_illustration.svg
Blocking the image request to: https://www.scrapingbee.com/images/landing/feature_headless.svg
Blocking the image request to: https://www.scrapingbee.com/images/testimonials/mike.png
Blocking the image request to: https://www.scrapingbee.com/images/landing/feature_rendering.svg
Blocking the image request to: https://www.scrapingbee.com/images/testimonials/russel.jpeg
...
Blocking the image request to: https://www.scrapingbee.com/images/landing/feature_proxies.svg
```
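The route_intercept handler hard-codes both checks. If you want to reuse the same logic across scripts, one option is to move the decision into a plain function and build the handler from it. The sketch below is hypothetical. The names should_block and make_blocking_handler and the default lists are illustrative, not part of Playwright. It relies only on route.request.resource_type, route.request.url, route.abort() and route.continue_(), which the handler in the example also uses.

```python
from typing import Callable, Collection

# Illustrative defaults (assumptions, not Playwright settings): the resource
# types and URL substrings whose requests should be aborted.
BLOCKED_TYPES = frozenset({"image", "media", "font"})
BLOCKED_URL_PARTS = ("google",)


def should_block(resource_type: str, url: str,
                 blocked_types: Collection[str] = BLOCKED_TYPES,
                 blocked_url_parts: Collection[str] = BLOCKED_URL_PARTS) -> bool:
    """Return True if a request with this resource type or URL should be aborted."""
    if resource_type in blocked_types:
        return True
    return any(part in url for part in blocked_url_parts)


def make_blocking_handler(blocked_types: Collection[str] = BLOCKED_TYPES,
                          blocked_url_parts: Collection[str] = BLOCKED_URL_PARTS) -> Callable:
    """Build a route handler that aborts matching requests and continues the rest."""
    type_set = frozenset(blocked_types)
    url_parts = tuple(blocked_url_parts)

    def handler(route) -> None:
        request = route.request
        if should_block(request.resource_type, request.url, type_set, url_parts):
            route.abort()
        else:
            route.continue_()

    return handler
```

You would register it the same way as route_intercept, for example page.route("**/*", make_blocking_handler()), or on a BrowserContext. Passing blocked_types={"image"} reproduces the original filtering (minus the log lines). The defaults also drop media and font requests. Keeping should_block free of Playwright objects makes the blocking rules easy to check without launching a browser.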