There are multiple ways to use XPath selectors in Python. One popular option is to use lxml and BeautifulSoup and pair them with requests. The second option is to use Selenium.
Here is some sample code that uses lxml, BeautifulSoup, and Requests to open the ScrapingBee homepage and extract the text of the first h1 tag using XPath:
import requests
from lxml import etree
from bs4 import BeautifulSoup
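# Download the page
response = requests.get("https://scrapingbee.com")
# Parse with BeautifulSoup, then hand the markup to lxml, which provides XPath support
soup = BeautifulSoup(response.text, "html.parser")
dom = etree.HTML(str(soup))
# xpath() returns a list of matches; take the first h1 and read its text
first_h1_text = dom.xpath('//h1')[0].text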
print(first_h1_text)
# Output: Tired of getting blocked while scraping the web?
Here is some sample code for doing the same with Selenium:
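from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
DRIVER_PATH = '/path/to/chromedriver'
# Selenium 4 takes the driver path through a Service object;
# the older executable_path argument was deprecated and later removed
driver = webdriver.Chrome(service=Service(DRIVER_PATH))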
driver.get("https://scrapingbee.com")
first_h1 = driver.find_element(By.XPATH, "//h1")
first_h1_text = first_h1.text
print(first_h1_text)
# Output: Tired of getting blocked while scraping the web?
driver.quit()  # close the browser once the text has been read
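To see a few more XPath expressions in action without making a network request, here is a minimal sketch that runs lxml against an inline HTML string. The markup, the `item` class, and the variable names are made up for illustration and do not come from the ScrapingBee page. The sketch also shows the difference between `.text`, which returns only the text that precedes an element's first child, and `text_content()`, which includes the text of nested tags as well.

```python
from lxml import html

# Hypothetical markup used only for illustration; it is not taken from any real page.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Hello, <em>XPath</em>!</h1>
    <ul>
      <li class="item"><a href="/a">First</a></li>
      <li class="item"><a href="/b">Second</a></li>
    </ul>
  </body>
</html>
"""

# Parse the string into an element tree that supports XPath queries
dom = html.fromstring(SAMPLE_HTML)

# xpath() always returns a list, so index into it to get the first match
h1 = dom.xpath("//h1")[0]
h1_direct_text = h1.text          # only the text before the first child element
h1_full_text = h1.text_content()  # text of the element and all of its descendants

# Select attribute values and text nodes directly with @attr and text()
links = dom.xpath('//li[@class="item"]/a/@href')
labels = dom.xpath('//li[@class="item"]/a/text()')

print(h1_direct_text)
print(h1_full_text)
print(links)
print(labels)
```

If the heading you are scraping contains nested tags such as `em` or `span`, `text_content()` is usually the safer choice. Because `xpath()` returns an empty list when nothing matches, it is also worth checking the result before indexing into it.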