What is XPath?
XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the W3C and can be used to navigate through elements and attributes in an XML document.
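To make this concrete, here is a small sketch using lxml, the library this post relies on below, on a made-up XML document. The element names, attributes and values are illustrative assumptions, not taken from any real file. It shows the most common moves: selecting elements by path, reading attribute values with @, and filtering with [...] predicates.

```python
from lxml import etree

# A small, made-up XML document used only for illustration
xml = b"""<library>
  <book id="b1" lang="en"><title>XPath Basics</title><year>2019</year></book>
  <book id="b2" lang="fr"><title>Le Web</title><year>2021</year></book>
</library>"""

root = etree.fromstring(xml)

# Select every <title> under a <book> and read its text
titles = [t.text for t in root.xpath("//book/title")]

# Select attribute values directly with @: the id of every <book>
ids = [str(v) for v in root.xpath("//book/@id")]

# Filter with a predicate on an attribute: titles of French books
french = [str(v) for v in root.xpath('//book[@lang="fr"]/title/text()')]

# Filter with a predicate on a child element's value
recent = [str(v) for v in root.xpath("//book[year > 2020]/@id")]

print(titles)   # ['XPath Basics', 'Le Web']
print(ids)      # ['b1', 'b2']
print(french)   # ['Le Web']
print(recent)   # ['b2']
```

Each expression reads like a file path through the document tree, with predicates in square brackets narrowing the selection.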
Can we use XPath with BeautifulSoup?
Technically, no: BeautifulSoup itself cannot evaluate XPath expressions. But we can combine BeautifulSoup4 with the lxml Python library to achieve that.
To install lxml, all you have to do is run this command: pip install lxml, and that's it!
And we can now run this code to extract ScrapingBee's blog title:
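import requests
from bs4 import BeautifulSoup
from lxml import etree
response = requests.get("https://www.scrapingbee.com/blog/")
response.raise_for_status()  # Fail early on HTTP errors (4xx/5xx)
soup = BeautifulSoup(response.content, "html.parser")
body = soup.find("body")
# lxml, unlike BeautifulSoup, can evaluate XPath, so re-parse the <body> markup with it
dom = etree.HTML(str(body))
# XPath for the blog's title; tied to the page's current markup, so it may need updating
xpath_str = '//*[@id="content"]/section/div/div[1]/h1'
matches = dom.xpath(xpath_str)
if matches:
    print(matches[0].text)
else:
    print("Title not found: the XPath expression matched nothing")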
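Absolute, position-based paths like the one above depend on the page's exact layout, and an element's .text only returns the text that appears before its first child. As a hedged sketch, the hypothetical helper xpath_first_text below wraps the same BeautifulSoup-to-lxml round trip, returns None when nothing matches, and uses itertext() to gather nested text. It runs on a made-up HTML string (an assumption, not ScrapingBee's real markup), so no network access is needed, and it assumes the expression selects nodes or strings rather than a number such as count(...).

```python
from bs4 import BeautifulSoup
from lxml import etree


def xpath_first_text(html, expression):
    """Hypothetical helper: return the text of the first node matching an
    XPath expression, or None when nothing matches. Assumes the expression
    selects nodes or strings (not a number such as count(...))."""
    soup = BeautifulSoup(html, "html.parser")
    # BeautifulSoup cannot evaluate XPath, so re-parse its markup with lxml
    dom = etree.HTML(str(soup))
    matches = dom.xpath(expression)
    if not matches:
        return None
    first = matches[0]
    # Attribute values and text() results are already strings
    if isinstance(first, str):
        return first.strip()
    # itertext() also collects text from nested children, unlike .text
    return "".join(first.itertext()).strip()


# Made-up markup: the title text is split across a nested <span>
page = """<html><body><div id="content">
  <h1>The <span>ScrapingBee</span> blog</h1>
</div></body></html>"""

print(xpath_first_text(page, '//*[@id="content"]/h1'))  # The ScrapingBee blog
print(xpath_first_text(page, "//h2"))                    # None
```

Returning None instead of raising an IndexError lets a scraper log a missing element and move on, which tends to matter when a site's layout changes between runs.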