Do you need to grab elements by text using XPath? Well, today we're going to discuss just that. Our tutorial keeps things simple: exact matches with text() = '...', partial matches with contains(), plus starts-with() and normalize-space() to avoid whitespace-related issues. You'll learn about case sensitivity, special characters, and how text matching differs for attributes vs. inner text. Of course, this article also includes copy-pasteable examples for Python/lxml and Selenium.
So, if you need to select XPath by text, check if XPath contains text, or see if text equals some value, you're in the right place. Let's get started!
Exact vs. partial text matching
Let's start with the most common case: matching the actual text inside an element (the stuff between the opening and closing tags). To put a long story short, XPath gives you two main ways to match text.
Exact match
//*[text() = "Tired"]
This only works if the element's text is exactly "Tired". Yeah, even one extra space or different casing will make it fail.
Partial match
//*[contains(text(), "Tired")]
This looks for "Tired" anywhere inside the element's text. So it will also match, for example, "Tired of getting blocked while scraping the web?".
Rule of thumb:
- Use text() = "..." when you know the full text won't change.
- Use contains() when the text is longer, dynamic, or you only care about part of it.
Example: Requests and lxml
Here's a simple Python example demonstrating the usage of contains() and text():
import requests
from lxml import etree
# Download the HTML from the page
html = requests.get("https://scrapingbee.com").text
# Parse the HTML into an lxml DOM object so we can run XPath queries
dom = etree.HTML(html)
# Example 1: exact match with text() = "..."
# This only matches elements whose text is exactly "Tired"
# Since the actual text is longer, this query returns nothing
exact = dom.xpath('//*[text()="Tired"]')
print("Exact match count:", len(exact))
# -> 0 (no exact match found)
# Example 2: partial match with "contains"
# This finds any element whose text includes the word "Tired"
partial = dom.xpath('//*[contains(text(), "Tired")]')[0].text
print("Partial match:", partial)
# -> Tired of getting blocked while scraping the web?
Example: Selenium
And here's another example using Selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
# Launch a new Chrome browser instance
driver = webdriver.Chrome()
# Open the target page
driver.get("https://scrapingbee.com")
# Example 1: exact match with text() = "..."
# This only matches if the element's text is exactly "Tired"
# Since the page text is longer, it won't find anything
# We use find_elements() here to avoid an exception — it returns a list
exact = driver.find_elements(By.XPATH, "//*[text()='Tired']")
print("Exact match count:", len(exact))
# -> 0 (no exact match found)
# Example 2: partial match with "contains"
# This finds the first element where the text contains "Tired"
partial = driver.find_element(By.XPATH, "//*[contains(text(), 'Tired')]")
print("Partial match:", partial.text)
# -> Tired of getting blocked while scraping the web?
contains(), starts-with(), and normalize-space()
XPath has a few handy functions for working with text beyond exact matches. These are especially useful when you want to find elements by text in XPath without relying on an exact string.
contains()
Use contains(text(), "...") for partial text searches. This is the most common way people do XPath contains text queries.
//*[contains(text(), "Login")]
Matches elements like <button>Login now</button> or <span>Click to Login</span>.
starts-with()
Use starts-with(text(), "...") if you want to find elements by text that begins with a specific string.
//*[starts-with(text(), "Welcome")]
Matches <h1>Welcome back!</h1> but not <h1>Hey, Welcome back!</h1>.
normalize-space()
Use normalize-space(text()) when extra whitespace might break your match.
//*[normalize-space(text()) = "Submit"]
This approach comes in handy when the text has extra spaces or newlines, e.g. <button> Submit </button> (note the leading and trailing spaces!).
Python example
Here's a Python example showing contains(), starts-with(), and normalize-space() in action:
import requests
from lxml import etree
html = requests.get("https://scrapingbee.com").text
dom = etree.HTML(html)
# contains(): find any element with "Tired" in its text
contains_match = dom.xpath('//*[contains(text(), "Tired")]')[0].text
print("Contains match:", contains_match)
# starts-with(): find elements where text starts with "Tired"
startswith_match = dom.xpath('//*[starts-with(text(), "Tired")]')[0].text
print("Starts-with match:", startswith_match)
# normalize-space(): ignore leading/trailing spaces when matching
# Here we simulate extra whitespace around the text
fake_html = "<button> Submit </button>"
fake_dom = etree.HTML(fake_html)
normalized = fake_dom.xpath('//*[normalize-space(text()) = "Submit"]')[0].text
print("Normalized match:", normalized)
Case-insensitive text matching with translate()
Keep in mind that XPath text matching is case sensitive. That means:
- //*[text()="Submit"] matches <button>Submit</button>
- ...but it won't match <button>submit</button> or <button>SUBMIT</button>.
If you need a case-insensitive XPath find by text search, a common trick is to convert the element text to lowercase using translate() and compare it with a lowercase literal:
//*[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'submit']
This forces everything to lowercase before comparing.
How translate() works in XPath
The translate() function doesn't mean "translate language." It literally replaces characters in a string, one by one.
Its form is:
translate(source_string, characters_to_replace, replacement_characters)
The function checks every character of the source string:
- If that character appears in characters_to_replace, it gets replaced with the character at the same position in replacement_characters.
- If it appears in characters_to_replace but there's no corresponding character in replacement_characters (because the third string is shorter), that character is removed from the result string.
- All other characters stay unchanged.
But why do we need to provide this huge list of characters? Well, because unfortunately XPath 1.0 doesn't have a built-in function for changing case. Yes, lower-case() and upper-case() are available in XPath 2.0 and later, but many tools still use XPath 1.0. For example, Python's lxml relies on libxml2, which implements XPath 1.0. Therefore, if you want to make a string lowercase, you have to map every uppercase letter to its lowercase version manually. Keep in mind that the mapping only covers the letters you list, so accented or non-Latin letters have to be added to both strings.
Here's a small example:
translate("ABC", "ABC", "abc") -> "abc"
It reads as: "for each uppercase A, replace with lowercase a, for B replace with b..." and so on. All other characters will be left intact.
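To see all three rules side by side, here is a minimal sketch that evaluates translate() directly through lxml. The sample strings are made up for illustration, and the parsed document only serves as a context for the expressions.

```python
from lxml import etree

# Any parsed document works as a context; the expressions only use string literals
dom = etree.HTML("<p>demo</p>")

# Rule 1: characters are mapped position by position (A->a, B->b, C->c)
mapped = dom.xpath('translate("ABC", "ABC", "abc")')
print(mapped)  # abc

# Rule 2: "-" has no counterpart in the shorter third string, so it is removed
removed = dom.xpath('translate("A-B-C", "ABC-", "abc")')
print(removed)  # abc

# Rule 3: characters that are not listed stay unchanged
untouched = dom.xpath('translate("A1 B2", "AB", "ab")')
print(untouched)  # a1 b2
```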
Python example
Here's a short Python example showing the usage of translate():
from lxml import etree
# For demo purposes, let's use a small HTML snippet instead of fetching a page
html = "<div>Submit</div><div>submit</div><div>SUBMIT</div>"
dom = etree.HTML(html)
# Regular exact match (case sensitive)
# This will only match the first <div> ("Submit")
exact_match = dom.xpath('//*[text()="Submit"]')
print("Exact match count:", len(exact_match))
for e in exact_match:
    print("Matched text:", e.text)
# Case-insensitive match using translate()
# This converts the element text to lowercase before comparing
ci_match = dom.xpath(
    '//*[translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = "submit"]'
)
print("Case-insensitive match count:", len(ci_match))
for e in ci_match:
    print("Matched text:", e.text)
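The translate() trick also combines with contains() when you need a case-insensitive partial match. The sketch below wraps it in a small helper. Note that ci_contains_xpath is a hypothetical name (not part of lxml), and it assumes the search term only needs ASCII case folding and contains no double quotes.

```python
from lxml import etree

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def ci_contains_xpath(term):
    """Build an XPath 1.0 expression for a case-insensitive 'contains' on element text.

    Hypothetical helper: assumes `term` is ASCII and has no double quotes.
    """
    return f'//*[contains(translate(text(), "{UPPER}", "{LOWER}"), "{term.lower()}")]'


html = "<div>Submit order</div><div>please SUBMIT</div><div>Cancel</div>"
dom = etree.HTML(html)

# Lowercase the element text inside XPath and the search term in Python
matches = [el.text for el in dom.xpath(ci_contains_xpath("Submit"))]
print(matches)  # ['Submit order', 'please SUBMIT']
```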
Handling quotes and special characters
Text in real pages isn't always clean. You'll run into quotes, HTML entities, non-breaking spaces, emojis, and random whitespace. Here's how to make your XPath find by text selectors robust.
Quotes inside text
If your text has a single quote, wrap the string in double quotes:
//*[text()="It's live!"]
If your text has a double quote, wrap the string in single quotes:
//*[text()='Click "OK" to continue']
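For tricky cases with both kinds of quotes, you can use the concat() function. XPath 1.0 string literals have no escape character (a backslash is just a backslash), so split the text into pieces that each contain only one kind of quote:
//*[text()=concat('He said "OK" and it', "'", 's fine')]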
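Writing these literals by hand gets error-prone when the text comes from user input or a data file. One possible approach is a small helper that picks the right quoting automatically. The function name xpath_literal below is hypothetical (it is not part of lxml or Selenium); the sketch only relies on the fact that XPath 1.0 literals cannot contain their own delimiter.

```python
def xpath_literal(text):
    """Return `text` as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: split on single quotes, wrap each piece in
    # single quotes, and join the pieces with a double-quoted single quote.
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


print(xpath_literal("Submit"))      # 'Submit'
print(xpath_literal("It's live!"))  # "It's live!"
print(xpath_literal('He said "OK" and it\'s fine'))
# concat('He said "OK" and it', "'", 's fine')
```

The result can then be dropped into a query, for example dom.xpath(f"//*[text()={xpath_literal(value)}]").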
HTML entities
After parsing HTML, entities are decoded into real characters in the DOM:
- &lt;script&gt; becomes <script> text.
- A &amp; B becomes A & B.
- &nbsp; becomes a non-breaking space (\u00A0).
So match the decoded characters in your XPath, not the entity form. For example:
- Contains <script> text: //*[contains(text(), "<script>")]
- Exact text with ampersand: //*[text()="A & B"]
Non-breaking spaces and whitespace
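normalize-space() trims and collapses regular whitespace (spaces, tabs, and line breaks), but not non-breaking spaces. If you suspect \u00A0, translate it to a normal space first:
//*[normalize-space(translate(text(), "\u00A0", " ")) = "Price: 10 USD"]
XPath 1.0 itself has no \u escape. The expression above works when it is written inside a regular (non-raw) Python string, because Python turns \u00A0 into the real character before the XPath engine sees it.
General trick: apply translate(., "\u00A0", " ") first, then wrap the result in normalize-space().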
Unicode, emojis, and symbols
XPath handles Unicode fine. If your Python file is UTF-8 (default in Py3), you can match emojis and symbols directly:
//*[contains(text(), "✅")]
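As a quick check, here is a minimal lxml sketch with made-up markup. It assumes the source file is saved as UTF-8 so the emoji can appear directly in the string.

```python
from lxml import etree

html = "<p>Done ✅</p><p>Pending</p>"
dom = etree.HTML(html)

# The emoji is matched like any other character
done = [el.text for el in dom.xpath('//*[contains(text(), "✅")]')]
print(done)  # ['Done ✅']
```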
Python example (quotes, entities, NBSP)
from lxml import etree
# Let's prepare some simple HTML
html = """
<div class="a">He said "OK"</div>
<div class="b">It's fine</div>
<div class="c">&lt;script&gt;alert(1)&lt;/script&gt;</div>
<div class="d">A &amp; B</div>
<div class="e">Price:&nbsp;10&nbsp;USD</div>
"""
dom = etree.HTML(html)
# 1) Quotes: use opposite quotes around the XPath literal
print(dom.xpath('//*[text()=\'He said "OK"\']/@class')) # ['a']
print(dom.xpath('//*[text()="It\'s fine"]/@class')) # ['b']
# 2) Entities are decoded: <script> and & become real characters
print(dom.xpath('//*[contains(text(), "<script>")]/@class')) # ['c']
print(dom.xpath('//*[text()="A & B"]/@class')) # ['d']
# 3) NBSP handling: translate NBSP (\u00A0) to space, then normalize
nbspace_xpath = '//*[normalize-space(translate(text(), "\u00A0", " ")) = "Price: 10 USD"]/@class'
print(dom.xpath(nbspace_xpath)) # ['e']
Attribute text matching (vs. inner text)
Sometimes the text you care about isn't the inner text of an element; it's sitting in an attribute like alt, title, aria-label, or data-*. The approach is almost the same as matching by inner text, but instead of text() you target the attribute with @attr.
Inner text vs. attribute text
Inner text (content between tags):
//*[text() = "Submit"]
//*[contains(text(), "Submit")]
Attribute text (values on the tag):
//*[@alt = "Company logo"]
//*[contains(@aria-label, "Submit")]
//*[starts-with(@title, "Welcome")]
Use attribute matching when the visible label is actually provided via attributes (common with icons, SVGs, or accessibility labels).
Common patterns for matching attribute text
Exact attribute match:
//*[@data-test-id = "login-button"]
Partial attribute match (contains):
//*[contains(@class, "btn-primary")]
//*[contains(@aria-label, "Submit")]
(Great for quick "XPath contains text" checks on attributes.)
Starts with:
//*[starts-with(@href, "/docs/")]
Trim spaces (attributes can also have extra leading or trailing whitespace):
//*[normalize-space(@title) = "User settings"]
Case-insensitive match on attributes (same translate() trick):
//*[translate(@alt, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = "company logo"]
Python example
from lxml import etree
html = """
<button class="btn btn-primary" aria-label="Submit form">Go</button>
<img src="/logo.png" alt="Company Logo"/>
<a href="/docs/getting-started" title=" User settings ">Profile</a>
"""
dom = etree.HTML(html)
# 1) Exact attribute match
alt_exact = dom.xpath('//*[@alt = "Company Logo"]/@src')
print("Exact alt match -> src:", alt_exact) # ['/logo.png']
# 2) Partial attribute match (contains) — "XPath contains text" on attributes
aria_contains = dom.xpath('//*[contains(@aria-label, "Submit")]/text()')
print("ARIA contains 'Submit' -> inner text:", aria_contains) # ['Go']
# 3) Starts-with on attributes
href_starts = dom.xpath('//*[starts-with(@href, "/docs/")]/@href')
print("href starts-with '/docs/' ->", href_starts) # ['/docs/getting-started']
# 4) normalize-space on attributes (trims leading/trailing spaces)
# name() must wrap the node set: a trailing /name() step is not valid XPath 1.0
title_trimmed = dom.xpath('name(//*[normalize-space(@title) = "User settings"])') # tag name of the first match, as a string
print("normalize-space(@title) match -> tag name:", title_trimmed) # a
# 5) Case-insensitive attribute match using translate()
alt_ci = dom.xpath(
'//*[translate(@alt, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = "company logo"]/@alt'
)
print("Case-insensitive alt match ->", alt_ci) # ['Company Logo']
Performance tips for selecting elements by text in XPath
Speed matters when you're running XPath at scale. Here are practical ways to keep selectors fast and stable, whether you match by inner text or by attributes.
Narrow the search scope:
- Avoid //* (searches the whole DOM). Start from a known parent or tag: //main//button[...], //ul[@id="menu"]//a[...].
- In Selenium, first locate a container el = driver.find_element(...) then run relative XPath: .//button[...].
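The same scoping idea works in lxml: elements have their own xpath() method, so you can locate a container once and query relative to it. A minimal sketch with made-up markup:

```python
from lxml import etree

html = """
<ul id="menu"><li><a href="/docs/">Docs</a></li><li><a href="/blog/">Blog</a></li></ul>
<div id="footer"><a href="/docs/legal">Docs (legal)</a></div>
"""
dom = etree.HTML(html)

# Whole-document search: matches links in the menu and in the footer
everywhere = dom.xpath('//a[contains(text(), "Docs")]/@href')
print(everywhere)  # ['/docs/', '/docs/legal']

# Scoped search: locate the container once, then use a relative path (note the leading dot)
menu = dom.xpath('//ul[@id="menu"]')[0]
scoped = menu.xpath('.//a[contains(text(), "Docs")]/@href')
print(scoped)  # ['/docs/']
```

Forgetting the leading dot is a common slip: //a[...] called on the container still searches the whole document.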
Prefer attributes over text:
- Attribute queries are usually cheaper and more stable than XPath by text. If the page offers data-test, data-testid, aria-*, id, or unique class hooks, use them.
Use exact matches when possible:
- @attr = "value" is faster than contains(@attr, "value").
- For text: text() = "Submit" is cheaper than contains(text(), "Submit"), if you control the copy.
Keep predicates specific:
- Combine constraints to cut the candidate set early: //button[@data-test="login" and contains(@class, "primary")].
- Put the most selective predicate first when you can (helps humans, and some engines may short-circuit).
Be careful with expensive functions:
- translate(), contains(), and normalize-space() on every node of a huge page can add up.
- Try to limit them to a smaller subtree or pair them with a selective tag/attribute filter.
Trim whitespace only when needed:
- normalize-space(.) on big containers is costly; prefer normalize-space(text()) or run it on a narrow node.
- If NBSPs are the issue, translate just that character before normalizing.
Avoid text wildcards for layout content:
- Headings and marketing copy change. Use attributes for critical flows; keep XPath contains text for optional fallbacks.
Cache/compile your XPath (lxml):
- If you run the same selector many times, compile it once (etree.XPath(...)) and reuse it.
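As a rough sketch of what compiling looks like, the snippet below builds one etree.XPath object and applies it to several documents. The page snippets and the variable name $label are made up for illustration. Passing the text as an XPath variable also sidesteps quoting issues.

```python
from lxml import etree

# Compile once; $label is an XPath variable filled in on each call
find_button = etree.XPath('//button[normalize-space(text()) = $label]')

pages = [
    "<button> Submit </button><button>Cancel</button>",
    "<button>Cancel</button>",
    "<div><button>Submit</button></div>",
]

counts = []
for html in pages:
    dom = etree.HTML(html)
    # Reuse the compiled expression instead of re-parsing the XPath string
    counts.append(len(find_button(dom, label="Submit")))

print(counts)  # [1, 0, 1]
```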
Reduce the node set with tag hints:
- //button[contains(., "Submit")] is better than //*[contains(., "Submit")] when you know the element type.
Mind dynamic content:
- For Selenium, wait for the right state (visibility, presence) before querying. Fewer retries, fewer stale refs, faster runs.
Test selectors in devtools first:
- Validate in the browser's XPath/CSS console to avoid slow iterate-and-fix cycles in code.
Use indexes sparingly:
- (...)[1] is fine, but relying on positional indexes in unstable DOMs causes flakiness. Prefer unique predicates.
These keep your XPath find by text and attribute-based selectors quick and less brittle.
Common errors and debugging
Even with good XPath knowledge, you'll run into errors or "why the hell is this not matching?" moments. Here are common pitfalls and how to debug them.
Extra spaces or newlines:
- text()="Submit" won't match <button> Submit </button>.
- Fix: wrap with normalize-space(text()) = "Submit".
Hidden or dynamic elements:
- Selenium can see elements that are in the DOM but not visible yet.
- Fix: wait for visibility with WebDriverWait or target the right container.
Multiple text nodes:
- text() selects only the element's direct text nodes, and string functions such as contains() use just the first one.
- Example: <button>Click <span>here</span></button> → the button's text() is just "Click ", ignoring "here".
- Fix: use normalize-space(.) instead of text() to grab all text inside the element.
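A minimal lxml sketch of this pitfall, using made-up markup with two direct text nodes and a nested <span>:

```python
from lxml import etree

html = "<button>Click <span>here</span> now</button>"
dom = etree.HTML(html)

# contains(text(), ...) only looks at the first direct text node ("Click ")
first_node_only = dom.xpath('//button[contains(text(), "now")]')
print(len(first_node_only))  # 0

# Testing each direct text node separately finds " now"
any_text_node = dom.xpath('//button[text()[contains(., "now")]]')
print(len(any_text_node))  # 1

# The string value (.) also includes text from the nested <span>
full_text = dom.xpath('//button[normalize-space(.) = "Click here now"]')
print(len(full_text))  # 1
```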
Case sensitivity:
- XPath is case sensitive. //*[text()="submit"] won't match "Submit".
- Fix: use the translate() trick for case-insensitive search.
Entities vs decoded text:
- Remember: HTML entities are decoded after parsing.
- Example: <div>A &amp; B</div> → in the DOM the text is "A & B".
- Fix: search for & instead of &amp;.
Index confusion:
- (...)[1] in XPath is 1-based, not 0-based.
- Fix: remember the first element is (...)[1].
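The mix-up usually happens when switching between XPath positions and Python list indexes. A short lxml sketch with made-up markup shows both, plus the difference the parentheses make:

```python
from lxml import etree

html = "<ul><li>a</li><li>b</li></ul><ul><li>c</li></ul>"
dom = etree.HTML(html)

# XPath positions start at 1; Python list indexes start at 0
first_by_xpath = dom.xpath('(//li)[1]')[0].text
first_by_python = dom.xpath('//li')[0].text
print(first_by_xpath, first_by_python)  # a a

# Without parentheses, [1] means "the first <li> within each parent"
first_per_list = [el.text for el in dom.xpath('//li[1]')]
print(first_per_list)  # ['a', 'c']
```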
Using text() vs .:
- text() → only direct text nodes.
- . → full string value of the element (includes nested text).
- Debug: if text() gives nothing, try . instead.
Typos in attribute vs text:
- Easy to mix up @attr and text(). Double-check you're using the right one.
Debugging tips
Test XPath in browser devtools:
- In Chrome/Firefox console: $x('//button[contains(text(),"Submit")]').
- If it matches nothing there, it probably won't work in Python or Selenium either. There might be exceptions though: for example, when the page changes after JS execution.
- $x() instantly shows the matching elements, making it easy to tweak selectors before using them in code.
Print the matched elements:
- In lxml: print(dom.xpath(...)) or loop and print .text.
- In Selenium: print([el.text for el in driver.find_elements(...)]).
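For lxml, one way to make such a printout more telling is to show repr() of the text (so stray whitespace becomes visible) together with the serialized markup. A small sketch with made-up HTML:

```python
from lxml import etree

html = '<div><button class="cta"> Submit </button><button>Cancel</button></div>'
dom = etree.HTML(html)

report = []
for el in dom.xpath('//button'):
    # tostring() returns the element's markup, which helps spot nested tags
    markup = etree.tostring(el, encoding="unicode")
    report.append((el.tag, el.text, markup))
    # repr() makes leading and trailing whitespace visible
    print(el.tag, repr(el.text), markup)
```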
Simplify, then add complexity:
- Start with //button → then add [contains(text(),"Submit")].
- Layer predicates step by step.
Check for dynamic content:
- If the text is loaded with JS, requests+lxml won't see it.
- Fix: use Selenium or an API endpoint.
Frequently asked questions on selecting elements in XPath
How do I select an element by exact text in XPath?
Use text() = "..." when you want an exact match. This only works if the element's text matches exactly (including spaces and case).
//button[text() = "Submit"]
Example in Python (lxml):
from lxml import etree
html = "<button>Submit</button><button>Submit now</button>"
dom = etree.HTML(html)
exact = dom.xpath('//button[text()="Submit"]')
print([el.text for el in exact]) # ['Submit']
How do I select an element by partial text in XPath?
Use contains(text(), "...") when you only care about part of the text. It matches any element that has the substring inside its text.
//button[contains(text(), "Submit")]
Example in Python (lxml):
from lxml import etree
html = "<button>Submit</button><button>Submit now</button>"
dom = etree.HTML(html)
partial = dom.xpath('//button[contains(text(), "Submit")]')
print([el.text for el in partial]) # ['Submit', 'Submit now']
How can I ignore leading or trailing spaces in XPath text?
Use the normalize-space() function. It trims leading and trailing whitespace and also collapses multiple spaces into a single one.
//button[normalize-space(text()) = "Submit"]
Example in Python (lxml):
from lxml import etree
html = "<button> Submit </button>"
dom = etree.HTML(html)
trimmed = dom.xpath('//button[normalize-space(text()) = "Submit"]')
print([el.text for el in trimmed]) # [' Submit ']
Can I do case-insensitive text matching in XPath?
Yes, but XPath 1.0 doesn't have a built-in lower-case() or upper-case() function (they arrived in XPath 2.0). The common workaround is to use translate() to convert the element text to lowercase and compare it with a lowercase literal.
//*[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'submit']
Example in Python (lxml):
from lxml import etree
html = "<div>Submit</div><div>submit</div><div>SUBMIT</div>"
dom = etree.HTML(html)
ci = dom.xpath('//*[translate(text(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") = "submit"]')
print([el.text for el in ci]) # ['Submit', 'submit', 'SUBMIT']
How do I select elements with text or another condition?
Use the logical operator or inside your XPath predicate. This way you can match by text or by an attribute (or any other condition).
//button[text() = "Submit" or @id = "submit-btn"]
Example in Python (lxml):
from lxml import etree
html = """
<button id="submit-btn">Go</button>
<button>Submit</button>
<button>Cancel</button>
"""
dom = etree.HTML(html)
match = dom.xpath('//button[text()="Submit" or @id="submit-btn"]')
print([el.text for el in match]) # ['Go', 'Submit']
How do I find elements by text inside attributes instead of the element's content?
Instead of text(), use @attribute in your XPath. You can combine it with =, contains(), or starts-with() just like with inner text.
Exact match:
//img[@alt = "Company Logo"]
Partial match:
//button[contains(@aria-label, "Submit")]
Example in Python (lxml):
from lxml import etree
html = """
<img src="logo.png" alt="Company Logo"/>
<button aria-label="Submit form">Go</button>
"""
dom = etree.HTML(html)
alt_match = dom.xpath('//img[@alt="Company Logo"]/@src')
print("Alt match:", alt_match) # ['logo.png']
aria_match = dom.xpath('//button[contains(@aria-label, "Submit")]/text()')
print("ARIA match:", aria_match) # ['Go']
Can I select elements with text containing special characters like quotes?
Yes. XPath string literals can be wrapped in either single or double quotes. Use the opposite one inside the string:
Text with a single quote → wrap it with double quotes:
//div[text()="It's live!"]
Text with a double quote → wrap with single quotes:
//div[text()='Click "OK" to continue']
Text contains both single and double quotes → use concat() and keep each kind of quote in its own piece (XPath 1.0 literals have no backslash escapes):
//*[text()=concat('He said "OK" and it', "'", 's fine')]
Example in Python (lxml):
from lxml import etree
html = """
<div class="a">It's live!</div>
<div class="b">Click "OK" to continue</div>
<div class="c">A &amp; B</div>
"""
dom = etree.HTML(html)
# Single quote inside text
print(dom.xpath('//div[text()="It\'s live!"]/@class')) # ['a']
# Double quote inside text
print(dom.xpath('//div[text()=\'Click "OK" to continue\']/@class')) # ['b']
# Special char (& is decoded into a real ampersand in DOM)
print(dom.xpath('//div[text()="A & B"]/@class')) # ['c']
How do I get all text nodes within an element using XPath?
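text() only grabs direct text nodes, so if the element contains nested tags, you won't get the full string. To capture all text (including nested elements), use . or string(.).
Direct text nodes only:
//div/text()
Full string value (includes nested spans, strong, etc.). In XPath 1.0 the function wraps the path (a trailing /string() step only works in XPath 2.0+), and it returns the value of the first matching element:
string(//div)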
Another approach:
//div[normalize-space(.) = "Full text here"]
Python example:
from lxml import etree
html = """
<div>
Hello <span>World</span>!
</div>
"""
dom = etree.HTML(html)
# Direct text nodes only (whitespace is kept as-is)
print(dom.xpath('//div/text()'))
# ['\nHello ', '!\n']
# Full combined text (including nested span), with whitespace normalized
print(dom.xpath('normalize-space(//div)'))
# Hello World!
How do I select elements whose text starts or ends with a specific string?
XPath gives you starts-with() for the beginning, and you can simulate "ends with" using substring().
Starts with:
//div[starts-with(text(), "Hello")]
Ends with (XPath 1.0 trick; works in lxml, Selenium, and other XPath 1.0 engines):
//div[substring(text(), string-length(text()) - string-length("World") + 1) = "World"]
(In XPath 2.0+ you can simply use ends-with(), but most tools (lxml, Selenium, etc.) are XPath 1.0 only, so the substring trick is needed there.)
Python example:
from lxml import etree
html = """
<div class="a">Hello world</div>
<div class="b">Say Hello</div>
<div class="c">Goodbye World</div>
"""
dom = etree.HTML(html)
# Starts with "Hello"
print(dom.xpath('//div[starts-with(text(), "Hello")]/@class'))
# ['a']
# Ends with "World"
print(dom.xpath('//div[substring(text(), string-length(text()) - string-length("World") + 1) = "World"]/@class'))
# ['c']
What's the difference between text() and . in XPath?
This is a common source of confusion so let's discuss it in simple terms.
text() → targets direct text node children only. It won't include text from nested tags.
//div/text()
. (dot) or string(.) → returns the string value of the entire element, including all nested text.
//div[normalize-space(.) = "Hello World!"]
Python example:
from lxml import etree
html = """
<div>
Hello <span>World</span>!
</div>
"""
dom = etree.HTML(html)
# Using text() → only direct text nodes (whitespace is kept as-is)
print(dom.xpath('//div/text()'))
# ['\nHello ', '!\n']
# Using the string value → full text including nested span, with whitespace normalized
print(dom.xpath('normalize-space(//div)'))
# Hello World!