Xpyth
A module for querying the DOM tree and writing XPath expressions using native Python syntax.
Example usage
>>> from xpyth import xpath, DOM, X
>>> xpath(X for X in DOM if X.name == 'main')
"//*[@name='main']"
>>> xpath(span for div in DOM for span in div if div.id == 'main')
"//div[@id='main']//span"
>>> xpath(a for a in DOM if '.com' not in a.href)
"//a[not(contains(@href, '.com'))]"
>>> xpath(a.href for a in DOM if any(p for p in a.ancestors if p.id))
"//a[./ancestor::p[@id]]/@href"
>>> xpath(X.data-bind for X in DOM if X.data-bind == '1')
"//*[@data-bind='1']/@data-bind"
>>> xpath(
...     form.action
...     for form in DOM
...     if all(
...         input
...         for input in form.children
...         if input.value == 'a'
...     )
... )
"//form[not(./input[not(@value='a')])]/@action"
>>> allowed_ids = list('abc')
>>> xpath(X for X in DOM if X.id in allowed_ids)
"//*[@id='a' or @id='b' or @id='c']"
Motivation
XPath is the de facto standard for querying XML and HTML documents. In Python (and most other languages), XPath expressions are represented as strings. This is a potential security risk, for example when a query is assembled from untrusted input, and it also denies developers standard text-editor and IDE features such as syntax highlighting and autocomplete when writing XPaths. Furthermore, having to become familiar with XPath (or CSS selectors) is a barrier to entry for developers who want to interact with the web.
Great inroads have been made in various programming languages in allowing the use of native list-comprehension-like syntax to generate SQL queries. xpyth piggybacks off one such effort, Pony, to extend this functionality to XPath. Now anyone familiar with Python comprehension syntax can query XML/HTML documents quickly and easily. Moreover, xpyth integrates with the popular lxml library to enable developers to go beyond the querying capabilities of XPath (when necessary).
Installation
pip install xpyth
Use with lxml
xpyth supports querying lxml ElementTrees using the query function. For example, given a document
<html>
  <div id='main' class='main'>
    <a href='http://www.google.com'>Google</a>
    <a href='http://www.chasestevens.com'>Not Google</a>
    <p>Lorem ipsum</p>
    <p id='123'>no numbers here</p>
    <p id='numbers_only'>123</p>
  </div>
  <div id='123' class='secondary'>
    <a href='http://www.google.org'>Google Charity</a>
    <a href='http://www.chasestevens.org'>Broken link!</a>
  </div>
</html>
accessible as the ElementTree tree, the following can be executed:
>>> len(query(a for a in tree))
4
>>> query(a for a in tree if 'Not Google' not in a.text)[0].attrib.get('href')
"http://www.google.com"
>>> import re
>>> next(
...     node
...     for node in query(
...         p
...         for p in tree
...         if p.id
...     )
...     if re.match(r'\D+', node.attrib.get('id'))
... ).text
"123"
Known Issues
- HTML tag names that contain special characters (dashes) cannot be selected, as they violate Python's generator comprehension syntax. HTML attributes containing dashes, e.g. data-bind, work normally.
- The use of all is quite buggy, e.g. the following return incorrect expressions:
>>> xpath(X for X in DOM if all(p.id in ('a', 'b') for p in X))
"//*[not(.//p/@id='a' or //p/@id='b')]"  # expected "//*[not(.//p[./@id!='a' and ./@id!='b'])]"
>>> xpath(X for X in DOM if all('x' in p.id for p in X))
"//*[not(.contains(@id, //p))]"  # expected "//*[not(.//p[not(contains(@id, 'x'))])]"
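Until `all` is reliable, one possible workaround is to select candidate elements with a simple query and apply the universal check in Python. The sketch below is an assumption-laden illustration, not a documented xpyth feature: it uses the standard library's `xml.etree.ElementTree` in place of lxml, a made-up sample document, and a hypothetical helper `all_descendants_match`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the xpyth README.
DOC = """
<root>
  <section name='ok'><p id='a'/><p id='b'/></section>
  <section name='bad'><p id='a'/><p id='z'/></section>
</root>
"""


def all_descendants_match(element, tag, predicate):
    """Return True if every descendant `tag` of `element` satisfies `predicate`.

    Hypothetical helper. Elements with no matching descendants pass vacuously.
    """
    return all(predicate(child) for child in element.findall('.//' + tag))


root = ET.fromstring(DOC.strip())

# Select candidates with a plain path, then filter in Python.
matching = [
    section.get('name')
    for section in root.findall('.//section')
    if all_descendants_match(section, 'p', lambda p: p.get('id') in ('a', 'b'))
]
print(matching)  # ['ok']
```

Note that this Python check treats a `p` with no `id` as a failure, which differs from the expected XPath shown above, where such an element would not count against its ancestor.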
Contacts
- Name: H. Chase Stevens
- Twitter: @hchasestevens
