Modest
CSS selectors for HTML5 Parser myhtml
Install / Use
/learn @kostya/ModestREADME
WARNING, this shard obsolete and moved to myhtml directly, use myhtml >= 1.0.0
modest
CSS selectors for HTML5 Parser myhtml (Crystal wrapper for https://github.com/lexborisov/Modest).
Installation
Add this to your application's shard.yml:
dependencies:
modest:
github: kostya/modest
Usage of CSS Selectors with myhtml parser
require "modest"
page = <<-PAGE
<html>
<div class=aaa><p id=bbb><a href="http://..." class=ccc>bla</a></div>
</html>
PAGE
myhtml = Myhtml::Parser.new(page)
# css select from the root! scope (equal with myhtml.root!.css("..."))
iterator = myhtml.css("div.aaa p#bbb a.ccc") # => Iterator(Myhtml::Node), methods: .each, .to_a, ...
iterator.each do |node|
p node.tag_id # MyHTML_TAG_A
p node.tag_name # "a"
p node.tag_sym # :a
p node.attributes["href"]? # "http://..."
p node.inner_text # "bla"
puts node.to_html # <a href="http://..." class="ccc">bla</a>
end
# css select from node scope
if p_node = myhtml.css("div.aaa p#bbb").first?
p_node.css("a.ccc").each do |node|
p node.tag_sym # :a
end
end
Example 2
require "modest"
html = <<-PAGE
<div>
<p id=p1>
<p id=p2 class=jo>
<p id=p3>
<a href="some.html" id=a1>link1</a>
<a href="some.png" id=a2>link2</a>
<div id=bla>
<p id=p4 class=jo>
<p id=p5 class=bu>
<p id=p6 class=jo>
</div>
</div>
PAGE
parser = Myhtml::Parser.new(html)
# select all p nodes which id like `*p*`
p parser.css("p[id*=p]").map(&.attribute_by("id")).to_a # => ["p1", "p2", "p3", "p4", "p5", "p6"]
# select all nodes with class "jo"
p parser.css("p.jo").map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
p parser.css(".jo").map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
# select odd child tag inside div, which not contain a
p parser.css("div > :nth-child(2n+1):not(:has(a))").map(&.attribute_by("id")).to_a # => ["p1", "p4", "p6"]
# all elements with class=jo inside last div tag
p parser.css("div").to_a.last.css(".jo").map(&.attribute_by("id")).to_a # => ["p4", "p6"]
# a element with href ends like .png
p parser.css(%q{a[href$=".png"]}).map(&.attribute_by("id")).to_a # => ["a2"]
# find all a tags inside <p id=p3>, which href contain `html`
p parser.css(%q{p[id=p3] > a[href*="html"]}).map(&.attribute_by("id")).to_a # => ["a1"]
# find all a tags inside <p id=p3>, which href contain `html` or ends_with `.png`
p parser.css(%q{p[id=p3] > a:matches([href *= "html"], [href $= ".png"])}).map(&.attribute_by("id")).to_a # => ["a1", "a2"]
# create finder and use it in many places, this is faster, than create it many times
finder = Modest::Finder.new(".jo")
p parser.css(finder).map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
Example 3
require "modest"
html = <<-PAGE
<html><body>
<table id="t1"><tbody>
<tr><td>Hello</td></tr>
</tbody></table>
<table id="t2"><tbody>
<tr><td>123</td><td>other</td></tr>
<tr><td>foo</td><td>columns</td></tr>
<tr><td>bar</td><td>are</td></tr>
<tr><td>xyz</td><td>ignored</td></tr>
</tbody></table>
</body></html>
PAGE
parser = Myhtml::Parser.new(html)
p parser.css("#t2 tr td:first-child").map(&.inner_text).to_a # => ["123", "foo", "bar", "xyz"]
p parser.css("#t2 tr td:first-child").map(&.to_html).to_a # => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]
Benchmark
Comparing with nokorigi(libxml), and crystagiri(libxml). Parse 1000 times google page, code: https://github.com/kostya/modest/tree/master/bench
require "modest"
page = File.read("./google.html")
s = 0
links = [] of String
1000.times do
myhtml = Myhtml::Parser.new(page)
links = myhtml.css("div.g h3.r a").map(&.attribute_by("href")).to_a
s += links.size
myhtml.free
end
p links.last
p s
Parse + Selectors
| Lang | Package | Time, s | Memory, MiB | | -------- | ------------------ | ------- | ----------- | | Crystal | modest(myhtml) | 2.52 | 7.7 | | Crystal | Crystagiri(LibXML) | 19.89 | 14.3 | | Ruby 2.2 | Nokogiri(LibXML) | 45.05 | 136.2 |
Selectors Only (files with suffix 2)
| Lang | Package | Time, s | Memory, MiB | | -------- | ------------------ | ------- | ----------- | | Crystal | modest(myhtml) | 0.18 | 4.6 | | Crystal | Crystagiri(LibXML) | 12.30 | 6.6 | | Ruby 2.2 | Nokogiri(LibXML) | 28.06 | 68.8 |
CSS Selectors rules
https://drafts.csswg.org/selectors-4/
Related Skills
node-connect
347.2kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
108.0kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
347.2kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
347.2kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
