SwiftSoup
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jQuery (Supports Linux, iOS, Mac, tvOS, watchOS)
SwiftSoup is a pure Swift library designed for seamless HTML parsing and manipulation across multiple platforms, including macOS, iOS, tvOS, watchOS, and Linux. It offers an intuitive API that leverages the best aspects of DOM traversal, CSS selectors, and jQuery-like methods for effortless data extraction and transformation. Built to conform to the WHATWG HTML5 specification, SwiftSoup parses HTML into the same DOM structure that modern browsers produce.
Key Features:
- Parse and scrape HTML from a URL, file, or string.
- Find and extract data using DOM traversal or CSS selectors.
- Modify HTML elements, attributes, and text dynamically.
- Sanitize user-submitted content using a safe whitelist to prevent XSS attacks.
- Generate clean and well-structured HTML output.
SwiftSoup is designed to handle all types of HTML—whether perfectly structured or messy tag soup—ensuring a logical and reliable parse tree in every scenario.
Swift
- Swift 5: SwiftSoup >= 2.0.0
- Swift 4.2: SwiftSoup 1.7.4
Installation
CocoaPods
SwiftSoup is available through CocoaPods. To install it, simply add the following line to your Podfile:
pod 'SwiftSoup'
Carthage
SwiftSoup is also available through Carthage. To install it, simply add the following line to your Cartfile:
github "scinfu/SwiftSoup"
Swift Package Manager
SwiftSoup is also available through Swift Package Manager. To install it, simply add the dependency to your Package.swift file:
...
dependencies: [
.package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0"),
],
targets: [
.target( name: "YourTarget", dependencies: ["SwiftSoup"]),
]
...
Usage Examples
Parse an HTML Document
import SwiftSoup
let html = """
<html><head><title>Example</title></head>
<body><p>Hello, SwiftSoup!</p></body></html>
"""
let document: Document = try SwiftSoup.parse(html)
print(try document.title()) // Output: Example
Automatic Format Detection
SwiftSoup.parse(...) automatically detects XML input by looking for an <?xml declaration at the start of the
content. When detected, the XML parser is used; otherwise the HTML parser is applied. This means feeds, OPML, and
other XML documents with a standard XML declaration "just work":
import SwiftSoup
let xml = """
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<body>
<link>I'm link</link>
<img>I'm img</img>
</body>
</opml>
"""
let document = try SwiftSoup.parse(xml) // auto-detects XML
// first() returns an optional, so fall back to an empty string before printing
print(try document.select("link").first()?.text() ?? "") // Output: I'm link
print(try document.select("body > img").first()?.text() ?? "") // Output: I'm img
Explicit Parse Modes
Use parseXML(...) or parseHTML(...) when you want to force a specific parser regardless of the content:
// Force XML parsing (no HTML5 tag normalization)
let xmlDoc = try SwiftSoup.parseXML(xmlString)
// Force HTML parsing (always applies HTML5 rules, even if input has <?xml>)
let htmlDoc = try SwiftSoup.parseHTML(htmlString)
// Explicit parser argument (unchanged from before)
let doc = try SwiftSoup.parse(input, baseUri, Parser.xmlParser())
Parse HTML from a URL
If Foundation cannot determine a page's text encoding, avoid String(contentsOf:) and parse the raw response bytes
instead:
import SwiftSoup
let url = URL(string: "https://example.com")!
let document = try SwiftSoup.parse(url)
print(try document.title())
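The note above about parsing raw response bytes can be sketched as follows. This is a minimal sketch, not SwiftSoup's own API for the task: parseDocument(from:baseUri:) is a hypothetical helper, it assumes the bytes are UTF-8 (invalid sequences are replaced rather than rejected), and in-memory Data stands in for a real response body.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper (not part of SwiftSoup): decodes raw bytes as UTF-8,
/// replacing any invalid sequences with U+FFFD, then parses the result.
func parseDocument(from data: Data, baseUri: String) throws -> Document {
    let html = String(decoding: data, as: UTF8.self)
    return try SwiftSoup.parse(html, baseUri)
}

// In-memory bytes stand in for a downloaded response body.
let bytes = Data("<html><head><title>Caf\u{00E9}</title></head><body></body></html>".utf8)
let document = try parseDocument(from: bytes, baseUri: "https://example.com")
let title = try document.title()
print(title)
```

If the server declares a different charset, decode with that encoding instead of assuming UTF-8.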
Profiling
SwiftSoup includes a lightweight profiler (gated by a compile-time flag) and a small CLI harness for parsing benchmarks.
CLI parse benchmark
This uses the SwiftSoupProfile executable target to parse a fixture corpus and report wall time:
swift run -c release SwiftSoupProfile --fixtures /path/to/fixtures
Add --text to include Document.text() in the workload.
In-code profiler
The Profiler type is only compiled when the PROFILE flag is set. Build with:
swift run -c release -Xswiftc -DPROFILE SwiftSoupProfile --fixtures /path/to/fixtures
Then the CLI will print the profiler summary at the end of the run.
Select Elements with CSS Query
let html = """
<html><body>
<p class='message'>SwiftSoup is powerful!</p>
<p class='message'>Parsing HTML in Swift</p>
</body></html>
"""
let document = try SwiftSoup.parse(html)
let messages = try document.select("p.message")
for message in messages {
print(try message.text())
}
// Output:
// SwiftSoup is powerful!
// Parsing HTML in Swift
Extract Text and Attributes
let html = "<a href='https://example.com'>Visit the site</a>"
let document = try SwiftSoup.parse(html)
let link = try document.select("a").first()
if let link = link {
print(try link.text()) // Output: Visit the site
print(try link.attr("href")) // Output: https://example.com
}
Modify the DOM
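// Document is a class, so a constant reference is enough to mutate the tree
let document = try SwiftSoup.parse("<div id='content'></div>")
let div = try document.select("#content").first()
try div?.append("<p>New content added!</p>")
print(try document.html())
// Output (shown on one line; the default pretty-printing adds line breaks and indentation):
// <html><head></head><body><div id="content"><p>New content added!</p></div></body></html>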
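Attributes, text, and class names can be changed in a similar way. The snippet below is a small sketch using the jsoup-style setters attr(_:_:), text(_:), and addClass(_:); the markup and the "external" class name are made up for illustration.

```swift
import SwiftSoup

let doc = try SwiftSoup.parse("<div id='content'><a href='/old'>Old</a></div>")

if let link = try doc.select("#content a").first() {
    try link.attr("href", "https://example.com/new") // replace the attribute value
    try link.text("New")                             // replace the text content
    try link.addClass("external")                    // add a class name
}

// Read the values back through the newly added class.
let href = try doc.select("a.external").attr("href")
let text = try doc.select("a.external").text()
print(href, text)
```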
Clean HTML for Security (Whitelist)
let dirtyHtml = "<script>alert('Hacked!')</script><b>Important text</b>"
let cleanHtml = try SwiftSoup.clean(dirtyHtml, Whitelist.basic())
print(cleanHtml) // Output: <b>Important text</b>
// Restrict inline styles to specific CSS properties
// (names differ from the example above so both snippets can share a scope)
let styledHtml = #"<p style="color:red; position:absolute">Styled text</p>"#
let whitelist = try Whitelist()
.addTags("p")
.addAttributes("p", "style")
.addCSSProperties("p", "color")
let cleanStyledHtml = try SwiftSoup.clean(styledHtml, whitelist)
print(cleanStyledHtml) // Output: <p style="color:red">Styled text</p>
Use CSS selectors to find elements
(from jsoup)
Selector overview
- tagname: find elements by tag, e.g. div
- #id: find elements by ID, e.g. #logo
- .class: find elements by class name, e.g. .masthead
- [attribute]: elements with attribute, e.g. [href]
- [^attrPrefix]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
- [attr=value]: elements with attribute value, e.g. [width=500] (also quotable, like [data-name='launch sequence'])
- [attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
- [attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
- *: all elements, e.g. *
- [*]: selects elements that have any attribute, e.g. p[*] finds paragraphs with at least one attribute, and p:not([*]) finds those with no attributes
- ns|tag: find elements by tag in a namespace prefix, e.g. dc|name finds <dc:name> elements
- *|tag: find elements by tag in any namespace prefix, e.g. *|name finds <dc:name> and <name> elements
- :empty: selects elements that have no children (ignoring blank text nodes, comments, etc.); e.g. li:empty
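A few of the selectors listed above, applied to a made-up fragment. This is an illustrative sketch; the markup and variable names are assumptions, and the counts are read with Elements.size().

```swift
import SwiftSoup

let html = """
<div id="logo" class="masthead" data-role="banner">
  <a href="/path/home">Home</a>
  <a href="https://example.com/about">About</a>
  <img src="photo.JPG" width="500">
  <img src="notes.txt">
</div>
"""
let document = try SwiftSoup.parse(html)

let byId = try document.select("#logo").size()               // element with id "logo"
let dataset = try document.select("[^data-]").size()         // any data-* attribute
let pathLinks = try document.select("[href*=/path/]").size() // href contains "/path/"
let wide = try document.select("[width=500]").size()         // exact attribute value
// The backslash is doubled because this is an ordinary Swift string literal.
let images = try document.select("img[src~=(?i)\\.(png|jpe?g)]").size()

print(byId, dataset, pathLinks, wide, images)
```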
Selector combinations
- el#id: elements with ID, e.g. div#logo
- el.class: elements with class, e.g. div.masthead
- el[attr]: elements with attribute, e.g. a[href]
- Any combination, e.g. a[href].highlight
- ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
- parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
- siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
- siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
- el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
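The difference between the descendant, child, and sibling combinators is easiest to see side by side. The following is a small sketch over a made-up fragment:

```swift
import SwiftSoup

let html = """
<div class="content">
  <h1>Title</h1>
  <p>Intro</p>
  <section><p>Nested</p></section>
  <p>Outro</p>
</div>
"""
let doc = try SwiftSoup.parse(html)

let descendants = try doc.select("div.content p").size()      // every p under the div
let directChildren = try doc.select("div.content > p").size() // only p elements directly under the div
let adjacent = try doc.select("h1 + p").size()                // p immediately after the h1
let following = try doc.select("h1 ~ p").size()               // sibling p elements after the h1
let grouped = try doc.select("h1, section").size()            // union of both selectors

print(descendants, directChildren, adjacent, following, grouped)
// Prints: 3 2 1 2 2
```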
Pseudo selectors
- :has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
- :is(selector): find elements that match any of the selectors in the selector list; e.g. :is(h1, h2, h3, h4, h5, h6) finds any heading element
- :not(selector): find elements that do not match the selector; e.g. div:not(.logo)
- :lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
- :gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
- :eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
- Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0
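The 0-based indexing of :lt, :gt, and :eq can be checked with a single table row. This is a minimal sketch with made-up markup; it also exercises :has and :not.

```swift
import SwiftSoup

let html = """
<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>
<div class="logo"><p>x</p></div>
<div><span>y</span></div>
"""
let doc = try SwiftSoup.parse(html)

// Sibling indexes start at 0, so :lt(3) matches the first three cells.
let firstThree = try doc.select("td:lt(3)").array().map { try $0.text() }
let afterThird = try doc.select("td:gt(2)").array().map { try $0.text() }
let second = try doc.select("td:eq(1)").first()?.text() ?? ""

let withParagraph = try doc.select("div:has(p)").size()  // div elements containing a p
let withoutLogo = try doc.select("div:not(.logo)").size() // div elements lacking class "logo"

print(firstThree, afterThird, second, withParagraph, withoutLogo)
```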
