AngleParse
HTML parsing and processing tool for PowerShell.
An easy‑to‑use HTML parsing and processing toolkit for PowerShell.
# Popular modules in PowerShell Gallery
Invoke-WebRequest 'https://www.powershellgallery.com' |
Select-HtmlContent 'div.stats-table tr:not(:first-child)' @{
Module = 'td.text-left', { $_.Trim() }
Downloaded = 'td.text-right', { [long]($_ -replace ',', '') }
Page = 'td.text-left a', ([AngleParse.Attr]'href'), { "https://www.powershellgallery.com$_" }
} { [pscustomobject]$_ }
# Output:
# Page Downloaded Module
# ---- ---------- ------
# https://www.powershellgallery.com/packages/Az.Accounts/ 17969207 Az.Accounts
# https://www.powershellgallery.com/packages/DellBIOSProvider/ 12883950 DellBIOSProvider
# https://www.powershellgallery.com/packages/PackageManagement/ 5033505 PackageManagement
# https://www.powershellgallery.com/packages/Az.Storage/ 4769681 Az.Storage
# https://www.powershellgallery.com/packages/Az.Resources/ 4608517 Az.Resources
Table of Contents
- Overview
- Breaking Changes before v0.4
- Installation
- Usage
- How selectors work
- Selector Reference
- FAQ
- License
Overview
AngleParse simplifies HTML parsing and data extraction in PowerShell. Built on C# and the AngleSharp library, it provides an intuitive, PowerShell‑style processing toolkit that integrates with common cmdlets such as:
- Invoke-WebRequest – retrieve web pages
- Get-Content – read local HTML files
- Select-Object – format and filter output
- ConvertTo-* – transform output into other formats
Breaking Changes before v0.4
- The -Selector parameter now has the ValueFromRemainingArguments attribute.
- The -Selector parameter is no longer a positional parameter.
- The -Content parameter is no longer a positional parameter.
- [AngleParse.Attr]::Element has been deprecated. Use [AngleParse.Prop]::Element instead.
## This is how it used to be before v0.4
'Some HTML Document' | Select-HtmlContent 'div', ([AngleParse.Attr]::Element)
## This is how it is now
'Some HTML Document' | Select-HtmlContent 'div' ([AngleParse.Prop]::Element)
Installation
Install-Package AngleParse
Import-Module AngleParse
Usage
The Select-HtmlContent command reads a string from the pipeline or the -Content parameter, parses it as an HTML DOM, and applies the specified selectors, which are passed as variadic (remaining) arguments.
Get-Content example.html -Raw |
Select-HtmlContent 'div.foo a.bar' ([AngleParse.Attr]::Href)
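The same call can presumably be made without the pipeline by naming the -Content parameter, since it is no longer positional. The sketch below assumes the module is installed; the markup in $html is a made-up sample, not a file from this repository.

```powershell
Import-Module AngleParse

# Hypothetical sample markup, held in a variable instead of a file.
$html = '<div class="foo"><a class="bar" href="https://example.com">link</a></div>'

# -Content is passed by name; every remaining argument is treated as a selector.
$href = Select-HtmlContent -Content $html 'div.foo a.bar' ([AngleParse.Attr]::Href)
$href
```

Naming -Content keeps the selector list unambiguous, because all unnamed arguments are collected into -Selector.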
How selectors work
There are 6 types of selectors: CSS selector, Attribute selector, Property selector, Regex selector, ScriptBlock selector, and Table selector. Each selector takes one item as input and outputs zero or more items. Specifying multiple selectors chains them, much like a PowerShell pipeline.
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Shop</title></head>
<body>
<h1>Shop</h1>
<div id="products">
<div class="product">
<h2>Product 1</h2>
<p>Price: $10</p>
<button class="add-to-cart">Add to Cart</button>
</div>
<div class="product">
<h2>Product 2</h2>
<p>Price: $20</p>
<button class="add-to-cart">Add to Cart</button>
</div>
</div>
</body>
</html>
filter makeOutputInTableFormat { [pscustomobject]$_ }
Get-Content selector_work.html -raw |
Select-HtmlContent 'div#products > div.product' @{
Name = 'h2'
Price = 'p', ([regex]'\$(\d+)')
} | makeOutputInTableFormat
# Output:
# Name Price
# ---- -----
# Product 1 10
# Product 2 20
There are 3 kinds of input and output types:
- Element: a DOM element and subtype of string
- String: a string and subtype of object
- Object: any object
Each selector has its own input and output types. If a selector's input type does not match the output type of the previous selector, the cmdlet throws an error when it connects the selectors, before any actual processing starts.
# This code throws an error because the first selector outputs String,
# but the second selector requires Element, a subtype of String, as its input.
Get-Content selector_work.html -raw |
Select-HtmlContent ([regex]'Price: \$(\d+)') 'p'
Selector Reference
CSS Selector
Element -> Element
A CSS selector receives DOM elements and outputs the DOM elements that match the given CSS selector expression. Any bare string is interpreted as a CSS selector.
<div>
<span class="foo">text content here</span>
</div>
Get-Content css_selector.html -raw |
Select-HtmlContent "div > span.foo"
# Output: 'text content here'
Attribute Selector
Element -> String
An attribute selector receives DOM elements and outputs the matched attributes as strings. Some attributes are predefined:
- Href
- Src
- Title
- Name
To access other attributes, create your own attribute selector by converting a string with the [AngleParse.Attr] class
(e.g. ([AngleParse.Attr]'some-attribute')).
<a href="https://example.com" some-attribute="hey"><span>some link</span></a>
Get-Content attribute_selector.html -raw |
Select-HtmlContent ([AngleParse.Attr]::Href)
# Output: https://example.com
Get-Content attribute_selector.html -raw |
Select-HtmlContent ([AngleParse.Attr]'some-attribute')
# Output: hey
If you access a valueless attribute, it will return an empty string.
And if you access an attribute that doesn't exist, it will return $null.
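A minimal sketch of the two cases described above, assuming the module is installed. The markup and the attribute name data-missing are made up for illustration. Both results are falsy in PowerShell, so the checks compare against '' and $null explicitly.

```powershell
Import-Module AngleParse

# Hypothetical markup: "disabled" is a valueless attribute, "data-missing" is absent.
$html = '<input id="agree" type="checkbox" disabled>'

$valueless = $html | Select-HtmlContent 'input' ([AngleParse.Attr]'disabled')
$missing   = $html | Select-HtmlContent 'input' ([AngleParse.Attr]'data-missing')

# Tell the two cases apart explicitly instead of relying on truthiness.
"valueless is an empty string: $($valueless -eq '')"
"missing is null: $($null -eq $missing)"
```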
There are also some special selectors in this category, which are actually not attributes but are useful for HTML processing. They are:
- InnerHtml - the inner HTML of the element
- OuterHtml - the outer HTML of the element
- TextContent - the text content of the element
- Id - the ID of the element
- ClassName - the class name of the element
- SplitClasses - array of class names split by space
Get-Content attribute_selector.html -raw |
Select-HtmlContent ([AngleParse.Attr]::InnerHtml)
# Output: <span>some link</span>
Property Selector
Element -> Object
A property selector receives DOM elements and outputs the value of a property of the inner AngleSharp.Dom.IElement, accessed dynamically.
This selector is useful when you want to access an IElement property of the DOM element directly.
Create a property selector by converting a string with the [AngleParse.Prop] class
(e.g. ([AngleParse.Prop]'some-property')).
<div><span class="foo">text content here</span></div>
# Well, I know that you should use [AngleParse.Attr]::TextContent instead of this.
# This is just an example.
Get-Content property_selector.html -raw |
Select-HtmlContent ([AngleParse.Prop]'TextContent')
# Output: text content here
Like the attribute selector, this category has some special properties. They are:
- Element - the inner AngleSharp.Dom.IElement of the DOM element
- AttributesTable - the attributes of the element as a dictionary
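As a rough illustration of the Element property, the sketch below retrieves the underlying IElement and calls a few of its standard AngleSharp members. It assumes the module is installed. The markup is reused from the attribute selector example, and the variable names are made up.

```powershell
Import-Module AngleParse

$html = '<a href="https://example.com" some-attribute="hey"><span>some link</span></a>'

# Element yields the AngleSharp.Dom.IElement itself rather than a string.
$element = $html | Select-HtmlContent 'a' ([AngleParse.Prop]::Element)

# From here on, these are plain AngleSharp IElement members.
$element.TagName
$element.GetAttribute('href')
$element.ChildElementCount
```

This may be useful when a value is not covered by the predefined attribute or property selectors and direct DOM access is simpler than adding another selector.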
Regex Selector
String -> String
A regex selector receives a string and outputs the captured strings. When a DOM element is passed to this selector, it captures against the element's inner text content. Any [regex] value is interpreted as a regex selector.
<div><span>2020/07/22</span></div>
Get-Content regex_selector.html -raw |
Select-HtmlContent ([regex]'(\d{4})/(\d{2})')
# Output: 2020, 07
ScriptBlock Selector
Object -> Object
This selector receives any object and outputs the result of the script block.
This selector is useful when you want to post-process the scraped data.
In the script block, you can use $_ to refer to the current object.
When you pass DOM element to this selector, it operates on the inner text content of the element.
<span class="some-date">2025/05/04</span>
Get-Content scriptblock_selector.html -raw |
Select-HtmlContent { [DateTime]$_ }
# Output: 2025/05/04 0:00:00
Table Selector
T -> Object, where T is the strictest type required by any branch
A table selector outputs a hashtable built from the given key-value pairs, where each value is produced by running the selectors in that branch. The input to this selector must conform to the strictest input type required among the branches of the hashtable. Any hashtable is interpreted as a table selector.
<body>
<div class="a">
1a
</div>
<div class="b">
2b
</div>
</body>
Get-Content table_selector.html -raw |
Select-HtmlContent @{
ClassName = ([AngleParse.Attr]::ClassName);
NumPlus1 = ([regex]'(\d)\w'), { [int]$_ + 1 }
}
# Output:
# ClassName NumPlus1
# --------- --------
# a 2
# b 3
# This throws an error because the input type is String,
# while the strictest type required by the branches is Element, a subtype of String.
# The input does not satisfy the type constraint.
Get-Content table_selector.html -raw |
Select-HtmlContent ([regex]'.*') @{
ClassName = ([AngleParse.Attr]::ClassName);
NumPlus1 = ([regex]'(\d)\w'), { [int]$_ + 1 }
}
FAQ
Q: Why is the output not an array when there is only a single item?
This is because of unification.
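If calling code needs an array regardless of the item count, one general PowerShell idiom is the array subexpression operator @( ). The sketch below does not use AngleParse at all. Get-SingleItem is a hypothetical stand-in for any command that emits exactly one item.

```powershell
# Hypothetical stand-in for a command that emits exactly one item.
function Get-SingleItem { 'only item' }

$plain   = Get-SingleItem       # a single string, not an array
$wrapped = @(Get-SingleItem)    # always an array, here with one element

$plain -is [array]      # False
$wrapped -is [array]    # True
$wrapped.Count          # 1
```

Wrapping a Select-HtmlContent call the same way should make downstream code such as indexing or .Count behave consistently for zero, one, or many results.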
