Jtm
HTML/XML to JSON converter
jtm - a simple CLI utility for lossless HTML/XML to JSON conversion and back
A simple tool offering quick lossless HTML/XML to JSON conversion
- version 2.x offers an improved JSON layout (compared to v1.x), so the resulting JSON is easier to parse
- the command-line switches have also changed (to better reflect the semantics of each action)
The tool offers the following behaviors:
- unaware of HTML/XML tag semantics: the converter does not keep track of tag meaning, see the conversion specification below
- converted JSON can be reinstated back to its original XML/HTML format (thanks to lossless conversion)
- detects malformed HTML/XML (i.e. closing tags w/o a corresponding opening tag) and automatically fixes it (this can optionally be disabled)
Conversion rules:
- each tag is translated into a JSON object, with a single label - name of the tag
- all attributes go into an object with the label "attributes" (a default label, could be changed by user)
- a value of the tag (i.e. everything between opening and closing tags) is merged into the tag label
- empty tags w/o attributes will be set to JSON null value
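The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not jtm's actual implementation (jtm itself is written in C++). The input is a hypothetical pre-parsed node: a dict with the made-up keys name, attrs, children and paired. The de-listing of a single value follows the conversion specification printed by jtm -h; whether jtm de-lists a single nested tag the same way as a single text value is an assumption here.

```python
import json

ATTR_LABEL = "attributes"  # jtm's default attribute label (changeable with -a)


def to_json(node, attr_label=ATTR_LABEL, delist=True):
    """Convert a hypothetical pre-parsed node into the JSON layout described above."""
    # Text found between tags is kept as a plain JSON string.
    if isinstance(node, str):
        return node
    name = node["name"]
    attrs = node.get("attrs") or {}
    # Empty (unpaired) tag: an attributes object, or null when there are none.
    if not node.get("paired", True):
        return {name: {attr_label: attrs} if attrs else None}
    # Paired tag: attributes (if any) come first, then the converted children.
    items = [{attr_label: attrs}] if attrs else []
    items += [to_json(child, attr_label, delist) for child in node.get("children", [])]
    # Assumption: a single value is de-listed unless that value is the attributes object.
    if delist and len(items) == 1 and not attrs:
        return {name: items[0]}
    return {name: items}


# Hypothetical input: <body text="green"><p>Oh Brother,<br>Where Art Thou?<br></p></body>
body = {
    "name": "body",
    "attrs": {"text": "green"},
    "children": [
        {"name": "p", "children": [
            "Oh Brother,", {"name": "br", "paired": False},
            "Where Art Thou?", {"name": "br", "paired": False},
        ]},
    ],
}
print(json.dumps(to_json(body), indent=3))
```

Passing delist=False mimics the idea behind the -e switch (enlist even single values); the node model itself is purely illustrative.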
The following sample illustrates the HTML to JSON conversion rules:
- source HTML sample:
<!DOCTYPE html>
<html>
<head>
<title>HTML example</title>
<meta charset="utf-8">
</head>
<body text="green">
<p>
Oh Brother,<br>
Where Art Thou?<br>
</p>
</body>
</html>
- is converted into JSON:
[
{
"!": "DOCTYPE html"
},
{
"html": [
{
"head": [
{
"title": "HTML example"
},
{
"meta": {
"attributes": {
"charset": "utf-8"
}
}
}
]
},
{
"body": [
{
"attributes": {
"text": "green"
}
},
{
"p": [
"Oh Brother,",
{
"br": null
},
"Where Art Thou?",
{
"br": null
}
]
}
]
}
]
}
]
- and recovered from JSON back to HTML:
<!DOCTYPE html>
<html>
<head>
<title>HTML example</title>
<meta charset="utf-8">
</head>
<body text="green">
<p>Oh Brother,<br>Where Art Thou?<br>
</p>
</body>
</html>
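The reverse direction can be sketched the same way. The Python below is a minimal illustration, not jtm's own code: it walks the JSON layout shown above and emits markup without reproducing the original whitespace or line breaks. It assumes every attribute value is a string and that no tag is itself named like the attribute label; the function names are hypothetical.

```python
def render_attrs(attrs):
    # Assumption: every attribute value is a plain string.
    return "".join(f' {key}="{val}"' for key, val in attrs.items())


def to_markup(value, attr_label="attributes"):
    """Render the JSON layout back into compact HTML/XML text."""
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return "".join(to_markup(item, attr_label) for item in value)
    [(name, body)] = value.items()  # each tag object carries a single label

    def is_attrs(v):
        return isinstance(v, dict) and list(v) == [attr_label]

    if name == "!":
        return f"<!{body}>"
    # null or a bare attributes object: an empty or self-closed tag.
    if body is None or is_attrs(body):
        attrs = render_attrs(body[attr_label]) if body else ""
        if name.endswith("/"):
            return f"<{name[:-1]}{attrs} />" if attrs else f"<{name[:-1]}/>"
        return f"<{name}{attrs}>"
    # Anything else is a paired tag; a de-listed single value is re-listed first.
    items = body if isinstance(body, list) else [body]
    attrs = ""
    if items and is_attrs(items[0]):
        attrs, items = render_attrs(items[0][attr_label]), items[1:]
    return f"<{name}{attrs}>{to_markup(items, attr_label)}</{name}>"


doc = [{"!": "DOCTYPE html"},
       {"html": [{"head": [{"title": "HTML example"},
                           {"meta": {"attributes": {"charset": "utf-8"}}}]}]}]
print(to_markup(doc))
```

The distinction between null (an empty tag) and an empty list (a paired tag with no content) is what lets the layout round-trip without knowing anything about tag semantics.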
Linux and macOS precompiled binaries are available for download
C++14 (or later) is required for compiling:
- to compile under macOS, use cli:
c++ -o jtm -Wall -std=c++14 -Ofast jtm.cpp
- to compile under Linux, use cli:
c++ -o jtm -Wall -std=gnu++14 -static -Ofast jtm.cpp
Alternatively, download the latest precompiled binary.
Compile and install instructions:
download jtm-master.zip, unzip it, descend into the unzipped folder, compile using
an appropriate command, and move the compiled file into an install location.
Here are the example steps (for macOS):
- say, jtm-master.zip has been downloaded to a folder and the terminal app is open in that folder:
unzip jtm-master.zip
cd jtm-master
c++ -o jtm -Wall -std=c++14 -Ofast jtm.cpp
sudo mv ./jtm /usr/local/bin/
help screen:
bash $ jtm -h
usage: jtm [-defhnqr] [-a label] [-i indent] [src_file]
HTML/XML to JSON and back lossless convertor.
Version 2.08, developed by Dmitry Lyssenko (ldn.softdev@gmail.com)
optional arguments:
-d turn on debugs (multiple calls increase verbosity)
-e enlist even single values (otherwise don't)
-f digitize all numerical strings
-h help screen
-n do not retry parsing upon facing a closing tag w/o its pair
-q enforce strict JSON's quoted solidus parsing
-r print json in a raw (compact) format
-a label a label used for attribute values [default: attributes]
-i indent indent for pretty printing [default: 3]
standalone arguments:
src_file file to read source from [default: <stdin>]
the tool is html/xml tag semantic agnostic, follows conversion specification:
<tag> </tag> <-> { "tag": [] }
<tag> ... </tag> <-> { "tag": [ <...> ] }
<tag attributes> </tag> <-> { "tag": [ { <attributes> } ] }
<tag attributes> ... </tag> <-> { "tag": [ { <attributes> }, <...> ] }
<self_closed attributes /> <-> { "self_closed/": { <attributes> } }
<self_closed/> <-> { "self_closed/": null }
<empty_tag attributes> <-> { "empty_tag": { <attributes> } }
<empty_tag> <-> { "empty_tag": null }
<!...> <-> { "!": <...> }
<?tag attributes> <-> { "?tag": { <attributes> } }
<?tag> <-> { "?tag": null }
- if a tag enlists a single value then it's value de-listed (default behavior,
could be disabled optionally), unless the value is "attributes" - then no
delisting occurs
bash $
- Once HTML/XML is converted to JSON, use the jtc tool to extract / manipulate JSON data
Here's a trivial example of how to use them together.
Say, we want to remove all occurrences of a specific tag (and their content, respectively) from the original HTML document.
Let it be the tag <br> in the above HTML sample. A simple way to do it would be like this:
bash $ cat sample.html | jtm | jtc -w '<br>l+0[-1]' -p | jtm
<!DOCTYPE html>
<html>
<head>
<title>HTML example</title>
<meta charset="utf-8">
</head>
<body text="green">
<p>Oh Brother,Where Art Thou?</p>
</body>
</html>
bash $
As easy as pie!
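To make the middle step of that pipeline less opaque, here is a rough Python approximation of what happens to the converted JSON. It is a sketch only and does not reproduce jtc's walk-path semantics: the hypothetical helper drop_tag removes every list entry that is an object labeled with the given tag name. A tag that ended up de-listed as the single value of its parent is not handled.

```python
def drop_tag(value, label):
    """Return a copy of the converted JSON without list entries labeled `label`."""
    if isinstance(value, list):
        # Skip objects such as {"br": null}; recurse into everything else.
        return [drop_tag(item, label) for item in value
                if not (isinstance(item, dict) and label in item)]
    if isinstance(value, dict):
        return {key: drop_tag(val, label) for key, val in value.items()}
    return value  # strings and null are left untouched


paragraph = {"p": ["Oh Brother,", {"br": None}, "Where Art Thou?", {"br": None}]}
print(drop_tag(paragraph, "br"))
```

With both br entries gone, the p tag keeps only its two text values, which matches the single-line paragraph in the output above.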
