Percollate
A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.
<a href="https://www.npmjs.org/package/percollate"><img src="https://img.shields.io/npm/v/percollate.svg?style=flat-square&labelColor=324A97&color=black" alt="npm version"></a>
Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, HTML or Markdown files.
<figure style='margin: 1rem 0'> <img alt="Sample Output" src="./.github/dimensions-of-colour.png"> <figcaption style='font-style: italic'>Sample spread from the generated PDF of <a href='http://www.huevaluechroma.com/072.php'>a chapter in Dimensions of Colour</a>; rendered here in black & white for a smaller image file size.</figcaption> </figure>
Installation
percollate is a Node.js command-line tool which you can install globally from npm:
npm install -g percollate
Percollate and its dependencies require Node.js 14.17.0 or later.
Community-maintained packages
There's a packaged version available on the Arch User Repository, which you can install using your local AUR helper (yay, pacaur, or similar):
yay -S nodejs-percollate
Some Docker images are available in this tracking issue.
Usage
Run percollate --help for a list of available commands and options.
Percollate is invoked on one or more operands (usually URLs):
percollate <command> [options] url [url]...
The following commands are available:
- percollate pdf produces a PDF file;
- percollate epub produces an EPUB file;
- percollate html produces an HTML file;
- percollate md produces a Markdown file.
The operands can be URLs, paths to local files, or the - character, which stands for stdin (the standard input).
Available options
Unless otherwise stated, these options apply to all four commands.
-o, --output
Specify the path of the resulting bundle relative to the current folder.
percollate pdf https://example.com -o my-example.pdf
-u, --url
Using the - operand you can read the HTML content from stdin, as fetched by a separate command, such as curl. In this sort of setup, percollate does not know the URL from which the content has been fetched, and relative paths on images, anchors, et cetera won't resolve correctly.
Use the --url option to supply the source's original URL.
curl https://example.com | percollate pdf - --url=https://example.com
-w, --wait
By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero.
percollate epub --wait=1 url1 url2 url3
--individual
By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file.
percollate pdf --individual http://example.com/page1 http://example.com/page2
--template
Path to a custom HTML template. Applies to pdf, html, and md.
--style
Path to a custom CSS stylesheet, relative to the current folder.
--css
Additional CSS styles you can pass from the command-line to override styles specified by the default/custom stylesheet.
--no-amp
Don't prefer the AMP version of the web page.
--debug
Print more detailed information.
-t, --title
Provide a title for the bundle.
percollate epub http://example.com/page-1 http://example.com/page-2 --title="Best Of Example"
-a, --author
Provide an author for the bundle.
percollate pdf --author="Ella Example" http://example.com
--cover
Generate a cover. The option is implicitly enabled when the --title option is provided, or when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-cover flag.
--toc
Generate a hyperlinked table of contents. The option is implicitly enabled when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-toc flag.
Applies to pdf, html, and md.
--toc-level=<level>
By default, the table of contents is a flat list of article titles. With the --toc-level option the table of contents will include headings under each article title (<h2>, <h3>, etc.), up to the specified heading depth. A number between 1 and 6 is expected.
Using --toc-level with a value greater than 1 implies --toc.
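As a rough sketch of how these options might combine, the shell function below bundles several pages into one PDF whose table of contents also lists the <h2> headings under each article. The name bundle_with_toc is a hypothetical helper, not part of percollate; only the pdf command and the --toc-level and --output options come from the documentation above.

```shell
# Hypothetical helper: bundle pages into a single PDF with a
# two-level table of contents (article titles plus <h2> headings).
# $1 is the output path; the remaining arguments are the operands.
bundle_with_toc() {
  output=$1
  shift
  percollate pdf --toc-level=2 --output="$output" "$@"
}
```

It could then be called as: bundle_with_toc reading-list.pdf https://example.com/page1 https://example.com/page2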
--hyphenate
Hyphenation is enabled by default for pdf, and disabled for epub, html, and md. You can opt into hyphenation with the --hyphenate flag, or disable it with the --no-hyphenate flag.
See also the Hyphenation and justification recipe.
--inline
Embed images inline with the document. Images are fetched and converted to Base64-encoded data URLs.
This option is particularly useful for html to produce self-contained HTML files.
--md.<option>=<value>
Pass options to the underlying Markdown stringifier, mdast-util-to-markdown. These are the default Markdown options:
const DEFAULT_MARKDOWN_OPTIONS = {
	fences: true,
	emphasis: '_',
	strong: '_',
	resourceLink: true,
	rule: '-'
};
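To illustrate the --md.<option>=<value> syntax, here is a minimal sketch that overrides two of the defaults listed above so that emphasis and strong text use asterisks instead of underscores. The function name md_with_asterisks is hypothetical; emphasis and strong are options of mdast-util-to-markdown, and the sketch assumes percollate passes their values through unchanged.

```shell
# Hypothetical helper: produce a Markdown file that marks emphasis
# and strong text with asterisks rather than the default underscores.
# $1 is the output path; the remaining arguments are the operands.
md_with_asterisks() {
  output=$1
  shift
  # The asterisks are quoted so the shell does not expand them as globs.
  percollate md --md.emphasis='*' --md.strong='*' --output="$output" "$@"
}
```

It could then be called as: md_with_asterisks article.md https://example.com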
--unsafe
Disables some JSDOM validations that may throw an error when parsing invalid HTML pages (see #177).
Recipes
Basic bundling
To turn a single web page into a PDF:
percollate pdf --output=some.pdf https://example.com
To bundle several web pages into a single PDF, specify them as separate arguments to the command:
percollate pdf --output=some.pdf https://example.com/page1 https://example.com/page2
You can use common Unix commands and keep the list of URLs in a newline-delimited text file:
cat urls.txt | xargs percollate pdf --output=some.pdf
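If the list file also holds blank lines or notes, it may help to filter them out before they reach xargs. The sketch below assumes a convention of your own making, in which lines starting with # are comments; clean_urls is a hypothetical helper, not something percollate provides.

```shell
# Hypothetical helper: print the URLs from a list file ($1),
# skipping blank lines and lines whose first non-space character is '#'.
clean_urls() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}
```

The filtered list can then be piped as before: clean_urls urls.txt | xargs percollate pdf --output=some.pdf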
To transform several web pages into individual PDF files at once, use the --individual flag:
percollate pdf --individual https://example.com/page1 https://example.com/page2
If you'd like to fetch the HTML with an external command, you can use - as an operand, which stands for stdin (the standard input):
curl https://example.com/page1 | percollate pdf --url=https://example.com/page1 -
Notice we're using the --url option to tell percollate the source of the (now anonymous) HTML it receives on stdin, so that relative URLs on links and images resolve correctly.
Web feeds
Percollate has basic support for processing XML web feeds in Atom or RSS format.
When processing a web feed, every entry in the feed becomes its own article, as if percollate received all the entry URLs as operands. The command below produces an EPUB book from the feed contents:
percollate epub https://example.com/posts.xml
To produce individual output files for the feed entries, use the --individual flag:
percollate epub --individual https://example.com/posts.xml
The content of the articles is read from the feed file rather than fetched anew. The content is passed through the DOM enhancements and sanitized as usual, but it’s not processed with Readability.
<details> <summary>To fetch the HTML pages for entries in Atom and RSS feeds</summary>If instead you’d like to fetch and process the original HTML pages corresponding to the entries in the Atom/RSS feed, use hred to extract the URLs and feed them to percollate with xargs.
Below is an example hred query for extracting URLs from Atom feeds, explained in more detail on the hred recipes page.
curl https://example.com/posts.xml | \
hred -xcr 'entry > link:is([rel=alternate],:not([rel]))@href' | \
xargs percollate epub
</details>
The --css option
The --css option lets you pass a small snippet of CSS to percollate. Here are some common use-cases:
Custom page size / margins
The default page size is A5 (portrait). You can use the --css option to override it using any supported CSS size:
percollate pdf --css "@page { size: A3 landscape }" http://example.com
Similarly, you can define:
- custom margins, e.g. @page { margin: 0 }
- the base font size: html { font-size: 10pt }
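The page size, margin, and font-size rules above can plausibly be combined in a single --css snippet. The sketch below assumes percollate accepts several CSS rules in one --css value; the function name pdf_a4 and the specific values (A4, 2cm, 10pt) are illustrative choices, not defaults.

```shell
# Hypothetical helper: render a PDF on A4 paper with 2cm margins
# and a 10pt base font size, all set through one --css snippet.
# $1 is the output path; the remaining arguments are the operands.
pdf_a4() {
  output=$1
  shift
  percollate pdf \
    --css "@page { size: A4; margin: 2cm } html { font-size: 10pt }" \
    --output="$output" "$@"
}
```

It could then be called as: pdf_a4 example.pdf https://example.com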
Changing the font stacks
The default stylesheet includes CSS variables for the fonts used in the PDF:
:root {
	--main-font: Palatino, 'Palatino Linotype', 'Times New Roman',
		'Droid Serif', Times, 'Source Serif Pro', serif, 'Apple Color Emoji',
		'Segoe UI Emoji', 'Segoe UI Symbol';
	--alt-font: 'helvetica neue', ubuntu, roboto, noto, 'segoe ui', arial,
		sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol';
	--code-font: Menlo, Consolas, monospace;
}
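Since the fonts are exposed as CSS variables, one way to change them is to redefine the variables through the --css option. The sketch below is an assumption-laden example: the function name pdf_with_fonts is hypothetical, and Georgia and Courier New are placeholder fonts that need to be installed on your system for the override to have a visible effect.

```shell
# Hypothetical helper: override the main and code font stacks
# by redefining the stylesheet's CSS variables via --css.
# $1 is the output path; the remaining arguments are the operands.
pdf_with_fonts() {
  output=$1
  shift
  percollate pdf \
    --css ":root { --main-font: Georgia, serif; --code-font: 'Courier New', monospace; }" \
    --output="$output" "$@"
}
```

It could then be called as: pdf_with_fonts example.pdf https://example.com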
| CSS variabl
