Percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

<img src='./.github/percollate.svg' alt='percollate' width='200'/>

<a href="https://www.npmjs.org/package/percollate"><img src="https://img.shields.io/npm/v/percollate.svg?style=flat-square&labelColor=324A97&color=black" alt="npm version"></a>

Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, HTML or Markdown files.

<figure style='margin: 1rem 0'> <img alt="Sample Output" src="./.github/dimensions-of-colour.png"> <figcaption style='font-style: italic'>Sample spread from the generated PDF of <a href='http://www.huevaluechroma.com/072.php'>a chapter in Dimensions of Colour</a>; rendered here in black & white for a smaller image file size.</figcaption> </figure>

Installation

percollate is a Node.js command-line tool which you can install globally from npm:

npm install -g percollate

Percollate and its dependencies require Node.js 14.17.0 or later.

Community-maintained packages

There's a packaged version available on the Arch User Repository, which you can install using your local AUR helper (yay, pacaur, or similar):

yay -S nodejs-percollate

Some Docker images are available in this tracking issue.

Usage

Run percollate --help for a list of available commands and options.

Percollate is invoked on one or more operands (usually URLs):

percollate <command> [options] url [url]...

The following commands are available:

  • percollate pdf produces a PDF file;
  • percollate epub produces an EPUB file;
  • percollate html produces an HTML file;
  • percollate md produces a Markdown file.

The operands can be URLs, paths to local files, or the - character, which stands for stdin (the standard input).

Available options

Unless otherwise stated, these options apply to all four commands.

-o, --output

Specify the path of the resulting bundle relative to the current folder.

percollate pdf https://example.com -o my-example.pdf

-u, --url

Using the - operand you can read the HTML content from stdin, as fetched by a separate command, such as curl. In this sort of setup, percollate does not know the URL from which the content has been fetched, and relative paths on images, anchors, et cetera won't resolve correctly.

Use the --url option to supply the source's original URL.

curl https://example.com | percollate pdf - --url=https://example.com

-w, --wait

By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero.

percollate epub --wait=1 url1 url2 url3

--individual

By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file.

percollate pdf --individual http://example.com/page1 http://example.com/page2

--template

Path to a custom HTML template. Applies to pdf, html, and md.

--style

Path to a custom CSS stylesheet, relative to the current folder.

--css

Additional CSS styles you can pass from the command-line to override styles specified by the default/custom stylesheet.
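As a minimal sketch of how --style and --css can be combined, the snippet below writes a small stylesheet and prints the resulting command line instead of running it. The file name my-style.css, its rules, and the output name styled.pdf are assumptions made for illustration only.

```shell
# Hypothetical stylesheet; the file name and rules are illustrative only.
cat > my-style.css <<'EOF'
html { font-size: 11pt; }
h1 { text-transform: uppercase; }
EOF

# --style points at the stylesheet file (relative to the current folder);
# --css adds a one-off override on top of it.
# Dry run: remove the leading "echo" to actually run percollate.
echo percollate pdf --style=my-style.css --css 'html { font-size: 12pt }' \
  --output=styled.pdf https://example.com
```

Keeping long-lived rules in the stylesheet and reserving --css for quick experiments avoids editing the file for every run.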

--no-amp

Don't prefer the AMP version of the web page.

--debug

Print more detailed information.

-t, --title

Provide a title for the bundle.

percollate epub http://example.com/page-1 http://example.com/page-2 --title="Best Of Example"

-a, --author

Provide an author for the bundle.

percollate pdf --author="Ella Example" http://example.com

--cover

Generate a cover. The option is implicitly enabled when the --title option is provided, or when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-cover flag.

--toc

Generate a hyperlinked table of contents. The option is implicitly enabled when bundling more than one web page to a single file. Disable this implicit behavior by passing the --no-toc flag.

Applies to pdf, html, and md.

--toc-level=<level>

By default, the table of contents is a flat list of article titles. With the --toc-level option the table of contents will include headings under each article title (<h2>, <h3>, etc.), up to the specified heading depth. A number between 1 and 6 is expected.

Using --toc-level with a value greater than 1 implies --toc.
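For illustration, a bundle of two pages with a deeper table of contents might be requested as sketched below. The URLs and the output name bundle.pdf are placeholders, and the command is printed rather than executed.

```shell
toc_level=3   # heading depth to include; a number between 1 and 6 is expected

# A --toc-level above 1 implies --toc, so --toc is not passed explicitly.
# Dry run: remove the leading "echo" to actually run percollate.
echo percollate pdf --toc-level="$toc_level" --output=bundle.pdf \
  https://example.com/page1 https://example.com/page2
```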

--hyphenate

Hyphenation is enabled by default for pdf, and disabled for epub, html, and md. You can opt into hyphenation with the --hyphenate flag, or disable it with the --no-hyphenate flag.

See also the Hyphenation and justification recipe.

--inline

Embed images inline with the document. Images are fetched and converted to Base64-encoded data URLs.

This option is particularly useful for html to produce self-contained HTML files.
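A rough sketch of producing a self-contained HTML file and then checking the result follows. The helper count_remote_srcs is hypothetical, not part of percollate, and it assumes that sources appear in the output as src="http…" or src="https…" attributes; the export command itself is printed rather than executed.

```shell
# Dry run: remove the leading "echo" to actually run percollate.
echo percollate html --inline --output=page.html https://example.com

# Hypothetical helper: count src attributes that still point at the network.
# A fully inlined file would be expected to report 0.
count_remote_srcs() {
  grep -o 'src="https\{0,1\}://' "$1" | wc -l | tr -d ' '
}

# Usage, once page.html exists:
#   count_remote_srcs page.html
```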

--md.<option>=<value>

Pass options to the underlying Markdown stringifier, mdast-util-to-markdown. These are the default Markdown options:

const DEFAULT_MARKDOWN_OPTIONS = {
	fences: true,
	emphasis: '_',
	strong: '_',
	resourceLink: true,
	rule: '-'
};
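As a hedged example of the --md.<option>=<value> form, the command below overrides two of the defaults listed above. The chosen values ('*' for emphasis and for rule) are ones mdast-util-to-markdown accepts for those options; the URL and output name are placeholders, and the command is printed rather than executed.

```shell
emphasis='*'   # marker for emphasis instead of the default '_'
rule='*'       # marker for thematic breaks instead of the default '-'

# Quoting keeps the shell from treating '*' as a glob pattern.
# Dry run: remove the leading "echo" to actually run percollate.
echo percollate md "--md.emphasis=$emphasis" "--md.rule=$rule" \
  --output=page.md https://example.com
```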

--unsafe

Disables some JSDOM validations that may throw an error when parsing invalid HTML pages (see #177).

Recipes

Basic bundling

To turn a single web page into a PDF:

percollate pdf --output=some.pdf https://example.com

To bundle several web pages into a single PDF, specify them as separate arguments to the command:

percollate pdf --output=some.pdf https://example.com/page1 https://example.com/page2

You can use common Unix commands and keep the list of URLs in a newline-delimited text file:

cat urls.txt | xargs percollate pdf --output=some.pdf
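If the list file also holds comments or blank lines, it may help to filter them out before they reach xargs. The sketch below assumes a hypothetical convention of #-prefixed comment lines in urls.txt, which is not something percollate itself defines, and prints the resulting command instead of running it.

```shell
# Hypothetical list file: one URL per line, with optional comments.
cat > urls.txt <<'EOF'
# long reads
https://example.com/page1

https://example.com/page2
EOF

# Drop comment lines and blank lines, then hand the URLs to xargs.
# "echo" makes this a dry run that only prints the command;
# remove it to actually invoke percollate.
cmd=$(grep -v -e '^#' -e '^[[:space:]]*$' urls.txt |
  xargs echo percollate pdf --output=some.pdf)
echo "$cmd"
```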

To transform several web pages into individual PDF files at once, use the --individual flag:

percollate pdf --individual https://example.com/page1 https://example.com/page2

If you'd like to fetch the HTML with an external command, you can use - as an operand, which stands for stdin (the standard input):

curl https://example.com/page1 | percollate pdf --url=https://example.com/page1 -

Notice we're using the --url option to tell percollate the source of the (now-anonymous) HTML it receives on stdin, so that relative URLs on links and images resolve correctly.

Web feeds

Percollate has basic support for processing XML web feeds in Atom or RSS format.

When processing a web feed, every entry in the feed becomes its own article, as if percollate received all the entry URLs as operands. The command below produces an EPUB book from the feed contents:

percollate epub https://example.com/posts.xml

To produce individual output files for the feed entries, use the --individual flag:

percollate epub --individual https://example.com/posts.xml

The content of the articles is read from the feed file rather than fetched anew. The content is passed through the DOM enhancements and sanitized as usual, but it’s not processed with Readability.

<details> <summary>To fetch the HTML pages for entries in Atom and RSS feeds</summary>

If instead you’d like to fetch and process the original HTML pages corresponding to the entries in the Atom/RSS feed, use hred to extract the URLs and feed them to percollate with xargs.

Below is an example hred query for extracting URLs from Atom feeds, explained in more detail on the hred recipes page.

curl https://example.com/posts.xml | \
hred -xcr 'entry > link:is([rel=alternate],:not([rel]))@href' | \
xargs percollate epub
</details>

The --css option

The --css option lets you pass a small snippet of CSS to percollate. Here are some common use-cases:

Custom page size / margins

The default page size is A5 (portrait). You can use the --css option to override it using any supported CSS size:

percollate pdf --css "@page { size: A3 landscape }" http://example.com

Similarly, you can define:

  • custom margins, e.g. @page { margin: 0 }
  • the base font size: html { font-size: 10pt }
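Several rules can share a single --css value. The sketch below combines a page size, margins, and a base font size; the specific values (A4, 2cm, 10pt) and the output name are arbitrary choices for illustration, and the command is printed rather than executed.

```shell
# Keep the CSS in a variable so the quoting stays readable.
css='@page { size: A4; margin: 2cm } html { font-size: 10pt }'

# Double quotes around "$css" pass the whole snippet as one argument.
# Dry run: remove the leading "echo" to actually run percollate.
echo percollate pdf --css "$css" --output=example.pdf https://example.com
```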

Changing the font stacks

The default stylesheet includes CSS variables for the fonts used in the PDF:

:root {
	--main-font: Palatino, 'Palatino Linotype', 'Times New Roman',
		'Droid Serif', Times, 'Source Serif Pro', serif, 'Apple Color Emoji',
		'Segoe UI Emoji', 'Segoe UI Symbol';
	--alt-font: 'helvetica neue', ubuntu, roboto, noto, 'segoe ui', arial,
		sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol';
	--code-font: Menlo, Consolas, monospace;
}
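Since the fonts are exposed as CSS variables, one plausible way to change them is to redefine the variables through --css. In the sketch below the replacement fonts (Georgia, Courier New) are assumptions and would need to be installed on the machine; the command is printed rather than executed.

```shell
# Redefine two of the variables from the default stylesheet.
css=":root { --main-font: Georgia, serif; --code-font: 'Courier New', monospace; }"

# Dry run: remove the leading "echo" to actually run percollate.
echo percollate pdf --css "$css" --output=example.pdf https://example.com
```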
