= {Title}
:title: pdftools
:author: Urs Roesch
:firstname: Urs
:lastname: Roesch
:email: github@bun.ch
:revnumber: 0.7.1
:keywords: PDF, CLI, Command Line, tools, documents, pdftk, ghostscript, +
 poppler utils, tesseract, OCR
:!toc:
:icons: font
:git-user: uroesch
:repo-name: pdftools
ifdef::env-gitlab[]
:base-url: https://gitlab.com/{git-user}/{repo-name}
:email: gitlab@bun.ch
endif::env-gitlab[]
ifdef::env-github[]
:base-url: https://github.com/{git-user}/{repo-name}
:email: github@bun.ch
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::env-github[]

image:{base-url}/workflows/bash-compatibility/badge.svg[title="bash-compatibility", link="{base-url}/actions?query=workflow:bash-compatibility"]
image:{base-url}/workflows/test-pdftools/badge.svg[title="os-compatibility", link="{base-url}/actions?query=workflow:os-compatibility"]
image:{base-url}/workflows/create-docs/badge.svg[title="create-docs", link="{base-url}/actions?query=workflow:create-docs"]

ifndef::env-github,env-gitlab[]
image:icons/gitlab-avatar.png[float="left"]
endif::env-github,env-gitlab[]

ifdef::env-github,env-gitlab[]
+++ <img src="icons/gitlab-avatar.png" align="left"> +++
endif::env-github,env-gitlab[]

A collection of PDF command line tools and wrappers for Linux, written in Bash. They are convenience tools that spare you from remembering long and cryptic options and switches.

The heavy lifting is done by backend tools such as pdftk, ghostscript and the poppler utils.

[[installation]]
== Installation instructions

The scripts are meant to be installed in a user's home directory. To do this quickly, the Makefile in the root of the repository has a target called `user_install`.

[source,console]
----
$ make user_install
Install pdftools under /home/<user>/bin
-> Installing bin/img2pdf to /home/<user>/bin/img2pdf
-> Installing bin/ocrpdf to /home/<user>/bin/ocrpdf
-> Installing bin/pdf2pdfa to /home/<user>/bin/pdf2pdfa
-> Installing bin/pdfcat to /home/<user>/bin/pdfcat
-> Installing bin/pdfmeta to /home/<user>/bin/pdfmeta
-> Installing bin/pdfresize to /home/<user>/bin/pdfresize
-> Installing bin/scan2jpg to /home/<user>/bin/scan2jpg
-> Installing bin/scan2pdf to /home/<user>/bin/scan2pdf
-> Installing bin/scan2png to /home/<user>/bin/scan2png
----

To uninstall everything, use the `user_uninstall` target.
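Because the scripts are installed under `/home/<user>/bin`, they can only be called by name if that directory is on the `PATH`. The following is a minimal sketch of such a check. The helper `path_contains` is hypothetical and not part of pdftools.

```shell
# Hypothetical helper, not part of pdftools: succeeds when directory $1
# is listed in the colon separated search path $2 (default: $PATH).
path_contains() {
  local dir="${1%/}"
  local search_path="${2:-$PATH}"
  case ":${search_path}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "${HOME}/bin"; then
  echo "${HOME}/bin is on PATH"
else
  echo "${HOME}/bin is not on PATH; consider adding it in your shell profile"
fi
```

The check wraps both strings in colons so that `/home/<user>/bin` does not match a longer entry such as `/home/<user>/bin-old`.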

[[img2pdf]]
== img2pdf

A script to convert PNGs, TIFFs or JPEGs to PDF files.

License:: MIT
Requires:: bash, pdfcat, ImageMagick, pdftk

[[img2pdf-examples]]
=== Examples

.Create single page PDFs from images
----
img2pdf first.png second.jpg
----

.Create single page PDFs from images and then delete the original images.
----
img2pdf --delete first.png second.jpg
----

.Rotate image 180 degrees prior to creating the PDF.
----
img2pdf --rotate 180 myimage.png
----

.Rotate images by 180 degrees and concatenate everything into file `output.pdf`
----
img2pdf --output output.pdf --rotate 180 page1.png page2.jpg
----

[[img2pdf-usage]]
=== Usage

----
Usage: img2pdf <options> <img-file> [<img-file> ... ]

Options:
  -h | --help             This message
  -d | --delete           Delete the images after creating the PDF file.
  -o | --output <name>    Write the output to specified file <name>.
  -r | --rotate <value>   Rotate the image by <value>
                          Where value can be a positive or negative
                          integer between 0 and 360.
  -V | --version          Display version and exit
----
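The accepted range for `--rotate` can be checked before calling `img2pdf`, for instance in a batch script. The sketch below is one possible reading of "a positive or negative integer between 0 and 360". The function `is_valid_rotation` is hypothetical, and how `img2pdf` itself validates the value is not shown here.

```shell
# Hypothetical pre-check, not taken from img2pdf: accept an optional sign
# followed by up to three digits whose magnitude is at most 360.
is_valid_rotation() {
  local magnitude="${1#[+-]}"      # strip a single leading + or -
  case "${magnitude}" in
    ''|*[!0-9]*) return 1 ;;       # empty or contains a non-digit
  esac
  [ "${#magnitude}" -le 3 ] && [ "${magnitude}" -le 360 ]
}

for value in 180 -90 400 ninety; do
  if is_valid_rotation "${value}"; then
    echo "${value}: accepted"
  else
    echo "${value}: rejected"
  fi
done
```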

<<<

[[ocrpdf]]
== ocrpdf

Runs PDFs through OCR and saves the output as a text searchable PDF with the same name.

[NOTE]
Only works with PDFs consisting of a single JPEG, LZW or ZIP compressed image per page. LZW compressed images are converted to ZIP compressed ones during the OCR process.

License:: MIT
Requires:: bash, pdfcat, pdfimages (poppler-utils), pdftk, tesseract

[[ocrpdf-examples]]
=== Examples

.Run OCR with all installed languages on a couple of PDFs
----
ocrpdf first.pdf second.pdf
----

.Run OCR with German dictionary on a single PDF
----
ocrpdf --lang deu german.pdf
----

.Run OCR with German, French and English dictionaries on multiple PDFs
----
ocrpdf --lang deu+fra+eng scanned_*.pdf
----

[[ocrpdf-usage]]
=== Usage

----
Usage: ocrpdf [options] <file> [<file> [,,]]

Options:
  -h | --help           This message
  -q | --quiet          Don't send display processed file names
  -V | --version        Print version information and exit
  -l | --lang <lang>    Set the OCR languages to use.
                        For multiple languages concatenate with a '+'
                        E.g eng+deu for English and German
                        Default: deu+eng+fra+ita+jpn+osd

Description:
  Runs PDFs through OCR and saves the output as a text searchable PDF
  with the same name.

Disclaimer:
  Only works with PDFs comprised of a single JPEG, LZW or ZIP
  compressed image per page. LZW compressed images will be converted
  to ZIP compressed ones during the OCR process.
----
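When the language list comes from a script rather than being typed by hand, the `+` separated value for `--lang` can be assembled from individual language codes. A minimal sketch follows. The helper `join_langs` is hypothetical and not part of ocrpdf.

```shell
# Hypothetical helper, not part of ocrpdf: join all arguments with '+'.
# "$*" joins the arguments using the first character of IFS.
join_langs() {
  local IFS='+'
  printf '%s\n' "$*"
}

join_langs deu fra eng

# Possible use (commented out, needs ocrpdf and real files):
# ocrpdf --lang "$(join_langs deu fra eng)" scanned_*.pdf
```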

<<<

[[pdfcat]]
== pdfcat

A quick hack to replace pdfunite, which destroys too much of the original's metadata.

License:: MIT
Requires:: bash, pdftk >= 2.0

[[pdfcat-examples]]
=== Examples

.Merging two PDFs into a new one
----
pdfcat first.pdf second.pdf > merged.pdf
----

.Merging sequentially ordered PDFs into a single document
----
pdfcat myscan*.pdf > merged.pdf
----
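A shell glob such as `myscan*.pdf` expands in lexical order, so `myscan10.pdf` sorts before `myscan2.pdf` unless the numbers are zero padded. The sketch below shows one way to get numeric page order. It assumes GNU `sort` (for `-V`) and file names without newlines, and the helper `natural_order` is hypothetical, not part of pdfcat.

```shell
# Hypothetical helper, not part of pdfcat: print the given names in
# natural (version) order, one per line. Requires GNU sort for -V.
natural_order() {
  printf '%s\n' "$@" | sort -V
}

# Demonstration with plain strings; no files are needed.
natural_order myscan10.pdf myscan2.pdf myscan1.pdf

# Possible use with pdfcat in Bash (commented out, needs real files):
# mapfile -t pages < <(natural_order myscan*.pdf)
# pdfcat "${pages[@]}" > merged.pdf
```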

[[pdfcat-usage]]
=== Usage

----
Usage: pdfcat [<options>] <pdf> <pdf> [..]

Options:
  -h | --help       This message.
  -V | --version    Print version and exit.
----

<<<

[[pdfmeta]]
== pdfmeta

A wrapper script around pdftk to manipulate a PDF's metadata.

License:: MIT
Requires:: bash >= 4.0, pdftk >= 2.0

[[pdfmeta-examples]]
=== Examples

.Modify keywords
----
pdfmeta --keywords "rainbow, magical, unicorn" unicorn.pdf rainbow.pdf
----

.Modify creation date
----
pdfmeta --creation-date "2017-01-01 22:30:45" unicorn.pdf
----

[[pdfmeta-usage]]
=== Usage

----
Usage: pdfmeta <options> <pdf> [[<pdf>] ..]

Options:
  -h | --help               This message
  -k | --keywords           Comma separated list of keywords
  -s | --subject            Define the PDFs subject
  -t | --title              Define the PDFs title
  -c | --creator            Define the PDFs creator program or library
  -p | --producer           Define the PDFs producing program
  -C | --creation-date      Set the creation date of the PDF
  -M | --modification-date  Set the modification date of the PDF
  -V | --version            Display version and exit
----

[NOTE]

On Ubuntu 18.04 (bionic) `pdfmeta` works only with the https://snapcraft.io/pdftk[snap] or with version 3.2.x of https://gitlab.com/pdftk-java/pdftk[`pdftk-java`]. With every other version of pdftk `CreationDate` and `ModDate` will not work when running the unit tests. The changed PDF has no problem but `pdfinfo` from the `poppler-utils` package can't handle the changed entries and reports them as empty.
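When `pdfinfo` reports an entry as empty, the value can still be inspected with the `dump_data` operation of pdftk, which lists each Info entry as an `InfoKey:` line followed by an `InfoValue:` line. The sketch below extracts one value from such a dump. The helper `info_value` is hypothetical and not part of pdfmeta, and the sample data is made up for illustration.

```shell
# Hypothetical helper, not part of pdfmeta: read `pdftk <file> dump_data`
# output on stdin and print the InfoValue that follows the given InfoKey.
info_value() {
  awk -v key="$1" '
    $0 == ("InfoKey: " key)  { found = 1; next }
    found && /^InfoValue: /  { sub(/^InfoValue: /, ""); print; exit }
    { found = 0 }
  '
}

# Demonstration with made-up sample data; no PDF is needed.
info_value Keywords <<'EOF'
InfoBegin
InfoKey: Keywords
InfoValue: rainbow, magical, unicorn
InfoBegin
InfoKey: CreationDate
InfoValue: D:20170101223045
EOF

# Possible use (commented out, needs pdftk and a real file):
# pdftk unicorn.pdf dump_data | info_value CreationDate
```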

<<<

[[pdfresize]]
== pdfresize

A wrapper around ghostscript to reduce the size of a scanned document.

[NOTE]
`pdfresize` very likely does not work with PDF documents containing https://en.wikipedia.org/wiki/JBIG2[JBIG2] images.

License:: MIT
Requires:: bash, ghostscript

[[pdfresize-examples]]
=== Examples

.Resize to default resolution
----
pdfresize --input input.pdf --output output.pdf
----

.Resize to screen resolution
----
pdfresize --quality screen --input input.pdf --output output.pdf
----

[[pdfresize-usage]]
=== Usage

----
Usage: pdfresize [-q pdfsettings] -i <input> -o <output>

Options:
  -h | --help               This message
  -i | --input <input>      A PDF file preferably of high resolution
  -o | --output <output>    Name of the PDF file to save the result to
  -q | --quality <quality>  Quality settings for output PDF.
                            See quality keywords for acceptable input.
  -V | --version            Print version and exit.

Quality keywords:
  screen   - low-resolution; comparable to "Screen Optimized" in Acrobat Distiller
  ebook    - medium-resolution; comparable to "eBook" in Acrobat Distiller
  printer  - comparable to "Print Optimized" in Acrobat Distiller
  prepress - comparable to "Prepress Optimized" in Acrobat Distiller
  default  - intended to be useful across a wide variety of uses
----
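The quality keywords carry the same names as the presets of the ghostscript `-dPDFSETTINGS` switch, so a wrapper of this kind can plausibly hand the keyword straight to the `pdfwrite` device. The sketch below is an assumption about how such a call could look, not the pdfresize source. The function `gs_resize_command` is hypothetical and only prints the command, so ghostscript does not need to be installed to try it.

```shell
# Hypothetical sketch, not the pdfresize source: validate a quality keyword
# and print a matching ghostscript command instead of running it.
# File names are printed unquoted, so names with spaces need extra care.
gs_resize_command() {
  local quality="$1"
  local input="$2"
  local output="$3"
  case "${quality}" in
    screen|ebook|printer|prepress|default) ;;
    *) echo "unknown quality keyword: ${quality}" >&2; return 1 ;;
  esac
  printf 'gs -sDEVICE=pdfwrite -dPDFSETTINGS=/%s -dNOPAUSE -dBATCH -sOutputFile=%s %s\n' \
    "${quality}" "${output}" "${input}"
}

gs_resize_command screen input.pdf output.pdf
```

Rejecting unknown keywords up front gives a clearer error message than passing an arbitrary string on to ghostscript.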

<<<

[[pdf2pdfa]]
== pdf2pdfa

A small script to convert a PDF to PDF/A.

[NOTE]
This is an early beta, and all metadata in the PDF will be lost!

[[pdf2pdfa-examples]]
=== Examples

.Convert PDF file `sample.pdf` to a PDF/A-2 named `sample_a.pdf`
----
pdf2pdfa sample.pdf
----

.Convert PDF file `sample.pdf` to a PDF/A-2 named `sample_pdfa.pdf`
----
pdf2pdfa --suffix _pdfa sample.pdf
----

.Convert PDF file `sample.pdf` to a PDF/A-1 named `sample_a.pdf`
----
pdf2pdfa --level 1 sample.pdf
----

.Convert PDF file `sample.pdf` to a PDF/A-3, exiting on errors.
----
pdf2pdfa --level 3 --strict sample.pdf
----

.Convert PDF file `sample.pdf` to a PDF/A-2 with color model CMYK.
----
pdf2pdfa --color-model CMYK sample.pdf
----
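The output names in these examples follow a simple pattern: the suffix is placed between the base name and the `.pdf` extension. The sketch below reproduces that naming with shell parameter expansion. The helper `pdfa_name` is hypothetical, and the default suffix `_a` is inferred from the examples rather than taken from the pdf2pdfa source.

```shell
# Hypothetical helper, not part of pdf2pdfa: derive the output name from
# an input name and an optional suffix (default _a, inferred from the
# examples).
pdfa_name() {
  local input="$1"
  local suffix="${2:-_a}"
  printf '%s%s.pdf\n' "${input%.pdf}" "${suffix}"
}

pdfa_name sample.pdf          # prints sample_a.pdf
pdfa_name sample.pdf _pdfa    # prints sample_pdfa.pdf
```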

[[pdf2pdfa-usage]]
=== Usage

----
Usage: pdf2pdfa [<options>] <pdf_file> [<pdf_file> [..]]

Options:
  -c | --color-model <model>  Color model to use for the conversion.
                              Valid input is RGB or CMYK.
                              Default: RGB
  -h | --help                 This message
  -l | --level <number>       PDF-A specification level to use.
                              Valid input is 1 (A-1), 2 (A-2) and 3 (A-3).
                              Default: 2
  -S | --strict               Exit if errors are encountered during conversion.
  -
----

