Pypandoc
Thin wrapper for "pandoc" (MIT)
pypandoc
Pypandoc provides a thin wrapper for pandoc, a universal document converter.
Installation
Pypandoc uses pandoc, so it needs an available installation of pandoc. Pypandoc provides two packages, "pypandoc" and "pypandoc_binary". They are identical except that "pypandoc_binary" includes pandoc out of the box, while "pypandoc" doesn't.
If pandoc is already installed (i.e. pandoc is in the PATH), pypandoc uses the version with the
higher version number, and if both are the same, the already installed version. See Specifying the location of pandoc binaries for more.
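The selection rule described above can be illustrated with a small sketch. This is not pypandoc's actual implementation: `parse_version`, `is_newer` and `pick_pandoc` are hypothetical helpers, and the labels "bundled" and "installed" are assumptions used only to name the two candidate copies.

```python
from itertools import zip_longest


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '3.1.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_newer(a: str, b: str) -> bool:
    """Return True if version a is strictly newer than version b."""
    # Missing components count as 0, so '3.1' and '3.1.0' compare equal.
    for x, y in zip_longest(parse_version(a), parse_version(b), fillvalue=0):
        if x != y:
            return x > y
    return False


def pick_pandoc(bundled: str, installed: str) -> str:
    """Prefer the higher version; on a tie, prefer the already installed copy."""
    return "bundled" if is_newer(bundled, installed) else "installed"
```

For example, `pick_pandoc("3.6", "3.1.2")` selects the bundled copy, while two equal versions resolve to the installed one.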
To use pandoc filters, you must have the relevant filters installed on your machine.
Installing via pip
If you want to install pandoc yourself or are on an unsupported platform, install "pypandoc" and then install pandoc manually:
pip install pypandoc
If you want pandoc included out of the box, use the pypandoc_binary package, which is identical to the "pypandoc" package but with pandoc included:
pip install pypandoc_binary
Prebuilt wheels for Windows and Mac OS X
If you use Linux and have your own wheelhouse,
you can build a wheel which includes pandoc with
uv build --wheel binary/. Be aware that this works only
on 64-bit Intel systems, as pandoc is only downloaded from the
official releases.
Installing via conda
Pypandoc is included in conda-forge. The conda packages will also install the pandoc package, so pandoc is available in the installation.
Install via conda install -c conda-forge pypandoc.
You can also add the channel to your conda config via
conda config --add channels conda-forge. This makes it possible to
use conda install pypandoc directly and also lets you update via conda update pypandoc.
Installing pandoc
If you don't already have pandoc on your system and haven't installed the pypandoc_binary package (which includes pandoc), you need to install pandoc yourself.
Installing pandoc via pypandoc
Installing via pypandoc is possible on Windows, Mac OS X or Linux (Intel-based, 64-bit):
pip install pypandoc
from pypandoc.pandoc_download import download_pandoc
# see the documentation for how to customize the installation path,
# but be aware that you then need to include that path in the `PATH`
download_pandoc()
The default install location is included in the search path for pandoc, so you
don't need to add it to the PATH.
By default, the latest pandoc version is installed. If you want to specify your own version, say 1.19.1, use download_pandoc(version='1.19.1') instead.
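A minimal sketch of downloading pandoc only when it is missing. `ensure_pandoc` is a hypothetical helper, and the sketch assumes that `pypandoc.get_pandoc_version()` raises `OSError` when no pandoc binary can be found; verify this against the pypandoc version you use.

```python
def ensure_pandoc(version: str = "latest") -> str:
    """Return the pandoc version in use, downloading pandoc first if needed."""
    # Imported lazily so that defining this helper does not require pypandoc.
    import pypandoc
    from pypandoc.pandoc_download import download_pandoc

    try:
        return pypandoc.get_pandoc_version()
    except OSError:
        # Assumption: OSError signals that no pandoc binary was found.
        download_pandoc(version=version)
        return pypandoc.get_pandoc_version()
```

Calling `ensure_pandoc()` at application start-up keeps the download out of the normal code path once pandoc is present.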
You can also use pypandoc's built-in CLI to download pandoc:
# install latest pandoc to default path
pypandoc download
# Download a specific version
pypandoc download --version 3.6
Installing pandoc manually
Installing manually via the system mechanism is also possible. Such installation mechanisms make pandoc available on many more platforms:
- Ubuntu/Debian: sudo apt-get install pandoc
- Fedora/Red Hat: sudo yum install pandoc
- Arch: sudo pacman -S pandoc
- Mac OS X with Homebrew: brew install pandoc pandoc-citeproc Caskroom/cask/mactex
- Machine with Haskell: cabal-install pandoc
- Windows: There is an installer available here
- FreeBSD with pkg: pkg install hs-pandoc
- Or see Pandoc - Installing pandoc
Be aware that not all install mechanisms put pandoc in the PATH, so you either
have to change the PATH yourself or set the full path to the pandoc binary in
PYPANDOC_PANDOC. See the next section for more information.
Specifying the location of pandoc binaries
You can point to a specific pandoc version by setting the environment variable
PYPANDOC_PANDOC to the full path to the pandoc binary
(PYPANDOC_PANDOC=/home/x/whatever/pandoc or PYPANDOC_PANDOC=c:\pandoc\pandoc.exe).
If this environment variable is set, this is the only place where pandoc is searched for.
In certain cases, e.g. pandoc is installed but a web server with its own user cannot find the binaries, it is useful to specify the location at runtime:
import os
os.environ.setdefault('PYPANDOC_PANDOC', '/home/x/whatever/pandoc')
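A slightly more defensive variant of the snippet above, as a sketch. `point_pypandoc_at` is a hypothetical helper; it only sets the variable when the given binary exists, and, like `setdefault`, it does not override a value that is already set. It is assumed here that the variable should be set before pypandoc first looks for pandoc.

```python
import os
from pathlib import Path


def point_pypandoc_at(binary: str) -> bool:
    """Set PYPANDOC_PANDOC to `binary` if that file exists.

    Returns True only if the variable ends up pointing at `binary`.
    """
    if not Path(binary).is_file():
        return False
    # setdefault keeps an existing value, e.g. one set by the deployment.
    os.environ.setdefault("PYPANDOC_PANDOC", binary)
    return os.environ["PYPANDOC_PANDOC"] == binary
```

A `False` return value means either the file is missing or another location was configured earlier, which is usually worth logging.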
Usage
There are two basic ways to use pypandoc: with input files or with input strings.
import pypandoc
# With an input file: it will infer the input format from the filename
output = pypandoc.convert_file('somefile.md', 'rst')
# ...but you can override the format via the `format` argument:
output = pypandoc.convert_file('somefile.txt', 'rst', format='md')
# alternatively you could just pass some string. In this case you need to
# define the input format:
output = pypandoc.convert_text('# some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'
convert_text expects this string to be unicode or utf-8 encoded bytes. convert_* will always
return a unicode string.
It's also possible to directly let pandoc write the output to a file. This is the only way to
convert to some output formats (e.g. odt, docx, epub, epub3, pdf). In that case convert_*() will
return an empty string.
import pypandoc
output = pypandoc.convert_file('somefile.md', 'docx', outputfile="somefile.docx")
assert output == ""
It's also possible to specify multiple input files to pandoc, either as absolute paths, relative paths or file patterns.
import pypandoc
# convert all markdown files in a chapters/ subdirectory.
pypandoc.convert_file('chapters/*.md', 'docx', outputfile="somefile.docx")
# convert all markdown files in the book1 and book2 directories.
pypandoc.convert_file(['book1/*.md', 'book2/*.md'], 'docx', outputfile="somefile.docx")
# convert the front matter from another drive, and all markdown files in the book2 directory.
pypandoc.convert_file(['D:/book_front.md', 'book2/*.md'], 'docx', outputfile="somefile.docx")
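Before handing file patterns to pandoc, it can help to see what they match. The sketch below uses only the standard library; `preview_inputs` is a hypothetical helper, and the order in which pypandoc itself expands patterns may differ from the sorted order used here.

```python
import glob
from typing import Iterable, List, Union


def preview_inputs(patterns: Union[str, Iterable[str]]) -> List[str]:
    """Expand one or more file patterns and return the matching paths."""
    if isinstance(patterns, str):
        patterns = [patterns]
    matches: List[str] = []
    for pattern in patterns:
        # Sort within each pattern so the preview is deterministic.
        matches.extend(sorted(glob.glob(pattern)))
    return matches
```

An empty result from `preview_inputs(['book1/*.md', 'book2/*.md'])` is a quick hint that a path or pattern is wrong before any conversion is attempted.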
pathlib is also supported.
import pypandoc
from pathlib import Path
# single file
input_path = Path('somefile.md')  # avoid shadowing the built-in input()
output_path = input_path.with_suffix('.docx')
pypandoc.convert_file(input_path, 'docx', outputfile=output_path)
# convert all markdown files in a chapters/ subdirectory.
pypandoc.convert_file(Path('chapters').glob('*.md'), 'docx', outputfile="somefile.docx")
# convert all markdown files in the book1 and book2 directories.
pypandoc.convert_file([*Path('book1').glob('*.md'), *Path('book2').glob('*.md')], 'docx', outputfile="somefile.docx")
# pathlib globs must be unpacked if they are inside lists.
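The examples above merge several inputs into one output file. To write one output per input instead, a loop over pathlib paths is enough. This is a sketch: `plan_outputs` and `convert_each` are hypothetical helpers, and `convert_each` needs pypandoc and pandoc to be installed.

```python
from pathlib import Path
from typing import List, Tuple


def plan_outputs(directory: Path, suffix: str = ".docx") -> List[Tuple[Path, Path]]:
    """Pair every Markdown file in `directory` with an output path next to it."""
    return [(src, src.with_suffix(suffix)) for src in sorted(directory.glob("*.md"))]


def convert_each(directory: Path, to: str = "docx") -> List[Path]:
    """Convert each Markdown file in `directory` to its own output file."""
    import pypandoc  # imported lazily so planning works without pandoc

    written = []
    for src, dest in plan_outputs(directory, f".{to}"):
        pypandoc.convert_file(src, to, outputfile=dest)
        written.append(dest)
    return written
```

Keeping the planning step separate makes it easy to inspect or filter the (input, output) pairs before running any conversion.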
In addition to format, it is possible to pass extra_args.
That makes it possible to access various pandoc options easily.
output = pypandoc.convert_text(
'<h1>Primary Heading</h1>',
'md', format='html',
extra_args=['--atx-headers'])
# output == '# Primary Heading\r\n'
output = pypandoc.convert_text(
'# Primary Heading',
'html', format='md',
extra_args=['--base-header-level=2'])
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'
pypandoc now supports easy addition of pandoc filters.
filters = ['pandoc-citeproc']
pdoc_args = ['--mathjax',
'--smart']
output = pypandoc.convert_file(filename,
to='html5',
format='md',
extra_args=pdoc_args,
filters=filters)
Please pass any filters in as a list and not as a string.
Please refer to pandoc -h and the
official documentation for further details.
Dealing with Formatting Arguments
Pandoc supports custom formatting through the -V parameter. In order to use it through
pypandoc, use code such as this:
output = pypandoc.convert_file('demo.md', 'pdf', outputfile='demo.pdf',
extra_args=['-V', 'geometry:margin=1.5cm'])
Note: it's important to separate -V and its argument within a list like that or else it won't work. This gotcha has to do with the way subprocess.Popen works.
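When several variables are needed, the list can be built programmatically so that each -V stays a separate element. A minimal sketch; `variable_args` is a hypothetical helper, not part of pypandoc.

```python
from typing import Dict, List


def variable_args(variables: Dict[str, str]) -> List[str]:
    """Build extra_args entries, keeping '-V' and its argument separate."""
    args: List[str] = []
    for key, value in variables.items():
        # Two list elements per variable: the flag, then 'key:value'.
        args.extend(["-V", f"{key}:{value}"])
    return args


# Example: pass the result as extra_args to pypandoc.convert_file(...)
pdf_variables = variable_args({"geometry": "margin=1.5cm", "fontsize": "12pt"})
```

Here `pdf_variables` is `['-V', 'geometry:margin=1.5cm', '-V', 'fontsize:12pt']`, which matches the shape used in the example above.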
PDF and LaTeX Support with TinyTeX
Converting to PDF requires a LaTeX engine (like pdflatex, xelatex, or lualatex) to be installed on your system.
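Whether a LaTeX engine is available can be checked up front with the standard library. In this sketch, `find_latex_engine` and `pdf_engine_args` are hypothetical helpers; `--pdf-engine` is the pandoc option for choosing the engine.

```python
import shutil
from typing import List, Optional, Sequence


def find_latex_engine(
    candidates: Sequence[str] = ("pdflatex", "xelatex", "lualatex"),
) -> Optional[str]:
    """Return the first LaTeX engine found on the PATH, or None."""
    for engine in candidates:
        if shutil.which(engine):
            return engine
    return None


def pdf_engine_args(engine: str) -> List[str]:
    """Build the extra_args entry that tells pandoc which PDF engine to use."""
    return [f"--pdf-engine={engine}"]
```

If `find_latex_engine()` returns `None`, a PDF conversion is likely to fail, so it is better to report the missing engine before calling `pypandoc.convert_file(..., 'pdf', ...)`.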
