.. raw:: html

  <img src="https://raw.githubusercontent.com/raul23/pyebooktools/master/docs/logo/pyebooktools.png"> 🚧    Work-In-Progress

This project (version 0.1.0a3) is a Python port of ebook-tools_, which is written in Shell by na--. The Python script ebooktools.py is a collection of tools for the automated organization and management of large ebook collections.

Also check out my other project, search-ebooks_, which is based on pyebooktools_ and lets you search through the content and metadata of ebooks.

:warning:

Check out `organize-ebooks <https://github.com/raul23/organize-ebooks>`__, the Python port of `organize-ebooks.sh <https://github.com/na--/ebook-tools/blob/master/organize-ebooks.sh>`__, which includes a `Docker image <https://hub.docker.com/repository/docker/raul23/organize/general>`__ for easy installation of all the needed dependencies and the Python package.

About
=====

The ebooktools.py_ script is a Python port of the shell scripts_ from ebook-tools_ and makes use of the following modules:

- edit_config.py edits a configuration file, which can be either the main config file that contains all the options `defined below <#usage-options-and-configuration>`__ or the logging config file whose default values are defined in default_logging.py. The edit subcommand from the ebooktools.py script uses this module.
- convert_to_txt.py converts the supplied file to a text file. It can optionally also use OCR for .pdf, .djvu and image files. The convert_ subcommand from the ebooktools.py script uses this module.
- find_isbns.py tries to find valid ISBNs_ inside a file, or in a string if no file was specified. Searching for ISBNs in files uses progressively more resource-intensive methods until some ISBNs are found. For more details, see:

  - the documentation for ebook-tools_ (shell scripts) or
  - `search_file_for_isbns()`_ from lib.py (the Python function that implements the ISBN search in files).

  The find_ subcommand from the ebooktools.py script uses this module.
- organize_ebooks.py is used to automatically organize folders with potentially huge amounts of unorganized ebooks. This is done by renaming the files with proper names and moving them to other folders:

  - By default it searches_ the supplied ebook files for ISBNs_, downloads the book metadata (author, title, series, publication date, etc.) from online sources like Goodreads, Amazon and Google Books, and renames the files according to a specified template.
  - If no ISBN is found, the script can optionally search for the ebooks online by their title and author, which are extracted from the filename or file metadata.
  - Optionally, an additional file that contains all the gathered ebook metadata can be saved together with the renamed book so it can later be used for additional verification, indexing or processing.
  - Most ebook types are supported: .epub, .mobi, .azw, .pdf, .djvu, .chm, .cbr, .cbz, .txt, .lit, .rtf, .doc, .docx, .pdb, .html, .fb2, .lrf, .odt, .prc and potentially others. Even compressed ebooks in arbitrary archive files are supported. For example, a .zip, .rar or other archive file that contains the .pdf or .html chapters of an ebook can be organized without a problem.
  - Optical character recognition (`OCR <https://en.wikipedia.org/wiki/Optical_character_recognition>`__) can be used automatically for .pdf, .djvu and image files when the fast and straightforward conversion to .txt finds no ISBNs in them. This is very useful for scanned ebooks that only contain images or were badly OCR-ed in the first place.
  - Files are checked for corruption (zero-filled files, broken PDFs, corrupt archives, etc.) and corrupt files can optionally be moved to another folder.
  - Non-ebook documents, pamphlets and pamphlet-like documents like saved webpages, short PDFs, etc. can also be detected and optionally moved to another folder.

  Ref.: [ORG]_

  The organize_ subcommand from the ebooktools.py script uses this module.
- rename_calibre_library.py traverses a calibre library folder and renames all the book files in it by reading their metadata from calibre's metadata.opf files. The book files are then either moved or symlinked to an output folder along with their corresponding metadata files. The rename_ subcommand from the ebooktools.py script uses this module.
- split_into_folders.py splits the supplied ebook files (and the accompanying metadata files, if present) into folders with consecutive names that each contain the specified number of files. The split_ subcommand from the ebooktools.py script uses this module.
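
To make "valid ISBNs" concrete: an ISBN-10 is valid when its digits, weighted 10 down to 1 (with a final ``X`` counting as 10), sum to a multiple of 11. An ISBN-13 is valid when its digits, weighted alternately 1 and 3, sum to a multiple of 10. The following is a minimal sketch of that check applied to a string. It is not the code of find_isbns.py or lib.py, and the regular expression and function names are hypothetical. The real module does more, such as searching inside files as described above.

```python
import re
from typing import List

# Hypothetical candidate pattern: runs of digits, hyphens and X that are long
# enough to hold an ISBN-10 or ISBN-13 once the hyphens are removed.
_CANDIDATE = re.compile(r"[0-9Xx][0-9Xx-]{8,15}[0-9Xx]")


def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1 on the digits (check digit X = 10) must sum to a multiple of 11."""
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    check = isbn[9].upper()
    if not (check.isdigit() or check == "X"):
        return False
    digits = [int(c) for c in isbn[:9]] + [10 if check == "X" else int(check)]
    return sum((10 - i) * d for i, d in enumerate(digits)) % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Weights 1, 3, 1, 3, ... on the 13 digits must sum to a multiple of 10."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0


def find_isbns_in_string(text: str) -> List[str]:
    """Return the unique valid ISBNs in ``text``, without hyphens, in order of appearance."""
    found: List[str] = []
    for match in _CANDIDATE.finditer(text):
        candidate = match.group().replace("-", "").upper()
        if (is_valid_isbn10(candidate) or is_valid_isbn13(candidate)) and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    print(find_isbns_in_string("ISBN 0-306-40615-2 and 978-0-306-40615-7."))
```

Running it as a script prints ``['0306406152', '9780306406157']``.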

Thus, you have access to various subcommands_ from within the ebooktools.py script.
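
The grouping done by split_into_folders.py can be sketched as a pure "plan" step followed by a move step. This is a hypothetical illustration, not the module's code. The parameter names (``files_per_folder``, ``start_number``, ``folder_pattern``), the zero-padded ``%05d`` folder names and the ``.meta`` extension assumed for metadata files are all assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def plan_split(files: List[str], files_per_folder: int, start_number: int = 0,
               folder_pattern: str = "%05d", meta_ext: str = ".meta") -> Dict[str, List[str]]:
    """Assign ebooks to consecutively numbered folders of ``files_per_folder`` ebooks each.

    A metadata file named ``<ebook><meta_ext>`` (an assumed naming scheme) follows its
    ebook into the same folder and does not count towards ``files_per_folder``.
    """
    if files_per_folder < 1:
        raise ValueError("files_per_folder must be at least 1")
    present = set(files)
    ebooks = sorted(f for f in files if not f.endswith(meta_ext))
    plan: Dict[str, List[str]] = {}
    for index, ebook in enumerate(ebooks):
        # Integer division assigns consecutive ebooks to the same folder number.
        folder = folder_pattern % (start_number + index // files_per_folder)
        group = plan.setdefault(folder, [])
        group.append(ebook)
        if ebook + meta_ext in present:
            group.append(ebook + meta_ext)
    return plan


def apply_plan(plan: Dict[str, List[str]], src_dir: str, dst_dir: str) -> None:
    """Create the numbered folders under ``dst_dir`` and move the planned files into them."""
    for folder, names in plan.items():
        target = Path(dst_dir) / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.move(str(Path(src_dir) / name), str(target / name))
```

For example, ``plan_split(["c.pdf", "a.epub", "a.epub.meta", "b.mobi"], 2)`` returns ``{'00000': ['a.epub', 'a.epub.meta', 'b.mobi'], '00001': ['c.pdf']}``. Keeping the planning separate from ``apply_plan`` lets you preview the result before any file is moved.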

:star:

- ebook-tools_ is the original Shell project I ported to Python. I kept the same names for the script options (short and long versions) so that, if you have used the shell scripts, you will easily know how to run the corresponding subcommand_ with the given options.
- ebooktools.py_ is the name of the Python script. It is always referred to that way in this document (i.e. no hyphen and ending with .py) to distinguish it from the original Shell project ebook-tools.
- pyebooktools_ is the name of the Python package that `you need to install <#install-pyebooktools>`__ to have access to the ebooktools.py script.

Installation and dependencies
=============================

To install the ebooktools.py script, follow these steps:

1. Install the dependencies `below <#other-dependencies>`__.
2. Install the pyebooktools package `below <#install-pyebooktools>`__.

Python dependencies
-------------------

- Platforms: macOS [soon Linux]
- Python: >= 3.6
- lxml >= 4.4 for parsing calibre's metadata.opf files.
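
For reference, reading the fields a rename is based on from a calibre ``metadata.opf`` file might look like the sketch below. pyebooktools uses lxml for this. The sketch swaps in the standard library's ``xml.etree.ElementTree`` so that it has no dependencies. It assumes calibre's usual OPF 2.0 layout (``dc:title``, ``dc:creator`` and a ``<meta name="calibre:series">`` element) and is not the code of rename_calibre_library.py.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

# Namespaces used in calibre's metadata.opf files (OPF 2.0 + Dublin Core).
NS = {"opf": "http://www.idpf.org/2007/opf", "dc": "http://purl.org/dc/elements/1.1/"}


def parse_opf_metadata(xml_data: Union[str, bytes]) -> Dict[str, object]:
    """Extract title, authors and calibre series from the contents of a metadata.opf file."""
    metadata = ET.fromstring(xml_data).find("opf:metadata", NS)
    if metadata is None:
        raise ValueError("no <metadata> element found")
    title = metadata.findtext("dc:title", default="", namespaces=NS).strip()
    authors = [el.text.strip() for el in metadata.findall("dc:creator", NS) if el.text]
    series: Optional[str] = None
    # calibre stores the series as <meta name="calibre:series" content="..."/>.
    for meta in metadata.findall("opf:meta", NS):
        if meta.get("name") == "calibre:series":
            series = meta.get("content")
    return {"title": title, "authors": authors, "series": series}
```

Passing the file's bytes, e.g. ``parse_opf_metadata(Path("metadata.opf").read_bytes())`` with ``from pathlib import Path``, lets the parser honour the encoding declared in the XML header.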

:information_source:

When installing the pyebooktools package `below <#install-pyebooktools>`__, the lxml library is automatically installed if it is not found, or upgraded if the installed version is not supported.
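
If you want to confirm these requirements yourself, for example before installing into an existing environment, a small check along these lines should work. The helper names are hypothetical and not part of pyebooktools. lxml's version is read from ``lxml.etree.LXML_VERSION``.

```python
import platform
import sys
from typing import List, Optional, Tuple


def installed_lxml_version() -> Optional[Tuple[int, ...]]:
    """Return lxml's version as a tuple such as (4, 9, 3, 0), or None if it cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return tuple(etree.LXML_VERSION)


def check_requirements(python: Tuple[int, ...] = (3, 6), lxml: Tuple[int, ...] = (4, 4)) -> List[str]:
    """Return human-readable problems; an empty list means both minimums are met."""
    problems: List[str] = []
    if sys.version_info[:2] < python:
        problems.append("Python %s or newer is required (running %s)"
                        % (".".join(map(str, python)), platform.python_version()))
    found = installed_lxml_version()
    if found is None:
        problems.append("lxml is not installed")
    elif found[:len(lxml)] < lxml:
        problems.append("lxml %s or newer is required (found %s)"
                        % (".".join(map(str, lxml)), ".".join(map(str, found))))
    return problems


if __name__ == "__main__":
    for problem in check_requirements() or ["all Python requirements are met"]:
        print(problem)
```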

Other dependencies
------------------

As explained in the `documentation for ebook-tools <https://github.com/na--/ebook-tools#shell-scripts>`__, you need recent versions of:

- calibre_ for fetching metadata from online sources, conversion to txt (for ISBN searching) and ebook metadata extraction. Versions 2.84 and above are preferred because they let you manually specify the online source from which to fetch metadata. For earlier versions, you have to set ``isbn_metadata_fetch_order`` and ``organize_without_isbn_sources`` to empty strings.
- p7zip_ for ISBN searching in ebooks that are in archives.
- Tesseract_ for running OCR on books; version 4 gives better results. OCR is disabled by default, and another engine can be configured if preferred.
- Optionally, poppler, catdoc and DjVuLibre_ can be installed to convert .pdf, .doc and .djvu files (respectively) to .txt faster than calibre does.

  :warning:

  On macOS, you don't need catdoc_ since macOS has the built-in textutil_ command-line tool, which converts any txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive file.
- Optionally, the Goodreads_ and WorldCat xISBN_ calibre plugins can be installed for better metadata fetching.
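
A quick way to see which of these tools are reachable is to look them up on ``PATH``. The sketch below is hypothetical and not part of pyebooktools. The executable names (``ebook-convert``, ``ebook-meta`` and ``fetch-ebook-metadata`` for calibre, ``7z`` for p7zip, ``tesseract``, ``pdftotext`` for poppler, ``djvutxt`` for DjVuLibre, and ``textutil`` on macOS) are assumptions based on the command-line tools these packages usually install.

```python
import shutil
import sys
from typing import Callable, Dict, List, Optional

# Assumed executable names for each dependency listed above, grouped by importance.
ESSENTIAL = {"calibre": ["ebook-convert", "ebook-meta", "fetch-ebook-metadata"]}
RECOMMENDED = {"p7zip": ["7z"], "Tesseract": ["tesseract"]}
OPTIONAL = {
    "poppler": ["pdftotext"],
    # On macOS the built-in textutil takes catdoc's place.
    "catdoc": ["textutil" if sys.platform == "darwin" else "catdoc"],
    "DjVuLibre": ["djvutxt"],
}


def missing_tools(groups: Dict[str, List[str]],
                  which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, List[str]]:
    """Map each package to those of its executables that cannot be found on PATH."""
    missing: Dict[str, List[str]] = {}
    for package, commands in groups.items():
        absent = [cmd for cmd in commands if which(cmd) is None]
        if absent:
            missing[package] = absent
    return missing


if __name__ == "__main__":
    for tier, groups in (("essential", ESSENTIAL), ("recommended", RECOMMENDED), ("optional", OPTIONAL)):
        for package, absent in missing_tools(groups).items():
            print("%s (%s): %s not found on PATH" % (package, tier, ", ".join(absent)))
```

Run it as a script to list the missing executables by tier. The ``which`` parameter exists only so that the lookup can be replaced in tests.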

:star:

If calibre is the only one of these dependencies you install, you will still have a functioning program that can organize and manage your ebook collections:

- fetching metadata from online sources will work: `by default <https://manual.calibre-ebook.com/generated/en/fetch-ebook-metadata.html#cmdoption-fetch-ebook-metadata-allowed-plugin>`__, calibre comes with Amazon and Google sources, among others
- conversion to txt will work: calibre's own ebook-convert_ tool will be used

All subcommands_ should work, but accuracy and performance will be affected as explained in the list of dependencies above.
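
The fallback behaviour described above (a faster dedicated converter when one is installed, calibre's ebook-convert otherwise) can be illustrated as follows. This is an assumption-based sketch, not pyebooktools' actual selection logic, and the extension-to-tool table is hypothetical.

```python
import os
import shutil
from typing import Callable, Optional

# Hypothetical preference table: dedicated converters that are faster than calibre.
FAST_CONVERTERS = {
    ".pdf": "pdftotext",  # poppler
    ".djvu": "djvutxt",   # DjVuLibre
    ".doc": "catdoc",     # catdoc (textutil would be the macOS choice)
}


def pick_txt_converter(path: str,
                       which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the command to use for converting ``path`` to .txt, or None if nothing is available."""
    fast = FAST_CONVERTERS.get(os.path.splitext(path)[1].lower())
    if fast is not None and which(fast):
        return fast
    # calibre's ebook-convert handles most ebook formats, so it serves as the fallback.
    if which("ebook-convert"):
        return "ebook-convert"
    return None
```

With only calibre installed, every format resolves to ``ebook-convert``, which matches the "calibre only" setup described above.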

Install `pyebooktools`
----------------------

1. First install the `Python dependencies <#python-dependencies>`__ and `other tools <#other-dependencies>`__.
2. It is highly recommended to install the pyebooktools package in a virtual environment using, for example, venv_ or conda_.
3. Make sure to update pip::

     $ pip install --upgrade pip

4. Install the pyebooktools package (bleeding-edge version) with pip::

     $ pip install git+https://github.com/raul23/pyebooktools#egg=pyebooktools

:warning:

Make sure that pip is working with the correct Python version. It might be the case that pip is using Python 2.x. You can find out which Python version pip uses with the following::

  $ pip -V

If pip is working with the wrong Python version, try using pip3, which works with Python 3.x.
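
Another way to avoid the pip/pip3 ambiguity is to run pip as a module of the interpreter you intend to use (``python -m pip`` or ``python3 -m pip``), which ensures that both use the same Python. The helper below only illustrates that idea, and ``pip_command`` is a hypothetical name.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation tied to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # "pip --version" reports the Python version pip runs under; because
    # sys.executable is used, it is always this interpreter's version.
    subprocess.run(pip_command("--version"), check=True)
```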

Test installation
~~~~~~~~~~~~~~~~~

Test your installation by importing pyebooktools and printing its version::

  $ python -c "import pyebooktools; print(pyebooktools.__version__)"

You can also test that you have access to the ebooktools.py script by showing the program's version::

  $ ebooktools --version
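
If you prefer to check from Python without importing the package (for example, in a setup script), you can query the installed distribution metadata instead. This sketch assumes Python 3.8+ for ``importlib.metadata`` and the distribution name ``pyebooktools`` used in the pip command above. ``installed_version`` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of ``dist_name``, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("pyebooktools") or "pyebooktools is not installed")
```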

Usage, options and configuration
================================

All of the options documented below can either be passed to the ebooktools.py_ script via command-line arguments or via the configuration file ``config
