Pyebooktools
Command-line program for organizing and managing ebook collections. It is a Python port of the original shell scripts ebook-tools.
.. raw:: html

   <p align="center"> <img src="https://raw.githubusercontent.com/raul23/pyebooktools/master/docs/logo/pyebooktools.png"> <br> 🚧 <b>Work-In-Progress</b> </p>

This project (version 0.1.0a3) is a Python port of ebook-tools_, which is
written in Shell by na--. The Python script ``ebooktools.py`` is a collection
of tools for the automated organization and management of large ebook
collections.

Also check out my other project, search-ebooks_, which is based on
pyebooktools_ and searches through the content and metadata of ebooks.
:warning:
Check out `organize-ebooks <https://github.com/raul23/organize-ebooks>`_, which is the Python port of `organize-ebooks.sh <https://github.com/na--/ebook-tools/blob/master/organize-ebooks.sh>`_ and includes a `Docker image <https://hub.docker.com/repository/docker/raul23/organize/general>`_ for easy installation of the Python package and all its needed dependencies.
About
The ebooktools.py_ script is a Python port of the shell scripts_ from
ebook-tools_ and makes use of the following modules:
- ``edit_config.py`` edits a configuration file, which can be either the main
  config file that contains all the options defined
  `below <#usage-options-and-configuration>`__ or the logging config file
  whose default values are defined in ``default_logging.py``. The ``edit``
  subcommand from the ``ebooktools.py`` script uses this module.
- ``convert_to_txt.py`` converts the supplied file to a text file. It can
  optionally also use OCR for ``.pdf``, ``.djvu`` and image files. The
  convert_ subcommand from the ``ebooktools.py`` script uses this module.
- ``find_isbns.py`` tries to find `valid ISBNs`_ inside a file, or in a
  string if no file was specified. Searching for ISBNs in files uses
  progressively more resource-intensive methods until some ISBNs are found.
  For more details, see the documentation for ebook-tools_ (shell scripts) or
  `search_file_for_isbns()`_ from ``lib.py`` (the Python function where the
  ISBN search in files is implemented). The find_ subcommand from the
  ``ebooktools.py`` script uses this module.
- ``organize_ebooks.py`` is used to automatically organize folders with
  potentially huge amounts of unorganized ebooks. This is done by renaming
  the files with proper names and moving them to other folders:

  - By default, it searches_ the supplied ebook files for ISBNs_, downloads
    the book metadata (author, title, series, publication date, etc.) from
    online sources like Goodreads, Amazon and Google Books, and renames the
    files according to a specified template.
  - If no ISBN is found, the script can optionally search for the ebooks
    online by their title and author, which are extracted from the filename
    or file metadata.
  - Optionally, an additional file that contains all the gathered ebook
    metadata can be saved together with the renamed book so it can later be
    used for additional verification, indexing or processing.
  - Most ebook types are supported: ``.epub``, ``.mobi``, ``.azw``, ``.pdf``,
    ``.djvu``, ``.chm``, ``.cbr``, ``.cbz``, ``.txt``, ``.lit``, ``.rtf``,
    ``.doc``, ``.docx``, ``.pdb``, ``.html``, ``.fb2``, ``.lrf``, ``.odt``,
    ``.prc`` and potentially others. Even compressed ebooks in arbitrary
    archive files are supported. For example, a ``.zip``, ``.rar`` or other
    archive file that contains the ``.pdf`` or ``.html`` chapters of an ebook
    can be organized without a problem.
  - Optical character recognition (`OCR [Wikipedia] <https://en.wikipedia.org/wiki/Optical_character_recognition>`_)
    can be automatically used for ``.pdf``, ``.djvu`` and image files when no
    ISBNs were found in them by the fast and straightforward conversion to
    ``.txt``. This is very useful for scanned ebooks that only contain images
    or were badly OCR-ed in the first place.
  - Files are checked for corruption (zero-filled files, broken PDFs, corrupt
    archives, etc.) and corrupt files can optionally be moved to another
    folder.
  - Non-ebook documents, pamphlets and pamphlet-like documents like saved
    webpages, short PDFs, etc. can also be detected and optionally moved to
    another folder.

  Ref.: [ORG]_

  The organize_ subcommand from the ``ebooktools.py`` script uses this module.

- ``rename_calibre_library.py`` traverses a calibre library folder and renames
  all the book files in it by reading their metadata from calibre's
  ``metadata.opf`` files. Then the book files are either moved or symlinked to
  an output folder along with their corresponding metadata files. The rename_
  subcommand from the ``ebooktools.py`` script uses this module.
- ``split_into_folders.py`` splits the supplied ebook files (and the
  accompanying metadata files if present) into folders with consecutive names
  that each contain the specified number of files. The split_ subcommand from
  the ``ebooktools.py`` script uses this module.
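Deciding whether a candidate string is a *valid* ISBN comes down to the
standard check-digit test. The following is a minimal, self-contained sketch
of that test (ISBN-10: digits weighted 10 down to 1, sum divisible by 11;
ISBN-13: digits weighted alternately 1 and 3, sum divisible by 10). The
function name ``is_valid_isbn`` is hypothetical; this is not the code of
``find_isbns.py``, which may apply additional filters.

```python
import re


def is_valid_isbn(candidate: str) -> bool:
    """Return True if `candidate` passes the ISBN-10 or ISBN-13 checksum.

    Hypothetical helper for illustration, not part of pyebooktools' API.
    Hyphens and whitespace are ignored.
    """
    isbn = re.sub(r"[\s-]", "", candidate).upper()

    if re.fullmatch(r"\d{9}[\dX]", isbn):
        # ISBN-10: weights 10..1; a trailing 'X' stands for the value 10.
        total = sum(
            (10 if ch == "X" else int(ch)) * weight
            for ch, weight in zip(isbn, range(10, 0, -1))
        )
        return total % 11 == 0

    if re.fullmatch(r"\d{13}", isbn):
        # ISBN-13: weights alternate 1, 3, 1, 3, ... starting from the left.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
        return total % 10 == 0

    return False


if __name__ == "__main__":
    for s in ("0-306-40615-2", "978-0-306-40615-7", "978-0-306-40615-8"):
        print(s, is_valid_isbn(s))
```

A checksum only rules out typos and random digit runs; a string of zeros, for
instance, still passes, which is why a real search may need extra checks.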
Thus, you have access to various subcommands_ from within the
ebooktools.py script.
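The grouping performed by the split_ subcommand can be pictured with a small
sketch: sort the files, cut the list into chunks of a fixed size, and give
each chunk a consecutive, zero-padded folder name. The function name
``plan_split`` and the naming scheme (a start number padded to a fixed width)
are assumptions for illustration, not the actual options or behavior of
``split_into_folders.py``. The sketch only computes the plan: it moves no
files and ignores the accompanying metadata files.

```python
from typing import Dict, List


def plan_split(files: List[str], files_per_folder: int,
               start_number: int = 0, width: int = 5) -> Dict[str, List[str]]:
    """Map consecutive folder names to chunks of at most `files_per_folder` files.

    Hypothetical helper: the zero-padded numbering starting at `start_number`
    is an assumed naming scheme, not pyebooktools' exact behavior.
    """
    if files_per_folder < 1:
        raise ValueError("files_per_folder must be >= 1")
    ordered = sorted(files)
    plan: Dict[str, List[str]] = {}
    for offset in range(0, len(ordered), files_per_folder):
        # Folder index grows by one for every full chunk of files.
        folder = str(start_number + offset // files_per_folder).zfill(width)
        plan[folder] = ordered[offset:offset + files_per_folder]
    return plan


if __name__ == "__main__":
    books = [f"book{i}.epub" for i in range(1, 6)]
    for folder, chunk in plan_split(books, files_per_folder=2).items():
        print(folder, chunk)
```

Only the last folder can hold fewer files than requested.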
:star:

- ebook-tools_ is the original Shell project I ported to Python. I used the
  same names for the script options (short and long versions) so that if you
  have used the shell scripts, you will easily know how to run the
  corresponding subcommand_ with the given options.
- ebooktools.py_ is the name of the Python script, which will always be
  referred to that way in this document (i.e. no hyphen and ending with
  ``.py``) to distinguish it from the original Shell project ebook-tools.
- pyebooktools_ is the name of the Python package that you need to
  `install <#install-pyebooktools>`__ to have access to the
  ``ebooktools.py`` script.
Installation and dependencies
To install the script ``ebooktools.py``, follow these steps:

- Install the dependencies `below <#other-dependencies>`__.
- Install the ``pyebooktools`` package `below <#install-pyebooktools>`__.
Python dependencies
- Platforms: macOS (Linux support coming soon)
- Python: >= 3.6
- ``lxml`` >= 4.4 for parsing calibre's ``metadata.opf`` files
:information_source:
When installing the ``pyebooktools`` package
`below <#install-pyebooktools>`__, the ``lxml`` library is automatically
installed if it is not found, or upgraded to the supported version.
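To give an idea of what parsing calibre's ``metadata.opf`` involves, here is
a minimal sketch that extracts the title and authors from an OPF document.
It uses the standard library's ``xml.etree.ElementTree`` so that it runs
without ``lxml`` (whose ``etree`` module offers a largely compatible API) and
assumes the usual OPF and Dublin Core namespaces. The helper name
``read_title_and_authors`` and the sample document are hypothetical; this is
not the code used by ``rename_calibre_library.py``.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespaces used in OPF package documents (Dublin Core for metadata fields).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed-down metadata.opf content used for the demo below.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Example Book</dc:title>
    <dc:creator opf:role="aut">Jane Doe</dc:creator>
    <dc:creator opf:role="aut">John Roe</dc:creator>
  </metadata>
</package>"""


def read_title_and_authors(opf_text: str) -> Tuple[str, List[str]]:
    """Return (title, authors) parsed from the text of an OPF file.

    Simplification: every dc:creator is treated as an author (opf:role is
    not checked).
    """
    root = ET.fromstring(opf_text)  # the root element is <package>
    title = root.findtext("opf:metadata/dc:title", default="", namespaces=NS)
    creators = root.findall("opf:metadata/dc:creator", NS)
    return title.strip(), [(el.text or "").strip() for el in creators]


if __name__ == "__main__":
    print(read_title_and_authors(SAMPLE_OPF))
```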
Other dependencies
As explained in the `documentation for ebook-tools <https://github.com/na--/ebook-tools#shell-scripts>`__,
you need recent versions of:

- calibre_ for fetching metadata from online sources, conversion to txt (for
  ISBN searching) and ebook metadata extraction. Versions 2.84 and above are
  preferred because they let you manually specify from which online source
  the metadata is fetched. For earlier versions, you have to set
  ``isbn_metadata_fetch_order`` and ``organize_without_isbn_sources`` to
  empty strings.
- p7zip_ for ISBN searching in ebooks that are in archives.
- Tesseract_ for running OCR on books. Version 4 gives better results even
  though it's still in alpha. OCR is disabled by default and another engine
  can be configured if preferred.
- Optionally, poppler_, catdoc_ and DjVuLibre_ can be installed to convert
  ``.pdf``, ``.doc`` and ``.djvu`` files, respectively, to ``.txt`` faster
  than calibre does.

  :warning: On macOS, you don't need catdoc_ since it has the built-in
  textutil_ command-line tool that converts any ``txt``, ``html``, ``rtf``,
  ``rtfd``, ``doc``, ``docx``, ``wordml``, ``odt``, or ``webarchive`` file.

- Optionally, the Goodreads_ and `WorldCat xISBN`_ calibre plugins can be
  installed for better metadata fetching.
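Before running ``ebooktools.py``, you may want to check which of these tools
are reachable from your ``PATH``. The sketch below does that with
``shutil.which``. The executable names (``ebook-convert``, ``ebook-meta`` and
``fetch-ebook-metadata`` for calibre, ``7z`` for p7zip, ``tesseract``,
``pdftotext`` for poppler, ``catdoc``, ``djvutxt`` for DjVuLibre and
``textutil`` on macOS) are the usual command names of these packages, but
they are an assumption about your installation, not a list taken from
pyebooktools.

```python
import shutil
from typing import Dict

# Usual executable names for the dependencies listed above (assumption: your
# package manager installs them under these names).
TOOLS = {
    "calibre (ebook-convert)": "ebook-convert",
    "calibre (ebook-meta)": "ebook-meta",
    "calibre (fetch-ebook-metadata)": "fetch-ebook-metadata",
    "p7zip": "7z",
    "Tesseract": "tesseract",
    "poppler (optional)": "pdftotext",
    "catdoc (optional)": "catdoc",
    "DjVuLibre (optional)": "djvutxt",
    "textutil (macOS only)": "textutil",
}


def find_tools(tools: Dict[str, str] = TOOLS) -> Dict[str, bool]:
    """Return, for each tool label, whether its executable is found on PATH."""
    return {label: shutil.which(exe) is not None for label, exe in tools.items()}


if __name__ == "__main__":
    for label, found in find_tools().items():
        print(("OK  " if found else "--  ") + label)
```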
|
:star:
If calibre is the only one of these dependencies that you install, you can
still have a functioning program that will organize and manage your ebook
collections:

- fetching metadata from online sources will work: by
  `default <https://manual.calibre-ebook.com/generated/en/fetch-ebook-metadata.html#cmdoption-fetch-ebook-metadata-allowed-plugin>`__
  calibre comes with Amazon and Google sources, among others;
- conversion to txt will work: calibre's own ebook-convert_ tool will be used.

All subcommands_ should work, but accuracy and performance will be
affected, as explained in the list of dependencies above.
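The fallback described above (use a faster dedicated converter when one is
installed, otherwise calibre's ebook-convert_) can be sketched as a function
that builds the command line for a given file. The command forms
(``pdftotext IN OUT``, ``djvutxt IN OUT``, ``textutil -convert txt IN -output
OUT``, ``ebook-convert IN OUT``) follow the tools' usual usage, but the
function name ``txt_conversion_command`` is hypothetical, and the selection
rules used by pyebooktools may differ. ``catdoc`` is left out because it
writes to standard output instead of taking an output path.

```python
import os
from typing import AbstractSet, List


def txt_conversion_command(src: str, dest: str,
                           available: AbstractSet[str]) -> List[str]:
    """Build the argv that converts `src` into the text file `dest`.

    `available` holds the names of the executables found on PATH (for
    example collected with shutil.which). A dedicated converter is preferred
    when one is installed for the file's extension; otherwise calibre's
    ebook-convert is used.
    """
    ext = os.path.splitext(src)[1].lower()
    if ext == ".pdf" and "pdftotext" in available:   # poppler
        return ["pdftotext", src, dest]
    if ext == ".djvu" and "djvutxt" in available:    # DjVuLibre
        return ["djvutxt", src, dest]
    if ext == ".doc" and "textutil" in available:    # macOS built-in
        return ["textutil", "-convert", "txt", src, "-output", dest]
    return ["ebook-convert", src, dest]              # calibre fallback


if __name__ == "__main__":
    cmd = txt_conversion_command("scan.pdf", "scan.txt", {"ebook-convert"})
    print(cmd)
    # The command could then be run with subprocess.run(cmd, check=True).
```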
Install pyebooktools
- First, install the `Python dependencies <#python-dependencies>`__ and the
  other `tools <#other-dependencies>`__.
- It is highly recommended to install the ``pyebooktools`` package in a
  virtual environment using, for example, venv_ or conda_.
- Make sure to update pip::

    $ pip install --upgrade pip

- Install the ``pyebooktools`` package (bleeding-edge version) with pip::

    $ pip install git+https://github.com/raul23/pyebooktools#egg=pyebooktools

:warning:

Make sure that pip is working with the correct Python version. It might be
the case that pip is using Python 2.x. You can find what Python version pip
uses with the following::

  $ pip -V

If pip is working with the wrong Python version, then try to use pip3, which
works with Python 3.x.
Test installation
- Test your installation by importing ``pyebooktools`` and printing its
  version::

    $ python -c "import pyebooktools; print(pyebooktools.__version__)"

- You can also test that you have access to the ``ebooktools.py`` script by
  showing the program's version::

    $ ebooktools --version
Usage, options and configuration
All of the options documented below can either be passed to the
ebooktools.py_ script via command-line arguments or via the configuration
file ``config
