Handprint
Apply different text recognition services to images of handwritten documents.
The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of document pages. It can produce annotated images showing the results, compare the recognized text to expected text, save the HTR service results as JSON and text files, and more.
Table of Contents
- Introduction
- Installation and configuration
- Usage
- Getting help
- Contributing
- License
- Authors and history
- Acknowledgments
Introduction
Handprint (Handwritten Page Recognition Test) is a tool for comparing alternative services for offline handwritten text recognition (HTR). It was developed for use with documents from the Caltech Archives, but it is completely independent and can be applied to any images of text documents.
Handprint can generate images with the recognized text overlaid on them to visualize the results. The image at right shows an example. Among other features, the software can also display bounding boxes, threshold results by confidence values, compare full-text results to expected/ground-truth results, and output the raw results from an HTR service as JSON and text files. It can work with individual images, directories of images, and URLs pointing to images on remote servers. Finally, Handprint can use multiple processor threads for parallel execution.
Services supported include Google's Cloud Vision API, Microsoft's Azure Computer Vision API, and Amazon's Textract and Rekognition. The framework for connecting to services could be expanded to support others as well (and contributions are welcome!).
Installation and configuration
The instructions below assume you have a Python interpreter version 3.8 or higher installed on your computer; if that's not the case, please first install Python and familiarize yourself with running Python programs on your system. If you are unsure of which version of Python you have, you can find out by running the following command in a terminal and inspecting the results:
# Note: on Windows, you may have to use "python" instead of "python3"
python3 --version
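The same check can be done from inside Python itself. The following is a minimal sketch, assuming only the 3.8 minimum stated above; the function name meets_minimum is made up for this illustration and is not part of Handprint.

```python
import sys

# Minimum version stated in the installation instructions above.
MINIMUM = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) part of version_info is at least minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found} is recent enough for Handprint")
    else:
        print(f"Python {found} is too old; version 3.8 or higher is needed")
```

Saving this to a file and running it with python3 reports whether the interpreter that ran it satisfies the requirement.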
Note for Mac users: if you are using macOS Catalina (10.15) or later and have never run python3, then the first time you do, macOS will ask you if you want to install the macOS command-line developer tools. Go ahead and do so, as this is the easiest way to get a recent-enough Python 3 on those systems.
Handprint includes several adapters for working with cloud-based HTR services from Amazon, Google, and Microsoft, but does not include credentials for using the services. To be able to use Handprint, you must both install a copy of Handprint on your computer and supply your copy with credentials for accessing the cloud services you want to use. See below for more.
⓵ Install Handprint on your computer
Approach 1: using the standalone Handprint executables
Beginning with version 1.5.1, self-contained single-file executables are available for select combinations of operating system and Python version. To use them, you only need a Python 3 interpreter and a copy of Handprint; you do not need to run pip install or other steps. Instructions for each operating system follow.
macOS
Visit the Handprint releases page and look for ZIP files with names such as handprint-1.5.4-macos-python3.8.zip. Then:
- Download the one matching your version of Python
- Unzip the file (if your browser did not automatically unzip it for you)
- Open the folder thus created (it will have a name like handprint-1.5.4-macos-python3.8)
- Look inside for handprint and move it to a location where you put other command-line programs (e.g., /usr/local/bin)
Linux
Visit the Handprint releases page and look for ZIP files with names such as handprint-1.5.4-linux-python3.8.zip. Then:
- Download the one matching your version of Python
- Unzip the file (if your browser did not automatically unzip it for you)
- Open the folder thus created (it will have a name like handprint-1.5.4-linux-python3.8)
- Look inside for handprint and move it to a location where you put other command-line programs (e.g., /usr/local/bin)
Standalone executables for Windows are not available at this time. If you are running Windows, please use one of the other methods described below.
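To help pick the right download, the sketch below builds the archive name you would expect for a given Handprint release, operating system, and Python version. The naming pattern is inferred only from the two example file names above, so treat it as an assumption and confirm against the actual releases page; the function names are made up for this illustration.

```python
import platform
import sys


def expected_archive_name(handprint_version, os_name, python_version):
    """Build a ZIP file name following the pattern seen in the examples above."""
    return f"handprint-{handprint_version}-{os_name}-python{python_version}.zip"


def local_os_name():
    """Map this machine's platform to the OS label used in the example names.

    Returns None on systems (such as Windows) that have no standalone executable.
    """
    return {"Darwin": "macos", "Linux": "linux"}.get(platform.system())


if __name__ == "__main__":
    os_name = local_os_name()
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    if os_name is None:
        print("No standalone executable is available for this operating system")
    else:
        # "1.5.4" is the release number used in the examples above.
        print(expected_archive_name("1.5.4", os_name, python_version))
```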
Approach 2: using pipx
You can use pipx to install Handprint. Pipx will install it into a separate Python environment that isolates the dependencies needed by Handprint from other Python programs on your system, yet the resulting handprint command will be executable from any shell, like any normal application on your computer. If you do not already have pipx on your system, it can be installed in a variety of ways; it is best to consult Pipx's installation guide for instructions. Once you have pipx on your system, you can install Handprint with the following command:
pipx install handprint
Pipx can also let you run Handprint directly using pipx run handprint, although in that case, you must always prefix every Handprint command with pipx run. Consult the documentation for pipx run for more information.
Approach 3: using pip
If you prefer, you can install Handprint with pip. If you don't have pip or are uncertain whether you do, please consult the pip installation instructions. Then, to install or upgrade Handprint from the Python package repository, run the following command:
python3 -m pip install handprint --upgrade
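After a pip installation, one way to confirm that the package is visible to your interpreter is to query its metadata. This is a minimal sketch using the standard library's importlib.metadata module (available in Python 3.8 and later); it assumes the distribution name is handprint, as in the pip command above, and the function name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(distribution="handprint"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("handprint is not installed for this Python interpreter")
    else:
        print(f"handprint {version} is installed")
```

Note that this only inspects the interpreter that runs the script, so a copy installed with pipx or as a standalone executable will not be detected.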
⓶ Add cloud service credentials
A one-time configuration step is needed for each cloud-based HTR service after you install Handprint on a computer. This step supplies Handprint with credentials to access the services. In each case, the same command format is used:
handprint -a SERVICENAME CREDENTIALSFILE.json
SERVICENAME must be one of the service names printed by running handprint -l, and CREDENTIALSFILE.json must have one of the formats discussed below. When you run this command, Handprint copies CREDENTIALSFILE.json to a private location, and thereafter uses the credentials to access SERVICENAME. (The private location is different on different systems; for example, on macOS it is ~/Library/Application Support/Handprint/.) Examples are given below.
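If you script this step, it can help to confirm that the credentials file is well-formed JSON before handing it to Handprint. The sketch below only builds the argument list for the command format shown above; the function name is made up for this illustration, and the service name passed to it must still be one of those printed by handprint -l.

```python
import json
from pathlib import Path


def add_credentials_command(service_name, credentials_file):
    """Check that the file parses as JSON, then build the `handprint -a` command.

    Raises json.JSONDecodeError if the file is malformed and
    FileNotFoundError if it does not exist.
    """
    path = Path(credentials_file)
    with path.open(encoding="utf-8") as handle:
        json.load(handle)  # parsed only to validate; the content is not used here
    return ["handprint", "-a", service_name, str(path)]
```

The returned list can be passed to subprocess.run(..., check=True) on a machine where the handprint command is installed.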
Microsoft
Microsoft's approach to credentials in Azure involves the use of subscription keys. The credentials file for Handprint needs to contain two fields:
{
  "subscription_key": "YOURKEYHERE",
  "endpoint": "https://ENDPOINT"
}
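A quick way to catch a missing or empty field before registering the file is sketched below. It assumes only the two field names shown above; the function name is made up for this illustration and is not part of Handprint.

```python
import json

# The two fields the Microsoft credentials file is described as needing.
REQUIRED_FIELDS = ("subscription_key", "endpoint")


def missing_microsoft_fields(credentials_text):
    """Return the required field names that are absent or empty in the JSON text."""
    data = json.loads(credentials_text)
    return [name for name in REQUIRED_FIELDS if not data.get(name)]
```

An empty list means both fields are present; any names returned need to be filled in.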
The value "YOURKEYHERE" will be a string such as "18de248475134eb49ae4a4e94b93461c", and it will be associated with an endpoint URI such as "https://westus.api.cognitive.microsoft.com". To obtain a key and the corresponding endpoint URI, visit https://portal.azure.com and sign
