VoucherVision
Initiated by the University of Michigan Herbarium, VoucherVision harnesses the power of large language models (LLMs) to transform the transcription process of natural history specimen labels.
Hello! Please check out the VoucherVisionGO version of this project first!
The regular VoucherVision repo is the more advanced solution, allowing extensive customization, but the VoucherVisionGO API will let you start transcribing your images immediately. The API is completely free for now; you just have to create an account to get access!
Table of Contents
- Table of Contents
- About
- Updates
- Try our public demo!
- Installing VoucherVision
- Create a Desktop Shortcut to Launch VoucherVision GUI (MacOS)
- Create a Desktop Shortcut to Launch VoucherVision GUI (Windows)
- Run VoucherVision
- Custom Prompt Builder
- Expense Reporting
- User Interface Images
About
VoucherVision - In Beta Testing Phase 🚀
For inquiries, feedback (or if you want to get involved!) please complete our form.
UPDATES
Feb. 20, 2025
- The recommended workflow is now the following:
- OCR: Gemini 2.0 Flash
- LLM: Gemini 2.0 Flash
- Prompt: SLTPvM_default.yaml
- What are we working on right now?
- A simple API. You upload an image, you get the JSON or CSV.
- Docker containerization. Thanks Megi!
- Stress-testing and stability testing: MICH herbarium has processed ~50,000 images so far
- Testing the VV Editor in the MICH workflow
- Adding extremely cool auto-correction tools that fix transcription errors based on already-transcribed specimens
- We are narrowing in on a couple of workflows that should work for most institutions; this repo will transition to my dev environment, and a new branch will be used for deployment.
- Make sure that you do not install PyTorch 2.6+, because it currently breaks the LeafMachine2 YOLO model that we use to identify text.
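Because the PyTorch 2.6+ incompatibility is easy to hit silently, a quick version guard before launching can save a broken run. This is a minimal sketch of our own (the helper name and check are not part of VoucherVision), assuming only that the installed `torch` version string starts with `major.minor`:

```python
import importlib.metadata

def torch_is_compatible(installed: str) -> bool:
    """Return True if the installed torch version is below the 2.6 line."""
    # Versions like "2.5.1+cu121" still parse: we only look at major.minor.
    major, minor = (int(part) for part in installed.split(".")[:2])
    return (major, minor) < (2, 6)

if __name__ == "__main__":
    try:
        ver = importlib.metadata.version("torch")
        status = "OK" if torch_is_compatible(ver) else "too new for the LeafMachine2 YOLO model"
        print(f"torch {ver}: {status}")
    except importlib.metadata.PackageNotFoundError:
        print("torch is not installed")
```

Running this inside your VoucherVision environment flags a too-new PyTorch before it breaks text detection.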
December 4, 2024
- This update will require you to delete your `expense_report.csv` file, allowing VoucherVision to create a new one. The headers were updated, so new runs will not be able to merge with existing data. You can make a copy of the file for your records and then delete `./VoucherVision/expense_report/expense_report.csv`
- I am transitioning away from LangChain; calls to `LLM_XXXXX.py` files will undergo changes
- Many, many new OCR engines are supported including
- All current Hyperbolic VLMs ("Qwen/Qwen2-VL-72B-Instruct", "Qwen/Qwen2-VL-7B-Instruct", "mistralai/Pixtral-12B-2409")
- Google Gemini models (Gemini-1.5-Pro, Gemini-1.5-Flash, Gemini-1.5-Flash-8B)
- OpenAI (GPT4o-mini, GPT4o)
- Locally hosted models, your computer must have an NVIDIA GPU with at least 24 GB of VRAM (Florence-2, Qwen2-VL)
- Since we are moving toward VLMs for OCR, I have improved cost tracking to include these too
- 💥 THE BEST OCR + LLM OPTIONS RIGHT NOW 💥
  - The first/only OCR engine to reliably read the difficult cursive in the included demo image is `Gemini-1.5-Pro`!
  - The best option for locally hosted OCR is by far `Qwen2-VL`, with `Florence-2` coming in 2nd
  - The best LLMs for parsing the OCR are still `Gemini-1.5-Pro` and `GPT-4o`, due to their 'knowledge' and ability to correct/infer OCR errors/omissions
  - The best value is `GPT-4o-mini` and `Gemini-1.5-Flash`, which are highly capable at parsing but lack the ability to significantly correct/infer OCR errors/omissions
  - The best locally hosted LLM seems to be `mistralai/Mistral-Small-Instruct-2409`, but I still need to add more local options that work reliably with 12-24 GB of VRAM, which are usually 7B models or quantized versions of the full-precision models
- The HuggingFace version does not include all of the OCR and LLM options that are available if you install VoucherVision locally
Overview:
Our workflow is as follows:
- Text extraction from specimen labels with LeafMachine2.
- Text interpretation using Google Vision OCR.
- LLMs, including GPT-3.5, GPT-4, PaLM 2, and Azure instances of OpenAI models, standardize the OCR output into a consistent spreadsheet format. This data can then be integrated into various databases like Specify, Symbiota, and BRAHMS.
For ensuring accuracy and consistency, the VoucherVisionEditor serves as a quality control tool.
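To illustrate the last step of the workflow, here is a small sketch of turning a standardized LLM response into a CSV row ready for import into a database. The field names are illustrative Darwin Core-style examples of ours, not VoucherVision's exact schema:

```python
import csv
import io
import json

# Hypothetical standardized LLM output for one specimen label
# (field names are illustrative, not VoucherVision's actual schema).
llm_output = json.loads(
    '{"catalogNumber": "MICH-1234567", '
    '"scientificName": "Acer rubrum L.", '
    '"country": "USA", "eventDate": "1998-05-14"}'
)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(llm_output))
writer.writeheader()          # column headers in a consistent order
writer.writerow(llm_output)   # one row per transcribed specimen
csv_text = buffer.getvalue()
print(csv_text)
```

Because every specimen is forced into the same columns, rows from many runs can be concatenated and handed to Specify, Symbiota, or BRAHMS importers.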
Thanks to all of our collaborating institutions!
Package Information:
The main VoucherVision tool and the VoucherVisionEditor are packaged separately. This separation ensures that lower-performance computers can still install and utilize the editor. While VoucherVision is optimized to function smoothly on virtually any modern system, maximizing its capabilities (like using LeafMachine2 label collages or running Retrieval Augmented Generation (RAG) prompts) mandates a GPU.
NOTE: You can absolutely run VoucherVision on computers that do not have a GPU, but the LeafMachine2 collage will run slower.
Try our public demo!
Our public demo, while lacking several quality control and reliability features found in the full VoucherVision module, provides an exciting glimpse into its capabilities. Feel free to upload your herbarium specimen and see what happens! VoucherVision Demo
Installing VoucherVision (using PIP)
Prerequisites
- Python 3.10.4 or later
- Optional: an Nvidia GPU + CUDA for running LeafMachine2
Installation - Cloning the VoucherVision Repository
- First, install Python 3.10, or greater, on your machine of choice. We have validated up to Python 3.11.
- Make sure that you can use `pip` to install packages on your machine, or at least inside of a virtual environment.
  - Simply type `pip` into your terminal or PowerShell. If you see a list of options, you are all set. Otherwise, see either this PIP Documentation or this help page
- Open a terminal window and `cd` into the directory where you want to install VoucherVision.
- In the Git BASH terminal, clone the VoucherVision repository from GitHub by running:

  ```
  git clone https://github.com/Gene-Weaver/VoucherVision.git
  ```

- Move into the VoucherVision directory by running `cd VoucherVision` in the terminal.
- Update submodules:

  ```
  git submodule update --init --recursive
  ```
- To run VoucherVision we need to install its dependencies inside of a Python virtual environment. Follow the instructions below for your operating system.
About Python Virtual Environments
A virtual environment is a tool to keep the dependencies required by different projects in separate places, by creating isolated python virtual environments for them. This avoids any conflicts between the packages that you have installed for different projects. It makes it easier to maintain different versions of packages for different projects.
For more information about virtual environments, please see Creation of virtual environments
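Before installing VoucherVision's dependencies, you can confirm that a virtual environment is actually active. This short check of ours relies on standard Python behavior: inside a venv, `sys.prefix` is rewired to the environment directory while `sys.base_prefix` keeps pointing at the base interpreter.

```python
import sys

def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv-style environment."""
    # venv changes sys.prefix; sys.base_prefix stays on the base install.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment")
```

If this prints "no virtual environment", activate your environment before running `pip install`.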
Installation - Windows 10+
Installation should basically be the same for Linux.
Virtual Environment
- Still inside the VoucherVision directory, show that a venv is currently not active:

  ```
  python --version
  ```

- Then create the virtual environment (`.venv_VV` is the name of our new virtual environment):

  ```
  python -m venv .venv_VV
  ```