VoucherVision
Initiated by the University of Michigan Herbarium, VoucherVision harnesses the power of large language models (LLMs) to transform the transcription process of natural history specimen labels.
Hello! Please check out the VoucherVisionGO version of this project first!
The regular VoucherVision repo is the more advanced solution, allowing extensive customization, but the VoucherVisionGO API will let you start transcribing your images immediately. The API is completely free for now; you just have to create an account to get access!
Table of Contents
- Table of Contents
- About
- Updates
- Try our public demo!
- Installing VoucherVision
- Create a Desktop Shortcut to Launch VoucherVision GUI (MacOS)
- Create a Desktop Shortcut to Launch VoucherVision GUI (Windows)
- Run VoucherVision
- Custom Prompt Builder
- Expense Reporting
- User Interface Images
About
VoucherVision - In Beta Testing Phase 🚀
For inquiries, feedback (or if you want to get involved!) please complete our form.
UPDATES
Feb. 20, 2025
- The recommended workflow is now the following:
- OCR: Gemini 2.0 Flash
- LLM: Gemini 2.0 Flash
- Prompt: SLTPvM_default.yaml
- What are we working on right now?
- A simple API. You upload an image, you get the JSON or CSV.
- Docker containerization. Thanks Megi!
- Stress-testing and stability testing: MICH herbarium has processed ~50,000 images so far
- Testing the VV Editor in the MICH workflow
- Adding extremely cool auto-correction tools that fix transcription errors based on already-transcribed specimens
- We are narrowing in on a couple of workflows that should work for most institutions; this repo will transition to my dev environment, and a new branch will be used for deployment.
- Make sure that you do not install PyTorch 2.6+, because it currently breaks the LeafMachine2 YOLO model that we use to identify text.
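Because the PyTorch 2.6+ incompatibility is easy to hit silently, a quick version guard before launching can save a broken run. This is a minimal sketch of our own (the helper name and check are not part of VoucherVision), assuming only that the installed `torch` version string starts with `major.minor`:

```python
import importlib.metadata

def torch_is_compatible(installed: str) -> bool:
    """Return True if the installed torch version is below the 2.6 line."""
    # Versions like "2.5.1+cu121" still parse: we only look at major.minor.
    major, minor = (int(part) for part in installed.split(".")[:2])
    return (major, minor) < (2, 6)

if __name__ == "__main__":
    try:
        ver = importlib.metadata.version("torch")
        status = "OK" if torch_is_compatible(ver) else "too new for the LeafMachine2 YOLO model"
        print(f"torch {ver}: {status}")
    except importlib.metadata.PackageNotFoundError:
        print("torch is not installed")
```

Running this inside your VoucherVision environment flags a too-new PyTorch before it breaks text detection.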
December 4, 2024
- This update will require you to delete your `expense_report.csv` file, allowing VoucherVision to create a new one. The headers were updated, so new runs will not be able to merge with existing data. You can make a copy of the file for your records and then delete `./VoucherVision/expense_report/expense_report.csv`
- I am transitioning away from LangChain; calls to `LLM_XXXXX.py` files will undergo changes
- Many, many new OCR engines are supported including
- All current Hyperbolic VLMs ("Qwen/Qwen2-VL-72B-Instruct", "Qwen/Qwen2-VL-7B-Instruct", "mistralai/Pixtral-12B-2409")
- Google Gemini models (Gemini-1.5-Pro, Gemini-1.5-Flash, Gemini-1.5-Flash-8B)
- OpenAI (GPT4o-mini, GPT4o)
- Locally hosted models, your computer must have an NVIDIA GPU with at least 24 GB of VRAM (Florence-2, Qwen2-VL)
- Since we are moving toward VLMs for OCR, I have improved cost tracking to include these too
- 💥 THE BEST OCR + LLM OPTIONS RIGHT NOW 💥
  - The first/only OCR engine to reliably read the difficult cursive in the included demo image is `Gemini-1.5-Pro`!
  - The best option for locally hosted OCR is by far `Qwen2-VL`, with `Florence-2` coming in 2nd
  - The best LLMs for parsing the OCR are still `Gemini-1.5-Pro` and `GPT-4o`, due to their 'knowledge' and ability to correct/infer OCR errors/omissions
  - The best value is `GPT-4o-mini` and `Gemini-1.5-Flash`, which are highly capable at parsing but lack the ability to significantly correct/infer OCR errors/omissions
  - The best locally hosted LLM seems to be `mistralai/Mistral-Small-Instruct-2409`, but I still need to add more local options that work reliably with 12-24 GB of VRAM, which are usually 7B models or quantized versions of the full-precision models
- The HuggingFace version does not include all of the OCR and LLM options that are available if you install VoucherVision locally
Overview:
Our workflow is as follows:
- Text extraction from specimen labels with LeafMachine2.
- Text interpretation using Google Vision OCR.
- LLMs, including GPT-3.5, GPT-4, PaLM 2, and Azure instances of OpenAI models, standardize the OCR output into a consistent spreadsheet format. This data can then be integrated into various databases like Specify, Symbiota, and BRAHMS.
For ensuring accuracy and consistency, the VoucherVisionEditor serves as a quality control tool.
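To illustrate the last step of the workflow, here is a small sketch of turning a standardized LLM response into a CSV row ready for import into a database. The field names are illustrative Darwin Core-style examples of ours, not VoucherVision's exact schema:

```python
import csv
import io
import json

# Hypothetical standardized LLM output for one specimen label
# (field names are illustrative, not VoucherVision's actual schema).
llm_output = json.loads(
    '{"catalogNumber": "MICH-1234567", '
    '"scientificName": "Acer rubrum L.", '
    '"country": "USA", "eventDate": "1998-05-14"}'
)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(llm_output))
writer.writeheader()          # column headers in a consistent order
writer.writerow(llm_output)   # one row per transcribed specimen
csv_text = buffer.getvalue()
print(csv_text)
```

Because every specimen is forced into the same columns, rows from many runs can be concatenated and handed to Specify, Symbiota, or BRAHMS importers.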
Thanks to all of our collaborating institutions!
Package Information:
The main VoucherVision tool and the VoucherVisionEditor are packaged separately. This separation ensures that lower-performance computers can still install and utilize the editor. While VoucherVision is optimized to function smoothly on virtually any modern system, maximizing its capabilities (like using LeafMachine2 label collages or running Retrieval Augmented Generation (RAG) prompts) mandates a GPU.
NOTE: You can absolutely run VoucherVision on computers that do not have a GPU, but the LeafMachine2 collage will run slower.
Try our public demo!
Our public demo, while lacking several quality control and reliability features found in the full VoucherVision module, provides an exciting glimpse into its capabilities. Feel free to upload your herbarium specimen and see what happens! VoucherVision Demo
Installing VoucherVision (using PIP)
Prerequisites
- Python 3.10.4 or later
- Optional: an Nvidia GPU + CUDA for running LeafMachine2
Installation - Cloning the VoucherVision Repository
- First, install Python 3.10, or greater, on your machine of choice. We have validated up to Python 3.11.
- Make sure that you can use `pip` to install packages on your machine, or at least inside of a virtual environment.
  - Simply type `pip` into your terminal or PowerShell. If you see a list of options, you are all set. Otherwise, see either this PIP Documentation or this help page
- Open a terminal window and `cd` into the directory where you want to install VoucherVision.
- In the Git BASH terminal, clone the VoucherVision repository from GitHub by running:

  ```
  git clone https://github.com/Gene-Weaver/VoucherVision.git
  ```

- Move into the VoucherVision directory by running `cd VoucherVision` in the terminal.
- Update submodules:

  ```
  git submodule update --init --recursive
  ```
- To run VoucherVision we need to install its dependencies inside of a Python virtual environment. Follow the instructions below for your operating system.
About Python Virtual Environments
A virtual environment is a tool to keep the dependencies required by different projects in separate places, by creating isolated python virtual environments for them. This avoids any conflicts between the packages that you have installed for different projects. It makes it easier to maintain different versions of packages for different projects.
For more information about virtual environments, please see Creation of virtual environments
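Before installing VoucherVision's dependencies, you can confirm that a virtual environment is actually active. This short check of ours relies on standard Python behavior: inside a venv, `sys.prefix` is rewired to the environment directory while `sys.base_prefix` keeps pointing at the base interpreter.

```python
import sys

def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv-style environment."""
    # venv changes sys.prefix; sys.base_prefix stays on the base install.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment")
```

If this prints "no virtual environment", activate your environment before running `pip install`.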
Installation - Windows 10+
Installation should basically be the same for Linux.
Virtual Environment
- Still inside the VoucherVision directory, show that a venv is currently not active:

  ```
  python --version
  ```

- Then create the virtual environment (`.venv_VV` is the name of our new virtual environment):

  ```
  python -m venv .venv_VV
  ```