OCRIntegrator
OCRIntegrator is an integrated solution that combines multiple open-source OCR (Optical Character Recognition) models, layout analysis, and table parsing capabilities. The project unifies these functions behind a single interface, providing a streamlined way to process and extract information from various types of documents.
OCRIntegrator wraps open-source OCR models, table detection, layout recognition, and related capabilities, and exposes them as services through a unified interface. Currently only deepdoc is integrated; more services will be added in the future.
Introduction
- deepdoc reads embedded text with pdfplumber and recognizes text in page images with OCR. When pdfplumber returns text, that text is preferred; scanned documents, which have no embedded text layer, are processed entirely with OCR.
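The preference rule above can be sketched as a small function. This is only an illustration of the described behavior, not deepdoc's actual code: the function name `choose_text_source` and the `min_chars` threshold are hypothetical, and the two text arguments stand in for whatever pdfplumber and the OCR model return for a page.

```python
from typing import Optional, Tuple


def choose_text_source(
    pdf_text: Optional[str],
    ocr_text: str,
    min_chars: int = 1,  # hypothetical threshold, not taken from deepdoc
) -> Tuple[str, str]:
    """Return (source, text), preferring embedded PDF text over OCR output."""
    extracted = (pdf_text or "").strip()
    if len(extracted) >= min_chars:
        # The page has an embedded text layer, so use it as-is.
        return "pdfplumber", extracted
    # No usable embedded text (e.g. a scanned page): fall back to OCR.
    return "ocr", ocr_text.strip()
```

A page with embedded text resolves to the `"pdfplumber"` source, while a page whose extracted text is empty or `None` resolves to `"ocr"`.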
🎬 Get Started
📝 Prerequisites
- Python >= 3.11 (conda recommended)
- GPU with more than 6 GB of memory
- tensorrt == 10.0.1
- CUDA == 12.3 (other versions may work theoretically, but have not been tested)
- pycuda == 2024.1
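To compare an existing environment against the versions listed above, a small check like the following may help. It is a sketch that is not part of the project: the helper name `check_environment` is hypothetical, it only inspects the Python version and the installed package metadata, and it assumes the installed distributions are named `tensorrt` and `pycuda`. It does not verify the GPU or the CUDA toolkit.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 11)
# Versions pinned in the prerequisites list.
PINNED = {"tensorrt": "10.0.1", "pycuda": "2024.1"}


def check_environment(pinned=PINNED):
    """Report whether Python and the pinned packages match the prerequisites."""
    report = {"python_ok": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = {
            "installed": installed,
            "wanted": wanted,
            "ok": installed == wanted,
        }
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```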
Runtime Environment
- Install Python 3.11 (conda recommended).
- Install poetry:
    curl -sSL https://install.python-poetry.org | python3 -
- Install dependencies using poetry:
    poetry install
- Run the project:
    uvicorn main:app
Running with GPU requires installing TensorRT
- Install TensorRT. The package name tensorrt-cu12 must be changed to match your CUDA version.
    pip install tensorrt==10.0.1
    pip install tensorrt-cu12==10.0.1
- Install pycuda:
    pip install pycuda==2024.1
Below are screenshots of my environment for reference:

DEMO

API Documentation
After starting the service, you can view the available endpoints and how to use them in the interactive documentation: http://localhost:8000/docs
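The same information can also be read programmatically. The sketch below rests on an assumption that this README does not confirm: that the service is a FastAPI app, which by default publishes its OpenAPI schema at `/openapi.json` alongside `/docs`. The helper names `fetch_schema` and `list_endpoints` are hypothetical.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url="http://localhost:8000", path="/openapi.json", timeout=5.0):
    """Download the OpenAPI schema (assumes the FastAPI default schema path)."""
    with urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(schema):
    """Return sorted 'METHOD /route' strings from an OpenAPI schema dict."""
    endpoints = []
    for route, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {route}")
    return endpoints
```

With the server running, `list_endpoints(fetch_schema())` would list the routes the service exposes. If the schema is served elsewhere, pass a different `path`.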
