DocInferX

DocInferX is a fully local, privacy-focused document intelligence system. It ingests PDFs and images, performs OCR, cleans the text, splits it into chunks, embeds the chunks into a vector database, and lets you chat with your documents offline using a lightweight LLM (Phi-2).

⚡ DocInferX

Local AI Document Intelligence Engine

📘 Overview

DocInferX is an offline-ready RAG (Retrieval-Augmented Generation) system. You upload PDFs or images, it extracts the text automatically (OCR + PDF parsing) and indexes it with FAISS vector search, and you can then chat with your documents using a local LLM (Phi-2).
The project is built for privacy-focused document intelligence: fast, local, and completely offline.
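The retrieval step described above can be pictured with a small sketch. This is not the project's code: it uses plain Python and toy 3-dimensional vectors as a stand-in for Sentence Transformers embeddings and a FAISS index, and the names `cosine` and `top_k` are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = sorted(
        enumerate(chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]

# Toy "embeddings" (assumed values); real ones would come from an embedding model.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]
best = top_k(query_vec, chunk_vecs, k=2)
print(best)
```

A vector index such as FAISS performs the same nearest-neighbour lookup, but over many high-dimensional vectors and far faster than this linear scan.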

⭐ Features

Upload PDFs or images (PNG/JPG/JPEG)
Automatic OCR using PaddleOCR
Smart text chunking & cleaning
FAISS Vector Search for fast recall
Local LLM (Phi-2) for answering queries
Matrix rain cyber UI
Streamlit Frontend
Docker support for easy setup
Runs fully offline
Document Library view
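As an illustration of the "text chunking & cleaning" feature, here is a minimal sketch assuming word-based windows with overlap. The README does not say how DocInferX actually cleans or chunks text, so the function names, window size, and overlap below are hypothetical.

```python
import re

def clean_text(raw):
    """Re-join words hyphenated across line breaks, then collapse whitespace runs."""
    # Note: this also merges genuine hyphenated words that happen to break at a line end.
    joined = re.sub(r"-\s*\n\s*", "", raw)
    return re.sub(r"\s+", " ", joined).strip()

def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

sample = clean_text("docu-\nment   text\n")
pieces = chunk_words("a b c d e f g h i j", size=4, overlap=1)
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.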

🛠 Technologies & Tools Used

Python 3
Streamlit — UI framework
FAISS — vector database
Sentence Transformers — embeddings
Phi-2 / HuggingFace Transformers — LLM
PaddleOCR — OCR engine
PyPDF2 / pdfreader — PDF parsing
Docker — for containerized deployment
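To show how retrieved chunks and a question might be combined before being sent to the LLM, here is a hedged sketch of prompt assembly. The "Instruct: ... Output:" layout follows the QA format described on the Phi-2 model card; the wording of the instruction, the `build_prompt` name, and the character budget are assumptions, not the project's actual prompt.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a grounded prompt from retrieved chunks within a character budget."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Instruct: Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Output:"
    )

prompt = build_prompt(
    "What is the invoice total?",
    ["Invoice total: 120 EUR.", "Due in 30 days."],
)
```

Numbering the chunks makes it easier to trace an answer back to the passage it came from.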

🔧 Installation & Run Guide

1. Clone the repository

git clone https://github.com/shekh-2810/DocInferX.git
cd DocInferX

2. Create a virtual environment

python3 -m venv venv
source venv/bin/activate      # Linux/macOS
venv\Scripts\activate         # Windows

3. Install dependencies

pip install -r requirements.txt

4. Run the app

streamlit run streamlit_app.py

5. Open in your web browser

http://localhost:8501
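If the page does not load, a quick way to see whether anything is listening on the port is a plain TCP check. This is a generic sketch, not part of DocInferX; `port_open` is a hypothetical helper, and 8501 is the port taken from the URL above.

```python
import socket

def port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("app reachable" if port_open() else "nothing listening on port 8501")
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is the Streamlit app.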

🐳 Docker Setup

1. Build the image:

docker build -t docinferx .

2. Run the container:

docker run -p 8501:8501 docinferx

🧪 Testing Instructions

1. Open the application in your browser.

2. Upload any PDF or image.

3. Wait for OCR + indexing to complete.

4. Open the Chat tab.

5. Ask questions related to the uploaded document.

6. Compare answers with the source document to verify accuracy.
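The manual comparison in the last step can be given a rough numeric aid. The sketch below is an assumption-laden heuristic, not a feature of DocInferX: the hypothetical `support_ratio` measures what share of an answer's distinct words also occur in the source text. A low ratio suggests the answer deserves a closer look; a high ratio does not prove the answer is correct.

```python
import re

def support_ratio(answer, source):
    """Fraction of the answer's distinct words (3+ characters) that also occur in the source."""
    def tokenize(text):
        return set(re.findall(r"[a-z0-9]{3,}", text.lower()))

    answer_words = tokenize(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & tokenize(source)) / len(answer_words)

source = "The invoice total is 120 EUR and payment is due in 30 days."
grounded = support_ratio("The total is 120 EUR", source)
suspect = support_ratio("Payment was refunded yesterday", source)
```

Word overlap ignores meaning and word order, so treat it as a first filter before reading the relevant passage yourself.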

👤 Maintainer

Developed by Shashank Shekhar Choudhary.

⭐ Support

If you find this project useful, consider starring the repo!

👉 https://github.com/shekh-2810/DocInferX
