ChatDoc
A unified retrieval-augmented generation (RAG) document API and web interface, powered by FastAPI, React, Vite, Milvus, and MistralAI.
Table of Contents
- Overview
- Architecture
- Features
- Tech Stack
- Getting Started
- API Reference
- Future Goals
- Contributing
- License
Overview
ChatDoc is a web application that lets users upload documents, extracts and chunks their text, stores embeddings in Milvus, and answers questions with state-of-the-art LLMs. It provides both a REST API and a web-based interface for seamless integration.
Preview
- Landing Page (screenshot)
- Dashboard (screenshot)
- Complete working video: chatdoc.mkv
Architecture
```mermaid
flowchart TB
  subgraph Frontend
    UI[React & Vite] -->|REST API| API(FastAPI)
  end
  subgraph Backend
    API --> Extract[Text Extraction]
    Extract --> Chunk[Text Chunking]
    Chunk --> Embed[MistralAI Embedding]
    Embed --> Store[Milvus Vector Store]
    API --> Retrieve[Retrieval]
    Retrieve --> LLM[ChatOpenAI]
    LLM --> Store
  end
  Store -.->|Query Results| API
```
Features
- Upload PDF, TXT, CSV, XLSX, PPTX, DOCX files via API or web form
- Automatic text extraction and chunking (500 tokens, 50 overlap)
- Embedding with MistralAI Embeddings & storage in Milvus (Zilliz)
- Retrieval and response generation via OpenAI-compatible LLM
- Real-time, responsive React UI with upload, history, and settings
- Per-request overrides for API keys, endpoints, and collections
- Admin endpoints for deleting uploads or clearing the vector store
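The chunking behavior named above (500-token chunks with a 50-token overlap) can be sketched as a simple sliding window. The function below is illustrative, not the backend's actual implementation, and treats "tokens" as an already-split list:

```python
def chunk_text(tokens, size=500, overlap=50):
    """Split a token list into overlapping chunks.

    Each chunk holds up to `size` tokens; consecutive chunks share
    `overlap` tokens so context is not cut at chunk boundaries.
    """
    chunks = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already reached the end of the document
    return chunks

# Example: 1200 tokens -> 3 chunks (0-499, 450-949, 900-1199)
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_text(tokens)
```

Each chunk is then embedded independently, so the overlap trades a little storage for better retrieval of passages that straddle a boundary.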
Tech Stack
- Backend: FastAPI, Python, PyPDF2, python-pptx, python-docx, Pandas, Milvus
- Frontend: React, Vite, TypeScript, Tailwind CSS
- Embeddings: MistralAI
- Vector Database: Milvus (Zilliz Cloud)
- LLM: OpenAI-compatible ChatOpenAI via LangChain
Getting Started
Prerequisites
- Node.js >= 16 and npm/yarn
- Python >= 3.9
- Docker (optional)
- Milvus or Zilliz Cloud credentials
- MistralAI & OpenAI API keys
Backend Setup
cd Backend
cp .env.example .env   # on Windows: copy .env.example .env
# Edit .env and set:
# MISTRAL_API_KEY, ZILLIZ_URI, ZILLIZ_TOKEN, HF_TOKEN (optional), COLLECTION_NAME
pip install -r requirements.txt
uvicorn main:app --reload
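Before launching uvicorn, it can help to confirm the `.env` values actually reached the environment. This is an optional sanity check, not part of the backend; the variable names are the required ones listed above (HF_TOKEN is optional and omitted):

```python
import os

def missing_env(required, env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

REQUIRED = ["MISTRAL_API_KEY", "ZILLIZ_URI", "ZILLIZ_TOKEN", "COLLECTION_NAME"]

if __name__ == "__main__":
    missing = missing_env(REQUIRED)
    print("Missing:", missing if missing else "none, environment looks configured")
```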
Frontend Setup
cd Frontend
npm install
npm run dev
Docker (Optional)
# Build and run backend container
docker build -t chatdocapi-backend .
docker run --rm -p 8080:8080 \
  -e MISTRAL_API_KEY=$MISTRAL_API_KEY \
  -e ZILLIZ_URI=$ZILLIZ_URI \
  -e ZILLIZ_TOKEN=$ZILLIZ_TOKEN \
  -e ZILLIZ_COLLECTION_NAME=$ZILLIZ_COLLECTION_NAME \
chatdocapi-backend
API Reference
1) POST /upload
- Description: Upload document and store embeddings.
- Content-Type: multipart/form-data
- Fields:
  - file (required)
  - mistral_api_key, zilliz_uri, zilliz_token, collection_name (optional per-request overrides)
- Responses:
  - 200: { "upload_id": "<uuid>" }
  - 400: errors (no file, extraction failure)
  - 413: file too large
2) POST /query
- Description: Retrieve and answer based on stored chunks.
- Content-Type: application/json
- Body:
  { "question": "string", "upload_id": "string", ...overrides }
- Responses:
  - 200: { "answer": "<generated answer>" }
  - 400: invalid body
  - 500: generation error
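As a usage sketch, the two endpoints above can be exercised from Python with the third-party requests library. The base URL and file name are assumptions for illustration, and error handling is minimal:

```python
import requests  # third-party HTTP client: pip install requests

BASE_URL = "http://localhost:8080"  # assumed address of the local backend

def build_query_body(question, upload_id, **overrides):
    """Assemble the JSON body for POST /query; overrides are the optional keys."""
    return {"question": question, "upload_id": upload_id, **overrides}

def upload_document(path, **overrides):
    """POST /upload with the file and any optional per-request overrides."""
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/upload",
                             files={"file": f}, data=overrides)
    resp.raise_for_status()
    return resp.json()["upload_id"]

def ask(question, upload_id, **overrides):
    """POST /query scoped to a single upload; returns the generated answer."""
    resp = requests.post(f"{BASE_URL}/query",
                         json=build_query_body(question, upload_id, **overrides))
    resp.raise_for_status()
    return resp.json()["answer"]

# Usage (with the backend running):
#   uid = upload_document("report.pdf")
#   answer = ask("Summarize the key findings.", uid)
```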
3) DELETE /delete/{upload_id}
- Description: Remove all vectors for a given upload.
- Params:
  - upload_id (path); overrides as query params
- Response:
  { "status": "deleted" }
4) GET /deleteall
- Description: Clear entire vector store.
- Query:
  - password (native admin) or per-request overrides
- Response:
  { "status": "all_deleted" }
Future Goals
- Streaming responses from the model to improve perceived latency and UX.
- Better OCR and robust file parsing for scanned PDFs and more file formats.
- Pluggable support for multiple vector stores (Milvus, FAISS, Pinecone, etc.).
- Increase upload and context limits (larger files, fewer artificial word/chunk restrictions).
- Personalization with login/signup, per-user profiles, metadata, and tags.
- Expand supported AI models/providers and allow per-request model selection.
Contributing
Contributions and suggestions are welcome! Please fork the repository, create a feature branch, and submit a pull request. If you'd like to see something prioritized, open an issue or a discussion.
- Fork it
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Open a Pull Request
For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License. See LICENSE for details.