ContextAgent: Modular AI Document QA Backend with RAG Tech
ContextAgent is an AI assistant backend designed for document-based question answering using Retrieval-Augmented Generation (RAG). It uses a modular architecture that blends LangChain, OpenAI, FastAPI, and ChromaDB to build robust, production-ready workflows. The system handles diverse document formats, supports multi-tool agents, preserves conversational memory, and provides fast semantic search over large knowledge bases. It ships with Docker for straightforward deployment and scaling.
Table of contents
- What ContextAgent is for
- Core capabilities
- How it works
- Architecture at a glance
- Modules and data flow
- Getting started
- Docker and deployment
- Configuration and operations
- Development workflow
- Observability, testing, and reliability
- Security considerations
- Roadmap
- Contributing
- FAQ
- Releases
What ContextAgent is for
ContextAgent targets teams and individuals who need a scalable AI assistant that can read, reason about, and answer questions from a corpus of documents. It is built to support enterprises, researchers, and developers who want to customize and extend a knowledge base with minimal friction. The backend focuses on document-centric questions, where answers draw from indexed content rather than generic training data alone.
Core capabilities
- Document ingestion and processing
- PDF, Word (Docx), and Markdown handling
- Content extraction, normalization, and metadata tagging
- Vector search and semantic retrieval
- ChromaDB as the vector store
- Fast, precise similarity search over embeddings
- RAG-based question answering
  - Retrieval-Augmented Generation with large language models (LLMs)
- Contextual responses derived from retrieved passages
- Conversational memory
- Short-term context tracking for smooth, coherent chats
- Long-term memory options for ongoing projects
- Modular multi-tool agents
- Orchestrates several tools to gather data, summarize, or transform content
- Flexible branching for complex queries
- Production-ready deployment
- Docker-based setup with sensible defaults
- Configurable components for scale and reliability
- Extensible architecture
- Clear module boundaries
- Simple extension points for custom tools, data sources, or processing steps
How it works
- The user sends a query to the FastAPI endpoint.
- The system retrieves relevant documents using semantic search against the vector store.
- Retrieved content is condensed and fed to an LLM via LangChain prompts.
- The LLM generates an answer, optionally augmented with citations from the retrieved passages.
- The conversation maintains memory to provide context-aware follow-ups and refinements.
- Documents can flow through a processing pipeline that handles PDFs, Word documents, and Markdown, converting them into searchable embeddings for the vector store.
- A multi-tool agent orchestrates ancillary tasks (like summarization, translation, or extraction) to produce richer responses when needed.
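The retrieval-then-generation flow above can be sketched in plain Python. This is an illustrative toy, not ContextAgent's actual code: the `embed` function stands in for an OpenAI embedding call, the `DOCS` dict stands in for ChromaDB, and all names are hypothetical.

```python
import hashlib
import math

# Toy in-memory corpus standing in for ChromaDB (illustrative only).
DOCS = {
    "doc1": "ContextAgent ingests PDF, Docx, and Markdown files.",
    "doc2": "Docker Compose brings up the API service and the vector store.",
}

def embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for an OpenAI embedding call: hashed bag-of-words vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by embedding similarity and return the top k."""
    qv = embed(query)
    ranked = sorted(DOCS.items(), key=lambda kv: cosine(qv, embed(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble retrieved passages and the user question into an LLM prompt."""
    passages = "\n".join(text for _, text in retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{passages}\n\nQuestion: {query}"
```

In the real system, `build_prompt`'s output would go to the LLM via a LangChain prompt template, with the memory state appended.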
Architecture at a glance
- FastAPI service layer
- Exposes REST endpoints for chat, document ingestion, and admin tasks
- LangChain-based orchestration
- Manages prompt templates, tool calls, and memory interactions
- OpenAI or other LLM backends
- Core language model for generation and reasoning
- Vector store (ChromaDB)
- Embeddings storage and rapid similarity search
- Document processing modules
- PDF, Docx, Markdown parsers and text extractors
- Conversational memory
- Context tracking across turns and sessions
- Multi-tool agents
- A suite of tools that can be invoked to fetch data, summarize, translate, or transform content
- Docker deployment
- Production-ready containerization with minimal setup
Modules and data flow
- API layer
- Routes for chat, upload, and knowledge base management
- Validation and rate limiting to protect resources
- Ingestion pipeline
- Detects document type
- Extracts text and metadata
- Creates embeddings and stores them in ChromaDB
- Retrieval and ranking
- Embedding-based search returns top candidates
- Context is assembled for the LLM
- Generation and reasoning
- LLM receives context, user prompt, and memory state
- Produces answer with optional citations
- Memory subsystem
- Short-term memory per session for continuity
- Optional long-term memory with persistence
- Tools and orchestration
- Agents decide when to call tools for extra tasks
- Results are integrated into the response
- Output channel
- Returns structured response, sources, and follow-up questions
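The ingestion side of this data flow can be sketched as follows. This is a simplified illustration with hypothetical names: the real pipeline would invoke PDF/Docx/Markdown parsers before this step, whereas here the raw text is passed in directly.

```python
import os
import re
from dataclasses import dataclass

SUPPORTED = {".pdf", ".docx", ".md"}

@dataclass
class Chunk:
    source: str    # original file name, kept as metadata
    position: int  # chunk index within the document
    text: str

def normalize(text: str) -> str:
    """Collapse runs of whitespace so chunk boundaries are stable."""
    return re.sub(r"\s+", " ", text).strip()

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Sliding character windows with overlap, so context survives chunk edges."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(filename: str, raw_text: str) -> list[Chunk]:
    """Detect type, normalize, and split into chunks ready for embedding."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported document type: {ext}")
    text = normalize(raw_text)
    return [Chunk(filename, i, c) for i, c in enumerate(chunk_text(text))]
```

Each `Chunk` would then be embedded and written to ChromaDB along with its source metadata.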
Getting started
Prerequisites
- Python 3.11+ or a compatible Python environment
- Docker and Docker Compose for containerized deployment
- Access to an OpenAI API key or an alternative LLM provider
- Basic familiarity with command line operations
Local development setup
- Create a virtual environment
- Install dependencies: a minimal, focused set that includes FastAPI, LangChain, OpenAI, and ChromaDB
- Configure environment
- OPENAI_API_KEY must be set
- CHROMA_HOST or local vector store configuration
- Document ingestion paths for sample data
- Run locally
- Start the API server and supporting services
- Validate the chat endpoint with a test query
- Document ingestion
- Point the ingestion process at sample PDFs, Docx, or Markdown files
- Verify that embeddings are created and stored in the vector store
- Memory and conversation testing
- Start a chat session and test follow-up questions
- Confirm memory persistence across sessions if enabled
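To make the memory-testing step above concrete, here is a minimal sketch of windowed short-term memory. The class and method names are hypothetical, not ContextAgent's API; the real system layers this behind LangChain's memory abstractions.

```python
from collections import deque

class SessionMemory:
    """Short-term memory: keeps the last N turns per session (illustrative sketch)."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.sessions: dict[str, deque] = {}

    def add_turn(self, session_id: str, user: str, assistant: str) -> None:
        turns = self.sessions.setdefault(session_id, deque(maxlen=self.max_turns))
        turns.append((user, assistant))  # oldest turn is evicted automatically

    def context(self, session_id: str) -> str:
        """Render recent turns as a transcript for the next prompt."""
        turns = self.sessions.get(session_id, ())
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in turns)
```

A follow-up question then gets `context(session_id)` prepended to its prompt, which is what makes "what about the second one?" resolvable.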
Docker and deployment
Production-ready Docker setup
- Single-node deployments for development or small teams
- Multi-node configurations for larger teams or higher load
- Separate services for API, vector store, and memory components
- Environment-driven configuration to switch models, stores, or memory behavior
What you get with Docker
- A reproducible environment
- Isolation between components
- Simple upgrade paths via container images
- Scalable through standard Docker tooling
Basic docker-compose example (conceptual)
- A typical setup includes:
- api: FastAPI service
- vector-store: ChromaDB or a compatible vector store
- memory: a memory module
- llm: a containerized LLM runner
- ingestor: document ingestion service
- The exact file layout and version pins vary by release
- Use docker-compose up -d to start and docker-compose logs -f to monitor
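As a rough illustration of the layout described above, a minimal compose file might look like the following. Service names, ports, and the image tag are illustrative assumptions; the exact file shipped with each release differs.

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CHROMA_HOST=vector-store
      - CHROMA_PORT=8000
    depends_on:
      - vector-store
  vector-store:
    image: chromadb/chroma
    volumes:
      - chroma-data:/chroma/chroma
volumes:
  chroma-data:
```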
How to deploy step by step
- Step 1: Prepare environment
- Create a dedicated project directory
- Export OPENAI_API_KEY and any other required secrets
- Step 2: Obtain and configure images
- Pull the latest available images from your registry or Docker Hub
- Confirm versions align with your needs (e.g., LLM model, vector store)
- Step 3: Configure services
- Point to a known data directory for document ingestion
- Set memory policies and timeout values
- Step 4: Start services
- Run the compose file to bring up the stack
- Step 5: Validate operations
- Access the API docs and try a sample chat
- Ingest a small document and verify retrieval and answer generation
- Step 6: Scale and secure
- Move to a production-ready deployment with proper secrets management
- Enable TLS, authentication, and monitoring
Configuration and operations
Environment variables and knobs
- OPENAI_API_KEY: your OpenAI key or equivalent
- LLM_PROVIDER: openai or another supported provider
- EMBEDDING_MODEL: the embedding model to generate vectors
- CHROMA_HOST: host for the vector store
- CHROMA_PORT: port for the vector store
- MEMORY_ENABLED: boolean to enable conversational memory
- MEMORY_PERSISTENCE: choice between in-memory and persistent storage
- INGRESS_BASE_PATH: API base path
- LOG_LEVEL: e.g., INFO, DEBUG
- INTAKE_DIRECTORIES: paths to watch for document ingestion
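A typical way to consume these knobs is a small settings loader; the sketch below covers a subset of the variables listed above. The defaults shown are assumptions for illustration, not the project's documented defaults.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    llm_provider: str
    chroma_host: str
    chroma_port: int
    memory_enabled: bool
    log_level: str

def load_settings() -> Settings:
    """Read configuration from the environment, with conservative defaults."""
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],  # required; fail fast if unset
        llm_provider=os.getenv("LLM_PROVIDER", "openai"),
        chroma_host=os.getenv("CHROMA_HOST", "localhost"),
        chroma_port=int(os.getenv("CHROMA_PORT", "8000")),
        memory_enabled=os.getenv("MEMORY_ENABLED", "false").lower() in {"1", "true", "yes"},
        log_level=os.getenv("LOG_LEVEL", "INFO"),
    )
```

Failing fast on a missing `OPENAI_API_KEY` surfaces misconfiguration at startup rather than on the first chat request.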
Ingestion and indexing
- Supported formats: PDF, Docx, Markdown
- Ingestion steps:
- Extract text and metadata
- Clean and normalize content
- Generate embeddings
- Store embeddings in ChromaDB with associated metadata
- Index maintenance
- Re-index on document updates
- Versioned embeddings for traceability
- Document metadata
- Source file name, page numbers, extraction quality, and index timestamps
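Re-indexing on update with versioned, traceable embeddings can be driven by content fingerprints. The sketch below is one plausible mechanism, not ContextAgent's actual implementation; all names are hypothetical.

```python
import hashlib

class IndexTracker:
    """Track content hashes per source so updates trigger re-indexing (illustrative)."""

    def __init__(self):
        # source file name -> ordered list of content fingerprints (versions)
        self.versions: dict[str, list[str]] = {}

    def fingerprint(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    def needs_reindex(self, source: str, text: str) -> bool:
        fp = self.fingerprint(text)
        history = self.versions.setdefault(source, [])
        if history and history[-1] == fp:
            return False   # unchanged: keep existing embeddings
        history.append(fp)  # new version recorded: caller should re-embed
        return True
```

Keeping the full fingerprint history per source is what makes embeddings traceable back to the document revision they came from.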
Memory and conversation
- Short-term memory
- Retains context for the current session
- Enables natural follow-ups and clarifications
- Long-term memory
- Optional persistent store across sessions
  - Provides knowledge continuity for ongoing projects
- Privacy and scope
- Memory scope can be restricted to specific projects or teams
- Data retention policies can be configured per environment
Knowledge base and semantic search
- Vector store
- ChromaDB chosen for fast, scalable similarity search
  - Supports hybrid retrieval
