ContextAgent: Modular AI Document QA Backend with RAG Tech
ContextAgent is an AI assistant backend designed for document-based question answering using Retrieval-Augmented Generation (RAG). It uses a modular architecture that blends LangChain, OpenAI, FastAPI, and ChromaDB to build robust, production-ready workflows. The system handles diverse document formats, supports multi-tool agents, preserves conversational memory, and provides fast semantic search over large knowledge bases. It ships with Docker for straightforward deployment and scaling.
Table of contents
- What ContextAgent is for
- Core capabilities
- How it works
- Architecture at a glance
- Modules and data flow
- Getting started
- Docker and deployment
- Configuration and operations
- Development workflow
- Observability, testing, and reliability
- Security considerations
- Roadmap
- Contributing
- FAQ
- Releases
What ContextAgent is for
ContextAgent targets teams and individuals who need a scalable AI assistant that can read, reason about, and answer questions from a corpus of documents. It is built to support enterprises, researchers, and developers who want to customize and extend a knowledge base with minimal friction. The backend focuses on document-centric questions, where answers draw from indexed content rather than generic training data alone.
Core capabilities
- Document ingestion and processing
- PDF, Word (Docx), and Markdown handling
- Content extraction, normalization, and metadata tagging
- Vector search and semantic retrieval
- ChromaDB as the vector store
- Fast, precise similarity search over embeddings
- RAG-based question answering
  - Retrieval-Augmented Generation with large language models (LLMs)
- Contextual responses derived from retrieved passages
- Conversational memory
- Short-term context tracking for smooth, coherent chats
- Long-term memory options for ongoing projects
- Modular multi-tool agents
- Orchestrates several tools to gather data, summarize, or transform content
- Flexible branching for complex queries
- Production-ready deployment
- Docker-based setup with sensible defaults
- Configurable components for scale and reliability
- Extensible architecture
- Clear module boundaries
- Simple extension points for custom tools, data sources, or processing steps
How it works
- The user sends a query to the FastAPI endpoint.
- The system retrieves relevant documents using semantic search against the vector store.
- Retrieved content is condensed and fed to an LLM via LangChain prompts.
- The LLM generates an answer, optionally augmented with citations from the retrieved passages.
- The conversation maintains memory to provide context-aware follow-ups and refinements.
- Documents can flow through a processing pipeline that handles PDFs, Word documents, and Markdown, converting them into searchable embeddings for the vector store.
- A multi-tool agent orchestrates ancillary tasks (like summarization, translation, or extraction) to produce richer responses when needed.
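The retrieval-then-generation flow above can be sketched in plain Python. This is an illustrative toy, not ContextAgent's actual code: the `embed` function stands in for an OpenAI embedding call, the `DOCS` dict stands in for ChromaDB, and all names are hypothetical.

```python
import hashlib
import math

# Toy in-memory corpus standing in for ChromaDB (illustrative only).
DOCS = {
    "doc1": "ContextAgent ingests PDF, Docx, and Markdown files.",
    "doc2": "Docker Compose brings up the API service and the vector store.",
}

def embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for an OpenAI embedding call: hashed bag-of-words vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by embedding similarity and return the top k."""
    qv = embed(query)
    ranked = sorted(DOCS.items(), key=lambda kv: cosine(qv, embed(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Assemble retrieved passages and the user question into an LLM prompt."""
    passages = "\n".join(text for _, text in retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{passages}\n\nQuestion: {query}"
```

In the real system, `build_prompt`'s output would go to the LLM via a LangChain prompt template, with the memory state appended.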
Architecture at a glance
- FastAPI service layer
- Exposes REST endpoints for chat, document ingestion, and admin tasks
- LangChain-based orchestration
- Manages prompt templates, tool calls, and memory interactions
- OpenAI or other LLM backends
- Core language model for generation and reasoning
- Vector store (ChromaDB)
- Embeddings storage and rapid similarity search
- Document processing modules
- PDF, Docx, Markdown parsers and text extractors
- Conversational memory
- Context tracking across turns and sessions
- Multi-tool agents
- A suite of tools that can be invoked to fetch data, summarize, translate, or transform content
- Docker deployment
- Production-ready containerization with minimal setup
Modules and data flow
- API layer
- Routes for chat, upload, and knowledge base management
- Validation and rate limiting to protect resources
- Ingestion pipeline
- Detects document type
- Extracts text and metadata
- Creates embeddings and stores them in ChromaDB
- Retrieval and ranking
- Embedding-based search returns top candidates
- Context is assembled for the LLM
- Generation and reasoning
- LLM receives context, user prompt, and memory state
- Produces answer with optional citations
- Memory subsystem
- Short-term memory per session for continuity
- Optional long-term memory with persistence
- Tools and orchestration
- Agents decide when to call tools for extra tasks
- Results are integrated into the response
- Output channel
- Returns structured response, sources, and follow-up questions
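The ingestion side of this data flow can be sketched as follows. This is a simplified illustration with hypothetical names: the real pipeline would invoke PDF/Docx/Markdown parsers before this step, whereas here the raw text is passed in directly.

```python
import os
import re
from dataclasses import dataclass

SUPPORTED = {".pdf", ".docx", ".md"}

@dataclass
class Chunk:
    source: str    # original file name, kept as metadata
    position: int  # chunk index within the document
    text: str

def normalize(text: str) -> str:
    """Collapse runs of whitespace so chunk boundaries are stable."""
    return re.sub(r"\s+", " ", text).strip()

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Sliding character windows with overlap, so context survives chunk edges."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(filename: str, raw_text: str) -> list[Chunk]:
    """Detect type, normalize, and split into chunks ready for embedding."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported document type: {ext}")
    text = normalize(raw_text)
    return [Chunk(filename, i, c) for i, c in enumerate(chunk_text(text))]
```

Each `Chunk` would then be embedded and written to ChromaDB along with its source metadata.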
Getting started
Prerequisites
- Python 3.11+ or a compatible Python environment
- Docker and Docker Compose for containerized deployment
- Access to an OpenAI API key or an alternative LLM provider
- Basic familiarity with command line operations
Local development setup
- Create a virtual environment
- Install dependencies: a minimal, focused set that includes FastAPI, LangChain, OpenAI, and ChromaDB
- Configure environment
- OPENAI_API_KEY must be set
- CHROMA_HOST or local vector store configuration
- Document ingestion paths for sample data
- Run locally
- Start the API server and supporting services
- Validate the chat endpoint with a test query
- Document ingestion
- Point the ingestion process at sample PDFs, Docx, or Markdown files
- Verify that embeddings are created and stored in the vector store
- Memory and conversation testing
- Start a chat session and test follow-up questions
- Confirm memory persistence across sessions if enabled
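To make the memory-testing step above concrete, here is a minimal sketch of windowed short-term memory. The class and method names are hypothetical, not ContextAgent's API; the real system layers this behind LangChain's memory abstractions.

```python
from collections import deque

class SessionMemory:
    """Short-term memory: keeps the last N turns per session (illustrative sketch)."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.sessions: dict[str, deque] = {}

    def add_turn(self, session_id: str, user: str, assistant: str) -> None:
        turns = self.sessions.setdefault(session_id, deque(maxlen=self.max_turns))
        turns.append((user, assistant))  # oldest turn is evicted automatically

    def context(self, session_id: str) -> str:
        """Render recent turns as a transcript for the next prompt."""
        turns = self.sessions.get(session_id, ())
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in turns)
```

A follow-up question then gets `context(session_id)` prepended to its prompt, which is what makes "what about the second one?" resolvable.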
Docker and deployment
Production-ready Docker setup
- Single-node deployments for development or small teams
- Multi-node configurations for larger teams or higher load
- Separate services for API, vector store, and memory components
- Environment-driven configuration to switch models, stores, or memory behavior
What you get with Docker
- A reproducible environment
- Isolation between components
- Simple upgrade paths via container images
- Scalable through standard Docker tooling
Basic docker-compose example (conceptual)
- A typical setup includes:
- api: FastAPI service
- vector-store: ChromaDB or a compatible vector store
- memory: a memory module
- llm: a containerized LLM runner
- ingestor: document ingestion service
- The exact file layout and version pins vary by release
- Use docker-compose up -d to start and docker-compose logs -f to monitor
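As a rough illustration of the layout described above, a minimal compose file might look like the following. Service names, ports, and the image tag are illustrative assumptions; the exact file shipped with each release differs.

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CHROMA_HOST=vector-store
      - CHROMA_PORT=8000
    depends_on:
      - vector-store
  vector-store:
    image: chromadb/chroma
    volumes:
      - chroma-data:/chroma/chroma
volumes:
  chroma-data:
```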
How to deploy step by step
- Step 1: Prepare environment
- Create a dedicated project directory
- Export OPENAI_API_KEY and any other required secrets
- Step 2: Obtain and configure images
- Pull the latest available images from your registry or Docker Hub
- Confirm versions align with your needs (e.g., LLM model, vector store)
- Step 3: Configure services
- Point to a known data directory for document ingestion
- Set memory policies and timeout values
- Step 4: Start services
- Run the compose file to bring up the stack
- Step 5: Validate operations
- Access the API docs and try a sample chat
- Ingest a small document and verify retrieval and answer generation
- Step 6: Scale and secure
- Move to a production-ready deployment with proper secrets management
- Enable TLS, authentication, and monitoring
Configuration and operations
Environment variables and knobs
- OPENAI_API_KEY: your OpenAI key or equivalent
- LLM_PROVIDER: openai or another supported provider
- EMBEDDING_MODEL: the embedding model to generate vectors
- CHROMA_HOST: host for the vector store
- CHROMA_PORT: port for the vector store
- MEMORY_ENABLED: boolean to enable conversational memory
- MEMORY_PERSISTENCE: choice between in-memory and persistent storage
- INGRESS_BASE_PATH: API base path
- LOG_LEVEL: e.g., INFO, DEBUG
- INTAKE_DIRECTORIES: paths to watch for document ingestion
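A typical way to consume these knobs is a small settings loader; the sketch below covers a subset of the variables listed above. The defaults shown are assumptions for illustration, not the project's documented defaults.

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    llm_provider: str
    chroma_host: str
    chroma_port: int
    memory_enabled: bool
    log_level: str

def load_settings() -> Settings:
    """Read configuration from the environment, with conservative defaults."""
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],  # required; fail fast if unset
        llm_provider=os.getenv("LLM_PROVIDER", "openai"),
        chroma_host=os.getenv("CHROMA_HOST", "localhost"),
        chroma_port=int(os.getenv("CHROMA_PORT", "8000")),
        memory_enabled=os.getenv("MEMORY_ENABLED", "false").lower() in {"1", "true", "yes"},
        log_level=os.getenv("LOG_LEVEL", "INFO"),
    )
```

Failing fast on a missing `OPENAI_API_KEY` surfaces misconfiguration at startup rather than on the first chat request.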
Ingestion and indexing
- Supported formats: PDF, Docx, Markdown
- Ingestion steps:
- Extract text and metadata
- Clean and normalize content
- Generate embeddings
- Store embeddings in ChromaDB with associated metadata
- Index maintenance
- Re-index on document updates
- Versioned embeddings for traceability
- Document metadata
- Source file name, page numbers, extraction quality, and index timestamps
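Re-indexing on update with versioned, traceable embeddings can be driven by content fingerprints. The sketch below is one plausible mechanism, not ContextAgent's actual implementation; all names are hypothetical.

```python
import hashlib

class IndexTracker:
    """Track content hashes per source so updates trigger re-indexing (illustrative)."""

    def __init__(self):
        # source file name -> ordered list of content fingerprints (versions)
        self.versions: dict[str, list[str]] = {}

    def fingerprint(self, text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    def needs_reindex(self, source: str, text: str) -> bool:
        fp = self.fingerprint(text)
        history = self.versions.setdefault(source, [])
        if history and history[-1] == fp:
            return False   # unchanged: keep existing embeddings
        history.append(fp)  # new version recorded: caller should re-embed
        return True
```

Keeping the full fingerprint history per source is what makes embeddings traceable back to the document revision they came from.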
Memory and conversation
- Short-term memory
- Retains context for the current session
- Enables natural follow-ups and clarifications
- Long-term memory
- Optional persistent store across sessions
  - Provides knowledge continuity for ongoing projects
- Privacy and scope
- Memory scope can be restricted to specific projects or teams
- Data retention policies can be configured per environment
Knowledge base and semantic search
- Vector store
- ChromaDB chosen for fast, scalable similarity search
  - Supports hybrid retrieval
