
ContextAgent: Modular AI Document QA Backend with RAG Tech

Releases · Python 3.11+ · LangChain · OpenAI · Docker

ContextAgent is an AI assistant backend designed for document-based question answering using Retrieval-Augmented Generation (RAG). It uses a modular architecture that blends LangChain, OpenAI, FastAPI, and ChromaDB to build robust, production-ready workflows. The system handles diverse document formats, supports multi-tool agents, preserves conversational memory, and provides fast semantic search over large knowledge bases. It ships with Docker for straightforward deployment and scaling.

Table of contents

  • What ContextAgent is for
  • Core capabilities
  • How it works
  • Architecture at a glance
  • Modules and data flow
  • Getting started
  • Docker and deployment
  • Configuration and operations
  • Development workflow
  • Observability, testing, and reliability
  • Security considerations
  • Roadmap
  • Contributing
  • FAQ
  • Releases

What ContextAgent is for

ContextAgent targets teams and individuals who need a scalable AI assistant that can read, reason about, and answer questions from a corpus of documents. It is built to support enterprises, researchers, and developers who want to customize and extend a knowledge base with minimal friction. The backend focuses on document-centric questions, where answers draw from indexed content rather than generic training data alone.

Core capabilities

  • Document ingestion and processing
    • PDF, Word (Docx), and Markdown handling
    • Content extraction, normalization, and metadata tagging
  • Vector search and semantic retrieval
    • ChromaDB as the vector store
    • Fast, precise similarity search over embeddings
  • RAG-based question answering
    • Retrieval-augmented generation with large language models (LLMs)
    • Contextual responses derived from retrieved passages
  • Conversational memory
    • Short-term context tracking for smooth, coherent chats
    • Long-term memory options for ongoing projects
  • Modular multi-tool agents
    • Orchestrates several tools to gather data, summarize, or transform content
    • Flexible branching for complex queries
  • Production-ready deployment
    • Docker-based setup with sensible defaults
    • Configurable components for scale and reliability
  • Extensible architecture
    • Clear module boundaries
    • Simple extension points for custom tools, data sources, or processing steps

How it works

  • The user sends a query to the FastAPI endpoint.
  • The system retrieves relevant documents using a semantic search against the vector store.
  • Retrieved content is condensed and fed to an LLM via LangChain prompts.
  • The LLM generates an answer, optionally augmented with citations from the retrieved passages.
  • The conversation maintains memory to provide context-aware follow-ups and refinements.
  • Documents can flow through a processing pipeline that handles PDFs, Word documents, and Markdown, converting them into searchable embeddings for the vector store.
  • A multi-tool agent orchestrates ancillary tasks (like summarization, translation, or extraction) to produce richer responses when needed.
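The query flow above can be sketched end to end. The toy bag-of-words embedding and in-memory store below stand in for OpenAI embeddings and ChromaDB; only the control flow (embed, retrieve, assemble a prompt for the LLM) mirrors the real pipeline:

```python
# Minimal RAG loop: embed the query, retrieve by cosine similarity,
# then assemble a context-grounded prompt for the LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system uses OpenAI embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list, k: int = 2) -> list:
    # Rank stored documents by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(q, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query: str, passages: list) -> str:
    # Condense retrieved passages into the context block of the prompt.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "ChromaDB stores embeddings",
    "FastAPI serves the chat endpoint",
    "Docker packages the stack",
]
store = [(d, embed(d)) for d in docs]
query = "Where are embeddings stored?"
prompt = build_prompt(query, retrieve(query, store, k=1))
```

In the real system, `prompt` would be sent to the LLM through a LangChain prompt template, and the retrieved passages would be returned alongside the answer as citations.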

Architecture at a glance

  • FastAPI service layer
    • Exposes REST endpoints for chat, document ingestion, and admin tasks
  • LangChain-based orchestration
    • Manages prompt templates, tool calls, and memory interactions
  • OpenAI or other LLM backends
    • Core language model for generation and reasoning
  • Vector store (ChromaDB)
    • Embeddings storage and rapid similarity search
  • Document processing modules
    • PDF, Docx, Markdown parsers and text extractors
  • Conversational memory
    • Context tracking across turns and sessions
  • Multi-tool agents
    • A suite of tools that can be invoked to fetch data, summarize, translate, or transform content
  • Docker deployment
    • Production-ready containerization with minimal setup

Modules and data flow

  • API layer
    • Routes for chat, upload, and knowledge base management
    • Validation and rate limiting to protect resources
  • Ingestion pipeline
    • Detects document type
    • Extracts text and metadata
    • Creates embeddings and stores them in ChromaDB
  • Retrieval and ranking
    • Embedding-based search returns top candidates
    • Context is assembled for the LLM
  • Generation and reasoning
    • LLM receives context, user prompt, and memory state
    • Produces answer with optional citations
  • Memory subsystem
    • Short-term memory per session for continuity
    • Optional long-term memory with persistence
  • Tools and orchestration
    • Agents decide when to call tools for extra tasks
    • Results are integrated into the response
  • Output channel
    • Returns structured response, sources, and follow-up questions
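The output channel's structured response can be sketched as a plain dataclass; the field names here are illustrative, not the project's actual schema:

```python
# Illustrative shape of the structured payload the output channel returns:
# the answer, its sources, and suggested follow-up questions.
from dataclasses import dataclass, field, asdict

@dataclass
class ChatResponse:
    answer: str
    sources: list = field(default_factory=list)
    follow_ups: list = field(default_factory=list)

    def to_json_dict(self) -> dict:
        # Serializable form for the FastAPI response body.
        return asdict(self)

resp = ChatResponse(
    answer="Embeddings are stored in ChromaDB.",
    sources=["architecture.md#vector-store"],
    follow_ups=["How do I re-index after an update?"],
)
```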

Getting started

Prerequisites

  • Python 3.11+ or a compatible Python environment
  • Docker and Docker Compose for containerized deployment
  • Access to an OpenAI API key or an alternative LLM provider
  • Basic familiarity with command line operations

Local development setup

  • Create a virtual environment
    • Install dependencies: a minimal, focused set that includes FastAPI, LangChain, OpenAI, and ChromaDB
  • Configure environment
    • OPENAI_API_KEY must be set
    • CHROMA_HOST or local vector store configuration
    • Document ingestion paths for sample data
  • Run locally
    • Start the API server and supporting services
    • Validate the chat endpoint with a test query
  • Document ingestion
    • Point the ingestion process at sample PDFs, Docx, or Markdown files
    • Verify that embeddings are created and stored in the vector store
  • Memory and conversation testing
    • Start a chat session and test follow-up questions
    • Confirm memory persistence across sessions if enabled
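A small pre-flight check can catch missing configuration before the server starts; the variable names match those listed under Configuration and operations, and the defaults here are illustrative:

```python
# Fail fast if required settings are absent; fill in optional defaults.
REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL_DEFAULTS = {"CHROMA_HOST": "localhost", "CHROMA_PORT": "8000"}

def check_env(env: dict) -> dict:
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    # Merge in defaults for anything optional that was not set.
    return {k: env.get(k, default) for k, default in OPTIONAL_DEFAULTS.items()}
```

Call it with `check_env(dict(os.environ))` at startup so a missing key fails immediately instead of at the first request.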

Docker and deployment

Production-ready Docker setup

  • Single-node deployments for development or small teams
  • Multi-node configurations for larger teams or higher load
  • Separate services for API, vector store, and memory components
  • Environment-driven configuration to switch models, stores, or memory behavior

What you get with Docker

  • A reproducible environment
  • Isolation between components
  • Simple upgrade paths via container images
  • Scalable through standard Docker tooling

Basic docker-compose example (conceptual)

  • A typical setup includes:
    • api: FastAPI service
    • vector-store: ChromaDB or a compatible vector store
    • memory: a memory module
    • llm: a containerized LLM runner
    • ingestor: document ingestion service
  • The exact file layout and version pins vary by release
  • Use docker-compose up -d to start and docker-compose logs -f to monitor
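A conceptual compose file for the smallest useful subset (API plus vector store) might look like the following; the image names, ports, and volume paths are illustrative, and each release pins its own versions:

```yaml
# Conceptual sketch only; consult the release for the actual file and pins.
services:
  api:
    image: contextagent/api:latest   # illustrative image name
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CHROMA_HOST=vector-store
    ports:
      - "8000:8000"
    depends_on:
      - vector-store
  vector-store:
    image: chromadb/chroma:latest
    volumes:
      - chroma-data:/chroma/chroma
volumes:
  chroma-data:
```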

How to deploy step by step

  • Step 1: Prepare environment
    • Create a dedicated project directory
    • Export OPENAI_API_KEY and any other required secrets
  • Step 2: Obtain and configure images
    • Pull the latest available images from your registry or Docker Hub
    • Confirm versions align with your needs (e.g., LLM model, vector store)
  • Step 3: Configure services
    • Point to a known data directory for document ingestion
    • Set memory policies and timeout values
  • Step 4: Start services
    • Run the compose file to bring up the stack
  • Step 5: Validate operations
    • Access the API docs and try a sample chat
    • Ingest a small document and verify retrieval and answer generation
  • Step 6: Scale and secure
    • Move to a production-ready deployment with proper secrets management
    • Enable TLS, authentication, and monitoring
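Step 5's validation can be scripted as a small smoke check against the running stack. The paths are standard FastAPI defaults (/docs and /openapi.json), and the base URL is an assumption about your deployment:

```python
# Post-deploy smoke check: confirm the API docs and schema are reachable.
from urllib.parse import urljoin
from urllib.request import urlopen

def endpoint(base: str, path: str) -> str:
    # Join a base URL and a path without doubling or dropping slashes.
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))

def smoke_check(base: str = "http://localhost:8000") -> bool:
    # FastAPI serves interactive docs at /docs and the schema at /openapi.json.
    for path in ("docs", "openapi.json"):
        with urlopen(endpoint(base, path), timeout=10) as resp:
            if resp.status != 200:
                return False
    return True
```

Run `smoke_check()` after `docker-compose up -d`, then ingest a small document and confirm retrieval through the chat endpoint.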

Configuration and operations

Environment variables and knobs

  • OPENAI_API_KEY: your OpenAI key or equivalent
  • LLM_PROVIDER: openai or another supported provider
  • EMBEDDING_MODEL: the embedding model to generate vectors
  • CHROMA_HOST: host for the vector store
  • CHROMA_PORT: port for the vector store
  • MEMORY_ENABLED: boolean to enable conversational memory
  • MEMORY_PERSISTENCE: choice between in-memory and persistent storage
  • INGRESS_BASE_PATH: API base path
  • LOG_LEVEL: e.g., INFO, DEBUG
  • INTAKE_DIRECTORIES: paths to watch for document ingestion
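These knobs might be parsed into a typed settings object along the following lines; the names match the list above, while the defaults and the class itself are illustrative:

```python
# Read environment knobs into a typed, immutable settings object.
from dataclasses import dataclass

def _as_bool(value: str) -> bool:
    # Accept common truthy spellings for boolean knobs like MEMORY_ENABLED.
    return value.strip().lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True)
class Settings:
    llm_provider: str
    chroma_host: str
    chroma_port: int
    memory_enabled: bool
    log_level: str

def load_settings(env: dict) -> Settings:
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "openai"),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8000")),
        memory_enabled=_as_bool(env.get("MEMORY_ENABLED", "false")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

In practice you would call `load_settings(dict(os.environ))` once at startup and pass the frozen object around, rather than reading the environment ad hoc.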

Ingestion and indexing

  • Supported formats: PDF, Docx, Markdown
  • Ingestion steps:
    • Extract text and metadata
    • Clean and normalize content
    • Generate embeddings
    • Store embeddings in ChromaDB with associated metadata
  • Index maintenance
    • Re-index on document updates
    • Versioned embeddings for traceability
  • Document metadata
    • Source file name, page numbers, extraction quality, and index timestamps
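The ingestion steps above can be sketched as a normalize-and-chunk pass that tags every chunk with source metadata before it would be embedded and stored; the chunk size and metadata field names are assumptions:

```python
# Normalize extracted text, split it into fixed-size chunks, and attach the
# metadata (source, position, timestamp) that later travels into ChromaDB.
import time

def normalize(text: str) -> str:
    # Collapse whitespace left over from PDF/Docx extraction.
    return " ".join(text.split())

def chunk_with_metadata(text: str, source: str, size: int = 40) -> list:
    words = normalize(text).split()
    chunks = []
    for i in range(0, len(words), size):
        chunks.append({
            "text": " ".join(words[i:i + size]),
            "source": source,          # e.g. original file name
            "chunk_index": i // size,  # position for citation and re-indexing
            "indexed_at": int(time.time()),
        })
    return chunks
```

Each dict here corresponds to one embedding plus its metadata record, which is what makes re-indexing and per-chunk citations possible.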

Memory and conversation

  • Short-term memory
    • Retains context for the current session
    • Enables natural follow-ups and clarifications
  • Long-term memory
    • Optional persistent store across sessions
    • Allows knowledge continuity for ongoing projects
  • Privacy and scope
    • Memory scope can be restricted to specific projects or teams
    • Data retention policies can be configured per environment
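The short-term memory described above can be sketched as a bounded per-session turn buffer that drops the oldest turns when the window fills; the window size is an assumption:

```python
# Bounded short-term memory: keep the last N turns of a session and render
# them as context for the next LLM call.
from collections import deque

class SessionMemory:
    def __init__(self, max_turns: int = 10):
        # deque with maxlen silently evicts the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))

    def context(self) -> str:
        # Flatten the window into a transcript for the prompt.
        return "\n".join(f"{role}: {content}" for role, content in self.turns)
```

Long-term memory would persist these turns (or summaries of them) to a durable store instead of an in-process deque, scoped per project or team as noted above.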

Knowledge base and semantic search

  • Vector store
    • ChromaDB chosen for fast, scalable similarity search
    • Supports hybrid retrieval
No findings