RAGLight
<div align="center"> <img alt="RAGLight" height="200px" src="./media/raglight.png"> </div>

RAGLight is a lightweight and modular Python library for implementing Retrieval-Augmented Generation (RAG). It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
Designed for simplicity and flexibility, RAGLight provides modular components to easily integrate various LLMs, embeddings, and vector stores, making it an ideal tool for building context-aware AI solutions.
⚠️ Requirements
Currently, RAGLight supports:
- Ollama
- Google Gemini
- LMStudio
- vLLM
- OpenAI API
- Mistral API
- AWS Bedrock
If you use LMStudio, make sure the model you want to use is loaded in LMStudio. If you use AWS Bedrock, configure your AWS credentials (environment variables, ~/.aws/credentials, or an IAM role); no extra install is needed.
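For the environment-variable route, the standard AWS SDK variables are picked up automatically. The values below are placeholders; the region must be one where your Bedrock models are available:

```shell
# Standard AWS SDK environment variables, read by boto3 and the AWS CLI
export AWS_ACCESS_KEY_ID="AKIA..."       # your access key ID
export AWS_SECRET_ACCESS_KEY="..."       # your secret access key
export AWS_DEFAULT_REGION="us-east-1"    # region hosting your Bedrock models
```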
Features
- Embeddings Model Integration: Plug in your preferred embedding models (e.g., HuggingFace all-MiniLM-L6-v2) for compact and efficient vector embeddings.
- LLM Agnostic: Seamlessly integrates with different LLMs from different providers (Ollama, LMStudio, Mistral, OpenAI, Google Gemini, AWS Bedrock).
- RAG Pipeline: Combines document retrieval and language generation in a unified workflow.
- Agentic RAG Pipeline: Use an agent to improve your RAG performance.
- 🔌 MCP Integration: Add external tool capabilities (e.g. code execution, database access) via MCP servers.
- Flexible Document Support: Ingest and index various document types (e.g., PDF, TXT, DOCX, Python, JavaScript, ...).
- Extensible Architecture: Easily swap vector stores, embedding models, or LLMs to suit your needs.
- 🔍 Hybrid Search (BM25 + Semantic + RRF): Combine keyword-based BM25 retrieval with dense vector search using Reciprocal Rank Fusion for best-of-both-worlds results.
- ✍️ Query Reformulation: Automatically rewrites follow-up questions into standalone queries using conversation history, improving retrieval accuracy in multi-turn conversations.
- 💬 Conversation History: Full multi-turn history supported across all providers (Ollama, OpenAI, Mistral, LMStudio, Gemini, Bedrock) with an optional `max_history` cap.
- ⚡ Streaming Output: Token-by-token streaming via `generate_streaming()` on all providers, a drop-in alternative to `generate()` with no extra configuration.
- ☁️ AWS Bedrock: Use Claude, Titan, Llama, and other Bedrock models for both LLM inference and embeddings.
- 📊 Langfuse Observability (v3+): Trace every RAG call end-to-end (retrieve, rerank, and generate) directly in your Langfuse dashboard.
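The fusion step behind hybrid search can be illustrated independently of RAGLight's internals. Reciprocal Rank Fusion scores each document by summing 1/(k + rank) over every ranked list it appears in, so documents that rank well in both BM25 and semantic search float to the top. The function name and the common k=60 default below are illustrative, not RAGLight's API:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to a document's score; documents
    ranked highly by several retrievers accumulate the largest totals.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # keyword (BM25) ranking
dense_hits = ["doc_b", "doc_c", "doc_a"]  # semantic (dense vector) ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # doc_b wins: ranked 2nd and 1st
```

Note that `doc_b` beats `doc_a` even though `doc_a` tops the BM25 list: consistent mid-to-high placement across both retrievers outweighs a single first place.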
Import library 🛠️
Install the base library:
pip install raglight
RAGLight uses optional extras for vector store backends, so you only install what you need:
| Extra | Package installed | Notes |
| -------------------- | ----------------- | ----------------------------------------------------- |
| raglight[chroma] | chromadb | Requires a C++ compiler on Windows |
| raglight[qdrant] | qdrant-client | Pure Python — works on Windows without a C++ compiler |
| raglight[langfuse] | langfuse | Observability tracing |
pip install "raglight[qdrant]" # Qdrant only (Windows-friendly)
pip install "raglight[chroma]" # ChromaDB only
pip install "raglight[chroma,qdrant]" # both
pip install "raglight[qdrant,langfuse]" # Qdrant + observability
Chat with Your Documents Instantly With CLI 💬
For the quickest and easiest way to get started, RAGLight provides an interactive command-line wizard. It will guide you through every step, from selecting your documents to chatting with them, without writing a single line of Python. Prerequisite: Ensure you have a local LLM service like Ollama running.
Just run this one command in your terminal:
raglight chat
You can also launch the Agentic RAG wizard with:
raglight agentic-chat
The wizard will guide you through the setup process. Here is what it looks like:
<div align="center"> <img alt="RAGLight" src="./media/cli.png"> </div>

The wizard will ask you for:
- 📂 Data Source: The path to your local folder containing the documents.
- 🚫 Ignore Folders: Which folders to exclude during indexing (e.g., `.venv`, `node_modules`, `__pycache__`).
- 💾 Vector Database: Where to store the indexed data and what to name it.
- 🧠 Embeddings Model: Which model to use for understanding your documents.
- 🤖 Language Model (LLM): Which LLM to use for generating answers.
After configuration, it will automatically index your documents and start a chat session.
Ignore Folders Feature 🚫
RAGLight automatically excludes common directories that shouldn't be indexed, such as:
- Virtual environments (`.venv`, `venv`, `env`)
- Node.js dependencies (`node_modules`)
- Python cache files (`__pycache__`)
- Build artifacts (`build`, `dist`, `target`)
- IDE files (`.vscode`, `.idea`)
- And many more...
You can customize this list during the CLI setup or use the default configuration. This ensures that only relevant code and documentation are indexed, improving performance and reducing noise in your search results.
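Conceptually, the exclusion behaves like pruning during a directory walk: ignored directories are dropped before the walker descends into them, so nothing beneath them is ever visited. A minimal, generic sketch of that idea (not RAGLight's actual implementation; the ignore set is a subset of typical defaults):

```python
import os

# A subset of typical defaults; RAGLight ships its own full list.
IGNORE = {".venv", "venv", "env", "node_modules", "__pycache__",
          "build", "dist", "target", ".vscode", ".idea", ".git"}

def iter_indexable_files(root, ignore=IGNORE):
    """Yield file paths under root, skipping ignored directory trees."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning at walk time, rather than filtering paths afterwards, means a huge `node_modules` tree costs nothing: its thousands of files are never even listed.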
Ignore Folders in Configuration Classes 🚫
The ignore folders feature is also available in all configuration classes, allowing you to specify which directories to exclude during indexing:
- RAGConfig: Use the `ignore_folders` parameter to exclude folders during RAG pipeline indexing.
- AgenticRAGConfig: Use the `ignore_folders` parameter to exclude folders during AgenticRAG pipeline indexing.
- VectorStoreConfig: Use the `ignore_folders` parameter to exclude folders during vector store operations.
All configuration classes use Settings.DEFAULT_IGNORE_FOLDERS as the default value, but you can override this with your custom list:
# Example: custom ignore folders for any configuration
custom_ignore_folders = [
    ".venv",
    "venv",
    "node_modules",
    "__pycache__",
    ".git",
    "build",
    "dist",
    "temp_files",  # your custom folders
    "cache",
]

# Use in any configuration class
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    ignore_folders=custom_ignore_folders,  # override the default
)
See the complete example in examples/ignore_folders_config_example.py for all configuration types.
Deploy as a REST API (raglight serve) 🌐
raglight serve starts a FastAPI server configured entirely via environment variables — no Python code required.
Start the server
raglight serve
Options:
--host Host to bind (default: 0.0.0.0)
--port Port to listen on (default: 8000)
--reload Enable auto-reload for development (default: false)
--workers Number of worker processes (default: 1)
--ui Launch the Streamlit chat UI alongside the API (default: false)
--ui-port Port for the Streamlit UI (default: 8501)
Example:
RAGLIGHT_LLM_MODEL=mistral-small-latest \
RAGLIGHT_LLM_PROVIDER=Mistral \
raglight serve --port 8080
Langfuse tracing example:
LANGFUSE_HOST=http://localhost:3000 \
LANGFUSE_PUBLIC_KEY=pk-lf-... \
LANGFUSE_SECRET_KEY=sk-lf-... \
raglight serve
Langfuse tracing is enabled automatically when `LANGFUSE_HOST` (or `LANGFUSE_BASE_URL`), `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY` are all set in the environment. Requires `pip install "raglight[langfuse]"`.
Launch the Chat UI 💬
Add --ui to start a Streamlit chat interface alongside the REST API — no extra setup required:
raglight serve --ui