RAGLight
<div align="center"> <img alt="RAGLight" height="200px" src="./media/raglight.png"> </div>

RAGLight is a lightweight and modular Python library for implementing Retrieval-Augmented Generation (RAG). It enhances the capabilities of Large Language Models (LLMs) by combining document retrieval with natural language inference.
Designed for simplicity and flexibility, RAGLight provides modular components to easily integrate various LLMs, embeddings, and vector stores, making it an ideal tool for building context-aware AI solutions.
⚠️ Requirements
Currently, RAGLight supports:
- Ollama
- Google Gemini
- LMStudio
- vLLM
- OpenAI API
- Mistral API
- AWS Bedrock
If you use LMStudio, make sure the model you want to use is loaded in LMStudio. If you use AWS Bedrock, configure your AWS credentials (environment variables, ~/.aws/credentials, or an IAM role); no extra install is needed.
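For the environment-variable route, the standard AWS SDK variables are picked up automatically. The values below are placeholders; the region must be one where your Bedrock models are available:

```shell
# Standard AWS SDK environment variables, read by boto3 and the AWS CLI
export AWS_ACCESS_KEY_ID="AKIA..."       # your access key ID
export AWS_SECRET_ACCESS_KEY="..."       # your secret access key
export AWS_DEFAULT_REGION="us-east-1"    # region hosting your Bedrock models
```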
Features
- Embeddings Model Integration: Plug in your preferred embedding models (e.g., HuggingFace all-MiniLM-L6-v2) for compact and efficient vector embeddings.
- LLM Agnostic: Seamlessly integrates with different LLMs from different providers (Ollama, LMStudio, Mistral, OpenAI, Google Gemini, AWS Bedrock).
- RAG Pipeline: Combines document retrieval and language generation in a unified workflow.
- Agentic RAG Pipeline: Use an agent to improve your RAG performance.
- 🔌 MCP Integration: Add external tool capabilities (e.g. code execution, database access) via MCP servers.
- Flexible Document Support: Ingest and index various document types (e.g., PDF, TXT, DOCX, Python, JavaScript, ...).
- Extensible Architecture: Easily swap vector stores, embedding models, or LLMs to suit your needs.
- 🔍 Hybrid Search (BM25 + Semantic + RRF): Combine keyword-based BM25 retrieval with dense vector search using Reciprocal Rank Fusion for best-of-both-worlds results.
- ✍️ Query Reformulation: Automatically rewrites follow-up questions into standalone queries using conversation history, improving retrieval accuracy in multi-turn conversations.
- 💬 Conversation History: Full multi-turn history supported across all providers (Ollama, OpenAI, Mistral, LMStudio, Gemini, Bedrock) with an optional `max_history` cap.
- ⚡ Streaming Output: Token-by-token streaming via `generate_streaming()` on all providers, a drop-in alternative to `generate()` with no extra configuration.
- ☁️ AWS Bedrock: Use Claude, Titan, Llama, and other Bedrock models for both LLM inference and embeddings.
- 📊 Langfuse Observability (v3+): Trace every RAG call end-to-end (retrieve, rerank, and generate) directly in your Langfuse dashboard.
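The fusion step behind hybrid search can be illustrated independently of RAGLight's internals. Reciprocal Rank Fusion scores each document by summing 1/(k + rank) over every ranked list it appears in, so documents that rank well in both BM25 and semantic search float to the top. The function name and the common k=60 default below are illustrative, not RAGLight's API:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to a document's score; documents
    ranked highly by several retrievers accumulate the largest totals.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # keyword (BM25) ranking
dense_hits = ["doc_b", "doc_c", "doc_a"]  # semantic (dense vector) ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # doc_b wins: ranked 2nd and 1st
```

Note that `doc_b` beats `doc_a` even though `doc_a` tops the BM25 list: consistent mid-to-high placement across both retrievers outweighs a single first place.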
Import library 🛠️
Install the base library:
pip install raglight
RAGLight uses optional extras for vector store backends, so you only install what you need:
| Extra | Package installed | Notes |
| -------------------- | ----------------- | ----------------------------------------------------- |
| raglight[chroma] | chromadb | Requires a C++ compiler on Windows |
| raglight[qdrant] | qdrant-client | Pure Python — works on Windows without a C++ compiler |
| raglight[langfuse] | langfuse | Observability tracing |
pip install "raglight[qdrant]" # Qdrant only (Windows-friendly)
pip install "raglight[chroma]" # ChromaDB only
pip install "raglight[chroma,qdrant]" # both
pip install "raglight[qdrant,langfuse]" # Qdrant + observability
Chat with Your Documents Instantly With CLI 💬
For the quickest and easiest way to get started, RAGLight provides an interactive command-line wizard. It will guide you through every step, from selecting your documents to chatting with them, without writing a single line of Python. Prerequisite: Ensure you have a local LLM service like Ollama running.
Just run this one command in your terminal:
raglight chat
You can also launch the Agentic RAG wizard with:
raglight agentic-chat
The wizard will guide you through the setup process. Here is what it looks like:
<div align="center"> <img alt="RAGLight" src="./media/cli.png"> </div>

The wizard will ask you for:
- 📂 Data Source: The path to your local folder containing the documents.
- 🚫 Ignore Folders: Which folders to exclude during indexing (e.g., `.venv`, `node_modules`, `__pycache__`).
- 💾 Vector Database: Where to store the indexed data and what to name it.
- 🧠 Embeddings Model: Which model to use for understanding your documents.
- 🤖 Language Model (LLM): Which LLM to use for generating answers.
After configuration, it will automatically index your documents and start a chat session.
Ignore Folders Feature 🚫
RAGLight automatically excludes common directories that shouldn't be indexed, such as:
- Virtual environments (`.venv`, `venv`, `env`)
- Node.js dependencies (`node_modules`)
- Python cache files (`__pycache__`)
- Build artifacts (`build`, `dist`, `target`)
- IDE files (`.vscode`, `.idea`)
- And many more...
You can customize this list during the CLI setup or use the default configuration. This ensures that only relevant code and documentation are indexed, improving performance and reducing noise in your search results.
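Conceptually, the exclusion behaves like pruning during a directory walk: ignored directories are dropped before the walker descends into them, so nothing beneath them is ever visited. A minimal, generic sketch of that idea (not RAGLight's actual implementation; the ignore set is a subset of typical defaults):

```python
import os

# A subset of typical defaults; RAGLight ships its own full list.
IGNORE = {".venv", "venv", "env", "node_modules", "__pycache__",
          "build", "dist", "target", ".vscode", ".idea", ".git"}

def iter_indexable_files(root, ignore=IGNORE):
    """Yield file paths under root, skipping ignored directory trees."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning at walk time, rather than filtering paths afterwards, means a huge `node_modules` tree costs nothing: its thousands of files are never even listed.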
Ignore Folders in Configuration Classes 🚫
The ignore folders feature is also available in all configuration classes, allowing you to specify which directories to exclude during indexing:
- RAGConfig: Use the `ignore_folders` parameter to exclude folders during RAG pipeline indexing.
- AgenticRAGConfig: Use the `ignore_folders` parameter to exclude folders during AgenticRAG pipeline indexing.
- VectorStoreConfig: Use the `ignore_folders` parameter to exclude folders during vector store operations.
All configuration classes use Settings.DEFAULT_IGNORE_FOLDERS as the default value, but you can override this with your custom list:
# Example: custom ignore folders for any configuration
custom_ignore_folders = [
    ".venv",
    "venv",
    "node_modules",
    "__pycache__",
    ".git",
    "build",
    "dist",
    "temp_files",  # your custom folders
    "cache",
]

# Use in any configuration class
config = RAGConfig(
    llm=Settings.DEFAULT_LLM,
    provider=Settings.OLLAMA,
    ignore_folders=custom_ignore_folders,  # override the default
)
See the complete example in examples/ignore_folders_config_example.py for all configuration types.
Deploy as a REST API (raglight serve) 🌐
raglight serve starts a FastAPI server configured entirely via environment variables — no Python code required.
Start the server
raglight serve
Options:
--host Host to bind (default: 0.0.0.0)
--port Port to listen on (default: 8000)
--reload Enable auto-reload for development (default: false)
--workers Number of worker processes (default: 1)
--ui Launch the Streamlit chat UI alongside the API (default: false)
--ui-port Port for the Streamlit UI (default: 8501)
Example:
RAGLIGHT_LLM_MODEL=mistral-small-latest \
RAGLIGHT_LLM_PROVIDER=Mistral \
raglight serve --port 8080
Langfuse tracing example:
LANGFUSE_HOST=http://localhost:3000 \
LANGFUSE_PUBLIC_KEY=pk-lf-... \
LANGFUSE_SECRET_KEY=sk-lf-... \
raglight serve
Langfuse tracing is enabled automatically when `LANGFUSE_HOST` (or `LANGFUSE_BASE_URL`), `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY` are all set in the environment. Requires `pip install "raglight[langfuse]"`.
Launch the Chat UI 💬
Add --ui to start a Streamlit chat interface alongside the REST API — no extra setup required:
raglight serve --ui