RLAMA - User Guide
⚠️ Project Temporarily Paused
This project is currently on pause due to work and university commitments that take up most of my time, so I am not able to actively maintain it at the moment. Development will resume when my situation allows.
RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.
Table of Contents
- Vision & Roadmap
- Installation
- Available Commands
- rag - Create a RAG system
- crawl-rag - Create a RAG system from a website
- wizard - Create a RAG system with interactive setup
- watch - Set up directory watching for a RAG system
- watch-off - Disable directory watching for a RAG system
- check-watched - Check a RAG's watched directory for new files
- web-watch - Set up website monitoring for a RAG system
- web-watch-off - Disable website monitoring for a RAG system
- check-web-watched - Check a RAG's monitored website for updates
- run - Use a RAG system
- api - Start API server
- list - List RAG systems
- delete - Delete a RAG system
- list-docs - List documents in a RAG
- list-chunks - Inspect document chunks
- view-chunk - View chunk details
- add-docs - Add documents to RAG
- crawl-add-docs - Add website content to RAG
- update-model - Change LLM model
- update - Update RLAMA
- version - Display version
- hf-browse - Browse GGUF models on Hugging Face
- run-hf - Run a Hugging Face GGUF model
- Uninstallation
- Supported Document Formats
- Troubleshooting
- Using OpenAI Models
Vision & Roadmap
RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:
Completed Features ✅
- ✅ Basic RAG System Creation: CLI tool for creating and managing RAG systems
- ✅ Document Processing: Support for multiple document formats (.txt, .md, .pdf, etc.)
- ✅ Document Chunking: Advanced semantic chunking with multiple strategies (fixed, semantic, hierarchical, hybrid)
- ✅ Vector Storage: Local storage of document embeddings
- ✅ Context Retrieval: Basic semantic search with configurable context size
- ✅ Ollama Integration: Seamless connection to Ollama models
- ✅ Cross-Platform Support: Works on Linux, macOS, and Windows
- ✅ Easy Installation: One-line installation script
- ✅ API Server: HTTP endpoints for integrating RAG capabilities in other applications
- ✅ Web Crawling: Create RAGs directly from websites
- ✅ Guided RAG Setup Wizard: Interactive interface for easy RAG creation
- ✅ Hugging Face Integration: Access to 45,000+ GGUF models from Hugging Face Hub
Small LLM Optimization (Q2 2025)
- [ ] Prompt Compression: Smart context summarization for limited context windows
- ✅ Adaptive Chunking: Dynamic content segmentation based on semantic boundaries and document structure
- ✅ Minimal Context Retrieval: Intelligent filtering to eliminate redundant content
- [ ] Parameter Optimization: Fine-tuned settings for different model sizes
Advanced Embedding Pipeline (Q2-Q3 2025)
- [ ] Multi-Model Embedding Support: Integration with various embedding models
- [ ] Hybrid Retrieval Techniques: Combining sparse and dense retrievers for better accuracy
- [ ] Embedding Evaluation Tools: Built-in metrics to measure retrieval quality
- [ ] Automated Embedding Cache: Smart caching to reduce computation for similar queries
User Experience Enhancements (Q3 2025)
- [ ] Lightweight Web Interface: Simple browser-based UI for the existing CLI backend
- [ ] Knowledge Graph Visualization: Interactive exploration of document connections
- [ ] Domain-Specific Templates: Pre-configured settings for different domains
Enterprise Features (Q4 2025)
- [ ] Multi-User Access Control: Role-based permissions for team environments
- [ ] Integration with Enterprise Systems: Connectors for SharePoint, Confluence, Google Workspace
- [ ] Knowledge Quality Monitoring: Detection of outdated or contradictory information
- [ ] System Integration API: Webhooks and APIs for embedding RLAMA in existing workflows
- [ ] AI Agent Creation Framework: Simplified system for building custom AI agents with RAG capabilities
Next-Gen Retrieval Innovations (Q1 2026)
- [ ] Multi-Step Retrieval: Using the LLM to refine search queries for complex questions
- [ ] Cross-Modal Retrieval: Support for image content understanding and retrieval
- [ ] Feedback-Based Optimization: Learning from user interactions to improve retrieval
- [ ] Knowledge Graphs & Symbolic Reasoning: Combining vector search with structured knowledge
RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.
Installation
Prerequisites
- Ollama installed and running
Installation from terminal
curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh
Tech Stack
RLAMA is built with:
- Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
- CLI Framework: Cobra (for command-line interface structure)
- LLM Integration: Ollama API (for embeddings and completions)
- Storage: Local filesystem-based storage (JSON files for simplicity and portability)
- Vector Search: Custom implementation of cosine similarity for embedding retrieval
Architecture
RLAMA follows a clean architecture pattern with clear separation of concerns:
rlama/
├── cmd/                          # CLI commands (using Cobra)
│   ├── root.go                   # Base command
│   ├── rag.go                    # Create RAG systems
│   ├── run.go                    # Query RAG systems
│   └── ...
├── internal/
│   ├── client/                   # External API clients
│   │   └── ollama_client.go      # Ollama API integration
│   ├── domain/                   # Core domain models
│   │   ├── rag.go                # RAG system entity
│   │   └── document.go           # Document entity
│   ├── repository/               # Data persistence
│   │   └── rag_repository.go     # Handles saving/loading RAGs
│   └── service/                  # Business logic
│       ├── rag_service.go        # RAG operations
│       ├── document_loader.go    # Document processing
│       └── embedding_service.go  # Vector embeddings
└── pkg/                          # Shared utilities
    └── vector/                   # Vector operations
Data Flow
1. Document Processing: Documents are loaded from the file system, parsed according to their type, and converted to plain text.
2. Embedding Generation: Document text is sent to Ollama to generate vector embeddings.
3. Storage: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).
4. Query Process: When a user asks a question, it is converted to an embedding and compared against the stored document embeddings, and the most relevant content is retrieved.
5. Response Generation: The retrieved content and the question are sent to Ollama to generate a contextually informed response.
Visual Representation
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│   (Input)   │     │ Processing  │     │ Generation  │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Query    │────>│   Vector    │<────│ Vector Store│
│  Response   │     │   Search    │     │ (RAG System)│
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                   │
       │                   ▼
┌─────────────┐     ┌─────────────┐
│   Ollama    │<────│   Context   │
│    (LLM)    │     │  Building   │
└─────────────┘     └─────────────┘

