<!-- Social Links Navigation Bar --> <div align="center"> <a href="https://x.com/LeDonTizi" target="_blank"> <img src="https://img.shields.io/badge/Twitter-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white" alt="Twitter"> </a> <a href="https://discord.gg/tP5JB9DR" target="_blank"> <img src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"> </a> <a href="https://www.youtube.com/@Dontizi" target="_blank"> <img src="https://img.shields.io/badge/YouTube-FF0000?style=for-the-badge&logo=youtube&logoColor=white" alt="YouTube"> </a> </div> <br>

RLAMA - User Guide

⚠️ Project Temporarily Paused
This project is currently paused due to work and university commitments that take up much of my time, so I am unable to actively maintain it at the moment. Development will resume when my situation allows.

RLAMA is a powerful AI-driven question-answering tool for your documents, seamlessly integrating with your local Ollama models. It enables you to create, manage, and interact with Retrieval-Augmented Generation (RAG) systems tailored to your documentation needs.

RLAMA Demonstration

Vision & Roadmap

RLAMA aims to become the definitive tool for creating local RAG systems that work seamlessly for everyone—from individual developers to large enterprises. Here's our strategic roadmap:

Completed Features ✅

  • Basic RAG System Creation: CLI tool for creating and managing RAG systems
  • Document Processing: Support for multiple document formats (.txt, .md, .pdf, etc.)
  • Document Chunking: Advanced semantic chunking with multiple strategies (fixed, semantic, hierarchical, hybrid)
  • Vector Storage: Local storage of document embeddings
  • Context Retrieval: Basic semantic search with configurable context size
  • Ollama Integration: Seamless connection to Ollama models
  • Cross-Platform Support: Works on Linux, macOS, and Windows
  • Easy Installation: One-line installation script
  • API Server: HTTP endpoints for integrating RAG capabilities in other applications
  • Web Crawling: Create RAGs directly from websites
  • Guided RAG Setup Wizard: Interactive interface for easy RAG creation
  • Hugging Face Integration: Access to 45,000+ GGUF models from Hugging Face Hub
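
The simplest of the chunking strategies listed above, fixed-size chunking with overlap, can be sketched in a few lines of Go. This is an illustrative sketch only; the function and parameter names are hypothetical and this is not RLAMA's actual implementation:

```go
package main

import "fmt"

// chunkFixed splits text into chunks of at most chunkSize runes,
// repeating `overlap` runes between consecutive chunks so that
// context spanning a chunk boundary is not lost.
// Illustrative sketch of the "fixed" strategy, not RLAMA's code.
func chunkFixed(text string, chunkSize, overlap int) []string {
	runes := []rune(text)
	step := chunkSize - overlap
	if step <= 0 {
		step = chunkSize
	}
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	fmt.Println(chunkFixed("abcdefghij", 4, 1)) // [abcd defg ghij]
}
```

The semantic, hierarchical, and hybrid strategies build on the same idea but choose boundaries from document structure rather than a fixed rune count.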

Small LLM Optimization (Q2 2025)

  • [ ] Prompt Compression: Smart context summarization for limited context windows
  • [ ] Adaptive Chunking: Dynamic content segmentation based on semantic boundaries and document structure
  • [ ] Minimal Context Retrieval: Intelligent filtering to eliminate redundant content
  • [ ] Parameter Optimization: Fine-tuned settings for different model sizes

Advanced Embedding Pipeline (Q2-Q3 2025)

  • [ ] Multi-Model Embedding Support: Integration with various embedding models
  • [ ] Hybrid Retrieval Techniques: Combining sparse and dense retrievers for better accuracy
  • [ ] Embedding Evaluation Tools: Built-in metrics to measure retrieval quality
  • [ ] Automated Embedding Cache: Smart caching to reduce computation for similar queries

User Experience Enhancements (Q3 2025)

  • [ ] Lightweight Web Interface: Simple browser-based UI for the existing CLI backend
  • [ ] Knowledge Graph Visualization: Interactive exploration of document connections
  • [ ] Domain-Specific Templates: Pre-configured settings for different domains

Enterprise Features (Q4 2025)

  • [ ] Multi-User Access Control: Role-based permissions for team environments
  • [ ] Integration with Enterprise Systems: Connectors for SharePoint, Confluence, Google Workspace
  • [ ] Knowledge Quality Monitoring: Detection of outdated or contradictory information
  • [ ] System Integration API: Webhooks and APIs for embedding RLAMA in existing workflows
  • [ ] AI Agent Creation Framework: Simplified system for building custom AI agents with RAG capabilities

Next-Gen Retrieval Innovations (Q1 2026)

  • [ ] Multi-Step Retrieval: Using the LLM to refine search queries for complex questions
  • [ ] Cross-Modal Retrieval: Support for image content understanding and retrieval
  • [ ] Feedback-Based Optimization: Learning from user interactions to improve retrieval
  • [ ] Knowledge Graphs & Symbolic Reasoning: Combining vector search with structured knowledge

RLAMA's core philosophy remains unchanged: to provide a simple, powerful, local RAG solution that respects privacy, minimizes resource requirements, and works seamlessly across platforms.

Installation

Prerequisites

  • Ollama installed and running

Installation from terminal

curl -fsSL https://raw.githubusercontent.com/dontizi/rlama/main/install.sh | sh

Tech Stack

RLAMA is built with:

  • Core Language: Go (chosen for performance, cross-platform compatibility, and single binary distribution)
  • CLI Framework: Cobra (for command-line interface structure)
  • LLM Integration: Ollama API (for embeddings and completions)
  • Storage: Local filesystem-based storage (JSON files for simplicity and portability)
  • Vector Search: Custom implementation of cosine similarity for embedding retrieval
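
The similarity measure behind that vector search is standard cosine similarity: the dot product of two embedding vectors divided by the product of their norms. A minimal Go version of such a function might look like this (a sketch of the technique, not the repository's exact code):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a,b) / (||a|| * ||b||) for two
// equal-length embedding vectors: 1 means identical direction,
// 0 means orthogonal (unrelated) vectors.
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0 // avoid dividing by zero for empty embeddings
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{1, 0})) // 1
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1})) // 0
}
```

Cosine similarity is a natural fit here because it compares direction rather than magnitude, so embeddings of different overall scale remain comparable.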

Architecture

RLAMA follows a clean architecture pattern with clear separation of concerns:

rlama/
├── cmd/                  # CLI commands (using Cobra)
│   ├── root.go           # Base command
│   ├── rag.go            # Create RAG systems
│   ├── run.go            # Query RAG systems
│   └── ...
├── internal/
│   ├── client/           # External API clients
│   │   └── ollama_client.go # Ollama API integration
│   ├── domain/           # Core domain models
│   │   ├── rag.go        # RAG system entity
│   │   └── document.go   # Document entity
│   ├── repository/       # Data persistence
│   │   └── rag_repository.go # Handles saving/loading RAGs
│   └── service/          # Business logic
│       ├── rag_service.go      # RAG operations
│       ├── document_loader.go  # Document processing
│       └── embedding_service.go # Vector embeddings
└── pkg/                  # Shared utilities
    └── vector/           # Vector operations

Data Flow

  1. Document Processing: Documents are loaded from the file system, parsed based on their type, and converted to plain text.
  2. Embedding Generation: Document text is sent to Ollama to generate vector embeddings.
  3. Storage: The RAG system (documents + embeddings) is stored in the user's home directory (~/.rlama).
  4. Query Process: When a user asks a question, it's converted to an embedding, compared against stored document embeddings, and relevant content is retrieved.
  5. Response Generation: Retrieved content and the question are sent to Ollama to generate a contextually-informed response.

Visual Representation

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Documents  │────>│  Document   │────>│  Embedding  │
│  (Input)    │     │  Processing │     │  Generation │
└─────────────┘     └─────────────┘     └─────────────┘
                                              │
                                              ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Query     │────>│  Vector     │<────│ Vector Store│
│  Response   │     │  Search     │     │ (RAG System)│
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                   │
       │                   ▼
┌─────────────┐     ┌─────────────┐
│   Ollama    │<────│   Context   │
│    (LLM)    │     │  Building   │
└─────────────┘     └─────────────┘
No findings