
LocalGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

Install / Use

/learn @PromtEngineer/LocalGPT
About this skill

Supported Platforms

Universal

README

LocalGPT - Private Document Intelligence Platform

<div align="center"> <p align="center"> <a href="https://trendshift.io/repositories/2947" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2947" alt="PromtEngineer%2FlocalGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p>


<p align="center"> <a href="https://x.com/engineerrprompt"> <img src="https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" /> </a> <a href="https://discord.gg/tUDWAFGc"> <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" /> </a> </p> </div>

🚀 What is LocalGPT?

LocalGPT is a fully private, on-premise Document Intelligence platform. Ask questions, summarise, and uncover insights from your files with state-of-the-art AI—no data ever leaves your machine.

More than a traditional RAG (Retrieval-Augmented Generation) tool, LocalGPT features a hybrid search engine that blends semantic similarity, keyword matching, and Late Chunking for long-context precision. A smart router automatically selects between RAG and direct LLM answering for every query, while contextual enrichment and sentence-level Context Pruning surface only the most relevant content. An independent verification pass adds an extra layer of accuracy.
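
The blending described above can be sketched in a few lines. This is an illustrative sketch, not LocalGPT's actual implementation: the keyword scorer is a simple term-overlap stand-in for BM25, and `alpha` is a hypothetical weight for the semantic side.

```python
# Illustrative hybrid-score blend (not the project's actual code):
# combine a semantic (cosine) score with a keyword score via a tunable weight.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query_terms, doc_terms):
    """Fraction of query terms found in the document (stand-in for BM25)."""
    if not query_terms:
        return 0.0
    return sum(1 for t in query_terms if t in doc_terms) / len(query_terms)

def hybrid_score(sem, kw, alpha=0.7):
    """Weighted blend; alpha controls how much the semantic score dominates."""
    return alpha * sem + (1 - alpha) * kw

sem = cosine([1.0, 0.0], [0.8, 0.6])                       # 0.8
kw = keyword_score(["local", "gpt"], {"local", "model"})   # 0.5
print(round(hybrid_score(sem, kw), 3))                     # 0.71
```

In a real hybrid engine the two score distributions are usually normalized before blending, since raw BM25 and cosine values live on different scales.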

The architecture is modular and lightweight: enable only the components you need. With a pure-Python RAG core and minimal dependencies on external frameworks and libraries, LocalGPT is simple to deploy, run, and maintain on any infrastructure.

▶️ Video

Watch this video to get started with LocalGPT.

Screenshots: Home | Create Index | Chat

✨ Features

  • Utmost Privacy: Your data remains on your computer, ensuring 100% security.
  • Versatile Model Support: Seamlessly integrate a variety of open-source models via Ollama.
  • Diverse Embeddings: Choose from a range of open-source embeddings.
  • Reuse Your LLM: Once downloaded, reuse your LLM without the need for repeated downloads.
  • Chat History: Remembers your previous conversations (in a session).
  • API: LocalGPT has an API that you can use for building RAG Applications.
  • GPU, CPU, HPU & MPS Support: Works on multiple platforms out of the box; chat with your data using CUDA, CPU, HPU (Intel® Gaudi®), or MPS.

📖 Document Processing

  • Multi-format Support: PDF, DOCX, TXT, Markdown, and more are planned; currently only PDF is supported
  • Contextual Enrichment: Enhanced document understanding with AI-generated context, inspired by Contextual Retrieval
  • Batch Processing: Handle multiple documents simultaneously
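
The contextual-enrichment idea can be illustrated with a short sketch. This is hypothetical code: in the real pipeline an LLM generates the situating context, whereas here `generate_context` is a stub so the example is self-contained.

```python
# Hypothetical sketch of contextual enrichment, inspired by Contextual
# Retrieval: prepend document-level context to each chunk before embedding.
def generate_context(document_title, chunk):
    """Stand-in for an LLM call that situates a chunk within its document."""
    return f"From '{document_title}': "

def enrich_chunk(document_title, chunk):
    """Prepend generated context so the embedded text carries document cues."""
    return generate_context(document_title, chunk) + chunk

print(enrich_chunk("Q3 Report", "Revenue grew 12%."))
# From 'Q3 Report': Revenue grew 12%.
```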

🤖 AI-Powered Chat

  • Natural Language Queries: Ask questions in plain English
  • Source Attribution: Every answer includes document references
  • Smart Routing: Automatically chooses between RAG and direct LLM responses
  • Query Decomposition: Breaks complex queries into sub-questions for better answers
  • Semantic Caching: TTL-based caching with similarity matching for faster responses
  • Session-Aware History: Maintains conversation context across interactions
  • Answer Verification: Independent verification pass for accuracy
  • Multiple AI Models: Ollama for inference, HuggingFace for embeddings and reranking
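
The semantic-caching behavior listed above can be sketched as follows. This is not the project's implementation: the TTL, similarity threshold, and linear scan over entries are all illustrative assumptions.

```python
# Illustrative TTL-based semantic cache: a cached answer is reused when a new
# query's embedding is similar enough and the entry has not yet expired.
import math
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, ttl_seconds=300, threshold=0.9):
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.entries = []  # (embedding, answer, timestamp)

    def get(self, embedding, now=None):
        now = time.time() if now is None else now
        # Drop expired entries, then look for a close-enough match.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        for emb, answer, _ in self.entries:
            if cosine(embedding, emb) >= self.threshold:
                return answer
        return None

    def put(self, embedding, answer, now=None):
        now = time.time() if now is None else now
        self.entries.append((embedding, answer, now))

cache = SemanticCache(ttl_seconds=60)
cache.put([1.0, 0.0], "cached answer", now=0.0)
print(cache.get([0.99, 0.05], now=10.0))   # similar query within TTL -> hit
print(cache.get([1.0, 0.0], now=120.0))    # entry expired -> None
```

A production cache would use the same embedding model as retrieval and an approximate-nearest-neighbor index instead of a linear scan.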

🛠️ Developer-Friendly

  • RESTful APIs: Complete API access for integration
  • Real-time Progress: Live updates during document processing
  • Flexible Configuration: Customize models, chunk sizes, and search parameters
  • Extensible Architecture: Plugin system for custom components
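
To show what API integration might look like, here is a hedged sketch. The `/chat` path, the payload field names, and the port are assumptions for illustration; consult the running server for the actual endpoint schema.

```python
# Hypothetical example of calling the RAG API over HTTP. The endpoint path
# and JSON fields are illustrative assumptions, not a documented schema.
import json
from urllib import request

def build_chat_payload(query, index_id, stream=False):
    """Assemble a JSON payload for a chat request (field names illustrative)."""
    return {"query": query, "index_id": index_id, "stream": stream}

def post_chat(payload, base_url="http://localhost:8001"):
    """POST the payload to the (assumed) chat endpoint; return parsed JSON."""
    req = request.Request(
        base_url + "/chat",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_payload("What does the report conclude?", "my-index")
print(payload["query"])
# post_chat(payload) requires the RAG API running on port 8001
```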

🎨 Modern Interface

  • Intuitive Web UI: Clean, responsive design
  • Session Management: Organize conversations by topic
  • Index Management: Easy document collection management
  • Real-time Chat: Streaming responses for immediate feedback

🚀 Quick Start

Note: The installation is currently only tested on macOS.

Prerequisites

  • Python 3.8 or higher (tested with Python 3.11.5)
  • Node.js 16+ and npm (tested with Node.js 23.10.0, npm 10.9.2)
  • Docker (optional, for containerized deployment)
  • 8GB+ RAM (16GB+ recommended)
  • Ollama (required for both deployment approaches)

NOTE

Until this branch is merged into the main branch, please clone the localgpt-v2 branch for installation:

git clone -b localgpt-v2 https://github.com/PromtEngineer/localGPT.git
cd localGPT

Option 1: Docker Deployment

# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Ollama locally (required even for Docker)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b

# Start Ollama
ollama serve

# Start with Docker (in a new terminal)
./start-docker.sh

# Access the application
open http://localhost:3000

Docker Management Commands:

# Check container status
docker compose ps

# View logs
docker compose logs -f

# Stop containers
./start-docker.sh stop

Option 2: Direct Development (Recommended for Development)

# Clone the repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Install Python dependencies
pip install -r requirements.txt

# Key dependencies installed:
# - torch==2.4.1, transformers==4.51.0 (AI models)
# - lancedb (vector database)
# - rank_bm25, fuzzywuzzy (search algorithms)
# - sentence_transformers, rerankers (embedding/reranking)
# - docling (document processing)
# - colpali-engine (multimodal processing - support coming soon)

# Install Node.js dependencies
npm install

# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3:0.6b
ollama pull qwen3:8b
ollama serve

# Start the system (in a new terminal)
python run_system.py

# Access the application
open http://localhost:3000

System Management:

# Check system health (comprehensive diagnostics)
python system_health_check.py

# Check service status and health
python run_system.py --health

# Start in production mode
python run_system.py --mode prod

# Skip frontend (backend + RAG API only)
python run_system.py --no-frontend

# View aggregated logs
python run_system.py --logs-only

# Stop all services
python run_system.py --stop
# Or press Ctrl+C in the terminal running python run_system.py

Service Architecture: The run_system.py launcher manages four key services:

  • Ollama Server (port 11434): AI model serving
  • RAG API Server (port 8001): Document processing and retrieval
  • Backend Server (port 8000): Session management and API endpoints
  • Frontend Server (port 3000): React/Next.js web interface
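
A quick way to probe these four ports is sketched below; a closed port simply reports as down rather than raising. The service names and timeout are illustrative.

```python
# Probe the four LocalGPT service ports; closed ports return False.
import socket

SERVICES = {
    "ollama": 11434,
    "rag_api": 8001,
    "backend": 8000,
    "frontend": 3000,
}

def port_open(port, host="127.0.0.1", timeout=0.5):
    """True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in SERVICES.items():
    status = "up" if port_open(port) else "down"
    print(f"{name:<8} (port {port}): {status}")
```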

Option 3: Manual Component Startup

# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Start RAG API
python -m rag_system.api_server

# Terminal 3: Start Backend
cd backend && python server.py

# Terminal 4: Start Frontend
npm run dev

# Access at http://localhost:3000

Detailed Installation

1. Install System Dependencies

Ubuntu/Debian:

sudo apt update
sudo apt install python3 python3-pip nodejs npm docker.io docker-compose

macOS:

brew install python node docker-compose
brew install --cask docker   # Docker Desktop; npm ships with node

Windows:

# Install Python 3.8+, Node.js, and Docker Desktop
# Then use PowerShell or WSL2

2. Install AI Models

Install Ollama (Recommended):

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull recommended models
ollama pull qwen3:0.6b          # Fast generation model
ollama pull qwen3:8b            # High-quality generation model

3. Configure Environment

# Copy environment template
cp .env.example .env

# Edit configuration
nano .env

Key Configuration Options:

# AI Models (referenced in rag_system/main.py)
OLLAMA_HOST=http://localhost:11434

# Database Paths (used by backend and RAG system)
DATABASE_PATH=./backend/chat_data.db
VECTOR_DB_PATH=./lancedb

# Server Settings (used by run_system.py)
BACKEND_PORT=8000
FRONTEND_PORT=3000
RAG_API_PORT=8001

# Optional: Override default models
GENERATION_MODEL=qwen3:8b
ENRICHMENT_MODEL=qwen3:0.6b
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
RERANKER_MODEL=answerdotai/answerai-colbert-small-v1
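
As a sketch of how these variables might be consumed in Python, the snippet below reads each one with `os.getenv`, falling back to the values listed above when a `.env` entry is absent. The `load_settings` helper is illustrative, not part of the codebase.

```python
# Read the documented environment variables with the listed values as
# defaults; `load_settings` is a hypothetical helper for illustration.
import os

def load_settings():
    return {
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        "database_path": os.getenv("DATABASE_PATH", "./backend/chat_data.db"),
        "vector_db_path": os.getenv("VECTOR_DB_PATH", "./lancedb"),
        "backend_port": int(os.getenv("BACKEND_PORT", "8000")),
        "frontend_port": int(os.getenv("FRONTEND_PORT", "3000")),
        "rag_api_port": int(os.getenv("RAG_API_PORT", "8001")),
        "generation_model": os.getenv("GENERATION_MODEL", "qwen3:8b"),
    }

settings = load_settings()
print(settings["backend_port"])
```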

4. Initialize the System

# Run system health check
python system_health_check.py

# Initialize databases
python -c "fro
View on GitHub

GitHub Stars: 22.2k · Forks: 2.5k
Category: Development · Updated: 4h ago
Languages: Python

Security Score: 95/100 (audited on Mar 27, 2026; no findings)