SkillAgentSearch skills...

H2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

Install / Use

/learn @h2oai/H2ogpt
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

h2oGPT

Turn ★ into ⭐ (top-right corner) if you like the project!

Query and summarize your documents or just chat with local private GPT LLMs using h2oGPT, an Apache V2 open-source project.

Check out a long CoT Open-o1 open 🍓strawberry🍓 project: https://github.com/pseudotensor/open-strawberry

Try Enterprise Version for Free

Enterprise h2oGPTe

Video Demo

https://github.com/h2oai/h2ogpt/assets/2249614/2f805035-2c85-42fb-807f-fd0bca79abc6

img-small.png YouTube 4K Video

Features

  • Private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, YouTube, Audio, Code, Text, MarkDown, etc.)
    • Persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.)
    • Efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach)
    • Parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model
    • HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses
    • Semantic Chunking for better document splitting (requires GPU)
  • Variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.)
    • GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models
    • Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.)
  • Gradio UI or CLI with streaming of all models
    • Upload and View documents through the UI (control multiple collaborative or personal collections)
    • Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision
    • Image Generation Stable Diffusion (sdxl-turbo, sdxl, SD3), PlaygroundAI (playv2), and Flux
    • Voice STT using Whisper with streaming audio conversion
    • Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion
    • Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion
    • AI Assistant Voice Control Mode for hands-free control of h2oGPT chat
    • Bake-off UI mode against many models at the same time
    • Easy Download of model artifacts and control over models like LLaMa.cpp through the UI
    • Authentication in the UI by user/password via Native or Google OAuth
    • State Preservation in the UI by user/password
  • Open Web UI with h2oGPT as backend via OpenAI Proxy
    • See Start-up Docs.
    • Chat completion with streaming
    • Document Q/A using h2oGPT ingestion with advanced OCR from DocTR
    • Vision models
    • Audio Transcription (STT)
    • Audio Generation (TTS)
    • Image generation
    • Authentication
    • State preservation
  • Linux, Docker, macOS, and Windows support
  • Inference Servers support for oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, Together.ai, OpenAI, Azure OpenAI, Anthropic, MistralAI, Google, and Groq
  • OpenAI compliant
    • Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server)
    • Chat and Text Completions (streaming and non-streaming)
    • Audio Transcription (STT)
    • Audio Generation (TTS)
    • Image Generation
    • Embedding
    • Function tool calling w/auto tool selection
    • AutoGen Code Execution Agent
  • JSON Mode
    • Strict schema control for vLLM via its use of outlines
    • Strict schema control for OpenAI, Anthropic, Google Gemini, MistralAI models
    • JSON mode for some older OpenAI or Gemini models with schema control if model is smart enough (e.g. gemini 1.5 flash)
    • Any model via code block extraction
  • Web-Search integration with Chat and Document Q/A
  • Agents for Search, Document Q/A, Python Code, CSV frames
    • High quality Agents via OpenAI proxy server on separate port
    • Code-first agent that generates plots, researches, evaluates images via vision model, etc. (client code openai_server/openai_client.py).
    • No UI for this, just API
  • Evaluate performance using reward models
  • Quality maintained with over 1000 unit and integration tests taking over 24 GPU-hours

Get Started

GitHub license Linux macOS Windows Docker

Install h2oGPT

Docker is recommended for Linux, Windows, and MAC for full capabilities. Linux Script also has full capability, while Windows and MAC scripts have less capabilities than using Docker.


Collab Demos

Resources

Docs Guide

<!-- cat README.md | ./gh-md-toc - But Help is heavily processed -->

Development

  • To create a development environment for training and generation, follow the installation instructions.
  • To fine-tune any LLM models on your data, follow the fine-tuning instructions.
  • To run h2oGPT tests:
    pip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0
    conda install -c conda-forge gst-python -y
    sudo apt-get install gstreamer-1.0
    pip install pygame
    GPT_H2O_AI=0 CONCURRENCY_COUNT=1 pytest --instafail -s -v tests
    # for openai server test on already-running local server
    pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_client
    
    or tweak/run tests/test4gpus.sh to run tests in parallel.

Acknowledgements

Related Skills

View on GitHub
GitHub Stars12.0k
CategoryCustomer
Updated1d ago
Forks1.3k

Languages

Python

Security Score

100/100

Audited on Mar 27, 2026

No findings