H2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

Generate Convert Improve

Install / Use

/learn @h2oai/H2ogpt

About this skill

Quality Score

0/100

README

h2oGPT

Turn ★ into ⭐ (top-right corner) if you like the project!

Query and summarize your documents or just chat with local private GPT LLMs using h2oGPT, an Apache V2 open-source project.

Check out a long CoT Open-o1 open 🍓strawberry🍓 project: https://github.com/pseudotensor/open-strawberry

Try Enterprise Version for Free

Enterprise h2oGPTe

Video Demo

https://github.com/h2oai/h2ogpt/assets/2249614/2f805035-2c85-42fb-807f-fd0bca79abc6

YouTube 4K Video

Features

Private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, YouTube, Audio, Code, Text, MarkDown, etc.)
- Persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.)
- Efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach)
- Parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model
- HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses
- Semantic Chunking for better document splitting (requires GPU)
Variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.)
- GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models
- Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.)
Gradio UI or CLI with streaming of all models
- Upload and View documents through the UI (control multiple collaborative or personal collections)
- Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision
- Image Generation Stable Diffusion (sdxl-turbo, sdxl, SD3), PlaygroundAI (playv2), and Flux
- Voice STT using Whisper with streaming audio conversion
- Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion
- Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion
- AI Assistant Voice Control Mode for hands-free control of h2oGPT chat
- Bake-off UI mode against many models at the same time
- Easy Download of model artifacts and control over models like LLaMa.cpp through the UI
- Authentication in the UI by user/password via Native or Google OAuth
- State Preservation in the UI by user/password
Open Web UI with h2oGPT as backend via OpenAI Proxy
- See Start-up Docs.
- Chat completion with streaming
- Document Q/A using h2oGPT ingestion with advanced OCR from DocTR
- Vision models
- Audio Transcription (STT)
- Audio Generation (TTS)
- Image generation
- Authentication
- State preservation
Linux, Docker, macOS, and Windows support
Inference Servers support for oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, Together.ai, OpenAI, Azure OpenAI, Anthropic, MistralAI, Google, and Groq
OpenAI compliant
- Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server)
- Chat and Text Completions (streaming and non-streaming)
- Audio Transcription (STT)
- Audio Generation (TTS)
- Image Generation
- Embedding
- Function tool calling w/auto tool selection
- AutoGen Code Execution Agent
JSON Mode
- Strict schema control for vLLM via its use of outlines
- Strict schema control for OpenAI, Anthropic, Google Gemini, MistralAI models
- JSON mode for some older OpenAI or Gemini models with schema control if model is smart enough (e.g. gemini 1.5 flash)
- Any model via code block extraction
Web-Search integration with Chat and Document Q/A
Agents for Search, Document Q/A, Python Code, CSV frames
- High quality Agents via OpenAI proxy server on separate port
- Code-first agent that generates plots, researches, evaluates images via vision model, etc. (client code openai_server/openai_client.py).
- No UI for this, just API
Evaluate performance using reward models
Quality maintained with over 1000 unit and integration tests taking over 24 GPU-hours

Get Started

Install h2oGPT

Docker is recommended for Linux, Windows, and MAC for full capabilities. Linux Script also has full capability, while Windows and MAC scripts have less capabilities than using Docker.

Collab Demos

Resources

Docs Guide

Development

To create a development environment for training and generation, follow the installation instructions.
To fine-tune any LLM models on your data, follow the fine-tuning instructions.

To run h2oGPT tests:

pip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0
conda install -c conda-forge gst-python -y
sudo apt-get install gstreamer-1.0
pip install pygame
GPT_H2O_AI=0 CONCURRENCY_COUNT=1 pytest --instafail -s -v tests
# for openai server test on already-running local server
pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_client

or tweak/run tests/test4gpus.sh to run tests in parallel.