H2ogpt
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
Install / Use
/learn @h2oai/H2ogptREADME
h2oGPT
Turn ★ into ⭐ (top-right corner) if you like the project!
Query and summarize your documents or just chat with local private GPT LLMs using h2oGPT, an Apache V2 open-source project.
Check out a long CoT Open-o1 open 🍓strawberry🍓 project: https://github.com/pseudotensor/open-strawberry
Try Enterprise Version for Free
Video Demo
https://github.com/h2oai/h2ogpt/assets/2249614/2f805035-2c85-42fb-807f-fd0bca79abc6
Features
- Private offline database of any documents (PDFs, Excel, Word, Images, Video Frames, YouTube, Audio, Code, Text, MarkDown, etc.)
- Persistent database (Chroma, Weaviate, or in-memory FAISS) using accurate embeddings (instructor-large, all-MiniLM-L6-v2, etc.)
- Efficient use of context using instruct-tuned LLMs (no need for LangChain's few-shot approach)
- Parallel summarization and extraction, reaching an output of 80 tokens per second with the 13B LLaMa2 model
- HYDE (Hypothetical Document Embeddings) for enhanced retrieval based upon LLM responses
- Semantic Chunking for better document splitting (requires GPU)
- Variety of models supported (LLaMa2, Mistral, Falcon, Vicuna, WizardLM. With AutoGPTQ, 4-bit/8-bit, LORA, etc.)
- GPU support from HF and LLaMa.cpp GGML models, and CPU support using HF, LLaMa.cpp, and GPT4ALL models
- Attention Sinks for arbitrarily long generation (LLaMa-2, Mistral, MPT, Pythia, Falcon, etc.)
- Gradio UI or CLI with streaming of all models
- Upload and View documents through the UI (control multiple collaborative or personal collections)
- Vision Models LLaVa, Claude-3, Gemini-Pro-Vision, GPT-4-Vision
- Image Generation Stable Diffusion (sdxl-turbo, sdxl, SD3), PlaygroundAI (playv2), and Flux
- Voice STT using Whisper with streaming audio conversion
- Voice TTS using MIT-Licensed Microsoft Speech T5 with multiple voices and Streaming audio conversion
- Voice TTS using MPL2-Licensed TTS including Voice Cloning and Streaming audio conversion
- AI Assistant Voice Control Mode for hands-free control of h2oGPT chat
- Bake-off UI mode against many models at the same time
- Easy Download of model artifacts and control over models like LLaMa.cpp through the UI
- Authentication in the UI by user/password via Native or Google OAuth
- State Preservation in the UI by user/password
- Open Web UI with h2oGPT as backend via OpenAI Proxy
- See Start-up Docs.
- Chat completion with streaming
- Document Q/A using h2oGPT ingestion with advanced OCR from DocTR
- Vision models
- Audio Transcription (STT)
- Audio Generation (TTS)
- Image generation
- Authentication
- State preservation
- Linux, Docker, macOS, and Windows support
- Inference Servers support for oLLaMa, HF TGI server, vLLM, Gradio, ExLLaMa, Replicate, Together.ai, OpenAI, Azure OpenAI, Anthropic, MistralAI, Google, and Groq
- OpenAI compliant
- Server Proxy API (h2oGPT acts as drop-in-replacement to OpenAI server)
- Chat and Text Completions (streaming and non-streaming)
- Audio Transcription (STT)
- Audio Generation (TTS)
- Image Generation
- Embedding
- Function tool calling w/auto tool selection
- AutoGen Code Execution Agent
- JSON Mode
- Strict schema control for vLLM via its use of outlines
- Strict schema control for OpenAI, Anthropic, Google Gemini, MistralAI models
- JSON mode for some older OpenAI or Gemini models with schema control if model is smart enough (e.g. gemini 1.5 flash)
- Any model via code block extraction
- Web-Search integration with Chat and Document Q/A
- Agents for Search, Document Q/A, Python Code, CSV frames
- High quality Agents via OpenAI proxy server on separate port
- Code-first agent that generates plots, researches, evaluates images via vision model, etc. (client code openai_server/openai_client.py).
- No UI for this, just API
- Evaluate performance using reward models
- Quality maintained with over 1000 unit and integration tests taking over 24 GPU-hours
Get Started
Install h2oGPT
Docker is recommended for Linux, Windows, and MAC for full capabilities. Linux Script also has full capability, while Windows and MAC scripts have less capabilities than using Docker.
- Docker Build and Run Docs (Linux, Windows, MAC)
- Linux Install and Run Docs
- Windows 10/11 Installation Script
- MAC Install and Run Docs
- Quick Start on any Platform
Collab Demos
Resources
- FAQs
- README for LangChain
- Discord
- Models (LLaMa-2, Falcon 40, etc.) at 🤗
- YouTube: 100% Offline ChatGPT Alternative?
- YouTube: Ultimate Open-Source LLM Showdown (6 Models Tested) - Surprising Results!
- YouTube: Blazing Fast Falcon 40b 🚀 Uncensored, Open-Source, Fully Hosted, Chat With Your Docs
- Technical Paper: https://arxiv.org/pdf/2306.08161.pdf
Docs Guide
<!-- cat README.md | ./gh-md-toc - But Help is heavily processed -->- Get Started
- Linux (CPU or CUDA)
- macOS (CPU or M1/M2)
- Windows 10/11 (CPU or CUDA)
- GPU (CUDA, AutoGPTQ, exllama) Running Details
- CPU Running Details
- CLI chat
- Gradio UI
- Client API (Gradio, OpenAI-Compliant)
- Inference Servers (oLLaMa, HF TGI server, vLLM, Groq, Anthropic, Google, Mistral, Gradio, ExLLaMa, Replicate, OpenAI, Azure OpenAI)
- Build Python Wheel
- Offline Installation
- Low Memory
- Docker
- LangChain Document Support
- Compare to PrivateGPT et al.
- Roadmap
- Development
- Help
- Acknowledgements
- Why H2O.ai?
- Disclaimer
Development
- To create a development environment for training and generation, follow the installation instructions.
- To fine-tune any LLM models on your data, follow the fine-tuning instructions.
- To run h2oGPT tests:
or tweak/runpip install requirements-parser pytest-instafail pytest-random-order playsound==1.3.0 conda install -c conda-forge gst-python -y sudo apt-get install gstreamer-1.0 pip install pygame GPT_H2O_AI=0 CONCURRENCY_COUNT=1 pytest --instafail -s -v tests # for openai server test on already-running local server pytest -s -v -n 4 openai_server/test_openai_server.py::test_openai_clienttests/test4gpus.shto run tests in parallel.
Acknowledgements
- Some training code was based upon March 24 version of Alpaca-LoRA.
- Used high-quality created data by OpenAssistant.
- Used base models by EleutherAI
Related Skills
summarize
339.1kSummarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).
feishu-doc
339.1k|
obsidian
339.1kWork with Obsidian vaults (plain Markdown notes) and automate via obsidian-cli.
openhue
339.1kControl Philips Hue lights and scenes via the OpenHue CLI.
YouTube 4K Video