Hyprstream
HyprStream: agentic infrastructure for continuous online-learning applications
Install / Use
/learn @hyprstream/HyprstreamREADME
Overview
HyprStream is agentic cloud infrastructure for applications that learn, build, and run, integrating continuous development, training, integration, and deployment of software and AI/ML models. Its primary feature is an LLM inference and training engine built in Rust on PyTorch, with integrated training capabilities, version control, and secure tool use in microVM containers.
Users can communicate with open-weight and custom LLMs through Hyprstream's OpenAI-compatible API.
Getting started is easy: download the AppImage and it auto-detects your NVIDIA or ROCm GPU. See docs/quickstart.md for a full walkthrough.
Core Features
- Frontend-ready: Use the included TUI for ease of use and share terminals with collaborators and agents.
- Collaborative: Multi-user, multi-agent interfaces through a high-speed compositing multiplexer.
- LLM Inference & Training: Supporting the dense Qwen3.5 and Qwen3 model architectures.
- Test Time Training: Models train models using MCP tools, test-time-training, and the Muon optimizer.
- Security-minded: Zero-trust cryptographic architecture with ZK stream proxies, Casbin Policy, and OpenID integration.
- Industry-compatible: Providing compatibility with the OpenAI API specification.
- Hardware Accelerated: NVIDIA CUDA and AMD ROCm support, universal binary.
- Version Controlled: Manages source and weights with Git, compatible with HuggingFace.
- Systemd Integration - Optional user-level service management for background workers, long-running services, and containers.
- Powered by Torch: Built on the stable PyTorch C++ API (libtorch) via tch-rs.
Experimental Features
- Workers - Isolated workload execution using Kata microVMs with cloud-hypervisor.
- Workflows - Git workflow file support for local continuous integration, deployment, and functions-as-a-service.
- Metrics - Structured knowledge engine and time-series aggregation database powered by DuckDB, ADBC, and Flight.
Installation
Quick Install (AppImage, Linux)
Hyprstream requires git and git-lfs (available in all major Linux distros).
Download the Universal AppImage. We publish AppImages for each CPU/GPU configuration; the Universal image is recommended for ease-of-use and GPU auto-detection.
# Download and install (Universal recommended)
chmod +x hyprstream-v0.3.0-x86_64.AppImage
# Installer Path (v0.4.0+):
./hyprstream-v0.4.0-x86_64.AppImage wizard # add `-y` for autoinstall
# Manual path (< v0.4.0):
./hyprstream-v0.3.0-x86_64.AppImage service install
# Add to PATH
export PATH="$HOME/.local/bin:$PATH"
# Apply policy template (hyprstream is deny-by-default)
hyprstream quick policy apply-template local
hyprstream service start
See docs/quickstart.md for prerequisites, source build, and first-time setup.
NOTE: For CUDA systems, make sure you have installed CUDA Toolkit and set LD_PRELOAD:
systemctl --user set-environment LD_PRELOAD=libtorch_cuda.so && systemctl --user restart hyprstream-model
The installed files will be located in $HOME/.local/hyprstream/ and $HOME/.local/bin/.
Building from source
# Set LIBTORCH to your libtorch path, or use --features download-libtorch
cargo build --release
See docs/quickstart.md for prerequisites and DEVELOP.md for detailed build instructions.
Container deployment
Hyprstream can run inside containers. See README-Docker.md for Docker/Kubernetes deployment.
Quick Start
Clone a model
Hyprstream supports Qwen3 model inference from Git repositories (HuggingFace, GitHub, etc.).
# Clone a model
hyprstream quick clone https://huggingface.co/Qwen/Qwen3-0.6B
# Clone with a custom name
hyprstream quick clone https://huggingface.co/Qwen/Qwen3-0.6B --name qwen3-small
Managing models
Worktrees are automatically managed by hyprstream.
# List all cached models
hyprstream quick list
# Get detailed model information (model:branch format)
hyprstream quick info qwen3-small
hyprstream quick info qwen3-small:main
Run inference
# Basic inference
hyprstream quick infer qwen3-small:main \
--prompt "Explain quantum computing in simple terms"
# With options
hyprstream quick infer qwen3-small:main \
--prompt "Write a Python function to sort a list" \
--temperature 0.7 \
--top-p 0.9 \
--max-tokens 1024
Architecture

Integrating Hyprstream into your business or workflow
OpenAI-Compatible REST API
HyprStream provides an OpenAI-compatible API endpoint for easy integration with existing tools and libraries:
# Start API server
hyprstream server --port 6789
# List available models (worktree-based)
curl http://localhost:6789/oai/v1/models
# Example response shows models in model:branch format
# {
#   "object": "list",
#   "data": [
#     {
#       "id": "qwen3-small:main",
#       "object": "model",
#       "created": 1762974327,
#       "owned_by": "system driver:overlay2, saved:2.3GB, age:2h cached"
#     },
#     {
#       "id": "qwen3-small:experiment-1",
#       "object": "model",
#       "created": 1762975000,
#       "owned_by": "system driver:overlay2, saved:1.8GB, age:30m"
#     }
#   ]
# }
# Make chat completions request (OpenAI-compatible)
# NOTE: Models must be referenced with branch (model:branch format)
curl -X POST http://localhost:6789/oai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-small:main",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
# Or use with any OpenAI-compatible client
export OPENAI_API_KEY="dummy"
export OPENAI_BASE_URL="http://localhost:6789/oai/v1"
# Now use any OpenAI client library
# Note: Specify model as "qwen3-small:main" not just "qwen3-small"
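As a minimal sketch of client-side usage, the request above can be issued from Python using only the standard library. This assumes the server started above is listening on localhost:6789; the `chat` helper name is illustrative, not part of Hyprstream.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6789/oai/v1"  # matches OPENAI_BASE_URL above


def build_chat_request(model: str, prompt: str, **params) -> dict:
    """Build an OpenAI-compatible chat completions payload.

    Note the model:branch reference -- a bare model name is not enough.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


def chat(model: str, prompt: str, **params) -> dict:
    """POST the payload to the local Hyprstream server and return the parsed reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt, **params)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    reply = chat("qwen3-small:main", "Hello, world!", max_tokens=100, temperature=0.7)
    print(reply["choices"][0]["message"]["content"])
```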
Worktree-Based Model References
HyprStream uses Git worktrees for model management. The /v1/models endpoint lists all worktrees (not base models):
- Format: Models are always shown as model:branch (e.g., qwen3-small:main)
- Multiple Versions: Each worktree (branch) appears as a separate model
- Metadata: The owned_by field includes worktree metadata:
  - Storage driver (e.g., driver:overlay2)
  - Space saved via CoW (e.g., saved:2.3GB)
  - Worktree age (e.g., age:2h)
  - Cache status (cached if loaded in memory)
Example: If you have a model qwen3-small with branches main, experiment-1, and training, the API will list three separate entries:
- qwen3-small:main
- qwen3-small:experiment-1
- qwen3-small:training
This allows you to work with multiple versions of the same model simultaneously, each in its own worktree with isolated changes.
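Clients can split these references and unpack the owned_by metadata themselves. The sketch below assumes the owned_by format shown in the example response above (space-separated key:value tokens with a trailing `cached` marker); the helper names are illustrative.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split a worktree reference like 'qwen3-small:main' into (model, branch)."""
    model, sep, branch = ref.partition(":")
    if not sep:
        raise ValueError(f"expected model:branch, got {ref!r}")
    return model, branch


def parse_owned_by(owned_by: str) -> dict:
    """Parse the owned_by metadata string from /v1/models.

    Format assumed from the example response above, e.g.
    'system driver:overlay2, saved:2.3GB, age:2h cached'.
    """
    meta = {"cached": owned_by.rstrip().endswith("cached")}
    body = owned_by.rstrip().removesuffix("cached")
    for token in body.replace(",", " ").split():
        if ":" in token:
            key, _, value = token.partition(":")
            meta[key] = value  # e.g. meta["driver"] = "overlay2"
    return meta
```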
MCP Integration (Claude Code, Cursor, etc.)
HyprStream includes a built-in Model Context Protocol server that exposes inference, model management, and repository operations as tools for AI coding assistants.
1. Configure Claude Code:
claude mcp add --transport http hyprstream http://localhost:6790/mcp
2. Authenticate
Use /mcp, select hyprstream, and select Authenticate or Re-authenticate.
3. Available tools:
Once connected, Claude Code can use hyprstream tools directly:
| Tool | Description |
|------|-------------|
| model.load | Load a model for inference |
| model.list | List loaded models |
| model.status | Get model status and memory usage |
| registry.list | List all cloned repositories |
| registry.clone | Clone a model from HuggingFace/GitHub |
| repo.* | Branch, worktree, merge, and tag operations |
| policy.* | Policy checks and token management |
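MCP tool invocations are JSON-RPC 2.0 messages using the protocol's tools/call method. As a rough sketch, a client could build such a request for one of the tools above; the argument names passed to registry.clone here are assumptions for illustration, not the server's documented schema.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument name ("url") for illustration only:
request = mcp_tool_call(
    "registry.clone",
    {"url": "https://huggingface.co/Qwen/Qwen3-0.6B"},
)
```

In practice an MCP-aware client (Claude Code, Cursor) constructs and transports these messages for you over the HTTP endpoint configured below.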
Configuration:
The MCP server listens on port 6790 by default. To change it, set in your hyprstream config:
[mcp]
host = "127.0.0.1"
