
TensorZero

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Install / Use

/learn @tensorzero/Tensorzero

README

<p><picture><img src="https://github.com/user-attachments/assets/9d0a93c6-7685-4e57-9737-7cbeb338a218" alt="TensorZero Logo" width="128" height="128"></picture></p>

TensorZero

<p><picture><img src="https://www.tensorzero.com/github-trending-badge.svg" alt="#1 Repository Of The Day"></picture></p>

TensorZero is an open-source stack for industrial-grade LLM applications:

  • Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
  • Observability: store inferences and feedback in your database, available programmatically or in the UI
  • Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
  • Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
  • Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Take what you need, adopt incrementally, and complement with other tools.
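The experimentation features above (routing, fallbacks, retries) follow a familiar reliability pattern. As a rough illustration of the idea only, not TensorZero's actual implementation or API, a minimal retry-with-fallback loop might look like:

```python
# Minimal sketch of retry-with-fallback routing (illustrative only;
# the names and structure here are hypothetical, not TensorZero's API).
from typing import Callable

def call_with_fallbacks(providers: list[Callable[[str], str]],
                        prompt: str,
                        retries_per_provider: int = 2) -> str:
    """Try each provider in order, retrying transient failures."""
    last_error: Exception | None = None
    for provider in providers:
        for _ in range(retries_per_provider):
            try:
                return provider(prompt)
            except Exception as e:  # in practice: retry only retryable errors
                last_error = e
    raise RuntimeError("all providers failed") from last_error

# Simulated providers: the first always fails, the second succeeds
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timeout")

def stable(prompt: str) -> str:
    return f"echo: {prompt}"

print(call_with_fallbacks([flaky, stable], "hello"))  # echo: hello
```

A production gateway layers real concerns on top of this skeleton (error classification, backoff, latency budgets), but the control flow is the same.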

<video src="https://github.com/user-attachments/assets/04a8466e-27d8-4189-b305-e7cecb6881ee"></video>


<p align="center"> <b><a href="https://www.tensorzero.com/" target="_blank">Website</a></b> · <b><a href="https://www.tensorzero.com/docs" target="_blank">Docs</a></b> · <b><a href="https://www.x.com/tensorzero" target="_blank">Twitter</a></b> · <b><a href="https://www.tensorzero.com/slack" target="_blank">Slack</a></b> · <b><a href="https://www.tensorzero.com/discord" target="_blank">Discord</a></b> <br> <br> <b><a href="https://www.tensorzero.com/docs/quickstart" target="_blank">Quick Start (5min)</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Deployment Guide</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/api-reference" target="_blank">API Reference</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Configuration Reference</a></b> </p>

> [!NOTE]
> Coming Soon: TensorZero Autopilot
>
> TensorZero Autopilot is an automated AI engineer (powered by the TensorZero Stack) that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. Learn more and join the waitlist on the TensorZero website.

Features

🌐 LLM Gateway

Integrate with TensorZero once and access every major LLM provider.

Supported Model Providers

Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, and xAI (Grok). Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).
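Models and their providers are declared in the gateway's configuration file. The sketch below shows the general shape of such an entry; the key names here are an approximation and should be verified against the official Configuration Reference before use.

```toml
# Hypothetical sketch of a tensorzero.toml model entry; verify the exact
# keys against the official Configuration Reference.
[models.my_claude]
routing = ["anthropic"]

[models.my_claude.providers.anthropic]
type = "anthropic"
model_name = "claude-sonnet-4-6"
```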

Usage Example

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

  1. Deploy the TensorZero Gateway (one Docker container).
  2. Update the base_url and model in your OpenAI-compatible client.
  3. Run inference:
```python
from openai import OpenAI

# Point the client at the TensorZero Gateway (OpenAI-compatible endpoint)
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or a TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)

# The gateway returns a standard OpenAI-style response
print(response.choices[0].message.content)
```

See Quick Start for more information.

🔍 LLM Observability

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.

  • [x] Store inferences and feedback (metrics, human edits, etc.) in your own database
  • [x] Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
  • [x] Build datasets for optimization, evaluation, and other workflows
  • [x] Replay historical inferences with new prompts, models, inference strategies, etc.
  • [x] Export OpenTelemetry traces (OTLP) and export Prometheus metrics to your favorite application observability tools
  • [ ] Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling
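Because inferences and feedback land in your own database, aggregate views are just queries over that data. As a toy illustration (the row shape below is hypothetical, not TensorZero's actual schema), computing a success rate per prompt variant might look like:

```python
# Toy aggregation over logged feedback rows (hypothetical schema,
# not TensorZero's actual tables).
from collections import defaultdict

rows = [
    {"variant": "prompt_a", "success": True},
    {"variant": "prompt_a", "success": False},
    {"variant": "prompt_b", "success": True},
    {"variant": "prompt_b", "success": True},
]

totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [successes, count]
for row in rows:
    totals[row["variant"]][0] += int(row["success"])
    totals[row["variant"]][1] += 1

rates = {variant: s / n for variant, (s, n) in totals.items()}
print(rates)  # {'prompt_a': 0.5, 'prompt_b': 1.0}
```

In practice you would run the equivalent SQL directly against the stored inference and feedback tables, or browse the same aggregates in the UI.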

📈 LLM Optimization

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.

  • [x] Optimize your models with supervised fine-tuning, RLHF, and other techniques
  • [x] Optimize your prompts with automated prompt engineering algorithms like GEPA
  • [x] Optimize your inference strategy with dynamic in-context learning, best/mixture-of-N sampling, etc.
  • [x] Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
  • [ ] Soon: synthetic data generation
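Best-of-N sampling, mentioned above, is conceptually simple: draw several candidate completions and keep the one a scoring function prefers. A minimal sketch under that reading (illustrative only; not TensorZero's API):

```python
# Minimal best-of-N sampling sketch (illustrative, not TensorZero's API).
from typing import Callable

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int = 3) -> str:
    """Generate n candidates and return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Simulated generator cycling through canned outputs; a toy length-based
# scorer stands in for a real reward model or LLM judge.
outputs = iter(["meh", "good answer", "ok"])
pick = best_of_n(lambda: next(outputs), score=len, n=3)
print(pick)  # good answer
```

The interesting engineering is in the scorer (a reward model, a judge, task-specific heuristics) and in amortizing the extra inference cost.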

📊 LLM Evaluation

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.

  • [x] Evaluate individual inferences with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs)
  • [x] Evaluate end-to-end workflows with workflow evaluations with complete flexibility (≈ integration tests for LLMs)
  • [x] Optimize LLM judges just like any other TensorZero function to align them to human preferences
  • [ ] Soon: more built-in evaluators; headless evaluations
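An inference evaluation in the "unit test for LLMs" sense can be as simple as a heuristic check on a single output. A toy example of such a heuristic (not TensorZero's evaluator API; the function and checks here are hypothetical):

```python
# Toy heuristic inference evaluation ("unit test" for one LLM output);
# illustrative only, not TensorZero's evaluator API.
import re

def evaluate_haiku(output: str) -> dict:
    """Heuristic checks: exactly three non-empty lines, mentions the topic."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    return {
        "three_lines": len(lines) == 3,
        "mentions_topic": bool(re.search(r"tensorzero", output, re.IGNORECASE)),
    }

sample = "Gateways hum softly\nTensorZero routes the words\nMetrics bloom like spring"
print(evaluate_haiku(sample))  # {'three_lines': True, 'mentions_topic': True}
```

LLM-judge evaluators replace the regex with a model call, which is why the README notes that judges can themselves be optimized like any other TensorZero function.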
*(Screenshots: Evaluation » UI and Evaluation » CLI)*
Repository Info

  • GitHub Stars: 11.1k
  • Forks: 794
  • Language: Rust
  • Category: Education
  • Security Score: 100/100 (audited on Mar 21, 2026; no findings)