OpenSRE

Build your own AI SRE agents. The open source toolkit for the AI era ✨

<div align="center"> <p align="center"> <img width="2136" height="476" alt="opensre-github-banner" src="https://github.com/user-attachments/assets/68ac81ff-dca0-45fb-9b92-9cc342f173f6" /> </p> <h1>OpenSRE: Build Your Own AI SRE Agents</h1> <p>The open-source framework for AI SRE agents, and the training and evaluation environment they need to improve. Connect the 40+ tools you already run, define your own workflows, and investigate incidents on your own infrastructure.</p> <p> <a href="https://github.com/Tracer-Cloud/opensre/stargazers"><img src="https://img.shields.io/github/stars/Tracer-Cloud/opensre?style=flat-square&color=FF6B00" alt="Stars"></a> <a href="https://github.com/Tracer-Cloud/opensre/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square" alt="License"></a> <a href="https://github.com/Tracer-Cloud/opensre/blob/main/.github/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/Tracer-Cloud/opensre/ci.yml?style=flat-square&label=CI" alt="CI"></a> <img src="https://img.shields.io/badge/open%20source-forever-brightgreen?style=flat-square" alt="Open Source"> <a href="https://discord.gg/7NTpevXf7w"><img src="https://img.shields.io/badge/Discord-Join%20Us-5865F2?style=flat-square&logo=discord&logoColor=white" alt="Discord"></a> </p> <p align="center"> <strong> <a href="https://app.tracer.cloud/">Getting Started</a> · <a href="https://tracer.cloud/">Tracer Agent</a> · <a href="https://tracer.mintlify.app/">Docs</a> · <a href="https://tracer.mintlify.app/faq">FAQ</a> · <a href="https://trust.tracer.cloud/">Security</a> </strong> </p> </div>

Why OpenSRE?

When something breaks in production, the evidence is scattered across logs, metrics, traces, runbooks, and Slack threads. OpenSRE is an open-source framework for AI SRE agents that resolve production incidents, built to run on your own infrastructure.

The motivation: SWE-bench<sup>1</sup> gave coding agents scalable training data and clear feedback, but production incident response still lacks an equivalent.

Distributed failures are slower, noisier, and harder to simulate and evaluate than local code tasks, which is why AI SRE, and AI for production debugging more broadly, remains unsolved.

OpenSRE is building that missing layer:

an open reinforcement learning environment for agentic infrastructure incident response, with end-to-end tests and synthetic incident simulations for realistic production failures

We do that by:

  • building easy-to-deploy, customizable AI SRE agents for production incident investigation and response
  • running scored synthetic RCA suites that check root-cause accuracy, required evidence, and adversarial red herrings (tests/synthetic)
  • running real-world end-to-end tests across cloud-backed scenarios including Kubernetes, EC2, CloudWatch, Lambda, ECS Fargate, and Flink (tests/e2e)
  • keeping semantic test-catalog naming so e2e vs synthetic and local vs cloud boundaries stay obvious (tests/README.md)
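To make the three scoring criteria concrete (root-cause accuracy, required evidence, adversarial red herrings), here is a minimal sketch of how one synthetic scenario could be scored. It is not the scorer in tests/synthetic: the `Scenario` and `Report` classes, their fields, and the pass rule are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical synthetic incident definition (not OpenSRE's actual schema)."""
    expected_root_cause: str
    required_evidence: set[str]
    red_herrings: set[str] = field(default_factory=set)


@dataclass
class Report:
    """Hypothetical agent output for one scenario."""
    root_cause: str
    cited_evidence: set[str]


def score_report(scenario: Scenario, report: Report) -> dict:
    """Score one report against one scenario on the three criteria."""
    root_cause_correct = report.root_cause == scenario.expected_root_cause

    # Fraction of the required evidence items that the report actually cites.
    required = scenario.required_evidence
    coverage = len(required & report.cited_evidence) / len(required) if required else 1.0

    # Red herrings the agent was misled into citing as evidence.
    red_herring_hits = sorted(scenario.red_herrings & report.cited_evidence)

    return {
        "root_cause_correct": root_cause_correct,
        "evidence_coverage": coverage,
        "red_herring_hits": red_herring_hits,
        # Assumed pass rule: right cause, all evidence cited, no red herrings.
        "passed": root_cause_correct and coverage == 1.0 and not red_herring_hits,
    }
```

Under this assumed rule, a report that names the right root cause but leans on a red herring still fails, which is the behavior an adversarial suite would want to catch.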

Our mission is to build AI SRE agents on top of this, scale it to thousands of realistic infrastructure failure scenarios, and establish OpenSRE as the benchmark and training ground for AI SRE.

<sup>1</sup> https://arxiv.org/abs/2310.06770


Install

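Pick the one installer that matches your platform.

macOS / Linux (install script):

curl -fsSL https://raw.githubusercontent.com/Tracer-Cloud/opensre/main/install.sh | bash

Homebrew:

brew install Tracer-Cloud/opensre/opensre

Windows (PowerShell):

irm https://raw.githubusercontent.com/Tracer-Cloud/opensre/main/install.ps1 | iex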
<!-- ```bash pipx install opensre ``` -->

Quick Start

opensre onboard
opensre investigate -i tests/e2e/kubernetes/fixtures/datadog_k8s_alert.json
opensre update

Development

New to OpenSRE? See SETUP.md for detailed platform-specific setup instructions, including Windows setup, environment configuration, and more.

git clone https://github.com/Tracer-Cloud/opensre
cd opensre
make install
# run opensre onboard to configure your local LLM provider
# and optionally validate/save Grafana, Datadog, Honeycomb, Coralogix, Slack, AWS, GitHub MCP, and Sentry integrations
opensre onboard
opensre investigate -i tests/e2e/kubernetes/fixtures/datadog_k8s_alert.json

How OpenSRE Works

<img width="4096" height="2187" alt="tracer-how-it-works-illustration" src="https://github.com/user-attachments/assets/8b50fe5c-470c-4982-866f-4f90c3e251d1" />

Investigation Workflow

When an alert fires, OpenSRE automatically:

  1. Fetches the alert context and correlated logs, metrics, and traces
  2. Reasons across your connected systems to identify anomalies
  3. Generates a structured investigation report with probable root cause
  4. Suggests next steps and, optionally, executes remediation actions
  5. Posts a summary directly to Slack or PagerDuty - no context switching needed
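The five steps above can be read as a simple pipeline over a shared state object. The sketch below is illustrative only and does not reflect OpenSRE's internals: the `Investigation` class, the step functions, and the alert keys `title` and `service` are assumptions, and the anomaly step is a plain string filter standing in for the LLM reasoning the real agent performs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Investigation:
    """Hypothetical state passed between workflow steps."""
    alert: dict
    logs: list[str] = field(default_factory=list)
    anomalies: list[str] = field(default_factory=list)
    report: str = ""
    next_steps: list[str] = field(default_factory=list)


def fetch_context(inv: Investigation, fetch_logs: Callable[[str], list[str]]) -> Investigation:
    # Step 1: pull signals correlated with the alert (logs only, for brevity).
    inv.logs = fetch_logs(inv.alert["service"])
    return inv


def find_anomalies(inv: Investigation) -> Investigation:
    # Step 2: stand-in for cross-system reasoning; flags lines containing ERROR.
    inv.anomalies = [line for line in inv.logs if "ERROR" in line]
    return inv


def write_report(inv: Investigation) -> Investigation:
    # Step 3: structured report naming a probable root cause.
    cause = inv.anomalies[0] if inv.anomalies else "undetermined"
    inv.report = f"Alert {inv.alert['title']!r}: probable root cause: {cause}"
    return inv


def suggest_next_steps(inv: Investigation) -> Investigation:
    # Step 4: suggestions only; remediation is left out of this sketch.
    if inv.anomalies:
        inv.next_steps = ["review the evidence cited in the report"]
    else:
        inv.next_steps = ["widen the time window and re-run"]
    return inv


def run_investigation(
    alert: dict,
    fetch_logs: Callable[[str], list[str]],
    notify: Callable[[str], None],
) -> Investigation:
    inv = fetch_context(Investigation(alert=alert), fetch_logs)
    for step in (find_anomalies, write_report, suggest_next_steps):
        inv = step(inv)
    notify(inv.report)  # Step 5: post the summary to a chat or paging channel.
    return inv
```

Passing `fetch_logs` and `notify` in as callables keeps the pipeline testable without live integrations: a test can supply canned log lines and collect the posted summary in a list.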

Benchmark

Generate the benchmark report:

make benchmark
<!-- BENCHMARK-START -->

No benchmark results yet. Run make benchmark to generate.

<!-- BENCHMARK-END -->

Capabilities

| | |
| --- | --- |
| 🔍 Structured incident investigation | Correlated root-cause analysis across all your signals |
| 📋 Runbook-aware reasoning | OpenSRE reads your runbooks and applies them automatically |
| 🔮 Predictive failure detection | Catch emerging issues before they page you |
| 🔗 Evidence-backed root cause | Every conclusion is linked to the data behind it |
| 🤖 Full LLM flexibility | Bring your own model: Anthropic, OpenAI, Ollama, Gemini, OpenRouter, NVIDIA NIM |


Integrations

OpenSRE connects to 40+ tools and services across the modern cloud stack, from LLM providers and observability platforms to infrastructure, databases, and incident management.

| Category | Integrations | Roadmap |
| --- | --- | --- |
| AI / LLM Providers | Anthropic · OpenAI · Ollama · Google Gemini · OpenRouter · NVIDIA NIM · Bedrock | |
| Observability | <img src="docs/assets/icons/grafana.webp" width="16"> Grafana (Loki · Mimir · Tempo) · <img src="docs/assets/icons/datadog.svg" width="16"> Datadog · Honeycomb · Coralogix · <img src="docs/assets/icons/cloudwatch.png" width="16"> CloudWatch · <img src="docs/assets/icons/sentry.png" width="16"> Sentry · Elasticsearch | Splunk · New Relic · Victoria Logs |
| Infrastructure | <img src="docs/assets/icons/kubernetes.png" width="16"> Kubernetes · <img src="docs/assets/icons/aws.png" width="16"> AWS (S3 · Lambda · EKS · EC2 · Bedrock) · <img src="docs/assets/icons/gcp.jpg" width="16"> GCP · <img src="docs/assets/icons/azure.png" width="16"> Azure | Helm · ArgoCD |
| Database | MongoDB · ClickHouse | PostgreSQL · MySQL · MariaDB · MongoDB Atlas · Azure SQL · RDS · Snowflake |
| Data Platform | Apache Airflow · Apache Kafka · Apache Spark · Prefect | RabbitMQ |
| Dev Tools | <img src="docs/assets/icons/github.webp" width="16"> GitHub · GitHub MCP · Bitbucket | GitLab |
| Incident Management | <img src="docs/assets/icons/pagerduty.png" width="16"> PagerDuty · Opsgenie · Jira | ServiceNow · incident.io · Alertmanager · Linear · Trello |
| Communication | <img src="docs/assets/icons/slack.png" width="16"> Slack · Google Docs | Discord · Teams · WhatsApp · Confluence · Notion |
| Agent Deployment | <img src="docs/assets/icons/vercel.png" width="16"> Vercel · <img src="docs/assets/icons/langsmith.png" width="16"> LangSmith · <img src="docs/assets/icons/aws.png" width="16"> EC2 · <img src="docs/assets/icons/aws.png" width="16"> ECS | Railway |
| Protocols | <img src="docs/assets/icons/mcp.svg" width="16"> MCP · <img src="docs/assets/icons/acp.png" width="16"> ACP · <img src="docs/assets/icons/openclaw.jpg" width="16"> OpenClaw | |


Contributing

OpenSRE is community-built. Every integration, improvement, and bug fix makes it better for thousands of engineers. We actively review PRs and welcome contributors of all experience levels.

<p> <a href="https://discord.gg/7NTpevXf7w"> <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" /> </a> </p>