AgentTel
Agent-ready telemetry — enriches OpenTelemetry spans across backend (JVM) and frontend (TypeScript) with the context AI agents need to autonomously diagnose and resolve production incidents
Install / Use
/learn @rrohitramsen/AgentTel
AgentTel enriches OpenTelemetry telemetry with the structured context AI agents need to autonomously diagnose, reason about, and resolve production incidents — without human interpretation of dashboards. Works across the full stack: JVM backends (Java, Kotlin, Scala), Go backends, Node.js/TypeScript backends, Python backends (FastAPI, Django, Flask), and browser frontends (TypeScript/JavaScript).
Standard observability answers "What happened?" AgentTel adds "What does an AI agent need to know to act on this?"
The Problem
Modern observability tools generate massive volumes of telemetry — traces, metrics, logs — optimized for human consumption through dashboards and alert rules. AI agents tasked with autonomous incident response face critical gaps:
- No behavioral context — Spans lack baselines, so agents can't distinguish normal from anomalous
- No topology awareness — Agents don't know which services are critical, who owns them, or what depends on what
- No decision metadata — Is this operation retryable? Is there a fallback? What's the runbook?
- No actionable interface — Agents can read telemetry but can't query live system state or execute remediation
AgentTel closes these gaps at the instrumentation layer.
Design Philosophy
Core principle: telemetry should carry enough context for AI agents to reason and act autonomously.
AgentTel enriches telemetry at three levels — all configurable via YAML, no code changes required:
| Level | Where | What | Example |
|-------|-------|------|---------|
| Topology | OTel Resource (once per service) | Service identity, ownership, dependencies | team, tier, on-call channel |
| Baselines | Span attributes (per operation) | What "normal" looks like | P50/P99 latency, error rate |
| Decisions | Span attributes (per operation) | What an agent is allowed to do | retryable, runbook URL, escalation level |
Topology is set once on the OTel Resource and automatically associated with all telemetry by the SDK. Baselines and decision metadata are attached per-operation on spans. This avoids redundant data on every span while ensuring agents always have the full context.
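The split between resource-level and span-level context can be sketched in a few lines. This is an illustrative model only: `Resource`, `Span`, and `agent_view` are hypothetical stand-ins rather than the OpenTelemetry SDK classes or AgentTel's API, the attribute values are made up, and the full attribute keys are assumed from the `agenttel.*` prefixes listed in this README.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for an OTel Resource: set once per service."""
    attributes: dict


@dataclass
class Span:
    """Hypothetical stand-in for a span: carries only per-operation attributes."""
    name: str
    resource: Resource
    attributes: dict = field(default_factory=dict)

    def agent_view(self) -> dict:
        # What an agent effectively sees: resource context merged with span context.
        return {**self.resource.attributes, **self.attributes}


# Topology lives on the resource (example values, not real configuration).
resource = Resource({
    "agenttel.topology.team": "payments",
    "agenttel.topology.tier": "critical",
})

# Baselines and decision metadata are attached per operation.
span = Span(
    name="POST /api/payments",
    resource=resource,
    attributes={
        "agenttel.baseline.latency_p50_ms": 45.0,
        "agenttel.decision.retryable": True,
    },
)

view = span.agent_view()
```

The span itself stores only two attributes, yet the merged view also includes the topology keys. That is the point of putting topology on the resource: it is not duplicated on every span.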
<p align="center"> <img src="docs/assets/images/agenttel-overview.png" alt="AgentTel — How it works" width="800"/> </p>
Quick Demo
Try AgentTel locally. The commands below start a demo payment service with an OTel Collector and Jaeger:
cd examples/spring-boot-example
docker compose -f docker/docker-compose.yml up --build
Then open Jaeger to see enriched traces, Swagger UI for the API, and MCP Tool Docs for the agent interface.
What AgentTel Provides
Enriched Telemetry (agenttel-core)
Every span is automatically associated with agent-actionable attributes (topology via the OTel Resource, the rest as span attributes):
| Category | Attributes | Purpose |
|----------|-----------|---------|
| Topology | agenttel.topology.team, tier, domain, dependencies | Service identity and dependency graph |
| Baselines | agenttel.baseline.latency_p50_ms, error_rate, source | What "normal" looks like for each operation |
| Decisions | agenttel.decision.retryable, idempotent, runbook_url, escalation_level | What an agent is allowed to do |
| Anomalies | agenttel.anomaly.detected, pattern, score | Real-time deviation detection |
| SLOs | agenttel.slo.budget_remaining, burn_rate | Error budget consumption tracking |
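To make the SLO row concrete, here is a minimal sketch of how burn rate and remaining error budget are conventionally defined in SRE practice. It is not taken from AgentTel's source, so the library's own calculation (windowing, smoothing, units) may differ; the function names are illustrative.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget is being consumed exactly on schedule.
    """
    allowed = 1.0 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_rate / allowed


def budget_remaining(total_requests: int, failed_requests: int, slo_target: float) -> float:
    """Fraction of the error budget left in the window (negative if overspent)."""
    allowed_failures = total_requests * (1.0 - slo_target)
    if allowed_failures <= 0:
        raise ValueError("window must contain requests and slo_target must be below 1.0")
    return 1.0 - failed_requests / allowed_failures


# Example: a 99.9% availability target with 0.2% of requests failing.
rate = burn_rate(error_rate=0.002, slo_target=0.999)
remaining = budget_remaining(total_requests=100_000, failed_requests=25, slo_target=0.999)
```

With these inputs the burn rate is about 2 (the budget is being spent twice as fast as the SLO allows), and about three quarters of the window's budget is still available.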
Agent Interface Layer (agenttel-agent)
A complete toolkit for AI agent interaction with production systems:
| Component | Description |
|-----------|-------------|
| MCP Server | JSON-RPC server implementing the Model Context Protocol; exposes telemetry as tools AI agents can call |
| Health Aggregation | Real-time service health from span data with operation-level and dependency-level metrics |
| Incident Context | Structured incident packages: what's happening, what changed, what's affected, what to do |
| Remediation Framework | Registry of executable remediation actions with approval workflows |
| Action Tracking | Every agent decision and action recorded as OTel spans for full auditability |
| Context Formatters | Prompt-optimized output formats (compact, full, JSON) tuned for LLM context windows |
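For readers unfamiliar with MCP, the sketch below builds the kind of JSON-RPC 2.0 message a client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the Model Context Protocol itself; the tool name `get_service_health` and its `format` argument are hypothetical placeholders, not confirmed AgentTel tool names.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool name and argument, used only for illustration.
request = tool_call_request("get_service_health", {"format": "compact"})
parsed = json.loads(request)
```

How the message is transported (stdio, HTTP, and so on) depends on the server setup and is not shown here.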
Frontend Telemetry (agenttel-web)
Browser SDK for agent-ready frontend observability:
| Feature | Description |
|---------|-------------|
| Auto-Instrumentation | Page loads (Navigation Timing API), SPA navigation, fetch/XMLHttpRequest interception, click/submit interactions, JavaScript errors |
| Journey Tracking | Multi-step user funnel tracking with completion rates, abandonment detection, and duration baselines |
| Anomaly Detection | Client-side pattern detection — rage clicks, API failure cascades, slow page loads, error loops, funnel drop-offs |
| Cross-Stack Correlation | W3C Trace Context injection on all outgoing requests; backend trace ID extraction from responses |
| Route Baselines | Per-route configuration of expected page load times, API response times, error rates, and business criticality |
| Decision Metadata | Escalation levels, runbook URLs, retry policies, and fallback pages per route |
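Cross-stack correlation relies on the W3C Trace Context `traceparent` header. The sketch below generates and parses that header following the version `00` format from the specification. It is written in Python for consistency with the other examples here; the browser SDK itself is TypeScript, and the helper names are illustrative rather than part of agenttel-web.

```python
import re
import secrets

# version "00" - 32 hex trace-id - 16 hex parent-id - 2 hex trace-flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(sampled: bool = True) -> str:
    """Create a traceparent header value with random trace and parent ids."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header: str):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # The specification treats all-zero ids as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


header = make_traceparent()
fields = parse_traceparent(header)
```

A frontend that sends this header on outgoing requests lets the backend continue the same trace, so an agent can follow a user interaction from the browser into backend spans.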
Instrumentation Agent (agenttel-instrument)
IDE-integrated MCP server for automated instrumentation setup:
| Tool | Description |
|------|-------------|
| analyze_codebase | Scans Java/Spring Boot source code — detects endpoints, dependencies, and framework |
| instrument_backend | Generates backend config — Gradle/Maven dependencies, annotations, agenttel.yml |
| instrument_frontend | Generates frontend config — React route detection, criticality inference, SDK initialization |
| validate_instrumentation | Validates agenttel.yml completeness against source code |
| suggest_improvements | Analyzes config and suggests fixes — missing baselines, uncovered endpoints, stale thresholds |
| apply_improvements | Auto-applies low-risk improvements using live health data; flags high-risk items for review |
GenAI Instrumentation (agenttel-genai)
Full observability for AI/ML workloads on the JVM:
| Framework | Approach | Coverage |
|-----------|----------|----------|
| Spring AI | SpanProcessor enrichment of existing Micrometer spans | Framework tag, cost calculation |
| LangChain4j | Decorator-based full instrumentation | Chat, embeddings, RAG retrieval |
| Anthropic SDK | Client wrapper | Messages API with token/cost tracking |
| OpenAI SDK | Client wrapper | Chat completions with token/cost tracking |
| AWS Bedrock | Client wrapper | Converse API with token/cost tracking |
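Token/cost tracking usually reduces to multiplying token counts by per-model prices. The following is a rough sketch of that arithmetic, not AgentTel's implementation: the `Pricing` type, the `example-model` entry, and its prices are placeholders and do not reflect any vendor's actual rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Price in USD per one million tokens (placeholder structure)."""
    input_per_million: float
    output_per_million: float


# Placeholder price table; real rates vary by vendor and model.
PRICING = {
    "example-model": Pricing(input_per_million=3.0, output_per_million=15.0),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate the cost of one LLM call, or None if the model has no known price."""
    pricing = PRICING.get(model)
    if pricing is None:
        return None
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


cost = estimate_cost_usd("example-model", input_tokens=1_000, output_tokens=500)
```

Returning `None` for unknown models, rather than zero, keeps a missing price from looking like a free call once costs are aggregated.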
Agent Observability (agenttel-agentic)
Full lifecycle tracing for AI agents with 70+ semantic attributes:
| Feature | Description |
|---------|-------------|
| Invocation Lifecycle | Goal, status, step count, max steps for each agent execution |
| Reasoning Steps | Thought, action, observation, evaluation, revision tracking |
| Tool Calls | Tool name, success/error/timeout status per call |
| Task Decomposition | Nested task breakdown with depth and parent tracking |
| Orchestration Patterns | Sequential, parallel, evaluator-optimizer, handoff, ReAct, orchestrator-workers |
| Cost Aggregation | Automatic LLM cost rollup from GenAI spans to agent sessions |
| Guardrails | Block, warn, log, escalate actions with named guardrails |
| Human Checkpoints | Approval, feedback, correction gates with wait time tracking |
| Loop Detection | Detects stuck reasoning loops (identical tool calls) |
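The loop detection row describes flagging identical tool calls. One simple way to do that is sketched below, under the assumption that "identical" means the same tool name with the same arguments, repeated consecutively. The function name and the threshold of three are illustrative choices, not AgentTel's documented behavior.

```python
import json


def detect_loop(calls, threshold: int = 3) -> bool:
    """Return True if the last `threshold` tool calls are identical.

    `calls` is a list of (tool_name, arguments_dict) tuples in call order.
    """
    if threshold < 2 or len(calls) < threshold:
        return False
    recent = calls[-threshold:]
    # Serialize arguments with sorted keys so dict ordering does not matter.
    fingerprints = {(name, json.dumps(args, sort_keys=True)) for name, args in recent}
    return len(fingerprints) == 1


stuck = detect_loop([
    ("search_logs", {"service": "payments", "level": "error"}),
    ("search_logs", {"level": "error", "service": "payments"}),
    ("search_logs", {"service": "payments", "level": "error"}),
])

progressing = detect_loop([
    ("search_logs", {"service": "payments"}),
    ("get_trace", {"id": "abc"}),
    ("search_logs", {"service": "payments"}),
])
```

Here `stuck` is true because the three most recent calls match once argument order is normalized, while `progressing` is false because a different tool call sits between the repeats.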
