
AgentTel

Agent-ready telemetry — enriches OpenTelemetry spans across backend (JVM) and frontend (TypeScript) with the context AI agents need to autonomously diagnose and resolve production incidents


<p align="center"> <strong>AgentTel</strong><br/> <em>Agent-Ready Telemetry</em> </p> <p align="center"> <a href="https://github.com/rrohitramsen/AgentTel/actions/workflows/ci.yml"><img src="https://github.com/rrohitramsen/AgentTel/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://www.apache.org/licenses/LICENSE-2.0"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License"></a> <a href="https://opentelemetry.io"><img src="https://img.shields.io/badge/OpenTelemetry-1.39%2B-blueviolet.svg" alt="OpenTelemetry"></a> <a href="https://rrohitramsen.github.io/AgentTel"><img src="https://img.shields.io/badge/docs-GitHub%20Pages-blue.svg" alt="Documentation"></a> </p> <p align="center"> <a href="https://central.sonatype.com/search?q=dev.agenttel"><img src="https://img.shields.io/maven-central/v/dev.agenttel/agenttel-core?label=Maven%20Central&logo=apachemaven" alt="Maven Central"></a> <a href="https://www.npmjs.com/package/@agenttel/node"><img src="https://img.shields.io/npm/v/@agenttel/node?label=npm&logo=npm" alt="npm"></a> <a href="https://pypi.org/project/agenttel/"><img src="https://img.shields.io/pypi/v/agenttel?label=PyPI&logo=python&logoColor=white" alt="PyPI"></a> <a href="https://pkg.go.dev/go.agenttel.dev/agenttel"><img src="https://img.shields.io/badge/Go-pkg.go.dev-00ADD8?logo=go&logoColor=white" alt="Go Reference"></a> </p> <p align="center"> <a href="https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html"><img src="https://img.shields.io/badge/JDK-17%2B-orange.svg?logo=openjdk&logoColor=white" alt="JDK 17+"></a> <a href="https://go.dev"><img src="https://img.shields.io/badge/Go-1.24%2B-00ADD8.svg?logo=go&logoColor=white" alt="Go 1.24+"></a> <a href="https://nodejs.org"><img src="https://img.shields.io/badge/Node.js-18%2B-339933.svg?logo=nodedotjs&logoColor=white" alt="Node.js 18+"></a> <a href="https://www.python.org"><img 
src="https://img.shields.io/badge/Python-3.11%2B-3776AB.svg?logo=python&logoColor=white" alt="Python 3.11+"></a> <a href="https://www.typescriptlang.org"><img src="https://img.shields.io/badge/TypeScript-4.7%2B-3178C6.svg?logo=typescript&logoColor=white" alt="TypeScript"></a> </p>

AgentTel enriches OpenTelemetry telemetry with the structured context AI agents need to autonomously diagnose, reason about, and resolve production incidents — without human interpretation of dashboards. Works across the full stack: JVM backends (Java, Kotlin, Scala), Go backends, Node.js/TypeScript backends, Python backends (FastAPI, Django, Flask), and browser frontends (TypeScript/JavaScript).

Standard observability answers "What happened?" AgentTel adds "What does an AI agent need to know to act on this?"

The Problem

Modern observability tools generate massive volumes of telemetry — traces, metrics, logs — optimized for human consumption through dashboards and alert rules. AI agents tasked with autonomous incident response face critical gaps:

  • No behavioral context — Spans lack baselines, so agents can't distinguish normal from anomalous
  • No topology awareness — Agents don't know which services are critical, who owns them, or what depends on what
  • No decision metadata — Is this operation retryable? Is there a fallback? What's the runbook?
  • No actionable interface — Agents can read telemetry but can't query live system state or execute remediation

AgentTel closes these gaps at the instrumentation layer.

Design Philosophy

Core principle: telemetry should carry enough context for AI agents to reason and act autonomously.

AgentTel enriches telemetry at three levels — all configurable via YAML, no code changes required:

| Level | Where | What | Example |
|-------|-------|------|---------|
| Topology | OTel Resource (once per service) | Service identity, ownership, dependencies | team, tier, on-call channel |
| Baselines | Span attributes (per operation) | What "normal" looks like | P50/P99 latency, error rate |
| Decisions | Span attributes (per operation) | What an agent is allowed to do | retryable, runbook URL, escalation level |

Topology is set once on the OTel Resource and automatically associated with all telemetry by the SDK. Baselines and decision metadata are attached per-operation on spans. This avoids redundant data on every span while ensuring agents always have the full context.
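The split between Resource-level topology and span-level attributes can be pictured with plain dictionaries. This is a sketch, not AgentTel code: it assumes the attribute names follow the `agenttel.topology.*`, `agenttel.baseline.*`, and `agenttel.decision.*` prefixes used in this README, and every value is invented for illustration.

```python
# Illustrative only: plain dicts standing in for an OTel Resource and a span.
# Key names assume the agenttel.* prefixes; all values are made up.

RESOURCE_ATTRS = {  # set once per service
    "agenttel.topology.team": "payments",
    "agenttel.topology.tier": "critical",
}


def agent_context(resource_attrs, span_attrs):
    """Merge service-level and operation-level attributes into one view."""
    merged = dict(resource_attrs)
    merged.update(span_attrs)  # span attributes win on a key collision
    return merged


span_attrs = {  # attached per operation
    "agenttel.baseline.latency_p50_ms": 45.0,
    "agenttel.decision.retryable": True,
}

context = agent_context(RESOURCE_ATTRS, span_attrs)
print(sorted(context))
```

The span dictionary never repeats the topology keys, yet the merged view an agent reads contains all four attributes.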

<p align="center"> <img src="docs/assets/images/agenttel-overview.png" alt="AgentTel — How it works" width="800"/> </p>

Quick Demo

Try AgentTel with a single Docker Compose command, which starts a demo payment service with an OTel Collector and Jaeger:

cd examples/spring-boot-example
docker compose -f docker/docker-compose.yml up --build

Then open Jaeger to see enriched traces, Swagger UI for the API, and MCP Tool Docs for the agent interface.

What AgentTel Provides

Enriched Telemetry (agenttel-core)

Every span is automatically enriched with agent-actionable attributes:

| Category | Attributes | Purpose |
|----------|-----------|---------|
| Topology | `agenttel.topology.team`, `tier`, `domain`, `dependencies` | Service identity and dependency graph |
| Baselines | `agenttel.baseline.latency_p50_ms`, `error_rate`, `source` | What "normal" looks like for each operation |
| Decisions | `agenttel.decision.retryable`, `idempotent`, `runbook_url`, `escalation_level` | What an agent is allowed to do |
| Anomalies | `agenttel.anomaly.detected`, `pattern`, `score` | Real-time deviation detection |
| SLOs | `agenttel.slo.budget_remaining`, `burn_rate` | Error budget consumption tracking |
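To show how an agent might combine these attributes, here is a minimal sketch of a triage function that reads a baseline and a decision flag from one span. The function name, the return labels, and the `slow_factor` threshold are hypothetical and not part of AgentTel; only the two attribute keys come from the table above.

```python
def triage(span_attrs, observed_latency_ms, slow_factor=3.0):
    """Return a coarse next step for an agent looking at one span.

    slow_factor is a hypothetical threshold, not an AgentTel setting.
    """
    p50 = span_attrs.get("agenttel.baseline.latency_p50_ms")
    if p50 is None:
        return "no-baseline"  # nothing to compare against
    if observed_latency_ms <= p50 * slow_factor:
        return "within-baseline"
    # The span deviates from its baseline: check what the agent may do.
    if span_attrs.get("agenttel.decision.retryable", False):
        return "retry"
    return "escalate"


attrs = {
    "agenttel.baseline.latency_p50_ms": 45.0,
    "agenttel.decision.retryable": True,
}
print(triage(attrs, observed_latency_ms=500.0))
```

Without the baseline the agent cannot tell slow from normal, and without the decision flag it cannot tell whether a retry is safe.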

Agent Interface Layer (agenttel-agent)

A complete toolkit for AI agent interaction with production systems:

| Component | Description |
|-----------|-------------|
| MCP Server | JSON-RPC server implementing the Model Context Protocol; exposes telemetry as tools AI agents can call |
| Health Aggregation | Real-time service health from span data with operation-level and dependency-level metrics |
| Incident Context | Structured incident packages: what's happening, what changed, what's affected, what to do |
| Remediation Framework | Registry of executable remediation actions with approval workflows |
| Action Tracking | Every agent decision and action recorded as OTel spans for full auditability |
| Context Formatters | Prompt-optimized output formats (compact, full, JSON) tuned for LLM context windows |
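For orientation, a tool invocation over the Model Context Protocol is a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a request; the tool name `get_service_health` and its `service` argument are hypothetical placeholders, since this README does not list the tools the AgentTel MCP server exposes.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, for illustration only.
request = build_tool_call(1, "get_service_health", {"service": "payment-service"})
payload = json.dumps(request)
print(payload)
```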

Frontend Telemetry (agenttel-web)

Browser SDK for agent-ready frontend observability:

| Feature | Description |
|---------|-------------|
| Auto-Instrumentation | Page loads (Navigation Timing API), SPA navigation, fetch/XMLHttpRequest interception, click/submit interactions, JavaScript errors |
| Journey Tracking | Multi-step user funnel tracking with completion rates, abandonment detection, and duration baselines |
| Anomaly Detection | Client-side pattern detection: rage clicks, API failure cascades, slow page loads, error loops, funnel drop-offs |
| Cross-Stack Correlation | W3C Trace Context injection on all outgoing requests; backend trace ID extraction from responses |
| Route Baselines | Per-route configuration of expected page load times, API response times, error rates, and business criticality |
| Decision Metadata | Escalation levels, runbook URLs, retry policies, and fallback pages per route |
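Cross-stack correlation relies on the W3C Trace Context `traceparent` header, which has the form `version-traceid-spanid-flags`. The browser SDK handles this in TypeScript; the sketch below uses Python only to illustrate the header format, and the function names are hypothetical rather than taken from `agenttel-web`.

```python
import re
import secrets

# version 00, 32-hex trace id, 16-hex parent span id, 2-hex flags
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent value, generating random ids when none are given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = TRACEPARENT_RE.fullmatch(header)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero ids are invalid per the specification
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(header, parse_traceparent(header))
```

Because the frontend and backend share the same trace id, an agent can follow one user action from the browser span into the backend spans it triggered.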

Instrumentation Agent (agenttel-instrument)

IDE-integrated MCP server for automated instrumentation setup:

| Tool | Description |
|------|-------------|
| analyze_codebase | Scans Java/Spring Boot source code; detects endpoints, dependencies, and framework |
| instrument_backend | Generates backend config: Gradle/Maven dependencies, annotations, agenttel.yml |
| instrument_frontend | Generates frontend config: React route detection, criticality inference, SDK initialization |
| validate_instrumentation | Validates agenttel.yml completeness against source code |
| suggest_improvements | Analyzes config and suggests fixes: missing baselines, uncovered endpoints, stale thresholds |
| apply_improvements | Auto-applies low-risk improvements using live health data; flags high-risk items for review |

GenAI Instrumentation (agenttel-genai)

Full observability for AI/ML workloads on the JVM:

| Framework | Approach | Coverage |
|-----------|----------|----------|
| Spring AI | SpanProcessor enrichment of existing Micrometer spans | Framework tag, cost calculation |
| LangChain4j | Decorator-based full instrumentation | Chat, embeddings, RAG retrieval |
| Anthropic SDK | Client wrapper | Messages API with token/cost tracking |
| OpenAI SDK | Client wrapper | Chat completions with token/cost tracking |
| AWS Bedrock | Client wrapper | Converse API with token/cost tracking |

Agent Observability (agenttel-agentic)

Full lifecycle tracing for AI agents with 70+ semantic attributes:

| Feature | Description |
|---------|-------------|
| Invocation Lifecycle | Goal, status, step count, max steps for each agent execution |
| Reasoning Steps | Thought, action, observation, evaluation, revision tracking |
| Tool Calls | Tool name, success/error/timeout status per call |
| Task Decomposition | Nested task breakdown with depth and parent tracking |
| Orchestration Patterns | Sequential, parallel, evaluator-optimizer, handoff, ReAct, orchestrator-workers |
| Cost Aggregation | Automatic LLM cost rollup from GenAI spans to agent sessions |
| Guardrails | Block, warn, log, escalate actions with named guardrails |
| Human Checkpoints | Approval, feedback, correction gates with wait time tracking |
| Loop Detection | Detects stuck reasoning loops (identical tool calls) |
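One simple way to detect a stuck loop of identical tool calls is to count consecutive repeats of the same tool name and arguments. The sketch below illustrates that idea only; it is not the `agenttel-agentic` implementation, and the function name and the `threshold` of three are assumptions.

```python
import json


def detect_loop(tool_calls, threshold=3):
    """Return True if the same call repeats `threshold` times in a row.

    tool_calls is a sequence of (tool_name, arguments_dict) pairs.
    threshold is a hypothetical default, not an AgentTel setting.
    """
    run = 0
    previous = None
    for name, arguments in tool_calls:
        # Serialize with sorted keys so argument order does not matter.
        key = (name, json.dumps(arguments, sort_keys=True))
        run = run + 1 if key == previous else 1
        previous = key
        if run >= threshold:
            return True
    return False


stuck = [("search_logs", {"query": "timeout"})] * 3
progressing = [
    ("search_logs", {"query": "timeout"}),
    ("get_trace", {"id": "abc"}),
    ("search_logs", {"query": "timeout"}),
]
print(detect_loop(stuck), detect_loop(progressing))
```

Requiring the repeats to be consecutive avoids flagging an agent that legitimately revisits a tool after learning something new in between.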
