Rootcause

RootCause is a local-first MCP server that turns natural-language requests into evidence-backed incident analysis, Kubernetes diagnostics, and safer operations.

RootCause 🧭

AI-native SRE for Kubernetes incidents.

Built in Go as a single binary, RootCause is optimized for low-friction local workflows using your existing kubeconfig identity.


🚀 Quick Start | 🌐 Client Setup | 🛠️ Tools | 🧩 Skills | 🔒 Safety | ⚙️ Config | 🏗️ Architecture | 🤝 Contributing


Why RootCause 💡

RootCause is built for SRE/operator workflows where speed matters, but unsafe automation is unacceptable.

  • 🚀 Stop context-switching: investigate incidents, rollout risk, Helm/Terraform/AWS signals, and remediation from one MCP server.
  • 🧠 AI-powered diagnostics: evidence-first analysis with RCA, timelines, and action-oriented next checks.
  • 💸 Built-in cost optimization: combine resource usage, workload best-practice checks, Terraform plan analysis, and cloud context for optimization decisions.
  • 🔒 Enterprise-ready guardrails: role/namespace policy enforcement, redaction, read-only mode, destructive tool controls, and mutation preflight.
  • ⚡ Zero learning curve: ask natural-language operational questions and use provided prompt templates for common SRE flows.
  • 🌐 Universal compatibility: works with MCP-compatible clients across Claude, Cursor, Copilot, Codex, and more.
  • 🏭 Production-grade workflow: single Go binary, kubeconfig-native auth, deterministic structured outputs, and broad test coverage.

Why teams choose it

| Need | RootCause answer |
|---|---|
| "What changed and why did this break?" | rootcause.incident_bundle, rootcause.change_timeline, rootcause.rca_generate |
| "Is it safe to restart or roll out now?" | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| "Is my platform ecosystem healthy?" | k8s.*_detect + k8s.diagnose_* for ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium |
| "Can I standardize SRE responses?" | Prompt templates + structured output from shared render/evidence pipeline |

What Can You Do?

Ask your AI assistant in natural language:

  • "Why did this deployment fail after rollout?"
  • "Is this workload safe to restart right now?"
  • "Why are ArgoCD apps out of sync?"
  • "Is Flux healthy in this cluster?"
  • "Why are certs failing to renew?"
  • "Before patch/apply, is this mutation safe?"

RootCause keeps its depth-first model: evidence-first diagnosis, root-cause analysis, and remediation flow instead of raw tool sprawl.

Power users can map these prompts to concrete tools in this README (Complete Feature Set, Toolchains, and Tools sections).

Use Cases

Incident response

  • Build end-to-end incident evidence with rootcause.incident_bundle
  • Generate probable causes with rootcause.rca_generate
  • Export timeline and postmortem artifacts for follow-up

Safe operations before mutation

  • Evaluate rollout/restart risk with k8s.restart_safety_check and k8s.best_practice
  • Run k8s.safe_mutation_preflight before apply/patch/delete/scale operations
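
One way to picture a preflight gate for mutating operations is as a short sequence of ordered checks. The sketch below is illustrative only: `Policy`, `Request`, and `Gate` are hypothetical names, and treating `delete` as the only destructive verb is an assumption for the example rather than RootCause's actual rule set.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy is a hypothetical stand-in for server-side safety settings.
type Policy struct {
	ReadOnly         bool // reject every mutation
	AllowDestructive bool // permit destructive verbs such as delete
}

// Request describes one mutating operation.
type Request struct {
	Verb      string // "apply", "patch", "delete", or "scale"
	Confirmed bool   // caller explicitly confirmed the change
}

var (
	ErrReadOnly    = errors.New("read-only mode: mutations are disabled")
	ErrDestructive = errors.New("destructive operations are not enabled")
	ErrUnconfirmed = errors.New("explicit confirmation is required")
)

// Gate returns nil only when the request passes every check, in order:
// read-only mode, destructive gating, then explicit confirmation.
func Gate(p Policy, r Request) error {
	if p.ReadOnly {
		return ErrReadOnly
	}
	// Assumption: only "delete" counts as destructive in this sketch.
	if r.Verb == "delete" && !p.AllowDestructive {
		return ErrDestructive
	}
	if !r.Confirmed {
		return ErrUnconfirmed
	}
	return nil
}

func main() {
	policy := Policy{ReadOnly: false, AllowDestructive: false}
	for _, r := range []Request{
		{Verb: "scale", Confirmed: true},
		{Verb: "delete", Confirmed: true},
		{Verb: "patch", Confirmed: false},
	} {
		fmt.Printf("%-6s -> %v\n", r.Verb, Gate(policy, r))
	}
}
```

Ordering the checks from broadest to narrowest means the caller always sees the most fundamental reason a mutation was refused.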

Ecosystem-specific health checks

  • ArgoCD: detect installation and diagnose sync/health drift
  • Flux: detect controllers and diagnose reconciliation failures
  • cert-manager / Kyverno / Gatekeeper / Cilium: detect footprint and diagnose control-plane or policy issues

Feature Highlights

| Area | RootCause Capability |
|---|---|
| Incident analysis | rootcause.incident_bundle, rootcause.rca_generate, rootcause.change_timeline, rootcause.postmortem_export, rootcause.capabilities |
| Kubernetes resilience | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| Ecosystem diagnostics | ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium via *_detect and diagnose_* tools |
| Deployment safety | Automatic preflight before k8s mutating operations |
| Helm operations | Chart search/list/get, release diff, rollback advisor, template apply/uninstall flows |
| Terraform analysis | Module/provider search + terraform.debug_plan for impact/risk analysis |
| Service mesh & scaling | Linkerd/Istio/Karpenter diagnostics with shared evidence model |

Complete Feature Set

| Category | Representative capabilities |
|---|---|
| Kubernetes core (k8s.*) | CRUD, logs/events, graph-based debug flows, restart safety, best-practice scoring, mutation preflight |
| Ecosystem diagnostics | ArgoCD, Flux, cert-manager, Kyverno, Gatekeeper, Cilium via *_detect and diagnose_* |
| Incident intelligence (rootcause.*) | Incident bundle orchestration, timeline export, RCA generation, remediation playbook, postmortem export |
| Helm operations (helm.*) | Chart registry search/list/get, release status/diff, rollback advisor, install/upgrade/uninstall, template apply/uninstall |
| Terraform analysis (terraform.*) | Modules/providers/resources/data source discovery + plan debugging |
| Service mesh (istio.*, linkerd.*) | Proxy/config/status diagnostics, policy/routing visibility, mesh resource health |
| Cluster autoscaling (karpenter.*) | Provisioning, nodepool/nodeclass, interruption and scheduling diagnostics |
| Cloud context (aws.*) | IAM, VPC, EC2, EKS, ECR, STS, KMS diagnostics for cross-layer incident analysis |
| Safety and controls | Read-only mode, destructive gating, explicit confirmation, auto preflight checks before mutating K8s operations |

Agent Skills

Extend your AI coding agent with Kubernetes and RootCause expertise using the built-in skills library in skills/.

Skills metadata is schema-versioned and embedded in the CLI from internal/skills/catalog/manifest.json.

Quick Install

# Copy all skills to Claude (create the target directory first if it is missing)
mkdir -p ~/.claude/skills
cp -r skills/claude/* ~/.claude/skills/

# Or install a specific skill
cp -r skills/claude/k8s-helm ~/.claude/skills/

Sync Skills into Project Agent Directories

# List supported agent targets
rootcause sync-skills --list-agents

# Sync skills for one agent into project-local defaults
rootcause sync-skills --agent claude --project-dir .

# Example: GitHub Copilot project files
rootcause sync-skills --agent copilot --project-dir .

# Preview a sync for every supported agent
rootcause sync-skills --all-agents --dry-run

# Sync only selected skills for one agent
rootcause sync-skills --agent claude --skill k8s-incident --skill rootcause-rca

# List available skills
rootcause sync-skills --list-skills

Agent directory defaults used by sync-skills:

| Agent | Format | Project Directory |
|---|---|---|
| Claude Code | SKILL.md | .claude/skills/ |
| Cursor | .mdc | .cursor/skills/ |
| Codex | SKILL.md | .codex/skills/ |
| Gemini CLI | SKILL.md | .gemini/skills/ |
| OpenCode | SKILL.md | .opencode/skills/ |
| GitHub Copilot | Markdown | .github/skills/ |
| Windsurf | Markdown | .windsurf/skills/ |
| Devin | Markdown | .devin/skills/ |
| Aider | SKILL.md | .aider/skills/ |
| Sourcegraph Cody | SKILL.md | .cody/skills/ |
| Amazon Q | SKILL.md | .amazonq/skills/ |

Available Skills (21)

21 skills are currently included.

| Category | Skills |
|---|---|
| Incident Response | k8s-incident, rootcause-rca |
| Core and Operations | k8s-core, k8s-operations |
| Diagnostics and Debugging | k8s-diagnostics, k8s-troubleshoot |
| Deployment and Delivery | k8s-deploy, k8s-helm, k8s-rollouts |
| GitOps | k8s-gitops |
| Networking and Mesh | k8s-networking, k8s-service-mesh, k8s-cilium |
| Security and Policy | k8s-security, k8s-policy, k8s-gatekeeper, k8s-certs |
| Cost and Scaling | k8s-cost, k8s-autoscaling |
| Storage | k8s-storage |
| Browser Automation | k8s-browser |

Supported agents include Claude, Cursor, Codex, Gemini CLI, GitHub Copilot, Goose, Windsurf, Roo, Amp, and more.

Skills include consistent triggers, workflow steps, tool references, troubleshooting notes, and output contracts.

See skills/README.md for full documentation and skills/CATALOG.md for auto-generated catalog output.

MCP Resources

Access Kubernetes data as browsable resources:

| Resource URI | Description |
|---|---|
| kubeconfig://contexts | List all available kubeconfig contexts |
| kubeconfig://current-context | Get current active context |
| namespace://current | Get current namespace |
| namespace://list | List all namespaces |
| cluster://info | Get cluster connection info |
| cluster://nodes | Get detailed node information |
| cluster://version | Get Kubernetes version |
| cluster://api-resources | List available API resources |
| manifest://deployments/{namespace}/{name} | Get deployment YAML |
| manifest://services/{namespace}/{name} | Get service YAML |
| manifest://pods/{namespace}/{name} | Get pod YAML |
| manifest://configmaps/{namespace}/{name} | Get ConfigMap YAML |
| manifest://secrets/{namespace}/{name} | Get secret YAML (data masked) |
| manifest://ingresses/{namespace}/{name} | Get ingress YAML |
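
The `manifest://` URIs above follow a fixed `{kind}/{namespace}/{name}` shape, so a client or test harness can split them with plain string handling. The sketch below shows one way to do that; `ManifestRef` and `ParseManifestURI` are hypothetical names for this example and do not describe how RootCause itself routes resources.

```go
package main

import (
	"fmt"
	"strings"
)

// ManifestRef identifies one object addressed by a manifest:// URI.
type ManifestRef struct {
	Kind      string // e.g. "deployments", "pods", "secrets"
	Namespace string
	Name      string
}

// ParseManifestURI splits manifest://{kind}/{namespace}/{name} into parts.
func ParseManifestURI(uri string) (ManifestRef, error) {
	const prefix = "manifest://"
	if !strings.HasPrefix(uri, prefix) {
		return ManifestRef{}, fmt.Errorf("not a manifest URI: %q", uri)
	}
	parts := strings.Split(strings.TrimPrefix(uri, prefix), "/")
	if len(parts) != 3 {
		return ManifestRef{}, fmt.Errorf("want manifest://{kind}/{namespace}/{name}, got %q", uri)
	}
	for _, p := range parts {
		if p == "" {
			return ManifestRef{}, fmt.Errorf("empty path segment in %q", uri)
		}
	}
	return ManifestRef{Kind: parts[0], Namespace: parts[1], Name: parts[2]}, nil
}

func main() {
	for _, uri := range []string{
		"manifest://deployments/prod/api",
		"manifest://pods/prod",
		"cluster://info",
	} {
		ref, err := ParseManifestURI(uri)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", ref)
	}
}
```

Rejecting empty segments up front avoids passing a blank namespace or name on to a later lookup.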

MCP Prompts

Pre-built workflow prompts for Kubernetes and platform operations:

| Prompt | Description |
|---|---|
| troubleshoot_workload | Comprehensive troubleshooting guide for pods/deployments |
| deploy_application | Step-by-step deployment workflow |
| security_audit | Security scanning and RBAC analysis workflow |
| cost_optimization | Resource optimization and cost analysis workflow |
| disaster_recovery | Backup and recovery planning workflow |
| debug_networking | Network debugging for services and connectivity |
| scale_application | Scaling guide with HPA
