Rootcause

RootCause is a local-first MCP server that turns natural-language requests into evidence-backed incident analysis, Kubernetes diagnostics, and safer operations.

Install / Use

/learn @yindia/Rootcause

RootCause 🧭

AI-native SRE for Kubernetes incidents.

Built in Go as a single binary, RootCause is optimized for low-friction local workflows using your existing kubeconfig identity.

Why RootCause 💡

RootCause is built for SRE/operator workflows where speed matters, but unsafe automation is unacceptable.

🚀 Stop context-switching: investigate incidents, rollout risk, Helm/Terraform/AWS signals, and remediation from one MCP server.
🧠 AI-powered diagnostics: evidence-first analysis with RCA, timelines, and action-oriented next checks.
💸 Built-in cost optimization: combine resource usage, workload best-practice checks, Terraform plan analysis, and cloud context for optimization decisions.
🔒 Enterprise-ready guardrails: role/namespace policy enforcement, redaction, read-only mode, destructive tool controls, and mutation preflight.
⚡ Zero learning curve: ask natural-language operational questions and use provided prompt templates for common SRE flows.
🌐 Universal compatibility: works with MCP-compatible clients across Claude, Cursor, Copilot, Codex, and more.
🏭 Production-grade workflow: single Go binary, kubeconfig-native auth, deterministic structured outputs, and broad test coverage.

Why teams choose it

| Need | RootCause answer |
|---|---|
| "What changed and why did this break?" | rootcause.incident_bundle, rootcause.change_timeline, rootcause.rca_generate |
| "Is it safe to restart or roll out now?" | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| "Is my platform ecosystem healthy?" | k8s.*_detect + k8s.diagnose_* for ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium |
| "Can I standardize SRE responses?" | Prompt templates + structured output from shared render/evidence pipeline |

What Can You Do?

Ask your AI assistant in natural language:

"Why did this deployment fail after rollout?"
"Is this workload safe to restart right now?"
"Why are ArgoCD apps out of sync?"
"Is Flux healthy in this cluster?"
"Why are certs failing to renew?"
"Before patch/apply, is this mutation safe?"

RootCause keeps its depth-first model: evidence-first diagnosis, root-cause analysis, and remediation flow instead of raw tool sprawl.

Power users can map these prompts to concrete tools in this README (Complete Feature Set, Toolchains, and Tools sections).

Use Cases

Incident response

Build end-to-end incident evidence with rootcause.incident_bundle
Generate probable causes with rootcause.rca_generate
Export timeline and postmortem artifacts for follow-up

Safe operations before mutation

Evaluate rollout/restart risk with k8s.restart_safety_check and k8s.best_practice
Run k8s.safe_mutation_preflight before apply/patch/delete/scale operations
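
As a rough illustration of how an MCP client might invoke the preflight check above, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the method the Model Context Protocol defines for tool invocation. The tool names come from this README; the argument names (`operation`, `namespace`, `kind`, `name`) and their values are hypothetical placeholders, because the tool's real input schema is not shown here and should be taken from the server's `tools/list` response.

```python
import json
from itertools import count

# Monotonically increasing ids so each JSON-RPC request is distinguishable.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real schema is whatever the server
# advertises for this tool via `tools/list`.
preflight = build_tool_call(
    "k8s.safe_mutation_preflight",
    {"operation": "scale", "namespace": "payments", "kind": "Deployment", "name": "api"},
)

print(json.dumps(preflight, indent=2))
```

In practice an MCP-compatible client constructs and sends this message for you; the sketch only shows what sits underneath a natural-language question such as "Before patch/apply, is this mutation safe?".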

Ecosystem-specific health checks

ArgoCD: detect installation and diagnose sync/health drift
Flux: detect controllers and diagnose reconciliation failures
cert-manager / Kyverno / Gatekeeper / Cilium: detect footprint and diagnose control-plane or policy issues

Feature Highlights

| Area | RootCause Capability |
|---|---|
| Incident analysis | rootcause.incident_bundle, rootcause.rca_generate, rootcause.change_timeline, rootcause.postmortem_export, rootcause.capabilities |
| Kubernetes resilience | k8s.restart_safety_check, k8s.best_practice, k8s.safe_mutation_preflight |
| Ecosystem diagnostics | ArgoCD/Flux/cert-manager/Kyverno/Gatekeeper/Cilium via *_detect and diagnose_* tools |
| Deployment safety | Automatic preflight before k8s mutating operations |
| Helm operations | Chart search/list/get, release diff, rollback advisor, template apply/uninstall flows |
| Terraform analysis | Module/provider search + terraform.debug_plan for impact/risk analysis |
| Service mesh & scaling | Linkerd/Istio/Karpenter diagnostics with shared evidence model |

Complete Feature Set

| Category | Representative capabilities |
|---|---|
| Kubernetes core (k8s.*) | CRUD, logs/events, graph-based debug flows, restart safety, best-practice scoring, mutation preflight |
| Ecosystem diagnostics | ArgoCD, Flux, cert-manager, Kyverno, Gatekeeper, Cilium via *_detect and diagnose_* |
| Incident intelligence (rootcause.*) | Incident bundle orchestration, timeline export, RCA generation, remediation playbook, postmortem export |
| Helm operations (helm.*) | Chart registry search/list/get, release status/diff, rollback advisor, install/upgrade/uninstall, template apply/uninstall |
| Terraform analysis (terraform.*) | Modules/providers/resources/data source discovery + plan debugging |
| Service mesh (istio.*, linkerd.*) | Proxy/config/status diagnostics, policy/routing visibility, mesh resource health |
| Cluster autoscaling (karpenter.*) | Provisioning, nodepool/nodeclass, interruption and scheduling diagnostics |
| Cloud context (aws.*) | IAM, VPC, EC2, EKS, ECR, STS, KMS diagnostics for cross-layer incident analysis |
| Safety and controls | Read-only mode, destructive gating, explicit confirmation, auto preflight checks before mutating K8s operations |

Agent Skills

Extend your AI coding agent with Kubernetes and RootCause expertise using the built-in skills library in skills/.

Skills metadata is schema-versioned and embedded in the CLI from internal/skills/catalog/manifest.json.

Quick Install

# Copy all skills to Claude
cp -r skills/claude/* ~/.claude/skills/

# Or install a specific skill
cp -r skills/claude/k8s-helm ~/.claude/skills/

Sync Skills into Project Agent Directories

# List supported agent targets
rootcause sync-skills --list-agents

# Sync skills for one agent into project-local defaults
rootcause sync-skills --agent claude --project-dir .

# Example: GitHub Copilot project files
rootcause sync-skills --agent copilot --project-dir .

# UX helpers
rootcause sync-skills --all-agents --dry-run
rootcause sync-skills --agent claude --skill k8s-incident --skill rootcause-rca
rootcause sync-skills --list-skills

Agent directory defaults used by sync-skills:

| Agent | Format | Project Directory |
|---|---|---|
| Claude Code | SKILL.md | .claude/skills/ |
| Cursor | .mdc | .cursor/skills/ |
| Codex | SKILL.md | .codex/skills/ |
| Gemini CLI | SKILL.md | .gemini/skills/ |
| OpenCode | SKILL.md | .opencode/skills/ |
| GitHub Copilot | Markdown | .github/skills/ |
| Windsurf | Markdown | .windsurf/skills/ |
| Devin | Markdown | .devin/skills/ |
| Aider | SKILL.md | .aider/skills/ |
| Sourcegraph Cody | SKILL.md | .cody/skills/ |
| Amazon Q | SKILL.md | .amazonq/skills/ |
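
To make the directory defaults above concrete, here is a small sketch that resolves where an agent's skills would land inside a project. The mapping is copied from the table; the function name `project_skills_dir` is made up for this example and is not part of the RootCause CLI, and the keys are the table's display names, which may differ from the identifiers that `--agent` accepts (use `rootcause sync-skills --list-agents` for those).

```python
from pathlib import Path

# Project-relative skill directories, copied from the table above.
AGENT_SKILL_DIRS = {
    "Claude Code": ".claude/skills",
    "Cursor": ".cursor/skills",
    "Codex": ".codex/skills",
    "Gemini CLI": ".gemini/skills",
    "OpenCode": ".opencode/skills",
    "GitHub Copilot": ".github/skills",
    "Windsurf": ".windsurf/skills",
    "Devin": ".devin/skills",
    "Aider": ".aider/skills",
    "Sourcegraph Cody": ".cody/skills",
    "Amazon Q": ".amazonq/skills",
}


def project_skills_dir(project_dir: str, agent: str) -> Path:
    """Return the default skills directory for `agent` under `project_dir`.

    Hypothetical helper for illustration only; it mirrors the table and
    does not call the RootCause CLI.
    """
    try:
        relative = AGENT_SKILL_DIRS[agent]
    except KeyError:
        raise ValueError(f"unknown agent: {agent!r}") from None
    return Path(project_dir) / relative


print(project_skills_dir(".", "Claude Code"))
```

A lookup like this can be handy for checking that a sync wrote files where you expected before committing them.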

Available Skills (21)

21 skills are currently included.

| Category | Skills |
|---|---|
| Incident Response | k8s-incident, rootcause-rca |
| Core and Operations | k8s-core, k8s-operations |
| Diagnostics and Debugging | k8s-diagnostics, k8s-troubleshoot |
| Deployment and Delivery | k8s-deploy, k8s-helm, k8s-rollouts |
| GitOps | k8s-gitops |
| Networking and Mesh | k8s-networking, k8s-service-mesh, k8s-cilium |
| Security and Policy | k8s-security, k8s-policy, k8s-gatekeeper, k8s-certs |
| Cost and Scaling | k8s-cost, k8s-autoscaling |
| Storage | k8s-storage |
| Browser Automation | k8s-browser |

Supported agents include Claude, Cursor, Codex, Gemini CLI, GitHub Copilot, Goose, Windsurf, Roo, Amp, and more.

Skills include consistent triggers, workflow steps, tool references, troubleshooting notes, and output contracts.

See skills/README.md for full documentation and skills/CATALOG.md for auto-generated catalog output.

MCP Resources

Access Kubernetes data as browsable resources:

| Resource URI | Description |
|---|---|
| kubeconfig://contexts | List all available kubeconfig contexts |
| kubeconfig://current-context | Get current active context |
| namespace://current | Get current namespace |
| namespace://list | List all namespaces |
| cluster://info | Get cluster connection info |
| cluster://nodes | Get detailed node information |
| cluster://version | Get Kubernetes version |
| cluster://api-resources | List available API resources |
| manifest://deployments/{namespace}/{name} | Get deployment YAML |
| manifest://services/{namespace}/{name} | Get service YAML |
| manifest://pods/{namespace}/{name} | Get pod YAML |
| manifest://configmaps/{namespace}/{name} | Get ConfigMap YAML |
| manifest://secrets/{namespace}/{name} | Get secret YAML (data masked) |
| manifest://ingresses/{namespace}/{name} | Get ingress YAML |
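
The templated `manifest://` URIs above take a namespace and a resource name. The sketch below fills in such a template and wraps it in a JSON-RPC 2.0 `resources/read` request, which is the standard MCP method for reading a resource. The helper names and the example namespace and deployment name (`web`, `checkout`) are assumptions made for illustration; only the URI shapes come from the table.

```python
import json

# Resource kinds that appear in the manifest:// rows of the table above.
MANIFEST_KINDS = {"deployments", "services", "pods", "configmaps", "secrets", "ingresses"}


def manifest_uri(kind: str, namespace: str, name: str) -> str:
    """Fill in manifest://{kind}/{namespace}/{name} for a listed kind."""
    if kind not in MANIFEST_KINDS:
        raise ValueError(f"unsupported manifest kind: {kind!r}")
    return f"manifest://{kind}/{namespace}/{name}"


def build_resource_read(uri: str, request_id: int = 1) -> dict:
    """Build an MCP `resources/read` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


# Hypothetical namespace and deployment name, for illustration only.
request = build_resource_read(manifest_uri("deployments", "web", "checkout"))
print(json.dumps(request, indent=2))
```

Non-templated URIs such as `cluster://version` can be passed to `build_resource_read` directly, with no substitution step.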

MCP Prompts

Pre-built workflow prompts for Kubernetes and platform operations:

| Prompt | Description |
|---|---|
| troubleshoot_workload | Comprehensive troubleshooting guide for pods/deployments |
| deploy_application | Step-by-step deployment workflow |
| security_audit | Security scanning and RBAC analysis workflow |
| cost_optimization | Resource optimization and cost analysis workflow |
| disaster_recovery | Backup and recovery planning workflow |
| debug_networking | Network debugging for services and connectivity |
| scale_application | Scaling guide with HPA

Repository: yindia/rootcause (author: yindia)
GitHub stars: 8
Forks: 1
Category: Development
Updated: 21d ago
Security score: 90/100 (audited on Mar 19, 2026; no findings)