PyRCA: A Python library for Root Cause Analysis

Introduction
Installation
Getting Started
Documentation
Tutorial
Example
Benchmarks
How to Contribute

Introduction

The adoption of microservices architectures is growing at a rapid pace, making multi-service applications the standard paradigm in real-world IT applications. Typically, a multi-service application consists of hundreds of interacting services, making it increasingly challenging to detect service failures and identify their root causes. Root cause analysis (RCA) methods typically rely on KPI metrics, traces, or logs monitored on these services to determine the root causes when a system failure is detected. Such methods can aid engineers and SREs in the troubleshooting process.

PyRCA is a Python machine-learning library designed to facilitate root cause analysis by offering various state-of-the-art RCA algorithms and an end-to-end pipeline for building RCA solutions. At present, PyRCA primarily focuses on metric-based RCA, including two types of algorithms: (1) identifying anomalous metrics in parallel with the observed anomaly through metric data analysis, such as ε-diagnosis, and (2) identifying root causes based on a topology/causal graph representing the causal relationships between the observed metrics, such as Bayesian inference and Random Walk. PyRCA also provides a convenient tool for building causal graphs from the observed time series data and domain knowledge, enabling users to develop graph-based solutions quickly. Furthermore, PyRCA offers a benchmark for evaluating various RCA methods, which is valuable for industry and academic research.

The following list shows the supported RCA methods in our library:

ε-Diagnosis
Bayesian Inference-based RCA (BI)
Random Walk-based RCA (RW)
Root Cause Discovery method (RCD)
Hypothesis Testing-based RCA (HT)

We will continue improving this library to make it more comprehensive. In the future, PyRCA will also support trace- and log-based RCA methods.

Installation

You can install pyrca from PyPI by calling pip install sfr-pyrca. You may install from source by cloning the PyRCA repo, navigating to the root directory, and calling pip install ., or pip install -e . to install in editable mode. You may install additional dependencies:

For plotting & visualization: run pip install sfr-pyrca[plot], or pip install .[plot] from the root directory of the repo.
For all the dependencies: run pip install sfr-pyrca[all], or pip install .[all] from the root directory of the repo.

Getting Started

PyRCA provides a unified interface for training RCA models and finding root causes. To apply a certain RCA method, you only need to specify:

The selected RCA method: e.g., BayesianNetwork, EpsilonDiagnosis.
The method configuration: e.g., BayesianNetworkConfig, EpsilonDiagnosisConfig.
Time series data for initialization/training: e.g., time series data in a pandas dataframe.
Abnormal time series data in an incident window: The RCA methods require the anomalous KPI metrics in an incident window.

Let's take BayesianNetwork as an example. Suppose that graph_df is the pandas dataframe of a graph representing the causal relationships between metrics (how to construct such a causal graph is discussed later), and df is the pandas dataframe containing the historical observed time series data (e.g., the index is the timestamp and each column represents one monitored metric). To train a BayesianNetwork, you can simply run the following code:

from pyrca.analyzers.bayesian import BayesianNetwork
model = BayesianNetwork(config=BayesianNetwork.config_class(graph=graph_df))
model.train(df)
model.save("model_folder")

After the model is trained, you can use it to find root causes of an incident given a list of anomalous metrics detected by a certain anomaly detector (you can use the stats-based detector supported in PyRCA or other anomaly detection methods supported by our Merlion library), e.g.,

from pyrca.analyzers.bayesian import BayesianNetwork
model = BayesianNetwork.load("model_folder")
results = model.find_root_causes(["observed_anomalous_metric", ...])
print(results.to_dict())

For other RCA methods, you can write similar code to find root causes. For example, if you want to try EpsilonDiagnosis, you can initialize EpsilonDiagnosis as follows:

from pyrca.analyzers.epsilon_diagnosis import EpsilonDiagnosis
model = EpsilonDiagnosis(config=EpsilonDiagnosis.config_class(alpha=0.01))
model.train(normal_data)

Here normal_data is the historically observed time series data without anomalies. To identify root causes, you can run:

results = model.find_root_causes(abnormal_data)
print(results.to_dict())

where abnormal_data is the time series data collected in an incident window.
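The split between normal_data and abnormal_data can be sketched in plain Python as follows. This is only an illustration under stated assumptions: split_windows, the latency_ms metric, and the timestamps are hypothetical and not part of PyRCA, and the rows would still need to be converted to pandas dataframes before being passed to the model.

```python
from datetime import datetime, timedelta


# Hypothetical helper, not part of PyRCA.
def split_windows(rows, incident_start, incident_end):
    """Split (timestamp, metrics) rows into pre-incident and incident-window rows."""
    normal = [r for r in rows if r[0] < incident_start]
    abnormal = [r for r in rows if incident_start <= r[0] <= incident_end]
    return normal, abnormal


# Ten one-minute samples of a made-up metric; the last two are elevated.
start = datetime(2023, 1, 1)
rows = [
    (start + timedelta(minutes=i), {"latency_ms": 100 + (50 if i >= 8 else 0)})
    for i in range(10)
]

normal_rows, abnormal_rows = split_windows(
    rows,
    incident_start=start + timedelta(minutes=8),
    incident_end=start + timedelta(minutes=9),
)
print(len(normal_rows), len(abnormal_rows))
```

With pandas, the same split is usually a slice on the timestamp index of the dataframe.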

As mentioned above, some RCA methods such as BayesianNetwork require causal graphs as their inputs. To construct such causal graphs from the observed time series data, you can utilize our tool by running python -m pyrca.tools. This command will launch a Dash app for time series data analysis and causal discovery.

The dashboard enables users to experiment with different causal discovery methods, customize causal discovery parameters, add domain knowledge constraints (e.g., root/leaf nodes, forbidden/required links), and visualize the generated causal graphs. This feature simplifies the process of manually revising causal graphs based on domain knowledge. Users can download the graph generated by this tool if they are satisfied with it. The graph can then be used by the RCA methods supported in PyRCA.

Alternatively, users can write code to build such graphs instead of using the dashboard. The package pyrca.graphs.causal includes several popular causal discovery methods that users can leverage. All of these methods support domain knowledge constraints. For instance, if users wish to apply the PC algorithm for building causal graphs on the observed time series data df, the following code can be used:

from pyrca.graphs.causal.pc import PC
model = PC(PC.config_class())
graph_df = model.train(df)

If you have some domain knowledge constraints, you may run:

from pyrca.graphs.causal.pc import PC
model = PC(PC.config_class(domain_knowledge_file="file_path"))
graph_df = model.train(df)

The domain knowledge file has a YAML format, e.g.,

causal-graph:
  root-nodes: ["A", "B"]
  leaf-nodes: ["E", "F"]
  forbids:
    - ["A", "E"]
  requires: 
    - ["A", "C"]

This domain knowledge file states that:

Metrics A and B must be the root nodes,
Metrics E and F must be the leaf nodes,
There is no connection from A to E, and
There is a connection from A to C.

You can write your domain knowledge file based on this template for generating more reliable causal graphs.
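To make the four constraints concrete, here is a minimal sketch that checks a set of directed edges against them. It is not how PyRCA enforces domain knowledge. check_constraints and the example edge sets are hypothetical, and the sketch assumes that a root node has no parents and a leaf node has no children.

```python
# Hypothetical helper, not part of PyRCA: checks directed (parent, child)
# edges against constraints shaped like the YAML template.
def check_constraints(edges, knowledge):
    """Return a list of human-readable violations (empty if none)."""
    rules = knowledge["causal-graph"]
    violations = []
    for parent, child in edges:
        # Assumption: a root node must not have any incoming link.
        if child in rules.get("root-nodes", []):
            violations.append(f"root node {child} has parent {parent}")
        # Assumption: a leaf node must not have any outgoing link.
        if parent in rules.get("leaf-nodes", []):
            violations.append(f"leaf node {parent} has child {child}")
    for parent, child in rules.get("forbids", []):
        if (parent, child) in edges:
            violations.append(f"forbidden link {parent} -> {child}")
    for parent, child in rules.get("requires", []):
        if (parent, child) not in edges:
            violations.append(f"missing required link {parent} -> {child}")
    return violations


# The same constraints as the YAML example, written as a Python dict.
knowledge = {
    "causal-graph": {
        "root-nodes": ["A", "B"],
        "leaf-nodes": ["E", "F"],
        "forbids": [["A", "E"]],
        "requires": [["A", "C"]],
    }
}

good_edges = {("A", "C"), ("B", "C"), ("C", "E"), ("C", "F")}
bad_edges = {("A", "E"), ("C", "A"), ("E", "F")}

print(check_constraints(good_edges, knowledge))
print(sorted(check_constraints(bad_edges, knowledge)))
```

The first edge set satisfies every rule, while the second breaks all four, which makes it a quick sanity check for a hand-edited graph.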

Application Example

Here is a real-world example of applying BayesianNetwork to build a solution for RCA, which is adapted from our internal use cases. The "config" folder includes the settings for the stats-based anomaly detector and the domain knowledge. The "models" folder stores the causal graph and the trained Bayesian network. The RCAEngine class in the "rca.py" file implements the methods for building causal graphs, training Bayesian networks and finding root causes by utilizing the modules provided by PyRCA. You can directly use this class if the stats-based anomaly detector and Bayesian inference are suitable for your problems. For example, given a time series dataframe df, you can build and train a Bayesian network via the following code:

from pyrca.applications.example.rca import RCAEngine
engine = RCAEngine()
engine.build_causal_graph(
    df=df,
    run_pdag2dag=True,
    max_num_points=5000000,
    verbose=True
)
bn = engine.train_bayesian_network(dfs=[df])
bn.print_probabilities()

After the Bayesian network is constructed, you can use it directly for finding root causes:

import pprint
from pyrca.applications.example.rca import RCAEngine
engine = RCAEngine()
result = engine.find_root_causes_bn(anomalies=["conn_pool", "apt"])
pprint.pprint(result)

The input of find_root_causes_bn is a list of the anomalous metrics detected by the stats-based anomaly detector. This method will estimate the probability of a node being a root cause and extract the paths from a potential root cause node to the leaf nodes.
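The path extraction step can be illustrated with a plain depth-first search. This is a sketch only and not the code behind find_root_causes_bn: paths_to_leaves and the small example graph are hypothetical, and the graph is assumed to be acyclic.

```python
# Illustrative only: a plain depth-first search, not PyRCA's implementation.
def paths_to_leaves(children, start):
    """List every directed path from `start` to a node with no children.

    `children` maps each node to the list of nodes it points to.
    The graph is assumed to be acyclic.
    """
    stack = [[start]]
    paths = []
    while stack:
        path = stack.pop()
        successors = children.get(path[-1], [])
        if not successors:
            # The last node has no outgoing links, so the path is complete.
            paths.append(path)
            continue
        for node in successors:
            stack.append(path + [node])
    return sorted(paths)


# Hypothetical causal graph: A -> C, B -> C, C -> E, C -> F.
children = {"A": ["C"], "B": ["C"], "C": ["E", "F"]}

print(paths_to_leaves(children, "A"))
# prints [['A', 'C', 'E'], ['A', 'C', 'F']]
```

Listing the paths this way shows which downstream metrics a candidate root cause can reach, which helps when comparing the result against the anomalies actually observed.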

Benchmarks

The following table summarizes the RCA performance of different methods on t
