DPG
Decision Predicate Graph (DPG) is a model-agnostic tool to provide a global interpretation of tree-based ensemble models.
Decision Predicate Graph (DPG)
<p align="center"> <img src="https://github.com/Meta-Group/DPG/blob/main/DPG.png" width="300" /> </p>
DPG is a model-agnostic tool to provide a global interpretation of tree-based ensemble models, addressing transparency and explainability challenges.
DPG is a graph structure that captures the tree-based ensemble model together with details of the dataset it learned from, preserving the relations among features, logical decisions, and predictions. It supports graph-based evaluation of the model's decisions, which makes it easier to compare features and their associated values while giving insight into the model as a whole. DPG also provides descriptive metrics that clarify the decisions inherent in the model.
<p align="center"> <img src="https://github.com/Meta-Group/DPG/blob/main/dpg_image_examples/custom_l2.jpg?raw=true" width="600" /> </p>
The structure
The concept behind DPG is to convert a generic tree-based ensemble model for classification into a graph, where:
- Nodes represent predicates, i.e., the feature-value associations present in each node of every tree;
- Edges denote the frequency with which these predicates are satisfied during the model training phase by the samples of the dataset.
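The mapping from trees to a graph can be illustrated with a toy sketch. This is not DPG's implementation: it assumes that each sample's route through a tree has already been written out as a list of predicate strings ending in a class label (the `sample_paths` data and the `build_predicate_graph` helper are made up for illustration), and it simply counts how often consecutive predicates are traversed together.

```python
from collections import Counter

# Hypothetical decision paths: one list per (sample, tree) pair,
# ending in the class reached at the leaf.
sample_paths = [
    ["F1 <= 5.0", "F2 <= 1.5", "Class 1"],
    ["F1 <= 5.0", "F2 <= 1.5", "Class 1"],
    ["F1 <= 5.0", "F2 > 1.5", "Class 2"],
    ["F1 > 5.0", "Class 2"],
]


def build_predicate_graph(paths):
    """Return (nodes, edges); edges maps (source, target) to a traversal count."""
    nodes = set()
    edges = Counter()
    for path in paths:
        nodes.update(path)
        # Each pair of consecutive predicates on a path is one traversed edge.
        for source, target in zip(path, path[1:]):
            edges[(source, target)] += 1
    return nodes, edges


nodes, edges = build_predicate_graph(sample_paths)
for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target}: {count}")
```

Predicates shared by several trees collapse into a single node here, which is what lets edge counts accumulate across the whole ensemble.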
Metrics
The graph-based nature of DPG provides significant enhancements in the direction of a complete mapping of the ensemble structure.

| Property | Definition | Utility |
|--------------|------------|---------|
| Constraints | The intervals of values for each feature obtained from all predicates connected by a path that culminates in a given class. | Calculate the classification boundary values of each feature associated with each class. |
| Betweenness centrality | Quantifies the fraction of all the shortest paths between every pair of nodes of the graph passing through the considered node. | Identify potential bottleneck nodes that correspond to crucial decisions. |
| Local reaching centrality | Quantifies the proportion of other nodes reachable from the local node through its outgoing edges. | Assess the importance of nodes similarly to feature importance, but enrich the information by encompassing the values associated with features across all decisions. |
| Community | A subset of nodes of the DPG which is characterised by dense interconnections between its elements and sparse connections with the other nodes of the DPG that do not belong to the community. | Understanding the characteristics of nodes to be assigned to a particular community class, identifying predominant predicates, and those that play a marginal role in the classification process. |

|Constraints | Betweenness centrality | Local reaching centrality | Community|
|------------|------------|--------------|--------------------|
|Constraints(Class 1) = val3 < F1 ≤ val1, F2 ≤ val2 | BC(F2 ≤ val2) = 4/24 | LRC(F1 ≤ val1) = 6 / 7 | Community(Class 1) = F1 ≤ val1, F2 ≤ val2 |
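Two of these metrics are simple enough to sketch by hand. The snippet below is a toy illustration under stated assumptions, not DPG's own code: predicates are given as hypothetical `(feature, operator, threshold)` tuples, the constraint interval is taken to be the widest envelope of the thresholds seen for a feature (DPG's aggregation rule may differ), and local reaching centrality is computed in its unweighted form with a breadth-first search.

```python
from collections import deque


def class_constraints(predicates):
    """Collapse (feature, operator, threshold) predicates into one interval per feature.

    Assumption: the interval is the widest envelope, i.e. the smallest ">"
    threshold and the largest "<=" threshold seen for each feature.
    """
    lowers, uppers = {}, {}
    for feature, operator, threshold in predicates:
        if operator == ">":
            lowers.setdefault(feature, []).append(threshold)
        elif operator == "<=":
            uppers.setdefault(feature, []).append(threshold)
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return {
        f: (min(lowers.get(f, [float("-inf")])), max(uppers.get(f, [float("inf")])))
        for f in set(lowers) | set(uppers)
    }


def local_reaching_centrality(adjacency, start):
    """Fraction of the other nodes reachable from `start` via outgoing edges."""
    all_nodes = set(adjacency)
    for targets in adjacency.values():
        all_nodes.update(targets)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    others = len(all_nodes) - 1
    return (len(seen) - 1) / others if others > 0 else 0.0


# Made-up predicates lying on paths that end in "Class 1".
class_1_predicates = [("F1", "<=", 5.0), ("F1", ">", 2.0), ("F1", "<=", 4.0), ("F2", "<=", 1.5)]
# Made-up directed graph: node -> list of successor nodes.
toy_graph = {
    "F1 <= 5.0": ["F2 <= 1.5", "F2 > 1.5"],
    "F2 <= 1.5": ["Class 1"],
    "F2 > 1.5": ["Class 2"],
    "F1 > 5.0": ["Class 2"],
}

print(class_constraints(class_1_predicates))
print(local_reaching_centrality(toy_graph, "F1 <= 5.0"))
```

In this toy graph the root predicate `F1 <= 5.0` reaches four of the five other nodes, so it scores higher than `F1 > 5.0`, which reaches only one.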
Installation
To install DPG locally, first clone the repository:
git clone https://github.com/Meta-Group/DPG.git
cd DPG
Then, install the DPG library in development mode using pip:
pip install -e .
Alternatively, if using pip directly:
pip install git+https://github.com/Meta-Group/DPG.git
Troubleshooting: If you encounter dependency conflicts, we recommend using a virtual environment:
1- For Windows Users:
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
.venv\Scripts\activate
# If you get execution policy errors, run this first in PowerShell as Administrator:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# Then install DPG
pip install -r ./requirements.txt
2- For Linux/Mac Users:
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install DPG
pip install -r ./requirements.txt
3- Deactivating the Virtual Environment: When you're done working with DPG, you can deactivate the virtual environment:
deactivate
4- Graph rendering error (dot not found):
DPG plotting requires the Graphviz system executable (dot) in your PATH.
Installing the Python package graphviz is not sufficient on its own.
- macOS (Homebrew):
brew install graphviz
- Ubuntu/Debian:
sudo apt-get install graphviz
- Windows (winget):
winget install Graphviz.Graphviz
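To confirm that the system executable is visible before plotting, a quick check from Python can help. This is a minimal sketch using only the standard library; the `graphviz_dot_path` helper is a hypothetical name and is not part of DPG.

```python
import shutil


def graphviz_dot_path():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = graphviz_dot_path()
if dot_path is None:
    print("Graphviz 'dot' not found on PATH; install it with one of the commands above.")
else:
    print(f"Graphviz 'dot' found at {dot_path}")
```

Run it inside the same virtual environment and shell session you use for DPG, since PATH can differ between terminals.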
Documentation
For full documentation, visit https://dpg.readthedocs.io/.
To build and serve documentation locally, see docs/README.md.
Example usage (Python)
You can also try DPG directly inside a Jupyter Notebook. Here's a minimal working example using the high-level API:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from dpg import DPGExplainer
# Load dataset (last column assumed to be target)
df = pd.read_csv("datasets/custom.csv", index_col=0)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
# Train a simple Random Forest classifier
model = RandomForestClassifier(n_estimators=10, random_state=27)
model.fit(X, y)
# Build the DPG and extract global explanations
explainer = DPGExplainer(
    model=model,
    feature_names=X.columns,
    target_names=np.unique(y).astype(str).tolist(),
)
explanation = explainer.explain_global(X.values, communities=True)
# Render the graph to disk
explainer.plot("dpg_output", explanation, save_dir="datasets", export_pdf=True)
explainer.plot_communities("dpg_output", explanation, save_dir="datasets", export_pdf=True)
Legacy API (low-level)
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from dpg.core import DecisionPredicateGraph
from dpg.visualizer import plot_dpg
from metrics.nodes import NodeMetrics
from metrics.edges import EdgeMetrics
df = pd.read_csv("datasets/custom.csv", index_col=0)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
model = RandomForestClassifier(n_estimators=10, random_state=27)
model.fit(X, y)
feature_names = X.columns.tolist()
class_names = np.unique(y).astype(str).tolist()
dpg = DecisionPredicateGraph(
    model=model,
    feature_names=feature_names,
    target_names=class_names
)
dot = dpg.fit(X.values)
dpg_model, nodes_list = dpg.to_networkx(dot)
df_edges = EdgeMetrics.extract_edge_metrics(dpg_model, nodes_list)
df_nodes = NodeMetrics.extract_node_metrics(dpg_model, nodes_list)
plot_dpg(
    "dpg_output",
    dot,
    df_nodes,
    df_edges,
    save_dir="datasets",
    class_flag=True,
    export_pdf=True,
)
Output:
<p align="center"> <img src="https://github.com/Meta-Group/DPG/blob/main/dpg_image_examples/dpg_output_communities.png?raw=true" width="600" /> </p>
API overview (high-level)
The high-level API is designed to return structured outputs so downstream tools can use them directly.
- DPGExplainer.fit(X): builds the DPG structure
- DPGExplainer.explain_global(X=None, communities=False, community_threshold=0.2): returns a DPGExplanation
- DPGExplainer.plot(...): renders the standard DPG
- DPGExplainer.plot_communities(...): renders a community-colored DPG
DPGExplanation includes dot, graph, nodes, node_metrics, edge_metrics, class_boundaries, and optional communities.
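A small helper can make it easier to see which of these fields are populated on a given result. The sketch below assumes the fields listed above are exposed as plain attributes, which this README does not confirm; `describe_explanation` is a hypothetical helper, and the `stand_in` object only mimics a `DPGExplanation` so the snippet runs without DPG installed.

```python
from types import SimpleNamespace

# Field names as listed above; attribute-style access is an assumption.
EXPLANATION_FIELDS = (
    "dot", "graph", "nodes", "node_metrics",
    "edge_metrics", "class_boundaries", "communities",
)


def describe_explanation(explanation):
    """Map each listed field name to the type name of its value, or 'missing'."""
    summary = {}
    for name in EXPLANATION_FIELDS:
        value = getattr(explanation, name, None)
        summary[name] = "missing" if value is None else type(value).__name__
    return summary


# Stand-in for illustration only; a real object would come from explain_global().
stand_in = SimpleNamespace(dot="digraph {}", nodes=["F1 <= 5.0"], class_boundaries={})
summary = describe_explanation(stand_in)
for field, kind in summary.items():
    print(f"{field}: {kind}")
```

With a real explanation, `communities` would be expected to show as missing unless `communities=True` was passed to `explain_global`.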
CLI scripts
The library contains two different scripts to apply DPG:
- run_dpg_standard.py: with this script it is possible to test DPG on a standard classification dataset provided by sklearn such as iris, digits, wine, breast cancer, and diabetes.
- run_dpg_custom.py: with this script it is possible to apply DPG to your classification dataset, specifying the target class.
DPG implementation
The library also contains two other essential scripts:
- core.py contains all the functions used to calculate and create the DPG and the metrics.
- visualizer.py contains the functions used to manage the visualization of DPG.
Output
The DPG output, through run_dpg_standard.py or run_dpg_custom.py, produces several files:
- the visualization of DPG in a dedicated environment, which can be zoomed and saved;
- a .txt file containing the DPG metrics;
- a .csv file containing the information about all the nodes of the DPG and their associated metrics;
- a .txt file containing the Random Forest statistics (accuracy, confusion matrix, classification report).
Easy usage
Usage: python run_dpg_standard.py --dataset <dataset_name> --n_learners <integer_number> --pv <threshold_value> --t <integer_number> --model_name <str_model_name> --dir <save_dir_path> --plot --save_plot_dir <save_plot_dir_path> --attribute <attribute> --communities --clusters --threshold_clusters <float> --class_flag --seed <int>
Where:
dataset is the name of
