Decision Predicate Graph (DPG)

DPG is a model-agnostic tool to provide a global interpretation of tree-based ensemble models, addressing transparency and explainability challenges.

DPG is a graph structure that captures the tree-based ensemble model and learned dataset details, preserving the relations among features, logical decisions, and predictions towards emphasising insightful points. DPG enables graph-based evaluations and the identification of model decisions towards facilitating comparisons between features and their associated values while offering insights into the entire model. DPG provides descriptive metrics that enhance the understanding of the decisions inherent in the model, offering valuable insights.

The structure

The concept behind DPG is to convert a generic tree-based ensemble model for classification into a graph, where:

Nodes represent predicates, i.e., the feature-value associations present in each node of every tree;
Edges denote the frequency with which these predicates are satisfied during the model training phase by the samples of the dataset.

Metrics

The graph-based nature of DPG provides significant enhancements in the direction of a complete mapping of the ensemble structure. | Property | Definition | Utility | |--------------|------------|---------| | Constraints | The intervals of values for each feature obtained from all predicates connected by a path that culminates in a given class. | Calculate the classification boundary values of each feature associated with each class. | | Betweenness centrality | Quantifies the fraction of all the shortest paths between every pair of nodes of the graph passing through the considered node. | Identify potential bottleneck nodes that correspond to crucial decisions. | | Local reaching centrality | Quantifies the proportion of other nodes reachable from the local node through its outgoing edges. | Assess the importance of nodes similarly to feature importance, but enrich the information by encompassing the values associated with features across all decisions. | | Community | A subset of nodes of the DPG which is characterised by dense interconnections between its elements and sparse connections with the other nodes of the DPG that do not belong to the community. | Understanding the characteristics of nodes to be assigned to a particular community class, identifying predominant predicates, and those that play a marginal role in the classification process. |

|Constraints | Betweenness centrality | Local reaching centrality | Community| |------------|------------|--------------|--------------------| | | | | |Constraints(Class 1) = val3 < F1 ≤ val1, F2 ≤ val2 | BC(F2 ≤ val2) = 4/24 | LRC(F1 ≤ val1) = 6 / 7 | Community(Class 1) = F1 ≤ val1, F2 ≤ val2 |

Installation

To install DPG locally, first clone the repository:

git clone https://github.com/Meta-Group/DPG.git
cd DPG

Then, install the DPG library in development mode using pip:

pip install -e .

Alternatively, if using pip directly:

pip install git+https://github.com/Meta-Group/DPG.git

Troubleshooting: If you encounter dependency conflicts, we recommend using a virtual environment:

1- For Windows Users:

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
.venv\Scripts\activate

# If you get execution policy errors, run this first in PowerShell as Administrator:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Then install DPG
pip install -r ./requirements.txt

2- For Linux/Mac Users:

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install DPG
pip install -r ./requirements.txt

3- Deactivating the Virtual Environment: When you're done working with DPG, you can deactivate the virtual environment:

deactivate

4- Graph rendering error (dot not found): DPG plotting requires the Graphviz system executable (dot) in your PATH.
Installing the Python package graphviz is not sufficient on its own.

macOS (Homebrew):
```
brew install graphviz
```
Ubuntu/Debian:
```
sudo apt-get install graphviz
```
Windows (winget):
```
winget install Graphviz.Graphviz
```

Documentation

For full documentation, visit https://dpg.readthedocs.io/.

To build and serve documentation locally, see docs/README.md.

Example usage (Python)

You can also try DPG directly inside a Jupyter Notebook. Here's a minimal working example using the high-level API:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from dpg import DPGExplainer

# Load dataset (last column assumed to be target)
df = pd.read_csv("datasets/custom.csv", index_col=0)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Train a simple Random Forest classifier
model = RandomForestClassifier(n_estimators=10, random_state=27)
model.fit(X, y)

# Build the DPG and extract global explanations
explainer = DPGExplainer(
    model=model,
    feature_names=X.columns,
    target_names=np.unique(y).astype(str).tolist(),
)
explanation = explainer.explain_global(X.values, communities=True)

# Render the graph to disk
explainer.plot("dpg_output", explanation, save_dir="datasets", export_pdf=True)
explainer.plot_communities("dpg_output", explanation, save_dir="datasets", export_pdf=True)

Legacy API (low-level)

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from dpg.core import DecisionPredicateGraph
from dpg.visualizer import plot_dpg
from metrics.nodes import NodeMetrics
from metrics.edges import EdgeMetrics

df = pd.read_csv("datasets/custom.csv", index_col=0)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

model = RandomForestClassifier(n_estimators=10, random_state=27)
model.fit(X, y)

feature_names = X.columns.tolist()
class_names = np.unique(y).astype(str).tolist()
dpg = DecisionPredicateGraph(
    model=model,
    feature_names=feature_names,
    target_names=class_names
)
dot = dpg.fit(X.values)
dpg_model, nodes_list = dpg.to_networkx(dot)

df_edges = EdgeMetrics.extract_edge_metrics(dpg_model, nodes_list)
df_nodes = NodeMetrics.extract_node_metrics(dpg_model, nodes_list)

plot_dpg(
    "dpg_output",
    dot,
    df_nodes,
    df_edges,
    save_dir="datasets",
    class_flag=True,
    export_pdf=True,
)

Output:

API overview (high-level)

The high-level API is designed to return structured outputs so downstream tools can use them directly.

DPGExplainer.fit(X): builds the DPG structure
DPGExplainer.explain_global(X=None, communities=False, community_threshold=0.2): returns a DPGExplanation
DPGExplainer.plot(...): renders the standard DPG
DPGExplainer.plot_communities(...): renders a community-colored DPG

DPGExplanation includes dot, graph, nodes, node_metrics, edge_metrics, class_boundaries, and optional communities.

CLI scripts

The library contains two different scripts to apply DPG:

run_dpg_standard.py: with this script it is possible to test DPG on a standard classification dataset provided by sklearn such as iris, digits, wine, breast cancer, and diabetes.
run_dpg_custom.py: with this script it is possible to apply DPG to your classification dataset, specifying the target class.

DPG implementation

The library also contains two other essential scripts:

core.py contains all the functions used to calculate and create the DPG and the metrics.
visualizer.py contains the functions used to manage the visualization of DPG.

Output

The DPG output, through run_dpg_standard.py or run_dpg_custom.py, produces several files:

the visualization of DPG in a dedicated environment, which can be zoomed and saved;
a .txt file containing the DPG metrics;
a .csv file containing the information about all the nodes of the DPG and their associated metrics;
a .txt file containing the Random Forest statistics (accuracy, confusion matrix, classification report)

Easy usage

Usage: python run_dpg_standard.py --dataset <dataset_name> --n_learners <integer_number> --pv <threshold_value> --t <integer_number> --model_name <str_model_name> --dir <save_dir_path> --plot --save_plot_dir <save_plot_dir_path> --attribute <attribute> --communities --clusters --threshold_clusters <float> --class_flag --seed <int> Where:

dataset is the name of

DPG

Install / Use

README