Explainerdashboard

Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.

explainerdashboard

by: Oege Dijk

This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model. The dashboard provides interactive plots on model performance, feature importances, feature contributions to individual predictions, "what if" analysis, partial dependence plots, SHAP (interaction) values, visualization of individual decision trees, etc.

You can also interactively explore components of the dashboard in a notebook/colab environment (or just launch a dashboard straight from there). Or design a dashboard with your own custom layout and explanations (thanks to the modular design of the library). And you can combine multiple dashboards into a single ExplainerHub.

Dashboards can be exported to static html directly from a running dashboard, or programmatically as an artifact as part of an automated CI/CD deployment process.

Live examples are deployed on Fly.io and as a Hugging Face Space. Detailed documentation is available at explainerdashboard.readthedocs.io. There is also an example notebook on how to launch a dashboard for different models, and an example notebook on how to interact with the explainer object.

Works with scikit-learn, xgboost, catboost, lightgbm, skorch (an sklearn wrapper for tabular PyTorch models), and others.

Installation

You can install the package through pip:

pip install explainerdashboard

or conda-forge:

conda install -c conda-forge explainerdashboard

SageMaker Studio

SageMaker Studio runs notebooks and terminals in separate apps, so a common workflow is to export a dashboard config to disk and run it from the JupyterServer terminal. When running inside Studio, explainerdashboard can auto-detect SageMaker and apply the correct proxy prefixes, or you can set them explicitly.

Notebook example (export dashboard to disk):

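from explainerdashboard import ExplainerDashboard

# `explainer` is an explainer object you have already built in the notebook
db = ExplainerDashboard(
    explainer,
    mode="dash",
    port=8051,
    sagemaker=True,
)
# Write the dashboard config and a dump of the explainer to disk
db.to_yaml("dashboard.yaml", explainerfile="dashboard.joblib", dump_explainer=True)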

Terminal example (run from the JupyterServer app):

explainerdashboard run dashboard.yaml --sagemaker --port 8051 --no-browser

Access the dashboard via the Studio proxy URL:

<STUDIO_URL>/jupyter/default/proxy/8051/

If your Studio proxy path differs, you can override the prefixes:

explainerdashboard run dashboard.yaml \
  --routes-pathname-prefix="/" \
  --requests-pathname-prefix="/jupyter/default/proxy/8051/"

Auto-detection uses the presence of /opt/ml/metadata/resource-metadata.json.
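
The detection rule and the proxy prefix can be pictured with a short sketch. This is not the library's own implementation: the helper names `running_in_sagemaker` and `studio_proxy_prefix` are hypothetical, and the prefix layout simply mirrors the proxy URL shown above.

```python
import os

# Path named in the text above; its presence is taken to mean "inside Studio".
SAGEMAKER_METADATA = "/opt/ml/metadata/resource-metadata.json"


def running_in_sagemaker(metadata_path: str = SAGEMAKER_METADATA) -> bool:
    """Return True if the SageMaker resource metadata file exists."""
    return os.path.isfile(metadata_path)


def studio_proxy_prefix(port: int, base: str = "/jupyter/default/proxy") -> str:
    """Build the requests pathname prefix for a port (hypothetical helper)."""
    return f"{base.rstrip('/')}/{port}/"


# Example: the prefix to pass as --requests-pathname-prefix for port 8051.
prefix = studio_proxy_prefix(8051)
```

If your Studio deployment uses a different proxy path, only the `base` argument in this sketch would change.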

Demonstration:

explainerdashboard.gif

(for live demonstration see Fly.io or Hugging Face Space)

Background

In many organizations, especially in government but, with the GDPR, increasingly in the private sector as well, it is becoming more and more important to be able to explain the inner workings of your machine learning algorithms. Customers have, to some extent, a right to an explanation of why they received a certain prediction, and more and more internal and external regulators require one. With recent innovations in explainable AI (e.g. SHAP values), the old black box trope is no longer valid, but it can still take quite a bit of data wrangling and plot manipulation to get the explanations out of a model. This library aims to make this easy.

The goals are manifold:

  • Make it easy for data scientists to quickly inspect the workings and performance of their model in a few lines of code
  • Make it possible for stakeholders who are not data scientists, such as managers, directors, and internal and external watchdogs, to interactively inspect the inner workings of the model without having to depend on a data scientist to generate every plot and table
  • Make it easy to build an application that explains individual predictions of your model for customers who ask for an explanation
  • Explain the inner workings of the model to the people working with it (human-in-the-loop) so that they gain an understanding of what the model does and doesn't do. This is important so that they can develop an intuition for when the model is likely missing information and may have to be overruled.

The library includes:

  • SHAP values (i.e. what is the contribution of each feature to each individual prediction?)
  • Permutation importances (how much does the model metric deteriorate when you shuffle a feature?)
  • Partial dependence plots (how does the model prediction change when you vary a single feature?)
  • SHAP interaction values (decompose the SHAP value into a direct effect and interaction effects)
  • For Random Forest, XGBoost, and LightGBM models: visualisation of individual decision trees
  • For classifiers: precision plots, confusion matrix, ROC AUC plot, PR AUC plot, etc.
  • For regression models: goodness-of-fit plots, residual plots, etc.
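
To make two of the quantities in the list above concrete, here is a minimal pure-Python sketch of permutation importance and partial dependence. It is an illustration of the general techniques under simplifying assumptions (a toy model, mean squared error as the metric, a single shuffle), not the code explainerdashboard runs; all function names and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def mean_squared_error(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[float],
                           col: int, seed: int = 0) -> float:
    """Increase in MSE after shuffling column `col` across rows."""
    baseline = mean_squared_error(model, X, y)
    shuffled = [row[col] for row in X]
    random.Random(seed).shuffle(shuffled)
    X_perm = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, shuffled)]
    return mean_squared_error(model, X_perm, y) - baseline


def partial_dependence(model: Model, X: Sequence[Row], col: int,
                       grid: Sequence[float]) -> List[float]:
    """Average prediction when column `col` is forced to each grid value."""
    result = []
    for value in grid:
        X_fixed = [row[:col] + [value] + row[col + 1:] for row in X]
        result.append(sum(model(row) for row in X_fixed) / len(X_fixed))
    return result


# Toy data: the target depends only on feature 0; feature 1 is noise.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def toy_model(row: Row) -> float:
    """Stand-in for a fitted model; it ignores feature 1."""
    return 2.0 * row[0]


importances = [permutation_importance(toy_model, X, y, col) for col in range(2)]
pdp = partial_dependence(toy_model, X, col=0, grid=[0.0, 1.0, 2.0])
```

Shuffling the feature the model relies on makes the error grow, while shuffling the ignored feature leaves it unchanged. In practice the shuffle is usually repeated several times and averaged to reduce noise.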

The library is designed to be modular, so it should be easy to design your own interactive dashboards with plotly dash. Most of the work of calculating and formatting data and rendering plots and tables is handled by explainerdashboard, so you can focus on the layout and project-specific textual explanations (i.e. design the dashboard so that it is interpretable for business users in your organization, not just data scientists).

Alternatively, there is a built-in standard dashboard with pre-built tabs (that you can switch off individually).

Examples of use

Fitting a model, building the explainer object, building the dashboard, and then running it can be as simple as:

ExplainerDashboard(ClassifierExplainer(RandomForestClassifier().fit(X_train, y_train), X_test, y_test)).run()

Below is a multi-line example that adds a few extra parameters. You can group onehot encoded categorical variables together using the cats parameter. You can either pass a dict specifying a list of onehot columns per categorical feature, or, if you encode using e.g. pd.get_dummies(df.Name, prefix=['Name']) (resulting in column names 'Name_Adam', 'Name_Bob'), you can simply pass the prefix 'Name':

from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive, titanic_names

feature_descriptions = {
    "Sex": "Gender of passenger",
    "Gender": "Gender of passenger",
    "Deck": "The deck the passenger had their cabin on",
    "PassengerClass": "The class of the ticket: 1st, 2nd or 3rd class",
    "Fare": "The amount of money people paid",
    "Embarked": "the port where the passenger boarded the Titanic. Either Southampton, Cherbourg or Queenstown",
    "Age": "Age of the passenger",
    "No_of_siblings_plus_spouses_on_board": "The sum of the number of siblings plus the number of spouses on board",
    "No_of_parents_plus_children_on_board" : "The sum of the number of parents plus the number of children on board",
}

X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test,
                                cats=['Deck', 'Embarked',
                                    {'Gender': ['Sex_male', 'Sex_female', 'Sex_nan']}],
                                cats_notencoded={'Embarked': 'Stowaway'}, # defaults to 'NOT_ENCODED'
                                descriptions=feature_descriptions, # adds a table and hover labels to dashboard
                                labels=['Not survived', 'Survived'], # defaults to ['0', '1', etc]
                                idxs = test_names, # defaults to X.index
                                index_name = "Passenger", # defaults to X.index.name
                                target = "Survival", # defaults to y.name
                                )

db = ExplainerDashboard(explainer,
                        title="Titanic Explainer", # defaults to "Model Explainer"
                        shap_interaction=False, # you can switch off tabs with bools
                        )
db.run(port=8050)
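
The two ways of specifying cats (a prefix string or an explicit dict) can be pictured with the sketch below. It is only an illustration of the idea: `group_onehot_columns` is a hypothetical helper, and the matching rule (prefix followed by an underscore) is an assumption based on the 'Name_Adam' example above, so the library's actual rules may differ.

```python
from typing import Dict, List, Sequence, Union

CatSpec = Union[str, Dict[str, List[str]]]


def group_onehot_columns(columns: Sequence[str],
                         cats: Sequence[CatSpec]) -> Dict[str, List[str]]:
    """Map each categorical feature to its onehot columns.

    A string entry is treated as a prefix ("Name" matches "Name_Adam");
    a dict entry lists the onehot columns explicitly.
    """
    groups: Dict[str, List[str]] = {}
    for spec in cats:
        if isinstance(spec, dict):
            for feature, cols in spec.items():
                # Keep only the listed columns that are present in the data.
                groups[feature] = [c for c in cols if c in columns]
        else:
            # Assumed matching rule: prefix followed by an underscore.
            groups[spec] = [c for c in columns if c.startswith(spec + "_")]
    return groups


columns = ["Age", "Name_Adam", "Name_Bob", "Sex_male", "Sex_female"]
groups = group_onehot_columns(
    columns, ["Name", {"Gender": ["Sex_male", "Sex_female"]}]
)
```

The dict form is useful when the displayed feature name ("Gender") differs from the column prefix ("Sex"), as in the Titanic example.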

If you are passing an sklearn/imblearn Pipeline, you can also clean up transformed feature names and let the explainer infer onehot groups automatically:

explainer = ClassifierExplainer(
    pipeline_model, X_test, y_test,
    strip_pipeline_prefix=True,      # e.g. "num__Age" -> "Age"
    feature_name_fn=None,            # optional custom rename function
    auto_detect_pipeline_cats=True,  # infer cats from transformed pipeline output
)
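
As a rough picture of what prefix stripping and a custom rename function do to transformed feature names, consider the sketch below. `clean_feature_names` is a hypothetical helper, not part of explainerdashboard, and the order of operations (strip first, then rename) is an assumption. The "name__feature" pattern is the one scikit-learn's ColumnTransformer produces by default.

```python
from typing import Callable, List, Optional, Sequence


def clean_feature_names(names: Sequence[str],
                        strip_prefix: bool = True,
                        rename_fn: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return display names for transformed pipeline columns."""
    cleaned = []
    for name in names:
        if strip_prefix and "__" in name:
            # Drop everything up to and including the first "__".
            name = name.split("__", 1)[1]
        if rename_fn is not None:
            # Optional user-supplied rename applied after stripping.
            name = rename_fn(name)
        cleaned.append(name)
    return cleaned


names = ["num__Age", "num__Fare", "cat__Deck_A", "Embarked"]
cleaned = clean_feature_names(names)
lowered = clean_feature_names(names, rename_fn=str.lower)
```

Names without a "__" separator pass through unchanged, so the helper is safe to apply to a mix of transformed and untransformed columns.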

For a regression model you can also pass the units of the target variable (e.g. dollars):

X_train, y_train, X_test, y_test = titanic_fare()
model = RandomForestR
