Facet
Human-explainable AI.
.. image:: sphinx/source/_images/Gamma_Facet_Logo_RGB_LB.svg
FACET is an open source library for human-explainable AI. It combines sophisticated model inspection and model-based simulation to enable better explanations of your supervised machine learning models.
FACET is composed of the following key components:
+-----------------+-----------------------------------------------------------------------------+
| |spacer|        | **Model Inspection**                                                        |
|                 |                                                                             |
| |inspect|       | FACET introduces a new algorithm to quantify dependencies and               |
|                 | interactions between features in ML models.                                 |
|                 | This new tool for human-explainable AI adds a new, global                   |
|                 | perspective to the observation-level explanations provided by the           |
|                 | popular `SHAP <https://shap.readthedocs.io/en/stable/>`__ approach.         |
|                 | To learn more about FACET’s model inspection capabilities, see the          |
|                 | quickstart example below.                                                   |
+-----------------+-----------------------------------------------------------------------------+
| |spacer|        | **Model Simulation**                                                        |
|                 |                                                                             |
| |sim|           | FACET’s model simulation algorithms use ML models for                       |
|                 | virtual experiments to help identify scenarios that optimise                |
|                 | predicted outcomes.                                                         |
|                 | To quantify the uncertainty in simulations, FACET utilises a range          |
|                 | of bootstrapping algorithms including stationary and stratified             |
|                 | bootstraps.                                                                 |
|                 | For an example of FACET’s bootstrap simulations, see the                    |
|                 | quickstart example below.                                                   |
+-----------------+-----------------------------------------------------------------------------+
| |spacer|        | **Enhanced Machine Learning Workflow**                                      |
|                 |                                                                             |
| |pipe|          | FACET offers an efficient and transparent machine learning                  |
|                 | workflow, enhancing                                                         |
|                 | `scikit-learn <https://scikit-learn.org/stable/index.html>`__'s             |
|                 | tried and tested pipelining paradigm with new capabilities for model        |
|                 | selection, inspection, and simulation.                                      |
|                 | FACET also introduces                                                       |
|                 | `sklearndf <https://github.com/BCG-X-Official/sklearndf>`__                 |
|                 | [`documentation <https://bcg-x-official.github.io/sklearndf/index.html>`__],|
|                 | an augmented version of scikit-learn with enhanced support for              |
|                 | pandas data frames that ensures end-to-end traceability of features.        |
+-----------------+-----------------------------------------------------------------------------+
.. Begin-Badges
|pypi| |conda| |azure_build| |azure_code_cov| |python_versions| |code_style| |made_with_sphinx_doc| |License_badge|
.. End-Badges
Installation
------------

FACET supports both PyPI and Anaconda. We recommend installing FACET into a dedicated environment.
Anaconda
~~~~~~~~

.. code-block:: sh

    conda create -n facet
    conda activate facet
    conda install -c bcg_gamma -c conda-forge gamma-facet

Pip
~~~
macOS and Linux:
^^^^^^^^^^^^^^^^
.. code-block:: sh

    python -m venv facet
    source facet/bin/activate
    pip install gamma-facet

Windows:
^^^^^^^^
.. code-block:: dosbatch

    python -m venv facet
    facet\Scripts\activate.bat
    pip install gamma-facet

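
On either platform, one way to confirm that the installation landed in the active environment is to query the package metadata from Python. The sketch below uses only the standard library; the helper name ``installed_version`` is hypothetical, and it assumes the package is registered under the distribution name ``gamma-facet`` used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "gamma-facet" is the distribution name used in the install commands above
facet_version = installed_version("gamma-facet")
if facet_version is None:
    print("gamma-facet is not installed in this environment")
else:
    print(f"gamma-facet {facet_version} is installed")
```

If the check reports that the package is missing, the most likely cause is that the dedicated environment has not been activated in the current shell.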
Quickstart
----------
The following quickstart guide provides a minimal example workflow to get you
up and running with FACET.
For additional tutorials and the API reference,
see the `FACET documentation <https://bcg-x-official.github.io/facet/docs-version/2-0>`__.
Changes and additions to new versions are summarized in the
`release notes <https://bcg-x-official.github.io/facet/docs-version/2-0/release_notes.html>`__.
Enhanced Machine Learning Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To demonstrate the model inspection capability of FACET, we first create a
pipeline to fit a learner. In this simple example we will use the
`diabetes dataset <https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data>`__
which contains age, sex, BMI and blood pressure along with 6 blood serum
measurements as features. This dataset was used in this
`publication <https://statweb.stanford.edu/~tibs/ftp/lars.pdf>`__.
A transformed version of this dataset is also available on scikit-learn
`here <https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset>`__.

In this quickstart we will train a Random Forest regressor to predict disease
progression after one year, using 5-fold cross-validation repeated 10 times.
With sklearndf we can build a workflow that is compatible with pandas
DataFrames. On top of this, FACET keeps track of our feature matrix and
target vector in a sample object (``Sample``), and makes it easy to compare
hyperparameter configurations, and even multiple learners, with the
``LearnerSelector``.

.. code-block:: Python

    # standard imports
    import pandas as pd
    from sklearn.model_selection import RepeatedKFold, GridSearchCV

    # some helpful imports from sklearndf
    from sklearndf.pipeline import RegressorPipelineDF
    from sklearndf.regression import RandomForestRegressorDF

    # relevant FACET imports
    from facet.data import Sample
    from facet.selection import LearnerSelector, ParameterSpace

    # declaring url with data
    data_url = 'https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data'

    # importing data from url
    diabetes_df = pd.read_csv(data_url, delimiter='\t').rename(
        # renaming columns for better readability
        columns={
            'S1': 'TC',   # total serum cholesterol
            'S2': 'LDL',  # low-density lipoproteins
            'S3': 'HDL',  # high-density lipoproteins
            'S4': 'TCH',  # total cholesterol / HDL
            'S5': 'LTG',  # possibly log of serum triglycerides level
            'S6': 'GLU',  # blood sugar level
            'Y': 'Disease_progression'  # disease progression one year after baseline
        }
    )

    # create FACET sample object
    diabetes_sample = Sample(observations=diabetes_df, target_name="Disease_progression")

    # create a (trivial) pipeline for a random forest regressor
    rnd_forest_reg = RegressorPipelineDF(
        regressor=RandomForestRegressorDF(n_estimators=200, random_state=42)
    )

    # define parameter space for models which are "competing" against each other
    rnd_forest_ps = ParameterSpace(rnd_forest_reg)
    rnd_forest_ps.regressor.min_samples_leaf = [8, 11, 15]
    rnd_forest_ps.regressor.max_depth = [4, 5, 6]

    # create repeated k-fold CV iterator
    rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)

    # rank your candidate models by performance
    selector = LearnerSelector(
        searcher_type=GridSearchCV,
        parameter_space=rnd_forest_ps,
        cv=rkf_cv,
        n_jobs=-3,
        scoring="r2"
    ).fit(diabetes_sample)

    # get summary report
    selector.summary_report()

.. image:: sphinx/source/_images/ranker_summary.png
    :width: 600

This minimal workflow shows that, among the values considered, a minimum of 11 samples per leaf combined with a maximum tree depth of 5 performed best. The approach extends easily to additional hyperparameters and to multiple learners.
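
When extending the grid, it helps to keep an eye on how many model fits the search implies. The following is a rough sketch in plain Python that does not use FACET's API: it enumerates the candidates of the quickstart's parameter space and multiplies by the number of cross-validation splits. The variable names are illustrative, and any additional refit on the full sample that the searcher may perform is not counted.

```python
from itertools import product

# values copied from the quickstart's parameter space
param_grid = {
    "min_samples_leaf": [8, 11, 15],
    "max_depth": [4, 5, 6],
}

# values copied from the quickstart's RepeatedKFold iterator
n_splits, n_repeats = 5, 10

# every combination of hyperparameter values is one candidate model
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

# each candidate is fitted once per cross-validation split
n_cv_splits = n_splits * n_repeats
n_cv_fits = len(candidates) * n_cv_splits

print(
    f"{len(candidates)} candidates x {n_cv_splits} splits "
    f"= {n_cv_fits} cross-validation fits"
)
```

Because the number of candidates grows multiplicatively with every added hyperparameter, a third hyperparameter with three values would triple the number of fits.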
Model Inspection
~~~~~~~~~~~~~~~~

FACET implements several model inspection methods for
`scikit-learn <https://scikit-learn.org/stable/index.html>`__ estimators.
FACET enhances model inspection by providing global metrics that complement
the local perspective of SHAP (see
`[arXiv:2107.12436] <https://arxiv.org/abs/2107.12436>`__ for a formal description).
The key global metrics for each pair of features in a model are:
- **Synergy**

  The degree to which the model combines information from one feature with
  another to predict the target. For example, let's assume we are predicting
  cardiovascular health using age and gender and the fitted model includes
  a complex interaction between them. This means these two features are
  synergistic for predicting cardiovascular health. Further, both features
  are important to the model and removing either one would significantly
  impact performance. Let's assume age brings more information to the joint
  contribution than gender. This asymmetric contribution means the synergy for
  (age, gender) is less than the synergy for (gender, age). To think about it another
  way, imagine the prediction is a coordinate you are trying to reach.
  From your starting point, age gets you much closer to this point than
  gender; however, you need both to get there. Synergy reflects the fact
  that gender gets more help from age (higher synergy from the perspective
  of gender) than age does from gender (lower synergy from the perspective of
  age) to reach the prediction. *This leads to an important point: synergy
  is a naturally asymmetric property of the global information two interacting
  features contribute to the model predictions.* Synergy is expressed as a
  percentage ranging from 0% (full autonomy) to 100% (full synergy).

- **Red
