imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Install / Use
<img align="center" width=100% src="https://csinva.io/imodels/img/anim.gif"> </img>
Modern machine-learning models are increasingly complex, often making them difficult to interpret. This package provides a simple interface for fitting and using state-of-the-art interpretable models, all compatible with scikit-learn. These models can often replace black-box models (e.g. random forests) with simpler models (e.g. rule lists) while improving interpretability and computational efficiency, all without sacrificing predictive accuracy! Simply import a classifier or regressor and use the fit and predict methods, same as standard scikit-learn models.
```python
from sklearn.model_selection import train_test_split
from imodels import get_clean_dataset, HSTreeClassifierCV  # import any imodels model here

# prepare data (a sample clinical dataset)
X, y, feature_names = get_clean_dataset('csi_pecarn_pred')
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# fit the model
model = HSTreeClassifierCV(max_leaf_nodes=4)  # initialize a tree model with at most 4 leaf nodes
model.fit(X_train, y_train, feature_names=feature_names)  # fit model
preds = model.predict(X_test)  # discrete predictions: shape is (n_test,)
preds_proba = model.predict_proba(X_test)  # predicted probabilities: shape is (n_test, n_classes)
print(model)  # print the model
```
```
------------------------------
Decision Tree with Hierarchical Shrinkage
Prediction is made by looking at the value in the appropriate leaf of the tree
------------------------------
|--- FocalNeuroFindings2 <= 0.50
|   |--- HighriskDiving <= 0.50
|   |   |--- Torticollis2 <= 0.50
|   |   |   |--- value: [0.10]
|   |   |--- Torticollis2 >  0.50
|   |   |   |--- value: [0.30]
|   |--- HighriskDiving >  0.50
|   |   |--- value: [0.68]
|--- FocalNeuroFindings2 >  0.50
|   |--- value: [0.42]
```
Installation
Install with `pip install imodels` (see here for help).
Supported models
<p align="left"> <a href="https://csinva.io/imodels/">🗂️</a> Docs &nbsp; 📄 Research paper &nbsp; 🔗 Reference code implementation <br/> </p>

| Model | Reference | Description |
| :-------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Rulefit rule set | 🗂️, 📄, 🔗 | Fits a sparse linear model on rules extracted from decision trees |
| Skope rule set | 🗂️, 🔗 | Extracts rules from gradient-boosted trees, deduplicates them,<br/>then linearly combines them based on their OOB precision |
| Boosted rule set | 🗂️, 📄, 🔗 | Sequentially fits a set of rules with Adaboost |
| Slipper rule set | 🗂️, 📄 | Sequentially learns a set of rules with SLIPPER |
| Bayesian rule set | 🗂️, 📄, 🔗 | Finds concise rule set with Bayesian sampling (slow) |
| Bayesian rule list | 🗂️, 📄, 🔗 | Fits compact rule list distribution with Bayesian sampling (slow) |
| Greedy rule list | 🗂️, 🔗 | Uses CART to fit a list (only a single path), rather than a tree |
| OneR rule list | 🗂️, 📄 | Fits rule list restricted to only one feature |
| Optimal rule tree | 🗂️, 📄, 🔗 | Fits succinct tree using global optimization for sparsity (GOSDT) |
| Greedy rule tree | 🗂️, 📄, 🔗 | Greedily fits tree using CART |
| C4.5 rule tree | 🗂️, 📄, 🔗 | Greedily fits tree using C4.5 |
| TAO rule tree | 🗂️, 📄 | Fits tree using alternating optimization |
| Iterative random<br/>forest | 🗂️, 📄, 🔗 | Repeatedly fit random forest, giving features with<br/>high importance a higher chance of being selected |
| Sparse integer<br/>linear model | 🗂️, 📄 | Sparse linear model with integer coefficients |
| Tree GAM | 🗂️, 📄, 🔗 | Generalized additive model fit with short boosted trees |
| <b>Greedy tree<br/>sums (FIGS)</b> | 🗂️, 📄 | Sum of small trees with very few total rules (FIGS) |
| <b>Hierarchical<br/>shrinkage wrapper</b> | 🗂️, 📄 | Improve a decision tree, random forest, or<br/>gradient-boosting ensemble with ultra-fast, post-hoc regularization |
| <b>RF+ (MDI+)</b> | 🗂️, 📄 | Flexible random forest-based feature importance |
| Distillation<br/>wrapper | 🗂️ | Train a black-box model,<br/>then distill it into an interpretable model |
| AutoML wrapper | 🗂️ | Automatically fit and select an interpretable model |
| More models | ⌛ | (Coming soon!) Lightweight Rule Induction, MLRules, ... |
Demo notebooks
Demos are contained in the notebooks folder.
<details>
<summary><a href="https://github.com/csinva/imodels/blob/master/notebooks/imodels_demo.ipynb">Quickstart demo</a></summary>
Shows how to fit, predict, and visualize with different interpretable models
</details>
<details>
<summary><a href="https://auto.gluon.ai/dev/tutorials/tabular_prediction/tabular-interpretability.html">Autogluon demo</a></summary>
Fit/select an interpretable model automatically using Autogluon AutoML
</details>
<details>
<summary><a href="https://colab.research.google.com/drive/1WfqvSjegygT7p0gyqiWpRpiwz2ePtiao#scrollTo=bLnLknIuoWtQ"