
<a id="top"></a>

<div align="center"> <a href="https://github.com/andrewtavis/causeinfer"><img src="https://raw.githubusercontent.com/andrewtavis/causeinfer/main/.github/resources/logo/causeinfer_logo_transparent.png" width=612 height=164></a> </div>


Machine learning based causal inference/uplift in Python

causeinfer is a Python package for estimating average and conditional average treatment effects using machine learning. The goal is to compile causal inference models, both standard and advanced, and to demonstrate their usage and efficacy, with the overarching ambition of helping people learn causal inference techniques across business, medical, and socioeconomic fields. See the documentation for a full outline of the package, including the available models and datasets.


Installation

causeinfer is available for installation via uv (recommended) or pip.

For Users

```bash
# Using uv (recommended - fast, Rust-based installer):
uv pip install causeinfer

# Or using pip:
pip install causeinfer
```

For Development Build

```bash
git clone https://github.com/andrewtavis/causeinfer.git
cd causeinfer

# With uv (recommended):
uv sync --all-extras  # install all dependencies
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)

# Or with pip:
python -m venv .venv  # create virtual environment
source .venv/bin/activate  # activate venv (macOS/Linux)
# .venv\Scripts\activate  # activate venv (Windows)
pip install -e .
```

```python
import causeinfer
```

<sub><a href="#top">Back to top.</a></sub>

Application

Standard Algorithms

<a id="two-model-approach"></a>

<details><summary><strong>Two Model Approach</strong></summary> <p>

Separate models for treatment and control groups are trained and combined to derive average treatment effects (Hansotia, 2002).

```python
from causeinfer.standard_algorithms.two_model import TwoModel
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

tm_pred = TwoModel(
    treatment_model=RandomForestRegressor(**kwargs),
    control_model=RandomForestRegressor(**kwargs),
)
tm_pred.fit(X=X_train, y=y_train, w=w_train)

# An array of predictions given a treatment and control model
tm_preds = tm_pred.predict(X=X_test)

tm_proba = TwoModel(
    treatment_model=RandomForestClassifier(**kwargs),
    control_model=RandomForestClassifier(**kwargs),
)
tm_proba.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted treatment class probabilities given models
tm_probas = tm_proba.predict_proba(X=X_test)
```
</p> </details>
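At its core, the two model estimate of an individual treatment effect is simply the difference between the treatment and control models' predictions. A hypothetical, self-contained sketch of that idea, using ordinary least squares in place of the random forests above and synthetic data (not causeinfer's internal code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
w = rng.integers(0, 2, size=n)  # treatment indicator
tau = 1.5                       # true constant treatment effect
y = X @ np.array([2.0, -1.0]) + tau * w + rng.normal(scale=0.1, size=n)


def fit_ols(X, y):
    # Least-squares fit with an intercept column.
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


beta_t = fit_ols(X[w == 1], y[w == 1])  # treatment model
beta_c = fit_ols(X[w == 0], y[w == 0])  # control model

# Individual uplift estimates: treatment prediction minus control prediction.
uplift = predict_ols(beta_t, X) - predict_ols(beta_c, X)
```

With this synthetic data the mean of `uplift` recovers the true effect of 1.5 up to estimation noise.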

<a id="interaction-term-approach"></a>

<details><summary><strong>Interaction Term Approach</strong></summary> <p>

An interaction term between treatment and covariates is added to the data to allow for a basic single model application (Lo, 2002).

<div align="center"> <img src="https://raw.githubusercontent.com/andrewtavis/causeinfer/main/.github/resources/images/interaction_term_data.png" width="720" height="282"> </div>
```python
from causeinfer.standard_algorithms.interaction_term import InteractionTerm
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

it_pred = InteractionTerm(model=RandomForestRegressor(**kwargs))
it_pred.fit(X=X_train, y=y_train, w=w_train)

# An array of predictions given a treatment and control interaction term
it_preds = it_pred.predict(X=X_test)

it_proba = InteractionTerm(model=RandomForestClassifier(**kwargs))
it_proba.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted treatment class probabilities given interaction terms
it_probas = it_proba.predict_proba(X=X_test)
```
</p> </details>
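The augmentation itself is straightforward: the treatment indicator and its element-wise products with the covariates are appended as new columns. A minimal NumPy sketch with hypothetical data (an illustration of the idea, not causeinfer's implementation):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # covariates for two units
w = np.array([1, 0])                    # treatment indicator

# Build [X | w | X * w], as in Lo (2002): the treated unit keeps its
# covariates in the interaction columns, the control unit's are zero.
X_aug = np.column_stack([X, w, X * w[:, None]])
```

A single model fit on `X_aug` can then score units with the indicator toggled on and off to estimate uplift.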

<a id="class-transformation-approaches"></a>

<details><summary><strong>Class Transformation Approaches</strong></summary> <p>

Units are categorized into two or four classes to derive treatment effects from favorable class attributes (Lai, 2006; Kane et al., 2014; Shaar et al., 2016).

<div align="center"> <img src="https://raw.githubusercontent.com/andrewtavis/causeinfer/main/.github/resources/images/new_known_unknown_classes.png" width="720" height="405"> </div>
```python
# Binary Class Transformation
from causeinfer.standard_algorithms.binary_transformation import BinaryTransformation
from sklearn.ensemble import RandomForestClassifier

bt = BinaryTransformation(model=RandomForestClassifier(**kwargs), regularize=True)
bt.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted probabilities (P(Favorable Class), P(Unfavorable Class))
bt_probas = bt.predict_proba(X=X_test)
```

```python
# Quaternary Class Transformation
from causeinfer.standard_algorithms.quaternary_transformation import (
    QuaternaryTransformation,
)
from sklearn.ensemble import RandomForestClassifier

qt = QuaternaryTransformation(model=RandomForestClassifier(**kwargs), regularize=True)
qt.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted probabilities (P(Favorable Class), P(Unfavorable Class))
qt_probas = qt.predict_proba(X=X_test)
```
</p> </details>
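The heart of the binary transformation is a relabeling: treated responders and untreated non-responders form the favorable class. A hypothetical sketch of the transformed target (an illustration of the Lai, 2006 relabeling, not causeinfer's internal code):

```python
import numpy as np

y = np.array([1, 0, 1, 0])  # observed response
w = np.array([1, 1, 0, 0])  # treatment indicator

# Favorable class: treated responders (y=1, w=1) and untreated
# non-responders (y=0, w=0); all other units are unfavorable.
z = y * w + (1 - y) * (1 - w)
```

A single classifier is then fit on `z`, and its favorable-class probabilities are mapped back to uplift estimates.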

<a id="reflective-and-pessimistic-uplift"></a>

<details><summary><strong>Reflective and Pessimistic Uplift</strong></summary> <p>

Weighted versions of the binary class transformation approach that dampen its inherently noisy results (Shaar et al., 2016).

```python
# Reflective Uplift Transformation
from causeinfer.standard_algorithms.reflective import ReflectiveUplift
from sklearn.ensemble import RandomForestClassifier

ru = ReflectiveUplift(model=RandomForestClassifier(**kwargs))
ru.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted probabilities (P(Favorable Class), P(Unfavorable Class))
ru_probas = ru.predict_proba(X=X_test)
```

```python
# Pessimistic Uplift Transformation
from causeinfer.standard_algorithms.pessimistic import PessimisticUplift
from sklearn.ensemble import RandomForestClassifier

pu = PessimisticUplift(model=RandomForestClassifier(**kwargs))
pu.fit(X=X_train, y=y_train, w=w_train)

# An array of predicted probabilities (P(Favorable Class), P(Unfavorable Class))
pu_probas = pu.predict_proba(X=X_test)
```
</p> </details>

<sub><a href="#top">Back to top.</a></sub>

Advanced Algorithms

<details><summary><strong>Models to Consider</strong></summary> <p>

Under consideration for inclusion in causeinfer:

- Generalized Random Forest via the R/C++ grf - Athey, Tibshirani, and Wager (2019)
- The X-Learner - Künzel et al. (2019)
- The R-Learner - Nie and Wager (2017)
- Double Machine Learning - Chernozhukov et al. (2018)
- Information Theory Trees/Forests - Soltys et al. (2015)

</p> </details>

<sub><a href="#top">Back to top.</a></sub>

Evaluation Methods

<a id="visualization"></a>

<details><summary><strong>Visualization Metrics and Coefficients</strong></summary> <p>

Comparisons across stratified, ordered treatment response groups are used to derive model efficiency.

```python
import pandas as pd

from causeinfer.evaluation import plot_cum_gain, plot_qini

visual_eval_dict = {
    "y_test": y_test,
    "w_test": w_test,
    "two_model": tm_effects,
    "interaction_term": it_effects,
    "binary_trans": bt_effects,
    "quaternary_trans": qt_effects,
}

df_visual_eval = pd.DataFrame(visual_eval_dict)
model_pred_cols = [
    col for col in visual_eval_dict.keys() if col not in ["y_test", "w_test"]
]
fig
```
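Behind cumulative gain and Qini style plots, units are ranked by predicted effect and cumulative treated versus control response rates are compared at each cutoff. A hypothetical pure-NumPy sketch of such a curve (an illustration of the idea, not causeinfer's implementation):

```python
import numpy as np


def cumulative_gain(y, w, uplift):
    # Sort units by predicted uplift, descending; at each cutoff k,
    # compute (treated response rate - control response rate) * k.
    order = np.argsort(-uplift)
    y, w = y[order], w[order]
    n = len(y)
    gains = np.zeros(n)
    for k in range(1, n + 1):
        yt, wt = y[:k], w[:k]
        n_t = wt.sum()
        n_c = k - n_t
        rate_t = yt[wt == 1].sum() / n_t if n_t else 0.0
        rate_c = yt[wt == 0].sum() / n_c if n_c else 0.0
        gains[k - 1] = (rate_t - rate_c) * k
    return gains


rng = np.random.default_rng(1)
y = rng.integers(0, 2, 50)       # synthetic responses
w = rng.integers(0, 2, 50)       # synthetic treatment assignment
uplift = rng.normal(size=50)     # synthetic model scores
g = cumulative_gain(y, w, uplift)
```

A better model concentrates positive incremental gain at low cutoffs, which is what the plotted curves make visible.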