Causallift

CausalLift: Python package for causality-based Uplift Modeling in real-world business


CausalLift: Python package for Uplift Modeling in real-world business; applicable for both A/B testing and observational data

Introduction

Scenario 1: Marketing campaign/promotion targeting

Suppose you are responsible for a marketing campaign/promotion (showing an advertisement, offering a discount, making a phone call, etc.) aimed at some customers to increase revenue or prevent churn. Which one would you choose?

Strategy A: choose customers who will buy the product (with or without being contacted)
Strategy B: choose customers who will buy the product if contacted, but will not if not contacted

Strategy B is known as Uplift Modeling.

Scenario 2: Recommendation systems in E-commerce sites

Suppose you are responsible for the recommendation system at an e-commerce company. Which one would you choose?

Strategy A: recommend a product the user will buy (with or without recommendation)
Strategy B: recommend a product the user will buy if recommended, but will not if not recommended

Strategy B is known as Uplift Modeling.

Scenario 3: US presidential campaign

Suppose you are trying to help a candidate become the next US president. Which one would you choose?

Strategy A: contact voters who will vote for the candidate (with or without being contacted)
Strategy B: contact voters who will vote for the candidate if contacted, but will not if not contacted

Strategy B is known as Uplift Modeling, and was used by Barack Obama's campaign in 2012.

Scenario 4: Avoid death

Suppose the God of Machine Learning offers you one of the following predictions. Which one would you choose?

Option A: prediction that you will die next year, wherever you live
Option B: prediction that you will die if you live in city XXX next year, but will not die if you move to city YYY.

Option B is analogous to Uplift Modeling.

What is Uplift Modeling?

Uplift Modeling is a Machine Learning technique to find which customers (individuals) should be targeted ("treated") and which customers should not be targeted.

Uplift Modeling is also known as persuasion modeling, incremental modeling, treatment effects modeling, true lift modeling, or net modeling.

Uplift Modeling predicts the following 4 labels:

True Uplift, aka "Persuadables"
- Customers who will buy a product if treated, but will not buy if not treated
False Uplift, aka "Sure Things"
- Customers who will buy a product regardless of the treatment
True Drop, aka "Sleeping Dogs"/"Do Not Disturbs"
- Customers who will not buy a product if treated, but will buy if not treated
False Drop, aka "Lost Causes"
- Customers who will not buy a product regardless of the treatment
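The 4 labels are defined by two potential outcomes per customer: whether they would buy if treated and whether they would buy if not treated. The sketch below is a hypothetical helper (not part of CausalLift) that maps those two outcomes to a label. In real data only one of the two outcomes is ever observed for a given customer, which is why Uplift Modeling estimates probabilities instead of assigning these labels directly.

```python
def uplift_segment(buys_if_treated: bool, buys_if_untreated: bool) -> str:
    """Map a customer's two (hypothetical) potential outcomes to one of the 4 labels."""
    if buys_if_treated and not buys_if_untreated:
        return "True Uplift (Persuadables)"
    if buys_if_treated and buys_if_untreated:
        return "False Uplift (Sure Things)"
    if not buys_if_treated and buys_if_untreated:
        return "True Drop (Sleeping Dogs)"
    return "False Drop (Lost Causes)"


# Enumerate all 4 combinations of potential outcomes
for treated in (True, False):
    for untreated in (True, False):
        print(treated, untreated, "->", uplift_segment(treated, untreated))
```

Only "True Uplift" customers gain from treatment; "True Drop" customers are harmed by it.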

How does Uplift Modeling work?

Uplift Modeling estimates uplift scores (a.k.a. CATE: Conditional Average Treatment Effect, or ITE: Individual Treatment Effect). An uplift score is how much a customer's estimated conversion rate will increase because of the campaign.

Suppose you are in charge of a marketing campaign to sell a product. If a customer's estimated conversion rate (probability of buying the product) is 50% if targeted and 40% if not targeted, the customer's uplift score is 50 - 40 = +10 percentage points. Likewise, if the estimated conversion rate is 20% if targeted and 80% if not targeted, the uplift score is 20 - 80 = -60 percentage points (a negative value).
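The arithmetic above can be written as a one-line function. This is a minimal sketch with a hypothetical helper name; it takes the two estimated conversion probabilities (0 to 1) and returns the uplift in percentage points. The rounding only hides floating-point noise such as 9.999999999999998.

```python
def uplift_points(p_treated: float, p_untreated: float) -> float:
    """Uplift score in percentage points: treated minus untreated conversion rate."""
    if not (0.0 <= p_treated <= 1.0 and 0.0 <= p_untreated <= 1.0):
        raise ValueError("conversion rates must be probabilities in [0, 1]")
    # Round away floating-point noise from the subtraction
    return round((p_treated - p_untreated) * 100, 10)


print(uplift_points(0.50, 0.40))  # 10.0
print(uplift_points(0.20, 0.80))  # -60.0
```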

Uplift scores range from -100 to +100 percentage points (-1 to +1). To optimize the marketing campaign, it is recommended to target customers with high uplift scores and avoid customers with negative uplift scores.
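The targeting rule can be sketched as: rank customers by uplift score, drop those whose score is zero or negative, and optionally stop at a contact budget. The function name and the `budget` parameter are hypothetical illustrations, not CausalLift API.

```python
from typing import Dict, List, Optional


def select_targets(uplift: Dict[str, float], budget: Optional[int] = None) -> List[str]:
    """Return customer IDs to treat, highest uplift first, skipping non-positive uplift."""
    ranked = sorted(uplift.items(), key=lambda kv: kv[1], reverse=True)
    targets = [cid for cid, score in ranked if score > 0]
    # `budget` (hypothetical) caps how many customers are contacted
    return targets if budget is None else targets[:budget]


scores = {"c1": 0.10, "c2": -0.60, "c3": 0.25, "c4": 0.0}
print(select_targets(scores))            # ['c3', 'c1']
print(select_targets(scores, budget=1))  # ['c3']
```

Customer `c2` is skipped because treating them is expected to lower their conversion rate.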

What are the advantages of the "CausalLift" package?

CausalLift works with both A/B testing results and observational datasets.
CausalLift can output intuitive metrics for evaluation.

Why was CausalLift developed?

In a word: for use in real-world business.

Existing packages for Uplift Modeling assume the dataset comes from A/B testing (a.k.a. a Randomized Controlled Trial). In real-world business, however, observational datasets, in which treatment (campaign) targets were not chosen randomly, are more common, especially in the early stage of evidence-based decision making. CausalLift supports observational datasets using a basic Causal Inference method called "Inverse Probability Weighting" (IPW), which assumes that the propensity to be treated can be inferred from the available features.
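To illustrate the general IPW technique (a sketch, not CausalLift's internal code): each treated row gets weight 1/e(x) and each untreated row 1/(1 - e(x)), where e(x) is the propensity score. Rows that received an "unlikely" treatment count more, which rebalances the two groups as if treatment had been random. The weighted conversion rates below use the normalized (self-normalizing) form of IPW; the data is made up.

```python
from typing import List, Sequence


def ipw_weights(treatment: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Inverse probability weights: 1/e(x) for treated rows, 1/(1 - e(x)) for untreated rows."""
    weights = []
    for t, e in zip(treatment, propensity):
        # IPW divides by e and 1 - e, so the propensity must be strictly inside (0, 1)
        if not 0.0 < e < 1.0:
            raise ValueError("propensity must be strictly between 0 and 1")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_conversion_rate(outcome: Sequence[int], treatment: Sequence[int],
                             weights: Sequence[float], group: int) -> float:
    """Weighted mean outcome of rows whose treatment equals `group` (normalized IPW)."""
    num = sum(w * y for y, t, w in zip(outcome, treatment, weights) if t == group)
    den = sum(w for t, w in zip(treatment, weights) if t == group)
    return num / den


# Toy observational data (made up): treated customers tended to have higher propensity
treatment = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
outcome = [1, 0, 1, 0]

w = ipw_weights(treatment, propensity)
rate_treated = weighted_conversion_rate(outcome, treatment, w, group=1)
rate_untreated = weighted_conversion_rate(outcome, treatment, w, group=0)
print(rate_treated - rate_untreated)  # IPW estimate of the average treatment effect
```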
Uplift Modeling has 2 challenges: explainability of the model and explainability of its evaluation. CausalLift addresses both with a basic Uplift Modeling method called the Two-Model approach: it trains 2 models independently, one on treated samples and one on untreated samples, and computes the CATE (Conditional Average Treatment Effect), i.e. the uplift score, as the difference between their predictions.
- [Explainability of the model] Because the approach is relatively simple, it is easier to explain how it works to business stakeholders.
- [Explainability of evaluation] Research on Uplift Modeling evaluates models with metrics such as Qini and AUUC (Area Under the Uplift Curve), but these metrics are difficult to explain to stakeholders. For business, a metric that estimates how much more profit can be earned is more practical. Because CausalLift adopts the Two-Model approach, the 2 models can be reused to simulate the outcome of following the Uplift Model's recommendations, and thus to estimate how much the conversion rate (the proportion of people who take the desired action, such as buying a product) will increase.
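The Two-Model approach and the simulated evaluation can be sketched in plain Python. This is an illustration under simplifying assumptions, not CausalLift's implementation: the "model" is a hypothetical stand-in that returns the observed conversion rate per feature value (any classifier outputting P(outcome=1 | features) could take its place), every feature value must appear in both groups (otherwise the lookup raises `KeyError`), and for brevity the training rows are reused for evaluation instead of a separate test set.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Model = Dict[Hashable, float]


def fit_rate_model(features: Sequence[Hashable], outcome: Sequence[int]) -> Model:
    """Toy 'classifier': the observed conversion rate for each feature value."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for x, y in zip(features, outcome):
        totals[x] += 1
        positives[x] += y
    return {x: positives[x] / totals[x] for x in totals}


def fit_two_models(features: Sequence[Hashable], treatment: Sequence[int],
                   outcome: Sequence[int]) -> Tuple[Model, Model]:
    """Train one model on treated rows and an independent one on untreated rows."""
    treated = [(x, y) for x, t, y in zip(features, treatment, outcome) if t == 1]
    untreated = [(x, y) for x, t, y in zip(features, treatment, outcome) if t == 0]
    model_t = fit_rate_model([x for x, _ in treated], [y for _, y in treated])
    model_u = fit_rate_model([x for x, _ in untreated], [y for _, y in untreated])
    return model_t, model_u


def simulate_recommendation(features: Sequence[Hashable], model_t: Model, model_u: Model) -> float:
    """Estimated conversion rate if each customer is treated only when uplift > 0."""
    predicted = []
    for x in features:
        uplift = model_t[x] - model_u[x]  # CATE estimate for this customer
        predicted.append(model_t[x] if uplift > 0 else model_u[x])
    return sum(predicted) / len(predicted)


# Toy data (made up): one categorical feature per customer
features = ["young", "young", "young", "young", "old", "old", "old", "old"]
treatment = [1, 1, 0, 0, 1, 1, 0, 0]
outcome = [1, 1, 0, 1, 0, 0, 1, 1]

model_t, model_u = fit_two_models(features, treatment, outcome)
observed = sum(outcome) / len(outcome)
print(observed)                                              # 0.625
print(simulate_recommendation(features, model_t, model_u))  # 1.0
```

Here the models say treatment helps "young" customers (+0.5) and hurts "old" ones (-1.0), so following the recommendation raises the estimated conversion rate above the observed one. This comparison is the kind of business-facing metric the paragraph above describes.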

<img src="https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/CausalLift_flow_diagram.png" width="415" height="274"> CausalLift flow diagram <img src="https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/CausalLift_Viz.PNG" width="734" height="465"> CausalLift internal pipeline (visualized by Kedro Viz)

Supported Python versions

Python 3.5
Python 3.6 (Tested and recommended)
Python 3.7

Installation

Install dependencies

$ pip install "python-json-logger<=2.0.4" "kedro<=0.17.7" "scikit-learn<=0.21.3" numpy pandas easydict

Note:

Python 3.8 or later is not supported yet.
scikit-learn 0.22 or later is not supported yet.
kedro 0.18 or later is not supported yet.
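The notes above can be turned into a quick pre-flight check. This is a hypothetical helper, not part of CausalLift: it compares version strings you pass in (for example from `platform.python_version()`, `sklearn.__version__` or `kedro.__version__`) against the first unsupported versions listed in the notes.

```python
from typing import Dict, List, Tuple

# First versions that the notes above list as NOT supported yet
FIRST_UNSUPPORTED = {
    "python": (3, 8),
    "scikit-learn": (0, 22),
    "kedro": (0, 18),
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse '0.21.3' -> (0, 21, 3); stops at the first non-numeric suffix (e.g. 'rc1')."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '22rc1': keep 22, ignore the suffix
    return tuple(parts)


def unsupported(installed: Dict[str, str]) -> List[str]:
    """Names whose installed version is at or above the first unsupported version."""
    return sorted(
        name for name, version in installed.items()
        if name in FIRST_UNSUPPORTED and version_tuple(version) >= FIRST_UNSUPPORTED[name]
    )


print(unsupported({"python": "3.6.9", "scikit-learn": "0.22.1", "kedro": "0.17.7"}))  # ['scikit-learn']
```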

Install CausalLift

[Option 1] To install the latest release from PyPI:

$ pip install causallift

[Option 2] To install the latest pre-release:

$ pip install git+https://github.com/Minyus/causallift.git

[Option 3] To install the latest pre-release in development mode, so that changes to the source code take effect without reinstalling:

$ git clone https://github.com/Minyus/causallift.git
$ cd causallift
$ python setup.py develop

Optional:

matplotlib
xgboost
scikit-optimize

Optional for visualization of the pipeline:

kedro-viz

How is the data pipeline implemented by CausalLift?

Step 0: Prepare data

Prepare the following columns in 2 pandas DataFrames, train and test (validation).

Features
- a.k.a. independent variables, explanatory variables, covariates
- e.g. customer gender, age range, etc.
- Note: Categorical variables need to be one-hot encoded so that propensity can be estimated using logistic regression. pandas.get_dummies can be used.
Outcome: binary (0 or 1)
- a.k.a. dependent variable, target variable, label
- e.g. whether the customer bought a product, clicked a link, etc.
Treatment: binary (0 or 1)
- a variable you can control and want to optimize for each individual (customer)
- a.k.a intervention
- e.g. whether an advertising campaign was executed, whether a discount was offered, etc.
- Note: if you cannot find a treatment column, you may need to ask stakeholders to get the data, which might take hours to years.
[Optional] Propensity: continuous between 0 and 1
- propensity (or probability) to be treated for observational datasets (not needed for A/B Testing results)
- If not provided, CausalLift can estimate it from the features using logistic regression.
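The column requirements above can be sketched without pandas. The helper names and the column names `Treatment` and `Outcome` are assumptions chosen for illustration; with pandas, `pandas.get_dummies(df, columns=["gender", "age_range"])` does the one-hot encoding step in one call.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace a categorical column with 0/1 indicator columns named '<column>_<value>'."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row["{}_{}".format(column, cat)] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def validate(rows: List[Row], col_treatment: str = "Treatment", col_outcome: str = "Outcome",
             col_propensity: Optional[str] = None) -> None:
    """Raise ValueError if treatment/outcome are not binary or propensity is out of range."""
    for i, row in enumerate(rows):
        for col in (col_treatment, col_outcome):
            if row[col] not in (0, 1):
                raise ValueError("row {}: {} must be 0 or 1, got {!r}".format(i, col, row[col]))
        # IPW divides by p and 1 - p, so require 0 < p < 1
        if col_propensity is not None and not 0.0 < row[col_propensity] < 1.0:
            raise ValueError("row {}: {} must be strictly between 0 and 1".format(i, col_propensity))


# Toy training rows (made up) with 2 categorical features
train = [
    {"gender": "F", "age_range": "20s", "Treatment": 1, "Outcome": 1},
    {"gender": "M", "age_range": "30s", "Treatment": 0, "Outcome": 0},
    {"gender": "M", "age_range": "20s", "Treatment": 1, "Outcome": 0},
]
for col in ("gender", "age_range"):
    train = one_hot(train, col)
validate(train)
print(train[0])
```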

<img src="https://raw.githubusercontent.com/Minyus/causallift/master/readme_images/Example_table_data.png"> Example table data

Step 1: Prepare for Uplift modeling and optionally estimate propensity scores using a supervised classification model

If train_df comes from observational data (not an A/B test), you can set enable_ipw=True so that IPW (Inverse Probability Weighting) accounts for the fact that each individual (e.g. customer, patient, etc.) may have been treated with a different probability (propensity score).
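When no propensity column is supplied, the propensity score is estimated from the features with logistic regression. The plain-Python sketch below only illustrates that technique (batch gradient descent on log loss); in practice you would use a library classifier such as scikit-learn's `LogisticRegression`. The clipping threshold of 0.01 is an assumption added so that IPW weights stay finite, and the data is made up.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_propensity(X: Sequence[Sequence[float]], treatment: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic regression of treatment on the features via batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, treatment):
            # Gradient of log loss: (predicted probability - label) * feature
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_propensity(X: Sequence[Sequence[float]], w: List[float], b: float,
                       clip: float = 0.01) -> List[float]:
    """P(treatment=1 | x), clipped away from 0 and 1 (assumed threshold)."""
    return [min(max(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))), clip), 1.0 - clip)
            for x in X]


# Toy data: one one-hot feature; 2 of 3 customers with x=1 were treated, 1 of 3 with x=0
X = [[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]]
treatment = [1, 1, 0, 0, 0, 1]

w, b = fit_propensity(X, treatment)
propensity = predict_propensity(X, w, b)
print([round(p, 2) for p in propensity])  # [0.67, 0.67, 0.67, 0.33, 0.33, 0.33]
```

With a single binary feature, the fitted propensities converge to the observed treatment rate of each group, which is what you would expect from logistic regression here.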

If the `train_
