WhyNot

A Python sandbox for decision making in dynamics


WhyNot is a Python package that provides an experimental sandbox for decisions in dynamics, connecting tools from causal inference and reinforcement learning with challenging dynamic environments. The package facilitates developing, testing, benchmarking, and teaching causal inference and sequential decision making tools.

For an introduction to WhyNot and a brief tutorial, see our walkthrough video. For more detailed information, check out the documentation.

Table of Contents

  1. Basic installation instructions
  2. Quick start examples
  3. Simulators in WhyNot
  4. Using estimators in R
  5. Frequently asked questions
  6. Citing WhyNot

WhyNot is still under active development! If you find bugs or have feature requests, please file a GitHub issue. We welcome all kinds of issues, especially those related to correctness, documentation, performance, and new features.

Basic installation instructions

  1. (Optionally) create a virtual environment
python3 -m venv whynot-env
source whynot-env/bin/activate
  2. Install via pip
pip install whynot

You can also install WhyNot directly from source.

git clone https://github.com/zykls/whynot.git
cd whynot
pip install -r requirements.txt

Quick start examples

Causal inference

Every simulator in WhyNot comes equipped with a set of experiments probing different aspects of causal inference. In this section, we show how to run experiments probing average treatment effect estimation on the World3 simulator. World3 is a dynamical systems model that studies the interplay between natural resource constraints, population growth, and industrial development.

First, we examine all of the experiments available for World3.

import whynot as wn
experiments = wn.world3.get_experiments()
print([experiment.name for experiment in experiments])
#['PollutionRCT', 'PollutionConfounding', 'PollutionUnobservedConfounding', 'PollutionMediation']

These experiments generate datasets both in the setting of a pure randomized controlled trial (PollutionRCT) and in settings with (unobserved) confounding and mediation. We will run the randomized controlled experiment. The description property offers specific details about the experiment.

rct = wn.world3.PollutionRCT
rct.description
#'Study effect of intervening in 1975 to decrease pollution generation on total population in 2050.'

We can run the experiment by calling its run function with a desired sample size num_samples. The experiment then returns a causal Dataset consisting of the covariates for each unit, the treatment assignment, the outcome, and the ground truth causal effect for each unit. All of this data is contained in NumPy arrays, which makes it easy to connect to causal estimators.

import numpy as np

dataset = rct.run(num_samples=200, seed=1111, show_progress=True)
(X, W, Y) = dataset.covariates, dataset.treatments, dataset.outcomes
treatment_effect = np.mean(dataset.true_effects)

# Plug in your favorite causal estimator (here, a simple difference in means)
estimated_ate = np.mean(Y[W == 1.]) - np.mean(Y[W == 0.])
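
The difference in means above gives a point estimate only. As a minimal sketch (not part of WhyNot's API), the hypothetical helper `difference_in_means` below also reports a standard error and a normal-approximation 95% interval. It uses only the standard library and made-up toy data in place of a WhyNot dataset; with a real dataset you would pass `dataset.outcomes` and `dataset.treatments` instead.

```python
import math
import statistics


def difference_in_means(outcomes, treatments):
    """Estimate the ATE as mean(treated) - mean(control).

    Hypothetical helper for illustration; not part of WhyNot.
    Returns (estimate, standard_error, (ci_low, ci_high)).
    """
    treated = [y for y, w in zip(outcomes, treatments) if w == 1]
    control = [y for y, w in zip(outcomes, treatments) if w == 0]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("need at least two treated and two control units")

    estimate = statistics.fmean(treated) - statistics.fmean(control)
    # Unpooled (Welch-style) standard error of the difference.
    standard_error = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    # 95% interval under a normal approximation.
    half_width = 1.96 * standard_error
    return estimate, standard_error, (estimate - half_width, estimate + half_width)


# Toy data standing in for (dataset.outcomes, dataset.treatments).
Y = [3.0, 5.0, 4.0, 1.0, 2.0, 3.0]
W = [1, 1, 1, 0, 0, 0]
ate, se, (low, high) = difference_in_means(Y, W)
print(f"ATE estimate: {ate:.2f} (SE {se:.2f}, 95% CI [{low:.2f}, {high:.2f}])")
```

In a randomized experiment such as PollutionRCT this estimator is unbiased for the average treatment effect; under confounding it generally is not, which is what the other experiments are designed to probe.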

WhyNot also enables you to run a large collection of causal estimators on the data for benchmarking and comparison. The main function for this is causal_suite, which, given the causal dataset, runs every estimator on it and returns an InferenceResult for each estimator containing its estimated treatment effect and uncertainty estimates such as confidence intervals.

# Run the suite of estimators
estimated_effects = wn.causal_suite(
    dataset.covariates, dataset.treatments, dataset.outcomes)

# Evaluate the relative error of the estimates
true_sate = dataset.sate
for estimator, estimate in estimated_effects.items():
    relative_error = np.abs((estimate.ate - true_sate) / true_sate)
    print("{}: {:.2f}".format(estimator, relative_error))
# ols: 1.06
# propensity_score_matching: 1.38
# propensity_weighted_ols: 1.37
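
To compare estimators at a glance, the relative errors can be collected and sorted. The sketch below is self-contained: the `Estimate` tuple is a hypothetical stand-in for the per-estimator result (only its `ate` field mirrors the attribute used above), `rank_by_relative_error` is an illustrative helper name, and the numbers are made up rather than WhyNot output.

```python
from collections import namedtuple

# Hypothetical stand-in for a per-estimator result; only `.ate` is assumed.
Estimate = namedtuple("Estimate", ["ate"])


def rank_by_relative_error(estimated_effects, true_effect):
    """Return (estimator_name, relative_error) pairs, smallest error first."""
    if true_effect == 0:
        raise ValueError("relative error is undefined when the true effect is 0")
    errors = {
        name: abs((estimate.ate - true_effect) / true_effect)
        for name, estimate in estimated_effects.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


# Made-up numbers for illustration only.
toy_effects = {
    "ols": Estimate(ate=1.5),
    "matching": Estimate(ate=0.9),
    "weighting": Estimate(ate=2.0),
}
ranking = rank_by_relative_error(toy_effects, true_effect=1.0)
for name, error in ranking:
    print("{}: {:.2f}".format(name, error))
```

The explicit check for a zero true effect matters because relative error divides by the ground truth; an absolute error is the safer metric when the true effect is near zero.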

In addition to experiments studying average treatment effects, WhyNot also supports causal inference experiments studying:

  1. Heterogeneous treatment effects
  2. Time-varying treatment policies
  3. Causal structure discovery

Sequential decision making

WhyNot supports experimentation with sequential decision making and reinforcement learning via a unified interface with OpenAI Gym. In this section, we give a simple example showing how to use the HIV simulator for sequential decision making experiments.

First, we initialize the environment and set the random seed.

import whynot.gym as gym

env = gym.make('HIV-v0')
env.seed(1)

Observations in the simulator are a set of 6 states, capturing infected and uninfected T-lymphocytes, macrophages, immune response, and copies of free virus. Actions correspond to choosing between different drugs and dosages for treatment.

For illustration, we repeatedly choose actions, which correspond to treatment policy decisions, in the environment and measure both the next state and the reward. In this case, the reward weighs the strength of the immune response, the virus count, and the cost of the chosen treatment.

observation = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # Replace with your treatment policy
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
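
The loop above discards the rewards. A common next step is to total the reward per episode so that treatment policies can be compared. The sketch below assumes only the `reset`/`step` interface shown above; `ToyEnv` is a hypothetical stand-in (not a WhyNot simulator) so that the snippet runs without WhyNot installed, and `rollout` is an illustrative helper name.

```python
class ToyEnv:
    """Hypothetical stand-in exposing the reset/step interface used above."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        observation = float(self.t)
        reward = -abs(action - 1)  # toy reward: the best action is always 1
        done = self.t >= self.horizon
        return observation, reward, done, {}


def rollout(env, policy, max_steps=100):
    """Run a single episode and return the sum of its rewards."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


# Compare two constant policies on the toy environment.
always_one = rollout(ToyEnv(), policy=lambda obs: 1)
always_zero = rollout(ToyEnv(), policy=lambda obs: 0)
print(always_one, always_zero)
```

The same `rollout` helper should work with any environment that returns the four-value `step` tuple shown above, though that is an assumption to verify against the installed Gym version.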

For more details on the simulation, as well as a fully worked out policy gradient example, see this notebook.

Strategic classification

Beyond settings typically studied in sequential decision making, WhyNot also supports experiments with standard supervised learning algorithms in dynamic settings. In this section, we show how to use WhyNot to study the performance of classifiers when individuals being classified behave strategically to improve their outcomes, a problem sometimes called strategic classification.

First, we set up the credit environment.

import whynot.gym as gym

env = gym.make('Credit-v0')
env.seed(1)

Observations in this environment are a dataset, drawn from the Kaggle GiveMeSomeCredit data, containing features for each individual and a label indicating whether they experience financial distress.

dataset = env.reset()

Actions in the environment correspond to choosing a classifier to predict default. In response, individuals then strategically adapt their features in order to obtain a more favorable credit score. The subsequent observation is the adapted features, and the reward is the classifier's loss on this distribution.

theta = env.action_space.sample() # Your classifier
dataset, loss, done, info = env.step(theta)

We can then experiment with the long-term equilibrium arising from repeatedly updating the classifier to cope with strategic response.

def learn_classifier(features, labels):
    # Replace with your learning algorithm
    return env.action_space.sample()

dataset = env.reset()
for _ in range(100):
    theta = learn_classifier(dataset["features"], dataset["labels"])
    dataset, loss, _, _ = env.step(theta)
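
Rather than running a fixed number of rounds, one can stop once retraining no longer changes the model, which is one way to probe for an equilibrium. The sketch below is a toy illustration under stated assumptions: `ToyStrategicEnv` is a hypothetical one-parameter stand-in (not the Credit simulator) in which every feature shifts in proportion to the deployed parameter, the "learner" is just the sample mean rather than a classifier, and `retrain_until_stable` is an illustrative helper name.

```python
import statistics


class ToyStrategicEnv:
    """Hypothetical stand-in: features shift in response to the deployed model."""

    def __init__(self, base_features, sensitivity=0.5):
        self.base_features = list(base_features)
        self.sensitivity = sensitivity

    def reset(self):
        return {"features": list(self.base_features)}

    def step(self, theta):
        # Every individual shifts their feature in proportion to theta.
        features = [x + self.sensitivity * theta for x in self.base_features]
        loss = statistics.fmean([(x - theta) ** 2 for x in features])
        return {"features": features}, loss, False, {}


def learn_model(features):
    # Replace with your learning algorithm; here, the sample mean.
    return statistics.fmean(features)


def retrain_until_stable(env, tolerance=1e-6, max_rounds=100):
    """Retrain on each round's data until theta stops changing."""
    dataset = env.reset()
    theta = learn_model(dataset["features"])
    for round_index in range(max_rounds):
        dataset, _, _, _ = env.step(theta)
        new_theta = learn_model(dataset["features"])
        if abs(new_theta - theta) < tolerance:
            return new_theta, round_index + 1
        theta = new_theta
    return theta, max_rounds


theta_star, rounds = retrain_until_stable(ToyStrategicEnv([0.0, 1.0, 2.0]))
print(theta_star, rounds)
```

In this toy model each round maps theta to 1 + 0.5 * theta, so the iterates approach the fixed point theta = 2. Convergence here relies on the sensitivity being below 1; real environments offer no such guarantee, which is why the loop keeps a `max_rounds` cap.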

For more details on the simulation and a complete example showing how standard retraining procedures perform in a strategic setting, see this notebook.

Beyond strategic classification, WhyNot also supports simulators and experiments evaluating other aspects of machine learning, e.g. fairness criteria, in dynamic settings.

For more examples and demonstrations of how to design and conduct experiments in each of these settings, check out usage and our collection of examples.

Simulators in WhyNot

WhyNot provides a large number of simulated environments from fields ranging from economics to epidemiology. Each simulator comes equipped with a representative set of causal inference experiments and exports a uniform Python interface that makes it easy to construct new causal inference experiments in these environments, as well as an OpenAI gym interface to perform reinforcement learning experiments in new environments.

The simulators in WhyNot currently include:
