# Contextual: Multi-Armed Bandits in R
<!-- * [AppVeyor: passing](https://ci.appveyor.com/project/robinvanemden/contextual) * [Travis CI: passing](https://travis-ci.org/Nth-iteration-labs/contextual) * [Codecov: 96% coverage](https://codecov.io/gh/Nth-iteration-labs/contextual) -->

## Overview
R package facilitating the simulation and evaluation of context-free and contextual Multi-Armed Bandit policies.
The package has been developed to:
- Ease the implementation, evaluation and dissemination of both existing and new contextual Multi-Armed Bandit policies.
- Introduce a wider audience to the advanced sequential decision strategies offered by contextual bandit policies.
Package links:
## Installation
To install contextual from CRAN:

```r
install.packages('contextual')
```

To install the development version (requires the devtools package):

```r
install.packages("devtools")
devtools::install_github('Nth-iteration-labs/contextual')
```
When working on or extending the package, clone its GitHub repository, then do:

```r
install.packages("devtools")
devtools::install_deps(dependencies = TRUE)
devtools::build()
devtools::reload()
```

followed by a clean and rebuild.
## Overview of core classes

Contextual consists of six core classes. Of these, the Bandit and Policy classes are subclassed and extended when implementing custom (synthetic or offline) bandits and policies. The other four classes (Agent, Simulator, History, and Plot) are the workhorses of the package, and generally need not be adapted or subclassed.
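As a sketch of how these classes fit together, the snippet below (adapted from the package's demos; exact constructor arguments may differ between versions) runs an epsilon-greedy policy against a three-armed Bernoulli bandit and plots the result:

```r
library(contextual)

# A context-free simulation: an epsilon-greedy policy played against a
# three-armed Bernoulli bandit, repeated over independent simulations.
horizon     <- 100
simulations <- 100
weights     <- c(0.9, 0.1, 0.1)   # per-arm success probabilities

policy  <- EpsilonGreedyPolicy$new(epsilon = 0.1)
bandit  <- BasicBernoulliBandit$new(weights = weights)
agent   <- Agent$new(policy, bandit)

# Simulator runs the agent(s) and collects results in a History object,
# which Plot (via plot()) and summary() can then analyze.
history <- Simulator$new(agent, horizon, simulations)$run()

plot(history, type = "cumulative")
summary(history)
```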
## Documentation
See the demo directory for practical examples and replications of both synthetic and offline (contextual) bandit policy evaluations.
When seeking to extend contextual, it may also be useful to review "Extending Contextual: Frequently Asked Questions" before diving into the source code.
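To give a flavor of what extending the package involves, a custom context-free policy might look roughly as follows. This is a sketch, not a drop-in implementation: the `set_parameters` / `get_action` / `set_reward` method names and the `theta_to_arms` mechanism follow contextual's Policy API as described in its documentation, but check the FAQ and source for the exact contracts. The class name `MyEpsilonGreedyPolicy` is hypothetical.

```r
library(R6)
library(contextual)

# A minimal epsilon-greedy policy sketch: with probability epsilon pick a
# random arm, otherwise exploit the arm with the highest empirical mean.
MyEpsilonGreedyPolicy <- R6Class("MyEpsilonGreedyPolicy",
  inherit = Policy,
  public = list(
    epsilon = NULL,
    initialize = function(epsilon = 0.1) {
      super$initialize()
      self$epsilon <- epsilon
    },
    set_parameters = function(context_params) {
      # theta_to_arms replicates these values once per arm into self$theta
      self$theta_to_arms <- list(n = 0, mean = 0)
    },
    get_action = function(t, context) {
      if (runif(1) > self$epsilon) {
        # exploit: arm with the highest running mean so far
        self$action$choice <- which.max(unlist(self$theta$mean))
      } else {
        # explore: uniformly random arm out of context$k arms
        self$action$choice <- sample.int(context$k, 1)
      }
      self$action
    },
    set_reward = function(t, context, action, reward) {
      arm <- action$choice
      r   <- reward$reward
      # incremental update of the chosen arm's count and running mean
      self$theta$n[[arm]]    <- self$theta$n[[arm]] + 1
      self$theta$mean[[arm]] <- self$theta$mean[[arm]] +
                                (r - self$theta$mean[[arm]]) / self$theta$n[[arm]]
      self$theta
    }
  )
)
```

Custom bandits follow the same pattern, subclassing Bandit instead of Policy.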
How to replicate figures from two introductory context-free Multi-Armed Bandits texts:
- Replication of figures from Sutton and Barto, "Reinforcement Learning: An Introduction", Chapter 2
- Replication of figures from "Bandit Algorithms for Website Optimization" by John Myles White
Basic, context-free multi-armed bandit examples:
- Basic MAB Epsilon Greedy evaluation
- Synthetic MAB policy comparison
- Replication of Eckles & Kaptein (Bootstrap Thompson Sampling)
Examples of both synthetic and offline contextual multi-armed bandit evaluations:
- Offline Evaluation Data Set - Bootstrapped Replay Bandit: CARSKit DePaul Movies
- Offline Evaluation Data Set - Lookup Table Replay Bandit: MovieLens 10M
An example of how to use the optional theta log to create interactive context-free bandit animations:
Some more extensive vignettes to get you started with the package:
- Getting started: running simulations
- Offline evaluation: replication of Li et al. (2010)
- Class reference
Paper offering a general overview of the package's structure & API:
<!--- * [Blog at Pavlov](https://pavlov.tech/category/contextual/) -->

## Policies and Bandits
Overview of contextual's growing library of contextual and context-free bandit policies:
| General | Context-free | Contextual | Other |
|---------|--------------|------------|-------|
| Random<br>Oracle<br>Fixed | Epsilon-Greedy<br>Epsilon-First<br>UCB1, UCB2<br>Thompson Sampling<br>BootstrapTS<br>Softmax<br>Gradient<br>Gittins | CMAB Naive Epsilon-Greedy<br>Epoch-Greedy<br>LinUCB (General, Disjoint, Hybrid)<br>Linear Thompson Sampling<br>ProbitTS<br>LogitBTS<br>GLMUCB | Lock-in Feedback (LiF) |
Overview of contextual's bandit library:
| Basic Synthetic | Contextual Synthetic | Offline | Continuous |
|-----------------|----------------------|---------|------------|
| Basic Bernoulli Bandit<br>Basic Gaussian Bandit | Contextual Bernoulli<br>Contextual Logit<br>Contextual Hybrid<br>Contextual Linear<br>Contextual Wheel | Replay Evaluator<br>Bootstrap Replay<br>Propensity Weighting<br>Direct Method<br>Doubly Robust | Continuum |
## Alternative parallel backends
By default, contextual uses R's built-in parallel package to facilitate the parallel evaluation of multiple agents over repeated simulations. See the demo/alternative_parallel_backends directory for several alternative parallel backends:
- Microsoft Azure VMs using doAzureParallel.
- Redis using doRedis.
- MPI (Message-Passing Interface) using Rmpi and doMPI.
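The alternative-backend demos generally register a foreach-compatible backend before running the simulation. As a rough sketch of the MPI variant (assumptions: the registration pattern below mirrors the demo scripts, and the demos handle the wiring between the registered backend and the Simulator; see demo/alternative_parallel_backends for the exact setup):

```r
# Register an MPI cluster as the parallel backend via Rmpi/doMPI,
# then run the simulation, then shut the cluster down cleanly.
library(doMPI)

cl <- startMPIcluster()   # one worker per available MPI rank
registerDoMPI(cl)

# ... construct Agent(s) and run Simulator$new(...)$run() here ...

closeCluster(cl)
mpi.quit()
```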
## Maintainers
- Robin van Emden: author, maintainer*
- Maurits Kaptein: supervisor*

\* Tilburg University / Jheronimus Academy of Data Science.
If you encounter a clear bug, please file a minimal reproducible example on GitHub.