# msdm: Models of Sequential Decision-Making
## Goals

msdm aims to simplify the design and evaluation of models of sequential decision-making. The library can be used for cognitive science or computer science research and teaching.
## Approach

msdm provides standardized interfaces and implementations for common constructs in sequential decision-making. This includes algorithms used in single-agent reinforcement learning as well as those used in planning, partially observable environments, and multi-agent games.
The library is organized around different problem classes and algorithms that operate on problem instances. We take inspiration from existing libraries such as scikit-learn that enable users to transparently mix and match components. For instance, a standard way to define a problem, solve it, and examine the results would be:
```python
# create a problem instance
mdp = make_russell_norvig_grid(
    discount_rate=0.95,
    slip_prob=0.8,
)

# solve the problem
vi = ValueIteration()
res = vi.plan_on(mdp)

# print the value function
print(res.V)
```
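For intuition about what a planner like `ValueIteration` computes, here is a minimal, self-contained sketch of value iteration on a toy two-state MDP. This is plain Python for illustration only, not msdm's actual implementation or API:

```python
# Toy MDP: two states ("A", "B") and two actions ("stay", "go").
# Moving from A to B yields reward 1; everything else yields 0.
discount_rate = 0.95

# transition[s][a] -> next state (deterministic for simplicity)
transition = {
    "A": {"stay": "A", "go": "B"},
    "B": {"stay": "B", "go": "A"},
}
# reward[s][a] -> immediate reward
reward = {
    "A": {"stay": 0.0, "go": 1.0},
    "B": {"stay": 0.0, "go": 0.0},
}

# Repeatedly apply the Bellman optimality backup until values converge.
V = {s: 0.0 for s in transition}
while True:
    new_V = {
        s: max(
            reward[s][a] + discount_rate * V[transition[s][a]]
            for a in transition[s]
        )
        for s in transition
    }
    if max(abs(new_V[s] - V[s]) for s in V) < 1e-10:
        break
    V = new_V

print(V)  # V["A"] > V["B"]: from A, the reward can be collected sooner
```

The value function returned here plays the same role as `res.V` above: a mapping from states to their optimal expected discounted return.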
The library is under active development. Currently, we support the following problem classes:
- Markov Decision Processes (MDPs)
- Partially Observable Markov Decision Processes (POMDPs)
- Markov Games
- Partially Observable Stochastic Games (POSGs)
The following algorithms have been implemented and tested:

- Classical Planning
  - Breadth-First Search (Zuse, 1945)
  - A* (Hart, Nilsson & Raphael, 1968)
- Stochastic Planning
  - Value Iteration (Bellman, 1957)
  - Policy Iteration (Howard, 1960)
  - Labeled Real-time Dynamic Programming (Bonet & Geffner, 2003)
  - LAO* (Hansen & Zilberstein, 2003)
- Partially Observable Planning
  - QMDP (Littman, Cassandra & Kaelbling, 1995)
  - Point-based Value-Iteration (Pineau, Gordon & Thrun, 2003)
  - Finite state controller gradient ascent (Meuleau, Kim, Kaelbling & Cassandra, 1999)
  - Bounded finite state controller policy iteration (Poupart & Boutilier, 2003)
  - Wrappers for POMDPs.jl solvers (requires Julia installation)
- Reinforcement Learning
  - Q-Learning (Watkins, 1992)
  - Double Q-Learning (van Hasselt, 2010)
  - SARSA (Rummery & Niranjan, 1994)
  - Expected SARSA (van Seijen, van Hasselt, Whiteson & Wiering, 2009)
  - R-MAX (Brafman & Tennenholtz, 2002)
- Multi-agent Reinforcement Learning (in progress)
  - Correlated Q Learning (Greenwald & Hall, 2002)
  - Nash Q Learning (Hu & Wellman, 2003)
  - Friend/Foe Q Learning (Littman, 2001)
We aim to add implementations for other algorithms in the near future (e.g., inverse RL, deep learning, multi-agent learning and planning).
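To give a flavor of the tabular reinforcement learning methods listed above, here is a minimal, self-contained Q-learning loop on a toy two-state problem. This is plain Python for illustration; it does not use msdm's API, and all names and parameter values are invented for this sketch:

```python
import random

random.seed(0)

# Toy episodic problem: from state 0, action 1 reaches the goal
# (reward 1) and ends the episode; action 0 stays put (reward 0).
n_states, n_actions = 2, 2
goal = 1
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(s, a):
    # deterministic dynamics: action 1 moves to the goal, action 0 stays
    ns = goal if a == 1 else s
    r = 1.0 if ns == goal else 0.0
    return ns, r, ns == goal

Q = [[0.0] * n_actions for _ in range(n_states)]
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        ns, r, done = step(s, a)
        # standard Q-learning (off-policy TD) update
        target = r if done else r + gamma * max(Q[ns])
        Q[s][a] += alpha * (target - Q[s][a])
        s = ns

print(Q[0])  # Q[0][1] (move to goal) should dominate Q[0][0] (stay)
```

After training, the greedy policy with respect to `Q` chooses the goal-reaching action, which is the same kind of object msdm's learning algorithms return in their results.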
## Installation

It is recommended to use a virtual environment.

### Installing from pip

```bash
$ pip install msdm
```

### Installing from GitHub

```bash
$ pip install --upgrade git+https://github.com/markkho/msdm.git
```

### Installing the package in edit mode

After downloading, go into the folder and install the package locally (with a symlink so it's updated as source file changes are made):

```bash
$ pip install -e .
```
## Contributing
We welcome contributions in the form of implementations of algorithms for common problem classes that are well-documented in the literature. Please first post an issue and/or reach out to mark.ho.cs@gmail.com to check if a proposed contribution is within the scope of the library.
### Running tests, etc.

To run all tests: `make test`

To run tests for a single file: `python -m py.test msdm/tests/$TEST_FILE_NAME.py`

To lint the code: `make lint`
