# msdm: Models of Sequential Decision-Making
## Goals

msdm aims to simplify the design and evaluation of models of sequential decision-making. The library can be used for cognitive science or computer science research and teaching.
## Approach

msdm provides standardized interfaces and implementations for common constructs in sequential decision-making. This includes algorithms used in single-agent reinforcement learning as well as those used in planning, partially observable environments, and multi-agent games.
The library is organized around different problem classes and algorithms that operate on problem instances. We take inspiration from existing libraries such as scikit-learn that enable users to transparently mix and match components. For instance, a standard way to define a problem, solve it, and examine the results would be:
```python
# create a problem instance
mdp = make_russell_norvig_grid(
    discount_rate=0.95,
    slip_prob=0.8,
)

# solve the problem
vi = ValueIteration()
res = vi.plan_on(mdp)

# print the value function
print(res.V)
```
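For intuition about what a planner like `ValueIteration` computes, here is a minimal, self-contained sketch of value iteration on a toy two-state MDP. This is plain Python for illustration only, not msdm's actual implementation or API:

```python
# Toy MDP: two states ("A", "B") and two actions ("stay", "go").
# Moving from A to B yields reward 1; everything else yields 0.
discount_rate = 0.95

# transition[s][a] -> next state (deterministic for simplicity)
transition = {
    "A": {"stay": "A", "go": "B"},
    "B": {"stay": "B", "go": "A"},
}
# reward[s][a] -> immediate reward
reward = {
    "A": {"stay": 0.0, "go": 1.0},
    "B": {"stay": 0.0, "go": 0.0},
}

# Repeatedly apply the Bellman optimality backup until values converge.
V = {s: 0.0 for s in transition}
while True:
    new_V = {
        s: max(
            reward[s][a] + discount_rate * V[transition[s][a]]
            for a in transition[s]
        )
        for s in transition
    }
    if max(abs(new_V[s] - V[s]) for s in V) < 1e-10:
        break
    V = new_V

print(V)  # V["A"] > V["B"]: from A, the reward can be collected sooner
```

The value function returned here plays the same role as `res.V` above: a mapping from states to their optimal expected discounted return.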
The library is under active development. Currently, we support the following problem classes:
- Markov Decision Processes (MDPs)
- Partially Observable Markov Decision Processes (POMDPs)
- Markov Games
- Partially Observable Stochastic Games (POSGs)
The following algorithms have been implemented and tested:

- Classical Planning
  - Breadth-First Search (Zuse, 1945)
  - A* (Hart, Nilsson & Raphael, 1968)
- Stochastic Planning
  - Value Iteration (Bellman, 1957)
  - Policy Iteration (Howard, 1960)
  - Labeled Real-time Dynamic Programming (Bonet & Geffner, 2003)
  - LAO* (Hansen & Zilberstein, 2003)
- Partially Observable Planning
  - QMDP (Littman, Cassandra & Kaelbling, 1995)
  - Point-based Value-Iteration (Pineau, Gordon & Thrun, 2003)
  - Finite state controller gradient ascent (Meuleau, Kim, Kaelbling & Cassandra, 1999)
  - Bounded finite state controller policy iteration (Poupart & Boutilier, 2003)
  - Wrappers for POMDPs.jl solvers (requires Julia installation)
- Reinforcement Learning
  - Q-Learning (Watkins, 1992)
  - Double Q-Learning (van Hasselt, 2010)
  - SARSA (Rummery & Niranjan, 1994)
  - Expected SARSA (van Seijen, van Hasselt, Whiteson & Wiering, 2009)
  - R-MAX (Brafman & Tennenholtz, 2002)
- Multi-agent Reinforcement Learning (in progress)
  - Correlated Q Learning (Greenwald & Hall, 2002)
  - Nash Q Learning (Hu & Wellman, 2003)
  - Friend/Foe Q Learning (Littman, 2001)
We aim to add implementations for other algorithms in the near future (e.g., inverse RL, deep learning, multi-agent learning and planning).
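To give a flavor of the tabular reinforcement learning methods listed above, here is a minimal, self-contained Q-learning loop on a toy two-state problem. This is plain Python for illustration; it does not use msdm's API, and all names and parameter values are invented for this sketch:

```python
import random

random.seed(0)

# Toy episodic problem: from state 0, action 1 reaches the goal
# (reward 1) and ends the episode; action 0 stays put (reward 0).
n_states, n_actions = 2, 2
goal = 1
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(s, a):
    # deterministic dynamics: action 1 moves to the goal, action 0 stays
    ns = goal if a == 1 else s
    r = 1.0 if ns == goal else 0.0
    return ns, r, ns == goal

Q = [[0.0] * n_actions for _ in range(n_states)]
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        ns, r, done = step(s, a)
        # standard Q-learning (off-policy TD) update
        target = r if done else r + gamma * max(Q[ns])
        Q[s][a] += alpha * (target - Q[s][a])
        s = ns

print(Q[0])  # Q[0][1] (move to goal) should dominate Q[0][0] (stay)
```

After training, the greedy policy with respect to `Q` chooses the goal-reaching action, which is the same kind of object msdm's learning algorithms return in their results.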
## Installation

It is recommended to use a virtual environment.

### Installing from pip

```bash
$ pip install msdm
```

### Installing from GitHub

```bash
$ pip install --upgrade git+https://github.com/markkho/msdm.git
```

### Installing the package in edit mode

After downloading, go into the folder and install the package locally (with a symlink so it's updated as source file changes are made):

```bash
$ pip install -e .
```
## Contributing
We welcome contributions in the form of implementations of algorithms for common problem classes that are well-documented in the literature. Please first post an issue and/or reach out to mark.ho.cs@gmail.com to check if a proposed contribution is within the scope of the library.
### Running tests, etc.

To run all tests: `make test`

To run tests for a single file: `python -m py.test msdm/tests/$TEST_FILE_NAME.py`

To lint the code: `make lint`
