# ch.ethz.idsc.subare <a href="https://travis-ci.org/idsc-frazzoli/subare"><img src="https://travis-ci.org/idsc-frazzoli/subare.svg?branch=master" alt="Build Status"></a>

Library for reinforcement learning in Java, version 0.3.8.

The repository includes algorithms, examples, and exercises from the 2nd edition of *Reinforcement Learning: An Introduction* by Richard S. Sutton and Andrew G. Barto.
Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two respects:
- the algorithms are implemented separately from the problem scenarios
- the math is carried out in exact precision, which reproduces symmetries in the results whenever the problem itself is symmetric
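The separation of algorithms from scenarios can be sketched as below. All interface and class names here are hypothetical illustrations, not subare's actual API, and doubles stand in for the library's exact rational arithmetic for brevity.

```java
import java.util.List;

// Hypothetical sketch of the algorithm/scenario separation;
// the names below are illustrative, NOT subare's actual API.
interface TabularModel {
    int numStates();
    List<Integer> states();               // non-terminal states
    List<Integer> actions(int state);
    int step(int state, int action);      // deterministic successor, for brevity
    double reward(int state, int action);
    double gamma();
}

// One scenario: a 5-state random walk with terminals at 0 and 4,
// and reward 1 for reaching state 4.
class RandomWalk implements TabularModel {
    public int numStates() { return 5; }
    public List<Integer> states() { return List.of(1, 2, 3); }
    public List<Integer> actions(int state) { return List.of(-1, +1); }
    public int step(int state, int action) { return state + action; }
    public double reward(int state, int action) { return state + action == 4 ? 1 : 0; }
    public double gamma() { return 1; }
}

public class PolicyEvaluationDemo {
    // Iterative policy evaluation of the equiprobable random policy;
    // the algorithm knows nothing about the concrete scenario.
    static double[] evaluate(TabularModel model, double tolerance) {
        double[] v = new double[model.numStates()];
        double delta;
        do {
            delta = 0;
            for (int s : model.states()) {
                double sum = 0;
                for (int a : model.actions(s))
                    sum += model.reward(s, a) + model.gamma() * v[model.step(s, a)];
                double updated = sum / model.actions(s).size();
                delta = Math.max(delta, Math.abs(updated - v[s]));
                v[s] = updated;
            }
        } while (delta > tolerance);
        return v;
    }

    public static void main(String[] args) {
        double[] v = evaluate(new RandomWalk(), 1e-9);
        // v(1), v(2), v(3) converge to 0.25, 0.5, 0.75
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

Because each algorithm only sees the model interface, the same evaluation routine runs unchanged on any scenario that implements it.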
## Algorithms
- Iterative Policy Evaluation (parallel, in 4.1, p.59)
- Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
- Action-Value Iteration to determine Q*(s,a) (parallel)
- First Visit Policy Evaluation (in 5.1, p.74)
- Monte Carlo Exploring Starts (in 5.3, p.79)
- Constant-alpha Monte Carlo
- Tabular Temporal Difference (in 6.1, p.96)
- Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
- Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
- Expected Sarsa (in 6.6, p.107)
- Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
- n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
- n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
- Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
- Tabular Dyna-Q (in 8.2, p.133)
- Prioritized Sweeping (in 8.4, p.137)
- Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
- True Online Sarsa (in 12.8, p.309)
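Many of the listed methods share the same tabular update structure. As a minimal, self-contained illustration of one of them, here is a sketch of tabular Q-learning (6.5) on a small deterministic chain; this is illustrative code, not subare's actual API.

```java
import java.util.Random;

// Standalone sketch of tabular Q-learning on a 5-state chain:
// terminals at 0 and 4, reward 1 for reaching state 4.
public class QLearningDemo {
    static final int TERMINAL_LEFT = 0, TERMINAL_RIGHT = 4;
    static final double ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.1;

    // q[state][0] = value of moving left, q[state][1] = value of moving right
    static double[][] train(int episodes, Random random) {
        double[][] q = new double[5][2];
        for (int episode = 0; episode < episodes; ++episode) {
            int s = 2; // every episode starts in the middle
            while (s != TERMINAL_LEFT && s != TERMINAL_RIGHT) {
                // epsilon-greedy behavior policy
                int a = random.nextDouble() < EPSILON ? random.nextInt(2) : greedy(q, s);
                int next = s + (a == 0 ? -1 : +1);
                double reward = next == TERMINAL_RIGHT ? 1 : 0;
                // off-policy target: max over actions in the successor state
                double target = next == TERMINAL_LEFT || next == TERMINAL_RIGHT
                        ? reward
                        : reward + GAMMA * Math.max(q[next][0], q[next][1]);
                q[s][a] += ALPHA * (target - q[s][a]); // Q-learning update
                s = next;
            }
        }
        return q;
    }

    static int greedy(double[][] q, int s) {
        return q[s][1] >= q[s][0] ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] q = train(5000, new Random(42));
        for (int s = 1; s <= 3; ++s)
            System.out.printf("state %d: greedy action = %s%n",
                    s, greedy(q, s) == 1 ? "right" : "left");
    }
}
```

After training, the greedy policy moves right in every non-terminal state, as expected for this chain.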
## Gallery

*(images: Prisoner's Dilemma, Exact Gambler)*

## Examples
### 4.1 Gridworld

*(images: AV-Iteration q(s,a), TabularQPlan, Monte Carlo, Q-Learning, Expected-Sarsa, Sarsa, 3-step Q-Learning, 3-step E-Sarsa, 3-step Sarsa, True Online Sarsa)*
### 4.2 Jack's car rental

*(image: Value Iteration v(s))*
### 4.4 Gambler's problem

*(images: Value Iteration v(s), Action Value Iteration and optimal policy, Monte Carlo q(s,a), ESarsa q(s,a), QLearning q(s,a))*
### 5.1 Blackjack

*(image: Monte Carlo Exploring Starts)*
### 5.2 Wireloop

*(images: AV-Iteration, TabularQPlan, Q-Learning, E-Sarsa, Sarsa, Monte Carlo)*
### 5.8 Racetrack

Paths obtained using value iteration:

*(images: track 1, track 2)*
### 6.5 Windygrid

*(images: Action Value Iteration, TabularQPlan)*
### 6.6 Cliffwalk

*(images: Action Value Iteration, Q-Learning, TabularQPlan, Expected Sarsa)*
### 8.1 Dynamaze

*(images: Action Value Iteration, Prioritized sweeping)*
## Additional Examples

### Repeated Prisoner's Dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

*(image)*

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

*(image)*
## Integration

Specify the dependency and repository of the subare library in the `pom.xml` file of your Maven project:

```xml
<dependencies>
  <dependency>
    <groupId>ch.ethz.idsc</groupId>
    <artifactId>subare</artifactId>
    <version>0.3.8</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/idsc-frazzoli/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>
```

The source code is attached to every release.
## Contributors

Jan Hakenberg, Christian Fluri

## Publications

- *Learning to Operate a Fleet of Cars* by Christian Fluri, Claudio Ruch, Julian Zilly, Jan Hakenberg, and Emilio Frazzoli

## References

- *Reinforcement Learning: An Introduction* by Richard S. Sutton and Andrew G. Barto

