
ch.ethz.idsc.subare <a href="https://travis-ci.org/idsc-frazzoli/subare"><img src="https://travis-ci.org/idsc-frazzoli/subare.svg?branch=master" alt="Build Status"></a>

Library for reinforcement learning in Java, version 0.3.8

The repository includes algorithms, examples, and exercises from the 2nd edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two respects:

  • the algorithms are implemented separately from the problem scenarios
  • the math is carried out in exact precision, which reproduces symmetries in the results whenever the problem itself features symmetries
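The exact-precision point can be illustrated with a minimal standalone sketch (not the subare API, which builds on the tensor library): a Bellman backup carried out in rational arithmetic yields bit-identical values for symmetric states, with no floating-point rounding noise.

```java
import java.math.BigInteger;

// Illustrative only: exact rational arithmetic for a Bellman backup.
// Symmetric inputs produce exactly equal outputs, which doubles cannot guarantee.
public final class ExactBackup {
  // reduced fraction num/den with positive denominator
  public record Rational(BigInteger num, BigInteger den) {
    public static Rational of(long n, long d) {
      return reduce(BigInteger.valueOf(n), BigInteger.valueOf(d));
    }
    static Rational reduce(BigInteger n, BigInteger d) {
      BigInteger g = n.gcd(d);
      if (d.signum() < 0)
        g = g.negate(); // keep denominator positive
      return new Rational(n.divide(g), d.divide(g));
    }
    public Rational add(Rational o) {
      return reduce(num.multiply(o.den).add(o.num.multiply(den)), den.multiply(o.den));
    }
    public Rational mul(Rational o) {
      return reduce(num.multiply(o.num), den.multiply(o.den));
    }
  }

  // one backup: v'(s) = r(s) + gamma * sum_s' p(s'|s) v(s')
  public static Rational backup(Rational reward, Rational gamma, Rational[] probs, Rational[] v) {
    Rational acc = Rational.of(0, 1);
    for (int i = 0; i < probs.length; i++)
      acc = acc.add(probs[i].mul(v[i]));
    return reward.add(gamma.mul(acc));
  }

  public static void main(String[] args) {
    Rational gamma = Rational.of(9, 10);
    Rational third = Rational.of(1, 3);
    Rational[] v = { third, third, third };
    // two states with identical transition structure get identical values
    Rational a = backup(Rational.of(1, 1), gamma, new Rational[] { third, third, third }, v);
    Rational b = backup(Rational.of(1, 1), gamma, new Rational[] { third, third, third }, v);
    System.out.println(a.equals(b)); // exact equality
  }
}
```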

Algorithms

  • Iterative Policy Evaluation (parallel, in 4.1, p.59)
  • Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
  • Action-Value Iteration to determine Q*(s,a) (parallel)
  • First Visit Policy Evaluation (in 5.1, p.74)
  • Monte Carlo Exploring Starts (in 5.3, p.79)
  • Constant-alpha Monte Carlo
  • Tabular Temporal Difference (in 6.1, p.96)
  • Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
  • Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
  • Expected Sarsa (in 6.6, p.107)
  • Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
  • n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
  • n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
  • Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
  • Tabular Dyna-Q (in 8.2, p.133)
  • Prioritized Sweeping (in 8.4, p.137)
  • Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
  • True Online Sarsa (in 12.8, p.309)
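The library's own classes are not shown here, but the flavor of a tabular method is easy to sketch. The following self-contained Q-learning loop (Section 6.5 of the book) runs on a hypothetical 3-state chain that is not part of the repository; it implements the textbook update Q(s,a) ← Q(s,a) + α [r + γ maxₐ′ Q(s′,a′) − Q(s,a)].

```java
import java.util.Random;

// Illustrative tabular Q-learning on a tiny deterministic chain 0 - 1 - 2,
// where state 2 is terminal; not the subare API.
public final class TabularQLearning {
  static final int STATES = 3, ACTIONS = 2; // actions: 0 = left, 1 = right

  static int step(int s, int a) { // deterministic transition
    return a == 1 ? Math.min(s + 1, STATES - 1) : Math.max(s - 1, 0);
  }

  static double reward(int sNext) {
    return sNext == STATES - 1 ? 1.0 : 0.0; // reward only on reaching the goal
  }

  public static double[][] train(long seed, int episodes) {
    double alpha = 0.5, gamma = 0.9, epsilon = 0.1;
    double[][] q = new double[STATES][ACTIONS];
    Random random = new Random(seed);
    for (int e = 0; e < episodes; e++) {
      int s = 0;
      while (s != STATES - 1) {
        int a = random.nextDouble() < epsilon // epsilon-greedy behavior policy
            ? random.nextInt(ACTIONS)
            : (q[s][1] >= q[s][0] ? 1 : 0);
        int sNext = step(s, a);
        double target = reward(sNext) + gamma * Math.max(q[sNext][0], q[sNext][1]);
        q[s][a] += alpha * (target - q[s][a]); // off-policy TD update
        s = sNext;
      }
    }
    return q;
  }

  public static void main(String[] args) {
    double[][] q = train(42, 500);
    // the greedy policy should move right in every non-terminal state
    for (int s = 0; s < STATES - 1; s++)
      System.out.println(s + ": " + (q[s][1] > q[s][0] ? "right" : "left"));
  }
}
```

The same environment functions could be reused to sketch Sarsa or Expected Sarsa by swapping the target computation.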

Gallery

<table> <tr> <td>

prisonersdilemma

Prisoner's Dilemma

<td>

gambler_exact

Exact Gambler

</tr> </table>

Examples

4.1 Gridworld

<table><tr> <td valign="top">

AV-Iteration q(s,a)

gridworld_qsa_avi

<td>

TabularQPlan

gridworld_qsa_rstqp

<td>

Monte Carlo

gridworld_qsa_mces

</tr><tr> <td>

Q-Learning

gridworld_qsa_qlearning

<td>

Expected-Sarsa

gridworld_qsa_expected

<td>

Sarsa

gridworld_qsa_original

</tr><tr> <td>

3-step Q-Learning

gridworld_qsa_qlearning3

<td>

3-step E-Sarsa

gridworld_qsa_expected3

<td>

3-step Sarsa

gridworld_qsa_original3

</tr><tr> <td>

True Online Sarsa (original)

gridworld_tos_original

<td>

True Online Sarsa (expected)

gridworld_tos_expected

<td>

True Online Sarsa (Q-Learning)

gridworld_tos_qlearning

</tr></table>

4.2 Jack's car rental

Value Iteration v(s)

carrental_vi_true

4.4 Gambler's problem

Value Iteration v(s)

gambler_sv

Action Value Iteration and optimal policy

gambler_avi
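As a sketch of what the value-iteration figures compute, the following standalone snippet (plain doubles, not the repository's exact-precision implementation) solves the gambler's problem with head probability p_h = 0.4: v(s) is the probability of reaching the goal capital of 100 under the optimal betting policy.

```java
// Illustrative value iteration for the gambler's problem (Sec. 4.4).
// v[s] converges to the probability of reaching the goal from capital s.
public final class GamblerValueIteration {
  public static double[] solve(double ph, int goal) {
    double[] v = new double[goal + 1];
    v[goal] = 1.0; // reaching the goal pays 1; v[0] stays 0 (ruin)
    for (int sweep = 0; sweep < 10_000; sweep++) {
      double delta = 0.0;
      for (int s = 1; s < goal; s++) {
        double best = v[s];
        // stakes are limited by current capital and distance to the goal
        for (int stake = 1; stake <= Math.min(s, goal - s); stake++)
          best = Math.max(best, ph * v[s + stake] + (1 - ph) * v[s - stake]);
        delta = Math.max(delta, best - v[s]);
        v[s] = best; // in-place (Gauss-Seidel style) update
      }
      if (delta < 1e-12)
        break; // converged
    }
    return v;
  }

  public static void main(String[] args) {
    double[] v = solve(0.4, 100);
    System.out.printf("v(50) = %.4f%n", v[50]); // betting everything at 50 wins with prob 0.4
  }
}
```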

<table><tr><td>

Monte Carlo q(s,a)

gambler_qsa_mces

<td>

ESarsa q(s,a)

gambler_qsa_esarsa

<td>

QLearning q(s,a)

gambler_qsa_qlearn

</tr></table>

5.1 Blackjack

Monte Carlo Exploring Starts

blackjack_mces

5.2 Wireloop

<table><tr><td>

AV-Iteration

wire5_avi

<td>

TabularQPlan

wire5_qsa_rstqp

<td>

Q-Learning

wire5_qsa_qlearning

<td>

E-Sarsa

wire5_qsa_expected

<td>

Sarsa

wire5_qsa_original

<td>

Monte Carlo

wire5_mces

</tr></table>

5.8 Racetrack

Paths obtained using value iteration

<table><tr><td valign="top">

track 1

track1

<td valign="top">

track 2

track2

</tr></table>

6.5 Windygrid

<table><tr><td>

Action Value Iteration

windygrid_qsa_avi

<td>

TabularQPlan

windygrid_qsa_rstqp

</tr></table>

6.6 Cliffwalk

<table><tr><td>

Action Value Iteration

cliffwalk_qsa_avi

<td>

Q-Learning

cliffwalk_qsa_qlearning

<td>

TabularQPlan

cliffwalk_qsa_rstqp

<td>

Expected Sarsa

cliffwalk_qsa_expected

</tr></table>

8.1 Dynamaze

<table><tr><td>

Action Value Iteration

maze5_qsa_avi

<td>

Prioritized sweeping

maze2_ps_qlearning

</tr></table>

Additional Examples

Repeated Prisoner's Dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

opts

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

ucbs
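The Upper-Confidence-Bound agents above follow the UCB rule from Section 2.7 of the book; a minimal sketch of the selection step (the bandit values and the constant c below are hypothetical, not taken from the repository):

```java
// Illustrative UCB action selection: a = argmax_a  q(a) + c * sqrt(ln t / n(a)),
// where n(a) counts how often action a has been tried up to time t.
public final class UcbSelection {
  public static int select(double[] q, int[] counts, int t, double c) {
    int best = 0;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int a = 0; a < q.length; a++) {
      if (counts[a] == 0)
        return a; // untried actions are selected first
      double score = q[a] + c * Math.sqrt(Math.log(t) / counts[a]);
      if (score > bestScore) {
        bestScore = score;
        best = a;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // action 1 has the higher estimate AND the larger exploration bonus
    System.out.println(select(new double[] { 0.5, 0.6 }, new int[] { 10, 2 }, 12, 2.0));
  }
}
```

The optimistic agents in the figure above differ only in the initialization of q, not in the selection rule.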

Integration

Specify the dependency and repository of the subare library in the pom.xml file of your Maven project:

<dependencies>
  <dependency>
    <groupId>ch.ethz.idsc</groupId>
    <artifactId>subare</artifactId>
    <version>0.3.8</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/idsc-frazzoli/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

The source code is attached to every release.

Contributors

Jan Hakenberg, Christian Fluri

