# CORL (Clean Offline Reinforcement Learning)
<img src="https://img.shields.io/badge/license-Apache_2.0-blue">
🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. It is heavily inspired by cleanrl for online RL; check them out too!
- 📜 Single-file implementation
- 📈 Benchmarked implementations of 13 algorithm variants
- 🖼 Weights and Biases integration
- ⭐ If you're interested in discrete control, make sure to check out our new library — Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.
## Getting started
```shell
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use Docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```
## Algorithms Implemented
| Algorithm | Variants Implemented | Wandb Report |
|-----------|----------------------|--------------|
| **Offline and Offline-to-Online** | | |
| ✅ Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL) | offline/cql.py <br /> finetune/cql.py | Offline <br /> Offline-to-online |
| ✅ Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC) | offline/awac.py <br /> finetune/awac.py | Offline <br /> Offline-to-online |
| ✅ Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL) | offline/iql.py <br /> finetune/iql.py | Offline <br /> Offline-to-online |
| **Offline-to-Online only** | | |
| ✅ Supported Policy Optimization for Offline Reinforcement Learning <br>(SPOT) | finetune/spot.py | Offline-to-online |
| ✅ Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning <br>(Cal-QL) | finetune/cal_ql.py | Offline-to-online |
| **Offline only** | | |
| ✅ Behavioral Cloning <br>(BC) | offline/any_percent_bc.py | Offline |
| ✅ Behavioral Cloning-10% <br>(BC-10%) | offline/any_percent_bc.py | Offline |
| ✅ A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC) | offline/td3_bc.py | Offline |
| ✅ Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT) | offline/dt.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(SAC-N) | offline/sac_n.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(EDAC) | offline/edac.py | Offline |
| ✅ Revisiting the Minimalist Approach to Offline Reinforcement Learning <br>(ReBRAC) | offline/rebrac.py | Offline |
| ✅ Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size <br>(LB-SAC) | offline/lb_sac.py | Offline Gym-MuJoCo |
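BC-10% in the table above is behavioral cloning trained on only the top 10% of dataset trajectories ranked by episode return. A minimal sketch of that filtering step (the function name and pure-Python structure are illustrative, not CORL's actual API):

```python
def top_fraction_indices(episode_returns, frac=0.1):
    """Return indices of the top `frac` episodes ranked by return.

    BC-%-style filtering: keep only the highest-return trajectories,
    then run plain behavioral cloning on the surviving transitions.
    """
    n = max(1, int(len(episode_returns) * frac))  # keep at least one episode
    ranked = sorted(range(len(episode_returns)),
                    key=lambda i: episode_returns[i],
                    reverse=True)
    return ranked[:n]

# e.g., keep the top 40% of five episodes by return
print(top_fraction_indices([1.0, 5.0, 3.0, 2.0, 4.0], frac=0.4))  # [1, 4]
```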
## D4RL Benchmarks
You can check the links above for learning curves and details. Here, we report reproduced final and best scores. Note that the two can differ by a significant margin, and papers do not always make explicit which reporting methodology they chose. If you want to re-collect our results in a more structured/nuanced manner, see results.
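The numbers in the tables below are D4RL normalized scores, where 0 corresponds to a random policy's return and 100 to an expert's. The mapping is a simple linear rescaling (the reference returns in the usage line are illustrative placeholders, not the official D4RL constants):

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Map a raw episode return onto the D4RL normalized scale.

    0  -> the environment's random-policy reference return
    100 -> the environment's expert-policy reference return
    Scores can fall below 0 or above 100 for worse/better policies.
    """
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# illustrative reference returns only (not D4RL's published constants)
score = d4rl_normalized_score(2500.0, random_return=-280.0, expert_return=12135.0)
```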
### Offline
#### Last Scores
##### Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| **locomotion average** | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
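The "locomotion average" row is the unweighted mean of the nine Gym-MuJoCo task means above; for example, for the BC column:

```python
# per-task last-score means for BC, copied from the table above
bc_last_scores = [42.40, 35.66, 55.95, 53.51, 29.81,
                  52.30, 63.23, 21.80, 98.96]

locomotion_average = sum(bc_last_scores) / len(bc_last_scores)
print(f"{locomotion_average:.2f}")  # 50.40, matching the table's BC average
```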
##### Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± | | | | | |
