
CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC

Install / Use

/learn @tinkoff-ai/CORL
About this skill

Quality Score: 0/100
Supported Platforms: Universal

README

CORL (Clean Offline Reinforcement Learning)


🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. It is heavily inspired by cleanrl for online RL; check that project out too!

  • 📜 Single-file implementations (a toy sketch follows this list)
  • 📈 Benchmarked implementations of 12 algorithms
  • 🖼 Weights and Biases integration

  • ⭐ If you're interested in discrete control, make sure to check out our new library, Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.
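To give a flavor of what "single-file" plus Weights and Biases logging looks like in practice, here is a heavily simplified, hypothetical sketch. Every name, shape, and the loss below are invented for illustration; see any script referenced in the table further down (e.g. offline/iql.py) for the real structure:

```python
import torch
import wandb

# A toy sketch of the single-file pattern (all names and numbers here are
# invented placeholders, not CORL's actual code): config, networks,
# training loop, and logging all live in one script.
def train(num_updates: int = 10) -> None:
    wandb.init(project="corl-sketch", mode="offline")  # offline mode: no account needed
    policy = torch.nn.Linear(17, 6)  # stand-in for the actual actor network
    optim = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for step in range(num_updates):
        batch = torch.randn(256, 17)        # stand-in for a D4RL minibatch
        loss = policy(batch).pow(2).mean()  # stand-in for the real objective
        optim.zero_grad()
        loss.backward()
        optim.step()
        wandb.log({"loss": loss.item()}, step=step)

if __name__ == "__main__":
    train()
```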

Getting started

```bash
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```
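Once installed, the D4RL datasets used in the benchmarks below can be pulled through the standard d4rl API. This is a minimal sketch, assuming the gym and d4rl packages from the requirements are available:

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the D4RL environments

# Minimal sketch of loading one of the datasets benchmarked below.
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of numpy arrays keyed by
                                       # observations/actions/rewards/...
print(dataset["observations"].shape, dataset["actions"].shape)
```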

Algorithms Implemented

| Algorithm | Variants Implemented | Wandb Report |
|-----------|----------------------|--------------|
| **Offline and Offline-to-Online** | | |
| ✅ Conservative Q-Learning for Offline Reinforcement Learning (CQL) | offline/cql.py, finetune/cql.py | Offline, Offline-to-online |
| ✅ Accelerating Online Reinforcement Learning with Offline Datasets (AWAC) | offline/awac.py, finetune/awac.py | Offline, Offline-to-online |
| ✅ Offline Reinforcement Learning with Implicit Q-Learning (IQL) | offline/iql.py, finetune/iql.py | Offline, Offline-to-online |
| **Offline-to-Online only** | | |
| ✅ Supported Policy Optimization for Offline Reinforcement Learning (SPOT) | finetune/spot.py | Offline-to-online |
| ✅ Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Cal-QL) | finetune/cal_ql.py | Offline-to-online |
| **Offline only** | | |
| ✅ Behavioral Cloning (BC) | offline/any_percent_bc.py | Offline |
| ✅ Behavioral Cloning-10% (BC-10%) | offline/any_percent_bc.py | Offline |
| ✅ A Minimalist Approach to Offline Reinforcement Learning (TD3+BC) | offline/td3_bc.py | Offline |
| ✅ Decision Transformer: Reinforcement Learning via Sequence Modeling (DT) | offline/dt.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (SAC-N) | offline/sac_n.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble (EDAC) | offline/edac.py | Offline |
| ✅ Revisiting the Minimalist Approach to Offline Reinforcement Learning (ReBRAC) | offline/rebrac.py | Offline |
| ✅ Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size (LB-SAC) | offline/lb_sac.py | Offline |
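As one concrete example of how small the algorithmic deltas in this table can be: TD3+BC ("A Minimalist Approach...") only adds a behavioral-cloning term and a Q-scale normalization to TD3's actor update. Below is a standalone sketch of that actor loss as described in the paper, with placeholder networks and data; it is not the repository's exact code:

```python
import torch
import torch.nn.functional as F

# Placeholder networks and batch (shapes chosen arbitrarily for the sketch).
actor = torch.nn.Linear(17, 6)
critic = torch.nn.Linear(17 + 6, 1)
state = torch.randn(256, 17)   # dataset states
action = torch.randn(256, 6)   # dataset actions for the same states

# TD3+BC actor objective: maximize lambda * Q(s, pi(s)) while staying close
# to the dataset actions via an MSE behavioral-cloning term.
alpha = 2.5  # weighting coefficient from the paper
pi = actor(state)
q = critic(torch.cat([state, pi], dim=1))
lmbda = alpha / q.abs().mean().detach()  # normalize by the Q-value scale
actor_loss = -lmbda * q.mean() + F.mse_loss(pi, action)
actor_loss.backward()
```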

D4RL Benchmarks

You can follow the Wandb report links above for learning curves and details. Here, we report reproduced final ("last") and best scores. Note that the two can differ by a significant margin, and papers do not always make explicit which reporting methodology they chose. If you want to re-collect our results in a more structured/nuanced manner, see results.
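To make the distinction concrete, here is a tiny illustration (with made-up numbers) of how "last" and "best" scores can diverge for the same run:

```python
import numpy as np

# Hypothetical normalized scores at successive evaluation checkpoints.
eval_scores = np.array([10.2, 35.7, 64.3, 58.9, 60.4])

last_score = eval_scores[-1]    # "last": score of the final checkpoint
best_score = eval_scores.max()  # "best": max over all checkpoints

print(f"last = {last_score:.1f}, best = {best_score:.1f}")
# -> last = 60.4, best = 64.3: the two conventions disagree
```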

Offline

Last Scores

Gym-MuJoCo

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| **locomotion average** | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
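All numbers above are D4RL normalized scores, where 0 corresponds to a random policy and 100 to an expert policy. A sketch of the standard conversion, assuming the d4rl API (the raw return here is a made-up number):

```python
import gym
import d4rl  # noqa: F401 -- registers the D4RL environments

env = gym.make("halfcheetah-medium-v2")
raw_return = 5000.0  # hypothetical undiscounted episode return
# get_normalized_score maps raw returns onto the 0-1 random-to-expert scale;
# multiplying by 100 gives scores on the scale used in the table.
print(env.get_normalized_score(raw_return) * 100)
```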

Maze2d

| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± | | | | | |

View on GitHub
GitHub Stars: 1.3k
Category: Education
Updated: 1d ago
Forks: 165

Languages

Python

Security Score

100/100

Audited on Mar 30, 2026. No findings.