# CORL (Clean Offline Reinforcement Learning)
<img src="https://img.shields.io/badge/license-Apache_2.0-blue">
🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. It is heavily inspired by cleanrl for online RL; check them out too!
- 📜 Single-file implementation
- 📈 Benchmarked implementations of 13 algorithm variants
- 🖼 Weights and Biases integration
- ⭐ If you're interested in discrete control, make sure to check out our new library — Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.
## Getting started
```shell
git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

# alternatively, you could use Docker
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```
## Algorithms Implemented
| Algorithm | Variants Implemented | Wandb Report |
|-----------|----------------------|--------------|
| **Offline and Offline-to-Online** | | |
| ✅ Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL) | offline/cql.py <br /> finetune/cql.py | Offline <br /> Offline-to-online |
| ✅ Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC) | offline/awac.py <br /> finetune/awac.py | Offline <br /> Offline-to-online |
| ✅ Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL) | offline/iql.py <br /> finetune/iql.py | Offline <br /> Offline-to-online |
| **Offline-to-Online only** | | |
| ✅ Supported Policy Optimization for Offline Reinforcement Learning <br>(SPOT) | finetune/spot.py | Offline-to-online |
| ✅ Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning <br>(Cal-QL) | finetune/cal_ql.py | Offline-to-online |
| **Offline only** | | |
| ✅ Behavioral Cloning <br>(BC) | offline/any_percent_bc.py | Offline |
| ✅ Behavioral Cloning-10% <br>(BC-10%) | offline/any_percent_bc.py | Offline |
| ✅ A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC) | offline/td3_bc.py | Offline |
| ✅ Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT) | offline/dt.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(SAC-N) | offline/sac_n.py | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(EDAC) | offline/edac.py | Offline |
| ✅ Revisiting the Minimalist Approach to Offline Reinforcement Learning <br>(ReBRAC) | offline/rebrac.py | Offline |
| ✅ Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size <br>(LB-SAC) | offline/lb_sac.py | Offline Gym-MuJoCo |
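BC-10% in the table above is behavioral cloning trained on only the top 10% of dataset trajectories ranked by episode return. A minimal sketch of that filtering step (the function name and pure-Python structure are illustrative, not CORL's actual API):

```python
def top_fraction_indices(episode_returns, frac=0.1):
    """Return indices of the top `frac` episodes ranked by return.

    BC-%-style filtering: keep only the highest-return trajectories,
    then run plain behavioral cloning on the surviving transitions.
    """
    n = max(1, int(len(episode_returns) * frac))  # keep at least one episode
    ranked = sorted(range(len(episode_returns)),
                    key=lambda i: episode_returns[i],
                    reverse=True)
    return ranked[:n]

# e.g., keep the top 40% of five episodes by return
print(top_fraction_indices([1.0, 5.0, 3.0, 2.0, 4.0], frac=0.4))  # [1, 4]
```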
## D4RL Benchmarks
You can check the links above for learning curves and details. Here, we report reproduced final and best scores. Note that the two can differ by a significant margin, and papers do not always make explicit which reporting methodology they chose. If you want to re-collect our results in a more structured/nuanced manner, see results.
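The numbers in the tables below are D4RL normalized scores, where 0 corresponds to a random policy's return and 100 to an expert's. The mapping is a simple linear rescaling (the reference returns in the usage line are illustrative placeholders, not the official D4RL constants):

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Map a raw episode return onto the D4RL normalized scale.

    0  -> the environment's random-policy reference return
    100 -> the environment's expert-policy reference return
    Scores can fall below 0 or above 100 for worse/better policies.
    """
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# illustrative reference returns only (not D4RL's published constants)
score = d4rl_normalized_score(2500.0, random_return=-280.0, expert_return=12135.0)
```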
### Offline
#### Last Scores
##### Gym-MuJoCo
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| halfcheetah-medium-v2 | 42.40 ± 0.19 | 42.46 ± 0.70 | 48.10 ± 0.18 | 49.46 ± 0.62 | 47.04 ± 0.22 | 48.31 ± 0.22 | 64.04 ± 0.68 | 68.20 ± 1.28 | 67.70 ± 1.04 | 42.20 ± 0.26 |
| halfcheetah-medium-replay-v2 | 35.66 ± 2.33 | 23.59 ± 6.95 | 44.84 ± 0.59 | 44.70 ± 0.69 | 45.04 ± 0.27 | 44.46 ± 0.22 | 51.18 ± 0.31 | 60.70 ± 1.01 | 62.06 ± 1.10 | 38.91 ± 0.50 |
| halfcheetah-medium-expert-v2 | 55.95 ± 7.35 | 90.10 ± 2.45 | 90.78 ± 6.04 | 93.62 ± 0.41 | 95.63 ± 0.42 | 94.74 ± 0.52 | 103.80 ± 2.95 | 98.96 ± 9.31 | 104.76 ± 0.64 | 91.55 ± 0.95 |
| hopper-medium-v2 | 53.51 ± 1.76 | 55.48 ± 7.30 | 60.37 ± 3.49 | 74.45 ± 9.14 | 59.08 ± 3.77 | 67.53 ± 3.78 | 102.29 ± 0.17 | 40.82 ± 9.91 | 101.70 ± 0.28 | 65.10 ± 1.61 |
| hopper-medium-replay-v2 | 29.81 ± 2.07 | 70.42 ± 8.66 | 64.42 ± 21.52 | 96.39 ± 5.28 | 95.11 ± 5.27 | 97.43 ± 6.39 | 94.98 ± 6.53 | 100.33 ± 0.78 | 99.66 ± 0.81 | 81.77 ± 6.87 |
| hopper-medium-expert-v2 | 52.30 ± 4.01 | 111.16 ± 1.03 | 101.17 ± 9.07 | 52.73 ± 37.47 | 99.26 ± 10.91 | 107.42 ± 7.80 | 109.45 ± 2.34 | 101.31 ± 11.63 | 105.19 ± 10.08 | 110.44 ± 0.33 |
| walker2d-medium-v2 | 63.23 ± 16.24 | 67.34 ± 5.17 | 82.71 ± 4.78 | 66.53 ± 26.04 | 80.75 ± 3.28 | 80.91 ± 3.17 | 85.82 ± 0.77 | 87.47 ± 0.66 | 93.36 ± 1.38 | 67.63 ± 2.54 |
| walker2d-medium-replay-v2 | 21.80 ± 10.15 | 54.35 ± 6.34 | 85.62 ± 4.01 | 82.20 ± 1.05 | 73.09 ± 13.22 | 82.15 ± 3.03 | 84.25 ± 2.25 | 78.99 ± 0.50 | 87.10 ± 2.78 | 59.86 ± 2.73 |
| walker2d-medium-expert-v2 | 98.96 ± 15.98 | 108.70 ± 0.25 | 110.03 ± 0.36 | 49.41 ± 38.16 | 109.56 ± 0.39 | 111.72 ± 0.86 | 111.86 ± 0.43 | 114.93 ± 0.41 | 114.75 ± 0.74 | 107.11 ± 0.96 |
| **locomotion average** | 50.40 | 69.29 | 76.45 | 67.72 | 78.28 | 81.63 | 89.74 | 83.52 | 92.92 | 73.84 |
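The "locomotion average" row is the unweighted mean of the nine Gym-MuJoCo task means above; for example, for the BC column:

```python
# per-task last-score means for BC, copied from the table above
bc_last_scores = [42.40, 35.66, 55.95, 53.51, 29.81,
                  52.30, 63.23, 21.80, 98.96]

locomotion_average = sum(bc_last_scores) / len(bc_last_scores)
print(f"{locomotion_average:.2f}")  # 50.40, matching the table's BC average
```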
##### Maze2d
| Task-Name | BC | 10% BC | TD3+BC | AWAC | CQL | IQL | ReBRAC | SAC-N | EDAC | DT |
|-----------|----|--------|--------|------|-----|-----|--------|-------|------|----|
| maze2d-umaze-v1 | 0.36 ± 8.69 | 12.18 ± 4.29 | 29.41 ± 12.31 | 82.67 ± 28.30 | -8.90 ± | | | | | |
