Cleanrl
High-quality single-file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
CleanRL (Clean Implementation of RL Algorithms)
<img src="https://img.shields.io/badge/license-MIT-blue">
<img src="https://img.shields.io/discord/767863440248143916?label=discord">
<img src="https://img.shields.io/youtube/channel/views/UCDdC6BIFRI0jvcwuhi3aI6w?style=social">
<img src="https://img.shields.io/badge/%F0%9F%A4%97%20Models-Huggingface-F8D521">
CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch. The highlighted features of CleanRL are:
- 📜 Single-file implementation
- Every detail about an algorithm variant is put into a single standalone file.
- For example, our `ppo_atari.py` only has 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation to read for folks who do not wish to read an entire modular library.
- 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
- 📈 Tensorboard Logging
- 🪛 Local Reproducibility via Seeding
- 🎮 Videos of Gameplay Capturing
- 🧫 Experiment Management with Weights and Biases
- 💸 Cloud Integration with docker and AWS
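Local reproducibility via seeding comes down to seeding every random number generator a script draws from. The sketch below illustrates that idea under stated assumptions and is not CleanRL's own code: `seed_everything` is a hypothetical helper, NumPy and PyTorch are seeded only if they are installed, and environments need their own seed as well (in Gymnasium, through `env.reset(seed=...)`). Even with all of this, some GPU operations can remain nondeterministic.

```python
import random


def seed_everything(seed: int, torch_deterministic: bool = True) -> None:
    """Seed the RNGs a training script typically uses (hypothetical helper)."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np

        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)  # seeds the RNG on all devices (CPU and CUDA)
        # Ask cuDNN for deterministic kernels (can be slower)
        torch.backends.cudnn.deterministic = torch_deterministic
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


# Re-seeding with the same value reproduces the same random stream.
seed_everything(1)
first_draws = [random.random() for _ in range(3)]
seed_everything(1)
second_draws = [random.random() for _ in range(3)]
assert first_draws == second_draws
```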
You can read more about CleanRL in our JMLR paper and documentation.
Notable CleanRL-related projects:
- corl-team/CORL: Offline RL algorithms implemented in CleanRL style
- pytorch-labs/LeanRL: Fast optimized PyTorch implementation of CleanRL RL algorithms using CUDAGraphs.
ℹ️ Support for Gymnasium: Farama-Foundation/Gymnasium is the next generation of `openai/gym` that will continue to be maintained and introduce new features. Please see their announcement for further details. We are migrating to `gymnasium`, and the progress can be tracked in vwxyzjn/cleanrl#277.
⚠️ NOTE: CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience and you don't have to do a lot of subclassing, as is sometimes needed in modular DRL libraries).
Get started
Prerequisites:
- Python >=3.7.1,<3.11
- uv 0.7.9+
To run experiments locally, give the following a try:
```bash
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
uv pip install .
# alternatively, you could activate the virtual environment created by `uv venv`
# (`source .venv/bin/activate`) and run `python cleanrl/ppo.py`
uv run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

# open another terminal, `cd` into the repository root (where `runs/` is written), and run
tensorboard --logdir runs
```
To use experiment tracking with wandb, run
```bash
wandb login # only required for the first time
uv run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 \
    --track \
    --wandb-project-name cleanrltest
```
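Each `--flag` on these command lines maps to one hyperparameter of the script. Recent CleanRL scripts declare their hyperparameters as fields of an `Args` dataclass parsed with `tyro`; the sketch below is a hypothetical stand-in that uses the standard library's `argparse` instead, so it runs without extra dependencies. Its default values are illustrative, not CleanRL's.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical stand-in for a CleanRL script's command-line interface."""
    parser = argparse.ArgumentParser(description="PPO hyperparameters (illustrative sketch)")
    # Defaults below are illustrative, not CleanRL's actual defaults.
    parser.add_argument("--seed", type=int, default=1, help="seed of the experiment")
    parser.add_argument("--env-id", type=str, default="CartPole-v1", help="environment id")
    parser.add_argument("--total-timesteps", type=int, default=50000, help="total environment steps")
    parser.add_argument("--track", action="store_true", help="log the run to Weights and Biases")
    parser.add_argument("--wandb-project-name", type=str, default="cleanRL", help="W&B project name")
    # argparse turns "--env-id" into the attribute name "env_id", and so on.
    return parser.parse_args(argv)


args = parse_args([
    "--seed", "1",
    "--env-id", "CartPole-v0",
    "--total-timesteps", "50000",
    "--track",
    "--wandb-project-name", "cleanrltest",
])
if args.track:
    # A real script would start a W&B run here, e.g. wandb.init(project=args.wandb_project_name, ...)
    print(f"tracking {args.env_id} (seed={args.seed}) in project {args.wandb_project_name}")
```

Without `--track`, `args.track` stays `False` and no tracking run would be started, so the same script works with or without a Weights and Biases account.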
If you are not using uv, you can install CleanRL with requirements.txt:
# core dependencies
pip install -r requirements/requirements.txt
# optional dependencies
pip install -r requirements/requirements-atari.txt
pip install -r requirements/requirements-mujoco.txt
pip install -r requirements/requirements-mujoco_py.txt
pip install -r requirements/requirements-procgen.txt
pip install -r requirements/requirements-envpool.txt
pip install -r requirements/requirements-pettingzoo.txt
pip install -r requirements/requirements-jax.txt
pip install -r requirements/requirements-docs.txt
pip install -r requirements/requirements-cloud.txt
pip install -r requirements/requirements-memory_gym.txt
To run training scripts in other games:
```bash
uv venv
source .venv/bin/activate  # the bare `python` commands below run inside this venv

# classic control
uv pip install .
python cleanrl/dqn.py --env-id CartPole-v1
python cleanrl/ppo.py --env-id CartPole-v1
python cleanrl/c51.py --env-id CartPole-v1

# atari
uv pip install ".[atari]"
python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/sac_atari.py --env-id BreakoutNoFrameskip-v4

# NEW: 3-4x side-effect-free speedup with EnvPool's Atari (only available on Linux)
uv pip install ".[envpool]"
python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
# Learn Pong-v5 in ~5-10 mins
# Side effects such as lower sample efficiency might occur
uv run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

# procgen
uv pip install ".[procgen]"
python cleanrl/ppo_procgen.py --env-id starpilot
python cleanrl/ppg_procgen.py --env-id starpilot

# ppo + lstm
uv pip install ".[atari]"
python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4
```
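The flags in the Pong-v5 command are related. In CleanRL's PPO scripts, each rollout collects `num_envs * num_steps` transitions, which are split into `num_minibatches` minibatches and revisited for `update_epochs` passes; `--clip-coef` is the ε in PPO's clipped surrogate objective, E[min(r·A, clip(r, 1-ε, 1+ε)·A)]. The dependency-free sketch below mirrors that arithmetic and the per-sample clipped loss, using plain Python lists in place of tensors; `ppo_batch_sizes` and `clipped_policy_loss` are hypothetical helper names.

```python
def ppo_batch_sizes(num_envs: int, num_steps: int, num_minibatches: int, update_epochs: int):
    """Return (batch_size, minibatch_size, gradient_steps_per_rollout)."""
    batch_size = num_envs * num_steps  # transitions collected per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    gradient_steps = update_epochs * num_minibatches  # optimizer steps per rollout
    return batch_size, minibatch_size, gradient_steps


def clipped_policy_loss(ratios, advantages, clip_coef: float = 0.2) -> float:
    """PPO clipped surrogate objective, negated so that it can be minimized.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] is its advantage estimate.
    """
    losses = []
    for ratio, adv in zip(ratios, advantages):
        unclipped = -adv * ratio
        clipped = -adv * min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # max of the negated terms == min of the surrogate terms (a pessimistic bound)
        losses.append(max(unclipped, clipped))
    return sum(losses) / len(losses)


# Settings from the Pong-v5 command
print(ppo_batch_sizes(num_envs=16, num_steps=128, num_minibatches=8, update_epochs=3))  # → (2048, 256, 24)
# Ratio 1.5 with a positive advantage is clipped to 1.2;
# ratio 0.5 with a negative advantage is clipped to 0.8.
print(clipped_policy_loss([1.5, 0.5], [1.0, -1.0], clip_coef=0.2))
```

Clipping removes the incentive to push the probability ratio far outside [1-ε, 1+ε] in a single update, which is what lets PPO reuse the same rollout for several epochs.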
You may also use a prebuilt development environment hosted in Gitpod.
Algorithms Implemented
| Algorithm | Variants Implemented |
| ----------- | ----------- |
| ✅ Proximal Policy Optimization (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py, docs |
| | ppo_atari_envpool.py, docs |
| | ppo_atari_envpool_xla_jax.py, docs |
| | ppo_atari_envpool_xla_jax_scan.py, docs |
| | ppo_procgen.py, docs |
| | ppo_atari_multigpu.py, docs |
| | ppo_pettingzoo_ma_atari.py, docs |
| | ppo_continuous_action_isaacgym.py, docs |
| | ppo_trxl.py, docs |
| ✅ Deep Q-Learning (DQN) | dqn.py, docs |
| | dqn_atari.py, docs |
| | dqn_jax.py, docs |
| | dqn_atari_jax.py, docs |
| ✅ Categorical DQN (C51) | c51.py, docs |
| | [c51_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/c51_atar
