RLs
Reinforcement Learning Algorithms Based on PyTorch
This project includes SOTA and classic reinforcement learning algorithms (single- and multi-agent) for training agents that interact with Unity through ml-agents Release 18 or with gym.
About
The goal of this framework is to provide stable implementations of standard RL algorithms and simultaneously enable fast prototyping of new methods. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).
Characteristics
This project supports:
- Suitable for Windows, Linux, and macOS.
- Single- and multi-agent training.
- Multiple types of observation sensors as input.
- Only 3 steps are needed to implement a new algorithm:
  - policy: write a `.py` file in the `rls/algorithms/{single,multi}` directory and make the policy inherit from the base class defined in `rls/algorithms/base`
  - config: write a `.yaml` file in the `rls/configs/algorithms/` directory and specify the super config type defined in `rls/configs/algorithms/general.yaml`
  - register: register the new algorithm in `rls/algorithms/__init__.py`
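The policy/config/register pattern can be sketched as follows. Note that `SarlPolicy`, `ALGO_REGISTRY`, and `register` below are illustrative stand-ins, not the actual RLs API; in the real repo the base class lives in `rls/algorithms/base` and registration happens in `rls/algorithms/__init__.py`:

```python
# Hypothetical sketch of the "subclass a base policy, then register it" pattern.
ALGO_REGISTRY = {}  # stands in for the registry in rls/algorithms/__init__.py

def register(name):
    """Decorator that maps a command-line algorithm name to a policy class."""
    def wrapper(cls):
        ALGO_REGISTRY[name] = cls
        return cls
    return wrapper

class SarlPolicy:
    """Stand-in for the single-agent base policy class."""
    def __init__(self, **config):
        self.config = config

    def select_action(self, obs):
        raise NotImplementedError

@register("my_algo")
class MyAlgo(SarlPolicy):
    """A new algorithm only has to inherit and fill in the decision logic."""
    def select_action(self, obs):
        return 0  # trivial placeholder policy

# Training code can now look the algorithm up by its registered name:
policy = ALGO_REGISTRY["my_algo"](lr=3e-4)
```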
- Only 3 steps are needed to adapt to a new training environment:
  - wrapper: write environment wrappers in the `rls/envs/{new platform}` directory and make them inherit from the base class defined in `rls/envs/env_base.py`
  - config: write the default configuration in `rls/configs/{new platform}`
  - register: register the new environment platform in `rls/envs/__init__.py`
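The environment side follows the same shape: translate a platform's native API into one common interface. The names below (`EnvBase`, `ENV_REGISTRY`, `MyPlatformEnv`) are illustrative stand-ins for the base class in `rls/envs/env_base.py` and the registry in `rls/envs/__init__.py`:

```python
# Hypothetical sketch of wrapping a new platform behind a common interface.
ENV_REGISTRY = {}  # stands in for the registry in rls/envs/__init__.py

class EnvBase:
    """Stand-in for the abstract environment base class."""
    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError

class MyPlatformEnv(EnvBase):
    """Wrapper that exposes a new platform through the common interface."""
    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0]  # initial observation

    def step(self, action):
        self._t += 1
        obs, reward, done = [float(self._t)], 1.0, self._t >= 3
        return obs, reward, done

ENV_REGISTRY["my_platform"] = MyPlatformEnv

env = ENV_REGISTRY["my_platform"]()
start = env.reset()
obs, reward, done = env.step(0)
```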
- Compatible with several environment platforms:
  - Unity3D ml-agents
  - PettingZoo
  - gym; for now only two data types are compatible: [Box, Discrete]. Parallel training with gym envs is supported: just set `--copies` to the number of agents you want to train in parallel.
    - environments:
      - MuJoCo (v2.0.2.13)
      - PyBullet
      - gym_minigrid
- observation -> action:
- Discrete -> Discrete (observation type -> action type)
- Discrete -> Box
- Box -> Discrete
- Box -> Box
- Box/Discrete -> Tuple(Discrete, Discrete, Discrete)
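Conceptually, `--copies` amounts to stepping several identical environment instances in lockstep, one action per copy, with the batched results fed back to the learner. A toy sketch of that idea (not RLs' actual implementation; `ToyEnv` is a hypothetical stand-in for a gym env):

```python
import random

class ToyEnv:
    """Minimal stand-in for a gym env with Box observations and Discrete actions."""
    def reset(self):
        return [random.random()]

    def step(self, action):
        obs, reward, done = [random.random()], float(action), False
        return obs, reward, done

def make_copies(n):
    """Create n independent copies of the environment (the --copies idea)."""
    return [ToyEnv() for _ in range(n)]

envs = make_copies(4)                              # --copies 4
observations = [e.reset() for e in envs]           # batched reset
# one synchronized step: one action per copy, batched results back
results = [e.step(a) for e, a in zip(envs, [1, 0, 1, 0])]
rewards = [r for _, r, _ in results]
print(rewards)  # -> [1.0, 0.0, 1.0, 0.0]
```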
- Four types of Replay Buffer; the default is ER.
- Noisy Net for better exploration.
- Intrinsic Curiosity Module for almost all off-policy algorithms implemented.
- Parallel training of multiple scenes for gym
- Unified data format
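The default ER buffer is plain uniform-sampling experience replay: store transitions in a bounded buffer and sample random minibatches for off-policy updates. A minimal self-contained version of that idea (not RLs' actual buffer class):

```python
import random
from collections import deque

class ExperienceReplay:
    """Uniform experience replay: bounded FIFO storage, random minibatch sampling."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off first

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # uniform sampling without replacement within one minibatch
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

er = ExperienceReplay(capacity=100)
for t in range(10):
    er.add((t, 0, 1.0, t + 1, False))  # (state, action, reward, next_state, done)
batch = er.sample(4)
```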
Installation
method 1:
```bash
$ git clone https://github.com/StepNeverStop/RLs.git
$ cd RLs
$ conda create -n rls python=3.8
$ conda activate rls
# Windows
$ pip install -e .[windows]
# Linux or Mac OS
$ pip install -e .
```
method 2:

```bash
$ conda env create -f environment.yaml
```
If using ml-agents:

```bash
$ pip install -e .[unity]
```
You can pull the prebuilt Docker image:

```bash
$ docker pull keavnn/rls:latest
```
If you want to send a PR, please format all code files first:

```bash
$ pip install -e .[pr]
$ python auto_format.py -d ./
```
Implemented Algorithms
For now, these algorithms are available:
- Multi-Agent training algorithms:
- Independent-SARL, i.e. IQL, I-DQN, etc.
- Value-Decomposition Networks, VDN
- Monotonic Value Function Factorisation Networks, QMIX
- Multi-head Attention based Q-value Mixing Network, Qatten
- Learning to Factorize with Transformation, QTRAN
- Duplex Dueling Multi-Agent Q-Learning, QPLEX
- Multi-Agent Deep Deterministic Policy Gradient, MADDPG
- Single-Agent training algorithms (some algorithms that only support continuous action spaces use the Gumbel-Softmax trick to implement discrete versions, e.g. DDPG):
- Policy Gradient, PG
- Actor Critic, AC
- Synchronous Advantage Actor Critic, A2C
- :boom:Proximal Policy Optimization, PPO, DPPO
- Trust Region Policy Optimization, TRPO
- Natural Policy Gradient, NPG
- Deterministic Policy Gradient, DPG
- Deep Deterministic Policy Gradient, DDPG
- :fire:Soft Actor Critic, SAC, Discrete SAC
- Tsallis Actor Critic, TAC
- :fire:Twin Delayed Deep Deterministic Policy Gradient, TD3
- Deep Q-learning Network, DQN, 2013 , 2015
- Double Deep Q-learning Network, DDQN
- Dueling Double Deep Q-learning Network, DDDQN
- Deep Recurrent Q-learning Network, DRQN
- Deep Recurrent Double Q-learning, DRDQN
- Categorical DQN (51 atoms), C51
- Quantile Regression DQN, QR-DQN
- Implicit Quantile Networks, IQN
- Rainbow DQN
- MaxSQN
- Soft Q-Learning, SQL
- Bootstrapped DQN
- Averaged DQN
- Hierarchical training algorithms:
- Model-based algorithms:
- Offline algorithms (under implementation):
- Conservative Q-Learning for Offline Reinforcement Learning, CQL
- BCQ
- Benchmarking Batch Deep Reinforcement Learning Algorithms, Discrete
- Off-Policy Deep Reinforcement Learning without Exploration, Continuous
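The Gumbel trick mentioned above (here in its hard, argmax "Gumbel-max" form, the sampling step underlying Gumbel-Softmax) can be sketched with the standard library alone: adding i.i.d. Gumbel(0, 1) noise to the logits and taking the argmax draws an exact sample from the softmax distribution, which is how a continuous-action algorithm can emit discrete actions.

```python
import math
import random

def gumbel_max_sample(logits):
    """Draw a discrete sample from softmax(logits) via the Gumbel-max trick:
    perturb each logit with Gumbel(0, 1) noise, then take the argmax."""
    noisy = [l - math.log(-math.log(random.random())) for l in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)

random.seed(0)
logits = [2.0, 0.5, -1.0]
counts = [0, 0, 0]
for _ in range(5000):
    counts[gumbel_max_sample(logits)] += 1
# samples concentrate on the highest logit, matching the softmax probabilities
print(counts.index(max(counts)))  # -> 0
```

In Gumbel-Softmax proper, the hard argmax is relaxed to a temperature-controlled softmax so gradients can flow through the sampling step.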
| Algorithms | Discrete | Continuous | Image | RNN | Command parameter |
| :-----------------------------: | :------: | :--------: | :---: | :--: | :---------------: |
| PG | ✓ | ✓ | ✓ | ✓ | pg |
| AC | ✓ | ✓ | ✓ | ✓ | ac |
| A2C | ✓ | ✓ | ✓ | ✓ | a2c |
| NPG | ✓ | ✓ | ✓ | ✓ | npg |
| TRPO | ✓ | ✓ | ✓ | ✓ | trpo |
| PPO | ✓ | ✓ | ✓ | ✓ | ppo |
| DQN | ✓ | | ✓ | ✓ | dqn |
| Double DQN | ✓ | | ✓ | ✓ | ddqn |
| Dueling Double DQN | ✓ | | ✓ | ✓ | dddqn |
| Averaged DQN | ✓ | | ✓ | ✓ | averaged_dqn |
| Bootstrapped DQN | ✓ | | ✓ | ✓ | bootstrappeddqn |
| Soft Q-Learning | ✓ | | ✓ | ✓ | sql |
| C51 | ✓ | | ✓ | ✓ | c51 |
| QR-DQN | ✓ | | ✓ | ✓ | qrdqn |
| IQN | ✓ | | ✓ | ✓ | iqn |
| Rainbow | ✓ | | ✓ | ✓ | rainbow |
| DPG | ✓ | ✓ | ✓ | ✓ | dpg |
| DDPG | ✓ | ✓ | ✓ | ✓ | ddpg |