ElegantRL
Massively Parallel Deep Reinforcement Learning.
ElegantRL “小金屋”: Massively Parallel Deep Reinforcement Learning
<br/>
<a href="https://github.com/AI4Finance-Foundation/ElegantRL" target="_blank">
<div align="center">
<img src="figs/icon.jpg" width="40%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>

ElegantRL is a lightweight and structurally clean reinforcement learning framework designed to express core RL algorithms with minimal complexity and maximal clarity.
The name “Elegant” reflects its philosophy: small in dependency footprint, yet elegant in code structure. The framework avoids unnecessary third-party libraries while maintaining modular design, mathematical transparency, and engineering readability.
ElegantRL focuses on implementing reinforcement learning algorithms in their pure form (clear, extensible, and efficient) without sacrificing performance or simplicity.
ElegantRL (website) offers users and developers the following advantages:
- Cloud-native: follows a cloud-native paradigm through micro-service architecture and containerization, and supports ElegantRL-Podracer and FinRL-Podracer.
- Scalable: fully exploits the parallelism of DRL algorithms, making it easy to scale out to hundreds or thousands of computing nodes on a cloud platform, for example a DGX SuperPOD platform with thousands of GPUs.
- Elastic: allocates computing resources on the cloud elastically and automatically.
- Lightweight: the core code has fewer than 1,000 lines (see ElegantRL-Helloworld).
- Efficient: in many test cases (e.g., single-GPU, multi-GPU, GPU-cloud), we find it more efficient than Ray RLlib.
- Stable: much more stable than Stable Baselines3, thanks to methods such as the Hamiltonian term.
- Practical: used in multiple projects (FinRL, FinRL-Meta, etc.).
- Massively parallel: simulations are used in multiple projects (FinRL, etc.), and the sampling speed is high because many GPU-based environments can be built.
ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:
- DDPG, TD3, SAC, PPO, REDQ for continuous actions in single-agent environments,
- DQN, Double DQN, D3QN for discrete actions in single-agent environments,
- QMIX, VDN, MADDPG, MAPPO, MATD3 in multi-agent environments.
For more details of DRL algorithms, please refer to the educational webpage OpenAI Spinning Up.
ElegantRL supports the following simulators:
- Isaac Gym for massively parallel simulations,
- OpenAI Gym, MuJoCo, PyBullet, FinRL for benchmarking.
Tutorials
- [Towardsdatascience] A New Era of Massively Parallel Simulation: A Practical Tutorial Using ElegantRL, Nov. 2, 2022.
- [MLearning.ai] ElegantRL: Much More Stable Deep Reinforcement Learning Algorithms than Stable-Baseline3, Mar. 3, 2022.
- [Towardsdatascience] ElegantRL-Podracer: A Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning, Dec. 11, 2021.
- [Towardsdatascience] ElegantRL: Mastering PPO Algorithms, May 3, 2021.
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II), Apr. 19, 2021.
- [MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I), Mar. 28, 2021.
- [Towardsdatascience] ElegantRL-Helloworld: A Lightweight and Stable Deep Reinforcement Learning Library, Mar. 4, 2021.
ElegantRL-Helloworld
<div align="center"> <img align="center" src=figs/File_structure.png width="800"> </div>

For beginners, we maintain ElegantRL-Helloworld as a tutorial. Its goal is to give hands-on experience with ElegantRL.
- Run the tutorial code and learn about RL algorithms in this order: DQN -> DDPG -> PPO
- Submit suggestions for ElegantRL-Helloworld as a GitHub issue.
One sentence summary: an agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).
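The one-sentence summary above can be illustrated with a minimal, single-file sketch. It is not ElegantRL code: `ChainEnv`, `TabularActorCritic`, and `train` are hypothetical names, and lookup tables stand in for the actor and critic networks so that the example runs with the standard library only. It shows only how the four roles (environment, networks, agent update, training loop) fit together.

```python
import math
import random


class ChainEnv:
    """Toy stand-in for env.py: start at cell 0, reach the last cell for reward 1."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


class TabularActorCritic:
    """Tables stand in for the networks of net.py; update() plays the role of agent.py."""

    def __init__(self, n_states, n_actions=2, lr_actor=0.1, lr_critic=0.2, gamma=0.95):
        self.logits = [[0.0] * n_actions for _ in range(n_states)]  # actor
        self.values = [0.0] * n_states  # critic
        self.lr_actor, self.lr_critic, self.gamma = lr_actor, lr_critic, gamma

    def policy(self, state):
        # Softmax over the actor's logits for this state.
        top = max(self.logits[state])
        exps = [math.exp(x - top) for x in self.logits[state]]
        total = sum(exps)
        return [e / total for e in exps]

    def select_action(self, state, rng):
        probs = self.policy(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state, done):
        # One-step TD error drives both the critic and the actor.
        bootstrap = 0.0 if done else self.gamma * self.values[next_state]
        td_error = reward + bootstrap - self.values[state]
        self.values[state] += self.lr_critic * td_error
        probs = self.policy(state)
        for a, prob in enumerate(probs):
            grad_log_prob = (1.0 if a == action else 0.0) - prob
            self.logits[state][a] += self.lr_actor * td_error * grad_log_prob


def train(env, agent, episodes=300, max_steps=50, seed=0):
    """Stand-in for run.py: the agent interacts with the env and learns online."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.select_action(state, rng)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent


env = ChainEnv(length=5)
agent = train(env, TabularActorCritic(n_states=env.length))
```

After training, `agent.policy(state)` returns the action probabilities for a cell; in this toy problem the probability of moving right next to the goal should rise above one half.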
File Structure
- elegantrl # main folder
  - agents # a collection of DRL algorithms
    - AgentXXX.py # a collection of one kind of DRL algorithms
    - net.py # a collection of network architectures
  - envs # a collection of environments
    - XxxEnv.py # a training environment for RL
  - train # a collection of training programs
    - demo.py # a collection of demos
    - config.py # configurations (hyper-parameter)
    - run.py # training loop
    - worker.py # the worker class (explores the env, saving the data to replay buffer)
    - learner.py # the learner class (update the networks, using the data in replay buffer)
    - evaluator.py # the evaluator class (evaluate the cumulative rewards of policy network)
    - replay_buffer.py # the buffer class (save sequences of transitions for training)
- elegantrl_helloworld # tutorial version
  - config.py # configurations (hyper-parameter)
  - agent.py # DRL algorithms
  - net.py # network architectures
  - run.py # training loop
  - env.py # environments for RL training
- examples # a collection of example codes
- ready-to-run Google-Colab notebooks
  - quickstart_Pendulum_v1.ipynb
  - tutorial_BipedalWalker_v3.ipynb
  - tutorial_Creating_ChasingVecEnv.ipynb
  - tutorial_LunarLanderContinuous_v2.ipynb
- unit_tests # a collection of tests
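The division of labor described for `worker.py`, `learner.py`, `evaluator.py`, and `replay_buffer.py` can be sketched as follows. This is a simplified illustration under stated assumptions, not the classes in the repository: every name here (`ReplayBuffer`, `worker_explore`, `learner_update`, `evaluate`, `chain_step`) is hypothetical, and a tabular Q-learning update is swapped in for a neural-network learner so that the example needs no third-party packages.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def chain_step(state, action, goal=3):
    """Toy environment: action 1 moves right, action 0 moves left; the goal ends the episode."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    done = next_state == goal
    return next_state, (1.0 if done else 0.0), done


def worker_explore(env_step, policy, buffer, state, n_steps):
    """Worker role: act in the environment and save transitions to the buffer."""
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward, done = env_step(state, action)
        buffer.add((state, action, reward, next_state, done))
        state = 0 if done else next_state  # restart from cell 0 after an episode ends
    return state


def learner_update(q_table, buffer, rng, batch_size=32, lr=0.1, gamma=0.9):
    """Learner role: update the value estimates from a batch sampled from the buffer."""
    for state, action, reward, next_state, done in buffer.sample(batch_size, rng):
        target = reward if done else reward + gamma * max(q_table[next_state])
        q_table[state][action] += lr * (target - q_table[state][action])


def evaluate(env_step, q_table, max_steps=20):
    """Evaluator role: cumulative reward of the greedy policy starting from cell 0."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = max(range(len(q_table[state])), key=lambda a: q_table[state][a])
        state, reward, done = env_step(state, action)
        total += reward
        if done:
            break
    return total


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
q_table = [[0.0, 0.0] for _ in range(4)]
state = 0
for _ in range(200):
    # Alternate exploration and learning; a parallel setup would run these concurrently.
    state = worker_explore(chain_step, lambda s: rng.randrange(2), buffer, state, n_steps=10)
    learner_update(q_table, buffer, rng)
score = evaluate(chain_step, q_table)
```

Keeping exploration, learning, and evaluation behind separate functions that share only the buffer and the parameters is what makes it possible to run them as separate processes; the sketch runs them in turn for simplicity.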
Experimental Demos
More efficient than Ray RLlib
Experiments on Ant (MuJoCo), Humanoid (MuJoCo), Ant (Isaac Gym), and Humanoid (Isaac Gym), from left to right:
<div align="center"> <img align="center" src=figs/envs.png width="800"> <img align="center" src=figs/performance1.png width="800"> <img align="center" src=figs/performance2.png width="800"> </div>

ElegantRL fully supports Isaac Gym, which runs massively parallel simulations (e.g., 4096 sub-envs) on one GPU.
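The idea of stepping many sub-environments with a single call can be sketched in plain Python. This is a CPU toy, not the Isaac Gym or ElegantRL API: `ToyVecEnv` and `collect` are hypothetical names, and the per-environment lists would be GPU tensors updated in one batched operation in a real massively parallel simulator.

```python
import random


class ToyVecEnv:
    """Hypothetical batch of identical counter environments stepped in lockstep.

    Each sub-env adds its action (0 or 1) to a counter; an episode ends when the
    counter reaches `target`. Finished sub-envs reset automatically, so the batch
    size never changes.
    """

    def __init__(self, num_envs, target=3):
        self.num_envs = num_envs
        self.target = target
        self.counters = [0] * num_envs

    def reset(self):
        self.counters = [0] * self.num_envs
        return list(self.counters)

    def step(self, actions):
        if len(actions) != self.num_envs:
            raise ValueError("need exactly one action per sub-env")
        rewards, dones = [], []
        for i, action in enumerate(actions):
            self.counters[i] += action
            done = self.counters[i] >= self.target
            rewards.append(1.0 if done else 0.0)
            dones.append(done)
            if done:
                self.counters[i] = 0  # auto-reset this sub-env only
        return list(self.counters), rewards, dones


def collect(vec_env, horizon, rng):
    """Run a random policy for `horizon` batched steps and count what was gathered."""
    states = vec_env.reset()
    transitions, episodes = 0, 0
    for _ in range(horizon):
        actions = [rng.randrange(2) for _ in states]
        states, rewards, dones = vec_env.step(actions)
        transitions += len(states)
        episodes += sum(dones)
    return transitions, episodes


vec_env = ToyVecEnv(num_envs=4096)
transitions, episodes = collect(vec_env, horizon=10, rng=random.Random(0))
```

Each call to `step` yields one transition per sub-environment, so ten batched steps over 4096 sub-envs produce 40,960 transitions; this multiplication is where the high sampling speed comes from.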
More stable than Stable Baselines3
Experiment on Hopper-v2: ElegantRL achieves much smaller variance (average over 8 runs).
Also, PPO+H in ElegantRL completed the training process of 5M samples about 6x faster than Stable Baselines3.
<div align="center"> <img align="center" src=figs/SB3_vs_ElegantRL.png width="640"> </div>

Testing and Contributing
Our tests are written with the built-in unittest Python module for easy access. In order to run a specific test file (for example, test_training_agents.py), use the following command from the root directory:
python -m unittest unit_tests/test_training_agents.py
In order to run all the tests sequentially, you can use the following command:
python -m unittest discover
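For contributors new to `unittest`, a test file that these commands could run might look like the sketch below. It is illustrative only: `discounted_return` and both test classes are hypothetical and do not come from the repository's `unit_tests` folder. The `skipUnless` guard shows one standard way to keep a test that needs an optional package from running when that package is absent, assuming Isaac Gym is importable as `isaacgym`; whether the project's own tests use such a guard is not stated here.

```python
import importlib.util
import unittest

# Assumption: Isaac Gym's Python package is importable under the name `isaacgym`.
HAS_ISAAC_GYM = importlib.util.find_spec("isaacgym") is not None


def discounted_return(rewards, gamma):
    """Hypothetical helper under test: sum of gamma**t * rewards[t]."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class TestDiscountedReturn(unittest.TestCase):
    def test_known_value(self):
        # 1 + 0.5 * (1 + 0.5 * 1) = 1.75
        self.assertAlmostEqual(discounted_return([1.0, 1.0, 1.0], 0.5), 1.75)

    def test_empty_episode(self):
        self.assertEqual(discounted_return([], 0.9), 0.0)


@unittest.skipUnless(HAS_ISAAC_GYM, "Isaac Gym is not installed")
class TestNeedsIsaacGym(unittest.TestCase):
    def test_package_is_importable(self):
        # Only runs when the optional dependency was found above.
        self.assertIsNotNone(importlib.util.find_spec("isaacgym"))
```

With the default discovery pattern, `python -m unittest discover` collects files whose names match `test*.py`, and a skipped class is reported as skipped rather than as a failure.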
Please note that some of the tests require Isaac Gym to be installed on your system. If it is not, any tests related to Isaac Gym will fa
