
ElegantRL

Massively Parallel Deep Reinforcement Learning. 🔥


<div align="center"> <img align="center" width="30%" alt="image" src="https://github.com/AI4Finance-Foundation/FinGPT/assets/31713746/e0371951-1ce1-488e-aa25-0992dafcc139"> </div>

ElegantRL “小雅”: Massively Parallel Deep Reinforcement Learning


<br/> <a href="https://github.com/AI4Finance-Foundation/ElegantRL" target="_blank"> <div align="center"> <img src="figs/icon.jpg" width="40%"/> </div> <!-- <div align="center"><caption>Slack Invitation Link</caption></div> --> </a> <br/>

ElegantRL is a lightweight and structurally clean reinforcement learning framework designed to express core RL algorithms with minimal complexity and maximal clarity.

The name “Elegant” reflects its philosophy: small in dependency footprint, yet elegant in code structure. The framework avoids unnecessary third-party libraries while maintaining modular design, mathematical transparency, and engineering readability.

ElegantRL focuses on implementing reinforcement learning algorithms in their pure form (clear, extensible, and efficient) without sacrificing performance or simplicity.


ElegantRL (website) offers users and developers the following advantages:

  • Cloud-native: follows a cloud-native paradigm through micro-service architecture and containerization, and supports ElegantRL-Podracer and FinRL-Podracer.

  • Scalable: fully exploits the parallelism of DRL algorithms, making it easy to scale out to hundreds or thousands of computing nodes on a cloud platform, for example a DGX SuperPOD platform with thousands of GPUs.

  • Elastic: allocates computing resources on the cloud elastically and automatically.

  • Lightweight: the core code has fewer than 1,000 lines (see Elegantrl_Helloworld).

  • Efficient: in many test cases (e.g., single-GPU, multi-GPU, GPU-cloud), we find it more efficient than Ray RLlib.

  • Stable: more stable than Stable Baselines 3, thanks to methods such as the Hamiltonian term.

  • Practical: used in multiple projects (FinRL, FinRL-Meta, etc.).

  • Massively parallel: simulations run in many GPU-based environments at once, which gives a high sampling speed; this is used in multiple projects (FinRL, etc.).

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

  • DDPG, TD3, SAC, PPO, REDQ for continuous actions in single-agent environments,
  • DQN, Double DQN, D3QN for discrete actions in single-agent environments,
  • QMIX, VDN, MADDPG, MAPPO, MATD3 in multi-agent environments.

For more details on DRL algorithms, please refer to the educational webpage OpenAI Spinning Up.
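As a small illustration of how two of the listed algorithms differ, the sketch below compares the bootstrap target of DQN with that of Double DQN. It is a plain-Python sketch and not ElegantRL's implementation: the function names and the Q-values are hypothetical, chosen only to make the difference visible.

```python
def dqn_target(reward, done, gamma, q_target_next):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values of the next state for three discrete actions.
q_online_next = [1.0, 2.5, 2.0]
q_target_next = [1.2, 1.8, 3.0]

y_dqn = dqn_target(1.0, False, 0.9, q_target_next)
y_ddqn = double_dqn_target(1.0, False, 0.9, q_online_next, q_target_next)
```

With these numbers the DQN target bootstraps from the largest target-network value, while the Double DQN target bootstraps from the target-network value of the action the online network prefers, which is the mechanism intended to reduce overestimation.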

ElegantRL supports the following simulators:

  • Isaac Gym for massively parallel simulations,
  • OpenAI Gym, MuJoCo, PyBullet, FinRL for benchmarking.

Contents

Tutorials

ElegantRL-Helloworld

<div align="center"> <img align="center" src=figs/File_structure.png width="800"> </div>

For beginners, we maintain ElegantRL-Helloworld as a tutorial. Its goal is to give you hands-on experience with ElegantRL.

One sentence summary: an agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).
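The one-sentence summary can be sketched in a few dozen lines of standard-library Python. This is a toy, not the code in ElegantRL-Helloworld: the class names, the chain environment, and the hyper-parameters are all hypothetical, and lookup tables stand in for the Actor-Critic networks so that no deep learning library is needed.

```python
import math
import random


class ChainEnv:  # plays the role of env.py (hypothetical toy task)
    """Walk on states 0..4; reaching state 4 gives reward 1 and ends the episode."""
    n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done


class ActorCritic:  # plays the role of net.py (tables stand in for networks)
    def __init__(self, n_states, n_actions):
        self.logits = [[0.0] * n_actions for _ in range(n_states)]  # actor
        self.values = [0.0] * n_states  # critic

    def policy(self, state):
        exps = [math.exp(logit) for logit in self.logits[state]]
        total = sum(exps)
        return [e / total for e in exps]


class Agent:  # plays the role of agent.py
    def __init__(self, net, gamma=0.95, lr=0.1):
        self.net, self.gamma, self.lr = net, gamma, lr

    def select_action(self, state, rng):
        probs = self.net.policy(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state, done):
        net = self.net
        target = reward + (0.0 if done else self.gamma * net.values[next_state])
        td_error = target - net.values[state]
        probs = net.policy(state)
        net.values[state] += self.lr * td_error  # critic: TD(0) step
        for a in range(len(probs)):  # actor: policy-gradient step weighted by the TD error
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            net.logits[state][a] += self.lr * td_error * grad_log


def run(episodes=300, max_steps=50, seed=0):  # plays the role of run.py
    rng = random.Random(seed)
    env = ChainEnv()
    agent = Agent(ActorCritic(env.n_states, env.n_actions))
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.select_action(state, rng)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent


agent = run()
```

After training, `agent.net.policy(3)` should put most of its probability on moving right, since that action reaches the rewarding state directly.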

File Structure

  • elegantrl # main folder

    • agents # a collection of DRL algorithms
      • AgentXXX.py # a collection of one kind of DRL algorithms
      • net.py # a collection of network architectures
    • envs # a collection of environments
      • XxxEnv.py # a training environment for RL
    • train # a collection of training programs
      • demo.py # a collection of demos
      • config.py # configurations (hyper-parameter)
      • run.py # training loop
      • worker.py # the worker class (explores the env, saving the data to replay buffer)
      • learner.py # the learner class (update the networks, using the data in replay buffer)
      • evaluator.py # the evaluator class (evaluate the cumulative rewards of policy network)
      • replay_buffer.py # the buffer class (save sequences of transitions for training)
  • elegantrl_helloworld # tutorial version

    • config.py # configurations (hyper-parameter)
    • agent.py # DRL algorithms
    • net.py # network architectures
    • run.py # training loop
    • env.py # environments for RL training
  • examples # a collection of example codes

  • ready-to-run Google-Colab notebooks

    • quickstart_Pendulum_v1.ipynb
    • tutorial_BipedalWalker_v3.ipynb
    • tutorial_Creating_ChasingVecEnv.ipynb
    • tutorial_LunarLanderContinuous_v2.ipynb
  • unit_tests # a collection of tests
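To make the worker, learner, and replay-buffer roles in the file structure more concrete, here is a minimal standard-library sketch. It does not reproduce the classes in worker.py, learner.py, or replay_buffer.py: every name below is hypothetical, a random walk stands in for a real environment and policy, and the "update" only averages rewards where a real learner would take a gradient step.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def worker_explore(buffer, n_steps, rng):
    """Worker role: generate transitions and save them to the buffer."""
    state = 0.0
    for _ in range(n_steps):
        action = rng.uniform(-1.0, 1.0)  # random policy as a placeholder
        next_state = state + action
        reward = -abs(next_state)
        buffer.add((state, action, reward, next_state, False))
        state = next_state


def learner_update(buffer, batch_size, rng):
    """Learner role: draw a mini-batch from the buffer and compute a statistic on it."""
    batch = buffer.sample(batch_size, rng)
    return sum(reward for _, _, reward, _, _ in batch) / batch_size


rng = random.Random(0)
buffer = ReplayBuffer(capacity=100)
worker_explore(buffer, n_steps=250, rng=rng)
mean_reward = learner_update(buffer, batch_size=32, rng=rng)
```

Keeping exploration and updating behind separate functions mirrors the split described above: the two sides only share the buffer, so either can be scaled or replaced independently.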

Experimental Demos

More efficient than Ray RLlib

Experiments on Ant (MuJoCo), Humanoid (MuJoCo), Ant (Isaac Gym), Humanoid (Isaac Gym) # from left to right

<div align="center"> <img align="center" src=figs/envs.png width="800"> <img align="center" src=figs/performance1.png width="800"> <img align="center" src=figs/performance2.png width="800"> </div>

ElegantRL fully supports Isaac Gym, which runs massively parallel simulations (e.g., 4096 sub-envs) on one GPU.
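The batched interface behind "many sub-envs stepped together" can be sketched as follows. This is only an interface illustration under stated assumptions: `ToyVecEnv` and its 1-D task are hypothetical, and a Python loop stands in for the GPU tensor operations that Isaac Gym would use, so it shows the shape of the data flow rather than the speed.

```python
import random


class ToyVecEnv:
    """num_envs independent 1-D sub-environments stepped in lockstep (hypothetical toy task)."""

    def __init__(self, num_envs, max_steps=10, seed=0):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.positions = [0.0] * num_envs
        self.steps = [0] * num_envs

    def reset(self):
        self.positions = [self.rng.uniform(-1.0, 1.0) for _ in range(self.num_envs)]
        self.steps = [0] * self.num_envs
        return list(self.positions)

    def step(self, actions):
        """Take one action per sub-env; return batched states, rewards, and done flags."""
        rewards, dones = [], []
        for i, action in enumerate(actions):
            self.positions[i] += action
            self.steps[i] += 1
            rewards.append(-abs(self.positions[i]))
            done = self.steps[i] >= self.max_steps
            dones.append(done)
            if done:  # auto-reset so the batch never stalls on a finished sub-env
                self.positions[i] = self.rng.uniform(-1.0, 1.0)
                self.steps[i] = 0
        return list(self.positions), rewards, dones


env = ToyVecEnv(num_envs=4096)
states = env.reset()
# One call advances all 4096 sub-envs; here each action nudges its sub-env toward zero.
states, rewards, dones = env.step([-0.1 * s for s in states])
```

Each call to `step` returns one entry per sub-env, so a single policy evaluation can produce thousands of transitions per step.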

More stable than Stable Baselines 3

Experiment on Hopper-v2 # ElegantRL achieves much smaller variance (average over 8 runs).

Also, PPO+H in ElegantRL completed the training process of 5M samples about 6x faster than Stable Baselines 3.

<div align="center"> <img align="center" src=figs/SB3_vs_ElegantRL.png width="640"> </div>

Testing and Contributing

Our tests are written with the built-in unittest Python module for easy access. In order to run a specific test file (for example, test_training_agents.py), use the following command from the root directory:

python -m unittest unit_tests/test_training_agents.py

In order to run all the tests sequentially, you can use the following command:

python -m unittest discover

Please note that some of the tests require Isaac Gym to be installed on your system. If it is not, any tests related to Isaac Gym will fail.
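If you write your own tests against Isaac Gym, one way to keep the rest of the suite green on machines without it is to skip those tests instead of letting them error out. The sketch below uses only the standard `unittest` and `importlib` modules; the test class is hypothetical and is not part of the repository's unit_tests folder, and it assumes Isaac Gym is importable as the `isaacgym` package.

```python
import importlib.util
import unittest

# Assumption: Isaac Gym installs a top-level package named `isaacgym`.
ISAAC_GYM_AVAILABLE = importlib.util.find_spec("isaacgym") is not None


class TestIsaacGymDependent(unittest.TestCase):  # hypothetical test case
    @unittest.skipUnless(ISAAC_GYM_AVAILABLE, "Isaac Gym is not installed")
    def test_isaac_gym_is_importable(self):
        # Runs only when the package was found; otherwise the test is reported as skipped.
        import isaacgym  # noqa: F401
```

Saved in a test file, this class is picked up by `python -m unittest discover` like any other test and shows up as skipped when Isaac Gym is missing.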
