Flexrl
Non-modular implementation of common RL algorithms
FlexRL
FlexRL is a deep online/offline reinforcement learning library inspired by and adapted from CleanRL and CORL. It provides single-file implementations of algorithms that those libraries do not necessarily cover. FlexRL introduces the following features:
- Consistent style across online and offline algorithms
- Easy configuration with Pyrallis, plus a tqdm progress bar
- A few custom environments under the gym API
Quick Start
Installing FlexRL
git clone https://github.com/alexchen-buaa/flexrl.git
cd flexrl
pip install -e .
Usage
Run the algorithms as individual scripts. Like CORL, we use Pyrallis for configuration management. Arguments can be specified on the command line, in a YAML file, or both:
python ppo.py --config_path=some_config.yaml
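For readers new to Pyrallis, the sketch below shows roughly how a script can turn a dataclass into command-line and YAML options. It is a minimal illustration, not FlexRL's actual code: the `TrainConfig` class and its field names (`env_id`, `total_timesteps`, `learning_rate`, `seed`) are hypothetical, and the real scripts may define different options.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical fields for illustration; the real scripts may differ.
    env_id: str = "CartPole-v1"
    total_timesteps: int = 100_000
    learning_rate: float = 3e-4
    seed: int = 0


def main(args=None):
    # Imported here so the dataclass above can be inspected without Pyrallis.
    import pyrallis

    # pyrallis.parse fills the dataclass from --config_path (a YAML file)
    # and from --field=value flags; flags override values from the file.
    cfg = pyrallis.parse(config_class=TrainConfig, args=args)
    print(cfg)
    return cfg


# In a real script, call main() under an `if __name__ == "__main__":` guard.
```

With a script like this, a YAML file would hold keys matching the dataclass fields, and a single value could be overridden with a flag such as `--learning_rate=1e-3` alongside `--config_path`.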
Algorithms Implemented
| Type    | Algorithm                          | Variants Implemented |
| ------- | ---------------------------------- | -------------------- |
| Online  | Proximal Policy Optimization (PPO) | ppo.py               |
|         |                                    | ppo_atari.py         |
|         |                                    | ppo_multidiscrete.py |
|         | Deep Q-Networks (DQN)              | dqn.py               |
|         |                                    | dqn_atari.py         |
|         | Quantile-Regression DQN (QR-DQN)   | qr_dqn.py            |
|         |                                    | qr_dqn_atari.py      |
|         | Soft Actor-Critic (SAC)            | sac.py               |
| Offline | Implicit Q-Learning (IQL)          | iql.py               |
|         |                                    | iql_jax.py           |
|         | In-Sample Actor-Critic (InAC)      | inac.py              |
|         |                                    | inac_jax.py          |
|         | Soft Actor-Critic Ensemble (SAC-N) | sac_n_jax.py         |
Extra Requirements
Atari/ALE
According to The Arcade Learning Environment, you can use its command-line tool to import your ROMs:
ale-import-roms roms/
MuJoCo
To use MuJoCo envs (for both online training and offline evaluation), you need to install MuJoCo first. See mujoco-py for instructions.
JAX with CUDA Support
To use JAX with CUDA support, you need to install the NVIDIA driver first. See JAX Installation for instructions.
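After installation, a quick check like the one below can confirm whether JAX actually sees the GPU. This is a sketch, not part of FlexRL: the helper name `describe_jax_backend` is made up for illustration, and it only relies on `jax.default_backend()` and `jax.devices()`.

```python
def describe_jax_backend():
    """Return a short description of the backend JAX is using."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    # default_backend() names the active platform (for example "cpu" or "gpu");
    # devices() lists the devices JAX can place arrays on.
    return f"backend={jax.default_backend()}, devices={jax.devices()}"


print(describe_jax_backend())
```

If the reported backend is the CPU on a machine with an NVIDIA GPU, the CUDA-enabled JAX build or the driver is probably not set up correctly.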
References
- [1] S. Huang, R. F. J. Dossa, C. Ye, and J. Braga, “CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms.” arXiv, Nov. 16, 2021. Accessed: Nov. 21, 2022. [Online]. Available: http://arxiv.org/abs/2111.08819
- [2] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-Baselines3: Reliable Reinforcement Learning Implementations,” Journal of Machine Learning Research, vol. 22, no. 268, pp. 1–8, 2021.
- [3] W. Dabney, M. Rowland, M. G. Bellemare, and R. Munos, “Distributional Reinforcement Learning with Quantile Regression,” arXiv:1710.10044 [cs, stat], Oct. 2017, Accessed: Apr. 15, 2022. [Online]. Available: http://arxiv.org/abs/1710.10044
- [4] I. Kostrikov, A. Nair, and S. Levine, “Offline Reinforcement Learning with Implicit Q-Learning.” arXiv, Oct. 12, 2021. Accessed: Mar. 29, 2023. [Online]. Available: http://arxiv.org/abs/2110.06169
- [5] C. Xiao, H. Wang, Y. Pan, A. White, and M. White, “The In-Sample Softmax for Offline Reinforcement Learning.” arXiv, Feb. 28, 2023. Accessed: Apr. 02, 2023. [Online]. Available: http://arxiv.org/abs/2302.14372
