RLeXplore
RLeXplore provides stable baselines of exploration methods in reinforcement learning, such as intrinsic curiosity module (ICM), random network distillation (RND) and rewarding impact-driven exploration (RIDE).
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
RLeXplore is a unified, highly modularized, plug-and-play toolkit that currently provides high-quality and reliable implementations of eight representative intrinsic reward algorithms. Comparing intrinsic reward algorithms used to be challenging because of confounding factors such as distinct implementations, optimization strategies, and evaluation methodologies. RLeXplore therefore provides unified and standardized procedures for constructing, computing, and optimizing intrinsic reward modules.
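As a rough illustration of the construct / compute / optimize split mentioned above, the sketch below implements a toy count-based bonus in plain Python. The class name `ToyCountReward` and its `update` and `compute` methods are hypothetical and chosen only for this example; they are not the RLeXplore API.

```python
import math
from collections import Counter


class ToyCountReward:
    """Hypothetical count-based bonus: r(s) = 1 / sqrt(N(s))."""

    def __init__(self):
        # Construct: set up whatever state the module needs.
        self.counts = Counter()

    def update(self, states):
        # Optimize/update: for this toy module, just refresh visit counts.
        self.counts.update(states)

    def compute(self, states):
        # Compute: one bonus per state; unseen states are treated as one visit.
        return [1.0 / math.sqrt(max(self.counts[s], 1)) for s in states]


module = ToyCountReward()
module.update(["a", "a", "a", "a", "b"])
bonuses = module.compute(["a", "b", "c"])
print(bonuses)  # [0.5, 1.0, 1.0]
```

Frequently visited states ("a") receive a smaller bonus than rarely visited or unseen ones, which is the basic idea behind count-based exploration.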
The workflow of RLeXplore is illustrated as follows:
<div align=center> <img src='./assets/workflow.png' style="width: 100%"> </div>
Installation
- with pip (recommended)
Open a terminal and install rllte with pip:
conda create -n rllte python=3.8
conda activate rllte
pip install rllte-core
- with git
Open a terminal and clone the repository from GitHub with git:
git clone https://github.com/RLE-Foundation/rllte.git
cd rllte
pip install -e .
Now you can invoke the intrinsic reward module by:
from rllte.xplore.reward import ICM, RIDE, ...
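What happens to the intrinsic rewards after such a module produces them is not shown here. One common pattern in the intrinsic-motivation literature is to divide the intrinsic reward by a running standard deviation and add it to the extrinsic reward with a coefficient. The sketch below shows that pattern in plain Python; `RunningStd`, `mix_rewards`, and the `beta` parameter are hypothetical names for this example and are not taken from the rllte API.

```python
import math


class RunningStd:
    """Welford's online algorithm for the variance of a reward stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, values):
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 1.0 until two samples are seen.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0


def mix_rewards(extrinsic, intrinsic, stats, beta=0.1, eps=1e-8):
    """Return r_ext + beta * r_int / std(r_int) for each step."""
    stats.update(intrinsic)
    scale = stats.std + eps
    return [e + beta * i / scale for e, i in zip(extrinsic, intrinsic)]


stats = RunningStd()
total = mix_rewards([0.0, 1.0, 0.0], [2.0, 4.0, 6.0], stats, beta=0.5)
print(total)
```

Keeping the statistics in a separate object lets the scale estimate carry over between rollouts instead of being recomputed per batch.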
Module List
| Type | Modules |
|--- |--- |
| Count-based | PseudoCounts, RND, E3B |
| Curiosity-driven | ICM, Disagreement, RIDE |
| Memory-based | NGU |
| Information theory-based | RE3 |
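To give a feel for the information theory-based family, the sketch below computes a bonus in the spirit of RE3's k-nearest-neighbor state entropy estimate: each point is rewarded with log(1 + distance to its k-th nearest neighbor). It is a simplified, plain-Python illustration that assumes the embeddings are already given (RE3's random encoder is omitted); the function name `knn_entropy_bonus` is hypothetical and this is not the library's implementation.

```python
import math


def knn_entropy_bonus(embeddings, k=3):
    """Bonus per point: log(1 + distance to its k-th nearest neighbor)."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    bonuses = []
    for i, x in enumerate(embeddings):
        # Distances from x to every other point, smallest first.
        dists = sorted(
            math.dist(x, y) for j, y in enumerate(embeddings) if j != i
        )
        # Fall back to the farthest point if fewer than k neighbors exist.
        kth = dists[min(k, len(dists)) - 1]
        bonuses.append(math.log(1.0 + kth))
    return bonuses


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
bonuses = knn_entropy_bonus(points, k=1)
print(bonuses)
```

The isolated point `(5.0, 5.0)` gets the largest bonus because its nearest neighbor is far away, which is how this kind of estimator pushes the agent toward sparsely visited regions.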
Tutorials
Click the following links to get the code notebook:
- Quick Start
- RLeXplore with RLLTE
- RLeXplore with Stable-Baselines3
- RLeXplore with CleanRL
- Exploring Hybrid Intrinsic Rewards
- Custom Intrinsic Rewards
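The notebooks themselves are not reproduced here. As a loose sketch of what a hybrid intrinsic reward can look like, the example below rescales two reward signals to a common range and takes a weighted sum. The helper names `min_max` and `hybrid_reward`, the weights, and the sample values are all made up for illustration; the tutorials may combine rewards differently.

```python
def min_max(values):
    """Rescale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_reward(signals, weights):
    """Weighted sum of min-max-normalized reward signals, step by step."""
    if len(signals) != len(weights):
        raise ValueError("need one weight per signal")
    scaled = [min_max(s) for s in signals]
    return [
        sum(w * s[t] for w, s in zip(weights, scaled))
        for t in range(len(scaled[0]))
    ]


novelty = [0.0, 5.0, 10.0]   # made-up output of one module
curiosity = [1.0, 1.0, 3.0]  # made-up output of another module
combined = hybrid_reward([novelty, curiosity], weights=[0.5, 0.5])
print(combined)  # [0.0, 0.25, 1.0]
```

Normalizing before mixing keeps one signal from dominating simply because it lives on a larger numeric scale.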
Benchmark Results
We have published a space using Weights & Biases (W&B) to store reusable experiment results on recognized benchmarks. The space link is: RLeXplore's W&B Space.
<div align=center> <img src='./assets/wandb.png' style="width: 75%"> </div>

RLLTE's PPO+RLeXplore on SuperMarioBros:

RLLTE's PPO+RLeXplore on MiniGrid:
- DoorKey-16×16
- KeyCorridorS8R5, KeyCorridorS9R6, KeyCorridorS10R7, MultiRoom-N7-S8, MultiRoom-N10-S10, MultiRoom-N12-S10, Dynamic-Obstacles-16x16, and LockedRoom

RLLTE's PPO+RLeXplore on Procgen-Maze:
- Number of levels=1
- Number of levels=200

RLLTE's PPO+RLeXplore on five hard-exploration tasks of ALE:
| Algorithm | Gravitar | MontezumaRevenge | PrivateEye | Seaquest | Venture |
|:-------------:|:------------:|:--------------------:|:--------------:|:------------:|:-----------:|
| Extrinsic | 1060.19 | 42.83 | 88.37 | 942.37 | 391.73 |
| Disagreement | 689.12 | 0.00 | 33.23 | 6577.03 | 468.43 |
| E3B | 503.43 | 0.50 | 66.23 | 8690.65 | 0.80 |
| ICM | 194.71 | 31.14 | -27.50 | 2626.13 | 0.54 |
| PseudoCounts | 295.49 | 0.00 | 1076.74 | 668.96 | 1.03 |
| RE3 | 130.00 | 2.68 | 312.72 | 864.60 | 0.06 |
| RIDE | 452.53 | 0.00 | -1.40 | 1024.39 | 404.81 |
| RND | 835.57 | 160.22 | 45.85 | 5989.06 | 544.73 |
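Read column by column, the table shows that no single method wins on every game. The short script below re-enters the numbers from the table and picks the top-scoring row for each game; the variable names are chosen only for this example.

```python
games = ["Gravitar", "MontezumaRevenge", "PrivateEye", "Seaquest", "Venture"]

# Scores copied from the table above, in the same column order as `games`.
scores = {
    "Extrinsic":    [1060.19, 42.83, 88.37, 942.37, 391.73],
    "Disagreement": [689.12, 0.00, 33.23, 6577.03, 468.43],
    "E3B":          [503.43, 0.50, 66.23, 8690.65, 0.80],
    "ICM":          [194.71, 31.14, -27.50, 2626.13, 0.54],
    "PseudoCounts": [295.49, 0.00, 1076.74, 668.96, 1.03],
    "RE3":          [130.00, 2.68, 312.72, 864.60, 0.06],
    "RIDE":         [452.53, 0.00, -1.40, 1024.39, 404.81],
    "RND":          [835.57, 160.22, 45.85, 5989.06, 544.73],
}

# For each game (column), find the algorithm (row) with the highest score.
best = {
    game: max(scores, key=lambda algo: scores[algo][i])
    for i, game in enumerate(games)
}

for game, algo in best.items():
    print(f"{game}: {algo}")
```

On these numbers the extrinsic-only baseline is highest on Gravitar, RND on MontezumaRevenge and Venture, PseudoCounts on PrivateEye, and E3B on Seaquest.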
CleanRL's PPO+RLeXplore's RND on Montezuma's Revenge:
RLLTE's SAC+RLeXplore on Ant-UMaze:
Cite Us
To cite this repository in publications:
@article{yuan_roger2025rlexplore,
title={RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning},
author={Yuan, Mingqi and Castanyer, Roger Creus and Li, Bo and Jin, Xin and Berseth, Glen and Zeng, Wenjun},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=B9BHjTN4z6},
note={}
}
