OSRL
Elegant implementations of offline safe RL algorithms in PyTorch
OSRL (Offline Safe Reinforcement Learning) offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions. This repository is heavily inspired by the CORL library for offline RL, so check it out too!
The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes DSRL and FSRL, and is built to facilitate the development of robust and reliable offline safe RL solutions.
To learn more, please visit our project website. If you find this code useful, please cite our paper, which has been accepted by the DMLR journal:
@article{
liu2024offlinesaferl,
title={Datasets and Benchmarks for Offline Safe Reinforcement Learning},
author={Zuxin Liu and Zijian Guo and Haohong Lin and Yihang Yao and Jiacheng Zhu and Zhepeng Cen and Hanjiang Hu and Wenhao Yu and Tingnan Zhang and Jie Tan and Ding Zhao},
journal={Journal of Data-centric Machine Learning Research},
year={2024}
}
Structure
The structure of this repo is as follows:
├── examples
│   ├── configs  # the training configs of each algorithm
│   ├── eval     # the evaluation scripts
│   ├── train    # the training scripts
├── osrl
│   ├── algorithms  # offline safe RL algorithms
│   ├── common      # base networks and utils
The implemented offline safe RL and imitation learning algorithms include:
| Algorithm | Type | Description |
|:-------------------:|:-----------------:|:------------------------:|
| BCQ-Lag | Q-learning | BCQ with PID Lagrangian |
| BEAR-Lag | Q-learning | BEARL with PID Lagrangian |
| CPQ | Q-learning | Constraints Penalized Q-learning (CPQ) |
| COptiDICE | Distribution Correction Estimation | Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation |
| CDT | Sequential Modeling | Constrained Decision Transformer |
| BC-All | Imitation Learning | Behavior Cloning with all datasets |
| BC-Safe | Imitation Learning | Behavior Cloning with safe trajectories |
| BC-Frontier | Imitation Learning | Behavior Cloning with high-reward trajectories |
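Two of the Q-learning entries above pair a base algorithm with a PID Lagrangian. As a rough illustration of that idea, the sketch below shows one common form of a PID-controlled Lagrange multiplier: the multiplier grows while the observed cost exceeds the limit and shrinks back toward zero otherwise. The class name, gains, and cost limit are hypothetical, and this is not OSRL's implementation.

```python
class PIDLagrangian:
    """Illustrative PID controller for a Lagrange multiplier (hypothetical, not OSRL's code)."""

    def __init__(self, kp: float = 0.1, ki: float = 0.01, kd: float = 0.0,
                 cost_limit: float = 10.0) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0    # accumulated constraint violation, kept non-negative
        self.prev_error = 0.0  # violation seen at the previous update

    def update(self, observed_cost: float) -> float:
        """Return the new multiplier given the latest cost estimate."""
        error = observed_cost - self.cost_limit
        # Integral term: clamp at zero so long safe periods do not build up "credit".
        self.integral = max(0.0, self.integral + error)
        # Derivative term: react only to increases in the violation.
        derivative = max(0.0, error - self.prev_error)
        self.prev_error = error
        # The multiplier itself must stay non-negative.
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)
```

In a Lagrangian method, the returned multiplier would typically weight the cost term in the policy objective, so a larger value pushes the policy harder toward satisfying the constraint.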
Installation
OSRL is hosted on PyPI, so you can install it with:
pip install osrl-lib
You can also clone the repo and install it from source:
git clone https://github.com/liuzuxin/OSRL.git
cd OSRL
pip install -e .
If you want to use the CDT algorithm, please also manually install the OApackage:
pip install OApackage==2.7.6
How to use OSRL
Example usage lives in the examples folder, where you can find the training and evaluation scripts for all the algorithms.
All the parameters and their default values for each algorithm are available in the examples/configs folder.
OSRL uses the WandbLogger from FSRL and the Pyrallis configuration system. The offline datasets and offline environments are provided by DSRL, so make sure you install both FSRL and DSRL first.
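Pyrallis builds its configuration from dataclasses, with command-line flags overriding the defaults. The sketch below imitates that pattern using only the standard library, to show how a default config and `--name value` overrides fit together. It is not Pyrallis itself, and the `TrainConfig` fields other than `task` are hypothetical; the real fields are defined in examples/configs.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass
class TrainConfig:
    # Hypothetical fields for illustration; see examples/configs for the real ones.
    task: str = "OfflineCarCircle-v0"
    seed: int = 0
    cost_limit: float = 10.0


def apply_overrides(cfg: TrainConfig, argv: List[str]) -> TrainConfig:
    """Return a copy of cfg with `--name value` pairs from argv applied."""
    if len(argv) % 2 != 0:
        raise ValueError("expected an even number of tokens: --name value ...")
    valid = {f.name for f in fields(cfg)}
    updates = {}
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        name = flag[2:]
        if name not in valid:
            raise ValueError(f"unknown parameter {name!r}")
        # Cast the raw string to the type of the current default value.
        updates[name] = type(getattr(cfg, name))(raw)
    return replace(cfg, **updates)
```

For example, `apply_overrides(TrainConfig(), ["--seed", "3"])` keeps the default task and cost limit and changes only the seed.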
Training
For example, to train the bcql method, run its training script and override the default parameters as needed:
python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
By default, the config file and the training logs are written to the logs/ folder, and the training plots can be viewed online using Wandb.
You can also launch a batch of experiments, sequentially or in parallel, via the EasyRunner package; see examples/train_all_tasks.py for details.
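As a rough picture of what a batch of experiments looks like, the sketch below only builds the command lines for a grid of tasks and seeds; it does not use EasyRunner, whose API is documented in its own package. The `--task` flag and script path come from the example above, while `--seed` is an assumed parameter name, so check examples/configs before relying on it.

```python
import itertools
import shlex
from typing import List, Sequence


def build_commands(tasks: Sequence[str], seeds: Sequence[int],
                   script: str = "examples/train/train_bcql.py") -> List[List[str]]:
    """Build one training command per (task, seed) pair."""
    commands = []
    for task, seed in itertools.product(tasks, seeds):
        # `--seed` is an assumed flag name used here for illustration.
        commands.append(["python", script, "--task", task, "--seed", str(seed)])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["OfflineCarCircle-v0"], [0, 1, 2]):
        # Print shell-ready lines; pass each list to subprocess.run to execute it.
        print(shlex.join(cmd))
```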
Evaluation
To evaluate a trained agent, for example a BCQ agent, run:
python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
It loads the config file from path_to_model/config.yaml and the model file from path_to_model/checkpoints/model.pt, runs 20 episodes, and prints the average normalized reward and cost. The pretrained checkpoints for all datasets are available here for reference.
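The exact normalization is defined by the DSRL environments. As an illustration only, the sketch below assumes a common convention: reward is min-max scaled against reference returns for the task, and cost is divided by the cost threshold. The function name, arguments, and numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def average_normalized_scores(episode_rewards: Sequence[float],
                              episode_costs: Sequence[float],
                              min_reward: float,
                              max_reward: float,
                              cost_limit: float) -> Tuple[float, float]:
    """Average per-episode returns, then normalize them (assumed convention)."""
    if max_reward <= min_reward:
        raise ValueError("max_reward must be greater than min_reward")
    if cost_limit <= 0:
        raise ValueError("cost_limit must be positive")
    avg_reward = mean(episode_rewards)
    avg_cost = mean(episode_costs)
    # Reward: 0 maps to the reference minimum, 1 to the reference maximum.
    normalized_reward = (avg_reward - min_reward) / (max_reward - min_reward)
    # Cost: values at or below 1 mean the threshold is satisfied on average.
    normalized_cost = avg_cost / cost_limit
    return normalized_reward, normalized_cost
```

Under this assumed convention, a normalized cost at or below 1 indicates the agent stays within the cost threshold on average, which makes results comparable across tasks with different thresholds.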
Acknowledgement
The framework design and most baseline implementations of OSRL are heavily inspired by the CORL project, which is a great library for offline RL, and the cleanrl project, which targets online RL. So do check them out if you are interested!
Contributing
If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!
