# OmniSafe
OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (safe RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms, as well as an out-of-the-box modular toolkit for researchers. Safe RL aims to develop algorithms that minimize the risk of unintended harm or unsafe behavior.

OmniSafe is the first unified learning framework for safe reinforcement learning, and it aims to foster the growth of the safe RL community. Key features of OmniSafe:
- **Highly Modular Framework.** OmniSafe presents a highly modular framework, incorporating an extensive collection of algorithms tailored for safe reinforcement learning across diverse domains. The framework's versatility comes from its abstraction over different algorithm types and a well-designed API that uses Adapter and Wrapper components to bridge gaps and enable seamless interactions between modules. This design allows for easy extension and customization, making it a powerful tool for developers working with different types of algorithms.
- **High-Performance Parallel Computing Acceleration.** By harnessing the capabilities of `torch.distributed`, OmniSafe accelerates the learning process with process-level parallelism. This enables OmniSafe not only to support environment-level asynchronous parallelism but also to incorporate asynchronous agent learning. This methodology bolsters training stability and expedites training via a parallel exploration mechanism, underscoring OmniSafe's commitment to providing a versatile and robust platform for advancing safe RL research.
- **Out-of-the-Box Toolkits.** OmniSafe offers customizable toolkits for tasks like training, benchmarking, analyzing, and rendering. Tutorials and user-friendly APIs make it easy for beginners, while advanced researchers can improve their efficiency without writing complex code.

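As a toy illustration of the Wrapper idea described above, a safe-RL environment wrapper can augment an ordinary `step()` so that it also returns a cost signal alongside the reward, which is the interface safe RL algorithms consume. The classes and names below are hypothetical, not OmniSafe's actual API:

```python
# Illustrative sketch only -- these classes are NOT OmniSafe's real API.

class ToyEnv:
    """A trivial 1-D environment: the state drifts by the action, reward is the state."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = self.state
        done = abs(self.state) >= 5.0
        return self.state, reward, done


class CostWrapper:
    """Wraps an env so step() also returns a safety cost, as safe-RL algorithms expect."""

    def __init__(self, env, unsafe_threshold=3.0):
        self.env = env
        self.unsafe_threshold = unsafe_threshold

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        # Cost is 1 when the agent enters the unsafe region, else 0.
        cost = 1.0 if abs(obs) > self.unsafe_threshold else 0.0
        return obs, reward, cost, done


env = CostWrapper(ToyEnv())
env.reset()
obs, reward, cost, done = env.step(4.0)
print(obs, reward, cost, done)  # 4.0 4.0 1.0 False
```

Because the wrapper leaves the inner environment untouched, the same underlying task can be reused with different cost definitions, which is the kind of decoupling the Adapter/Wrapper design aims for.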
If you find OmniSafe useful or use OmniSafe in your research, please cite it in your publications.
```bibtex
@article{JMLR:v25:23-0681,
  author  = {Jiaming Ji and Jiayi Zhou and Borong Zhang and Juntao Dai and Xuehai Pan and Ruiyang Sun and Weidong Huang and Yiran Geng and Mickel Liu and Yaodong Yang},
  title   = {OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research},
  journal = {Journal of Machine Learning Research},
  year    = {2024},
  volume  = {25},
  number  = {285},
  pages   = {1--6},
  url     = {http://jmlr.org/papers/v25/23-0681.html}
}
```
## Table of Contents <!-- omit in toc --> <!-- markdownlint-disable heading-increment -->
- Quick Start
- Implemented Algorithms
- Getting Started
- Changelog
- Citing OmniSafe
- Publications using OmniSafe
- The OmniSafe Team
- License
## Quick Start

### Installation

#### Prerequisites
OmniSafe requires Python 3.8+ and PyTorch 1.10+.
We support and test Python 3.8, 3.9, and 3.10 on Linux. We also support Apple Silicon (M1 and M2) macOS. We will accept PRs related to Windows but do not officially support it.
#### Install from source

```bash
# Clone the repo
git clone https://github.com/PKU-Alignment/omnisafe.git
cd omnisafe

# Create a conda environment
conda env create --file conda-recipe.yaml
conda activate omnisafe

# Install omnisafe
pip install -e .
```
#### Install from PyPI

```bash
pip install omnisafe
```
## Implemented Algorithms
<details>
<summary><b><big>Latest SafeRL Papers</big></b></summary>

- [AAAI 2023] Augmented Proximal Policy Optimization for Safe Reinforcement Learning (APPO)
- [NeurIPS 2022] Constrained Update Projection Approach to Safe Policy Optimization (CUP)
- [NeurIPS 2022] Effects of Safety State Augmentation on Safe Exploration (Simmer)
- [NeurIPS 2022] Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
- [ICML 2022] Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)
- [IJCAI 2022] Penalized Proximal Policy Optimization for Safe Reinforcement Learning
- [AAAI 2022] Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)

</details>
- [x] The Lagrange version of PPO (PPO-Lag)
- [x] The Lagrange version of TRPO (TRPO-Lag)
- [x] [ICML 2017] Constrained Policy Optimization (CPO)
- [x] [ICLR 2019] Reward Constrained Policy Optimization (RCPO)
- [x] [ICML 2020] Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (PID-Lag)
- [x] [NeurIPS 2020] First Order Constrained Optimization in Policy Space (FOCOPS)
- [x] [AAAI 2020] IPO: Interior-point Policy Optimization under Constraints (IPO)
- [x] [ICLR 2020] Projection-Based Constrained Policy Optimization (PCPO)
- [x] [ICML 2021] CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
- [x] [IJCAI 2022] Penalized Proximal Policy Optimization for Safe Reinforcement Learning (P3O)
- [Preprint 2019] The Lagrangian version of DDPG (DDPGLag)
- [Preprint 2019] The Lagrangian version of TD3 (TD3Lag)
- [Preprint 2019] The Lagrangian version of SAC (SACLag)
- [ICML 2020] Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (DDPGPID)
- [ICML 2020] Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (TD3PID)
- [ICML 2020] Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (SACPID)
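The Lagrangian variants listed above (PPO-Lag, TRPO-Lag, DDPGLag, TD3Lag, SACLag) all share the same core mechanism: a multiplier λ penalizes the expected episode cost, and λ itself is updated by gradient ascent on the constraint violation. A minimal sketch of that update follows; it is illustrative only (not OmniSafe's implementation), and the `step_size`, `cost_limit`, and cost values are made up:

```python
def lagrange_update(lmbda, episode_cost, cost_limit, step_size=0.1):
    """One gradient-ascent step on the Lagrange multiplier.

    lmbda grows while the observed cost exceeds the limit, and is
    projected back onto [0, inf) so the penalty never becomes a bonus.
    """
    return max(0.0, lmbda + step_size * (episode_cost - cost_limit))


lmbda = 0.0
cost_limit = 25.0
# Pretend episode costs observed over a few epochs of training.
for episode_cost in [40.0, 35.0, 30.0, 20.0]:
    lmbda = lagrange_update(lmbda, episode_cost, cost_limit)
    # The policy then maximizes the penalized objective:
    #     J(pi) - lmbda * (J_cost(pi) - cost_limit)
print(round(lmbda, 2))  # 2.5
```

The PID-Lagrangian variants (PID-Lag, DDPGPID, TD3PID, SACPID) replace this pure integral-style update with a full PID controller on the constraint violation to reduce oscillation of λ during training.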
