XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
Full Documentation | Chinese Documentation (README_CN.md)
XuanCe is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
In Chinese we call it Xuan-Ce (玄策): "Xuan (玄)" suggests something incredible and mysterious, and "Ce (策)" means policy.
DRL algorithms are sensitive to hyperparameter tuning, vary in performance with different implementation tricks, and often suffer from unstable training; as a result, they can seem elusive, or "Xuan". This project provides thorough, high-quality, and easy-to-understand implementations of DRL algorithms, and we hope they shed some light on the magic of reinforcement learning.
XuanCe is designed to be compatible with multiple deep learning backends (PyTorch, TensorFlow, and MindSpore), and we hope it can truly become a zoo full of DRL algorithms.
Paper link: https://arxiv.org/pdf/2312.16248.pdf
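As a quick orientation, the sketch below shows the high-level workflow described above: pick an algorithm and an environment, and let the library assemble and train the agent. It assumes the `get_runner` entry point from the XuanCe documentation; the specific `method`, `env`, and `env_id` values are illustrative, and the library must be installed first (e.g. `pip install xuance`).

```python
# Minimal training sketch, assuming XuanCe's documented `get_runner` API.
import xuance

# Build a runner for DQN on CartPole; the method and environment names
# here are examples, not an exhaustive list of what XuanCe supports.
runner = xuance.get_runner(
    method="dqn",
    env="classic_control",
    env_id="CartPole-v1",
    is_test=False,  # set True to evaluate a trained model instead
)
runner.run()  # starts training with the default config for this method/env
```

The same entry point is what makes the library "unified": swapping `method="dqn"` for another algorithm name reuses the rest of the call unchanged.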
Table of Contents:
Features
- :school_satchel: Highly modularized.
- :thumbsup: Easy to learn, easy to install, and easy to use.
- :twisted_rightwards_arrows: Flexible for model combination.
- :tada: Abundant algorithms with various tasks.
- :couple: Supports both DRL and MARL tasks.
- :key: High compatibility for different users. (PyTorch, TensorFlow2, MindSpore, CPU, GPU, Linux, Windows, MacOS, etc.)
- :zap: Fast running speed with parallel environments.
- :computer: Distributed training across multiple GPUs.
- 🎛️ Supports automatic hyperparameter tuning.
- :chart_with_upwards_trend: Clear training visualizations with TensorBoard or Weights & Biases (wandb).
Algorithms
:point_right: DRL
- DQN: Deep Q Network [Paper]
- Double DQN: DQN with Double Q-learning [Paper]
- Dueling DQN: DQN with Dueling Network [Paper]
- PER: DQN with Prioritized Experience Replay [Paper]
- NoisyDQN: DQN with Parameter Space Noise for Exploration [Paper]
- DRQN: Deep Recurrent Q-Network [Paper]
- QRDQN: DQN with Quantile Regression [Paper]
- C51: Distributional Reinforcement Learning [Paper]
- PG: Vanilla Policy Gradient [Paper]
- NPG: Natural Policy Gradient [Paper]
- PPG: Phasic Policy Gradient [Paper] [Code]
- A2C: Advantage Actor Critic [Paper] [Code]
- SAC: Soft Actor-Critic [Paper] [Code]
- SAC-Discrete: Soft Actor-Critic for Discrete Actions [Paper] [Code]
- PPO-Clip: Proximal Policy Optimization with Clipped Objective [Paper] [Code]
- PPO-KL: Proximal Policy Optimization with KL Divergence [Paper] [Code]
- DDPG: Deep Deterministic Policy Gradient [Paper] [Code]
- TD3: Twin Delayed Deep Deterministic Policy Gradient [Paper] [Code]
- P-DQN: Parameterised Deep Q-Network [Paper]
- MP-DQN: Multi-pass Parameterised Deep Q-network [Paper] [Code]
- SP-DQN: Split Parameterised Deep Q-Network [Paper]
:point_right: Model-Based Reinforcement Learning (MBRL)
:point_right: Multi-Agent Reinforcement Learning (MARL)
- IQL: Independent Q-learning [Paper] [Code]
- VDN: Value Decomposition Networks [Paper] [Code]
- QMIX: Q-mixing networks [Paper] [Code]
- WQMIX: Weighted Q-mixing networks [Paper] [Code]
- QTRAN: Q-transformation [Paper] [Code]
- DCG: Deep Coordination Graphs [Paper] [Code]
- IDDPG: Independent Deep Deterministic Policy Gradient [Paper]
- MADDPG: Multi-agent Deep Deterministic Policy Gradient [Paper] [Code]
- IAC: Independent Actor-Critic [Paper] [Code]
- COMA: Counterfactual Multi-agent Policy Gradient [Paper] [Code]
- VDAC: Value-Decomposition Actor-Critic [Paper] [Code]
- IPPO: Independent Proximal Policy Optimization [Paper] [Code]
- MAPPO: Multi-agent Proximal Policy Optimization [Paper] [Code]
- MFQ: Mean-Field Q-learning [Paper] [Code]
- MFAC: Mean-Field Actor-Critic [Paper] [[Code](https://github.com/
