
<div align="center"> <img src="docs/source/_static/figures/logo_1.png" width="400" height="auto" align=center /> </div>

XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library


Benchmarks

Full Documentation | Chinese Documentation (README_CN.md)

XuanCe is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.

We call it Xuan-Ce (玄策) in Chinese: "Xuan (玄)" means incredible or magical, and "Ce (策)" means policy.

DRL algorithms are sensitive to hyperparameter tuning, vary in performance with different implementation tricks, and often suffer from unstable training, so they can seem elusive and "Xuan". This project provides thorough, high-quality, and easy-to-understand implementations of DRL algorithms, in the hope that they shed some light on the magic of reinforcement learning.

We expect it to be compatible with multiple deep-learning backends (PyTorch, TensorFlow, and MindSpore), and hope it can truly become a zoo full of DRL algorithms.

Paper link: https://arxiv.org/pdf/2312.16248.pdf


Features

  • :school_satchel: Highly modularized.
  • :thumbsup: Easy to learn, easy to install, and easy to use.
  • :twisted_rightwards_arrows: Flexible model combination.
  • :tada: Abundant algorithms across a variety of tasks.
  • :couple: Supports both DRL and MARL tasks.
  • :key: High compatibility across setups (PyTorch, TensorFlow 2, MindSpore; CPU and GPU; Linux, Windows, macOS, etc.).
  • :zap: Fast training with parallel environments.
  • :computer: Distributed training with multiple GPUs.
  • 🎛️ Supports automatic hyperparameter tuning.
  • :chart_with_upwards_trend: Clear visualization with TensorBoard or Weights & Biases (wandb).
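
Installation follows the usual PyPI route. A minimal sketch (the backend extras below are an assumption; check the official documentation for the exact extras your version supports):

```shell
# Install the stable XuanCe release from PyPI
pip install xuance

# Optionally select a deep-learning backend via an extra
# (extra name assumed; verify against the docs)
pip install xuance[torch]
```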

Algorithms

:point_right: DRL

  • DQN: Deep Q Network [Paper]
  • Double DQN: DQN with Double Q-learning [Paper]
  • Dueling DQN: DQN with Dueling Network [Paper]
  • PER: DQN with Prioritized Experience Replay [Paper]
  • NoisyDQN: DQN with Parameter Space Noise for Exploration [Paper]
  • DRQN: Deep Recurrent Q-Network [Paper]
  • QRDQN: DQN with Quantile Regression [Paper]
  • C51: Distributional Reinforcement Learning [Paper]
  • PG: Vanilla Policy Gradient [Paper]
  • NPG: Natural Policy Gradient [Paper]
  • PPG: Phasic Policy Gradient [Paper] [Code]
  • A2C: Advantage Actor Critic [Paper] [Code]
  • SAC: Soft Actor-Critic [Paper] [Code]
  • SAC-Discrete: Soft Actor-Critic for Discrete Actions [Paper] [Code]
  • PPO-Clip: Proximal Policy Optimization with Clipped Objective [Paper] [Code]
  • PPO-KL: Proximal Policy Optimization with KL Divergence [Paper] [Code]
  • DDPG: Deep Deterministic Policy Gradient [Paper] [Code]
  • TD3: Twin Delayed Deep Deterministic Policy Gradient [Paper] [Code]
  • P-DQN: Parameterised Deep Q-Network [Paper]
  • MP-DQN: Multi-pass Parameterised Deep Q-network [Paper] [Code]
  • SP-DQN: Split Parameterised Deep Q-Network [Paper]
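
Several of the value-based entries above are small variations on how the bootstrap target is computed. As an illustrative sketch (plain Python, not XuanCe's implementation), the difference between DQN and Double DQN is where the greedy next action comes from:

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    # Vanilla DQN: the target network both selects and evaluates the
    # greedy next action, which tends to overestimate values.
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    # Double DQN: the online network selects the greedy action and the
    # target network evaluates it, reducing overestimation bias.
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]
```

For example, with `next_q_online = [1.0, 2.0]` and `next_q_target = [5.0, 0.0]`, vanilla DQN bootstraps from 5.0 while Double DQN bootstraps from 0.0 — exactly the overestimation gap the Double DQN paper targets.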

:point_right: Model-Based Reinforcement Learning (MBRL)

:point_right: Multi-Agent Reinforcement Learning (MARL)

  • IQL: Independent Q-learning [Paper] [Code]
  • VDN: Value Decomposition Networks [Paper] [Code]
  • QMIX: Q-mixing networks [Paper] [Code]
  • WQMIX: Weighted Q-mixing networks [Paper] [Code]
  • QTRAN: Q-transformation [Paper] [Code]
  • DCG: Deep Coordination Graphs [Paper] [Code]
  • IDDPG: Independent Deep Deterministic Policy Gradient [Paper]
  • MADDPG: Multi-agent Deep Deterministic Policy Gradient [Paper] [Code]
  • IAC: Independent Actor-Critic [Paper] [Code]
  • COMA: Counterfactual Multi-agent Policy Gradient [Paper] [Code]
  • VDAC: Value-Decomposition Actor-Critic [Paper] [Code]
  • IPPO: Independent Proximal Policy Optimization [Paper] [Code]
  • MAPPO: Multi-agent Proximal Policy Optimization [Paper] [Code]
  • MFQ: Mean-Field Q-learning [Paper] [Code]
  • MFAC: Mean-Field Actor-Critic [Paper] [Code]
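
The value-decomposition family above (VDN, QMIX, WQMIX) differs mainly in how per-agent utilities are mixed into a joint value. An illustrative sketch (plain Python, not XuanCe's implementation; the state-conditioned hypernetwork that would produce the QMIX weights is elided):

```python
def vdn_joint_q(agent_qs):
    # VDN: the joint action-value is the plain sum of per-agent
    # utilities, so each agent's greedy action is also jointly greedy.
    return sum(agent_qs)

def qmix_joint_q(agent_qs, weights, bias):
    # QMIX: a mixer with non-negative weights keeps the joint value
    # monotone in every agent's utility while being more expressive
    # than a plain sum.
    assert all(w >= 0.0 for w in weights), "monotonicity requires w >= 0"
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias
```

Both mixers preserve the property that the joint greedy action decomposes into per-agent greedy actions, which is what makes decentralized execution tractable after centralized training.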