
<div align="center"> <img src="docs/source/_static/figures/logo_1.png" width="400" height="auto" align=center /> </div>

XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library


Benchmarks

Full Documentation | Chinese Documentation (README_CN.md)

XuanCe is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.

We call it Xuan-Ce (玄策) in Chinese: "Xuan (玄)" means incredible or magical, and "Ce (策)" means policy.

DRL algorithms are sensitive to hyperparameter tuning, vary in performance with different implementation tricks, and often suffer from unstable training, so they can seem elusive and "Xuan". This project provides thorough, high-quality, and easy-to-understand implementations of DRL algorithms, in the hope that they shed some light on the magic of reinforcement learning.

We expect it to be compatible with multiple deep-learning backends (PyTorch, TensorFlow, and MindSpore), and hope it can truly become a zoo full of DRL algorithms.

Paper link: https://arxiv.org/pdf/2312.16248.pdf


Features

  • :school_satchel: Highly modularized.
  • :thumbsup: Easy to learn, easy to install, and easy to use.
  • :twisted_rightwards_arrows: Flexible model combination.
  • :tada: Abundant algorithms across a variety of tasks.
  • :couple: Supports both DRL and MARL tasks.
  • :key: High compatibility across setups (PyTorch, TensorFlow 2, MindSpore; CPU and GPU; Linux, Windows, macOS, etc.).
  • :zap: Fast training with parallel environments.
  • :computer: Distributed training with multiple GPUs.
  • 🎛️ Supports automatic hyperparameter tuning.
  • :chart_with_upwards_trend: Clear visualization with TensorBoard or Weights & Biases (wandb).
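
Installation follows the usual PyPI route. A minimal sketch (the backend extras below are an assumption; check the official documentation for the exact extras your version supports):

```shell
# Install the stable XuanCe release from PyPI
pip install xuance

# Optionally select a deep-learning backend via an extra
# (extra name assumed; verify against the docs)
pip install xuance[torch]
```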

Algorithms

:point_right: DRL

  • DQN: Deep Q Network [Paper]
  • Double DQN: DQN with Double Q-learning [Paper]
  • Dueling DQN: DQN with Dueling Network [Paper]
  • PER: DQN with Prioritized Experience Replay [Paper]
  • NoisyDQN: DQN with Parameter Space Noise for Exploration [Paper]
  • DRQN: Deep Recurrent Q-Network [Paper]
  • QRDQN: DQN with Quantile Regression [Paper]
  • C51: Distributional Reinforcement Learning [Paper]
  • PG: Vanilla Policy Gradient [Paper]
  • NPG: Natural Policy Gradient [Paper]
  • PPG: Phasic Policy Gradient [Paper] [Code]
  • A2C: Advantage Actor Critic [Paper] [Code]
  • SAC: Soft Actor-Critic [Paper] [Code]
  • SAC-Discrete: Soft Actor-Critic for Discrete Actions [Paper] [Code]
  • PPO-Clip: Proximal Policy Optimization with Clipped Objective [Paper] [Code]
  • PPO-KL: Proximal Policy Optimization with KL Divergence [Paper] [Code]
  • DDPG: Deep Deterministic Policy Gradient [Paper] [Code]
  • TD3: Twin Delayed Deep Deterministic Policy Gradient [Paper] [Code]
  • P-DQN: Parameterised Deep Q-Network [Paper]
  • MP-DQN: Multi-pass Parameterised Deep Q-network [Paper] [Code]
  • SP-DQN: Split Parameterised Deep Q-Network [Paper]
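
Several of the value-based entries above are small variations on how the bootstrap target is computed. As an illustrative sketch (plain Python, not XuanCe's implementation), the difference between DQN and Double DQN is where the greedy next action comes from:

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    # Vanilla DQN: the target network both selects and evaluates the
    # greedy next action, which tends to overestimate values.
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    # Double DQN: the online network selects the greedy action and the
    # target network evaluates it, reducing overestimation bias.
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]
```

For example, with `next_q_online = [1.0, 2.0]` and `next_q_target = [5.0, 0.0]`, vanilla DQN bootstraps from 5.0 while Double DQN bootstraps from 0.0 — exactly the overestimation gap the Double DQN paper targets.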

:point_right: Model-Based Reinforcement Learning (MBRL)

:point_right: Multi-Agent Reinforcement Learning (MARL)

  • IQL: Independent Q-learning [Paper] [Code]
  • VDN: Value Decomposition Networks [Paper] [Code]
  • QMIX: Q-mixing networks [Paper] [Code]
  • WQMIX: Weighted Q-mixing networks [Paper] [Code]
  • QTRAN: Q-transformation [Paper] [Code]
  • DCG: Deep Coordination Graphs [Paper] [Code]
  • IDDPG: Independent Deep Deterministic Policy Gradient [Paper]
  • MADDPG: Multi-agent Deep Deterministic Policy Gradient [Paper] [Code]
  • IAC: Independent Actor-Critic [Paper] [Code]
  • COMA: Counterfactual Multi-agent Policy Gradient [Paper] [Code]
  • VDAC: Value-Decomposition Actor-Critic [Paper] [Code]
  • IPPO: Independent Proximal Policy Optimization [Paper] [Code]
  • MAPPO: Multi-agent Proximal Policy Optimization [Paper] [Code]
  • MFQ: Mean-Field Q-learning [Paper] [Code]
  • MFAC: Mean-Field Actor-Critic [Paper] [Code]
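
The value-decomposition family above (VDN, QMIX, WQMIX) differs mainly in how per-agent utilities are mixed into a joint value. An illustrative sketch (plain Python, not XuanCe's implementation; the state-conditioned hypernetwork that would produce the QMIX weights is elided):

```python
def vdn_joint_q(agent_qs):
    # VDN: the joint action-value is the plain sum of per-agent
    # utilities, so each agent's greedy action is also jointly greedy.
    return sum(agent_qs)

def qmix_joint_q(agent_qs, weights, bias):
    # QMIX: a mixer with non-negative weights keeps the joint value
    # monotone in every agent's utility while being more expressive
    # than a plain sum.
    assert all(w >= 0.0 for w in weights), "monotonicity requires w >= 0"
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias
```

Both mixers preserve the property that the joint greedy action decomposes into per-agent greedy actions, which is what makes decentralized execution tractable after centralized training.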