Pymarl2

Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)


- For high sample efficiency, use qmix_high_sample_efficiency.yaml, which trains with 4 processes; it is slower but more sample-efficient.
- Performance is *not* comparable between models trained with different numbers of processes.

PyMARL2

Open-source code for Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning.

This repository is fine-tuned for the StarCraft Multi-Agent Challenge (SMAC). For other multi-agent tasks, we also recommend an optimized implementation of QMIX: https://github.com/marlbenchmark/off-policy.

StarCraft 2 version: SC2.4.10. Difficulty: 7.

2022.10.10 update: added qmix_high_sample_efficiency.yaml, which trains with 4 processes; it is slower but more sample-efficient.

2021.10.28 update: added Google Football environments [vdn_gfootball.yaml] (using `simple115 features`).

2021.10.4 update: added QMIX with attention (qmix_att.yaml) as a baseline for communication tasks.

Finetuned-QMIX

Multi-Agent Reinforcement Learning (MARL) involves many code-level tricks, such as:

- Value function clipping (clip max Q values for QMIX)
- Value Normalization
- Reward scaling
- Orthogonal initialization and layer scaling
- Adam
- Neural networks hidden size
- Learning rate annealing
- Reward Clipping
- Observation Normalization
- Gradient Clipping
- Large Batch Size
- N-step Returns (including GAE($\lambda$) and Q($\lambda$) ...)
- Rollout Process Number
- $\epsilon$-greedy annealing steps
- Death Agent Masking
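
To make one of the listed tricks concrete, here is a minimal sketch of the $\lambda$-return that underlies TD($\lambda$)-style n-step targets, computed for a single episode. It is not taken from this repository: the function name `lambda_returns`, its arguments, and the default values of `gamma` and `lam` are hypothetical and chosen only for illustration. In a Q-learning setting, `next_values` would typically hold a target network's estimate for the greedy next action.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Compute lambda-returns for one episode by backward recursion.

    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})

    rewards[t]     : reward received at step t
    next_values[t] : bootstrap estimate V(s_{t+1}); use 0.0 if s_{t+1} is terminal
    gamma, lam     : illustrative defaults, not values read from any config file
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    if not rewards:
        return []

    returns = [0.0] * len(rewards)
    # Beyond the last step there is no further return, so the recursion
    # starts from the final bootstrap value itself.
    g_next = next_values[-1]
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g_next)
        returns[t] = g_next
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.5, 0.5, 0.0]  # last transition ends the episode
    print(lambda_returns(rewards, next_values, gamma=0.9, lam=0.0))  # one-step TD targets
    print(lambda_returns(rewards, next_values, gamma=0.9, lam=1.0))  # Monte Carlo returns
```

With `lam=0.0` the recursion reduces to one-step TD targets, and with `lam=1.0` it reduces to the discounted Monte Carlo return; values in between trade bias against variance.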

Related Works

- Implementation Matters in Deep RL: A Case Study on PPO and TRPO
- What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
- The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Using a few of the tricks above (bold text), we enabled QMIX (qmix.yaml) to solve almost all hard scenarios of SMAC (with hyperparameters fine-tuned for each scenario).

| Scenarios | Difficulty | QMIX (batch_size=128) | Finetuned-QMIX |
| ------------ | :--------: | :-------------------: | :------------------------------------------------: |
| 8m | Easy | - | 100% |
| 2c_vs_1sc | Easy | - | 100% |
| 2s3z | Easy | - | 100% |
| 1c3s5z | Easy | - | 100% |
| 3s5z | Easy | - | 100% |
| 8m_vs_9m | Hard | 84% | 100% |
| 5m_vs_6m | Hard | 84% | 90% |
| 3s_vs_5z | Hard | 96% | 100% |
| bane_vs_bane | Hard | 100% | 100% |
| 2c_vs_64zg | Hard | 100% | 100% |
| corridor | Super Hard | 0% | 100% |
| MMM2 | Super Hard | 98% | 100% |
| 3s5z_vs_3s6z | Super Hard | 3% | 93% (hidden_size = 256, qmix_large.yaml) |
| 27m_vs_30m | Super Hard | 56% | 100% |
| 6h_vs_8z | Super Hard | 0% | 93% ($\lambda$ = 0.3, epsilon_anneal_time = 500000) |
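
The 6h_vs_8z row sets `epsilon_anneal_time = 500000`. As a rough illustration of what an annealing-time parameter like this usually controls, the sketch below decays $\epsilon$ linearly and then holds it constant. This is an assumption about the schedule shape, not code from this repository: the function name `linear_epsilon` and the `start` and `finish` values are hypothetical.

```python
def linear_epsilon(step, start=1.0, finish=0.05, anneal_time=500_000):
    """Linearly decay epsilon from `start` to `finish`, then hold it at `finish`.

    step        : number of environment steps taken so far
    start       : initial exploration rate (illustrative value)
    finish      : final exploration rate (illustrative value)
    anneal_time : number of steps over which the decay happens
    """
    if anneal_time <= 0:
        return finish
    # Fraction of the annealing period completed, clamped to [0, 1].
    fraction = min(max(step, 0) / anneal_time, 1.0)
    return start + fraction * (finish - start)


if __name__ == "__main__":
    for step in (0, 250_000, 500_000, 1_000_000):
        print(step, round(linear_epsilon(step), 3))
```

A longer annealing time keeps the agents exploring for more of the training run, which is one plausible reason a hard exploration map would benefit from a larger value.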

Re-Evaluation

Afterwards, we re-evaluated numerous QMIX variants with the tricks normalized (a general set of hyperparameters) and found that QMIX achieves state-of-the-art performance.

| Scenarios | Difficulty | Value-based | | | | | Policy-based | | | |
| ------------ | ---------- | :-------------: | :------------: | :------------: | :------------: | :------------: | :------------: | ----- | :------------: | :------------: |
| | | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RIIT |
| 2c_vs_64zg | Hard | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 84% | 100% |
| 8m_vs_9m | Hard | 100% | 100% | 100% | 95% | 95% | 48% | 75% | 96% | 95% |
| 3s_vs_5z | Hard | 100% | 100% | 100% | 100% | 100% | 96% | 96% | 100% | 96% |
| 5m_vs_6m | Hard | 90% | 90% | 90% | 90% | 90% | 53% | 9% | 63% | 67% |
| 3s5z_vs_3s6z | S-Hard | 75% | 43% | 62% | 68% | 56% | 0% | 56% | 0% | 75% |
| corridor | S-Hard | 100% | 98% | 100% | 96% | 96% | 0% | 0% | 0% | 100% |
| 6h_vs_8z | S-Hard | 84% | 87% | 82% | 78% | 75% | 4% | 80% | 0% | 19% |
| MMM2 | S-Hard | 100% | 96% | 100% | 100% | 96% | 0% | 70% | 3% | 100% |
| 27m_vs_30m | S-Hard | 100% | 100% | 100% | 100% | 100% | 9% | 93% | 0% | 93% |
| Discrete PP | - | 40 | 39 | - | 39 | 39 | 30 | 39 | 38 | 38 |
| Avg. Score | Hard+ | 94.9% | 91.2% | 92.7% | 92.5% | 90.5% | 29.2% | 67.4% | 44.1% | 84.0% |

Communication

We also tested our QMIX-with-attention (qmix_att.yaml, $\lambda=0.3$, attention_heads=4) on some maps (from NDQ) that require communication.

| Scenarios (2M steps) | Difficulty | Finetuned-QMIX (No Communication) | QMIX-with-attention (Communication) |
| --------------------- | :--------: | :-------------------------------: | :----------------------------------: |
| 1o_10b_vs_1r | - | 56% | 87% |
| 1o_2r_vs_4r | - | 50% | 95% |
| bane_vs_hM | - | 0% | 0% |

Google Football

We also tested VDN (vdn_gfootball.yaml) on some Google Football maps. Specifically, we train the model on simple115 features (the original Google Football paper uses complex CNN features). We did not test QMIX because this environment does not provide global state information.

| Scenarios | Difficulty | VDN ($\lambda=1.0$) |
| -------------------------- | :--------: | :-------------------: |
| academy_counterattack_hard | - | 0.71 (Test Score) |
| academy_counterattack_easy | - | 0.87 (Test Score) |
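
The choice of VDN here follows from what each mixer consumes: VDN adds up the per-agent Q-values, whereas QMIX's mixing network is conditioned on the global state. The sketch below shows only the VDN side, in plain Python rather than the tensor code a real implementation would use; the name `vdn_q_tot` and its arguments are hypothetical and not taken from this repository.

```python
def vdn_q_tot(agent_qs, actions):
    """Return Q_tot as the sum of each agent's Q-value for its chosen action.

    agent_qs[i][a] : agent i's Q-value for action index a
    actions[i]     : index of the action agent i actually took

    Note that no global state is needed, only per-agent values.
    """
    if len(agent_qs) != len(actions):
        raise ValueError("need exactly one action per agent")
    return sum(q_values[action] for q_values, action in zip(agent_qs, actions))


if __name__ == "__main__":
    # Two agents with two actions each (made-up numbers).
    agent_qs = [[0.1, 0.9], [0.4, 0.2]]
    actions = [1, 0]
    print(vdn_q_tot(agent_qs, actions))
```

Because the sum is increasing in every agent's Q-value, each agent can act greedily on its own values and still maximize the joint value, which is why this mixer works with local information alone.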

Usage

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

Actor Critic Methods:

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
conda create -n pymarl python=3.8 -y
conda activate pymarl

bash install_dependecies.sh

Set up StarCraft II (2.4.10) and SMAC:

bash install_sc2.sh

This downloads SC2.4.10 into the 3rdparty folder and copies over the maps needed to run.

Set up Google Football:

bash install_gfootball.sh

Command Line Tool

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor

# For Difficulty-Enhanced Predator-Prey
python3 src/main.py --config=qmix_predator_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

# For Communication tasks
python3 src/main.py --config=qmix_att --env-config=sc2 with env_args.map_name=1o_10b_vs_1r

# For Google Football (Insufficient testing)
# map_
