Smpe

[ICML 2025] Official Code of SMPE: "Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration"

<h1 align="center"> Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration (Accepted at ICML 2025) </h1>
<div align="center">
<a href="https://ddaedalus.github.io/" target="_blank">Andreas Kontogiannis</a><sup>1,2,*</sup> &ensp; · &ensp;
<a href="https://www.linkedin.com/in/konstantinos-papathanasiou-4bbb1b176/?originalSubdomain=gr" target="_blank">Konstantinos Papathanasiou</a><sup>3,*</sup> &ensp; · &ensp;
<a href="https://people.duke.edu/~ys267/">Yi Shen</a><sup>4</sup> &ensp; · &ensp;
<a href="https://scholar.google.nl/citations?hl=en&user=R3y5dxMAAAAJ" target="_blank">Giorgos Stamou</a><sup>1</sup> &ensp; · &ensp;
<a href="https://www.michaelmzavlanos.org/" target="_blank">Michael M. Zavlanos</a><sup>4</sup> &ensp; · &ensp;
<a href="https://scholar.google.com/citations?user=PBX9aQUAAAAJ&hl=en" target="_blank">George Vouros</a><sup>5</sup>
<br><br>
<sup>1</sup> National Technical University of Athens &emsp; <sup>2</sup> Archimedes AI &emsp; <sup>3</sup> ETH Zurich &emsp; <sup>4</sup> Duke University &emsp; <sup>5</sup> University of Piraeus
</div>

Abstract

Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy's discriminative abilities under partial observability explicitly, by incorporating their beliefs into the policy network, and implicitly, by adopting adversarial exploration policies that encourage agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.
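The abstract says agents explicitly incorporate their beliefs into the policy network. One simple way to picture this, assuming a belief vector has already been inferred, is to concatenate it with the agent's own observation before computing action probabilities. The toy below does that with a single linear layer and a softmax in pure Python. The class name, dimensions, and random weights are hypothetical; this is not SMPE's architecture, its belief-inference model, or its adversarial exploration objective.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class BeliefConditionedPolicy:
    """Hypothetical toy policy over the concatenated input [observation ; belief]."""

    def __init__(self, obs_dim, belief_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.belief_dim = belief_dim
        in_dim = obs_dim + belief_dim
        # One randomly initialised weight row per action (untrained, illustration only).
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_actions)
        ]

    def action_probs(self, obs, belief):
        if len(obs) != self.obs_dim or len(belief) != self.belief_dim:
            raise ValueError("unexpected observation or belief size")
        # Explicit conditioning: the belief is appended to the local observation.
        x = list(obs) + list(belief)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        return softmax(logits)


policy = BeliefConditionedPolicy(obs_dim=3, belief_dim=2, n_actions=4, seed=0)
same_obs = [0.5, -0.2, 1.0]
probs_a = policy.action_probs(same_obs, [1.0, 0.0])
probs_b = policy.action_probs(same_obs, [0.0, 1.0])
```

Because the belief is part of the input, the same observation paired with different beliefs can yield different action distributions, so the policy can in principle tell apart states that look identical from the local observation alone.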

Paper Link

LBF command line

python3 main.py --config=smpe_lbf --env-config=gymma with env_args.time_limit=50 env_args.key="Foraging-2s-9x9-3p-2f-coop-v2"

MPE command line

python3 main.py --config=smpe_mpe --env-config=gymma with env_args.time_limit=25 env_args.key="mpe:SimpleSpread-v0"

RWARE command line

python3 main.py --config=smpe_lbf --env-config=gymma with env_args.time_limit=500 env_args.key="rware:rware-tiny-4ag-hard-v1"
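To run the three benchmarks from a script instead of retyping the commands, a minimal Python sketch could rebuild each command as an argument list and launch it with `subprocess`. It assumes it is run from the repository root, where `main.py` lives. The `RUNS` table and the `build_command` and `launch` helpers are hypothetical; the flag values are copied verbatim from the commands above, including the `smpe_lbf` config used for RWARE. It uses `sys.executable` (the current interpreter) in place of a bare `python3`.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical table: (config, time limit, environment key) per benchmark,
# copied from the README command lines.
RUNS = {
    "lbf": ("smpe_lbf", 50, "Foraging-2s-9x9-3p-2f-coop-v2"),
    "mpe": ("smpe_mpe", 25, "mpe:SimpleSpread-v0"),
    "rware": ("smpe_lbf", 500, "rware:rware-tiny-4ag-hard-v1"),
}


def build_command(benchmark: str) -> list[str]:
    """Return the argv list equivalent to the README command for `benchmark`."""
    if benchmark not in RUNS:
        raise ValueError(f"unknown benchmark {benchmark!r}; expected one of {sorted(RUNS)}")
    config, time_limit, key = RUNS[benchmark]
    return [
        sys.executable,  # the current interpreter, in place of `python3`
        "main.py",
        f"--config={config}",
        "--env-config=gymma",
        "with",  # the key=value overrides follow `with`, as in the README
        f"env_args.time_limit={time_limit}",
        f"env_args.key={key}",  # no quotes needed: a list bypasses the shell
    ]


def launch(benchmark: str) -> subprocess.CompletedProcess:
    """Run one benchmark from the repository root; raises if main.py exits non-zero."""
    cmd = build_command(benchmark)
    print(" ".join(cmd))
    return subprocess.run(cmd, check=True)
```

For example, `launch("mpe")` starts the MPE SimpleSpread run. Passing a list rather than a shell string gives `main.py` the same arguments the shell would, since the shell strips the quotes around `env_args.key=...` anyway.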

If you are using SMPE in your research, please cite:

@inproceedings{
kontogiannis2025enhancing,
title={Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration},
author={Andreas Kontogiannis and Konstantinos Papathanasiou and Yi Shen and Giorgos Stamou and Michael M. Zavlanos and George Vouros},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=TCsdlqzZNL}
}
