# ECON

[ICML 2025] "From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium"
## Motivation
Existing multi-agent frameworks face significant limitations when applied to Large Language Models (LLMs). Traditional approaches struggle with the high-dimensional nature of language models and lack proper coordination mechanisms for complex reasoning tasks.
<div align="center"> <figure> <img src="assets/compare.jpg" alt="ECON vs Traditional MAD Comparison" width="800"> <br> <p><em>Comparison between ECON and traditional Multi-Agent Debate (MAD) approaches</em></p> </figure> </div>

Current multi-agent LLM systems suffer from:
- Prohibitive Communication Costs: Traditional multi-agent debate relies on explicit message passing, incurring substantial token costs and computational overhead
- No Convergence Guarantees: Current approaches lack theoretical assurances of converging to stable, effective solutions
- Scalability Challenges: Information exchange often exceeds LLM context limits, severely impeding scalability in large agent ensembles
## Our Solution: ECON Framework
<div align="center"> <figure> <img src="assets/framework.jpg" alt="ECON Framework Architecture" width="800"> <br> <p><em>ECON's two-stage coordination architecture with Bayesian Nash Equilibrium</em></p> </figure> </div>

To address these critical challenges, we introduce ECON, a multi-agent LLM framework that implements efficient coordination via Bayesian Nash Equilibrium, enabling scalable and theoretically grounded multi-agent reasoning.
- Implicit Belief-Driven Coordination: Replaces costly message passing with belief-based coordination, dramatically reducing communication overhead
- Guaranteed Convergence to Equilibrium: Establishes a rigorous Bayesian Nash Equilibrium (BNE) framework with theoretical convergence guarantees
- Hierarchical & Scalable Architecture: Enables effective coordination in large ensembles via a local-to-global approach that respects LLM context limits
## Minimal Usage
### Installation
We provide two installation methods:
#### Package Installation (Recommended)
Install the ECON framework dependencies:
```bash
pip install -r requirements.txt
```
#### Development Installation
For development or customization, clone the repository and set up the environment:
```bash
# Clone the repository
git clone https://github.com/yourusername/ECON.git
cd ECON

# Create and activate conda environment
conda create -n econ python=3.8
conda activate econ

# Install dependencies
pip install -r requirements.txt
```
### Model Setup
Before running the framework, you need to set up the Together AI API key:
```bash
export TOGETHER_API_KEY="your_together_ai_api_key"
```
### Usage
#### Quick Start with Command Line Interface
Set your API key once:
```bash
export TOGETHER_API_KEY="your_together_ai_api_key"

# One-line MATH sanity run (train 1 ep, test 5 eps)
python scripts/run_math_test.py \
  --train-eps 1 \
  --test-eps 5 \
  --log-dir logs_exp1 \
  --model-dir models_exp1

# Default P0 training + BNE testing
python scripts/run_p0_test.py \
  --train-eps 100 \
  --test-eps 30 \
  --log-dir logs_exp1 \
  --model-dir models_exp1
```
Notes:
- `max_rounds` defaults to 1 (single decision) in `scripts/config_p0.yaml` and `scripts/config_math.yaml`; increase it if you need multi-round episodes.
- Reward weights in the env reuse the α weights learned during training; override via `reward.initial_weights` in the config.
- All schemes expect `agent_memory` so the runner/learner can consume short-term trajectories.
## Configuration
### Key Parameters
- `n_agents`: Number of executor agents (e.g., 3, 5, 8)
- `coordinator_model`: Coordinator LLM model name
- `executor_model`: Executor LLM model name
- `update_interval`: Gradient update frequency (default: 10 steps)
- `bne_max_iterations`: Maximum BNE coordination iterations
- `belief_dim`: Dimension of agent belief states
- `sampling.temperature_min/max`: Bounds for temperature
- `sampling.p_min/max`: Bounds for repetition penalty (second action dimension)
- `sampling.top_p_default`: Fixed top_p used for generation (default: 0.9)
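As a rough illustration, these parameters might be grouped in a config file like the sketch below. All values here are hypothetical placeholders; consult the shipped `scripts/config_p0.yaml` for the actual layout and defaults.

```yaml
# Illustrative values only; the real config files may use a different layout.
n_agents: 3
coordinator_model: "meta-llama/Llama-3.3-70B-Instruct-Turbo"
executor_model: "meta-llama/Llama-3.3-70B-Instruct-Turbo"
update_interval: 10          # gradient update every 10 steps
bne_max_iterations: 3
belief_dim: 64               # hypothetical belief-state dimension
sampling:
  temperature_min: 0.3
  temperature_max: 1.0
  p_min: 1.0                 # repetition-penalty lower bound
  p_max: 1.3                 # repetition-penalty upper bound
  top_p_default: 0.9
```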
### Supported Models
The framework supports any open-source language model accessible via Together AI API. Models can be hosted using:
- Together AI: For remote model serving with API access
- Local APIs: Compatible with OpenAI-style APIs
Example: Using Llama-3.3-70B-Instruct-Turbo
```bash
./run_econ.sh \
  --api-key YOUR_API_KEY \
  --config src/config/config.yaml \
  --agents 3 \
  --experiment-name llama-coordination-test
```
### Custom Datasets
Create your own datasets following the Hugging Face format with question and answer fields:
```yaml
env_args:
  hf_dataset_path: "your_custom_dataset"
  dataset_split: "train"
  question_field_name: "question"
  answer_field_name: "answer"
  max_question_length: 1024
  max_answer_length: 512
```
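A minimal sketch of how a record could be checked and truncated against these `env_args` before use. The `prepare_record` helper and its simple character-level truncation rule are assumptions for illustration, not ECON's actual data loader.

```python
# Hypothetical pre-processing mirroring the env_args above; ECON's real
# loader may behave differently (e.g., token-level limits).
def prepare_record(record: dict,
                   question_field: str = "question",
                   answer_field: str = "answer",
                   max_question_length: int = 1024,
                   max_answer_length: int = 512) -> dict:
    # Fail fast if the configured fields are missing from the record.
    if question_field not in record or answer_field not in record:
        raise KeyError("record is missing the configured question/answer fields")
    # Truncate both fields to the configured maximum lengths.
    return {
        "question": record[question_field][:max_question_length],
        "answer": record[answer_field][:max_answer_length],
    }

sample = {"question": "What is 2 + 2?", "answer": "4"}
prepared = prepare_record(sample)
```

Custom field names map straight through: `prepare_record(rec, question_field="problem")` would read the `problem` column instead.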
## Testing & Evaluation
### Available Testing Methods
The framework provides multiple testing approaches for comprehensive model validation:
#### 1. Integrated Training + Testing (`scripts/run_p0_test.py`)
Train and test in one command (recommended for quick experiments):
```bash
# Quick test (5 train episodes, 3 test episodes)
python scripts/run_p0_test.py \
  --train-eps 5 \
  --test-eps 3 \
  --log-dir logs_quick \
  --model-dir models_quick

# Full training (100 train episodes, 30 test episodes)
python scripts/run_p0_test.py \
  --train-eps 100 \
  --test-eps 30 \
  --log-dir logs_exp1 \
  --model-dir models_exp1
```
Features:
- Automatically runs training followed by BNE testing (3 rounds)
- Saves model checkpoints to `--model-dir`
- Logs test traces to `--log-dir/llm_traces_test_bne_3rounds.json`
- Reports accuracy and P0 metadata (JSON parsing rate)
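The logged test traces can be post-processed with a few lines of Python. This sketch assumes the trace JSON is a list of per-episode records each carrying a boolean `correct` flag; the real trace schema may differ, so treat the field name as a placeholder.

```python
import json

# Hypothetical trace schema: a JSON list of episode records, each with a
# boolean "correct" flag. Adjust the field name to match the real traces.
def accuracy_from_traces(path: str) -> float:
    with open(path) as f:
        episodes = json.load(f)
    if not episodes:
        return 0.0
    # Fraction of episodes marked correct.
    return sum(1 for ep in episodes if ep.get("correct")) / len(episodes)
```

For example, `accuracy_from_traces("logs_exp1/llm_traces_test_bne_3rounds.json")` would return the test accuracy as a float in [0, 1].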
#### 2. Testing Pre-trained Models (`scripts/test_p0.py`)
Test existing trained models without retraining:
```bash
export TOGETHER_API_KEY="your_api_key"
python scripts/test_p0.py
```
Configuration:
- Edit the `MODEL_DIR` variable in `test_p0.py` to point to your trained model directory (e.g., `./models_exp1/final`)
- Runs both baseline (no BNE) and P0 BNE (3 rounds) tests
- Outputs: `logs_p0_test_baseline.json` and `logs_p0_test_p0.json`
Example Output:

```text
Baseline (no BNE): 10/10 = 100.0%
P0 BNE (3 rounds): 10/10 = 100.0%
P0 Metadata: JSON=100%
```
Note: `src/eval.py` is not part of the current workflow; rely on the above test scripts for evaluation.
#### 3. Dataset-Specific Testing
Test on MATH or SVAMP datasets:
```bash
# MATH dataset
python scripts/run_math_test.py \
  --train-eps 5 \
  --test-eps 10 \
  --log-dir logs_math \
  --model-dir models_math

# SVAMP dataset
python scripts/run_svamp_test.py \
  --train-eps 5 \
  --test-eps 10 \
  --log-dir logs_svamp \
  --model-dir models_svamp
```
Already-trained checkpoints can be evaluated directly with `scripts/test_math.py` and `scripts/test_svamp.py` (baseline vs BNE, 10 episodes each by default).
### Episode Structure Explanation
Important: Episodes in ECON have a unique structure that differs from traditional RL environments.
Default Setup (Single-Decision Episodes):
```text
Episode = One math problem
├─ t=0: Decision step (includes K internal BNE refinement rounds)
│  └─ BNE coordination: belief updates, response generation, convergence
└─ t=1: Terminal state (reward computation)
```
Key Points:
- One episode = one math problem (not multiple attempts)
- 2 RL timesteps = 1 decision + 1 terminal (standard episodic RL)
- BNE refinement (K rounds) happens internally at t=0
- Multi-round debate occurs inside the decision step via belief coordination
Internal BNE Process (at t=0):
```text
# Within a single timestep t=0:
Round 0: e_init      → LLM outputs → Commitment_0
Round 1: e_refined_1 → LLM outputs → Commitment_1
Round 2: e_refined_2 → LLM outputs → Commitment_2
Final:   Submit Commitment_2 as answer
```
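The refinement rounds above can be sketched as a loop. The `llm_step` callable and the scalar-embedding convergence test below are hypothetical stand-ins for illustration, not ECON's implementation, which refines high-dimensional belief embeddings.

```python
def bne_refine(e_init, llm_step, max_iterations=3, tol=1e-3):
    """Run up to max_iterations belief-refinement rounds within timestep t=0.

    llm_step maps a belief embedding to (refined_embedding, commitment);
    we stop early once the embedding change falls below tol (a toy
    convergence criterion for this scalar sketch).
    """
    e, commitment = e_init, None
    for _ in range(max_iterations):
        e_next, commitment = llm_step(e)
        if abs(e_next - e) < tol:  # converged: submit the current commitment
            e = e_next
            break
        e = e_next
    return commitment

# Toy llm_step that contracts the "embedding" toward a fixed point at 1.0
step = lambda e: (0.5 * (e + 1.0), f"commit@{e:.2f}")
```

With `max_iterations_infer: 3`, this mirrors the three rounds shown above: each round's commitment supersedes the previous one, and only the final commitment is submitted as the answer.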
Configuration:
```yaml
bne:
  max_iterations_train: 1   # BNE rounds during training
  max_iterations_infer: 3   # BNE rounds during testing
```
Optional Multi-Round Episodes:
For environments requiring multiple answer revision steps:
```yaml
env_args:
  max_rounds: 3   # Enable multi-round attempts (default is 1)
```
This creates true multi-step episodes:
- Single-step (default): 2 timesteps (decision + terminal)
- Multi-round (optional): (N+1) timesteps (N decisions + terminal)
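In code terms, the timestep count follows directly from `max_rounds` (a trivial sketch of the arithmetic above, not ECON's env code):

```python
def episode_timesteps(max_rounds: int = 1) -> int:
    """One decision step per round, plus one terminal state."""
    return max_rounds + 1
```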
## Advanced Features
### Architecture Components
- Coordinator LLM: Generates strategies (≤50 tokens) and final commitments without revealing answers
- Executor LLMs: Multiple agents that process strategies and generate individual responses
- BeliefNetwork: Individual agent belief state management with Q-value computation
- BeliefEncoder: Group representation aggregation using attention mechanisms
- Mixer: Attention-based agent interaction layer that aggregates local Q values with commitment-alignment and consistency regularization
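As a rough illustration of the Mixer's attention-based aggregation, the sketch below combines per-agent Q-values into a global Q via softmax attention over belief states. The shapes and the dot-product scoring rule are assumptions; ECON's actual Mixer also applies commitment-alignment and consistency regularization, omitted here.

```python
import numpy as np

def mix_q_values(local_q: np.ndarray, beliefs: np.ndarray, query: np.ndarray) -> float:
    """Aggregate per-agent Q-values into a global Q via softmax attention.

    local_q: (n_agents,) per-agent Q-values
    beliefs: (n_agents, d) agent belief states used as attention keys
    query:   (d,) group-level query vector
    """
    # Scaled dot-product attention scores, one per agent.
    scores = beliefs @ query / np.sqrt(beliefs.shape[1])
    # Numerically stable softmax over agents.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Convex combination of local Q-values.
    return float(weights @ local_q)
```

Because the weights form a convex combination, the global Q always lies between the minimum and maximum local Q-values.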
Bayesian Na