# GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
## 🌟 Introduction

GenEnv is a co-training framework that jointly trains an Agent LLM and an Environment LLM. The key insight is that the Environment LLM learns to generate training tasks at the boundary of the Agent's capability, neither too easy nor too hard, creating an adaptive curriculum that maximizes learning efficiency.
<p align="center"> <img src="assets/framework.png" width="800"/> </p>

## Key Features
- 🔄 Co-Training Loop: Agent and Environment LLMs are trained alternately, each improving the other
- 📊 Adaptive Curriculum: Environment generates tasks calibrated to the Agent's current skill level
- 🎯 Boundary Learning: Focus on tasks where the Agent has ~50% success rate for maximum gradient signal
- ⚡ Built on veRL: Leverages the efficient veRL framework for distributed GRPO training
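
To illustrate the dynamics described above (this is a toy numeric sketch, not the actual GRPO implementation; the success-rate model, learning rates, and constants are all hypothetical), the Environment's task difficulty and the Agent's skill can co-evolve toward the ~50% boundary:

```python
def agent_success_rate(task_difficulty: float, agent_skill: float) -> float:
    """Toy model: success probability decays as difficulty exceeds skill."""
    return 1.0 / (1.0 + 2.0 ** (4.0 * (task_difficulty - agent_skill)))

def co_training_step(agent_skill: float, task_difficulty: float, lr: float = 0.1):
    """One alternation of the loop: the Env moves task difficulty toward the
    ~50% success boundary, then the Agent improves, with a learning signal
    that is largest near that boundary (variance p * (1 - p))."""
    p = agent_success_rate(task_difficulty, agent_skill)
    task_difficulty += lr * (p - 0.5)   # too easy (p > 0.5) -> make tasks harder
    agent_skill += lr * p * (1.0 - p)   # gradient signal is maximal at p = 0.5
    return agent_skill, task_difficulty

skill, difficulty = 0.0, 1.0
for _ in range(200):
    skill, difficulty = co_training_step(skill, difficulty)
```

After a burn-in phase, difficulty tracks skill so that the success rate stays pinned near the boundary, which is the adaptive-curriculum behavior the real framework trains for.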
## 🚀 Quick Start

### Prerequisites

```bash
# Clone the repository
git clone https://github.com/Gen-Verse/GenEnv.git
cd GenEnv

# Install dependencies
pip install -r requirements.txt
```

### Dependencies

GenEnv is built on top of veRL. Please follow veRL's installation instructions first.
## 📋 Usage

### ⚠️ Important: Customization Required

This codebase provides the training framework for GenEnv. To use it for your specific task, you need to customize:

1. **Reward Function** (`genenv/utils/reward_functions.py`): replace `RewardManager.compute_reward()` with your domain-specific reward logic. Examples are provided for math reasoning, tool calling, and action-based tasks.

2. **Environment Prompt Template** (`genenv/trainer/genenv_trainer.py`): modify `_generate_new_tasks()` to customize how the Env LLM generates new tasks, and adjust the prompt template to match your task format.

3. **Task Parsing** (`genenv/trainer/genenv_trainer.py`): update the parsing logic in `_generate_new_tasks()` to extract tasks from the Env LLM's outputs.

4. **Initial Training Data** (`configs/genenv_config.yaml`): prepare your training data in Parquet format with prompts and ground-truth answers.
### Configuration

Edit `configs/genenv_config.yaml`:

```yaml
# Key paths to customize
env_model_path: /path/to/your/env/model       # Environment LLM
actor_rollout_ref.model.path: /path/to/agent  # Agent LLM
data.train_files: /path/to/train.parquet      # Training data
data.val_files: /path/to/val.parquet          # Validation data
trainer.default_local_dir: /path/to/checkpoints

# GenEnv-specific parameters
genenv:
  enable: True
  filtering_k: 0.1              # Filter out the top/bottom 10% of prompts
  num_generations_per_prompt: 4
```
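
As a sketch of what `filtering_k` plausibly controls (the actual logic lives in `genenv/trainer/genenv_trainer.py` and may differ): estimate each prompt's success rate from its `num_generations_per_prompt` rollouts, then drop the easiest and hardest fractions so training focuses on boundary tasks.

```python
def filter_boundary_prompts(success_rates: dict, k: float = 0.1) -> list:
    """Drop the bottom and top k fraction of prompts by success rate,
    keeping tasks near the agent's capability boundary.
    `success_rates` maps prompt_id -> mean reward over its rollouts."""
    ranked = sorted(success_rates, key=success_rates.get)  # hardest first
    n_drop = int(len(ranked) * k)
    return ranked[n_drop: len(ranked) - n_drop] if n_drop else ranked

rates = {f"p{i}": i / 9 for i in range(10)}  # success rates from 0.0 to 1.0
kept = filter_boundary_prompts(rates)  # drops p0 (too hard) and p9 (too easy)
```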
### Training

```bash
# Using the provided script
bash scripts/run_genenv.sh --model /path/to/model --env-model /path/to/env/model

# Or directly with Python
python -m genenv.train \
    genenv.enable=True \
    env_model_path=/path/to/env/model \
    actor_rollout_ref.model.path=/path/to/agent \
    data.train_files=/path/to/train.parquet \
    data.val_files=/path/to/val.parquet
```
## 📁 Project Structure

```text
GenEnv/
├── genenv/
│   ├── __init__.py
│   ├── train.py                 # Main training entry point
│   ├── trainer/
│   │   ├── __init__.py
│   │   └── genenv_trainer.py    # Core GenEnv training loop
│   └── utils/
│       ├── __init__.py
│       └── reward_functions.py  # Reward function implementations
├── configs/
│   └── genenv_config.yaml       # Training configuration
├── scripts/
│   └── run_genenv.sh            # Training launch script
├── requirements.txt
└── README.md
```
## 🔧 Reward Function Examples

### Math Reasoning (Default)

```python
def compute_reward(self, generated_text: str, ground_truth: Any) -> float:
    pred_answer = self._extract_boxed_answer(generated_text)
    gold_answer = self._get_gold_answer(ground_truth)
    return 1.0 if pred_answer == gold_answer else 0.0
```
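
The default manager relies on a `_extract_boxed_answer` helper; a minimal regex-based sketch of such an extractor (the actual helper in `reward_functions.py` may differ) could look like:

```python
import re

def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`,
    tolerating one level of nested braces; None if no boxed answer exists."""
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

extract_boxed_answer(r"Thus the answer is \boxed{\frac{1}{2}}.")  # -> \frac{1}{2}
```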
### Tool Calling

```python
from genenv.utils import ToolCallingRewardManager

reward_fn = ToolCallingRewardManager(tokenizer=tokenizer)
# Checks whether <tool_call>{"name": ..., "parameters": ...}</tool_call>
# matches the ground truth
```

### Custom Domain

```python
class MyRewardManager(RewardManager):
    def compute_reward(self, generated_text: str, ground_truth: Any) -> float:
        # Your custom reward logic here
        return score
```
## 📊 Training Data Format

Your training data should be in Parquet format with at least these columns:

| Column | Description |
|--------|-------------|
| `prompt` | The task prompt (a string or a list of chat messages) |
| `reward_model` | A dict containing `{"ground_truth": <answer>}` |

Example:

```python
import pandas as pd

data = [
    {
        "prompt": [{"role": "user", "content": "What is 2 + 2?"}],
        "reward_model": {"ground_truth": "4"},
    },
    # ... more examples
]
pd.DataFrame(data).to_parquet("train.parquet")
```
## 🙏 Acknowledgements

This project is built upon the excellent work of:

- veRL

We thank the authors for making their code publicly available.
## 📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## 📖 Citation

If you find GenEnv useful for your research, please consider citing:

```bibtex
@misc{guo2025genenvdifficultyalignedcoevolutionllm,
      title={GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators},
      author={Jiacheng Guo and Ling Yang and Peter Chen and Qixin Xiao and Yinjie Wang and Xinzhe Juan and Jiahao Qiu and Ke Shen and Mengdi Wang},
      year={2025},
      eprint={2512.19682},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.19682},
}
```
<p align="center"> <b>Princeton AI Lab</b> | <a href="https://github.com/Gen-Verse">Gen-Verse</a> </p>
