# ContextAgent

[NeurIPS'25] ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
## 🏠 About
<div style="text-align: center;"> <img src="assets/teaser_contextagent.png" alt="Dialogue_Teaser" width=100% > </div>

In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts surrounding humans to enhance the proactivity of LLM agents.

## 🗺️ Overview
<div style="text-align: center;"> <img src="assets/overview_contextagent.png" alt="Dialogue_Teaser" width=100% > </div>

## 📂 Project Structure
```
ContextAgent/
├── src/
│   ├── icl/
│   │   ├── inference_api.py
│   │   └── inference.py
│   ├── sft/
│   │   ├── train.py
│   │   └── eval_sft.sh
│   ├── tools/
│   ├── utils/
│   └── config.py
├── data/
│   └── cab/
├── prompt/
├── scripts/
├── LLaMA-Factory/
├── setup.py
├── pyproject.toml
├── requirements.txt
└── environment.yml
```
## ⚙️ Installation
### Method 1: Using pip (Recommended)
```bash
# Clone the repository
git clone https://github.com/bf-yang/ContextAgent.git
cd ContextAgent

# Install the package
pip install -e .

# Install LLaMA-Factory (required for SFT experiments)
pip install -e ./LLaMA-Factory
```
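
To sanity-check the installation, you can confirm both editable packages are visible to pip. The distribution names below are assumptions; check `setup.py` and LLaMA-Factory's packaging metadata for the exact names:

```bash
# List the two editable installs — distribution names assumed, adjust if needed
pip list | grep -i -e contextagent -e llamafactory
```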
### Method 2: Using conda
```bash
# Clone the repository
git clone https://github.com/bf-yang/ContextAgent.git
cd ContextAgent

# Create conda environment from environment file
conda env create -f environment.yml
conda activate contextagent

# Install the package
pip install -e .
```
## 📊 Evaluation
### 🔑 Configuration

**Environment Variables Setup**
ContextAgent requires several API keys for external tool integrations. Configure them using one of the following supported methods:
**Option 1: Export variables inline (no script)**
```bash
# Azure OpenAI Configuration
export AZURE_OPENAI_API_KEY="your_azure_openai_api_key_here"
export AZURE_OPENAI_ENDPOINT="https://your-resource-name.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"

# External API Keys for Tools
export GOOGLE_MAP_API_KEY="your_google_maps_api_key_here"
export AMAP_API_KEY="your_amap_api_key_here"
export LOCATIONIQ_API_KEY="your_locationiq_api_key_here"
export SERPAPI_KEY="your_serpapi_key_here"
export GOOGLE_CALENDAR_ACCOUNT="your_google_calendar_account_here"

# Set GPU devices (optional)
export CUDA_VISIBLE_DEVICES=0,1  # Use GPUs 0 and 1
```
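
Before a live run, it can help to verify that every key is actually set in the current shell. A minimal bash check over the variable names listed above:

```bash
# Fail fast if any required key is missing from the current shell
for var in AZURE_OPENAI_API_KEY AZURE_OPENAI_ENDPOINT AZURE_OPENAI_API_VERSION \
           GOOGLE_MAP_API_KEY AMAP_API_KEY LOCATIONIQ_API_KEY SERPAPI_KEY \
           GOOGLE_CALENDAR_ACCOUNT; do
  # ${!var} is bash indirect expansion: the value of the variable named by $var
  [ -z "${!var}" ] && echo "Missing: $var"
done
```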
**Option 2: Source a shell script (recommended for convenience)**
```bash
# Edit with your own credentials
$EDITOR scripts/env/export_env.sh

# Load variables into your shell
source scripts/env/export_env.sh
```
### ▶️ Usage
#### ⚙️ 1. ICL Setting
We provide scripts for evaluating different LLMs under the In-Context Learning (ICL) setting. Multiple base models are supported (e.g., the GPT-4o, Qwen, LLaMA, and DeepSeek series), along with two execution modes: live and sandbox.
- **Open-source models.** Test open-source LLMs (e.g., Llama-3.1-8B-Instruct and Qwen2.5-7B-Instruct).
  - Python (direct): `python src/icl/inference.py --model <MODEL_NAME> --mode sandbox`
  - Shell script: `bash scripts/icl/run_infer_local.sh`
- **Proprietary LLMs.** Use API inference for proprietary LLMs (e.g., GPT-4o).
  - Python (direct): `python src/icl/inference_api.py --model <MODEL_NAME> --mode sandbox`
  - Shell script: `bash scripts/icl/run_infer_api.sh`

| Argument | Type | Description |
|----------|------|-------------|
| `--model` | string | Base model to evaluate (e.g., `qwen2.5:latest`, `llama3.1:8b`, `deepseek-r1`) |
| `--mode` | string | • `live` – the agent actually executes external tools and APIs <br>• `sandbox` – the agent uses predefined sandboxed results without making real API calls |
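
As a concrete example, the two modes might be exercised like this (model tags are illustrative; use whichever names your local setup and API access expose):

```bash
# Sandbox run with a local open-source model (no real API calls)
python src/icl/inference.py --model llama3.1:8b --mode sandbox

# Live run with a proprietary model (executes real tools/APIs; requires the keys above)
python src/icl/inference_api.py --model gpt-4o --mode live
```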
- **Metrics.** After inference finishes, compute metrics per model. Run one command per model you want to score (don't pass two models at once):

  ```bash
  python src/calculate_scores.py --methods icl --model_base_icl <MODEL_NAME>
  ```
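
To score several models, simply loop and invoke the scorer once per model (model tags illustrative):

```bash
# Score each model separately — one invocation per model
for model in llama3.1:8b qwen2.5:latest deepseek-r1; do
  python src/calculate_scores.py --methods icl --model_base_icl "$model"
done
```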
👉 For more details, see README.md.
#### ⚙️ 2. SFT Setting
Launch supervised fine-tuning (SFT) experiments via:
```bash
bash scripts/sft/run_sft_exp.sh
```
> [!NOTE]
> **What the script does**
> - Training – calls `LLaMA-Factory/experiments/cab_lora_train.sh` (LoRA/SFT configs).
> - Evaluation – runs `scripts/sft/run_sft_eval.sh` to evaluate fine-tuned models.
>
> **Customize**
> - Edit `LLaMA-Factory/experiments/cab_lora_train.sh` to set the base model and SFT/LoRA parameters.
> - Edit `scripts/sft/run_sft_eval.sh` to choose the base model and evaluation mode.
>
> **Tip**
> - Keep the same base model name across training and evaluation for consistency.
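
In practice, the end-to-end workflow is: edit the two scripts, then launch the experiment. A sketch of this (the edits are manual; the scripts' internal variable names may differ):

```bash
# 1. Set the base model and LoRA/SFT hyperparameters for training
$EDITOR LLaMA-Factory/experiments/cab_lora_train.sh

# 2. Point evaluation at the same base model and pick an evaluation mode
$EDITOR scripts/sft/run_sft_eval.sh

# 3. Launch training + evaluation end to end
bash scripts/sft/run_sft_exp.sh
```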
👉 For more details, see README.md.
## 🔗 Citation
If you find our work and this codebase helpful, please consider starring this repo 🌟 and citing our paper:
```bibtex
@article{yang2025contextagent,
  title={ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions},
  author={Yang, Bufang and Xu, Lilin and Zeng, Liekang and Liu, Kaiwei and Jiang, Siyang and Lu, Wenrui and Chen, Hongkai and Jiang, Xiaofan and Xing, Guoliang and Yan, Zhenyu},
  journal={Advances in Neural Information Processing Systems},
  volume={38},
  pages={1--10},
  year={2025}
}
```
