TrajAgent
[NeurIPS 2025] TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
🌍 Introduction
Trajectory modeling, which includes research on trajectory data pattern mining and future prediction, has widespread applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modeling. However, the heterogeneity of data and the diversity of trajectory tasks make effective and reliable trajectory modeling an important yet highly challenging endeavor, even for domain experts. In this paper, we propose TrajAgent, an agent framework powered by large language models (LLMs), designed to facilitate robust and efficient trajectory modeling through automated modeling. The framework leverages and optimizes diverse specialized models to effectively address various trajectory modeling tasks across different datasets. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface, to support the execution and training of various models. Building on UniEnv, we introduce an agentic workflow designed for automatic trajectory modeling across various trajectory tasks and data. Furthermore, we introduce a collaborative learning schema between LLM-based agents and small specialized models to effectively enhance the performance of the whole framework.

🚀 Key Features
- End-to-End Automation: Data Preprocessing → Model Selection → Data Augmentation → Parameter Optimization → Model Training → Result Analysis
- LLM-Driven Decision Making: Intelligent decision mechanism based on Large Language Models for automatic optimal strategy selection
- Multi-Task Support: Supports various trajectory modeling tasks including trajectory prediction, user linkage, anomaly detection, and more
- Multi-Dataset Compatibility: Compatible with mainstream trajectory datasets such as Foursquare, Gowalla, Brightkite, etc.
- Modular Design: Independent and testable modules supporting both standalone and combined usage
📋 Supported Tasks

| Specific Task | Supported Models | Dataset Type |
|---------------|------------------|--------------|
| Next Location Prediction | DeepMove, RNN, FPMC, GETNext, LLM-ZS | checkin |
| Travel Time Estimation | DeepTTE, DutyTTE, MulTTTE | gps |
| Trajectory Recovery | TrajBERT | gps |
| Map Matching | DeepMM, GraphMM | map |
| Trajectory User Linkage | DPLink, MainTUL, S2TUL | checkin |
| Trajectory Generation | ActSTD, DSTPP | checkin |
| Trajectory Representation | CACSR | checkin |
| Trajectory Anomaly Detection | GMVSAE | gps |
| Intention Prediction | LIMP | checkin |
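For scripting, the task table above can be mirrored as a small lookup table. This is an illustrative sketch, not part of the TrajAgent API; the task, model, and dataset-type names are copied from the table:

```python
# Hypothetical helper mirroring the supported-tasks table (not shipped with TrajAgent).
SUPPORTED_MODELS = {
    "Next_Location_Prediction": (["DeepMove", "RNN", "FPMC", "GETNext", "LLM-ZS"], "checkin"),
    "Travel_Time_Estimation": (["DeepTTE", "DutyTTE", "MulTTTE"], "gps"),
    "Trajectory_Recovery": (["TrajBERT"], "gps"),
    "Map_Matching": (["DeepMM", "GraphMM"], "map"),
    "Trajectory_User_Linkage": (["DPLink", "MainTUL", "S2TUL"], "checkin"),
    "Trajectory_Generation": (["ActSTD", "DSTPP"], "checkin"),
    "Trajectory_Representation": (["CACSR"], "checkin"),
    "Trajectory_Anomaly_Detection": (["GMVSAE"], "gps"),
    "Intention_Prediction": (["LIMP"], "checkin"),
}

def models_for(task: str) -> list:
    """Return the models listed for a task; raises KeyError for unknown tasks."""
    models, _dataset_type = SUPPORTED_MODELS[task]
    return models
```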
🛠️ Environment Setup
1. Requirements
- Python 3.7+
- CUDA 10.1+ (recommended)
- Multiple conda environments (see environment/ directory)
2. Environment Installation
The project includes multiple pre-configured conda environments. Install as needed:
```bash
# View available environments
ls environment/

# Install environments (examples)
conda env create -f environment/libcity_py39_torch231_cu121.txt
conda env create -f environment/STAN_py37_cu101_torch171.txt
# ... other environments
```
3. Environment Path Configuration
After installing environments, modify the environment paths in shell scripts:
```bash
# Modify the .sh files in the base_model/ directory
# Replace BASE_ENV_PATH="your_base_env_path" with the actual path
# Example:
BASE_ENV_PATH="/path/to/your/conda/envs"
```
4. Environment Variables
```bash
export PYTHONPATH="/path/to/TrajAgent"
export OPENAI_API_KEY="your_openai_key"
export DEEPINFRA_API_KEY="your_deepinfra_key"      # optional
export SiliconFlow_API_KEY="your_siliconflow_key"  # optional
```
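Before launching a run, it can be useful to verify the required variables are set. A minimal sketch (this helper is illustrative, not shipped with TrajAgent; it treats `PYTHONPATH` and `OPENAI_API_KEY` as required and the provider keys as optional, matching the exports above):

```python
# Illustrative startup check for the environment variables listed above.
import os

def check_env(env: dict) -> list:
    """Return the names of required variables missing or empty in env."""
    required = ["PYTHONPATH", "OPENAI_API_KEY"]
    return [name for name in required if not env.get(name)]

missing = check_env(dict(os.environ))
if missing:
    print("Missing environment variables:", ", ".join(missing))
```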
📁 Project Structure
```text
TrajAgent/
├── plan_agent_run.py        # End-to-end pipeline entry
├── da_agent_run.py          # Data augmentation entry
├── fm_agent_run.py          # Data formatting/conversion entry
├── op_agent_run.py          # Result analysis/summary entry
├── param_agent_run.py       # Parameter optimization entry
├── preprocess/              # Data preprocessing
│   ├── traj_preprocess.py
│   └── traj_preprocess_gps.py
├── data_augmentation/       # Data augmentation + utils
│   ├── da_agent.py
│   └── utils/
│       ├── base_llm.py
│       ├── llm_da_utils.py
│       ├── distribution_sampler.py
│       └── prompts.py
├── model_selection/
│   └── utils/
│       └── utils.py
├── param_optimize/
│   ├── pa_agent.py
│   └── utils/
│       └── utils.py
├── result_optimize/
│   └── optimize_agent.py
├── UniEnv/                  # Unified environment for external models
│   ├── base_model/          # Shell runners for each model
│   ├── model_lib/           # Third-party model code
│   └── etc/
│       ├── settings.py
│       ├── da-config.yaml
│       └── model_config/
├── environment/             # Environment specs (txt)
├── data/                    # Datasets and outputs
│   ├── input_format/
│   ├── model_output/
│   └── aux/
├── nl_input_parser.py       # Natural-language argument parser (rule + LLM fallback)
└── README_EN.md
```
🚀 Quick Start
1. Full Pipeline Execution
```bash
# Basic usage
python plan_agent_run.py \
    --task="Next_Location_Prediction" \
    --source="foursquare" \
    --target="standard" \
    --city="London" \
    --gpu_id=0

# Parameter description
# --task:   Task type (see the supported tasks list)
# --source: Source dataset name
# --target: Target data format (standard is recommended)
# --city:   City name (required for certain datasets)
# --gpu_id: GPU device ID
```
1.1 Natural-Language Command Parsing
TrajAgent supports natural-language instructions with a rule-first parser and LLM fallback. The parser first extracts parameters via rules; if required fields are missing, it invokes the LLM to complete them. Set your API key if you want LLM fallback.
```bash
# Optional (enable LLM fallback):
export OPENAI_API_KEY=your_key

# Optional model control (defaults to gpt-4o-mini if not set):
export LLM_MODEL=gpt-4o-mini

# Example: a natural-language query the parser resolves to
# task=Next_Location_Prediction with city=London
python plan_agent_run.py \
    --query "I'm looking to figure out which points of interest users are likely to visit next in London"
```
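The rule-first/LLM-fallback pattern described above can be sketched as follows. This is a simplified illustration; the actual parser lives in `nl_input_parser.py`, and its rules, keywords, and field names may differ:

```python
# Simplified sketch of a rule-first parser with an LLM fallback.
# The keyword rules and the llm_fallback hook are illustrative assumptions,
# not TrajAgent's actual implementation.
import re

TASK_KEYWORDS = {
    "visit next": "Next_Location_Prediction",
    "travel time": "Travel_Time_Estimation",
    "map matching": "Map_Matching",
}
CITIES = ["London", "Tokyo", "Paris", "NewYork"]

def parse_query(query: str, llm_fallback=None) -> dict:
    """Extract task/city via rules; delegate missing fields to llm_fallback."""
    params = {}
    lowered = query.lower()
    for phrase, task in TASK_KEYWORDS.items():
        if phrase in lowered:
            params["task"] = task
            break
    for city in CITIES:
        # Allow "New York" as well as "NewYork".
        if re.search(city.replace("NewYork", "New ?York"), query):
            params["city"] = city
            break
    if "task" not in params and llm_fallback is not None:
        params.update(llm_fallback(query))  # e.g., a gpt-4o-mini call
    return params
```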
Parameter semantics and supported options (also available via `nl_input_parser.explain_all_options()`):
- `task`: Map_Matching, Trajectory_Generation, Trajectory_Representation, Trajectory_Recovery, Next_Location_Prediction, Trajectory_User_Linkage, Travel_Time_Estimation, Trajectory_Anomaly_Detection
- `source`: foursquare, gowalla, brightkite, agentmove, Earthquake, tencent, chengdu
- `target`: foursquare, gowalla, brightkite, agentmove, standard
- `city` (required when `source=agentmove`): CapeTown, London, Moscow, Mumbai, Nairobi, NewYork, Paris, SanFrancisco, SaoPaulo, Sydney, Tokyo, Unknown
- other: `gpu_id`, `base_model`, `trial_num`, `max_step`, `max_epoch`, `memory_length`
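Parsed parameters can be validated against these enumerations before launching a run. A minimal sketch (the option lists are copied from above; the validator itself is hypothetical, not part of TrajAgent):

```python
# Hypothetical validator for the option lists documented above.
SUPPORTED = {
    "task": {"Map_Matching", "Trajectory_Generation", "Trajectory_Representation",
             "Trajectory_Recovery", "Next_Location_Prediction", "Trajectory_User_Linkage",
             "Travel_Time_Estimation", "Trajectory_Anomaly_Detection"},
    "source": {"foursquare", "gowalla", "brightkite", "agentmove",
               "Earthquake", "tencent", "chengdu"},
    "target": {"foursquare", "gowalla", "brightkite", "agentmove", "standard"},
}

def validate(params: dict) -> list:
    """Return human-readable errors for out-of-range parameter values."""
    errors = [f"{key}={value!r} not in supported options"
              for key, value in params.items()
              if key in SUPPORTED and value not in SUPPORTED[key]]
    # city is required when source=agentmove (see above).
    if params.get("source") == "agentmove" and not params.get("city"):
        errors.append("city is required when source=agentmove")
    return errors
```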
2. Single Module Execution
Data Preprocessing
```bash
python traj_preprocess.py \
    --city="London" \
    --dataset="foursquare" \
    --model="DeepMove"
```
Data Augmentation + Model Training
```bash
python da_agent_run.py \
    --task="Next_Location_Prediction" \
    --dataset="foursquare" \
    --model="DeepMove" \
    --city="London" \
    --gpu_id=0 \
    --pa_da
```
Model Selection + Training
```bash
python fm_agent_run.py \
    --task="Next_Location_Prediction" \
    --dataset="foursquare" \
    --city="London" \
    --gpu_id=0
```
Result Analysis
```bash
python op_agent_run.py \
    --task="Next_Location_Prediction" \
    --dataset="foursquare" \
    --city="London"
```
📊 Data Preparation
1. Data Format
Place raw data in the data/input_format/ directory:
```text
data/input_format/
├── foursquare/
│   ├── source1.csv   # POI data
│   └── source2.csv   # Check-in data
├── gowalla/
│   └── source1.csv
└── ...
```
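A small pre-flight check that a dataset directory contains the expected source files can save a failed preprocessing run. This sketch is illustrative; the per-dataset file lists follow the layout shown above and are assumptions, not an exhaustive manifest:

```python
# Illustrative check that data/input_format/<dataset>/ holds the expected CSVs.
from pathlib import Path

EXPECTED_FILES = {
    "foursquare": ["source1.csv", "source2.csv"],  # POI data + check-in data
    "gowalla": ["source1.csv"],
}

def missing_inputs(root: str, dataset: str) -> list:
    """Return expected files that are absent under root/<dataset>/."""
    base = Path(root) / dataset
    return [name for name in EXPECTED_FILES.get(dataset, [])
            if not (base / name).is_file()]
```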
2. Supported Datasets
| Dataset | Type | Description |
|---------|------|-------------|
| foursquare | checkin | Foursquare check-in data |
| gowalla | checkin | Gowalla social network data |
| brightkite | checkin | Brightkite location data |
| agentmove | checkin | Synthetic trajectory data |
| tencent | map | Tencent map data |
| chengdu | gps | Chengdu taxi trajectory data |
| porto | gps | Porto taxi trajectory data |
| earthquake | time_series | Earthquake time series data |
3. Evaluation Data for NL Parsing
Evaluation datasets for natural-language command parsing are generated by the scripts under `evaluate_userQuery/` (e.g., `gen_task_plan.py`, `eval_task_plan.py`). The generated data are stored at `/data/evaluate`.
Notes:
- `gen_task_plan.py` produces task-plan JSON files (e.g., `task_plan_6.json`).
- Paths are absolute; ensure the target directory exists and has write permission.
- You can customize prompts/data inside `evaluate_userQuery/` before generation.
4. City Parameter Description
The `city` parameter is implemented based on AgentMove (AgentMove: A Large Language Model Based Agentic Framework for Zero-Shot Next Location Prediction), which divides worldwide Foursquare check-in data by city. Supported cities include:
AgentMove City Datasets:
- `CapeTown` (Cape Town)
- `London` (London)
- `Moscow` (Moscow)
- `Mumbai` (Mumbai)
- `Nairobi` (Nairobi)
- `NewYork` (New York)
- `Paris` (Paris)
- `SanFrancisco` (San Francisco)
- `SaoPaulo` (São Paulo)
- `Sydney` (Sydney)
- `Tokyo` (Tokyo)
Usage Rules:
- When `data_type` is `agentmove`, set the `city` parameter to select the city dataset used for training.
- When `data_type` is any other type, `city` can be set to `None` or `Unknown`.
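The rules above can be expressed as a small helper. This is a hypothetical sketch; TrajAgent's actual argument handling may differ:

```python
# Hypothetical resolver for the city parameter, following the rules above:
# agentmove requires a city from the AgentMove list; other data types
# fall back to "Unknown" when no city is given.
from typing import Optional

AGENTMOVE_CITIES = {
    "CapeTown", "London", "Moscow", "Mumbai", "Nairobi", "NewYork",
    "Paris", "SanFrancisco", "SaoPaulo", "Sydney", "Tokyo",
}

def resolve_city(data_type: str, city: Optional[str]) -> str:
    if data_type == "agentmove":
        if city not in AGENTMOVE_CITIES:
            raise ValueError(f"city must be one of {sorted(AGENTMOVE_CITIES)}")
        return city
    return city or "Unknown"
```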
