Efficient Virtuoso: A Latent Diffusion Transformer for Trajectory Planning
<p align="center"> This repository contains the official PyTorch implementation of "Efficient Virtuoso," a project developing a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal, long-horizon trajectory planning on the Waymo Open Motion Dataset. </p> <p align="center"> <a href="https://arxiv.org/abs/2509.03658" target="_blank"> <img src="https://img.shields.io/badge/ArXiv-2509.03658-b31b1b.svg?style=flat-square" alt="ArXiv Paper"> </a> <a href="http://arxiv.org/licenses/nonexclusive-distrib/1.0/" target="_blank"> <img src="https://img.shields.io/badge/Paper%20License-arXiv%20Perpetual-b31b1b.svg?style=flat-square" alt="ArXiv Paper License"> </a> <a href="LICENSE"> <img src="https://img.shields.io/badge/Code%20License-MIT-blue.svg?style=flat-square" alt="Code License"> </a> <a href="https://pytorch.org/"> <img src="https://img.shields.io/badge/Made%20with-PyTorch-EE4C2C.svg?style=flat-square&logo=pytorch" alt="Made with PyTorch"> </a> <img src="https://img.shields.io/badge/Python-3.10-3776AB.svg?style=flat-square&logo=python" alt="Python 3.10"> </p> <p align="center"> A project by <strong>Antonio Guillen-Perez</strong> | <a href="https://antonioalgaida.github.io/" target="_blank"><strong>Portfolio</strong></a> | <a href="https://www.linkedin.com/in/antonioguillenperez/" target="_blank"><strong>LinkedIn</strong></a> | <a href="https://scholar.google.com/citations?user=BFS6jXwAAAAJ" target="_blank"><strong>Google Scholar</strong></a> </p>- Efficient Virtuoso: A Latent Diffusion Transformer for Trajectory Planning
1. Key Result
This project trains a generative model that produces diverse, realistic, and contextually aware future trajectories for an autonomous vehicle. Given a single scene context, the model can generate a multi-modal distribution of plausible future plans, a critical capability for robust decision-making.
<p align="center"> <img src="figures/fan_out_1.png" width="320" alt="Multi-modal trajectory prediction example 1"> <img src="figures/fan_out_2.png" width="320" alt="Multi-modal trajectory prediction example 2"> <img src="figures/fan_out_3.png" width="300" alt="Multi-modal trajectory prediction example 3"> </p> <p align="center"> <em><b>Figure 1: Multi-modal Trajectory Generation.</b> For the same initial state (SDC in green, past trajectory in red), our model generates 20 diverse yet plausible future trajectories (purple-red scale fan-out) that correctly adhere to the road geometry. Each panel shows a different scenario, highlighting the model's ability to capture scene context and generate multi-modal predictions.</em> </p>2. Project Mission
The development of safe and intelligent autonomous vehicles hinges on their ability to reason about an uncertain and multi-modal future. Traditional deterministic approaches, which predict a single "best guess" trajectory, often fail to capture the rich distribution of plausible behaviors a human driver might exhibit. This can lead to policies that are overly conservative or dangerously indecisive in complex scenarios.
This project directly confronts this challenge by fundamentally shifting the modeling paradigm from deterministic regression to conditional generative modeling. The mission is to develop a policy that learns to represent and sample from the entire, complex distribution of plausible expert behaviors, enabling the generation of driving behaviors that are not only safe but also contextually appropriate, diverse, and human-like.
3. Technical Approach
The core of this project is a Conditional Latent Diffusion Model. To achieve both high fidelity and computational efficiency, the diffusion process is performed not on the raw trajectory data, but in a compressed, low-dimensional latent space derived via Principal Component Analysis (PCA).
- Data Pipeline: The raw Waymo Open Motion Dataset is processed through a multi-stage pipeline (`src/data_processing/`). This includes parsing the raw data, intelligently filtering out static scenarios, and extracting features to produce `(Context, Target Trajectory)` pairs.
- Latent Space Creation (PCA): We perform PCA on the entire set of expert `Target Trajectories` to find the principal components that capture the most variance. This lets us represent a high-dimensional trajectory (e.g., `80 timesteps * 2 coords = 160 dims`) with a much smaller latent vector (e.g., `32 dims`), which becomes the new target for the diffusion model.
- Context Encoding: The scene `Context` is encoded by a powerful `StateEncoder`. It uses dedicated sub-networks for each entity (ego history, agents, map, goal) and fuses them with a Transformer Encoder into a single, holistic `scene_embedding`.
- Denoising Model (Latent Diffusion Transformer): The primary model is a Conditional Transformer Decoder. It takes a noisy latent vector `z_t` and learns to predict the original noise `ε`, conditioned on the `scene_embedding` from the `StateEncoder` and the noise level `t`. This architecture is more expressive and parameter-efficient for this kind of sequential data than a standard U-Net.
- Sampling: At inference time, we start from pure Gaussian noise `z_T` in the latent space and iteratively apply the trained denoiser to recover a clean latent vector `z_0`, which is then projected back into the high-dimensional trajectory space via the inverse PCA transform. This repository implements both the slow, stochastic DDPM sampler and the fast, deterministic DDIM sampler.
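The PCA compression step can be sketched with plain NumPy (the exact dimensions below follow the example figures above; the data here is random stand-in, not Waymo trajectories):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for N expert trajectories: 80 (x, y) steps flattened to 160 dims.
trajectories = rng.normal(size=(1000, 160))

# PCA via SVD of the mean-centered data: keep the top-32 principal directions.
mean = trajectories.mean(axis=0)
_, _, vt = np.linalg.svd(trajectories - mean, full_matrices=False)
components = vt[:32]                               # (32, 160)

# Project into the 32-dim latent space and back (the inverse PCA transform
# used as the final step of sampling: z_0 -> trajectory).
latents = (trajectories - mean) @ components.T     # (1000, 32)
recon = latents @ components + mean                # (1000, 160)
print(latents.shape, recon.shape)
```

On real driving trajectories, which are far more correlated than random noise, 32 components capture most of the variance, which is what makes the latent target practical.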
To ensure stability, all trajectory data is normalized to a [-1, 1] range before being used in the diffusion process.
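A minimal sketch of that min-max normalization, assuming per-dimension bounds taken from precomputed dataset statistics (e.g., `normalization_stats.pt`); the function names are illustrative:

```python
import numpy as np

def normalize(traj, lo, hi):
    # Map values from [lo, hi] to [-1, 1]; lo/hi come from dataset statistics.
    return 2.0 * (traj - lo) / (hi - lo) - 1.0

def denormalize(traj, lo, hi):
    # Exact inverse, applied after sampling to recover metric coordinates.
    return (traj + 1.0) * (hi - lo) / 2.0 + lo

x = np.array([0.0, 50.0, 100.0])
y = normalize(x, lo=0.0, hi=100.0)      # maps to [-1, 0, 1]
print(y, denormalize(y, 0.0, 100.0))
```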
<p align="center"><b>Figure 2: Model Architecture.</b> A Transformer-based StateEncoder processes the scene context. A separate Transformer Decoder acts as the denoiser in the PCA latent space.</p>
4. Repository Structure
```
diffusion-trajectory-planner/
├── configs/
│   └── main_config.yaml
├── data/
│   ├── (gitignored) processed_npz/
│   └── (gitignored) featurized_v3_diffusion/
├── models/
│   ├── (gitignored) checkpoints/
│   └── (gitignored) normalization_stats.pt
├── notebooks/
│   ├── 1_analyze_source_data.ipynb
│   ├── 2_analyze_featurized_data.ipynb
│   └── 3_analyze_final_results.ipynb
├── src/
│   ├── data_processing/          # Scripts for parsing, featurizing, and PCA
│   │   ├── parser.py
│   │   ├── featurizer_diffusion.py
│   │   └── compute_normalization_stats.py
│   ├── diffusion_policy/         # Core model, dataset, and training logic
│   │   ├── dataset.py
│   │   ├── networks.py
│   │   └── train.py
│   └── evaluation/               # Scripts for evaluation and visualization
│       └── evaluate_prediction.py
└── README.md
```
5. Setup and Installation

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/diffusion-trajectory-planner.git
  cd diffusion-trajectory-planner
  ```

- Create and activate a Conda environment:

  ```bash
  conda create --name virtuoso_env python=3.10
  conda activate virtuoso_env
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
6. Data Preparation Pipeline
This is a multi-step, one-time process. All commands should be run from the root of the repository.
Step 0: Download the Waymo Open Motion Dataset
Download the .tfrecord files for the motion prediction task from the Waymo Open Dataset website. Place the scenario folder containing the training and validation shards into a directory of your choice.
Step 1: Parse Raw Data (.tfrecord -> .npz)
This initial step converts the raw .tfrecord files into a more accessible NumPy format.
Note: This `parser.py` script is a prerequisite and is assumed to be adapted from a previous project.
Update configs/main_config.yaml with the correct path to your raw data, then run the parser.
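For illustration only, the relevant entry might look like the fragment below; the key names are hypothetical, so match them to what `configs/main_config.yaml` actually uses:

```yaml
# Hypothetical structure -- check configs/main_config.yaml for the real keys.
data:
  raw_data_dir: /path/to/waymo/scenario   # folder with training/validation shards
```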
```bash
# Activate the parser-specific environment
conda activate virtuoso_parser
python -m src.data_processing.parser
```
This will create a data/processed_npz/ directory containing the parsed .npz files.
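As a quick sanity check, you can open one of the parsed files with NumPy. The snippet below builds a small stand-in `.npz` so it runs anywhere; the array names are illustrative, not the parser's actual keys:

```python
import numpy as np
from pathlib import Path

# Stand-in for a file under data/processed_npz/; the array names
# ("ego_history", "future_trajectory") are hypothetical examples.
sample = Path("example_scene.npz")
np.savez(sample, ego_history=np.zeros((11, 2)), future_trajectory=np.zeros((80, 2)))

with np.load(sample) as scene:
    print(sorted(scene.files))               # arrays stored in the file
    print(scene["future_trajectory"].shape)  # (80, 2)
```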
