FastWAM
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
This repository contains the training and evaluation code for FastWAM on LIBERO / RoboTwin.
Index
- File Structure
- Environment Setup
- Model Preparation
- Dataset Download
- Inference with Released Checkpoints
- Training
- Inference with Your Trained Checkpoints
- Acknowledgements
- BibTeX
File Structure
FastWAM/
├── configs/
│   ├── data/                               # Dataset configs (LIBERO, RoboTwin, etc.)
│   ├── model/                              # Model architecture and component configs
│   └── task/                               # Task-level configs (training task names)
├── scripts/
│   ├── train.py
│   ├── train_zero1.sh                      # DeepSpeed ZeRO-1 training entrypoint
│   ├── preprocess_action_dit_backbone.py   # Preprocess ActionDiT backbone before training
│   └── precompute_text_embeds.py           # Precompute T5 text embedding cache before training
├── experiments/
│   ├── libero/
│   │   └── run_libero_manager.py
│   └── robotwin/
│       └── run_robotwin_manager.py
├── src/fastwam/                            # Core code
├── runs/                                   # Training outputs (ckpt, logs)
├── checkpoints/                            # Pretrained or external checkpoints
├── data/                                   # Data directory
└── evaluate_results/                       # Inference / evaluation results
Environment Setup
conda create -n fastwam python=3.10 -y
conda activate fastwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .
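A quick sanity check that the CUDA-enabled PyTorch build is active:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"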
Model Preparation
This step is required before both training and inference.
Step 1: set the Wan model directory (optional, default ./checkpoints):
mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"
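To make the setting persist across shells, you can append it to your shell profile (illustrative; adjust the file for your shell):
echo "export DIFFSYNTH_MODEL_BASE_PATH=$(pwd)/checkpoints" >> ~/.bashrc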
Step 2: pre-generate the ActionDiT backbone (interpolated from Wan22 DiT):
# uncond (fastwam)
python scripts/preprocess_action_dit_backbone.py \
--model-config configs/model/fastwam.yaml \
--output checkpoints/ActionDiT_linear_interp_Wan22_alphascale_1024hdim.pt \
--device cuda \
--dtype bfloat16
Dataset Download
LIBERO
The preprocessed LIBERO dataset used by Fast-WAM is available at:
- https://huggingface.co/datasets/yuanty/LIBERO-fastwam
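One way to fetch all archives into the expected directory is with huggingface-cli (pip install -U huggingface_hub):
huggingface-cli download yuanty/LIBERO-fastwam --repo-type dataset --local-dir data/libero_mujoco3.3.2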
Once all compressed files are downloaded, extract them:
mkdir -p data/libero_mujoco3.3.2
cd data/libero_mujoco3.3.2
# Run after downloading all 4 tar.gz files
for f in *.tar.gz; do
tar -xzf "$f"
done
The extracted directory structure should be:
data/libero_mujoco3.3.2/
├── libero_10_no_noops_lerobot/
├── libero_goal_no_noops_lerobot/
├── libero_object_no_noops_lerobot/
└── libero_spatial_no_noops_lerobot/
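A quick check (from the repository root) that all four suites extracted correctly:
for d in libero_10 libero_goal libero_object libero_spatial; do
  test -d "data/libero_mujoco3.3.2/${d}_no_noops_lerobot" && echo "ok: $d" || echo "missing: $d"
done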
RoboTwin
The preprocessed RoboTwin dataset used by Fast-WAM is available at:
- https://huggingface.co/datasets/yuanty/robotwin2.0-fastwam
Download all split archive files first, then concatenate and extract:
mkdir -p data/robotwin2.0
cd data/robotwin2.0
# Run after downloading all robotwin2.0.tar.gz.part-* files
cat robotwin2.0.tar.gz.part-* | tar -xzf -
The extracted directory structure should be:
data/robotwin2.0/
└── robotwin2.0/
    ├── data/
    ├── meta/
    └── videos/
If you also keep data/robotwin2.0/dataset_stats.json in the dataset root directory, it can be used directly as the statistics file for the current configs in this repo. Alternatively, you can recompute it during a training run (see the pretrained_norm_stats note in the Training section).
Inference with Released Checkpoints
The released checkpoints and their corresponding dataset stats are available on Hugging Face.
Optional: download them with huggingface-cli:
pip install -U huggingface_hub
huggingface-cli download yuanty/fastwam \
libero_uncond_2cam224.pt \
libero_uncond_2cam224_dataset_stats.json \
robotwin_uncond_3cam_384.pt \
robotwin_uncond_3cam_384_dataset_stats.json \
--local-dir ./checkpoints/fastwam_release
After downloading, the local directory is expected to contain:
checkpoints/fastwam_release/
├── libero_uncond_2cam224.pt
├── libero_uncond_2cam224_dataset_stats.json
├── robotwin_uncond_3cam_384.pt
└── robotwin_uncond_3cam_384_dataset_stats.json
Before running the LIBERO benchmark, install the official LIBERO environment from the LIBERO repository.
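A typical editable install looks like the following (the clone location third_party/LIBERO is only an example; follow the LIBERO README for the authoritative steps):
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git third_party/LIBERO
pip install -e third_party/LIBERO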
Then run this final step:
pip install mujoco==3.3.2
The installed MuJoCo version should match the version used to generate the LIBERO data (the dataset directory name indicates mujoco 3.3.2).
We have already copied the RoboTwin evaluation-related code into third_party/RoboTwin.
You still need to follow the official RoboTwin instructions from the
RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
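A broken symlink will only surface later during evaluation, so it is worth confirming that it resolves:
readlink -f third_party/RoboTwin/policy/fastwam_policy
test -d third_party/RoboTwin/policy/fastwam_policy && echo "symlink OK"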
The LIBERO / RoboTwin evaluation managers default to 8 GPUs
(MULTIRUN.num_gpus=8 in configs/sim_libero.yaml and configs/sim_robotwin.yaml).
To evaluate with fewer GPUs, pass a smaller value such as MULTIRUN.num_gpus=4.
Optional: evaluate the released LIBERO checkpoint:
python experiments/libero/run_libero_manager.py \
task=libero_uncond_2cam224_1e-4 \
ckpt=./checkpoints/fastwam_release/libero_uncond_2cam224.pt \
EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/libero_uncond_2cam224_dataset_stats.json \
MULTIRUN.num_gpus=8
Optional: evaluate the released RoboTwin checkpoint:
python experiments/robotwin/run_robotwin_manager.py \
task=robotwin_uncond_3cam_384_1e-4 \
ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
MULTIRUN.num_gpus=8
For faster RoboTwin evaluation, we have enabled EVALUATION.skip_get_obs_within_replan=true in configs/sim_robotwin.yaml.
This skips RGB rendering while consecutively executing an action chunk within one replan window, which speeds up evaluation but makes the saved video look very low-FPS.
Set it to false if you want to save a fully rendered video.
Note: We evaluate with unseen instructions, following Motus. Lingbot-VA uses seen instructions instead. You can try EVALUATION.instruction_type=seen to use seen instructions, which should theoretically improve performance by one or two points.
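For example, to combine the overrides discussed above and re-run the released RoboTwin checkpoint with seen instructions and fully rendered videos (this is just the previous command with the two extra overrides):
python experiments/robotwin/run_robotwin_manager.py \
    task=robotwin_uncond_3cam_384_1e-4 \
    ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
    EVALUATION.instruction_type=seen \
    EVALUATION.skip_get_obs_within_replan=false \
    MULTIRUN.num_gpus=8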
Training
1) Precompute T5 embedding cache before training
Use scripts/precompute_text_embeds.py to precompute embeddings for each training task:
# LIBERO
python scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
# RoboTwin
python scripts/precompute_text_embeds.py task=robotwin_uncond_3cam_384_1e-4
For multi-GPU:
torchrun --standalone --nproc_per_node=8 scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
2) Training (using fastwam as an example)
When running a new task for the first time, set pretrained_norm_stats in the corresponding configs/data/*.yaml to null first.
After one training run, a dataset_stats.json file will be generated in the current run directory (for example, runs/{task_name}/{run_id}/dataset_stats.json).
You can then update pretrained_norm_stats to that file path for subsequent runs.
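For illustration, the relevant field in the data config looks roughly like this (only the key discussed here is shown; the actual configs/data/*.yaml files contain other settings):
# First run: compute normalization statistics from scratch
pretrained_norm_stats: null
# Subsequent runs: reuse the stats produced by the first run, e.g.
# pretrained_norm_stats: runs/{task_name}/{run_id}/dataset_stats.json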
# LIBERO
bash scripts/train_zero1.sh 8 task=libero_uncond_2cam224_1e-4
# RoboTwin
bash scripts/train_zero1.sh 8 task=robotwin_uncond_3cam_384_1e-4
For LIBERO, we train on a single node with 8 GPUs. For RoboTwin, we use 64 GPUs to accelerate training. You can try reducing the GPU count or training epochs.
Inference with Your Trained Checkpoints
Make sure the LIBERO environment and MuJoCo version are set up as described in the previous section, then run LIBERO evaluation:
# LIBERO
python experiments/libero/run_libero_manager.py task={task_name} ckpt={ckpt_path}
If you have not already done so, set up the RoboTwin environment, assets, and the fastwam_policy symlink as described in the previous section, then run RoboTwin evaluation:
python experiments/robotwin/run_robotwin_manager.py task={task_name} ckpt={ckpt_path}
Common task_name examples: libero_uncond_2cam224_1e-4, robotwin_uncond_3cam_384_1e-4.