FastWAM
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
This repository contains the training and evaluation code for FastWAM on LIBERO / RoboTwin.
Index
- File Structure
- Environment Setup
- Model Preparation
- Dataset Download
- Inference with Released Checkpoints
- Training
- Inference with Your Trained Checkpoints
- Acknowledgements
- BibTeX
File Structure
FastWAM/
├── configs/
│   ├── data/                               # Dataset configs (LIBERO, RoboTwin, etc.)
│   ├── model/                              # Model architecture and component configs
│   └── task/                               # Task-level configs (training task names)
├── scripts/
│   ├── train.py
│   ├── train_zero1.sh                      # DeepSpeed ZeRO-1 training entrypoint
│   ├── preprocess_action_dit_backbone.py   # Preprocess ActionDiT backbone before training
│   └── precompute_text_embeds.py           # Precompute T5 text embedding cache before training
├── experiments/
│   ├── libero/
│   │   └── run_libero_manager.py
│   └── robotwin/
│       └── run_robotwin_manager.py
├── src/fastwam/                            # Core code
├── runs/                                   # Training outputs (ckpt, logs)
├── checkpoints/                            # Pretrained or external checkpoints
├── data/                                   # Data directory
└── evaluate_results/                       # Inference / evaluation results
Environment Setup
conda create -n fastwam python=3.10 -y
conda activate fastwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .
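A quick sanity check that the CUDA-enabled PyTorch build is active:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"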
Model Preparation
This step is required before both training and inference.
Step 1: set the Wan model directory (optional, default ./checkpoints):
mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"
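To make the setting persist across shells, you can append it to your shell profile (illustrative; adjust the file for your shell):
echo "export DIFFSYNTH_MODEL_BASE_PATH=$(pwd)/checkpoints" >> ~/.bashrc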
Step 2: pre-generate the ActionDiT backbone (interpolated from Wan22 DiT):
# uncond (fastwam)
python scripts/preprocess_action_dit_backbone.py \
--model-config configs/model/fastwam.yaml \
--output checkpoints/ActionDiT_linear_interp_Wan22_alphascale_1024hdim.pt \
--device cuda \
--dtype bfloat16
Dataset Download
LIBERO
The preprocessed LIBERO dataset used by Fast-WAM is available at:
- https://huggingface.co/datasets/yuanty/LIBERO-fastwam
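One way to fetch all archives into the expected directory is with huggingface-cli (pip install -U huggingface_hub):
huggingface-cli download yuanty/LIBERO-fastwam --repo-type dataset --local-dir data/libero_mujoco3.3.2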
Once all compressed files are downloaded, extract them:
mkdir -p data/libero_mujoco3.3.2
cd data/libero_mujoco3.3.2
# Run after downloading all 4 tar.gz files
for f in *.tar.gz; do
tar -xzf "$f"
done
The extracted directory structure should be:
data/libero_mujoco3.3.2/
├── libero_10_no_noops_lerobot/
├── libero_goal_no_noops_lerobot/
├── libero_object_no_noops_lerobot/
└── libero_spatial_no_noops_lerobot/
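A quick check (from the repository root) that all four suites extracted correctly:
for d in libero_10 libero_goal libero_object libero_spatial; do
  test -d "data/libero_mujoco3.3.2/${d}_no_noops_lerobot" && echo "ok: $d" || echo "missing: $d"
done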
RoboTwin
The preprocessed RoboTwin dataset used by Fast-WAM is available at:
- https://huggingface.co/datasets/yuanty/robotwin2.0-fastwam
Download all split archive files first, then concatenate and extract:
mkdir -p data/robotwin2.0
cd data/robotwin2.0
# Run after downloading all robotwin2.0.tar.gz.part-* files
cat robotwin2.0.tar.gz.part-* | tar -xzf -
The extracted directory structure should be:
data/robotwin2.0/
└── robotwin2.0/
    ├── data/
    ├── meta/
    └── videos/
If you also keep data/robotwin2.0/dataset_stats.json in the dataset root directory, it can be used directly as the statistics file for the current configs in this repo. Alternatively, you can recompute it during a training run (see the pretrained_norm_stats note in the Training section).
Inference with Released Checkpoints
The released checkpoints and their corresponding dataset stats are available on Hugging Face.
Optional: download them with huggingface-cli:
pip install -U huggingface_hub
huggingface-cli download yuanty/fastwam \
libero_uncond_2cam224.pt \
libero_uncond_2cam224_dataset_stats.json \
robotwin_uncond_3cam_384.pt \
robotwin_uncond_3cam_384_dataset_stats.json \
--local-dir ./checkpoints/fastwam_release
After downloading, the local directory is expected to contain:
checkpoints/fastwam_release/
├── libero_uncond_2cam224.pt
├── libero_uncond_2cam224_dataset_stats.json
├── robotwin_uncond_3cam_384.pt
└── robotwin_uncond_3cam_384_dataset_stats.json
Before running the LIBERO benchmark, install the official LIBERO environment from the LIBERO repository.
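A typical editable install looks like the following (the clone location third_party/LIBERO is only an example; follow the LIBERO README for the authoritative steps):
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git third_party/LIBERO
pip install -e third_party/LIBERO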
Then run this final step:
pip install mujoco==3.3.2
The installed MuJoCo version should match the version used to generate the LIBERO data (the dataset directory name indicates mujoco 3.3.2).
We have already copied the RoboTwin evaluation-related code into third_party/RoboTwin.
You still need to follow the official RoboTwin instructions from the
RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:
ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
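A broken symlink will only surface later during evaluation, so it is worth confirming that it resolves:
readlink -f third_party/RoboTwin/policy/fastwam_policy
test -d third_party/RoboTwin/policy/fastwam_policy && echo "symlink OK"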
The LIBERO / RoboTwin evaluation managers default to 8 GPUs
(MULTIRUN.num_gpus=8 in configs/sim_libero.yaml and configs/sim_robotwin.yaml).
To evaluate with fewer GPUs, pass a smaller value such as MULTIRUN.num_gpus=4.
Optional: evaluate the released LIBERO checkpoint:
python experiments/libero/run_libero_manager.py \
task=libero_uncond_2cam224_1e-4 \
ckpt=./checkpoints/fastwam_release/libero_uncond_2cam224.pt \
EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/libero_uncond_2cam224_dataset_stats.json \
MULTIRUN.num_gpus=8
Optional: evaluate the released RoboTwin checkpoint:
python experiments/robotwin/run_robotwin_manager.py \
task=robotwin_uncond_3cam_384_1e-4 \
ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
MULTIRUN.num_gpus=8
For faster RoboTwin evaluation, we have enabled EVALUATION.skip_get_obs_within_replan=true in configs/sim_robotwin.yaml.
This skips RGB rendering while consecutively executing an action chunk within one replan window, which speeds up evaluation but makes the saved video look very low-FPS.
Set it to false if you want to save a fully rendered video.
Note: We evaluate with unseen instructions, following Motus. Lingbot-VA uses seen instructions instead. You can try EVALUATION.instruction_type=seen to use seen instructions, which should theoretically improve performance by one or two points.
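For example, to combine the overrides discussed above and re-run the released RoboTwin checkpoint with seen instructions and fully rendered videos (this is just the previous command with the two extra overrides):
python experiments/robotwin/run_robotwin_manager.py \
    task=robotwin_uncond_3cam_384_1e-4 \
    ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
    EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
    EVALUATION.instruction_type=seen \
    EVALUATION.skip_get_obs_within_replan=false \
    MULTIRUN.num_gpus=8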
Training
1) Precompute T5 embedding cache before training
Use scripts/precompute_text_embeds.py to precompute embeddings for each training task:
# LIBERO
python scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
# RoboTwin
python scripts/precompute_text_embeds.py task=robotwin_uncond_3cam_384_1e-4
For multi-GPU:
torchrun --standalone --nproc_per_node=8 scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4
2) Training (using fastwam as an example)
When running a new task for the first time, set pretrained_norm_stats in the corresponding configs/data/*.yaml to null first.
After one training run, a dataset_stats.json file will be generated in the current run directory (for example, runs/{task_name}/{run_id}/dataset_stats.json).
You can then update pretrained_norm_stats to that file path for subsequent runs.
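For illustration, the relevant field in the data config looks roughly like this (only the key discussed here is shown; the actual configs/data/*.yaml files contain other settings):
# First run: compute normalization statistics from scratch
pretrained_norm_stats: null
# Subsequent runs: reuse the stats produced by the first run, e.g.
# pretrained_norm_stats: runs/{task_name}/{run_id}/dataset_stats.json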
# LIBERO
bash scripts/train_zero1.sh 8 task=libero_uncond_2cam224_1e-4
# RoboTwin
bash scripts/train_zero1.sh 8 task=robotwin_uncond_3cam_384_1e-4
For LIBERO, we train on a single node with 8 GPUs. For RoboTwin, we use 64 GPUs to accelerate training. You can try reducing the GPU count or training epochs.
Inference with Your Trained Checkpoints
Make sure the LIBERO environment and MuJoCo version are set up as described in the previous section, then run LIBERO evaluation:
# LIBERO
python experiments/libero/run_libero_manager.py task={task_name} ckpt={ckpt_path}
If you have not already done so, set up the RoboTwin environment, assets, and the fastwam_policy symlink as described in the previous section, then run RoboTwin evaluation:
python experiments/robotwin/run_robotwin_manager.py task={task_name} ckpt={ckpt_path}
Common task_name examples: libero_uncond_2cam224_1e-4, robotwin_uncond_3cam_384_1e-4.