FastWAM

Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?


Links: arXiv · Project Page · Hugging Face Model · Hugging Face Dataset (LIBERO) · Hugging Face Dataset (RoboTwin)

This repository contains the training and evaluation code for FastWAM on LIBERO / RoboTwin.

File Structure

FastWAM/
├── configs/
│   ├── data/                 # Dataset configs (LIBERO, RoboTwin, etc.)
│   ├── model/                # Model architecture and component configs
│   └── task/                 # Task-level configs (training task names)
├── scripts/
│   ├── train.py
│   ├── train_zero1.sh        # Deepspeed zero1 training entrypoint
│   ├── preprocess_action_dit_backbone.py  # Preprocess ActionDiT backbone before training
│   └── precompute_text_embeds.py  # Precompute T5 text embedding cache before training
├── experiments/
│   ├── libero/
│   │   └── run_libero_manager.py
│   └── robotwin/
│       └── run_robotwin_manager.py
├── src/fastwam/              # Core code
├── runs/                     # Training outputs (ckpt, logs)
├── checkpoints/              # Pretrained or external checkpoints
├── data/                     # Data directory
└── evaluate_results/         # Inference / evaluation results
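
For a fresh clone, the writable directories in the layout above can be created up front (a convenience sketch; it assumes these directories are not already tracked in git):

```shell
# Create the output/data directories referenced by the layout above
mkdir -p runs checkpoints data evaluate_results
```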

Environment Setup

conda create -n fastwam python=3.10 -y
conda activate fastwam
pip install -U pip
pip install torch==2.7.1+cu128 torchvision==0.22.1+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
pip install -e .

Model Preparation

This step is required before both training and inference.

Step 1: set the Wan model directory (optional; defaults to ./checkpoints):

mkdir -p checkpoints
export DIFFSYNTH_MODEL_BASE_PATH="$(pwd)/checkpoints"

Step 2: pre-generate the ActionDiT backbone (interpolated from Wan22 DiT):

# uncond (fastwam)
python scripts/preprocess_action_dit_backbone.py \
  --model-config configs/model/fastwam.yaml \
  --output checkpoints/ActionDiT_linear_interp_Wan22_alphascale_1024hdim.pt \
  --device cuda \
  --dtype bfloat16

Dataset Download

LIBERO

The preprocessed LIBERO dataset used by Fast-WAM is available at:

  • https://huggingface.co/datasets/yuanty/LIBERO-fastwam

Download all compressed files first, then extract them all:

mkdir -p data/libero_mujoco3.3.2
cd data/libero_mujoco3.3.2

# Run after downloading all 4 tar.gz files
for f in *.tar.gz; do
  tar -xzf "$f"
done

The extracted directory structure should be:

data/libero_mujoco3.3.2/
├── libero_10_no_noops_lerobot/
├── libero_goal_no_noops_lerobot/
├── libero_object_no_noops_lerobot/
└── libero_spatial_no_noops_lerobot/
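
A quick sanity check after extraction (directory names taken from the layout above):

```shell
# Run from data/libero_mujoco3.3.2; prints nothing when all four splits are present
for d in libero_10_no_noops_lerobot libero_goal_no_noops_lerobot \
         libero_object_no_noops_lerobot libero_spatial_no_noops_lerobot; do
  [ -d "$d" ] || echo "missing: $d"
done
```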

RoboTwin

The preprocessed RoboTwin dataset used by Fast-WAM is available at:

  • https://huggingface.co/datasets/yuanty/robotwin2.0-fastwam

Download all split archive files first, then concatenate and extract:

mkdir -p data/robotwin2.0
cd data/robotwin2.0

# Run after downloading all robotwin2.0.tar.gz.part-* files
cat robotwin2.0.tar.gz.part-* | tar -xzf -
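
Note that the `cat` relies on the shell expanding the `part-*` glob in lexicographic order, which matches padded part suffixes. If you want to verify the download before (or instead of) extracting, the concatenated stream can be integrity-tested without writing the combined archive to disk (a sketch; assumes `gzip` is available):

```shell
# Stream all parts through gzip's integrity test; prints "archive OK" on success
cat robotwin2.0.tar.gz.part-* | gzip -t && echo "archive OK"
```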

The extracted directory structure should be:

data/robotwin2.0/
└── robotwin2.0/
    ├── data/
    ├── meta/
    └── videos/

If you also keep the statistics file at:

data/robotwin2.0/dataset_stats.json

it can be used directly as the statistics file for the current configs in this repo. You can also recompute it.

Inference with Released Checkpoints

The released checkpoints and their corresponding dataset stats are available on Hugging Face.

Optional: download released checkpoints and dataset stats from Hugging Face:

pip install -U huggingface_hub

huggingface-cli download yuanty/fastwam \
  libero_uncond_2cam224.pt \
  libero_uncond_2cam224_dataset_stats.json \
  robotwin_uncond_3cam_384.pt \
  robotwin_uncond_3cam_384_dataset_stats.json \
  --local-dir ./checkpoints/fastwam_release

After downloading, the local directory is expected to contain:

checkpoints/fastwam_release/
├── libero_uncond_2cam224.pt
├── libero_uncond_2cam224_dataset_stats.json
├── robotwin_uncond_3cam_384.pt
└── robotwin_uncond_3cam_384_dataset_stats.json
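
To confirm the download completed, check for the four release files (filenames taken from the listing above):

```shell
# Prints nothing when all four release files are in place
for f in libero_uncond_2cam224.pt \
         libero_uncond_2cam224_dataset_stats.json \
         robotwin_uncond_3cam_384.pt \
         robotwin_uncond_3cam_384_dataset_stats.json; do
  [ -f "checkpoints/fastwam_release/$f" ] || echo "missing: $f"
done
```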

Before running the LIBERO benchmark, install the official LIBERO environment from the LIBERO repository, then pin the mujoco version:

pip install mujoco==3.3.2

The mujoco environment should ideally stay consistent with the LIBERO data version.

We have already copied the RoboTwin evaluation-related code into third_party/RoboTwin. You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:

ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
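
To confirm the link was created correctly, `readlink` should echo back the in-repo policy path:

```shell
# Should print the absolute path of experiments/robotwin/fastwam_policy
readlink "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"
```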

The released LIBERO / RoboTwin evaluation managers default to 8 GPUs (MULTIRUN.num_gpus=8 in configs/sim_libero.yaml and configs/sim_robotwin.yaml). To evaluate with fewer GPUs, pass a smaller value such as MULTIRUN.num_gpus=4.

Optional: evaluate the released LIBERO checkpoint:

python experiments/libero/run_libero_manager.py \
  task=libero_uncond_2cam224_1e-4 \
  ckpt=./checkpoints/fastwam_release/libero_uncond_2cam224.pt \
  EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/libero_uncond_2cam224_dataset_stats.json \
  MULTIRUN.num_gpus=8

Optional: evaluate the released RoboTwin checkpoint:

python experiments/robotwin/run_robotwin_manager.py \
  task=robotwin_uncond_3cam_384_1e-4 \
  ckpt=./checkpoints/fastwam_release/robotwin_uncond_3cam_384.pt \
  EVALUATION.dataset_stats_path=./checkpoints/fastwam_release/robotwin_uncond_3cam_384_dataset_stats.json \
  MULTIRUN.num_gpus=8

For faster RoboTwin evaluation, we have enabled EVALUATION.skip_get_obs_within_replan=true in configs/sim_robotwin.yaml. This skips RGB rendering while consecutively executing an action chunk within one replan window, which speeds up evaluation but makes the saved video look very low-FPS. Set it to false if you want to save a fully rendered video.

Note: We evaluate with unseen instructions, following Motus. Lingbot-VA uses seen instructions instead. You can try EVALUATION.instruction_type=seen to use seen instructions, which should theoretically improve performance by one or two points.

Training

1) Precompute T5 embedding cache before training

Use scripts/precompute_text_embeds.py to precompute embeddings for each training task:

# LIBERO
python scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4

# RoboTwin
python scripts/precompute_text_embeds.py task=robotwin_uncond_3cam_384_1e-4

For multi-GPU:

torchrun --standalone --nproc_per_node=8 scripts/precompute_text_embeds.py task=libero_uncond_2cam224_1e-4

2) Training (using fastwam as an example)

When running a new task for the first time, set pretrained_norm_stats in the corresponding configs/data/*.yaml to null first. After one training run, a dataset_stats.json file will be generated in the current run directory (for example, runs/{task_name}/{run_id}/dataset_stats.json). You can then update pretrained_norm_stats to that file path for subsequent runs.
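
Concretely, the two-phase setup above might look like this in the relevant configs/data/*.yaml (the key name comes from this README; the surrounding config context is omitted, and the run path is illustrative):

```yaml
# First run on a new task: compute dataset statistics from scratch
pretrained_norm_stats: null

# Subsequent runs: reuse the stats written by the first run, e.g.
# pretrained_norm_stats: runs/libero_uncond_2cam224_1e-4/<run_id>/dataset_stats.json
```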

# LIBERO
bash scripts/train_zero1.sh 8 task=libero_uncond_2cam224_1e-4

# RoboTwin
bash scripts/train_zero1.sh 8 task=robotwin_uncond_3cam_384_1e-4

For LIBERO, we train on a single node with 8 GPUs. For RoboTwin, we use 64 GPUs to accelerate training. You can try reducing the GPU count or training epochs.

Inference with Your Trained Checkpoints

As noted above, the mujoco environment should stay consistent with the LIBERO data version. Then run LIBERO evaluation:

# LIBERO
python experiments/libero/run_libero_manager.py task={task_name} ckpt={ckpt_path}

We have already copied the RoboTwin evaluation-related code into third_party/RoboTwin. You still need to follow the official RoboTwin instructions from the RoboTwin repository to finish environment installation and download the required assets, then create the policy symlink:

ln -sfn "$(pwd)/experiments/robotwin/fastwam_policy" "$(pwd)/third_party/RoboTwin/policy/fastwam_policy"

Then run RoboTwin evaluation:

python experiments/robotwin/run_robotwin_manager.py task={task_name} ckpt={ckpt_path}

Common task_name examples (the task names used throughout this README):

  • libero_uncond_2cam224_1e-4
  • robotwin_uncond_3cam_384_1e-4
