
<div align="center"> <h2>Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models</h2>

Project Page arXiv Paper License

<br>

Songchun Zhang<sup>1</sup>, Zeyue Xue<sup>2,3</sup>, Siming Fu<sup>2</sup>, Jie Huang<sup>2</sup>, Xianghao Kong<sup>1</sup>, Yue Ma<sup>1</sup>, Haoyang Huang<sup>2</sup>, Nan Duan<sup>2✉</sup>, Anyi Rao<sup>1✉</sup>

<sup>1</sup>HKUST    <sup>2</sup>JD Explore Academy    <sup>3</sup>HKU    <sup>✉</sup>Corresponding Authors

</div>

🔭 Overview

Astrolabe is an efficient online Reinforcement Learning (RL) framework designed to align distilled autoregressive (AR) streaming video models with human visual preferences. Without sacrificing real-time inference speed, Astrolabe consistently and robustly improves visual aesthetics and temporal consistency across various baseline models for both short and long video generation.

🎬 Demo

<div align="center">

*Demo videos (Samples 1–4) are available on the project page.*

</div>

📢 News

  • 2026-03-23 — Code released!
  • 2026-03-18 — Paper released on arXiv!


📊 Supported Methods & Rewards

Supported Base Models

| Model | Config File |
|---|---|
| LongLive | `configs/nft_longlive.py` |
| Self-Forcing | `configs/nft_self_forcing.py` |
| Causal Forcing | `configs/nft_casual_forcing.py` |
| Krea 14B | `configs/nft_krea14b.py` |

Supported Reward Models

| Reward | Key in Config | What It Measures | Source |
|---|---|---|---|
| HPSv3 | `video_hpsv3_local` | Frame-level aesthetic & visual quality | HPSv3 |
| VideoAlign – VQ | `videoalign_vq_score` | Per-frame visual fidelity scored by VideoReward | VideoReward |
| VideoAlign – MQ | `videoalign_mq_score` | Temporal smoothness & motion naturalness | VideoReward |
| VideoAlign – TA | `videoalign_ta_score` | Prompt–video semantic alignment | VideoReward |

Rewards can be freely combined with per-reward weights, e.g. `reward_fn={"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}`.
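The combination is simply a weighted sum of per-reward scores. As an illustrative sketch (the `combine_rewards` helper and the numeric scores below are made up for illustration, not the repo's API):

```python
from typing import Dict

def combine_rewards(scores: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-reward scores, one weight per reward key."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())

# Equal weighting of HPSv3 and VideoAlign motion quality, as in the example above
reward_fn = {"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}
scores = {"video_hpsv3_local": 0.72, "videoalign_mq_score": 0.55}  # made-up scores
total = combine_rewards(scores, reward_fn)  # 1.0*0.72 + 1.0*0.55 = 1.27
```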


🚀 Quick Start

1. Environment Setup

Tested configuration: Python 3.10.16, CUDA 12.8, NVIDIA H200 GPUs

```shell
# Clone the repository
git clone https://github.com/franklinz233/Astrolabe.git
cd Astrolabe

# Create and activate conda environment
conda create -n astrolabe python=3.10.16
conda activate astrolabe

# Install PyTorch
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

# Install other dependencies
pip install -r requirements.txt

# Install Flash Attention
pip install flash-attn==2.7.4.post1 --no-build-isolation

# Alternatively, download and install the pre-built wheel (CUDA 12 + PyTorch 2.6)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

2. Model Download

We support four distilled AR video model baselines. Download the base Wan2.1 model and the desired distilled checkpoint(s):

Base Model

```shell
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
```

Distilled Model Checkpoints

<details> <summary><b>Self-Forcing</b></summary>

```shell
huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
```

</details> <details> <summary><b>Causal Forcing</b></summary>

```shell
huggingface-cli download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints/casualforcing
huggingface-cli download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints/casualforcing
```

</details> <details> <summary><b>LongLive</b></summary>

```shell
huggingface-cli download Efficient-Large-Model/LongLive-1.3B --include "models/*" --local-dir checkpoints/longlive_models
```

</details> <details> <summary><b>Krea 14B</b></summary>

```shell
huggingface-cli download krea/krea-realtime-video \
  krea-realtime-video-14b.safetensors \
  --local-dir checkpoints
```

</details>

Expected Directory Structure

```
checkpoints/
├── casualforcing/
│   ├── chunkwise/
│   │   └── causal_forcing.pt
│   └── framewise/
│       └── causal_forcing.pt
├── krea-realtime-video-14b.safetensors
├── longlive_models/
│   └── models/
│       ├── longlive_base.pt
│       └── lora.pt
└── self_forcing_dmd.pt
```
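Optionally, a quick Python check (paths copied from the tree above) can confirm the downloads landed where the configs expect them:

```python
# Verify the expected checkpoint layout before launching training.
from pathlib import Path

EXPECTED = [
    "checkpoints/casualforcing/chunkwise/causal_forcing.pt",
    "checkpoints/casualforcing/framewise/causal_forcing.pt",
    "checkpoints/krea-realtime-video-14b.safetensors",
    "checkpoints/longlive_models/models/longlive_base.pt",
    "checkpoints/longlive_models/models/lora.pt",
    "checkpoints/self_forcing_dmd.pt",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
for p in missing:
    print(f"missing: {p}")
```

Only the checkpoints for the baselines you plan to train are strictly required.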

3. Reward Models Preparation

Download reward model checkpoints:

```shell
mkdir -p reward_ckpts && cd reward_ckpts

# CLIP backbone (required by HPSv2/v3)
wget https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin

# HPSv2.1 checkpoint
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt

# HPSv3 checkpoint
wget https://huggingface.co/MizzenAI/HPSv3/resolve/main/HPSv3.safetensors

# VideoReward checkpoint
huggingface-cli download KlingTeam/VideoReward --local-dir ./Videoreward
```

4. Start Training

Training Prompts

Download the filtered VidProM prompt subset used for training:

```shell
huggingface-cli download Franklinzhang/stream_align \
  --include "vidprom/*" \
  --local-dir ./dataset
```

W&B Logging (Optional but Recommended)

```shell
export WANDB_API_KEY=<your_key>
export WANDB_ENTITY=<your_entity>
```

GPU presets: Training parameters such as `num_image_per_prompt`, `num_groups`, and `test_batch_size` are auto-configured per GPU scale in `GPU_CONFIGS` inside `configs/_base_clean.py`. Add or edit entries there for custom GPU counts.
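For orientation, a per-scale preset table might look like the sketch below; the key names mirror the parameters listed above, but the concrete values and the `resolve_gpu_config` helper are assumptions, so consult `GPU_CONFIGS` in `configs/_base_clean.py` for the real presets:

```python
# Illustrative shape of the per-scale presets; values here are placeholders.
GPU_CONFIGS = {
    8:  dict(num_image_per_prompt=8, num_groups=2, test_batch_size=4),
    16: dict(num_image_per_prompt=8, num_groups=4, test_batch_size=8),
}

def resolve_gpu_config(total_gpus: int) -> dict:
    """Pick the preset for the total GPU count, defaulting to the 8-GPU entry."""
    return GPU_CONFIGS.get(total_gpus, GPU_CONFIGS[8])
```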

Multi-node setup: If your cluster cannot resolve MASTER_ADDR automatically, use a shared filesystem for node discovery. Run on each node, setting RANK=0 on master and RANK=1,2,... on workers:

```shell
export RANK=<node_rank>       # 0 for master, 1, 2, ... for workers
export WORLD_SIZE=<num_nodes> # 2→16 GPUs (2×8), 3→24 GPUs (3×8), 6→48 GPUs (6×8)
export MASTER_PORT=29500
CONFIG_NAME=<config_name>

torchrun --nproc_per_node=8 --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    scripts/train_nft_wan.py \
    --config configs/nft_<model>.py:${CONFIG_NAME}
```
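If you rely on the shared-filesystem node discovery described above, a minimal sketch could look like this; the helper name and the idea of polling a file on a shared mount are illustrative, not part of the repo:

```python
# Hypothetical helper: rank 0 publishes its hostname to a file on a shared
# mount; workers poll until it appears, then use it as MASTER_ADDR.
import socket
import time
from pathlib import Path

def publish_or_wait_master_addr(rank: int, addr_file: Path,
                                timeout: float = 300.0) -> str:
    """Rank 0 writes its hostname; all ranks return the discovered address."""
    if rank == 0:
        addr_file.write_text(socket.gethostname())
    deadline = time.time() + timeout
    while not addr_file.exists():
        if time.time() > deadline:
            raise TimeoutError("master address file never appeared")
        time.sleep(1.0)
    return addr_file.read_text().strip()

# e.g. export MASTER_ADDR from this value before invoking torchrun
```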

Multi-reward: Replace hpsv3 with multi_reward in any config name to enable the full multi-reward objective (HPSv3 + Motion Quality).


LongLive

💡 LoRA Initialization (recommended for LongLive): LongLive ships a pretrained LoRA adapter (checkpoints/longlive_models/models/lora.pt) that can be used to warm-start training. Simply append _with_lora_init to any LongLive config name to enable it — the adapter is loaded before RL training begins and typically leads to faster convergence.
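Conceptually, warm-starting from a LoRA adapter merges a low-rank update into each frozen base weight, W + (alpha/r) * B @ A. The sketch below illustrates that arithmetic only; the names, shapes, and `apply_lora` helper are illustrative, not the repo's actual loading code:

```python
import torch

def apply_lora(base_weight: torch.Tensor,
               lora_A: torch.Tensor,  # shape (r, in_features)
               lora_B: torch.Tensor,  # shape (out_features, r)
               alpha: float, r: int) -> torch.Tensor:
    """Return the effective weight after merging a LoRA adapter."""
    return base_weight + (alpha / r) * (lora_B @ lora_A)

W = torch.zeros(4, 4)
A = torch.ones(2, 4)  # rank r = 2
B = torch.ones(4, 2)
W_merged = apply_lora(W, A, B, alpha=2.0, r=2)  # every entry = (2/2) * 2 = 2.0
```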

<details> <summary><b>Single Node (8× GPU)</b></summary>

```shell
# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_hpsv3

# HPSv3 reward — with LoRA init
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_hpsv3_with_lora_init

# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_multi_reward
```

</details> <details> <summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>

| Scale | HPSv3 Config | HPSv3 + LoRA Init | Multi-Reward Config |
|---|---|---|---|
| 16× GPU | `longlive_video_hpsv3_16gpu` | `longlive_video_hpsv3_with_lora_init_16gpu` | `longlive_video_multi_reward_16gpu` |
| 24× GPU | `longlive_video_hpsv3_24gpu` | `longlive_video_hpsv3_with_lora_init_24gpu` | `longlive_video_multi_reward_24gpu` |
| 48× GPU | `longlive_video_hpsv3_48gpu` | `longlive_video_hpsv3_with_lora_init_48gpu` | `longlive_video_multi_reward_48gpu` |

</details>

Self-Forcing

<details> <summary><b>Single Node (8× GPU)</b></summary>

```shell
# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_self_forcing.py:self_forcing_video_hpsv3

# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_self_forcing.py:self_forcing_video_multi_reward
```

</details> <details> <summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>

| Scale | HPSv3 Config | Multi-Reward Config |
|---|---|---|
| 16× GPU | `self_forcing_video_hpsv3_16gpu` | `self_forcing_video_multi_reward_16gpu` |
| 24× GPU | `self_forcing_video_hpsv3_24gpu` | `self_forcing_video_multi_reward_24gpu` |
| 48× GPU | `self_forcing_video_hpsv3_48gpu` | `self_forcing_video_multi_reward_48gpu` |

</details>
