Astrolabe
Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
Songchun Zhang<sup>1</sup>, Zeyue Xue<sup>2,3</sup>, Siming Fu<sup>2</sup>, Jie Huang<sup>2</sup>, Xianghao Kong<sup>1</sup>, Yue Ma<sup>1</sup>, Haoyang Huang<sup>2</sup>, Nan Duan<sup>2✉</sup>, Anyi Rao<sup>1✉</sup>
<sup>1</sup>HKUST <sup>2</sup>JD Explore Academy <sup>3</sup>HKU <sup>✉</sup>Corresponding Authors
</div>

🔭 Overview
Astrolabe is an efficient online Reinforcement Learning (RL) framework designed to align distilled autoregressive (AR) streaming video models with human visual preferences. Without sacrificing real-time inference speed, Astrolabe consistently and robustly improves visual aesthetics and temporal consistency across various baseline models for both short and long video generation.
🎬 Demo
<div align="center">

| Sample 1 | Sample 2 |
|:---:|:---:|
| *(video)* | *(video)* |
| Sample 3 | Sample 4 |
| *(video)* | *(video)* |

</div>
📢 News
- 2026-03-23 — Code released!
- 2026-03-18 — Paper released on arXiv!
📊 Supported Methods & Rewards
Supported Base Models
| Model | Config File |
|---|---|
| LongLive | configs/nft_longlive.py |
| Self-Forcing | configs/nft_self_forcing.py |
| Causal Forcing | configs/nft_casual_forcing.py |
| Krea 14B | configs/nft_krea14b.py |
Supported Reward Models
| Reward | Key in Config | What It Measures | Source |
|---|---|---|---|
| HPSv3 | video_hpsv3_local | Frame-level aesthetic & visual quality | HPSv3 |
| VideoAlign – VQ | videoalign_vq_score | Per-frame visual fidelity scored by VideoReward | VideoReward |
| VideoAlign – MQ | videoalign_mq_score | Temporal smoothness & motion naturalness | VideoReward |
| VideoAlign – TA | videoalign_ta_score | Prompt–video semantic alignment | VideoReward |
Rewards can be freely combined with per-reward weights, e.g. reward_fn={"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}.
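The weighted combination above amounts to a simple weighted sum over the configured reward keys. A minimal sketch (the reward keys come from the table above; the function name and the per-video scores are illustrative, not part of the repo):

```python
# Illustrative sketch: combine per-reward scores using the weights in reward_fn.
# Reward keys match the config table above; combine_rewards and the scores
# below are stand-ins, not actual Astrolabe code.

def combine_rewards(scores: dict[str, float], reward_fn: dict[str, float]) -> float:
    """Weighted sum of individual reward scores, as in reward_fn={...}."""
    return sum(weight * scores[name] for name, weight in reward_fn.items())

reward_fn = {"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}
scores = {"video_hpsv3_local": 0.5, "videoalign_mq_score": 0.25}  # hypothetical per-video scores

print(combine_rewards(scores, reward_fn))  # 0.75
```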
🚀 Quick Start
1. Environment Setup
Tested configuration: Python 3.10.16, CUDA 12.8, NVIDIA H200 GPUs
# Clone the repository
git clone https://github.com/franklinz233/Astrolabe.git
cd Astrolabe
# Create and activate conda environment
conda create -n astrolabe python=3.10.16
conda activate astrolabe
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
# Install other dependencies
pip install -r requirements.txt
# Install Flash Attention (pre-built wheel for CUDA 12 + PyTorch 2.6)
pip install flash-attn==2.7.4.post1 --no-build-isolation
# Alternatively, download the pre-built wheel and install it locally
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
2. Model Download
We support four distilled AR video model baselines. Download the base Wan2.1 model and the desired distilled checkpoint(s):
Base Model
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
Distilled Model Checkpoints
<details>
<summary><b>Self-Forcing</b></summary>

huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
</details>
<details>
<summary><b>Causal Forcing</b></summary>
huggingface-cli download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints/casualforcing
huggingface-cli download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints/casualforcing
</details>
<details>
<summary><b>LongLive</b></summary>
huggingface-cli download Efficient-Large-Model/LongLive-1.3B --include "models/*" --local-dir checkpoints/longlive_models
</details>
<details>
<summary><b>Krea 14B</b></summary>
huggingface-cli download krea/krea-realtime-video \
krea-realtime-video-14b.safetensors \
--local-dir checkpoints
</details>
Expected Directory Structure
checkpoints/
├── casualforcing/
│ ├── chunkwise/
│ │ └── causal_forcing.pt
│ └── framewise/
│ └── causal_forcing.pt
├── krea-realtime-video-14b.safetensors
├── longlive_models/
│ └── models/
│ ├── longlive_base.pt
│ └── lora.pt
└── self_forcing_dmd.pt
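A quick way to confirm the downloads landed in this layout is to check each path from the tree above. The script below is not part of the repo, just a convenience sketch; check only the checkpoints you actually downloaded:

```python
from pathlib import Path

# Paths copied from the "Expected Directory Structure" above.
EXPECTED = [
    "checkpoints/self_forcing_dmd.pt",
    "checkpoints/casualforcing/chunkwise/causal_forcing.pt",
    "checkpoints/casualforcing/framewise/causal_forcing.pt",
    "checkpoints/longlive_models/models/longlive_base.pt",
    "checkpoints/longlive_models/models/lora.pt",
    "checkpoints/krea-realtime-video-14b.safetensors",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
for p in missing:
    print(f"missing: {p}")
print("all present" if not missing else f"{len(missing)} checkpoint(s) missing")
```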
3. Reward Models Preparation
Download reward model checkpoints:
mkdir -p reward_ckpts && cd reward_ckpts
# CLIP backbone (required by HPSv2/v3)
wget https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin
# HPSv2.1 checkpoint
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt
# HPSv3 checkpoint
wget https://huggingface.co/MizzenAI/HPSv3/resolve/main/HPSv3.safetensors
# VideoReward checkpoint
huggingface-cli download KlingTeam/VideoReward --local-dir ./Videoreward
4. Start Training
Training Prompts
Download the filtered VidProM prompt subset used for training:
huggingface-cli download Franklinzhang/stream_align \
--include "vidprom/*" \
--local-dir ./dataset
W&B Logging (Optional but Recommended)
export WANDB_API_KEY=<your_key>
export WANDB_ENTITY=<your_entity>
GPU presets: Training parameters like num_image_per_prompt, num_groups, and test_batch_size are auto-configured per GPU scale in GPU_CONFIGS inside configs/_base_clean.py. Add or edit entries there for custom GPU counts.
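The preset mechanism can be pictured as a dict keyed by GPU count. The shape below is a sketch; the real entries live in configs/_base_clean.py and their values will differ from these placeholders:

```python
# Illustrative GPU_CONFIGS-style preset table. The actual values in
# configs/_base_clean.py will differ; the numbers here are placeholders.

GPU_CONFIGS = {
    8:  {"num_image_per_prompt": 4, "num_groups": 2, "test_batch_size": 8},
    16: {"num_image_per_prompt": 8, "num_groups": 4, "test_batch_size": 16},
}

def resolve_preset(world_size: int) -> dict:
    """Look up the preset for the current GPU count, failing loudly if absent."""
    try:
        return GPU_CONFIGS[world_size]
    except KeyError:
        raise ValueError(f"No preset for {world_size} GPUs; add one to GPU_CONFIGS")

print(resolve_preset(8)["num_groups"])  # 2
```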
Multi-node setup: If your cluster cannot resolve MASTER_ADDR automatically, use a shared filesystem for node discovery. Run the following on each node, setting RANK=0 on the master and RANK=1, 2, ... on workers:
export RANK=<node_rank>        # 0 for master; 1, 2, ... for workers
export WORLD_SIZE=<num_nodes>  # 2 → 16 GPUs (2×8), 3 → 24 GPUs (3×8), 6 → 48 GPUs (6×8)
export MASTER_PORT=29500
CONFIG_NAME=<config_name> torchrun --nproc_per_node=8 --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  scripts/train_nft_wan.py \
  --config configs/nft_<model>.py:${CONFIG_NAME}
Multi-reward: Replace hpsv3 with multi_reward in any config name to enable the full multi-reward objective (HPSv3 + Motion Quality).
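The config-name conventions used throughout this section (reward substitution, the _with_lora_init suffix, and the _Ngpu scale suffix) can be expressed as simple string composition. The helper below is illustrative only, not a repo utility; the name pattern is taken from the config tables in this README:

```python
# Illustrative: compose config names following the README's pattern
# "<model>_video_<reward>[_with_lora_init][_<N>gpu]". Not actual repo code.

def config_name(model: str, reward: str = "hpsv3",
                lora_init: bool = False, gpus: int = 8) -> str:
    name = f"{model}_video_{reward}"
    if lora_init:
        name += "_with_lora_init"
    if gpus != 8:  # single-node 8-GPU configs carry no scale suffix
        name += f"_{gpus}gpu"
    return name

print(config_name("longlive", lora_init=True, gpus=16))
# longlive_video_hpsv3_with_lora_init_16gpu
print(config_name("self_forcing", reward="multi_reward", gpus=48))
# self_forcing_video_multi_reward_48gpu
```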
LongLive
<details>
<summary><b>Single Node (8× GPU)</b></summary>

💡 LoRA Initialization (recommended for LongLive): LongLive ships a pretrained LoRA adapter (checkpoints/longlive_models/models/lora.pt) that can be used to warm-start training. Simply append _with_lora_init to any LongLive config name to enable it; the adapter is loaded before RL training begins and typically leads to faster convergence.
# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
--config configs/nft_longlive.py:longlive_video_hpsv3
# HPSv3 reward — with LoRA init
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
--config configs/nft_longlive.py:longlive_video_hpsv3_with_lora_init
# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
--config configs/nft_longlive.py:longlive_video_multi_reward
</details>
<details>
<summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>
| Scale | HPSv3 Config | HPSv3 + LoRA Init | Multi-Reward Config |
|---|---|---|---|
| 16× GPU | longlive_video_hpsv3_16gpu | longlive_video_hpsv3_with_lora_init_16gpu | longlive_video_multi_reward_16gpu |
| 24× GPU | longlive_video_hpsv3_24gpu | longlive_video_hpsv3_with_lora_init_24gpu | longlive_video_multi_reward_24gpu |
| 48× GPU | longlive_video_hpsv3_48gpu | longlive_video_hpsv3_with_lora_init_48gpu | longlive_video_multi_reward_48gpu |
</details>
Self-Forcing
<details>
<summary><b>Single Node (8× GPU)</b></summary>

# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
--config configs/nft_self_forcing.py:self_forcing_video_hpsv3
# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
--config configs/nft_self_forcing.py:self_forcing_video_multi_reward
</details>
<details>
<summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>
| Scale | HPSv3 Config | Multi-Reward Config |
|---|---|---|
| 16× GPU | self_forcing_video_hpsv3_16gpu | self_forcing_video_multi_reward_16gpu |
| 24× GPU | self_forcing_video_hpsv3_24gpu | self_forcing_video_multi_reward_24gpu |
| 48× GPU | self_forcing_video_hpsv3_48gpu | self_forcing_video_multi_reward_48gpu |
</details>