
<div align="center"> <h2>Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models</h2>

Project Page arXiv Paper License

<br>

Songchun Zhang<sup>1</sup>, Zeyue Xue<sup>2,3</sup>, Siming Fu<sup>2</sup>, Jie Huang<sup>2</sup>, Xianghao Kong<sup>1</sup>, Yue Ma<sup>1</sup>, Haoyang Huang<sup>2</sup>, Nan Duan<sup>2✉</sup>, Anyi Rao<sup>1✉</sup>

<sup>1</sup>HKUST    <sup>2</sup>JD Explore Academy    <sup>3</sup>HKU    <sup>✉</sup>Corresponding Authors

</div>

🔭 Overview

Astrolabe is an efficient online Reinforcement Learning (RL) framework designed to align distilled autoregressive (AR) streaming video models with human visual preferences. Without sacrificing real-time inference speed, Astrolabe consistently and robustly improves visual aesthetics and temporal consistency across various baseline models for both short and long video generation.

🎬 Demo

<div align="center">

*Demo videos (Samples 1–4) are available on the project page.*

</div>

📢 News

  • 2026-03-23 — Code released!
  • 2026-03-18 — Paper released on arXiv!


📊 Supported Methods & Rewards

Supported Base Models

| Model | Config File |
|---|---|
| LongLive | `configs/nft_longlive.py` |
| Self-Forcing | `configs/nft_self_forcing.py` |
| Causal Forcing | `configs/nft_casual_forcing.py` |
| Krea 14B | `configs/nft_krea14b.py` |

Supported Reward Models

| Reward | Key in Config | What It Measures | Source |
|---|---|---|---|
| HPSv3 | `video_hpsv3_local` | Frame-level aesthetic & visual quality | HPSv3 |
| VideoAlign – VQ | `videoalign_vq_score` | Per-frame visual fidelity scored by VideoReward | VideoReward |
| VideoAlign – MQ | `videoalign_mq_score` | Temporal smoothness & motion naturalness | VideoReward |
| VideoAlign – TA | `videoalign_ta_score` | Prompt–video semantic alignment | VideoReward |

Rewards can be freely combined with per-reward weights, e.g. `reward_fn={"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}`.
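The combination is simply a weighted sum of per-reward scores. As an illustrative sketch (the `combine_rewards` helper and the numeric scores below are made up for illustration, not the repo's API):

```python
from typing import Dict

def combine_rewards(scores: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-reward scores, one weight per reward key."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())

# Equal weighting of HPSv3 and VideoAlign motion quality, as in the example above
reward_fn = {"video_hpsv3_local": 1.0, "videoalign_mq_score": 1.0}
scores = {"video_hpsv3_local": 0.72, "videoalign_mq_score": 0.55}  # made-up scores
total = combine_rewards(scores, reward_fn)  # 1.0*0.72 + 1.0*0.55 = 1.27
```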


🚀 Quick Start

1. Environment Setup

Tested configuration: Python 3.10.16, CUDA 12.8, NVIDIA H200 GPUs

```shell
# Clone the repository
git clone https://github.com/franklinz233/Astrolabe.git
cd Astrolabe

# Create and activate conda environment
conda create -n astrolabe python=3.10.16
conda activate astrolabe

# Install PyTorch
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

# Install other dependencies
pip install -r requirements.txt

# Install Flash Attention
pip install flash-attn==2.7.4.post1 --no-build-isolation

# Alternatively, download and install the pre-built wheel (CUDA 12 + PyTorch 2.6)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

2. Model Download

We support four distilled AR video model baselines. Download the base Wan2.1 model and the desired distilled checkpoint(s):

Base Model

```shell
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
```

Distilled Model Checkpoints

<details> <summary><b>Self-Forcing</b></summary>

```shell
huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
```

</details> <details> <summary><b>Causal Forcing</b></summary>

```shell
huggingface-cli download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints/casualforcing
huggingface-cli download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints/casualforcing
```

</details> <details> <summary><b>LongLive</b></summary>

```shell
huggingface-cli download Efficient-Large-Model/LongLive-1.3B --include "models/*" --local-dir checkpoints/longlive_models
```

</details> <details> <summary><b>Krea 14B</b></summary>

```shell
huggingface-cli download krea/krea-realtime-video \
  krea-realtime-video-14b.safetensors \
  --local-dir checkpoints
```

</details>

Expected Directory Structure

```
checkpoints/
├── casualforcing/
│   ├── chunkwise/
│   │   └── causal_forcing.pt
│   └── framewise/
│       └── causal_forcing.pt
├── krea-realtime-video-14b.safetensors
├── longlive_models/
│   └── models/
│       ├── longlive_base.pt
│       └── lora.pt
└── self_forcing_dmd.pt
```
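Optionally, a quick Python check (paths copied from the tree above) can confirm the downloads landed where the configs expect them:

```python
# Verify the expected checkpoint layout before launching training.
from pathlib import Path

EXPECTED = [
    "checkpoints/casualforcing/chunkwise/causal_forcing.pt",
    "checkpoints/casualforcing/framewise/causal_forcing.pt",
    "checkpoints/krea-realtime-video-14b.safetensors",
    "checkpoints/longlive_models/models/longlive_base.pt",
    "checkpoints/longlive_models/models/lora.pt",
    "checkpoints/self_forcing_dmd.pt",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
for p in missing:
    print(f"missing: {p}")
```

Only the checkpoints for the baselines you plan to train are strictly required.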

3. Reward Models Preparation

Download reward model checkpoints:

```shell
mkdir -p reward_ckpts && cd reward_ckpts

# CLIP backbone (required by HPSv2/v3)
wget https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin

# HPSv2.1 checkpoint
wget https://huggingface.co/xswu/HPSv2/resolve/main/HPS_v2.1_compressed.pt

# HPSv3 checkpoint
wget https://huggingface.co/MizzenAI/HPSv3/resolve/main/HPSv3.safetensors

# VideoReward checkpoint
huggingface-cli download KlingTeam/VideoReward --local-dir ./Videoreward
```

4. Start Training

Training Prompts

Download the filtered VidProM prompt subset used for training:

```shell
huggingface-cli download Franklinzhang/stream_align \
  --include "vidprom/*" \
  --local-dir ./dataset
```

W&B Logging (Optional but Recommended)

```shell
export WANDB_API_KEY=<your_key>
export WANDB_ENTITY=<your_entity>
```

GPU presets: Training parameters such as `num_image_per_prompt`, `num_groups`, and `test_batch_size` are auto-configured per GPU scale in `GPU_CONFIGS` inside `configs/_base_clean.py`. Add or edit entries there for custom GPU counts.
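For orientation, a per-scale preset table might look like the sketch below; the key names mirror the parameters listed above, but the concrete values and the `resolve_gpu_config` helper are assumptions, so consult `GPU_CONFIGS` in `configs/_base_clean.py` for the real presets:

```python
# Illustrative shape of the per-scale presets; values here are placeholders.
GPU_CONFIGS = {
    8:  dict(num_image_per_prompt=8, num_groups=2, test_batch_size=4),
    16: dict(num_image_per_prompt=8, num_groups=4, test_batch_size=8),
}

def resolve_gpu_config(total_gpus: int) -> dict:
    """Pick the preset for the total GPU count, defaulting to the 8-GPU entry."""
    return GPU_CONFIGS.get(total_gpus, GPU_CONFIGS[8])
```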

Multi-node setup: If your cluster cannot resolve MASTER_ADDR automatically, use a shared filesystem for node discovery. Run on each node, setting RANK=0 on master and RANK=1,2,... on workers:

```shell
export RANK=<node_rank>       # 0 for master, 1, 2, ... for workers
export WORLD_SIZE=<num_nodes> # 2→16 GPUs (2×8), 3→24 GPUs (3×8), 6→48 GPUs (6×8)
export MASTER_PORT=29500
CONFIG_NAME=<config_name>

torchrun --nproc_per_node=8 --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    scripts/train_nft_wan.py \
    --config configs/nft_<model>.py:${CONFIG_NAME}
```
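If you rely on the shared-filesystem node discovery described above, a minimal sketch could look like this; the helper name and the idea of polling a file on a shared mount are illustrative, not part of the repo:

```python
# Hypothetical helper: rank 0 publishes its hostname to a file on a shared
# mount; workers poll until it appears, then use it as MASTER_ADDR.
import socket
import time
from pathlib import Path

def publish_or_wait_master_addr(rank: int, addr_file: Path,
                                timeout: float = 300.0) -> str:
    """Rank 0 writes its hostname; all ranks return the discovered address."""
    if rank == 0:
        addr_file.write_text(socket.gethostname())
    deadline = time.time() + timeout
    while not addr_file.exists():
        if time.time() > deadline:
            raise TimeoutError("master address file never appeared")
        time.sleep(1.0)
    return addr_file.read_text().strip()

# e.g. export MASTER_ADDR from this value before invoking torchrun
```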

Multi-reward: Replace hpsv3 with multi_reward in any config name to enable the full multi-reward objective (HPSv3 + Motion Quality).


LongLive

💡 LoRA Initialization (recommended for LongLive): LongLive ships a pretrained LoRA adapter (checkpoints/longlive_models/models/lora.pt) that can be used to warm-start training. Simply append _with_lora_init to any LongLive config name to enable it — the adapter is loaded before RL training begins and typically leads to faster convergence.
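Conceptually, warm-starting from a LoRA adapter merges a low-rank update into each frozen base weight, W + (alpha/r) * B @ A. The sketch below illustrates that arithmetic only; the names, shapes, and `apply_lora` helper are illustrative, not the repo's actual loading code:

```python
import torch

def apply_lora(base_weight: torch.Tensor,
               lora_A: torch.Tensor,  # shape (r, in_features)
               lora_B: torch.Tensor,  # shape (out_features, r)
               alpha: float, r: int) -> torch.Tensor:
    """Return the effective weight after merging a LoRA adapter."""
    return base_weight + (alpha / r) * (lora_B @ lora_A)

W = torch.zeros(4, 4)
A = torch.ones(2, 4)  # rank r = 2
B = torch.ones(4, 2)
W_merged = apply_lora(W, A, B, alpha=2.0, r=2)  # every entry = (2/2) * 2 = 2.0
```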

<details> <summary><b>Single Node (8× GPU)</b></summary>

```shell
# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_hpsv3

# HPSv3 reward — with LoRA init
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_hpsv3_with_lora_init

# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_longlive.py:longlive_video_multi_reward
```

</details> <details> <summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>

| Scale | HPSv3 Config | HPSv3 + LoRA Init | Multi-Reward Config |
|---|---|---|---|
| 16× GPU | `longlive_video_hpsv3_16gpu` | `longlive_video_hpsv3_with_lora_init_16gpu` | `longlive_video_multi_reward_16gpu` |
| 24× GPU | `longlive_video_hpsv3_24gpu` | `longlive_video_hpsv3_with_lora_init_24gpu` | `longlive_video_multi_reward_24gpu` |
| 48× GPU | `longlive_video_hpsv3_48gpu` | `longlive_video_hpsv3_with_lora_init_48gpu` | `longlive_video_multi_reward_48gpu` |

</details>

Self-Forcing

<details> <summary><b>Single Node (8× GPU)</b></summary>

```shell
# HPSv3 reward
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_self_forcing.py:self_forcing_video_hpsv3

# Multi-reward (HPSv3 + Motion Quality)
torchrun --nproc_per_node=8 scripts/train_nft_wan.py \
    --config configs/nft_self_forcing.py:self_forcing_video_multi_reward
```

</details> <details> <summary><b>Multi-Node (16× / 24× / 48× GPU)</b></summary>

| Scale | HPSv3 Config | Multi-Reward Config |
|---|---|---|
| 16× GPU | `self_forcing_video_hpsv3_16gpu` | `self_forcing_video_multi_reward_16gpu` |
| 24× GPU | `self_forcing_video_hpsv3_24gpu` | `self_forcing_video_multi_reward_24gpu` |
| 48× GPU | `self_forcing_video_hpsv3_48gpu` | `self_forcing_video_multi_reward_48gpu` |

</details>
