
# Astra<img src="./assets/images/logo.png" alt="logo" style="height: 1em; vertical-align: baseline; margin: 0 0.1em;">: General Interactive World Model with Autoregressive Denoising (ICLR 2026)

<div align="center"> <div style="margin-top: 0; margin-bottom: -20px;"> <img src="./assets/images/logo-text-2.png" width="50%" /> </div> <h3 style="margin-top: 0;"> 📄 [<a href="https://arxiv.org/pdf/2512.08931" target="_blank">arXiv</a>] &nbsp;&nbsp; 🏠 [<a href="https://eternalevan.github.io/Astra-project/" target="_blank">Project Page</a>] &nbsp;&nbsp; 🤗 [<a href="https://huggingface.co/EvanEternal/Astra" target="_blank">Huggingface</a>] </h3> </div> <div align="center">

Yixuan Zhu<sup>1</sup>, Jiaqi Feng<sup>1</sup>, Wenzhao Zheng<sup>1 †</sup>, Yuan Gao<sup>2</sup>, Xin Tao<sup>2</sup>, Pengfei Wan<sup>2</sup>, Jie Zhou <sup>1</sup>, Jiwen Lu<sup>1</sup>

<!-- <br> -->

(† Project leader)

<sup>1</sup>Tsinghua University, <sup>2</sup>Kuaishou Technology.

</div>

## 📖 Introduction

TL;DR: Astra is an interactive world model that delivers realistic long-horizon video rollouts under a wide range of scenarios and action inputs.

Astra is an interactive, action-driven world model that predicts long-horizon future videos across diverse real-world scenarios. Built on an autoregressive diffusion transformer with temporal causal attention, Astra supports streaming prediction while preserving strong temporal coherence. Astra introduces noise-augmented history memory to stabilize long rollouts, an action-aware adapter for precise control signals, and a mixture of action experts to route heterogeneous action modalities. Through these key innovations, Astra delivers consistent, controllable, and high-fidelity video futures for applications such as autonomous driving, robot manipulation, and camera motion.

<div align="center"> <img src="./assets/images/pipeline.png" alt="Astra Pipeline" width="90%"> </div>
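The autoregressive rollout with noise-augmented history can be pictured with a minimal conceptual sketch. This is not Astra's actual implementation: `denoise_step` stands in for the diffusion transformer, actions are passed as a plain argument rather than through the action-aware adapter, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_history(frames, sigma=0.1):
    """Noise-augmented history memory (sketch): perturbing past frames
    during conditioning reduces error accumulation over long rollouts."""
    return [f + sigma * rng.standard_normal(f.shape) for f in frames]

def rollout(denoise_step, first_frame, actions, window=8):
    """Autoregressive rollout: each new frame is denoised conditioned on
    a sliding window of (noise-augmented) past frames plus the next action.
    `denoise_step` is a placeholder for the diffusion transformer."""
    frames = [first_frame]
    for a in actions:
        context = noisy_history(frames[-window:])
        frames.append(denoise_step(context, a))
    return frames
```

The sliding window mirrors the streaming prediction described above: temporal causal attention only ever sees past frames, so generation can continue indefinitely at constant memory.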

## Gallery

### Astra+Wan2.1

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;"> <tr> <td> <video src="https://github.com/user-attachments/assets/715a5b66-3966-4923-aa00-02315fb07761" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/1451947e-1851-4b57-a666-a44ffea7b10c" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/c7156c4d-d51d-493c-995e-5113c3d49abb" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/f7550916-e224-497a-b0b9-84479607c962" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> </tr> <tr> <td> <video src="https://github.com/user-attachments/assets/d899d704-c706-4e64-a24b-eea174d2173d" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/c1d8beb2-3102-468a-8019-624d89fba125" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/2aabc10b-f945-4d9d-b24a-baed17fcfe14" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> <td> <video src="https://github.com/user-attachments/assets/5c03e6ae-0fc2-4e09-a5b5-f37d04e7bbf8" style="width:100%; height:180px; object-fit:cover;" controls autoplay loop muted></video> </td> </tr> </table>

## 🔥 News

  • [2026.1.26]: Our paper has been accepted to ICLR 2026! 🎉
  • [2025.12.09]: Released the inference code and model checkpoint.
  • [2025.11.17]: Released the project page.

## 🎯 TODO List

  • [x] Release dataset preprocessing tools

  • [ ] Release full inference pipelines for additional scenarios:

    • [ ] 🚗 Autonomous driving
    • [ ] 🤖 Robotic manipulation
    • [ ] 🛸 Drone navigation / exploration
  • [ ] Open-source training scripts:

    • [x] ⬆️ Action-conditioned autoregressive denoising training
    • [ ] 🔄 Multi-scenario joint training pipeline
  • [ ] Provide unified evaluation toolkit


## ⚙️ Run Astra (Inference and Training)

Astra is built upon Wan2.1-1.3B, a diffusion-based video generation model. We provide inference scripts to help you quickly generate videos from images and action inputs. Follow the steps below:

### Inference

#### Step 1: Set up the environment

DiffSynth-Studio requires Rust and Cargo to compile extensions. You can install them using the following command:

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"
```

Then clone and install Astra (built on DiffSynth-Studio):

```bash
git clone https://github.com/EternalEvan/Astra.git
cd Astra
pip install -e .
```

#### Step 2: Download the pretrained checkpoints

1. Download the pre-trained Wan2.1 models:

```bash
cd script
python download_wan2.1.py
```

2. Download the pre-trained Astra checkpoint:

Please download it from [Hugging Face](https://huggingface.co/EvanEternal/Astra) and place it in `models/Astra/checkpoints`.
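As a convenience, the expected checkpoint location can be scripted. The commented `snapshot_download` call is a hedged sketch using the `huggingface_hub` library; the repo id matches the Hugging Face link above, but verify the exact file layout on the model page before relying on it.

```python
from pathlib import Path

def astra_ckpt_path(root: str = "models") -> Path:
    """Expected checkpoint location (matches --dit_path in Step 3)."""
    return Path(root) / "Astra" / "checkpoints" / "diffusion_pytorch_model.ckpt"

# To fetch the weights (requires `pip install huggingface_hub` and network access):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="EvanEternal/Astra",
#                   local_dir=str(astra_ckpt_path().parent))
```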

#### Step 3: Test the example image

```bash
python infer_demo.py \
  --dit_path ../models/Astra/checkpoints/diffusion_pytorch_model.ckpt \
  --wan_model_path ../models/Wan-AI/Wan2.1-T2V-1.3B \
  --condition_image ../examples/condition_images/garden_1.png \
  --cam_type 4 \
  --prompt "A sunlit European street lined with historic buildings and vibrant greenery creates a warm, charming, and inviting atmosphere. The scene shows a picturesque open square paved with red bricks, surrounded by classic narrow townhouses featuring tall windows, gabled roofs, and dark-painted facades. On the right side, a lush arrangement of potted plants and blooming flowers adds rich color and texture to the foreground. A vintage-style streetlamp stands prominently near the center-right, contributing to the timeless character of the street. Mature trees frame the background, their leaves glowing in the warm afternoon sunlight. Bicycles are visible along the edges of the buildings, reinforcing the urban yet leisurely feel. The sky is bright blue with scattered clouds, and soft sun flares enter the frame from the left, enhancing the scene's inviting, peaceful mood." \
  --output_path ../examples/output_videos/output_moe_framepack_sliding.mp4
```

Inference can be run on a single 24 GB GPU, such as the NVIDIA RTX 3090.

#### Step 4: Test your own images

To test with your own custom images, prepare the target images and their corresponding text prompts. We recommend input images close to 832×480 (width × height, roughly 16:9), which matches the resolution of the generated video and helps achieve better generation quality. For prompt generation, you can refer to the Prompt Extension section of Wan2.1 for guidance on crafting captions.

```bash
python infer_demo.py \
  --dit_path path/to/your/dit_ckpt \
```
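The 832×480 recommendation can be met with a cover-resize followed by a center crop. Below is an illustrative stdlib-only helper for computing the geometry (pair it with `PIL.Image.resize` and `crop`); it is not part of the Astra codebase.

```python
def fit_to_target(w: int, h: int, tw: int = 832, th: int = 480):
    """Return (resize_w, resize_h) and a center-crop box that scale an
    image to cover tw x th while preserving aspect ratio, then crop to fit."""
    scale = max(tw / w, th / h)                 # cover, don't letterbox
    rw, rh = round(w * scale), round(h * scale)
    left, top = (rw - tw) // 2, (rh - th) // 2  # center the crop window
    return (rw, rh), (left, top, left + tw, top + th)
```

For a 1920×1080 input this resizes to 853×480 and trims ~10 px from each side; portrait inputs are cropped vertically instead.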
