InfinityStar
[NeurIPS 2025 Oral]Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Infinity**⭐️**: Unified SpaceTime AutoRegressive Modeling for Visual Generation
<div align="center"> </div> <p align="center" style="font-size: larger;"> <a href="http://arxiv.org/abs/2511.04675">Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation</a> </p> <!-- <p align="center"> <img src="assets/show_images.jpg" width=95%> <p> -->
🔥 Updates!!
- Nov 7, 2025: 🔥 Paper, training and inference code, checkpoints, and demo website released!
- Sep 18, 2025: 🎉 InfinityStar is accepted to NeurIPS 2025 as an Oral.
🕹️ Try and Play with Infinity⭐️!
We provide a demo website for you to play with InfinityStar and generate videos. Enjoy the fun of bitwise video autoregressive modeling!
✨ Overview
We introduce InfinityStar, a unified spacetime autoregressive framework for high-resolution image and dynamic video synthesis.
- 🧠 Unified Spacetime Model: A purely discrete, autoregressive approach that jointly captures spatial and temporal dependencies within a single, elegant architecture.
- 🎬 Versatile Generation: This unified design naturally supports a variety of generation tasks such as text-to-image, text-to-video, image-to-video, and long interactive video synthesis via straightforward temporal autoregression.
- 🏆 Leading Performance & Speed: InfinityStar scores 83.74 on VBench, outperforming all autoregressive models by large margins and even surpassing diffusion competitors such as HunyuanVideo, while running approximately 10x faster than leading diffusion-based methods.
- 📖 Pioneering High-Resolution Autoregressive Generation: To our knowledge, InfinityStar is the first discrete autoregressive video generator capable of producing industrial-level 720p videos, setting a new standard for quality in its class.
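As a rough illustration of what "temporal autoregression" means for long video synthesis, the toy sketch below generates a video clip by clip, conditioning each new clip on the last few frames produced so far and on the current prompt. It is not InfinityStar's implementation: `toy_generator`, `generate_long_video`, and the `context_frames` parameter are hypothetical names, and integers stand in for real frames.

```python
from typing import Callable, List, Sequence

Frame = int  # stand-in for a real frame tensor (assumption for this sketch)
ClipGenerator = Callable[[str, Sequence[Frame]], List[Frame]]


def toy_generator(prompt: str, context: Sequence[Frame]) -> List[Frame]:
    """Hypothetical stand-in for a model: continues counting after the last context frame."""
    start = context[-1] + 1 if context else 0
    return [start + i for i in range(4)]


def generate_long_video(
    prompts: Sequence[str],
    generate_clip: ClipGenerator,
    context_frames: int = 2,
) -> List[Frame]:
    """Generate one clip per prompt, each conditioned on the tail of the video so far."""
    video: List[Frame] = []
    for prompt in prompts:
        # Only the most recent frames are passed back in as conditioning.
        context = video[-context_frames:] if context_frames > 0 else []
        video.extend(generate_clip(prompt, context))
    return video


video = generate_long_video(["a cat walks", "the cat jumps"], toy_generator)
print(video)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Swapping `toy_generator` for a real model call keeps the loop unchanged, which is the sense in which long or interactive generation reduces to repeated short-clip generation in this sketch.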
🔥 Unified modeling for image, video generation and long interactive video synthesis 📈:
<div align="left"> <img src="assets/framework.png" alt="" style="width: 100%;" /> </div>
🎬 Video Demos
General Aesthetics
<div align="left"> <video src="https://github.com/user-attachments/assets/14e2b18b-9234-42ce-bdab-670faeef4b2a" width="100%" controls autoplay loop></video> </div>
Anime & 3D Animation
<div align="left"> <video src="https://github.com/user-attachments/assets/478e9571-b550-4c23-a567-6fee9a0afb5b" width="100%" controls autoplay loop></video> </div>
Motion
<div align="left"> <video src="https://github.com/user-attachments/assets/adab669b-d38f-4607-9a52-32d8d0bf0e53" width="100%" controls autoplay loop></video> </div>
Extended Application: Long Interactive Videos
<div align="center"> <video src="https://github.com/user-attachments/assets/411666a6-563d-4551-a3f8-dc5de00436c1" width="100%" controls autoplay loop></video> </div>
Benchmark
InfinityStar achieves SOTA performance on image generation benchmarks:
<div align="left"> <img src="assets/Infinitystar_image_gen_benchmark.png" alt="Image Generation Evaluation" style="width: 100%;" /> </div>
InfinityStar achieves SOTA performance on video generation benchmarks:
<div align="left"> <img src="assets/Infinitystar_videogen_benchmark.png" alt="" style="width: 100%;" /> </div>
In human evaluation, InfinityStar surpasses diffusion competitors like HunyuanVideo*:
<div align="left"> <img src="assets/Infinitystar_videogen_humaneval.png" alt="" style="width: 100%;" /> </div>
Visualization
Text to image examples
<div align="left"> <img src="assets/supp_show_images.png" alt="Text to Image Examples" style="width: 100%;" /> </div>
Image to video examples
<div align="left"> <img src="assets/i2v_examples.png" alt="Image to Video Examples" style="width: 100%;" /> </div>
Video extrapolation examples
<div align="left"> <img src="assets/v2v_examples.png" alt="Video Extrapolation Examples" style="width: 100%;" /> </div>
📑 Open-Source Plan
- [x] Training Code
- [x] Web Demo
- [x] InfinityStar Inference Code
- [x] InfinityStar Models Checkpoints
- [x] InfinityStar-Interact Inference Code
- [x] InfinityStar-Interact Checkpoints
Installation
- We use FlexAttention to speed up training, which requires `torch>=2.5.1`.
- Install other pip packages via `pip3 install -r requirements.txt`.
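Before training, it can help to confirm that the installed PyTorch meets the `torch>=2.5.1` requirement. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `meets_minimum`, `installed_torch_ok`) are hypothetical and not part of this repository.

```python
import re
from importlib import metadata

MIN_TORCH = (2, 5, 1)  # minimum version stated in the installation notes


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '2.5.1+cu121' -> (2, 5, 1).

    Pre-release and local suffixes (rc1, +cu121, ...) are ignored in this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Compare a version string against the minimum release tuple."""
    release = parse_release(version)
    # Pad short versions so that '2.6' compares as (2, 6, 0).
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


def installed_torch_ok() -> bool:
    """Return True if torch is installed and new enough, False otherwise."""
    try:
        return meets_minimum(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("torch >= 2.5.1:", installed_torch_ok())
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and also works when no GPU is visible.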
Training Scripts
We provide a comprehensive workflow for training and finetuning our model, covering data organization, feature extraction, and training scripts. For detailed instructions, please refer to data/README.md.
Inference
- 720p Video Generation: Use `tools/infer_video_720p.py` to generate 5-second videos at 720p resolution. Due to the high computational cost of training, our released 720p model is trained for 5-second video generation. This script also supports image-to-video generation by specifying an image path. Run it with `python3 tools/infer_video_720p.py`.
- 480p Variable-Length Video Generation: We also provide an intermediate checkpoint for 480p resolution, capable of generating videos of 5 and 10 seconds. Since this model is not specifically optimized for Text-to-Video (T2V), we recommend using the experimental Image-to-Video (I2V) and Video-to-Video (V2V) modes for better results. To specify the video duration, edit the `generation_duration` variable in `tools/infer_video_480p.py` to either 5 or 10. This script also supports image-to-video and video continuation by providing a path to an image or a video. Run it with `python3 tools/infer_video_480p.py`.
- 480p Long Interactive Video Generation: Use `tools/infer_interact_480p.py` to generate a long interactive video at 480p. You can provide a reference video and multiple prompts, and the model generates the video interactively with your input. Run it with `python3 tools/infer_interact_480p.py`.
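If you prefer to launch the three scripts from Python, for example inside a batch job, a thin wrapper like the sketch below may be convenient. The script paths come from the list above; the short mode names (`"720p"`, `"480p"`, `"interact"`) and the helper functions are hypothetical. No command-line flags are passed, since settings such as `generation_duration` are edited inside the scripts themselves.

```python
import subprocess
import sys
from pathlib import Path

# Script paths as listed above; the dictionary keys are hypothetical labels.
SCRIPTS = {
    "720p": "tools/infer_video_720p.py",
    "480p": "tools/infer_video_480p.py",
    "interact": "tools/infer_interact_480p.py",
}


def build_command(mode: str) -> list:
    """Return the command that runs the inference script for the given mode."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return [sys.executable, SCRIPTS[mode]]


def run_inference(mode: str, repo_root: str = ".") -> int:
    """Run the script from the repository root and return its exit code."""
    command = build_command(mode)
    script = Path(repo_root) / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is repo_root the repository root?")
    return subprocess.run(command, cwd=repo_root, check=False).returncode


if __name__ == "__main__":
    # Prints the command only; call run_inference("720p") to execute it.
    print(build_command("720p"))
```

Using `sys.executable` rather than a bare `python3` keeps the child process in the same virtual environment as the caller.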
Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@Article{VAR,
title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
year={2024},
eprint={2404.02905},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{Infinity,
title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
year={2024},
eprint={2412.04431},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.04431},
}
@misc{InfinityStar,
title={InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation},
author={Jinlai Liu and Jian Han and Bin Yan and Hui Wu and Fengda Zhu and Xing Wang and Yi Jiang and Bingyue Peng and Zehuan Yuan},
year={2025},
eprint={2511.04675},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.04675},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
