InfiniteVGGT

The official implementation of InfiniteVGGT

<h1 align="left">
  <img src="assets/InfiniteVGGT_Logo.jpg" alt="Logo" height="40px" style="vertical-align: middle;">
  <span style="vertical-align: middle;">InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams</span>
</h1>
<div align="center">
  <p>
    <a><img src="assets/autolab_logo.png" alt="Autolab Logo" height="50" align="middle"></a>&nbsp;&nbsp;
    <a href="https://github.com/Henryyuan429">Shuai Yuan,</a><sup>1</sup>&nbsp;&nbsp;
    <a href="https://github.com/YantaiYang-05">Yantai Yang,</a><sup>1, 2</sup>&nbsp;&nbsp;
    <a>Xiaotian Yang,</a><sup>1</sup>&nbsp;&nbsp;
    <a>Xupeng Zhang,</a><sup>1</sup>&nbsp;&nbsp;
    <br>
    <a>Zhonghao Zhao,</a><sup>1</sup>&nbsp;&nbsp;
    <a>Lingming Zhang,</a><sup></sup>&nbsp;&nbsp;
    <a href="https://zhipengzhang.cn/">Zhipeng Zhang</a><sup>1 ✉</sup>&nbsp;&nbsp;
  </p>
  <p>
    <sup>1</sup><a>AutoLab, School of Artificial Intelligence, Shanghai Jiao Tong University</a>&nbsp;&nbsp;
    <br>
    <sup>2</sup><a>Anyverse Dynamics</a>
  </p>
  <p>
    <sup>✉</sup> Corresponding Author
  </p>
</div>
<p align="center">
  <a href="https://arxiv.org/abs/2601.02281v1"><img src="https://img.shields.io/badge/arXiv-InfiniteVGGT-red?logo=arxiv" alt="Paper PDF"></a>
  <a href="https://huggingface.co/papers/2601.02281"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging_Face-InfiniteVGGT-yellow" alt="Hugging Face"></a>
</p>
<p align="center">
  <img src="assets/InfiniteVGGT.gif" width="70%">
</p>
<p>
  <i>Achieving higher reconstruction quality and more accurate camera pose estimation with thousands of input frames.</i>
</p>

📰 News

  • [Jan 6, 2026] Paper release.
  • [Jan 6, 2026] Code release.
  • [Jan 19, 2026] Long3D dataset release.

🔍 Recommendation

  • You are welcome to check out our previous collaborative work, FastVGGT.

📖 Overview

We propose InfiniteVGGT, a causal visual geometry transformer that uses a training-free rolling memory mechanism to enable stable, infinite-horizon streaming. We also introduce the Long3D benchmark to rigorously evaluate continuous 3D geometry performance over long sequences. Our main contributions are summarized as follows:

  1. InfiniteVGGT, an unbounded-memory architecture for continuous 3D geometry understanding, built on a novel, dynamic, and interpretable explicit memory system.
  2. State-of-the-art performance on long-sequence benchmarks and a unique capability for robust, infinite-horizon reconstruction without memory overflow.
  3. The Long3D benchmark, a new dataset for the rigorous evaluation of long-term performance, addressing a critical gap in the field.
<div align="center"> <a> <img src="assets/method.png" width="90%"> </a> </div>
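Why a rolling memory keeps streaming stable can be illustrated with a minimal, hypothetical sketch. It is not the InfiniteVGGT implementation: the class name RollingFrameMemory, the pinned first frame, and the plain first-in-first-out eviction are assumptions chosen for illustration (see the paper for the actual training-free mechanism). The point it shows is that a fixed memory budget keeps the context each new frame attends to constant in size, however long the stream runs.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class RollingFrameMemory:
    """Hypothetical bounded memory of per-frame tokens (illustration only).

    The first frame is pinned as a reference anchor; later frames live in a
    FIFO window, so the number of stored frames never exceeds max_frames.
    """

    def __init__(self, max_frames: int) -> None:
        if max_frames < 2:
            raise ValueError("max_frames must be >= 2 (anchor + one recent frame)")
        self.max_frames = max_frames
        self.anchor: Optional[Sequence[float]] = None
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.recent: Deque[Sequence[float]] = deque(maxlen=max_frames - 1)

    def add(self, frame_tokens: Sequence[float]) -> None:
        if self.anchor is None:
            self.anchor = frame_tokens
        else:
            self.recent.append(frame_tokens)

    def context(self) -> List[Sequence[float]]:
        """Frames a new frame would attend to: anchor first, then recent ones."""
        head = [self.anchor] if self.anchor is not None else []
        return head + list(self.recent)


# Stream 1000 dummy "frames"; the memory never grows past 8 entries.
memory = RollingFrameMemory(max_frames=8)
for t in range(1000):
    memory.add([float(t)])
print(len(memory.context()))                        # 8
print(memory.context()[0], memory.context()[-1])    # [0.0] [999.0]
```

Swapping the FIFO window for a smarter selection rule changes which frames survive, but not the bounded cost, which is the property that allows infinite-horizon streaming without memory overflow.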

🌍 Installation

  1. Clone InfiniteVGGT:
```bash
git clone https://github.com/AutoLab-SAI-SJTU/InfiniteVGGT.git
cd InfiniteVGGT
```
  2. Create the conda environment:
```bash
conda create -n infinitevggt python=3.11 cmake=3.14.0
conda activate infinitevggt
```
  3. Install the requirements:
```bash
pip install -r requirements.txt
conda install 'llvm-openmp<16'
```
  4. Download the StreamVGGT pretrained checkpoint and place it in the ./ckpt directory.
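Before running inference, a quick sanity check of the setup can save a failed run. The sketch below is a hypothetical helper (check_setup is not part of the repository): it checks that the interpreter matches the Python 3.11 pinned by the conda environment and that ./ckpt exists and contains at least one file. The README does not name the checkpoint file, so the sketch deliberately does not look for a specific file name.

```python
import sys
from pathlib import Path
from typing import List


def check_setup(repo_root: str = ".") -> List[str]:
    """Return a list of setup problems; an empty list means the checks passed."""
    problems: List[str] = []
    # The conda environment created above pins Python 3.11.
    if sys.version_info[:2] != (3, 11):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"expected Python 3.11, found {found}")
    ckpt_dir = Path(repo_root) / "ckpt"
    # The checkpoint file name is not specified, so only check that one exists.
    if not ckpt_dir.is_dir():
        problems.append(f"missing checkpoint directory: {ckpt_dir}")
    elif not any(p.is_file() for p in ckpt_dir.iterdir()):
        problems.append(f"no checkpoint file found in {ckpt_dir}")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("\n".join(issues) if issues else "setup looks OK")
```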

▶️ Run Inference

```bash
# Run on your own data
python run_inference.py --input_dir path/to/your/images_dir

# Run on a long sequence and save the results for each frame to a directory
python run_inference.py \
    --input_dir path/to/your/images_dir \
    --frame_cache_dir path/to/your/results_perframe_dir \
    --no_cache_results
```
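To process several sequences in one go, the commands above can be scripted. This is a minimal sketch, assuming it is run from the repository root and that each sequence sits in its own subdirectory of a hypothetical sequences_root; build_inference_cmd and run_all are illustrative helpers, and only the flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_inference_cmd(input_dir: str, frame_cache_dir: Optional[str] = None) -> List[str]:
    """Build a run_inference.py command line using only the documented flags."""
    # sys.executable runs the script with the currently active interpreter.
    cmd = [sys.executable, "run_inference.py", "--input_dir", input_dir]
    if frame_cache_dir is not None:
        # Mirrors the README's long-sequence example (per-frame results on disk).
        cmd += ["--frame_cache_dir", frame_cache_dir, "--no_cache_results"]
    return cmd


def run_all(sequences_root: str, results_root: str) -> None:
    """Run inference on every subdirectory of sequences_root (hypothetical layout)."""
    seq_dirs = sorted(p for p in Path(sequences_root).iterdir() if p.is_dir())
    for seq_dir in seq_dirs:
        out_dir = Path(results_root) / seq_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch on the first failing sequence.
        subprocess.run(build_inference_cmd(str(seq_dir), str(out_dir)), check=True)
```

For example, run_all("path/to/sequences", "path/to/results") writes each sequence's per-frame results to its own folder under path/to/results.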

🚀 Run Demo

We provide demo code based on the NRGBD dataset. You can run it using the following command:

```bash
# --gt_path is optional
python demo_viser.py \
    --seq_path path/to/nrgbd/image_sequence \
    --frame_interval 10 \
    --gt_path path/to/nrgbd/gt_camera
```
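Assuming --frame_interval 10 keeps every 10th image of the sequence (our reading of the flag name; check demo_viser.py to confirm), the selection can be sketched as below. The helpers natural_key, list_frames, and subsample are hypothetical; natural_key sorts names such as img2.png before img10.png, which plain string sorting would not.

```python
import re
from pathlib import Path
from typing import List, Sequence, Union

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def natural_key(name: str) -> List[Union[int, str]]:
    """Split 'img10.png' into ['img', 10, '.png'] so numbers compare numerically."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def list_frames(seq_path: str) -> List[str]:
    """Image paths in a sequence directory, in natural filename order."""
    images = [p for p in Path(seq_path).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    return [str(p) for p in sorted(images, key=lambda p: natural_key(p.name))]


def subsample(frames: Sequence[str], interval: int) -> List[str]:
    """Keep every interval-th frame, starting with the first one."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(frames[::interval])


frames = [f"frame_{i:05d}.png" for i in range(95)]
kept = subsample(frames, 10)
print(len(kept), kept[0], kept[-1])  # 10 frame_00000.png frame_00090.png
```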

🧊 Long3D Dataset

The Long3D dataset is a benchmark designed for long-sequence 3D scene reconstruction. It provides 10 Hz image streams paired with dense ground-truth point clouds.
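At 10 Hz, consecutive frames are 0.1 s apart, so frame indices map directly to stream time. The sketch below works through that arithmetic, assuming perfectly uniform capture (real timestamps may jitter slightly); frame_timestamp and sequence_duration are hypothetical helpers.

```python
RATE_HZ = 10.0  # Long3D image streams are captured at 10 Hz


def frame_timestamp(index: int, rate_hz: float = RATE_HZ) -> float:
    """Seconds since the first frame, assuming a uniform capture rate."""
    return index / rate_hz


def sequence_duration(num_frames: int, rate_hz: float = RATE_HZ) -> float:
    """Time spanned from the first to the last of num_frames frames, in seconds."""
    return max(num_frames - 1, 0) / rate_hz


print(frame_timestamp(25))      # 2.5
print(sequence_duration(6001))  # 600.0 (ten minutes of streaming)
```

Keeping only every k-th frame lowers the effective rate to rate_hz / k; with k = 10 that is one frame per second of capture.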

📊 Data Description

| File Name | Description |
| :--- | :--- |
| image.7z | Continuous image stream data captured at a frequency of 10 Hz. |
| dense_cloud_map.pcd | Global ground-truth point cloud, acquired with a 3D spatial scanner. |
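dense_cloud_map.pcd uses the PCD (Point Cloud Data) format from the Point Cloud Library. Before loading a large map, it can help to inspect its header (fields, point count, and whether the payload is ascii or binary). Below is a minimal, stdlib-only sketch, assuming the file follows the PCD v0.7 header layout (one KEY value... line per entry, ending with a DATA line); read_pcd_header is a hypothetical helper, not part of this repository. To load the points themselves, a library such as Open3D (open3d.io.read_point_cloud) is the simpler route.

```python
import io
from typing import BinaryIO, Dict, List


def read_pcd_header(f: BinaryIO) -> Dict[str, List[str]]:
    """Parse a PCD header up to and including the DATA line.

    On return, f is positioned at the start of the point payload.
    """
    header: Dict[str, List[str]] = {}
    while True:
        raw = f.readline()
        if not raw:
            raise ValueError("reached end of file before the DATA line")
        line = raw.decode("ascii").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, *values = line.split()
        header[key] = values
        if key == "DATA":  # ascii/binary payload starts right after this line
            return header


sample = (
    b"# .PCD v0.7 - Point Cloud Data file format\n"
    b"VERSION 0.7\n"
    b"FIELDS x y z\n"
    b"SIZE 4 4 4\n"
    b"TYPE F F F\n"
    b"COUNT 1 1 1\n"
    b"WIDTH 2\n"
    b"HEIGHT 1\n"
    b"VIEWPOINT 0 0 0 1 0 0 0\n"
    b"POINTS 2\n"
    b"DATA ascii\n"
    b"0.0 0.0 0.0\n"
    b"1.0 2.0 3.0\n"
)
stream = io.BytesIO(sample)
hdr = read_pcd_header(stream)
print(hdr["FIELDS"], int(hdr["POINTS"][0]), hdr["DATA"][0])  # ['x', 'y', 'z'] 2 ascii
```

For the real file, open it in binary mode (open(path, "rb")) and pass the handle to read_pcd_header.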


📥 Download Instructions

Option 1: Hugging Face CLI

The most efficient way to download the dataset is with the Hugging Face Hub CLI (hf). Make sure the library is installed (pip install -U huggingface_hub).

```bash
# Optional: uncomment to download through a mirror endpoint
# export HF_ENDPOINT=https://hf-mirror.com
hf download --repo-type dataset \
    --resume-download AutoLab-SJTU/Long3D \
    --local-dir ./Long3D
```
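The same download can be done from Python with huggingface_hub.snapshot_download, which is part of the library's public API. In this minimal sketch, download_long3d is a hypothetical wrapper; the repository id and local directory are taken from the CLI command above. The mirror endpoint is set before importing huggingface_hub because the library reads HF_ENDPOINT when it is first imported.

```python
import os
from typing import Optional


def download_long3d(local_dir: str = "./Long3D", endpoint: Optional[str] = None) -> str:
    """Download the Long3D dataset repository; returns the local folder path."""
    if endpoint is not None:
        # Takes effect only if huggingface_hub has not been imported yet in this
        # process, since the library reads HF_ENDPOINT at import time.
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(
        repo_id="AutoLab-SJTU/Long3D",
        repo_type="dataset",
        local_dir=local_dir,
    )
```

For example, download_long3d(endpoint="https://hf-mirror.com") routes the download through the mirror mentioned in the CLI example.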

Option 2: Manual access

Alternatively, you can browse and download files directly from the Long3D dataset page on Hugging Face (https://huggingface.co/datasets/AutoLab-SJTU/Long3D).

📋 Checklist

  • [x] Release the Long3D dataset.

🙏 Acknowledgement

We would like to acknowledge the following open-source projects that served as a foundation for our implementation:

DUSt3R, CUT3R, VGGT, Point3R, StreamVGGT, FastVGGT, and TTT3R.

Many thanks to these authors!

📜 Citation

If you incorporate our work into your research, please cite:

```bibtex
@article{yuan2026infinitevggt,
  title={InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams},
  author={Shuai Yuan and Yantai Yang and Xiaotian Yang and Xupeng Zhang and Zhonghao Zhao and Lingming Zhang and Zhipeng Zhang},
  journal={arXiv preprint arXiv:2601.02281},
  year={2026}
}
```