Pvsm

Official code release for the PVSM paper: "From Rays to Projections: Better Inputs for Feed-Forward View Synthesis"

<div align="center">
<p align="center"> <img src="assets/logo.png" alt="PVSM Logo" width="90%"/> </p>
<p align="center"> <a href="https://wuzirui.github.io">Zirui Wu</a>, <a href="https://jzr99.github.io/">Zeren Jiang</a>, <a href="https://oswaldm.github.io/">Martin R. Oswald</a>, <a href="https://facultyprofiles.hkust-gz.edu.cn/faculty-personal-page/SONG-Jie/jsongroas">Jie Song</a> </p>
<div align="center"> <a href="https://wuzirui.github.io/pvsm-web"><strong>Project Page</strong></a> | <a href="https://arxiv.org/abs/2601.05116"><strong>arXiv</strong></a> | <a href="https://youtu.be/HSQnrXCU_BM"><strong>YouTube</strong></a> </div>

Preprint 2025

</div>

Installation

conda create -n pvsm python=3.11
conda activate pvsm
# Install torch and torchvision builds that match your environment (e.g. your CUDA version)
pip install -r requirements.txt

The current gsplat release (gsplat==1.5.3) has a known issue, so please install gsplat from source for now:

# Install gsplat from source
pip install git+https://github.com/nerfstudio-project/gsplat.git
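After installation, it can help to confirm which packages actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of this repository.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the packages mentioned in the installation steps above.
    for name in ("torch", "torchvision", "gsplat"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `gsplat` is reported as not installed, the source install above most likely failed and is worth re-running.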

Quick Start

Download Checkpoints

Download DINOv3-ViT-B and place it under metric_checkpoints/.

Download our pre-trained model checkpoints:

  1. 12-layer model (small): OneDrive
  2. 24-layer model (full): OneDrive

After downloading, organize your checkpoints directory as follows:

metric_checkpoints/
├── pvsm_finetuned_full.pt          # Our trained full 24-layer model
├── pvsm_finetuned_small.pt         # Our trained smaller 12-layer model
├── dinov3-vitb16-pretrain-lvd1689m # DINOv3 Checkpoint
│   ├── config.json
│   ├── LICENSE.md
│   ├── model.safetensors
│   ├── preprocessor_config.json
│   └── README.md
├── imagenet-vgg-verydeep-19.mat    # (Optional) for training
└── map-anything                    # (Optional) for dataset generation
    ├── config.json
    ├── model.safetensors
    └── README.md
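A quick way to catch a misplaced download is to check the layout above before launching anything. This is a sketch only: the function `missing_checkpoint_files` is hypothetical, and the choice of which files count as required (the model checkpoint plus three DINOv3 files, skipping the optional entries and the LICENSE/README files) is an assumption, not something the repository enforces.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the directory tree above.
MODEL_FILES = {
    "small": "pvsm_finetuned_small.pt",
    "full": "pvsm_finetuned_full.pt",
}
DINO_DIR = "dinov3-vitb16-pretrain-lvd1689m"
# Assumption: these are the DINOv3 files needed at load time.
DINO_FILES = ["config.json", "model.safetensors", "preprocessor_config.json"]


def missing_checkpoint_files(root: str | Path, model: str = "small") -> list[str]:
    """Return the expected files (relative to `root`) that do not exist."""
    root = Path(root)
    expected = [root / MODEL_FILES[model]]
    expected += [root / DINO_DIR / name for name in DINO_FILES]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("metric_checkpoints", model="small")
    print("All expected files found." if not missing else f"Missing: {missing}")
```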

Interactive Demo

For a quick interactive demo, follow the instructions to download the example data (22.3 MB) and unzip it on your local machine.

To launch the interactive web-based demo:

torchrun --nproc_per_node 1 --standalone viser_demo.py --config-name runs/pvsm_finetuned_small

The demo will start a web server. Open your browser and navigate to the displayed URL to interact with the model.

System Requirements:

  • Small model: ~2.5GB VRAM
  • Full model: ~3.0GB VRAM
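The VRAM figures above can be turned into a simple pre-flight check. The sketch below is illustrative: `models_that_fit` and `free_vram_gb` are hypothetical helpers, the thresholds are the approximate numbers quoted above, and the GPU query only runs if PyTorch with CUDA is available. `torch.cuda.mem_get_info` reports bytes, which are converted to GiB here, so treat the comparison as a rough guide.

```python
from __future__ import annotations

# Approximate requirements quoted above, in GB.
VRAM_GB = {"small": 2.5, "full": 3.0}


def models_that_fit(free_gb: float) -> list[str]:
    """Return the model variants whose quoted requirement is within `free_gb`."""
    return [name for name, need in VRAM_GB.items() if free_gb >= need]


def free_vram_gb() -> float | None:
    """Return free memory on the current CUDA device in GiB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3


if __name__ == "__main__":
    free = free_vram_gb()
    if free is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    else:
        print(f"Free VRAM: {free:.1f} GiB, candidates: {models_that_fit(free)}")
```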

Note: Renderings from gsplat in the demo are compressed, so their visual quality is reduced.

Running Inference

To run inference on a dataset:

python inference.py --config-name runs/pvsm_finetuned_small

Or for the full model:

python inference.py --config-name runs/pvsm_finetuned_full

Training

To train the model:

torchrun --nproc_per_node <num_gpus> train.py --config-name runs/pvsm_finetuned_small
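When launching from a script, the `<num_gpus>` placeholder has to be filled in programmatically. The sketch below assembles the same command as an argument list; `build_train_command` is a hypothetical helper, and passing any config other than `runs/pvsm_finetuned_small` to `train.py` is an assumption that the README does not confirm.

```python
from __future__ import annotations

import shlex


def build_train_command(
    num_gpus: int, config_name: str = "runs/pvsm_finetuned_small"
) -> list[str]:
    """Build the torchrun training command shown above as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        "--nproc_per_node", str(num_gpus),
        "train.py",
        "--config-name", config_name,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(build_train_command(4)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.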

Configuration:

  • Training configurations are located in configs/runs/
  • Model configurations are in configs/model/
  • Dataset configurations are in configs/dataset/

API Keys: Before training, create configs/api_keys.yaml with your WandB API key:

wandb: YOUR_WANDB_KEY

You can use configs/api_keys_example.yaml as a template.
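To avoid pasting the key by hand, the file can be generated from the `WANDB_API_KEY` environment variable that the WandB client itself recognizes. This is a minimal sketch: `write_api_keys` is a hypothetical helper, and it assumes the file needs only the single `wandb` field shown above; check configs/api_keys_example.yaml for any additional fields.

```python
from __future__ import annotations

import os
from pathlib import Path


def write_api_keys(path: str | Path = "configs/api_keys.yaml", key: str | None = None) -> Path:
    """Write a one-line api_keys.yaml using `key` or the WANDB_API_KEY env variable."""
    key = key or os.environ.get("WANDB_API_KEY")
    if not key:
        raise RuntimeError("Set WANDB_API_KEY or pass the key explicitly.")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same format as the example above: a single `wandb` entry.
    path.write_text(f"wandb: {key}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(f"Wrote {write_api_keys()}")
```

Since the resulting file contains a secret, it is best kept out of version control.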

Citation

If you find this work useful in your research, please consider citing:

@article{wu_pvsm_2026,
  title={From Rays to Projections: Better Inputs for Feed-Forward View Synthesis},
  author={Wu, Zirui and Jiang, Zeren and Oswald, Martin R. and Song, Jie},
  journal={arXiv preprint arXiv:2601.05116},
  year={2026}
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE.md for details.

Acknowledgement

This work is built upon LVSM's code base.
