<div align="center"> <p align="center"> <img src="assets/logo.png" alt="PVSM Logo" width="90%"/> </p> <p align="center"> <a href="https://wuzirui.github.io">Zirui Wu</a>, <a href="https://jzr99.github.io/">Zeren Jiang</a>, <a href="https://oswaldm.github.io/">Martin R. Oswald</a>, <a href="https://facultyprofiles.hkust-gz.edu.cn/faculty-personal-page/SONG-Jie/jsongroas">Jie Song</a> </p> <div align="center"> <a href="https://wuzirui.github.io/pvsm-web"><strong>Project Page</strong></a> | <a href="https://arxiv.org/abs/2601.05116"><strong>arXiv</strong></a> | <a href="https://youtu.be/HSQnrXCU_BM"><strong>YouTube</strong></a> </div>

Preprint 2025

</div>

Installation

conda create -n pvsm python=3.11
conda activate pvsm
# Install torch torchvision based on your environment configurations
pip install -r requirements.txt

The current release, gsplat==1.5.3, has a known issue, so please install gsplat from source for now:

# Install gsplat from source
pip install git+https://github.com/nerfstudio-project/gsplat.git
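Once the install steps finish, a quick check can confirm that the main packages are visible to the interpreter. This is a minimal sketch and not part of this repository: the helper name `missing_packages` is hypothetical, and the list of import names (`torch`, `torchvision`, `gsplat`) is an assumption about what you want to verify.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; extend the list with anything else you rely on.
    missing = missing_packages(["torch", "torchvision", "gsplat"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages can be located.")
```

The check only locates the packages without importing them, so it will not catch a broken CUDA build; importing `torch` and `gsplat` in the `pvsm` environment remains the more thorough test.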

Quick Start

Download Checkpoints

Download DINOv3-ViT-B and place it under metric_checkpoints/.

Download our pre-trained model checkpoints:

12-layer model (small): OneDrive
24-layer model (full): OneDrive

After downloading, organize your checkpoints directory as follows:

metric_checkpoints/
├── pvsm_finetuned_full.pt          # Our trained full 24-layer model
├── pvsm_finetuned_small.pt         # Our trained smaller 12-layer model
├── dinov3-vitb16-pretrain-lvd1689m # DINOv3 Checkpoint
│   ├── config.json
│   ├── LICENSE.md
│   ├── model.safetensors
│   ├── preprocessor_config.json
│   └── README.md
├── imagenet-vgg-verydeep-19.mat    # (Optional) for training
└── map-anything                    # (Optional) for dataset generation
    ├── config.json
    ├── model.safetensors
    └── README.md
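Before launching the demo or inference, it can help to confirm that the files above are where the layout expects them. The following is a rough sketch, not a script shipped with this repository: `check_checkpoints` is a hypothetical helper, the optional entries are skipped, and treating a single model checkpoint as sufficient is an assumption.

```python
from pathlib import Path

# Paths taken from the layout above; the optional entries are not checked.
REQUIRED = [
    "dinov3-vitb16-pretrain-lvd1689m/config.json",
    "dinov3-vitb16-pretrain-lvd1689m/model.safetensors",
    "dinov3-vitb16-pretrain-lvd1689m/preprocessor_config.json",
]
MODELS = ["pvsm_finetuned_full.pt", "pvsm_finetuned_small.pt"]


def check_checkpoints(root="metric_checkpoints"):
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(root)
    problems = [f"missing: {rel}" for rel in REQUIRED if not (root / rel).is_file()]
    # Assumption: having either model checkpoint is enough to run something.
    if not any((root / name).is_file() for name in MODELS):
        problems.append("missing: at least one of " + ", ".join(MODELS))
    return problems


if __name__ == "__main__":
    for line in check_checkpoints() or ["checkpoint layout looks complete"]:
        print(line)
```

Run it from the repository root so that the relative `metric_checkpoints` path resolves correctly.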

Interactive Demo

For a quick interactive demo, follow the instructions to download the example data (22.3 MB) and unzip it on your local machine.

To launch the interactive web-based demo:

torchrun --nproc_per_node 1 --standalone viser_demo.py --config-name runs/pvsm_finetuned_small

The demo will start a web server. Open your browser and navigate to the displayed URL to interact with the model.

System Requirements:

Small model: ~2.5GB VRAM
Full model: ~3.0GB VRAM
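The VRAM figures above can be turned into a simple pre-flight check. This is a sketch under stated assumptions: `pick_config` and `free_vram_gb` are hypothetical helpers, the 0.5 GB headroom is an arbitrary safety margin, and the config name for the full model is assumed to be `runs/pvsm_finetuned_full`. The only library call used is `torch.cuda.mem_get_info`, which returns free and total device memory in bytes.

```python
# Approximate VRAM figures quoted above, in GB.
VRAM_GB = {"runs/pvsm_finetuned_full": 3.0, "runs/pvsm_finetuned_small": 2.5}


def pick_config(free_gb, headroom_gb=0.5):
    """Return the largest config whose quoted VRAM plus headroom fits, else None."""
    for name, need in sorted(VRAM_GB.items(), key=lambda kv: -kv[1]):
        if free_gb >= need + headroom_gb:
            return name
    return None


def free_vram_gb(device=0):
    """Free memory on a CUDA device, in GiB (treated here as roughly GB)."""
    import torch  # imported lazily so pick_config works without torch installed

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**3
```

On a machine with a CUDA device, `pick_config(free_vram_gb())` would suggest which config to pass to `--config-name`.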

Note: The rendering quality in gsplat is compressed.

Running Inference

To run inference on a dataset:

python inference.py --config-name runs/pvsm_finetuned_small

Or for the full model:

python inference.py --config-name runs/pvsm_finetuned_full

Training

To train the model:

torchrun --nproc_per_node <num_gpus> train.py --config-name runs/pvsm_finetuned_small
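If you launch training from a script rather than by hand, the command above can be assembled programmatically so that `<num_gpus>` is filled in and validated. This is only an illustrative sketch: `train_command` is a hypothetical helper, and it uses no flags beyond those shown in the command above.

```python
import shlex


def train_command(num_gpus, config_name="runs/pvsm_finetuned_small"):
    """Build the torchrun training command as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun", "--nproc_per_node", str(num_gpus),
        "train.py", "--config-name", config_name,
    ]


if __name__ == "__main__":
    # Print the command for a 4-GPU run; pass the list to subprocess.run to execute it.
    print(shlex.join(train_command(4)))
```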

Configuration:

Training configurations are located in configs/runs/
Model configurations are in configs/model/
Dataset configurations are in configs/dataset/

API Keys: Before training, create configs/api_keys.yaml with your WandB API key:

wandb: YOUR_WANDB_KEY

You can use configs/api_keys_example.yaml as a template.
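To avoid pasting the key by hand, the file can also be generated from a script. The sketch below assumes the file needs only the single `wandb:` entry shown above; `write_api_keys` is a hypothetical helper, not something provided by this repository.

```python
from pathlib import Path


def write_api_keys(key, path="configs/api_keys.yaml"):
    """Write a one-line YAML file of the form `wandb: <key>` and return its path."""
    if not key:
        raise ValueError("empty WandB API key")
    path = Path(path)
    # Create the configs/ directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"wandb: {key}\n")
    return path
```

For example, `write_api_keys(os.environ["WANDB_API_KEY"])` would read the key from the environment variable that WandB itself recognizes. Keep the generated file out of version control, since it contains a secret.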

Citation

If you find this work useful in your research, please consider citing:

@article{wu_pvsm_2026,
  title={From Rays to Projections: Better Inputs for Feed-Forward View Synthesis},
  author={Wu, Zirui and Jiang, Zeren and Oswald, Martin R. and Song, Jie},
  journal={arXiv preprint arXiv:2601.05116},
  year={2026}
}

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE.md for details.

Acknowledgement

This work is built upon LVSM's code base.
