YoNoSplat
[ICLR'26] YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
TODO
- [x] Release code and pretrained models
- [ ] Release high-resolution models
- [ ] Dynamic dataloaders for training on more datasets
- [ ] Release models trained on a mixture of more datasets
Installation
Our code requires Python 3.10+ and is developed with PyTorch 2.1.2 and CUDA 11.8, but it should also work with newer PyTorch/CUDA versions.
- Clone YoNoSplat.
```bash
git clone https://github.com/cvg/YoNoSplat
cd YoNoSplat
```
- Create the environment (example using conda).
```bash
conda create -y -n yonosplat python=3.10
conda activate yonosplat
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
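After installing, it can be handy to confirm which Python/PyTorch/CUDA combination the environment actually provides. The snippet below is a small convenience check, not part of the YoNoSplat codebase; the function name is ours.

```python
import importlib.util
import sys

def report_environment():
    """Print the Python/PyTorch/CUDA versions visible in the current environment.

    Returns True if torch is importable, False otherwise.
    """
    print(f"python: {sys.version.split()[0]}")
    if importlib.util.find_spec("torch") is None:
        print("torch: not installed")
        return False
    import torch
    print(f"torch: {torch.__version__}, cuda available: {torch.cuda.is_available()}")
    return True

report_environment()
```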
Pre-trained Checkpoints
Our models are hosted on Hugging Face 🤗
| Model name | Training resolutions | Training data |
|:-------------------------------------------------------------------------------------------------:|:--------------------:|:-------------:|
| re10k.ckpt | 224x224 | re10k |
| dl3dv.ckpt | 224x224 | dl3dv |
Download the checkpoints and place them in the pretrained_weights/ directory.
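A quick way to verify the expected layout is to check for the checkpoint filenames from the table above. This is a minimal helper of our own, not part of the repository:

```python
from pathlib import Path

# Filenames taken from the checkpoint table above.
EXPECTED = ["re10k.ckpt", "dl3dv.ckpt"]

def missing_checkpoints(root="pretrained_weights"):
    """Return the expected checkpoint files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

If `missing_checkpoints()` returns an empty list, all listed checkpoints are in place.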
Camera Conventions
The camera system follows the pixelSplat convention:
- Intrinsics: Normalized camera intrinsic matrices (first row divided by image width, second row divided by image height)
- Extrinsics: OpenCV-style camera-to-world matrices (+X right, +Y down, +Z forward into the scene)
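The intrinsics normalization above can be sketched as follows; this is an illustrative helper (our own naming), not code from the repo:

```python
def normalize_intrinsics(K, width, height):
    """Divide the first row of a 3x3 intrinsic matrix by the image width and
    the second row by the image height (pixelSplat-style normalization)."""
    return [
        [v / width for v in K[0]],   # fx, skew, cx scaled by 1/width
        [v / height for v in K[1]],  # fy, cy scaled by 1/height
        list(K[2]),                  # homogeneous row unchanged
    ]

# A 640x480 image with fx = fy = 500 and the principal point at the center:
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
K_norm = normalize_intrinsics(K, width=640, height=480)
```

After normalization the principal point of a centered camera lands at (0.5, 0.5), making intrinsics resolution-independent.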
Datasets
Please refer to DATASETS.md for dataset preparation instructions.
Running the Code
Evaluation
Please refer to EVALUATION.md for detailed evaluation commands, including novel view synthesis, pose estimation, and metrics calculation.
Training
First, download the Pi3 pretrained model and save it as ./pretrained_weights/pi3.safetensors.
Train on RealEstate10K (multi-view, 2-32 input views)
```bash
python -m src.main \
  +experiment=yono_re10k \
  trainer.num_nodes=8 \
  wandb.mode=online \
  wandb.name=re10k_ctx2to32 \
  optimizer.lr=1e-4 \
  data_loader.train.batch_size=1 \
  checkpointing.save_weights_only=false \
  dataset.re10k.view_sampler.num_context_views=[2,32]
```
Train on DL3DV (multi-view, 2-32 input views)
```bash
python -m src.main \
  +experiment=yono_dl3dv \
  trainer.num_nodes=8 \
  wandb.mode=online \
  wandb.name=dl3dv_ctx2to32 \
  optimizer.lr=1e-4 \
  data_loader.train.batch_size=1 \
  checkpointing.save_weights_only=false \
  dataset.dl3dv.view_sampler.num_context_views=[2,32]
```
You can adjust the batch size and number of GPUs/nodes to fit your hardware. Note that changing the total batch size may require adjusting the learning rate to maintain performance.
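One common heuristic for adjusting the learning rate with the total batch size is linear scaling. This is a general rule of thumb, not an official YoNoSplat recipe, and the function below is our own sketch:

```python
def scaled_lr(base_lr, base_total_batch, new_total_batch):
    """Linear learning-rate scaling: scale lr proportionally to the total
    (per-GPU batch size x GPUs x nodes) batch size. A heuristic starting
    point only; validate on your own setup."""
    return base_lr * (new_total_batch / base_total_batch)

# Example: if the reference run used lr=1e-4 at a total batch of 8,
# doubling the total batch to 16 suggests trying lr=2e-4.
new_lr = scaled_lr(1e-4, base_total_batch=8, new_total_batch=16)
```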
Acknowledgements
This project builds upon several excellent repositories: NoPoSplat, Pi3, pixelSplat.
Citation
If you find this work useful in your research, please cite:
```bibtex
@inproceedings{ye2026yonosplat,
  title     = {YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting},
  author    = {Ye, Botao and Chen, Boqi and Xu, Haofei and Barath, Daniel and Pollefeys, Marc},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}
```
