Less3Depend (ICLR 2026)
This repository contains the PyTorch implementation of the paper "The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge".
[Project Page](https://pku-vcl-geometry.github.io/Less3Depend/) | [Paper](https://arxiv.org/abs/2506.09885)
1. Preparation
Environment Setup
Create and activate conda environment:
```bash
conda create -n lvsm python=3.11
conda activate lvsm
pip install -r requirements.txt
```
Recommended: a GPU with compute capability ≥ 8.0. We used 8×A100 GPUs in our experiments.
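To check whether a machine meets this recommendation, PyTorch can report each device's compute capability. A minimal sketch (not part of the repository):

```python
import torch

# Print the compute capability of every visible CUDA device.
# An A100 reports (8, 0), i.e. compute capability 8.0.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -- compute capability {major}.{minor}")
```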
Dataset Setup
Update (26/01/04): We now also provide our preprocessed version of the DL3DV dataset, in pixelSplat-style format, on Hugging Face!
We use the RealEstate10K dataset from pixelSplat and follow LVSM for preprocessing.
Download and unzip the RealEstate10K .torch chunks. For our scaling experiments, we split the dataset into four sizes, each containing the number of chunks listed below (a sketch for carving out such a subset follows the table):
| Size   | Chunks | Scenes |
|--------|--------|--------|
| Little | 76     | 1,202  |
| Medium | 304    | 4,121  |
| Large  | 1,216  | 16,449 |
| Full   | 4,866  | 66,033 |
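If you want to reproduce the scaling splits yourself, one simple option is to copy the first N chunk files into a separate directory before preprocessing. The helper below is a hypothetical sketch (not part of the repository) and assumes the downloaded chunks sit flat under datasets/re10k/train as *.torch files:

```python
import shutil
from pathlib import Path

# Hypothetical helper: build a smaller split by copying the first N .torch
# chunk files into a separate directory. Assumes a flat chunk layout.
def make_split(src_dir: str, dst_dir: str, num_chunks: int) -> None:
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    chunks = sorted(src.glob("*.torch"))[:num_chunks]
    for chunk in chunks:
        shutil.copy2(chunk, dst / chunk.name)
    print(f"Copied {len(chunks)} chunks to {dst}")

# Example: the "Little" split with 76 chunks.
make_split("datasets/re10k/train", "datasets/re10k-little/train", 76)
```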
Process the dataset following LVSM:
```bash
# process training split
python process_data.py --base_path datasets/re10k --output_dir datasets/re10k-full_processed --mode train --num_processes 80

# process test split
python process_data.py --base_path datasets/re10k --output_dir datasets/re10k-full_processed --mode test --num_processes 80
```
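If preprocessing errors out, it can help to first confirm that the downloaded chunks themselves load correctly. The snippet below is a sketch under the assumption that the pixelSplat-style chunks are ordinary torch-serialized lists of per-scene dictionaries; it only prints what is actually present rather than assuming field names:

```python
import torch
from pathlib import Path

# Sketch: load one raw RealEstate10K chunk to confirm the download is intact.
chunk_path = next(Path("datasets/re10k/train").glob("*.torch"))
scenes = torch.load(chunk_path, map_location="cpu", weights_only=False)

print(f"{chunk_path.name}: {len(scenes)} scenes")
print("fields of first scene:", sorted(scenes[0].keys()))
```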
2. Evaluation
Download the pre-trained model from Hugging Face and place it under checkpoints/uplvsm/ (e.g. checkpoints/uplvsm/uplvsm_x224.pt).
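If you prefer to script the download, huggingface_hub can fetch the checkpoint directly. The repository id below is a hypothetical placeholder; substitute the actual id from the Hugging Face link above (the filename is assumed to match the evaluation config's name):

```python
from huggingface_hub import hf_hub_download

# Hypothetical placeholder repo id -- replace with the repository linked above.
ckpt_path = hf_hub_download(
    repo_id="<org>/<less3depend>",
    filename="uplvsm_x224.pt",       # assumed to match the eval config name
    local_dir="checkpoints/uplvsm",  # location the configs expect
)
print("checkpoint saved to:", ckpt_path)
```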
Run evaluation:
```bash
# fast inference, compute metrics only
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference_fast --config config/eval/uplvsm_x224.yaml

# complete inference
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference --config config/eval/uplvsm_x224.yaml
```
✅ Download the uplvsm model with 518×518 resolution from Hugging Face, and run evaluation:
```bash
# fast inference, compute metrics only
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference_fast --config config/eval/uplvsm_x518.yaml

# complete inference
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference --config config/eval/uplvsm_x518.yaml
```
3. Training
```bash
# pretraining on 224×224 resolution
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.train --config config/uplvsm_x224.yaml

# finetuning on 518×518 resolution
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.train --config config/uplvsm_x518.yaml
```
📄 Acknowledgments
Our implementation builds upon LVSM. We also recommend RayZer, Pensieve and X-Factor for self-supervised scene reconstruction.
If you find this work useful for your research, please consider citing:
```bibtex
@misc{wang2025less3depend,
  title={The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge},
  author={Haoru Wang and Kai Ye and Yangyan Li and Wenzheng Chen and Baoquan Chen},
  year={2025},
  eprint={2506.09885},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.09885},
}
```