Less3Depend (ICLR 2026)
This repository contains the PyTorch implementation of the paper "The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge".
[Project Page](https://pku-vcl-geometry.github.io/Less3Depend/) | [Paper](https://arxiv.org/abs/2506.09885)
1. Preparation
Environment Setup
Create and activate conda environment:
```bash
conda create -n lvsm python=3.11
conda activate lvsm
pip install -r requirements.txt
```
Recommended: a GPU with compute capability ≥ 8.0. We used 8×A100 GPUs in our experiments.
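To check whether a machine meets this recommendation, PyTorch can report each device's compute capability. A minimal sketch (not part of the repository):

```python
import torch

# Print the compute capability of every visible CUDA device.
# An A100 reports (8, 0), i.e. compute capability 8.0.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -- compute capability {major}.{minor}")
```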
Dataset Setup
Update (26/01/04): We now also provide our preprocessed version of the DL3DV dataset, in pixelSplat-style format, on Hugging Face!
We use the RealEstate10K dataset from pixelSplat and follow LVSM for preprocessing.
Download and unzip the RealEstate10K .torch chunks. For our scaling experiments, we split the dataset into four sizes, each containing the number of chunks listed below (a sketch for carving out such a subset follows the table):
| Size   | Chunks | Scenes |
|--------|--------|--------|
| Little | 76     | 1,202  |
| Medium | 304    | 4,121  |
| Large  | 1,216  | 16,449 |
| Full   | 4,866  | 66,033 |
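If you want to reproduce the scaling splits yourself, one simple option is to copy the first N chunk files into a separate directory before preprocessing. The helper below is a hypothetical sketch (not part of the repository) and assumes the downloaded chunks sit flat under datasets/re10k/train as *.torch files:

```python
import shutil
from pathlib import Path

# Hypothetical helper: build a smaller split by copying the first N .torch
# chunk files into a separate directory. Assumes a flat chunk layout.
def make_split(src_dir: str, dst_dir: str, num_chunks: int) -> None:
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    chunks = sorted(src.glob("*.torch"))[:num_chunks]
    for chunk in chunks:
        shutil.copy2(chunk, dst / chunk.name)
    print(f"Copied {len(chunks)} chunks to {dst}")

# Example: the "Little" split with 76 chunks.
make_split("datasets/re10k/train", "datasets/re10k-little/train", 76)
```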
Process the dataset following LVSM:
```bash
# process training split
python process_data.py --base_path datasets/re10k --output_dir datasets/re10k-full_processed --mode train --num_processes 80

# process test split
python process_data.py --base_path datasets/re10k --output_dir datasets/re10k-full_processed --mode test --num_processes 80
```
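If preprocessing errors out, it can help to first confirm that the downloaded chunks themselves load correctly. The snippet below is a sketch under the assumption that the pixelSplat-style chunks are ordinary torch-serialized lists of per-scene dictionaries; it only prints what is actually present rather than assuming field names:

```python
import torch
from pathlib import Path

# Sketch: load one raw RealEstate10K chunk to confirm the download is intact.
chunk_path = next(Path("datasets/re10k/train").glob("*.torch"))
scenes = torch.load(chunk_path, map_location="cpu", weights_only=False)

print(f"{chunk_path.name}: {len(scenes)} scenes")
print("fields of first scene:", sorted(scenes[0].keys()))
```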
2. Evaluation
Download the pre-trained model from Hugging Face and place it under checkpoints/uplvsm/ (e.g. checkpoints/uplvsm/uplvsm_x224.pt).
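If you prefer to script the download, huggingface_hub can fetch the checkpoint directly. The repository id below is a hypothetical placeholder; substitute the actual id from the Hugging Face link above (the filename is assumed to match the evaluation config's name):

```python
from huggingface_hub import hf_hub_download

# Hypothetical placeholder repo id -- replace with the repository linked above.
ckpt_path = hf_hub_download(
    repo_id="<org>/<less3depend>",
    filename="uplvsm_x224.pt",       # assumed to match the eval config name
    local_dir="checkpoints/uplvsm",  # location the configs expect
)
print("checkpoint saved to:", ckpt_path)
```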
Run evaluation:
```bash
# fast inference, compute metrics only
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference_fast --config config/eval/uplvsm_x224.yaml

# complete inference
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference --config config/eval/uplvsm_x224.yaml
```
✅ Download the uplvsm model with 518×518 resolution from Hugging Face, and run evaluation:
```bash
# fast inference, compute metrics only
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference_fast --config config/eval/uplvsm_x518.yaml

# complete inference
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.inference --config config/eval/uplvsm_x518.yaml
```
3. Training
```bash
# pretraining on 224×224 resolution
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.train --config config/uplvsm_x224.yaml

# finetuning on 518×518 resolution
torchrun --nproc_per_node 8 --nnodes 1 --rdzv_id 18640 --rdzv_backend c10d --rdzv_endpoint localhost:29511 -m src.train --config config/uplvsm_x518.yaml
```
📄 Acknowledgments
Our implementation builds upon LVSM. We also recommend RayZer, Pensieve and X-Factor for self-supervised scene reconstruction.
If you find this work useful for your research, please consider citing:
```bibtex
@misc{wang2025less3depend,
  title={The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge},
  author={Haoru Wang and Kai Ye and Yangyan Li and Wenzheng Chen and Baoquan Chen},
  year={2025},
  eprint={2506.09885},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.09885},
}
```