Loftup
[ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models"
LoftUp: A Coordinate-Based Feature Upsampler for Vision Foundation Models
ICCV2025 (oral)
Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, Dan Zhang

TL;DR: LoftUp achieves the strongest feature upsampling performance at a comparable speed to bilinear upsampling.

Contents
- Install
- Inference with pretrained upsamplers
- Evaluation on downstream tasks
- Training LoftUp upsamplers
- Citation
Install
In general, LoftUp runs in most recent PyTorch environments. We encourage users to try LoftUp in their existing environment first.
We also provide two YAML files for installation. To use one of them, run:
conda env create -f environment_cuda11.yaml
or
conda env create -f environment.yaml
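Before creating a new environment, it can help to see what the current one already provides. The snippet below is a minimal sketch using only the standard library. `check_packages` is a hypothetical helper, and the package list (`torch`, `torchvision`) is an assumption for illustration; the actual requirements are the ones listed in the two YAML files.

```python
import importlib.util
from importlib import metadata


def check_packages(names):
    """Return {package: installed version or None} without importing the packages."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


if __name__ == "__main__":
    # Assumed package list for illustration; see the YAML files for the real one.
    for pkg, version in check_packages(["torch", "torchvision"]).items():
        print(f"{pkg}: {version or 'missing'}")
```

If the packages are reported as missing, creating one of the conda environments above is the safer route.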
Inference with pretrained upsamplers
All pre-trained upsamplers are available on 🤗 here: https://huggingface.co/models?search=loftup.
Example code for using LoftUp is in example_usage.py. The following upsamplers are currently available:
| Backbone Name | Featurizer Class | HF hub | Torch Hub Repo | Torch Hub Name |
|---|---|---|---|---|
| DINOv2 S/14 | dinov2 | haiwen/loftup-dinov2s | andrehuang/loftup | loftup_dinov2s |
| DINOv2 S/14 + Reg | dinov2s_reg | haiwen/loftup-dinov2s_reg | andrehuang/loftup | loftup_dinov2s_reg |
| DINOv2 B/14 | dinov2b | haiwen/loftup-dinov2b | andrehuang/loftup | loftup_dinov2b |
| DINOv2 B/14 + Reg | dinov2b_reg | haiwen/loftup-dinov2b_reg | andrehuang/loftup | loftup_dinov2b_reg |
| CLIP ViT B/16 | clip | haiwen/loftup-clip | andrehuang/loftup | loftup_clip |
| SigLIP ViT B/16 | siglip | haiwen/loftup-siglip | andrehuang/loftup | loftup_siglip |
| SigLIP2 ViT B/16 | siglip2 | haiwen/loftup-siglip2 | andrehuang/loftup | loftup_siglip2 |
To use the Torch Hub checkpoints, run:
upsampler = torch.hub.load('andrehuang/loftup', model_torch_hub_name, pretrained=True)
For example, upsampler = torch.hub.load('andrehuang/loftup', 'loftup_dinov2s', pretrained=True).
The upsampler class is defined at UpsamplerwithChannelNorm.
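The table above can be turned into a small lookup so that an upsampler is selected by backbone name. This is a minimal sketch: `TORCH_HUB_NAMES`, `hub_name`, and `load_upsampler` are hypothetical helpers that are not part of the repository; only the `torch.hub.load` call itself comes from this README. `torch` is imported inside the loader so the lookup works without it.

```python
# Torch Hub entry names copied from the table above, keyed by backbone name.
TORCH_HUB_REPO = "andrehuang/loftup"
TORCH_HUB_NAMES = {
    "DINOv2 S/14": "loftup_dinov2s",
    "DINOv2 S/14 + Reg": "loftup_dinov2s_reg",
    "DINOv2 B/14": "loftup_dinov2b",
    "DINOv2 B/14 + Reg": "loftup_dinov2b_reg",
    "CLIP ViT B/16": "loftup_clip",
    "SigLIP ViT B/16": "loftup_siglip",
    "SigLIP2 ViT B/16": "loftup_siglip2",
}


def hub_name(backbone):
    """Map a backbone name from the table to its Torch Hub entry name."""
    try:
        return TORCH_HUB_NAMES[backbone]
    except KeyError:
        known = ", ".join(sorted(TORCH_HUB_NAMES))
        raise ValueError(f"Unknown backbone {backbone!r}; expected one of: {known}") from None


def load_upsampler(backbone):
    """Download and build the pretrained upsampler (needs torch and network access)."""
    import torch  # imported lazily so hub_name() works without torch installed

    return torch.hub.load(TORCH_HUB_REPO, hub_name(backbone), pretrained=True)
```

Here `load_upsampler("DINOv2 S/14")` performs the same call as the example above. For how to apply the returned module to backbone features, refer to example_usage.py.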
Evaluation on Downstream Tasks
Dataset Preparation
See Preparing Datasets for Evaluation.
Semantic Segmentation
For semantic segmentation, our implementation is adapted from FeatUp. You can use eval_seg.py by running:
python eval_seg.py ++upsampler_path=/path/to/your/upsampler
You can also configure other hyper-parameters such as output_dir and dataset directory. The config file is configs/eval_seg.yaml.
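In Hydra, a `++key=value` argument overrides `key` if it already exists in the config and adds it otherwise, so entries of configs/eval_seg.yaml can be set from the command line. The following sketch builds such a command from a dictionary. `hydra_command` is a hypothetical helper, the `output_dir` value is made up for illustration, and only `upsampler_path` and `output_dir` are option names taken from this README.

```python
import shlex
import sys


def hydra_command(script, overrides):
    """Build an argv list that runs `script` with Hydra `++key=value` overrides."""
    argv = [sys.executable, script]
    for key, value in overrides.items():
        argv.append(f"++{key}={value}")
    return argv


cmd = hydra_command(
    "eval_seg.py",
    {
        "upsampler_path": "/path/to/your/upsampler",  # placeholder, as in the command above
        "output_dir": "outputs/eval_seg",  # hypothetical value
    },
)
print(shlex.join(cmd[1:]))  # the arguments after the interpreter
# To launch the evaluation: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when a path contains spaces.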
Video Object Segmentation
For video object segmentation on DAVIS, our code is modified from the implementation in LiFT. Specifically, we first extract segmentation results by running:
python eval_davis.py --dataroot your_davis_data_dir --model_type "dinov2" --output_dir your_output_dir --imsize 224 --upsampler_path=your_upsampler_path
Then run the following to get evaluation results:
python davis2017-evaluation/evaluation_method.py --davis_path /your_davis_data_dir --task semi-supervised --results_path your_output_dir/davis_vidseg_224 --imsize 224
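The two commands above share the DAVIS directory, the output directory, and the image size, and the second one reads from a subdirectory written by the first. The sketch below derives both from one set of values. `davis_commands` is a hypothetical helper, the example paths are made up, and the `davis_vidseg_{imsize}` pattern is inferred from the single example path (`davis_vidseg_224`), so treat it as an assumption.

```python
from pathlib import PurePosixPath


def davis_commands(davis_dir, output_dir, upsampler_path, model_type="dinov2", imsize=224):
    """Return (extract_cmd, evaluate_cmd) as argv lists with consistent paths."""
    # Assumed naming: the example shows `your_output_dir/davis_vidseg_224` for imsize 224.
    results_path = PurePosixPath(output_dir) / f"davis_vidseg_{imsize}"
    extract_cmd = [
        "python", "eval_davis.py",
        "--dataroot", str(davis_dir),
        "--model_type", model_type,
        "--output_dir", str(output_dir),
        "--imsize", str(imsize),
        f"--upsampler_path={upsampler_path}",
    ]
    evaluate_cmd = [
        "python", "davis2017-evaluation/evaluation_method.py",
        "--davis_path", str(davis_dir),
        "--task", "semi-supervised",
        "--results_path", str(results_path),
        "--imsize", str(imsize),
    ]
    return extract_cmd, evaluate_cmd


# Hypothetical paths, for illustration only.
extract_cmd, evaluate_cmd = davis_commands("data/DAVIS", "outputs", "ckpt/upsampler.ckpt")
print(" ".join(extract_cmd))
print(" ".join(evaluate_cmd))
```

Keeping `imsize` in one place avoids evaluating a results directory produced at a different resolution.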
Others
For interactive segmentation, please check out iSegProbe.
For open-vocabulary segmentation, please check out ProxyCLIP.
For depth and normal estimation, please check out Probe3D.
Training LoftUp Upsamplers
This repository contains scripts for training LoftUp upsamplers. Training proceeds in two stages:
Stage 1: Basic Feature Upsampling
Stage 1 training (train_loftup_stage1.py) trains upsamplers to convert low-resolution features to high-resolution features using a reconstruction loss.
Example training command:
python train_loftup_stage1.py ++dataset="sa1b" ++epochs=1 ++batch_size=2 ++num_gpus=4 ++model_type="dinov2" ++pytorch_data_dir='datasets' ++upsampler_type="loftup" ++sam_mask_alpha=0.8 ++load_size=224 ++upsample_size=224 ++tv_weight=0.001 ++clamp_featup=True
Stage 2: High-Resolution Supervision
Stage 2 training (train_loftup_stage2.py) fine-tunes the Stage 1 upsampler with high-resolution supervision for improved quality.
Example training command:
python train_loftup_stage2.py ++dataset="sa1b" ++epochs=1 ++hr_res=896 ++batch_size=2 ++consistency_method="bilinear" ++model_type="dinov2" ++num_gpus=4 ++affinity_loss=True ++pytorch_data_dir='datasets' ++pretrained_upsampler="path/to/stage1_checkpoint.ckpt" ++upsampler_type="loftup" ++sam_mask_hr_alpha=0.5 ++sam_mask_reg=0.0 ++lr=1e-3 ++use_featup=False ++aug_size ++n_jitters=2
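Stage 2 starts from the Stage 1 checkpoint through `pretrained_upsampler`, so the stages have to run in order. The sketch below checks that the checkpoint exists before composing a Stage 2 command. `stage2_command` is a hypothetical helper, the checkpoint location depends on your Stage 1 run, and only a subset of the options from the example command is set by default.

```python
from pathlib import Path


def stage2_command(stage1_ckpt, extra_overrides=None):
    """Compose a Stage 2 command that fine-tunes from an existing Stage 1 checkpoint."""
    ckpt = Path(stage1_ckpt)
    if not ckpt.is_file():
        raise FileNotFoundError(f"Stage 1 checkpoint not found: {ckpt}. Run Stage 1 first.")
    # Option names and values below are taken from the example command above.
    overrides = {
        "model_type": "dinov2",
        "upsampler_type": "loftup",
        "pretrained_upsampler": str(ckpt),
    }
    overrides.update(extra_overrides or {})
    return ["python", "train_loftup_stage2.py"] + [f"++{k}={v}" for k, v in overrides.items()]
```

With a finished Stage 1 run, `stage2_command("path/to/stage1_checkpoint.ckpt", {"hr_res": 896})` returns an argument list that can be passed to `subprocess.run`.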
Configuration
Both training scripts use Hydra for configuration management. Configuration files are located in configs/:
- configs/train_loftup_stage1.yaml - Stage 1 configuration
- configs/train_loftup_stage2.yaml - Stage 2 configuration
Key configuration parameters:
- model_type: Feature extractor type (e.g., "dinov2", "clip")
- upsampler_type: Type of upsampler to train (e.g., "loftup")
- batch_size: Training batch size
- epochs: Number of training epochs
- lr: Learning rate
- load_size: Input image size for feature extraction
- upsample_size: Target size for upsampled features
- n_jitters: Number of jittering augmentations per training step
- tv_weight: Weight for total variation loss
- sam_mask_alpha: Weight for SAM mask adjustment (Stage 1)
- sam_mask_hr_alpha: Weight for SAM mask adjustment (Stage 2)
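Of these parameters, `tv_weight` scales a total variation term, which penalizes differences between neighboring positions and so favors spatially smooth outputs. As an illustration only, here is one common form of total variation on a single 2D channel, written in plain Python. The exact formulation in the training scripts (norm, reduction, and which tensor it is applied to) is not described here and may differ; `total_variation` is a hypothetical helper.

```python
def total_variation(grid):
    """Mean absolute difference between vertically and horizontally adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    diffs = []
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:  # neighbor below
                diffs.append(abs(grid[i + 1][j] - grid[i][j]))
            if j + 1 < cols:  # neighbor to the right
                diffs.append(abs(grid[i][j + 1] - grid[i][j]))
    return sum(diffs) / len(diffs) if diffs else 0.0


smooth = [[1.0, 1.0], [1.0, 1.0]]  # constant map: no variation
noisy = [[0.0, 1.0], [1.0, 0.0]]   # checkerboard: every neighbor pair differs by 1
tv_weight = 0.001  # value from the Stage 1 example command
print(tv_weight * total_variation(smooth), tv_weight * total_variation(noisy))
```

A larger `tv_weight` pushes the result toward the smooth case, at the cost of fine detail.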
For more details, see the configuration files in configs/ and the training scripts themselves.
Citation
If you find our work helpful, please cite:
@misc{huang2025loftuplearningcoordinatebasedfeature,
title={LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models},
author={Haiwen Huang and Anpei Chen and Volodymyr Havrylov and Andreas Geiger and Dan Zhang},
year={2025},
eprint={2504.14032},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.14032},
}
