CAEVL
[WACV 2026] Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone

Official repository of the paper:
"Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone" <br>
Tristan Amadei, Enric Meinhardt-Llopis, Benedicte Bascle, Corentin Abgrall, Gabriele Facciolo
🧠 Overview
We introduce CAEVL, a data-efficient method for UAV geo-localization that does not require paired UAV–satellite images during training.
Unlike traditional cross-view localization approaches that rely on aligned image pairs, CAEVL learns a shared representation space using:
- edge-based image representations
- self-supervised learning
- non-contrastive training objectives
This allows the model to generalize to challenging conditions such as:
- low-quality UAV imagery
- high-altitude viewpoints
- strong appearance differences between modalities
⚙️ Method Summary
CAEVL is based on a cross-view embedding learning strategy:
- UAV and satellite images are encoded into a shared latent space
- Training is performed without explicit UAV–satellite correspondences
- A non-contrastive objective structures the latent space
- Edge representations improve robustness to appearance variations
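The exact objective is defined in the paper; as a rough illustration of what a non-contrastive loss looks like, here is a VICReg-style sketch in NumPy (an assumption for illustration only, not CAEVL's actual objective): two embeddings of the same scene are pulled together without negative pairs, while variance and covariance terms prevent the latent space from collapsing.

```python
import numpy as np

def noncontrastive_loss(za, zb, var_w=1.0, cov_w=1.0):
    """Illustrative VICReg-style non-contrastive loss (not CAEVL's exact
    objective): match two views without negatives, and regularize the
    embeddings so they do not collapse to a constant."""
    n, d = za.shape
    invariance = np.mean((za - zb) ** 2)          # pull the two views together

    def var(z):                                   # keep each dimension spread out
        std = np.sqrt(z.var(axis=0) + 1e-4)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def cov(z):                                   # decorrelate dimensions
        zc = z - z.mean(axis=0)
        c = (zc.T @ zc) / (n - 1)
        off = c - np.diag(np.diag(c))
        return np.sum(off ** 2) / d

    return invariance + var_w * (var(za) + var(zb)) + cov_w * (cov(za) + cov(zb))

rng = np.random.default_rng(0)
za = rng.normal(size=(8, 4))
print(noncontrastive_loss(za, za))  # identical views: the invariance term is zero
```

Note how collapsed (constant) embeddings are penalized by the variance term even though the invariance term is zero, which is what removes the need for negative pairs.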
At inference time:
- a UAV image is encoded
- nearest neighbors are retrieved from a satellite database
- the predicted location is obtained via image matching
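The retrieval step above can be sketched as a simple nearest-neighbour lookup over the satellite database (a minimal illustration with toy embeddings and made-up coordinates; `retrieve_nearest` is a hypothetical helper, not the repository's actual retrieval code):

```python
import numpy as np

def retrieve_nearest(query_emb, db_embs, db_coords, k=1):
    """Return the coordinates of the k database tiles most similar to the query.

    Embeddings are L2-normalized so a dot product equals cosine similarity;
    the predicted location is the coordinate of the best-matching tile.
    """
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                     # cosine similarity to every database tile
    top = np.argsort(-sims)[:k]       # indices of the k most similar tiles
    return [db_coords[i] for i in top]

# Toy example: 3 satellite tiles with made-up UTM coordinates.
db_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
db_coords = [(500100.0, 4649000.0), (500200.0, 4649100.0), (500150.0, 4649050.0)]
query = np.array([0.9, 0.1])
print(retrieve_nearest(query, db_embs, db_coords))  # coordinates of the nearest tile
```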
Results
CAEVL is very lightweight, with a computational cost of only 1.4 GFLOPs, and remains competitive with state-of-the-art methods despite being trained on reference satellite imagery alone.
| Method | Data | R@1 (100m) | R@1 (250m) | GFLOPs |
|---|---|---|---|---|
| MixVPR | Paired | 42.2 | 79.2 | <u>10.3</u> |
| EigenPlaces | Paired | 39.8 | 78.7 | 19.7 |
| FSRA | Paired | 37.2 | <u>84.7</u> | 13.3 |
| DAC | Paired | <u>48.6</u> | 85.5 | 20.6 |
| | | | | |
| Di Piazza et al. | Sat-Only | 26.3 | 34.1 | 1.4 |
| CAEVL (Ours) | Sat-Only | 49.7 | 82.8 | 1.4 |
📦 ViLD Dataset
We release the ViLD dataset, designed for UAV-to-satellite matching and geo-localization tasks.
Contents
- UAV and satellite imagery from multiple flights
- Ground-truth coordinates (Lambert 93 and UTM)
- Predefined train/validation/test splits
- Tools for visualization and reproducibility
👉 [Download on Zenodo](https://zenodo.org/records/19223815) <br> or <br> 👉 Download from the ENS website
🔐 Dataset Access
The dataset is distributed as a password-protected archive.
To request access, please send an email to:
📧 vild.dataset@gmail.com
Please include:
- Name and affiliation
- Intended use (e.g., research, commercial, personal project)
📄 Email template
Subject: ViLD Dataset Access Request
Hello,
I would like to request access to the ViLD dataset.
Name: [Your Name] <br> Affiliation: [Your Institution / Company] <br> Intended use: [Brief description]
Thank you
You will receive the password upon request.
🚀 Getting Started
Evaluation
Here is an example of how to evaluate the model:

```shell
python caevl/evaluation/eval.py \
    --method=caevl \
    --weights=caevl/models/trained/stagetwo/stagetwo.pth \
    --database_folder=path/to/database \
    --queries_folder=path/to/queries \
    --database_coords_path=path/to/database_coords \
    --queries_coords_path=path/to/queries_coords
```
This will create a log file in `logs/log_dir`. You can add `--save_predictions` to save the predictions so that you can visualize and analyze them afterwards.<br> The `--database_coords_path` and `--queries_coords_path` parameters are paths to dictionaries that contain the coordinates of the database and query images, with image names as keys and coordinates as values. Alternatively, the coordinates can be stored directly in the filenames, following the pattern `@utm_east@utm_north@filename`.
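For instance, a coordinates dictionary could be built and stored as follows (a hypothetical example: the pickle serialization and the image names are assumptions, so adapt them to whatever format the evaluation script actually loads):

```python
import pickle

# Hypothetical coordinates dictionary for --database_coords_path /
# --queries_coords_path: image names as keys, (utm_east, utm_north) as values.
coords = {
    "img_0001.png": (500100.0, 4649000.0),
    "img_0002.png": (500200.0, 4649100.0),
}
with open("database_coords.pkl", "wb") as f:
    pickle.dump(coords, f)  # pickle format is an assumption

# Alternatively, coordinates embedded in the filename itself can be parsed
# back out, following the @utm_east@utm_north@filename convention:
def parse_coords_from_name(name):
    """Extract (utm_east, utm_north) from '@utm_east@utm_north@filename'."""
    _, east, north, _ = name.split("@", 3)
    return float(east), float(north)

print(parse_coords_from_name("@500100.0@4649000.0@img_0001.png"))
```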
Training
Training the autoencoder for stage one is straightforward:

```shell
python caevl/ae/train.py --config=caevl/ae/configs/config_ae.yml
```
Fine-tuning the encoder during stage two is performed similarly:

```shell
python caevl/ft_stage/train.py --config=caevl/ft_stage/configs/config_stagetwo.yml
```
You just need to indicate in the config file the folder in which the weights of the encoder are located, for instance:

```yaml
architecture:
  dir_model: 'AutoEncoder'
  backbone: 'stageone'
```
📚 Citation
If you use this work, please cite:
```bibtex
@inproceedings{amadei2026beyond,
  title={Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone},
  author={Amadei, Tristan and Meinhardt-Llopis, Enric and Bascle, Benedicte and Abgrall, Corentin and Facciolo, Gabriele},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={7409--7419},
  year={2026}
}
```
📬 Contact
For questions related to the dataset or the method, feel free to contact us via the dataset email: vild.dataset@gmail.com
Acknowledgements
We thank the contributors of open-source projects including DINOv2 (https://github.com/facebookresearch/dinov2) and VPR-methods-evaluation (https://github.com/gmberton/VPR-methods-evaluation).
