VICI

[ACMMM UAVM 2025] 🌍🚗 VICI: VLM-Instructed Cross-view Image-localisation 📑🗺️

<div align="center">

VICI: VLM-Instructed Cross-view Image-localisation

<p align="middle"> <a href="https://zxh009123.github.io/">Xiaohan Zhang*</a> <a href="https://tavisshore.co.uk/">Tavis Shore*</a> <a href="">Chen Chen</a> <br> <a href="https://cvssp.org/Personal/OscarMendez/index.html">Oscar Mendez</a> <a href="https://personalpages.surrey.ac.uk/s.hadfield/biography.html">Simon Hadfield</a> <a href="https://www.uvm.edu/cems/cs/profile/safwan-wshah">Safwan Wshah</a> </p> <p align="middle"> <a href="https://www.wshahaigroup.com/">Vermont Artificial Intelligence Laboratory (VaiL)</a> <br> <a href="https://www.surrey.ac.uk/centre-vision-speech-signal-processing">Centre for Vision, Speech, and Signal Processing (CVSSP)</a> <br> <a href="https://www.ucf.edu/">University of Central Florida</a> <a href="https://locusrobotics.com/">Locus Robotics</a> </p>

University1652 - Benchmark

[Figure: VICI diagram]

</div>

📓 Description

🧬 Feature Extractors

<div align="center">

| Backbone   | Params (M) | FLOPs (G) | Dims |  R@1  |  R@5  | R@10  |
|:----------:|:----------:|:---------:|:----:|:-----:|:-----:|:-----:|
| ConvNeXt-T |     28     |    4.5    | 768  | 1.36  | 4.34  | 7.95  |
| ConvNeXt-B |     89     |   15.4    | 1024 | 3.14  | 8.14  | 13.22 |
| ViT-B      |     86     |   17.6    | 768  | 3.30  | 8.92  | 13.96 |
| ViT-L      |    307     |   60.6    | 1024 | 9.62  | 23.42 | 32.73 |
| DINOv2-B   |     86     |    152    | 768  | 17.37 | 36.14 | 46.96 |
| DINOv2-L   |    304     |    507    | 1024 | 27.49 | 51.96 | 63.13 |

</div>
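The R@1, R@5, and R@10 columns report recall at K: the percentage of queries whose correct match appears among the top K retrieved candidates. The snippet below is a minimal sketch of that metric, not the repository's evaluation code; the function name `recall_at_k` and the toy IDs are assumptions made for illustration.

```python
def recall_at_k(ranked_ids, true_ids, k):
    """Percentage of queries whose true match is among the top-k candidates.

    ranked_ids: list of candidate-ID lists, best match first, one per query.
    true_ids:   list with the single correct ID for each query.
    """
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_ids, true_ids))
    return 100.0 * hits / len(true_ids)


# Toy example with three queries (IDs are made up for illustration).
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
truth = ["a", "f", "x"]

r1 = recall_at_k(ranked, truth, 1)  # only the first query hits at rank 1
r3 = recall_at_k(ranked, truth, 3)  # the first two queries hit within the top 3
```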

🧰 Vision-Language Models

<div align="center">

| VLM                   | R@1   | R@5   | R@10  |
|:---------------------:|-------|-------|-------|
| Without Re-ranking    | 27.49 | 51.96 | 63.13 |
| Gemini 2.5 Flash Lite | 23.54 | 48.39 | 63.13 |
| Gemini 2.5 Flash      | 30.21 | 53.04 | 63.13 |

</div>
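R@10 is identical in all three rows. That is what you would expect if the VLM only reorders the top 10 candidates from retrieval, which is an assumption about the pipeline rather than something stated here. The sketch below illustrates the point with a random permutation standing in for the VLM; all names and IDs are hypothetical.

```python
import random


def hit_at_k(ranked, truth, k):
    """True if the correct ID appears in the first k candidates."""
    return truth in ranked[:k]


rng = random.Random(0)
candidates = [f"sat_{i}" for i in range(10)]  # hypothetical top-10 from retrieval
truth = "sat_3"

reranked = candidates[:]
rng.shuffle(reranked)  # stand-in for any VLM-chosen permutation of the top 10

# Permuting the top 10 cannot move the true match in or out of the top 10,
# so the R@10 hit is unchanged; R@1 and R@5 hits may go either way.
same_r10 = hit_at_k(candidates, truth, 10) == hit_at_k(reranked, truth, 10)
r1_before = hit_at_k(candidates, truth, 1)
r1_after = hit_at_k(reranked, truth, 1)
```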

🛸 Drone Augmentation

<div align="center">

| $P$ |  R@1  |  R@5  | R@10  |
|:---:|:-----:|:-----:|:-----:|
|  0  | 24.47 | 48.16 | 60.99 |
| 0.1 | 26.98 | 51.34 | 61.92 |
| 0.3 | 27.49 | 51.96 | 63.13 |
| 0.5 | 24.89 | 52.03 | 62.66 |

</div>
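The table does not define $P$. One plausible reading, and purely an assumption here, is that $P$ is the probability of using a drone image in place of the satellite image for a training sample. Under that assumption, the sampling step could be sketched as follows; `pick_reference` and the file names are hypothetical and not taken from the repository.

```python
import random


def pick_reference(satellite_path, drone_paths, p, rng):
    """Return a drone image with probability p, otherwise the satellite image."""
    if drone_paths and rng.random() < p:
        return rng.choice(drone_paths)
    return satellite_path


rng = random.Random(0)
drone = ["drone_01.jpg", "drone_02.jpg"]  # hypothetical file names

# Draw many samples at P = 0.3 and measure how often a drone image is chosen.
picks = [pick_reference("satellite.jpg", drone, 0.3, rng) for _ in range(10_000)]
drone_fraction = sum(path != "satellite.jpg" for path in picks) / len(picks)
```

With $P = 0$ the function always returns the satellite image, which corresponds to the first row of the table under this reading.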

🎯 Ablation study and baseline comparison.

<div align="center">

| Model                            | R@1   | R@5   | R@10  |
|:--------------------------------:|-------|-------|-------|
| U1652 (Zheng et al., 2020)       | 1.20  | -     | -     |
| LPN w/o drone (Wang et al., 2021) | 0.74  | -     | -     |
| LPN w/ drone (Wang et al., 2021) | 0.81  | -     | -     |
| DINOv2-L                         | 24.66 | 48.00 | 59.02 |
| + Drone Data                     | 27.49 | 51.96 | 63.13 |
| + VLM Re-rank (Ours)             | 30.21 | 53.04 | 63.13 |

</div>

📊 Evaluation

🐍 Environment Setup

conda env create -n ENV -f requirements.yaml && conda activate ENV

🐍 Stage 1 - Image Retrieval

Before running Stage 1, configure your dataset paths:

  1. Navigate to the /config/ directory.
  2. Open the default.yaml file (or copy it to a new file).
  3. Replace the placeholder values (e.g., DATA_ROOT) with the actual paths to your dataset and related files.
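Before training, it can help to confirm that no placeholder was left behind in the config. The sketch below scans the config text for leftover placeholder strings. It is an illustrative helper, not part of the repository: the function name, the example keys, and the assumption that placeholders appear literally as `DATA_ROOT` are all hypothetical.

```python
def unresolved_placeholders(config_text, placeholders=("DATA_ROOT",)):
    """Return (line_number, line) pairs that still contain a placeholder."""
    found = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        content = line.split("#", 1)[0]  # ignore trailing YAML comments
        if any(token in content for token in placeholders):
            found.append((number, line.strip()))
    return found


# Hypothetical config contents; the real keys in default.yaml may differ.
example = """\
data_root: DATA_ROOT
batch_size: 32
"""
issues = unresolved_placeholders(example)
for number, line in issues:
    print(f"line {number}: placeholder not replaced: {line}")
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the function.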

Once your configuration file is ready, you can train Stage 1 using:

python stage_1.py --config YOUR_CONFIG_FILE_NAME

You can also download our pre-trained weights here.

🐍 Stage 2 - VLM Re-ranking

To run Stage 2, you need to:

  1. Open the stage_2.py file.
  2. Replace the relevant placeholders (e.g., the path to the answer file from Stage 1 and your Gemini API key).
  3. Ensure any other required directories or options are correctly set.

Then, simply run:

python stage_2.py

This re-ranks the initial Stage 1 retrieval results using a Vision-Language Model (VLM). When it finishes, the answer directory contains LLM_re_ranked_answer.txt with the re-ranked results and reasons.json with the reasons given for the re-ranking.
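As a rough illustration of the re-ranking step, the sketch below reorders a Stage 1 candidate list according to an order proposed by a VLM and saves the accompanying reasons as JSON. It does not call the Gemini API, and it is not the logic in stage_2.py: the function `apply_rerank`, the query and candidate IDs, and the JSON layout are all assumptions made for this example.

```python
import json
import os
import tempfile


def apply_rerank(candidates, vlm_order):
    """Reorder candidates by the VLM's preferred order.

    IDs the VLM returns that are not in the candidate list are ignored, and
    candidates it did not mention keep their original relative order at the
    end, so the candidate set itself never changes.
    """
    seen = set()
    ranked = []
    for cid in vlm_order:
        if cid in candidates and cid not in seen:
            ranked.append(cid)
            seen.add(cid)
    ranked.extend(c for c in candidates if c not in seen)
    return ranked


# Hypothetical Stage 1 output and VLM response for one query.
stage1 = {"query_001": ["sat_7", "sat_2", "sat_9"]}
vlm = {"query_001": {"order": ["sat_2", "sat_404"], "reason": "Matching road layout."}}

reranked = {q: apply_rerank(c, vlm[q]["order"]) for q, c in stage1.items()}
reasons = {q: vlm[q]["reason"] for q in stage1}

# Write the reasons to a temporary directory (stand-in for the answer directory).
out_dir = tempfile.mkdtemp()
reasons_path = os.path.join(out_dir, "reasons.json")
with open(reasons_path, "w", encoding="utf-8") as fh:
    json.dump(reasons, fh, indent=2)
```

Keeping unmentioned candidates at the end guards against a VLM response that omits or invents IDs, so a malformed response can reorder the list but cannot shrink it.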

📗 Related Works

🕺 PEnG: Pose-Enhanced Geo-Localisation

⛰️ GeoDTR+: Toward Generic Cross-View Geolocalization via Geometric Disentanglement

⭐ Star History

<a href="https://star-history.com/#tavisshore/VICI&Date" align="middle"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=tavisshore/VICI&type=Date&theme=dark" /> <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=tavisshore/VICI&type=Date" /> <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=tavisshore/VICI&type=Date" /> </picture> </a>