SemanticLocalizationMetrics

The first research on semantic localization


The official code for the paper "Learning to Evaluate Performance of Multi-modal Semantic Localization", TGRS 2022.

Authors: Zhiqiang Yuan, Chongyang Li, Zhuoying Pan, et al.


-------------------------------------------------------------------------------------

Welcome to :+1:<big>`Fork and Star`</big>:+1:, and we'll let you know when we update.

-------------------------------------------------------------------------------------

We recently released SeLo v2 [link], which improves on SeLo in both speed and accuracy.

-------------------------------------------------------------------------------------


INTRODUCTION

An official evaluation metric for semantic localization.

Fig.1. (a) Results of airplane detection. (b) Results of semantic localization with the query "white planes parked in the open space of the white airport". Compared with tasks such as detection, SeLo achieves semantic-level retrieval with only caption-level annotation during training, which can adapt to higher-level retrieval tasks.

Fig.2. Visualization of SeLo with the query "the red rails where the grey train is located run through the residential area".

The semantic localization (SeLo) task uses cross-modal information such as text to quickly localize remote sensing (RS) images at the semantic level [link]. It achieves semantic-level detection using only caption-level supervision. In our opinion, this is meaningful and interesting work, which unifies sub-tasks such as detection and segmentation.


Fig.3. Framework of semantic localization. After multi-scale segmentation of a large RS image, we compute the cross-modal similarity between the query and each slice. The resulting regional probabilities are then averaged at the pixel level, and further noise suppression yields the SeLo map.
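The pipeline in Fig.3 can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `toy_similarity` is a hypothetical stand-in for a trained cross-modal retrieval model, and the stride, min-max normalization, and mean-threshold noise suppression are all assumptions made for the example.

```python
import numpy as np


def toy_similarity(patch: np.ndarray, query: str) -> float:
    """Hypothetical stand-in for a cross-modal model: returns mean patch brightness."""
    return float(patch.mean())


def selo_map(image, query, scales=(256, 512), stride_ratio=0.5, similarity=toy_similarity):
    """Build a SeLo-style map by averaging slice scores over every pixel they cover."""
    h, w = image.shape[:2]
    score_sum = np.zeros((h, w), dtype=np.float64)
    hits = np.zeros((h, w), dtype=np.float64)

    # Multi-scale segmentation: slide a square window of each size over the image.
    for size in scales:
        stride = max(1, int(size * stride_ratio))
        for top in range(0, max(h - size, 0) + 1, stride):
            for left in range(0, max(w - size, 0) + 1, stride):
                patch = image[top:top + size, left:left + size]
                score = similarity(patch, query)
                score_sum[top:top + size, left:left + size] += score
                hits[top:top + size, left:left + size] += 1

    # Pixel-level averaging; pixels covered by no slice stay at zero.
    averaged = np.divide(score_sum, hits, out=np.zeros_like(score_sum), where=hits > 0)

    # Min-max normalization to [0, 1] (assumption).
    lo, hi = averaged.min(), averaged.max()
    normalized = (averaged - lo) / (hi - lo) if hi > lo else np.zeros_like(averaged)

    # Crude noise suppression: zero out everything below the map's mean (assumption).
    normalized[normalized < normalized.mean()] = 0.0
    return normalized
```

In practice `similarity` would wrap a retrieval model such as those compared in the baselines, and the scales would match the ones listed there (128 to 768 pixels).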

We contribute test sets, evaluation metrics, and baselines for semantic localization, and provide a detailed demo of how to use this evaluation framework. If you have any questions, please open a GitHub issue. Start and enjoy!

-------------------------------------------------------------------------------------

DATASET AND METRICS

TESTDATA

We contribute a Semantic Localization Testset (SLT) to provide systematic evaluation for the SeLo task. The images in SLT come from Google Earth, and Fig. 4 shows several samples from the testset. Every sample includes a large RS scene image with a size from 3k × 2k to 10k × 10k, a query sentence, and one or more corresponding semantic bounding boxes.

Fig.4. Four samples of Semantic Localization Testset. (a) Query: “ships without cargo floating on the black sea are docked in the port”. (b) Query: “a white airplane ready to take off on a grayblack runway”. (c) Query: “some cars are parked in a parking lot surrounded by green woods”. (d) Query: “the green football field is surrounded by a red track”.
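To make the sample structure concrete, here is a small sketch of how one testset entry could be held in memory. The class name `SeLoSample`, its field names, and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical and are not taken from the repository's annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

# Box layout is an assumption: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class SeLoSample:
    """Hypothetical container for one sample: image size, query, and semantic boxes."""
    image_size: Tuple[int, int]  # (height, width)
    query: str
    boxes: List[Box]

    def gt_mask(self) -> np.ndarray:
        """Rasterize all bounding boxes into one boolean ground-truth mask."""
        h, w = self.image_size
        mask = np.zeros((h, w), dtype=bool)
        for x0, y0, x1, y1 in self.boxes:
            mask[y0:y1, x0:x1] = True
        return mask

    def gt_area_fraction(self) -> float:
        """Fraction of the image covered by the ground-truth boxes."""
        return float(self.gt_mask().mean())
```

A mask like this is a convenient starting point for comparing a generated SeLo map against the annotated regions.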

TABLE I Quantitative Statistics of Semantic Localization Testset.

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Word Number | 160 | Caption Ave Length | 11.2 |
| Sample Number | 59 | Ave Resolution Ratio (m) | 0.3245 |
| Channel Number | 3 | Ave Region Number | 1.75 |
| Image Number | 22 | Ave Attention Ratio | 0.068 |

METRICS

We systematically model and study semantic localization in detail, and propose multiple discriminative evaluation metrics to quantify this task based on significant area proportion, attention shift distance, and discrete attention distance.

Fig.5. Three proposed evaluation metrics for semantic localization. (a) Rsu aims to calculate the attention ratio of the ground-truth area to the useless area. (b) Ras attempts to quantify the shift distance of the attention from the GT center. (c) Rda evaluates the discreteness of the generated attention from probability divergence distance and candidate attention number.
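The exact formulas for these metrics are given in the paper, not in this README. As an intuition aid only, the sketch below implements two simplified proxies in the spirit of Rsu and Ras: the share of attention mass inside the ground-truth region, and the normalized distance between the attention centroid and the ground-truth centroid. The function names and both definitions are assumptions and should not be read as the official metrics.

```python
import numpy as np


def salient_ratio_proxy(attn: np.ndarray, gt_mask: np.ndarray) -> float:
    """Share of total attention mass falling inside the GT region (higher is better)."""
    total = attn.sum()
    return float(attn[gt_mask].sum() / total) if total > 0 else 0.0


def shift_distance_proxy(attn: np.ndarray, gt_mask: np.ndarray) -> float:
    """Distance between attention centroid and GT centroid, divided by the image diagonal."""
    h, w = attn.shape
    total = attn.sum()
    if total <= 0 or not gt_mask.any():
        return 1.0  # worst case when there is no attention or no annotation
    ys, xs = np.mgrid[0:h, 0:w]
    attn_center = np.array([(ys * attn).sum(), (xs * attn).sum()]) / total
    gt_center = np.array([ys[gt_mask].mean(), xs[gt_mask].mean()])
    return float(np.linalg.norm(attn_center - gt_center) / np.hypot(h, w))
```

A map that concentrates on the annotated region scores close to 1 on the first proxy and close to 0 on the second, which matches the ↑/↓ directions listed for Rsu and Ras.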

TABLE II Explanation of the indicators.

| Indicator | Range | Meaning |
| --- | --- | --- |
| Rsu | ↑ [ 0 ~ 1 ] | Calculates the salient area proportion |
| Ras | ↓ [ 0 ~ 1 ] | Makes the attention center close to the annotation center |
| Rda | ↑ [ 0 ~ 1 ] | Makes the attention center focus on one point |
| Rmi | ↑ [ 0 ~ 1 ] | Calculates the mean indicator of the SeLo task |
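This README does not state how Rmi combines the three indicators. The sketch below assumes a weighted mean in which Ras is inverted (since lower is better), with weights 0.4, 0.35, and 0.25. These weights are an assumption: they were chosen because they reproduce the Rmi column of the trainset comparison table to four decimals, so check the paper for the authoritative definition.

```python
def rmi(rsu: float, ras: float, rda: float, weights=(0.4, 0.35, 0.25)) -> float:
    """Weighted mean indicator; the weights are an assumption, not taken from this README."""
    w_su, w_as, w_da = weights
    # Ras is "lower is better", so it enters as (1 - Ras).
    return w_su * rsu + w_as * (1.0 - ras) + w_da * rda


# (Rsu, Rda, Ras, reported Rmi) rows copied from the trainset comparison table.
TRAINSET_ROWS = {
    "Sydney": (0.5844, 0.5670, 0.5026, 0.5496),
    "UCM": (0.5821, 0.4715, 0.5277, 0.5160),
    "RSITMD": (0.6920, 0.6667, 0.3323, 0.6772),
    "RSICD": (0.6661, 0.5773, 0.3875, 0.6251),
}

if __name__ == "__main__":
    for name, (rsu, rda, ras, reported) in TRAINSET_ROWS.items():
        print(f"{name}: computed {rmi(rsu, ras, rda):.4f}, reported {reported:.4f}")
```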

Fig.6. Qualitative analysis of SeLo indicator validity. (a) Query: “eight large white oil storage tanks built on grey concrete floor”. (b) Query: “a white plane parked in a tawny clearing inside the airport”. (c) Query: “lots of white and black planes parked inside the grey and white airport”.

BASELINES

All experiments were carried out on an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and a single NVIDIA RTX 3090 GPU.

Comparison of SeLo Performance on Different Trainsets

| Trainset | ↑ Rsu | ↑ Rda | ↓ Ras | ↑ Rmi |
| --- | --- | --- | --- | --- |
| Sydney | 0.5844 | 0.5670 | 0.5026 | 0.5496 |
| UCM | 0.5821 | 0.4715 | 0.5277 | 0.5160 |
| RSITMD | 0.6920 | 0.6667 | 0.3323 | 0.6772 |
| RSICD | 0.6661 | 0.5773 | 0.3875 | 0.6251 |

Comparison of SeLo Performance on Different Scales

| | Scale-128 | Scale-256 | Scale-512 | Scale-768 | ↑ Rsu | ↑ Rda | ↓ Ras | ↑ Rmi | Time (m) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| s1 | √ | √ | | | 0.6389 | 0.6488 | 0.2878 | 0.6670 | 33.81 |
| s2 | | √ | √ | | 0.6839 | 0.6030 | 0.3326 | 0.6579 | 14.25 |
| s3 | | | √ | √ | 0.6897 | 0.6371 | 0.3933 | 0.6475 | 11.23 |
| s4 | √ | √ | √ | | 0.6682 | 0.7072 | 0.2694 | 0.6998 | 34.60 |
| s5 | | √ | √ | √ | 0.6920 | 0.6667 | 0.3323 | 0.6772 | 16.92 |
| s6 | √ | √ | √ | √ | 0.6809 | 0.6884 | 0.3025 | 0.6886 | 36.28 |

Comparison of SeLo Performance on Different Retrieval Models

| Model | ↑ Rsu | ↑ Rda | ↓ Ras | ↑ Rmi | Time (m) |
| --- | --- | --- | --- | --- | --- |
| VSE++ | 0.6364 | 0.5829 | 0.4166 | 0.6045 | 15.61 |
| LW-MCR | 0.6698 | 0.6021 | 0.4335 | 0.6167 | 15.47 |
| SCAN | 0.6421 | 0.6132 | 0.3871 | 0.6247 | 16.32 |
| CAMP | 0.6819 | 0.6314 | 0.3912 | 0.6437 | 18.24 |
| AMFMN | 0.6920 | 0.6667 | 0.3323 | 0.6772 | 16.92 |

Analysis of Time Consumption

| Scale (128, 256) | Cut | Sim | Gnt | Flt | Total |
| --- | --- | --- | --- | --- | --- |
| Time (m) | 2.85 | 20.60 | 7.40 | 0.73 | 33.81 |
| Rate (%) | 8.42 | 60.94 | 21.88 | 2.16 | - |

| Scale (512, 768) | Cut | Sim | Gnt | Flt | Total |
| --- | --- | --- | --- | --- | --- |
| Time (m) | 0.46 | 1.17 | 6.96 | 0.67 | 11.23 |
| Rate (%) | 4.06 | 10.42 | 61.98 | 5.97 | - |

| Scale (256, 512, 768) | Cut | Sim | Gnt | Flt | Total |
| --- | --- | --- | --- | --- | --- |
| Time (m) | 0.93 | 5.72 | 7.38 | 0.74 | 16.92 |
| Rate (%) | 5.52 | 33.82 | 43.60 | 4.37 | - |

IMPLEMENTATION

ENVIRONMENT

1. Pull our project and install the requirements. Make sure the code path contains only English characters:

   $ apt-get install python3
   $ git clone git@github.com:xiaoyuan1996/SemanticLocalizationMetrics.git
   $ cd SemanticLocalizationMetri

