xGQA

This reporistory contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".

xGQA builds on the original work of Hudson et al. 2019: GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. The training data can be downloaded here.

Overview

The repository is structured as follows:

data/zero_shot/ contains the xGQA test-dev files for all 8 languages
data/few_shot/ contains the new standard splits for few shot learning. The number in the file name indicates how many distinct images the split includes. i.e. train_10.json implies that this subset contains questions about 10 distinct images.

Training Data

Please download the English training data of GQA (Hudson et al. 2019) here.

Zero-Shot Results

Zero-shot transfer results on xGQA when transferring from English GQA. Average accuracy is reported. Mean scores are not averaged over the source language (English). | model | en | de | pt | ru | id | bn | ko | zh | mean | |-----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:| | M3P | 58.43 | 23.93 | 24.37 | 20.37 | 22.57 | 15.83 | 16.90 | 18.60 | 20.37| | OSCAR+Emb | 62.23 | 17.35 | 19.25 | 10.52 | 18.26 | 14.93 | 17.10 | 16.41 | 16.26| | OSCAR+Ada | 60.30 | 18.91 | 27.02 | 17.50 | 18.77 | 15.42 | 15.28 | 14.96 | 18.27| | mBERTAda | 56.25 | 29.76 | 30.37 | 24.42 | 19.15 | 15.12 | 19.09 | 24.86 | 23.25|

Few-Shot

Few-shot dataset sizes. The GQA test-dev set is split into new development, test sets, and training splits of different sizes. We maintain the distribution of structural types in each split. | Set | Test | Dev | Train | | | | | | |------------|:----:|:----:|:-----:|-----|-----|-----|-----|------| | #Images | 300 | 50 | 1 | 5 | 10 | 20 | 25 | 48 | | #Questions | 9666 | 1422 | 27 | 155 | 317 | 594 | 704 | 1490 |

Citation

If you find this repository helpful, please cite our paper "xGQA: Cross-lingual Visual Question Answering":

@inproceedings{pfeiffer-etal-2021-xGQA,
    title={{xGQA: Cross-Lingual Visual Question Answering}},
    author={ Jonas Pfeiffer and Gregor Geigle and Aishwarya Kamath and Jan-Martin O. Steitz and Stefan Roth and Ivan Vuli{\'{c}} and Iryna Gurevych},
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = May,
    year = "2022",  
    url = "https://arxiv.org/pdf/2109.06082.pdf",
    publisher = "Association for Computational Linguistics",
}

Shield:

This work is licensed under a Creative Commons Attribution 4.0 International License.

XGQA

Install / Use

README