xGQA
This repository contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".
xGQA builds on the original work of Hudson et al. 2019: GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. The training data can be downloaded here.
Overview
The repository is structured as follows:
- `data/zero_shot/` contains the xGQA test-dev files for all 8 languages.
- `data/few_shot/` contains the new standard splits for few-shot learning. The number in the file name indicates how many distinct images the split includes; for example, `train_10.json` implies that this subset contains questions about 10 distinct images.
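The file-name convention can be checked with a short script. This is a minimal sketch, not part of the repository: it assumes each few-shot file is a JSON object mapping question ids to GQA-style records that carry an `imageId` field, and the helper names (`expected_image_count`, `distinct_image_count`, `check_split`) are hypothetical. Adjust the field name if the actual schema differs.

```python
import json
import re
from pathlib import Path


def expected_image_count(file_name: str) -> int:
    """Parse the image count encoded in a name such as 'train_10.json'."""
    match = re.fullmatch(r"train_(\d+)\.json", Path(file_name).name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return int(match.group(1))


def distinct_image_count(questions: dict) -> int:
    """Count distinct images.

    ASSUMPTION: records are GQA-style dicts with an 'imageId' key.
    """
    return len({record["imageId"] for record in questions.values()})


def check_split(path: str) -> bool:
    """Return True if the split holds as many images as its name claims."""
    with open(path, encoding="utf-8") as handle:
        questions = json.load(handle)
    return distinct_image_count(questions) == expected_image_count(path)
```

Calling `check_split` on a few-shot file returns `True` when the number of distinct images matches the number in the file name, provided the schema assumption holds.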
Training Data
Please download the English training data of GQA (Hudson et al. 2019) here.
Zero-Shot Results
Zero-shot transfer results on xGQA when transferring from English GQA. Average accuracy is reported. Mean scores are not averaged over the source language (English).

| model     |  en   |  de   |  pt   |  ru   |  id   |  bn   |  ko   |  zh   | mean  |
|-----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| M3P       | 58.43 | 23.93 | 24.37 | 20.37 | 22.57 | 15.83 | 16.90 | 18.60 | 20.37 |
| OSCAR+Emb | 62.23 | 17.35 | 19.25 | 10.52 | 18.26 | 14.93 | 17.10 | 16.41 | 16.26 |
| OSCAR+Ada | 60.30 | 18.91 | 27.02 | 17.50 | 18.77 | 15.42 | 15.28 | 14.96 | 18.27 |
| mBERTAda  | 56.25 | 29.76 | 30.37 | 24.42 | 19.15 | 15.12 | 19.09 | 24.86 | 23.25 |
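The mean column can be reproduced from the per-language accuracies by averaging over the seven target languages only. The snippet below is a small illustration using the numbers from the table above; the names `RESULTS` and `target_mean` are introduced here for the example only.

```python
# Zero-shot accuracies copied from the results table.
RESULTS = {
    "M3P": {"en": 58.43, "de": 23.93, "pt": 24.37, "ru": 20.37,
            "id": 22.57, "bn": 15.83, "ko": 16.90, "zh": 18.60},
    "OSCAR+Emb": {"en": 62.23, "de": 17.35, "pt": 19.25, "ru": 10.52,
                  "id": 18.26, "bn": 14.93, "ko": 17.10, "zh": 16.41},
    "OSCAR+Ada": {"en": 60.30, "de": 18.91, "pt": 27.02, "ru": 17.50,
                  "id": 18.77, "bn": 15.42, "ko": 15.28, "zh": 14.96},
    "mBERTAda": {"en": 56.25, "de": 29.76, "pt": 30.37, "ru": 24.42,
                 "id": 19.15, "bn": 15.12, "ko": 19.09, "zh": 24.86},
}


def target_mean(scores: dict, source_lang: str = "en") -> float:
    """Average accuracy over all languages except the source language."""
    targets = [acc for lang, acc in scores.items() if lang != source_lang]
    return sum(targets) / len(targets)


if __name__ == "__main__":
    for model, scores in RESULTS.items():
        print(f"{model}: {target_mean(scores):.2f}")
```

Rounded to two decimals, the computed values agree with the mean column (20.37, 16.26, 18.27, 23.25).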
Few-Shot
Few-shot dataset sizes. The GQA test-dev set is split into new development and test sets, plus training splits of different sizes. We maintain the distribution of structural types in each split.

| Set        | Test | Dev  | Train |     |     |     |     |      |
|------------|:----:|:----:|:-----:|-----|-----|-----|-----|------|
| #Images    | 300  | 50   | 1     | 5   | 10  | 20  | 25  | 48   |
| #Questions | 9666 | 1422 | 27    | 155 | 317 | 594 | 704 | 1490 |
Citation
If you find this repository helpful, please cite our paper "xGQA: Cross-lingual Visual Question Answering":
@inproceedings{pfeiffer-etal-2021-xGQA,
title={{xGQA: Cross-Lingual Visual Question Answering}},
author={ Jonas Pfeiffer and Gregor Geigle and Aishwarya Kamath and Jan-Martin O. Steitz and Stefan Roth and Ivan Vuli{\'{c}} and Iryna Gurevych},
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = May,
year = "2022",
url = "https://arxiv.org/pdf/2109.06082.pdf",
publisher = "Association for Computational Linguistics",
}
This work is licensed under a Creative Commons Attribution 4.0 International License.

