VideoQA
Video Question Answering: Datasets, Algorithms and Challenges
This repository maintains lists of code, leaderboards, datasets, and papers for Video Question Answering (VideoQA). If you find any errors, please don't hesitate to open an issue or pull request.
If you find this repository helpful for your work, please kindly cite the following paper. The BibTeX entry is listed below:
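At its core, VideoQA asks a model to pick (or generate) the answer that best matches a fused video–question representation. As a rough illustration of the multi-choice setting used by many of the benchmarks listed below, here is a minimal sketch; the averaging fusion and cosine scoring are illustrative placeholders, not the method of any listed paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def answer_multichoice(video_feat, question_feat, answer_feats):
    """Fuse video and question features by elementwise averaging, then
    return the index of the candidate answer closest to the fused vector."""
    fused = [(a + b) / 2 for a, b in zip(video_feat, question_feat)]
    scores = [cosine(fused, ans) for ans in answer_feats]
    return max(range(len(scores)), key=scores.__getitem__)

# The fused clue [0.5, 0.5] is closest to candidate 0 ([1, 1]).
print(answer_multichoice([1.0, 0.0], [0.0, 1.0], [[1.0, 1.0], [1.0, -1.0]]))
```

Real systems replace the averaging with learned cross-modal fusion (graphs, transformers, pre-trained encoders), which is exactly the axis along which the leaderboard entries below differ.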
<pre>
@inproceedings{zhong2022Video,
  title={Video Question Answering: Datasets, Algorithms and Challenges},
  author={Yaoyao Zhong and Junbin Xiao and Wei Ji and Yicong Li and Weihong Deng and Tat-Seng Chua},
  booktitle={The 2022 Conference on Empirical Methods in Natural Language Processing},
  year={2022},
}
</pre>
Contributors
Contributed by Yaoyao Zhong, Junbin Xiao and Wei Ji.
Thanks to our adviser Tat-Seng Chua for his support!
Resources
Open-sourced code
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;"> <table style="width:100%" border="2"> <thead> <tr> <th>Time</th> <th>Links</th> </tr> </thead> <tbody> <tr> <td>2016</td> <td><a href="https://github.com/makarandtapaswi/MovieQA_CVPR2016/">[MovieQA-CVPR]</a> </td> </tr> <tr> <td>2017</td> <td><a href="https://github.com/YunseokJANG/tgif-qa">[TGIF-CVPR]</a> <a href="https://github.com/teganmaharaj/movieFIB">[MovieFIB-CVPR]</a> <a href="https://github.com/JonghwanMun/MarioQA">[MarioQA-CVPR]</a> <a href="https://github.com/xudejing/video-question-answering">[ACMMM]</a> <a href="https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks">[DEMN-IJCAI]</a> <a href="https://github.com/s0sbazinga/videoqa-stan">[STAN-IJCAI]</a> <a href="https://github.com/ffmpbgrnn/VideoQA">[VideoQA(FiB)-IJCV]</a> <a href="https://github.com/ZJULearning/videoqa">[TIP]</a> </td> </tr> <tr> <td>2018</td> <td> <a href="https://github.com/yj-yu/lsmdc">[JSFusion-ECCV]</a> <a href="https://github.com/SVQA-founder/SVQA/tree/master/code">[SVQA-ACMMM]</a> </td> </tr> <tr> <td>2019</td> <td> <a href="https://github.com/fanchenyou/HME-VideoQA">[HME-CVPR]</a> <a href="https://github.com/wannature/video-qa-FAAAN">[FAAAN-TMM]</a> </td> </tr> <tr> <td>2020</td> <td> <a href="https://github.com/thaolmk54/hcrn-videoqa">[HCRN-CVPR]</a> <a href="https://github.com/chuangg/CLEVRER">[NS-DR-ICLR]</a> <a href="https://github.com/noagarcia/knowit-rock">[ROCK-AAAI]</a> <a href="https://github.com/Jumpin2/HGA">[HGA-AAAI]</a> <a href="https://github.com/jayleicn/TVQAplus">[TVQA+-ACL]</a> <a href="https://github.com/linjieli222/HERO">[HERO-EMNLP]</a> <a href="https://github.com/aurooj/MMFT-BERT">[MMFT-BERT-EMNLP]</a> <a href="https://github.com/jacobswan1/Video2Commonsense">[V2C-EMNLP]</a> <a href="https://github.com/op-multimodal/ACRTransformer">[ACRTransformer-TCSVT]</a> <a href="https://github.com/Jumperkables/tvqa_modality_bias">[Modality Bias-BMVC]</a> </td> </tr> <tr> <td>2021</td> <td> 
<a href="https://github.com/doc-doc/NExT-QA">[NExT-QA-CVPR]</a> <a href="https://github.com/doc-doc/NExT-OE">[NExT-OE-CVPR]</a> <a href="https://github.com/madeleinegrunde/AGQA_baselines_code">[AGQA-CVPR]</a> <a href="https://github.com/jayleicn/ClipBERT">[ClipBERT-CVPR]</a> <a href="https://github.com/antoyang/just-ask">[VQA-T-ICCV]</a> <a href="https://github.com/InterDigitalInc/DialogSummary-VideoQA">[DialogSummary-ICCV]</a> <a href="https://github.com/liveseongho/DramaQAChallenge2020">[DramaQA-AAAI]</a> <a href="https://github.com/zfchenUnique/DCL-Release">[DCL-ICLR]</a> <a href="https://github.com/rowanz/merlot">[MERLOT-NIPS]</a> <a href="https://github.com/dingmyu/VRDP">[VRDP-NIPS]</a> <a href="https://github.com/csbobby/STAR_Benchmark">[STAR-NIPS]</a> <a href="https://github.com/PengLiang-cn/PGAT">[PGAT-ACMMM]</a> <a href="https://github.com/ahjeongseo/MASN-pytorch">[MASN-ACL]</a> <a href="https://github.com/NJUPT-MCC/DualVGR-VideoQA">[DualVGR-TMM]</a> <a href="https://github.com/amanchadha/iPerceive">[iPerceive-BMVC]</a> <a href="https://github.com/Trunpm/TPT-for-VideoQA">[TPT-arXiv]</a> </td> </tr> <tr> <td>2022</td> <td> <a href="https://github.com/yl3800/IGV">[IGV-CVPR]</a> <a href="https://github.com/bcmi/Causal-VidQA">[Causal-VidQA-CVPR]</a> <a href="https://github.com/GeWu-Lab/MUSIC-AVQA">[MUSIC-AVQA-CVPR]</a> <a href="https://github.com/rowanz/merlot_reserve">[MERLOT Reserve-CVPR]</a> <a href="https://github.com/doc-doc/HQGA">[HQGA-AAAI]</a> <a href="https://github.com/sail-sg/VGT">[VGT-ECCV]</a> </td> </tr> </tbody> </table> </div>
Leaderboards
Inference QA
NExT-QA
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Techniques and Insights</th> <th>NExT-Val</th> <th>NExT-Test</th> </tr>
</thead>
<tbody>
<tr> <td>/</td> <td>Human Performance</td> <td>/</td> <td>88.4</td> <td>/</td> </tr>
<tr> <td>1</td> <td><a href="">[VGT-ECCV2022]</a></td> <td>Graph, Transformer, Hierarchical Learning, Multi-Granularity</td> <td>55.02</td> <td>53.68</td> </tr>
<tr> <td>2</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Buch_Revisiting_the_Video_in_Video-Language_Understanding_CVPR_2022_paper.pdf">[ATP-CVPR2022]</a></td> <td>Transformer, Cross-modal Pre-training and Fine-tuning</td> <td>54.3</td> <td>/</td> </tr>
<tr> <td>3</td> <td><a href="https://arxiv.org/pdf/2207.12783.pdf">[EIGV-ACMMM2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>53.7</td> </tr>
<tr> <td>4</td> <td><a href="https://arxiv.org/abs/2202.09277">[(2.5+1)D-Transformer-AAAI2022]</a></td> <td>Graph, Transformer, Multi-Granularity</td> <td>53.4</td> <td>/</td> </tr>
<tr> <td>5</td> <td><a href="https://arxiv.org/pdf/2204.11544.pdf">[MMA-arXiv2022]</a></td> <td>Graph, Hierarchical Learning</td> <td>53.3</td> <td>52.4</td> </tr>
<tr> <td>6</td> <td><a href="https://arxiv.org/abs/2112.06197">[HQGA-AAAI2022]</a></td> <td>Modular Networks, Graph, Hierarchical Learning, Multi-Granularity</td> <td>51.42</td> <td>51.75</td> </tr>
<tr> <td>7</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Invariant_Grounding_for_Video_Question_Answering_CVPR_2022_paper.pdf">[IGV-CVPR2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>51.34</td> </tr>
<tr> <td>8</td> <td><a href="https://ojs.aaai.org/index.php/AAAI/article/view/6767">[HGA-AAAI2020]</a></td> <td>Graph</td> <td>49.74</td> <td>50.01</td> </tr>
<tr> <td>9</td> <td><a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Le_Hierarchical_Conditional_Relation_Networks_for_Video_Question_Answering_CVPR_2020_paper.pdf">[HCRN-CVPR2020]</a></td> <td>Modular Networks, Hierarchical Learning</td> <td>48.20</td> <td>48.98</td> </tr>
</tbody>
</table>
</div>
Factoid QA
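All leaderboard numbers in this page are answer accuracies: the percentage of questions whose predicted answer matches the ground truth on the given split. A minimal sketch of that metric (function name is illustrative):

```python
def qa_accuracy(predictions, ground_truth):
    """Leaderboard-style answer accuracy: percent of questions where the
    predicted answer matches the ground-truth answer."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("prediction/answer lists must be non-empty and aligned")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return 100.0 * correct / len(ground_truth)

# 3 of 4 questions correct -> 75.0
print(qa_accuracy(["a2", "a0", "a1", "a3"], ["a2", "a1", "a1", "a3"]))
```

Note that some benchmarks (e.g. NExT-QA) additionally break this accuracy down by question type, such as causal vs. temporal questions; only the overall numbers are tabulated here.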
Pre-Training
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Cross-Modal Pre-Training</th> <th>TGIF-Frame</th> <th>MSVD-QA</th> <th>MSRVTT-QA</th> </tr>
</thead>
<tbody>
<tr> <td>1</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/c6d4eb15f1e84a36eff58eca3627c82e-Paper.pdf">[MERLOT-NIPS2021]</a></td> <td>Youtube-Temporal-180M &amp; Conceptual Captions-3M</td> <td>69.5</td> <td>/</td> <td>43.1</td> </tr>
<tr> <td>2</td> <td><a href="">[VIOLET]</a></td> <td>WebVid2.5M &amp; Youtube-Temporal-180M &amp; Conceptual Captions-3M</td> <td>68.9</td> <td>47.9</td> <td>43.9</td> </tr>
<tr> <td>3</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/dea184826614d3f4c608731389ed0c74-Paper.pdf">[SSRea-NIPS2021]</a></td> <td>Visual Genome &amp; COCO</td> <td>60.2</td> <td>45.5</td> <td>41.6</td> </tr>
<tr> <td>4</td> <td><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Just_Ask_Learning_To_Answer_Questions_From_Millions_of_Narrated_ICCV_2021_paper.pdf">[VQA-T-ICCV2021]</a></td> <td>HowToVQA69M</td> <td>/</td> <td>46.3</td> <td>41.5</td> </tr>
<tr> <td>5</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Lei_Less_Is_More_ClipBERT_for_Video-and-Language_Learning_via_Sparse_Sampling_CVPR_2021_paper.pdf">[ClipBERT-CVPR2021]</a></td> <td>Visual Genome &amp; COCO</td> <td>60.3</td> <td>/</td> <td>37.4</td> </tr>
</tbody>
</table>
</div>
No Pre-Training
