VideoQA
Video Question Answering: Datasets, Algorithms and Challenges
This repository maintains lists of code, leaderboards, datasets, and papers for Video Question Answering (VideoQA). If you find any errors, please don't hesitate to open an issue or pull request.
If you find this repository helpful for your work, please kindly cite the following paper. The BibTeX entry is listed below:
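At its core, VideoQA asks a model to pick (or generate) the answer that best matches a fused video–question representation. As a rough illustration of the multi-choice setting used by many of the benchmarks listed below, here is a minimal sketch; the averaging fusion and cosine scoring are illustrative placeholders, not the method of any listed paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def answer_multichoice(video_feat, question_feat, answer_feats):
    """Fuse video and question features by elementwise averaging, then
    return the index of the candidate answer closest to the fused vector."""
    fused = [(a + b) / 2 for a, b in zip(video_feat, question_feat)]
    scores = [cosine(fused, ans) for ans in answer_feats]
    return max(range(len(scores)), key=scores.__getitem__)

# The fused clue [0.5, 0.5] is closest to candidate 0 ([1, 1]).
print(answer_multichoice([1.0, 0.0], [0.0, 1.0], [[1.0, 1.0], [1.0, -1.0]]))
```

Real systems replace the averaging with learned cross-modal fusion (graphs, transformers, pre-trained encoders), which is exactly the axis along which the leaderboard entries below differ.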
<pre>
@inproceedings{zhong2022Video,
  title={Video Question Answering: Datasets, Algorithms and Challenges},
  author={Yaoyao Zhong and Junbin Xiao and Wei Ji and Yicong Li and Weihong Deng and Tat-Seng Chua},
  booktitle={The 2022 Conference on Empirical Methods in Natural Language Processing},
  year={2022},
}
</pre>
Contributors
Contributed by Yaoyao Zhong, Junbin Xiao and Wei Ji.
Thanks to our adviser Tat-Seng Chua for his support!
Resources
Open-sourced code
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;"> <table style="width:100%" border="2"> <thead> <tr> <th>Time</th> <th>Links</th> </tr> </thead> <tbody> <tr> <td>2016</td> <td><a href="https://github.com/makarandtapaswi/MovieQA_CVPR2016/">[MovieQA-CVPR]</a> </td> </tr> <tr> <td>2017</td> <td><a href="https://github.com/YunseokJANG/tgif-qa">[TGIF-CVPR]</a> <a href="https://github.com/teganmaharaj/movieFIB">[MovieFIB-CVPR]</a> <a href="https://github.com/JonghwanMun/MarioQA">[MarioQA-CVPR]</a> <a href="https://github.com/xudejing/video-question-answering">[ACMMM]</a> <a href="https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks">[DEMN-IJCAI]</a> <a href="https://github.com/s0sbazinga/videoqa-stan">[STAN-IJCAI]</a> <a href="https://github.com/ffmpbgrnn/VideoQA">[VideoQA(FiB)-IJCV]</a> <a href="https://github.com/ZJULearning/videoqa">[TIP]</a> </td> </tr> <tr> <td>2018</td> <td> <a href="https://github.com/yj-yu/lsmdc">[JSFusion-ECCV]</a> <a href="https://github.com/SVQA-founder/SVQA/tree/master/code">[SVQA-ACMMM]</a> </td> </tr> <tr> <td>2019</td> <td> <a href="https://github.com/fanchenyou/HME-VideoQA">[HME-CVPR]</a> <a href="https://github.com/wannature/video-qa-FAAAN">[FAAAN-TMM]</a> </td> </tr> <tr> <td>2020</td> <td> <a href="https://github.com/thaolmk54/hcrn-videoqa">[HCRN-CVPR]</a> <a href="https://github.com/chuangg/CLEVRER">[NS-DR-ICLR]</a> <a href="https://github.com/noagarcia/knowit-rock">[ROCK-AAAI]</a> <a href="https://github.com/Jumpin2/HGA">[HGA-AAAI]</a> <a href="https://github.com/jayleicn/TVQAplus">[TVQA+-ACL]</a> <a href="https://github.com/linjieli222/HERO">[HERO-EMNLP]</a> <a href="https://github.com/aurooj/MMFT-BERT">[MMFT-BERT-EMNLP]</a> <a href="https://github.com/jacobswan1/Video2Commonsense">[V2C-EMNLP]</a> <a href="https://github.com/op-multimodal/ACRTransformer">[ACRTransformer-TCSVT]</a> <a href="https://github.com/Jumperkables/tvqa_modality_bias">[Modality Bias-BMVC]</a> </td> </tr> <tr> <td>2021</td> <td> 
<a href="https://github.com/doc-doc/NExT-QA">[NExT-QA-CVPR]</a> <a href="https://github.com/doc-doc/NExT-OE">[NExT-OE-CVPR]</a> <a href="https://github.com/madeleinegrunde/AGQA_baselines_code">[AGQA-CVPR]</a> <a href="https://github.com/jayleicn/ClipBERT">[ClipBERT-CVPR]</a> <a href="https://github.com/antoyang/just-ask">[VQA-T-ICCV]</a> <a href="https://github.com/InterDigitalInc/DialogSummary-VideoQA">[DialogSummary-ICCV]</a> <a href="https://github.com/liveseongho/DramaQAChallenge2020">[DramaQA-AAAI]</a> <a href="https://github.com/zfchenUnique/DCL-Release">[DCL-ICLR]</a> <a href="https://github.com/rowanz/merlot">[MERLOT-NIPS]</a> <a href="https://github.com/dingmyu/VRDP">[VRDP-NIPS]</a> <a href="https://github.com/csbobby/STAR_Benchmark">[STAR-NIPS]</a> <a href="https://github.com/PengLiang-cn/PGAT">[PGAT-ACMMM]</a> <a href="https://github.com/ahjeongseo/MASN-pytorch">[MASN-ACL]</a> <a href="https://github.com/NJUPT-MCC/DualVGR-VideoQA">[DualVGR-TMM]</a> <a href="https://github.com/amanchadha/iPerceive">[iPerceive-BMVC]</a> <a href="https://github.com/Trunpm/TPT-for-VideoQA">[TPT-arXiv]</a> </td> </tr> <tr> <td>2022</td> <td> <a href="https://github.com/yl3800/IGV">[IGV-CVPR]</a> <a href="https://github.com/bcmi/Causal-VidQA">[Causal-VidQA-CVPR]</a> <a href="https://github.com/GeWu-Lab/MUSIC-AVQA">[MUSIC-AVQA-CVPR]</a> <a href="https://github.com/rowanz/merlot_reserve">[MERLOT Reserve-CVPR]</a> <a href="https://github.com/doc-doc/HQGA">[HQGA-AAAI]</a> <a href="https://github.com/sail-sg/VGT">[VGT-ECCV]</a> </td> </tr> </tbody> </table> </div>
Leaderboards
Inference QA
NExT-QA
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Techniques and Insights</th> <th>NExT-Val</th> <th>NExT-Test</th> </tr>
</thead>
<tbody>
<tr> <td>/</td> <td>Human Performance</td> <td>/</td> <td>88.4</td> <td>/</td> </tr>
<tr> <td>1</td> <td><a href="">[VGT-ECCV2022]</a></td> <td>Graph, Transformer, Hierarchical Learning, Multi-Granularity</td> <td>55.02</td> <td>53.68</td> </tr>
<tr> <td>2</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Buch_Revisiting_the_Video_in_Video-Language_Understanding_CVPR_2022_paper.pdf">[ATP-CVPR2022]</a></td> <td>Transformer, Cross-modal Pre-training and Fine-tuning</td> <td>54.3</td> <td>/</td> </tr>
<tr> <td>3</td> <td><a href="https://arxiv.org/pdf/2207.12783.pdf">[EIGV-ACMMM2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>53.7</td> </tr>
<tr> <td>4</td> <td><a href="https://arxiv.org/abs/2202.09277">[(2.5+1)D-Transformer-AAAI2022]</a></td> <td>Graph, Transformer, Multi-Granularity</td> <td>53.4</td> <td>/</td> </tr>
<tr> <td>5</td> <td><a href="https://arxiv.org/pdf/2204.11544.pdf">[MMA-arXiv2022]</a></td> <td>Graph, Hierarchical Learning</td> <td>53.3</td> <td>52.4</td> </tr>
<tr> <td>6</td> <td><a href="https://arxiv.org/abs/2112.06197">[HQGA-AAAI2022]</a></td> <td>Modular Networks, Graph, Hierarchical Learning, Multi-Granularity</td> <td>51.42</td> <td>51.75</td> </tr>
<tr> <td>7</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Invariant_Grounding_for_Video_Question_Answering_CVPR_2022_paper.pdf">[IGV-CVPR2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>51.34</td> </tr>
<tr> <td>8</td> <td><a href="https://ojs.aaai.org/index.php/AAAI/article/view/6767">[HGA-AAAI2020]</a></td> <td>Graph</td> <td>49.74</td> <td>50.01</td> </tr>
<tr> <td>9</td> <td><a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Le_Hierarchical_Conditional_Relation_Networks_for_Video_Question_Answering_CVPR_2020_paper.pdf">[HCRN-CVPR2020]</a></td> <td>Modular Networks, Hierarchical Learning</td> <td>48.20</td> <td>48.98</td> </tr>
</tbody>
</table>
</div>
Factoid QA
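All leaderboard numbers in this page are answer accuracies: the percentage of questions whose predicted answer matches the ground truth on the given split. A minimal sketch of that metric (function name is illustrative):

```python
def qa_accuracy(predictions, ground_truth):
    """Leaderboard-style answer accuracy: percent of questions where the
    predicted answer matches the ground-truth answer."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("prediction/answer lists must be non-empty and aligned")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return 100.0 * correct / len(ground_truth)

# 3 of 4 questions correct -> 75.0
print(qa_accuracy(["a2", "a0", "a1", "a3"], ["a2", "a1", "a1", "a3"]))
```

Note that some benchmarks (e.g. NExT-QA) additionally break this accuracy down by question type, such as causal vs. temporal questions; only the overall numbers are tabulated here.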
Pre-Training
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Cross-Modal Pre-Training</th> <th>TGIF-Frame</th> <th>MSVD-QA</th> <th>MSRVTT-QA</th> </tr>
</thead>
<tbody>
<tr> <td>1</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/c6d4eb15f1e84a36eff58eca3627c82e-Paper.pdf">[MERLOT-NIPS2021]</a></td> <td>Youtube-Temporal-180M &amp; Conceptual Captions-3M</td> <td>69.5</td> <td>/</td> <td>43.1</td> </tr>
<tr> <td>2</td> <td><a href="">[VIOLET]</a></td> <td>WebVid2.5M &amp; Youtube-Temporal-180M &amp; Conceptual Captions-3M</td> <td>68.9</td> <td>47.9</td> <td>43.9</td> </tr>
<tr> <td>3</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/dea184826614d3f4c608731389ed0c74-Paper.pdf">[SSRea-NIPS2021]</a></td> <td>Visual Genome &amp; COCO</td> <td>60.2</td> <td>45.5</td> <td>41.6</td> </tr>
<tr> <td>4</td> <td><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Just_Ask_Learning_To_Answer_Questions_From_Millions_of_Narrated_ICCV_2021_paper.pdf">[VQA-T-ICCV2021]</a></td> <td>HowToVQA69M</td> <td>/</td> <td>46.3</td> <td>41.5</td> </tr>
<tr> <td>5</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Lei_Less_Is_More_ClipBERT_for_Video-and-Language_Learning_via_Sparse_Sampling_CVPR_2021_paper.pdf">[ClipBERT-CVPR2021]</a></td> <td>Visual Genome &amp; COCO</td> <td>60.3</td> <td>/</td> <td>37.4</td> </tr>
</tbody>
</table>
</div>
No Pre-Training
