
Video Question Answering: Datasets, Algorithms and Challenges

This repository collects code links, leaderboards, and dataset and paper lists for Video Question Answering (VideoQA). If you find any errors, please don't hesitate to open an issue or pull request.

If you find this repository helpful for your work, please kindly cite the following paper. The BibTeX entry is listed below:

<pre>
@inproceedings{zhong2022Video,
  title={Video Question Answering: Datasets, Algorithms and Challenges},
  author={Yaoyao Zhong and Junbin Xiao and Wei Ji and Yicong Li and Weihong Deng and Tat-Seng Chua},
  booktitle={The 2022 Conference on Empirical Methods in Natural Language Processing},
  year={2022},
}
</pre>

Contributors

Contributed by Yaoyao Zhong, Junbin Xiao and Wei Ji.

Thanks for the support from our adviser Tat-Seng Chua!


Resources


Open-sourced code

<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Time</th> <th>Links</th> </tr>
</thead>
<tbody>
<tr> <td>2016</td> <td><a href="https://github.com/makarandtapaswi/MovieQA_CVPR2016/">[MovieQA-CVPR]</a></td> </tr>
<tr> <td>2017</td> <td><a href="https://github.com/YunseokJANG/tgif-qa">[TGIF-CVPR]</a> <a href="https://github.com/teganmaharaj/movieFIB">[MovieFIB-CVPR]</a> <a href="https://github.com/JonghwanMun/MarioQA">[MarioQA-CVPR]</a> <a href="https://github.com/xudejing/video-question-answering">[ACMMM]</a> <a href="https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks">[DEMN-IJCAI]</a> <a href="https://github.com/s0sbazinga/videoqa-stan">[STAN-IJCAI]</a> <a href="https://github.com/ffmpbgrnn/VideoQA">[VideoQA(FiB)-IJCV]</a> <a href="https://github.com/ZJULearning/videoqa">[TIP]</a></td> </tr>
<tr> <td>2018</td> <td><a href="https://github.com/yj-yu/lsmdc">[JSFusion-ECCV]</a> <a href="https://github.com/SVQA-founder/SVQA/tree/master/code">[SVQA-ACMMM]</a></td> </tr>
<tr> <td>2019</td> <td><a href="https://github.com/fanchenyou/HME-VideoQA">[HME-CVPR]</a> <a href="https://github.com/wannature/video-qa-FAAAN">[FAAAN-TMM]</a></td> </tr>
<tr> <td>2020</td> <td><a href="https://github.com/thaolmk54/hcrn-videoqa">[HCRN-CVPR]</a> <a href="https://github.com/chuangg/CLEVRER">[NS-DR-ICLR]</a> <a href="https://github.com/noagarcia/knowit-rock">[ROCK-AAAI]</a> <a href="https://github.com/Jumpin2/HGA">[HGA-AAAI]</a> <a href="https://github.com/jayleicn/TVQAplus">[TVQA+-ACL]</a> <a href="https://github.com/linjieli222/HERO">[HERO-EMNLP]</a> <a href="https://github.com/aurooj/MMFT-BERT">[MMFT-BERT-EMNLP]</a> <a href="https://github.com/jacobswan1/Video2Commonsense">[V2C-EMNLP]</a> <a href="https://github.com/op-multimodal/ACRTransformer">[ACRTransformer-TCSVT]</a> <a href="https://github.com/Jumperkables/tvqa_modality_bias">[Modality Bias-BMVC]</a></td> </tr>
<tr> <td>2021</td> <td><a href="https://github.com/doc-doc/NExT-QA">[NExT-QA-CVPR]</a> <a href="https://github.com/doc-doc/NExT-OE">[NExT-OE-CVPR]</a> <a href="https://github.com/madeleinegrunde/AGQA_baselines_code">[AGQA-CVPR]</a> <a href="https://github.com/jayleicn/ClipBERT">[ClipBERT-CVPR]</a> <a href="https://github.com/antoyang/just-ask">[VQA-T-ICCV]</a> <a href="https://github.com/InterDigitalInc/DialogSummary-VideoQA">[DialogSummary-ICCV]</a> <a href="https://github.com/liveseongho/DramaQAChallenge2020">[DramaQA-AAAI]</a> <a href="https://github.com/zfchenUnique/DCL-Release">[DCL-ICLR]</a> <a href="https://github.com/rowanz/merlot">[MERLOT-NIPS]</a> <a href="https://github.com/dingmyu/VRDP">[VRDP-NIPS]</a> <a href="https://github.com/csbobby/STAR_Benchmark">[STAR-NIPS]</a> <a href="https://github.com/PengLiang-cn/PGAT">[PGAT-ACMMM]</a> <a href="https://github.com/ahjeongseo/MASN-pytorch">[MASN-ACL]</a> <a href="https://github.com/NJUPT-MCC/DualVGR-VideoQA">[DualVGR-TMM]</a> <a href="https://github.com/amanchadha/iPerceive">[iPerceive-BMVC]</a> <a href="https://github.com/Trunpm/TPT-for-VideoQA">[TPT-arXiv]</a></td> </tr>
<tr> <td>2022</td> <td><a href="https://github.com/yl3800/IGV">[IGV-CVPR]</a> <a href="https://github.com/bcmi/Causal-VidQA">[Causal-VidQA-CVPR]</a> <a href="https://github.com/GeWu-Lab/MUSIC-AVQA">[MUSIC-AVQA-CVPR]</a> <a href="https://github.com/rowanz/merlot_reserve">[MERLOT Reserve-CVPR]</a> <a href="https://github.com/doc-doc/HQGA">[HQGA-AAAI]</a> <a href="https://github.com/sail-sg/VGT">[VGT-ECCV]</a></td> </tr>
</tbody>
</table>
</div>
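The repositories above differ in data formats and training stacks, but the headline numbers they report are mostly plain multiple-choice or open-ended accuracy. As a minimal, self-contained sketch of the multiple-choice case (the `preds.json` / `val_labels.json` format here is hypothetical, not the format of any specific repository above):

```python
import json

def multiple_choice_accuracy(pred_file: str, gt_file: str) -> float:
    """Share of questions whose predicted option index matches the label."""
    with open(pred_file) as f:
        preds = json.load(f)   # assumed shape: {"qid": option_index, ...}
    with open(gt_file) as f:
        labels = json.load(f)  # same shape, with ground-truth indices
    correct = sum(preds.get(qid) == ans for qid, ans in labels.items())
    return correct / len(labels)

if __name__ == "__main__":
    # Hypothetical file names; adapt to whichever repository you run.
    print(f"accuracy: {multiple_choice_accuracy('preds.json', 'val_labels.json'):.2%}")
```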

Leaderboards

Inference QA

NExT-QA
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Techniques and Insights</th> <th>NExT-Val</th> <th>NExT-Test</th> </tr>
</thead>
<tbody>
<tr> <td>/</td> <td>Human Performance</td> <td>/</td> <td>88.4</td> <td>/</td> </tr>
<tr> <td>1</td> <td><a href="https://github.com/sail-sg/VGT">[VGT-ECCV2022]</a></td> <td>Graph, Transformer, Hierarchical Learning, Multi-Granularity</td> <td>55.02</td> <td>53.68</td> </tr>
<tr> <td>2</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Buch_Revisiting_the_Video_in_Video-Language_Understanding_CVPR_2022_paper.pdf">[ATP-CVPR2022]</a></td> <td>Transformer, Cross-modal Pre-training and Fine-tuning</td> <td>54.3</td> <td>/</td> </tr>
<tr> <td>3</td> <td><a href="https://arxiv.org/pdf/2207.12783.pdf">[EIGV-ACMMM2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>53.7</td> </tr>
<tr> <td>4</td> <td><a href="https://arxiv.org/abs/2202.09277">[(2.5+1)D-Transformer-AAAI2022]</a></td> <td>Graph, Transformer, Multi-Granularity</td> <td>53.4</td> <td>/</td> </tr>
<tr> <td>5</td> <td><a href="https://arxiv.org/pdf/2204.11544.pdf">[MMA-arXiv2022]</a></td> <td>Graph, Hierarchical Learning</td> <td>53.3</td> <td>52.4</td> </tr>
<tr> <td>6</td> <td><a href="https://arxiv.org/abs/2112.06197">[HQGA-AAAI2022]</a></td> <td>Modular Networks, Graph, Hierarchical Learning, Multi-Granularity</td> <td>51.42</td> <td>51.75</td> </tr>
<tr> <td>7</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2022/papers/Li_Invariant_Grounding_for_Video_Question_Answering_CVPR_2022_paper.pdf">[IGV-CVPR2022]</a></td> <td>Causality, Graph</td> <td>/</td> <td>51.34</td> </tr>
<tr> <td>8</td> <td><a href="https://ojs.aaai.org/index.php/AAAI/article/view/6767">[HGA-AAAI2020]</a></td> <td>Graph</td> <td>49.74</td> <td>50.01</td> </tr>
<tr> <td>9</td> <td><a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Le_Hierarchical_Conditional_Relation_Networks_for_Video_Question_Answering_CVPR_2020_paper.pdf">[HCRN-CVPR2020]</a></td> <td>Modular Networks, Hierarchical Learning</td> <td>48.20</td> <td>48.98</td> </tr>
</tbody>
</table>
</div>
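Besides the overall scores above, NExT-QA is usually also reported per question group (causal, temporal, descriptive). A minimal sketch of per-group scoring follows; the `qid`, `type`, and `answer` column names are assumptions modeled on the dataset's annotation CSVs and should be checked against the release you download.

```python
import csv
from collections import defaultdict

# NExT-QA tags each question with a type code whose first letter gives the
# group: causal (C*), temporal (T*), or descriptive (D*).
def per_group_accuracy(anno_csv: str, preds: dict[str, int]) -> dict[str, float]:
    hits, totals = defaultdict(int), defaultdict(int)
    with open(anno_csv) as f:
        for row in csv.DictReader(f):
            group = row["type"][0]  # 'C', 'T', or 'D'
            totals[group] += 1
            if preds.get(row["qid"]) == int(row["answer"]):
                hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}
```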

Factoid QA

Pre-Training
<div style="overflow-x: auto; overflow-y: auto; height: auto; width:100%;">
<table style="width:100%" border="2">
<thead>
<tr> <th>Rank</th> <th>Name</th> <th>Cross-Modal Pre-Training</th> <th>TGIF-Frame</th> <th>MSVD-QA</th> <th>MSRVTT-QA</th> </tr>
</thead>
<tbody>
<tr> <td>1</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/c6d4eb15f1e84a36eff58eca3627c82e-Paper.pdf">[MERLOT-NIPS2021]</a></td> <td>Youtube-Temporal-180M & Conceptual Captions-3M</td> <td>69.5</td> <td>/</td> <td>43.1</td> </tr>
<tr> <td>2</td> <td><a href="">[VIOLET]</a></td> <td>WebVid2.5M & Youtube-Temporal-180M & Conceptual Captions-3M</td> <td>68.9</td> <td>47.9</td> <td>43.9</td> </tr>
<tr> <td>3</td> <td><a href="https://proceedings.neurips.cc/paper/2021/file/dea184826614d3f4c608731389ed0c74-Paper.pdf">[SSRea-NIPS2021]</a></td> <td>Visual Genome & COCO</td> <td>60.2</td> <td>45.5</td> <td>41.6</td> </tr>
<tr> <td>4</td> <td><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Just_Ask_Learning_To_Answer_Questions_From_Millions_of_Narrated_ICCV_2021_paper.pdf">[VQA-T-ICCV2021]</a></td> <td>HowToVQA69M</td> <td>/</td> <td>46.3</td> <td>41.5</td> </tr>
<tr> <td>5</td> <td><a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Lei_Less_Is_More_ClipBERT_for_Video-and-Language_Learning_via_Sparse_Sampling_CVPR_2021_paper.pdf">[ClipBERT-CVPR2021]</a></td> <td>Visual Genome & COCO</td> <td>60.3</td> <td>/</td> <td>37.4</td> </tr>
</tbody>
</table>
</div>
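The factoid benchmarks above (TGIF-Frame, MSVD-QA, MSRVTT-QA) are open-ended, and most listed methods treat them as classification over a fixed vocabulary of the most frequent training answers, reporting exact-match accuracy. A rough sketch of that protocol, assuming a top-1000 vocabulary (papers differ on the exact size, so check each protocol before comparing against the table):

```python
from collections import Counter

def build_answer_vocab(train_answers: list[str], k: int = 1000) -> dict[str, int]:
    """Map the k most frequent training answers to class indices."""
    most_common = Counter(train_answers).most_common(k)
    return {ans: idx for idx, (ans, _) in enumerate(most_common)}

# Test answers outside the vocabulary count as wrong, which slightly caps
# the attainable accuracy under this classification-style evaluation.
def open_ended_accuracy(preds: list[str], gts: list[str]) -> float:
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)
```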
No Pre-Training
