IGV
This repo contains code for Invariant Grounding for Video Question Answering
Overview
This repo contains source code for Invariant Grounding for Video Question Answering (CVPR 2022 Oral, Best Paper Finalists). In this work, we propose a new learning framework, Invariant Grounding for VideoQA (IGV), to ground the question-critical scene, whose causal relations with answers are invariant across different interventions on the complement. With IGV, VideoQA models are forced to shield the answering process from the negative influence of spurious correlations, which significantly improves their reasoning ability.
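The grounding-and-intervention idea described above can be illustrated with a toy sketch. This is not the repository's implementation: the function names, the binary grounding mask, and the string clip identifiers are all assumptions made for illustration. In a real model the mask would come from a learned grounding module and the clips would be feature tensors.

```python
import random


def split_by_grounding(clips, mask):
    """Split clips into the grounded (question-critical) scene and its complement."""
    if len(clips) != len(mask):
        raise ValueError("clips and mask must have the same length")
    causal = [c for c, m in zip(clips, mask) if m]
    complement = [c for c, m in zip(clips, mask) if not m]
    return causal, complement


def intervene_on_complement(clips, mask, other_clips, rng):
    """Keep grounded clips in place and replace every complement clip
    with a clip sampled from another video (hypothetical intervention)."""
    return [c if m else rng.choice(other_clips) for c, m in zip(clips, mask)]


# Toy example: clips 1 and 2 are assumed to be question-critical.
clips = ["v1_c0", "v1_c1", "v1_c2", "v1_c3"]
mask = [False, True, True, False]
other_clips = ["v2_c0", "v2_c1", "v2_c2"]

rng = random.Random(0)
causal, complement = split_by_grounding(clips, mask)
intervened = intervene_on_complement(clips, mask, other_clips, rng)
```

The invariance requirement then says that the answer predicted from the grounded scene should stay the same no matter which clips fill the complement positions in `intervened`.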
<p align="center"> <img src="figures/framework.png" height="200"> </p>
Installation
- Main packages: PyTorch = 1.11
- See requirements.txt for other packages.
Data Preparation
We use MSVD-QA as an example to help you get familiar with the code. Please download the dataset in dataset.zip and the pre-computed features here.
After downloading the data, please modify your data path and feature path in run.py.
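Before launching training, it can help to confirm that the configured locations actually exist. The snippet below is a minimal standalone check, not part of the repository; the helper name `missing_paths` and the two example directories are hypothetical, so substitute the paths you set in run.py.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).exists()]


# Hypothetical locations; replace with wherever you unpacked the downloads.
paths = {
    "data": "./dataset",
    "features": "./features",
}

for name in missing_paths(paths):
    print(f"Missing {name} path: {paths[name]}")
```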
Run IGV
Simply run train.sh to reproduce the results in the paper. We have saved our checkpoint here (41.42% accuracy on MSVD-QA) for your reference.
Reference
@InProceedings{Li_2022_CVPR,
author = {Li, Yicong and Wang, Xiang and Xiao, Junbin and Ji, Wei and Chua, Tat-Seng},
title = {Invariant Grounding for Video Question Answering},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {2928-2937}
}
Acknowledgement
Our reproduction of the methods is based on the respective official repositories and NExT-QA. We thank the authors for releasing their code.
