16 skills found
YehLi / Xmodaler: X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
jokieleung / Awesome Visual Question Answering: A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
rowanz / R2c: Recognition to Cognition Networks (code for the model in "From Recognition to Cognition: Visual Commonsense Reasoning", CVPR 2019).
yuweijiang / HGL Pytorch: Code for the model in "Heterogeneous Graph Learning for Visual Commonsense Reasoning" (NeurIPS 2019).
WadeYin9712 / GD VCR: Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).
guyyariv / VLMIG: The official PyTorch implementation of vLMIG, "Improving Visual Commonsense in Language Models via Multiple Image Generation".
AmingWu / CCN: Connective Cognition Network for Directional Visual Commonsense Reasoning.
PKU-ICST-MIPL / CKRM TCSVT2020: Source code of our TCSVT 2020 paper "Multi-level Knowledge Injecting for Visual Commonsense Reasoning".
zhangxi1997 / MCC: Code for the ACM MM 2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning".
yekeren / VCR Shortcut Effects Study: Code and data for our AAAI 2021 paper "A Case Study of the Shortcut Effects in Visual Commonsense Reasoning".
Gary-code / PEIFG: [ACM MM 2024] The released code for the paper "Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor".
marialymperaiou / Knowledge Enhanced Multimodal Learning: A list of research papers on knowledge-enhanced multimodal learning.
ZhuYun97 / Awesome Visual Reasoning Datasets: A curated collection of datasets for visual reasoning research across multiple domains, including mathematics, science, spatial understanding, and commonsense reasoning.
SDLZY / ARC: Code for the paper "Two Processes in One Step: Jointly Answering and Explaining for Visual Commonsense Reasoning".
zhangxi1997 / ECMR VCR: Code for the paper "Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning".
eric-ai-lab / ViCor: Implementation of the ACL 2024 Findings paper "ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models".