VisualCoT

Codebase for the AAAI 2024 paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning".

Overall framework

[Figure: overall VisualCoT framework]

Preprocess datasets

  • Download the COCO 2014 and 2017 datasets.
  • Download the OK-VQA and A-OKVQA datasets, following the PICa format.
  • For A-OKVQA, run the preprocessing script preprocess/preprocess_aokvqa.sh. For OK-VQA, adapt the same script slightly to its format; the two datasets have similar formats.
  • Build the training object-similarity file with object_similarity/object_similarity_aokvqa.sh (A-OKVQA) or object_similarity/object_similarity_okvqa.sh (OK-VQA).
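The preprocessing steps above can be chained from one small driver. The Python sketch below only wraps the shell scripts named in this README, in the order they are listed; it assumes they are invoked with bash from the repository root without arguments (paths configured inside the scripts may still need editing). PREPROCESS_STEPS, preprocessing_commands, and run_preprocessing are hypothetical names, not part of the repository.

```python
import subprocess
from typing import List

# Script paths as listed in this README; ASSUMED to be run from the repository root.
PREPROCESS_STEPS = {
    # A-OKVQA: preprocessing first, then the training object-similarity file.
    "aokvqa": [
        "preprocess/preprocess_aokvqa.sh",
        "object_similarity/object_similarity_aokvqa.sh",
    ],
    # OK-VQA: the A-OKVQA preprocessing script must first be adapted by hand,
    # so only the object-similarity step is listed here.
    "okvqa": [
        "object_similarity/object_similarity_okvqa.sh",
    ],
}


def preprocessing_commands(dataset: str) -> List[List[str]]:
    """Return the bash commands for `dataset`, in the order they should run."""
    if dataset not in PREPROCESS_STEPS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(PREPROCESS_STEPS)}")
    return [["bash", script] for script in PREPROCESS_STEPS[dataset]]


def run_preprocessing(dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Print each command; execute it only when dry_run is False, stopping at the first failure."""
    commands = preprocessing_commands(dataset)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return commands


if __name__ == "__main__":
    run_preprocessing("aokvqa")  # dry run: prints the commands without executing them
```

Defaulting to a dry run makes it easy to review the command order before anything touches the data; pass dry_run=False to actually execute the scripts.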

Prepare scene graphs and captions

  • Before running experiments, VisualCoT also needs scene graphs and captions: three files for each input image, one in each of input_text/scene_graph_text/scene_graph_coco17, input_text/scene_graph_text/scene_graph_coco17_attr, and input_text/scene_graph_text/scene_graph_coco17_caption. Each directory contains an example for image No. 57; follow the format of these examples to generate scene graphs for all other images.
  • If you prefer not to run a scene graph model yourself, we provide the scene graphs and captions we generated (they need additional processing to match the format of the three examples above):
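Because every image needs a file in all three directories, a quick consistency check before running experiments can catch gaps. The Python sketch below ASSUMES that each file is named after its image ID, so that file stems (names without extension) match across the three directories; the actual naming should be confirmed against the provided image No. 57 examples. SCENE_GRAPH_DIRS, image_ids, and missing_files are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Set

# The three directories named in this README, relative to the repository root.
SCENE_GRAPH_DIRS = [
    "input_text/scene_graph_text/scene_graph_coco17",
    "input_text/scene_graph_text/scene_graph_coco17_attr",
    "input_text/scene_graph_text/scene_graph_coco17_caption",
]


def image_ids(directory: Path) -> Set[str]:
    """Collect file stems in `directory`; ASSUMES each file is named after its image ID.

    A missing directory yields an empty set, so all of its images are reported as missing.
    """
    if not directory.is_dir():
        return set()
    return {p.stem for p in directory.iterdir() if p.is_file() and not p.name.startswith(".")}


def missing_files(root: Path) -> Dict[str, Set[str]]:
    """Map each directory to the image IDs found in another directory but absent from it."""
    ids_per_dir = {d: image_ids(root / d) for d in SCENE_GRAPH_DIRS}
    all_ids = set().union(*ids_per_dir.values())
    return {d: all_ids - ids for d, ids in ids_per_dir.items() if all_ids - ids}
```

For example, missing_files(Path(".")) run from the repository root returns an empty dict when every image ID appears in all three directories.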

Run experiments

  • run_aokvqa.sh for A-OKVQA
  • run_okvqa.sh for OK-VQA

Main Results

| Backbone    | OK-VQA test (DA) | A-OKVQA val (DA) | A-OKVQA test (DA) |
|-------------|------------------|------------------|-------------------|
| OPT-66B     | 44.6             | 46.4             | 46.0              |
| Llama-2-70B | 54.9             | 50.5             | 54.4              |

DA: direct-answer accuracy.
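Reading the results table as backbone gains is simple arithmetic: subtract the OPT-66B row from the Llama-2-70B row, column by column. The Python sketch below re-types the table values into a dictionary (RESULTS and backbone_gain are illustrative names, not part of the repository) and reports the differences in accuracy points.

```python
# Values re-typed from the Main Results table (direct-answer accuracy).
RESULTS = {
    "OPT-66B": {"OK-VQA test": 44.6, "A-OKVQA val": 46.4, "A-OKVQA test": 46.0},
    "Llama-2-70B": {"OK-VQA test": 54.9, "A-OKVQA val": 50.5, "A-OKVQA test": 54.4},
}


def backbone_gain(base: str, other: str) -> dict:
    """Per-split difference (other - base), rounded to one decimal to hide float noise."""
    return {split: round(RESULTS[other][split] - RESULTS[base][split], 1) for split in RESULTS[base]}


print(backbone_gain("OPT-66B", "Llama-2-70B"))
# {'OK-VQA test': 10.3, 'A-OKVQA val': 4.1, 'A-OKVQA test': 8.4}
```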

Cite

arXiv version

@article{chen2023see,
  title={Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning},
  author={Chen, Zhenfang and Zhou, Qinhong and Shen, Yikang and Hong, Yining and Sun, Zhiqing and Gutfreund, Dan and Gan, Chuang},
  journal={arXiv preprint arXiv:2301.05226},
  year={2023}
}