VisionGraph
The benchmark and datasets of the ICML 2024 paper "VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context"
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
<div align="center">Overview | News | Illustration | Citation</div>

## :sparkles: Overview
This repository contains the official implementation of our ICML 2024 paper, VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context.
VisionGraph is a benchmark for exploring the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight graph-problem tasks, ranging from connectivity to shortest-path problems. To make progress in this direction, we introduce a Description-Program-Reasoning (DPR) chain that improves the logical accuracy of reasoning through graphical structure description generation and algorithm-aware multi-step reasoning. All prompts, datasets, checkpoints, and evaluation methods related to VisionGraph and DPR are available in this repository for easy access and use.
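The DPR chain described above can be sketched as a three-stage prompting pipeline: describe the graph from the image, draft an algorithm-aware plan, then reason step by step over both. This is a minimal illustrative sketch only; the `query_lmm` function below is a hypothetical stand-in for any chat-style multimodal model call, not this repository's API, and the prompts are simplified placeholders rather than the actual prompts released here.

```python
# Sketch of a Description-Program-Reasoning (DPR) style chain.
# `query_lmm` is a hypothetical placeholder for a multimodal model call;
# the real prompts, models, and evaluation code live in this repository.

def query_lmm(prompt: str, image=None) -> str:
    """Placeholder LMM call; swap in a real multimodal model here."""
    return f"[model answer to: {prompt[:40]}...]"

def dpr_chain(graph_image) -> str:
    # Step 1 (Description): recover the graph structure from the image.
    description = query_lmm(
        "List the nodes and edges shown in this graph image.",
        image=graph_image,
    )
    # Step 2 (Program): draft algorithm-aware pseudocode for the task.
    program = query_lmm(
        f"Given this graph: {description}\n"
        "Write pseudocode (e.g., BFS) to decide whether it is connected."
    )
    # Step 3 (Reasoning): execute the plan step by step on the description.
    answer = query_lmm(
        f"Graph: {description}\nPlan: {program}\n"
        "Follow the plan step by step and state the final answer."
    )
    return answer
```

Separating structure recovery (step 1) from algorithmic planning (step 2) lets the final reasoning step condition on an explicit textual graph rather than raw pixels.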
Graph Data for VisionGraph can be accessed here. Graph Understanding Data for Training can be accessed here.
If you have any questions, please feel free to contact me at liyunxin987@163.com or open an issue in this repository.
:fire: News
[24.05.08] We have updated our paper: VisionGraph.
[24.05.11] We released the prompts, datasets, checkpoints, and evaluation methods related to VisionGraph and DPR.
:rocket: Illustration
The figures below give a detailed overview of VisionGraph and DPR.
<p align="center" width="60%"><img src="VisionGraph.png" alt="VisionGraph" style="width: 100%; display: block; margin: auto;"></p>
<p align="center" width="60%"><img src="DPR.png" alt="DPR" style="width: 100%; display: block; margin: auto;"></p>

## Citation
@article{li2024visiongraph,
  title={VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context},
  author={Yunxin Li and Baotian Hu and Haoyuan Shi and Wei Wang and Longyue Wang and Min Zhang},
  journal={arXiv preprint arXiv:2405.04950},
  year={2024},
}