StatsChartMWP

This is the official repository for the paper "StatsChartMWP: A Dataset for Evaluating Multimodal Mathematical Reasoning Abilities on Math Word Problems with Statistical Charts". The paper link is coming soon.

🏆 Leaderboard

The leaderboard is continuously being updated. If you have any new results to contribute, please feel free to reach out to us.

| # | Model | Method | Date | ALL | Bar | Hist | Line | Line-f | Scatter | D-axis | P-bar | Pie | Table | Comp | Radar |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | o3 | LMM | 2025-09-08 | 82.75 | 81.73 | 77.71 | 76.96 | 71.97 | 83.12 | 82.81 | 90.91 | 88.10 | 93.23 | 83.98 | 33.33 |
| 2 | Qwen2.5-VL-72B | LMM | 2025-09-08 | 71.12 | 78.45 | 59.51 | 68.45 | 56.90 | 54.37 | 65.62 | 63.64 | 65.78 | 85.89 | 61.07 | 41.67 |
| 3 | Qwen2-VL-72B | LMM | 2025-02-23 | 59.33 | 69.91 | 39.29 | 60.03 | 46.44 | 43.75 | 62.50 | 59.09 | 65.78 | 77.12 | 50.39 | 62.50 |
| 4 | GPT-4o | LMM | 2025-02-23 | 57.05 | 66.51 | 26.38 | 58.76 | 42.26 | 45.62 | 68.75 | 54.55 | 72.57 | 81.54 | 49.50 | 45.83 |
| 5 | InternVL2_5-78B | LMM | 2025-02-23 | 55.25 | 70.93 | 29.26 | 56.12 | 40.59 | 48.75 | 57.81 | 54.55 | 57.01 | 74.27 | 51.84 | 37.04 |
| 6 | GPT4 (GPT-4o) | LLM | 2025-02-23 | 46.95 | 59.98 | 13.30 | 52.72 | 35.98 | 27.50 | 45.31 | 27.27 | 59.19 | 71.85 | 38.82 | 20.83 |
| 7 | InternVL2-Llama3-76B | LMM | 2025-02-23 | 45.02 | 58.81 | 24.58 | 50.43 | 35.98 | 43.12 | 42.19 | 13.64 | 48.08 | 57.38 | 35.37 | 29.17 |
| 8 | Qwen2-VL-7B | LMM | 2025-02-23 | 37.46 | 45.67 | 20.16 | 39.29 | 30.96 | 31.25 | 65.62 | 36.36 | 44.54 | 51.25 | 25.70 | 62.50 |
| 9 | GPT-4V | LMM | 2025-02-23 | 34.28 | 38.57 | 12.10 | 40.48 | 28.87 | 30.00 | 39.06 | 18.18 | 38.25 | 55.67 | 27.89 | 33.33 |
| 10 | LLaVA-OV-72B | LMM | 2025-02-23 | 32.39 | 38.33 | 15.26 | 39.80 | 30.54 | 35.62 | 42.19 | 31.82 | 34.32 | 45.97 | 22.91 | 16.67 |
| 11 | GPT4 (GPT-4V) | LLM | 2025-02-23 | 31.47 | 38.11 | 8.61 | 39.12 | 22.18 | 20.62 | 35.94 | 4.55 | 34.71 | 52.46 | 24.36 | 20.83 |
| 12 | Qwen-VL-MAX | LLM | 2025-02-23 | 30.24 | 37.40 | 10.19 | 29.51 | 19.25 | 20.00 | 29.69 | 18.18 | 37.86 | 54.74 | 16.91 | 33.33 |
| 13 | IXC-2.5-7B | LMM | 2025-02-23 | 22.55 | 31.10 | 7.36 | 29.25 | 17.99 | 18.75 | 43.75 | 18.18 | 24.88 | 29.72 | 15.02 | 41.67 |
| 14 | Cambrian-34B | LMM | 2025-02-23 | 18.15 | 22.03 | 8.77 | 27.89 | 14.23 | 18.75 | 46.88 | 22.73 | 16.52 | 20.24 | 14.02 | 41.67 |
| 15 | LLaVA-NeXT-34B | LMM | 2025-02-23 | 15.67 | 20.96 | 5.45 | 23.13 | 13.39 | 20.00 | 25.00 | 4.55 | 14.06 | 19.24 | 12.44 | 20.83 |
| 16 | DeepSeek-VL-7B | LMM | 2025-02-23 | 13.20 | 16.06 | 4.63 | 21.43 | 11.72 | 12.50 | 28.12 | 4.55 | 14.16 | 15.47 | 9.78 | 8.33 |
| 17 | HPT-1.0 | LMM | 2025-02-23 | 10.10 | 9.91 | 5.07 | 17.77 | 9.62 | 10.62 | 26.56 | 9.09 | 7.18 | 10.62 | 11.56 | 29.17 |

📐 StatsChartMWP Dataset

The StatsChartMWP dataset is designed as a benchmark for developing AI models capable of understanding the multimodal information in math word problems (MWPs) with statistical charts. It incorporates a variety of chart forms, covering a broad visual spectrum and a wide range of mathematical knowledge competencies. Each item originates from a real-world educational context: problems formulated by mathematics educators, genuine student inquiries, and historical examination questions.

The dataset contains 8,514 unique MWPs with statistical charts, spanning 11 chart types: bar, line, line-function, dual-axis, pie, composite, radar, histogram, scatter, percentage-bar, and table.

A comparative example between our dataset and ChartQA and FigureQA is shown below. R-Steps denotes the average number of reasoning steps in the dataset.

[Figure: domains]

The StatsChartMWP dataset JSON file and images are provided in [data].
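As a rough sketch of how the JSON file might be inspected, the snippet below counts problems per chart type. The record layout (a list of objects with a `chart_type` field) and the helper names `load_records` and `count_by_chart_type` are assumptions made for illustration; check the actual file in [data] for the real field names.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load a JSON file that is assumed to hold a list of problem records."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def count_by_chart_type(records, type_key="chart_type"):
    """Count how many problems fall under each chart type.

    `type_key` is a hypothetical field name; adjust it to the real schema.
    Records without that field are counted under "unknown".
    """
    return Counter(rec.get(type_key, "unknown") for rec in records)


# Tiny in-memory stand-in for the dataset (made-up records, not real data).
sample = [
    {"id": 1, "chart_type": "bar"},
    {"id": 2, "chart_type": "pie"},
    {"id": 3, "chart_type": "bar"},
]
for chart_type, n in count_by_chart_type(sample).most_common():
    print(f"{chart_type}: {n}")
```

With the real file, `count_by_chart_type(load_records(path))` would give the same kind of summary, provided the assumed field name matches.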

🌟 CoTAR

Introduction

We introduce CoTAR, a data augmentation strategy that uses chain-of-thought (CoT) augmented reasoning to ease cross-modal alignment between representations of the visual medium (artificial figures) and representations of technical language and equations. Specifically, instead of directly using the concise textual solutions of the MWPs, we use a state-of-the-art LLM to convert them into detailed step-by-step explanations in a CoT-like format, which improves their logical clarity. Each step consists of a short step summary that explicitly states the purpose of the step and a concrete reasoning response. The step summary serves as a guiding directive for the logical analysis or computation required in the current step, while the concrete reasoning response explains in detail the process carried out in response to the step summary. The architecture of our method is illustrated below:

<p align="center"> <img src="assets/figures/architecture.png" width="100%"> <br> An illustration of CoTAR. (a) the original MWP with statistical chart. (b) the corresponding original solution. (c) the solution of CoTAR. The bold words are the step summaries and the following sentences are reasoning responses. </p>
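The step structure described above (a step summary followed by a reasoning response) can be sketched as a small data structure. This is an illustrative sketch only, not the repository's code: the `Step` class, the `render_solution` helper, and the example problem are all hypothetical. The bold step summaries follow the convention in the figure caption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    summary: str    # short statement of the step's purpose
    reasoning: str  # concrete reasoning response for that purpose


def render_solution(steps: List[Step]) -> str:
    """Render steps as numbered lines with the step summary in Markdown bold."""
    lines = []
    for i, step in enumerate(steps, start=1):
        lines.append(f"Step {i}. **{step.summary}** {step.reasoning}")
    return "\n".join(lines)


# Made-up example, not taken from the dataset.
steps = [
    Step("Read the values from the bar chart.",
         "Class A has 12 students and Class B has 18 students."),
    Step("Compute the total.",
         "12 + 18 = 30, so there are 30 students in total."),
]
print(render_solution(steps))
```

Keeping the summary and the reasoning as separate fields makes it easy to check that every step states its purpose before the detailed explanation.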

We fine-tuned Qwen2-VL-7B. Training on both problem and original-solution pairs and problem and augmented-solution pairs from our proprietary training dataset yielded an 8.76% improvement in algorithmic accuracy.
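One way to picture the mixed training set is the sketch below, which emits a training example for the original solution of each problem and a second one for its augmented solution when available. The field names (`image`, `question`, `solution`, `cotar_solution`) and the function `build_training_pairs` are hypothetical, and the actual training pipeline may differ.

```python
from typing import Dict, List


def build_training_pairs(problems: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Build one example per available solution for each problem.

    Every problem contributes its original solution; problems that also have
    an augmented (CoTAR-style) solution contribute a second example.
    """
    pairs = []
    for p in problems:
        base = {"image": p["image"], "question": p["question"]}
        pairs.append({**base, "answer": p["solution"], "source": "original"})
        if p.get("cotar_solution"):
            pairs.append({**base, "answer": p["cotar_solution"], "source": "augmented"})
    return pairs


# Made-up records for illustration only.
problems = [
    {"image": "q1.png", "question": "How many students in total?",
     "solution": "30", "cotar_solution": "Step 1. ... Step 2. ..."},
    {"image": "q2.png", "question": "Which month has the highest sales?",
     "solution": "March"},
]
for pair in build_training_pairs(problems):
    print(pair["source"], "|", pair["question"])
```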

Quick Start

Finetune

To fine-tune Qwen2-VL-7B, see the official GitHub repository of Qwen2-VL-7B.

CoTAR

The CoTAR prompt is provided in prompts. Run the main script to generate the CoTAR solution data:

python main.py

License

This work is marked with CC0 1.0

Related Work

Explore related research on vision-language models, focusing on multimodal LLMs and mathematical reasoning:

[ChartQA] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
[TABMWP] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
[MathVista] MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
[MathVerse] MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
[MATH-Vision] Measuring Multimodal Mathematical Reasoning with the MATH-Vision Dataset
[OlympiadBench] OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
[InternVL] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
[LLaVA] LLaVA: Large Language and Vision Assistant
