VFlowOpt

[ICCV 2025] VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization

Sihan Yang (1), Runsen Xu (1, 2), Chenhang Cui (3), Tai Wang (1, †), Dahua Lin (1, 2), Jiangmiao Pang (1, †)

1 Shanghai AI Laboratory, 2 The Chinese University of Hong Kong, 3 National University of Singapore. † Corresponding author.

📑 Paper: https://arxiv.org/pdf/2508.05211 | 📖 arXiv: https://arxiv.org/abs/2508.05211

🔔News

[2025-08-08]: We released our paper and code.

[2025-07-11]: Our paper is accepted by ICCV 2025! 🎉

Introduction

VFlowOpt is a novel, training-free token pruning framework that improves the efficiency of Large Multimodal Models (LMMs) by reducing the high computational cost of processing large numbers of visual tokens. It formulates pruning as an optimization problem and uses a visual information flow-guided method to automatically find the best pruning strategy for each LMM, minimizing performance degradation. Token importance is estimated more accurately by combining attention scores with image patch entropy, and a progressive pruning strategy with a token recycling mechanism preserves critical information. Experiments show that VFlowOpt can prune 90% of visual tokens while retaining 90% of the original performance, reducing KV-Cache memory by 89% and speeding up inference by 3.8x.
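This README does not say exactly how attention and entropy are combined or how recycled tokens are formed, so the following is only a minimal sketch of the general idea under stated assumptions, not the VFlowOpt implementation. It assumes patch entropy is the Shannon entropy of a 16-bin intensity histogram, that the two signals are min-max normalized and blended with a hypothetical weight alpha, and that the "recycled" token is a score-weighted average of the pruned tokens. The names patch_entropy, importance, prune_with_recycling, and alpha are illustrative and do not come from the repository.

```python
import math
from collections import Counter


def patch_entropy(pixels, bins=16):
    """Shannon entropy (in bits) of a patch's intensity histogram.

    `pixels` is a flat list of integer intensities in [0, 255].
    """
    counts = Counter(min(p * bins // 256, bins - 1) for p in pixels)
    n = len(pixels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def importance(attn, entropies, alpha=0.5):
    """Hypothetical blend of normalized attention and normalized patch entropy."""
    a, e = min_max(attn), min_max(entropies)
    return [alpha * ai + (1 - alpha) * ei for ai, ei in zip(a, e)]


def prune_with_recycling(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens (in original order) and append
    one extra token: the score-weighted average of the pruned tokens."""
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])
    dropped = order[keep:]
    kept = [list(tokens[i]) for i in kept_idx]
    if dropped:
        total = sum(scores[i] for i in dropped)
        if total > 0:
            weights = [scores[i] / total for i in dropped]
        else:
            weights = [1.0 / len(dropped)] * len(dropped)
        dim = len(tokens[0])
        recycled = [sum(w * tokens[i][d] for w, i in zip(weights, dropped))
                    for d in range(dim)]
        kept.append(recycled)
    return kept, kept_idx


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    patches = [[128] * 16, [0, 255] * 8, [0, 64, 128, 192] * 4, [128] * 16]
    entropies = [patch_entropy(p) for p in patches]  # 0, 1, 2 and 0 bits
    attn = [0.4, 0.1, 0.3, 0.2]
    scores = importance(attn, entropies)
    kept, kept_idx = prune_with_recycling(tokens, scores, keep=2)
    print(kept_idx)  # indices of the retained tokens
    print(kept[-1])  # the recycled summary token
```

Min-max normalization is used here only to put the two signals on a common scale before blending; the paper may weight or combine them differently. A progressive variant of this sketch would call prune_with_recycling again at later layers with a smaller keep.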

Quick Start

Installation

git clone https://github.com/sihany077/VFlowOpt.git
cd VFlowOpt
conda create -n VFlowOpt python=3.10 -y
conda activate VFlowOpt
bash setup.sh
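After bash setup.sh finishes, a quick sanity check can confirm that the active interpreter matches the python=3.10 environment created above and that the lmms-eval command used in the following steps is on PATH. This is a hypothetical helper (check_env is not part of the repository); it only inspects the current process and does not verify which packages setup.sh installed.

```python
import shutil
import sys


def check_env(expected=(3, 10), command="lmms-eval"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != tuple(expected):
        found = ".".join(map(str, sys.version_info[:2]))
        want = ".".join(map(str, expected))
        problems.append(f"Python {found} is active, expected {want}")
    if shutil.which(command) is None:
        problems.append(f"'{command}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_env()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks ready.")
```

Run it with the VFlowOpt environment activated; a warning about lmms-eval usually means setup.sh did not complete or the wrong environment is active.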

Run Optimization

First, in VFlowOpt/src/lmms_eval-0.2.4/lmms_eval/tasks/opt_data/opt_data.yaml, set dataset_path: to the directory where you downloaded the LLaVA-OneVision training data.
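If you prefer to script this edit, the helper below rewrites the top-level dataset_path line in place. It is a minimal sketch, assuming the key appears at the start of a line as "dataset_path: <value>"; set_dataset_path and update_file are hypothetical helpers. A plain line edit is used instead of a YAML load-and-dump round trip because lmms-eval task files can contain custom tags such as !function, which a safe YAML loader rejects, and dumping would also drop comments.

```python
import re
from pathlib import Path

# Matches a top-level (unindented) "dataset_path:" line and its current value.
_KEY = re.compile(r"^(dataset_path:)[ \t]*.*$", re.MULTILINE)


def set_dataset_path(yaml_text, new_path):
    """Return yaml_text with the first top-level dataset_path value replaced."""
    # A function replacement avoids backslash escapes in Windows-style paths.
    updated, count = _KEY.subn(lambda m: f"{m.group(1)} {new_path}", yaml_text, count=1)
    if count == 0:
        raise ValueError("no top-level 'dataset_path:' line found")
    return updated


def update_file(yaml_file, new_path):
    """Rewrite dataset_path inside yaml_file on disk."""
    path = Path(yaml_file)
    path.write_text(set_dataset_path(path.read_text(), new_path))
```

For example, update_file("VFlowOpt/src/lmms_eval-0.2.4/lmms_eval/tasks/opt_data/opt_data.yaml", "/data/llava_ov_train") points the task at a download directory; /data/llava_ov_train is only a placeholder.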

Next, adjust the code marked "NOTE" in VFlowOpt/src/lmms_eval-0.2.4/lmms_eval/models/llava_ov_opt_all.py to match the number of layers in your model and your computational budget.
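Before editing the NOTE-marked code, it can help to compare candidate settings by how much visual-token work they leave. The function below is a rough, hypothetical proxy, not taken from llava_ov_opt_all.py: it counts visual tokens processed per decoder layer, relative to no pruning, and ignores text tokens, the quadratic cost of attention, and the vision encoder. The layer count and schedule in the example are illustrative only.

```python
def visual_token_fraction(num_layers, schedule):
    """Fraction of per-layer visual-token work kept by a pruning schedule.

    `schedule` maps a 0-based layer index to the fraction of the ORIGINAL
    visual tokens kept from that layer onward, e.g. {9: 0.5, 18: 0.1}.
    """
    keep = 1.0
    total = 0.0
    for layer in range(num_layers):
        keep = schedule.get(layer, keep)  # the fraction changes only at scheduled layers
        total += keep
    return total / num_layers


if __name__ == "__main__":
    # Illustrative numbers only: a 28-layer LLM pruned at layers 9 and 18.
    print(round(visual_token_fraction(28, {9: 0.5, 18: 0.1}), 4))  # prints 0.5179
```

Because pruning later in the network saves less, this kind of estimate makes the trade-off between early, aggressive pruning and late, gentle pruning easy to see before running the optimization.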

lmms-eval --model llava_ov_opt_all --model_args pretrained=pathTo/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen_training_free,device_map=auto,enable_illava_vit=True,illava_vit_k=25,enable_illava_llm=True,illava_llm_k=9-18 --task opt_data --batch_size 1 --log_samples --log_samples_suffix llava_onevision_7b --output_path ./logs
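The --model_args value is a single comma-separated list of key=value pairs, so when sweeping settings such as illava_vit_k or illava_llm_k it can be less error-prone to build the command from a dictionary. This is a minimal sketch that uses only the flags shown in the command above; format_model_args and build_command are hypothetical helpers, and pathTo/ is the same placeholder path as in the README.

```python
import shlex


def format_model_args(args):
    """Join key=value pairs with commas; True/False are written as in the README command."""
    parts = []
    for key, value in args.items():
        if "," in str(value) or "=" in str(value):
            raise ValueError(f"value for {key!r} must not contain ',' or '='")
        parts.append(f"{key}={value}")
    return ",".join(parts)


def build_command(model, model_args, task, suffix, output_path="./logs"):
    """Return an argv list for lmms-eval that mirrors the README invocation."""
    return [
        "lmms-eval",
        "--model", model,
        "--model_args", format_model_args(model_args),
        "--task", task,
        "--batch_size", "1",
        "--log_samples",
        "--log_samples_suffix", suffix,
        "--output_path", output_path,
    ]


if __name__ == "__main__":
    model_args = {
        "pretrained": "pathTo/llava-onevision-qwen2-7b-ov",
        "conv_template": "qwen_1_5",
        "model_name": "llava_qwen_training_free",
        "device_map": "auto",
        "enable_illava_vit": True,
        "illava_vit_k": 25,
        "enable_illava_llm": True,
        "illava_llm_k": "9-18",
    }
    cmd = build_command("llava_ov_opt_all", model_args, "opt_data", "llava_onevision_7b")
    print(shlex.join(cmd))
    # To launch it instead of printing: subprocess.run(cmd, check=True)
```

Passing the argv list to subprocess.run avoids shell quoting issues; shlex.join is used only to print a copy-pasteable version.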

Run Evaluation

The pruning strategy used for evaluation is defined in self.illava_config at line 141 of VFlowOpt/src/lmms_eval-0.2.4/lmms_eval/models/llava_onevision_training_free.py; edit it to change the strategy. See the LMMs-Eval usage guide for further options.

lmms-eval --model llava_onevision_training_free --model_args pretrained=pathTo/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen_training_free,device_map=auto,enable_illava_vit=True,illava_vit_k=25,enable_illava_llm=True,illava_llm_k=9-18 --task mmstar --batch_size 1 --log_samples --log_samples_suffix llava_onevision_7b --output_path ./logs
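To check a "retained performance" figure for your own runs, you can compare a pruned run's score with an unpruned baseline on the same task. The sketch below assumes, without confirmation from this README, that lmms-eval writes a JSON file whose name contains "results" somewhere under --output_path, with a top-level "results" object mapping task names to metric dictionaries. Metric names differ per task, so the metric key is left to the caller. load_task_metrics and retained_ratio are hypothetical helpers.

```python
import json
from pathlib import Path


def load_task_metrics(output_path, task):
    """Return the metrics dict for `task` from the most recently modified
    JSON file under output_path whose name contains "results".

    Assumed layout (not confirmed by the README): {"results": {task: {...}}}.
    """
    files = sorted(Path(output_path).rglob("*results*.json"),
                   key=lambda p: p.stat().st_mtime)
    if not files:
        raise FileNotFoundError(f"no *results*.json found under {output_path}")
    data = json.loads(files[-1].read_text())
    return data["results"][task]


def retained_ratio(pruned_score, baseline_score):
    """Pruned score as a fraction of the unpruned baseline (0.9 means 90%)."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return pruned_score / baseline_score
```

For example, with a pruned run in ./logs and an unpruned run in ./logs_baseline (hypothetical directory names), call load_task_metrics on each for "mmstar", pick the metric key that appears in your results file, and pass the two values to retained_ratio.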

📄 License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Acknowledgment

This repo builds on iLLaVA, LMMs-Eval, and LLaVA-OneVision. We thank these teams for their open-source contributions.

Contact

Sihan Yang: sihany077@gmail.com