TextPecker
[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
<p align="center"><img src="assets/teaser.webp" width="95%"></p>
<p align="center"><img src="assets/method.png" width="95%"></p>

Hanshen Zhu<sup>1,*</sup>, Yuliang Liu<sup>1</sup>, Xuecheng Wu<sup>2</sup>, An-Lan Wang<sup>2</sup>, Hao Feng<sup>2</sup>, Dingkang Yang<sup>2</sup>, Chao Feng<sup>2</sup>, Can Huang<sup>2</sup>, Jingqun Tang<sup>2,†</sup>, Xiang Bai<sup>1,✉</sup>
<sup>1</sup> Huazhong University of Science & Technology <sup>2</sup> ByteDance <sup>†</sup> Project Lead. <sup>✉</sup> Corresponding Author.
Abstract
Visual Text Rendering (VTR) remains a critical challenge in text‑to‑image generation, where even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL‑based optimization.
As a result, even state‑of‑the‑art generators (e.g., Seedream4.0, Qwen‑Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character‑level structural‑anomaly annotations and develop a stroke‑editing synthesis engine to expand structural‑error coverage. Experiments show that TextPecker consistently improves diverse text‑to‑image models; even on the well‑optimized Qwen‑Image, it yields significant average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state of the art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step towards reliable and structurally faithful visual text generation.
📢 News
- Feb 24, 2026: Our arXiv paper is now publicly available.
- Feb 21, 2026: TextPecker has been accepted to CVPR 2026.
- Feb 18, 2026: We released LoRA weights for the TextPecker-optimized generative models: SD3.5-M, Flux.1-dev, and Qwen-Image.
- Feb 15, 2026: We released the official website, models, and datasets for TextPecker.
🔥 Quick Start
Training, deployment, and evaluation of TextPecker are all built upon ms-swift. We currently provide two versions of model checkpoints: TextPecker-8B-Qwen3VL and TextPecker-8B-InternVL3. For detailed environment setup and model deployment/testing instructions, please refer to the official documentation.
1️⃣ Environment Setup
```bash
git clone https://github.com/CIawevy/TextPecker.git
cd TextPecker/train
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
conda create -n TextPecker python=3.11.13 -y
conda activate TextPecker
pip install -e .
cd ..
sh install_all.sh
```
2️⃣ Download Models & Dataset
We have uploaded our models and datasets to Hugging Face. You can download them using the provided scripts. Modify parameters (e.g., local paths, HF token) in scripts/download_models.sh and scripts/download_dataset.sh as needed, then run bash scripts/download_xxx.sh (for models / datasets). Additionally, refer to DATA to use our data engine for synthesizing your own datasets if needed.
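For a rough picture of what character-level structural-anomaly annotations contain, the sketch below builds one record and computes a per-image anomaly rate. All field names here are illustrative assumptions, not the released dataset's actual schema; consult the downloaded dataset for the real format.

```python
# Illustrative sketch of a character-level structural-anomaly record.
# Field names ("image", "chars", "anomaly", ...) are assumptions for
# illustration only — check the released TextPecker dataset for the real schema.
record = {
    "image": "samples/0001.png",  # hypothetical rendered-text crop
    "text": "欢迎",                # ground-truth string
    "chars": [
        {"char": "欢", "bbox": [12, 8, 44, 40], "anomaly": None},
        {"char": "迎", "bbox": [46, 8, 78, 40], "anomaly": "stroke_distortion"},
    ],
}

def anomaly_rate(rec):
    """Fraction of characters flagged with any structural anomaly."""
    chars = rec["chars"]
    flagged = sum(1 for c in chars if c["anomaly"] is not None)
    return flagged / len(chars)

rate = anomaly_rate(record)  # one of two characters is flagged
```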
3️⃣ Deployment (See TRAIN for more details.)
Example
```bash
bash train/deploy_textpecker.sh
```
4️⃣ Demo
After deployment, you can run the following command to try our demo:
```bash
python eval/TextPecker_eval/demo.py
```
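Since deployment goes through ms-swift, the evaluator is served behind an OpenAI-compatible endpoint; a minimal client-side request might be built as sketched below. The model name, prompt, and request shape are assumptions for illustration — check `train/deploy_textpecker.sh` and the demo script for the actual values.

```python
import base64

def build_request(image_bytes, prompt, model="TextPecker-8B-Qwen3VL"):
    """Build an OpenAI-compatible chat payload with an inline base64 image.
    The model name and prompt are hypothetical; adapt to your deployment."""
    img_b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

payload = build_request(b"\x89PNG...", "Rate the structural quality of the rendered text.")
```

The payload can then be POSTed to the deployed server's `/v1/chat/completions` route with any HTTP client.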
🔥 Train & Eval
TextPecker training
TextPecker training, deployment, and evaluation are built on top of ms-swift. We provide backbone-specific training scripts under train folder. See TRAIN for more details.
VTR RL with TextPecker
Our RL framework builds on Flow-GRPO. We provide training code for optimizing text rendering models with TextPecker under ./RL/flow_grpo/. For details, please refer to RL.
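At a high level, GRPO-style training scores a group of generations per prompt and normalizes each sample's reward against its group. The sketch below shows only that normalization step under standard GRPO assumptions; it is not Flow-GRPO's actual implementation.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r_i - mean) / (std + eps) over the
    rewards of all generations produced for one prompt."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for 4 images generated from the same prompt, e.g. scores from
# the TextPecker evaluator (values are made up for illustration).
advs = group_relative_advantages([0.9, 0.7, 0.5, 0.3])
```

Generations scoring above the group mean get positive advantages and are reinforced; those below are suppressed, without needing a learned value baseline.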
Re-evaluate Benchmarks with TextPecker
TextPecker can evaluate text structural quality and image-level or box-level semantic consistency in any text generation or editing scenario. We provide re-evaluation instructions for the following benchmarks: OneIG-Bench, CVTG-2K, LongText, TextAtlas, LeX-Bench, and TIIF-Bench. For more details, see EVAL.
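As a rough picture of what such an evaluation aggregates, the sketch below combines a per-character structural score with a string-level semantic match into one number. The weighting and both scoring functions are illustrative assumptions, not TextPecker's actual metric.

```python
from difflib import SequenceMatcher

def structural_score(char_scores):
    """Mean of per-character structural-quality scores in [0, 1]
    (hypothetical; a real evaluator would predict these per character)."""
    return sum(char_scores) / len(char_scores)

def semantic_score(target, recognized):
    """String similarity between the prompt text and the recognized text."""
    return SequenceMatcher(None, target, recognized).ratio()

def overall(char_scores, target, recognized, w=0.5):
    """Weighted mix of structural fidelity and semantic alignment."""
    return w * structural_score(char_scores) + (1 - w) * semantic_score(target, recognized)

score = overall([1.0, 0.8, 0.6], "HELLO", "HELLO")
```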
🤗 Resource Collection
All fully open-sourced core resources for TextPecker are listed below:
Evaluator
| Variant | Model |
| --------- | ----- |
| InternVL-3 | TextPecker-8B-InternVL3 |
| Qwen3-VL | TextPecker-8B-Qwen3VL |
VTR Models
| Variant | Model |
| ----------- | ----- |
| SD3.5-M | SD3.5M-TextPecker-SQPA |
| Flux.1-dev | Flux.1-dev-TextPecker-SQPA |
| Qwen-Image | QwenImage-TextPecker-SQPA |
Dataset & Engine
| Type | Link |
| ------------------ | ---- |
| Evaluator Dataset | TextPecker-1.5M |
| VTR RL Dataset | TextPecker-RL |
| Engine | TextPecker-engine |
Acknowledgement
We sincerely thank ms-swift and Flow-GRPO for their valuable methodological contributions.
Additionally, we appreciate the support of TextAtlas5M, LeX-10k, SynTIGER, WanJuan1.0, Flux.1-dev, Qwen-Image, SD3.5, CogView4, Kolors, and Seedream4.0 for their roles in data generation.
