TextPecker
[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
<p align="center"><img src="assets/teaser.webp" width="95%"></p>
<p align="center"><img src="assets/method.png" width="95%"></p>

Hanshen Zhu<sup>1,*</sup>, Yuliang Liu<sup>1</sup>, Xuecheng Wu<sup>2</sup>, An-Lan Wang<sup>2</sup>, Hao Feng<sup>2</sup>, Dingkang Yang<sup>2</sup>, Chao Feng<sup>2</sup>, Can Huang<sup>2</sup>, Jingqun Tang<sup>2,†</sup>, Xiang Bai<sup>1,✉</sup>
<sup>1</sup> Huazhong University of Science & Technology <sup>2</sup> ByteDance <sup>†</sup> Project Lead. <sup>✉</sup> Corresponding Author.
Abstract
Visual Text Rendering (VTR) remains a critical challenge in text‑to‑image generation, where even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL‑based optimization.
As a result, even state‑of‑the‑art generators (e.g., Seedream4.0, Qwen‑Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character‑level structural‑anomaly annotations and develop a stroke‑editing synthesis engine to expand structural‑error coverage. Experiments show that TextPecker consistently improves diverse text‑to‑image models; even on the well‑optimized Qwen‑Image, it yields significant average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state of the art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step towards reliable and structurally faithful visual text generation.
📢 News
- Feb 24, 2026: Our arXiv paper is now publicly available.
- Feb 21, 2026: TextPecker has been accepted to CVPR 2026.
- Feb 18, 2026: We released LoRA weights for the TextPecker-optimized generative models: SD3.5-M, Flux.1-dev, and Qwen-Image.
- Feb 15, 2026: We released the official website, models, and datasets for TextPecker.
🔥 Quick Start
Training, deployment, and evaluation of TextPecker are all built upon ms-swift. We currently provide two versions of model checkpoints: TextPecker-8B-Qwen3VL and TextPecker-8B-InternVL3. For detailed environment setup and model deployment/testing instructions, please refer to the official documentation.
1️⃣ Environment Setup
```bash
git clone https://github.com/CIawevy/TextPecker.git
cd TextPecker/train
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
conda create -n TextPecker python=3.11.13 -y
conda activate TextPecker
pip install -e .
cd ..
sh install_all.sh
```
2️⃣ Download Models & Dataset
We have uploaded our models and datasets to Hugging Face. You can download them using the provided scripts. Modify parameters (e.g., local paths, HF token) in scripts/download_models.sh and scripts/download_dataset.sh as needed, then run bash scripts/download_xxx.sh (for models / datasets). Additionally, refer to DATA to use our data engine for synthesizing your own datasets if needed.
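For a rough picture of what character-level structural-anomaly annotations contain, the sketch below builds one record and computes a per-image anomaly rate. All field names here are illustrative assumptions, not the released dataset's actual schema; consult the downloaded dataset for the real format.

```python
# Illustrative sketch of a character-level structural-anomaly record.
# Field names ("image", "chars", "anomaly", ...) are assumptions for
# illustration only — check the released TextPecker dataset for the real schema.
record = {
    "image": "samples/0001.png",  # hypothetical rendered-text crop
    "text": "欢迎",                # ground-truth string
    "chars": [
        {"char": "欢", "bbox": [12, 8, 44, 40], "anomaly": None},
        {"char": "迎", "bbox": [46, 8, 78, 40], "anomaly": "stroke_distortion"},
    ],
}

def anomaly_rate(rec):
    """Fraction of characters flagged with any structural anomaly."""
    chars = rec["chars"]
    flagged = sum(1 for c in chars if c["anomaly"] is not None)
    return flagged / len(chars)

rate = anomaly_rate(record)  # one of two characters is flagged
```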
3️⃣ Deployment (See TRAIN for more details.)
Example
```bash
bash train/deploy_textpecker.sh
```
4️⃣ Demo
After deployment, you can run the following command to try our demo:
```bash
python eval/TextPecker_eval/demo.py
```
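Since deployment goes through ms-swift, the evaluator is served behind an OpenAI-compatible endpoint; a minimal client-side request might be built as sketched below. The model name, prompt, and request shape are assumptions for illustration — check `train/deploy_textpecker.sh` and the demo script for the actual values.

```python
import base64

def build_request(image_bytes, prompt, model="TextPecker-8B-Qwen3VL"):
    """Build an OpenAI-compatible chat payload with an inline base64 image.
    The model name and prompt are hypothetical; adapt to your deployment."""
    img_b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

payload = build_request(b"\x89PNG...", "Rate the structural quality of the rendered text.")
```

The payload can then be POSTed to the deployed server's `/v1/chat/completions` route with any HTTP client.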
🔥 Train & Eval
TextPecker training
TextPecker training, deployment, and evaluation are built on top of ms-swift. We provide backbone-specific training scripts under train folder. See TRAIN for more details.
VTR RL with TextPecker
Our RL framework builds on Flow-GRPO. We provide training code for optimizing text rendering models with TextPecker under ./RL/flow_grpo/. For details, please refer to RL.
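At a high level, GRPO-style training scores a group of generations per prompt and normalizes each sample's reward against its group. The sketch below shows only that normalization step under standard GRPO assumptions; it is not Flow-GRPO's actual implementation.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (r_i - mean) / (std + eps) over the
    rewards of all generations produced for one prompt."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for 4 images generated from the same prompt, e.g. scores from
# the TextPecker evaluator (values are made up for illustration).
advs = group_relative_advantages([0.9, 0.7, 0.5, 0.3])
```

Generations scoring above the group mean get positive advantages and are reinforced; those below are suppressed, without needing a learned value baseline.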
Re-evaluate Benchmarks with TextPecker
TextPecker can evaluate text structural quality and image-level or box-level semantic consistency in any text generation or editing scenario. We provide re-evaluation instructions for the following benchmarks: OneIG-Bench, CVTG-2K, LongText, TextAtlas, LeX-Bench, and TIIF-Bench. For more details, see EVAL.
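As a rough picture of what such an evaluation aggregates, the sketch below combines a per-character structural score with a string-level semantic match into one number. The weighting and both scoring functions are illustrative assumptions, not TextPecker's actual metric.

```python
from difflib import SequenceMatcher

def structural_score(char_scores):
    """Mean of per-character structural-quality scores in [0, 1]
    (hypothetical; a real evaluator would predict these per character)."""
    return sum(char_scores) / len(char_scores)

def semantic_score(target, recognized):
    """String similarity between the prompt text and the recognized text."""
    return SequenceMatcher(None, target, recognized).ratio()

def overall(char_scores, target, recognized, w=0.5):
    """Weighted mix of structural fidelity and semantic alignment."""
    return w * structural_score(char_scores) + (1 - w) * semantic_score(target, recognized)

score = overall([1.0, 0.8, 0.6], "HELLO", "HELLO")
```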
🤗 Resource Collection
All fully open-sourced core resources for TextPecker are listed below:
Evaluator
| Variant | Model |
| --------- | ----- |
| InternVL-3 | TextPecker-8B-InternVL3 |
| Qwen3-VL | TextPecker-8B-Qwen3VL |
VTR Models
| Variant | Model |
| ----------- | ----- |
| SD3.5-M | SD3.5M-TextPecker-SQPA |
| Flux.1-dev | Flux.1-dev-TextPecker-SQPA |
| Qwen-Image | QwenImage-TextPecker-SQPA |
Dataset & Engine
| Type | Link |
| ------------------ | ---- |
| Evaluator Dataset | TextPecker-1.5M |
| VTR RL Dataset | TextPecker-RL |
| Engine | TextPecker-engine |
Acknowledgement
We sincerely thank ms-swift and Flow-GRPO for their valuable methodological contributions.
Additionally, we appreciate the support of TextAtlas5M, LeX-10k, SynTIGER, WanJuan1.0, Flux.1-dev, Qwen-Image, SD3.5, CogView4, Kolors, and Seedream4.0 for their roles in data generation.
