
<p align="center"> <img src="assets/logo.png" alt="TextPecker" width="480"/> </p> <p align="center"> <a href="https://github.com/CIawevy/TextPecker"> <img src="https://img.shields.io/badge/TextPecker-Website-0A66C2?logo=safari&logoColor=white" alt="TextPecker Website" /> </a> <a href="https://arxiv.org/abs/2602.20903"> <img src="https://img.shields.io/badge/TextPecker-Paper-red?logo=arxiv&logoColor=red" alt="TextPecker Paper on arXiv" /> </a> <a href="https://huggingface.co/CIawevy/TextPecker-8B-InternVL3"> <img src="https://img.shields.io/badge/TextPecker-Model-yellow?logo=huggingface&logoColor=yellow" alt="TextPecker Model" /> </a> <a href="https://github.com/CIawevy/TextPecker/blob/main/eval/TextPecker_eval/demo.py"> <img src="https://img.shields.io/badge/TextPecker-Demo-blue?logo=googleplay&logoColor=blue" alt="TextPecker Demo" /> </a> <a href="https://huggingface.co/datasets/CIawevy/TextPecker-1.5M"> <img src="https://img.shields.io/badge/TextPecker1.5M-Dataset-orange?logo=huggingface&logoColor=yellow" alt="TextPecker-1.5M Dataset" /> </a> </p>

[CVPR2026] TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

Hanshen Zhu<sup>1,*</sup>, Yuliang Liu<sup>1</sup>, Xuecheng Wu<sup>2</sup>, An-Lan Wang<sup>2</sup>, Hao Feng<sup>2</sup>, Dingkang Yang<sup>2</sup>, Chao Feng<sup>2</sup>, Can Huang<sup>2</sup>, Jingqun Tang<sup>2,†</sup>, Xiang Bai<sup>1,✉</sup>

<sup>1</sup> Huazhong University of Science & Technology <sup>2</sup> ByteDance <sup>†</sup> Project Lead. <sup>✉</sup> Corresponding Author.

Abstract

Visual Text Rendering (VTR) remains a critical challenge in text‑to‑image generation: even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL‑based optimization.
As a result, even state‑of‑the‑art generators (e.g., Seedream4.0, Qwen‑Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character‑level structural‑anomaly annotations and develop a stroke‑editing synthesis engine to expand structural‑error coverage. Experiments show that TextPecker consistently improves diverse text‑to‑image models; even on the well‑optimized Qwen‑Image, it yields significant average gains of 4% in structural fidelity and 8.7% in semantic alignment for Chinese text rendering, establishing a new state of the art in high-fidelity VTR. Our work fills a gap in VTR optimization, providing a foundational step toward reliable and structurally faithful visual text generation.

<p align="center"><img src="assets/teaser.webp" width="95%"></p> <p align="center"><img src="assets/method.png" width="95%"></p> <!-- <p align="center"><img src="assets/motivation.png" width="95%"></p> --> <!-- <p align="center"><img src="assets/data_pipe.png" width="95%"></p> --> <!-- <p align="center"><img src="assets/eval.png" width="95%"></p> -->

📢 News

  • Feb 24, 2026: Our arXiv paper is now publicly available.
  • Feb 21, 2026: TextPecker has been accepted to CVPR 2026.
  • Feb 18, 2026: We released the LoRA weights for TextPecker-optimized generative models, including SD3.5-M, Flux.1-dev, and Qwen-Image.
  • Feb 15, 2026: We released the official website, model, and dataset for TextPecker.

🔥 Quick Start

Training, deployment, and evaluation of TextPecker are all built upon ms-swift. We currently provide two versions of model checkpoints: TextPecker-8B-Qwen3VL and TextPecker-8B-InternVL3. For detailed environment setup and model deployment/testing instructions, please refer to the official documentation.

1️⃣ Environment Setup

git clone https://github.com/CIawevy/TextPecker.git
cd TextPecker/train
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
conda create -n TextPecker python=3.11.13 -y
conda activate TextPecker
pip install -e .
cd ..
sh install_all.sh

2️⃣ Download Models & Dataset

We have uploaded our models and datasets to Hugging Face, and you can download them using the provided scripts. Modify parameters (e.g., local paths, HF token) in scripts/download_models.sh and scripts/download_dataset.sh as needed, then run bash scripts/download_models.sh or bash scripts/download_dataset.sh. Additionally, refer to DATA to use our data engine for synthesizing your own datasets if needed.
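As an alternative to the download scripts, the same resources can be fetched with `huggingface_hub` directly. A minimal sketch, assuming the resource names, local layout, and token handling shown here (only the Hugging Face repo ids come from this README):

```python
# Map short resource names to (repo_id, repo_type) pairs on Hugging Face.
# The short names and the ./checkpoints layout are illustrative assumptions.
RESOURCES = {
    "evaluator-internvl3": ("CIawevy/TextPecker-8B-InternVL3", "model"),
    "evaluator-dataset": ("CIawevy/TextPecker-1.5M", "dataset"),
}

def download_kwargs(name, local_root="./checkpoints", token=None):
    """Build keyword arguments for huggingface_hub.snapshot_download()."""
    repo_id, repo_type = RESOURCES[name]
    return {
        "repo_id": repo_id,
        "repo_type": repo_type,
        # Place each resource in a folder named after the repo.
        "local_dir": f"{local_root}/{repo_id.split('/')[-1]}",
        "token": token,  # needed only for gated or private repos
    }

if __name__ == "__main__":
    from huggingface_hub import snapshot_download
    snapshot_download(**download_kwargs("evaluator-internvl3"))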

3️⃣ Deployment (See TRAIN for more details.)

Example

bash train/deploy_textpecker.sh

4️⃣ Demo

After deployment, you can run the following command to try our demo:

python eval/TextPecker_eval/demo.py
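Since ms-swift deployments typically expose an OpenAI-compatible chat endpoint, the deployed evaluator can also be queried with a plain HTTP client. This is a sketch under that assumption; the host/port, model name, and prompt wording are placeholders, and eval/TextPecker_eval/demo.py remains the official client:

```python
import base64
import json
from urllib import request

def build_payload(image_path, text, model="TextPecker-8B-InternVL3"):
    """Build an OpenAI-style chat request carrying the image and target text."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                # Hypothetical prompt; the real instruction template ships
                # with the demo script.
                {"type": "text",
                 "text": f"Assess the rendered text quality. Target text: {text}"},
            ],
        }],
    }

def score_image(image_path, text,
                url="http://127.0.0.1:8000/v1/chat/completions"):
    """POST the request to the locally deployed server and return its reply."""
    req = request.Request(
        url,
        data=json.dumps(build_payload(image_path, text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```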

🔥 Train & Eval

TextPecker training

TextPecker training, deployment, and evaluation are built on top of ms-swift. We provide backbone-specific training scripts under the train/ folder. See TRAIN for more details.

VTR RL with TextPecker

Our RL framework builds on Flow-GRPO. We provide training code for optimizing text rendering models with TextPecker under ./RL/flow_grpo/. For details, please refer to RL.
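To make the reward plumbing concrete: GRPO-style methods (including Flow-GRPO) normalize each rollout's reward against its prompt group, so a per-image TextPecker score can slot in as the raw reward. The sketch below shows only that standard group-relative normalization, not the repo's exact code; `rewards` stands in for the evaluator's scores over one prompt's rollouts:

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With this shape, a structurally clean rendering scores above its group mean and pushes the policy toward it, while a degenerate group (all rewards equal) yields zero advantages and no update.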

Re-evaluate Benchmarks with TextPecker

TextPecker can evaluate text structural quality and image-level or box-level semantic consistency in any text generation or editing scenario. We provide re-evaluation instructions for the following benchmarks: OneIG-Bench, CVTG-2K, LongText, TextAtlas, LeX-Bench, and TIIF-Bench. For more details, see EVAL.
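When re-evaluating a benchmark, the per-image scores still have to be rolled up into benchmark-level numbers. An illustrative aggregation, in which the field names and the equal weighting of the two axes are assumptions (each benchmark's EVAL instructions define its own reporting):

```python
def aggregate(results):
    """Average per-image structural and semantic scores, plus a 50/50 blend."""
    n = len(results)
    struct = sum(r["structural_fidelity"] for r in results) / n
    sem = sum(r["semantic_alignment"] for r in results) / n
    return {
        "structural_fidelity": struct,
        "semantic_alignment": sem,
        # Hypothetical equal-weight overall score for quick comparison.
        "overall": 0.5 * struct + 0.5 * sem,
    }
```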

🤗 Resource Collection

All fully open-sourced core resources for TextPecker are listed below:

Evaluator

| Variant | Model |
| ---------- | ----- |
| InternVL-3 | TextPecker-8B-InternVL3 |
| Qwen3-VL | TextPecker-8B-Qwen3VL |

VTR Models

| Variant | Model |
| ----------- | ----- |
| SD3.5-M | SD3.5M-TextPecker-SQPA |
| Flux.1-dev | Flux.1-dev-TextPecker-SQPA |
| Qwen-Image | QwenImage-TextPecker-SQPA |

Dataset & Engine

| Type | Link |
| ----------------- | ---- |
| Evaluator Dataset | TextPecker-1.5M |
| VTR RL Dataset | TextPecker-RL |
| Engine | TextPecker-engine |

Acknowledgement

We sincerely thank ms-swift and Flow-GRPO for their valuable methodological contributions.

Additionally, we appreciate TextAtlas5M, LeX-10k, SynTIGER, WanJuan1.0, Flux.1-dev, Qwen-Image, SD3.5, CogView4, Kolors, and Seedream4.0 for their roles in data generation.

