UnionST

[CVPR 2026] Official data synthesis code of the paper "What’s Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution".

UnionST: A Strong Synthetic Engine for Scene Text Recognition

Introduction

Scene Text Recognition (STR) relies critically on large-scale, high-quality training data. Synthetic data is a cost-effective alternative to manually annotated real data, but existing rendering-based synthetic datasets suffer from insufficient diversity in corpus, font, and layout, and from a large domain gap with real-world text.

Key Advantages

🎯 100% Label Correctness: The rendering-based paradigm guarantees accurate labels, unlike generative models, whose outputs are visually appealing but error-prone.
⚡ Cost-Efficiency: CPU-based generation costs only 1/20 as much as diffusion-based methods and 1/10,000 as much as closed-source alternatives.
🚀 Strong Performance: UnionST-S (5M samples) outperforms 36M-scale traditional synthetic datasets on challenging STR benchmarks.

Dataset

The UnionST-S, UnionST-P, and UnionST-R datasets (each containing 5M samples) can be downloaded from Hugging Face. They are stored in the LMDB file format adopted by the mainstream STR protocol. We have also collected the other synthetic STR datasets compared in the paper, which are available here.
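As a rough illustration of reading one sample from such an LMDB dataset, here is a minimal sketch. It assumes the key layout commonly used by STR codebases (`num-samples`, `image-%09d`, `label-%09d`, with 1-based indices); this README does not state the layout, so check it against the downloaded files. The helper names `sample_keys` and `read_sample` are hypothetical, and the third-party `lmdb` package is required.

```python
from typing import Tuple


def sample_keys(index: int) -> Tuple[bytes, bytes]:
    """Build the image/label keys for a 1-based sample index.

    Assumption: keys follow the common STR convention
    'image-000000001' / 'label-000000001'.
    """
    if index < 1:
        raise ValueError("sample indices are assumed to start at 1")
    return f"image-{index:09d}".encode(), f"label-{index:09d}".encode()


def read_sample(lmdb_dir: str, index: int) -> Tuple[bytes, str]:
    """Return (encoded image bytes, label text) for one sample."""
    import lmdb  # third-party package: pip install lmdb

    # Read-only, lock-free access is enough for inspecting a dataset.
    env = lmdb.open(lmdb_dir, readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            # Assumption: the total sample count is stored under 'num-samples'.
            total = int(txn.get(b"num-samples").decode("utf-8"))
            if index > total:
                raise IndexError(f"index {index} exceeds {total} samples")
            image_key, label_key = sample_keys(index)
            image_bytes = txn.get(image_key)
            label = txn.get(label_key).decode("utf-8")
    finally:
        env.close()
    return image_bytes, label
```

For example, `read_sample("path/to/UnionST-S", 1)` (a placeholder path) would return the raw encoded image and its label string; the image bytes can then be decoded with any image library.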

Training Model

The SVTRv2-AR model is configured and implemented in OpenOCR. To train it on 8 GPUs:

cd OpenOCR
torchrun --nproc_per_node=8 tools/train_rec.py --c configs/rec/nrtr/svtrv2_nrtr_unionst.yml
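If fewer GPUs are available, the same command can be assembled with a different process count. The sketch below only builds and prints the command line; `build_train_command` is a hypothetical helper, not part of OpenOCR, and whether the config's batch size or learning rate should be adjusted for a different GPU count is not covered here.

```python
import shlex
from typing import List

# Config path taken from the command above.
DEFAULT_CONFIG = "configs/rec/nrtr/svtrv2_nrtr_unionst.yml"


def build_train_command(num_gpus: int, config: str = DEFAULT_CONFIG) -> List[str]:
    """Assemble the torchrun invocation as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        f"--nproc_per_node={num_gpus}",  # one process per GPU
        "tools/train_rec.py",
        "--c",
        config,
    ]


if __name__ == "__main__":
    # Print a shell-ready command to run from the OpenOCR directory.
    print(shlex.join(build_train_command(4)))
```

The argument list can also be passed directly to `subprocess.run(..., cwd="OpenOCR", check=True)` to launch training from Python.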

Some of our trained models are available on Hugging Face.

Citation

@inproceedings{ye2026wrong,
title={What's Wrong with Synthetic Data for Scene Text Recognition? A Strong Synthetic Engine with Diverse Simulations and Self-Evolution},
author={Ye, Xingsong and Du, Yongkun and Zhang, JiaXin and Li, Chen and LYU, Jing and Chen, Zhineng},
booktitle={CVPR},
year={2026}
}

License

"""
UnionST
Copyright (c) 2025-present YesianRohn
Based on SynthTIGER
Copyright (c) 2021-present NAVER Corp.
MIT License
"""

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Acknowledgements

We thank the authors of SynthText, SynthTIGER, SVTRv2, and Union14M for their open-source code and datasets.
Special thanks also go to the OpenOCR training framework.
