CDistNet
Official PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition (IJCV)
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
The official code of CDistNet.
Paper link: arXiv (http://arxiv.org/abs/2111.11011)
What's New
- [2023-08] 🌟 Our paper was accepted by IJCV.
- [2022-01] 🌟 Our code was released on GitHub.
- [2021-11] 🌟 The paper is available on arXiv: http://arxiv.org/abs/2111.11011

To Do List
- [x] HA-IC13 & CA-IC13
- [x] Pre-trained model
- [x] Cleaned code
- [ ] Documentation
- [ ] Distributed Training
Two New Datasets
We evaluated other SOTA methods on the HA-IC13 and CA-IC13 datasets.
CDistNet's advantage over the other methods becomes more pronounced as the character distance increases (columns 1 to 6).
HA-IC13
| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & Pretrained model |
| - | - | - | - | - | - | - | - |
| VisionLAN (ICCV 2021) | 93.58 | 92.88 | 89.97 | 82.26 | 72.23 | 61.03 | Official Code |
| ABINet (CVPR 2021) | 95.92 | 95.22 | 91.95 | 85.76 | 73.75 | 64.99 | Official Code |
| RobustScanner* (ECCV 2020) | 96.15 | 95.33 | 93.23 | 88.91 | 81.10 | 71.53 | -- |
| Transformer-baseline* | 96.27 | 95.45 | 92.42 | 86.46 | 79.35 | 72.46 | -- |
| CDistNet | 96.62 | 96.15 | 94.28 | 89.96 | 83.43 | 77.71 | -- |
CA-IC13
| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & Pretrained model |
| - | - | - | - | - | - | - | - |
| VisionLAN (ICCV 2021) | 94.87 | 92.77 | 84.01 | 75.03 | 64.29 | 52.74 | Official Code |
| ABINet (CVPR 2021) | 96.62 | 95.92 | 87.86 | 76.31 | 65.46 | 54.49 | Official Code |
| RobustScanner* (ECCV 2020) | 95.22 | 94.87 | 85.30 | 76.55 | 68.38 | 60.79 | -- |
| Transformer-baseline* | 95.68 | 94.40 | 85.88 | 75.85 | 65.93 | 58.58 | -- |
| CDistNet | 96.27 | 95.57 | 88.45 | 79.58 | 70.36 | 63.13 | -- |
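The trend in the two tables above can be checked with a few lines of arithmetic. The following sketch is not part of the CDistNet codebase: it copies the accuracies from the tables and computes CDistNet's margin over the best competing method at each character distance level. The names `HA_IC13`, `CA_IC13`, and `margin_over_best` are illustrative.

```python
# Accuracies (%) copied from the HA-IC13 and CA-IC13 tables.
# Each list holds the values for character distance levels 1 to 6.
HA_IC13 = {
    "VisionLAN": [93.58, 92.88, 89.97, 82.26, 72.23, 61.03],
    "ABINet": [95.92, 95.22, 91.95, 85.76, 73.75, 64.99],
    "RobustScanner*": [96.15, 95.33, 93.23, 88.91, 81.10, 71.53],
    "Transformer-baseline*": [96.27, 95.45, 92.42, 86.46, 79.35, 72.46],
    "CDistNet": [96.62, 96.15, 94.28, 89.96, 83.43, 77.71],
}
CA_IC13 = {
    "VisionLAN": [94.87, 92.77, 84.01, 75.03, 64.29, 52.74],
    "ABINet": [96.62, 95.92, 87.86, 76.31, 65.46, 54.49],
    "RobustScanner*": [95.22, 94.87, 85.30, 76.55, 68.38, 60.79],
    "Transformer-baseline*": [95.68, 94.40, 85.88, 75.85, 65.93, 58.58],
    "CDistNet": [96.27, 95.57, 88.45, 79.58, 70.36, 63.13],
}


def margin_over_best(table, ours="CDistNet"):
    """Return, per level, `ours` minus the best other method (percentage points)."""
    others = [scores for name, scores in table.items() if name != ours]
    # zip(*others) iterates level by level across all competing methods.
    best_other = [max(level_scores) for level_scores in zip(*others)]
    return [round(o - b, 2) for o, b in zip(table[ours], best_other)]


ha_margins = margin_over_best(HA_IC13)
ca_margins = margin_over_best(CA_IC13)
print("HA-IC13 margins:", ha_margins)
print("CA-IC13 margins:", ca_margins)
```

On HA-IC13 the margin grows from 0.35 points at level 1 to 5.25 points at level 6. On CA-IC13, CDistNet trails ABINet by 0.35 points at levels 1 and 2 and leads the best competitor by 2.34 points at level 6.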
Datasets
The datasets are the same as those used by ABINet.
- Training datasets
- Evaluation & test datasets. The LMDB datasets can be downloaded from BaiduNetdisk (passwd: 1dbv) or GoogleDrive.
  - ICDAR 2013 (IC13)
  - ICDAR 2015 (IC15)
  - IIIT5K Words (IIIT)
  - Street View Text (SVT)
  - Street View Text-Perspective (SVTP)
  - CUTE80 (CUTE)
- Augment IC13
  - HA-IC13 & CA-IC13: BaiduNetdisk (passwd: d6jd), GoogleDrive
The structure of the `dataset` directory is:

    dataset
    ├── eval
    │   ├── CUTE80
    │   ├── IC13_857
    │   ├── IC15_1811
    │   ├── IIIT5k_3000
    │   ├── SVT
    │   └── SVTP
    ├── train
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
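Before training or evaluating, it can help to confirm that the downloaded data matches this layout. The sketch below is not part of the repository. The helper `missing_dataset_dirs` and the list `EXPECTED_DIRS` are hypothetical names, and the root path `dataset` is assumed to be relative to the current working directory.

```python
from pathlib import Path

# Sub-directories taken from the directory structure listed above,
# expressed relative to the dataset root.
EXPECTED_DIRS = [
    "eval/CUTE80",
    "eval/IC13_857",
    "eval/IC15_1811",
    "eval/IIIT5k_3000",
    "eval/SVT",
    "eval/SVTP",
    "train/MJ/MJ_test",
    "train/MJ/MJ_train",
    "train/MJ/MJ_valid",
    "train/ST",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("dataset")  # assumed location of the data
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout looks complete.")
```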
Environment
The required packages are listed in env_cdistnet.yaml.
# Install
conda create -n CDistNet python=3.7
conda activate CDistNet
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
pip install opencv-python mmcv notebook numpy einops tensorboardX Pillow thop timm tornado tqdm matplotlib lmdb
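After installation, the pinned versions can be compared with what is actually installed. The following is a minimal sketch, not part of the repository. It assumes the conda `pytorch` package registers itself under the distribution name `torch`, and the helper `check_pins` is a hypothetical name.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    # Python 3.7 needs the importlib_metadata backport (pip install importlib_metadata).
    import importlib_metadata as metadata

# Versions pinned in the install commands above.
PINNED = {"torch": "1.5.1", "torchvision": "0.6.1"}


def check_pins(pinned, get_version=metadata.version):
    """Map each package name to (expected, installed); installed is None if missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


for name, (expected, installed) in check_pins(PINNED).items():
    # Ignore a local build suffix such as "+cu92" when comparing.
    ok = installed is not None and installed.split("+")[0] == expected
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {expected}, installed {installed} [{status}]")
```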
Pretrained Models
Get the pretrained models from BaiduNetdisk (passwd: d6jd) or GoogleDrive.
(The training log and result.csv are provided alongside the models.)
Place the pretrained model in models/reconstruct_CDistNetv3_3_10.
The performance of the pretrained models is summarized as follows:
Train
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config=configs/CDistNet_config.py
Eval
CUDA_VISIBLE_DEVICES=0 python eval.py --config=configs/CDistNet_config.py
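In both commands, `CUDA_VISIBLE_DEVICES` selects which GPUs the process may see: four for training and one for evaluation. For scripted runs, the same invocations can be assembled in Python. This is only a sketch under the assumption that it is run from the repository root. The helper `build_launch` is a hypothetical name, and only the script names and the `--config` flag shown above are used.

```python
import os
import sys


def build_launch(script, config, gpu_ids):
    """Return (command, environment) equivalent to the shell commands above."""
    cmd = [sys.executable, script, f"--config={config}"]
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpu_ids))
    return cmd, env


train_cmd, train_env = build_launch("train.py", "configs/CDistNet_config.py", [0, 1, 2, 3])
eval_cmd, eval_env = build_launch("eval.py", "configs/CDistNet_config.py", [0])

print(" ".join(train_cmd[1:]), "on GPUs", train_env["CUDA_VISIBLE_DEVICES"])
print(" ".join(eval_cmd[1:]), "on GPUs", eval_env["CUDA_VISIBLE_DEVICES"])

# To actually launch (requires the datasets and a CUDA-capable machine):
# import subprocess
# subprocess.run(train_cmd, env=train_env, check=True)
```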
Citation
@article{Zheng2021CDistNetPM,
title={CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition},
author={Tianlun Zheng and Zhineng Chen and Shancheng Fang and Hongtao Xie and Yu-Gang Jiang},
journal={ArXiv},
year={2021},
volume={abs/2111.11011}
}
