CDistNet

Official PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition (IJCV)

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The official code of CDistNet.

Paper link: arXiv (http://arxiv.org/abs/2111.11011)

What's New

- [2023-08]🌟 Our paper has been accepted by IJCV.
- [2022-01]🌟 Our code is released on GitHub.
- [2021-11]🌟 The paper is available on arXiv: http://arxiv.org/abs/2111.11011

Figure: CDistNet pipeline.

To Do List

- [x] HA-IC13 & CA-IC13
- [x] Pre-trained model
- [x] Cleaned code
- [ ] Documentation
- [ ] Distributed training

Two New Datasets

We also evaluate other SOTA methods on the HA-IC13 and CA-IC13 datasets.

CDistNet has a growing performance advantage over other SOTA methods as the character distance increases (1-6).

HA-IC13

| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & Pretrained Model |
| - | - | - | - | - | - | - | - |
| VisionLAN (ICCV 2021) | 93.58 | 92.88 | 89.97 | 82.26 | 72.23 | 61.03 | Official Code |
| ABINet (CVPR 2021) | 95.92 | 95.22 | 91.95 | 85.76 | 73.75 | 64.99 | Official Code |
| RobustScanner* (ECCV 2020) | 96.15 | 95.33 | 93.23 | 88.91 | 81.10 | 71.53 | -- |
| Transformer-baseline* | 96.27 | 95.45 | 92.42 | 86.46 | 79.35 | 72.46 | -- |
| CDistNet | 96.62 | 96.15 | 94.28 | 89.96 | 83.43 | 77.71 | -- |

CA-IC13

| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & Pretrained Model |
| - | - | - | - | - | - | - | - |
| VisionLAN (ICCV 2021) | 94.87 | 92.77 | 84.01 | 75.03 | 64.29 | 52.74 | Official Code |
| ABINet (CVPR 2021) | 96.62 | 95.92 | 87.86 | 76.31 | 65.46 | 54.49 | Official Code |
| RobustScanner* (ECCV 2020) | 95.22 | 94.87 | 85.30 | 76.55 | 68.38 | 60.79 | -- |
| Transformer-baseline* | 95.68 | 94.40 | 85.88 | 75.85 | 65.93 | 58.58 | -- |
| CDistNet | 96.27 | 95.57 | 88.45 | 79.58 | 70.36 | 63.13 | -- |
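
To make the trend in the two tables easier to read, the short sketch below computes, for each level 1-6, the gap between CDistNet and the best competing method. The numbers are copied from the HA-IC13 and CA-IC13 tables; the helper name `margin_over_best_baseline` is only illustrative and is not part of this repository.

```python
# Accuracy per level (1-6), copied from the HA-IC13 and CA-IC13 tables.
HA_IC13 = {
    "VisionLAN": [93.58, 92.88, 89.97, 82.26, 72.23, 61.03],
    "ABINet": [95.92, 95.22, 91.95, 85.76, 73.75, 64.99],
    "RobustScanner": [96.15, 95.33, 93.23, 88.91, 81.10, 71.53],
    "Transformer-baseline": [96.27, 95.45, 92.42, 86.46, 79.35, 72.46],
    "CDistNet": [96.62, 96.15, 94.28, 89.96, 83.43, 77.71],
}
CA_IC13 = {
    "VisionLAN": [94.87, 92.77, 84.01, 75.03, 64.29, 52.74],
    "ABINet": [96.62, 95.92, 87.86, 76.31, 65.46, 54.49],
    "RobustScanner": [95.22, 94.87, 85.30, 76.55, 68.38, 60.79],
    "Transformer-baseline": [95.68, 94.40, 85.88, 75.85, 65.93, 58.58],
    "CDistNet": [96.27, 95.57, 88.45, 79.58, 70.36, 63.13],
}


def margin_over_best_baseline(table, ours="CDistNet"):
    """Return, per level, our accuracy minus the best accuracy of any other method."""
    baselines = [scores for name, scores in table.items() if name != ours]
    # zip(*baselines) regroups the rows into one tuple of scores per level.
    best = [max(level_scores) for level_scores in zip(*baselines)]
    return [round(o - b, 2) for o, b in zip(table[ours], best)]


print("HA-IC13:", margin_over_best_baseline(HA_IC13))
print("CA-IC13:", margin_over_best_baseline(CA_IC13))
```

On HA-IC13 the margin grows from 0.35 points at level 1 to 5.25 points at level 6. On CA-IC13, CDistNet trails ABINet by 0.35 points at levels 1 and 2 and leads the best competitor by 2.34 points at level 6.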

Datasets

The datasets are the same as those used by ABINet.

Training datasets
1. MJSynth (MJ):
  - LMDB dataset BaiduNetdisk(passwd:n23k)
2. SynthText (ST):
  - LMDB dataset BaiduNetdisk(passwd:n23k)
Evaluation & test datasets (LMDB datasets can be downloaded from BaiduNetdisk (passwd:1dbv) or GoogleDrive)
1. ICDAR 2013 (IC13)
2. ICDAR 2015 (IC15)
3. IIIT5K Words (IIIT)
4. Street View Text (SVT)
5. Street View Text-Perspective (SVTP)
6. CUTE80 (CUTE)
Augmented IC13
- HA-IC13 & CA-IC13 : BaiduNetdisk(passwd:d6jd), GoogleDrive

The structure of the dataset directory is:

dataset
├── eval
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── train
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
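
Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch that only checks for the folders shown in the tree; the function name `missing_dataset_dirs` and the root path `dataset` are assumptions for illustration and are not part of this repository.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, copied from the tree above.
EXPECTED_DIRS = [
    "eval/CUTE80",
    "eval/IC13_857",
    "eval/IC15_1811",
    "eval/IIIT5k_3000",
    "eval/SVT",
    "eval/SVTP",
    "train/MJ/MJ_test",
    "train/MJ/MJ_train",
    "train/MJ/MJ_valid",
    "train/ST",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
```

Calling `missing_dataset_dirs("dataset")` returns an empty list when every folder is in place, and otherwise lists the relative paths that still need to be downloaded or renamed.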

Environment

The required packages are listed in env_cdistnet.yaml.

# Create and activate the conda environment
conda create -n CDistNet python=3.7
conda activate CDistNet
# Install PyTorch, then the remaining dependencies
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
pip install opencv-python mmcv notebook numpy einops tensorboardX Pillow thop timm tornado tqdm matplotlib lmdb
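
After installation, a quick way to see whether anything is missing is to ask Python which of the required modules it can locate. This is a small sketch rather than an official check; the list below uses the import names of the packages installed above (note that opencv-python is imported as `cv2` and Pillow as `PIL`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names for the packages installed above.
# opencv-python is imported as cv2 and Pillow as PIL.
REQUIRED_MODULES = [
    "torch", "torchvision", "cv2", "mmcv", "notebook", "numpy", "einops",
    "tensorboardX", "PIL", "thop", "timm", "tornado", "tqdm", "matplotlib",
    "lmdb",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_modules(REQUIRED_MODULES)` inside the activated environment should return an empty list. The check only confirms that each module can be found; it does not verify versions.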

Pretrained Models

Download the pretrained models from BaiduNetdisk (passwd:d6jd) or GoogleDrive. (The training log and result.csv are provided together with the models.) Place the pretrained model in models/reconstruct_CDistNetv3_3_10.

The performance of the pretrained models is summarized as follows:

Train

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config=configs/CDistNet_config.py

Eval

CUDA_VISIBLE_DEVICES=0 python eval.py --config=configs/CDistNet_config.py
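
If you prefer to launch these commands from a Python script (for example, to evaluate several configs in a loop), the sketch below builds the same command line and sets `CUDA_VISIBLE_DEVICES` for the child process only. The helper names `build_run` and `run` are assumptions for illustration; the script names and the `--config` flag are taken from the commands above.

```python
import os
import subprocess
import sys


def build_run(script, config, gpu_ids):
    """Return (command, environment) for running `script` on the given GPUs."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    cmd = [sys.executable, script, f"--config={config}"]
    return cmd, env


def run(script, config, gpu_ids):
    """Run the script and raise CalledProcessError if it exits with an error."""
    cmd, env = build_run(script, config, gpu_ids)
    return subprocess.run(cmd, env=env, check=True)
```

For instance, `run("eval.py", "configs/CDistNet_config.py", [0])` mirrors the Eval command, and `run("train.py", "configs/CDistNet_config.py", [0, 1, 2, 3])` mirrors the Train command.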

Citation

@article{Zheng2021CDistNetPM,
  title={CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition},
  author={Tianlun Zheng and Zhineng Chen and Shancheng Fang and Hongtao Xie and Yu-Gang Jiang},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.11011}
}
