DRRG

Deep relational reasoning graph network for arbitrary shape text detection; accepted at CVPR 2020 (Oral). http://arxiv.org/abs/2003.07493

This is an implementation of “Deep relational reasoning graph network for arbitrary shape text detection”.

News

[x] Our new work is available at https://github.com/GXYM/TextBPN-Plus-Plus.
[x] This project has been reproduced in MMOCR.
[x] This project has been reproduced in PaddleOCR.
[x] A Paddle reimplementation of this project is available in DRRG_Paddle. The reproduction is described on Paddle AI Studio.

Prerequisites

Python 3.7
PyTorch 1.2.0
NumPy >= 1.16
CUDA 10.1
GCC >= 9.0
opencv-python < 4.5.0
NVIDIA GPU (with 10 GB or more of GPU memory for inference)
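The Python-level version constraints listed above can be checked before compiling anything. The following is a minimal sketch and not part of the repository: the names `parse_version`, `REQUIREMENTS`, and `check_environment` are hypothetical, and the check does not cover CUDA, GCC, or GPU memory.

```python
import importlib
import re
import sys


def parse_version(text):
    """Return the first three numeric fields of a version string as a tuple of ints."""
    # Drop local build tags such as "+cu101" before extracting the numbers.
    return tuple(int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3])


# (module to import, requirement label, predicate applied to the parsed version)
REQUIREMENTS = (
    ("torch", "PyTorch 1.2.0", lambda v: v == (1, 2, 0)),
    ("numpy", "NumPy >= 1.16", lambda v: v >= (1, 16)),
    ("cv2", "opencv-python < 4.5.0", lambda v: v < (4, 5, 0)),
)


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 7):
        problems.append("Python 3.7: found %d.%d" % sys.version_info[:2])
    for module_name, requirement, is_ok in REQUIREMENTS:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems.append("%s: %s is not installed" % (requirement, module_name))
            continue
        if not is_ok(parse_version(module.__version__)):
            problems.append("%s: found %s" % (requirement, module.__version__))
    return problems
```

Calling `check_environment()` and printing each returned entry gives a quick list of mismatches to resolve before moving on to the compile step.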

Compile

In ./csrc, run make.
In ./nmslib/lanms, run make.
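The two build steps can also be scripted so that each `make` runs in its own directory without depending on the shell's current location. This is a sketch under the assumption that both paths are relative to the repository root; `compile_extensions`, `EXTENSION_DIRS`, and the injectable `run` parameter are hypothetical helpers, not part of the project.

```python
import subprocess
from pathlib import Path

# Directories named in the Compile step (assumed relative to the repository root).
EXTENSION_DIRS = ("csrc", "nmslib/lanms")


def compile_extensions(repo_root=".", run=subprocess.run):
    """Run `make` in each extension directory and return the directories built."""
    built = []
    for rel in EXTENSION_DIRS:
        build_dir = Path(repo_root) / rel
        # check=True makes a failed build raise CalledProcessError instead of passing silently.
        run(["make"], cwd=build_dir, check=True)
        built.append(build_dir)
    return built
```

Passing `cwd` to each call keeps the two builds independent, so a failure in the first directory stops the script before the second build starts.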

Data Links

Note: download the data and put it under the data directory.

Models

Trained models for Total-Text, CTW-1500, MSRA-TD500, MLT2017, and ICDAR2015 are all available here:
Google Drive or Baidu Drive (download code: cfat)

Train

cd tool
sh train_CTW1500.sh  # or run one of the other shell scripts

Modify the relevant training parameters to suit your environment, such as gpu_id and input_size:

#!/bin/bash
cd ../
CUDA_LAUNCH_BLOCKING=1 python train_TextGraph.py --exp_name Ctw1500 --max_epoch 600 --batch_size 6 --gpu 0 --input_size 640 --optim SGD --lr 0.001 --start_epoch 0 --viz --net vgg
# --resume pretrained/mlt2017_pretain/textgraph_vgg_100.pth  # loads the pretrained model; change this path to your own
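When several runs differ only in a few parameters, the command line from the script above can be assembled programmatically instead of editing the shell file by hand. The sketch below is an illustration only: `build_train_command` is a hypothetical helper, its defaults simply mirror the flags shown in the script, and it makes no claim about how train_TextGraph.py parses them.

```python
import shlex


def build_train_command(exp_name="Ctw1500", max_epoch=600, batch_size=6, gpu=0,
                        input_size=640, optim="SGD", lr=0.001, start_epoch=0,
                        viz=True, net="vgg", resume=None):
    """Return the train_TextGraph.py invocation as a list of argument strings."""
    cmd = ["python", "train_TextGraph.py",
           "--exp_name", exp_name,
           "--max_epoch", str(max_epoch),
           "--batch_size", str(batch_size),
           "--gpu", str(gpu),
           "--input_size", str(input_size),
           "--optim", optim,
           "--lr", str(lr),
           "--start_epoch", str(start_epoch)]
    if viz:
        cmd.append("--viz")
    cmd += ["--net", net]
    if resume is not None:
        # Path to a pretrained checkpoint; change it to your own location.
        cmd += ["--resume", resume]
    return cmd


def as_shell_line(cmd):
    """Quote each argument so the list can be pasted into a shell script."""
    return " ".join(shlex.quote(part) for part in cmd)
```

For example, `as_shell_line(build_train_command(gpu=1, input_size=512))` yields a line that can replace the `python train_TextGraph.py ...` part of the script; the `CUDA_LAUNCH_BLOCKING=1` prefix still has to be set in the environment separately.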

Eval

First, modify the relevant parameters in config.py and option.py as needed.

python eval_TextGraph.py  # test a single-round model
or
python batch_eval.py  # test multi-round models

Qualitative results

[Figures: two screenshots of qualitative results]

Citing the related works

@inproceedings{DBLP:conf/cvpr/ZhangZHLYWY20,
  author       = {Shi{-}Xue Zhang and
                  Xiaobin Zhu and
                  Jie{-}Bo Hou and
                  Chang Liu and
                  Chun Yang and
                  Hongfa Wang and
                  Xu{-}Cheng Yin},
  title        = {Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection},
  booktitle    = {2020 {IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
                  {CVPR} 2020, Seattle, WA, USA, June 13-19, 2020},
  pages        = {9696--9705},
  publisher    = {Computer Vision Foundation / {IEEE}},
  year         = {2020},
  doi          = {10.1109/CVPR42600.2020.00972},
}

@inproceedings{DBLP:conf/iccv/Zhang0YWY21,
  author    = {Shi{-}Xue Zhang and
               Xiaobin Zhu and
               Chun Yang and
               Hongfa Wang and
               Xu{-}Cheng Yin},
  title     = {Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection},
  booktitle = {2021 {IEEE/CVF} International Conference on Computer Vision, {ICCV} 2021, Montreal, QC, Canada, October 10-17, 2021},
  pages     = {1285--1294},
  publisher = {{IEEE}},
  year      = {2021},
}

@article{zhang2023arbitrary,
  title={Arbitrary shape text detection via boundary transformer},
  author={Zhang, Shi-Xue and Yang, Chun and Zhu, Xiaobin and Yin, Xu-Cheng},
  journal={IEEE Transactions on Multimedia},
  year={2023},
  publisher={IEEE}
}

@article{DBLP:journals/pami/ZhangZCHY23,
  author       = {Shi{-}Xue Zhang and
                  Xiaobin Zhu and
                  Lei Chen and
                  Jie{-}Bo Hou and
                  Xu{-}Cheng Yin},
  title        = {Arbitrary Shape Text Detection via Segmentation With Probability Maps},
  journal      = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume       = {45},
  number       = {3},
  pages        = {2736--2750},
  year         = {2023},
  url          = {https://doi.org/10.1109/TPAMI.2022.3176122},
  doi          = {10.1109/TPAMI.2022.3176122},
}

@article{zhang2022kernel,
  title={Kernel proposal network for arbitrary shape text detection},
  author={Zhang, Shi-Xue and Zhu, Xiaobin and Hou, Jie-Bo and Yang, Chun and Yin, Xu-Cheng},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2022},
  publisher={IEEE}
}

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
