TextPMs

Arbitrary Shape Text Detection via Segmentation with Probability Maps; accepted by TPAMI2022


TextPMs

This is a PyTorch implementation of "Arbitrary Shape Text Detection via Segmentation with Probability Maps".

NOTE: This paper and project were completed in January 2020 and accepted by PAMI in May 2022.

NEWS

2022.06.24 Our new work is available at https://github.com/GXYM/TextBPN-Plus-Plus.
2022.06.06 We updated the Google Drive links so that the files can be downloaded without requesting permission.

Prerequisites

Python 3.7
PyTorch 1.2.0
NumPy >= 1.16
CUDA >= 10.2
GCC >= 9.0
opencv-python < 4.5.0
NVIDIA GPU (1080, 2080 or 3090)

NOTE: We tested the code in the environment of Arch Linux+Python3.7 with 1080, and Arch Linux+Python3.9 with 2080. For other environments, the code may need to be adjusted slightly.
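The Python package constraints listed above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `satisfies`, `report`) are hypothetical, and only the three packages that expose a `__version__` attribute are checked. CUDA, GCC and the GPU model are not covered.

```python
import importlib
import operator
import re

# Constraints copied from the prerequisites list.
# Keys are import names (cv2 is the import name of opencv-python).
REQUIREMENTS = {
    "torch": ("==", "1.2.0"),
    "numpy": (">=", "1.16"),
    "cv2": ("<", "4.5.0"),
}

_OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints.

    Suffixes such as '+cu92' or 'rc1' are ignored, so this is only a rough parse.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, op, required):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return _OPS[op](a, b)


def report():
    """Print one line per package: installed version and whether it fits."""
    for module_name, (op, required) in REQUIREMENTS.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            print(f"{module_name}: not installed")
            continue
        installed = module.__version__
        status = "ok" if satisfies(installed, op, required) else "MISMATCH"
        print(f"{module_name} {installed} (want {op} {required}): {status}")


if __name__ == "__main__":
    report()
```

A mismatch does not necessarily mean the code will fail; as the note above says, other environments may only need slight adjustments.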

Makefile

If “pse” is used, some cpp files need to be compiled:

cd pse && make

Dataset Links

NOTE: The images of each dataset can be obtained from their official website.

Training

Prepare dataset

We provide a simple example for each dataset in data, such as Total-Text, CTW-1500, and MLT-2017 ...

Pre-training models

We provide some pre-trained models on SynText and MLT-2017: Baidu Drive (download code: 07pb), Google Drive

Models

Total-Text model: Baidu Drive (download code: ce36), Google Drive
CTW-1500 model: Baidu Drive (download code: 7gov), Google Drive
MSRA-TD500 model: Baidu Drive (download code: yocp), Google Drive
ICDAR2017 model: Baidu Drive (download code: eu1s), Google Drive

NOTE: The model for each benchmark is pre-trained on MLT-2017. The model trained on MLT-2017 is included in the pre-training models above, so there is no separate link for it here.

Running the training scripts

We provide training scripts for each dataset in scripts-train, such as Total-Text, MLT-2017. We also provide a pre-training script for SynText ...

Running Evaluation

run:

sh eval.sh

The details of eval.sh are as follows:

#!/bin/bash
###### test eval ############
##################### Total-Text ###################################
CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Totaltext --checkepoch 250 --test_size 640 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0 # --viz


###################### CTW-1500 ####################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Ctw1500 --checkepoch 480 --test_size 512 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0


#################### MSRA-TD500 ######################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name TD500 --checkepoch 125 --test_size 0 832 --threshold 0.45 --score_i 0.835 --recover watershed --gpu 0


#################### Icdar2015 ######################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Icdar2015 --checkepoch 370 --test_size 960 1920 --threshold 0.515 --score_i 0.85 --recover watershed --gpu 0

NOTE: To save the visualization results, enable the “--viz” option (it is commented out at the end of the Total-Text command above).
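To make the flags in the commands above easier to read, here is a minimal sketch of a parser that accepts them. It is a hypothetical mirror built only from the flags visible in eval.sh; the actual option definitions live in the repository and are not shown here, so every type, default and help string below is an assumption.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser mirroring the flags used in eval.sh (assumed types)."""
    parser = argparse.ArgumentParser(description="Sketch of the eval flags")
    parser.add_argument("--exp_name", type=str, default="Totaltext",
                        help="experiment / dataset name")
    parser.add_argument("--checkepoch", type=int, default=250,
                        help="epoch of the checkpoint to load")
    parser.add_argument("--test_size", type=int, nargs=2, default=[640, 1024],
                        help="two integers controlling the test image size")
    parser.add_argument("--threshold", type=float, default=0.4)
    parser.add_argument("--score_i", type=float, default=0.7)
    parser.add_argument("--recover", type=str, default="watershed")
    parser.add_argument("--gpu", type=str, default="0")
    # A switch: absent means False, present means True.
    parser.add_argument("--viz", action="store_true",
                        help="save visualization results")
    return parser


if __name__ == "__main__":
    # Flags copied from the MSRA-TD500 line of eval.sh, plus --viz.
    line = ("--exp_name TD500 --checkepoch 125 --test_size 0 832 "
            "--threshold 0.45 --score_i 0.835 --recover watershed --gpu 0 --viz")
    args = build_parser().parse_args(shlex.split(line))
    print(args)
```

Treating `--viz` as a switch matches how it is used in the commands: it takes no value, and removing the `#` in front of it turns visualization on.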

Demo

You can also run prediction on your own dataset without annotations. Here is an example:

#demo.sh
#!/bin/bash
CUDA_LAUNCH_BLOCKING=1 python3 demo.py --net resnet50 --exp_name Totaltext --checkepoch 250 --test_size 640 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0 --img_root ./demo  --viz

Evaluate the performance

Note that we provide the evaluation protocols for the benchmarks (Total-Text, CTW-1500, MSRA-TD500, ICDAR2015). The evaluation protocols embedded in the code are obtained from the official protocols. You don't need to run these protocols separately, because our test code calls these scripts automatically; please refer to "util/eval.py".
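Text detection benchmarks such as these conventionally report precision, recall and F-measure. As a rough illustration of how the three numbers relate, the sketch below computes them from raw counts. It is not the code in "util/eval.py": the function name and the example counts are hypothetical, and deciding which detections count as matched is protocol-specific and left out.

```python
def precision_recall_f(num_matched, num_detections, num_ground_truth):
    """Return (precision, recall, F-measure) from raw counts.

    num_matched: detections accepted as correct by the protocol's matching rule
    num_detections: all detections produced by the model
    num_ground_truth: all annotated text instances
    """
    precision = num_matched / num_detections if num_detections else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = precision_recall_f(num_matched=90, num_detections=100,
                                 num_ground_truth=120)
    print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Because the F-measure is a harmonic mean, it stays low when either precision or recall is low, which is why thresholds such as `--threshold` and `--score_i` are tuned per dataset in the commands above.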

Visualization

Citing the related works

Please cite the related works in your publications if it helps your research:

@article{DBLP:journals/pami/ZhangZCHY23,
  author       = {Shi{-}Xue Zhang and
                  Xiaobin Zhu and
                  Lei Chen and
                  Jie{-}Bo Hou and
                  Xu{-}Cheng Yin},
  title        = {Arbitrary Shape Text Detection via Segmentation With Probability Maps},
  journal      = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume       = {45},
  number       = {3},
  pages        = {2736--2750},
  year         = {2023},
  url          = {https://doi.org/10.1109/TPAMI.2022.3176122},
  doi          = {10.1109/TPAMI.2022.3176122},
  timestamp    = {Sat, 25 Feb 2023 21:35:10 +0100},
  biburl       = {https://dblp.org/rec/journals/pami/ZhangZCHY23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
