TextPMs
Arbitrary Shape Text Detection via Segmentation with Probability Maps; accepted by TPAMI2022
This is a PyTorch implementation of "Arbitrary Shape Text Detection via Segmentation with Probability Maps".

NOTE: This paper and project were completed in January 2020 and accepted by PAMI in May 2022.
NEWS
2022.06.24 Our new work is available at https://github.com/GXYM/TextBPN-Plus-Plus.
2022.06.06 We updated the Google cloud links so that the files can be downloaded without requesting permission.
Prerequisites
Python 3.7;
PyTorch 1.2.0;
NumPy >= 1.16;
CUDA >= 10.2;
GCC >= 9.0;
opencv-python < 4.5.0;
NVIDIA GPU (1080, 2080 or 3090);
NOTE: We tested the code in the environment of Arch Linux+Python3.7 with 1080, and Arch Linux+Python3.9 with 2080. For other environments, the code may need to be adjusted slightly.
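As a quick sanity check before training or evaluation, the installed versions can be compared against the list above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `report` are hypothetical, and it assumes the packages are importable as `torch`, `numpy`, and `cv2`. It only prints what it finds and does not enforce the version ranges.

```python
import importlib
import sys

# Version expectations copied from the Prerequisites list.
# Keys are import names (assumption: opencv-python is imported as cv2).
EXPECTED = {"torch": "1.2.0", "numpy": ">=1.16", "cv2": "<4.5.0"}


def installed_versions(modules=EXPECTED):
    """Return {module name: version string, or None if the import fails}."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken installation.
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


def report():
    """Build one human-readable line per requirement."""
    major, minor = sys.version_info[:2]
    lines = [f"python: running {major}.{minor}, README lists 3.7"]
    for name, version in installed_versions().items():
        shown = version if version is not None else "not installed"
        lines.append(f"{name}: found {shown}, README lists {EXPECTED[name]}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

CUDA and GCC are not checked here; `nvcc --version` and `gcc --version` report those.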
Makefile
If “pse” is used, some C++ files need to be compiled first:
cd pse && make
Dataset Links
NOTE: The images of each dataset can be obtained from their official website.
Training
Prepare dataset
We provide a simple example for each dataset in data, such as Total-Text, CTW-1500, and MLT-2017 ...
Pre-training models
We provide some pre-trained models on SynText and MLT-2017: Baidu Drive (download code: 07pb), Google Drive
Models
- Total-Text model: Baidu Drive (download code: ce36), Google Drive
- CTW-1500 model: Baidu Drive (download code: 7gov), Google Drive
- MSRA-TD500 model: Baidu Drive (download code: yocp), Google Drive
- ICDAR2017 model: Baidu Drive (download code: eu1s), Google Drive
NOTE: The model for each benchmark is pre-trained on MLT-2017. The trained MLT-2017 model is included in the pre-training models above, so there is no separate link for it here.
Running the training scripts
We provide training scripts for each dataset in scripts-train, such as Total-Text, MLT-2017. We also provide a pre-training script for SynText ...
Running Evaluation
run:
sh eval.sh
The details in eval.sh are as follows:
#!/bin/bash
###### test eval ############
##################### Total-Text ###################################
CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Totaltext --checkepoch 250 --test_size 640 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0 # --viz
###################### CTW-1500 ####################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Ctw1500 --checkepoch 480 --test_size 512 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0
#################### MSRA-TD500 ######################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name TD500 --checkepoch 125 --test_size 0 832 --threshold 0.45 --score_i 0.835 --recover watershed --gpu 0
#################### Icdar2015 ######################################
#CUDA_LAUNCH_BLOCKING=1 python3 eval_TextPMs.py --exp_name Icdar2015 --checkepoch 370 --test_size 960 1920 --threshold 0.515 --score_i 0.85 --recover watershed --gpu 0
NOTE: If you want to save the visualization results, you need to enable “--viz”.
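Instead of commenting lines in and out of eval.sh, the per-dataset settings can be kept in a table and turned into a command line. The sketch below is one possible way to do that and is not part of the repository: `EVAL_SETTINGS`, `build_eval_command`, and `run_eval` are hypothetical names, while every flag and value is copied from the listing above.

```python
import os
import subprocess

# Per-dataset settings copied from the eval.sh listing.
EVAL_SETTINGS = {
    "Totaltext": {"checkepoch": 250, "test_size": (640, 1024),
                  "threshold": 0.4, "score_i": 0.7},
    "Ctw1500": {"checkepoch": 480, "test_size": (512, 1024),
                "threshold": 0.4, "score_i": 0.7},
    "TD500": {"checkepoch": 125, "test_size": (0, 832),
              "threshold": 0.45, "score_i": 0.835},
    "Icdar2015": {"checkepoch": 370, "test_size": (960, 1920),
                  "threshold": 0.515, "score_i": 0.85},
}


def build_eval_command(exp_name, gpu=0, viz=False):
    """Return the eval_TextPMs.py argument list for one dataset."""
    s = EVAL_SETTINGS[exp_name]
    cmd = [
        "python3", "eval_TextPMs.py",
        "--exp_name", exp_name,
        "--checkepoch", str(s["checkepoch"]),
        "--test_size", str(s["test_size"][0]), str(s["test_size"][1]),
        "--threshold", str(s["threshold"]),
        "--score_i", str(s["score_i"]),
        "--recover", "watershed",
        "--gpu", str(gpu),
    ]
    if viz:
        # Same effect as un-commenting "--viz" in eval.sh.
        cmd.append("--viz")
    return cmd


def run_eval(exp_name, **kwargs):
    """Run the evaluation with CUDA_LAUNCH_BLOCKING=1, as eval.sh does."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(build_eval_command(exp_name, **kwargs),
                          env=env, check=True)
```

For example, `run_eval("Ctw1500", viz=True)` would launch the CTW-1500 evaluation with visualization enabled, assuming it is called from the repository root with the trained checkpoint in place.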
Demo
You can also run prediction on your own dataset without annotations. Here is an example:
#demo.sh
#!/bin/bash
CUDA_LAUNCH_BLOCKING=1 python3 demo.py --net resnet50 --exp_name Totaltext --checkepoch 250 --test_size 640 1024 --threshold 0.4 --score_i 0.7 --recover watershed --gpu 0 --img_root ./demo --viz
Evaluate the performance
Note that we provide the evaluation protocols for the benchmarks (Total-Text, CTW-1500, MSRA-TD500, ICDAR2015). The evaluation protocols embedded in the code are obtained from the official protocols. You don't need to run these protocols separately, because our test code calls these scripts automatically; please refer to "util/eval.py".
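Text detection benchmarks usually summarize results as precision, recall, and their harmonic mean (F-measure). The snippet below is a generic illustration of that arithmetic only. It is not the code in "util/eval.py" and not any official protocol, and the function name and arguments are hypothetical. How detections are matched to ground truth differs per benchmark and is not shown.

```python
def detection_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from match counts.

    true_positives:  detections matched to a ground-truth text instance
    false_positives: detections with no matching ground truth
    false_negatives: ground-truth instances with no matching detection
    """
    detected = true_positives + false_positives
    annotated = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / annotated if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With 8 matched detections, 2 unmatched detections, and 2 missed ground-truth instances, precision and recall are both 8/10, so the F-measure is also 0.8.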
Visualization

Citing the related works
Please cite the related works in your publications if it helps your research:
@article{DBLP:journals/pami/ZhangZCHY23,
author = {Shi{-}Xue Zhang and
Xiaobin Zhu and
Lei Chen and
Jie{-}Bo Hou and
Xu{-}Cheng Yin},
title = {Arbitrary Shape Text Detection via Segmentation With Probability Maps},
journal = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
volume = {45},
number = {3},
pages = {2736--2750},
year = {2023},
url = {https://doi.org/10.1109/TPAMI.2022.3176122},
doi = {10.1109/TPAMI.2022.3176122},
timestamp = {Sat, 25 Feb 2023 21:35:10 +0100},
biburl = {https://dblp.org/rec/journals/pami/ZhangZCHY23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
License
This project is licensed under the MIT License - see the LICENSE.md file for details
