PMTD
Pyramid Mask Text Detector, designed by the SenseTime Video Intelligence Research team.
PMTD: Pyramid Mask Text Detector
This project hosts the inference code for implementing the PMTD algorithm for text detection, as presented in our paper:
Pyramid Mask Text Detector;
Liu Jingchao, Liu Xuebo, Sheng Jie, Liang Ding, Li Xin and Liu Qingjie;
arXiv preprint arXiv:1903.11800 (2019).
The full paper is available at: https://arxiv.org/abs/1903.11800.

Installation
Check INSTALL.md for installation instructions.
Trained model
We provide models trained on the ICDAR 2017 MLT dataset here and on the ICDAR 2015 dataset here for download. Note that the results differ slightly from those reported in the paper: PMTD was developed on a private codebase, and we reimplemented the inference code based on maskrcnn-benchmark.
ICDAR 2017
Method | Precision | Recall | F-measure
---|---|---|---
This project | 85.13% | 72.85% | 78.51%
Paper reported | 85.15% | 72.77% | 78.48%
ICDAR 2015
Method | Precision | Recall | F-measure
---|---|---|---
This project | 87.48% | 91.26% | 89.33%
Paper reported | 87.43% | 91.30% | 89.33%
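The F-measure in both tables is the harmonic mean of precision and recall, F = 2PR / (P + R). A minimal sketch that recomputes it from the "This project" rows (values in percent):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing is detected or recalled.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "This project" rows from the ICDAR 2017 MLT and ICDAR 2015 tables.
icdar2017 = f_measure(85.13, 72.85)
icdar2015 = f_measure(87.48, 91.26)
print(f"ICDAR 2017: {icdar2017:.2f}%")  # ICDAR 2017: 78.51%
print(f"ICDAR 2015: {icdar2015:.2f}%")  # ICDAR 2015: 89.33%
```

Because it is a harmonic mean, the F-measure is pulled toward the lower of the two values, which is why the ICDAR 2017 score sits much closer to recall than to precision.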
A quick demo
cd PROJECT_ROOT
python demo/PMTD_demo.py \
--image_path=datasets/icdar2017mlt/ch8_validation_images/img_1.jpg \
--model_path=models/PMTD_ICDAR2017MLT.pth
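To run the demo on a whole folder rather than a single image, one option is to call the script once per file. This sketch relies only on the two flags shown above (--image_path and --model_path); the helper name build_demo_commands is hypothetical and not part of PMTD.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_commands(image_dir, model_path, pattern="*.jpg"):
    """Return one PMTD_demo.py command line per matching image, in sorted order."""
    images = sorted(Path(image_dir).glob(pattern))
    return [
        [sys.executable, "demo/PMTD_demo.py",
         f"--image_path={img}", f"--model_path={model_path}"]
        for img in images
    ]


if __name__ == "__main__":
    # Run from PROJECT_ROOT, like the single-image command above.
    commands = build_demo_commands(
        "datasets/icdar2017mlt/ch8_validation_images",
        "models/PMTD_ICDAR2017MLT.pth",
    )
    for cmd in commands:
        # check=True stops at the first image whose demo run fails.
        subprocess.run(cmd, check=True)
```

Each invocation starts a fresh Python process, so the model is presumably reloaded for every image; that is acceptable for a handful of images, while the dataset testing pipeline below is the better fit for a full split.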
Perform testing on ICDAR 2017 MLT dataset
Prepare dataset
We recommend symlinking the ICDAR 2017 MLT dataset into datasets/ as follows:
# PROJECT_ROOT is the repository root, e.g. ~/Projects/PMTD
cd PROJECT_ROOT
mkdir -p datasets/icdar2017mlt
cd datasets/icdar2017mlt
# symlink the test images
ln -s /path_to_icdar2017mlt_dataset/ch8_test_images
Generate COCO-format labels for the dataset
# ${PWD} = datasets/icdar2017mlt
mkdir annotations
cd PROJECT_ROOT
python demo/utils/generate_icdar2017.py
# labels are written to PROJECT_ROOT/datasets/icdar2017mlt/annotations/test_coco.json
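To sanity-check the generated label file, a small script can load it as plain JSON. This sketch assumes only the standard top-level COCO keys (images, annotations, categories); summarize_coco is a hypothetical helper, and for the test split the annotations list may well be empty, since only the test images are linked.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(path):
    """Return image/annotation counts for a COCO-style JSON label file."""
    with open(path, encoding="utf-8") as f:
        coco = json.load(f)
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])
    # Map category ids to names so the per-category counts are readable.
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_category = Counter(a["category_id"] for a in annotations)
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "per_category": {names.get(k, k): n for k, n in per_category.items()},
    }


if __name__ == "__main__":
    # Run from PROJECT_ROOT after generate_icdar2017.py has finished.
    label_file = Path("datasets/icdar2017mlt/annotations/test_coco.json")
    if label_file.exists():
        print(summarize_coco(label_file))
```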
Test images
In the test stage, we use a single 11 GB TITAN X GPU with a batch size of 4. If you encounter an out-of-memory (OOM) error, reduce TEST.IMS_PER_BATCH in configs/e2e_PMTD_R_50_FPN_1x_test.yaml.
# the downloaded model should be placed at models/PMTD_ICDAR2017MLT.pth
python tools/test_net.py --config=configs/e2e_PMTD_R_50_FPN_1x_ICDAR2017MLT_test.yaml
# results are written to PROJECT_ROOT/inference/icdar_2017_mlt_test/
# - bbox.json (when using the COCO evaluation criterion)
# - segm.json (when using the COCO evaluation criterion)
# - dataset.pth
# - predictions.pth
# - results_{scale}.pth (scale=1600 by default)
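When bbox.json is produced, it can be inspected without loading the model. This sketch assumes the file follows the standard COCO detection-results layout (a list of records with image_id, category_id, bbox and score); detections_per_image is a hypothetical helper, and the 0.5 threshold is purely illustrative, not a threshold PMTD itself uses.

```python
import json
from collections import defaultdict
from pathlib import Path


def detections_per_image(results, score_thresh=0.5):
    """Count detections with score >= score_thresh for each image_id."""
    counts = defaultdict(int)
    for det in results:
        if det["score"] >= score_thresh:
            counts[det["image_id"]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Run from PROJECT_ROOT after tools/test_net.py has finished.
    path = Path("inference/icdar_2017_mlt_test/bbox.json")
    if path.exists():
        with path.open(encoding="utf-8") as f:
            results = json.load(f)
        counts = detections_per_image(results)
        print(f"{len(counts)} images, {sum(counts.values())} boxes with score >= 0.5")
```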
Convert results to ICDAR 2017 submission format
python demo/utils/convert_results_to_icdar.py
# results are written to PROJECT_ROOT/inference/icdar_2017_mlt_test/
# - icdar.zip
Submit icdar.zip to the ICDAR 2017 MLT challenge for evaluation.
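Before uploading, the archive can be checked locally. This sketch assumes the usual ICDAR 2017 MLT Task 1 layout, in which each member is named res_img_<id>.txt and every non-empty line holds four corner points followed by a confidence score (x1,y1,x2,y2,x3,y3,x4,y4,confidence); check_submission is a hypothetical helper, not part of PMTD.

```python
import zipfile


def check_submission(zip_source):
    """Return a list of problems found in an ICDAR-style submission zip.

    Assumes each member is a res_img_*.txt file whose non-empty lines hold
    8 polygon coordinates followed by a confidence score, comma-separated.
    """
    problems = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not (name.startswith("res_img_") and name.endswith(".txt")):
                problems.append(f"{name}: unexpected file name")
                continue
            text = zf.read(name).decode("utf-8")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if not line.strip():
                    continue  # ignore blank lines
                fields = line.strip().split(",")
                if len(fields) != 9:
                    problems.append(f"{name}:{lineno}: expected 9 fields, got {len(fields)}")
                    continue
                try:
                    [float(v) for v in fields]
                except ValueError:
                    problems.append(f"{name}:{lineno}: non-numeric field")
    return problems


if __name__ == "__main__":
    from pathlib import Path
    archive = Path("inference/icdar_2017_mlt_test/icdar.zip")
    if archive.exists():
        print(check_submission(archive) or "no problems found")
```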
Citations
Please consider citing our paper in your publications if this project helps your research. The BibTeX entry is as follows:
@article{liu2019pyramid,
title={Pyramid Mask Text Detector},
author={Liu, Jingchao and Liu, Xuebo and Sheng, Jie and Liang, Ding and Li, Xin and Liu, Qingjie},
journal={arXiv preprint arXiv:1903.11800},
year={2019}
}
License
Maskrcnn-benchmark is released under the MIT license. PMTD is released under the Apache 2.0 license.
