TCM

Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)

Turning a CLIP Model into a Scene Text Detector

This repository is built upon mmocr 0.4.0.

NightTime-ArT Dataset

The NightTime-ArT dataset, collected from ArT, can be downloaded from here.

Usage

Environment

  • cuda 11.1
  • torch=1.8.0
  • torchvision=0.9.0
  • timm=0.4.12
  • mmcv-full=1.3.17
  • mmseg=0.20.2
  • mmdet=2.19.1
  • mmocr=0.4.0

The code is based on mmocr. First install mmcv-full and mmocr by following the official guidelines (mmocr).
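
The pinned versions in the Environment list can be checked before training. The following is a minimal sketch, not part of the repository: it assumes Python 3.8+ (for `importlib.metadata`), and the distribution names are assumptions (for example, mmseg is assumed to be installed under the name `mmsegmentation`). The helper `check_versions` is hypothetical.

```python
from importlib import metadata

# Pinned versions from the Environment list. The distribution names are
# assumptions (e.g. mmseg is assumed to be published as "mmsegmentation").
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "timm": "0.4.12",
    "mmcv-full": "1.3.17",
    "mmsegmentation": "0.20.2",
    "mmdet": "2.19.1",
    "mmocr": "0.4.0",
}


def check_versions(expected):
    """Return {name: (wanted, installed, ok)} for each pinned package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "1.8.0+cu111" count as a match.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```

A package reported with `found None` is not installed under the assumed name; it may still be importable if it was installed from source under a different distribution name.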

Dataset

Pre-trained CLIP Models

  • Download the pre-trained CLIP models (RN50.pt) and save them to the pretrained folder.
  • Set the path to the pre-trained CLIP model in the config file:
model = dict(
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
    )

Pretraining & Training & Evaluation

To pretrain the TCM model on SynthText/Synth150k, please configure the corresponding dataset path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8

To fine-tune the TCM model from a pretrained model, set load_from to the pretrained checkpoint path, then run:

bash dist_train.sh configs/textdet/xxnet/xxx.py 8
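
For illustration, the two settings mentioned so far might sit in a fine-tuning config as sketched below. This is an assumption about layout, not a copy of any config in the repository: a real config defines many more fields, the `load_from` path is a hypothetical placeholder, and the `pretrained` path is kept exactly as given above.

```python
# Sketch of the fields a fine-tuning config would set; both paths are placeholders.
model = dict(
    # CLIP backbone weights, as in the Pre-trained CLIP Models section.
    pretrained='xxx/ocrclip/pretrained/RN50.pt',
)

# Hypothetical placeholder: checkpoint produced by the pretraining run.
load_from = '/path/to/pretrained_checkpoint.pth'
```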

To evaluate a checkpoint, run:

bash dist_test.sh configs/textdet/xxnet/xxx.py /path/to/checkpoint 1 --eval hmean-iou
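
The hmean reported by this metric is the harmonic mean of precision and recall, which is the F-measure listed in Results. The sketch below shows only that final arithmetic, assuming the IoU matching between detections and ground-truth boxes has already been done. The function name and the counts are hypothetical, and this is not the mmocr implementation.

```python
def hmean_scores(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) from matching counts.

    num_matched: detections matched to a ground-truth box (by IoU).
    num_detected: all detections produced by the model.
    num_ground_truth: all annotated text instances.
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall; 0 when both are 0.
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean


# Hypothetical counts: 80 matches out of 100 detections and 100 annotations.
precision, recall, hmean = hmean_scores(80, 100, 100)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```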

Results

| Method | Data | F-measure | Model |
|--------|------|-----------|-------|
| TCM-DB | TD | 88.8% | config weights |
| TCM-DB | IC15 | 88.8% | config weights |
| TCM-DB | CTW | 85.1% | config |
| TCM-DB | TT | 85.9% | config |

Turning a CLIP Model into a Scene Text Spotter

TCM for Scene Text Spotter

Please refer to the spotter folder for more details.

TCM for Rotated Object Detection

Please refer to the rotated_object_detection folder for more details.

TODO

  • [x] Add FastTCM
  • [ ] Migration from mmocr 0.4.0 to mmocr 1.0.0
  • [ ] Refactor and clean code
  • [ ] Release domain adaptation setting

Citation

If you find this project helpful for your research, please consider citing the papers:

@inproceedings{Yu2023TurningAC,
  title={Turning a CLIP Model into a Scene Text Detector},
  author={Wenwen Yu and Yuliang Liu and Wei Hua and Deqiang Jiang and Bo Ren and Xiang Bai},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

@article{Yu2024TurningAC,
  title={Turning a CLIP Model into a Scene Text Spotter},
  author={Wenwen Yu and Yuliang Liu and Xingkui Zhu and Haoyu Cao and Xing Sun and Xiang Bai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024}
}

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for more details.

Acknowledgements

This project is partially based on MMOCR, CLIP, MMRotate, DenseCLIP, AdelaiDet, Deformable-DETR, and TESTR. Thanks for their great work.
