Revisiting Tampered Scene Text Detection in the Era of Generative AI [AAAI2025]

This is the official implementation of the AAAI 2025 paper Revisiting Tampered Scene Text Detection in the Era of Generative AI (paper).


The Open-Set Text Forensics (OSTF) dataset is now publicly available at Google Drive and Baidu Drive.

<font size=10>Researchers are welcome 😃 to apply for this dataset by sending an email to 202221012612@mail.scut.edu.cn (from an institutional email address) that introduces:</font><br/>

  1. Who you are and your institution.
  2. Who your supervisor/mentor is.

The Text Forensics Reasoning (TFR) dataset is now publicly available at Google Drive and Baidu Drive.

<font size=10>Researchers are welcome 😃 to apply for this dataset by sending an email to 202221012612@mail.scut.edu.cn (from an institutional email address) that introduces:</font><br/>

  1. Who you are and your institution.
  2. Who your supervisor/mentor is.

OSTF Train data preparation

  1. Apply for, download, and unzip the OSTF dataset.
  2. Move all 18 *.pk files from the mmacc_pks dir into the mmacc dir.
  3. Move the mmacc dir into this main dir. After the above 3 steps, this main dir will have the following directory structure:
FBCNN---...
  |
configs---...
  |
mmcv_custom---...
  |
mmdet---...
  |
tools---...
  |
mmacc---srnet---...
          |
        srnet_train.pk
          |
        srnet_test.pk
          |
        anytext---...
          |
        anytext_train.pk
          |
        anytext_test.pk
          |
         ...
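Steps 2-3 above can be sketched as a short script (a minimal sketch: the dir names follow the tree above, and the helper simply moves every *.pk file it finds):

```python
import shutil
from pathlib import Path

def prepare_ostf(main_dir: str) -> int:
    """Move all *.pk files from mmacc_pks into mmacc (steps 2-3 above)."""
    main = Path(main_dir)
    pks_dir = main / "mmacc_pks"
    mmacc_dir = main / "mmacc"
    mmacc_dir.mkdir(exist_ok=True)
    moved = 0
    for pk in sorted(pks_dir.glob("*.pk")):
        shutil.move(str(pk), str(mmacc_dir / pk.name))
        moved += 1
    return moved  # should be 18 for the full OSTF release
```

Run it from the directory that contains mmacc_pks, then check that the returned count matches the 18 files listed in the tree above.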

Texture Jitter train data preparation

  1. Download pretrain.tar and extract it in this dir with "tar -xvf pretrain.tar". This creates a new dir named "pretrain" with 7 sub-dirs (ArT, ICDAR2013, ICDAR2015, ICDAR2017-MLT, LSVT, ReCTS, TextOCR). Each sub-dir contains a "train.pk" file and a "msk" dir.
  2. Download the training-set images from ArT, ICDAR2013 (Task 2.4: End to End (2015 edition)), ICDAR2015, ICDAR2017-MLT, LSVT (train_full_images_0/1.tar.gz, 4.1G), ReCTS, and TextOCR.
  3. Move each of the 7 downloaded image dirs into an "img" dir under the corresponding sub-dir. For example, "mv [Your downloaded ArT train images] pretrain/ArT/img" and "mv [Your downloaded ReCTS train images] pretrain/ReCTS/img".
  4. Make a new dir named "revjpegs" in this main dir, and create a "pretrain" dir and sub-dirs inside it so that "revjpegs" mirrors the sub-dir structure of the "pretrain" dir. For example, it should have the dirs "revjpegs/pretrain/ArT/img" and "revjpegs/pretrain/ReCTS/img", etc., corresponding to "pretrain/ArT/img" and "pretrain/ReCTS/img" respectively.
  5. Download fbcnn_color.pth following this Readme.md. In the FBCNN dir, run the command to create reverse-JPEG images for each of the 7 sub-dirs of the pretrain dir. For example, run "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ArT/img/ --out revjpegs/pretrain/ArT/img/" and "CUDA_VISIBLE_DEVICES=0 python app.py --inp pretrain/ReCTS/img/ --out revjpegs/pretrain/ReCTS/img/".

Finally, after the above 5 steps, this main dir will have the following directory structure:

FBCNN---...
  |
configs---...
  |
pretrain---ArT---img---....
  |         |     |
  |         |   train.pk
  |         |
  |        ICDAR2015---img---...
  |         |           |
  |         |         train.pk
  |         |
  |        ...
  |
revjpegs---pretrain---ArT---img---....
  |                    |     |
  |                    |   train.pk
  |                    |
  |                    ICDAR2015---img---...
  |                    |           |
  |                    |         train.pk
  |                    |
  |                   ...
  |
mmcv_custom---...
  |
mmdet---...
  |
tools---...
  |
mmacc---...
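Step 4's mirrored "revjpegs" layout can be created with a small helper (a sketch; it recreates every sub-directory of "pretrain" under "revjpegs/pretrain" and copies nothing else):

```python
from pathlib import Path

def mirror_pretrain_dirs(main_dir: str) -> list:
    """Recreate pretrain's sub-dir tree under revjpegs/pretrain (step 4)."""
    main = Path(main_dir)
    src_root = main / "pretrain"
    dst_root = main / "revjpegs" / "pretrain"
    created = []
    for sub in sorted(src_root.rglob("*")):
        if sub.is_dir():
            # mirror only directories; the images themselves are written
            # later by FBCNN's app.py in step 5
            dst = dst_root / sub.relative_to(src_root)
            dst.mkdir(parents=True, exist_ok=True)
            created.append(str(dst.relative_to(main)))
    return created
```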

The Texture Jitter method is implemented as "TextureSG" in the "txt_pipeline" of the config files (e.g. here); its source code is here. The key function of the Texture Jitter method is "img_tamper" at Line450.
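To illustrate the general idea only (this is not the repo's "img_tamper"; the blur-plus-noise recipe below is invented for this sketch), a texture-jitter-style augmentation perturbs the local texture statistics of a region while leaving its content readable:

```python
import numpy as np

def texture_jitter(img: np.ndarray, box, noise_std: float = 4.0,
                   seed: int = 0) -> np.ndarray:
    """Perturb the texture inside box=(x0, y0, x1, y1) of an HxWx3 uint8 image.

    Toy stand-in for Texture Jitter: smooth the region with a 3x3 box
    filter and re-add Gaussian noise, so local texture statistics change
    while the content stays recognizable.
    """
    rng = np.random.default_rng(seed)
    out = img.astype(np.float32).copy()
    x0, y0, x1, y1 = box
    region = out[y0:y1, x0:x1]          # view into out
    # 3x3 box blur via padded neighbor averaging
    padded = np.pad(region, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = np.zeros_like(region)
    for dy in range(3):
        for dx in range(3):
            blurred += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    blurred /= 9.0
    region[:] = blurred + rng.normal(0.0, noise_std, region.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Training on images perturbed this way gives the detector cheap "tampered" supervision without running a full text-editing model.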

The DAF framework is implemented as DFPNCMap3 and CascadeCMap3 for Faster R-CNN and Cascade R-CNN respectively.

DAF key implementation (taking the Faster R-CNN-based DAF as an example):

  1. Line47 Authentic Kernel implementation. The variable "self.sgl" implements the Authentic Kernel (the variable "self.C" in Line17) and its loss function (this forward function in Line21).
  2. Line379 Authentic Kernel Modulation implements the modulation between the Authentic Kernel (the variable "self.sgl.C") and the global features (the variable "gloabl_feats"). In this line, the resulting variable "gloabl_feats" is the modulated authentic kernel.
  3. Line324 Training the model to learn real/fake classification from the feature difference. During training, the feature difference between each RoI vector (the variable "mskf" in this line) and the modulated authentic kernel (the variable "glb_feats" in this line) is computed as "mskf - glb_feats[gt_valid]". This difference vector is fed into a fully-connected layer for the final real/fake prediction, "self.fc(mskf - glb_feats[gt_valid])". In this Line324, the loss between the model prediction "self.fc(mskf - glb_feats[gt_valid])" and the ground truth "gt_label[gt_valid].long()" is calculated to help the model learn real/fake classification.
  4. Line548 The model predicts real/fake from the feature difference. In this line, the modulated authentic kernel is held in the variables "g" and "glb_feats", and the input RoI feature vectors in "m" and "mask_feats". The feature difference is obtained as "(self.convert(m)-g)", and the final classification score is obtained by feeding it into the final binary classifier and a softmax layer: "F.softmax(self.fc(self.convert(m)-g),1)".
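Items 2-4 can be sketched in plain NumPy (the shapes, the elementwise modulation, and the random weights below are assumptions for illustration; the real DFPNCMap3/CascadeCMap3 modules use learned mmdet heads):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256                                  # assumed feature dimension

# Item 2: modulate the learned Authentic Kernel with global features
authentic_kernel = rng.normal(size=(D,))  # plays the role of self.sgl.C
global_feats = rng.normal(size=(D,))      # image-level features
modulated = authentic_kernel * global_feats  # one simple modulation choice

# Items 3-4: classify each RoI from its difference to the modulated kernel
roi_feats = rng.normal(size=(5, D))       # 5 RoI feature vectors ("mskf")
W = rng.normal(size=(D, 2)) * 0.01        # random stand-in for self.fc
b = np.zeros(2)

logits = (roi_feats - modulated) @ W + b  # self.fc(mskf - glb_feats)
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
scores = exp / exp.sum(axis=1, keepdims=True)  # F.softmax(..., 1)
```

The point of the design is that the classifier never sees raw RoI features, only their offset from an "authentic" reference, which is what makes the decision boundary transfer to unseen forgery methods.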

Train

Environment based on Python 3.9.12

pip install -r requirements.txt

bash tools/dist_train.sh [your config_py_file] [your GPU count, which should be consistent with dist_train.sh]

About the config files

The config files are in the configs dir. All config files follow this naming rule: ModelType_AblationType+TrainData.py. For the AblationType, "o" marks the original model without Texture Jitter pre-training, and "x" marks the model with Texture Jitter pre-training. For example, fasterrcnn_xsrnet is the Faster R-CNN model pre-trained with Texture Jitter and then fine-tuned with the SR-Net training data and the Texture Jitter method.
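The naming rule can be made explicit with a tiny helper (hypothetical, for illustration only; it is not part of the repo):

```python
def parse_config_name(name: str) -> dict:
    """Split a config file name of the form ModelType_AblationType+TrainData.py.

    'x' means Texture Jitter pre-training is used, 'o' means it is not.
    """
    stem = name[:-3] if name.endswith(".py") else name
    model, rest = stem.split("_", 1)
    return {
        "model": model,
        "texture_jitter_pretrain": rest[0] == "x",
        "train_data": rest[1:],
    }
```

For example, "fasterrcnn_xsrnet.py" parses to model "fasterrcnn", Texture Jitter pre-training enabled, train data "srnet".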

Use the config files in the pre-training stage

Given a config file that contains the model definition you want to pre-train (e.g. Cascade R-CNN):

  1. Modify the datasets line of the train dataloader to remove the fine-tuning data. For example, in cascade_xsrnet.py, change Line435 from "datasets = [ptdatas, ftdatas]," to "datasets = [ptdatas,],".
  2. Modify the datasets line of the "pt_data". For example, in cascade_xsrnet.py, change Line412 from "datasets = [ic13,ic15,ic17]," to "datasets = [ic13,ic15,ic17,art,rects,lsvt,textocrpt],".
  3. Modify the pre-trained weights used for initialization. We use the official COCO-pretrained backbone and detection modules (RPN, RoI heads) for initialization in the pre-training stage. For example, in cascade_xsrnet.py, change Line602 from 'cascade.pth' to your initial weights path (e.g. "rcnn_swin.pth"). The initial weights for Cascade R-CNN can be downloaded from my other repo ("rcnn_swin.pth" in the baseline zip file); the initial weights for Faster R-CNN can be downloaded from here.
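Applied together, the three edits look roughly like this inside a cascade_xsrnet.py-style config (a fragment, not a runnable config; the line numbers follow the steps above, and the "load_from" key is an assumption based on common mmdet convention, so use whatever field Line602 actually sets):

```python
# Line412: pt_data now lists all 7 pre-training sources
datasets = [ic13, ic15, ic17, art, rects, lsvt, textocrpt],

# Line435: the train dataloader keeps only the pre-training data
datasets = [ptdatas,],

# Line602: initialize from the COCO-pretrained weights instead of 'cascade.pth'
load_from = 'rcnn_swin.pth'  # assumed key name
```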

Use the config files in the fine-tuning stage

You only need to change the pre-trained weights entry to point to your pre-trained weights.
