Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer

Official repository for the AAAI 2025 paper "Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer" [paper] [website].

In summary, SparseViT leverages the distinction between semantic and non-semantic features, enabling the model to adaptively extract non-semantic features that are more critical for image manipulation localization. This provides a novel approach to precisely identifying manipulated regions.

Dataset Preparation

<details> <summary>Dataset Preparation</summary>

(1) SparseViT was trained on the CAT-Net joint dataset, so you need to download the combined dataset. It consists of:

CASIA2.0
FantasticReality_v1
IMD_20
tampCOCO

(For more details about these datasets, refer to CAT-Net.)

(2) Dataset organization. We define two types of Dataset classes:

json_dataset: retrieves the input images and their corresponding ground truth from a JSON file in the following format:

[
    [
        "/Dataset/CASIAv2/Tp/Tp_D_NRN_S_N_arc00013_sec00045_11700.jpg",
        "/Dataset/CASIAv2/Gt/Tp_D_NRN_S_N_arc00013_sec00045_11700_gt.png"
    ],
    [
        "/Dataset/CASIAv2/Au/Au_nat_30198.jpg",
        "Negative"
    ],
    ...
]
Note: "Negative" indicates a real image with no ground truth.
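As a rough illustration of how a file in this format can be consumed, the sketch below splits the entries into tampered pairs and authentic images. The helper name `load_json_pairs` is hypothetical and is not taken from this repository, and the comment about substituting an all-zero mask for "Negative" entries is an assumption about how a loader might treat them.

```python
import json

NEGATIVE = "Negative"  # marker used in the format above for images without ground truth


def load_json_pairs(json_path):
    """Split a json_dataset file into tampered pairs and authentic images.

    Hypothetical helper for illustration; not part of the SparseViT code.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)

    tampered, authentic = [], []
    for image_path, gt_path in entries:
        if gt_path == NEGATIVE:
            # Assumption: no mask file exists for a real image, so a loader
            # could substitute an all-zero mask of the image's size.
            authentic.append(image_path)
        else:
            tampered.append((image_path, gt_path))
    return tampered, authentic
```

Calling `load_json_pairs("my_list.json")` on a file with the two example entries above would yield one tampered pair and one authentic image path.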

mani_dataset: automatically loads images and their corresponding ground truth from a directory. The directory must contain:

Tp subdirectory (for storing input images)

Gt subdirectory (for storing ground truth)

Files are paired automatically using the os.listdir() function.

An example of the organization of mani_dataset is provided in the /images directory.
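The pairing step can be sketched as follows. This is a minimal illustration that assumes images and masks are matched by sorted position, which only works when the file names in Tp and Gt sort in the same order (for example, identical stems). The exact rule used by the repository's dataset class is not stated here, and `pair_tp_gt` is a hypothetical name.

```python
import os


def pair_tp_gt(root):
    """Pair files in <root>/Tp with files in <root>/Gt by sorted position.

    Hypothetical helper for illustration; not part of the SparseViT code.
    """
    tp_dir = os.path.join(root, "Tp")
    gt_dir = os.path.join(root, "Gt")

    # os.listdir() returns names in arbitrary order, so sort both listings
    # before zipping them together.
    tp_files = sorted(os.listdir(tp_dir))
    gt_files = sorted(os.listdir(gt_dir))

    if len(tp_files) != len(gt_files):
        raise ValueError(
            f"Tp has {len(tp_files)} files but Gt has {len(gt_files)}; cannot pair them"
        )

    return [
        (os.path.join(tp_dir, tp_name), os.path.join(gt_dir, gt_name))
        for tp_name, gt_name in zip(tp_files, gt_files)
    ]
```

Printing the first few pairs before training is a cheap way to confirm that every image is matched with the intended mask.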

(3) Combined dataset configuration. List every dataset in a single JSON file in the following format:

[
    ["ManiDataset", "/mnt/data0/public_datasets/IML/CASIA2.0"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/FantasticReality_v1/FantasticReality.json"],
    ["ManiDataset", "/mnt/data0/public_datasets/IML/IMD_20_1024"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/sp_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/cm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcm_COCO_list.json"],
    ["JsonDataset", "/mnt/data0/public_datasets/IML/tampCOCO/bcmc_COCO_list.json"]
]

Set the data_path parameter in the train.sh file to the path of this JSON file.
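Before launching training, it can help to check that the combined configuration is well formed. The sketch below is one possible check, assuming that a "ManiDataset" entry points to a directory and a "JsonDataset" entry points to a file, as in the example above. The function `check_dataset_config` is hypothetical and not part of this repository.

```python
import json
import os

# The two dataset type names that appear in the example configuration.
KNOWN_TYPES = {"ManiDataset", "JsonDataset"}


def check_dataset_config(config_path):
    """Return a list of human-readable problems found in the combined config."""
    with open(config_path, "r", encoding="utf-8") as f:
        entries = json.load(f)

    problems = []
    for index, entry in enumerate(entries):
        if not (isinstance(entry, list) and len(entry) == 2):
            problems.append(f"entry {index}: expected [type, path], got {entry!r}")
            continue
        dataset_type, path = entry
        if dataset_type not in KNOWN_TYPES:
            problems.append(f"entry {index}: unknown dataset type {dataset_type!r}")
        elif dataset_type == "ManiDataset" and not os.path.isdir(path):
            problems.append(f"entry {index}: directory not found: {path}")
        elif dataset_type == "JsonDataset" and not os.path.isfile(path):
            problems.append(f"entry {index}: JSON file not found: {path}")
    return problems
```

An empty list means every entry has a recognized type and an existing path; otherwise each string names the offending entry.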

</details>

Train setup

<details> <summary>1) Set up the coding environment</summary>

First, clone the repository:

git clone https://github.com/scu-zjz/SparseViT.git

Our environment:

Ubuntu 20.04.1 LTS

CUDA 11.5 + cuDNN 8.4.0

Python 3.10

PyTorch 2.4

Install the packages listed in requirements.txt:

pip install -r requirements.txt
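After installation, a quick way to compare your setup with the versions listed above is to print the interpreter and PyTorch versions. The sketch below reads the installed package metadata, so it does not need to import PyTorch. `report_environment` is a hypothetical helper, not part of this repository.

```python
import sys
from importlib import metadata


def report_environment():
    """Collect the Python and PyTorch versions for comparison with the list above."""
    info = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        info["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        info["torch"] = None  # PyTorch is not installed in this environment
    return info


if __name__ == "__main__":
    print(report_environment())
```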

</details> <details> <summary>2) Download the Uniformer pretrained weights</summary>

Download the Uniformer pretrained weights from Google Drive and place them in the checkpoint/train/pretrain directory.
Then set pretrain_path in the train.sh file to the location of the downloaded weights.

</details>

Test setup

<details> <summary>1) Set up the coding environment</summary>

Same as in "Train setup" above.

</details> <details> <summary>2) Download our pretrained checkpoints</summary>

Download our pretrained checkpoints from Google Drive and place them in the checkpoint/test directory.

</details>

Scripts

This should be super easy! Simply run the following.

For training:

sh train.sh

For testing:

python main_test.py

This repository provides only the basic training and testing scripts for SparseViT. You can also train and test SparseViT within our IMDL-BenCo framework, with which it is fully compatible.

Citation

If you find our code useful, please consider citing us and giving us a star!

@inproceedings{su2025can,
  title={Can we get rid of handcrafted feature extractors? sparsevit: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={7},
  pages={7024--7032},
  year={2025}
}

