The official code of "Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images", CVPR 2025


<div align="center"> <h1> Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images </h1>

Jie Mei<sup>1</sup>, Chenyu Lin<sup>2</sup>, Yu Qiu<sup>3</sup>, Yaonan Wang<sup>1</sup>, Hui Zhang<sup>1</sup>, Ziyang Wang<sup>4</sup>, Dong Dai<sup>4</sup>

<sup>1</sup> Hunan University, <sup>2</sup> Nankai University, <sup>3</sup> Hunan Normal University, <sup>4</sup> Tianjin Medical University Cancer Institute and Hospital

arXiv | License: MIT

</div>

Introduction

<p style="text-align:justify; text-justify:inter-ideograph;"> Lung cancer is a leading cause of cancer-related deaths globally. PET-CT is crucial for imaging lung tumors, providing essential metabolic and anatomical information, but it faces challenges such as poor image quality, motion artifacts, and complex tumor morphology. Deep learning-based models are expected to address these problems; however, existing small-scale and private datasets limit significant performance improvements for such methods. Hence, we introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients. Furthermore, we propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images. Specifically, we design a channel-wise rectification module (CRM) that applies a channel state space block across multi-modal features to learn correlated representations and filter out modality-specific noise. A dynamic cross-modality interaction module (DCIM) is designed to effectively integrate position and context information: it employs PET images to learn regional position information, which serves as a bridge for modeling the relationships between local features of CT images. Extensive experiments on a comprehensive benchmark demonstrate the effectiveness of CIPA compared to current state-of-the-art segmentation methods. We hope our research provides further exploration opportunities for medical image segmentation. </p>

CIPA

Environment

  1. Create environment.

    conda create -n MIPA python=3.10
    conda activate MIPA
    
  2. Install all dependencies. Install PyTorch (with matching CUDA/cuDNN), then install the remaining dependencies via:

    pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
    
    pip install -r requirements.txt
    
  3. Install selective_scan_cuda_core.

    cd models/encoders/selective_scan
    pip install .
    cd ../../..
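
After installation, a quick sanity check can confirm that the core packages are importable inside the environment (a minimal sketch; the package names follow step 2 above):

```python
import importlib.util

def check_env(packages=("torch", "torchvision", "torchaudio")):
    """Map each required package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

if __name__ == "__main__":
    for pkg, ok in check_env().items():
        print(f"{pkg}: {'OK' if ok else 'MISSING'}")
```

If any package reports MISSING, re-run the corresponding install step before proceeding.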
    

PCLT20K

Please contact Jie Mei (jiemei AT hnu.edu.cn) for the dataset. We will get back to you shortly. The email should contain the following information. Note: For better academic communication, a real-name system is encouraged and your email suffix must match your affiliation (e.g., hello@hnu.edu.cn). If not, you need to explain why.

Name: (Tell us who you are.)
Affiliation: (The name/url of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, PhD, etc.)
Email: (Dataset will be sent to this email.)
How to use: (Only for non-commercial use.)

Data Preparation

  1. For our dataset PCLT20K, we organize the dataset folder in the following structure:

    <PCLT20K>
        |-- <0001>
            |-- <name1_CT.png>
            |-- <name1_PET.png>
            |-- <name1_mask.png>
            ...
        |-- <0002>
            |-- <name2_CT.png>
            |-- <name2_PET.png>
            |-- <name2_mask.png>
            ...
        ...
        |-- train.txt
        |-- test.txt
    

    train.txt/test.txt contain the names of the items in the training/testing sets, e.g.:

    <name1>
    <name2>
    ...
    
  2. Please put the dataset in the data directory.
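
As a rough sketch of how the split files map to image triplets, the following helper collects the (CT, PET, mask) paths for one split. The name-to-folder lookup via a glob over the case folders is an assumption based on the layout above, not the repository's own loader:

```python
from pathlib import Path

def load_split(root, split="train"):
    """Return (ct, pet, mask) Path triplets for one split of PCLT20K.

    Assumes the folder layout shown above: <root>/<case>/<name>_CT.png etc.,
    with <root>/train.txt and <root>/test.txt listing the item names.
    """
    root = Path(root)
    names = (root / f"{split}.txt").read_text().split()
    triplets = []
    for name in names:
        # Search every case folder for this item's CT slice;
        # raises StopIteration if the item is missing.
        ct = next(root.glob(f"*/{name}_CT.png"))
        pet = ct.with_name(f"{name}_PET.png")
        mask = ct.with_name(f"{name}_mask.png")
        triplets.append((ct, pet, mask))
    return triplets
```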

Usage

Training

  1. Please download the pretrained VMamba weights and put them under pretrained/vmamba/. We use VMamba_Tiny by default.

  2. Config setting.

    Edit the config in train.py. Change C.backbone to sigma_tiny / sigma_small / sigma_base to use the three versions of VMamba.

  3. Run multi-GPU distributed training:

    torchrun --nproc_per_node <num_gpus> train.py
    
  4. You can also use single-GPU training:

    python train.py
    
  5. Results will be saved in the save_model folder.
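
The launch commands in steps 3-4 can be summarized by a small helper that picks between single- and multi-GPU mode (illustrative only; the argv it builds matches the commands above):

```python
def build_launch_cmd(num_gpus: int) -> list[str]:
    """Return the argv for launching training with the given GPU count."""
    if num_gpus > 1:
        # Multi-GPU distributed training via torchrun (step 3).
        return ["torchrun", f"--nproc_per_node={num_gpus}", "train.py"]
    # Single-GPU training (step 4).
    return ["python", "train.py"]
```

For example, build_launch_cmd(4) yields the torchrun command for a 4-GPU run, and build_launch_cmd(1) falls back to plain python.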

Testing

Download the pretrained model of CIPA (CIPA.pth), then run:

python pred.py

Citation

If you are using the code/model provided here in a publication, please consider citing:

@inproceedings{mei2025cross,
  title={Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images},
  author={Mei, Jie and Lin, Chenyu and Qiu, Yu and Wang, Yaonan and Zhang, Hui and Wang, Ziyang and Dai, Dong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

Contact

For any questions, please contact me via e-mail: jiemei AT hnu.edu.cn.

Acknowledgment

This project is based on VMamba and Sigma; thanks for their excellent work.
