VLSA

Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology (ICLR 2025)

VLSA: Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

[Paper] | [VLSA Walkthrough] | [Awesome Papers of Pathology VLMs] | [Zhihu (中文)] | [WSI Preprocessing] | [Acknowledgements] | [Citation]

<img src="docs/VLSA.webp" width="300px" align="right"/>

Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. Such a learning paradigm suffers from important performance bottlenecks when facing the scarce training data and the standard multi-instance learning (MIL) framework currently used in CPATH. To overcome them, this paper, for the first time, proposes a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models. It no longer relies on high-capability networks and shows the advantage of data efficiency. (2) At the vision end, VLSA encodes a prognostic language prior and then employs it as an auxiliary signal to guide the aggregation of prognostic visual features at the instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) an ordinal incidence function as the prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively by our Shapley values-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. Our VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs.

<div align="center"> <a href="https://"><img width="100%" height="auto" src="./docs/fig-vlsa-overview.png"></a> </div>

This repository is still being updated. Stay tuned.

📚 Recent updates:

  • 25/02/08: uploaded the patch features (31.86 GB of files) used in VLSA; you can download them from here.
  • 25/01/23: VLSA was accepted to ICLR 2025
  • 24/10/07: added the Notebook - VLSA Walkthrough
  • 24/09/24: code & paper are live
  • 24/09/10: released VLSA

VLSA Walkthrough

Please refer to our Notebook - VLSA Walkthrough. It provides details on

  • individual incidence function prediction in VLSA models;
  • and prediction interpretation using our Shapley values-based method.
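To make the term "incidence function" concrete before opening the notebook, here is a minimal sketch of the generic discrete-time idea: per-bin event probabilities are accumulated into a cumulative incidence curve, and survival is its complement. The softmax over per-bin scores and the example scores are assumptions made for illustration only; this is not VLSA's actual code, which is documented in the notebook and the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def incidence_function(logits: Sequence[float]) -> List[float]:
    """Cumulative probability that the event has occurred by the end of each time bin."""
    cumulative, running = [], 0.0
    for p in softmax(logits):
        running += p
        cumulative.append(min(running, 1.0))  # guard against float drift above 1
    return cumulative


def survival_function(logits: Sequence[float]) -> List[float]:
    """Probability of surviving past each time bin: S(t_k) = 1 - F(t_k)."""
    return [1.0 - f for f in incidence_function(logits)]


# Hypothetical scores for one patient over four ordered time bins
# (assumed values, not taken from any VLSA model).
scores = [2.0, 1.0, 0.5, -1.0]
cif = incidence_function(scores)
surv = survival_function(scores)
print("incidence:", [round(v, 3) for v in cif])
print("survival: ", [round(v, 3) for v in surv])
```

The incidence curve is non-decreasing by construction and reaches 1 at the last bin, which is what makes an ordered set of time bins a natural prediction target.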

👩‍💻 Running the Code

Pre-requisites

All experiments are run on a machine with

  • two NVIDIA GeForce RTX 3090 GPUs
  • Python 3.8 and pytorch==1.11.0+cu113

Detailed package requirements:

  • For pip or conda users, full requirements are provided in `requirements.txt`.
  • For Docker users, you can pull our base Docker image via `docker pull yuukilp/deepath:py38-torch1.11.0-cuda11.3-cudnn8-devel` and then install the additional essential Python packages (see `requirements.txt`) in the container.

Training models

Use the following command to load an experiment configuration and train the VLSA model (5-fold cross-validation):

python3 main.py --config config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml --handler VLSA --multi_run

All important arguments are explained in config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml.

To train a traditional SA model that uses only visual features, run:

python3 main.py --config config/IFMLE/tcga_blca/cfg_sa_base_conch.yaml --handler SA --multi_run
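If you want to queue both runs from a script, the two commands above can be assembled programmatically. The sketch below only uses the flags and config paths shown in this README; the helper name `build_train_command` is hypothetical, and the behaviour of `main.py` when `--multi_run` is omitted is an assumption that is not documented here.

```python
import shlex
import subprocess  # only needed if the commented-out launch line is enabled
from typing import List


def build_train_command(config: str, handler: str, multi_run: bool = True) -> List[str]:
    """Assemble the training command as an argument list (hypothetical helper)."""
    cmd = ["python3", "main.py", "--config", config, "--handler", handler]
    if multi_run:
        # Flag taken from the README commands above.
        cmd.append("--multi_run")
    return cmd


# (config path, handler) pairs copied from the README commands.
EXPERIMENTS = [
    ("config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml", "VLSA"),
    ("config/IFMLE/tcga_blca/cfg_sa_base_conch.yaml", "SA"),
]

commands = [build_train_command(cfg, handler) for cfg, handler in EXPERIMENTS]
for cmd in commands:
    print(shlex.join(cmd))
    # To actually launch training from the repository root, uncomment:
    # subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems if a config path ever contains spaces.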

Training Logs

We advocate open-source research. Our full training logs for VLSA models can be accessed at Google Drive.

🔥 Awesome Papers of Pathology VLMs

Foundational VLMs for computational pathology:

| Model | Architecture | Paper | Code | Data |
| :------------- | :---------------- | :---------------- | :-------------- | :----- |
| PLIP (NatMed'23) | CLIP | A visual language foundation model for pathology image analysis using medical twitter | Github | 208,414 pathology images paired with natural language descriptions from twitter |
| Quilt-Net (NeurIPS'23) | CLIP | Quilt-1M: One million image-text pairs for histopathology | Github | 802,148 image and text pairs from YouTube |
| CONCH (NatMed'24) | CoCa | A Vision-Language Foundation Model for Computational Pathology | Github | over 1.17 million image-caption pairs |
| CPLIP (CVPR'24) | CLIP | CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment | Github | Many-to-many VL alignment on ARCH dataset |
| PathAlign (arXiv'24) | BLIP-2 | PathAlign: A vision-language model for whole slide images in histopathology | - | over 350,000 WSIs and diagnostic text pairs |
| TITAN (arXiv'24) | CoCa | Multimodal Whole Slide Foundation Model for Pathology | Github | Slide-level vision-language alignment |

VLM-driven computational pathology tasks:

| Model | Subfield | Paper | Code | Base |
| :------------- | :---------- | :---------------- | :-------------- | :----- |
| TOP (NeurIPS'23) | WSI Classification | The rise of ai language pathologists: Exploring two-level prompt learning for few-shot weakly-supervised whole slide image classification | Github | Few-shot WSI classification |
| FiVE (CVPR'24) | WSI Classification | Generalizable whole slide image classification with fine-grained visual-semantic interaction | Github | VLM pretraining for WSI classification |
| ViLa-MIL (CVPR'24) | WSI Classification | Vila-mil: Dual-scale vision language multiple instance learning for whole slide image classification | Github | Dual-scale features for WSI classification |
| CoD-MIL (TMI'24) | WSI Classification | CoD-MIL: Chain-of-Diagnosis Prompting Multiple Instance Learning for Whole Slide Image Classification | Github | An improved version of ViLa-MIL |
| VLSA (ICLR'25) | WSI Survival Analysis | Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology | Github | VLM-driven vision-language survival analysis |
| QPMIL-VL (AAAI'25) | WSI Classification | Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification | Github | VLM-driven Incremental Learning for WSIs |
| MSCPT (TMI'25) | WSI Classification | MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning | Github | Multimodal Prompt Tuning for WSI |

NOTE: please open a new PR if you want to add your work to this table.

WSI Preprocessing

Following CONCH, we first divide each WSI into patches of 448 × 448 pixels at 20x magnification. We then use the image encoder of CONCH to extract patch features.
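The tiling step can be pictured with a small grid computation. This is a minimal sketch, assuming non-overlapping patches, no tissue filtering, and that partial patches at the slide border are dropped; the function names and the example slide size are hypothetical. The pipeline actually used for VLSA is the one referenced in this section.

```python
from typing import Iterator, Tuple


def level0_patch_size(patch_size: int, target_mag: float, scan_mag: float) -> int:
    """Side length in full-resolution pixels that covers `patch_size` pixels at `target_mag`."""
    if scan_mag < target_mag:
        raise ValueError("slide was scanned below the target magnification")
    return round(patch_size * scan_mag / target_mag)


def patch_origins(width: int, height: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of non-overlapping patches that fit fully inside the slide."""
    for y in range(0, height - step + 1, step):
        for x in range(0, width - step + 1, step):
            yield (x, y)


# A 448-pixel patch at 20x spans 896 full-resolution pixels on a slide scanned at 40x.
step = level0_patch_size(448, target_mag=20, scan_mag=40)

# Hypothetical, tiny slide dimensions for illustration; real WSIs are gigapixel-scale.
origins = list(patch_origins(width=2000, height=1800, step=step))
print(step, origins)
```

Each origin would then be read from the slide, resized to 448 × 448 if needed, and passed to the image encoder.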

Our complete WSI preprocessing procedure follows Pipeline-Processing-TCGA-Slides-for-MIL. You can

  • download the patch