
DeepLIIF

Deep Learning Inferred Multiplex ImmunoFluorescence for IHC Image Quantification (https://deepliif.org) [Nature Machine Intelligence'22, CVPR'22, MICCAI'23, Histopathology'23, MICCAI'24]


README

<!-- PROJECT LOGO --> <br /> <p align="center"> <img src="./images/DeepLIIF_logo.png" width="50%"> <h3 align="center"><strong>Deep-Learning Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification</strong></h3> <p align="center"> <a href="https://rdcu.be/cKSBz">Nature MI'22</a> | <a href="https://openaccess.thecvf.com/content/CVPR2022/html/Ghahremani_DeepLIIF_An_Online_Platform_for_Quantification_of_Clinical_Pathology_Slides_CVPR_2022_paper.html">CVPR'22</a> | <a href="https://arxiv.org/abs/2305.16465">MICCAI'23</a> | <a href="https://onlinelibrary.wiley.com/share/author/4AEBAGEHSZE9GDP3H8MN?target=10.1111/his.15048">Histopathology'23</a> | <a href="https://arxiv.org/abs/2405.08169">MICCAI'24</a> | <a href="https://deepliif.org/">Cloud Deployment</a> | <a href="https://nadeemlab.github.io/DeepLIIF/">Documentation</a> | <a href="#support">Support</a> </p> </p>

Reporting biomarkers assessed by routine immunohistochemical (IHC) staining of tissue is broadly used in diagnostic pathology laboratories for patient care. To date, clinical reporting is predominantly qualitative or semi-quantitative. By creating a multitask deep learning framework referred to as DeepLIIF, we present a single-step solution to stain deconvolution/separation, cell segmentation, and quantitative single-cell IHC scoring. Leveraging a unique de novo dataset of co-registered IHC and multiplex immunofluorescence (mpIF) staining of the same slides, we segment and translate low-cost and prevalent IHC slides to more expensive-yet-informative mpIF images, while simultaneously providing the essential ground truth for the superimposed brightfield IHC channels. Moreover, a new nuclear-envelop stain, LAP2beta, with high (>95%) cell coverage is introduced to improve cell delineation/segmentation and protein expression quantification on IHC slides. By simultaneously translating input IHC images to clean/separated mpIF channels and performing cell segmentation/classification, we show that our model trained on clean IHC Ki67 data can generalize to more noisy and artifact-ridden images as well as other nuclear and non-nuclear markers such as CD3, CD8, BCL2, BCL6, MYC, MUM1, CD10, and TP53. We thoroughly evaluate our method on publicly available benchmark datasets as well as against pathologists' semi-quantitative scoring. Trained on IHC, DeepLIIF generalizes well to H&E images for out-of-the-box nuclear segmentation.

DeepLIIF is deployed as a free publicly available cloud-native platform (https://deepliif.org) with Bioformats (more than 150 input formats supported) and MLOps pipeline. We also release DeepLIIF implementations for single/multi-GPU training, Torchserve/Dask+Torchscript deployment, and auto-scaling via Pulumi (1000s of concurrent connections supported); details can be found in our documentation. DeepLIIF can be run locally (GPU required) by pip installing the package and using the deepliif CLI command. DeepLIIF can be used remotely (no GPU required) through the https://deepliif.org website, calling the cloud API via Python, or via the ImageJ/Fiji plugin; details for the free cloud-native platform can be found in our CVPR'22 paper.
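For remote use from Python, a minimal sketch of calling the cloud API might look like the following. The endpoint path and parameters shown here are illustrative assumptions, not a verified API contract; consult the official documentation for the exact interface.

```python
import requests

# NOTE: endpoint path and parameter names are illustrative assumptions;
# see https://nadeemlab.github.io/DeepLIIF/ for the actual API.
API_URL = "https://deepliif.org/api/infer"

def infer_remote(image_path, resolution="40x"):
    """Upload an IHC image to the cloud API and return the JSON response."""
    with open(image_path, "rb") as f:
        res = requests.post(API_URL, files={"img": f}, params={"resolution": resolution})
    res.raise_for_status()
    return res.json()
```

No GPU (or local model weights) are needed on the client side; the server performs inference and returns the generated modalities and scoring.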

© This code is made available for non-commercial academic purposes.


Figure: Overview of DeepLIIF pipeline and sample input IHCs (different brown/DAB markers -- BCL2, BCL6, CD10, CD3/CD8, Ki67) with corresponding DeepLIIF-generated hematoxylin/mpIF modalities and classified (positive (red) and negative (blue) cell) segmentation masks. (a) Overview of DeepLIIF. Given an IHC input, our multitask deep learning framework simultaneously infers the corresponding Hematoxylin channel, mpIF DAPI, mpIF protein expression (Ki67, CD3, CD8, etc.), and the positive/negative protein cell segmentation, baking explainability and interpretability into the model itself rather than relying on coarse activation/attention maps. In the segmentation mask, red cells denote cells with positive protein expression (brown/DAB cells in the input IHC), whereas blue cells represent negative cells (blue cells in the input IHC). (b) Example DeepLIIF-generated hematoxylin/mpIF modalities and segmentation masks for different IHC markers. DeepLIIF, trained on clean IHC Ki67 nuclear marker images, can generalize to noisier images as well as other IHC nuclear/cytoplasmic marker images.

Prerequisites

  1. Python 3.8
  2. Docker

Installing deepliif

DeepLIIF can be pip installed:

$ conda create --name deepliif_env python=3.8
$ conda activate deepliif_env
(deepliif_env) $ conda install -c conda-forge openjdk
(deepliif_env) $ pip install deepliif

The package is composed of two parts:

  1. A library that implements the core functions used to train and test DeepLIIF models.
  2. A CLI to run common batch operations, including training, batch testing, and Torchscript model serialization.

You can list all available commands:

$ deepliif --help
Usage: deepliif [OPTIONS] COMMAND [ARGS]...

  Commonly used DeepLIIF batch operations

Options:
  --help  Show this message and exit.

Commands:
  prepare-testing-data   Preparing data for testing
  prepare-training-data  Preparing data for training
  serialize              Serialize DeepLIIF models using Torchscript
  test                   Test trained models
  test-wsi
  train                  General-purpose training script for multi-task...
  trainlaunch            A wrapper method that executes deepliif/train.py...
  visualize

Note: You might need to install a version of PyTorch that is compatible with your CUDA version. Otherwise, only the CPU will be used. Visit the PyTorch website for details. You can confirm if your installation will run on the GPU by checking if the following returns True:

import torch
torch.cuda.is_available()

Dataset for training, validation, and testing

An example data directory looks like the following:

<Data folder> 
    ├── train
    ├── val
    ├── val_cli
    └── val_cli_gt

If you use different subfolder names, you will need to add --phase {foldername} to the training or testing commands so the functions can locate the correct subfolder.

Content in each subfolder:

  • train: training images used by command python cli.py train, see section Training Dataset below
  • val: validation images used by command python cli.py train --with-val, see section Validation Dataset below
  • val_cli: input modalities of the validation images used by command python cli.py test, see section Testing below
  • val_cli_gt: ground truth of the output modalities from the validation images, used for evaluation purposes

Training Dataset

For training in general, each image in the training set is in the form of a set of horizontally stitched patches, in the order of base input modalities, translation modalities, and segmentation modalities (whenever applicable).

Specifically, for the original DeepLIIF model, all patches must be 512x512 and are combined into 3072x512 images (six 512x512 patches stitched together horizontally).
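As a sanity check of this layout, here is a small sketch (using NumPy and synthetic data in place of real images) that splits one stitched 3072x512 training image back into its six 512x512 tiles:

```python
import numpy as np

TILE = 512
N_TILES = 6  # IHC, Hematoxylin, DAPI, Lap2, Marker, Seg

def split_stitched(img):
    """Split a horizontally stitched (512, 3072, C) training image
    into its six 512x512 tiles, in modality order."""
    h, w = img.shape[:2]
    assert h == TILE and w == TILE * N_TILES, f"expected 512x3072, got {h}x{w}"
    return [img[:, i * TILE:(i + 1) * TILE] for i in range(N_TILES)]

# synthetic stitched image: each tile filled with a distinct value
stitched = np.concatenate(
    [np.full((TILE, TILE, 3), i, dtype=np.uint8) for i in range(N_TILES)], axis=1
)
tiles = split_stitched(stitched)
print(len(tiles), tiles[0].shape)  # 6 (512, 512, 3)
```

A mismatched width immediately flags an image set that was stitched with the wrong tile count or size.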

We have provided a simple function in the CLI for preparing DeepLIIF data for training.

  • To prepare data for training with this method, you need the full image set for each sample (IHC, Hematoxylin channel, mpIF DAPI, mpIF Lap2, mpIF marker, and segmentation mask) in the input directory. The six images of a set must share the same naming format, differing only in the label that identifies the image type. To reproduce the original DeepLIIF model, the label names must be, respectively: IHC, Hematoxylin, DAPI, Lap2, Marker, Seg. The command takes the address of the directory containing the image set data and the address of the output dataset directory. It first creates the train and validation directories inside the given output dataset directory, then reads all images in the input directory and saves each combined image in the train or validation directory, based on the given validation_ratio.
deepliif prepare-training-data --input-dir /path/to/input/images
                               --output-dir /path/to/output/images
                               --validation-ratio 0.2
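Conceptually, the preparation step stitches the six co-registered modality images side by side. The following is a minimal sketch of that operation; the label list reflects the naming convention above, while the synthetic arrays stand in for real image files:

```python
import numpy as np

# Label convention from the original DeepLIIF model; the label is the
# only part of the filename that differs between the six images.
LABELS = ["IHC", "Hematoxylin", "DAPI", "Lap2", "Marker", "Seg"]

def stitch_image_set(images_by_label):
    """Concatenate the six modality images horizontally, in label order."""
    return np.concatenate([images_by_label[lbl] for lbl in LABELS], axis=1)

# synthetic 512x512 stand-ins for the six images of one sample
image_set = {lbl: np.zeros((512, 512, 3), dtype=np.uint8) for lbl in LABELS}
combined = stitch_image_set(image_set)
print(combined.shape)  # (512, 3072, 3)
```

The resulting 3072x512 image is what lands in the train or val subdirectory.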

Validation Dataset

The validation dataset consists of images in the same format as the training dataset and is entirely optional (i.e., the DeepLIIF model training command does not require a validation dataset to run). This is currently implemented only for DeepLIIF or DeepLIIFKD models with a segmentation task (in which case the very last tile in the training/validation image is the segmentation tile).

To use the validation dataset during training, it is necessary to first acquire the key quantitative statistics for the model to compare against as training progresses. For tasks that produce a single number or an array of numbers, validation metrics can be computed simply as the differences between the ground-truth and predicted values. In our image generation tasks, however, the key metrics we want to monitor are segmentation results: the number of positive cells, the number of negative cells, etc. These are much more involved to compute than simple numeric differences.
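To make the segmentation-derived metrics concrete, here is a minimal, dependency-light sketch that counts positive (red) and negative (blue) cells in a classified mask by labeling 4-connected components. The color encoding and threshold are assumptions based on the mask description above, and a real pipeline would use an optimized labeling routine:

```python
import numpy as np
from collections import deque

def count_cells(mask_binary):
    """Count 4-connected components in a boolean mask via BFS flood fill."""
    visited = np.zeros_like(mask_binary, dtype=bool)
    h, w = mask_binary.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if mask_binary[y, x] and not visited[y, x]:
                count += 1
                q = deque([(y, x)])
                visited[y, x] = True
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask_binary[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            q.append((ny, nx))
    return count

# toy mask: red channel marks positive cells, blue channel marks negative
seg = np.zeros((64, 64, 3), dtype=np.uint8)
seg[5:15, 5:15, 0] = 255    # one positive cell
seg[30:40, 30:40, 0] = 255  # another positive cell
seg[50:60, 10:20, 2] = 255  # one negative cell
positives = count_cells(seg[..., 0] > 127)
negatives = count_cells(seg[..., 2] > 127)
print(positives, negatives)  # 2 1
```

Comparing such counts between predicted and ground-truth masks yields the kind of scalar validation statistics the training loop can track.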
