CLAM <img src="clam-logo.png" width="280px" align="right" />
Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images. Nature Biomedical Engineering
ArXiv | Journal Link | Interactive Demo | Cite
TL;DR: CLAM is a high-throughput and interpretable method for data efficient whole slide image (WSI) classification using slide-level labels without any ROI extraction or patch-level annotations, and is capable of handling multi-class subtyping problems. Tested on three different WSI datasets, trained models adapt to independent test cohorts of WSI resections and biopsies as well as smartphone microscopy images (photomicrographs).
<img src="ani.gif" width="470px" align="left" />
CLAM: A Deep-Learning-based Pipeline for Data Efficient and Weakly Supervised Whole-Slide-level Analysis
Pre-requisites • Installation • Segmentation and Patching • Feature Extraction • Weakly Supervised Training • Testing • Trained Models • Heatmap Visualization • Examples • Pre-print • Demo • Cite
How does CLAM work? Clustering-constrained Attention Multiple Instance Learning (CLAM) is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also utilizing instance-level clustering over the representative regions identified to constrain and refine the feature space.
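The attention-pooling idea at the core of CLAM can be illustrated with a small, self-contained sketch (plain Python, illustrative only — the actual implementation uses gated attention networks trained end-to-end in PyTorch): each patch embedding receives an attention score, the scores are softmax-normalized across the slide's patches, and the slide-level representation is the attention-weighted sum of the patch embeddings.

```python
import math

def softmax(scores):
    """Normalize raw attention scores across all patches in a slide (a 'bag')."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(patch_features, raw_scores):
    """Slide-level embedding = attention-weighted sum of patch embeddings.

    patch_features: list of D-dim feature vectors, one per patch
    raw_scores: one raw (pre-softmax) attention score per patch; in CLAM
                these are produced by a small learned attention network
    """
    weights = softmax(raw_scores)
    dim = len(patch_features[0])
    slide_embedding = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_embedding, weights

# Toy example: 3 patches with 2-dim features; the second patch receives
# nearly all the attention, so it dominates the slide embedding.
feats = [[0.0, 0.0], [1.0, 2.0], [0.5, 0.5]]
slide_emb, attn = attention_pool(feats, [0.0, 5.0, 0.0])
```

In CLAM proper, the highest- and lowest-attended patches additionally feed the instance-level clustering objective that constrains and refines the feature space.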
© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.
Updates:
- 04/15/2025: Check out our new repository Trident for whole-slide image processing with support for 25+ foundation models, including UNIv2, CONCH, TITAN, and many more!
- 04/06/2024: UNI and CONCH are now available to select as pretrained encoders. See Using CONCH / UNI as Pretrained Encoders for more details. Please make sure all dependencies are installed correctly by installing the latest env.yml file (see Installation guide for details), and using the corresponding clam_latest conda environment.
- 03/19/2024: We are releasing UNI and CONCH, a pair of SOTA pretrained encoders that produce strong representations for histopathology images and enhance performance on various computational pathology workflows, including the MIL-based CLAM workflow.
- 05/24/2021: Script for heatmap visualization now available via create_heatmaps.py, with the configuration template located in heatmaps/configs. See Heatmap visualization for demo and instructions.
- 03/01/2021: New, fast patching/feature extraction pipeline is now available. TL;DR: since CLAM only requires image features for training, it is not necessary to save the actual image patches. The new pipeline eliminates this overhead by saving only the coordinates of image patches during "patching" and loading these regions on the fly from WSIs during feature extraction. This is significantly faster than the old pipeline and usually takes only 1-2 s for "patching" and a couple of minutes to featurize a WSI. To use the new pipeline, make sure you call create_patches_fp.py and extract_features_fp.py instead of the old create_patches.py and extract_features.py scripts.
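The coordinates-only idea behind the fast pipeline can be sketched as follows. This is a simplified illustration, not the actual create_patches_fp.py logic — the real script restricts the grid to the segmented tissue contours and stores the coordinates in .h5 files:

```python
def patch_grid(region_w, region_h, patch_size=256, step=None):
    """Enumerate top-left (x, y) coordinates of patches covering a region.

    Instead of cropping and saving every patch image, only these
    coordinates are stored; the pixel data is read on the fly from the
    WSI later, during feature extraction.
    """
    step = step or patch_size  # non-overlapping tiles by default
    coords = []
    for y in range(0, region_h - patch_size + 1, step):
        for x in range(0, region_w - patch_size + 1, step):
            coords.append((x, y))
    return coords

# A 1024 x 512 tissue region tiled with 256 x 256 patches -> 4 x 2 = 8 coordinates
coords = patch_grid(1024, 512)
```

Storing a handful of (x, y) pairs per slide instead of thousands of image crops is what makes "patching" take seconds rather than minutes.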
Note: while we hope that the newest update will require minimal changes to the user's workflow, if needed, you may reference the old version of the code base here. Please report any issues in the public forum.
Warning: the latest update will by default resize image patches to 224 x 224 before extracting features using the pretrained encoder. This change serves to make it more consistent with the evaluation protocol used in UNI, CONCH and other studies. If you wish to preserve the original size of the image patches generated during patching or use a different image size for feature extraction, you can do so by specifying --target_patch_size in extract_features_fp.py.
RE update 03/01/21: note that the README has been updated to use the new, faster pipeline by default. If you still wish to use the old pipeline, refer to: Guide for Old Pipeline. The old pipeline saves tissue patches, which is significantly slower and takes up a lot of storage space, but can still be useful if you need to work with the original image patches instead of feature embeddings.
Installation:
Please refer to our Installation guide for detailed instructions on how to get started.
WSI Segmentation and Patching
<img src="CLAM1.jpg" width="1000px" align="center" /> The first step focuses on segmenting the tissue and excluding any holes. The segmentation of specific slides can be adjusted by tuning the individual parameters (e.g. dilated vessels appearing as holes may be important for certain sarcomas). The following example assumes that digitized whole slide image data in well-known standard formats (.svs, .ndpi, .tiff, etc.) are stored under a folder named DATA_DIRECTORY:
```
DATA_DIRECTORY/
├── slide_1.svs
├── slide_2.svs
└── ...
```
Basic, Fully Automated Run
```shell
python create_patches_fp.py --source DATA_DIRECTORY --save_dir RESULTS_DIRECTORY --patch_size 256 --seg --patch --stitch
```
The above command will segment every slide in DATA_DIRECTORY using default parameters, extract all patches within the segmented tissue regions, create a stitched reconstruction for each slide using its extracted patches (optional), and generate the following folder structure at the specified RESULTS_DIRECTORY:
```
RESULTS_DIRECTORY/
├── masks
│   ├── slide_1.png
│   ├── slide_2.png
│   └── ...
├── patches
│   ├── slide_1.h5
│   ├── slide_2.h5
│   └── ...
├── stitches
│   ├── slide_1.png
│   ├── slide_2.png
│   └── ...
└── process_list_autogen.csv
```
- The masks folder contains the segmentation results (one image per slide).
- The patches folder contains arrays of extracted tissue patches from each slide (one .h5 file per slide, where each entry corresponds to the coordinates of the top-left corner of a patch).
- The stitches folder contains downsampled visualizations of stitched tissue patches (one image per slide; optional, not used for downstream tasks).
- The auto-generated csv file process_list_autogen.csv contains a list of all slides processed, along with the segmentation/patching parameters used.
Additional flags that can be passed include:
- --custom_downsample: factor for custom downscaling (not recommended; ideally first check whether native downsamples exist)
- --patch_level: which downsample pyramid level to extract patches from (default is 0, the highest available resolution)
- --no_auto_skip: by default, the script will skip over files for which patched .h5 files already exist in the destination folder; this toggle can be used to override this behavior
Some parameter templates are also available and can be readily deployed as good choices for default parameters:
- bwh_biopsy.csv: used for segmenting biopsy slides scanned at BWH (scanned using Hamamatsu S210 and Aperio GT450)
- bwh_resection.csv: used for segmenting resection slides scanned at BWH
- tcga.csv: used for segmenting TCGA slides
Simply pass the name of the template file to the --preset argument, for example, to use the biopsy template:
```shell
python create_patches_fp.py --source DATA_DIRECTORY --save_dir RESULTS_DIRECTORY --patch_size 256 --preset bwh_biopsy.csv --seg --patch --stitch
```
Custom Default Segmentation Parameters
For advanced usage, in addition to using the default, single set of parameters defined in the script create_patches_fp.py, the user can define custom templates of parameters depending on the dataset. These templates are expected to be stored under presets, and contain values for each of the parameters used during segmentation and patching.
The list of segmentation parameters is as follows:
- seg_level: downsample level on which to segment the WSI (default: -1, which uses the downsample in the WSI closest to 64x downsample)
- sthresh: segmentation threshold (positive integer, default: 8; using a higher threshold leads to less foreground and more background detection)
- mthresh: median filter size (positive, odd integer, default: 7)
- use_otsu: use Otsu's method instead of simple binary thresholding (default: False)
- close: additional morphological closing to apply following initial thresholding (positive integer or -1, default: 4)
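To build intuition for how sthresh behaves, here is a minimal plain-Python sketch of binary thresholding (illustrative only — the actual pipeline uses OpenCV with median blurring on a downsampled view of the slide): pixels above the threshold count as foreground, so raising sthresh shrinks the detected foreground, as noted above.

```python
def binary_threshold(pixels, sthresh):
    """Mark pixels strictly above the threshold as foreground (1)."""
    return [[1 if p > sthresh else 0 for p in row] for row in pixels]

def foreground_count(mask):
    """Total number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)

# Toy 3x3 channel: tissue tends to have high values, near-white
# background has low values, so thresholding separates the two.
sat = [
    [2, 9, 30],
    [5, 40, 80],
    [1, 12, 3],
]
low = binary_threshold(sat, 8)    # default-like threshold
high = binary_threshold(sat, 20)  # higher threshold -> less foreground
```

With the higher threshold, fewer pixels survive as foreground, which is exactly the "less foreground and more background detection" trade-off described for sthresh.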
The list of contour filtering parameters is as follows:
- a_t: area filter threshold for tissue (positive integer, the minimum size of detected foreground contours to consider, relative to a reference patch size of 512 x 512 at level 0; e.g. a value of 10 means only detected foreground contours larger than 10 512 x 512 sized patches at level 0 will be processed; default: 100)
- a_h: area filter threshold for holes (positive integer, the minimum size of detected holes/cavities in fo
