CONCH 🐚
A Vision-Language Foundation Model for Computational Pathology
Nature Medicine <img src="conch.jpg" width="300px" align="right" />
Journal Link | Open Access Read Link | Download Model | Cite
Abstract: The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain and the model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and notably over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image, and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
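The task-agnostic pretraining objective named in the abstract is CLIP-style contrastive alignment of paired image and caption embeddings: matched pairs are pulled together and all other in-batch pairings pushed apart. As an illustration only (not the repo's training code), a minimal NumPy sketch of the symmetric in-batch contrastive loss:

```python
import numpy as np

def l2_normalize(x):
    # Unit-normalize each row so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over an in-batch similarity matrix.

    Matched image-caption pairs sit on the diagonal; every other
    pairing in the batch serves as a negative.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature   # (N, N) similarity matrix
    targets = np.arange(len(logits))     # image i matches caption i

    def xent(l):
        # Numerically stable log-softmax along each row,
        # then negative log-likelihood of the diagonal (matched) entries.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[targets, targets].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

With well-aligned embeddings the loss approaches zero; with mismatched pairs it grows, which is what drives the encoders toward a shared image-text space.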
What is CONCH?
CONCH (CONtrastive learning from Captions for Histopathology) is a vision-language foundation model for histopathology, pretrained on what is currently the largest histopathology-specific vision-language dataset, comprising 1.17M image-caption pairs. Compared to other vision-language foundation models, it demonstrates state-of-the-art performance across 14 tasks in computational pathology, spanning image classification, text-to-image and image-to-text retrieval, captioning, and tissue segmentation.
- Why use CONCH?: Compared to popular self-supervised encoders for computational pathology that were pretrained only on H&E images, CONCH may produce more performant representations for non-H&E-stained images such as IHCs and special stains, and it can be used for a wide range of downstream tasks involving histopathology images, text, or both. CONCH also did not use large public histology slide collections such as TCGA, PAIP, and GTEx for pretraining, which are routinely used in benchmark development in computational pathology. We therefore make CONCH available to the research community for building and evaluating pathology AI models with minimal risk of data contamination on public benchmarks or private histopathology slide collections.
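Transfer with "minimal or no further supervised fine-tuning" typically means zero-shot classification: embed one text prompt per class (e.g. "an H&E image of &lt;class&gt;") and score an image embedding against each. The helper below is a hypothetical sketch over pre-computed, model-agnostic embeddings; the actual CONCH encoders and prompt templates are provided in the repo, not here.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs, temperature=0.07):
    """Return class probabilities for one image embedding, given one
    text embedding per class (rows of class_text_embs)."""
    # Cosine similarity = dot product of unit-normalized vectors.
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # Stable softmax over the per-class similarity scores.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

The predicted label is simply the argmax of the returned probabilities; no labeled training images are required, only the class-name prompts.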
Installation
First clone the repo and cd into the directory:
git clone https://github.com/mahmoodlab/CONCH.git
cd CONCH
Then create a conda env and install the dependencies:
conda create -n conch python=3.10 -y
conda activate conch
pip install --upgrade pip
pip install -e .
Updates
- 3/20/2025: A one-year overview of UNI & CONCH, written by our team, with an updated table of research applications.
- 12/02/2024: Based on CONCH v1.5, a new SOTA multimodal slide foundation model, TITAN, has been released: [Model] [Preprint]
- 07/16/2024: Included comparisons with Virchow.
- 06/15/2024: Included comparisons with Prov-GigaPath.
Research Applications using UNI & CONCH
<details>
<summary><b>Last Updated 3/20/2025</b></summary>

| Paper Name | Year | Publication |
|---|---|---|
| A self-supervised framework for learning whole slide representations | 2024 | arXiv:2402.06188 |
| Honeybee: a scalable modular framework for creating multimodal oncology datasets with foundational embedding models | 2024 | arXiv:2405.07460 |
| Combining graph neural network and mamba to capture local and global tissue spatial relationships in whole slide images | 2024 | arXiv:2406.04377 |
| STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics | 2024 | arXiv:2406.06393 |
| Embedding-based multimodal learning on pan-squamous cell carcinomas for improved survival outcomes | 2024 | arXiv:2406.08521 |
| A clinical benchmark of public self-supervised pathology foundation models | 2024 | arXiv:2407.06508v1 |
| Path-SAM2: Transfer SAM2 for digital pathology semantic segmentation | 2024 | arXiv:2408.03651 |
| Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | 2024 | arXiv:2408.15823 |
| Pediatric brain tumor classification using digital histopathology and deep learning: evaluation of SOTA methods on a multi-center Swedish cohort | 2024 | arXiv:2409.01330 |
| Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | 2024 | arXiv:2409.09430 |
| Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction | 2024 | arXiv:2410.00945 |
| Deep Learning for Fetal Inflammatory Response Diagnosis in the Umbilical Cord | 2024 | arXiv:2411.09767 |
| Diagnostic Text-guided Representation Learning in Hierarchical Classification for Pathological Whole Slide Image | 2024 | arXiv:2411.10709 |
| Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining | 2024 | arXiv:2411.11613 |
| FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification | 2024 | arXiv:2411.14743 |
| RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency | 2024 | arXiv:2411.15076 |
| ST-Align: A Multimodal Foundation Model for Image-Gene Alignment in Spatial Transcriptomics | 2024 | arXiv:2411.16793 |
| Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology | 2024 | arXiv:2411.17418 |
| Multimodal whole slide foundation model for pathology | 2024 | arXiv:2411.19666 |
| GCUNet: A GNN-Based Contextual Learning Network for Tertiary Lymphoid Structure Semantic Segmentation in Whole Slide Image | 2024 | arXiv:2412.06129 |
| A multimodal ensemble approach for clear cell renal cell carcinoma treatment outcome prediction | 2024 | arXiv:2412.07136 |
| From Histopathology Images to Cell Clouds: Learning Slide Representations with Hierarchical Cell Transformer | 2024 | arXiv:2412.16715 |
| Vision-language models do not understand negation | 2025 | arXiv:2501.09425 |
| Prior Knowledge Injection into Deep Learning Models Predicting Gene Expression from Whole Slide Images | 2025 | arXiv:2501.14056 |

</details>
