SPIDER: A Multi-Organ Supervised Pathology Dataset and Baseline Models

Overview

SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.

This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.

📄 Paper

For a detailed description of SPIDER, methodology, and benchmark results, refer to our research paper:

📄 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
View on arXiv: https://arxiv.org/abs/2503.02876

Resources

📂 Datasets

SPIDER consists of four organ-specific datasets (skin, colorectal, thorax, and breast), available for download from the Hugging Face Hub 🤗.

Each dataset contains:

224×224 central patches with expert-verified class labels
24 surrounding context patches forming a 1120×1120 composite region
20X magnification for high-detail analysis
Train-test splits ensuring robust benchmarking
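
The patch geometry above can be sketched in a few lines. This is only an illustration, assuming the 24 context patches and the central patch tile a 5×5 grid in row-major order (which is what 25 patches of 224 px and a 1120×1120 region imply); the helper names `patch_boxes` and `central_box` are hypothetical and not part of any SPIDER tooling.

```python
PATCH = 224  # side of one patch in pixels (from the list above)
GRID = 5     # 1 central + 24 context patches = 25 = 5 x 5 (inferred, not stated)


def patch_boxes(patch=PATCH, grid=GRID):
    """Return (left, upper, right, lower) pixel boxes for every tile, row by row."""
    return [
        (col * patch, row * patch, (col + 1) * patch, (row + 1) * patch)
        for row in range(grid)
        for col in range(grid)
    ]


def central_box(patch=PATCH, grid=GRID):
    """Return the box of the middle tile, i.e. the labelled central patch."""
    mid = grid // 2
    return (mid * patch, mid * patch, (mid + 1) * patch, (mid + 1) * patch)


boxes = patch_boxes()
print(len(boxes))     # 25
print(boxes[-1][2])   # 1120 (right edge of the composite region)
print(central_box())  # (448, 448, 672, 672)
```

The boxes use the same (left, upper, right, lower) order as Pillow's `Image.crop`, so `central_box()` could be passed to it directly to cut the labelled patch out of a composite image.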

📌 See individual dataset pages for more details.

🤖 Pretrained Models

Baseline models are trained on the SPIDER datasets using the Hibou-L foundation model with an attention-based classification head. They are available for download from the Hugging Face Hub 🤗.

Each model supports:

Patch-level classification with multi-class labels
Improved accuracy using surrounding context patches
Easy deployment for pathology AI applications

📌 See individual model pages for inference instructions.

🔧 Getting Started

🛠 Using the Dataset

Download any SPIDER dataset using huggingface_hub:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")

Or clone directly using Git:

git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal

The dataset is distributed as a split tar archive. From the download directory, concatenate the parts and extract them:

cat spider-colorectal.tar.* | tar -xvf -
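
Where `cat` and `tar` are not available (for example on Windows), the same step can be approximated in Python. The sketch below is an assumption-laden equivalent, not an official tool: `ChainedReader` and `extract_split_tar` are hypothetical helpers, and it assumes the parts sort correctly by file name, as the shell glob above also does.

```python
import tarfile
from pathlib import Path


class ChainedReader:
    """Read several files back to back as one byte stream (like `cat a b c`)."""

    def __init__(self, paths):
        self._paths = iter(paths)
        self._current = None

    def read(self, size=-1):
        chunks = []
        while size != 0:
            if self._current is None:
                nxt = next(self._paths, None)
                if nxt is None:
                    break  # all parts consumed
                self._current = open(nxt, "rb")
            data = self._current.read(size)
            if not data:  # current part exhausted, move on to the next one
                self.close()
                continue
            chunks.append(data)
            if size > 0:
                size -= len(data)
        return b"".join(chunks)

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


def extract_split_tar(parts_dir, prefix, out_dir):
    """Extract `<prefix>.*` parts found in parts_dir into out_dir."""
    parts = sorted(Path(parts_dir).glob(prefix + ".*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}.* in {parts_dir}")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    reader = ChainedReader(parts)
    try:
        # "r|*" reads a non-seekable stream and auto-detects compression
        with tarfile.open(fileobj=reader, mode="r|*") as tar:
            # use the safer "data" extraction filter where Python provides it
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(out_dir, **kwargs)
    finally:
        reader.close()
    return parts
```

A call such as `extract_split_tar("./spider_colorectal", "spider-colorectal.tar", "./spider_colorectal")` would then mirror the shell command. Streaming the parts avoids writing a second, concatenated copy of the archive to disk.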

🤖 Using the Model

Load a pretrained model for inference:

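import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# trust_remote_code=True runs the custom model code shipped in the model repository
model = AutoModel.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)

# Convert to RGB so that images with an alpha channel or palette are handled
image = Image.open("path_to_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.predicted_class_names)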

📈 Benchmark Results

| Organ      | Accuracy | Precision | F1 Score |
|------------|----------|-----------|----------|
| Skin       | 0.940    | 0.935     | 0.937    |
| Colorectal | 0.914    | 0.917     | 0.915    |
| Thorax     | 0.962    | 0.958     | 0.960    |
| Breast     | 0.902    | 0.896     | 0.897    |
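
For readers who want to compute comparable numbers on their own predictions, the three metrics can be sketched as follows. This README does not state how precision and F1 are averaged across classes, so macro averaging (an unweighted mean over classes) is an assumption here, and `macro_metrics` is a hypothetical helper rather than the authors' evaluation code.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for two label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    precisions, f1s = [], []
    for cls in sorted(set(y_true) | set(y_pred)):
        p_den = tp[cls] + fp[cls]
        r_den = tp[cls] + fn[cls]
        precision = tp[cls] / p_den if p_den else 0.0
        recall = tp[cls] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / len(precisions), sum(f1s) / len(f1s)


# Toy labels for illustration only, not SPIDER data
acc, prec, f1 = macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(round(acc, 3), round(prec, 3), round(f1, 3))  # 0.75 0.833 0.733
```

If the reported figures turn out to use weighted averaging instead, the per-class values would need to be weighted by class frequency before taking the mean.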

🔗 More Information

📜 License

This project is licensed under CC BY-NC 4.0. The dataset and models are available for research use only.

📧 Contact

Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
📩 Emails: dmitry@hist.ai, alex@hist.ai, kate@hist.ai

📖 Citation

If you use SPIDER in your research, please cite:

@misc{nechaev2025spidercomprehensivemultiorgansupervised,
      title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models}, 
      author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
      year={2025},
      eprint={2503.02876},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2503.02876}, 
}
