SPIDER: A Multi-Organ Supervised Pathology Dataset and Baseline Models
Overview
SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.
This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.
📄 Paper
For a detailed description of SPIDER, methodology, and benchmark results, refer to our research paper:
📄 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
arXiv:2503.02876 (https://arxiv.org/abs/2503.02876)
Resources
📂 Datasets
SPIDER consists of four organ-specific datasets: skin, colorectal, thorax, and breast. All are available for download from the Hugging Face Hub 🤗.
Each dataset contains:
- 224×224 central patches with expert-verified class labels
- 24 surrounding context patches forming a 1120×1120 composite region
- 20X magnification for high-detail analysis
- Train-test splits ensuring robust benchmarking
📌 See individual dataset pages for more details.
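The patch geometry described above implies a 5×5 grid: one central 224×224 patch plus 24 neighbors, giving 5 × 224 = 1120 pixels per side. A minimal sketch of that arithmetic follows. The helper name `context_grid` and the row-major ordering are illustrative assumptions, not the layout used in the dataset files; the dataset pages remain the reference for that.

```python
PATCH_SIZE = 224   # side of one patch in pixels (from the dataset description)
GRID_SIZE = 5      # 5 x 5 patches = 1 central patch + 24 context patches


def context_grid(patch_size=PATCH_SIZE, grid_size=GRID_SIZE):
    """Return (row, col, left, top, is_center) for every patch in the composite.

    Hypothetical helper: row-major order is an assumption made for illustration.
    """
    center = grid_size // 2
    cells = []
    for row in range(grid_size):
        for col in range(grid_size):
            is_center = row == center and col == center
            cells.append((row, col, col * patch_size, row * patch_size, is_center))
    return cells


cells = context_grid()
composite_side = GRID_SIZE * PATCH_SIZE
center_cell = next(cell for cell in cells if cell[4])

print(composite_side)      # 1120: side of the composite region in pixels
print(len(cells) - 1)      # 24: number of surrounding context patches
print(center_cell[2:4])    # (448, 448): top-left corner of the central patch
```

The `(left, top)` pair of the central cell, extended by 224 pixels in each direction, gives the crop box of the labeled patch inside a 1120×1120 composite.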
🤖 Pretrained Models
The baseline models are trained on the SPIDER datasets using the Hibou-L foundation model with an attention-based classification head. They are available for download from the Hugging Face Hub 🤗.
Each model supports:
- Patch-level classification with multi-class labels
- Improved accuracy using surrounding context patches
- Easy deployment for pathology AI applications
📌 See individual model pages for inference instructions.
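For intuition about how an attention-based head can make use of context patches, here is a minimal sketch of attention pooling over patch embeddings: each patch is scored against a query vector, the scores are normalized with a softmax, and the embeddings are averaged with those weights. This is a generic illustration, not the architecture of the released SPIDER models. The function `attention_pool`, the fixed query, and the tiny 2-dimensional embeddings are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Pool patch embeddings into one vector using dot-product attention.

    embeddings: list of equal-length float lists, one per patch (hypothetical values).
    query: float list of the same length; in a trained model it would be learned.
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    scale = math.sqrt(dim)
    scores = [sum(q * x for q, x in zip(query, emb)) / scale for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]
    return pooled, weights


# Toy input: three "patch embeddings" and a query that favors the first dimension.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(patches, query=[2.0, 0.0])
print(weights)  # the second patch receives the smallest weight
print(pooled)
```

In a real head the pooled vector would feed a linear classifier, and the query (or a full transformer layer) would be trained jointly with it.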
🔧 Getting Started
🛠 Using the Dataset
Download any SPIDER dataset using huggingface_hub:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")
Or clone directly using Git:
git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal
Extract dataset files:
cat spider-colorectal.tar.* | tar -xvf -
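Where `cat` and `tar` are not available (for example on Windows), the same step can be sketched in Python with the standard library. This assumes the parts are plain byte-wise splits of a single tar archive and that sorting the file names restores the original order, which is also what the shell command relies on. The helper name `extract_split_tar` and the paths in the example call are illustrative.

```python
import glob
import os
import shutil
import tarfile


def extract_split_tar(pattern, out_dir):
    """Concatenate split tar parts matching `pattern` and extract them into `out_dir`.

    Hypothetical helper mirroring `cat parts | tar -xvf -`.
    Returns the sorted list of part files that were joined.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")

    os.makedirs(out_dir, exist_ok=True)
    joined = os.path.join(out_dir, "_joined.tar")

    # Rebuild the original archive by appending the parts in order.
    with open(joined, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)

    # Extract, then remove the temporary joined archive.
    with tarfile.open(joined) as archive:
        archive.extractall(out_dir)
    os.remove(joined)
    return parts


# Example call (paths are assumptions, adjust to your download location):
# extract_split_tar("./spider_colorectal/spider-colorectal.tar.*", "./spider_colorectal")
```

Unlike the streaming shell pipeline, this sketch writes a temporary joined archive, so it briefly needs extra disk space equal to the archive size.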
🤖 Using the Model
Load a pretrained model for inference:
from transformers import AutoModel, AutoProcessor
from PIL import Image
model = AutoModel.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
image = Image.open("path_to_image.png")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.predicted_class_names)
📈 Benchmark Results
| Organ      | Accuracy | Precision | F1 Score |
|------------|----------|-----------|----------|
| Skin       | 0.940    | 0.935     | 0.937    |
| Colorectal | 0.914    | 0.917     | 0.915    |
| Thorax     | 0.962    | 0.958     | 0.960    |
| Breast     | 0.902    | 0.896     | 0.897    |
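To compare your own predictions against numbers of this kind, the three metrics can be computed from lists of true and predicted labels. The sketch below uses macro averaging across classes for precision and F1; the averaging scheme behind the table is not stated here, so treat that choice as an assumption and check the paper before comparing values directly. The class names in the example are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, macro precision, and macro F1 for two label lists.

    Macro averaging is an assumption; the paper defines the reported metrics.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, f1_scores = [], []

    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Hypothetical labels, for illustration only.
y_true = ["tumor", "tumor", "normal", "normal"]
y_pred = ["tumor", "normal", "normal", "normal"]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.75
```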
📜 License
This project is licensed under CC BY-NC 4.0. The dataset and models are available for research use only.
📧 Contact
Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
📩 Emails: dmitry@hist.ai, alex@hist.ai, kate@hist.ai
📖 Citation
If you use SPIDER in your research, please cite:
@misc{nechaev2025spidercomprehensivemultiorgansupervised,
title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models},
author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
year={2025},
eprint={2503.02876},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2503.02876},
}
