Histolab
Library for Digital Pathology Image Processing

Table of Contents
- Motivation
- Quickstart
- Versioning
- Authors
- License
- Roadmap
- Acknowledgements
- References
- Contribution guidelines
Motivation
The histo-pathological analysis of tissue sections is the gold standard to assess the presence of many complex diseases, such as tumors, and to understand their nature. In daily practice, pathologists usually perform microscopy examination of tissue slides considering a limited number of regions, and the clinical evaluation relies on several factors such as nuclei morphology, cell distribution, and color (staining). This process is time-consuming, can lead to information loss, and suffers from inter-observer variability.
The advent of digital pathology is changing the way pathologists work and collaborate, and has opened the way to a new era in computational pathology. In particular, histopathology is expected to be at the center of the AI revolution in medicine [1], a prediction supported by the increasing success of deep learning applications in digital pathology.
Whole Slide Images (WSIs), namely the translation of tissue slides from glass to digital format, are a great source of information from both a medical and a computational point of view. WSIs can be coloured with different staining techniques (e.g. H&E or IHC), and are usually very large (up to several GB per slide). Because of the typical pyramidal structure of WSIs, images can be retrieved at different magnification factors, providing a further layer of information beyond color.
However, processing WSIs is far from trivial. First of all, WSIs can be stored in different proprietary formats, depending on the scanner used to digitize the slides, and a standard protocol is still missing. WSIs can also present artifacts, such as shadows, mold, or annotations (pen marks), that are not useful. Moreover, given their size, it is not possible to process a WSI all at once or, for example, to feed it directly to a neural network: it is necessary to crop smaller regions of tissue (tiles), which in turn requires a tissue detection step.
The aim of this project is to provide a tool for WSI processing in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles, and it can thus be integrated in a deep learning pipeline.
Getting Started
Prerequisites
Please see installation instructions.
Documentation
Read the full documentation at https://histolab.readthedocs.io/en/latest/.
Communication
Join our user group on <img src="https://user-images.githubusercontent.com/4196091/101638148-01522780-3a2e-11eb-8502-f718564ffd43.png"> Slack
5-minute introduction
<a href="https://youtu.be/AdR4JK-Eq60" target="_blank"><img src="https://user-images.githubusercontent.com/4196091/105097293-a68a0200-5aa8-11eb-8327-6039940fbdca.png"></a>
Quickstart
Here we present a step-by-step tutorial on the use of histolab to
extract a tile dataset from example WSIs. The corresponding Jupyter
Notebook is available at https://github.com/histolab/histolab-box:
this repository contains a complete histolab environment that can be
used through Docker on all platforms.
Thus, you can either use histolab through
histolab-box or install it in your own Python virtual environment
(using conda, pipenv, pyenv, virtualenv, etc.). In the latter case, since
the histolab package is published on PyPI,
it can be easily installed via the command:
pip install histolab
Alternatively, it can be installed via conda:
conda install -c conda-forge histolab
TCGA data
First things first, let’s import some data to work with, for example the
prostate tissue slide and the ovarian tissue slide available in the
data module:
from histolab.data import prostate_tissue, ovarian_tissue
Note: To use the data module, you need to install pooch, also
available on PyPI (https://pypi.org/project/pooch/). This step is
unnecessary if you are using the Vagrant/Docker virtual environment.
Calling a data function automatically downloads the WSI
from the corresponding repository and saves the slide in a cache
directory:
prostate_svs, prostate_path = prostate_tissue()
ovarian_svs, ovarian_path = ovarian_tissue()
Notice that each data function returns the corresponding slide, as an
OpenSlide object, together with the path where the slide has been saved.
Slide initialization
histolab maps a WSI file into a Slide object. Each usage of a WSI
requires a one-to-one association with a Slide object contained in the
slide module:
from histolab.slide import Slide
To initialize a Slide it is necessary to specify the WSI path and the
processed_path where the tiles will be saved. In our
example, we want the processed_path of each slide to be a subfolder of
the current working directory:
import os
BASE_PATH = os.getcwd()
PROCESS_PATH_PROSTATE = os.path.join(BASE_PATH, 'prostate', 'processed')
PROCESS_PATH_OVARIAN = os.path.join(BASE_PATH, 'ovarian', 'processed')
prostate_slide = Slide(prostate_path, processed_path=PROCESS_PATH_PROSTATE)
ovarian_slide = Slide(ovarian_path, processed_path=PROCESS_PATH_OVARIAN)
Note: If the slides are stored in the same folder, this can be done
directly on the whole dataset by using the SlideSet object of the
slide module.
With a Slide object we can easily retrieve information about the
slide, such as the slide name, the number of available levels, the
dimensions at native magnification or at a specified level:
print(f"Slide name: {prostate_slide.name}")
print(f"Levels: {prostate_slide.levels}")
print(f"Dimensions at level 0: {prostate_slide.dimensions}")
print(f"Dimensions at level 1: {prostate_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {prostate_slide.level_dimensions(level=2)}")
Slide name: 6b725022-f1d5-4672-8c6c-de8140345210
Levels: [0, 1, 2]
Dimensions at level 0: (16000, 15316)
Dimensions at level 1: (4000, 3829)
Dimensions at level 2: (2000, 1914)
print(f"Slide name: {ovarian_slide.name}")
print(f"Levels: {ovarian_slide.levels}")
print(f"Dimensions at level 0: {ovarian_slide.dimensions}")
print(f"Dimensions at level 1: {ovarian_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {ovarian_slide.level_dimensions(level=2)}")
Slide name: b777ec99-2811-4aa4-9568-13f68e380c86
Levels: [0, 1, 2]
Dimensions at level 0: (30001, 33987)
Dimensions at level 1: (7500, 8496)
Dimensions at level 2: (1875, 2124)
Note:
If the native magnification, i.e., the magnification factor used to scan the slide, is provided in the slide properties, it is also possible
to convert the desired level to its corresponding magnification factor with the level_magnification_factor property.
print(
"Native magnification factor:",
prostate_slide.level_magnification_factor()
)
print(
"Magnification factor corresponding to level 1:",
prostate_slide.level_magnification_factor(level=1),
)
Native magnification factor: 20X
Magnification factor corresponding to level 1: 5.0X
Moreover, we can retrieve the slide thumbnail, or show it in a separate window:
prostate_slide.thumbnail
prostate_slide.show()
