EcoTyper
EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.
Introduction
EcoTyper is a machine learning framework for large-scale identification of cell type-specific transcriptional states and their co-association patterns from bulk and single-cell (scRNA-seq) expression data.
EcoTyper can be run through an easy-to-use web interface, accessible here. The software can also be run on a personal computer, server, or high-performance computing cluster by following the instructions in this GitHub repository. For further details, users may refer to our book chapter (Methods in Molecular Biology, 2023).
We have already defined cell states and ecotypes across carcinomas (Luca/Steen et al., Cell 2021) and in diffuse large B cell lymphoma (DLBCL) (Steen/Luca et al., Cancer Cell 2021). The current version of EcoTyper allows users to recover the cell states and ecotypes for these two tumor categories in their own data. Additionally, it allows users to discover and recover cell states and ecotypes in their system of interest, including directly from scRNA-seq data (see Tutorial 5). Below we illustrate each of these functionalities.
Citation
If EcoTyper software, data, and/or website are used in your publication, please cite the following paper(s):
- Luca/Steen et al., Cell 2021 (detailed description of EcoTyper and application to carcinomas).
- Steen/Luca et al., Cancer Cell 2021 (application of EcoTyper to lymphoma).
Setup
The latest version of the EcoTyper source code can be found in the EcoTyper GitHub repository and on the EcoTyper website. To set up EcoTyper, please download this folder locally:
git clone https://github.com/digitalcytometry/ecotyper
cd ecotyper
or:
wget https://github.com/digitalcytometry/ecotyper/archive/refs/heads/master.zip
unzip master.zip
cd ecotyper-master
Basic resources
The R packages listed below are required for running EcoTyper. The version numbers indicate the package versions used for developing and testing the EcoTyper code. Other R versions might work too:
- R (v3.6.0 and v4.1.0).
- R packages: ComplexHeatmap (v2.2.0 and v2.8.0), NMF (v0.21.0 and v0.23.0), RColorBrewer (v1.1.2), cluster (v2.1.0 and v2.1.2), circlize (v0.4.10 and v0.4.12), cowplot (v1.1.0 and v1.1.1), data.table (base package R v3.6.0 and v4.1.0), doParallel (v1.0.15 and v1.0.16), ggplot2 (v3.3.2 and v3.3.3), grid (base package R v3.6.0 and v4.1.0), reshape2 (v1.4.4), viridis (v0.5.1 and v0.6.1), config (v0.3.1), argparse (v2.0.3), colorspace (v1.4.1 and v2.0.1), plyr (v1.8.6), Biobase (v2.40.0).
These packages, together with the other resources pre-stored in the EcoTyper folder, allow users to:
- perform the recovery of previously defined cell states and ecotypes in their own bulk RNA-seq, microarray and scRNA-seq data (Tutorials 1 and 2).
- perform cell state and ecotype discovery in scRNA-seq and pre-sorted cell type-specific profiles (Tutorials 5 and 6).
Besides these packages, the additional resources described in the next section are needed for analyses described in Tutorials 3 and 4. Moreover, Mac users might need xquartz.
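As a convenience, the sketch below reports which of the R packages listed above are not yet installed. It is a minimal example and not part of EcoTyper itself: the variable `ECOTYPER_R_PACKAGES` and the function `list_missing_r_packages` are names introduced here for illustration, and package versions are not checked.

```bash
# Package names copied from the list above (versions are not checked).
ECOTYPER_R_PACKAGES="ComplexHeatmap NMF RColorBrewer cluster circlize cowplot data.table doParallel ggplot2 grid reshape2 viridis config argparse colorspace plyr Biobase"

# Print the names of packages that R cannot find, one per line.
# Returns 2 if Rscript itself is not on the PATH.
list_missing_r_packages() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 2
  fi
  # The package names are passed to R as trailing command-line arguments.
  Rscript -e 'pkgs <- commandArgs(trailingOnly = TRUE); cat(setdiff(pkgs, rownames(installed.packages())), sep = "\n")' $ECOTYPER_R_PACKAGES
}
```

Calling `list_missing_r_packages` after sourcing this snippet prints nothing when every package is available.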
Additional resources
For some use cases, such as cell state and ecotype recovery in spatial transcriptomics assays (Tutorial 3) and de novo identification of cell states and ecotypes from bulk expression data (Tutorial 4), EcoTyper relies on CIBERSORTx (Newman et al., Nature Biotechnology 2019), a digital cytometry framework for enumerating cell types in bulk data and performing in silico deconvolution of cell type-specific expression profiles. In these situations, the following additional resources are needed for running EcoTyper:
- Docker or Singularity.
- CIBERSORTx executables that can be downloaded from the CIBERSORTx website as Docker images. Specifically, EcoTyper requires the CIBERSORTx Fractions and CIBERSORTx HiRes modules. Please follow the instructions in the Download section of the website to download the Docker images and obtain the Docker tokens necessary for running them. If Singularity is used, the Docker images need to be converted to Singularity Image Files (SIF).
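As a rough sketch of the container setup described above, the function below prints, without running them, the commands one might use to fetch the two CIBERSORTx modules with either runtime. The image names `cibersortx/fractions` and `cibersortx/hires`, the `.sif` file names, and the function name are assumptions made for illustration; the Download section of the CIBERSORTx website remains the authoritative source for image names and token instructions.

```bash
# Print the commands to fetch the CIBERSORTx Fractions and HiRes images
# for the given runtime ("docker" or "singularity"). Nothing is executed.
# ASSUMPTION: the images are named cibersortx/fractions and cibersortx/hires.
print_cibersortx_setup() {
  runtime="$1"
  for module in fractions hires; do
    case "$runtime" in
      docker)
        echo "docker pull cibersortx/$module" ;;
      singularity)
        # Converts the Docker image to a Singularity Image File (SIF).
        echo "singularity build $module.sif docker://cibersortx/$module" ;;
      *)
        echo "unknown runtime: $runtime" >&2
        return 1 ;;
    esac
  done
}
```

For example, `print_cibersortx_setup singularity` lists one `singularity build` command per module, which can be reviewed before being run by hand.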
EcoTyper implementation
EcoTyper is standalone software implemented in R (not an R package). Some of the EcoTyper functions are computationally intensive, especially the cell state discovery step described in Tutorials 4-6. Therefore, EcoTyper is designed as a collection of modular command-line R scripts that can be run in parallel on a multi-processor server or a high-performance cluster. Each script is designed such that its instances can typically be run on a single core.
We provide wrappers over these scripts that encapsulate the typical EcoTyper workflows (Tutorials 1-6). These wrappers can be run on a multi-core system, and allow users to discover cell states and ecotypes in their own bulk, scRNA-seq and FACS-sorted data, as well as recover previously discovered cell states and ecotypes in bulk tissue expression profiles, spatial transcriptomics assays, and single-cell RNA-seq data.
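For readers who prefer to call the modular scripts directly rather than through the wrappers, the sketch below shows one generic way to run single-core instances side by side using `xargs -P`. It is an illustration under stated assumptions, not the scheduling logic used by the EcoTyper wrappers: the function name `run_per_item` is introduced here, and the script name and flag in the closing comment are hypothetical.

```bash
# Run <command> once per line of <items_file>, with at most <max_jobs>
# instances running at the same time. Each line of the file is appended
# as the last argument of the command.
run_per_item() {
  max_jobs="$1"
  items_file="$2"
  shift 2
  xargs -n 1 -P "$max_jobs" "$@" < "$items_file"
}

# Example with a stand-in command (one cell type label per line in the file):
#   run_per_item 2 cell_types.txt echo "processing"
# A real call might look like this (script name and flag are hypothetical):
#   run_per_item 4 cell_types.txt Rscript your_step.R --cell-type
```

Because each instance is independent, `max_jobs` can simply be set to the number of cores available on the machine.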
EcoTyper overview
EcoTyper performs two major types of analysis: discovery of cell states and ecotypes, starting from bulk, scRNA-seq, or pre-sorted cell type-specific expression profiles (e.g., FACS-sorted or deconvolved in silico); and recovery of previously defined cell states and ecotypes in new bulk, scRNA-seq, and spatial transcriptomics data.
When the input is bulk data, EcoTyper performs the following major steps for discovering cell states and ecotypes:
- In silico purification: This step enables imputation of cell type-specific gene expression profiles from bulk tissue transcriptomes, using CIBERSORTx (Newman et al., Nature Biotechnology 2019).
- Cell state discovery: This step enables identification and quantitation of cell type-specific transcriptional states.
- Ecotype discovery: This step enables co-assignment of cell states into multicellular communities (ecotypes).
When the input is scRNA-seq or bulk-sorted cell type-specific profiles (e.g., FACS-purified), EcoTyper performs the following major steps for discovering cell states and ecotypes:
- Gene filtering: This step filters out genes that do not show cell type specificity.
- Cell state discovery: This step enables identification and quantitation of cell type-specific transcriptional states.
- Ecotype discovery: This step enables co-assignment of cell states into multicellular communities (ecotypes).
Regardless of the input type used for deriving cell states and ecotypes, EcoTyper can perform cell state and ecotype recovery in external expression datasets. The recovery can be performed in bulk, scRNA-seq and spatial transcriptomics data.
A book chapter published in Methods in Molecular Biology describing how to use EcoTyper in detail can be found here.
Additionally, we provide six tutorials below that illustrate these functionalities. The first three demonstrate how the recovery of cell states and ecotypes can be performed with various input types. The last three demonstrate how the de novo discovery of cell states and ecotypes can be performed with various input types:
- Tutorial 1: Recovery of Cell States and Ecotypes in User-Provided Bulk Data
- Tutorial 2: Recovery of Cell States and Ecotypes in User-Provided scRNA-seq Data
- Tutorial 3: Recovery of Cell States and Ecotypes in Visium Spatial Gene Expression Data
- Tutorial 4: De novo Discovery of Cell States and Ecotypes in Bulk Expression Data
- Tutorial 5: De novo Discovery of Cell States and Ecotypes in scRNA-seq Data
- Tutorial 6: De novo Discovery of Cell States and Ecotypes in Pre-Sorted Data
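The mapping from analysis type and input data to tutorial can be summarized as a small lookup, sketched below. The function name `pick_tutorial` and the keywords `recovery`, `discovery`, `bulk`, `scrnaseq`, `spatial`, and `presorted` are labels chosen for this example only; they are not EcoTyper options.

```bash
# Map an analysis type and an input type to the matching tutorial number.
#   analysis: recovery | discovery
#   input:    bulk | scrnaseq | spatial | presorted
pick_tutorial() {
  case "$1:$2" in
    recovery:bulk)       echo 1 ;;
    recovery:scrnaseq)   echo 2 ;;
    recovery:spatial)    echo 3 ;;  # also needs the CIBERSORTx resources
    discovery:bulk)      echo 4 ;;  # also needs the CIBERSORTx resources
    discovery:scrnaseq)  echo 5 ;;
    discovery:presorted) echo 6 ;;
    *)
      echo "no tutorial for $1 on $2 data" >&2
      return 1 ;;
  esac
}
```

For instance, `pick_tutorial discovery scrnaseq` prints `5`, while an unsupported combination such as `discovery spatial` returns a non-zero status.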
A schema of the tutorials is presented below:
<img src="utils/schema.png" width="100%" style="display: block; margin: auto;" />
Tutorial 1: Recovery of Cell States and Ecotypes in User-Provided Bulk Data
EcoTyper comes pre-loaded with the resources necessary for the reference-guided recovery of cell states and ecotypes previously defined in carcinoma or lymphoma, in user-provided bulk expression data. In the carcinoma EcoTyper paper, we demonstrate that prior deconvolution of bulk data using CIBERSORTx HiRes i
