ScECODA
Exploratory compositional data analysis of single-cell data
scECODA – single-cell Exploratory COmpositional Data Analysis
<p align="center"> <img width="154" height="154" alt="image" src="https://github.com/user-attachments/assets/ecd4f6c8-de4b-433c-b0f7-75aa2e37dee7" /> </p>

Single-cell omics technologies are increasingly applied to large patient cohorts, opening the possibility of unsupervised patient stratification based on cellular states. Identifying patient groups that share common patterns of cellular dysregulation is a key step toward more precise therapeutic strategies. Cohort-level exploratory analysis of single-cell omics data requires mapping high-dimensional molecular profiles into a lower-dimensional patient-level (or more generally, sample-level) representation space, in which biologically meaningful cohort-level structure can be identified, for example through clustering.
scECODA is a scalable and interpretable framework for exploratory, cohort-level analysis and patient stratification based on single-cell transcriptomics. It enables intuitive exploration of multi-sample datasets, such as large patient cohorts, and facilitates the unsupervised identification of samples with similar cell type compositional profiles. In addition, scECODA provides metrics to quantify the degree of separation between groups of samples and to pinpoint the cell types or states whose changes in abundance drive these differences. scECODA takes as input scRNA-seq count matrices (as SingleCellExperiment or Seurat objects) with cell type labels, and provides simple sample-level representations, such as centered-log-ratio (CLR) transformed cell type compositional vectors, as well as pseudobulk-based representations.
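To make the centered log-ratio (CLR) representation concrete, here is a minimal base-R sketch that builds a sample-by-cell-type count table from per-cell labels and CLR-transforms it. It is an illustration under stated assumptions, not scECODA's internal implementation: the toy data, the variable names, and the pseudocount of 0.5 are all made up for this example.

```r
# Toy per-cell metadata. Column names mirror the README example; values are invented.
cells <- data.frame(
  sample_id = c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3", "s3"),
  celltype_annotations = c("T", "T", "B", "NK", "T", "B", "B", "T", "NK", "NK")
)

# Sample x cell type count matrix
counts <- unclass(table(cells$sample_id, cells$celltype_annotations))

# Assumption: a pseudocount avoids log(0) for cell types absent from a sample.
pseudocount <- 0.5
props <- (counts + pseudocount) / rowSums(counts + pseudocount)

# CLR: log proportion minus the per-sample mean of the log proportions
log_props <- log(props)
clr <- log_props - rowMeans(log_props)

print(round(clr, 3))
```

Each row of `clr` sums to zero by construction, which is why downstream methods such as PCA see relative rather than absolute abundances.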
Package Installation
To install scECODA directly from the GitHub repository, run the following code from within R or RStudio:
install.packages("remotes")
library(remotes)
remotes::install_github("carmonalab/scECODA")
Example
The following example uses 868 scRNA-seq samples from the blood of healthy donors (data from Gong & Sharma et al.) with previously annotated cell types. It illustrates how samples naturally separate in an unsupervised manner by donor age and CMV infection status, and highlights the top cell types whose changes in abundance drive inter-sample variation.
ecoda_object <- ecoda(
  seurat_object,                         # or SingleCellExperiment object or count data
  sample_col = "sample_id",              # Metadata column containing sample annotation for each cell
  celltype_col = "celltype_annotations"  # Metadata column containing cell type annotations
)
plot_pca(ecoda_object)
See also the Tutorial below.
<img width="2700" height="2100" alt="GongSharma" src="https://github.com/user-attachments/assets/aa8b34ba-722c-495d-a9f7-3aea92842652" />

Code to reproduce this figure: https://github.com/carmonalab/scECODA/blob/main/data-raw/Create_readme_figure.rmd
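For intuition about how a PCA of compositional vectors can highlight the cell types driving inter-sample variation, the following base-R sketch simulates two groups of samples, CLR-transforms the counts, and ranks cell types by their absolute loading on the first principal component. The simulated proportions, the pseudocount, and the use of `prcomp` are assumptions made for illustration; this does not reproduce what `plot_pca` does internally.

```r
set.seed(1)
celltypes <- c("T", "B", "NK", "Mono", "DC")

# Hypothetical cell type proportions; the second group has more monocytes.
p_a <- c(0.40, 0.25, 0.15, 0.15, 0.05)
p_b <- c(0.30, 0.20, 0.10, 0.35, 0.05)

# 10 samples per group, 2000 cells per sample (samples in rows)
counts <- t(cbind(rmultinom(10, size = 2000, prob = p_a),
                  rmultinom(10, size = 2000, prob = p_b)))
colnames(counts) <- celltypes
rownames(counts) <- paste0("s", seq_len(nrow(counts)))

# CLR transform with an assumed pseudocount of 0.5
log_counts <- log(counts + 0.5)
clr <- log_counts - rowMeans(log_counts)

# PCA on the CLR matrix (centered, not scaled)
pca <- prcomp(clr, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Cell types ranked by absolute contribution to PC1
loadings_pc1 <- sort(abs(pca$rotation[, "PC1"]), decreasing = TRUE)
print(round(var_explained, 3))
print(round(loadings_pc1, 3))
```

In this toy setup the cell type whose abundance differs most between the groups should rank first on PC1, which mirrors the kind of interpretation described for the figure above.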
Tutorial
Check out our step-by-step scECODA tutorial (RMD)
Case studies
Case Study 1: Granularity Matters - See how fine-grained cell type annotation can be crucial to uncover inter-sample biological variation missed by broad, low-resolution annotation or pseudobulk gene expression in these ECODA analyses of i) blood samples from healthy individuals and ii) lung samples from patients with different pulmonary diseases. (RMD)
Case Study 2: Cell type composition vs. Pseudo-bulk gene expression - See how scECODA's compositional analysis compares to pseudobulk analysis and outperforms it when differences are driven by low-abundance cell types in a semi-synthetic dataset. (RMD)
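As a point of comparison with the compositional view, a pseudobulk representation sums gene counts over all cells of each sample. The base-R sketch below shows one generic way to do this on a toy gene-by-cell matrix, followed by a simple log-CPM normalization. The simulated counts, the variable names, and the normalization choice are assumptions for illustration and may differ from how scECODA computes its pseudobulk representations.

```r
set.seed(42)
n_genes <- 6
n_cells <- 12

# Toy genes x cells count matrix
expr <- matrix(
  rpois(n_genes * n_cells, lambda = 3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Sample label for each cell (4 cells per sample)
sample_id <- rep(c("s1", "s2", "s3"), each = 4)

# rowsum() sums rows by group, so transpose to cells x genes first,
# then transpose back to genes x samples.
pseudobulk <- t(rowsum(t(expr), group = sample_id))

# Simple library-size normalization: log2 counts per million with pseudocount 1
lib_size <- colSums(pseudobulk)
log_cpm <- log2(t(t(pseudobulk) / lib_size) * 1e6 + 1)

print(pseudobulk)
```

Because every cell contributes to the per-sample sum in proportion to its counts, abundant cell types dominate a pseudobulk profile, which is one reason low-abundance populations can be easier to detect in a compositional representation.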
References
Halter C, et al. 2025
