Sccomp
Bayesian mixed-effect model to test differences in cell type proportions from single-cell data, in R
sccomp: Differential Composition and Variability Analysis for Single-Cell Data
Stefano Mangiola 2025-07-18
<!-- badges: start --> <!-- badges: end --><img src="inst/logo-01.png" height="139px" width="120px"/>
sccomp is a powerful R package designed for comprehensive differential composition and variability analysis in single-cell genomics, proteomics, and microbiomics data.
Why sccomp?
For cellular omic data, no method for differential variability analysis exists, and methods for differential composition analysis only take a few fundamental data properties into account. Here we introduce sccomp, a generalised method for differential composition and variability analyses capable of jointly modelling data count distribution, compositionality, group-specific variability, and proportion mean-variability association, while being robust to outliers.
<img src="inst/cartoon_methods.png" width="100%"/>

Comprehensive Method Comparison
- I: Data are modelled as counts.
- II: Group proportions are modelled as compositional.
- III: The proportion variability is modelled as cell-type specific.
- IV: Information sharing across cell types, mean–variability association.
- V: Outlier detection or robustness.
- VI: Differential variability analysis.
- VII: Mixed-effect modelling.
- VIII: Removal of unwanted effects.

| Method | Year | Model | I | II | III | IV | V | VI | VII | VIII |
|----|----|----|----|----|----|----|----|----|----|----|
| sccomp | 2023 | Sum-constrained Beta-binomial | ● | ● | ● | ● | ● | ● | ● | ● |
| scCODA | 2021 | Dirichlet-multinomial | ● | ● | | | | | | |
| quasi-binom. | 2021 | Quasi-binomial | ● | | ● | | | | | |
| rlm | 2021 | Robust-log-linear | | ● | | | ● | | | |
| propeller | 2021 | Logit-linear + limma | | ● | ● | ● | | | | |
| ANCOM-BC | 2020 | Log-linear | | ● | ● | | | | | |
| corncob | 2020 | Beta-binomial | ● | | ● | | | | | |
| scDC | 2019 | Log-linear | | ● | ● | | | | | |
| dmbvs | 2017 | Dirichlet-multinomial | ● | ● | | | | | | |
| MixMC | 2016 | Zero-inflated Log-linear | | ● | ● | | | | | |
| ALDEx2 | 2014 | Dirichlet-multinomial | ● | ● | | | | | | |

Scientific Citation
Mangiola, Stefano, Alexandra J. Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F. Rubin, Terence P. Speed, Heejung Shim, and Anthony T. Papenfuss. 2023. “Sccomp: Robust Differential Composition and Variability Analysis for Single-Cell Data.” Proceedings of the National Academy of Sciences of the United States of America 120 (33): e2203828120. https://doi.org/10.1073/pnas.2203828120
Talk
<a href="https://www.youtube.com/watch?v=R_lt58We9nA&ab_channel=RConsortium" target="_blank"> <img src="https://img.youtube.com/vi/R_lt58We9nA/mqdefault.jpg" alt="Watch the video" width="280" height="180" border="10" /> </a>

Installation Guide
sccomp is based on cmdstanr, which provides the latest version of
cmdstan, the Bayesian modelling tool. cmdstanr is not on CRAN, so
installation is a simple 3-step process (the user is prompted if a step
has been forgotten).
- R installation of sccomp
- R installation of cmdstanr
- cmdstanr call to cmdstan installation
Bioconductor
if (!requireNamespace("BiocManager")) install.packages("BiocManager")
# Step 1
BiocManager::install("sccomp")
# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))
# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()
GitHub
# Step 1
devtools::install_github("MangiolaLaboratory/sccomp")
# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))
# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()
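
Before running an analysis, it can help to confirm that all three steps have completed. The following is a minimal sketch, not part of sccomp: `check_sccomp_setup` is a hypothetical helper name, and it only assumes that `cmdstanr::cmdstan_version(error_on_NA = FALSE)` returns `NULL` when no CmdStan installation is found.

```r
# Hypothetical helper (not an sccomp or cmdstanr function): reports which of
# the three installation steps appear to be complete on this machine.
check_sccomp_setup <- function() {
  has_sccomp   <- requireNamespace("sccomp", quietly = TRUE)
  has_cmdstanr <- requireNamespace("cmdstanr", quietly = TRUE)

  # Only query CmdStan if cmdstanr is available; treat any error as "not found"
  has_cmdstan <- has_cmdstanr && tryCatch(
    !is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE)),
    error = function(e) FALSE
  )

  c(sccomp = has_sccomp, cmdstanr = has_cmdstanr, cmdstan = has_cmdstan)
}

status <- check_sccomp_setup()
print(status)
```

Any `FALSE` entry points to the step that should be repeated.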
Server special requirements: Restricted or read-only environments
sccomp needs to write files to disk (compiled Stan models, draw files). On shared servers or restricted environments:
- Prefer installing locally: install sccomp in a user-writable R library (e.g. ~/R/x86_64-pc-linux-gnu-library/4.x) so it can use the default cache ~/.sccomp_models.
- Or request write access: ask your system administrator for permission to write in your user directories (e.g. ~/.sccomp_models).
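
Whether a location is actually writable can be checked from R before fitting. This is a small sketch using base R only; `is_writable_dir` is a hypothetical helper, not an sccomp function, and the paths are just the examples mentioned above.

```r
# Hypothetical helper: TRUE if `path` is an existing directory that the
# current user can write to.
is_writable_dir <- function(path) {
  path <- path.expand(path)
  # file.access() returns 0 on success; mode = 2 tests write permission
  dir.exists(path) && unname(file.access(path, mode = 2)) == 0
}

is_writable_dir(tempdir())           # session temp directory
is_writable_dir(.libPaths()[1])      # first R library on the search path
is_writable_dir("~/.sccomp_models")  # default cache; FALSE if not created yet
```

A `FALSE` for the cache directory may simply mean it does not exist yet, so checking its parent directory as well gives a fuller picture.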
If your administrator has pre-compiled models in a shared directory (e.g. /opt/sccomp_models), you can point sccomp to that cache before calling any sccomp function:
library(sccomp)
cache_stan_model_dir <- "/opt/sccomp_models"
# Set the cache directory before any sccomp call (sccomp_boxplot, sccomp_estimate, etc.
# use internal functions that check this cache)
utils::assignInNamespace("sccomp_stan_models_cache_dir", cache_stan_model_dir, ns = "sccomp")
# Now run your analysis
sccomp_result <-
  counts_obj |>
  sccomp_estimate(formula_composition = ~ type, sample = "sample", cell_group = "cell_group",
                  abundance = "count", cores = 1) |>
  sccomp_test()
sccomp_result |> sccomp_boxplot(factor = "type")
Alternatively, pass cache_stan_model explicitly in each call:
sccomp_result <-
  counts_obj |>
  sccomp_estimate(..., cache_stan_model = "/opt/sccomp_models") |>
  sccomp_remove_outliers(cache_stan_model = "/opt/sccomp_models") |>
  sccomp_test()
sccomp_result |> sccomp_boxplot(factor = "type", cache_stan_model = "/opt/sccomp_models")
Core Functions
| Function | Description |
|----|----|
| sccomp_estimate | Fit the model to the data and estimate the coefficients |
| sccomp_remove_outliers | Identify outliers probabilistically based on the model fit and exclude them from the estimation |
| sccomp_test | Calculate the probability that the coefficients are outside the H0 interval (i.e. test_composition_above_logit_fold_change) |
| sccomp_replicate | Simulate data from the model, or part of the model |
| sccomp_predict | Predict proportions based on the model, or part of the model |
| sccomp_remove_unwanted_effects | Remove the variability for unwanted factors |
| plot | Plot summaries to assess significance |
Analysis Tutorial
library(dplyr)
library(sccomp)
library(ggplot2)
library(forcats)
library(tidyr)
data("seurat_obj")
data("sce_obj")
data("counts_obj")
Binary Factor Analysis
In the output table, estimate columns prefixed with c_ refer to
composition, and those prefixed with v_ refer to variability (when
formula_variability is set).
From Seurat, SingleCellExperiment, metadata objects
sccomp_result =
  sce_obj |>
  sccomp_estimate(
    formula_composition = ~ type,
    sample = "sample",
    cell_group = "cell_group",
    cores = 1,
    verbose = FALSE
  ) |>
  sccomp_test()
From counts
sccomp_result =
  counts_obj |>
  sccomp_estimate(
    formula_composition = ~ type,
    sample = "sample",
    cell_group = "cell_group",
    abundance = "count",
    cores = 1, verbose = FALSE
  ) |>
  sccomp_test()
The result shows the fitted effects of the factor on composition and variability, together with the uncertainty around those effects.
The output is a tibble containing the following columns:
- cell_group: The cell groups being tested.
- parameter: The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
- factor: The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
- c_lower: Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
- c_effect: Mean of the posterior distribution for a composition (c) parameter.
- c_upper: Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
- c_pH0: Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.
- c_FDR: False-discovery rate of the null hypothesis for a composition (c).
- v_lower: Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
- v_effect: Mean of the posterior distribution for a variability (v) parameter.
- v_upper: Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
- v_pH0: Probability of the null hypothesis for a variability (v).
- v_FDR: False-discovery rate of the null hypothesis for a variability (v).
- count_data: Nested input count data.
sccomp_result
## sccomp model
## ============
##
## Model specifications:
## Family: multi_beta_binomial
## Composition formula: ~type
## Variability formula: ~1
## Inference method: pathfinder
##
## Data: Samples: 20 Cell groups: 36
##
## Column prefixes: c_ -> composition parameters v_ -> variability parameters
##
## Convergence diagnostics:
## For each parameter, n_eff is the effective sample size and R_k_hat is the potential
## scale reduction factor on split chains (at convergence, R_k_hat = 1).
##
## # A tibble: 72 × 19
## cell_group parameter factor c_lower c_effect c_upper c_pH0 c_FDR c_rhat
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 B1 (Intercept) <NA> 0.946 1.19 1.45 0 0 1.0
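
Because the result is a tibble, the columns described above can be queried with ordinary dplyr verbs. The sketch below uses a small hand-made stand-in rather than a real fit, so it runs without sccomp installed. All values, the `typecancer` parameter label, the `CD4` cell group, and the 0.05 FDR threshold are illustrative assumptions, not output or recommendations from the package.

```r
library(dplyr)

# Toy stand-in for an sccomp result; every value here is made up
toy_result <- tibble(
  cell_group = c("B1", "B1", "CD4", "CD4"),
  parameter  = c("(Intercept)", "typecancer", "(Intercept)", "typecancer"),
  factor     = c(NA, "type", NA, "type"),
  c_lower    = c(0.90, 0.40, 0.10, -0.30),
  c_effect   = c(1.20, 0.80, 0.40, 0.05),
  c_upper    = c(1.50, 1.20, 0.70, 0.40),
  c_pH0      = c(0.00, 0.01, 0.02, 0.60),
  c_FDR      = c(0.00, 0.005, 0.01, 0.30)
)

# Keep factor effects (drop intercepts), apply an assumed FDR threshold,
# and rank the remaining cell groups by absolute composition effect
significant <- toy_result |>
  filter(!is.na(factor), c_FDR < 0.05) |>
  arrange(desc(abs(c_effect))) |>
  select(cell_group, parameter, starts_with("c_"))

print(significant)
```

The same pattern would apply to the v_ columns when formula_variability is set.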
