Sccomp

Bayesian mixed-effect model to test differences in cell type proportions from single-cell data, in R

sccomp: Differential Composition and Variability Analysis for Single-Cell Data

Stefano Mangiola 2025-07-18

<!-- badges: start -->

Lifecycle: maturing | R build status

<!-- badges: end -->

<img src="inst/logo-01.png" height="139px" width="120px"/>

sccomp is a powerful R package designed for comprehensive differential composition and variability analysis in single-cell genomics, proteomics, and microbiomics data.

Why sccomp?

For cellular omic data, no method for differential variability analysis exists, and methods for differential composition analysis only take a few fundamental data properties into account. Here we introduce sccomp, a generalised method for differential composition and variability analyses capable of jointly modelling data count distribution, compositionality, group-specific variability, and proportion mean-variability association, while being robust to outliers.

<img src="inst/cartoon_methods.png" width="100%"/>

Comprehensive Method Comparison

  • I: Data are modelled as counts.
  • II: Group proportions are modelled as compositional.
  • III: The proportion variability is modelled as cell-type specific.
  • IV: Information sharing across cell types, mean–variability association.
  • V: Outlier detection or robustness.
  • VI: Differential variability analysis.
  • VII: Mixed-effect modelling.
  • VIII: Removal of unwanted effects.

| Method | Year | Model | I | II | III | IV | V | VI | VII | VIII |
|----|----|----|----|----|----|----|----|----|----|----|
| sccomp | 2023 | Sum-constrained Beta-binomial | ● | ● | ● | ● | ● | ● | ● | ● |
| scCODA | 2021 | Dirichlet-multinomial | ● | ● | | | | | | |
| quasi-binom. | 2021 | Quasi-binomial | ● | | ● | | | | | |
| rlm | 2021 | Robust-log-linear | | ● | | | ● | | | |
| propeller | 2021 | Logit-linear + limma | | ● | ● | ● | | | | |
| ANCOM-BC | 2020 | Log-linear | | ● | ● | | | | | |
| corncob | 2020 | Beta-binomial | ● | | ● | | | | | |
| scDC | 2019 | Log-linear | | ● | ● | | | | | |
| dmbvs | 2017 | Dirichlet-multinomial | ● | ● | | | | | | |
| MixMC | 2016 | Zero-inflated Log-linear | | ● | ● | | | | | |
| ALDEx2 | 2014 | Dirichlet-multinomial | ● | ● | | | | | | |
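
Property I in the table (data modelled as counts) and the beta-binomial family listed for sccomp can be made concrete with a small base-R simulation. This is only an illustrative sketch, not sccomp's model: the sample size, cell count, mean proportion, and precision below are all made-up values. It shows why a beta-binomial accommodates sample-to-sample variability in proportions that a plain binomial cannot.

```r
set.seed(42)

n_samples <- 2000  # hypothetical number of samples
n_cells   <- 1000  # hypothetical number of cells per sample
mean_prop <- 0.2   # hypothetical mean proportion of one cell type
precision <- 20    # hypothetical concentration; smaller means more variable

# Binomial: every sample shares exactly the same underlying proportion.
binom_counts <- rbinom(n_samples, size = n_cells, prob = mean_prop)

# Beta-binomial: each sample first draws its own proportion from a Beta
# distribution, then the count is drawn from a binomial with that proportion.
sample_props <- rbeta(n_samples,
                      shape1 = mean_prop * precision,
                      shape2 = (1 - mean_prop) * precision)
betabinom_counts <- rbinom(n_samples, size = n_cells, prob = sample_props)

# Similar means, very different spread.
c(binomial_mean = mean(binom_counts), beta_binomial_mean = mean(betabinom_counts))
c(binomial_var  = var(binom_counts),  beta_binomial_var  = var(betabinom_counts))
```

The extra variance in the second set of counts is the kind of over-dispersion that count-based compositional methods need to account for.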

Scientific Citation

Mangiola, Stefano, Alexandra J. Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F. Rubin, Terence P. Speed, Heejung Shim, and Anthony T. Papenfuss. 2023. “Sccomp: Robust Differential Composition and Variability Analysis for Single-Cell Data.” Proceedings of the National Academy of Sciences of the United States of America 120 (33): e2203828120. https://doi.org/10.1073/pnas.2203828120

Talk

<a href="https://www.youtube.com/watch?v=R_lt58We9nA&ab_channel=RConsortium" target="_blank"> <img src="https://img.youtube.com/vi/R_lt58We9nA/mqdefault.jpg" alt="Watch the video" width="280" height="180" border="10" /> </a>

Installation Guide

sccomp is based on cmdstanr, which provides the latest version of CmdStan, the Bayesian modelling tool. cmdstanr is not on CRAN, so installation is a simple three-step process (sccomp prompts you if you forget a step).

  1. R installation of sccomp
  2. R installation of cmdstanr
  3. cmdstanr call to cmdstan installation

Bioconductor

if (!requireNamespace("BiocManager")) install.packages("BiocManager")

# Step 1
BiocManager::install("sccomp")

# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()

GitHub

# Step 1
devtools::install_github("MangiolaLaboratory/sccomp")

# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()
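
After the three steps, it can be useful to confirm that CmdStan is actually discoverable before running an analysis. The helper below is a minimal sketch; `cmdstan_ready()` is a hypothetical name, not part of sccomp or cmdstanr. It relies only on `cmdstanr::cmdstan_version()`, which returns `NULL` instead of raising an error when called with `error_on_NA = FALSE` and no installation is found.

```r
# Report whether cmdstanr is installed and can find a CmdStan installation.
# `cmdstan_ready` is a hypothetical helper written for this example.
cmdstan_ready <- function() {
  if (!requireNamespace("cmdstanr", quietly = TRUE)) {
    return(FALSE)  # Step 2 has not been completed
  }
  version <- tryCatch(
    cmdstanr::cmdstan_version(error_on_NA = FALSE),
    error = function(e) NULL
  )
  !is.null(version)  # NULL means Step 3 has not been completed
}

if (cmdstan_ready()) {
  message("CmdStan found, version ", cmdstanr::cmdstan_version())
} else {
  message("CmdStan not found; repeat Steps 2 and 3 above.")
}
```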

Server special requirements: Restricted or read-only environments

sccomp needs to write files to disk (compiled Stan models, draw files). On shared servers or restricted environments:

  1. Prefer installing locally – Install sccomp in a user-writable R library (e.g. ~/R/x86_64-pc-linux-gnu-library/4.x) so it can use the default cache ~/.sccomp_models.
  2. Or request write access – Ask your system administrator for permission to write in your user directories (e.g. ~/.sccomp_models).
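
To find out which of the two options applies, you can test from R whether the default cache location is writable. The following is a base-R sketch; `nearest_existing_dir()` and `is_writable()` are hypothetical helpers, not sccomp functions. Because the cache directory may not exist yet, the check walks up to the closest existing parent directory.

```r
# Walk up the path until an existing directory is found.
nearest_existing_dir <- function(path) {
  path <- path.expand(path)
  while (!dir.exists(path) && dirname(path) != path) {
    path <- dirname(path)
  }
  path
}

# TRUE if the directory (or its nearest existing parent) allows writing.
# file.access() returns 0 on success; mode = 2 tests write permission.
is_writable <- function(path) {
  unname(file.access(nearest_existing_dir(path), mode = 2) == 0)
}

is_writable("~/.sccomp_models")
```

If this returns `FALSE`, installing into a user-writable library or asking for write access is the next step.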

If your administrator has pre-compiled models in a shared directory (e.g. /opt/sccomp_models), you can point sccomp to that cache before calling any sccomp function:

library(sccomp)

cache_stan_model_dir <- "/opt/sccomp_models"

# Set the cache directory before any sccomp call (sccomp_boxplot, sccomp_estimate, etc.
# use internal functions that check this cache)
utils::assignInNamespace("sccomp_stan_models_cache_dir", cache_stan_model_dir, ns = "sccomp")

# Now run your analysis
sccomp_result <- 
  counts_obj |>
  sccomp_estimate(formula_composition = ~ type, sample = "sample", cell_group = "cell_group", 
                  abundance = "count", cores = 1) |>
  sccomp_test()

sccomp_result |> sccomp_boxplot(factor = "type")

Alternatively, pass cache_stan_model explicitly in each call:

sccomp_result <- 
  counts_obj |>
  sccomp_estimate(..., cache_stan_model = "/opt/sccomp_models") |>
  sccomp_remove_outliers(cache_stan_model = "/opt/sccomp_models") |>
  sccomp_test()

sccomp_result |> sccomp_boxplot(factor = "type", cache_stan_model = "/opt/sccomp_models")

Core Functions

| Function | Description |
|----|----|
| sccomp_estimate | Fit the model onto the data, and estimate the coefficients |
| sccomp_remove_outliers | Identify outliers probabilistically based on the model fit, and exclude them from the estimation |
| sccomp_test | Calculate the probability that the coefficients are outside the H0 interval (i.e. test_composition_above_logit_fold_change) |
| sccomp_replicate | Simulate data from the model, or part of the model |
| sccomp_predict | Predicts proportions, based on the model, or part of the model |
| sccomp_remove_unwanted_effects | Removes the variability for unwanted factors |
| plot | Plots summary plots to assess significance |

Analysis Tutorial

library(dplyr)
library(sccomp)
library(ggplot2)
library(forcats)
library(tidyr)
data("seurat_obj")
data("sce_obj")
data("counts_obj")

Binary Factor Analysis

In the output table, estimate columns prefixed with c_ refer to composition, and those prefixed with v_ refer to variability (when formula_variability is set).
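
The prefix convention makes it easy to separate the two sets of estimates programmatically. The sketch below uses a hand-written toy data frame with made-up values, not real sccomp output, and only base R.

```r
# Toy table mimicking a few output columns; all values are invented.
toy_result <- data.frame(
  cell_group = c("B1", "B2"),
  c_effect   = c(1.2, -0.3),
  c_FDR      = c(0.001, 0.40),
  v_effect   = c(-4.1, -3.8),
  v_FDR      = c(0.02, 0.55)
)

# Select column names by prefix.
composition_cols <- names(toy_result)[startsWith(names(toy_result), "c_")]
variability_cols <- names(toy_result)[startsWith(names(toy_result), "v_")]

# Composition estimates only, keeping the cell group identifier.
toy_result[, c("cell_group", composition_cols)]
```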

From Seurat, SingleCellExperiment, or metadata objects

sccomp_result = 
  sce_obj |>
  sccomp_estimate( 
    formula_composition = ~ type, 
    sample = "sample", 
    cell_group = "cell_group", 
    cores = 1,
    verbose = FALSE
  ) |> 
  sccomp_test()

From counts

sccomp_result = 
  counts_obj |>
  sccomp_estimate( 
    formula_composition = ~ type, 
    sample = "sample",
    cell_group = "cell_group",
    abundance = "count", 
    cores = 1, verbose = FALSE
  ) |> 
  sccomp_test()

The fit reports the effect of the factor on composition and variability, along with the uncertainty around those effects.

The output is a tibble containing the following columns:

  • cell_group - The cell groups being tested.
  • parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
  • factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
  • c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
  • c_effect - Mean of the posterior distribution for a composition (c) parameter.
  • c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
  • c_pH0 - Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.
  • c_FDR - False-discovery rate of the null hypothesis for a composition (c).
  • v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
  • v_effect - Mean of the posterior distribution for a variability (v) parameter.
  • v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
  • v_pH0 - Probability of the null hypothesis for a variability (v).
  • v_FDR - False-discovery rate of the null hypothesis for a variability (v).
  • count_data - Nested input count data.
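
To make the `lower`, `effect`, and `upper` columns concrete, the sketch below computes the same kind of summaries from simulated posterior draws in base R. The draws, the threshold, and the final "probability inside the null interval" are illustrative assumptions; this README does not spell out how sccomp computes `c_pH0`, so treat that last quantity as one simple reading of "probability of no difference", not as sccomp's algorithm.

```r
set.seed(1)

# Simulated posterior draws for one parameter (not produced by sccomp).
draws <- rnorm(4000, mean = 0.8, sd = 0.3)

# Hypothetical half-width of the null interval around zero.
threshold <- 0.1

summary_stats <- c(
  lower  = unname(quantile(draws, 0.025)),  # 2.5% posterior quantile
  effect = mean(draws),                     # posterior mean
  upper  = unname(quantile(draws, 0.975)),  # 97.5% posterior quantile
  prob_inside_null = mean(abs(draws) <= threshold)
)

round(summary_stats, 3)
```

An interval from `lower` to `upper` that excludes zero, together with a small null probability, corresponds to a parameter with a credible effect.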
sccomp_result
## sccomp model
## ============
## 
## Model specifications:
##   Family: multi_beta_binomial 
##   Composition formula: ~type 
##   Variability formula: ~1 
##   Inference method: pathfinder 
## 
## Data: Samples: 20   Cell groups: 36 
## 
## Column prefixes: c_ -> composition parameters  v_ -> variability parameters
## 
## Convergence diagnostics:
##   For each parameter, n_eff is the effective sample size and R_k_hat is the potential
##   scale reduction factor on split chains (at convergence, R_k_hat = 1).
## 
## # A tibble: 72 × 19
##    cell_group parameter   factor c_lower c_effect c_upper   c_pH0   c_FDR c_rhat
##    <chr>      <chr>       <chr>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1 B1         (Intercept) <NA>    0.946     1.19   1.45   0       0        1.0
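
A common next step is to keep only the non-intercept parameters that pass an FDR cut-off and rank them by effect size. The sketch below does this in base R on a toy data frame; the values, the cell groups other than B1, the parameter name `typeB`, and the 0.05 cut-off are all hypothetical choices for illustration.

```r
# Toy table shaped like the printed output; all values are invented.
toy_result <- data.frame(
  cell_group = c("B1", "B1", "T1", "T1", "M1", "M1"),
  parameter  = c("(Intercept)", "typeB", "(Intercept)", "typeB",
                 "(Intercept)", "typeB"),
  c_effect   = c(1.19, -0.90, 0.50, 0.10, 0.70, 1.50),
  c_FDR      = c(0.000, 0.010, 0.000, 0.600, 0.000, 0.001),
  stringsAsFactors = FALSE
)

fdr_cutoff <- 0.05  # hypothetical significance threshold

# Drop intercepts, then keep parameters below the FDR cut-off.
keep <- toy_result$parameter != "(Intercept)" & toy_result$c_FDR < fdr_cutoff
significant <- toy_result[keep, ]

# Rank by absolute effect size, largest first.
significant <- significant[order(-abs(significant$c_effect)), ]
significant
```

The same logic applies to the variability columns by swapping `c_effect` and `c_FDR` for `v_effect` and `v_FDR`.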
