Sccomp

Bayesian mixed-effect model to test differences in cell type proportions from single-cell data, in R


sccomp: Differential Composition and Variability Analysis for Single-Cell Data

Stefano Mangiola 2025-07-18

<img src="inst/logo-01.png" height="139px" width="120px"/>

sccomp is a powerful R package designed for comprehensive differential composition and variability analysis in single-cell genomics, proteomics, and microbiomics data.

Why sccomp?

For cellular omic data, no method for differential variability analysis exists, and methods for differential composition analysis only take a few fundamental data properties into account. Here we introduce sccomp, a generalised method for differential composition and variability analyses capable of jointly modelling data count distribution, compositionality, group-specific variability, and proportion mean-variability association, while being robust to outliers.

Comprehensive Method Comparison

I: Data are modelled as counts.
II: Group proportions are modelled as compositional.
III: The proportion variability is modelled as cell-type specific.
IV: Information sharing across cell types, mean–variability association.
V: Outlier detection or robustness.
VI: Differential variability analysis.
VII: Mixed-effect modelling.
VIII: Removal of unwanted effects.

| Method | Year | Model | I | II | III | IV | V | VI | VII | VIII |
|----|----|----|----|----|----|----|----|----|----|----|
| sccomp | 2023 | Sum-constrained Beta-binomial | ● | ● | ● | ● | ● | ● | ● | ● |
| scCODA | 2021 | Dirichlet-multinomial | ● | ● | | | | | | |
| quasi-binom. | 2021 | Quasi-binomial | ● | | ● | | | | | |
| rlm | 2021 | Robust-log-linear | | ● | | | ● | | | |
| propeller | 2021 | Logit-linear + limma | | ● | ● | ● | | | | |
| ANCOM-BC | 2020 | Log-linear | | ● | ● | | | | | |
| corncob | 2020 | Beta-binomial | ● | | ● | | | | | |
| scDC | 2019 | Log-linear | | ● | ● | | | | | |
| dmbvs | 2017 | Dirichlet-multinomial | ● | ● | | | | | | |
| MixMC | 2016 | Zero-inflated Log-linear | | ● | ● | | | | | |
| ALDEx2 | 2014 | Dirichlet-multinomial | ● | ● | | | | | | |

Scientific Citation

Mangiola, Stefano, Alexandra J. Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F. Rubin, Terence P. Speed, Heejung Shim, and Anthony T. Papenfuss. 2023. “Sccomp: Robust Differential Composition and Variability Analysis for Single-Cell Data.” Proceedings of the National Academy of Sciences of the United States of America 120 (33): e2203828120. https://doi.org/10.1073/pnas.2203828120

Installation Guide

sccomp is based on cmdstanr, which provides the latest version of CmdStan, the Bayesian modelling tool. cmdstanr is not on CRAN, so installation is a simple three-step process (sccomp will prompt you if you forget a step):

1. R installation of sccomp
2. R installation of cmdstanr
3. cmdstanr call to install CmdStan

Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

# Step 1
BiocManager::install("sccomp")

# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()

GitHub

# Step 1
devtools::install_github("MangiolaLaboratory/sccomp")

# Step 2
install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# Step 3
cmdstanr::check_cmdstan_toolchain(fix = TRUE) # Just checking system setting
cmdstanr::install_cmdstan()
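
After the three steps, it can be worth confirming that the R session sees both cmdstanr and a CmdStan installation. The following is a minimal sketch; the helper name `check_sccomp_backend` is hypothetical and not part of sccomp. It relies only on `requireNamespace()` and `cmdstanr::cmdstan_version()`.

```r
# Hypothetical helper (not part of sccomp): report whether the cmdstanr
# backend is usable from the current R session.
check_sccomp_backend <- function() {
  has_cmdstanr <- requireNamespace("cmdstanr", quietly = TRUE)
  cmdstan_ver <- NULL
  if (has_cmdstanr) {
    # error_on_NA = FALSE returns NULL instead of failing when CmdStan is missing
    cmdstan_ver <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  }
  list(
    cmdstanr_installed = has_cmdstanr,
    cmdstan_installed  = !is.null(cmdstan_ver),
    cmdstan_version    = cmdstan_ver
  )
}

status <- check_sccomp_backend()
str(status)
```

If `cmdstan_installed` is `FALSE`, repeating Step 3 is the likely remedy.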

Server special requirements: Restricted or read-only environments

sccomp needs to write files to disk (compiled Stan models, draw files). On shared servers or restricted environments:

Prefer installing locally – Install sccomp in a user-writable R library (e.g. ~/R/x86_64-pc-linux-gnu-library/4.x) so it can use the default cache ~/.sccomp_models.
Or request write access – Ask your system administrator for permission to write in your user directories (e.g. ~/.sccomp_models).
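
Before running an analysis on a shared server, a quick check of whether the cache location is writable may save a failed model compilation. This sketch uses base R only; `can_write_cache` is a hypothetical helper, not an sccomp function, and it creates the directory if it does not exist yet.

```r
# Hypothetical helper (not part of sccomp): TRUE if `path` exists, or can be
# created, and the current user can write to it.
can_write_cache <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    created <- dir.create(path, recursive = TRUE, showWarnings = FALSE)
    if (!created) return(FALSE)
  }
  # file.access() returns 0 on success; mode = 2 tests write permission
  unname(file.access(path, mode = 2) == 0)
}

# Demonstrated on a temporary directory; on a real system one would check the
# default cache location mentioned above, e.g. can_write_cache("~/.sccomp_models")
demo_dir <- file.path(tempdir(), "sccomp_models_demo")
can_write_cache(demo_dir)
```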

If your administrator has pre-compiled models in a shared directory (e.g. /opt/sccomp_models), you can point sccomp to that cache before calling any sccomp function:

library(sccomp)

cache_stan_model_dir <- "/opt/sccomp_models"

# Set the cache directory before any sccomp call (sccomp_boxplot, sccomp_estimate, etc.
# use internal functions that check this cache)
utils::assignInNamespace("sccomp_stan_models_cache_dir", cache_stan_model_dir, ns = "sccomp")

# Now run your analysis
sccomp_result <- 
  counts_obj |>
  sccomp_estimate(formula_composition = ~ type, sample = "sample", cell_group = "cell_group", 
                  abundance = "count", cores = 1) |>
  sccomp_test()

sccomp_result |> sccomp_boxplot(factor = "type")

Alternatively, pass cache_stan_model explicitly in each call:

sccomp_result <- 
  counts_obj |>
  sccomp_estimate(..., cache_stan_model = "/opt/sccomp_models") |>
  sccomp_remove_outliers(cache_stan_model = "/opt/sccomp_models") |>
  sccomp_test()

sccomp_result |> sccomp_boxplot(factor = "type", cache_stan_model = "/opt/sccomp_models")

Core Functions

| Function | Description |
|----|----|
| sccomp_estimate | Fit the model onto the data, and estimate the coefficients |
| sccomp_remove_outliers | Identify outliers probabilistically based on the model fit, and exclude them from the estimation |
| sccomp_test | Calculate the probability that the coefficients are outside the H0 interval (i.e. test_composition_above_logit_fold_change) |
| sccomp_replicate | Simulate data from the model, or part of the model |
| sccomp_predict | Predicts proportions, based on the model, or part of the model |
| sccomp_remove_unwanted_effects | Removes the variability for unwanted factors |
| plot | Plots summary plots to assess significance |
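
The idea behind the `sccomp_test` row can be illustrated without fitting a model: given posterior draws of an effect, one asks what share of the draws falls outside a null (H0) interval around zero. The sketch below uses simulated draws and an arbitrary interval half-width of 0.1. It is a conceptual illustration only and is not claimed to match how sccomp computes its test statistics internally.

```r
set.seed(42)

# Simulated posterior draws for one effect (illustrative values only)
draws <- rnorm(4000, mean = 0.8, sd = 0.3)

# Half-width of the null (H0) interval around zero; 0.1 is an arbitrary choice
threshold <- 0.1

p_outside_h0 <- mean(abs(draws) > threshold)  # share of draws outside [-0.1, 0.1]
p_inside_h0  <- 1 - p_outside_h0              # share of draws inside the interval

c(outside = p_outside_h0, inside = p_inside_h0)
```

A wider interval makes the test more conservative, since more of the posterior mass must lie beyond it before an effect counts as different from the null.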

Analysis Tutorial

library(dplyr)
library(sccomp)
library(ggplot2)
library(forcats)
library(tidyr)
data("seurat_obj")
data("sce_obj")
data("counts_obj")

Binary Factor Analysis

In the output table, estimate columns with the prefix c_ refer to composition, and those with the prefix v_ refer to variability (when formula_variability is set).
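
The prefix convention makes it easy to separate the two sets of columns programmatically. A small sketch on a toy data frame follows; the values are made up and the data frame only stands in for a real sccomp result.

```r
# Toy stand-in for an sccomp result table (values are made up)
toy_result <- data.frame(
  cell_group = c("B1", "B2"),
  c_effect   = c(1.2, -0.3),
  c_FDR      = c(0.001, 0.40),
  v_effect   = c(-4.1, -3.8),
  v_FDR      = c(0.20, 0.35)
)

# Select columns by prefix: c_ for composition, v_ for variability
composition_cols <- grep("^c_", names(toy_result), value = TRUE)
variability_cols <- grep("^v_", names(toy_result), value = TRUE)

composition_cols
variability_cols
```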

From Seurat, SingleCellExperiment, metadata objects

sccomp_result = 
  sce_obj |>
  sccomp_estimate( 
    formula_composition = ~ type, 
    sample = "sample", 
    cell_group = "cell_group", 
    cores = 1,
    verbose = FALSE
  ) |> 
  sccomp_test()

From counts

sccomp_result = 
  counts_obj |>
  sccomp_estimate( 
    formula_composition = ~ type, 
    sample = "sample",
    cell_group = "cell_group",
    abundance = "count", 
    cores = 1, verbose = FALSE
  ) |> 
  sccomp_test()
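
The count input used in the call above is a long-format table with one row per sample and cell group. The sketch below builds a toy table with the same column names as the arguments (`sample`, `cell_group`, `count`, plus the `type` covariate); all values are made up. The proportions are computed only to show the compositional constraint, since the model itself takes the counts.

```r
# Toy long-format counts: one row per sample and cell group (values made up)
toy_counts <- data.frame(
  sample     = rep(c("s1", "s2", "s3", "s4"), each = 3),
  type       = rep(c("healthy", "healthy", "cancer", "cancer"), each = 3),
  cell_group = rep(c("B", "T", "NK"), times = 4),
  count      = c(120, 300, 80, 100, 350, 50, 200, 150, 150, 260, 140, 100)
)

# Per-sample proportions: each count divided by its sample total
toy_counts$proportion <- toy_counts$count /
  ave(toy_counts$count, toy_counts$sample, FUN = sum)

# Proportions within each sample sum to 1 (the compositional constraint)
tapply(toy_counts$proportion, toy_counts$sample, sum)
```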

The result shows the fit: the estimated effects of the factor on composition and variability, together with the uncertainty around those effects.

The output is a tibble containing the following columns:

cell_group - The cell groups being tested.
parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
c_effect - Mean of the posterior distribution for a composition (c) parameter.
c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
c_pH0 - Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.
c_FDR - False-discovery rate of the null hypothesis for a composition (c).
v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
v_effect - Mean of the posterior distribution for a variability (v) parameter.
v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
v_pH0 - Probability of the null hypothesis for a variability (v).
v_FDR - False-discovery rate of the null hypothesis for a variability (v).
count_data - Nested input count data.
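
A common next step is to keep the non-intercept rows whose composition FDR falls below a cutoff. The sketch below does this on a toy data frame that mimics the column layout described above. The values, the cell group names, and the parameter name `typecancer` are made up for illustration, and the 0.05 cutoff is an arbitrary choice.

```r
# Toy stand-in for an sccomp result table (values are made up)
toy_result <- data.frame(
  cell_group = c("B1", "B1", "CD4 naive", "CD4 naive"),
  parameter  = c("(Intercept)", "typecancer", "(Intercept)", "typecancer"),
  c_lower    = c(0.95, -1.10, 1.60, -0.20),
  c_effect   = c(1.19, -0.65, 1.85, 0.05),
  c_upper    = c(1.45, -0.25, 2.10, 0.30),
  c_FDR      = c(0.000, 0.004, 0.000, 0.480)
)

# Keep non-intercept rows whose composition FDR is below the chosen cutoff
fdr_cutoff <- 0.05
hits <- subset(toy_result, parameter != "(Intercept)" & c_FDR < fdr_cutoff)

# Flag whether the 95% credible interval excludes zero
hits$ci_excludes_zero <- hits$c_lower > 0 | hits$c_upper < 0
hits
```

The same filter works on a real result with dplyr, for example `filter(c_FDR < 0.05)`, since the output is a tibble.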

sccomp_result

## sccomp model
## ============
## 
## Model specifications:
##   Family: multi_beta_binomial 
##   Composition formula: ~type 
##   Variability formula: ~1 
##   Inference method: pathfinder 
## 
## Data: Samples: 20   Cell groups: 36 
## 
## Column prefixes: c_ -> composition parameters  v_ -> variability parameters
## 
## Convergence diagnostics:
##   For each parameter, n_eff is the effective sample size and R_k_hat is the potential
##   scale reduction factor on split chains (at convergence, R_k_hat = 1).
## 
## # A tibble: 72 × 19
##    cell_group parameter   factor c_lower c_effect c_upper   c_pH0   c_FDR c_rhat
##    <chr>      <chr>       <chr>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1 B1         (Intercept) <NA>    0.946     1.19   1.45   0       0        1.0
