ScDesign3
scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
Install / Use
/learn @SONGDONGYUAN1994/ScDesign3README
scDesign3
The R package scDesign3 is an all-in-one single-cell data simulation tool by using reference datasets with different cell states (cell types, trajectories or and spatial coordinates), different modalities (gene expression, chromatin accessibility, protein abundance, DNA methylation, etc), and complex experimental designs. The transparent parameters enable users to alter models as needed; the model evaluation metrics (AIC, BIC) and convenient visualization function help users select models. <span style="color:blue"> Detailed tutorials that illustrate various functionalities of scDesign3 are available at this website</span>. The following illustration figure summarizes the usage of scDesign3:
<img src="man/figures/scDesign3_illustration.png" width="600"/>To find out more details about scDesign3, you can check out our manuscript on Nature Biotechnology:
The computational time is quadratic to the number of features used in copula modeling. Reducing this number will greatly speed up the calculation.
Please note that the parallel computing of scDesign3 is mainly designed for UNIX OS; be careful when you set n_cores. Please note that you should consider the balance between n_cores and your ROM (memory). Simply increasing the number of cores without the increase of memory will slow down or froze your program. We recommend that you should allocate at least 1 GB for 1 core.
Table of contents
Installation<a name="installation-"></a>
To install the development version from GitHub, please run:
if (!require("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("SONGDONGYUAN1994/scDesign3")
We are now working on submitting it to Bioconductor and will provide the link once online.
Quick Start<a name="quick-start"></a>
The following code is a quick example of running our simulator. The function scdesign3() takes in a SinglecellExperiment object with the cell covariates(such as cell types, pseudotime, or spatial coordinates) stored in the colData of the SinglecellExperiment object. For more details on the SinlgeCellExperiment object, please check on its Bioconductor link.
example_simu <- scdesign3(
sce = example_sce,
assay_use = "counts",
celltype = "cell_type",
pseudotime = "pseudotime",
spatial = NULL,
other_covariates = NULL,
mu_formula = "s(pseudotime, k = 10, bs = 'cr')",
sigma_formula = "s(pseudotime, k = 5, bs = 'cr')",
family_use = "nb",
n_cores = 2,
correlation_function = "default",
usebam = FALSE,
corr_formula = "1",
copula = "gaussian",
fastmvn = FALSE,
DT = TRUE,
pseudo_obs = FALSE,
family_set = c("gauss", "indep"),
important_feature = "all",
nonnegative = TRUE,
return_model = FALSE,
nonzerovar = FALSE,
parallelization = "mcmapply",
BPPARAM = NULL,
trace = FALSE
)
The parameters of scdesign3() are:
sce: A SingleCellExperiment object.assay_use: A string which indicates the assay you will use in the sce. Default is 'counts'.celltype: A string of the column name of the cell type variable in the colData of the sce. Default is 'cell_type'. The cell type variable in the colData of the sce should be a factor variable. Use NULL if there is no column in the colData that contains the cell-type information.pseudotime: A string or a string vector of the name of pseudotime and (if exist) multiple lineages. Default is NULL. If the data only has one lineage, then this parameter should be the column name of the pseudotime variable in the colData of the sce. If the data has multiple lingaes, then this parameter be the column names of the pseudotime variables for each lineage and the variables indicating which lineage that a cell belongs to. The pseudotime variables should be continuous numeric variables.spatial: A length-two string vector of the column names of spatial coordinates in the colData of sce. Default is NULL.other_covariate: A string or a string vector of the other covariates in the colData of sce you want to include in the data. For example, you can put the column names of the batch variables and/or condition variables in the colData of sce here if your sce contains these information and you want to include these variables inmu_formulaorsigma_formulaorcorr_formula.mu_formula: A string of the mu parameter formula for fitting each gene's marginal distribution.sigma_formula: A string of the sigma parameter formula for fitting each gene's marginal distribution.family_use: A string of the marginal distribution you want to use when fitting each gene's marginal distribution. Must be one of 'poisson', 'nb', 'zip', 'zinb' or 'gaussian'.n_cores: An integer. The number of cores to use.correlation_function: A string. If 'default', the function from Rfast; if 'coop', the function from coop, which calls BLAS.usebam: A logic variable. If TRUE, use bam (generalized additive models for very large datasets) for acceleration.edf_flexible: A logic variable. If TRUE, the degree of freedom for each gene's regression model will be automatically selected for acceleration.corr_formula: A string of the correlation structure. For example, if you want to obtain a correlation structure for each cell type, then this parameter should be the column name of the cell type variable in the colData of sce.copula: A string of the copula choice. Must be one of 'gaussian' or 'vine'. Default is 'gaussian'. Note that vine copula may have better modeling of high-dimensions, but can be very slow when features are >1000.fastmvn: An logical variable. If TRUE, the sampling of multivariate Gaussian is done by mvnfast, otherwise by mvtnorm. Default is FALSE. It only matters for Gaussian copula.DT: A logic variable. If TRUE, perform the distributional transformation to make the discrete data 'continuous'. This is useful for discrete distributions (e.g., Poisson, NB). Default is TRUE. Note that for continuous data (e.g., Gaussian), DT does not make sense and should be set as FALSE.pseudo_obs: A logic variable. If TRUE, use the empirical quantiles instead of theoretical quantiles for fitting copula. Default is FALSE.family_set: A string or a string vector of the bivariate copula families. Default is c("gauss", "indep").important_feature: A string or vector which indicates whether a gene will be used in correlation estimation or not. If this is a string, then this string must be "all" or "auto", which indicates that all genes are used or the genes will be automatically selected based on the proportion of zero expression across cells for each gene. Gene with zero proportion greater than 0.8 will be excluded form gene-gene correlation estimation. If this is a vector, then this should be a logical vector with length equal to the number of genes in sce. TRUE in the logical vector means the corresponding gene will be included in gene-gene correlation estimation and FALSE in the logical vector means the corresponding gene will be excluded from the gene-gene correlation estimation. The default value is "all".nonnegative: A logical variable. If TRUE, values < 0 in the synthetic data will be converted to 0. Default is TRUE (since the expression matrix is nonnegative).return_model: A logic variable. If TRUE, the marginal models and copula models will be returned. Default is FALSE.nonzerovar: A logical variable. If TRUE, for any gene with zero variance, a cell will be replaced with 1. This is designed for avoiding potential errors, for example, PCA.parallelization: A string indicating the specific parallelization function to use. Must be one of 'mcmapply', 'bpmapply', or 'pbmcmapply', which corresponds to the parallelization function in the package parallel,BiocParallel, and pbmcapply respectively. The default value is 'mcmapply'.BPPARAM: A MulticoreParam object or NULL. When the parameter parallelization = 'mcmapply' or 'pbmcmapply', this parameter must be NULL. When the parameter parallelization = 'bpmapply', this parameter must be one of the MulticoreParam object offered by the package 'BiocParallel. The default value is NULL.TRACE: A logic variable. If TRUE, the warning/error log and runtime for gam/gamlss will be returned, FALSE otherwise. Default is FALSE.
The output of scdesign3() is a list which includes:
new_count: This is the synthetic count matrix generated byscdesign3().new_covariate:- If the parameter
ncellis set to a number that is different from the number of cells in the input data, this will be a matrix that has the new cell covariates that are used for generating new data. - If the parameter
ncellis the default value, this will beNULL.
- If the parameter
model_aic: This is a vector include the genes' marginal models' AIC, fitted copula's AIC, and total AIC, which is the sum of the previous two.model_bic: This is a vector include the genes' marginal models' BIC, fitted copula's BIC, and total BIC, which is the sum of the previous two.marginal_list:- If the parameter
return_modelis set toTRUE, this will be a list which contains the fitted gam or gamlss model for all genes in the input data. This may greatly increase the object size. - If the paramet
- If the parameter
Related Skills
diffs
337.7kUse the diffs tool to produce real, shareable diffs (viewer URL, file artifact, or both) instead of manual edit summaries.
clearshot
Structured screenshot analysis for UI implementation and critique. Analyzes every UI screenshot with a 5×5 spatial grid, full element inventory, and design system extraction — facts and taste together, every time. Escalates to full implementation blueprint when building. Trigger on any digital interface image file (png, jpg, gif, webp — websites, apps, dashboards, mockups, wireframes) or commands like 'analyse this screenshot,' 'rebuild this,' 'match this design,' 'clone this.' Skip for non-UI images (photos, memes, charts) unless the user explicitly wants to build a UI from them. Does NOT trigger on HTML source code, CSS, SVGs, or any code pasted as text.
openpencil
1.8kThe world's first open-source AI-native vector design tool and the first to feature concurrent Agent Teams. Design-as-Code. Turn prompts into UI directly on the live canvas. A modern alternative to Pencil.
animation-guide
A design system built with Base UI
