BigSCale2
Framework for clustering, phenotyping, pseudotiming and inferring gene regulatory networks from single cell data
bigSCale 2 IS NO LONGER MAINTAINED
I will no longer provide support for bugs or questions; I leave it up to the community!
bigSCale is a complete framework for the analysis and visualization of single cell data. It lets you cluster, phenotype, perform pseudotime analysis, infer gene regulatory networks and reduce large datasets into smaller datasets of higher quality. The latest update also allows analysis of single cell ATAC-seq data.
Why use bigSCale 2?
- bigSCale2 features the most sensitive and accurate marker detection and classification. No dimensionality reduction is used, so every bit of information is retained.
- bigSCale2 can infer the gene regulatory networks for any single cell dataset.
- bigSCale2 can compress large datasets of any size into smaller datasets of higher quality, without loss of information. Are one million cells too many to be analyzed by your favourite tool? Reduce them to a dataset with fewer cells of increased quality and go for it!
Citations
If you use bigSCale2, please cite our papers Iacono 2018 and Iacono 2019.
UPDATES<br />
v2.0 (22/11/2019): Several major updates, especially for the networks part. It is now possible to export and visualize the networks in Cytoscape. Also improved processing when inferring networks of different conditions to compare them.<br />
v1.7 (04/07/2019): Fixed a bug inside the differential expression code. You might want to re-run your analysis (including networks, clustering, phenotyping ...) to get better results. Thanks to JasonLiZhou for reporting it.<br />
v1.6 (04/07/2019): Added a pipeline for single cell ATAC-seq data.<br />
v1.5 (28/06/2019): Fixed a bug in iCells which was causing excessive memory usage.<br />
Quick Start
bigSCale is made up of three sub-tools which can be used either independently or in synergy. Each sub-tool has its own tutorial.<br />
bigSCale 2 Core lets you cluster, phenotype and perform pseudotime analysis. It is the main tool of bigSCale, published in Iacono 2018.<br />
bigSCale 2 GRN is the newest addition: it is the module to infer gene regulatory networks from single cell data (Iacono 2019).<br />
bigSCale 2 iCells reduces the size of any large dataset (even millions of cells, without any loss of information) so that it can be easily and quickly analyzed by any tool. The resulting dataset has fewer cells of higher quality, so it can be analyzed better. It DOES NOT require any external tool such as the loom framework.<br />
For help or questions contact me at gio.iacono.work@gmail.com
To install the package run:
devtools::install_github("iaconogi/bigSCale2")
bigSCale 2 Core (clustering, phenotyping, pseudotime)
bigSCale 2 Gene Regulatory Networks
bigSCale 2 iCells for big Datasets
bigSCale 2 Core
Running the analysis
READ BEFORE STARTING bigSCale2 is a special tool designed for maximum accuracy in clustering and marker detection. It achieves this accuracy in part because it does not use any dimensionality reduction. If you have a large number of cells and you want to cluster/phenotype them with bigSCale2 Core, first process the dataset with bigSCale2 iCells. As a rule of thumb, you can analyze directly (without processing with iCells) up to 20K cells if you have 16 GB of RAM, or up to 40K cells if you have 32 GB of RAM.
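The rule of thumb above can be written down as a small helper. This is only a sketch: `max_direct_cells` and `needs_icells` are hypothetical names, not part of bigSCale2, and the thresholds are just the two figures quoted above. No guidance is given for machines with less than 16 GB of RAM, so the helper returns `NA` there.

```r
# Hypothetical helpers (not part of bigSCale2): they encode only the two
# RAM thresholds quoted in the rule of thumb.
max_direct_cells <- function(ram_gb) {
  if (ram_gb >= 32) return(40000)
  if (ram_gb >= 16) return(20000)
  NA_real_  # no guidance is given for smaller machines
}

# TRUE if the dataset should be reduced with iCells before running Core
needs_icells <- function(n_cells, ram_gb) {
  limit <- max_direct_cells(ram_gb)
  if (is.na(limit)) return(NA)
  n_cells > limit
}

needs_icells(3005, 16)    # FALSE: small enough to analyze directly
needs_icells(100000, 32)  # TRUE: reduce with iCells first
```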
<br />bigSCale2 works with the SingleCellExperiment class. This class is a container meant to store single cell data in an organized way.
bigSCale2 requires two elements to be present in the single cell object: the counts, counts(), and the gene names, rownames().
The counts must be raw counts! The genes must not be filtered, aside from removing, if you want, the genes with all-zero values.
<br />
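As a minimal sketch of these requirements, the snippet below builds a SingleCellExperiment object from a small simulated count matrix. The matrix, the gene and cell names and the `sce_toy` object are made up for illustration; only `SingleCellExperiment()`, `counts()` and `rownames()` are real functions, and the Bioconductor package SingleCellExperiment is assumed to be installed.

```r
library(SingleCellExperiment)

set.seed(1)
# Simulated raw counts: 20 genes x 10 cells (toy data, for illustration only)
raw_counts <- matrix(rpois(200, lambda = 2), nrow = 20, ncol = 10)
rownames(raw_counts) <- paste0("Gene", seq_len(20))
colnames(raw_counts) <- paste0("Cell", seq_len(10))
raw_counts["Gene20", ] <- 0L  # force one all-zero gene

# Store the raw counts in the assay named "counts"
sce_toy <- SingleCellExperiment(assays = list(counts = raw_counts))

# Optional: drop genes with all-zero values (no other gene filtering)
sce_toy <- sce_toy[rowSums(counts(sce_toy)) > 0, ]

# Sanity checks: gene names present, counts are non-negative integers
stopifnot(!is.null(rownames(sce_toy)))
stopifnot(all(counts(sce_toy) >= 0))
stopifnot(all(counts(sce_toy) == round(counts(sce_toy))))
```

The integer check is only a cheap heuristic for spotting normalized or log-transformed values; it cannot prove that the counts are truly raw.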
Let us first load an example dataset: 3005 single cells from adult mouse brain (Zeisel 2015).
data(sce)
As you can see, the sce object contains the expression values for 19972 genes in 3005 cells. In its most basic use, bigSCale is run with just one command, sce=bigscale(sce), which automatically performs the whole analysis. To save time here, however, we will instruct bigSCale2 to perform a quick analysis by specifying speed.preset='fast'. This greatly reduces the time required to compute markers and differentially expressed genes, but at the expense of quality and accuracy (it uses only the Wilcoxon test). In a real analysis we recommend not using this setting and keeping the default speed.preset='slow' for maximum accuracy (speed.preset='fast' works well when you have lots of cells, say more than 15K or 20K).
sce=bigscale(sce,speed.preset='fast')
The analyses are now complete and the results are stored back in the sce object. In the next part we'll see how to visualize them.
Visualizing the results
Clusters and signatures of co-expressed genes
bigSCale2 features a basic set of plot types to visualize the main results of clustering and phenotyping.<br /> First, we make a plot of the clusters and signatures of co-expressed genes. After some recent updates, to view the signatures you must first run some additional lines of code.
sce=setDistances(sce)
sce=setClusters(sce)
sce=storeTransformed(sce)
viewSignatures(sce)

In this plot you can see
- The dendrogram representing how the cells are phenotypically organized and clustered
- Colored bars representing the clusters, the library size (meant as a proxy for transcriptome size/complexity) and the pseudotime of the cells. An additional color bar is displayed for any custom user colData() (for example, sample batches, conditions and so on). For custom user colData, the color codes are chosen automatically based on the type of data (numeric or factor).
- The clustered signatures of co-expressed genes alongside their size. Here, all the differentially expressed genes are organized in signatures of co-expressed genes.
Markers of specific clusters
Next, we would like to inspect the markers of a specific cluster, let's say cluster 2. To this end, we run:
viewSignatures(sce,selected.cluster=2)

Now, the plot is the same as before, but in place of the signatures of co-expressed genes we see the markers of cluster 2 stratified by level of specificity. If you read my paper Iacono 2018 then you'll know what this means. In short, markers of level 1 are the most specific to a given cluster. Level 1 means that these 417 genes are expressed only in cluster 2. However, shared markers are also very important in biology. Think of all the markers shared by neuronal cell types as opposed to glial cell types. Shared genes are represented in bigSCale by markers of increasing levels. Markers of level 2 (629 genes) are markers shared between cluster 2 and at most one other cluster. Markers of level 3 are shared between cluster 2 and at most two other clusters, and so on. These markers of higher levels are typically lost by other computational tools.
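The idea of marker levels can be illustrated with a toy example. This is not bigSCale2's marker algorithm, only a sketch of the concept: the logical matrix `up` (TRUE where a gene is up-regulated in a cluster) and the function `marker_level` are hypothetical, and each gene is simply given the smallest level that fits, that is, the number of clusters in which it is up-regulated.

```r
# Toy illustration only: NOT how bigSCale2 computes its markers.
# `up` is a made-up logical matrix, TRUE where a gene is up-regulated
# in a cluster.
up <- matrix(
  c(FALSE, TRUE,  FALSE, FALSE,   # GeneA: cluster 2 only
    FALSE, TRUE,  TRUE,  FALSE,   # GeneB: cluster 2 and one other cluster
    TRUE,  TRUE,  TRUE,  FALSE,   # GeneC: cluster 2 and two other clusters
    TRUE,  FALSE, FALSE, FALSE),  # GeneD: not up-regulated in cluster 2
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"), paste0("C", 1:4))
)

# Level of each gene as a marker of `cluster`: the number of clusters
# sharing the gene, or NA if the gene is not up-regulated in `cluster`.
marker_level <- function(up, cluster) {
  lvl <- rowSums(up)
  lvl[!up[, cluster]] <- NA
  lvl
}

marker_level(up, "C2")
# GeneA GeneB GeneC GeneD
#     1     2     3    NA
```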
Barplot of selected genes
This plot shows gene expression at the single cell level, with cells colored by cluster. It works well in synergy with the plot of the hierarchical clustering.
viewGeneBarPlot(sce,gene.list = c('Aqp4','Olig1','Thy1'))

Violin plot of a selected gene
viewGeneViolin(sce,'Aqp4')

t-SNE and UMAP plots
viewReduced(sce) # to see t-SNE with clusters

viewReduced(sce,color.by = 'Stmn2') # to see t-SNE with gene expression

If you want to color the cells according to some custom annotation, pass a factor variable in place of a gene name. If you want to visualize a UMAP plot, first compute the UMAP data with sce=storeUMAP(sce) and then run viewReduced(sce,method = 'UMAP').
Browsing markers
To have a look at the markers found by bigSCale, we retrieve Mlist from the single cell object. Mlist is a two-dimensional list containing, for each cluster, the markers of the different levels. Let's inspect the markers of level 1 (m
