CMAP
Map large-scale single cells to the exact spatial locations
Install / Use
/learn @SuoLab-GZLab/CMAPREADME
CMAP
CMAP is an R package that encapsulates python code, for efficiently mapping large-scale individual cells to their precise spatial locations by integrating single-cell and spatial data through a divide-and-conquer strategy.
<img width="1140" alt="截屏2024-08-08 下午2 18 18" src="https://github.com/user-attachments/assets/4052a5ea-9ad3-467b-a936-b481ba213273">This approach allows for progressive refinement of spatial assignments, achieving high accuracy with reduced computational complexity. Unlike existing methods, CMAP achieves sub-spot precision, enabling precise mapping of individual cells within tissues.
How to install CMAP
To quickly use CMAP, we recommend setting up a new conda environment:
conda env create -f CMAP_EnV.environment.yml
After activating the created environment, follow the instructions in the Dependency_packages.txt file to install the necessary dependency packages. If you encouter any questions or errors during the installation of the dependencies, make sure that all packages listed in the Dependency_packages.txt file are installed. This file also provides installation commands for the required dependencies.
conda activate CMAP_Env
After that, the CMAP R package can be easily installed from Github using devtools (few seconds):
devtools::install_github("SuoLab-GZLab/CMAP")
How to run CMAP
The entire process requires a few hours to complete. We have made the MOB dataset available for users to reproduce the mapping results. You can download the simulation data from the following link (https://www.dropbox.com/scl/fo/e06uibevb368ej2v97tpp/APeWNuEbkILSyvS0b0dB2kA?rlkey=nxz6481jurujtm6ecqm4tf7v9&st=i9ho1q34&dl=0). The corresponding code (MOB.CMAP.R and MOB.CMAP.ipynb) is provided for your convenience.
1. Load the packages and set the path of python and saved directory
library(CMAP)
library(Seurat)
library(e1071)
library(purrr)
library(dplyr)
library(preprocessCore)
library(reticulate)
library(smfishHmrf)
library(Giotto)
python_path <- '/home/your/conda/envs/cmap/bin/python'
use_condaenv(python_path)
save_directory <- "/home/save/directory"
if(!file.exists(save_directory)) dir.create(save_directory, recursive = T)
2. Load scRNA-seq and ST data
Input formats supported:
CMAP supports two input formats: a processed Seurat object and an expression matrix.
You can directly load the expression matrix along with its corresponding meta-information. The expression matrix should have genes as rows and cells as columns.
spatial_location A dataframe must contain two columns (x and y), representing the spatial coordinates of each spot.
sc_meta A dataframe may include annotated cell type information; however, if this is not provided, it will not affect the performance of the CMAP mapping prediction.
If you are uncertain about the required input file formats, you can download the provided demo datasets and follow the outlined steps to adjust your data to the appropriate format for your analysis.
sc_count <- read.csv("sc_count.csv",row.names = 1, check.names=FALSE)
st_count <- read.csv("st_count.csv",row.names = 1, check.names=FALSE)
sc_meta <- read.csv("sc_meta.csv",row.names = 1, check.names=FALSE)
spatial_location <- read.csv("st_meta.csv",row.names = 1, check.names=FALSE)
You can also load the created Seurat objects. Here, we have provided demo datasets for testing (https://www.dropbox.com/scl/fi/q9axwdl8i5ukoctip12ro/CMAP.Demo.Lung_tumor.Data.Rdata?rlkey=an9l3kchva5yn5i80lqg1v0fv&st=jyvymggp&dl=0).
load("CMAP.Demo.Lung_tumor.Data.Rdata")
sc_counts <- sc_object@assays$RNA@counts
sc_meta <- data.frame(sc_object@meta.data,row.names=rownames(sc_object@meta.data))
spatial_count <- as.matrix(st_object@assays$Spatial@counts)
spatial_location <- data.frame(st_object@meta.data,row.names=rownames(st_object@meta.data))
# If you create the spatial object using tissue_hires_image.png, you can follow this command to calculate the spots' coordinates
spatial_location <- cbind(spatial_location,
x=st_object@images[[1]]@scale.factors$hires * st_object@images[[1]]@coordinates$imagecol,
y=-st_object@images[[1]]@coordinates$imagerow * st_object@images[[1]]@scale.factors$hires)
# Else if you create the spatial object using tissue_lowres_image.png,
spatial_location <- cbind(spatial_location,
x=st_object@images[[1]]@scale.factors$lowres * st_object@images[[1]]@coordinates$imagecol,
y=-st_object@images[[1]]@coordinates$imagerow * st_object@images[[1]]@scale.factors$lowres)
Normalize the data
sc_counts <- sc_counts[rowSums(sc_counts)>0,]
sc_norm = as.matrix(log1p(sweep(sc_counts,2,Matrix::colSums(sc_counts),FUN = '/') * 1e4))
spatial_count <- spatial_count[rowSums(spatial_count)>0,]
st_norm = log1p(sweep(spatial_count,2,Matrix::colSums(spatial_count),FUN = '/') * 1e4)
3. Use HMRF to do spatial clustering
The optimal number of spatial domains is determined based on the anatomical features of the tissue. In case where the number of domains is unknown, we assess different possible values and select the number that yields the highest average Silhouette width.
cluster_k <- 3
# Create specific instructions for Giotto analysis workflow
instrs <- createGiottoInstructions(save_plot = TRUE,
show_plot = TRUE,
return_plot = TRUE,
python_path = python_path,
save_dir = save_directory)
spatial_obj <- createGiottoObject(raw_exprs = spatial_count,
spatial_locs = spatial_location[,c('x','y')],
instructions = instrs,
cell_metadata = spatial_location)
# Filter genes and cells. If you have filtered some low quality spots before, you can skip this step
spatial_obj <- filterGiotto(gobject = spatial_obj,
expression_threshold = 1,
gene_det_in_min_cells = 50,
min_det_genes_per_cell = 250,
expression_values = c('raw'),
verbose = T)
spatial_obj <- normalizeGiotto(gobject = spatial_obj, scalefactor = 6000, verbose = T)
# Create spatial network
#@ maximum_distance_knn: Visium data, tissue_hires_scalef set ceiling(24.8/tissue_hires_scalef) or as maximum_distance_knn, tissue_hires_scalef is saved in scalefactors_json.json; slide-seq/ST: set 1.5
spatial_obj <- createSpatialNetwork(gobject = spatial_obj,
method = 'kNN',
k = 6, # this k represents the number of neighbors
maximum_distance_knn = 370,
minimum_k = 1,
name = 'KNN_network')
kmtest <- binSpect(spatial_obj, calc_hub = T, hub_min_int = 5,spatial_network_name = 'KNN_network')
hmrf_folder = paste0(save_directory,'/11_HMRF')
if(!file.exists(hmrf_folder)) dir.create(hmrf_folder, recursive = T)
spatial_genes_selected <- hmrf_spatial_gene(spatial_obj,
kmtest,
k = cluster_k) # k: Number of spatial domains; set according to your data.
#@ betas: For detailed settings, see https://search.r-project.org/CRAN/refmans/smfishHmrf/html/smfishHmrf.hmrfem.multi.it.min.html
# For quick results, we recomoned setting betas to 45(non-tumor) or 0(tumor sample).
# If you don't mind taking more time and want the best results, you can iteratively test values between 0 and 100 and select the best one.
HMRF_spatial_genes = doHMRF(gobject = spatial_obj,
expression_values = 'scaled',
spatial_genes = spatial_genes_selected,
k = cluster_k, # This value should match the number of spatial domains (k).
spatial_network_name="KNN_network",
betas = c(0, 45, 2),
python_path = python_path,
output_folder = paste0(hmrf_folder, '/', 'Spatial_genes/SG_topgenes_elbow_k_scaled'))
#@betas_to_add: Results from different betas that you want to add
# Recommendations: Tumor sample: beta=0; Non-tumor: beta=45.
beta = 0
spatial_obj = addHMRF(gobject = spatial_obj,
HMRFoutput = HMRF_spatial_genes,
k = cluster_k,
betas_to_add = beta, # according to the above beta settings
hmrf_name = 'HMRF')
# Add spatial domain to spatial metadata. You can also save the spatial_location as an intermediate file, which must include spatial genes and spatial cluster labels.
spatial_location = spatial_location[as.data.frame(spatial_obj@cell_metadata)[,'cell_ID'],]
column <- paste0('HMRF_k',cluster_k,'_b.',beta)
spatial_location = cbind(spatial_location,HMRF_cluster = spatial_obj@cell_metadata[,column]) # this coloumn needs to be set as described above (the number of domains and beta)
st_norm = st_norm[,rownames(spatial_location)]
4. Level 1 mapping (DomainDivision), assigning cells into different spatial domains
matrix <- data_to_transform(sc_norm,st_norm,spatial_genes_selected,batch=TRUE,pca_method='prcomp_irlba')
train_set <- cbind(as.data.frame(t(matrix[,colnames(st_norm)])),label=spatial_location$HMRF_cluster)
test_set <- as.data.frame(t(matrix[,colnames(sc_norm)]))
train_set$label = as.factor(train_set$label)
# Predict spatial domain of individual cells
# This tuning step requires some time. You can adjust the cross-validation proportion using `cross_para` parameter in the `tune_parameter()` function.
parameters <- tune_parameter(train_set, test_set, kernel = "radial", scale = TRUE, class.weight = TRUE,
Related Skills
node-connect
341.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
341.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.4kCommit, push, and open a PR
