MDSRefMaps
MDSRefMaps: an R Package for Multidimensional Scaling Reference Maps and Projections
Multidimensional Scaling (MDS) is one of the most powerful methods for analyzing high-dimensional objects. Such methods aim to project the similarities between high-dimensional objects into a space with a small number of dimensions, generally two or three for visualization purposes. Applied to 'omics' datasets, these methods allow visualizing the homogeneity of samples within different biological conditions or the similarities between gene or protein expression patterns. In the context of complex studies, multiple MDS representations can be generated and compared to analyze and interpret the high-dimensional objects. However, comparing different MDS representations can be difficult because they lack a common structure.
We present an R package, named MDSRefMaps, that allows the projection of additional high-dimensional objects over a predefined MDS representation. The predefined representation is named an MDS Reference Map, and the resulting representations are named MDS Projections. Thanks to the common structure provided by the Reference Map, the different MDS Projections can be easily compared.
Table of Contents
- Package overview
- Package installation
- Construction of MDS Reference Maps and MDS Projections
- Case studies with transcriptomic datasets
- Algorithm details
- References
<a name="package_overview"/> 1. Package overview
High-throughput biological data, such as those from 'omics' experiments, are complex to analyze because of the large number of measured features and the large number of samples. Indeed, in transcriptomic or proteomic studies, each profile corresponds to thousands of different gene or protein measurements. Heatmaps [1], hierarchical clustering [2], Principal Component Analysis [3] and Multidimensional Scaling representations [4] are among the most popular and powerful analysis tools but are limited in the context of large datasets. Developing new visualization tools is therefore crucial to better interpret large biological datasets [5].
Dimensionality reduction methods perform a projection of the high-dimensional objects into a lower dimensional space, generally in order to visualize them. Applied to biological studies, the objects can be biological samples, for which we want to understand the effect of specific treatments. Objects can also be the genes or other biological variables, for which we want to understand the overall genomic organization or expression.
Multidimensional Scaling (MDS) is one of the most popular dimensionality reduction methods [4]. In MDS representations, the distances between the dots are proportional to the distances between the objects. The Kruskal Stress quantifies the information lost in the dimensionality reduction process.
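To make the Kruskal Stress concrete, here is a minimal sketch in base R. It assumes the classic stress-1 form, in which the squared differences between original and represented distances are normalized by the squared represented distances. It uses `cmdscale()` only as a stand-in MDS method, and `kruskal_stress()` is a hypothetical helper, not a function of MDSRefMaps. The value reported by the package may be computed or scaled differently.

```r
# Kruskal Stress (stress-1 form) of a 2-D classical MDS, base R only.
set.seed(42)

# 50 simulated objects described by 10 attributes
objects <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)

# pairwise distances in the original space
d_high <- as.vector(dist(objects))

# classical MDS into 2 dimensions (stand-in for the package's own algorithm)
points_2d <- cmdscale(dist(objects), k = 2)

# pairwise distances in the representation, in the same pair order
d_low <- as.vector(dist(points_2d))

# hypothetical helper: 0 means that all distances are perfectly preserved
kruskal_stress <- function(d_high, d_low) {
  sqrt(sum((d_high - d_low)^2) / sum(d_low^2))
}

stress <- kruskal_stress(d_high, d_low)
print(stress)
```

A stress of 0 would mean that the representation preserves every pairwise distance exactly; larger values indicate more information lost in the projection.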
We previously published a multidimensional scaling method, named SVD-MDS, which is based on a molecular dynamics approach [8]. In this approach, each high-dimensional object is modeled by a particle, and the distances between the objects are modeled by attractive or repulsive forces between the particles. Moreover, the SVD-MDS algorithm uses a Singular Value Decomposition (SVD) to initialize the MDS representation, which enhances the quality of the resulting representation. To quantify how well an MDS representation conserves local structure, we proposed the Entourage Score, which is proportional to the number of nearest neighbors that objects share between the original dataset and the resulting representation.
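The idea behind a neighbor-based score such as the Entourage Score can be sketched in base R as follows. This is only an illustration under stated assumptions: the neighborhood size (`n_neighbours`), the normalization to the range 0 to 1, and the helper name `neighbour_overlap()` are all hypothetical and are not taken from the package or from the SVD-MDS publication.

```r
# Fraction of nearest neighbours shared between the original space and the
# low-dimensional representation (illustrative, not the package's formula).
neighbour_overlap <- function(d_high, d_low, n_neighbours = 5) {
  m_high <- as.matrix(d_high)
  m_low <- as.matrix(d_low)
  n <- nrow(m_high)
  shared <- vapply(seq_len(n), function(i) {
    # rank the other objects by distance to object i, excluding i itself
    ord_high <- order(m_high[i, ])
    ord_low <- order(m_low[i, ])
    nn_high <- ord_high[ord_high != i][seq_len(n_neighbours)]
    nn_low <- ord_low[ord_low != i][seq_len(n_neighbours)]
    length(intersect(nn_high, nn_low))
  }, integer(1))
  mean(shared) / n_neighbours
}

set.seed(7)
objects <- matrix(rnorm(40 * 8), nrow = 40, ncol = 8)
d_high <- dist(objects)
# classical MDS as a stand-in representation
d_low <- dist(cmdscale(d_high, k = 2))

score <- neighbour_overlap(d_high, d_low, n_neighbours = 5)
print(score)
```

With this normalization, a value of 1 means that every object keeps exactly the same nearest neighbors in the representation, and a value of 0 means that none are kept.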
Comparison between the different MDS representations can be difficult because of the absence of a common structure. We present here a new algorithm, named MDSRefMaps, which allows projecting additional high-dimensional objects over a predefined MDS representation. The predefined representations are named MDS Reference Maps and the resulting MDS representations are named MDS Projections. This strategy has been previously proposed in a transcriptomic study of mouse lung responses to different respiratory viruses [10].
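The principle of projecting a new object over a fixed representation can be illustrated with a short base R sketch. It is not the MDSRefMaps algorithm: `cmdscale()` stands in for the reference map construction, a generic `optim()` call stands in for the placement step, and all variable names are hypothetical. The reference positions are never modified; only the position of the new object is optimized, so that its distances on the map match its original distances to the reference objects as closely as possible.

```r
# Place one new object on a fixed 2-D map (illustrative sketch only).
set.seed(1)

# 30 reference objects and 1 new object, each described by 5 attributes
ref_objects <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
new_object <- rnorm(5)

# fixed reference positions (stand-in for an MDS Reference Map)
ref_points <- cmdscale(dist(ref_objects), k = 2)

# original distances from the new object to every reference object
d_target <- sqrt(rowSums(sweep(ref_objects, 2, new_object)^2))

# squared mismatch between map distances and original distances
mismatch <- function(position) {
  d_map <- sqrt(rowSums(sweep(ref_points, 2, position)^2))
  sum((d_map - d_target)^2)
}

# start from the centre of the map and optimise the new position only
start <- colMeans(ref_points)
fit <- optim(par = start, fn = mismatch)
new_point <- fit$par
print(new_point)
```

Because the reference positions stay untouched, several sets of new objects placed this way share the same coordinate system, which is what makes the resulting representations comparable.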
In this tutorial, we illustrate the capabilities of our R package using two publicly available datasets: (i) the Human Gene Expression - Global Map (HGE-GM) dataset [13]; and (ii) the Mouse Non-Code Lung (MONOCL) dataset [11]. We show here that MDS Projections of different objects can be easily compared, whereas this is impossible for regular MDS representations. The HGE-GM dataset is composed of microarray transcriptomic profiles obtained from different human tissues to construct a global human gene expression map. Lukk et al. collected the transcriptomic profiles of 5,372 human samples from different tissue types, disease states and cell lines. The MONOCL dataset is composed of RNA-seq transcriptomic profiles obtained from the lungs of eight different mouse strains infected by respiratory viruses (Influenza A and SARS-CoV). Long non-coding RNAs (lncRNAs) were studied in this dataset, with overall 5,000 lncRNAs and 6,000 coding RNAs differentially expressed after infection. These transcriptomic data are available in the supplementary materials of the associated papers and on a public FTP server: ftp://ftp.mdsrefmaps.org/public/ (username: mdsrefmaps, password: mdsrefmaps).
<a name="package_installation"/> 2. Package installation
The ggplot2, Rcpp, and plyr R packages are required for running MDSRefMaps.
These packages can be installed using the following commands:
install.packages("ggplot2")
install.packages("Rcpp")
install.packages("plyr")
MDSRefMaps is available on GitHub, at https://github.com/tchitchek-lab/MDSRefMaps.
Its installation can be done via the devtools package using the following commands:
install.packages("devtools")
library("devtools")
install_github("tchitchek-lab/MDSRefMaps")
Once installed, the MDSRefMaps package can be loaded using the following command:
library("MDSRefMaps")
<a name="visualization_insilico"/> 3. Construction of MDS Reference Maps and MDS Projections
<a name="visualization_insilico_reference"/> 3.1 Construction of MDS Reference Maps
An MDS Reference Map is essentially a regular MDS representation generated on a set of reference objects. In the context of 'omics' studies, these objects can be either the samples or the biological variables (genes, proteins, ...).
The MDSReferenceMap() function computes an MDS Reference Map based on a distance matrix provided by the user.
The parameter k can be used to specify the number of dimensions of the resulting MDS representation (k=2 by default).
Moreover, the MDS representation can be generated using either the Euclidean or the Manhattan metric (specified via the parameter method).
For instance, an example MDS Reference Map can be generated using the following commands:
# we will create two sets of 100 objects in a 10-dimensional space
n_obj = 100
n_atr = 10
# the first set of objects is drawn from a normal distribution
# having a mean of 0 and a standard deviation of 2
obj_ref1 = matrix(rnorm(n_obj * n_atr, 0, 2), n_obj, n_atr)
# the second set of objects is drawn from a normal distribution
# having a mean of 10 and a standard deviation of 2
obj_ref2 = matrix(rnorm(n_obj * n_atr, 10, 2), n_obj, n_atr)
# computes a distance matrix based on these objects
dist_ref = dist(rbind(obj_ref1, obj_ref2))
# generates an MDS Reference Map based on this distance matrix
map_ref = MDSReferenceMap(dist_ref, stress_sd_th = 0.001)
The MDSReferenceMap() function returns a list of three elements:
- the points element is a numeric matrix containing the positions of the objects in the MDS representation
- the stress element is a numeric value containing the Kruskal Stress of the MDS representation
- the entourage element is a numeric value containing the Entourage Score of the MDS representation
# prints the structure of the MDS Reference Map
# (str() prints by itself, so wrapping it in print() only adds a trailing NULL)
str(map_ref)
## List of 3
## $ points : num [1:200, 1:2] 9.73 12.22 13.19 11.34 16.43 ...
## $ stress : num 11.5
## $ entourage: num 0.122
This MDS Reference Map can be plotted, via the plotMDS() function, using the following commands:
# plots the MDS Reference Map
plotMDS(map_ref, title = "MDS Reference Map")
<img src="README/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />
The plotMDS() function can be parametrized to define colors and shapes of the dots:
# defines the colors and the shapes of the dots in the MDS representation
color = c(rep("Reference set 1", n_obj), rep("Reference set 2", n_obj))
shape = c(rep("filledtrianglepointup", n_obj * 2))
# plots the MDS Reference Map with colored and shaped dots
plotMDS(map_ref, color = color, shape = shape, title = "MDS Reference Map")
<img src="README/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
<a name="visualization_insilico_projections"/> 3.2 Construction of MDS Projections
An MDS Projection is an MDS Reference Map overlaid with additional objects.
Points of the MDS Reference Map remain fixed, while the new objects are positioned according to their distance
