
LMSstat

Package for automation of statistics that are widely used in metabolomics.


Automates widely used statistical tests from a single, identical data input, reducing the tedious work of searching for packages and reformatting data for each one.

The package includes:

  • Simple statistics: u-test, t-test, and post hoc tests for ANOVA and Kruskal-Wallis, with FDR-adjusted p-values

  • Bar, box, dot, and violin plots with significance annotations (u-test, t-test, post hoc tests for ANOVA and Kruskal-Wallis)

  • Scaling & Transformation

  • Normality check (Shapiro-Wilk test)

  • Scheirer–Ray–Hare Test

  • Volcano plot

  • Heatmap

  • PERMANOVA

  • NMDS

  • PCA

  • PCoA

Contribution acknowledgement

Oct. 1, 2021: Daehwan Kim

  • Optimized Allstats_new for faster processing

  • Fixed a bug in Allstats (regarding LETTERS210729)

Instructions

Installation

Download R

https://cran.r-project.org/bin/windows/base/

Download R Studio

https://www.rstudio.com/products/rstudio/download/

Download Rtools (Windows; needed to build packages from source)

https://cran.r-project.org/bin/windows/Rtools/

Install the package in R

install.packages("devtools")

devtools::install_github("CHKim5/LMSstat")

library(LMSstat)

Basic structure of the Data

Used in

  • Simple statistics
  • Barplot, Boxplot, Dotplot
  • Volcano plot
  • Scheirer–Ray–Hare Test
  • PERMANOVA
  • NMDS
  • PCA
  • Scaling & Transformation
  • Normality check (Shapiro-Wilk test)
# Sample data provided within the package

data("Data")

# Uploading your own data

setwd("C:/Users/82102/Desktop") # change to the folder that contains your csv file

Data<-read.csv("statT.csv",header = F)

The "Multilevel" column is mandatory for the code to run correctly.

If Multilevel is not used, fill the column with arbitrary characters.

The data file must follow the format below.

Column names are case-sensitive: Sample, Multilevel, Group

<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125877205-e140d306-81d8-459f-8414-b8ef3bca63d7.PNG" width="750" height="400"> </p> <p align="center">statT.csv</p>
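As a rough illustration of that layout (the image above remains the reference), the base-R sketch below builds a small hypothetical table with the identifier columns first and one column per measured variable, then checks that the case-sensitive column names are present. The variable names Met_A and Met_B and all values are made up.

```r
# Hypothetical toy dataset laid out like statT.csv: identifier columns first
# (Sample, Multilevel, Group), then one column per measured variable.
set.seed(1)
groups <- c("LAC", "LUE", "WEI")
n_per_group <- 3
n <- n_per_group * length(groups)

toy <- data.frame(
  Sample     = paste0("S", seq_len(n)),
  Multilevel = "none",                          # placeholder when Multilevel is unused
  Group      = rep(groups, each = n_per_group),
  Met_A      = round(rnorm(n, mean = 10), 2),   # hypothetical metabolite intensities
  Met_B      = round(rnorm(n, mean = 5), 2)
)

# Column names are case-sensitive, so check for the exact spelling
required <- c("Sample", "Multilevel", "Group")
missing_cols <- setdiff(required, colnames(toy))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
head(toy)
```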

Used in

  • PERMANOVA
# Sample data provided within the package
data("Classification")

# Uploading your own data
Classification<-read.csv("statT_G.csv",header = F)
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125877154-d01ad8b0-25cd-448b-905d-749a6cc93552.PNG" width="500" height="400"> </p> <p align="center">statT_G.csv</p>

Univariate statistics

Statfile<-Allstats_new(Data,Adjust_p_value = T, Adjust_method = "BH") # Optimized version using lapply / data.table for faster processing, contributed by Daehwan Kim

Statfile<-Allstats(Data,Adjust_p_value = T, Adjust_method = "BH") # Previous version using a for loop
Adjustable parameters
  • Adjust_p_value = T # Set TRUE if p-value adjustment is needed

  • Adjust_method = F # Adjustment method; frequently used options: c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY","fdr", "none")

head(Statfile[["Result"]]) # includes all statistical results

write.csv(Statfile[["Result"]],"p_value_result.csv")  # Write a csv file containing all p-values
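To show the kind of computation behind these results, here is a minimal base-R sketch of per-variable two-group testing followed by p-value adjustment. The helper two_group_tests() is hypothetical and not part of LMSstat; it only assumes that the adjustment methods listed above correspond to the options of R's p.adjust(), which accepts exactly those method names. Allstats itself also runs ANOVA/Kruskal-Wallis post hoc tests, which this sketch omits.

```r
# Hypothetical helper (not part of LMSstat): per-variable t-test and u-test
# (Wilcoxon rank-sum) between two groups, then p-value adjustment with p.adjust().
two_group_tests <- function(df, g1, g2, method = "BH",
                            id_cols = c("Sample", "Multilevel", "Group")) {
  vars <- setdiff(colnames(df), id_cols)
  x <- df[df$Group == g1, vars, drop = FALSE]
  y <- df[df$Group == g2, vars, drop = FALSE]

  t_p <- vapply(vars, function(v) t.test(x[[v]], y[[v]])$p.value, numeric(1))
  # exact = FALSE uses the normal approximation, which also tolerates ties
  u_p <- vapply(vars, function(v) wilcox.test(x[[v]], y[[v]], exact = FALSE)$p.value,
                numeric(1))

  data.frame(variable = vars,
             t_test = t_p, t_adj = p.adjust(t_p, method = method),
             u_test = u_p, u_adj = p.adjust(u_p, method = method),
             row.names = NULL)
}

# Example on made-up data: Met_1 is shifted between groups, Met_2 is not
set.seed(42)
dat <- data.frame(Sample = paste0("S", 1:10), Multilevel = "none",
                  Group = rep(c("A", "B"), each = 5),
                  Met_1 = c(rnorm(5, 10), rnorm(5, 14)),
                  Met_2 = rnorm(10, 10))
res <- two_group_tests(dat, "A", "B")
res

# What the BH adjustment does on a fixed vector:
p.adjust(c(0.01, 0.02, 0.03, 0.04), method = "BH")  # 0.04 0.04 0.04 0.04
```

BH multiplies the i-th smallest of n p-values by n/i and then enforces monotonicity, which is why all four values collapse to 0.04 here.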

Plots

# Makes a subdirectory and saves box plots for all the variables
AS_boxplot(Statfile,asterisk = "u_test") 

# Makes a subdirectory and saves dot plots for all the variables
AS_dotplot(Statfile,asterisk = "t_test") 

# Makes a subdirectory and saves bar plots for all the variables
AS_barplot(Statfile,asterisk = "Scheffe")

# Makes a subdirectory and saves violin plots for all the variables
AS_violinplot(Statfile,asterisk = "Scheffe")

          AS_boxplot(Statfile)              AS_dotplot(Statfile)

<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125714687-1908d6eb-b2bd-4e25-8ef0-62c24466c32a.png" width="350" height="250"> <img src="https://user-images.githubusercontent.com/77651662/125714704-d7dab67e-03c0-4e35-b86a-36723f7c63de.png" width="350" height="250"> </p>

          AS_barplot(Statfile)              AS_violinplot(Statfile)

<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125715925-0878ec77-30bf-4859-8e56-316d98b6d520.jpg" width="350" height="250"> <img src="https://user-images.githubusercontent.com/77651662/141611647-1bc0f049-fcd1-4cb4-a713-5aa30f24dcb1.png" width="350" height="250"> </p>
Adjustable parameters
  • asterisk = "t_test" # c("Dunn","Scheffe","u_test","t_test")
  • significant_variable_only = F # If set to TRUE, non-significant variables will not be plotted
  • color = c("#FF3300", "#FF6600", "#FFCC00", "#99CC00", "#0066CC", "#660099") # Colors for the plots
  • legend_position = "none" # "none","left","right","bottom","top"
  • order = NULL # Order of the groups, e.g. c("LAC","LUE","WEI","SDF","HGH","ASH")
  • tip_length = 0.01 # Length of the significance bracket tips
  • label_size = 2.88 # Significance label size
  • step_increase = 0.05 # Step increase between stacked significance brackets
  • width = 0.3 # Box width
  • size = 3 # Dot size
  • fig_width = NA # Figure width
  • fig_height = NA # Figure height
  • Y_text = 12 # Y-axis title size
  • X_text = 10 # X-axis text size
  • Y_lab = 10 # Y-axis text size
  • T_size = 15 # Title size
  • sig_int = c(0.1,0.05) # Significance intervals (p-value thresholds)

Scaling & Transformation

scaled_data<-D_tran(Data,param = "Auto")

           Raw_Data                     Scaled_Data

<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/126724661-1e2ee121-ecfe-41eb-a52a-e3dca128c12c.PNG" width="400" height="200"> <img src="https://user-images.githubusercontent.com/77651662/126724676-44c4eaac-c007-4eaf-9962-0b739291adb6.PNG" width="400" height="200"> </p>
Adjustable parameters
  • param = "None" # "None","Auto","log10","Pareto"

  • save = F # Set TRUE to save the transformed data file
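The param options correspond to standard metabolomics transformations. The base-R sketch below applies them per variable using the usual textbook definitions (auto scaling: subtract the mean, divide by the standard deviation; Pareto: divide by the square root of the standard deviation). scale_vars() is a hypothetical helper, and D_tran may differ in details such as how zeros are handled before log10.

```r
# Hypothetical sketch of common metabolomics scaling, applied per variable (column);
# identifier columns are left untouched.
scale_vars <- function(df, param = c("None", "Auto", "log10", "Pareto"),
                       id_cols = c("Sample", "Multilevel", "Group")) {
  param <- match.arg(param)
  vars <- setdiff(colnames(df), id_cols)
  x <- as.matrix(df[vars])
  out <- switch(param,
    None   = x,
    Auto   = scale(x, center = TRUE, scale = TRUE),             # (x - mean) / sd
    log10  = log10(x),                                          # needs values > 0
    Pareto = sweep(scale(x, center = TRUE, scale = FALSE), 2,
                   sqrt(apply(x, 2, sd)), "/")                  # (x - mean) / sqrt(sd)
  )
  df[vars] <- as.data.frame(out)
  df
}

toy <- data.frame(Sample = paste0("S", 1:4), Multilevel = "none",
                  Group = c("A", "A", "B", "B"),
                  M1 = c(1, 2, 3, 4), M2 = c(10, 20, 30, 40))
scale_vars(toy, "Auto")
scale_vars(toy, "Pareto")
```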

Normality check

# Shapiro-Wilk test

Result<-Norm_test(Data)

write.csv(Result,"Normality_test_Result.csv")
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/126578553-f2d583db-d649-4db7-92bb-8cfa9433904a.PNG" width="400" height="350"> </p>
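For reference, the sketch below runs base R's shapiro.test() for every variable within every group. norm_check() is hypothetical, and testing per group (rather than on pooled data) is an assumption about how Norm_test organizes its output. A small p-value (e.g. below 0.05) suggests a departure from normality, which favors the u-test or Kruskal-Wallis over the t-test or ANOVA.

```r
# Hypothetical sketch: Shapiro-Wilk test for every variable within every group.
# Whether Norm_test() works per group or on pooled data is an assumption here.
norm_check <- function(df, id_cols = c("Sample", "Multilevel", "Group")) {
  vars <- setdiff(colnames(df), id_cols)
  rows <- list()
  for (g in unique(df$Group)) {
    for (v in vars) {
      x <- df[df$Group == g, v]
      x <- x[!is.na(x)]
      # shapiro.test() needs 3 to 5000 values and fails on constant data
      ok <- length(x) >= 3 && length(x) <= 5000 && length(unique(x)) > 1
      p <- if (ok) shapiro.test(x)$p.value else NA_real_
      rows[[length(rows) + 1]] <- data.frame(Group = g, variable = v, shapiro_p = p)
    }
  }
  do.call(rbind, rows)
}

set.seed(3)
dat <- data.frame(Sample = paste0("S", 1:20), Multilevel = "none",
                  Group = rep(c("A", "B"), each = 10),
                  M1 = rnorm(20), M2 = rexp(20))
res_norm <- norm_check(dat)
res_norm
```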

Scheirer–Ray–Hare Test

# Saves csv files of the significant variables (Multilevel, Group, interaction) and a Venn diagram
SRH(Data)
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/130378279-fb49da48-dc7b-418e-88e7-db046364a4cc.PNG" width="400" height="350"> </p>
Adjustable parameters
  • Adjust_p_value = T # Set TRUE if p-value adjustment is needed
  • Adjust_method = "BH" # Adjustment methods frequently used. c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY","fdr", "none")
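The Scheirer-Ray-Hare test is a rank-based extension of Kruskal-Wallis to two factors (here Multilevel and Group). A minimal base-R sketch of the standard procedure for one variable: rank the response, fit a two-way ANOVA with interaction on the ranks, divide each effect's sum of squares by the total mean square of the ranks, and compare the result to a chi-squared distribution. srh_test() is a hypothetical helper; SRH additionally loops over variables and applies the p-value adjustment set above.

```r
# Hypothetical sketch of the Scheirer-Ray-Hare test: two-way ANOVA on ranks,
# with H = SS_effect / MS_total compared against a chi-squared distribution.
srh_test <- function(y, A, B) {
  d <- data.frame(r = rank(y), A = factor(A), B = factor(B))
  tab <- summary(aov(r ~ A * B, data = d))[[1]]
  ms_total <- var(d$r)                       # total SS of ranks / (N - 1)
  effect <- trimws(rownames(tab))
  keep <- effect != "Residuals"
  H <- tab[["Sum Sq"]][keep] / ms_total
  dfs <- tab[["Df"]][keep]
  data.frame(effect = effect[keep], Df = dfs, H = H,
             p_value = pchisq(H, dfs, lower.tail = FALSE))
}

set.seed(7)
y <- rnorm(24)
A <- rep(c("a1", "a2"), each = 12)          # e.g. Multilevel
B <- rep(c("b1", "b2", "b3"), times = 8)    # e.g. Group
srh_res <- srh_test(y, A, B)
srh_res
```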

Volcano plot

# Makes a subdirectory and saves volcano plots for each combination of groups
Test<-Allstats(Data)
Volcano(Test,asterisk = "t-test")
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/127811109-0cb273ba-3c01-4679-9cb3-c510f3176592.png" width="400" height="400"> <img src="https://user-images.githubusercontent.com/77651662/127816379-409d8630-eaf2-45aa-8d77-7540e5542ebd.png" width="400" height="400"> </p>
Adjustable parameters
  • asterisk = "t-test" # Statistic taken from the Allstats output: "Scheffe", "t-test", "u-test", "Dunn"
  • reverse = T # T,F: reverse the direction of the fold change
  • fig_width = NA # Figure width
  • fig_height = NA # Figure height
  • FC_log = 2 # Log base for the fold change transformation
  • pval_log = 10 # Log base for the p-value transformation
  • dotsize = 3 # Dot size
  • x_limit = c(-2,2) # X-axis limits
  • y_limit = c(0,6) # Y-axis limits
  • pval_intercept = 0.05 # P-value cutoff line used to identify significant variables
  • sig_label = T # T,F: label significant variables
  • color = c("#FF3300","#FF6600","#FFCC00") # Colors used for the ggplots
  • fixed_limit = F # T,F: whether the axis limits are fixed
  • max_overlap = 20 # Maximum overlap for labels
  • FC_range = c(-1.5,1.5) # Significant fold change range
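A volcano plot places each variable at (log fold change, -log p-value). The base-R sketch below computes those coordinates for two groups and flags a variable as significant when it passes both the p-value cutoff and the fold change range. volcano_table() is hypothetical; reusing the parameter names FC_log, pval_log, pval_intercept and FC_range, and treating FC_range as limits on the log scale, are assumptions about how Volcano interprets them.

```r
# Hypothetical sketch: fold change, t-test p-values and significance flags for a
# volcano plot. Treating FC_range as limits on the log scale is an assumption.
volcano_table <- function(x, y, FC_log = 2, pval_log = 10,
                          pval_intercept = 0.05, FC_range = c(-1.5, 1.5)) {
  # x, y: numeric matrices (samples in rows, variables in columns) for two groups
  fc <- colMeans(y) / colMeans(x)
  p  <- vapply(seq_len(ncol(x)), function(j) t.test(x[, j], y[, j])$p.value, numeric(1))
  log_fc <- log(fc, base = FC_log)
  data.frame(variable    = colnames(x),
             log_fc      = log_fc,
             neg_log_p   = -log(p, base = pval_log),
             significant = p < pval_intercept &
                           (log_fc < FC_range[1] | log_fc > FC_range[2]),
             row.names   = NULL)
}

x <- matrix(c(1, 1.1, 0.9, 5, 5.2, 4.8), ncol = 2, dimnames = list(NULL, c("M1", "M2")))
y <- matrix(c(8, 8.2, 7.9, 5.1, 4.9, 5.0), ncol = 2, dimnames = list(NULL, c("M1", "M2")))
vt <- volcano_table(x, y)
vt   # M1 (about 8-fold up) is flagged; M2 (no change) is not
```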

Heatmap

# Makes a subdirectory and saves Heatmap

scaled_data<-D_tran(Data,param = "Auto")

AS_heatmap(scaled_data) # data output by D_tran

dev.off() # Close the graphics device; the heatmap is saved as a PDF
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/126421942-247031e1-9f90-452f-b5bd-fdf5bdf5c058.PNG" width="750" height="400"> </p>
Adjustable parameters
  • col =c("green", "white", "red") # colors for heatmap
  • col_lim = c(-3, 0, 3) # color boundaries
  • reverse = T # T,F Reverse column and rows
  • distance = "pearson" # Distance measure for HCA: "pearson", "manhattan", "euclidean", "spearman", "kendall"
  • rownames = T # T,F
  • colnames = T # T,F
  • Hsize = c(3,6) # Width & height c(a,b)
  • g_legend = "Group" # Annotation legend title
  • h_legend = "Color Key" # Heatmap legend title
  • Title = "Title" # Title
  • T_size = 10 # Title text size
  • R_size = 3 # Row text size
  • C_size = 3 # Column text size
  • Gcol = c("ASD" = "black","HGH"="red","LAC"="blue","LUE" ="grey","SDF" = "yellow","WEI"="green") # Colors for the top_annotation bar
  • dend_h = 0.5 # Dendrogram height
  • a_h = 0.2 # Top annotation height
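The distance option controls how the dendrograms are built. As a sketch under common conventions, correlation-based choices can be turned into a distance as 1 - r, while "euclidean" and "manhattan" come straight from base R's dist(). hca() is a hypothetical helper, and whether AS_heatmap uses 1 - r or another correlation-to-distance mapping is an assumption.

```r
# Hypothetical sketch of hierarchical cluster analysis (HCA) with the distance
# options listed above; correlation-based distances use 1 - r.
hca <- function(m, distance = "pearson", method = "complete") {
  d <- if (distance %in% c("pearson", "spearman", "kendall")) {
    as.dist(1 - cor(t(m), method = distance))   # rows of m are clustered
  } else {
    dist(m, method = distance)                  # e.g. "euclidean", "manhattan"
  }
  hclust(d, method = method)
}

m <- rbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8.1),
           c = c(4, 3, 2, 1), d = c(8, 6, 4, 2.2))
hc <- hca(m, distance = "pearson")
cutree(hc, k = 2)   # a and b share a cluster, as do c and d
```

With a correlation distance, rows with the same profile shape cluster together regardless of magnitude, which is why b (roughly 2 × a) groups with a.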

Multivariate statistics

PERMANOVA

data("Data")

data("Classification") 

Single factor

PERMANOVA performed on the Group column

Indiv_Perm(Data) # The group information is treated as a factor
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125876315-c51448c9-eef1-4299-b85e-34e62a99bea8.PNG" width="750" height="225"> </p>
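For intuition about what the single-factor test computes, the base-R sketch below implements a one-factor PERMANOVA: a pseudo-F statistic from the between-group and within-group sums of squared distances, with a p-value from random permutations of the group labels. permanova_1f() is a hypothetical helper and is not claimed to be how Indiv_Perm is implemented; for real analyses, the vegan package's adonis2() is the established implementation.

```r
# Hypothetical sketch: one-factor PERMANOVA with a pseudo-F statistic and a
# permutation p-value, computed from any distance matrix.
permanova_1f <- function(d, group, n_perm = 999) {
  D2 <- as.matrix(d)^2
  N <- nrow(D2)
  pseudo_f <- function(g) {
    g <- factor(g)
    a <- nlevels(g)
    ss_t <- sum(D2[upper.tri(D2)]) / N                 # total sum of squares
    ss_w <- sum(vapply(levels(g), function(l) {        # within-group sum of squares
      idx <- which(g == l)
      sub <- D2[idx, idx, drop = FALSE]
      sum(sub[upper.tri(sub)]) / length(idx)
    }, numeric(1)))
    ((ss_t - ss_w) / (a - 1)) / (ss_w / (N - a))
  }
  f_obs <- pseudo_f(group)
  f_perm <- replicate(n_perm, pseudo_f(sample(group)))  # shuffle group labels
  list(F = f_obs, p_value = (sum(f_perm >= f_obs) + 1) / (n_perm + 1))
}

set.seed(5)
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),   # 10 samples in group A
           matrix(rnorm(20, mean = 5), ncol = 2))   # 10 samples in group B
g <- rep(c("A", "B"), each = 10)
perm_res <- permanova_1f(dist(X), g, n_perm = 199)
perm_res
```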

Multiple Factors

Runs PERMANOVA in a loop over each class provided in Classification

Result<-Multi_Perm(Data,Classification) # The group information is treated as factors
<p align="center"> <img src="https://user-images.githubusercontent.com/77651662/125875797-89b696dd-f2a7-4ff3-aec9-74ac4b9075f9.PNG" width="750" height="400"> </p>
Adjustable parameters
  • method = Dissimilarity index c("manhattan", "e