G2P: Genotypes to Phenotypes

The R package "G2P" is an integrated genomic selection (GS) package for predicting phenotypes from genotypes. It includes 20 GS algorithms and 13 evaluation measures. G2P provides a comprehensive but easy-to-use platform for genomic selection researchers. It also provides an interactive UI based on shiny that is easy to learn and operate.

Version and download

  • Version 1.0 - First version, released on February 28, 2017
  • Version 1.1 - Second version, released on August 14, 2017

Depends

  • R (>= 3.3.1)
  • BGLR(>= 2.10)
  • pROC(>= 1.8)
  • PRROC(>= 1.1)
  • e1071(>= 1.6-7)
  • glmnet(>= 2.0)
  • pls(>= 2.5-0)
  • randomForest(>= 4.6-12)
  • rrBLUP(>= 4.4)
  • snowfall(>= 1.84-6.1)
  • spls(>= 2.2-1)
  • brnn(>= 0.6)
  • sommer(>= 2.9)
  • hglm(>= 2.1-1)
  • hglm.data(>= 1.0-0)
  • snow(>= 0.4-1)

Please make sure that all dependency packages are installed before you use G2P.
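One way to verify this before loading G2P is to compare the dependency list against the packages installed locally. The sketch below uses only base R. The helper name `missing_packages` is hypothetical and not part of G2P, and it checks presence only, not the minimum versions listed above.

```r
# Dependency names copied from the "Depends" list above.
g2p_depends <- c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "pls",
                 "randomForest", "rrBLUP", "snowfall", "spls", "brnn",
                 "sommer", "hglm", "hglm.data", "snow")

# Hypothetical helper (not a G2P function): return the packages
# from `pkgs` that are not installed in the current R library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(g2p_depends)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
} else {
  message("All dependency packages are installed.")
}
```

Any names reported as missing can then be passed to `install.packages()` as shown in the Installation section.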

Suggests

  • rgl(>= 0.97.0)
  • pheatmap(>= 1.0.8)
  • shiny(>= 1.0.3)
  • plotly(>= 4.7.0)
  • shinythemes(>= 1.1.1)
  • ggplot2(>= 2.2.1.9000)
  • impute(>= 1.46.0)

These packages are used for displaying results and for G2P.app. If you want to use the interactive UI and the plotting functions in G2P, please make sure they are installed. In addition, impute is needed by the "GSDataQC" function to impute missing values with the knn method.

Installation

Install dependency packages

install.packages(c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "spls", "randomForest", "rrBLUP", "snowfall", "pls", "brnn", "sommer", "hglm", "hglm.data", "snow"), dependencies = TRUE)

Install G2P

install.packages("path/G2P_1.1.0.tar.gz", repos = NULL, type = "source")
# The path must not contain spaces.

Install suggested packages

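install.packages(c("rgl", "pheatmap", "shiny", "plotly", "shinythemes", "ggplot2", "impute"), dependencies = TRUE)
# impute is hosted on Bioconductor; if it cannot be found on CRAN, install it with BiocManager::install("impute")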

Contents

Main functions

  • GS data summary
  • Example data
  • Training model
  • Performance assessment
  • Cross validation
  • Feature selection
  • Results display
  • Package help
  • G2P.app

GS algorithms (20)

  • Statistics-based methods: BayesA, BayesB, BayesC, BRR, BL, RKHS, RR, rrBLUP, SPLS, LASSO, BRNN, AI, NR, EM, EMMA, bigRR
  • Machine-learning-based methods: RFC, RFR, SVC, SVR

Evaluation measures (13)

  • Global measures: Pearson correlation, Kendall rank correlation, Spearman correlation, Mean squared error (MSE), R2
  • Threshold-based measures: Normalized discounted cumulative gain (NDCG), meanNDCG, AUC, AUCpr, F1, Kappa, Relative efficiency (RE), Accuracy
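To make the global measures concrete, the following base R sketch computes them for a pair of observed and predicted vectors. It is an illustration on simulated data, not G2P's own evaluation code. The helper name `global_measures` is hypothetical, and R2 is taken here as the squared Pearson correlation, which is an assumption about the definition.

```r
set.seed(1)
observed  <- rnorm(50)
predicted <- observed + rnorm(50, sd = 0.5)  # simulated predictions

# Hypothetical helper (not a G2P function): the five global measures
# for one vector of observed values and one vector of predictions.
global_measures <- function(obs, pred) {
  c(pearson  = cor(obs, pred, method = "pearson"),
    kendall  = cor(obs, pred, method = "kendall"),
    spearman = cor(obs, pred, method = "spearman"),
    mse      = mean((obs - pred)^2),
    r2       = cor(obs, pred)^2)  # assumption: R2 as squared Pearson correlation
}

print(round(global_measures(observed, predicted), 3))
```

The correlation measures depend only on how well predictions track the observed values, while MSE also penalizes differences in scale, so the two groups can disagree when predictions are ranked well but shifted.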

Example data

The G2P package has a built-in example dataset, GYSS. It is only a subset of the grain yield under drought stress (GYSS) dataset, with 242 samples and 1000 SNPs. The GYSS dataset is from the International Maize and Wheat Improvement Center (CIMMYT). The complete dataset, with 242 samples and 46373 SNPs, is also available for download. The subset is used instead of the complete dataset to shorten the computation time.

Quick start

For more details, please install G2P and see the package help in R.

G2P Tutorial

Command line

0 Setting up the R session

Before starting, choose a working directory, preferably one devoted exclusively to this tutorial. After starting an R session, change the working directory and load the G2P package:

# Display the current working directory
getwd()
# Set the working directory
workingDir <- "."
setwd(workingDir)
# Load the G2P package
library(G2P)

1 Load the example dataset and take a quick look at its format

Load the dataset and check the input of G2P with GSDataQC. This function provides two methods for imputing missing values and returns a summary list with information about the input data, for example the count and percentage of missing values and the minor allele frequency (MAF).

# Load the example dataset
data(GYSS)
# Genotypic data
Markers[1:10,1:10]
# Phenotypic data
phenotype[1:10]
## GSDataQC without imputation ##
QCRes <- GSDataQC(markers = Markers, phenotype = phenotype, impute = FALSE)

## GSDataQC with imputation ##
# Set 100000 of the 242 x 1000 marker values to NA at random
misIndex <- sample(1:242000,100000)
Markers[misIndex] <- NA
QCResImpute <- GSDataQC(markers = Markers, phenotype = phenotype, impute = TRUE, 
                        imputeMethod = "mean")
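For intuition about the quantities in such a summary, the sketch below computes the missing rate, the MAF and a mean imputation in base R on a small simulated matrix. It is not the GSDataQC implementation. It assumes genotypes are coded as 0/1/2 copies of one allele, which may differ from the coding of your data, and all object names are illustrative.

```r
set.seed(1)
# Simulated genotype matrix: 20 samples x 5 SNPs, coded 0/1/2 (assumption).
geno <- matrix(sample(0:2, 100, replace = TRUE), nrow = 20,
               dimnames = list(paste0("S", 1:20), paste0("SNP", 1:5)))
geno[sample(length(geno), 10)] <- NA  # introduce 10 missing values

# Per-SNP missing rate
miss_rate <- colMeans(is.na(geno))

# Per-SNP minor allele frequency, computed from the observed genotypes
allele_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Mean imputation: replace each missing value with its SNP (column) mean
geno_imputed <- geno
for (j in seq_len(ncol(geno))) {
  na_rows <- is.na(geno[, j])
  geno_imputed[na_rows, j] <- mean(geno[, j], na.rm = TRUE)
}

print(round(rbind(miss_rate, maf), 3))
```

Mean imputation leaves each SNP's mean unchanged but shrinks its variance, which is one reason a neighbor-based method such as knn can be preferable when many values are missing.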

2 Feature selection

Feature selection scores each SNP so that high-scoring SNPs can be screened for subsequent modeling, which simplifies the model and can improve prediction precision. The "method" parameter accepts "Gini", "rrBLUP" and "Accuracy".

# Feature selection with rrBLUP
rrBLUP_selection <- feature_assess(markers = Markers, phenotype = phenotype, method = "rrBLUP")
# This function returns a numeric array containing the score of each SNP position
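As a rough illustration of effect-based SNP scoring, the sketch below fits a ridge regression in base R and scores each SNP by the absolute size of its estimated effect. This is one plausible reading of an rrBLUP-style score, not the code behind feature_assess. The data are simulated, and the penalty `lambda` is an arbitrary illustrative value rather than one estimated from the data.

```r
set.seed(1)
n <- 60
p <- 20
X <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n,
            dimnames = list(NULL, paste0("SNP", 1:p)))
# Simulated phenotype: only SNP3 and SNP7 have true effects
y <- 1.5 * X[, "SNP3"] - 1.0 * X[, "SNP7"] + rnorm(n, sd = 0.5)

# Ridge regression on centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)
lambda <- 1  # illustrative penalty (assumption)
effects <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc))

# Score each SNP by the absolute value of its estimated effect
score <- setNames(abs(as.vector(effects)), colnames(X))
top_snps <- names(sort(score, decreasing = TRUE))[1:5]
print(top_snps)
```

The highest-scoring SNPs can then be used to subset the marker matrix, for example `X[, top_snps]`, before fitting a model.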

3 Modeling

For parameter settings and further details, please refer to the G2P package help in R or the PDF reference manual.

# Fit a regression model (modelMethods can be "BayesA", "BayesB", "BayesC", "BL", "BRR",
# "rrBLUP", "LASSO", "SPLS" or "bigRR")
rrBLUP_model <- GSReModel(markers = Markers, pheVal = phenotype, modelMethods = "rrBLUP")
# Fit a machine learning model (modelMethods including "SVR" , "SVC", "RFR", "RFC")
# Fit RFR model
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFR")
# Fit classification model(RFC)
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFC",
                           posPercentage = 0.4, ntree = 500)
# Fit other models ("BRNN", "RKHS", "RR", "AI", "NR", "EM", "EMMA"). Set outputModel = TRUE to get a list
# containing both the prediction results and the model; otherwise only the prediction results are returned.
BRNN_Res <- fit.BRNN(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                     predictMarkerMat = Markers[1:10,], outputModel = TRUE, verbose = FALSE)
RKHS_Res <- fit.RKHS(trainedMarkerMat = Markers, trainedPheVal = phenotype, 
                     predictMarkerMat = Markers[1:10,],nIter = 1500, burnIn = 500, outputModel = TRUE)
RR_Res <- fit.RR(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                 predictMarkerMat = Markers[1:10,], outputModel = TRUE )
# Fit mmer models (method can be "AI", "NR", "EM" or "EMMA")
mmer_Res <- fit.mmer(trainedMarkerMat = Markers, trainedPheVal = phenotype, 
                     predictMarkerMat = Markers[1:10,], method = "NR", effectConcider = "A", outputModel = TRUE)

4 Prediction

The genomic selection methods "BRNN", "RKHS", "RR", "AI", "NR", "EM" and "EMMA" have already produced prediction results in the previous step. The prediction function below applies to the methods "BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP", "LASSO", "SPLS", "bigRR", "SVR", "SVC", "RFR" and "RFC". For parameter settings and further details, please refer to the G2P package help in R or the PDF reference manual.

# testMat is a new marker matrix to predict
# trainModel is the model trained in step 3, and modelMethods gives the name of the method
rrBLUP_Res <- predictGS(testMat = Markers[1:10,], trainModel = rrBLUP_model, modelMethods = "rrBLUP")
# This function returns a numeric array containing the prediction result for each predicted sample.

5 G2P

Predict phenotypes from genotypes with multiple methods in a single call.

> G2P(trainMarker = Markers, trainPheno = phenotype, testMarker = Markers[1:10,], testPheno = phenotype[1:10], modelMethods =c("BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP","RFC"), outputModel =FALSE)
      realPhenScore    BayesA    BayesB    BayesC        BL       BRR    rrBLUP   RFC
DT10      0.7045494 0.6638141 0.6645500 0.6512245 0.6496041 0.6553746 0.6574948 0.820
DT100     0.3886150 0.4270637 0.4152724 0.4201070 0.4275309 0.4197220 0.4077610 0.096
DT101     0.1165665 0.3105247 0.3044984 0.2987718 0.3117646 0.2924395 0.2930300 0.104
DT102     0.3747609 0.4727063 0.4783084 0.4745835 0.4849857 0.4759522 0.4694439 0.162
DT103     0.4136186 0.4770212 0.4841845 0.4843115 0.4933730 0.4942586 0.4833140 0.166
DT104     0.6908267 0.6321259 0.6272183 0.6269430 0.6128531 0.6268135 0.6300071 0.778
DT105     0.5520021 0.5536663 0.5411970 0.5555117 0.5449907 0.5263780 0.5400619 0.144
DT109     0.5161542 0.4841527 0.4866321 0.4982482 0.4744184 0.4835680 0.4826495 0.110
DT110     0.5486291 0.6010441 0.6076803 0.5893248 0.6028943 0.6064955 0.5979550 0.124
DT111     0.7276410 0.5877352 0.5966961 0.6118398 0.6075789 0.6014777 0.6036835 0.786
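A table like this can be summarized per method with a few lines of base R. The sketch below copies three columns from the output above and computes, for each method, the Pearson correlation with realPhenScore and the share of the three best observed samples that also rank among the three best predicted ones. The helper `top_k_overlap` is hypothetical and is not one of G2P's evaluation measures. Since the ten test samples in this call were also part of the training set, the agreement shown is likely optimistic.

```r
# Three columns copied from the output table above
res <- data.frame(
  realPhenScore = c(0.7045494, 0.3886150, 0.1165665, 0.3747609, 0.4136186,
                    0.6908267, 0.5520021, 0.5161542, 0.5486291, 0.7276410),
  rrBLUP = c(0.6574948, 0.4077610, 0.2930300, 0.4694439, 0.4833140,
             0.6300071, 0.5400619, 0.4826495, 0.5979550, 0.6036835),
  RFC = c(0.820, 0.096, 0.104, 0.162, 0.166, 0.778, 0.144, 0.110, 0.124, 0.786),
  row.names = c("DT10", "DT100", "DT101", "DT102", "DT103",
                "DT104", "DT105", "DT109", "DT110", "DT111")
)

# Hypothetical helper (not a G2P function): share of the k best observed
# samples that are also among the k best predicted samples.
top_k_overlap <- function(obs, pred, k) {
  top_obs  <- order(obs, decreasing = TRUE)[seq_len(k)]
  top_pred <- order(pred, decreasing = TRUE)[seq_len(k)]
  length(intersect(top_obs, top_pred)) / k
}

methods <- c("rrBLUP", "RFC")
pearson <- sapply(res[methods], cor, y = res$realPhenScore)
overlap <- sapply(res[methods], top_k_overlap, obs = res$realPhenScore, k = 3)
print(rbind(pearson, overlap))
```

Comparing ranks rather than raw values makes it possible to put the RFC column next to the regression-type methods even though its values are on a different scale.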

6 Cross validation

Run cross validation (CV) of GS methods. The "cross" parameter sets the number of CV folds, "seed" sets the random seed, and "cpus" sets the number of cores used for parallel computation.

predlist <- G2PCrossValidation(cross = 10, seed = 1 , cpus = 3, markers = Markers,
                               pheVal = phenotype, modelMethods = c("rrBLUP", "RFC"),
                               outputModel = FALSE)
# This function returns a list in which each element holds the results of one CV fold.
> predlist$cv1
       testPheno    rrBLUP   RFC
DT187 0.27187247 0.4862278 0.316
DT21  0.08312729 0.4208843 0.254
DT257 0.25201410 0.4822461 0.302
DT72  0.47947204 0.6243357 0.798
DT17  0.50980385 0.4601366 0.154
DT68  0.45270891 0.4154981 0.176
DT78  0.80182295 0.4193034 0.332
DT273 0.61428571 0.5145006 0.344
DT266 0.40762308 0.7258433 0.528
DT116 0.82464237 0.5425566 0.362
DT169
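The role of "cross" and "seed" can be illustrated with a small base R sketch of a k-fold split. It is not the internal code of G2PCrossValidation. The helper `make_folds` is hypothetical, and the sample count of 242 is taken from the example dataset description.

```r
# Hypothetical helper (not a G2P function): split sample indices into
# `cross` folds; the same `seed` always reproduces the same split.
make_folds <- function(n_samples, cross = 10, seed = 1) {
  set.seed(seed)
  shuffled <- sample(n_samples)                           # random permutation of 1..n
  fold_id  <- rep(seq_len(cross), length.out = n_samples) # near-equal fold sizes
  split(shuffled, fold_id)
}

folds <- make_folds(n_samples = 242, cross = 10, seed = 1)
print(lengths(folds))

# In each round, one fold is the test set and the remaining folds form the training set
test_idx  <- folds[[1]]
train_idx <- unlist(folds[-1], use.names = FALSE)
```

Every sample appears in exactly one test fold, so collecting the predictions of all folds yields one out-of-sample prediction per sample.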
