G2P
G2P is an integrated genomic selection (GS) package for predicting phenotypes from genotypes. It includes 20 GS algorithms and 13 evaluation measurements.
G2P : Genotypes to Phenotypes<br>
<br>
The R package "G2P" is an integrated Genomic Selection (GS) package for predicting phenotypes from genotypes,
which includes 20 GS algorithms and 13 evaluation measurements. G2P provides a comprehensive but easy-to-use platform
for Genomic Selection researchers. In addition, G2P provides an interactive UI based on shiny that is easy to operate and learn.
<br>
Version and download <br>
- Version 1.0 - First version released on February 28, 2017<br>
- Version 1.1 - Second version released on August 14, 2017<br>
Depends <br>
- R (>= 3.3.1)
- BGLR(>= 2.10)
- pROC(>= 1.8)
- PRROC(>= 1.1)
- e1071(>= 1.6-7)
- glmnet(>= 2.0)
- pls(>= 2.5-0)
- randomForest(>= 4.6-12)
- rrBLUP(>= 4.4)
- snowfall(>= 1.84-6.1)
- spls(>= 2.2-1)
- brnn(>= 0.6)
- sommer(>= 2.9)
- hglm(>= 2.1-1)
- hglm.data(>= 1.0-0)
- snow(>= 0.4-1) <br>
Please make sure that all of the packages listed under Depends are installed before you use G2P. <br>
Suggests
- rgl(>= 0.97.0)
- pheatmap(>= 1.0.8)
- shiny(>= 1.0.3)
- plotly(>= 4.7.0)
- shinythemes(>= 1.1.1)
- ggplot2(>= 2.2.1.9000)
- impute(>= 1.46.0) <br>
These packages are used for results display and G2P.app. If you want to use the interactive UI and the plotting functions in G2P, please make sure these packages are installed. In addition, impute is required by the "GSDataQC" function for imputation with the KNN method. <br>
Installation <br>
Install dependency packages
install.packages(c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "spls", "randomForest", "rrBLUP", "snowfall", "pls", "brnn", "sommer", "hglm", "hglm.data", "snow"), dependencies = TRUE)
Install G2P
install.packages("path/G2P_1.1.0.tar.gz", repos = NULL, type = "source")
# The path must not contain spaces.
Install suggested packages
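install.packages(c("rgl", "pheatmap", "shiny", "plotly", "shinythemes", "ggplot2", "BiocManager"), dependencies = TRUE)
# impute is distributed through Bioconductor rather than CRAN, so install it with BiocManager
BiocManager::install("impute")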
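Before installing, it can help to check which dependencies are already present. The following is a minimal sketch in base R; the variable names (`deps`, `missing_pkgs`) are made up for this example, and the package names are copied from the Depends list above.

```r
# Dependency names copied from the "Depends" list above (R itself excluded).
deps <- c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "pls", "randomForest",
          "rrBLUP", "snowfall", "spls", "brnn", "sommer", "hglm", "hglm.data",
          "snow")

# Names of all packages currently installed in the library paths.
installed <- rownames(installed.packages())

# Packages from the list that are not installed yet.
missing_pkgs <- setdiff(deps, installed)

if (length(missing_pkgs) == 0) {
  message("All G2P dependencies are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(missing_pkgs, dependencies = TRUE)
}
```

The check only compares package names; it does not verify the minimum versions listed under Depends.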
Contents
Main functions
- GS Data summary <br>
- Example data <br>
- Training model <br>
- Performance assessment <br>
- Cross validation <br>
- Feature selection <br>
- Results display <br>
- Package help<br>
- G2P.app<br>
GS algorithms (20)
- Statistics based methods<br> BayesA, BayesB, BayesC, BRR, BL, RKHS, RR, rrBLUP, SPLS, LASSO, BRNN, AI, NR, EM, EMMA, bigRR
- Machine-learning based methods<br> RFC, RFR, SVC, SVR <br>
Evaluation measures (13)
- Global measures <br> Pearson correlation, Kendall rank correlation, Spearman correlation, Mean squared error (MSE), R2
- Threshold-based measures <br> Normalized discounted cumulative gain (NDCG), meanNDCG, AUC, AUCpr, F1, Kappa, Relative efficiency (RE), Accuracy <br>
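To illustrate what a threshold-based measure looks at, here is a small base R sketch of NDCG restricted to the top-k predicted individuals. It follows one common formulation of NDCG and assumes positive phenotype values; the function name `ndcgAtK` and the toy data are hypothetical, and the implementation inside G2P may differ in detail.

```r
# NDCG at top-k: how much of the best achievable "gain" is captured by the
# individuals that the prediction ranks highest (sketch, not G2P's own code).
ndcgAtK <- function(realScores, predScores, topK) {
  # Logarithmic discount for rank positions 1..topK.
  discount <- 1 / log2(seq_len(topK) + 1)
  # Real values of the topK individuals as ranked by the prediction.
  predOrder <- order(predScores, decreasing = TRUE)[seq_len(topK)]
  dcg <- sum(realScores[predOrder] * discount)
  # Ideal case: the topK individuals ranked by their real values.
  idealOrder <- order(realScores, decreasing = TRUE)[seq_len(topK)]
  idcg <- sum(realScores[idealOrder] * discount)
  dcg / idcg
}

# Toy data (made up for this example).
realScores <- c(0.9, 0.1, 0.5, 0.7)
perfect <- ndcgAtK(realScores, predScores = realScores, topK = 2)
reversed <- ndcgAtK(realScores, predScores = -realScores, topK = 2)
```

A prediction that ranks the individuals exactly as their real values do reaches the maximum of 1, while a reversed ranking scores lower.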
Example data
The G2P package has a built-in example dataset, GYSS, which is a subset of the grain yield under drought stress (GYSS) dataset containing 242 samples and 1000 SNPs. The GYSS dataset is from the International Centre for the Improvement of Maize and Wheat (CIMMYT). The complete dataset, with 242 samples and 46373 SNPs, can be downloaded separately. The package ships the subset rather than the complete dataset to shorten computation time.
Quick start
For more details, please install G2P and see the package help in R.<br>
G2P Tutorial
Command line
0 Setting up the R session
Before starting, the user should choose a working directory, preferably a directory devoted exclusively to this tutorial. After starting an R session, change the working directory, load the requisite packages and set standard options:
# Display the current working directory
getwd()
# Set the working directory
workingDir <- "."
setwd(workingDir)
# Load the G2P package
library(G2P)
1 Load example datasets and take a quick look at their format
Load the datasets and check the input of G2P. The GSDataQC function provides two methods for imputing missing values and returns a summary list with information about the input data, for example the count and percentage of missing values and the minor allele frequency (MAF).
# Load the built-in example data
data(GYSS)
# Genotypic data
Markers[1:10,1:10]
# Phenotypic data
phenotype[1:10]
## GSDataQC without imputation ##
QCRes <- GSDataQC(markers = Markers, phenotype = phenotype, impute = FALSE)
## GSDataQC with imputation ##
# Randomly set 100000 of the 242 x 1000 = 242000 marker values to NA
misIndex <- sample(1:242000, 100000)
Markers[misIndex] <- NA
QCResImpute <- GSDataQC(markers = Markers, phenotype = phenotype, impute = TRUE,
                        imputeMethod = "mean")
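The quantities mentioned in this step (missing rate, MAF, mean imputation) can be sketched in base R on a toy matrix. This is only an illustration of the ideas, not the code inside GSDataQC; the 0/1/2 marker coding and all names (`toyMarkers`, `missPercent`, `maf`, `imputed`) are assumptions made for this example.

```r
# Toy marker matrix: 4 samples x 3 SNPs, coded as 0/1/2 copies of one allele
# (this coding is an assumption made for the sketch).
toyMarkers <- matrix(c(0, 1, 2, NA,
                       2, 2, NA, 1,
                       0, 0, 1, 0),
                     nrow = 4,
                     dimnames = list(paste0("S", 1:4), paste0("SNP", 1:3)))

# Percentage of missing calls per SNP.
missPercent <- colMeans(is.na(toyMarkers)) * 100

# Allele frequency per SNP from the non-missing calls, folded to the minor allele.
alleleFreq <- colMeans(toyMarkers, na.rm = TRUE) / 2
maf <- pmin(alleleFreq, 1 - alleleFreq)

# Mean imputation: replace each missing call with its SNP (column) mean.
imputed <- toyMarkers
for (j in seq_len(ncol(imputed))) {
  missingRows <- is.na(imputed[, j])
  imputed[missingRows, j] <- mean(imputed[, j], na.rm = TRUE)
}
```

Mean imputation yields non-integer genotype values, which is fine for regression-type models but worth keeping in mind when a method expects discrete genotypes.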
2 Feature selection
Score each SNP so that high-scoring SNPs can be selected for subsequent modeling, which simplifies the computation and can improve prediction precision. The "method" parameter accepts "Gini", "rrBLUP" and "Accuracy".
# Feature selection with rrBLUP
rrBLUP_selection <- feature_assess(markers = Markers, phenotype = phenotype, method = "rrBLUP")
# This function returns a numeric array containing the score of each SNP position
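Once every SNP has a score, a subset can be selected for modeling. The sketch below uses random numbers in place of real feature_assess() output and assumes that a higher score marks a more informative SNP; the cutoff of 100 SNPs and all variable names are arbitrary choices for this example.

```r
set.seed(1)

# Hypothetical scores: one value per SNP, standing in for feature_assess() output.
snpScores <- runif(1000)
names(snpScores) <- paste0("SNP", seq_along(snpScores))

# Toy marker matrix with 20 samples and the same 1000 SNPs.
toyMarkers <- matrix(sample(0:2, 20 * 1000, replace = TRUE), nrow = 20,
                     dimnames = list(NULL, names(snpScores)))

# Keep the 100 highest-scoring SNPs (the cutoff is arbitrary).
topN <- 100
topIdx <- order(snpScores, decreasing = TRUE)[seq_len(topN)]
selectedMarkers <- toyMarkers[, topIdx, drop = FALSE]

dim(selectedMarkers)
```

The reduced matrix can then be passed to the modeling functions in place of the full marker matrix.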
3 Modeling
For parameter settings and details, please refer to the G2P package help in R or the reference manual (PDF).
# Fit a regression model (modelMethods includes "BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP",
# "LASSO", "SPLS", "bigRR")
rrBLUP_model <- GSReModel(markers = Markers, pheVal = phenotype, modelMethods = "rrBLUP")
# Fit a machine learning model (modelMethods includes "SVR", "SVC", "RFR", "RFC")
# Fit an RFR model
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFR")
# Fit a classification model (RFC)
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFC",
                           posPercentage = 0.4, ntree = 500)
# Fit other models ("BRNN", "RKHS", "RR", "AI", "NR", "EM", "EMMA").
# Set "outputModel = TRUE" to get a list with both the prediction results and the model;
# otherwise only the prediction results are returned.
BRNN_Res <- fit.BRNN(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                     predictMarkerMat = Markers[1:10,], outputModel = TRUE, verbose = FALSE)
RKHS_Res <- fit.RKHS(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                     predictMarkerMat = Markers[1:10,], nIter = 1500, burnIn = 500, outputModel = TRUE)
RR_Res <- fit.RR(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                 predictMarkerMat = Markers[1:10,], outputModel = TRUE)
# Fit mmer models (method includes "AI", "NR", "EM", "EMMA")
mmer_Res <- fit.mmer(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                     predictMarkerMat = Markers[1:10,], method = "NR", effectConcider = "A", outputModel = TRUE)
4 Prediction
The methods "BRNN", "RKHS", "RR", "AI", "NR", "EM" and "EMMA" already returned prediction results in the previous step. The prediction function below applies to the methods "BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP", "LASSO", "SPLS", "bigRR", "SVR", "SVC", "RFR" and "RFC". For parameter settings and details, please refer to the G2P package help in R or the reference manual (PDF).
# testMat is the marker matrix of the new samples to predict
# trainModel is the model fitted in step 3, and modelMethods gives the name of the method used
rrBLUP_Res <- predictGS(testMat = Markers[1:10,], trainModel = rrBLUP_model, modelMethods = "rrBLUP")
This function returns a numeric array containing the prediction result for each predicted sample.
5 G2P
Predict phenotypes from genotypes with multiple methods in a single call.
> G2P(trainMarker = Markers, trainPheno = phenotype, testMarker = Markers[1:10,], testPheno = phenotype[1:10], modelMethods =c("BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP","RFC"), outputModel =FALSE)
realPhenScore BayesA BayesB BayesC BL BRR rrBLUP RFC
DT10 0.7045494 0.6638141 0.6645500 0.6512245 0.6496041 0.6553746 0.6574948 0.820
DT100 0.3886150 0.4270637 0.4152724 0.4201070 0.4275309 0.4197220 0.4077610 0.096
DT101 0.1165665 0.3105247 0.3044984 0.2987718 0.3117646 0.2924395 0.2930300 0.104
DT102 0.3747609 0.4727063 0.4783084 0.4745835 0.4849857 0.4759522 0.4694439 0.162
DT103 0.4136186 0.4770212 0.4841845 0.4843115 0.4933730 0.4942586 0.4833140 0.166
DT104 0.6908267 0.6321259 0.6272183 0.6269430 0.6128531 0.6268135 0.6300071 0.778
DT105 0.5520021 0.5536663 0.5411970 0.5555117 0.5449907 0.5263780 0.5400619 0.144
DT109 0.5161542 0.4841527 0.4866321 0.4982482 0.4744184 0.4835680 0.4826495 0.110
DT110 0.5486291 0.6010441 0.6076803 0.5893248 0.6028943 0.6064955 0.5979550 0.124
DT111 0.7276410 0.5877352 0.5966961 0.6118398 0.6075789 0.6014777 0.6036835 0.786
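As a rough illustration of the global measures, the sketch below compares the realPhenScore and rrBLUP columns of the output above using base R. The numbers are copied from that output; the variable names are chosen for this example. Note that these ten samples were also part of the training data, so the resulting values are optimistic and not a substitute for cross validation.

```r
# Values copied from the G2P() output above (rows DT10 to DT111).
realPhenScore <- c(0.7045494, 0.3886150, 0.1165665, 0.3747609, 0.4136186,
                   0.6908267, 0.5520021, 0.5161542, 0.5486291, 0.7276410)
rrBLUP <- c(0.6574948, 0.4077610, 0.2930300, 0.4694439, 0.4833140,
            0.6300071, 0.5400619, 0.4826495, 0.5979550, 0.6036835)

# Correlation-based global measures.
pearson  <- cor(realPhenScore, rrBLUP, method = "pearson")
spearman <- cor(realPhenScore, rrBLUP, method = "spearman")
kendall  <- cor(realPhenScore, rrBLUP, method = "kendall")

# Mean squared error between observed and predicted values.
mse <- mean((realPhenScore - rrBLUP)^2)

round(c(pearson = pearson, spearman = spearman, kendall = kendall, mse = mse), 4)
```

The RFC column is left out here because it comes from a classification model, so comparing it with the continuous phenotype by MSE would not be meaningful.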
6 Cross validation
Run cross validation (CV) of GS methods. The parameter "cross" sets the number of CV folds, "seed" sets the random seed, and "cpus" sets the number of CPU cores used for parallel computation.
predlist <- G2PCrossValidation(cross = 10, seed = 1, cpus = 3, markers = Markers,
                               pheVal = phenotype, modelMethods = c("rrBLUP", "RFC"),
                               outputModel = FALSE)
# This function returns a list in which each element holds the results of one CV fold.
> predlist$cv1
testPheno rrBLUP RFC
DT187 0.27187247 0.4862278 0.316
DT21 0.08312729 0.4208843 0.254
DT257 0.25201410 0.4822461 0.302
DT72 0.47947204 0.6243357 0.798
DT17 0.50980385 0.4601366 0.154
DT68 0.45270891 0.4154981 0.176
DT78 0.80182295 0.4193034 0.332
DT273 0.61428571 0.5145006 0.344
DT266 0.40762308 0.7258433 0.528
DT116 0.82464237 0.5425566 0.362
DT169
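The idea behind the fold structure can be sketched in base R as follows. This is a generic k-fold split under the assumption of random, near-equal folds; how G2PCrossValidation partitions the samples internally is not shown in this document and may differ. All variable names are made up for the example.

```r
set.seed(1)
nSamples <- 242   # sample count of the GYSS example data
nFolds <- 10

# Randomly assign every sample to one of nFolds folds of near-equal size.
foldId <- sample(rep(seq_len(nFolds), length.out = nSamples))

# For each fold: held-out test indices and the remaining training indices.
folds <- lapply(seq_len(nFolds), function(k) {
  list(test = which(foldId == k), train = which(foldId != k))
})
names(folds) <- paste0("cv", seq_len(nFolds))

# Fold sizes: 242 = 2 * 25 + 8 * 24
table(foldId)
```

Each sample appears in exactly one test set, so every phenotype is predicted once by a model that did not see it during training.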
