
G2P : Genotypes to Phenotypes

The R package "G2P" is an integrated genomic selection (GS) package for predicting phenotypes from genotypes. It includes 20 GS algorithms and 13 evaluation measures, and provides a comprehensive yet easy-to-use platform for GS researchers. G2P also provides an interactive UI, built on shiny, that is easy to operate and learn.

Version and download

Version 1.0: first version, released on February 28, 2017
Version 1.1: second version, released on August 14, 2017

Depends

R (>= 3.3.1)
BGLR(>= 2.10)
pROC(>= 1.8)
PRROC(>= 1.1)
e1071(>= 1.6-7)
glmnet(>= 2.0)
pls(>= 2.5-0)
randomForest(>= 4.6-12)
rrBLUP(>= 4.4)
snowfall(>= 1.84-6.1)
spls(>= 2.2-1)
brnn(>= 0.6)
sommer(>= 2.9)
hglm(>= 2.1-1)
hglm.data(>= 1.0-0)
snow(>= 0.4-1)

Please make sure that all dependency packages are installed before using G2P.
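A quick way to see which of the packages listed above are still missing is to compare them against installed.packages(). The sketch below uses only base R; the vector g2p_deps and the helper missing_packages() are hypothetical names introduced here, not part of G2P.

```r
# Hypothetical list mirroring the "Depends" section above (R itself excluded)
g2p_deps <- c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "pls",
              "randomForest", "rrBLUP", "snowfall", "spls", "brnn",
              "sommer", "hglm", "hglm.data", "snow")

# Return the packages from 'pkgs' that are not installed in any library
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# The Depends section requires R >= 3.3.1
if (getRversion() < "3.3.1") warning("G2P requires R >= 3.3.1")

to_install <- missing_packages(g2p_deps)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

Calling install.packages(to_install, dependencies = TRUE) afterwards would install only the packages that are actually missing. Minimum versions are not checked here; packageVersion() can be used for that.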

Suggests

rgl(>= 0.97.0)
pheatmap(>= 1.0.8)
shiny(>= 1.0.3)
plotly(>= 4.7.0)
shinythemes(>= 1.1.1)
ggplot2(>= 2.2.1.9000)
impute(>= 1.46.0)

These packages are used for results display and for G2P.app. If you want to use the interactive UI and the plotting functions in G2P, please make sure they are installed. In addition, impute is required by the "GSDataQC" function for k-nearest-neighbour (KNN) imputation.

Installation

Install dependency packages

install.packages(c("BGLR", "pROC", "PRROC", "e1071", "glmnet", "spls", "randomForest", "rrBLUP", "snowfall", "pls", "brnn", "sommer", "hglm", "hglm.data", "snow"), dependencies = TRUE)

Install G2P

install.packages("path/G2P_1.1.0.tar.gz", repos = NULL, type = "source")
# Replace "path" with the directory containing the tarball; the path must not contain spaces.

Install suggested packages

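install.packages(c("rgl", "pheatmap", "shiny", "plotly", "shinythemes", "ggplot2"), dependencies = TRUE)
# impute is distributed through Bioconductor rather than CRAN
install.packages("BiocManager")
BiocManager::install("impute")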

Main functions

GS Data summary
Example data
Training model
Performance assessment
Cross validation
Feature selection
Results display
Package help
G2P.app

GS algorithms (20)

Statistics-based methods: BayesA, BayesB, BayesC, BRR, BL, RKHS, RR, rrBLUP, SPLS, LASSO, BRNN, AI, NR, EM, EMMA, bigRR
Machine-learning-based methods: RFC, RFR, SVC, SVR

Evaluation measures (13)

Global measures: Pearson correlation, Kendall rank correlation, Spearman correlation, mean squared error (MSE), R2
Threshold-based measures: normalized discounted cumulative gain (NDCG), meanNDCG, AUC, AUCpr, F1, Kappa, relative efficiency (RE), Accuracy

Example data

The G2P package has a built-in example dataset, GYSS, which is a subset of the grain yield under drought stress (GYSS) dataset containing 242 samples and 1000 SNPs. The GYSS dataset comes from the International Maize and Wheat Improvement Center (CIMMYT). The complete dataset, with 242 samples and 46373 SNPs, can be downloaded separately; the package ships only the subset in order to shorten computation time.

Quick start

For more details, install G2P and see its help pages in R.

G2P Tutorial

Command line

0 Setting up the R session

Before starting, choose a working directory, preferably one devoted exclusively to this tutorial. After starting an R session, change to that working directory and load the G2P package:

# Display the current working directory
getwd()
# Set the working directory (replace "." with the directory of your choice)
workingDir <- "."
setwd(workingDir)
# Load the G2P package (R is case-sensitive: library, not Library)
library(G2P)

1 Load the example dataset and take a quick look at its format

Load the dataset and check the input with GSDataQC. This function provides two methods for imputing missing values and returns a summary list with information about the input data, for example the count and percentage of missing values and the minor allele frequency (MAF).

# Load the built-in example dataset
data(GYSS)
# Genotypic data
Markers[1:10,1:10]
# Phenotypic data
phenotype[1:10]
## GSDataQC, no imputation ##
QCRes <- GSDataQC(markers = Markers, phenotype = phenotype, impute = FALSE)

## GSDataQC, with imputation ##
# Work on a copy so that Markers stays complete for the later steps,
# then randomly set 100000 of its 242000 values (242 samples x 1000 SNPs) to NA
MarkersNA <- Markers
misIndex <- sample(length(MarkersNA), 100000)
MarkersNA[misIndex] <- NA
QCResImpute <- GSDataQC(markers = MarkersNA, phenotype = phenotype, impute = TRUE,
                        imputeMethod = "mean")
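The QC summaries mentioned above (missing-value count and percentage, mean imputation, MAF) can be sketched in base R. This is not GSDataQC's code: the helpers below are hypothetical, and MAF assumes markers are coded 0/1/2 as copies of one allele; if GYSS uses another coding (for example -1/0/1) the frequency formula must be adapted.

```r
# Hypothetical helper: count and percentage of missing marker values
missing_summary <- function(markers) {
  list(count = sum(is.na(markers)),
       percent = 100 * mean(is.na(markers)))
}

# Hypothetical helper: replace each NA with the mean of its SNP column
impute_mean <- function(markers) {
  for (j in seq_len(ncol(markers))) {
    miss <- is.na(markers[, j])
    if (any(miss)) markers[miss, j] <- mean(markers[, j], na.rm = TRUE)
  }
  markers
}

# Hypothetical helper: minor allele frequency per SNP, assuming 0/1/2 coding
minor_allele_freq <- function(markers) {
  p <- colMeans(markers, na.rm = TRUE) / 2  # frequency of the counted allele
  pmin(p, 1 - p)
}

# Toy matrix: 4 samples x 2 SNPs, one missing value in the first SNP
toy <- matrix(c(0, 0, 2, NA,
                2, 2, 2, 2), ncol = 2)
missing_summary(toy)     # count 1, percent 12.5
impute_mean(toy)[4, 1]   # 2/3, the mean of 0, 0 and 2
minor_allele_freq(toy)
```

Mean imputation is fast but ignores correlation between neighbouring SNPs; the KNN option (via the impute package) uses similar samples instead.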

2 Feature selection

Feature selection scores each SNP so that high-scoring SNPs can be kept for subsequent modeling, which simplifies subsequent computation and can improve precision. The parameter "method" can be "Gini", "rrBLUP" or "Accuracy".

# Feature selection with rrBLUP
rrBLUP_selection <- feature_assess(markers = Markers, phenotype = phenotype, method = "rrBLUP")
# This function returns a numeric array giving the score of each SNP
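Once every SNP has a score, selection amounts to keeping the top-k columns of the marker matrix. The helper below is a hypothetical base-R sketch, not a G2P function; it assumes the score vector has one entry per marker column, and the use_abs option is there in case scores can be negative (for example signed marker effects).

```r
# Hypothetical helper: keep the k highest-scoring SNP columns
select_top_snps <- function(markers, scores, k, use_abs = FALSE) {
  stopifnot(length(scores) == ncol(markers))
  s <- if (use_abs) abs(scores) else scores
  k <- min(k, length(s))
  # indices of the k best scores, re-sorted so the original SNP order is kept
  keep <- sort(order(s, decreasing = TRUE)[seq_len(k)])
  markers[, keep, drop = FALSE]
}

# Toy example: 2 samples x 6 SNPs
m  <- matrix(1:12, nrow = 2)
sc <- c(0.1, 0.9, 0.3, -1, 0.5, 0.2)
select_top_snps(m, sc, k = 2)                  # columns 2 and 5
select_top_snps(m, sc, k = 2, use_abs = TRUE)  # columns 2 and 4
```

For example, select_top_snps(Markers, rrBLUP_selection, 200) would keep 200 SNPs, assuming rrBLUP_selection is aligned with the columns of Markers. If selection is done on the full dataset before cross-validation, the CV estimate will tend to be optimistic.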

3 Modeling

For details on parameter settings, please refer to the G2P help pages in R or the PDF reference manual.

# Fit a regression model (modelMethods can be "BayesA", "BayesB", "BayesC", "BL", "BRR",
# "rrBLUP", "LASSO", "SPLS" or "bigRR")
rrBLUP_model <- GSReModel(markers = Markers, pheVal = phenotype, modelMethods = "rrBLUP")
# Fit a machine learning model (modelMethods including "SVR" , "SVC", "RFR", "RFC")
# Fit RFR model
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFR")
# Fit a classification model (RFC)
machine_model <- GSmachine(markers = Markers, pheVal = phenotype, modelMethods = "RFC",
                           posPercentage = 0.4, ntree = 500)
# Fit other models ("BRNN", "RKHS", "RR", "AI", "NR", "EM", "EMMA"). Set outputModel = TRUE
# to get a list containing both the prediction results and the model; otherwise only the
# prediction results are returned.
BRNN_Res <- fit.BRNN(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                     predictMarkerMat = Markers[1:10,], outputModel = TRUE, verbose = FALSE)
RKHS_Res <- fit.RKHS(trainedMarkerMat = Markers, trainedPheVal = phenotype, 
                     predictMarkerMat = Markers[1:10,],nIter = 1500, burnIn = 500, outputModel = TRUE)
RR_Res <- fit.RR(trainedMarkerMat = Markers, trainedPheVal = phenotype,
                 predictMarkerMat = Markers[1:10,], outputModel = TRUE )
# Fit mmer models (method including "AI", "NR", "EM", "EMMA" )
mmer_Res <- fit.mmer(trainedMarkerMat = Markers, trainedPheVal = phenotype, 
                     predictMarkerMat = Markers[1:10,], method = "NR", effectConcider = "A", outputModel = TRUE)
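To show what a marker-effect regression does under the hood, here is plain ridge regression written in base R. This is a swapped-in illustration, not G2P's fit.RR or the rrBLUP package: the penalty lambda is chosen by hand, whereas rrBLUP-type methods estimate the shrinkage from variance components. ridge_fit() and ridge_predict() are hypothetical names.

```r
# Hypothetical sketch: ridge regression on centered markers
# beta = (Xc'Xc + lambda * I)^-1 Xc'(y - mean(y))
ridge_fit <- function(X, y, lambda) {
  x_means <- colMeans(X)
  Xc <- sweep(X, 2, x_means)             # center each marker column
  mu <- mean(y)
  beta <- solve(crossprod(Xc) + diag(lambda, ncol(X)),
                crossprod(Xc, y - mu))
  list(mu = mu, x_means = x_means, beta = drop(beta))
}

# Predict new samples: center with the training means, then apply the effects
ridge_predict <- function(fit, Xnew) {
  drop(sweep(Xnew, 2, fit$x_means) %*% fit$beta) + fit$mu
}

# Toy check on simulated data with known effects
set.seed(1)
X <- matrix(rnorm(150), nrow = 50, ncol = 3)
b <- c(1, -2, 0.5)
y <- drop(X %*% b) + 3
fit_small <- ridge_fit(X, y, lambda = 1e-8)  # almost no shrinkage: beta close to b
fit_large <- ridge_fit(X, y, lambda = 100)   # strong shrinkage toward zero
ridge_predict(fit_small, X[1:5, , drop = FALSE])
```

With more SNPs than samples (1000 versus 242 in GYSS), crossprod(Xc) is singular, and the lambda term is what makes the system solvable; that is the main reason shrinkage methods dominate GS. Markers must contain no NA values.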

4 Prediction

The methods "BRNN", "RKHS", "RR", "AI", "NR", "EM" and "EMMA" already return prediction results in step 3. The prediction function below applies to the methods "BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP", "LASSO", "SPLS", "bigRR", "SVR", "SVC", "RFR" and "RFC". For details on parameter settings, please refer to the G2P help pages in R or the PDF reference manual.

# testMat is a new marker matrix whose phenotypes are to be predicted
# trainModel is a model fitted in step 3, and modelMethods gives the name of its method
rrBLUP_Res <- predictGS(testMat = Markers[1:10,], trainModel = rrBLUP_model, modelMethods = "rrBLUP")
This function returns a numeric array with the predicted value of each sample.
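Threshold-based measures such as Accuracy, F1 and Kappa need binary labels. One common approach, assumed here rather than taken from G2P, is to call the top fraction of individuals "positive" both for the observed and for the predicted values; this mirrors the idea behind the posPercentage argument used for RFC above, but the exact rule G2P applies may differ. Both helpers are hypothetical.

```r
# Hypothetical helper: TRUE for values in the top 'pos_fraction' of x (ties included)
top_fraction_labels <- function(x, pos_fraction) {
  x >= quantile(x, probs = 1 - pos_fraction, names = FALSE)
}

# Hypothetical helper: accuracy, F1 and Cohen's kappa on top-fraction labels
threshold_measures <- function(obs, pred, pos_fraction = 0.4) {
  truth   <- top_fraction_labels(obs, pos_fraction)
  flagged <- top_fraction_labels(pred, pos_fraction)
  tp <- sum(truth & flagged);  tn <- sum(!truth & !flagged)
  fp <- sum(!truth & flagged); fn <- sum(truth & !flagged)
  n <- length(obs)
  acc <- (tp + tn) / n
  f1  <- if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn)
  # Kappa: agreement beyond what the marginal positive/negative rates predict
  p_exp <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  kappa <- (acc - p_exp) / (1 - p_exp)
  c(accuracy = acc, F1 = f1, kappa = kappa)
}

threshold_measures(1:10, 1:10)   # perfect ranking
threshold_measures(1:10, 10:1)   # reversed ranking
```

Applied to this step, threshold_measures(phenotype[1:10], rrBLUP_Res) would score the rrBLUP predictions, assuming both vectors are in the same sample order.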

5 G2P

Run multiple GS methods from genotype to phenotype in a single call.

> G2P(trainMarker = Markers, trainPheno = phenotype, testMarker = Markers[1:10,], testPheno = phenotype[1:10], modelMethods = c("BayesA", "BayesB", "BayesC", "BL", "BRR", "rrBLUP", "RFC"), outputModel = FALSE)
      realPhenScore    BayesA    BayesB    BayesC        BL       BRR    rrBLUP   RFC
DT10      0.7045494 0.6638141 0.6645500 0.6512245 0.6496041 0.6553746 0.6574948 0.820
DT100     0.3886150 0.4270637 0.4152724 0.4201070 0.4275309 0.4197220 0.4077610 0.096
DT101     0.1165665 0.3105247 0.3044984 0.2987718 0.3117646 0.2924395 0.2930300 0.104
DT102     0.3747609 0.4727063 0.4783084 0.4745835 0.4849857 0.4759522 0.4694439 0.162
DT103     0.4136186 0.4770212 0.4841845 0.4843115 0.4933730 0.4942586 0.4833140 0.166
DT104     0.6908267 0.6321259 0.6272183 0.6269430 0.6128531 0.6268135 0.6300071 0.778
DT105     0.5520021 0.5536663 0.5411970 0.5555117 0.5449907 0.5263780 0.5400619 0.144
DT109     0.5161542 0.4841527 0.4866321 0.4982482 0.4744184 0.4835680 0.4826495 0.110
DT110     0.5486291 0.6010441 0.6076803 0.5893248 0.6028943 0.6064955 0.5979550 0.124
DT111     0.7276410 0.5877352 0.5966961 0.6118398 0.6075789 0.6014777 0.6036835 0.786
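Because the observed values sit in the first column and each method has its own column, comparing methods is a column-wise correlation. The sketch below rebuilds the printed table with read.table() so it runs without G2P; with the real output you would use the returned object directly, assuming it is a matrix or data frame laid out as printed. Note that the RFC column appears to hold classification scores rather than phenotype predictions, so its correlation is not directly comparable with the regression methods.

```r
# The printout above, pasted as text; the first column becomes the row names
res <- read.table(header = TRUE, text = "
      realPhenScore    BayesA    BayesB    BayesC        BL       BRR    rrBLUP   RFC
DT10      0.7045494 0.6638141 0.6645500 0.6512245 0.6496041 0.6553746 0.6574948 0.820
DT100     0.3886150 0.4270637 0.4152724 0.4201070 0.4275309 0.4197220 0.4077610 0.096
DT101     0.1165665 0.3105247 0.3044984 0.2987718 0.3117646 0.2924395 0.2930300 0.104
DT102     0.3747609 0.4727063 0.4783084 0.4745835 0.4849857 0.4759522 0.4694439 0.162
DT103     0.4136186 0.4770212 0.4841845 0.4843115 0.4933730 0.4942586 0.4833140 0.166
DT104     0.6908267 0.6321259 0.6272183 0.6269430 0.6128531 0.6268135 0.6300071 0.778
DT105     0.5520021 0.5536663 0.5411970 0.5555117 0.5449907 0.5263780 0.5400619 0.144
DT109     0.5161542 0.4841527 0.4866321 0.4982482 0.4744184 0.4835680 0.4826495 0.110
DT110     0.5486291 0.6010441 0.6076803 0.5893248 0.6028943 0.6064955 0.5979550 0.124
DT111     0.7276410 0.5877352 0.5966961 0.6118398 0.6075789 0.6014777 0.6036835 0.786
")

# Pearson correlation of each method's column with the observed values
pearson_by_method <- sapply(res[-1], cor, y = res$realPhenScore)
# Spearman rank correlation, insensitive to each column's scale
spearman_by_method <- sapply(res[-1], cor, y = res$realPhenScore, method = "spearman")
round(rbind(pearson = pearson_by_method, spearman = spearman_by_method), 3)
```

With only ten test samples these correlations are noisy; cross-validation gives a more reliable comparison.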

6 Cross validation

Run cross-validation (CV) of GS methods. The parameter "cross" sets the number of CV folds, "seed" sets the random seed, and "cpus" sets the number of CPU cores used for parallel computation.

predlist <- G2PCrossValidation(cross = 10, seed = 1 , cpus = 3, markers = Markers,
                               pheVal = phenotype, modelMethods = c("rrBLUP", "RFC"),
                               outputModel = FALSE)
# This function returns a list; each element holds the results of one CV fold.
> predlist$cv1
       testPheno    rrBLUP   RFC
DT187 0.27187247 0.4862278 0.316
DT21  0.08312729 0.4208843 0.254
DT257 0.25201410 0.4822461 0.302
DT72  0.47947204 0.6243357 0.798
DT17  0.50980385 0.4601366 0.154
DT68  0.45270891 0.4154981 0.176
DT78  0.80182295 0.4193034 0.332
DT273 0.61428571 0.5145006 0.344
DT266 0.40762308 0.7258433 0.528
DT116 0.82464237 0.5425566 0.362
DT169
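The k-fold procedure itself can be sketched sequentially in base R. This is not G2PCrossValidation's code: the fold assignment it uses is not documented here, and the parallel execution controlled by "cpus" is omitted. make_folds(), cross_validate() and mean_predictor() are hypothetical; fit_predict can be any function taking (X_train, y_train, X_test) and returning one prediction per test row.

```r
# Hypothetical helper: balanced, shuffled fold labels (resets the global RNG via set.seed)
make_folds <- function(n, k, seed) {
  set.seed(seed)
  sample(rep_len(seq_len(k), n))
}

# Hypothetical helper: one data frame of observed vs predicted values per fold
cross_validate <- function(X, y, k, seed, fit_predict) {
  folds <- make_folds(nrow(X), k, seed)
  res <- lapply(seq_len(k), function(f) {
    test <- folds == f
    pred <- fit_predict(X[!test, , drop = FALSE], y[!test],
                        X[test, , drop = FALSE])
    data.frame(testPheno = y[test], pred = pred,
               row.names = rownames(X)[test])
  })
  setNames(res, paste0("cv", seq_len(k)))   # cv1, cv2, ... as in predlist above
}

# Baseline predictor: every test sample gets the training mean
mean_predictor <- function(X_train, y_train, X_test) {
  rep(mean(y_train), nrow(X_test))
}

X <- matrix(rnorm(20), nrow = 10)
y <- as.numeric(1:10)
cv <- cross_validate(X, y, k = 5, seed = 1, fit_predict = mean_predictor)
cv$cv1
```

Every sample is predicted exactly once by a model that never saw it, which is what makes the per-fold results comparable with the evaluation measures listed earlier.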
