G2P

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

For a wider range of simulation functions, see SIMER, our newly developed package for simulation in life science and breeding

Authors:

You Tang and Xiaolei Liu

Contact:

[Xiaolei Liu](mailto:xiaoleiliu@mail.hzau.edu.cn)

Contents

Installation
- Environment Setup
- Windows
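- Mac
- Linux
Data Preparation
- ped
- map
- pop
- qtn
Genotype Simulation
- Single Population _ GUI
- Single Population _ Pipeline
- Multi Populations _ GUI
- Multi Populations _ Pipeline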
Phenotype Simulation
- Phenotype _ GUI
- Phenotype _ Pipeline
Population Structure
- Population structure _ GUI
- Population structure _ Pipeline
Quality Control
- Quality control _ GUI
- Quality control _ Pipeline
GWAS
- GWAS _ GUI
- GWAS _ Pipeline
Method Evaluation
- Method Evaluation _ GUI
- Method Evaluation _ Pipeline
FAQ and Hints

Installation

back to top

Environment Setup

back to top
JDK 1.8 must be installed and the Java environment variables configured before using G2P (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
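
One way to confirm the prerequisite is to inspect the banner printed by `java -version`. The sketch below is a hypothetical helper, not part of G2P. It assumes the usual banner format (for example `java version "1.8.0_281"`) and that `java` is on the PATH when `installed_java_major` is called.

```python
import re
import subprocess


def parse_java_major(banner: str) -> int:
    """Extract the major Java version from a `java -version` banner."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_281" means Java 8; newer scheme: "11.0.2" means Java 11.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> int:
    """Run `java -version` (the banner goes to stderr) and parse it."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)
```

A return value of 8 from `installed_java_major()` corresponds to JDK 1.8. The call raises `FileNotFoundError` if `java` is not found, which usually points to a missing installation or PATH entry.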

Windows

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_win_64 and double-click the .jar file
Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_win_64

Mac

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_mac and double-click the .jar file
Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_mac
Set file permissions:

$ chmod 777 gemma oldplink plink

Linux

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_linux_x86_64 and run

$ java -jar gG2P.jar

Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_linux_x86_64
Set file permissions:

$ chmod 777 gemma oldplink plink

Data Preparation

All input files should share the same file name prefix (for example, AG.ped, AG.map, and AG.pop).

ped

For details, see http://zzz.bwh.harvard.edu/plink/data.shtml#ped
back to top

| Family ID | Individual ID | Father ID | Mother ID | Sex | Trait | marker 1 | marker 2 | marker 3 | marker 4 | marker 5 | marker 6 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 33-16 | 0 | 0 | 0 | 2 | 0 0 | A A | A A | A G | A G | A G |
| 1 | 38-11 | 0 | 0 | 0 | 2 | 0 0 | A G | A G | A A | A G | A G |
| 1 | 4226 | 0 | 0 | 0 | 2 | 0 0 | A G | A A | A A | A G | A G |
| 1 | 4722 | 0 | 0 | 0 | 2 | 0 0 | A G | A G | A A | A G | A G |
| 1 | A188 | 0 | 0 | 0 | 2 | 0 0 | A A | A A | A A | A G | A G |
| 1 | A214N | 0 | 0 | 0 | 2 | 0 0 | A G | A A | A G | A A | A G |
| 1 | A239 | 0 | 0 | 0 | 2 | 0 0 | A A | A A | A G | A G | A A |

| Family ID | Individual ID | Father ID | Mother ID | Sex | Trait | marker 1 | marker 2 | marker 3 | marker 4 | marker 5 | marker 6 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 33-16 | 0 | 0 | 0 | 2 | 0 0 | 1 1 | 1 1 | 1 3 | 1 3 | 1 3 |
| 1 | 38-11 | 0 | 0 | 0 | 2 | 0 0 | 1 3 | 1 3 | 1 1 | 1 3 | 1 3 |
| 1 | 4226 | 0 | 0 | 0 | 2 | 0 0 | 1 3 | 1 1 | 1 1 | 1 3 | 1 3 |
| 1 | 4722 | 0 | 0 | 0 | 2 | 0 0 | 1 3 | 1 3 | 1 1 | 1 3 | 1 3 |
| 1 | A188 | 0 | 0 | 0 | 2 | 0 0 | 1 1 | 1 1 | 1 1 | 1 3 | 1 3 |
| 1 | A214N | 0 | 0 | 0 | 2 | 0 0 | 1 3 | 1 1 | 1 3 | 1 1 | 1 3 |
| 1 | A239 | 0 | 0 | 0 | 2 | 0 0 | 1 1 | 1 1 | 1 3 | 1 3 | 1 1 |
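
To make the column layout concrete, here is a minimal sketch that splits one ped line into its six leading columns and one allele pair per marker. It assumes the whitespace-delimited PLINK layout shown in the tables above. `PedRecord` and `parse_ped_line` are illustrative names, not part of G2P.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    trait: str
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> PedRecord:
    """Split a ped line into the six leading columns and the allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns followed by allele pairs")
    alleles = fields[6:]
    # Consecutive alleles belong to the same marker.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


# First row of the letter-coded example table.
record = parse_ped_line("1 33-16 0 0 0 2 0 0 A A A A A G A G A G")
```

The same function handles the numeric coding, since alleles are kept as strings.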

map

For details, see http://zzz.bwh.harvard.edu/plink/data.shtml#map
back to top

| Chromosome ID | Marker ID | Genetic Distance | Physical Distance |
| :---: | :---: | :---: | :---: |
| 1 | PZB00859.1 | 0 | 157104 |
| 1 | PZA01271.1 | 0 | 1947984 |
| 1 | PZA03613.2 | 0 | 2914066 |
| 1 | PZA03613.1 | 0 | 2914171 |
| 1 | PZA03614.2 | 0 | 2915078 |
| 1 | PZA03614.1 | 0 | 2915242 |
| 1 | PZA00258.3 | 0 | 2973508 |
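
Because each map row describes one marker and each marker occupies two allele columns in the ped file, the two files can be cross-checked before running a simulation. The following sketch assumes whitespace-delimited files in the four-column layout above. The function names are hypothetical and not part of G2P.

```python
from typing import List, Tuple

# chromosome, marker ID, genetic distance, physical position
MapRow = Tuple[str, str, float, int]


def parse_map(text: str) -> List[MapRow]:
    """Parse the contents of a four-column map file."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, marker, genetic, physical = line.split()
        rows.append((chrom, marker, float(genetic), int(physical)))
    return rows


def ped_matches_map(ped_line: str, map_rows: List[MapRow]) -> bool:
    """True when the ped line carries exactly two alleles per map marker."""
    allele_count = len(ped_line.split()) - 6  # six leading non-genotype columns
    return allele_count == 2 * len(map_rows)


map_text = """\
1 PZB00859.1 0 157104
1 PZA01271.1 0 1947984
1 PZA03613.2 0 2914066
"""
markers = parse_map(map_text)
```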

pop

back to top
New samples are generated from the samples within each sub-population.

| Sample ID | sub-Population ID |
| :---: | :---: |
| 33-16 | 1 |
| 38_11 | 1 |
| 4226 | 1 |
| 4722 | 2 |
| A188 | 2 |
| A214N | 2 |
| A239 | 2 |
| A272 | 2 |
| A441-5 | 2 |
| A554 | 3 |
| A556 | 3 |
| A6 | 3 |
| A619 | 3 |
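
As an illustration of how the pop file partitions the samples, this sketch groups sample IDs by sub-population ID. It assumes a two-column, whitespace-delimited file without a header row. `group_by_subpopulation` is a hypothetical helper, not part of G2P.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_subpopulation(pop_text: str) -> Dict[str, List[str]]:
    """Map each sub-population ID to the sample IDs assigned to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in pop_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sample_id, subpop_id = line.split()
        groups[subpop_id].append(sample_id)
    return dict(groups)


pop_text = """\
33-16 1
38_11 1
4226 1
4722 2
A188 2
A554 3
"""
groups = group_by_subpopulation(pop_text)
```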

qtn

back to top
Each column lists the simulated QTNs for one phenotype.

| Phenotype 1 | Phenotype 2 | Phenotype 3 | Phenotype 4 | Phenotype 5 |
| :---: | :---: | :---: | :---: | :---: |
| 66 | 67 | 80 | 83 | 90 |
| 9 | 15 | 52 | 59 | 135 |
| 90 | 96 | 143 | 147 | 174 |
| 3 | 3 | 15 | 58 | 89 |
| 89 | 118 | 185 | 203 | 212 |
| 69 | 72 | 72 | 84 | 110 |
| 46 | 59 | 125 | 204 | 207 |
| 14 | 15 | 19 | 29 | 39 |
| 9 | 23 | 65 | 111 | 131 |
| 19 | 52 | 74 | 179 | 194 |
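
Since the file is organized by column, reading it usually means transposing rows into one list per phenotype. The sketch below assumes a whitespace-delimited file of integers without a header row; the values are treated as opaque integers. `qtn_by_phenotype` is a hypothetical helper, not part of G2P.

```python
from typing import List


def qtn_by_phenotype(qtn_text: str) -> List[List[int]]:
    """Return one list of QTN values per phenotype (one per column)."""
    rows = [
        [int(value) for value in line.split()]
        for line in qtn_text.splitlines()
        if line.strip()
    ]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of columns")
    # zip(*rows) turns the row-major table into columns.
    return [list(column) for column in zip(*rows)]


qtn_text = """\
66 67 80 83 90
9 15 52 59 135
90 96 143 147 174
"""
columns = qtn_by_phenotype(qtn_text)
```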

Genotype Simulation

Single Population _ GUI

back to top

Ped: ped file
Map: map file
Path for output Ped/Map: output path for the simulated ped and map files
Block: Yes or No. If "Yes", the whole genome is divided into blocks, and the blocks are shuffled to generate new samples
Number of SNPs in each block: block size, given as a count of SNPs
Mutation rate: the frequency of new mutations
Imputation: if TRUE, the major allele is used to impute missing values
Population size: simulated sample size
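
The Block option can be pictured with a small sketch. G2P's exact shuffling procedure is not described here, so this is only one plausible reading: the markers are cut into consecutive blocks of a fixed size, and each block of a new sample is copied intact from a randomly chosen observed sample. All names below are hypothetical.

```python
import random
from typing import List, Sequence

Genotype = str  # e.g. "AG"; one entry per marker


def simulate_by_blocks(
    samples: Sequence[Sequence[Genotype]],
    block_size: int,
    n_new: int,
    seed: int = 0,
) -> List[List[Genotype]]:
    """Build new samples by recombining whole marker blocks of observed samples."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n_markers = len(samples[0])
    simulated = []
    for _ in range(n_new):
        new_sample: List[Genotype] = []
        for start in range(0, n_markers, block_size):
            donor = rng.choice(samples)  # assumption: one donor per block
            new_sample.extend(donor[start:start + block_size])
        simulated.append(new_sample)
    return simulated


observed = [
    ["AA", "AA", "AG", "AG", "AG"],
    ["AG", "AG", "AA", "AG", "AG"],
    ["AG", "AA", "AA", "AG", "AG"],
]
new_samples = simulate_by_blocks(observed, block_size=4, n_new=100)
```

Under this reading, copying whole blocks keeps the marker combinations inside each block as observed, while the mix of donors across blocks produces genomes that differ from any single input sample.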

Single Population _ Pipeline

back to top

Windows

java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --block 4 --impute
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --block 4 --mutation 0.0001 --impute
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --block 4
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --impute
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --mutation 0.0001

Linux/Mac

java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --outgen /root/data/output --rn 100 --block 4 --impute
java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --outgen /root/data/output --rn 100 --mutation 0.0001

jar: the executable program (kG2P.jar)
ped: ped file
map: map file
outgen: output path
block: number of SNPs in each block
rn: simulated sample size
impute: if this flag is added, the major allele is used to impute missing values
mutation: the frequency of new mutations
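
When many runs with different settings are needed, the command line can be assembled in a script. The sketch below uses only the single-population options listed above. `build_g2p_command` is a hypothetical wrapper, not something shipped with G2P, and it assumes kG2P.jar is in the working directory.

```python
from typing import List, Optional


def build_g2p_command(
    ped: str,
    map_file: str,
    outgen: str,
    rn: int,
    block: Optional[int] = None,
    mutation: Optional[str] = None,  # passed through verbatim, e.g. "0.0001"
    impute: bool = False,
    jar: str = "kG2P.jar",
) -> List[str]:
    """Assemble a single-population kG2P invocation as an argument list."""
    cmd = [
        "java", "-jar", jar,
        "--ped", ped,
        "--map", map_file,
        "--outgen", outgen,
        "--rn", str(rn),
    ]
    if block is not None:
        cmd += ["--block", str(block)]
    if mutation is not None:
        cmd += ["--mutation", mutation]
    if impute:
        cmd.append("--impute")  # flag without a value
    return cmd


cmd = build_g2p_command(
    "/root/data/AG.ped", "/root/data/AG.map", "/root/data/output",
    rn=100, block=4, impute=True,
)
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` to launch the run. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.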

Multi Populations _ GUI

back to top

Ped: ped file
Map: map file
Pop: pop file
Path for output Ped/Map: output path for the simulated ped and map files
Block: Yes or No. If "Yes", the whole genome is divided into blocks, and the blocks are shuffled to generate new samples
Number of SNPs in each block: block size, given as a count of SNPs
Mutation rate: the frequency of new mutations
Migration rate: the proportion of immigrants (or emigrants) in each group
Genetic drift: the change in the frequency of an existing gene variant (allele) in a population due to random sampling of organisms
Imputation: if TRUE, the major allele is used to impute missing values
Sample size of each population: sample size of each newly simulated population
Population size: simulated sample size, given as a single number or a vector
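
The Imputation option can be illustrated for a single marker. This is a sketch under stated assumptions rather than G2P's implementation: missing alleles use the PLINK code `0`, the major allele is the most frequent non-missing allele at that marker, and a missing call is replaced by the homozygous major-allele genotype. On a tie, the allele encountered first wins.

```python
from collections import Counter
from typing import List, Tuple

Call = Tuple[str, str]  # the two alleles of one sample at one marker
MISSING = "0"  # PLINK's default missing-allele code


def impute_major_allele(calls: List[Call]) -> List[Call]:
    """Fill missing calls at one marker with the major allele."""
    counts = Counter(
        allele for call in calls for allele in call if allele != MISSING
    )
    if not counts:
        return list(calls)  # no observed alleles, so nothing to impute from
    major = counts.most_common(1)[0][0]
    return [(major, major) if MISSING in call else call for call in calls]


marker = [("A", "A"), ("A", "G"), ("0", "0"), ("A", "G"), ("A", "A")]
imputed = impute_major_allele(marker)
```

A marker with no observed alleles at all, such as marker 1 in the ped example, is returned unchanged by this sketch.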

Multi Populations _ Pipeline

back to top

Windows

java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --block 4 --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --block 4 --rn 100 --mutation 0.0001 --mig 0.1 --genetic 0.001

Linux/Mac

java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --pop /root/data/AG.pop --outgen /root/data/output --impute --block 4 --rn 100
java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --pop /root/data/AG.pop --outgen /root/data/output --rn 100 --mutation 0.0001 --mig 0.1 --genetic 0.001
java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --pop /root/data/AG.pop --outgen /root/data/output --rn 100 --genetic 0.001
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.
