BRM
Block Regression Mapping (BRM) is a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing
<a name="top"></a>Block Regression Mapping (BRM)
Block Regression Mapping (BRM) is a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. The core functions are written in R. For a detailed description of the method, see the original article, "BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing", published in Bioinformatics.
Please cite: Huang L, Tang W, Bu S, et al. BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. Bioinformatics, 2019. https://doi.org/10.1093/bioinformatics/btz861
Content
- Introduction
- Getting started
- Input data file
- Input configuration files
- About <i>u</i><sub>α/2</sub>
- Output files
- Q&A
<a name="intro"></a>Introduction
BRM is a BSA-seq method for mapping QTLs or major genes. It can be applied to different population types, including recombinant inbred lines (RIL), doubled haploid (DH), haploid (H), F<sub>2</sub>, F<sub>3</sub>, and so on.
BRM identifies candidate QTL (or gene) peaks in three main steps:

1. Divide the genome into many small blocks of equal size and, for each block, calculate the average allele frequency (AF) in each pool, the average allele frequency in the population (AFP), and the allele frequency difference (AFD) between the two pools.
2. Determine the AFD threshold for the 5% overall significance level at every genomic position.
3. Identify possible QTL positions (significant AFD peaks) and calculate the 95% confidence interval of each QTL.

The BRM scripts write these results to two files: the first contains the results of steps one and two, and the second contains the results of step three.
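As an illustration of step one, the following Python sketch pools marker counts into fixed-size blocks and derives AF, AFP and AFD for each block. It is not the BRM R implementation. The function name `block_summary`, the block indexing, the pooling of counts within a block, and in particular the choice to compute AFP as the mean of the two pool AFs are assumptions made for this example; BRM itself may weight markers or define AFP differently (for example in Design B).

```python
from collections import defaultdict


def block_summary(rows, block_size):
    """Pool marker counts into fixed-size blocks and compute AF, AFP and AFD.

    rows: iterable of (chrom, pos, a, b, c, d), where a/b are the PARENT 1 /
          PARENT 2 allele counts in pool 1 and c/d the same counts in pool 2.
    block_size: block length in bp.
    """
    sums = defaultdict(lambda: [0, 0, 0, 0])
    for chrom, pos, a, b, c, d in rows:
        # Assumption: positions are 1-based, so block 0 covers 1..block_size.
        key = (chrom, (pos - 1) // block_size)
        for i, count in enumerate((a, b, c, d)):
            sums[key][i] += count

    summary = {}
    for key, (a, b, c, d) in sums.items():
        if a + b == 0 or c + d == 0:
            continue  # no coverage in one pool: AF is undefined for this block
        af1 = a / (a + b)  # PARENT 1 allele frequency in pool 1
        af2 = c / (c + d)  # PARENT 1 allele frequency in pool 2
        summary[key] = {
            "AF1": af1,
            "AF2": af2,
            "AFP": (af1 + af2) / 2,  # assumption: plain mean of the two pools
            "AFD": af1 - af2,
        }
    return summary


rows = [
    ("I", 33070, 6, 6, 6, 7),
    ("I", 33147, 7, 7, 2, 5),
    ("I", 33293, 7, 7, 7, 2),
]
summary = block_summary(rows, block_size=200)
for block, stats in sorted(summary.items()):
    print(block, stats)
```

With a 200 bp block size, the first two markers fall into the same block and the third into the next one, so the sketch reports two blocks.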
<a name="getstart"></a>Getting started
Clone the repository from GitHub to get all scripts and examples, then try BRM on the example data:
    git clone https://github.com/huanglikun/BRM.git
    cd BRM
    # Usage:
    # Rscript BRM.R <Block regression mapping configuration file> <Chromosome length file> <Input data in bsa format>
    # Design A
    Rscript BRM.R configureExample/designA/BRM_conf.txt configureExample/chr_length.tsv dataExample/designA/yeast_markers_dp10.bsa
    # Design B (the experiment which has a high selected pool and a random pool)
    Rscript BRM.R configureExample/designBH/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBH/yeast_markers_dp10.bsa
    # Design B (the experiment which has a low selected pool and a random pool)
    Rscript BRM.R configureExample/designBL/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBL/yeast_markers_dp10.bsa
<a name="inputdata"></a>Input data file
The input data file is a tab-separated values file in bsa format. It contains six columns:

| Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Chromosome code | Marker position (bp) | a | b | c | d |
Note: a, b, c and d are the counts of the marker allele of:

- a: PARENT 1 in pool 1 (high pool in Design A, or selected high/low pool in Design B)
- b: PARENT 2 in pool 1
- c: PARENT 1 in pool 2 (low pool in Design A, or random pool in Design B)
- d: PARENT 2 in pool 2

- <a name="inputdataexample"></a>Example
dataExample/designBL/yeast_markers_dp10.bsa
    I	33070	6	6	6	7
    I	33147	7	7	2	5
    I	33152	4	5	2	5
    I	33200	3	9	4	3
    I	33293	7	7	7	2
    ......
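For readers who want to inspect a bsa file outside of BRM, here is a minimal Python sketch of a reader for the six-column layout described above. The function name `read_bsa` is hypothetical, and the sketch assumes the file has no header line (the example shows none) and that all five numeric columns are integers.

```python
import csv
import io


def read_bsa(handle):
    """Yield (chrom, pos, a, b, c, d) tuples from a tab-separated bsa stream."""
    for lineno, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 columns, got {len(fields)}")
        chrom = fields[0]
        pos, a, b, c, d = (int(value) for value in fields[1:])
        yield chrom, pos, a, b, c, d


# Two records copied from the example above; a real file would be opened with
# open(path, newline="") instead of io.StringIO.
sample = "I\t33070\t6\t6\t6\t7\nI\t33147\t7\t7\t2\t5\n"
records = list(read_bsa(io.StringIO(sample)))
print(records)
```

Raising on a wrong column count is a design choice for this sketch: it surfaces files that were saved with spaces instead of tabs early, rather than silently skipping markers.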
<a name="inputconf"></a>Input configuration files
- <a name="inputconfchrlen"></a>Chromosome length file
It's a tab-separated values file with two columns:
| Column 1 | Column 2 |
| --- | --- |
| Chromosome code | Chromosome length (bp) |
- <a name="inputconfchrlenexample"></a>Example
configureExample/chr_length.tsv
    I	230218
    II	813184
    III	316620
    ......
- Note: The chromosome codes in the data file must match the chromosome codes in the chromosome length file.
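A quick consistency check along these lines can be run before calling BRM. The Python sketch below is only an illustration: `check_markers` is a hypothetical helper, and the range check assumes 1-based marker positions that must not exceed the listed chromosome length.

```python
def check_markers(markers, chr_lengths):
    """Return (unknown_codes, out_of_range) for an iterable of (chrom, pos).

    unknown_codes: sorted chromosome codes missing from chr_lengths.
    out_of_range: markers whose position lies outside 1..chromosome length.
    """
    markers = list(markers)
    unknown = sorted({chrom for chrom, _ in markers if chrom not in chr_lengths})
    out_of_range = [
        (chrom, pos)
        for chrom, pos in markers
        if chrom in chr_lengths and not 1 <= pos <= chr_lengths[chrom]
    ]
    return unknown, out_of_range


# Lengths copied from the chr_length.tsv example; the markers are made up
# to trigger both kinds of problem.
chr_lengths = {"I": 230218, "II": 813184, "III": 316620}
markers = [("I", 33070), ("II", 900000), ("chrIV", 10)]
unknown, out_of_range = check_markers(markers, chr_lengths)
print("unknown chromosome codes:", unknown)
print("positions out of range:", out_of_range)
```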
- <a name="inputconfblk"></a>Block regression mapping configuration file
It's a key-value file containing the parameters required for block regression mapping (see the following table). The separator is "=". Spaces between the key and the value are ignored.
| Key | Value type | Description | Recommended value |
| --- | --- | --- | --- |
| Design | A, BH, BL | A: Design A; <br>BH: Design B with selected high pool; <br>BL: Design B with selected low pool | Default: A |
| n1 | Integer | Number of individuals in pool 1 (high pool in Design A, or selected pool in Design B) | Depends on the experimental design |
| n2 | Integer | Number of individuals in pool 2 (low pool in Design A, or random pool in Design B) | Depends on the experimental design |
| t | 0, 1 | Population type | t = 0 for DH, RIL or H; t = 1 for F<sub>2</sub>, F<sub>3</sub>, etc. |
| ua | Float | <i>u</i><sub>α/2</sub> value | See the table below, or calculate it with cal_ua_fk.R or cal_ua_rk.R in the "tools" directory (see explanation below). |
| UNIT | Integer | Unit block size (bp) | Default: 1000 |
| DEG | Integer | The degree of the polynomials used in LOESS (local polynomial regression fitting) | 2 |
| BLK | Integer or float | Block size = BLK * UNIT | e.g.: yeast, BLK = 0.2; rice, BLK = 20 |
| MIN | Integer | Minimum total depth in a valid block | 10 |
| MINVALID | Integer | Minimum number of valid blocks in a chromosome | 10 |
| Result1_File | File path | Output path of result 1 (optional) | Default: result/result1.xls |
| Result2_File | File path | Output path of result 2 (optional) | Default: result/result2.xls |
- <a name="inputconfblkexample"></a>Example
configureExample/designBL/BRM_conf.txt
    # version 0.3
    # Experiment
    Design = BL # available: A, BH, BL. BH: Design B with HIGH selected pool. BL: Design B with LOW selected pool.
    n1 = 300 # pool 1 size. The high pool for design A or the selected pool for design B.
    n2 = 300 # pool 2 size. The low pool for design A or the random pool for design B.
    t = 0 # For DH or RI etc., t=0; F2 or F3 etc., t=1
    ua = 4.08 # For rice, F2:3.65; F3:3.74; F4:3.78.
    # Block regression
    UNIT = 1000 # block unit(bp)
    DEG = 2 # The degree of the polynomials to be used in Local Polynomial Regression Fitting.
    BLK = 0.2 # block size = BLK * UNIT
    MIN = 10 # min total depth in block
    MINVALID = 10 # min valid blocks in one chromosome (needed to be at least 10)
    # output file determination (optional)
    # Result1_File = result/result1.xls # including allele frequency and threshold
    # Result2_File = result/result2.xls # including peak information and confidence interval
    Result1_File = result_random_L/result1.xls # including allele frequency and threshold
    Result2_File = result_random_L/result2.xls # including peak information and confidence interval
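To make the key-value layout and the block size rule concrete, the following Python sketch parses a few lines in this format and computes the block size as BLK * UNIT. It is a simplified illustration, not the parser used by BRM.R: `parse_conf` is a hypothetical helper, and it assumes that "#" always starts a comment, so it would not handle a file path that contains "#".

```python
def parse_conf(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    conf = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue  # blank line or comment-only line
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: missing '=' separator")
        conf[key.strip()] = value.strip()  # spaces around '=' are ignored
    return conf


# A shortened excerpt of the configuration example above.
example = """\
# Experiment
Design = BL   # available: A, BH, BL
n1     = 300  # pool 1 size
UNIT   = 1000 # block unit(bp)
BLK    = 0.2  # block size = BLK * UNIT
# Result1_File = result/result1.xls
"""

conf = parse_conf(example)
block_size = float(conf["BLK"]) * int(conf["UNIT"])
print(conf)
print("block size (bp):", block_size)
```

With the yeast settings shown (BLK = 0.2, UNIT = 1000), the block size is 200 bp; the rice value of BLK = 20 from the table would give 20,000 bp blocks.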
<a name="aboutua"></a>About <i>u</i><sub>α/2</sub>
<a name="uafk"></a>The <i>u</i><sub>α/2</sub> values in various populations
- <a name="uafktable"></a>The <i>u</i><sub>α/2</sub> values of some species in various populations
<table style="text-align: center"><thead><tr><th rowspan=2 style="text-align: center">Species</th><th rowspan=2 style="text-align: center">n<sup>a</sup></th><th colspan=2 style="text-align: center">Genome size<sup>b</sup></th><th rowspan=2 style="text-align: center">Ratio</br>(kb/cM)</th><th colspan=4 style="text-align: center"><i>u</i><sub>α/2</sub><sup>c</sup></th><th rowspan=2 style="text-align: center">Ref.</th></tr> <tr><th style="text-align: center">cM</th><th style="text-align: center">Mb</th><th style="text-align: center">H/DH/<i>F</i><sub>2</sub></th><th style="text-align: center"><i>F</i><sub>3</sub></th><th style="text-align: center"><i>F</i><sub>4</sub></th><th style="text-align: center">RIL</th></tr></thead> <tbody><tr><td><i>Arabidopsis</i></td><td>5</td><td>600</td><td>119</td><td>199</td><td>3.41</td><td>3.50</td><td>3.54</td><td>3.57</td><td>[1]</td></tr> <tr><td>Cucumber</td><td>7</td><td>1390</td><td>192</td><td>138</td><td>3.62</td><td>3.72</td><td>3.75</td><td>3.78</td><td>[2]</td></tr> <tr><td>Maize</td><td>10</td><td>2060</td><td>2106</td><td>1023</
