<a name="top"></a>Block Regression Mapping (BRM)

Block Regression Mapping (BRM) is a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. The core functionality is implemented in R. For a detailed description of the method, see the original article "BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing" published in Bioinformatics.

Please cite: Huang L, Tang W, Bu S, et al. BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. Bioinformatics, 2019. https://doi.org/10.1093/bioinformatics/btz861

Contents

Introduction
Getting started
Input data file
- Example
Input configuration files
- Chromosome length file
  - Example
- Block regression mapping configuration file
  - Example
About uα/2
- The uα/2 values in various populations
  - The uα/2 values of some species in various populations
  - A tool for calculating uα/2 in an Fk population
- The uα/2 values in random-mating progeny populations
  - The uα/2 values of yeast and maize in random-mating progeny populations
  - A tool for calculating uα/2 in an Rk population
Output files
- Result 1 file
  - Example
- Result 2 file
  - Example
Q&A

<a name="intro"></a>Introduction

BRM is a BSA-seq method for mapping QTLs or major genes. It can be applied to various populations, including recombinant inbred lines (RIL), doubled haploids (DH), haploids (H), F2, F3, and so on.

BRM identifies candidate QTL (or gene) peaks in three main steps. First, it divides the genome into many small blocks of equal size and calculates, for each block, the average allele frequency (AF) in each pool, the average allele frequency in the population (AFP), and the allele frequency difference (AFD) between the two pools. Second, it determines the AFD threshold at every genomic position that corresponds to a 5% overall significance level. Third, it identifies possible QTL positions (significant AFD peaks) and calculates the 95% confidence interval of each QTL. The BRM scripts write these results to two files: the first contains the results of steps one and two, and the second contains the results of step three.
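To make step one more concrete, the sketch below groups markers into fixed-size blocks and computes, for each block, the PARENT 1 allele frequency in each pool and their difference. It is a simplified illustration under stated assumptions, not the BRM.R implementation: the function name `block_af` is hypothetical, it sums read counts within a block (rather than, say, averaging per-marker frequencies), it keeps a block only when each pool has at least `min_depth` reads (one possible reading of the `MIN` parameter), and it omits AFP, the LOESS smoothing and the thresholds of steps two and three.

```python
from collections import defaultdict


def block_af(markers, block_size, min_depth=10):
    """Aggregate marker counts into fixed-size blocks (hypothetical helper).

    markers: iterable of (chrom, pos, a, b, c, d) tuples, where a/b are the
    PARENT 1 / PARENT 2 allele counts in pool 1 and c/d those in pool 2.
    Returns {(chrom, block_index): (af1, af2, afd)}, where af1/af2 are the
    PARENT 1 allele frequencies in pools 1 and 2 and afd = af1 - af2.
    """
    # Summed counts [a, b, c, d] per (chromosome, block) pair.
    sums = defaultdict(lambda: [0, 0, 0, 0])
    for chrom, pos, a, b, c, d in markers:
        key = (chrom, int(pos // block_size))  # 0-based block index
        for i, count in enumerate((a, b, c, d)):
            sums[key][i] += count

    result = {}
    for key in sorted(sums):
        a, b, c, d = sums[key]
        # Assumption: a block is kept only if each pool has >= min_depth reads.
        if a + b < min_depth or c + d < min_depth:
            continue
        af1 = a / (a + b)
        af2 = c / (c + d)
        result[key] = (af1, af2, af1 - af2)
    return result
```

With the first two markers of the example data (I 33070 and I 33147) and the 200 bp blocks of the yeast configuration (BLK = 0.2, UNIT = 1000), both markers fall into block 165 (positions 33000 to 33199), giving an AF of 0.5 in pool 1, 0.4 in pool 2, and an AFD of about 0.1.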

<a name="getstart"></a>Getting started

Download all scripts and examples from GitHub, then try them with the example data:

```
git clone https://github.com/huanglikun/BRM.git
cd BRM
# Usage:
# Rscript BRM.R <Block regression mapping configuration file> <Chromosome length file> <Input data in bsa format>
# Design A
Rscript BRM.R configureExample/designA/BRM_conf.txt configureExample/chr_length.tsv dataExample/designA/yeast_markers_dp10.bsa
# Design B (a high selected pool and a random pool)
Rscript BRM.R configureExample/designBH/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBH/yeast_markers_dp10.bsa
# Design B (a low selected pool and a random pool)
Rscript BRM.R configureExample/designBL/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBL/yeast_markers_dp10.bsa
```

<a name="inputdata"></a>Input data file

The input data file is a tab-separated values file in bsa format (e.g. yeast_markers_dp10.bsa). It contains six columns:

| Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Chromosome code | Marker position (bp) | a | b | c | d |

Note: a, b, c and d are the counts of the marker allele from PARENT 1 in pool 1 (the high pool in Design A or the selected high/low pool in Design B), from PARENT 2 in pool 1, from PARENT 1 in pool 2 (the low pool in Design A or the random pool in Design B), and from PARENT 2 in pool 2, respectively.

Pools illustration

<a name="inputdataexample"></a>Example

dataExample/designBL/yeast_markers_dp10.bsa

```
I	33070	6	6	6	7
I	33147	7	7	2	5
I	33152	4	5	2	5
I	33200	3	9	4	3
I	33293	7	7	7	2
......
```
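A bsa file takes only a few lines of code to read. The sketch below is a hypothetical helper (`read_bsa` is not part of BRM): it checks that every non-empty line has six tab-separated columns, converts the position and the four counts to integers, and adds the PARENT 1 allele frequency in each pool, a / (a + b) and c / (c + d), leaving it as `None` when a pool has no reads at a marker.

```python
def read_bsa(lines):
    """Parse lines of a .bsa file into a list of marker dicts (hypothetical helper).

    Each non-empty line must have six tab-separated columns:
    chromosome code, position (bp), a, b, c, d.
    """
    markers = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()  # also removes trailing spaces and the newline
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(fields)}")
        pos, a, b, c, d = (int(x) for x in fields[1:])
        markers.append({
            "chrom": fields[0],
            "pos": pos,
            "counts": (a, b, c, d),
            # PARENT 1 allele frequency in each pool; None when the pool has no reads.
            "af1": a / (a + b) if a + b else None,
            "af2": c / (c + d) if c + d else None,
        })
    return markers
```

Because `read_bsa` accepts any iterable of lines, an open file object for dataExample/designBL/yeast_markers_dp10.bsa can be passed directly; the first marker (I, 33070) then has af1 = 0.5.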

<a name="inputconf"></a>Input configuration files

<a name="inputconfchrlen"></a>Chromosome length file

It's a tab-separated values file with two columns:

| Column 1 | Column 2 |
| --- | --- |
| Chromosome code | Chromosome length (bp) |
- <a name="inputconfchrlenexample"></a>Example
 
 configureExample/chr_length.tsv
```
 I 230218
 II 813184
 III 316620
 ......
```

Note: The chromosome codes in the data file must match those in the chromosome length file.
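Since mismatched chromosome codes are an easy mistake to make, it can help to check the two files against each other before running BRM. The sketch below uses hypothetical helpers (`read_chr_lengths` and `check_chromosomes` are not part of BRM): it reads the chromosome length file into a dictionary and reports markers whose chromosome code is missing from that file or whose position falls outside the chromosome, assuming 1-based marker positions.

```python
def read_chr_lengths(lines):
    """Map chromosome code -> length (bp) from a two-column file (hypothetical helper)."""
    lengths = {}
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if fields:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_chromosomes(markers, lengths):
    """Return a list of problems for (chrom, pos) pairs not covered by lengths.

    Assumes 1-based positions, so a valid position lies in 1..length.
    """
    problems = []
    for chrom, pos in markers:
        if chrom not in lengths:
            problems.append(f"{chrom}:{pos} unknown chromosome code")
        elif not 1 <= pos <= lengths[chrom]:
            problems.append(f"{chrom}:{pos} outside 1..{lengths[chrom]}")
    return problems
```

An empty list means every marker maps onto a chromosome listed in the length file; a code such as "chrI" in the data file against "I" in configureExample/chr_length.tsv would be reported as unknown.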

<a name="inputconfblk"></a>Block regression mapping configuration file

It is a key-value file containing the parameters required for block regression mapping (see the table below). Keys and values are separated by "=", and spaces around the separator are ignored.

| Key | Value type | Description | Recommended value |
| --- | --- | --- | --- |
| Design | A, BH, BL | A: Design A; BH: Design B with a selected high pool; BL: Design B with a selected low pool | Default: A |
| n1 | Integer | Number of individuals in pool 1 (high pool in Design A, selected pool in Design B) | Depends on the experimental design |
| n2 | Integer | Number of individuals in pool 2 (low pool in Design A, random pool in Design B) | Depends on the experimental design |
| t | 0, 1 | Population type | t = 0 for DH, RI or H; t = 1 for F2, F3, etc. |
| ua | Float | uα/2 value | See the table below, or calculate it with cal_ua_fk.R or cal_ua_rk.R in the "tools" directory (see the explanation below) |
| UNIT | Integer | Unit block size (bp) | Default: 1000 |
| DEG | Integer | Degree of the polynomials used in LOESS (local polynomial regression fitting) | 2 |
| BLK | Integer or float | Block size = BLK * UNIT | e.g. yeast: BLK = 0.2; rice: BLK = 20 |
| MIN | Integer | Minimum total depth in a valid block | 10 |
| MINVALID | Integer | Minimum number of valid blocks in a chromosome | 10 |
| Result1_File | File path | Output path of result 1 (optional) | Default: result/result1.xls |
| Result2_File | File path | Output path of result 2 (optional) | Default: result/result2.xls |
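The notation uα/2 conventionally denotes the upper α/2 quantile of the standard normal distribution, i.e. the value exceeded with probability α/2. Assuming that convention, the sketch below converts between a two-sided per-test significance level α and uα/2 using Python's `statistics.NormalDist`. The function names are hypothetical, and the sketch does not reproduce how cal_ua_fk.R or cal_ua_rk.R derive α for a given genome and population, so use those tools (or the tabulated values) to obtain the ua setting itself.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def ua_from_alpha(alpha):
    """Upper alpha/2 quantile of N(0, 1) for a two-sided level alpha (hypothetical helper)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def alpha_from_ua(ua):
    """Two-sided per-test level implied by a given u_{alpha/2} (hypothetical helper)."""
    return 2 * (1 - STD_NORMAL.cdf(ua))
```

For instance, `ua_from_alpha(0.05)` gives about 1.96, while the value ua = 4.08 used in the yeast example configuration corresponds to a two-sided per-test α of roughly 4.5 × 10⁻⁵, which reflects the much stricter per-position level needed for a 5% overall significance level.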
- <a name="inputconfblkexample"></a>Example
 
 configureExample/designBL/BRM_conf.txt
```
 # version 0.3

 # Experiment
 Design = BL # available: A, BH, BL. BH: Design B with HIGH selected pool. BL: Design B with LOW selected pool.
 n1 = 300 # pool 1 size. The high pool for design A or the selected pool for design B.
 n2 = 300 # pool 2 size. The low pool for design A or the random pool for design B.
 t = 0 # For DH or RI etc., t=0; F2 or F3 etc., t=1
 ua = 4.08 # For rice, F2:3.65; F3:3.74; F4:3.78.

 # Block regression
 UNIT = 1000 # block unit(bp)
 DEG = 2 # The degree of the polynomials to be used in Local Polynomial Regression Fitting.
 BLK = 0.2 # block size = BLK * UNIT
 MIN = 10 # min total depth in block
 MINVALID = 10 # min valid blocks in one chromosome (needed to be at least 10)

 # output file determination (optional)
 # Result1_File = result/result1.xls	# including allele frequency and threshold
 # Result2_File = result/result2.xls	# including peak information and confidence interval
 Result1_File = result_random_L/result1.xls	# including allele frequency and threshold
 Result2_File = result_random_L/result2.xls	# including peak information and confidence interval
```
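The key-value format is straightforward to read programmatically. The following sketch is a hypothetical parser (`parse_brm_conf` and `block_size_bp` are not part of BRM, and BRM.R may handle edge cases differently): it drops everything after a "#" as a comment, splits each remaining line at the first "=", strips surrounding whitespace, and derives the block size as BLK * UNIT, falling back to the documented default UNIT = 1000.

```python
def parse_brm_conf(lines):
    """Parse BRM-style 'key = value' lines into a dict of strings (hypothetical helper)."""
    conf = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {line!r}")
        conf[key.strip()] = value.strip()
    return conf


def block_size_bp(conf):
    """Block size in bp: BLK * UNIT (UNIT defaults to 1000)."""
    return float(conf["BLK"]) * int(conf.get("UNIT", 1000))
```

Applied to configureExample/designBL/BRM_conf.txt, this sketch would give `conf["Design"] == "BL"` and a block size of 200.0 bp (BLK = 0.2, UNIT = 1000); the commented-out default Result1_File and Result2_File lines are ignored.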

<a name="aboutua"></a>About uα/2

<a name="uafk"></a>The uα/2 values in various populations

<a name="uafktable"></a>The uα/2 values of some species in various populations
<table style="text-align: center"><thead><tr><th rowspan=2 style="text-align: center">Species</th><th rowspan=2 style="text-align: center">n<sup>a</sup></th><th colspan=2 style="text-align: center">Genome size<sup>b</sup></th><th rowspan=2 style="text-align: center">Ratio (kb/cM)</th><th colspan=4 style="text-align: center"><i>u</i><sub>α/2</sub><sup>c</sup></th><th rowspan=2 style="text-align: center">Ref.</th></tr> <tr><th style="text-align: center">cM</th><th style="text-align: center">Mb</th><th style="text-align: center">H/DH/F2</th><th style="text-align: center">F3</th><th style="text-align: center">F4</th><th style="text-align: center">RIL</th></tr></thead> <tbody><tr><td>Arabidopsis</td><td>5</td><td>600</td><td>119</td><td>199</td><td>3.41</td><td>3.50</td><td>3.54</td><td>3.57</td><td>[1]</td></tr> <tr><td>Cucumber</td><td>7</td><td>1390</td><td>192</td><td>138</td><td>3.62</td><td>3.72</td><td>3.75</td><td>3.78</td><td>[2]</td></tr> <tr><td>Maize</td><td>10</td><td>2060</td><td>2106</td><td>1023</
