<a name="top"></a>Block Regression Mapping (BRM)

Block Regression Mapping (BRM) is a statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. The core functions are written in R. For a detailed description of the method, see the original article "BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing", published in Bioinformatics.

Please cite: Huang L, Tang W, Bu S, et al. BRM: A statistical method for QTL mapping based on bulked segregant analysis by deep sequencing. Bioinformatics, 2019. https://doi.org/10.1093/bioinformatics/btz861


<a name="intro"></a>Introduction

BRM is a BSA-seq method for mapping QTLs or major genes. It can be applied to different populations, including recombinant inbred lines (RIL), doubled haploid (DH), haploid (H), F<sub>2</sub>, F<sub>3</sub>, and so on.

BRM identifies candidate QTL (or gene) peaks in three main steps. The first step divides the genome into many small blocks of equal size and calculates, for each block, the average allele frequency (AF) in each pool, the average allele frequency in the population (AFP), and the allele frequency difference (AFD) between the two pools. The second step determines the AFD threshold for the 5% overall significance level at every genomic position. The third step identifies possible QTL positions (significant AFD peaks) and calculates the 95% confidence interval of each QTL. The BRM scripts write these results to two files: the first contains the results of steps one and two, and the second contains the results of step three.
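
As a rough illustration of step one, the R sketch below bins a few markers into fixed-size blocks and derives AF per pool and AFD per block. It is not taken from BRM.R. It assumes that AF is the share of PARENT 1 allele counts after summing counts within a block, and that AFD is the pool 1 AF minus the pool 2 AF. The column names, the `block` index and the binning rule are illustrative choices; the counts a, b, c and d follow the bsa layout described in the input data section. AFP is left out because its estimator is not specified in this README.

```r
# Toy marker table in the six-column bsa layout (first rows of the yeast example).
markers <- data.frame(
  chr = "I",
  pos = c(33070, 33147, 33152, 33200, 33293),
  a = c(6, 7, 4, 3, 7),   # PARENT 1 allele count, pool 1
  b = c(6, 7, 5, 9, 7),   # PARENT 2 allele count, pool 1
  c = c(6, 2, 2, 4, 7),   # PARENT 1 allele count, pool 2
  d = c(7, 5, 5, 3, 2)    # PARENT 2 allele count, pool 2
)

block_size <- 0.2 * 1000  # BLK * UNIT from the yeast example configuration (bp)

# Assumed binning rule: positions 1..200 fall in block 0, 201..400 in block 1, ...
markers$block <- floor((markers$pos - 1) / block_size)

# Sum the four allele counts within each chromosome/block combination.
blocks <- aggregate(
  markers[, c("a", "b", "c", "d")],
  by = list(chr = markers$chr, block = markers$block),
  FUN = sum
)

# Assumed definitions: AF = PARENT 1 share in a pool, AFD = AF(pool 1) - AF(pool 2).
blocks$af1 <- blocks$a / (blocks$a + blocks$b)
blocks$af2 <- blocks$c / (blocks$c + blocks$d)
blocks$afd <- blocks$af1 - blocks$af2

print(blocks)
```

Summing counts before dividing weights each marker by its depth; averaging per-marker frequencies would be an equally plausible reading of "average allele frequency", so treat the choice here as an assumption.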


<a name="getstart"></a>Getting started

Clone the repository from GitHub to get all scripts and examples, then try BRM on the example data:

    git clone https://github.com/huanglikun/BRM.git
    cd BRM
    # Usage:
    # Rscript BRM.R <Block regression mapping configuration file> <Chromosome length file> <Input data in bsa format>
    # Design A
    Rscript BRM.R configureExample/designA/BRM_conf.txt configureExample/chr_length.tsv dataExample/designA/yeast_markers_dp10.bsa
    # Design B (the experiment which has a high selected pool and a random pool)
    Rscript BRM.R configureExample/designBH/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBH/yeast_markers_dp10.bsa
    # Design B (the experiment which has a low selected pool and a random pool)
    Rscript BRM.R configureExample/designBL/BRM_conf.txt configureExample/chr_length.tsv dataExample/designBL/yeast_markers_dp10.bsa


<a name="inputdata"></a>Input data file

The input data file is a tab-separated values file in bsa format. It contains six columns:

| Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Chromosome code | Marker position (bp) | a | b | c | d |

Note: a, b, c and d are the marker allele counts of PARENT 1 in pool 1, PARENT 2 in pool 1, PARENT 1 in pool 2 and PARENT 2 in pool 2, respectively. Pool 1 is the high pool in Design A or the selected (high or low) pool in Design B; pool 2 is the low pool in Design A or the random pool in Design B.

Pools illustration

  • <a name="inputdataexample"></a>Example

    dataExample/designBL/yeast_markers_dp10.bsa

    I	33070	6	6	6	7
    I	33147	7	7	2	5
    I	33152	4	5	2	5
    I	33200	3	9	4	3
    I	33293	7	7	7	2  
    ......
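
For inspecting a bsa file before running BRM, a minimal R sketch along these lines may help. The column names (`chr`, `pos`, `a` to `d`) and the derived `depth1`/`depth2` columns are illustrative assumptions, since the file itself has no header; the temporary file merely stands in for a real path such as `dataExample/designBL/yeast_markers_dp10.bsa`.

```r
# Write three rows of the example data to a temporary file so the sketch is self-contained.
bsa_path <- tempfile(fileext = ".bsa")
writeLines(c(
  "I\t33070\t6\t6\t6\t7",
  "I\t33147\t7\t7\t2\t5",
  "I\t33152\t4\t5\t2\t5"
), bsa_path)

# Read the headerless, tab-separated file. Forcing the chromosome column to
# character keeps codes such as "I", "T" or "F" from being converted.
bsa <- read.table(
  bsa_path,
  sep = "\t",
  header = FALSE,
  col.names = c("chr", "pos", "a", "b", "c", "d"),
  colClasses = c("character", rep("integer", 5)),
  quote = "",
  comment.char = ""
)

# Per-marker sequencing depth in each pool.
bsa$depth1 <- bsa$a + bsa$b
bsa$depth2 <- bsa$c + bsa$d

# Basic sanity checks: counts must be non-negative and positions positive.
stopifnot(all(bsa[, c("a", "b", "c", "d")] >= 0), all(bsa$pos > 0))

print(bsa)
```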
    


<a name="inputconf"></a>Input configuration files

  • <a name="inputconfchrlen"></a>Chromosome length file

    It's a tab-separated values file with two columns:

    | Column 1 | Column 2 |
    | --- | --- |
    | Chromosome code | Chromosome length (bp) |

    • <a name="inputconfchrlenexample"></a>Example

      configureExample/chr_length.tsv

        I    230218
        II    813184
        III    316620
        ......
      

Note: The chromosome codes in the data file must match the chromosome codes in the chromosome length file.
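
A quick way to check this before a run is sketched below in R. The vectors are toy values, and `bsa_chr`, `chr_len` and `missing_chr` are hypothetical names; in practice the codes would come from the first column of each file.

```r
# Chromosome codes found in a data file (toy values; "IV" is deliberately unmatched).
bsa_chr <- c("I", "I", "II", "IV")

# Contents of a chromosome length file (first rows of the yeast example).
chr_len <- data.frame(
  chr = c("I", "II", "III"),
  len = c(230218, 813184, 316620),
  stringsAsFactors = FALSE
)

# Codes used by markers but absent from the chromosome length file.
missing_chr <- setdiff(unique(bsa_chr), chr_len$chr)

if (length(missing_chr) > 0) {
  message("Chromosome codes missing from the length file: ",
          paste(missing_chr, collapse = ", "))
}
```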


  • <a name="inputconfblk"></a>Block regression mapping configuration file

    It's a key-value file containing the parameters required for block regression mapping (see the following table). The separator is "=". Spaces between the key and the value are ignored.

    | Key | Value type | Description | Recommended value |
    | --- | --- | --- | --- |
    | Design | A, BH, BL | A: Design A; <br>BH: Design B with selected high pool; <br>BL: Design B with selected low pool | Default: A |
    | n1 | Integer | Number of individuals in pool1 (high pool) | Depending on experiment design |
    | n2 | Integer | Number of individuals in pool2 (low pool) | Depending on experiment design |
    | t | 0, 1 | Population type | t = 0 for DH, RI or H; t = 1 for F<sub>2</sub>, F<sub>3</sub>, etc. |
    | ua | Float | <i>u</i><sub>α/2</sub> value | See the table below, or can be calculated by cal_ua_fk.R or cal_ua_rk.R in the "tools" directory (see explanation below). |
    | UNIT | Integer | Unit block size (bp) | Default: 1000 |
    | DEG | Integer | The degree of the polynomials used in LOESS (local polynomial regression fitting) | 2 |
    | BLK | Integer or float | Block size = BLK * UNIT | e.g.: yeast, BLK = 0.2 ; rice, BLK = 20 |
    | MIN | Integer | Minimum total depth in a valid block | 10 |
    | MINVALID | Integer | Minimum valid block number in a chromosome | 10 |
    | Result1_File | File path | To define the result 1 output path (optional) | Default: result/result1.xls |
    | Result2_File | File path | To define the result 2 output path (optional) | Default: result/result2.xls |

    • <a name="inputconfblkexample"></a>Example

      configureExample/designBL/BRM_conf.txt

        # version 0.3
      
        # Experiment
        Design   = BL       # available: A, BH, BL. BH: Design B with HIGH selected pool. BL: Design B with LOW selected pool.
        n1       = 300      # pool 1 size. The high pool for design A or the selected pool for design B.
        n2       = 300      # pool 2 size. The low pool for design A or the random pool for design B.
        t        = 0        # For DH or RI etc., t=0; F2 or F3 etc., t=1
        ua       = 4.08     # For rice, F2:3.65; F3:3.74; F4:3.78.
      
        # Block regression
        UNIT     = 1000     # block unit(bp)
        DEG      = 2        # The degree of the polynomials to be used in Local Polynomial Regression Fitting.
        BLK      = 0.2      # block size = BLK * UNIT
        MIN      = 10       # min total depth in block
        MINVALID = 10       # min valid blocks in one chromosome (needed to be at least 10)
      
        # output file determination (optional)
        # Result1_File = result/result1.xls	# including allele frequency and threshold
        # Result2_File = result/result2.xls	# including peak information and confidence interval
        Result1_File = result_random_L/result1.xls	# including allele frequency and threshold
        Result2_File = result_random_L/result2.xls	# including peak information and confidence interval
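
To show how a file in this format can be read, here is a small R sketch of a key-value parser. It is an illustration under assumptions, not BRM.R's own parser: `parse_conf` is a hypothetical helper, and treating everything after `#` as a comment is inferred from the example above.

```r
# Hypothetical helper: parse "key = value  # comment" lines into a named list.
parse_conf <- function(lines) {
  lines <- sub("#.*$", "", lines)        # strip comments (assumed to start at '#')
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop blank and comment-only lines
  key   <- trimws(sub("=.*$", "", lines))
  value <- trimws(sub("^[^=]*=", "", lines))
  setNames(as.list(value), key)          # all values are kept as character strings
}

conf_lines <- c(
  "# version 0.3",
  "Design   = BL       # available: A, BH, BL.",
  "n1       = 300      # pool 1 size.",
  "UNIT     = 1000     # block unit(bp)",
  "BLK      = 0.2      # block size = BLK * UNIT",
  "# Result1_File = result/result1.xls",
  "Result1_File = result_random_L/result1.xls\t# including allele frequency and threshold"
)

conf <- parse_conf(conf_lines)

# Values need explicit conversion before use.
block_size <- as.numeric(conf$BLK) * as.numeric(conf$UNIT)
print(conf$Design)
print(block_size)
```

Keeping values as strings and converting at the point of use makes it easy to spot a malformed entry, because `as.numeric` warns and returns `NA` for non-numeric text.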
      


<a name="aboutua"></a>About <i>u</i><sub>α/2</sub>

<a name="uafk"></a>The <i>u</i><sub>α/2</sub> values in various populations

  • <a name="uafktable"></a>The <i>u</i><sub>α/2</sub> values of some species in various populations

    <table style="text-align: center"><thead><tr><th rowspan=2 style="text-align: center">Species</th><th rowspan=2 style="text-align: center">n<sup>a</sup></th><th colspan=2 style="text-align: center">Genome size<sup>b</sup></th><th rowspan=2 style="text-align: center">Ratio</br>(kb/cM)</th><th colspan=4 style="text-align: center"><i>u</i><sub>α/2</sub><sup>c</sup></th><th rowspan=2 style="text-align: center">Ref.</th></tr> <tr><th style="text-align: center">cM</th><th style="text-align: center">Mb</th><th style="text-align: center">H/DH/<i>F</i><sub>2</sub></th><th style="text-align: center"><i>F</i><sub>3</sub></th><th style="text-align: center"><i>F</i><sub>4</sub></th><th style="text-align: center">RIL</th></tr></thead> <tbody><tr><td><i>Arabidopsis</i></td><td>5</td><td>600</td><td>119</td><td>199</td><td>3.41</td><td>3.50</td><td>3.54</td><td>3.57</td><td>[1]</td></tr> <tr><td>Cucumber</td><td>7</td><td>1390</td><td>192</td><td>138</td><td>3.62</td><td>3.72</td><td>3.75</td><td>3.78</td><td>[2]</td></tr> <tr><td>Maize</td><td>10</td><td>2060</td><td>2106</td><td>1023</