BS-Seeker2

BS-Seeker2 is a seamless and versatile pipeline for accurate and fast mapping of bisulfite-treated reads.

CGmapTools is recommended for downstream analysis of BS-Seeker2 output.

Remarkable new features
Supports
System requirements
Module descriptions
- 4.1 FilterReads.py
- 4.2 bs_seeker2-build.py
- 4.3 bs_seeker2-align.py
- 4.4 bs_seeker2-call_methylation.py
Contact Information
Questions and Answers
- (1) Performance
- (2) Input/Output formats
- (3) "Pysam" package related problem
- (4) Configuration of BS-Seeker2
- (5) Unique alignment
- (6) Paired-end sequencing alignment
- (7) Adapter related issue
- (8) Others

1 Remarkable new features

Reduced index for RRBS, which accelerates mapping and increases mappability
Local/gapped alignment with Bowtie2, which increases mappability
Option for removing reads that suffer from bisulfite conversion failure

2 Supports

Supported library types
- Whole Genome Bisulfite Sequencing (WGBS)
- Reduced Representation Bisulfite Sequencing (RRBS)
Supported formats for input file
- fasta
- fastq
- qseq
- pure sequence (one-line one-sequence)
Supported alignment tools
- bowtie : Single-seed, fast (default)
- bowtie2 : Multiple-seed, gapped-alignment
  - local alignment (default for bowtie2)
  - end-to-end alignment
- soap
Supported formats for mapping results
- BAM
- SAM
- BS-seeker

3 System requirements

- Linux/Unix or Mac OS platform
- One of the following short read aligners: bowtie, bowtie2, soap
- Python (Version 2.6 +). It is normally pre-installed on Linux. Type "python -V" to see the installed version.
- pysam package (Version 0.6.x). Read "Questions & Answers" if you have problems installing this package.
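As a convenience, the aligner requirement can be checked before running anything. The following is a minimal sketch, not part of BS-Seeker2: the helper name `find_aligners` is hypothetical, the executable names are assumed to match the aligner names listed above, and the script itself needs Python 3.3 or later for `shutil.which` (independent of the interpreter used to run BS-Seeker2).

```python
import shutil


def find_aligners(names=("bowtie", "bowtie2", "soap")):
    """Map each aligner name to its path on $PATH, or None if not found.

    Assumption: the executables are named exactly like the aligners.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_aligners()
    for name, path in found.items():
        print("%-8s %s" % (name, path if path else "not found on PATH"))
    if not any(found.values()):
        print("No supported aligner detected; install one or pass its path with -p.")
```

An aligner that is installed but not on `$PATH` is reported as missing here; in that case its location can still be given to the BS-Seeker2 scripts with `-p`.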

4 Module descriptions

4.1 FilterReads.py

This module is optional and independent of the rest of the pipeline. Some reads are heavily over-amplified during PCR; this script collapses such duplicates so that only unique reads remain. You can decide whether or not to filter reads before mapping.

Usage :


    $ python FilterReads.py
    Usage: FilterReads.py -i <input> -o <output> [-k]
    Author : Guo, Weilong; guoweilong@gmail.com
    Start from: 2012-11-10; Last Update: 2017-12-08
    Description: Unique reads for qseq/fastq/fasta/sequence.
       Low quality reads in qseq file can be filtered.
    Warning: This function is reserved for WGBS, but not for RRBS.
    For WGBS, user can also try 'samtools rmdup' to get unique reads using BAM files.
    For RRBS, it is suggested not to get unique reads, as the starting ends of reads
    are more likely to be same for the reads from one C-CCG~~~C-CGG fragment.

    Options:
    -h, --help  show this help message and exit
    -i FILE     Name of the input qseq/fastq/fasta/sequence file
    -o FILE     Name of the output file
    -k          Would not filter low quality reads if specified, only applied
              for qseq format

Tip :

This step is not recommended for RRBS libraries, because reads from an RRBS library are more likely to start at the same location.
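To make concrete what "getting unique reads" means, here is a minimal Python 3 sketch for the simplest input format (one sequence per line). It is not the code of FilterReads.py: the function name `unique_reads` is hypothetical, and it assumes that two reads are duplicates when their sequences are identical and that the first occurrence is the one kept.

```python
def unique_reads(lines):
    """Yield each distinct read sequence once, in order of first appearance.

    Assumption: input is the "pure sequence" format (one read per line).
    """
    seen = set()
    for line in lines:
        seq = line.strip().upper()
        if not seq or seq in seen:
            continue  # skip blank lines and sequences already emitted
        seen.add(seq)
        yield seq


if __name__ == "__main__":
    reads = ["ACGTTG", "TTGACA", "ACGTTG", "", "ttgaca", "GGGTAC"]
    for seq in unique_reads(reads):
        print(seq)
```

In an RRBS library, many genuinely independent reads begin at the same restriction site, so a filter like this one would discard real signal along with PCR duplicates.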

4.2 bs_seeker2-build.py

Module to build the index for BS-Seeker2.

Usage :


    $ python bs_seeker2-build.py -h
    
    Usage: bs_seeker2-build.py [options]

    Options:
      -h, --help            show this help message and exit
      -f FILE, --file=FILE  Input your reference genome file (fasta)
      --aligner=ALIGNER     Aligner program to perform the analysis: bowtie,
                            bowtie2, soap, rmap [Default: bowtie]
      -p PATH, --path=PATH  Path to the aligner program. Detected:
                            bowtie: /Install/bowtie-1.1.2/
                            bowtie2: /Install/bowtie2-master/
                            rmap: None
                            soap: None
      -d DBPATH, --db=DBPATH
                            Path to the reference genome library (generated in
                            preprocessing genome) [Default: /Install/BSseeker2/bs_utils/reference_genomes]
      -v, --version         show version of BS-Seeker2

      Reduced Representation Bisulfite Sequencing Options:
        Use this options with conjuction of -r [--rrbs]

        -r, --rrbs          Build index specially for Reduced Representation
                            Bisulfite Sequencing experiments. Genome other than
                            certain fragments will be masked. [Default: False]
        -l LOW_BOUND, --low=LOW_BOUND
                            lower bound of fragment length (excluding recognition
                            sequence such as C-CGG) [Default: 20]
        -u UP_BOUND, --up=UP_BOUND
                            upper bound of fragment length (excluding recognition
                            sequence such as C-CGG ends) [Default: 500]
        -c CUT_FORMAT, --cut-site=CUT_FORMAT
                            Cut sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
                            [Default: C-CGG]

Example


    # Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
    python bs_seeker2-build.py -f genome.fa --aligner=bowtie

    # Build genome index for RRBS with default parameters specifying the path for bowtie2
    python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r

    # Build genome index for RRBS library using bowtie2, with fragment lengths in the range [40bp, 400bp]
    python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2

    # Build genome index for RRBS library for double-enzyme :
    # MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
    python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie
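When the build step is driven from a script, the same commands can be assembled programmatically. The sketch below only uses options shown in the help text above (`-f`, `--aligner`, `-p`, `-r`, `-l`, `-u`, `-c`); the helper name `build_index_command` is hypothetical, and the script path `bs_seeker2-build.py` is assumed to be in the current directory.

```python
def build_index_command(genome_fasta, aligner="bowtie", aligner_path=None,
                        rrbs=False, low=None, up=None, cut_sites=None):
    """Assemble the argument list for bs_seeker2-build.py (hypothetical helper)."""
    cmd = ["python", "bs_seeker2-build.py", "-f", genome_fasta,
           "--aligner=" + aligner]
    if aligner_path is not None:
        cmd += ["-p", aligner_path]
    if rrbs:
        # The RRBS-specific options are only meaningful together with -r.
        cmd.append("-r")
        if low is not None:
            cmd += ["-l", str(low)]
        if up is not None:
            cmd += ["-u", str(up)]
        if cut_sites:
            cmd += ["-c", ",".join(cut_sites)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually build the index.
    cmd = build_index_command("genome.fa", aligner="bowtie2",
                              rrbs=True, low=40, up=400)
    print(" ".join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems when the genome path contains spaces.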

Tips:

The index built for BS-Seeker2 is different from the index for BS-Seeker 1. For RRBS, you need to specify "-r" in the parameters. You also need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according to your protocol.

The fragment length is different from the read length. Fragments refer to the DNA fragments that you obtain from the size-selection step (i.e. gel cut or AMPure beads). Fragment lengths are expected to fall within a range, such as [50bp,250bp].

The indexes for RRBS and WGBS are different. Also, indexes for RRBS are specific to the fragment length parameters (LOW_BOUND and UP_BOUND).
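The role of LOW_BOUND and UP_BOUND can be illustrated with a small in-silico digest. This is a sketch under stated assumptions, not BS-Seeker2's implementation: the function name `rrbs_fragments` is hypothetical, only a single enzyme site without IUPAC codes is handled, and "fragment length excluding the recognition sequence" is interpreted as the number of bases strictly between two neighbouring recognition sites.

```python
import re


def rrbs_fragments(genome, low_bound=20, up_bound=500, site="CCGG"):
    """Return (start, end, length) for regions between neighbouring sites
    whose length lies within [low_bound, up_bound].

    Assumption: length counts only the bases between two recognition sites.
    """
    # A lookahead match reports every occurrence of the site, even adjacent ones.
    pattern = "(?=%s)" % re.escape(site)
    starts = [m.start() for m in re.finditer(pattern, genome.upper())]
    kept = []
    for left, right in zip(starts, starts[1:]):
        inner_start = left + len(site)
        inner_len = right - inner_start
        if low_bound <= inner_len <= up_bound:
            kept.append((inner_start, right, inner_len))
    return kept


if __name__ == "__main__":
    genome = "AA" + "CCGG" + "A" * 30 + "CCGG" + "T" * 5 + "CCGG" + "AA"
    print(rrbs_fragments(genome))                            # [(6, 36, 30)]
    print(rrbs_fragments(genome, low_bound=5, up_bound=10))  # [(40, 45, 5)]
```

Changing the bounds changes which regions survive, which is why an RRBS index built for one fragment length range cannot be reused for another.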

4.3 bs_seeker2-align.py

Module to map reads to the 3-letter converted genome.
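For readers unfamiliar with the term, a "3-letter" genome is a reference in which one base has been converted in silico so that bisulfite-converted reads can be matched by a standard aligner. The sketch below shows a common scheme (C-to-T conversion of each strand); it is an illustration only, the function names are hypothetical, and the exact set of converted references that BS-Seeker2 builds may differ.

```python
# Translation table for complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_to_t(seq):
    """Replace every C with T, leaving a three-letter (A, G, T) sequence."""
    return seq.upper().replace("C", "T")


def three_letter_references(genome):
    """Convert both strands; the strand labels are assumptions for illustration."""
    watson = genome.upper()
    crick = reverse_complement(watson)
    return {"watson_C2T": c_to_t(watson), "crick_C2T": c_to_t(crick)}


if __name__ == "__main__":
    for name, seq in three_letter_references("ACGTCCGA").items():
        print(name, seq)
```

Reads are converted the same way before alignment, and the original sequences are consulted afterwards to decide which cytosines were protected from conversion.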

Usage :

    $ bs_seeker2-align.py -h
    Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options]

    Options:
      -h, --help            show this help message and exit

      For single end reads:
        -i INFILE, --input=INFILE
                            Input read file (FORMAT: sequences, qseq, fasta,
                            fastq). Ex: read.fa or read.fa.gz

      For pair end reads:
        -1 FILE, --input_1=FILE
                            Input read file, mate 1 (FORMAT: sequences, qseq,
                            fasta, fastq)
        -2 FILE, --input_2=FILE
                            Input read file, mate 2 (FORMAT: sequences, qseq,
                            fasta, fastq)
        -I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE
                            The minimum insert size for valid paired-end
                            alignments [Default: 0]
        -X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE
                            The maximum insert size for valid paired-end
                            alignments [Default: 500]

      Reduced Representation Bisulfite Sequencing Options:
        -r, --rrbs          Map reads to the Reduced Representation genome
        -c pattern, --cut-site=pattern
                            Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
                            [Default: C-CGG]
        -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
                            Lower bound of fragment length (excluding C-CGG ends)
