BSseeker2
A versatile aligning pipeline for bisulfite sequencing data
BS-Seeker2
BS-Seeker2 is a seamless and versatile pipeline for accurate and fast mapping of bisulfite-treated reads.
CGmapTools is suggested for downstream analysis of BS-Seeker2 results.
Homepage | Mirror | Published Paper | Source code | Galaxy Toolshed | UCLA Galaxy | CGmapTools
1 Remarkable new features
- Reduced index for RRBS, accelerating the mapping speed and increasing mappability
- Local/gapped alignment with Bowtie2, increasing mappability
- Option for removing reads suffering from bisulfite conversion failure
2 Supports
- Supported library types
  - Whole Genome-wide Bisulfite Sequencing (WGBS)
  - Reduced Representation Bisulfite Sequencing (RRBS)
- Supported formats for input file
- Supported alignment tools
  - bowtie : Single-seed, fast (default)
  - bowtie2 : Multiple-seed, gapped-alignment
    - local alignment (default for bowtie2)
    - end-to-end alignment
  - soap
- Supported formats for mapping results
3 System requirements
- Linux/Unix or Mac OS platform
- One of the following short read aligners
- Python (Version 2.6 +)
  It is normally pre-installed in Linux. Type "python -V" to see the installed version.
- pysam package (Version 0.6.x).
  Read "Questions & Answers" if you have problems installing this package.
4 Module descriptions
4.1 FilterReads.py
Optional, independent module. Some reads are heavily amplified during PCR, and this script keeps only the unique reads before mapping. Whether to filter reads before mapping is up to you.
- Usage :
$ python FilterReads.py
Usage: FilterReads.py -i <input> -o <output> [-k]
Author : Guo, Weilong; guoweilong@gmail.com
Start from: 2012-11-10; Last Update: 2017-12-08
Description: Unique reads for qseq/fastq/fasta/sequence.
Low quality reads in qseq file can be filtered.
Warning: This function is reserved for WGBS, but not for RRBS.
For WGBS, user can also try 'samtools rmdup' to get unique reads using BAM files.
For RRBS, it is suggested not to get unique reads, as the starting ends of reads
are more likely to be same for the reads from one C-CCG~~~C-CGG fragment.
Options:
-h, --help show this help message and exit
-i FILE Name of the input qseq/fastq/fasta/sequence file
-o FILE Name of the output file
-k Would not filter low quality reads if specified, only applied
for qseq format
- Tip :
This step is not suggested for RRBS libraries, as reads from an RRBS library are more likely to come from the same location.
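The idea of keeping only unique reads can be illustrated with a minimal Python sketch. This is not the code of FilterReads.py: the function name `unique_reads` and the rule "keep the first record for each distinct sequence" are assumptions made for illustration, and only the 4-line FASTQ layout is handled.

```python
def unique_reads(fastq_lines):
    """Yield 4-line FASTQ records whose sequence has not been seen before.

    Hypothetical helper for illustration; not part of BS-Seeker2.
    """
    seen = set()
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if seq not in seen:
            seen.add(seq)
            yield header, seq, plus, qual


records = [
    "@read1", "ACGTTGCA", "+", "IIIIIIII",
    "@read2", "ACGTTGCA", "+", "IIIIIIII",   # same sequence as read1
    "@read3", "TTGACCGA", "+", "IIIIIIII",
]
kept = list(unique_reads(records))
print([record[0] for record in kept])   # ['@read1', '@read3']
```

In an RRBS library many genuine reads start at the same restriction site, so a filter like this would discard real signal, which is why the tip above advises against it.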
4.2 bs_seeker2-build.py
Module to build the index for BS-Seeker2.
- Usage :
$ python bs_seeker2-build.py -h
Usage: bs_seeker2-build.py [options]
Options:
-h, --help show this help message and exit
-f FILE, --file=FILE Input your reference genome file (fasta)
--aligner=ALIGNER Aligner program to perform the analysis: bowtie,
bowtie2, soap, rmap [Default: bowtie]
-p PATH, --path=PATH Path to the aligner program. Detected:
bowtie: /Install/bowtie-1.1.2/
bowtie2: /Install/bowtie2-master/
rmap: None
soap: None
-d DBPATH, --db=DBPATH
Path to the reference genome library (generated in
preprocessing genome) [Default: /Install/BSseeker2/bs_utils/reference_genomes]
-v, --version show version of BS-Seeker2
Reduced Representation Bisulfite Sequencing Options:
Use this options with conjuction of -r [--rrbs]
-r, --rrbs Build index specially for Reduced Representation
Bisulfite Sequencing experiments. Genome other than
certain fragments will be masked. [Default: False]
-l LOW_BOUND, --low=LOW_BOUND
lower bound of fragment length (excluding recognition
sequence such as C-CGG) [Default: 20]
-u UP_BOUND, --up=UP_BOUND
upper bound of fragment length (excluding recognition
sequence such as C-CGG ends) [Default: 500]
-c CUT_FORMAT, --cut-site=CUT_FORMAT
Cut sites of restriction enzyme. Ex: MspI(C-CGG),
Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
[Default: C-CGG]
- Example
# Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
python bs_seeker2-build.py -f genome.fa --aligner=bowtie
# Build genome index for RRBS with default parameters specifying the path for bowtie2
python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r
# Build genome index for RRBS library using bowtie2, with fragment lengths ranging [40bp, 400bp]
python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2
# Build genome index for RRBS library for double-enzyme :
# MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie
- Tips:
The index built for BS-Seeker2 is different from the index for BS-Seeker 1. For RRBS, you need to specify "-r" in the parameters. You also need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according to your protocol.
The fragment length is different from the read length. Fragments refer to the DNA fragments you obtain from the size-selection step (i.e. gel cut or AMPure beads). Fragment lengths are expected to fall within a range, such as [50bp,250bp].
The indexes for RRBS and WGBS are different. Also, indexes for RRBS are specific for fragment length parameters (LOW_BOUND and UP_BOUND).
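To see why an RRBS index depends on the cut site and on LOW_BOUND and UP_BOUND, the following Python sketch performs a simple in-silico digestion. It is an illustration under stated assumptions, not the logic of bs_seeker2-build.py: the helper names `cut_positions` and `select_fragments` are hypothetical, only one strand is scanned, and fragment length is measured between adjacent cut positions, which may differ from how BS-Seeker2 counts the recognition-sequence ends.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def cut_positions(seq, cut_format="C-CGG"):
    """Return sorted cut coordinates for comma-separated sites such as C-CGG,G-CWGC."""
    positions = set()
    for site in cut_format.split(","):
        offset = site.index("-")              # number of bases left of the cut
        motif = site.replace("-", "")
        pattern = "".join(IUPAC[base] for base in motif)
        # The lookahead lets overlapping occurrences of the motif be found.
        for match in re.finditer("(?=%s)" % pattern, seq):
            positions.add(match.start() + offset)
    return sorted(positions)


def select_fragments(seq, cut_format="C-CGG", low=20, up=500):
    """Return (start, end) pairs between adjacent cuts with low <= length <= up."""
    cuts = cut_positions(seq, cut_format)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if low <= b - a <= up]


seq = "AACCGG" + "T" * 30 + "CCGG" + "A" * 5 + "CCGGTT"
print(cut_positions(seq))                      # [3, 37, 46]
print(select_fragments(seq, low=20, up=500))   # [(3, 37)]
```

Changing `low`, `up`, or the cut site changes which fragments survive, which is consistent with the tip above that an RRBS index is specific to those parameters.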
4.3 bs_seeker2-align.py
Module to map reads to the 3-letter converted genome.
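As a rough illustration of what "3-letter" means here, the Python sketch below converts every C to T in both a toy genome and a read, locates the read by exact string match, and then compares the original bases to see which cytosines stayed C. It shows the general idea behind bisulfite alignment only. The function names are hypothetical, a single strand is considered, and the real module delegates alignment to bowtie, bowtie2, or soap rather than to string search.

```python
def c_to_t(seq):
    """Collapse C into T, leaving the 3-letter alphabet A, G, T."""
    return seq.upper().replace("C", "T")


def call_cytosines(genome, read):
    """Place a read on the converted genome, then compare the original bases."""
    pos = c_to_t(genome).find(c_to_t(read))    # exact match only, for brevity
    if pos < 0:
        return pos, [], []
    methylated, unmethylated = [], []
    for i, base in enumerate(read.upper()):
        if genome[pos + i].upper() == "C":
            if base == "C":
                methylated.append(pos + i)     # C survived bisulfite treatment
            elif base == "T":
                unmethylated.append(pos + i)   # C was converted and read as T
    return pos, methylated, unmethylated


genome = "ACGTCCGGA"
read = "ATGTTCGG"    # genome[0:8] after treatment; only the C at index 5 remains
pos, methylated, unmethylated = call_cytosines(genome, read)
print(pos, methylated, unmethylated)   # 0 [5] [1, 4]
```

Converting both sides removes the C/T mismatches that bisulfite treatment introduces, so the read can be placed before the original bases are compared.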
- Usage :
$ bs_seeker2-align.py -h
Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options]
Options:
-h, --help show this help message and exit
For single end reads:
-i INFILE, --input=INFILE
Input read file (FORMAT: sequences, qseq, fasta,
fastq). Ex: read.fa or read.fa.gz
For pair end reads:
-1 FILE, --input_1=FILE
Input read file, mate 1 (FORMAT: sequences, qseq,
fasta, fastq)
-2 FILE, --input_2=FILE
Input read file, mate 2 (FORMAT: sequences, qseq,
fasta, fastq)
-I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE
The minimum insert size for valid paired-end
alignments [Default: 0]
-X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE
The maximum insert size for valid paired-end
alignments [Default: 500]
Reduced Representation Bisulfite Sequencing Options:
-r, --rrbs Map reads to the Reduced Representation genome
-c pattern, --cut-site=pattern
Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
[Default: C-CGG]
-L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
Lower bound of fragment length (excluding C-CGG ends)
