BS-Seeker2

BS-Seeker2 is a seamless and versatile pipeline for fast and accurate mapping of bisulfite-treated reads.

CGmapTools is recommended for downstream analysis of BS-Seeker2 output.


Homepage | Mirror | Published Paper | Source code | Galaxy Toolshed | UCLA Galaxy | CGmapTools


Contents

  1. Remarkable new features

  2. Supports

  3. System requirements

  4. Module descriptions

  5. Contact Information

  6. Questions and Answers

1 Remarkable new features

  • Reduced index for RRBS, accelerating the mapping speed and increasing mappability
  • Local/gapped alignment with Bowtie2, which increases mappability
  • Option for removing reads suffering from bisulfite conversion failure

2 Supports

  • Supported library types

    • Whole-Genome Bisulfite Sequencing (WGBS)
    • Reduced Representation Bisulfite Sequencing (RRBS)
  • Supported formats for input file

  • Supported alignment tools

  • Supported formats for mapping results

3 System requirements

  • Linux/Unix or Mac OS platform

  • One of the following short read aligners

    bowtie, bowtie2, soap

  • Python (Version 2.6 +)

    It is normally pre-installed on Linux. Type "python -V" to see the installed version.

  • pysam package (Version 0.6.x).

    Read "Questions & Answers" if you have problems installing this package.

4 Module descriptions

4.1 FilterReads.py

Optional, independent module. Some reads can be heavily amplified during PCR; this script reduces them to unique reads before mapping. Whether to filter reads before mapping is up to you.

  • Usage :

    $ python FilterReads.py
    Usage: FilterReads.py -i <input> -o <output> [-k]
    Author : Guo, Weilong; guoweilong@gmail.com
    Start from: 2012-11-10; Last Update: 2017-12-08
    Description: Unique reads for qseq/fastq/fasta/sequence.
       Low quality reads in qseq file can be filtered.
    Warning: This function is reserved for WGBS, but not for RRBS.
    For WGBS, user can also try 'samtools rmdup' to get unique reads using BAM files.
    For RRBS, it is suggested not to get unique reads, as the starting ends of reads
    are more likely to be same for the reads from one C-CCG~~~C-CGG fragment.

    Options:
    -h, --help  show this help message and exit
    -i FILE     Name of the input qseq/fastq/fasta/sequence file
    -o FILE     Name of the output file
    -k          Would not filter low quality reads if specified, only applied
              for qseq format

  • Tip :

This step is not recommended for RRBS libraries, because reads from an RRBS library are more likely to start at the same location.
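
To make "getting unique reads" concrete, the following is a minimal sketch of sequence-level de-duplication for a FASTQ stream. It is not the code of FilterReads.py: the function name `unique_fastq_reads` is hypothetical, the sketch is written for Python 3, and it assumes plain four-line FASTQ records and keeps the first read seen for each distinct sequence.

```python
import io


def unique_fastq_reads(handle):
    """Yield (header, sequence, quality) for the first read of each distinct sequence.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    """
    seen = set()
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, not needed here
        quality = handle.readline().rstrip("\n")
        if sequence not in seen:
            seen.add(sequence)
            yield header, sequence, quality


# Tiny in-memory example: r1 and r2 carry the same sequence.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
)
unique = list(unique_fastq_reads(example))
print([header for header, _, _ in unique])  # ['@r1', '@r3']
```

Holding every sequence in a set is simple but memory-hungry for large WGBS runs; that trade-off is part of why de-duplicating on BAM files is offered as an alternative in the help text above.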

4.2 bs_seeker2-build.py

Module to build the index for BS-Seeker2.

  • Usage :

    $ python bs_seeker2-build.py -h
    
    Usage: bs_seeker2-build.py [options]

    Options:
      -h, --help            show this help message and exit
      -f FILE, --file=FILE  Input your reference genome file (fasta)
      --aligner=ALIGNER     Aligner program to perform the analysis: bowtie,
                            bowtie2, soap, rmap [Default: bowtie]
      -p PATH, --path=PATH  Path to the aligner program. Detected:
                            bowtie: /Install/bowtie-1.1.2/
                            bowtie2: /Install/bowtie2-master/
                            rmap: None
                            soap: None
      -d DBPATH, --db=DBPATH
                            Path to the reference genome library (generated in
                            preprocessing genome) [Default: /Install/BSseeker2/bs_utils/reference_genomes]
      -v, --version         show version of BS-Seeker2

      Reduced Representation Bisulfite Sequencing Options:
        Use this options with conjuction of -r [--rrbs]

        -r, --rrbs          Build index specially for Reduced Representation
                            Bisulfite Sequencing experiments. Genome other than
                            certain fragments will be masked. [Default: False]
        -l LOW_BOUND, --low=LOW_BOUND
                            lower bound of fragment length (excluding recognition
                            sequence such as C-CGG) [Default: 20]
        -u UP_BOUND, --up=UP_BOUND
                            upper bound of fragment length (excluding recognition
                            sequence such as C-CGG ends) [Default: 500]
        -c CUT_FORMAT, --cut-site=CUT_FORMAT
                            Cut sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
                            [Default: C-CGG]
  • Example

    # Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
    python bs_seeker2-build.py -f genome.fa --aligner=bowtie

    # Build genome index for RRBS with default parameters specifying the path for bowtie2
    python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r

    # Build genome index for RRBS library using bowtie2, with fragment lengths ranging [40bp, 400bp]
    python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2

    # Build genome index for RRBS library for double-enzyme :
    # MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
    python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie

  • Tips:

The index built for BS-Seeker2 is different from the index for BS-Seeker 1. For RRBS, you need to specify "-r" in the parameters. You also need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according to your protocol.

The fragment length is different from the read length. Fragments refer to the DNA fragments that you obtain from the size-selection step (i.e. gel cut or AMPure beads). Fragment lengths are expected to fall within a range, such as [50bp,250bp].
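
As an illustration of how a fragment-length range relates to restriction cut sites, here is a minimal in-silico digestion sketch. It is not BS-Seeker2's own implementation: the function name `rrbs_fragments` is hypothetical, lengths are measured from cut position to cut position (the help text states that the bounds exclude the recognition sequence, so exact numbers may differ), and IUPAC codes such as W are not handled.

```python
import re


def rrbs_fragments(sequence, cut_format="C-CGG", low=20, up=500):
    """Return (start, end) pairs for fragments between adjacent cut positions
    whose cut-to-cut length lies within [low, up].

    Assumption: cut_format is a single site with '-' marking the cut position.
    """
    offset = cut_format.index("-")       # where the enzyme cuts inside the site
    site = cut_format.replace("-", "")   # recognition sequence, e.g. CCGG
    # A lookahead pattern also reports sites that overlap each other.
    pattern = "(?=%s)" % re.escape(site)
    cuts = [m.start() + offset for m in re.finditer(pattern, sequence)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if low <= end - start <= up:
            fragments.append((start, end))
    return fragments


# Three MspI sites: one 34 bp fragment and one 9 bp fragment between them.
seq = "AA" + "CCGG" + "T" * 30 + "CCGG" + "A" * 5 + "CCGG" + "TT"
print(rrbs_fragments(seq))  # [(3, 37)]
```

With the default bounds only the 34 bp fragment is kept; the 9 bp fragment falls below the lower bound and would be excluded.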

The indexes for RRBS and WGBS are different. Also, indexes for RRBS are specific to the fragment length parameters (LOW_BOUND and UP_BOUND).
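
Because an RRBS index depends on the cut site and the fragment bounds, it can help to keep those parameters in one place when scripting the build. The sketch below only assembles a command line from the flags listed in the help text above (`-f`, `--aligner`, `-r`, `-l`, `-u`, `-c`); the helper name `build_index_command` is hypothetical, and the script path is assumed to be in the current directory.

```python
def build_index_command(genome, aligner="bowtie", rrbs=False,
                        low=None, up=None, cut_site=None):
    """Assemble the argument list for bs_seeker2-build.py.

    RRBS-specific flags are only added when rrbs is True.
    """
    cmd = ["python", "bs_seeker2-build.py", "-f", genome, "--aligner=" + aligner]
    if rrbs:
        cmd.append("-r")
        if low is not None:
            cmd += ["-l", str(low)]
        if up is not None:
            cmd += ["-u", str(up)]
        if cut_site is not None:
            cmd += ["-c", cut_site]
    return cmd


cmd = build_index_command("genome.fa", aligner="bowtie2", rrbs=True, low=40, up=400)
print(" ".join(cmd))
# python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -r -l 40 -u 400
```

The resulting list can be passed to `subprocess.check_call(cmd)` to run the build.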

4.3 bs_seeker2-align.py

Module to map reads to the 3-letter converted genome.
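
For readers new to the term, "3-letter" refers to the reduced alphabet that results from bisulfite treatment, where unmethylated cytosines are read as thymines. The sketch below shows the two in-silico conversions commonly used for this kind of mapping (C to T, and G to A for the opposite strand). The function names are hypothetical and this is only an illustration of the idea, not BS-Seeker2's internal code.

```python
def c_to_t(sequence):
    """Replace every C with T (forward-strand 3-letter conversion)."""
    return sequence.upper().replace("C", "T")


def g_to_a(sequence):
    """Replace every G with A (conversion used for the opposite strand)."""
    return sequence.upper().replace("G", "A")


read = "ACGTCCGA"
print(c_to_t(read))  # ATGTTTGA
print(g_to_a(read))  # ACATCCAA
```

Reads and reference are converted the same way before alignment, so a C/T difference caused by bisulfite conversion does not count as a mismatch; methylation is then called by comparing the original read bases with the original reference.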

  • Usage :

    $ bs_seeker2-align.py -h
    Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options]

    Options:
      -h, --help            show this help message and exit

      For single end reads:
        -i INFILE, --input=INFILE
                            Input read file (FORMAT: sequences, qseq, fasta,
                            fastq). Ex: read.fa or read.fa.gz

      For pair end reads:
        -1 FILE, --input_1=FILE
                            Input read file, mate 1 (FORMAT: sequences, qseq,
                            fasta, fastq)
        -2 FILE, --input_2=FILE
                            Input read file, mate 2 (FORMAT: sequences, qseq,
                            fasta, fastq)
        -I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE
                            The minimum insert size for valid paired-end
                            alignments [Default: 0]
        -X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE
                            The maximum insert size for valid paired-end
                            alignments [Default: 500]

      Reduced Representation Bisulfite Sequencing Options:
        -r, --rrbs          Map reads to the Reduced Representation genome
        -c pattern, --cut-site=pattern
                            Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
                            [Default: C-CGG]
        -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
                            Lower bound of fragment length (excluding C-CGG ends)
      