BitMapperBS

BitMapperBS: a fast and accurate read aligner for whole-genome bisulfite sequencing


Introduction


This repository contains the implementation of "BitMapperBS: a fast and accurate read aligner for whole-genome bisulfite sequencing". BitMapperBS is an ultra-fast and memory-efficient aligner designed for WGBS reads from the directional protocol.

  • (Update on August 24, 2019) Please do not use version 1.0.2.0, which has known problems. The current version is 1.0.2.1.

Build Requirements

(1) G++.

(2) CMake.

(3) CMake-supported build tool.

(4) zlib, libbz2 and liblzma libraries. In Ubuntu, please try: sudo apt-get install liblzma-dev zlib1g-dev libbz2-dev.
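Before building, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of BitMapperBS: the helper name `missing_tools` is hypothetical, and `make` is assumed to be the CMake-supported build tool because the build step in this README uses it.

```shell
# Print every tool from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of BitMapperBS.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# g++ and cmake come from the list above; make is an assumed build tool.
missing=$(missing_tools g++ cmake make)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined onto one line.
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

This check covers only the executables. The zlib, libbz2 and liblzma headers still need to be installed separately, for example with the apt-get command above.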

Hardware and software requirements

(1) CPU must support AVX2 or SSE4.2 instructions.

(2) When building the index for the human genome, BitMapperBS requires about 10GB RAM and 60GB disk space.

(3) When aligning BS-seq reads to the human genome, BitMapperBS requires about 7GB RAM.
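The CPU requirement in (1) can be checked on Linux by reading the flags the kernel reports. The sketch below assumes a Linux-style `/proc/cpuinfo` with a `flags` line, where SSE4.2 is listed as `sse4_2`. The helper name `cpu_simd_support` is hypothetical, and on other platforms the function simply reports `none`.

```shell
# Report the best required instruction set the CPU advertises.
# cpu_simd_support is a hypothetical helper; Linux /proc/cpuinfo layout assumed.
# $1: path to a cpuinfo-style file (defaults to /proc/cpuinfo)
cpu_simd_support() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line; tolerate a missing file or missing line.
    flags=$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null || true)
    case " $flags " in
        *" avx2 "*)   echo "avx2" ;;
        *" sse4_2 "*) echo "sse4.2" ;;
        *)            echo "none" ;;
    esac
}

echo "SIMD support: $(cpu_simd_support)"
```

If the result is `none`, the CPU does not appear to meet the requirement, or the file layout differs from what this sketch assumes.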

Supported platforms

BitMapperBS has been successfully tested using six CPU threads on a computer with a six-core Intel Core i7-8770k processor and 64GB RAM, running Ubuntu 16.04. The indexes, reference genomes and reads were stored on a solid-state drive (SSD) to minimize loading time. It is also actively used by the Computational Biology of Aging Group and BGI Genomics to analyze WGBS data.

Docker container

BitMapperBS can be run as a binary or as a Docker container. For example, assuming that the reference genomes and samples are in the /data/indexes and /data/samples folders:

  • to build index of the reference genome:
docker run -v /data:/data quay.io/comp-bio-aging/bit_mapper_bs:latest /opt/BitMapperBS/bitmapperBS --index /data/indexes/HUMAN/29/GRCh38.primary_assembly.genome.fa  --index_folder human_bs_index
  • to align sequence to bitmapper index:
docker run -v /data:/data quay.io/comp-bio-aging/bit_mapper_bs:latest /opt/BitMapperBS/bitmapperBS --search /data/indexes/human_bs_index  --seq1 /data/samples/SRR948855/SRR948855_1.fastq.gz --seq2 /data/samples/SRR948855/SRR948855_2.fastq.gz --sensitive --pe -t 8 --mapstats --bam -o SRR948855.bam

Installation

  • BitMapperBS can be installed via bioconda; please see https://bioconda.github.io/recipes/bitmapperbs/README.html

(1) Download the source code from GitHub

git clone https://github.com/chhylp123/BitMapperBS.git

(2) Build (CPU must support AVX2 instructions or SSE4.2 instructions)

cd BitMapperBS
make
  • (Update on October 10, 2018) If the system reports "cmake: not found" or "psascan_src/inmem_psascan_src/divsufsort_template.h:42:24: fatal error: divsufsort.h: not found", please install CMake.

  • (Update on November 28, 2018) If the system reports "fatal error: zlib.h: no such file or directory", "fatal error: bzlib.h: No such file or directory" or "fatal error: lzma.h: No such file or directory", please install the zlib, libbz2 and liblzma libraries. On Ubuntu, try: sudo apt-get install liblzma-dev zlib1g-dev libbz2-dev.

(3) (Update on October 10, 2018) In most cases, BitMapperBS compiles from source automatically and runs successfully. In rare cases (e.g., an old Linux distribution), however, BitMapperBS may report an error when building the index, such as "sh: 1: ./psascan: not found". This happens because BitMapperBS uses psascan to build the FM-index, and in these cases the psascan binary cannot be compiled from source automatically. Please compile psascan manually (https://www.cs.helsinki.fi/group/pads/pSAscan.html) and copy the psascan binary to the BitMapperBS folder.

- Please note!

  1. (Update on October 10, 2018) In most cases, BitMapperBS compiles from source automatically and runs successfully. In rare cases (e.g., an old Linux distribution), however, BitMapperBS may report an error when building the index, such as "sh: 1: ./psascan: not found". This happens because BitMapperBS uses psascan to build the FM-index, and in these cases the psascan binary cannot be compiled from source automatically. Please compile psascan manually and copy the psascan binary to the BitMapperBS folder. The detailed steps are as follows:

(1) Download psascan from https://www.cs.helsinki.fi/group/pads/pSAscan.html, and compile it from source.

(2) Copy the psascan binary to the BitMapperBS folder.

  2. (Update on November 28, 2018) When compiling BitMapperBS, if you get the error message "fatal error: zlib.h: no such file or directory", "fatal error: bzlib.h: No such file or directory" or "fatal error: lzma.h: No such file or directory", please install the zlib, libbz2 and liblzma libraries. On Ubuntu, try:

sudo apt-get install liblzma-dev zlib1g-dev libbz2-dev

  3. (Update on November 28, 2018) Although BitMapperBS itself is significantly faster than other methods, it cannot speed up slow disk I/O. In practice, the most serious bottleneck of BitMapperBS is disk I/O performance, especially when using multiple CPU threads. If you want to run BitMapperBS with many CPU threads, we suggest adopting at least one of the following strategies:

(1) To reduce the amount of disk I/O, use compressed fastq files (.fastq.gz or .fq.gz format) rather than uncompressed raw files (.fastq or .fq format).

(2) To reduce the amount of disk I/O, output the mapping results in BAM format (using the option --bam) rather than SAM format.

(3) Store the input and output files of BitMapperBS (e.g., the read files and the output SAM or BAM files) on fast solid-state drives (SSD) rather than slow hard disk drives (HDD).
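The three strategies above can be combined in a single paired-end run. This is a sketch under stated assumptions: all paths are hypothetical, the helper `is_compressed_fastq` is not part of BitMapperBS, and the `--pe` and `-t` flags are taken from the Docker example earlier in this README. The script only prints the command so it can be reviewed before running.

```shell
# Return success if the file name has a gzip-compressed FASTQ extension.
# is_compressed_fastq is a hypothetical helper, not part of BitMapperBS.
is_compressed_fastq() {
    case "$1" in
        *.fastq.gz|*.fq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical SSD-backed paths; adjust to your own layout.
index=/ssd/indexes/human_bs_index
reads_1=/ssd/samples/sample_1.fastq.gz
reads_2=/ssd/samples/sample_2.fastq.gz

# Strategy (1): warn about uncompressed inputs.
for f in "$reads_1" "$reads_2"; do
    is_compressed_fastq "$f" || echo "note: $f is uncompressed; consider gzip" >&2
done

# Strategies (2) and (3): BAM output, written to SSD storage.
echo "./bitmapperBS --search $index --seq1 $reads_1 --seq2 $reads_2 --pe -t 8 --bam -o /ssd/results/sample.bam"
```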

If you have problems with the "make" step described above, please contact us (chhy@mail.ustc.edu.cn).

Usage

Indexing Genome

./bitmapperBS --index <genome file name>

The suffix of the index file should be '.index.*'.

  • (Update on October 10, 2018) If BitMapperBS reports "sh: 1: ./psascan: not found" when building the index, psascan was not compiled and installed successfully. BitMapperBS uses psascan to build the FM-index, and in this case the psascan binary cannot be compiled from source automatically. Please compile psascan manually (https://www.cs.helsinki.fi/group/pads/pSAscan.html) and copy the psascan binary to the BitMapperBS folder.

Quality and Adapter Trimming

We recommend using Trim_Galore to perform quality and adapter trimming. Please see https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/.

Bisulfite Mapping

single-end reads

./bitmapperBS --search <genome file name> --seq <read file name> [options]

paired-end reads

./bitmapperBS --search <genome file name> --seq1 <read1 file name> --seq2 <read2 file name> [options]

output mapping results in BAM format

./bitmapperBS --search <genome file name> --seq <read file name> --bam -o output.bam [options]
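The value passed to --search depends on how the index was built: the fasta path if --index_folder was not used, otherwise the index folder. The sketch below makes that choice explicit. The helper `search_target` and all paths are hypothetical, and the script only prints the commands rather than running them.

```shell
# Print the path that should be passed to --search.
# search_target is a hypothetical helper, not part of BitMapperBS.
# $1: reference fasta file that was given to --index
# $2: (optional) folder that was given to --index_folder during indexing
search_target() {
    if [ -n "${2:-}" ]; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1"
    fi
}

genome=/data/genomes/genome.fa        # hypothetical path
index_folder=/data/indexes/genome_bs  # hypothetical path

# Index into a dedicated folder, then search against that folder.
echo "./bitmapperBS --index $genome --index_folder $index_folder"
echo "./bitmapperBS --search $(search_target "$genome" "$index_folder") --seq reads.fastq.gz --bam -o reads.bam"
```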

Methylation Extracting

We recommend first removing duplicates with Picard or samtools, and then using MethylDackel to extract methylation information. Please see https://github.com/dpryan79/methyldackel. Please note that MethylDackel may be quite slow when using too many CPU threads, so we recommend 2, 3 or 4 CPU threads for MethylDackel. More threads do not accelerate the methylation extraction step.
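One possible way to chain duplicate removal and extraction is sketched below. It uses samtools rather than Picard and assumes paired-end alignments, a samtools version that provides `fixmate -m` and `markdup`, and that the BAM written by BitMapperBS is accepted by these tools unchanged. The helpers `run` and `dedup_and_extract` and the file names are hypothetical. By default the script is a dry run that only prints the commands.

```shell
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# dedup_and_extract <genome.fa> <aligned.bam> [threads]
# Hypothetical helper; assumes paired-end alignments.
dedup_and_extract() {
    genome=$1
    bam=$2
    threads=${3:-4}
    prefix=${bam%.bam}

    # Cap MethylDackel at 4 threads, following the recommendation above.
    md_threads=$threads
    if [ "$md_threads" -gt 4 ]; then md_threads=4; fi

    # markdup needs mate tags from fixmate, which needs name-sorted input.
    run samtools sort -n -@ "$threads" -o "$prefix.namesorted.bam" "$bam"
    run samtools fixmate -m "$prefix.namesorted.bam" "$prefix.fixmate.bam"
    run samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$prefix.fixmate.bam"
    run samtools markdup -r "$prefix.sorted.bam" "$prefix.dedup.bam"
    run samtools index "$prefix.dedup.bam"
    run MethylDackel extract -@ "$md_threads" "$genome" "$prefix.dedup.bam"
}

dedup_and_extract genome.fa sample.bam 8
```

Keeping the MethylDackel thread count separate from the samtools thread count lets sorting use more threads while extraction stays within the recommended range.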

General Options

| Option | Short Option | Type | Default | Brief Description |
| :------: | :---------------: | :-----: | :-----: | :----- |
| --help | -h | String | NULL | Show the help information. |
| --version | -v | String | NULL | Show current version of BitMapperBS. |

Index Options

| Option | Short Option | Type | Default | Brief Description |
| :------: | :---------------: | :-----: | :-----: | :----- |
| --index | -i | String | NULL | Generate an index from the specified fasta file. |
| --index_folder | NULL | String | NULL | Set the folder that stores the genome indexes. If this option is not set, the indexes are stored in the same folder as the genome (input fasta file). |

Mapping Options

| Option | Short Option | Type | Default | Brief Description |
| :------: | :---------------: | :-----: | :-----: | :----- |
| --search | NULL | String | NULL | Search in the specified genome. If the indexes of this genome are built without "--index_folder", please provide the path to the fasta file when aligning. Otherwise please provide the path to the index folder (set by "--index_folder" during indexing). |
| --fast | NULL | NULL | NULL | Set bitmapperBS in fast mode (default). Only available for paired-end mode. |
| --sensitive | NULL | NULL | NULL | Set bitmapperBS in sensitive mode. Only available for paired-end mode. |
| --seq | NULL | String | NULL | Provide the name of single-end read file (.fastq/.fq/.fastq.gz/.fq.gz format). |
| --seq1 | NULL | String | NULL | Provide the name of paired-end read_1 file (.fastq/.fq/.fastq.gz/.fq.gz format). |
| --seq2 | NULL | String | NULL | Provide the name of paired-end read_2 file (.fastq/.fq/.fastq.gz/.fq.gz format). |
| -o | -o | String | stdout (Standard output) | Provide the name of output file (SAM or BAM format). |
| --sam | NULL | NULL | NULL | Output mapping results in SAM format (default). |
| --bam | NULL | NULL | NULL | Output mapping results in BAM format. |
| -e | -e | Double | 0
