WiggleTools 1.2

The WiggleTools package allows genome-wide data files to be manipulated as numerical functions, equipped with all the standard functional analysis operators (sum, product, product by a scalar, comparators), and derived statistics (mean, median, variance, stddev, t-test, Wilcoxon's rank sum test, etc.).

Conda Installation

Install conda, then run:

conda install -c bioconda wiggletools

Brew Installation

Install Homebrew, then run:

brew install brewsci/bio/wiggletools

Docker Installation

Pull the latest image from Dockerhub:

docker pull ensemblorg/wiggletools:latest

Run the resulting wiggletools executable, bind-mounting the current working directory into the container:

docker container run --rm --mount type=bind,source="$(pwd)",target=/mnt ensemblorg/wiggletools  [...arguments...]

Guix Installation

Install GNU Guix, then run:

guix pull
guix install wiggletools

Build from source

Pre-requisites

WiggleTools requires three main dependencies: the LibBigWig, HTSLib and GSL (GNU Scientific Library) libraries. These in turn require zlib, bzip2 and libcurl.

Installing LibBigWig

git clone https://github.com/dpryan79/libBigWig.git
cd libBigWig
make install

Installing the htslib library

git clone --recurse-submodules https://github.com/samtools/htslib.git
cd htslib 
make install

Installing the GSL library

wget ftp://www.mirrorservice.org/sites/ftp.gnu.org/gnu/gsl/gsl-latest.tar.gz 
tar -xvzpf gsl-latest.tar.gz
cd gsl*
./configure
make
make install

Installing WiggleTools

If you have not downloaded WiggleTools yet:

git clone https://github.com/Ensembl/WiggleTools.git

Once you have installed the previous libraries and downloaded WiggleTools, you can compile the WiggleTools library:

cd WiggleTools
make

The make process produces a number of outputs:

A statically linked library in lib/
A header for that library in inc/
Various executables in bin/

There is no installation routine, so you should copy the relevant files onto your path, library path, etc. Note that the executable does not require the libraries to be available.

If the system cannot find 'gsl/gsl_cdf.h', you need to install the GNU Scientific Library.

To check the build, you can run the tests (requires Python):

make test

Basics

The WiggleTools library, and the program derived from it, are centered around iterators. An iterator is a function that produces a sequence of values. Because iterators can be built on top of other iterators, they can be combined in many ways.

The wiggletools executable is run by giving it a string that describes an iterator function; it executes that function and prints the output to stdout.

wiggletools <program>
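To illustrate how a program string nests iterators, here is a toy model in Python. It is not the WiggleTools parser: it assumes a tiny vocabulary (abs, scale, offset), bedGraph-like (chrom, start, end, value) records, and a hypothetical in-memory "file" called a.wig. Each operator word consumes the iterator that follows it, so the string reads in prefix order.

```python
# Toy model only, NOT the WiggleTools parser. Assumes a tiny vocabulary
# (abs, scale, offset) and hypothetical in-memory "files" of
# (chrom, start, end, value) records, to show how prefix programs nest iterators.
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, int, int, float]

# Hypothetical stand-in for an input file.
FILES: Dict[str, List[Record]] = {
    "a.wig": [("chr1", 0, 5, -2.0), ("chr1", 5, 10, 3.0)],
}

def read_file(name: str) -> Iterator[Record]:
    yield from FILES[name]

def abs_it(src: Iterator[Record]) -> Iterator[Record]:
    for chrom, start, end, value in src:
        yield chrom, start, end, abs(value)

def scale_it(factor: float, src: Iterator[Record]) -> Iterator[Record]:
    for chrom, start, end, value in src:
        yield chrom, start, end, factor * value

def offset_it(delta: float, src: Iterator[Record]) -> Iterator[Record]:
    for chrom, start, end, value in src:
        yield chrom, start, end, value + delta

def parse(tokens: List[str]) -> Iterator[Record]:
    """Consume tokens from the front and return the iterator they describe."""
    word = tokens.pop(0)
    if word == "abs":
        return abs_it(parse(tokens))
    if word in ("scale", "offset"):
        number = float(tokens.pop(0))
        inner = parse(tokens)
        return scale_it(number, inner) if word == "scale" else offset_it(number, inner)
    return read_file(word)  # any other word is treated as a file name

for record in parse("scale 10 abs a.wig".split()):
    print(record)
```

Nothing is computed until the outermost iterator is consumed; each record flows through read_file, then abs_it, then scale_it.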

If you need a refresher:

wiggletools --help

Heavy users may find that processing many files exceeds the system's limit on command-line length, especially when shelling out from a scripting language. In that case, write the program into a text file, then execute it:

wiggletools run program.txt
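When driving WiggleTools from a script, the program can be written to a temporary file and passed to `wiggletools run`. The sketch below is an assumption-laden example: the `mean ... :` program and the sample_*.bw file names are illustrative, and the helper names write_program and run_program are hypothetical. Only write_program is exercised without the executable; run_program needs wiggletools on PATH.

```python
# Sketch: write a long WiggleTools program to a text file and run it with
# "wiggletools run", instead of passing every file name on the command line.
# Assumptions: the "mean <files> :" program is illustrative; helper names are hypothetical.
import subprocess
import tempfile
from pathlib import Path
from typing import List

def write_program(files: List[str]) -> Path:
    """Write a 'mean <files> :' program to a temporary .txt file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("mean " + " ".join(files) + " :\n")
    return Path(handle.name)

def run_program(path: Path) -> str:
    """Invoke the wiggletools executable (must be on PATH) and return its stdout."""
    result = subprocess.run(
        ["wiggletools", "run", str(path)],
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )
    return result.stdout

# Usage (requires wiggletools on PATH; file names are hypothetical):
#   program = write_program([f"sample_{i}.bw" for i in range(500)])
#   print(run_program(program))
```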

Input files

By default, the executable recognizes the file format from the suffix of the file name:

Wiggle files

wiggletools test/fixedStep.wig

BigWig files

wiggletools test/fixedStep.bw

BedGraph files

wiggletools test/bedfile.bg

Bed files

wiggletools test/overlapping.bed

BigBed files

wiggletools test/overlapping.bb

Bam files

Requires a .bai index file in the same directory

wiggletools test/bam.bam

Cram files

Requires a .crai index file in the same directory

wiggletools test/cram.cram

VCF files

wiggletools test/vcf.vcf

BCF files

Requires a .tbi index file in the same directory

wiggletools test/bcf.bcf

Streaming data

You can stream data into WiggleTools, e.g.:

cat test/fixedStep.wig | wiggletools -

The input data is assumed to be in Wig or BedGraph format, but can also be in SAM format:

samtools view test/bam.bam | wiggletools sam -

Operators

Iterators can also be constructed from other iterators, allowing arbitrarily complex constructs to be built. We call these iterators operators. In all the examples below, the iterators are built on simple file readers (for simplicity), but you are free to replace the inputs with other iterators.

1 Unary operators

The following operators are the most straightforward, because they only read data from a single other iterator.

abs

Returns the absolute value of an iterator's output:

wiggletools abs test/fixedStep.bw

ln

Returns the natural log of an iterator's output:

wiggletools ln test/fixedStep.bw

log

Returns the logarithm in an arbitrary base of an iterator's output:

wiggletools log 10 test/fixedStep.bw

scale

Returns an iterator's output multiplied by a scalar (i.e. decimal number):

wiggletools scale 10 test/fixedStep.bw

offset

Returns an iterator's output added to a scalar (i.e. decimal number):

wiggletools offset 10 test/fixedStep.bw
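The operators abs, ln, log, scale and offset all transform each value independently and leave coordinates unchanged. A toy Python model of that behaviour, assuming bedGraph-like (chrom, start, end, value) records; it illustrates the semantics only and is not WiggleTools' implementation (in particular, it does not model how WiggleTools treats non-positive values under ln or log).

```python
# Toy model of pointwise unary operators (abs, ln, log, scale, offset).
# Assumption: bedGraph-like (chrom, start, end, value) records; illustration only.
import math
from typing import Callable, Iterator, List, Tuple

Record = Tuple[str, int, int, float]

def pointwise(fn: Callable[[float], float], records: List[Record]) -> Iterator[Record]:
    """Apply fn to each value while leaving the coordinates untouched."""
    for chrom, start, end, value in records:
        yield chrom, start, end, fn(value)

data: List[Record] = [("chr1", 0, 10, 100.0), ("chr1", 10, 20, -1.0)]
positive: List[Record] = [("chr1", 0, 10, 100.0)]  # logs are only defined for v > 0

abs_out = list(pointwise(abs, data))                              # like: abs FILE
ln_out = list(pointwise(math.log, positive))                      # like: ln FILE
log10_out = list(pointwise(lambda v: math.log(v, 10), positive))  # like: log 10 FILE
scale_out = list(pointwise(lambda v: 10 * v, data))               # like: scale 10 FILE
offset_out = list(pointwise(lambda v: v + 10, data))              # like: offset 10 FILE
```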

gt

Returns contiguous boolean regions where the iterator is strictly greater than a given cutoff:

wiggletools gt 5 test/fixedStep.bw

This is useful to define regions in the apply function, or to compute information content (see below).

lt

Returns contiguous boolean regions where the iterator is strictly less than a given cutoff:

wiggletools lt 5 test/fixedStep.bw

This is useful to define regions in the apply function, or to compute information content (see below).

gte

Returns contiguous boolean regions where the iterator is greater than or equal to a given cutoff:

wiggletools gte 5 test/fixedStep.bw

This is useful to define regions in the apply function, or to compute information content (see below).

lte

Returns contiguous boolean regions where the iterator is less than or equal to a given cutoff:

wiggletools lte 5 test/fixedStep.bw

This is useful to define regions in the apply function, or to compute information content (see below).
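The idea of "contiguous boolean regions" can be sketched as: test each record against the cutoff, then merge touching records that pass into a single region. This is a toy model, not WiggleTools code; it assumes half-open (chrom, start, end, value) records and assumes only passing regions are reported (with value 1), which may differ from the real output format.

```python
# Toy model of gt/lt/gte/lte, NOT WiggleTools code. Assumes half-open
# (chrom, start, end, value) records; only regions passing the test are kept.
import operator
from typing import List, Tuple

Record = Tuple[str, int, int, float]

COMPARATORS = {"gt": operator.gt, "lt": operator.lt,
               "gte": operator.ge, "lte": operator.le}

def threshold(op: str, cutoff: float, records: List[Record]) -> List[Record]:
    test = COMPARATORS[op]
    regions: List[Record] = []
    for chrom, start, end, value in records:
        if not test(value, cutoff):
            continue
        last = regions[-1] if regions else None
        if last and last[0] == chrom and last[2] == start:
            regions[-1] = (chrom, last[1], end, 1.0)  # extend a touching region
        else:
            regions.append((chrom, start, end, 1.0))
    return regions

data: List[Record] = [("chr1", 0, 2, 3.0), ("chr1", 2, 4, 7.0),
                      ("chr1", 4, 6, 5.0), ("chr1", 8, 9, 9.0)]
```

With a cutoff of 5, "gt" keeps only [2, 4) and [8, 9), whereas "gte" also admits the value 5 and so merges [2, 4) and [4, 6) into one region.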

unit

Returns 1 if the iterator's output is non-zero, 0 otherwise, and merges contiguous positions with the same output value into blocks:

wiggletools unit test/fixedStep.bw

This is useful to define regions in the apply function (see below).
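A toy sketch of the unit behaviour described above: map each value to 1 or 0, then merge adjacent records that share the same output into one block. It assumes half-open (chrom, start, end, value) records and is an illustration rather than WiggleTools' implementation.

```python
# Toy model of "unit": 1 where the value is non-zero, 0 otherwise, with touching
# records of equal output merged into blocks. Illustration only.
from typing import List, Tuple

Record = Tuple[str, int, int, float]

def unit(records: List[Record]) -> List[Record]:
    blocks: List[Record] = []
    for chrom, start, end, value in records:
        flag = 1.0 if value != 0 else 0.0
        last = blocks[-1] if blocks else None
        if last and last[0] == chrom and last[2] == start and last[3] == flag:
            blocks[-1] = (chrom, last[1], end, flag)  # same output: extend the block
        else:
            blocks.append((chrom, start, end, flag))
    return blocks

data: List[Record] = [("chr1", 0, 1, 2.5), ("chr1", 1, 2, -1.0), ("chr1", 2, 3, 0.0),
                      ("chr1", 3, 4, 0.0), ("chr1", 4, 5, 4.0)]
```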

coverage

Returns a coverage plot of overlapping regions, typically read from a bed file:

wiggletools coverage test/overlapping.bed
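A coverage plot of overlapping regions can be computed with a sweep line: add 1 at every region start, subtract 1 at every end, and accumulate. The Python below is a toy sketch of that technique, not WiggleTools' code; it assumes half-open (chrom, start, end) intervals and simply omits zero-depth gaps.

```python
# Toy sweep-line coverage, NOT WiggleTools code. Assumes half-open
# (chrom, start, end) intervals; zero-depth gaps are omitted from the output.
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Interval = Tuple[str, int, int]
Record = Tuple[str, int, int, int]

def coverage(intervals: List[Interval]) -> List[Record]:
    # Net change in depth at each position, per chromosome.
    events: DefaultDict[str, DefaultDict[int, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in intervals:
        events[chrom][start] += 1
        events[chrom][end] -= 1
    out: List[Record] = []
    for chrom in sorted(events):
        depth = 0
        positions = sorted(events[chrom])
        for pos, nxt in zip(positions, positions[1:]):
            depth += events[chrom][pos]  # depth is constant on [pos, nxt)
            if depth > 0:
                out.append((chrom, pos, nxt, depth))
    return out

beds: List[Interval] = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 20, 25)]
```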

isZero

Does not print anything, just exits with return value 1 (i.e. error) if it encounters a non-zero value:

wiggletools isZero test/fixedStep.bw

seek

Outputs only the points of an iterator within a given genomic region:

wiggletools seek chr1 2 8 test/fixedStep.bw

bin

Sums results into fixed-size bins:

wiggletools bin 2 test/fixedStep.bw
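A toy sketch of summing into fixed-size bins. It rests on two assumptions that the text above does not spell out: bins are the half-open windows [k*size, (k+1)*size), and each base contributes its value, so a record spanning n bases of a bin adds value * n. Treat it as an illustration of the idea, not a description of WiggleTools' exact arithmetic.

```python
# Toy fixed-size binning. Assumptions: bins are [k*size, (k+1)*size) and each
# base contributes its value (value * bases covered). Illustration only.
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Record = Tuple[str, int, int, float]

def bin_sums(size: int, records: List[Record]) -> List[Record]:
    sums: DefaultDict[Tuple[str, int], float] = defaultdict(float)
    for chrom, start, end, value in records:
        pos = start
        while pos < end:
            index = pos // size
            bin_end = min(end, (index + 1) * size)
            sums[(chrom, index)] += value * (bin_end - pos)  # value times bases in this bin
            pos = bin_end
    return [(chrom, i * size, (i + 1) * size, total)
            for (chrom, i), total in sorted(sums.items())]

data: List[Record] = [("chr1", 0, 3, 1.0), ("chr1", 3, 4, 5.0)]
```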

toInt

Casts the iterator's output to an int, effectively rounding any floating point values toward zero.

wiggletools toInt test/fixedStep.bw

floor

Returns the floor of an iterator's output. Note that floor rounds the output toward negative infinity.

wiggletools floor test/fixedStep.bw
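The difference between toInt and floor only shows up on negative values. Python's int() truncates toward zero and math.floor rounds toward negative infinity, matching the two behaviours described above; this is a worked example of the rounding rules, not a call into WiggleTools.

```python
# Worked example of the two rounding rules described for toInt and floor.
import math

values = [2.7, -2.7, -0.5, 3.0]
to_int = [int(v) for v in values]          # truncate toward zero: [2, -2, 0, 3]
floored = [math.floor(v) for v in values]  # toward negative infinity: [2, -3, -1, 3]
```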

shiftPos

Returns the given iterator with its start and end positions shifted downwards (toward zero) by the specified value. The value must be non-negative, since the default behavior is to shift coordinates toward zero.

wiggletools shiftPos 10 test/fixedStep.bw

2 Binary operators

The following operators read data from exactly two iterators, allowing comparisons:

diff

Returns the difference between two iterators' outputs:

wiggletools diff test/fixedStep.bw test/variableStep.bw

ratio

Returns the output of the first iterator divided by the output of the second (divisions by 0 are squashed, and no result is given for those bases):

wiggletools ratio test/fixedStep.bw test/variableStep.bw
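A toy model of diff and ratio, assuming (for illustration only) that both tracks are per-base value lists over the same positions. As described above for ratio, bases where the divisor is 0 produce no result at all rather than an infinity or an error.

```python
# Toy model of diff and ratio on two aligned per-base tracks (hypothetical data).
from typing import Dict, List

def diff(a: List[float], b: List[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]

def ratio(a: List[float], b: List[float]) -> Dict[int, float]:
    out: Dict[int, float] = {}
    for pos, (x, y) in enumerate(zip(a, b)):
        if y == 0:
            continue  # division by zero is squashed: no result for this base
        out[pos] = x / y
    return out

first = [4.0, 1.0, 0.0, 6.0]
second = [2.0, 0.0, 5.0, 3.0]
```

Here ratio(first, second) has no entry for position 1, where the second track is 0.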

overlaps

Returns the output of the second iterator that overlaps regions of the first.

wiggletools overlaps test/fixedStep.bw test/variableStep.bw

trim

Same as above but trims the regions to the overlapping portions:

wiggletools trim test/fixedStep.bw test/variableStep.bw
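A toy sketch contrasting overlaps and trim on half-open (chrom, start, end, value) records: overlaps keeps whole records of the second input that touch any region of the first, while trim clips them to the overlapping portions. It assumes the first input's regions do not overlap each other and uses a simple quadratic scan; it is not WiggleTools' algorithm.

```python
# Toy model of "overlaps" and "trim". Assumes half-open records and
# non-overlapping regions in `first`; simple O(n*m) scan, illustration only.
from typing import List, Tuple

Record = Tuple[str, int, int, float]

def _intersects(a: Record, b: Record) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def overlaps(first: List[Record], second: List[Record]) -> List[Record]:
    """Records of `second` that overlap any region of `first`, kept whole."""
    return [r for r in second if any(_intersects(r, q) for q in first)]

def trim(first: List[Record], second: List[Record]) -> List[Record]:
    """Records of `second` clipped to the portions overlapping `first`."""
    out: List[Record] = []
    for rec in second:
        for q in first:
            if _intersects(rec, q):
                out.append((rec[0], max(rec[1], q[1]), min(rec[2], q[2]), rec[3]))
    return out

first: List[Record] = [("chr1", 5, 10, 1.0)]
second: List[Record] = [("chr1", 0, 7, 2.0), ("chr1", 8, 12, 3.0), ("chr1", 20, 30, 4.0)]
```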

trimFill

Same as trim, but fills in trimmed regions with the default value of the second iterator.

wiggletools trimFill test/fixedStep.bw test/overlapping_coverage.wig

nearest

Returns the regions of the second iterator and their distance to the nearest region in the first iterator.

wiggletools nearest test/fixedStep.bw test/variableStep.bw
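A toy sketch of a nearest-region distance: for each record of the second input, report the gap to the closest region of the first input on the same chromosome. The conventions here are assumptions, not WiggleTools' documented behaviour: half-open coordinates, distance 0 for overlapping or adjacent regions, unsigned distances, and records on chromosomes with no region are skipped.

```python
# Toy nearest-distance model. Assumed conventions: half-open coordinates,
# 0 when overlapping or adjacent, unsigned distance, records skipped when the
# chromosome has no region in `first`. Illustration only.
from typing import Dict, List, Tuple

Record = Tuple[str, int, int, float]

def nearest(first: List[Record], second: List[Record]) -> List[Tuple[str, int, int, int]]:
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end, _ in first:
        by_chrom.setdefault(chrom, []).append((start, end))
    out = []
    for chrom, start, end, _ in second:
        regions = by_chrom.get(chrom)
        if not regions:
            continue  # no region on this chromosome: nothing to report
        # Gap between [start, end) and [qstart, qend); negative gaps mean overlap.
        best = min(max(qstart - end, start - qend, 0) for qstart, qend in regions)
        out.append((chrom, start, end, best))
    return out

first: List[Record] = [("chr1", 10, 20, 1.0), ("chr1", 50, 60, 1.0)]
second: List[Record] = [("chr1", 0, 5, 2.0), ("chr1", 15, 18, 3.0),
                        ("chr1", 30, 40, 4.0), ("chr2", 0, 1, 5.0)]
```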

3 Multiplexed iterators

Sometimes you want to compute statistics across many iterators. In this case, the function is followed by an arbitrary list of iterators, separated by spaces. The list is terminated by a colon (:), itself separated from other words by spaces. At the very end of a command string,
