WiggleTools
Basic operations on the space of numerical functions defined on the genome using lazy evaluators for flexibility and efficiency
WiggleTools 1.2
Author: Daniel Zerbino
Copyright holder: EMBL-European Bioinformatics Institute (Apache 2 License)
The WiggleTools package allows genome-wide data files to be manipulated as numerical functions, equipped with all the standard functional analysis operators (sum, product, product by a scalar, comparators), and derived statistics (mean, median, variance, stddev, t-test, Wilcoxon's rank sum test, etc.).
Conda Installation
Install conda, then run:
conda install -c bioconda wiggletools
Brew Installation
Install Homebrew, then run:
brew install brewsci/bio/wiggletools
Docker Installation
Pull the latest image from Dockerhub:
docker pull ensemblorg/wiggletools:latest
Run the resulting wiggletools executable, bind-mounting the current working directory into the container:
docker container run --rm --mount type=bind,source="$(pwd)",target=/mnt ensemblorg/wiggletools [...arguments...]
Guix Installation
Install GNU Guix, then run:
guix pull
guix install wiggletools
Build from source
Pre-requisites
WiggleTools requires three main dependencies: the LibBigWig, HTSLib and GSL (GNU Scientific Library) libraries. These in turn require zlib, bzip2 and libcurl.
Installing LibBigWig
git clone https://github.com/dpryan79/libBigWig.git
cd libBigWig
make install
Installing the htslib library
git clone --recurse-submodules https://github.com/samtools/htslib.git
cd htslib
make install
Installing the GSL library
wget ftp://www.mirrorservice.org/sites/ftp.gnu.org/gnu/gsl/gsl-latest.tar.gz
tar -xvzpf gsl-latest.tar.gz
cd gsl*
./configure
make
make install
Installing WiggleTools
If you didn't download WiggleTools yet:
git clone https://github.com/Ensembl/WiggleTools.git
Once you have installed the previous libraries and downloaded WiggleTools, you can compile the WiggleTools library:
cd WiggleTools
make
The make process produces a number of outputs:
- A statically linked library in lib/
- A header for that library in inc/
- Various executables in bin/
There is no installation routine, meaning that you should copy the relevant files onto your path, library path, etc. Note that the executable does not require the libraries to be available.
If the system cannot find 'gsl/gsl_cdf.h', then you need to install the GNU Scientific Library.
Just to check, you can launch the tests (requires Python):
make test
Basics
The WiggleTools library, and the derived program, are centered around the use of iterators. An iterator is a function which produces a sequence of values. The cool thing is that iterators can be built off other iterators, offering many combinations.
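To make the idea of iterators built off other iterators concrete, here is a minimal Python sketch using generators. It is only an illustration of the concept, not WiggleTools' actual C implementation: the `Point` layout and the functions `read_points`, `abs_iter` and `scale_iter` are hypothetical names invented for this example.

```python
from typing import Iterator, Tuple

# A data point: (chromosome, position, value). This layout is an assumption
# made for the sketch, not WiggleTools' internal representation.
Point = Tuple[str, int, float]


def read_points() -> Iterator[Point]:
    """Stand-in for a file reader: yields points one at a time."""
    yield ("chr1", 1, -2.0)
    yield ("chr1", 2, 0.5)
    yield ("chr1", 3, -4.0)


def abs_iter(source: Iterator[Point]) -> Iterator[Point]:
    """Iterator built off another iterator: absolute value of each point."""
    for chrom, pos, value in source:
        yield (chrom, pos, abs(value))


def scale_iter(factor: float, source: Iterator[Point]) -> Iterator[Point]:
    """Iterator built off another iterator: each value times a scalar."""
    for chrom, pos, value in source:
        yield (chrom, pos, factor * value)


# Building the pipeline computes nothing yet; values are produced lazily,
# one point at a time, only when the outermost iterator is consumed.
pipeline = scale_iter(10, abs_iter(read_points()))
result = list(pipeline)
```

Each wrapper only pulls a value from its source when asked for one, which is why long chains of operators do not need to hold whole files in memory.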
The wiggletools executable is run by giving it a string which describes an iterator function, which it executes, printing the output to stdout.
wiggletools <program>
If you need a refresher:
wiggletools --help
If you are an intensive user, you may find that processing many files exceeds the length limits on command lines, especially when shelling out from a scripting language. In that case, you can copy the program into a text file, then execute it:
wiggletools run program.txt
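When shelling out from a scripting language, the program-file approach might look like the following Python sketch. The helper `write_program` is a hypothetical name, and the sketch assumes the program file contains exactly the string you would otherwise pass on the command line. The call to wiggletools is guarded so the script does nothing harmful if the executable or the test file is missing.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def write_program(program: str, path: Path) -> List[str]:
    """Write the program string to a text file and return the command
    that asks wiggletools to execute it."""
    path.write_text(program)
    return ["wiggletools", "run", str(path)]


# A temporary directory keeps the sketch from cluttering the working directory.
program_path = Path(tempfile.mkdtemp()) / "program.txt"
command = write_program("abs test/fixedStep.bw", program_path)

# Only run the command if wiggletools and the input file are actually present.
if shutil.which("wiggletools") and Path("test/fixedStep.bw").exists():
    subprocess.run(command, check=True)
```

Passing the command as a list avoids building one very long shell string, which is where the length limits usually bite.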
Input files
By default, the executable recognizes the file format from the suffix of the file name:
- Wiggle files
wiggletools test/fixedStep.wig
- BigWig files
wiggletools test/fixedStep.bw
- BedGraph files
wiggletools test/bedfile.bg
- Bed files
wiggletools test/overlapping.bed
- BigBed files
wiggletools test/overlapping.bb
- Bam files
Requires a .bai index file in the same directory
wiggletools test/bam.bam
- Cram files
Requires a .crai index file in the same directory
wiggletools test/cram.cram
- VCF files
wiggletools test/vcf.vcf
- BCF files
Requires a .tbi index file in the same directory
wiggletools test/bcf.bcf
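If you generate wiggletools commands from a script, it can help to check file suffixes before launching anything. The sketch below simply tabulates the suffixes used in the examples above. It is an illustration only: `FORMAT_BY_SUFFIX` and `guess_format` are hypothetical names, and the executable itself may accept other suffixes not listed here.

```python
from pathlib import Path
from typing import Optional

# Suffixes taken from the examples above; not an exhaustive list.
FORMAT_BY_SUFFIX = {
    ".wig": "Wiggle",
    ".bw": "BigWig",
    ".bg": "BedGraph",
    ".bed": "Bed",
    ".bb": "BigBed",
    ".bam": "Bam",
    ".cram": "Cram",
    ".vcf": "VCF",
    ".bcf": "BCF",
}


def guess_format(filename: str) -> Optional[str]:
    """Return the format name for a known suffix, or None otherwise."""
    return FORMAT_BY_SUFFIX.get(Path(filename).suffix)
```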
Streaming data
You can stream data into WiggleTools, e.g.:
cat test/fixedStep.wig | wiggletools -
The input data is assumed to be in Wig or BedGraph format, but can also be in Sam format:
samtools view test/bam.bam | wiggletools sam -
Operators
Iterators can be constructed from other iterators, allowing arbitrarily complex constructs to be built. We call these iterators operators. In all the examples below, the iterators are built off simple file readers (for simplicity), but you are free to replace the inputs with other iterators.
1 Unary operators
The following operators are the most straightforward, because they only read data from a single other iterator.
- abs
Returns the absolute value of an iterator's output:
wiggletools abs test/fixedStep.bw
- ln
Returns the natural log of an iterator's output:
wiggletools ln test/fixedStep.bw
- log
Returns the logarithm in an arbitrary base of an iterator's output:
wiggletools log 10 test/fixedStep.bw
- scale
Returns an iterator's output multiplied by a scalar (i.e. decimal number):
wiggletools scale 10 test/fixedStep.bw
- offset
Returns an iterator's output added to a scalar (i.e. decimal number):
wiggletools offset 10 test/fixedStep.bw
- gt
Returns contiguous boolean regions where the iterator is strictly greater than a given cutoff:
wiggletools gt 5 test/fixedStep.bw
This is useful to define regions in the apply function, or to compute information content (see below).
- lt
Returns contiguous boolean regions where the iterator is strictly less than a given cutoff:
wiggletools lt 5 test/fixedStep.bw
This is useful to define regions in the apply function, or to compute information content (see below).
- gte
Returns contiguous boolean regions where the iterator is greater than or equal to a given cutoff:
wiggletools gte 5 test/fixedStep.bw
This is useful to define regions in the apply function, or to compute information content (see below).
- lte
Returns contiguous boolean regions where the iterator is less than or equal to a given cutoff:
wiggletools lte 5 test/fixedStep.bw
This is useful to define regions in the apply function, or to compute information content (see below).
- unit
Returns 1 if the iterator's output is non-zero, 0 otherwise, and merges contiguous positions with the same output value into blocks:
wiggletools unit test/fixedStep.bw
This is useful to define regions in the apply function (see below).
- coverage
Returns a coverage plot of overlapping regions, typically read from a bed file:
wiggletools coverage test/overlapping.bed
- isZero
Does not print anything, just exits with return value 1 (i.e. error) if it encounters a non-zero value:
wiggletools isZero test/fixedStep.bw
- seek
Outputs only the points of an iterator within a given genomic region:
wiggletools seek chr1 2 8 test/fixedStep.bw
- bin
Sums results into fixed-size bins:
wiggletools bin 2 test/fixedStep.bw
- toInt
Casts the iterator's output to an int, effectively rounding any floating point values toward zero.
wiggletools toInt test/fixedStep.bw
- floor
Returns the floor of an iterator's output. Note that floor rounds the output toward negative infinity.
wiggletools floor test/fixedStep.bw
- shiftPos
Returns the given iterator with its start and end positions shifted downwards by a specified value. Note that the value must be non-negative, as the default behavior is to shift coordinates toward zero.
wiggletools shiftPos 10 test/fixedStep.bw
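Two points in the list above are easy to misread: the difference between toInt and floor on negative values, and what "contiguous boolean regions" means for the comparison operators. The Python sketch below illustrates both on toy data. It is not WiggleTools code: `gt_regions` is a hypothetical function, and the half-open `(chrom, start, end)` region layout is an assumption made for the example.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[str, int, float]   # (chrom, position, value), assumed layout
Region = Tuple[str, int, int]    # (chrom, start, end), end exclusive (assumed)

# toInt truncates toward zero, floor rounds toward negative infinity.
# They agree on positive values and differ on negative ones.
truncated = math.trunc(-2.7)   # -2, the toInt-style result
floored = math.floor(-2.7)     # -3, the floor-style result


def gt_regions(cutoff: float, points: Iterable[Point]) -> List[Region]:
    """Merge adjacent positions whose value is strictly above the cutoff
    into contiguous regions. Points must be sorted by position."""
    regions: List[Region] = []
    for chrom, pos, value in points:
        if not value > cutoff:
            continue
        last = regions[-1] if regions else None
        if last and last[0] == chrom and last[2] == pos:
            # Position directly follows the previous region: extend it.
            regions[-1] = (chrom, last[1], pos + 1)
        else:
            regions.append((chrom, pos, pos + 1))
    return regions


points = [
    ("chr1", 1, 3.0),
    ("chr1", 2, 6.0),
    ("chr1", 3, 7.5),
    ("chr1", 4, 5.0),   # equal to the cutoff, so excluded by a strict test
    ("chr1", 5, 9.0),
]
regions = gt_regions(5, points)
```

With a cutoff of 5, positions 2 and 3 merge into one region and position 5 forms a second one. Position 4 is dropped because the comparison is strict; a gte-style test would keep it and merge everything from 2 to 5.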
2 Binary operators
The following operators read data from exactly two iterators, allowing comparisons:
- diff
Returns the difference between two iterators' outputs:
wiggletools diff test/fixedStep.bw test/variableStep.bw
- ratio
Returns the output of the first iterator divided by the output of the second (divisions by 0 are squashed, and no result is given for those bases):
wiggletools ratio test/fixedStep.bw test/variableStep.bw
- overlaps
Returns the output of the second iterator that overlaps regions of the first.
wiggletools overlaps test/fixedStep.bw test/variableStep.bw
- trim
Same as overlaps, but trims the regions to the overlapping portions:
wiggletools trim test/fixedStep.bw test/variableStep.bw
- trimFill
Same as trim, but fills in trimmed regions with the default value of the second iterator.
wiggletools trimFill test/fixedStep.bw test/overlapping_coverage.wig
- nearest
Returns the regions of the second iterator and their distance to the nearest region in the first iterator.
wiggletools nearest test/fixedStep.bw test/variableStep.bw
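The note that ratio gives no result for divisions by 0 can be illustrated with a small Python sketch. This is a toy model, not the WiggleTools implementation: signals are represented as hypothetical position-to-value dictionaries, and only positions present in both inputs are considered, which is an assumption of the sketch rather than documented behavior.

```python
from typing import Dict


def ratio(first: Dict[int, float], second: Dict[int, float]) -> Dict[int, float]:
    """Divide the first signal by the second, position by position.
    Positions where the denominator is 0 are skipped entirely, so they
    are simply absent from the output."""
    out: Dict[int, float] = {}
    for pos in sorted(first.keys() & second.keys()):
        if second[pos] == 0:
            continue  # no result is given for this base
        out[pos] = first[pos] / second[pos]
    return out


numerator = {1: 4.0, 2: 3.0, 3: 1.0}
denominator = {1: 2.0, 2: 0.0, 3: 4.0}
result = ratio(numerator, denominator)
```

Position 2 does not appear in `result` at all, rather than carrying an infinity or NaN placeholder.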
3 Multiplexed iterators
However, sometimes you want to compute statistics across many iterators. In this case, the function is followed by an arbitrary list of iterators, separated by spaces. The list is terminated by a colon (:) separated by spaces from other words. At the very end of a command string,
