riboWaltz

R package for calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data

<br>

Overview

Ribosome profiling is a powerful technique used to study translation at the genome-wide level, generating unique information concerning ribosome positions along RNAs. Optimal localization of ribosomes requires the proper identification of the ribosome P-site in each ribosome protected fragment, a crucial step to determine trinucleotide periodicity of translating ribosomes and draw correct conclusions concerning where ribosomes are located. To determine the P-site within ribosome footprints at nucleotide resolution, the precise estimation of its offset with respect to the protected fragment is necessary.

riboWaltz is an R package for calculation of optimal P-site offsets, diagnostic analysis and visual inspection of ribosome profiling data. Taking advantage of a two-step algorithm where offset information is passed through populations of reads with different length in order to maximize offset coherence, riboWaltz computes with high precision the P-site offset. riboWaltz also provides a variety of graphical representations, laying the foundations for further accurate RiboSeq analyses and improved interpretation of positional information.

Citing riboWaltz

Please cite the following article when using riboWaltz:

Lauria F, Tebaldi T, Bernabò P, Groen EJN, Gillingwater TH, Viero G. riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data. PLoS Comput Biol. 2018 Aug 13;14(8):e1006169.

License

Before starting

Dependencies

riboWaltz requires R version >= 3.3.0 and the following packages:

CRAN
- data.table (>= 1.10.4.3)
- ggplot2 (>= 2.2.1)
- ggrepel (>= 0.6.5)
Bioconductor
- Biostrings (>= 2.46.0)
- GenomicAlignments (>= 1.14.1)
- GenomicFeatures (>= 1.24.5)
- GenomicRanges (>= 1.24.3)
- IRanges (>= 2.12.0)

All dependencies are automatically installed running the code in the next section.

Installation

R

To install riboWaltz directly from GitHub the devtools package is required. If not already installed on your system, run

install.packages("devtools")

Otherwise, load devtools and install riboWaltz by

library(devtools)
install_github("LabTranslationalArchitectomics/riboWaltz", dependencies = TRUE)

Please note: to install riboWaltz generating the vignette replace the last command with:

install_github("LabTranslationalArchitectomics/riboWaltz", dependencies = TRUE, 
		build_vignettes = TRUE)

Conda

riboWaltz is also available through Conda and can be installed into the current environment by following these instructions.

Loading

To load riboWaltz run

library(riboWaltz)

Getting help

The following sections illustrate how to make use of riboWaltz by introducing all functions included in the package and reporting most of the data structures and graphical outputs generated with the default options. Similar information are reported in the vignette returned by

browseVignettes("riboWaltz")

For additional examples and further details about the meaning and usage of all parameters in a function run

?function_name

help(package = riboWaltz)

A complete reference manual is available here.

Bugs and errors can be reported at the issues page on GitHub. Before filing new issues, please read the documentation and take a look at currently open and already closed discussions.

From BAM files to P-site offsets

Remarks

Note 1

riboWaltz currently works for read alignments based on transcript coordinates. This choice is due to the main purpose of RiboSeq assays to study translational events through the isolation and sequencing of ribosome protected fragments. Reads from RiboSeq are supposed to map on mRNAs and not on introns and intergenic regions. BAM based on transcript coordinates can be generated alinging the reads i) directly against transcript sequences or ii) against standard chromosome sequences and forcing the outputs to be translated in transcript coordinates.

The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner with its option -quantMode TranscriptomeSAM (see the section "Output in transcript coordinates" of its manual), is an example of tool providing such a feature.

Note 2

Multiple functions described below can accept and return both list of data tables and GRangesList objects. For convenience, all example datasets included in the package, as well as the whole ReadMe, are based on and refer to data tables.

Acquiring input files

One or multiple BAM files are read, converte

RiboWaltz

Install / Use

README