SkillAgentSearch skills...

R2Dtool

Analyse RNA feature distributions.

Install / Use

/learn @comprna/R2Dtool
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

R2Dtool

R2Dtool is a set of genomics utilities for handling, integrating, and viualising isoform-mapped RNA feature data.

  • Integrate transcriptome-mapped data: R2Dtool performs liftover of transcriptome-mapped RNA features to their corresponding genomic coordinates

  • Annotate transcriptome-mapped sites: R2Dtool annotates transcript-specific metatranscript coordinates and absolute and relative distances to annotated transcript landmarks, in an isoform-specific manner.

  • Visualise isoform-aware RNA feature distributions: R2Dtool introduces isoform-aware metatranscript plots and metajunction plots to study the positonal distribution of RNA features around annotated RNA landmarks.

R2Dtool is published in Bioinformatics
Sethi, A. J., Mateos, P. A., Hayashi, R., Shirokikh, N. & Eyras, E. R2Dtool: integration and visualization of isoform-resolved RNA features. Bioinformatics, Volume 40, Issue 8 (2024)
doi: 10.1093/bioinformatics/btae495

Contents

Quick-start guide

System requirements

R2Dtool has been tested on OS X 14.1, Red Hat Enterprise Linux 4.18.0, and should be broadly compatible with any system that support Rustc and R. We recommend using R2Dtool in an environment with at least 4 CPU cores and 12GB of free memory.

  • R2Dtool requires rustc and cargo for compilation.
  • Additionally, R, along with tidyverse, binom and ragg R packages, are requred to generate metaplots.

Compiling R2Dtool from source:

# clone R2Dtool and compile using Cargo 
git clone https://github.com/comprna/R2Dtool.git && cd R2Dtool
cargo build --release

# add R2Dtool to PATH
export PATH="$PATH:$(pwd)/target/release/"

R2Dtool tutorial

We provide a short tutorial that allows you to test R2Dtool. We provide a toy dataset of isoform-mapped m6A predictions in HeLa cells, taken from Figure 1E and 1F of the manuscript.

Input data

The input dataset, ./test/m6A_isoform_sites_GRCh38_subset.bed, contains the positions and stoichiometry of m6A sites mapped to GRCh38 transcript isoforms, in a tab-separated BED3+ format. R2Dtool can operate on any transcriptome-mapped RNA sites, where transcript ID and transcript coordinates are specified columns 1-3:

$ head -n 3 ./test/m6A_isoform_sites_GRCh38_subset.bed 
transcript         start  end   base  coverage  strand  N_valid_cov  fraction_modified
ENST00000381989.4  2682   2683  a     10        +       10           0.00
ENST00000381989.4  2744   2745  a     10        +       10           0.00

To liftover positions from RNA to DNA coordinates, R2Dtool requires a GTF2.2 annotation, provided at ./test/GRCh38.110_subset.gtf:

$ head -n 2 ./test/GRCh38.110_subset.gtf
12      ensembl_havana  transcript      4487735 4538469 .       -       .       gene_id "ENSG00000047621"; gene_version "12"; transcript_id "ENST00000261250"; transcript_version "8"; gene_name "C12orf4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "C12orf4-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS8528"; tag "basic"; tag "Ensembl_canonical"; tag "MANE_Select"; transcript_support_level "1 (assigned to previous version 7)";
12      ensembl_havana  exon    4538409 4538469 .       -       .       gene_id "ENSG00000047621"; gene_version "12"; transcript_id "ENST00000261250"; transcript_version "8"; exon_number "1"; gene_name "C12orf4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "C12orf4-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS8528"; exon_id "ENSE00001378522"; exon_version "2"; tag "basic"; tag "Ensembl_canonical"; tag "MANE_Select"; transcript_support_level "1 (assigned to previous version 7)";

More detail on R2Dtool GTF annotation requirements is provided on the R2Dtool Wiki.

R2Dtool liftover

To fetch the genomic coordinates for each transcriptome-mapped modification site, we can use R2Dtool liftover:

# liftover transcriptomic sites to genomic coordinates
$ r2d liftover -H -g ./test/GRCh38.110_subset.gtf -i ./test/m6A_isoform_sites_GRCh38_subset.bed > ./test/liftover.bed

$ head -n 3 ./test/liftover.bed 
chromosome  start     end       name  score  strand  transcript         start  end   base  coverage  strand  N_valid_cov  fraction_modified
13          24455165  24455166               -       ENST00000381989.4  2682   2683  a     10        +       10           0.00
13          24435272  24435273               -       ENST00000381989.4  3941   3942  a     20        +       20           80.00

The genomic coordinates for each site are prepended to columns 1-6 of the output in standard BED6 format. We can now explore the genomic context of m6A sites by opening liftover.bed in the genome browser, or compare the positions of our m6A sites to features previously annotated in genomic coordinates using bedtools. Liftover calculations and ouputs are described in further detail on the R2Dtool Wiki.

R2Dtool annotate

We can use R2Dtool annotate to annotate RNA features with metatranscript coordinates and distances to local transcript features:

# annotate transcriptomic-mapped m6A sites 
$ r2d annotate -H -g ./test/GRCh38.110_subset.gtf -i ./test/m6A_isoform_sites_GRCh38_subset.bed > ./test/annotate.bed

$ head -n 3  ./test/annotate.bed 
transcript         start  end   base  coverage  strand  N_valid_cov  fraction_modified  gene_id          gene_name  transcript_biotype  tx_len  cds_start  cds_end  tx_end  transcript_metacoordinate  abs_cds_start  abs_cds_end  up_junc_dist  down_junc_dist
ENST00000381989.4  2682   2683  a     10        +       10           0.00               ENSG00000102699  PARP4      protein_coding      5437    74         5246     5437    1.50425                    2608           -2564        46            150
ENST00000381989.4  2744   2745  a     10        +       10           0.00               ENSG00000102699  PARP4      protein_coding      5437    74         5246     5437    1.51624                    2670           -2502        108           88

The input sites are appended with 12 additional columns of isoform-specific annotation data, including isoform-specific metatranscript coordinates, and distances to transcript landmarks. Annotate calculations and outputs are described in further detail on the R2Dtool Wiki.

R2Dtool metafeature plots

R2Dtool contains several plotting functions which display the distribution of RNA features across transcripts and around transcript landmarks. All of these functions operate on R2Dtool annotate -H output.

To visualise the distribution of RNA features across the length of transcripts, we can make isoform-resolved metatranscript plots. Here, we will show the proportion of methylated sites (sites with >10% m6A methylation) across a normalised metatranscript model:

# make metatranscript plot using annotated sites 
$ r2d plotMetaTranscript -i "./test/annotate.bed" -o "./test/metatranscript_m6A.png" -f "fraction_modified" -u "10" -t "upper" -l 

The plot shows the proportion of sites that have greater than 10% m6A methylation in each metatranscript bin: Distribution of m6A sites along a metatranscript range

We can also study the distribution of m6A sites around exon-exon junctions in an isoform specific manner, using the plotMetaJunction function:

# make metajunction plot using annotated sites 
$ r2d plotMetaJunction -i "./test/annotate.bed" -o "./test/metajunction_m6A.png" -f "fraction_modified" -u "10" -t "upper" 

The plot shows the proportion of sites that have greater than 10% m6A methylation, and their distances to local splice-junctions:

By default, distances to downstream junctions are indicated on the right side of the plot, and distances to upstream junctions are indicated on the left side of the plot.
To reverse the focus, so that splice junctions are centered at x = 0 and distances to m6A sites are plotted around junctions, pass the '-r' flag to R2_plotMetaJunction.R. An example is shown below.

Distribution of m6A sites around exon junctions

More information about R2Dtool plots is available on the R2Dtool Wiki.

R2Dtool file formats

Input BED file columns

R2Dtool can parse RNA feature positions in transcriptome coordinates, provided in tab-delimited BED3+ file. At minimum, the RNA feature sites must contain the following fields:

  • Column 1 must contain transcript ID.
  • Column 2 and 3 must represent the coordinates of the RNA feature in zero-based, half-open coordinates.

Any nu

Related Skills

View on GitHub
GitHub Stars19
CategoryDevelopment
Updated1mo ago
Forks4

Languages

Jupyter Notebook

Security Score

95/100

Audited on Feb 26, 2026

No findings