Ribotish
Ribo-seq TIS Hunter, predicting translation initiation sites and ORFs using riboseq data
README for Ribo-TISH (0.2.8)
<2025-10-21 Peng Zhang>
Introduction
Translation is a critical step in gene regulation that synthesizes proteins from a given RNA template. The development of the ribosome profiling (riboseq) technique has enabled the measurement of translation at a genome-wide level. The basic idea of ribosome profiling is to perform deep sequencing of the ribosome-protected mRNA fragments (~30 nt), termed ribosome footprints, to determine the occupancy of translating ribosomes on a given mRNA. There are several variants of the ribosome profiling technique that are based on the use of different translation inhibitors. Regular ribo-seq uses cycloheximide (CHX), a translation elongation inhibitor, to freeze all translating ribosomes. In contrast to CHX, the translation inhibitors lactimidomycin (LTM) and harringtonine (Harr) have a much stronger effect on initiating ribosomes. The use of these two inhibitors allows for the global mapping of translation initiation sites (TISs) when they are coupled with ribosome profiling (TI-Seq). In addition, when LTM is used sequentially with puromycin (PMY), the TISs can be mapped quantitatively and can be compared between different conditions. Here we present a novel algorithm, named Ribo TIS Hunter (Ribo-TISH), for identifying translation activities using ribosome profiling data. Ribo-TISH uses statistical tests to assess the significance of translation activities. It captures significant TISs using a negative binomial test, and frame-biased open reading frames (ORFs) using a rank sum test. Ribo-TISH can also perform differential analysis between two TI-Seq datasets.
Install
Please check the file 'INSTALL.rst' in the distribution.
Usage of Ribo-TISH
::
ribotish [-h] [--version] {quality,predict,tisdiff}
:Example for quality control: ribotish quality -b ltm.bam -g gene.gtf -t
:Example for prediction: ribotish predict -t ltm.bam -b chx.bam -g gene.gtf -f genome.fa -o pred.txt
:Example for differential TIS: ribotish tisdiff -1 pred1.txt -2 pred2.txt -a qti1.bam -b qti2.bam -g gene.gtf -o diff.txt
There are 3 functions available as sub-commands.
:quality: Quality control for riboseq bam data.
:predict: Main function to predict ORF/TIS.
:tisdiff: Call differential TIS between two TIS data
The main input data is in bam file format. For best performance, reads should be trimmed (to ~ 29 nt RPF length) and aligned to the genome using end-to-end mode (no soft-clip). Intron splicing is supported. Some attributes are needed, such as NM, NH and MD. For STAR, ``--outSAMattributes All`` should be set. The bam file should be sorted and indexed with samtools_.
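The input requirements above can be spot-checked on a few alignment records before running Ribo-TISH. The following is a minimal sketch, not part of Ribo-TISH: ``check_sam_record`` is a hypothetical helper that inspects one record in SAM text form (for example, a line printed by ``samtools view``) and reports missing NM, NH or MD attributes and soft-clipped alignments.

```python
import re

REQUIRED_TAGS = ("NM", "NH", "MD")  # attributes named in the text above


def check_sam_record(line):
    """Return a list of problems found in one SAM text record (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    cigar = fields[5]  # column 6 of a SAM record is the CIGAR string
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    tags = {field.split(":", 1)[0] for field in fields[11:]}
    problems = ["missing %s tag" % tag for tag in REQUIRED_TAGS if tag not in tags]
    if re.search(r"\d+S", cigar):
        problems.append("soft-clipped alignment")
    return problems


# A made-up 29 nt end-to-end alignment carrying all three attributes.
good = "\t".join(["read1", "0", "chr1", "101", "255", "29M", "*", "0", "0",
                  "A" * 29, "I" * 29, "NH:i:1", "NM:i:0", "MD:Z:29"])
# A made-up soft-clipped alignment that lacks NM and MD.
bad = "\t".join(["read2", "0", "chr1", "101", "255", "2S27M", "*", "0", "0",
                 "A" * 29, "I" * 29, "NH:i:1"])

print(check_sam_record(good))
print(check_sam_record(bad))
```

An empty list means the record satisfies the checks in this sketch; it does not guarantee that the whole file is sorted or indexed.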
All positions or regions reported by Ribo-TISH are 0 based, half open, same as in bed_ format.
.. _samtools: https://github.com/samtools/samtools
.. _bed: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
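Because reported positions are 0 based and half open, they need shifting before being compared with 1 based, inclusive coordinates such as those in gtf files. A small sketch of the arithmetic (the function names are illustrative only, not part of Ribo-TISH):

```python
def to_one_based_inclusive(start, end):
    """Convert a 0 based, half open region to 1 based, inclusive coordinates."""
    return start + 1, end


def region_length(start, end):
    """Length of a 0 based, half open region."""
    return end - start


# The first three nucleotides of a sequence are the region (0, 3).
print(to_one_based_inclusive(0, 3))  # (1, 3)
print(region_length(0, 3))           # 3
```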
quality
Quality control of riboseq bam data. This function checks the read distribution around annotated protein coding regions on user provided transcripts, shows frame bias and estimates the P-site offset for different groups of reads. Reads are grouped by read length as well as 5' end match or mismatch. 5' end mismatch ('m0') reads often have a different distribution from matched reads. To turn off 5' end mismatch grouping, use ``--nom0``.
There are 3 output files: a txt file recording all distribution data, a pdf figure file and a python file for P-site offset parameters.
Quick examples:
For regular riboseq
::
ribotish quality -b chx.bam -g gene.gtf
For TI-Seq data
::
ribotish quality -b ltm.bam -g gene.gtf -t
Options
--------------
-b RIBOBAMPATH
``````````````
Riboseq bam data file. Reads should be trimmed and aligned to genome.
-g GENEPATH
```````````
Gene annotation file. Acceptable formats include gtf, gff, bed and genepred with gene names. The input file format can be auto detected or specified by the ``--geneformat`` option.
-o OUTPUT
`````````
Output all distribution data. Default: bampath[:-4]+'_qual.txt'. Quality and offset estimation is based on this distribution. Users can save this file and pass it to the ``-i`` option to quickly redo the estimation with different thresholds.
-t/--tis
````````
The data is TIS enriched (LTM or Harr). Quality control will pay more attention to TIS sites.
-i INPUT
````````
Input previous output file, do not read gene file and bam file again.
--geneformat GENEFORMAT
```````````````````````
Gene annotation file format (gtf, bed, gpd, gff, default: auto)
--chrmap CHRMAP
```````````````
Input chromosome id mapping table file if annotation chr ids are not the same as chr ids in bam/fasta files. Format:
========= =========
chr_name1 chr_name2
========= =========
Two columns, tab separated, no specific order requirement. Mappings such as 'chr1' to '1' can be automatically processed without using this option.
-f FIGPDFPATH
`````````````
Output pdf figure file. Default: bampath[:-4]+'_qual.pdf'
-r PARAPATH
```````````
Output offset parameter file. Default: bampath+'.para.py'. This file saves P-site offsets for different read lengths in python code dict format, and can be used in further analysis.
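The three default output names stated for ``-o``, ``-f`` and ``-r`` follow directly from the bam path. The sketch below only reproduces those string expressions; ``default_quality_outputs`` is a hypothetical name, and the slice assumes the bam path ends in '.bam'.

```python
def default_quality_outputs(bampath):
    """Default output names as described for -o, -f and -r (illustrative helper)."""
    stem = bampath[:-4]  # drops the last four characters, i.e. '.bam'
    return {
        "output": stem + "_qual.txt",   # -o default
        "figure": stem + "_qual.pdf",   # -f default
        "para": bampath + ".para.py",   # -r default keeps the full bam name
    }


print(default_quality_outputs("ltm.bam"))
```

Note that the parameter file keeps the full bam file name, so 'ltm.bam' gives 'ltm.bam.para.py'.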
-l LENS
```````
Range of tag length. Default: 25,35. The last number (35) is not included, i.e. the longest length considered is 34.
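The half open behaviour of this range matches Python's ``range``. A short sketch, assuming the option value is a plain 'start,end' string (``parse_lens`` is a hypothetical helper, not the Ribo-TISH parser):

```python
def parse_lens(lens="25,35"):
    """Turn a 'start,end' string into the read lengths it covers (end excluded)."""
    start, end = (int(part) for part in lens.split(","))
    return range(start, end)


lengths = parse_lens()
print(min(lengths), max(lengths))  # 25 34
```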
-d DIS
``````
Position range near start codon or stop codon. Default: -40,20
--bins BINS
```````````
Number of bins for cds profile. Default: 20
--nom0
```````````
Do not treat reads with a mismatch at position 0 (5' end mismatch) as a separate group.
--th TH
```````
Threshold for quality. Default: 0.5. Groups whose frame bias ratio < TH are considered low quality, and reads in these groups will not be used in further analysis. The offset for low quality groups will not be set in the parameter file.
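The thresholding step can be sketched as follows. The exact definition of the frame bias ratio is not given here, so this example assumes it is the fraction of RPF 5' ends that fall in the dominant frame; the counts and function names are made up for illustration.

```python
def frame_bias_ratio(frame_counts):
    """Fraction of reads in the dominant frame (assumed definition)."""
    total = sum(frame_counts)
    return max(frame_counts) / total if total else 0.0


def usable_groups(counts_by_length, th=0.5):
    """Keep read length groups whose assumed frame bias ratio is at least th."""
    ratios = {length: frame_bias_ratio(counts)
              for length, counts in counts_by_length.items()}
    return {length: ratio for length, ratio in ratios.items() if ratio >= th}


# Hypothetical per-frame counts for two read lengths.
counts = {28: (800, 120, 80), 31: (350, 330, 320)}
print(usable_groups(counts))  # {28: 0.8}
```

In this example the 31 nt group has a ratio of 0.35, falls below the default threshold and would receive no offset in the parameter file.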
--end3
``````````
Plot RPF 3' end profile instead of 5' end.
--colorblind
````````````
Use a color style readable for color blind people ('#F00078,#00F000,#0078F0')
--colors COLORS
```````````````
User specified Matplotlib acceptable color codes for three frames (default: 'r,g,b')
-p NUMPROC
``````````
Number of processes. Default: 1
-v/--verbose
`````````````
Increase output verbosity.
Output files
------------
OUTPUT
```````
OUTPUT is a txt file recording all distribution data in python format for each group of reads. These distributions are shown in the pdf figure file. Quality and offset estimation is based on this distribution. Users can save this file and pass it to the ``-i`` option to quickly redo the estimation with different thresholds.
Pdf figure
``````````
The pdf figure file plots all the distributions and illustrates quality and P-site offset. The left part is for 5' end matched reads and the right part is for 5' end mismatch reads if ``--nom0`` is not set.
Upper panel: the length distribution of RPFs uniquely mapped to annotated protein-coding regions.
Lower panel: different quality metrics for RPFs uniquely mapped to annotated protein-coding regions.
Each row shows the RPFs with different lengths.
- Column 1: the distribution of RPF 5’ ends in 3 frames in all annotated codons. The percentage of reads from the dominant reading frame is shown.
- Column 2: the distribution of RPF 5’ end counts near annotated TISs. The estimate of the P-site offset and TIS accuracy are also shown. RPFs of a specific length that do not pass the threshold are considered low quality and removed.
- Column 3: the distribution of RPF 5’ end counts near annotated stop codons.
- Column 4: the RPF profile throughout the protein-coding regions in 3 frames. The TIS enrich score (TIS count / CDS average) is also shown for TIS data.
Offset parameter file
`````````````````````
This file saves P-site offsets for different read lengths in python code dict format, and can be used in further analysis. The default offset file name is bampath+'.para.py', placed alongside the input bam file. The file format is like
::
offdict = {28: 12, 29: 12, 30: 12, 32: 13, 'm0': {29: 12, 30: 12, 31: 13}}
The offset parameter file is easy to interpret and can be edited by the user if the auto estimated offsets are not correct. The default file name will be auto-recognized in further analysis. If the bam file is in a different directory and the user does not want to create a parameter file in that directory, we recommend creating a link to the bam file in the current working directory, e.g. ``ln -s original/dir/ribo.bam``
Ribo-TISH does not guarantee that it can always find best P-site offset values. Users should check the quality figures and edit the parameter file if necessary.
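When checking or editing the offsets, it can help to load the parameter file and look up individual read lengths. The sketch below parses text in the format shown above without executing it. It assumes that the 'm0' entry holds the offsets for 5' end mismatch reads and that a missing length means no offset was set for that group; ``load_offdict`` and ``psite_offset`` are hypothetical helpers, not Ribo-TISH functions.

```python
import ast


def load_offdict(text):
    """Parse 'offdict = {...}' into a dict without executing the file."""
    _, _, literal = text.partition("=")
    return ast.literal_eval(literal.strip())


def psite_offset(offdict, read_length, mismatch5=False):
    """Look up the offset for a read length; None if no offset is set."""
    table = offdict.get("m0", {}) if mismatch5 else offdict
    return table.get(read_length)


text = "offdict = {28: 12, 29: 12, 30: 12, 32: 13, 'm0': {29: 12, 30: 12, 31: 13}}"
offdict = load_offdict(text)
print(psite_offset(offdict, 29))                  # 12
print(psite_offset(offdict, 31))                  # None
print(psite_offset(offdict, 31, mismatch5=True))  # 13
```

In the example file, 31 nt reads have an offset only in the 5' end mismatch group, so the matched-read lookup returns None.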
predict
This is the main function of Ribo-TISH. It predicts ORFs/TISs from riboseq bam files. It uses a negative binomial model to fit the TI-Seq background and test the significance of TIS sites. For regular riboseq data, a Wilcoxon rank sum test between in-frame reads and out-of-frame reads inside the ORF is performed.
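The two kinds of test named above can be illustrated with generic textbook versions. This is a sketch only and not the Ribo-TISH implementation: how the background is fitted, how counts are grouped and how ties are handled are not described here, and the parameters ``r`` and ``p`` and all counts below are hypothetical. The rank sum p-value uses a normal approximation without tie correction.

```python
import math


def nbinom_upper_tail(k, r, p):
    """P(X >= k) for X ~ NB(r, p), the number of failures before the r-th success."""
    def log_pmf(i):
        return (math.lgamma(i + r) - math.lgamma(i + 1) - math.lgamma(r)
                + r * math.log(p) + i * math.log(1 - p))
    return max(0.0, 1.0 - sum(math.exp(log_pmf(i)) for i in range(k)))


def rank_sum_greater(x, y):
    """One-sided rank sum test that x tends to be larger than y.

    Returns the U statistic for x and an approximate p-value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for _, group in pooled[i:j] if group == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, 0.5 * math.erfc(z / math.sqrt(2))


# How surprising is a count of 3 or more under a background with r=1, p=0.5?
print(nbinom_upper_tail(3, r=1, p=0.5))

# Hypothetical in-frame versus out-of-frame counts inside a candidate ORF.
u, pvalue = rank_sum_greater([5, 6, 7], [1, 2, 3])
print(u, pvalue)
```

A small upper-tail probability marks a count that is unlikely under the background model, and a small rank sum p-value indicates that in-frame counts tend to exceed out-of-frame counts.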
Quick examples:
Combine TI-Seq and regular riboseq data ::
ribotish predict -t ltm.bam -b chx.bam -g gene.gtf -f genome.fa -o pred.txt
For TI-Seq data only ::
ribotish predict -t ltm.bam -g gene.gtf -f genome.fa -o pred.txt
De novo ORF prediction with only regular riboseq data using longest strategy ::
ribotish predict -b chx.bam -g gene.gtf -f genome.fa --longest -o pred.txt
De novo ORF prediction with two regular riboseq data using framebest strategy ::
ribotish predict -b chx1.bam,chx2.bam -g gene.gtf -f genome.fa --framebest -o pred.txt
Only test user provided ORF candidates with two regular riboseq data ::
ribotish predict -b chx1.bam,chx2.bam -g gene.gtf -f genome.fa -i cand.txt -o pred.txt
Options
--------------
-t TISBAMPATHS
``````````````
Input TI-seq bam data files, comma separated.
-b RIBOBAMPATHS
```````````````
Regular riboseq bam data files, comma separated.
At least one bam file should be provided by either ``-t`` or ``-b``.
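When ``ribotish predict`` is driven from a script, the comma separated bam lists and the "at least one bam file" rule can be handled while assembling the command line. The sketch below only builds the argument list from flags shown in the examples above; ``build_predict_cmd`` is a hypothetical wrapper and does not run or validate anything inside Ribo-TISH.

```python
def build_predict_cmd(gene, genome, output, tis_bams=(), ribo_bams=()):
    """Assemble a 'ribotish predict' argument list (illustrative helper)."""
    if not tis_bams and not ribo_bams:
        raise ValueError("provide at least one bam file via -t or -b")
    cmd = ["ribotish", "predict"]
    if tis_bams:
        cmd += ["-t", ",".join(tis_bams)]   # TI-seq bam files, comma separated
    if ribo_bams:
        cmd += ["-b", ",".join(ribo_bams)]  # regular riboseq bam files
    cmd += ["-g", gene, "-f", genome, "-o", output]
    return cmd


cmd = build_predict_cmd("gene.gtf", "genome.fa", "pred.txt",
                        ribo_bams=["chx1.bam", "chx2.bam"])
print(" ".join(cmd))
```

The resulting list can be passed to ``subprocess.run`` once Ribo-TISH is installed.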
