SPORTS1.1 (Small non-coding RNA annotation Pipeline Optimized for rRNA- and tRNA-Derived Small RNAs)

<b>If you use SPORTS1.1 in your work, please cite these papers</b>:

SPORTS1.0: a tool for annotating and profiling non-coding RNAs optimized for rRNA- and tRNA-derived small RNAs

Junchao Shi, Eun-A Ko, Kenton M. Sanders, Qi Chen, Tong Zhou. SPORTS1.0: a tool for annotating and profiling non-coding RNAs optimized for rRNA- and tRNA-derived small RNAs. <i>Genomics, Proteomics & Bioinformatics</i> (2018) doi.org/10.1016/j.gpb.2018.04.004

Optimized identification and characterization of small RNAs with PANDORA-seq

Junchao Shi, Yunfang Zhang, Yun Li, Liwen Zhang, Xudong Zhang, Menghong Yan, Qi Chen, Ying Zhang. Optimized identification and characterization of small RNAs with PANDORA-seq. <i>Nature Protocols</i> (2025) doi.org/10.1038/s41596-025-01158-4

<a href='#require'> Requirements </a>

<a href='#install'> Installation </a>

<a href='#script'> Script description </a>

  • <a href='#usage'> Example usage </a>

<a href='#pre-compile'> Pre-compiled annotation databases instruction </a>

<a href='#self-compile'> Instruction for compiling annotation database by user </a>

<a href='#copyright'> Copyright and licensing information </a>

<a href='#history'> Update history </a>

<a href='#statistics'> Software statistics </a>

<a href='#disclaimer'> Disclaimer </a>

<a href='#contact'> Contact information </a>

Requirements <a id='require'></a>

A Linux system with enough disk space and RAM for the size of your RNA deep sequencing data. (Tested systems: Ubuntu 12.04 LTS, Ubuntu 16.04 LTS)

Installation <a id='install'></a>

  1. Download SPORTS1.1 pipeline package.

    wget https://github.com/junchaoshi/SPORTS1.1/archive/master.zip

  2. Download necessary software, packages and reference databases as listed below:

    1. Perl 5 (https://www.perl.org) (Tested version: v5.14.2, v5.22.1); Perl 5 may already be installed on your Linux system.

    2. Bowtie [1] (http://bowtie-bio.sourceforge.net/index.shtml) (Tested version: 1.1.2, 1.2.1.1)

    3. SRA Toolkit (https://github.com/ncbi/sra-tools) (Tested version: 2.8.2)

    4. cutadapt [2] (http://cutadapt.readthedocs.io/en/stable/index.html) (Tested version: 1.11)

    5. R (https://www.r-project.org/) (Tested version: 3.2.3, 3.2.5)

    6. Reference database (See lists and download link of all pre-compiled species’ databases in Pre-compiled Databases Instruction)

  3. Installation tutorial for software and packages.

    1. Install SPORTS1.1

      1. Unpack SPORTS1.1 package.

        unzip SPORTS1.1-master.zip

      2. Attach the SPORTS directory to your PATH:

        echo 'export PATH=$PATH:your_path_to_SPORTS1.1-master/source' >> ~/.bashrc

        chmod 755 your_path_to_SPORTS1.1-master/source/sports.pl

    2. Install Bowtie

      1. Unpack bowtie-1.x.x-linux-x86_64.zip.

        unzip bowtie-1.x.x-linux-x86_64.zip

      2. Attach the bowtie directory to your PATH:

        echo 'export PATH=$PATH:your_path_to_bowtie' >> ~/.bashrc

      If you have administrator privileges, you can instead install bowtie with the following command (enter your password when prompted):
      
      sudo apt-get install bowtie
      
    3. Install SRA Toolkit

      1. Unpack SRA toolkit files.

      2. Attach the SRA Toolkit executable path to your PATH:

        echo 'export PATH=$PATH:your_path_to_sra-toolkit/bin' >> ~/.bashrc

    4. Install cutadapt

      1. Use pip on the command line to install the latest version of cutadapt:

        pip install --user --upgrade cutadapt

      2. Attach the cutadapt directory to your PATH:

        echo 'export PATH=$PATH:$HOME/.local/bin' >> ~/.bashrc

    5. Install R and R package

      1. Unpack R-x.y.z.tar.gz with:

        tar -xf R-x.y.z.tar.gz

      2. Enter into the R-x.y.z directory:

        cd R-x.y.z

      3. Type the following commands in the terminal:

        ./configure
        
        make
        
        make check
        
        make install
        
      4. Install the R packages by typing the following commands in the terminal:

        R
        
        install.packages('ggplot2', dependencies=TRUE, repos='http://cran.rstudio.com/')
        
        install.packages('data.table', dependencies=TRUE, repos='http://cran.rstudio.com/')
        
        install.packages('stringr', dependencies=TRUE, repos='http://cran.rstudio.com/')
        
        q()
        
        n
        
  4. Reload the shell configuration (or start a new shell session) to apply the changes to environment variables:

    source ~/.bashrc

  5. Test if everything is installed properly:

    perl -v
    
    sports.pl -h
    
    bowtie
    
    fastq-dump
    
    cutadapt -h
    
    R --version
    
    If any of these commands reports an error, reinstall the corresponding software.
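The checks above can also be run in one pass. The following is a minimal sketch, assuming a POSIX shell; `missing_tools` is a hypothetical helper and not part of SPORTS1.1. It only tests whether each command is on your PATH, not whether the tested versions listed above are installed.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of SPORTS1.1.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Commands taken from the installation test list above.
missing=$(missing_tools perl sports.pl bowtie fastq-dump cutadapt R)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    printf 'All required tools were found on PATH.\n'
fi
```

Any name printed under "Not found on PATH" points at the installation step to repeat.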
    

Script description <a id='script'></a>

sports.pl

  1. Input query format:

    1. .sra files.

    2. .fastq/.fq, .fasta/.fa files of deep sequencing reads.

    Attention: compressed files must be decompressed before they are used as input!
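If your reads are compressed, one way to prepare them is sketched below. It assumes gzip compression (a `.gz` suffix), which this README does not specify; `decompress_reads` is a hypothetical helper, not part of SPORTS1.1. The compressed original is left in place.

```shell
# Decompress a gzip-compressed read file next to the original and print the
# path of the decompressed copy. decompress_reads is a hypothetical helper.
decompress_reads() {
    src=$1
    case $src in
        *.gz) ;;
        *) printf 'not a .gz file: %s\n' "$src" >&2; return 1 ;;
    esac
    dest=${src%.gz}    # strip the .gz suffix to get the output name
    gzip -dc "$src" > "$dest" && printf '%s\n' "$dest"
}
```

For example, `decompress_reads /foo/bar/fold_1/seq_1.fastq.gz` would write `/foo/bar/fold_1/seq_1.fastq`, which can then be passed to `-i`.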
    
  2. Options:

    --Input:

     -i <file> Input could be: 
     
         a .sra, .fastq/.fq or .fasta/.fa file;
    
         a directory (will run all qualified files in the directory recursively); 
    
         a text document (with suffix .txt) listing the absolute path of each file/folder, one path per line (for processing multiple datasets in one run)
    

    --Output:

     -o <str> output address of annotation results (default: input address)
      
     -k keep all the intermediate files generated during the run
     
    

    --Alignment:

     -l <int> the minimal length of the output sequences (default = 15)
    
     -L <int> the maximal length of the output sequences (default = 45)
    
     -M <int> the total number of mismatches in the entire alignment (default = 0)
     
     -a  Remove 5' / 3' adapters
    
         -x <str> (if -a applied) 5' adapter sequence. Default = "GTTCAGAGTTCTACAGTCCGACGATC"
    
         -y <str> (if -a applied) 3' adapter sequence. Default = "TGGAATTCTCGGGTGCCAAGG"
     
    

    --Others:

     -v print version information
     
     -h print this usage message
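To illustrate what the default length window (`-l 15`, `-L 45`) means for a set of reads, here is a small stand-alone sketch. It is not how sports.pl applies these options internally; `filter_by_length` is a hypothetical helper, and it assumes a FASTA file with one sequence line per record.

```shell
# Print only the FASTA records whose sequence length lies in [min, max].
# Assumes one sequence line per record; filter_by_length is a hypothetical helper.
filter_by_length() {
    awk -v min="$1" -v max="$2" '
        /^>/ { header = $0; next }
        length($0) >= min && length($0) <= max { print header; print $0 }
    ' "$3"
}
```

Running `filter_by_length 15 45 reads.fa` would keep a 20-nt read and drop a 4-nt read.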
     
    
  3. Example <a id='usage'></a>

    <b>Detailed instructions for running the demo small RNA-seq data analysis, along with example output results, are provided in the Demo folder. </b>

    • Example use 1:

    The user wants to map a single FASTA file against the rat reference genome to obtain the genome mapping annotation only (no output figures).

    Type the following command in the terminal:

    sports.pl -i reads.fa -g /foo/bar/Rattus_norvegicus/UCSC/rn6/Sequence/BowtieIndex/genome

    • Example use 2:

    The user wants to map several already trimmed human sequencing files to the human reference genome, miRNA database, tRNA database, rRNA database and piRNA database using 4 CPU threads, and to write the results to '/foo/bar/output/'.

    Write the paths of all the sequencing files into a text document with the suffix .txt, e.g.:

    seq_address.txt
    ---------------------------
    /foo/bar/fold_1/seq_1.fastq
    /foo/bar/fold_2/seq_2.fq
    /foo/bar/fold_2/seq_3.fq
    /foo/bar/fold_3/seq_4.fasta
    /foo/bar/fold_4/seq_5.fa
    ---------------------------
    

    Type the following command in the terminal:

    sports.pl -i seq_address.txt -p 4 -g /foo/bar/Homo_sapiens/genome/hg38/genome -m /foo/bar/Homo_sapiens/miRBase/21/miRBase_21-hsa -r /foo/bar/Homo_sapiens/rRNAdb/human_rRNA -t /foo/bar/Homo_sapiens/GtRNAdb/hg19/hg19-tRNAs -w /foo/bar/Homo_sapiens/piRBase/piR_human -o /foo/bar/output/
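The path list used in this example does not have to be typed by hand. The sketch below generates it with `find`; `list_read_files` is a hypothetical helper, not part of SPORTS1.1, and the suffixes it matches are the input formats listed under "Input query format".

```shell
# Print the absolute path of every supported read file under a directory,
# one per line. list_read_files is a hypothetical helper, not part of SPORTS1.1.
list_read_files() {
    root=$(cd "$1" && pwd) || return 1
    find "$root" -type f \( -name '*.sra' -o -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fasta' -o -name '*.fa' \) | sort
}
```

For example, `list_read_files /foo/bar > seq_address.txt` writes one absolute path per line, which is the layout expected for a `.txt` input.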

    • Example use 3:

    The user wants to map several untrimmed mouse sequencing files, downloaded from NCBI or elsewhere, to the mouse reference genome, miRNA database, tRNA database, rRNA database, piRNA database, Ensembl non-coding RNA database and Rfam database using 4 CPU threads, write the results to '/foo/bar/output/', and keep all the intermediate files generated during the run.

    Put all the sequencing files into a folder, e.g.:
    
    folder structure:
    
    ------------------------
    download_seq
       │
       ├─fold_1
       │   │
       │   ├─seq_1.sra
       │   │
       │   └─seq_2.sra
       │
       ├─fold_2
       │   │
       │   ├─fold_3
       │   │   │
       │   │   ├─seq_3.fastq
       │   │   │
       │   │   └─seq_4.fq
       │   │
       │   └─seq_5.fasta
       │
       └─seq_6.fa
    ------------------------
    

    Type the following command in the terminal:

    sports.pl -i /foo/bar/download_seq/ -p 4 -a -x GTTCAGAGTTCTACAGTCCGACGATC -y TGGAATTCTCGGGTGCCAAGG -g /foo/bar/Mus_musculus/genome/mm10/genome -m /foo/bar/Mus_musculus/miRBase/21/miRbase_21-mmu -r /foo/bar/Mus_musculus/rRNAdb/mouse_rRNA -t /foo/bar/Mus_musculus/GtRNAdb/mm10/mm10-tRNAs -w /foo/bar/Mus_musculus/piRBase/piR_mouse -e /foo/bar/Mus_musculus/Ensembl/Mus_musculus.GRCm38.ncrna -f /foo/bar/Mus_musculus/Rfam/12.3/Rfam-12.3-mouse -o /foo/bar/output/ -k
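A command this long is easier to review when it is assembled from variables. The following is a dry-run sketch that only prints the command and does not run sports.pl. `DB_ROOT` and `print_sports_cmd` are hypothetical names; the flags and placeholder paths are copied from the example above.

```shell
# Dry-run sketch: print the Example use 3 command assembled from variables.
# DB_ROOT and print_sports_cmd are hypothetical names; the flags and paths
# are copied from the example above.
DB_ROOT=/foo/bar/Mus_musculus

print_sports_cmd() {
    input=$1
    output=$2
    # printf reuses the '%s ' format for every argument, so each word is
    # printed followed by a single space.
    printf '%s ' sports.pl -i "$input" -p 4 \
        -a -x GTTCAGAGTTCTACAGTCCGACGATC -y TGGAATTCTCGGGTGCCAAGG \
        -g "$DB_ROOT/genome/mm10/genome" \
        -m "$DB_ROOT/miRBase/21/miRbase_21-mmu" \
        -r "$DB_ROOT/rRNAdb/mouse_rRNA" \
        -t "$DB_ROOT/GtRNAdb/mm10/mm10-tRNAs" \
        -w "$DB_ROOT/piRBase/piR_mouse" \
        -e "$DB_ROOT/Ensembl/Mus_musculus.GRCm38.ncrna" \
        -f "$DB_ROOT/Rfam/12.3/Rfam-12.3-mouse" \
        -o "$output" -k
    printf '\n'
}

print_sports_cmd /foo/bar/download_seq/ /foo/bar/output/
```

Once the printed command looks right, it can be pasted into the terminal as is.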

  4. Example output file structure for 1 query file input (e.g. SeqFile):

    Output folder structure
       │
       ├─1_SeqFile
       │   │
       │   ├─SeqFile_fa (if -k applied)
       │   │   │
       │   │   ├SeqFile.fa					---unique seqs with reads number
       │   │   │
       │   │   ├
