TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.

Version: 2.2.3

NOTE TEtranscripts and TEcount rely on specially curated GTF files, which are not packaged with this software due to their size. Please go to `our website <https://www.mghlab.org/software/tetranscripts>`_ for instructions on downloading the curated GTF files.

TEtranscripts and TEcount take RNA-seq (and similar) data and annotate reads to both genes and transposable elements. TEtranscripts then performs differential analysis using DESeq2.

`GitHub page <https://github.com/mhammell-laboratory/TEtranscripts>`_

`PyPI page <https://pypi.python.org/pypi/TEtranscripts>`_

`Molly Gale Hammell Lab <https://www.mghlab.org/software>`_

Created by Ying Jin, Eric Paniagua, Oliver Tam & Molly Gale Hammell, February 2014

Contact: mghcompbio@gmail.com

Requirements

Python: 2.7.x or >= 3.2.x (tested on Python 2.7.11 and 3.7.7)

pysam: 0.9.x or greater

R: 2.15.x or greater

DESeq2: 1.10.x or greater

Installation

1. Download the compressed tarball.
2. Unpack the tarball.
3. Navigate into the unpacked directory.
4. Run the following::

$ python setup.py install

If you want to install locally (e.g. /local/home/usr), run this command instead::

$ python setup.py install --prefix /local/home/usr

NOTE In the above example, you must add ::

/local/home/usr/bin

to the PATH variable, and ::

/local/home/usr/lib/pythonX.Y/site-packages

to the PYTHONPATH variable, where X is the major and Y the minor Python version (e.g. python2.7 for Python 2.7.x, or python3.6 for Python 3.6.x).
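As a convenience, the export lines for a local install can be generated for the running interpreter. This is a minimal sketch, assuming the common POSIX layout (`<prefix>/bin` and `<prefix>/lib/pythonX.Y/site-packages`) that `--prefix` installs usually produce. `env_exports` is a hypothetical helper, not part of TEtranscripts, and some systems install into `lib64` or another directory, so check the actual path after installing.

```python
import sys


def env_exports(prefix, version=None):
    """Build shell 'export' lines for a `python setup.py install --prefix` location.

    Assumes the common POSIX layout <prefix>/bin and
    <prefix>/lib/pythonX.Y/site-packages; some systems differ (e.g. lib64).
    """
    if version is None:
        version = sys.version_info[:2]  # (major, minor) of the running interpreter
    major, minor = version
    bin_dir = "{}/bin".format(prefix)
    site_dir = "{}/lib/python{}.{}/site-packages".format(prefix, major, minor)
    return [
        'export PATH="{}:$PATH"'.format(bin_dir),
        # ${PYTHONPATH:+:...} avoids a trailing ':' when PYTHONPATH is unset
        'export PYTHONPATH="{}${{PYTHONPATH:+:$PYTHONPATH}}"'.format(site_dir),
    ]


if __name__ == "__main__":
    # Paste the output into ~/.bashrc (or your shell's equivalent)
    for line in env_exports("/local/home/usr"):
        print(line)
```

The `.format` calls (rather than f-strings) keep the sketch usable on Python 2.7 as well as Python 3.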

Alternative Singularity Installation for HPC

Many High Performance Computing (HPC) clusters provide Singularity, which can download and run containers. TEtranscripts is also available as a Docker container, which Singularity can download as follows::

singularity pull tetranscripts.sif docker://mhammelllab/tetranscripts:latest

Execution is then through singularity as well::

singularity exec tetranscripts.sif TEtranscripts -t <treatment sample> -c <control sample> --GTF <genic-GTF-file> --TE <TE-GTF-file>
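When submitting Singularity jobs from a script, it can help to build the command as a list rather than a string. The sketch below wraps the arguments shown above in `singularity exec`; the `singularity_command` helper is hypothetical, not part of TEtranscripts. Singularity typically mounts your home and current directories automatically, so the optional `binds` parameter adds `--bind` for inputs stored elsewhere, assuming your site's configuration allows it.

```python
import shlex


def singularity_command(image, tool_args, binds=()):
    """Wrap a TEtranscripts/TEcount argument list in `singularity exec`.

    binds: host directories to mount into the container (hypothetical
    helper parameter, passed to Singularity's --bind option).
    """
    cmd = ["singularity", "exec"]
    for path in binds:
        cmd += ["--bind", path]
    return cmd + [image] + list(tool_args)


if __name__ == "__main__":
    args = ["TEtranscripts", "-t", "RNAseq1.bam", "RNAseq2.bam",
            "-c", "CtlRNAseq1.bam", "CtlRNAseq2.bam",
            "--GTF", "gene_annot.gtf", "--TE", "te_annot.gtf"]
    cmd = singularity_command("tetranscripts.sif", args, binds=["/data"])
    # Print a copy-pasteable command; run it with subprocess.run(cmd, check=True)
    print(" ".join(shlex.quote(a) for a in cmd))
```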

TEtranscripts

Usage

usage: TEtranscripts -t treatment sample [treatment sample ...]
                     -c control sample [control sample ...]
                     --GTF genic-GTF-file
                     --TE TE-GTF-file
                     [optional arguments]

Required arguments:
  -t | --treatment [treatment sample 1 treatment sample 2...]
     Sample files in group 1 (e.g. treatment/mutant), separated by space
  -c | --control [control sample 1 control sample 2 ...]
     Sample files in group 2 (e.g. control/wildtype), separated by space
  --GTF genic-GTF-file  GTF file for gene annotations
  --TE TE-GTF-file      GTF file for transposable element annotations

Optional arguments:

  *Input/Output options*
  --format [input file format]
     Input file format: BAM or SAM. DEFAULT: BAM
  --stranded [option]   Is this a stranded library? (no, forward, or reverse).
             no      -  Library is unstranded
             forward -  "Second-strand" cDNA library
                        (e.g. QIAseq stranded)
             reverse -  "First-strand" cDNA library
                        (e.g. Illumina TruSeq stranded)
                        DEFAULT: no.
  --sortByPos           Input file is sorted by chromosome position.
  --project [name]      Prefix used for output files (e.g. project name)
                        DEFAULT: TEtranscript_out
  --outdir [directory]  Directory for output files.
                        DEFAULT: current directory

  *Analysis options*
  --mode [TE counting mode]
     How to count TE:
        uniq        (unique mappers only)
        multi       (distribute among all alignments).
     DEFAULT: multi
  --minread [min_read] read count cutoff. DEFAULT: 1
  -L | --fragmentLength [fragLength]
     Average length of fragment used for single-end sequencing
     DEFAULT: For paired-end, estimated from the input alignment file. For single-end, ignored by default.
  -i | --iteration
     Maximum number of iterations used to optimize multi-read assignment. DEFAULT: 100
  -p | --padj [pvalue]
     FDR cutoff for significance. DEFAULT: 0.05
  -f | --foldchange [foldchange]
     Fold-change ratio (absolute) cutoff for differential expression.
     DEFAULT: 1

  *DESeq1 compatibility options*
  --DESeq
     Use DESeq (instead of DESeq2) for differential analysis.
  -n | --norm [normalization]
     Normalization method : DESeq_default (default normalization method of DESeq), TC (total annotated read counts), quant (quantile normalization). Only applicable if DESeq is used instead of DESeq2.
     DEFAULT: DESeq_default

  *Other options*
  -h | --help
     Show help message
  --verbose [number]
     Set verbose level.
       0: only show critical messages
       1: show additional warning messages
       2: show process information
       3: show debug messages
     DEFAULT: 2
  --version
     Show program's version and exit
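The `--stranded` choices describe which strand a read is expected to come from relative to the annotated gene or TE. The sketch below shows the conventional interpretation used by other read counters (e.g. HTSeq's `-s yes` and `-s reverse`): for `forward`, read 1 of a pair (or a single-end read) lies on the same strand as the feature; for `reverse`, it lies on the opposite strand; mate 2 is flipped in both cases. It is an illustration under those assumptions, not TEtranscripts' internal code, and `strand_compatible` and its parameters are hypothetical names.

```python
def strand_compatible(stranded, feature_strand, read_is_reverse, is_read2=False):
    """Return True if an alignment's strand fits a feature under a library type.

    stranded: "no", "forward" or "reverse" (the --stranded choices)
    feature_strand: "+" or "-" from the GTF
    read_is_reverse: True if the alignment is on the reverse strand
    is_read2: True for mate 2 of a paired-end fragment
    """
    if stranded == "no":
        return True  # unstranded: any orientation counts
    if stranded not in ("forward", "reverse"):
        raise ValueError("unknown library type: %r" % stranded)
    read_strand = "-" if read_is_reverse else "+"
    # Mate 2 is sequenced from the opposite strand of mate 1, so flip it
    if is_read2:
        read_strand = "+" if read_strand == "-" else "-"
    if stranded == "forward":
        return read_strand == feature_strand  # second-strand library: sense reads
    return read_strand != feature_strand  # first-strand library: antisense reads
```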

NOTE BAM files must be either unsorted or sorted by queryname. If the BAM files are sorted by position, please use the :code:`--sortByPos` option.
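To decide whether `--sortByPos` is needed, one option is to read the sort order declared in the BAM header (the `SO` field of the `@HD` line). This is a minimal sketch, assuming a pysam version that provides `AlignmentHeader.to_dict()` (newer than the 0.9.x minimum listed above). `sort_order_flag` and `bam_needs_sort_by_pos` are hypothetical helpers. The header only records what the writing tool declared, so a missing or `unknown` value returns `None` and the file should be checked by other means.

```python
def sort_order_flag(header):
    """Suggest whether --sortByPos is needed from a SAM header dict.

    header: dict shaped like pysam's AlignmentHeader.to_dict() output.
    Returns True (coordinate-sorted), False (queryname or unsorted),
    or None (sort order not declared).
    """
    so = header.get("HD", {}).get("SO", "unknown")
    if so == "coordinate":
        return True
    if so in ("queryname", "unsorted"):
        return False
    return None


def bam_needs_sort_by_pos(path):
    # Imported here so sort_order_flag stays usable without pysam installed
    import pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        return sort_order_flag(bam.header.to_dict())
```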

Example Command Lines

If BAM files are unsorted, or sorted by queryname::

TEtranscripts --format BAM --mode multi -t RNAseq1.bam RNAseq2.bam -c CtlRNAseq1.bam CtlRNAseq2.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_nosort_test

If BAM files are sorted by coordinates/position::

TEtranscripts --sortByPos --format BAM --mode multi -t RNAseq1.bam RNAseq2.bam -c CtlRNAseq1.bam CtlRNAseq2.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_sorted_test

Cluster Usage Recommendations

In our experience, we recommend around 20-30 GB of memory for analyzing human samples (hg19) with around 20-30 million mapped reads when running on a cluster.

TEcount

Usage

usage: TEcount -b RNAseq BAM
               --GTF genic-GTF-file
               --TE TE-GTF-file
               [optional arguments]

Required arguments:
  -b | --BAM alignment-file  RNAseq alignment file (BAM preferred)
  --GTF genic-GTF-file       GTF file for gene annotations
  --TE TE-GTF-file           GTF file for transposable element annotations

Optional arguments:

  *Input/Output options*
  --format [input file format]
     Input file format: BAM or SAM. DEFAULT: BAM
  --stranded [option]   Is this a stranded library? (no, forward, or reverse).
             no      -  Library is unstranded
             forward -  "Second-strand" cDNA library
                        (e.g. QIAseq stranded)
             reverse -  "First-strand" cDNA library
                        (e.g. Illumina TruSeq stranded)
                        DEFAULT: no.
  --sortByPos           Input file is sorted by chromosome position.
  --project [name]      Prefix used for output files (e.g. project name)
                        DEFAULT: TEcount_out
  --outdir [directory]  Directory for output files.
                        DEFAULT: current directory

  *Analysis options*
  --mode [TE counting mode]
     How to count TE:
        uniq        (unique mappers only)
        multi       (distribute among all alignments).
     DEFAULT: multi
  -L | --fragmentLength [fragLength]
     Average length of fragment used for single-end sequencing
     DEFAULT: For paired-end, estimated from the input alignment file. For single-end, ignored by default.
  -i | --iteration
     Maximum number of iterations used to optimize multi-read assignment. DEFAULT: 100

  *Other options*
  -h | --help
     Show help message
  --verbose [number]
     Set verbose level.
       0: only show critical messages
       1: show additional warning messages
       2: show process information
       3: show debug messages
     DEFAULT: 2
  --version
     Show program's version and exit
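In `multi` mode, reads that align to several TEs are split among them, and `-i/--iteration` caps the number of refinement rounds. The sketch below illustrates that general idea with a plain expectation-maximization (EM) loop over hypothetical feature IDs: each round shares every read among its candidate features in proportion to their current estimated abundance, then re-estimates abundances from those shares. It is a simplified stand-in, not the TEtranscripts algorithm; for instance, it makes no adjustment for feature length or alignment quality.

```python
from collections import defaultdict


def em_distribute(reads, max_iter=100, tol=1e-6):
    """Split multi-mapped reads among candidate features by expectation-maximization.

    reads: one list of candidate feature IDs per read (length 1 = unique mapper).
    Returns {feature: expected (possibly fractional) read count}.
    """
    if not reads:
        return {}
    features = sorted({f for cands in reads for f in cands})
    abundance = {f: 1.0 / len(features) for f in features}  # uniform start
    counts = {}
    for _ in range(max_iter):
        # E-step: share each read among its candidates in proportion to abundance
        counts = defaultdict(float)
        for cands in reads:
            total = sum(abundance[f] for f in cands)
            for f in cands:
                counts[f] += abundance[f] / total
        # M-step: new abundance = fraction of all reads assigned to each feature
        new_abundance = {f: counts[f] / len(reads) for f in features}
        delta = max(abs(new_abundance[f] - abundance[f]) for f in features)
        abundance = new_abundance
        if delta < tol:
            break
    return dict(counts)
```

With three unique reads on `A`, one on `B`, and two reads shared by both, the estimate converges to about 4.5 for `A` and 1.5 for `B`: the shared reads end up split in the 3:1 ratio suggested by the unique evidence.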

NOTE BAM files must be either unsorted or sorted by queryname. If the BAM files are sorted by position, please use the :code:`--sortByPos` option.

Example Command Lines

If BAM files are unsorted, or sorted by queryname::

TEcount --format BAM --mode multi -b RNAseq.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_nosort_test

If BAM files are sorted by coordinates/position::

TEcount --sortByPos --format BAM --mode multi -b RNAseq.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_sorted_test
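For many samples, the per-sample commands can be generated rather than typed. This is a minimal sketch using only the options shown above; the `tecount_commands` helper and the convention of naming each `--project` after its BAM file are assumptions, and how each command is submitted to a cluster depends on your scheduler.

```python
import os
import shlex


def tecount_commands(bams, gene_gtf, te_gtf, sorted_by_pos=False, mode="multi"):
    """Build one TEcount command per BAM so each can run as a separate job."""
    commands = []
    for bam in bams:
        # Hypothetical naming scheme: the BAM file name (minus .bam) becomes --project
        project = os.path.splitext(os.path.basename(bam))[0]
        cmd = ["TEcount", "--format", "BAM", "--mode", mode, "-b", bam,
               "--GTF", gene_gtf, "--TE", te_gtf, "--project", project]
        if sorted_by_pos:
            cmd.insert(1, "--sortByPos")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    bams = ["RNAseq1.bam", "RNAseq2.bam", "CtlRNAseq1.bam", "CtlRNAseq2.bam"]
    for cmd in tecount_commands(bams, "gene_annot.gtf", "te_annot.gtf"):
        # Each line can become the body of one job submission (scheduler-specific)
        print(" ".join(shlex.quote(a) for a in cmd))
```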

Cluster Usage Recommendations

TEcount is better suited than TEtranscripts for use in a cluster environment, as each sample (e.g. each replicate of an experiment) can be quantified on a separate node. The output can then be merged into a single count table for differential analysis. In our experience, we recommend around 20-30 GB of memory for analyzing human samples (hg19) with around 20-30 million mapped reads when running on a cluster.
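The merge step can be done with any table tool. The sketch below assumes each TEcount output is a tab-separated table with one header row, feature IDs in the first column and counts in the second; check this against your actual output files before relying on it. The helper names, the example feature IDs, and the `gene/TE` header label are hypothetical. Because DESeq2 requires integer counts, any fractional values would need rounding before the merged table is passed to DESeq2.

```python
import csv
import io
import sys


def read_counts(lines):
    """Parse one per-sample count table (assumed: tab-separated, one header row,
    feature ID in column 1, count in column 2)."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the assumed header row
    return {row[0]: float(row[1]) for row in reader if row}


def merge_count_tables(tables):
    """tables: {sample_name: {feature: count}} -> (sample_names, rows).
    Features missing from a sample are filled with 0."""
    samples = list(tables)
    features = sorted(set().union(*(t.keys() for t in tables.values())))
    rows = [[f] + [tables[s].get(f, 0.0) for s in samples] for f in features]
    return samples, rows


def write_merged(samples, rows, out):
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene/TE"] + samples)  # header label is an assumption
    writer.writerows(rows)


if __name__ == "__main__":
    a = read_counts(io.StringIO("gene/TE\tA.bam\nGene1\t10\nTE1\t5\n"))
    b = read_counts(io.StringIO("gene/TE\tB.bam\nGene1\t7\n"))
    samples, rows = merge_count_tables({"A": a, "B": b})
    write_merged(samples, rows, sys.stdout)
```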

Recommendations for TEtranscripts input files

TEtranscripts can perform transposable element quantification from alignment results (e.g. BAM files) generated from a variety of programs. Given the variety of experimental systems, we could not provide an optimal alignment strategy for every approach. Therefore, we recommend that users identif
