TEtranscripts
A package for including transposable elements in differential enrichment analysis of sequencing datasets.
Version: 2.2.3
NOTE TEtranscripts and TEcount rely on specially curated GTF files, which are not
packaged with this software due to their size. Please go to
`our website <https://www.mghlab.org/software/tetranscripts>`_
for instructions to download the curated GTF files.
TEtranscripts and TEcount take RNA-seq (and similar) data and assign reads to both genes and transposable elements. TEtranscripts then performs differential analysis using DESeq2.
`GitHub Page <https://github.com/mhammell-laboratory/TEtranscripts>`_
`PyPI Page <https://pypi.python.org/pypi/TEtranscripts>`_
`Molly Gale Hammell Lab <https://www.mghlab.org/software>`_
Created by Ying Jin, Eric Paniagua, Oliver Tam & Molly Gale Hammell, February 2014
Copyright (C) 2014-2023 Ying Jin, Eric Paniagua, Talitha Forcier, Oliver Tam & Molly Gale Hammell
Contact: mghcompbio@gmail.com
Requirements
Python: 2.7.x or >= 3.2.x (tested on Python 2.7.11 and 3.7.7)
pysam: 0.9.x or greater
R: 2.15.x or greater
DESeq2: 1.10.x or greater
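The installed versions can be compared against the list above with a quick check along these lines. This is only a sketch: ``check_requirements`` is a hypothetical helper, not something shipped with TEtranscripts, and it assumes that ``python`` (or ``python3``) and ``Rscript`` on the PATH are the interpreters TEtranscripts will actually use.

```shell
# Hypothetical helper (not part of TEtranscripts): print the version of each
# requirement, or "not found" when it cannot be located.
check_requirements() {
    py=$(command -v python || command -v python3) || py=""
    if [ -n "$py" ]; then
        "$py" --version 2>&1
        "$py" -c 'import pysam; print("pysam " + pysam.__version__)' 2>/dev/null \
            || echo "pysam: not found"
    else
        echo "Python: not found"
        echo "pysam: not found"
    fi
    if command -v Rscript >/dev/null 2>&1; then
        Rscript -e 'cat("R", as.character(getRversion()), "\n")' 2>/dev/null
        Rscript -e 'cat("DESeq2", as.character(packageVersion("DESeq2")), "\n")' 2>/dev/null \
            || echo "DESeq2: not found"
    else
        echo "R: not found"
        echo "DESeq2: not found"
    fi
}

check_requirements
```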
Installation
- Download compressed tarball.
- Unpack tarball.
- Navigate into unpacked directory.
- Run the following::
$ python setup.py install
If you want to install locally (e.g. /local/home/usr), run this command instead::
$ python setup.py install --prefix /local/home/usr
NOTE In the above example, you must add ::
/local/home/usr/bin
to the PATH variable, and
::
/local/home/usr/lib/pythonX.Y/site-packages
to the PYTHONPATH variable, where X refers to the major
python version, and Y refers to the minor python version.
(e.g. python2.7 if using python version 2.7.x, and
python3.6 if using python version 3.6.x)
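In a Bourne-style shell, the two variables could be set roughly as follows, either in the current session or in a startup file such as ``~/.bashrc``. This is a minimal sketch that reuses the example prefix ``/local/home/usr`` from above. The names ``PREFIX`` and ``PYVER`` are illustrative only, and ``PYVER`` must match the Python version that ran ``setup.py``.

```shell
# Illustrative variables (not part of TEtranscripts): adjust both to your setup.
PREFIX=/local/home/usr      # the value passed to --prefix
PYVER=3.6                   # X.Y of the Python used for the installation

# Prepend the install locations so the shell and Python search them first.
# The ${VAR:+:$VAR} form avoids a stray ":" when the variable was empty.
export PATH="$PREFIX/bin${PATH:+:$PATH}"
export PYTHONPATH="$PREFIX/lib/python$PYVER/site-packages${PYTHONPATH:+:$PYTHONPATH}"
```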
Alternative Singularity Installation for HPC
Many high-performance computing (HPC) clusters provide Singularity, which can download and run containers. TEtranscripts is also available as a Docker container, which Singularity can pull as follows::
singularity pull tetranscripts.sif docker://mhammelllab/tetranscripts:latest
The container is then also run through Singularity::
singularity exec tetranscripts.sif TEtranscripts -t <treatment sample> -c <control sample> --GTF <genic-GTF-file> --TE <TE-GTF-file>
TEtranscripts
Usage
::
usage: TEtranscripts -t treatment sample [treatment sample ...]
-c control sample [control sample ...]
--GTF genic-GTF-file
--TE TE-GTF-file
[optional arguments]
Required arguments:
-t | --treatment [treatment sample 1 treatment sample 2...]
Sample files in group 1 (e.g. treatment/mutant), separated by space
-c | --control [control sample 1 control sample 2 ...]
Sample files in group 2 (e.g. control/wildtype), separated by space
--GTF genic-GTF-file GTF file for gene annotations
--TE TE-GTF-file GTF file for transposable element annotations
Optional arguments:
*Input/Output options*
--format [input file format]
Input file format: BAM or SAM. DEFAULT: BAM
--stranded [option] Is this a stranded library? (no, forward, or reverse).
no - Library is unstranded
forward - "Second-strand" cDNA library
(e.g. QIAseq stranded)
reverse - "First-strand" cDNA library
(e.g. Illumina TruSeq stranded)
DEFAULT: no.
--sortByPos Input file is sorted by chromosome position.
--project [name] Prefix used for output files (e.g. project name)
DEFAULT: TEtranscript_out
--outdir [directory] Directory for output files.
DEFAULT: current directory
*Analysis options*
--mode [TE counting mode]
How to count TE:
uniq (unique mappers only)
multi (distribute among all alignments).
DEFAULT: multi
--minread [min_read] read count cutoff. DEFAULT: 1
-L | --fragmentLength [fragLength]
Average length of fragment used for single-end sequencing
DEFAULT: For paired-end, estimated from the input alignment file. For single-end, ignored by default.
-i | --iteration
maximum number of iterations used to optimize multi-reads assignment. DEFAULT: 100
-p | --padj [pvalue]
FDR cutoff for significance. DEFAULT: 0.05
-f | --foldchange [foldchange]
Fold-change ratio (absolute) cutoff for differential expression.
DEFAULT: 1
*DESeq1 compatibility options*
--DESeq
Use DESeq (instead of DESeq2) for differential analysis.
-n | --norm [normalization]
Normalization method : DESeq_default (default normalization method of DESeq), TC (total annotated read counts), quant (quantile normalization). Only applicable if DESeq is used instead of DESeq2.
DEFAULT: DESeq_default
*Other options*
-h | --help
Show help message
--verbose [number]
Set verbose level.
0: only show critical messages
1: show additional warning messages
2: show process information
3: show debug messages
DEFAULT: 2
--version
Show program's version and exit
NOTE BAM files must be either unsorted or sorted by queryname. If the BAM files are sorted by position, please use the :code:`--sortByPos` option.
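If it is unclear how a BAM file was sorted, the ``@HD`` line of its header often records this in the ``SO`` tag (``unsorted``, ``queryname`` or ``coordinate``). The sketch below turns that tag into the matching flag. ``sort_flag_from_header`` is a hypothetical helper, not part of TEtranscripts, and it assumes the program that wrote the file set the tag correctly; some tools omit it, in which case nothing is printed and the sort order has to be confirmed another way.

```shell
# Hypothetical helper (not part of TEtranscripts): reads SAM/BAM header text on
# stdin and prints "--sortByPos" when the @HD line declares coordinate sorting.
# Prints nothing for queryname-sorted, unsorted, or untagged headers.
sort_flag_from_header() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i == "SO:coordinate") { print "--sortByPos"; exit }
        }'
}

# Usage, assuming samtools is installed and RNAseq1.bam exists:
#   flag=$(samtools view -H RNAseq1.bam | sort_flag_from_header)
#   echo "extra option for this file: ${flag:-none}"
```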
Example Command Lines
If BAM files are unsorted, or sorted by queryname::
TEtranscripts --format BAM --mode multi -t RNAseq1.bam RNAseq2.bam -c CtlRNAseq1.bam CtlRNAseq2.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_nosort_test
If BAM files are sorted by coordinates/position::
TEtranscripts --sortByPos --format BAM --mode multi -t RNAseq1.bam RNAseq2.bam -c CtlRNAseq1.bam CtlRNAseq2.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_sorted_test
Cluster Usage Recommendation
In our experience, analyzing human samples (hg19) with around 20-30 million mapped reads on a cluster requires around 20-30 GB of memory.
TEcount
Usage
::
usage: TEcount -b RNAseq BAM
--GTF genic-GTF-file
--TE TE-GTF-file
[optional arguments]
Required arguments:
-b | --BAM alignment-file RNAseq alignment file (BAM preferred)
--GTF genic-GTF-file GTF file for gene annotations
--TE TE-GTF-file GTF file for transposable element annotations
Optional arguments:
*Input/Output options*
--format [input file format]
Input file format: BAM or SAM. DEFAULT: BAM
--stranded [option] Is this a stranded library? (no, forward, or reverse).
no - Library is unstranded
forward - "Second-strand" cDNA library
(e.g. QIAseq stranded)
reverse - "First-strand" cDNA library
(e.g. Illumina TruSeq stranded)
DEFAULT: no.
--sortByPos Input file is sorted by chromosome position.
--project [name] Prefix used for output files (e.g. project name)
DEFAULT: TEcount_out
--outdir [directory] Directory for output files.
DEFAULT: current directory
*Analysis options*
--mode [TE counting mode]
How to count TE:
uniq (unique mappers only)
multi (distribute among all alignments).
DEFAULT: multi
-L | --fragmentLength [fragLength]
Average length of fragment used for single-end sequencing
DEFAULT: For paired-end, estimated from the input alignment file. For single-end, ignored by default.
-i | --iteration
maximum number of iterations used to optimize multi-reads assignment. DEFAULT: 100
*Other options*
-h | --help
Show help message
--verbose [number]
Set verbose level.
0: only show critical messages
1: show additional warning messages
2: show process information
3: show debug messages
DEFAULT: 2
--version
Show program's version and exit
NOTE BAM files must be either unsorted or sorted by queryname. If the BAM files are sorted by position, please use the :code:`--sortByPos` option.
Example Command Lines
If BAM files are unsorted, or sorted by queryname::
TEcount --format BAM --mode multi -b RNAseq.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_nosort_test
If BAM files are sorted by coordinates/position::
TEcount --sortByPos --format BAM --mode multi -b RNAseq.bam --GTF gene_annot.gtf --TE te_annot.gtf --project sample_sorted_test
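Because TEcount handles one alignment file per run, several samples need one command each. The sketch below only prints those commands, so they can be reviewed or pasted into a job script. ``tecount_commands`` is a hypothetical helper, not part of TEcount. Using the BAM base name as the ``--project`` prefix is an assumption made here to keep the output files apart, and the helper does not quote file names that contain spaces.

```shell
# Hypothetical helper (not part of TEcount): print one TEcount command per BAM
# file, using each file's base name as the --project prefix.
# Arguments: <genic GTF> <TE GTF> <BAM> [<BAM> ...]
tecount_commands() {
    gtf=$1; te=$2; shift 2
    for bam in "$@"; do
        name=$(basename "$bam" .bam)
        printf 'TEcount --format BAM --mode multi -b %s --GTF %s --TE %s --project %s\n' \
            "$bam" "$gtf" "$te" "$name"
    done
}

tecount_commands gene_annot.gtf te_annot.gtf RNAseq1.bam RNAseq2.bam CtlRNAseq1.bam
```

Add ``--sortByPos`` to the printed commands if the BAM files are sorted by position.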
Cluster Usage Recommendations
TEcount is better suited than TEtranscripts to a cluster environment, as each sample (e.g. replicates of an experiment) can be quantified on a separate node. The output can then be merged into a single count table for differential analysis. In our experience, analyzing human samples (hg19) with around 20-30 million mapped reads on a cluster requires around 20-30 GB of memory.
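One way to do the merging step is sketched below. It assumes that each per-sample output is a tab-delimited, two-column table with a header row (feature name, then count); check this against your own output files before relying on it. ``merge_count_tables`` is a hypothetical helper, not part of TEcount.

```shell
# Hypothetical helper (not part of TEcount): merge two-column, tab-delimited
# count tables (one header row, then "feature<TAB>count" rows) into one wide
# table with a column per input file. Features absent from a file get 0.
merge_count_tables() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                          # header row of each input file
            nfiles++
            if (nfiles == 1) first = $1     # reuse the first ID column name
            header = header OFS $2          # keep the sample column name
            next
        }
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            print first header
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (f = 1; f <= nfiles; f++)
                    line = line OFS (((order[i], f) in count) ? count[order[i], f] : 0)
                print line
            }
        }' "$@"
}

# Usage (file names are placeholders):
#   merge_count_tables sample1.cntTable sample2.cntTable > merged.cntTable
```

Rows are matched by feature name rather than by line position, so the inputs do not need to list features in the same order. Filling absent features with 0 is a choice made for this sketch.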
Recommendations for TEtranscripts input files
TEtranscripts can perform transposable element quantification from alignment results (e.g. BAM files) generated from a variety of programs. Given the variety of experimental systems, we could not provide an optimal alignment strategy for every approach. Therefore, we recommend that users identif
