
dnaPipeTE

dnaPipeTE (for de-novo assembly & annotation Pipeline for Transposable Elements) is a pipeline designed to find, classify and quantify transposable elements (TEs) and other repeats in low-coverage (< 1X) NGS datasets. It is particularly useful for quantifying the proportion of TEs in newly sequenced genomes, since it does not require a genome assembly and works directly on raw short reads.
- :family: dnaPipeTE was created in 2015 by Clément Goubert and Laurent Modolo at the LBBE, with later contributions from Romain Lannes (@rLannes), @pauram and T. Mason Linscott. Thanks a lot!
- :package: The container version has been made possible thanks to Stéphane Delmotte of the LBBE.
  - The current version of dnaPipeTE is v.1.4c "container" and is available through Docker/Singularity (see Installation). Changelogs can be found here.
  - From now on, only the container versions of dnaPipeTE will be supported. Thank you for your understanding! Container versions are stored on the Docker Hub.
  - The last non-container version, dnaPipeTE 1.3.1, is available here.
- :page_facing_up: You can read the original publication in GBE.
- :bar_chart: A companion repository, dnaPT_utils, provides useful scripts for post-processing and for creating customizable figures. It requires a UNIX environment with bash, R and cd-hit. It is not required to run dnaPipeTE. dnaPT_utils has been added to the latest distribution (v1.4c).
- :stethoscope: If you encounter issues with dnaPipeTE, you can request assistance here!
- :teacher: An introductory tutorial to dnaPipeTE is available on the TE-hub.
- :teacher: An advanced tutorial is published in the book "Transposable Elements, Methods and Protocols (2022)": dnaPipeTE chapter

Installation
System requirement
dnaPipeTE can now run on any system compatible with Docker or Singularity. A minimum of 16 GB of RAM is recommended, and multiple CPUs will improve execution speed.
Trinity (used for the repeats' assembly) can use a lot of RAM! Here are some examples of RAM usage:
- 100,000 reads: ~10 GB RAM (two Trinity iterations)
- 3,000,000 reads: ~40 GB RAM (two Trinity iterations)
Docker (root users)
Docker must be installed and running on the execution machine. For more details see https://docs.docker.com/get-docker/. Then, download the dnaPipeTE container:
sudo docker pull clemgoub/dnapipete:latest
Singularity/Apptainer (non-root users, HPC,...)
For users of High Performance Clusters (HPC) and other systems without root privileges, it is recommended to use Singularity (usually provided with the base software; for more information see https://sylabs.io/guides/3.0/user-guide/installation.html).
To use dnaPipeTE with Singularity you need to create an image of the container on your machine.
mkdir ~/dnaPipeTE
cd ~/dnaPipeTE
singularity pull --name dnapipete.img docker://clemgoub/dnapipete:latest
This step takes ~20 minutes to build the image and is only required once.
Running dnaPipeTE
Create a project folder
mkdir ~/Project
cd ~/Project
~/Project will be mounted into the /mnt directory of the Docker or Singularity container and will contain the inputs and outputs.
Input File
The input file must be a single-end FASTQ or FASTQ.GZ file of NGS reads. It can be either the R1 or R2 end of a paired-end library. dnaPipeTE performs the sampling automatically, so you can provide a large file (> 1X) as input.
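Because dnaPipeTE samples the input itself, it can be useful to check how much sequence a file contains before launching a run. The following is a minimal sketch, not part of dnaPipeTE: the function names `fastq_stats` and `depth` are hypothetical, and the code assumes standard four-line FASTQ records.

```python
import gzip


def fastq_stats(path):
    """Return (n_reads, n_bases) for a 4-line-per-record FASTQ or FASTQ.GZ file."""
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                n_reads += 1
                n_bases += len(line.strip())
    return n_reads, n_bases


def depth(n_bases, genome_size):
    """Approximate sequencing depth (X) = sequenced bases / genome size (bp)."""
    return n_bases / genome_size
```

Calling `fastq_stats` on the read file and passing the base count to `depth` together with the expected genome size gives a rough idea of whether the file holds more than the coverage you plan to sample.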
IMPORTANT: We recommend removing mitochondrial DNA and other non-nuclear DNA (symbionts, viruses, contaminants) from your reads. If mtDNA reads are left in the samples, the mitochondrial genome will be assembled and will appear as one of the most abundant repeats in the output, with a size of ~10 kb (it may also be wrongly classified as a TE!).
For the following examples, we will consider a fictitious read file called reads_input.fastq.
Interactive usage
Docker
# start the dnaPipeTE container
sudo docker run -it -v ~/Project:/mnt clemgoub/dnapipete:latest
Once in the container, run:
python3 dnaPipeTE.py -input /mnt/reads_input.fastq -output /mnt/output -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -genome_size 170000000 -genome_coverage 0.1 -sample_number 2 -RM_t 0.2 -cpu 2
Singularity
singularity shell --bind ~/Project:/mnt ~/dnaPipeTE/dnapipete.img
Once in the container, run:
cd /opt/dnaPipeTE # <<<--- This line is very important to run the program with singularity!
python3 dnaPipeTE.py -input /mnt/reads_input.fastq -output /mnt/output -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -genome_size 170000000 -genome_coverage 0.1 -sample_number 2 -RM_t 0.2 -cpu 2
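To make the sampling options in these commands concrete: with `-genome_size 170000000` and `-genome_coverage 0.1`, each sample represents about 17 Mb of sequence, and `-sample_number 2` requests two Trinity iterations. The sketch below works through that arithmetic. The helper names are hypothetical and not part of dnaPipeTE, and the 100 bp read length is an assumed value used only for illustration.

```python
def bases_per_sample(genome_size, genome_coverage):
    """Bases drawn for one sample: genome size (bp) x coverage per sample (X)."""
    return round(genome_size * genome_coverage)


def reads_per_sample(genome_size, genome_coverage, read_length):
    """Approximate read count for one sample, assuming a fixed read length."""
    return bases_per_sample(genome_size, genome_coverage) // read_length


if __name__ == "__main__":
    # Values from the example command; read_length=100 is an assumption.
    print(bases_per_sample(170_000_000, 0.1))       # 17000000
    print(reads_per_sample(170_000_000, 0.1, 100))  # 170000
```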
Batch file usage
We create a file dnaPT_cmd.sh in ~/Project (so that it is visible as /mnt/dnaPT_cmd.sh inside the container) that will contain the dnaPipeTE command:
Docker:
#! /bin/bash
python3 dnaPipeTE.py -input /mnt/reads_input.fastq -output /mnt/output -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -genome_size 170000000 -genome_coverage 0.1 -sample_number 2 -RM_t 0.2 -cpu 2
and then
sudo docker run -v ~/Project:/mnt clemgoub/dnapipete:latest /mnt/dnaPT_cmd.sh
Singularity
#! /bin/bash
cd /opt/dnaPipeTE # <<<--- This line is very important to run the program with singularity!
python3 dnaPipeTE.py -input /mnt/reads_input.fastq -output /mnt/output -RM_lib ../RepeatMasker/Libraries/RepeatMasker.lib -genome_size 170000000 -genome_coverage 0.1 -sample_number 2 -RM_t 0.2 -cpu 2
and then
singularity exec --bind ~/Project:/mnt ~/dnaPipeTE/dnapipete.img /mnt/dnaPT_cmd.sh
dnaPipeTE arguments
|Argument|Description|
|---|---|
|-input | input FASTQ or FASTQ.GZ file (single-end only). It will be sampled automatically |
|-output | complete path, including the name, of the output folder |
|-cpu | maximum number of CPUs to use |
|-sample_number | number of Trinity iterations |
|-genome_size | size of the genome in bp [use it with -genome_coverage; if used, do not use -sample_size] Ex: 175000000 for 175 Mb |
|-genome_coverage | coverage of the genome for each sample [use it with -genome_size; if used, do not use -sample_size] Ex: 0.1 for 0.1X coverage per sample |
|-sample_size | number of reads to sample [use without -genome_size and -genome_coverage] |
|-RM_lib | path to the repeat library for RepeatMasker. By default, ../RepeatMasker/Libraries/RepeatMasker.lib is used. For a custom library, the header format must follow: >Repeat_name#CLASS/Subclass with CLASS in "DNA, LINE, LTR, SINE, MITE, Helitron, Simple Repeat, Satellite"|
|-RM_t | Annotation threshold: minimal proportion of the query (dnaPipeTE contig) aligned on the repeat to keep the annotation from RepeatMasker. Ex: 0.2 for 20% of query in db |
|-keep_Trinity_output | Keep Trinity output files at the end of the run. By default, these files are removed (they are large and numerous).|
|-contig_length | minimum size of a repeat contig to be retained (default 200 bp) |
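For a custom `-RM_lib` library, the header convention given in the table can be checked before a run. This is a minimal sketch, not part of dnaPipeTE: the function names `parse_header` and `has_known_class` are hypothetical, the class list is copied from the table, and two choices are assumptions made here: the `/Subclass` part is treated as optional, and class names are compared case-insensitively with underscores read as spaces.

```python
import re

# Class names copied from the -RM_lib row of the argument table.
KNOWN_CLASSES = {"DNA", "LINE", "LTR", "SINE", "MITE", "Helitron",
                 "Simple Repeat", "Satellite"}

# >Repeat_name#CLASS/Subclass ; the /Subclass part is treated as optional here.
HEADER_RE = re.compile(r"^>(?P<name>[^#\s]+)#(?P<cls>[^/\s]+)(?:/(?P<sub>\S+))?")


def parse_header(line):
    """Split a FASTA header into (name, class, subclass), or return None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("cls"), match.group("sub")


def has_known_class(line):
    """True if the header parses and its class is one of KNOWN_CLASSES.

    Underscores are compared as spaces and case is ignored (an assumption).
    """
    parsed = parse_header(line)
    if parsed is None:
        return False
    normalized = parsed[1].replace("_", " ").lower()
    return normalized in {name.lower() for name in KNOWN_CLASSES}
```

Running `has_known_class` over every line of the library that starts with `>` is a quick way to list headers that would need renaming.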
Continuing a crashed run: dnaPipeTE is able to skip some steps if a run crashes after a checkpoint. For example, if it crashes during the Trinity assembly, the sampling won't be performed again if you relaunch the run with the same output folder. The checkpoints are (1) sampling of the Trinity inputs and (2) the Trinity assembly.
dnaPipeTE OUTPUTS
dnaPipeTE produces a lot of outputs, and some of them are very interesting!
The output folder is divided into the following parts:
- main folder (output name):
important files:
|File|Description|
|---|---|
| "Trinity.fasta" | this file contains the dnaPipeTE contigs; it is the last assembly performed with Trinity |
| "reads_per_component_and_annotation" | table with the count of reads and bp aligned per dnaPipeTE contig (from blastn 1), as well as its best RepeatMasker annotation. <ul><li>1: counts (#reads)</li><li>2: aligned bases</li><li>3: dnaPipeTE contig name</li><li>4: RepeatMasker hit length (bp)</li><li>5: RepeatMasker annotation</li><li>6: RepeatMasker classification</li><li>7: hit length / dnaPipeTE contig length</li></ul> |
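The column layout of "reads_per_component_and_annotation" lends itself to quick summaries. Below is a minimal sketch, not part of dnaPipeTE, that sums the aligned bases (column 2) per RepeatMasker classification (column 6). The function name `bases_per_class` is hypothetical, and several points are assumptions: that fields are separated by whitespace, that rows with fewer than six fields are unannotated contigs, and that lines with a non-numeric second column can be skipped as headers.

```python
from collections import defaultdict


def bases_per_class(lines):
    """Sum aligned bases (column 2) per RepeatMasker classification (column 6).

    Rows with fewer than six fields are grouped under "unannotated";
    this handling is an assumption, not documented behaviour.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            aligned_bases = int(float(fields[1]))
        except ValueError:
            continue  # skip a possible header line
        rm_class = fields[5] if len(fields) >= 6 else "unannotated"
        totals[rm_class] += aligned_bases
    return dict(totals)
```

The function accepts any iterable of lines, so an open file handle on the table in your output folder can be passed directly.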
less important files you may like:
|File|Description|
|---|---|
| "Trinity.fasta.out" | raw RepeatMasker output (not sorted) of Trinity.fasta on the repeat libraries |
| "Counts.txt" | count of bp of the sample aligned for each TE class (used for the pieChart) |
| "Reads_to_components_Rtable.txt" | input file to compute the reads and bp per contig (one line per read) |
| "Bases_per_component.pdf/png" | graph with the number of base pairs aligned on each dnaPipeTE contig (from blastn 1), ordered by genome proportion of the dnaPipeTE contig; however, see dnaPT_utils for improved graphs |
| "pieChart.pdf/png" | graph with the relative proportion of the main repeat classes; informs about the estimated proportion of repeats in the genome (from blastn 2 and 3); however, see dnaPT_utils |
