
<img src="images/TrEMOLO9.png">

TrEMOLO<a name="introduction"></a>

Transposable Elements MOvement detection using LOng reads

TrEMOLO uses long reads, either directly or through their assembly, to detect:

  • Global TE variations between two assembled genomes
  • Populational/somatic variations in TE insertions/deletions

Global variations, the insiders<a name="in"></a>

Using a reference genome and an assembled one (preferably built from long contigs or, even better, a chromosome-scale assembly), TrEMOLO extracts the insiders, i.e. variant transposable elements (TEs) present globally in the assembly, and tags them. Assemblers output the most frequent haplotype at each locus, so an assembly represents only the "consensus" of all haplotypes present at each locus. You will obtain a set of files with the locations of these variable insertions and deletions.

Populational variations, the outsiders<a name="out"></a>

By remapping the reads that were used to assemble the genome of interest, TrEMOLO identifies the populational (and even somatic) variations within the initial read dataset, and thus within the sampled DNA/individuals. These TE variants are the outsiders, present in only part of the population or cells. As with insiders, you will obtain a set of files with the locations of these variable insertions and deletions.

Release Notes<a name="release"></a>

Version 2.5.6

  • Change : Output files

    • Add deletion INSIDER to TE_INFOS.bed
    • Add new output file: MULTIPLE_TE_BY_ID.txt. A locus sometimes contains several different TE families (for example, when working with a population). This file reports such cases by listing the insertion ID, the TE name, and the count.
  • Add : New Parameters in config.yaml

    • TIME_LIMIT : time limit for processing SV detection (rule TrEMOLO_SV_TE); put value >= 0 (hours); 0 means no time limit
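As an illustration of how MULTIPLE_TE_BY_ID.txt might be used, here is a minimal sketch that lists the insertion IDs associated with more than one TE name. The README only states that the file lists the insertion ID, the TE name, and the count; the tab delimiter, the column order, the helper name `multi_family_ids`, and the sample rows below are all assumptions made for the example, so check them against a real output file first.

```shell
#!/usr/bin/env bash
# List insertion IDs that are associated with more than one distinct TE name.
# ASSUMPTION: tab-separated columns in the order ID, TE name, count.
multi_family_ids() {
    awk -F '\t' '
        !seen[$1 FS $2]++ { families[$1]++ }   # count each ID/TE pair once
        END { for (id in families) if (families[id] > 1) print id }
    ' "$1" | sort
}

# Hypothetical rows, for illustration only (not real TrEMOLO output).
demo="$(mktemp)"
printf 'ID_1\tTE_A\t12\nID_1\tTE_B\t3\nID_2\tTE_A\t7\n' > "$demo"
multi_family_ids "$demo"
rm -f "$demo"
```

On a real run, the function would be called on the MULTIPLE_TE_BY_ID.txt file found in the output directory.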

Warning: The Singularity definition file is currently unreliable and may fail to build the environment. Please use the pre-compiled image (e.g., TrEmOLO.simg) instead.

Current limitations

  • In INSIDER_VARIANT mode, TE annotation on the REFERENCE (parameter INTEGRATE_TE_TO_GENOME) is suboptimal. Some TEs might not be annotated on the reference.

  • True positives are difficult to identify among clipped insertions (SOFT, HARD).

Upcoming Features

Comprehensive TE Analysis

The upcoming release will extend the analysis beyond the identification of INDELs to a comprehensive examination of transposable elements (TEs) in both reads and genomes.

Requirements<a name="requirements"></a>

TrEMOLO relies on numerous tools. We recommend the Singularity installation to make sure all of them are present in the right versions and configurations.

Installation<a name="Installation"></a>

Using Git<a name="git"></a>

Once the requirements are fulfilled, just clone the repository:

git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git

Using Singularity<a name="singularity"></a>

Singularity installation Debian/Ubuntu with package

Compiling yourself

A Singularity container with all tools compiled in is available (Singularity 3.10.0 or later is required). The Singularity file is provided in this repo and can be compiled as follows:

sudo singularity build TrEMOLO.simg TrEMOLO/Singularity

YOU MUST BE ROOT for compiling
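Since Singularity 3.10.0 or later is required, it can help to check the installed version before building. The sketch below is not part of TrEMOLO: the helper `version_ge` is a hypothetical name, it relies on `sort -V` (available in GNU coreutils), and it assumes that `singularity --version` prints a dotted version number somewhere in its output. Apptainer uses a different version numbering, so the comparison would not be meaningful there.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if dotted version $1 is greater than or equal to $2.
version_ge() {
    # sort -V orders version strings; $1 >= $2 when $2 sorts first (or ties).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="3.10.0"
if command -v singularity >/dev/null 2>&1; then
    # Keep only the first dotted number from the version banner.
    installed="$(singularity --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "Singularity $installed is recent enough"
    else
        echo "Singularity $installed is older than $required" >&2
    fi
else
    echo "singularity not found in PATH" >&2
fi
```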

Alternatively, you can download a pre-compiled Singularity container from the following link:

Download TrEMOLO Singularity Container

Test TrEMOLO with singularity

singularity exec TrEMOLO.simg snakemake --snakefile TrEMOLO/run.snk --configfile TrEMOLO/test/tmp_config.yml
#OR
singularity run TrEMOLO.simg snakemake --snakefile TrEMOLO/run.snk --configfile TrEMOLO/test/tmp_config.yml
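Before launching the full test, a Snakemake dry run can show which jobs would be executed without running them. The sketch below only composes and prints the command, reusing the paths from the test commands above and adding Snakemake's `-n` (dry-run) flag; the function name `tremolo_dry_run_cmd` is hypothetical, and whether the dry run succeeds depends on the Snakemake version shipped in the container.

```shell
#!/usr/bin/env bash
# Compose the TrEMOLO test command with Snakemake's -n (dry-run) flag appended.
# $1 = container image, $2 = config file (defaults are the paths used above).
tremolo_dry_run_cmd() {
    local image="${1:-TrEMOLO.simg}"
    local config="${2:-TrEMOLO/test/tmp_config.yml}"
    echo singularity exec "$image" snakemake \
        --snakefile TrEMOLO/run.snk \
        --configfile "$config" \
        -n
}

# Print the command; paste the printed line into a shell to run it.
tremolo_dry_run_cmd
```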

Pulling from SingularityHub

This option is currently disabled because Singularity Hub is read-only for the moment. We are looking for another Singularity repository to make the container easier to obtain.

Configuration of the parameter file<a name="configuration"></a>

TrEMOLO uses Snakemake to perform its analyses. You must therefore first provide your parameters in a .yaml file (see the config.yaml file for an example). The parameters are:

# All paths can be relative or absolute, depending on your directory tree.
# Use only absolute paths if you are not familiar with how directory trees work.
DATA:
    GENOME:          "/path/to/genome_file.fasta"      #genome (fasta file) [required]
    TE_DB:           "/path/to/database_TE.fasta"      #Database of TE (a fasta file) [required]
    REFERENCE:       "/path/to/reference_file.fasta"   #reference genome (fasta file) only if INSIDER_VARIANT = True [optional]
    SAMPLE:          "/path/to/reads_file.fastq"       #long reads (a fastq[.gz] file) only if OUTSIDER_VARIANT = True [optional]
    #At least, provide either REFERENCE or SAMPLE. Both can be provided
    WORK_DIRECTORY:  "/path/to/directory"         #name of output directory [optional, will be created as 'TrEMOLO_OUTPUT']

#At least, you must provide either the reference file, or the fastq file or both

CHOICE:
    PIPELINE:
        OUTSIDER_VARIANT: True  # outsiders, TE not in the assembly - population variation
        INSIDER_VARIANT: True   # insiders, TE in the assembly
        REPORT: True            # for getting a report.html file with graphics
    OUTSIDER_VARIANT:
        CALL_SV: "sniffles"     # possibilities for SV tools: sniffles, no_sniffles
        INTEGRATE_TE_TO_GENOME: True # (True, False) Re-build the assembly with the OUTSIDER integrated in
        CLIPPED_READS: False # (True, False) Processing of clipped reads (SOFT, HARD)
    INSIDER_VARIANT:
        DETECT_ALL_TE: False    # detect ALL TEs on the genome assembly (parameter GENOME), not only new insertions. Warning: this may take several hours on big genomes
    INTERMEDIATE_FILE: True     # Keep the intermediate analysis files to process them later.


PARAMS:
    THREADS: 8 #number of threads for some tasks
    OUTSIDER_VARIANT:
        MINIMAP2:
            PRESET_OPTION: 'map-ont' # minimap2 preset; map-ont by default (map-pb, map-ont)
            OPTION: '' # additional minimap2 options can be specified here
        SAMTOOLS_VIEW:
            PRESET_OPTION: ''
        SAMTOOLS_SORT:
            PRESET_OPTION: ''
        SAMTOOLS_C
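The DATA section above requires at least one of REFERENCE or SAMPLE. A quick pre-flight check of that rule could look like the sketch below. It is not part of TrEMOLO: `yaml_value` and `check_inputs` are hypothetical helpers, and the parsing is a simple line-based match on double-quoted values as written in the example config, not a real YAML parser.

```shell
#!/usr/bin/env bash
# Print the double-quoted value of a key (e.g. SAMPLE) from a config file.
# ASSUMPTION: the key sits alone at the start of its line, as in the example.
yaml_value() {
    sed -n -E "s/^[[:space:]]*$1:[[:space:]]*\"([^\"]*)\".*/\1/p" "$2" | head -n 1
}

# Fail if neither REFERENCE nor SAMPLE has a non-empty value.
check_inputs() {
    local cfg="$1" ref sample
    ref="$(yaml_value REFERENCE "$cfg")"
    sample="$(yaml_value SAMPLE "$cfg")"
    if [ -z "$ref" ] && [ -z "$sample" ]; then
        echo "config error: provide REFERENCE, SAMPLE, or both" >&2
        return 1
    fi
    return 0
}

# Demo on a throwaway config that uses the placeholder paths from the example.
demo="$(mktemp)"
printf 'DATA:\n    REFERENCE: "/path/to/reference_file.fasta"\n    SAMPLE: ""\n' > "$demo"
check_inputs "$demo" && echo "config OK"
rm -f "$demo"
```

The check only looks at whether the values are non-empty; it does not verify that the files exist or that the matching INSIDER_VARIANT / OUTSIDER_VARIANT switches are enabled.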
