TrEMOLO
Transposable Elements MOvement detection using LOng reads
- Introduction
- Release note
- Requirements
- Installation
- Configuration
- Usage
- Output files
- Modules
- strategies
- Citation & Licence
TrEMOLO<a name="introduction"></a>
TrEMOLO uses long reads, either directly or through their assembly, to detect:
- Global TE variations between two assembled genomes
- Populational/somatic variations in TE insertions/deletions
Global variations, the insiders<a name="in"></a>
Using a reference genome and an assembled one (preferably with long contigs or, even better, a chromosome-scale assembly), TrEMOLO extracts the insiders, i.e. variant transposable elements (TEs) present globally in the assembly, and tags them. Assemblers report the most frequent haplotype at each locus, so an assembly represents only the "consensus" of all haplotypes present at each locus. You will obtain a set of files with the locations of these variable insertions and deletions.
Populational variations, the outsiders<a name="out"></a>
By remapping the reads that were used to assemble the genome of interest, TrEMOLO identifies the populational (and even somatic) variations within the initial read dataset, and thus within the DNA/individuals sampled. These TE variants are the outsiders, present in only part of the population or cells. As for insiders, you will obtain a set of files with the locations of these variable insertions and deletions.
Release Notes<a name="release"></a>
Version 2.5.6
- Change: Output files
  - Add deletion: INSIDERtoTE_INFOS.bed
  - Add new output file: MULTIPLE_TE_BY_ID.txt. Sometimes loci contain several different TE families (for example, when working with a population). This file lists the insertion ID, the TE name, and the count.
- Add: New parameters in config.yaml
  - TIME_LIMIT: time limit for processing SV detection (rule TrEMOLO_SV_TE); set a value >= 0 (hours); 0 means no time limit
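The TIME_LIMIT semantics described above (a value in hours, with 0 meaning no limit) can be illustrated with a minimal Python sketch. The helper name `time_limit_to_seconds` is hypothetical and not part of TrEMOLO; how the pipeline itself enforces the limit is not shown here.

```python
from typing import Optional


def time_limit_to_seconds(time_limit_hours: float) -> Optional[float]:
    """Convert a TIME_LIMIT value (hours) into a timeout in seconds.

    Hypothetical helper for illustration only: returns None when the
    value is 0 (no time limit) and rejects negative values.
    """
    if time_limit_hours < 0:
        raise ValueError("TIME_LIMIT must be >= 0 (hours)")
    if time_limit_hours == 0:
        return None  # 0 means no time limit
    return time_limit_hours * 3600


if __name__ == "__main__":
    print(time_limit_to_seconds(0))  # no limit
    print(time_limit_to_seconds(2))  # two hours, in seconds
```

Returning `None` for "no limit" matches the convention of Python's `subprocess.run(..., timeout=None)`, so the result could be passed straight to such a call.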
Warning: The Singularity definition file is currently unreliable and may fail to build the environment. Please use the pre-compiled image (e.g., TrEmOLO.simg) instead.
Current limitations
- In INSIDER_VARIANT mode, TE annotation on the REFERENCE (parameter INTEGRATE_TE_TO_GENOME) is suboptimal. Some TEs might not be annotated on the reference.
- Difficulty in identifying the true positives concerning clipped insertions (SOFT, HARD)
Upcoming Features
Comprehensive TE Analysis
In our upcoming release, we will be expanding our analysis capabilities to include a comprehensive examination of Transposable Elements (TEs) within both reads and genomes. This enhancement will go beyond merely identifying INDELs to encompass a full spectrum analysis of TEs.
Requirements<a name="requirements"></a>
TrEMOLO relies on numerous tools. We recommend the Singularity installation to be sure that all of them are present in the right configurations and versions.
- For both approaches
  - Python 3.6+
- For Global variation tool
  - BLAST 2.2+
  - Bedtools 2.27.1 v2
  - Assemblytics or
  - RaGOO
  - Liftoff
- For Populational variation tool
  - Snakemake 5.5.2+
  - Minimap2 2.24+
  - Samtools 1.9 and (1.15.1 optional)
  - svim 1.4.2
  - Sniffles 1.0.12
  - Python libs
  - Perl v5.26.2+
- For report
- Others
  - nodejs
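If you install the tools yourself rather than using Singularity, a quick check that the executables are reachable can save a failed run. The following is a minimal sketch, not part of TrEMOLO: the executable names in `DEFAULT_TOOLS` are assumptions about how the tools listed above are typically installed, and the check covers presence on `PATH` only, not versions.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names for some of the tools listed above;
# adjust them to match your own installation.
DEFAULT_TOOLS = (
    "python3", "blastn", "bedtools", "snakemake",
    "minimap2", "samtools", "svim", "sniffles", "perl", "node",
)


def find_tools(
    tools: Iterable[str] = DEFAULT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if absent."""
    return {tool: which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of executables that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which` from the standard library.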
Installation<a name="Installation"></a>
Using Git<a name="git"></a>
Once the requirements are fulfilled, clone the repository:
git clone https://github.com/DrosophilaGenomeEvolution/TrEMOLO.git
Using Singularity<a name="singularity"></a>
Singularity installation Debian/Ubuntu with package
Compiling yourself
A Singularity container (version 3.10.0+ required) is available with all tools compiled in. The Singularity definition file is provided in this repo and can be built as follows:
sudo singularity build TrEMOLO.simg TrEMOLO/Singularity
YOU MUST BE ROOT for compiling
Alternatively, you can download a pre-compiled Singularity container from the following link:
Download TrEMOLO Singularity Container
Test TrEMOLO with singularity
singularity exec TrEMOLO.simg snakemake --snakefile TrEMOLO/run.snk --configfile TrEMOLO/test/tmp_config.yml
#OR
singularity run TrEMOLO.simg snakemake --snakefile TrEMOLO/run.snk --configfile TrEMOLO/test/tmp_config.yml
Pulling from SingularityHub
This option is disabled because Singularity Hub is currently read-only. We are looking for a Singularity repository to make the container easier to obtain.
Configuration of the parameter file<a name="configuration"></a>
TrEMOLO uses Snakemake to perform its analyses. You must first provide your parameters in a .yaml file (see the example in the config.yaml file). The parameters are:
# All paths can be relative or absolute, depending on your directory tree.
# Using only absolute paths is advised if you are not familiar with folder tree structures.
DATA:
    GENOME: "/path/to/genome_file.fasta" # genome (fasta file) [required]
    TE_DB: "/path/to/database_TE.fasta" # database of TE (a fasta file) [required]
    REFERENCE: "/path/to/reference_file.fasta" # reference genome (fasta file) only if INSIDER_VARIANT = True [optional]
    SAMPLE: "/path/to/reads_file.fastq" # long reads (a fastq[.gz] file) only if OUTSIDER_VARIANT = True [optional]
    # At least, provide either REFERENCE or SAMPLE. Both can be provided
    WORK_DIRECTORY: "/path/to/directory" # name of output directory [optional, will be created as 'TrEMOLO_OUTPUT']
# You must provide at least the reference file, the fastq file, or both
CHOICE:
    PIPELINE:
        OUTSIDER_VARIANT: True # outsiders, TE not in the assembly - population variation
        INSIDER_VARIANT: True # insiders, TE in the assembly
        REPORT: True # for getting a report.html file with graphics
    OUTSIDER_VARIANT:
        CALL_SV: "sniffles" # possibilities for SV tools: sniffles, no_sniffles
        INTEGRATE_TE_TO_GENOME: True # (True, False) Re-build the assembly with the OUTSIDER integrated in
        CLIPPED_READS: False # (True, False) Processing of clipped reads (SOFT, HARD)
    INSIDER_VARIANT:
        DETECT_ALL_TE: False # detect ALL TE on the genome (parameter GENOME) assembly, not only new insertions. Warning! It may take several hours on big genomes
    INTERMEDIATE_FILE: True # Keep the intermediate analysis files to process them later.
PARAMS:
    THREADS: 8 # number of threads for some tasks
    OUTSIDER_VARIANT:
        MINIMAP2:
            PRESET_OPTION: 'map-ont' # minimap2 option is map-ont by default (map-pb, map-ont)
            OPTION: '' # more minimap2 options can be specified here
        SAMTOOLS_VIEW:
            PRESET_OPTION: ''
        SAMTOOLS_SORT:
            PRESET_OPTION: ''
        SAMTOOLS_C
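The requirement rules stated in the configuration comments (GENOME and TE_DB required; REFERENCE needed when INSIDER_VARIANT is True; SAMPLE needed when OUTSIDER_VARIANT is True; at least one of the two provided) can be sketched as a small pre-flight check. This is a hypothetical helper, not part of TrEMOLO, and it assumes the configuration has already been loaded into a nested Python dictionary (for example with a YAML parser such as PyYAML's `yaml.safe_load`).

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a TrEMOLO-style config mapping.

    Hypothetical check that mirrors only the rules written in the
    config comments; an empty list means no problem was detected.
    """
    problems: List[str] = []
    data = config.get("DATA") or {}
    pipeline = (config.get("CHOICE") or {}).get("PIPELINE") or {}

    # GENOME and TE_DB are marked [required].
    for key in ("GENOME", "TE_DB"):
        if not data.get(key):
            problems.append(f"DATA.{key} is required")

    has_reference = bool(data.get("REFERENCE"))
    has_sample = bool(data.get("SAMPLE"))

    # At least one of REFERENCE or SAMPLE must be given.
    if not (has_reference or has_sample):
        problems.append("provide DATA.REFERENCE, DATA.SAMPLE, or both")

    # Each enabled mode needs its own input.
    if pipeline.get("INSIDER_VARIANT") and not has_reference:
        problems.append("INSIDER_VARIANT is True but DATA.REFERENCE is missing")
    if pipeline.get("OUTSIDER_VARIANT") and not has_sample:
        problems.append("OUTSIDER_VARIANT is True but DATA.SAMPLE is missing")
    return problems


if __name__ == "__main__":
    example = {
        "DATA": {
            "GENOME": "/path/to/genome_file.fasta",
            "TE_DB": "/path/to/database_TE.fasta",
            "REFERENCE": "/path/to/reference_file.fasta",
        },
        "CHOICE": {"PIPELINE": {"OUTSIDER_VARIANT": True, "INSIDER_VARIANT": True}},
    }
    for problem in check_config(example):
        print(problem)
```

The check does not verify that the files exist or are valid FASTA/FASTQ; it only reflects the presence rules given in the comments.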
