GraffiTE
GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

🗞️ The GraffiTE paper is now out!
What does GraffiTE do?
- Insertion Polymorphisms: GraffiTE finds polymorphic transposable element insertions in genome assemblies and/or long read datasets (presence/absence). It can further genotype the discovered polymorphisms (i.e. infer whether an insertion is homozygous or heterozygous) in read sets using a TE-graph-genome. GraffiTE handles both "reference" (i.e. TE present in the reference genome, but absent in alternative samples) and "non-reference" (de-novo) insertions.
- VCF annotation: GraffiTE can also be used to annotate TEs present in structural variants (SVs) reported in VCF format.
Pipeline overview:
- First, each genome assembly or long read dataset is aligned to the reference genome with minimap2; alternatively, winnowmap is available. For each sample considered, structural variants (SVs) are called with svim-asm if using assemblies or sniffles2 if using long reads, and only insertions and deletions relative to the reference genome are kept.
- Candidate SVs (INS and DEL) are scanned with RepeatMasker, using a user-provided library of repeats of interest (.fasta). SVs covered ≥80% by repeats are kept. At this step, target site duplications (TSDs) are searched for SVs representing a single TE family.
- Each candidate repeat polymorphism is induced in a graph-genome where TEs and repeats are represented as bubbles, allowing reads to be mapped on either presence or absence alleles with Pangenie, Giraffe or GraphAligner.
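The ≥80% repeat coverage filter in the second step can be illustrated with a minimal Python sketch. This is not GraffiTE's own code: the function names (`repeat_coverage`, `keep_sv`), the interval representation (0-based, half-open hits on the SV sequence) and the choice to merge overlapping hits before measuring coverage are all assumptions made for illustration.

```python
def repeat_coverage(sv_length, hits):
    """Return the fraction of an SV sequence covered by at least one repeat hit.

    sv_length: length of the SV sequence in bp.
    hits: iterable of (start, end) intervals on the SV sequence,
          0-based and half-open (an assumption of this sketch).
    """
    if sv_length <= 0:
        raise ValueError("sv_length must be positive")
    covered = 0
    current_end = 0  # right-most position already counted
    for start, end in sorted(hits):
        # Clip to the SV and skip the part already counted by earlier hits.
        start = max(start, current_end, 0)
        end = min(end, sv_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / sv_length


def keep_sv(sv_length, hits, min_coverage=0.8):
    """Keep an SV when repeats cover at least min_coverage of its length."""
    return repeat_coverage(sv_length, hits) >= min_coverage


# Two overlapping hits cover 90 of 100 bp, so this SV would be kept.
print(keep_sv(100, [(0, 50), (40, 90)]))
```

Merging overlaps avoids counting the same bases twice when two repeat annotations overlap on the same SV.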
GraffiTE was initially developed by Cristian Groza and Clément Goubert at Guillaume Bourque's group at the Genome Centre of McGill University (Montréal, Canada). GraffiTE is based on the concept developed in Groza et al., 2022.
⚠️ Bug reports and issues, as well as comments and suggestions, are welcome in the Issues section of this Github repository.
Changelog
Last update: 01/01/25 | commit: 1cbebbf
- :beetle: bug fix: remove --nolow from the RepeatMasker call: this could have caused spurious hits on low-complexity regions of some TE consensus sequences, mistaking obvious tandem repeats for real TEs. It is very important to NOT USE --nolow with RepeatMasker (unless needed for debugging and special cases).
Previous update: 11/07/24 | commit: 76537f9
- Added a new --tsd_time option to specify the time request for the TSD modules when using the cluster profile. Default remains 1h. No need to update the image, simply pull this Github repository.
Thank you @Han-Cao for submitting a pull request:
- Improved speed for large VCF annotation.
- :beetle: bug fix: change from a 1-based to a 0-based coordinate system for the SVA-VNTR module. No need to update the image, simply pull this Github repository.
- :beetle: bug fix: transform RepeatMasker coordinates from 1-based to 0-based in order to meet the bed format standard and measure accurate hit length. This fixes issue #43.
- New option --break_scaffolds (see additional parameters) that automatically splits contigs at runs of N > 4. With some scaffolded genomes, minimap2 can return an error related to some CIGAR string being too long, typically [E::parse_cigar] CIGAR length too long at position .... Breaking scaffolds at N stretches typically solves this problem, which is caused by limitations of the htslib/SAM specification.
- Added new/alternative compatible class names: MITE, TIR and IS, e.g. >TEnameX#MITE, >TEnameY#TIR/Mariner or >TEnameX#IS. In previous versions, TEs named with these classes were discarded by OneCodeToFindThemAll.
  - The compatible classes in the fasta header include (i.e. Class in >TEname#Class/Superfamily): LINE, LTR, SINE, RC/Helitron (will be treated as DNA/RC), DNA, TIR, MITE, Retroposon, IS, Unknown, Unspecified.
  - TEs for which a classification is absent will be treated as Unknown (e.g. >TEnameZ).
  - All >TEnames and Superfamily will be accepted as long as the Class name is among those supported.
- Since > beta 0.2.5 we switched versioning to commit id. Please refer to the commit ID of the version of GraffiTE you are using if you need support.
- :beetle: bug fix: recently, the L1 inversion flag (--mammal) was not working. It has now been fixed.
- Winnowmap is now available as an alternative mapper instead of Minimap2. To enable Winnowmap, use the flag --aligner winnowmap; default remains minimap2.
- :beetle: bug fix: fix a VCF annotation issue that was happening when two distinct variants shared the same VCF POS field. Annotations are now distinct depending on the variant sequence.
- Cleaned up GraphAligner VCF outputs for clarity.
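The fasta header convention described in the changelog (>TEname#Class/Superfamily, with a missing classification treated as Unknown) can be checked on a repeat library before running the pipeline. The sketch below is only an illustration of that naming rule, not code from GraffiTE or OneCodeToFindThemAll: the names `SUPPORTED_CLASSES`, `parse_te_header` and `is_supported` are hypothetical, and treating RC as supported only together with the Helitron superfamily is an assumption based on the RC/Helitron entry in the list.

```python
# Classes listed as compatible in the changelog entry above.
# "RC" is handled separately because the source only lists RC/Helitron.
SUPPORTED_CLASSES = {
    "LINE", "LTR", "SINE", "DNA", "TIR", "MITE",
    "Retroposon", "IS", "Unknown", "Unspecified",
}


def parse_te_header(header):
    """Split a fasta header into (name, class, superfamily).

    A header without '#Class' is reported with class 'Unknown';
    a missing superfamily is reported as None.
    """
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty fasta header")
    name, _, classification = tokens[0].partition("#")
    if not classification:
        return name, "Unknown", None
    te_class, _, superfamily = classification.partition("/")
    return name, te_class, superfamily or None


def is_supported(header):
    """Return True when the header's class is in the supported list."""
    _, te_class, superfamily = parse_te_header(header)
    if te_class == "RC":
        # Assumption: only RC/Helitron is listed as compatible.
        return superfamily == "Helitron"
    return te_class in SUPPORTED_CLASSES


for h in (">TEnameX#MITE", ">TEnameY#TIR/Mariner", ">TEnameZ", ">TEnameQ#Satellite"):
    print(h, parse_te_header(h), is_supported(h))
```

Running such a check over every header of the library is a quick way to spot entries whose class is outside the supported list.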
