Pantera
Identification of transposable element families from pangenome polymorphisms

A pangenome is a collection of genomes or haplotypes that can be aligned and stored as a variation graph in gfa format. pantera takes as input a list of gfa files of non-overlapping variation graphs and produces a library of the transposable elements found to be polymorphic in that pangenome.
0- Installing
Simply download pantera.R and make it executable:
chmod +x pantera.R
or run it with Rscript:
Rscript pantera.R
1- Prepare your gfa files
Use pggb to create the pangenome from your starting genome sequences. In its most basic form:
1.1- Create a single file for each chromosome by combining the fasta files of that chromosome from every genome.
cat *.fa > yourspecies.chr1.fa
1.2- Compress and index the file.
bgzip -@ 4 yourspecies.chr1.fa
samtools faidx yourspecies.chr1.fa.gz
1.3- Create a pangenome for each chromosome. -n is the number of haplotypes/genomes included. -t is the number of threads.
pggb -i yourspecies.chr1.fa.gz -o output -n 9 -t 16
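The value passed to -n in step 1.3 can be derived from the combined file rather than typed by hand. The sketch below counts the FASTA records in a plain or compressed file. It assumes that every genome or haplotype contributes exactly one sequence to the chromosome file, so the record count equals the haplotype count. count_records is a hypothetical helper name, not part of pggb or pantera.

```shell
# count_records: hypothetical helper that counts FASTA records in a file.
# Assumption: one record per haplotype, so the count can be used for pggb -n.
count_records() {
    # gzip -cdf decompresses gzip/bgzip input and passes plain text through unchanged
    gzip -cdf "$1" | grep -c '^>'
}

# Usage (assumes yourspecies.chr1.fa.gz from step 1.2 exists):
# n=$(count_records yourspecies.chr1.fa.gz)
# pggb -i yourspecies.chr1.fa.gz -o output -n "$n" -t 16
```

If a genome is split into several contigs for the same chromosome, the count will overestimate the number of haplotypes and -n should be set manually.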
2- Obtain the library from the gfa files
2.1 Create one file gfas_list with the list of the full paths to the gfa files that will be analyzed.
2.2 Run pantera, with -c as the number of threads.
pantera -g gfas_list -c 16 -o output_folder
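One possible way to build gfas_list for step 2.1 is to collect the absolute path of every gfa file under the pggb output directories. This is a minimal sketch: make_gfa_list is a hypothetical helper, the directory names in the usage lines are placeholders, and the '*.gfa' pattern is an assumption that may need narrowing if a directory holds more than one graph.

```shell
# make_gfa_list: hypothetical helper that prints the absolute path of every
# .gfa file found under the given directories, one per line, sorted.
make_gfa_list() {
    for dir in "$@"; do
        # cd + pwd turns a relative directory into an absolute path;
        # directories that cannot be entered are skipped
        abs=$(cd "$dir" && pwd) || continue
        find "$abs" -type f -name '*.gfa'
    done | sort
}

# Usage (output_chr1 and output_chr2 are placeholder directory names):
# make_gfa_list output_chr1 output_chr2 > gfas_list
# pantera -g gfas_list -c 16 -o output_folder
```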
3- Classify the library obtained
3.1 Use RepeatClassifier, which is part of the Dfam TE tools, or your classifier of choice to classify the sequences obtained.
RepeatClassifier -consensi pantera_lib.fa
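A quick tally of the assigned classes can help sanity-check the classified library. The sketch below assumes the classifier writes FASTA headers of the form ">name#Class/Family", which other classifiers may not follow. class_counts is a hypothetical helper name.

```shell
# class_counts: hypothetical helper that counts sequences per class label.
# Assumption: headers look like ">name#Class/Family optional description".
# Headers without a '#' are reported as "Unlabeled".
class_counts() {
    awk '/^>/ {
        label = "Unlabeled"
        # $1 is the header up to the first whitespace; keep what follows "#"
        if (match($1, /#/)) label = substr($1, RSTART + 1)
        n[label]++
    }
    END { for (l in n) print n[l] "\t" l }' "$1" | sort -k1,1nr -k2,2
}

# Usage:
# class_counts pantera_lib.fa.classified
```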
3 a- (Optional) Identify structural features in the sequences
pantercheck.R pantera_lib.fa.classified
This will produce a file pantera_lib.fa.classified.benchmark2 with the following fields:
- Column 1: sequence name
- Column 2: sequence length
- Column 3: LTR length
- Column 4: TIR length
- Column 5: polyA length
- Column 6: Sequence repetitiveness (lower means less repetitive)
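The columns above can be filtered with standard tools. The following sketch lists the sequences in which at least one structural feature was found. It assumes the file is whitespace-separated with the columns in the order listed, which is not confirmed here, and it skips any line whose second column is not a number (such as a header line, if present). structural_hits is a hypothetical helper name.

```shell
# structural_hits: hypothetical helper that prints "name<TAB>features" for each
# sequence with a non-zero LTR, TIR or polyA length.
# Assumption: whitespace-separated columns in the documented order.
structural_hits() {
    awk '$2 ~ /^[0-9]+$/ {
        tags = ""
        # "+ 0" forces a numeric comparison even for non-numeric cells
        if ($3 + 0 > 0) tags = tags "LTR "
        if ($4 + 0 > 0) tags = tags "TIR "
        if ($5 + 0 > 0) tags = tags "polyA "
        if (tags != "") { sub(/ $/, "", tags); print $1 "\t" tags }
    }' "$1"
}

# Usage:
# structural_hits pantera_lib.fa.classified.benchmark2
```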
Data example
In the folder test there is a gfa example file with the respective outputs that can be used to check if pantera works correctly on your system.
Requirements
pantera has been tested on Linux with R versions 4.2.2 to 4.3.3.
pantera requires MAFFT installed and available in the path.
pantercheck requires BLAST installed and available in the path.
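Before a long run it may be worth confirming that the required programs are on the path. This is a minimal sketch: missing_tools is a hypothetical helper, and blastn in the usage line is an assumption about which BLAST executable is needed, so substitute the one that pantercheck actually calls.

```shell
# missing_tools: hypothetical helper that prints each named command
# that cannot be found on the PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (blastn is an assumed executable name):
# missing_tools Rscript mafft blastn
```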
Citing pantera
If you use pantera in your work, please cite:
Sierra, P., Durbin, R. Identification of transposable element families from pangenome polymorphisms. Mobile DNA 15, 13 (2024). doi.org/10.1186/s13100-024-00323-y
