Telogator2
A method for measuring allele-specific telomere length (TL) and characterizing telomere variant repeat (TVR) sequences from long reads.
If this software has been useful for your work, please cite:
Stephens, Z., & Kocher, J. P. (2024). Characterization of telomere variant repeats using long reads enables allele-specific telomere length estimation. BMC Bioinformatics, 25(1), 194.
https://link.springer.com/article/10.1186/s12859-024-05807-5
Installation (conda env):
Telogator2 dependencies can be easily installed via conda:
git clone https://github.com/zstephens/telogator2.git && cd telogator2/
# create & activate conda environment
conda env create -f conda_env_telogator2.yaml
conda activate telogator2
# run test data
python telogator2.py -i test_data/hg002-ont-1p.fa.gz \
-o results/ \
-r ont
Installation (pip):
python3.12 -m venv venv
source venv/bin/activate
pip install git+https://github.com/zstephens/telogator2.git@v2.2.2
# run test data
telogator2 -i test_data/hg002-ont-1p.fa.gz \
-o results/ \
-r ont \
--minimap2 /path/to/minimap2
Notes:
An aligner executable must be specified via one of --minimap2, --winnowmap, or --pbmm2.
-i accepts fa, fa.gz, fq, fq.gz, or bam files, and multiple inputs can be provided (e.g. -i reads1.fa reads2.fa). For Revio reads sequenced with SMRT Link 13 and later, we advise including both the "hifi" BAM and the "fail" BAM as input.
Recommended settings:
Because sequencing platforms have different error profiles, we recommend running Telogator2 with platform-specific options:
- PacBio Revio HiFi (30x): -r hifi -n 4
- PacBio Sequel II (10x): -r hifi -n 3
- Nanopore R10 (30x): -r ont -n 4
Telogator2 may be unable to analyze older Nanopore data, as reads basecalled with Guppy have prohibitively high sequencing error rates in telomere regions.
For large datasets, such as data from enrichment methods described by Karimian et al. or Schmidt et al., higher thresholds may be needed to reduce false positives: -r ont -n 10.
By default, Telogator2 runs with 4 processes. Runtime can be greatly reduced by specifying more (e.g. -p 8 or -p 16), depending on the CPU resources available on your system.
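If you run Telogator2 across many samples, it can help to encode the recommended settings once rather than retyping them. The sketch below is a hypothetical helper, not part of Telogator2: the preset names and the build_telogator2_cmd function are illustrative assumptions, and it only emits flags documented in this README (-i, -o, -r, -n, -p, --minimap2).

```python
import shlex

# Hypothetical preset names mapped to the flags recommended in this README.
# Only the -r / -n values come from the recommendations; the names are made up.
PRESETS = {
    "revio_hifi": ["-r", "hifi", "-n", "4"],    # PacBio Revio HiFi (30x)
    "sequel2_hifi": ["-r", "hifi", "-n", "3"],  # PacBio Sequel II (10x)
    "ont_r10": ["-r", "ont", "-n", "4"],        # Nanopore R10 (30x)
    "ont_enriched": ["-r", "ont", "-n", "10"],  # large telomere-enrichment datasets
}


def build_telogator2_cmd(inputs, outdir, preset, minimap2_path, processes=4):
    """Return a Telogator2 argument list for the given (hypothetical) preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    if not inputs:
        raise ValueError("at least one input file is required")
    if processes < 1:
        raise ValueError("processes must be >= 1")
    cmd = ["telogator2", "-i", *inputs, "-o", outdir]  # -i accepts multiple files
    cmd += PRESETS[preset]
    cmd += ["-p", str(processes)]         # Telogator2's default is 4
    cmd += ["--minimap2", minimap2_path]  # an aligner executable is required
    return cmd


if __name__ == "__main__":
    args = build_telogator2_cmd(
        ["reads1.fa.gz", "reads2.fa.gz"], "results/", "ont_r10",
        "/path/to/minimap2", processes=16,
    )
    # Print a shell-safe command; subprocess.run(args, check=True) would execute it.
    print(shlex.join(args))
```

Returning a list rather than a single string keeps file paths with spaces intact when the list is handed to subprocess.run.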
Larger test data:
These are full-sized datasets and may take several hours to run:
HiFi reads (~70x): hg002-telreads_pacbio.fa.gz
ONT reads (~25x): hg002-telreads_ont.fa.gz
Output files
The primary output files are:
- tlens_by_allele.tsv: allele-specific telomere lengths
- all_final_alleles.png: plots of all alleles (TVR + telomere regions)
- violin_atl.png: violin plot of ATLs at each chromosome arm
The main results are in tlens_by_allele.tsv, which has the following columns:
- chr: anchor chromosome arm. Subtelomeres that could not be aligned are labeled chrU, for "unmapped".
- position: anchor coordinate
- ref_samp: the specific T2T reference contig to which the subtelomere was aligned
- allele_id: ID number for this specific allele. IDs ending in "i" indicate subtelomeres that were aligned to known interstitial telomere regions; these alleles should likely be excluded from subsequent analyses.
- TL_p75: ATL (reports the 75th percentile by default)
- read_TLs: ATL of each supporting read in the cluster
- read_lengths: length of each read in the cluster
- read_mapq: mapping quality of each read in the cluster
- tvr_len: length of the cluster's TVR region
- tvr_consensus: consensus TVR region sequence
- supporting_reads: read names of each read in the cluster
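A common first step downstream is to drop unmapped (chrU) and interstitial (allele_id ending in "i") rows and inspect the per-read values behind TL_p75. The sketch below does that with the standard library only. It assumes the file has a single header row using the column names above and that read_TLs holds comma-separated integers; neither detail is confirmed here, so check against your own output. The percentile function uses linear interpolation (NumPy's default convention), which may differ from how Telogator2 computes TL_p75 internally.

```python
import csv
import math


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    if not values:
        raise ValueError("percentile of an empty list")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def load_alleles(lines):
    """Parse tlens_by_allele.tsv lines, dropping unmapped and interstitial alleles.

    Assumes a header row with the documented column names and that read_TLs
    is a comma-separated list of integers (both are assumptions).
    """
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["chr"] == "chrU":
            continue  # subtelomere could not be aligned
        if row["allele_id"].endswith("i"):
            continue  # interstitial telomere region; likely exclude
        read_tls = [int(x) for x in row["read_TLs"].split(",") if x.strip()]
        kept.append({
            "chr": row["chr"],
            "allele_id": row["allele_id"],
            "read_TLs": read_tls,
            "p75_recomputed": percentile(read_tls, 75) if read_tls else None,
        })
    return kept
```

On real output, pass an open file handle directly, e.g. with open("results/tlens_by_allele.tsv", newline="") as fh: alleles = load_alleles(fh).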
Human subtelomere references
The reference sequence used for telomere anchoring currently contains the first and last 500kb of each chromosome from the following T2T assemblies:
- T2T-chm13: https://github.com/marbl/CHM13
- T2T-yao: https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA017932
- T2T-cn1: https://github.com/T2T-CN1/CN1
- T2T-hg002: https://github.com/marbl/hg002
- T2T-ksa001: https://github.com/bio-ontology-research-group/KSA001
- T2T-i002c: https://github.com/LHG-GG/I002C
More subtelomere contigs may be added as they become available.
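To make the reference layout concrete, the sketch below slices the first and last 500 kb from a chromosome sequence and writes them as FASTA records. It is an illustration only, not the script used to build Telogator2's bundled reference: the subtelomere_windows and to_fasta helpers and the "_p"/"_q" naming scheme are hypothetical, and the mapping of sequence start to p-arm assumes the usual T2T orientation.

```python
WINDOW = 500_000  # 500 kb from each chromosome end, as described above


def subtelomere_windows(name, seq, window=WINDOW):
    """Return the first and last `window` bases of one chromosome.

    Hypothetical naming: "<name>_p" for the start, "<name>_q" for the end.
    For sequences shorter than `window`, slicing returns the whole sequence.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [(f"{name}_p", seq[:window]), (f"{name}_q", seq[-window:])]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with fixed-width lines."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Note that for chromosomes shorter than 1 Mb the two windows overlap; a production tool would need to decide how to handle that case.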
Non-human references
Experimental support has been added for some non-human references, e.g. mouse:
python telogator2.py -i input.fa \
-o results/ \
-t source/resources/non-human/telogator-ref-mouse.fa.gz
Or maize:
python telogator2.py -i test_data/ZMMo17-hifi-7p8p.fa.gz \
-o results/ \
-r hifi \
-t source/resources/non-human/telogator-ref-maize.fa.gz \
-k source/resources/non-human/kmers_maize.tsv
