QuarTeT
A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
quarTeT: Telomere-to-telomere Toolkit
quarTeT is a collection of tools for T2T genome assembly and basic analysis, run as an automated workflow.
Tasks include:
- AssemblyMapper: reference-guided genome assembly
- GapFiller: long-reads based gap filling
- TeloExplorer: telomere identification
- CentroMiner: centromere candidate prediction
Version Change log
1.2.5
- Add new '--groupcontig' option for AssemblyMapper. Adding this option will output a folder containing contigs grouped by reference sequence (will group unassigned contigs into one).
- Improve error report in CentroMiner module.
- Fix a bug that '--keep' option for AssemblyMapper refers to the wrong dictionary.
- Fix a bug where arguments containing spaces were not recognized by the main program.
1.2.4
- Add new '--teclade' and '--teminrepeattimes' option for AssemblyMapper to control the behavior of built-in TeloExplorer.
- Add new '-a' option for GapFiller to select unimap as aligner. (Also fix the bug that the default aligner was not set in GapFiller after v1.2.3, thanks to a927050047, PR #46)
1.2.3
- Add new '--extract-ref-flanks' option for AssemblyMapper, which allows chimeric contig output for gap filling. (see issue #42 for detail)
- Support Unimap as aligner. As an optimized version of minimap2, it still uses the '--minimapoption' option.
- Fix a bug that monopolizer contig length standard is too high.
- Deprecate the web server due to maintenance issues (sorry!).
1.2.2
- Add new '--keep' option for AssemblyMapper, which adds all unplaced contigs to the draft genome.
- Add a new output AGP file for GapFiller to describe the modified chromosome structure.
- Fix another bug that caused a small number of Ns representing unknown bases to be identified as a gap.
1.2.1
- Fix a bug in CentroMiner that plotting halted when the optional gene/TE annotation file is not given.
1.2.0
- CentroMiner is refactored. It can now receive gene annotation as well. The output now has only 2 folders: TandemRepeat (fasta and gff3 of TR) and Candidate (candidate info and a line chart of TR, TE, and gene content, which requires ggplot2). TE data is removed from candidate info. It is recommended to check the line chart to decide which candidate to choose.
- '--noplot' option is added to each module. With this option, all plotting will be skipped. If you run into graphics-related problems, try this option.
- CloserScore in GapFiller detail is changed to CloserIdentity.
- Due to the major changes, the web server version of quarTeT will not be updated to the newest version for now.
1.1.8
- GapFiller will throw a warning instead of an error when the flanking sequence contains a gap.
1.1.7
- Support RepeatMasker's TE annotation format for CentroMiner module.
1.1.6
- Add new option 'maximum TR length' (-r) for CentroMiner to prevent trf from getting stuck. (Thanks to atotickov, PR #22)
- Disable unstable 'join' mode for GapFiller by default. Use '--enablejoin' option to enable this mode. '--fillonly' option is removed.
1.1.5
- SVG output is moved to work dir instead of tmp dir. Intermediate file for figure drawing is saved to tmp dir instead of auto-remove.
- Fix a bug that running multiple quarTeT in one folder may cause error due to intermediate file overwrite.
- Fix a bug in AssemblyMapper that with option '--nofilter', contigs shorter than 50000 bp were still marked as too short and counted in the discarded length.
- Fix a bug that error in R figure drawing is not reported.
1.1.4
- Fix a bug in AssemblyMapper that large dict tmp files were not written properly.
- Reduce more peak memory.
- Add a memory insufficient error report.
1.1.3
- Reduce peak memory.
- Add more error report.
- Fix some error without exit.
1.1.2
- Fix a bug that AssemblyMapper cannot overwrite existing telomere checking result.
- Fix a bug that a small number of Ns representing unknown bases are identified as a gap.
1.1.1
- Fix a bug that CentroMiner got stuck after v1.0.4.
1.1.0
- AssemblyMapper: new option '--nofilter'. With this option, input contigs will not be split or discarded even if they contain gaps or are too short.
- GapFiller: support join, but this is not as reliable as fill. You can use option '--fillonly' and '--joinonly' to disable one of them.
- TeloExplorer: now compatible with latest tidk version 0.2.31.
- Fix a bug that the error report added in v1.0.4 didn't include stderr.
1.0.4
- Add more reporting when called programs fail.
1.0.3
- Fix a bug that no warning was raised when figure drawing failed.
1.0.2
- Fix a bug in TeloExplorer that finding more than one possible telomere-like repeat was treated as finding none.
1.0.1
- Fix a bug in CentroMiner that when no centromere-like region is found on a chromosome, genome overview plotting will unexpectedly exit.
1.0.0
- Initial release
Getting Started
Use quarTeT on Web
~~quarTeT can be easily accessed on our web server.~~ (Currently deprecated.)
Use quarTeT locally
The quarTeT command-line program is available for Linux.
Dependencies
- Python3 (>3.6, tested on 3.7.4 and 3.9.12)
- Minimap2 (tested on 2.24-r1122 and 2.24-r1155-dirty)
- (Optional) Unimap (tested on 0.1-r41)
- MUMmer4 (tested on 4.0.0rc1)
- trf (tested on 4.09)
- CD-hit (tested on 4.6 and 4.8.1)
- BLAST+ (tested on 2.8.1 and 2.11.0)
- tidk (tested on 0.2.1 and 0.2.31)
- gnuplot (tested on 4.6 and 5.4)
- R (>3.5.0, tested on 3.6.0 and 4.2.2)
- RIdeogram (tested on 0.2.2)
- ggplot2 (tested on 3.3.6 and 3.4.4)
All these dependencies can be easily installed via conda:
conda create -n quarTeTdependencies --channel conda-forge --channel bioconda python=3.11.4 minimap2=2.26 mummer4=4.0.0rc1 trf=4.09.1 cd-hit=4.8.1 blast=2.14.0 tidk=0.2.31 r=4.3 r-rideogram=0.2.2 r-ggplot2=3.4.4 gnuplot=5.4 unimap=0.1
Installation
quarTeT does not require installation.
Just clone this repository with `git clone https://github.com/aaranyue/quarTeT`, then run `python3 {path}/quartet.py`
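Before the first run, it can help to confirm that the external programs are reachable on PATH. The following is a minimal sketch, not part of quarTeT itself. The executable names are assumptions based on what these packages commonly install (for example `cd-hit-est` for CD-hit and `blastn` for BLAST+); adjust them to whatever your environment provides.

```python
import shutil

# Assumed executable names; quarTeT may call different binaries.
REQUIRED_TOOLS = ["minimap2", "nucmer", "delta-filter", "trf",
                  "cd-hit-est", "blastn", "tidk", "gnuplot", "Rscript"]
OPTIONAL_TOOLS = ["unimap"]


def find_missing(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
        missing = find_missing(tools)
        if missing:
            print(f"Missing {label} tools: {', '.join(missing)}")
        else:
            print(f"All {label} tools found.")
```

Run it inside the activated conda environment so that the check sees the same PATH quarTeT will use.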
Usage
quarTeT: Telomere-to-telomere Toolkit
Usage: python3 quartet.py <module> <parameters>
Modules:
AssemblyMapper | am Assemble draft genome.
GapFiller | gf Fill gaps in draft genome.
TeloExplorer | te Identify telomeres.
CentroMiner | cm Identify centromere candidates.
Use <module> -h for module usage.
AssemblyMapper
AssemblyMapper is a reference-guided assembly tool.
A phased contig-level assembly and a closely related reference genome are required as input, both in FASTA format.
Note that contigs should be phased.
It's recommended to obtain such an assembly using hifiasm.
You can convert {prefix}.bp.hap1.p_ctg.gfa and {prefix}.bp.hap2.p_ctg.gfa generated by hifiasm to FASTA format and use each one as input separately.
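One common way to do the GFA-to-FASTA conversion is to write out every segment (`S`) record as a FASTA entry. The sketch below assumes the GFA files store sequences inline in the third column of `S` lines; the function names are illustrative and not part of quarTeT.

```python
def gfa_to_fasta(gfa_lines):
    """Yield FASTA lines for every segment (S) record in a GFA file."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Segment records look like: S <name> <sequence> [optional tags...]
        # A sequence of "*" means the sequence is not stored in the file.
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield f">{fields[1]}"
            yield fields[2]


def convert(gfa_path, fasta_path):
    """Write all segments of `gfa_path` to `fasta_path` in FASTA format."""
    with open(gfa_path) as gfa, open(fasta_path, "w") as fasta:
        for out_line in gfa_to_fasta(gfa):
            fasta.write(out_line + "\n")
```

For example, `convert("sample.bp.hap1.p_ctg.gfa", "sample.hap1.fasta")` would produce one FASTA file for the first haplotype (the file names here are placeholders); repeat for the second haplotype and pass each file to `-q` in a separate run.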
Usage: python3 quartet.py AssemblyMapper <parameters>
-h, --help show this help message and exit
-r REFERENCE_GENOME (*Required) Reference genome file, FASTA format.
-q CONTIGS (*Required) Phased contigs file, FASTA format.
-c MIN_CONTIG_LENGTH Contigs shorter than INT (bp) will be removed, default: 50000
-l MIN_ALIGNMENT_LENGTH
The min alignment length to be select (bp), default: 10000
-i MIN_ALIGNMENT_IDENTITY
The min alignment identity to be select (%), default: 90
-p PREFIX The prefix used on generated files, default: quarTeT
-t THREADS Use number of threads, default: 1
-a {minimap2,unimap,mummer}
Specify alignment program (support minimap2, unimap and mummer), default: minimap2
--nofilter Use original sequence input, no filtering.
--keep Keep the unplaced contigs in draft genome
--groupcontig Add an folder output of contigs grouped by destination.
--extract-ref-flanks CHIMERA
Add an output of chimera contig containing reference flanks of x bp (check issue#42 for detail), default: 0 (off)
--plot Plot a colinearity graph for draft genome to reference alignments. (will cost more time)
--noplot Skip all ploting.
--overwrite Overwrite existing alignment file instead of reuse.
--minimapoption MINIMAPOPTION
Pass additional parameters to minimap2/unimap program, default: -x asm5
--nucmeroption NUCMEROPTION
Pass additional parameters to nucmer program.
--deltafilteroption DELTAFILTEROPTION
Pass additional parameters to delta-filter program.
--teclade {plant,animal,other}
Specify clade of this genome for telomere search. Plant will search TTTAGGG, animal will search TTAGGG, other will use tidk explore's suggestion, default: other
--teminrepeattimes TE_MIN_REPEAT_TIMES
The min repeat times to considered as telomere, default: 100
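When AssemblyMapper is called from a script, building the argument list explicitly avoids shell quoting problems. This is a minimal sketch that uses only flags listed in the help text above; the function name, its parameters, and the input file names are hypothetical.

```python
import shlex


def build_assemblymapper_cmd(reference, contigs, prefix="quarTeT", threads=1,
                             aligner="minimap2", keep=False, noplot=False,
                             quartet_path="quartet.py"):
    """Return an argument list for one AssemblyMapper run."""
    if aligner not in ("minimap2", "unimap", "mummer"):
        raise ValueError(f"unsupported aligner: {aligner}")
    cmd = ["python3", quartet_path, "AssemblyMapper",
           "-r", reference, "-q", contigs,
           "-p", prefix, "-t", str(threads), "-a", aligner]
    if keep:
        cmd.append("--keep")      # keep unplaced contigs in the draft genome
    if noplot:
        cmd.append("--noplot")    # skip all plotting
    return cmd


if __name__ == "__main__":
    cmd = build_assemblymapper_cmd("ref.fasta", "hap1.fasta", prefix="hap1",
                                   threads=8, keep=True)
    # Print a copy-pasteable command line; placeholders are example values.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to launch the run.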
Output files should be as follows:
{prefix}.draftgenome.fasta | The pseudo-chromosome-level assembly, fasta format.
{prefix}.draftgenome.agp | The structure of this assembly, AGP format.
{prefix}.draftgenome.stat | The statistic of this assembly, including total size and each chromosome's size, GC content, gap count and locations.
{prefix}.draftgenome.png | The figure draws relative length of chromosomes and gap locations for assembly.
{prefix}.contig.ma
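To inspect the gaps that remain in a draft genome, the AGP file can be read directly. The sketch below assumes that {prefix}.draftgenome.agp follows the standard tab-delimited AGP layout, in which column 5 is the component type and the values `N` and `U` mark gaps; this has not been checked against quarTeT's actual output.

```python
def list_gaps(agp_lines):
    """Return (object, start, end) for every gap line in an AGP file."""
    gaps = []
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        # Columns: object, object_beg, object_end, part_number, component_type, ...
        if len(fields) >= 6 and fields[4] in ("N", "U"):
            gaps.append((fields[0], int(fields[1]), int(fields[2])))
    return gaps


if __name__ == "__main__":
    # Made-up example records, not real quarTeT output.
    example = [
        "# example AGP\n",
        "Chr1\t1\t1000\t1\tW\tctg1\t1\t1000\t+\n",
        "Chr1\t1001\t1100\t2\tN\t100\tscaffold\tyes\talign_genus\n",
        "Chr1\t1101\t2100\t3\tW\tctg2\t1\t1000\t-\n",
    ]
    for name, start, end in list_gaps(example):
        print(f"{name}\t{start}\t{end}\t{end - start + 1} bp")
```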
