The Breakinator
The Breakinator identifies and flags putative artifact reads (foldback and chimeric reads) by parsing long-read alignment files in SAM/BAM/CRAM or PAF format.
Installation
Prebuilt Binaries
Prebuilt binaries can be downloaded from the Releases page.
wget https://github.com/jheinz27/breakinator/releases/download/v{x.y.z}/breakinator-v{x.y.z}-{system}.tar.gz
tar -xvzf breakinator-v{x.y.z}-{system}.tar.gz
breakinator-v{x.y.z}-{system}/bin/breakinator --help
Bioconda
conda install -c bioconda -c conda-forge breakinator=1.1.1
Install from source
git clone https://github.com/jheinz27/breakinator
cd breakinator/breakinator
cargo build --release
./target/release/breakinator --help
Prerequisites
- Rust programming language >= v1.70
- clap = "4.0"
- rust-htslib = "0.46.0"
Breakinator Usage
Usage: breakinator [OPTIONS] --input <FILE>
Options:
-i, --input <FILE> SAM/BAM/CRAM file sorted by read IDs
--paf Input file is PAF
-q, --min-mapq <INT> Minimum mapping quality [default: 10]
-a, --min-map-len <INT> Minimum alignment length (bps) [default: 200]
--no-sym Report all foldback reads, not just those with breakpoint within margin of middle of read
-g, --genome <FASTA> Reference genome FASTA used (must be provided for CRAM input)
-m, --margin <FLOAT> [0-1], Proportion from center of read on either side to be considered sym foldback artifact [default: 0.1]
--rcoord Print read coordinates of breakpoint in output
-o, --out <FILE> Output file name [default: breakinator_out.txt]
-c, --chim <INT> Minimum distance to be considered chimeric [default: 1000000]
-f, --fold <INT> Max distance to be considered foldback [default: 200]
--tabular Print a TSV table instead of the default report (useful if evaluating multiple samples)
-t, --threads <INT> Number of threads to use for BAM/CRAM I/O [default: 2]
-h, --help Print help
-V, --version Print version
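To make the --chim and --fold thresholds concrete, here is a minimal Rust sketch of one way to classify the junction between two alignment segments that are adjacent along a read. It is not taken from the Breakinator source. The `Segment` type, the `classify` function and the rules themselves are assumptions based only on the option descriptions above:
- Segments on a different reference, or at least --chim bp apart, are chimeric.
- Segments on opposite strands and at most --fold bp apart are a foldback.

```rust
/// One alignment segment of a read (hypothetical type for this sketch).
#[derive(Debug, Clone)]
pub struct Segment {
    pub target: String, // reference sequence name
    pub reverse: bool,  // true if the segment aligns to the reverse strand
    pub t_start: u64,   // 0-based start on the reference
    pub t_end: u64,     // end on the reference (exclusive)
}

#[derive(Debug, PartialEq)]
pub enum Junction {
    Foldback,
    Chimeric,
    Other,
}

impl Segment {
    /// Reference coordinate where the read leaves this segment.
    fn exit_pos(&self) -> u64 {
        if self.reverse { self.t_start } else { self.t_end }
    }

    /// Reference coordinate where the read enters this segment.
    fn entry_pos(&self) -> u64 {
        if self.reverse { self.t_end } else { self.t_start }
    }
}

/// Classify the junction between two segments that are adjacent along the read.
/// `fold` and `chim` play the roles of the --fold and --chim thresholds (assumed semantics).
pub fn classify(a: &Segment, b: &Segment, fold: u64, chim: u64) -> Junction {
    if a.target != b.target {
        // Segments on different reference sequences.
        return Junction::Chimeric;
    }
    let dist = a.exit_pos().abs_diff(b.entry_pos());
    if a.reverse != b.reverse && dist <= fold {
        // The read turns back onto the opposite strand at (nearly) the same locus.
        Junction::Foldback
    } else if dist >= chim {
        Junction::Chimeric
    } else {
        Junction::Other
    }
}
```

Under this sketch, consider a read whose forward-strand segment ends at chr1:5,000 and whose next segment maps to the reverse strand, entering at chr1:5,100. With the default thresholds (200 and 1,000,000), that read would be classified as a foldback.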
Example Usage
Note that Breakinator currently supports only input in which all alignments of a read appear consecutively, i.e. name-sorted or name-grouped files such as the default minimap2 output. To avoid reading the whole file into memory, it processes one run of consecutive lines sharing a read ID at a time. Run Breakinator before any coordinate sorting of the file.
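The streaming approach described above can be sketched in Rust as follows. This is an illustrative, hypothetical helper (`for_each_read_group`), not Breakinator's code. It reads tab-separated SAM or PAF lines and holds only the current run of records sharing a read ID (column 1), passing each run to a callback. In a coordinate-sorted file, a read's alignments are scattered, so the same read ID would appear as several separate groups. That is why grouped input is required.

```rust
use std::io::BufRead;

/// Stream tab-separated alignment records (SAM body or PAF) and pass each run of
/// consecutive records sharing a read ID (column 1) to `handle`.
/// Only one group is held in memory at a time.
pub fn for_each_read_group<R: BufRead, F: FnMut(&str, &[String])>(
    reader: R,
    mut handle: F,
) -> std::io::Result<()> {
    let mut current_id: Option<String> = None;
    let mut group: Vec<String> = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // Skip SAM header lines and blank lines.
        if line.starts_with('@') || line.is_empty() {
            continue;
        }
        let id = line.split('\t').next().unwrap_or("").to_string();
        if current_id.as_deref() != Some(id.as_str()) {
            // A new read ID starts: flush the previous group first.
            if let Some(prev) = current_id.take() {
                handle(prev.as_str(), group.as_slice());
                group.clear();
            }
            current_id = Some(id);
        }
        group.push(line);
    }
    // Flush the final group.
    if let Some(prev) = current_id {
        handle(prev.as_str(), group.as_slice());
    }
    Ok(())
}
```

Feeding this helper a file where read r1 reappears after r2 yields two separate r1 groups. This is the situation that grouped input avoids.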
For SAM/BAM/CRAM
minimap2 -ax map-ont genome.fa reads.fastq > alignments.sam
./breakinator -i alignments.sam -o breakinator_out.txt
For PAF (include --paf flag)
minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf
./breakinator -i alignments.paf --paf -o breakinator_out.txt
Generating PAF files
The Breakinator also accepts PAF input. To generate PAF files, we recommend running minimap2 with the -c and --secondary=no options. The Breakinator ignores secondary alignments, but including them increases processing time.
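The following is a hypothetical sketch of how a PAF line could be parsed and filtered. It assumes the 12 standard PAF columns followed by minimap2's optional tags, where `tp:A:S` marks a secondary alignment. `PafRecord` and `parse_paf` are illustrative names. Measuring alignment length as `qend - qstart` for the --min-map-len check is also an assumption, since the exact definition Breakinator uses is not stated here.

```rust
/// Fields of a PAF record used in this sketch (hypothetical type).
#[derive(Debug)]
pub struct PafRecord {
    pub qname: String,
    pub qlen: u64,
    pub qstart: u64,
    pub qend: u64,
    pub reverse: bool,
    pub tname: String,
    pub tstart: u64,
    pub tend: u64,
    pub mapq: u8,
}

/// Parse one PAF line. Returns None for malformed lines, unmapped placeholders
/// (target name '*'), secondary alignments (tp:A:S), and records below the
/// MAPQ or aligned-query-length thresholds.
pub fn parse_paf(line: &str, min_mapq: u8, min_len: u64) -> Option<PafRecord> {
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 12 || cols[5] == "*" {
        return None;
    }
    // Optional tags start at column 13; minimap2 marks secondaries with tp:A:S.
    if cols[12..].iter().any(|tag| *tag == "tp:A:S") {
        return None;
    }
    let rec = PafRecord {
        qname: cols[0].to_string(),
        qlen: cols[1].parse().ok()?,
        qstart: cols[2].parse().ok()?,
        qend: cols[3].parse().ok()?,
        reverse: cols[4] == "-",
        tname: cols[5].to_string(),
        tstart: cols[7].parse().ok()?,
        tend: cols[8].parse().ok()?,
        mapq: cols[11].parse().ok()?,
    };
    // Aligned length measured on the query (assumption of this sketch).
    if rec.mapq < min_mapq || rec.qend.saturating_sub(rec.qstart) < min_len {
        return None;
    }
    Some(rec)
}
```

Calling `parse_paf(line, 10, 200)` mirrors the default -q and -a values.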
Example:
minimap2 -cx map-ont --secondary=no genome.fa reads.fastq > alignments.paf
Alternatively, convert an existing SAM file to PAF with paftools.js sam2paf. The -p option converts only primary and supplementary alignments.
Example:
paftools.js sam2paf -p alignments.sam > alignments.paf
Optional: turn off symmetry filter for foldback artifacts
To investigate all potential foldback events in a sample, we recommend turning off the symmetry filter with the --no-sym flag.
./breakinator -i alignments.paf --paf --no-sym
<img width="742" alt="Screenshot 2025-05-09 at 10 15 35 AM" src="https://github.com/user-attachments/assets/c66855bb-5fbd-4143-a884-9bd200a4395f" />
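The symmetry filter can be sketched as follows. The sketch rests on two assumptions:
- --margin is a fraction of the full read length, applied on either side of the read midpoint.
- The breakpoint's read coordinate is taken midway between the first segment's query end and the second segment's query start.

Both functions are hypothetical illustrations of that interpretation. Under it, --no-sym corresponds to skipping `is_symmetric` entirely.

```rust
/// Returns true if a foldback breakpoint at read coordinate `bp` lies within
/// `margin * read_len` bases of the read midpoint, on either side.
/// With margin = 0.1 this keeps breakpoints in roughly the middle 20% of the read.
pub fn is_symmetric(bp: u64, read_len: u64, margin: f64) -> bool {
    let len = read_len as f64;
    let mid = len / 2.0;
    (bp as f64 - mid).abs() <= margin * len
}

/// Read coordinate of the breakpoint between two adjacent segments: the midpoint
/// between the first segment's query end and the second segment's query start
/// (hypothetical helper; handles both small gaps and small overlaps).
pub fn read_breakpoint(a_qend: u64, b_qstart: u64) -> u64 {
    (a_qend + b_qstart) / 2
}
```

For a 10 kb read with the default margin of 0.1, a breakpoint at 5.9 kb passes the filter and one at 6.1 kb does not.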
Preprocessing for alignment to diploid genome assemblies with The Diploidinator
Minimap2 was not designed for diploid assemblies (e.g., HG002). When reads are aligned to a diploid assembly, their mapping quality may be lower because each read can align well to two locations, one per haplotype. We developed a simple Rust tool, the Diploidinator, for this case. First align the reads separately to each haplotype of the diploid assembly. The Diploidinator then parses both PAF files and keeps the better alignment of each read based on its alignment score.
Diploidinator Installation
git clone https://github.com/jheinz27/breakinator
cd breakinator/diploidinator
cargo build --release
./target/release/diploidinator
Diploidinator Example Usage
NOTE: Use the --secondary=no and --paf-no-hit options when aligning with minimap2. The Diploidinator currently works only on PAF files.
minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.MATERNAL.fasta reads.fastq > out_mat.paf
minimap2 -cx splice -uf -k14 -t 16 --secondary=no --paf-no-hit hg002v1.1.PATERNAL.fasta reads.fastq > out_pat.paf
diploidinator out_mat.paf out_pat.paf > out_haps_merge.paf
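The haplotype selection step can be illustrated with the Rust sketch below. It is not the Diploidinator's source. Three choices are assumptions:
- the `pick_haplotype` function itself;
- scoring each read by the highest `AS:i` value among its records;
- breaking ties in favour of the maternal haplotype.

minimap2 writes `AS:i` only when run with -c. Reads reported via --paf-no-hit carry no score, so here they count as unmapped. One plausible reason for requiring --paf-no-hit is that every read then appears in both files in the same input order, so the two files can be walked in lockstep. This is an inference, not a documented behaviour of the Diploidinator.

```rust
#[derive(Debug, PartialEq)]
pub enum Hap {
    Maternal,
    Paternal,
}

/// Extract the AS:i alignment score from a PAF line (written by minimap2 with -c).
pub fn alignment_score(line: &str) -> Option<i64> {
    line.split('\t')
        .skip(12) // tags start after the 12 mandatory PAF columns
        .find_map(|tag| tag.strip_prefix("AS:i:"))
        .and_then(|v| v.parse().ok())
}

/// Best AS:i among one read's PAF lines; None if no line carries a score (unmapped).
fn best_score(lines: &[&str]) -> Option<i64> {
    lines.iter().filter_map(|l| alignment_score(l)).max()
}

/// Choose the haplotype whose best alignment scores higher (ties go to maternal).
pub fn pick_haplotype(mat: &[&str], pat: &[&str]) -> Option<Hap> {
    match (best_score(mat), best_score(pat)) {
        (None, None) => None,
        (Some(_), None) => Some(Hap::Maternal),
        (None, Some(_)) => Some(Hap::Paternal),
        (Some(m), Some(p)) => Some(if p > m { Hap::Paternal } else { Hap::Maternal }),
    }
}
```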
Merging Breakpoints Into Consensus Locations
To estimate how many unique breakpoints a sample contains and how much read support each has, we developed a simple script, merge_breaks.py. It merges breakpoints that occur within 100 bp of each other (default; set with -w). A consensus breakpoint location is reported only if at least 2 reads support it (default; set with -s).
usage: merge_breaks.py [-h] -i <breakpoints.txt> [-w --merge_window] [-s --min_support] > merged_breaks.txt
Merge Break Points from Breakinator output
optional arguments:
-h, --help show this help message and exit
-i <breakpoints.txt> input breakinator stdout
-w --merge_window Size of window to merge break points in
-s --min_support minimum reads supporting breakpoint
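merge_breaks.py is a Python script; to keep the examples in one language, the Rust sketch below illustrates the same windowed merging idea. Two choices in it are assumptions, not necessarily what merge_breaks.py does:
- Chaining rule: a breakpoint joins the current cluster if it lies on the same reference within `window` bp of the previous breakpoint.
- Consensus position: the middle breakpoint of each cluster.

`merge_breakpoints` and `Consensus` are hypothetical names.

```rust
/// A merged breakpoint: reference name, consensus position, number of supporting reads.
#[derive(Debug, PartialEq)]
pub struct Consensus {
    pub chrom: String,
    pub pos: u64,
    pub support: usize,
}

/// Merge (reference, position) breakpoints lying within `window` bp of their neighbour
/// and report clusters supported by at least `min_support` breakpoints.
pub fn merge_breakpoints(mut bps: Vec<(String, u64)>, window: u64, min_support: usize) -> Vec<Consensus> {
    bps.sort(); // by reference name, then position
    let mut out = Vec::new();
    let mut i = 0;
    while i < bps.len() {
        // Extend the cluster while the next breakpoint is on the same reference
        // and within `window` bp of the previous one (single-linkage chaining).
        let mut j = i + 1;
        while j < bps.len() && bps[j].0 == bps[i].0 && bps[j].1 - bps[j - 1].1 <= window {
            j += 1;
        }
        let cluster = &bps[i..j];
        if cluster.len() >= min_support {
            // Middle element of the sorted cluster as the consensus position.
            let pos = cluster[cluster.len() / 2].1;
            out.push(Consensus { chrom: cluster[0].0.clone(), pos, support: cluster.len() });
        }
        i = j;
    }
    out
}
```

With `window = 100` and `min_support = 2`, breakpoints at chr1:1,000, 1,050 and 1,120 collapse into one consensus at chr1:1,050 with 3 supporting reads. A lone breakpoint at chr1:5,000 is dropped.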
Citation
If the Breakinator has helped you in your research, please cite our preprint at: https://www.biorxiv.org/content/10.1101/2025.07.15.664946v2.abstract