TargetCall
TargetCall is the first pre-basecalling filter that is applicable to a wide range of use cases to eliminate wasted computation in basecalling. Described in our preprint: https://arxiv.org/abs/2212.04953
TargetCall’s key idea is to quickly filter out off-target reads (i.e., reads that are dissimilar to the target reference) before the basecalling step, which eliminates the computation that basecalling would otherwise waste on them. TargetCall is based on the ONT basecaller Bonito.
Prerequisites
TargetCall requires minimap2 to be installed. See the Minimap2 (v2.24) release for installation instructions.
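Before running TargetCall, it can help to confirm that minimap2 is reachable on the PATH. The following is a minimal sketch and not part of TargetCall; the helper name `find_minimap2` is hypothetical.

```python
import shutil
import subprocess


def find_minimap2(executable="minimap2"):
    """Return (path, version) for minimap2, or (None, None) if it is not on PATH.

    `find_minimap2` is a hypothetical helper, not part of TargetCall.
    """
    path = shutil.which(executable)
    if path is None:
        return None, None
    # `minimap2 --version` prints the version string on standard output.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = result.stdout.strip() or None
    return path, version


if __name__ == "__main__":
    path, version = find_minimap2()
    if path is None:
        print("minimap2 not found on PATH; install it before running TargetCall")
    else:
        print(f"found minimap2 at {path} (version: {version})")
```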
Installation
TargetCall is tested on Linux with conda version 4.7.12.
$ git clone https://github.com/CMU-SAFARI/TargetCall
$ cd TargetCall
$ conda create --name targetcall python=3.8.10
$ conda activate targetcall
(targetcall) $ pip install --upgrade pip
(targetcall) $ pip install -r requirements.txt
(targetcall) $ python setup.py develop
Depending on your CUDA version, you may need to use requirements-cuda111.txt or requirements-cuda113.txt.
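The choice of requirements file can be written down as a small lookup. This is only a sketch: the function name `requirements_file` is hypothetical, and falling back to requirements.txt for any CUDA version other than 11.1 or 11.3 is an assumption rather than something this README states.

```python
def requirements_file(cuda_version=None):
    """Map a CUDA version string (e.g. "11.3") to a requirements file name.

    Only the two CUDA-specific files named in this README are mapped; falling
    back to requirements.txt in every other case is an assumption.
    """
    cuda_specific = {
        "11.1": "requirements-cuda111.txt",
        "11.3": "requirements-cuda113.txt",
    }
    if cuda_version is None:
        return "requirements.txt"
    # Keep only major.minor so that "11.3.1" is treated as "11.3".
    major_minor = ".".join(cuda_version.strip().split(".")[:2])
    return cuda_specific.get(major_minor, "requirements.txt")


if __name__ == "__main__":
    print(requirements_file("11.3"))
```

The returned name can then be passed to `pip install -r` in place of requirements.txt.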
Usage
$ cd src
$ python targetcall.py ../sample_data/fast5/ ../sample_data/Monkeypox_virus.fasta TINYX011 ../sample_data/
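When TargetCall is driven from another script, the same invocation can be assembled in Python. The sketch below assumes the four positional arguments are, in order, the fast5 directory, the target reference, the model name, and the output directory, as the example command suggests; the function and parameter names are hypothetical labels chosen for this sketch.

```python
import subprocess
import sys


def build_targetcall_command(fast5_dir, reference_fasta, model, output_dir):
    """Build the argument list for targetcall.py.

    The argument order mirrors the example command above; the parameter names
    are descriptive labels for this sketch, not names taken from TargetCall.
    """
    return [sys.executable, "targetcall.py",
            fast5_dir, reference_fasta, model, output_dir]


def run_targetcall(fast5_dir, reference_fasta, model, output_dir, src_dir="."):
    """Run targetcall.py from `src_dir` and raise if it exits with an error."""
    command = build_targetcall_command(fast5_dir, reference_fasta, model, output_dir)
    subprocess.run(command, cwd=src_dir, check=True)
```

For the sample data, this would be called as `run_targetcall("../sample_data/fast5/", "../sample_data/Monkeypox_virus.fasta", "TINYX011", "../sample_data/", src_dir="src")` from the repository root.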
This creates three output files under ../sample_data/:
- output.fasta: the noisy basecalled reads of the fast5 files, produced with model TINYX011.
- output.sam: the alignments of the noisy reads to the Monkeypox_virus reference.
- readids.txt: the IDs of the reads that are accepted by the filter.
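To illustrate how an alignment file relates to a list of accepted reads, the sketch below extracts the IDs of reads that have at least one mapped record in a SAM file. It assumes a read is "accepted" when the unmapped bit (0x4) of its SAM FLAG is not set; this is an illustrative criterion, and TargetCall's actual acceptance rule may differ. The function name `mapped_read_ids` is hypothetical.

```python
def mapped_read_ids(sam_lines):
    """Return the IDs of reads that have at least one mapped SAM record.

    Assumption: a read is treated as accepted when the unmapped bit (0x4) of
    its FLAG field is not set. TargetCall's actual criterion may differ.
    """
    accepted = []
    seen = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        read_id, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if read_id not in seen:
            seen.add(read_id)
            accepted.append(read_id)
    return accepted
```

Passing an open file handle works directly, for example `mapped_read_ids(open("../sample_data/output.sam"))`.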
The read IDs in readids.txt can be passed to Bonito with the --read-ids option so that only the reads accepted by the filter are basecalled.
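A minimal sketch of that hand-off is shown below. It assumes readids.txt holds one read ID per line and that Bonito is invoked as `bonito basecaller MODEL READS_DIR`; both should be checked against the installed Bonito version. The function names are hypothetical, and the command is only assembled, not executed.

```python
def parse_read_ids(lines):
    """Return the non-empty read IDs from an iterable of text lines.

    Assumption: readids.txt contains one read ID per line.
    """
    return [line.strip() for line in lines if line.strip()]


def build_bonito_command(model, reads_dir, read_ids_path):
    """Assemble a Bonito basecalling command restricted to the accepted reads.

    Assumption: the positional layout `bonito basecaller MODEL READS_DIR`;
    only the --read-ids option is named in this README.
    """
    return ["bonito", "basecaller", model, reads_dir, "--read-ids", read_ids_path]


if __name__ == "__main__":
    command = build_bonito_command(
        "default", "../sample_data/fast5/", "../sample_data/readids.txt"
    )
    print(" ".join(command))
```

Counting the accepted reads first, for example with `len(parse_read_ids(open("../sample_data/readids.txt")))`, gives a quick check that the filter kept a non-empty set before the full basecalling run starts.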
Provided Models
You can find all models listed under bonito/models/.
| Model Name | Model Name in the Paper | # of Parameters | Basecalling Accuracy |
| ------------- | ------------- | ------------- | ------------- |
| default | Bonito | 9739K | 94.60% |
| TINYX0111 | LC-Main*2 | 565K | 90.91% |
| TINYX011 | LC-Main | 292K | 89.75% |
| TINYX01 | LC-Main/2 | 146K | 86.83% |
| TINYX2 | LC-Main/4 | 52K | 80.82% |
| TINYX3 | LC-Main/8 | 21K | 70.42% |
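The trade-off between model size and accuracy can be made explicit by comparing each model with the default Bonito model. The sketch below uses only the numbers from the table; the names `MODELS` and `compare_to_default` are hypothetical.

```python
# (parameters in thousands, basecalling accuracy in percent), from the table above.
MODELS = {
    "default": (9739, 94.60),
    "TINYX0111": (565, 90.91),
    "TINYX011": (292, 89.75),
    "TINYX01": (146, 86.83),
    "TINYX2": (52, 80.82),
    "TINYX3": (21, 70.42),
}


def compare_to_default(name):
    """Return (parameter reduction factor, accuracy drop in percentage points)."""
    params_k, accuracy = MODELS[name]
    base_params_k, base_accuracy = MODELS["default"]
    return base_params_k / params_k, base_accuracy - accuracy


if __name__ == "__main__":
    for name in MODELS:
        if name == "default":
            continue
        reduction, drop = compare_to_default(name)
        print(f"{name}: {reduction:.1f}x fewer parameters, "
              f"{drop:.2f} points lower accuracy")
```

For example, TINYX011 has roughly 33 times fewer parameters than the default model, at a cost of 4.85 percentage points of basecalling accuracy.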
Reproducing the results in the paper
We explain how to reproduce the results we show in the TargetCall paper in the test directory.
<a name="cite"></a>Citing TargetCall
TargetCall is described and evaluated in the following paper. If you find the repository and the code useful, please cite:
Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, and Onur Mutlu, "TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering," arXiv (2022). DOI: https://doi.org/10.48550/arXiv.2212.04953
BIB:
@article{cavlak_targetcall_2022,
title = {{TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering}},
url = {https://doi.org/10.48550/arXiv.2212.04953},
journal = {arXiv},
author = {Cavlak, Meryem Banu and Singh, Gagandeep and Alser, Mohammed and Firtina, Can and Lindegger, Joël and Sadrosadati, Mohammad and Ghiasi, Nika Mansouri and Alkan, Can and Mutlu, Onur},
year = {2022},
month = dec,
}
