TargetCall

TargetCall is the first pre-basecalling filter that is applicable to a wide range of use cases to eliminate wasted computation in basecalling. Described in our preprint: https://arxiv.org/abs/2212.04953

Generate Convert Improve

Install / Use

/learn @CMU-SAFARI/TargetCall

About this skill

Quality Score

0/100

README

TargetCall

TargetCall is the first pre-basecalling filter that is applicable to a wide range of use cases. TargetCall’s key idea is to quickly filter out off-target reads (i.e., reads that are dissimilar to the target reference.) before the basecalling step to eliminate the wasted computation in basecalling. TargetCall is based on ONT basecaller Bonito.

Prerequisites

TargetCall requires minimap2 to be installed. Minimap2 can be installed via Minimap2 (v2.24)

Installation

TargetCall is tested on Linux with conda version 4.7.12.

$ git clone https://github.com/CMU-SAFARI/TargetCall
$ cd TargetCall
$ conda create --name targetcall python=3.8.10
$ conda activate targetcall
(targetcall) $ pip install --upgrade pip
(targetcall) $ pip install -r requirements.txt
(targetcall) $ python setup.py develop

You may need to use requirements-cuda111.txt or requirements-cuda113.txt depending on your cuda version.

Usage

$ cd src
$ python targetcall.py ../sample_data/fast5/ ../sample_data/Monkeypox_virus.fasta TINYX011 ../sample_data/

This will create three output files under ../sample_data/

output.fasta: contains noisy basecalled reads of fast5 files using model TINYX011
output.sam: contains alignment of noisy reads to Monkeypox_virus reference.
readids.txt: the read IDs of reads that are accepted by the filter.

Read IDs can be used as an input to Bonito for basecalling only the reads that are accepted by the filter using the --read-ids option.

Provided Models

You can find all models listed under bonito/models/.

| Model Name | Model Name in the Paper | # of Parameters | Basecalling Accuracy | | ------------- | ------------- | ------------- | ------------- | | default | Bonito | 9739K | 94.60% | | TINYX0111 | LC-Main*2 | 565K | 90.91% | | TINYX011 | LC-Main | 292K | 89.75% | | TINYX01 | LC-Main/2 | 146K | 86.83% | | TINYX2 | LC-Main/4 | 52K | 80.82% | | TINYX3 | LC-Main/8 | 21K | 70.42% |

Reproducing the results in the paper

We explain how to reproduce the results we show in the TargetCall paper in the test directory.

<a name="cite"></a>Citing TargetCall

TargetCall is described and evaluated in the following paper. If you find the repository and the code useful, please cite:

Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, and Onur Mutlu, "TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering," arXiv (2022). DOI

BIB:

@article{cavlak_targetcall_2022,
  title = {{TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering}},
  url = {https://doi.org/10.48550/arXiv.2212.04953},
  journal = {arXiv},
  author = {Cavlak, Meryem Banu and Singh, Gagandeep and Alser, Mohammed and Firtina, Can and Lindegger, Joël and Sadrosadati, Mohammad and Ghiasi, Nika Mansouri and Alkan, Can and Mutlu, Onur},
  year = {2022},
  month = dec,
}

Related Skills

node-connect

341.8k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

84.6k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

341.8k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

commit-push-pr

84.6k

Commit, push, and open a PR