ntEdit

Fast, lightweight, scalable genome sequence polishing and SNV detection & annotation

2018-current

Contents

  1. Description
  2. Implementation and requirements
  3. Install
  4. Dependencies
  5. Documentation
  6. Citing ntEdit
  7. Credits
  8. How to run ntEdit
  9. Running ntEdit
  10. ntEdit polishing options
  11. Soft-mask option
  12. SNV mode
  13. VCF input option
  14. Test data
  15. Algorithm
  16. Output files
  17. License

Description <a name=description></a>

ntEdit is a fast and scalable genomics application for polishing genome sequence assembly drafts. It simplifies polishing, variant detection and "haploidization" of gene and genome sequences with its re-usable Bloom filter design. ntEdit was originally designed as a general-purpose polishing tool, initially aimed at improving genome sequences by fixing base mismatches and frame shift errors with the help of more base-accurate short sequencing reads. It can also be used for polishing with long reads (refer to <a href="https://github.com/birollab/goldPolish" target="_blank">GoldPolish</a>) and to "finish" genome sequence assembly projects (refer to the <a href="https://github.com/birollab/ntedit_sealer_protocol" target="_blank">ntedit+sealer genome assembly finishing protocol</a>).
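The re-usable Bloom filter idea described above can be pictured with a toy example. The sketch below is an illustration only, not ntEdit's implementation: `ToyBloomFilter`, `kmers`, `missing_kmer_starts`, the k-mer size and both sequences are hypothetical names and values chosen for the demonstration. It loads the k-mers of an accurate sequence into a filter once, then queries a draft to locate k-mers that are absent, which is where a polishing tool would look for an error.

```python
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter for k-mers; an illustration, not ntEdit's data structure."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Salt a single hash function to obtain several bit positions per k-mer.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


def kmers(seq, k):
    """Yield every k-mer of seq from left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def missing_kmer_starts(draft, bloom, k):
    """Start positions of draft k-mers that the filter does not contain."""
    return [i for i, kmer in enumerate(kmers(draft, k)) if kmer not in bloom]


K = 8  # hypothetical k-mer size, kept small for the toy sequences below
accurate_read = "ACGTTGCAAGGCTTAACCGGATCA"  # stand-in for base-accurate read data
draft = "ACGTTGCAAGGCGTAACCGGATCA"          # same sequence with one mismatch at index 12

# Build the filter once; it can then be queried against any number of drafts.
bloom = ToyBloomFilter()
for kmer in kmers(accurate_read, K):
    bloom.add(kmer)

print(missing_kmer_starts(accurate_read, bloom, K))  # [] because there are no false negatives
print(missing_kmer_starts(draft, bloom, K))          # starts of k-mers overlapping the mismatch
```

Barring a Bloom filter false positive, the second call reports the run of k-mers that overlap the mismatched base, so a stretch of consecutive missing k-mers points at the position that needs editing.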

We anticipate that ntEdit will find further applications in the rapid mapping of single nucleotide variants, as demonstrated below with the genome of SARS-CoV-2, the highly transmissible pathogenic coronavirus and etiological agent of COVID-19. For researchers studying genetic lineages within large cohort data, we recommend <a href="https://github.com/birollab/ntroot" target="_blank">ntRoot, an ancestry prediction framework</a> built upon the ntEdit engine. It uses sequence alignment-free algorithms to infer ancestral connections and provide insights into genetic ancestry within diverse populations.

! NOTE: From v1.3.1 onwards, the parameter k is automatically detected from supplied Bloom filters.

SARS-CoV-2 evolution in human hosts. ntEdit v1.3.4 was used to map nucleotide variation between the first published coronavirus isolate from Wuhan in early January 2020 and over 1,500,000 SARS-CoV-2 genomes sampled from around the globe during the COVID-19 pandemic. <a href="https://bcgsc.github.io/SARS2" target="_blank">Additional (& interactive) timemaps are available</a>.

Implementation and requirements <a name=implementation></a>

ntEdit v1.2.0 and subsequent versions are written in C++.

(We compiled with gcc 5.5.0)

Install <a name=install></a>

Clone and enter the ntEdit directory.

<pre>
git clone https://github.com/birollab/ntEdit.git
cd ntEdit
</pre>

Compile ntEdit.

<pre>
meson setup build --prefix=/path/to/ntedit/install/dir
cd build
ninja install
</pre>

Dependencies <a name=dependencies></a>

  1. ntStat (v1.0.0+, https://github.com/birollab/ntstat)
  2. BloomFilter utilities (provided in ./lib)
  3. kseq (provided in ./lib)
  4. meson
  5. ninja
  6. btllib
  7. snakemake
  8. python 3.9+
! NOTE: ntEdit v2.1.0+ IS ONLY compatible with ntStat release v1.0.0+

We recommend installing ntEdit and its dependencies using conda:

<pre> conda install -c bioconda ntedit </pre>
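After installation, it can be useful to confirm that the command-line tools are reachable from the current environment. The following is a minimal sketch, not part of ntEdit: the helper names `find_tools` and `missing_tools` are hypothetical, and the executable names in `REQUIRED_TOOLS` are assumptions that may need adjusting for your setup.

```python
import shutil

# Assumed executable names; edit this list to match your installation.
REQUIRED_TOOLS = ["run-ntedit", "meson", "ninja", "snakemake"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the executable names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    absent = missing_tools(REQUIRED_TOOLS)
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All listed tools were found on PATH.")
```

A check like this only tests that the executables exist; it does not verify versions, such as the ntStat v1.0.0+ requirement noted above.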

Documentation <a name=docs></a>

Refer to the README.md file for instructions on installing and running ntEdit. Our manuscript contains information about the software and its performance. This ISMB2019 poster contains additional information, benchmarks and results.

Citing ntEdit <a name=citing></a>

Thank you for your Stars and for using, developing and promoting this free software!

If you use ntEdit in your research, please cite:

ntEdit: scalable genome sequence polishing

<pre> ntEdit: scalable genome sequence polishing. Warren RL, Coombe L, Mohamadi H, Zhang J, Jaquish B, Isabel N, Jones SJM, Bousquet J, Bohlmann J, Birol I. Bioinformatics. 2019 Nov 1;35(21):4430-4432. doi: 10.1093/bioinformatics/btz400. </pre>

The experimental data described in our paper can be downloaded from: http://www.bcgsc.ca/downloads/btl/ntedit/

Credits <a name=credits></a>

ntedit (concept, algorithm design and prototype): Rene Warren

nthash: Hamid Mohamadi, Parham Kazemi

ntstat: Parham Kazemi

C++ implementation: Jessica Zhang, Rene Warren, Johnathan Wong

Integration tests: Murathan T Goktas

ntEdit workflow: Johnathan Wong and Lauren Coombe

How to run ntEdit <a name=howto></a>

General ntEdit usage:

<pre>
run-ntedit --help
usage: run-ntedit [-h] {polish,snv} ...

ntEdit: Fast, lightweight, scalable genome sequence polishing and SNV detection & annotation

positional arguments:
  {polish,snv}  ntEdit can be run in polishing or SNV modes.
    polish      Run ntEdit polishing
    snv         Run ntEdit SNV mode (Experimental)

optional arguments:
  -h, --help    show this help message and exit
</pre>

Running in polishing mode <a name=run></a>

<pre>
run-ntedit polish --help
usage: run-ntedit polish [-h] --draft DRAFT --reads READS [-i {0,1,2,3,4,5}] [-d {0,1,2,3,4,5,6,7,8,9,10}] [-x X] [--cap CAP] [-m {0,1,2}] [-a {0,1}] -k K
                         [-l L] [--cutoff CUTOFF] [--solid] [-t T] [-z Z] [-y Y] [-j J] [-X X] [-Y Y] [-v] [-V] [-n] [-f]

optional arguments:
  -h, --help            show this help message and exit
  --draft DRAFT         Draft genome assembly. Must be specified with exact FILE NAME. Ex: --draft myDraft.fa (FASTA, Multi-FASTA, and/or gzipped compatible),
                        REQUIRED
  --reads READS         Prefix of reads file(s). All files in the working directory with the specified prefix will be used for polishing (fastq, fasta, gz),
                        REQUIRED
  -i {0,1,2,3,4,5}      Maximum number of insertion bases to try, range 0-5, [default=5]
  -d {0,1,2,3,4,5,6,7,8,9,10}
                        Maximum number of deletions bases to try, range 0-10, [default=5]
  -x X                  k/x ratio for the number of k-mers that should be missing, [default=5.000]
  --cap CAP             Cap for the number of base insertions that can be made at one position[default=k*1.5]
  -m {0,1,2}            Mode of editing, range 0-2, [default=0] 0: best substitution, or first good indel 1: best substitution, or best indel 2: best edit
                        overall (suggestion that you reduce i and d for performance)
  -a {0,1}              Soft masks missing k-mer positions having no fix (1 = yes, default = 0, no)
  -k K                  k-mer size, REQUIRED
  -l L                  input VCF file with annotated variants (e.g., clinvar.vcf)
  --cutoff CUTOFF       The minimum coverage of k-mers in output Bloom filter [default=2, ignored if solid=True]
  --solid               Output the solid k-mers (non-erroneous k-mers), [default=False]
  -t T                  Number of threads [default=4]
  -z Z                  Minimum contig length [default=100]
  -y Y                  k/y ratio for the number of edited k-mers that should be present, [default=9.000]
  -j J                  controls size of k-mer subset. When checking subset of k-mers, check every jth k-mer [default=3]
  -X X                  Ratio of number of k-mers in the k subset that should be missing in orderto attempt fix (higher=stringent) [default=0.5, if -Y is
                        specified]
  -Y Y                  Ratio of number of k-mers in the k subset that should be present to accept an edit (higher=stringent) [default=0.5, if -X is specified]
  -v                    Verbose mode, [default=False]
  -V, --version         show program's version number and exit
  -n, --dry-run         Print out the commands that will be executed
  -f, --force           Run all ntEdit steps, regardless of existing output files
</pre>
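The options above can be assembled programmatically when polishing is scripted. The sketch below is a hedged illustration rather than an official wrapper: the function `build_polish_command` and its keyword names are hypothetical, while the flags and their allowed ranges are taken from the help text above. The file name, read prefix and k value in the example are placeholders.

```python
import shlex


def build_polish_command(draft, reads_prefix, k, *, max_insertions=5,
                         max_deletions=5, mode=0, soft_mask=False,
                         threads=4, dry_run=False):
    """Return the argument list for a `run-ntedit polish` call."""
    # Ranges mirror the choices printed by `run-ntedit polish --help`.
    if max_insertions not in range(0, 6):
        raise ValueError("-i must be between 0 and 5")
    if max_deletions not in range(0, 11):
        raise ValueError("-d must be between 0 and 10")
    if mode not in (0, 1, 2):
        raise ValueError("-m must be 0, 1 or 2")

    cmd = [
        "run-ntedit", "polish",
        "--draft", draft,          # exact file name of the draft assembly
        "--reads", reads_prefix,   # prefix shared by the read files
        "-k", str(k),
        "-i", str(max_insertions),
        "-d", str(max_deletions),
        "-m", str(mode),
        "-a", "1" if soft_mask else "0",
        "-t", str(threads),
    ]
    if dry_run:
        cmd.append("-n")  # only print the commands that would be executed
    return cmd


if __name__ == "__main__":
    # Placeholder inputs; replace with your own draft, read prefix and k.
    command = build_polish_command("myDraft.fa", "myReads", 40, dry_run=True)
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)`. Starting with `dry_run=True` is a cautious choice, since `-n` lets you inspect the planned steps before any files are written.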

Running ntEdit in SNV mode

<pre>
run-ntedit snv --help
usage: run-ntedit snv [-h] [--reference REFERENCE] [--reads READS] [--genome GENOME [GENOME ...]] -k K [-l L] [--cutoff CUTOFF] [--solid] [-t T] [-z Z] [-y Y]
                      [-j J] [-X X] [-Y Y] [-v] [-V] [-n] [-f]

optional arguments:
  -h, --help            show this help message and exit
  --reference REFERENCE
                        Reference genome assembly for SNV calling (FASTA, Multi-FASTA, and/or gzipped compatible), REQUIRED
</pre>