Dentist
Close assembly gaps using long-reads at high accuracy.
:construction: Maintenance Update: :construction:
This software project is currently not actively maintained. Users should not expect regular updates or bug fixes.
Anyone interested in becoming the new maintainer is welcome to get in touch, or to fork the project and continue its development.
Thank you for your understanding.
DENTIST

DENTIST uses long reads to close assembly gaps at high accuracy.
Long sequencing reads make it possible to increase the contiguity and completeness of fragmented, short-read-based genome assemblies by closing assembly gaps, ideally at high accuracy. DENTIST is a sensitive, highly accurate and automated pipeline for closing gaps in (short-read) assemblies with long reads.
API documentation: current, v4.0.0, v3.0.0, v2.0.0
First time here? Head over to the example and make sure it works.
Install
Use Conda via Snakemake (recommended)
Make sure Mamba (a frontend for Conda) is installed on your system. You can then use DENTIST like so:
# run the whole workflow locally using Conda
snakemake --configfile=snakemake.yml --use-conda -jall
# run the whole workflow on a cluster using Conda
snakemake --configfile=snakemake.yml --use-conda --profile=slurm
The last command is explained in more detail below in the usage section.
Note: If you do not have Mamba installed, you may need to pass
--conda-frontend=conda to Snakemake.
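If you switch between machines with and without Mamba, one option is to pick the frontend automatically. The following is a minimal sketch, assuming a POSIX shell; `pick_conda_frontend` is a hypothetical helper, not part of DENTIST. It only decides which value to pass to Snakemake's `--conda-frontend` option.

```shell
# Hypothetical helper (not part of DENTIST): print "mamba" if the mamba
# executable is on PATH, otherwise "conda", for use with Snakemake's
# --conda-frontend option.
pick_conda_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        printf '%s\n' mamba
    else
        printf '%s\n' conda
    fi
}

# Usage (same workflow command as above):
# snakemake --configfile=snakemake.yml --use-conda \
#     --conda-frontend="$(pick_conda_frontend)" -jall
```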
Use Conda to Manually Install Binaries
Make sure Mamba (a frontend for Conda) is installed on your system. Install DENTIST and all dependencies like so:
mamba create -n dentist -c a_ludi -c bioconda dentist-core
mamba activate dentist
mamba install -c conda-forge -c bioconda snakemake
# execute the workflow
snakemake --configfile=snakemake.yml --cores=all
More details on executing DENTIST can be found in the usage section.
Use Pre-Built Binaries
Download the latest pre-built binaries from the releases section
and extract the contents. The pre-built binaries are stored in a subfolder
called bin. Here are the instructions for v4.0.0:
# download & extract pre-built binaries
wget https://github.com/a-ludi/dentist/releases/download/v4.0.0/dentist.v4.0.0.x86_64.tar.gz
tar -xzf dentist.v4.0.0.x86_64.tar.gz
# make binaries available to your shell
cd dentist.v4.0.0.x86_64
PATH="$PWD/bin:$PATH"
# check installation with
dentist -d
# Expected output:
#
#daligner (part of `DALIGNER`; see https://github.com/thegenemyers/DALIGNER) [OK]
#damapper (part of `DAMAPPER`; see https://github.com/thegenemyers/DAMAPPER) [OK]
#DAScover (part of `DASCRUBBER`; see https://github.com/thegenemyers/DASCRUBBER) [OK]
#DASqv (part of `DASCRUBBER`; see https://github.com/thegenemyers/DASCRUBBER) [OK]
#DBdump (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#DBdust (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#DBrm (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#DBshow (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#DBsplit (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#fasta2DAM (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#fasta2DB (part of `DAZZ_DB`; see https://github.com/thegenemyers/DAZZ_DB) [OK]
#computeintrinsicqv (part of `daccord`; see https://gitlab.com/german.tischler/daccord) [OK]
#daccord (part of `daccord`; see https://gitlab.com/german.tischler/daccord) [OK]
The tarball additionally contains the Snakemake workflow, example config files and this README. In short, everything you need to run DENTIST.
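When the `dentist -d` report is long, it can help to show only the entries that are not marked `[OK]`. This is a minimal sketch, assuming the report lists one dependency per line ending in `[OK]` when the tool was found, as in the expected output above; `report_missing` is a hypothetical helper, and redirecting standard error is only a precaution in case the report is written there.

```shell
# Hypothetical helper (not part of DENTIST): read a `dentist -d` report on
# stdin and print only the non-empty lines that do not end in "[OK]".
# Assumes one dependency per line, as in the expected output above.
report_missing() {
    # -v prints lines matching neither pattern; "|| true" keeps the exit
    # status at 0 when grep prints nothing (i.e. every entry is [OK]).
    grep -v -e '\[OK\][[:space:]]*$' -e '^[[:space:]]*$' || true
}

# Usage: dentist -d 2>&1 | report_missing
```

An empty result means every listed dependency was reported as `[OK]`.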
Use a Singularity Container (discouraged)
Remark: depending on your system, the Singularity container may not work properly (see issue #30).
Make sure Singularity is installed on your system. You can then use the container like so:
# launch an interactive shell
singularity shell docker://aludi/dentist:stable
# execute a single command inside the container
singularity exec docker://aludi/dentist:stable dentist --version
# run the whole workflow on a cluster using Singularity
snakemake --configfile=snakemake.yml --use-singularity --profile=slurm
The last command is explained in more detail below in the usage section.
Build from Source
- Install the D package manager DUB.
- Install JQ 1.6.
- Build DENTIST using either
  dub install dentist
  or
  git clone --recurse-submodules https://github.com/a-ludi/dentist.git
  cd dentist
  dub build
Runtime Dependencies
The following software packages are required to run dentist:
- The Dazzler Data Base (>=2020-07-27)
Manage sequences (reads and assemblies) in 4-bit encoding alongside auxiliary information such as masks or QV tracks.
- DALIGNER (=2020-01-15)
Find significant local alignments.
- DAMAPPER (>=2020-03-22)
Find alignment chains, i.e. sequences of significant local alignments possibly with unaligned gaps.
- DAMASKER (>=2020-01-15)
Discover tandem repeats.
- DASCRUBBER (>=2020-07-26)
Estimate coverage and compute QVs.
- daccord (>=v0.0.18)
Compute reference-based consensus sequence for gap filling.
Please see their own documentation for installation instructions. Note that the
packages available on Bioconda are outdated and should not be used at the
moment; instead, they are available via conda install -c a_ludi <dependency>.
If you experience trouble, please use the exact versions specified in the Conda recipe.
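With manually installed dependencies, a quick way to confirm that the required executables are on your PATH is a loop over `command -v`. This is a minimal sketch, assuming a POSIX shell and the executable names from the `dentist -d` listing in the pre-built binaries section; `check_tools` is a hypothetical helper, and it does not check versions, so it complements rather than replaces `dentist -d`.

```shell
# Hypothetical helper (not part of DENTIST): print "<name> [OK]" for each
# given executable found on PATH and "<name> [MISSING]" otherwise.
# Returns 1 if at least one executable is missing. Versions are not checked.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s [OK]\n' "$tool"
        else
            printf '%s [MISSING]\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, with the executable names from the `dentist -d` listing
# (DAMASKER's executables are not part of that listing):
# check_tools daligner damapper DAScover DASqv DBdump DBdust DBrm DBshow \
#     DBsplit fasta2DAM fasta2DB computeintrinsicqv daccord
```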
Usage
Before you start producing wonderful scientific results, you should skip over to the example section and try to run the small example. This will make sure your setup is working as expected.
Quick execution with Snakemake
TL;DR
wget https://github.com/a-ludi/dentist/releases/download/v4.0.0/dentist.v4.0.0.x86_64.tar.gz
tar -xzf dentist.v4.0.0.x86_64.tar.gz
cd dentist.v4.0.0.x86_64
# edit dentist.yml and snakemake.yml
# execute with CONDA:
snakemake --configfile=snakemake.yml --use-conda --cores=all
# execute with SINGULARITY:
snakemake --configfile=snakemake.yml --use-singularity --cores=all
# execute with pre-built binaries:
PATH="$PWD/bin:$PATH" snakemake --configfile=snakemake.yml --cores=all
Install Snakemake version >=5.32.1 and prepare your working directory:
wget https://github.com/a-ludi/dentist/releases/download/v4.0.0/dentist.v4.0.0.x86_64.tar.gz
tar -xzf dentist.v4.0.0.x86_64.tar.gz
cp -r -t . \
dentist.v4.0.0.x86_64/snakemake/dentist.yml \
dentist.v4.0.0.x86_64/snakemake/Snakefile \
dentist.v4.0.0.x86_64/snakemake/snakemake.yml \
dentist.v4.0.0.x86_64/snakemake/envs \
dentist.v4.0.0.x86_64/snakemake/scripts
Next, edit snakemake.yml and dentist.yml to fit your needs and, optionally,
test your configuration with
# see above for variants with pre-built binaries or Singularity
snakemake --configfile=snakemake.yml --use-conda --cores=1 -f -- validate_dentist_config
If no errors occurred, the whole workflow can be executed using
# see above for variants with pre-built binaries or Singularity
snakemake --configfile=snakemake.yml --use-conda --cores=all
For small genomes of a few hundred Mbp, this should run on a regular workstation.
Snakemake's --cores option lets you run independent jobs in parallel. Larger
data sets may require a cluster, in which case you can use Snakemake's
cloud or cluster facilities.