Namfinder
Finds Non-overlapping Approximate Matches (NAMs) between query and reference sequences using strobemers
namfinder: Fast computation of shared regions between sequences
2023-05-19: Namfinder is not ready for stable use yet. The program currently has a limiting complexity in some cases (squared in the number of hits) for genome-sized comparisons. I advise against running this software until this is fixed. This repo went public only because the uLTRA long transcriptomic aligner depends on it.
Namfinder is a sequence (DNA/RNA) mapping tool used to find Non-overlapping Approximate Matches (NAMs). Its output and usage mimic those of nucmer. You can think of NAMs as Maximal Exact Matches (MEMs) that tolerate some SNVs and small indels. NAMs are constructed from overlapping strobemer seeds.
Namfinder borrows its entire index construction codebase from strobealign (a short-read mapper), but uses it only for finding NAM seeds. Credits to @marcelm, @luispedro and @psj1997 for the optimized indexing implementation. Namfinder is a more optimized version of the earlier proof-of-concept tool StrobeMap, which was implemented for the strobemers paper. It was renamed to avoid confusion with strobealign.
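To make the idea of building NAMs from overlapping seeds concrete, here is a minimal Python sketch. It is not namfinder's implementation: the `Hit` type and `merge_hits` function are hypothetical names, and the merging rule (extend a match whenever a new seed hit starts inside it on both query and reference) is a simplification that does not itself enforce the non-overlapping property.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One seed match (hypothetical type): intervals on query and reference."""
    q_start: int
    q_end: int
    r_start: int
    r_end: int


def merge_hits(hits):
    """Greedily merge seed hits that overlap on both query and reference.

    Simplified illustration only; namfinder's real merging logic may differ.
    """
    merged = []
    for h in sorted(hits, key=lambda h: (h.q_start, h.r_start)):
        for m in merged:
            overlaps_query = m.q_start <= h.q_start <= m.q_end
            overlaps_ref = m.r_start <= h.r_start <= m.r_end
            if overlaps_query and overlaps_ref:
                # Extend each side independently, so the query and
                # reference spans are allowed to differ in length.
                m.q_end = max(m.q_end, h.q_end)
                m.r_end = max(m.r_end, h.r_end)
                break
        else:
            # No existing match absorbs this hit: start a new one (as a copy).
            merged.append(Hit(h.q_start, h.q_end, h.r_start, h.r_end))
    return merged


example_hits = [
    Hit(0, 20, 100, 120),
    Hit(10, 32, 110, 133),
    Hit(25, 45, 126, 146),
    Hit(200, 220, 500, 520),
]
example_nams = merge_hits(example_hits)
```

In this example the first three hits chain into one match spanning 45 query positions and 46 reference positions. That length difference is how a small indel can be absorbed, which an exact match could not do.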
Features
- Multithreading support
- Fast indexing (2-5 minutes for a human-sized reference genome)
- Output in MUMmer MEM tsv format
Table of contents
- Installation
- Usage
- Command-line options
- Index file
- Changelog
- Contributing
- Performance
- Credits
- Version info
- License
Installation
You need to have CMake, a recent g++ (tested with version 8) and zlib installed.
Then do the following:
git clone https://github.com/ksahlin/namfinder
cd namfinder
cmake -B build -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
make -j -C build
The resulting binary is build/namfinder.
The binary is tailored to the CPU the compiler runs on.
If it needs to run on other machines, use this cmake command instead for compatibility with most x86-64 CPUs in use today:
cmake -B build -DCMAKE_C_FLAGS="-msse4.2" -DCMAKE_CXX_FLAGS="-msse4.2"
Usage
Parameter -k is the strobe size and -s is the sub-k-mer size (used for thinning with syncmers). Set -s to the same value as -k for no thinning.
Parameters -l and -u are the window minimum and window maximum for sampling the downstream strobe. Only strobemers of order 2 can currently be used.
namfinder -k 10 -s 10 -l 11 -u 35 -C 500 -o nams.tsv ref.fa reads.f[a/q]
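The Python sketch below illustrates how the four seeding parameters might interact. It is written under several assumptions and is not namfinder's code: the function names are hypothetical, CRC32 stands in for the real hash function, the syncmer rule (smallest s-mer at the middle offset) follows the open-syncmer scheme, the window bounds l and u are assumed to be counted in selected syncmers, and the second strobe is picked with a simple XOR-minimum rule.

```python
import zlib


def h(s: str) -> int:
    """Deterministic stand-in hash (CRC32); an assumption, not namfinder's hash."""
    return zlib.crc32(s.encode())


def syncmers(seq, k, s):
    """Return start positions of k-mers whose smallest s-mer sits at the middle offset.

    With s == k each k-mer contains exactly one s-mer, so every k-mer is kept.
    """
    t = (k - s) // 2  # 0-based offset required of the minimal s-mer
    picked = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smer_hashes = [h(kmer[j:j + s]) for j in range(k - s + 1)]
        if smer_hashes.index(min(smer_hashes)) == t:
            picked.append(i)
    return picked


def randstrobes(seq, k, s, l, u):
    """Pair each syncmer with one partner chosen l..u syncmers downstream (order 2)."""
    pos = syncmers(seq, k, s)
    hashes = [h(seq[p:p + k]) for p in pos]
    seeds = []
    for i in range(len(pos)):
        window = range(i + l, min(i + u, len(pos) - 1) + 1)
        if not window:
            break  # too close to the end: no partner at distance >= l
        # Stand-in linking rule: partner minimizing the XOR of the two hashes.
        j = min(window, key=lambda j: hashes[i] ^ hashes[j])
        seeds.append((pos[i], pos[j]))
    return seeds
```

For example, `randstrobes("ACGTTGCAAGCTTAGGCTAA", 5, 5, 2, 4)` keeps every 5-mer (because s equals k) and links each one to a second strobe starting 2 to 4 positions downstream. Lowering s below k thins the set of candidate positions.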
Credits
- Some of the ideas for the index and NAM construction in namfinder were borrowed from: Sahlin, K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 23, 260 (2022). https://doi.org/10.1186/s13059-022-02831-7
- Big improvements were designed by @marcelm and @luispedro, and implemented by @marcelm and @psj1997 (forthcoming paper).
License
MIT license, see LICENSE.
