Kmerhash
A parallel DNA k-mer counter based on optimized distributed memory hash tables
README
What is this repository for?
This repository contains implementations of hash tables designed to better support k-mer indexing. Specifically, the hash tables are based on open addressing and use either linear probing or Robin Hood hashing.
The implementation is C++11 compliant and header-only.
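To make the terminology concrete, here is a minimal sketch of an open-addressing counting table that uses Robin Hood hashing on top of linear probing. It is not the Kmerhash implementation or its API: the class name `RobinHoodCounter`, the fixed capacity (no doubling), and the multiplicative hash are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity counting table; NOT the Kmerhash API.
class RobinHoodCounter {
 public:
  explicit RobinHoodCounter(std::size_t capacity)
      : slots_(capacity), mask_(capacity - 1), size_(0) {
    // Capacity must be a nonzero power of two so masking can replace modulo.
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
  }

  // Increment the count stored for `key`, inserting it if absent.
  void increment(std::uint64_t key) {
    Slot incoming{key, 1, true};
    std::size_t pos = home(key);
    std::size_t dist = 0;           // probe distance of `incoming` from its home slot
    bool carrying_original = true;  // false once we carry a displaced resident
    for (;;) {
      Slot& s = slots_[pos];
      if (!s.used) {
        assert(size_ + 1 < slots_.size());  // this sketch never resizes
        s = incoming;
        ++size_;
        return;
      }
      if (carrying_original && s.key == key) {
        ++s.count;
        return;
      }
      // Robin Hood rule: a resident closer to its home slot than the incoming
      // entry gives up its slot and continues probing in its place.
      std::size_t resident_dist = (pos - home(s.key)) & mask_;
      if (resident_dist < dist) {
        std::swap(s, incoming);
        dist = resident_dist;
        carrying_original = false;
      }
      pos = (pos + 1) & mask_;  // linear probing: try the next slot
      ++dist;
    }
  }

  // Return the stored count for `key`, or 0 if it is absent.
  std::uint32_t count(std::uint64_t key) const {
    std::size_t pos = home(key);
    for (std::size_t dist = 0;; ++dist, pos = (pos + 1) & mask_) {
      const Slot& s = slots_[pos];
      if (!s.used) return 0;
      if (s.key == key) return s.count;
      // Early exit: `key` would have displaced this resident during insertion.
      if (((pos - home(s.key)) & mask_) < dist) return 0;
    }
  }

  std::size_t size() const { return size_; }

 private:
  struct Slot {
    std::uint64_t key;
    std::uint32_t count;
    bool used;
  };

  // Placeholder multiplicative hash; an assumption, not the library's hash.
  std::size_t home(std::uint64_t key) const {
    return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ULL) >> 32) & mask_;
  }

  std::vector<Slot> slots_;
  std::size_t mask_;
  std::size_t size_;
};
```

With plain linear probing the swap step is simply omitted and the incoming entry always takes the first empty slot. The Robin Hood variant keeps probe distances more uniform, which is what allows the early exit in `count`.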
Repository Structure
The repository is organized as follows:
- "include/kmerhash" contains the implementations, in particular hashmap_linearprobe_doubling.hpp and hashmap_robinhood_doubling.hpp.
- "include/kmerhash/experimental" contains some experimental implementations.
- "test/unit" contains the unit tests for the classes.
- "benchmark" contains benchmarks; in particular, BenchmarkHashTables.cpp is the primary benchmark tool.
- "ext" contains dependencies as git submodules.
Dependencies
For the moment, this project depends on "kmerind", and indirectly on Kmerind's dependencies, including Google sparse hash map, Google FarmHash, MurmurHash, mxx, Google Test, and TCLAP. These dependencies are likely to be refactored later.
The project requires a C++11-compliant compiler (4.8.4 or later) and CMake version 2.8 or later.
To initialize the git submodules, invoke the following:
    #!sh
    cd {src}
    git submodule init
    git submodule update
    cd ext/kmerind
    git submodule init
    git submodule update
Configuration
To compile, first create a build directory, preferably outside of the source directory.
    #!sh
    mkdir {build}
    cd {build}
    cmake {src} -DENABLE_TESTING=ON -DENABLE_BENCHMARKING=ON -DBUILD_EXAMPLE_APPLICATIONS=ON
    make
Alternatively, you can use ccmake.
    #!sh
    mkdir {build}
    cd {build}
    ccmake {src}
    make
How to run tests
The unit tests are located in the "test" subdirectory inside the build directory. To run an individual test, invoke its executable directly. To run all tests, use:

    #!sh
    cd {build}
    make test
How to run benchmarks
There is currently only one benchmark. It is hard-coded to insert 100M elements with an average of 5x repeats and to query with 10M elements. The following tests are run in sequence: insert, find, count, erase, count.
To run the benchmark, invoke the executable below from the build directory root.
    #!sh
    cd {build}
    bin/benchmark_hashtables
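For orientation, the phase sequence described above (insert, find, count, erase, count) can be sketched at a small scale as follows. This is not the code in BenchmarkHashTables.cpp: it uses `std::unordered_map` as a stand-in table, and the names `PhaseResult`, `time_seconds`, and `run_phases`, as well as the way "average repeats" is modelled (drawing keys uniformly from a range of `n_insert / repeats` values), are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical result record; the real benchmark's output is not described here.
struct PhaseResult {
  std::size_t distinct_after_insert;
  std::size_t found;
  std::size_t counted;
  std::size_t erased;
  std::size_t present_after_erase;
  double insert_seconds;
};

// Run a callable and return the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& f) {
  auto start = std::chrono::steady_clock::now();
  f();
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

PhaseResult run_phases(std::size_t n_insert, std::size_t n_query, std::size_t repeats) {
  assert(repeats > 0 && n_insert >= repeats);
  std::mt19937_64 rng(12345);
  // Assumption: each key repeats about `repeats` times on average.
  std::uniform_int_distribution<std::uint64_t> dist(0, n_insert / repeats - 1);
  std::vector<std::uint64_t> input(n_insert), queries(n_query);
  for (auto& k : input) k = dist(rng);
  for (auto& k : queries) k = dist(rng);

  std::unordered_map<std::uint64_t, std::uint32_t> table;  // stand-in hash table
  PhaseResult r{};
  // Phase 1: insert (only this phase is timed, to keep the sketch short).
  r.insert_seconds = time_seconds([&] { for (auto k : input) ++table[k]; });
  r.distinct_after_insert = table.size();
  // Phase 2: find.
  for (auto k : queries) r.found += (table.find(k) != table.end());
  // Phase 3: count.
  for (auto k : queries) r.counted += table.count(k);
  // Phase 4: erase.
  for (auto k : queries) r.erased += table.erase(k);
  // Phase 5: count again; every queried key should now be absent.
  for (auto k : queries) r.present_after_erase += table.count(k);
  return r;
}
```

Calling `run_phases(1000, 100, 5)` exercises all five phases on a toy input; the actual benchmark uses the hard-coded sizes stated above.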
Contribution guidelines
- Writing tests
- Code review
- Other guidelines
Who do I talk to?