Kmertools
kmer based feature extraction tool for bioinformatics, metagenomics, AI/ML and more
kmertools: DNA Vectorisation Tool
Overview
kmertools is a k-mer based feature extraction tool designed to support metagenomics and other bioinformatics analytics. It uses k-mer analysis to vectorise DNA sequences so that the resulting vectors can be used in AI/ML applications.
Features
- Oligonucleotide Frequency Vectors: Generate frequency vectors for oligonucleotides.
- Minimiser Binning: Efficiently bin sequences using minimisers to reduce data complexity.
- Chaos Game Representation (CGR): Compute CGR vectors for DNA sequences based on k-mers or whole sequence transformation.
- Coverage Histograms: Create coverage histograms to analyze the depth of sequencing reads.
- Python Binding: You can import kmertools functionality using
import pykmertools as kt
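To make the first two features concrete, here is a minimal pure-Python sketch of what an oligonucleotide frequency vector and a minimiser are. It does not call kmertools or pykmertools, and the function names `kmer_frequency_vector` and `minimiser` are hypothetical, chosen only for illustration. The tool's actual output (for example its handling of reverse complements, normalisation, or minimiser ordering) may differ, so treat this as a description of the concepts rather than of the implementation.

```python
from itertools import product


def kmer_frequency_vector(seq: str, k: int = 3) -> list[float]:
    """Return normalised counts of every DNA k-mer, in lexicographic order.

    Illustrative only: not the kmertools implementation.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])  # None for k-mers with N or other symbols
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def minimiser(seq: str, m: int = 4) -> str:
    """Return the lexicographically smallest m-mer of seq.

    Assumption: plain lexicographic ordering; real tools may use a hash order.
    """
    seq = seq.upper()
    if len(seq) < m:
        raise ValueError("sequence shorter than minimiser length")
    return min(seq[i:i + m] for i in range(len(seq) - m + 1))


if __name__ == "__main__":
    vec = kmer_frequency_vector("ACGTACGT", k=2)
    print(len(vec))                   # 16 possible 2-mers
    print(minimiser("GATTACA", m=3))  # ACA
```

Sequences that share a minimiser can be grouped into the same bin, which is the idea behind minimiser binning; the frequency vector has a fixed length of 4^k regardless of sequence length, which is what makes it usable as a feature vector.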
Installation
Option 1: from bioconda (recommended)
You can install kmertools from Bioconda at https://anaconda.org/bioconda/kmertools. Make sure you have conda installed.
# create conda environment and install kmertools
conda create -n kmertools -c bioconda kmertools
# activate environment
conda activate kmertools
Option 2: from PyPI
You can install kmertools from PyPI at https://pypi.org/project/pykmertools/.
pip install pykmertools
Option 3: from sources
You can install kmertools directly from the source by cloning the repository and using Rust's package manager cargo.
git clone https://github.com/anuradhawick/kmertools.git
cd kmertools
cargo build --release
Now add the binary to your PATH (you may modify ~/.bashrc or ~/.zshrc).
# to add to current terminal
export PATH=$PATH:$(pwd)/target/release/
# to save to ~/.bashrc
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.bashrc
source ~/.bashrc
# to save to ~/.zshrc for Mac
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.zshrc
source ~/.zshrc
To install the Python bindings, build a wheel with maturin from either the pip or the conda directory (choose one of the two).
# pip
cd pip
maturin build --release
# conda
cd conda
maturin build --release
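Now move back to the parent directory with cd .. and install the built wheel. Replace <VERSION> with the version you built; the platform tags in the file name may differ on your system.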
pip install target/wheels/pykmertools-<VERSION>-cp39-abi3-manylinux_2_34_x86_64.whl
Test the installation
After setting up, run the following command to print out the kmertools help message.
kmertools --help
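The same check can be scripted from Python, for example at the start of a pipeline. The sketch below uses only the standard library; the helper names `kmertools_cli_available` and `pykmertools_importable` are hypothetical. It only reports whether the `kmertools` executable is on PATH and whether the `pykmertools` module can be located, and does not exercise any kmertools functionality.

```python
import importlib.util
import shutil


def kmertools_cli_available() -> bool:
    """True if an executable named `kmertools` is found on PATH."""
    return shutil.which("kmertools") is not None


def pykmertools_importable() -> bool:
    """True if the `pykmertools` module can be located (without importing it)."""
    return importlib.util.find_spec("pykmertools") is not None


if __name__ == "__main__":
    print("kmertools on PATH:", kmertools_cli_available())
    print("pykmertools importable:", pykmertools_importable())
```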
Help
Please read our comprehensive Wiki.
Authors
- Anuradha Wickramarachchi https://anuradhawick.com
- Vijini Mallawaarachchi https://vijinimallawaarachchi.com
Citation
If you use kmertools, please cite it as follows.
@software{Wickramarachchi_kmertools_DNA_Vectorisation,
author = {Wickramarachchi, Anuradha and Mallawaarachchi, Vijini},
title = {{kmertools: DNA Vectorisation Tool}},
url = {https://github.com/anuradhawick/kmertools},
version = {0.1.4}
}
Please refer to the Wiki for citations of relevant algorithms.
Support and contributions
Please get in touch via author websites or GitHub issues. Thanks!
