kvector

What is kvector?

kvector is a small utility for converting motifs to k-mer vectors, so that motifs of different lengths can be compared.

  • Free software: BSD license
  • Documentation: https://olgabot.github.io/kvector

Installation

To install kvector, clone the GitHub repository and install it with pip:

git clone https://github.com/olgabot/kvector.git
cd kvector
pip install .  # The "." means "install *this*, the folder where I am now"

Features

Check out this notebook for an overview of the features with both inputs and outputs (the examples below show only the inputs).

Count k-mers for each line in a bed file (multithreaded)

For each interval in a bed file, count the kmers and return a (n_intervals, n_kmers) matrix of the k-mer counts of each region.

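import kvector

# bedfile and genome_fasta are file paths; threads is the number of threads
kmers = kvector.per_interval_kmers(bedfile, genome_fasta, threads=threads,
    kmer_lengths=(4, 5, 6), residues='ACGT')
csv = bedfile.replace('.bed', '_kmers.csv')  # write next to the bed file
kmers.to_csv(csv)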
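
To make the shape of the result concrete, here is a minimal pure-Python sketch of per-sequence k-mer counting. It is not kvector's implementation: `all_kmers` and `count_matrix` are hypothetical helper names, the sequences are made-up stand-ins for the regions a bed file would select, and a plain list of rows stands in for the matrix that kvector returns.

```python
from itertools import product


def all_kmers(kmer_lengths, residues="ACGT"):
    """Every possible k-mer for the given lengths, in a fixed column order."""
    return ["".join(p) for k in kmer_lengths for p in product(residues, repeat=k)]


def count_matrix(sequences, kmer_lengths, residues="ACGT"):
    """Return (columns, rows): one row of k-mer counts per input sequence."""
    columns = all_kmers(kmer_lengths, residues)
    index = {kmer: i for i, kmer in enumerate(columns)}
    rows = []
    for seq in sequences:
        seq = seq.upper()
        row = [0] * len(columns)
        for k in kmer_lengths:
            # Slide a window of width k along the sequence
            for start in range(len(seq) - k + 1):
                kmer = seq[start:start + k]
                if kmer in index:  # skips windows with other letters, e.g. N
                    row[index[kmer]] += 1
        rows.append(row)
    return columns, rows


# Made-up sequences standing in for two bed intervals
columns, rows = count_matrix(["ACGTAC", "GGGGN"], kmer_lengths=(2, 3))
print(len(rows), len(columns))  # n_intervals, n_kmers
```

Every row has the same columns regardless of how long the underlying sequence is, which is what lets regions of different sizes share one matrix.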

Count k-mers for each line in a bed file, intersected (multithreaded)

For each interval in a bed file, intersect it with another (other) bed file (e.g. only conserved regions of introns) and count k-mers for the intersected region. Returns a (n_intervals, n_kmers) matrix of the k-mer counts of each line in the bed file, intersected with the other bed.

kmers = kvector.per_interval_kmers(bedfile, genome_fasta, other, threads=threads,
    kmer_lengths=(4, 5, 6))
csv = bedfile.replace('.bed', '_kmers.csv')
kmers.to_csv(csv)
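
The intersection step can be pictured with a short sketch. This is an illustration under assumptions, not kvector's code: `intersect` is a hypothetical helper, intervals are `(chrom, start, end)` tuples, and the example coordinates are invented.

```python
def intersect(interval, others):
    """Return the parts of `interval` that overlap any interval in `others`.

    Intervals are (chrom, start, end) tuples in BED's 0-based, half-open
    coordinates, so ("chr1", 10, 20) covers positions 10 through 19.
    Overlapping entries in `others` are not merged in this sketch.
    """
    chrom, start, end = interval
    pieces = []
    for o_chrom, o_start, o_end in others:
        if o_chrom != chrom:
            continue
        lo, hi = max(start, o_start), min(end, o_end)
        if lo < hi:  # keep only non-empty overlaps
            pieces.append((chrom, lo, hi))
    return sorted(pieces)


# Invented coordinates: one intron and three "conserved" regions
intron = ("chr1", 100, 200)
conserved = [("chr1", 50, 120), ("chr1", 150, 160), ("chr2", 100, 200)]
overlaps = intersect(intron, conserved)
print(overlaps)
```

In this picture, k-mers would then be counted only in the sequence under the returned pieces rather than across the whole interval.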

Count all k-mers in a fasta file

kmer_vector = kvector.count_kmers('kvector/tests/data/example.fasta', kmer_lengths=(3, 4))
kmer_vector.head()

Read HOMER motif file

motifs = kvector.read_motifs("kvector/tests/example.motifs", residues='ACGT')

The output is a pandas Series indexed by the motif IDs from the file. Each value is a DataFrame holding the position-weight matrix (PWM) of that motif.
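
For readers unfamiliar with the file layout, the sketch below parses HOMER-style motif text, where each motif starts with a tab-separated header line beginning with `>` followed by one row of per-residue probabilities per position. It is a simplified stand-in rather than kvector's reader: `parse_motifs` is a hypothetical name, plain dicts and lists replace the pandas Series and DataFrames, keying on the full header line is an assumption, and the example motif is invented.

```python
def parse_motifs(text, residues="ACGT"):
    """Parse HOMER-style motif text into {header: [row, ...]}.

    Each row is a dict mapping residue -> probability at that position.
    """
    motifs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]  # header fields are tab-separated
            motifs[current] = []
        elif current is not None:
            values = [float(x) for x in line.split()]
            motifs[current].append(dict(zip(residues, values)))
    return motifs


# Invented two-position motif, for illustration only
example = ">AC\ttoy_motif\t5.0\n0.7\t0.1\t0.1\t0.1\n0.1\t0.7\t0.1\t0.1\n"
motifs = parse_motifs(example)
for header, pwm in motifs.items():
    # Print the second header field (the motif name) and the motif length
    print(header.split("\t")[1], len(pwm))
```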

Create metadata matrix from the ID lines of the motifs

metadata = kvector.create_metadata(motifs)

Transform the motif PWM to a kmer vector

Keep in mind that the number of possible k-mers grows as 4^k, so on most computers only k-mers up to about length 8 (4^8 = 65,536 possible 8-mers) can be stored in memory. For longer k-mers, you may want to run this on a supercomputer rather than your laptop.
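
The growth is easy to check with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: `n_kmers` and `dense_bytes` are hypothetical helpers, and the 8 bytes per value assumes a dense matrix of 64-bit floats, which is an assumption about storage rather than something kvector documents.

```python
def n_kmers(kmer_lengths, n_residues=4):
    """Total number of distinct k-mers across all requested lengths."""
    return sum(n_residues ** k for k in kmer_lengths)


def dense_bytes(n_rows, kmer_lengths, bytes_per_value=8):
    """Size of a dense (n_rows, n_kmers) matrix, assuming 64-bit values."""
    return n_rows * n_kmers(kmer_lengths) * bytes_per_value


# Vector length for each single k
for k in range(4, 11):
    print(k, n_kmers((k,)))

# Combined vector length for several k, and an assumed 1000-motif matrix
total = n_kmers((4, 5, 6))
size = dense_bytes(1000, (8,))
print(total, size)
```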

motif_kmer_vectors = kvector.motifs_to_kmer_vectors(motifs, residues='ACGT',
    kmer_lengths=(4, 5, 6))
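
The README does not describe how a PWM is turned into k-mer scores, so the following is one plausible sketch and should not be read as kvector's algorithm. The scoring rule (best product of per-position probabilities over all windows of the PWM) is an assumption, `pwm_to_kmer_vector` is a hypothetical name, and the motif is invented.

```python
from itertools import product


def pwm_to_kmer_vector(pwm, k, residues="ACGT"):
    """Score every k-mer against a PWM given as a list of {residue: prob} rows.

    Assumed scoring (not taken from kvector): for each window of k consecutive
    PWM positions, multiply the per-position probabilities of the k-mer's
    residues, then keep the best window. A motif shorter than k has no
    windows, so every k-mer scores 0.0.
    """
    n_windows = len(pwm) - k + 1
    vector = {}
    for letters in product(residues, repeat=k):
        best = 0.0
        for start in range(n_windows):
            score = 1.0
            for offset, residue in enumerate(letters):
                score *= pwm[start + offset][residue]
            best = max(best, score)
        vector["".join(letters)] = best
    return vector


# Invented 3-position motif that prefers "ACG"
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
vector = pwm_to_kmer_vector(pwm, k=2)
print(len(vector), vector["AC"], vector["TT"])
```

Because every motif is scored against the same 4^k k-mers, two motifs of different lengths yield vectors of equal length that can be compared directly.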