Distancia

The Distancia package is a comprehensive Python library for computing a wide variety of distance metrics between two vectors, sets, matrices, or sequences. It implements several well-known distance metrics, each providing a distinct measure of dissimilarity or similarity between data points.

.. meta::
   :description: Distancia is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.
   :keywords: data science machine learning deep-learning RandomWalk neural-network graph text-classification text distance cython markov-chain file similarity image classification nlp machine learning loss functions distancia
   :keywords lang=en: machine learning, image processing, optimization, text similarity, NLP, search engine, document ranking

======================================
Welcome to Distancia's documentation!
======================================

Distancia is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.

The documentation is divided into the following sections:

.. note::

    The code examples in this documentation are written for Python 3.x. The Python code in this package has been optimized with static typing via Cython.

Getting Started

Distancia is designed to be simple and intuitive, yet powerful and flexible. Whether you are working with numerical data, strings, or other types of data, Distancia provides the tools you need to measure the distance or similarity between objects.

For a quick introduction, check out the quickstart_ guide. If you want to dive straight into the code, head over to the Euclidean_ page.

.. _quickstart: https://distancia.readthedocs.io/en/latest/quickstart.html

.. _Euclidean: https://distancia.readthedocs.io/en/latest/Euclidean.html

.. note::

    If you find any issues or have suggestions for improvements, feel free to contribute!

Installation

You can install the distancia package with pip:

.. code-block:: bash

    pip install distancia

By default, this will install the core functionality of the package, suitable for users who only need basic distance metrics.

Optional Dependencies

The Distancia package also supports optional modules that enable additional features. You can install these extras depending on your needs:

With pandas support: Install with additional support for working with tabular data:

.. code-block:: bash

    pip install "distancia[pandas]"

With all supported extras: Install all optional dependencies for maximum functionality:

.. code-block:: bash

    pip install "distancia[all]"

This modular installation allows you to keep your setup lightweight or include everything for full capabilities.

Quickstart

Here are some common examples of how to use Distancia:

.. code-block:: python

    from distancia import Euclidean

    point1 = [1, 2, 3]
    point2 = [4, 5, 6]

    # Create an instance of Euclidean
    euclidean = Euclidean()

    # Calculate the Euclidean distance
    distance = euclidean.compute(point1, point2)

    # Print the result rounded to three decimal places
    print(f"Euclidean Distance: {distance:.3f}")

.. code-block:: bash

    Euclidean Distance: 5.196

.. code-block:: python

    from distancia import Levenshtein

    string1 = "kitten"
    string2 = "sitting"

    # Compute the edit distance between the two strings
    distance = Levenshtein().compute(string1, string2)
    print(f"Levenshtein Distance: {distance}")

.. code:: bash

    Levenshtein Distance: 3

For a complete list and detailed explanations of each metric, see the next section.

Available measurement types

.. _Vector Distance Measures: https://distancia.readthedocs.io/en/latest/vectorDistance.html
.. _Matrix Distance Measures: https://distancia.readthedocs.io/en/latest/matrixDistance.html
.. _Text Distance Measures: https://distancia.readthedocs.io/en/latest/textDistance.html
.. _Time Series Distance Measures: https://distancia.readthedocs.io/en/latest/timeDistance.html
.. _Loss Function-Based Distance Measures: https://distancia.readthedocs.io/en/latest/lossFunction.html
.. _Graph Distance Measures: https://distancia.readthedocs.io/en/latest/graphDistance.html
.. _Markov Chain Distance Measures: https://distancia.readthedocs.io/en/latest/markovChainDistance.html
.. _Image Distance Measures: https://distancia.readthedocs.io/en/latest/imageDistance.html
.. _Audio Distance Measures: https://distancia.readthedocs.io/en/latest/soundDistance.html
.. _File Distance Measures: https://distancia.readthedocs.io/en/latest/fileDistance.html

`Vector Distance Measures`_

Distance measures between vectors are essential in machine learning, classification, and information retrieval. Here are five of the most commonly used:

  1. `Euclidean Distance`_

     The Euclidean distance is the square root of the sum of the squared differences between the coordinates of two vectors. It is ideal for measuring similarity in geometric spaces.

  2. `Manhattan Distance`_

     Also known as L1 distance, it is defined as the sum of the absolute differences between the coordinates of the vectors. It is well-suited for discrete spaces and grid-based environments.

  3. `Cosine Distance`_

     It measures the angle between two vectors rather than their absolute distance. Commonly used in natural language processing and information retrieval (e.g., search engines).

  4. `Jaccard Distance`_

     Based on the ratio of the intersection to the union of sets, it is effective for comparing sets of words, tags, or recommended items.

  5. `Hamming Distance`_

     It counts the number of differing positions between two character or binary sequences. It is widely used in error detection and bioinformatics.

.. _Euclidean Distance: https://distancia.readthedocs.io/en/latest/Euclidean.html
.. _Manhattan Distance: https://distancia.readthedocs.io/en/latest/Manhattan.html
.. _Cosine Distance: https://distancia.readthedocs.io/en/latest/Cosine.html
.. _Jaccard Distance: https://distancia.readthedocs.io/en/latest/Jaccard.html
.. _Hamming Distance: https://distancia.readthedocs.io/en/latest/Hamming.html

.. note::
    These distance measures are widely used in various algorithms, including clustering, supervised classification, and search engines.
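
To make the five definitions above concrete, here is a minimal pure-Python sketch of each formula. It is an illustration only, not Distancia's implementation: the function names are hypothetical, and defining cosine distance as one minus the cosine similarity is an assumed (though common) convention.

```python
import math


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(s, t):
    """One minus the ratio of intersection size to union size."""
    s, t = set(s), set(t)
    if not s and not t:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(s & t) / len(s | t)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(round(euclidean([1, 2, 3], [4, 5, 6]), 3))
print(manhattan([1, 2, 3], [4, 5, 6]))
print(cosine_distance([1, 0], [0, 1]))
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
print(hamming("karolin", "kathrin"))
```

The sketch does not guard against zero-length vectors in ``cosine_distance``; production code would need to decide how to handle that case.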

`Matrix Distance Measures`_

Distance measures between matrices are widely used in machine learning, image processing, and numerical analysis. Below are five of the most commonly used:

  1. `Frobenius Norm`_

     The Frobenius norm is the square root of the sum of the squared elements of the difference between two matrices. It generalizes the Euclidean distance to matrices and is commonly used in optimization problems.

  2. `Spectral Norm`_

     Defined as the largest singular value of the difference between two matrices, the spectral norm is useful for analyzing stability in numerical methods.

  3. `Trace Norm (Nuclear Norm)`_

     This norm is the sum of the singular values of the difference between matrices. It is often used in low-rank approximation and compressed sensing.

  4. `Mahalanobis Distance`_

     A statistical distance measure that considers correlations between features, making it effective in multivariate anomaly detection and classification.

  5. `Wasserstein Distance (Earth Mover’s Distance)`_

     This metric quantifies the optimal transport cost between two probability distributions, making it highly relevant in image processing and deep learning.

.. _Frobenius Norm: https://distancia.readthedocs.io/en/latest/Frobenius.html
.. _Spectral Norm: https://distancia.readthedocs.io/en/latest/SpectralNormDistance.html
.. _Trace Norm (Nuclear Norm): https://distancia.readthedocs.io/en/latest/NuclearNorm.html
.. _Mahalanobis Distance: https://distancia.readthedocs.io/en/latest/Mahalanobis.html
.. _Wasserstein Distance (Earth Mover’s Distance): https://distancia.readthedocs.io/en/latest/Wasserstein.html

.. note::
    These distance measures are widely applied in fields such as computer vision, data clustering, and signal processing.
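
As a small illustration of the statement that the Frobenius norm generalizes the Euclidean distance to matrices, the sketch below computes the Frobenius distance between two nested-list matrices and compares it with the Euclidean distance between their flattened entries. The helper names ``frobenius_distance`` and ``flatten`` are hypothetical and not part of Distancia's API. The spectral and nuclear norms are left out because they require a singular value decomposition.

```python
import math


def frobenius_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def flatten(matrix):
    """Concatenate the rows of a matrix into a single flat list."""
    return [x for row in matrix for x in row]


A = [[1, 2], [3, 4]]
B = [[1, 2], [0, 0]]

d_frobenius = frobenius_distance(A, B)
# math.dist (Python 3.8+) is the Euclidean distance between two points
d_euclidean = math.dist(flatten(A), flatten(B))

print(d_frobenius, d_euclidean)
```

Both values agree because the Frobenius norm treats a matrix as one long vector of its entries.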

`Text Distance Measures`_

Distance measures between texts are crucial in natural language processing (NLP), search engines, and text similarity tasks. Below are five of the most commonly used:

  1. `Levenshtein Distance (Edit Distance)`_

     The minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. Used in spell checkers and DNA sequence analysis.

  2. `Jaccard Similarity`_

     Measures the overlap between two sets of words or character n-grams, computed as the ratio of their intersection to their union. Useful in document comparison and keyword matching.

  3. `Cosine Similarity`_

     Computes the cosine of the angle between two text vectors, often based on TF-IDF or word embeddings. Commonly used in search engines and document ranking.

  4. `Damerau-Levenshtein Distance`_

     An extension of Levenshtein distance that also considers transpositions (swapping adjacent characters). More robust for typographical error detection.

  5. `BLEU Score (Bilingual Evaluation Understudy)`_

     Measures the similarity between a candidate text and reference texts using n-gram precision. Widely used in machine translation and text summarization.

.. _Levenshtein Distance (Edit Distance): https://distancia.readthedocs.io/en/latest/Levenshtein.html
.. _Jaccard Similarity: https://distancia.readthedocs.io/en/latest/Jaccard.html
.. _Cosine Similarity: https://distancia.readthedocs.io/en/latest/Cosine.html
.. _Damerau-Levenshtein Distance: https://distancia.readthedocs.io/en/latest/DamerauLevenshtein.html

.. _BLEU Score (Bilingual E
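
The Levenshtein definition in the list above can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration assuming unit cost for every edit; the function name ``levenshtein`` is hypothetical, and the code is not Distancia's own (Cython-optimized) implementation.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn s into t, assuming each edit costs 1."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of s and t[:j]
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]  # turning s[:i] into the empty string takes i deletions
        for j, char_t in enumerate(t, start=1):
            substitution_cost = 0 if char_s == char_t else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.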
