Distancia
=========

The Distancia package is a comprehensive Python library designed to compute a wide variety of distance metrics between vectors, sets, matrices, or sequences. It includes implementations of several well-known distance metrics, each providing a unique measure of dissimilarity or similarity between data points.
.. meta::
   :description: Distancia is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.
   :keywords: data science, machine learning, deep learning, neural network, graph, text classification, distance, Cython, Markov chain, file similarity, image classification, NLP, loss functions, distancia
   :keywords lang=en: machine learning, image processing, optimization, text similarity, NLP, search engine, document ranking
Welcome to Distancia's documentation!
=====================================
Distancia is a comprehensive Python package that provides a wide range of distance metrics and similarity measures, making it easy to calculate and compare the proximity between various types of data. This documentation provides an in-depth guide to the package, including installation instructions, usage examples, and detailed descriptions of each available metric.
The documentation is divided into the following sections:
.. note::

   The code examples in this documentation are written for Python 3.x. The Python code in this package has been optimized with static typing via Cython.
Getting Started
---------------
Distancia is designed to be simple and intuitive, yet powerful and flexible. Whether you are working with numerical data, strings, or other types of data, Distancia provides the tools you need to measure the distance or similarity between objects.
For a quick introduction, check out the quickstart_ guide. If you want to dive straight into the code, head over to the Euclidean_ page.
.. _quickstart: https://distancia.readthedocs.io/en/latest/quickstart.html
.. _Euclidean: https://distancia.readthedocs.io/en/latest/Euclidean.html
.. note::

   If you find any issues or have suggestions for improvements, feel free to contribute!
Installation
------------
You can install the distancia package with pip:
.. code-block:: bash

   pip install distancia
By default, this will install the core functionality of the package, suitable for users who only need basic distance metrics.
Optional Dependencies
~~~~~~~~~~~~~~~~~~~~~

The Distancia package also supports optional modules that enable additional features. You can install these extras depending on your needs:
**With pandas support:** install additional support for working with tabular data:

.. code-block:: bash

   pip install distancia[pandas]
**With all supported extras:** install all optional dependencies for maximum functionality:

.. code-block:: bash

   pip install distancia[all]
This modular installation allows you to keep your setup lightweight or include everything for full capabilities.
Quickstart
----------
Here are some common examples of how to use Distancia:
.. code-block:: python

   from distancia import Euclidean

   point1 = [1, 2, 3]
   point2 = [4, 5, 6]

   # Create an instance of Euclidean
   euclidean = Euclidean()

   # Calculate the Euclidean distance
   distance = euclidean.compute(point1, point2)
   print(f"Euclidean Distance: {distance:.3f}")
.. code-block:: text

   Euclidean Distance: 5.196
.. code-block:: python

   from distancia import Levenshtein

   string1 = "kitten"
   string2 = "sitting"

   distance = Levenshtein().compute(string1, string2)
   print(f"Levenshtein Distance: {distance}")
.. code-block:: text

   Levenshtein Distance: 3
For a complete list and detailed explanations of each metric, see the next section.
Available measurement types
---------------------------
.. _Vector Distance Measures: https://distancia.readthedocs.io/en/latest/vectorDistance.html
.. _Matrix Distance Measures: https://distancia.readthedocs.io/en/latest/matrixDistance.html
.. _Text Distance Measures: https://distancia.readthedocs.io/en/latest/textDistance.html
.. _Time Series Distance Measures: https://distancia.readthedocs.io/en/latest/timeDistance.html
.. _Loss Function-Based Distance Measures: https://distancia.readthedocs.io/en/latest/lossFunction.html
.. _Graph Distance Measures: https://distancia.readthedocs.io/en/latest/graphDistance.html
.. _Markov Chain Distance Measures: https://distancia.readthedocs.io/en/latest/markovChainDistance.html
.. _Image Distance Measures: https://distancia.readthedocs.io/en/latest/imageDistance.html
.. _Audio Distance Measures: https://distancia.readthedocs.io/en/latest/soundDistance.html
.. _File Distance Measures: https://distancia.readthedocs.io/en/latest/fileDistance.html
`Vector Distance Measures`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Distance measures between vectors are essential in machine learning, classification, and information retrieval. Here are five of the most commonly used:
`Euclidean Distance`_
   The Euclidean distance is the square root of the sum of the squared differences between the coordinates of two vectors. It is ideal for measuring similarity in geometric spaces.
.. _Euclidean Distance: https://distancia.readthedocs.io/en/latest/Euclidean.html
`Manhattan Distance`_
   Also known as the L1 distance, it is defined as the sum of the absolute differences between the coordinates of the vectors. It is well suited to discrete spaces and grid-based environments.
.. _Manhattan Distance: https://distancia.readthedocs.io/en/latest/Manhattan.html
`Cosine Distance`_
   It measures the angle between two vectors rather than their absolute distance. Commonly used in natural language processing and information retrieval (e.g., search engines).
.. _Cosine Distance: https://distancia.readthedocs.io/en/latest/Cosine.html
`Jaccard Distance`_
   Based on the ratio of the intersection to the union of sets, it is effective for comparing sets of words, tags, or recommended items.
.. _Jaccard Distance: https://distancia.readthedocs.io/en/latest/Jaccard.html
`Hamming Distance`_
   It counts the number of differing positions between two character or binary sequences. It is widely used in error detection and bioinformatics.
.. _Hamming Distance: https://distancia.readthedocs.io/en/latest/Hamming.html
.. note::

   These distance measures are widely used in various algorithms, including clustering, supervised classification, and search engines.
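The five measures above can be sketched directly from their definitions in plain Python. This is a minimal illustration of the underlying formulas, independent of the Distancia API:

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1)
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    # 1 minus |intersection| / |union| of the two sets
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def hamming(a, b):
    # Number of positions at which equal-length sequences differ
    return sum(x != y for x, y in zip(a, b))

print(euclidean([1, 2, 3], [4, 5, 6]))  # ~5.196, matching the quickstart
print(hamming("karolin", "kathrin"))    # 3
```

The Distancia classes wrap these same formulas behind a common `compute` interface, as shown in the quickstart.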
`Matrix Distance Measures`_
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Distance measures between matrices are widely used in machine learning, image processing, and numerical analysis. Below are five of the most commonly used:
`Frobenius Norm`_
   The Frobenius norm is the square root of the sum of the squared elements of the difference between two matrices. It generalizes the Euclidean distance to matrices and is commonly used in optimization problems.
.. _Frobenius Norm: https://distancia.readthedocs.io/en/latest/Frobenius.html
`Spectral Norm`_
   Defined as the largest singular value of the difference between two matrices, the spectral norm is useful for analyzing stability in numerical methods.
.. _Spectral Norm: https://distancia.readthedocs.io/en/latest/SpectralNormDistance.html
`Trace Norm (Nuclear Norm)`_
   This norm is the sum of the singular values of the difference between matrices. It is often used in low-rank approximation and compressed sensing.
.. _Trace Norm (Nuclear Norm): https://distancia.readthedocs.io/en/latest/NuclearNorm.html
`Mahalanobis Distance`_
   A statistical distance measure that accounts for correlations between features, making it effective in multivariate anomaly detection and classification.
.. _Mahalanobis Distance: https://distancia.readthedocs.io/en/latest/Mahalanobis.html
`Wasserstein Distance (Earth Mover’s Distance)`_
   This metric quantifies the optimal transport cost between two probability distributions, making it highly relevant in image processing and deep learning.
.. _Wasserstein Distance (Earth Mover’s Distance): https://distancia.readthedocs.io/en/latest/Wasserstein.html
.. note::

   These distance measures are widely applied in fields such as computer vision, data clustering, and signal processing.
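As an illustration of the first measure above, the Frobenius norm of a matrix difference reduces to a few lines of plain Python. This is a sketch from the definition, not the Distancia implementation:

```python
import math

def frobenius_distance(A, B):
    # Square root of the sum of squared element-wise differences
    # between two matrices of identical shape (lists of rows)
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 4.0]]
print(frobenius_distance(A, B))  # sqrt(2^2 + 3^2) = sqrt(13) ~ 3.606
```

Flattening both matrices and taking the Euclidean distance between the resulting vectors gives the same value, which is exactly why the Frobenius norm is described as the matrix generalization of Euclidean distance.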
`Text Distance Measures`_
~~~~~~~~~~~~~~~~~~~~~~~~~
Distance measures between texts are crucial in natural language processing (NLP), search engines, and text similarity tasks. Below are five of the most commonly used:
`Levenshtein Distance (Edit Distance)`_
   The minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. Used in spell checkers and DNA sequence analysis.
.. _Levenshtein Distance (Edit Distance): https://distancia.readthedocs.io/en/latest/Levenshtein.html
`Jaccard Similarity`_
   Measures the overlap between two sets of words or character n-grams, computed as the ratio of their intersection to their union. Useful in document comparison and keyword matching.
.. _Jaccard Similarity: https://distancia.readthedocs.io/en/latest/Jaccard.html
`Cosine Similarity`_
   Computes the cosine of the angle between two text vectors, often based on TF-IDF or word embeddings. Commonly used in search engines and document ranking.
.. _Cosine Similarity: https://distancia.readthedocs.io/en/latest/Cosine.html
`Damerau-Levenshtein Distance`_
   An extension of the Levenshtein distance that also considers transpositions (swapping adjacent characters). More robust for typographical error detection.
.. _Damerau-Levenshtein Distance: https://distancia.readthedocs.io/en/latest/DamerauLevenshtein.html
`BLEU Score (Bilingual Evaluation Understudy)`_
   Measures the similarity between a candidate text and reference texts using n-gram precision. Widely used in machine translation and text summarization.
.. _BLEU Score (Bilingual Evaluation Understudy): https://distancia.readthedocs.io/en/latest/BLEU.html
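The edit-distance definition above translates directly into a short dynamic-programming routine. The following is a plain-Python sketch of the classic algorithm, independent of the Cython-optimized Distancia implementation:

```python
def levenshtein(s, t):
    # prev[j] holds the edit distance between the current prefix of s
    # and t[:j]; each pass extends the prefix of s by one character.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3, matching the quickstart output
```

Damerau-Levenshtein extends this recurrence with one extra case that charges a single edit for swapping two adjacent characters.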
