Distances.jl

A Julia package for evaluating distances (metrics) between vectors.

This package also provides optimized functions to compute column-wise and pairwise distances with matrices (i.e., 2D arrays), which are often substantially faster than a straightforward loop implementation (see the benchmark section below for details).

For distances between tables with heterogeneous columns, please check TableDistances.jl.

Supported distances

  • Euclidean distance
  • Squared Euclidean distance
  • Periodic Euclidean distance
  • Cityblock distance
  • Total variation distance
  • Jaccard distance
  • Rogers-Tanimoto distance
  • Chebyshev distance
  • Minkowski distance
  • Hamming distance
  • Cosine distance
  • Correlation distance
  • Chi-square distance
  • Kullback-Leibler divergence
  • Generalized Kullback-Leibler divergence
  • Rényi divergence
  • Jensen-Shannon divergence
  • Mahalanobis distance
  • Squared Mahalanobis distance
  • Bhattacharyya distance
  • Hellinger distance
  • Haversine distance
  • Spherical angle distance
  • Mean absolute deviation
  • Mean squared deviation
  • Root mean squared deviation
  • Normalized root mean squared deviation
  • Bray-Curtis dissimilarity
  • Bregman divergence

For Euclidean distance, Squared Euclidean distance, Cityblock distance, Minkowski distance, and Hamming distance, a weighted version is also provided.
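As a minimal sketch of what a weighted version looks like, the snippet below evaluates a weighted Euclidean distance. It assumes the package's `WeightedEuclidean` type and `weuclidean` convenience function, and the vectors and weights are made-up illustration values.

```julia
using Distances

x = [0.0, 0.0]
y = [2.0, 1.0]
w = [4.0, 9.0]   # illustrative per-coordinate weights

# Weighted Euclidean distance via the distance type ...
d_type = WeightedEuclidean(w)(x, y)

# ... and via the convenience function.
d_func = weuclidean(x, y, w)

# Manual check: sqrt(sum(w .* (x - y).^2)) = sqrt(4*4 + 9*1)
d_manual = sqrt(sum(w .* (x .- y) .^ 2))

println((d_type, d_func, d_manual))
```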

Basic use

The library supports three ways of computation: computing the distance between two iterators/vectors, "zip"-wise computation, and pairwise computation. Each of these computation modes works with arbitrary iterable objects of known size.

Computing the distance between two iterators or vectors

Each distance corresponds to a distance type. You can compute a given distance between two iterators or vectors of equal length using either of the following forms:

r = evaluate(dist, x, y)
r = dist(x, y)

Here, dist is an instance of a distance type: for example, the type for Euclidean distance is Euclidean (more distance types will be introduced in the next section). You can compute the Euclidean distance between x and y as

r = evaluate(Euclidean(), x, y)
r = Euclidean()(x, y)

Common distances also come with convenience functions for distance evaluation. For example, you can also compute the Euclidean distance between two vectors as follows:

r = euclidean(x, y)
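The three call styles above should give the same result. A small sketch with made-up vectors (the values of `x` and `y` are only for illustration):

```julia
using Distances

x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

# Three equivalent ways to evaluate the Euclidean distance between x and y.
d1 = evaluate(Euclidean(), x, y)   # generic evaluate call
d2 = Euclidean()(x, y)             # distance instance used as a function
d3 = euclidean(x, y)               # convenience function

# x - y = [-3, -4, 0], so each value should be approximately 5.
println((d1, d2, d3))
```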

Computing distances between corresponding objects ("column-wise")

Suppose you have two m-by-n matrices X and Y. You can then compute all distances between corresponding columns of X and Y in one batch, using the colwise function:

r = colwise(dist, X, Y)

The output r is a vector of length n. In particular, r[i] is the distance between X[:,i] and Y[:,i]. The batch computation typically runs considerably faster than calling evaluate column-by-column.

Note that either X or Y can be a single vector; the colwise function then computes the distance between this vector and each column of the other argument.
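A short sketch of column-wise computation, including the vector-versus-matrix case. The matrices are small made-up examples chosen so the distances are easy to check by hand.

```julia
using Distances

# Two 2-by-3 matrices: each column is one observation.
X = [0.0 1.0 2.0;
     0.0 1.0 2.0]
Y = [3.0 1.0 2.0;
     4.0 1.0 5.0]

# r[i] is the distance between X[:, i] and Y[:, i].
r = colwise(Euclidean(), X, Y)

# A single vector is compared against every column of the matrix.
v = [0.0, 0.0]
rv = colwise(Euclidean(), v, Y)

println(r)    # one distance per column pair
println(rv)   # one distance per column of Y
```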

Computing pairwise distances

Let X and Y have m and n columns, respectively, and the same number of rows. Then the pairwise function with the dims=2 argument computes distances between each pair of columns in X and Y:

R = pairwise(dist, X, Y, dims=2)

In the output, R is a matrix of size (m, n), such that R[i,j] is the distance between X[:,i] and Y[:,j]. Computing distances for all pairs using the pairwise function is often considerably faster than evaluating each pair individually.

If you only want to compute distances between all columns of a single matrix X, you can write

R = pairwise(dist, X, dims=2)

This statement will result in an m-by-m matrix, where R[i,j] is the distance between X[:,i] and X[:,j]. pairwise(dist, X) is typically more efficient than pairwise(dist, X, X), as the former will take advantage of the symmetry when dist is a semi-metric (including metric).

To compute pairwise distances for matrices with observations stored in rows, use the argument dims=1.
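The following sketch illustrates the two-argument and one-argument forms of pairwise, and the relation between dims=2 and dims=1. The input matrices are made-up values for illustration.

```julia
using Distances

X = [0.0 3.0;
     0.0 4.0]        # 2 rows, m = 2 columns
Y = [0.0 3.0 6.0;
     0.0 4.0 8.0]    # 2 rows, n = 3 columns

# R[i, j] is the distance between X[:, i] and Y[:, j]; size(R) == (2, 3).
R = pairwise(Euclidean(), X, Y, dims=2)

# Distances between all columns of X; the result is a 2-by-2 symmetric matrix.
S = pairwise(Euclidean(), X, dims=2)

# The same data with observations stored in rows, using dims=1.
Rt = pairwise(Euclidean(), permutedims(X), permutedims(Y), dims=1)

println(size(R))
println(S)
```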

Computing column-wise and pairwise distances inplace

If the vector or matrix that will store the results is pre-allocated, you can write into that storage directly (without creating a new array) using the following syntax (i being either 1 or 2):

colwise!(dist, r, X, Y)
pairwise!(dist, R, X, Y, dims=i)
pairwise!(dist, R, X, dims=i)

Please note the difference: the functions for inplace computation are colwise! and pairwise! (instead of colwise and pairwise).
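A minimal sketch of the inplace variants, assuming a package version recent enough to accept the distance as the first argument (as in the syntax above). The array sizes are arbitrary illustration values.

```julia
using Distances

X = rand(3, 4)    # 4 observations as columns
Y = rand(3, 4)    # same shape as X, for column-wise comparison
Z = rand(3, 5)    # 5 observations as columns

# Pre-allocate the outputs once ...
r = zeros(4)
R = zeros(4, 5)

# ... and fill them without allocating new result arrays.
colwise!(Euclidean(), r, X, Y)
pairwise!(Euclidean(), R, X, Z, dims=2)

println(r)
println(size(R))
```

Reusing `r` and `R` across repeated calls is where the inplace forms are most likely to pay off, for example inside a loop over many batches.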

Deprecated alternative syntax

The syntax

colwise!(r, dist, X, Y)
pairwise!(R, dist, X, Y, dims=i)
pairwise!(R, dist, X, dims=i)

with the first two arguments (metric and results) interchanged is supported as well. However, its use is discouraged since it is deprecated and will be removed in a future release.

Distance type hierarchy

The distances are organized into a type hierarchy.

At the top of this hierarchy is the abstract type PreMetric. A pre-metric is defined as a function d that satisfies

d(x, x) == 0  for all x
d(x, y) >= 0  for all x, y

SemiMetric is an abstract type that refines PreMetric. Formally, a semi-metric is a pre-metric that is also symmetric, as

d(x, y) == d(y, x)  for all x, y

Metric is an abstract type that further refines SemiMetric. Formally, a metric is a semi-metric that also satisfies the triangle inequality, as

d(x, z) <= d(x, y) + d(y, z)  for all x, y, z

MinkowskiMetric is an abstract type that encompasses a family of metrics defined by the formula

d(x, y) = sum(w .* abs.(x - y) .^ p) ^ (1 / p)

where the p parameter defines the metric and w is a potential weight vector (all 1's by default).
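To make the role of p concrete, the sketch below compares the unweighted case (all weights equal to 1) against a hand-written version of the formula, using absolute differences, and against the p = 1 and p = 2 special cases. The vectors are made-up illustration values.

```julia
using Distances

x = [1.0, -2.0, 3.0]
y = [0.0, 2.0, 1.0]

# Hand-written Minkowski formula with unit weights.
manual(x, y, p) = sum(abs.(x .- y) .^ p)^(1 / p)

d3 = minkowski(x, y, 3)
println(d3 ≈ manual(x, y, 3))

# p = 1 corresponds to the Cityblock distance, p = 2 to the Euclidean distance.
d1 = minkowski(x, y, 1)
d2 = minkowski(x, y, 2)
println(d1 ≈ cityblock(x, y))
println(d2 ≈ euclidean(x, y))
```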

This type system has practical significance. For example, when computing pairwise distances between a set of vectors, you need to compute only half of the pairs and can derive the values for the remaining half immediately by leveraging the symmetry of semi-metrics. Note that the SemiMetric and Metric types do not completely follow the mathematical definitions, because they do not require the "distance" to distinguish between points: for these types, x != y does not imply d(x, y) != 0 in general. This requirement is omitted because it does not change computations in practice.
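The hierarchy can be inspected with ordinary type checks, and user code can dispatch on it. In the sketch below, `can_use_symmetry` is a hypothetical helper written for illustration only; it is not part of the package.

```julia
using Distances

# Where a few concrete distances sit in the hierarchy.
println(Euclidean() isa Metric)          # a metric
println(SqEuclidean() isa SemiMetric)    # symmetric ...
println(SqEuclidean() isa Metric)        # ... but not declared a metric
println(KLDivergence() isa PreMetric)    # a pre-metric ...
println(KLDivergence() isa SemiMetric)   # ... that is not symmetric

# Hypothetical helper: decide by dispatch whether d(x, y) == d(y, x)
# may be assumed, so that only half of the pairs need to be computed.
can_use_symmetry(::SemiMetric) = true
can_use_symmetry(::PreMetric) = false

println(can_use_symmetry(Euclidean()))
println(can_use_symmetry(KLDivergence()))
```

Because Metric refines SemiMetric, the more specific `SemiMetric` method is chosen for both semi-metrics and metrics, while other pre-metrics fall back to the `PreMetric` method.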

Each distance corresponds to a distance type. The type name and the corresponding mathematical definitions of the distances are listed in the following table.

| type name | convenient syntax | math definition |
| -------------------- | --------------------------------- | -------------------- |
| Euclidean | euclidean(x, y) | sqrt(sum((x - y) .^ 2)) |
| SqEuclidean | sqeuclidean(x, y) | sum((x - y).^2) |
| PeriodicEuclidean | peuclidean(x, y, w) | sqrt(sum(min(mod(abs(x - y), w), w - mod(abs(x - y), w)).^2)) |
| Cityblock | cityblock(x, y) | sum(abs(x - y)) |
| TotalVariation | totalvariation(x, y) | sum(abs(x - y)) / 2 |
| Chebyshev | chebyshev(x, y) | max(abs(x - y)) |
| Minkowski | minkowski(x, y, p) | sum(abs(x - y).^p) ^ (1/p) |
| Hamming | hamming(k, l) | sum(k .!= l) |
| RogersTanimoto | rogerstanimoto(a, b) | 2(sum(a&!b) + sum(!a&b)) / (2(sum(a&!b) + sum(!a&b)) + sum(a&b) + sum(!a&!b)) |
| Jaccard | jaccard(x, y) | 1 - sum(min(x, y)) / sum(max(x, y)) |
| BrayCurtis | braycurtis(x, y) | sum(abs(x - y)) / sum(abs(x + y)) |
| CosineDist | cosine_dist(x, y) | 1 - dot(x, y) / (norm(x) * norm(y)) |
| CorrDist | corr_dist(x, y) | cosine_dist(x - mean(x), y - mean(y)) |
| ChiSqDist | chisq_dist(x, y) | sum((x - y).^2 / (x + y)) |
| KLDivergence | kl_divergence(p, q) | sum(p .* log(p ./ q)) |
| GenKLDivergence | gkl_divergence(p, q) | sum(p .* log(p ./ q) - p + q) |
| RenyiDivergence | renyi_divergence(p, q, k) | log(sum( p .* (p ./ q) .^ (k - 1))) / (k - 1) |
| JSDivergence | js_divergence(p, q) | KL(p, m) / 2 + KL(q, m) / 2 with m = (p + q) / 2 |
| SpanNormDist | spannorm_dist(x, y) | max(x - y) - min(x - y) |
| BhattacharyyaDist | bhattacharyya(x, y) | -log(sum(sqrt(x .* y) / sqrt(sum(x) * sum(y)))) |
| HellingerDist | hellinger(x, y) | sqrt(1 - sum(sqrt(x .* y) / sqrt(sum(x) * sum(y)))) |
| Haversine | haversine(x, y, r = 6_371_000) | Haversine formula |
| SphericalAngle | spherical_angle(x, y) | Haversine formula |
| Mahalanobis | mahalanobis(x, y, Q) | sqrt((x - y)' * Q * (x - y)) |
| SqMahalanobis | sqmahalanobis(x, y, Q) | (x - y)' * Q * (x - y) |
| MeanAbsDeviation | meanad(x, y) | mean(abs.(x - y)) |
| MeanSqDeviation | msd(x, y) | mean(abs2.(x - y)) |
| RMSDeviation | rmsd(x, y)
