Philentropy
Information Theory and Distance Quantification with R
philentropy <sub><sup>— Information Theory and Distance Quantification with R</sup></sub>
🧭 Similarity and Distance Quantification between Probability Functions
Describe and understand the world through data.
Data collection and data comparison are the foundations of scientific research.
Mathematics provides the abstract framework to describe patterns we observe in nature and Statistics provides the
framework to quantify the uncertainty of these patterns.
In statistics, natural patterns are described in the form of probability distributions that either follow fixed patterns (parametric distributions) or more dynamic ones (non-parametric distributions).
The philentropy package implements fundamental distance and similarity measures to quantify distances between probability density functions as well as traditional information theory measures.
In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.
🧡 This project is born out of my passion for statistics and I hope it will be useful to those who share it with me.
⚙️ Installation
# install philentropy version 0.10.0 from CRAN
install.packages("philentropy")
Or get the latest developer version:
# install.packages("devtools")
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)
🧾 Citation
HG Drost (2018).
Philentropy: Information Theory and Distance Quantification with R.
Journal of Open Source Software, 3(26), 765.
https://doi.org/10.21105/joss.00765
🪶 I am developing philentropy in my spare time and would be very grateful if you would cite the paper above if it was useful for your research. These citations help me continue maintaining and extending the package.
🧩 Quick Start
library(philentropy)
P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)
distance(rbind(P, Q), method = "jensen-shannon")
jensen-shannon using unit 'log'.
jensen-shannon
0.01041993
💡 Tip: Got a large matrix (rows = samples, cols = features)?
Use distance(X, method="cosine", mute.message=TRUE) to compute the full pairwise matrix quickly and quietly.
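As a hedged sketch of the tip above, the following builds a small made-up matrix and computes its pairwise cosine matrix. The object X, its dimensions and its row names are illustrative assumptions, not data shipped with the package. The call also passes use.row.names = TRUE, an argument of distance() that labels the result with the row names of the input.

```r
library(philentropy)

set.seed(42)  # reproducible toy data
# Hypothetical input: 5 samples (rows) x 4 non-negative features (columns)
X <- matrix(runif(20), nrow = 5,
            dimnames = list(paste0("sample", 1:5), paste0("feature", 1:4)))

# Pairwise comparison of all rows; mute.message = TRUE suppresses the
# "... using unit ..." message that is printed by default
D <- distance(X, method = "cosine", mute.message = TRUE, use.row.names = TRUE)

dim(D)  # one row and one column per sample
```

The cosine measure appears to be reported as a similarity rather than a distance, so it is worth checking the diagonal of D before passing it to a clustering routine.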
📘 Tutorials
- Introduction to the philentropy package
- Distance and Similarity Measures implemented in philentropy
- Information Theory Metrics implemented in philentropy
- Comparing many probability density functions
🧪 When should I use which distance?
| Goal | Recommended Methods |
|------|---------------------|
| 🔁 Clustering / similarity | cosine, euclidean |
| 📊 Probability or compositional data | jensen-shannon, hellinger, kullback-leibler |
| 🧬 Sparse counts / binary | canberra, jaccard, sorensen |
| ⚖️ Absolute differences (sum or maximum) | manhattan, chebyshev |
Run getDistMethods() to explore all 46 implemented measures.
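One way to get a feel for the table above is to run a few methods on the same pair of vectors and compare the results side by side. This is a minimal sketch: the vectors are toy values, and the subset of methods is hand-picked for illustration rather than a recommendation from the package.

```r
library(philentropy)

# Two toy probability vectors (illustrative values, each sums to 1)
P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)
x <- rbind(P, Q)

# A hand-picked subset of the names returned by getDistMethods()
methods <- c("euclidean", "cosine", "hellinger", "jensen-shannon")
stopifnot(all(methods %in% getDistMethods()))

# Apply each method to the same pair and collect one number per method
results <- vapply(
  methods,
  function(m) unname(distance(x, method = m, mute.message = TRUE)),
  numeric(1)
)
results
```

The values are not on a common scale, so they are best used to compare pairs within one method, not to compare methods with each other.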
🧮 Examples
library(philentropy)
philentropy::getDistMethods()
[1] "euclidean" "manhattan" "minkowski" "chebyshev" "sorensen"
[6] "gower" "soergel" "kulczynski_d" "canberra" "lorentzian"
[11] "intersection" "non-intersection" "wavehedges" "czekanowski" "motyka"
[16] "kulczynski_s" "tanimoto" "ruzicka" "inner_product" "harmonic_mean"
[21] "cosine" "hassebrook" "jaccard" "dice" "fidelity"
[26] "bhattacharyya" "hellinger" "matusita" "squared_chord" "squared_euclidean"
[31] "pearson" "neyman" "squared_chi" "prob_symm" "divergence"
[36] "clark" "additive_symm" "kullback-leibler" "jeffreys" "k_divergence"
[41] "topsoe" "jensen-shannon" "jensen_difference" "taneja" "kumar-johnson"
[46] "avg"
# define probability density functions P and Q
P <- 1:10/sum(1:10)
Q <- 20:29/sum(20:29)
x <- rbind(P, Q)
philentropy::distance(x, method = "jensen-shannon")
jensen-shannon using unit 'log'.
jensen-shannon
0.02628933
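To make explicit what the value above measures, the Jensen-Shannon divergence can be recomputed in base R from its definition, JSD(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M) with M = (P + Q) / 2. This is a sketch for illustration only: the helper names kl_nats and jsd_nats are hypothetical and not part of philentropy, and the code assumes strictly positive probabilities.

```r
# Same probability vectors as in the example above
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# Kullback-Leibler divergence with the natural logarithm (unit 'log');
# assumes all entries of p and q are strictly positive
kl_nats <- function(p, q) sum(p * log(p / q))

# Jensen-Shannon divergence: average KL divergence to the mixture m
jsd_nats <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl_nats(p, m) + 0.5 * kl_nats(q, m)
}

jsd_nats(P, Q)  # approximately 0.0263, in line with the output above
```

Dividing the result by log(2) converts it from nats to bits, which corresponds to unit = "log2".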
Alternatively, compute all available distances:
philentropy::dist.diversity(x, p = 2, unit = "log2")
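The call above returns one value per implemented measure, where p is the exponent used for the Minkowski distance. A short sketch of how the result might be inspected, assuming it is a numeric vector named by method (as its printed form suggests):

```r
library(philentropy)

P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
x <- rbind(P, Q)

# One value per method, all computed on the same pair of vectors
all_dists <- dist.diversity(x, p = 2, unit = "log2")

length(all_dists)                       # number of measures returned
all_dists[c("euclidean", "hellinger")]  # look up individual measures by name
```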
🌟 Papers using philentropy (highlights)
<details> <summary><b>Nature / Cell / Science</b></summary>Flagship examples with top venues. Click to expand full lists.
- A transcriptomic hourglass in brown algae
  JS Lotharukpong, M Zheng, R Luthringer et al. – Nature, 2024
- Annelid functional genomics reveal the origins of bilaterian life cycles
  FM Martín-Zamora, Y Liang, K Guynes et al. – Nature, 2023
- An atlas of gene regulatory elements in adult mouse cerebrum
  YE Li, S Preissl, X Hou, Z Zhang, K Zhang et al. – Nature, 2021
- Convergent somatic mutations in metabolism genes in chronic liver disease
  S Ng, F Rouhani, S Brunner, N Brzozowska et al. – Nature, 2021
- Antigen dominance hierarchies shape TCF1+ progenitor CD8 T cell phenotypes in tumors
  ML Burger, AM Cruz, GE Crossland et al. – Cell, 2021
- A comparative atlas of single-cell chromatin accessibility in the human brain
  YE Li, S Preissl, M Miller, ND Johnson, Z Wang et al. – Science, 2023

- sciCSR infers B cell state transition and predicts class-switch recombination dynamics using scRNA-seq
  JCF Ng, G Montamat Garcia, AT Stewart et al. – Nature Methods, 2024
- Decoding the gene regulatory network of endosperm differentiation in maize
  Y Yuan, Q Huo, Z Zhang, Q Wang, J Wang et al. – Nature Communications, 2024
- Population structure in a fungal human pathogen is potentially linked to pathogenicity
  EA Hatmaker, AE Barber, MT Drott et al. – Nature Communications, 2025
- Pan-cancer human brain metastases atlas at single-cell resolution
  X Xing, J Zhong, J Biermann, H Duan, X Zhang et al. – Cancer Cell, 2025
- Gene module reconstruction identifies cellular differentiation processes and the regulatory logic of specialized secretion in zebrafish
  Y Wang, J Liu, LY Du, JL Wyss, JA Farrell, AF Schier – Developmental Cell, 2025

- Staphylococci in high resolution: Capturing diversity within the human nasal microbiota
  AC Ingham, DYK Ng, S Iversen, CM Liu et al. – Cell Reports, 2025
- The power of visualizing distributional differences: formal graphical n-sample tests
  K Konstantinou, T Mrkvička, M Myllymäki – Computational Statistics, 2025
- Plant species as ecological engineers of microtopography in a temperate sedge-grass marsh
  J Dušek, J Novotný, B Navrátilová et al. – Scientific Reports, 2025
- Resolution of MALDI-TOF vs WGS for Bacillus identification (NASA JSC)
  F Mazhari, AB Regberg, CL Castro, MG LaMontagne – Frontiers in Microbiology, 2025
- Every Hue Has Its Fan Club: Diverse Patterns of Color-Dependent Flower Visitation across Lepidoptera
  D Kutcherov, EL Westerman – Integrative and Comparative Biology, 2025
🎓 philentropy has been used in dozens of peer-reviewed publications to quantify distances, divergences, and similarities in complex biological and computational datasets.
🧠 Important Functions
Distance Measures
- distance() – Implements 46 probability distance/similarity measures
- getDistMethods() – Get all method names for distance
- dist.diversity() – Computes distance diversity between PDFs
- estimate.probability() – Estimate probability vectors from counts
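The functions listed above can be combined when the input is raw counts, not probabilities. The sketch below uses made-up counts (counts_a and counts_b are hypothetical) and the "empirical" method of estimate.probability(), which divides each count by the sample total, before comparing the two estimated distributions.

```r
library(philentropy)

# Hypothetical raw counts for two samples over the same three categories
counts_a <- c(10, 20, 70)
counts_b <- c(40, 40, 120)

# Empirical estimate: each count divided by the sample total
prob_a <- estimate.probability(counts_a, method = "empirical")
prob_b <- estimate.probability(counts_b, method = "empirical")

sum(prob_a)  # a valid probability vector sums to 1

# Compare the two estimated distributions
distance(rbind(prob_a, prob_b), method = "jensen-shannon", mute.message = TRUE)
```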
Information Theory
- H() – Shannon’s Entropy H(X)
- JE() – Joint Entropy H(X,Y)
- CE() – Conditional Entropy H(X|Y)
- MI() – Mutual Information I(X,Y)
- KL() – Kullback–Leibler Divergence
- JSD() – Jensen–Shannon Divergence
- gJSD() – Generalized Jensen–Shannon Divergence
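A brief, hedged illustration of three of the functions listed above, using the same example vectors as elsewhere in this README. The entropy is also recomputed from the textbook formula H(X) = -sum(p * log2(p)) as a sanity check; h_pkg and h_manual are illustrative variable names.

```r
library(philentropy)

P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
x <- rbind(P, Q)

# Shannon entropy of P in bits, checked against the textbook formula
h_pkg    <- H(P, unit = "log2")
h_manual <- -sum(P * log2(P))
all.equal(as.numeric(h_pkg), h_manual)

# Divergences between P and Q (the rows of x), also in bits
KL(x, unit = "log2")   # asymmetric: KL(P || Q)
JSD(x, unit = "log2")  # symmetric in P and Q
```

KL() and JSD() take the two distributions as rows of a single matrix, mirroring the input format of distance().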
🗞️ NEWS
Find the current status and version history in the 👉 NEWS section.
🧩 Appendix — full references
- A transcriptomic hourglass in brown algae
  JS Lotharukpong, M Zheng, R Luthringer et al. – Nature, 2024
- Annelid functional genomics reveal the origins of bilaterian life cycles
  FM Martín-Zamora, Y Liang, K Guynes et al. – Nature, 2023
- An atlas of gene regulatory elements in adult mouse cerebrum
  YE Li, S Preissl, X Hou, Z Zhang, K Zhang et al. – Nature, 2021
- Convergent somatic mutations in metabolism genes in chronic liver disease
  S Ng, F Rouhani, S Brunner, N Brzozowska et al. – Nature, 2021
- Antigen dominance hierarchies shape TCF1+ progenitor CD8 T cell phenotypes in tumors
  ML Burger, AM Cruz, GE Crossland et al. – Cell, 2021
- High-content single-cell combinatorial indexing
  R Mulqueen et al. – Nature Biotechnology, 2021
- __A comparati
