
philentropy <sub><sup>— Information Theory and Distance Quantification with R</sup></sub>


🧭 Similarity and Distance Quantification between Probability Functions

Describe and understand the world through data.

Data collection and data comparison are the foundations of scientific research.
Mathematics provides the abstract framework to describe the patterns we observe in nature, and statistics provides the framework to quantify the uncertainty of these patterns.

In statistics, natural patterns are described in the form of probability distributions that either follow fixed patterns (parametric distributions) or more dynamic ones (non-parametric distributions).

The philentropy package implements fundamental distance and similarity measures for comparing probability density functions, along with traditional information theory measures.
In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.

🧡 This project is born out of my passion for statistics and I hope it will be useful to those who share it with me.


⚙️ Installation

# install philentropy version 0.10.0 from CRAN
install.packages("philentropy")

Or get the latest developer version:

# install.packages("devtools")
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)

🧾 Citation

HG Drost (2018).
Philentropy: Information Theory and Distance Quantification with R.
Journal of Open Source Software, 3(26), 765.
https://doi.org/10.21105/joss.00765

🪶 I am developing philentropy in my spare time and would be very grateful if you would consider citing the paper above if it was useful for your research. These citations help me continue maintaining and extending the package.


🧩 Quick Start

library(philentropy)

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)

distance(rbind(P, Q), method = "jensen-shannon")
jensen-shannon using unit 'log'.
jensen-shannon 
    0.01041993

💡 Tip: Got a large matrix (rows = samples, cols = features)?
Use distance(X, method="cosine", mute.message=TRUE) to compute the full pairwise matrix quickly and quietly.
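
The tip above can be sketched as follows. The matrix `X` and its sample names are made-up illustration data, not part of the package. Note that philentropy's `cosine` is, to my understanding, a similarity (larger values mean more alike), so the base-R line at the end is included as a sanity check rather than as the package's reference implementation.

```r
library(philentropy)

set.seed(42)
# Hypothetical data: 5 samples (rows) x 20 non-negative features (columns)
X <- matrix(runif(5 * 20), nrow = 5,
            dimnames = list(paste0("sample", 1:5), NULL))

# Full pairwise matrix; mute.message = TRUE suppresses the per-call message
D <- distance(X, method = "cosine", mute.message = TRUE)
dim(D)

# Base-R cross-check for the first two samples
ref <- sum(X[1, ] * X[2, ]) / (sqrt(sum(X[1, ]^2)) * sqrt(sum(X[2, ]^2)))
all.equal(unname(D[1, 2]), ref)
```

With more than two rows, `distance()` returns a square matrix with one row and one column per sample.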


📘 Tutorials


🧪 When should I use which distance?

| Goal | Recommended Methods |
|------|---------------------|
| 🔁 Clustering / similarity | cosine, correlation, euclidean |
| 📊 Probability or compositional data | jensen-shannon, hellinger, kullback-leibler |
| 🧬 Sparse counts / binary | canberra, jaccard, sorensen |
| ⚖️ Scale-invariant | manhattan, chebyshev |

Run getDistMethods() to explore all 46 implemented measures.
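
When it is unclear which measure suits a dataset, it can help to compute a few candidates side by side. The sketch below is one way to do that; the vectors `P` and `Q` and the selection in `methods` are arbitrary choices for illustration.

```r
library(philentropy)

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)
x <- rbind(P, Q)

# A hand-picked subset of the names returned by getDistMethods()
methods <- c("euclidean", "manhattan", "hellinger", "jensen-shannon")
stopifnot(all(methods %in% getDistMethods()))

# One value per method; unname() drops the method label that distance() attaches
results <- sapply(methods, function(m) {
  unname(distance(x, method = m, mute.message = TRUE))
})
results
```

The values are on different scales, so they are best compared within a method across sample pairs rather than across methods.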


🧮 Examples

library(philentropy)
philentropy::getDistMethods()
[1] "euclidean"         "manhattan"         "minkowski"         "chebyshev"         "sorensen"         
[6] "gower"             "soergel"           "kulczynski_d"      "canberra"          "lorentzian"       
[11] "intersection"      "non-intersection"  "wavehedges"        "czekanowski"       "motyka"           
[16] "kulczynski_s"      "tanimoto"          "ruzicka"           "inner_product"     "harmonic_mean"    
[21] "cosine"            "hassebrook"        "jaccard"           "dice"              "fidelity"         
[26] "bhattacharyya"     "hellinger"         "matusita"          "squared_chord"     "squared_euclidean"
[31] "pearson"           "neyman"            "squared_chi"       "prob_symm"         "divergence"       
[36] "clark"             "additive_symm"     "kullback-leibler"  "jeffreys"          "k_divergence"     
[41] "topsoe"            "jensen-shannon"    "jensen_difference" "taneja"            "kumar-johnson"    
[46] "avg"
# define probability density functions P and Q
P <- 1:10/sum(1:10)
Q <- 20:29/sum(20:29)

x <- rbind(P, Q)
philentropy::distance(x, method = "jensen-shannon")
jensen-shannon using unit 'log'.
jensen-shannon 
    0.02628933

Alternatively, compute all available distances:

philentropy::dist.diversity(x, p = 2, unit = "log2")
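
The message printed by `distance()` reports the logarithm unit, and `dist.diversity()` above is called with `unit = "log2"`. The following sketch shows how the same `unit` argument changes the Jensen-Shannon value in `distance()`, with a base-R calculation added only as a cross-check. The variable names are illustrative.

```r
library(philentropy)

P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
x <- rbind(P, Q)

# Same measure, natural logarithm versus base-2 logarithm
jsd_nat  <- distance(x, method = "jensen-shannon", unit = "log",  mute.message = TRUE)
jsd_bits <- distance(x, method = "jensen-shannon", unit = "log2", mute.message = TRUE)

# Base-R reference: JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2
M <- (P + Q) / 2
jsd_ref <- 0.5 * sum(P * log(P / M)) + 0.5 * sum(Q * log(Q / M))

all.equal(unname(jsd_nat), jsd_ref)
all.equal(unname(jsd_bits), jsd_ref / log(2))
```

Changing the unit only rescales the result by a constant factor, so rankings of sample pairs are unaffected.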

🌟 Papers using philentropy (highlights)

Flagship examples with top venues. Click to expand full lists.

<details> <summary><b>Nature / Cell / Science</b></summary>
  • A transcriptomic hourglass in brown algae
    JS Lotharukpong, M Zheng, R Luthringer et al. – Nature, 2024
  • Annelid functional genomics reveal the origins of bilaterian life cycles
    FM Martín-Zamora, Y Liang, K Guynes et al. – Nature, 2023
  • An atlas of gene regulatory elements in adult mouse cerebrum
    YE Li, S Preissl, X Hou, Z Zhang, K Zhang et al. – Nature, 2021
  • Convergent somatic mutations in metabolism genes in chronic liver disease
    S Ng, F Rouhani, S Brunner, N Brzozowska et al. – Nature, 2021
  • Antigen dominance hierarchies shape TCF1+ progenitor CD8 T cell phenotypes in tumors
    ML Burger, AM Cruz, GE Crossland et al. – Cell, 2021
  • A comparative atlas of single-cell chromatin accessibility in the human brain
    YE Li, S Preissl, M Miller, ND Johnson, Z Wang et al. – Science, 2023
</details> <details> <summary><b>Nature Methods / Nat Comms / Cell family</b></summary>
  • sciCSR infers B cell state transition and predicts class-switch recombination dynamics using scRNA-seq
    JCF Ng, G Montamat Garcia, AT Stewart et al. – Nature Methods, 2024
  • Decoding the gene regulatory network of endosperm differentiation in maize
    Y Yuan, Q Huo, Z Zhang, Q Wang, J Wang et al. – Nature Communications, 2024
  • Population structure in a fungal human pathogen is potentially linked to pathogenicity
    EA Hatmaker, AE Barber, MT Drott et al. – Nature Communications, 2025
  • Pan-cancer human brain metastases atlas at single-cell resolution
    X Xing, J Zhong, J Biermann, H Duan, X Zhang et al. – Cancer Cell, 2025
  • Gene module reconstruction identifies cellular differentiation processes and the regulatory logic of specialized secretion in zebrafish
    Y Wang, J Liu, LY Du, JL Wyss, JA Farrell, AF Schier – Developmental Cell, 2025
</details> <details> <summary><b>Other disciplines (selected)</b></summary>
  • Staphylococci in high resolution: Capturing diversity within the human nasal microbiota
    AC Ingham, DYK Ng, S Iversen, CM Liu et al. – Cell Reports, 2025
  • The power of visualizing distributional differences: formal graphical n-sample tests
    K Konstantinou, T Mrkvička, M Myllymäki – Computational Statistics, 2025
  • Plant species as ecological engineers of microtopography in a temperate sedge-grass marsh
    J Dušek, J Novotný, B Navrátilová et al. – Scientific Reports, 2025
  • Resolution of MALDI-TOF vs WGS for Bacillus identification (NASA JSC)
    F Mazhari, AB Regberg, CL Castro, MG LaMontagne – Frontiers in Microbiology, 2025
  • Every Hue Has Its Fan Club: Diverse Patterns of Color-Dependent Flower Visitation across Lepidoptera
    D Kutcherov, EL Westerman – Integrative and Comparative Biology, 2025
</details>

🎓 philentropy has been used in dozens of peer-reviewed publications to quantify distances, divergences, and similarities in complex biological and computational datasets.


🧠 Important Functions

Distance Measures

  • distance() – Implements 46 probability distance/similarity measures
  • getDistMethods() – Get all method names for distance
  • dist.diversity() – Computes all available distance measures between two PDFs
  • estimate.probability() – Estimate probability vectors from counts
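
As a minimal sketch of how these functions combine, raw counts can first be turned into probability vectors and then compared. The count vectors below are invented for illustration, and `method = "empirical"` is assumed to compute simple relative frequencies.

```r
library(philentropy)

# Hypothetical count data for two samples
counts_a <- c(10, 20, 70)
counts_b <- c(25, 25, 50)

# Relative frequencies: each vector sums to 1
P <- estimate.probability(counts_a, method = "empirical")
Q <- estimate.probability(counts_b, method = "empirical")

# Compare the two estimated distributions
d <- distance(rbind(P, Q), method = "jensen-shannon", mute.message = TRUE)
d
```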

Information Theory

  • H() – Shannon’s Entropy H(X)
  • JE() – Joint Entropy H(X,Y)
  • CE() – Conditional Entropy H(X|Y)
  • MI() – Mutual Information I(X,Y)
  • KL() – Kullback–Leibler Divergence
  • JSD() – Jensen–Shannon Divergence
  • gJSD() – Generalized Jensen–Shannon Divergence
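
A short sketch of the information theory functions follows. It assumes their default unit is `"log2"` (bits) and that `KL()` and `JSD()` take the two distributions as rows of a matrix. The joint distribution used for `MI()` is a made-up 2 x 2 example.

```r
library(philentropy)

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.2, 0.6)

h_p  <- H(P)             # Shannon entropy of P, in bits
kl   <- KL(rbind(P, Q))  # KL divergence of P from Q
jsd  <- JSD(rbind(P, Q)) # Jensen-Shannon divergence between P and Q

# Hypothetical joint distribution of two binary variables X and Y
joint <- matrix(c(0.3, 0.2, 0.1, 0.4), nrow = 2)
px <- rowSums(joint)     # marginal of X
py <- colSums(joint)     # marginal of Y
mi <- MI(px, py, as.vector(joint))

c(H = h_p, KL = unname(kl), JSD = unname(jsd), MI = mi)
```

Unlike `distance()`, which reported natural-log units in the examples above, these helpers are assumed here to work in bits unless `unit` is set explicitly.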

🗞️ NEWS

Find the current status and version history in the
👉 NEWS section.


🧩 Appendix — full references

  • A transcriptomic hourglass in brown algae
    JS Lotharukpong, M Zheng, R Luthringer et al. – Nature, 2024

  • Annelid functional genomics reveal the origins of bilaterian life cycles
    FM Martín-Zamora, Y Liang, K Guynes et al. – Nature, 2023

  • An atlas of gene regulatory elements in adult mouse cerebrum
    YE Li, S Preissl, X Hou, Z Zhang, K Zhang et al. – Nature, 2021

  • Convergent somatic mutations in metabolism genes in chronic liver disease
    S Ng, F Rouhani, S Brunner, N Brzozowska et al. – Nature, 2021

  • Antigen dominance hierarchies shape TCF1+ progenitor CD8 T cell phenotypes in tumors
    ML Burger, AM Cruz, GE Crossland et al. – Cell, 2021

  • High-content single-cell combinatorial indexing
    R Mulqueen et al. – Nature Biotechnology, 2021
