Fitdistr
Fit distributions with mle, mge, mme and qme methods (+ bootstrap)
Distribution Fitting in Clojure
The library provides a set of functions to fit univariate distributions to your (uncensored) data.
Entry point: the fit function, with the following supported methods:
- Maximum log likelihood estimation - :mle
- Maximum goodness-of-fit estimation:
  - Kolmogorov-Smirnov - :ks
  - Cramer-von-Mises - :cvm
  - Anderson-Darling - :ad, :adr, :adl, :ad2r, :ad2l and :ad2
- Quantile matching estimation - :qme
- Method of moments (modified) - :mme
- Maximum (Product of) Spacing Estimation - :mps
Additionally you can use:
- bootstrap to generate parameters from a set of resampled data
- infer to generate parameters computationally from data
The library is heavily based on the fitdistrplus R package.
For details please read this paper.
fastmath distributions and optimization methods are used here.
How does it work?
For every method, a target function is created that accepts distribution parameters and returns the log-likelihood, the MSE/MAE of quantiles, or the differences between CDFs. This function is minimized or maximized using one of the available algorithms (gradient-based or simplex-based). Optimization is bounded. Initial values for the optimization are inferred from the data.
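As a rough illustration of the target-function idea, the sketch below builds a log-likelihood target for an exponential distribution in plain Clojure and maximizes it over a bounded grid. It is not how fitdistr does it: the names exp-log-likelihood, make-target and grid-maximize are hypothetical, and the grid search is only a stand-in for a real gradient-based or simplex-based optimizer.

```clojure
;; Illustrative sketch only; none of these functions are part of fitdistr.

(defn exp-log-likelihood
  "Log-likelihood of `data` under an exponential distribution with `rate`."
  [rate data]
  (reduce + (map (fn [x] (- (Math/log rate) (* rate x))) data)))

(defn make-target
  "Builds a target function: parameter map in, log-likelihood out."
  [data]
  (fn [{:keys [rate]}] (exp-log-likelihood rate data)))

(defn grid-maximize
  "Stand-in for a real optimizer: evaluates `target` on a bounded grid of
  rates in [lower, upper] and returns the parameter map with the highest value."
  [target lower upper steps]
  (let [step       (/ (- upper lower) (double steps))
        candidates (map (fn [i] {:rate (+ lower (* i step))})
                        (range (inc steps)))]
    (apply max-key target candidates)))

;; Sample mean is 1.2, so the exact MLE of the rate is 1/1.2.
(def sample-data [0.5 1.5 1.0 2.0 1.0])

;; Bounded search over rates between 0.01 and 5.0 in steps of about 0.001.
(def best (grid-maximize (make-target sample-data) 0.01 5.0 4990))
```

The grid result lands within one grid step of the closed-form estimate 1/1.2, which is enough to show the shape of the approach: parameters go in, a single score comes out, and the optimizer only ever sees that score.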
For a bootstrap, sequences of resampled data are created and each sequence is fitted. The best result (mean or median) is used as the final parametrization. Additionally, a confidence interval (or another extent such as IQR or min-max) is returned.
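The resampling idea can be sketched in plain Clojure as follows. This is a simplified illustration, not the library's code: resample, fit-rate, median and bootstrap-rate are hypothetical helpers, the per-sample fit is the closed-form exponential rate estimate (1 / sample mean) rather than an optimizer run, and the interval is a plain 2.5%-97.5% percentile range chosen only for the example.

```clojure
;; Illustrative sketch only; none of these functions are part of fitdistr.
(import 'java.util.Random)

(defn resample
  "Draws (count data) values from `data` with replacement using `rng`."
  [^Random rng data]
  (let [v (vec data)
        n (count v)]
    (vec (repeatedly n #(nth v (.nextInt rng n))))))

(defn fit-rate
  "Closed-form MLE of the exponential rate: 1 / sample mean."
  [data]
  (/ (count data) (double (reduce + data))))

(defn median
  "Median of a non-empty collection of numbers."
  [xs]
  (let [s   (vec (sort xs))
        n   (count s)
        mid (quot n 2)]
    (if (odd? n)
      (nth s mid)
      (/ (+ (nth s (dec mid)) (nth s mid)) 2.0))))

(defn bootstrap-rate
  "Fits every resampled dataset, then summarizes the sorted estimates with
  their median and a percentile interval."
  [data n-resamples seed]
  (let [rng       (Random. (long seed))
        estimates (vec (sort (repeatedly n-resamples
                                         #(fit-rate (resample rng data)))))
        idx       (fn [q] (min (dec n-resamples) (int (* q n-resamples))))]
    {:params     {:rate (median estimates)}
     :ci         [(nth estimates (idx 0.025)) (nth estimates (idx 0.975))]
     :all-params estimates}))

;; Fixed seed, so repeated calls give the same result.
(def result (bootstrap-rate [0.5 1.5 1.0 2.0 1.0 0.2 3.1 0.7] 200 42))
```

The returned map only loosely mirrors the keys described in this README; in particular, the exact layout of the interval in the real library may differ.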
Values of any target function can be calculated and returned as a fitness measure.
Method of moments - modified version
The distribution implementations don't provide higher-order moments, but they can calculate the mean (first moment) and variance (second central moment). The MME method uses both to match the empirical mean and variance (regardless of the number of parameters to estimate). MSE or MAE is used as the target for optimization.
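A minimal sketch of such a moment-matching target, assuming a gamma distribution parametrized by shape and scale (mean = shape * scale, variance = shape * scale^2). The helper names are hypothetical, and the use of the biased (population) variance is an assumption made for the example, not something this README specifies.

```clojure
;; Illustrative sketch only; none of these functions are part of fitdistr.

(defn sample-mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn sample-variance
  "Biased (population) variance; an assumption made for this sketch."
  [xs]
  (let [m (sample-mean xs)]
    (/ (reduce + (map (fn [x] (let [d (- x m)] (* d d))) xs))
       (double (count xs)))))

(defn gamma-moments
  "Theoretical mean and variance of a gamma distribution."
  [{:keys [shape scale]}]
  {:mean     (* shape scale)
   :variance (* shape scale scale)})

(defn mme-target
  "Returns a function of the parameters giving the MSE between the
  theoretical and empirical mean and variance. Lower is better."
  [data]
  (let [m (sample-mean data)
        v (sample-variance data)]
    (fn [params]
      (let [{:keys [mean variance]} (gamma-moments params)
            dm (- mean m)
            dv (- variance v)]
        (/ (+ (* dm dm) (* dv dv)) 2.0)))))

;; Empirical mean is 3.0 and variance is 2.0 for this data.
(def target (mme-target [1.0 2.0 3.0 4.0 5.0]))

(target {:shape 1.0 :scale 1.0})
;; => 2.5
```

For this data the target is essentially zero at shape 4.5 and scale 2/3, which is the parametrization whose mean and variance equal the empirical ones. An optimizer minimizing the target would be expected to move toward that point.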
Usage
To run inference just call one of the following functions:
(fit method distribution data params)
(bootstrap method distribution data params)
(infer distribution data params)
where:
- method - one of the supported methods as a keyword (like :mle or :qme)
- distribution - the name of the distribution as a keyword (see below) (like :beta)
- data - any seqable of numbers
- params - parametrization (optional, see below)
All methods return a map with the following keys:
- :params - best parametrization
- :distribution - distribution object
- :distribution-name - name of the distribution
- :method - fitting method used
- :stats - statistics (see below)
For bootstrap you additionally receive:
- :ci - confidence interval (several methods, see below)
- :ci-type - name of the interval method
- :all-params - (optional) list of parameters for each resampled dataset
- :params - best parametrization (mean or median, depending on the confidence interval)
Some validations of the data and initial parameters are made.
Examples
(require '[fitdistr.core :refer :all]
'[fitdistr.distributions :refer [distribution-data]])
Example 1
Proof that matching is accurate enough
(def target-data (->seq (distribution :weibull {:alpha 0.5 :beta 2.2}) 10000))
(fit :ad :weibull target-data {:stats [:mle]})
;; => {:stats
;; {:ad 0.19749431207310408,
;; :mle -19126.212671469282,
;; :aic 38256.425342938564,
;; :bic 38270.84602368252},
;; :params {:alpha 0.5014214878565807, :beta 2.203213102262515},
;; :distribution-name :weibull,
;; :distribution #object[org.apache.commons.math3.distribution.WeibullDistribution 0x430997b7 "org.apache.commons.math3.distribution.WeibullDistribution@430997b7"],
;; :method :ad}
(bootstrap :mle :weibull target-data {:stats #{:ad}})
;; => {:stats
;; {:mle -19126.178345014738,
;; :ad 0.35561024021990306,
;; :aic 38256.356690029475,
;; :bic 38270.77737077343},
;; :mad-median
;; {:alpha [0.4910043451070347 0.5056336146263343 0.4983189798666845],
;; :beta [2.1185018316179285 2.326029409552982 2.222265620585455]},
;; :params {:alpha 0.4983189798666845, :beta 2.222265620585455},
;; :distribution-name :weibull,
;; :distribution #object[org.apache.commons.math3.distribution.WeibullDistribution 0x63a766b9 "org.apache.commons.math3.distribution.WeibullDistribution@63a766b9"]}
(infer :weibull target-data {:stats #{:mle :ad}})
;; => {:stats
;; {:mle -19126.13369575803,
;; :ad 0.22838225327177497,
;; :aic 38256.26739151606,
;; :bic 38270.68807226002},
;; :params {:alpha 0.5012938746206328, :beta 2.215448048490149},
;; :distribution-name :weibull,
;; :distribution #object[org.apache.commons.math3.distribution.WeibullDistribution 0x3a0f2314 "org.apache.commons.math3.distribution.WeibullDistribution@3a0f2314"]}
Using the distribution
(def inferred-distribution (fit :ad :weibull target-data {:stats [:mle]}))
inferred-distribution
;; => {:stats
;; {:ad 0.22793837346762302,
;; :mle -18836.462685291066,
;; :aic 37676.92537058213,
;; :bic 37691.34605132609},
;; :params {:alpha 0.5020961308787267, :beta 2.1661515133303646},
;; :distribution-name :weibull,
;; :distribution
;; #object[org.apache.commons.math3.distribution.WeibullDistribution 0x7f421db2 "org.apache.commons.math3.distribution.WeibullDistribution@7f421db2"],
;; :method :ad}
(->distribution inferred-distribution)
;; => #object[org.apache.commons.math3.distribution.WeibullDistribution 0x7f421db2 "org.apache.commons.math3.distribution.WeibullDistribution@7f421db2"]
(cdf inferred-distribution 0.5)
;; => 0.38057731286029817
(cdf inferred-distribution 0.5 10.0)
;; => 0.5035774570603441
(pdf inferred-distribution 0.5)
;; => 0.2979270382668242
(lpdf inferred-distribution 0.5)
;; => -1.2109066604852476
(icdf inferred-distribution 0.5)
;; => 1.0439237628813434
(sample inferred-distribution)
;; => 0.002386261009069703
(log-likelihood inferred-distribution (take 10 target-data))
;; => -24.233608452146168
(likelihood inferred-distribution (take 10 target-data))
;; => 2.988667305414128E-11
(mean inferred-distribution)
;; => 4.299110979375332
(variance inferred-distribution)
;; => 91.33715492734707
(lower-bound inferred-distribution)
;; => 0.0
(upper-bound inferred-distribution)
;; => ##Inf
(distribution-id inferred-distribution)
;; => :weibull
(distribution-parameters inferred-distribution)
;; => [:beta :alpha]
(drandom inferred-distribution)
;; => 7.4061584562769776
(lrandom inferred-distribution)
;; => 4
(irandom inferred-distribution)
;; => 40
(set-seed! inferred-distribution 1337)
(take 10 (->seq inferred-distribution))
;; => (2.5184760984751717
;; 2.9550268761778735
;; 9.930032804583968
;; 11.259341860117786
;; 0.0808352042851777
;; 17.399335542961957
;; 0.0564922448326893
;; 0.32170752149468795
;; 6.063628565109016
;; 2.2215931112225826)
(set-seed! inferred-distribution 1337)
(->seq inferred-distribution 10)
;; => (2.5184760984751717
;; 2.9550268761778735
;; 9.930032804583968
;; 11.259341860117786
;; 0.0808352042851777
;; 17.399335542961957
;; 0.0564922448326893
;; 0.32170752149468795
;; 6.063628565109016
;; 2.2215931112225826)
Example 2
Search for the best distribution and its parameters. Look at the last example, where the Pareto distribution is wrongly considered best when an inadequate method is used.
(def atv [0.6 2.8 182.2 0.8 478.0 1.1 215.0 0.7 7.9 316.2 0.2 17780.0 7.8 100.0 0.9 180.0 0.3 300.9
0.6 17.5 10.0 0.1 5.8 87.7 4.1 3.5 4.9 7060.0 0.2 360.0 100.8 2.3 12.3 40.0 2.3 0.1
2.7 2.2 0.4 2.6 0.2 1.0 7.3 3.2 0.8 1.2 33.7 14.0 21.4 7.7 1.0 1.9 0.7 12.6
3.2 7.3 4.9 4000.0 2.5 6.7 3.0 63.0 6.0 1.6 10.1 1.2 1.5 1.2 30.0 3.2 3.5 1.2
0.2 1.9 0.7 17.0 2.8 4.8 1.3 3.7 0.2 1.8 2.6 5.9 2.6 6.3 1.4 0.8 670.0 810.0
1890.0 1800.0 8500.0 21000.0 31.0 20.5 4370.0 1000.0 39891.8
316.2 6400.0 1000.0 7400.0 31622.8])
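(defn find-best
  [method ds]
  ;; :mle is a log-likelihood (higher is better); the other statistics are
  ;; distances (lower is better), so pick from the matching end of the sort.
  (let [selector (if (= method :mle) last first)]
    (dissoc (->> (map #(fit method % atv {:stats #{:mle :ad :ks :cvm}}) ds)
                 ;; drop fits whose target statistic is NaN
                 (remove (fn [v] (Double/isNaN (method (:stats v)))))
                 (sort-by (comp method :stats))
                 (selector))
            :distribution)))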
(find-best :mle [:weibull :log-normal :gamma :exponential :normal :pareto])
;; => {:stats
;; {:mle -532.4052019871922,
;; :cvm 0.6373592936482382,
;; :ks 0.1672497620724005,
;; :ad 3.4721179220009617,
;; :aic 1068.8104039743844,
;; :bic 1074.0991857726672},
;; :params {:scale 2.553816262077493, :shape 3.147240361221695},
;; :distribution-name :log-normal,
;; :method :mle}
(find-best :ad [:weibull :log-normal :gamma :exponential :normal :pareto])
;; => {:stats
;; {:ad 3.0345123029861156,
;; :cvm 0.4615381958965107,
;; :ks 0.1332827771382316,
;; :mle -532.9364810533066,
;; :aic 1069.8729621066132,
;; :bic 1075.161743904896},
;; :params {:scale 2.2941800698596815, :shape 3.2934516278879205},
;; :distribution-name :log-normal,
;; :method :ad}
(find-best :ks [:weibull :log-normal :gamma :exponential :normal :pareto])
;; => {:stats
;; {:ks 0.10129796316277348,
;; :mle -535.0197747143928,
;; :cvm 0.3675703954412623,
;; :ad 3.830809551957188,
;; :aic 1074.0395494287857,
;; :bic 1079.3283312270685},
;; :params {:scale 2.03465815391538, :shape 2.873339450786136},
;; :distribution-name :log-normal,
;; :method :ks}

Example 3
This i
