MLJLinearModels.jl


This package gathers functionality for solving a number of generalised linear regression/classification problems which inherently correspond to an optimisation problem of the form

$$ L(y, X\theta) + P(\theta) $$

where:

$L$ is a loss function
$X$ is the $n \times p$ matrix of training observations, where $n$ is the number of observations (sample size) and $p$ is the number of features (dimension)
$\theta$ is the length-$p$ vector of weights to be optimised
$P$ is a penalty function
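
As a concrete instance of the formulation above, the following minimal sketch evaluates the objective for a ridge-style problem. It is plain Python rather than Julia and does not use this package's API; the function names (`l2_loss`, `l2_penalty`, `objective`) and the choice of a $1/2$ scaling on both the loss and the penalty are assumptions made for illustration.

```python
# Illustrative only: a Loss + Penalty objective with
#   L(y, X theta) = 0.5 * ||y - X theta||^2   (assumed scaling)
#   P(theta)      = 0.5 * lam * ||theta||^2   (assumed scaling)

def matvec(X, theta):
    """Matrix-vector product X @ theta for a list-of-rows matrix."""
    return [sum(xij * tj for xij, tj in zip(row, theta)) for row in X]

def l2_loss(y, z):
    """Half the sum of squared residuals between targets y and predictions z."""
    return 0.5 * sum((yi - zi) ** 2 for yi, zi in zip(y, z))

def l2_penalty(theta, lam):
    """Half the squared Euclidean norm of theta, scaled by lam."""
    return 0.5 * lam * sum(t * t for t in theta)

def objective(X, y, theta, lam):
    """Value of L(y, X theta) + P(theta)."""
    return l2_loss(y, matvec(X, theta)) + l2_penalty(theta, lam)

# n = 3 observations, p = 2 features
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
theta = [1.0, 2.0]  # fits y exactly, so only the penalty contributes
print(objective(X, y, theta, lam=0.1))
```

Swapping `l2_loss` for another loss, or `l2_penalty` for another penalty, gives the other models in this family without changing the structure of `objective`.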

Additional regression/classification methods which do not directly correspond to this formulation may be added in the future.

The core aims of this package are:

make these regression models "easy to call" and callable in a unified way,
interface with MLJ.jl,
focus on performance, including in "big data" settings, by exploiting packages such as Optim.jl and IterativeSolvers.jl,
use a "machine learning" perspective, i.e. focus essentially on prediction; hyper-parameters should be obtained via a data-driven procedure such as cross-validation.

Head to the quickstart section of the docs to see how to use this package.

NOTES

This section is only useful if you're interested in implementation details or would like to help extend the library. For usage instructions, please head to the docs.

Implemented

"0" stands for no penalty.
"Analytical" means the solution is computed in "one shot" using Julia's \ (backslash) solver.
CG = conjugate gradient.
(F)ISTA = (Accelerated) Proximal Gradient Descent.
Huber, Andrews, Bisquare, Logistic, Fair and Talwar weighting functions are available.
Iteratively re-Weighted Least Squares, where each system is solved iteratively via CG.
In other packages such as Scikit-Learn, a scale factor is estimated along with the parameters. This is somewhat ad hoc and corresponds more to a statistical perspective; it also does not work well with penalties. We recommend using cross-validation to set the parameter of the Huber loss.
Quantile regression includes least absolute deviation (LAD) regression as a special case when δ=0.5.
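
The LAD special case can be checked numerically. The sketch below is plain Python, independent of this package, and assumes the usual "pinball" form of the quantile loss, $\rho_\delta(r) = r\,(\delta - \mathbb{1}[r < 0])$; the name `pinball` is hypothetical. With δ=0.5 the loss is half the absolute residual, so minimising it is equivalent to minimising the sum of absolute deviations.

```python
# Illustrative only: quantile ("pinball") loss under an assumed definition.

def pinball(r, delta):
    """Quantile loss of a residual r at quantile level delta in (0, 1)."""
    return r * (delta - (1.0 if r < 0 else 0.0))

residuals = [-2.0, -0.5, 0.0, 1.5]

# Quantile loss at delta = 0.5, summed over the residuals
quantile_half = sum(pinball(r, 0.5) for r in residuals)

# Least absolute deviation objective on the same residuals
lad = sum(abs(r) for r in residuals)

# The two differ only by a constant factor of 1/2
print(quantile_half, 0.5 * lad)
```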

| Classifiers       | Formulation                 | Available solvers        | Comments       |
| :-----------------| :-------------------------- | :----------------------- | :------------- |
| Logistic 0/L2     | LogisticLoss + 0/L2         | Newton, Newton-CG, LBFGS | yᵢ∈{±1}        |
| Logistic L1/EN    | LogisticLoss + 0/L2 + L1    | (F)ISTA                  | yᵢ∈{±1}        |
| Multinomial 0/L2  | MultinomialLoss + 0/L2      | Newton-CG, LBFGS         | yᵢ∈{1,...,c}   |
| Multinomial L1/EN | MultinomialLoss + 0/L2 + L1 | ISTA, FISTA              | yᵢ∈{1,...,c}   |

Unless otherwise specified:

Newton-like solvers use Hager-Zhang line search (default in Optim.jl)
ISTA, FISTA solvers use backtracking line search and a shrinkage factor of β=0.8
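
To make the ISTA setting concrete, here is a minimal sketch of proximal gradient descent with backtracking for a lasso-type objective, $\tfrac12\|X\theta - y\|_2^2 + \lambda\|\theta\|_1$. It is plain Python and not this package's implementation: the function names, the stopping rule, and the reading of β as the factor by which the step size is multiplied on each failed backtracking test are all assumptions.

```python
# Illustrative only: ISTA with backtracking line search (step *= beta).

def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def smooth(X, y, theta):
    """Smooth part of the objective: 0.5 * ||X theta - y||^2."""
    r = [p - t for p, t in zip(matvec(X, theta), y)]
    return 0.5 * sum(ri * ri for ri in r)

def grad(X, y, theta):
    """Gradient of the smooth part: X^T (X theta - y)."""
    r = [p - t for p, t in zip(matvec(X, theta), y)]
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(theta))]

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def ista(X, y, lam, beta=0.8, max_iter=500, tol=1e-10):
    theta = [0.0] * len(X[0])
    step = 1.0
    for _ in range(max_iter):
        g = grad(X, y, theta)
        f_old = smooth(X, y, theta)
        while True:
            # Gradient step on the smooth part, then prox of the L1 penalty
            cand = soft_threshold([t - step * gi for t, gi in zip(theta, g)], step * lam)
            d = [c - t for c, t in zip(cand, theta)]
            # Accept if the quadratic upper bound at this step size holds
            bound = (f_old + sum(gi * di for gi, di in zip(g, d))
                     + sum(di * di for di in d) / (2.0 * step))
            if smooth(X, y, cand) <= bound + 1e-12:
                break
            step *= beta  # shrink the step and try again
        theta = cand
        if max(abs(di) for di in d) < tol:
            break
    return theta

# With an identity design the lasso solution is soft_threshold(y, lam)
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
print(ista(X, y, lam=1.0))
```

FISTA follows the same pattern but takes the gradient step from an extrapolated point built from the last two iterates, which is what gives the accelerated rate.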

Note: these models were all tested for correctness whenever a direct comparison with another package was possible, usually by comparing the objective function at the coefficients returned (cf. the tests):

(against scikit-learn): Lasso, Elastic-Net, Logistic (L1/L2/EN), Multinomial (L1/L2/EN)
(against quantreg): Quantile (0/L1)

Systematic timing benchmarks have not been run yet but are planned (see this issue).

Current limitations

The models are built and tested assuming n > p; if this doesn't hold, tricks should be employed to speed up computations; these have not been implemented yet.
CV-aware code (code that re-uses computations when fitting over a number of hyper-parameters) is not implemented yet; "meta" functionalities such as One-vs-All or Cross-Validation are left to other packages such as MLJ.
No support yet for sparse matrices.
Stochastic solvers have not yet been implemented.
All computations are assumed to be done in Float64.

Possible future models

Future

| Model                     | Formulation                  | Comments |
| :------------------------ | :--------------------------- | :------- |
| Group Lasso               | L2Loss + ∑L1 over groups     | ⭒        |
| Adaptive Lasso            | L2Loss + weighted L1         | ⭒ A      |
| SCAD                      | L2Loss + SCAD                | A, B, C  |
| MCP                       | L2Loss + MCP                 | A        |
| OMP                       | L2Loss + L0Loss              | D        |
| SGD Classifiers           | *Loss + No/L2/L1 and OVA     | SkL      |

(⭒) should be added soon

Other regression models

There are a number of other regression models that may be included in this package in the longer term but may not directly correspond to the paradigm Loss+Penalty introduced earlier.

In some cases it will make more sense to just use GLM.jl.

Sklearn's list: https://scikit-learn.org/stable/supervised_learning.html#supervised-learning

What about other packages

While the functionalities in this package overlap with a number of existing packages, the hope is that this package will offer a general entry point for all of them in a way that won't require too much thinking from an end user (similar to how someone would use the tools from sklearn.linear_model). If you're looking for specific functionalities/algorithms, it's probably a good idea to look at one of the packages below:
