NMF.jl
A Julia package for non-negative matrix factorization (NMF).
Development Status
Note: Non-negative matrix factorization is an area of active research, and new algorithms are proposed every year. Contributions are very welcome.
Done
- Lee & Seung's Multiplicative Update (for both MSE & Divergence objectives)
- (Naive) Projected Alternating Least Squares
- ALS Projected Gradient Methods
- Coordinate Descent Methods
- Random Initialization
- NNDSVD Initialization
- Sparse NMF
- Separable NMF
To do
- Probabilistic NMF
See also NMFMerge, which can augment any least-squares NMF algorithm.
Overview
Non-negative Matrix Factorization (NMF) generally refers to techniques for factorizing a non-negative matrix X into the product of two lower-rank matrices W and H, such that WH optimally approximates X in some sense. Such techniques are widely used in text mining, image analysis, and recommendation systems.
This package provides two sets of tools, respectively for initialization and optimization. A typical NMF procedure consists of two steps: (1) use an initialization function to initialize W and H; and (2) use an optimization algorithm to pursue the optimal solution.
Most types and functions (except the high-level function `nnmf`) in this package are not exported. Users are encouraged to use them with the prefix `NMF.`. This allows shorter names within the package and makes user code more explicit and clear.
High-Level Interface
The package provides a high-level function `nnmf` that runs the entire procedure (initialization + optimization):

```julia
nnmf(X, k, ...)
```
This function factorizes the input matrix X into the product of two non-negative matrices W and H.
In general, it returns a result instance of type `NMF.Result`, which is defined as

```julia
struct Result
    W::Matrix{Float64}    # W matrix
    H::Matrix{Float64}    # H matrix
    niters::Int           # number of elapsed iterations
    converged::Bool       # whether the optimization procedure converged
    objvalue::Float64     # objective value of the last step
end
```
The function supports the following keyword arguments:

- `init`: A symbol that indicates the initialization method (default = `:nndsvdar`). This argument accepts the following values:
  - `random`: matrices filled with uniformly random values
  - `nndsvd`: standard version of NNDSVD
  - `nndsvda`: NNDSVDa variant
  - `nndsvdar`: NNDSVDar variant
  - `spa`: Successive Projection Algorithm
  - `custom`: use custom matrices `W0` and `H0`

  With the `nndsvd` variants, you can optionally supply the SVD result via `initdata`, e.g. `initdata = svd(X)`. If not supplied, the factorization is computed using a randomized SVD (`rsvd` from RandomizedLinAlg).
- `alg`: A symbol that indicates the factorization algorithm (default = `:greedycd`). This argument accepts the following values:
  - `multmse`: Multiplicative update (using MSE as objective)
  - `multdiv`: Multiplicative update (using divergence as objective)
  - `projals`: (Naive) Projected Alternating Least Squares
  - `alspgrad`: Alternating Least Squares using Projected Gradient Descent
  - `cd`: Coordinate Descent solver that uses Fast Hierarchical Alternating Least Squares (implementation similar to scikit-learn)
  - `greedycd`: Greedy Coordinate Descent
  - `spa`: Successive Projection Algorithm
- `maxiter`: Maximum number of iterations (default = `100`).
- `tol`: Tolerance of changes upon convergence (default = `1.0e-6`).
- `replicates`: Number of times to perform the factorization (default = `1`).
- `W0`: Option for custom initialization (default = `nothing`).
- `H0`: Option for custom initialization (default = `nothing`). Note: `W0` and `H0` may be overwritten; if one needs to avoid this, pass in copies instead.
- `update_H`: Whether to update `H` (default = `true`).
- `verbose`: Whether to show procedural information (default = `false`).
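As a sketch of the high-level interface, the example below factorizes a small random matrix (the data and the chosen rank are illustrative, not from the package documentation):

```julia
using NMF, Random

Random.seed!(1)
X = rand(8, 6)    # an 8×6 non-negative data matrix (illustrative)
k = 3             # target rank

# Run initialization + optimization in one call.
r = NMF.nnmf(X, k; alg=:multmse, init=:nndsvdar, maxiter=200)

# The returned NMF.Result carries the factors and convergence info.
W, H = r.W, r.H               # sizes (8, 3) and (3, 6)
println("iterations: ", r.niters, ", converged: ", r.converged)
println("objective:  ", r.objvalue)
```

Since `X` is dense here, the default `:nndsvdar` initialization is a reasonable choice; swapping `alg` for `:greedycd` or `:cd` only changes the optimizer.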
Initialization
- `NMF.randinit(X, k[; zeroh=false, normalize=false])`

  Initialize `W` and `H` given the input matrix `X` and the rank `k`. This function returns a pair `(W, H)`.

  Suppose the size of `X` is `(p, n)`; then the sizes of `W` and `H` are respectively `(p, k)` and `(k, n)`.

  Usage:

  ```julia
  W, H = NMF.randinit(X, 3)
  ```

  For some algorithms (e.g. ALS), only `W` needs to be initialized. For such cases, one may set the keyword argument `zeroh` to `true`; then the output `H` will simply be a zero matrix of size `(k, n)`.

  Another keyword argument is `normalize`. If `normalize` is set to `true`, the columns of `W` will be normalized such that each column sums to one.
- `NMF.nndsvd(X, k[; zeroh=false, variant=:std])`

  Use the Non-Negative Double Singular Value Decomposition (NNDSVD) algorithm to initialize `W` and `H`.

  Reference: C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 2007.

  Usage:

  ```julia
  W, H = NMF.nndsvd(X, k)
  ```

  This function has two keyword arguments:

  - `zeroh`: set `H` to all zeros when this is `true`; have `H` initialized when it is `false` (the default).
  - `variant`: the variant of the algorithm. The default is `:std`, meaning the standard version, which generates a rather sparse `W`. Other values are `:a` and `:ar`, corresponding respectively to the NNDSVDa and NNDSVDar variants. In particular, `:ar` is recommended for dense NMF.
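To make the variant choice concrete, the sketch below initializes a dense random matrix with both the standard version and NNDSVDar (the data is illustrative):

```julia
using NMF, Random

Random.seed!(1)
X = rand(10, 7)   # a dense non-negative matrix (illustrative)

# Standard NNDSVD: tends to produce a rather sparse W.
W_std, H_std = NMF.nndsvd(X, 3)

# NNDSVDar variant: recommended for dense NMF.
W_ar, H_ar = NMF.nndsvd(X, 3; variant=:ar)
```

Both calls return factors of size `(10, 3)` and `(3, 7)`; only the fill-in of small/zero entries differs between the variants.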
- `NMF.spa(X, k)`

  Use the Successive Projection Algorithm (SPA) to initialize `W` and `H`.

  Reference: N. Gillis and S. A. Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698-714, 2013.

  Usage:

  ```julia
  W, H = NMF.spa(X, k)
  ```
Factorization Algorithms
This package provides multiple factorization algorithms. Each algorithm corresponds to a type. One can create an algorithm instance by choosing a type and specifying its options, then run the algorithm using `NMF.solve!`:
The NMF.solve! Function
```julia
NMF.solve!(alg, X, W, H)
```
Use the algorithm `alg` to factorize `X` into `W` and `H`.

Here, `W` and `H` must be pre-allocated matrices (of size `(p, k)` and `(k, n)`, respectively) and must be appropriately initialized before this function is invoked. For some algorithms, both `W` and `H` must be initialized (e.g. multiplicative updating), while for others only `W` needs to be initialized (e.g. ALS).

The matrices `W` and `H` are updated in place.
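A typical initialize-then-solve workflow might look like the sketch below, using multiplicative updates on random illustrative data. Note that on recent versions of the package the algorithm types are parametric (e.g. `NMF.MultUpdate{Float64}`); the plain constructor shown here follows this document:

```julia
using NMF, Random

Random.seed!(1)
X = rand(8, 6)    # illustrative non-negative data

# Multiplicative updating requires both W and H to be initialized.
W, H = NMF.randinit(X, 3)

# W and H are updated in place; a Result is returned.
# (On recent package versions: NMF.MultUpdate{Float64}(obj=:mse, maxiter=100).)
r = NMF.solve!(NMF.MultUpdate(obj=:mse, maxiter=100), X, W, H)

println("converged: ", r.converged, ", objective: ", r.objvalue)
```

Because the update is in place, the pre-allocated `W` and `H` hold the final factors after the call.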
Algorithms
- Multiplicative Updating

  Reference: Daniel D. Lee and H. Sebastian Seung. Algorithms for Non-negative Matrix Factorization. Advances in NIPS, 2001.

  This algorithm has two different kinds of objectives: minimizing mean squared error (`:mse`) and minimizing divergence (`:div`). Both `W` and `H` need to be initialized.

  ```julia
  MultUpdate(obj::Symbol=:mse,      # objective, either :mse or :div
             maxiter::Integer=100,  # maximum number of iterations
             verbose::Bool=false,   # whether to show procedural information
             tol::Real=1.0e-6,      # tolerance of changes on W and H upon convergence
             update_H::Bool=true,   # whether to update H
             lambda_w::Real=0.0,    # L1 regularization coefficient for W
             lambda_h::Real=0.0)    # L1 regularization coefficient for H
  ```

  Note: the values above are the default values of the keyword arguments. One can override some or all of them.
- (Naive) Projected Alternating Least Squares

  This algorithm alternately updates `W` and `H` while holding the other fixed. Each update step solves for `W` or `H` without enforcing the non-negativity constraint, and sets all negative entries to zero afterwards. Only `W` needs to be initialized.

  ```julia
  ProjectedALS(maxiter::Integer=100,    # maximum number of iterations
               verbose::Bool=false,    # whether to show procedural information
               tol::Real=1.0e-6,       # tolerance of changes on W and H upon convergence
               update_H::Bool=true,    # whether to update H
               lambda_w::Real=1.0e-6,  # L2 regularization coefficient for W
               lambda_h::Real=1.0e-6)  # L2 regularization coefficient for H
  ```
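Since only `W` needs to be initialized here, this algorithm pairs naturally with `zeroh=true` at initialization time. A sketch on random illustrative data (recent package versions may use a parametric constructor, e.g. `NMF.ProjectedALS{Float64}`):

```julia
using NMF, Random

Random.seed!(1)
X = rand(8, 6)    # illustrative non-negative data

# ALS only needs W initialized; H can start as a zero matrix.
W, H = NMF.randinit(X, 3; zeroh=true)

# Negative entries produced by each least-squares step are clipped to zero.
r = NMF.solve!(NMF.ProjectedALS(maxiter=50), X, W, H)
```

The same initialize-then-solve pattern applies to the other algorithm types; only the constructor and its options change.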
- Alternating Least Squares Using Projected Gradient Descent

  Reference: Chih-Jen Lin. Projected Gradient Methods for Non-negative Matrix Factorization. Neural Computation, 19 (2007).

  This algorithm adopts the alternating least squares strategy. An efficient projected gradient descent method is used to solve each sub-problem. Both `W` and `H` need to be initialized.

  ```julia
  ALSPGrad(maxiter::Integer=100,     # maximum number of iterations (in main procedure)
           maxsubiter::Integer=200,  # maximum number of iterations in solving each sub-problem
           tol::Real=1.0e-6,         # tolerance of changes on W and H upon convergence
           tolg::Real=1.0e-4)        # tolerable gradient norm in sub-problem (first-order optimality)
  ```