NMFk: Nonnegative Matrix Factorization + k-means clustering and physics constraints

<div style="text-align: left;"> <img src="logo/nmfk-logo.jpg" alt="nmfk" width=50% max-width=125px;/> </div>

NMFk is a module of the SmartTensors ML framework (smarttensors.com).

<div style="text-align: left"> <img src="logo/SmartTensorsNewSmall.png" alt="SmartTensors" width=25% max-width=125px;/> </div>

NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.

Classical NMF approaches do not allow for automatic estimation of the number of features.

NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).
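The silhouette-based selection idea can be illustrated with a conceptual sketch. This is not NMFk's actual implementation; it uses the Clustering.jl and Distances.jl packages (assumed to be installed) to score candidate values of k by the mean silhouette width of a k-means clustering:

```julia
# Conceptual sketch (not NMFk's internal code): score each candidate k by
# running k-means and computing the mean silhouette width of the clustering.
using Clustering, Distances, Statistics

points = rand(2, 100)                       # 100 random two-dimensional points
D = pairwise(Euclidean(), points, dims=2)   # pairwise distance matrix
for k in 2:5
    R = kmeans(points, k)                   # cluster into k groups
    s = mean(silhouettes(R, D))             # mean silhouette width for this k
    println("k = $k  mean silhouette = $s")
end
```

NMFk applies this kind of clustering to the factor matrices obtained from many NMF restarts, selecting the k whose solutions cluster most robustly.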

SmartTensors can be applied to perform:

  • Feature extraction (FE)
  • Blind source separation (BSS)
  • Detection of disruptions/anomalies
  • Data gap discovery
  • Data gap filling and reconstruction
  • Image recognition
  • Text mining
  • Data classification
  • Separation (deconstruction) of co-occurring (physics) processes
  • Discovery of unknown dependencies and phenomena
  • Development of reduced-order/surrogate models
  • Identification of dependencies between model inputs and outputs
  • Guiding the development of physics models representing the ML-analyzed data
  • Blind predictions
  • Optimization of data acquisition (optimal experimental design)
  • Labeling of datasets for supervised ML analyses

NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.
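The parallel setup can be sketched as follows. That NMFk distributes its factorization restarts across the available workers is an assumption here; the sketch only shows the standard Julia pattern of adding workers before loading the package:

```julia
# Sketch: start local worker processes before loading NMFk so that parallel
# runs can use Distributed and SharedArrays (assumes NMFk spreads its
# factorization restarts across the available workers).
using Distributed
addprocs(4)                 # add 4 local worker processes
@everywhere import NMFk     # make NMFk available on every worker
```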

NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.

NMFk methodology and applications are discussed in the research papers and presentations listed below.

NMFk is demonstrated with a series of examples and test problems provided in the repository.

Awards

SmartTensors and NMFk were recently awarded an R&D100 Award:

<div style="text-align: left"> <img src="logo/RD100Awards-300x300.png" alt="R&D100" width=25% max-width=125px;/> </div>

Installation

After starting Julia, execute:

import Pkg
Pkg.add("NMFk")

to access the latest released version.

To utilize the latest code updates (commits), use:

import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))

Docker

docker run --interactive --tty montyvesselinov/tensors

The docker image provides access to all SmartTensors packages (smarttensors.github.io).

Testing

import Pkg
Pkg.test("NMFk")

Examples

A simple problem demonstrating NMFk can be executed as follows. First, generate 3 random signals in a matrix W:

a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]

Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:

X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]

This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:

H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H

After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.

import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);

The execution will produce output like this:

[ Info: Results
Signals:  2 Fit:       15.489 Silhouette:    0.9980145 AIC:    -38.30184
Signals:  3 Fit: 3.452203e-07 Silhouette:    0.8540085 AIC:    -1319.743
Signals:  4 Fit: 8.503988e-07 Silhouette:   -0.5775127 AIC:    -1212.129
Signals:  5 Fit: 2.598571e-05 Silhouette:   -0.6757581 AIC:    -915.6589
[ Info: Optimal solution: 3 signals

The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.

The code also returns fitquality and robustness measures; they can be applied to assess how the solutions change as k increases:

NMFk.plot_signal_selection(2:5, fitquality, robustness)
<div style="text-align: left"> <img src="images/signal_selection.png" alt="signal_selection" width=75% max-width=200px;/> </div>

The code also returns estimates of matrices W and H.

It can be easily verified that estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.
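One quick check, sketched here with the standard LinearAlgebra library (the tolerance is illustrative), is the relative reconstruction error of the estimated factorization:

```julia
# Sketch: verify that the estimated factors reproduce the data matrix X.
using LinearAlgebra
relative_error = norm(X - We[kopt] * He[kopt]) / norm(X)
relative_error < 1e-3   # expected to hold for the well-fit kopt = 3 solution
```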

Note that the order of the columns ('signals') in W and We[kopt] is not expected to match. The order of the rows ('signals') in H and He[kopt] is also not expected to match. The estimated order may differ every time the code is executed.
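Because the order and scale of the estimated signals are arbitrary, one illustrative way to pair them with the originals is to match each column of We[kopt] to the column of W with which it has the highest absolute correlation:

```julia
# Illustrative sketch: match estimated signals to the original ones by
# maximal absolute correlation (signal order and scale are arbitrary in NMF).
using Statistics
match = [argmax([abs(cor(We[kopt][:, i], W[:, j])) for j in 1:3]) for i in 1:3]
```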

The matrices can be visualized using:

import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
<div style="text-align: left"> <img src="images/signals_original.png" alt="signals_original" width=75% max-width=200px;/> </div> <div style="text-align: left"> <img src="images/signals_reconstructed.png" alt="signals_reconstructed" width=75% max-width=200px;/> </div>
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))
<div style="text-align: left"> <img src="images/blind_source_separation_24_0.svg" alt="signals_original" width=50% max-width=200px;/> </div> <div style="text-align: left"> <img src="images/blind_source_separation_25_0.svg" alt="signals_reconstructed" width=50% max-width=200px;/> </div>

More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.

Applications:

NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:

  • Climate data and simulations
  • Watershed data and simulations
  • Aquifer simulations
  • Surface-water and Groundwater analyses
  • Material characterization
  • Reactive mixing
  • Molecular dynamics
  • Contaminant transport
  • Induced seismicity
  • Phase separation of co-polymers
  • Oil / Gas extraction from unconventional reservoirs
  • Geothermal exploration and production
  • Geologic carbon storage
  • Wildfires

Videos:

  • Progress of nonnegative matrix factorization process:
<div style="text-align: left"> <img src="movies/m643.gif" alt="nmfk-example" width=75% max-width=250px;/> </div>

More videos are available on YouTube.

Notebooks:

A series of Jupyter notebooks demonstrating NMFk have been developed; they are available in the notebooks directory of the NMFk repository.

The notebooks can also be accessed from Julia using:

NMFk.notebooks()

Patent:

Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1

Publications:

  • Vesselinov, V.V., Mudunuru, M., Karra, S., O'Malley, D., Alexandrov, B.S., Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing, 10.1016/j.jcp.2019.05.039, Journal of Computational Physics, 2019. PDF
  • Vesselinov, V.V., Alexandrov, B.S., O'Malley, D., Nonnegative Tensor Factorization for Contaminant Source Identification, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2018.11.010, 2018. PDF
  • O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer, PlosOne, 10.1371/journal.pone.0206653, 2018. PDF
  • Stanev, V., Vesselinov, V.V., Kusne, A.G., Antoszewski, G., Takeuchi, I., Alexandrov, B.S., Unsupervised Phase Mapping of X-ray Diffraction Data by Nonnegative Matrix Factorization Integrated with Custom Clustering, npj Computational Materials, 2018.
