NMFk.jl
Nonnegative Matrix Factorization + k-means clustering and physics constraints for Unsupervised and Physics-Informed Machine Learning
NMFk: Nonnegative Matrix Factorization + k-means clustering and physics constraints
<div style="text-align: left;"> <img src="logo/nmfk-logo.jpg" alt="nmfk" width=50% max-width=125px;/> </div>
NMFk is a module of the SmartTensors ML framework (smarttensors.com).
<div style="text-align: left"> <img src="logo/SmartTensorsNewSmall.png" alt="SmartTensors" width=25% max-width=125px;/> </div>
NMFk is a novel unsupervised machine learning methodology that allows for the automatic identification of the optimal number of features (signals/signatures) present in the data.
Classical NMF approaches do not allow for automatic estimation of the number of features.
NMFk estimates the number of features k through k-means clustering coupled with regularization constraints (sparsity, physical, mathematical, etc.).
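To make the underlying factorization concrete, here is a minimal sketch of plain NMF using multiplicative updates, scanned over several candidate values of k. It is an illustration only: `nmf_sketch` and `fit_error` are hypothetical helpers, the multiplicative-update solver is a common textbook choice that may differ from the solvers NMFk uses, and the clustering and regularization steps that NMFk adds are not reproduced here.

```julia
import LinearAlgebra
import Random

# Hypothetical helper (not part of NMFk): plain NMF by multiplicative updates,
# approximating X ≈ W * H with nonnegative W (n×k) and H (k×m).
function nmf_sketch(X::AbstractMatrix, k::Integer; iterations::Integer=2000, seed::Integer=1)
    rng = Random.MersenneTwister(seed)
    n, m = size(X)
    W = rand(rng, n, k)
    H = rand(rng, k, m)
    tiny = eps(Float64)  # avoids division by zero in the updates
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * (H * H') .+ tiny)
    end
    return W, H
end

# Frobenius-norm misfit of a k-signal factorization
function fit_error(X::AbstractMatrix, k::Integer)
    W, H = nmf_sketch(X, k)
    return LinearAlgebra.norm(X - W * H)
end

# Synthetic data built from 3 nonnegative signals, then scanned over candidate k
Xdemo = rand(Random.MersenneTwister(10), 15, 3) * rand(Random.MersenneTwister(20), 3, 5)
for k in 2:4
    println("k = ", k, "  misfit = ", fit_error(Xdemo, k))
end
```

The misfit alone keeps shrinking or stays flat as k grows, so it cannot identify the number of signals by itself. This is why a second criterion, such as the robustness of the solutions under clustering, is needed.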
SmartTensors can be applied to perform:
- Feature extraction (FE)
- Blind source separation (BSS)
- Detection of disruptions/anomalies
- Data gap discovery
- Data gap filling and reconstruction
- Image recognition
- Text mining
- Data classification
- Separation (deconstruction) of co-occurring (physics) processes
- Discovery of unknown dependencies and phenomena
- Development of reduced-order/surrogate models
- Identification of dependencies between model inputs and outputs
- Guiding the development of physics models representing the ML-analyzed data
- Blind predictions
- Optimization of data acquisition (optimal experimental design)
- Labeling of datasets for supervised ML analyses
NMFk provides high-performance computing capabilities to solve problems in parallel using Shared and Distributed Arrays. The parallelization allows for the utilization of multi-core / multi-processor environments. GPU and TPU accelerations are available through existing Julia packages.
NMFk provides advanced tools for data visualization, pre- and post-processing. These tools substantially facilitate the utilization of the package in various real-world applications.
NMFk methodology and applications are discussed in the research papers and presentations listed below.
NMFk is demonstrated with a series of examples and test problems provided here.
Awards
SmartTensors and NMFk were recently awarded:
- 2021 R&D100 Award: Information Technologies (IT)
- 2021 R&D100 Bronze Medal: Market Disruptor in Services
Installation
After starting Julia, execute:
import Pkg
Pkg.add("NMFk")
to access the latest released version.
To utilize the latest code updates (commits), use:
import Pkg
Pkg.add(Pkg.PackageSpec(name="NMFk", rev="master"))
Docker
docker run --interactive --tty montyvesselinov/tensors
The docker image provides access to all SmartTensors packages (smarttensors.github.io).
Testing
import Pkg
Pkg.test("NMFk")
Examples
A simple problem demonstrating NMFk can be executed as follows.
First, generate 3 random signals in a matrix W:
a = rand(15)
b = rand(15)
c = rand(15)
W = [a b c]
Then, mix the signals to produce a data matrix X of 5 sensors observing the mixed signals as follows:
X = [a+c*3 a*10+b b b*5+c a+b*2+c*5]
This is equivalent to generating a mixing matrix H and obtaining X by multiplying W and H:
H = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]
X = W * H
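The equivalence of the two constructions can be checked directly. The sketch below uses separate, illustrative variable names (`s1`, `s2`, `s3`, `Wcheck`, `Hcheck`) so that it does not overwrite the W, H, and X defined above.

```julia
# Three random signals, one per column of Wcheck (15×3)
s1 = rand(15)
s2 = rand(15)
s3 = rand(15)
Wcheck = [s1 s2 s3]

# Mixing matrix (3×5): each column holds the weights of one sensor
Hcheck = [1 10 0 0 1; 0 1 1 5 2; 3 0 0 1 5]

# Sensor data built by hand and by matrix multiplication (both 15×5)
X_manual = [s1+s3*3 s1*10+s2 s2 s2*5+s3 s1+s2*2+s3*5]
X_product = Wcheck * Hcheck

println("Constructions agree: ", X_manual ≈ X_product)
```

Each column of `Hcheck` lists how much of each signal a given sensor observes; for example, the first column `[1, 0, 3]` corresponds to the first sensor observing `s1 + 3 * s3`.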
After that, execute NMFk to estimate the number of unknown mixed signals based only on the information in X.
import NMFk
We, He, fitquality, robustness, aic, kopt = NMFk.execute(X, 2:5; save=false, method=:simple);
The execution will produce output like this:
[ Info: Results
Signals: 2 Fit: 15.489 Silhouette: 0.9980145 AIC: -38.30184
Signals: 3 Fit: 3.452203e-07 Silhouette: 0.8540085 AIC: -1319.743
Signals: 4 Fit: 8.503988e-07 Silhouette: -0.5775127 AIC: -1212.129
Signals: 5 Fit: 2.598571e-05 Silhouette: -0.6757581 AIC: -915.6589
[ Info: Optimal solution: 3 signals
The code returns the estimated optimal number of signals kopt, which in this case, as expected, is equal to 3.
The code returns fitquality and robustness, which can be used to show how the solutions change as k increases:
NMFk.plot_signal_selection(2:5, fitquality, robustness)
<div style="text-align: left">
<img src="images/signal_selection.png" alt="signal_selection" width=75% max-width=200px;/>
</div>
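One way to read the numbers reported above is to look for the largest k that still gives a robust (high-silhouette) solution. The sketch below applies that idea to the values copied from the example output. `pick_k` and its `cutoff` value are hypothetical and chosen only for illustration; this is not necessarily the rule NMFk applies internally.

```julia
# Values copied from the example output above
ks = 2:5
fit = [15.489, 3.452203e-07, 8.503988e-07, 2.598571e-05]
silhouette = [0.9980145, 0.8540085, -0.5775127, -0.6757581]

# Hypothetical heuristic (not necessarily NMFk's rule): choose the largest k
# whose silhouette stays above a cutoff; return nothing if no k qualifies.
function pick_k(ks, silhouette; cutoff=0.25)
    robust = findall(s -> s > cutoff, silhouette)
    isempty(robust) && return nothing
    return ks[last(robust)]
end

k_guess = pick_k(ks, silhouette)
for (k, f, s) in zip(ks, fit, silhouette)
    println("k = ", k, "  fit = ", f, "  silhouette = ", s)
end
println("Heuristic choice: ", k_guess)
```

For these values, k = 2 is robust but fits poorly, k = 4 and k = 5 fit well but have negative silhouettes, and k = 3 is the only candidate that is both robust and accurate.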
The code also returns estimates of matrices W and H.
It can be easily verified that the estimated We[kopt] and He[kopt] are scaled versions of the original W and H matrices.
Note that the order of columns ('signals') in W and We[kopt] is not expected to match.
The order of rows in H and He[kopt] (each row holds the mixing weights of one signal across the sensors) is also not expected to match.
The estimated order can differ every time the code is executed.
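The scaling and ordering ambiguity can be resolved after the fact by pairing each original signal with its most correlated estimate. The sketch below is a minimal illustration on a synthetic stand-in (a permuted, rescaled copy of a random matrix), so it runs without NMFk; `match_columns` is a hypothetical helper and not part of the NMFk API.

```julia
import Random
import Statistics

# Hypothetical helper (not part of NMFk): for each column of W, return the
# index of the column of We with the highest Pearson correlation.
function match_columns(W::AbstractMatrix, We::AbstractMatrix)
    C = Statistics.cor(W, We)  # C[i, j] = cor(W[:, i], We[:, j])
    return [argmax(C[i, :]) for i in 1:size(W, 2)]
end

# Synthetic stand-in for an estimate: columns permuted and rescaled
rng = Random.MersenneTwister(7)
Wtrue = rand(rng, 15, 3)
order = [3, 1, 2]
scales = [2.0 0.5 4.0]  # 1×3 row: one scale factor per estimated column
West = Wtrue[:, order] .* scales

# Recover the pairing, then confirm each matched column is a scaled copy
matched = match_columns(Wtrue, West)
ratio = West[:, matched] ./ Wtrue
is_scaled = all(col -> maximum(col) - minimum(col) < 1e-10, eachcol(ratio))
println("Matched columns: ", matched, "  scaled copies: ", is_scaled)
```

With actual results, the same helper could be called as `match_columns(W, We[kopt])`. Because correlation is insensitive to positive scaling, the pairing does not require the estimates to be normalized first.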
The matrices can be visualized using:
import Pkg; Pkg.add("Mads")
import Mads
Mads.plotseries([a b c])
Mads.plotseries(We[kopt] ./ maximum(We[kopt]))
<div style="text-align: left">
<img src="images/signals_original.png" alt="signals_original" width=75% max-width=200px;/>
</div>
<div style="text-align: left">
<img src="images/signals_reconstructed.png" alt="signals_reconstructed" width=75% max-width=200px;/>
</div>
NMFk.plotmatrix(H)
NMFk.plotmatrix(He[kopt] ./ maximum(He[kopt]))
<div style="text-align: left">
<img src="images/blind_source_separation_24_0.svg" alt="signals_original" width=50% max-width=200px;/>
</div>
<div style="text-align: left">
<img src="images/blind_source_separation_25_0.svg" alt="signals_reconstructed" width=50% max-width=200px;/>
</div>
More examples can be found in the test, demo, examples, and notebooks directories of the NMFk repository.
Applications:
NMFk has been applied in a wide range of real-world applications. The analyzed datasets include model outputs, experimental laboratory data, and field tests:
- Climate data and simulations
- Watershed data and simulations
- Aquifer simulations
- Surface-water and groundwater analyses
- Material characterization
- Reactive mixing
- Molecular dynamics
- Contaminant transport
- Induced seismicity
- Phase separation of co-polymers
- Oil / Gas extraction from unconventional reservoirs
- Geothermal exploration and production
- Geologic carbon storage
- Wildfires
Videos:
- Progress of nonnegative matrix factorization process:
More videos are available at YouTube
Notebooks:
A series of Jupyter notebooks demonstrating NMFk has been developed.
The notebooks can also be accessed using:
NMFk.notebooks()
Other Examples:
Patent:
Alexandrov, B.S., Vesselinov, V.V., Alexandrov, L.B., Stanev, V., Iliev, F.L., Source identification by non-negative matrix factorization combined with semi-supervised clustering, US20180060758A1
Publications:
- Vesselinov, V.V., Mudunuru, M., Karra, S., O'Malley, D., Alexandrov, B.S., Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing, Journal of Computational Physics, 10.1016/j.jcp.2019.05.039, 2019.
- Vesselinov, V.V., Alexandrov, B.S., O'Malley, D., Nonnegative Tensor Factorization for Contaminant Source Identification, Journal of Contaminant Hydrology, 10.1016/j.jconhyd.2018.11.010, 2018.
- O'Malley, D., Vesselinov, V.V., Alexandrov, B.S., Alexandrov, L.B., Nonnegative/binary matrix factorization with a D-Wave quantum annealer, PLOS ONE, 10.1371/journal.pone.0206653, 2018.
- Stanev, V., Vesselinov, V.V., Kusne, A.G., Antoszewski, G., Takeuchi, I., Alexandrov, B.A., Unsupervised Phase Mapping of X-ray Diffraction Data by Nonnegative Matrix Factorization Integrated with Custom Clustering, Nature Computational Materia
