Decomposition-UMAP: A multi-scale framework for pattern classification and anomaly detection
============================================================================================
.. image:: https://img.shields.io/pypi/v/decomposition-umap.svg
   :target: https://pypi.org/project/decomposition-umap/0.1.0
   :alt: PyPI Version
.. image:: images/logo.png
   :alt: Project Logo
   :width: 200px
   :align: center
.. image:: images/decomposition-umap_workflow.png
   :width: 100%
   :align: center
   :alt: Decomposition-UMAP workflow
Decomposition-UMAP is a general-purpose framework for pattern classification and anomaly detection. The methodology involves a two-stage process: first, the application of a multiscale decomposition technique, followed by a non-linear dimension reduction using the Uniform Manifold Approximation and Projection (UMAP) algorithm.
This software provides a structured implementation for analyzing numerical data by combining signal and image decomposition with manifold learning. The primary workflow involves decomposing an input dataset into a set of components, which serve as a high-dimensional feature vector for each point in the original data. Subsequently, the UMAP algorithm is employed to project these features into a lower-dimensional space. This process is designed to facilitate the analysis of data where features may be present across multiple scales or frequencies, enabling the separation of structured signals from noise.
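The two-stage idea can be sketched in plain NumPy. The following is an illustration of the concept only, not the library's API: it builds crude multiscale components of a 1D signal by repeated smoothing and stacks the per-point component values into the feature matrix that UMAP would then embed (the UMAP step itself is omitted here; ``umap-learn`` would consume the ``features`` array).

```python
import numpy as np

# Illustration only (not the library's API): build multiscale components of a
# 1D signal and stack them into per-point feature vectors for UMAP.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6 * np.pi, 512)) + 0.3 * rng.standard_normal(512)

def box_smooth(x, width):
    """Moving-average smoothing with a flat kernel of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# Components: detail at progressively coarser scales, plus the final remainder.
scales = [4, 16, 64]
components = []
residual = signal
for w in scales:
    smooth = box_smooth(residual, w)
    components.append(residual - smooth)   # detail at this scale
    residual = smooth
components.append(residual)                # the coarsest remainder

# Each original point now has one value per component: a feature vector.
features = np.stack(components, axis=-1)   # shape (512, 4)
print(features.shape)                      # (512, 4)

# Sanity check: the components sum back to the original signal exactly.
assert np.allclose(features.sum(axis=-1), signal)
```

An array shaped ``(n_points, n_components)`` like ``features`` is what a manifold learner such as UMAP would project down to 2 or 3 dimensions.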
Functionality
-------------

* Flexible API with explicit modes: a high-level API that supports single datasets, a single dataset with a user-supplied decomposition function, and pre-computed decompositions.
* Powerful decomposition techniques: interfaces for Constrained Diffusion Decomposition (CDD), Empirical Mode Decomposition (EMD), and wavelet decomposition.
* Full UMAP control: complete control over the UMAP algorithm's parameters via convenience arguments and a flexible dictionary (``umap_params``).
* Support for custom functions: users can supply their own decomposition functions for maximum extensibility.
* Serialization of models: trained UMAP models can be saved using ``pickle`` and reloaded for consistent inference on new data.
Installation
------------
The required Python packages must be installed prior to use. It is recommended to use a virtual environment.
.. code-block:: bash

   pip install numpy umap-learn scipy matplotlib constrained-diffusion
Then install Decomposition-UMAP via pip:
.. code-block:: bash

   pip install decomposition-umap==0.1.0
or clone the repository and install it manually:
.. code-block:: bash

   git clone https://github.com/gxli/DecompositionUMAP.git
   cd DecompositionUMAP
   pip install .
Usage
-----

The following examples demonstrate the core workflows using a synthetic 256x256 dataset composed of a Gaussian anomaly embedded in a fractal noise background.
1. Data Generation
First, we generate the data. This function is assumed to be available in an ``example`` module within the library. After installing the package, you can import it as shown below.
.. code-block:: python

   import numpy as np

   # Import the library and the example data generator
   import decomposition_umap
   from decomposition_umap import example as du_example

   # Generate a dataset with a known anomaly
   data, signal, anomaly = du_example.generate_fractal_with_gaussian()
2. Running the Pipeline (Core Examples)
Example A: Standard Mode (Built-in Decomposition)
This is the most common use case for training a new model.
.. code-block:: python

   import pickle

   embed_map, decomposition, umap_model = decomposition_umap.decompose_and_embed(
       data=data,
       decomposition_method='cdd',
       decomposition_max_n=6,
       n_component=2,
       umap_n_neighbors=20
   )

   # Save the model for the inference example
   with open("fractal_umap_model.pkl", "wb") as f:
       pickle.dump(umap_model, f)
Example B: Custom Decomposition Function (``decomposition_func=...``)
Use this when you have your own method for separating features.
.. code-block:: python

   import numpy as np
   from scipy.ndimage import gaussian_filter

   def my_custom_decomposition(data):
       """A simple decomposition using Gaussian filters."""
       comp1 = gaussian_filter(data, sigma=3)
       comp2 = data - comp1
       return np.array([comp1, comp2])

   embed_map_custom, _, _ = decomposition_umap.decompose_and_embed(
       data=data,
       decomposition_func=my_custom_decomposition,
       n_component=2
   )
Example C: Pre-computed Decomposition (``decomposition=...``)
This is efficient if your decomposition is slow and you want to reuse it while testing UMAP parameters.
.. code-block:: python

   from decomposition_umap.multiscale_decomposition import cdd_decomposition

   # Manually run the decomposition first
   precomputed, _ = cdd_decomposition(data, max_n=6)

   embed_map_pre, _, _ = decomposition_umap.decompose_and_embed(
       decomposition=np.array(precomputed),
       n_component=2
   )
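Because the decomposition is typically the slow step, one simple pattern for reusing it across runs is to cache the array to disk with ``np.save`` and reload it while tuning UMAP parameters. This is a plain-NumPy sketch, not a library feature; ``expensive_decomposition`` is a hypothetical stand-in for any slow method.

```python
import os
import tempfile

import numpy as np

# Hypothetical caching pattern (not part of the library's API): compute the
# slow decomposition once, save it, and reload it on subsequent runs.
cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "decomposition_cache.npy")

def expensive_decomposition(data):
    """Stand-in for a slow decomposition; returns stacked components."""
    low = data * 0.5                 # pretend low-frequency component
    return np.stack([low, data - low])

data = np.arange(16.0).reshape(4, 4)

if os.path.exists(cache_file):
    precomputed = np.load(cache_file)        # fast path on re-runs
else:
    precomputed = expensive_decomposition(data)
    np.save(cache_file, precomputed)         # slow path, done once

print(precomputed.shape)  # (2, 4, 4)
```

The cached ``precomputed`` array can then be passed via the ``decomposition=`` argument exactly as in Example C.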
Example D: Inference with a Pre-trained Model
Use ``decompose_with_existing_model`` to apply a saved model to new data.
.. code-block:: python

   # Generate new data for inference
   new_data, _, _ = du_example.generate_fractal_with_gaussian(anomaly_center=(200, 200))

   # Apply the model saved from Example A
   new_embed_map, _ = decomposition_umap.decompose_with_existing_model(
       model_filename="fractal_umap_model.pkl",
       data=new_data,
       decomposition_method='cdd',
       decomposition_max_n=6
   )
3. Visualizing Results
The UMAP embedding can effectively separate the anomaly from the background.
.. code-block:: python

   import matplotlib.pyplot as plt

   # --- Plot the UMAP embedding from Example A ---
   umap_x = embed_map[0].flatten()
   umap_y = embed_map[1].flatten()
   is_highlighted = anomaly.flatten() > data.flatten()

   plt.figure(figsize=(8, 8))
   plt.scatter(
       umap_x[~is_highlighted], umap_y[~is_highlighted],
       label='Background', alpha=0.1, s=10, color='gray'
   )
   plt.scatter(
       umap_x[is_highlighted], umap_y[is_highlighted],
       label='Highlighted Anomaly (Anomaly > Data)',
       alpha=0.8, s=15, color='red'
   )
   plt.title('UMAP Embedding with Anomaly Highlighted', fontsize=16)
   plt.xlabel('UMAP Dimension 1')
   plt.ylabel('UMAP Dimension 2')
   plt.legend()
   plt.grid(True, linestyle='--', alpha=0.6)
   plt.axis('equal')
   plt.show()
4. Command-Line Tool
This package includes a convenient command-line tool, ``decomposition-umap``, for quick analysis of FITS or NPY files. After installing the package, you can run it directly from your terminal.
By default, the tool saves the output files in the same directory as the input file, prefixed with the input file's name. You can optionally specify a different output directory.
Usage:

.. code-block:: text

   usage: decomposition-umap [-h] [-o OUTPUT_DIR] [-d DECOMPOSITION_LEVEL] [-n {2,3}]
                             [-m {cdd,emd}] [-p UMAP_PARAMS] [--no-verbose]
                             input_file
Examples:

- Basic Analysis (Default Output Path): Process a FITS file with default settings. The output files (e.g., ``my_image_decomposition.npy``) will be saved in the same directory as ``my_image.fits``.

  .. code-block:: bash

     decomposition-umap path/to/my_image.fits

- Specifying an Output Directory: Process a file and save the results into a specific folder named ``analysis_results``.

  .. code-block:: bash

     decomposition-umap path/to/my_image.fits -o analysis_results/

- 3D Embedding and Custom Decomposition: Process a NumPy file, use exactly 8 decomposition components, and create a 3D UMAP embedding.

  .. code-block:: bash

     decomposition-umap my_data.npy -o results/ -d 8 -n 3

- Advanced UMAP Control: Use the ``--umap_params`` flag to pass a JSON string of advanced parameters, such as enabling UMAP's ``low_memory`` mode.

  .. code-block:: bash

     decomposition-umap large_image.fits -o results/ -d 10 -p '{"n_neighbors": 50, "low_memory": true}'
API Reference
-------------

``decompose_and_embed(...)``

The primary function for training a new Decomposition-UMAP model. It handles multiple input modes for maximum flexibility.

Operating Modes (provide exactly one):

- ``data`` (numpy.ndarray): for a single raw dataset.
- ``datasets`` (list): for a batch of raw datasets.
- ``data_multivariate`` (numpy.ndarray): for a multi-channel raw dataset.
- ``decomposition`` (numpy.ndarray): for a single pre-computed decomposition.

Key Parameters:

- ``decomposition_method`` (str): the name of the built-in decomposition method (e.g., ``'cdd'``, ``'emd'``, ``'wavelet'``). Ignored if ``decomposition`` is provided.
- ``decomposition_max_n`` (int): the number of components to generate for relevant decomposition methods.
- ``decomposition_func`` (callable): a user-provided decomposition function, which overrides ``decomposition_method``. Ignored if ``decomposition`` is provided.
- ``n_component`` (int): the target dimension for the final UMAP embedding.
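Whichever input mode is used, the arrays involved follow a common shape convention. The sketch below is an assumption for illustration, not taken from the library's documentation: components are stacked on the leading axis of the decomposition, and each data point's component values form one row of the feature matrix that UMAP embeds.

```python
import numpy as np

# Assumed shape convention (illustration only): a decomposition of a 2D image
# stacks its components on the leading axis, giving (n_components, H, W).
n_components, H, W = 6, 64, 64
decomposition = np.random.default_rng(1).standard_normal((n_components, H, W))

# Before embedding, each pixel becomes one row of a feature matrix.
features = decomposition.reshape(n_components, -1).T   # (H*W, n_components)
print(features.shape)  # (4096, 6)

# A 2D embedding map of shape (2, H, W) could then be rebuilt from UMAP
# output of shape (H*W, 2) by the inverse reshape:
umap_output = np.zeros((H * W, 2))       # placeholder for UMAP's result
embed_map = umap_output.T.reshape(2, H, W)
print(embed_map.shape)  # (2, 64, 64)
```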