MultiClean
Python library for morphological cleaning of multiclass 2D numpy arrays (edge smoothing and island removal
Install / Use
/learn @DPIRD-DMA/MultiCleanREADME
MultiClean
MultiClean is a Python library for morphological cleaning of multiclass 2D numpy arrays (segmentation masks and classification rasters). It provides efficient tools for edge smoothing and small-island removal across multiple classes, then fills gaps using the nearest valid class.
Visual Example
Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).
<img src="https://raw.githubusercontent.com/DPIRD-DMA/MultiClean/main/assets/land_use_before_after.png" alt="Land Use before/after" width="900"/>Installation
pip install multiclean
or
uv add multiclean
Quick Start
import numpy as np
from multiclean import clean_array
# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)
# Clean with default parameters
cleaned = clean_array(array)
# Custom parameters
cleaned = clean_array(
array,
class_values=[0, 1, 2, 3],
smooth_edge_size=2, # kernel width, larger value increases smoothness
min_island_size=100, # remove components with area < 100
connectivity=8, # 4 or 8
max_workers=4,
fill_nan=False # enable/disable the filling of nan values in input array
)
Use Cases
MultiClean is designed for cleaning segmentation outputs from:
- Remote sensing: Land cover classification, crop mapping
- Computer vision: Semantic segmentation post-processing
- Geospatial analysis: Raster classification cleaning
- Machine learning: Neural network output refinement
Key Features
- Multi-class processing: Clean all classes in one pass
- Edge smoothing: Morphological opening to reduce jagged boundaries
- Island removal: Remove small connected components per class
- Gap filling: Fill invalids via nearest valid class (distance transform)
- Fast: NumPy + OpenCV + SciPy with parallelism
How It Works
MultiClean uses morphological operations to clean classification arrays:
- Edge smoothing (per class): Morphological opening with an elliptical kernel.
- Island removal (per class): Find connected components (OpenCV) and mark components with area
< min_island_sizeas invalid. - Gap filling: Compute a distance transform to copy the nearest valid class into invalid pixels.
Classes are processed together and the result maintains a valid label at every pixel.
API Reference
clean_array
from multiclean import clean_array
out = clean_array(
array: np.ndarray,
class_values: int | list[int] | None = None,
smooth_edge_size: int = 2,
min_island_size: int = 100,
connectivity: int = 4,
max_workers: int | None = None,
fill_nan: bool = False
)
array: 2D numpy array of class labels (int or float). For float arrays,NaNis treated as nodata and will remainNaNunlessfill_nanis set toTrue.class_values: Classes to consider. IfNone, inferred fromarray(ignoresNaNfor floats). An int restricts cleaning to a single class.smooth_edge_size: Kernel size (pixels) for morphological opening. Use0to disable.min_island_size: Remove components with area strictly< min_island_size. Use1to keep single pixels.connectivity: Pixel connectivity for components,4or8.max_workers: Parallelism for per-class operations (None lets the executor choose).fill_nan: If True will fill NAN values from input array with nearest valid value.
Returns a numpy array matching the input shape. Integer inputs return integer outputs. Float arrays with NaN are supported and can be filled or remain as NAN.
Examples
Cleaning Land Cover Data
from multiclean import clean_array
import rasterio
# Read land cover classification
with rasterio.open('landcover.tif') as src:
landcover = src.read(1)
# Clean with appropriate parameters for satellite data
cleaned = clean_array(
landcover,
class_values=[0, 1, 2, 3, 4], # forest, water, urban, crop, other
smooth_edge_size=1,
min_island_size=25,
connectivity=8,
fill_nan=False
)
Cleaning Neural Network Segmentation Output
from multiclean import clean_array
# Model produces logits; convert to class predictions
np_pred = np_model_logits.argmax(axis=0) # shape: (H, W)
# Clean the segmentation
cleaned = clean_array(
np_pred,
smooth_edge_size=2,
min_island_size=100,
connectivity=4,
)
Notebooks
See the notebooks folder for end-to-end examples:
- Land Use Example Notebook: land use classification cleaning
- Cloud Example Notebook: cloud/shadow classification cleaning
Try in Colab
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Skills
node-connect
353.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
claude-opus-4-5-migration
111.6kMigrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5
frontend-design
111.6kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
model-usage
353.1kUse CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.
