# SwiftPixelUtils
<p align="center"> <img src="https://img.shields.io/badge/Swift-5.9-orange" alt="Swift"> <img src="https://img.shields.io/badge/platforms-iOS%2015%2B%20%7C%20macOS%2012%2B-blue" alt="Platforms"> <img src="https://img.shields.io/badge/SPM-compatible-brightgreen" alt="SPM"> <img src="https://img.shields.io/github/license/manishkumar03/SwiftPixelUtils" alt="License"> </p> <p align="center"> <strong>High-performance Swift library for image preprocessing optimized for ML/AI inference pipelines on iOS/macOS</strong> </p> <p align="center"> <a href="#installation">Installation</a> • <a href="#quick-start">Quick Start</a> • <a href="#api-reference">API Reference</a> • <a href="#features">Features</a> • <a href="docs/README.md">📚 Docs</a> </p>

High-performance Swift library for image preprocessing optimized for ML/AI inference pipelines. Native implementations using Core Image, Accelerate, and Core ML for pixel extraction, tensor conversion, quantization, augmentation, and model-specific preprocessing (YOLO, MobileNet, etc.).
## ✨ Features
### Core Preprocessing
- 🚀 High Performance: Native implementations using Apple frameworks (Core Image, Accelerate, vImage, Core ML)
- 🔢 Raw Pixel Data: Extract pixel values as typed arrays (Float, Float16, Int32, UInt8) ready for ML inference
- 🎨 Multiple Color Formats: RGB, RGBA, BGR, BGRA, Grayscale, HSV, HSL, LAB, YUV, YCbCr
- 📐 Flexible Resizing: Cover, contain, stretch, and letterbox strategies with automatic transform metadata
- ✂️ ROI Pipeline: Crop → resize → normalize in a single call via ROI options
- 🔢 ML-Ready Normalization: ImageNet, TensorFlow, custom presets
- 📊 Multiple Data Layouts: HWC, CHW, NHWC, NCHW (PyTorch/TensorFlow compatible)
- 🖼️ Multiple Sources: local file URLs, data, base64, assets, photo library
- 📱 Orientation Handling: Opt-in UIImage/EXIF orientation normalization to fix silent rotation issues
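The "ML-Ready Normalization" bullet above boils down to simple per-channel arithmetic. A minimal sketch of the ImageNet variant follows; the helper name and interleaved-HWC assumption are illustrative, not the library's actual API:

```swift
// Sketch of ImageNet-style normalization: each channel value in [0, 1]
// is shifted by the dataset mean and divided by the dataset standard
// deviation. Names here are illustrative, not SwiftPixelUtils API.
let mean: [Float] = [0.485, 0.456, 0.406]  // ImageNet RGB means
let std:  [Float] = [0.229, 0.224, 0.225]  // ImageNet RGB std devs

func normalizeImageNet(_ pixels: [Float], channels: Int = 3) -> [Float] {
    // pixels are interleaved HWC values already scaled to [0, 1]
    return pixels.enumerated().map { index, value in
        let c = index % channels
        return (value - mean[c]) / std[c]
    }
}

// A pixel exactly at the dataset mean normalizes to zero in every channel
let normalized = normalizeImageNet([0.485, 0.456, 0.406])
```

Presets for TensorFlow-style models use the same shape of computation with different constants (e.g. mapping to [-1, 1]).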
### ML Framework Integration
- 🤖 Simplified ML APIs: One-line preprocessing (`getModelInput`) and postprocessing (`ClassificationOutput`, `DetectionOutput`, `SegmentationOutput`, `DepthEstimationOutput`) for all major frameworks
- 🤖 Model Presets: Pre-configured settings for YOLO (v8/v9/v10), RT-DETR, MobileNet, EfficientNet, ResNet, ViT, CLIP, SAM/SAM2, DINO, DETR, Mask2Former, UNet, DeepLab, SegFormer, FCN, PSPNet
- 🎯 Framework Targets: Automatic configuration for PyTorch, TensorFlow, TFLite, CoreML, ONNX Runtime, ExecuTorch, OpenCV
- 🏷️ Label Database: Built-in labels for COCO, ImageNet, VOC, CIFAR, Places365, ADE20K, Open Images, LVIS, Objects365, Kinetics
### ONNX Runtime Integration
- 🔌 ONNX Helper: Streamlined tensor creation for ONNX Runtime with `ONNXHelper.createTensorData()`
- 📊 ONNX Data Types: Support for Float32, Float16, UInt8, Int8, Int32, Int64 tensor types
- 🎯 ONNX Model Configs: Pre-configured settings for YOLOv8, RT-DETR, ResNet, MobileNetV2, ViT, CLIP
- 🔍 Output Parsing: Built-in parsers for YOLOv8, YOLOv5, RT-DETR, SSD detection outputs
- 🧮 Segmentation Output: Parse ONNX segmentation model outputs with argmax
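The argmax decoding mentioned in the last bullet can be sketched in a few lines: for a logit tensor in CHW layout, each pixel is assigned the class with the highest score. The function name and layout here are assumptions for illustration, not the library's parser API:

```swift
// Argmax over the class dimension of a [classes, height, width] logit
// tensor, producing a per-pixel class-index mask.
func argmaxMask(logits: [Float], classes: Int, height: Int, width: Int) -> [Int] {
    let plane = height * width
    var mask = [Int](repeating: 0, count: plane)
    for p in 0..<plane {
        var best = 0
        var bestScore = logits[p]  // class 0 at this pixel
        for c in 1..<classes {
            let score = logits[c * plane + p]
            if score > bestScore { bestScore = score; best = c }
        }
        mask[p] = best
    }
    return mask
}

// 2 classes over a 1×2 image: pixel 0 favors class 1, pixel 1 favors class 0
let mask = argmaxMask(logits: [0.1, 0.9, 0.8, 0.2], classes: 2, height: 1, width: 2)
```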
### Quantization
- 🎯 Native Quantization: Float→Int8/UInt8/Int16/INT4 with per-tensor and per-channel support (TFLite/ExecuTorch compatible)
- 🔢 INT4 Quantization: 4-bit quantization (8× compression) for LLM weights and edge deployment
- 📊 Per-Channel Quantization: Channel-wise scale/zeroPoint for higher accuracy (CNN, Transformer weights)
- 🔄 Float16 Conversion: IEEE 754 half-precision ↔ Float32 utilities for CVPixelBuffer processing
- 🎥 CVPixelBuffer Formats: BGRA/RGBA, NV12, and RGB565 conversion to tensor data
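The affine (asymmetric) quantization scheme referenced above follows the TFLite convention `q = round(x / scale) + zeroPoint`. A minimal sketch of the per-tensor UInt8 case, with illustrative helper names rather than the library's API:

```swift
// Derive scale/zeroPoint from a float range, then quantize. The range is
// widened to include 0 so that zero is exactly representable (TFLite rule).
func quantizationParams(min: Float, max: Float) -> (scale: Float, zeroPoint: Int) {
    let lo = Swift.min(min, 0), hi = Swift.max(max, 0)
    let scale = (hi - lo) / 255.0
    let zeroPoint = Int((-lo / scale).rounded())
    return (scale, zeroPoint)
}

func quantize(_ values: [Float], scale: Float, zeroPoint: Int) -> [UInt8] {
    values.map { x in
        let q = Int((x / scale).rounded()) + zeroPoint
        return UInt8(Swift.min(255, Swift.max(0, q)))  // clamp to UInt8 range
    }
}

let (scale, zp) = quantizationParams(min: 0.0, max: 1.0)
let q = quantize([0.0, 0.25, 1.0], scale: scale, zeroPoint: zp)
// 0.0 maps to the zero point (0 here); 1.0 maps to 255
```

Per-channel quantization repeats this with a separate `(scale, zeroPoint)` pair per output channel.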
### Detection & Segmentation
- 📦 Bounding Box Utilities: Format conversion (xyxy/xywh/cxcywh), scaling, clipping, IoU, NMS
- 🖼️ Letterbox Padding: YOLO-style letterbox preprocessing with automatic transform metadata for reverse coordinate mapping
- 📏 Depth Estimation: Process MiDaS, DPT, ZoeDepth, Depth Anything outputs with scientific colormaps (Viridis, Plasma, Turbo) and custom colormaps
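The IoU computation underlying the bounding box utilities above (and NMS, which thresholds on it) can be sketched for xyxy boxes; the `Box` type and function are illustrative, not the library's types:

```swift
// Intersection-over-Union of two axis-aligned boxes in xyxy format.
struct Box { let x1, y1, x2, y2: Float }

func iou(_ a: Box, _ b: Box) -> Float {
    // Overlap extents, clamped at zero when the boxes are disjoint
    let ix = max(0, min(a.x2, b.x2) - max(a.x1, b.x1))
    let iy = max(0, min(a.y2, b.y2) - max(a.y1, b.y1))
    let inter = ix * iy
    let areaA = (a.x2 - a.x1) * (a.y2 - a.y1)
    let areaB = (b.x2 - b.x1) * (b.y2 - b.y1)
    let union = areaA + areaB - inter
    return union > 0 ? inter / union : 0
}

// Two 10×10 boxes overlapping in a 5×10 strip: IoU = 50 / 150 = 1/3
let overlap = iou(Box(x1: 0, y1: 0, x2: 10, y2: 10),
                  Box(x1: 5, y1: 0, x2: 15, y2: 10))
```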
### Data Augmentation
- 🔄 Image Augmentation: Rotation, flip, brightness, contrast, saturation, blur
- 🎨 Color Jitter: Granular brightness/contrast/saturation/hue control with range support and seeded randomness
- ✂️ Cutout/Random Erasing: Mask random regions with constant/noise fill for robustness training
- 🎲 Random Crop with Seed: Reproducible random crops for data augmentation pipelines
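Seeded, reproducible randomness (as in the random-crop bullet above) presumably relies on a deterministic generator; Swift's standard library lets you plug one in via `RandomNumberGenerator`. A sketch with a tiny SplitMix64 generator, names illustrative only:

```swift
// A deterministic RNG (SplitMix64) driving a reproducible random crop.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

func randomCropOrigin(imageW: Int, imageH: Int, cropW: Int, cropH: Int,
                      seed: UInt64) -> (x: Int, y: Int) {
    var rng = SplitMix64(seed: seed)
    let x = Int.random(in: 0...(imageW - cropW), using: &rng)
    let y = Int.random(in: 0...(imageH - cropH), using: &rng)
    return (x, y)
}

// The same seed always yields the same crop origin
let a = randomCropOrigin(imageW: 640, imageH: 480, cropW: 224, cropH: 224, seed: 42)
let b = randomCropOrigin(imageW: 640, imageH: 480, cropW: 224, cropH: 224, seed: 42)
```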
### Tensor Operations
- 🧮 Tensor Operations: Channel extraction, patch extraction, permutation, batch concatenation
- 🔙 Tensor to Image: Convert processed tensors back to images
- 🔲 Grid/Patch Extraction: Extract image patches in grid patterns for sliding window inference
- ✅ Tensor Validation: Validate tensor shapes, dtypes, and value ranges before inference
- 📦 Batch Assembly: Combine multiple images into NCHW/NHWC batch tensors
- 📦 Batch Processing: Process multiple images with concurrency control
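The layout permutation behind the tensor operations above (interleaved HWC, as images are stored, to planar CHW, as PyTorch-style models expect) can be sketched as follows; this is an illustrative index calculation, not the library's Tensor API:

```swift
// Permute an interleaved HWC buffer into a planar CHW buffer.
func hwcToCHW(_ hwc: [Float], height: Int, width: Int, channels: Int) -> [Float] {
    var chw = [Float](repeating: 0, count: hwc.count)
    for h in 0..<height {
        for w in 0..<width {
            for c in 0..<channels {
                // source index: (h * width + w) * channels + c
                // target index: c * height * width + h * width + w
                chw[c * height * width + h * width + w] =
                    hwc[(h * width + w) * channels + c]
            }
        }
    }
    return chw
}

// 1×2 RGB image: interleaved [R0,G0,B0, R1,G1,B1] → planar [R0,R1, G0,G1, B0,B1]
let chw = hwcToCHW([1, 2, 3, 4, 5, 6], height: 1, width: 2, channels: 3)
```

Batch assembly then concatenates one such CHW block per image along a leading N dimension.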
### Visualization & Analysis
- 🎨 Drawing/Visualization: Draw boxes, keypoints, masks, and heatmaps for debugging
- 📈 Image Analysis: Statistics, metadata, validation, blur detection
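One common blur metric (possibly what the blur-detection bullet refers to; the library's exact method is not documented here) is the variance of the Laplacian: sharp images produce high-variance edge responses, blurry ones low. A hedged sketch:

```swift
// Variance of a 3×3 Laplacian over a grayscale buffer; low variance
// suggests a blurry image. Illustrative, not the library's implementation.
func laplacianVariance(gray: [Double], width: Int, height: Int) -> Double {
    var responses: [Double] = []
    for y in 1..<(height - 1) {
        for x in 1..<(width - 1) {
            let i = y * width + x
            // 4-neighbor Laplacian kernel: [0 1 0; 1 -4 1; 0 1 0]
            let lap = gray[i - 1] + gray[i + 1] + gray[i - width] + gray[i + width]
                    - 4 * gray[i]
            responses.append(lap)
        }
    }
    let mean = responses.reduce(0, +) / Double(responses.count)
    return responses.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / Double(responses.count)
}

// A perfectly flat image has zero Laplacian response everywhere
let flat = laplacianVariance(gray: [Double](repeating: 0.5, count: 9), width: 3, height: 3)
```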
## 📱 Example App

A comprehensive iOS example app is included in the `Example/` directory, demonstrating all major features:
- TensorFlow Lite Classification - MobileNetV2 with TopK results
- ExecuTorch Classification - MobileNetV3 with TopK results
- Object Detection - YOLOv8 with NMS and bounding box visualization
- Semantic Segmentation - DeepLabV3 with colored mask overlay
- Depth Estimation - Depth Anything with colormaps and overlay visualization
- Pixel Extraction - Model presets (YOLO, RT-DETR, MobileNet, ResNet, ViT, CLIP, SAM2, DeepLab, etc.) and custom options
- Bounding Box Utilities - Format conversion, IoU calculation, NMS, scaling, clipping
- Image Augmentation - Rotation, flip, brightness, contrast, saturation, blur
- Tensor Operations - Channel extraction, permutation, batch assembly
- Drawing & Visualization - Boxes, labels, masks, and overlays
- Comprehensive UI Tests - 50+ UI tests covering all features
To run the example app:

```bash
cd Example/SwiftPixelUtilsExampleApp
pod install
open SwiftPixelUtilsExampleApp.xcworkspace
```
To run UI tests, select the `SwiftPixelUtilsExampleAppUITests` target and press ⌘U.
## 📦 Installation
### Swift Package Manager

Add SwiftPixelUtils to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/manishkumar03/SwiftPixelUtils.git", from: "1.0.0")
]
```
Or add it via Xcode:
- File → Add Package Dependencies
- Enter the repository URL
- Select version/branch
## 🚀 Quick Start
⚠️ **Important**: SwiftPixelUtils functions are synchronous and use `throws` (not `async throws`). Remote URLs (`http`, `https`) are not supported; download images first and use `.data(Data)` or `.file(URL)` instead.
### Raw Pixel Data Extraction

```swift
import SwiftPixelUtils

// From local file
let result = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: PixelDataOptions()
)

print(result.data)   // Float array of pixel values
print(result.width)  // Image width
print(result.height) // Image height
print(result.shape)  // [height, width, channels]

// From downloaded data (for remote images)
let (data, _) = try await URLSession.shared.data(from: URL(string: "https://example.com/image.jpg")!)
let result2 = try PixelExtractor.getPixelData(
    source: .data(data),
    options: PixelDataOptions()
)
```
### Using Model Presets

```swift
import SwiftPixelUtils

// Use pre-configured YOLO settings
let result = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: ModelPresets.yolov8
)
// Automatically configured: 640x640, letterbox resize, RGB, scale normalization, NCHW layout

// Or MobileNet
let mobileNetResult = try PixelExtractor.getPixelData(
    source: .file(URL(fileURLWithPath: "/path/to/image.jpg")),
    options: ModelPresets.mobilenet
)
// Configured: 224x224, cover resize, RGB, ImageNet normalization, NHWC layout
```
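After inference, the classifier postprocessing reduces to a softmax followed by a top-k selection. A hedged sketch of that step; the `topK` helper is an assumption for illustration, not the `ClassificationOutput` API:

```swift
import Foundation

// Softmax over raw logits, then the k most probable class indices.
func topK(_ logits: [Double], k: Int) -> [(index: Int, prob: Double)] {
    let maxLogit = logits.max() ?? 0
    let exps = logits.map { exp($0 - maxLogit) }  // subtract max for stability
    let sum = exps.reduce(0, +)
    return Array(
        exps.enumerated()
            .map { (index: $0.offset, prob: $0.element / sum) }
            .sorted { $0.prob > $1.prob }
            .prefix(k)
    )
}

// The class with the highest logit (index 1) comes first
let top2 = topK([1.0, 3.0, 2.0], k: 2)
```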
### Available Model Presets
#### Classification Models
| Preset | Size | Resize | Normalization | Layout |
|--------|------|--------|---------------|--------|
| mobilenet / mobilenet_v2 / mobilenet_v3 | 224×224 | cover | ImageNet | NHWC |
| efficientnet | 224×224 | cover | ImageNet | NHWC |
| resnet / resnet50 | 224×224 | cover | ImageNet | NCHW |
| vit | 224×224 | cover | ImageNet | NCHW |
| clip | 224×224 | cover | CLIP-specific | NCHW |
| dino | 224×224 | cover | ImageNet | NCHW |
#### Detection Models
| Preset | Size | Resize | Normalization | Layout | Notes |
|--------|------|--------|---------------|--------|-------|
| yolo / yolov8 | 640×640 | letterbox | scale | NCHW | Standard YOLO |
| yolov9 | 640×640 | letterbox | scale | NCHW | PGI + GELAN |
| yolov10 / yolov10_n/s/m/l/x | 640×640 | letterbox | scale | NCHW | NMS-free |
| rtdetr / rtdetr_l / rtdetr_x | 640×640 | letterbox | scale | NCHW | Real-time DETR |
| detr | 800×800 | contain | ImageNet | NCHW | Transformer detection |
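The letterbox resize used by the YOLO-family presets above scales the image to fit the square target, pads the remainder, and keeps the transform so predicted coordinates can be mapped back. A geometry-only sketch (the library's transform-metadata type is not shown here):

```swift
// Letterbox geometry: uniform scale to fit, centered padding to square.
func letterbox(srcW: Int, srcH: Int, dst: Int) -> (scale: Double, padX: Double, padY: Double) {
    let scale = Swift.min(Double(dst) / Double(srcW), Double(dst) / Double(srcH))
    let newW = Double(srcW) * scale
    let newH = Double(srcH) * scale
    return (scale, (Double(dst) - newW) / 2, (Double(dst) - newH) / 2)
}

// Map a model-space x coordinate back to the original image
func unletterboxX(_ x: Double, scale: Double, padX: Double) -> Double {
    (x - padX) / scale
}

// 1280×720 into 640×640: scale 0.5, no horizontal pad, 140 px top/bottom pad
let t = letterbox(srcW: 1280, srcH: 720, dst: 640)
```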
#### Segmentation Models
| Preset | Size | Resize | Normalization | Layout | Notes |
|--------|------|--------|---------------|--------|-------|
| sam | 1024×1024 | contain | ImageNet | NCHW | Segment Anything |
| sam2 / sam2_t/s/b_plus/l | 1024×1024 | contain | ImageNet | NCHW | SAM2 with video |
| mask2former / mask2former_swin_t/l | 512×512 | contain | ImageNet | NCHW | Universal segmentation |
| deeplab / deeplabv3 / deeplabv3_plus | 513×513 | contain | ImageNet | NCHW | ASPP module |
| deeplab_769 / deeplab_1025 |
