Klymo
🌍 WorldStrat Ensemble: Robust Satellite Image Super-Resolution
A production-grade ensemble system combining Transformer (Swin2SR) and GAN (Real-ESRGAN) architectures to achieve state-of-the-art 4x super-resolution for satellite imagery.
📋 Table of Contents
- Overview
- Key Features
- Installation
- Quick Start
- Project Structure
- Usage
- Model Architecture
- Performance & Results
- FAQ
- Contributing
- License
📖 Overview
WorldStrat Ensemble is a high-performance super-resolution pipeline designed specifically for the WorldStrat satellite imagery dataset. It addresses the unique challenges of satellite SR including:
- Atmospheric Noise: Cloud interference, haze, and atmospheric scattering
- Low Resolution Input: Sentinel-2 imagery at 10m/pixel → WorldView-3 quality at 2.5m/pixel
- Dynamic Ranges: Varied illumination conditions from polar to equatorial regions
- Large-Scale Inference: Handling thousands of images efficiently
Why Ensemble?
We fuse two complementary architectures:
| Model | Type | Strength | Weakness |
|-------|------|----------|----------|
| Swin2SR | Transformer | Global structure, clean edges | Less detailed textures |
| Real-ESRGAN | GAN (RRDB) | Realistic high-frequency details | Can introduce artifacts |
Result: Ensemble achieves +0.2 to +0.4 dB PSNR improvement over best single model.
🚀 Key Features
Robustness
- ✅ Crash-Proof: Gracefully handles corrupted files, missing checkpoints, GPU OOM errors
- ✅ Checkpoint Recovery: Auto-detects weights from multiple search paths
- ✅ Fallback Mechanisms: Uses best single model if ensemble fails validation
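The checkpoint-recovery behaviour can be sketched as follows. The paths below are illustrative placeholders; the actual search list lives in the notebook's `MODEL_CONFIGS`:

```python
from pathlib import Path

# Illustrative search paths; adjust for your environment (local, Kaggle, Colab).
SEARCH_PATHS = [
    "final-models/swin2sr_best.pth",
    "/kaggle/input/worldstrat-weights/swin2sr_best.pth",
]

def find_checkpoint(paths):
    """Return the first existing checkpoint path, or None if all are missing."""
    for p in paths:
        if Path(p).is_file():
            return Path(p)
    return None

ckpt = find_checkpoint(SEARCH_PATHS)
if ckpt is None:
    # Fallback mechanism: the pipeline degrades to its best single model
    print("No checkpoint found; falling back to single-model inference")
```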
Intelligence
- 🧠 Adaptive Normalization: Auto-detects raw vs. pre-normalized satellite data
- 🧠 Dynamic Weighting: Validation-driven ensemble strategy (Equal/Softmax/Proportional)
- 🧠 Self-Validation: Computes PSNR before test inference to verify quality
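The self-validation step relies on PSNR. For reference, a standard implementation for images normalized to `[0, 1]` (this is the textbook formula, not code copied from the repository):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio for images normalized to [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a prediction off by 0.1 everywhere gives 20 dB.
hr = np.zeros((64, 64, 3), dtype=np.float32)
sr = np.full_like(hr, 0.1)
print(f"PSNR: {psnr(sr, hr):.2f} dB")  # 20.00 dB
```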
Efficiency
- ⚡ Memory Optimized: Runs on consumer GPUs (T4: 15GB, P100: 16GB)
- ⚡ Multi-GPU Support: Automatic DataParallel for 2+ GPUs
- ⚡ Progress Monitoring: Real-time logging with estimated time remaining
🛠️ Installation
Prerequisites
- Python: 3.8 or higher
- GPU: CUDA-enabled with 8GB+ VRAM (16GB recommended)
- Disk Space: 5GB for models + dataset
Step 1: Clone Repository
git clone https://github.com/Aditya26189/klymo.git
cd klymo
Step 2: Install Dependencies
Option A: Using pip (Recommended)
# Install PyTorch with CUDA 11.8
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118
# Install core dependencies
pip install transformers rasterio tifffile tqdm pandas numpy
# Install Swin2SR requirements
pip install timm einops
Option B: Using conda
conda create -n worldstrat python=3.9
conda activate worldstrat
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install transformers rasterio tifffile tqdm pandas numpy timm einops
Step 3: Download Model Weights
> [!IMPORTANT]
> Model weights are NOT included in this repository due to size constraints. Download them from:
- Google Drive (~500MB)
- Hugging Face Hub
Place `.pth` files in `final-models/`:

final-models/
├── swin2sr_best.pth       # ~230MB
└── realesrgan_best.pth    # ~280MB
⚡ Quick Start
5-Minute Tutorial
# 1. Navigate to project directory
cd klymo
# 2. Verify GPU is available
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"
# 3. Run inference on sample images
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
--test_csv /path/to/test.csv \
--output_dir ./predictions
# 4. Check results
ls -lh predictions/ # Should see ~149 .tif files
Example: Processing Custom Images
from WORLDSTRAT_ENSEMBLE_CORRECTED import WorldStratInferenceDataset
import pandas as pd
# Create test dataframe
df = pd.DataFrame({
'lr_path': ['/data/sentinel2/image_001.tif'],
'location': ['test_location_001']
})
# Load dataset
dataset = WorldStratInferenceDataset(df, load_hr=False)
# Run inference (see notebook for full pipeline)
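To sanity-check the data flow without the trained weights, a nearest-neighbour stand-in for the ensemble can be used. This is purely illustrative; `upscale_4x` is not part of the repository, and real predictions come from the Swin2SR / Real-ESRGAN ensemble in the notebook:

```python
import numpy as np

def upscale_4x(lr):
    """Stand-in for the real ensemble: nearest-neighbour 4x upsampling.
    Swap in the trained ensemble for actual super-resolution."""
    return np.repeat(np.repeat(lr, 4, axis=0), 4, axis=1)

lr = np.random.rand(128, 128, 3).astype(np.float32)  # mock normalized Sentinel-2 tile
sr = upscale_4x(lr)
print(sr.shape)  # (512, 512, 3)
```

The pipeline writes each output array to disk as a `.tif` (note the `tifffile` dependency in the install step).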
📂 Project Structure
klymo/
├── 📓 ENSEMBLE_FINAL_ROBUST.ipynb # Main inference notebook (Kaggle-ready)
├── 🐍 WORLDSTRAT_ENSEMBLE_CORRECTED.py # Standalone Python script
├── 📖 README.md # This file
├── 🤝 CONTRIBUTING.md # Contribution guidelines
├── 📋 RELEASE_NOTES.md # Version history
├── 🚀 DEPLOYMENT.md # Production deployment guide
│
├── 📄 Documentation/
│ ├── ENSEMBLE_REASONING_DOCUMENT.txt # Architecture decisions (detailed)
│ └── QA_DEPLOYMENT_CHECKLIST.txt # Pre-launch checklist
│
├── 🎯 final-models/ # Trained model weights
│ ├── swin2sr_best.pth # Swin2SR checkpoint
│ └── realesrgan_best.pth # Real-ESRGAN checkpoint
│
├── 📂 sample-model/ # Training notebooks & configs
│ ├── swin2sr-ultra-max-safe-city.ipynb
│ └── model-enrgan.ipynb
│
└── 📦 archive/ # Historical experiments
💻 Usage
Option A: Jupyter Notebook (Kaggle/Colab)
Best for: Interactive execution, visualization, prototyping
- Open `ENSEMBLE_FINAL_ROBUST.ipynb` in Jupyter/Kaggle
- Configure paths in Cell 3 (Checkpoint Detection):

  MODEL_CONFIGS = {
      'swin2sr': {
          'checkpoints': ['/kaggle/input/your-weights/swin2sr_best.pth']
      },
      # ...
  }

- Run cells sequentially (Shift+Enter)
- Monitor checkmarks:
- ✅ Dependencies installed
- ✅ GPU detected
- ✅ Models loaded
- ✅ Validation passed
- ✅ Predictions generated
Option B: Standalone Script
Best for: Batch processing, production servers, CI/CD
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
--test_csv /data/worldstrat/test.csv \
--output_dir /output/predictions \
--batch_size 4 \
--num_workers 4
Arguments:
- `--test_csv`: Path to test split CSV (must have `lr_path` column)
- `--output_dir`: Directory for super-resolved images (default: `./predictions`)
- `--batch_size`: Inference batch size (default: auto-detect based on GPU)
- `--num_workers`: Data loading workers (default: 2)
🧠 Model Architecture
Ensemble Strategy
The system uses validation-driven adaptive weighting:
graph TD
A[Compute Validation PSNR] --> B{PSNR Δ?}
B -->|Δ < 0.3 dB| C[Equal Weights<br/>0.5, 0.5]
B -->|0.3 ≤ Δ ≤ 1.0 dB| D[Softmax T=2.0<br/>~0.65, 0.35]
B -->|Δ > 1.0 dB| E[Proportional<br/>~0.80, 0.20]
C --> F[Ensemble Prediction]
D --> F
E --> F
Why This Works:
- Close Performance: Equal weighting maximizes diversity
- Moderate Gap: Softmax balances contribution vs. quality
- Large Gap: Proportional prevents weak model from degrading results
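The strategy selection above can be sketched in a few lines. The thresholds follow the diagram; the exact formulas in the notebook may differ, and in particular "proportional" is implemented here as weights proportional to raw PSNR, which is one plausible reading:

```python
import math

def ensemble_weights(psnr_a, psnr_b, temperature=2.0):
    """Pick ensemble weights from the validation-PSNR gap between two models:
    equal (delta < 0.3 dB), temperature softmax (0.3-1.0 dB), or
    proportional (> 1.0 dB)."""
    delta = abs(psnr_a - psnr_b)
    if delta < 0.3:                   # near-identical models: maximize diversity
        return 0.5, 0.5
    if delta <= 1.0:                  # moderate gap: softmax balances quality
        ea = math.exp(psnr_a / temperature)
        eb = math.exp(psnr_b / temperature)
        return ea / (ea + eb), eb / (ea + eb)
    total = psnr_a + psnr_b           # large gap: weak model cannot dominate
    return psnr_a / total, psnr_b / total
```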
Model Details
Swin2SR (Transformer-based)
- Architecture: Swin Transformer V2 with shifted windows
- Depth: [6, 6, 6, 6, 6, 6] (6 stages, 6 blocks each)
- Embedding Dim: 180
- Parameters: ~28.6M
- FLOPs: ~45.2G (for 128×128 input)
- Trained on: WorldStrat + ImageNet (pre-training)
Real-ESRGAN (GAN-based)
- Generator: RRDBNet (Residual-in-Residual Dense Blocks)
- Blocks: 23 RRDB blocks
- Features: 64 base channels
- Growth: 32 channels per dense layer
- Parameters: ~16.7M
- Loss: Combination of L1 + Perceptual (VGG) + GAN
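That combined objective can be sketched as below. Here `features` stands in for a frozen VGG feature extractor, and the loss weights are illustrative defaults, not the training configuration used for the released checkpoint:

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, disc_fake, features,
                   l1_w=1.0, percep_w=1.0, gan_w=0.005):
    """Sketch of a Real-ESRGAN-style generator objective:
    pixel L1 + perceptual (feature-space) L1 + adversarial loss."""
    l1 = F.l1_loss(sr, hr)                       # pixel-level fidelity
    percep = F.l1_loss(features(sr), features(hr))  # perceptual similarity
    # adversarial term: push discriminator logits for SR toward "real"
    gan = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    return l1_w * l1 + percep_w * percep + gan_w * gan

# Dummy usage with a toy feature extractor standing in for VGG
sr = torch.rand(1, 3, 8, 8)
hr = torch.rand(1, 3, 8, 8)
disc_fake = torch.zeros(1, 1)                    # discriminator logits for sr
feat = lambda x: x.mean(dim=1, keepdim=True)
loss = generator_loss(sr, hr, disc_fake, feat)
```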
Normalization Pipeline
import numpy as np

# Sentinel-2 (Input LR)
def normalize_sentinel(img):
# Raw: uint16 [0, 3000] for RGB bands
# Normalized: float32 [0, 1]
return np.clip(img / 3000.0, 0.0, 1.0)
# WorldView-3 (Target HR)
def normalize_worldview(img):
# Raw: uint16 12-bit [0, 4095]
# Normalized: float32 [0, 1]
return np.clip(img / 4095.0, 0.0, 1.0)
📊 Performance & Results
Quantitative Metrics
| Model | Architecture | Params | Val PSNR | Val SSIM | Inference Time* |
|-------|--------------|--------|----------|----------|-----------------|
| Swin2SR | Transformer | 28.6M | 29.59 dB | 0.8421 | 0.18s/img |
| Real-ESRGAN | GAN (RRDB) | 16.7M | 29.12 dB | 0.8392 | 0.14s/img |
| Ensemble | Weighted Avg | — | 29.83 dB | 0.8456 | 0.32s/img |
*On NVIDIA T4 GPU, batch_size=1, 512×512 output
Validation Results Breakdown
Dataset: 149 validation samples from WorldStrat
Regions: Urban (45%), Rural (35%), Coastal (20%)
| Region | Swin2SR | ESRGAN | Ensemble | Δ Improvement |
|--------|---------|--------|----------|---------------|
| Urban | 30.12 dB | 29.45 dB | 30.34 dB | +0.22 dB |
| Rural | 29.28 dB | 28.93 dB | 29.52 dB | +0.24 dB |
| Coastal | 29.01 dB | 28.
