Klymo
🌍 WorldStrat Ensemble: Robust Satellite Image Super-Resolution
A production-grade ensemble system combining Transformer (Swin2SR) and GAN (Real-ESRGAN) architectures to achieve state-of-the-art 4x super-resolution for satellite imagery.
📋 Table of Contents
- Overview
- Key Features
- Installation
- Quick Start
- Project Structure
- Usage
- Model Architecture
- Performance & Results
- FAQ
- Contributing
- License
📖 Overview
WorldStrat Ensemble is a high-performance super-resolution pipeline designed specifically for the WorldStrat satellite imagery dataset. It addresses the unique challenges of satellite SR including:
- Atmospheric Noise: Cloud interference, haze, and atmospheric scattering
- Low Resolution Input: Sentinel-2 imagery at 10m/pixel → WorldView-3 quality at 2.5m/pixel
- Dynamic Ranges: Varied illumination conditions from polar to equatorial regions
- Large-Scale Inference: Handling thousands of images efficiently
Why Ensemble?
We fuse two complementary architectures:
| Model | Type | Strength | Weakness |
|-------|------|----------|----------|
| Swin2SR | Transformer | Global structure, clean edges | Less detailed textures |
| Real-ESRGAN | GAN (RRDB) | Realistic high-frequency details | Can introduce artifacts |
Result: Ensemble achieves +0.2 to +0.4 dB PSNR improvement over best single model.
🚀 Key Features
Robustness
- ✅ Crash-Proof: Gracefully handles corrupted files, missing checkpoints, GPU OOM errors
- ✅ Checkpoint Recovery: Auto-detects weights from multiple search paths
- ✅ Fallback Mechanisms: Uses best single model if ensemble fails validation
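The checkpoint-recovery behaviour can be sketched as follows. The paths below are illustrative placeholders; the actual search list lives in the notebook's `MODEL_CONFIGS`:

```python
from pathlib import Path

# Illustrative search paths; adjust for your environment (local, Kaggle, Colab).
SEARCH_PATHS = [
    "final-models/swin2sr_best.pth",
    "/kaggle/input/worldstrat-weights/swin2sr_best.pth",
]

def find_checkpoint(paths):
    """Return the first existing checkpoint path, or None if all are missing."""
    for p in paths:
        if Path(p).is_file():
            return Path(p)
    return None

ckpt = find_checkpoint(SEARCH_PATHS)
if ckpt is None:
    # Fallback mechanism: the pipeline degrades to its best single model
    print("No checkpoint found; falling back to single-model inference")
```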
Intelligence
- 🧠 Adaptive Normalization: Auto-detects raw vs. pre-normalized satellite data
- 🧠 Dynamic Weighting: Validation-driven ensemble strategy (Equal/Softmax/Proportional)
- 🧠 Self-Validation: Computes PSNR before test inference to verify quality
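The self-validation step relies on PSNR. For reference, a standard implementation for images normalized to `[0, 1]` (this is the textbook formula, not code copied from the repository):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio for images normalized to [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a prediction off by 0.1 everywhere gives 20 dB.
hr = np.zeros((64, 64, 3), dtype=np.float32)
sr = np.full_like(hr, 0.1)
print(f"PSNR: {psnr(sr, hr):.2f} dB")  # 20.00 dB
```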
Efficiency
- ⚡ Memory Optimized: Runs on consumer GPUs (T4: 15GB, P100: 16GB)
- ⚡ Multi-GPU Support: Automatic DataParallel for 2+ GPUs
- ⚡ Progress Monitoring: Real-time logging with estimated time remaining
🛠️ Installation
Prerequisites
- Python: 3.8 or higher
- GPU: CUDA-enabled with 8GB+ VRAM (16GB recommended)
- Disk Space: 5GB for models + dataset
Step 1: Clone Repository
git clone https://github.com/Aditya26189/klymo.git
cd klymo
Step 2: Install Dependencies
Option A: Using pip (Recommended)
# Install PyTorch with CUDA 11.8
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118
# Install core dependencies
pip install transformers rasterio tifffile tqdm pandas numpy
# Install Swin2SR requirements
pip install timm einops
Option B: Using conda
conda create -n worldstrat python=3.9
conda activate worldstrat
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
pip install transformers rasterio tifffile tqdm pandas numpy timm einops
Step 3: Download Model Weights
> [!IMPORTANT]
> Model weights are NOT included in this repository due to size constraints. Download them from:
- Google Drive (~500MB)
- Hugging Face Hub
Place `.pth` files in `final-models/`:

final-models/
├── swin2sr_best.pth       # ~230MB
└── realesrgan_best.pth    # ~280MB
⚡ Quick Start
5-Minute Tutorial
# 1. Navigate to project directory
cd klymo
# 2. Verify GPU is available
python -c "import torch; print('GPU:', torch.cuda.get_device_name(0))"
# 3. Run inference on sample images
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
--test_csv /path/to/test.csv \
--output_dir ./predictions
# 4. Check results
ls -lh predictions/ # Should see ~149 .tif files
Example: Processing Custom Images
from WORLDSTRAT_ENSEMBLE_CORRECTED import WorldStratInferenceDataset
import pandas as pd
# Create test dataframe
df = pd.DataFrame({
'lr_path': ['/data/sentinel2/image_001.tif'],
'location': ['test_location_001']
})
# Load dataset
dataset = WorldStratInferenceDataset(df, load_hr=False)
# Run inference (see notebook for full pipeline)
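To sanity-check the data flow without the trained weights, a nearest-neighbour stand-in for the ensemble can be used. This is purely illustrative; `upscale_4x` is not part of the repository, and real predictions come from the Swin2SR / Real-ESRGAN ensemble in the notebook:

```python
import numpy as np

def upscale_4x(lr):
    """Stand-in for the real ensemble: nearest-neighbour 4x upsampling.
    Swap in the trained ensemble for actual super-resolution."""
    return np.repeat(np.repeat(lr, 4, axis=0), 4, axis=1)

lr = np.random.rand(128, 128, 3).astype(np.float32)  # mock normalized Sentinel-2 tile
sr = upscale_4x(lr)
print(sr.shape)  # (512, 512, 3)
```

The pipeline writes each output array to disk as a `.tif` (note the `tifffile` dependency in the install step).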
📂 Project Structure
klymo/
├── 📓 ENSEMBLE_FINAL_ROBUST.ipynb # Main inference notebook (Kaggle-ready)
├── 🐍 WORLDSTRAT_ENSEMBLE_CORRECTED.py # Standalone Python script
├── 📖 README.md # This file
├── 🤝 CONTRIBUTING.md # Contribution guidelines
├── 📋 RELEASE_NOTES.md # Version history
├── 🚀 DEPLOYMENT.md # Production deployment guide
│
├── 📄 Documentation/
│ ├── ENSEMBLE_REASONING_DOCUMENT.txt # Architecture decisions (detailed)
│ └── QA_DEPLOYMENT_CHECKLIST.txt # Pre-launch checklist
│
├── 🎯 final-models/ # Trained model weights
│ ├── swin2sr_best.pth # Swin2SR checkpoint
│ └── realesrgan_best.pth # Real-ESRGAN checkpoint
│
├── 📂 sample-model/ # Training notebooks & configs
│ ├── swin2sr-ultra-max-safe-city.ipynb
│ └── model-enrgan.ipynb
│
└── 📦 archive/ # Historical experiments
💻 Usage
Option A: Jupyter Notebook (Kaggle/Colab)
Best for: Interactive execution, visualization, prototyping
- Open `ENSEMBLE_FINAL_ROBUST.ipynb` in Jupyter/Kaggle
- Configure paths in Cell 3 (Checkpoint Detection):

  MODEL_CONFIGS = {
      'swin2sr': {
          'checkpoints': ['/kaggle/input/your-weights/swin2sr_best.pth']
      },
      # ...
  }

- Run cells sequentially (Shift+Enter)
- Monitor checkmarks:
- ✅ Dependencies installed
- ✅ GPU detected
- ✅ Models loaded
- ✅ Validation passed
- ✅ Predictions generated
Option B: Standalone Script
Best for: Batch processing, production servers, CI/CD
python WORLDSTRAT_ENSEMBLE_CORRECTED.py \
--test_csv /data/worldstrat/test.csv \
--output_dir /output/predictions \
--batch_size 4 \
--num_workers 4
Arguments:
- `--test_csv`: Path to test split CSV (must have `lr_path` column)
- `--output_dir`: Directory for super-resolved images (default: `./predictions`)
- `--batch_size`: Inference batch size (default: auto-detect based on GPU)
- `--num_workers`: Data loading workers (default: 2)
🧠 Model Architecture
Ensemble Strategy
The system uses validation-driven adaptive weighting:
graph TD
A[Compute Validation PSNR] --> B{PSNR Δ?}
B -->|Δ < 0.3 dB| C[Equal Weights<br/>0.5, 0.5]
B -->|0.3 ≤ Δ ≤ 1.0 dB| D[Softmax T=2.0<br/>~0.65, 0.35]
B -->|Δ > 1.0 dB| E[Proportional<br/>~0.80, 0.20]
C --> F[Ensemble Prediction]
D --> F
E --> F
Why This Works:
- Close Performance: Equal weighting maximizes diversity
- Moderate Gap: Softmax balances contribution vs. quality
- Large Gap: Proportional prevents weak model from degrading results
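The strategy selection above can be sketched in a few lines. The thresholds follow the diagram; the exact formulas in the notebook may differ, and in particular "proportional" is implemented here as weights proportional to raw PSNR, which is one plausible reading:

```python
import math

def ensemble_weights(psnr_a, psnr_b, temperature=2.0):
    """Pick ensemble weights from the validation-PSNR gap between two models:
    equal (delta < 0.3 dB), temperature softmax (0.3-1.0 dB), or
    proportional (> 1.0 dB)."""
    delta = abs(psnr_a - psnr_b)
    if delta < 0.3:                   # near-identical models: maximize diversity
        return 0.5, 0.5
    if delta <= 1.0:                  # moderate gap: softmax balances quality
        ea = math.exp(psnr_a / temperature)
        eb = math.exp(psnr_b / temperature)
        return ea / (ea + eb), eb / (ea + eb)
    total = psnr_a + psnr_b           # large gap: weak model cannot dominate
    return psnr_a / total, psnr_b / total
```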
Model Details
Swin2SR (Transformer-based)
- Architecture: Swin Transformer V2 with shifted windows
- Depth: [6, 6, 6, 6, 6, 6] (6 stages, 6 blocks each)
- Embedding Dim: 180
- Parameters: ~28.6M
- FLOPs: ~45.2G (for 128×128 input)
- Trained on: WorldStrat + ImageNet (pre-training)
Real-ESRGAN (GAN-based)
- Generator: RRDBNet (Residual-in-Residual Dense Blocks)
- Blocks: 23 RRDB blocks
- Features: 64 base channels
- Growth: 32 channels per dense layer
- Parameters: ~16.7M
- Loss: Combination of L1 + Perceptual (VGG) + GAN
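That combined objective can be sketched as below. Here `features` stands in for a frozen VGG feature extractor, and the loss weights are illustrative defaults, not the training configuration used for the released checkpoint:

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, disc_fake, features,
                   l1_w=1.0, percep_w=1.0, gan_w=0.005):
    """Sketch of a Real-ESRGAN-style generator objective:
    pixel L1 + perceptual (feature-space) L1 + adversarial loss."""
    l1 = F.l1_loss(sr, hr)                       # pixel-level fidelity
    percep = F.l1_loss(features(sr), features(hr))  # perceptual similarity
    # adversarial term: push discriminator logits for SR toward "real"
    gan = F.binary_cross_entropy_with_logits(
        disc_fake, torch.ones_like(disc_fake))
    return l1_w * l1 + percep_w * percep + gan_w * gan

# Dummy usage with a toy feature extractor standing in for VGG
sr = torch.rand(1, 3, 8, 8)
hr = torch.rand(1, 3, 8, 8)
disc_fake = torch.zeros(1, 1)                    # discriminator logits for sr
feat = lambda x: x.mean(dim=1, keepdim=True)
loss = generator_loss(sr, hr, disc_fake, feat)
```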
Normalization Pipeline
import numpy as np

# Sentinel-2 (Input LR)
def normalize_sentinel(img):
# Raw: uint16 [0, 3000] for RGB bands
# Normalized: float32 [0, 1]
return np.clip(img / 3000.0, 0.0, 1.0)
# WorldView-3 (Target HR)
def normalize_worldview(img):
# Raw: uint16 12-bit [0, 4095]
# Normalized: float32 [0, 1]
return np.clip(img / 4095.0, 0.0, 1.0)
📊 Performance & Results
Quantitative Metrics
| Model | Architecture | Params | Val PSNR | Val SSIM | Inference Time* |
|-------|--------------|--------|----------|----------|-----------------|
| Swin2SR | Transformer | 28.6M | 29.59 dB | 0.8421 | 0.18s/img |
| Real-ESRGAN | GAN (RRDB) | 16.7M | 29.12 dB | 0.8392 | 0.14s/img |
| Ensemble | Weighted Avg | — | 29.83 dB | 0.8456 | 0.32s/img |
*On NVIDIA T4 GPU, batch_size=1, 512×512 output
Validation Results Breakdown
Dataset: 149 validation samples from WorldStrat
Regions: Urban (45%), Rural (35%), Coastal (20%)
| Region | Swin2SR | ESRGAN | Ensemble | Δ Improvement |
|--------|---------|--------|----------|---------------|
| Urban | 30.12 dB | 29.45 dB | 30.34 dB | +0.22 dB |
| Rural | 29.28 dB | 28.93 dB | 29.52 dB | +0.24 dB |
| Coastal | 29.01 dB | 28.
