
SwiftIR

Swift ML Compiler Infrastructure with Automatic Differentiation



Overview

SwiftIR compiles Swift code with automatic differentiation to XLA for hardware-accelerated execution. It traces computation graphs using Swift's type system and @differentiable attribute, generating StableHLO IR that runs on CPU, GPU, and TPU via PJRT.

Swift Code → Tracer → MLIR/StableHLO → XLA → CPU/GPU/TPU
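As a sketch of what that tracing step looks like, the following builds a tiny StableHLO module and runs it via PJRT. The API names (`JTracingContext`, `input`, `output`, `buildModule`, `execute`) are those used in the examples later in this README; operator support on traced values is assumed from the arithmetic shown there, and exact signatures may differ:

```swift
import SwiftIRJupyter

// Hedged sketch: trace a small computation into MLIR/StableHLO, then execute it.
try SwiftIRJupyter.shared.initialize()

let ctx = JTracingContext()
let x = ctx.input(shape: [4], dtype: .float32)   // symbolic placeholder, not a concrete value
let y = x * x + x                                // recorded as StableHLO ops, not executed eagerly
ctx.output(y)

let mlir = ctx.buildModule(name: "square_plus_x") // textual MLIR/StableHLO module
let result = try SwiftIRJupyter.shared.execute(mlir)
```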

Two Implementations

| | SwiftIR | SwiftIRJupyter |
|---|---|---|
| Backend | C++ MLIR bindings | Pure Swift (string-based MLIR) |
| Tracer | DifferentiableTracer | JTracer |
| Use case | Production, local development | Jupyter/Colab, rapid prototyping |
| Dependencies | MLIR C API | None (pure Swift) |

Both use Swift's native @differentiable for automatic differentiation, produce identical StableHLO output, and share the same PJRT execution layer.
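The `@differentiable` machinery both implementations build on can be seen in isolation with Swift's standard `_Differentiation` module (this requires a toolchain with differentiable programming enabled, e.g. a nightly Swift toolchain; no SwiftIR APIs are involved):

```swift
import _Differentiation

// The Swift compiler synthesizes the backward pass for this function at compile time.
@differentiable(reverse)
func f(_ x: Double) -> Double {
    x * x + 3 * x   // f'(x) = 2x + 3
}

// valueWithGradient evaluates f and its compiler-generated derivative together.
let (value, grad) = valueWithGradient(at: 2.0, of: f)
print(value, grad)   // 10.0 7.0
```

SwiftIR's tracers plug into this same mechanism, so the gradient code is generated by the compiler rather than recorded on a runtime tape.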


Building Simulation with TensorBoard Profiling

SwiftIR includes a real-world physics simulation: thermal dynamics of a radiant floor heating system. This demonstrates automatic differentiation through control flow, XLA loop fusion, and TensorBoard profiling.

SwiftIRJupyter Example

```swift
import SwiftIRJupyter
import SwiftIRProfiler

// Initialize backend and profiler
try SwiftIRJupyter.shared.initialize()
let profiler = try PJRTProfiler.create()
try profiler.start()

// Physics simulation with custom trace annotations
// (physical constants and initial state tensors are set up earlier, omitted here)
let ctx = JTracingContext()

for epoch in 0..<numEpochs {
    try pjrtTrainStep(epoch) {
        // Trace graph construction
        try pjrtTraced("trace_simulation") {
            let dummy = ctx.input(shape: [], dtype: .float32)

            // Native while loop → stablehlo.while (O(1) compilation)
            let (_, finalSlab, _, _) = jWhileLoop4(
                initial: (iter, slabTemp, quantaTemp, tankTemp),
                condition: { $0.0 < maxSteps },
                body: { state in
                    // Heat transfer: Tank (70C) → Fluid → Floor slab
                    let conductance = one / (resistance * thickness / area)
                    let heatToSlab = (state.2 - state.1) * conductance * dt
                    let newSlab = state.1 + heatToSlab / (slabCp * slabMass)
                    // newQuanta and newTank follow from the analogous fluid/tank heat balances (omitted)
                    return (state.0 + 1, newSlab, newQuanta, newTank)
                }
            )

            // Loss function
            let loss = (finalSlab - target) * (finalSlab - target)
            ctx.output(loss)
        }

        // Compile and execute
        try pjrtTraced("xla_execution") {
            let mlir = ctx.buildModule(name: "simulation")
            let result = try SwiftIRJupyter.shared.execute(mlir)
        }
    }
}

// Export profile for TensorBoard
try profiler.stop()
let data = try profiler.collectData()
try PJRTProfiler.exportToFile(data, filepath: "/tmp/profile/host.xplane.pb")
```

SwiftIR Example (Native @differentiable)

```swift
import SwiftIR
import SwiftIRXLA
import SwiftIRProfiler

// Initialize PJRT and profiler
let client = try PJRTCPUClient()
let profiler = try PJRTProfiler.create()
try profiler.start()

// Native Swift differentiable function
// (physical constants such as resistance, dt, and slabCapacity are defined earlier, omitted here)
@differentiable(reverse)
func simulateTimestep(
    _ slab: DifferentiableTracer,
    _ quanta: DifferentiableTracer,
    _ tank: DifferentiableTracer
) -> (DifferentiableTracer, DifferentiableTracer, DifferentiableTracer) {
    let conductance = createConstant(1.0, shape: [], dtype: .float32) / resistance
    let heatToSlab = (quanta - slab) * conductance * dt
    // newQuanta and newTank follow from the analogous fluid/tank heat balances (omitted)
    return (slab + heatToSlab / slabCapacity, newQuanta, newTank)
}

// diffWhileLoop with gradient support
let (_, finalState, _, _) = diffWhileLoop(
    initial: (iteration, slabTemp, quantaTemp, tankTemp),
    condition: { $0.0 < maxSteps },
    body: { state in
        let (s, q, t) = simulateTimestep(state.1, state.2, state.3)
        return (state.0 + 1, s, q, t)
    }
)

// Compute gradients through the entire simulation
let grad = gradient(at: initialParams) { params in
    let final = simulate(params)
    return (final - target).pow(2).sum()
}

// Export profile
try profiler.stop()
try PJRTProfiler.exportToFile(profiler.collectData(), filepath: "/tmp/profile/host.xplane.pb")
```

Run examples and view in TensorBoard:

```bash
# Run profiled simulation
SWIFTIR_DEPS=/opt/swiftir-deps LD_LIBRARY_PATH=/opt/swiftir-deps/lib \
  swift run JupyterProfiledSimulation

# Launch TensorBoard
tensorboard --logdir=/tmp/swiftir_jupyter_profile --port=6006
# Open http://localhost:6006 → Profile tab
```

How SwiftIR Works

SwiftIR leverages Swift's native automatic differentiation (@differentiable) to generate gradient computations at compile time, then accelerates execution through the MLIR → OpenXLA pipeline:

┌─────────────────────────────────────────────────────────────────────────┐
│  Swift Source Code                                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐│
│  │ @differentiable(reverse)                                            ││
│  │ func blackScholes(spot: Tracer, strike: Tracer, ...) -> Tracer {   ││
│  │     let d1 = (log(spot / strike) + ...) / (vol * sqrt(time))       ││
│  │     return spot * normalCDF(d1) - strike * exp(-rate*time) * ...   ││
│  │ }                                                                   ││
│  └─────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Swift Compiler (SIL Differentiation Transform)                         │
│  • Generates forward pass + backward pass (pullback) at compile time    │
│  • Type-safe gradient propagation through all operations                │
│  • Zero runtime overhead for gradient tape construction                 │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  DifferentiableTracer / JTracer (Graph Capture)                         │
│  • Captures computation graph during execution                          │
│  • Traces both forward and gradient operations                          │
│  • Outputs MLIR/StableHLO intermediate representation                   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  MLIR / StableHLO                                                        │
│  • Hardware-agnostic tensor operations                                  │
│  • Optimizations: CSE, DCE, constant folding                            │
│  • Portable across CPU, GPU, TPU                                        │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  OpenXLA (XLA Compiler)                                                  │
│  • SIMD vectorization (AVX-512, NEON)                                   │
│  • Operation fusion (eliminates intermediate allocations)               │
│  • Memory layout optimization                                           │
│  • Target-specific code generation                                      │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  PJRT Runtime                                                            │
│  • Pluggable hardware backends (CPU, GPU, TPU)                          │
│  • Async execution and memory management                                │
│  • Multi-device orchestration                                           │
└─────────────────────────────────────────────────────────────────────────┘
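The SIL differentiation step in the pipeline above is observable directly in standard Swift: the compiler emits the backward pass as a closure (a pullback) rather than recording a tape at runtime. A minimal sketch using the standard `_Differentiation` module (no SwiftIR APIs):

```swift
import _Differentiation

@differentiable(reverse)
func g(_ x: Double) -> Double { x * x * x }

// pullback returns the compile-time-generated backward pass as a closure:
// given a cotangent v, it returns v * g'(x).
let pb = pullback(at: 2.0, of: g)
print(pb(1.0))   // 12.0  (g'(x) = 3x², at x = 2)
```

SwiftIR's tracers capture exactly these pullback operations into the graph, so the gradient computation itself is fused and vectorized by XLA.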

Key Benefits

| Aspect | SwiftIR Approach | Traditional AD Frameworks |
|--------|------------------|---------------------------|
| Gradient Generation | Swift compiler (compile-time) | Runtime tape recording |
| Type Safety | Full Swift type checking | Runtime shape errors |
| Execution | XLA-compiled, vectorized | Interpreted or JIT |
| Memory | XLA fusion eliminates intermediates | Tape stores all intermediates |


Performance

Quantitative Finance Benchmarks

SwiftIR excels at gradient-heavy financial computations like Black-Scholes option pricing with Greeks (Delta). These benchmarks compare computing option prices AND their derivatives (sensitivities) across different approaches:
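For reference, this is the quantity being benchmarked: a Black-Scholes call price together with its Delta (the sensitivity to spot). A plain-Swift sketch using the closed-form Delta, N(d1), rather than AD (function and parameter names here are illustrative, not SwiftIR APIs):

```swift
import Foundation

// Standard normal CDF via the error function.
func normalCDF(_ x: Double) -> Double {
    0.5 * (1.0 + erf(x / 2.0.squareRoot()))
}

// Black-Scholes call price and its Delta (dPrice/dSpot = N(d1), closed form).
func blackScholesCall(spot: Double, strike: Double, rate: Double,
                      vol: Double, time: Double) -> (price: Double, delta: Double) {
    let d1 = (log(spot / strike) + (rate + vol * vol / 2) * time) / (vol * time.squareRoot())
    let d2 = d1 - vol * time.squareRoot()
    let price = spot * normalCDF(d1) - strike * exp(-rate * time) * normalCDF(d2)
    return (price, normalCDF(d1))
}

let (p, delta) = blackScholesCall(spot: 100, strike: 100, rate: 0.05, vol: 0.2, time: 1)
// At-the-money 1-year call: price ≈ 10.45, delta ≈ 0.64
```

The benchmarks below compute the same price and Delta per option, but obtain the derivative by automatic differentiation rather than the closed form.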

Performance Comparison (1M Options, CPU)

| Implementation | Options/sec | Time (1M) | vs Pure Swift | vs Swift AD |
|----------------|-------------|-----------|---------------|-------------|
| SwiftIR-Traced | 47.2M | 21.2ms | 1.35x | 74.4x |
| Pure Swift (no gradients) | 34.9M | 28.6ms | 1.0x | — |
| Swift _Differentiation | 634K | 1,577ms | 0.018x | 1.0x |

Why SwiftIR Is Fast

| Optimization | Swift AD | SwiftIR-XLA | Impact |
|--------------|----------|-------------|--------|
| SIMD Vectorization | None (scalar) | AVX-512/NEON | 8-16x throughput |
| Operation Fusion | None (each op allocates) | Fused kernels | Eliminates memory bandwidth |
| Gradient Tape | R | | |
