MagiCompiler

Break the Boundaries of Local Compilation for Large Models

📢 Latest News

[03/25/2026] ⚡️ LightX2V-MagiCompiler is now available! This fork of LightX2V showcases how to seamlessly integrate MagiCompiler into a SOTA framework. With minimal code changes, it unlocks even greater acceleration! Try it out, check the benchmark for details, and stay tuned for more integration demos!
[03/23/2026] 🚀 MagiCompiler is officially open-sourced! It delivers whole-graph compilation for multi-modality inference and FSDP-aware whole-layer compilation for large-model training.

📖 About

MagiCompiler is an advanced compiler and runtime augmentation framework built on top of torch.compile. Designed specifically for large-scale Transformer-like architectures, it addresses the critical bottlenecks of memory walls and operator overheads.

By stepping beyond traditional local operator optimization, MagiCompiler introduces system-level optimizations, seamlessly accelerating both training and multi-modality inference workloads with minimal code intrusion.

💡 Design Philosophy

Compiler as Manager

"Reimagining the compiler: from generating kernels to orchestrating the entire dataflow."

MagiCompiler's core philosophy is Compiler as Manager. We believe a modern deep learning compiler should not be restricted to kernel fusion. Instead, it should act as a global manager that owns the full lifecycle of execution. MagiCompiler manages subgraph dispatching, orchestrates dataflow dynamically (for example, offloading and prefetching), and controls memory allocation to balance compute efficiency against memory footprint.

Key Features

🎯 1. Unified Inference & Training

Tailored for Transformer-like architectures with scenario-specific strategies:

Inference: Achieves full-graph capture across Transformer boundaries, maximizing kernel fusion scope.
Training: Introduces FSDP-aware layer-wise compilation, which unlocks aggressive cross-op fusion while keeping distributed parameter sharding entirely transparent.

⚡️ 2. Easy to Use, Free Gain, Plug and Play

No complex model refactoring is needed. Two decorators can deliver extra speedups of 20% or more out of the box and integrate into SOTA multi-modality frameworks.

🧠 3. Smart Asynchronous Offloading

For memory-constrained setups, the built-in selective offloading policy overlaps host-to-device (H2D) transfers with computation, hiding transfer latency and eliminating pipeline bubbles.
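To make the overlap idea concrete, here is a back-of-the-envelope timeline model in plain Python. It is only a sketch under stated assumptions, not MagiCompiler's scheduler: the function names and the per-layer timings are hypothetical, copies are assumed to run back to back on a dedicated copy stream, and device memory is assumed large enough to stage prefetched weights. It compares a serial schedule (copy a layer's weights, then compute) with a prefetching schedule in which a layer's compute starts as soon as both its own copy and the previous layer's compute have finished.

```python
def serial_time(h2d_ms, compute_ms):
    """Total time when every H2D copy blocks the compute that follows it."""
    return sum(h2d_ms) + sum(compute_ms)


def prefetch_time(h2d_ms, compute_ms):
    """Total time when copies run on their own stream, ahead of compute.

    Assumption (illustrative only): copies are issued back to back, and
    layer i may start computing once its copy and layer i-1's compute
    are both done.
    """
    copy_done = 0.0     # time at which the latest copy finishes
    compute_done = 0.0  # time at which the latest compute finishes
    for h2d, comp in zip(h2d_ms, compute_ms):
        copy_done += h2d
        compute_done = max(compute_done, copy_done) + comp
    return compute_done


# Hypothetical per-layer timings in milliseconds.
compute_bound = ([2, 2, 2, 2], [3, 3, 3, 3])   # copies shorter than compute
transfer_bound = ([4, 4, 4, 4], [3, 3, 3, 3])  # copies longer than compute

print(serial_time(*compute_bound), prefetch_time(*compute_bound))
print(serial_time(*transfer_bound), prefetch_time(*transfer_bound))
```

In the compute-bound case only the first copy stays exposed, so the prefetching schedule costs the first transfer plus the total compute. In the transfer-bound case the copy stream becomes the bottleneck, which is why a selective policy that offloads only some tensors matters.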

♻️ 4. Heuristic Activation Recomputation

Say goodbye to manual torch.utils.checkpoint. MagiCompiler automatically saves the outputs of compute-bound ops (e.g., MatMul, Attention) and recomputes memory-bound ones during the backward pass, slashing peak memory without sacrificing throughput.
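One way to picture such a heuristic is a roofline-style rule: keep the outputs of ops with high arithmetic intensity (expensive to redo) and recompute ops with low intensity (cheap to redo). The sketch below is an illustration under assumptions, not MagiCompiler's actual policy: `OpCost`, `plan_recompute`, the threshold of 10 FLOPs per byte, and the per-element FLOP counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCost:
    name: str
    flops: float        # floating-point operations needed to redo the op
    bytes_moved: float  # bytes read plus bytes written by the op


def plan_recompute(ops, intensity_threshold=10.0):
    """Split op names into (save, recompute) by FLOPs per byte moved."""
    save, recompute = [], []
    for op in ops:
        intensity = op.flops / op.bytes_moved
        if intensity >= intensity_threshold:
            save.append(op.name)       # compute-bound: keep its output
        else:
            recompute.append(op.name)  # memory-bound: cheap to redo
    return save, recompute


# Illustrative fp16 costs (2 bytes per element) for a 4096 x 4096 problem.
M = N = K = 4096
ops = [
    OpCost("matmul", flops=2 * M * N * K, bytes_moved=2 * (M * K + K * N + M * N)),
    OpCost("gelu", flops=8 * M * N, bytes_moved=2 * 2 * M * N),
    OpCost("layer_norm", flops=10 * M * N, bytes_moved=2 * 2 * M * N),
]

save, recompute = plan_recompute(ops)
print("save:", save)
print("recompute:", recompute)
```

With these numbers the matmul lands far above the threshold while the elementwise and normalization ops fall below it, matching the split described above.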

🛠 5. Magi Depyf Introspection

Meet magi_depyf, MagiCompiler’s native introspection toolkit. Compilation timelines, decompiled bytecode flows, split subgraphs, and backend artifacts are automatically dumped into the cache path as organized, human-readable files for easier debugging.

⚙️ Installation

Requirements:

Python >= 3.12
PyTorch >= 2.9
CUDA Toolkit

Recommended for reproducibility: start from the prebuilt Docker image first, then run examples inside the container.

# Option A (recommended) — Use prebuilt image
# Step 1 — Pull the image
docker pull sandai/magi-compiler:latest
# Step 2 - Start the container
docker run --name my-magi-compiler -it -d --privileged --gpus all --network host --ipc host \
  -v /path/on/host:/workspace sandai/magi-compiler:latest /bin/bash
# Step 3 - Attach to the running container
docker exec -it my-magi-compiler /bin/bash

# Option B — Local source installation
# Step 1 — Clone the repo
git clone https://github.com/SandAI-org/MagiCompiler.git
cd MagiCompiler

# Step 2 — System dependencies (optional, for FX graph visualization; Debian/Ubuntu)
sudo apt update && sudo apt install -y graphviz

# Step 3 — Python dependencies
pip install -r requirements.txt

# Step 4 — Install MagiCompiler (pick one)
pip install .   # End users (recommended)
# pip install -e . --no-build-isolation --config-settings editable_mode=compat  # Developer / editable

🚀 Quick Start

🧹 1. One Decorator to Rule Them All (`@magi_compile`)

Remove scattered torch.compile or torch.compiler.disable calls. Decorate your core Transformer block once for automatic full-graph capture and dynamic shape support (defaulting to dim 0).

import torch
from torch import nn
from magi_compiler import magi_compile

# Decorate your core module once. No more scattered compile tweaks!
@magi_compile
class TransformerBlock(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        # Attention and MLP stand in for your own nn.Module implementations (not shown here).
        self.attn = Attention(hidden_dim)
        self.mlp = MLP(hidden_dim)

    def forward(self, x: torch.Tensor, mask: torch.Tensor | None) -> torch.Tensor:
        x = x + self.attn(x, mask)
        x = x + self.mlp(x)
        return x

model = TransformerBlock(hidden_dim=1024).cuda()

# Execute normally - whole-graph compilation handles dynamic batches automatically!
out = model(torch.randn(4, 128, 1024, device="cuda"), None)
out = model(torch.randn(8, 128, 1024, device="cuda"), None)

🛠️ 2. Bridge Custom Kernels (`@magi_register_custom_op`)

Using custom kernels (FlashAttention, MoE routers) that break FX tracing? Don't disable compilation. Wrap them to teach the compiler how to handle them during graph partitioning and recomputation.

from magi_compiler import m
