HEC
Equivalence Verification Checking for Code Transformation via Equality Saturation
Install / Use
/learn @Yu-Maryland/HECREADME
HEC: Equivalence Verification Checking for Code Transformation via Equality Saturation (ATC'25)
HEC is an MLIR-fronted framework that uses e-graphs to prove source-to-source transformations correct (control-flow and datapath). It normalizes MLIR, lowers it into an e-graph friendly form, and uses a customized egg backend with dynamic + static rewrites (tiling, unrolling, fusion, algebraic simplification, const folding) to certify equivalence. Our ATC’25 paper shows HEC scaling to 100K+ lines of PolyBenchC MLIR in ~40 minutes and uncovering real mlir-opt bugs (loop boundary checks and fusion RAW violations).
✨ What’s special
- 🧠 Holistic MLIR-aware front end: Preprocesses SSA, parses affine/loop constructs, and builds a typed hypergraph that unifies variable renaming, loop decomposition, and op tracking—covering both control-flow and datapath transformations.
- 🧩 Hybrid transformation-aware rewrites: Static datapath rules plus dynamic control-flow rules (tiling, unrolling, fusion, algebraic simplification, const folding) embedded in
egg, adaptable to runtime-dependent patterns. - 🔍 Proof-oriented extraction: Equality saturation explores the transformation space; minimal-cost extraction certifies equivalence and spots divergences.
- ✅ Formal verification with Z3: Dynamic loop transformations (unrolling) are formally verified using Z3 SMT solver to ensure arithmetic iteration-count identities hold before applying rewrites.
- 🧪 Real-world validation & findings: Scales to 100K+ lines of PolyBenchC MLIR in ~40 minutes; uncovered
mlir-optbugs (loop boundary checks and fusion read-after-write violations).
🔄 Static vs Dynamic Rewrite Rules
HEC employs two categories of rewrite rules to handle the diverse transformations in MLIR optimization:
Static Rules (Pattern-Based)
These are traditional e-graph rewrite rules encoded as simple pattern-match-and-replace operations:
- Algebraic simplification:
x + 0 → x,x * 1 → x, associativity, commutativity - Constant folding:
2 + 3 → 5 - Datapath identities: arithmetic and logical identities that hold universally
Static rules are always safe to apply—they're encoded directly in the e-graph rewrite system and proven correct by construction.
Dynamic Rules (Runtime-Dependent)
These handle complex control-flow transformations that depend on runtime loop parameters and require semantic analysis:
- Loop tiling: Splits loops into blocked iterations with specific tile sizes
- Loop unrolling: Merges adjacent loop segments with different step sizes
- Loop fusion: Combines consecutive loops under data-dependency constraints
Dynamic rules use custom Applier implementations that:
- Pattern-match on loop structure in the e-graph
- Extract runtime parameters (bounds, step sizes)
- Verify arithmetic correctness via Z3 (for unrolling)
- Construct the transformed loop only if verification passes
Z3-Based Formal Verification
For loop unrolling, HEC uses the Z3 SMT solver to formally verify that merging two adjacent loops preserves iteration semantics. Specifically, when merging loops with parameters (a1, b1, c1) and (a2, b2, c2):
Loop 1: for i in range(a1, b1, step=c1)
Loop 2: for i in range(a2, b2, step=c2) // where b1 == a2
Before applying the merge, Z3 proves the iteration-count identity:
relu(⌈(k-a1)/c1⌉) · (c1/c2) + relu(⌈(b2-k)/c2⌉) = relu(⌈(b2-a1)/c2⌉)
Where relu(x) = max(0, x) and k = b1 = a2 is the boundary point. This ensures the merged loop executes the exact same number of iterations. Z3 searches for counterexamples to the negation of this formula—Unsat means the transformation is provably correct, and only then does HEC apply the rewrite.
This SMT-backed verification lets HEC safely explore aggressive loop transformations while maintaining soundness guarantees that purely syntactic e-graph rewrites cannot provide.
📂 Repository layout
mlir/— PolyBench-style MLIR kernels with many optimized variants.mlir_parser.py— MLIR → hypergraph → serialized e-graph (JSON per program line).preprocess.py— SSA renaming/normalization used by the parser.checker/— Rust driver that merges e-graphs, runs the rewrite set, and reports equivalence.egg/— Local fork ofeggwith added rewrites and static Z3 linkage (static-link-z3).
🛠️ Prerequisites
- Python 3.9+ (
networkx,ordered-set,numpyviarequirements.txt). - Rust toolchain 1.89 (
rustup toolchain install 1.89). - Build tools for the bundled Z3 build:
cmake,gcc/clang, system linker.
⚙️ Setup
cd HEC
pip install -r requirements.txt
cd checker
cargo build --release
▶️ Run an equivalence check (example)
From the repo root, pipe each program (baseline first) into the checker:
(
python mlir_parser.py -i mlir/2mm/2mm.mlir
python mlir_parser.py -i mlir/2mm/unroll_16_unroll_16_2mm.mlir
) | cargo run --release --manifest-path checker/Cargo.toml
What happens step by step:
mlir_parser.pynormalizes the MLIR (SSA renaming), builds a hypergraph, then prints a serialized e-graph JSON (one line per program).- The Rust checker reads all lines, merges them into one e-graph, and runs the domain rewrites (tiling, unrolling, fusion, algebraic/const folding).
- After equality saturation, it extracts the lowest-cost expression per root and compares every program against program 0.
What you’ll see in the log:
- Best extracted expression for each program.
- Equivalence summary: for each non-baseline program, “equivalent” if there is a shared e-class id (non-empty list), otherwise “NOT equivalent.”
- Runtime, iteration count, and e-graph size for situational awareness.
Compare more than two programs by adding more mlir_parser.py -i … lines; program 0 is always the baseline (the first line of input) and every other program is checked against it. If a path is wrong or a parse fails, the checker exits early with an error to keep results trustworthy.
🔧 Useful commands
- Normalize MLIR only:
python preprocess.py -i mlir/2mm/2mm.mlir -o /tmp/normalized.mlir --mode rename - Parser only (prints JSON):
python mlir_parser.py -i mlir/2mm/2mm.mlir - Adjust logging:
RUST_LOG=info cargo run --release --manifest-path checker/Cargo.toml
🩺 Troubleshooting
- Edition errors →
rustup override set 1.89in the repo. - Z3 build issues → ensure
cmakeand a C/C++ compiler are installed; rebuildcargo clean -p z3-sys && cargo build --release --manifest-path checker/Cargo.toml. - Large benchmarks → reduce rewrite set or pick smaller MLIR variants to keep node/iteration limits in check.
📖 Citation
If you use HEC, please cite:
@article{yin2025hec,
title={HEC: Equivalence Verification Checking for Code Transformation via Equality Saturation},
author={Yin, Jiaqi and Song, Zhan and Agostini, Nicolas Bohm and Tumeo, Antonino and Yu, Cunxi},
journal={arXiv preprint arXiv:2506.02290},
year={2025}
}
Related Skills
node-connect
352.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
111.1kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
352.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
352.0kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
