diffalayze – Automated Binary Diffing & LLM-Assisted Security Analysis

diffalayze is a versatile toolkit for automating patch diffing of binary targets and enriching the results with deep-dive analysis from large language models (LLMs). It is designed for reverse engineers, vulnerability researchers, and security teams who need to track software changes, highlight potentially security-relevant modifications, and produce actionable insights quickly.

What is diffalayze?

Check out the blog post!

diffalayze orchestrates the entire patch diffing workflow:

Fetches old and new versions of a binary automatically
Runs side-by-side diffs using Ghidriff in Docker
Analyzes results using an LLM pipeline with structured scoring and severity levels
Archives every run for reproducibility and auditing

The result: fast, repeatable, and AI-enhanced binary patch analysis.

Project Structure

┌─ diffalayze.py       → Main orchestration (CLI, Docker, Ghidriff, LLM)
│
├─ utils/
│   ├─ llm_client.py   → Backend-agnostic LLM helper (Ollama, OpenAI, Claude)
│   ├─ llmanalyze.py   → Pocketflow-based diff analysis pipeline
│   └─ prompts.yaml    → Prompt definitions for multi-stage LLM analysis
│
└─ targets/
    └─ <product-dir>/  → Target-specific directory (binaries, diffs, archives)
        └─ fetch_target.py  → Script to download, extract & compare versions

Key Features

Automated Binary Diffing Runs Ghidriff in a Docker container to produce detailed side-by-side (SxS) diffs.
LLM-Powered Security Analysis Converts raw diffs into structured, severity-rated reports (NONE → CRITICAL).
Modular & Backend-Agnostic Plug-and-play with OpenAI, Claude (Anthropic), or Ollama.
Trigger-Based Actions Execute custom commands automatically when a severity threshold is met.
Versioned Archiving Every run is timestamped and stored for future comparison and auditing.

Installation

1. Ghidriff (Docker)

docker pull ghcr.io/clearbluejar/ghidriff:latest

You must have permission to run Docker containers (e.g., be a member of the docker group).

2. Python Dependencies

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

3. Optional Backends

pip install openai anthropic

Defining a Target

A target lives in targets/<product-name>/ and must include a fetch_target.py script that:

Downloads the old and new binary versions
Verifies if a new version is available

Outputs the absolute paths to both versions, e.g.:

/home/user/diffalayze/targets/mrxsmb/old.mrxsmb.sys /home/user/diffalayze/targets/mrxsmb/new.mrxsmb.sys

Tip: For Windows binaries, you can use the provided Winbindex-based example in targets/. Just adjust:

dbfile
filename
windows_version

diffalayze supports any architecture Ghidra can handle – not just Windows binaries.

Usage

Show all CLI options

python3 diffalayze.py -h

Diff + Analyze all targets

export OPENAI_API_KEY="sk-..."
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt

python3 diffalayze.py all -f -a -lb openai -lm gpt-5

Reports are saved to:

targets/<target>/archive/ghidriffs_<timestamp>/analysis.md

Diff + Analyze a single target

python3 diffalayze.py tdx-1607 -f -a -lb openai -lm gpt-5-nano -lv

Diff only

python3 diffalayze.py tdx-1607 -f

Analyze only (existing Ghidriff output)

python3 utils/llmanalyze.py -i <path-to-sxs-output> -v -lb ollama -tn TDX

Trigger a custom action when severity threshold is met

python3 diffalayze.py tdx-1607 -f -a -llt MEDIUM -ltc ./notify.sh -lb openai -lm gpt-5-nano

Environment Variables

| Variable | Description | | ------------------- | --------------------------------------------------- | | OPENAI_API_KEY | API key for OpenAI | | ANTHROPIC_API_KEY | API key for Claude (Anthropic) | | OLLAMA_URL | Ollama endpoint (default: http://localhost:11434) | | SSL_CERT_FILE | CA certificate file path |

LLM Prompt Stages

prompts.yaml defines three stages:

per_doc – Analyzes each diffed function in isolation
final_synthesis – Combines per-function insights into a full report
evaluation – Outputs a structured JSON with level, score, summary, and recommended actions

Easily customizable to match your security policy or analysis style.

Notes & Best Practices

Token usage Expect ~8K tokens per changed function + ~5K for the summary (varies by code size and model).
Human verification required LLM results are assistive, not authoritative – always review findings manually.

Credits

This project builds on the excellent work of:

Disclaimer

Use responsibly and only with permission from all relevant parties. This toolkit is intended for educational and research purposes only.

Diffalayze

Install / Use

README