Ghidriff
Python Command-Line Ghidra Binary Diffing Engine
Ghidriff - Ghidra Binary Diffing Engine
ghidriff provides a command-line binary diffing capability with a fresh take on diffing workflow and results.
It leverages the power of Ghidra's ProgramAPI and FlatProgramAPI to find the added, deleted, and modified functions of two arbitrary binaries. It is written in Python3 using pyghidra to orchestrate Ghidra and jpype as the Python to Java interface to Ghidra.
Its primary use case is patch diffing. Its ability to perform a patch diff with a single command makes it ideal for automated analysis. The diffing results are stored in JSON and rendered in markdown (optionally side-by-side HTML). The markdown output promotes "social" diffing, as results are easy to publish in a gist or include in your next writeup or blog post.
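The difference between a unified diff and a side-by-side view is easiest to see on a concrete example. The following is a minimal sketch that uses Python's standard difflib module on two hypothetical decompiled versions of one function. It only illustrates the two output styles and does not claim that ghidriff renders its diffs this way. The function text and the helper names unified and side_by_side_html are made up for the example.

```python
import difflib
from typing import List

# Hypothetical decompiler output for one function in two builds (illustrative only).
OLD_FUNC = """int check_len(int len) {
    if (len > 0x100) {
        return -1;
    }
    return 0;
}""".splitlines()

NEW_FUNC = """int check_len(int len) {
    if (len < 0 || len > 0x100) {
        return -1;
    }
    return 0;
}""".splitlines()


def unified(old_lines: List[str], new_lines: List[str], name: str) -> str:
    """Unified diff text, the style that fits in a markdown diff code block."""
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile=f"{name} (old)", tofile=f"{name} (new)",
            lineterm="",  # lines from splitlines() carry no trailing newline
        )
    )


def side_by_side_html(old_lines: List[str], new_lines: List[str], name: str) -> str:
    """Complete HTML page containing a two-column, side-by-side comparison table."""
    return difflib.HtmlDiff().make_file(
        old_lines, new_lines, fromdesc=f"{name} (old)", todesc=f"{name} (new)"
    )


print(unified(OLD_FUNC, NEW_FUNC, "check_len"))
```

The unified form is compact and pastes directly into a gist. The HTML table is larger, but long functions are easier to read in it.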
High Level
flowchart LR
a(old binary - rpcrt4.dll-v1) --> b[GhidraDiffEngine]
c(new binary - rpcrt4.dll-v2) --> b
b --> e(Ghidra Project Files)
b --> diffs_output_dir
subgraph diffs_output_dir
direction LR
i(rpcrt4.dll-v1-v2.diff.md)
h(rpcrt4.dll-v1-v2.diff.json)
j(rpcrt4.dll-v1-v2.diff.side-by-side.html)
end
Sample Diffs
<div> <a href="https://gist.github.com/clearbluejar/b95ae854a92ee917cd0b5c7055b60282"><img width="30%" align=top alt="image" src="https://github.com/clearbluejar/ghidriff/assets/3752074/d53b681f-8cc9-479c-af4c-5ec697cf4989"></a> <a href="https://gist.github.com/clearbluejar/b95ae854a92ee917cd0b5c7055b60282#visual-chart-diff"><img width="30%" align=top alt="image" src="https://github.com/clearbluejar/ghidriff/assets/3752074/16d7ae4c-4df9-4bcd-b4af-0ce576d49ad1"></a> <a href="https://diffpreview.github.io/?f6fecbc507a9f1a92c9231e3db7ef40d"><img width="30%" align=top src="https://github.com/clearbluejar/ghidriff/assets/3752074/662ed834-738d-4be1-96c3-8500ccab9591"/></a> </div>
Features
- Command Line (patch diffing workflow reduced to a single step)
- Highlights important changes in the TOC
- Fast - Can diff the full Windows kernel in less than a minute (after Ghidra analysis is complete)
- Enables Social Diffing
  - Beautiful Markdown Output
  - Easily hosted in a GitHub or GitLab gist, blog, or anywhere markdown is supported
  - Visual Diff Graph Results
- Supports both unified and side-by-side diff results (unified is default)
- Provides unique Meta Diffs:
  - Binary Strings
  - Called
  - Calling
  - Binary Metadata
- Batteries Included
  - Docker support
  - Automated Testing
  - Ghidra (No license required)
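The Binary Strings, Called, and Calling meta diffs can be thought of as set comparisons between the two binaries. The sketch below assumes the strings (or callee/caller names) have already been extracted into plain Python lists. The helper set_meta_diff and the sample data are hypothetical, and ghidriff's own meta diffs may record more detail than membership.

```python
from typing import Dict, Iterable, List


def set_meta_diff(old_items: Iterable[str], new_items: Iterable[str]) -> Dict[str, List[str]]:
    """Compare two collections by membership only (order and duplicates are ignored)."""
    old_set, new_set = set(old_items), set(new_items)
    return {
        "added": sorted(new_set - old_set),    # present only in the new binary
        "deleted": sorted(old_set - new_set),  # present only in the old binary
    }


# Hypothetical string tables from two builds.
old_strings = ["RpcBindingFree", "Invalid handle", "%s:%d"]
new_strings = ["RpcBindingFree", "Invalid handle: %p", "%s:%d", "Buffer too small"]
print(set_meta_diff(old_strings, new_strings))
# {'added': ['Buffer too small', 'Invalid handle: %p'], 'deleted': ['Invalid handle']}

# The same comparison works for the callees of one matched function.
print(set_meta_diff(["memcpy", "free"], ["memcpy_s", "free"]))
# {'added': ['memcpy_s'], 'deleted': ['memcpy']}
```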
See below for CVE diffs and sample usage
Design Goals
- Find all added, deleted, and modified functions
- Provide foundation for automation
- Simple, Fast, Accurate
- Resilient
- Extendable
- Easy sharing of results
- Social Diffing
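Once functions have been matched, the first design goal (added, deleted, and modified functions) reduces to set arithmetic. The following is a minimal sketch that assumes functions are matched purely by symbol name and that each function's raw bytes are available in a dict. The names classify and FuncChanges and the sample data are hypothetical. Real binaries also differ in relocated addresses, so a plain byte hash like this one would over-report modifications.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FuncChanges:
    added: List[str]
    deleted: List[str]
    modified: List[str]


def body_hash(body: bytes) -> str:
    """Digest of a function's bytes; any byte change produces a different digest."""
    return hashlib.sha256(body).hexdigest()


def classify(old: Dict[str, bytes], new: Dict[str, bytes]) -> FuncChanges:
    """Classify functions keyed by symbol name (hypothetical input format)."""
    old_names, new_names = set(old), set(new)
    common = old_names & new_names  # matched by name in both binaries
    return FuncChanges(
        added=sorted(new_names - old_names),
        deleted=sorted(old_names - new_names),
        modified=sorted(n for n in common if body_hash(old[n]) != body_hash(new[n])),
    )


old_funcs = {"main": b"\x55\x89\xe5", "parse": b"\x90\x90", "legacy": b"\xc3"}
new_funcs = {"main": b"\x55\x89\xe5", "parse": b"\x90\x91", "validate": b"\xcc"}
print(classify(old_funcs, new_funcs))
# FuncChanges(added=['validate'], deleted=['legacy'], modified=['parse'])
```

Name-based matching like this breaks down when symbols are missing or renamed. That gap is what the structural and correlator-based approaches are meant to cover.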
Powered by Ghidra
The heavy lifting of the binary analysis is done by Ghidra, and the diffing is made possible by Ghidra's Program API. ghidriff provides the diffing workflow, the function matching, and the resulting markdown and HTML diff output.
Docs
Engine
<p align='center'> <img src="https://user-images.githubusercontent.com/3752074/229976340-96394970-152f-4d88-9fe4-a46589b31c50.png" height="300"> </p>
An "engine" is a self-contained, but externally-controllable, piece of code that encapsulates powerful logic designed to perform a specific type of work.
ghidriff provides a core base class GhidraDiffEngine that can be extended to create your own binary diffing implementations.
The base class implements the first 3 steps of the Ghidra headless workflow:
- Create Ghidra Project - Directory and collection of Ghidra project files and data
- Import Binary to project - Import one or more binaries to the project for analysis
- Analyze Binary - Ghidra will perform default binary analysis on each binary
The base class provides the abstract method find_matches where the actual diffing (function matching) takes place.
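Splitting the work into fixed workflow steps in a base class and an abstract matching method in subclasses is the template method pattern. The Ghidra-free sketch below only mimics that shape. MiniDiffEngine, EchoDiff, and the step method names are hypothetical and do not correspond to GhidraDiffEngine's real methods. The steps only log what a real engine would do.

```python
from abc import ABC, abstractmethod
from typing import List


class MiniDiffEngine(ABC):
    """Hypothetical stand-in for a base class that owns the headless workflow."""

    def __init__(self) -> None:
        self.log: List[str] = []

    # Steps 1-3: a real engine would drive Ghidra here; this sketch just records them.
    def create_project(self, name: str) -> None:
        self.log.append(f"create project {name}")

    def import_binary(self, path: str) -> None:
        self.log.append(f"import {path}")

    def analyze(self, path: str) -> None:
        self.log.append(f"analyze {path}")

    @abstractmethod
    def find_matches(self, old: str, new: str) -> list:
        """Subclasses supply the actual function matching."""

    def diff(self, old: str, new: str) -> list:
        # Template method: fixed workflow first, then the pluggable matcher.
        self.create_project("demo")
        for path in (old, new):
            self.import_binary(path)
        for path in (old, new):
            self.analyze(path)
        return self.find_matches(old, new)


class EchoDiff(MiniDiffEngine):
    def find_matches(self, old: str, new: str) -> list:
        return [[], [(old, new)]]  # [unmatched, matched]


engine = EchoDiff()
print(engine.diff("bin.old", "bin.new"))
# [[], [('bin.old', 'bin.new')]]
print(engine.log)
# ['create project demo', 'import bin.old', 'import bin.new', 'analyze bin.old', 'analyze bin.new']
```

Because find_matches is abstract, the base class itself cannot be instantiated. Only a subclass that supplies a matcher can run the workflow.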
Extending ghidriff
ghidriff can be used as is, but it also lets developers extend the tool by implementing their own differ. The basic idea is to create new diffing tools by implementing the find_matches method from the base class.
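import pathlib
from typing import Union

from ghidriff import GhidraDiffEngine  # assumed import path for the base class

class NewDiffTool(GhidraDiffEngine):

    def __init__(self, verbose=False) -> None:
        super().__init__(verbose)

    # Concrete override of the base class's abstract method, so no @abstractmethod here
    def find_matches(
        self,
        old: Union[str, pathlib.Path],
        new: Union[str, pathlib.Path]
    ) -> list:
        """My amazing differ"""
        # find added, deleted, and modified functions
        unmatched = []  # functions found in only one binary (added or deleted)
        matched = []    # functions paired across both binaries (possibly modified)
        # <code goes here>
        return [unmatched, matched]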
Implementations
There are currently 3 diffing implementations, which also display the evolution of diffing for the project.
- SimpleDiff - A simple diff implementation. "Simple" as in it relies mostly on known symbol names for matching.
- StructualGraphDiff - A slightly more advanced differ that begins to use structural hashing (such as Halvar's Structural Graph Comparison)
- VersionTrackingDiff - The latest differ, with several correlators (an algorithm used to score specific associations based on code, program flow, or any observable aspect of comparison) for function matching. This one is fast.
Each implementation extends the base class and implements find_matches.
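Structural graph comparison characterizes each function by the shape of its code rather than by its name. The following is a heavily simplified sketch in that spirit. It assumes each function has already been reduced to a (basic blocks, CFG edges, calls) triple, and it pairs only functions whose triple is unique in both binaries. This is not StructualGraphDiff's code, and the function names and counts are invented. The full technique also propagates matches through the call graph, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (basic blocks, CFG edges, calls): a simplified structural signature per function.
Signature = Tuple[int, int, int]


def unique_by_signature(funcs: Dict[str, Signature]) -> Dict[Signature, str]:
    """Keep only signatures carried by exactly one function; the rest are ambiguous."""
    buckets: Dict[Signature, List[str]] = defaultdict(list)
    for name, sig in funcs.items():
        buckets[sig].append(name)
    return {sig: names[0] for sig, names in buckets.items() if len(names) == 1}


def match_structural(old: Dict[str, Signature], new: Dict[str, Signature]) -> List[Tuple[str, str]]:
    """Pair functions whose signature is unique on both sides (no symbol names needed)."""
    old_u, new_u = unique_by_signature(old), unique_by_signature(new)
    return sorted((old_u[s], new_u[s]) for s in old_u.keys() & new_u.keys())


old_sigs = {"FUN_1000": (5, 6, 2), "FUN_2000": (1, 0, 0), "FUN_3000": (1, 0, 0), "FUN_4000": (12, 17, 4)}
new_sigs = {"FUN_1100": (5, 6, 2), "FUN_2100": (1, 0, 0), "FUN_4400": (12, 17, 4)}
print(match_structural(old_sigs, new_sigs))
# [('FUN_1000', 'FUN_1100'), ('FUN_4000', 'FUN_4400')]
```

The (1, 0, 0) functions stay unmatched because the old binary has two of them. Resolving such collisions is where additional signals, such as call graph neighbors, come in.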
Usage
usage: ghidriff [-h] [--engine {SimpleDiff,StructualGraphDiff,VersionTrackingDiff}] [-o OUTPUT_PATH] [--summary SUMMARY] [-p PROJECT_LOCATION]
                [-n PROJECT_NAME] [-s SYMBOLS_PATH] [-g GZFS_PATH] [--ba BASE_ADDRESS] [--program-options PROGRAM_OPTIONS] [--threaded | --no-threaded]
                [--force-analysis] [--force-diff] [--no-symbols] [--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}]
                [--file-log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}] [--log-path LOG_PATH] [--va] [--min-func-len MIN_FUNC_LEN]
                [--use-calling-counts | --no-use-calling-counts] [--gdt GDT] [--bsim | --no-bsim] [--bsim-full | --no-bsim-full]
                [--max-ram-percent MAX_RAM_PERCENT] [--print-flags] [--jvm-args [JVM_ARGS]] [--sxs] [--max-section-funcs MAX_SECTION_FUNCS]
                [--md-title MD_TITLE]
                old new [new ...]

ghidriff - A Command Line Ghidra Binary Diffing Engine

positional arguments:
  old                   Path to old version of binary '/somewhere/bin.old'
  new                   Path to new version of binary '/somewhere/bin.new'. (For multiple new binaries add oldest to newest)

options:
  -h, --help            show this help message and exit
  --engine {SimpleDiff,StructualGraphDiff,VersionTrackingDiff}
                        The diff implementation to use. (default: VersionTrackingDiff)
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        Output path for resulting diffs (default: ghidriffs)
  --summary SUMMARY     Add a summary diff if more than two bins are provided (default: False)
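Because a diff is a single command, ghidriff is easy to drive from a script. The following is a minimal sketch that assembles the argument list from flags shown in the usage text above (--engine, -o, and the old/new positionals). build_ghidriff_argv is a hypothetical helper and is not part of ghidriff. Actually executing the command assumes ghidriff and Ghidra are installed.

```python
import shlex
from typing import List, Optional, Sequence

# Engine names exactly as listed in the usage text.
ENGINES = ("SimpleDiff", "StructualGraphDiff", "VersionTrackingDiff")


def build_ghidriff_argv(old: str, new: Sequence[str],
                        engine: str = "VersionTrackingDiff",
                        output_path: Optional[str] = None) -> List[str]:
    """Build a ghidriff argument list; `new` must already be ordered oldest to newest."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; choose from {ENGINES}")
    if not new:
        raise ValueError("at least one new binary is required")
    argv = ["ghidriff", "--engine", engine]
    if output_path is not None:
        argv += ["-o", output_path]
    return argv + [old, *new]


argv = build_ghidriff_argv("rpcrt4.dll-v1", ["rpcrt4.dll-v2"], output_path="ghidriffs")
print(shlex.join(argv))
# ghidriff --engine VersionTrackingDiff -o ghidriffs rpcrt4.dll-v1 rpcrt4.dll-v2
# To run it (requires ghidriff and Ghidra): subprocess.run(argv, check=True)
```

Building a list instead of a shell string avoids quoting problems with paths that contain spaces. shlex.join is used here only to display the list and requires Python 3.8 or later.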
Extended Usage
There are quite a few options here, and some complexity. Generally you can succeed with the defaults, but you can override them as needed. One example is increasing the JVM RAM used to run Ghidra to enable faster analysis of large binaries (--max-ram-percent 80). See the help output (ghidriff -h) for details on the other options.
Ghidra Project Options:
  -p PROJECT_LOCATION, --project-location PROJECT_LOCATION
                        Ghidra Project Path (default: ghidra_projects)
  -n PROJECT_NAME, --project-name PROJECT_NAME
                        Ghidra Project Name (default: ghidriff)
  -s SYMBOLS_PATH, --symbols-path SYMBOLS_PATH
                        Ghidra local symbol store directory (default: symbols)
  -g GZFS_PATH, --gzfs-path GZFS_PATH
                        Location to store GZFs of analyzed binaries (default: gzfs)
  --ba BASE_ADDRESS, --base-address BASE_ADDRESS
                        Set base address from both programs. 0x2000 or 8192 (default: None)
  --program-options PROGRAM_OPTIONS
                        Path to json file with Program Options (custom analyzer settings) (default: None)

Engine Options:
  --threaded, --no-threaded
                        Use threading during import, analysis, and diffing. Recommended (default: True)
  --force-analysis      Force a new binary analysis each run (slow) (default: False)
  --force-diff          Force binary diff (ignore arch/symbols mismatch) (default: False)
  --no-symbols          Turn off symbols for analysis (default: False)
  --log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,I
