KernelGPT
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (ASPLOS 2025)
<p align="left"> <a href="https://arxiv.org/abs/2401.00563"><img src="https://img.shields.io/badge/arXiv-2401.00563-b31b1b.svg?style=for-the-badge"></a> </p>

KernelGPT is a novel approach that leverages Large Language Models (LLMs) to automatically infer and refine Syzkaller specifications, significantly enhancing Linux kernel fuzzing capabilities.
[!IMPORTANT] We are continuing to improve the documentation and add more implementation details. See README-DEV.md for more information.
Contact: Chenyuan Yang, Zijie Zhao, Lingming Zhang.
✨ Key Features & Achievements
- Automated Specification Inference: Uses LLMs to generate Syzkaller specifications from kernel source code analysis.
- Iterative Refinement: Employs validation feedback to automatically repair and improve generated specifications.
- Proven Effectiveness:
  - Detected 24 new bugs 🐛 in the Linux kernel.
  - 11 bugs assigned CVEs❗ (12 fixed so far).
  - Numerous KernelGPT-generated specifications have been merged into the official Syzkaller repository.
⚙️ Prerequisites
Before you begin, ensure you have the following installed and configured:
- Python: >= 3.8 (Check `requirements.txt` for specific library versions).
- Git & Git Submodules: To clone the repository and its dependencies.
- Build Tools: `make`, a C compiler (like `gcc` for host tools), `bear`.

      sudo apt-get update && sudo apt-get install build-essential make bear git

- Clang: Version 14 is required for the analysis tools. See the analyzer README for more details.

      # Example for Debian/Ubuntu
      sudo apt-get install clang-14 libclang-14-dev
      # Ensure clang-14 is the default or adjust paths in subsequent steps
      # Example: export CC=clang-14 CXX=clang++-14

- Syzkaller: A working Syzkaller setup targeting the Linux kernel. Follow the official Syzkaller setup guide. You'll need this for specification validation and fuzzing.
- Linux Kernel Source: You need a local copy of the Linux kernel source code that you intend to analyze.
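
The prerequisites above can be checked up front before starting a long kernel build. The following is a minimal sketch, not part of KernelGPT: the helper names `missing_tools` and `python_ok` are hypothetical, and the binary name `clang-14` assumes the Debian/Ubuntu packaging shown above.

```python
import shutil
import sys

# Tool names taken from the prerequisites list; the exact binary names
# (for example "clang-14") are an assumption and may differ per distribution.
REQUIRED_TOOLS = ["git", "make", "bear", "clang-14"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum
```

Calling `missing_tools(REQUIRED_TOOLS)` returns an empty list when every tool is available, and `python_ok()` reports whether the running interpreter satisfies the `>= 3.8` requirement.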
🛠️ Installation
- Clone the Repository:

      # Replace with your actual repository URL if it's hosted elsewhere
      git clone https://github.com/KernelGPT/KernelGPT.git
      cd KernelGPT

- Initialize Submodules (Linux & Syzkaller):

      git submodule update --init --recursive

  This will clone the specific Linux kernel version used in the paper and Syzkaller into the `linux/` and `syzkaller/` subdirectories.

- Install Python Dependencies:

      pip install -r requirements.txt

- Prepare Syzkaller Image (Optional but Recommended): Follow the instructions in `image/` to create a suitable VM image for fuzzing.

      cd image
      # Modify create-image.sh if needed (e.g., target architecture)
      bash create-image.sh
      cd ..
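
A forgotten `git submodule update` leaves `linux/` and `syzkaller/` as empty directories, which only shows up later as confusing build errors. The sketch below is one way to check for that early; the function name `empty_submodules` is hypothetical and the script is not shipped with the repository.

```python
from pathlib import Path


def empty_submodules(repo_root, names=("linux", "syzkaller")):
    """Return the submodule directories that are missing or contain no files."""
    root = Path(repo_root)
    problems = []
    for name in names:
        path = root / name
        # An uninitialized submodule is usually an existing but empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems
```

Running `empty_submodules(".")` from the repository root should return an empty list once the submodules have been initialized.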
🚀 Usage
The core workflow involves analyzing the kernel source, generating specifications using the LLM, and then validating/refining them.
Step 1: Kernel Preparation & Static Analysis
This step analyzes the Linux kernel source code to extract information needed by the LLM.
- Navigate to the Linux Submodule:

      cd linux

- Configure the Kernel: `allyesconfig` is recommended for broad analysis coverage.

      # Recommended: Use the commit tested in the paper (d2f51b35)
      # git checkout d2f51b35 # Or your desired commit/tag
      # Apply patch if using commit d2f51b35 (see details below)
      # patch -p1 < ../spec-eval/linux-d2f51b35.patch
      # Ensure clang-14 is used (e.g., export CC=clang-14 HOSTCC=clang)
      make CC=clang HOSTCC=clang allyesconfig

- Build the Kernel with `bear`: This intercepts compiler calls to generate `compile_commands.json`.

      # Ensure clang-14 is used (e.g., export CC=clang-14 HOSTCC=clang)
      bear -- make CC=clang HOSTCC=clang -j$(nproc)

  This command generates `compile_commands.json` in the `linux/` directory.

  <details> <summary>⚠️ Potential Build Issues (Linux `d2f51b35`)</summary>

  The specific Linux kernel commit `d2f51b35` used in the paper may have compilation errors with `allyesconfig`. Apply the provided patch before building:

      # Run from the linux/ subdirectory
      patch -p1 < ../spec-eval/linux-d2f51b35.patch

  The patch fixes minor issues in `net/ipv4/tcp_output.c` and `sound/soc/codecs/aw88399.c`.

  </details>

- Build Analysis Tools:

      cd ../spec-gen/analyzer
      # Ensure Clang-14 dev libraries are installed and accessible
      make all

  This creates `analyze` and `usage` executables.

- Run Analysis & Processing:

      # Ensure you are in spec-gen/analyzer/
      # Analyze structures, functions, enums, etc.
      ./analyze -p ../../linux/compile_commands.json
      # Process the analyzer output
      python process_output.py --linux-path ../../linux
      # Analyze usage patterns
      ./usage -p ../../linux/compile_commands.json
      # Process the usage output
      python process_output.py --linux-path ../../linux --usage

  This generates several `processed_*.json` files in `spec-gen/analyzer/`, which serve as input for the LLM.
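
Before running the analyzers, it can help to confirm that `bear` actually captured the build. The sketch below summarizes a `compile_commands.json` file by top-level source directory. It relies only on the standard JSON Compilation Database fields `directory` and `file`; the function name `summarize_compile_commands` is hypothetical and not part of the KernelGPT tooling.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_compile_commands(path):
    """Count compilation database entries per top-level source directory."""
    entries = json.loads(Path(path).read_text())
    counts = Counter()
    for entry in entries:
        directory = Path(entry["directory"])
        source = Path(entry["file"])
        # "file" may be absolute or relative to "directory"; normalize it.
        if source.is_absolute():
            try:
                source = source.relative_to(directory)
            except ValueError:
                pass  # File lives outside the build directory; keep as-is.
        # Files directly in the build root are grouped under ".".
        top = source.parts[0] if len(source.parts) > 1 else "."
        counts[top] += 1
    return counts
```

For example, `summarize_compile_commands("linux/compile_commands.json").most_common(5)` lists the five subsystems with the most captured translation units. An empty or very small result suggests the build was not intercepted.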
Step 2: Generate Specifications with KernelGPT
- Set OpenAI API Key: Create a file named `openai.key` in the `spec-gen/` directory and place your OpenAI API key inside it.

      echo "YOUR_API_KEY_HERE" > spec-gen/openai.key

- Run Specification Generation:

      # Ensure you are in the spec-gen/ directory
      # Generate N specifications (e.g., 1 for a quick test)
      # Input: processed_handlers.json from the analysis step
      # Output: JSON specifications in spec-output/
      python gen_spec.py -d analyzer/processed_handlers.json -o spec-output -n 1
      # For full-scale generation (might take time and cost $$)
      # python gen_spec.py -d analyzer/processed_handlers.json -o spec-output -n 1000
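
Since full-scale generation costs money, it may be worth checking the key file before launching a run. The sketch below only validates the file written by the `echo` command above. How `gen_spec.py` itself reads `openai.key` is not described here, so `read_api_key` is a hypothetical helper, not the project's loader.

```python
from pathlib import Path

# The placeholder string used in the setup command above.
PLACEHOLDER = "YOUR_API_KEY_HERE"


def read_api_key(path):
    """Read a key file, strip surrounding whitespace, and reject empty or placeholder content."""
    key = Path(path).read_text().strip()  # `echo` appends a trailing newline
    if not key or key == PLACEHOLDER:
        raise ValueError(f"{path} does not contain a real API key")
    return key
```

Calling `read_api_key("spec-gen/openai.key")` from the repository root raises `ValueError` if the placeholder was never replaced.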
Step 3: Validate and Repair Specifications
This step uses Syzkaller's tools (syz-check) to validate the generated specifications and feeds back errors to the LLM for repair (if enabled).
- Run Evaluation Script:

      # Ensure you are in the spec-gen/ directory
      # Input: Generated specs from spec-output/_generated
      # Output: Validation results and potentially repaired specs in eval-output/
      python eval_spec.py -u -s spec-output/_generated --output-name debug -o eval-output
      cd .. # Back to KernelGPT root

  This script invokes `spec-eval/run-specs.py` internally. Check the script and `eval-output/` for detailed logs and results.
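
The validate-and-repair idea in this step can be pictured as a bounded feedback loop. The sketch below is a conceptual illustration only, not the logic of `eval_spec.py`: `validate` and `repair` are hypothetical callables standing in for a validator such as `syz-check` and an LLM repair call, and `max_rounds` is an assumed parameter.

```python
from typing import Callable, List, Tuple


def refine(
    spec: str,
    validate: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Repeatedly repair `spec` until validation passes or the round budget is spent.

    Returns the final spec together with any remaining validation errors.
    """
    errors = validate(spec)
    rounds = 0
    while errors and rounds < max_rounds:
        # Feed the validator's error messages back into the repair step.
        spec = repair(spec, errors)
        errors = validate(spec)
        rounds += 1
    return spec, errors
```

Bounding the number of rounds keeps cost predictable when a specification cannot be repaired; the caller can inspect the returned error list to decide whether to discard it.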
Reuse the Generated Specifications
If you want to reuse our generated specifications for drivers (or sockets), you can use `eval_spec.py`:

    # Under the directory `spec-gen`
    python eval_spec.py -u -s ../generated-specs/specs-6.7/correct-driver-spec --output-name debug -o eval-output --merge

This command translates all specifications written in JSON into Syzkaller format and runs Syzkaller.
The log for this process is `spec-eval/debug/merged.log`.
All the textual specifications are then placed under the `spec-eval/debug/default-tmp/syzkaller/sys/linux` directory, with `gpt4_` as the prefix.
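
To see which specifications the merge step produced, the output directory can be filtered by the `gpt4_` prefix. This is a small sketch under the assumption that the files sit directly in that directory; `list_generated_specs` is a hypothetical helper name.

```python
from pathlib import Path


def list_generated_specs(sys_linux_dir, prefix="gpt4_"):
    """Return the sorted names of regular files in `sys_linux_dir` starting with `prefix`."""
    directory = Path(sys_linux_dir)
    return sorted(p.name for p in directory.glob(f"{prefix}*") if p.is_file())
```

For example, `list_generated_specs("spec-eval/debug/default-tmp/syzkaller/sys/linux")` run from the repository root lists the translated specification files.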
📝 Citation
@inproceedings{kernelgpt,
author = {Yang, Chenyuan and Zhao, Zijie and Zhang, Lingming},
title = {KernelGPT: Enhanced Kernel Fuzzing via Large Language Models},
year = {2025},
isbn = {9798400710797},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3676641.3716022},
doi = {10.1145/3676641.3716022},
pages = {560–573},
numpages = {14},
location = {Rotterdam, Netherlands},
series = {ASPLOS '25}
}
