KernelGPT: Enhanced Kernel Fuzzing via Large Language Models

KernelGPT is a novel approach that leverages Large Language Models (LLMs) to automatically infer and refine Syzkaller specifications, significantly enhancing Linux kernel fuzzing capabilities.

[!IMPORTANT] We are still improving the documentation and adding more implementation details. See README-DEV.md for more information.

Contact: Chenyuan Yang, Zijie Zhao, Lingming Zhang.

✨ Key Features & Achievements

Automated Specification Inference: Uses LLMs to generate Syzkaller specifications from kernel source code analysis.
Iterative Refinement: Employs validation feedback to automatically repair and improve generated specifications.
Proven Effectiveness:
- Detected 24 new bugs 🐛 in the Linux kernel.
- 11 bugs assigned CVEs❗ (12 fixed so far).
- Numerous KernelGPT-generated specifications have been merged into the official Syzkaller repository.

⚙️ Prerequisites

Before you begin, ensure you have the following installed and configured:

Python: >= 3.8 (Check requirements.txt for specific library versions).
Git & Git Submodules: To clone the repository and its dependencies.

Build Tools: make, a C compiler (like gcc for host tools), bear.

sudo apt-get update && sudo apt-get install build-essential make bear git

Clang: Version 14 is required for the analysis tools.

# Example for Debian/Ubuntu
sudo apt-get install clang-14 libclang-14-dev
# Ensure clang-14 is the default or adjust paths in subsequent steps
# Example: export CC=clang-14 CXX=clang++-14

See the analyzer README for more details.

Syzkaller: A working Syzkaller setup targeting the Linux kernel. Follow the official Syzkaller setup guide. You'll need this for specification validation and fuzzing.
Linux Kernel Source: You need a local copy of the Linux kernel source code that you intend to analyze.
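
Before moving on, it can help to confirm that the command-line tools listed above are actually on your PATH. The following is a minimal sketch, not part of KernelGPT; the tool list mirrors the prerequisites above, and the name `clang-14` assumes Debian/Ubuntu-style versioned binaries.

```python
import shutil

# Tool names taken from the prerequisites above; `clang-14` assumes
# Debian/Ubuntu-style versioned binary names (an assumption, adjust as needed).
REQUIRED_TOOLS = ["git", "make", "gcc", "bear", "clang-14"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```

This only checks that the binaries exist; it does not verify versions or that the Clang development libraries are installed.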

🛠️ Installation

Clone the Repository:

# Replace with your actual repository URL if it's hosted elsewhere
git clone https://github.com/KernelGPT/KernelGPT.git
cd KernelGPT

Initialize Submodules (Linux & Syzkaller):
```
git submodule update --init --recursive
```
This will clone the specific Linux kernel version used in the paper and Syzkaller into the linux/ and syzkaller/ subdirectories.
Install Python Dependencies:
```
pip install -r requirements.txt
```
Prepare Syzkaller Image (Optional but Recommended): Follow the instructions in image/ to create a suitable VM image for fuzzing.
```
cd image
# Modify create-image.sh if needed (e.g., target architecture)
bash create-image.sh
cd ..
```

🚀 Usage

The core workflow involves analyzing the kernel source, generating specifications using the LLM, and then validating/refining them.

Step 1: Kernel Preparation & Static Analysis

This step analyzes the Linux kernel source code to extract information needed by the LLM.

Navigate to the Linux Submodule:
```
cd linux
```

Configure the Kernel: allyesconfig is recommended for broad analysis coverage.

# Recommended: Use the commit tested in the paper (d2f51b35)
# git checkout d2f51b35 # Or your desired commit/tag

# Apply patch if using commit d2f51b35 (see details below)
# patch -p1 < ../spec-eval/linux-d2f51b35.patch

# If clang-14 is not your default clang, pass CC=clang-14 HOSTCC=clang-14 instead
# (variables given on the make command line override exported ones)
make CC=clang HOSTCC=clang allyesconfig

Build the Kernel with bear: This intercepts compiler calls to generate compile_commands.json.
```
# If clang-14 is not your default clang, pass CC=clang-14 HOSTCC=clang-14 instead
bear -- make CC=clang HOSTCC=clang -j$(nproc)
```
This command generates compile_commands.json in the linux/ directory.
<details> <summary>⚠️ Potential Build Issues (Linux `d2f51b35`)</summary>
The specific Linux kernel commit d2f51b35 used in the paper may have compilation errors with allyesconfig. Apply the provided patch before building:
```
# Run from the linux/ subdirectory
patch -p1 < ../spec-eval/linux-d2f51b35.patch
```
The patch fixes minor issues in net/ipv4/tcp_output.c and sound/soc/codecs/aw88399.c.
</details>
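
Once the build finishes, you may want to sanity-check compile_commands.json before running the analyzers. The sketch below assumes the file follows the standard Clang JSON compilation database layout (an array of objects with a `file` key and either `command` or `arguments`); the function name is hypothetical and not part of KernelGPT.

```python
import json
from pathlib import Path


def summarize_compile_commands(path):
    """Load a JSON compilation database and return basic statistics."""
    entries = json.loads(Path(path).read_text())
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of compile commands")

    c_files = 0
    malformed = 0
    for entry in entries:
        # A usable entry names a source file and carries either a
        # "command" string or an "arguments" list.
        ok = (
            isinstance(entry, dict)
            and "file" in entry
            and ("command" in entry or "arguments" in entry)
        )
        if not ok:
            malformed += 1
        elif entry["file"].endswith(".c"):
            c_files += 1

    return {"entries": len(entries), "c_files": c_files, "malformed": malformed}
```

Calling `summarize_compile_commands("linux/compile_commands.json")` from the KernelGPT root should report a large number of `.c` entries for an allyesconfig build; an empty or tiny result usually means bear did not observe the compiler invocations.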

Build Analysis Tools:

cd ../spec-gen/analyzer
# Ensure Clang-14 dev libraries are installed and accessible
make all

This creates the analyze and usage executables.

Run Analysis & Processing:

# Ensure you are in spec-gen/analyzer/
# Analyze structures, functions, enums, etc.
./analyze -p ../../linux/compile_commands.json

# Process the analyzer output
python process_output.py --linux-path ../../linux

# Analyze usage patterns
./usage -p ../../linux/compile_commands.json

# Process the usage output
python process_output.py --linux-path ../../linux --usage

This generates several processed_*.json files in spec-gen/analyzer/, which serve as input for the LLM.
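
To confirm the analysis step produced output, you can list the processed_*.json files and count their top-level items. This is a minimal sketch; the internal schema of these files is not documented here, so the code only assumes they are valid JSON. The function name and default path (relative to the KernelGPT root) are illustrative.

```python
import json
from pathlib import Path


def list_processed_outputs(analyzer_dir="spec-gen/analyzer"):
    """Return {file name: number of top-level items} for processed_*.json files."""
    summary = {}
    for path in sorted(Path(analyzer_dir).glob("processed_*.json")):
        data = json.loads(path.read_text())
        # len() works for JSON objects and arrays; any other value counts as 1.
        summary[path.name] = len(data) if isinstance(data, (dict, list)) else 1
    return summary
```

An empty dictionary means no processed files were found in the given directory, which suggests process_output.py did not run or wrote its output elsewhere.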

Step 2: Generate Specifications with KernelGPT

Set OpenAI API Key: Create a file named openai.key in the spec-gen/ directory and place your OpenAI API key inside it.
```
# Run from the KernelGPT root directory
echo "YOUR_API_KEY_HERE" > spec-gen/openai.key
```
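
Note that `echo` writes a trailing newline and that the placeholder text is easy to leave in by accident. The sketch below shows one way to check the key file before spending time on generation. How gen_spec.py itself reads the file is not shown in this README, so `load_api_key` is a hypothetical helper, not the project's loader.

```python
from pathlib import Path

# Placeholder string used in the echo command above.
PLACEHOLDER = "YOUR_API_KEY_HERE"


def load_api_key(key_file="spec-gen/openai.key"):
    """Read the key file and return the key with surrounding whitespace removed."""
    key = Path(key_file).read_text().strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(f"{key_file} does not contain a real API key")
    return key
```

The default path is relative to the KernelGPT root; pass a different path if you run the check from another directory.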

Run Specification Generation:

# Ensure you are in the spec-gen/ directory
# Generate N specifications (e.g., 1 for a quick test)
# Input: processed_handlers.json from the analysis step
# Output: JSON specifications in spec-output/
python gen_spec.py -d analyzer/processed_handlers.json -o spec-output -n 1

# For full-scale generation (might take time and cost $$)
# python gen_spec.py -d analyzer/processed_handlers.json -o spec-output -n 1000

Step 3: Validate and Repair Specifications

This step validates the generated specifications with Syzkaller's tooling (syz-check) and, if repair is enabled, feeds the reported errors back to the LLM.

Run Evaluation Script:

# Ensure you are in the spec-gen/ directory
# Input: Generated specs from spec-output/_generated
# Output: Validation results and potentially repaired specs in eval-output/
python eval_spec.py -u -s spec-output/_generated --output-name debug -o eval-output
cd .. # Back to KernelGPT root

This script invokes spec-eval/run-specs.py internally. Check the script and eval-output/ for detailed logs and results.
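
The validate-and-repair idea described in this step can be pictured as a bounded loop. The following is an illustrative sketch only, not the logic of eval_spec.py or run-specs.py: `validate` and `repair` are hypothetical callables standing in for a syz-check style validator and an LLM repair prompt, and `max_rounds` is an assumed parameter.

```python
def refine_spec(spec, validate, repair, max_rounds=3):
    """Validate `spec`; while errors remain, ask `repair` for a new candidate.

    validate(spec) returns a list of error messages (empty when valid).
    repair(spec, errors) returns a new candidate spec.
    Returns (final spec, number of repairs performed, remaining errors).
    """
    for round_no in range(max_rounds + 1):
        errors = validate(spec)
        if not errors:
            return spec, round_no, []
        if round_no == max_rounds:
            break  # repair budget exhausted; give up with errors attached
        spec = repair(spec, errors)
    return spec, max_rounds, errors


# Toy stand-ins: the "spec" counts as valid once it contains the word "fixed".
def toy_validate(spec):
    return [] if "fixed" in spec else ["unknown type foo"]


def toy_repair(spec, errors):
    return spec + " fixed"


print(refine_spec("ioctl$DEMO(fd fd)", toy_validate, toy_repair))
```

Bounding the number of rounds keeps cost predictable, since each repair in a real setup would be an LLM call.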

Reuse the Generated Specifications

If you want to reuse our generated specifications for drivers (or sockets), you can use eval_spec.py:

# Under the directory `spec-gen`
python eval_spec.py -u -s ../generated-specs/specs-6.7/correct-driver-spec --output-name debug -o eval-output --merge

This command translates all specifications written in JSON into the Syzkaller format and runs Syzkaller. The log for this process is written to spec-eval/debug/merged.log.

All the textual specifications are then placed under the spec-eval/debug/default-tmp/syzkaller/sys/linux directory, with gpt4_ as the prefix.
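
To pick the merged specifications out of the other files in that directory, you can filter by the prefix. This is a minimal sketch; the function name is hypothetical, the default path is relative to the KernelGPT root, and no file extension is assumed.

```python
from pathlib import Path


def find_generated_specs(
    sys_linux_dir="spec-eval/debug/default-tmp/syzkaller/sys/linux",
    prefix="gpt4_",
):
    """Return the sorted names of files in `sys_linux_dir` starting with `prefix`."""
    return sorted(p.name for p in Path(sys_linux_dir).glob(prefix + "*"))
```

An empty list means the merge step did not write any prefixed files to that directory; check merged.log in that case.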

📝 Citation

@inproceedings{kernelgpt,
    author = {Yang, Chenyuan and Zhao, Zijie and Zhang, Lingming},
    title = {KernelGPT: Enhanced Kernel Fuzzing via Large Language Models},
    year = {2025},
    isbn = {9798400710797},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3676641.3716022},
    doi = {10.1145/3676641.3716022},
    pages = {560–573},
    numpages = {14},
    location = {Rotterdam, Netherlands},
    series = {ASPLOS '25}
}
