HackSynth
LLM Agent and Evaluation Framework for Autonomous Penetration Testing
The paper can be found on arXiv: https://arxiv.org/abs/2412.01778
Introduction
<img align="left" style="width: 160px;" src="assets/logo.gif" alt="HackSynth Logo"/>We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
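The iterative Planner/Summarizer loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `run_agent`, `toy_planner`, `toy_summarizer`, `toy_execute`, and the `max_steps` parameter are all hypothetical names, and the LLM calls and the sandboxed shell are replaced by simple stand-in functions.

```python
from typing import Callable, List


def run_agent(
    planner: Callable[[str], str],
    summarizer: Callable[[str, str, str], str],
    execute: Callable[[str], str],
    max_steps: int = 5,
) -> List[str]:
    """Alternate planning and summarizing for max_steps iterations (hypothetical loop)."""
    summary = ""  # running summary of everything observed so far
    commands: List[str] = []
    for _ in range(max_steps):
        command = planner(summary)  # Planner proposes the next command
        output = execute(command)  # run the command (a stub here, not a real shell)
        summary = summarizer(summary, command, output)  # fold the feedback into the summary
        commands.append(command)
    return commands


# Stand-ins for the two LLM modules and the execution environment (illustration only).
def toy_planner(summary: str) -> str:
    return "ls" if not summary else "cat flag.txt"


def toy_execute(command: str) -> str:
    return {"ls": "flag.txt", "cat flag.txt": "picoCTF{example}"}.get(command, "")


def toy_summarizer(summary: str, command: str, output: str) -> str:
    return f"{summary}\n$ {command} -> {output}".strip()


history = run_agent(toy_planner, toy_summarizer, toy_execute, max_steps=2)
print(history)  # ['ls', 'cat flag.txt']
```

Passing the modules in as plain callables keeps the loop independent of any particular model backend, which is one plausible way to swap LLMs between benchmark runs.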
Using the repository
- You will have to create a Hugging Face and a Neptune.ai account
- Copy your API keys to the `.env` file, and set the desired CUDA devices, based on the `.env_example`
- Set up the PicoCTF benchmark
- Set up the OverTheWire benchmark
- Start the HackSynth Agent
  - Install the environment:
    ```bash
    python -m venv cyber_venv
    source cyber_venv/bin/activate
    pip install -r requirements.txt
    ```
  - Start the benchmark with the following:
    ```bash
    python run_bench.py -b benchmark.json -c config.json
    ```
    The `benchmark.json` should be one of the generated `benchmark_solved.json` files, or an equivalently structured file. The configuration files used by us for the measurements in the paper are also available in the configs folder.
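Before starting a run, it can help to confirm that every key listed in `.env_example` has been filled in inside `.env`. The sketch below is one way to do that, assuming both files use plain `KEY=value` lines. The helper names `read_env_file` and `unset_keys` are hypothetical, and the key names in the demo are placeholders rather than the names the repository actually uses.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(example_path: str, env_path: str) -> List[str]:
    """Return keys that appear in the example file but are missing or empty in the env file."""
    example = read_env_file(example_path)
    actual = read_env_file(env_path)
    return [key for key in example if not actual.get(key)]


# Demo with temporary files; EXAMPLE_API_KEY and EXAMPLE_DEVICES are placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    example_file = Path(tmp) / ".env_example"
    env_file = Path(tmp) / ".env"
    example_file.write_text("EXAMPLE_API_KEY=\nEXAMPLE_DEVICES=0\n")
    env_file.write_text("EXAMPLE_DEVICES=0\n")
    missing = unset_keys(str(example_file), str(env_file))

print(missing)  # ['EXAMPLE_API_KEY']
```

Comparing against `.env_example` avoids hard-coding key names, so the check stays valid if the example file changes.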
How to Cite
If you use this code in your work or research, please cite the corresponding paper:
@misc{muzsai2024hacksynthllmagentevaluation,
title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing},
author={Lajos Muzsai and David Imolai and András Lukács},
year={2024},
eprint={2412.01778},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2412.01778},
}
Contributors
- Lajos Muzsai (muzsailajos@protonmail.com)
- David Imolai (david@imol.ai)
- András Lukács (andras.lukacs@ttk.elte.hu)
🔍 Also see our related project on reinforcement learning for cryptographic CTFs: HackSynth-GRPO
License
The project uses the GNU AGPLv3 license.
