HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

The paper is available on arXiv: https://arxiv.org/abs/2412.01778

Introduction

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
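
The Planner/Summarizer loop described above can be sketched roughly as follows. This is an illustrative outline, not the repository's implementation: `run_agent`, `planner`, `summarizer`, and `execute` are hypothetical names, and the LLM calls and command execution are replaced by plain Python callables so the example runs without a model or a sandbox. The `picoCTF{` marker is used only as an example stopping condition.

```python
from typing import Callable


def run_agent(
    planner: Callable[[str], str],
    summarizer: Callable[[str, str, str], str],
    execute: Callable[[str], str],
    max_steps: int = 5,
    flag_marker: str = "picoCTF{",
) -> tuple[str, list[str]]:
    """Iterate plan -> execute -> summarize until a flag-like string appears.

    All names here are hypothetical. In a real agent, `planner` and
    `summarizer` would wrap LLM calls and `execute` would run the command
    in an isolated environment.
    """
    summary = ""  # running summary of what has been learned so far
    commands: list[str] = []
    for _ in range(max_steps):
        command = planner(summary)  # Planner proposes the next command
        output = execute(command)   # feedback from the environment
        commands.append(command)
        # Summarizer folds the new command and its output into the summary
        summary = summarizer(summary, command, output)
        if flag_marker in output:
            break
    return summary, commands


# Toy stand-ins so the sketch can be run without an LLM.
def toy_planner(summary: str) -> str:
    return "cat flag.txt" if "flag.txt" in summary else "ls"


def toy_execute(command: str) -> str:
    return {"ls": "flag.txt", "cat flag.txt": "picoCTF{example}"}.get(command, "")


def toy_summarizer(summary: str, command: str, output: str) -> str:
    return f"{summary}\n$ {command}\n{output}".strip()


final_summary, history = run_agent(toy_planner, toy_summarizer, toy_execute)
print(history)
```

Passing the two modules in as callables keeps the loop itself independent of any particular model, which is one way to swap LLMs between benchmark runs.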

Using the repository

1. Create a Hugging Face account and a Neptune.ai account.
2. Copy your API keys into the .env file and set the desired CUDA devices, following the .env_example template.
3. Set up the PicoCTF benchmark.
4. Set up the OverTheWire benchmark.
5. Start the HackSynth Agent:
- Install the environment:
```
python -m venv cyber_venv
source cyber_venv/bin/activate
pip install -r requirements.txt
```
- Start the benchmark with the following:
```
python run_bench.py -b benchmark.json -c config.json
```
  The benchmark.json file should be one of the generated benchmark_solved.json files or an equivalently structured file. The configuration files we used for the measurements in the paper are available in the configs folder.
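
Before starting a long benchmark run, it can help to confirm that the inputs are in place. The following is a minimal pre-flight sketch and is not part of the repository: `preflight` is a hypothetical helper, and the environment variable names `HF_TOKEN` and `NEPTUNE_API_TOKEN` are assumptions (check .env_example for the names the project actually expects). It only checks that both JSON files parse and that the listed variables are set; it does not validate their contents.

```python
import json
import os
from pathlib import Path


def preflight(
    benchmark_path,
    config_path,
    required_env=("HF_TOKEN", "NEPTUNE_API_TOKEN"),  # assumed names
    env=None,
):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # Both inputs to run_bench.py must exist and be parseable JSON.
    for label, path in (("benchmark", benchmark_path), ("config", config_path)):
        p = Path(path)
        if not p.is_file():
            problems.append(f"{label} file not found: {p}")
            continue
        try:
            json.loads(p.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{label} file is not valid JSON: {exc}")
    # API keys are expected to be exported from the .env file.
    for name in required_env:
        if not env.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight("benchmark.json", "config.json"):
        print("WARNING:", problem)
```

Returning a list rather than raising on the first error lets all missing pieces be reported in one pass.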

How to Cite

If you use this code in your work or research, please cite the corresponding paper:

@misc{muzsai2024hacksynthllmagentevaluation,
      title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing}, 
      author={Lajos Muzsai and David Imolai and András Lukács},
      year={2024},
      eprint={2412.01778},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2412.01778}, 
}

Contributors

Lajos Muzsai (muzsailajos@protonmail.com)
David Imolai (david@imol.ai)
András Lukács (andras.lukacs@ttk.elte.hu)

🔍 Also see our related project on reinforcement learning for cryptographic CTFs: HackSynth-GRPO

License

The project uses the GNU AGPLv3 license.
