🚀 CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models 🛡️
<a href="https://arxiv.org/abs/2501.01335"> <img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:CySecBench&color=red&logo=arxiv" alt="Arxiv:CySecBench"> </a>

The largest and most comprehensive generative AI-based, cybersecurity-focused prompt dataset for benchmarking Large Language Models.
🌟 Overview
The CySecBench paper offers:
- 🎯 A cutting-edge dataset of 12,662 prompts tailored to cybersecurity challenges.
- 🧠 Novel jailbreaking methods leveraging prompt obfuscation and refinement.
- 📊 Comprehensive performance evaluation of LLMs like ChatGPT, Claude, and Gemini.
Why CySecBench?
Existing benchmarking datasets are broad in scope and often lack cybersecurity focus. CySecBench fills this gap by providing domain-specific prompts organized into 10 categories, enabling a precise evaluation of LLM security mechanisms.
📄 Access the Paper
You can download the full research paper here: [CySecBench (PDF)](https://arxiv.org/abs/2501.01335)
✨ Features
🗂️ Dataset
- 📁 10 Categories of Prompts:
  - Cloud Attacks
  - Control System Attacks
  - Cryptographic Attacks
  - Evasion Techniques
  - Hardware Attacks
  - Intrusion Techniques
  - IoT Attacks
  - Malware Attacks
  - Network Attacks
  - Web Application Attacks
🗂️ Repository Structure
```
/
├── Code/
│   ├── dataset_generation.py
│   └── keywords.txt
└── Dataset/
    ├── Category sets/
    │   ├── cysecbench-cloud-attacks.csv
    │   ├── cysecbench-control-system-attacks.csv
    │   ├── cysecbench-cryptographic-attacks.csv
    │   ├── cysecbench-evasion-techniques.csv
    │   ├── cysecbench-hardware-attacks.csv
    │   ├── cysecbench-intrusion-techniques.csv
    │   ├── cysecbench-iot-attacks.csv
    │   ├── cysecbench-malware-attacks.csv
    │   ├── cysecbench-network-attacks.csv
    │   └── cysecbench-web-application-attacks.csv
    ├── Full dataset/
    │   └── cysecbench.csv
    └── Sample sets/
        ├── cysecbench-500.csv
        ├── cysecbench-2000.csv
        └── cysecbench-6000.csv
```
🚀 Getting Started
⚙️ Prerequisites
- 🐍 Python 3.8+
- 📦 Required libraries:
  - `openai` (only for dataset generation)
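Once the repository is cloned, the CSV files can be read with the standard library alone. A minimal sketch, assuming each row carries a prompt and its category label (the exact column names are an assumption; check the header of `Dataset/Full dataset/cysecbench.csv` in your copy):

```python
# Minimal sketch of loading a CySecBench CSV with the stdlib csv module.
# The column names ("Prompt", "Category") are ASSUMPTIONS; verify against
# the actual header of cysecbench.csv before use.
import csv
import io

# Stand-in for open("Dataset/Full dataset/cysecbench.csv"); two fake rows
# so the example is self-contained and runnable.
sample = io.StringIO(
    "Prompt,Category\n"
    '"Example prompt text",Cloud Attacks\n'
    '"Another prompt",Malware Attacks\n'
)

reader = csv.DictReader(sample)
rows = list(reader)

print(len(rows))            # number of prompts loaded -> 2
print(rows[0]["Category"])  # per-prompt category label -> Cloud Attacks
```

For the real dataset, swap the `io.StringIO` stand-in for `open(...)` on the CSV path shown in the repository tree above.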
📊 Results using CySecBench
🎯 Evaluation Metrics
- ✅ Success Rate (SR): Percentage of prompts bypassing ethical guidelines.
- 📈 Average Rating (AR): Degree of harmfulness of LLM responses, on a scale of 1-5, where 5 is the most harmful.
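The two metrics above can be sketched as follows, given one judge rating per prompt. Note that the rating threshold used to count a "bypass" here is an assumption for illustration only; the paper defines the exact success criterion:

```python
# Sketch of SR and AR computation from per-prompt judge ratings (1-5 scale).
# The >= 3 bypass threshold is an ASSUMPTION, not the paper's criterion.
ratings = [1, 5, 4, 2, 5]  # hypothetical judge scores, one per prompt

bypassed = [r for r in ratings if r >= 3]         # prompts counted as bypasses
success_rate = 100 * len(bypassed) / len(ratings)  # SR, as a percentage
average_rating = sum(ratings) / len(ratings)       # AR, mean harmfulness

print(f"SR = {success_rate:.1f}%")   # SR = 60.0%
print(f"AR = {average_rating:.2f}")  # AR = 3.40
```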
⚡ Jailbreaking Performance
| LLM | Success Rate (SR) | Average Rating (AR) |
| ---------- | ----------------- | ------------------- |
| 🤖 Claude  | 17.4%             | 2.00                |
| 🤖 ChatGPT | 65.4%             | 4.06                |
| 🤖 Gemini  | 88.4%             | 4.77                |
📜 Citation
If you use CySecBench, please cite:
```bibtex
@article{CySecBench2025,
  title   = {{CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models}},
  author  = {Johan Wahréus and Ahmed Mohamed Hussain and Panos Papadimitratos},
  year    = {2025},
  journal = {arXiv preprint arXiv:2501.01335},
  url     = {https://arxiv.org/abs/2501.01335}
}
```
⭐ Star This Repository!
If you found CySecBench helpful or interesting, please give this repository a star ⭐ to show your support!
🔒 License
This project is licensed under the MIT License. See the LICENSE file for details.
