VulBench
This is a benchmark for evaluating the vulnerability discovery ability of automated approaches including Large Language Models (LLMs), deep learning methods and static analyzers
Install / Use
/learn @Hustcw/VulBenchREADME
VulBench
This repository contains materials for the paper "How Far Have We Gone in Vulnerability Detection Using Large Language Model".
Usage
Suppose there is OpenAI-compatiable endpoint at http://localhost:8080, you can use the following command to evaluate the model.
# Alternatively, set OPENAI_API_KEY environment variable
export OPENAI_API_KEY=xxx
python3 query.py \
--datasets d2a ctf magma big-vul devign \
--db_path ./query_result.db \
--api_endpoint http://localhost:8080 \
--model 'Llama-2-7b-chat-hf' \
--trials 5 \
--do_binary_classification \
--do_multiple_classification \
--concurrency 100
Specifically, it will query the model Llama-2-7b-chat-hf with the datasets d2a, ctf, magma, big-vul, and devign. The results will be stored in the SQLite database query_result.db. The script will run 5 trials for each dataset and the script will send 100 requests concurrently.
After that, evaluate the result database with the following command.
python3 eval.py ./query_result.db [./query_result_1.db, ./query_result_2.db, ...]
It will generate a all_result.csv file, containing the different metrics for each dataset with different prompt type.
Extra Data
The compiled binaries (including fixed and unfixed, compiled with -g -fno-inline-functions -O2 to better represent the original individual vulnerable functions) and source code (with MAGMA_ENABLE_FIXES and MAGMA_ENABLE_CANARIES left untouched) are available at VulBench.
News
- [2023/12/18] We release the raw datasets of VulBench.
- [2024/3/19] We release the code for querying the model and evaluating the results.
Bibtex
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{gao2023far,
title={How Far Have We Gone in Vulnerability Detection Using Large Language Models},
author={Gao, Zeyu and Wang, Hao and Zhou, Yuchen and Zhu, Wenyu and Zhang, Chao},
journal={arXiv preprint arXiv:2311.12420},
year={2023}
}
Related Skills
proje
Interactive vocabulary learning platform with smart flashcards and spaced repetition for effective language acquisition.
YC-Killer
2.7kA library of enterprise-grade AI agents designed to democratize artificial intelligence and provide free, open-source alternatives to overvalued Y Combinator startups. If you are excited about democratizing AI access & AI agents, please star ⭐️ this repository and use the link in the readme to join our open source AI research team.
best-practices-researcher
The most comprehensive Claude Code skills registry | Web Search: https://skills-registry-web.vercel.app
flutter-tutor
Flutter Learning Tutor Guide You are a friendly computer science tutor specializing in Flutter development. Your role is to guide the student through learning Flutter step by step, not to provide d
