VulZoo
VulZoo: A Comprehensive Vulnerability Intelligence Dataset | ASE 2024 Demo
Introduction
VulZoo is a large-scale vulnerability intelligence dataset that integrates various sources of structural and non-structural data. It is designed to be used by security researchers, penetration testers, and security analysts to get a comprehensive view of vulnerabilities and their associated data.
This dataset is divided into two parts: raw data and processed data.
- raw-data/: contains the raw data from different sources.
- processed/: contains the processed data that is extracted or converted from the raw data.
VulZoo aims to provide the most comprehensive profiling of vulnerabilities for downstream tasks, e.g., vulnerability detection, assessment, prioritization, exploitation, and mitigation.
The following figure shows the conceptual overview of VulZoo:

README.md in processed/ provides more details about the processed data.
Quick Start
If the existing data in VulZoo meets your needs, you can simply clone this repository without the --recurse-submodules option:
git clone https://github.com/NUS-Curiosity/VulZoo
The dataset is in the processed/ directory. If you need up-to-date data, please follow the data management process below.
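To get a first impression of what a fresh clone contains, a small sketch like the following can summarize the processed/ directory by top-level entry. It makes no assumptions about the internal layout of processed/ beyond it being a directory; the function name `summarize` is hypothetical and is not part of VulZoo. For the official statistics, use the print-statistics.py script mentioned later in this README.

```python
import os


def summarize(root="processed"):
    """Map each top-level entry under `root` to (file_count, total_bytes)."""
    summary = {}
    for entry in sorted(os.listdir(root)):
        top = os.path.join(root, entry)
        if os.path.isfile(top):
            # A plain file directly under root counts as a single file.
            summary[entry] = (1, os.path.getsize(top))
            continue
        count = size = 0
        # Walk the subdirectory and accumulate file count and size.
        for dirpath, _dirs, files in os.walk(top):
            for name in files:
                count += 1
                size += os.path.getsize(os.path.join(dirpath, name))
        summary[entry] = (count, size)
    return summary


if __name__ == "__main__":
    for name, (count, size) in summarize().items():
        print("{:<30} {:>8} files {:>14} bytes".format(name, count, size))
```

Run it from the repository root so that the default `processed` path resolves.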
Data Management
Pre-requisites:
- Python 3.6+
- Disk space: 25GB+
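The two pre-requisites can be checked before cloning with a short script. This is a minimal sketch, not part of VulZoo: the function name `check_prerequisites` is hypothetical, and "25GB" is interpreted here as 25 × 1024³ bytes, which is an assumption.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)   # from the pre-requisites list
MIN_FREE_GB = 25      # from the pre-requisites list (unit interpretation assumed)


def check_prerequisites(path=".", version=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means OK."""
    # Default to the running interpreter and the free space at `path`.
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    if version < MIN_PYTHON:
        problems.append("Python {}.{} found, 3.6+ required".format(*version))
    free_gb = free_bytes / 1024 ** 3
    if free_gb < MIN_FREE_GB:
        problems.append("only {:.1f} GB free, 25 GB+ required".format(free_gb))
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `version` and `free_bytes` parameters exist only so the check can be exercised without depending on the local machine.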
VulZoo is composed of both git-based and non-git-based sources. The git-based sources are from upstream repositories and organized as git submodules in this repository. The non-git-based sources are crawled and maintained in this repository. To get started, clone the repository with the following command:
git clone --recurse-submodules https://github.com/NUS-Curiosity/VulZoo
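If the repository was cloned earlier without --recurse-submodules, the git-based sources will be missing. The sketch below parses the output of `git submodule status`, in which a leading `-` marks a submodule that has not been initialized. The helper names and the default `VulZoo` directory are assumptions for illustration, and paths containing spaces are not handled.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with '-'."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format: "-<sha1> <path>"; the path is the second field.
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_repo(repo_dir="VulZoo"):
    """Run `git submodule status` in repo_dir and list uninitialized paths."""
    out = subprocess.check_output(
        ["git", "submodule", "status"], cwd=repo_dir, universal_newlines=True
    )
    return uninitialized_submodules(out)


if __name__ == "__main__":
    for path in check_repo():
        print("not initialized:", path)
```

If any paths are reported, running `git submodule update --init --recursive` inside the clone fetches them without cloning again.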
VulZoo provides some useful scripts to help you manage the data. As some scripts require specific Python packages, it is recommended to install the required packages first:
pip install -r requirements.txt
You can run the sync-raw-data.sh script to incrementally update the local raw data:
./sync-raw-data.sh
Then, you can run the sync-processed.sh script to process the raw data and synchronize the processed data with the latest raw data:
./sync-processed.sh
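The two steps above can be chained so that the processed data is only rebuilt when the raw-data sync succeeds. This is a sketch under the assumption that both scripts are executable and live in the repository root, as the commands above suggest; `run_sync` and its `runner` parameter are hypothetical names, not part of VulZoo.

```python
import subprocess

# Order matters: the processed data is derived from the raw data.
SYNC_STEPS = ["./sync-raw-data.sh", "./sync-processed.sh"]


def run_sync(repo_dir=".", runner=subprocess.check_call):
    """Run each sync script in order and stop at the first failure."""
    completed = []
    for script in SYNC_STEPS:
        # check_call raises CalledProcessError on a non-zero exit status,
        # which prevents the next step from running.
        runner([script], cwd=repo_dir)
        completed.append(script)
    return completed


if __name__ == "__main__":
    print("finished:", ", ".join(run_sync()))
```

The `runner` argument defaults to `subprocess.check_call` and can be swapped for a stub when testing the ordering without touching any data.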
P.S.
- You can run print-statistics.py to get the statistics of the processed data.
- Updating attackerkb-database requires an API key provided by AttackerKB. Please set it via an environment variable and run sync-attackerkb.py in scripts/raw-data manually.
- The CPE dictionary is too large to be uploaded to GitHub. Please run the sync-cpe.sh scripts in both scripts/raw-data and scripts/processed locally.
Data Sources
Structural
- CVE (Common Vulnerabilities and Exposures)
- NVD (National Vulnerability Database)
- CWE (Common Weakness Enumeration)
- CAPEC (Common Attack Pattern Enumeration and Classification)
- CISA KEV (Known Exploited Vulnerabilities)
- ZDI Advisory
- GitHub Advisory
- MITRE ATT&CK
- MITRE D3FEND
- AttackerKB
Non-structural
- Exploit-DB
- oss-security mailing list
- full-disclosure mailing list
- bugtraq mailing list
- GitHub
- git.kernel.org
Hybrid
Citation
If you use this dataset, please cite the VulZoo paper:
@inproceedings{10.1145/3691620.3695345,
author = {Ruan, Bonan and Liu, Jiahao and Zhao, Weibo and Liang, Zhenkai},
title = {VulZoo: A Comprehensive Vulnerability Intelligence Dataset},
year = {2024},
isbn = {9798400712487},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3691620.3695345},
doi = {10.1145/3691620.3695345},
booktitle = {Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering},
pages = {2334–2337},
numpages = {4},
location = {Sacramento, CA, USA},
series = {ASE '24}
}