dataset 🗂️



Description

dataset is the CRS module that compiles and manages the vulnerable programs which will be analyzed by the CRS.

The supported test suites are:

  • NIST's Juliet
  • NIST's C Test Suite
  • A toy dataset

Limitations

  • ELF format
  • x86 architecture

How It Works

The module performs the following steps for each test suite that needs to be built:

  1. Gets the available sources into the test suite's directory.

  2. Preprocesses the sources so that each one includes all the required source code files and header files.

  3. Writes the preprocessed sources into the sources/ directory in the root of the repository.

  4. Adds a new entry to the vulnerables.csv of the dataset.

  5. Filters the source code files based on the wanted CWEs.

  6. Compiles the preprocessed source files with the compile and link flags collected from multiple files (module files and user-provided files).

  7. Writes the executables into the executables/ directory in the root of the repository.
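Steps 4 and 5 above can be illustrated with a minimal sketch. It is not the module's code: the `SourceEntry` record, its field names, the column order, and the `;` separator are all assumptions made for the example, and the real vulnerables.csv layout may differ.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SourceEntry:
    # Hypothetical record; the real vulnerables.csv columns may differ.
    identifier: str
    cwes: list = field(default_factory=list)
    parent_database: str = ""


def append_entry(csv_file, entry):
    """Write one row describing a preprocessed source (step 4)."""
    writer = csv.writer(csv_file)
    writer.writerow([entry.identifier, ";".join(entry.cwes), entry.parent_database])


def filter_by_cwes(entries, wanted_cwes):
    """Keep the entries that carry at least one of the wanted CWEs (step 5)."""
    wanted = set(wanted_cwes)
    return [entry for entry in entries if wanted.intersection(entry.cwes)]


entries = [
    SourceEntry("toy_test_suite_0", ["Stack-based Buffer Overflow"], "toy_test_suite"),
    SourceEntry("toy_test_suite_1", [], "toy_test_suite"),
    SourceEntry("toy_test_suite_2", ["NULL Pointer Dereference"], "toy_test_suite"),
]

buffer = io.StringIO()  # stands in for the real CSV file
for entry in entries:
    append_entry(buffer, entry)

selected = filter_by_cwes(entries, ["NULL Pointer Dereference"])
print([entry.identifier for entry in selected])
```

Entries without any CWE label are dropped by the filter, which is one reasonable reading of "wanted CWEs"; the module may treat unlabeled programs differently.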

All build operations use GCC and are performed inside a 32-bit Ubuntu 18.04 container.
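As a rough illustration of how a containerized GCC build might be assembled, the sketch below only constructs the command line and prints it. The helper names, the example flags, the `.c` source path, and the `/work` mount point are assumptions for the example; the image tag is the one built in the Setup section. The module itself may drive Docker in a different way.

```python
import shlex


def gcc_command(source, output, compile_flags, link_flags):
    """Build a GCC argv: compile flags before the source, link flags last."""
    return ["gcc", *compile_flags, source, "-o", output, *link_flags]


def docker_command(image, repo_root, inner_command):
    """Wrap a command so it runs in a throwaway container with the repo mounted."""
    return [
        "docker", "run", "--rm",
        "--volume", f"{repo_root}:/work",  # repo_root must be an absolute path
        "--workdir", "/work",
        image,
        *inner_command,
    ]


# Illustrative flags only; the real flags come from module and user files.
module_flags = ["-fno-stack-protector"]
user_flags = ["-g"]
link_flags = ["-lm"]

command = docker_command(
    "ubuntu18.04_32bit_compiler",
    "/path/to/dataset",
    gcc_command(
        "sources/toy_test_suite_0.c",  # hypothetical source file name
        "executables/toy_test_suite_0.elf",
        module_flags + user_flags,
        link_flags,
    ),
)
print(shlex.join(command))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to `subprocess.run`.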

Setup

  1. Make sure you have set up the repositories and Python environment according to the top-level instructions. That is:

    • Docker is installed and is properly running. Check using:

      docker version
      docker ps -a
      docker run --rm hello-world
      

      These commands should run without errors.

    • The current repository and the commons repository are cloned (with submodules) in the same directory.

    • You are running all commands inside a Python virtual environment. Your prompt should show a (.venv) prefix.

    • You have installed Poetry in the virtual environment. If you run:

      which poetry
      

      you should get a path ending with .venv/bin/poetry.

  2. Disable the Python Keyring:

    export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
    

    This works around a problem that may occur in certain situations and that prevents Poetry from getting packages.

  3. Install the required packages with Poetry (based on pyproject.toml):

    poetry install --only main
    
  4. Build the Docker image used to build the dataset assets:

    docker build --platform linux/386 --tag ubuntu18.04_32bit_compiler -f docker/Dockerfile.ubuntu18.04_32bit_compiler .
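
Some of the checks from step 1 and the environment setting from step 2 can also be done from Python. The following is a minimal sketch and not part of the module; the function names are hypothetical. It only checks that a `docker` binary is on the PATH, not that the daemon is running, so the Docker commands from step 1 are still worth running by hand.

```python
import os
import shutil
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def poetry_is_local(poetry_path):
    """True when Poetry resolves to the virtual environment's bin directory."""
    if poetry_path is None:
        return False
    return poetry_path.replace(os.sep, "/").endswith(".venv/bin/poetry")


def keyring_disabled_env(base_env=None):
    """Copy an environment and apply the keyring setting from step 2."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHON_KEYRING_BACKEND"] = "keyring.backends.null.Keyring"
    return env


print("docker on PATH:", shutil.which("docker") is not None)
print("virtual environment active:", in_virtualenv())
print("poetry inside .venv:", poetry_is_local(shutil.which("poetry")))
```

The dictionary returned by `keyring_disabled_env()` can be passed as the `env` argument of `subprocess.run` when calling `poetry install --only main` from a script.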
    

Usage

You can use the dataset module either standalone, as a CLI tool, or integrated into Python applications, as a Python module.

As a CLI Tool

You can invoke the CLI either through the cli.py module:

python dataset/cli.py

or the Poetry interface:

poetry run dataset

Build Test Suite

$ poetry run dataset build --testsuite TOY_TEST_SUITE
✅ Successfully built 5 executables.
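
If the build has to be triggered from a script, the same command can be wrapped with `subprocess`. This is a sketch that assumes Poetry is on the PATH and that the script runs from the repository root; `build_command` and `build_testsuite` are hypothetical helper names, not part of the module.

```python
import subprocess


def build_command(testsuite):
    """Return the argv for building one test suite through Poetry."""
    return ["poetry", "run", "dataset", "build", "--testsuite", testsuite]


def build_testsuite(testsuite):
    """Run the build and return the CLI's standard output.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        build_command(testsuite), check=True, capture_output=True, text=True
    )
    return result.stdout


# Printing the command instead of running it keeps the sketch side-effect free.
print(" ".join(build_command("TOY_TEST_SUITE")))
```

Calling `build_testsuite("TOY_TEST_SUITE")` would then perform the actual build, provided the Setup steps have been completed.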

List Executables

$ poetry run dataset show
✅ The available executables are:

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘

Get Help

$ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...

  Builds and filters datasets of vulnerable programs

Options:
  --help  Show this message and exit.

Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.

As a Python Module

from dataset import Dataset

available_executables = Dataset().get_available_executables()