FIRE
FIRE: Combining Multi-Stage Filtering with Taint Analysis for Scalable Recurring Vulnerability Detection.
FIRE-Public
Overview
The project consists of four components (packages): BloomFilter (SFBF, Section 3.1), TokenFilter (Token Similarity Filter, Section 3.2),
SyntaxFilter (AST Similarity Filter, Section 3.3), and Trace (Vulnerability Identification Phase, Section 4).
In addition, the Dataset package provides utility classes for loading datasets, including the Old-New-Funcs dataset and the NormalSample dataset, and a class for loading the target system (Dataset/target_project.py).
During detection, five directories are used: cache, log, processed, result, and workspace.
We provide a Dockerfile and a Flask server (server.py), so you can build the project into a Docker image and submit detection jobs over HTTP.
Installation
Read this before installing:
** Make sure you install the right versions of the requirements and dependencies! **
Installing the wrong version of a dependency may cause exceptions and bugs, since several dependencies are under heavy development and change quickly.
** Do not extract the files on Windows and then copy them to Linux. Extract them on Linux using tar and unzip. **
Extracting the files on Windows may lose some metadata and cause permission issues during detection.
Install Python Requirements
conda
conda env create -f environment.yml
pip
# Install Python Requirements Except Torch
pip install -r requirements.txt
# Install Torch
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
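Since mismatched dependency versions are a common source of errors, a quick check after installation can help. The following is a minimal sketch, not part of FIRE: the helper name `find_mismatches` is hypothetical, the pins are copied from the pip command above, and the handling of a local version suffix such as `+cpu` is an assumption about how the CPU wheels may be labeled.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def find_mismatches(expected):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Assumption: CPU wheels may carry a local suffix such as "+cpu"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The script prints nothing when all three packages match their pins.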
Install CodeBert
Put codebert-base in resource/codebert.
We use the pretrained CodeBert model provided by neulab. You can find it here: codebert-cpp.
FIRE can be extended to other languages. If you are interested in migrating FIRE from C/C++ to another language, change codebert-cpp to codebert-<another-language> and find the matching pretrained model on Hugging Face.
See also the neulab code-bert-score model list: https://github.com/neulab/code-bert-score#huggingface--models
If there is no pretrained model for your language, you can use this one without pretraining: microsoft/codebert-base.
Note: These CodeBert repos all contain LFS objects. A plain git clone may miss vital objects stored in LFS, so you should manually download those LFS objects after cloning the model.
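One way to notice missing LFS objects is to look for files that are still Git LFS pointer stubs, which are small text files beginning with a fixed `version` line. The sketch below is not part of FIRE; the function name `find_lfs_pointers` is hypothetical, and `resource/codebert` is the location suggested above.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still LFS pointers, not real objects."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    pointers = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    # Adjust the path if your model lives somewhere else.
    for pointer in find_lfs_pointers("resource/codebert"):
        print(f"still an LFS pointer, download the real file: {pointer}")
```

Any file the script lists needs to be replaced by the real object downloaded from the model repository.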
Install Joern
Joern needs Java to run. In our project we use jdk-17.0.11.
Install Java
Get the tar.gz tarball of the JDK and extract it to resource/jdk-17.0.11.
We have tried multiple versions of Java, and Java 17 works best. Make sure you install the right Java version.
export JAVA_HOME="/path/to/FIRE-public/resource/jdk-17.0.11"
export PATH=$PATH:$JAVA_HOME/bin
java --version
java 17.0.11 2024-04-16 LTS
Java(TM) SE Runtime Environment (build 17.0.11+7-LTS-207)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.11+7-LTS-207, mixed mode, sharing)
Install Joern-cli
We use version 1.2.1 of Joern. You can find Joern in this GitHub repo: joernio/joern.
You can find the v1.2.1 joern-cli.zip file here: joern-cli.zip
Please download the Joern zip archive and unzip it to resource/joern-cli.
Version 1.2.1 works best for our project. ** Make sure you install the right version. **
./resource/joern-cli/joern
██╗ ██████╗ ███████╗██████╗ ███╗ ██╗
██║██╔═══██╗██╔════╝██╔══██╗████╗ ██║
██║██║ ██║█████╗ ██████╔╝██╔██╗ ██║
██ ██║██║ ██║██╔══╝ ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║ ██║██║ ╚████║
╚════╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝
Version: 1.2.1
Type `help` or `browse(help)` to begin
joern>
About Ctags
Since Ctags is lightweight open-source software, we ship its binary in Dataset/universal-ctags together with its COPYING file.
You therefore don't need to install it. However, make sure the ctags file has execute permission (+x) before running.
./Dataset/universal-ctags/ctags --version
Universal Ctags 6.0.0(293f11e), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Dec 20 2023, 10:38:07
URL: https://ctags.io/
Output version: 0.0
Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +yaml, +packcc, +optscript
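If the bundled binary lost its execute permission (for example after being copied between file systems), the bit can be restored from a script. This is a minimal sketch rather than FIRE code: `ensure_executable` is a hypothetical helper, and the path is the one used in the command above.

```python
import os
import stat
from pathlib import Path


def ensure_executable(binary_path):
    """Add the user execute bit if it is missing; return True if it was added."""
    path = Path(binary_path)
    if os.access(path, os.X_OK):
        return False
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR)
    return True


if __name__ == "__main__":
    ctags = "Dataset/universal-ctags/ctags"  # path used in the command above
    if Path(ctags).exists():
        changed = ensure_executable(ctags)
        print("execute bit added" if changed else "ctags is already executable")
    else:
        print(f"ctags binary not found at {ctags}")
```

Running `chmod +x` on the file from a shell has the same effect.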
About Redis
Trace needs Redis for caching. We use a Redis Docker container in our experiments.
** Run FIRE Outside Docker **
You can install Redis v7.2.3 with a package manager or use Docker.
For example, you can launch Redis with the command below.
docker run -p 6379:6379 redis:7.2.3
** Build and Run FIRE in Docker **
Make sure you have put Redis 7.2.3 in resource/redis-7.2.3.
No external Redis container is needed during detection, since Redis is installed while the Docker image is built.
If you run FIRE outside Docker, this step is not needed.
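Before starting a detection run outside Docker, it can be useful to confirm that Redis is reachable. The sketch below uses only the standard library and sends Redis's inline `PING` command over a raw socket. It is not taken from FIRE: the function name `redis_is_up` is hypothetical, the default port 6379 comes from the docker command above, and it assumes the server does not require authentication.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Send PING to a Redis server and report whether it answered +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis accepts inline commands
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `redis_is_up()` with no arguments checks the instance started by the docker command above. A server that requires a password replies with an error instead of `+PONG`, so the function returns False in that case.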
Datasets
We use the Old-New-Funcs dataset to store all the vulnerability and patch pairs, which are used by all the components of FIRE.
Old-New-Funcs Dataset
We suggest putting the dataset in resource/OldNewFuncs.
Unfortunately, we cannot open-source the dataset we used in this project, but you can build one from your own data following the structure below.
An example of the Old-New-Funcs dataset folder structure:
|-- OldNewFuncs
| |-- ffmpeg (software directory)
| | |-- CVE-2009-0385 (CVE directory)
| | | |-- CVE-2009-0385_CWE-189_72e715fb798f2cb79fd24a6d2eaeafb7c6eeda17_4xm.c_1.1_fourxm_read_header_OLD.vul [Vulnerable Version]
| | | |-- CVE-2009-0385_CWE-189_72e715fb798f2cb79fd24a6d2eaeafb7c6eeda17_4xm.c_1.1_fourxm_read_header_NEW.vul [Patch Version]
| | | |-- ...Other Old-New-Funcs files (with the filename extension `.vul`)
| | |-- ...Other CVEs
| |-- ...Other Software
We do not use the software and CVE directory names. However, we do use the filenames of the Old-New-Funcs files. Each Old-New-Funcs file should store one function.
The Old-New-Funcs filename structure:
[CVE-No.]_[CWE-No.]_[Commit]_[File Extracted From]_[Version]_[Function Name]_[OLD/NEW].vul
The OLD tag refers to the vulnerable version, while the NEW tag refers to the patched version.
FIRE uses the CVE, Function Name, and OLD/NEW parts of the filename, so please set them properly.
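When building your own dataset, a small validator can confirm that filenames follow this structure. The sketch below is an illustration and not FIRE's own parser: the names `OldNewFuncName` and `parse_old_new_func_name` are hypothetical, and the regular expression assumes a hexadecimal commit hash, a dotted-number version, and a CWE field without underscores. Because both the source file name and the function name may contain underscores, the split between them is a heuristic (the file part is matched lazily).

```python
import re
from typing import NamedTuple, Optional


class OldNewFuncName(NamedTuple):
    cve: str
    cwe: str
    commit: str
    source_file: str
    version: str
    function: str
    tag: str  # "OLD" (vulnerable) or "NEW" (patched)


# Assumed field shapes; adjust the pattern if your data differs.
_PATTERN = re.compile(
    r"^(?P<cve>CVE-\d+-\d+)_(?P<cwe>[^_]+)_(?P<commit>[0-9a-fA-F]+)_"
    r"(?P<source_file>.+?)_(?P<version>\d+(?:\.\d+)*)_"
    r"(?P<function>.+)_(?P<tag>OLD|NEW)\.vul$"
)


def parse_old_new_func_name(filename: str) -> Optional[OldNewFuncName]:
    """Split an Old-New-Funcs filename into its fields, or return None."""
    match = _PATTERN.match(filename)
    return OldNewFuncName(**match.groupdict()) if match else None


if __name__ == "__main__":
    name = (
        "CVE-2009-0385_CWE-189_72e715fb798f2cb79fd24a6d2eaeafb7c6eeda17"
        "_4xm.c_1.1_fourxm_read_header_OLD.vul"
    )
    parsed = parse_old_new_func_name(name)
    print(parsed.cve, parsed.function, parsed.tag)
```

For the example file from the folder structure above, the three fields FIRE relies on come out as `CVE-2009-0385`, `fourxm_read_header`, and `OLD`.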
NormalSample Dataset (no longer needed)
We suggest putting the dataset in resource/NormalSample.
The NormalSample dataset structure:
|-- NormalSample Dataset
| |-- ffmpeg (software directory)
| | |-- ...functions
| |-- ...Other Software
There are no extra constraints on the filenames of the normal functions stored in the software directory.
How To Run
Run Locally
Make sure you have properly installed all the requirements and prepared the datasets before running.
You can run python3 main.py --help to read the project's help message.
Currently, FIRE only runs on Linux.
Basic Usage
python3 main.py /path/to/target/system
Help Message
python3 main.py --help
usage: main.py [-h] [--rebuild [{bloomFilter,old-new-funcs,normal-sample,target} ...]] project
Extract data from project dir
positional arguments:
project Path to the project dir
options:
-h, --help show this help message and exit
--rebuild [{bloomFilter,old-new-funcs,target} ...]
Rebuild any of the components/dataset cache
Note: It is better to put the project argument before the options to avoid parsing errors. An example using the --rebuild option:
python3 main.py /path/to/target --rebuild bloomFilter old-new-funcs target
Rebuild Option
We provide the rebuild option to rebuild the cache whenever the dataset is updated. We suggest applying all the rebuild options the first time you run the project.
If you update the Old-New-Funcs dataset, please rebuild bloomFilter and old-new-funcs.
If you do not specify any rebuild options, the target option is applied by default, so the functions of the target system are extracted each time before vulnerability detection.
Use spaces to separate the options if you want to apply multiple rebuild options.
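The advice to put the project path before the options follows from how a multi-value option is parsed. The sketch below reconstructs a parser from the help message shown earlier and is not the actual main.py: `build_parser` is a hypothetical name, the choices are copied from the options section of the help text, and `default=["target"]` is an assumption that mirrors the default behavior described above.

```python
import argparse

# Choices copied from the help message; the real main.py may differ.
REBUILD_CHOICES = ["bloomFilter", "old-new-funcs", "target"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="main.py", description="Extract data from project dir"
    )
    parser.add_argument("project", help="Path to the project dir")
    parser.add_argument(
        "--rebuild",
        nargs="*",  # zero or more values, separated by spaces
        choices=REBUILD_CHOICES,
        default=["target"],  # assumption: target is rebuilt when nothing is given
        help="Rebuild any of the components/dataset cache",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # Project first: --rebuild then only sees its own values.
    args = parser.parse_args(
        ["/path/to/target", "--rebuild", "bloomFilter", "old-new-funcs"]
    )
    print(args.project, args.rebuild)
```

With this kind of parser, writing `--rebuild bloomFilter /path/to/target` lets `--rebuild` consume the path as one of its values, which is rejected as an invalid choice.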
Results
Detection results are displayed in the console and also written to the result folder. You can find them in result/[target-system].
Run Remote or In Docker
Run server.py if you want to run FIRE remotely. If you use Docker, server.py runs automatically.
This starts a Flask server on port 8000 on the machine or in the container. You can change the port in server.py.
python3 server.py
You can submit a vulnerability detection job using the following HTTP request.
Request
- Method: GET
- URL: /process?git-url={git-url}&branch={branch}
  - git-url: Git URL of the target system.
  - branch: tag or branch of the target system.
Response
- Body(Json)
  - time: project runtime.
  - vul: vulnerabilities detected.
  - vul_cnt: count of the detected vulnerabilities.
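A client for this endpoint could look like the sketch below. It uses only the standard library and is an illustration, not code shipped with FIRE: the function names `build_process_url` and `submit_job` are hypothetical, and the base URL in the usage note is a placeholder for wherever your server runs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_process_url(base_url: str, git_url: str, branch: str) -> str:
    """Build the /process URL with percent-encoded query parameters."""
    query = urlencode({"git-url": git_url, "branch": branch})
    return f"{base_url.rstrip('/')}/process?{query}"


def submit_job(base_url: str, git_url: str, branch: str) -> dict:
    """Send the GET request and decode the JSON body (keys: time, vul, vul_cnt)."""
    with urlopen(build_process_url(base_url, git_url, branch)) as response:
        return json.load(response)
```

For example, `submit_job("http://localhost:8000", "https://example.com/repo.git", "main")` would request a scan of the `main` branch of a placeholder repository. The call blocks until the server responds, which may take a while for large targets.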
Docker build
You should fully generate the cache (old-new-funcs and bloomFilter) before building the Docker image.
docker build .
Notes
In the Trace component, we use lazy caching (the cache vector for a vulnerability or patch function is generated only when that function is needed) instead of generating the vectors of all vulnerability and patch functions in advance. This accelerates the experiments, but the first run of FIRE might be slower than expected. However, in production environment,
