BigCloneEval
BigCloneEval - A Clone Detection Tool Evaluation Framework for BigCloneBench
BigCloneEval: Evaluating Clone Detection Tools with BigCloneBench
BigCloneEval is a framework for performing clone detection tool evaluation experiments with the BigCloneBench clone benchmark.
Contact Information
We are happy to answer any of your questions regarding BigCloneEval or BigCloneBench.
- Jeff Svajlenko (jeff.svajlenko@gmail.com)
- Chanchal K. Roy (chanchal.roy@usask.ca)
Contributing
Feel free to open an issue if you encounter problems or bugs.
We also welcome bug fixes, improvements and feature additions by pull request. Since it may take us some time to integrate your changes into this repository, the ideal method is to fork this repository for your changes and submit your pull request from the fork. Feel free to link your fork in your pull request and/or an issue so that others can use your changes while we work to integrate them into the main repository.
Maintainers
@jeffsvajlenko
Contributors
@qw3ry
@SimonBaars
@exKAZUu
@T45K
(An always up-to-date list can be found here)
Installation and Setup
Complete the following steps to install and set up BigCloneEval.
Alternatively, download the VM version of BigCloneEval to have a pre-configured
environment. Username: bce, password: clones.
The VM is available in VMware format at: https://1drv.ms/u/s!AhXbM6MKt_yLj_42w4Y-l5isPPiOOw?e=SIS5yW
The VM was created with VMware Workstation Player, which is free for personal and educational purposes: https://www.vmware.com/products/workstation-player.html.
Step 1: Get the latest version of BigCloneEval
BigCloneEval is available as a GitHub repository. The latest version can be retrieved using the following git command:
git clone https://github.com/jeffsvajlenko/BigCloneEval
Step 2: Get the latest version of BigCloneBench
Direct link: https://1drv.ms/u/s!AhXbM6MKt_yLj_NwwVacvUzmi6uorA?e=eMu0P4
Extract the contents of BigCloneBench (BigCloneBench_BCEvalVersion.tar.gz) into the 'bigclonebenchdb' directory of the BigCloneEval distribution.
To manually view this database, use h2database: http://h2database.com/html/main.html.
Step 3: Get the latest version of IJaDataset
Direct link: https://1drv.ms/u/s!AhXbM6MKt_yLj_N15CewgjM7Y8NLKA?e=cScoRJ
Extract the contents of IJaDataset (IJaDataset_BCEvalVersion.tar.gz) into the 'ijadataset' directory of the BigCloneEval distribution.
This should create a directory 'ijadataset/bcb_reduced/' which contains one sub-directory per functionality in BigCloneBench.
Step 4: Build the source code
From the root directory, run make.
Step 5: Initialize the tools database
From the commands/ directory, execute the init script. This will initialize the tools
database.
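Once the five steps above are complete, a quick check that the expected directories are present can catch a misplaced archive before any experiment is run. The following is a minimal sketch, not part of BigCloneEval: the function name check_bce_layout is hypothetical, and it only tests for the three directories named in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with BigCloneEval): verify that the
# directories named in the installation steps exist under the given root.
check_bce_layout() {
  local root="${1:-.}" missing=0 dir
  for dir in bigclonebenchdb ijadataset/bcb_reduced commands; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "layout ok"
  # Return 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

For example, check_bce_layout /path/to/BigCloneEval prints one "missing: ..." line per absent directory, or "layout ok" when all three are found. It does not inspect the contents of the directories.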
Using BigCloneEval
The following documents the usage of BigCloneEval. Please also see the demonstration video on the BigCloneEval webpage.
Experimental Process
To evaluate the recall of a clone detection tool you must complete the following steps:
- Register the clone detection tool with BigCloneEval.
- Detect clones in IJaDataset using the clone detection tool.
- Import the detected clones into BigCloneEval.
- Configure and execute the evaluation experiment.
These steps are performed using BigCloneEval's commands, which are located in the commands/ directory as scripts. These scripts must be executed from within the commands/ directory (it must be the working directory).
The next section outlines the available commands. We then describe how to perform each step in detail, followed by specific documentation for each command.
Commands Summary
The commands are available in the commands/ directory, and should be executed from that
directory. Executing the commands with the -h flag will show their parameters.
The special bcb command can execute all other commands and provide a usage help. Run
./bcb help to get the following overview:
Usage: bcb COMMAND
The big clone bench tool
Commands:
  clearClones     Removes the imported clones for the specified registered tool.
  countClones     Count the number of clones that have been imported for the
                    tool.
  deleteTool      Deletes a tool, specified by its ID, from the framework. Also
                    removes any imported clones for this tool.
  detectClones    Executes the clone detection tool for IJaDataset in an
                    automated procedure. Requires a script that configures and
                    executes the tool, and the scalability limits of the tool
                    in terms of the maximum input size measured in source
                    files. Used deterministic input partitioning to overcome
                    scalability limits. Optional, clone detection can be
                    performed manually if desired.
  evaluateRecall  Measures the recall of the clones given. Highly configureable.
                    Summarizes recall per clone type, per inter vs
                    intra-project clones, per functionality in BigCloneBench
                    and for different syntactical similarity regions in the
                    output tool evaluation report.
  evaluateTool    Measures the recall of the specific tool based on the clones
                    imported for it. Highly configureable, including using
                    custom clone-matching algorithms. Summarizes recall per
                    clone type, per inter vs intra-project clones, per
                    functionality in BigCloneBenchand for different syntactical
                    similarity regions in the output tool evaluation report.
  importClones    Imports the clones detected by a tool in IJaDataset into the
                    framework for evaluation. Clones are provided as clone
                    pairs in a simple CSV file. See documentation below for the
                    expected format.
  init            This command initializes the tools database. It is used on
                    first-time setup. It can also be used to restore the tools
                    database to its original condition. This will delete any
                    tools, and their clones, from the database, and restart the
                    ID increment to 1.
                    This may take some time to execute as the database is
                    compacted.
  listTools       Lists the tools registered in the database. Including their
                    ID, name and description.
  partitionInput  partitions the files from the input directory to the output
                    directory.
  registerTool    Registers a clone detection tool with the framework. Requires
                    a name and description of the tool, which is stored in the
                    tools database. Returns a unique identifier for the tool
                    for indicating the target tool for the other commands. Name
                    and description are for reference by the user.
  help            Displays help information about the specified command
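Because each command prints its parameters when run with the -h flag, the options for the workflow steps can be reviewed in one pass before starting an experiment. The sketch below is illustrative only: show_command_help is a hypothetical helper, not part of BigCloneEval, and it assumes the command scripts are executable and can be invoked as ./NAME from inside the commands directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the -h output of each named command.
# $1 is the path to the commands directory; remaining arguments are command names.
show_command_help() {
  local commands_dir="$1"; shift
  local cmd
  (
    # The commands must be run with the commands directory as working directory.
    # The subshell keeps the caller's working directory unchanged.
    cd "$commands_dir" || exit 1
    for cmd in "$@"; do
      echo "== $cmd =="
      ./"$cmd" -h
    done
  )
}
```

A typical call would be show_command_help /path/to/BigCloneEval/commands registerTool detectClones importClones evaluateTool, which covers the commands used in the four experimental steps.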
Step 1: Register Tool
First, the tool must be registered with the framework. This is done with the registerTool command, which requires a name and a description of the tool. The intention is for the user to use the name field to record the name and version of the tool, and the description to record the configuration used in the detection experiment. These are stored for later reference by the user. The command outputs a unique identifier, which is used to refer to this tool in the later steps.
Step 2: Clone Detection
Next the tool must be executed for IJaDataset. This can be done manually or using our detectClones command. Use the method which is easiest for your tool.
Manually
You must execute the tool for each sub-directory in ijadataset/bcb_reduced. For example,
you must execute the tool for ijadataset/bcb_reduced/2 and ijadataset/bcb_reduced/3,
etc. Each sub-directory contains the files from the full IJaDataset that contain clones of one
of the functionalities in BigCloneBench.
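The per-directory runs described above can be scripted with a simple loop. This is a minimal sketch under stated assumptions: run_per_functionality is a hypothetical helper, and it assumes your tool can be started from the command line with the input directory as its last argument. Adapt the invocation to whatever your tool actually expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a tool command once per sub-directory of a dataset.
# $1 is the directory holding the per-functionality sub-directories;
# the remaining arguments form the tool command, and each sub-directory
# path is appended as the final argument.
run_per_functionality() {
  local dataset="$1"; shift
  local dir
  for dir in "$dataset"/*/; do
    [ -d "$dir" ] || continue          # skip when the glob matched nothing
    "$@" "${dir%/}" || echo "tool failed on ${dir%/}" >&2
  done
}
```

Pass the directory that holds the per-functionality sub-directories as the first argument, followed by the tool command. Sub-directories are visited in the shell's glob order, so a directory named 10 is processed before one named 2. A failure on one input is reported on standard error and the loop continues with the next.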
Some of these sub-directories may contain too many source files for the scalability constraints of some tools, particularly those that require significant memory. In this case the partitionInput command can be used to split these inputs into a number of smaller inputs given a maximum number of files the tool can reliably handle.
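To see which inputs are likely to need splitting, you can count the files in each sub-directory and compare against the limit your tool can handle. The sketch below is an assumption-laden illustration, not part of BigCloneEval: list_oversized_inputs is a hypothetical helper, and it counts every regular file (recursively), not only source files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list sub-directories that hold more than MAX files.
# $1 is the directory holding the per-functionality sub-directories,
# $2 is the maximum number of files the tool can reliably handle.
list_oversized_inputs() {
  local dataset="$1" max="$2" dir count
  for dir in "$dataset"/*/; do
    [ -d "$dir" ] || continue
    count=$(find "$dir" -type f | wc -l)
    count=$((count))                   # normalise any padding emitted by wc
    if [ "$count" -gt "$max" ]; then
      echo "${dir%/} $count"
    fi
  done
}
```

Each output line gives a sub-directory path followed by its file count. Sub-directories at or below the limit are not listed, so empty output means no input exceeds the limit.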
Automatically
To use the automatic clone detection procedure, a script must be provided that configures
and executes the tool. On Linux and OSX this is expected to be a bash script, and it is
executed using bash -c /path/to/script/ /path/to/clone/detection/input/. On Wind
