BigCloneEval: Evaluating Clone Detection Tools with BigCloneBench

BigCloneEval is a framework for performing clone detection tool evaluation experiments with the BigCloneBench clone benchmark.

Contact Information

We are happy to answer any of your questions regarding BigCloneEval or BigCloneBench.

  • Jeff Svajlenko (jeff.svajlenko@gmail.com)
  • Chanchal K. Roy (chanchal.roy@usask.ca)

Contributing

Feel free to open an issue if you encounter problems or bugs.

We also welcome bug fixes, improvements, and feature additions by pull request. Since it may take us some time to integrate your changes into this repository, the ideal method is to fork this repository, make your changes there, and submit a pull request. Feel free to link your fork in your pull request and/or an issue so that others can use your changes while we work to integrate them into the main repository.

Maintainers

@jeffsvajlenko

Contributors

@qw3ry
@SimonBaars
@exKAZUu
@T45K

(An always up-to-date list can be found here)

Installation and Setup

Complete the following steps to install and set up BigCloneEval.

Alternatively, download the VM version of BigCloneEval to have a pre-configured environment. Username: bce, password: clones.

The VM is available in VMware format at: https://1drv.ms/u/s!AhXbM6MKt_yLj_42w4Y-l5isPPiOOw?e=SIS5yW

The VM was created with VMware Workstation Player, which is free for personal and educational purposes: https://www.vmware.com/products/workstation-player.html.

Step 1: Get the latest version of BigCloneEval

BigCloneEval is available as a GitHub repository. The latest version can be retrieved using the following git command:

git clone https://github.com/jeffsvajlenko/BigCloneEval

Step 2: Get the latest version of BigCloneBench

Direct link: https://1drv.ms/u/s!AhXbM6MKt_yLj_NwwVacvUzmi6uorA?e=eMu0P4

Extract the contents of BigCloneBench (BigCloneBench_BCEvalVersion.tar.gz) into the 'bigclonebenchdb' directory of the BigCloneEval distribution.

To manually view this database, use h2database: http://h2database.com/html/main.html.

Step 3: Get the latest version of IJaDataset

Direct link: https://1drv.ms/u/s!AhXbM6MKt_yLj_N15CewgjM7Y8NLKA?e=cScoRJ

Extract the contents of IJaDataset (IJaDataset_BCEvalVersion.tar.gz) into the 'ijadataset' directory of the BigCloneEval distribution.

This should create a directory 'ijadataset/bcb_reduced/' which contains one sub-directory per functionality in BigCloneBench.

Step 4: Build the source code.

From the root directory, run make.

Step 5: Initialize the tools database

From the commands/ directory, execute the init script. This will initialize the tools database.
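
Once the five steps are complete, a quick sanity check can confirm that the directories named above are in place. The following is a minimal sketch and is not part of BigCloneEval: the function name `missing_setup_dirs` and the rule "exists and is non-empty" are assumptions made for illustration. Only the directory names come from the steps above.

```python
from pathlib import Path
from typing import List

# Directory names taken from the installation steps; everything else
# in this sketch is hypothetical.
EXPECTED_DIRS = ("bigclonebenchdb", "ijadataset/bcb_reduced", "commands")


def missing_setup_dirs(root: str) -> List[str]:
    """Return the expected directories that are absent or empty under `root`."""
    base = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        path = base / name
        # A directory that exists but is empty suggests an archive
        # was never extracted into it.
        if not (path.is_dir() and any(path.iterdir())):
            missing.append(name)
    return missing
```

Calling `missing_setup_dirs(".")` from the root of the BigCloneEval distribution should return an empty list if every directory is populated. It does not verify that the contents are correct or that the build succeeded.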

Using BigCloneEval

The following documents the usage of BigCloneEval. Please also see the demonstration video on the BigCloneEval webpage.

Experimental Process

To evaluate the recall of a clone detection tool you must complete the following steps:

  1. Register the clone detection tool with BigCloneEval.
  2. Detect clones in IJaDataset using the clone detection tool.
  3. Import the detected clones into BigCloneEval.
  4. Configure and execute the evaluation experiment.

These steps are performed using BigCloneEval's commands, which are located in the commands/ directory as scripts. These scripts must be executed from within the commands/ directory (that is, it must be the working directory).
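
For orientation, the quantity these steps measure can be sketched in a few lines. Recall is the fraction of the benchmark's reference clone pairs that the tool reported. The sketch below is a simplification under stated assumptions: the `Fragment` representation and the function names are hypothetical, and a detected pair counts only if it matches a reference pair exactly, whereas BigCloneEval supports configurable clone-matching algorithms.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation: (file path, start line, end line).
Fragment = Tuple[str, int, int]
# A clone pair is unordered, so (a, b) and (b, a) are the same pair.
ClonePair = FrozenSet[Fragment]


def clone_pair(a: Fragment, b: Fragment) -> ClonePair:
    """Build an order-independent clone pair from two fragments."""
    return frozenset((a, b))


def recall(reference: Iterable[ClonePair], detected: Iterable[ClonePair]) -> float:
    """Fraction of reference clone pairs that also appear in `detected`.

    Exact matching only; this is an illustrative simplification.
    """
    reference_set: Set[ClonePair] = set(reference)
    if not reference_set:
        raise ValueError("the reference set must not be empty")
    found = reference_set & set(detected)
    return len(found) / len(reference_set)
```

Detected pairs that are not in the reference set do not affect the result, which is why this procedure measures recall and not precision.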

In the next section we outline the available commands. We then discuss how each step can be performed in detail, followed by specific documentation for each command.

Commands Summary

The commands are available in the commands/ directory, and should be executed from that directory. Executing a command with the -h flag will show its parameters. The special bcb command can execute all other commands and provides usage help. Run ./bcb help to get the following overview:

Usage: bcb COMMAND
The big clone bench tool
Commands:
  clearClones     Removes the imported clones for the specified registered tool.
  countClones     Count the number of clones that have been imported for the
                    tool.
  deleteTool      Deletes a tool, specified by its ID, from the framework. Also
                    removes any imported clones for this tool.
  detectClones    Executes the clone detection tool for IJaDataset in an
                    automated procedure. Requires a script that configures and
                    executes the tool, and the scalability limits of the tool
                    in terms of the maximum input size measured in source
                    files. Used deterministic input partitioning to overcome
                    scalability limits. Optional, clone detection can be
                    performed manually if desired.
  evaluateRecall  Measures the recall of the clones given. Highly configureable.
                    Summarizes recall per clone type, per inter vs
                    intra-project clones, per functionality in BigCloneBench
                    and for different syntactical similarity regions in the
                    output tool evaluation report.
  evaluateTool    Measures the recall of the specific tool based on the clones
                    imported for it. Highly configureable, including using
                    custom clone-matching algorithms. Summarizes recall per
                    clone type, per inter vs intra-project clones, per
                    functionality in BigCloneBenchand for different syntactical
                    similarity regions in the output tool evaluation report.
  importClones    Imports the clones detected by a tool in IJaDataset into the
                    framework for evaluation. Clones are provided as clone
                    pairs in a simple CSV file. See documentation below for the
                    expected format.
  init            This command initializes the tools database. It is used on
                    first-time setup. It can also be used to restore the tools
                    database to its original condition. This will delete any
                    tools, and their clones, from the database, and restart the
                    ID increment to 1.
                  This may take some time to execute as the database is
                    compacted.
  listTools       Lists the tools registered in the database. Including their
                    ID, name and description.
  partitionInput  partitions the files from the input directory to the output
                    directory.
  registerTool    Registers a clone detection tool with the framework. Requires
                    a name and description of the tool, which is stored in the
                    tools database. Returns a unique identifier for the tool
                    for indicating the target tool for the other commands. Name
                    and description are for reference by the user.
  help            Displays help information about the specified command

Step 1: Register Tool

First, the tool must be registered with the framework. This is done with the registerTool command, which requires a name and description of the tool. The intention is for the user to use the name field to record the name and version of the tool, and the description to denote the configuration used in the detection experiment. These are stored for later reference by the user. A unique identifier is output for the user to refer to this tool in the later steps.

Step 2: Clone Detection

Next, the tool must be executed for IJaDataset. This can be done manually or using our detectClones command. Use whichever method is easiest for your tool.

Manually

You must execute the tool for each subdirectory in ijadataset/bcb_reduced. For example, you must execute the tool for ijadataset/bcb_reduced/2 and ijadataset/bcb_reduced/3, etc. Each sub-directory contains the files from the full IJaDataset that contain clones of one of the functionalities in BigCloneBench.

Some of these sub-directories may contain too many source files for the scalability constraints of some tools, particularly those that require significant memory. In this case the partitionInput command can be used to split these inputs into a number of smaller inputs given a maximum number of files the tool can reliably handle.
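
To illustrate the idea of splitting an input under a maximum file count, here is a minimal sketch. It is not the implementation of partitionInput, and the function name and strategy are assumptions. Simply cutting the file list into chunks would miss clones between files that land in different chunks, so this sketch builds blocks of half the limit and combines every two blocks, which guarantees that any two files share at least one input.

```python
from itertools import combinations
from typing import List, Sequence


def partition_inputs(files: Sequence[str], max_files: int) -> List[List[str]]:
    """Split `files` into inputs of at most `max_files` files each, such that
    every pair of files appears together in at least one input."""
    if max_files < 2:
        raise ValueError("max_files must be at least 2")
    ordered = sorted(files)  # sorting makes the partitioning deterministic
    if len(ordered) <= max_files:
        return [ordered]  # already within the tool's limit
    block_size = max_files // 2
    blocks = [ordered[i:i + block_size]
              for i in range(0, len(ordered), block_size)]
    # Each input is the union of two blocks, so it never exceeds max_files,
    # and any two files end up together in at least one input.
    return [first + second for first, second in combinations(blocks, 2)]
```

The cost of this guarantee is that the number of inputs grows quadratically with the number of blocks, and the same clone may be reported from several inputs, so detected clones should be de-duplicated before import.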

Automatically

To use the automatic clone detection procedure, a script must be provided that configures and executes the tool. On Linux and OSX this is expected to be a bash script, and it is executed using bash -c /path/to/script/ /path/to/clone/detection/input/. On Wind
