BugFarm

Artifact repository for the paper "Challenging Bug Prediction and Repair Models with Synthetic Bugs", In Proceedings of The 25th IEEE International Conference on Source Code Analysis & Manipulation (SCAM 2025), Auckland, New Zealand, September 2025

The paper was accepted at SCAM 2025. Its authors are Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, and Reyhaneh Jabbarvand.

Overview

BugFarm is a framework that generates synthetic bugs by analyzing the least-attended tokens and statements in code. These bugs are meant to challenge and evaluate bug prediction and repair models. The pipeline extracts methods from projects, analyzes attention weights, identifies the least-attended components, and prompts LLMs to generate plausible bugs.

Data Archive

Please visit Zenodo to access the results of BugFarm. We will refer to certain files from this archive in the following sections.

Getting Started

Using Docker (Recommended)

The easiest way to set up BugFarm is using Docker:

```
# Build the Docker image
docker build -t bugfarm .

# Run the container
docker run -it bugfarm bash
```

Manual Setup

If you prefer a manual setup:

1. Install miniconda.
2. Create and activate the environment:
   ```
   conda env create -f environment.yaml
   conda activate bugfarm
   ```
3. Set up the tokenizer tool.
4. Install dependencies and download projects:
   ```
   bash setup.sh
   ```

Project Modules

Attention Analyzer

This module extracts methods from projects and analyzes attention weights to determine the least-attended tokens (LAT) and least-attended statements (LAS).

Key steps:

1. Extract methods from projects
2. Extract attention weights
3. Analyze attention weights to determine LAT/LAS

For detailed instructions, see Attention Analyzer README.
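
The LAT/LAS step listed above can be illustrated with a minimal sketch. It assumes a single square attention matrix in which row i holds the attention that token i pays to every token. The function names, the column-mean aggregation, the token-to-statement mapping, and the cutoff `k` are all illustrative assumptions; this README does not specify how BugFarm aggregates attention across layers and heads.

```python
from collections import defaultdict


def received_attention(attention):
    """Average attention each token receives (column mean of a square matrix)."""
    n = len(attention)
    return [sum(row[j] for row in attention) / n for j in range(n)]


def least_attended_tokens(attention, k):
    """Indices of the k tokens that receive the least attention, lowest first."""
    scores = received_attention(attention)
    return sorted(range(len(scores)), key=lambda j: (scores[j], j))[:k]


def least_attended_statements(attention, token_to_statement, k):
    """Rank statements by the mean received attention of their tokens.

    token_to_statement[j] is the (hypothetical) statement index of token j.
    """
    scores = received_attention(attention)
    per_statement = defaultdict(list)
    for j, stmt in enumerate(token_to_statement):
        per_statement[stmt].append(scores[j])
    means = {s: sum(v) / len(v) for s, v in per_statement.items()}
    return sorted(means, key=lambda s: (means[s], s))[:k]


# Toy 4-token example: tokens 0-1 form statement 0, tokens 2-3 form statement 1.
attention = [
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.3, 0.3, 0.3, 0.1],
    [0.4, 0.2, 0.2, 0.2],
]
lat = least_attended_tokens(attention, k=2)
las = least_attended_statements(attention, [0, 0, 1, 1], k=1)
```

In this toy input the last two tokens receive the least attention, so both `lat` and `las` point at the second statement. A real run would start from the attention weights of a code model rather than a hand-written matrix.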

Bug Generator

This module uses LLMs to generate synthetic bugs based on the attention analysis results.

Key steps:

1. Prompt LLM with LAT/LAS information
2. Parse LLM responses to extract buggy methods
3. Select the most suitable bugs

For detailed instructions, see Bug Generator README.
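
The prompting and parsing steps listed above can be sketched as follows. The prompt wording, the `build_prompt` and `extract_buggy_method` helpers, the choice of Java for the toy method, and the assumption that the model replies with a fenced code block are all hypothetical. BugFarm's actual prompts and parsing logic are in the Bug Generator module, and the selection step is omitted here because this README does not state its criteria.

```python
import re

# A Markdown code fence, built indirectly so this example stays well-formed.
FENCE = "`" * 3


def build_prompt(method_source, least_attended_lines):
    """Ask the model to inject a bug only into the given 1-based line numbers."""
    numbered = "\n".join(
        f"{i}: {line}" for i, line in enumerate(method_source.splitlines(), start=1)
    )
    targets = ", ".join(str(n) for n in sorted(least_attended_lines))
    return (
        "Introduce a plausible bug into the method below.\n"
        f"Only modify these lines: {targets}.\n"
        f"Reply with the full buggy method inside a {FENCE}java code block.\n\n"
        f"{numbered}\n"
    )


def extract_buggy_method(response):
    """Return the body of the first fenced code block, or None if there is none."""
    pattern = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(response)
    return match.group(1).strip() if match else None


method = "int add(int a, int b) {\n    return a + b;\n}"
prompt = build_prompt(method, least_attended_lines=[2])

# A made-up model reply, used here only to exercise the parser.
reply = (
    f"Here is the buggy version:\n{FENCE}java\n"
    "int add(int a, int b) {\n    return a - b;\n}\n"
    f"{FENCE}"
)
buggy = extract_buggy_method(reply)
```

Returning `None` when no code block is found lets a caller discard or retry malformed responses instead of treating free-form text as a method.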

We provide synthetic bugs on Zenodo. Please download mutants.zip from the BugFarm Zenodo archive.

Create Defect Dataset

This module creates datasets for training and evaluating bug detection models using various sources:

- BugSwarm
- Mockito-Closure (from Defects4J)
- RegMiner
- LEAM
- muBERT

For detailed instructions, see Create Defect Dataset README.

We provide defect datasets on Zenodo. Please download defect_datasets.zip from the BugFarm Zenodo archive.
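
As a rough illustration of what a defect dataset for a bug detection model could look like, the sketch below pairs each method with a binary label and produces a shuffled train/test split serialized as JSON Lines. The field names (`code`, `label`, `source`), the 80/20 split, and the helper names are assumptions for illustration, not the schema of defect_datasets.zip.

```python
import json
import random


def make_examples(correct_methods, buggy_methods, source):
    """Label correct methods 0 and buggy methods 1, tagging each with its origin."""
    examples = [{"code": m, "label": 0, "source": source} for m in correct_methods]
    examples += [{"code": m, "label": 1, "source": source} for m in buggy_methods]
    return examples


def split(examples, test_ratio=0.2, seed=0):
    """Shuffle deterministically and split into (train, test)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples)


# Toy methods standing in for real correct/buggy pairs.
correct = [f"int f{i}(int x) {{ return x + {i}; }}" for i in range(5)]
buggy = [f"int f{i}(int x) {{ return x - {i}; }}" for i in range(5)]
examples = make_examples(correct, buggy, source="toy")
train, test = split(examples)
train_jsonl = to_jsonl(train)
```

Fixing the shuffle seed keeps the split reproducible, which matters when models fine-tuned on different bug sources are compared against each other.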

Bug Prediction

This module fine-tunes models for bug prediction using the created defect datasets.

For detailed instructions, see Finetuning README.

Bug Repair

We use the FitRepair artifacts to repair the generated mutants. Please refer to the original FitRepair repository for usage details. The patches generated by FitRepair are available on Zenodo: download apr.zip from the BugFarm Zenodo archive.

Human Study

The results of our human study on the generated bugs are in human_study.zip in the BugFarm Zenodo archive. The human labeler results are also available directly on UIUCPlus, where separate branches hold the results for different human labelers and mutants.

LEAM

This module generates mutants using the LEAM framework.

For detailed instructions, see LEAM README.

muBERT

This module generates mutants using the muBERT framework.

For detailed instructions, see muBERT README.

Contact

For any questions or issues, please contact Ali Reza Ibrahimzada or open an issue on GitHub.
