Tesoro

[TOSEM 2025] Datasets and models for TD detection in Java using comment and source code

<div align="center">

Improving the detection of technical debt in Java source code with an enriched dataset

<!-- <p align="center"> <img src="assets/logo.png" width="300px" alt="logo"> </p> -->


</div>


Introduction

Technical debt (TD) arises when developers choose quick fixes over well-designed, long-term solutions. Self-Admitted Technical Debt (SATD) is a type of TD where developers explicitly acknowledge shortcuts in code comments. Most existing approaches focus on analyzing these comments and often overlook the source code itself. This study bridges that gap by building the first dataset that links SATD comments to their corresponding source code, and introduces a novel approach whose input consists solely of source code.

All resources (datasets and models) can be found at Tesoro Hub 🎉.

Tesoro

We propose a novel dataset and construction pipeline (Fig. 1) to obtain informative samples for technical debt detection.

<img src="assets/pipeline.png" alt="logo">

Data Usage

$\text{Tesoro}$ contains two datasets:

  • $\text{Tesoro}_{comment}$: comments serve as the input, supporting SATD-related tasks; source code can be used as additional context.

  • $\text{Tesoro}_{code}$: supports detecting technical debt directly in source code, without relying on natural language comments.

<br>

Dataset on Hugging Face: We publish tesoro-comment and tesoro-code on the Hugging Face Dataset Hub 🤗

from datasets import load_dataset

# Load Tesoro comment
dataset = load_dataset("NamCyan/tesoro-comment")

# Load Tesoro code
dataset = load_dataset("NamCyan/tesoro-code")
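Each record in $\text{Tesoro}_{comment}$ behaves like a plain dict. As a hedged illustration (field names follow the schema in the Data Structure section below; the sample values and the `is_satd` helper are hypothetical, not part of the dataset API):

```python
# Hypothetical record mimicking a tesoro-comment sample; field names follow
# the published schema, values are made up for illustration only.
sample = {
    "id": 0,
    "comment_id": 0,
    "comment": "// TODO: handle null input properly",
    "classification": "IMPLEMENTATION",
    "repo": "apache/example",
}

def is_satd(record: dict) -> bool:
    """A comment counts as self-admitted technical debt unless labeled NONSATD."""
    return record["classification"] != "NONSATD"

print(is_satd(sample))  # -> True
```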

Dataset on Github: Tesoro is also available in this repository at data/tesoro.

Data Structure

  • tesoro-comment
{
    "id": "function id in the dataset",
    "comment_id": "comment id of the function",
    "comment": "comment text",
    "classification": "technical debt types (DESIGN | IMPLEMENTATION | DEFECT | DOCUMENTATION | TEST | NONSATD)",
    "code": "full function context",
    "code_context_2": "2 lines code context",
    "code_context_10": "10 lines code context",
    "code_context_20": "20 lines code context",
    "repo": "Repository that contains this source"
}
  • tesoro-code
{
    "id": "function id in the dataset",
    "original_code": "raw function",
    "code_wo_comment": "original code without comment",
    "cleancode": "normalized version of code (lowercased, newlines removed)",
    "label": "binary list corresponding to 4 TD types (DESIGN, IMPLEMENTATION, DEFECT, TEST)",
    "repo": "Repository that contains this source"
}
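The binary `label` list in $\text{Tesoro}_{code}$ maps positionally onto the four TD types listed above. A minimal decoding sketch (the sample record is hypothetical; only the label ordering is taken from the schema):

```python
# Label positions follow the schema: DESIGN, IMPLEMENTATION, DEFECT, TEST.
TD_TYPES = ["DESIGN", "IMPLEMENTATION", "DEFECT", "TEST"]

def decode_labels(label):
    """Map a binary list like [1, 0, 0, 1] to the TD types it flags."""
    return [td for td, flag in zip(TD_TYPES, label) if flag]

# Hypothetical tesoro-code record for illustration.
sample = {"id": 0, "label": [1, 0, 0, 1], "repo": "apache/example"}
print(decode_labels(sample["label"]))  # -> ['DESIGN', 'TEST']
```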

Data for Experiments

The data prepared for training the SATD detector, performing k-fold evaluation, and answering the research questions is detailed in Data for Experiments.

Experiment Replication

We answer three research questions:

  • RQ1: Do the manually classified comments contribute to an improvement in the detection of SATD?

  • RQ2: Does the inclusion of source code help to enhance the detection of technical debt?

  • RQ3: What is the accuracy of different pre-trained models when detecting TD solely from source code?

All results can be found here. To reproduce the results of our experiments, see Training for more details.

Leaderboard

| Model | Model size | EM | F1 |
|:-------------|:-----------|:------|:------|
| **Encoder-based PLMs** | | | |
| CodeBERT | 125M | 38.28 | 43.47 |
| UniXCoder | 125M | 38.12 | 42.58 |
| GraphCodeBERT | 125M | 39.38 | 44.21 |
| RoBERTa | 125M | 35.37 | 38.22 |
| ALBERT | 11.8M | 39.32 | 41.99 |
| **Encoder-Decoder-based PLMs** | | | |
| PLBART | 140M | 36.85 | 39.90 |
| CodeT5 | 220M | 32.66 | 35.41 |
| CodeT5+ | 220M | 37.91 | 41.96 |
| **Decoder-based PLMs (LLMs)** | | | |
| TinyLlama | 1.03B | 37.05 | 40.05 |
| DeepSeek-Coder | 1.28B | 42.52 | 46.19 |
| OpenCodeInterpreter | 1.35B | 38.16 | 41.76 |
| phi-2 | 2.78B | 37.92 | 41.57 |
| StarCoder2 | 3.03B | 35.37 | 41.77 |
| CodeLlama | 6.74B | 34.14 | 38.16 |
| Magicoder | 6.74B | 39.14 | 42.49 |
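For reference, in multi-label settings EM is commonly computed as exact-match accuracy over the full label vector and F1 as a micro-average over all (sample, label) pairs. The sketch below assumes these common definitions, which may differ in detail from the paper's own evaluation script:

```python
def exact_match(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (sample, label) pairs."""
    pairs = [(t, p) for tv, pv in zip(y_true, y_pred) for t, p in zip(tv, pv)]
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Toy labels over two TD types, two samples.
y_true = [[1, 0], [0, 1]]
y_pred = [[1, 0], [1, 1]]
print(exact_match(y_true, y_pred))  # -> 0.5
print(round(micro_f1(y_true, y_pred), 2))  # -> 0.8
```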

Reference

If you use Tesoro, please cite it with this BibTeX entry:

@article{nam2024tesoro,
  title={Improving the detection of technical debt in Java source code with an enriched dataset},
  author={Hai, Nam Le and Bui, Anh M. T. and Nguyen, Phuong T. and Di Ruscio, Davide and Kazman, Rick},
  journal={},
  year={2024}
}

License

MIT License
