ReliableLM4Code

This repository extends our recent work, "Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey" and "Large Language Models for Software Engineering: A Systematic Literature Review". It gathers the material behind our research together with a curated collection of LM4Code papers and other resources (datasets, tutorials, etc.). The focus is primarily on work that uses pre-trained models, especially large language models, to improve the reliability of language models in software engineering research.

For more details, please visit the project website.

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to pitfalls that hinder realistic performance and, in turn, limit their reliability and applicability in real-world deployment. These challenges call for a comprehensive understanding: not just identifying the issues, but also examining their implications and existing solutions, so that more reliable language models can be built for code intelligence.

Following a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code, identifying 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and open challenges of each pitfall. The resulting classification scheme dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap that helps researchers and practitioners understand and use LM4Code in reliable and trustworthy ways.

Please feel free to send a pull request to add papers and related resources that are not listed here. Our complete paper lists, with detailed review information, have been uploaded to Google Drive.

Content

Papers

Data Collection and Labeling

Unbalanced Distribution

  • Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), arXiv, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
  • Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! (2023), ICSE, X Yang, et al. [pdf]
  • On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
  • Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
  • An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
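
To make the pitfall concrete, here is a toy random-oversampling sketch in Python. It is not taken from any of the papers above (which typically use SMOTE-style interpolation or class-weighted losses); the function `oversample_minority` and the toy data are purely illustrative:

```python
import random
from collections import Counter

def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class examples until classes are balanced.

    `samples`/`labels` are parallel lists, e.g. code snippets and 0/1
    vulnerability labels.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = [], []
    for c in counts:
        pool = [s for s, l in zip(samples, labels) if l == c]
        out_x.extend(pool)
        out_y.extend([c] * len(pool))
        extra = target - len(pool)            # pad up to the majority size
        out_x.extend(rng.choices(pool, k=extra))
        out_y.extend([c] * extra)
    return out_x, out_y

# toy usage: 1 vulnerable vs. 4 non-vulnerable functions
x, y = ["f1", "f2", "f3", "f4", "vuln"], [0, 0, 0, 0, 1]
_, by = oversample_minority(x, y)
print(Counter(by))  # Counter({0: 4, 1: 4})
```

A caveat several of these studies emphasize: resample the training split only; balancing the evaluation set inflates reported performance.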

Label Errors

  • Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
  • XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, et al. [pdf]
  • Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper) (2023), ISSTA, X Nie, et al. [pdf]
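
As a toy illustration (not the method of any specific paper above), one cheap way to surface suspect labels is to flag examples whose assigned label the model itself finds unlikely, a simplified cousin of confident learning; `flag_suspect_labels` and the toy data are hypothetical:

```python
def flag_suspect_labels(probs, labels, threshold=0.3):
    """Return indices of examples whose given label looks unlikely.

    probs[i] is the model's (ideally out-of-fold) class-probability list
    for example i; labels[i] is the dataset label. Low self-confidence is
    a signal for manual re-inspection, not proof of a wrong label.
    """
    return [i for i, (p, y) in enumerate(zip(probs, labels)) if p[y] < threshold]

# toy usage: example 2 is labeled 1, but the model gives that label prob 0.1
probs = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
labels = [0, 1, 1]
print(flag_suspect_labels(probs, labels))  # [2]
```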

Data Noise

  • Slice-Based Code Change Representation Learning (2023), SANER, F Zhang, et al. [pdf]
  • Are we building on the rock? on the importance of data preprocessing for code summarization (2022), FSE, L Shi, et al. [pdf]
  • Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? (2018), ASE, Z Liu, et al. [pdf]
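
In the spirit of these preprocessing studies (the actual rule sets there are far more extensive), a minimal sketch of heuristic filters for code-summary pairs; `is_noisy_pair` and its thresholds are our own illustrative choices:

```python
import re

AUTO_GENERATED = re.compile(r"auto[- ]?generated|do not edit", re.IGNORECASE)

def is_noisy_pair(code: str, summary: str) -> bool:
    """Flag code-summary pairs that common cleaning heuristics would drop."""
    if not code.strip() or not summary.strip():
        return True                              # empty side
    if AUTO_GENERATED.search(summary):
        return True                              # boilerplate comment
    if len(summary.split()) < 3:
        return True                              # too short to be informative
    if summary.strip().lower() in code.lower():
        return True                              # summary copied from the code
    return False

pairs = [
    ("def add(a, b): return a + b", "Add two numbers and return the sum."),
    ("def f(): pass", "TODO"),
    ("class A: pass", "Auto-generated, do not edit."),
]
print(sum(not is_noisy_pair(c, s) for c, s in pairs))  # 1 clean pair survives
```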

System Design and Learning

Data Snooping

  • AutoTransform: automated code transformation to support modern code review process (2022), ICSE, P Thongtanunam, C Pornprasit, C Tantithamthavorn. [pdf]
  • Can Neural Clone Detection Generalize to Unseen Functionalities? (2021), ASE, C Liu, et al. [pdf]
  • CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation (2020), TDSC, S Liu, et al. [pdf]
  • Deep just-in-time defect prediction: how far are we? (2021), ISSTA, Z Zeng, et al. [pdf]
  • Patching as translation: the data and the metaphor (2020), ASE, Y Ding, et al. [pdf]
  • An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
  • Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2023), ICSE, S Gao, et al. [pdf]
  • Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
  • Syntax and Domain Aware Model for Unsupervised Program Translation (2023), ICSE, F Liu, J Li, L Zhang. [pdf]
  • How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
  • Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
  • On the Evaluation of Neural Code Summarization (2022), ICSE, E Shi, Y Wang, L Du, et al. [pdf]
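
A recurring remedy in these papers is to split by time and to deduplicate across splits, so that no test example (or near-copy of one) was available at training time. A minimal sketch, with exact-match deduplication standing in for the clone and near-duplicate detectors used in practice; `time_aware_split` and the toy records are illustrative:

```python
def time_aware_split(records, cutoff):
    """Split (timestamp, code, label) records so the test set post-dates
    training data, then drop test items whose code also occurs in training."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    seen = {code for _, code, _ in train}
    return train, [r for r in test if r[1] not in seen]

records = [
    (2019, "return a+b", 0),
    (2020, "return a*b", 0),
    (2022, "return a+b", 1),   # duplicate of a training snippet -> dropped
    (2023, "return a-b", 1),
]
train, test = time_aware_split(records, cutoff=2021)
print(len(train), len(test))  # 2 1
```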

Spurious Correlations

  • Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
  • Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
  • Explaining mispredictions of machine learning models using rule induction (2021), FSE, J Cito, I Dillig, S Kim, et al. [pdf]
  • Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching (2021), TOSEM, D Zou, Y Zhu, S Xu, et al. [pdf]
  • Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE, M Paltenghi, M Pradel. [pdf]
  • Vulnerability detection with fine-grained interpretations (2021), FSE, Y Li, S Wang, TN Nguyen. [pdf]
  • What do they capture? a structural analysis of pre-trained language models for source code (2022), ICSE, Y Wan, W Zhao, H Zhang, et al. [pdf]
  • An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
  • Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond (2023), ISSTA, E Shi, Y Wang, H Zhang, et al. [pdf]
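
One behavioral probe that several of these studies motivate: apply a semantics-preserving transformation (here, renaming identifiers to generic names) and measure how often predictions flip; a high flip rate suggests the model keys on surface features such as variable names. A crude regex-based sketch (a real probe would rename via a parser); all names and the toy model are hypothetical:

```python
import re

KEEP = {"def", "return", "if", "else", "for", "while", "in", "import",
        "None", "True", "False", "print", "len", "range"}

def anonymize_identifiers(code: str) -> str:
    """Rewrite identifiers to var0, var1, ... keeping a few keywords/builtins."""
    mapping = {}
    def rename(m):
        name = m.group(0)
        if name in KEEP:
            return name
        mapping.setdefault(name, f"var{len(mapping)}")
        return mapping[name]
    return re.sub(r"[A-Za-z_]\w*", rename, code)

def prediction_flip_rate(model, snippets):
    """Fraction of snippets whose verdict changes after renaming."""
    flips = sum(model(s) != model(anonymize_identifiers(s)) for s in snippets)
    return flips / len(snippets)

# toy model that (spuriously) flags any code mentioning 'password'
model = lambda code: int("password" in code)
print(prediction_flip_rate(model, ["copy(password, buf)", "copy(data, buf)"]))  # 0.5
```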

Inappropriate Model Design

  • Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
  • Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking (2022), TSE, H Wang, P Ma, Y Yuan, et al. [pdf]
  • Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
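
One design question raised in this line of work is whether a fixed input window even sees the relevant code. A small diagnostic sketch (our own, not from the papers above) that reports how often a given token budget would truncate inputs; the whitespace tokenizer is a stand-in for the model's real tokenizer:

```python
def truncation_report(snippets, max_tokens=512):
    """Report how often a fixed-size input window would cut code off."""
    lengths = [len(s.split()) for s in snippets]   # crude token proxy
    return {
        "n": len(snippets),
        "truncated_frac": sum(l > max_tokens for l in lengths) / len(snippets),
        "max_len": max(lengths),
    }

print(truncation_report(["a " * 600, "b " * 100]))
# {'n': 2, 'truncated_frac': 0.5, 'max_len': 600}
```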