
Machine Learning for Software Engineering

This repository contains a curated list of papers, PhD theses, datasets, and tools that are devoted to research on Machine Learning for Software Engineering. The papers are organized into popular research areas so that researchers can find recent papers and state-of-the-art approaches easily.

Please feel free to send a pull request to add papers and relevant content that are not listed here.

Content

Papers

Type Inference

  • Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving (2023), OOPSLA'23, Ye, Fangke, et al. [pdf]
  • Learning Type Inference for Enhanced Dataflow Analysis (2023), ESORICS'23, Seidel, Lukas, et al. [pdf]
  • Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors (2023), ICSE'24, Peng, Yun, et al. [pdf]
  • DeepInfer: Deep Type Inference from Smart Contract Bytecode (2023), ESEC/FSE '23, Zhao, Kunsong, et al. [pdf]
  • Statistical Type Inference for Incomplete Programs (2023), ESEC/FSE '23, Peng, Yaohui, et al. [pdf]
  • DeMinify: Neural Variable Name Recovery and Type Inference (2023), ESEC/FSE '23, Li, Yi, et al. [pdf]
  • FQN Inference in Partial Code by Prompt-tuned Language Model of Code (2023), TOSEM journal, Huang, Qing, et al.
  • Generative Type Inference for Python (2023), ASE'23, Peng, Yun, et al. [pdf]
  • Type Prediction With Program Decomposition and Fill-in-the-Type Training (2023), arxiv, Cassano, Federico, et al. [pdf]
  • TypeT5: Seq2seq Type Inference using Static Analysis (2023), ICLR'23, Wei, Jiayi, et al. [pdf]
  • Do Machine Learning Models Produce TypeScript Types that Type Check? (2023), arxiv, Yee, Ming-Ho, and Guha, Arjun [pdf]
  • Cross-Domain Evaluation of a Deep Learning-Based Type Inference System (2022), arxiv, Gruner, Bernd, et al. [pdf] [code]
  • Learning To Predict User-Defined Types (2022), TSE'22, Jesse, Kevin, et al. [pdf]
  • Recovering Container Class Types in C++ Binaries (2022), CGO'22, Wang, Xudong, et al.
  • Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries (2022), PLDI'22, Lehmann, Daniel and Pradel, Michael [pdf]
  • Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python (2022), ICSE'22, Mir, Amir, et al. [pdf][code]
  • Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python (2022), ICSE'22, Peng, Yun, et al. [pdf]
<details><summary><b>Older:</b></summary> <div>
  • StateFormer: Fine-grained Type Recovery from Binaries Using Generative State Modeling (2021), FSE'21, Pei, Kexin, et al. [pdf][code]
  • Type Inference as Optimization (2021), NeurIPS'21 AIPLANS, Pandi, Irene Vlassi, et al. [pdf]
  • SimTyper: Sound Type Inference for Ruby using Type Equality Prediction (2021), OOPSLA'21, Kazerounian, Milod, et al.
  • Learning type annotation: is big data enough? (2021), FSE 2021, Jesse, Kevin, et al. [pdf][code]
  • Cross-Lingual Adaptation for Type Inference (2021), arxiv 2021, Li, Zhiming, et al. [pdf]
  • PYInfer: Deep Learning Semantic Type Inference for Python Variables (2021), arxiv 2021, Cui, Siwei, et al. [pdf]
  • Advanced Graph-Based Deep Learning for Probabilistic Type Inference (2020), arxiv 2020, Ye, Fangke, et al. [pdf]
  • Typilus: Neural Type Hints (2020), PLDI 2020, Allamanis, Miltiadis, et al. [pdf][code]
  • LambdaNet: Probabilistic Type Inference using Graph Neural Networks (2020), arxiv 2020, Wei, Jiayi, et al. [pdf]
  • TypeWriter: Neural Type Prediction with Search-based Validation (2019), arxiv 2019, Pradel, Michael, et al. [pdf]
  • NL2Type: Inferring JavaScript Function Types from Natural Language Information (2019), ICSE 2019, Malik, Rabee S., et al. [pdf][code]
  • Deep Learning Type Inference (2018), ESEC/FSE 2018, Hellendoorn, Vincent J., et al. [pdf][code]
  • Python Probabilistic Type Inference with Natural Language Support (2016), FSE 2016, Xu, Zhaogui, et al.
  • Predicting Program Properties from “Big Code” (2015), POPL'15, Raychev, Veselin, et al. [pdf]
</div> </details>

Code Completion

  • EXECREPOBENCH: Multi-level Executable Code Completion Evaluation (2025), arxiv, Yang, Jian, et al. [pdf]
  • ContextModule: Improving Code Completion via Repository-level Contextual Information (2025), arxiv, Guan, Zhanming, et al. [pdf]
  • REPOFUSE: Repository-Level Code Completion with Fused Dual Context (2024), arxiv, Liang, Ming, et al. [pdf]
  • Non-Autoregressive Line-Level Code Completion (2024), TOSEM, Liu, Fang, et al.
  • IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion (2024), arxiv, Li, Bolun, et al. [pdf]
  • Language Models for Code Completion: A Practical Evaluation (2024), ICSE'24, Izadi et al. [pdf]
  • Context Composing for Full Line Code Completion (2024), IDE'24, Semenkin et al. [pdf]
  • De-Hallucinator: Iterative Grounding for LLM-Based Code Completion (2024), arxiv, Eghbali, A., & Pradel, M. [pdf]
  • When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference (2024), ICSE'24, Sun, Zhensu, et al. [pdf]
  • CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (2023), NeurIPS'23, Ding, Yangruibo, et al. [pdf]
  • Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context (2023), NeurIPS'23, Agrawal, Lakshya A., et al. [pdf]
  • Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation (2023), NeurIPS'23, Liu, Jiawei, et al. [pdf]
  • Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases (2023), arxiv, Tang, Ze, et al. [pdf]
  • RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (2023), arxiv, Liu, T., et al. [pdf]
  • A Static Evaluation of Code Completion by Large Language Models (2023), arxiv, Ding, Hantian, et al. [pdf]
  • Large Language Models of Code Fail at Completing Code with Potential Bugs (2023), NeurIPS'23, Dinh, Tuan, et al. [pdf]
  • RepoFusion: Training Code Models to Understand Your Repository (2023), arxiv, Shrivastava, Disha, et al.
