CodeLLMPaper

A continuously updated collection of CodeLLM papers maintained by PurCL group @ Purdue

CodeLLM Paper <a href=https://github.com/PurCL/CodeLLMPaper><img src='https://img.shields.io/github/stars/PurCL/CodeLLMPaper' width="120" height="26" /></a>

This repository provides a curated list of research papers on Large Language Models (LLMs) for code. It aims to help researchers and practitioners explore the rapidly growing body of literature on this topic. The papers are systematically collected from top-tier venues, then categorized and labeled for easier navigation.

A. Venues

We have systematically selected papers from the following venues, which are top-tier conferences and journals in the SE/PL/Sec/NLP communities.

Due to the large volume, we do not systematically collect papers published in top-tier ML conferences (ICML, NeurIPS, and ICLR) or on arXiv. However, we continue to manually add important works published in these venues. We plan to expand the collection over time, and contributions are welcome. For details, see the section How to Contribute.

B. Selection Strategy

  1. Abstract Extraction: Extract the abstracts from bib files or HTML files. The bib and HTML files for the venues listed above are stored in the directory data/rawdata.

  2. Keyword Matching: Keep only the abstracts that meet both of the following conditions:

    • Contains at least one keyword from: {"pretrain", "LLM", "large language model", "transformer", "code model"}.

    • Contains the keyword "code" or "program".

  3. Relevance Check Using LLMs: Use LLMs to verify if the papers obtained in Step 2 are related to LLMs for code.

  4. Manual Labeling: Manually assign labels to the papers based on domain knowledge.

All selected papers, along with their labels, are maintained in the JSON file data/labeldata/labeldata.json. src/process.py is the Python script used for selecting and labeling papers.
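
The keyword-matching rule in Step 2 can be sketched as follows. This is a minimal illustration, not the code in src/process.py: the function name `matches_keywords` and the constant names are hypothetical, and the sketch assumes case-insensitive substring matching, which the description above does not specify.

```python
# Keyword sets taken from Step 2 of the selection strategy.
LLM_KEYWORDS = ("pretrain", "LLM", "large language model", "transformer", "code model")
CODE_KEYWORDS = ("code", "program")


def matches_keywords(abstract: str) -> bool:
    """Return True if the abstract satisfies both Step 2 conditions.

    Assumption: matching is a case-insensitive substring test.
    """
    text = abstract.lower()
    # Condition 1: at least one LLM-related keyword appears.
    has_llm_keyword = any(keyword.lower() in text for keyword in LLM_KEYWORDS)
    # Condition 2: "code" or "program" appears.
    has_code_keyword = any(keyword in text for keyword in CODE_KEYWORDS)
    return has_llm_keyword and has_code_keyword


if __name__ == "__main__":
    # Made-up abstracts, used only to exercise the filter.
    samples = [
        "We pretrain a transformer on source code.",
        "A transformer for image classification.",
        "Program repair via static analysis.",
    ]
    for sample in samples:
        print(matches_keywords(sample), "-", sample)
```

Substring matching is deliberately permissive: "pretrain" also matches "pretrained" and "pretraining", and "program" matches "programming". Any false positives this admits would be left to the LLM relevance check in Step 3.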

C. Taxonomy

The papers in this repository are categorized along three dimensions: Application, Principle, and Research Paradigm. Each paper is assigned multiple labels based on these categories. Note that categories are not necessarily disjoint.

C.1. Application

This category focuses on typical tasks in Software Engineering (SE) and Programming Languages (PL).

C.2. Principle

This category concentrates on the ability of LLMs to understand different forms of code and on the non-functional properties of LLMs (e.g., security and robustness). We also consider how to use LLMs for general reasoning problems, such as typical agent-centric designs and specific PL designs for LLMs.
