LLM4CodeSummarization
Code for "Source Code Summarization in the Era of Large Language Models"
Environment
Our experiments run with Python 3.7 and PyTorch 1.6.0.
The other required packages can be installed with pip install -r requirements.txt.
Datasets
The datasets used in our experiments, including the human evaluation datasets, can be found here.
Build the Erlang, Haskell, and Prolog Datasets
The code for building the Erlang, Haskell, and Prolog datasets is in the dataset directory.
cd ./dataset
- Crawl data from GitHub
python crawl.py
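A rough sketch of what this crawling step could look like, assuming GitHub's REST repository-search API is used to find projects per language and only files with the matching extension are kept. The URL, query parameters, and extension map below are illustrative guesses, not the actual logic of crawl.py:

```python
import json
from urllib.request import Request, urlopen

# Hypothetical mapping from target language to source-file extension.
EXT_BY_LANG = {"erlang": ".erl", "haskell": ".hs", "prolog": ".pl"}

def search_url(language, page=1, per_page=50):
    """Build a GitHub repository-search URL for the given language."""
    return ("https://api.github.com/search/repositories"
            f"?q=language:{language}&sort=stars&page={page}&per_page={per_page}")

def is_target_file(path, language):
    """Keep only files whose extension matches the target language."""
    return path.endswith(EXT_BY_LANG[language])

def fetch_repos(language, token=None):
    """Fetch one page of candidate repositories (requires network access)."""
    req = Request(search_url(language),
                  headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req) as resp:
        return json.load(resp)["items"]
```

An authenticated token is advisable in practice, since unauthenticated GitHub API requests are heavily rate-limited.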
- Extract <function, summary> pairs
python erlang.py
python haskell.py
python prolog.py
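As an illustration of the extraction step, here is a minimal heuristic for Erlang: treat a block of leading '%' comment lines immediately above a function head as that function's summary. The actual erlang.py, haskell.py, and prolog.py scripts may use different, language-specific heuristics:

```python
import re

# Matches an Erlang function head such as "add(A, B) ->".
FUNC_HEAD = re.compile(r"^([a-z]\w*)\(.*\)\s*->")

def extract_pairs(source):
    """Extract (function_name, summary) pairs from Erlang source text."""
    pairs, comment = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("%"):
            # Accumulate consecutive comment lines as the candidate summary.
            comment.append(stripped.lstrip("% ").rstrip())
        elif FUNC_HEAD.match(stripped):
            # A function head directly below a comment block forms a pair.
            if comment:
                pairs.append((FUNC_HEAD.match(stripped).group(1),
                              " ".join(comment)))
            comment = []
        elif stripped:
            # Any other code line breaks the comment-function adjacency.
            comment = []
    return pairs
```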
Use LLMs for Code Summarization
- Call LLMs to generate comments
python run.py
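A minimal sketch of this generation step (run.py's actual prompt and client configuration may differ): build a zero-shot prompt per function and send it to a chat-completion endpoint. The prompt wording and model name below are assumptions:

```python
def build_prompt(code, language):
    """Assemble a hypothetical zero-shot summarization prompt."""
    return (f"Please generate a short comment in one sentence for the "
            f"following {language} function:\n\n{code}")

# Example call with the official openai client (assumed setup, needs an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": build_prompt(code, "Erlang")}],
# )
# raw_response = resp.choices[0].message.content
```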
- Extract comments from LLMs' responses
python beautify.py
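LLM responses often wrap the comment in code fences, quotes, or extra chatter. A guessed cleanup in the spirit of beautify.py (the real script may differ): strip markdown code fences and surrounding quotes, then keep the first non-empty line as the extracted comment.

```python
import re

def extract_comment(response):
    """Pull a single clean comment line out of a raw LLM response."""
    # Drop any fenced code blocks the model echoed back.
    text = re.sub(r"```.*?```", "", response, flags=re.DOTALL)
    for line in text.splitlines():
        line = line.strip().strip('"').strip()
        if line:
            return line
    return ""
```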
Evaluate with LLMs
- Evaluate with GPT-4 (used for RQ2-RQ5)
python evaluate.py
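A sketch of GPT-4-as-judge evaluation (the exact prompt used by evaluate.py is not shown here): ask the judge model to rate a generated comment on a 1-5 scale, then parse the first digit out of its reply. Both the rubric wording and the parsing rule are assumptions:

```python
import re

def build_judge_prompt(code, comment):
    """Build a hypothetical rating prompt for the judge model."""
    return ("Rate how well the following comment summarizes the code "
            "on a scale of 1 (worst) to 5 (best). Reply with a single number.\n\n"
            f"Code:\n{code}\n\nComment:\n{comment}")

def parse_score(reply):
    """Extract the first 1-5 digit from the judge's reply, or None."""
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None
```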
- Evaluate with LLMs on the human evaluation dataset (used for RQ1). The file human_eval_record_{language}.csv can be found here.
python llm-eval.py
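RQ1-style agreement between automatic scores and human judgments is typically measured with a rank correlation. A small self-contained Spearman sketch (llm-eval.py may instead rely on a library such as scipy, and this version assumes no tied scores):

```python
def _ranks(values):
    """Assign 1-based ranks by ascending value (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = float(rank + 1)
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation between two equal-length score lists."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mean = (n + 1) / 2  # mean rank; rank variances are equal without ties
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var
```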
Results
We upload the results of our experiments here, in which:
- the codesum directory contains LLMs' responses (.csv) and the comments (.txt) extracted from the responses
- the gpt-eval directory contains GPT-4's evaluation scores for RQ2-RQ5
- the RQ1 directory contains the human evaluation scores and the scores of each metric in RQ1
Figures
The directory ./figures contains examples of the five prompting techniques (zero-shot, few-shot, chain-of-thought, critique, expert), which are not presented in the paper due to the page limit.
