M2TS
No description available
Install / Use
/learn @TranSMS/M2TSREADME
M2TS: Multi-scale Multi-modal Approach Based on Transformer for Source Code Summarization
We public the source code, datasets and results for the M2TS.
Datasets
In the M2TS, we use three large-scale datasets for experiments, including two Java and one Python datasets. In data file, we give the three datasets, which obtain from following paper. If you want to train the model, you must download the datasets.
PYB(Python Barone) dataset
- paper: https://arxiv.org/abs/1707.02275
- data: https://github.com/EdinburghNLP/code-docstring-corpus
JAH(Java Hu) dataset
- paper: https://xin-xia.github.io/publication/ijcai18.pdf
- data: https://github.com/xing-hu/TL-CodeSum
JAL(Java Leclair) dataset
- paper: https://arxiv.org/pdf/1904.02660.pdf
- data: http://leclair.tech/data/funcom/
Data preprocessing
M2TS uses ASTs and source code modalities, which uses the JDK compiler to parse java methods as ASTs, and the Treelib toolkit to prase Python functions as ASTs. In addition, before embedding ASTs, we use BERT method to embed the information of nodes.
Get ASTs
In Data-pre file, the java_get_ast.py generates ASTs for two Java datasets and python_get_ast.py generates ASTs for Python functions. You can run the following command:
python3 source.code ast.json
BERT embedding
Get here to Install the server and client, detail in here
pip install bert-serving-server
pip install bert-serving-client
Train-Test
In Model file, the _main_.py enables to train the model. We evaluate the quality of the generated summaries using four evaluation metrics.
Train and test model:
Python3 _main_.py
The nlg-eval can be set up in the following way, detail in here.
Install Java 1.8.0 (or higher).
Install the Python dependencies, run:
pip install git+https://github.com/Maluuba/nlg-eval.git@master
Requirements
If you want to run the model, you will need to install the following packages.
pytorch 1.7.1
bert-serving-client 1.10.0
bert-serving-server 1.10.0
javalang 0.13.0
nltk 3.5
networkx 2.5
scipy 1.1.0
treelib 1.6.1
Results
In result file, we give the testing results on the datasets. The Python_pre.txt is the generated summaries for PYB dataset.
Related Skills
node-connect
342.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
85.3kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
342.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
342.5kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
