SkillAgentSearch skills...

M2TS

No description available

Install / Use

/learn @TranSMS/M2TS
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

M2TS: Multi-scale Multi-modal Approach Based on Transformer for Source Code Summarization

We public the source code, datasets and results for the M2TS.

Datasets

In the M2TS, we use three large-scale datasets for experiments, including two Java and one Python datasets. In data file, we give the three datasets, which obtain from following paper. If you want to train the model, you must download the datasets.

PYB(Python Barone) dataset

  • paper: https://arxiv.org/abs/1707.02275
  • data: https://github.com/EdinburghNLP/code-docstring-corpus

JAH(Java Hu) dataset

  • paper: https://xin-xia.github.io/publication/ijcai18.pdf
  • data: https://github.com/xing-hu/TL-CodeSum

JAL(Java Leclair) dataset

  • paper: https://arxiv.org/pdf/1904.02660.pdf
  • data: http://leclair.tech/data/funcom/

Data preprocessing

M2TS uses ASTs and source code modalities, which uses the JDK compiler to parse java methods as ASTs, and the Treelib toolkit to prase Python functions as ASTs. In addition, before embedding ASTs, we use BERT method to embed the information of nodes.

Get ASTs

In Data-pre file, the java_get_ast.py generates ASTs for two Java datasets and python_get_ast.py generates ASTs for Python functions. You can run the following command:

python3 source.code ast.json

BERT embedding

Get here to Install the server and client, detail in here

pip install bert-serving-server  
pip install bert-serving-client

Train-Test

In Model file, the _main_.py enables to train the model. We evaluate the quality of the generated summaries using four evaluation metrics. Train and test model:

Python3 _main_.py

The nlg-eval can be set up in the following way, detail in here.
Install Java 1.8.0 (or higher).
Install the Python dependencies, run:

pip install git+https://github.com/Maluuba/nlg-eval.git@master

Requirements

If you want to run the model, you will need to install the following packages.

pytorch 1.7.1  
bert-serving-client 1.10.0  
bert-serving-server 1.10.0  
javalang 0.13.0  
nltk 3.5  
networkx 2.5  
scipy 1.1.0  
treelib 1.6.1

Results

In result file, we give the testing results on the datasets. The Python_pre.txt is the generated summaries for PYB dataset.

Related Skills

View on GitHub
GitHub Stars24
CategoryDevelopment
Updated3mo ago
Forks3

Languages

NewLisp

Security Score

67/100

Audited on Dec 16, 2025

No findings