LLM4Vul
Reproduction package of the paper "Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT", published at the International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024.
Software Vulnerability Prediction in Low-Resource Languages
This is the README file for the reproduction package of the paper: "Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT".
I. Description
This project focuses on predicting software vulnerabilities using the GPT-3.5-turbo model. It comprises three stages: data preprocessing, model training, and result analysis.
II. Repository Structure
- DataPreprocessing.ipynb: A Jupyter notebook for reading the dataset and removing code comments to prepare the data for model training.
- OtherLanguageVulPrediction.ipynb: Implements a GPT model to predict software vulnerabilities. The model is trained using the cleaned dataset to detect and predict potential vulnerabilities.
- data/: Contains the original and pre-processed datasets, ready for use in model training and evaluation.
- utils/: Holds utility scripts and code snippets used in DataPreprocessing.ipynb and OtherLanguageVulPrediction.ipynb for data manipulation, preprocessing tasks, and API interactions.
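The comment-removal step that DataPreprocessing.ipynb performs can be illustrated with a minimal sketch. This is not the notebook's actual code: the helper name `strip_comments` is hypothetical, and the sketch assumes C-style `//` and `/* ... */` comments and ignores edge cases such as comment markers inside string literals.

```python
import re

# Hypothetical helper, not taken from the repository.
# Assumes C-style comment syntax (// and /* ... */) and does not
# handle comment markers that appear inside string literals.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_comments(source: str) -> str:
    """Remove block and line comments, then drop lines left blank."""
    without_block = _BLOCK_COMMENT.sub("", source)
    without_line = _LINE_COMMENT.sub("", without_block)
    lines = [line.rstrip() for line in without_line.splitlines()]
    return "\n".join(line for line in lines if line.strip())


example = """/* entry point */
fun main() {
    // print a greeting
    println("hi")
}
"""
cleaned = strip_comments(example)
print(cleaned)
```

A regex-based approach like this is easy to read but approximate; a language-aware tokenizer would be more reliable for code that contains `//` inside strings, such as URLs.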
III. Setup and Usage
Prerequisites:
- Python (3.x recommended)
- Jupyter Notebook or JupyterLab
- OpenAI API key for using GPT models
Steps
- Install Dependencies: Install all the required Python packages listed in OtherLanguageVulPrediction.ipynb.
- Set up your OpenAI API Key
  - Obtain an API key from OpenAI's developer portal.
  - In OtherLanguageVulPrediction.ipynb, replace `openai.api_key = ""` with your API key, and keep the key confidential (for example, do not commit it to version control).
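One way to keep the key confidential is to read it from an environment variable instead of pasting it into the notebook. The sketch below is an assumption, not part of the repository: the variable name `OPENAI_API_KEY` and the helper `load_api_key` are both hypothetical choices for illustration.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# In the notebook, the assignment could then become (assumption):
# openai.api_key = load_api_key()
```

With this approach the key never appears in the notebook file, so the notebook can be shared or committed without exposing it.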
Usage
- Start Jupyter Notebook or JupyterLab: `jupyter notebook`
- Open DataPreprocessing.ipynb and run the cells to preprocess the dataset.
- Open OtherLanguageVulPrediction.ipynb to train the model and view the prediction results.
