AutoVCoder
ICCD'24 paper: "AutoVCoder: A systematic framework for automated verilog code generation"
Introduction
AutoVCoder is a systematic open-source framework designed to improve both the correctness and the overall quality of the Verilog code that large language models (LLMs) generate. The framework integrates three techniques: a high-quality hardware dataset generation approach, a two-round LLM fine-tuning method, and a domain-specific retrieval-augmented generation (RAG) mechanism.
- High-Quality Hardware Dataset Generation: Enhances model training by generating high-quality datasets.
- Two-Round LLM Fine-Tuning: Further improves the model's generation capabilities through a two-round fine-tuning process.
- Domain-Specific RAG Mechanism: Utilizes retrieval-augmented generation to improve the quality of Verilog code generation.
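To make the retrieval step concrete, here is a minimal sketch of how retrieved examples could be prepended to a generation prompt. It is not the AutoVCoder implementation: the bag-of-words `embed` function stands in for the trained domain-specific retriever, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and the two-entry corpus are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a trained retriever)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query, corpus, k=1):
    """Prepend the retrieved examples to the task description."""
    context = "\n\n".join(retrieve(query, corpus, k))
    return f"Reference examples:\n{context}\n\nTask: {query}\n"


# Hypothetical two-entry example corpus.
corpus = [
    "module and2(input a, input b, output y); assign y = a & b; endmodule",
    "module dff(input clk, input d, output reg q); "
    "always @(posedge clk) q <= d; endmodule",
]
prompt = build_prompt("write a d flip-flop with clk input", corpus, k=1)
```

The resulting `prompt` would then be passed to the fine-tuned LLM in place of the bare task description.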
Experimental results demonstrate that AutoVCoder outperforms both industrial and academic LLMs in Verilog code generation.
Our paper, "AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs", was presented at ICCD '24. Please refer to it for more details:
@inproceedings{autovcoder,
author = {Mingzhe Gao and Jieru Zhao and Zhe Lin and Wenchao Ding and Xiaofeng Hou and Yu Feng and Chao Li and Minyi Guo},
title = {AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs},
booktitle = {IEEE International Conference on Computer Design (ICCD)},
year = {2024},
}
Overview

Our framework includes three key components: 1) generating a high-quality hardware dataset, 2) a two-round fine-tuning process for LLMs, and 3) a domain-specific retriever training mechanism for RAG. We collect Verilog designs from GitHub and use a scoring system to filter out low-quality code. The refined dataset is used for the first round of LLM fine-tuning, which teaches the model Verilog syntax and design principles. For the second round, we create a synthetic dataset using ChatGPT-3.5 and pass it through a verification process to ensure code correctness. Finally, we enhance Verilog code generation with RAG by training domain-specific retrievers with contrastive learning, so that they fetch relevant examples and knowledge.
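As an illustration of the contrastive objective behind retriever training, the sketch below computes an InfoNCE-style loss for one query against one positive and several negative examples. This README does not say which contrastive loss AutoVCoder uses, so InfoNCE, the `temperature` value, and the function names here are assumptions chosen for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive candidate.

    The loss is small when the query embedding is closer to the positive
    than to every negative, and large otherwise.
    """
    q = normalize(query)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, c) / temperature for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Hypothetical 2-D embeddings: the first call has a well-aligned positive,
# the second has the positive and the negative swapped.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
swapped = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss over many (query, positive, negatives) triples pulls each query toward its relevant example and pushes it away from irrelevant ones, which is the behavior a retriever needs.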
Requirements
You can install the required packages for running this project using:
pip3 install --upgrade pip
pip3 install -r requirements.txt
Project File Tree
The project file structure is as below:
.
├── data
│   ├── first_round
│   │   └── dataset          # dataset for first round
│   ├── second_round
│   │   └── dataset          # dataset for second round
│   └── rag
│       └── dataset          # dataset for RAG
├── src
│   ├── first_round
│   │   ├── build_dataset    # Build dataset for first round
│   │   └── train            # Train model for first round
│   ├── second_round
│   │   ├── build_dataset    # Build dataset for second round
│   │   └── train            # Train model for second round
│   └── rag                  # RAG dataset
│       └── train            # Train model for retriever
├── tests
│   ├── rtllm                # RTLLM benchmark
│   └── verilog-eval         # verilog-eval benchmark
│       └── test.py          # Script for testing the model
├── requirements.txt         # List of project dependencies
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For any questions or suggestions, please contact us via GitHub Issues.
