# VeriReason

This is the GitHub repo for the paper *VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation*.
## Project Description
This study introduces VeriReason, a novel approach utilizing reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation. The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis in a smaller model size. By using our curated high-quality training examples alongside a feedback-driven reward model, this 3B parameter model delivers exceptional performance on Verilog generation tasks while maintaining efficiency.
<p align="center"> <img src="assets/Verireason_workflow.png" alt="VeriReason Workflow" width="800"/> </p>

## Training Options
### Supervised Fine-Tuning (SFT)
You can use either of the following methods to train an SFT model:
#### Using LLaMA-Factory

```bash
llamafactory-cli train qwen2.5_7b.yaml
```
#### Using OpenR1

1. Move `sft_rtl` to the folder `src/open_r1/`
2. Make the training script executable:
   ```bash
   chmod +x run_rtl_training.sh
   ```
3. Run the training script:
   ```bash
   ./run_rtl_training.sh
   ```
### GRPO Training
For GRPO (Guided Reward Proximal Optimization) training:
1. Move the necessary files to the OpenR1 directory:
   ```bash
   mv verilog_rewards_tb.py verilog_train_tb.py src/open-r1/
   ```
2. Create a new directory for the Verilog recipe:
   ```bash
   mkdir verilog_recipe
   mv verilog_grpo_tb.yaml verilog_recipe/
   ```
3. Example training command:
   ```bash
   NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=5,6,7 ACCELERATE_USE_NCCL=1 \
     accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 \
     src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
   ```
## Datasets
The following datasets are available on Hugging Face:
| Dataset | Description | Link |
|---------|-------------|------|
| RTL-Coder_small | Filtered dataset with no reasoning | Link |
| RTL-Coder_7b_reasoning_tb_simple | VeriReason simple dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb | VeriReason hard dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb_combined | VeriReason combined dataset with reasoning and testbench | Link |
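These datasets can be pulled with the `datasets` library and mapped into the chat format the SFT tooling consumes. A minimal sketch, assuming the rows carry `instruction` and `output` fields (the field names and the dataset id in the `__main__` block are assumptions; substitute the real ids linked in the table):

```python
# Sketch: convert one raw dataset row into a chat-style SFT example.
# Field names ("instruction", "output") are assumed, not confirmed.

def to_chat_example(row: dict) -> dict:
    """Map a raw row to a messages-format training example."""
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["output"]},
        ]
    }

if __name__ == "__main__":
    from datasets import load_dataset
    # Hypothetical id -- use the actual Hugging Face id from the table above.
    ds = load_dataset("ORG/RTL-Coder_7b_reasoning_tb_simple", split="train")
    print(to_chat_example(ds[0]))
```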
## Model checkpoints
The following fine-tuned models are available on Hugging Face:
| Model | Description | Link |
|-------|-------------|------|
| VeriReason-Qwen2.5-1.5B | 1.5B parameter model based on Qwen2.5 | Link |
| VeriReason-Qwen2.5-3B | 3B parameter model based on Qwen2.5 with RTL GRPO | Link |
| VeriReason-Qwen2.5-7b | 7B parameter model based on Qwen2.5 with SFT Reasoning | Link |
| VeriReason-Llama-7b | 7B parameter model based on Code Llama | Link |
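The checkpoints load with standard `transformers` APIs. A hedged sketch (the model id and prompt wording below are illustrative, not the repository's prescribed usage; take the real id from the table above):

```python
# Sketch: run one of the released checkpoints on an RTL spec.
# The model id in __main__ is a placeholder, not a confirmed repo id.

def build_prompt(spec: str) -> str:
    """Wrap an RTL specification in a simple instruction prompt."""
    return ("Write synthesizable Verilog for the following specification. "
            "Reason step by step, then give the module.\n\n" + spec)

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_id = "YOUR_ORG/VeriReason-Qwen2.5-3B"  # hypothetical id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tok(build_prompt("4-bit up counter with synchronous reset"),
                 return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    print(tok.decode(out[0], skip_special_tokens=True))
```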
## Requirements
- CUDA-compatible GPUs
- PyTorch with CUDA support
- Accelerate library
- NCCL for distributed training
