
VeriReason

This is the GitHub repository for the paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation


VeriReason Repository

This repository contains the tools and configurations used to train the language models described in the paper above.

Project Description

This study introduces VeriReason, an approach that uses reinforcement learning with testbench feedback to improve pre-trained models at Verilog RTL code generation. VeriReason-Qwen2.5-3B, a 3B-parameter model based on Qwen2.5-Coder-3B, combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning tailored to RTL code generation. By integrating explicit reasoning capabilities with curated high-quality training examples and a feedback-driven reward model, the model establishes a new state of the art for automated RTL synthesis at this model size while remaining efficient.

<p align="center"> <img src="assets/Verireason_workflow.png" alt="VeriReason Workflow" width="800"/> </p>

Training Options

Supervised Fine-Tuning (SFT)

You can use either of the following methods to train an SFT model:

Using LLaMA-Factory

llamafactory-cli train qwen2.5_7b.yaml

Using OpenR1

  1. Move sft_rtl to the folder: src/open_r1/
  2. Make the training script executable:
    chmod +x run_rtl_training.sh
    
  3. Run the training script:
    ./run_rtl_training.sh
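
For orientation, the sketch below shows roughly what this SFT stage amounts to, written against TRL's `SFTTrainer` rather than the repo's actual scripts. The Hugging Face dataset id, base-model id, and hyperparameters are illustrative assumptions, not values taken from this repository.

```python
# Minimal SFT sketch (illustrative only; the repo's OpenR1 scripts wrap
# similar machinery, but ids and hyperparameters here are assumptions).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical hub id; use the real dataset id from the Datasets section.
train_ds = load_dataset("Nellyw888/RTL-Coder_7b_reasoning_tb", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-3B",  # base model named in the project description
    train_dataset=train_ds,
    args=SFTConfig(output_dir="verireason-sft", max_seq_length=4096),
)
trainer.train()
```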
    

GRPO Training

For GRPO (Guided Reward Proximal Optimization) training:

  1. Move the necessary files to the OpenR1 directory:

    mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
    
  2. Create a new directory for the Verilog recipe:

    mkdir verilog_recipe
    mv verilog_grpo_tb.yaml verilog_recipe/
    
  3. Example training command:

    NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=5,6,7 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
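
Conceptually, the reward file wires testbench simulation results into GRPO's reward signal. The sketch below illustrates that idea only; it is not the repo's verilog_rewards_tb.py, and the code-extraction pattern, the Icarus Verilog toolchain, the success marker, and the score values are all assumptions.

```python
# Illustrative testbench-feedback reward (NOT the repo's verilog_rewards_tb.py).
import os
import re
import subprocess
import tempfile

def testbench_reward(completion: str, testbench: str) -> float:
    """Score a model completion by simulating it against a golden testbench."""
    # Assumed convention: completions wrap code in a ```verilog fenced block.
    match = re.search(r"```verilog\s*(.*?)```", completion, re.DOTALL)
    if match is None:
        return 0.0  # no extractable Verilog block
    with tempfile.TemporaryDirectory() as tmp:
        dut, tb, sim = (os.path.join(tmp, f) for f in ("dut.v", "tb.v", "sim.out"))
        with open(dut, "w") as f:
            f.write(match.group(1))
        with open(tb, "w") as f:
            f.write(testbench)
        # Compile DUT and testbench with Icarus Verilog.
        compiled = subprocess.run(["iverilog", "-o", sim, dut, tb], capture_output=True)
        if compiled.returncode != 0:
            return 0.1  # syntactically broken designs get a small penalty score
        try:
            run = subprocess.run(["vvp", sim], capture_output=True, text=True, timeout=30)
        except subprocess.TimeoutExpired:
            return 0.1  # runaway simulation
        # Assumed convention: the testbench prints this marker on full success.
        return 1.0 if "ALL TESTS PASSED" in run.stdout else 0.5
```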
    

Datasets

The following datasets are available on Hugging Face:

| Dataset | Description | Link |
|---------|-------------|------|
| RTL-Coder_small | Filtered dataset with no reasoning | Link |
| RTL-Coder_7b_reasoning_tb_simple | VeriReason simple dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb | VeriReason hard dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb_combined | VeriReason combined dataset with reasoning and testbench | Link |
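
Each dataset can be pulled with the `datasets` library. The hub id below is a placeholder assumption; substitute the id behind the corresponding Link, and note that column names vary per dataset:

```python
from datasets import load_dataset

# Hypothetical hub id; replace with the id from the table above.
ds = load_dataset("Nellyw888/RTL-Coder_small", split="train")
print(ds.column_names)  # inspect the schema before wiring it into a trainer
print(ds[0])
```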

Model checkpoints

The following fine-tuned models are available on Hugging Face:

| Model | Description | Link |
|-------|-------------|------|
| VeriReason-Qwen2.5-1.5B | 1.5B parameter model based on Qwen2.5 | Link |
| VeriReason-Qwen2.5-3B | 3B parameter model based on Qwen2.5 with RTL GRPO | Link |
| VeriReason-Qwen2.5-7b | 7B parameter model based on Qwen2.5 with SFT Reasoning | Link |
| VeriReason-Llama-7b | 7B parameter model based on Code Llama | Link |
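
The checkpoints load with the standard `transformers` API. A minimal inference sketch, assuming a hypothetical hub id (check the model card behind the Link for the real path and prompt format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nellyw888/VeriReason-Qwen2.5-3B"  # hypothetical path; see table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Verilog module for a 4-bit synchronous counter with active-high reset."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```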

Requirements

  • CUDA-compatible GPUs
  • PyTorch with CUDA support
  • Accelerate library
  • NCCL for distributed training
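
To sanity-check these requirements before launching distributed training, a small probe script (a convenience sketch, not part of the repo):

```python
import torch
import torch.distributed as dist
import accelerate

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("NCCL available:", dist.is_nccl_available())
print("accelerate version:", accelerate.__version__)
```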
