EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Used by Amazon Web Services
This project is a clean fork of the original veRL project, extended to support vision-language models. We thank all the authors for providing such a high-performance RL training framework.
EasyR1 is efficient and scalable thanks to the HybridEngine design and the latest release of vLLM's SPMD mode.
Features
Supported models
- Llama3/Qwen2/Qwen2.5/Qwen3 language models
- Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
- DeepSeek-R1 distill models
Supported algorithms
- GRPO
- DAPO
- Reinforce++
- ReMax
- RLOO
- GSPO
- CISPO
Supported datasets
- Any text or vision-text dataset in the required format
Supported tricks
- Padding-free training
- LoRA training
- Resuming from the latest/best checkpoint
- Wandb / SwanLab / MLflow / TensorBoard tracking
Requirements
Software Requirements
- Python 3.9+
- transformers>=4.54.0
- flash-attn>=2.4.3
- vllm>=0.8.3
We provide a Dockerfile for building the environment, but we recommend using the pre-built Docker image:
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
If your environment does not support Docker, you can consider using Apptainer:
apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif
Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.
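For example, a minimal sketch (the variable name comes from the line above; the training script is one of the examples shipped with the repo):

```shell
# Opt into ModelScope downloads before launching a training script.
export USE_MODELSCOPE_HUB=1
echo "USE_MODELSCOPE_HUB=${USE_MODELSCOPE_HUB}"
# then launch, e.g.: bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```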
Hardware Requirements
| Method                | Bits | 1.5B    | 3B      | 7B      | 32B      | 72B      |
| --------------------- | ---- | ------- | ------- | ------- | -------- | -------- |
| GRPO Full Fine-Tuning | AMP  | 2*24GB  | 4*40GB  | 8*40GB  | 16*80GB  | 32*80GB  |
| GRPO Full Fine-Tuning | BF16 | 1*24GB  | 1*40GB  | 4*40GB  | 8*80GB   | 16*80GB  |
| GRPO LoRA Fine-Tuning | AMP  | 1*12GB  | 1*24GB  | 2*32GB  | 2*80GB   | 4*80GB   |

* estimated
[!NOTE] Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.
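A hedged sketch of what that looks like on the command line, assuming the example scripts forward key=value overrides to the trainer entry point (the config path here is illustrative):

```shell
# Enable bf16 training via config overrides (flag names from the note above).
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    worker.actor.fsdp.torch_dtype=bf16 \
    worker.actor.optim.strategy=adamw_bf16
```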
Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
GRPO Full Training
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
GRPO LoRA Training
bash examples/qwen3_vl_4b_geo3k_grpo_lora.sh
Merge Checkpoint in Hugging Face Format
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
[!TIP] If you encounter issues connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com. If you want to use the SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.
Custom Dataset
Please refer to the example datasets to prepare your own dataset.
- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
- Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
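For orientation, a single record roughly looks like the following. The field names ("problem", "answer", "images") are inferred from the linked example datasets and should be verified against the actual schema:

```python
# Hypothetical records illustrating the expected layout; field names are
# assumptions inferred from the example datasets linked above.
text_record = {
    "problem": "What is 3 + 4?",
    "answer": "7",
}
vision_record = {
    "images": ["diagram_001.png"],  # hypothetical image path
    "problem": "<image>Find the measure of angle ABC.",
    "answer": "60",
}
```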
How to Understand GRPO in EasyR1

- To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
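The core idea can be sketched numerically (an illustration of the algorithm, not EasyR1's implementation): sample a group of responses per prompt, score each one, and compute advantages by normalizing rewards within the group, with no value network needed.

```python
# Minimal sketch of GRPO's group-relative advantage.
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Return (r - group_mean) / (group_std + eps) for each reward."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored 1 (correct) or 0 (incorrect):
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# correct answers get a positive advantage, incorrect ones a negative one
```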
How to Run 70B+ Model in Multi-node Environment
- Start the Ray head node.
ray start --head --port=6379 --dashboard-host=0.0.0.0
- Start the Ray worker node and connect to the head node.
ray start --address=<head_node_ip>:6379
- Check the Ray resource pool.
ray status
- Run training script on the Ray head node only.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
See veRL's official documentation for more details on multi-node training and the Ray debugger.
Other Baselines
We also reproduced the following two baselines from the R1-V project.
- CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on the counting task.
- GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on the GeoQA task.
Performance Baselines
See baselines.md.
Awesome Work using EasyR1
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources.
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
- Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse.
- Temporal-R1: Evolving Temporal Reasoning Capability in LMMs via Temporal Consistent Reward.
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation.
- GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents.
- FAST-GRPO: Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics.
- R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
- VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.
- MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO.
- RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start.
- ViGoRL: Grounded Reinforcement Learning for Visual Reasoning.
- Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning.
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward.
- Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning.