EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Used by Amazon Web Services
This project is a clean fork of the original veRL project, extended to support vision-language models. We thank all the authors for providing such a high-performance RL training framework.
EasyR1 is efficient and scalable thanks to the HybridEngine design and the latest release of vLLM's SPMD mode.
Features
Supported models
- Llama3/Qwen2/Qwen2.5/Qwen3 language models
- Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
- DeepSeek-R1 distill models
Supported algorithms
- GRPO
- DAPO
- Reinforce++
- ReMax
- RLOO
- GSPO
- CISPO
Supported datasets
- Any text or vision-text dataset in the required format
Supported tricks
- Padding-free training
- LoRA training
- Resuming from the latest/best checkpoint
- Wandb / SwanLab / MLflow / TensorBoard tracking
Requirements
Software Requirements
- Python 3.9+
- transformers>=4.54.0
- flash-attn>=2.4.3
- vllm>=0.8.3
We provide a Dockerfile for building the environment, but we recommend using the pre-built Docker image:
docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
If your environment does not support Docker, you can consider using Apptainer:
apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif
Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.
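For example, a minimal sketch (the variable name comes from the line above; the training script is one of the examples shipped with the repo):

```shell
# Opt into ModelScope downloads before launching a training script.
export USE_MODELSCOPE_HUB=1
echo "USE_MODELSCOPE_HUB=${USE_MODELSCOPE_HUB}"
# then launch, e.g.: bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
```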
Hardware Requirements
| Method                | Bits | 1.5B    | 3B      | 7B      | 32B      | 72B      |
| --------------------- | ---- | ------- | ------- | ------- | -------- | -------- |
| GRPO Full Fine-Tuning | AMP  | 2*24GB  | 4*40GB  | 8*40GB  | 16*80GB  | 32*80GB  |
| GRPO Full Fine-Tuning | BF16 | 1*24GB  | 1*40GB  | 4*40GB  | 8*80GB   | 16*80GB  |
| GRPO LoRA Fine-Tuning | AMP  | 1*12GB  | 1*24GB  | 2*32GB  | 2*80GB   | 4*80GB   |

* estimated
[!NOTE] Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.
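A hedged sketch of what that looks like on the command line, assuming the example scripts forward key=value overrides to the trainer entry point (the config path here is illustrative):

```shell
# Enable bf16 training via config overrides (flag names from the note above).
python3 -m verl.trainer.main \
    config=examples/config.yaml \
    worker.actor.fsdp.torch_dtype=bf16 \
    worker.actor.optim.strategy=adamw_bf16
```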
Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation
git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .
GRPO Full Training
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
GRPO LoRA Training
bash examples/qwen3_vl_4b_geo3k_grpo_lora.sh
Merge Checkpoint in Hugging Face Format
python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
[!TIP] If you encounter issues connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com. If you want to use the SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.
Custom Dataset
Please refer to the example datasets to prepare your own dataset.
- Text dataset: https://huggingface.co/datasets/hiyouga/math12k
- Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
- Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
- Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
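For orientation, a single record roughly looks like the following. The field names ("problem", "answer", "images") are inferred from the linked example datasets and should be verified against the actual schema:

```python
# Hypothetical records illustrating the expected layout; field names are
# assumptions inferred from the example datasets linked above.
text_record = {
    "problem": "What is 3 + 4?",
    "answer": "7",
}
vision_record = {
    "images": ["diagram_001.png"],  # hypothetical image path
    "problem": "<image>Find the measure of angle ABC.",
    "answer": "60",
}
```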
How to Understand GRPO in EasyR1

- To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
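The core idea can be sketched numerically (an illustration of the algorithm, not EasyR1's implementation): sample a group of responses per prompt, score each one, and compute advantages by normalizing rewards within the group, with no value network needed.

```python
# Minimal sketch of GRPO's group-relative advantage.
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Return (r - group_mean) / (group_std + eps) for each reward."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored 1 (correct) or 0 (incorrect):
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# correct answers get a positive advantage, incorrect ones a negative one
```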
How to Run 70B+ Model in Multi-node Environment
- Start the Ray head node.
ray start --head --port=6379 --dashboard-host=0.0.0.0
- Start the Ray worker node and connect to the head node.
ray start --address=<head_node_ip>:6379
- Check the Ray resource pool.
ray status
- Run training script on the Ray head node only.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
See veRL's official documentation for more details on multi-node training and the Ray debugger.
Other Baselines
We also reproduced the following two baselines from the R1-V project.
- CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on the counting task.
- GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on the GeoQA task.
Performance Baselines
See baselines.md.
Awesome Work using EasyR1
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources.
- Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
- Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse.
- Temporal-R1: Evolving Temporal Reasoning Capability in LMMs via Temporal Consistent Reward.
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation.
- GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents.
- FAST-GRPO: Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics.
- R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
- VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.
- MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO.
- RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start.
- ViGoRL: Grounded Reinforcement Learning for Visual Reasoning.
- Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning.
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward.
- Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning.