EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework


Used by Amazon Web Services

This project is a clean fork of the original veRL project, extended to support vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the design of the HybridEngine and the latest release of vLLM's SPMD mode.

Features

  • Supported models

    • Llama3/Qwen2/Qwen2.5/Qwen3 language models
    • Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
    • DeepSeek-R1 distill models
  • Supported algorithms

    • GRPO
    • DAPO new
    • Reinforce++
    • ReMax
    • RLOO
    • GSPO new
    • CISPO new
  • Supported datasets

  • Supported tricks

    • Padding-free training
    • LoRA training new
    • Resuming from the latest/best checkpoint
    • Wandb & SwanLab & Mlflow & Tensorboard tracking

Requirements

Software Requirements

  • Python 3.9+
  • transformers>=4.54.0
  • flash-attn>=2.4.3
  • vllm>=0.8.3
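A quick way to verify these minimums is a small check script; a sketch (nothing EasyR1-specific, just the package names listed above):

```python
# Report installed versions of EasyR1's key dependencies against the
# minimums stated above. Purely informational: it prints what is found
# and does not attempt to install or upgrade anything.
from importlib.metadata import PackageNotFoundError, version

REQUIRED = {"transformers": "4.54.0", "flash-attn": "2.4.3", "vllm": "0.8.3"}

def check(package: str, minimum: str) -> str:
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package}: not installed (need >= {minimum})"
    return f"{package}: {installed} installed (need >= {minimum})"

for package, minimum in REQUIRED.items():
    print(check(package, minimum))
```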

We provide a Dockerfile to easily build environments.

We recommend using the pre-built Docker image for EasyR1:

docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0

If your environment does not support Docker, you can consider using Apptainer:

apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif

Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.

Hardware Requirements

* estimated

| Method                | Bits | 1.5B    | 3B      | 7B      | 32B      | 72B      |
| --------------------- | ---- | ------- | ------- | ------- | -------- | -------- |
| GRPO Full Fine-Tuning | AMP  | 2*24GB  | 4*40GB  | 8*40GB  | 16*80GB  | 32*80GB  |
| GRPO Full Fine-Tuning | BF16 | 1*24GB  | 1*40GB  | 4*40GB  | 8*80GB   | 16*80GB  |
| GRPO LoRA Fine-Tuning | AMP  | 1*12GB  | 1*24GB  | 2*32GB  | 2*80GB   | 4*80GB   |

> [!NOTE]
> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps


Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Full Training

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

GRPO LoRA Training

bash examples/qwen3_vl_4b_geo3k_grpo_lora.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor

> [!TIP]
> If you encounter issues connecting to Hugging Face, consider using `export HF_ENDPOINT=https://hf-mirror.com`.

If you want to use SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

  • Text dataset: https://huggingface.co/datasets/hiyouga/math12k
  • Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
  • Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
  • Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
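As a rough illustration, a single RL training record pairs a prompt with a verifiable reference answer, plus images for the vision-language case. The field names below (`problem`, `answer`, `images`) are assumptions inferred from the example datasets; verify them against the links above before building your own data.

```python
# Hypothetical shape of one EasyR1-style RL dataset record. Field names
# are assumptions -- check the example datasets linked above for the
# real schema before preparing your own dataset.
def make_record(problem, answer, images=None):
    record = {"problem": problem, "answer": answer}
    if images is not None:
        # For VL models: one entry per image referenced in the prompt.
        record["images"] = images
    return record

text_example = make_record("What is 3 + 4?", "7")
vl_example = make_record(
    "<image> How many triangles are in the figure?", "4",
    images=["figure_001.png"],
)
print(sorted(text_example))  # ['answer', 'problem']
print(sorted(vl_example))    # ['answer', 'images', 'problem']
```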

How to Understand GRPO in EasyR1

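Conceptually, GRPO drops the learned critic: for each prompt it samples a group of responses, scores each one with a (typically rule-based) reward, and uses the reward's standardized value within the group as the advantage. A minimal numeric sketch of that normalization (illustrative only, not EasyR1's actual implementation):

```python
# Group-relative advantages: each response's advantage is its reward
# standardized against the other responses sampled for the same prompt,
# so no value network is needed.
def grpo_advantages(rewards, eps=1e-6):
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, scored 1.0 when the final answer
# matches the reference and 0.0 otherwise (a common rule-based reward).
group_rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in grpo_advantages(group_rewards)])  # [1.0, -1.0, -1.0, 1.0]
```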

How to Run 70B+ Model in Multi-node Environment

  1. Start the Ray head node.
ray start --head --port=6379 --dashboard-host=0.0.0.0
  2. Start a Ray worker node and connect it to the head node.
ray start --address=<head_node_ip>:6379
  3. Check the Ray resource pool.
ray status
  4. Run the training script on the Ray head node only.
bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

See veRL's official documentation for more details on multi-node training and the Ray debugger.

Other Baselines

We also reproduced the following two baselines from the R1-V project.

  • CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on the counting problem.
  • GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.

Performance Baselines

See baselines.md.

Awesome Work using EasyR1

  • MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources. [code] [arxiv]
  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models. [code] [arxiv]
  • Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [code] [arxiv]
  • MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [code] [arxiv]
  • Temporal-R1: Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [code] [arxiv]
  • NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation. [code] [arxiv]
  • GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [code] [arxiv]
  • FAST-GRPO: Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics. [code] [arxiv]
  • R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning. [code]
  • VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning. [code] [arxiv]
  • MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO. [code] [arxiv]
  • RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start. [code] [arxiv]
  • ViGoRL: Grounded Reinforcement Learning for Visual Reasoning. [code] [arxiv]
  • Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning. [code] [arxiv]
  • SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward. [code] [arxiv]
  • Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning. [code]
