LightRFT
<div align="center"> <img src="assets/logo.png" alt="LightRFT Logo" width="600"/>Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework
English | 简体中文
</div>

📖 Introduction
LightRFT (Light Reinforcement Fine-Tuning) is an advanced reinforcement learning fine-tuning framework designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It provides efficient, scalable RLHF (Reinforcement Learning from Human Feedback) and RLVR (Reinforcement Learning with Verifiable Rewards) training, and supports multiple state-of-the-art algorithms and distributed training strategies.
✨ Key Features
- 🚀 High-Performance Inference Engines
- Integrated vLLM and SGLang for efficient sampling and inference
- FP8 inference optimization for significantly reduced latency and memory usage
- Flexible engine sleep/wake mechanisms for optimal resource utilization
- 🧠 Rich Algorithm Ecosystem
- Policy Optimization: GRPO, GSPO, GMPO, Dr.GRPO
- Advantage Estimation: REINFORCE++, CPGD
- Reward Processing: Reward Norm/Clip
- Sampling Strategy: FIRE Sampling, Token-Level Policy
- Stability Enhancement: DAPO, select_high_entropy_tokens
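To make the first item concrete: GRPO's group-normalized advantage estimation scores each sampled response against the other responses to the same prompt. The sketch below is an illustrative minimal implementation (function name and epsilon are ours, not LightRFT's actual code):

```python
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    In GRPO, each prompt is sampled G times; the advantage of each
    response is its reward normalized within that group of G samples.
    Illustrative sketch -- not LightRFT's actual implementation.
    """
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# One prompt sampled 4 times, with 0/1 verifiable rewards:
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # symmetric around 0; above-mean responses get positive advantage
```

Responses better than their group's average are reinforced and worse ones suppressed, without training a separate value network.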
- 🔧 Flexible Training Strategies
- FSDP (Fully Sharded Data Parallel) v2 support
- DeepSpeed ZeRO (Stage 1/2/3) support
- Gradient checkpointing and mixed precision training (BF16/FP16)
- Adam Offload and memory optimization techniques
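As a reference point for combining these options, a DeepSpeed configuration enabling ZeRO Stage 2, BF16, and CPU optimizer (Adam) offload might look like the fragment below. The keys follow DeepSpeed's standard config schema; the batch-size values are placeholders, and this is not a file shipped with LightRFT:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  }
}
```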
- 🎯 Innovative Resource Collaboration
- Colocate Anything: Co-locate reward models with training models to maximize GPU utilization
- Support multiple reward models for parallel inference on the same device
- Dynamic memory management with automatic training/inference phase switching
- Reduced cross-device communication overhead for improved end-to-end training efficiency
- Balance Anything 🚧 (Under Development): Intelligent load balancing system
- Adaptive task scheduling and resource allocation
- Automatic load balancing for multi-node training
- Performance optimization for heterogeneous hardware environments
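The sleep/wake idea behind colocation can be pictured as a simple phase switch: before a training step the colocated engine releases its device memory, and it reloads before the next rollout. The toy sketch below only models the bookkeeping (no real GPU memory is involved); the class and method names are illustrative, not LightRFT's API:

```python
class ColocatedEngine:
    """Toy model of an inference engine sharing a GPU with the trainer.

    Illustrative only: tracks a notional memory budget instead of real
    GPU allocations; not LightRFT's actual engine API.
    """
    def __init__(self, weights_gb: float):
        self.weights_gb = weights_gb
        self.resident = True  # weights currently on-device

    def sleep(self) -> float:
        """Release device memory before a training phase; returns GB freed."""
        if self.resident:
            self.resident = False
            return self.weights_gb
        return 0.0

    def wake(self) -> float:
        """Reload weights before a rollout phase; returns GB reclaimed."""
        if not self.resident:
            self.resident = True
            return self.weights_gb
        return 0.0

reward_model = ColocatedEngine(weights_gb=14.0)
freed = reward_model.sleep()   # trainer can now use the freed memory
print(f"freed {freed} GB for the training step")
reward_model.wake()            # back online for the next rollout
```

The real benefit is that reward models and the policy never need separate GPU pools, which is what removes the cross-device communication overhead mentioned above.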
- 🌐 Comprehensive Multimodal Support
- Native Vision-Language Model (VLM) Training
- Support for mainstream VLMs like Qwen-VL
- Parallel processing of multimodal image-text data
- Efficient multimodal tokenization and batching
- Multimodal Reward Modeling
- Support for multiple visual reward models working in collaboration
- Joint optimization of image understanding and text generation
- Complete Vision-Language Alignment Training Pipeline
- Optimized for multimodal RLVR/RLHF training
- Built-in support for vision-language model fine-tuning
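At its core, batching multimodal image-text data reduces to right-padding variable-length token sequences into a rectangular batch with an attention mask, while per-sample image features ride along. A minimal, framework-free sketch (names are ours, not LightRFT's collator):

```python
def pad_batch(token_ids_list, pad_id=0):
    """Right-pad variable-length token sequences into a rectangular batch,
    returning padded ids plus an attention mask (1 = real token, 0 = pad).
    Illustrative sketch, not LightRFT's actual collator."""
    max_len = max(len(ids) for ids in token_ids_list)
    batch, mask = [], []
    for ids in token_ids_list:
        pad = max_len - len(ids)
        batch.append(ids + [pad_id] * pad)
        mask.append([1] * len(ids) + [0] * pad)
    return batch, mask

# Two image-text samples whose text parts tokenized to different lengths:
ids, mask = pad_batch([[5, 6, 7], [8, 9]])
print(ids)   # [[5, 6, 7], [8, 9, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```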
- 📊 Complete Experimental Toolkit
- Weights & Biases (W&B) integration
- Math capability benchmarking (GSM8K, Geo3K, etc.)
- Trajectory saving and analysis tools
- Automatic checkpoint management
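Automatic checkpoint management typically amounts to a retention policy: keep only the N most recent checkpoints and delete the rest. A minimal sketch of that idea (the function, the `.pt` suffix, and the `keep` parameter are illustrative, not LightRFT's checkpoint manager):

```python
import os
import tempfile

def prune_checkpoints(ckpt_dir: str, keep: int = 3):
    """Delete all but the `keep` most recently modified checkpoint files.
    Illustrative retention policy, not LightRFT's checkpoint manager."""
    ckpts = [os.path.join(ckpt_dir, f) for f in os.listdir(ckpt_dir)
             if f.endswith(".pt")]
    ckpts.sort(key=os.path.getmtime, reverse=True)  # newest first
    for stale in ckpts[keep:]:
        os.remove(stale)
    return sorted(os.path.basename(p) for p in ckpts[:keep])

# Demo in a temp dir with five fake checkpoints at increasing mtimes:
with tempfile.TemporaryDirectory() as d:
    for step in range(5):
        path = os.path.join(d, f"step_{step}.pt")
        with open(path, "w") as f:
            f.write("fake")
        os.utime(path, (step, step))  # force distinct, ordered mtimes
    kept = prune_checkpoints(d, keep=2)
    print(kept)  # ['step_3.pt', 'step_4.pt']
```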
🎯 Supported Algorithms
For detailed algorithm descriptions, implementation details, and usage guide, see Algorithm Documentation.
| Algorithm | Type | Key Improvement | Paper |
|-----------|------|-----------------|-------|
| GRPO | Policy Optimization | Group normalized advantage estimation | arXiv:2402.03300 |
| GSPO | Policy Optimization | Group sequence policy optimization | arXiv:2507.18071 |
| GMPO (WIP) | Policy Optimization | Geometric-mean policy optimization | arXiv:2507.20673 |
| Dr.GRPO | Policy Optimization | Length bias mitigation | arXiv:2503.20783 |
| DAPO | Policy Optimization | Decoupled clip and dynamic sampling policy optimization | arXiv:2503.14476 |
| REINFORCE++ | Advantage Estimation | Improved baseline estimation | arXiv:2501.03262 |
| CPGD | Advantage Estimation | KL-based drift constraint | arXiv:2505.12504 |
| FIRE Sampling | Sampling Strategy | High-temperature first-token sampling for improved diversity | arXiv:2410.21236 |
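FIRE Sampling's key move, drawing the first token at a high temperature and the rest at a normal temperature to diversify rollouts, can be sketched with a plain softmax sampler. The temperatures and the toy vocabulary below are illustrative, not values from the paper or from LightRFT:

```python
import math
import random

def softmax_sample(logits, temperature, rng):
    """Sample an index from temperature-scaled softmax probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(exps) - 1

def fire_sample(logits_per_step, first_temp=2.0, temp=0.8, seed=0):
    """FIRE-style decoding sketch: hot first token, normal afterwards."""
    rng = random.Random(seed)
    tokens = []
    for step, logits in enumerate(logits_per_step):
        t = first_temp if step == 0 else temp
        tokens.append(softmax_sample(logits, t, rng))
    return tokens

# Three decoding steps over a 4-token toy vocabulary:
out = fire_sample([[2.0, 1.0, 0.5, 0.1]] * 3)
print(out)
```

Flattening the first-token distribution makes parallel rollouts for the same prompt diverge early, which increases the diversity of the sampled group.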
🚀 Quick Start
Requirements
- Python >= 3.12
- CUDA >= 12.8
- PyTorch >= 2.9.1
Docker Images
We provide pre-built Docker images for easy deployment and consistent environments. You can also build your own images using the provided Dockerfile and Makefile.
Using Pre-built Images
The official Docker images are available on Docker Hub. You can pull the latest version using:
docker pull opendilab/lightrft:v0.1.0
To run a container with GPU support:
docker run --gpus all -it --rm \
-v /path/to/your/data:/app/data \
-v /path/to/your/checkpoints:/app/checkpoints \
opendilab/lightrft:v0.1.0 /bin/bash
Building Custom Images
If you need to customize the environment or build from a specific branch, you can use the provided Makefile to build the image locally.
- Prerequisites: Ensure you have Docker and the NVIDIA Container Toolkit installed.
- Build the image:
  # Build the image with the default name (opendilab/lightrft:v${VERSION})
  make dbuild
  The IMAGE_NAME is automatically determined from the current version of the project. You can also override it: make dbuild IMAGE_NAME=your-custom-tag:latest
- Technical Details:
  - Base Image: nvcr.io/nvidia/pytorch:25.01-py3 (includes PyTorch 2.5+ and CUDA 12.8).
  - Dependencies: The build process installs essential components, including vLLM, DeepSpeed, Flash-Attention, and SGLang, in a specific order to ensure stability.
  - Optimization: The Dockerfile uses multi-layer optimization and environment variables for non-interactive installation.
Installation
Standard Installation
LightRFT uses SGLang as the default inference backend with Flash-Attention for optimized performance.
# Clone the repository
git clone https://github.com/opendilab/LightRFT.git
cd LightRFT
# Install LightRFT with all core dependencies
pip install -e .
What gets installed: PyTorch, SGLang, Flash-Attention, Transformers, DeepSpeed, and other core dependencies.
Optional: Install vLLM Backend
If you want to use vLLM instead of (or alongside) SGLang:
# Install vLLM backend
pip install ".[vllm]"
# Or install vLLM directly (quote the spec so the shell doesn't treat >= as redirection)
pip install "vllm>=0.13.3"
Troubleshooting Flash-Attention Installation
Flash-Attention is included by default but may fail on some systems due to CUDA compatibility. If installation fails, try:
Option 1: Use pre-built wheels (recommended)
# Download the appropriate wheel from https://github.com/Dao-AILab/flash-attention/releases
# Example for CUDA 12.x and PyTorch 2.9:
pip install flash_attn-2.8.3+cu12torch2.9cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
Option 2: Use Docker (easiest)
# The official Docker images include all dependencies
docker pull opendilab/lightrft:v0.1.0
📚 Usage Guide
Basic Example: GRPO Training
# Single node, 8 GPU training example
cd LightRFT
# Run GRPO training (GSM8K math reasoning task)
bash examples/gsm8k_geo3k/run_grpo_gsm8k_qwen2.5_0.5b.sh
# Or run Geo3K geometry problem training (VLM multimodal)
bash examples/gsm8k_geo3k/run_grpo_geo3k_qwen2.5_vl_7b.sh
🏗️ Project Structure
LightRFT/
├── lightrft/ # Core library
│ ├── strategy/ # Training & inference strategies
│ │ ├── fsdp/ # FSDP implementation
│ │ ├── deepspeed/ # DeepSpeed implementation
│ │ ├── vllm_utils/ # vLLM utilities
│ │ ├── sglang_utils/ # SGLang utilities
│ │ └── utils/ # Strategy utilities
│ ├── models/ # Model definitions
│ │ ├── actor_al.py # Audio-language model actor
│ │ ├── actor_language.py # Language model actor
│ │ ├── actor_vl.py # Vision-language model actor
│ │ ├── grm_vl.py # Generative reward model (Vision-Language)
│ │ ├── srm_al.py # Scalar reward model (Audio-Language)
│ │ ├── srm_vl.py # Scalar reward model (Vision-Language)
│ │ ├── loss.py # Loss functions
│ │ ├── monkey_patch/ # Model adaptation patches for distributed training
│ │ ├── tests/ # Model tests
│ │ └── utils.py # Model utilities
│ ├── trainer/ # Trainer implementations
│ │ ├── ppo_trainer.py # LLM PPO trainer
│ │ ├── ppo_trainer_vl.py # VLM PPO trainer
│ │ ├── spmd_ppo_trainer.py # SPMD PPO trainer extension (**Core**)
│ │ ├── grm_trainer_vl.py # Generative reward model trainer (Vision-Language)
│ │ ├── srm_trainer_al.py # Scalar reward model trainer (Audio-Language)
│ │ ├── srm_trainer_vl.py # Scalar reward model trainer (Vision-Language)
│ │ ├── fast_exp_maker.py # Fast experience generator (**Core**)
│ │ ├── experience_maker.py # Base experience generator
│ │ ├── experience_maker_vl.py # Base experience generator for VLM
│ │ ├── replay_buffer.py # Replay buffer
│ │ ├── replay_buffer_vl.py # VLM replay buffer
│ │ ├── replay_buffer_utils.py # Replay buffer utilities
