ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
ROLL: Reinforcement Learning Optimization for Large-Scale Learning
<h4>🚀 An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models 🚀</h4>
<p>
<a href="https://github.com/alibaba/ROLL/blob/main/LICENSE"> <img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License"> </a>
<a href="https://github.com/alibaba/ROLL/issues"> <img src="https://img.shields.io/github/issues/alibaba/ROLL" alt="GitHub issues"> </a>
<a href="https://github.com/alibaba/ROLL/stargazers"> <img src="https://img.shields.io/github/stars/alibaba/ROLL?style=social" alt="Repo stars"> </a>
<a href="https://arxiv.org/abs/2506.06122"><img src="https://img.shields.io/static/v1?label=arXiv&message=Paper&color=red"></a>
<!-- Organization homepage: click to go to https://github.com/alibaba -->
<a href="./assets/roll_wechat.png" target="_blank"> <img src="https://img.shields.io/badge/WeChat-green?logo=wechat" alt="WeChat QR"> </a>
<a href="https://deepwiki.com/alibaba/ROLL" target="_blank"> <img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"> </a>
<a href="./assets/future_lab.png" target="_blank"> <img src="https://img.shields.io/twitter/follow/FutureLab2025?style=social" alt="X QR"> </a>
</p>
</div>

ROLL is an efficient and user-friendly RL library for Large Language Models (LLMs), designed to make use of large-scale GPU resources. It significantly enhances LLM performance in key areas such as human preference alignment, complex reasoning, and multi-turn agentic interaction.
Leveraging a multi-role distributed architecture with Ray for flexible resource allocation and heterogeneous task scheduling, ROLL integrates cutting-edge technologies like Megatron-Core, SGLang and vLLM to accelerate model training and inference.
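As a rough illustration of what per-role resource allocation means, the sketch below assigns each role a disjoint block of GPU indices. It is plain Python and does not use Ray or any ROLL API; the `Role` class, the `assign_devices` helper, and the role names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical description of one role in a multi-role RL job."""
    name: str      # illustrative label, e.g. "actor_train"
    num_gpus: int  # number of GPUs this role requests


def assign_devices(roles, total_gpus):
    """Give each role a contiguous, non-overlapping block of GPU indices."""
    mapping = {}
    next_free = 0
    for role in roles:
        end = next_free + role.num_gpus
        if end > total_gpus:
            raise ValueError(
                f"role {role.name!r} needs {role.num_gpus} GPUs, "
                f"but only {total_gpus - next_free} remain"
            )
        mapping[role.name] = list(range(next_free, end))
        next_free = end
    return mapping


# Illustrative roles only; a real job would define its own roles and sizes.
roles = [Role("actor_train", 4), Role("actor_infer", 2), Role("reference", 2)]
plan = assign_devices(roles, total_gpus=8)
for name, gpus in plan.items():
    print(name, gpus)
```

In a real deployment a scheduler such as Ray would make this placement decision; the sketch only shows the idea that training, inference, and reference roles can each be sized independently.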
📢 News
| 📣 Updates |
|:---|
| [03/06/2026] 🎉 We now support the Qwen3.5 Dense and MoE model series and [on-policy distillation](docs_roll/i18n/zh-Hans/docusaurus-plugin-content-docs/current/User%20Guides/Pipeline/on_policy_distill_pipeline_start.md). Welcome to use! |
| [02/03/2026] 🎉 We released the FSDP2 strategy, Megatron with LoRA, GPU partial overlapping, Qwen3-Omni support, and other features. For more details, please refer to the release notes. Welcome to use! |
| [01/01/2026] 🎉 Our report, Let It Flow: Agentic Crafting on Rock and Roll, has been released! It introduces the ALE ecosystem and ROME, an open-source agentic model with the novel IPA algorithm. |
| [11/08/2025] 🎉 ROCK: Reinforcement Open Construction Kit has been released. Explore the new capabilities! |
| [10/23/2025] 🎉 Our papers have been released; see Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning and Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization. |
| [10/14/2025] 🎉 Our paper has been released; see Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony. |
| [09/28/2025] 🎉 Ascend NPUs are now supported; see the usage guide. |
| [09/25/2025] 🎉 Our paper has been released; see RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training. |
| [09/24/2025] 🎉 Added support for the Wan2_2 Reward FL pipeline. Explore the new capabilities! |
| [09/23/2025] 🎉 ROLL aligns with the GEM environment definition, providing agentic tool-use training capabilities; see the ToolUse docs. |
| [09/16/2025] 🎉 Qwen3-Next model training is supported; refer to the configuration. |
| [09/04/2025] 🎉 ROLL supports vLLM dynamic FP8 rollout and remove_padding for acceleration. |
| [08/28/2025] 🎉 ROLL supports the SFT pipeline; refer to the configuration. |
| [08/13/2025] 🎉 ROLL supports AMD GPUs with an out-of-the-box Docker image, a Dockerfile, and AMD-specific YAML files under the examples/ directory. Please refer to Installation. |
| [08/11/2025] 🎉 Our paper has been released; see Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning. |
| [08/10/2025] 🎉 Agentic RL supports stepwise learning, such as GiGPO; Distill supports VLM. Explore the new capabilities! |
| [08/06/2025] 🎉 The ROLL presentation is now available; see Slides. |
| [07/31/2025] 🎉 Refactored the agentic RL design and added support for asynchronous agentic RL training. Explore the new capabilities! |
| [07/31/2025] 🎉 Added support for DistillPipeline and DpoPipeline, LoRA, and GSPO. |
| [06/25/2025] 🎉 Added support for thread env for env scaling and for the Qwen2.5 VL agentic pipeline. |
| [06/13/2025] 🎉 Added support for the Qwen2.5 VL RLVR pipeline and upgraded mcore to version 0.12. |
