
<div align="center">

<a href="https://art.openpipe.ai"><picture> <img alt="ART logo" src="https://github.com/openpipe/art/raw/main/assets/ART_logo.png" width="160px"> </picture></a>

<p align="center"> <h1>Agent Reinforcement Trainer</h1> </p> <p> Train multi-step agents for real-world tasks using GRPO. </p>

[![PRs-Welcome][contribute-image]][contribute-url] · [PyPI version][pypi-url] · Train Agent · Join Discord · Documentation

</div>

## 🚀 W&B Training: Serverless RL

W&B Training (Serverless RL) is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment, and reward function—leading to faster feedback cycles, lower costs, and far less DevOps.

**Key Benefits:**

- **40% lower cost** - multiplexing on a shared, production-grade inference cluster
- **28% faster training** - scale to 2,000+ concurrent requests across many GPUs
- **Zero infra headaches** - fully managed infrastructure that stays healthy
- **Instant deployment** - every checkpoint is instantly available via W&B Inference

```python
# Before: hours of GPU setup and infra management
# RuntimeError: CUDA error: out of memory 😢

# After: Serverless RL with instant feedback
import art
from art.serverless.backend import ServerlessBackend

model = art.TrainableModel(
    project="voice-agent",
    name="agent-001",
    base_model="OpenPipe/Qwen3-14B-Instruct",
)

backend = ServerlessBackend(
    api_key="your_wandb_api_key",
)
model.register(backend)
# Edit and iterate in minutes, not hours!
```

📖 Learn more about W&B Training →

## ART Overview

ART is an open-source RL framework that improves agent reliability by allowing LLMs to learn from experience. ART provides an ergonomic harness for integrating GRPO into any Python application. For a quick hands-on introduction, run one of the notebooks below. When you're ready to learn more, check out the docs.
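The core idea behind GRPO can be illustrated in a few lines: rewards from a group of rollouts on the same task are normalized against each other, so the policy is pushed toward the rollouts that beat their group's average. This is a minimal sketch of that advantage computation, not ART's implementation; `group_advantages` is a hypothetical name.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Function name and structure are illustrative, not part of the ART API.
from statistics import mean, stdev

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize each rollout's reward within its group:
    advantage_i = (r_i - mean(r)) / std(r)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rollouts scored the same: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Rollouts above the group mean get positive advantages,
# reinforcing the actions they took.
advantages = group_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are relative within each group, no absolute reward scale or separate value model is needed; only the reward function's ranking of rollouts matters.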

## 📒 Notebooks

| Agent Task | Example Notebook | Description | Comparative Performance |
| --- | --- | --- | --- |
| ART•E [Serverless] | 🏋️ Train agent | Qwen3 14B learns to search emails using RULER | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg" height="72"> benchmarks |
| 2048 [Serverless] | 🏋️ Train agent | Qwen3 14B learns to play 2048 | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg" height="72"> benchmarks |
| ART•E LangGraph | 🏋️ Train agent | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| MCP•RL | 🏋️ Train agent | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
| Temporal Clue | 🏋️ Train agent | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| Tic Tac Toe | 🏋️ Train agent | Qwen 2.5 3B learns to play Tic Tac Toe | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg" height="72"> benchmarks |
| Codenames | 🏋️ Train agent | Qwen 2.5 3B learns to play Codenames | <img src="https://github.com/openpipe/art/raw/main/assets/benchmarks/codenames/win_rate_over_time.png" height="72"> benchmarks |
| AutoRL [RULER] | 🏋️ Train agent | Train Qwen 2.5 7B to master any task | [Link coming soon] |
| Distillation (SFT) | 🏋️ Train model | Distill text-to-SQL from Qwen 3 235B to Qwen 3 30B | [Link coming soon] |
| Summarizer (SFT + RL) | 🏋️ Train model | Train a document summarizer with SFT warmup then RL | [Link coming soon] |
| SFT from a dataset | 🏋️ Train model | Fine-tune Qwen 3 30B on text-to-SQL from a dataset | [Link coming soon] |

## 📰 ART News

Explore our latest research and updates on building SOTA agents.

📖 See all blog posts →

## Why ART?

- ART provides convenient wrappers for introducing RL training into existing applications. The training server is abstracted into a modular service that your code doesn't need to interface with directly.
- Train from anywhere: run the ART client on your laptop and let the ART server spin up an ephemeral GPU-enabled environment, or run on a local GPU.
- Integrations with hosted platforms like W&B, Langfuse, and OpenPipe provide flexible observability.
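The client-side pattern these wrappers support can be sketched without any ART imports: run a multi-step rollout, record the conversation, and attach a scalar reward, producing trajectories that a GRPO trainer can consume in groups. Here `step_agent` and `score` are hypothetical stand-ins for a chat-completion call and your reward function, not ART APIs.

```python
# Illustrative sketch of a multi-step rollout producing a trajectory
# (messages + reward). All names here are hypothetical placeholders.

def step_agent(messages: list[dict]) -> dict:
    # Stand-in for a chat-completion call against a trained checkpoint.
    return {"role": "assistant", "content": f"step {len(messages)}"}

def score(messages: list[dict]) -> float:
    # Stand-in reward function: here, shorter rollouts score higher.
    return 1.0 / len(messages)

def rollout(task: str, max_turns: int = 3) -> dict:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        messages.append(step_agent(messages))
    # A trajectory pairs the full conversation with its reward;
    # groups of trajectories on the same task feed GRPO training.
    return {"messages": messages, "reward": score(messages)}

traj = rollout("search my inbox for the missing invoice")
```

Because the reward is just a number attached after the fact, the agent logic itself stays ordinary application code, which is what lets ART bolt RL onto an existing app.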