A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
<a href="https://pypi.org/project/torchrl"><img src="https://img.shields.io/pypi/v/torchrl" alt="pypi version"></a>
<a href="https://pypi.org/project/torchrl-nightly"><img src="https://img.shields.io/pypi/v/torchrl-nightly?label=nightly" alt="pypi nightly version"></a>
# TorchRL

<p align="center"> <img src="docs/source/_static/img/icon.png" width="200" > </p>

What's New | LLM API | Getting Started | Documentation | TensorDict | Features | Examples, tutorials and demos | Citation | Installation | Asking a question | Contributing
TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch.
## 🚀 What's New
### 🚀 Command-Line Training Interface - Train RL Agents Without Writing Code! (Experimental)
TorchRL now provides a powerful command-line interface that lets you train state-of-the-art RL agents with simple bash commands! No Python scripting required - just run training with customizable parameters:
- 🎯 One-Command Training: `python sota-implementations/ppo_trainer/train.py`
- ⚙️ Full Customization: Override any parameter via command line: `trainer.total_frames=2000000 optimizer.lr=0.0003`
- 🌍 Multi-Environment Support: Switch between Gym, Brax, DM Control, and more with `env=gym training_env.create_env_fn.base_env.env_name=HalfCheetah-v4`
- 📊 Built-in Logging: TensorBoard, Weights & Biases, CSV logging out of the box
- 🔧 Hydra-Powered: Leverages Hydra's powerful configuration system for maximum flexibility
- 🏃‍♂️ Production Ready: Same robust training pipeline as our SOTA implementations
Perfect for: Researchers, practitioners, and anyone who wants to train RL agents without diving into implementation details.
⚠️ Note: This is an experimental feature. The API may change in future versions. We welcome feedback and contributions to help improve this implementation!
📋 Prerequisites: The training interface requires Hydra for configuration management. Install with:

```bash
pip install "torchrl[utils]"
# or manually:
pip install hydra-core omegaconf
```
Check out the complete CLI documentation to get started!
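To make the override syntax above concrete: Hydra maps each dotted key onto a path in a nested config. The following is a simplified stdlib sketch of that idea (not Hydra's actual implementation; `apply_override` is a hypothetical helper for illustration):

```python
import ast

# Simplified illustration of Hydra-style dotted overrides (not Hydra itself).
def apply_override(config: dict, override: str) -> None:
    """Apply a single 'dotted.key=value' override to a nested config dict."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    # Best-effort literal parsing: numbers and lists stay typed, the rest stays a string.
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    node[leaf] = value

cfg = {"trainer": {"total_frames": 1_000_000}, "optimizer": {"lr": 0.001}}
apply_override(cfg, "trainer.total_frames=2000000")
apply_override(cfg, "optimizer.lr=0.0003")
print(cfg["trainer"]["total_frames"])  # 2000000
print(cfg["optimizer"]["lr"])  # 0.0003
```

Hydra adds much more on top (config composition, groups such as `env=gym`, type checking via OmegaConf), but the dotted-path mechanics are the core of the override syntax.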
### 🚀 vLLM Revamp - Major Enhancement to LLM Infrastructure (v0.10)
This release introduces a comprehensive revamp of TorchRL's vLLM integration, delivering significant improvements in performance, scalability, and usability for large language model inference and training workflows:
- 🔥 AsyncVLLM Service: Production-ready distributed vLLM inference with multi-replica scaling and automatic Ray actor management
- ⚖️ Multiple Load Balancing Strategies: Routing strategies including prefix-aware, request-based, and KV-cache load balancing for optimal performance
- 🏗️ Unified vLLM Architecture: New `RLvLLMEngine` interface standardizing all vLLM backends, with a simplified `vLLMUpdaterV2` for seamless weight updates
- 🌐 Distributed Data Loading: New `RayDataLoadingPrimer` for shared, distributed data loading across multiple environments
- 📈 Enhanced Performance: Native vLLM batching, concurrent request processing, and optimized resource allocation via Ray placement groups
```python
# Simple AsyncVLLM usage - production ready!
from torchrl.modules.llm import AsyncVLLM, vLLMWrapper

# Create distributed vLLM service with load balancing
service = AsyncVLLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    num_devices=2,   # Tensor parallel across 2 GPUs
    num_replicas=4,  # 4 replicas for high throughput
    max_model_len=4096,
)

# Use with TorchRL's LLM wrappers
wrapper = vLLMWrapper(service, input_mode="history")

# Simplified weight updates
from torchrl.collectors.llm import vLLMUpdaterV2

updater = vLLMUpdaterV2(service)  # Auto-configures from engine
```
This revamp positions TorchRL as the leading platform for scalable LLM inference and training, providing production-ready tools for both research and deployment scenarios.
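To give intuition for one of the routing strategies listed above, here is a pure-Python sketch of request-based (least-loaded) load balancing across replicas. This is a conceptual model only; the `LeastLoadedRouter` class is hypothetical, and TorchRL's `AsyncVLLM` handles routing internally:

```python
# Conceptual sketch of request-based (least-loaded) routing across replicas.
# Illustrative only; not TorchRL's actual implementation.
class LeastLoadedRouter:
    def __init__(self, num_replicas: int):
        self.in_flight = [0] * num_replicas  # outstanding requests per replica

    def acquire(self) -> int:
        """Pick the replica with the fewest in-flight requests."""
        replica = min(range(len(self.in_flight)), key=self.in_flight.__getitem__)
        self.in_flight[replica] += 1
        return replica

    def release(self, replica: int) -> None:
        """Mark a request on `replica` as finished."""
        self.in_flight[replica] -= 1

router = LeastLoadedRouter(num_replicas=4)
first = router.acquire()   # replica 0 (all idle, lowest index wins)
second = router.acquire()  # replica 1, since replica 0 is now busy
print(first, second)  # 0 1
```

Prefix-aware and KV-cache-aware strategies refine this idea by also considering which replica already holds the prompt prefix or has spare KV-cache capacity, trading perfect load balance for cache reuse.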
### 🧪 PPOTrainer (Experimental) - High-Level Training Interface
TorchRL now includes an experimental PPOTrainer that provides a complete, configurable PPO training solution! This prototype feature combines TorchRL's modular components into a cohesive training system with sensible defaults:
- 🎯 Complete Training Pipeline: Handles environment setup, data collection, loss computation, and optimization automatically
- ⚙️ Extensive Configuration: Comprehensive Hydra-based config system for easy experimentation and hyperparameter tuning
- 📊 Built-in Logging: Automatic tracking of rewards, actions, episode completion rates, and training statistics
- 🔧 Modular Design: Built on existing TorchRL components (collectors, losses, replay buffers) for maximum flexibility
- 📝 Minimal Code: Complete SOTA implementation in just ~20 lines!
Working Example: See sota-implementations/ppo_trainer/ for a complete, working PPO implementation that trains on Pendulum-v1 with full Hydra configuration support.
Prerequisites: Requires Hydra for configuration management: `pip install "torchrl[utils]"`
<details><summary>Quick Start Example</summary>

```python
import hydra
from torchrl.trainers.algorithms.configs import *


@hydra.main(config_path="config", config_name="config", version_base="1.1")
def main(cfg):
    trainer = hydra.utils.instantiate(cfg.trainer)
    trainer.train()


if __name__ == "__main__":
    main()
```

Complete PPO training in ~20 lines with full configurability.

</details>

<details><summary>API Usage Examples</summary>

```bash
# Basic usage - train PPO on Pendulum-v1 with default settings
python sota-implementations/ppo_trainer/train.py

# Custom configuration with command-line overrides
python sota-implementations/ppo_trainer/train.py \
  trainer.total_frames=2000000 \
  training_env.create_env_fn.base_env.env_name=HalfCheetah-v4 \
  networks.policy_network.num_cells=[256,256] \
  optimizer.lr=0.0003

# Use different environment and logger
python sota-implementations/ppo_trainer/train.py \
  env=gym \
  training_env.create_env_fn.base_env.env_name=Walker2d-v4 \
  logger=tensorboard

# See all available options
python sota-implementations/ppo_trainer/train.py --help
```

</details>
Future Plans: Additional algorithm trainers (SAC, TD3, DQN) and full integration of all TorchRL components within the configuration system are planned for upcoming releases.
## LLM API - Complete Framework for Language Model Fine-tuning
TorchRL includes a comprehensive LLM API for post-training and fine-tuning of language models! This framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:
- 🤖 Unified LLM Wrappers: Seamless integration with Hugging Face models and vLLM inference engines
- 💬 Conversation Management: Advanced `History` class for multi-turn dialogue with automatic chat template detection
- 🛠️ Tool Integration: Built-in support for Python code execution, function calling, and custom tool transforms
- 🎯 Specialized Objectives: GRPO (Group Relative Policy Optimization) and SFT loss functions optimized for language models
- ⚡ High-Performance Collectors: Async data collection with distributed training support
- 🔄 Flexible Environments: Transform-based architecture for reward computation, data loading, and conversation augmentation
The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the complete documentation and GRPO implementation example to get started.
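The tool-integration and transform-based reward ideas can be illustrated with a plain-Python sketch. Everything here is hypothetical for illustration: the `<tool>` tag format, the `run_python_tool` helper, and the pass/fail reward are not TorchRL APIs, which instead implement tools and rewards as environment transforms over `History` objects:

```python
import re

# Conceptual sketch of a code-execution "tool" with a pass/fail reward.
# Illustrative only; TorchRL's LLM API implements tools as environment transforms.
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

def run_python_tool(reply: str) -> float:
    """Execute the first <tool>...</tool> block in a reply; reward 1.0 on success."""
    match = TOOL_CALL.search(reply)
    if match is None:
        return 0.0  # no tool call in this turn
    try:
        exec(match.group(1), {})  # sandboxing omitted in this sketch
        return 1.0
    except Exception:
        return 0.0  # tool call raised, e.g. a runtime error

good = "Here you go: <tool>x = 2 + 2</tool>"
bad = "Try this: <tool>1 / 0</tool>"
print(run_python_tool(good))  # 1.0
print(run_python_tool(bad))   # 0.0
```

In the real API, a transform like this would append the tool's output back into the conversation history so the model can react to it on the next turn, which is what makes tool-augmented multi-turn training possible.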
