121 skills found
huggingface / Trl - Train transformer language models with reinforcement learning.
CarperAI / Trlx - A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
robot-learning-co / Trlc Dk1 - TRLC's Developer Kit 1
NVlabs / GDPO - Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
PriorLabs / Tabpfn Time Series - Zero-shot Time Series Forecasting with TabPFN (work accepted at NeurIPS 2024 TRL and TSALM workshops)
jasonvanf / Llama Trl - LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
TYH-labs / Unsloth Buddy - Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform.
argilla-io / Notus - Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach
Yaepiii / TRLO - [T-IM 2025] TRLO: An Efficient LiDAR Odometry with 3D Dynamic Object Tracking and Removal
TheIndra55 / TRLAU Menu Hook - Reverse engineering, menu and patches for Tomb Raider Anniversary, Legend and Underworld.
bmw-software-engineering / Trlc - Treat Requirements Like Code
GAD-cell / Vlm Grpo - An implementation of GRPO for Unsloth's VLMs training
MilkClouds / Vla0 Trl - Unofficial reimplementation of VLA-0 using TRL's SFTTrainer.
huggingface / Trl Jobs - Train LLM on Hugging Face infra
Mofiqul / Trld.nvim - No description available
aoberai / Trl - Code for "Transitive RL: Value Learning via Divide and Conquer"
huggingface / Trl Tuto - No description available
sugarandgugu / Simple Trl Training - Fine-tune large language models with the DPO algorithm; simple and easy to get started.
Shekswess / Tiny Reasoning Language Model - Code repository for experiments and research with a tiny reasoning language model
ZJU-REAL / TimeHC RL - This repository is the official implementation of TimeHC-RL (Distilabel (Data Generation) + TRL (SFT) + VeRL (GRPO)).