Results for "trl"

121 skills found · Page 1 of 5

huggingface / Trl

17.9k

Train transformer language models with reinforcement learning.

universal

Updated 15m ago

CarperAI / Trlx

4.7k

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

universal

machine-learning · pytorch · reinforcement-learning

Updated 1d ago

robot-learning-co / Trlc Dk1

600

TRLC's Developer Kit 1

universal

Updated 2d ago

NVlabs / GDPO

430

Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

universal

agentic-ai · grpo · llm +4

Updated 49m ago

PriorLabs / Tabpfn Time Series

377

Zero-shot Time Series Forecasting with TabPFN (work accepted at NeurIPS 2024 TRL and TSALM workshops)

universal

tabpfn · time-series-forecasting

Updated 4d ago

jasonvanf / Llama Trl

240

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

universal

adapter · chatgpt · gpt +8

Updated 5d ago

TYH-labs / Unsloth Buddy

198

Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform.

claude code · claude desktop +1

apple-silicon · claude-code · dpo +10

Updated 6h ago