283 skills found · Page 1 of 10
HumanCompatibleAI / Imitation · Clean PyTorch implementations of imitation and reward learning algorithms
bytedance / USO · [CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
zai-org / GLM-TTS · Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
yongliang-wu / DFT · [ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
avisingh599 / Reward Learning RL · [RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering
mihirp1998 / AlignProp · AlignProp uses direct reward backpropagation to align large-scale text-to-image diffusion models. The method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for fine-tuning Stable Diffusion
yfzhang114 / R1 Reward · ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
bitsauce / Carla Ppo · A customized PPO-based agent for Carla. The goal of the project is to make it easier to interact with and experiment on reinforcement-learning-based agents in Carla by wrapping Carla in a gym-like environment that supports custom reward functions, custom debug output, etc.
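The "gym-like environment with a custom reward function" pattern that entry describes can be sketched as follows. This is not code from the Carla Ppo repo: the simulator here is a hypothetical 1-D toy car so the sketch runs anywhere, and `reward_fn` / `speed_tracking_reward` are illustrative names, not the repo's API.

```python
import numpy as np

class CarlaLikeEnv:
    """Minimal gym-style wrapper with a pluggable reward.

    Hypothetical stand-in: the real repo wraps the Carla simulator;
    here the 'simulator' is a toy 1-D car. `reward_fn` maps the latest
    state dict to a scalar reward.
    """

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn
        self.pos = 0.0
        self.speed = 0.0

    def reset(self):
        self.pos, self.speed = 0.0, 0.0
        return self._obs()

    def step(self, throttle):
        # Toy dynamics: throttle in [-1, 1] changes speed; speed moves the car.
        self.speed = float(np.clip(self.speed + 0.1 * throttle, 0.0, 1.0))
        self.pos += self.speed
        state = {"pos": self.pos, "speed": self.speed}
        reward = self.reward_fn(state)   # injected custom reward
        done = self.pos >= 10.0          # episode ends at the 'goal'
        return self._obs(), reward, done, state

    def _obs(self):
        return np.array([self.pos, self.speed], dtype=np.float32)

# One possible custom reward: track a target speed.
def speed_tracking_reward(state, target=0.5):
    return 1.0 - abs(state["speed"] - target)

env = CarlaLikeEnv(speed_tracking_reward)
obs = env.reset()
obs, r, done, info = env.step(1.0)   # full throttle for one step
```

Because the reward is a plain callable passed to the constructor, swapping reward designs requires no change to the environment code, which is the point of such wrappers.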
CodeGoat24 / Pref GRPO · Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
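A pairwise preference reward of the kind that title names can be sketched, loosely, as a within-group win rate fed into GRPO's group-relative advantage. This is an assumption-laden illustration, not the Pref-GRPO implementation; `prefer` stands in for a hypothetical preference model.

```python
import numpy as np

def pairwise_winrate_rewards(samples, prefer):
    """Reward each sample in a GRPO group by the fraction of pairwise
    comparisons it wins against its group-mates. `prefer(a, b)` is a
    hypothetical preference model returning True when a beats b."""
    n = len(samples)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j and prefer(samples[i], samples[j]):
                wins[i] += 1
    return wins / (n - 1)   # win rate in [0, 1]

def grpo_advantages(rewards):
    """Group-relative advantages: standardise rewards within the group,
    as GRPO does in place of a learned value baseline."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Toy usage: 'samples' are scalars and the preference is 'larger is better'.
rewards = pairwise_winrate_rewards([0.2, 0.9, 0.5], lambda a, b: a > b)
adv = grpo_advantages(rewards)
```

Win rates are bounded in [0, 1] regardless of the preference model's raw score scale, which is one plausible reason to prefer them over absolute reward-model scores for stable training.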
Jerry-XDL / AIDoctor · Training a medical GPT model with the ChatGPT training pipeline: implementation of pretraining, supervised fine-tuning, RLHF (reward modeling and reinforcement learning), and DPO (Direct Preference Optimization)
xlang-ai / Text2reward · [ICLR 2024 Spotlight] Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
rllab-snu / Stage Wise CMORL · Official repository for the paper "Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach"
InternLM / OREAL · Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
raghavc / LLM RLHF Tuning With PPO And DPO · Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models
zli12321 / Vision SR1 · Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
InternLM / POLAR · Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning
villekuosmanen / RewACT · A reward head for ACT trained with supervised learning
eric-xw / AREL · Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling"
TEA-Lab / Diffusion Reward · [ECCV 2024] 💐 Official implementation of the paper "Diffusion Reward: Learning Rewards via Conditional Video Diffusion"
PiggyCh / RL Arm Under Sparse Reward · A reinforcement learning project for a robotic arm under sparse rewards