283 skills found · Page 1 of 10
HumanCompatibleAI / Imitation · Clean PyTorch implementations of imitation and reward learning algorithms
bytedance / USO · [CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
zai-org / GLM-TTS · Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
yongliang-wu / DFT · [ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
avisingh599 / Reward Learning RL · [RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering
mihirp1998 / AlignProp · AlignProp uses direct reward backpropagation to align large-scale text-to-image diffusion models. The method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for fine-tuning Stable Diffusion
yfzhang114 / R1 Reward · ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
bitsauce / Carla Ppo · A customized PPO-based agent for Carla. The goal of the project is to make it easier to interact with and experiment on reinforcement-learning-based agents in Carla by wrapping Carla in a gym-like environment that supports custom reward functions, custom debug output, etc.
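The "gym-like environment with a custom reward function" pattern that entry describes can be sketched as follows. This is not code from the Carla Ppo repo: the simulator here is a hypothetical 1-D toy car so the sketch runs anywhere, and `reward_fn` / `speed_tracking_reward` are illustrative names, not the repo's API.

```python
import numpy as np

class CarlaLikeEnv:
    """Minimal gym-style wrapper with a pluggable reward.

    Hypothetical stand-in: the real repo wraps the Carla simulator;
    here the 'simulator' is a toy 1-D car. `reward_fn` maps the latest
    state dict to a scalar reward.
    """

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn
        self.pos = 0.0
        self.speed = 0.0

    def reset(self):
        self.pos, self.speed = 0.0, 0.0
        return self._obs()

    def step(self, throttle):
        # Toy dynamics: throttle in [-1, 1] changes speed; speed moves the car.
        self.speed = float(np.clip(self.speed + 0.1 * throttle, 0.0, 1.0))
        self.pos += self.speed
        state = {"pos": self.pos, "speed": self.speed}
        reward = self.reward_fn(state)   # injected custom reward
        done = self.pos >= 10.0          # episode ends at the 'goal'
        return self._obs(), reward, done, state

    def _obs(self):
        return np.array([self.pos, self.speed], dtype=np.float32)

# One possible custom reward: track a target speed.
def speed_tracking_reward(state, target=0.5):
    return 1.0 - abs(state["speed"] - target)

env = CarlaLikeEnv(speed_tracking_reward)
obs = env.reset()
obs, r, done, info = env.step(1.0)   # full throttle for one step
```

Because the reward is a plain callable passed to the constructor, swapping reward designs requires no change to the environment code, which is the point of such wrappers.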
CodeGoat24 / Pref GRPO · Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
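A pairwise preference reward of the kind that title names can be sketched, loosely, as a within-group win rate fed into GRPO's group-relative advantage. This is an assumption-laden illustration, not the Pref-GRPO implementation; `prefer` stands in for a hypothetical preference model.

```python
import numpy as np

def pairwise_winrate_rewards(samples, prefer):
    """Reward each sample in a GRPO group by the fraction of pairwise
    comparisons it wins against its group-mates. `prefer(a, b)` is a
    hypothetical preference model returning True when a beats b."""
    n = len(samples)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j and prefer(samples[i], samples[j]):
                wins[i] += 1
    return wins / (n - 1)   # win rate in [0, 1]

def grpo_advantages(rewards):
    """Group-relative advantages: standardise rewards within the group,
    as GRPO does in place of a learned value baseline."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Toy usage: 'samples' are scalars and the preference is 'larger is better'.
rewards = pairwise_winrate_rewards([0.2, 0.9, 0.5], lambda a, b: a > b)
adv = grpo_advantages(rewards)
```

Win rates are bounded in [0, 1] regardless of the preference model's raw score scale, which is one plausible reason to prefer them over absolute reward-model scores for stable training.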
Jerry-XDL / AIDoctor · Training a medical GPT model with the ChatGPT training pipeline: implementation of pretraining, supervised fine-tuning, RLHF (reward modeling and reinforcement learning), and DPO (Direct Preference Optimization)
xlang-ai / Text2reward · [ICLR 2024 Spotlight] Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
rllab-snu / Stage Wise CMORL · Official repository for the paper "Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach"
InternLM / OREAL · Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
raghavc / LLM RLHF Tuning With PPO And DPO · Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models
zli12321 / Vision SR1 · Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
InternLM / POLAR · Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning
villekuosmanen / RewACT · A reward head for ACT trained with supervised learning
eric-xw / AREL · Code for the ACL paper "No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling"
TEA-Lab / Diffusion Reward · [ECCV 2024] 💐 Official implementation of the paper "Diffusion Reward: Learning Rewards via Conditional Video Diffusion"
PiggyCh / RL Arm Under Sparse Reward · A reinforcement learning project for a robotic arm under sparse rewards