23 skills found
argilla-io / Distilabel - Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
RLHF-V / RLAIF V - [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
mengdi-li / Awesome RLAIF - A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
CIntellifusion / VideoDPO - Official Implementation of VideoDPO
gauravchak / Two Tower Models - Sample code for two-tower retrieval models, with RLHF/RLAIF-style alignment against a ranking model so that retrieval is better aligned with the ranking model applied on top
xuyang-sudo / AutoRLAIF - AutoRLAIF is a cutting-edge framework designed to revolutionize the fine-tuning of large language models through Reinforcement Learning from AI Feedback (RLAIF).
architsharma97 / Dpo Rlaif - No description available
yonseivnl / Vlm Rlaif - ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
sinanuozdemir / Oreilly Llm Rl Alignment - This training offers an intensive exploration into the frontier of reinforcement learning techniques with large language models (LLMs). We will explore advanced topics such as Reinforcement Learning with Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), Reasoning LLMs, and demonstrate practical applications such as fine-tuning
holarissun / Prompt OIRL - Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning"
andrew-silva / Mlx Rlhf - An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.
vicgalle / Zero Shot Reward Models - ZYN: Zero-Shot Reward Models with Yes-No Questions
dannylee1020 / Openpo - Building synthetic data for preference tuning
zhaochen0110 / Timo - Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
bayjarvis / Llm - Fine-tuning, DPO, RLHF, RLAIF on LLMs: Qwen3, Zephyr 7B GPTQ with 4-Bit Quantization, Mistral-7B-GPTQ
vicgalle / Awesome Rlaif - A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)
vicgalle / Distilled Self Critique - Distilled Self-Critique refines the outputs of an LLM using only synthetic data
mengdi-li / Vanilla RLAIF Pipeline - An implementation of a vanilla RLAIF pipeline, utilizing GPT-2-Large for the summarization task with the TL;DR dataset.
riken-grp / RLAIF DialogLLM - No description available
jacooba / OfflineRLAIF - No description available