Results for "rlaif"

23 skills found

argilla-io / Distilabel

3.1k

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

universal

ai, huggingface, llms, +6

Updated 18h ago

RLHF-V / RLAIF V

449

[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

universal

chatbot, cvpr2025, gpt-4v, +6

Updated 5d ago

mengdi-li / Awesome RLAIF

198

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

universal

alignment, llms, rl, +2

Updated 1d ago

CIntellifusion / VideoDPO

163

Official Implementation of VideoDPO

universal

aigc, diffusion-models, generative-ai, +4

Updated 12d ago

gauravchak / Two Tower Models

109

Sample code for two-tower retrieval models, with RLHF/RLAIF-style alignment against a ranking model so that retrieval is better aligned with the ranking model on top.

universal

Updated 21d ago

xuyang-sudo / AutoRLAIF

100

AutoRLAIF is a cutting-edge framework designed to revolutionize the fine-tuning of large language models through Reinforcement Learning from AI Feedback (RLAIF).

universal

Updated 7d ago

architsharma97 / Dpo Rlaif

No description available

universal

Updated 15d ago

yonseivnl / Vlm Rlaif

ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

universal

Updated 1mo ago

sinanuozdemir / Oreilly Llm Rl Alignment

This training offers an intensive exploration into the frontier of reinforcement learning techniques with large language models (LLMs). We will explore advanced topics such as Reinforcement Learning with Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), Reasoning LLMs, and demonstrate practical applications such as fine-tuning

universal

agents, ai, deepseek, +8

Updated 21d ago