Offline RLAIF

This repository contains code for the paper Offline RLAIF: Piloting VLM Feedback for RL via SFO (Beck et al., 2025), published at the RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents. The code implements Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that leverages vision-language model (VLM) feedback to improve offline reinforcement learning (RL). SFBC filters and weights sub-trajectories according to VLM-derived success probabilities, enabling effective policy learning in the absence of explicit rewards.
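The filtering-and-weighting step described above can be sketched as follows. This is an illustrative outline only: the function name, the 0.5 threshold, and the proportional weighting scheme are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def filter_and_weight(sub_trajectories, success_probs, threshold=0.5):
    """Keep sub-trajectories whose VLM success probability clears the
    threshold, and return them with normalized weights for use in a
    weighted behavior-cloning objective."""
    kept = [(traj, p) for traj, p in zip(sub_trajectories, success_probs)
            if p >= threshold]
    if not kept:
        return [], np.array([])
    trajs, probs = zip(*kept)
    weights = np.array(probs, dtype=float)
    weights /= weights.sum()  # normalize so the retained weights sum to 1
    return list(trajs), weights

# Toy usage: three sub-trajectories scored by a (hypothetical) VLM query.
trajs = ["sub_traj_a", "sub_traj_b", "sub_traj_c"]
probs = [0.9, 0.2, 0.6]
kept, w = filter_and_weight(trajs, probs, threshold=0.5)
# kept == ["sub_traj_a", "sub_traj_c"]; weights proportional to 0.9 and 0.6
```

A behavior-cloning loss would then scale each retained sub-trajectory's log-likelihood term by its weight, so confidently successful segments dominate the policy update.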

Installation

To set up the environment, use Conda:

conda env create -f environment.yml
conda activate d3rl

Running the Code

To train and evaluate SFBC, simply run:

python offline.py <openai_api_key>

All key arguments and hyperparameters are defined as constants at the top of Main.py; edit them there directly before running the script.
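The constants-at-the-top configuration pattern looks like the following minimal sketch. The constant names and values here are hypothetical placeholders, not the actual ones in Main.py.

```python
# Hypothetical module-level constants mirroring the configuration style
# used in Main.py; the real names and values live at the top of that file.
N_EPOCHS = 100          # number of training epochs
BATCH_SIZE = 256        # minibatch size for behavior cloning
FILTER_THRESHOLD = 0.5  # minimum VLM success probability to keep a sub-trajectory

def train(n_epochs=N_EPOCHS, batch_size=BATCH_SIZE):
    """Entry point that reads the module-level constants above."""
    return {"epochs": n_epochs, "batch": batch_size}

config = train()  # uses the constants unless overridden per call
```

Because the script reads these module-level values at import time, changing a hyperparameter means editing the constant and re-running, rather than passing a command-line flag.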

Citing this Work

If you use this code in your research, please cite our paper:

@article{beck2025sfo,
  author    = {Jacob Beck},
  title     = {Offline RLAIF: Piloting VLM Feedback for RL via SFO},
  journal   = {RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents},
  url       = {https://openreview.net/pdf?id=XuW9VGTz1w},
  year      = {2025}
}

Acknowledgments

This implementation is based on D3RLpy for offline RL baselines and uses GPT-4o as the vision-language model (VLM) for sub-trajectory evaluation.
