OfflineRLAIF

This repository contains the code for the paper Offline RLAIF: Piloting VLM Feedback for RL via SFO (Beck et al., 2025), published at the RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents. It includes an implementation of Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that uses vision-language model (VLM) feedback to improve offline reinforcement learning (RL). SFBC filters and weights sub-trajectories by their VLM-derived success probabilities, which allows a policy to be learned without explicit rewards.
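The filtering and weighting idea can be illustrated with a small sketch. This is not the repository's implementation: the fixed window size, the 0.5 threshold, the squared-error cloning loss, and all function names below are assumptions chosen for illustration, and the success probabilities are passed in as plain numbers rather than queried from a VLM.

```python
import numpy as np

def split_into_subtrajectories(length, window):
    """Return (start, end) index pairs covering a trajectory in fixed windows."""
    return [(s, min(s + window, length)) for s in range(0, length, window)]

def sfbc_weights(success_probs, threshold=0.5):
    """Drop sub-trajectories below the threshold; weight the rest by probability."""
    p = np.asarray(success_probs, dtype=float)
    return np.where(p >= threshold, p, 0.0)

def per_step_weights(length, window, success_probs, threshold=0.5):
    """Broadcast each sub-trajectory's weight to the steps it contains."""
    spans = split_into_subtrajectories(length, window)
    weights = sfbc_weights(success_probs, threshold)
    out = np.zeros(length)
    for (start, end), w in zip(spans, weights):
        out[start:end] = w
    return out

def weighted_bc_loss(pred_actions, data_actions, step_weights):
    """Weighted mean squared error between predicted and dataset actions."""
    per_step = np.sum((pred_actions - data_actions) ** 2, axis=1)
    total = step_weights.sum()
    if total == 0:
        return 0.0  # every sub-trajectory was filtered out
    return float(np.sum(step_weights * per_step) / total)

# Hypothetical example: a 6-step trajectory split into 3 windows of 2 steps,
# with made-up success probabilities standing in for VLM output.
step_w = per_step_weights(length=6, window=2, success_probs=[0.9, 0.2, 0.6])
pred = np.zeros((6, 1))
data = np.ones((6, 1))
loss = weighted_bc_loss(pred, data, step_w)
```

Here the middle sub-trajectory falls below the threshold and contributes nothing to the loss, while the other two contribute in proportion to their assumed success probabilities.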
Installation
To set up the environment, use Conda:
conda env create -f environment.yml
conda activate d3rl
Running the Code
To train and evaluate SFBC, run the following, passing your OpenAI API key as the only argument:
python offline.py <openai_api_key>
All key arguments and hyperparameters are defined as constants at the top of Main.py. Edit them there directly before running the script.
Citing this Work
If you use this code in your research, please cite our paper:
@article{beck2025sfo,
author = {Jacob Beck},
title = {Offline RLAIF: Piloting VLM Feedback for RL via SFO},
journal = {RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents},
url = {https://openreview.net/pdf?id=XuW9VGTz1w},
year = {2025}
}
Acknowledgments
This implementation is based on D3RLpy for offline RL baselines and uses GPT-4o as the vision-language model (VLM) for sub-trajectory evaluation.
