Offline RLAIF

This repository contains code for the paper Offline RLAIF: Piloting VLM Feedback for RL via SFO (Beck et al., 2025), published at the RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents. The code implements Sub-Trajectory Filtered Behavior Cloning (SFBC), a method that leverages vision-language model (VLM) feedback to improve offline reinforcement learning (RL). SFBC filters and weights sub-trajectories according to VLM-derived success probabilities, enabling effective policy learning in the absence of explicit rewards.
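The filtering-and-weighting step described above can be sketched as follows. This is an illustrative outline only: the function name, the 0.5 threshold, and the proportional weighting scheme are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def filter_and_weight(sub_trajectories, success_probs, threshold=0.5):
    """Keep sub-trajectories whose VLM success probability clears the
    threshold, and return them with normalized weights for use in a
    weighted behavior-cloning objective."""
    kept = [(traj, p) for traj, p in zip(sub_trajectories, success_probs)
            if p >= threshold]
    if not kept:
        return [], np.array([])
    trajs, probs = zip(*kept)
    weights = np.array(probs, dtype=float)
    weights /= weights.sum()  # normalize so the retained weights sum to 1
    return list(trajs), weights

# Toy usage: three sub-trajectories scored by a (hypothetical) VLM query.
trajs = ["sub_traj_a", "sub_traj_b", "sub_traj_c"]
probs = [0.9, 0.2, 0.6]
kept, w = filter_and_weight(trajs, probs, threshold=0.5)
# kept == ["sub_traj_a", "sub_traj_c"]; weights proportional to 0.9 and 0.6
```

A behavior-cloning loss would then scale each retained sub-trajectory's log-likelihood term by its weight, so confidently successful segments dominate the policy update.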

Installation

To set up the environment, use Conda:

conda env create -f environment.yml
conda activate d3rl

Running the Code

To train and evaluate SFBC, simply run:

python offline.py <openai_api_key>

All key arguments and hyperparameters are defined as constants at the top of Main.py; edit them there directly before running the script.
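The constants-at-the-top configuration pattern looks like the following minimal sketch. The constant names and values here are hypothetical placeholders, not the actual ones in Main.py.

```python
# Hypothetical module-level constants mirroring the configuration style
# used in Main.py; the real names and values live at the top of that file.
N_EPOCHS = 100          # number of training epochs
BATCH_SIZE = 256        # minibatch size for behavior cloning
FILTER_THRESHOLD = 0.5  # minimum VLM success probability to keep a sub-trajectory

def train(n_epochs=N_EPOCHS, batch_size=BATCH_SIZE):
    """Entry point that reads the module-level constants above."""
    return {"epochs": n_epochs, "batch": batch_size}

config = train()  # uses the constants unless overridden per call
```

Because the script reads these module-level values at import time, changing a hyperparameter means editing the constant and re-running, rather than passing a command-line flag.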

Citing this Work

If you use this code in your research, please cite our paper:

@article{beck2025sfo,
  author    = {Jacob Beck},
  title     = {Offline RLAIF: Piloting VLM Feedback for RL via SFO},
  journal   = {RLC Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents},
  url       = {https://openreview.net/pdf?id=XuW9VGTz1w},
  year      = {2025}
}

Acknowledgments

This implementation is based on D3RLpy for offline RL baselines and uses GPT-4o as the vision-language model (VLM) for sub-trajectory evaluation.
