InterRVOS
Official implementation of "InterRVOS: Interaction-aware Referring Video Object Segmentation".
Woojeong Jin Seongchan Kim Jaeho Lee Seungryong Kim† <br> KAIST AI <br> †: Corresponding Author
ArXiv 2025
<a href="https://arxiv.org/abs/2506.02356"> <img src="https://img.shields.io/badge/arXiv-2506.02356-B31B1B?logo=arxiv&logoColor=white"> </a> <a href="https://cvlab-kaist.github.io/InterRVOS/"> <img src="https://img.shields.io/badge/Project_Page-Available-1E90FF"> </a> <a href="https://huggingface.co/wooj0216/ReVIOSa-4B"> <img src="https://img.shields.io/badge/🤗 Huggingface Models-Available-6A5ACD" > </a> <a href="https://huggingface.co/datasets/wooj0216/InterRVOS-127K"> <img src="https://img.shields.io/badge/Dataset-Available-20B2AA" > </a> </div>

📢 News
- [ ] Upcoming: InterRVOS-127K dataset and ReVIOSa checkpoints
- [ ] Upcoming: Data annotation pipeline
- [x] Released: Training code, inference & evaluation code
- [x] Released: InterRVOS on ArXiv and Project Page
🎯 Release Progress
- [x] Model checkpoints
- [x] InterRVOS-127K dataset (Training & Evaluation)
- [x] Data annotation pipeline code
- [x] Inference & evaluation code
- [x] Training code
Overview
This repository contains the code for the paper InterRVOS: Interaction-aware Referring Video Object Segmentation.
In this paper, we introduce Interaction-aware Referring Video Object Segmentation (InterRVOS), a novel task that focuses on the modeling of interactions. It requires the model to segment the <b>actor</b> and <b>target</b> objects separately, reflecting their asymmetric roles in an interaction. Please refer to the project page for detailed visualization results.
Model Download
‼️ We release the pretrained ReVIOSa-1B and ReVIOSa-4B models on Hugging Face 🤗: ReVIOSa-1B and ReVIOSa-4B
🚀 Quick Start
```python
import os

import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
path = "wooj0216/ReVIOSa-4B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# Collect the video frames in temporal order
video_folder = "/PATH/TO/VIDEO_FOLDER"
image_names = sorted(os.listdir(video_folder))
images_paths = [os.path.join(video_folder, image_name) for image_name in image_names]

text_prompts = "<image>Please segment the child reaching out to man."
input_dict = {
    'video': images_paths,
    'text': text_prompts,
    'past_text': '',
    'mask_prompts': None,
    'tokenizer': tokenizer,
}

return_dict = model.predict_forward(**input_dict)
answer = return_dict["prediction"]
masks = return_dict['prediction_masks']
```
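The returned masks can be written out as binary PNGs for quick inspection. A minimal sketch, assuming `prediction_masks` is a list with one entry per object, each a `(num_frames, H, W)` array of 0/1 values (check the model card for the exact format):

```python
import os

import numpy as np
from PIL import Image


def save_masks(prediction_masks, out_dir):
    """Save each predicted object's per-frame masks as binary PNGs.

    Assumes prediction_masks is a list (one entry per object) of arrays
    shaped (num_frames, H, W) with values in {0, 1}; the exact output
    format is model-specific.
    """
    os.makedirs(out_dir, exist_ok=True)
    for obj_idx, obj_masks in enumerate(prediction_masks):
        obj_masks = np.asarray(obj_masks)
        for frame_idx, mask in enumerate(obj_masks):
            # Binarize and scale to 0/255 so the mask is viewable as an image
            img = Image.fromarray((mask > 0).astype(np.uint8) * 255)
            img.save(os.path.join(out_dir, f"obj{obj_idx}_frame{frame_idx:04d}.png"))
```

For example, `save_masks(masks, "./mask_outputs")` after the quick-start snippet writes one PNG per object per frame, which makes it easy to check the actor and target masks side by side.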
Dataset
‼️ We release our dataset InterRVOS-127K on Hugging Face 🤗: wooj0216/InterRVOS-127K
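The dataset can be fetched locally with the Hugging Face CLI; a minimal sketch (the local directory name is arbitrary):

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download wooj0216/InterRVOS-127K \
    --repo-type dataset \
    --local-dir ./InterRVOS-127K
```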
Model Training & Inference
Instructions for training, inference, and evaluation are provided in ReVIOSa/README.md.
Data Annotation
Our automatic data-annotation pipeline is provided in the data_annotation directory.
Acknowledgement
This project is based on Sa2VA. Many thanks to the authors for their great work!
References
If you find this repository useful, please consider citing the following paper:
@misc{jin2025interrvosinteractionawarereferringvideo,
title={InterRVOS: Interaction-aware Referring Video Object Segmentation},
author={Woojeong Jin and Seongchan Kim and Jaeho Lee and Seungryong Kim},
year={2025},
eprint={2506.02356},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.02356},
}
