ReferSplat

[ICML2025 Oral] ReferSplat: Referring Segmentation in 3D Gaussian Splatting

<p align="center"> <h1 align="center">ReferSplat: Referring Segmentation in 3D Gaussian Splatting</h1> <p align="center"> ICML 2025 Oral </p> <p align="center"> <a href="https://arxiv.org/abs/2508.08252"> <img src='https://img.shields.io/badge/Paper-PDF-green?style=flat&logo=arXiv&' alt='arXiv PDF'> </a> </p>

Abstract

We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that focuses on segmenting target objects in a 3D Gaussian scene based on natural language descriptions. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Code, trained models, and the dataset will be publicly released.

[Figure: ReferSplat example]

Datasets

The Ref-LERF dataset can be downloaded from baiduyun or huggingface. It is organized by scene:

<path to ref-lerf dataset>
|---figurines
|---ramen
|---waldo_kitchen
|---teatime
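
Before training, it can help to confirm that the extracted dataset matches this layout. The following is a minimal sketch and not part of the repository: the helper name `missing_scenes` is hypothetical, and it only checks that the four scene folders listed above exist, not what they contain.

```python
from pathlib import Path

# Scene folders listed in the Ref-LERF layout above.
REF_LERF_SCENES = ("figurines", "ramen", "waldo_kitchen", "teatime")


def missing_scenes(dataset_root):
    """Return the expected scene folders that are absent under dataset_root.

    Hypothetical helper for illustration; it does not inspect scene contents.
    """
    root = Path(dataset_root)
    return [name for name in REF_LERF_SCENES if not (root / name).is_dir()]
```

Calling `missing_scenes("<path to ref-lerf dataset>")` returns an empty list when all four folders are present, and otherwise the names that still need to be downloaded or extracted.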

Checkpoints and Pseudo mask

The checkpoints and pseudo masks can be downloaded from googledrive or huggingface.

Cloning the Repository

The repository contains submodules, so clone it with `--recursive`:

# SSH
git clone --recursive git@github.com:heshuting555/ReferSplat.git
cd ReferSplat

# HTTPS
git clone --recursive https://github.com/heshuting555/ReferSplat.git
cd ReferSplat

Setup

The default installation method uses Conda for package and environment management:

conda env create --file environment.yml
conda activate refsplat

Training

Note: Before training ReferSplat, first train the original 3DGS to obtain the pretrained Gaussians used for RGB rendering.

python train.py -s <path to ref-lerf dataset> -m <path to output_model>

The dataset and output directories are laid out as follows:

<ref-lerf>
|---<path to ref-lerf dataset>
|   |---<figurines>
|   |---<ramen>
|   |---...
|---<path to output_model>
    |---<figurines>
    |---<ramen>
    |---...
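
The layout above pairs each scene folder with a matching model folder. As a rough sketch, assuming `train.py` is run once per scene with `-s` pointing at the scene folder and `-m` at the corresponding output folder (this per-scene invocation is an assumption, as are the helper name `train_commands` and the placeholder paths), the commands could be assembled like this:

```python
from pathlib import Path

# Scene names from the Ref-LERF dataset layout.
SCENES = ("figurines", "ramen", "waldo_kitchen", "teatime")


def train_commands(dataset_root, output_root, scenes=SCENES):
    """Build one `python train.py -s ... -m ...` argument list per scene.

    Assumption: train.py is invoked per scene; only the -s and -m flags
    shown in this README are used.
    """
    dataset_root, output_root = Path(dataset_root), Path(output_root)
    return [
        ["python", "train.py",
         "-s", str(dataset_root / scene),
         "-m", str(output_root / scene)]
        for scene in scenes
    ]


# Placeholder paths; replace with your own dataset and output locations.
for cmd in train_commands("data/ref-lerf", "output"):
    # Printed rather than executed. To launch training, pass cmd to
    # subprocess.run(cmd, check=True) from the repository root.
    print(" ".join(cmd))
```

Printing the commands first makes it easy to check the paths before committing GPU time to a full run.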

Render

python render.py -m <path to output_model>
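
When several scenes have been trained, rendering can be scripted in the same spirit. The sketch below assumes every immediate subfolder of the output directory is a trained scene model; the helper name `render_commands` is hypothetical, and only the `-m` flag shown above is used.

```python
from pathlib import Path


def render_commands(output_root):
    """Build `python render.py -m <model dir>` for each subfolder of output_root.

    Assumption: every immediate subfolder is a trained scene model.
    Raises FileNotFoundError if output_root does not exist.
    """
    root = Path(output_root)
    model_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    return [["python", "render.py", "-m", str(p)] for p in model_dirs]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; plain files in the output directory are ignored.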

Get pseudo mask

To generate pseudo masks, follow the Grounded-SAM method ("Grounded-SAM: Detect and Segment Everything with Text Prompt") at https://github.com/IDEA-Research/Grounded-Segment-Anything

BibTeX

Please consider citing ReferSplat if it helps your research.

@inproceedings{ReferSplat,
  title={{ReferSplat}: Referring Segmentation in 3D Gaussian Splatting},
  author={He, Shuting and Jie, Guangquan and Wang, Changshuo and Zhou, Yun and Hu, Shuming and Li, Guanbin and Ding, Henghui},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
