ReferSplat
[ICML2025 Oral] ReferSplat: Referring Segmentation in 3D Gaussian Splatting
Abstract
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that focuses on segmenting target objects in a 3D Gaussian scene based on natural language descriptions. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Code, trained models, and the dataset will be publicly released.

Datasets
The Ref-LERF dataset can be downloaded from either of the following links: baiduyun or huggingface. It is organized by scene:
<path to ref-lerf dataset>
|---figurines
|---ramen
|---waldo_kitchen
|---teatime
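After downloading, it can be useful to confirm that all four scene folders are present before training. The following is a minimal sketch, not part of the repository: the helper name `missing_scenes` is hypothetical, and it only assumes the scene folder names shown in the tree above.

```python
from pathlib import Path

# Scene folder names as listed in the dataset tree above.
EXPECTED_SCENES = ("figurines", "ramen", "waldo_kitchen", "teatime")


def missing_scenes(dataset_root):
    """Return the expected scene folders that are absent under dataset_root.

    Hypothetical helper for illustration; it checks folder names only,
    not the contents of each scene.
    """
    root = Path(dataset_root)
    return [name for name in EXPECTED_SCENES if not (root / name).is_dir()]
```

Calling `missing_scenes("<path to ref-lerf dataset>")` returns an empty list when every scene folder exists.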
Checkpoints and Pseudo mask
The checkpoints and pseudo masks can be downloaded from either of the following links: googledrive or huggingface
Cloning the Repository
The repository contains submodules, so clone it recursively with one of the following:
# SSH
git clone git@github.com:heshuting555/ReferSplat.git --recursive
cd ReferSplat
or
# HTTPS
git clone https://github.com/heshuting555/ReferSplat.git --recursive
cd ReferSplat
Setup
The default installation method uses Conda for package and environment management:
conda env create --file environment.yml
conda activate refsplat
Training
Note: Before training ReferSplat, you need to train the original 3DGS to obtain the pretrained Gaussians used for RGB rendering.
python train.py -s <path to ref-lerf dataset> -m <path to output_model>
<ref-lerf>
|---<path to ref-lerf dataset>
|   |---<figurines>
|   |---<ramen>
|   |---...
|---<path to output_model>
    |---<figurines>
    |---<ramen>
    |---...
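The layout above suggests one output folder per scene. As a rough sketch under that assumption (the README does not state it explicitly), the per-scene training commands could be assembled as below. The helper `build_train_commands` is hypothetical and not part of the repository; only the `train.py -s ... -m ...` form is taken from the command above.

```python
from pathlib import Path

# Scene names from the Ref-LERF dataset listing.
SCENES = ("figurines", "ramen", "waldo_kitchen", "teatime")


def build_train_commands(dataset_root, output_root, scenes=SCENES):
    """Build one `train.py` argument list per scene.

    Assumption: each scene is trained separately, with the scene subfolder
    passed to -s and a matching subfolder passed to -m.
    """
    commands = []
    for scene in scenes:
        commands.append([
            "python", "train.py",
            "-s", str(Path(dataset_root) / scene),
            "-m", str(Path(output_root) / scene),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, or printed and run by hand.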
Render
python render.py -m <path to output_model>
Get pseudo mask
To generate pseudo masks, please refer to the Grounded-SAM method ("Grounded-SAM: Detect and Segment Everything with Text Prompt") at https://github.com/IDEA-Research/Grounded-Segment-Anything
BibTeX
Please consider citing ReferSplat if it helps your research.
@inproceedings{ReferSplat,
  title={{ReferSplat}: Referring Segmentation in 3D Gaussian Splatting},
  author={He, Shuting and Jie, Guangquan and Wang, Changshuo and Zhou, Yun and Hu, Shuming and Li, Guanbin and Ding, Henghui},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
