XMask3D
[NeurIPS 2024] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
Created by Ziyi Wang*, Yanbo Wang*, Xumin Yu, Jie Zhou, Jiwen Lu.
This repository is a PyTorch implementation of our NeurIPS 2024 paper XMask3D.
XMask3D is a framework for open vocabulary 3D semantic segmentation that improves fine-grained boundary delineation by aligning 3D features with a 2D-text embedding space at the mask level. Using a mask generator based on a pre-trained diffusion model, it enables precise textual control over dense pixel representations, enhancing the versatility of generated masks. By integrating 3D global features into a 2D denoising UNet, XMask3D adds 3D geometry awareness to mask generation. The resulting 2D masks align 3D representations with vision-language features, yielding competitive segmentation performance across benchmarks.
[arXiv]

Installation
- Follow installation.md to install the packages required for training and evaluation.
Data Preparation
- For convenience, we provide a download link for the processed datasets. Download them by running:
sh scripts/download_datasets.sh
Pre-trained Model Preparation
- This project requires the pre-trained CLIP model and the Stable Diffusion model. Because the official download links can be unreliable, we provide alternative download options below:
# CLIP ViT-Large Patch14
cd /path/to/your/workspace
wget -O openai.tar.gz https://cloud.tsinghua.edu.cn/f/3890f1df1c5248a7a6e8/?dl=1
tar -xzvf openai.tar.gz
# Stable Diffusion v1.3 Checkpoint
wget -O sd_model.tar.gz https://cloud.tsinghua.edu.cn/f/8dce9b137f574e6eb57c/?dl=1
tar -xzvf sd_model.tar.gz
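An interrupted download can leave a truncated archive that only fails partway through extraction. The sketch below checks an archive before unpacking it. `extract_checked` is a hypothetical helper and is not part of this repository; it assumes only a standard `tar` with gzip support.

```shell
# Hypothetical helper (not part of the XMask3D repo): confirm that a
# downloaded archive is a readable gzip tarball before unpacking it.
extract_checked() {
    archive="$1"
    dest="${2:-.}"
    # An interrupted download often leaves an empty or missing file.
    if [ ! -s "$archive" ]; then
        echo "missing or empty archive: $archive" >&2
        return 1
    fi
    # tar -tzf lists the contents without extracting and fails on a corrupt archive.
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "corrupt archive, download it again: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Usage, in place of the plain tar commands:
#   extract_checked openai.tar.gz .
#   extract_checked sd_model.tar.gz .
```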
Usage
Training
sh run/train.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE>
- For example, to train on the ScanNet B15N4 benchmark, run:
sh run/train.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml
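Because the experiment directory and the config file are always passed as a pair, it can help to derive one from the other. The following is a minimal sketch that follows the naming pattern of the example above; `exp_dir_for` and `train_cmd` are hypothetical helpers, not scripts shipped with this repository, and the `out/exp_<tag>` layout is only a convention taken from that example.

```shell
# Hypothetical helper (not part of the XMask3D repo): derive an experiment
# directory from a config path, mirroring the example above.
exp_dir_for() {
    # config/scannet/xmask3d_scannet_B15N4.yaml -> b15n4
    tag=$(basename "$1" .yaml | sed 's/.*_//' | tr '[:upper:]' '[:lower:]')
    echo "out/exp_${tag}"
}

# Print the training command for a config without running it.
train_cmd() {
    echo "sh run/train.sh --exp_dir=$(exp_dir_for "$1") --config=$1"
}

train_cmd config/scannet/xmask3d_scannet_B15N4.yaml
```

The printed line can be reviewed and then piped to `sh` to start training.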
Resume
sh run/resume.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE>
- For example, to resume from the last checkpoint on the ScanNet B15N4 benchmark, run:
sh run/resume.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml
Inference
sh run/infer.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE> --ckpt_name=<CKPT_NAME>
- For example, to run inference using the checkpoint b15n4.pth.tar on the ScanNet B15N4 benchmark, execute the following command:
sh run/infer.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml --ckpt_name=b15n4.pth.tar
Checkpoint
| Benchmark | hIoU / mIoU<sub>b</sub> / mIoU<sub>n</sub> | Download Link |
|-----------------|--------------------|---------------------------|
| ScanNet B15N4 | 70.0 / 69.8 / 70.2 | [Tsinghua Cloud] [Google] |
| ScanNet B12N7 | 61.7 / 70.2 / 55.1 | [Tsinghua Cloud] [Google] |
| ScanNet B10N9 | 55.7 / 76.5 / 43.8 | [Tsinghua Cloud] [Google] |
| ScanNet B170N30 | 18.0 / 27.8 / 13.3 | [Tsinghua Cloud] [Google] |
| ScanNet B150N50 | 15.5 / 24.4 / 11.4 | [Tsinghua Cloud] [Google] |
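The table does not define its metrics. Under the usual convention for this kind of benchmark, which is an assumption here, mIoU<sub>b</sub> and mIoU<sub>n</sub> are the mean IoU over base and novel categories and hIoU is their harmonic mean. The reported values are consistent with that reading, as the sketch below shows; `hiou` is a hypothetical helper, not part of this repository.

```shell
# Assumption: hIoU = 2 * mIoU_b * mIoU_n / (mIoU_b + mIoU_n),
# the harmonic mean of the base-class and novel-class mIoU.
hiou() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v b="$1" -v n="$2" \
        'BEGIN { printf "%.1f\n", 2 * b * n / (b + n) }'
}

hiou 69.8 70.2   # ScanNet B15N4   -> 70.0
hiou 70.2 55.1   # ScanNet B12N7   -> 61.7
hiou 76.5 43.8   # ScanNet B10N9   -> 55.7
hiou 27.8 13.3   # ScanNet B170N30 -> 18.0
hiou 24.4 11.4   # ScanNet B150N50 -> 15.5
```

The harmonic mean is pulled toward the lower of its two inputs, so a large gap between base and novel accuracy (as in B10N9) lowers hIoU more than an arithmetic average would.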
Citation
If you find our work useful in your research, please consider citing:
@article{wang2024xmask3d,
title={XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation},
author={Wang, Ziyi and Wang, Yanbo and Yu, Xumin and Zhou, Jie and Lu, Jiwen},
journal={arXiv preprint arXiv:2411.13243},
year={2024}
}
