SegVG
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
<p align="center"> <img src='docs/framework.png' align="center" height="540px"> </p>

Introduction
This repository is the official PyTorch implementation of the ECCV 2024 paper SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding. SegVG converts box-level annotations into segmentation signals, which provide additional pixel-level supervision for visual grounding. In addition, our proposed Triple Alignment module triangularly updates the query, text, and vision tokens to mitigate domain discrepancy. Please cite our paper if you find the paper or codebase helpful.
@article{kang2024segvg,
title={Segvg: Transferring object bounding box to segmentation for visual grounding},
author={Kang, Weitai and Liu, Gaowen and Shah, Mubarak and Yan, Yan},
journal={arXiv preprint arXiv:2407.03200},
year={2024}}
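The box-to-segmentation transfer mentioned in the introduction can be illustrated with a small example. This is a minimal sketch, not the repository's implementation: it assumes boxes are given as (x1, y1, x2, y2) in pixel coordinates, and `box_to_mask` is a hypothetical helper name. It uses plain Python lists to stay self-contained, whereas a training pipeline would more likely build a tensor.

```python
from typing import List, Tuple


def box_to_mask(box: Tuple[float, float, float, float],
                height: int, width: int) -> List[List[int]]:
    """Rasterize an (x1, y1, x2, y2) pixel box into a binary mask.

    Hypothetical helper for illustration only. Pixels whose centers
    fall inside the box are set to 1, all others to 0.
    """
    x1, y1, x2, y2 = box
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Test the pixel center so fractional box edges are handled consistently.
            cx, cy = col + 0.5, row + 0.5
            if x1 <= cx < x2 and y1 <= cy < y2:
                mask[row][col] = 1
    return mask


if __name__ == "__main__":
    # Print a small mask as rows of 0/1 characters.
    for line in box_to_mask((1, 1, 4, 3), height=4, width=6):
        print("".join(str(v) for v in line))
```

A mask of this kind gives every pixel a foreground or background label, which is the sort of pixel-level target that a box annotation alone does not provide.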
Installation
- Clone this repository.
  git clone https://github.com/WeitaiKang/SegVG.git
- Prepare the environment.
  Please refer to ReSC for setting up the environment. We use PyTorch version 1.12.1+cu116.
- Prepare the data.
  Please download the COCO train2014 images.
  Please download the referring expression annotations from the 'annotation' directory of SegVG.
  Please download the ResNet101 ckpts of the vision backbone from TransVG.
  You can place them wherever you want. Just remember to set the paths correctly in your train.sh and test.sh.
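Because the data can be placed anywhere, it may help to check the locations before launching a run. The following is a minimal sketch under stated assumptions: `missing_paths` is a hypothetical helper, and the paths in the example are placeholders, not locations defined by this repository. Replace them with whatever you set in train.sh and test.sh.

```python
import os
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


if __name__ == "__main__":
    # Placeholder locations (assumptions): use the paths from your own train.sh and test.sh.
    expected = {
        "COCO train2014 images": "data/train2014",
        "referring expression annotations": "data/annotation",
        "ResNet101 backbone checkpoint": "checkpoints/resnet101.pth",
    }
    for name in missing_paths(expected):
        print(f"missing: {name}")
```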
Model Zoo
Our model ckpts are available in the 'ckpt' directory of SegVG.
- RefCOCO

| Model | val | testA | testB |
|---------|---------|--------|--------|
| SegVG | 86.84 | 89.46 | 83.07 |

- RefCOCO+

| Model | val | testA | testB |
|---------|---------|--------|--------|
| SegVG | 77.18 | 82.63 | 67.59 |

- RefCOCOg

| Model | val-g | val-u | test-u |
|---------|---------|--------|--------|
| SegVG | 76.01 | 78.35 | 77.42 |

- ReferItGame

| Model | test |
|---------|---------|
| SegVG | 75.59 |
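This README does not state which metric the tables report. Assuming they follow the convention common on these benchmarks, where a prediction counts as correct when its IoU with the ground-truth box exceeds 0.5, the score could be computed roughly as below. The function names and the (x1, y1, x2, y2) box format are assumptions for illustration, not taken from this codebase.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], targets: Sequence[Box],
                    threshold: float = 0.5) -> float:
    """Percentage of predictions whose IoU with the target exceeds the threshold."""
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) > threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)
```

Under this assumption, a table entry such as 86.84 would mean that about 87 of every 100 predicted boxes overlap their ground truth by more than half.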
Training and Evaluation
- Training
  bash train.sh
  Please take a look at train.sh to set the parameters.
- Evaluation
  bash test.sh
  Please take a look at test.sh to set the parameters.
Acknowledgement
This codebase is partially based on TransVG.
