[3DV 2025] VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

Project page | Paper

News

2024/12/20: We are actively working on an improved version, VXP v2. Stay tuned.

Introduction

We propose a novel Voxel-Cross-Pixel (VXP) approach, which establishes voxel and pixel correspondences in a self-supervised manner and brings them into a shared feature space. We achieve state-of-the-art performance in cross-modal retrieval on the Oxford RobotCar and ViViD++ datasets and on the KITTI benchmark, while maintaining high uni-modal global localization accuracy.
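To illustrate what retrieval in a shared feature space means in practice, here is a minimal sketch. It assumes each image and each LiDAR scan has already been encoded into a fixed-length global descriptor. The random arrays, the 256-dimensional descriptor size, and the helper names `l2_normalize`, `retrieve_top_k`, and `recall_at_k` are hypothetical stand-ins for illustration, not part of the VXP code base, and cosine similarity is an assumed choice of metric.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def retrieve_top_k(queries: np.ndarray, database: np.ndarray, k: int = 1) -> np.ndarray:
    """Return the indices of the k most similar database rows for each query row."""
    sims = l2_normalize(queries) @ l2_normalize(database).T  # (num_queries, num_db)
    return np.argsort(-sims, axis=1)[:, :k]


def recall_at_k(top_k: np.ndarray, ground_truth: np.ndarray) -> float:
    """Fraction of queries whose true match appears among the top-k results."""
    hits = (top_k == ground_truth[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data standing in for real descriptors (hypothetical, not VXP outputs).
rng = np.random.default_rng(0)
lidar_db = rng.normal(size=(100, 256))        # "3D" database descriptors
true_idx = rng.integers(0, 100, size=20)      # place shown by each query
image_queries = lidar_db[true_idx] + 0.1 * rng.normal(size=(20, 256))  # "2D" queries

# 2D-to-3D retrieval: image descriptors queried against the LiDAR database.
top1 = retrieve_top_k(image_queries, lidar_db, k=1)
print("recall@1:", recall_at_k(top1, true_idx))
```

Because both modalities live in the same space, the same functions cover the 3D-to-2D, 2D-to-2D, and 3D-to-3D cases by swapping which descriptors act as queries and which as the database.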

[Figures: 2d3d, 3d2d, 2d2d, 3d3d; teaser; pipeline]

Set up the environment

git clone https://github.com/yunjinli/vxp.git
cd vxp
conda create -n VXP python=3.10 -y
conda activate VXP
pip install torch==2.0.1 torchvision==0.15.2 numpy pandas tqdm tensorboard psutil scikit-learn==1.2.2 bitarray pytorch-metric-learning==0.9.94 torchinfo
pip install -U openmim
mim install mmengine==0.7.3 mmcv==2.0.0 mmdet==3.0.0 mmdet3d==1.1.0
pip install 'git+https://github.com/facebookresearch/detectron2.git'

For sparse 3D convolution, we use the spconv library. You can follow the detailed installation guide in its repository, or simply install the prebuilt package that matches your CUDA version (I'm using CUDA 12.0):

pip install spconv-cu120
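After installation, a quick check like the one below can confirm that the packages are visible in the active environment. This is a sketch rather than part of the repository: the list of import names is an assumption derived from the install commands above (for example, `scikit-learn` is imported as `sklearn` and `spconv-cu120` as `spconv`), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "sklearn",
    "pytorch_metric_learning",
    "mmengine",
    "mmcv",
    "mmdet",
    "mmdet3d",
    "detectron2",
    "spconv",
]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it runs quickly but does not verify that the CUDA builds of torch and spconv are compatible with your driver.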

Dataset Format / Creation

Please see here.

Training

Please see here.

Inference

Please see here.

BibTeX

@article{li2024vxp,
    title={VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition},
    author={Li, Yun-Jin and Gladkova, Mariia and Xia, Yan and Wang, Rui and Cremers, Daniel},
    journal={arXiv preprint arXiv:2403.14594},
    year={2024}
}
