ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
📦 Benchmark and Model
Benchmark Overview
<p align="center"> <img src="assets/scanreason_benchmark_v2.png" align="center" width="100%"> </p>

ScanReason is the first comprehensive and hierarchical 3D reasoning grounding benchmark. We define 5 types of questions depending on which type of reasoning is required: spatial reasoning and function reasoning require a fundamental understanding of the 3D physical world, focusing on inter-object spatial relationships in a 3D scene and on the objects themselves, respectively, while logistic reasoning, emotional reasoning, and safety reasoning are high-level reasoning skills built upon those two fundamental abilities to address user-centric real-world applications.

Model Overview
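To make the taxonomy concrete, here is a small sketch of how the five reasoning types might look as question/answer pairs. The field names, example questions, and box format below are purely illustrative assumptions, not the released annotation schema:

```python
# Hypothetical illustration of the five reasoning types; field names and
# questions are made up for this sketch, not the released annotation format.
EXAMPLE_QUESTIONS = {
    "spatial":   "Which chair is closest to the window?",
    "function":  "Where can I wash my hands in this room?",
    "logistic":  "I want to read at night; which lamp should I turn on?",
    "emotional": "Where is the most relaxing spot to sit in this scene?",
    "safety":    "Which object could a toddler trip over near the doorway?",
}

# Each answer grounds one or more 3D bounding boxes,
# here written as (center x, y, z, size dx, dy, dz).
example_entry = {
    "question": EXAMPLE_QUESTIONS["spatial"],
    "reasoning_type": "spatial",
    "answer_boxes": [[1.2, 0.5, 0.4, 0.6, 0.6, 0.9]],
}

print(sorted(EXAMPLE_QUESTIONS))  # the five reasoning types
```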
<p align="center"> <img src="assets/Fig_Method.png" align="center" width="100%"> </p>

🔥 News
- [2023-10-10] We release the pre-version of the ScanReason validation benchmark. Download here. The corresponding 3D bounding box annotations can be obtained through the object ids from EmbodiedScan.
- [2023-10-01] We release the training and inference codes of ReGround3D.
- [2023-07-02] We release the ScanReason paper.
Getting Started
1. Installation
- We use at least 4 A100 GPUs for training and inference.

- We test the code under the following environment:

  - CUDA 11.8
  - Python 3.9
  - PyTorch 2.1.0

- Clone our repository and create the conda environment:

  git clone https://github.com/ZCMax/ScanReason.git
  conda create -n scanreason python=3.9
  conda activate scanreason
  pip install -r requirements.txt

- Follow the EmbodiedScan Installation Doc to install the embodiedscan series.

- Compile Pointnet2:

  cd pointnet2
  python setup.py install --user
2. Data Preparation
- Follow the EmbodiedScan Data Preparation Doc to download the raw scan (RGB-D) datasets and modify VIDEO_FOLDER in train_ds.sh to the raw data path.

- Download the text annotations from Google Drive, modify JSON_FOLDER in train_ds.sh to the annotations path, and modify the INFO_FILE data path, which is included in the annotations.
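After these edits, the relevant variables in train_ds.sh might look like the following. The paths below are placeholders for your local setup, not paths shipped with the repo:

```shell
# Placeholder paths -- substitute your own local locations.
VIDEO_FOLDER=/path/to/embodiedscan_raw_data   # raw RGB-D scans
JSON_FOLDER=/path/to/scanreason_annotations   # downloaded text annotations
INFO_FILE=/path/to/info_file                  # data info file included in the annotations
```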
3. Training ReGround3D
We provide a slurm training script for 4 A100 GPUs:
./scripts/train_ds.sh
4. Evaluating ReGround3D

After training, run

./scripts/convert_zero_to_fp32.sh

to convert the weights to a pytorch_model.bin file, then run

./scripts/merge_lora_weights.sh

to merge the LoRA weights and obtain the final checkpoint under ReGround3D-7B.

Finally, run

./scripts/eval_ds.sh

to obtain the grounding results.
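3D grounding results on benchmarks like this are typically scored by the IoU between predicted and ground-truth 3D boxes (e.g. accuracy at an IoU threshold of 0.25). A minimal sketch for axis-aligned boxes, assuming a (cx, cy, cz, dx, dy, dz) center/size format (an assumption, not necessarily this repo's exact metric code):

```python
def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (cx, cy, cz, dx, dy, dz)."""
    inter = 1.0
    for i in range(3):
        a_min, a_max = box_a[i] - box_a[i + 3] / 2, box_a[i] + box_a[i + 3] / 2
        b_min, b_max = box_b[i] - box_b[i + 3] / 2, box_b[i] + box_b[i + 3] / 2
        overlap = min(a_max, b_max) - max(a_min, b_min)
        if overlap <= 0:          # boxes disjoint along this axis
            return 0.0
        inter *= overlap          # accumulate intersection volume
    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    return inter / (vol_a + vol_b - inter)

# A prediction counts as correct at threshold t if iou_3d(pred, gt) >= t.
print(iou_3d((0, 0, 0, 2, 2, 2), (0, 0, 0, 2, 2, 2)))  # identical boxes -> 1.0
```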
📝 TODO List
- [x] First Release.
- [x] Release ReGround3D code.
- [ ] Release ScanReason datasets and benchmark.
📄 License
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a> <br /> This work is under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
👏 Acknowledgements
This repo benefits from LISA, EmbodiedScan, 3D-LLM, and LLaVA.
