
<div align="center"> <h1>Monocular Occupancy Prediction for Scalable Indoor Scenes</h1>

Hongxiao Yu<sup>1,2</sup> · Yuqi Wang<sup>1,2</sup> · Yuntao Chen<sup>3</sup> · Zhaoxiang Zhang<sup>1,2,3</sup>

<sup>1</sup>School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)

<sup>2</sup>NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA)

<sup>3</sup>Centre for Artificial Intelligence and Robotics (HKISI_CAS)

ECCV 2024

<img src="NYUv2.gif" width = "800" height = "200" /> </div>

Performance

Here we compare our ISO with the previous best models, NDC-Scene and MonoScene.

| Method | IoU | ceiling | floor | wall | window | chair | bed | sofa | table | tvs | furniture | object | mIoU |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| MonoScene | 42.51 | 8.89 | 93.50 | 12.06 | 12.57 | 13.72 | 48.19 | 36.11 | 15.13 | 15.22 | 27.96 | 12.94 | 26.94 |
| NDC-Scene | 44.17 | 12.02 | **93.51** | 13.11 | 13.77 | 15.83 | 49.57 | 39.87 | 17.17 | 24.57 | 31.00 | 14.96 | 29.03 |
| Ours | **47.11** | **14.21** | 93.47 | **15.89** | **15.14** | **18.35** | **50.01** | **40.82** | **18.25** | **25.90** | **34.08** | **17.67** | **31.25** |

We highlight the best results in bold.
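The mIoU column is the arithmetic mean of the 11 per-class IoUs (the scene-level IoU is reported separately); a quick Python check against the "Ours" row:

```python
# Per-class IoUs from the "Ours" row above (ceiling ... object).
class_ious = [14.21, 93.47, 15.89, 15.14, 18.35, 50.01,
              40.82, 18.25, 25.90, 34.08, 17.67]
miou = sum(class_ious) / len(class_ious)
print(round(miou, 2))  # -> 31.25, matching the table
```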

Pretrained models on NYUv2 can be downloaded here.

Preparing ISO

Installation

  1. Create conda environment:
$ conda create -n iso python=3.9 -y
$ conda activate iso
  2. This code was implemented with python 3.9. Install PyTorch (the command below uses pytorch 2.2.0 with CUDA 11.8):
$ conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
  3. Install the additional dependencies:
$ git clone --recursive https://github.com/hongxiaoy/ISO.git
$ cd ISO/
$ pip install -r requirements.txt

:bulb:Note

Change L140 in depth_anything/metric_depth/zoedepth/models/base_models/dpt_dinov2/dpt.py to

self.pretrained = torch.hub.load('facebookresearch/dinov2', 'dinov2_{:}14'.format(encoder), pretrained=False)

Then, download the Depth-Anything pre-trained model and metric depth model checkpoint files to checkpoints/.

  4. Install tbb:
$ conda install -c bioconda tbb=2020.2
  5. Finally, install ISO:
$ pip install -e ./

:bulb:Note

If you move the ISO dir to another place, you should run

pip cache purge

then run pip install -e ./ again.

Datasets

NYUv2

  1. Download the NYUv2 dataset.

  2. Create a folder to store NYUv2 preprocess data at /path/to/NYU/preprocess/folder.

  3. Store paths in environment variables for faster access:

    $ export NYU_PREPROCESS=/path/to/NYU/preprocess/folder
    $ export NYU_ROOT=/path/to/NYU/depthbin 
    

    :bulb:Note

    We recommend appending the export to your shell profile so it persists across sessions:

    echo "export NYU_PREPROCESS=/path/to/NYU/preprocess/folder" >> ~/.bashrc

  4. Preprocess the data to generate labels at a lower scale, which are used to compute the ground truth relation matrices:

    $ cd ISO/
    $ python iso/data/NYU/preprocess.py NYU_root=$NYU_ROOT NYU_preprocess_root=$NYU_PREPROCESS
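
Before launching the preprocessing step, it can help to confirm that both environment variables point to existing directories; a minimal sketch (the `check_dataset_paths` helper is our illustration, not part of the repo):

```python
import os
from pathlib import Path

def check_dataset_paths(env_vars=("NYU_ROOT", "NYU_PREPROCESS")):
    """Return names of env vars that are unset or point to missing dirs.

    Hypothetical helper for a quick sanity check, not part of ISO.
    """
    missing = []
    for name in env_vars:
        value = os.environ.get(name)
        if value is None or not Path(value).is_dir():
            missing.append(name)
    return missing

# Any names returned here need to be exported (or the dirs created)
# before running preprocess.py.
print(check_dataset_paths())
```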
    

Occ-ScanNet

  1. Download the Occ-ScanNet dataset, which includes:

    • posed_images
    • gathered_data
    • train_subscenes.txt
    • val_subscenes.txt
  2. Create a root folder to store the Occ-ScanNet dataset at /path/to/Occ/ScanNet/folder, and move all dataset files into this folder, extracting any zip files.

  3. Store paths in environment variables for faster access:

    $ export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder
    

    :bulb:Note

    We recommend appending the export to your shell profile so it persists across sessions:

    echo "export OCC_SCANNET_ROOT=/path/to/Occ/ScanNet/folder" >> ~/.bashrc

Pretrained Models

Download the ISO models pretrained on NYUv2, then put them in the folder /path/to/ISO/trained_models.

huggingface-cli download --repo-type model hongxiaoy/ISO

If you haven't installed huggingface-cli before, please follow the official instructions.

Running ISO

Training

NYUv2

  1. Create folders to store training logs at /path/to/NYU/logdir.

  2. Store in an environment variable:

$ export NYU_LOG=/path/to/NYU/logdir
  3. Train ISO using 2 GPUs with a batch size of 4 (2 items per GPU) on NYUv2:
$ cd ISO/
$ python iso/scripts/train_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    logdir=$NYU_LOG \
    n_gpus=2 batch_size=4

Occ-ScanNet

  1. Create folders to store training logs at /path/to/OccScanNet/logdir.

  2. Store in an environment variable:

$ export OCC_SCANNET_LOG=/path/to/OccScanNet/logdir
  3. Train ISO using 2 GPUs with a batch size of 4 (2 items per GPU) on Occ-ScanNet (the dataset name should match the config file name in train_iso.py):
$ cd ISO/
$ python iso/scripts/train_iso.py \
    dataset=OccScanNet \
    OccScanNet_root=$OCC_SCANNET_ROOT \
    logdir=$OCC_SCANNET_LOG \
    n_gpus=2 batch_size=4

Evaluating

NYUv2

To evaluate ISO on NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/eval_iso.py \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1
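
For reference, the reported numbers follow the standard semantic-IoU definition: per class c, IoU = TP / (TP + FP + FN), with mIoU the mean over classes. A minimal sketch of that definition (not the repo's implementation):

```python
def per_class_iou(pred, gt, num_classes):
    """Standard per-class IoU over flattened voxel label lists."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

# Tiny worked example with 3 classes on 6 voxels.
pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 1, 1, 1, 2, 0]
print([round(i, 2) for i in per_class_iou(pred, gt, 3)])  # -> [0.33, 0.67, 0.5]
```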

Inference

Please create a folder /path/to/iso/output to store the ISO outputs, and store its path in an environment variable:

export ISO_OUTPUT=/path/to/iso/output

NYUv2

To generate the predictions on the NYUv2 test set, type:

$ cd ISO/
$ python iso/scripts/generate_output.py \
    +output_path=$ISO_OUTPUT \
    dataset=NYU \
    NYU_root=$NYU_ROOT \
    NYU_preprocess_root=$NYU_PREPROCESS \
    n_gpus=1 batch_size=1

Visualization

You need to create a new Anaconda environment for visualization.

conda create -n mayavi_vis python=3.7 -y
conda activate mayavi_vis
pip install omegaconf hydra-core PyQt5 mayavi

If you run into problems installing mayavi, please refer to the official mayavi installation instructions.

NYUv2

$ cd ISO/
$ python iso/scripts/visualization/NYU_vis_pred.py +file=/path/to/output/file.pkl
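
Before visualizing, you can inspect a prediction file directly. The exact contents of the .pkl depend on generate_output.py; purely as an illustration, assuming it holds a dense voxel grid of integer class labels, a summary could look like this (using a dummy grid in place of a real prediction):

```python
import os
import pickle
import tempfile
from collections import Counter

# Fake 2x2x3 grid of class labels standing in for a real prediction.
dummy_grid = [[[0, 0, 5], [2, 2, 0]], [[0, 7, 7], [5, 0, 0]]]
path = os.path.join(tempfile.gettempdir(), "pred.pkl")
with open(path, "wb") as f:
    pickle.dump(dummy_grid, f)

with open(path, "rb") as f:
    grid = pickle.load(f)

# Histogram of class labels; in NYUv2-style grids, 0 is typically "empty".
counts = Counter(v for plane in grid for row in plane for v in row)
print(counts.most_common())
```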

Acknowledgement

This project is built on MonoScene. Please refer to https://github.com/astra-vision/MonoScene for more documentation and details.

We would like to thank the creators, maintainers, and contributors of MonoScene, NDC-Scene, ZoeDepth, and Depth Anything for their invaluable work. Their dedication and open-source spirit have been instrumental in our development.

Citation

@article{yu2024monocular,
  title={Monocular Occupancy Prediction for Scalable Indoor Scenes},
  author={Yu, Hongxiao and Wang, Yuqi and Chen, Yuntao and Zhang, Zhaoxiang},
  journal={arXiv preprint arXiv:2407.11730},
  year={2024}
}