# OpenESS

[CVPR 2024 Highlight] OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
OpenESS is an open-vocabulary event-based semantic segmentation (ESS) framework that synergizes information from image, text, and event-data domains to enable scalable ESS in an open-world, annotation-efficient manner.
| <img width="173" src="docs/figs/teaser_1.png"> | <img width="173" src="docs/figs/teaser_2.png"> | <img width="173" src="docs/figs/teaser_3.png"> | <img width="173" src="docs/figs/teaser_4.png"> |
| :-: | :-: | :-: | :-: |
| Input Event Stream | “Driveable” | “Car” | “Manmade” |
| <img width="173" src="docs/figs/teaser_5.png"> | <img width="173" src="docs/figs/teaser_6.png"> | <img width="173" src="docs/figs/teaser_7.png"> | <img width="173" src="docs/figs/teaser_8.png"> |
| Zero-Shot ESS | “Walkable” | “Barrier” | “Flat” |
## Table of Contents
- :gear: Installation
- :hotsprings: Data Preparation
- :rocket: Getting Started
- Benchmark
- Citation
- License
- Acknowledgements
## :gear: Installation
Kindly refer to INSTALL.md for the installation details.
## :hotsprings: Data Preparation

### DSEC-Semantic
- **Step 1:** Download the DSEC dataset from the official dataset page. Below, we summarize the links used for downloading each of the resources:
  | Training Data | Link | Size | Description |
  |:-:|:-:|:-:|:-|
  | Events | download | 125 GB | The raw event data in `.h5` format |
  | Frames | download | 216 GB | The RGB frames in `.png` format |
  | Disparities | download | 12 GB | The disparities between the left and right sensors |
  | Semantic Masks | download | 88.6 MB | The ground-truth semantic segmentation labels |
  | Test Data | Link | Size | Description |
  |:-:|:-:|:-:|:-|
  | Events | download | 27 GB | The raw event data in `.h5` format |
  | Frames | download | 43 GB | The RGB frames in `.png` format |
  | Semantic Masks | download | 28.9 MB | The ground-truth semantic segmentation labels |
- **Step 2:** Link the dataset to the path `./data`. Your dataset folder should end up aligning with the following structure:

  ```
  ./data/DSEC
  ├── test
  │   ├── zurich_city_13_a
  │   │   ├── events
  │   │   │   └── left
  │   │   │       └── events.h5
  │   │   ├── images
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   ├── images_aligned
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   ├── reconstructions
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   └── semantic
  │   │       └── left
  │   │           ├── 000000.png
  │   │           ├── ...
  │   │           └── 000378.png
  │   ├── zurich_city_14_c
  │   └── zurich_city_15_a
  └── train
      ├── zurich_city_00_a
      ├── zurich_city_01_a
      ├── zurich_city_02_a
      ├── zurich_city_04_a
      ├── zurich_city_05_a
      ├── zurich_city_06_a
      ├── zurich_city_07_a
      └── zurich_city_08_a
  ```
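Before moving on, it can help to check that every sequence folder matches the layout above. The helper below is not part of the OpenESS codebase; it is a minimal, standard-library-only sketch that reports which expected sub-directories are missing from a sequence:

```python
# Hypothetical helper (not from the OpenESS codebase): verify that a
# DSEC-Semantic sequence folder contains the expected sub-directories.
from pathlib import Path

EXPECTED = [
    "events/left",
    "images/left",
    "images_aligned/left",
    "reconstructions/left",
    "semantic/left",
]

def missing_subdirs(sequence_root):
    """Return the expected sub-directories that are absent under sequence_root."""
    root = Path(sequence_root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]

# Usage: missing_subdirs("data/DSEC/train/zurich_city_00_a") -> [] when complete
```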
- **Step 3:** Prepare frame data that aligns with the events. Please follow the same procedure from Sun et al. (ESS: Learning Event-based Semantic Segmentation from Still Images) and place the processed frame data into the folder named `images_aligned`. Additionally, we provide our processed DSEC-Semantic frame data at this Google Drive link (~4.95 GB).
- **Step 4:** Prepare the zero-shot semantic labels for T2E: Text-to-Event Consistency Regularization. For more details, kindly refer to FC-CLIP.md. Additionally, we provide our generated DSEC-Semantic T2E labels at this Google Drive link (~47.5 MB).
- **Step 5:** Prepare the event reconstruction data. Please follow the same procedure from Sun et al. (ESS: Learning Event-based Semantic Segmentation from Still Images) and place the reconstructed frames into the folder named `reconstructions`. The pretrained E2VID model can be downloaded from this link and should be placed under the folder `/e2vid/pretrained/`. Additionally, we provide our processed DSEC-Semantic event reconstruction data at this Google Drive link (~2.41 GB).
- **Step 6:** Generate the semantic superpixels of SAM for DSEC-Semantic. You should first download the pretrained SAM model from this link. Next, run the following scripts to generate the superpixels:

  ```shell
  # for the training set
  python data_preparation/superpixel_generation_dsec_sam.py -r data/DSEC/train

  # for the test set
  python data_preparation/superpixel_generation_dsec_sam.py -r data/DSEC/test
  ```

  The generated superpixels should be placed in the folder named `sp_sam_rgb`.
- **Step 7:** Generate the semantic superpixels of SLIC for DSEC-Semantic. You can directly run the following script to generate the superpixels:

  ```shell
  python data_preparation/superpixel_segmenter_dsec_slic.py --worker $WORKER_NUM --num_segments $SEGMENTS_NUM
  ```

  The generated superpixels should be placed in the folder named `sp_slic_rgb`.
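As background on how such superpixels are typically consumed downstream: frameworks in this space often average dense per-pixel features inside each superpixel before comparing them across modalities. The snippet below is an illustrative NumPy sketch of that pooling step under this assumption, not code from this repository:

```python
import numpy as np

def pool_by_superpixel(features, labels):
    """Average per-pixel features within each superpixel.

    features: (H, W, C) float array of dense per-pixel features.
    labels:   (H, W) int array of superpixel ids in [0, K).
    Returns a (K, C) array holding the mean feature of each superpixel.
    """
    h, w, c = features.shape
    flat_lab = labels.reshape(-1)
    flat_feat = features.reshape(-1, c)
    k = int(flat_lab.max()) + 1
    sums = np.zeros((k, c), dtype=np.float64)
    np.add.at(sums, flat_lab, flat_feat)          # scatter-add features per superpixel id
    counts = np.bincount(flat_lab, minlength=k)   # number of pixels per superpixel
    return sums / np.maximum(counts, 1)[:, None]
```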
To summarize, for each of the sequences in DSEC-Semantic, we expect the following data to be prepared before running the experiments:

```
./sequence_name
├── events
├── images
├── images_aligned
├── pl_fcclip_rgb
├── reconstructions
├── semantic
├── sp_sam_rgb
└── sp_slic_rgb
```
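As a side note on how the `events.h5` streams above end up as network input: event-based pipelines commonly rasterize a slice of raw events (x, y, polarity) into a frame-like tensor. Below is a minimal, generic sketch of such a 2-channel event histogram; it is illustrative only and not the exact representation used by OpenESS:

```python
import numpy as np

def event_histogram(x, y, p, height, width):
    """Accumulate a slice of events into a (2, H, W) count image:
    channel 0 counts positive-polarity events, channel 1 negative ones."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    pos = p > 0
    np.add.at(hist[0], (y[pos], x[pos]), 1.0)    # positive events
    np.add.at(hist[1], (y[~pos], x[~pos]), 1.0)  # negative events
    return hist
```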
### DDD17-Seg
- **Step 1:** Download the DDD17 dataset from the official dataset page and/or from the Ev-SegNet paper.
- **Step 2:** Link the dataset to the path `./data`. Your dataset folder should end up aligning with the following structure:

  ```
  ./data/DDD17
  ├── dir0
  │   ├── events.dat.t
  │   ├── events.dat.xyp
  │   ├── index
  │   │   ├── index_10ms.npy
  │   │   ├── index_50ms.npy
  │   │   └── index_250ms.npy
  │   ├── images
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   ├── images_aligned
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   ├── reconstructions
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   └── segmentation_masks
  │       ├── img_00000002.png
  │       ├── ...
  │       └── img_00011178.png
  ├── dir1
  ├── dir3
  ├── dir4
  ├── dir6
  └── dir7
  ```
- **Step 3:** Prepare frame data that aligns with the events.
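The per-frame index files listed above (e.g. `index_50ms.npy`) suggest a precomputed mapping from frames to ranges in the event stream. Assuming a cumulative-offset format, which is an assumption to verify against the actual DDD17 loader, slicing the events for a given frame could look like:

```python
import numpy as np

def events_for_frame(timestamps, index, frame_id):
    """Return the event timestamps falling in the window of frame_id,
    assuming index[i] holds the offset of the first event of frame i
    (hypothetical format -- check the dataset's own loader)."""
    start, end = int(index[frame_id]), int(index[frame_id + 1])
    return timestamps[start:end]
```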
