# OpenESS

[CVPR 2024 Highlight] OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
OpenESS is an open-vocabulary event-based semantic segmentation (ESS) framework that synergizes information from image, text, and event-data domains to enable scalable ESS in an open-world, annotation-efficient manner.
| <img width="173" src="docs/figs/teaser_1.png"> | <img width="173" src="docs/figs/teaser_2.png"> | <img width="173" src="docs/figs/teaser_3.png"> | <img width="173" src="docs/figs/teaser_4.png"> |
| :-: | :-: | :-: | :-: |
| Input Event Stream | “Driveable” | “Car” | “Manmade” |
| <img width="173" src="docs/figs/teaser_5.png"> | <img width="173" src="docs/figs/teaser_6.png"> | <img width="173" src="docs/figs/teaser_7.png"> | <img width="173" src="docs/figs/teaser_8.png"> |
| Zero-Shot ESS | “Walkable” | “Barrier” | “Flat” |
## Table of Contents
- :gear: Installation
- :hotsprings: Data Preparation
- :rocket: Getting Started
- Benchmark
- Citation
- License
- Acknowledgements
## :gear: Installation
Kindly refer to INSTALL.md for the installation details.
## :hotsprings: Data Preparation

### DSEC-Semantic
- **Step 1:** Download the DSEC dataset from the official dataset page. Below, we summarize the links used for downloading each of the resources:
  | Training Data | Link | Size | Description |
  |:-:|:-:|:-:|:-|
  | Events | download | 125 GB | The raw event data in `.h5` format |
  | Frames | download | 216 GB | The RGB frames in `.png` format |
  | Disparities | download | 12 GB | The disparities between the left and right sensors |
  | Semantic Masks | download | 88.6 MB | The ground-truth semantic segmentation labels |
  | Test Data | Link | Size | Description |
  |:-:|:-:|:-:|:-|
  | Events | download | 27 GB | The raw event data in `.h5` format |
  | Frames | download | 43 GB | The RGB frames in `.png` format |
  | Semantic Masks | download | 28.9 MB | The ground-truth semantic segmentation labels |
- **Step 2:** Link the dataset to the path `./data`. Your dataset folder should end up aligning with the following structure:

  ```
  ./data/DSEC
  ├── test
  │   ├── zurich_city_13_a
  │   │   ├── events
  │   │   │   └── left
  │   │   │       └── events.h5
  │   │   ├── images
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   ├── images_aligned
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   ├── reconstructions
  │   │   │   └── left
  │   │   │       ├── 000000.png
  │   │   │       ├── ...
  │   │   │       └── 000378.png
  │   │   └── semantic
  │   │       └── left
  │   │           ├── 000000.png
  │   │           ├── ...
  │   │           └── 000378.png
  │   ├── zurich_city_14_c
  │   └── zurich_city_15_a
  └── train
      ├── zurich_city_00_a
      ├── zurich_city_01_a
      ├── zurich_city_02_a
      ├── zurich_city_04_a
      ├── zurich_city_05_a
      ├── zurich_city_06_a
      ├── zurich_city_07_a
      └── zurich_city_08_a
  ```
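Before moving on, it can help to check that every sequence folder matches the layout above. The helper below is not part of the OpenESS codebase; it is a minimal, standard-library-only sketch that reports which expected sub-directories are missing from a sequence:

```python
# Hypothetical helper (not from the OpenESS codebase): verify that a
# DSEC-Semantic sequence folder contains the expected sub-directories.
from pathlib import Path

EXPECTED = [
    "events/left",
    "images/left",
    "images_aligned/left",
    "reconstructions/left",
    "semantic/left",
]

def missing_subdirs(sequence_root):
    """Return the expected sub-directories that are absent under sequence_root."""
    root = Path(sequence_root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]

# Usage: missing_subdirs("data/DSEC/train/zurich_city_00_a") -> [] when complete
```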
- **Step 3:** Prepare frame data that aligns with the events. Please follow the same procedure from Sun et al. (ESS: Learning Event-based Semantic Segmentation from Still Images) and place the processed frame data into the folder named `images_aligned`. Additionally, we provide our processed DSEC-Semantic frame data at this Google Drive link (~4.95 GB).
- **Step 4:** Prepare the zero-shot semantic labels for T2E: Text-to-Event Consistency Regularization. For more details, kindly refer to FC-CLIP.md. Additionally, we provide our generated DSEC-Semantic T2E labels at this Google Drive link (~47.5 MB).
- **Step 5:** Prepare the event reconstruction data. Please follow the same procedure from Sun et al. (ESS: Learning Event-based Semantic Segmentation from Still Images) and place the reconstructed frames into the folder named `reconstructions`. The pretrained E2VID model can be downloaded from this link and should be placed under the folder `/e2vid/pretrained/`. Additionally, we provide our processed DSEC-Semantic event reconstruction data at this Google Drive link (~2.41 GB).
- **Step 6:** Generate the semantic superpixels of SAM for DSEC-Semantic. You should first download the pretrained SAM model from this link. Next, run the following scripts to generate the superpixels:

  ```shell
  # for the training set
  python data_preparation/superpixel_generation_dsec_sam.py -r data/DSEC/train

  # for the test set
  python data_preparation/superpixel_generation_dsec_sam.py -r data/DSEC/test
  ```

  The generated superpixels should be placed in the folder named `sp_sam_rgb`.
- **Step 7:** Generate the semantic superpixels of SLIC for DSEC-Semantic. You can directly run the following script to generate the superpixels:

  ```shell
  python data_preparation/superpixel_segmenter_dsec_slic.py --worker $WORKER_NUM --num_segments $SEGMENTS_NUM
  ```

  The generated superpixels should be placed in the folder named `sp_slic_rgb`.
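As background on how such superpixels are typically consumed downstream: frameworks in this space often average dense per-pixel features inside each superpixel before comparing them across modalities. The snippet below is an illustrative NumPy sketch of that pooling step under this assumption, not code from this repository:

```python
import numpy as np

def pool_by_superpixel(features, labels):
    """Average per-pixel features within each superpixel.

    features: (H, W, C) float array of dense per-pixel features.
    labels:   (H, W) int array of superpixel ids in [0, K).
    Returns a (K, C) array holding the mean feature of each superpixel.
    """
    h, w, c = features.shape
    flat_lab = labels.reshape(-1)
    flat_feat = features.reshape(-1, c)
    k = int(flat_lab.max()) + 1
    sums = np.zeros((k, c), dtype=np.float64)
    np.add.at(sums, flat_lab, flat_feat)          # scatter-add features per superpixel id
    counts = np.bincount(flat_lab, minlength=k)   # number of pixels per superpixel
    return sums / np.maximum(counts, 1)[:, None]
```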
To summarize, for each of the sequences in DSEC-Semantic, we expect the following data to be prepared before running the experiments:

```
./sequence_name
├── events
├── images
├── images_aligned
├── pl_fcclip_rgb
├── reconstructions
├── semantic
├── sp_sam_rgb
└── sp_slic_rgb
```
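As a side note on how the `events.h5` streams above end up as network input: event-based pipelines commonly rasterize a slice of raw events (x, y, polarity) into a frame-like tensor. Below is a minimal, generic sketch of such a 2-channel event histogram; it is illustrative only and not the exact representation used by OpenESS:

```python
import numpy as np

def event_histogram(x, y, p, height, width):
    """Accumulate a slice of events into a (2, H, W) count image:
    channel 0 counts positive-polarity events, channel 1 negative ones."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    pos = p > 0
    np.add.at(hist[0], (y[pos], x[pos]), 1.0)    # positive events
    np.add.at(hist[1], (y[~pos], x[~pos]), 1.0)  # negative events
    return hist
```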
### DDD17-Seg
- **Step 1:** Download the DDD17 dataset from the official dataset page and/or from the Ev-SegNet paper.
- **Step 2:** Link the dataset to the path `./data`. Your dataset folder should end up aligning with the following structure:

  ```
  ./data/DDD17
  ├── dir0
  │   ├── events.dat.t
  │   ├── events.dat.xyp
  │   ├── index
  │   │   ├── index_10ms.npy
  │   │   ├── index_50ms.npy
  │   │   └── index_250ms.npy
  │   ├── images
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   ├── images_aligned
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   ├── reconstructions
  │   │   ├── img_00000002.png
  │   │   ├── ...
  │   │   └── img_00011178.png
  │   └── segmentation_masks
  │       ├── img_00000002.png
  │       ├── ...
  │       └── img_00011178.png
  ├── dir1
  ├── dir3
  ├── dir4
  ├── dir6
  └── dir7
  ```
- **Step 3:** Prepare frame data that aligns with the events.
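The per-frame index files listed above (e.g. `index_50ms.npy`) suggest a precomputed mapping from frames to ranges in the event stream. Assuming a cumulative-offset format, which is an assumption to verify against the actual DDD17 loader, slicing the events for a given frame could look like:

```python
import numpy as np

def events_for_frame(timestamps, index, frame_id):
    """Return the event timestamps falling in the window of frame_id,
    assuming index[i] holds the offset of the first event of frame i
    (hypothetical format -- check the dataset's own loader)."""
    start, end = int(index[frame_id]), int(index[frame_id + 1])
    return timestamps[start:end]
```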
