ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
This is a PyTorch implementation of the NeurIPS-22 paper: https://arxiv.org/abs/2206.12403
Arjun Majumdar*, Gunjan Aggarwal*, Bhavika Devnani, Judy Hoffman and Dhruv Batra
Georgia Institute of Technology, Meta AI
Details
We present a scalable approach for learning open-world object-goal navigation (ObjectNav) – the task of asking a virtual robot (agent) to find any instance of an object in an unexplored environment (e.g., “find a sink”). Our approach is entirely zero-shot – i.e., it does not require ObjectNav rewards or demonstrations of any kind.
<p align="center"> <img src="imgs/zson_approach.png" height="400"> <p align="center"><i>Model Architecture for ZSON. </i></p> </p>

Installation
All the required data can be downloaded from here.
- Create a conda environment:

    conda create -n zson python=3.7 cmake=3.14.0
    conda activate zson

- Install pytorch version 1.10.2:

    conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=11.3 -c pytorch -c conda-forge

- Install habitat-sim:

    conda install habitat-sim-challenge-2022 headless -c conda-forge -c aihabitat

- Install habitat-lab:

    git clone --branch challenge-2022 https://github.com/facebookresearch/habitat-lab.git habitat-lab-challenge-2022
    cd habitat-lab-challenge-2022
    pip install -r requirements.txt
    python setup.py develop --all # install habitat and habitat_baselines
    cd ..
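After the environment steps above, it can help to confirm that the pinned versions are the ones actually active. The following is a minimal sketch, not part of the ZSON repository: the function name `check_environment` and its arguments are hypothetical, and it only compares the Python, `torch`, and `torchvision` versions named in the steps above.

```python
import importlib
import sys

# Versions pinned in the installation steps above.
EXPECTED_PYTHON = (3, 7)
EXPECTED_PACKAGES = {"torch": "1.10.2", "torchvision": "0.11.3"}


def check_environment(packages=EXPECTED_PACKAGES, python=EXPECTED_PYTHON):
    """Return a list of mismatches; an empty list means every pin matches."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(python):
        problems.append(
            f"python is {found[0]}.{found[1]}, expected {python[0]}.{python[1]}"
        )
    for name, wanted in packages.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        # Builds often append a local tag such as "+cu113"; compare the base version only.
        installed = str(getattr(module, "__version__", "unknown")).split("+")[0]
        if installed != wanted:
            problems.append(f"{name} is {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```

Run it inside the activated `zson` environment. Any line it prints names a package whose version differs from the pin.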
Download and Install zson:

- Setup steps

    git clone git@github.com:gunagg/zson.git
    cd zson
    pip install -r requirements.txt
    python setup.py develop

- Follow the instructions here to set up the data/scene_datasets/ directory. Gibson scenes can be found here.

- Download the HM3D ImageNav training dataset:

    wget https://huggingface.co/gunjan050/ZSON/resolve/main/imagenav_hm3d.zip
    unzip imagenav_hm3d.zip
    rm imagenav_hm3d.zip # clean-up

- Download the MP3D objectnav dataset.

    wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/m3d/v1/objectnav_mp3d_v1.zip
    mkdir -p data/datasets/objectnav/mp3d/v1
    unzip objectnav_mp3d_v1.zip -d data/datasets/objectnav/mp3d/v1
    rm objectnav_mp3d_v1.zip # clean-up

- Download the HM3D objectnav dataset.

    wget https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip
    unzip objectnav_hm3d_v1.zip -d data/datasets/objectnav/
    rm objectnav_hm3d_v1.zip # clean-up

- Download the trained checkpoints zson_conf_A.pth and zson_conf_B.pth, and move them to data/checkpoints.

- To train policies using the OVRL pretrained RGB encoder, download the model weights from here and move them to data/models/. More details on the encoder can be found here.

- Set up data/goal_datasets using the script tools/extract-goal-features.py. This caches CLIP goal embeddings for faster training.

Your directory structure should now look like this:
    .
    +-- habitat-lab-v0.2.1/
    |   ...
    +-- zson/
    |   +-- data/
    |   |   +-- datasets/
    |   |   |   +-- objectnav/
    |   |   |   +-- imagenav/
    |   |   +-- scene_datasets/
    |   |   |   +-- hm3d/
    |   |   |   +-- mp3d/
    |   |   +-- goal_datasets/
    |   |   |   +-- imagenav/
    |   |   |   |   +-- hm3d/
    |   |   +-- models/
    |   |   +-- checkpoints/
    |   +-- zson/
    |   ...
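The layout above can be checked before launching a job. This is a minimal sketch and not a script shipped with the repository: the helper name `missing_dirs` is hypothetical, and the list of directories is copied from the tree above, relative to `zson/data/`.

```python
from pathlib import Path

# Sub-directories of data/ listed in the directory tree above.
EXPECTED_DIRS = [
    "datasets/objectnav",
    "datasets/imagenav",
    "scene_datasets/hm3d",
    "scene_datasets/mp3d",
    "goal_datasets/imagenav/hm3d",
    "models",
    "checkpoints",
]


def missing_dirs(data_root="data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the top-level zson/ directory so that "data" resolves correctly.
    missing = missing_dirs()
    if missing:
        print("Missing directories under data/:")
        for d in missing:
            print("  " + d)
    else:
        print("data/ layout matches the expected structure")
```

An empty result only means the directories exist. It does not verify that the downloads inside them are complete.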
Usage
ZSON configuration A ImageNav Training
sbatch scripts/imagenav-v1-hm3d-ovrl-rn50.sh
ZSON configuration B ImageNav Training
sbatch scripts/imagenav-v2-hm3d-ovrl-rn50.sh
ObjectNav Evaluation
To evaluate a trained ZSON checkpoint, use the following command, replacing $DESIRED-CONFIGURATION$ and $DATASET$ with the configuration and dataset to evaluate:
sbatch scripts/objnav-eval-$DESIRED-CONFIGURATION$-$DATASET$.sh
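Since the command above leaves the configuration and dataset as placeholders, one way to see which combinations exist is to list the matching scripts. The sketch below assumes only the `scripts/objnav-eval-*.sh` naming pattern shown above. The helper name `list_eval_scripts` is hypothetical, and the sketch makes no assumption about which script names are actually present.

```python
from pathlib import Path


def list_eval_scripts(scripts_dir="scripts"):
    """Return the sorted names of files matching objnav-eval-*.sh in scripts_dir."""
    return sorted(p.name for p in Path(scripts_dir).glob("objnav-eval-*.sh"))


if __name__ == "__main__":
    # Run from the top-level zson/ directory.
    for name in list_eval_scripts():
        print(f"sbatch scripts/{name}")
```

Each printed line is a candidate command. Pick the one whose name matches the checkpoint configuration and the dataset you want to evaluate on.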
Citation
If you use this code in your research, please consider citing:
@inproceedings{majumdar2022zson,
title={ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings},
author={Majumdar, Arjun and Aggarwal, Gunjan and Devnani, Bhavika and Hoffman, Judy and Batra, Dhruv},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2022}
}
