
MapGPT

[ACL 24] The official implementation of MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation.



The official implementation of MapGPT. [Paper] [Project]

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation.

Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong.

Annual Meeting of the Association for Computational Linguistics (ACL 2024).

<p align="center"> <img src="figs/framework.png" alt="framework"> </p>

If you have any questions, please contact me by email: jqchen(at)cs.hku.hk

Setup

Install the Matterport3D Simulator: follow the instructions here. We use the latest version rather than v0.1.

Install requirements:

conda create -n MapGPT python=3.10
conda activate MapGPT
pip install -r requirements.txt

Prepare data:

  • Follow DUET to set up the annotations for testing on the val-unseen split.
  • We sample a subset of 72 scenes and 216 cases for quick, cost-effective testing. Download the corresponding MapGPT_72_scenes_processed.json and place it in the datasets/R2R/annotations directory.
  • Observation images must be collected from the simulator in advance. You can use your own saved images or our preprocessed RGB_Observations.zip.
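Before running inference, it can help to sanity-check the downloaded annotation file. The minimal sketch below assumes an R2R-style JSON list in which each entry has a "scan" (scene) ID and, optionally, an "instructions" list. That layout is an assumption, not something this README documents, so adjust the keys if MapGPT_72_scenes_processed.json is organised differently. The names `load_split` and `summarize_split` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Location given in this README; adjust if your checkout differs.
ANNOTATION_FILE = Path("datasets/R2R/annotations/MapGPT_72_scenes_processed.json")


def load_split(path: Path = ANNOTATION_FILE) -> List[Dict[str, Any]]:
    """Load an annotation file, failing early with a clear message if it is missing."""
    if not path.is_file():
        raise FileNotFoundError(f"annotation file not found: {path}")
    with path.open() as f:
        return json.load(f)


def summarize_split(entries: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count distinct scenes, entries and instructions.

    Assumes (hypothetically) an R2R-style layout: every entry has a "scan"
    key, and entries without an "instructions" list count as one instruction.
    """
    scans = {entry["scan"] for entry in entries}
    n_instructions = sum(len(entry.get("instructions", [None])) for entry in entries)
    return {"scenes": len(scans), "entries": len(entries), "instructions": n_instructions}
```

For example, `summarize_split(load_split())` returns counts you can compare against the 72 scenes and 216 cases stated above; depending on how the file is organised, the 216 cases may correspond to entries or to instructions.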

API key: set your OpenAI API key here.

Inference

In addition to the GPT-4V results reported in the paper, we also provide an implementation based on the latest GPT-4o, which is faster and cheaper.

Run the following script, in which --llm is set to gpt-4o-2024-05-13 and --response_format is set to json:

bash scripts/gpt4o.sh
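With --response_format set to json, each model reply is expected to be a JSON object. A defensive parser might look like the minimal sketch below, which strips optional Markdown code fences before calling `json.loads`. The key names in `REQUIRED_KEYS` and the function name `parse_json_reply` are assumptions for illustration; check the repository's prompt for the schema it actually requests.

```python
import json
import re
from typing import Any, Dict

# Hypothetical schema: replace with the keys the repository's prompt requests.
REQUIRED_KEYS = ("Thought", "New Planning", "Action")

# Matches an opening fence (three backticks plus optional "json") at the start,
# or a closing fence (three backticks) at the end of the reply.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$")


def parse_json_reply(text: str) -> Dict[str, Any]:
    """Parse a JSON-formatted LLM reply and check that the expected keys exist."""
    cleaned = _FENCE_RE.sub("", text.strip())
    data = json.loads(cleaned)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data
```

Raising on a missing key, rather than silently defaulting, makes malformed replies visible so they can be retried or logged.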

The two implementations compare as follows on the sampled subset. GPT-4o achieves a better (lower) NE but a slightly worse SR.

| LLMs | NE (m) ↓ | OSR (%) ↑ | SR (%) ↑ | SPL (%) ↑ |
| --- | --- | --- | --- | --- |
| GPT-4V | 5.62 | 57.9 | 47.7 | 38.1 |
| GPT-4o | 5.11 | 56.9 | 46.3 | 37.8 |
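The table uses the standard R2R navigation metrics. As a reminder of what they measure, here is a minimal sketch of their usual definitions: NE is the mean distance (m) between the stop position and the goal; SR and OSR are the percentage of episodes whose stop position, or closest point along the trajectory, lies within 3 m of the goal; SPL weights each success by the shortest-path length divided by the longer of the actual and shortest path lengths. The `Episode` record and `vln_metrics` function are hypothetical names; this follows the common benchmark definitions and is not taken from MapGPT's evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, List

# Standard R2R success radius in metres.
SUCCESS_RADIUS_M = 3.0


@dataclass
class Episode:
    """Per-episode distances in metres; field names are hypothetical."""
    final_dist: float    # distance from the stop position to the goal
    oracle_dist: float   # closest distance to the goal along the trajectory
    shortest_len: float  # shortest-path length from start to goal
    path_len: float      # length of the path the agent actually took


def vln_metrics(episodes: List[Episode]) -> Dict[str, float]:
    """Return NE (m) and OSR/SR/SPL (%) averaged over the episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    success = [e.final_dist < SUCCESS_RADIUS_M for e in episodes]
    oracle = [e.oracle_dist < SUCCESS_RADIUS_M for e in episodes]
    # SPL: each success contributes shortest_len / max(path_len, shortest_len).
    spl = sum(
        e.shortest_len / max(e.path_len, e.shortest_len)
        for e, ok in zip(episodes, success)
        if ok
    )
    return {
        "NE": sum(e.final_dist for e in episodes) / n,
        "OSR": 100.0 * sum(oracle) / n,
        "SR": 100.0 * sum(success) / n,
        "SPL": 100.0 * spl / n,
    }
```

Because SR is a per-episode average, small gaps can correspond to few episodes: if the comparison was run on the 216-case subset, the 1.4-point SR difference (47.7 vs. 46.3) amounts to roughly three episodes (103/216 vs. 100/216).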

Before running, modify the following arguments in gpt4o.sh to set the path to your observation images, the split to test, and so on:

--root_dir ${DATA_ROOT}
--img_root /path/to/images
--split MapGPT_72_scenes_processed
--end 10  # the number of cases to be tested
--output_dir ${outdir}
--max_action_len 15
--save_pred
--stop_after 3
--llm gpt-4o-2024-05-13
--response_format json
--max_tokens 1000
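For reference, the arguments above can be mirrored with a small `argparse` parser, which makes their expected types explicit. This is a hypothetical sketch: `build_parser` is an illustrative name, the types are inferred from the example values, those values are reused as defaults purely for illustration, and the real entry point may define these arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the flags passed in gpt4o.sh (types inferred, not confirmed)."""
    p = argparse.ArgumentParser(description="MapGPT inference flags (sketch)")
    p.add_argument("--root_dir", type=str, required=True)    # dataset root (${DATA_ROOT})
    p.add_argument("--img_root", type=str, required=True)    # path to the observation images
    p.add_argument("--split", type=str, default="MapGPT_72_scenes_processed")  # split to test
    p.add_argument("--end", type=int, default=10)             # number of cases to test
    p.add_argument("--output_dir", type=str, required=True)
    p.add_argument("--max_action_len", type=int, default=15)  # example value from the README
    p.add_argument("--save_pred", action="store_true")
    p.add_argument("--stop_after", type=int, default=3)       # example value; semantics not documented here
    p.add_argument("--llm", type=str, default="gpt-4o-2024-05-13")
    p.add_argument("--response_format", type=str, default="json")
    p.add_argument("--max_tokens", type=int, default=1000)
    return p
```

For example, `build_parser().parse_args(["--root_dir", "/data", "--img_root", "/imgs", "--output_dir", "/out", "--end", "5"])` yields `end == 5` as an integer, while unspecified flags fall back to the README's example values.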

Citation

@inproceedings{chen2024affordances,
  title={Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation},
  author={Chen, Jiaqi and Lin, Bingqian and Liu, Xinmin and Ma, Lin and Liang, Xiaodan and Wong, Kwan-Yee~K.},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}

@inproceedings{chen2024mapgpt,
  title={MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation},
  author={Chen, Jiaqi and Lin, Bingqian and Xu, Ran and Chai, Zhenhua and Liang, Xiaodan and Wong, Kwan-Yee~K.},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics},
  year={2024}
}