OceanGym: A Benchmark Environment for Underwater Embodied Agents
OceanGym is a high-fidelity embodied underwater environment that simulates a realistic ocean setting with diverse scenes. As illustrated in the figure, OceanGym establishes a robust benchmark for evaluating autonomous agents through a series of challenging tasks, spanning perception analysis and decision-making navigation. The platform supports multi-modal perception and provides action spaces for continuous control.
- OceanGym supports a wide range of underwater targets and allows users to freely create, edit, and customize these objects within the environment.
- The platform incorporates water-flow and hydrodynamic simulation (with some discrepancy from real-world dynamics), as well as depth-dependent lighting and visibility modeling, enabling reproduction of underwater conditions.
- Users can flexibly modify environmental parameters, inject new scenes, or adjust task settings, serving as a versatile testbed for benchmarking and developing underwater autonomous agents.
We have provided a teaching demonstration video here: bilibili
💐 Acknowledgement
OceanGym environment is built upon Unreal Engine (UE) 5.3, with certain components developed by drawing inspiration from and partially based on HoloOcean. We sincerely acknowledge their valuable contribution.
🔔 News
- 12-2025, we updated the world to support underwater current simulation.
- 10-2025, we released the initial version of OceanGym along with the accompanying paper.
- 04-2025, we launched the OceanGym project.
Contents:
- 💐 Acknowledgement
- 🔔 News
- 📺 Quick Start
- ⚙️ Set up Environment
- 🧠 Decision Task
- 👀 Perception Task
- ⏱️ Results
- 📚 Datasets
- 🚩 Citation
📺 Quick Start
Install the experimental code environment using pip:
pip install -r requirements.txt
Decision Task
Make sure the environment is ready first! Set up the environment by following the instructions here.
Step 1: Run a Task Script
For example, to run task 4:
python decision\tasks\task4.py
Follow the keyboard instructions or switch to LLM mode for automatic decision-making.
Step 2: Keyboard Control Guide
| Key | Action |
|-----|------------------------------|
| W | Move Forward |
| S | Move Backward |
| A | Move Left |
| D | Move Right |
| J | Turn Left |
| L | Turn Right |
| I | Move Up |
| K | Move Down |
| M | Switch to LLM Mode |
| Q | Exit |
You can use WASD for movement, J/L for turning, and I/K for up/down. Press M to switch to large language model mode (this may cause temporary lag). Press Q to exit.
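The key bindings above can be sketched as a simple dispatch table. This is a minimal illustration only, not the actual task script; the action names below are hypothetical placeholders for the handlers the real scripts define:

```python
# Minimal sketch of the keyboard dispatch described above.
# Action names are hypothetical; the real task scripts define their own handlers.
KEY_ACTIONS = {
    "w": "move_forward",
    "s": "move_backward",
    "a": "move_left",
    "d": "move_right",
    "j": "turn_left",
    "l": "turn_right",
    "i": "move_up",
    "k": "move_down",
    "m": "switch_to_llm_mode",
    "q": "exit",
}

def dispatch(key: str) -> str:
    """Map a pressed key to its action name; unknown keys become a no-op."""
    return KEY_ACTIONS.get(key.lower(), "noop")
```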
Step 3: View Results
Logs and memory files are automatically saved in the log/ and memory/ directories.
Step 4: Evaluate the results
Place the generated memory and important_memory files into the corresponding point folders.
Then, set the evaluation paths in the evaluate.py file.
We provide 6 experimental evaluation paths. In evaluate.py, you can configure them as follows:
eval_roots = [
os.path.join(eval_root, "main", "gpt4omini"),
os.path.join(eval_root, "main", "gemini"),
os.path.join(eval_root, "main", "qwen"),
os.path.join(eval_root, "migration", "gpt4o"),
os.path.join(eval_root, "migration", "qwen"),
os.path.join(eval_root, "scale", "qwen"),
]
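The six paths above follow a regular (group, model) pattern, so they can also be built programmatically. A minimal sketch, assuming the same directory layout as the snippet above (the helper name is ours, not from evaluate.py):

```python
import os

def build_eval_roots(eval_root: str) -> list:
    """Build the six evaluation paths described above from (group, model) pairs."""
    settings = [
        ("main", "gpt4omini"),
        ("main", "gemini"),
        ("main", "qwen"),
        ("migration", "gpt4o"),
        ("migration", "qwen"),
        ("scale", "qwen"),
    ]
    return [os.path.join(eval_root, group, model) for group, model in settings]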
To run the evaluation:
python decision\utils\evaluate.py
The generated results will be saved under the eval\decision folder.
Perception Task
All commands below are written for Linux; if you are using Windows, adjust the path separators accordingly (backslashes instead of forward slashes).
Step 1: Prepare the dataset
After downloading from Hugging Face or Google Drive, put it into the data/perception folder.
Step 2: Select model parameters
| Parameter | Function |
| --- | --- |
| model_template | The message-queue template for the large language model you selected. |
| model_name_or_path | For an API model, the model name; for a local model, the path. |
| api_key | For an API model, enter your key. |
| base_url | For an API model, enter its base URL. |
Currently, we only support OpenAI, Google Gemma, Qwen, and OpenBMB.
MODELS_TEMPLATE="Yours"
MODEL_NAME_OR_PATH="Yours"
API_KEY="Yours"
BASE_URL="Yours"
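The four parameters above can be bundled into a small config helper. A minimal sketch under one assumption of ours (not the repo's): a local model is distinguished from an API model by whether the given path exists on disk:

```python
import os

def build_model_config(model_template, model_name_or_path, api_key=None, base_url=None):
    """Bundle the perception-task model parameters described above.

    Assumption (illustration only): a local model is identified by an existing
    filesystem path; any other value is treated as an API model name.
    """
    is_local = os.path.exists(model_name_or_path)
    config = {
        "model_template": model_template,
        "model_name_or_path": model_name_or_path,
        "is_local": is_local,
    }
    if not is_local:
        # API models additionally need credentials and an endpoint.
        config["api_key"] = api_key
        config["base_url"] = base_url
    return config
```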
Step 3: Run the experiments
| Parameter | Function |
| --- | --- |
| exp_name | Customize the name of the experiment to save the results. |
| exp_idx | Select the experiment number, or enter "all" to select all. |
| exp_json | JSON file containing the experiment label data. |
| images_dir | The folder where the experimental image data is stored. |
For the experiment types, we designed (1) a multi-view perception task and (2) a context-based perception task.
For the lighting conditions, we designed (1) high illumination and (2) low illumination.
For the auxiliary sonar, we designed (1) no sonar image, (2) zero-shot sonar image, and (3) sonar image with a few sonar examples.
For example, the following command evaluates the multi-view perception task under high illumination:
python perception/eval/mv.py \
--exp_name Result_MV_highLight_00 \
--exp_idx "all" \
--exp_json "/data/perception/highLight.json" \
--images_dir "/data/perception/highLight" \
--model_template $MODELS_TEMPLATE \
--model_name_or_path $MODEL_NAME_OR_PATH \
--api_key $API_KEY \
--base_url $BASE_URL
For more patterns about perception tasks, please read this part carefully.
⚙️ Set up Environment
This project is based on the HoloOcean environment. 💐
We have placed a simplified version here. If you encounter any issues with the details, please refer to the original installation documentation.
We have provided a teaching demonstration video here: bilibili
Install the OceanGym_large.zip
Download OceanGym_large.zip from ☁️ <a href="https://drive.google.com/file/d/1EfKHeiyQD5eoJ6-EsiJHuIdBRM5Ope5A/view?usp=drive_link" target="_blank">Google Drive</a> or ☁️ <a href="https://pan.baidu.com/s/16h86huHLeFGAKatRWvLrFQ?pwd=wput" target="_blank">Baidu Drive</a>, and extract it to the folder you want.
Packaged Installation
- Python Library
From the cloned repository, install the Python package by doing the following:
cd OceanGym_large/client
pip install .
- Worlds Packages