SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
SimplerEnv: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups

Significant progress has been made in building generalist robot manipulation policies, yet their scalable and reproducible evaluation remains challenging, as real-world evaluation is operationally expensive and inefficient. We propose employing physical simulators as efficient, scalable, and informative complements to real-world evaluations. These simulation evaluations offer valuable quantitative metrics for checkpoint selection, insights into potential real-world policy behaviors or failure modes, and standardized setups to enhance reproducibility.
This repository's code is based on the SAPIEN simulator and the CPU-based ManiSkill2 benchmark. We have also integrated the Bridge dataset environments into ManiSkill3, which offers GPU parallelization and can run 10-15x faster than the ManiSkill2 version. For instructions on how to use the GPU-parallelized environments and evaluate policies on them, see: https://github.com/simpler-env/SimplerEnv/tree/maniskill3
This repository encompasses two real-to-sim evaluation setups:

- **Visual Matching** evaluation: matching real & sim visual appearances for policy evaluation by overlaying real-world images onto simulation backgrounds and adjusting foreground object and robot textures in simulation.
- **Variant Aggregation** evaluation: creating different simulation environment variants (e.g., different backgrounds, lighting, distractors, and table textures) and averaging their results.
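To make the visual-matching idea concrete, the compositing step can be sketched in plain NumPy: render the simulated foreground (robot and manipulated objects) together with a segmentation mask, then paste those pixels over the real-world background photo. The function and array names below are hypothetical illustrations; the actual overlay logic lives inside the environment code.

```python
import numpy as np

def composite_visual_matching(real_bg, sim_render, fg_mask):
    """Overlay simulated foreground pixels onto a real background image.

    real_bg:    (H, W, 3) uint8 real-world background photo
    sim_render: (H, W, 3) uint8 simulator render
    fg_mask:    (H, W) bool, True where the sim foreground (robot/objects) is
    """
    out = real_bg.copy()
    out[fg_mask] = sim_render[fg_mask]  # boolean-mask copy of foreground pixels
    return out

# Toy 2x2 "images" just to show the mechanics
real_bg = np.zeros((2, 2, 3), dtype=np.uint8)          # black background
sim_render = np.full((2, 2, 3), 255, dtype=np.uint8)   # white simulator render
fg_mask = np.array([[True, False], [False, True]])
combined = composite_visual_matching(real_bg, sim_render, fg_mask)
```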
We hope that our work guides and inspires future real-to-sim evaluation efforts.
Getting Started
Follow the Installation section to install the minimal requirements for our environments. Then you can run the following minimal inference script in interactive Python. The script creates prepackaged environments for our visual matching evaluation setup.
```python
import simpler_env
from simpler_env.utils.env.observation_utils import get_image_from_maniskill2_obs_dict

env = simpler_env.make('google_robot_pick_coke_can')
obs, reset_info = env.reset()
instruction = env.get_language_instruction()
print("Reset info", reset_info)
print("Instruction", instruction)

done, truncated = False, False
while not (done or truncated):
    # action[:3]: delta xyz; action[3:6]: delta rotation in axis-angle representation;
    # action[6:7]: gripper (the meaning of open / close depends on robot URDF)
    image = get_image_from_maniskill2_obs_dict(env, obs)
    action = env.action_space.sample()  # replace this with your policy inference
    # For long-horizon tasks, you can call env.advance_to_next_subtask() to advance
    # to the next subtask; the environment might also auto-advance if
    # env._elapsed_steps is larger than a threshold.
    obs, reward, done, truncated, info = env.step(action)
    new_instruction = env.get_language_instruction()
    if new_instruction != instruction:
        # for long-horizon tasks, we get a new instruction when the robot proceeds to the next subtask
        instruction = new_instruction
        print("New Instruction", instruction)

episode_stats = info.get('episode_stats', {})
print("Episode stats", episode_stats)
```
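The 7-dimensional action layout described in the comments above can be assembled by hand, which is useful when wiring in your own policy. The values below are illustrative placeholders, not tied to any real policy output:

```python
import numpy as np

# [dx, dy, dz] + [axis-angle rotation (3)] + [gripper (1)] = 7 dims
delta_xyz = np.array([0.01, 0.0, -0.02])   # small end-effector translation
axis = np.array([0.0, 0.0, 1.0])           # unit rotation axis
angle = 0.1                                # rotation magnitude in radians
axis_angle = axis * angle                  # axis-angle = axis scaled by angle
gripper = np.array([1.0])                  # open/close semantics depend on robot URDF

action = np.concatenate([delta_xyz, axis_angle, gripper])
assert action.shape == (7,)
```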
Additionally, you can play with our environments interactively through ManiSkill2_real2sim/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py. See the script for more details and commands.
Installation
Prerequisites:
- CUDA version >=11.8 and < 13 (this is required if you want to perform a full installation of this repo and perform RT-1 or Octo inference)
- An NVIDIA GPU (ideally RTX; on non-RTX GPUs, such as the 1080 Ti and A100, environments that involve ray tracing will be slow). TPUs are currently not supported, as SAPIEN requires a GPU to run.
Create an anaconda environment:
```
conda create -n simpler_env python=3.10  # python 3.10 or 3.11
conda activate simpler_env
```
Clone this repo:
```
git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules
```
Install numpy<2.0 (otherwise errors in IK might occur in pinocchio):
```
pip install numpy==1.24.4
```
Install ManiSkill2 real-to-sim environments and their dependencies:
```
cd {this_repo}/ManiSkill2_real2sim
pip install -e .
```
Install this package:
```
cd {this_repo}
pip install -e .
```
If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please additionally follow the full installation instructions here.
Examples
- Simple RT-1 and Octo evaluation script on prepackaged environments with the visual matching evaluation setup: see simpler_env/simple_inference_visual_matching_prepackaged_envs.py.
- Colab notebook for RT-1 and Octo inference: see this link.
- Environment interactive visualization and manual control: see ManiSkill2_real2sim/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py.
- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results, with sweeps over object / robot poses and advanced logging. These cover both the visual matching and variant aggregation evaluation setups along with the RT-1, RT-1-X, and Octo policies. See scripts/.
- Real-to-sim evaluation videos from running scripts/*.sh: see this link.
Current Environments
To get a list of all available environments, run:
```python
import simpler_env
print(simpler_env.ENVIRONMENTS)
```
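Each environment name is prefixed by its robot embodiment, so grouping them is a one-liner. The sample list below is a handful of task names copied from the table; the authoritative list comes from `simpler_env.ENVIRONMENTS` at runtime:

```python
from collections import defaultdict

# Sample task names (a subset of the full table):
environments = [
    "google_robot_pick_coke_can",
    "google_robot_move_near",
    "google_robot_open_drawer",
    "widowx_spoon_on_towel",
    "widowx_stack_cube",
]

# Group environment names by robot embodiment prefix
by_robot = defaultdict(list)
for name in environments:
    robot = "google_robot" if name.startswith("google_robot") else "widowx"
    by_robot[robot].append(name)

print(dict(by_robot))
```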
| Task Name | ManiSkill2 Env Name | Image (Visual Matching) |
| --- | --- | --- |
| google_robot_pick_coke_can | GraspSingleOpenedCokeCanInScene-v0 | <img src="./images/example_visualization/google_robot_coke_can_visual_matching.png" width="128" height="128" > |
| google_robot_pick_object | GraspSingleRandomObjectInScene-v0 | <img src="./images/example_visualization/google_robot_pick_random_object.png" width="128" height="128" > |
| google_robot_move_near | MoveNearGoogleBakedTexInScene-v1 | <img src="./images/example_visualization/google_robot_move_near_visual_matching.png" width="128" height="128" > |
| google_robot_open_drawer | OpenDrawerCustomInScene-v0 | <img src="./images/example_visualization/google_robot_open_drawer_visual_matching.png" width="128" height="128" > |
| google_robot_close_drawer | CloseDrawerCustomInScene-v0 | <img src="./images/example_visualization/google_robot_close_drawer_visual_matching.png" width="128" height="128" > |
| google_robot_place_in_closed_drawer | PlaceIntoClosedDrawerCustomInScene-v0 | <img src="./images/example_visualization/google_robot_put_apple_in_closed_top_drawer.png" width="128" height="128" > |
| widowx_spoon_on_towel | PutSpoonOnTableClothInScene-v0 | <img src="./images/example_visualization/widowx_spoon_on_towel_visual_matching.png" width="128" height="128" > |
| widowx_carrot_on_plate | PutCarrotOnPlateInScene-v0 | <img src="./images/example_visualization/widowx_carrot_on_plate_visual_matching.png" width="128" height="128" > |
| widowx_stack_cube | StackGreenCubeOnYellowCubeBakedTexInScene-v0 | <img src="./images/example_visualization/widowx_stack_cube_visual_matching.png" width="128" height="128" > |
| widowx_put_eggplant_in_basket | PutEggplantInBasketScene-v0 | <img src="./images/example_visualization/widowx_put_eggplant_in_basket_visual_matching.png" width="128" height="128" > |
We also support creating sub-tasks variations such as google_robot_pick_{horizontal/vertical/standing}_coke_can, google_robot_open_{top/middle/bottom}_drawer, and google_robot_close_{top/middle/bottom}_drawer. For the google_robot_place_in_closed_drawer task, we use the google_robot_place_apple_in_closed_top_drawer subtask for paper evaluations.
By default, Google Robot environments use a control frequency of 3 Hz, and Bridge environments use a control frequency of 5 Hz. The simulation frequency is ~500 Hz.
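As a quick sanity check on these settings, each control step therefore spans on the order of 100+ physics substeps (a back-of-the-envelope estimate; the exact substep count is set inside the environment code):

```python
SIM_FREQ = 500  # Hz, approximate simulation frequency

for robot, control_freq in [("Google Robot", 3), ("WidowX/Bridge", 5)]:
    substeps = SIM_FREQ // control_freq  # physics substeps per control step
    dt = 1.0 / control_freq              # wall-clock duration of one control step
    print(f"{robot}: ~{substeps} sim substeps per control step, {dt:.3f}s per step")
```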
Compare Your Policy Evaluation Approach to SIMPLER
We make it easy to compare your offline robot policy evaluation approach to SIMPLER. In our paper, we use two metrics to measure the quality of simulated evaluation pipelines: Mean Maximum Rank Violation (MMRV) and the Pearson correlation coefficient.
