Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments
Status: Maintenance (expect bug fixes and minor updates)
Procgen Benchmark
[Blog Post] [Paper]
16 simple-to-use procedurally-generated gym environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills. The environments run at high speed (thousands of steps per second) on a single core.
We ran a competition in 2020 which used these environments to measure sample efficiency and generalization in RL. You can learn more here.
<img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/procgen.gif">

These environments are associated with the paper Leveraging Procedural Generation to Benchmark Reinforcement Learning (citation). The code for running some experiments from the paper is in the train-procgen repo. For those familiar with the original CoinRun environment, be sure to read the updated CoinRun description below, as there have been subtle changes to the environment.
Compared to Gym Retro, these environments are:
- Faster: Gym Retro environments are already fast, but Procgen environments can run >4x faster.
- Randomized: Gym Retro environments are always the same, so you can memorize a sequence of actions that will get the highest reward. Procgen environments are randomized so this is not possible.
- Customizable: If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often less than 300 lines. This is almost impossible with Gym Retro.
Supported platforms:
- Windows 10
- macOS 10.14 (Mojave), 10.15 (Catalina)
- Linux (manylinux2010)
Supported Pythons:
- 3.7 64-bit
- 3.8 64-bit
- 3.9 64-bit
- 3.10 64-bit
Supported CPUs:
- Must have at least AVX
Installation
First make sure you have a supported version of python:
# run these commands to check for the correct python version
python -c "import sys; assert (3,7,0) <= sys.version_info < (3,11,0), 'python is incorrect version'; print('ok')"
python -c "import platform; assert platform.architecture()[0] == '64bit', 'python is not 64-bit'; print('ok')"
To install the wheel:
pip install procgen
If you get an error like "Could not find a version that satisfies the requirement procgen", please upgrade pip: pip install --upgrade pip.
To try an environment out interactively:
python -m procgen.interactive --env-name coinrun
The keys are: left/right/up/down + q, w, e, a, s, d for the different (environment-dependent) actions. Your score is displayed as "episode_return" in the lower left. At the end of an episode, you can see your final "episode_return" as well as "prev_level_complete" which will be 1 if you successfully completed the level.
To create an instance of the gym environment:
import gym
env = gym.make("procgen:procgen-coinrun-v0")
To create an instance of the gym3 (vectorized) environment:
from procgen import ProcgenGym3Env
env = ProcgenGym3Env(num=1, env_name="coinrun")
Docker
A Dockerfile is included to demonstrate a minimal Docker-based setup that works for running a random agent.
docker build docker --tag procgen
docker run --rm -it procgen python3 -m procgen.examples.random_agent_gym
There is a second Dockerfile to demonstrate installing from source:
docker build . --tag procgen --file docker/Dockerfile.dev
docker run --rm -it procgen python -c "from procgen import ProcgenGym3Env; env = ProcgenGym3Env(num=1, env_name='coinrun'); print(env.observe())"
Environments
The observation space is a box space with the RGB pixels the agent sees in a numpy array of shape (64, 64, 3). The expected step rate for a human player is 15 Hz.
The action space is Discrete(15) for which button combo to press. The button combos are defined in env.py.
If you are using the vectorized environment, the observation space is a dictionary space where the pixels are under the key "rgb".
Here are the 16 environments:
| Image | Name | Description |
| --- | --- | --- |
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/bigfish.png" width="200px"> | bigfish | The player starts as a small fish and becomes bigger by eating other fish. The player may only eat fish smaller than itself, as determined solely by width. If the player comes in contact with a larger fish, the player is eaten and the episode ends. The player receives a small reward for eating a smaller fish and a large reward for becoming bigger than all other fish, at which point the episode ends.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/bossfight.png" width="200px"> | bossfight | The player controls a small starship and must destroy a much bigger boss starship. The boss randomly selects from a set of possible attacks when engaging the player. The player must dodge the incoming projectiles or be destroyed. The player can also use randomly scattered meteors for cover. After a set timeout, the boss becomes vulnerable and its shields go down. At this point, the player's projectile attacks will damage the boss. Once the boss receives a certain amount of damage, the player receives a reward, and the boss re-raises its shields. If the player damages the boss several times in this way, the boss is destroyed, the player receives a large reward, and the episode ends.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/caveflyer.png" width="200px"> | caveflyer | The player must navigate a network of caves to reach the exit. Player movement mimics the Atari game “Asteroids”: the ship can rotate and travel forward or backward along the current axis. The majority of the reward comes from successfully reaching the end of the level, though additional reward can be collected by destroying target objects along the way with the ship's lasers. There are stationary and moving lethal obstacles throughout the level.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/chaser.png" width="200px"> | chaser | Inspired by the Atari game “MsPacman”. Maze layouts are generated using Kruskal’s algorithm, and then walls are removed until no dead-ends remain in the maze. The player must collect all the green orbs. 3 large stars spawn that will make enemies vulnerable for a short time when collected. A collision with an enemy that isn’t vulnerable results in the player’s death. When a vulnerable enemy is eaten, an egg spawns somewhere on the map that will hatch into a new enemy after a short time, keeping the total number of enemies constant. The player receives a small reward for collecting each orb and a large reward for completing the level.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/climber.png" width="200px"> | climber | A simple platformer. The player must climb a sequence of platforms, collecting stars along the way. A small reward is given for collecting a star, and a larger reward is given for collecting all stars in a level. If all stars are collected, the episode ends. There are lethal flying monsters scattered throughout the level.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/coinrun.png" width="200px"> | coinrun | A simple platformer. The goal is to collect the coin at the far right of the level, and the player spawns on the far left. The agent must dodge stationary saw obstacles, enemies that pace back and forth, and chasms that lead to death. Note that while the previously released version of CoinRun painted velocity information directly onto observations, the current version does not. This makes the environment significantly more difficult.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/dodgeball.png" width="200px"> | dodgeball | Loosely inspired by the Atari game “Berzerk”. The player spawns in a room with a random configuration of walls and enemies. Touching a wall loses the game and ends the episode. The player moves relatively slowly and can navigate throughout the room. There are enemies which also move slowly and which will occasionally throw balls at the player. The player can also throw balls, but only in the direction they are facing. If all enemies are hit, the player can move to the unlocked platform and earn a significant level completion bonus.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/fruitbot.png" width="200px"> | fruitbot | A scrolling game where the player controls a robot that must navigate between gaps in walls and collect fruit along the way. The player receives a positive reward for collecting a piece of fruit, and a larger negative reward for mistakenly collecting a non-fruit object. Half of the spawned objects are fruit (positive reward) and half are non-fruit (negative reward). The player receives a large reward if they reach the end of the level. Occasionally the player must use a key to unlock gates which block the way.
| <img src="https://raw.githubusercontent.com/openai/procgen/master/screenshots/heist.png" width="200px"> | heist | The player must steal the gem hidden behind a network of locks. Each lock comes in one of three colors, and the necessary keys to open these locks are scattered throughout the level. The level layout takes the form of a maze, again generated by Kruskal's algorithm. Once the player collects a key of a certain color, the player may open the lock of that color.
