Genesis

Official PyTorch implementation of GENESIS and GENESIS-V2

Genesis and Genesis-V2

This is the official PyTorch reference implementation of:

"GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations"
Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, and Ingmar Posner
International Conference on Learning Representations (ICLR), 2020

"Reconstruction Bottlenecks in Object-Centric Generative Models"
Martin Engelcke, Oiwi Parker Jones, and Ingmar Posner
Workshop on Object-Oriented Learning at ICML, 2020

"GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement"
Martin Engelcke, Oiwi Parker Jones, and Ingmar Posner
Advances in Neural Information Processing Systems (NeurIPS), 2021

As part of these works, the repository also includes:

a re-implementation of MONet from "MONet: Unsupervised Scene Decomposition and Representation" by Burgess et al.;
a re-implementation of GECO from "Taming VAEs" by Rezende and Viola.

Setup

Dependencies

Clone the repository into, e.g., ~/code/genesis:

git clone --recursive https://github.com/applied-ai-lab/genesis.git ~/code/genesis

We use Forge to save some legwork. It is included as a submodule, but you need to add it to your Python path, e.g. with:

# If needed, replace .bashrc with .zshrc or similar
echo 'export PYTHONPATH="${PYTHONPATH}:${HOME}/code/genesis/forge"' >> ~/.bashrc
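After opening a new shell, it can be worth checking that the path change took effect. The snippet below is a small sketch for that purpose; it assumes the submodule exposes an importable top-level package named `forge`, and the helper name `find_missing_modules` is hypothetical.

```python
import importlib.util
import os


def find_missing_modules(names):
    """Return the top-level module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<unset>"))
    # Assumption: the Forge submodule is importable as `forge`.
    missing = find_missing_modules(["forge"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules found.")
```

If `forge` is reported as not importable, re-check the exported path and make sure the repository was cloned with `--recursive`.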

You can either install PyTorch, TensorFlow, and all other dependencies manually, or you can set up a conda environment with all required dependencies using the environment.yml file:

conda env create -f environment.yml
conda activate genesis_env

Datasets

This repository contains data loaders for the three datasets considered in the ICLR paper. A few steps are required for setting up each individual dataset. We also provide a PyTorch wrapper around the Multi-Object Datasets used for the experiments on the Objects Room dataset in the ICML workshop paper.

Multi-dSprites

Generate coloured Multi-dSprites from the original dSprites dataset with:

cd ~/code/genesis
mkdir -p data/multi_dsprites/processed
git clone https://github.com/deepmind/dsprites-dataset.git data/multi_dsprites/dsprites-dataset
python scripts/generate_multid.py

NOTE: An RGB colour is sampled from 125 possible colours for each scene component. By default, multiple components in an image can have the same colour. This can lead, e.g., to a foreground object having the same colour as the background, so that the object is practically "invisible". A ground truth segmentation mask will still be associated with such an invisible object. If you want to avoid this, you can set --unique_colours True during training to use an alternative dataset where each component in an image has a unique colour.
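The difference between the two colour modes can be illustrated with a short sketch. This is not the code in scripts/generate_multid.py; it assumes the 125 colours form a grid of 5 intensity levels per RGB channel (5 × 5 × 5), and the level values and function names are hypothetical.

```python
import itertools
import random

# Assumption: 125 colours = 5 evenly spaced intensity levels per RGB channel.
LEVELS = [0, 63, 127, 191, 255]
PALETTE = list(itertools.product(LEVELS, repeat=3))


def sample_colours(num_components, unique, rng):
    """Sample one RGB colour per scene component (background included)."""
    if unique:
        # Without replacement: no two components share a colour.
        return rng.sample(PALETTE, num_components)
    # With replacement: components may collide, e.g. object == background.
    return [rng.choice(PALETTE) for _ in range(num_components)]


def shared_colour_probability(num_components):
    """Probability that at least two components share a colour (default mode)."""
    p_all_distinct = 1.0
    for i in range(num_components):
        p_all_distinct *= (len(PALETTE) - i) / len(PALETTE)
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_colours(4, unique=False, rng=rng))
    print(sample_colours(4, unique=True, rng=rng))
    print(shared_colour_probability(4))
```

Under these assumptions, collisions are rare per image but not negligible over a whole dataset, which is why the unique-colour variant can matter for segmentation metrics.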

GQN (rooms-ring-camera)

The GQN datasets are quite large. The rooms_ring_camera dataset as used in the paper takes about 250GB and can be downloaded with:

pip install gsutil
cd ~/code/genesis
mkdir -p data/gqn_datasets
gsutil -m cp -r gs://gqn-dataset/rooms_ring_camera data/gqn_datasets

Note that we use a modified version of the TensorFlow GQN data loader from ogroth/tf-gqn which is included in third_party/tf_gqn.

ShapeStacks

You need about 30GB of free disk space for ShapeStacks:

# Download compressed dataset
cd data
wget -i ../utils/shapestacks_urls.txt
# Uncompress files
tar xvzf shapestacks-meta.tar.gz
tar xvzf shapestacks-mjcf.tar.gz
tar xvzf shapestacks-rgb.tar.gz
cd -

The instance segmentation labels for ShapeStacks can be downloaded from here.

Multi-Object Datasets

The repository contains a wrapper around the Multi-Object Datasets, returning an iterable which behaves similarly to a PyTorch DataLoader object. The default config assumes that any datasets you wish to use have been downloaded to data/multi-object-datasets. As with the GQN data, this can be done with gsutil. You can download all four datasets at once with:

gsutil cp -r gs://multi-object-datasets data/
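To make "an iterable which behaves similarly to a PyTorch DataLoader" more concrete, here is a minimal pure-Python sketch of the general pattern: a re-iterable object with a length that yields fixed-size batches from a record stream. It is not the wrapper shipped in this repository; the class name `BatchedIterable` and its arguments are hypothetical.

```python
import itertools


class BatchedIterable:
    """Re-iterable wrapper that groups records from a stream into batches."""

    def __init__(self, make_stream, batch_size, num_records):
        # `make_stream` is a zero-argument callable returning a fresh iterator,
        # so that every epoch can start from the beginning of the data.
        self.make_stream = make_stream
        self.batch_size = batch_size
        self.num_records = num_records

    def __len__(self):
        # Number of full batches per pass; a trailing partial batch is dropped.
        return self.num_records // self.batch_size

    def __iter__(self):
        usable = len(self) * self.batch_size
        batch = []
        for record in itertools.islice(self.make_stream(), usable):
            batch.append(record)
            if len(batch) == self.batch_size:
                yield batch
                batch = []


if __name__ == "__main__":
    loader = BatchedIterable(lambda: iter(range(10)), batch_size=4, num_records=10)
    for epoch in range(2):
        for batch in loader:
            print(epoch, batch)
```

A training loop written against a DataLoader (a `for batch in loader` loop plus `len(loader)`) works unchanged with an object of this shape.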

Sketchy

Clone deepmind-research into, e.g., ~/code/deepmind-research:

git clone https://github.com/deepmind/deepmind-research.git ~/code/deepmind-research

Download lift_green__demos 2 and stack_green_on_red__demos 2 using the script at deepmind-research/sketchy/download.sh. Put the data into ~/code/genesis/data/sketchy/records, so that the folder directly contains the tfrecord files. Make sure deepmind-research is on your Python path:

# If needed, replace .bashrc with .zshrc or similar
echo 'export PYTHONPATH="${PYTHONPATH}:${HOME}/code/deepmind-research"' >> ~/.bashrc

Create a separate environment according to deepmind-research/sketchy/requirements.txt:

# Leave current environment first if necessary
conda deactivate
conda create -n sketchy python=3.7
conda activate sketchy
pip install -r ~/code/deepmind-research/sketchy/requirements.txt
# Some additional dependencies
pip install torch==1.3.1 torchvision==0.4.2 tqdm pillow

You can now preprocess the data with:

python scripts/sketchy_preparation.py
conda deactivate

MIT-Princeton 2016 Amazon Picking Challenge (APC)

Download the "Object Segmentation Training Dataset" from the team's website via the download link (ca. 130GB). Move the training.zip file into ~/code/genesis/data/apc and unpack it. Preprocess the data by running:

python datasets/apc_config.py

Training

You can train Genesis-v2, Genesis, MONet and baseline VAEs on the datasets using the default hyperparameters with, e.g.:

python train.py --data_config datasets/shapestacks_config.py --model_config models/genesisv2_config.py
python train.py --data_config datasets/gqn_config.py --model_config models/genesis_config.py
python train.py --data_config datasets/multi_object_config.py --model_config models/monet_config.py
python train.py --data_config datasets/multid_config.py --model_config models/vae_config.py

You can change many of the hyperparameters via the Forge command line flags in the respective config files, e.g.:

python train.py --data_config datasets/multid_config.py --model_config models/genesis_config.py --batch_size 64 --learning_rate 0.001
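When running several configurations, it can be convenient to assemble these command lines programmatically. The following is a small sketch that only uses flags shown above; the helper name `build_train_command` and the swept values are hypothetical.

```python
import shlex


def build_train_command(data_config, model_config, **flags):
    """Build the argument list for one train.py run with flag overrides."""
    cmd = [
        "python", "train.py",
        "--data_config", data_config,
        "--model_config", model_config,
    ]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical sweep over two batch sizes.
    for batch_size in (32, 64):
        cmd = build_train_command(
            "datasets/multid_config.py",
            "models/genesis_config.py",
            batch_size=batch_size,
            learning_rate=0.001,
        )
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.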

See train.py and the config files for the available flags.

TensorBoard logs are written to file with tensorboardX. Run tensorboard --logdir checkpoints to monitor training.

NOTE 1: If you train MONet with the default config flags, then the hyperparameters from our ICLR paper are used, which are different from the ones in Burgess et al. If you want to use the training hyperparameters from Burgess et al., then you need to add the following flags: --geco False --pixel_std1 0.09 --pixel_std2 0.11 --train_iter 1000000 --batch_size 64 --optimiser rmsprop.

NOTE 2: The Sketchy results in the GENESIS-V2 paper use a different GECO goal than used in the other experiments. It is necessary to override the default value to reproduce these results, which can be done by adding --g_goal 0.5645 as a training flag.
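As a rough illustration of what the GECO goal controls, the following is a minimal scalar sketch of the GECO idea from "Taming VAEs": the KL term is minimised subject to the reconstruction error staying below a goal, and a multiplier on the constraint grows while the goal is violated and shrinks once it is met. This is not the implementation in this repository; the class name `GecoSketch`, the step size, and the moving-average decay are hypothetical, and it is an assumption that the error passed in uses the same normalisation as the goal.

```python
import math


class GecoSketch:
    """Scalar sketch of a GECO-style constrained objective (no autograd)."""

    def __init__(self, goal, step_size=1e-2, ema_decay=0.99, init_multiplier=1.0):
        self.goal = goal
        self.step_size = step_size          # hypothetical value
        self.ema_decay = ema_decay          # hypothetical value
        self.multiplier = init_multiplier   # Lagrange multiplier, kept positive
        self.constraint_ema = None

    def loss(self, recon_err, kl):
        # Constraint is satisfied when recon_err <= goal, i.e. constraint <= 0.
        constraint = recon_err - self.goal
        if self.constraint_ema is None:
            self.constraint_ema = constraint
        else:
            self.constraint_ema = (
                self.ema_decay * self.constraint_ema
                + (1.0 - self.ema_decay) * constraint
            )
        total = kl + self.multiplier * constraint
        # Multiplicative update: grows if the averaged constraint is violated.
        self.multiplier *= math.exp(self.step_size * self.constraint_ema)
        return total


if __name__ == "__main__":
    geco = GecoSketch(goal=0.5645)
    for recon_err in (1.0, 0.8, 0.6, 0.5):
        print(geco.loss(recon_err, kl=0.1), geco.multiplier)
```

A lower goal demands better reconstructions before the multiplier relaxes, which shifts the balance away from the KL term for longer.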

Evaluation

To compute the FID score for a trained model you can run, e.g.:

python scripts/compute_fid.py --data_config datasets/gqn_config.py --model_config models/genesis_config.py --model_dir checkpoints/MyModel/1 --model_file model.ckpt-FINAL

Similarly, you can compute the segmentation metrics with, e.g.:

python scripts/compute_seg_metrics.py --data_config datasets/gqn_config.py --model_config models/genesis_config.py --model_dir checkpoints/MyModel/1 --model_file model.ckpt-FINAL
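For readers unfamiliar with the ARI and ARI-FG columns reported below, here is a pure-Python sketch of the adjusted Rand index and its foreground-only variant, which ignores pixels labelled as background in the ground truth. It is not the code in scripts/compute_seg_metrics.py; the function names and the convention that background has label 0 are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two flat label sequences."""
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of pixels grouped together in both, in truth only, in prediction only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def ari_foreground(true_labels, pred_labels, background_id=0):
    """ARI restricted to pixels whose ground-truth label is not background."""
    kept = [(t, p) for t, p in zip(true_labels, pred_labels) if t != background_id]
    if not kept:
        raise ValueError("no foreground pixels in ground truth")
    true_fg, pred_fg = zip(*kept)
    return adjusted_rand_index(true_fg, pred_fg)


if __name__ == "__main__":
    true = [0, 0, 1, 1, 2, 2]   # flattened ground-truth mask, 0 = background
    pred = [3, 4, 7, 7, 9, 9]   # prediction splits the background in two
    print(adjusted_rand_index(true, pred))
    print(ari_foreground(true, pred))
```

In this toy case the foreground-only score is unaffected by the split background, while the full ARI is penalised for it.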

Visualisation

You can visualise your data with, e.g.:

python scripts/visualise_data.py --data_config datasets/multid_config.py
python scripts/visualise_data.py --data_config datasets/gqn_config.py
python scripts/visualise_data.py --data_config datasets/shapestacks_config.py
python scripts/visualise_data.py --data_config datasets/multi_object_config.py --dataset objects_room

Scripts for visualising reconstructions/segmentations and samples are available at scripts/visualise_reconstruction.py and scripts/visualise_generation.py, respectively.

Pretrained models & results

Checkpoints of pretrained models are available here.

Generation and segmentation metrics of the released model checkpoints are summarised in the following table:

| Model | Dataset | FID ↓ | ARI-FG ↑ | MSC-FG ↑ | ARI ↑ | MSC ↑ |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| GENESIS | Multi-dSprites | 25.0 | 0.57 | 0.69 | - | - |
| GENESIS | GQN | 79.4 | no labels | no labels | no labels | no labels |
| GENESIS | ShapeStacks | 235.4 | 0.71 | 0.64 | - | - |
| GENESIS-V2 | ShapeStacks | 108.1
