TEACh

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code and model weights are licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Citation:

@inproceedings{teach,
  title={{TEACh: Task-driven Embodied Agents that Chat}},
  author={Padmakumar, Aishwarya and Thomason, Jesse and Shrivastava, Ayush and Lange, Patrick and Narayan-Chen, Anjali and Gella, Spandana and Piramuthu, Robinson and Tur, Gokhan and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={2},
  pages={2017--2025},
  year={2022}
}

As of 09/07/2022, the dataset has been updated to include the dialog acts annotated in the following paper:

Dialog Acts for Task-Driven Embodied Agents

Spandana Gella*, Aishwarya Padmakumar*, Patrick Lange, Dilek Hakkani-Tur

If using the dialog acts in your work, please cite the following paper:

@inproceedings{teachda,
  title={{Dialog Acts for Task-Driven Embodied Agents}},
  author={Gella, Spandana and Padmakumar, Aishwarya and Lange, Patrick and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial)},
  year={2022},
  pages={111--123}
}

Interactions that are utterances, in the game files as well as in the EDH and TfD instances, now have an additional field, da_metadata, containing the dialog act annotations. See the data exploration notebook for sample code to view dialog acts.
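To pull the annotations out of a game, EDH, or TfD JSON file without depending on its exact nesting, a minimal sketch can walk the whole document and collect every object that carries a da_metadata field. It assumes the utterance text sits under a key named "utterance" (an assumption; check it against your download) and does not assume anything about the internal structure of da_metadata itself. The function names here are hypothetical, not part of the teach package.

```python
import json
from typing import Any, Iterator, List, Optional, Tuple


def iter_dialog_acts(node: Any) -> Iterator[Tuple[Optional[str], Any]]:
    """Recursively yield (utterance, da_metadata) for every dict that has da_metadata.

    Assumption: the utterance text is stored under the key "utterance".
    """
    if isinstance(node, dict):
        if "da_metadata" in node:
            yield node.get("utterance"), node["da_metadata"]
        for value in node.values():
            yield from iter_dialog_acts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dialog_acts(item)


def load_dialog_acts(path: str) -> List[Tuple[Optional[str], Any]]:
    """Load one JSON file (game, EDH or TfD instance) and list its dialog acts."""
    with open(path) as f:
        return list(iter_dialog_acts(json.load(f)))
```

For example, load_dialog_acts("/tmp/teach-dataset/edh_instances/valid_seen/<some_instance>.json") returns one pair per annotated utterance; the placeholder file name is illustrative only.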

Prerequisites

python3 >=3.7,<=3.8
python3.x-dev, example: sudo apt install python3.8-dev
tmux, example: sudo apt install tmux
xorg, example: sudo apt install xorg openbox
ffmpeg, example: sudo apt install ffmpeg
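Before installing, the prerequisites above can be checked with a short script. This is a hedged sketch: it reads the ">=3.7,<=3.8" pin as "minor versions 3.7 and 3.8" (an interpretation), and the binary names it looks for on PATH (tmux, Xorg, openbox, ffmpeg) are assumptions about what the listed apt packages install. The python3.x-dev headers are not checked.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def python_version_ok(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True for Python 3.7.x and 3.8.x (major/minor comparison only)."""
    return (3, 7) <= tuple(version[:2]) <= (3, 8)


def missing_tools(tools: Sequence[str] = ("tmux", "Xorg", "openbox", "ffmpeg")) -> List[str]:
    """Return the tools that cannot be found on PATH (binary names are assumptions)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("python version ok:", python_version_ok())
    print("missing tools:", missing_tools())
```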

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download

This will download and extract the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz & tfd_instances.tar.gz) in the default directory (/tmp/teach-dataset).
Optional arguments:

-d/--directory: The location to store the dataset in. Default=/tmp/teach-dataset.
-se/--skip-extract: If set, skip extracting archive files.
-sd/--skip-download: If set, skip downloading archive files.
-f/--file: Specify the name of the file to be retrieved from the S3 bucket.
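If you script the download (for example, to re-extract archives that are already on disk), the options above map directly onto an argument list. The sketch below only uses the short flags listed above; the helper name teach_download_cmd is hypothetical.

```python
import subprocess
from typing import List, Optional


def teach_download_cmd(directory: Optional[str] = None, skip_extract: bool = False,
                       skip_download: bool = False, file: Optional[str] = None) -> List[str]:
    """Build a teach_download invocation from the documented optional arguments."""
    cmd = ["teach_download"]
    if directory:
        cmd += ["-d", directory]      # where to store the dataset
    if skip_extract:
        cmd.append("-se")             # keep the archives packed
    if skip_download:
        cmd.append("-sd")             # reuse archives already in the directory
    if file:
        cmd += ["-f", file]           # fetch a single archive from the S3 bucket
    return cmd


if __name__ == "__main__":
    # Re-extract archives previously downloaded to /data/teach without downloading again.
    subprocess.run(teach_download_cmd(directory="/data/teach", skip_download=True), check=True)
```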

File changes (12/28/2022): We have modified EDH instances so that the only state changes checked to evaluate success are those that contribute towards success on the main task of the gameplay session from which the EDH instance was created. EDH instances with no state changes meeting these requirements have been removed. Additionally, two game files and their corresponding EDH and TfD instances were deleted from the valid_unseen split due to issues in the game files. Version 3 of our paper on arXiv, which will be public on Dec 30, 2022, contains the updated dataset size and experimental results.

Remote Server Setup

If running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Detach from the tmux session (CTRL+B, D) so that the X server keeps running. Run any other commands in the main terminal or in different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and at each time step save egocentric observations of the Commander and Driver (Follower in the paper). It also saves the target object panel and mask seen by the Commander, and the difference between current and initial state.

Replaying a single episode locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. Also note that, by default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume replaying the files not marked as replayed in that file. It will error out if the status file and the output directories disagree on which sessions have previously been replayed. We recommend using a new --write_frames_dir and a new --status-out-fn for additional runs that are not intended to resume from a previous one.
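Following the recommendation to use a fresh output directory and status file for every run that should not resume, a small helper can derive both paths from one run tag and refuse to reuse them. This is a sketch; the naming scheme (frames_<tag>, status_<tag>.json) and the function name are hypothetical choices, not something teach_replay expects.

```python
import os
import time
from typing import Optional, Tuple


def fresh_replay_paths(base_dir: str, tag: Optional[str] = None) -> Tuple[str, str]:
    """Return (write_frames_dir, status_out_fn) that do not exist yet.

    The status file ends in .json, as --status-out-fn requires.
    """
    tag = tag or time.strftime("%Y%m%d-%H%M%S")
    frames_dir = os.path.join(base_dir, f"frames_{tag}")
    status_fn = os.path.join(base_dir, f"status_{tag}.json")
    if os.path.exists(frames_dir) or os.path.exists(status_fn):
        # Reusing either path would make teach_replay try to resume a previous run.
        raise FileExistsError(f"run tag {tag!r} already used under {base_dir}")
    return frames_dir, status_fn
```

To deliberately resume an interrupted run, pass the old directory and status file to teach_replay instead of calling this helper.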

Replay all episodes in a folder locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.
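The flag rules above (status file ending in .json; --write_frames and --write_states both needing --write_frames_dir) can be enforced before launching a long replay. The sketch builds the argument list for either a single game file or a directory; requiring exactly one of the two is an assumption based on the two examples, and the function name is hypothetical.

```python
from typing import List, Optional


def teach_replay_cmd(status_out_fn: str, game_fn: Optional[str] = None,
                     game_dir: Optional[str] = None, write_frames_dir: Optional[str] = None,
                     write_frames: bool = False, write_states: bool = False,
                     create_video: bool = False, num_processes: Optional[int] = None) -> List[str]:
    """Validate the documented flag rules and return a teach_replay argument list."""
    if (game_fn is None) == (game_dir is None):
        raise ValueError("pass exactly one of game_fn or game_dir")
    if not status_out_fn.endswith(".json"):
        raise ValueError("--status-out-fn must end in .json")
    if (write_frames or write_states) and not write_frames_dir:
        raise ValueError("--write_frames / --write_states need --write_frames_dir")

    cmd = ["teach_replay"]
    cmd += ["--game_fn", game_fn] if game_fn else ["--game_dir", game_dir]
    if write_frames_dir:
        cmd += ["--write_frames_dir", write_frames_dir]
    if write_frames:
        cmd.append("--write_frames")
    if write_states:
        cmd.append("--write_states")
    if create_video:
        cmd.append("--create_video")
    if num_processes is not None:
        cmd += ["--num_processes", str(num_processes)]
    cmd += ["--status-out-fn", status_out_fn]
    return cmd
```

Pass the result to subprocess.run(cmd, check=True) to start the replay.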

Evaluation

We include sample scripts for inference and for calculating metrics: teach_inference and teach_eval. teach_inference is a wrapper that loads EDH instances, interacts with the simulator, and writes the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation that takes random actions at every time step.
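To illustrate the shape of such a model, here is a self-contained sketch in the spirit of SampleModel. It does not import teach: TeachModelLike is a hypothetical stand-in for TeachModel, and the constructor arguments, method names and signatures, the action names, and the convention of returning relative (x, y) coordinates in [0, 1] for interaction actions are all assumptions to verify against teach/inference/teach_model.py and sample_model.py before use.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class TeachModelLike(ABC):
    """Hypothetical stand-in for teach.inference.teach_model.TeachModel."""

    @abstractmethod
    def start_new_edh_instance(self, edh_instance, edh_history_images, edh_name=None) -> bool:
        ...

    @abstractmethod
    def get_next_action(self, img, edh_instance, prev_action, img_name=None, edh_name=None):
        ...


class RandomTeachModel(TeachModelLike):
    # Assumed action names; "Stop" is left out so random rollouts do not end immediately.
    NAV_ACTIONS = ["Forward", "Backward", "Turn Left", "Turn Right",
                   "Look Up", "Look Down", "Pan Left", "Pan Right"]
    INTERACT_ACTIONS = ["Pickup", "Place", "Open", "Close",
                        "ToggleOn", "ToggleOff", "Slice", "Pour"]

    def __init__(self, process_index: int, num_processes: int, model_args: List[str]):
        # Seed per process so parallel workers do not produce identical rollouts.
        self.rng = random.Random(process_index)

    def start_new_edh_instance(self, edh_instance, edh_history_images, edh_name=None) -> bool:
        return True  # nothing to precompute for a random policy

    def get_next_action(self, img, edh_instance, prev_action, img_name=None,
                        edh_name=None) -> Tuple[str, Optional[List[float]]]:
        action = self.rng.choice(self.NAV_ACTIONS + self.INTERACT_ACTIONS)
        coord = None
        if action in self.INTERACT_ACTIONS:
            # Interaction actions point at a location in the egocentric image.
            coord = [self.rng.uniform(0, 1), self.rng.uniform(0, 1)]
        return action, coord
```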

After running teach_inference, use teach_eval to compute metrics from the output data produced by teach_inference.

Sample run:

# Default location used by teach_download (see Downloading the dataset) is /tmp/teach-dataset
export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE

To run TfD inference instead of EDH inference, add --benchmark tfd to the teach_inference command.
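The sample run can also be driven from Python, which is convenient when sweeping splits or models. This sketch reads the same DATA_DIR, OUTPUT_DIR and METRICS_FILE variables and appends --benchmark tfd only when TfD is requested, as described above; it does not assume any other flags, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import List


def teach_inference_cmd(data_dir: str, output_dir: str, split: str, metrics_file: str,
                        model_module: str = "teach.inference.sample_model",
                        model_class: str = "SampleModel", tfd: bool = False) -> List[str]:
    cmd = ["teach_inference", "--data_dir", data_dir, "--output_dir", output_dir,
           "--split", split, "--metrics_file", metrics_file,
           "--model_module", model_module, "--model_class", model_class]
    if tfd:
        cmd += ["--benchmark", "tfd"]  # EDH is used when the flag is omitted
    return cmd


def teach_eval_cmd(data_dir: str, output_dir: str, split: str, metrics_file: str) -> List[str]:
    return ["teach_eval", "--data_dir", data_dir, "--inference_output_dir", output_dir,
            "--split", split, "--metrics_file", metrics_file]


if __name__ == "__main__":
    args = (os.environ["DATA_DIR"], os.environ["OUTPUT_DIR"], "valid_seen", os.environ["METRICS_FILE"])
    subprocess.run(teach_inference_cmd(*args), check=True)
    subprocess.run(teach_eval_cmd(*args), check=True)
```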

TEACh Benchmark Challenge

To participate in the challenge, you will need to submit a Docker image containing your code and model. Docker containers using your image will serve your model as an HTTP API following the TEACh API Specification. For your convenience, we include the teach_api command, which implements this API and is compatible with models implementing teach.inference.teach_model.TeachModel, the same interface used by teach_inference.

We have also included two sample Docker images using teach.inference.sample_model.SampleModel and teach.inference.et_model.ETModel respectively in docker/.

When evaluating a submission, the submitted container will be started with access to a single GPU and no internet access. For details, see Step 3 - Start your container.

The main evaluation code invoking your submission will also be run as a Docker container. It reuses the teach_inference CLI command together with teach.inference.remote_model.RemoteModel to call the HTTP API running in your container. For details on how to start it locally, see Step 4 - Start the evaluation.
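When testing locally, the evaluation container should only start once your API container is accepting connections. A minimal sketch, assuming nothing about the API's endpoints, is to poll the TCP port (API_PORT, 5000 in Step 0) until a connection succeeds. An open port does not prove the model has finished loading; if the TEACh API Specification defines a status route, checking that is more reliable. The host to poll depends on how you run the containers and is an assumption here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)  # refused or timed out: the server is not up yet
    return False
```

For example, wait_for_port("localhost", 5000, timeout_s=300) before launching the remote-inference-runner container.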

Please note that TfD inference is not currently supported via Docker image.

Testing Locally

The following steps assume you have downloaded the data to /home/ubuntu/teach-dataset and followed Prerequisites and Remote Server Setup.

Step 0 - Setup Environment

export HOST_DATA_DIR=/home/ubuntu/teach-dataset
export HOST_IMAGES_DIR=/home/ubuntu/images
export HOST_OUTPUT_DIR=/home/ubuntu/output
export API_PORT=5000
export SUBMISSION_PK=168888
export INFERENCE_GPUS='"device=0"'
export API_GPUS='"device=1"'
export SPLIT=valid_seen
export DOCKER_NETWORK=no-internet

mkdir -p $HOST_IMAGES_DIR $HOST_OUTPUT_DIR
docker network create --driver=bridge --internal $DOCKER_NETWORK

Note: If you run on a machine that only has a single GPU, set API_GPUS='"device=0"'.
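The GPU note above can be automated: give the inference runner device 0 and the API container device 1 when a second GPU exists, otherwise share device 0. This sketch counts GPUs with nvidia-smi -L (treating a missing or failing nvidia-smi as zero GPUs) and reproduces the mkdir and docker network create commands from Step 0 as argument lists; the helper names are hypothetical.

```python
import subprocess
from typing import Dict, List


def count_gpus() -> int:
    """Count lines like 'GPU 0: ...' printed by nvidia-smi -L; 0 if unavailable."""
    try:
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return 0
    return sum(1 for line in out.splitlines() if line.startswith("GPU "))


def gpu_assignment(num_gpus: int) -> Dict[str, str]:
    """Values for INFERENCE_GPUS / API_GPUS, quoted as in Step 0."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    api_device = 1 if num_gpus > 1 else 0
    return {"INFERENCE_GPUS": '"device=0"', "API_GPUS": f'"device={api_device}"'}


def step0_commands(env: Dict[str, str]) -> List[List[str]]:
    """The directory and network setup commands from Step 0."""
    return [
        ["mkdir", "-p", env["HOST_IMAGES_DIR"], env["HOST_OUTPUT_DIR"]],
        ["docker", "network", "create", "--driver=bridge", "--internal", env["DOCKER_NETWORK"]],
    ]
```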

Step 1 - Build the `remote-inference-runner` container

docker build -t remote-inference-runner -f docker/Dockerfile.RemoteInferenceRunner .

Step 2 - Build your container

Note: When customizing the images for your own usage, do not edit the following or your subm
