Note: this project is unmaintained.

Transformer-based dialog models work better, and we recommend using them instead of the RNN-based CakeChat. See, for example, https://github.com/microsoft/DialoGPT

CakeChat: Emotional Generative Dialog System

CakeChat is a backend for chatbots that are able to express emotions via conversations.


CakeChat is built on Keras and Tensorflow.

The code is flexible and lets you condition the model's responses on an arbitrary categorical variable. For example, you can train your own persona-based neural conversational model[1] or create an emotional chatting machine[2].

Main requirements

python 3.5.2
tensorflow 1.12.2
keras 2.2.4

Network architecture and features
Quick start
Setup for training and testing
1. Docker
  1. CPU-only setup
  2. GPU-enabled setup
2. Manual setup
Getting the pre-trained model
Training data
Training the model
Running CakeChat server
Repository overview
1. Important tools
2. Important configuration settings
Example use cases
References
Credits & Support
License

Network architecture and features

Network architecture

Model:

Hierarchical Recurrent Encoder-Decoder (HRED) architecture for handling deep dialog context[3].
Multilayer RNN with GRU cells. The first layer of the utterance-level encoder is always bidirectional. By default, CuDNNGRU implementation is used for ~25% acceleration during inference.
The thought vector is fed into the decoder at each decoding step.
Decoder can be conditioned on any categorical label, for example, emotion label or persona id.
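
To illustrate the last two points, here is a minimal sketch of how the per-step decoder input could be assembled. It assumes the thought vector and the condition embedding are simply concatenated to each token embedding; the function name `build_decoder_inputs` and the toy vector sizes are hypothetical, and the actual CakeChat layers may combine these inputs differently.

```python
from typing import List

Vector = List[float]


def build_decoder_inputs(
    token_embeddings: List[Vector],
    thought_vector: Vector,
    condition_embedding: Vector,
) -> List[Vector]:
    """Return one input vector per decoding step.

    Each step gets its own token embedding plus the same thought vector and
    the same condition embedding, concatenated into a single vector.
    (Concatenation is an assumption made for this sketch.)
    """
    return [
        token_emb + thought_vector + condition_embedding
        for token_emb in token_embeddings
    ]


# Toy sizes: 3 decoding steps, 2-dim token embeddings,
# 4-dim thought vector, 1-dim condition embedding (e.g. for "joy").
tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
thought = [1.0, 1.0, 1.0, 1.0]
joy = [0.9]

decoder_inputs = build_decoder_inputs(tokens, thought, joy)
print(len(decoder_inputs), len(decoder_inputs[0]))  # 3 7
```

Because the same thought vector and condition embedding are repeated at every step, the decoder sees the dialog context and the requested label throughout generation, not only at the first token.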

Word embedding layer:

May be initialized using w2v model trained on your corpus.
Embedding layer may be either fixed or fine-tuned along with other weights of the network.

Decoding

4 different response generation algorithms: "sampling", "beamsearch", "sampling-reranking" and "beamsearch-reranking". Reranking of the generated candidates is performed according to the log-likelihood or MMI-criteria[4]. See configuration settings description for details.
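
As a rough illustration of reranking, the sketch below scores candidates with the anti-language-model form of MMI, `log P(response | context) - weight * log P(response)`. The function name, the `mmi_weight` parameter and the toy scores are all hypothetical stand-ins for real model outputs, and CakeChat's own scoring may differ in detail.

```python
from typing import Callable, List, Tuple


def rerank_candidates(
    candidates: List[str],
    log_p_given_context: Callable[[str], float],
    log_p_unconditional: Callable[[str], float],
    mmi_weight: float = 0.0,
) -> List[Tuple[str, float]]:
    """Sort candidates by score, best first.

    score = log P(response | context) - mmi_weight * log P(response)
    With mmi_weight = 0.0 this is plain log-likelihood reranking.
    """
    scored = [
        (c, log_p_given_context(c) - mmi_weight * log_p_unconditional(c))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical log-probabilities standing in for real model outputs.
conditional = {"i don't know": -2.0, "i'm fine, thanks!": -2.5}
unconditional = {"i don't know": -1.0, "i'm fine, thanks!": -6.0}
candidates = list(conditional)

by_likelihood = rerank_candidates(candidates, conditional.get, unconditional.get)
by_mmi = rerank_candidates(
    candidates, conditional.get, unconditional.get, mmi_weight=0.5
)
print(by_likelihood[0][0])  # i don't know
print(by_mmi[0][0])         # i'm fine, thanks!
```

In this toy example the generic reply wins under pure log-likelihood, while the MMI penalty on frequent responses promotes the more specific one.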

Metrics:

Perplexity
n-gram distinct metrics adjusted to the samples size[4].
Lexical similarity between samples of the model and some fixed dataset. Lexical similarity is a cosine distance between the TF-IDF vector of the responses generated by the model and the TF-IDF vector of the tokens in the dataset.
Ranking metrics: mean average precision and mean recall@k.[5]
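
The first two metrics can be sketched in a few lines. This is a textbook version under stated assumptions: perplexity is taken as the exponent of the negative mean natural-log probability per token, and distinct-n is the plain ratio of unique to total n-grams. The sample-size adjustment mentioned above is not reproduced, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def distinct_n(responses: List[List[str]], n: int) -> float:
    """Unique n-grams divided by total n-grams across all responses."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in responses
        for i in range(len(tokens) - n + 1)
    ]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# A model that assigns probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))

responses = [["i", "am", "fine"], ["i", "am", "good"]]
print(distinct_n(responses, 2))  # 3 unique bigrams out of 4 -> 0.75
```

Lower perplexity means the model is less surprised by the reference tokens, while a higher distinct-n value indicates more varied responses.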

Quick start

If you are familiar with Docker, here is the easiest way to run a pre-trained CakeChat model as a server. You may need to run the following commands with sudo.

CPU version:

docker pull lukalabs/cakechat:latest && \
docker run --name cakechat-server -p 127.0.0.1:8080:8080 -it lukalabs/cakechat:latest bash -c "python bin/cakechat_server.py"

GPU version:

docker pull lukalabs/cakechat-gpu:latest && \
nvidia-docker run --name cakechat-gpu-server -p 127.0.0.1:8080:8080 -it lukalabs/cakechat-gpu:latest bash -c "CUDA_VISIBLE_DEVICES=0 python bin/cakechat_server.py"

That's it! Now test your CakeChat server by running the following command on your host machine:

python tools/test_api.py -f localhost -p 8080 -c "hi!" -c "hi, how are you?" -c "good!" -e "joy"

The response dict may look like this:

{'response': "I'm fine!"}
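
If you prefer to call the server from your own code, the sketch below builds an equivalent HTTP request with the standard library only. The endpoint path `/cakechat_api/v1/actions/get_response` and the payload keys `context` and `emotion` are assumptions inferred from the `-c` and `-e` flags of tools/test_api.py, so verify them against the server code before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed endpoint path; confirm it against the server code.
API_PATH = "/cakechat_api/v1/actions/get_response"


def build_request(host: str, port: int, context: List[str],
                  emotion: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the CakeChat server."""
    payload = {"context": context}  # assumed key names
    if emotion is not None:
        payload["emotion"] = emotion
    return urllib.request.Request(
        url="http://{}:{}{}".format(host, port, API_PATH),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_response(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body of the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


request = build_request(
    "localhost", 8080, ["hi!", "hi, how are you?", "good!"], "joy"
)
# With a server running on localhost:8080:
# print(get_response(request))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.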

Setup for training and testing

Docker

Docker is the easiest way to set up the environment and install all the dependencies for training and testing.

CPU-only setup

Note: We strongly recommend using a GPU-enabled environment for training the CakeChat model. Inference can run on both GPUs and CPUs.

Install Docker.
Pull a CPU-only docker image from dockerhub:

docker pull lukalabs/cakechat:latest

Run a docker container in the CPU-only environment:

docker run --name <YOUR_CONTAINER_NAME> -it lukalabs/cakechat:latest

GPU-enabled setup

Install nvidia-docker for the GPU support.
Pull GPU-enabled docker image from dockerhub:

docker pull lukalabs/cakechat-gpu:latest

Run a docker container in the GPU-enabled environment:

nvidia-docker run --name <YOUR_CONTAINER_NAME> -it lukalabs/cakechat-gpu:latest

That's it! Now you can train your model and chat with it. See the corresponding section below for further instructions.

Manual setup

If you don't want to deal with docker, you can install all the requirements manually:

pip install -r requirements.txt -r requirements-local.txt

NB:

We recommend installing the requirements inside a virtualenv to prevent messing with your system packages.

Getting the pre-trained model

You can download our pre-trained model weights by running python tools/fetch.py.

The params of the pre-trained model are the following:

context size 3 (<speaker_1_utterance>, <speaker_2_utterance>, <speaker_1_utterance>)
each encoded utterance contains up to 30 tokens
the decoded utterance contains up to 32 tokens
both encoder and decoder have 2 GRU layers with 768 hidden units each
first layer of the encoder is bidirectional
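
To make the context and length limits concrete, here is a small sketch that keeps the last three utterances of a dialog and cuts each one to 30 tokens. The helper name `prepare_context` is hypothetical, whitespace splitting is an assumption (CakeChat uses its own tokenizer), and whether over-long utterances are truncated or handled some other way is also an assumption.

```python
from typing import List


def prepare_context(utterances: List[str], context_size: int = 3,
                    max_tokens: int = 30) -> List[List[str]]:
    """Keep the last `context_size` utterances, each cut to `max_tokens` tokens."""
    recent = utterances[-context_size:]
    # Whitespace tokenization is a simplification for this sketch.
    return [utterance.split()[:max_tokens] for utterance in recent]


dialog = ["hello", "hi!", "how are you?", "good, and you?"]
context = prepare_context(dialog)
print(context)
# [['hi!'], ['how', 'are', 'you?'], ['good,', 'and', 'you?']]
```

Anything older than the last three utterances is invisible to a model with these parameters, which is worth keeping in mind when designing a chat frontend.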

Training data

The model was trained on a preprocessed Twitter corpus with ~50 million dialogs (11 GB of text data). To clean up the corpus, we removed:

URLs, retweets and citations;
mentions and hashtags that are not preceded by regular words or punctuation marks;
messages that contain more than 30 tokens.
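
A comparable cleanup for your own corpus might look like the sketch below. It is not the authors' preprocessing code: the regular expressions, the `RT ` retweet marker, whitespace token counting and the reading of the mention/hashtag rule as "strip tags at the very start of a message" are all assumptions, and citations are not handled.

```python
import re
from typing import Optional

URL_RE = re.compile(r"https?://\S+")
# Mentions/hashtags at the very start of a message (one reading of the rule).
LEADING_TAGS_RE = re.compile(r"^(?:\s*[@#]\w+)+\s*")
MAX_TOKENS = 30


def clean_message(text: str) -> Optional[str]:
    """Return the cleaned message, or None if it should be dropped."""
    if text.startswith("RT "):  # assumed retweet marker
        return None
    text = URL_RE.sub("", text)
    text = LEADING_TAGS_RE.sub("", text)
    text = " ".join(text.split())  # collapse leftover whitespace
    if not text or len(text.split()) > MAX_TOKENS:
        return None
    return text


print(clean_message("@bob #hi check this https://t.co/abc out"))  # check this out
print(clean_message("I like #python"))  # hashtag after regular words is kept
print(clean_message("RT @x: hello"))    # None
```

Returning `None` for dropped messages lets the caller decide whether to discard the whole dialog or only the offending utterance.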

We used our emotion classifier to label each utterance with one of the following 5 emotions: "neutral", "joy", "anger", "sadness", "fear", and used these labels during training. To label your own corpus with emotions, you can use, for example, the DeepMoji tool.

Unfortunately, due to Twitter's privacy policy, we are not allowed to provide our dataset. You can train a dialog model on any text conversational dataset available to you. A great overview of existing conversational datasets can be found here: https://breakend.github.io/DialogDatasets/

The training data should be a text file in which each line is valid JSON representing a list of dialog utterances. Refer to our dummy train dataset to see the necessary file structure. Replace this dummy corpus with your data before training.
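
As a sketch of this one-dialog-per-line layout, the snippet below writes and re-reads a tiny corpus. The per-utterance fields `text` and `condition` are an assumption, not something stated above, so check the dummy train dataset for the exact keys. The helper names and the temporary output path are hypothetical.

```python
import json
import os
import tempfile

# Assumed record layout: each utterance is a dict with "text" and "condition".
dialogs = [
    [
        {"text": "hi! how are you?", "condition": "neutral"},
        {"text": "great, thanks!", "condition": "joy"},
    ],
    [
        {"text": "i lost my keys again", "condition": "sadness"},
        {"text": "oh no, where did you last see them?", "condition": "neutral"},
    ],
]


def write_corpus(path, dialogs):
    """Write one JSON list of utterances per line."""
    with open(path, "w", encoding="utf-8") as corpus:
        for dialog in dialogs:
            corpus.write(json.dumps(dialog) + "\n")


def read_corpus(path):
    """Parse every non-empty line back into a list of utterances."""
    with open(path, encoding="utf-8") as corpus:
        return [json.loads(line) for line in corpus if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "train_processed_dialogs.txt")
write_corpus(path, dialogs)
restored = read_corpus(path)
print(len(restored), "dialogs written to", path)
```

Reading the file back before training is a cheap way to catch lines that are not valid JSON.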

Training the model

There are two options:

training from scratch
fine-tuning the provided trained model

The first approach is less restrictive: you can use any training data you want and set any config params of the model. However, be aware that you'll need enough training data (at least ~50 MB), one or more GPUs and enough patience (days) to get good responses from the model.

The second approach is limited by the choice of config params of the pre-trained model – see cakechat/config.py for the complete list. If the default params are suitable for your task, fine-tuning should be a good option.

Fine-tuning the pre-trained model on your data

Fetch the pre-trained model from Amazon S3 by running python tools/fetch.py.
Put your training text corpus in data/corpora_processed/train_processed_dialogs.txt. Make sure that your dataset is large enough; otherwise your model risks overfitting the data and the results will be poor.
Run python tools/train.py.
1. The script will look for the pre-trained model weights in results/nn_models, the full path is inferred from the set of config params.
2. If you want to initialize the model weights from a custom file, you can specify the path to the file via -i argument, for example, python tools/train.py -i results/nn_models/my_saved_weights/model.current.
3. Don't forget to set CUDA_VISIBLE_DEVICES=<GPU_ID> environment variable (with <GPU_ID> as in output of nvidia-smi command) if you want to use GPU. For example, CUDA_VISIBLE_DEVICES=0 python tools/train.py will run the train process on the 0-th GPU.
4. Use parameter -s to train the model on a subset of the first N samples of your training data to speed up preprocessing for debugging. For example, run python tools/train.py -s 1000 to train on the first 1
