CTRL - A Conditional Transformer Language Model for Controllable Generation
Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher

Updates
Apr 20, 2020
We are adding a model card for CTRL! Please reach out if you have any questions about it.
Oct 31, 2019
Adding functionality to convert a model from TF to HuggingFace/Transformers in response to a request. To convert the checkpoint, simply install transformers via `pip install transformers` and run `python -u convert_tf_to_huggingface_pytorch.py --tf <path_to_tensorflow_data_checkpoint> --pytorch <path_to_where_you_want_to_store_pytorch_checkpoint>`
Then, to use this in HuggingFace:
```sh
# create folder and contents for HuggingFace/Transformers
mkdir custom_ctrl_model
cd custom_ctrl_model
mv <path_to_pytorch_checkpoint_from_above> .
wget -O config.json https://storage.googleapis.com/sf-ctrl/pytorch/ctrl-config.json
wget -O merges.txt https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt
wget -O vocab.json https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json

# run
python examples/run_generation.py --model_type ctrl --model_name <path_to_custom_ctrl_model>/ --temperature 0 --repetition 1.2
```
Oct 21, 2019
CTRL is now in huggingface/transformers!
You can simply follow the installation instructions and run:
`python examples/run_generation.py --model_type ctrl --model_name ctrl --temperature 0 --repetition 1.2`
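The two flags map onto the decoding scheme from Section 4.1 of the paper: `--temperature 0` selects greedy decoding, and `--repetition 1.2` is the penalized-sampling constant θ that discounts the scores of already-generated tokens. A toy sketch of that selection rule in plain Python (the logits and four-token vocabulary are made up; the paper's formulation divides the score by θ, illustrated here with positive logits):

```python
def next_token(logits, generated, theta=1.2):
    """Greedy next-token choice with CTRL-style penalized sampling:
    logits of tokens already present in `generated` are divided by
    theta (> 1), discouraging repetition (see Section 4.1 of the paper)."""
    penalized = [x / theta if i in generated else x
                 for i, x in enumerate(logits)]
    # temperature 0 means greedy decoding: take the argmax, no sampling
    return max(range(len(penalized)), key=penalized.__getitem__)

# toy vocabulary of 4 tokens; token 2 has the highest raw logit
logits = [1.0, 0.5, 3.0, 2.8]
print(next_token(logits, generated=set()))  # 2: unpenalized argmax
print(next_token(logits, generated={2}))    # 3: 3.0/1.2 = 2.5 < 2.8
```

With the penalty applied, a token the model has already emitted must beat the runner-up by a factor of θ to be chosen again, which is what suppresses degenerate repetition at temperature 0.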
Sep 25, 2019
Two more additions:
- We add the code to fine-tune the model on a custom dataset in the `training_utils` folder. Please refer to the README within the folder for details and example usage.
- You can get a 36-layer model from `gs://sf-ctrl/seqlen256_36layers_v0.ckpt/`; the generation of this model is markedly worse than the 48-layer (base) model but still quite coherent.
Sep 23, 2019
The repo now supports (experimental) inference in PyTorch; Colaboratory: https://colab.research.google.com/drive/1nDh3ayRPJGK5ciPO2D3TFkYZFqclBWHY. Simply install PyTorch via `pip install torch` and run `python pytorch_generation.py` with the same flags as the base generation.py script, with one exception: unlike the base version, here the model_path requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example).
On the first run, the code converts the weights from TensorFlow and creates a loadable checkpoint for easier subsequent loading. You still need TensorFlow installed for this first step.
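That convert-once-then-cache flow can be sketched as follows; `convert_tf_weights` and the cache file name are hypothetical stand-ins for the real conversion logic in pytorch_generation.py:

```python
import os
import pickle

def convert_tf_weights(path):
    # hypothetical stand-in for the real (slow) TF -> PyTorch conversion,
    # which is the only step that requires TensorFlow to be installed
    return {"source": path, "layers": 48}

def load_weights(tf_ckpt_path, cache_path="converted.pkl"):
    """Return model weights, converting from the TF checkpoint only on
    the first call; later calls load the cached converted checkpoint."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    weights = convert_tf_weights(tf_ckpt_path)
    with open(cache_path, "wb") as f:
        pickle.dump(weights, f)
    return weights
```

After the first call writes the cache file, subsequent loads skip the conversion entirely, which is why only the first run needs TensorFlow.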
Sep 19, 2019
You should now be able to run inference on K80/T4/P100/similar GPUs using the lower_memory branch. We quantized certain weights to fp16, which reduces memory usage. Simply clone the repo and `git checkout lower_memory`. Here is a Colaboratory link that demonstrates this functionality: https://colab.research.google.com/drive/1hVveBQShDru1Mjnhe4C21uQv4A2eH1tV
This functionality is still being tested; please file GitHub issues if you see anything aberrant. We still recommend using the full model if possible. Once the functionality has been sufficiently tested, we will update the repo and merge into master.
Two quick notes: (1) unlike the base version, here the model_path requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example); (2) the first generation is slow because of overhead in setting up the model, but subsequent ones should be fast.
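The memory saving from the fp16 quantization is easy to check directly; a small NumPy illustration (NumPy arrays stand in here for the model's weight tensors; the shape and values are arbitrary):

```python
import numpy as np

# stand-in weight matrix; real CTRL layers are much larger
w32 = np.random.randn(1024, 1024).astype(np.float32)
w16 = w32.astype(np.float16)  # quantize to half precision

# fp16 stores the same weights in exactly half the bytes
print(w32.nbytes, w16.nbytes)

# fp16 is lossy: each element picks up a small rounding error
print(float(np.abs(w32 - w16.astype(np.float32)).max()))
```

Halving the footprint of the quantized weights is what brings the model within reach of 12-16 GB cards like the K80 and T4, at the cost of the small rounding error above.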
Introduction
Large-scale language models show promising text generation capabilities, but users cannot easily control this generation process. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that specify domain, subdomain, entities, relationships between entities, dates, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation.
Paper link: https://arxiv.org/abs/1909.05858
Blog link: https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/
The code currently supports two functionalities:
- Generating from a trained model. Two models are available for download: one with a sequence length of 256 and another with a sequence length of 512. Both are trained with word-level vocabularies and, through a sliding-window approach, can generate well beyond their trained sequence lengths.
- Source attribution: given a prompt, print the perplexity of the prompt conditioned on each domain control code (see Section 5 of the paper).
Please refer to the argument flags for more details regarding the options available for either.
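Source attribution boils down to a perplexity ranking: for each domain control code, score the prompt's tokens conditioned on that code and sort codes from lowest perplexity (most plausible source domain) to highest. A toy sketch with made-up per-token probabilities (a real run would obtain these from the model; the probabilities below are purely illustrative):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given its per-token probabilities:
    exp of the average negative log-likelihood."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# hypothetical P(token | context, control code) for a 3-token prompt,
# scored once per candidate domain control code
scores = {
    "Links":   [0.20, 0.10, 0.30],
    "Horror":  [0.02, 0.05, 0.01],
    "Reviews": [0.10, 0.08, 0.15],
}
ranked = sorted(scores, key=lambda c: perplexity(scores[c]))
for code in ranked:  # lowest perplexity (most plausible source) first
    print(f"{code}: {perplexity(scores[code]):.1f}")
```

Because perplexity is a monotone function of the average negative log-likelihood, ranking by perplexity is equivalent to ranking by how likely each domain model found the prompt.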
Table of Contents
- Citation
- License
- Questions for Deliberation
- Usage
- Sample Generations
- Sample Source Attributions
- FAQs
- Get Involved
Citation
@article{keskarCTRL2019,
title={{CTRL - A Conditional Transformer Language Model for Controllable Generation}},
author={Keskar, Nitish Shirish and McCann, Bryan and Varshney, Lav and Xiong, Caiming and Socher, Richard},
journal={arXiv preprint arXiv:1909.05858},
year={2019}
}
License
The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:
This software should not be used to promote or profit from:
violence, hate, and division,
environmental destruction,
abuse of human rights, or
the destruction of people's physical and mental health.
We encourage users of this software to tell us about the applications in which they are putting it to use by emailing ctrl-monitoring@salesforce.com, and to use appropriate documentation when developing high-stakes applications of this model.
Questions for Deliberation
We consulted extended members of the AI community in the responsible publication of this model. In particular, a preview of a Partnership on AI (PAI) project relating to AI research publication norms was considered prior to the release of this work. While this PAI project is as-yet unpublished, it is informed by companies, organizations, and people differently affected by artificial intelligence and presents key considerations to evaluate before publishing potentially high-impact research.
The questions referenced from the early draft of the PAI project included:
- How do you envision your research being used in the world? Who will use it? How much expertise is required to use it?
- Why would they be motivated to replicate / productionize your work?
- How would a science fiction author turn your research into a dystopian story?
- What is the worst way someone could use your research finding, given no resource constraints?
- What are the historical patterns of misuse or application in this area? How can the research be made more robust against such misuse?
- Which populations or communities will this technology negatively affect, deployed in the scenarios you envision? Will some groups be disproportionately affected?
Usage
Here are the steps to get generating:
- Install the dependencies
This code relies on TensorFlow 1.14 and fastBPE.
TensorFlow can be installed via `pip install tensorflow[-gpu]==1.14`. fastBPE installation instructions can be found in its GitHub repository (https://github.com/glample/fastBPE). We highly recommend experimenting within a virtualenv or Docker image.
For inference in PyTorch, please see the Sep 23, 2019 update at the top of this README. If you use PyTorch, you can skip Step 2.
- Patch `/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py` (or equivalent, if installed elsewhere) by running
`patch -b <path_to_tensorflow_estimator_package>/python/estimator/keras.py estimator.patch`
We recommend a virtualenv or Docker image precisely because the workflow involves patching a TensorFlow file to support some custom functionality. This step is not optional; skipping it will cause errors (irrespective of device).
If you run into OOM issues because of GPU memory exhaustion, please use the lower_memory branch. See the (Sep 19, 2019) update at the top of this README for details.
- Get the model files from `gs://sf-ctrl/seqlen256_v1.ckpt/` or `gs://sf-ctrl/seqlen512_v1.ckpt/`.
A 36-layer model is also available at gs://sf-ctrl/seqlen256_36layers_v0.ckpt/.
The model architecture is identical for both checkpoints. The former is trained with a shorter training sequence length (256) while the latter is trained with a longer one (512). We plan to update the models (with the appropriate version tags) as we continue to train them longer and on more data. Our current recommendation is to use the 256_v1 model unless you have a strong reason not to. If you have no preference for domain, Links is always a good first choice.
With gsutil installed, you can simply run `gsutil -m cp -r gs://sf-ctrl/seqlen256_v1.ckpt/ .` to copy the model checkpoint over.
Without gsutil, you can follow the route recommended at https://github.com/salesforce/ctrl/issues/7#issuecomment-531303214
- Run the generation script `generation.py`.
