# Crosslingual Generalization through Multitask Finetuning

This repository provides an overview of all components used for the creation of BLOOMZ & mT0 and xP3 introduced in the paper Crosslingual Generalization through Multitask Finetuning. Link to 25min video on the paper by Samuel Albanie; Link to 4min video on the paper by Niklas Muennighoff.
<!-- TOC --> <!-- /TOC -->

## Data
<table>
  <tr>
    <th>Name</th>
    <th>Explanation</th>
    <th>Example models</th>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/Muennighoff/xP3x>xP3x</a></td>
    <td>Mixture of 17 tasks in 277 languages with English prompts</td>
    <td>WIP - Join us at Project Aya @<a href=https://cohere.for.ai/>C4AI</a> to help!</td>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/bigscience/xP3>xP3</a></td>
    <td>Mixture of 13 training tasks in 46 languages with English prompts</td>
    <td><a href=https://huggingface.co/bigscience/bloomz>BLOOMZ</a> & <a href=https://huggingface.co/bigscience/mt0-xxl>mT0-13B</a></td>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a></td>
    <td>Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English)</td>
    <td><a href=https://huggingface.co/bigscience/bloomz-mt>BLOOMZ-MT</a> & <a href=https://huggingface.co/bigscience/mt0-xxl-mt>mT0-13B-MT</a></td>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/bigscience/xP3all>xP3all</a></td>
    <td>xP3 + our evaluation datasets adding an additional 3 tasks for a total of 16 tasks in 46 languages with English prompts</td>
    <td></td>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/bigscience/xP3megds>xP3megds</a></td>
    <td><a href=https://github.com/bigscience-workshop/Megatron-DeepSpeed>Megatron-DeepSpeed</a> processed version of xP3</td>
    <td><a href=https://huggingface.co/bigscience/bloomz>BLOOMZ</a></td>
  </tr>
  <tr>
    <td><a href=https://huggingface.co/datasets/Muennighoff/P3>P3</a></td>
    <td>Repreprocessed version of the English-only <a href=https://huggingface.co/datasets/bigscience/P3>P3</a> with 8 training tasks</td>
    <td><a href=https://huggingface.co/bigscience/bloomz-p3>BLOOMZ-P3</a> & <a href=https://huggingface.co/bigscience/mt0-xxl-p3>mT0-13B-P3</a></td>
  </tr>
</table>
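To get a feel for the data format, here is a minimal sketch for peeking at xP3 with Hugging Face `datasets`. The `"en"` configuration name is an assumption based on the dataset's per-language layout; check the dataset card for the exact configuration names.

```python
# Minimal sketch (not from the paper): stream a few xP3 examples to inspect the format.
# The "en" config name is an assumption based on the per-language layout of the dataset;
# see the dataset card for the exact configuration names.
from datasets import load_dataset

ds = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
for example in ds.take(2):
    print(example["inputs"])   # prompt text fed to the model
    print(example["targets"])  # expected completion
```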
<table> <tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3>xP3</a>. Recommended for prompting in English. </tr> <tr> <td>Parameters</td> <td>300M</td> <td>580M</td> <td>1.2B</td> <td>3.7B</td> <td>13B</td> <td>560M</td> <td>1.1B</td> <td>1.7B</td> <td>3B</td> <td>7.1B</td> <td>176B</td> </tr> <tr> <td>Finetuned Model</td> <td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td> <td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td> <td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td> <td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td> <td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td> </tr> </tr> <tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English.</th> </tr> <tr> <td>Finetuned Model</td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td> </tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/Muennighoff/P3>P3</a>. Released for research purposes only. Strictly inferior to above models!</th> </tr> <tr> <td>Finetuned Model</td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td> </tr> <th colspan="12">Original pretrained checkpoints. Not recommended.</th> <tr> <td>Pretrained Model</td> <td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td> <td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td> <td><a href=https://huggingface.co/google/mt5-large>mt5-large</a></td> <td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td> <td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td> <td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td> <td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td> <td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td> <td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td> <td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td> <td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td> </tr> </table>Create xP3(x)
## Create xP3(x)

We have processed & uploaded xP3. If you want to recreate it, follow these steps:
- Get promptsource: For xP3mt `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git`, for xP3 `git clone -b tr13 https://github.com/Muennighoff/promptsource.git` & install `cd promptsource; pip install -e .`
- Get packages: `pip install -q datasets iso-639`
- Get the creation script & edit it if necessary:
    - For xP3mt, set `USE_ENGLISH_PROMPTS = False` in the beginning
    - For xP3, set `USE_ENGLISH_PROMPTS = True` in the beginning
- Run the script, such as via `python prepare_xp3.py` or a SLURM script (a conceptual sketch of what the script does follows after this list)
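The heavy lifting in the creation script is applying promptsource templates to Hugging Face datasets to produce (input, target) text pairs. A conceptual sketch only, not the actual `prepare_xp3.py` logic; the dataset and template names here are illustrative choices:

```python
# Conceptual sketch: how a promptsource template turns a dataset example
# into a prompted (input, target) pair, as done at scale by the creation script.
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Template collection for a dataset/subset; "super_glue"/"copa" and the
# "best_option" template name are illustrative, not a fixed part of the xP3 config.
templates = DatasetTemplates("super_glue", "copa")
print(templates.all_template_names)

template = templates["best_option"]
example = load_dataset("super_glue", "copa", split="validation")[0]

prompted_input, target = template.apply(example)
print(prompted_input)
print(target)
```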
For the new extension of xP3, xP3x, the process is largely the same except:
- Install the `xp3x` branch instead, i.e. `pip install git+https://github.com/Muennighoff/promptsource.git@xp3x`
- The creation script is in this repository & named `create_xp3x.py`.
xP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).
## Train models
### BLOOMZ
- Download the pretrained model checkpoint, which is of shape PP=12, TP=4, DP=4. If you'd like to reshape the model you will also need to download the universal checkpoint. If you want to continue finetuning, you should use our finetuned checkpoint, which is of shape PP=72, TP=1, DP=4.
- Setup the training code: `git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed` & follow its setup guide to create an environment with the necessary packages.
- Download the Megatron-DeepSpeed processed xP3megds or repreprocess it for Megatron-DeepSpeed yourself by downloading xP3, removing the `merged_{lang}.jsonl` files & preprocessing it using the script here.
- Setup & run the training script: We use SLURM scripts available at bigscience-workshop/bigscience/train/tr13-mtf and referred to as `xp3capmixnewcodelonglossseq`. E.g. this is the script launched to train bloomz. Important parts of the script to modify are:
    - `#SBATCH` variables, such as nodes, gpus, time, etc. Our SLURM guide is here
    - `source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0` to point to your own conda environment set up via Megatron-DeepSpeed
    - PATH environment variables, notably `TRAIN_DATA_PATH` & `VALID_DATA_PATH`, which point to files listing your processed training and validation data. We provide our files in this repository (`xp3capmixnewcodelong_train.txt` & `xp3capmixnewcodelong_validation.txt`), but you will likely want to change the paths inside. The percentages per language are based on how much each language makes up in xP3, with code being slightly upsampled.
    - `PP_SIZE=72`, `TP_SIZE=1` & batch size & co specifying the parallelism and batch-size layout (a small arithmetic sketch follows after this list)
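For a rough sense of scale, the parallelism degrees multiply out to a GPU count. A small sketch, assuming the checkpoint shape mirrors the GPU topology actually used for training (it may differ after reshaping):

```python
# Rough arithmetic only; assumes the checkpoint shape (PP, TP, DP) matches the GPU
# topology used for training, which is an assumption and may differ after reshaping.
pp_size, tp_size, dp_size = 72, 1, 4       # finetuned BLOOMZ checkpoint shape from the steps above

gpus_per_replica = pp_size * tp_size       # GPUs holding one copy of the model layers: 72
total_gpus = gpus_per_replica * dp_size    # across data-parallel replicas: 288
print(gpus_per_replica, total_gpus)
```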
