ALMA: Advanced Language Model-based translator
State-of-the-art LLM-based translation models.
<p align="center">
<a href="LICENSE" alt="MIT License"><img src="https://img.shields.io/badge/license-MIT-FAD689.svg" /></a>
<a href="https://arxiv.org/abs/2309.11674" alt="ALMA paper"><img src="https://img.shields.io/badge/ALMA-Paper-D9AB42" /></a>
<a href="https://arxiv.org/abs/2401.08417" alt="ALMA-R paper"><img src="https://img.shields.io/badge/ALMA--R-Paper-F6C555" /></a>
<a href="https://arxiv.org/pdf/2410.03115" alt="X-ALMA paper"><img src="https://img.shields.io/badge/X--ALMA-Paper-F3B425" /></a>
<a href="https://www.clsp.jhu.edu/" alt="jhu"><img src="https://img.shields.io/badge/Johns_Hopkins_University-BEC23F" /></a>
<a href="https://www.microsoft.com/en-us/research/" alt="MSlogo"><img src="https://img.shields.io/badge/Microsoft-B1B479?logo=microsoft" /></a>
<a href="https://twitter.com/fe1ixxu"><img src="https://img.shields.io/twitter/follow/haoranxu?style=social&logo=twitter" alt="follow on Twitter"></a>
</p>

ALMA has three generations: ALMA (1st), ALMA-R (2nd), and X-ALMA (3rd, NEW!).
ALMA (Advanced Language Model-based TrAnslator) is a many-to-many LLM-based translation model that adopts a new training paradigm: it is first fine-tuned on monolingual data and then further optimized on high-quality parallel data. This two-step fine-tuning process delivers strong translation performance.
ALMA-R builds upon the ALMA models with further LoRA fine-tuning using our proposed Contrastive Preference Optimization (CPO) instead of the supervised fine-tuning used in ALMA. CPO fine-tuning requires our triplet preference data for preference learning. ALMA-R can now match or even exceed GPT-4 and the WMT competition winners! (A hedged sketch of the CPO objective is given after this overview.)
X-ALMA (NEW!) extends ALMA(-R) from 6 languages to 50 languages and delivers top-tier performance across these 50 diverse languages, regardless of their resource levels. This is achieved with a plug-and-play language-specific module architecture and a carefully designed 5-step training recipe featuring the novel Adaptive Rejection Preference Optimization method.
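For a concrete picture of what CPO optimizes, here is a minimal PyTorch-style sketch of a CPO-like objective on a batch of preference pairs, written from the description in the CPO paper (a reference-free preference term plus a likelihood term on the preferred translation). The function and tensor names and the `beta` value are illustrative assumptions, not the repository's actual training code.

```python
import torch
import torch.nn.functional as F

def cpo_like_loss(logp_chosen: torch.Tensor,
                  logp_rejected: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """Hedged sketch of a CPO-style objective.

    logp_chosen / logp_rejected: per-sequence log-probabilities of the preferred
    and dis-preferred translations under the current policy model. No frozen
    reference model is required, which is what makes CPO cheaper than DPO.
    """
    # Preference term: rank the preferred translation above the dis-preferred one.
    prefer = -F.logsigmoid(beta * (logp_chosen - logp_rejected)).mean()
    # NLL term: keep the likelihood of the preferred translation high.
    nll = -logp_chosen.mean()
    return prefer + nll
```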
Old ALMA Repo:
News 🌟
⭐ Jan. 22 2025 X-ALMA has been accepted at ICLR 2025!
⭐ Oct. 6 2024 X-ALMA is out! Please find the paper here and models & datasets here.
⭐ Jun. 20 2024 We want to give a shout out to SimPO, which shares a similar reference-free preference learning framework with CPO but in a more stable manner due to its special length normalization and target reward margin. The most exciting thing is that CPO and SimPO can potentially be used together! Learn more about CPO-SimPO!
⭐ May 1 2024 The CPO paper has been accepted at ICML 2024!
⭐ Mar. 22 2024 The CPO method is now merged into Hugging Face TRL! See details here. (A hedged usage sketch follows this news list.)
⭐ Jan. 16 2024 ALMA-R is released! Please check more details in our new paper: Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation.
⭐ Jan. 16 2024 The ALMA paper, A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models, has been accepted at ICLR 2024! Check out more details here!
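As noted in the Mar. 22 news item above, CPO ships with Hugging Face TRL. The snippet below is a minimal, hedged sketch of how a `CPOTrainer` run is typically wired up; the model and dataset ids are placeholders, and keyword names (e.g. `tokenizer=` vs. `processing_class=`) differ across TRL versions, so consult the TRL documentation for the exact API.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# Placeholder ids -- substitute your own base model and preference dataset
# (the dataset needs "prompt", "chosen", and "rejected" columns).
model_name = "your-org/your-base-model"
dataset = load_dataset("your-org/your-preference-data", split="train")

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config = CPOConfig(output_dir="cpo-out", beta=0.1, per_device_train_batch_size=1)
trainer = CPOTrainer(model=model, args=config, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```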
⭐ Supports ⭐
- AMD and Nvidia Cards
- Data Parallel Evaluation
- Also supports LLaMA-1, LLaMA-2, OPT, Falcon, BLOOM, and MPT
- LoRA Fine-tuning
- Monolingual data fine-tuning, parallel data fine-tuning
Download ALMA Models and Dataset 🚀
We release seven translation models in the ALMA series:
Model checkpoints are released on Hugging Face:

| Models | Base Model Link | LoRA Link |
|:-------------:|:---------------:|:---------:|
| ALMA-7B (1st gen) | haoranxu/ALMA-7B | - |
| ALMA-7B-LoRA (1st gen) | haoranxu/ALMA-7B-Pretrain | haoranxu/ALMA-7B-Pretrain-LoRA |
| ALMA-7B-R (2nd gen) | haoranxu/ALMA-7B-R (LoRA merged) | - |
| ALMA-13B (1st gen) | haoranxu/ALMA-13B | - |
| ALMA-13B-LoRA (1st gen) | haoranxu/ALMA-13B-Pretrain | haoranxu/ALMA-13B-Pretrain-LoRA |
| ALMA-13B-R (2nd gen) | haoranxu/ALMA-13B-R (LoRA merged) | - |
| X-ALMA (NEW, 3rd gen) | X-ALMA Models | - |
Note that ALMA-7B-Pretrain and ALMA-13B-Pretrain are NOT translation models. They have only undergone the stage-1 monolingual fine-tuning (20B tokens for the 7B model and 12B tokens for the 13B model) and should be used in conjunction with their LoRA models.
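Since the Pretrain checkpoints must be paired with their LoRA weights, here is a minimal sketch of assembling the ALMA-13B-LoRA translation model with the standard `transformers` + `peft` loading path (dtype and device settings are illustrative):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stage-1 checkpoint: monolingual fine-tuning only, not yet a translator on its own.
base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
# Stage-2 LoRA weights learned on high-quality parallel data.
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain", padding_side="left")
```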
We have also provided the WMT'22 and WMT'23 translation outputs from ALMA-13B-LoRA and ALMA-13B-R in the outputs directory. These directories also include the outputs of our baselines and can be directly accessed and used for subsequent evaluations.
Datasets used by ALMA and ALMA-R are also released on Hugging Face (NEW!):

| Datasets | Train / Validation | Test |
|:-------------:|:---------------:|:---------:|
| ALMA Human-Written Parallel Data | Parallel train and validation data | WMT'22 |
| ALMA-R Triplet Preference Data | Triplet preference data | WMT'22 and WMT'23 |
| X-ALMA Data | 50-language parallel data and preference data | WMT'23 and FLORES-200 |
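To inspect the released data programmatically, a quick hedged sketch with the `datasets` library is below; the repository id and split name are assumptions based on the dataset names above (check the Hugging Face pages linked there for the exact ids):

```python
from datasets import load_dataset

# Hypothetical repo id and split -- verify the exact names on Hugging Face before use.
preference_data = load_dataset("haoranxu/ALMA-R-Preference", split="train")
print(preference_data[0])  # one example: a source sentence with preferred / dis-preferred translations
```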
A Quick Start
X-ALMA is designed with a plug-and-play architecture consisting of two components: a base model and language-specific modules, with each module dedicated to one group of languages. There are three ways to load X-ALMA for translation. Below is an example of translating "我爱机器翻译。" ("I love machine translation.") into English (X-ALMA should also be able to handle multilingual open-ended QA).
The first way: loading the merged model where the language-specific module has been merged into the base model (Recommended):
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from peft import PeftModel
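# X-ALMA covers its 50 languages with 8 language groups; each group has its own
# merged checkpoint on Hugging Face (haoranxu/X-ALMA-13B-Group1 ... Group8).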
GROUP2LANG = {
1: ["da", "nl", "de", "is", "no", "sv", "af"],
2: ["ca", "ro", "gl", "it", "pt", "es"],
3: ["bg", "mk", "sr", "uk", "ru"],
4: ["id", "ms", "th", "vi", "mg", "fr"],
5: ["hu", "el", "cs", "pl", "lt", "lv"],
6: ["ka", "zh", "ja", "ko", "fi", "et"],
7: ["gu", "hi", "mr", "ne", "ur"],
8: ["az", "kk", "ky", "tr", "uz", "ar", "he", "fa"],
}
LANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}
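# Pick the checkpoint for the group that contains the source language ("zh" is in group 6).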
group_id = LANG2GROUP["zh"]
model = AutoModelForCausalLM.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')
# Add the source sentence into the prompt template
prompt="Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"
# X-ALMA needs a chat template, but ALMA and ALMA-R do not.
chat_style_prompt = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()
# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
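If everything loads correctly, the decoded output should contain the English translation of the source sentence, i.e. something close to "I love machine translation."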
The second way: loading the base model and language-specific module (Recommended):
model = AutoModelForCausalLM.from_pretrained("haoranxu/X-ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto")
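# The remaining lines of this loading path are a hedged sketch, mirroring the merged-model
# example above: the per-group language-specific module is assumed to load as a PEFT adapter
# on top of the base model (verify the exact repository ids on the Hugging Face model cards).
model = PeftModel.from_pretrained(model, f"haoranxu/X-ALMA-13B-Group{group_id}")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/X-ALMA-13B-Pretrain", padding_side='left')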
