# mbrs

A library for minimum Bayes risk (MBR) decoding
## Installation

You can install from PyPI:

```shell
pip install mbrs
```
For developers, it can be installed from source:

```shell
git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
pip install ./
```
For uv users:

```shell
git clone https://github.com/naist-nlp/mbrs.git
cd mbrs/
uv sync
```
## Quick start

mbrs provides two interfaces: a command-line interface (CLI) and a Python API.
### Command-line interface

The CLI runs MBR decoding from the command line. Before running MBR decoding, you can generate hypothesis sentences with `mbrs-generate`:

```shell
mbrs-generate \
  sources.txt \
  --output hypotheses.txt \
  --lang_pair en-de \
  --model facebook/m2m100_418M \
  --num_candidates 1024 \
  --sampling eps --epsilon 0.02 \
  --batch_size 8 --sampling_size 8 --fp16 \
  --report_format rounded_outline
```
Beam search can also be used by replacing `--sampling eps --epsilon 0.02` with `--beam_size 10`.
Next, MBR decoding and other decoding methods can be executed with `mbrs-decode`. This example regards the hypothesis set as the pseudo-reference set.

```shell
mbrs-decode \
  hypotheses.txt \
  --num_candidates 1024 \
  --nbest 1 \
  --source sources.txt \
  --references hypotheses.txt \
  --output translations.txt \
  --report report.txt --report_format rounded_outline \
  --decoder mbr \
  --metric comet \
  --metric.model Unbabel/wmt22-comet-da \
  --metric.batch_size 64 --metric.fp16 true
```
You can pass the arguments using a configuration YAML file via the `--config_path` option. See the docs for details.
Finally, you can evaluate the score with `mbrs-score`:

```shell
mbrs-score \
  hypotheses.txt \
  --sources sources.txt \
  --references hypotheses.txt \
  --format json \
  --metric bleurt \
  --metric.batch_size 64 --metric.fp16 true
```
### Python API

This is an example of COMET-MBR via the Python API.

```python
from mbrs.metrics import MetricCOMET
from mbrs.decoders import DecoderMBR

SOURCE = "ありがとう"
HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]

# Setup COMET.
metric_cfg = MetricCOMET.Config(
    model="Unbabel/wmt22-comet-da",
    batch_size=64,
    fp16=True,
)
metric = MetricCOMET(metric_cfg)

# Setup MBR decoding.
decoder_cfg = DecoderMBR.Config()
decoder = DecoderMBR(decoder_cfg, metric)

# Decode by COMET-MBR.
# This example regards the hypotheses themselves as the pseudo-references.
# Args: (hypotheses, pseudo-references, source)
output = decoder.decode(HYPOTHESES, HYPOTHESES, source=SOURCE, nbest=1)

print(f"Selected index: {output.idx}")
print(f"Output sentence: {output.sentence}")
print(f"Expected score: {output.score}")
```
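Conceptually, MBR decoding selects the hypothesis whose expected utility over the (pseudo-)reference set is highest. The following is a toy illustration of that computation, not the library's API: `token_f1` is a hypothetical stand-in utility for a real metric such as COMET.

```python
def token_f1(hyp: str, ref: str) -> float:
    """Toy utility: F1 over whitespace-token sets (stand-in for a real metric)."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mbr_decode(hypotheses: list[str], references: list[str]) -> int:
    """Return the index of the hypothesis with the highest mean utility."""
    expected = [
        sum(token_f1(h, r) for r in references) / len(references)
        for h in hypotheses
    ]
    return max(range(len(hypotheses)), key=expected.__getitem__)

HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]
best = mbr_decode(HYPOTHESES, HYPOTHESES)  # hypotheses double as pseudo-references
print(HYPOTHESES[best])  # -> Thank you
```

Note that the consensus translation "Thank you" wins because it overlaps most with the other candidates, which is exactly the risk-minimization intuition behind MBR.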
## List of implemented methods

### Metrics

Currently, the following metrics are supported:

- BLEU (Papineni et al., 2002): `bleu`
- TER (Snover et al., 2006): `ter`
- chrF (Popović et al., 2015): `chrf`
- COMET (Rei et al., 2020): `comet`
- COMETkiwi (Rei et al., 2022): `cometkiwi`
- XCOMET (Guerreiro et al., 2023): `xcomet`
- XCOMET-lite (Larionov et al., 2024): `xcomet` with `--metric.model="myyycroft/XCOMET-lite"`
- BLEURT (Sellam et al., 2020): `bleurt` (thanks to @lucadiliello)
- MetricX (Juraska et al., 2023; Juraska et al., 2024): `metricx`
- BERTScore (Zhang et al., 2020): `bertscore`
### Decoders

The following decoding methods are implemented:

- N-best reranking: `rerank`
- MBR decoding: `mbr`

Specifically, the following methods of MBR decoding are included:

- Expectation estimation:
  - Monte Carlo estimation (Eikema and Aziz, 2020; Eikema and Aziz, 2022)
  - Model-based estimation (Jinnai et al., 2024): `--reference_lprobs` option
- Efficient methods:
  - Confidence-based pruning (Cheng and Vlachos, 2023): `pruning_mbr`
  - Reference aggregation (DeNero et al., 2009; Vamvas and Sennrich, 2024): `aggregate_mbr`
    - N-gram aggregation on BLEU (DeNero et al., 2009)
    - N-gram aggregation on chrF (Vamvas and Sennrich, 2024)
    - Embedding aggregation on COMET (Vamvas and Sennrich, 2024; Deguchi et al., 2024)
  - Centroid-based MBR (Deguchi et al., 2024): `centroid_mbr`
  - Probabilistic MBR (Trabelsi et al., 2024): `probabilistic_mbr`
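To see why reference aggregation helps: plain MBR needs one metric call per hypothesis-reference pair, while aggregation first collapses all references into one averaged representation and then scores each hypothesis once against it. A minimal sketch of the idea using averaged unigram counts; the helper names are hypothetical and the library's `aggregate_mbr` operates on real metric representations (n-gram statistics or COMET embeddings).

```python
from collections import Counter

def aggregate_refs(references: list[str]) -> dict[str, float]:
    """Average unigram counts over all references into a single profile."""
    agg = Counter()
    for ref in references:
        agg.update(ref.split())
    n = len(references)
    return {tok: count / n for tok, count in agg.items()}

def overlap_score(hyp: str, agg: dict[str, float]) -> float:
    """Clipped unigram precision against the aggregated profile."""
    tokens = hyp.split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    matched = sum(min(c, agg.get(tok, 0.0)) for tok, c in counts.items())
    return matched / len(tokens)

def aggregate_mbr_toy(hypotheses: list[str], references: list[str]) -> int:
    """One scoring pass per hypothesis instead of |H| x |R| pairwise calls."""
    agg = aggregate_refs(references)
    return max(range(len(hypotheses)),
               key=lambda i: overlap_score(hypotheses[i], agg))

HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]
best = aggregate_mbr_toy(HYPOTHESES, HYPOTHESES)
```

With 1024 candidates as in the CLI example above, this reduces roughly a million pairwise metric calls to about a thousand, at the cost of an approximation whose quality depends on the metric.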
### Selectors

The final output list is selected according to these selectors:

- N-best selection: `nbest`
- Diverse selection (Jinnai et al., 2024): `diverse`
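Diverse selection aims to return an output list whose entries are not near-duplicates of each other. As a rough illustration of the idea, here is a generic MMR-style greedy heuristic; it is a sketch, not the algorithm of Jinnai et al. (2024), and `jaccard` plus the trade-off weight `lam` are illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set similarity between two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def diverse_select(hypotheses: list[str], scores: list[float],
                   k: int, lam: float = 0.5) -> list[int]:
    """Greedy pick: trade off utility score against similarity
    to already-selected outputs (MMR-style)."""
    selected = [max(range(len(hypotheses)), key=scores.__getitem__)]
    while len(selected) < k:
        def gain(i: int) -> float:
            sim = max(jaccard(hypotheses[i], hypotheses[j]) for j in selected)
            return lam * scores[i] - (1 - lam) * sim
        remaining = [i for i in range(len(hypotheses)) if i not in selected]
        selected.append(max(remaining, key=gain))
    return selected

HYPOTHESES = ["Thanks", "Thank you", "Thank you so much", "Thank you.", "thank you"]
scores = [0.20, 0.53, 0.47, 0.37, 0.37]  # e.g. expected utilities from MBR
picked = diverse_select(HYPOTHESES, scores, k=2)  # [1, 0]: "Thank you", "Thanks"
```

The second pick is "Thanks" rather than the higher-scoring "Thank you so much" because the latter is penalized for overlapping heavily with the already-selected "Thank you".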
## Related projects

- mbr
  - Highly integrated with huggingface transformers by customizing the `generate()` method of the model implementation.
  - If you are looking for an MBR decoding library that is fully integrated into transformers, this might be a good choice.
  - Our mbrs works standalone; thus, not only transformers but also fairseq or LLM outputs via API can be used.
## Citation

If you use this software, please cite:

```bibtex
@inproceedings{deguchi-etal-2024-mbrs,
    title = "mbrs: A Library for Minimum {B}ayes Risk Decoding",
    author = "Deguchi, Hiroyuki and
      Sakai, Yusuke and
      Kamigaito, Hidetaka and
      Watanabe, Taro",
    editor = "Hernandez Farias, Delia Irazu and
      Hope, Tom and
      Li, Manling",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.37",
    pages = "351--362",
}
```
## License

This library is mainly developed by Hiroyuki Deguchi and published under the MIT license.