
BlonDe and BWB


BlonDe and BWB are developed for document-level machine translation. BlonDe is an automatic evaluation metric that explicitly tracks discourse phenomena. BWB is a large-scale bilingual parallel corpus that consists of web novels.

We hope that they will serve as a guide and inspiration for more work in the area of document-level machine translation.


News:

  • May 2023: BWB got accepted to ACL2023 🎉
  • Jan 2023: Released the annotated BWB-test dataset: Entity, terminology, coreference, quotation.
  • June 2022: Released the BWB dataset.
  • May 2022: Released the BlonDe package.
  • May 2022: BlonDe got accepted to NAACL2022 🎉

Please see release logs for older updates.

If you use the BlonDe package or the BWB dataset for your research, please cite:

@inproceedings{jiang-etal-2022-blonde,
      title = "{BlonDe}: An Automatic Evaluation Metric for Document-level Machine Translation",
      author = "Yuchen Eleanor Jiang and Tianyu Liu and Shuming Ma and Dongdong Zhang and Jian Yang and Haoyang Huang and Rico Sennrich and Ryan Cotterell and Mrinmaya Sachan and Ming Zhou",
      booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
      month = jul,
      year = "2022",
      address = "Seattle, United States",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2022.naacl-main.111",
      doi = "10.18653/v1/2022.naacl-main.111",
      pages = "1550--1565",
}
@inproceedings{jiang-etal-2023-discourse,
      title = "Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus",
      author = "Yuchen Eleanor Jiang and Tianyu Liu and Shuming Ma and Dongdong Zhang and Ryan Cotterell and Mrinmaya Sachan",
      booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
      month = jul,
      year = "2023",
      address = "Toronto, Canada",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2023.acl-main.111",
      doi = "10.18653/v1/2023.acl-main.111",
      pages = "1550--1565",
}

📐 The BlonDe Package:

Package Overview

<img align="right" width="300" src="image/blonde_motivation.png">

Standard automatic metrics, e.g. BLEU, are not reliable for document-level MT evaluation. They can neither distinguish document-level improvements in translation quality from sentence-level ones, nor identify the discourse phenomena that cause context-agnostic translations.

BlonDe is proposed to widen the scope of automatic MT evaluation from sentence to the document level. It takes discourse coherence into consideration by categorizing discourse-related spans and calculating the similarity-based F1 measure of categorized spans.

As shown in the figure, BlonDe is considerably more selective than BLEU for document-level MT and shows a larger quality gap between human and machine translations.
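As a rough illustration of the span-based idea, here is a toy F1 computation over categorized spans. This is a simplified sketch with made-up spans and exact-match counting; BlonDe's actual span extraction and similarity-based matching are more involved.

```python
from collections import Counter

def span_f1(sys_spans, ref_spans):
    """Toy F1 over categorized discourse spans.

    sys_spans / ref_spans map a category (e.g. 'pronoun') to the
    list of spans extracted from the system / reference document.
    """
    scores = {}
    for cat, ref in ref_spans.items():
        sys_counts = Counter(sys_spans.get(cat, []))
        ref_counts = Counter(ref)
        # Clipped matches, as in n-gram precision.
        overlap = sum((sys_counts & ref_counts).values())
        p = overlap / max(sum(sys_counts.values()), 1)
        r = overlap / max(len(ref), 1)
        scores[cat] = 2 * p * r / (p + r) if p + r else 0.0
    return scores

sys_doc = {"pronoun": ["she", "she", "it"], "entity": ["Mary"]}
ref_doc = {"pronoun": ["she", "she", "she"], "entity": ["Mary"]}
print(span_f1(sys_doc, ref_doc))  # pronoun F1 = 2/3, entity F1 = 1.0
```

Computing the F1 per category, rather than over all tokens, is what lets the metric attribute quality differences to specific discourse phenomena.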

In the BlonDe package, there are:

  • BlonDe: the main metric, combining dBlonDe with sentence-level measurement
  • dBlonDe: measures discourse phenomena (entity, tense, pronoun, discourse markers)
  • BlonDe+: takes human annotations (annotated ambiguous/omitted phrases and manually annotated NER) into consideration

⏳ Installation

Python >= 3.6 is required.

Before you install blonde, make sure that your pip, setuptools, wheel and spacy are up to date, and that en_core_web_sm is downloaded:

pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm

Install the official Python module from PyPI:

pip install blonde

Install the latest unstable version from the master branch on GitHub:

pip install git+https://github.com/EleanorJiang/BlonDe

Install from source:

git clone https://github.com/EleanorJiang/BlonDe
cd BlonDe
pip install .

You can test your installation with:

python -m unittest discover

Usage

We provide both a command-line interface (CLI) and a Python module. Example inputs are provided under ./example.

Command-line Usage

You can use it as follows for the simplest usage:

blonde -r example/ref.txt -s example/sys.txt

To use human-annotated spans for BlonDe+, add -p and provide the annotation file path with -an, as in:

blonde -r example/ref.txt -s example/sys.txt -p -an example/an.txt

To use human-annotated named entities (instead of automatically detected ones), add -p and provide the named entity file path with -ner, as in:

blonde -r example/ref.txt -s example/sys.txt -p -ner example/ner.txt

The full list of named arguments:

General arguments:

  -h, --help            show this help message and exit
  -r REFERENCE [REFERENCE ...], --reference REFERENCE [REFERENCE ...]
                        reference file path(s), each line is a sentence
  -s SYSTEM, --system SYSTEM
                        system file path, each line is a sentence
  --version, -V         show program's version number and exit

BlonDe-related arguments:

  --categories CATEGORIES [CATEGORIES ...], -c CATEGORIES [CATEGORIES ...]
                        The categories of BLONDE. 
                        Default: ('tense', 'pronoun', 'entity', 'dm', 'n-gram')
  --average-method {geometric,arithmetic}, -aver {geometric,arithmetic}
                        The averaging method to use, geometric or arithmetic.
                        Default: geometric
  --smooth-method {none,floor,add-k,exp}, -sm {none,floor,add-k,exp}
                        Smoothing method: exponential decay, floor (increment zero counts), add-k (increment num/denom by k for n>1), or none.
                        Default: exp
  --smooth-value SMOOTH_VALUE, -sv SMOOTH_VALUE
                        The smoothing value. Only valid for floor and add-k. 
                        Defaults: floor: 0.1, add-k: 1
  --lowercase LOWERCASE, -lc LOWERCASE
                        If True, enables case-insensitivity. Default: True
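The smoothing options behave like the familiar BLEU smoothing methods. The sketch below illustrates their effect on per-order n-gram precisions under stated assumptions; it mirrors the spirit of the CLI's none/floor/add-k/exp choices, not the package's exact implementation.

```python
def smooth_precisions(matches, totals, method="exp", value=0.1):
    """Smooth per-order n-gram precisions (illustrative sketch).

    matches[n] / totals[n] are clipped-match and total counts for
    n-gram order n+1.
    """
    precisions = []
    decay = 1.0  # doubles on each zero count (exponential decay)
    for n, (m, t) in enumerate(zip(matches, totals)):
        if method == "floor" and m == 0:
            m = value  # replace zero counts with a small floor
        elif method == "add-k" and n > 0:
            m, t = m + value, t + value  # increment num/denom for n > 1
        if method == "exp" and m == 0:
            decay *= 2
            precisions.append(1.0 / (decay * t) if t else 0.0)
        else:
            precisions.append(m / t if t else 0.0)
    return precisions

print(smooth_precisions([3, 0], [4, 3], method="exp"))  # [0.75, 0.1666...]
```

Without smoothing, a single zero count zeroes out a geometric average of precisions, which is why `exp` is the default.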

Weight-related arguments:

  --override-weights, -w
                        Whether to customize the weights of BLONDE
  --reweight, -rw       Whether to renormalize the weights of BLONDE to sum to 1
  --weight-tense WEIGHT_TENSE [WEIGHT_TENSE ...], -wt WEIGHT_TENSE [WEIGHT_TENSE ...]
                        The weights of TENSE (verb types); a tuple of length 7 corresponding to ('VBD', 'VBN', 'VBP',
                        'VBZ', 'VBG', 'VB', 'MD'). Default: (1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7). Only valid when `override_weights`
                        is used
  --weight-pronoun WEIGHT_PRONOUN [WEIGHT_PRONOUN ...], -wp WEIGHT_PRONOUN [WEIGHT_PRONOUN ...]
                        The weights of PRONOUN; a tuple of length 4 corresponding to ('masculine', 'feminine', 'neuter',
                        'epicene'). Default: (0.5, 0.5, 0, 0). Only valid when `override_weights` is used
  --weight-entity WEIGHT_ENTITY [WEIGHT_ENTITY ...], -we WEIGHT_ENTITY [WEIGHT_ENTITY ...]
                        The weights of PERSON and NONPERSON entities. Default: (1/2, 1/2). Only valid when `override_weights` is
                        used
  --weight-discourse-marker WEIGHT_DISCOURSE_MARKER [WEIGHT_DISCOURSE_MARKER ...], -wdm WEIGHT_DISCOURSE_MARKER [WEIGHT_DISCOURSE_MARKER ...]
                        The weights of DISCOURSE MARKER; a tuple of length 5 corresponding to ('comparison', 'cause',
                        'conjunction', 'asynchronous', 'synchronous'). Default: (0.5, 0.5, 0, 0). Only valid when
                        `override_weights` is used
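To make the interaction between custom weights, `--reweight`, and `--average-method` concrete, here is a minimal sketch of combining per-category scores. The function name is illustrative, not the package's API, and the geometric mean assumes all scores are positive.

```python
import math

def combine_scores(scores, weights, method="geometric"):
    """Weighted average of per-category scores (sketch).

    Weights are renormalized to sum to 1, in the spirit of the
    --reweight flag; geometric averaging assumes positive scores.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # renormalize to sum to 1
    if method == "arithmetic":
        return sum(wi * si for wi, si in zip(w, scores))
    # geometric mean = exp of the weighted sum of log scores
    return math.exp(sum(wi * math.log(si) for wi, si in zip(w, scores)))

print(combine_scores([0.5, 0.8], [1, 1], "geometric"))   # ~0.632
print(combine_scores([0.5, 0.8], [1, 1], "arithmetic"))  # 0.65
```

Note how the geometric mean penalizes a single weak category more heavily than the arithmetic mean, which is the usual motivation for the geometric default.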

BlonDe+-related arguments (annotation required):

  --plus, -p            Whether to add BLONDE PLUS categories. If so, please provide annotation files that are in the required
                        format.
  --annotation ANNOTATION, -an ANNOTATION
                        Annotation file path; each line is the annotation corresponding to a sentence. See README for the
                        annotation format
  --ner-refined NER_REFINED, -ner NER_REFINED
                        Named entity file path; each line lists the named entities corresponding to a sentence. If provided, the
                        annotated named entities are used in BLONDE instead of the automatically recognized ones. See README for
                        the named entity annotation format
  --plus-categories PLUS_CATEGORIES [PLUS_CATEGORIES ...], -pc PLUS_CATEGORIES [PLUS_CATEGORIES ...]
                        The categories that your annotation files contain. Default: ('ambiguity', 'ellipsis'). Only valid when
                        `plus` is used
  --plus-weights PLUS_WEIGHTS [PLUS_WEIGHTS ...], -pw PLUS_WEIGHTS [PLUS_WEIGHTS ...]
                        The corresponding weights of the plus categories. Only valid when `plus` is used