Mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages, along with scores (scale 1 to 100) generated through human evaluations that represent the quality of the translations.
Paper Title: Unsupervised Quality Estimation for Neural Machine Translation
MultiLingual Quality Estimation (MLQE) Dataset
This repository contains data for the 2020 Quality Estimation Shared Task:
http://www.statmt.org/wmt20/quality-estimation-task.html
Training and development data
Check the 'data' folder
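Once a file from the 'data' folder has been extracted, a quick look at its scores might be sketched as below. This is only a minimal sketch: it assumes the file is tab-separated with a header row, and the column name `mean` is a hypothetical placeholder, since this README does not describe the file layout. Check the actual header and adjust `score_column` accordingly.

```python
import csv
from pathlib import Path
from statistics import mean


def load_scores(tsv_path, score_column="mean"):
    """Read one tab-separated file and return one column as floats.

    `score_column` is an assumed column name, not taken from this README;
    replace it with the name found in the file's header row.
    """
    with Path(tsv_path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [float(row[score_column]) for row in reader]


def summarize(scores):
    """Return count, minimum, maximum and mean of a non-empty score list."""
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
    }
```

Calling `summarize(load_scores(path))` on one file gives a quick sanity check that the values fall within the score scale stated in the description.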
NMT models
Check the 'nmt-models' folder
Parallel data used to train the NMT models
Check 'http://www.statmt.org/wmt20/quality-estimation-task.html'
German-English
- Europarl v9
- ParaCrawl v3
- Common Crawl corpus
- News Commentary v14
- Wiki Titles v1
- Document-split Rapid corpus
Chinese-English
- News Commentary v14
- Wiki Titles v1
- UN Parallel Corpus V1.0
- CWMT Corpus (casia2015, datum2015, datum2017, NEU)
Romanian-English
Estonian-English
- Europarl v8
- Rapid corpus of EU press releases
Sinhala-English
- Flores Iterative Back Translation
Nepali-English
- Flores Iterative Back Translation
Citation
If you use this data in your work, please cite:
@article{tacl2020,
title = {Unsupervised Quality Estimation for Neural Machine Translation},
author = {Fomicheva, Marina and Sun, Shuo and Yankovskaya, Lisa and Blain, Frédéric and Guzmán, Francisco and Fishel, Mark and Aletras, Nikolaos and Chaudhary, Vishrav and Specia, Lucia},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
pages = {539-555},
year = {2020}
}
Changelog
- 2020-03-15: Adding details about training data for NMT models
- 2020-03-19: Releasing dataset
License
The dataset is licensed under CC-BY-SA, see the LICENSE file for details.
