SemBleu: A Robust Metric for AMR Parsing Evaluation
The repository corresponds to our ACL 2019 paper entitled "SemBleu: A Robust Metric for AMR Parsing Evaluation".
- SemBleu is fast, taking less than a second to evaluate a thousand AMR pairs.
- SemBleu is accurate: it involves no search, so it has no search errors.
- SemBleu considers high-order correspondences. From our experiments, it is mostly consistent with Smatch, but SemBleu can better capture performance variations.
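To make "high-order correspondences" concrete, here is a minimal sketch of the general idea: collect label paths of one, two, and three nodes from a graph, then compute a BLEU-style clipped precision for each order. It is a simplified illustration, not the code in this repository. The graph encoding (a dict of node labels plus a list of `(source, relation, target)` edges) and the names `extract_ngrams` and `clipped_precision` are assumptions made for this example, and the brevity penalty and the final weighted combination are left out.

```python
from collections import Counter


def extract_ngrams(nodes, edges, max_n=3):
    """Count label paths containing 1..max_n nodes, following edge direction.

    nodes: dict mapping node id -> concept label
    edges: list of (source id, relation, target id) triples
    """
    children = {}
    for src, rel, tgt in edges:
        children.setdefault(src, []).append((rel, tgt))

    ngrams = {n: Counter() for n in range(1, max_n + 1)}
    # Each frontier entry is (id of the last node on the path, path of labels).
    frontier = [(node_id, (label,)) for node_id, label in nodes.items()]
    for n in range(1, max_n + 1):
        next_frontier = []
        for last, path in frontier:
            ngrams[n][path] += 1
            for rel, tgt in children.get(last, []):
                next_frontier.append((tgt, path + (rel, nodes[tgt])))
        frontier = next_frontier
    return ngrams


def clipped_precision(hyp_counts, ref_counts):
    """Fraction of hypothesis n-grams also found in the reference (clipped)."""
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


# Hypothesis: "the boy asks a question"; reference: "the girl asks a question".
hyp_nodes = {"a": "ask-01", "b": "boy", "q": "question"}
hyp_edges = [("a", ":ARG0", "b"), ("a", ":ARG1", "q")]
ref_nodes = {"a": "ask-01", "g": "girl", "q": "question"}
ref_edges = [("a", ":ARG0", "g"), ("a", ":ARG1", "q")]

hyp = extract_ngrams(hyp_nodes, hyp_edges)
ref = extract_ngrams(ref_nodes, ref_edges)
for n in (1, 2, 3):
    print(n, clipped_precision(hyp[n], ref[n]))
```

In this toy pair, two of the three hypothesis nodes match the reference but only one of the two edges does, so the bigram precision penalizes the mismatch more sharply than the unigram precision.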
Usage
chmod a+x eval.sh
./eval.sh output-file-path reference-file-path
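If you want to call the script from Python, for example inside a training loop, a thin wrapper along these lines may be enough. This is a sketch: the function names `build_eval_command` and `run_sembleu` are hypothetical, and because the output format of `eval.sh` is not described here, the wrapper returns the script's standard output as raw text rather than parsing a score.

```python
import subprocess


def build_eval_command(output_path, reference_path, script="./eval.sh"):
    """Return the argument list for eval.sh: output file first, then reference."""
    return [script, output_path, reference_path]


def run_sembleu(output_path, reference_path, script="./eval.sh"):
    """Run eval.sh and return whatever it prints to standard output.

    Raises subprocess.CalledProcessError if the script exits with a
    non-zero status.
    """
    result = subprocess.run(
        build_eval_command(output_path, reference_path, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```
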
As with Smatch, the AMRs in each file are separated by one empty line, for example:
(a / ask-01 :ARG0 (b / boy) :ARG1 (q / question))

(a / answer-01 :ARG0 (g / girl) :ARG1 (q / question))
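Before scoring, it can help to check that both files contain the same number of AMRs. The sketch below splits file contents on empty lines and joins multi-line AMRs into single strings. The function name `split_amrs` is an assumption for this example, and the code is not taken from the repository.

```python
def split_amrs(text):
    """Split file contents into AMR strings separated by empty lines."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # An empty line closes the AMR collected so far.
            blocks.append(" ".join(current))
            current = []
    if current:
        blocks.append(" ".join(current))
    return blocks


sample = """(a / ask-01
    :ARG0 (b / boy)
    :ARG1 (q / question))

(a / answer-01 :ARG0 (g / girl) :ARG1 (q / question))
"""
amrs = split_amrs(sample)
print(len(amrs))
```

Running `split_amrs` on the output file and on the reference file and comparing the two lengths catches misaligned files early.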
AMR data
If you are developing a new metric and would like a comparison, here are the 100 AMR graphs and the corresponding system outputs.
Results
The table below lists the SemBleu scores of recent SOTA work. The numbers are obtained by running our script on their provided outputs.
| Model | SemBleu |
|---|---|
| LDC2015E86 | |
| Lyu and Titov (ACL 2018) | 58.7 |
| Groschwitz et al. (ACL 2018) | 51.8 |
| Guo and Lu (EMNLP 2018) | 50.4 |
| LDC2016E25 | |
| Lyu and Titov (ACL 2018) | 60.3 |
| van Noord and Bos (CLIN 2017) | 49.5 |
| LDC2017T10 | |
| Zhang et al. (ACL 2019) | 59.9 |
| Cai and Lam (EMNLP 2019) | 56.9 |
| Groschwitz et al. (ACL 2018) | 52.5 |
| Guo and Lu (EMNLP 2018) | 52.4 |
