Sumeval
A well-tested, multi-language evaluation framework for text summarization.
- Well tested
  - The ROUGE-X scores are tested against the original Perl script (ROUGE-1.5.5.pl).
  - The BLEU score is calculated by SacréBLEU, which produces the same values as the official script (mteval-v13a.pl) used by WMT.
- Multi-language
  - In addition to English, Japanese and Chinese are supported. Other languages can be added easily.
And of course, the implementation is pure Python!
How to use
from sumeval.metrics.rouge import RougeCalculator

rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
    summary="I went to the Mars from my living town.",
    references="I went to Mars",
    n=1)

rouge_2 = rouge.rouge_n(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"],
    n=2)

rouge_l = rouge.rouge_l(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE
rouge_be = rouge.rouge_be(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))

from sumeval.metrics.bleu import BLEUCalculator

bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")

bleu_ja = BLEUCalculator(lang="ja")
score_ja = bleu_ja.bleu("私はビーチで待ってる", "彼がベンチで待ってる")
From the command line
sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"
Output:
{
  "options": {
    "stopwords": true,
    "stemming": false,
    "word_limit": -1,
    "length_limit": -1,
    "alpha": 0.5,
    "input-summary": "I'm living New York its my home town so awesome",
    "input-references": [
      "My home town is awesome"
    ]
  },
  "averages": {
    "ROUGE-1": 0.7499999999999999,
    "ROUGE-2": 0.6666666666666666,
    "ROUGE-L": 0.7499999999999999,
    "ROUGE-BE": 0
  },
  "scores": [
    {
      "ROUGE-1": 0.7499999999999999,
      "ROUGE-2": 0.6666666666666666,
      "ROUGE-L": 0.7499999999999999,
      "ROUGE-BE": 0
    }
  ]
}
You can also read the input from files. For details, run sumeval -h.
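The ROUGE-1 and ROUGE-2 averages in the sample output can be reproduced by hand. The sketch below is not sumeval's own code. It assumes that, after lowercasing and stopword removal, the summary reduces to the tokens living, york, home, town, awesome and the reference to home, town, awesome (an inferred token list, not something this README states), and that the score is the ROUGE-1.5.5 style F-measure P*R / ((1 - alpha)*P + alpha*R) with alpha = 0.5. The function names are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_f(summary_tokens, reference_tokens, n, alpha=0.5):
    """F-measure over clipped n-gram overlap (illustrative, not sumeval's API)."""
    summary_counts = Counter(ngrams(summary_tokens, n))
    reference_counts = Counter(ngrams(reference_tokens, n))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((summary_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(summary_counts.values())
    recall = overlap / sum(reference_counts.values())
    return precision * recall / ((1 - alpha) * precision + alpha * recall)


# Assumed token lists after stopword removal (see the note above).
summary_tokens = ["living", "york", "home", "town", "awesome"]
reference_tokens = ["home", "town", "awesome"]

rouge_1 = rouge_n_f(summary_tokens, reference_tokens, n=1)
rouge_2 = rouge_n_f(summary_tokens, reference_tokens, n=2)
print(rouge_1, rouge_2)
```

Under those assumptions ROUGE-1 has precision 3/5 and recall 3/3, which gives roughly 0.75, and ROUGE-2 has precision 2/4 and recall 2/2, which gives roughly 0.667, in line with the sample output.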
Install
pip install sumeval
Dependencies
- BLEU depends on SacréBLEU.
- To calculate ROUGE-BE, spaCy is required.
- To use lang ja, janome or MeCab is required.
  - To get the ROUGE-BE score, GiNZA is additionally needed.
- To use lang zh, jieba is required.
  - To get the ROUGE-BE score, pyhanlp is additionally needed.
Test
sumeval checks its scores against two packages.
- pythonrouge
  - It calls the original Perl script.
  - pip install git+https://github.com/tagucci/pythonrouge.git
- rougescore
  - It is a simple Python implementation of the ROUGE score.
  - pip install git+git://github.com/bdusell/rougescore.git
Contributions Welcome :tada:
Add supported language
The tokenization and dependency parsing for each language are located in sumeval/metrics/lang.
You can add a language class by inheriting from BaseLang.
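As a rough illustration of the inheritance pattern, the sketch below defines a new language class. To keep it runnable without sumeval installed, it uses a hypothetical stand-in base class, BaseLangStandIn, in place of the real BaseLang. The constructor argument, the tokenize method name, the LangWhitespace class, and the "xx" language code are all assumptions for illustration, so check sumeval/metrics/lang for the methods a real subclass has to override (including whatever dependency parsing ROUGE-BE needs, which is not sketched here).

```python
class BaseLangStandIn:
    """Hypothetical stand-in for sumeval's BaseLang so this sketch runs alone."""

    def __init__(self, lang):
        self.lang = lang

    def tokenize(self, text):
        # Subclasses are expected to supply language-specific tokenization.
        raise NotImplementedError


class LangWhitespace(BaseLangStandIn):
    """Hypothetical language class for text whose words are separated by spaces."""

    def __init__(self):
        super().__init__("xx")  # "xx" is a placeholder language code

    def tokenize(self, text):
        # Lowercase, then split on any run of whitespace.
        return text.lower().split()


lang = LangWhitespace()
tokens = lang.tokenize("I went to Mars")
print(tokens)
```

With the real library, the class would inherit from BaseLang instead of the stand-in and live alongside the existing language modules.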
