Naeval

Comparing quality and performance of NLP systems for Russian language

<img src="https://github.com/natasha/natasha-logos/blob/master/naeval.svg">

Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of <a href="https://github.com/natasha">project Natasha</a>: <a href="https://github.com/natasha/razdel">Razdel</a>, <a href="https://github.com/natasha/navec">Navec</a>, <a href="https://github.com/natasha/slovnet">Slovnet</a>.

Install

Naeval supports Python 3.7+

$ pip install naeval

Documentation

Materials are in Russian:

  • <a href="https://natasha.github.io/naeval">Naeval page on natasha.github.io</a>
  • <a href="https://youtu.be/-7XT_U6hVvk?t=2443">Naeval section of Datafest 2020 talk</a>

Models

<table> <tr> <th>Model</th> <th>Tags</th> <th>Description</th> </tr> <tr> <td> DeepPavlov NER <a name="deeppavlov_ner"> <a href="#deeppavlov_ner"><code>#</code></a> </td> <td><code>ner</code></td> <td> BiLSTM-CRF NER trained on Collection5. <a href="https://github.com/deepmipt/ner">Original repo</a>, <a href="http://docs.deeppavlov.ai/en/master/features/models/ner.html">docs</a>, <a href="https://arxiv.org/pdf/1709.09686.pdf">paper</a> </td> </tr> <tr> <td> DeepPavlov BERT NER <a name="deeppavlov_bert_ner"> <a href="#deeppavlov_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Current SOTA for Russian language. <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-named-entity-recognition-sequence-tagging">Docs</a>, <a href="https://www.youtube.com/watch?v=eKTA8i8s-zs">video</a> </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/Slavic-BERT-NER">DeepPavlov Slavic BERT NER</a> <a name="deeppavlov_slavic_bert_ner"> <a href="#deeppavlov_slavic_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> DeepPavlov solution for BSNLP-2019. <a href="https://www.aclweb.org/anthology/W19-3712/">Paper</a> </td> </tr> <tr> <td> DeepPavlov Morph <a name="deeppavlov_morph"> <a href="#deeppavlov_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/morphotagger.html">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Morph <a name="deeppavlov_bert_morph"> <a href="#deeppavlov_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-morphological-tagging">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Syntax <a name="deeppavlov_bert_syntax"> <a href="#deeppavlov_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> BERT + biaffine head. 
<a href="http://docs.deeppavlov.ai/en/master/features/models/syntaxparser.html">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#ner">Slovnet NER</a> <a name="slovnet_ner"> <a href="#slovnet_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT NER <a name="slovnet_bert_ner"> <a href="#slovnet_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#morph">Slovnet Morph</a> <a name="slovnet_morph"> <a href="#slovnet_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Morph <a name="slovnet_bert_morph"> <a href="#slovnet_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#syntax">Slovnet Syntax</a> <a name="slovnet_syntax"> <a href="#slovnet_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Syntax <a name="slovnet_bert_syntax"> <a href="#slovnet_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> <a href="http://pullenti.ru/">PullEnti</a> <a name="pullenti"> <a href="#pullenti"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> </td> <td> First place on factRuEval-2016, a highly sophisticated rule-based system </td> </tr> <tr> <td> <a href="https://stanfordnlp.github.io/stanza/">Stanza</a> <a name="stanza"> <a href="#stanza"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Tool by Stanford NLP released in 2020. 
<a href="https://arxiv.org/pdf/2003.07082.pdf">Paper</a> </td> </tr> <tr> <td> <a href="https://spacy.io/">SpaCy</a> <a name="spacy"> <a href="#spacy"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Uses <a href="https://github.com/buriy/spacy-ru">Russian models</a> trained by @buriy </td> </tr> <tr> <td> <a href="https://texterra.ispras.ru">Texterra</a> <a name="texterra"> <a href="#texterra"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> <code>ner</code> <code>token</code> <code>sent</code> </td> <td> Multifunctional NLP solution by <a href="https://www.ispras.ru/">ISP RAS</a> </td> </tr> <tr> <td> <a href="https://github.com/yandex/tomita-parser/">Tomita</a> <a name="tomita"> <a href="#tomita"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> GLR parser by Yandex; only the implementation for person names is publicly available </td> </tr> <tr> <td> <a href="https://github.com/mit-nlp/MITIE">MITIE</a> <a name="mitie"> <a href="#mitie"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Engine developed at MIT + <a href="http://lang.org.ua/en/models/">third-party model for the Russian language</a> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rupostagger">RuPosTagger</a> <a name="rupostagger"> <a href="#rupostagger"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> CRF tagger, part of the <a href="http://www.solarix.ru/">Solarix project</a> </td> </tr> <tr> <td> <a href="https://github.com/IlyaGusev/rnnmorph">RNNMorph</a> <a name="rnnmorph"> <a href="#rnnmorph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> First place solution on morphoRuEval-2017. 
<a href="https://habr.com/ru/post/339954/">Post on Habr</a> </td> </tr> <tr> <td> <a href="https://github.com/chomechome/maru">Maru</a> <a name="maru"> <a href="#maru"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="http://ufal.mff.cuni.cz/udpipe">UDPipe</a> <a name="udpipe"> <a href="#udpipe"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> </td> <td> Model trained on SynTagRus </td> </tr> <tr> <td> <a href="https://www.nltk.org/">NLTK</a> <a name="nltk"> <a href="#nltk"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Multifunctional library, provides model for Russian text segmentation. <a href="https://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/nlpub/pymystem3">MyStem</a> <a name="mystem"> <a href="#mystem"><code>#</code></a> </td> <td> <code>token</code> <code>morph</code> </td> <td> Wrapper for Yandex morphological analyzers </td> </tr> <tr> <td> <a href="https://github.com/luismsgomes/mosestokenizer">Moses</a> <a name="moses"> <a href="#moses"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Wrapper for Perl Moses utils </td> </tr> <tr> <td> <a href="https://github.com/fnl/segtok">SegTok</a> <a name="segtok"> <a href="#segtok"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rutokenizer">RuTokenizer</a> <a name="rutokenizer"> <a href="#rutokenizer"><code>#</code></a> </td> <td> <code>token</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/razdel">Razdel</a> <a name="razdel"> <a href="#razdel"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/aatimofeev/spacy_russian_tokenizer">Spacy Russian Tokenizer</a> <a name="spacy_russian_tokenizer"> <a 
href="#spacy_russian_tokenizer"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Spacy segmentation pipeline for Russian texts by @aatimofeev </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/ru_sentence_tokenizer">RuSentTokenizer</a> <a name="rusenttokenizer"> <a href="#rusenttokenizer"><code>#</code></a> </td> <td> <code>sent</code> </td> <td> DeepPavlov sentence segmentation </td> </tr> <!-- <tr> --> <!-- <td> --> <!-- <a name=""> --> <!-- <a href="#"><code>#</code></a> --> <!-- </td> --> <!-- <td> --> <!-- <code></code> --> <!-- </td> --> <!-- <td> --> <!-- </td> --> <!-- </tr> --> </table>

Tokenization

See the <a href="https://github.com/natasha/razdel#evaluation">Razdel evaluation section</a> for more info.
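The first row of the table below is a plain regular-expression baseline, labelled `re.findall(\w+|\d+|\p+)`. A minimal sketch of such a baseline follows. It makes several assumptions: Python's standard `re` module has no `\p` class, so punctuation is approximated here as `[^\w\s]+`. The names `regex_tokenize`, `count_errors` and `SAMPLES` are hypothetical. The error count shown (samples whose tokens differ from a reference) is only an illustration and may differ from the metric Naeval actually reports.

```python
import re
import time

# Approximation of the `\w+|\d+|\p+` baseline from the table.
# Assumption: `\p` (punctuation) is replaced with "not a word character and
# not whitespace", because the stdlib `re` module does not support `\p`.
# `\d+` is kept only to mirror the label; `\w+` already matches digits.
TOKEN = re.compile(r"\w+|\d+|[^\w\s]+")


def regex_tokenize(text):
    """Split text into runs of word characters and runs of punctuation."""
    return TOKEN.findall(text)


def count_errors(tokenizer, samples):
    """Count samples whose predicted tokens differ from the reference tokens.

    Hypothetical metric for illustration, not necessarily Naeval's definition.
    """
    return sum(1 for text, expected in samples if tokenizer(text) != expected)


# Tiny hand-made reference set (hypothetical, not taken from any corpus).
SAMPLES = [
    ("Привет, мир!", ["Привет", ",", "мир", "!"]),
    ("т.е. так", ["т.е.", "так"]),  # the abbreviation should stay one token
]

start = time.perf_counter()
errors = count_errors(regex_tokenize, SAMPLES)
elapsed = time.perf_counter() - start

print(f"errors: {errors}, time: {elapsed:.6f} s")
```

The baseline splits the abbreviation `т.е.` into four tokens, which is the kind of case where a rule-based or trained tokenizer is expected to do better than a bare regex.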

<!--- token ---> <table border="0" class="dataframe"> <thead> <tr> <th></th> <th colspan="2" halign="left">corpora</th> <th colspan="2" halign="left">syntag</th> <th colspan="2" halign="left">gicrya</th> <th colspan="2" halign="left">rnc</th> </tr> <tr> <th></th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> </tr> </thead> <tbody> <tr> <th>re.findall(\w+|\d+|\p+)</th> <td>24</td> <td>0.5</td> <td>16</td> <td>0.5</td> <td>19</td> <td>0.4</td> <td>60</td> <td>0.4</td> </tr> <tr> <th>spacy</th> <td>26</td> <td>6.2</td> <td>13</td> <td>5.8</td> <td><b>14</b></td> <td>4.1</td> <td>32</td> <td>3.9</td> </tr> <tr> <th>nltk.word_tokenize</th> <td>60</td> <td>3.4</td> <td>256</td> <td>3.3</td> <td>75</td> <td>2.7</td> <td>199</td> <td>2.9</td> </tr> <tr> <th>mystem</th> </tr> </tbody> </table>
