Naeval

Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate <a href="https://github.com/natasha">project Natasha</a> components: <a href="https://github.com/natasha/razdel">Razdel</a>, <a href="https://github.com/natasha/navec">Navec</a>, <a href="https://github.com/natasha/slovnet">Slovnet</a>.

Install

Naeval supports Python 3.7+

$ pip install naeval

Documentation

Materials are in Russian:

<a href="https://natasha.github.io/naeval">Naeval page on natasha.github.io</a>
<a href="https://youtu.be/-7XT_U6hVvk?t=2443">Naeval section of Datafest 2020 talk</a>

Models

<table> <tr> <th>Model</th> <th>Tags</th> <th>Description</th> </tr> <tr> <td> DeepPavlov NER <a name="deeppavlov_ner"> <a href="#deeppavlov_ner"><code>#</code></a> </td> <td><code>ner</code></td> <td> BiLSTM-CRF NER trained on Collection5. <a href="https://github.com/deepmipt/ner">Original repo</a>, <a href="http://docs.deeppavlov.ai/en/master/features/models/ner.html">docs</a>, <a href="https://arxiv.org/pdf/1709.09686.pdf">paper</a> </td> </tr> <tr> <td> DeepPavlov BERT NER <a name="deeppavlov_bert_ner"> <a href="#deeppavlov_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Current SOTA for Russian language. <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-named-entity-recognition-sequence-tagging">Docs</a>, <a href="https://www.youtube.com/watch?v=eKTA8i8s-zs">video</a> </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/Slavic-BERT-NER">DeepPavlov Slavic BERT NER</a> <a name="deeppavlov_slavic_bert_ner"> <a href="#deeppavlov_slavic_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> DeepPavlov solution for BSNLP-2019. <a href="https://www.aclweb.org/anthology/W19-3712/">Paper</a> </td> </tr> <tr> <td> DeepPavlov Morph <a name="deeppavlov_morph"> <a href="#deeppavlov_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/morphotagger.html">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Morph <a name="deeppavlov_bert_morph"> <a href="#deeppavlov_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-morphological-tagging">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Syntax <a name="deeppavlov_bert_syntax"> <a href="#deeppavlov_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> BERT + biaffine head. 
<a href="http://docs.deeppavlov.ai/en/master/features/models/syntaxparser.html">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#ner">Slovnet NER</a> <a name="slovnet_ner"> <a href="#slovnet_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT NER <a name="slovnet_bert_ner"> <a href="#slovnet_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#morph">Slovnet Morph</a> <a name="slovnet_morph"> <a href="#slovnet_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Morph <a name="slovnet_bert_morph"> <a href="#slovnet_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#syntax">Slovnet Syntax</a> <a name="slovnet_syntax"> <a href="#slovnet_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Syntax <a name="slovnet_bert_syntax"> <a href="#slovnet_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> <a href="http://pullenti.ru/">PullEnti</a> <a name="pullenti"> <a href="#pullenti"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> </td> <td> First place on factRuEval-2016, a highly sophisticated rule-based system </td> </tr> <tr> <td> <a href="https://stanfordnlp.github.io/stanza/">Stanza</a> <a name="stanza"> <a href="#stanza"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Tool by Stanford NLP released in 2020. 
<a href="https://arxiv.org/pdf/2003.07082.pdf">Paper</a> </td> </tr> <tr> <td> <a href="https://spacy.io/">SpaCy</a> <a name="spacy"> <a href="#spacy"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Uses <a href="https://github.com/buriy/spacy-ru">Russian models</a> trained by @buriy </td> </tr> <tr> <td> <a href="https://texterra.ispras.ru">Texterra</a> <a name="texterra"> <a href="#texterra"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> <code>ner</code> <code>token</code> <code>sent</code> </td> <td> Multifunctional NLP solution by <a href="https://www.ispras.ru/">ISP RAS</a> </td> </tr> <tr> <td> <a href="https://github.com/yandex/tomita-parser/">Tomita</a> <a name="tomita"> <a href="#tomita"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> GLR-parser by Yandex, only implementation for person names is publicly available </td> </tr> <tr> <td> <a href="https://github.com/mit-nlp/MITIE">MITIE</a> <a name="mitie"> <a href="#mitie"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Engine developed at MIT + <a href="http://lang.org.ua/en/models/">third party model for Russian language</a> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rupostagger">RuPosTagger</a> <a name="rupostagger"> <a href="#rupostagger"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> CRF tagger, part of <a href="http://www.solarix.ru/">Solarix project</a> </td> </tr> <tr> <td> <a href="https://github.com/IlyaGusev/rnnmorph">RNNMorph</a> <a name="rnnmorph"> <a href="#rnnmorph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> First place solution on morphoRuEval-2017. 
<a href="https://habr.com/ru/post/339954/">Post on Habr</a> </td> </tr> <tr> <td> <a href="https://github.com/chomechome/maru">Maru</a> <a name="maru"> <a href="#maru"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="http://ufal.mff.cuni.cz/udpipe">UDPipe</a> <a name="udpipe"> <a href="#udpipe"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> </td> <td> Model trained on SynTagRus </td> </tr> <tr> <td> <a href="https://www.nltk.org/">NLTK</a> <a name="nltk"> <a href="#nltk"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Multifunctional library, provides model for Russian text segmentation. <a href="https://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/nlpub/pymystem3">MyStem</a> <a name="mystem"> <a href="#mystem"><code>#</code></a> </td> <td> <code>token</code> <code>morph</code> </td> <td> Wrapper for Yandex morphological analyzers </td> </tr> <tr> <td> <a href="https://github.com/luismsgomes/mosestokenizer">Moses</a> <a name="moses"> <a href="#moses"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Wrapper for Perl Moses utils </td> </tr> <tr> <td> <a href="https://github.com/fnl/segtok">SegTok</a> <a name="segtok"> <a href="#segtok"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rutokenizer">RuTokenizer</a> <a name="rutokenizer"> <a href="#rutokenizer"><code>#</code></a> </td> <td> <code>token</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/razdel">Razdel</a> <a name="razdel"> <a href="#razdel"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/aatimofeev/spacy_russian_tokenizer">Spacy Russian Tokenizer</a> <a name="spacy_russian_tokenizer"> <a 
href="#spacy_russian_tokenizer"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Spacy segmentation pipeline for Russian texts by @aatimofeev </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/ru_sentence_tokenizer">RuSentTokenizer</a> <a name="rusenttokenizer"> <a href="#rusenttokenizer"><code>#</code></a> </td> <td> <code>sent</code> </td> <td> DeepPavlov sentence segmentation </td> </tr>            </table>
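
The Tags column above says which tasks each system is evaluated on. As a minimal sketch of reading the table that way, the snippet below copies a few rows into a plain dictionary and looks up the systems that share a tag. The dictionary layout and the `models_with_tag` helper are hypothetical and are not part of the Naeval API.

```python
# A few rows copied from the Models table (anchor name -> tags).
# The dict layout is illustrative only, not a Naeval data structure.
MODEL_TAGS = {
    "deeppavlov_ner": {"ner"},
    "slovnet_morph": {"morph"},
    "stanza": {"ner", "morph", "syntax"},
    "spacy": {"token", "sent", "ner", "morph", "syntax"},
    "udpipe": {"morph", "syntax"},
    "razdel": {"token", "sent"},
    "rusenttokenizer": {"sent"},
}


def models_with_tag(tag, model_tags=MODEL_TAGS):
    """Return the names of all listed models that carry the given tag, sorted."""
    return sorted(name for name, tags in model_tags.items() if tag in tags)


print(models_with_tag("syntax"))
print(models_with_tag("sent"))
```

Systems that share a tag are the ones it makes sense to compare against each other on the same task.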

Tokenization

See <a href="https://github.com/natasha/razdel#evaluation">Razdel evaluation section</a> for more info.

<table border="0" class="dataframe"> <thead> <tr> <th></th> <th colspan="2" halign="left">corpora</th> <th colspan="2" halign="left">syntag</th> <th colspan="2" halign="left">gicrya</th> <th colspan="2" halign="left">rnc</th> </tr> <tr> <th></th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> </tr> </thead> <tbody> <tr> <th>re.findall(\w+|\d+|\p+)</th> <td>24</td> <td>0.5</td> <td>16</td> <td>0.5</td> <td>19</td> <td>0.4</td> <td>60</td> <td>0.4</td> </tr> <tr> <th>spacy</th> <td>26</td> <td>6.2</td> <td>13</td> <td>5.8</td> <td><b>14</b></td> <td>4.1</td> <td>32</td> <td>3.9</td> </tr> <tr> <th>nltk.word_tokenize</th> <td>60</td> <td>3.4</td> <td>256</td> <td>3.3</td> <td>75</td> <td>2.7</td> <td>199</td> <td>2.9</td> </tr> <tr> <th>mystem</th> </tr> </tbody> </table>
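
The first row of the table is a plain regex baseline. A rough sketch of such a baseline, with a simple error count and timing, might look like the following. Several things here are assumptions: `\p+` is read as "a run of punctuation" and replaced with `[^\w\s]+`, because Python's `re` module has no `\p` class; an error is counted as a sample whose tokens differ from the reference, which may not match how the figures in the table were computed; and the two samples are hand-made, not drawn from the corpora in the table.

```python
import re
import time

# Assumption: `\p+` in the table's baseline row means a run of punctuation.
# Python's `re` has no `\p` class, so `[^\w\s]+` is used as a stand-in.
TOKEN_RE = re.compile(r"\w+|\d+|[^\w\s]+")


def regex_tokenize(text):
    """Split text into word, digit, and punctuation runs."""
    return TOKEN_RE.findall(text)


def count_errors(tokenize, samples):
    """Count samples whose predicted tokens differ from the reference tokens."""
    return sum(1 for text, expected in samples if tokenize(text) != expected)


def timed(tokenize, samples):
    """Return (errors, elapsed_seconds) for one pass over the samples."""
    start = time.perf_counter()
    errors = count_errors(tokenize, samples)
    return errors, time.perf_counter() - start


# Hypothetical hand-made samples, not taken from any corpus in the table.
SAMPLES = [
    ("Привет, мир!", ["Привет", ",", "мир", "!"]),
    ("Кое-что случилось.", ["Кое-что", "случилось", "."]),
]

errors, seconds = timed(regex_tokenize, SAMPLES)
print(errors, seconds)
```

The second sample illustrates a typical weakness of the baseline: the regex splits the hyphenated word "Кое-что" into three tokens, so it counts as an error against a reference that keeps it whole.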
