Naeval
Comparing quality and performance of NLP systems for the Russian language
Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate components of the <a href="https://github.com/natasha">Natasha project</a>: <a href="https://github.com/natasha/razdel">Razdel</a>, <a href="https://github.com/natasha/navec">Navec</a> and <a href="https://github.com/natasha/slovnet">Slovnet</a>.
Install
Naeval supports Python 3.7+.
$ pip install naeval
Documentation
Materials are in Russian:
- <a href="https://natasha.github.io/naeval">Naeval page on natasha.github.io</a>
- <a href="https://youtu.be/-7XT_U6hVvk?t=2443">Naeval section of Datafest 2020 talk</a>
Models
<table> <tr> <th>Model</th> <th>Tags</th> <th>Description</th> </tr> <tr> <td> DeepPavlov NER <a name="deeppavlov_ner"> <a href="#deeppavlov_ner"><code>#</code></a> </td> <td><code>ner</code></td> <td> BiLSTM-CRF NER trained on Collection5. <a href="https://github.com/deepmipt/ner">Original repo</a>, <a href="http://docs.deeppavlov.ai/en/master/features/models/ner.html">docs</a>, <a href="https://arxiv.org/pdf/1709.09686.pdf">paper</a> </td> </tr> <tr> <td> DeepPavlov BERT NER <a name="deeppavlov_bert_ner"> <a href="#deeppavlov_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Current SOTA for the Russian language. <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-named-entity-recognition-sequence-tagging">Docs</a>, <a href="https://www.youtube.com/watch?v=eKTA8i8s-zs">video</a> </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/Slavic-BERT-NER">DeepPavlov Slavic BERT NER</a> <a name="deeppavlov_slavic_bert_ner"> <a href="#deeppavlov_slavic_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> DeepPavlov solution for BSNLP-2019. <a href="https://www.aclweb.org/anthology/W19-3712/">Paper</a> </td> </tr> <tr> <td> DeepPavlov Morph <a name="deeppavlov_morph"> <a href="#deeppavlov_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/morphotagger.html">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Morph <a name="deeppavlov_bert_morph"> <a href="#deeppavlov_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> <a href="http://docs.deeppavlov.ai/en/master/features/models/bert.html#bert-for-morphological-tagging">Docs</a> </td> </tr> <tr> <td> DeepPavlov BERT Syntax <a name="deeppavlov_bert_syntax"> <a href="#deeppavlov_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> BERT + biaffine head. 
<a href="http://docs.deeppavlov.ai/en/master/features/models/syntaxparser.html">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#ner">Slovnet NER</a> <a name="slovnet_ner"> <a href="#slovnet_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT NER <a name="slovnet_bert_ner"> <a href="#slovnet_bert_ner"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#morph">Slovnet Morph</a> <a name="slovnet_morph"> <a href="#slovnet_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Morph <a name="slovnet_bert_morph"> <a href="#slovnet_bert_morph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/slovnet#syntax">Slovnet Syntax</a> <a name="slovnet_syntax"> <a href="#slovnet_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> Slovnet BERT Syntax <a name="slovnet_bert_syntax"> <a href="#slovnet_bert_syntax"><code>#</code></a> </td> <td> <code>syntax</code> </td> <td> </td> </tr> <tr> <td> <a href="http://pullenti.ru/">PullEnti</a> <a name="pullenti"> <a href="#pullenti"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> </td> <td> First place at factRuEval-2016; a highly sophisticated rule-based system </td> </tr> <tr> <td> <a href="https://stanfordnlp.github.io/stanza/">Stanza</a> <a name="stanza"> <a href="#stanza"><code>#</code></a> </td> <td> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Tool by Stanford NLP released in 2020. 
<a href="https://arxiv.org/pdf/2003.07082.pdf">Paper</a> </td> </tr> <tr> <td> <a href="https://spacy.io/">SpaCy</a> <a name="spacy"> <a href="#spacy"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> <code>ner</code> <code>morph</code> <code>syntax</code> </td> <td> Uses <a href="https://github.com/buriy/spacy-ru">Russian models</a> trained by @buriy </td> </tr> <tr> <td> <a href="https://texterra.ispras.ru">Texterra</a> <a name="texterra"> <a href="#texterra"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> <code>ner</code> <code>token</code> <code>sent</code> </td> <td> Multifunctional NLP solution by <a href="https://www.ispras.ru/">ISP RAS</a> </td> </tr> <tr> <td> <a href="https://github.com/yandex/tomita-parser/">Tomita</a> <a name="tomita"> <a href="#tomita"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> GLR parser by Yandex; only the person-name implementation is publicly available </td> </tr> <tr> <td> <a href="https://github.com/mit-nlp/MITIE">MITIE</a> <a name="mitie"> <a href="#mitie"><code>#</code></a> </td> <td> <code>ner</code> </td> <td> Engine developed at MIT + <a href="http://lang.org.ua/en/models/">third-party model for the Russian language</a> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rupostagger">RuPosTagger</a> <a name="rupostagger"> <a href="#rupostagger"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> CRF tagger, part of the <a href="http://www.solarix.ru/">Solarix project</a> </td> </tr> <tr> <td> <a href="https://github.com/IlyaGusev/rnnmorph">RNNMorph</a> <a name="rnnmorph"> <a href="#rnnmorph"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> First-place solution at morphoRuEval-2017. 
<a href="https://habr.com/ru/post/339954/">Post on Habr</a> </td> </tr> <tr> <td> <a href="https://github.com/chomechome/maru">Maru</a> <a name="maru"> <a href="#maru"><code>#</code></a> </td> <td> <code>morph</code> </td> <td> </td> </tr> <tr> <td> <a href="http://ufal.mff.cuni.cz/udpipe">UDPipe</a> <a name="udpipe"> <a href="#udpipe"><code>#</code></a> </td> <td> <code>morph</code> <code>syntax</code> </td> <td> Model trained on SynTagRus </td> </tr> <tr> <td> <a href="https://www.nltk.org/">NLTK</a> <a name="nltk"> <a href="#nltk"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Multifunctional library; provides a model for Russian text segmentation. <a href="https://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize">Docs</a> </td> </tr> <tr> <td> <a href="https://github.com/nlpub/pymystem3">MyStem</a> <a name="mystem"> <a href="#mystem"><code>#</code></a> </td> <td> <code>token</code> <code>morph</code> </td> <td> Wrapper for the Yandex MyStem morphological analyzer </td> </tr> <tr> <td> <a href="https://github.com/luismsgomes/mosestokenizer">Moses</a> <a name="moses"> <a href="#moses"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Wrapper for the Perl Moses utilities </td> </tr> <tr> <td> <a href="https://github.com/fnl/segtok">SegTok</a> <a name="segtok"> <a href="#segtok"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/Koziev/rutokenizer">RuTokenizer</a> <a name="rutokenizer"> <a href="#rutokenizer"><code>#</code></a> </td> <td> <code>token</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/natasha/razdel">Razdel</a> <a name="razdel"> <a href="#razdel"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> </td> </tr> <tr> <td> <a href="https://github.com/aatimofeev/spacy_russian_tokenizer">Spacy Russian Tokenizer</a> <a name="spacy_russian_tokenizer"> <a 
href="#spacy_russian_tokenizer"><code>#</code></a> </td> <td> <code>token</code> <code>sent</code> </td> <td> Spacy segmentation pipeline for Russian texts by @aatimofeev </td> </tr> <tr> <td> <a href="https://github.com/deepmipt/ru_sentence_tokenizer">RuSentTokenizer</a> <a name="rusenttokenizer"> <a href="#rusenttokenizer"><code>#</code></a> </td> <td> <code>sent</code> </td> <td> DeepPavlov sentence segmentation </td> </tr> </table>
Tokenization
See the <a href="https://github.com/natasha/razdel#evaluation">Razdel evaluation section</a> for more info.
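The tables below report token-level errors against gold segmentations; the exact metric lives in the Naeval and Razdel code. As an illustration only (not Naeval's actual implementation), a boundary-based error count can compare the character spans of predicted and gold tokens; `token_spans`, `boundary_errors`, and the sample text below are hypothetical:

```python
def token_spans(tokens, text):
    """Recover (start, stop) character spans by scanning the text left to right."""
    result, pos = [], 0
    for token in tokens:
        start = text.index(token, pos)  # assumes tokens appear in order in text
        result.append((start, start + len(token)))
        pos = start + len(token)
    return result


def boundary_errors(text, pred_tokens, gold_tokens):
    """Count spans present in only one of the two segmentations."""
    pred = set(token_spans(pred_tokens, text))
    gold = set(token_spans(gold_tokens, text))
    return len(pred ^ gold)


# Naive whitespace split keeps '0.5л' as one token instead of '0.5' + 'л'
text = 'Кружка-термос на 0.5л'
gold = ['Кружка-термос', 'на', '0.5', 'л']
print(boundary_errors(text, text.split(), gold))  # 3 mismatched spans
```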
<!--- token ---> <table border="0" class="dataframe"> <thead> <tr> <th></th> <th colspan="2" halign="left">corpora</th> <th colspan="2" halign="left">syntag</th> <th colspan="2" halign="left">gicrya</th> <th colspan="2" halign="left">rnc</th> </tr> <tr> <th></th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> <th>errors</th> <th>time</th> </tr> </thead> <tbody> <tr> <th>re.findall(\w+|\d+|\p+)</th> <td>24</td> <td>0.5</td> <td>16</td> <td>0.5</td> <td>19</td> <td>0.4</td> <td>60</td> <td>0.4</td> </tr> <tr> <th>spacy</th> <td>26</td> <td>6.2</td> <td>13</td> <td>5.8</td> <td><b>14</b></td> <td>4.1</td> <td>32</td> <td>3.9</td> </tr> <tr> <th>nltk.word_tokenize</th> <td>60</td> <td>3.4</td> <td>256</td> <td>3.3</td> <td>75</td> <td>2.7</td> <td>199</td> <td>2.9</td> </tr> <tr> <th>mystem</th> </tr> </tbody> </table>

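The time columns are wall-clock benchmark timings (lower is better). As a rough sketch of how such a measurement can be taken, the snippet below times a regexp baseline; `TOKEN_RE` is an assumed stand-in for the table's `re.findall(\w+|\d+|\p+)` shorthand, and `bench` is a hypothetical helper, not Naeval's harness:

```python
import re
import time

TOKEN_RE = re.compile(r'\w+|[^\w\s]')  # rough stand-in for the baseline pattern


def bench(tokenize, texts):
    """Return wall-clock seconds spent tokenizing all texts."""
    start = time.perf_counter()
    for text in texts:
        tokenize(text)
    return time.perf_counter() - start


elapsed = bench(TOKEN_RE.findall, ['Кружка-термос на 0.5л'] * 10_000)
print(f'{elapsed:.3f}s')
```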