HuSpaCy: industrial-strength Hungarian natural language processing

HuSpaCy is a spaCy-based library providing industrial-strength Hungarian language processing facilities through spaCy models. The released pipelines consist of a tokenizer, sentence splitter, lemmatizer, tagger (predicting morphological features as well), dependency parser and a named entity recognition module. Word and phrase embeddings are also available through spaCy's API. All models have high throughput, decent memory usage and close to state-of-the-art accuracy. A live demo is available at Hugging Face Spaces, and model releases are published to the Hugging Face Hub.
This repository contains material to build HuSpaCy and all of its models in a reproducible way.
Installation
To get started using the tool, first download one of the models. The easiest way to do this is to install huspacy (from PyPI) and then fetch a model through its API:
```shell
pip install huspacy
```

```python
import huspacy

# Download the latest CPU-optimized model
huspacy.download()
```
Install the models directly
You can install the latest models directly from 🤗 Hugging Face Hub:
- CPU optimized large model:

  ```shell
  pip install hu_core_news_lg@https://huggingface.co/huspacy/hu_core_news_lg/resolve/main/hu_core_news_lg-any-py3-none-any.whl
  ```

- GPU optimized transformers model:

  ```shell
  pip install hu_core_news_trf@https://huggingface.co/huspacy/hu_core_news_trf/resolve/main/hu_core_news_trf-any-py3-none-any.whl
  ```
To speed up inference on GPU, CUDA must be installed as described in https://spacy.io/usage.
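Before loading a transformer model, it can be useful to check whether spaCy actually picked up the GPU. A minimal sketch using spaCy's standard `spacy.prefer_gpu()` helper (it assumes only that spaCy itself is installed):

```python
import spacy

# spacy.prefer_gpu() tries to allocate the GPU and returns True on
# success; on a CPU-only machine it returns False, and transformer
# models would fall back to much slower CPU inference.
using_gpu = spacy.prefer_gpu()
print("GPU enabled:", using_gpu)
```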
Read more about the models in the Models overview section below.
Quickstart
HuSpaCy is fully compatible with spaCy's API; newcomers can easily get started with the spaCy 101 guide.
Although HuSpaCy models can be loaded with spacy.load(...), the tool provides convenience methods to easily access downloaded models.
```python
# Load the model using spacy.load(...)
import spacy

nlp = spacy.load("hu_core_news_lg")
```

```python
# Load the default large model (if downloaded)
import huspacy

nlp = huspacy.load()
```

```python
# Load the model directly as a module
import hu_core_news_lg

nlp = hu_core_news_lg.load()
```
To process texts, you can simply call the loaded model (i.e. the `nlp` callable object):

```python
doc = nlp("Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.")
```
As HuSpaCy is built on spaCy, the returned `doc` object contains all the annotations produced by the pipeline components.
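The access pattern can be illustrated without downloading anything: `spacy.blank("hu")` ships with spaCy itself and provides tokenization only, while the full HuSpaCy models expose exactly the same `Doc`/`Token` API with the remaining annotation layers (lemmas, POS tags, morphology, dependencies, entities) filled in. A minimal sketch:

```python
import spacy

# Tokenizer-only Hungarian pipeline bundled with spaCy; HuSpaCy
# models add lemmatization, tagging, parsing and NER on top.
nlp = spacy.blank("hu")
doc = nlp("Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.")

# Token-level annotations are plain attributes on each token
for token in doc:
    print(token.i, token.text)

# With a full HuSpaCy model you would also inspect, e.g.:
# for ent in doc.ents:
#     print(ent.text, ent.label_)
```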
API documentation is available on our website.
Models overview
We provide several pretrained models:
- `hu_core_news_lg` is a CNN-based large model which achieves a good balance between accuracy and processing speed. This default model provides tokenization, sentence splitting, part-of-speech tagging (UD labels with detailed morphosyntactic features), lemmatization, dependency parsing and named entity recognition, and ships with pretrained word vectors.
- `hu_core_news_trf` is built on huBERT and provides the same functionality as the large model except for the word vectors. It comes with much higher accuracy at the price of increased computational resource usage. We suggest using it with GPU support.
- `hu_core_news_md` greatly improves on `hu_core_news_lg`'s throughput by losing some accuracy. This model could be a good choice when processing speed is crucial.
- `hu_core_news_trf_xl` is an experimental model built on XLM-RoBERTa-large. It provides the same functionality as the `hu_core_news_trf` model; however, it comes with slightly higher accuracy at the price of significantly increased computational resource usage. We suggest using it with GPU support.
HuSpaCy's model versioning follows spaCy's versioning scheme.
A demo of the models is available at Hugging Face Spaces.
To read more about the models' architecture, we suggest reading the relevant sections of spaCy's documentation.
Comparison
| Models          | md          | lg          | trf                     | trf_xl                             |
|-----------------|-------------|-------------|-------------------------|------------------------------------|
| Embeddings      | 100d floret | 300d floret | transformer:<br/>huBERT | transformer:<br/>XLM-RoBERTa-large |
| Target hardware | CPU         | CPU         | GPU                     | GPU                                |
| Accuracy        | ⭑⭑⭑⭒        | ⭑⭑⭑⭑        | ⭑⭑⭑⭑⭒                   | ⭑⭑⭑⭑⭑                              |
| Resource usage  | ⭑⭑⭑⭑⭑       | ⭑⭑⭑⭑        | ⭑⭑                      | ⭒                                  |
Citation
If you use HuSpaCy or any of its models, please cite it as:
```bibtex
@InProceedings{HuSpaCy:2023,
  author = {Orosz, Gy{\"o}rgy and Szab{\'o}, Gerg{\H{o}} and Berkecz, P{\'e}ter and Sz{\'a}nt{\'o}, Zsolt and Farkas, Rich{\'a}rd},
  editor = {Ek{\v{s}}tein, Kamil and P{\'a}rtl, Franti{\v{s}}ek and Konop{\'i}k, Miloslav},
  title = {{Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines}},
  booktitle = {{Text, Speech, and Dialogue}},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {58--69},
  isbn = {978-3-031-40498-6}
}
```
```bibtex
@InProceedings{HuSpaCy:2021,
  title = {{HuSpaCy: an industrial-strength Hungarian natural language processing toolkit}},
  booktitle = {{XVIII. Magyar Sz{\'a}m{\'\i}t{\'o}g{\'e}pes Nyelv{\'e}szeti Konferencia}},
  author = {Orosz, Gy{\"o}rgy and Sz{\'a}nt{\'o}, Zsolt and Berkecz, P{\'e}ter and Szab{\'o}, Gerg{\H{o}} and Farkas, Rich{\'a}rd},
  year = {2021}
}
```