PyNLPl - Python Natural Language Processing Library
.. image:: https://travis-ci.org/proycon/pynlpl.svg?branch=master
    :target: https://travis-ci.org/proycon/pynlpl

.. image:: http://readthedocs.org/projects/pynlpl/badge/?version=latest
    :target: http://pynlpl.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: http://applejack.science.ru.nl/lamabadge.php/pynlpl
    :target: http://applejack.science.ru.nl/languagemachines/

.. image:: https://zenodo.org/badge/759484.svg
    :target: https://zenodo.org/badge/latestdoi/759484

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.
The following modules are available:

* ``pynlpl.datatypes`` - Extra datatypes (priority queues, patterns, tries)
* ``pynlpl.evaluation`` - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
* ``pynlpl.formats.cgn`` - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
* ``pynlpl.formats.folia`` - Extensive library for reading and manipulating documents in `FoLiA <http://proycon.github.io/folia>`_ format (Format for Linguistic Annotation).
* ``pynlpl.formats.fql`` - Extensive library for the FoLiA Query Language (FQL), built on top of ``pynlpl.formats.folia``. FQL is currently documented `here <https://github.com/proycon/foliadocserve>`__.
* ``pynlpl.formats.cql`` - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
* ``pynlpl.formats.giza`` - Module for reading GIZA++ word alignment data
* ``pynlpl.formats.moses`` - Module for reading Moses phrase-translation tables.
* ``pynlpl.formats.sonar`` - Largely obsolete module for pre-releases of the SoNaR corpus, use ``pynlpl.formats.folia`` instead.
* ``pynlpl.formats.timbl`` - Module for reading Timbl output (consider using `python-timbl <https://github.com/proycon/python-timbl>`_ instead though)
* ``pynlpl.lm.lm`` - Module for simple language models, plus a reader for ARPA language model data (as used by SRILM).
* ``pynlpl.search`` - Various search algorithms (breadth-first, depth-first, beam search, hill climbing, A*, and variants of each)
* ``pynlpl.statistics`` - Frequency lists, Levenshtein, common statistics and information theory functions
* ``pynlpl.textprocessors`` - Simple tokeniser, n-gram extraction

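To make the two most basic tasks mentioned above concrete, here is a minimal sketch of n-gram extraction and a frequency list. It deliberately uses only the Python standard library and does not call PyNLPl itself: the function names ``extract_ngrams`` and ``frequency_list`` are hypothetical and chosen for illustration, so consult the API documentation for the actual classes in ``pynlpl.textprocessors`` and ``pynlpl.statistics``.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples.

    Hypothetical helper for illustration; not part of PyNLPl.
    """
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items):
    """Count how often each item occurs (a plain Counter, not PyNLPl's class)."""
    return Counter(items)


# Whitespace splitting stands in for a real tokeniser here.
tokens = "to be or not to be".split()
bigrams = extract_ngrams(tokens, 2)
freq = frequency_list(bigrams)

# The bigram ('to', 'be') occurs twice; all others occur once.
print(freq.most_common(1))
```

The n-grams are stored as tuples so that they are hashable and can serve directly as keys in the frequency list.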
Installation
Download and install the latest stable version directly from the Python Package
Index with ``pip install pynlpl`` (or ``pip3`` for Python 3 on most
systems). For global installations prepend ``sudo``.

Alternatively, clone this repository and run ``python setup.py install`` (or
``python3 setup.py install`` for Python 3 on most systems). Prepend ``sudo`` for
global installations.

This software may also be found in certain Linux distributions, such as
the latest versions of Debian/Ubuntu, as ``python-pynlpl`` and ``python3-pynlpl``.
PyNLPl is also included in our `LaMachine <http://proycon.github.io/LaMachine>`_ distribution.
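
Since a system may have several Python interpreters, it can be useful to confirm that the package is visible to the one you intend to use. The following is a small sketch using only the standard library; the helper name ``is_installed`` is hypothetical, and it assumes the package is importable under the name ``pynlpl``, as the module list above suggests.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Run this with the same interpreter (python or python3) you installed for.
print("pynlpl available:", is_installed("pynlpl"))
```

If this prints ``False`` after a successful install, the package was most likely installed for a different interpreter than the one running the check.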
Documentation
API documentation can be found `here <http://pynlpl.readthedocs.io/en/latest/>`__.
