SkillAgentSearch skills...

Syntaxdot

Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Install / Use

/learn @tensordot/Syntaxdot

README

SyntaxDot

Introduction

SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch or using pretrained models, such as BERT or XLM-RoBERTa.

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatbile with sticker2 models).

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding.
  • Multi-task training and classification using scalar weighting.
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license

Documentation

References

SyntaxDot uses techniques from or was inspired by the following papers:

Issues

You can report bugs and feature requests in the SyntaxDot issue tracker.

License

For licensing information, see COPYRIGHT.md.

View on GitHub
GitHub Stars80
CategoryCustomer
Updated3mo ago
Forks3

Languages

Rust

Security Score

82/100

Audited on Jan 1, 2026

No findings