34 skills found · Page 1 of 2
openai / Tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models.
dragonofmercy / Tokenize2: Tokenize2 is a plugin that lets your users select multiple items from a predefined list or an AJAX source, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field for messages on Facebook or tags on Tumblr.
LanguageMachines / Ucto: Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
joliciel-informatique / Talismane: NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser
andreihar / Taibun: Taiwanese Hokkien Transliterator and Tokeniser
proycon / Python Ucto: This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
jonsafari / Tok Tok: A fast, simple, multilingual tokenizer
mvitlov / Tiktoken: tiktoken is a BPE tokeniser for use with OpenAI's models
jesopo / Irctokens: RFC1459 and IRCv3 protocol tokeniser library for python3
pavelsof / Ipatok: IPA tokeniser
pringao-chevere / NFT Horcrux: NFT Tokeniser
rockerBOO / Sd Tokenizer: View the tokenisation of your words using the tokeniser for a Stable Diffusion model.
bauwenst / TkTkT: A collection of Pythonic subword tokenisers and text preprocessing tools.
ShrimpingIt / Medea: Low-overhead JSON tokeniser / lexer / parser library for Micropython and Python3
ben-sb / Jisu: JavaScript Parser
robfahey / Ja Tokeniser: MeCab-based Japanese Language Tokeniser optimised for Twitter Data
danny50610 / Bpe Tokeniser: PHP port for openai/tiktoken (most)
JerryFans / Flutter Tiktoken: flutter_tiktoken is an offline Flutter package providing a fast BPE tokeniser for OpenAI models.
steve-fryatt / Tokenize: Cross-platform tokeniser for ARM BBC BASIC.
kuhumcst / Rtfreader: Text segmenter and tokeniser for Danish, English and other languages. Reads an RTF or flat text file and outputs the text, one line per sentence & optionally tokenized.