8 skills found
aalto-speech / Morfessor: Morfessor is a tool for unsupervised and semi-supervised morphological segmentation
mast-group / OpenVocabCodeNLM: Contains the code for our ICSE 2020 paper "Big Code != Big Vocabulary: Open-Vocabulary Language Models for Source Code" and for its earlier pre-print "Maybe Deep Neural Networks are the Best Choice for Modeling Source Code" (https://arxiv.org/abs/1903.05734). This is the first open-vocabulary language model for code that uses the byte pair encoding (BPE) algorithm to learn a segmentation of code tokens into subword units.
jiesutd / SubwordEncoding CWS: Subword Encoding in Lattice LSTM for Chinese Word Segmentation
aalto-speech / Flatcat: Morfessor FlatCat
Waino / Morfessor Emprune: Morfessor EM+Prune
majeek / Vml Hd: Parsing and subword segmentation code for the VML-HD Dataset
marialyu / Arabic Segmentation: Prototype Python tool for extracting Arabic text lines from images and segmenting them into subwords
cooelf / Subword Seg: Effective Subword Segmentation for Text Comprehension (TASLP 2019)