proycon / Pynlpl: PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists and building simple language models. It also provides more complex data types and algorithms. Moreover, it includes parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL), as well as clients for interfacing with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
paperai / Pdfanno: Linguistic annotation and visualization tool for PDF documents
proycon / Flat: FoLiA Linguistic Annotation Tool. Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
ines / Spacy Graphql: 🤹‍♀️ Query spaCy's linguistic annotations using GraphQL
korpling / ANNIS: ANNIS is an open-source, versatile, web-browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation.
ETCBC / Bhsa: Hebrew Bible + linguistic annotations in text-fabric format. Fixed and ongoing versions.
proycon / Folia: FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for representing language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations is supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Clear-Bible / Macula Hebrew: Syntax trees, morphology, and linguistic annotations for the Hebrew Bible
Clear-Bible / Macula Greek: Syntax trees, morphology, and linguistic annotations for the Greek Bible
hltfbk / E3C Corpus: E3C is a freely available multilingual corpus (Italian, English, French, Spanish, and Basque) of semantically annotated clinical narratives, intended for linguistic analysis, benchmarking, and training of information extraction systems. It contains two types of annotations: (i) clinical entities (pathologies, symptoms, procedures, body parts, etc.) according to standard clinical taxonomies (i.e. SNOMED-CT, ICD-10); and (ii) temporal information and factuality (events, time expressions, and temporal relations) according to the THYME standard. The corpus is organised into three layers with different purposes. Layer 1: about 25K tokens per language with full manual annotation of clinical entities, temporal information, and factuality, for benchmarking and linguistic analysis. Layer 2: 50-100K tokens per language with semi-automatic annotations of clinical entities, to be used to train baseline systems. Layer 3: about 1M tokens per language of non-annotated medical documents, to be exploited by semi-supervised approaches. Researchers can use the benchmark training and test splits of the corpus to develop and test their own models. We trained several deep-learning-based models and provide baselines using the benchmark. Both the corpus and the trained models will be available through the ELG platform.
acoli-repo / Olia: Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages.
cl-tohoku / PheMT: A phenomenon-wise evaluation dataset for Japanese-English machine translation robustness. The dataset is based on the MTNT dataset, with additional annotations of four linguistic phenomena: Proper Noun, Abbreviated Noun, Colloquial Expression, and Variant. COLING 2020.
proycon / Foliapy: An extensive Python library for working with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation that finds application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
cidles / Poio Api: Poio API is a free and open-source Python library for accessing and searching data from language documentation in your linguistic analysis workflow. It converts file formats such as Elan’s EAF, Toolbox files, Typecraft XML, and others into annotation graphs as defined in ISO 24612. These graphs, for which we use an implementation called “Graph Annotation Framework” (GrAF), allow unified access to linguistic data from a wide range of sources.
infraling / Atomic: Software for multi-level annotation of linguistic corpora
hexatomic / Hexatomic: Hexatomic is extensible software for deep multi-layer annotation of linguistic corpora
ld4lt / Linguistic Annotation: Towards a consolidated LOD vocabulary for linguistic annotations
jtauber / Plato Texts: Greek texts (eventually) with linguistic annotation (for the Greek Learner Texts Project)
neulab / Cmulab: CMU Linguistic Annotation Backend
tpellard / Typgloss: A LaTeX package to typeset linguistic abbreviations and syntactic annotations in interlinear glossed examples
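The PyNLPl entry mentions extracting n-grams and frequency lists and building simple language models. As a library-independent illustration of those ideas, here is a minimal sketch in plain Python using only the standard library. It is not PyNLPl's API: the function names `ngrams`, `frequency_list`, and `bigram_prob` are hypothetical, and the bigram model assumes a plain maximum-likelihood estimate with no smoothing.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(tokens: List[str], n: int = 1) -> Counter:
    """Count how often each n-gram occurs (hypothetical helper)."""
    return Counter(ngrams(tokens, n))


def bigram_prob(tokens: List[str], prev: str, word: str) -> float:
    """Unsmoothed maximum-likelihood estimate of P(word | prev).

    Computed as count(prev, word) / count(prev as a bigram history);
    returns 0.0 when prev never occurs as a history.
    """
    bigrams = frequency_list(tokens, 2)
    history = Counter(tokens[:-1])  # every token except the last starts a bigram
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


tokens = "the cat sat on the mat".split()
print(frequency_list(tokens).most_common(1))  # most frequent unigram
print(bigram_prob(tokens, "the", "cat"))      # 'the' is followed by 'cat' once out of two
```

A real toolkit would add smoothing and sentence-boundary markers; they are left out here to keep the counting logic visible.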