Verbecc
Verbe Complete Conjugator (verbecc) supports Catalan, Spanish, French, Italian, Portuguese and Romanian and can predict conjugation for unknown verbs using Machine Learning
Python library for verb conjugation in French, Spanish, Catalan, Italian, Portuguese, and Romanian, enhanced by machine learning
[EN] Verbs completely conjugated: verb conjugations for French, Spanish, Portuguese, Italian, Romanian and Catalan, enhanced by machine learning
[CA] Verbs completament conjugats: conjugacions verbals per a francès, espanyol, portuguès, italià, romanès i català, millorades per l'aprenentatge automàtic
[ES] Verbos completamente conjugados: conjugaciones de verbos en francés, español, portugués, italiano, rumano y catalán, mejoradas por aprendizaje automático
[FR] Verbes complètement conjugués: conjugaisons des verbes français, espagnol, portugais, italien, roumain et catalan, à l'aide de l'apprentissage automatique
[IT] Verbi completamente coniugati: coniugazioni di verbi per francese, spagnolo, portoghese, italiano, rumeno e catalano, migliorate dall'apprendimento automatico
[PT] Verbos completamente conjugados: conjugações verbais para francês, espanhol, português, italiano, romeno e catalão, aprimoradas pelo aprendizado de máquina
[RO] Verbe complet conjugate: conjugări de verbe pentru franceză, spaniolă, portugheză, italiană, română și catalană, îmbunătățite de învățarea automată
Contents
- Quick Start
- Live Demo
- Example Output
- What's new in Verbecc 2.0
- Academic publications referencing Verbecc
- Typing - Parameter and Data Type Annotations
- Multi-Language Conjugation
- Multi-Language Conjugation using English mood and tense names via localization module
- Credits
Live Demo
Example Output
| Français / French | Català / Catalan | Español / Castellano / Spanish | Português / Portuguese | Italiano / Italian | Română / Romanian |
| ------ | ------ | ------- | -------- | --------- | ------ |
| French être (to be) | Catalan ser (to be) | Spanish ser (to be) | Portuguese ser (to be) | Italian essere (to be) | Romanian fi (to be) |
| French se lever (to lift oneself) | | | | | |
| French ubériser (to "uberize") (unknown verb conjugated with ML template prediction) | | | | | |
Features
- Multilingual
  - Conjugate verbs in six Romance languages: French, Spanish, Portuguese, Italian, Romanian, Catalan
  - Includes Spanish voseo conjugation, with regional options in development
  - Predict conjugation of unknown verbs with 99% accuracy using machine learning techniques
  - Conjugate thousands of known verbs without machine learning, using simple string transformations based on XML conjugation templates
- Complete
  - Includes both simple and compound conjugations (i.e. with helping/auxiliary verbs)
  - Includes alternate conjugations (for regional variations, e.g. Catalan vs. Valencian)
  - Includes inflections for all genders where applicable
  - Includes inflections for miscellaneous pronouns, such as the Spanish pronouns usted and ustedes and the French pronoun on
- Quality
  - Fully type-annotated Python library
    - Unit tests require type annotations on everything
    - Typed return data
  - Meticulously organized source tree
  - Has a plethora of unit tests to ensure correctness of verb conjugations
  - Continuous Integration with GitHub Actions CI/CD pipeline
    - CI tests Python 3.9, 3.10, 3.11, 3.12, 3.13 and 3.14
  - Dependencies: scikit-learn, scipy, numpy, lxml, pyaml, jsbeautifier, importlib_resources
- Trusted
  - Cited in academic publications
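The template-based approach mentioned in the feature list can be sketched roughly as follows. This is a minimal illustration, not verbecc's implementation: the TEMPLATES and VERB_TEMPLATES dictionaries, the template names and the conjugate function are hypothetical stand-ins for the XML conjugation templates, and elision (e.g. j'aime) is deliberately ignored.

```python
# Hypothetical, simplified stand-in for XML conjugation templates.
# Template names follow a "stem:ending" pattern (an assumption for this sketch);
# each tense lists one ending per person: 1s, 2s, 3s, 1p, 2p, 3p.
TEMPLATES = {
    "aim:er": {"présent": ["e", "es", "e", "ons", "ez", "ent"]},
    "fin:ir": {"présent": ["is", "is", "it", "issons", "issez", "issent"]},
}

# Known verbs are mapped directly to a template; no machine learning is involved.
VERB_TEMPLATES = {"parler": "aim:er", "finir": "fin:ir", "choisir": "fin:ir"}

PRONOUNS = ["je", "tu", "il", "nous", "vous", "ils"]


def conjugate(infinitive: str, tense: str = "présent") -> list:
    """Conjugate a known verb by swapping its infinitive ending for template endings."""
    template = VERB_TEMPLATES[infinitive]
    # The part after ':' is the infinitive ending that gets stripped off.
    infinitive_ending = template.split(":")[1]
    stem = infinitive[: -len(infinitive_ending)]
    endings = TEMPLATES[template][tense]
    return [f"{pronoun} {stem}{ending}" for pronoun, ending in zip(PRONOUNS, endings)]


print(conjugate("parler"))
print(conjugate("finir"))
```

In this sketch an unknown verb simply raises a KeyError; in verbecc that case is where the machine learning template prediction takes over.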
Quick Start
git clone https://github.com/bretttolbert/verbecc.git
cd verbecc
pip install .
Academic publications referencing verbecc
What's new in Verbecc 2.0
| verbecc 1.x | verbecc 2.x |
| --- | --- |
| lang='fr' | lang=Lang.fr / from verbecc import LangCodeISO639_1 as Lang |
| mood="indicatif" | mood=Moods.fr.Indicatif / from verbecc import Moods |
| tense="présent" | tense=Tenses.fr.Présent / from verbecc import Tenses |
| gender='f' | gender=Gender.f / from verbecc import Gender |
| person="1s" | person=Person.First, number=Number.Singular / from verbecc import Person, Number |
| Conjugations include masculine pronouns (default) or feminine but not both | All pronouns, including both masculine and feminine third-person pronouns, are included |
| lang_specific_options is a parameter of the conjugate method | lang_specific_options is a parameter of the CompleteConjugator class constructor |
| gender is a parameter of the conjugate method | there is no gender parameter, instead all possible gender inflections are returned |
| alternate_options is a parameter of the conjugate method | there is no alternate_options parameter, instead all possible conjugations, including alternates, are returned (use c[0] to get default conjugation, c[1] to get first alternate, etc.) |
| Spanish conjugations include tú (default) or vos but not both | All pronouns, including both tú and vos, are included |
| Pronouns such as French on and Spanish usted/ustedes not included | French on and Spanish usted/ustedes pronouns are included |
| Array index is used to determine Person, i.e. 1s, 2s, 3s, 1p, 2p, 3p | Each Conjugation object in the TenseConjugation has Person, Number and Gender values (any of which may be None if not applicable) |
| Returned objects are primitive (Dict) data types | Returned wrapper objects are subclasses of AbstractConjugation (e.g. CompleteConjugation) with get_data() and to_json() methods |
| Conjugator returns CompleteConjugationData | CompleteConjugator returns wrapper type CompleteConjugation, CompleteConjugation.get_data() returns CompleteConjugationData |
| (no wrapper types) | Wrapper types hierarchy: CompleteConjugation > MoodsConjugation > MoodConjugation > TenseConjugation > Conjugation -> conjugations: List[str] |
| Primitive data types hierarchy: Conjugation > MoodsConjugation > MoodConjugation > TenseConjugation > PersonConjugation | Primitive data types hierarchy: CompleteConjugationData > MoodsConjugationData > MoodConjugationData > TenseConjugationData > ConjugationData -> conjugations: List[str] |
| pred_score was always included in the output | pred_score is only included in output if predicted is true |
| Only returned primitive Python data | Conjugation objects have both .to_json() and .to_yaml() methods |
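To make the 2.x data model in the table more concrete, here is a small self-contained sketch of the idea that each conjugation entry carries its own person, number and gender, and that index 0 is the default form while later indices are alternates. ConjugationSketch is a hypothetical stand-in written for this illustration, not verbecc's actual Conjugation class, and the sample data is hand-written rather than library output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConjugationSketch:
    """Hypothetical stand-in for one entry of a tense conjugation."""

    person: Optional[str]    # '1', '2', '3', or None if not applicable
    number: Optional[str]    # 's', 'p', or None if not applicable
    gender: Optional[str]    # 'm', 'f', or None if not applicable
    conjugations: List[str]  # default form first, then any alternates

    def __getitem__(self, index: int) -> str:
        # c[0] is the default conjugation, c[1] the first alternate, etc.
        return self.conjugations[index]


# Hand-written sample entries (French être, present indicative, partial).
tense = [
    ConjugationSketch("1", "s", None, ["je suis"]),
    ConjugationSketch("3", "s", "m", ["il est"]),
    ConjugationSketch("3", "s", "f", ["elle est"]),
    ConjugationSketch("3", "s", None, ["on est"]),
]

# Person is read from each entry instead of being inferred from the list position.
defaults = [c[0] for c in tense]
third_singular = [c[0] for c in tense if c.person == "3" and c.number == "s"]
feminine = [c[0] for c in tense if c.gender == "f"]

# An entry with an alternate form (Spanish ser, imperfect subjunctive).
alt = ConjugationSketch("1", "s", None, ["yo fuera", "yo fuese"])
print(defaults, third_singular, feminine, alt[1])
```

Because no positional convention is needed, extra pronouns such as on or usted can be added to a tense without shifting the meaning of any index.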
Typing - Parameter and Data Type Annotations
Originally, verbecc used strings for most parameters. It is now fully type-annotated, but strings are still supported for backwards compatibility and ease of use. This is accomplished by using StrEnum for parameters and by defining a hierarchy of typing-based type definitions for the returned data objects (see conjugation.py).
E.g.:
>>> from verbecc import grammar_defines, localization, Moods, Tenses, Person, Number, Gender, LangCodeISO639_1 as Lang
>>> xmood = localization.xmood
>>> xtense = localization.xtense
>>> grammar_defines.SUPPORTED_LANGUAGES[Lang.fr]
'français'
>>> xtense(Lang.fr, Tenses.en.Present)
<TenseFr.Présent: 'présent'>
>>> xmood(Lang.fr, Moods.en.Subjunctive)
<MoodFr.Subjonctif: 'subjonctif'>
>>> Gender.f
<Gender.f: 'f'>
>>> Number.Singular
<Number.Singular: 's'>
>>> Person.First
<Person.First: '1'>
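The reason string-valued enums keep plain strings working can be shown with a small standalone sketch. GenderSketch and describe are hypothetical names invented for this illustration, and the sketch mixes str into Enum (which behaves much like StrEnum and also runs on older Python versions); it is an assumption that verbecc's own enums are defined in a similar way.

```python
from enum import Enum
from typing import Union


class GenderSketch(str, Enum):
    """Hypothetical enum; the str mix-in makes members compare equal to strings."""

    m = "m"
    f = "f"


def describe(gender: Union[GenderSketch, str]) -> str:
    # Normalise first, so both enum members and plain strings are accepted.
    # An unsupported value such as "x" raises ValueError here.
    gender = GenderSketch(gender)
    return "feminine" if gender is GenderSketch.f else "masculine"


print(describe("f"))             # legacy string argument
print(describe(GenderSketch.m))  # typed enum argument
print(GenderSketch.f == "f")     # members compare equal to their string value
```

Normalising at the function boundary keeps older string-based call sites working while type checkers and editors can still see the full set of allowed values.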
Multi-Language Conjugation
>>> from functools import partial
>>> from verbecc import CompleteConjugator, LangCodeISO639_1 as Lang, grammar_defines, Moods, Tenses
>>> ccgs = {lang : CompleteConjugator(lang) for lang in grammar_defines.SUPPORTED_LANGUAGES}
>>> print([c[0] for c in ccgs[Lang.fr].conjugate('être')[Moods.fr.Indicatif][Tenses.fr.Présent]])
['je suis', 'tu es', 'il est', 'elle est', 'on est', 'nous sommes', 'vous êtes', 'ils sont', 'elles sont']
>>> print([c[0] for c in ccgs[Lang.es].conjugate('ser')[Moods.es.Indicativo][Tenses.es.Presente]])
[
