
English

World's most accurate and fast procedural English conjugation library


english


english is a blazing-fast, lightweight English inflection library written in Rust. The total bundled data size is less than 1 MB. It provides highly accurate verb conjugation and noun/adjective declension based on heavily processed Wiktionary data, which makes it well suited to real-time procedural text generation.

⚡ Speed and Accuracy

Evaluation of the English inflector (extractor/main.rs/check_*) and performance benchmarking (examples/speedmark.rs) shows:

| Part of Speech | Correct / Total | Accuracy | Throughput (calls/sec) | Time per Call |
|----------------|-----------------|----------|------------------------|---------------|
| Nouns          | 243495 / 244001 | 99.79%   | 7,499,749              | 133 ns        |
| Verbs          | 161215 / 165457 | 97.44%   | 12,423,891             | 80 ns         |
| Adjectives     | 121512 / 121719 | 99.83%   | 15,607,807             | 64 ns         |

Note: Benchmarking was done under a worst-case scenario; typical real-world usage is about 50 nanoseconds faster.
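
The derived columns in the table follow directly from the raw counts and throughput figures. As a minimal sketch (the helper names `accuracy_percent` and `nanos_per_call` are hypothetical and not part of the crate), the arithmetic can be checked like this:

```rust
/// Accuracy as a percentage string with two decimals, e.g. "99.79%".
/// Hypothetical helper for checking the table, not part of the crate.
fn accuracy_percent(correct: u64, total: u64) -> String {
    format!("{:.2}%", correct as f64 / total as f64 * 100.0)
}

/// Average time per call in whole nanoseconds, given calls per second.
/// Hypothetical helper for checking the table, not part of the crate.
fn nanos_per_call(calls_per_sec: u64) -> u64 {
    (1_000_000_000.0 / calls_per_sec as f64).round() as u64
}
```

For the noun row, `accuracy_percent(243495, 244001)` gives `"99.79%"` and `nanos_per_call(7_499_749)` gives `133`, matching the table.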

📦 Installation

cargo add english

Then in your code:

use english::*;
fn main() {
    // --- Mixed Sentence Example ---
    let subject_number = Number::Plural;
    let subject = format!(
        "{} {}",
        English::verb(
            "run",
            &Person::First,
            &Number::Singular,
            &Tense::Present,
            &Form::Participle
        ),
        English::noun("child", &subject_number)
    ); // running children
    let verb = English::verb(
        "steal",
        &Person::Third,
        &subject_number,
        &Tense::Past,
        &Form::Finite,
    ); // stole
    let object = count_with_number("potato", 7); // 7 potatoes

    let sentence = format!("The {} {} {}.", subject, verb, object);
    assert_eq!(sentence, "The running children stole 7 potatoes.");

    // --- Nouns ---
    assert_eq!(
        format!("{} of jeans", count_with_number("pair", 3)),
        "3 pairs of jeans"
    );
    // Regular plurals
    assert_eq!(English::noun("cat", &Number::Plural), "cats");
    // Add a number 2-9 to the end of the word to try different forms.
    assert_eq!(English::noun("die2", &Number::Plural), "dice");
    // Use count function for better ergonomics if needed
    assert_eq!(count("man", 2), "men");
    // Use count_with_number function to preserve the number
    assert_eq!(count_with_number("nickel", 3), "3 nickels");
    // Invariant nouns
    assert_eq!(English::noun("sheep", &Number::Plural), "sheep");

    // --- Verbs ---
    assert_eq!(
        English::verb(
            "pick",
            &Person::Third,
            &Number::Singular,
            &Tense::Past,
            &Form::Finite
        ),
        "picked"
    );
    assert_eq!(
        English::verb(
            "walk",
            &Person::First,
            &Number::Singular,
            &Tense::Present,
            &Form::Participle
        ),
        "walking"
    );
    assert_eq!(
        English::verb(
            "go",
            &Person::First,
            &Number::Singular,
            &Tense::Past,
            &Form::Participle
        ),
        "gone"
    );
    // Add a number 2-9 to the end of the word to try different forms.
    assert_eq!(
        English::verb(
            "lie",
            &Person::Third,
            &Number::Singular,
            &Tense::Past,
            &Form::Finite
        ),
        "lay"
    );
    assert_eq!(
        English::verb(
            "lie2",
            &Person::Third,
            &Number::Singular,
            &Tense::Past,
            &Form::Finite
        ),
        "lied"
    );
    // "to be" has the most verb forms in English and requires using verb()
    assert_eq!(
        English::verb(
            "be",
            &Person::First,
            &Number::Singular,
            &Tense::Present,
            &Form::Finite
        ),
        "am"
    );

    // --- Adjectives ---
    // Add a number 2-9 to the end of the word to try different forms. ("bad" has the most forms, at 3.)
    assert_eq!(English::adj("bad", &Degree::Comparative), "more bad");
    assert_eq!(English::adj("bad", &Degree::Superlative), "most bad");
    assert_eq!(English::adj("bad2", &Degree::Comparative), "badder");
    assert_eq!(English::adj("bad2", &Degree::Superlative), "baddest");
    assert_eq!(English::adj("bad3", &Degree::Comparative), "worse");
    assert_eq!(English::adj("bad3", &Degree::Superlative), "worst");
    assert_eq!(English::adj("bad3", &Degree::Positive), "bad");

    // --- Pronouns ---
    assert_eq!(
        English::pronoun(
            &Person::First,
            &Number::Singular,
            &Gender::Neuter,
            &Case::PersonalPossesive
        ),
        "my"
    );
    assert_eq!(
        English::pronoun(
            &Person::First,
            &Number::Singular,
            &Gender::Neuter,
            &Case::Possessive
        ),
        "mine"
    );

    // --- Possessives ---
    assert_eq!(English::add_possessive("dog"), "dog's");
    assert_eq!(English::add_possessive("dogs"), "dogs'");
}
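
The numeric-suffix convention used above (`die2`, `lie2`, `bad3`) selects an alternative form of the same word. How the crate parses that suffix internally is not shown here; the following is only a sketch of one way such a suffix could be split off, with `split_variant` being a hypothetical name:

```rust
/// Splits an optional trailing variant digit (2-9) off a word.
/// Words without such a digit are treated as variant 1.
/// Illustrative assumption only; the crate's own parsing may differ.
fn split_variant(word: &str) -> (&str, u8) {
    match word.as_bytes().last().copied() {
        // The last byte is an ASCII digit, so slicing one byte off is
        // guaranteed to land on a character boundary.
        Some(d @ b'2'..=b'9') => (&word[..word.len() - 1], d - b'0'),
        _ => (word, 1),
    }
}
```

Under this sketch, `split_variant("die2")` yields `("die", 2)` and `split_variant("cat")` yields `("cat", 1)`.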

For a more involved but still minimal example of building a small domain layer on top of english, see crates/english/examples/semantic_triples.rs:

cargo run -p english --example semantic_triples

It shows custom noun/verb/adj/adv types, semantic triples, perspective-sensitive rendering, modifiers, complements, adjuncts, and agreement-driven pronoun and tense shifts.

🔧 Crate Overview

english

The public API for verb conjugation and noun/adjective declension.

  • Combines optimized data generated from extractor with inflection logic from english-core
  • Pure Rust, only one dependency: phf
  • PHF-backed irregular lookups with regular-rule fallback
  • Code generation ensures no runtime penalty
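
The "irregular lookup with regular-rule fallback" idea can be sketched roughly as follows. To keep the example dependency-free, a small sorted array with binary search stands in for the generated phf tables; `IRREGULAR_PLURALS` and `plural_with_fallback` are hypothetical names, and the table contents are illustrative only.

```rust
/// Tiny stand-in for a generated irregular-plural table, sorted by lemma.
/// The real crate uses phf maps; this array is an assumption for illustration.
static IRREGULAR_PLURALS: &[(&str, &str)] = &[
    ("child", "children"),
    ("man", "men"),
    ("sheep", "sheep"),
];

/// Looks the word up in the irregular table first and only calls the
/// regular rule when no irregular entry exists.
fn plural_with_fallback(word: &str, regular: impl Fn(&str) -> String) -> String {
    match IRREGULAR_PLURALS.binary_search_by(|(lemma, _)| (*lemma).cmp(word)) {
        Ok(index) => IRREGULAR_PLURALS[index].1.to_string(),
        Err(_) => regular(word),
    }
}
```

With a deliberately naive rule such as `|w| format!("{w}s")`, `"child"` resolves to `"children"` from the table, while `"cat"` falls through to the rule and becomes `"cats"`.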

english-core

The core engine for English inflection — pure algorithmic logic.

  • Implements the core rules for conjugation/declension
  • Used to classify forms as regular or irregular for the extractor
  • Has no data dependency — logic-only
  • Can be used standalone for an even smaller footprint (at the cost of some accuracy)
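
To give a feel for what "logic-only" rules look like, here is a simplified sketch of regular noun pluralization. It covers only a few textbook rules and is not the actual rule set in english-core; `regular_plural` is a hypothetical name.

```rust
/// Applies a few common regular pluralization rules.
/// Simplified illustration, not english-core's actual logic.
fn regular_plural(word: &str) -> String {
    let is_vowel = |c: char| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u');
    let mut rev = word.chars().rev();
    let last = rev.next();
    let second_last = rev.next();

    if ["s", "x", "z", "ch", "sh"].iter().any(|suffix| word.ends_with(*suffix)) {
        // Sibilant endings take "es": box -> boxes, church -> churches.
        format!("{word}es")
    } else if last == Some('y') && second_last.map_or(false, |c| !is_vowel(c)) {
        // Consonant + "y" becomes "ies": city -> cities.
        format!("{}ies", &word[..word.len() - 1])
    } else {
        // Everything else just takes "s": cat -> cats, day -> days.
        format!("{word}s")
    }
}
```

Rules like these are cheap and need no data, but they miss forms such as "children" or "men", which is the accuracy trade-off mentioned above.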

extractor

A tool to process and refine Wiktionary data.

  • Parses large English Wiktionary dumps
  • Extracts all verb, noun, and adjective forms
  • Uses english-core to filter out regular forms, preserving only irregulars
  • Generates the static PHF tables used in english
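
The filtering step can be pictured as follows: every (lemma, form) pair from the dump is compared with what the rule engine would produce, and only mismatches are kept for the lookup tables. This is a sketch under that assumption; `keep_irregulars` is a hypothetical name, and the real extractor works on Wiktextract records rather than string pairs.

```rust
/// Keeps only the entries whose attested form differs from the form the
/// regular rule would generate. Illustrative sketch, not the extractor's code.
fn keep_irregulars<'a>(
    entries: &[(&'a str, &'a str)],
    regular: impl Fn(&str) -> String,
) -> Vec<(&'a str, &'a str)> {
    entries
        .iter()
        .copied()
        .filter(|&(lemma, form)| regular(lemma) != form)
        .collect()
}
```

Given the pairs `("cat", "cats")`, `("child", "children")`, and `("sheep", "sheep")` and a naive "append s" rule, only the `child` and `sheep` entries survive, so the generated table stays small.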

📦 Obtaining Wiktionary Data & Running the Extractor

This project relies on raw data extracted from Wiktionary. The current version was built with data from 8/17/2025.

Steps

  1. Download the raw Wiktextract JSONL dump (~20 GB) from Kaikki.org.
  2. Place the file somewhere accessible (e.g. ../rawwiki.jsonl).
  3. From the repository root, run: cargo xtask refresh-data --dump ../rawwiki.jsonl
  4. The generated Rust tables are written to /crates/english/generated, and intermediate CSV/JSONL artifacts are written to /data/intermediate.

To also run the extractor evaluation reports against the current library data, add --with-checks.

Benchmarks

Performance benchmarks were run on my M2 MacBook.

Writing benchmarks and tests for a project like this is difficult and requires opinionated decisions. Many words have alternative inflections, and the data in Wiktionary is not perfect. Many words can be both countable and uncountable, and the tagging of words may be inconsistent. This library includes a few uncountable words in its dataset, but not all of them; uncountable words require special handling anyway. Take all benchmarks with a pound of salt and write your own tests for your own use cases. Any suggestions for improving the benchmarking are highly appreciated.

Disclaimer

Wiktionary data is often unstable and subject to weird changes. This means that the provided inflections may change unexpectedly. The generated lookup tables in crates/english/generated/*_phf.rs are the source of truth for a given revision.

Inspirations and Thanks

  • Ole in the Bevy Discord suggested using phf instead of sorted arrays, which resulted in speedups of up to 40%
  • https://github.com/atteo/evo-inflector
  • https://github.com/plurals/pluralize

📄 License
