SkillAgentSearch skills...

Sylvia

Use phoneme-based regular expressions to find words in the Carnegie-Mellon Pronouncing Dictionary.

Install / Use

/learn @bgutter/Sylvia
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

#+TITLE: Sylvia

Search pronunciations in the CMU Pronouncing Dictionary using a reglular-expression like syntax. Input-format regular expressions are lightly preprocessed into Python-format regular expressions, and then mapped over an encoded version of cmudict. Results are sorted by popularity using Peter Norvig's list of word popularities derived from Google's N-Gram dataset.

  • Here for the Emacs library? You can skip this and jump directly into the [[./sylvia-emacs/README.org][sylvia-mode README]]!

  • Installation

#+BEGIN_SRC sh brandon@brandon-babypad-linux ~> pip2 install sylvia #+END_SRC

  • Usage

Interactive Sylvia prompt:

#+BEGIN_SRC brandon@brandon-babypad-linux ~> python2 -m sylvia

Type 'help' for options, press enter to quit.

sylvia> #+END_SRC

Run one-off command:

#+BEGIN_SRC brandon@brandon-babypad-linux ~> python2 -m sylvia -c "regex G #* AE #* IH #* %" Gravity Graphical Grandchildren Garrison Graphically
Gallegos Gravitate Garretson Gastineau Gallimard
Galligan Grandison Gallivan Glatfelter Garibay
Garelick Garrigan Garriga Gravitates Galipeau
Gavigan Gamelin Gateley Grandillo Galipault
Garringer Gradison Grandchildren's Glastetter Garity
Galliher Gantenbein #+END_SRC

  • Commands

Sylvia's functionality is broken down into various subcommands. These commands can be run from the interactive prompt, or as single-lines directly from your system shell.

** regex

This is the most powerful feature of Sylvia. It allows searches of cmudict based on phoneme patterns.

Sylvia's query format is nearly identical to traditional Python 2 regular expressions, with the exception that it is intended not to match against patterns of characters, but rather patterns of phonemes. To construct a regular expression query for Sylvia, remember the following rules:

  1. Whitespace must be used to delimit consecutive phoneme literals. It may also be used anywhere else in the regular expression, as whitespace is meaningless in the context of a phoneme sequence, and will be stripped during preprocessing.
  2. # is a shortcut for "any consonant sound"
  3. @ is a shortcut for "any vowel sound"
  4. % is a shortcut for "any syllable", and is equivalant to #*@#*
  5. Otherwise, whatever flies with Python's regular expression format will work in Sylvia. Just use some common sense, as some things (such as character classes) will be wholly inapplicable to searches in phoneme-space.

Use of this command is as follows:

#+BEGIN_SRC sylvia> regex {regex tokens} #+END_SRC

[[http://www.speech.cs.cmu.edu/cgi-bin/cmudict][Consult Carnegie Mellon's cmudict documentation]] to learn more about the phoneme set.

[[https://docs.python.org/2/library/re.html][Consult the Python docs]] to learn more about Python's regex format.

*** Examples

Find words starting with zero or more consonant sounds, followed by the "long E" sound (phoneme IY), followed by zero or more consonant sounds, followed by the "ed" sound (the phoneme sequence EH D):

#+BEGIN_SRC sylvia> regex #* IY #* EH D Steelhead Seabed Beachhead Retread Behead #+END_SRC

Find all six syllable words where the first syllable uses the "short i" sound (phoneme IH), and ends in either the D or P phonemes.

#+BEGIN_SRC sylvia> regex #*IH%%%%%(D|P) Differentiated Individualized Deteriorated Institutionalized
Incapacitated Internationalized Interrelationship Misappropriated
Disassociated Discombobulated Insubstantiated
#+END_SRC

Note here that only five % symbols are needed, as a single vowel sound constitutes a single syllable, and we explicitly call out the first vowel sound via IH.

Find all words that start with the R sound, followed by some vowel, followed by the D sound, followed by another vowel, followed by the NG phoneme:

#+BEGIN_SRC sylvia> regex R@D@NG Reading Riding Redding Raiding Ridding Reding Rodding Ruding Rawding #+END_SRC

** lookup

If you just want to lookup the pronunciations for a word, you can do that too. This can be a good way to quickly learn the phonemes for a particular sound when constructing queries. Due to cultural and geographic variations in pronunciation, this command can return multiple sequences.

Use of this command is as follows:

#+BEGIN_SRC sylvia> lookup {word} #+END_SRC

*** Examples

#+BEGIN_SRC sylvia> lookup turkmenistan T ER K M EH N IH S T AE N
#+END_SRC

#+BEGIN_SRC sylvia> lookup capture K AE P CH ER
#+END_SRC

#+BEGIN_SRC sylvia> lookup tomato T AH M EY T OW T AH M AA T OW
#+END_SRC

** rhyme

Sylvia can act as a rhyming dictionary, returning words which rhyme with a given word. There are three "rhyme levels", which define how rhymes are determined.

  1. perfect lists all words which contain the same sequence of phonemes as the given word, including and following the first vowel in the given pronunciation. Before that vowel, the matched words can contain any sounds.
  2. default is the same as perfect, except that additional consonant sounds can be interspersed between the matched sequence phonemes.
  3. loose is similar, except it ignores consonant sounds entirely.

Use of this command is as follows:

#+BEGIN_SRC sylvia> rhyme {rhyme-level} {word} #+END_SRC

rhyme-level can be omitted if default behavior is desired.

There are plans to improve these models by matching phonemes based on their vocal characteristics. For example, all nasal phonemes may be considered matches by default, or all plosive sounds, etc. The behavior documented above is subject to change at any time.

*** Examples

List words which rhyme with "chatter", using the perfect algorithm.

#+BEGIN_SRC sylvia> rhyme perfect chatter Matter Latter Batter Mater Platter
Scatter Flatter Shatter Hatter Splatter
Fatter Patter Antimatter Clatter Spatter
Schlatter Blatter Natter Sater Satter
Slatter Tatter Mcphatter Chitterchatter Smatter
Vanatter Vannater Vatter Vannatter Mcfatter
Wildcatter
#+END_SRC

...using the default algorithm...

#+BEGIN_SRC sylvia> rhyme chatter
After Chapter Matter Master
Factors Factor Pattern Faster
Matters Webmaster Patterns Adapter
Contractor Contractors Disaster Actor
Masters Latter Chapters Actors
Adapters Lancaster Saturn Adaptor
Pastor Thereafter Tractor Scattered
Disasters Ticketmaster Napster Laughter
Reactor Adaptors Baxter Stratford
Blaster Lantern Bastard Maxtor
Tractors Shattered Plaster Hereafter
Subchapter Batter Broadcasters Antwerp
Raptor Mater Platter Scatter
Hamster Raster Subcontractor Reactors
Pastors Subcontractors Broadcaster Mastered ... many more... #+END_SRC

...and using the loose algorithm.

#+BEGIN_SRC sylvia> rhyme loose chatter After Standard Password Chapter
Standards Rather Matter Cancer
Answer Master Transfer Answers
Factors Factor Pattern Faster
Matters Manner Webmaster Patterns
Hampshire Adapter Contractor Banner
Contractors Alexander Capture Disaster
Actor Masters Traveler Latter
Albert Chapters Packard Answered
Scanner Bachelor Actors Transfers
Adverse Amber Tracker Transferred
Planner Hacker Commander Adapters
Scanners Manufactured Stanford Manufacture
Anchor Gathered Travelers Captured
Grammar Hazard Anger Gather
Lancaster Hammer Manor Programmer
Hazards Bradford Madagascar Saturn
Banners Passwords Adaptor Pastor
Hamburg Ladder Flashers Programmers
Planner

Related Skills

View on GitHub
GitHub Stars35
CategoryDevelopment
Updated8mo ago
Forks2

Languages

Python

Security Score

82/100

Audited on Jul 16, 2025

No findings