Cross-Linguistic Transcription Systems
This repository provides the data underlying the "cross-linguistic transcription systems" project (CLTS [siː ɛl tʰiː ɛs]), which offers transcription systems and transcription data for various sources. Please see CONTRIBUTING.md for more information on how to contribute.
Master data
This repository contains files that are generated by running commands from the
pyclts package, intended to help with curation.
Thus, it is important to know where master (or authoritative) copies of certain
data types live (i.e. where to edit data).
- References: data/references.bib
- Feature system: pkg/transcriptionsystems/features.json
- BIPA transcription system: pkg/transcriptionsystems/bipa/
- Index of source datasets: sources/index.tsv
- Source datasets: sources/*/graphemes.tsv
CLDF Dataset
CLDF Metadata: cldf-metadata.json
Sources: data/references.bib
The Cross-Linguistic Transcription Systems (CLTS) project provides a catalog of speech sounds aggregated from (and linked to) phonetic notation systems from various sources.
property | value
--- | ---
dc:conformsTo | CLDF Generic
dc:identifier | https://doi.org/10.5281/zenodo.3515744
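
As a rough illustration of how the metadata ties the tables together, the sketch below reads a trimmed, hypothetical stand-in for cldf-metadata.json and lists each table with its declared row count. It assumes that `dc:extent` is stored per table under a top-level `tables` array; the extent values are the ones quoted in this README, and the helper name `table_extents` is made up for this example.

```python
import json

# Hypothetical, heavily trimmed stand-in for cldf-metadata.json.
# Assumption: each table entry carries "url" and "dc:extent" keys.
SAMPLE_METADATA = """
{
  "dc:identifier": "https://doi.org/10.5281/zenodo.3515744",
  "tables": [
    {"url": "sources/index.tsv", "dc:extent": 33},
    {"url": "data/features.tsv", "dc:extent": 163},
    {"url": "data/graphemes.tsv", "dc:extent": 81895},
    {"url": "data/sounds.tsv", "dc:extent": 8765}
  ]
}
"""


def table_extents(metadata_text):
    """Map each table URL to its declared row count (dc:extent)."""
    metadata = json.loads(metadata_text)
    return {t["url"]: t.get("dc:extent") for t in metadata.get("tables", [])}


extents = table_extents(SAMPLE_METADATA)
for url, extent in extents.items():
    print(f"{url}: {extent} rows")
```

With the real file, the same function could be fed the contents of cldf-metadata.json, provided the layout matches the assumption above.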
<a name="table-sourcesindextsv"></a>Table sources/index.tsv
CLTS is compiled from information about transcriptions and how these relate to sounds from many sources, such as phoneme inventory databases like PHOIBLE or relevant typological surveys.
property | value
--- | ---
dc:extent | 33
Columns
Name/Property | Datatype | Description
--- | --- | ---
NAME | string | Primary key
DESCRIPTION | string |
REFS | list of string (separated by , ) | References data/references.bib::BibTeX-key
TYPE | string<br>Valid choices:<br> td ts sc | CLTS groups transcription information into three categories: Transcription systems (ts), transcription data (td) and soundclass systems (sc).
URITEMPLATE | string | Several CLTS sources provide an online catalog of the graphemes they describe. If this is the case, the URI template specified in this column was used to derive the URL column in data/graphemes.tsv.
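
The column layout above can be read with the standard library alone. The following is a minimal sketch, not part of pyclts: it groups source names by TYPE and splits REFS into BibTeX keys. The sample rows and names (`demo_system` and so on) are invented for illustration, and the function name `read_index` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that mimic the columns of sources/index.tsv.
SAMPLE_INDEX = (
    "NAME\tDESCRIPTION\tREFS\tTYPE\tURITEMPLATE\n"
    "demo_system\tA made-up transcription system\tKey2019, Other2020\tts\t\n"
    "demo_data\tA made-up inventory database\tKey2018\ttd\t\n"
    "demo_classes\tA made-up sound class system\t\tsc\t\n"
)

# The three categories named in the TYPE column description.
TYPE_LABELS = {
    "ts": "transcription system",
    "td": "transcription data",
    "sc": "soundclass system",
}


def read_index(handle):
    """Group source names by TYPE and split REFS into BibTeX keys."""
    by_type = defaultdict(list)
    refs = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["TYPE"] not in TYPE_LABELS:
            raise ValueError(f"unexpected TYPE {row['TYPE']!r} for {row['NAME']}")
        by_type[row["TYPE"]].append(row["NAME"])
        # REFS is a comma-separated list; empty cells yield an empty list.
        refs[row["NAME"]] = [k.strip() for k in row["REFS"].split(",") if k.strip()]
    return dict(by_type), refs


by_type, refs = read_index(io.StringIO(SAMPLE_INDEX))
for code, names in by_type.items():
    print(TYPE_LABELS[code], names)
```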
<a name="table-datafeaturestsv"></a>Table data/features.tsv
The feature system employed by CLTS describes sounds by assigning values for certain features (constrained by sound type). The permissible values per (feature, sound type) are listed in this table.
property | value
--- | ---
dc:extent | 163
Columns
Name/Property | Datatype | Description
--- | --- | ---
ID | string | Primary key
TYPE | string<br>Valid choices:<br> consonant vowel tone | CLTS distinguishes the basic sound types consonant, vowel, tone, and marker. Features are defined for consonants, vowels, and tones.
FEATURE | string | Note that CLTS features are not necessarily binary.
VALUE | string |
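
To make the idea of permissible values per (feature, sound type) concrete, here is a small sketch that builds that mapping from rows shaped like this table. The IDs and rows are invented sample data, and `permissible_values` and `is_permissible` are hypothetical helper names, not pyclts functions.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that mimic the columns of data/features.tsv.
SAMPLE_FEATURES = (
    "ID\tTYPE\tFEATURE\tVALUE\n"
    "f1\tconsonant\tmanner\tstop\n"
    "f2\tconsonant\tmanner\tfricative\n"
    "f3\tvowel\theight\tclose\n"
    "f4\ttone\tcontour\trising\n"
)


def permissible_values(handle):
    """Collect the set of allowed values for each (sound type, feature) pair."""
    allowed = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        allowed[(row["TYPE"], row["FEATURE"])].add(row["VALUE"])
    return dict(allowed)


def is_permissible(allowed, sound_type, feature, value):
    """Check a value against the table; unknown pairs allow nothing."""
    return value in allowed.get((sound_type, feature), set())


allowed = permissible_values(io.StringIO(SAMPLE_FEATURES))
print(is_permissible(allowed, "consonant", "manner", "stop"))
print(is_permissible(allowed, "vowel", "manner", "stop"))
```

Because a feature maps to a set of values rather than a plus or minus, the structure accommodates non-binary features such as the two `manner` values in the sample.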
<a name="table-datagraphemestsv"></a>Table data/graphemes.tsv
property | value
--- | ---
dc:extent | 81895
Columns
Name/Property | Datatype | Description
--- | --- | ---
PK | integer | Primary key
GRAPHEME | string | Grapheme used in a particular transcription to denote a sound
NAME | string | The ordered concatenation of feature values of the denoted sound<br>References data/sounds.tsv::NAME
BIPA | string | The grapheme for the denoted sound in the Broad IPA transcription system
DATASET | string | Links to the source of this grapheme<br>References sources/index.tsv::NAME
FREQUENCY | integer |
URL | anyURI | URL of the grapheme in its source online database
IMAGE | string | Image of the typeset grapheme.
SOUND | string | Audio recording of the sound being pronounced.
EXPLICIT | string | Indicates whether the mapping of grapheme to sound was done manually (explicitly, +) or whether it was inferred from the grapheme.
FEATURES | string | Features of the sound as described in the local feature system of the source dataset
NOTE | string |
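
One way to use this table is to translate a source-specific grapheme into its BIPA counterpart. The sketch below works on invented rows with only a subset of the columns. It assumes that a (DATASET, GRAPHEME) pair identifies one row and that inferred mappings leave EXPLICIT empty; neither is stated in this README. The names `build_lookup` and `to_bipa` are hypothetical.

```python
import csv
import io

# Invented sample rows with a subset of the columns of data/graphemes.tsv.
SAMPLE_GRAPHEMES = (
    "PK\tGRAPHEME\tNAME\tBIPA\tDATASET\tEXPLICIT\n"
    "1\tP\tvoiceless bilabial stop consonant\tp\tdemo_a\t+\n"
    "2\tB\tvoiced bilabial stop consonant\tb\tdemo_a\t\n"
    "3\tp\tvoiceless bilabial stop consonant\tp\tdemo_b\t+\n"
)


def build_lookup(handle):
    """Index rows by (DATASET, GRAPHEME); assumes that pair is unique."""
    lookup = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        lookup[(row["DATASET"], row["GRAPHEME"])] = row
    return lookup


def to_bipa(lookup, dataset, grapheme):
    """Return the BIPA grapheme for a source grapheme, or None if unmapped."""
    row = lookup.get((dataset, grapheme))
    return row["BIPA"] if row else None


lookup = build_lookup(io.StringIO(SAMPLE_GRAPHEMES))
print(to_bipa(lookup, "demo_a", "B"))

# Keep only mappings marked as manual ("+" in EXPLICIT).
explicit = [key for key, row in lookup.items() if row["EXPLICIT"] == "+"]
print(explicit)
```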
<a name="table-datasoundstsv"></a>Table data/sounds.tsv
property | value
--- | ---
dc:extent | 8765
Columns
Name/Property | Datatype | Description
--- | --- | ---
ID | string |
NAME | string | Ordered list of features + sound type<br>Primary key
FEATURES | list of string (separated by a space) | Ordered list of feature values for the sound.<br>References data/features.tsv::ID
GRAPHEME | string | CLTS chooses the BIPA grapheme as the canonical representative of the graphemes mapped to a sound.
UNICODE | list of string (separated by /) | Unicode character names of the codepoints in GRAPHEME
GENERATED | boolean | Indicates whether the sound was inferred by our algorithmic procedure (which is active for all diphthongs, all cluster sounds, but also all sounds which we do not label explicitly) or whether no inference was needed, since the sound is explicitly defined.
TYPE | string<br>Valid choices:<br> consonant vowel diphthong tone cluster | CLTS defines five sound types: consonant, vowel, tone, diphthong, and cluster. The latter two are always GENERATED.
NOTE | string |
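
The list-valued columns and the rule that diphthongs and clusters are always GENERATED can be sketched as follows. The rows and feature IDs are invented, and the `Sound` class and `parse_sound` function are hypothetical. The sketch also assumes that GENERATED is serialized as `true`/`false`; the token is a parameter in case the actual file uses a different format.

```python
import csv
import io
import unicodedata
from dataclasses import dataclass

# Invented sample rows that mimic the columns of data/sounds.tsv.
SAMPLE_SOUNDS = (
    "ID\tNAME\tFEATURES\tGRAPHEME\tUNICODE\tGENERATED\tTYPE\n"
    "s1\tvoiceless bilabial stop consonant\tf1 f2 f3\tp\t"
    "LATIN SMALL LETTER P\tfalse\tconsonant\n"
    "s2\tdemo diphthong\t\tai\t"
    "LATIN SMALL LETTER A/LATIN SMALL LETTER I\ttrue\tdiphthong\n"
)

# Sound types that the TYPE description says are always GENERATED.
ALWAYS_GENERATED = {"diphthong", "cluster"}


@dataclass
class Sound:
    name: str
    type: str
    grapheme: str
    features: list
    unicode_names: list
    generated: bool


def parse_sound(row, true_token="true"):
    """Convert one row into a Sound, splitting the list-valued columns."""
    sound = Sound(
        name=row["NAME"],
        type=row["TYPE"],
        grapheme=row["GRAPHEME"],
        features=row["FEATURES"].split(),          # space-separated
        unicode_names=row["UNICODE"].split("/"),   # slash-separated
        generated=row["GENERATED"] == true_token,  # assumed serialization
    )
    if sound.type in ALWAYS_GENERATED and not sound.generated:
        raise ValueError(f"{sound.type} {sound.name!r} must be GENERATED")
    return sound


reader = csv.DictReader(io.StringIO(SAMPLE_SOUNDS), delimiter="\t")
sounds = [parse_sound(row) for row in reader]
for s in sounds:
    # UNICODE should list the character name of every codepoint in GRAPHEME.
    expected = [unicodedata.name(ch) for ch in s.grapheme]
    print(s.grapheme, s.unicode_names == expected)
```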
