Abvdoceanic

CLDF dataset derived from Greenhill et al.'s "Austronesian Basic Vocabulary Database" from 2020, focusing on Oceanic languages.

How to cite

If you use these data please cite

  • the original source

    Greenhill, S.J., Blust, R., & Gray, R.D. (2008). The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4:271-283.

  • the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://abvd.shh.mpg.de/austronesian/

Conceptlists in Concepticon:

Notes

Making a Nexus File:

You will need to have the lexibank dataset installed. It is probably best to set up the virtual environment and clone the repository outside your working directory:

# set up and install a virtual environment
python -m venv env
source ./env/bin/activate

# clone git repository
git clone https://github.com/lexibank/abvdoceanic

# or, if it is already cloned, update the repository
cd abvdoceanic
git checkout main
git pull
cd ..

# install dataset
cd abvdoceanic
pip install -e .
cd ..

To make a nexus file, use the custom cldfbench command abvdoceanic.nexus. The parameters are:

  • --output=/path/to/filename.nex - the output file to write.
  • --ascertainment={token} - optionally add BEAST's ascertainment correction. Tokens:
    • overall - one ascertainment character added for overall correction.
    • word - per-word ascertainment correction.
  • --removecombined={int} - set the level at which to filter combined cognates.

# make a nexus file, with combined cognates removed above level 2:
cldfbench abvdoceanic.nexus --removecombined 2 --output abvdoceanic.nex

# ...with per-word ascertainment correction:
cldfbench abvdoceanic.nexus --ascertainment=word --removecombined 2 --output abvdoceanic.nex
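
To illustrate what the two ascertainment modes mean for the resulting character matrix, here is a minimal Python sketch on toy data. It is not the code behind abvdoceanic.nexus, whose implementation is not shown here. The language names, the cognate judgements, the `binary_matrix` helper, the `_ascertainment` labels, and the use of `?` for an ascertainment character where a word is unattested are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy cognate judgements: (language, word, cognate set id).
# Hypothetical data, not taken from the dataset.
JUDGEMENTS = [
    ("Fijian", "hand", "1"),
    ("Samoan", "hand", "1"),
    ("Tongan", "hand", "2"),
    ("Fijian", "eye", "1"),
    ("Samoan", "eye", "1"),
    # Tongan has no entry for "eye", so its "eye" characters are missing data.
]


def binary_matrix(judgements, ascertainment=None):
    """Return (labels, rows): character labels and one 0/1/? string per language."""
    languages = sorted({lang for lang, _, _ in judgements})
    sets_by_word = defaultdict(set)  # word -> cognate set ids
    attested = defaultdict(set)      # (language, word) -> cognate set ids
    for lang, word, cog in judgements:
        sets_by_word[word].add(cog)
        attested[(lang, word)].add(cog)

    labels = []
    rows = {lang: [] for lang in languages}
    if ascertainment == "overall":
        # One all-absent character for the whole alignment.
        labels.append("_ascertainment")
        for lang in languages:
            rows[lang].append("0")
    for word in sorted(sets_by_word):
        if ascertainment == "word":
            # One all-absent character per word; "?" where the word is unattested.
            labels.append(f"{word}_ascertainment")
            for lang in languages:
                rows[lang].append("0" if attested[(lang, word)] else "?")
        for cog in sorted(sets_by_word[word]):
            labels.append(f"{word}_{cog}")
            for lang in languages:
                present = attested[(lang, word)]
                if not present:
                    rows[lang].append("?")  # no data for this word
                else:
                    rows[lang].append("1" if cog in present else "0")
    return labels, {lang: "".join(chars) for lang, chars in rows.items()}
```

Calling `binary_matrix(JUDGEMENTS, ascertainment="word")` adds one extra column per word, while `ascertainment="overall"` adds a single extra column at the start of the matrix.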

Statistics

Glottolog: 100%, Concepticon: 100%, Source: 0%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 418 (linked to 411 different Glottocodes)
  • Concepts: 191 (linked to 191 different Concepticon concept sets)
  • Lexemes: 78,515
  • Sources: 0
  • Synonymy: 1.14
  • Cognacy: 74,236 cognates in 9,490 cognate sets (2,308 singletons)
  • Cognate Diversity: 0.12
  • Invalid lexemes: 0
  • Tokens: 392,135
  • Segments: 432 (0 BIPA errors, 0 CLTS sound class errors, 431 CLTS modified)
  • Inventory size (avg): 30.64
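
A few ratios can be derived from the counts listed above to put them in proportion. The short Python sketch below uses only the figures from this list. The labels of the derived quantities are our own descriptions, not statistics reported by the dataset, and reading "Tokens" as segment tokens is an assumption.

```python
# Figures copied from the statistics list above.
varieties = 418
lexemes = 78_515
cognates = 74_236
cognate_sets = 9_490
singletons = 2_308
tokens = 392_135  # assumed to be segment tokens

derived = {
    "lexemes per variety": lexemes / varieties,
    "cognate judgements per lexeme": cognates / lexemes,
    "share of cognate sets that are singletons": singletons / cognate_sets,
    "tokens per lexeme": tokens / lexemes,
}

for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```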

Possible Improvements:

  • Entries missing sources: 78515/78515 (100.00%)

Contributors

Name | GitHub user | Description | Role
--- | --- | --- | ---
Simon J. Greenhill | @SimonGreenhill | maintainer | Author
Mary Walworth | @maryewal | maintainer | Author
Isaac Stead | @antipodite | maintainer | Author
Tihomir Rangelov | @tihomirrangelov | maintainer | Author
Johann-Mattis List | @lingulist | orthography profiles | Editor
Frederic Blum | @FredericBlum | orthography profiles | Editor

CLDF Datasets

The following CLDF datasets are available in cldf:
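
Because CLDF tables are plain CSV files, they can be inspected with the Python standard library alone. The sketch below counts forms per language. It assumes the CLDF default layout, that is, a `forms.csv` file with a `Language_ID` column inside the cldf directory; the helper name `forms_per_language` and the path in the usage note are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def forms_per_language(cldf_dir):
    """Count the rows of forms.csv per Language_ID.

    Assumes the CLDF defaults: a `forms.csv` file with a `Language_ID` column.
    """
    counts = Counter()
    forms_path = Path(cldf_dir) / "forms.csv"
    with forms_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row["Language_ID"]] += 1
    return counts
```

With a local clone, `forms_per_language("abvdoceanic/cldf").most_common(5)` would list the five varieties with the most forms, provided the files follow the assumed layout.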
