
Clics3

See also the Code Ocean capsule (https://codeocean.com/capsule/7201165/tree/v2) accompanying this project.


CLICS³

The third installment of CLICS - the database of Cross-Linguistic Colexifications.

Cite as

Rzymski, Tresoldi et al. 2019. The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. DOI: doi.org/10.17613/5awv-6w15

This repository contains

This data is licensed under CC BY 4.0.

Creating the CLICS database

  1. Analyses should ideally be carried out within virtual environments, to guarantee that the necessary libraries don't interact with other pipelines. This is particularly true if you have multiple Python versions installed on your system. If you run into any issues, please confirm that you are running clics3 in a virtual environment. There are many different ways to use virtual environments, with details depending on the system you use (for example, if you are using Conda). A good overview is provided here but, for most systems, it should be enough to create an environment with

    python -m venv env
    

    And activate it with

    source env/bin/activate
    

    Once all the work has been carried out, you can leave the virtual environment by closing your shell or issuing the deactivate command.

  2. Install version 3 of the Python package pyclics:

    pip install "pyclics>=3.0"
    
  3. Download and install the Lexibank datasets from which to aggregate colexifications:

    curl -O https://raw.githubusercontent.com/clics/clics3/master/datasets.txt
    pip install -r datasets.txt
    
  4. Download data for the reference catalogs Glottolog and Concepticon:

    cldfbench catconfig
    
  5. Create the SQLite database clics.sqlite:

    clics load --glottolog-version v4.0 --concepticon-version v2.2.0
    

    and confirm that all 30 datasets have been loaded:

    clics datasets
    

| # | Dataset | Parameters | Concepticon | Varieties | Glottocodes | Families |
|:----|:------------------|-------------:|--------------:|------------:|--------------:|-----------:|
| 1 | abrahammonpa | 304 | 304 | 30 | 16 | 2 |
| 2 | allenbai | 499 | 499 | 9 | 9 | 1 |
| 3 | bantubvd | 420 | 415 | 10 | 10 | 1 |
| 4 | beidasinitic | 736 | 735 | 18 | 18 | 1 |
| 5 | bodtkhobwa | 553 | 536 | 8 | 8 | 1 |
| 6 | bowernpny | 338 | 338 | 175 | 172 | 1 |
| 7 | castrosui | 510 | 508 | 16 | 3 | 1 |
| 8 | chenhmongmien | 793 | 793 | 22 | 20 | 1 |
| 9 | diacl | 537 | 537 | 371 | 351 | 25 |
| 10 | halenepal | 699 | 662 | 13 | 13 | 2 |
| 11 | hantganbangime | 299 | 299 | 22 | 22 | 5 |
| 12 | hubercolumbian | 346 | 345 | 69 | 65 | 16 |
| 13 | ids | 1310 | 1308 | 320 | 275 | 60 |
| 14 | kraftchadic | 433 | 428 | 66 | 59 | 2 |
| 15 | lexirumah | 604 | 602 | 357 | 231 | 12 |
| 16 | logos | 707 | 707 | 5 | 5 | 1 |
| 17 | marrisonnaga | 580 | 572 | 40 | 39 | 1 |
| 18 | mitterhoferbena | 342 | 335 | 13 | 13 | 1 |
| 19 | naganorgyalrongic | 969 | 877 | 10 | 8 | 1 |
| 20 | northeuralex | 952 | 951 | 107 | 107 | 21 |
| 21 | robinsonap | 391 | 391 | 13 | 13 | 1 |
| 22 | satterthwaitetb | 418 | 418 | 18 | 18 | 1 |
| 23 | sohartmannchin | 279 | 279 | 8 | 7 | 1 |
| 24 | suntb | 929 | 929 | 49 | 49 | 1 |
| 25 | tls | 1140 | 811 | 126 | 107 | 1 |
| 26 | transnewguineaorg | 904 | 865 | 1004 | 760 | 106 |
| 27 | tryonsolomon | 317 | 314 | 111 | 96 | 5 |
| 28 | wold | 1459 | 1458 | 41 | 41 | 24 |
| 29 | yanglalo | 875 | 869 | 7 | 7 | 1 |
| 30 | zgraggenmadang | 311 | 310 | 98 | 98 | 1 |
| | TOTAL | 0 | 2906 | 3156 | 2271 | 200 |
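The setup steps above can be sanity-checked from Python with a short, stdlib-only sketch. It reports whether the interpreter runs inside a virtual environment, using the usual `sys.prefix != sys.base_prefix` test for `venv` environments (Conda environments are separate installations and are not detected this way), and lists the tables in `clics.sqlite`. It assumes the database sits in the current working directory and deliberately assumes nothing about the schema pyclics creates; `in_virtualenv` and `list_tables` are hypothetical helper names, not part of pyclics.

```python
import sqlite3
import sys
from pathlib import Path
from typing import List


def in_virtualenv() -> bool:
    """True if this interpreter runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in an existing SQLite database file."""
    # Check first: sqlite3.connect() would silently create an empty file.
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    db = "clics.sqlite"  # assumed: created by `clics load` in the current directory
    try:
        for table in list_tables(db):
            print(table)
    except FileNotFoundError:
        print(db, "not found; run `clics load` first")
```

Run it with the environment's own `python` after `clics load`; an empty table list, or a `False` for the virtual environment check, points back to the corresponding setup step.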

  6. Create the colexification network (encoded as a GML graph):

    clics -t 3 -f families colexification --show 20 --format pipe
    
    This will create the graph at graphs/network-3-families.gml and show the 20 most common colexifications:

| ID A | Concept A | ID B | Concept B | Families | Languages | Words |
|-------:|:-------------------------|-------:|:-------------------------------|-----------:|------------:|--------:|
| 906 | TREE | 1803 | WOOD | 59 | 348 | 361 |
| 1313 | MOON | 1370 | MONTH | 57 | 324 | 327 |
| 72 | CLAW | 1258 | FINGERNAIL | 55 | 236 | 243 |
| 1297 | LEG | 1301 | FOOT | 52 | 349 | 358 |
| 1352 | KNIFE | 3210 | KNIFE (FOR EATING) | 51 | 268 | 282 |
| 2267 | SON-IN-LAW (OF MAN) | 2266 | SON-IN-LAW (OF WOMAN) | 49 | 261 | 280 |
| 763 | SKIN | 1204 | BARK | 49 | 209 | 213 |
| 1307 | LANGUAGE | 1599 | WORD | 49 | 148 | 149 |
| 1277 | HAND | 1673 | ARM | 48 | 294 | 300 |
| 1408 | HEAR | 1608 | LISTEN | 48 | 107 | 109 |
| 634 | MEAT | 2259 | FLESH | 47 | 252 | 262 |
| 2265 | DAUGHTER-IN-LAW (OF MAN) | 2264 | DAUGHTER-IN-LAW (OF WOMAN) | 47 | 234 | 256 |
| 763 | SKIN | 629 | LEATHER | 46 | 236 | 258 |
| 837 | BLUE | 1425 | GREEN | 46 | 195 | 204 |
| 2261 | MALE (OF PERSON) | 2263 | MALE (OF ANIMAL) | 45 | 145 | 163 |
| 1199 | WIFE | 962 | WOMAN | 44 | 289 | 301 |
| 480 | PLATE | 481 | DISH | 44 | 155 | 170 |
| 2260 | FEMALE (OF PERSON) | 2262 | FEMALE (OF ANIMAL) | 44 | 146 | 154 |
| 1228 | EARTH (SOIL) | 626 | LAND | 43 | 159 | 167 |
| 667 | ROAD | 2252 | PATH | 43 | 133 | 153 |
| 705 | GO UP (ASCEND) | 1102 | CLIMB | 43 | 132 | 146 |
| 683 | PERSON | 1554 | MAN | 41 | 199 | 205 |
| 2255 | FATHER-IN-LAW (OF MAN) | 2254 | FATHER-IN-LAW (OF WOMAN) | 41 | 187 | 204 |
| 133 | WEAVE | 3294 | BRAID (VERB) OR WEAVE (BASKET) | 41 | 122 | 133 |
| 2261 | MALE (OF PERSON) | 1554 | MAN | 41 | 104 | 115 |
| 1474 | SEA | 645 | OCEAN | 41 | 101 | 110 |
| 215 | LIE DOWN | 1585 | SLEEP | 40 | 191 | 197 |
| 1265 | HIGH | 711 | TALL | 40 | 168 | 182 |
| 256 | FOOD | 1526 | MEAL | 40 | 124 | 136 |
| 1732 | SKY | 1565 | HEAVEN | 40 | 117 | 120 |
| 1443 | WALK | 695 | GO | 39 | 288 | 320 |
| 2257 | MOTHER-IN-LAW (OF MAN) | 2256 | MOTHER-IN-LAW (OF WOMAN) | 39 | 181 | 203 |
| 1618 | GRANDSON | 1619 | GRANDDAUGHTER | 39 | 133 | 151 |
| 1203 | LONG | 711 | TALL | 39 | 113 | 121 |
| 2260 | FEMALE (OF PERSON) | 962 | WOMAN | 39 | 109 | 119 |
| 948 | WATER | 666 | RIVER | 38 | 197 | 200 |
| 1229 | OLD | 406 | OLD MAN | 38 | 103 | 107 |
| 531 | HOW MUCH | 3450 | HOW MANY PIECES | 37 | 184 | 203 |
| 706 | DARK | 163 | BLACK | 37 | 95 | 97 |
| 855 | SEIZE | 702 | CATCH | 36 | 150 | 161 |
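To make the effect of `-t 3 -f families` more concrete, here is a sketch of threshold-based edge filtering. It rests on an assumption not verified against the pyclics source: that `-t` sets a minimum edge weight and `-f` selects which count (families, languages or words) serves as that weight. The sample rows are copied from the table above; `Colexification`, `EDGES` and `filter_edges` are hypothetical names, not part of pyclics.

```python
from typing import List, NamedTuple


class Colexification(NamedTuple):
    concept_a: str
    concept_b: str
    families: int
    languages: int
    words: int


# Sample rows copied from the colexification table above.
EDGES = [
    Colexification("TREE", "WOOD", 59, 348, 361),
    Colexification("MOON", "MONTH", 57, 324, 327),
    Colexification("HEAR", "LISTEN", 48, 107, 109),
    Colexification("DARK", "BLACK", 37, 95, 97),
    Colexification("SEIZE", "CATCH", 36, 150, 161),
]

FILTERS = ("families", "languages", "words")


def filter_edges(
    edges: List[Colexification], threshold: int, edgefilter: str = "families"
) -> List[Colexification]:
    """Keep edges whose `edgefilter` count is at least `threshold`, most frequent first."""
    if edgefilter not in FILTERS:
        raise ValueError(f"edgefilter must be one of {FILTERS}, got {edgefilter!r}")
    kept = [e for e in edges if getattr(e, edgefilter) >= threshold]
    return sorted(kept, key=lambda e: getattr(e, edgefilter), reverse=True)


if __name__ == "__main__":
    for e in filter_edges(EDGES, threshold=50):
        print(e.concept_a, e.concept_b, e.families)
    # TREE WOOD 59
    # MOON MONTH 57
```

With the threshold of 3 used in the command, all five sample rows would survive; filtering on languages instead of families changes which pairs pass (SEIZE/CATCH clears a threshold of 150 languages, HEAR/LISTEN does not).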

  7. Run subgraph and infomap cluster algorithms:

    clics --seed 42 -t 3 -f families makeapp
    
    The clustered networks will be written to GML graphs and exported in a format suitable for exploring with the CLICS JavaScript app. Summary statistics can be obtained by running:

    clics -t 3 -f families --graphname infomap graph_stats
    -----------  ----
    nodes        1647
    edges        2967
    components     92
    communities   249
    -----------  ----
    
    Note that clustering may be non-deterministic, i.e. you may compute slightly different clusters than the ones distributed in the GML files in this repository.
  8. Finally, we can explore the clusters in the CLICS JavaScript app:

    clics runapp
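
The four numbers reported by graph_stats (nodes, edges, components, communities) can be reproduced in miniature. The sketch below counts nodes, edges and connected components of a small undirected graph with union-find, and counts communities as the number of distinct cluster labels. It is not pyclics' implementation and does not run infomap: the `community` mapping is a hypothetical stand-in for the labels a clustering algorithm would assign, and the toy edges are a few concept pairs taken from the colexification table.

```python
from typing import Dict, Hashable, Iterable, Mapping, Tuple

Edge = Tuple[Hashable, Hashable]


def count_components(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> int:
    """Count connected components of an undirected graph using union-find."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x: Hashable) -> Hashable:
        # Walk up to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        # Endpoints missing from `nodes` start out as singletons.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
    return len({find(n) for n in list(parent)})


def graph_summary(
    nodes: Iterable[Hashable], edges: Iterable[Edge], community: Mapping[Hashable, int]
) -> Dict[str, int]:
    """Nodes, edges, components and communities, as listed by graph_stats."""
    edges = list(edges)
    node_set = set(nodes) | {n for edge in edges for n in edge}
    return {
        "nodes": len(node_set),
        # Undirected graph: (a, b) and (b, a) count as one edge.
        "edges": len({frozenset(e) for e in edges}),
        "components": count_components(node_set, edges),
        # One community per distinct cluster label.
        "communities": len(set(community.values())),
    }


# Toy graph built from concept pairs in the colexification table.
TOY_EDGES = [
    ("SKIN", "BARK"), ("SKIN", "LEATHER"), ("TREE", "WOOD"),
    ("MOON", "MONTH"), ("HIGH", "TALL"), ("LONG", "TALL"),
]
# Hypothetical cluster labels, standing in for infomap output.
TOY_COMMUNITY = {
    "SKIN": 1, "BARK": 1, "LEATHER": 1, "TREE": 2, "WOOD": 2,
    "MOON": 3, "MONTH": 3, "HIGH": 4, "TALL": 4, "LONG": 5,
}

if __name__ == "__main__":
    print(graph_summary(TOY_COMMUNITY.keys(), TOY_EDGES, TOY_COMMUNITY))
    # {'nodes': 10, 'edges': 6, 'components': 4, 'communities': 5}
```

The toy labels split one component (HIGH, TALL, LONG) into two communities, which is why a clustered network can report more communities than components, as in the graph_stats output above (249 communities across 92 components).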
    