About Small World of Words project (SWOW) & SWOW-ZH

The Small World of Words project is a large-scale scientific study that aims to build a mental dictionary, or lexicon, for the major languages of the world and to make this information widely available <a id="fnr.1" class="footref" href="#fn.1" role="doc-backlink">1</a>.

In contrast to a thesaurus or dictionary, we use word associations to learn what words mean and which ones are central in the human mind. This enables psychologists, linguists, neuroscientists and others to test new theories about how we represent and process language. This knowledge could also be applied in a variety of ways, from studying differences between cultures to learning (or forgetting) words in a first or second language.

SWOW-ZH is a daughter project of SWOW that maps the mental lexicon in Chinese; the suffix ZH stands for Zhongwen (中文, Chinese). It was initiated to provide a comprehensive framework for measuring the mental lexicon of Chinese speakers in their cultural context, and a basis for comparative studies between Chinese and other languages.

The participant task we used is called multiple response association <a id="fnr.2" class="footref" href="#fn.2" role="doc-backlink">2</a>. The methodology is based on a continued word association task, in which participants see a cue word and are asked to give three associated responses to it. As the number of participants increases, the collected associations represent the mental lexicon more comprehensively and efficiently. The task thus focuses on the aspects of word meaning that are shared between people, without imposing restrictions on which aspects of meaning should be considered.
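To illustrate how responses pooled across participants can expose shared meaning, here is a minimal sketch in Python. It is not the project's own pipeline: the function name `response_proportions`, the input layout (a cue paired with up to three responses) and the toy English words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def response_proportions(trials):
    """Pool responses per cue and return each response's share of the total.

    `trials` is an iterable of (cue, responses) pairs, where `responses`
    holds up to three strings (hypothetical layout, not the released format).
    """
    counts = defaultdict(Counter)
    for cue, responses in trials:
        for response in responses:
            if response:  # skip empty or missing responses
                counts[cue][response] += 1

    proportions = {}
    for cue, counter in counts.items():
        total = sum(counter.values())
        proportions[cue] = {resp: n / total for resp, n in counter.items()}
    return proportions


# Toy data: two participants responding to the same cue.
toy_trials = [
    ("cat", ["dog", "pet", "fur"]),
    ("cat", ["dog", "mouse", ""]),
]
shares = response_proportions(toy_trials)
print(shares["cat"]["dog"])
```

Responses given by many participants receive a larger share, which is one simple way to quantify what is shared between people.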

Chinese is a demographically and culturally complex language, with more dialects and writing systems than a single study can cover exhaustively. In the SWOW-ZH project, we focused primarily on Mandarin Chinese (普通话, Putonghua) and the simplified Chinese writing system, which are used in most regions of the Chinese mainland. The native dialect of each participant was also collected as complementary information. If you are interested in Cantonese, see SWOW-HK, another SWOW daughter project.

The study was conducted in Professor CAI Qing's lab at the School of Psychology and Cognitive Science, East China Normal University (华东师范大学心理与认知科学学院，蔡清教授团队), in collaboration with Dr. Simon De Deyne at the University of Melbourne, who founded the SWOW project while under the supervision of Professor Gert Storms at the University of Leuven.
Please address questions and suggestions to:
- DING Ziyi | 丁子益 | ziyi.ecnu@gmail.com | ZiyiDing7@github
- LI Bing | 李兵 | lbing314@gmail.com | lib314a@github
Affiliations:
- Shanghai Key Laboratory of Brain Functional Genomics (Ministry of Education), Affiliated Mental Health Center (ECNU), Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
- Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
Thanks:
- This work was supported by the National Natural Science Foundation of China (grant number 31970987 to Qing Cai) and the Australian Research Council Early Career Grant (DE140101749 to Simon De Deyne).
License of the data: See https://smallworldofwords.org/en/project/
License of the code: <a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

Cite us:

APA: Li, B., Ding, Z., De Deyne, S., & Cai, Q. (2024). A large-scale database of Mandarin Chinese word associations from the Small World of Words Project. Behavior Research Methods, 57(1), 34. http://dx.doi.org/10.3758/s13428-024-02513-1
bibtex:

@article{li_large-scale_2024,
  title = {A large-scale database of {Mandarin} {Chinese} word associations from the {Small} {World} of {Words} {Project}},
  volume = {57},
  issn = {1554-3528},
  url = {https://link.springer.com/10.3758/s13428-024-02513-1},
  doi = {10.3758/s13428-024-02513-1},
  language = {en},
  number = {1},
  urldate = {2025-01-02},
  journal = {Behavior Research Methods},
  author = {Li, Bing and Ding, Ziyi and De Deyne, Simon and Cai, Qing},
  month = dec,
  year = {2024},
  pages = {34},
}

Download the dataset

Access the dataset from the webpage: https://smallworldofwords.org/zh/project/research

Current: 4 November 2024

Instructions to the repository

Tip: Instead of exploring the repo in your browser, you may find it more convenient to clone it to your local machine.

In this repository you will find a basic analysis pipeline for the Chinese SWOW project, which allows you to import and preprocess the data as well as compute some basic statistics.

Obtaining the data

In addition to the scripts, you will need to retrieve the word association data. Word association and participant data are currently available for 10,192 cues, comprising over 2 million responses collected between 2016 and 2023. The data have been published; to use them in your own research, obtain them from the Small World of Words research page (https://smallworldofwords.org/zh/project/research).

To start the pipeline, put SWOW-ZH_raw.(csv|mat) into the data folder.
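Before running anything, it can help to confirm that the raw file is where the pipeline expects it. The following is a minimal sketch in Python and is not part of the repository's scripts; the helper names `find_raw_file` and `read_header`, the choice to prefer the CSV over the MAT file, and the UTF-8 encoding are assumptions.

```python
import csv
from pathlib import Path
from typing import List, Optional


def find_raw_file(data_dir: str = "data") -> Optional[Path]:
    """Return the raw SWOW-ZH file in `data_dir`, or None if it is missing.

    Checks for SWOW-ZH_raw.csv first, then SWOW-ZH_raw.mat (assumed order).
    """
    for suffix in (".csv", ".mat"):
        candidate = Path(data_dir) / f"SWOW-ZH_raw{suffix}"
        if candidate.is_file():
            return candidate
    return None


def read_header(csv_path: Path) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    # UTF-8 is an assumption about how the released file is encoded.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    raw = find_raw_file()
    if raw is None:
        print("SWOW-ZH_raw.(csv|mat) not found in the data folder")
    elif raw.suffix == ".csv":
        print(read_header(raw))
    else:
        print(f"Found {raw}")
```

Printing the header is a quick way to compare the file you downloaded with the column descriptions in this README.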

While the majority of the data was collected on the SWOW platform (ZH), a subset was collected on NAODAO (脑岛), another China-based survey platform, using the same tasks and the same inclusion criteria. This is not expected to compromise the reliability of the data.

If you find any of this useful, please consider sharing the word association study (https://smallworldofwords.org/zh/project).

Raw data

Since this is an ongoing project, the data is regularly updated. Hence, every data file includes a release date in its filename. You need to rename the data files according to the README so that the scripts can load them properly.

SequenceNumber: A system-assigned sequence number, ascending from 1.
TrialsID: Unique identifiers for trials. Each trial is made up of one cue and three responses.
ParticipantID: Unique identifiers for the participants.
Created_at: Date and time when the trial was completed.
Age: Age reported by the participant.
NativeLanguage: Native variety of Chinese (Mandarin or a dialect) reported by the participant.
- Tags in the NAODAO platform:
  - PUTON: Putonghua (Mandarin), the officially promoted standard pronunciation (普通话);
  - SOUTHE: Southeastern dialects, represented by the Northern and Southern Min (Fujian) dialects, covering most of Fujian as well as Chaoshan, Hainan and Taiwan (东南部方言：代表为包括闽北及闽南方言，覆盖福建大部及潮汕、海南及台湾);
  - NORTH: Northern dialects, represented by the dialects of the three northeastern provinces and Inner Mongolia, Hebei, Henan and Shandong, Jiaodong, Liaodong and the northern part of the Han River basin (北方方言：代表为东北三省及内蒙方言、冀豫鲁、胶东、辽东和汉水流域北部);
  - SOUTH: Southern dialects, represented by Pinghua and Baihua in Guangxi, Guangdong and Hainan, and Cantonese in Hong Kong and Macau (南部方言：代表为包括广西、广东和海南的平话、白话，及香港和澳门的粤语);
  - JIANG: Jianghuai dialects, represented by the Yangtze-Huai (Jianghuai) river basin, northern Jiangsu (Subei) and southern Shandong (Lunan) (江淮方言：代表为江淮流域及苏北、鲁南);
  - SHAN: Shan-Shaan dialects, spoken throughout Shaanxi and Shanxi (陕、晋方言：代表为陕西及山西各地);
  - HAKKA: Hakka, spoken by Hakka communities in various regions (客家话：代表为分布在各地的客家族语);
  - SOUTHW: Southwestern dialects, covering most of Yunnan, Guizhou, Sichuan, Hubei and Hunan (西南方言：代表为云贵川鄂湘大部);
  - WU: Wu dialects, covering the eastern parts of Jiangxi and Anhui, most of Zhejiang, and Shanghai (吴方言：代表为江西和安徽东部、浙江大部及上海);
  - NORTHW: Northwestern dialects, represented by Yinchuan, Lanzhou and Xining (西北方言：代表为银川、兰州、西宁).
- Tags in the SWOW platform:
  - PUTON: Which is the same as NAODAO;
  - EASTW: Which is the same as WU on NAODAO;
  - JIANG: Which is the same as NAODAO;
  - SHAN: Which is the same as N
