Seegull
SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 different geo-political regions across 6 continents, as well as state-level identities within the US and India.
Install / Use
/learn @google-research-datasets/SeegullREADME
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models
This repository contains data resources for the paper "SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models". This dataset contains stereotype examples that may be offensive.
Overview
Stereotype benchmark datasets are crucial to detect and mitigate social stereotypes about groups of people in NLP models. However, existing datasets are limited in size and coverage, and are largely restricted to stereotypes prevalent in the Western society. This is especially problematic as language technologies gain hold across the globe. To address this gap, we present SeeGULL, a broad-coverage stereotype dataset, built by utilizing generative capabilities of large language models such as PaLM, and GPT-3, and leveraging a globally diverse rater pool to validate the prevalence of those stereotypes in society. SeeGULL is in English, and contains stereotypes about identity groups spanning 178 countries across 8 different geo-political regions across 6 continents, as well as state-level identities within the US and India. We also include fine-grained offensiveness scores for different stereotypes and demonstrate their global disparities. Furthermore, we include comparative annotations about the same groups by annotators living in the region vs. those that are based in North America, and demonstrate that within-region stereotypes about groups differ from those prevalent in North America.
Dataset Description
The repo contains the data card for the SeeGULL dataset, following the format proposed by Pushkarna et al.. The data card includes details of the dataset such as intended usage, field names and meanings, annotator recruitment and payments. The dataset folder contains the following 3 files:
stereotypes_global.csv: Nationality based stereotypesstereotypes_indian_states.csv: Stereotypes about Indian Statesstereotypes_us_states.csv: Stereotypes about US States
For nationality based stereotypes (stereotypes_global.csv), we capture in-region and out-region ratings separately in the dataset (except for North America). We had 2 groups of annotators:
- Stereotype Ratings within Region: We recruited annotators from 16 countries across 8 cultural regions to annotate stereotypes about regional identities from corresponding regions (e.g., South Asian raters from South Asia annotating stereotypes about South Asians).
- Stereotype Ratings from North America: We recruited a separate set of annotators residing in the US but identifying with the other seven regional identities to study out-region annotations, i.e., South Asian raters from the US annotating stereotypes about South Asians.
Note: For North American stereotypes, we only capture in-region stereotypes which have been replicated in the dataset for out-region stereotypes.
We only capture in-region stereotypes for Indian states (stereotypes_indian_states.csv) and the US states (stereotypes_us_states.csv).
All three files contain the mean offensiveness scores. Higher the score, more offensive the stereotype.
Changes in SeeGULL V2
The stereotypes_global.csv dataset has been updated. We made the following changes:
- Removed noisy attributes
- Converted all attributes to lowercase
- Lemmatization mapped attributes to a single root word/phrase
- Removed stop words and mapped attributes to a single word/phrase
- Removed punctuations ('-', '_') from attributes
- Corrected typos in some identities
Citation
@inproceedings{jha-etal-2023-seegull,
title = "{S}ee{GULL}: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models",
author = "Jha, Akshita and
Mostafazadeh Davani, Aida and
Reddy, Chandan K and
Dave, Shachi and
Prabhakaran, Vinodkumar and
Dev, Sunipa",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.548",
pages = "9851--9870",
}
Related Skills
node-connect
344.1kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
96.8kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
344.1kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
344.1kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
Security Score
Audited on Jan 22, 2026
