Toxicity

The world's largest social media toxicity dataset.

The Toxicity Dataset

by Surge AI, the world's most powerful NLP data labeling platform and workforce

Saving the internet is fun. Combing through thousands of online comments to build a toxicity dataset isn't. That's why we're creating the world's largest dataset of social media toxicity, so you can skip the slog and get to work.

We hope you find this sample of our dataset useful, whether you want to flag hateful speech, develop content moderation tools, or build classifiers to detect toxic messages.

Interested in the full toxicity dataset to train your ML models, or in toxicity data in other languages (Spanish, French, German, Japanese, Portuguese, and 17+ more)? We work with top AI and Safety companies around the world to build human-powered datasets for training ML models. Reach out to team@surgehq.ai!

Dataset

This repo contains 500 toxic and 500 non-toxic comments from a variety of popular social media platforms. Click on toxicity_en.csv to see a spreadsheet of 1000 English examples. Rather than operating under a strict definition of toxicity, we asked our team to identify comments that they personally found toxic.

Columns

text: the text of the comment
is_toxic: whether or not the comment is toxic
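
As a rough illustration of working with the two columns above, here is a minimal Python sketch that reads rows with text and is_toxic headers and counts the labels. The README does not say how is_toxic values are spelled, so the accepted spellings below are assumptions, and the function names (parse_label, load_comments, label_counts) are hypothetical helpers, not part of the repo. The sample rows are placeholders, not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Assumption: the exact spelling of the is_toxic values is not stated in the
# README, so several common encodings are accepted here.
TOXIC_VALUES = {"toxic", "true", "1", "yes"}
NON_TOXIC_VALUES = {"not toxic", "non-toxic", "false", "0", "no"}


def parse_label(raw):
    """Map a raw is_toxic cell to a bool; raise on anything unrecognised."""
    value = raw.strip().lower()
    if value in TOXIC_VALUES:
        return True
    if value in NON_TOXIC_VALUES:
        return False
    raise ValueError(f"unrecognised is_toxic value: {raw!r}")


def load_comments(handle):
    """Read (text, is_toxic) pairs from an open CSV file with a header row."""
    reader = csv.DictReader(handle)
    return [(row["text"], parse_label(row["is_toxic"])) for row in reader]


def label_counts(comments):
    """Count how many comments fall under each label."""
    return Counter(label for _, label in comments)


# Placeholder rows standing in for the real file (not taken from the dataset).
sample = io.StringIO(
    "text,is_toxic\n"
    "placeholder comment one,Toxic\n"
    "\"placeholder, with a comma\",Not Toxic\n"
)
comments = load_comments(sample)
counts = label_counts(comments)
print(counts[True], counts[False])
```

To run this against the real file, pass an open handle instead of the in-memory sample, for example open("toxicity_en.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption. Given the description above, the counts should come out to 500 toxic and 500 non-toxic.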

Future

We'll be adding more languages and annotations (e.g., augmenting each comment with a severity ranking or adding categories) over time. You can also check out our other free datasets here.

If you're also interested in a dataset of profanity, check out our obscenity list.
