Emojional

Emoji embeddings trained using their emotional content from their online dictionary meanings.

The corresponding paper for this repository can be found here.

Motivated by the scarcity of emoji embedding models and by the limited ability of existing ones to capture the evolving emotional content of emoji, we created novel emoji embeddings from the emotional content of their dictionary meanings. The resulting embeddings are generally more accurate than the state-of-the-art embeddings when tested on the task of sentiment analysis.

Because these embeddings were also trained on keywords, they are robust and can be used successfully in other natural language tasks such as emotion, cyberbullying and sarcasm detection. The current embedding file contains all emojis as of v13.1 from Unicode.org (1816 emojis). The emoji embedding file will be updated when new emojis are added.

Creating the Dataset

We scraped the key emotive words from the online emoji dictionaries Emojipedia and Emojis.Wiki and created a new dataset. This is the script we used to scrape each emoji description from these websites. Using a list of uniquely emotive, sensory and other keywords, we used the Python library Beautiful Soup to scrape any matching words from each emoji description.
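The keyword-matching step can be sketched as follows. This is a minimal illustration, not the scraping script itself: it assumes the description text has already been extracted from the page, and the keyword list, the sample description and the function name `match_keywords` are all hypothetical.

```python
import re

# Hypothetical keyword list; the real list of emotive and sensory words is not shown here.
KEYWORDS = ["crystal ball", "future", "magic", "mysterious", "dirt"]

def match_keywords(description, keywords=KEYWORDS):
    """Return the keywords that occur as whole words or phrases in the description."""
    text = description.lower()
    found = []
    for kw in keywords:
        # \b stops "magic" from matching inside a longer word such as "magician".
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text):
            found.append(kw)
    return found

# Invented sample description, for illustration only.
description = "A crystal ball, used by a fortune teller to see the future. Magic and mysterious."
print(match_keywords(description))
```

Matching on word boundaries rather than plain substrings keeps multi-word keywords such as "crystal ball" intact while avoiding partial-word hits.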

The final dataset structure looks like this: 🔮 crystal ball future magic mysterious

The full dataset can be found here.

To train the model, the data needs to be in a tab-delimited format with one sample per line:

crystal ball 🔮 True
magic 🔮 True
mysterious 🔮 True

To achieve this we created a script that changes the dataset format and also shuffles the data.
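A minimal sketch of such a conversion is shown here. It is not the actual script: it assumes the keywords for each emoji are already available as a list of phrases, and the function name `to_training_lines` and the fixed seed are illustrative choices.

```python
import random

def to_training_lines(emoji_keywords, label=True, seed=0):
    """Turn {emoji: [phrase, ...]} into shuffled 'phrase<TAB>emoji<TAB>label' lines."""
    lines = [
        f"{phrase}\t{emoji}\t{label}"
        for emoji, phrases in emoji_keywords.items()
        for phrase in phrases
    ]
    # A seeded generator makes the shuffle reproducible (an assumption, not a requirement).
    random.Random(seed).shuffle(lines)
    return lines

dataset = {"🔮": ["crystal ball", "future", "magic", "mysterious"]}
for line in to_training_lines(dataset):
    print(line)
```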

Negative Sampling

To make quality embeddings, we created negative samples.

ripe fruits 🔮 False
dirt 🔮 False
approval 🔮 False
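How the negative samples were chosen is not described here. One common approach, sketched below purely as an assumption, is to pair each emoji with phrases that belong to other emojis and not to its own keyword set. The toy data and the function name `sample_negatives` are hypothetical.

```python
import random

def sample_negatives(emoji_keywords, per_emoji=1, seed=0):
    """Pair each emoji with phrases drawn from other emojis' keywords (assumed strategy)."""
    rng = random.Random(seed)
    all_phrases = sorted({p for phrases in emoji_keywords.values() for p in phrases})
    lines = []
    for emoji, phrases in emoji_keywords.items():
        positives = set(phrases)
        # Only phrases that are not true keywords of this emoji can be negatives.
        candidates = [p for p in all_phrases if p not in positives]
        for phrase in rng.sample(candidates, min(per_emoji, len(candidates))):
            lines.append(f"{phrase}\t{emoji}\tFalse")
    return lines

# Toy data for illustration only.
toy = {"🔮": ["crystal ball", "magic"], "🍓": ["ripe fruits"], "👍": ["approval"]}
for line in sample_negatives(toy, per_emoji=2):
    print(line)
```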

Test Train Dev Split

Our full dataset consists of 10854 true samples and 890 false samples. The true samples are split 91.8% train, 4.1% test and 4.1% development (dev); the false samples are divided equally between the test and dev sets.

Our Data Folder

The data used to train the model can be found here.

Training Folder

train.txt consists of 9964 true samples.
test.txt consists of 445 true samples and 445 false samples.
dev.txt consists of 445 true samples and 445 false samples, which are different from those in test.txt.
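As a quick arithmetic check, the file sizes above reproduce the dataset totals and the split percentages when the percentages are taken over the true samples. The variable names are illustrative.

```python
# Sample counts per file, taken from the list above.
true_counts = {"train": 9964, "test": 445, "dev": 445}
false_counts = {"train": 0, "test": 445, "dev": 445}

total_true = sum(true_counts.values())
total_false = sum(false_counts.values())
print(total_true, total_false)

# Share of the true samples that lands in each file.
shares = {name: round(100 * n / total_true, 1) for name, n in true_counts.items()}
print(shares)
```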

Testing Folder

train.txt uses 20 true samples.
test.txt uses 20 true samples.
dev.txt uses 20 true samples.

All three files in the testing folder contain the same 20 true samples.

Training

We used a PyTorch implementation of emoji2vec [1]. The original implementation of emoji2vec can be found here [2]. The model generates emoji vectors of dimension 300 and trains in batches of 8 (4 positive and 4 negative examples) at a learning rate of 0.001. The model trains for 60 epochs and performs early stopping on a held-out development set. Various metrics, including accuracy and F1 score, are output.
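For orientation, the emoji2vec approach scores an emoji against a phrase by taking the sigmoid of the dot product between the emoji vector and the sum of the phrase's word vectors. The following is a rough pure-Python sketch of that scoring and its log loss. It is not the code from [1] or [2]: the helper names are hypothetical, and made-up 3-dimensional vectors stand in for the 300-dimensional ones.

```python
import math

def phrase_vector(phrase, word_vectors, dim):
    """Sum the vectors of the words in a phrase; unknown words are skipped."""
    total = [0.0] * dim
    for word in phrase.split():
        vec = word_vectors.get(word)
        if vec is None:
            continue
        total = [t + x for t, x in zip(total, vec)]
    return total

def match_probability(emoji_vec, phrase_vec):
    """Sigmoid of the dot product between an emoji vector and a phrase vector."""
    dot = sum(a * b for a, b in zip(emoji_vec, phrase_vec))
    return 1.0 / (1.0 + math.exp(-dot))

def log_loss(prob, label):
    """Binary log loss for one (phrase, emoji, label) sample."""
    return -math.log(prob) if label else -math.log(1.0 - prob)

# Toy vectors for illustration only.
word_vectors = {"crystal": [0.5, 0.0, 0.5], "ball": [0.5, 0.5, 0.0]}
emoji_vec = [1.0, 1.0, 0.0]
pv = phrase_vector("crystal ball", word_vectors, dim=3)
p = match_probability(emoji_vec, pv)
print(pv, p, log_loss(p, True))
```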

Training the dataset

We downloaded the repository of the PyTorch implementation of emoji2vec [1] and updated the file 'presentation.ipynb'. We replaced the data folder with our new data and downloaded the pretrained Google News word2vec word vectors to run this implementation.

If the file 'phrase_embeddings.pkl' exists in the 'pre-trained' folder, it must be deleted so that a new dictionary is created from the new dataset. Running 'presentation.ipynb' then trains the emoji embeddings and produces our emojional embeddings.
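The cache-clearing step can be scripted along these lines. This is a small sketch that assumes the 'pre-trained' folder sits directly under the repository root; the function name `clear_phrase_cache` is hypothetical.

```python
from pathlib import Path

def clear_phrase_cache(root="."):
    """Delete pre-trained/phrase_embeddings.pkl under root; return True if a file was removed."""
    cache = Path(root) / "pre-trained" / "phrase_embeddings.pkl"
    if cache.exists():
        cache.unlink()
        return True
    return False

# Safe to call when the file is absent: nothing happens and False is returned.
print(clear_phrase_cache("."))
```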

Testing

We downloaded the repository of emoji2vec [2] and updated several files to current Python standards. We tested different versions of our emoji embedding output files by adding them to the folder 'data/word2vec', along with a copy of the Google News word2vec embeddings. The file 'TwitterClassification.ipynb' executes the testing.

Results

We compared our emoji embeddings to the state-of-the-art emoji embeddings using a Twitter sentiment analysis task on a 2015 dataset. Our emojional embeddings generally beat the other embeddings with Random Forests and scored second highest with Linear SVM.

Mapping to Emotions and Key Words

We have evaluated the emoji embeddings on a list of emotions, sensations, feelings and keywords. Each emoji embedding can be seen to capture multiple senses.

Plutchik's Wheel of Emotions
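One way to carry out such a mapping is to rank candidate words by cosine similarity to an emoji vector. The sketch below assumes that approach, which is not confirmed by the text above. It uses made-up 3-dimensional vectors, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_words(emoji_vec, word_vectors):
    """Return the words sorted from most to least similar to the emoji vector."""
    return sorted(word_vectors, key=lambda w: cosine(emoji_vec, word_vectors[w]), reverse=True)

# Toy vectors for illustration only.
emotion_vectors = {"joy": [1.0, 0.0, 0.0], "fear": [0.0, 1.0, 0.0], "anger": [-1.0, 0.0, 0.0]}
emoji_vec = [0.9, 0.1, 0.0]
print(rank_words(emoji_vec, emotion_vectors))
```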

[Image: Screenshot 2021-05-30 at 15 55 08]

Humour

[Image: Screenshot 2021-05-30 at 16 32 36]

Seasonal

[Image: Screenshot 2021-05-30 at 15 51 04]

Visualization

Visualizing Embeddings in 2D Space

We also present our results as a t-SNE visualisation, where clusters of emotions can be seen in 2D space. We used the Microsoft repository emoji2recipe [3] and updated the 'Visualisation.ipynb' script to work with current package versions.

Using the Emoji Embeddings

To use the embeddings, download the emojional.bin file and include the following code in your model.

import gensim

e2v = gensim.models.KeyedVectors.load_word2vec_format("emojional.bin", binary=True)
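Once loaded, the vectors can be queried like any other gensim KeyedVectors object. The sketch below wraps the load in a guard so it does nothing when the file is missing, then lists the nearest neighbours of an emoji within the file's own vocabulary. The helper names `load_emojional` and `nearest_emojis` are hypothetical.

```python
import os

def load_emojional(path="emojional.bin"):
    """Load the embeddings if the file is present, otherwise return None."""
    if not os.path.exists(path):
        return None
    # Imported lazily so the helper can be defined without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)

def nearest_emojis(e2v, emoji, topn=5):
    """Return the keys most similar to the given emoji, or [] if unavailable."""
    if e2v is None or emoji not in e2v:
        return []
    return [key for key, _score in e2v.most_similar(emoji, topn=topn)]

e2v = load_emojional()
print(nearest_emojis(e2v, "🔮"))
```

A single vector is available as `e2v["🔮"]` once the file has been loaded.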

References

[1] "pwiercinski/emoji2vec_pytorch", GitHub. [Online]. Available: https://github.com/pwiercinski/emoji2vec_pytorch. [Accessed: 30-Mar-2021].

[2] "uclnlp/emoji2vec", GitHub. [Online]. Available: https://github.com/uclnlp/emoji2vec. [Accessed: 30-Mar-2021].

[3] "microsoft/Emoji2recipe", GitHub. [Online]. Available: https://github.com/microsoft/Emoji2recipe. [Accessed: 30-Mar-2021].
