Emojional
Emoji embeddings trained using their emotional content from their online dictionary meanings.
The corresponding paper for this repository can be found here.
Motivated by the scarcity of emoji embedding models, and by how poorly existing ones capture the evolving emotional content of emojis, we created novel emoji embeddings from the emotional content of their dictionary meanings. The resulting embeddings are generally more accurate than the state-of-the-art embeddings when tested on a sentiment analysis task.
Because these embeddings were also trained on keywords, they are robust and can be used successfully in other natural language tasks such as emotion, cyberbullying and sarcasm detection. The current embedding file contains all emojis as of v13.1 from Unicode.org (1816 emojis). The emoji embedding file will be updated when new emojis are added.
Creating the Dataset
We scraped the key emotive words from the online emoji dictionaries Emojipedia and Emojis.Wiki and created a new dataset. This is the script we used to scrape each emoji description from these websites. Using a list of uniquely emotive, sensory and other keywords, we scraped each emoji description with the Python library Beautiful Soup and kept any words that matched the list.
The final dataset structure looks like this: 🔮 crystal ball future magic mysterious
The full dataset can be found here.
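The keyword-matching step can be sketched as follows. This is a minimal illustration, not the repository's scraping script: it assumes the description text has already been extracted from the page (for example with Beautiful Soup), and the keyword list, sample description and function name are hypothetical.

```python
import re

# Hypothetical keyword list; the real list of emotive and sensory keywords
# used to build the dataset is not reproduced here.
KEYWORDS = ["magic", "mysterious", "future", "crystal ball"]


def match_keywords(description, keywords=KEYWORDS):
    """Return the keywords that occur as whole words or phrases in a description."""
    text = description.lower()
    found = []
    for keyword in keywords:
        # \b anchors keep "magic" from matching inside e.g. "magician".
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(keyword)
    return found


# Made-up description text, standing in for a scraped dictionary entry.
description = "A crystal ball, used to tell the future. Magic and mysterious."
print(match_keywords(description))
```

Matching on word boundaries rather than plain substrings is a choice made for this sketch; it avoids counting a keyword that only appears inside a longer word.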
For the model to train on the data, each sample must be on its own line with tab-separated fields:
- crystal ball 🔮 True
- magic 🔮 True
- mysterious 🔮 True
To achieve this we created a script that changes the dataset format and also shuffles the data.
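A conversion of this kind might look like the sketch below. It is not the repository's script: the in-memory dictionary, the function name and the fixed seed are assumptions, and the field order (phrase, emoji, label) simply follows the example lines above.

```python
import random


def to_training_lines(dataset, seed=0):
    """Turn {emoji: [phrase, ...]} into shuffled 'phrase<TAB>emoji<TAB>True' lines."""
    lines = [
        f"{phrase}\t{emoji}\tTrue"
        for emoji, phrases in dataset.items()
        for phrase in phrases
    ]
    # A fixed seed makes the shuffle reproducible; this is a choice made for the sketch.
    random.Random(seed).shuffle(lines)
    return lines


# Hypothetical in-memory form of one dataset entry.
dataset = {"🔮": ["crystal ball", "future", "magic", "mysterious"]}
for line in to_training_lines(dataset):
    print(line)
```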
Negative Sampling
To make quality embeddings, we created negative samples: phrase and emoji pairs that do not belong together, labelled False.
- ripe fruits 🔮 False
- dirt 🔮 False
- approval 🔮 False
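How the negative samples were generated is not described here, so the following is only one plausible approach: pair each emoji with phrases that belong to other emojis. The function name, the toy dataset and the sampling strategy are all assumptions, and enumerating every candidate pair as done below is only practical for small datasets.

```python
import random


def make_negative_samples(dataset, n_samples, seed=0):
    """Pair emojis with phrases that are not listed among their own phrases."""
    rng = random.Random(seed)
    emojis = sorted(dataset)
    all_phrases = sorted({p for phrases in dataset.values() for p in phrases})
    # Every (phrase, emoji) pair that is NOT a known true sample.
    candidates = [
        (phrase, emoji)
        for emoji in emojis
        for phrase in all_phrases
        if phrase not in dataset[emoji]
    ]
    chosen = rng.sample(candidates, min(n_samples, len(candidates)))
    return [f"{phrase}\t{emoji}\tFalse" for phrase, emoji in chosen]


# Toy dataset for illustration only.
toy_dataset = {
    "🔮": ["magic", "mysterious"],
    "🍓": ["ripe fruits"],
    "👍": ["approval"],
}
for line in make_negative_samples(toy_dataset, 4):
    print(line)
```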
Test Train Dev Split
Our full dataset consists of 10854 true samples and 890 false samples. The true samples are split 91.8% train, 4.1% test and 4.1% dev; the false samples are divided equally between test and dev.
Our Data Folder
The data used to train the model can be found here.
Training Folder
- train.txt consists of 9964 true samples.
- test.txt consists of 445 true samples and 445 false samples.
- dev.txt consists of 445 true samples and 445 false samples, which are different from those in test.txt.
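The totals and split percentages can be checked against the file counts listed above. In this small check the percentages are computed over the true samples only, and train.txt is taken to hold no false samples, which is what the totals imply; the variable names are illustrative.

```python
# Sample counts as listed for the training folder.
true_counts = {"train": 9964, "test": 445, "dev": 445}
# Assumption: train.txt holds no false samples, as the stated totals imply.
false_counts = {"train": 0, "test": 445, "dev": 445}

total_true = sum(true_counts.values())
total_false = sum(false_counts.values())
# Share of the true samples that falls into each file, in percent.
shares = {name: round(100 * n / total_true, 1) for name, n in true_counts.items()}

print(total_true, total_false)
print(shares)
```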
Testing Folder
- train.txt uses 20 true samples
- test.txt uses 20 true samples
- dev.txt uses 20 true samples
Each file in the testing folder contains the same 20 true samples.
Training
We used a PyTorch implementation of emoji2vec [1]. The original implementation of emoji2vec can be found here [2]. The model generates emoji vectors of dimension 300, training in batches of 8 (4 positive and 4 negative examples) at a learning rate of 0.001. It trains for up to 60 epochs and performs early stopping on a held-out development set. Various metrics, including accuracy and F1 score, are reported.
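For orientation, the original emoji2vec paper scores an emoji against a phrase by the sigmoid of the dot product between the emoji vector and the sum of the phrase's word vectors, and trains with a log loss. The sketch below illustrates that scoring in plain Python with tiny made-up vectors. It is not the PyTorch code used here, and all names and values are hypothetical.

```python
import math


def phrase_vector(phrase, word_vectors):
    """Sum the word vectors of the words in a phrase.

    Skipping unknown words is a choice made for this sketch.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in phrase.split():
        if word in word_vectors:
            total = [t + w for t, w in zip(total, word_vectors[word])]
    return total


def match_probability(emoji_vec, phrase_vec):
    """Sigmoid of the dot product between an emoji vector and a phrase vector."""
    score = sum(e * p for e, p in zip(emoji_vec, phrase_vec))
    return 1.0 / (1.0 + math.exp(-score))


def log_loss(probability, label):
    """Binary cross-entropy for one (phrase, emoji, label) sample."""
    return -math.log(probability) if label else -math.log(1.0 - probability)


# Toy 3-dimensional vectors; the real model uses 300 dimensions.
word_vectors = {"crystal": [0.5, 0.1, 0.0], "ball": [0.3, 0.2, 0.1]}
emoji_vec = [1.0, 0.5, -0.2]

p = match_probability(emoji_vec, phrase_vector("crystal ball", word_vectors))
print(round(p, 3), round(log_loss(p, True), 3))
```

Training then adjusts the emoji vectors so that true pairs receive a high probability and negative pairs a low one, while the pretrained word vectors stay fixed.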
Training the dataset
We downloaded the repository of the PyTorch implementation of emoji2vec [1] and updated the file 'presentation.ipynb'. We replaced the data folder with our new data and downloaded the pretrained Google News word2vec vectors, which this implementation needs in order to run.
If the file 'phrase_embeddings.pkl' exists in the 'pre-trained' folder, delete it so that a new phrase dictionary is built from the new dataset. Then run 'presentation.ipynb' to train the emoji embeddings. This produces our Emojional embeddings.
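Removing the cached file can be scripted. The sketch below assumes the 'pre-trained' folder sits directly under the repository root and that the script is run from there; the function name and the `repo_root` parameter are hypothetical.

```python
from pathlib import Path


def remove_cached_phrase_embeddings(repo_root="."):
    """Delete pre-trained/phrase_embeddings.pkl if present; return True if removed."""
    cache = Path(repo_root) / "pre-trained" / "phrase_embeddings.pkl"
    if cache.exists():
        cache.unlink()
        return True
    return False


# Prints True if a stale cache was removed, False if there was nothing to delete.
print(remove_cached_phrase_embeddings())
```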
Testing
We downloaded the repository of emoji2vec [2] and updated several files to current Python standards. We tested different versions of our emoji embedding output files by adding them to the folder 'data/word2vec', along with a copy of the Google News word2vec embeddings. The file 'TwitterClassification.ipynb' runs the tests.
Results
We compared our emoji embeddings to the state-of-the-art emoji embeddings using a Twitter sentiment analysis task on a 2015 dataset. Our Emojional embeddings generally beat the other embeddings with Random Forests and scored second highest with Linear SVM.
<img width="467" alt="Screenshot 2021-05-31 at 02 46 22" src="https://user-images.githubusercontent.com/53048127/120128657-73741f80-c1ba-11eb-8f0e-9930e157937b.png">
Mapping to Emotions and Key Words
We have evaluated the emoji embeddings on a list of emotions, sensations, feelings and keywords. Each emoji embedding can be seen to capture multiple senses.
Plutchik's Wheel of Emotions

Humour

Seasonal

Visualization
Visualizing Embeddings in 2D Space
We also present our results as a t-SNE visualisation, in which clusters of emotions can be seen in 2D space. We used the Microsoft repository Emoji2recipe [3] and updated the 'Visualisation.ipynb' script to work with current package versions.

Using the Emoji Embeddings
To use the embeddings, download the emojional.bin file and include the following code in your model.
import gensim
e2v = gensim.models.KeyedVectors.load_word2vec_format("emojional.bin", binary=True)
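Once loaded, `e2v` supports membership tests (`token in e2v`) and vector lookup (`e2v[token]`), so it can stand in wherever a token-to-vector mapping is expected. As one possible way to turn a tokenized text into a feature vector, the sketch below sums the vectors of all known tokens. The function name is hypothetical, tokenization is assumed to have happened already, and a plain dictionary stands in for `e2v` so the example runs without the embedding file.

```python
def text_vector(tokens, vectors, dim=300):
    """Sum the vectors of all tokens found in `vectors`; unknown tokens are skipped."""
    total = [0.0] * dim
    for token in tokens:
        if token in vectors:
            total = [t + float(v) for t, v in zip(total, vectors[token])]
    return total


# Toy stand-in for the loaded embeddings (3 dimensions instead of 300).
toy_vectors = {"🔮": [0.1, 0.2, 0.3], "✨": [0.0, 0.5, -0.1]}
print(text_vector(["the", "future", "🔮", "✨"], toy_vectors, dim=3))
```

With the real embeddings, the call would be `text_vector(tokens, e2v)`; word tokens are skipped unless a word embedding model is combined with the emoji vectors.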
References
[1] "pwiercinski/emoji2vec_pytorch", GitHub. [Online]. Available: https://github.com/pwiercinski/emoji2vec_pytorch. [Accessed: 30-Mar-2021].
[2] "uclnlp/emoji2vec", GitHub. [Online]. Available: https://github.com/uclnlp/emoji2vec. [Accessed: 30-Mar-2021].
[3] "microsoft/Emoji2recipe", GitHub. [Online]. Available: https://github.com/microsoft/Emoji2recipe. [Accessed: 30-Mar-2021].
