sentiment.ai

New package for using pre-trained deep learning models (from TF Hub) to embed text and predict sentiment, minus the hassle! In benchmarks, we are head-and-shoulders above traditional lexical sentiment analysis and even go toe-to-toe with Azure Cognitive Services (only we're free!) while also making it easy to work with text embeddings for other analyses. What's more, the community can contribute sentiment scoring models so that the power of sentiment.ai can grow over time!

See github.io page here https://benwiseman.github.io/sentiment.ai/


Overview

Korn Ferry Institute's AITMI team made sentiment.ai for researchers and tinkerers who want a straightforward way to use powerful, open source deep learning models to improve their sentiment analyses. Our approach is relatively simple and outperforms the current best offerings on CRAN and even Microsoft's Azure Cognitive Services. Because we felt the current norm for sentiment analysis isn't quite good enough, we decided to open-source our simplified interface for turning Universal Sentence Encoder embedding vectors into sentiment scores.

We've wrapped a lot of the underlying hassle up to make the process as simple as possible. In addition to just being cool, this approach solves several problems with traditional sentiment analysis, namely:

More robust: can handle spelling mitsakes and mixed case, and can be applied to dieciséis (16) languages!
Doesn't need a rigid lexicon; rather, it matches to an embedding vector (which reduces language to a vector of numbers that captures the information, kind of like a PCA). This means you can get scores for words that are not in the lexicon but are similar to existing words!
Choose the context for what negative and positive mean using the sentiment_match() function. For example, you could set positive to mean "high quality" and negative to mean "low quality" when looking at product reviews.
Power: Because it draws from language embedding models trained on billions of texts, news articles, and Wikipedia entries, it can detect that "I learned so much on my trip to Hiroshima museum last year!" is associated with something positive and that "What happened to the people of Hiroshima in 1945" is associated with something negative.
The power is yours: We've designed sentiment.ai so that the community can contribute sentiment models via GitHub. This way, it's easier for the community to work together to make sentiment analysis more reliable! Currently only xgboost and GLMs (trained on the 512-D embeddings generated with TensorFlow) are supported; however, in a future update we will add functionality to allow arbitrary sentiment scoring models.
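To make the idea of matching against an embedding vector more concrete, here is a minimal base R sketch. It is not the package's internal code: the 3-D vectors are invented stand-ins for the real 512-D embeddings, and `cosine_similarity`, `anchors`, and `text_embedding` are hypothetical names used only for illustration. It assumes that "closest" means highest cosine similarity.

```r
# Toy 3-D vectors standing in for real 512-D sentence embeddings.
# All numbers below are invented for illustration only.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical anchor embeddings for the two poles of interest
anchors <- list(
  positive = c(0.9, 0.1, 0.2),   # e.g. "high quality"
  negative = c(-0.8, 0.3, 0.1)   # e.g. "low quality"
)

# Hypothetical embedding of a new piece of text
text_embedding <- c(0.7, 0.2, 0.1)

# Similarity of the text to each anchor; the closest anchor gives the class
similarities <- vapply(anchors, cosine_similarity, numeric(1), b = text_embedding)
matched_class <- names(which.max(similarities))

print(round(similarities, 2))
print(matched_class)
```

Because the comparison happens in vector space rather than by dictionary lookup, a misspelled or unseen word can still land near a known phrase, which is the property the list above describes.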

Simple Example

# Load the packages
library(sentiment.ai)
library(SentimentAnalysis)
library(sentimentr)
library(data.table)

# Only if it's your first ever time
# install_sentiment.ai()

# Initiate the model
# This will create the sentiment.ai.embed model
# Do this so it can be reused without recompiling - especially on GPU!
init_sentiment.ai()

text <- c(
    "What a great car. It stopped working after a week.",
    "Steve Irwin working to save endangered species",
    "Bob Ross teaching people how to paint",
    "I saw Adolf Hitler on my vacation in Argentina...",
    "the resturant served human flesh",
    "the resturant is my favorite!",
    "the resturant is my favourite!",
    "this restront is my FAVRIT innit!",
    "the resturant was my absolute favorite until they gave me food poisoning",
    "This fantastic app freezes all the time!",
    "I learned so much on my trip to Hiroshima museum last year!",
    "What happened to the people of Hiroshima in 1945",
    "I had a blast on my trip to Nagasaki",
    "The blast in Nagasaki",
    "I love watching scary horror movies",
    "This package offers so much more nuance to sentiment analysis!",
     "you remind me of the babe. What babe? The babe with the power! What power? The power of voodoo. Who do? You do. Do what? Remind me of the babe!"
)

# sentiment.ai
sentiment.ai.score <- sentiment_score(text)

# From Sentiment Analysis
sentimentAnalysis.score <- analyzeSentiment(text)$SentimentQDAP

# From sentimentr
sentimentr.score <- sentiment_by(get_sentences(text), seq_along(text))$ave_sentiment


example <- data.table(target = text, 
                      sentiment.ai = sentiment.ai.score,
                      sentimentAnalysis = sentimentAnalysis.score,
                      sentimentr = sentimentr.score)

| target | sentiment.ai | sentimentAnalysis | sentimentr |
|:------:|:------------:|:-----------------:|:----------:|
| What a great car. It stopped working after a week. | -0.7 | 0.4 | 0.09 |
| Steve Irwin working to save endangered species | 0.27 | 0.17 | -0.09 |
| Bob Ross teaching people how to paint | 0.28 | 0 | 0 |
| I saw Adolf Hitler on my vacation in Argentina… | -0.29 | 0 | 0.27 |
| the resturant served human flesh | -0.32 | 0.25 | 0 |
| the resturant is my favorite! | 0.8 | 0.5 | 0.34 |
| the resturant is my favourite! | 0.78 | 0 | 0 |
| this restront is my FAVRIT innit! | 0.63 | 0 | 0 |
| the resturant was my absolute favorite until they gave me food poisoning | -0.36 | 0 | 0.12 |
| This fantastic app freezes all the time! | -0.41 | 0.25 | 0.13 |
| I learned so much on my trip to Hiroshima museum last year! | 0.64 | 0 | 0 |
| What happened to the people of Hiroshima in 1945 | -0.58 | 0 | 0 |
| I had a blast on my trip to Nagasaki | 0.73 | -0.33 | -0.13 |
| The blast in Nagasaki | -0.51 | -0.5 | -0.2 |
| I love watching scary horror movies | 0.54 | 0 | -0.31 |
| This package offers so much more nuance to sentiment analysis! | 0.74 | 0 | 0 |
| you remind me of the babe. What babe? The babe with the power! What power? The power of voodoo. Who do? You do. Do what? Remind me of the babe! | 0.55 | 0.3 | -0.05 |

Benchmarks

So, what impact does more robust detection and some broader context have? To test it in real-world scenarios, we use two datasets/use cases:

classifying whether review text from Glassdoor.com is from a pro or a con
the popular airline tweet sentiment dataset.

We use the default settings for sentimentr, the QDAP dictionary in sentimentAnalysis, and en.large in sentiment.ai. We prefer the use of Kappa to validate classification as it's a less forgiving metric than F1 scores. In both benchmarks sentiment.ai comes out on top by a decent margin!

Note that our testing and tuning was done using comments written in English.
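For readers less familiar with Kappa, the base R sketch below shows one way to compute Cohen's kappa and why it is less forgiving than F1. The data are invented, and `cohen_kappa` is a hypothetical helper written for this illustration, not a function from sentiment.ai or from the benchmark code.

```r
# Cohen's kappa from two label vectors (hypothetical helper, base R only)
cohen_kappa <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted))
  # Use a shared set of levels so the confusion table is square
  lv <- sort(unique(c(truth, predicted)))
  tab <- table(factor(truth, levels = lv), factor(predicted, levels = lv))
  n <- sum(tab)
  p_observed <- sum(diag(tab)) / n                       # raw agreement
  p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement by chance
  (p_observed - p_expected) / (1 - p_expected)
}

# Invented example: 8 positive and 2 negative texts,
# scored by a lazy classifier that always answers "pos"
truth     <- c(rep("pos", 8), rep("neg", 2))
predicted <- rep("pos", 10)

# F1 for the positive class
tp <- sum(truth == "pos" & predicted == "pos")
fp <- sum(truth == "neg" & predicted == "pos")
fn <- sum(truth == "pos" & predicted == "neg")
f1 <- 2 * tp / (2 * tp + fp + fn)

kappa_value <- cohen_kappa(truth, predicted)

print(round(f1, 3))   # about 0.889
print(kappa_value)    # 0: no better than chance
```

The always-positive classifier earns a respectable-looking F1 on this imbalanced toy set, while kappa correctly reports that it adds nothing beyond chance agreement.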

Glassdoor

Applied example: estimating whether the text from a Glassdoor.com review is positive or negative. The validation set used here is the same data KFI used in our 2020 SIOP workshop.

Note: As a part of KFI's core purpose, sentiment.ai's scoring models were tuned with extra work-related data, hence this is tilted in our favor!

Airline Tweets

Taken from the airline tweet dataset from Kaggle. Classification is positive vs negative (neutral was omitted to remove concerns about cutoff values).
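As a rough sketch of what a positive vs negative comparison can look like, the base R snippet below drops neutral rows and turns continuous scores into classes. The scores and labels are invented, and cutting at zero is an assumption made for illustration; it is not necessarily the rule used in the benchmark.

```r
# Hypothetical scores in [-1, 1] and hand labels; invented for illustration
scores <- c(0.80, -0.41, 0.05, -0.58, 0.73)
labels <- c("positive", "negative", "neutral", "negative", "positive")

# Drop neutral rows so no cutoff is needed for a middle class
keep <- labels != "neutral"

# Assumption: classify by the sign of the score (cutoff at 0)
predicted <- ifelse(scores[keep] > 0, "positive", "negative")

# Share of non-neutral texts where prediction and label agree
agreement <- mean(predicted == labels[keep])
print(agreement)
```

With neutral texts removed, only one decision boundary remains, so packages that scale their scores differently can be compared on the same footing.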

Note: Azure Cognitive Services tune their sentiment model on product reviews; as such, this is tilted in favor of Azure!

Fierce. It looks like we can be pretty confident that sentiment.ai is a pretty fab alternative to existing packages! Note that over time our sentiment scoring models will get better and better!

Installation & Setup

New installation

After installing sentiment.ai from CRAN, you will need to make sure you have a compatible Python environment for tensorflow and tensorflow-text. As this can be a cumbersome experience, we included a convenience function to install that for you:

install_sentiment.ai()

This only needs to be run the first time you install the package. If you're feeling adventurous, you can modify the environment it will create with the following parameters:

envname - the name of the virtual environment
method - if you specifically want "conda" or "virtualenv"
gpu - set to TRUE if you want to run tensorflow-gpu
python_version - The python version used in the virtual environment
modules - a named list of the dependencies and versions


# Just leave this as default unless you have a good reason to change it. 
# This is quite dependent on specific versions of Python modules
install_sentiment.ai()
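If you do want to override the defaults, a call might look like the sketch below. The argument names come from the parameter list above, but the values (including the environment name "r-sentiment-ai") are illustrative assumptions rather than recommended settings. The call is guarded so that nothing is installed when the script runs non-interactively.

```r
# Hypothetical values: adjust to your own setup. Argument names come from the
# parameter list above; the values shown here are illustrative assumptions.
install_args <- list(
  envname = "r-sentiment-ai",  # assumed environment name
  method  = "virtualenv",      # or "conda"
  gpu     = FALSE              # TRUE to request tensorflow-gpu
)

# Guarded so nothing is installed when the script runs non-interactively
# or when sentiment.ai is not available.
if (interactive() && requireNamespace("sentiment.ai", quietly = TRUE)) {
  do.call(sentiment.ai::install_sentiment.ai, install_args)
}
```

Any parameter left out of the list keeps its default, which is the safer choice given how sensitive the setup is to specific module versions.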

Assuming you're using RStudio, it ca
