sentimentr

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

sentimentr is designed to quickly calculate text polarity sentiment in the English language at the sentence level and optionally aggregate by rows or grouping variable(s).

sentimentr is a response to my own needs with sentiment detection that were not addressed by existing R tools. My own polarity function in the qdap package is slower on larger data sets. It is a dictionary lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/de-amplifiers). Matthew Jockers created the syuzhet package, which utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also provides a wrapper for Stanford coreNLP, which uses much more sophisticated analysis. Jockers's dictionary methods are fast but are more prone to error in the case of valence shifters. Jockers addressed these critiques, explaining that the method is good for analyzing general sentiment in a piece of literature, and he points to the accuracy of the Stanford detection as an alternative. In my own work I need better accuracy than a simple dictionary lookup provides; something that considers valence shifters yet optimizes speed, which the Stanford parser does not. This leads to a trade-off of speed vs. accuracy. Simply put, sentimentr attempts to balance accuracy and speed.

Why sentimentr

So what does sentimentr do that other packages don't and why does it matter?

sentimentr attempts to take into account valence shifters (i.e., negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions) while maintaining speed. Simply put, sentimentr is an augmented dictionary lookup. The next questions address why it matters.
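A minimal sketch of that augmented lookup, assuming sentimentr is installed (the exact scores are illustrative and not quoted from the package documentation):

```r
library(sentimentr)

# A plain dictionary lookup would score all three sentences as equally
# positive, since each contains the polarized word "like".
sentiment("I like it.")        # positive
sentiment("I do not like it.") # negator flips the sign to negative
sentiment("I really like it.") # amplifier increases the magnitude
```

Each call returns a data.table with one row per sentence, including a word count and a signed sentiment score.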

So what are these valence shifters?

- A negator flips the sign of a polarized word (e.g., "I do not like it."). See lexicon::hash_valence_shifters[y==1] for examples.
- An amplifier (intensifier) increases the impact of a polarized word (e.g., "I really like it."). See lexicon::hash_valence_shifters[y==2] for examples.
- A de-amplifier (downtoner) reduces the impact of a polarized word (e.g., "I hardly like it."). See lexicon::hash_valence_shifters[y==3] for examples.
- An adversative conjunction overrules the previous clause containing a polarized word (e.g., "I like it but it's not worth it."). See lexicon::hash_valence_shifters[y==4] for examples.
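The valence shifter key referenced above can be inspected directly; a small sketch, assuming the lexicon package is installed:

```r
library(lexicon)
library(data.table)

# hash_valence_shifters is a data.table with columns x (term) and y (type):
# 1 = negator, 2 = amplifier, 3 = de-amplifier, 4 = adversative conjunction
head(hash_valence_shifters[y == 1])  # negators such as "not"
head(hash_valence_shifters[y == 4])  # adversative conjunctions such as "but"
```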

Do valence shifters really matter?

Valence shifters modify the polarized words they co-occur with. In the case of negators and adversative conjunctions, the entire sentiment of the clause may be reversed or overruled. So if valence shifters occur fairly frequently, a simple dictionary lookup may not model the sentiment appropriately. You may be wondering how frequently these valence shifters co-occur with polarized words, potentially changing, or even reversing and overruling, the clause's sentiment. The table below shows the rate of sentence-level co-occurrence of valence shifters with polarized words across a few types of texts.

<table> <thead> <tr class="header"> <th align="left">Text</th> <th align="right">Negator</th> <th align="right">Amplifier</th> <th align="right">Deamplifier</th> <th align="right">Adversative</th> </tr> </thead> <tbody> <tr class="odd"> <td align="left">Cannon reviews</td> <td align="right">21%</td> <td align="right">23%</td> <td align="right">8%</td> <td align="right">12%</td> </tr> <tr class="even"> <td align="left">2012 presidential debate</td> <td align="right">23%</td> <td align="right">18%</td> <td align="right">1%</td> <td align="right">11%</td> </tr> <tr class="odd"> <td align="left">Trump speeches</td> <td align="right">12%</td> <td align="right">14%</td> <td align="right">3%</td> <td align="right">10%</td> </tr> <tr class="even"> <td align="left">Trump tweets</td> <td align="right">19%</td> <td align="right">18%</td> <td align="right">4%</td> <td align="right">4%</td> </tr> <tr class="odd"> <td align="left">Dylan songs</td> <td align="right">4%</td> <td align="right">10%</td> <td align="right">0%</td> <td align="right">4%</td> </tr> <tr class="even"> <td align="left">Austen books</td> <td align="right">21%</td> <td align="right">18%</td> <td align="right">6%</td> <td align="right">11%</td> </tr> <tr class="odd"> <td align="left">Hamlet</td> <td align="right">26%</td> <td align="right">17%</td> <td align="right">2%</td> <td align="right">16%</td> </tr> </tbody> </table>

Indeed, negators appear in roughly 20% of the sentences in which a polarized word appears, and adversative conjunctions appear with polarized words roughly 10% of the time. Not accounting for these valence shifters could significantly impact the modeling of the text's sentiment.

The script to replicate the frequency analysis, shown in the table above, can be accessed via:

# Locate the bundled analysis script and copy it to the working directory
val_shift_freq <- system.file("the_case_for_sentimentr/valence_shifter_cooccurrence_rate.R", package = "sentimentr")
file.copy(val_shift_freq, getwd())

Functions

sentimentr has two main functions (the top two rows below) and several helper functions, summarized in the following table:

<table style="width:100%;"> <colgroup> <col width="25%" /> <col width="74%" /> </colgroup> <thead> <tr class="header"> <th>Function</th> <th>Description</th> </tr> </thead> <tbody> <tr class="odd"> <td><code>sentiment</code></td> <td>Sentiment at the sentence level</td> </tr> <tr class="even"> <td><code>sentiment_by</code></td> <td>Aggregated sentiment by group(s)</td> </tr> <tr class="odd"> <td><code>profanity</code></td> <td>Profanity at the sentence level</td> </tr> <tr class="even"> <td><code>profanity_by</code></td> <td>Aggregated profanity by group(s)</td> </tr> <tr class="odd"> <td><code>emotion</code></td> <td>Emotion at the sentence level</td> </tr> <tr class="even"> <td><code>emotion_by</code></td> <td>Aggregated emotion by group(s)</td> </tr> <tr class="odd"> <td><code>uncombine</code></td> <td>Extract sentence level sentiment from <code>sentiment_by</code></td> </tr> <tr class="even"> <td><code>get_sentences</code></td> <td>Regex based string to sentence parser (or get sentences from <code>sentiment</code>/<code>sentiment_by</code>)</td> </tr> <tr class="odd"> <td><code>replace_emoji</code></td> <td>Replace emojis with word equivalent</td> </tr> <tr class="even"> <td><code>replace_emoticon</code></td> <td>Replace emoticons with word equivalent</td> </tr> <tr class="odd"> <td><code>replace_grade</code></td> <td>Replace grades (e.g., &quot;A+&quot;) with word equivalent</td> </tr> <tr class="even"> <td><code>replace_internet_slang</code></td> <td>Replace internet slang with word equivalent</td> </tr> <tr class="odd"> <td><code>replace_rating</code></td> <td>Replace ratings (e.g., &quot;10 out of 10&quot;, &quot;3 stars&quot;) with word equivalent</td> </tr> <tr class="even"> <td><code>as_key</code></td> <td>Coerce a <code>data.frame</code> lexicon to a polarity hash key</td> </tr> <tr class="odd"> <td><code>is_key</code></td> <td>Check if an object is a hash key</td> </tr> <tr class="even"> <td><code>update_key</code></td> <td>Add/remove terms to/from a hash key</td> </tr> <tr class="odd"> 
<td><code>highlight</code></td> <td>Highlight positive/negative sentences as an HTML document</td> </tr> <tr class="even"> <td><code>general_rescale</code></td> <td>Generalized rescaling function to rescale sentiment scoring</td> </tr> <tr class="odd"> <td><code>sentiment_attribute</code></td> <td>Extract the sentiment based attributes from a text</td> </tr> <tr class="even"> <td><code>validate_sentiment</code></td> <td>Validate sentiment score sign against known results</td> </tr> </tbody> </table>
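A hedged sketch of how the two main functions fit together, assuming sentimentr is installed (the txt vector here is purely illustrative):

```r
library(sentimentr)

txt <- c(
  "I love this product. It is not bad at all.",
  "The plot was terrible but the acting was great."
)

# Split once, then reuse: sentiment() scores each sentence, while
# sentiment_by() averages those scores back to the original elements.
sentences <- get_sentences(txt)
sentiment(sentences)     # one row per sentence
sentiment_by(sentences)  # one row per element of txt
```

Splitting with get_sentences first avoids re-parsing the text when several of the functions in the table are applied to the same data.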

The Equation

The equation below describes the augmented dictionary method of sentimentr, which may give better results than a simple dictionary lookup that does not consider valence shifters. The equation used by the algorithm to assign a polarity value to each sentence first utilizes a sentiment dictionary (e.g., Jockers (2017)) to tag polarized words.
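The equation itself appears to have been lost in extraction. As a rough, hedged reconstruction from the sentimentr documentation (the notation below is an assumption, not a quotation of this README), the sentence-level score divides the summed, valence-shifted polarities of the polarized-word context clusters by the square root of the sentence's word count:

$$
\delta = \frac{\sum_i x_i^{T}}{\sqrt{n}}
$$

where \(x_i^T\) is the weighted polarity of the \(i\)-th context cluster after applying the negator, amplifier, de-amplifier, and adversative-conjunction weights, and \(n\) is the number of words in the sentence. The \(\sqrt{n}\) denominator keeps scores comparable across sentences of different lengths.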
