Sentimentr
Dictionary based sentiment analysis that considers valence shifters
sentimentr

sentimentr is designed to quickly calculate text polarity sentiment in the English language at the sentence level and optionally aggregate by rows or grouping variable(s).
sentimentr is a response to my own needs with sentiment detection that were not addressed by the current R tools. My own polarity function in the qdap package is slower on larger data sets. It is a dictionary lookup approach that tries to incorporate weighting for valence shifters (negation and amplifiers/deamplifiers). Matthew Jockers created the syuzhet package that utilizes dictionary lookups for the Bing, NRC, and Afinn methods as well as a custom dictionary. He also utilizes a wrapper for the Stanford coreNLP which uses much more sophisticated analysis. Jockers's dictionary methods are fast but are more prone to error in the case of valence shifters. Jockers addressed these critiques, explaining that the method is good with regard to analyzing general sentiment in a piece of literature. He points to the accuracy of the Stanford detection as well. In my own work I need better accuracy than a simple dictionary lookup; something that considers valence shifters yet optimizes speed, which the Stanford parser does not. This leads to a trade-off of speed vs. accuracy. Simply, sentimentr attempts to balance accuracy and speed.
Why sentimentr
So what does sentimentr do that other packages don't and why does it matter?
sentimentr attempts to take into account valence shifters (i.e., negators, amplifiers (intensifiers), de-amplifiers (downtoners), and adversative conjunctions) while maintaining speed. Simply put, sentimentr is an augmented dictionary lookup. The next questions address why it matters.
So what are these valence shifters?
A negator flips the sign of a polarized word (e.g., "I do not like it."). See lexicon::hash_valence_shifters[y==1] for examples. An amplifier (intensifier) increases the impact of a polarized word (e.g., "I really like it."). See lexicon::hash_valence_shifters[y==2] for examples. A de-amplifier (downtoner) reduces the impact of a polarized word (e.g., "I hardly like it."). See lexicon::hash_valence_shifters[y==3] for examples. An adversative conjunction overrules the previous clause containing a polarized word (e.g., "I like it but it's not worth it."). See lexicon::hash_valence_shifters[y==4] for examples.
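To make the four categories concrete, here is a minimal base R sketch that labels the words of a sentence by shifter type. The `toy_shifters` table, `shifter_labels`, and `tag_shifters()` are hypothetical names made up for this illustration; only the 1 to 4 coding is taken from the `y` values referenced above, and the real word list lives in `lexicon::hash_valence_shifters`.

```r
# Toy valence-shifter table (hypothetical entries for illustration only).
# The y codes mirror the ones referenced above:
# 1 = negator, 2 = amplifier, 3 = de-amplifier, 4 = adversative conjunction.
toy_shifters <- data.frame(
  x = c("not", "never", "really", "very", "hardly", "barely", "but", "however"),
  y = c(1, 1, 2, 2, 3, 3, 4, 4),
  stringsAsFactors = FALSE
)

shifter_labels <- c("negator", "amplifier", "de-amplifier",
                    "adversative conjunction")

# Label each word of a sentence with its shifter type (NA if not a shifter).
tag_shifters <- function(sentence, shifters = toy_shifters) {
  # Drop punctuation (keeping apostrophes), lower-case, split on whitespace.
  cleaned <- tolower(gsub("[^[:alpha:]' ]", "", sentence))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  codes <- shifters$y[match(words, shifters$x)]
  data.frame(word = words, type = shifter_labels[codes],
             stringsAsFactors = FALSE)
}

tag_shifters("I really like it but it's not worth it.")
```

In this example sentence, "really" is tagged as an amplifier, "but" as an adversative conjunction, and "not" as a negator, while every other word gets `NA`.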
Do valence shifters really matter?
Well, valence shifters affect the polarized words. In the case of negators and adversative conjunctions the entire sentiment of the clause may be reversed or overruled. So if valence shifters occur fairly frequently a simple dictionary lookup may not be modeling the sentiment appropriately. You may be wondering how frequently these valence shifters co-occur with polarized words, potentially changing, or even reversing and overruling the clause's sentiment. The table below shows the rate of sentence level co-occurrence of valence shifters with polarized words across a few types of texts.
<table>
<thead>
<tr class="header"> <th align="left">Text</th> <th align="right">Negator</th> <th align="right">Amplifier</th> <th align="right">Deamplifier</th> <th align="right">Adversative</th> </tr>
</thead>
<tbody>
<tr class="odd"> <td align="left">Cannon reviews</td> <td align="right">21%</td> <td align="right">23%</td> <td align="right">8%</td> <td align="right">12%</td> </tr>
<tr class="even"> <td align="left">2012 presidential debate</td> <td align="right">23%</td> <td align="right">18%</td> <td align="right">1%</td> <td align="right">11%</td> </tr>
<tr class="odd"> <td align="left">Trump speeches</td> <td align="right">12%</td> <td align="right">14%</td> <td align="right">3%</td> <td align="right">10%</td> </tr>
<tr class="even"> <td align="left">Trump tweets</td> <td align="right">19%</td> <td align="right">18%</td> <td align="right">4%</td> <td align="right">4%</td> </tr>
<tr class="odd"> <td align="left">Dylan songs</td> <td align="right">4%</td> <td align="right">10%</td> <td align="right">0%</td> <td align="right">4%</td> </tr>
<tr class="even"> <td align="left">Austen books</td> <td align="right">21%</td> <td align="right">18%</td> <td align="right">6%</td> <td align="right">11%</td> </tr>
<tr class="odd"> <td align="left">Hamlet</td> <td align="right">26%</td> <td align="right">17%</td> <td align="right">2%</td> <td align="right">16%</td> </tr>
</tbody>
</table>
Indeed, negators appear in roughly 20% of the sentences that contain a polarized word. By comparison, adversative conjunctions appear with polarized words ~10% of the time. Not accounting for the valence shifters could significantly impact the modeling of the text sentiment.
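As a rough illustration of what a sentence-level co-occurrence rate means, the base R sketch below computes the share of polarized sentences that also contain a negator. The word lists and sentences are invented for the example, and the denominator (sentences containing a polarized word) is one reading of the description above; the script shipped with the package remains the reference for how the table was actually produced.

```r
# Hypothetical mini word lists and sentences, for illustration only.
polarized <- c("like", "love", "good", "bad", "hate", "worth")
negators  <- c("not", "never", "no")

sentences <- c(
  "I do not like it.",
  "I love this camera.",
  "The lens is never good in low light.",
  "It arrived on Tuesday.",
  "This is a bad idea."
)

# Split each sentence into lower-case words, dropping empty strings.
tokens <- lapply(strsplit(tolower(sentences), "[^a-z']+"),
                 function(w) w[nzchar(w)])

# Flag sentences containing a polarized word and sentences containing a negator.
has_polarized <- vapply(tokens, function(w) any(w %in% polarized), logical(1))
has_negator   <- vapply(tokens, function(w) any(w %in% negators), logical(1))

# Share of polarized sentences that also contain a negator.
negator_rate <- sum(has_polarized & has_negator) / sum(has_polarized)
negator_rate
```

Here four of the five sentences contain a polarized word and two of those also contain a negator, so `negator_rate` is 0.5.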
The script to replicate the frequency analysis, shown in the table above, can be accessed via:
val_shift_freq <- system.file("the_case_for_sentimentr/valence_shifter_cooccurrence_rate.R", package = "sentimentr")
file.copy(val_shift_freq, getwd())
Functions
There are two main functions (top 2 in table below) in sentimentr with several helper functions summarized in the table below:
<table style="width:100%;">
<colgroup> <col width="25%" /> <col width="74%" /> </colgroup>
<thead>
<tr class="header"> <th>Function</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr class="odd"> <td><code>sentiment</code></td> <td>Sentiment at the sentence level</td> </tr>
<tr class="even"> <td><code>sentiment_by</code></td> <td>Aggregated sentiment by group(s)</td> </tr>
<tr class="odd"> <td><code>profanity</code></td> <td>Profanity at the sentence level</td> </tr>
<tr class="even"> <td><code>profanity_by</code></td> <td>Aggregated profanity by group(s)</td> </tr>
<tr class="odd"> <td><code>emotion</code></td> <td>Emotion at the sentence level</td> </tr>
<tr class="even"> <td><code>emotion_by</code></td> <td>Aggregated emotion by group(s)</td> </tr>
<tr class="odd"> <td><code>uncombine</code></td> <td>Extract sentence level sentiment from <code>sentiment_by</code></td> </tr>
<tr class="even"> <td><code>get_sentences</code></td> <td>Regex based string to sentence parser (or get sentences from <code>sentiment</code>/<code>sentiment_by</code>)</td> </tr>
<tr class="odd"> <td><code>replace_emoji</code></td> <td>Replace emoji with word equivalent</td> </tr>
<tr class="even"> <td><code>replace_emoticon</code></td> <td>Replace emoticons with word equivalent</td> </tr>
<tr class="odd"> <td><code>replace_grade</code></td> <td>Replace grades (e.g., "A+") with word equivalent</td> </tr>
<tr class="even"> <td><code>replace_internet_slang</code></td> <td>Replace internet slang with word equivalent</td> </tr>
<tr class="odd"> <td><code>replace_rating</code></td> <td>Replace ratings (e.g., "10 out of 10", "3 stars") with word equivalent</td> </tr>
<tr class="even"> <td><code>as_key</code></td> <td>Coerce a <code>data.frame</code> lexicon to a polarity hash key</td> </tr>
<tr class="odd"> <td><code>is_key</code></td> <td>Check if an object is a hash key</td> </tr>
<tr class="even"> <td><code>update_key</code></td> <td>Add/remove terms to/from a hash key</td> </tr>
<tr class="odd"> <td><code>highlight</code></td> <td>Highlight positive/negative sentences as an HTML document</td> </tr>
<tr class="even"> <td><code>general_rescale</code></td> <td>Generalized rescaling function to rescale sentiment scoring</td> </tr>
<tr class="odd"> <td><code>sentiment_attribute</code></td> <td>Extract the sentiment based attributes from a text</td> </tr>
<tr class="even"> <td><code>validate_sentiment</code></td> <td>Validate sentiment score sign against known results</td> </tr>
</tbody>
</table>
The Equation
The equation below describes the augmented dictionary method of sentimentr that may give better results than a simple lookup dictionary approach that does not consider valence shifters. The equation used by the algorithm to assign value to polarity of each sentence first utilizes a sentiment dictionary (e.g., Jockers, (2017)).
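To show why an augmented lookup can differ from a simple one, the base R sketch below contrasts the two on the example sentences used earlier. It is a deliberately simplified toy, not the sentimentr equation: the dictionary, the word lists, the function names, and the 0.8 weight are all assumptions made for this illustration, and only the word immediately before a polarized word is inspected.

```r
# Hypothetical polarity dictionary and shifter lists (not the package's data).
polarity     <- c(like = 1, good = 1, bad = -1, hate = -1)
negators     <- c("not", "never")
amplifiers   <- c("really", "very")
deamplifiers <- c("hardly", "barely")

# Lower-case a sentence and split it into words.
tokenize <- function(sentence) {
  w <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  w[nzchar(w)]
}

# Baseline: sum the polarity of every dictionary word, ignoring context.
simple_score <- function(sentence) {
  w <- tokenize(sentence)
  sum(polarity[w], na.rm = TRUE)
}

# Augmented: the word immediately before a polarized word shifts its value.
# The default weight of 0.8 is an arbitrary choice for this sketch.
augmented_score <- function(sentence, weight = 0.8) {
  w <- tokenize(sentence)
  total <- 0
  for (i in seq_along(w)) {
    value <- unname(polarity[w[i]])
    if (is.na(value)) next                      # not a polarized word
    previous <- if (i > 1) w[i - 1] else ""
    if (previous %in% negators) {
      value <- -value                           # negator flips the sign
    } else if (previous %in% amplifiers) {
      value <- value * (1 + weight)             # amplifier increases impact
    } else if (previous %in% deamplifiers) {
      value <- value * (1 - weight)             # de-amplifier reduces impact
    }
    total <- total + value
  }
  total
}

simple_score("I do not like it.")
augmented_score("I do not like it.")
augmented_score("I really like it.")
```

The simple lookup scores "I do not like it." as positive because it only sees "like", while the augmented version flips the sign once the preceding negator is taken into account.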
