Rahil Arshad
2002422
M.Tech. CS I Yr
Sentiment analysis has many other names:
Opinion extraction
Opinion mining
Sentiment mining
Subjectivity analysis
Scherer Typology of Affective States
Emotion: brief, organically synchronized … evaluation of a major event
angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse, non-caused, low-intensity, long-duration change in subjective feeling
cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific interaction
friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior tendencies
nervous, anxious, reckless, morose, hostile, jealous
Sentiment Analysis
Sentiment analysis is the detection of attitudes
“enduring, affectively colored beliefs, dispositions
towards objects or persons”
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
○ From a set of types
Like, love, hate, value, desire, etc.
○ Or (more commonly) simple weighted polarity:
positive, negative, neutral, together with strength
4. Text containing the attitude
○ Sentence or entire document
Sentiment Analysis
Simplest task:
Is the attitude of this text positive or
negative?
More complex:
Rank the attitude of this text from 1 to 5
Advanced:
Detect the target, source, or complex
attitude types
Sentiment Analysis: A Baseline Algorithm
Sentiment Classification in Twitter
Polarity detection:
Is a tweet hateful (negative) or not?
Data: Twitter Sentiment Analysis (hatred speech) dataset:
https://www.kaggle.com/arkhoshghalb/twitter-sentiment-analysis-hatred-speech
Baseline Algorithm
Tokenization
Feature Extraction
Classification using different classifiers
Naïve Bayes
MaxEnt
SVM
Sentiment Tokenization Issues
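Emoticons, #hashtags, @-mentions, and URLs all break a plain word tokenizer. A minimal regex-based sketch of tweet-aware tokenization (the patterns below are illustrative assumptions, not a complete tweet tokenizer):

```python
import re

# Order matters: emoticons and Twitter-specific tokens are tried
# before plain words so they survive as single tokens.
TOKEN_RE = re.compile(r"""
    [<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\]   # emoticons like :-) ;D :(
  | \#\w+                                   # hashtags
  | @\w+                                    # @-mentions
  | https?://\S+                            # URLs
  | \w+(?:'\w+)?                            # words, incl. contractions
""", re.VERBOSE)

def tweet_tokenize(text):
    """Lowercase a tweet and tokenize it, keeping hashtags,
    mentions, and emoticons intact."""
    return TOKEN_RE.findall(text.lower())
```

A library tokenizer (e.g. NLTK's TweetTokenizer) covers far more cases; this sketch only shows why ordinary word splitting is not enough.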
Extracting Features for Sentiment
Classification
How to handle negation
I didn’t like this movie
vs
I really like this movie
Which words to use?
Only adjectives
All words
○ All words turns out to work better, at least on this data
Negation
Add NOT_ to every word between a negation word and the following punctuation.
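A minimal sketch of this transformation (the negation-word list is an illustrative assumption):

```python
import re

# Illustrative, not exhaustive: any token ending in n't also negates.
NEGATORS = {"not", "no", "never"}

def mark_negation(text):
    """Prefix NOT_ to every token between a negation word and the
    next punctuation mark (. , ! ? ;)."""
    out, negating = [], False
    for tok in re.findall(r"[\w']+|[.,!?;]", text.lower()):
        if tok in ".,!?;":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok in NEGATORS or tok.endswith("n't"):
                negating = True
    return out
```

This turns "didn't like this movie, but" into didn't NOT_like NOT_this NOT_movie , but, so the negated words become distinct features for the classifier.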
Naïve Bayes
c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)

\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}
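The two formulas above translate directly into code. A minimal from-scratch sketch with add-1 (Laplace) smoothing; the function names and the toy corpus are assumptions for illustration:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns log-priors, a
    smoothed log-likelihood function, and the vocabulary."""
    labels = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in labels}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for c in word_counts for w in word_counts[c]}
    V = len(vocab)
    log_prior = {c: math.log(n / len(docs)) for c, n in labels.items()}

    def log_likelihood(w, c):
        # add-1 estimate: (count(w,c) + 1) / (count(c) + |V|)
        return math.log((word_counts[c][w] + 1) /
                        (sum(word_counts[c].values()) + V))
    return log_prior, log_likelihood, vocab

def classify(tokens, log_prior, log_likelihood, vocab):
    """c_NB = argmax_c [ log P(c) + sum_i log P(w_i | c) ]."""
    scores = {c: lp + sum(log_likelihood(w, c) for w in tokens if w in vocab)
              for c, lp in log_prior.items()}
    return max(scores, key=scores.get)
```

Working in log space avoids underflow from multiplying many small probabilities, and unseen words are simply skipped at classification time.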
Naive Bayes classifiers are a family of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms that share a common principle: every pair of features being classified is independent of each other.
Assumptions in Naïve Bayes
The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.
Bayes’ Theorem finds the probability of an
event occurring given the probability of
another event that has already occurred.
Bayes’ theorem is stated mathematically as
the following equation:
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
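A worked example with made-up numbers shows the theorem in action: suppose 1% of tweets are hateful (event A), and the word "hate" (event B) appears in 30% of hateful tweets and 2% of the rest:

```python
# Hypothetical numbers, purely for illustration.
p_a = 0.01            # P(A): prior probability a tweet is hateful
p_b_given_a = 0.30    # P(B|A): "hate" appears given the tweet is hateful
p_b_given_not_a = 0.02

# P(B) by total probability, then Bayes' theorem for the posterior.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b   # ≈ 0.13
```

Even though "hate" is 15 times more likely in hateful tweets, the low prior keeps the posterior around 13%, which is why the prior term matters in the classifier.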
Basically, we are trying to find the probability of event A given that event B is true. Event B is also termed the evidence.
P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, event B).
P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
Now, with regard to our dataset, we can apply Bayes' theorem in the following way:
P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}
where y is the class variable and X is a dependent feature vector (of size n):

X = (x_1, x_2, x_3, \ldots, x_n)
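Combining this with the independence assumption above, the numerator factorizes into per-feature terms, giving the standard Naive Bayes decision rule (a standard derivation, stated here for completeness):

```latex
P(y \mid x_1, \ldots, x_n) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}
\qquad
\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator is constant across classes, so it can be dropped when taking the argmax.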
Tasks to Perform
Task #1: Perform Exploratory Data Analysis
Task #2: Perform data cleaning - removing
punctuation
Task #3: Perform data cleaning - remove stop
words
Task #4: Perform Count Vectorization (Tokenization)
Task #5: Create a pipeline to remove stop-words,
punctuation, and perform tokenization
Task #6: Train a Naive Bayes Classifier
Task #7: Assess trained model performance
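Tasks #2–#4 can be sketched with the standard library alone (the stop-word list below is a tiny illustrative assumption; in practice one would use a full list and scikit-learn's CountVectorizer inside a Pipeline for Tasks #4–#5):

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "i", "this", "and", "to", "of"}

def clean(text):
    """Tasks #2-#3: strip punctuation, lowercase, drop stop words."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def count_vectorize(docs):
    """Task #4: map each token list to a bag-of-words count vector
    over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d})
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors
```

The resulting count vectors are exactly the input a Naive Bayes classifier expects for Tasks #6–#7.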
Cross-Validation
Break up data into 10 folds
(Equal positive and negative inside each fold?)
For each fold:
Choose the fold as a temporary test set
Train on the other 9 folds, compute performance on the test fold
Report average performance of the 10 runs
[Figure: five iterations shown, each with a different fold held out as the test set]
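The procedure above can be sketched in a few lines (score_fn is a hypothetical stand-in for training a classifier on the training folds and computing its performance on the test fold; stratifying folds by label, as the slide suggests, is omitted here):

```python
def cross_validate(data, score_fn, k=10):
    """k-fold cross-validation: each fold serves once as the temporary
    test set; train on the other k-1 folds; report the average of the
    k performance scores."""
    folds = [data[i::k] for i in range(k)]   # round-robin split into k folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(score_fn(train, test))
    return sum(scores) / k
```

Every example appears in exactly one test fold, so the averaged score uses all the data without ever testing on a training example.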
Summary on Sentiment
Generally modeled as classification or regression
task
predict a binary or ordinal label
Features:
Negation is important
Using all words (in Naïve Bayes) works well for some tasks
Finding subsets of words may help in other tasks
○ Hand-built polarity lexicons
○ Use seeds and semi-supervised learning to induce
lexicons
Computational Work On Other
Affective States
Emotion:
Detecting annoyed callers to a dialogue system
Detecting confused/frustrated versus confident
students
Mood:
Finding traumatized or depressed writers
Interpersonal stances:
Detection of flirtation or friendliness in conversations
Personality traits:
Detection of extroverts
Detection of Friendliness
Friendly speakers use collaborative
conversational style
Laughter
Less use of negative emotional words
More sympathy
○ That's too bad; I'm sorry to hear that
More agreement
○ I think so too
Fewer hedges
○ kind of, sort of, a little …
THANKS !!!