PROJECT-I REPORT
On
SENTIMENT ANALYSIS
Submitted to MAHARAJA RANJIT SINGH PUNJAB TECHNICAL
UNIVERSITY in partial fulfillment of the requirement for the award of the
degree of
B. TECH
In
ACKNOWLEDGEMENT
We would like to express our special gratitude to our project guide, Dr. Swati Jindal,
who gave us the golden opportunity to do this wonderful project on Sentiment Analysis, which also
helped us do a lot of research and come to know about so many new things
about analysis. We are really thankful to Dr. Jindal.
We also express our sincere gratitude to Dr. Dinesh Kumar, our worthy HOD, and to Er. Naresh Garg and Er.
Manpreet Kaur, Training & Placement In-charge, for providing us the opportunity to undertake
Project-I.
CANDIDATE’S DECLARATION
Place: BATHINDA
Date:
CONTENTS
1. ABSTRACT
2. INTRODUCTION
5. INCLUDING TECHNOLOGY
7. LIMITATION OF PROJECT
8. CONCLUSION
ABSTRACT
Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions,
sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active
research areas in natural language processing and is also widely studied in data mining, Web
mining, and text mining. In fact, this research has spread outside of computer science to the
management sciences and social sciences due to its importance to business and society as a
whole. The growing importance of sentiment analysis coincides with the growth of social media
such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first
time in human history, we now have a huge volume of opinionated data recorded in digital form
for analysis. Sentiment analysis systems are being applied in almost every business and social
domain because opinions are central to almost all human activities and are key influencers of our
behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely
conditioned on how others see and evaluate the world. For this reason, when we need to make a
decision, we often seek out the opinions of others.
1.1 Introduction
Natural language processing (NLP) is an area of computer science and artificial
intelligence concerned with the interaction between computers and humans in natural
language. The ultimate goal of NLP is to help computers understand language as well as we
do. It is the driving force behind things like virtual assistants, speech recognition, sentiment
analysis, automatic text summarization, machine translation and much more. In this report,
we cover the basics of natural language processing, look at some of its techniques, and
see how NLP has benefited from recent advances in deep learning.
Natural language processing (NLP) is the intersection of computer science, linguistics
and machine learning. The field focuses on communication between computers and humans
in natural language and NLP is all about making computers understand and generate human
language. Applications of NLP techniques include voice assistants like Amazon's Alexa and
Apple's Siri, but also things like machine translation and text-filtering.
NLP has heavily benefited from recent advances in machine learning, especially from deep
learning techniques. The field is divided into three parts:
Problem of Project
WHY NLP IS DIFFICULT
NLP is a subset of computer science and machine learning that attempts to derive meaning
from textual data and can help with problems such as sentiment analysis and chatbot
creation.
Human language is special for several reasons. It is specifically constructed to convey the
speaker/writer's meaning. It is a complex system, although little children can learn it pretty
quickly.
Another remarkable thing about human language is that it is all about symbols. According to
Chris Manning, a machine learning professor at Stanford, it is a discrete, symbolic, categorical
signaling system. This means we can convey the same meaning in different ways (e.g., speech,
gesture, or signs). The encoding by the human brain is a continuous pattern of activation by
which the symbols are transmitted via continuous signals of sound and vision.
Understanding human language is considered a difficult task due to its complexity. For
example, there is an infinite number of different ways to arrange words in a sentence. Also,
words can have several meanings and contextual information is necessary to correctly
interpret sentences. Every language is more or less unique and ambiguous. Just take a look at
the following newspaper headline "The Pope’s baby steps on gays." This sentence clearly has
two very different interpretations, which is a pretty good example of the challenges in NLP.
Note that a perfect understanding of language by a computer would result in an AI that can
process all the information that is available on the internet, which in turn would probably
result in artificial general intelligence.
Sentiment analysis is one of the hardest tasks in natural language processing because
even humans struggle to analyze sentiments accurately.
Data scientists are getting better at creating more accurate sentiment classifiers, but
there’s still a long way to go. Let’s take a closer look at some of the main challenges
of machine-based sentiment analysis:
Subjectivity and Tone
There are two types of text: subjective and objective. Objective texts do not contain
explicit sentiments, whereas subjective texts do. Say, for example, you intend to
analyze the sentiment of the following two texts:
Most people would say that sentiment is positive for the first one and neutral for the
second one, right? Not all predicates (adjectives, verbs, and some nouns) should be
treated the same with respect to how they create sentiment. In the examples
above, nice is more subjective than red.
Every utterance is produced at some point in time, in some place, by and to some people;
in other words, all utterances are produced in context. Analyzing sentiment without
context gets pretty difficult. However, machines cannot learn about contexts if they are
not mentioned explicitly. One of the problems that arise from context is changes
in polarity. Look at the following responses to a survey:
Everything of it.
Absolutely nothing!
Imagine the responses above come from answers to the question What did you like
about the event? The first response would be positive and the second one would be
negative, right? Now, imagine the responses come from answers to the question What
did you DISlike about the event? The negation in the question reverses the sentiment
of both answers.
A good deal of preprocessing or postprocessing will be needed if we are to take into
account at least part of the context in which texts were produced. However, how to
preprocess or postprocess data in order to capture the bits of context that will help
analyze sentiment is not straightforward.
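As a minimal sketch of such postprocessing (not a general solution), the Python snippet below scores each survey answer on its own and then flips the score when the question asks what the respondent disliked. The quantifier lexicon, the "dislike" cue and the function names are hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical lexicon: how much of the event an answer endorses,
# independent of the question being asked.
QUANTIFIERS = {"everything": 1, "all": 1, "nothing": -1, "none": -1}


def answer_score(answer: str) -> int:
    """Return +1/-1 for the first quantifier found, or 0 (neutral) if none."""
    for token in re.findall(r"[a-z]+", answer.lower()):
        if token in QUANTIFIERS:
            return QUANTIFIERS[token]
    return 0


def contextual_sentiment(question: str, answer: str) -> int:
    """Postprocess the answer score using the question as context."""
    score = answer_score(answer)
    # Hypothetical cue: a question about what was *disliked* flips polarity.
    if "dislike" in question.lower():
        score = -score
    return score


like_q = "What did you like about the event?"
dislike_q = "What did you DISlike about the event?"
for answer in ["Everything of it.", "Absolutely nothing!"]:
    print(answer, contextual_sentiment(like_q, answer),
          contextual_sentiment(dislike_q, answer))
```

The same two answers get opposite scores depending on the question, which is exactly the polarity change described above.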
Irony and Sarcasm
When it comes to irony and sarcasm, people express their negative sentiments using
positive words, which can be difficult for machines to detect without having a thorough
understanding of the context of the situation in which a feeling was expressed.
For example, look at some possible answers to the question, Did you enjoy your
shopping experience with us?
What sentiment would you assign to the responses above? The first response with an
exclamation mark could be negative, right? The problem is there is no textual cue that
will help a machine learn, or at least question that sentiment since yeah and sure often
belong to positive or neutral texts.
How about the second response? In this context, sentiment is positive, but we’re sure
you can come up with many different contexts in which the same response can
express negative sentiment.
The way we understand what someone has said is an unconscious process relying on
our intuition and knowledge about language itself. In other words, the way we
understand language is heavily based on meaning and context. Computers need a
different approach, however. The word “semantic” is a linguistic term and means
"related to meaning or logic."
Speech recognition, for example, has gotten very good and works almost flawlessly,
but we still lack this kind of proficiency in natural language understanding. Your phone
basically understands what you have said, but often can’t do anything with it because
it doesn’t understand the meaning behind it. Also, some of the technologies out there
only make you think they understand the meaning of a text. An approach based on
keywords or statistics or even pure machine learning may be using a matching or
frequency technique for clues as to what the text is “about.” These methods are limited
because they are not looking at the real underlying meaning.
Emojis
There are two types of emojis according to Guibon et al. Western emojis (e.g. :D) are
encoded in only one or two characters, whereas Eastern emojis (e.g. ¯\_(ツ)_/¯) are
a longer combination of characters of a vertical nature. Emojis play an important role
in the sentiment of texts, particularly in tweets.
A comprehensive list of emojis and their Unicode characters can come in handy
when preprocessing.
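One common preprocessing step is to replace each emoji with a placeholder token that a classifier can treat like an ordinary word. The sketch below assumes a small hand-made mapping; the EMO_POS, EMO_NEG and EMO_SHRUG tokens are invented names, and a real list would be much longer. Longer patterns are tried first so that a multi-character Eastern emoji is never partly consumed by a shorter pattern.

```python
import re

# Illustrative mapping only (assumption); real systems use far larger lists.
EMOTICON_TOKENS = {
    ":D": " EMO_POS ",
    ":)": " EMO_POS ",
    ":(": " EMO_NEG ",
    "\U0001F600": " EMO_POS ",      # grinning face emoji
    "¯\\_(ツ)_/¯": " EMO_SHRUG ",   # Eastern "shrug" emoji
}

# Try longer patterns first so a long emoji is not partly matched
# by a shorter one; re.escape makes the symbols literal.
_PATTERN = re.compile("|".join(
    re.escape(e) for e in sorted(EMOTICON_TOKENS, key=len, reverse=True)))


def replace_emoticons(text: str) -> str:
    """Swap every known emoji for its token and tidy the whitespace."""
    replaced = _PATTERN.sub(lambda m: EMOTICON_TOKENS[m.group(0)], text)
    return " ".join(replaced.split())


print(replace_emoticons("Great service :D"))
print(replace_emoticons("No idea ¯\\_(ツ)_/¯"))
```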
Defining Neutral
Here are some ideas to help you identify and define neutral texts:
Sentiment analysis is a tremendously difficult task even for humans. On average,
inter-annotator agreement (a measure of how well two or more human labelers can make
the same annotation decision) is pretty low when it comes to sentiment analysis. And
since machines learn from the data they are fed, sentiment analysis classifiers might
not be as precise as other types of classifiers.
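Inter-annotator agreement is commonly measured with Cohen's kappa, which compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A value of 1 means perfect agreement and 0 means no better than chance. The plain-Python sketch below uses made-up labels for six texts.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both used one identical label everywhere
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for the same six texts.
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neu", "neg", "neu", "pos", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.478
```

Here the annotators agree on four of six texts (p_o = 24/36), chance agreement is p_e = 13/36, so kappa = 11/23, roughly 0.48.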
Still, sentiment analysis is worth the effort, even if your sentiment analysis predictions
are wrong from time to time. By using MonkeyLearn’s sentiment analysis model, you
can expect correct predictions about 70-80% of the time you submit your texts for
classification.
If you are new to sentiment analysis, then you’ll quickly notice improvements. For
typical use cases, such as ticket routing, brand monitoring, and VoC analysis, you’ll
save a lot of time and money on tedious manual tasks.
Let's look at some of the most popular techniques used in natural language processing. Note
how some of them are closely intertwined and only serve as subtasks for solving larger
problems.
PARSING
Parsing refers to the formal analysis of a sentence by a computer into its constituents.
The result is a parse tree that shows their syntactic relations to one another in visual
form and can be used for further processing and understanding.
Below is a parse tree for the sentence "The thief robbed the apartment."
Included is a description of the three different information types conveyed by
the sentence.
The letters directly above the single words show the parts of speech for each word (noun,
verb and determiner). One level higher is some hierarchical grouping of words into phrases.
For example, "the thief" is a noun phrase, "robbed the apartment" is a verb phrase and when
put together the two phrases form a sentence, which is marked one level higher.
But what is actually meant by a noun or verb phrase? Noun phrases are one or more words
that contain a noun and maybe some descriptors, verbs or adverbs. The idea is to group nouns
with words that are in relation to them.
A parse tree also provides us with information about the grammatical relationships of the
words due to the structure of their representation. For example, we can see in the structure
that "the thief" is the subject of "robbed."
By structure, we mean that we have the verb ("robbed"), which is marked with a "V" above it
and a "VP" above that, which is linked with an "S" to the subject ("the thief"), which has an "NP"
above it. This is like a template for a subject-verb relationship, and there are many others for
other types of relationships.
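The tree and the subject-verb template described above can be sketched in Python with nested tuples. This is a hand-built tree, not the output of a real parser, and the tag names (S, NP, VP, Det, N, V) are assumed to match the figure; libraries such as NLTK provide proper tree classes and parsers.

```python
# Hand-built parse tree for "The thief robbed the apartment."
# Each node is (label, child, ...); a leaf is (part_of_speech, word).
TREE = ("S",
        ("NP", ("Det", "The"), ("N", "thief")),
        ("VP", ("V", "robbed"),
               ("NP", ("Det", "the"), ("N", "apartment"))))


def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """The words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def bracketed(node):
    """Render the tree in the usual bracketed notation."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    return f"({node[0]} " + " ".join(bracketed(c) for c in node[1:]) + ")"


def subject_verb(sentence):
    """Apply the S -> NP VP template: the NP is the subject of the VP's verb."""
    label, *children = sentence
    if label != "S":
        raise ValueError("expected a sentence (S) node")
    subject = next(c for c in children if c[0] == "NP")
    verb_phrase = next(c for c in children if c[0] == "VP")
    verb = next(c for c in verb_phrase[1:] if c[0] == "V")
    return " ".join(words(subject)), verb[1]


print(bracketed(TREE))
print(subject_verb(TREE))  # ('The thief', 'robbed')
```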
STEMMING
Stemming is a technique from morphology and information retrieval that is used in NLP
for preprocessing and efficiency purposes. As a verb, "stem" is defined by the dictionary
as "to originate in or be caused by."
Basically, stemming is the process of reducing words to their word stem. A "stem" is
the part of a word that remains after the removal of all affixes. For example, the stem
for the word "touched" is "touch." "Touch" is also the stem of "touching," and so on.
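A deliberately naive suffix-stripping stemmer is enough to show the idea on the "touch" example. It is an illustration only: the suffix list is an assumption, and it will mangle many words that established algorithms such as the Porter stemmer handle correctly.

```python
# Naive suffix stripper for illustration (assumed suffix list, checked in order).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["touched", "touching", "touches", "touch"]])
# ['touch', 'touch', 'touch', 'touch']
```

All four variants collapse to one stem, so a downstream model sees a single feature instead of four.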
You may be asking yourself, why do we even need the stem? Well, the stem is
needed because we're going to encounter different variations of words that actually
have the same stem and the same meaning. For example:
With the use of sentiment analysis, for example, we may want to predict a customer's opinion
and attitude about a product based on a review they wrote. Sentiment analysis is widely
applied to reviews, surveys, documents and much more.
with context. For example, take the phrase “sick burn.” In the context of video games, this
might actually be a positive statement.
Creating a set of NLP rules to account for every possible sentiment score for every possible
word in every possible context would be impossible. But by training a machine learning
model on pre-scored data, it can learn to understand what “sick burn” means in the context
of video gaming, versus in the context of healthcare. Unsurprisingly, each language requires
its own sentiment classification model.
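To make "training a model on pre-scored data" concrete, the sketch below implements a tiny multinomial Naive Bayes classifier in plain Python. Naive Bayes is one simple choice among many (this report does not prescribe a specific model), and the six labelled gaming comments are invented, so the example only shows how "sick" can pick up a positive weight from labelled context.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior P(label)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-scored examples from a video-gaming context.
train = [
    ("sick burn, what a play", "positive"),
    ("that combo was sick", "positive"),
    ("great match, loved it", "positive"),
    ("laggy servers, terrible match", "negative"),
    ("boring map and terrible balance", "negative"),
    ("the update broke everything", "negative"),
]
model = NaiveBayesSentiment().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("what a sick combo"))   # positive
print(model.predict("terrible servers"))    # negative
```

Trained on healthcare texts instead, the same code would learn a very different weight for "sick", which is why the training data must match the domain (and the language).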
On the fateful evening of April 9th, 2017, United Airlines forcibly removed a
passenger from an overbooked flight. The nightmare-ish incident was filmed by other
passengers on their smartphones and posted immediately. One of the videos, posted
to Facebook, was shared more than 87,000 times and viewed 6.8 million times by 6pm
on Monday, just 24 hours later.
The fiasco was only magnified by the company’s dismissive response. On Monday
afternoon, United’s CEO tweeted a statement apologizing for “having to re-
accommodate customers.”
This is exactly the kind of PR catastrophe you can avoid with sentiment analysis. It’s
an example of why it’s important to care not only about whether people are talking about
your brand, but also about how they’re talking about it. More mentions don't equal positive
mentions.
Brands of all shapes and sizes have meaningful interactions with customers, leads,
even their competition, all across social media. By monitoring these conversations you
can understand customer sentiment in real time and over time, so you can detect
disgruntled customers immediately and respond as soon as possible.
Most marketing departments are already tuned into online mentions as far as volume
– they measure more chatter as more brand awareness. But businesses need to look
beyond the numbers for deeper insights.
Brand Monitoring
Not only do brands have a wealth of information available on social media, but also across
the internet, on news sites, blogs, forums, product reviews, and more. Again, we can
look at not just the volume of mentions, but the individual and overall quality of those
mentions.
In our United Airlines example, for instance, the flare-up started on the social media
accounts of just a few passengers. Within hours, it was picked up by news sites and
spread like wildfire across the US, then to China and Vietnam, as United was accused
of racial profiling against a passenger of Chinese-Vietnamese descent. In China, the
incident became the number one trending topic on Weibo, a microblogging site with
almost 500 million users.
And again, this is all happening within mere hours of the incident.
Brand monitoring offers a wealth of insights from conversations happening about your
brand from all over the internet. Analyze news articles, blogs, forums, and more
to gauge brand sentiment, and target certain demographics or regions, as desired.
Automatically categorize the urgency of all brand mentions and route them instantly to
designated team members.
Get an understanding of customer feelings and opinions, beyond mere numbers and
statistics. Understand how your brand image evolves over time, and compare it to that
of your competition. You can tune into a specific point in time to follow product
releases, marketing campaigns, IPO filings, etc., and compare them to past events.
Real-time sentiment analysis allows you to identify potential PR crises and take
immediate action before they become serious issues. Or identify positive comments
and respond directly, to use them to your benefit.
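One simple way to turn per-mention sentiment into an early warning is to track the share of negative mentions per hour and flag sudden jumps. In the sketch below, the labels are assumed to come from a sentiment classifier, and the thresholds (50% negative, a rise of at least 20 percentage points) and sample data are arbitrary illustrative choices.

```python
from collections import defaultdict


def hourly_negative_share(mentions):
    """mentions: (hour, label) pairs; returns {hour: share of negative labels}."""
    totals, negatives = defaultdict(int), defaultdict(int)
    for hour, label in mentions:
        totals[hour] += 1
        if label == "negative":
            negatives[hour] += 1
    return {hour: negatives[hour] / totals[hour] for hour in sorted(totals)}


def flag_spikes(shares, threshold=0.5, min_jump=0.2):
    """Hours whose negative share is high AND rose sharply since the previous hour."""
    flagged, previous = [], None
    for hour, share in shares.items():
        if previous is not None and share >= threshold and share - previous >= min_jump:
            flagged.append(hour)
        previous = share
    return flagged


# Hypothetical classified mentions of a brand, grouped by hour of the day.
mentions = ([(9, "positive")] * 3 + [(9, "negative")]
            + [(10, "positive")] * 2 + [(10, "neutral")] * 2 + [(10, "negative")]
            + [(11, "positive")] + [(11, "negative")] * 4
            + [(12, "neutral")] + [(12, "negative")] * 3)
shares = hourly_negative_share(mentions)
print(shares)
print(flag_spikes(shares))  # [11]
```

Hour 12 is still mostly negative but is not flagged again, because the alert is meant to fire when the situation first turns, not on every bad hour.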
Example: Expedia Canada
Around Christmas time, Expedia Canada ran a classic “escape winter” marketing
campaign. All was well, except for the screeching violin they chose as background
music. Understandably, people took to social media, blogs, and forums. Expedia
noticed right away and removed the ad.
Then, they created a series of follow-up spin-off videos: one showed the original actor
smashing the violin; another invited a real negative Twitter user to rip the violin out of
the actor’s hands on screen. Though their original campaign was a flop, Expedia were
able to redeem themselves by listening to their customers and responding.
Sentiment analysis allows you to automatically monitor all chatter around your brand
and detect and address this type of potentially explosive scenario while you still have
time to defuse it.
Voice of Customer (VoC)
Social media and brand monitoring offer us immediate, unfiltered, and invaluable
information on customer sentiment, but you can also put this analysis to work on
surveys and customer support interactions.
Net Promoter Score (NPS) surveys are one of the most popular ways for businesses
to gain feedback with the simple question: Would you recommend this company,
product, and/or service to a friend or family member? These result in a single score
on a number scale.
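For reference, NPS is conventionally computed from answers on a 0 to 10 scale: respondents answering 9 or 10 are promoters, 0 to 6 are detractors, and the score is the percentage of promoters minus the percentage of detractors, giving a value between -100 and 100. A minimal sketch with made-up ratings:

```python
def net_promoter_score(ratings):
    """ratings: answers on the 0-10 'would you recommend ...?' scale."""
    if not ratings:
        raise ValueError("no ratings")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(r >= 9 for r in ratings)    # 9 or 10
    detractors = sum(r <= 6 for r in ratings)   # 0 to 6
    return 100 * (promoters - detractors) / len(ratings)


print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 5, 9]))  # 20.0
```

With five promoters and three detractors out of ten responses, the score is 50% - 30% = 20. The number says nothing about why customers feel that way, which is where sentiment analysis of the free-text answers comes in.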
Sentiment analysis can be used on any kind of survey – quantitative and qualitative –
and on customer support interactions, to understand the emotions and opinions of
your customers. Tracking customer sentiment over time adds depth to help
understand why NPS scores or sentiment toward individual aspects of your business
may have changed.
You can use it on incoming surveys and support tickets to detect customers who are
‘strongly negative’ and target them immediately to improve their service. Zero in on
certain demographics to understand what works best and how you can improve.
Real-time analysis allows you to see shifts in VoC right away and understand the
nuances of the customer experience over time beyond statistics and percentages.
In Brazil, federal public spending rose by 156% from 2007 to 2015, while satisfaction
with public services steadily decreased. Unhappy with this counterproductive
progress, the Urban Planning Department recruited McKinsey to help them focus on
user experience, or “citizen journeys,” when delivering services. This citizen-centric
style of governance has led to the rise of what we call Smart Cities.
McKinsey developed a tool called City Voices, which conducts citizen surveys across
more than 150 metrics, and then runs sentiment analysis to help leaders understand
how constituents live and what they need, in order to better inform public policy. By
using this tool, the Brazilian government was able to uncover the most urgent needs
– a safer bus system, for instance – and improve them first.
If this can be successful on a national scale, imagine what it can do for your company.
Customer Service
We already looked at how we can use sentiment analysis in terms of the broader VoC,
so now we’ll dial in on customer service teams.
We all know the drill: stellar customer experiences mean a higher rate of returning
customers. Leading companies know that how they deliver is just as, if not more,
important as what they deliver. Customers expect their experience with companies to
be immediate, intuitive, personal, and hassle-free. If not, they’ll leave and do business
elsewhere. Did you know that one in three customers will leave a brand after just one
bad experience?
You can use sentiment analysis and text classification to automatically organize
incoming support queries by topic and urgency to route them to the correct department
and make sure the most urgent are handled right away.
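A minimal rule-based sketch of this routing follows. The department keywords, urgency cues and the -0.5 threshold are hypothetical, and the sentiment score is assumed to come from a separate sentiment model on a -1 to +1 scale; in practice the topic would usually also be predicted by a trained text classifier.

```python
import re

# Hypothetical routing rules (not a real product's configuration).
TOPIC_KEYWORDS = {
    "billing": {"refund", "charged", "invoice", "payment"},
    "shipping": {"delivery", "shipped", "package", "tracking"},
}
URGENT_CUES = {"urgent", "immediately", "asap", "unacceptable"}


def route_ticket(text, sentiment_score):
    """Return (department, priority) for a support ticket.

    sentiment_score is assumed to come from a sentiment model, from
    -1 (very negative) to +1 (very positive).
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    # First department whose keywords appear in the ticket, else "general".
    department = next(
        (dept for dept, keywords in TOPIC_KEYWORDS.items() if words & keywords),
        "general",
    )
    urgent = sentiment_score <= -0.5 or bool(words & URGENT_CUES)
    return department, "high" if urgent else "normal"


print(route_ticket("I was charged twice, fix this immediately!", -0.8))
print(route_ticket("When will my package arrive?", 0.0))
```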
Central to deep learning and natural language is "word meaning," where a word and
especially its meaning are represented as a vector of real numbers. With these vectors that
represent words, we are placing words in a high-dimensional space. The interesting thing
about this is that the words, which are represented by vectors, will act as a semantic space.
This simply means the words that are similar and have a similar meaning tend to cluster
together in this high-dimensional vector space. You can see a visual representation of word
meaning below:
You can explore what a group of clustered words means by applying principal component analysis
(PCA) or dimensionality reduction with t-SNE, but this can sometimes be misleading because
they oversimplify and leave a lot of information on the side. It's a good way to get started (like
logistic or linear regression in data science), but it isn’t cutting edge and it is possible to do it
way better.
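The idea that similar words sit close together in the vector space is usually measured with cosine similarity. The sketch below uses invented three-dimensional vectors purely for illustration; real embeddings such as word2vec or GloVe have hundreds of dimensions learned from large corpora.

```python
import math

# Toy word vectors, invented for illustration only.
VECTORS = {
    "good":      [0.9, 0.8, 0.1],
    "great":     [0.85, 0.9, 0.15],
    "bad":       [-0.8, -0.7, 0.2],
    "terrible":  [-0.9, -0.8, 0.1],
    "apartment": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word):
    """The other word whose vector is most similar to this word's vector."""
    return max((w for w in VECTORS if w != word),
               key=lambda w: cosine(VECTORS[word], VECTORS[w]))


print(nearest("good"), nearest("bad"))  # great terrible
```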
We can also think of parts of words as vectors which represent their meaning. Imagine the
word "undesirability." Using a morphological approach, which involves the different parts a
word has, we would think of it as being made out of morphemes (word parts) like this: "Un +
desire + able + ity." Every morpheme gets its own vector. From this we can build a neural
network that can compose the meaning of a larger unit, which in turn is made up of all of the
morphemes.
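The composition step can be sketched as a single recursive layer that merges two vectors into one, p = tanh(W[left; right] + b), applied repeatedly to the morphemes. In this sketch the morpheme vectors and W are random and untrained, the vector size of 4 is arbitrary, and the left-to-right grouping is a simplifying assumption; a real model learns all parameters from data and may group morphemes differently.

```python
import math
import random

DIM = 4                  # toy vector size (assumption)
rng = random.Random(0)   # fixed seed so the sketch is reproducible


def random_vector(size):
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


# One vector per morpheme; random here, learned in a trained model.
MORPHEMES = {m: random_vector(DIM) for m in ["un", "desire", "able", "ity"]}

# Composition layer: W maps the concatenation [left; right] (2 * DIM numbers)
# back to DIM numbers. Also untrained here.
W = [random_vector(2 * DIM) for _ in range(DIM)]
b = [0.0] * DIM


def compose(left, right):
    """p = tanh(W [left; right] + b): the vector of the combined unit."""
    concat = left + right
    return [math.tanh(sum(w * x for w, x in zip(row, concat)) + bias)
            for row, bias in zip(W, b)]


def word_vector(morphemes):
    """Fold morphemes left to right: ((un + desire) + able) + ity."""
    vector = MORPHEMES[morphemes[0]]
    for morpheme in morphemes[1:]:
        vector = compose(vector, MORPHEMES[morpheme])
    return vector


undesirability = word_vector(["un", "desire", "able", "ity"])
print(len(undesirability))  # 4
```

Because every composed unit has the same size as a morpheme vector, the same layer can be reused at every step, which is what lets the network build words, and later phrases, of any length.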
Deep learning can also make sense of the structure of sentences with syntactic parsers.
Google uses dependency parsing techniques like this, although in a more complex and larger
manner, with its "Parsey McParseface" and "SyntaxNet."
By knowing the structure of sentences, we can start trying to understand the meaning of
sentences. We start off with the meaning of words being vectors, but we can also do this with
whole phrases and sentences, where the meaning is also represented as vectors. And if we
want to know the relationship of or between sentences, we train a neural network to make
those decisions for us. Deep learning is also good for sentiment analysis. Take this movie
review, for example: "This movie does not care about cleverness, wit or any other kind of
intelligent humor." A traditional approach would have fallen into the trap of thinking this is a
positive review, because "cleverness or any other kind of intelligent humor" sounds like a
positive intent, but a neural network would have recognized its real meaning. Other
applications are chatbots, machine translation, Siri, Google Inbox suggested replies and so
on. There have also been huge advancements in machine translation through the rise of
recurrent neural networks.
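The trap described above can be reproduced with a toy lexicon. The sketch below compares a bag-of-words score with a simple negation heuristic that flips the polarity of every word after "not" until the end of the sentence. This heuristic is a hand-written rule, not a neural network, and the lexicon is an assumption; neural models learn the scope of negation from data instead of relying on such rules.

```python
import re

# Tiny illustrative lexicon (assumption): +1 usually positive, -1 usually negative.
LEXICON = {"care": 1, "cleverness": 1, "wit": 1, "intelligent": 1,
           "humor": 1, "boring": -1, "dull": -1}
NEGATORS = {"not", "no", "never"}
SENTENCE_END = {".", "!", "?"}


def tokens(text):
    """Lowercased words plus sentence-ending punctuation."""
    return re.findall(r"[a-z]+|[.!?]", text.lower())


def naive_score(text):
    """Bag-of-words score: add up lexicon values, ignoring word order."""
    return sum(LEXICON.get(t, 0) for t in tokens(text))


def negation_aware_score(text):
    """Flip the polarity of words after a negator until the sentence ends."""
    score, negated = 0, False
    for t in tokens(text):
        if t in NEGATORS:
            negated = True
        elif t in SENTENCE_END:
            negated = False
        else:
            value = LEXICON.get(t, 0)
            score += -value if negated else value
    return score


review = ("This movie does not care about cleverness, wit or any other "
          "kind of intelligent humor.")
print(naive_score(review), negation_aware_score(review))  # 5 -5
```

The bag-of-words score calls the review clearly positive; the rule fixes this sentence, but such hand-written rules break easily on other phrasings, which is the motivation for learned models.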
Including Technology
5.1 Spyder: A powerful weapon for Machine Learning in Python
First of all, you need to install the Anaconda distribution, which can be
downloaded from https://www.anaconda.com/download/ (the steps below describe
the Windows installer).
The installation is simple: keep clicking Next and accept the terms and
conditions. The reason for installing Anaconda is that it comes with many
preinstalled packages, and Spyder is one of them. After installation, click the
Anaconda icon on the desktop, or open the search option in Windows 10 and type
"Anaconda Navigator". (Ubuntu users can install Anaconda using the terminal.)
When you open the Navigator, you will see the Anaconda GUI,
which looks like this:
Screenshot-0
From here just click on the launch button below the Spyder and a new Spyder
GUI will be opened in a separate window:
Screenshot-1
As you can see, a new .py file named untitled2.py has been created by default.
untitled2 is the name of the file in which you will write your Python
code.
Here, I highlight some of the most important basic features and explain
them:
Screenshot-2
The portion marked in sky blue is used to set the working directory. If you keep
your source code and its .csv data file in the same folder, use this set-directory
option to select that folder so that the code can find the data file.
The portion marked in orange is the Variable Explorer, which shows information
about all the variables you have created. After selecting your code with Ctrl+A and
running it with Shift+Enter, click this option and you will
see the following:
Screenshot-3
In the upper right of this screen is a table whose Name column lists the variables
used in the code, such as x_test and x_train, and whose Type column shows their
data types. Further right you can see each variable's size as well as the values
stored.
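As an example of code whose variables would show up in the Variable Explorer, the sketch below loads a small CSV file of reviews and splits it into x_train, x_test, y_train and y_test. The file name reviews.csv and its contents are hypothetical; to keep the sketch self-contained it writes the file into a temporary folder first, whereas in Spyder you would read your own .csv file from the working directory.

```python
import csv
import random
import tempfile
from pathlib import Path

# Hypothetical stand-in for the project's data file.
rows = [
    ("Great product, works perfectly", "positive"),
    ("Battery died after two days", "negative"),
    ("Exactly as described, very happy", "positive"),
    ("Poor build quality", "negative"),
    ("Good value for the money", "positive"),
]

with tempfile.TemporaryDirectory() as folder:
    # Write the sample data, then read it back the way a script reads
    # a .csv file kept in its working directory.
    csv_path = Path(folder) / "reviews.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["review", "label"])
        writer.writerows(rows)
    with csv_path.open(newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))

# Shuffle reproducibly and hold out 20% of the reviews for testing.
random.Random(42).shuffle(records)
split = len(records) * 4 // 5
x_train = [r["review"] for r in records[:split]]
y_train = [r["label"] for r in records[:split]]
x_test = [r["review"] for r in records[split:]]
y_test = [r["label"] for r in records[split:]]
print(len(x_train), len(x_test))  # 4 1
```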
Returning to Screenshot-2, you will see another portion marked in dark blue.
This area is the File Explorer, whose main purpose is to select files and load
them into Spyder. It also gives you a glimpse of the files present in the
directory you have selected.
Last but not least, you'll see the IPython console, marked in black in
Screenshot-2. IPython is a command shell for interactive computing in multiple
programming languages, originally developed for the Python programming language,
that offers introspection, rich media, shell syntax, tab completion, and history.
These are some of the basic features that Spyder offers; there are many
more, so install Anaconda and see for yourself.
LIMITATION OF PROJECT
Sentiment analysis tools can identify and analyse many pieces of text
automatically and quickly. But computer programs have problems recognizing
things like sarcasm and irony, negations, jokes, and exaggerations - the sorts
of things a person would have little trouble identifying. And failing to recognize
these can skew the results.
With short sentences and pieces of text, for example like those you find on
Twitter especially, and sometimes on Facebook, there might not be enough
context for a reliable sentiment analysis. However, in general, Twitter has a
reputation for being a good source of information for sentiment analysis, and
with the increased character limit for tweets, it's likely to become even more
useful. So, automated sentiment analysis tools do a really great job of analysing
text for opinion and attitude, but they're not perfect.
When you're using a tool like Typely to analyse your text to see if it conveys the
sentiment you want for your readers/audience, combine the results it gives you
with your human judgement to identify anything the tool may not be able to
easily determine.
Conclusions and Future Scope
Conclusions
The field of sentiment analysis is an exciting new research direction due to the large number
of real-world applications where discovering people’s opinions is important for better
decision-making. The development of techniques for document-level sentiment analysis is one of
the significant components of this area. Recently, people have started expressing their
opinions on the Web, which has increased the need for analyzing opinionated online content for
various real-world applications. A lot of research on detecting sentiment from text is present
in the literature. Still, there is huge scope for improving these existing
sentiment analysis models.
Sentiment analysis or opinion mining is a field of study that analyzes people’s sentiments,
attitudes, or emotions towards certain entities. This project tackles a fundamental problem
of sentiment analysis, sentiment polarity categorization. Online product reviews from
Amazon.com are selected as data used for this study. A sentiment polarity categorization
process has been proposed along with detailed descriptions of each step. Experiments for
both sentence-level categorization and review-level categorization have been performed.
The future scope of sentiment analysis
Sentiment analysis is a useful tool for any organization or group for which public
sentiment or attitude towards them is important for their success - whichever way that
success is defined.
On social media, blogs, and online forums millions of people are busily discussing and
reviewing businesses, companies, and organizations. And those opinions are being
‘listened to’ and analysed.
Those being discussed are making use of this enormous amount of data by using
computer programs that don’t just locate all mentions of their products, services, or
business, but also determine the emotions and attitudes behind the words being used.
The results from sentiment analysis help businesses understand the conversations
and discussions taking place about them, and help them react and take action
accordingly.
They can quickly identify any negative sentiments being expressed, and turn poor
customer experiences into very good ones.
They can create better products and services, and they can formulate the marketing
messages they send out according to the sentiments being expressed by their target
audience or customers.
Universities can use sentiment analysis to analyze student feedback and comments
garnered either from their own surveys, or from online sources such as social media.
They can then use the results to identify and address any areas of student
dissatisfaction, as well as identify and build on those areas where students are
expressing positive sentiments.
And by analysing the sentiment behind customer reviews on sites like TripAdvisor and
Yelp, hotels and restaurants can not only manage their reputations by improving the
services offered, but can also gauge the general customer attitude to their business
or brand.
Businesses can compare their results with those of their competitors to better
understand people’s attitude to their business. They can identify where they may be
excelling, or identify where there’s room for improvement compared to the competition.
They can also conduct market research into general sentiment around key issues,
topics, products, and services, before developing and launching their own new
services, products or features.
References:
https://monkeylearn.com/sentiment-analysis/
https://ieeexplore.ieee.org/abstract/document/6812968