You are on page 1of 52

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/324965434

SENTIMENT ANALYSIS OF TWEET DATA

Thesis · January 2017

CITATIONS READS
0 617

4 authors, including:

S. M. Mazharul Hoque Chowdhury Priyanka Ghosh


Daffodil International University Jahangirnagar University
26 PUBLICATIONS   5 CITATIONS    11 PUBLICATIONS   7 CITATIONS   

SEE PROFILE SEE PROFILE

Syed Akhter Hossain


Daffodil International University
116 PUBLICATIONS   492 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Smart Security Surveillance using IOT View project

MS Thesis View project

All content following this page was uploaded by S. M. Mazharul Hoque Chowdhury on 05 May 2018.

The user has requested enhancement of the downloaded file.


SENTIMENT ANALYSIS OF TWEET DATA

BY

S.M. Mazharul Hoque Chowdhury


ID: 131-15-2213

Priyanka Ghosh
ID:131-15-2378

AND

Most. Arina Afrin


ID:133-15-2964

This Report Presented in Partial Fulfillment of the Requirements for the


Degree of Bachelor of Science in Computer Science and Engineering .

Supervised By

Dr. Syed Akhter Hossain


Professor and Head
Department of Computer Science and Engineering
Daffodil International University

DAFFODIL INTERNATIONAL UNIVERSITY


DHAKA, BANGLADESH
DECMBER 2016

i
©Daffodil International University
APPROVAL
This Project titled “Sentiment Analysis of Tweet Data”, submitted by S.M. Mazharul
Hoque Chowdhury, Priyanka Ghosh and Most. Arina Afrin to the Department of
Computer Science and Engineering, Daffodil International University, has been accepted
as satisfactory for the partial fulfillment of the requirements for the degree of B.Sc. in
Computer Science and Engineering and approved as to its style and contents.
BOARD OF EXAMINERS

Dr. Syed Akhter Hossain Chairman


Professor and Head
Department of Computer Science and Engineering
Faculty of Science & Information Technology
Daffodil International University

Dr. Sheak Rashed Haider Noori Internal Examiner


Associate Professor
Department of Computer Science and Engineering
Faculty of Science & Information Technology
Daffodil International University

Narayan Ranjan Chakraborty Internal Examiner


Assistant Professor and Associate Head
Department of Computer Science and Engineering
Faculty of Science & Information Technology
Daffodil International University

Nazmun Nessa Moon Internal Examiner


Assistant Professor
Department of Computer Science and Engineering
Faculty of Science & Information Technology
Daffodil International University

Dr. Mohammad Shorif Uddin External Examiner


Professor and Chairman
Department of Computer Science and Engineering
Jahangirnagar University
ii
©Daffodil International University
DECLARATION
We hereby declare that, this project has been done by us under the supervision of Dr.
Syed Akhter Hossain, Professor and Head, Department of Computer Science and
Engineering, Daffodil International University. We also declare that neither this project
nor any part of this project has been submitted elsewhere for award of any degree or
diploma.

Supervised by:

Dr. Syed Akhter Hossain


Professor and Head
Department of Computer Science and Engineering
Daffodil International University

Submitted by:

S.M. Mazharul Hoque Chowdhury


ID: -131-15-2213
Department of CSE
Daffodil International University

Priyanka Ghosh
ID: -131-15-2378
Department of CSE
Daffodil International University

Most. Arina Afrin


ID: -133-15-2964
Department of CSE
Daffodil International University

iii
©Daffodil International University
ACKNOWLEDGEMENT
First we express our heartiest thanks and gratefulness to almighty Allah for His divine
blessing makes us possible to complete this project successfully.

We fell grateful to and wish our profound our indebtedness to of Dr. Syed Akhter
Hossain, Professor and Head, Department of Computer Science and Engineering,
Daffodil International University, Dhaka. Deep Knowledge & keen interest of our
supervisor in the field of data mining influenced us to carry out this project .His endless
patience, scholarly guidance, continual encouragement, constant and energetic
supervision, constructive criticism , valuable advice, reading many inferior draft and
correcting them at all stage have made it possible to complete this project.

We would like to express our heartiest gratitude to other faculty member and the staff of
Computer Science and Engineering Department of Daffodil International University, for
their kind help to finish our project.

We would like to thank our entire course mate in Daffodil International University, who
took part in this discuss while completing the course work.

Finally, we must acknowledge with due respect the constant support and patients of our
parents.

iv
©Daffodil International University
ABSTRACT
This project is on “Sentiment Analysis of Tweet Data” is a sentiment analysis project
based on big data analytics. This project will help us to analyze sentiment from twitter
generated text data. Which will be able to show the sentimental state for a person or a
specific topic or whatever we want.

The aim for our project is to build up an application which will be able to connect non-
technical people to the field of data mining. Where using simple user interface they will
be able to create a report based on their search keyword. This application is very simple
and easy to use. So it will be comfortable for everyone to use it.

To develop this project we just need to know R language and platform and its library. To
create the user interface we used shiny, which is a package including PHP and HTML to
create a web application. We will input inside the shiny app and in the background R will
do the data collection, analysis and output generation. Then shiny will print the output in
the web application.

After completing the application the application is working fine and its main task is
running the application on the browser, which is taking input and providing us output
based on input.

v
©Daffodil International University
TABLE OF CONTENTS

CONTENS PAGE

Board of examiners ii

Declaration iii

Acknowledgements iv

Abstract V

List of Figures vii

CHAPTER 1: CHAPTER NAME 1-3

1.1 Objective 1

1.2 Motivation 2

1.3 Expected Outcome 3

1.3 Report Layout 3

CHAPTER 2: LITERATURE REVIEW 4-9

2.1 Sentiment Analysis 4

2.2 Related Works 7

2.3 Challenges 8

CHAPTER 3: PROPOSED RESEARCH 10-21

3.1 Research Methodology 10

3.2 Data Collection 11

vi
©Daffodil International University
3.2.1 Creating Twitter App 11

3.2.2 R Setup 13

3.2.3 Collecting Twitter Data 15

3..3 Data Processing 16

3.3.1 Data Pre-processing 17

3.3.2 Data Analyze 17

3.4 Flow Model 20

3.5 Experimental Layout 21

CHAPTER 4: RESULTS AND DISCUSSION 22-38

4.1 Experimental Result 22

4.1.1 Emotion Class 23

4.1.2 Polarity Class 25

4.1.3 Word Cloud 28

4.1.4 Scoring 31

4.2 Final Outcome 34

CHAPTER 5: CONCLUSION AND FUTURE SCOPE 39-40

39
5.1 Conclusion
40
5.2 Future Scope

REFERENCES 41-42

vii
©Daffodil International University
LIST OF FIGURES
FIGURES PAGE NO
Figure 3.1: Creating A Twitter Application 10
Figure 3.2: Application Created In Twitter 11
Figure 3.3: Application Homepage 13
Figure 3.4: R Download 13
Figure 3.5: R User Interface 14
Figure 3.6: R Studio User Interface 15
Figure 3.7: Data Collection and Store 16
Figure 3.8: Storing Data 17
Figure 3.9: Positive and Negative word Library 18
Figure 3.10: Sentiment Scoring Method 19
Figure 3.11: Tweet Sentiment Scoring 19
Figure 3.12: Data Flow 20
Figure 3.13: Experimental Layout 21
Figure 4.1: Emotion Class Sudo Code 24
Figure 4.2: Emotion Class Bar Diagram 25
Figure 4.3: Polarity class Sudo Code 27
Figure 4.4: Polarity class Bar Diagram 28
Figure 4.5: Word Cloud Script 30
Figure 4.6: Word Cloud for Tweet Data 31
Figure 4.7: Scoring System Table 32
Figure 4.8: Scoring Histogram plot/bar Chart Diagram 33
Figure 4.9: Score chart for Tweets 33
Figure 4.10: Running Application 35
Figure 4.11: Searching for Tweet 35
Figure 4.12: Emotion Class Output 36
Figure 4.13: Polarity Class Output. 37
Figure 4.14: Word Cloud Output. 37
Figure 4.15: Score Histogram Output 38

viii
©Daffodil International University
0
©Daffodil International University
CHAPTER 1

INTRODUCTION

Twitter, a popular social network site, is one such source which contributes huge data
which possess values beyond social and commercial interests. Twitter users can express
or share their opinions, feelings or information regarding events, products, health or
anything in their 140 character restricted short messages termed tweets. It is a great
source of data. Now a days, everyday 500 million tweets are posted in twitter. So it has
become very important to analyze them to find track of data flowing throw twitter. As a
part of data analyzing data or data mining analyzing the sentiment of twitter data is also
important to know the sentiment of the user. Analyzed data can be used for research,
business market study, product review etc.

1.1 OBJECTIVE

Research is an organized investment of a problem in which there is an attempt to gain


solution to a problem. To get right solution of a right problem, clearly defined objectives
are very important. Clearly defined objectives enlighten the way in which the researcher
has to proceed. Research objectives are usually expressed in lay terms and are directed as
much to the client as to the researcher. Research objectives may be linked with a
hypothesis or used as a statement of purpose in a study that does not have a hypothesis. A
researcher objective is clear, concise, declarative statement, which provides direction to
investigate the variables. Generally research objective focus on the way to measure the
variable, such as identify or describe them. Sometime objectives are directed towards
identifying the relationship difference between two variables. Research objective outline
the specific goals the study plans to achieve when completed.

1
©Daffodil International University
Our research objectives are given below:

 To study Big Data & Data Analytics.


 To study sentiment analysis.
 To develop and seek Knowledge on how to deal with tweet data.
 To develop Computational tool for sentiment analysis.
 To apply the knowledge of sentiment analysis on the real life example.
 To build an app for non-technical users.

1.2 MOTIVATION

The explosive growth of the textual information on the web in the past few decades has
brought radical change in human life. In the web, people share their opinions and
sentiments. In many forms about services the services they are aware of. This creates a
large collection of opinions and views in the form of texts, which needs to be analyzed to
know the efficacy of the service. Opinions are a usually subjective expression that
describes person’s sentiment, feelings towards the object or service.

Now a day's twitter is a very popular social website were from all over the world people
connect with each other and communicate and express their thought through texts. By
analyzing their tweets we can decide whatever their mental situation is going on. Even
what is the recent “hot topic”. So we were motivated to analyze tweet data from this
thinking. Before starting on something motivation is very important. Because without
proper motivation it’s not possible to carry on with it. In some cases only motivation can
complete a huge part of an analysis or task.

Motivation for our projects are:

Recent revolution of data analysis and big data.


Our personal interest on data mining and sentiment analysis.
Opportunity to learn new things.

2
©Daffodil International University
1.3 EXPECTED OUTCOME

We choose “Sentiment analysis of tweet data” as our project because we wanted to build
a data mining app which will be able to create interaction between tweet data and data
mining. We wanted to build an app which will be able to collect twitter data for example
tweets and analyze them. After the analysis the application will provide us the statistical
overview of sentimental situation of targeted people.

So finally we decided to build a data mining application using R for sentiment analysis.
Where we will be able to search tweets using keyword, for a particular topic, of a
particular account or random data. User will also be able to see trending hot topics or
keywords on their search topic. We will also build a web application for non-technical
users. So that they will be able to analyze data themselves from twitter. Outputs will be
viewed as histogram plot, bar diagram and word cloud.

1.4 REPORT LAYOUT

There are five chapters in this research paper. They are: Introduction, Literature Review,
Proposed Research, Results and Discussion, Conclusion and Future.

Chapter one: Introduction; Objective, Motivation, Expected Outcome, Report layout.

Chapter two: Literature Review; Sentiment Analysis, Related works, challenges.

Chapter three: Proposed Research; Research Methodology, Data Collection, Data


Processing, Flow Model, Experimental layout.

Chapter four: Results and Discussion; Experimental Result, Discussion.

Chapter five: Conclusion and Future; conclusion, Future Scope.

3
©Daffodil International University
CHAPTER 2

LITERATURE REVIEW

Sentiment analysis is study of data and finding out the sentimental state. Data can be text
or a audio or a video file. Sentiment can be measured by applying different method for
different type of data. For example, text can be analyzed by the keywords on it, audio
files can be analyzed using tone of the voice and video or image file can be analyzed by
tracking the state of the face. Recently sentiment analysis has become very popular
because of the large data flow over the internet. Researchers are trying to analyze
different type of data for different purpose. But the main challenge for sentiment analysis
is detecting actual state of emotion. Because some cases analyzed data don't result what
they actually mean. Because some positive sentences sometimes mean as negative and
some negatives mean as positive. Researchers are now trying to solve this kind of issues
to get much more accurate result of sentiment analysis.

2.1 SENTIMENT ANALYSIS

Sentiment means feelings, it also be attitudes, emotions or opinions. And sentiment


analysis is the process of computationally identifying and categorizing opinions
expressed in a piece of text, in order to determine whether the writer's attitude towards a
particular topic, product, etc. is positive, negative, or neutral. Using natural language
processing, statistics, or machine learning methods to extract, identify, or otherwise
characterize the sentiment content of a text unit. Sometimes it referred to as opinion
mining.

The purpose of sentiment analysis is to determine the attitude of a communicator through


the contextual polarity of their speaking or writing. Their attitude may be reflected in

4
©Daffodil International University
their own judgment, emotional state of the subject, or the state of any emotional
communication they are using to affect a reader or listener.

Sentiment analysis is a fast growing subject in the technical communication field. With
the increase in social media, online retail, and personal blogs and publications knowing
where public sentiment is leaning has translated into a rapid evolution in sentiment
analysis that can become a valuable skill.

When we perform sentiment analysis on some content, the main searching point is the
opinions in content and we picking the sentiment based on those opinions. An opinion is
an expression that consists of two key components. One of them is a target or topic and
another one is a sentiment on the topic.

Consider a sentence, “I love this company“, here “this company” is the topic and the
sentiment that is expressed by the verb “love” is positive.

Sentiment analysis is just not a feature in a social analytics tool – it’s a field of study.
This field is still being studied, albeit not at great lengths due to the intricacy of this
analysis, in the same way that some aspects of linguistics are still up to debate or not fully
understood.

Existing approaches to sentiment analysis can be grouped into three main categories:
knowledge-based techniques, statistical methods, and hybrid approaches.

Knowledge-based techniques classify text by affect categories based on the presence of


unambiguous affect words such as happy, sad, afraid, and bored. Some knowledge bases
not only list obvious affect words, but also assign arbitrary words a probable "affinity" to
particular emotions.

Statistical methods leverage on elements from machine learning such as latent semantic
analysis, support vector machines, "bag of words" and Semantic Orientation — Point

5
©Daffodil International University
wise Mutual Information. More sophisticated methods try to detect the holder of a
sentiment (the person who maintains that affective state) and the target (the entity about
which the affect is felt). To mine the opinion in context and get the feature which has
been opinionated, the grammatical relationships of words are used. Grammatical
dependency relations are obtained by deep parsing of the text.

Hybrid approaches leverage on both machine learning and elements from knowledge
representation such as ontology and semantic networks in order to detect semantics that
are expressed in a subtle manner, e.g., through the analysis of concepts that do not
explicitly convey relevant information, but which are implicitly linked to other concepts
that do so.

Open source software tools deploy machine learning, statistics, and natural language
processing techniques to automate sentiment analysis on large collections of texts,
including web pages, online news, internet discussion groups, online reviews, web blogs,
and social media. Knowledge-based systems, on the other hand, make use of publicly
available resources, to extract the semantic and affective information associated with
natural language concepts. Sentiment analysis can also be performed on visual content,
i.e., images and videos. One of the first approach in this direction is SENTIBANK
utilizing an adjective noun pair representation of visual content. [1]

The applications for sentiment analysis are endless. More and more we’re seeing it used
in social media monitoring and VOC to track customer reviews, survey responses,
competitors, etc. However, it is also practical for use in business analytics and situations
in which text needs to be analyzed.

2.2 RELATED WORKS

Sentiment analysis is in demand because of its efficiency. Thousands of text documents


can be processed for sentiment (and other features including named entities, topics,
themes, etc.) in seconds, compared to the hours it would take a team of people to

6
©Daffodil International University
manually complete. Because it is so efficient (and accurate – Semantria has 80%
accuracy for English content) many businesses are adopting text and sentiment analysis
and incorporating it into their processes. [2]

People all around the world now a days working on different topics of sentiment analysis.
As the amount of data increasing day by day data mining and sentiment analysis became
more popular to people. Different companies need sentiment analysis for collecting
customer information. Because this is the easiest way to find their customers and target
them for their business.

Even now social media also need sentiment analysis. Using sentiment analysis they

determine marketing strategy, improve campaign success, improve product messaging

and improve customer service.

Some sentiment analysis related works are:

1. Opinion mining and sentiment analysis. By Bo Pang and Lillian Lee, Foundations

and Trends in Information Retrieval, Vol. 2, No 1-2 (2008) 1–135.

2. Twitter Sentiment Analysis: The Good the Bad and the OMG! ByEfthymios

Kouloumpis, Theresa Wilson, Johanna Moore, Proceedings of the Fifth

International AAAI Conference on Weblogs and Social Media.

3. Sentiment Analysis of Short Informal Texts, Svetlana Kiritchenko, Xiaodan Zhu

and Saif M. Mohammad, Journal of Artificial Intelligence Research 50 (2014)

723–762.

4. Sentiment Analysis: Capturing Favorability Using Natural Language Processing,

Tetsuya Nasukawa and Jeonghee Yi.

7
©Daffodil International University
5. Sentiment Analysis of Political Tweets: Towards an Accurate Classifier, Akshat

Bakliwal, Jennifer Foster, Jennifer van der Puil, Ron O’Brien, Lamia Tounsi and

Mark Hughes.

6. Stock trend prediction using news sentiment analysis, Kalyani Joshi, Prof.

Bharathi H. N., Prof. Jyothi Rao.

7. Decision Making Using Sentiment Analysis from Twitter, M.Vasuki, J.Arthi,

K.Kayalvizh.

All those are the recent research works on sentiment analysis. They focused on different
topics like customer review, decision making, NLP, twitter data, openion mining etc.

2.3 CHALLENGES

As we know that sentiment analysis is the analysis of image, text, audio, video sentiment
recognition. Sentiment analysis and evaluation process are facing several challenges.
These challenges become obstacles in analyzing the accurate meaning of sentiments and
detecting the suitable sentiment polarity. To identify and extract subjective information
from text the sentiment analysis is the practice of applying natural language processing
and text analysis techniques [3].

The degree of accuracy issue is hard to answer, said Bing Liu, a University of Chicago
computer science professor specializing in data mining. It depends on what are
measuring, the level of analyzing text, and the number of data sets across domains and
the voice sound quality of videos, among other variables. Still, he thinks that progress is
being made in this regard [2].

In sentiment analysis it is more challenging to detect a more in depth sentiment/emotion.


Positive and negative is a very simple analysis but the challenging one is to extract

8
©Daffodil International University
emotions like how much hate there is inside the opinion, how much happiness, how much
sadness, etc.

Emotion detection is really a difficult task because sometimes it happens that someone
tell something that seems positive but in real it’s not positive the sense was negative. So
sometimes it is difficult to understand meaning of a sentence cause the emotions are too
much complex.

Mainly sentiment analysis try to detect the mental situation of a person. But sometimes it
become tough to tell what the person meant. If we consider audio sentiment analysis, then
noise or voice tune difference can create major error in output. Same for text analysis,
because some texts word wise meaning is totally different from its actual meaning. That’s
why sentiment analysis is facing major challenges now a days.

9
©Daffodil International University
CHAPTER 3

PROPOSED RESEARCH

In this project we tried to analyze twitter data to get sentimental state from it. To do that
at first we need to collect data from twitter. We created a twitter app to get the data. Then
we connected it with our application. We wrote a script to analyze collected data. So
when we run the program it will automatically collect the data, analyze it and give the
output.

3.1 RESEARCH METHODOLOGY

As we selected sentiment analysis of tweet data as our project. So we needed to collect


data from twitter. Everyday twitter generate tons of or terabytes of data by its users.
Where most of them are unstructured data. So before starting our research work and
implementation we needed to consider this as our challenge for sentiment analysis.

We found different platforms for data mining as orange, Weka, Rattle GUI, Apache
Mahout, R, Hadoop, UIMA, SenticNet API, Natural Language Toolkit etc. Then we
selected R as our data mining tool and platform.

We selected R as our data mining tool and platform because of its friendly user interface
and strong library for data processing, data mining and output visualization. So at first we
learnt how to work with R. Then we started learning R programming language which is
mainly used for data analysis and it is a high level programing language. We learned R
programming language from https://www.datacamp.com. It is very helpful website
because its shows how to learn R in step by step process.

After that we created a twitter app for data collection and using R language we created an
application. This application can connect to twitter using internet and collect data and
analyze them. Collected data are stored in a CSV file.

10
©Daffodil International University
3.2 DATA COLLECTION

To collect data we needed a twitter application, which can give us access to the twitter
and application will be able to find expected tweets. To create a twitter application we
went to the twitter application management website. Then we created an application
named Sentiment Analysis BDA.

3.2.1 CREATING TWITTER APP

To create the application we needed to fill up a form providing name of the application,
what is the purpose of the application and other necessary data.

Figure 3.1 creating a twitter application

After providing all information our application is created and now we can access it and
view detail information about it.

11
©Daffodil International University
Figure 3.2 application created in twitter

In this application home page they provided us some important information called
consumer key, consumer, Consumer Secret, access token, access token secret about
accessing the app to collect tweet. Without accessing those information no one will be
able to connect this app with our application.

12
©Daffodil International University
Figure 3.3 application home page

3.2.2 R SETUP

When creating twitter application is done now we are ready to build our R application to
collect data and store them. To create R application first we downloaded and installed the
R application to our computer.

Figure 3.4 R download

13
©Daffodil International University
After downloading and installing the R app we will get a user interface shown below:

Figure 3.5 R user interface

But the user interface don’t look that much comfortable to use and there is no
visualization option. So now we have to install another application named RStudio.

RStudio gives R programmers a very strong and helpful user interface. They will also get
a visualization platform where they will be able to visualize their statistical result or
output.

14
©Daffodil International University
Figure 3.6 RStudio user interface

After installing necessary components now we are ready to write the program to collect
tweets and store them.

3.2.3 COLLECTING TWITTER DATA

To collect data from twitter we have to write a script which will connect the application
and collect tweets from twitter. Now the script for data collection and connection is given
below:

15
©Daffodil International University
Figure 3.7 data collection and store

We used two libraries library(twitteR), library(ROAuth) for data collection and linking
the application with twitter.This script will collect data from twitter and store them in a
CSV file. But as we need to analyze them and show user the output so we later only
worked with data file collected from twitter and analyzed them.

Tweets will be collected based on last upload. User will be able to collect as much as
tweets he wants. Tweet collection will be based on search keyword, number of tweets and
existence of that tweet keyword in the twitter.

3.3 DATA PROCESSING

When all the data are collected, our main task then starts. Which is “Sentiment analysis
of tweet data”. But for processing the data we need to work in two steps. They are:

i. Data pre-processing
ii. Data analyze

After completing those steps we will be able to get our expected output of our project.

16
©Daffodil International University
3.3.1 DATA PRE-PROCESSING

In this step we will cut off all the unexpected and unwanted data generated by twitter as
location, retweets number, link etc.

To cut off unexpected data we need the following script:

Figure 3.8 shorting data


Those codes will short unwanted columns and only store the tweets. Data pre-processing
is important because without pre-processing we won’t get better output and analyzing
process will take more time.

3.3.2 DATA ANALYZE

When tweets are pre-processed data are ready to analyze. There is several methods to
analyze data for sentiment analysis. They are Sentiment Classification Using Lexical
Contextual Sentence Structure, Combining Lexicon and Learning based Approaches for
Concept-Level Sentiment Analysis, Interdependent Latent Dirichlet Allocation, A Joint
Model of Feature Mining and Sentiment Analysis for Product Review Rating, Opinion
Digger, Latent Aspect Rating Analysis on Review Text Data: A Rating Regression
Approach.[5]

In our project we used combining Lexicon and Learning based approaches for concept-
level sentiment analysis. Here we first divided our tweet sentences into words. So that we
are able to compare with our stored positive and negative words.

17
©Daffodil International University
We used positive and negative words as our word library for positive and negative
sentiment. We stored them as different .txt file and stored them in the same directory
where the r and CSV files are stored. Demo of positive and negative words are given
below:

Figure 3.9 positive and negative word library

Now we need to write an R script which will read our pre-processed tweet data and
divide them into words. Then it will match those words with our previously stored
positive and negative words which we are considering as our library for our sentiment
analysis.

Here the script will also score our tweets according to its positive or negative values. If
the positive value is greater than negative value the tweet is a positive tweet.

18
©Daffodil International University
The scoring system works on based on following method:

Figure 3.10 sentiment scoring method

Now we will apply this same method in R to analyze the tweet data which we collected
from twitter before and processed. The R program for analyzing data is given below:

Figure 3.11 Tweet sentiment scoring

19
©Daffodil International University
After calculating the twitter value we will get the twitter value as positive or negative. If
the tweet is not positive or negative then it will be counted as neutral tweet.

3.4 FLOW MODEL


In our project we divided our data flow is different parts according to how data are
passing through each other. First a twitter app is connected with a twitter account, then an
R script connect it with R, so now data collection is ready. When data collection is
complete then R script analyze it and gives some statistical report as bar diagram, word
cloud as output. Than all the outputs are sent to the browser and browser visualize them
for user.

Figure 3.12 Data flow

According to the flow model system will recheck for data connection before collecting
data. If it is not connected then it will try total 3 times then connection will fail. Data flow
will be direct as it is given is the figure.

20
©Daffodil International University
3.5 EXPERIMENTAL LAYOUT
According to our plan we wanted to build a user interface for general users who wants to
analyze the tweet data. So first of all we need a search box, which will give us access for
searching tweets. Then we need an output visualization part for viewing the results. We
will have multiple outputs for the search results.

So we need to plot them in different places so that user can easily understand the result
generated by the program. As well as outputs must change depending on different search
keyword. So user don’t need to reopen the app again and again.
We are going to use shiny. So as shiny default there is three parts in a web page.
1. Title panel,
2. Side bar panel,
3. Main panel.

So considering all the conditions we designed a layout for our project. The layout is given
below:

Figure 3.13 Experimental Layout

21
©Daffodil International University
CHAPTER 4
RESULTS AND DISCUSSION
To use an application creating a user interface is the best option. So we created a user
interface for our application so that anyone can use them easily. We used two types of
algorithms as Naive Bayes and Laxicon, on the other hand we got four kind of data as
output. All the outputs are visualized in our application via web browser.

4.1 EXPERIMENTAL RESULT


We created our project in two different ways for both who knows about R programming
language and who don’t know anything about it. Because there is always some people
who are interested about the internal structure and play with it and some other people
who wants to work using a simple user interface.

So first we created a program that can connect our program with R then it will collect
data and store them in the working directory. After storing the data our program will
retrieve the store data and short them and it is ready for analysis. In analysis part it will
happen in 4 different steps.
1. Emotion,
2. Polarity,
3. WordCloud &
4. Scoring.

Those analysis will show us the complete state of the sentiment for our collected tweets.
We stored every kind of data like CSV, image etc. Because by storing them if we want to
do some more analysis, it will be a lot easier. For example, if we want to compare all the
news of CNN for the month January and February then we just need to collect the data
and store them after processing in different location and compare them using the
WordCloud and bar charts.

22
©Daffodil International University
4.1.1 EMOTION CLASS

An emotion is a complex psychological state that involves three distinct components: a


subjective experience, a physiological response, and a behavioral or expressive response.
Emotions, can be considered as feelings, include experiences such as love, hate, anger,
trust, joy, panic, fear, and grief. Emotions are related to, but different from, mood.

A basic task in sentiment analysis is classifying the polarity of a given text at the
document, sentence, or feature/aspect level—whether the expressed opinion in a
document, a sentence or an entity feature/aspect is positive, negative, or neutral.
Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional
states such as "angry", "sad", and "happy".[6]

There are 6 emotion categories that are widely used to describe humans’ basic emotions,
based on facial expression [7]: anger, disgust, fear, happiness, sadness and surprise.
These are mainly associated with negative sentiment, with “Surprise” being the most
ambiguous, as it can be associated with either positive or negative feelings. Interestingly,
the number of basic human emotions has been recently “reduced”, or rather re-
categorized, to just 4; happiness, sadness, fear/surprise, and anger/disgust.[8]

So if we want to do sentiment analysis at first we need to know and understand what


emotion is and how it can be described or expressed. Without this analyzing human
sentiment through tweets will be difficult to do.

For our twitter sentiment analysis project we divided tweet sentiments in 6 important
classes. They are joy, sadness, surprise, anger, fear and unknown. To do this we used
Naive Bayes algorithm (Bag of Words) so that we can find out the individual scores for
all those emotion classes. Between all of them unknown class was calculated by the
unknown tweets number.

23
©Daffodil International University
To get the emotion class we need two libraries. They are require(plyr) and
require(stringr). Where plyr is a tool for Tools for splitting, applying and combining Data
and stringr is a tool for simple, consistent wrappers for common string operations.

To analyze the emotion of the tweets at first we need to consider our preprocessed data
for applying Naive Bayes algorithm. This algorithm can analyze data for two state
polarity and emotion. So here we will run the emotion program, so that we can find out
emotional value of the tweets.

We wrote a simple R script to apply Naive Bayes algorithm for emotion analysis. The
script is given below:

Figure 4.1 Emotion class sudo code

When we run the program, it automatically analyze the tweets and apply Naïve Bayes
algorithm to analyze emotion. This algorithm mainly classify emotions in 5 basic classes.
They are anger, fear, joy, sadness and surprise. After that rest of the tweets are counted as
Unknown class.

24
©Daffodil International University
After running the program we get the output as a bar diagram. We set different color for
different bars for better visualization. The output bar plot is given below:

Figure 4.2 Emotion class bar diagram

In following figure we got output for the channel CNN. But most of the tweets were
counted as unknown because Naïve Bayes algorithm could not find out the class for those
tweets. We also can see which emotion class having which value or how many tweets
they have.

4.1.2 POLARITY CLASS

Polarity is the state of emotion which defines the positivity or negativity of text. Polarity,
also known as orientation is he emotion expressed in the sentence. A text can be divided
into three forms, positive, negative and neutral. Depending on the comparison with the
positive and negative words score of a tweet can be easily counted.

25
©Daffodil International University
As like as emotion class we applied Naïve Bayes algorithm in here too. As we know the
basic task of sentiment analysis is figure out the sentiment state for a text depending on
the words in it.

To find out the priority at first we need to divide the text into single words so that
comparison or the calculation become easier. Then Naïve Bayes algorithm search for
words that can express sentiment in a tweet.

As we know Naïve Bayes algorithm works with bag of words. So first of all we need to
find out all the positive or negative words.

For example a list of positive and negative words are given below:

Positive Abound, abounds, acclamation, accolade, accolades, accommodative,


words accommodative, accomplish, accomplished, accomplishment,
accomplishments, accurate, accurately, achievable, achievement,
achievements, achievable
Negative 2-faced, 2-faces, abnormal, abolish, abominable, abominably, abominate,
words abomination, abort, aborted, aborts, abrade, abrasive, abrupt, abruptly,
abscond, absence, absent-minded, absentee, absurd, absurdity, absurdly,
absurdness, abuse

Those bag of words are used to find out if the tweet is positive or negative. For the each
positive or negative word every tweets will get 1 score. Here,

Tweet score = tweets positive value – tweets negative value

For example, we have a tweet “that was a pretty good party.”

For this we will get the score: positive = pretty (1) + good (1) = 2;

Negative = 0

So the tweet score for this tweet will be

Tweet score = 2 – 0 = 2

26
©Daffodil International University
All the tweets having positive tweet score will be positive tweets and all the tweets
having negative tweet score will be negative tweets. If any tweet contain tweet value 0, it
will be under neutral tweet. Counting all of them we will get a proper chart of positive,
negative and neutral tweets.

To calculate the polarity of the text we wrote an R script which can calculate the tweet
score using Naïve Bayes algorithm. The R script is given below:

Figure 4.3 Polarity class sudo code

When we run this program it will find out all the positive and negative words and start
analyzing them to differentiate them. So they will get their tweet score. Then according to
their tweet score they create a histogram plot of positive, negative and neutral tweets.
After running the polarity class program we will get the following diagram:

27
©Daffodil International University
Figure 4.4 Polarity class bar diagram

4.1.3 WORD CLOUD

If we want to define word cloud then we can say that an image composed of words used
in a particular text or subject, in which the size of each word indicates its frequency or
importance.

Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a
specific word appears in a source of textual data (such as a speech, blog post, or
database), the bigger and bolder it appears in the word cloud.

Word clouds or tag clouds are graphical representations of word frequency that give
greater prominence to words that appear more frequently in a source text. The larger the
word in the visual the more common the word was in the document(s). This type of
visualization can assist evaluators with exploratory textual analysis by identifying words

28
©Daffodil International University
that frequently appear in a set of interviews, documents, or other text. It can also be used
for communicating the most salient points or themes in the reporting stage.[9]

So we also analyzed word cloud to find out hot topic or what about people are talking
most now a days. Word cloud is very important to understand people’s sentiment and
what about they are sad or happy. Without considering word cloud it will be tough to find
out key points of sentiment analysis bar chart or scoring.

Though there is a lot of tools to create a word cloud. Where we just need to upload our
document file or the website link and they provide us the word cloud. But for our
program we created our own word cloud analysis system.

To create a word cloud in R we need to use some libraries as require(tm),


require(wordcloud). Those two libraries is needed for mining data and creating word
cloud. We also need another library for visualization. Which is require(RColorBrewer). It
will provide us the theme for word cloud visualization.

Then we cleared our existed data and removed unwanted data as punctuations and stop
words. We converted words in lower case and converted them into plain text. Then a
word will be considered as valid for word cloud. Higher scorer will be the largest word
considering that all those words who have lower value will be smaller in size.

We wrote an R script to create word cloud for analyze the twitter data. The script is given
below:

29
©Daffodil International University
Figure 4.5 Word cloud script

After running the following script we will get our expected output which is word cloud of
our collected tweets. There is different themes for word cloud and they define what will
be the color for a keyword based on its value. Here we used dark2 theme for our word
cloud.

When we run this script we will get the word cloud. Word cloud size depend on the
number of tweets and keywords. A demo output for word cloud is given below:

30
©Daffodil International University
Figure 4.6 Word cloud for tweet data

From the following word cloud we can see that the output are divided in 6 different
classes. They are joy, sadness, surprise, anger, fear and unknown. Each class represent its
own type of emotions and frequent words for their tweets. Each class have its own color,
so that user can understand which word belongs to which class.

4.1.4 SCORING

Calculating score came from the positive and negative scores of the tweets. Basically in
this section we tried to show the number of tweets having same value for positive or
negative sentiment. The higher the score, the stronger the sentiment. According to our
consideration all the tweets cannot be counted as average positive or negative. Because

31
©Daffodil International University
some of them are simple positive sentence and some of them are negative sentences. But
there is also some sentences those can strongly express the sentiment as positive or
negative.

So this calculation was added to measure the strong sentiment state for the data we got
from twitter. All those tweets who are having same sentiment value will be added to the
same sentiment score bar.

For example, suppose we have 5 individual data or tweets:

1. I am having some trouble today.


2. That party way awesome, mind blowing, excellent. I just loved it.
3. Mango is a tasty Fruit.
4. Submit your CV in here http//…..
5. He is a good man.

Considering them from the 1 st tweet we got the score -1, 2nd tweet we got the score 4, 3rd
tweet we got the score 1, 4th tweet we got the score 0, 5th tweet we got the score 1.

So now here we got the score (-1, 4, 1, 0, 1), where 2 tweets are having score 1 and all
others are having different scores. So we got the bar chart as,

Score -1 0 1 2 3 4
Tweets 1 1 2 0 0 1

Figure 4.7 scoring system table

We did the same thing using R programming. We calculated the score and plotted them
as bar chart. The R program for scoring output is given below:

32
©Daffodil International University
Figure 4.8 Scoring histogram plot / bar chart program

In the given program we also stored the generated histogram plot or bar chart in our
working directory for further analysis. In the plot we set the row as “Number of tweets”
and column as “Tweets having sentiment score”. The score and the tweet numbers are
calculated as we previously described. The output histogram plot or bar chart for scores
are given below:

Figure 4.9 Score chart for tweets

33
©Daffodil International University
Following figure is showing us the score dependent output based on our previously done
analysis. Outputs will be different based on different time, search keyword, number of
tweets we collected and tweet updates.

4.2 FINAL OUTCOME

After completing our analysis we wanted to create a user interface where general users
will be able to analyze data. So we choose shiny as our tool to create the user interface.
The main benefit of shiny is it can create web applications which can interact with R.

To build a shiny app two thing is required. One of them is ui, and another one is server.
UI and server can be kept in same file like “app.r”, or in separate files like “ui.r” and
“server.r”.

UI contain the outputs and the webpage. On the other hand, server contain the main
program and inputs. So we kept our main program in the server file and created a
webpage where we kept a form by which we can search anything we want to analyze. On
another part of the page outputs will be shown.

To run the program we need to enter into the R software and go to the UI or server script.
There will be a “Run App” button. If we press that the program will run into instructed
window for example browser.

34
©Daffodil International University
Figure 4.10 Running Application

When the webpage opens in the browser then we will be able to see a search box on the
left. We need to insert whatever we want to search. Then we need to click on “VIEW”
button to view the analyzed result.

Figure 4.11 Searching for tweets

35
©Daffodil International University
After that the whole R program will start on the background. It will collect tweets
according to the search input. Then it will preprocess them. The preprocessed data will
create an emotion class diagram and plot it into the “Sentiment analysis for given word”
tab.

Figure 4.12 Emotion class output

If we want to view polarity class output we need to select “Polarity” from the form on the
left side. Then the program will analyze the polarity and plot the diagram of polarity class
instead of emotion class.

36
©Daffodil International University
Figure 4.13 Polarity class output

To view the word cloud output we need to select on the “WORDCLOUD” tab and we
will be able to see the word cloud in there. Depending on the processing speed this may
take a while to view the output.

Figure 4.14 Word cloud output

37
©Daffodil International University
Our last part of output is score histogram plot. To view the scores histogram bar plot we
need to click on the “scores” tab. Then the plot will appear in the web page.

Figure 4.15 Scores histogram output

38
©Daffodil International University
CHAPTER 5

CONCLUSION AND FUTURE SCOPE

As we can see that the output of this project is giving us a overall statistical view for
selected or given search keyword. So that we can find out the emotional state for targeted
keyword or data or person. Those outputs can be used for different sectors like education,
research, business. We are still working on it to get better result and use it in specific
work to help people. But learning and applying our methods for the future development is
our main challenge.

5.1 CONCLUSION

In this project we tried to develop a complete project on sentiment analysis of tweet data.
Which will be able to collect tweet data and analyze them and based on the analysis it
will provide statistical sentimental structure of the search keyword. Even it will be easy
for non-technical people to use it and examine the output. While we were working this
project we learnt a lot of things, also faced a lot of challenges also. From this project we
learnt data mining and analysis. Now we have real time data analysis experience. We
enjoyed our project work.

On the other hand, the main challenge we faced is resource. There is a very few good
websites for learning data mining. Even if we face any problem it is difficult to find
solution for that. That’s why we lost a lot of time in finding solution.

Though it took a lot of time to learn this step by step and all those steps were very small,
but we learnt from those. For our analysis we learnt R platform and language. We also
got idea about different packages inside it and how to work with them. We tried to not to
make it complex and obtain a high efficiency result from this project.

39
©Daffodil International University
5.2 FUTURE SCOPE

Sentiment analysis is already evolving from general (positive, negative and neutral) to
much more complex or more granular and deep understanding. So the demand of sentiment
analysis is increasing in both side of research and business. Researchers are working on the
accuracy of the algorithm and development of the lexicon based analysis. On the other hand,
business companies are working on the market policy and customer satisfaction analysis to
develop their business.

Out research work has potential of both to be used as commercial aspects or to do further
research. For commercial aspect, business companies can find out their customer satisfaction
based on tweets. So that they will be able to change their business policy to improve their
benefits and attract new customers to their product. On the other hand, researchers can collect
data for individuals to get the sentimental status of a person and use that data for further
research. Even development of accuracy of algorithms are also can be done by our research
work. In future we are planning to implement more algorithms in our research work to make
it more accurate for sentiment analysis. We want to contribute more in this research field by
keeping carry on study.

40
©Daffodil International University
REFERENCES

1. SENTIMENT ANALYSIS; https://en.wikipedia.org/wiki/Sentiment_analysis; Last


Accessed on 23th August 2016.
2. RELATED WORKS; https://www.quora.com/What-are-the-applications-of-sentiment-
analysis-Why-is-it-in-so-much-discussion-and-demand; Last Accessed on 14th
September 2016.
3. CHALLENGES; http://www.adweek.com/prnewser/5-key-challenges-in-sentiment-
analysis/116604; Last Accessed on 20th September 2016.
4. CHALLENGES; http://www.sciencedirect.com/science/article/pii/S1018363916300071;
Last Accessed on 20th September 2016;
5. Anais Collomb, Crina Costea, Damien Joyeux, Omar Hasan, Lionel Brunie, A Study and
Comparison of Sentiment Analysis Methods for Reputation Evaluation, University of
Lyon, INSA-Lyon, F-69621 Villeurbanne, France.
6. EMOTION CLASS; https://blog.hootsuite.com/beginners-guide-sentiment; Last
th
Accessed on 11 October 2016.
7. EMOTION CLASS; https://www.microsoft.com/developerblog/real-life-
code/2015/11/30/ Emotion-Detection-and-Recognition-from-Text-using-Deep-
Learning.html; Last Accessed On 11th October 2016.
8. EMOTION CLASS; http://betterevaluation.org/en/evaluation-options/wordcloud; Last
Accessed on 11th October 2016.
9. WORDCLOUD; https://web.njit.edu/~da225/NetHelp/default.htm?turl=Documents%2
Fcurrentexamplesofsentimentanalysis.html; Last Accessed on 23rd October 2016.
10. RESEARCH METHODOLOGY; https://www.tracx.com/resources/blog/out-of-the-dark-
ages-the-rise-of-social-media-sentiment-analysis-part-1-the-dark-ages; Last Accessed on
7th July 2016.
11. SENTIMENT ANALYSIS; http://searchbusinessanalytics.techtarget.com/definition/
opinion-mining-sentiment-mining; Last Accessed on 5th August 2016.
12. RESEARCH METHODOLOGY; http://www.slideshare.net/sumit786raj/sentiment-
analysis-of-twitter-data; Last Accessed On 10th July 2016.

41
©Daffodil International University
13. LEKHA R. NAIR, DR. SUJALA D. SHETTY, STREAMING TWITTER DATA
ANALYSIS USING SPARK FOR EFFECTIVE JOB SEARCH, Journal of Theoretical
and Applied Information Technology, 20th October 2015. Vol.80. No.2.
14. Vishal A. Kharde, S.S. Sonawane, Sentiment Analysis of Twitter Data: A Survey of
Techniques, International Journal of Computer Applications (0975 – 8887), Volume 139
– No.11, April 2016.
15. Penchalaiah.C, Murali.G, Suresh Babu.A, Effective Sentiment Analysis on Twitter Data
using: Apache Flume and Hive, IJISET - International Journal of Innovative Science,
Engineering & Technology, Vol. 1 Issue 8, October 2014.
16. Manjunath Mulimani, Nireeksha V Shetty, Nikshitha Shetty, Renita Maria Lolo,
Rajashree L, Data Analysis on Current Affairs Using R, International Journal of
Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 4,
April 2015.
17. LEKHA R. NAIR, DR. SUJALA D. SHETTY, STREAMING TWITTER DATA
ANALYSIS USING SPARK FOR EFFECTIVE JOB SEARCH, Journal of Theoretical
and Applied Information Technology 20th October 2015. Vol.80. No.2.

42
©Daffodil International University

View publication stats

You might also like