net/publication/324965434
All content following this page was uploaded by S. M. Mazharul Hoque Chowdhury on 05 May 2018.
BY
Priyanka Ghosh
ID:131-15-2378
AND
Supervised By
©Daffodil International University
APPROVAL
This Project titled “Sentiment Analysis of Tweet Data”, submitted by S.M. Mazharul
Hoque Chowdhury, Priyanka Ghosh and Most. Arina Afrin to the Department of
Computer Science and Engineering, Daffodil International University, has been accepted
as satisfactory for the partial fulfillment of the requirements for the degree of B.Sc. in
Computer Science and Engineering and approved as to its style and contents.
BOARD OF EXAMINERS
Supervised by:
Submitted by:
Priyanka Ghosh
ID: 131-15-2378
Department of CSE
Daffodil International University
ACKNOWLEDGEMENT
First, we express our heartfelt thanks and gratitude to almighty Allah, whose divine
blessing made it possible for us to complete this project successfully.
We are grateful and deeply indebted to Dr. Syed Akhter Hossain, Professor and Head,
Department of Computer Science and Engineering, Daffodil International University,
Dhaka. Our supervisor's deep knowledge of and keen interest in the field of data mining
influenced us to carry out this project. His endless patience, scholarly guidance,
continual encouragement, constant and energetic supervision, constructive criticism,
valuable advice, and his reading and correcting of many inferior drafts at every stage
made it possible to complete this project.
We would like to express our heartfelt gratitude to the other faculty members and the
staff of the Computer Science and Engineering Department of Daffodil International
University for their kind help in finishing our project.
We would like to thank all our course mates at Daffodil International University who
took part in discussions while completing the course work.
Finally, we must acknowledge with due respect the constant support and patience of our
parents.
ABSTRACT
This project, “Sentiment Analysis of Tweet Data”, is a sentiment analysis project
based on big data analytics. It analyzes sentiment in text data generated on Twitter
and can show the sentimental state of a person, a specific topic, or any other search
target.
The aim of our project is to build an application that connects non-technical people to
the field of data mining: through a simple user interface, they can create a report
based on their search keyword. The application is simple and easy to use, so anyone
should be comfortable using it.
To develop this project we only needed the R language and platform and its libraries.
To create the user interface we used Shiny, an R package for building web applications
that are rendered as HTML in the browser. Input is entered in the Shiny app, and in the
background R does the data collection, analysis, and output generation. Shiny then
displays the output in the web application.
The completed application works as intended: it runs in the browser, takes input, and
provides output based on that input.
TABLE OF CONTENTS
CONTENTS
Board of examiners
Declaration
Acknowledgements
Abstract
1.1 Objective
1.2 Motivation
2.3 Challenges
3.2.1 Creating Twitter App
3.2.2 R Setup
4.1.4 Scoring
5.1 Conclusion
5.2 Future Scope
REFERENCES
LIST OF FIGURES
FIGURES
Figure 3.1: Creating A Twitter Application
Figure 3.2: Application Created In Twitter
Figure 3.3: Application Homepage
Figure 3.4: R Download
Figure 3.5: R User Interface
Figure 3.6: R Studio User Interface
Figure 3.7: Data Collection and Store
Figure 3.8: Storing Data
Figure 3.9: Positive and Negative Word Library
Figure 3.10: Sentiment Scoring Method
Figure 3.11: Tweet Sentiment Scoring
Figure 3.12: Data Flow
Figure 3.13: Experimental Layout
Figure 4.1: Emotion Class Pseudo Code
Figure 4.2: Emotion Class Bar Diagram
Figure 4.3: Polarity Class Pseudo Code
Figure 4.4: Polarity Class Bar Diagram
Figure 4.5: Word Cloud Script
Figure 4.6: Word Cloud for Tweet Data
Figure 4.7: Scoring System Table
Figure 4.8: Scoring Histogram Plot/Bar Chart Diagram
Figure 4.9: Score Chart for Tweets
Figure 4.10: Running Application
Figure 4.11: Searching for Tweet
Figure 4.12: Emotion Class Output
Figure 4.13: Polarity Class Output
Figure 4.14: Word Cloud Output
Figure 4.15: Score Histogram Output
CHAPTER 1
INTRODUCTION
Twitter, a popular social networking site, is one such source: it contributes a huge
volume of data whose value goes beyond social and commercial interests. Twitter users
can express or share their opinions, feelings, or information regarding events,
products, health, or anything else in short messages restricted to 140 characters,
termed tweets. It is a great source of data. Nowadays, 500 million tweets are posted on
Twitter every day, so it has become very important to analyze them to keep track of the
data flowing through Twitter. As a part of data mining, analyzing the sentiment of
Twitter data is also important for understanding the sentiment of its users. The
analyzed data can be used for research, business market studies, product reviews, etc.
1.1 OBJECTIVE
Our research objectives are given below:
1.2 MOTIVATION
The explosive growth of textual information on the web over the past few decades has
brought radical change to human life. On the web, people share their opinions and
sentiments, in many forms, about the services they are aware of. This creates a
large collection of opinions and views in the form of text, which needs to be analyzed
to assess the efficacy of a service. Opinions are usually subjective expressions that
describe a person's sentiment and feelings towards an object or service.
Nowadays Twitter is a very popular social website where people from all over the world
connect with each other, communicate, and express their thoughts through text. By
analyzing their tweets we can estimate their mental state, and even identify the
current “hot topic”. This thinking motivated us to analyze tweet data. Motivation is
very important before starting on something, because without proper motivation it is
not possible to carry on with it. In some cases motivation alone can carry a large part
of an analysis or task.
1.3 EXPECTED OUTCOME
We chose “Sentiment analysis of tweet data” as our project because we wanted to build
a data mining app that brings tweet data and data mining together. We wanted an app
that can collect Twitter data, for example tweets, and analyze it. After the analysis
the application provides a statistical overview of the sentimental state of the
targeted people.
So we decided to build a data mining application using R for sentiment analysis, in
which we can search tweets by keyword, for a particular topic, from a particular
account, or at random. Users will also be able to see trending hot topics or keywords
related to their search topic. We will also build a web application for non-technical
users, so that they can analyze Twitter data themselves. Outputs will be shown as a
histogram plot, bar diagrams, and a word cloud.
There are five chapters in this research paper: Introduction, Literature Review,
Proposed Research, Results and Discussion, and Conclusion and Future Scope.
CHAPTER 2
LITERATURE REVIEW
Sentiment analysis is the study of data to find out the sentimental state it expresses.
The data can be text, audio, or video. Sentiment can be measured by applying different
methods to different types of data. For example, text can be analyzed by the keywords
in it, audio files by the tone of the voice, and video or image files by tracking
facial expression. Sentiment analysis has recently become very popular because of the
large flow of data over the internet. Researchers are trying to analyze different types
of data for different purposes. The main challenge for sentiment analysis is detecting
the actual emotional state, because in some cases the analyzed data does not yield what
the author actually meant: some positive sentences are meant negatively and some
negative ones are meant positively. Researchers are now trying to solve this kind of
issue to get more accurate sentiment analysis results.
their own judgment, emotional state of the subject, or the state of any emotional
communication they are using to affect a reader or listener.
Sentiment analysis is a fast-growing subject in the technical communication field. With
the growth of social media, online retail, and personal blogs and publications, knowing
where public sentiment is leaning has driven a rapid evolution in sentiment analysis,
which can become a valuable skill.
When we perform sentiment analysis on some content, we mainly search for the opinions
in the content and pick the sentiment based on those opinions. An opinion is an
expression that consists of two key components: a target or topic, and a sentiment on
that topic.
Consider the sentence “I love this company”. Here “this company” is the topic, and the
sentiment expressed by the verb “love” is positive.
Sentiment analysis is not just a feature in a social analytics tool; it is a field of
study. This field is still being studied, albeit not at great length due to the
intricacy of the analysis, in the same way that some aspects of linguistics are still
up for debate or not fully understood.
Existing approaches to sentiment analysis can be grouped into three main categories:
knowledge-based techniques, statistical methods, and hybrid approaches.
Statistical methods leverage elements from machine learning such as latent semantic
analysis, support vector machines, "bag of words", and Pointwise Mutual Information
for semantic orientation. More sophisticated methods try to detect the holder of a
sentiment (the person who maintains that affective state) and the target (the entity
about which the affect is felt). To mine the opinion in context and get the feature
which has been opinionated, the grammatical relationships of words are used.
Grammatical dependency relations are obtained by deep parsing of the text.
Hybrid approaches leverage both machine learning and elements from knowledge
representation such as ontology and semantic networks in order to detect semantics that
are expressed in a subtle manner, e.g., through the analysis of concepts that do not
explicitly convey relevant information, but which are implicitly linked to other concepts
that do so.
Open source software tools deploy machine learning, statistics, and natural language
processing techniques to automate sentiment analysis on large collections of texts,
including web pages, online news, internet discussion groups, online reviews, web blogs,
and social media. Knowledge-based systems, on the other hand, make use of publicly
available resources, to extract the semantic and affective information associated with
natural language concepts. Sentiment analysis can also be performed on visual content,
i.e., images and videos. One of the first approaches in this direction is SentiBank,
which uses an adjective-noun pair representation of visual content. [1]
The applications of sentiment analysis are wide-ranging. More and more we’re seeing it
used in social media monitoring and voice of the customer (VOC) programs to track
customer reviews, survey responses, competitors, etc. However, it is also practical for
use in business analytics and situations in which text needs to be analyzed.
manually complete. Because it is so efficient (and accurate – Semantria has 80%
accuracy for English content) many businesses are adopting text and sentiment analysis
and incorporating it into their processes. [2]
People all around the world are nowadays working on different topics in sentiment
analysis. As the amount of data increases day by day, data mining and sentiment
analysis have become more popular. Different companies need sentiment analysis for
collecting customer information, because it is an easy way to find their customers and
target them for their business.
Social media also need sentiment analysis now. Using sentiment analysis they
1. Opinion mining and sentiment analysis. By Bo Pang and Lillian Lee, Foundations
2. Twitter Sentiment Analysis: The Good the Bad and the OMG! By Efthymios
723–762.
5. Sentiment Analysis of Political Tweets: Towards an Accurate Classifier, Akshat
Bakliwal, Jennifer Foster, Jennifer van der Puil, Ron O’Brien, Lamia Tounsi and
Mark Hughes.
6. Stock trend prediction using news sentiment analysis, Kalyani Joshi, Prof.
K.Kayalvizh.
All of these are recent research works on sentiment analysis. They focus on different
topics such as customer reviews, decision making, NLP, Twitter data, and opinion mining.
2.3 CHALLENGES
Sentiment analysis covers sentiment recognition in images, text, audio, and video. The
sentiment analysis and evaluation process faces several challenges. These challenges
become obstacles to analyzing the accurate meaning of sentiments and detecting the
correct sentiment polarity. Sentiment analysis is the practice of applying natural
language processing and text analysis techniques to identify and extract subjective
information from text [3].
The question of the degree of accuracy is hard to answer, said Bing Liu, a University
of Illinois at Chicago computer science professor specializing in data mining. It
depends on what is being measured, the level at which text is analyzed, the number of
data sets across domains, and the voice sound quality of videos, among other variables.
Still, he thinks that progress is being made in this regard [2].
emotions like how much hate there is inside the opinion, how much happiness, how much
sadness, etc.
Emotion detection is a really difficult task because sometimes someone says something
that seems positive but is actually meant negatively. So it is sometimes difficult to
understand the meaning of a sentence, because emotions are very complex.
Sentiment analysis mainly tries to detect the mental state of a person, but sometimes
it is tough to tell what the person meant. In audio sentiment analysis, noise or
differences in tone of voice can create major errors in the output. The same holds for
text analysis, because the word-by-word meaning of some texts is totally different from
their actual meaning. That is why sentiment analysis faces major challenges nowadays.
CHAPTER 3
PROPOSED RESEARCH
In this project we tried to analyze Twitter data to extract the sentimental state from
it. To do that we first need to collect data from Twitter. We created a Twitter app to
get the data, then connected it with our application. We wrote a script to analyze the
collected data, so when we run the program it automatically collects the data, analyzes
it, and gives the output.
We found different platforms for data mining, such as Orange, Weka, Rattle GUI, Apache
Mahout, R, Hadoop, UIMA, SenticNet API, and the Natural Language Toolkit. We selected R
as our data mining tool and platform because of its friendly user interface and strong
libraries for data processing, data mining, and output visualization. So we first
learned how to work with R. Then we started learning the R programming language, a
high-level programming language mainly used for data analysis. We learned R from
https://www.datacamp.com. It is a very helpful website because it shows how to learn R
step by step.
After that we created a Twitter app for data collection and, using R, created an
application. This application connects to Twitter over the internet, collects data, and
analyzes it. The collected data is stored in a CSV file.
3.2 DATA COLLECTION
To collect data we needed a Twitter application, which gives us access to Twitter so
that our application can find the tweets we want. To create a Twitter application we
went to the Twitter application management website and created an application named
Sentiment Analysis BDA.
To create the application we needed to fill in a form giving the name of the
application, its purpose, and other necessary data.
After we provided all the information, our application was created, and we can now
access it and view detailed information about it.
Figure 3.2 application created in twitter
On the application home page, Twitter provided some important credentials (consumer
key, consumer secret, access token, and access token secret) for accessing the app to
collect tweets. Without these credentials no one can connect this app with our
application.
Figure 3.3 application home page
3.2.2 R SETUP
Once the Twitter application is created, we are ready to build our R application to
collect data and store it. To create the R application we first downloaded and
installed R on our computer.
After downloading and installing R we get the user interface shown below:
But this user interface is not very comfortable to use and has no visualization
option. So we install another application named RStudio. RStudio gives R programmers a
powerful and helpful user interface, along with a visualization pane where they can
view their statistical results or output.
Figure 3.6 RStudio user interface
After installing the necessary components we are ready to write the program to collect
tweets and store them.
To collect data from Twitter we have to write a script that connects the application
and collects tweets. The script for connection and data collection is given below:
Figure 3.7 data collection and store
We used two libraries, twitteR and ROAuth, for data collection and for linking the
application with Twitter. This script collects data from Twitter and stores it in a CSV
file. Because we need to analyze the data and show the output to the user, we later
worked only with the data file collected from Twitter.
Tweets are collected starting from the most recent. The user can collect as many tweets
as he wants. Tweet collection depends on the search keyword, the number of tweets
requested, and whether that keyword exists on Twitter.
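The storing step can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the script from Figure 3.7: the data frame `tweets`, its column names, and the file name are hypothetical stand-ins for whatever the collection step returns, and the call that contacts Twitter is left out because it needs real credentials.

```r
# Hypothetical stand-in for the tweets returned by the collection step.
tweets <- data.frame(
  text = c("I love this phone", "Worst service ever"),
  created = c("2018-05-01 10:00:00", "2018-05-01 10:05:00"),
  stringsAsFactors = FALSE
)

# Store the tweets in a CSV file (a temporary file here, so the sketch
# does not touch the working directory).
csv_path <- file.path(tempdir(), "tweets.csv")
write.csv(tweets, csv_path, row.names = FALSE)

# Later analysis steps read the stored file back instead of contacting Twitter.
stored <- read.csv(csv_path, stringsAsFactors = FALSE)
print(stored$text)
```

Reading the stored file back, rather than collecting again, keeps the analysis repeatable on the same set of tweets.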
When all the data is collected, our main task starts: “Sentiment analysis of tweet
data”. Processing the data takes two steps:
i. Data pre-processing
ii. Data analysis
After completing those steps we get the expected output of our project.
3.3.1 DATA PRE-PROCESSING
In this step we cut out all the unwanted data generated by Twitter, such as location,
retweet counts, and links.
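A cleaning step of this kind can be sketched with base R regular expressions. The function name `clean_tweet` and the exact set of patterns are assumptions made for illustration; the project's own pre-processing script may remove a different set of fields.

```r
# Remove Twitter-specific noise from raw tweet text.
clean_tweet <- function(x) {
  x <- gsub("http[^[:space:]]+", " ", x)     # links
  x <- gsub("\\bRT\\b", " ", x)              # retweet marker
  x <- gsub("@[A-Za-z0-9_]+", " ", x)        # user mentions
  x <- gsub("[^A-Za-z[:space:]]", " ", x)    # punctuation, digits, symbols
  x <- tolower(x)
  trimws(gsub("[[:space:]]+", " ", x))       # collapse repeated spaces
}

clean_tweet("RT @cnn: Great news today!!! https://t.co/abc123")
# returns "great news today"
```

Links and mentions are removed before punctuation so that their fragments do not survive as stray words.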
Once the tweets are pre-processed, the data is ready to analyze. There are several
methods for sentiment analysis: Sentiment Classification Using Lexical
Contextual Sentence Structure, Combining Lexicon and Learning based Approaches for
Concept-Level Sentiment Analysis, Interdependent Latent Dirichlet Allocation, A Joint
Model of Feature Mining and Sentiment Analysis for Product Review Rating, Opinion
Digger, and Latent Aspect Rating Analysis on Review Text Data: A Rating Regression
Approach.[5]
In our project we used the method of combining lexicon and learning based approaches
for concept-level sentiment analysis. We first divided our tweet sentences into words
so that we can compare them with our stored positive and negative words.
We used lists of positive and negative words as our word library for positive and
negative sentiment. We stored them as separate .txt files in the same directory as the
R and CSV files. A sample of the positive and negative words is given below:
Next we need an R script that reads our pre-processed tweet data and divides it into
words. It then matches those words against our stored positive and negative words,
which serve as the library for our sentiment analysis.
The script also scores each tweet according to its positive and negative values. If
the positive value is greater than the negative value, the tweet is a positive tweet.
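The word-matching and scoring idea can be sketched as below. This is an illustrative sketch, not the project's script: the two short word vectors stand in for the much larger .txt word lists, and the function name `score_tweet` is hypothetical.

```r
# Tiny illustrative word lists; the project loads larger lists from .txt files.
positive_words <- c("good", "great", "love", "pretty", "happy")
negative_words <- c("bad", "worst", "hate", "sad", "terrible")

# Score one tweet: number of positive words minus number of negative words.
score_tweet <- function(tweet, pos = positive_words, neg = negative_words) {
  words <- strsplit(tolower(tweet), "[^a-z]+")[[1]]  # split into words
  words <- words[nzchar(words)]                      # drop empty pieces
  sum(words %in% pos) - sum(words %in% neg)
}

score_tweet("the camera is pretty good")     # 2 positive, 0 negative: score 2
score_tweet("worst phone ever bad battery")  # 0 positive, 2 negative: score -2
```

A word that appears in neither list contributes nothing, so a tweet with no listed words scores 0.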
The scoring system is based on the following method:
Now we apply this same method in R to analyze the tweet data that we collected from
Twitter and pre-processed. The R program for analyzing the data is given below:
After calculating the tweet value we know whether the tweet is positive or negative. If
the tweet is neither positive nor negative, it is counted as a neutral tweet.
According to the flow model, the system rechecks the data connection before collecting
data. If it is not connected, it tries a total of 3 times before the connection fails.
Data flows directly, as shown in the figure.
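The three-attempt connection check can be sketched as a small retry wrapper. Both `with_retries` and `make_flaky_connect` are hypothetical names introduced here for illustration; the second only simulates a connection that fails a given number of times, since the real connection call needs Twitter credentials.

```r
# Try an action up to `max_tries` times; return its value on the first success,
# or stop with an error once every attempt has failed.
with_retries <- function(action, max_tries = 3) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(action(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
  }
  stop("Connection failed after ", max_tries, " attempts")
}

# Simulated connection that fails `failures` times and then succeeds.
make_flaky_connect <- function(failures) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= failures) stop("no connection")
    "connected"
  }
}

with_retries(make_flaky_connect(2))  # succeeds on the third attempt
```

With three simulated failures the wrapper gives up and raises an error, which matches the "connection will fail" branch of the flow model.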
3.5 EXPERIMENTAL LAYOUT
According to our plan we wanted to build a user interface for general users who want to
analyze tweet data. So first of all we need a search box for searching tweets. Then we
need an output visualization part for viewing the results. We have multiple outputs for
each search.
So we need to plot them in different places so that the user can easily understand the
results generated by the program. The outputs must also change with the search keyword,
so that the user doesn't need to reopen the app again and again.
We are going to use Shiny. By default, a Shiny web page has three parts:
1. Title panel,
2. Side bar panel,
3. Main panel.
Considering all these conditions, we designed a layout for our project. The layout is
given below:
CHAPTER 4
RESULTS AND DISCUSSION
Creating a user interface is the best way to make an application usable. So we created
a user interface for our application so that anyone can use it easily. We used two
types of algorithms, Naive Bayes and lexicon-based, and we got four kinds of data as
output. All the outputs are visualized in our application in a web browser.
First we created a program that connects our application with R; it then collects data
and stores it in the working directory. After storing the data, our program retrieves
the stored data and sorts it, and it is then ready for analysis. The analysis happens
in 4 different steps:
1. Emotion,
2. Polarity,
3. WordCloud &
4. Scoring.
These analyses show us the complete state of the sentiment for our collected tweets.
We stored every kind of output, such as CSV files and images, because storing them
makes further analysis a lot easier. For example, if we want to compare all the news
from CNN for January and February, we just need to collect the data, store it after
processing in different locations, and compare the two using the word clouds and bar
charts.
4.1.1 EMOTION CLASS
A basic task in sentiment analysis is classifying the polarity of a given text at the
document, sentence, or feature/aspect level—whether the expressed opinion in a
document, a sentence or an entity feature/aspect is positive, negative, or neutral.
Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional
states such as "angry", "sad", and "happy".[6]
There are 6 emotion categories that are widely used to describe humans’ basic emotions,
based on facial expression [7]: anger, disgust, fear, happiness, sadness and surprise.
These are mainly associated with negative sentiment, with “Surprise” being the most
ambiguous, as it can be associated with either positive or negative feelings. Interestingly,
the number of basic human emotions has been recently “reduced”, or rather
re-categorized, to just 4: happiness, sadness, fear/surprise, and anger/disgust.[8]
For our Twitter sentiment analysis project we divided tweet sentiments into 6 classes:
joy, sadness, surprise, anger, fear, and unknown. To do this we used the Naive Bayes
algorithm (bag of words) so that we can find the individual scores for all those
emotion classes. The unknown class is the count of tweets that could not be assigned to
any of the other classes.
To get the emotion class we need two libraries, plyr and stringr. plyr provides tools
for splitting, applying, and combining data, and stringr provides simple, consistent
wrappers for common string operations.
To analyze the emotion of the tweets we first take our pre-processed data and apply the
Naive Bayes algorithm. This algorithm can analyze data for two things: polarity and
emotion. Here we run the emotion program to find the emotional value of the tweets.
We wrote a simple R script to apply the Naive Bayes algorithm for emotion analysis. The
script is given below:
When we run the program, it automatically analyzes the tweets and applies the Naive
Bayes algorithm to classify emotion. This algorithm classifies emotions into 5 basic
classes: anger, fear, joy, sadness, and surprise. The rest of the tweets are counted as
the unknown class.
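To make the idea concrete, here is a toy multinomial Naive Bayes classifier over a bag of words, written in base R. It is a sketch under stated assumptions and not the script from Figure 4.1: the six training tweets are invented, only three emotion classes are used to keep it short, `classify_emotion_nb` is a hypothetical name, and returning "unknown" when a tweet shares no word with the training vocabulary is one plausible reading of how the unknown class arises.

```r
# Toy labelled training tweets (hypothetical data, for illustration only).
train <- data.frame(
  text = c("so happy and glad today", "what a happy surprise party",
           "angry about the delay", "furious and angry customers",
           "sad news today", "feeling sad and lonely"),
  emotion = c("joy", "joy", "anger", "anger", "sadness", "sadness"),
  stringsAsFactors = FALSE
)

tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

classes <- unique(train$emotion)
vocab <- unique(unlist(lapply(train$text, tokenize)))

# Word counts per class with add-one (Laplace) smoothing: rows = words, cols = classes.
counts <- vapply(classes, function(cl) {
  words <- unlist(lapply(train$text[train$emotion == cl], tokenize))
  as.numeric(table(factor(words, levels = vocab))) + 1
}, numeric(length(vocab)))
rownames(counts) <- vocab

log_lik <- log(sweep(counts, 2, colSums(counts), "/"))  # log P(word | class)
log_prior <- log(as.vector(table(train$emotion)[classes]) / nrow(train))

classify_emotion_nb <- function(tweet) {
  words <- tokenize(tweet)
  words <- words[words %in% vocab]
  if (length(words) == 0) return("unknown")  # nothing the model has seen
  scores <- log_prior + colSums(log_lik[words, , drop = FALSE])
  classes[which.max(scores)]
}

classify_emotion_nb("I am so happy")    # "joy"
classify_emotion_nb("angry customers")  # "anger"
classify_emotion_nb("zzz qqq")          # "unknown"
```

With so little training data, many real tweets would contain no known words at all, which is consistent with a large unknown class.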
After running the program we get the output as a bar diagram. We set a different color
for each bar for better visualization. The output bar plot is given below:
The figure shows the output for the channel CNN. Most of the tweets were counted as
unknown because the Naive Bayes algorithm could not determine a class for those tweets.
We can also see how many tweets each emotion class has.
Polarity is the state of emotion that defines the positivity or negativity of a text.
Polarity, also known as orientation, is the emotion expressed in the sentence. A text
can fall into one of three classes: positive, negative, or neutral. By comparing its
words with the positive and negative word lists, the score of a tweet can be easily
counted.
As with the emotion class, we applied the Naive Bayes algorithm here too. As we know,
the basic task of sentiment analysis is to figure out the sentiment state of a text
from the words in it.
To find the polarity we first divide the text into single words so that comparison and
calculation become easier. Then the Naive Bayes algorithm searches for words that
express sentiment in a tweet.
As we know, the Naive Bayes algorithm works with a bag of words. So first of all we
need to find all the positive and negative words.
For example, a list of positive and negative words is given below:
This bag of words is used to find out whether a tweet is positive or negative. Each
positive or negative word in a tweet counts 1 toward the corresponding score. Here,
for this tweet we get the score: positive = pretty (1) + good (1) = 2;
negative = 0
Tweet score = 2 - 0 = 2
All tweets with a positive tweet score are positive tweets and all tweets with a
negative tweet score are negative tweets. Any tweet with a tweet score of 0 is a
neutral tweet. Counting all of them gives a chart of positive, negative, and neutral
tweets.
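The labeling and counting rule can be sketched in a few lines of base R. The score vector below is hypothetical example data, not output from the project.

```r
# Hypothetical tweet scores (positive-word count minus negative-word count).
scores <- c(2, -1, 0, 3, 0, -2, 1)

# Label each tweet by the sign of its score.
polarity <- ifelse(scores > 0, "positive",
                   ifelse(scores < 0, "negative", "neutral"))

# Count tweets per class; fixed levels keep empty classes in the table.
polarity_counts <- table(factor(polarity,
                                levels = c("positive", "negative", "neutral")))
print(polarity_counts)
```

For these scores the counts are 3 positive, 2 negative, and 2 neutral; a table like this is what the bar diagram is drawn from.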
To calculate the polarity of the text we wrote an R script that calculates the tweet
score using the Naive Bayes algorithm. The R script is given below:
When we run this program it finds all the positive and negative words and analyzes
them to give each tweet its tweet score. Then, according to the tweet scores, it
creates a histogram plot of positive, negative, and neutral tweets.
After running the polarity class program we get the following diagram:
Figure 4.4 Polarity class bar diagram
A word cloud can be defined as an image composed of the words used in a particular text
or subject, in which the size of each word indicates its frequency or importance.
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a
specific word appears in a source of textual data (such as a speech, blog post, or
database), the bigger and bolder it appears in the word cloud.
Word clouds or tag clouds are graphical representations of word frequency that give
greater prominence to words that appear more frequently in a source text. The larger the
word in the visual the more common the word was in the document(s). This type of
visualization can assist evaluators with exploratory textual analysis by identifying words
that frequently appear in a set of interviews, documents, or other text. It can also be used
for communicating the most salient points or themes in the reporting stage.[9]
So we also generated a word cloud to find the hot topics, that is, what people are
talking about most nowadays. A word cloud is very important for understanding people’s
sentiment and what they are sad or happy about. Without the word cloud it would be
tough to find the key points behind the sentiment analysis bar chart or scoring.
There are many tools for creating a word cloud, where we just upload our document file
or a website link and they provide the word cloud. But for our program we created our
own word cloud analysis system.
We cleaned our existing data and removed unwanted content such as punctuation and stop
words. We converted the words to lower case and into plain text. Only then is a word
considered valid for the word cloud. The word with the highest score is drawn largest,
and words with lower values are drawn smaller.
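The frequency calculation behind a word cloud can be sketched in base R. This is an illustrative sketch only: the three tweets and the short stop-word list are hypothetical, and the drawing step itself is left out.

```r
# Hypothetical cleaned tweets and a tiny stop-word list, for illustration only.
tweets <- c("The news is good today", "Good news for the team", "Bad weather today")
stop_words <- c("the", "is", "for", "a", "an", "of")

# Lower-case, strip everything but letters, and split into words.
words <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
words <- words[nzchar(words) & !(words %in% stop_words)]

# Frequency table, most frequent first; a word cloud scales each word by this count.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

Here "good", "news", and "today" each occur twice and would be drawn larger than "bad", "team", and "weather", which occur once.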
We wrote an R script to create the word cloud for analyzing the Twitter data. The
script is given below:
Figure 4.5 Word cloud script
After running the script above we get our expected output, the word cloud of our
collected tweets. There are different themes for the word cloud, which define the color
of a keyword based on its value. Here we used the Dark2 theme for our word cloud.
The size of the word cloud depends on the number of tweets and keywords. A demo output
is given below:
Figure 4.6 Word cloud for tweet data
From the word cloud above we can see that the output is divided into 6 different
classes: joy, sadness, surprise, anger, fear, and unknown. Each class represents its
own type of emotion and the frequent words in its tweets. Each class has its own color,
so that the user can understand which word belongs to which class.
4.1.4 SCORING
The score is calculated from the positive and negative scores of the tweets. In this
section we show the number of tweets that share the same positive or negative sentiment
value. The larger the magnitude of the score, the stronger the sentiment. In our view, not
all tweets can be counted as equally positive or negative, because some are simply positive
or negative sentences, while others express positive or negative sentiment strongly.
This calculation was therefore added to measure how strong the sentiment is in the data we
got from Twitter. All tweets that have the same sentiment value are added to the same
sentiment score bar.
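One common way to obtain such a score is to count the positive words in a tweet and subtract the number of negative words. The sketch below assumes that approach; the report does not spell out its exact formula, and the tiny lexicons, the sample tweets and the function name `score_tweet` are hypothetical.

```r
# Tiny illustrative lexicons (assumptions, not the lexicon used in the project).
positive_words <- c("good", "great", "love", "happy")
negative_words <- c("bad", "sad", "hate", "terrible")

# Score = number of positive words minus number of negative words.
score_tweet <- function(tweet, pos, neg) {
  text  <- gsub("[[:punct:]]", " ", tolower(tweet))
  words <- unlist(strsplit(text, "[[:space:]]+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Hypothetical sample tweets.
tweets <- c("I love it, great phone, so happy!",
            "Bad battery.",
            "It is a phone.")

scores <- vapply(tweets, score_tweet, integer(1),
                 pos = positive_words, neg = negative_words,
                 USE.NAMES = FALSE)
print(scores)
```

Under this assumption, a tweet with several positive words receives a higher score than a tweet with one, which is what allows strong and mild sentiment to be separated.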
Consider five tweets as an example: the 1st tweet has the score -1, the 2nd tweet has the
score 4, the 3rd tweet has the score 1, the 4th tweet has the score 0 and the 5th tweet has
the score 1. The scores are therefore (-1, 4, 1, 0, 1), where two tweets have the score 1
and every other score occurs once. This gives the following counts for the bar chart:
Score    -1    0    1    2    3    4
Tweets    1    1    2    0    0    1
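The tally in the table can be reproduced with a few lines of base R. This is a minimal sketch using the five example scores from the text; the variable names are chosen for illustration only.

```r
# The five example scores from the text.
scores <- c(-1, 4, 1, 0, 1)

# Use every integer score between the minimum and the maximum as a level,
# so that scores no tweet received (here 2 and 3) appear with a count of 0.
score_levels <- seq(min(scores), max(scores))
score_counts <- table(factor(scores, levels = score_levels))

print(score_counts)
```

Converting the scores to a factor with explicit levels is what keeps the empty bars for 2 and 3 in the chart; a plain `table(scores)` would drop them.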
We did the same thing in R: we calculated the scores and plotted them as a bar chart. The R
program for the scoring output is given below:
Figure 4.8 Scoring histogram plot / bar chart program
In the given program we also stored the generated histogram plot (bar chart) in our working
directory for further analysis. In the plot we labeled the vertical axis “Number of tweets”
and the horizontal axis “Tweets having sentiment score”. The scores and the tweet counts
are calculated as described previously. The output histogram plot (bar chart) for the
scores is given below:
This figure shows the score-dependent output based on our preceding analysis. The output
differs depending on the time, the search keyword, the number of tweets collected and tweet
updates.
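The plotting and saving step described for the scoring program can be sketched as follows. This is not the program from Figure 4.8: the example scores, the PNG format and the file name `score_barplot.png` are assumptions, while the two axis labels are taken from the text.

```r
# Example scores; in the application these come from the scored tweets.
scores <- c(-1, 4, 1, 0, 1)
score_counts <- table(factor(scores, levels = seq(min(scores), max(scores))))

# Write the chart to a file in the current working directory.
out_file <- file.path(getwd(), "score_barplot.png")  # hypothetical file name
png(out_file, width = 800, height = 600)
barplot(score_counts,
        xlab = "Tweets having sentiment score",
        ylab = "Number of tweets")
invisible(dev.off())  # close the device so the file is written to disk
```

Closing the graphics device with `dev.off()` is required before the file can be opened for further analysis.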
After completing our analysis, we wanted to create a user interface with which general
users can analyze data. We chose Shiny as the tool for building this interface. The main
benefit of Shiny is that it creates web applications that interact with R.
A Shiny app requires two things: a UI and a server. They can be kept in the same file, such
as “app.R”, or in separate files, such as “ui.R” and “server.R”.
The UI defines the web page, including the input form and the places where outputs are
shown. The server contains the main program, which reads the inputs and produces the
outputs. We therefore kept our main program in the server file and created a web page with
a form in which we can search for anything we want to analyze. The outputs are shown in
another part of the page.
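The division between UI and server can be sketched as a single-file Shiny app. This is a minimal skeleton under stated assumptions, not the project's code: the input and output IDs (`keyword`, `analysis`, `view`, `class_plot`, `score_plot`) and the placeholder data are hypothetical, and the tweet collection and analysis are replaced by fixed values. The tab and button labels follow those mentioned in this report.

```r
library(shiny)

# UI: a search form on the left, tabbed plot outputs on the right.
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput("keyword", "Search keyword"),
      radioButtons("analysis", "Analysis type",
                   choices = c("Emotion", "Polarity")),
      actionButton("view", "VIEW")
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Sentiment analysis for given word", plotOutput("class_plot")),
        tabPanel("scores", plotOutput("score_plot"))
      )
    )
  )
)

# Server: reads the inputs and renders the outputs.
server <- function(input, output) {
  # Recompute only when the VIEW button is clicked.
  results <- eventReactive(input$view, {
    # Placeholder values; the real app would collect and analyze tweets
    # for input$keyword here.
    classes <- if (input$analysis == "Polarity") {
      c(positive = 3, negative = 1, neutral = 1)
    } else {
      c(joy = 2, sadness = 1, anger = 1, unknown = 1)
    }
    list(classes = classes, scores = c(-1, 4, 1, 0, 1))
  })

  output$class_plot <- renderPlot({
    barplot(results()$classes, ylab = "Number of tweets")
  })

  output$score_plot <- renderPlot({
    barplot(table(results()$scores),
            xlab = "Tweets having sentiment score",
            ylab = "Number of tweets")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app in a browser
```

Wrapping the analysis in `eventReactive` means nothing is computed until the button is pressed, which suits an application where each search triggers a slow data collection step.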
To run the program, we open the UI or server script in the R editor, where there is a “Run
App” button. Pressing it runs the program in the chosen window, for example a browser.
Figure 4.10 Running Application
When the web page opens in the browser, a search box appears on the left. We enter whatever
we want to search for and then click the “VIEW” button to view the analyzed result.
After that, the whole R program starts in the background. It collects tweets according to
the search input and preprocesses them. From the preprocessed data it creates an emotion
class diagram and plots it in the “Sentiment analysis for given word” tab.
To view the polarity class output, we select “Polarity” in the form on the left side. The
program then analyzes the polarity and plots the polarity class diagram instead of the
emotion class diagram.
Figure 4.13 Polarity class output
To view the word cloud output, we select the “WORDCLOUD” tab, where the word cloud is
displayed. Depending on the processing speed, it may take a while for the output to appear.
The last part of our output is the score histogram plot. To view it, we click on the
“scores” tab, and the plot appears in the web page.
CHAPTER 5
As we can see, the output of this project gives an overall statistical view for a selected
search keyword, from which we can find the emotional state associated with a targeted
keyword, data set or person. These outputs can be used in different sectors such as
education, research and business. We are still working on it to get better results and to
apply it to specific work that helps people. Learning and applying our methods for future
development remains our main challenge.
5.1 CONCLUSION
In this project we tried to develop a complete system for sentiment analysis of tweet data.
It collects tweet data, analyzes them and, based on the analysis, provides a statistical
sentiment structure for the search keyword. It is also easy for non-technical people to use
it and examine the output. While working on this project we learned a lot and also faced
many challenges. From this project we learned data mining and analysis, and we now have
real-time data analysis experience. We enjoyed our project work.
The main challenge we faced was resources. There are very few good websites for learning
data mining, and when we faced a problem it was difficult to find a solution. As a result,
we lost a lot of time searching for solutions.
Although it took a lot of time to learn this step by step, and each step was very small, we
learned from all of them. For our analysis we learned the R platform and language. We also
got an idea of the different packages available in it and how to work with them. We tried
to keep the project simple while obtaining an efficient result.
5.2 FUTURE SCOPE
Sentiment analysis is already evolving from general classification (positive, negative and
neutral) to a much more granular and deeper understanding. The demand for sentiment analysis
is therefore increasing in both research and business. Researchers are working on the
accuracy of the algorithms and the development of lexicon-based analysis. Business
companies, on the other hand, are working on market policy and customer satisfaction
analysis to develop their business.
Our research work has the potential to be used both commercially and for further research.
Commercially, business companies can find out their customer satisfaction from tweets, so
that they can change their business policy to improve their benefits and attract new
customers to their products. Researchers, on the other hand, can collect data about
individuals to get the sentimental status of a person and use that data for further
research. Our research work can also be used to improve the accuracy of algorithms. In
future we plan to implement more algorithms in our research work to make it more accurate
for sentiment analysis. We want to contribute more to this research field by continuing our
study.
REFERENCES
13. Lekha R. Nair, Dr. Sujala D. Shetty, Streaming Twitter Data Analysis Using Spark for
Effective Job Search, Journal of Theoretical and Applied Information Technology,
Vol. 80, No. 2, 20th October 2015.
14. Vishal A. Kharde, S.S. Sonawane, Sentiment Analysis of Twitter Data: A Survey of
Techniques, International Journal of Computer Applications (0975 – 8887), Volume 139
– No.11, April 2016.
15. Penchalaiah.C, Murali.G, Suresh Babu.A, Effective Sentiment Analysis on Twitter Data
using: Apache Flume and Hive, IJISET - International Journal of Innovative Science,
Engineering & Technology, Vol. 1 Issue 8, October 2014.
16. Manjunath Mulimani, Nireeksha V Shetty, Nikshitha Shetty, Renita Maria Lolo,
Rajashree L, Data Analysis on Current Affairs Using R, International Journal of
Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 4,
April 2015.
17. Lekha R. Nair, Dr. Sujala D. Shetty, Streaming Twitter Data Analysis Using Spark for
Effective Job Search, Journal of Theoretical and Applied Information Technology,
Vol. 80, No. 2, 20th October 2015.