net/publication/328622424
All content following this page was uploaded by Divakar Yadav on 09 February 2019.
1 Introduction
Today, many people use social networking sites such as Twitter, Facebook, and
LinkedIn to share their views with the world, which makes these sites some of
the most widely used communication tools. As a result, a large volume of data
(known as big data) is generated, and sentiment analysis was introduced to
analyze the reviews it contains. Sentiment Analysis (SA) is the process of
determining whether a given text expresses a positive, negative, or neutral
opinion. It is also used to detect people's emotions, to support decision
making, etc. The formal definition of sentiment analysis is “extracting the
semantics and determining the attitude of a speaker which conclude either
positive, negative or neutral reaction.” The term was first used in 2003.
It has also been applied to the analysis of pre- or post-criminal activity on
social media, product reviews, movie reviews, news, blogs, etc. The advantages
of sentiment analysis include improving products, driving innovation, and
supporting market growth [1]. This method is also known as opinion mining. The
analysis depends entirely on the context provided by the speaker. Sentiment
analysis is handled at several levels of granularity, i.e. the document level,
the sentence level, and the phrase level. Its best-known use is in the analysis
of reviews of items and services given by users. It is an application of
natural language processing (NLP) and is commonly used in recommender systems.
In our paper, we use data from Twitter. Twitter is an online social networking
site that provides a virtual environment for people who are interested in
interacting with one another. It helps people express their thoughts on a
subject. People post their views on numerous topics such as recent issues,
party-political issues, Bollywood, Hollywood, etc. Many NLP techniques are used
to detect the sentiment of tweets, such as stop word removal, part-of-speech
tagging, and named entity recognition (NER), followed by bag-of-words
representations etc. These techniques use dictionaries as references. Since no
training is required, they need less computational power. We use a
lexicon-based approach, which classifies the text into two classes, “Positive”
and “Negative”, with the help of dictionaries. The challenges that arise during
feature extraction and classification of the text are given below; some of them
are addressed by cleaning the text data set.
– Handling big data, which consists of the opinions given by people.
– Use of informal language, slang words, abbreviations, or emoticons.
– Spelling mistakes and typos.
– Detection of sarcasm [2]. E.g. “Don't bother me. I am living happily ever
after.” Sarcasm: the speaker is taunting as well as hurting the person.
– Ambiguous sentences used by a user. E.g. “I have never tasted a pizza quite
like that one before!” Ambiguity: was the pizza good or bad?
– Hashtag-based text detection [3].
– Detecting the hidden sentiment of a user.
– Polarity shift detection [4].
The analysis of Twitter data is a growing field that needs substantially more
attention. There are various methods for classifying tweets into the positive
or negative class. Some researchers use a machine learning approach and others
use a lexicon-based method. The ultimate goal is to extract the sentiments from
the given dataset.
In our paper, we use the R language for our experiment. R is freely available
software used for statistical computation, data manipulation, and graphical
display. It is a dialect of S, a language designed by John M. Chambers and
colleagues at Bell Laboratories in the 1970s. It provides many statistical
techniques such as clustering and classification. It runs on all major
operating systems (Windows, Unix, macOS). It has become popular because it
provides the following facilities:
The corpus is a collection of tweets about the Hon'ble Prime Minister Narendra
Modi. The dataset was collected with the help of the Twitter API, which
requires authentication before the tweets can be accessed. We acquired about
150 tweets, using the following R command to extract them:
# Extract the tweets (uses the twitteR package; an authenticated session is required)
library(twitteR)
modi.tweets <- searchTwitter("Modi", n = 150)
To improve performance, the dataset should not contain any noise, i.e. it
should be a clean dataset. In this section, we remove noise from the tweets
after extracting them. Noise is removed because it does not provide any
knowledge relevant to the output we want. Moreover, scanning the dataset also
scans the useless data, which consumes a lot of time (CPU cycles are wasted).
For these reasons, we eliminate the noise (useless data) from the tweets. After
the tweets are extracted with R, the next step is to clean the data. During
cleaning, emoticons, URLs, punctuation marks, and targets are removed, as shown
in Fig. 3.
The data are cleaned because the sentiments of the user are easier to identify
once this useless data is removed. For example, “I like a @YouTube video
http://t.co/et8m Shyam Rangilla performed good” becomes “I like a @YouTube
video Shyam Rangilla performed good” after cleaning. Likewise, in “I love my
India , ” the emoticons are removed, leaving “I love my India”.
Fig. 3. Extracted clean tweets (10) about Prime Minister Modi.
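As a minimal sketch of the cleaning step described above, the following base R
function removes URLs, emoticons, and punctuation from a tweet. The function
name clean_tweet and the regular expressions are illustrative assumptions, not
necessarily the commands used in the experiment. In particular, emoticons are
approximated as any character outside printable ASCII, and a target such as
@YouTube only loses its “@” sign together with the other punctuation.

```r
# Illustrative cleaning function (hypothetical name; not taken from the paper).
clean_tweet <- function(x) {
  # Remove URLs first, so that their punctuation is not processed separately.
  x <- gsub("http[^[:space:]]*", "", x)
  # Approximate emoticon removal: replace every character outside
  # printable ASCII with a space.
  x <- gsub("[^\\x20-\\x7E]", " ", x, perl = TRUE)
  # Remove punctuation marks (this also strips the "@" and "#" signs).
  x <- gsub("[[:punct:]]", "", x)
  # Collapse repeated whitespace and trim both ends.
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Example inputs (made up for illustration).
clean_tweet("I love my India \U0001F600 !")
clean_tweet("Good speech http://t.co/abc today.")
```

URLs are removed before punctuation on purpose: stripping punctuation first
would break a URL into word-like fragments that the URL pattern no longer
matches.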
5 Conclusion
Sentiment analysis is the method of investigating the sentiments of a given
text so that good decisions can be made for improvement. There are two main
approaches: lexicon-based and machine learning-based. We have focused on the
lexicon-based approach. In our experiment, we use a Twitter dataset and two
manually designed dictionaries (Positive and Negative). We used the R language
for our experiments. Some of the tweets about the Hon'ble Prime Minister
Narendra Modi are shown. For each sentence, the difference between the number
of positive words and the number of negative words was calculated and stored in
the variable Score. Score states the polarity of the sentence, i.e. whether it
is a positive or a negative sentence. If the score has a positive value, the
sentence is positive; otherwise it is negative. The result is shown in Fig. 5.
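The scoring rule above can be sketched in base R as follows. The two word lists
are small hypothetical stand-ins for the manually designed dictionaries, and
the function names score_sentence and classify_sentence are illustrative
assumptions. Only the rule itself (number of positive words minus number of
negative words; positive if the score is above zero, otherwise negative) comes
from the description above. The sketch assumes the sentence has already been
cleaned.

```r
# Hypothetical miniature dictionaries; the real dictionaries are not reproduced here.
pos.words <- c("good", "love", "great", "happy")
neg.words <- c("bad", "hate", "poor", "sad")

# Score = (number of positive words) - (number of negative words).
score_sentence <- function(sentence, pos.words, neg.words) {
  # Lower-case the sentence and split it into words on whitespace.
  words <- unlist(strsplit(tolower(sentence), "[[:space:]]+"))
  sum(words %in% pos.words) - sum(words %in% neg.words)
}

# A sentence is "Positive" only if its score is above zero, otherwise "Negative".
classify_sentence <- function(sentence, pos.words, neg.words) {
  if (score_sentence(sentence, pos.words, neg.words) > 0) "Positive" else "Negative"
}

score_sentence("I love my India", pos.words, neg.words)
classify_sentence("I love my India", pos.words, neg.words)
```

Note that under this rule a sentence with no dictionary words, or with equally
many positive and negative words, has a score of zero and is therefore labelled
negative.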
In the future, we will use a machine learning approach and compare its results
with those of the lexicon-based approach. In addition, we will consider the
emoticons, discourse words, and slang words used in tweets to express feelings.
Hybrid language and complex sentences will be considered too.
References
1. H. Thakkar and D. Patel, “Approaches for sentiment analysis on twitter: A
state-of-art study,” arXiv preprint arXiv:1512.01043, 2015.
2. M. Bouazizi and T. Ohtsuki, “A pattern-Based approach for Sarcasm Detection
on Twitter,” IEEE Access, vol. 4, pp. 5477–5488, 2016.
3. A. Joshi, P. Bhattacharyya and M. J. Carman, “Automatic sarcasm detection: A
survey,” arXiv preprint arXiv:1602.03426, 2016.
4. R. Xia, F. Xu, C. Zong, Q. Li, Y. Qi and T. Li, “Dual sentiment analysis: Consider-
ing two sides of one review,” IEEE transactions on knowledge and data engineering,
vol. 27, pp. 2120–2133, 2015.
5. M. Hu and B. Liu, “Mining and summarizing customer reviews,” Proceedings of
the tenth ACM SIGKDD international conference on Knowledge discovery and data
mining, ACM, pp. 168–177, 2004.
6. X. Ding, B. Liu and P. S. Yu, “A holistic lexicon-based approach to opinion
mining,” ACM, pp. 231–240, 2008.
7. M. Taboada, J. Brooke, M. Tofiloski, K. Voll and M. Stede, “Lexicon-based meth-
ods for sentiment analysis,” Computational linguistics, vol. 37, pp. 267–307, 2011.
8. M. Kanakaraj and R. M. R. Guddeti, “NLP based sentiment analysis on Twitter
data using ensemble classifiers,” IEEE, pp. 1–5, 2015.
9. J. D. Rennie, L. Shih, J. Teevan and D. R. Karger, “Tackling the poor assumptions
of naive bayes text classifiers,” Proceedings of the 20th International Conference on
Machine Learning (ICML-03), pp. 616–623, 2003.
10. S. Schrauwen, “Machine learning approaches to sentiment analysis using the Dutch
Netlog Corpus,” Computational Linguistics and Psycholinguistics Research Center,
pp. 30–34, 2010.
11. M. Bouazizi and T. Ohtsuki, “A Pattern-Based Approach for Multi-Class Sentiment
Analysis in Twitter,” IEEE Access, vol. 5, pp. 20617–20639, 2017.
12. R. Xia, F. Xu, C. Zong, Q. Li, Y. Qi and T. Li, “Dual sentiment analysis: Consider-
ing two sides of one review,” IEEE transactions on knowledge and data engineering,
vol. 27, pp. 2120–2133, 2015.
13. E. Boiy and M.-F. Moens, “A machine learning approach to sentiment analysis
in multilingual Web texts,” Information retrieval Springer, vol. 12, pp. 526–558,
2009.
14. G. Paltoglou and M. Thelwall, “Twitter, MySpace, Digg: Unsupervised sentiment
analysis in social media,” ACM Transactions on Intelligent Systems and Technology
(TIST), vol. 3, p. 66, 2012.
15. T. Wilson, J. Wiebe and P. Hoffmann, “Recognizing contextual polarity: An explo-
ration of features for phrase-level sentiment analysis,” Computational linguistics,
vol. 35, pp. 399–433, 2009.
16. A. P. Jain and V. D. Katkar, “Sentiments analysis of Twitter data using data
mining,” 2015 International Conference on Information Processing (ICIP), IEEE,
pp. 807–810, 2015.
17. H. Parveen and S. Pandey, “Sentiment analysis on Twitter Data-set using Naive
Bayes algorithm,” 2016 2nd International Conference on Applied and Theoretical
Computing and Communication Technology (iCATccT), IEEE, pp. 416–419, 2016.
18. V. Jha, N. Manjunath, P. D. Shenoy, K. R. Venugopal and L. M. Patnaik, “Homs:
Hindi opinion mining system,” 2015 IEEE 2nd International Conference on Recent
Trends in Information Systems (ReTIS), IEEE, pp. 366–371, 2015.
19. E. Boiy and M.-F. Moens, “A machine learning approach to sentiment analysis
in multilingual Web texts,” Information retrieval, Springer, vol. 12, no. 5,
pp. 526–558, 2009.
20. S. Y. Ganeshbhai and B. K. Shah, “Feature Based Opinion Mining: A Survey,”
2015 IEEE International Advance Computing Conference (IACC), pp. 919–923, 2015.
21. Available at: http://rfunction.com/archives/1984