SENTIMENT ANALYSIS USING RNN AND GOOGLE TRANSLATOR

Dipti Mahajan¹, Dev Kumar Chaudhary²
¹,² Amity University, Uttar Pradesh, India
¹ dipti7998@gmail.com, ² dkchaudhary@amity.edu

Abstract
Sentiment analysis refers to the use of natural language processing (NLP), text analysis, computational linguistics and biometrics. It is carried out on reviews or surveys of a product or a person in order to draw conclusions about the sentiments that people hold towards that product or person.
Opinion mining, which falls under sentiment analysis and is sometimes used as a synonym for it, also uses machine learning techniques: algorithms that help determine the overall review of a product or service. Such reviews are helpful to customers and buyers, and also to sellers, who can use them to improve their product.
This paper focuses on NLP techniques that use the Stanford library to increase the competence of the machine, so that it can classify more of the data fetched from Twitter through the Twitter application program interface (API). Twitter is nowadays a platform where tweets about a particular person or product can readily be found, which makes it a very effective source of data.
Google Translator is used to take into account data that is not in English. Fetched data sometimes includes sentences that are not in English, so to increase accuracy we use the Google Translator API in our project.
To classify sentiments as positive, negative or neutral, we use a neural network algorithm rather than algorithms such as maximum entropy or Naïve Bayes.
By combining these techniques we aim to achieve higher accuracy than other approaches.

I. INTRODUCTION
Sentiment analysis involves the analysis of the sentiments or opinions expressed about a particular product, service or person, which collectively yield a review that is positive, negative or neutral. Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic, or the overall contextual polarity or emotional reaction to a document, interaction, or event.
First of all, we fetch data from Twitter in order to perform the analysis of sentiments. Twitter is one of the most popular social networking platforms, with millions of statuses, termed "tweets", posted every day. That is why Twitter can be regarded as holding one of the largest data sets on the internet. Tweets can contain opinions on different products, services or people. Because tweets expressing opinions are posted every day, Twitter also serves as a platform for the direct marketing of particular brands of products and services.
The main obstacles faced when fetching tweets or data from Twitter are:
- Linguistic issues: fetched tweets or data may be in languages other than English.
- Polarity of tweets: most of the time, neutral sentences or tweets outnumber the positive and negative ones, which makes it hard to arrive at a definite review of a product.
- Limitation of sentiments: sometimes tweets lack sentiment because they are short and do not really disclose any sentiment.

In this project we tried to remove the obstacles listed above. The Google Translator API removes the linguistic issues by converting non-English tweets or sentences into English, which makes it easier for the machine to collect and read the data, since the machine is trained in English.
To increase the competence of the machine we use the Stanford library, which is an English-language library, so that the machine can more easily understand a sentence or tweet and classify it as positive, negative or neutral.
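
The translation step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `normalize_to_english`, `toy_is_english`, `toy_translate` and `TOY_PHRASEBOOK` are hypothetical names, and the toy phrasebook stands in for a call to a translation service such as the Google Translator API, whose client code the paper does not show.

```python
from typing import Callable, Iterable, List


def normalize_to_english(
    tweets: Iterable[str],
    is_english: Callable[[str], bool],
    translate: Callable[[str], str],
) -> List[str]:
    """Translate every non-English tweet so the classifier only sees English."""
    return [tweet if is_english(tweet) else translate(tweet) for tweet in tweets]


# Hypothetical offline stand-ins; a real system would call a translation
# service here. The phrases are illustrative, not data from the paper.
TOY_PHRASEBOOK = {
    "muy bueno": "very good",
    "sehr schlecht": "very bad",
}


def toy_is_english(text: str) -> bool:
    """Treat anything not found in the toy phrasebook as already English."""
    return text.lower() not in TOY_PHRASEBOOK


def toy_translate(text: str) -> str:
    """Look the phrase up in the toy phrasebook."""
    return TOY_PHRASEBOOK[text.lower()]


english_tweets = normalize_to_english(
    ["Great phone", "muy bueno", "sehr schlecht"],
    toy_is_english,
    toy_translate,
)
print(english_tweets)
```

Passing the language check and the translator in as functions keeps the sketch independent of any particular client library, so a real translation call could be substituted without changing the rest of the pipeline.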

978-1-5386-1719-9/18/$31.00 ©2018 IEEE
2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence)

To determine the polarity of the data we use a recurrent neural network (RNN), which we regard as the best choice because it is inspired by biological neural networks. Artificial neural networks follow the same approach as the human brain. An RNN lets each unit, or neuron, in the network take in information from the previous unit. RNNs can use their internal memory to process and analyze sequences of inputs.
The next sections contain Related Work, Methodology, Proposed Approach, Implementation, Conclusion and References.

II. RELATED WORK
The field of sentiment analysis involves machine learning techniques and a social network platform such as Twitter for fetching the data. Many researchers have carried out research in this field. Paper [1] proposed a set of machine learning techniques with semantic analysis for classifying sentences and product reviews based on data fetched from Twitter. They used the Naïve Bayes algorithm, which gave them better results than maximum entropy or SVM. Paper [2] surveys various machine learning techniques and compares them by level of accuracy. In this comparison they obtained 85% accuracy with supervised machine learning, which is higher than that of unsupervised learning techniques. Paper [3] applied bigram, unigram and object-oriented features to sentiment analysis. They chose an effective feature set to enhance the accuracy of classifiers. Paper [4] attempted to classify the sentiment of movie reviews using machine learning techniques. Two different algorithms, Naïve Bayes and Support Vector Machine (SVM), were implemented. It was observed that the SVM classifier performed better than every other classifier.
Paper [5] presented a sophisticated categorization of a large number of articles and an illustration of the research carried out in the field of sentiment analysis.
Paper [6] deduced that lexicon-based methods are also very competitive because they require less effort in human-labeled documents and are not sensitive to the quality and quantity of the training dataset.
Paper [7] explored the application of recursive neural networks to the task of sentiment analysis of tweets. They experimented with different neural networks with different numbers of hidden layers.
Paper [8] presents an analysis of the first decade of research on sentiment analysis.
Paper [9] presents the analysis of sentiments through a similar approach based on classification algorithms.
Papers [10] and [11] present the evolution over time of sentiment analysis and the techniques used in it.
Papers [12] and [13] explore and present the mapping techniques and the usual problems faced during the process of sentiment analysis.
Papers [14] and [15] carry out the analysis and processing of sentiments over social web data.

III. METHODOLOGY
The methodology used to carry out the sentiment analysis of a product, service or person is to classify the sentiments into three categories, positive, negative or neutral, and to form a review on the basis of these categories.
The techniques used to carry out sentiment analysis are machine learning techniques involving the RNN algorithm and NLP; to improve the competence and accuracy of the machine we introduce the Stanford library. For the data fetched from Twitter, on which sentiment analysis is to be performed, we use Google Translator to remove linguistic issues.
Firstly, we fetch the data from Twitter by creating a Twitter developer account and obtaining a consumer key and a consumer secret key, then using the Twitter API to request that Twitter's server authorize us to collect data from it. After getting the data, we have to handle it.
After fetching the data from Twitter we pass it to Google Translator, and by using the Google Translator API we convert data that is not in English into English, which in turn makes the classification of the fetched data more accurate.
After passing the data through Google Translator, we carry out the pre-processing of the data. We pre-process the data by removing unwanted words, that is, words that are not needed in further processing.
For further processing we use machine learning techniques that involve NLP, in which there is an interaction between the computer and human language. It is an approach of making the machine understand our language rather than us learning the machine's language. So, to increase the machine's competence to classify data accurately, we use the Stanford library, as it contains a large number of English sentences, which in turn increases the understanding and competence of the machine so that it can eventually take in more and more of the sentences that are fetched.
Now, to classify the data into positive, negative and neutral sentiments we use the RNN algorithm, because we regard it as the best algorithm when compared with other classification algorithms such as Naïve Bayes, maximum entropy and SVM. We regard RNN as the best algorithm because its competence is greater than that of the others and it works on the same approach as the human brain. It follows the biological neural network. In an RNN each neuron, or unit, can take information from the previous neuron and carry out further processing. Each node (neuron) has a time-varying real-valued activation. Each connection (synapse) has a modifiable real-valued weight. Nodes are either input nodes (receiving data from outside the network), output nodes
(yielding results), or hidden nodes (that modify the data en route from input to output). Only unpredictable inputs of some RNN in the hierarchy become inputs to the next higher-level RNN, which therefore recomputes its internal state only rarely. Each higher-level RNN thus studies a compressed representation of the information.

IV. PROPOSED APPROACH
1. FETCHING THE DATA FROM TWITTER
2. HANDLING THE DATA BY GOOGLE TRANSLATOR
3. PRE-PROCESS THE DATA
4. CLASSIFICATION
5. RESULT

V. IMPLEMENTATION
Fetching the Data:
Data is collected from Twitter by creating a developer account and obtaining a consumer key and a consumer secret key as tokens of authorization. Twitter's API is used as a library tool to collect tweets from the internet for the analysis of sentiments. We then handle the data by translating data that is not in English into English using the Google Translator API, which provides the tools that translate non-English sentences into English.
Pre-processing of Data:
The pre-processing techniques isolate those words in the tweets that are of no use for further processing. These words are regarded as pointless words. This process is considered the most fundamental step in the whole process. Removing pointless words is necessary because they directly influence the ability of the classifiers.
Classification:
Features are extracted from the tweets and the polarity of those features is determined, that is, whether they are positive, negative or neutral. For this classification we used the RNN (Recurrent Neural Network) algorithm.
RNN: suppose there is a 5-layer neural network, with one layer considered for each word. The formula that carries out the computation is as follows (FIG-Y, [17]):

$s_t = f(U x_t + W s_{t-1})$

Here the hidden state $s_t$ is calculated on the basis of the previous hidden state $s_{t-1}$ and the input $x_t$ at the current step.

FIG-X, [16]

VI. CONCLUSION
It is concluded that an accuracy of more than 90% was achieved by using the RNN (Recurrent Neural Network) algorithm. The amount of data taken into account for classification was larger than the quantity usually taken by other research papers, because in this project we used the Stanford library and Google Translator. These increase the competence of the machine, since we provide it with a whole set of English sentences to learn and understand, so that it does not neglect any data and the accuracy we obtain is higher than that of others.

The data on which we draw our conclusion was a data set fetched from Twitter regarding 'Mr. Narendra Modi', Prime Minister of India. He is a famous entity about whom we fetched around 10,000 sentences, and since we used the Stanford library to increase the power of NLP (Natural Language Processing), the data classified in the end was more than half of the sentences that were taken into consideration, which is how this research proves to be more efficient than others. The accuracy obtained in comparison with other algorithms is as follows:

Classifiers                      Features                                     Accuracy (%)
RNN (Recurrent Neural Network)   Same approach as biological neural network,  90.3
                                 more competence power
Naïve Bayes                      Object-oriented, bigram features             79.54
Support Vector Machine           Object-oriented, bigram features             79.56

FIG-1
x-axis: Algorithm
y-axis: Accuracy level
Fig-1 : Comparison of accuracy between Naïve Bayes and SVM, which is nearly the same.

FIG-2
Fig-2 : Represents the comparison of accuracy between Naïve Bayes, SVM and RNN.

VII. REFERENCES
[1] Geetika Gautam and Divakar Yadav, Sentiment analysis of twitter data using machine learning approaches and semantic analysis, 978-1-4799-5173-4, IEEE, 2014.

[2] Bhavitha Bk, Anisha P Rodrigues, Niranjan N Chiplunkar, Comparative study of machine learning techniques in sentimental analysis, 978-1-5090-5297-4, IEEE, 2017.

[3] Bac Lee and Huy Nguyen, Twitter sentiment analysis using machine learning techniques, © Springer International Publishing Switzerland, 2015.

[4] Abhinash Tripathy, Ankit Agarwal, Santanu Kumar Rath, Classification of sentimental reviews using machine learning techniques, March 2015.

[5] Walaa Medhat, Ahmed Hassan, Hoda Korashy, Sentimental analysis algorithms and applications: A survey, March 2014.

[6] Hailong Zang, Wenyan Gan, Bo Ziang, Machine learning and lexicon based methods for sentiment classification: A survey, IEEE, 2014.

[7] Ye Yuan, You Zhou, Twitter Sentimental analysis with RNN, cs224d.stanford.edu/reports/YuanYe.pdf

[8] Oskar Ahlgren, Research On Sentiment Analysis: The First, 2017.

[9] Rudy Prabowo, Mike Thelwall, Sentiment Analysis: A Combined Approach, vol. 3, issue 2, 143-157, 2009.
[10] Mika V. Mäntylä, Daniel Graziotin, Miikka Kuutila, The Evolution of Sentiment Analysis - A Review of Research Topics, Venues, and Top Cited Papers, February 2017, Pages 16-32, ISSN 1574-0137.

[11] Seema Chithore, D. A. Phalke, Sentiment analysis algorithms and applications: A survey, December 2014, Pages 1093-1113.

[12] R. Piryani, D. Madhavi, V.K. Singh, Analytical mapping of opinion mining and sentiment analysis research during 2000–2015, January 2017, Pages 122-150.

[13] Vishal A. Kharde, Sheetal Sonawane, Sentimental Analysis of Twitter Data: A survey of Techniques, International Journal of Computer Applications 139(11): 5-15, April 2016.

[14] Doaa Mohey El-Din, Hoda M.O. Mokhtar, Osama Ismael, Online Paper Review Analysis, Vol. 6, No. 9, 2015.

[15] Raj Kumar Verma, Ritu Tiwari, Nirmal Roberts, Sentiment Analysis of Social web data: A Review, 307465189, March 2016.

[16] http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/09/rnn.jpg - Image address of FIG-X.

[17] http://s0.wp.com/latex.php?latex=s_t%3Df%28Ux_t+%2B+Ws_%7Bt-1%7D%29&bg=ffffff&fg=000&s=0 - Image address of FIG-Y.
