
Sentiment Analysis Based on Weighted Word2vec and Att-LSTM

Huanhuan Yuan, Yongli Wang, Xia Feng
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing
8615720622990, 8618936032016, 8618761878717
15720622990@163.com, yongliwang@njust.edu.cn, 779477284@qq.com

Shurong Sun
Zhenjiang Analysis InfoTech Ltd, Zhenjiang
8618851026397

ABSTRACT
The Internet has become an indispensable part of modern people's lives, and the sentiment analysis of text generated on the Internet has gradually become a research hotspot. Through the sentiment analysis of texts, information such as the public's emotional state, views on social phenomena, and preferences for a product can be obtained, which contributes to commercial value and social stability. The common research methods are based on traditional machine learning algorithms: according to hand-labeled sentiment lexicons, machine learning algorithms such as naive Bayes, support vector machines, and maximum entropy methods are used to perform sentiment analysis on textual information. To reduce the dependence on hand-built sentiment lexicons and to highlight the role of keywords in review texts, this paper proposes a weighted word2vec representation and adds an Attention mechanism to the Long Short-Term Memory (LSTM) model. Experimental results show that the method is significantly better than traditional machine learning methods.

CCS Concepts
• Computing methodologies → Neural networks

Keywords
Sentiment analysis; word2vec; TFIDF; Att-LSTM

CSAI '18, December 8–10, 2018, Shenzhen, China. © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6606-9/18/12. DOI: https://doi.org/10.1145/3297156.3297228

1. INTRODUCTION
The main task of text sentiment analysis is to identify the emotional information that users express in emotional texts. At present, textual sentiment mainly involves two kinds: positive emotions and negative emotions. Common sentiment analysis methods are mainly divided into two categories: (1) unsupervised classification methods based on an emotional dictionary[1], which determine the emotional tendency of a text by calculating the relative numbers of positive and negative affective words in the text; (2) supervised classification methods based on machine learning[2], which use an annotated corpus, select the most distinguishing features through feature functions, and then use a machine learning algorithm to train on the corpus to obtain a classification model for emotion classification.

The popular method of text sentiment analysis is based on traditional machine learning: features of the text are designed manually, and machine learning methods are then used to perform sentiment analysis. Commonly used traditional machine learning methods are naive Bayes[3], support vector machines[4], maximum entropy methods, etc. These methods can all be classified as shallow learning methods. Shallow learning methods are relatively simple and easy to implement, but their ability to express complex functions is limited, which restricts their generalization ability on complex classification problems. To compensate for this deficiency, shallow learning models introduce hand-crafted features, such as artificially tagged emotional dictionaries and syntactic and grammatical analysis. Although hand-crafted features can effectively improve the accuracy of text sentiment analysis, they take time and effort because of the excessive manual labeling of data required, and they also depend on prior knowledge. With the development of the Internet and the continuously expanding scale of text data, these methods are severely limited.

In the past two years, deep learning technology has developed rapidly, and it has been found that sentiment analysis of text based on deep learning techniques outperforms traditional learning methods. With deep learning methods, one does not need to rely on any manually labeled sentiment lexicon or syntactic analysis results; a neural network can classify emotions directly. However, many neural network models ignore the role of keywords.

Attention mechanisms have achieved excellent results in the text and image fields, such as image recognition (Mnih [5]), machine translation (Bahdanau [6]), textual entailment (Rocktaschel [7]) and sentence summarization (Rush [8]). More importantly, neural attention can improve reading comprehension (Hermann [9]). This paper presents the Weighted W2V-Att-LSTM sentiment analysis model. The contributions of this article are as follows:

1) Combine the word2vec and TFIDF algorithms to capture contextual semantic information and calculate the importance of vocabulary in the text;

2) Based on the Long Short-Term Memory model, add the Attention mechanism to text sentiment analysis to highlight the role of keywords in the review text.

2. RELATED WORK
2.1 Word Vector Representation
In one-hot representation, only the element corresponding to the target word is non-zero; all other elements are zero. Therefore, one-hot representation suffers from the dimension disaster and the "lexical gap" phenomenon.

Distributed representation solves this problem by mapping each word to a shorter word vector during the training process. However, the dimension of the short word vector generally needs to be specified manually.

Before the appearance of word2vec, neural networks had already been used to train word vectors. Bengio[10] used a three-layer neural network to construct a language model, but the training process is very time-consuming: the vocabulary generally contains more than a million words, which makes computing the output probability of each word very expensive. Mnih[11] proposed a Log-Bilinear model to train the language model. To optimize the Log-Bilinear model, Mikolov[12] provided a log-bilinear model that removed the hidden layer of the neural network and used only its linear representation capability to compute real-valued word vectors.

Traditional text representation methods have the disadvantages of high dimensionality, sparseness, and lack of semantic information. This paper uses the word2vec model to represent vectors. The word2vec method trains an N-gram language model through a neural network machine learning algorithm and generates the vector corresponding to each word during the training process. The neural network language model can map text to a low-dimensional vector, and the word vectors obtained by training carry semantic information. The word2vec word vector captures the semantic information of the context, but the word2vec model cannot distinguish the importance of the vocabulary in the text. This paper therefore uses the TFIDF algorithm to weight the word2vec model.
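As an illustration, the following is a minimal sketch of how such word vectors might be trained with the gensim library; the toy corpus and the parameter values here are our own assumptions, not settings reported in this paper (Chinese text would be word-segmented beforehand):

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens.
corpus = [
    ["the", "hotel", "room", "was", "clean", "and", "quiet"],
    ["terrible", "service", "and", "a", "noisy", "room"],
]

# Train a skip-gram word2vec model. The word vector dimension
# (vector_size) must be specified manually, as noted above.
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=10)

vec = model.wv["hotel"]  # 100-dimensional vector for "hotel"
```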

2.2 Sentiment Analysis Based on Machine Learning
At present, the main research methods of sentiment analysis are divided into two categories: one is based on emotional dictionaries and rules, and the other, more commonly used, is based on machine learning. Turney [13] used pointwise mutual information (PMI) to expand the reference dictionary, compensating for the incompleteness of the emotional dictionary; Yang[14] extracted and analyzed relevant features of emotional words, then used SVM to identify and classify sentences. Pang[15] attempted to use n-gram models with an SVM classifier for emotion classification, and obtained the best classification results by selecting unigrams as features. However, in sentiment analysis based on machine learning, the main job is to design features manually; this work is ad hoc and time-consuming. Deep neural network technology can use models to automatically learn deep features of text, and recurrent neural network models in particular are well suited to feature learning on sequential data such as text.

Building on previous research, this paper addresses the vanishing gradient problem in text feature selection. Through the control of the three gates of the LSTM model, the long-term dependency problem in RNN training is solved. At the same time, the Attention mechanism is added to the LSTM model to obtain a semantic code containing the attention probability distribution over the input sequence nodes. It is used as the input of the classifier to reduce information loss and information redundancy in the feature extraction process, which effectively highlights the role of keywords.

3. SENTIMENT ANALYSIS MODEL
3.1 Model Framework
The model for implementing sentiment analysis in this paper is shown in Figure 1. First, we input the preprocessed emotion text to word2vec. Word2vec constructs a vector for each word, and each word vector is assigned a weight by the TFIDF algorithm. The process then uses the LSTM network with the Attention mechanism to train feature vectors, so that the output feature vectors contain both word semantic features and word sequence features. Finally, a softmax regression classifier is used in the output layer to predict the text's emotional orientation.

Figure 1. Framework of the sentiment analysis model

3.2 Weighted Word2vec
The word2vec model is trained on each sentence in the data set by sliding a fixed window over the sentence. It predicts the vector of the word in the middle of the fixed window according to the context of the sentence, and then trains the model according to a loss function and an optimization method.

The word2vec model captures the semantic information of the context, but it cannot distinguish the importance of the vocabulary in the text. The TFIDF algorithm is one of the important algorithms for calculating the weight of feature items. Therefore, this paper proposes a weighted word2vec model based on the TFIDF algorithm: on top of the word2vec model, we form a weighted representation of the review document vector by combining it with the TFIDF algorithm.

Algorithm 1 Weighted W2V
Input: word vectors w_t
Output: weighted comment vector x_d
Step:
While (text set is not empty) do
    Compute the term frequency of word t in a single document:
        tf_{t,d} = n_{t,d} / Σ_k n_{k,d}
    Compute the inverse document frequency of word t over the whole document set:
        idf_t = log(M / m_t)
    Compute the weight of word t:
        K_{t,d} = tf_{t,d} × idf_t
    Multiply each w_t by its weight and sum to form the weighted comment vector x_d:
        x_d = Σ_{t∈d} K_{t,d} · w_t
End while

In the above algorithm, a set D includes M comment texts. Each text d_i (i = 1, 2, …, M) has been segmented into words and is fed to the word2vec model to obtain the N-dimensional word vector w_t = (w_{t,1}, w_{t,2}, …, w_{t,N}) for each word. This paper uses the TFIDF algorithm to calculate the weight K_{t,d} of every word in the text, expressed as the weight of word t in text d_i (i = 1, 2, …, M). TFIDF comprehensively considers the probability of a word appearing in a single text and the weight of the word in the entire text set. n_{t,d} denotes the number of occurrences of word t in text d, and Σ_k n_{k,d} denotes the total number of word occurrences in text d. M is the total number of training texts, and m_t is the number of texts in the training set in which word t appears. tf_{t,d} is the term frequency of word t in d, and w_t is the word vector of word t.
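A minimal Python sketch of Algorithm 1 is given below. The function and variable names are illustrative assumptions, and `wv` stands for any trained word2vec lookup (for example, gensim's `model.wv`):

```python
import math
from collections import Counter
import numpy as np

def weighted_doc_vectors(docs, wv, dim):
    """Sketch of Algorithm 1: TFIDF-weighted word2vec comment vectors.

    docs: list of tokenized comment texts (the set D, |D| = M);
    wv:   mapping from word to its N-dimensional word2vec vector;
    dim:  the word vector dimension N.
    """
    M = len(docs)
    # df[t] is m_t: the number of texts in which word t appears
    df = Counter(t for d in docs for t in set(d))
    vectors = []
    for d in docs:
        counts = Counter(d)             # n_{t,d}
        total = sum(counts.values())    # sum_k n_{k,d}
        x = np.zeros(dim)
        for t, n in counts.items():
            if t not in wv:
                continue
            tf = n / total                       # tf_{t,d}
            idf = math.log(M / df[t])            # idf_t
            x += tf * idf * np.asarray(wv[t])    # K_{t,d} * w_t
        vectors.append(x)
    return vectors
```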

3.3 LSTM Based on Attention Mechanism
The LSTM architecture consists of a memory cell c and three gates: an input gate i, an output gate o and a forget gate f. The outputs of the three control gates are each connected to a multiplication unit to control the input, output, and state of the cell unit. Formally, the LSTM can be expressed as:

    i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
    f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
    o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
    c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
    c_t = f_t ∙ c_{t-1} + i_t ∙ c̃_t
    h_t = o_t ∙ tanh(c_t)                                    (1)

σ is the sigmoid activation function; W, U, V and b represent coefficient matrices and offset vectors. i_t, f_t and o_t are the outputs of the three gates at time t, c_t is the state of the memory cell at time t, and h_t is the output of the LSTM cell at time t.
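To make Eq. (1) concrete, here is a minimal NumPy sketch of a single LSTM step; the parameter dictionary `P` and its key names are our own illustrative convention, not part of the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM step following Eq. (1); P holds the weight matrices
    W_*, U_* and bias vectors b_* of the three gates and the cell."""
    i = sigmoid(P["W_i"] @ x_t + P["U_i"] @ h_prev + P["b_i"])  # input gate
    f = sigmoid(P["W_f"] @ x_t + P["U_f"] @ h_prev + P["b_f"])  # forget gate
    o = sigmoid(P["W_o"] @ x_t + P["U_o"] @ h_prev + P["b_o"])  # output gate
    c_tilde = np.tanh(P["W_c"] @ x_t + P["U_c"] @ h_prev + P["b_c"])
    c_t = f * c_prev + i * c_tilde   # memory cell state at time t
    h_t = o * np.tanh(c_t)           # output of the LSTM cell at time t
    return h_t, c_t
```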
Based on the idea of the Attention Model, this paper designs an Attention-based LSTM model combined with the characteristics of sentiment analysis. The model mainly designs a new method of attention probability calculation, and uses it to generate a semantic code containing the attention probability distribution while generating the final feature vector.

The Attention-based LSTM (Att-LSTM) model preserves the intermediate outputs of the input sequence by retaining the LSTM encoder, then trains the model to selectively learn these inputs and associate the output sequence with the model output. Figure 2 shows the architecture of the Att-LSTM model. The attention mechanism calculates the weight of each historical node's influence on the current node and forms the attention probability distribution. The Attention mechanism breaks the restriction that the traditional encoder-decoder structure depends on a fixed-length vector.

Figure 2. LSTM model architecture based on attention mechanism

The input sequence of the text is X = (x_1, x_2, …, x_T). The average of the input vectors of the historical nodes represents the overall text vector x_{T+1}, which is the last input of the encoding stage. h_1, h_2, …, h_T are the hidden layer state values corresponding to the input sequence x_1, x_2, …, x_T, and h_{T+1} is the hidden layer state value corresponding to the input x_{T+1}. α_i in the figure is the attention probability of historical node i with respect to the last node. The influence weight of the input sequence x_1, x_2, …, x_T on the text can thus be calculated, which highlights the role of keywords and reduces the influence of non-keywords on the overall semantics of the text. The Att-LSTM model involves two calculations:

Step 1: Calculate the attention probability distribution.

    e_i = V ∙ tanh(W h_i + U h_{T+1} + b)
    α_i = exp(e_i) / Σ_{j=1}^{T} exp(e_j)                    (2)

α_i represents the attention probability weight of node i with respect to the last node. T is the number of elements of the input sequence; V, W and U are weight matrices; h_{T+1} is the hidden layer state corresponding to the last input; and h_i is the hidden layer state corresponding to the i-th element of the input sequence.

Step 2: Calculate the semantic code and the feature vector from the attention probability distribution.

    C = Σ_{i=1}^{T} α_i h_i
    h'_{T+1} = H(C, x_{T+1}, h_T)                            (3)

The semantic code C is obtained by accumulating the products of the attention probability weights and the hidden layer states of the historical input nodes. We use the semantic code of the historical nodes' attention probability distribution together with the overall text vector as the input of the LSTM module; the hidden state value h'_{T+1} of the last node is then the final feature vector. h'_{T+1} contains the weight information of the historical input nodes and highlights the semantic information of the key nodes.

Finally, the softmax layer transforms the feature vector into a conditional probability distribution, where W_s and b_s are the parameters of the softmax layer:

    s(x) = softmax(W_s h'_{T+1} + b_s)                       (4)
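As a sketch, Steps 1 and 2 (Eqs. (2) and (3)) can be written as follows. The score function for e_i uses the weight matrices V, W, U described above; subtracting the maximum score before exponentiation is only a standard numerical-stability trick, not part of the paper's formulation:

```python
import numpy as np

def attention_semantic_code(H, h_last, V, W, U, b):
    """Sketch of Eqs. (2)-(3): attention of historical hidden states
    H = [h_1, ..., h_T] with respect to the last state h_{T+1}."""
    # Step 1: scores e_i and attention probabilities alpha_i (Eq. 2)
    e = np.array([V @ np.tanh(W @ h_i + U @ h_last + b) for h_i in H])
    alpha = np.exp(e - e.max())       # subtract max for stability
    alpha /= alpha.sum()
    # Step 2: semantic code C as the weighted sum of hidden states (Eq. 3)
    C = sum(a * h_i for a, h_i in zip(alpha, H))
    return alpha, C
```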

3.4 Model Training
The method uses the Adam optimization algorithm to train the model. Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent algorithm, and it iteratively updates the neural network weights based on the training data. The stochastic gradient descent algorithm maintains a single learning rate for all weight updates, and the learning rate does not change during the training process.

The Adam algorithm instead designs an independent adaptive learning rate for each parameter by calculating the first and second moment estimates of the gradient.

Algorithm 2 Adam Algorithm
Input: initial parameters θ
Parameters:
    Step size ε
    Exponential decay rates of the moment estimates ρ_1, ρ_2
    Small constant δ for numerical stability
Step:
Initialize the first-order and second-order moment variables s = 0, r = 0
Initialize the time step t = 0
While (stopping condition not met) do
    Take a small batch of m samples {x^(1), …, x^(m)} from the training set, with corresponding targets y^(i)
    Calculate the gradient: g ← (1/m) ∇_θ Σ_i L(f(x^(i); θ), y^(i))
    t ← t + 1
    Update the first moment estimate: s ← ρ_1 s + (1 − ρ_1) g
    Update the second moment estimate: r ← ρ_2 r + (1 − ρ_2) g ⊙ g
    Correct the bias of the first moment: ŝ ← s / (1 − ρ_1^t)
    Correct the bias of the second moment: r̂ ← r / (1 − ρ_2^t)
    Calculate the update value: Δθ = −ε ŝ / (√r̂ + δ)
    Apply the update: θ ← θ + Δθ
End while
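For illustration, Algorithm 2 corresponds to the following minimal NumPy sketch; the default hyper-parameter values shown are the commonly used Adam settings, assumed rather than taken from the paper:

```python
import numpy as np

def adam(theta, grad_fn, steps, eps=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    """Sketch of Algorithm 2. grad_fn(theta) returns the minibatch
    gradient g of the loss with respect to the parameters theta."""
    s = np.zeros_like(theta)   # first-order moment variable
    r = np.zeros_like(theta)   # second-order moment variable
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        s = rho1 * s + (1 - rho1) * g          # update first moment
        r = rho2 * r + (1 - rho2) * g * g      # update second moment
        s_hat = s / (1 - rho1 ** t)            # bias-corrected first moment
        r_hat = r / (1 - rho2 ** t)            # bias-corrected second moment
        theta = theta - eps * s_hat / (np.sqrt(r_hat) + delta)
    return theta
```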
2. LSTM hyper-parameter definition
4. EXPERIMENT
4.1 Data Set
The data set of this paper includes two parts: an English data set and a Chinese data set. The English data set is the IMDB film review set, which contains 25,000 movie reviews, with 12,500 positive and 12,500 negative texts. The Chinese data set is selected from the hotel review corpus ChnSentiCorp, a corpus of hotel reviews collected by Dr. Tan of the Chinese Academy of Sciences. This article selects the ChnSentiCorp-Htl-ba-6000 data, with 3,000 positive and 3,000 negative texts, to conduct experiments. Unlike English text, Chinese text requires word segmentation.

4.2 Indicator
Precision and recall are two indicators used to evaluate the outcome of the classification:

    precision = TP / (TP + FP),  recall = TP / (TP + FN)     (5)

TP indicates the number of texts that are actually positive and predicted to be positive; FN indicates the number of texts that are actually positive but predicted to be negative; FP indicates the number of texts that are actually negative but predicted to be positive; TN indicates the number of texts that are actually negative and predicted to be negative. Precision measures the accuracy of the classifier, while recall measures whether the classifier can find all the positive samples. These two indicators are both indispensable and should be taken into account at the same time; therefore, the F1 measure can be used to balance the two aspects:

    F1 = 2 × precision × recall / (precision + recall)       (6)
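Eqs. (5) and (6) translate directly into code; a minimal sketch for binary sentiment labels (1 = positive, 0 = negative) follows:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 (Eqs. 5-6) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```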
4.3 Experiment
4.3.1 Comparison to Other Methods
We compare our method with several baselines: the word2vec-based SVM method, the word2vec-based LSTM method, the weighted word2vec-based LSTM method, and the Attention mechanism-based LSTM method.

W2V-SVM model: Use the word2vec model to form word vectors; the classifier is trained with the SVM model.

W2V-LSTM model: After the text is converted into vectors based on word2vec, the classifier is trained with the LSTM model.

Weighted W2V-LSTM model: Combine the TFIDF algorithm and word2vec to convert the text into vectors, and classify them by LSTM training.

W2V-Att-LSTM model: After the text is transformed into vectors based on word2vec, the LSTM model based on the Attention mechanism is used for training.

4.3.2 Parameter Settings
1. Experimental Environment Configuration
(1) Software environment. Language: Python; Platform: Google TensorFlow deep learning framework.
(2) Hardware environment. Operating platform: Win10; CPU: Intel dual-core 4.0 GHz; Memory: 12 GB; Hard disk: 1 TB.
2. LSTM Hyper-parameter Definition
The batch size is 24, the number of LSTM units is 64, the number of classification categories is 2, the number of training iterations is 100,000, the optimizer is the commonly used Adam, and the learning rate is set to the default 0.001.

4.3.3 Experimental Results
The performance results of the following comparison experiments show that the proposed method achieves better results than the other models in precision, recall, and F1 measure. From Figure 3 and Figure 4, it is not difficult to find that our method performs much better than the baselines, whether on the English data set or on the Chinese data set. The main reason the Chinese results are comparatively weaker is that Chinese requires word segmentation in the preprocessing stage; word segmentation inevitably produces some minor errors, and the resulting semantic bias reduces model performance. Among the compared algorithms, the performance of the Weighted W2V-LSTM is not as good as that of the W2V-Att-LSTM, indicating that for text sentiment analysis the TFIDF algorithm does not perform as well as the Attention mechanism when calculating the importance of words.

Figure 3. Performance results of each model on the English dataset

Figure 4. Performance results of each model on the Chinese dataset

5. Conclusion
In this paper, we propose a new text sentiment analysis method. After the text information is encoded into word vectors by word2vec, a weight matrix computed with the TFIDF algorithm is applied to form the LSTM input. To achieve emotion classification, the text-related features are obtained by the LSTM, and the Attention mechanism is then combined to obtain the feature vectors. The experimental results show that the proposed method is feasible and effective, and that it can better identify the emotional orientation of textual information. At present, there are more and more review texts that mix English and Chinese; the next step will be to further explore sentiment analysis tasks on mixed Chinese-English texts.

6. ACKNOWLEDGMENT
This work was supported by the National Natural Science Foundation of China (61170035, 61272420, 81674099), the Fundamental Research Fund for the Central Universities (30916011328, 30918015103), and the Nanjing Science and Technology Development Plan Project (201805036).

7. REFERENCES
[1] Sugimoto F, Yoneyama M. A method for classifying emotion of text based on emotional dictionaries for emotional reading[C]//IASTED International Conference on Artificial Intelligence and Applications. ACTA Press, 2006: 91-96.
[2] Tang H F, Tan S B, Cheng X Q. Research on sentiment classification of Chinese reviews based on supervised machine learning techniques[J]. Journal of Chinese Information Processing, 2007, 21(6): 88-126.
[3] Pak A, Paroubek P. Twitter as a corpus for sentiment analysis and opinion mining[C]//LREC. 2010, 10(2010).
[4] Baharudin B. Sentence based sentiment classification from online customer reviews[C]//Proceedings of the 8th International Conference on Frontiers of Information Technology. ACM, 2010: 25.
[5] Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[J]. 2014, 3: 2204-2212.
[6] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[J]. Computer Science, 2014.
[7] Rocktaschel T, Grefenstette E, Hermann K M, et al. Reasoning about entailment with neural attention[J]. 2015.
[8] Rush A M, Chopra S, Weston J. A neural attention model for abstractive sentence summarization[J]. Computer Science, 2015.
[9] Hermann K M, Kočiský T, Grefenstette E, et al. Teaching machines to read and comprehend[J]. 2015: 1693-1701.
[10] Bengio Y, Ducharme R, Vincent P. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[11] Mnih A, Hinton G E. A scalable hierarchical distributed language model[C]//Proc of the NIPS. 2009: 1081-1088.
[12] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. Computer Science, 2013.
[13] Turney P D, Littman M L. Measuring praise and criticism: inference of semantic orientation from association[J]. ACM Transactions on Information Systems (TOIS), 2003, 21(4): 315-346.
[14] Yang J, Lin S P. Emotion analysis on text words and sentences based on SVM[J]. Computer Applications and Software, 2011, 28(9): 225-228.
[15] Pang B, Lee L, Vaithyanathan S. Thumbs up?: Sentiment classification using machine learning techniques[C]//Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2002: 79-86.

