You are on page 1of 9

1Effectiveness 

of
Context-Aware Tweet Enrichment for
2Sentiment Analysis
3
4Bashar Tahayna1, Ramesh Kumar Ayyasamy1, Rehan Akbar1
5
61 Department of Information Systems, Faculty of Information and Communication Technology
7Universiti Tunku Adbdul Rahman (UTAR), Kampar, Malaysia

9
10Corresponding Author:
11Bashar Tahayna1
12Jalan Universiti, Bandar Barat, 31900 Kampar, Perak, Malaysia
13Email address: bashar.tahayna@1utar.my
14
15Abstract

16The large source of information space produced by the plethora of microblogging and social
17media platforms has spawned a slew of new applications and spurred the evolution and the
18development of sentiment analysis research. We propose a sentiment analysis technique which
19operates at two levels. In the first stage, we extracted the important keywords reflecting the
20intent of a tweet. In the second stage, we expanded those keywords to enrich the tweet with
21context-aware words, phrases, or even latent variables. To preserve the latent relationships
22between tweet terms and their expansion representation, sentence encodings and contextualized
23word embeddings are utilized. To investigate the performance of tweet embeddings on sentiment
24analysis task, we have tested the context-free models (i.e., word2vec, Sentence2Vec, Glove, and
25FastText), the dynamic embedding model (BERT), the deep contextualized word representations
26(ELMO), and the entity-based model (Wikipedia). We followed the state-of-art hybrid deep
27learning model combining the convolutional natural network and Long short-term memory
28network to classify tweet data into negative, neutral, or positive polarity score. The proposed
29method and results proves that text enrichment improves the accuracy of sentiment polarity
30classification with noticeable percentage.
31
32Introduction
33Twitter and other social microblogging platforms have become key news and information
34sources [1,13]. By March 2021, there is an average of 500 million tweets generated on daily
35basis [2]. The massive information space has led to a rapid development of Sentiment Analysis
36research; Sentiment analysis challenge became a very attractive research topic in which
37researchers around the world put their enormous exertion to solve. The problem can be simply
38defined as the act of analysing a given “text”, from within large collections, to classify the
39concealed sentiments.

A
40On Twitter, people express their opinion in natural language by tweeting on a product, service, or
41anything using multiple linguistic features that represent their intention. The use of word
42variations increases the likelihood of vocabulary mismatch and makes a tweet difficult to
43understand. This problem is not straightforward to solve; it has several complicated dimensions
44to take care of and usually encounters issues such as a) the unstructured nature of the data, b) the
45multi-lingual aspects, c) incomplete sentences, d) idioms, jargon, or ad hoc words, e) lexical vs
46semantic vs syntactic attributes, f) ambiguity and the implicature referencing or meanings, and
47many more. In early days, methods of keyword-based Sentiment Analysis were developed to
48perform global sentiment classification based on bag-of-words frequency model. The difficulty
49with these approaches is that they don't account for the "sentimental" links between words in the
50local context, therefore they usually can't disclose the transmitted "implicit meaning”.
51In lieu of this, other research efforts have been directed toward utilizing local information within
52the text. For example, in aspect-based sentiment analysis, the analysis is done based on the
53aspect-polarity rather than merely the presence of bag of words within a document. Yet, many
54existing solutions do not take into account the contextual “intention” or meaning of words related
55to the aspect. The author of [3] says: “Tweets are often shortened and not trivial to understand
56without their context” [3]. Linguistic features that create semantic ambiguity and
57misinterpretation of the user intention, and additional factors, such as the non-standard use of a
58written language, slangs, abbreviations, or the implicit discourse referring to an aspect indirectly
59affect the ability to accurately analyse the actual users’ intention and the effectiveness of
60sentiment analysis systems.
61To overcome these issues, we propose an enrichment model to expand and relate a given word
62with its lexical or semantic context. To obtain the expansion terms, external knowledge-based
63processing (i.e. Ontologies, Wordnet, Wikipedia, and other online corpora) can be utilized to add
64semantic model to the text. We believe that when numerous external sources are used in the
65enrichment process, external knowledge-based tweet expansion will gain the most. This is
66because ontologies like WordNet and ConceptNet provide semantically related words. However,
67other corpus-based external sources employ concept proximity to assess term-relatedness. [4]. To
68extract sentiments from a tweet, we use the pre-trained word and sentence embeddings without
69any additional feature engineering. At the same time, we use the same model to evaluate our
70tweet enrichment representation on the tweets dataset [5]. The hopes are to obtain a correct and
71improved polarity classification based on the trained data set by considering the expansion and
72the contextual “intention” related to a word.
73In this paper we focus on how to enrich tweets to reduce vocabulary mismatch using
74contextualized word embeddings or sentence embeddings; We enrich tweets by adding relevant
75and context-driven keywords and then use different embedding techniques. Our learning model
76is inspired by the success of a state-of-art hybrid model which combines both (CNN)
77convolutional neural network and the (LSTM) Long short-term memory network to predict the
78sentiment from the given tweet [5]. The F1-score of the proposed enrichment method
79outperforms the standard baseline representation (raw tweets) by a significant margin. We
80organize the paper as the following. Section two discusses previous and related researches.
81Whereas the third section describes our framework for sentiment analysis of tweet messages. In
82this section we mainly describe the feature extraction method and the embedding techniques
83respectively. Section 4 gives a detailed overview on the results of our experiments. Finally, the
84last section provides a conclusion and future work plan.
85
86Previous Work
87Query expansion has been widely adopted and used in retrieval systems. In [23], the authors
88proposed a query expansion model to enhance search on Twitter platform. The authors claimed
89that the effectiveness of the method and of combining patterns and word embedding to improve
90twitter search [23]. In [24], the authors introduced described different term-based and topical
91feature extraction method to retrieve data from Twitter platform. It is also worthwhile to mention
92that none of the mentioned word uses any external knowledge-bases and relies only on the
93training data only.
94The work on [25] is a noticable work that utilizes the external knowledge-base (Wikipedia) to
95compute semantic relatedness between terms presented in a text. Although the authors did not
96utilize any embedding techniques, they claimed that their method achieves “notable
97improvements in correlation of computed relatedness scores with human judgements” [25].
98Word2vec endeavors to connect words with vectors space [8]. The spatial distance between
99vectors depicts the similitude connection between words. Global Vectors (commonly known as
100GloVe) is another word embedding model [7]. It learns word representation using matrix
101factorization algorithm for word-context matrix. Word2Vec and GloVe both handle whole full
102words, but they struggle with unfamiliar words (aka, out of vocabulary words). The context is
103irrelevant for Word2Vec and GloVe word embeddings. This mean that a single word would have
104the same representation even the context was different. GloVe is simply a better version of
105Word2Vec. FastText (based on Word2Vec) is a word-fragment-based program that can generally
106tolerate unknown words while still producing one vector per word [9]. FastText employs n-
107grams to ensure that local word ordering is preserved. ELMo [9] and BERT [10] handle this
108issue by providing context-sensitive representations. In other words, f(word, context) gives an
109embedding in ELMo or BERT.
110BERT employs the transformer, which is an attention mechanism that learns contextual
111associations between words (or sub-words) in a text. [10]. ELMo is a character-based system that
112generates vectors for each character that may be merged using a deep learning model or simply
113averaged to get a word vector. Hybrid machine learning approaches have recently been
114demonstrated as possible models for decreasing sentiment errors on increasingly complicated
115training data; Combining two (or more) procedures tends to incorporate the benefits of each and,
116as a result, addresses some of the shortcomings of using individual/singular machine learning
117model. In order to produce more accurate findings, a hybrid technique was devised in an
118experiment that used both support vector machine and the semantic orientation [13,14].
119In recent years, a new type of deep learning approach has been utilized for sentiment analysis in
120a variety of languages. The authors of [11] have utilized a combined model of CNN and LSTM
121to solve the sentiment classification problem. The authors of [12] defined a hybrid model based
122on deep neural network architectures including CNNs and LSTMs to achieve the first rank on all
123of the five English sub-tasks in SemEval-2017. Cho et al. [21] proposed word representation
124using high-level feature embedding that works at the character-level CNN and bi-LSTM. They
125conclude that capturing local and global information of any word token is very efficient to
126enhance the sentiment polarity estimation. By modelling the interactions of words throughout the
127composing process, Wang, L. et al. [15] used LSTM for Twitter sentiment categorization. Using
128the same hybrid model, Guggilla, et al. [16] suggested a solution to determine if a given phrase is
129factual or emotional by using Word2vec and linguistic embeddings features. To improve phrase
130and sentence representation, Huang, et al. [17] suggested encoding syntactic information (e.g.,
131POS tags) in a tree-structured LSTM. Yu and Jiang [18] investigated further into a generalized
132solution to the phrase embeddings. The authors studied cross-domain sentiment classification
133and came up with an architecture two separate CNNs to train the hidden feature representations
134of labeled and unlabeled datasets. In this work, we have obtained our results using the state-of-
135art hybrid model [11, 15, 16, 17, 18] which combines recurrent neural networks (RNN) and
136convolutional neural networks (CNN).
137
138Proposed Framework
139Figure 1 depicts the component-based architecture and methodology of our proposed tweet
140expansion framework. The framework consists of several steps including tweet pre-processing
141and manipulation, feature extraction, representation, and expansion, lexical simplification, and
142deep learning model classifier. In the following sections we give a detailed description of each
143step.
144
145
146
147 Figure 1. Tweet enrichment and sentiment analysis proposed architecture.
148
149Tweet Feeder
150To build a real-time system that monitors a specific account on Twitter, we access Twitter API
151using Python and retrieve all the retweets on a specific tweet. For example, a recent tweet by
152American Airlines and people replies are shown in Figure 2.
153
154Tweet Manipulation and Pre-processing
155To generate our baseline tweets dataset, we manipulate raw tweets by removing noisy URLs and
156applying some of the common text normalization processes:
157  Stop words removal: The process eliminates useless encoding of words that are missing in
158 any pre-trained word embedding,
159  Case folding: The process of converting words or phrases to lowercase,
160  Mapping special values for their type (e.g.: ‘3pm’ → ‘TIME’),
161  Special characters removal: This process includes the removal of hashtags, numbers,
162 punctuation marks, and characters other than letters of the alphabet,
163  Acronym normalization (e.g.: ‘FB’ → ‘Facebook’) and abbreviation normalization (e.g.:
164 ‘AFAIK’ –→ ‘As far as I know’),
165  Spelling correction.
166
167
168
169
170
171
172
173 Figure 2. Sample of a tweet and replies from the American Airlines account
174
175Tweet Expansion
176We undertake additional step to enrich tweet keywords using ConceptNet, Wordnets and
177Wikipedia to produce an extended form of a particular tweet. By adding, removing, or
178substituting words to the original tweet, the final enrichment phase creates supporting keyword
179vectors of semantically related phrases (as shown in Figure 3). The ability to recognize term
180connections would allow for a more methodical approach to tweet term selection. There are two
181types of term relationships: lexical and semantic. In the lexical type, relationships between
182fundamental word forms with the same meaning are depicted. For example, synonyms and
183morphological variations are examples of such relations. On the other hand,
184hyponymy/hypernymy (i.e. IS-A relationship), and meronymy/holonymy (i.e. HAS-A
185relationship), indicate the semantic relationships between the meanings of words [6]. In addition
186to these formally defined relationships, word pairs can also be linked depending on their position
187within a certain distance in a tweet. If such pairings of terms regularly coexist in the same
188archive, they are considered to be connected to the same topic. Extracting the relationship
189between formal and informal terms will ensure that all terms related to one aspect of the tweet
190are taken into account. Tweets can be linked directly/indirectly through certain forwarding
191relationships. A direct link appears when some tweets contain the same terms, while the indirect
192relation implies that the tweets have different words, regardless of their relationship (e.g.
193synonyms) [15].
194Substitution or addition of tweet keywords with corresponding synonyms,
195hyponyms/hypernyms, and meronyms/holonyms are common forms of content change as
196described in [20]. We utilized WordNet ontology to derive the lexically and semantically related
197words. At the same time, we implemented a heuristic function to compute how close two noun
198keywords using the distance between Wikipedia articles. For ConceptNet and WordNet we store
199IDs for words that have some form of semantic relation. For Wikipedia no such relations can be
200extracted, so we used the set of categories associated with any given article as semantic relations.
201To measure similarity between words in WordNet ontology, we use Leacock and Chodorow [22]
202formula:
2∗N
203 ¿ wup (C 1 ,C 2)=
N 1+ N 2+ 2∗N
204
205Where C1 and C2 represent two different concepts and N1, N2 evaluate the distances (in IS-A
206connections) between the concepts and a particular common concept. On the other side, N
207measures the distance between the nearest common ancestor of the two concepts and the root
208node. Table 1 shows a sample of original tweets and their semantically similar tweets generated
209by BERT Transformer. Some terms may not have a corresponding word embedding when the
210tweet's terms are replaced by its vector representations. This problem can be solved by removing
211the terms or filling their corresponding vectors with random values. It can also be set to a
212specific word's embedding, as UNK. Unknown words, on the other hand, will all be represented
213by the same vector.
214To create embeddings for the missing words, we take a different approach; we examine tweets
215based on their linguistic features. Namely, three key linguistic features are retrieved
216(morphological, syntactical, and semantics). The use of both formally and informally linked
217terms in tweet expansion, in combination with the other variables described above, is expected to
218reduce the problem of tweet vocabulary mismatch. Next, for the unknown terms, we extract their
219three grams combinations and use the word embedding model to search up each gram. By doing
220this, we can compute the average of the discovered grams' embeddings. However, if none of the
221word's 3-grams have any embeddings, we try higher grams (4, 6, …, etc grams) until we
222discover the matching embedding or until we reach the length of the unknown word. In the
223second situation, we look for additional linguistic features in an external knowledge base.
224
225 Table 1. Relevant keywords extracted from External Knowledge-Bases
226
227
228 Figure 3. Confusion matric of BERT transformer-based tweet rephrasing
229
230Tweet Representation
231To extract features from a tweet, we substitute the term with its corresponding word embedding
232using different embeddings. In the ELMo embedding, vector representing a token or word acts as
233a function of the whole sentence containing that token/word. As a result, different word vectors
234for the same word may exist in different contexts. Unlike Glove, Word2vec, and fastText
235embedding techniques, which ignore word order at the training phase and look up words and
236vectors in a dictionary, ELMo generates on-the-fly vectors through a deep learning model. The
237effectiveness of the different embeddings is shown in the result section below.
238On the other hand, we reasoned that utilizing context to establish connections to identify
239common ground between word meanings sounded a lot like spanning nodes in graph-searching
240techniques, so we chose to represent the disambiguation process as several Trees (one for each
241conceivable meaning) attempting to connect one to another. To make the model function, we
242needed contextual information about alternative interpretations arranged in a graph-like manner.
243We start looking for terms in WordNet and ConceptNet, then we disambiguate terms using
244Wikipedia. Figure 4 gives an example of term disambiguation using Wikipedia. Figure 5 gives
245an example of keyword relations with other terms.
246
247 Figure 4. Example on Wikipedia disambiguation
248
249 Figure 5. Keyword relatedness from ConceptNet
250Tweet Embedding
251We initialized the embeddings of words using different pre-trained embeddings. Namely. We
252tested different word/sentence embedding on the baseline dataset (raw clean tweets). On the
253other experiment, we generate an expanded tweets dataset which is composed of semantically
254related sentences derived from the baseline dataset and applied ELMo embedding. On the same
255expanded tweet, we tested Word2Vec, Sent2Vec-based embedding. We used Wikipedia2Vec to
256learn embeddings; we created an entity-annotated corpus from Wikipedia by interpreting entity
257links in Wikipedia articles as entity annotations, and then used the produced corpus to train 300-
258dimensional skip-gram embeddings [7] using negative sampling. By the learnt embeddings, we
259can guarantee that similar words will be clustered together in a single vector space.
260
261The Hybrid Learning Model
262We utilized the embedding techniques with LSTM and CNN models because “representing
263words and sentences into vectors provides better performance for NLP problems” [16]. The idea
264behind embedding assumes that similar words and their context have similar vector
265representations as in the sentences “Best Italian restaurant in Kuala Lumpur”, “Top Italian food
266in Kuala Lumpur”. Most likely that both sentences will be used in a similar context, and also
267with similar words that have similar vector representation such as “Pasta,” “favorite,” or
268“cousin”. These vectors become the input to the proposed CNN and LSTM. We have tested the
269hybrid model with the embedding of the enriched tweets at both word and sentence vectors
270representations. After tuning our hybrid CNN and LSTM models, we find the best hyper-
271parameters that set the highest F1-score. Soft voting is utilized to average the projected
272probability of class membership among the selected models, and the hybrid class prediction is
273picked from the class with the highest average.
274
275
276Results
277
278Evaluation Metrics
279To validate our results on the training data, we computed the accuracy to find the number of
280predictions out of the entire predictions. We also used F-measure [19] to calculate the harmonic
281mean of precision and recall. The Precision encodes the ratio of all positive examples that are
282correctly classified to all of the positives classified samples. Whereas, Recall encodes the ratio of
283the positive samples that are correctly classified to the total number of positive samples.
284
285Experimental Results
286As shown in Table 2, the obtained results show that classification consumer sentiment as positive
287or negative achieves better measures when using the expanded forms of tweets rather than using
288the raw baseline tweets. Also, we noticed that the word-level embedding performs better than
289sentence embedding on the given dataset.
290
291
292
293 Figure 6. Accuracy of the hybrid model using raw/Enriched tweets
294
295Interestingly, we may infer that the tweet enrichment procedure is critical in gaining access to
296factors that cannot be determined straight from raw tweet data (see model accuracy in Figure 6).
297Such an enrichment model might be used to improve sentiment analysis on a larger scale.
298
299
300 Table 2 Precision/Recall Comparison for Embedding Raw and Enriched Tweets
301
302Conclusion
303The suggested study focuses solely on the task of tweet enrichment for sentiment analysis. We
304find out that regardless of the used embedding, tweet enrichment proves that it can overcome the
305Out of Vocabulary problem; words that are not in the training set. We have utilized a CNN-
306LSTM hybrid classifier for sentiment classification; other hybrid models might perform better to
307solve this problem. Furthermore, we only looked at English tweets from the Twitter platform.
308However, our method does not incorporate customer sentiments in other languages. We believe
309it will be very interesting research direction to do additional research which includes tweets
310offered in other languages or a combination of languages. Furthermore, because certain aspects
311might have an impact on the sentiment score of a given tweet, aspect-based sentiment analysis
312may be taken into account in the upcoming research. In a future study, we plan to investigate
313other knowledge basis and test further embedding techniques.
314
315References
316[1] J. V. DIJCK, "Tracing Twitter: The rise of a microblogging platform," International Journal of Media &
317 Cultural Politics, vol. 7, no. 16, pp. 333-348, 2011.
318[2] Twitter official Blog, Accessed on April, 2021
319[3] E. B. Setiawan, D. H. Widyantoro and K. Surendro, "Feature Expansion for Sentiment Analysis in
320 Twitter", The 5th International Conference on Electrical Engineering, Computer Science and Informatics
321 (EECSI), pp. 509-513, 2018, doi: 10.1109/EECSI.2018.8752851.
322[4] M. Kacmajor and J. D. Kelleher, “Capturing and measuring thematic relatedness,” Language Resources and
323 Evaluation, doi: 10.1007/s10579-019-09452-w, Mar. 2019,
324[5] Sentiment140? What is Sentiment140? [Internet]. [cited 2020 Dec 10]. Available from:
325 http://help.sentiment140.com.
326[6] ‌B. Selvaretnam, B. and M. Belkhatir, “Natural language technology and query expansion: issues, state-of-the-
327 art and perspectives”. Journal of Intelligent Information Systems, Vol. 38(3), pp.709-740, 2012
328[7] T. Mikolov, K. Chen, G. Corrado, J. Dean, “Efficient estimation of word representations in vector space”.
329 arXiv preprint arXiv:1301.3781, 2013.
330[8] P. Bojanowski, E. Grave, A. Joulin, & T. Mikolov, “Enriching word vectors with subword
331 information”. Transactions of the Association for Computational Linguistics, 5, 135-146, 2017.
332[9] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, & L. Zettlemoyer. Deep contextualized
333 word representations. arXiv preprint arXiv:1802.05365, 2018.
334[10] J. Devlin, M. W. Chang, K. Lee, & K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for
335 language understanding”. arXiv preprint arXiv:1810.04805, 2018.
336[11] C. Zhou, C. Sun, Z. Liu, F. Lau, “A C-LSTM neural network for text classification”. arXiv preprint
337 arXiv:1511.08630. 2015.
338[12] P. M. Sosa, “Twitter sentiment analysis using combined LSTM-CNN models”. Eprint Arxiv, 1-9, 2017.
339[13] J. Priem, K. L. Costello, “How and why scholars cite on Twitter”. Proceedings of the American Society for
340 Information Science and Technology, 47(1), 1-4, 2010.
341[14] A. Shoukry, A. Rafea, “A hybrid approach for sentiment classification of Egyptian Dialect Tweets”, in: Arabic
342 Computational Linguistics (ACLing), First International Conference on”, IEEE. pp. 78–85., 2015
343[15] X. Wang, Y. Liu, C. Sun, B. Wang, & X. Wang, “Predicting polarities of tweets by composing word
344 embeddings with long short-term memory”. In Proceedings of the Annual Meeting of the Association for
345 Computational Linguistics, 2015.
346[16] C. Guggilla, T. Miller, & I. Gurevych, “CNN-and LSTM-based claim classification in online user comments”.
347 In Proceedings of the International Conference on Computational Linguistics, 2016.
348[17] M. Huang, Q. Qian, & X. Zhu, “Encoding syntactic knowledge in neural networks for sentiment
349 classification”. ACM Transactions on Information Systems, 35, 1–27, 2017.
350[18] J. Yu, & J. Jiang, “Learning sentence embeddings with auxiliary tasks for cross-domain sentiment
351 classification”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing,
352 2016.
353[19] B. Agarwal, N. Mittal, P. Bansal, & S. Garg, “Sentiment analysis using common-sense and context
354 information”. Computational intelligence and neuroscience, 2015.
355[20] S. Y. Rieh, “Analysis of multiple query reformulations on the web: The interactive information retrieval
356 context”. Information Processing & Management, 42(3), 751-768, 2006
357[21] M. Cho, J. Ha, C. Park, & S. Park, “Combinatorial feature embedding based on CNN and LSTM for
358 biomedical named entity recognition”. Journal of biomedical informatics, 103, 103381, 2020
359[22] C. Leacoc, & M. Chodorow, “Combining local context and WordNet similarity for word sense
360 identification”. WordNet: An electronic lexical database, 49(2), 265-283, 1998.
361[23] M. Bendella, & M. Quafafou, Patterns Based Query Expansion for Enhanced Search on Twitter Data.
362 In ICFCA (Supplements) (pp. 100-112).
363[24] C. H. Lau, Y. Li, & D. Tjondronegoro, Microblog Retrieval Using Topical Features and Query Expansion.
364 In TREC 11, 2011.
365[25] Gabrilovich, E., & Markovitch, S. Computing semantic relatedness using Wikipedia-based explicit semantic
366 analysis. In IJcAI (Vol. 7, pp. 1606-1611), 2007.

You might also like