Topic-Enriched Word Embeddings for Sarcasm Identification

Aytuğ Onan
Izmir Katip Celebi University
1 Introduction
Sarcasm can be defined as a cutting, ironic remark that is intended to express contempt
or ridicule [1]. Sarcasm is a type of nonliteral language in which people may express negative sentiments with words that have a positive literal meaning and, conversely, use negatively connoted words to indicate positive sentiment. With the advances in information and communication technologies, an immense quantity of user-generated text documents has become available on the Web. As a result, sentiment
analysis has emerged as an important research direction. User-generated content on
social platforms can serve as an essential source of information. The identification of
sentiments towards entities, products or services can be important for business organizations, governments and individual decision makers [2]. The identification of subjective information from online content can be utilized to generate structured knowledge for constructing decision support systems.
The social content available on the Web generally contains nonliteral language,
such as irony and sarcasm. There are a number of challenges encountered in sentiment
analysis, such as negation, domain dependence, polysemy and sarcasm [3]. A sarcastic utterance can change the sentiment orientation of a text document from positive to negative, or vice versa. In sarcastic text, the expressed utterance and the intention of the person using sarcasm can be completely opposite [4]. Hence, the predictive performance of sentiment classification schemes may be degraded if sarcasm cannot be handled properly.
Automatic identification of sarcasm is a challenging problem in sentiment analysis.
First of all, sarcasm is a difficult concept to define; even people find it difficult to judge precisely whether a particular statement is sarcastic or not [5]. In addition, there are few accurately labeled, naturally occurring utterances available for machine learning based sarcasm identification.
Sarcasm may be encountered in short text documents characterized by limited length (such as Twitter messages), in long text documents (such as review posts and forum posts) and in transcripts of TV shows [1]. Twitter is a popular microblogging platform where people can express their opinions, feelings and ideas in short messages, called tweets, within a 140-character limit. With Twitter, people can interact with each other in a faster way. Twitter may be utilized for daily chatter, conversation, information sharing and reading breaking news [6]. Twitter
serves as an essential source of information for practitioners and researchers, and sarcasm identification on Twitter is a promising research direction. Earlier supervised learning schemes for sarcasm identification on Twitter have utilized linguistic feature sets, such as lexical unigrams, bigrams, sentiments, punctuation marks, emoticons, character n-grams, quotes and pronunciations [7]. Sarcasm identification approaches may be broadly
divided into three groups: rule-based approaches, statistical methods and deep learning based approaches [1]. Rule-based schemes for sarcasm identification seek to identify sarcastic text with the use of rules based on indicators of sarcasm. Statistical methods for sarcasm identification utilize supervised learning algorithms, such as support vector machines, logistic regression, Naïve Bayes and decision trees. Deep
learning is a promising research direction in machine learning, which can be applied in
a wide range of applications, including computer vision, speech recognition and natural
language processing, with high predictive performance.
In this paper, we present a deep learning based approach to sarcasm identification.
In this regard, the predictive performance of the topic-enriched word-embedding scheme has been compared to conventional word-embedding schemes (such as word2vec, fastText and GloVe). In addition to word-embedding based feature sets, conventional
lexical, pragmatic, implicit incongruity and explicit incongruity based feature sets are
considered.
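Such lexical and pragmatic feature sets are typically simple counts over the raw text. As an illustrative sketch (the exact features of the paper are not reproduced here; the emoticon pattern and the chosen cues are assumptions):

```python
import re

def pragmatic_features(tweet: str) -> dict:
    """Count simple pragmatic cues commonly used in sarcasm identification:
    emoticons, punctuation marks and fully capitalized words."""
    emoticons = re.findall(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/]", tweet)
    return {
        "num_emoticons": len(emoticons),
        "num_exclamations": tweet.count("!"),
        "num_questions": tweet.count("?"),
        "num_allcaps": sum(1 for w in tweet.split() if w.isupper() and len(w) > 1),
    }

features = pragmatic_features("GREAT, another Monday!!! :)")
```

Feature dictionaries of this kind can then be concatenated with word-embedding based representations before classification.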
The rest of this paper is organized as follows: In Sect. 2, related work on sarcasm identification is presented. Section 3 presents the word-embedding schemes utilized in the empirical analysis, and Sect. 4 briefly presents convolutional neural networks. In
Sect. 5, experimental procedure and empirical results are presented. Finally, Sect. 6
presents the concluding remarks.
Topic-Enriched Word Embeddings for Sarcasm Identification 295
2 Related Work
This section briefly discusses the existing works on machine learning and deep learning
based schemes for sarcasm identification.
Gonzalez-Ibanez et al. [8] examined the predictive performance of unigrams, dictionary-based feature sets and pragmatic factors for sarcasm identification. In the
presented scheme, support vector machines and logistic regression classifiers have been
utilized. In another study, Reyes et al. [9] utilized properties of figurative languages
(such as, ambiguity, polarity, unexpectedness and emotional scenarios) for sarcasm
identification on Twitter. In another study, Reyes et al. [10] examined the predictive
performance of conceptual features (such as, signatures, unexpectedness, style and
emotional scenarios) for sarcasm identification on Twitter. The presented scheme
obtained a predictive performance of 0.72 in terms of F-measure.
Ptacek et al. [11] presented a machine learning based approach to identify sarcasm in Twitter messages written in English and Czech. In the presented scheme,
n-gram based features (such as, character n-grams, n-grams, skip-bigram), pattern
based features (such as, word-shape pattern), part of speech based features (such as,
POS characteristics, POS n-grams and POS word-shape) and other features (such as,
emoticons, punctuations, pointedness and word-case) have been utilized as the feature
sets. In another study, Barbieri et al. [12] examined the predictive performance of
feature sets, such as frequency-based features, written-spoken style uses, intensity of
adjectives and adverbs, length, punctuation, emoticons, sentiments, synonyms and
ambiguities for sarcasm identification. In another study, Rajadesingan et al. [13] presented a behavioral modeling scheme for sarcasm identification.
Farias et al. [14] examined the predictive performance of affective features for irony detection on Twitter. In this regard, structural features (such as punctuation marks, length of words, emoticons, discourse markers, part of speech and semantic similarity) and affective features (such as sentiment lexicons, sentiment-related features and emotional categories) have been taken into consideration. Bouazizi and Ohtsuki [15] presented a pattern-based scheme for sarcasm identification on Twitter. In the presented scheme, sentiment-related features, punctuation-related features, syntactic and semantic features and pattern-related features have been considered. In another study, Kumar et al. [16] presented a machine learning based scheme for sarcasm identification in numerical portions of text. In another study, Mishra et al. [17] utilized lexical features, implicit incongruity based features, explicit incongruity based features, textual features, simple gaze based features and complex gaze based features for sarcasm identification.
In addition to machine learning based schemes, the paradigm of deep learning has
been recently utilized for sarcasm identification. For instance, Ghosh et al. [18]
examined different word-embedding based schemes (such as, weighted textual matrix
factorization, word2vec and GloVe) for sarcasm identification. Similarly, Joshi et al.
[19] utilized four word-embedding based schemes (namely, latent semantic analysis,
GloVe, dependency weights and word2vec) for sarcasm detection. In another study,
Poria et al. [20] presented a deep learning based approach for sarcasm identification that employs pre-trained sentiment, emotion and personality models built on convolutional neural networks.
296 A. Onan
3 Word-Embedding Schemes
The word2vec model learns distributed word representations using the continuous bag-of-words (CBOW) or skip-gram architecture [22]. In the skip-gram architecture, the training objective is to maximize the average log probability of the surrounding context words given the center word, as determined by Eq. 1:

$$\operatorname*{arg\,max}_{\theta}\ \frac{1}{T}\sum_{t=1}^{T}\ \sum_{-C \le j \le C,\ j \ne 0} \log P_{\theta}\left(w_{t+j} \mid w_{t}\right) \tag{1}$$

where T denotes the number of training words, C denotes the size of the training context and P_θ(w_{t+j} | w_t) represents a neural network with a set of parameters denoted by θ.
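For a toy corpus, the skip-gram objective of Eq. 1 can be evaluated directly with a full-softmax parameterization (a sketch with assumed toy sizes; practical word2vec training replaces the full softmax with negative sampling or a hierarchical softmax for efficiency):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, C = 5, 4, 2                      # vocabulary size, embedding dim, context size
corpus = [0, 1, 2, 3, 4, 1, 2]         # token ids of a toy training sequence
W_in = rng.normal(size=(V, d))         # center-word ("input") vectors
W_out = rng.normal(size=(V, d))        # context-word ("output") vectors

def log_prob(center: int, context: int) -> float:
    """log P_theta(w_context | w_center) under a full softmax over the vocabulary."""
    scores = W_out @ W_in[center]              # dot product with every output vector
    scores -= scores.max()                     # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum())
    return float(log_softmax[context])

# Average log probability of Eq. 1 over the toy corpus
T = len(corpus)
objective = sum(
    log_prob(corpus[t], corpus[t + j])
    for t in range(T)
    for j in range(-C, C + 1)
    if j != 0 and 0 <= t + j < T
) / T
```

Training then adjusts W_in and W_out by gradient ascent on this quantity.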
The fastText model is an extension of the word2vec model which represents each word by breaking it into several character n-grams (sub-words) [24]. With the use of fastText based word embeddings, rare words can be represented in a more efficient way. The fastText model is computationally efficient and, since it takes character n-grams into account, good representations can be obtained for rare words.
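The sub-word decomposition can be illustrated by enumerating character n-grams. The 3-to-6 character range and the '<' and '>' word-boundary markers follow the fastText paper [24]; the function itself is only a sketch:

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list:
    """Character n-grams of a word with fastText-style '<'/'>' boundary
    markers; the full marked word is also kept as its own sub-word."""
    marked = f"<{word}>"
    grams = [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]
    if marked not in grams:
        grams.append(marked)
    return grams

subwords = char_ngrams("where")
```

A word vector is then obtained as the sum of the vectors of its sub-words, which is why unseen or rare words still receive useful representations.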
The global vectors (GloVe) model is a global log-bilinear regression model for word embeddings based on global matrix factorization and local context window methods [25]. The objective of the GloVe model is determined by Eq. 2:

$$J = \sum_{i,j=1}^{V} f\left(X_{ij}\right)\left(w_{i}^{T} x_{j} + b_{i} + \tilde{b}_{j} - \log X_{ij}\right)^{2} \tag{2}$$

where V denotes the vocabulary size, w ∈ R^d represent word vectors, x ∈ R^d represent context word vectors, X denotes the co-occurrence matrix, X_{ij} denotes the number of times word j occurs in the context of word i, f(X_{ij}) denotes a weighting function and b_i, b̃_j are bias parameters [25].
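Eq. 2 can be evaluated on a toy co-occurrence matrix. The weighting function f below uses the x_max = 100 and α = 0.75 values proposed in the GloVe paper [25]; the toy counts and dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 4, 3
X = rng.integers(0, 10, size=(V, V)).astype(float)   # toy co-occurrence counts
W = rng.normal(scale=0.1, size=(V, d))               # word vectors w_i
Wc = rng.normal(scale=0.1, size=(V, d))              # context word vectors x_j
b = rng.normal(scale=0.1, size=V)                    # word biases b_i
bc = rng.normal(scale=0.1, size=V)                   # context biases

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs, caps very frequent ones."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# Weighted least-squares cost of Eq. 2, summed over nonzero counts
J = 0.0
for i in range(V):
    for j in range(V):
        if X[i, j] > 0:
            err = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
            J += f(X[i, j]) * err ** 2
```

Training minimizes J with stochastic gradient updates over the nonzero entries of X.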
4 Convolutional Neural Networks

Convolutional neural networks (CNNs) are a type of neural network that processes data with a grid-like topology. Convolutional neural networks have been successfully utilized in a number of applications, including image recognition, computer vision and natural language processing [27, 28]. Convolutional neural networks are characterized by the convolution operation in their layers. A typical convolutional neural network architecture consists of an input layer, an output layer and hidden layers. The hidden layers of the architecture may be convolutional layers, pooling layers, normalization layers or fully connected layers. In convolutional layers, the convolution operation is applied to the input data to obtain feature maps. In order to add nonlinearity to the architecture, each feature map is also subjected to an activation function [29]. In convolutional neural networks, the rectified linear unit is widely utilized as the activation function. In pooling layers, the number of parameters and operations for obtaining the output is reduced in order to mitigate overfitting. In convolutional neural networks, the max pooling scheme is widely utilized as the pooling function. After the convolution and pooling layers, the final output of the architecture is produced by fully connected layers [30].
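As a sketch of the convolution, activation and pooling operations described above, a single one-dimensional feature map can be computed as follows (toy signal and fixed kernel; real convolutional layers learn many filters):

```python
import numpy as np

def conv1d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 1-D convolution (cross-correlation, as used in CNN layers)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit activation."""
    return np.maximum(x, 0.0)

def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; trailing elements that do not fill
    a full window are dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

signal = np.array([1.0, -2.0, 3.0, -1.0, 2.0, 0.0])
kernel = np.array([1.0, 0.0, -1.0])
feature_map = max_pool(relu(conv1d_valid(signal, kernel)))  # → [0.0, 1.0]
```

In a text CNN the signal would be a sequence of word-embedding vectors and the pooled feature maps would feed the fully connected classification layers.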
5 Experimental Procedure and Results

The corpus utilized in the empirical analysis contains roughly 15,000 sarcastic tweets and roughly 24,000 non-sarcastic tweets. In order to obtain a balanced corpus, our final corpus contains Twitter messages with 15,000 sarcastic tweets and 15,000 non-sarcastic tweets. To preprocess our corpus, we have adopted the framework presented in [32]. First, tokenization has been applied to the corpus to divide tweets into tokens, such as words and punctuation marks. For the tokenization process, the Twokenize tool has been utilized. At the end of the tokenization process, unnecessary items generated by Twokenize have been eliminated. In addition, mentions, replies to other users' tweets, URLs and special characters have been eliminated. In Table 1, the distribution of the dataset and the basic descriptive information for the dataset are given.
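The tokenization itself relies on the Twokenize tool; a much simplified, regex-based approximation of the described cleaning steps (removing mentions, URLs and special characters) might look as follows:

```python
import re

def clean_tweet(tweet: str) -> str:
    """Simplified stand-in for the preprocessing pipeline: drop mentions,
    URLs and special characters, then normalize whitespace.  This is only
    a sketch; the actual pipeline uses the Twokenize tool."""
    tweet = re.sub(r"@\w+", " ", tweet)                   # mentions / replies
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)  # URLs
    tweet = re.sub(r"[^A-Za-z0-9#' ]", " ", tweet)        # special characters
    return re.sub(r"\s+", " ", tweet).strip()

cleaned = clean_tweet("@user Sure, I #love Mondays!!! http://t.co/xyz")
```

Note that hashtags are deliberately kept here, since hashtag words often carry the sentiment or sarcasm signal.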
Precision (PRE) is the proportion of the true positives against all predicted positives (true positives and false positives), as given by Eq. 3:

$$\mathrm{PRE} = \frac{TP}{TP + FP} \tag{3}$$
Recall (REC) is the proportion of the true positives against the true positives and
false negatives as given by Eq. 4:
$$\mathrm{REC} = \frac{TP}{TP + FN} \tag{4}$$
F-measure takes values between 0 and 1. It is the harmonic mean of precision and
recall as determined by Eq. 5:
$$F\text{-}measure = \frac{2 \cdot \mathrm{PRE} \cdot \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}} \tag{5}$$
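Eqs. 3–5 follow directly from the confusion-matrix counts; a small sketch with illustrative counts:

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Precision, recall and F-measure (Eqs. 3-5) from confusion-matrix counts."""
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * pre * rec / (pre + rec)
    return pre, rec, f

# e.g. 80 correctly identified sarcastic tweets, 20 false alarms, 10 misses
pre, rec, f = precision_recall_f(tp=80, fp=20, fn=10)
```

The counts above are purely illustrative, not results from the paper.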
In order to summarize the main findings of the empirical analysis, Fig. 1 presents the main effects plot for average F-measure values of the compared representation schemes and Fig. 2 presents the main effects plot for the different subsets of the dataset. Among the compared subsets of the corpus, ranging from 5,000 to 30,000 tweets, the highest predictive performance is obtained by utilizing the entire corpus, denoted as Subset#6.
Fig. 1. The main effects plot for average F-measure values of compared representation schemes
6 Conclusion
Sarcasm is a type of nonliteral language in which people may express negative sentiments with words that have a positive literal meaning and, conversely, use negatively connoted words to indicate positive sentiment. With the advances in information and communication technologies, an immense quantity of user-generated information has become available on the Web. As a result, sentiment analysis, which is the process of extracting public sentiment towards entities or subjects, is a promising research direction. Much online content contains sarcasm or other forms of nonliteral language. The predictive performance of sentiment classification schemes may be degraded if sarcasm cannot be handled properly. In this paper, we present a deep learning based approach to sarcasm detection. In this scheme, the LDA2vec, word2vec, fastText and GloVe word-embedding schemes have been utilized for sarcasm identification. In addition to word-embedding based feature sets, conventional lexical, pragmatic, implicit incongruity and explicit incongruity based feature sets are considered. The experimental results indicate that LDA2vec outperforms the other word-embedding schemes for sarcasm identification. In addition, the utilization of lexical, pragmatic, implicit incongruity and explicit incongruity based features in conjunction with word-embedding based representation schemes can yield promising results.
References
1. Joshi, A., Bhattacharyya, P., Carman, M.J.: Automatic sarcasm detection: a survey. ACM
Comput. Surv. 50, 73 (2017)
2. Fersini, E., Messina, E., Pozzi, F.A.: Sentiment analysis: Bayesian ensemble learning. Decis.
Support Syst. 68, 26–38 (2014)
3. Joshi, A., Bhattacharyya, P., Carman, M.J.: Understanding the phenomenon of sarcasm. In:
Joshi, A., Bhattacharyya, P., Carman, M.J. (eds.) Investigations in Computational Sarcasm,
pp. 33–57. Springer, Berlin (2018)
4. Onan, A.: Sarcasm identification on twitter: a machine learning approach. In: Silhavy, R.,
Senkerik, R., Kominkova, Z., Prokopova, Z., Silhavy, P. (eds.) Artificial Intelligence Trends
in Intelligent Systems, pp. 374–383. Springer, Berlin (2017)
5. Muresan, S., Gonzalez-Ibanez, R., Ghosh, D., Wacholder, N.: Identification of nonliteral
language in social media: a case study on sarcasm. J. Assoc. Inf. Sci. Technol. (2016).
https://doi.org/10.1002/asi.23624
6. Java, A., Song, X., Finin, T., Tseng, B.: Why we twitter: understanding microblogging usage
and communities. In: Proceedings of the 9th WebKDD Conference, pp. 56–65. ACM, New
York (2007)
7. Zhang, M., Zhang, Y., Fu, G.: Tweet sarcasm detection using deep neural network. In:
Proceedings of the 26th International Conference on Computational Linguistics, pp. 2449–
2460. COLING, New York (2016)
8. Gonzalez-Ibanez, R., Muresan, S., Wacholder, N.: Identifying sarcasm in twitter: a closer
look. In: Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics, pp. 581–586. ACL, New York (2011)
9. Reyes, A., Rosso, P., Buscaldi, D.: From humor recognition to irony detection: the figurative language of social media. Data Knowl. Eng. 74, 1–12 (2012)
10. Reyes, A., Rosso, P., Veale, T.: A multidimensional approach for detecting irony in twitter.
Lang. Resour. Eval. 47(1), 239–268 (2013)
11. Ptacek, T., Habernal, I., Hong, J.: Sarcasm detection on czech and english twitter. In:
Proceedings of COLING 2014, pp. 213–223. COLING, New York (2014)
12. Barbieri, F., Saggion, H., Ronzano, F.: Modelling sarcasm in twitter a novel approach. In:
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment
and Social Media Analysis, pp. 50–58. ACL, New York (2014)
13. Rajadesingan, A., Zafarani, R., Liu, H.: Sarcasm detection on twitter: a behavioural
modelling approach. In: Proceedings of the Eight ACM International Conference on Web
Search and Data Mining, pp. 97–106. ACM, New York (2015)
14. Hernandez-Faria, D., Patti, V., Rosso, P.: Irony detection in twitter: the role of affective
content. ACM Trans. Internet Technol. 16(3), 1–19 (2016)
15. Bouazizi, M., Ohtsuki, T.O.: A pattern-based approach for sarcasm detection on Twitter.
IEEE Access 4, 5477–5488 (2016)
16. Kumar, L., Somani, A., Bhattacharyya, P.: Having 2 hours to write a paper is fun: detecting
sarcasm in numerical portions of text. arXiv preprint arXiv:1709.01950 (2017)
17. Mishra, A., Kanojia, D., Nagar, S., Dey, K., Bhattacharyya, P.: Harnessing cognitive
features for sarcasm detection. arXiv preprint arXiv:1701.05574 (2017)
18. Ghosh, D., Guo, W., Muresan, S.: Sarcastic or not: word embeddings to predict the literal or
sarcastic meaning of words. In: Proceedings of the 2015 Conference on Empirical Methods
in Natural Language Processing, pp. 1003–1012. ACL, New York (2015)
19. Joshi, A., Tripathi, V., Patel, K., Bhattacharyya, P., Carman, M.: Are word embedding-based
features useful for sarcasm detection. arXiv preprint arXiv:1610.00883 (2016)
20. Poria, S., Cambria, E., Hazarika, D., Vij, P.: A deeper look into sarcastic tweets using deep
convolutional neural networks. arXiv preprint arXiv:1610.08815 (2016)
21. Rezaeinia, S.M., Ghodsi, A., Rahmani, R.: Improving the accuracy of pre-trained word
embeddings for sentiment analysis. arXiv preprint arXiv:1711.08609 (2017)
22. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space. arXiv preprint arXiv:1301.3781 (2013)
23. Bairong, Z., Wenbo, W., Zhiyu, L., Chonghui, Z., Shinozaki, T.: Comparative analysis of
word embedding methods for DSTC6 end-to-end conversation modelling track. In:
Proceedings of the 6th Dialog System Technology Challenges Workshop (2017)
24. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword
information. arXiv preprint arXiv:1607.04606 (2016)
25. Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing,
pp. 1532–1543. ACL, New York (2014)
26. Moody, C.E.: Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019 (2016)
27. Johnson, R., Zhang, T.: Effective use of word order for text categorization with
convolutional neural networks. arXiv preprint arXiv:1412.1058 (2014)
28. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural
language processing. IEEE Comput. Intell. Mag. 13(3), 55–75 (2018)
29. Kilimci, Z., Akyokus, S.: Deep learning and word embedding-based heterogeneous classifier
ensembles for text classification. Complexity 2018, 1–10 (2018)
30. Cireşan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image
classification. arXiv preprint arXiv:1202.2745 (2012)
31. Gonzalez-Ibanez, R., Muresan, S., Wacholder, N.: Identifying sarcasm in Twitter: a closer look. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 581–586. ACL, New York (2011)
32. Paredes-Valverde, M.A., Colomo-Palacios, R., Salas-Zarate, M., Valencia-Garcia, R.:
Sentiment analysis in Spanish for improvement of product and services: a deep learning
approach. Sci. Program. 2017, 1–12 (2017)
33. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., Huang, R.: Sarcasm as contrast
between a positive sentiment and negative situation. In: Proceedings of the 2013 Conference
on Empirical Methods in Natural Language Processing, pp. 704–714. ACL, New York
(2013)
34. Ramteke, A., Malu, A., Bhattacharyya, P., Nath, J.S.: Detecting turnarounds in sentiment
analysis: thwarting. In: Proceedings of the 51st Annual Meeting of the Association for
Computational Linguistics, pp. 860–865. ACL, New York (2013)