ABSTRACT Micro-blogging sources such as the Twitter social network provide valuable real-time data for
market prediction models. Investors’ opinions in this network follow the fluctuations of the stock markets
and often include educated speculations on market opportunities that may have impact on the actions of
other investors. In view of this, we propose a novel system to detect positive predictions in tweets, a type
of financial emotion that we term “opportunities” and that is akin to “anticipation” in Plutchik’s theory.
Specifically, we seek a high detection precision to present a financial operator with a substantial amount of such
tweets while differentiating them from the rest of the financial emotions in our system. We achieve this with
a three-layer stacked Machine Learning classification system with sophisticated features that result from
applying Natural Language Processing techniques to extract valuable linguistic information. Experimental
results on a dataset that has been manually annotated with financial emotion and ticker occurrence tags
demonstrate that our system yields satisfactory and competitive performance in financial opportunity
detection, with precision values up to 83%. This promising outcome endorses the usability of our system to
support investors’ decision making.
INDEX TERMS Financial management, machine learning, emotion recognition, natural language
processing.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 215679
F. de Arriba-Pérez et al.: Detection of Financial Opportunities in Micro-Blogging Data With a Stacked Classification System
To the best of our knowledge, this work is the first proposal based on EA techniques to detect what we call financial opportunities, that is, Twitter posts that speculate or reason that the value of particular assets will grow, rather than merely expressing positive opinions about those assets. Specifically, we present a three-layer stacked classifier with a first stage to discard neutral entries, a second stage to distinguish generic positive emotions from negative ones and a last stage to differentiate opportunity entries (our ultimate goal) from any other positive emotions.
For this purpose, the system is enhanced with sophisticated linguistic information features such as n-gram sequences, emotion and polarity dictionaries, and frequency counters of hashtags, numerical information and percentages.

B. RESEARCH GOAL AND CONTRIBUTIONS
We seek a Machine Learning system to detect financial opportunities in textual information extracted from tweets, where by opportunities we refer to positive user speculations and forecasts (e.g. “the average yield of Google is 15.5%”) instead of mere positive statements (e.g. “I like Google”). They are akin to looking forward positively to upcoming events, as “anticipation” in Plutchik’s theory [24].
This problem has not been previously considered in academic research and has remarkable practical applications for both stock market screening and decision making. Notwithstanding this applied perspective, our research also has a theoretical goal: an enhanced understanding of the linguistic dimension of micro-blogging comments to extract investment opportunities. Summing up, this work contributes to more effective Machine Learning classifiers for stock market data mining with the following:
• A three-layer stacked Machine Learning classification system for detecting opportunities in micro-blogging comments.
• An analysis of the most relevant features through NLP techniques.
In addition, we employ a new dataset of investors’ comments for stock market forecasting with 6,000 entries (similar in size to those in other state-of-the-art studies [25]–[28]), which has been manually annotated with financial emotion and ticker occurrence tags (e.g. in “the average yield of Google is 15.5%” the ticker tag is GOOGL) by experts in the field.
The rest of this article is organised as follows. Section II reviews related work in knowledge extraction, discussing EA from general and financial perspectives. Section III describes the classification problem and our solution. Section IV presents the experimental text corpus and the numerical tests that validate our approach. Finally, Section V concludes this article.

II. RELATED WORK
Novice investors in stock markets give high consideration to the experience of financial experts. Thus, automatic knowledge extraction from the comments by experts in financial news, blogs and social media is an interesting research goal and a valuable asset for practical applications [29], [30].
Previous researchers have analysed online sources to characterise stock markets. Most of them have focused on news, as [31], who studied the correlation between financial market events and financial news content. In this regard, the work by [32] deserves a special mention. Using Machine Learning models of Latent Dirichlet Allocation, they concluded that the information extracted from news sources is better than price variations for predicting assets’ volatility. Additionally, in [33] a Deep Learning (DL) model for news categorisation was presented, proving that financial sources of information have a significant effect on investments. There is also work on knowledge extraction from financial blogs. For example, the method by [34] combined stock indexes with sentiment time series from micro-blogs. Besides, [29] applied multivariate regression to financial blogs to study how the markets react to the posts. Finally, the study of the information posted on social media platforms like Twitter has been a relevant research topic in recent years. Consider for example the analysis of consumer profiles by [35] and the study of the evolution of stock prices and social media content by [36] based on a latent space model [37]. However, none of these works has considered user speculations about the evolution of the assets. Note that all of them have taken temporal information directly from the timestamps of financial forecasts or posts [29], [32], [34], [36]. In fact, we are not aware of any financial research supported by temporal analysis at discursive level such as ours (footnote 5).
Footnote 5: This aspect is relevant. Even though [38] already noted the interest of the temporal perspective of the customers, i.e. the importance of their own understanding of time, few works have addressed temporal analysis at discursive level. Among early ones we must mention [39], which extracted time from punctuation, word choices and academic terminology or keywords in journal titles, by also considering common sense context information.
In this context, NLP techniques [40] such as SA [41] and EA [42], [43] have successfully extracted knowledge from comments of stock market experts. Specifically, in [42] the authors presented a novel approach for the automatic extraction of a high-coverage and high-precision lexicon annotated with emotion scores, while in [43] they confirmed the impact of social media emotions on the stock market.
Often as an initial stage of EA, SA infers sentiment polarity from language features of user opinions [41], [44]. Typically, three polarity levels (negative, neutral and positive) are considered [45], although some authors add two deeper positive and negative sentiment levels [46], [47]. Among the wealth of research in Twitter mining, we can cite for example the work by [48] on the analysis of consumer opinions about commercial brands, or by [49] on financial SA. The latter classified Twitter information about eight stock markets with a Support Vector Classifier (SVC).
EA infers feelings from linguistic information in user comments [50], [51]. It has become a central topic in the field of Affective Computing. Some typical emotion classes in
the literature are love, joy, anger, sadness, fear and surprise, as defined by [52]. For example, the approach by [53] detected affection in virtual communication environments. EA has numerous applications in education [54], healthcare [55], [56] and gaming [57], to name a few. It has also caught the attention of the industry as a profitable asset. In the particular case of finance, [58] proposed a Linked Data approach for modelling sentiment and emotions from tweets about securities of the Spanish stock market. The system for financial forecasting by [22] incorporated different sources of information into an integrative river model and combined GPOMS (footnote 6) with a Neural Network classifier, and [23] applied a multi-layer perceptron to study the influence of investor emotions in investment decisions. In the end, even though EA is still an interesting research topic [59], [60], no previous analyses have taken into consideration finance-related emotions, such as opportunity or anticipation, and have focused exclusively on general human emotions.
Training methodologies, on the other hand, can be divided into supervised (manually annotated), semi-supervised and unsupervised approaches. For example, in [61] some authors of this article proposed an unsupervised system for automatic generation of emotion lexica with polarity data. Hybrid solutions such as [62] (which applied classifiers as well as unsupervised approaches with polarity lexicons and syntactic structures) have produced highly competitive results. Finally, stacking strategies combining different Machine Learning techniques into the same predictive model can improve performance [63], [64]. It has been our intent to exploit the benefits of stacking in ensemble learning techniques to further increase the performance of our system.

III. SYSTEM ARCHITECTURE
In this section we present our novel three-layer stacked system for detecting financial tweets on market opportunities. Specifically, the first stage detects neutral entries, the second stage distinguishes between general positive and negative emotions and the last stage separates opportunity entries from any other positive statements. Figure 1 illustrates its framework, which consists of two modules that will be described in detail in the next subsections, where financial tweets are taken from a training database. Table 1 shows examples of input data and the corresponding emotion categories (including “opportunity”, which we have already described). In the context of Plutchik’s theory [24], “positive statement” corresponds to any positive emotion on present events such as “expectation”; and “negative awareness” to any negative opinion about an asset, regardless of the sentiment expressed in the tweet, which is similar to “disapproval”.

A. DATA PROCESSING MODULE
This module discards useless data. Specifically, we apply a series of text transformations to ensure the quality of the data that enter the classification module. Table 2 presents some examples before and after applying these transformations.
• Filtering. It selects finance-related tweets while discarding spam and related tweets. Basically, it keeps tweets containing asset identifiers such as stock markets’ tickers, URLs and quantitative information. We also remove advertising entries with spam-related words and expressions such as sorteo ‘draw’, gratis ‘free’ and ¿quieres ganar dinero? ‘do you want to earn money?’. Moreover, we discard words in languages other than Spanish using the Enchant Python module (footnote 7). Finally, to discard highly similar tweets, we implemented a similarity detector based on the Jaccard distance with a threshold of 0.75%, inspired by state-of-the-art spam detection techniques [65]–[67].
• Hashtag and mention splitting. We decompose hashtags into words with a splitter that uses our own lexica [68], [69] and the Spanish frequency reference corpus (CREA) by Real Academia Española de la Lengua (footnote 8). As shown in the example in Table 2, the splitter divides the term acuerdocomercial successfully as acuerdo comercial ‘commercial agreement’.
• Spelling correction. We correct words with spelling mistakes by replacing them with the most likely candidate through our own word distance algorithm that uses CREA as well as the Enchant Python module. In the example in Table 2, the word precausion is corrected and replaced by precaución ‘caution’.
• Stock market asset, mention and hashtag detection and removal. We use regular expressions to detect capitalised letters and identify representative symbols such as $, @ and #. Once identified, all these elements are removed from the text of the entries.
• Stop words removal. Meaningless words such as determiners and prepositions (footnote 9) are removed from the text. We also remove URLs and retweet (RT) tags. We consider days of the week and months of the year as stop words. However, we keep elements such as no ‘not’, sí ‘yes’, muy ‘very’ and poco ‘few’, since they help to interpret the evolution of the assets.
• Quantitative data and laughter replacement. We substitute quantitative data (numbers and percentages) by tags representing their sign, that is, ‘+’ and ‘−’ for positive and negative figures, respectively. We decided to use these symbolic representations rather than the amounts themselves because they are more clearly related to stock markets’ upward and downward movements. Laughing onomatopoeic expressions such as haha and hehe, which are typically composed of the characters j or h interspersed with the vowels a, e and i in any quantity, are replaced by the LAUGH tag.
• Text lemmatisation. The content of the tweets is split into tokens (words), which are independently checked in the Hunspell dictionary (footnote 10). Finally, the words are lemmatised using the Freeling NLP tool (footnote 11).

B. CLASSIFICATION MODULE
1) MACHINE LEARNING MODEL: FEATURE PROCESSING
The features we selected to extract valuable information from our dataset and build our model are:
• Char-grams, word tokens and word-grams. Char-grams are sequences of adjacent characters in a text (note that spaces must also be considered in this case). Word tokens represent character grams only from the text inside word boundaries, whereas word n-grams are sequences of adjacent words in a text, among which we also include the textual representation of the emojis in the tweets. To generate char n-grams, word tokens and word n-grams we combined CountVectorizer (footnote 12) with GridSearchCV (footnote 13), both from the Scikit-Learn Python library, using the parameters in Listing 1. GridSearchCV performs an exhaustive search over specified parameter ranges for an estimator. As a result, we obtained the optimal parameters max_df = 0.5, min_df = 0.001 and ngram_range = (1,7) in our scenario.
• Frequency counters. These features count the number of hashtags, negative and positive amounts and percentages, exclamation marks, question marks and adverbs in the content of the tweets.

Footnote 6: Google Profile of Mood States (GPOMS). At the time this article was written it was no longer available.
Footnote 7: Available at https://pypi.org/project/pyenchant/, August 2020.
Footnote 8: Available at http://corpus.rae.es/lfrecuencias.html, August 2020.
Footnote 9: Available at https://www.ranks.nl/stopwords/spanish, August 2020.
Footnote 10: Available at https://hunspell.github.io/, August 2020.
Footnote 11: Available at http://nlp.lsi.upc.edu/freeling/node/1, August 2020.
Footnote 12: Available at https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html, July 2020.
Footnote 13: Available at https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html, August 2020.
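The quantitative data and laughter replacement step and the frequency counters described above can be illustrated with a minimal sketch based only on the Python standard library. It is not the authors' implementation: the regular expressions, the function names and the plain ASCII `-` used as the negative tag are assumptions made for illustration. In particular, the laughter pattern is deliberately stricter than the textual description (it requires at least two syllables with the same consonant) so that ordinary Spanish words such as hija are left untouched, and the adverb counter is omitted because it would need a part-of-speech tagger.

```python
import re

# Hypothetical helpers for illustration; names and patterns are assumptions.
NEGATIVE_SIGNS = ("-", "\u2212")  # ASCII hyphen-minus and Unicode minus sign

# Optional sign, digits with optional decimal groups, optional percent sign.
# The lookbehind skips digits glued to letters (e.g. inside a ticker like IBEX35).
NUMBER_RE = re.compile(r"(?<!\w)([+\-\u2212]?)\d+(?:[.,]\d+)*(?:\s?%)?")

# Laughter: two or more syllables built from the same consonant (j or h)
# followed by the vowels a, e or i, e.g. "jajaja", "hehe", "jiji".
LAUGH_RE = re.compile(r"\b[aei]*([jh])[aei]+(?:\1[aei]+)+\1?\b", re.IGNORECASE)


def _sign(match):
    """Map a matched figure to its sign tag; unsigned figures count as positive."""
    return "-" if match.group(1) in NEGATIVE_SIGNS else "+"


def replace_quantities(text):
    """Replace every number or percentage by a tag with its sign."""
    return NUMBER_RE.sub(_sign, text)


def replace_laughter(text):
    """Replace laughing onomatopoeias by the LAUGH tag."""
    return LAUGH_RE.sub("LAUGH", text)


def frequency_counters(text):
    """Count surface cues on the raw text, before any replacement is applied."""
    signs = [_sign(m) for m in NUMBER_RE.finditer(text)]
    return {
        "hashtags": len(re.findall(r"#\w+", text)),
        "positive_amounts": signs.count("+"),
        "negative_amounts": signs.count("-"),
        "percentages": text.count("%"),
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
    }


print(replace_quantities("sube 3,5% hoy"))     # sube + hoy
print(replace_laughter("jajaja qué bueno"))    # LAUGH qué bueno
print(frequency_counters("#IBEX35 sube +1,2% y #SAN cae -0,8%!"))
```

In this sketch the counters are computed on the raw tweet, since the sign replacement would otherwise erase the percentage symbols that one of the counters relies on; whether the original system follows the same order is not stated in the text.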
tweet and $\hat{y}_i \in \mathbb{N}$ its predicted category (so that $y_i$ and $\hat{y}_i$ are taken to a numerical space that allows defining a partial or total ordering).
The level of confidence in the prediction depends on the probabilities of correct class assignment. Logically, it is possible to select subsets of vectors in the training set for which such probabilities are higher, and thus to trade accuracy for precision.

IV. RESULTS AND DISCUSSION
All the experiments were executed on a computer with the following specifications:
• Operating System: Ubuntu 18.04.2 LTS 64-bit
• Processor: Intel Core i9-9900K 3.60 GHz
• RAM: 32 GB DDR4
• Disk: 500 GB (7200 rpm SATA) + 256 GB SSD

A. DATASET
TxStockData S.L., a fintech company, provided us with a dataset composed of 6,000 tweets (similar in size to those in other state-of-the-art studies [25]–[28]). The dataset was gathered from 14th May 2019 to 3rd February 2020. Its entries were manually annotated with financial emotion tags by five experts in finance and NLP. At the end, the emotion
TABLE 4. Distribution of samples in the dataset by category.

of the probability of presenting negative awareness results as opportunities, $A^{-}_{P^{+}}$, which would strongly discourage an operator, that is

$$\tau_2 = 1 - \frac{A^{-}_{P^{+}}}{S^{+}_{P^{+}} + P^{+}_{P^{+}} + NP_{P^{+}} + A^{-}_{P^{+}}}$$

(where $NP_{P^{+}}$ corresponds to neutral tweets from a financial perspective). Therefore, our declared goal is first and foremost a reasonably high precision in the detection of opportunities, for high values of $\tau_1$ and, secondarily, of $\tau_2$.
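As a worked illustration of these quantities, the following minimal sketch computes the opportunity precision and the tolerance τ2 from lists of true and predicted categories. It assumes, following the definition of τ2 above, that the denominator is the number of entries predicted as opportunities and the numerator the number of negative awareness entries among them. The function name and the category labels are hypothetical, and τ1 is left out because its definition is not part of this excerpt.

```python
def opportunity_metrics(y_true, y_pred,
                        target="opportunity",
                        harmful="negative_awareness"):
    """Return (precision, tau2) over the entries predicted as `target`.

    precision: share of predicted opportunities that are true opportunities.
    tau2: one minus the share of predicted opportunities that are actually
          negative awareness entries.
    The label strings are illustrative assumptions.
    """
    # True categories of everything the classifier would show to the operator.
    shown = [true for true, pred in zip(y_true, y_pred) if pred == target]
    if not shown:
        raise ValueError("no entry was predicted as the target category")
    precision = shown.count(target) / len(shown)
    tau2 = 1 - shown.count(harmful) / len(shown)
    return precision, tau2


y_true = ["opportunity", "opportunity", "positive_statement",
          "negative_awareness", "neutral", "opportunity"]
y_pred = ["opportunity", "opportunity", "opportunity",
          "opportunity", "opportunity", "neutral"]

# Five entries are predicted as opportunities: two are correct and one is
# a negative awareness entry, so precision is 2/5 and tau2 is 1 - 1/5.
precision, tau2 = opportunity_metrics(y_true, y_pred)
print(precision, tau2)
```

A high τ2 with a moderate precision would mean that most mistakes shown to the operator are harmless (positive statements or neutral tweets) rather than negative awareness entries.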
TABLE 6. Precisions and tolerances for numerical test 1.
TABLE 8. Precisions and tolerances of the two-layer stacked classifier with decision depth, numerical test 3.
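The staged decision behind these results (a first stage that discards neutral entries, a second stage that separates positive from negative emotions, and a last stage that isolates opportunities, combined with a decision probability threshold) can be sketched as a simple cascade. This is only an illustration under stated assumptions: the stage functions stand in for trained classifiers and are hypothetical, the 0.5 cut-off of the first two stages is arbitrary, and applying the decision depth only to the last stage is a simplification rather than the authors' documented design.

```python
from typing import Callable

# A stage maps a tweet to the probability of the "positive" outcome of that
# stage (emotional vs neutral, positive vs negative, opportunity vs other).
Stage = Callable[[str], float]


def cascade_predict(tweet: str,
                    p_emotional: Stage,
                    p_positive: Stage,
                    p_opportunity: Stage,
                    depth: float = 0.5) -> str:
    """Classify a tweet with three chained binary decisions.

    `depth` is the decision probability threshold of the last stage:
    raising it yields fewer, more confident opportunity predictions.
    """
    if p_emotional(tweet) < 0.5:
        return "neutral"
    if p_positive(tweet) < 0.5:
        return "negative_awareness"
    if p_opportunity(tweet) >= depth:
        return "opportunity"
    return "positive_statement"


# Constant stand-ins for trained models, used only to show the threshold effect.
stages = dict(
    p_emotional=lambda t: 0.9,
    p_positive=lambda t: 0.8,
    p_opportunity=lambda t: 0.7,
)
print(cascade_predict("tweet", depth=0.5, **stages))  # opportunity
print(cascade_predict("tweet", depth=0.8, **stages))  # positive_statement
```

With a stricter depth the same tweet is no longer reported as an opportunity, which is how the threshold trades coverage for precision.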
management dashboards. It exploits sophisticated linguistic features, such as polarity and emotion dictionaries and temporal identification by discursive analysis, in its ML model. It yields satisfactory and competitive market-level performance in financial opportunity detection.
Experimental results, including two ad-hoc tolerance metrics focusing on data quality from an operator perspective, demonstrate that our three-layer stacked system with decision probability threshold (decision depth) succeeds in achieving our goal. In particular, using an RF algorithm with all features in the model, the system attains approximately 83% financial opportunity precision with detection tolerances τ1 ∼ 90% and τ2 ∼ 96%. Given the valuable up-to-date information in micro-blogging platforms, these promising results endorse the usability of our system to support investors’ decision making.
As future work and to further expand the contributions of our research, the emotion classifiers will be improved with domain-specific tweet filters and quantitative (objective) numerical information from stock exchange prices. In addition to opportunities, this will allow analysing positive statement and negative awareness emotions in stock market reports. Moreover, extensions for a multilingual version of our framework to cover French and English will be provided.

REFERENCES
[1] T. Li, J. van Dalen, and P. J. van Rees, “More than just noise? Examining the information content of stock microblogs on financial markets,” J. Inf. Technol., vol. 33, no. 1, pp. 50–69, Mar. 2018.
[2] X. Li, P. Wu, and W. Wang, “Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong,” Inf. Process. Manage., vol. 57, no. 5, Sep. 2020, Art. no. 102212.
[3] G. Bello-Orgaz, R. M. Mesas, C. Zarco, V. Rodriguez, O. Cordón, and D. Camacho, “Marketing analysis of wineries using social collective behavior from users’ temporal activity on Twitter,” Inf. Process. Manage., vol. 57, no. 5, Sep. 2020, Art. no. 102220.
[4] M. Meire, M. Ballings, and D. Van den Poel, “The added value of social media data in B2B customer acquisition systems: A real-life experiment,” Decis. Support Syst., vol. 104, pp. 26–37, Dec. 2017.
[5] P.-F. Pai and C.-H. Liu, “Predicting vehicle sales by sentiment analysis of Twitter data and stock market values,” IEEE Access, vol. 6, pp. 57655–57662, 2018.
[6] H. Yuan, W. Xu, Q. Li, and R. Lau, “Topic sentiment mining for sales performance prediction in e-commerce,” Ann. Oper. Res., vol. 270, nos. 1–2, pp. 553–576, Nov. 2018.
[7] F. Mai, Z. Shan, Q. Bai, X. Wang, and R. H. L. Chiang, “How does social media impact bitcoin value? A test of the silent majority hypothesis,” J. Manage. Inf. Syst., vol. 35, no. 1, pp. 19–52, Jan. 2018.
[8] Y. Sun, X. Liu, G. Chen, Y. Hao, and Z. J. Zhang, “How mood affects the stock market: Empirical evidence from microblogs,” Inf. Manage., vol. 57, no. 5, pp. 103–181, Jul. 2020.
[9] D. Enke and S. Thawornwong, “The use of data mining and neural networks for forecasting stock market returns,” Expert Syst. Appl., vol. 29, no. 4, pp. 927–940, Nov. 2005.
[10] M. S. Gerber, “Predicting crime using Twitter and kernel density estimation,” Decis. Support Syst., vol. 61, pp. 115–125, May 2014.
[11] A. G. Reece, A. J. Reagan, K. L. M. Lix, P. S. Dodds, C. M. Danforth, and E. J. Langer, “Forecasting the onset and course of mental illness with Twitter data,” Sci. Rep., vol. 7, no. 1, pp. 1–11, Dec. 2017.
[12] K. Zahra, M. Imran, and F. O. Ostermann, “Automatic identification of eyewitness messages on Twitter during disasters,” Inf. Process. Manage., vol. 57, no. 1, pp. 102–107, 2020.
[13] N. Oliveira, P. Cortez, and N. Areal, “The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices,” Expert Syst. Appl., vol. 73, pp. 125–144, May 2017.
[14] M. Nofer and O. Hinz, “Using Twitter to predict the stock market,” Bus. Inf. Syst. Eng., vol. 57, no. 4, pp. 229–242, Aug. 2015.
[15] T. Dimpfl and S. Jank, “Can Internet search queries help to predict stock market volatility?” Eur. Financial Manage., vol. 22, no. 2, pp. 171–192, Mar. 2016.
[16] X. Zhong and D. Enke, “Forecasting daily stock market return using dimensionality reduction,” Expert Syst. Appl., vol. 67, pp. 126–139, Jan. 2017.
[17] X. Zhang, Y. Zhang, S. Wang, Y. Yao, B. Fang, and P. S. Yu, “Improving stock market prediction via heterogeneous information fusion,” Knowl.-Based Syst., vol. 143, pp. 236–247, Mar. 2018.
[18] E. Hoseinzade and S. Haratizadeh, “CNNpred: CNN-based stock market prediction using a diverse set of variables,” Expert Syst. Appl., vol. 129, pp. 273–285, Sep. 2019.
[19] F. Sun, A. Belatreche, S. Coleman, T. M. McGinnity, and Y. Li, “Pre-processing online financial text for sentiment classification: A natural language processing approach,” in Proc. IEEE Conf. Comput. Intell. Financial Eng. Econ. (CIFEr), Mar. 2014, pp. 122–129.
[20] I. E. Fisher, M. R. Garnsey, and M. E. Hughes, “Natural language processing in accounting, auditing and finance: A synthesis of the literature with a roadmap for future research,” Intell. Syst. Accounting, Finance Manage., vol. 23, no. 3, pp. 157–214, Jul. 2016.
[21] F. Z. Xing, E. Cambria, and R. E. Welsch, “Natural language based financial forecasting: A survey,” Artif. Intell. Rev., vol. 50, no. 1, pp. 49–73, Jun. 2018.
[22] K. K. Singh and P. Dimri, “Score based financial forecasting method by incorporating different sources of information flow into integrative river model,” in Proc. 6th Int. Conf. Cloud Syst. Big Data Eng. (Confluence), Jan. 2016, pp. 685–688.
[23] N. I. M. Razi, M. Othman, and H. Yaacob, “Investment decisions based on EEG emotion recognition,” Adv. Sci. Lett., vol. 23, no. 11, pp. 11345–11349, Nov. 2017.
[24] R. Plutchik, “The circumplex as a general model of the structure of emotions and personality,” in Circumplex Models of Personality and Emotions. Worcester, MA, USA: American Psychological Association, 2004, pp. 17–45.
[25] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, “Forecasting stock market crisis events using deep and statistical machine learning techniques,” Expert Syst. Appl., vol. 112, pp. 353–371, Dec. 2018.
[26] M. Al-Smadi, M. Al-Ayyoub, Y. Jararweh, and O. Qawasmeh, “Enhancing aspect-based sentiment analysis of Arabic hotels’ reviews using morphological, syntactic and semantic features,” Inf. Process. Manage., vol. 56, no. 2, pp. 308–319, Mar. 2019.
[27] D. Simester, A. Timoshenko, and S. I. Zoumpoulis, “Targeting prospective customers: Robustness of machine-learning methods to typical data challenges,” Manage. Sci., vol. 66, no. 6, pp. 2495–2522, 2019.
[28] J. Tuke, A. Nguyen, M. Nasim, D. Mellor, A. Wickramasinghe, N. Bean, and L. Mitchell, “Pachinko prediction: A Bayesian method for event prediction from social media data,” Inf. Process. Manage., vol. 57, no. 2, Mar. 2020, Art. no. 102147.
[29] L. K. Rickett, “Do financial blogs serve an infomediary role in capital markets?” Amer. J. Bus., vol. 31, no. 1, pp. 17–40, Apr. 2016.
[30] W. He, F.-K. Wang, and V. Akula, “Managing extracted knowledge from big social media data for business decision making,” J. Knowl. Manage., vol. 21, no. 2, pp. 275–294, Apr. 2017.
[31] M. Alanyali, H. S. Moat, and T. Preis, “Quantifying the relationship between financial news and the stock market,” Sci. Rep., vol. 3, no. 1, pp. 3578–3584, Dec. 2013.
[32] A. Atkins, M. Niranjan, and E. Gerding, “Financial news predicts stock market volatility better than close price,” J. Finance Data Sci., vol. 4, no. 2, pp. 120–137, Jun. 2018.
[33] M.-Y. Day and C.-C. Lee, “Deep learning for financial sentiment analysis on finance news providers,” in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 1127–1134.
[34] Y. Wang, “Stock market forecasting with financial micro-blog based on sentiment and time series analysis,” J. Shanghai Jiaotong Univ. Sci., vol. 22, no. 2, pp. 173–179, Apr. 2017.
[35] E. Ioanăs and I. Stoica, “Social media and its impact on consumers behavior,” Int. J. Econ. Practices Theories, vol. 4, no. 2, pp. 295–303, 2014.
[36] A. Sun, M. Lachanski, and F. J. Fabozzi, “Trade the tweet: Social media text mining and sparse matrix factorization for stock market prediction,” Int. Rev. Financial Anal., vol. 48, pp. 272–281, Dec. 2016.
[37] F. Ming, F. Wong, Z. Liu, and M. Chiang, “Stock market prediction from WSJ: Text mining via sparse matrix factorization,” in Proc. IEEE Int. Conf. Data Mining, Dec. 2014, pp. 430–439.
[38] P. T. Gibbs, “Time, temporality and consumer behaviour,” Eur. J. Marketing, vol. 32, nos. 11–12, pp. 993–1007, Dec. 1998.
[39] J. M. Forray and J. Woodilla, “Artefacts of management academe,” Time Soc., vol. 14, nos. 2–3, pp. 323–339, Sep. 2005.
[40] B. Liu, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge, U.K.: Cambridge Univ. Press, 2015.
[41] A. Derakhshan and H. Beigy, “Sentiment analysis on stock social media for stock price movement prediction,” Eng. Appl. Artif. Intell., vol. 85, pp. 569–578, Oct. 2019.
[42] J. Staiano and M. Guerini, “Depeche mood: A lexicon for emotion analysis from crowd annotated news,” in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, vol. 2. Stroudsburg, PA, USA: Association for Computational Linguistics, 2014, pp. 427–433.
[43] Y. Ge, J. Qiu, Z. Liu, W. Gu, and L. Xu, “Beyond negative and positive: Exploring the effects of emotions in social media during the stock market crash,” Inf. Process. Manage., vol. 57, no. 4, Jul. 2020, Art. no. 102218.
[44] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, “Sentiment analysis of comment texts based on BiLSTM,” IEEE Access, vol. 7, pp. 51522–51532, 2019.
[45] Y. Fang, H. Tan, and J. Zhang, “Multi-strategy sentiment analysis of consumer reviews based on semantic fuzziness,” IEEE Access, vol. 6, pp. 20625–20631, 2018.
[46] A. Balahur and J. M. Perea-Ortega, “Sentiment analysis system adaptation for multilingual processing: The case of tweets,” Inf. Process. Manage., vol. 51, no. 4, pp. 547–556, Jul. 2015.
[47] J. Bučar, J. Povh, and M. Žnidaršič, “Sentiment classification of the Slovenian news texts,” in Proc. 9th Int. Conf. Comput. Recognit. Syst. Cham, Switzerland: Springer, 2016, pp. 777–787.
[48] D. Zimbra, M. Ghiassi, and S. Lee, “Brand-related Twitter sentiment analysis using feature engineering and the dynamic architecture for artificial neural networks,” in Proc. 49th Hawaii Int. Conf. Syst. Sci. (HICSS), Jan. 2016, pp. 1930–1938.
[49] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, “Predictive sentiment analysis of tweets: A stock market application,” in Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data (Lecture Notes in Computer Science), vol. 7947. Berlin, Germany: Springer, 2013, pp. 77–88.
[50] J. K. Rout, K.-K.-R. Choo, A. K. Dash, S. Bakshi, S. K. Jena, and K. L. Williams, “A model for sentiment and emotion analysis of unstructured social media text,” Electron. Commerce Res., vol. 18, no. 1, pp. 181–199, Mar. 2018.
[51] C.-H. Chen, W.-P. Lee, and J.-Y. Huang, “Tracking and recognizing emotions in short text messages from online chatting services,” Inf. Process. Manage., vol. 54, no. 6, pp. 1325–1344, Nov. 2018.
[52] W. G. Parrott, Emotions in Social Psychology: Essential Readings. Hove, U.K.: Psychology Press, 2001.
[53] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, “Textual affect sensing for sociable and expressive online communication,” in Proc. Int. Conf. Affect. Comput. Intell. Interact. Berlin, Germany: Springer, 2007, pp. 218–229.
[54] J. Xu, Z. Huang, M. Shi, and M. Jiang, “Emotion detection in E-learning using expectation-maximization deep spatial-temporal inference network,” in Advances in Intelligent Systems and Computing, vol. 650. Cham, Switzerland: Springer, 2018, pp. 245–252.
[55] M. Z. Asghar, A. Khan, K. Khan, H. Ahmad, and I. A. Khan, “COGEMO: Cognitive-based emotion detection from patient generated health reviews,” J. Med. Imag. Health Informat., vol. 7, no. 6, pp. 1436–1444, Oct. 2017.
[56] S. Z. Bong, K. Wan, M. Murugappan, N. M. Ibrahim, Y. Rajamanickam, and K. Mohamad, “Implementation of wavelet packet transform and non linear analysis for emotion classification in stroke patient using brain signals,” Biomed. Signal Process. Control, vol. 36, pp. 102–112, Jul. 2017.
[57] M. S. Hossain, G. Muhammad, B. Song, M. M. Hassan, A. Alelaiwi, and A. Alamri, “Audio–visual emotion-aware cloud gaming framework,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 12, pp. 2105–2118, Dec. 2015.
[58] J. F. Sánchez-Rada, M. Torres, C. A. Iglesias, R. Maestre, and E. Peinado, “A linked data approach to sentiment and emotion analysis of Twitter in the financial domain,” in Proc. 2nd Int. Workshop Semantic Web Enterprise Adoption Best Pract., 2nd Int. Workshop Finance Econ. Semantic Web, vol. 1240, 2014, pp. 51–62.
[59] D. Duxbury, T. Gärling, A. Gamble, and V. Klass, “How emotions influence behavior in financial markets: A conceptual analysis and emotion-based account of buy-sell preferences,” Eur. J. Finance, vol. 26, pp. 1–22, Mar. 2020.
[60] S. Pengnate and F. J. Riggins, “The role of emotion in P2P microfinance funding: A sentiment analysis approach,” Int. J. Inf. Manage., vol. 54, Oct. 2020, Art. no. 102138.
[61] M. Fernández-Gavilanes, J. Juncal-Martínez, S. García-Méndez, E. Costa-Montenegro, and F. J. González-Castaño, “Creating Emoji Lexica from unsupervised sentiment analysis of their descriptions,” Expert Syst. Appl., vol. 103, pp. 74–91, Aug. 2018.
[62] M. Fernández-Gavilanes, T. Álvarez-López, J. Juncal-Martínez, E. Costa-Montenegro, and F. J. González-Castaño, “GTI: An unsupervised approach for sentiment analysis in Twitter,” in Proc. 9th Int. Workshop Semantic Eval. (SemEval), 2015, pp. 35–40.
[63] A. Mehmood, B.-W. On, I. Lee, I. Ashraf, and G. Sang Choi, “Spam comments prediction using stacking with ensemble learning,” J. Phys., Conf. Ser., vol. 933, Jan. 2018, Art. no. 012012.
[64] Y. Wang, S. Liu, S. Li, J. Duan, Z. Hou, J. Yu, and K. Ma, “Stacking-based ensemble learning of self-media data for marketing intention detection,” Future Internet, vol. 11, no. 7, p. 155, Jul. 2019.
[65] S. R. Harsule and M. K. Nighot, “N-gram classifier system to filter spam messages from OSN user wall,” in Advances in Intelligent Systems and Computing. Singapore: Springer, 2016, pp. 21–28.
[66] S. Bajaj, N. Garg, and S. K. Singh, “A novel user-based spam review detection,” Procedia Comput. Sci., vol. 122, pp. 1009–1015, Jan. 2017.
[67] S. Temma, M. Sugii, and H. Matsuno, “The document similarity index based on the Jaccard distance for mail filtering,” in Proc. 34th Int. Tech. Conf. Circuits/Syst., Comput. Commun. (ITC-CSCC), Jun. 2019, pp. 1–4.
[68] S. García-Méndez, M. Fernández-Gavilanes, E. Costa-Montenegro, J. Juncal-Martínez, and F. J. González-Castaño, “Automatic natural language generation applied to alternative and augmentative communication for online video content services using SimpleNLG for Spanish,” in Proc. Internet Accessible Things, Apr. 2018, pp. 1–4.
[69] S. García-Méndez, M. Fernández-Gavilanes, E. Costa-Montenegro, J. Juncal-Martínez, and F. J. González-Castaño, “A library for automatic natural language generation of Spanish texts,” Expert Syst. Appl., vol. 120, pp. 372–386, Apr. 2019.
[70] J. Jurgovsky, M. Granitzer, K. Ziegler, S. Calabretto, P.-E. Portier, L. He-Guelton, and O. Caelen, “Sequence classification for credit-card fraud detection,” Expert Syst. Appl., vol. 100, pp. 234–245, Jun. 2018.
[71] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE Eur. Symp. Secur. Privacy (EuroS&P), Mar. 2016, pp. 372–387.
[72] R. Keshari, S. Ghosh, S. Chhabra, M. Vatsa, and R. Singh, “Unravelling small sample size problems in the deep learning world,” in Proc. IEEE 6th Int. Conf. Multimedia Big Data (BigMM), Sep. 2020, pp. 134–143.
[73] M. Abdul-Mageed and L. Ungar, “EmoNet: Fine-grained emotion detection with gated recurrent neural networks,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017, pp. 718–728.
[74] L. Santamaria-Granados, M. Munoz-Organero, G. Ramirez-Gonzalez, E. Abdulhay, and N. Arunkumar, “Using deep convolutional neural network for emotion detection on a physiological signals dataset (AMIGOS),” IEEE Access, vol. 7, pp. 57–67, 2019.
[75] R. Luss and S. Rosset, “Generalized isotonic regression,” J. Comput. Graph. Statist., vol. 23, no. 1, pp. 192–210, Jan. 2014.

FRANCISCO DE ARRIBA-PÉREZ received the B.S. degree in telecommunication technologies engineering, the M.S. degree in telecommunication engineering, and the Ph.D. degree from the University of Vigo, Spain, in 2013, 2014, and 2019, respectively.
He is currently a Researcher with the Information Technologies Group, University of Vigo, Spain. His research includes the development of machine learning solutions for different use cases, among which finance and health stand out.