Articulo 2 Ingles

Received October 31, 2020, accepted November 24, 2020, date of publication November 27, 2020,
date of current version December 11, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.3041084
Detection of Financial Opportunities

in Micro-Blogging Data With a
Stacked Classification System
FRANCISCO DE ARRIBA-PÉREZ, SILVIA GARCÍA-MÉNDEZ ,
JOSÉ ÁNGEL REGUEIRO-JANEIRO , AND FRANCISCO J. GONZÁLEZ-CASTAÑO
Information Technology Group, atlanTTic, School of Telecommunications Engineering, University of Vigo, Campus, 36310 Vigo, Spain
Corresponding author: Francisco J. González-Castaño (javier@gti.uvigo.es)
This work was supported in part by the Ministerio de Economía, Industria y Competitividad, Spain, under Grant TEC2016-76465-C2-2-R,
and in part by Xunta de Galicia under Grant GRC2018/053,ED341D-R2016/012.
ABSTRACT Micro-blogging sources such as the Twitter social network provide valuable real-time data for
market prediction models. Investors’ opinions in this network follow the fluctuations of the stock markets
and often include educated speculations on market opportunities that may have impact on the actions of
other investors. In view of this, we propose a novel system to detect positive predictions in tweets, a type
of financial emotions which we term ‘‘opportunities’’ that are akin to ‘‘anticipation’’ in Plutchik’s theory.
Specifically, we seek a high detection precision to present a financial operator a substantial amount of such
tweets while differentiating them from the rest of financial emotions in our system. We achieve it with
a three-layer stacked Machine Learning classification system with sophisticated features that result from
applying Natural Language Processing techniques to extract valuable linguistic information. Experimental
results on a dataset that has been manually annotated with financial emotion and ticker occurrence tags
demonstrate that our system yields satisfactory and competitive performance in financial opportunity
detection, with precision values up to 83%. This promising outcome endorses the usability of our system to
support investors’ decision making.
INDEX TERMS Financial management, machine learning, emotion recognition, natural language
processing.
I. INTRODUCTION sources. Two examples are Thomson Reuters Eikon,3 which

A. MOTIVATION performs sentiment analysis ( SA) of social media content,
Among the many factors that compel people to take decisions and the Bloomberg platform,4 which provides social media
in stock markets, opinions on micro-blogging sites deserve indicators.
consideration [1]–[3]. It has been shown that fast paced infor- Furthermore, [9] argued that, even though stock mar-
mation in social media is valuable for sales prediction [4]–[6] kets contain enough information to define their state, this
and has strong influence on micro-economic trends in stock information becomes obsolete before the general public
markets [7]. Investors’ decisions can be affected not only by can process it to take decisions. Therefore, instantaneous
media content [1] but also by public mood [8]. Besides, the information in the Twitter micro-blogging network deserves
advent of powerful social trading platforms, such as eToro1 attention [10]–[12], and, in particular, in finance [7], [13].
and XTB online trading,2 has broadened the spectrum of Stock market data mining conforms a significant body of
input data. There already exist online tools to analyse these research [14]–[18] supporting market predictability. Some
works have focused on Natural Language Processing (NLP)
techniques [19]–[21] and their application to emotion
1 Available at https://www.etoro.com/, August 2020. analysis (EA) [22], [23].
2 Available at https://www.xtb.com/, August 2020.
The associate editor coordinating the review of this manuscript and 3 Available at https://eikon.thomsonreuters.com/, August 2020.
approving it for publication was Kaustubh Raosaheb Patil . 4 Available at https://www.bloomberg.com/, August 2020.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 215679
F. de Arriba-Pérez et al.: Detection of Financial Opportunities in Micro-Blogging Data With a Stacked Classification System
To the best of our knowledge, this work is the first proposal knowledge extraction from the comments by experts in finan-
based on EA techniques to detect what we call financial oppor- cial news, blogs and social media is an interesting research
tunities, that is, Twitter posts that speculate or reason that goal and a valuable asset for practical applications [29], [30].
the value of particular actives will grow, rather than merely Previous researchers have analysed online sources to char-
expressing positive opinions about those actives. Specifically, acterise stock markets. Most of them have focused on news,
we present a three-layer stacked classifier with a first stage to as [31], who studied the correlation between financial market
discard neutral entries, a second stage to distinguish generic events and financial news content. In this regard, the work
positive emotions from negative ones and a last stage to by [32] deserves a special mention. Using Machine Learning
differentiate opportunity entries –our ultimate goal– from any models of Latent Dirichlet Allocation, they concluded that
other positive emotions. the information extracted from news sources is better than
For this purpose, the system is enhanced with sophisticated price variations for predicting assets’ volatility. Additionally,
linguistic information features such as n-gram sequences, in [33] a Deep Learning ( DL) model for news categorisation
emotion and polarity dictionaries, and frequency counters of was presented proving that financial sources of information
hashtags, numerical information and percentages. have a significant effect on investments. There is also work on
knowledge extraction from financial blogs. For example, the
B. RESEARCH GOAL AND CONTRIBUTIONS method by [34] combined stock indexes with sentiment time
We seek a Machine Learning system to detect financial series from micro-blogs. Besides, [29] applied multivariate
opportunities in textual information extracted from tweets, regression to financial blogs to study how the markets react
where by opportunities we refer to positive user speculations to the posts. Finally, the study of the information posted
and forecasts (e.g. ‘‘the average yield of Google is 15.5%’’) on social media platforms like Twitter has been a relevant
instead of mere positive statements (e.g. ‘‘I like Google’’). research topic in recent years. Consider for example the
They are akin to looking forward positively for upcoming analysis of consumer profiles by [35] and the study of the evo-
events, as ‘‘anticipation’’ in Plutchik’s theory [24]. lution of stock prices and social media content by [36] based
This problem has not been previously considered in aca- on a latent space model [37]. However, none of these works
demic research and has remarkable practical applications for has considered user speculations about the evolution of the
both stock market screening and decision making. Notwith- assets. Note that all of them have taken temporal information
standing this applied perspective, our research has also a directly from the timestamps of financial forecasts or posts
theoretical goal, an enhanced understanding of the linguistic [29], [32], [34], [36]. In fact, we are not aware of any financial
dimension of micro-blogging comments to extract investment research supported by temporal analysis at discursive level
opportunities. Summing up, this work contributes to more such as ours.5
effective Machine Learning classifiers for stock market data In this context, NLP techniques [40] such as SA [41] and
mining with the following: EA [42], [43] have successfully extracted knowledge from
comments of stock market experts. Specifically, in [42] the
• A three-layer stacked Machine Learning classification
authors presented a novel approach for the automatic extrac-
system for detecting opportunities in micro-blogging
tion of a high-coverage and high-precision lexicon annotated
comments.
with emotion scores, while in [43] they confirmed the impact
• An analysis of the most relevant features through NLP
of social media emotions on the stock market.
techniques.
Often as an initial stage of EA, SA infers sentiment polarity
In addition, we employ a new dataset of investors’ com- from language features of user opinions [41], [44]. Typically,
ments for stock market forecasting with 6,000 entries –similar three polarity levels (negative, neutral and positive) are con-
in size to those in other state-of-the-art studies [25]–[28], sidered [45], although some authors add two deeper positive
which has been manually annotated with financial emotion and negative sentiment levels [46], [47]. Among the wealth
and ticker occurrence tags (e.g. in ‘‘the average yield of of research in Twitter mining, we can cite for example the
Google is 15.5%’’ the ticker tag is GOOGL) by experts in work by [48] on the analysis of consumer opinions about
the field. commercial brands, or by [49] on financial SA. The latter
The rest of this article is organised as follows. Section II classified Twitter information about eight stock markets with
reviews related work in knowledge extraction, discussing EA a Support Vector Classifier ( SVC).
from general and financial perspectives. Section III describes EA infers feelings from linguistic information in user com-
the classification problem and our solution. Section IV ments [50], [51]. It has become a central topic in the field
presents the experimental text corpus and the numerical tests of Affective Computing. Some typical emotion classes in
that validate our approach. Finally, Section V concludes this
article. 5 This aspect is relevant. Even though [38] already noted the interest
of the temporal perspective of the customers, i.e. the importance of their
II. RELATED WORK own understanding of time, few works have addressed temporal analysis at
discursive level. Among early ones we must mention [39], which extracted
Novice investors in stock markets give high considera- time from punctuation, word choices and academic terminology or keywords
tion to the experience of financial experts. Thus, automatic in journal titles, by also considering common sense context information.
215680 VOLUME 8, 2020

the literature are love, joy, anger, sadness, fear and sur- that enter the classification module. Table 2 presents some
prise, as defined by [52]. For example, the approach by [53] examples before and after applying these transformations.
detected affection in virtual communication environments. EA • Filtering. It selects finance-related tweets while
has numerous applications in education [54], healthcare [55], discarding spam and related tweets. Basically, tweets
[56] and gaming [57], to cite some. It has also caught the containing asset identifiers such as stock markets’ tick-
attention of the industry as a profitable asset. In the particular ers, URLs and quantitative information. We also remove
case of finance, [58] proposed a Linked Data approach for advertising entries with spam-related words and expres-
modelling sentiment and emotions from tweets about secu- sions such as sorteo ‘draw’, gratis ‘free’ and ?‘quieres
rities of the Spanish stock market. The system for financial ganar dinero? ‘do you want to earn money?’. Moreover,
forecasting by [22] incorporated different sources of informa- we discard words in other languages than Spanish using
tion into an integrated river model and combined GPOMS6 with the Enchant Python module.7 Finally, to discard highly
a Neural Network classifier, and [23] applied a multi-layer similar tweets, we implemented a similarity detector
perceptron to study the influence of investor emotions in based on the Jaccard distance with a threshold of 0.75%,
investment decisions. At the end, even though EA is still an inspired by state-of-the-art spam detection techniques
interesting research topic [59], [60], no previous analyses [65]–[67].
have taken into consideration finance-related emotions, such • Hashtag and mention splitting. We decompose hashtags
as opportunity or anticipation, and have focused exclusively in words with a splitter that uses our own lexica [68],
on general human emotions. [69] and the Spanish frequency reference corpus (CREA)
Training methodologies, on the other hand, can be divided by Real Academia Española de la Lengua.8 As shown
into supervised (manually annotated), semi-supervised and in the example in Table 2, the splitter divides the term
unsupervised approaches. For example, in [61] some authors acuerdocomercial successfully as acuerdo comercial
of this article proposed an unsupervised system for auto- ‘commercial agreement’.
matic generation of emotion lexica with polarity data. Hybrid • Spelling correction. We correct words with spelling
solutions such as [62] (which applied classifiers as well as mistakes by replacing them by the most likely candi-
unsupervised approaches with polarity lexicons and syntactic date through our own word distance algorithm that uses
structures) have produced high competitive results. Finally, CREA as well as the Enchant Python module. In the
stacking strategies combining different Machine Learning example in Table 2, the word precausion is corrected and
techniques into the same predictive model can improve per- replaced by precaución ‘caution’.
formance [63], [64]. It has been our intent to exploit the ben- • Stock market asset, mention and hashtag detection and
efits of stacking in ensemble learning techniques to further removal. We use regular expressions to detect capitalised
increase the performance of our system. letters and identify representative symbols such as $, @
and #. Once identified, all these elements are removed
III. SYSTEM ARCHITECTURE from the text of the entries.
In this section we present our novel three-layer stacked sys- • Stop words removal. Meaningless words such as deter-
tem for detecting financial tweets on market opportunities. miners and prepositions9 are removed from the text.
Specifically, the first stage detects neutral entries, the second We also remove URLs and retweet ( RT) tags. We consider
stage distinguishes between general positive and negative days of the week and months of the year as stop words.
emotions and the last stage separates opportunity entries from However, we keep elements such as no ‘not’, sí ‘yes’,
any other positive statements. Figure 1 illustrates its frame- muy ‘very’ and poco ‘few’, since they help to interpret
work, which consists of two modules that will be described in the evolution of the assets.
detail in the next subsections, where financial tweets are taken • Quantitative data and laughter replacement. We sub-
from a training database. Table 1 shows examples of input stitute quantitative data (numbers and percentages) by
data and the corresponding emotion categories (including tags representing their sign, that is, ‘+’ and ‘−’ for
‘‘opportunity’’, which we have already described). In the positive and negative figures, respectively. We decided
context of Plutchik’s theory [24], ‘‘positive statement’’ cor- to use these symbolic representations rather than the
responds to any positive emotion on present events such as amounts themselves because they are more clearly
‘‘expectation’’; and ‘‘negative awareness’’ to any negative related to stock markets’ upward and downward move-
opinion about an asset, regardless of the sentiment expressed ments. Laughing onomatopoeic expressions such as
in the tweet, which is similar to ‘‘disapproval’’. haha and hehe, which are typically composed of the
characters j or h interspersed with the vowels a, e and
A. DATA PROCESSING MODULE i in any quantity, are replaced by the LAUGH tag.
This module discards useless data. Specifically, we apply a
series of text transformations to ensure the quality of the data
7 Available at https://pypi.org/project/pyenchant/, August 2020.
6 Google Profile of Moods States (GPOMS). At the time this article was 8 Available at http://corpus.rae.es/lfrecuencias.html, August 2020.
written it was no longer available. 9 Available at https://www.ranks.nl/stopwords/spanish, August 2020.
VOLUME 8, 2020 215681

FIGURE 1. Our proposed framework.
TABLE 1. Examples of dataset entries with emotion tags.
• Text lemmatisation. The content of the tweets is split into we also include the textual representation of the emojis
tokens (words), which are independently checked in the in the tweets. To generate char n-grams, word tokens
Hunspell dictionary.10 Finally, the words are lemmatised and word n-grams we combined CountVectorizer12 with
using the Freeling NLP tool.11 GridSearchCV,13 both from the Scikit-Learn Python
library, using the parameters in Listing 1. GridSearchCV
B. CLASSIFICATION MODULE performs an exhaustive search over specified parameter
1) MACHINE LEARNING MODEL: FEATURE PROCESSING ranges for an estimator. As a result, we obtained the
The features we selected to extract valuable information from optimal parameters max_df = 0.5, min_df = 0.001 and
our dataset and build our model are: ngram_range = (1,7) in our scenario.
• Frequency counters. These features count the number
• Char-grams, word tokens and word-grams. Char-grams
of hashtags, negative and positive amounts and per-
are sequences of adjacent characters in a text (note
centages, exclamation marks, interrogation marks and
that spaces must also be considered in this case). Word
adverbs in the content of the tweets.
tokens represent character grams only from the text
inside word boundaries, whereas word n-grams are
sequences of adjacent words in a text, among which 12 Available at https://scikit-learn.org/stable/ modules/generat-
ed/sklearn.feature_extraction.text.CountVectorizer.html, July 2020.
10 Available at https://hunspell.github.io/, August 2020. 13 Available at https://scikit-learn.org/stable/ modules/generat-
11 Available at http://nlp.lsi.upc.edu/freeling/node/1, August 2020. ed/sklearn.model_selection.GridSearchCV.html, August 2020.
215682 VOLUME 8, 2020

TABLE 2. Examples of tweets before and after processing.
2) EMOTION ANALYSIS: THREE-LAYER STACKED SYSTEM

Figure 2 shows the flow diagram of the stacked system. It is
composed of three stages: a first stage that distinguishes
between neutral and non-neutral entries, a second stage to
distinguish between positive and negative emotions and a
Listing. 1. Configuration for the generation of n-grams. last stage to extract opportunities from positive statements.
In each of these stages, we applied a decision depth thresh-
old to maximise opportunity detection, because in financial
•Dictionary features. We use our polarity and emoji lexi- analysis a high precision in key categories is preferable over
con,14 for negative and positive sentiments (specifically obtaining a large amount of positives [70].
for characterisation of general negative and posi- The system includes implementations of Gradient Descent
tive financial emotions), negative awareness, positive (GD), Decision Tree (DT), Random Forest (RF) and Support
statement and opportunity financial emotions; and an Vector Classification ( SVC) algorithms from the Scikit-Learn
external emotion lexicon,15 for sadness and happiness. Python library.16 Note that the size of our training datasets did
We count the words and emojis in the tweets correspond- not justify the application of DL techniques [71], [72]. In this
ing to each sentiment and emotion category of those regard, works such as [73], [74] are representative state of the
lexica. art examples of DL techniques for EA where the datasets are
• Temporal features. These features, based on the discur-
much larger than ours.
sive analysis in the preprocessing stage, are relevant Accordingly, we must characterise the level of confidence
because speculative tweets (such as those expressing (depth level) of emotion prediction precision. This can be
caution or opportunities) tend to use certain verbal tenses formalised with a non-parametric approach based on isotonic
(such as conditional tenses) in conjunction with partic- regression [75]. This approach fits data with a non-decreasing
ular expressions like alto riesgo ‘high risk’ and ocasión function defining a partial or total ordering, by minimising
inmejorable ‘unbeatable occasion’. The temporal fea- quadratic expression (1) subject to (2).
tures of a tweet count the verbs in past, present, future,
X
αi (yi − ŷi )2 (1)
and conditional tenses. i
Table 3 summarises the features to build our Machine ŷmin = ŷ1 ≤ ŷ2 . . . ≤ ŷn = ŷmax (2)
Learning model for financial opportunity detection.
In expression (1), each variable αi is a strictly positive
14 Available at https://www.gti.uvigo.es/index.php/en /resources/8- weight (1 by default), yi ∈ N is the true category of the i-th
lexicon-of-polarity-and-list-of-emojis -by-polarity-and-emotion-for-
application -in-the-financial-field, August 2020. 16 Available at https://scikit-learn.org/stable/ supervised_learning.html#
15 Available at http://www.cic.ipn.mx/∼sidorov/SEL.txt, August 2020. supervised-learning, August 2020.
VOLUME 8, 2020 215683

TABLE 3. Input features in the machine learning model.
FIGURE 2. Flow diagram of our stacked classifier.
tweet and ŷi ∈ N its predicted category (so that yi and ŷi are • Operating System: Ubuntu 18.04.2 LTS 64 bits
taken to a numerical space that allows defining a partial or • Processor: IntelCore i9-9900K 3.60 GHz
total ordering). • RAM: 32GB DDR4
The level of confidence in the prediction depends on the • Disk: 500 Gb (7200 rpm SATA) + 256 GB SSD
probabilities of correct class assignment. Logically, it is pos-
sible to select subsets of vectors in the training set for which A. DATASET
such probabilities are higher, and thus to trade accuracy by TxStockData S.L., a fintech company, provided us with a
precision. dataset composed of 6,000 tweets (similar in size to those
in other state-of-the-art studies [25]–[28]). The dataset was
IV. RESULTS AND DISCUSSION gathered from 14th May 2019 to 3rd February 2020. Its
All the experiments were executed on a computer with the entries were manually annotated with financial emotion tags
following specifications: by five experts in finance and NLP. At the end, the emotion
215684 VOLUME 8, 2020

TABLE 4. Distribution of samples in the dataset by category. of the probability of presenting negative awareness results
as opportunities, A−
P+
, which would strongly discourage an
A−+
operator, that is τ2 = 1 − P
S ++ +P++ +NP+ +A−+
(where NP+
P P P
corresponds to neutral tweets from a financial perspective).
Therefore, our declared goal is first and foremost a reasonably
high precision in the detection of opportunities, for high
values of τ1 and, secondarily, of τ2 .
TABLE 5. Confusion matrix for tolerance calculations.

C. RESULTS
In this section, we evaluate our system to detect financial
opportunities. First we analyse the contributions the fea-
tures of the Machine Learning model (see Table 3) using
a basic single-layer classifier of opportunities vs. rest of
emotions (Section IV-C1). Then we present the precision
and tolerance results of a two-layer (neutral vs. non-neutral /
tag of each entry in the dataset was computed by majority opportunities vs. rest of emotions) stacked classifier, without
voting (in case of tie, it was selected randomly). To ensure the (Section IV-C2) and with (Section IV-C3) decision depth
quality of the data, we discarded repeated and spam entries, thresholds. Finally, we present our final three-layer stacked
yielding approximately 5,000 valid tweets.17 Each entry has classifier (which adds a final opportunities vs. positive state-
the following structure: ments layer) with decision depth thresholds, culminating
• ID: a unique numerical identifier. our incremental design (Section IV-C4). All results pre-
• Tweet: original text of the Tweet. sented in this section were computed by applying 10-fold
• Ticker: stock market assets mentioned in the text. cross-validation.
• Emotion: emotion label. Our model, with all characteristics, has ∼400,000 n-gram
Table 4 shows the distribution of the entries in the experi- features. This is computationally intractable. Consequently,
mental dataset by emotion category. we applied an attribute selector to extract the most relevant
features with a maximum decrease of 0.5% in precision.
B. DEFINITION OF PERFORMANCE METRICS Specifically, we chose the SelectPercentile18 method from
The practical goal of our system is to present to the user the Scikit-Learn Python library, as it outperformed other
the content of the tweets that are marked as opportunities. alternatives (SelectFromModel, SelectKBest, and RFECV).
Therefore, as previously said, we are interested in maximis- This method selects features according to a highest score
ing precision for the target financial emotion opportunity. percentile. We set a χ 2 score function and an 80th percentile
We defined two auxiliary tolerance metrics for those output threshold. At the end, we kept ∼50,000 n-gram features.
decisions that do not correspond to true opportunities but
will not necessarily discourage an operator, inspired by works 1) NUMERICAL TEST 1: RELEVANCE OF THE FEATURES
like [70]. Firstly we evaluated the performance of a single-layer classi-
We explain these tolerance metrics with the confusion fier that was trained to distinguish between opportunities and
matrix in Table 5. Notation XY represents the number all the other financial emotions in Table 4. Table 6 shows aver-
of tweets of emotion category X that are classified into age results for the target emotion opportunity with 10-fold
category Y . cross-validation. We started with a basic feature set only
From this confusion matrix we define tolerances τ1 and τ2 containing char-grams, word tokens and word-grams. Adding
as follows: all features in Section III-B1 was slightly advantageous for all
SP++ + P+ classifiers (up to ∼ 4% of improvement) except for the SVC.
P+
τ1 = (3) Moreover, in light of the results, RF seemed the best classifier
SP++ + P+
P+
+ NP+ + A−
P+ to detect opportunities, followed by the GD classifier.
SP++ + P+
P+
+ NP+ We considered these initial results (precision, τ1 and τ2 up
τ2 = (4)
SP++ + P+
P+
+ NP+ + A−
P+
to 64.33%, 70.65% and 87.31%, respectively) promising and
they motivated us to design more advanced approaches. The
In both cases the denominator is the sum of all decisions difference between τ1 and τ2 , which ranged between 15% and
marked as opportunities. Tolerance τ1 grows with SP+ +
, that is, 21%, indicated that a significant amount of neutral entries
when the user is presented positive statements as opportuni- were considered as opportunities. This is why we decided to
ties. Strictly these decisions are errors, but they have minimal
impact on user trust if P+P+ SP+ . Tolerance τ2 is the inverse
+
18 Available at https://scikit-learn.org/stable/modules /feature_selection.

17 We will make this dataset available to other researchers on request. html, August 2020.
VOLUME 8, 2020 215685

TABLE 6. Precisions and tolerances for numerical test 1. TABLE 8. Precisions and tolerances of the two-layer stacked classifier
with decision depth, numerical test 3.
3) NUMERICAL TEST 3: TWO-LAYER STACKED CLASSIFIER

WITH DECISION DEPTH
As previously stated, in businesses and investment, it is
TABLE 7. Precisions and tolerances of the two-layer stacked classifier, preferable to obtain less positives for key categories with high
numerical test 2.
precision. Take personalised investment advertising as an
example. Banks must identify customer profiles with higher
success probability, since personalised commercialisation
actions are rather expensive. From a practical perspective,
we set the goal of detecting at least 10% of all opportunity
entries with high precision.
Accordingly, in this test we relied on the predict_
proba setting of the Scikit-Learn Python library to set class
thresholds. It allows computing the probabilities of the possi-
ble outcomes (classes in our model), thus providing levels of
confidence (see Section III-B2). When the probability of the
most likely class for a vector exceeded the threshold of that
explore the benefits of a stacked classifier with an initial stage class (which we term decision depth) the vector was assigned
to filter neutral entries. that class. Otherwise, the entry was considered neutral. This
procedure was followed at both stages of numerical test 2,
2) NUMERICAL TEST 2: TWO-LAYER STACKED CLASSIFIER with the same depth value, after empirical tests.
In this test we evaluated a two-layer stacked classifier. The Table 8 shows the results of the two-layer stacked classifier
first stage was designed two distinguish between neutral and with decision depth. Both classifiers achieved significant
non-neutral entries and the second stage distinguished oppor- precision improvements. Besides, the RF classifier achieved a
tunities from the rest of financial emotions in Table 4. In both precision above 70% in all cases for τ1 ∼ 80% and τ2 ∼ 90%.
stages we used the same type of classifier but we applied inde- Regardless of the fact that precision and τ1 were still not satis-
pendent hyperparameter optimisation in each stage. Table 7 factory (they were less than 75% and 90%, respectively), the
shows the resulting precisions and tolerances. values of τ2 for the two classifiers with all features indicated
If we compare this test with numerical test 1, there was no that many negative awareness entries were still classified as
significant improvement with the GD and DT classifiers but opportunities, which was unacceptable.
the stacked approach produced a 5% performance boost for
the other two. More specifically, the SVC and RF classifiers 4) NUMERICAL TEST 4: THREE-LAYER STACKED CLASSIFIER
produced the best results for all metrics and both scenarios WITH DECISION DEPTH
(basic set and all features, except for the precision of the All points considered, to meet our goals we finally applied the
SVC if compared with the GD classifier with all features). three-layer stacked classifier framework (see Section III-B2)
Furthermore, even though the GD and SVC classifiers behaved with decision depth thresholds where the third stage distin-
similarly with all features, the GD classifier is far more com- guishes opportunities from positive statement entries.
putationally intensive. DT precision results were considerably We relied once again on the predict_proba setting of
bad, of around 40%. Thus we decided to discard GD and DT the Scikit-Learn Python library to set the depth threshold.
for opportunity detection and keep only SVC and RF in further As in the previous numerical test we used the same classifier
analysis. and depth threshold in all layers. Table 9 shows the results of
Ultimately, even though this test revealed further improve- this test. Note the significant improvements by the SVC and RF
ment in precision and tolerance with the two-level stacked classifiers for all metrics. With the RF classifier we obtained
classifier, the highest precision (with the RF algorithm) was precision values above 80%, τ1 above 90% and τ2 above 95%,
still under 75%. However, the tolerances against operator meeting our initial requirements. Less than 5% of the negative
discouragement were moderately satisfactory (τ1 >75% awareness entries were misclassified as opportunities, prob-
and τ2 >85%). ably due to high ambiguity and, thus, annotation errors.
215686 VOLUME 8, 2020

FIGURE 3. Histogram of ticker mentions in opportunities.
TABLE 9. Precisions and tolerances of the three-layer stacked classifier

with decision depth, numerical test 4.
TABLE 10. Improvements in precisions and tolerances between

numerical tests 1 and 4.
D. EXPLOITING THE RESULTS

For the sake of clarity and a handy comparison of the different
numerical tests, Table 10 shows the differences in precision,
τ1 and τ2 between numerical tests 1 and 4. The improvement
achieved with SVC and RF classifiers is noteworthy, supporting
the benefits of a stacking approach for financial opportunity FIGURE 4. Possible integration of our system in a mobile app.
detection. The performance boost for RF, our final choice,
was almost 20% for precision and τ1 . In addition to meeting Therefore, a possible way to exploit these results would be
our performance goals, we succeeded to identify as many listing in a scroll section of the user dashboard the tweets we
opportunity entries with high precision as our application pick as opportunities with the three-level stacked classifier.
demanded, 9.27% and 10.43% on average with RF and SVC Then we could highlight the tweets mentioning assets that
respectively, with 10-fold cross-validation. are detected following our approach with decision depth.
Moreover, Figure 3 represents the histogram of the men- Figure 4 shows an example with real tweets as classified by
tions to assets in opportunity tweets. By examining the
our system. Symbol is displayed next to the tweets that
specific assets that were mentioned between 14 May 2019
contain opportunities that are detected with high precision.
and 3 February 2020 in detected opportunities in the test-
ing sets in numerical experiment 4, we observed that those V. CONCLUSION
assets corresponded to ∼ 80% of the mentions to the most Motivated by the strong influence of social media in users’
interesting stock actives (upper quartile of all mentions in decisions in global markets, we propose a novel three-layer
opportunities in the testing subsets, marked in red in Figure 3. stacked system to detect financial opportunities in tweets.
For clarity, only some representative well-known tickers are Our solution has been designed to extract such tweets with
shown in the x axis). high precision to present their content on personal finance
VOLUME 8, 2020 215687

management dashboards. It exploits sophisticated linguistic [14] M. Nofer and O. Hinz, ‘‘Using Twitter to predict the stock market,’’ Bus.
features, such as polarity and emotion dictionaries and tem- Inf. Syst. Eng., vol. 57, no. 4, pp. 229–242, Aug. 2015.
[15] T. Dimpfl and S. Jank, ‘‘Can Internet search queries help to predict stock
poral identification by discursive analysis, in its ML model. market volatility?’’ Eur. Financial Manage., vol. 22, no. 2, pp. 171–192,
It yields satisfactory and competitive market-level perfor- Mar. 2016.
mance in financial opportunity detection. [16] X. Zhong and D. Enke, ‘‘Forecasting daily stock market return using
dimensionality reduction,’’ Expert Syst. Appl., vol. 67, pp. 126–139,
Experimental results, including two ad-hoc tolerance met- Jan. 2017.
rics focusing on data quality from an operator perspective, [17] X. Zhang, Y. Zhang, S. Wang, Y. Yao, B. Fang, and P. S. Yu, ‘‘Improving
demonstrate that our three-layer stacked system with decision stock market prediction via heterogeneous information fusion,’’ Knowl.-
Based Syst., vol. 143, pp. 236–247, Mar. 2018.
probability threshold (decision depth) succeeds to achieve our [18] E. Hoseinzade and S. Haratizadeh, ‘‘CNNpred: CNN-based stock market
goal. In particular, using a RF algorithm with all features in prediction using a diverse set of variables,’’ Expert Syst. Appl., vol. 129,
the model, the system attains ∼ 83% financial opportunity pp. 273–285, Sep. 2019.
[19] F. Sun, A. Belatreche, S. Coleman, T. M. McGinnity, and Y. Li, ‘‘Pre-
precision with detection tolerances τ1 ∼ 90% and τ2 ∼ 96%. processing online financial text for sentiment classification: A natural
Given the valuable up-to-date information in micro-blogging language processing approach,’’ in Proc. IEEE Conf. Comput. Intell.
platforms, these promising results endorse the usability of our Financial Eng. Econ. (CIFEr), Mar. 2014, pp. 122–129.
[20] I. E. Fisher, M. R. Garnsey, and M. E. Hughes, ‘‘Natural language process-
system to support investors’ decision making. ing in accounting, auditing and finance: A synthesis of the literature with a
As future work and to further expand the contributions roadmap for future research,’’ Intell. Syst. Accounting, Finance Manage.,
of our research, the emotion classifiers will be improved vol. 23, no. 3, pp. 157–214, Jul. 2016.
[21] F. Z. Xing, E. Cambria, and R. E. Welsch, ‘‘Natural language based
with domain-specific tweet filters and quantitative (objec- financial forecasting: A survey,’’ Artif. Intell. Rev., vol. 50, no. 1, pp. 49–73,
tive) numerical information from stock exchange prices. Jun. 2018.
In addition to opportunities, this will allow analysing positive [22] K. K. Singh and P. Dimri, ‘‘Score based financial forecasting method by
incorporating different sources of information flow into integrative river
statement and negative awareness emotions in stock market model,’’ in Proc. 6th Int. Conf. Cloud Syst. Big Data Eng. (Confluence),
reports. Moreover, extensions for a multilingual version of Jan. 2016, pp. 685–688.
our framework to cover French and English will be provided. [23] N. I. M. Razi, M. Othman, and H. Yaacob, ‘‘Investment decisions
based on EEG emotion recognition,’’ Adv. Sci. Lett., vol. 23, no. 11,
pp. 11345–11349, Nov. 2017.
REFERENCES [24] R. Plutchik, ‘‘The circumplex as a general model of the structure of
[1] T. Li, J. van Dalen, and P. J. van Rees, ‘‘More than just noise? Examining emotions and personality,’’ in Circumplex Models of Personality and Emo-
the information content of stock microblogs on financial markets,’’ J. Inf. tions. Worcester, MA, USA: American Psychological Association, 2004,
Technol., vol. 33, no. 1, pp. 50–69, Mar. 2018. pp. 17–45.
[25] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and
[2] X. Li, P. Wu, and W. Wang, ‘‘Incorporating stock prices and news senti-
N. Vlachogiannakis, ‘‘Forecasting stock market crisis events using deep
ments for stock market prediction: A case of Hong Kong,’’ Inf. Process.
and statistical machine learning techniques,’’ Expert Syst. Appl., vol. 112,
Manage., vol. 57, no. 5, Sep. 2020, Art. no. 102212.
pp. 353–371, Dec. 2018.
[3] G. Bello-Orgaz, R. M. Mesas, C. Zarco, V. Rodriguez, O. Cordón, and
[26] M. Al-Smadi, M. Al-Ayyoub, Y. Jararweh, and O. Qawasmeh, ‘‘Enhancing
D. Camacho, ‘‘Marketing analysis of wineries using social collective
aspect-based sentiment analysis of arabic Hotels’ reviews using morpho-
behavior from users’ temporal activity on Twitter,’’ Inf. Process. Manage.,
logical, syntactic and semantic features,’’ Inf. Process. Manage., vol. 56,
vol. 57, no. 5, Sep. 2020, Art. no. 102220.
no. 2, pp. 308–319, Mar. 2019.
[4] M. Meire, M. Ballings, and D. Van den Poel, ‘‘The added value of social
[27] D. Simester, A. Timoshenko, and S. I. Zoumpoulis, ‘‘Targeting prospec-
media data in B2B customer acquisition systems: A real-life experiment,’’
tive customers: Robustness of machine-learning methods to typical data
Decis. Support Syst., vol. 104, pp. 26–37, Dec. 2017.
challenges,’’ Manage. Sci., vol. 66, no. 6, pp. 2495–2522, 2019.
[5] P.-F. Pai and C.-H. Liu, ‘‘Predicting vehicle sales by sentiment anal- [28] J. Tuke, A. Nguyen, M. Nasim, D. Mellor, A. Wickramasinghe, N. Bean,
ysis of Twitter data and stock market values,’’ IEEE Access, vol. 6, and L. Mitchell, ‘‘Pachinko prediction: A Bayesian method for event pre-
pp. 57655–57662, 2018. diction from social media data,’’ Inf. Process. Manage., vol. 57, no. 2,
[6] H. Yuan, W. Xu, Q. Li, and R. Lau, ‘‘Topic sentiment mining for sales per- Mar. 2020, Art. no. 102147.
formance prediction in e-commerce,’’ Ann. Oper. Res., vol. 270, nos. 1–2, [29] L. K. Rickett, ‘‘Do financial blogs serve an infomediary role in capital
pp. 553–576, Nov. 2018. markets?’’ Amer. J. Bus., vol. 31, no. 1, pp. 17–40, Apr. 2016.
[7] F. Mai, Z. Shan, Q. Bai, X. Wang, and R. H. L. Chiang, ‘‘How does social [30] W. He, F.-K. Wang, and V. Akula, ‘‘Managing extracted knowledge from
media impact bitcoin value? A test of the silent majority hypothesis,’’ big social media data for business decision making,’’ J. Knowl. Manage.,
J. Manage. Inf. Syst., vol. 35, no. 1, pp. 19–52, Jan. 2018. vol. 21, no. 2, pp. 275–294, Apr. 2017.
[8] Y. Sun, X. Liu, G. Chen, Y. Hao, and Z. J. Zhang, ‘‘How mood affects the [31] M. Alanyali, H. S. Moat, and T. Preis, ‘‘Quantifying the relationship
stock market: Empirical evidence from microblogs,’’ Inf. Manage., vol. 57, between financial news and the stock market,’’ Sci. Rep., vol. 3, no. 1,
no. 5, pp. 103–181, Jul. 2020. pp. 3578–3584, Dec. 2013.
[9] D. Enke and S. Thawornwong, ‘‘The use of data mining and neural [32] A. Atkins, M. Niranjan, and E. Gerding, ‘‘Financial news predicts stock
networks for forecasting stock market returns,’’ Expert Syst. Appl., vol. 29, market volatility better than close price,’’ J. Finance Data Sci., vol. 4, no. 2,
no. 4, pp. 927–940, Nov. 2005. pp. 120–137, Jun. 2018.
[10] M. S. Gerber, ‘‘Predicting crime using Twitter and kernel density estima- [33] M.-Y. Day and C.-C. Lee, ‘‘Deep learning for financial sentiment analysis
tion,’’ Decis. Support Syst., vol. 61, pp. 115–125, May 2014. on finance news providers,’’ in Proc. IEEE/ACM Int. Conf. Adv. Social
[11] A. G. Reece, A. J. Reagan, K. L. M. Lix, P. S. Dodds, C. M. Danforth, Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 1127–1134.
and E. J. Langer, ‘‘Forecasting the onset and course of mental illness with [34] Y. Wang, ‘‘Stock market forecasting with financial micro-blog based on
Twitter data,’’ Sci. Rep., vol. 7, no. 1, pp. 1–11, Dec. 2017. sentiment and time series analysis,’’ J. Shanghai Jiaotong Univ. Sci.,
[12] K. Zahra, M. Imran, and F. O. Ostermann, ‘‘Automatic identification of vol. 22, no. 2, pp. 173–179, Apr. 2017.
eyewitness messages on Twitter during disasters,’’ Inf. Process. Manage., [35] E. Ioanăs and I. Stoica, ‘‘Social media and its impact on consumers
vol. 57, no. 1, pp. 102–107, 2020. behavior,’’ Int. J. Econ. Practices Theories, vol. 4, no. 2, pp. 295–303,
[13] N. Oliveira, P. Cortez, and N. Areal, ‘‘The impact of microblogging data 2014.
for stock market prediction: Using Twitter to predict returns, volatility, [36] A. Sun, M. Lachanski, and F. J. Fabozzi, ‘‘Trade the tweet: Social media
trading volume and survey sentiment indices,’’ Expert Syst. Appl., vol. 73, text mining and sparse matrix factorization for stock market prediction,’’
pp. 125–144, May 2017. Int. Rev. Financial Anal., vol. 48, pp. 272–281, Dec. 2016.
215688 VOLUME 8, 2020

[37] F. Ming, F. Wong, Z. Liu, and M. Chiang, ‘‘Stock market prediction from [59] D. Duxbury, T. Gärling, A. Gamble, and V. Klass, ‘‘How emotions
WSJ: Text mining via sparse matrix factorization,’’ in Proc. IEEE Int. Conf. influence behavior in financial markets: A conceptual analysis and
Data Mining, Dec. 2014, pp. 430–439. emotion-based account of buy-sell preferences,’’ Eur. J. Finance, vol. 26,
[38] P. T. Gibbs, ‘‘Time, temporality and consumer behaviour,’’ Eur. J. Market- pp. 1–22, Mar. 2020.
ing, vol. 32, nos. 11–12, pp. 993–1007, Dec. 1998. [60] S. Pengnate and F. J. Riggins, ‘‘The role of emotion in P2P microfinance
[39] J. M. Forray and J. Woodilla, ‘‘Artefacts of management academe,’’ Time funding: A sentiment analysis approach,’’ Int. J. Inf. Manage., vol. 54,
Soc., vol. 14, nos. 2–3, pp. 323–339, Sep. 2005. Oct. 2020, Art. no. 102138.
[40] B. Liu, Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. [61] M. Fernández-Gavilanes, J. Juncal-Martínez, S. García-Méndez,
Cambridge, U.K.: Cambridge Univ. Press, 2015. E. Costa-Montenegro, and F. J. González-Castaño, ‘‘Creating Emoji
[41] A. Derakhshan and H. Beigy, ‘‘Sentiment analysis on stock social media Lexica from unsupervised sentiment analysis of their descriptions,’’
for stock price movement prediction,’’ Eng. Appl. Artif. Intell., vol. 85, Expert Syst. Appl., vol. 103, pp. 74–91, Aug. 2018.
pp. 569–578, Oct. 2019. [62] M. Fernández-Gavilanes, T. Àlvarez-López, J. Juncal-Martínez,
[42] J. Staiano and M. Guerini, ‘‘Depeche mood: A lexicon for emotion analysis E. Costa-Montenegro, and F. J. González-Castaño, ‘‘GTI: An unsupervised
from crowd annotated news,’’ in Proc. 52nd Annu. Meeting Assoc. Comput. approach for sentiment analysis in Twitter,’’ in Proc. 9th Int. Workshop
Linguistics, vol. 2. Stroudsburg, PA, USA: Association for Computational Semantic Eval. (SemEval), 2015, pp. 35–40.
Linguistics, 2014, pp. 427–433. [63] A. Mehmood, B.-W. On, I. Lee, I. Ashraf, and G. Sang Choi, ‘‘Spam
[43] Y. Ge, J. Qiu, Z. Liu, W. Gu, and L. Xu, ‘‘Beyond negative and positive: comments prediction using stacking with ensemble learning,’’ J. Phys.,
Exploring the effects of emotions in social media during the stock market Conf. Ser., vol. 933, Jan. 2018, Art. no. 012012.
crash,’’ Inf. Process. Manage., vol. 57, no. 4, Jul. 2020, Art. no. 102218. [64] Y. Wang, S. Liu, S. Li, J. Duan, Z. Hou, J. Yu, and K. Ma, ‘‘Stacking-based
ensemble learning of self-media data for marketing intention detection,’’
[44] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, ‘‘Sentiment analysis of
Future Internet, vol. 11, no. 7, p. 155, Jul. 2019.
comment texts based on BiLSTM,’’ IEEE Access, vol. 7, pp. 51522–51532,
[65] S. R. Harsule and M. K. Nighot, ‘‘N-gram classifier system to filter spam
2019.
messages from OSN user wall,’’ in Advances in Intelligent Systems and
[45] Y. Fang, H. Tan, and J. Zhang, ‘‘Multi-strategy sentiment analysis of
Computing. Singapore: Springer, 2016, pp. 21–28.
consumer reviews based on semantic fuzziness,’’ IEEE Access, vol. 6,
[66] S. Bajaj, N. Garg, and S. K. Singh, ‘‘A novel user-based spam review
pp. 20625–20631, 2018.
detection,’’ Procedia Comput. Sci., vol. 122, pp. 1009–1015, Jan. 2017.
[46] A. Balahur and J. M. Perea-Ortega, ‘‘Sentiment analysis system adaptation
[67] S. Temma, M. Sugii, and H. Matsuno, ‘‘The document similarity index
for multilingual processing: The case of tweets,’’ Inf. Process. Manage.,
based on the jaccard distance for mail filtering,’’ in Proc. 34th Int. Tech.
vol. 51, no. 4, pp. 547–556, Jul. 2015.
Conf. Circuits/Syst., Comput. Commun. (ITC-CSCC), Jun. 2019, pp. 1–4.
[47] J. Bučar, J. Povh, and M. Žnidaršič, ‘‘Sentiment classification of the [68] S. García-Méndez, M. Fernández-Gavilanes, E. Costa-Montenegro,
Slovenian news texts,’’ in Proc. 9th Int. Conf. Comput. Recognit. Syst. J. Juncal-Martínez, and F. J. González-Castaño, ‘‘Automatic natural
Cham, Switzerland: Springer, 2016, pp. 777–787. language generation applied to alternative and augmentative
[48] D. Zimbra, M. Ghiassi, and S. Lee, ‘‘Brand-related Twitter sentiment anal- communication for online video content services using SimpleNLG
ysis using feature engineering and the dynamic architecture for artificial for Spanish,’’ in Proc. Internet Accessible Things, Apr. 2018, pp. 1–4.
neural networks,’’ in Proc. 49th Hawaii Int. Conf. Syst. Sci. (HICSS), [69] S. García-Méndez, M. Fernández-Gavilanes, E. Costa-Montenegro,
Jan. 2016, pp. 1930–1938. J. Juncal-Martínez, and F. Javier González-Castaño, ‘‘A library for
[49] J. Smailović, M. Grčar, N. Lavrač, and M. Žnidaršič, ‘‘Predictive sentiment automatic natural language generation of spanish texts,’’ Expert Syst.
analysis of tweets: A stock market application,’’ in Human-Computer Appl., vol. 120, pp. 372–386, Apr. 2019.
Interaction and Knowledge Discovery in Complex, Unstructured, Big [70] J. Jurgovsky, M. Granitzer, K. Ziegler, S. Calabretto, P.-E. Portier,
Data (Lecture Notes in Computer Science), vol. 7947. Berlin, Germany: L. He-Guelton, and O. Caelen, ‘‘Sequence classification for credit-card
Springer, 2013, pp. 77–88. fraud detection,’’ Expert Syst. Appl., vol. 100, pp. 234–245, Jun. 2018.
[50] J. K. Rout, K.-K.-R. Choo, A. K. Dash, S. Bakshi, S. K. Jena, and [71] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and
K. L. Williams, ‘‘A model for sentiment and emotion analysis of unstruc- A. Swami, ‘‘The limitations of deep learning in adversarial settings,’’ in
tured social media text,’’ Electron. Commerce Res., vol. 18, no. 1, Proc. IEEE Eur. Symp. Secur. Privacy (EuroS P), Mar. 2016, pp. 372–387.
pp. 181–199, Mar. 2018. [72] R. Keshari, S. Ghosh, S. Chhabra, M. Vatsa, and R. Singh, ‘‘Unravelling
[51] C.-H. Chen, W.-P. Lee, and J.-Y. Huang, ‘‘Tracking and recognizing emo- small sample size problems in the deep learning world,’’ in Proc. IEEE 6th
tions in short text messages from online chatting services,’’ Inf. Process. Int. Conf. Multimedia Big Data (BigMM), Sep. 2020, pp. 134–143.
Manage., vol. 54, no. 6, pp. 1325–1344, Nov. 2018. [73] M. Abdul-Mageed and L. Ungar, ‘‘EmoNet: Fine-grained emotion detec-
[52] W. G. Parrott, Emotions in Social Psychology: Essential Readings. Hove, tion with gated recurrent neural networks,’’ in Proc. 55th Annu. Meeting
U.K.: Psychology Press, 2001. Assoc. Comput. Linguistics. Stroudsburg, PA, USA: Association for Com-
[53] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, ‘‘Textual affect sens- putational Linguistics, 2017, pp. 718–728.
ing for sociable and expressive online communication,’’ in Proc. Int. [74] L. Santamaria-Granados, M. Munoz-Organero, G. Ramirez-Gonzalez,
Conf. Affect. Comput. Intell. Interact. Berlin, Germany: Springer, 2007, E. Abdulhay, and N. Arunkumar, ‘‘Using deep convolutional neural net-
pp. 218–229. work for emotion detection on a physiological signals dataset (AMIGOS),’’
[54] J. Xu, Z. Huang, M. Shi, and M. Jiang, ‘‘Emotion detection in E-learning IEEE Access, vol. 7, pp. 57–67, 2019.
using expectation-maximization deep spatial-temporal inference net- [75] R. Luss and S. Rosset, ‘‘Generalized isotonic regression,’’ J. Comput.
work,’’ in Advances in Intelligent Systems and Computing, vol. 650. Cham, Graph. Statist., vol. 23, no. 1, pp. 192–210, Jan. 2014.
Switzerland: Springer, 2018, pp. 245–252.
[55] M. Z. Asghar, A. Khan, K. Khan, H. Ahmad, and I. A. Khan, ‘‘COGEMO:
Cognitive-based emotion detection from patient generated health reviews,’’
J. Med. Imag. Health Informat., vol. 7, no. 6, pp. 1436–1444, Oct. 2017.
[56] S. Z. Bong, K. Wan, M. Murugappan, N. M. Ibrahim, Y. Rajamanickam,
and K. Mohamad, ‘‘Implementation of wavelet packet transform and non
FRANCISCO DE ARRIBA-PÉREZ received the
linear analysis for emotion classification in stroke patient using brain
B.S. degree in telecommunication technologies
signals,’’ Biomed. Signal Process. Control, vol. 36, pp. 102–112, Jul. 2017.
engineering, the M.S. degree in telecommunica-
[57] M. S. Hossain, G. Muhammad, B. Song, M. M. Hassan, A. Alelaiwi,
and A. Alamri, ‘‘Audio–visual emotion-aware cloud gaming framework,’’ tion engineering, and the Ph.D. degree from the
IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 12, pp. 2105–2118, University of Vigo, Spain, in 2013, 2014, and
Dec. 2015. 2019, respectively.
[58] J. F. Sánchez-Rada, M. Torres, C. A. Iglesias, R. Maestre, and E. Peinado, He is currently a Researcher with the Infor-
‘‘A linked data approach to sentiment and emotion analysis of Twitter in the mation Technologies Group, University of Vigo,
financial domain,’’ in Proc. 2nd Int. Workshop Semantic Web Enterprise Spain. His research includes the development of
Adoption Best Pract., 2nd Int. Workshop Finance Econ. Semantic Web, machine learning solutions for different use cases
vol. 1240, 2014, pp. 51–62. among the ones financial and heath must be highlighted.
VOLUME 8, 2020 215689

SILVIA GARCÍA-MÉNDEZ received the B.S. FRANCISCO J. GONZÁLEZ-CASTAÑO received

degree in telecommunication technologies engi- the B.S. degree from the University of Santi-
neering and the M.S. degree in telecommunication ago de Compostela, Spain, in 1990, and the
engineering from the University of Vigo, Spain, Ph.D. degree from the University of Vigo, Spain,
in 2014 and 2016, respectively. in 1998.
She is currently pursuing the Ph.D. degree in He is currently a Professor with the Univer-
information and communications technology from sity of Vigo, Spain, where he leads the Infor-
the University of Vigo, Spain. Since 2015, she mation Technologies Group. He has authored
has been a Researcher with the Information Tech- over 100 articles in international journals, in the
nologies Group, University of Vigo, Spain. Her fields of telecommunications and computer sci-
research includes the development of automatic solutions for natural lan- ence. He has participated in several relevant national and international
guage processing in Spanish and English. projects. He holds three U.S. patents.
Ms. García-Méndez awards and honors include the Connecting for Good
Vodafone Award in 2017.
JOSÉ ÁNGEL REGUEIRO-JANEIRO received

the B.S. degree in telecommunication technologies
engineering and the M.S. degree in telecommu-
nication engineering from the University of Vigo,
Spain, in 2017 and 2019, respectively.
He is currently pursuing the Ph.D. degree in
information and communications technology with
the University of Vigo, Spain. Since 2019, he has
been a Researcher with the Information Tech-
nologies Group, University of Vigo. His research
includes the development of machine learning solutions for the financial
field.
215690 VOLUME 8, 2020

Articulo 2 Ingles

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Articulo 2 Ingles

Uploaded by

Copyright:

Available Formats

Received October 31, 2020, accepted November 24, 2020, date of publication November 27, 2020,

date of current version December 11, 2020.

Detection of Financial Opportunities

I. INTRODUCTION sources. Two examples are Thomson Reuters Eikon,3 which

215680 VOLUME 8, 2020

VOLUME 8, 2020 215681

FIGURE 1. Our proposed framework.

TABLE 1. Examples of dataset entries with emotion tags.

215682 VOLUME 8, 2020

TABLE 2. Examples of tweets before and after processing.

2) EMOTION ANALYSIS: THREE-LAYER STACKED SYSTEM

VOLUME 8, 2020 215683

TABLE 3. Input features in the machine learning model.

FIGURE 2. Flow diagram of our stacked classifier.

215684 VOLUME 8, 2020

TABLE 5. Confusion matrix for tolerance calculations.

18 Available at https://scikit-learn.org/stable/modules /feature_selection.

VOLUME 8, 2020 215685

3) NUMERICAL TEST 3: TWO-LAYER STACKED CLASSIFIER

215686 VOLUME 8, 2020

FIGURE 3. Histogram of ticker mentions in opportunities.

TABLE 9. Precisions and tolerances of the three-layer stacked classifier

TABLE 10. Improvements in precisions and tolerances between

D. EXPLOITING THE RESULTS

VOLUME 8, 2020 215687

215688 VOLUME 8, 2020

VOLUME 8, 2020 215689

SILVIA GARCÍA-MÉNDEZ received the B.S. FRANCISCO J. GONZÁLEZ-CASTAÑO received

JOSÉ ÁNGEL REGUEIRO-JANEIRO received

215690 VOLUME 8, 2020

You might also like