
Expert Systems with Applications 157 (2020) 113481


Gold volatility prediction using a CNN-LSTM approach


Andrés Vidal, Werner Kristjanpoller*
Departamento de Industrias, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso, Chile

Article history:
Received 24 June 2019
Revised 23 December 2019
Accepted 23 April 2020
Available online 4 May 2020

Keywords:
Gold price volatility
Volatility forecasting
Deep learning
CNN
LSTM
Stock returns forecasting
Hyperparameter setting

Abstract

Prediction of volatility for different types of financial assets is one of the tasks of greatest mathematical complexity in time series prediction, mainly due to its noisy, non-stationary and heteroscedastic structure. On the other hand, gold is an asset of particular importance for hedging and diversification of investment portfolios, and it is therefore important to predict the future volatility of this asset. This paper seeks to significantly improve the forecast of gold volatility by combining two deep learning methodologies: long short-term memory networks (LSTM) added to convolutional neural networks (specifically a pre-trained VGG16 network). It is important to mention that these types of hybrid architectures have not been used in time series prediction, so this is a completely new approach to solving these types of problems. The CNN-LSTM hybrid model is capable of including images as input, which provides a wide variety of information associated with both static and dynamic characteristics of the series. In parallel, different lags of the returns of the series are entered as input, which allows the model to learn from the temporal structure. The results show a substantial improvement when this hybrid model is compared to the GARCH and LSTM models: a 37% reduction in MSE is observed compared to the classic GARCH model, and 18% compared to the LSTM model. Finally, the Model Confidence Set (MCS) determines a significant improvement in the prediction of the hybrid model. The fundamental importance of this research lies in the application of a new type of architecture capable of processing various sources of information for any time series prediction task.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

The prediction of volatility in financial assets has been widely studied during the last decades, mainly because it is an indicator that allows deducing the risk associated with a given asset in a certain time window. On the other hand, there are assets known as instruments to balance this volatility in a given investment portfolio, and gold is widely known to possess this characteristic.

In the current economic scenario, it is imperative to estimate the volatility of gold, due to its wide use as an asset uncorrelated or negatively correlated with the financial market. Some of its applications are mentioned below, mainly linked to hedging and portfolio diversification.

A hedge is defined as ''an asset that is uncorrelated or negatively correlated with another asset or portfolio on average'' (Baur & Lucey, 2010). Sumner, Johnson, and Soenen (2010) studied real return and volatility spillovers between gold, stocks and bonds in the US. They found almost no spillovers in the period analyzed (between 1970 and 2009), which can confirm that gold has diversifying characteristics in an investment portfolio.

Chua, Sick, and Woodward (1990) state that gold has a low Beta according to the CAPM model, and find that its difference from zero is insignificant for various time periods. This shows that gold price movements had, on average, no correlation with stock price movements over the period examined. This is what gives gold the ability to hedge portfolio risk.

Bruno and Chincarini (2010) calculated optimally weighted portfolios to assess how much gold investors in various countries should hold in order to maximise their risk-return profile. These weights vary considerably, from 0.1% to 12%.

Gold has also been used as a hedge asset in currency portfolios. Capie, Mills, and Wood (2005) studied whether gold can act as a hedge against currency risk; they show that its ability to act as a hedge is time varying, based on unpredictable political and economic events. Joy (2011) expanded the sample of dollar pairs to 16 currencies from 1986 to 2008. Using dynamic conditional correlation models, this author concludes that gold is a dollar hedge, and became more strongly so over the 23-year period examined. Reboredo (2013) also finds that gold acts as a dollar hedge, using weekly data from 2000 to 2011 for 8 currency pairs. Similar findings of gold as a potentially useful hedge for currencies are reported in Reboredo and Rivera-Castro (2014) and in Lu and Hamori (2013).

* Corresponding author.
E-mail addresses: andres.vidal.12@sansano.usm.cl (A. Vidal), werner.kristjanpoller@usm.cl (W. Kristjanpoller).

https://doi.org/10.1016/j.eswa.2020.113481
0957-4174/© 2020 Elsevier Ltd. All rights reserved.

Lu and Hamori (2013) also examine the safe haven properties of gold for currencies, and Apergis (2014) shows gold is a useful predictor and hedge for the Australian dollar.

On the other hand, gold is defined as a safe haven. An asset may be a hedge, providing protection on average, but fail in times of extreme stress. Gold is attractive for investors in times of financial crisis, and this is widely recognised and frequently mentioned in the financial press (e.g. Sanderson (2015)). An example of this is the global stock market crash in August 2015: the S&P 500 decreased by 10%, but the gold price increased by 5% in US dollars. Such phenomena also occurred after the financial crisis in 2007 and during Black Monday in 1987. This indicates that gold is a highly used commodity for the reserves of major economies. For the second quarter of 2017, the official US gold reserve was 8133.5 tons, which is equivalent to 74.5% of total US reserves.

Baur and Lucey (2010) develop the underlying idea of a safe haven, focused on gold's relationship to other asset prices at times of extreme market movements. They define it in terms of its ability to protect wealth from financial market crashes. These authors measure market distress as periods when stock or bond indices fall below the 1%, 2.5% and 5% quantiles of the return distribution. This also provides a clear separation of the ideas of a hedge and a safe haven. Baur and McDermott (2010) refine the definition of a safe haven to the following: ''A strong (weak) safe haven is defined as an asset that is negatively correlated (uncorrelated) with another asset or portfolio in certain periods only, e.g. in times of falling stock markets.'' On an empirical level, Baur and Lucey (2010) study the relationship between US, UK and German stock and bond returns and gold returns, finding gold is a hedge and a safe haven for stocks, but not bonds. However, gold is a safe haven for only 15 days after a market crash. In contrast, Bredin, Conlon, and Potí (2015) apply wavelet analysis, suggesting that, contrary to the 15-day finding of Baur and Lucey, gold is a safe haven for up to a year. Baur and McDermott (2010) extend this analysis to a more international sample and confirm gold's status as a safe haven for equities, but not for all countries examined. In some countries, such as Australia, Canada, Japan and the BRICs, gold is ineffective in protecting wealth from extreme market movements.

Given the above, gold price volatility prediction is a widely studied and required task in finance and economics. Autoregressive conditional heteroskedasticity (ARCH) models are normally used for volatility forecasting, including a wide range of methodologies from this family, such as GARCH, EGARCH and ARCH-M, among others. These have been widely studied for volatility forecasting (Trück & Liang, 2012), proving their efficiency in capturing short-term variations. However, a drawback of this type of model is its difficulty in capturing long-term features. On the other hand, Deep Neural Networks (DNN) possess the ability to learn complex and non-linear relationships from a dataset, which makes them especially useful for performing classification and prediction tasks in the stock market. It is possible to improve forecasts made by classical econometric models (such as GARCH) with DNNs, since DNNs have the ability to learn more complex patterns, both in the long term and the short term.

The motivation of this study is to develop a methodology that is capable of making a significant improvement in the prediction of gold volatility. This is important in order to better understand the future behavior of this asset, which, as already mentioned, is of vital importance in a large number of investment portfolios. This study seeks to achieve this through the use of modern machine learning techniques, making the proposed model capable of learning patterns from a large volume of data, which is not necessarily structured.

With this in mind, we use a Long Short-Term Memory (LSTM) network, which corresponds to a special type of Recurrent Neural Network (RNN), combined with a CNN. Subsequently, the CNN-LSTM model forecasting will be enriched with an image provided as an input. To generate the images, a technique will be used to transform time series into RGB images; these images are constructed from a number of lags in the series and contain static information (with respect to lag autocorrelations) and dynamic information (referring to the probabilities of movement between different volatility states in the lags). Thus, a series of images is generated that possesses nonlinear and valuable information for understanding the changing patterns of the original series. These images are processed using a pre-trained Convolutional Neural Network (CNN), specifically a VGG16 network, transforming the RGB image into a feature vector. The previously mentioned vector is combined with the output of the LSTM model, and then a final layer generates the volatility forecast. Finally, the Model Confidence Set (MCS) is applied to compare the predictive ability of the different models.

As a fundamental hypothesis, it is expected that the transformation into images will encode important series information and that the CNN will be able to identify and extract patterns in order to use them in volatility forecasting.

The main contributions of this study can be summarized as follows. First of all, a transformation of time series into images was used, capturing both static and dynamic series data. To the best of our knowledge, this type of application has not been previously used to forecast volatility. Thus, it corresponds to a completely new approach to extracting and using financial time series information. Although LSTM techniques have been used in financial forecasting, and some CNN studies even aim to predict categorical variables (such as stock direction), these two Deep Learning tools have yet to be combined in order to enhance predictive models. For this reason, this study also provides a new perspective regarding this type of implementation. Secondly, experimental results using gold prices from the past 50 years show that the proposed CNN-LSTM model achieves results that improve on GARCH model forecasting by 21%, and on the classic LSTM model by more than 10%, as measured by RMSE. Finally, the MCS confirms the CNN-LSTM model as the best predictive model. These results can be used to obtain better predictions of the future, especially of the volatility behavior of an asset as widely used as gold. Using this type of analysis, it is possible to analyze whether it is a good hedge asset in the short term, which allows more efficient management of the investment.

The rest of this paper is organized as follows: Section 2 provides a look at studies related to volatility forecasting, the representation of time series as images, and Deep Learning to forecast time series and classify images. Section 3 describes the dataset and details the methodology used, including the generation of RGB images, the CNN classifier used, modifications to the LSTM model, and evaluation metrics. The experimental results are presented in Section 4. Section 5 shows the conclusions of the paper and, to finish, Section 6 indicates future implementations.

2. Related works

In this section, some of the most relevant methodologies in time series volatility forecasting are presented. Given the nature of the proposed methodology, it will be necessary to review studies concerning the topics of volatility forecasting, the representation of time series as images, and the deep learning methodologies applied in prediction and classification.

2.1. Volatility forecasting methods

In the literature, there are a great number of models capable of forecasting volatility; some of the most widely used are the ARCH models.
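The ARCH-family models referenced throughout this section share a simple recursive variance structure. As a minimal sketch (the parameter values below are illustrative only, not estimates from the paper's gold data), a GARCH(1,1) one-step-ahead variance forecast can be written as:

```python
def garch11_variance_path(returns, omega, alpha, beta):
    """Conditional variance under the GARCH(1,1) recursion:
    sigma2_t = omega + alpha * r_{t-1}**2 + beta * sigma2_{t-1}.
    The recursion is seeded with the sample variance of the returns."""
    mean = sum(returns) / len(returns)
    sigma2 = sum((r - mean) ** 2 for r in returns) / len(returns)  # seed
    path = [sigma2]
    for r in returns:
        sigma2 = omega + alpha * r ** 2 + beta * sigma2
        path.append(sigma2)
    return path  # path[-1] is the one-step-ahead variance forecast

# Illustrative daily log returns and parameters (alpha + beta < 1 for stationarity)
rets = [0.010, -0.020, 0.015, -0.005, 0.030]
path = garch11_variance_path(rets, omega=1e-6, alpha=0.10, beta=0.85)
```

In practice, the parameters are estimated by maximum likelihood; the GARCH benchmarks of Section 4 sweep over many such configurations.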
There are applications of these models aimed at forecasting the daily price volatility of various commodities such as cocoa, corn, cotton, gold, silver, sugar, and wheat in the long term, using a combination of GARCH-ISD (Implied Standard Deviation) models (Kroner, Kneafsey, & Claessens, 1995). In addition, Tully and Lucey (2007) investigated macroeconomic influences on gold from 1983 to 2003 using an AP-GARCH model, concluding that the AP-GARCH model was best fitted to the data and that the most important explanatory variable was the dollar. Other studies related to this type of model were conducted by Trück and Liang (2012), who evaluated different models used in volatility forecasting (GARCH, TARCH, TGARCH, ARMA) in order to study their predictive ability in the gold market, concluding that the TARCH model exhibited the best results.

Furthermore, Artificial Neural Networks (ANN) have been used as volatility forecasting techniques. Parisi, Parisi, and Díaz (2008) analyzed Recursive Neural Networks and Rolling Neural Networks as modifications to traditional neural networks, applying these strategies to predict variations in the price of gold. Kristjanpoller and Minutolo (2015) predicted gold price volatility with an ANN-GARCH hybrid model. The results indicated that the proposed model surpassed the traditional GARCH model in predictive ability. Machine learning techniques such as SVM (Choudhury, Ghosh, Bhattacharya, Fernandes, & Tiwari, 2014), fuzzy logic (Dash & Dash, 2016) and ANFIS (Yazdani-Chamzini, Yakhchali, Volungeviciene, & Zavadskas, 2012; Kristjanpoller & Michell, 2018), among others, have been widely used in the literature to predict time series.

2.2. Representation of time series as images

A natural application for the use of a CNN is image classification, where the network is able to learn geometric, chromatic and other patterns from a label associated with each image. Subsequently, a forecast of the label is performed. Regarding this point, financial time series face a big problem, since they do not naturally provide information in the form of images, and the feature matrices which can be used to represent information (such as in Gunduz, Yaslan, & Cataltepe (2017)) do not necessarily possess elements that convolutional network architectures can recognize (such as lines, vertices, angles, shadows, etc.).

In Wang and Oates (2015), a new framework was used to encode time series as different types of images, Gramian Angular Fields (GAF) and Markov Transition Fields (MTF). This allows the use of computer vision techniques for classification.

2.3. Deep learning to forecast time series

During recent years, Deep Learning models have generated great interest in various research areas, including the prediction of financial time series. Models such as Deep Belief Networks (DBN), Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), among other architectures, have been widely used in the state of the art of several stock and commodity market studies. In Kim and Won (2018), an LSTM model is combined with various GARCH-type models. The hybrid model generates an improvement in the performance of realized volatility forecasting for the KOSPI 200 index. In Baek and Kim (2018), a new framework for stock market forecasting is proposed. This consists of two LSTM modules: the first prevents the network from overfitting and the second carries out forecasts. The results indicated that the use of both modules together significantly improved the LSTM network forecasts. In Zhu, Yin, and Li (2014), a DBN was used to implement a stock trading decision system. Data from different stocks belonging to the S&P 500 were used. The system achieved better results than the buy-and-hold strategy.

2.4. Deep learning to classify images

Another widely used Deep Learning architecture is the Convolutional Neural Network (CNN). This methodology is in the state of the art of various kinds of applications such as image detection, speech recognition (Mitra et al., 2018), facial recognition (Franc & Cech, 2018), etc. Despite the success of these architectures in other sciences, they have yet to be explored in depth in financial problems. Some studies performed with CNNs for time series prediction are mentioned below.

In Ding, Zhang, Liu, and Duan (2015), a predictive model of events in the stock market was designed. Initially, the events are extracted from financial news and represented as vectors using word embedding. Then, a CNN is trained to model short-term and long-term influences on stock prices. The proposed model delivered better results than an SVM on the S&P 500 index and in forecasting individual stocks.

Gunduz et al. (2017) proposed a CNN architecture with ordered features to forecast the intraday direction of 100 stocks from the Istanbul Stock Exchange. The proposed classifier obtained better results than a logistic regression model.

3. Background and proposed model

In this section, details on the dataset used, the transformation of time series into images, the classifiers and the forecasting models will be provided. The proposed model will also be presented. Moreover, the metric used to evaluate each model's performance will be detailed.

3.1. Datasets

The dataset used corresponds to the daily spot price of the gold time series of the London Bullion Market Association (LBMA). It contains 12530 trading days from April 1968 to October 2017. The dataset was divided into 2 parts: the training set, corresponding to 80% of the data, and the testing set, equivalent to the remaining 20%.

For each element in the price series $P_t$, where the index $t$ denotes the daily closing price observation, its logarithmic return $r_t$ was calculated:

$r_t = \log P_t - \log P_{t-1}$   (1)

The squared logarithmic return is a good approximation of volatility and can be calculated without knowing the values of future returns. Thus, the current value and a certain number of lags (3 lags for these tests) will be used as a predictor of historical volatility.

Historical volatility is then calculated as the variance of the logarithmic return in a window of 14 days into the future, as shown in the following equation:

$HV_t = \frac{1}{14}\sum_{i=t+1}^{t+14}\left(r_i - \bar{r}_t\right)^2$   (2)

3.2. Proposed architecture of the CNN-LSTM model

The proposed volatility forecasting model consists of two stages: first, the generation of the RGB image series $I_t$. The second stage corresponds to the model training, where, on the one hand, the image embedding is generated into a new feature space and, in parallel, the LSTM layers take the squared logarithmic return $r_t^2$, plus a certain number of lags $r_{t-1}^2, r_{t-2}^2, \ldots$, as input. The two embeddings associated with each of the previous layers are concatenated and connected with two Dense layers, the first with $dim_{hybrid}$ output neurons, which allows a reduction in dimensionality

from the concatenation; finally, the last layer has a single neuron, providing the final output. In this way, the weights of the CNN-LSTM network are trained to forecast the realized volatility $HV_t$. The CNN-LSTM model scheme, detailed in the last stage, is shown below.

3.2.1. Encoding time series to images

As mentioned in detail in Wang and Oates (2015), the objective of this transformation is to generate an image $I_t$ that will form part of the time series. This will be constructed for each period of time $t$ as an RGB image, meaning that it will be made up of 3 channels. Each channel is expressed as a feature matrix. The first type of image is a Gramian Angular Field (GAF), where the time series is represented in a polar coordinate system instead of the typical Cartesian representation. In the GAF matrix, each element is the cosine or sine of a sum of angles (thus, 2 channels are defined, applying the cosine and sine functions). The second type of image is inspired by studies on the duality between time series and complex networks (Campanharo, Sirer, Malmgren, Ramos, & Amaral, 2011), corresponding to Markov Transition Fields (MTF). The main idea consists of building the Markov matrix from quantile bins after discretization and encoding the likelihood of dynamic transitions in a matrix.

Applying these methodologies to a gold time series is performed in the following manner: for each return value $r_t$, a series with $N$ lags is generated. This new series is defined by $\hat{r}_t = [r_{t-n}, r_{t-(n-1)}, r_{t-(n-2)}, \ldots, r_t]$; an image $I_t$ will be generated from the series $\hat{r}_t$.

3.2.1.1. Gramian angular fields. Given the time series $X = \{x_1, x_2, \ldots, x_n\}$ with $N$ observations, $X$ is rescaled so that all values belong to the interval $[-1, 1]$ through the following transformation:

$\tilde{x}_i = \dfrac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}$   (3)

Then, it is possible to represent two rescaled versions of $X$ in polar coordinates: one associated with the cosine function and another associated with the sine function.

$\phi = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \; \tilde{x}_i \in \tilde{X}; \qquad r = \dfrac{t_i}{N}, \; t_i \in \mathbb{N}$   (4)

$\theta = \arcsin(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \; \tilde{x}_i \in \tilde{X}; \qquad r = \dfrac{t_i}{N}, \; t_i \in \mathbb{N}$   (5)

Finally, two GAF matrices can be generated, associated with the time series $X$:

$G_{\cos} = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\ \cos(\phi_2 + \phi_1) & \cdots & \cos(\phi_2 + \phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n) \end{bmatrix}$   (6)

$G_{\sin} = \begin{bmatrix} \sin(\theta_1 + \theta_1) & \cdots & \sin(\theta_1 + \theta_n) \\ \sin(\theta_2 + \theta_1) & \cdots & \sin(\theta_2 + \theta_n) \\ \vdots & \ddots & \vdots \\ \sin(\theta_n + \theta_1) & \cdots & \sin(\theta_n + \theta_n) \end{bmatrix}$   (7)

GAF matrices contain information regarding the temporal correlation between two series elements; the dimension of each matrix is $N \times N$, and they represent the first two channels of the image $I_t$.

3.2.1.2. Markov transition fields. Given the time series $X$, $Q$ quantile bins are identified and each value $x_i$ is assigned its respective bin $q_j$ ($j \in [1, Q]$). Then, an adjacency matrix $W$ is constructed, of dimensions $Q \times Q$, counting the transitions between the different quantile bins in the manner of a Markov chain and respecting the original order of the series. $w_{i,j}$ is given by the frequency with which a point in quantile $q_i$ is followed by a point in quantile $q_j$. A normalization process is then performed so that $\sum_j w_{i,j} = 1, \; \forall i \in [1, Q]$. $W$ corresponds to the Markov transition matrix; the Markov Transition Field (MTF) is then defined as follows:

$M = \begin{bmatrix} w_{ij \mid x_1 \in q_i, x_1 \in q_j} & \cdots & w_{ij \mid x_1 \in q_i, x_n \in q_j} \\ w_{ij \mid x_2 \in q_i, x_1 \in q_j} & \cdots & w_{ij \mid x_2 \in q_i, x_n \in q_j} \\ \vdots & \ddots & \vdots \\ w_{ij \mid x_n \in q_i, x_1 \in q_j} & \cdots & w_{ij \mid x_n \in q_i, x_n \in q_j} \end{bmatrix}$   (8)

The MTF matrix denotes the transition probability from quantile $q_i \to q_j$ for each pair of elements of the series $X$.

A series of images $I_t = [G_{\cos}, G_{\sin}, M]$ is constructed from the three previously described matrices, where $G_{\cos}$, $G_{\sin}$ and $M$ are calculated from the series $\hat{r}_t$. Note that the two parameters necessary to build the series of images are $N$ and $Q$.

3.2.2. For embedding: Convolutional Neural Network (CNN)

After generating the series of images $I_t$, the information contained in each of the image channels is used as input for a VGG16 network. This generates an embedding that transforms the image from its initial dimensions into a new space; this vector will be combined with the result of the LSTM network to generate the forecast.

The Convolutional Neural Network (CNN) is basically composed of many layers with distinctive architectures, called convolutional layers and subsampling (pooling) layers (Wang, Xiang, Zhong, & Zhou, 2018). The main difference with an MLP is the connection between layers. In a CNN, each local part (receptive field) is connected to a single neuron, whereas the inputs in an MLP are fully connected to the neurons of the next layer.

There are three main layers in the architecture: the input layer, the hidden layers, and the output layer. Hidden layers basically consist of convolution layers followed by subsampling layers. Different types of filters are applied to the inputs in each convolution layer, and the activation function is applied to the resulting output. The pooling layers are then applied to the activated outputs.

Due to the pooling process, input data with different dimensions can be processed by the network. Moreover, this means that the location of a pattern in the image will not matter, given that the filters will process the entire image. In this regard, the subsampling process is the solution for images with rotated or expanded objects, or objects in different positions. On the other hand, the filters applied to the input data convert low-level attributes of local parts into representations of high-level attributes in each layer.

CNN architectures require multiple hyperparameters to be configured, such as the filter size, stride, and pooling type. Filter size indicates the set of inputs to which the transformation kernel will be applied in the convolution process. Stride determines how many steps will be taken when running the filtering process. Finally, pooling type indicates which pooling process will be applied to the filtering output (LeCun, Kavukcuoglu, & Farabet, 2010).

For this implementation, a VGG16 (Szegedy et al., 2015) network pre-trained on the ImageNet dataset was used, applying the following modifications:

• Up to the layer 'block4_conv3' (specified in the architecture of the VGG network) was used to extract the feature vector from the images.
• Subsequently, 4 extra layers were added to the model to perform a dimensionality reduction: two convolutional layers (with 64 and 10 filters and ReLU activation functions), a Flatten layer, and a Dense layer of size $dim_{merge}$, corresponding to the size of the vector that will be combined with the output of the LSTM network.

3.2.3. For time-series forecasting: Long Short-Term Memory (LSTM)

A Recurrent Neural Network (RNN) learns temporal patterns using sequential data. The memory properties of an RNN are not present in earlier deep feedforward networks. However, the vanishing gradient problem occurs when training a standard RNN, meaning that such networks have difficulty retaining information for a long period of time (Hochreiter, 1998). The LSTM model was developed to avoid these problems (Hochreiter & Schmidhuber, 1997). This type of architecture uses memory cells and gates to store information for long periods of time, or to forget it if it is unnecessary.

Fig. 1 represents the structure of a memory block, considering the memory cell and gates. $x_t$ and $h_t$ correspond to the input and hidden state, respectively; $f_t$, $i_t$ and $o_t$ correspond to the forget, input and output gates, respectively. $n_t$ is the candidate input to be stored. Finally, the stored amount is controlled by the input gate.

For the proposed model, two LSTM layers will be used, fed with 3 lags of the squared logarithmic return (with sizes $dim_{lstm}$ and $dim_{merge}$, respectively); the value of $dim_{merge}$ will remain fixed at 200. The result will be combined with the output of the VGG16 network. To train the CNN-LSTM network, it is necessary to define two parameters associated with the series of images ($N$, $Q$) and two parameters associated with the architecture of the model ($dim_{lstm}$ and $dim_{hybrid}$). Thus, different $CNN\text{-}LSTM_{N,Q,dim_{lstm},dim_{hybrid}}$ models will be estimated, and the model that presents the lowest Mean Square Error (MSE) will be selected.

3.3. Loss function, benchmark models and test

To evaluate the forecasting accuracy of the proposed CNN-LSTM model, the loss function used is the Mean Square Error (MSE). The MSE is validated as the best error measure for volatility (Fuertes, Izzeldin, & Kalotychou, 2009). Different configurations of the CNN-LSTM model will be compared with different configurations and models used in related studies to predict gold volatility. In particular, the models used as benchmarks are Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Support Vector Regression (SVR), Long Short-Term Memory networks (LSTM) and Convolutional Neural Networks (CNN).

$MSE = \dfrac{1}{n}\sum_{i=1}^{n}\left(HV_i - \widehat{HV}_i\right)^2$   (9)

where $\widehat{HV}_i$ corresponds to the volatility estimated by the model, $HV_i$ is the historical volatility, and $n$ corresponds to the total number of predictions made on the test set.

After obtaining the best prediction models, their superior predictive accuracy will be tested with the Model Confidence Set (MCS) (Hansen, Lunde, & Nason, 2001, 2003).
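The quantities defined in Eqs. (1), (2) and (9) translate directly into code. The following is a minimal NumPy sketch on synthetic prices; the function names are ours, and the centering term $\bar{r}_t$ in Eq. (2) is taken as the mean over the same 14-day window, which the equation leaves implicit:

```python
import numpy as np

def log_returns(prices):
    """Eq. (1): r_t = log(P_t) - log(P_{t-1})."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def historical_volatility(r, t, horizon=14):
    """Eq. (2): variance of the log returns over the `horizon` days after t.
    The window mean stands in for r-bar_t (an assumption on our part)."""
    window = r[t + 1 : t + 1 + horizon]
    return float(np.mean((window - window.mean()) ** 2))

def mse(realized, predicted):
    """Eq. (9): mean squared error between realized and predicted volatility."""
    realized, predicted = np.asarray(realized), np.asarray(predicted)
    return float(np.mean((realized - predicted) ** 2))

# Synthetic price path standing in for the LBMA gold series
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, 40)))
r = log_returns(prices)
hv_0 = historical_volatility(r, t=0)  # realized-volatility target at t = 0
```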

Fig. 1. Proposed CNN-LSTM architecture.


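The image encoding of Section 3.2.1 can likewise be sketched. Assuming quantile binning via `np.digitize` and helper names of our own, a window of returns maps to the three channels of $I_t$ as follows:

```python
import numpy as np

def gaf_channels(x):
    """Cosine and sine Gramian Angular Field channels (Eqs. (3)-(7))."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] as in Eq. (3)
    x_t = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    phi = np.arccos(np.clip(x_t, -1.0, 1.0))
    theta = np.arcsin(np.clip(x_t, -1.0, 1.0))
    g_cos = np.cos(phi[:, None] + phi[None, :])      # Eq. (6)
    g_sin = np.sin(theta[:, None] + theta[None, :])  # Eq. (7)
    return g_cos, g_sin

def mtf_channel(x, n_bins):
    """Markov Transition Field channel (Eq. (8)) from Q quantile bins."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)  # quantile-bin index of each observation
    w = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):  # count ordered transitions
        w[a, b] += 1
    w /= np.maximum(w.sum(axis=1, keepdims=True), 1)  # row-normalize W
    return w[bins[:, None], bins[None, :]]  # M[i, j] = w(bin_i -> bin_j)

window = np.array([0.01, -0.02, 0.015, -0.005, 0.03, 0.002])
g_cos, g_sin = gaf_channels(window)
m = mtf_channel(window, n_bins=4)
image = np.stack([g_cos, g_sin, m])  # the 3-channel image I_t
```

Each channel is $N \times N$; in the paper, the three channels are stacked as an RGB image and fed to the pre-trained VGG16.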

To perform this test, MSE is used as a loss function, since it is defined as a robust loss function (Patton, 2011). The MCS is a test that discards inferior models based on their predictive capacity by means of an elimination rule. The surviving models have a similar predictive capacity associated with the loss function used. The MCS provides test p-values for each model; the main difference with other tests is that the selection depends on the data features and how these can be modeled using a specific methodology.

4. Experiments

4.1. Benchmarking models

The models used as reference points are Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Support Vector Regression (SVR), Long Short-Term Memory networks (LSTM) and Convolutional Neural Networks (CNN).
The GARCH model was analyzed in more than 400 configurations, with the lowest MSE obtained being 3.1866E-08; meanwhile, for the SVR, 500 configurations were tested, with a minimum MSE of 3.5787E-08. The 400 ANN models tested have a lowest MSE of 2.6842E-08. The ANN-GARCH hybrid model was also included among the compared models, marginally reducing the MSE to 2.6807E-08. Each component of the proposed model was also tested independently: in the case of CNN, the best model obtained an MSE of 2.6909E-08, while the best LSTM model reached an MSE of 2.4225E-08, showing superiority over all of the other analyzed models. Table 1 shows the best 3 configurations for each of the models analyzed as benchmarks.

4.2. CNN-LSTM model results

The proposed model was trained for 72 cases, defined by the following parameters: N ∈ {50, 60, 80}, Q ∈ {4, 20, 40, 80}, dim_lstm ∈ {50, 100, 200} and dim_hybrid ∈ {100, 200}. The best 5 results are displayed in Table 2.

Table 2
Top 5 best LSTM-CNN model results varying the main parameters associated with its architecture.

N    Q    dim_lstm    dim_hybrid    MSE
80   20   200         100           1.9840E-08
80   4    50          200           2.0489E-08
60   4    100         200           2.3312E-08
80   20   100         200           2.3135E-08
50   20   100         100           2.3490E-08

The parameters used are: the number of lags used to construct the images (N), the number of quantiles used for the MTF images (Q), the dimension of LSTM1 (dim_lstm) and the dimension of the Dense layer specified after the concatenation (dim_hybrid). The indicator we seek to minimize is the MSE.

Out of the results observed, the CNN-LSTM model with parameters N = 80, Q = 20, dim_lstm = 200 and dim_hybrid = 100 is selected as the optimal configuration. The best MSE achieved by the proposed model is 1.9840E-08, which is 18% lower than the best MSE of the benchmarking models.
Fig. 2 shows the results obtained by the CNN-LSTM model. It is possible to see that the proposed model generates significant improvements at volatility peaks, where it is able to predict more stable patterns of growth and decline during the pre- and post-peak days. The LSTM model, on the other hand, is only able to predict these peaks as punctual jumps from the baseline value. Another effect can be observed in periods where the volatility remains stable: the CNN-LSTM model is able to predict greater movement, in a range above and below the basal level, while the LSTM model shows very low, almost null variations. These two effects qualitatively explain the low error obtained when incorporating the images as inputs to the hybrid architecture.

4.3. MCS results

To determine the superiority of the proposed model over all the benchmarking models in forecasting gold volatility, the Model Confidence Set was applied.
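As a rough illustration of how the MCS works, its elimination rule can be sketched as follows. This is a simplified sketch, not the exact Hansen, Lunde, and Nason procedure: it uses a range-type statistic over average losses and a moving-block bootstrap, and the function name is hypothetical.

```python
import numpy as np

def simplified_mcs(losses, alpha=0.10, n_boot=1000, block=10, seed=0):
    """Simplified Model Confidence Set sketch.

    Repeatedly tests equal predictive ability over the surviving models and,
    while it is rejected, eliminates the model with the worst average loss.
    losses: (T, m) array of per-period losses (e.g. squared forecast errors).
    Returns the surviving model indices and the elimination p-values.
    """
    rng = np.random.default_rng(seed)
    T, m = losses.shape
    surviving = list(range(m))
    pvals = {}
    while len(surviving) > 1:
        sub = losses[:, surviving]                    # (T, k) losses of survivors
        dbar = sub.mean(axis=0) - sub.mean()          # deviation from the cross-model mean loss
        t_obs = np.max(np.abs(dbar))                  # range-type test statistic
        # moving-block bootstrap of the loss series, centered at the observed statistic
        stats = np.empty(n_boot)
        for b in range(n_boot):
            starts = rng.integers(0, T - block, size=T // block + 1)
            idx = np.concatenate([np.arange(s, s + block) for s in starts])[:T]
            bs = sub[idx]
            dev = (bs.mean(axis=0) - bs.mean()) - dbar
            stats[b] = np.max(np.abs(dev))
        p = np.mean(stats >= t_obs)
        worst = surviving[int(np.argmax(sub.mean(axis=0)))]
        pvals[worst] = p
        if p >= alpha:            # equal predictive ability not rejected: stop
            break
        surviving.remove(worst)   # eliminate the worst model and re-test
    for k in surviving:
        pvals.setdefault(k, 1.0)
    return surviving, pvals
```

In the paper's setting, `losses` would hold the per-day squared errors of each candidate model, and the surviving set plays the role of the 90% confidence set reported in Table 3.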

Table 1
Top 3 configuration models for each benchmark model.

Model   Configuration   Parameters

GARCH       MSE          W     AR   p    q
1           3.1866E-08   252   1    3    3
2           3.1981E-08   252   3    2    2
3           3.2028E-08   252   1    2    3
SVR         MSE          epsilon   k    C    W
1           3.5787E-08   0.3       10   1    200
2           3.5787E-08   0.3       10   5    200
3           3.5787E-08   0.3       10   3    200
ANN         MSE          AR    W    L    N
1           2.6842E-08   24    252  4    20
2           2.6862E-08   24    252  4    5
3           2.6911E-08   5     252  4    10
ANN-GARCH   MSE          AR    W    L    N
1           2.6807E-08   5     252  4    5
2           2.6827E-08   5     252  3    20
3           2.6853E-08   24    252  2    20
CNN         MSE          N     Q    B
1           2.6909E-08   45    40   128
2           2.6926E-08   60    20   128
3           2.7120E-08   28    80   64
LSTM        MSE          AR    E    B
1           2.4225E-08   30    200  10
2           2.4818E-08   15    300  10
3           2.4875E-08   20    300  10

The parameters used are: the length of the adjustment window (W), the number of lagged series terms (AR), the number of autoregressive terms (p), the number of lagged volatility terms (q), the number of layers (L), the number of neurons per layer (N), the error slack variable (epsilon), the bound on the Lagrange multipliers (C), the deviation parameter of the Gaussian kernel (k), the number of lags used to construct the images (N), the number of quantiles used for the MTF images (Q), the batch size for training (B) and the number of epochs used (E). The indicator we seek to minimize is the MSE. For the SVR parameters see Smola and Schölkopf (2004), for the GARCH parameters see Engle (1982) and Bollerslev (1986), and for the ANN and ANN-GARCH parameters see Kristjanpoller and Minutolo (2015).
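For context, the GARCH(p, q) benchmarks in Table 1 produce one-step-ahead variance forecasts through a simple recursion. A minimal GARCH(1,1) sketch with fixed, already-estimated parameters (the estimation itself, done over the rolling window W, is omitted, and the function name is hypothetical):

```python
import numpy as np

def garch11_variance_path(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model with fixed parameters.

    sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]
    The last element is the one-step-ahead variance forecast.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = returns.var()  # initialize at the sample (unconditional) variance
    for t in range(len(returns)):
        sigma2[t + 1] = omega + alpha * returns[t] ** 2 + beta * sigma2[t]
    return sigma2
```

The long-run variance implied by the recursion is omega / (1 - alpha - beta), which is a quick sanity check on estimated parameters.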

Fig. 2. CNN-LSTM model vs LSTM model forecasting results and realized volatility.

Table 3
Model Confidence Set results and accuracy improvement.

Model       MSE          Improvement   MCS p-value
GARCH       3.1866E-08   37.74%        0.03
SVR         3.5787E-08   44.56%        0.04
ANN         2.6842E-08   26.09%        0.15
ANN-GARCH   2.6807E-08   25.99%        0.19
CNN         2.6909E-08   26.27%        0.25
LSTM        2.4225E-08   18.10%        0.26
LSTM-CNN    1.9840E-08   0.00%         1.00

The improvement is the percentage decrease in the MSE of the proposed model compared with each model. The MCS was applied with 10,000 bootstrap replications and a block length of 100.

There is a superiority of the proposed model, at the 90% confidence level (M*90), over all the benchmarking models studied, showing the improvement in forecasting accuracy when the image is processed by the CNN and used as an input to the LSTM. The results obtained can be observed in Table 3.

5. Discussion and conclusion

The use of Deep Learning models in finance is in constant development due to the increasing amount of data to which we are exposed. The proposed model combines two techniques focused on processing completely different types of data. Given this, various modifications can be developed to investigate potential advantages of this architecture.
In this study, realized volatility of the gold price was forecasted using different levels of abstraction of the information contained in the price series. Based on returns, volatility, and images generated from these data, we trained an architecture based on CNN and LSTM models to forecast realized volatility.
Experimental results are obtained from the time series of gold prices, using 40 years for the training set and 10 years for the test set. In the first stage, the time series of images associated with the logarithmic return of gold is constructed (using the number of lags N and the number of quantiles Q as parameters). Later, 72 variations of the hybrid model were trained, combining the previously mentioned series of images and a number of return lags as inputs. The optimal combination is obtained for the configuration N = 80, Q = 20, dim_lstm = 200, dim_hybrid = 100, obtaining an MSE of 1.9840E-08, an 18% decrease with respect to the best MSE of the models used as benchmarks.
From the performed experiments and the subsequent evaluation based on the MCS, it was observed that the CNN-LSTM model achieved better predictive performance than the other models.
The advantages of the CNN-LSTM model proposed in this work, with respect to other volatility forecasting methodologies, can be summarized as follows:

• The main advantage of this work is better prediction of future volatility behavior, especially for an asset as widely used as gold. It becomes possible to analyze whether gold is a good short-term hedge, which allows more efficient management of the investment portfolio, being able to replace or enhance this asset.
• The possibility of expressing the relationships of a time series as different matrices (equivalent to an image) allows a greater amount of information to be analyzed to identify relationships between increases and decreases in periods of volatility. This is a great advantage over most traditional models, which are only able to extract information directly from series lags.
• The CNN model allows information to be extracted from data organized at a different abstraction level, detecting patterns that can be expressed at different scales, orientations, or perspectives. This makes pattern detection especially robust.
• Each forecast performed by the model uses information from a large number of lags, expressed through the processed image, which is then handled by two Deep Learning networks specially developed for this type of data.

Limitations of the proposed hybrid model:

• First, the hybrid model training process has a high computational cost: each iteration of the training (for each value of the parameters N, Q, dim_lstm and dim_hybrid) takes about 30 min on hardware suitable for training Deep Learning architectures (an NVIDIA GTX 1060 6 GB SC GDDR5 graphics card), compared to the simple LSTM model, which takes less than a minute to train.
• Secondly, there is an important complication regarding the amount of data needed to perform the training optimally. For gold, it is possible to find records covering 50 years; however, other assets will not have that amount of information, so it is not possible to collect a quantity of data with ideal conditions for the hybrid model.
• Third, only gold price data were used, which prevents a broader vision that could add much value to the model by relating patterns in other market variables.
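The image encoding used throughout (a Markov Transition Field over Q quantile bins, following Wang and Oates, 2015) can be sketched as follows. This is a minimal illustration of the standard first-order construction, not the paper's exact pipeline, and the function name is hypothetical.

```python
import numpy as np

def markov_transition_field(series, Q=20):
    """Markov Transition Field of a 1-D series (after Wang & Oates, 2015).

    Discretize the series into Q quantile bins, estimate the bin-to-bin
    transition matrix, then map every pair of time points (i, j) to the
    probability of moving from bin(x_i) to bin(x_j).
    """
    series = np.asarray(series, dtype=float)
    # assign each observation to a quantile bin 0..Q-1
    edges = np.quantile(series, np.linspace(0, 1, Q + 1)[1:-1])
    bins = np.digitize(series, edges)
    # first-order Markov transition counts between consecutive bins
    W = np.zeros((Q, Q))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize to probabilities
    # MTF: pixel (i, j) is the transition probability between the bins of x_i and x_j
    return W[np.ix_(bins, bins)]
```

Applied to a window of N return lags, this yields an N x N matrix in [0, 1] that can be resized and fed to the pre-trained CNN as an image.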

In closing, it is important to explicitly detail the differences between this work and previous research. As mentioned in Subsection 2.3, of all the works that have used Deep Learning techniques for time series prediction, none has combined architectures of the CNN and LSTM type. These architectures have been used individually to predict categorical variables associated with the series or to predict some response based on the lags of the series. In this implementation, a mechanism was generated with which the CNN network produces an embedding of features that provides significant information for the prediction of volatility. Nor have techniques that transform time series into images been used in the financial field, which turns out to be an advantage for the use of these types of architectures. On the other hand, classic models of the GARCH type are based solely on extracting information from the lags of a given time series, so they lose all the information related to the dynamics of the series; in addition, they are not capable of processing large amounts of information (adding other types of data from the same series or from the market). Finally, this work provides a new architecture that can be used for any type of time series forecast.

6. Future works

In order to resolve the limitations mentioned in the discussion, the following modifications could be made in a future implementation:

• First, a methodology could be applied to identify which layers of the proposed architecture bring the greatest value to the model and which contribute the least. In this way, a more compact architecture could be generated, giving priority to running more iterations to configure the hyperparameters. Some modifications that could be made to the architecture are: a different process for generating the images, the CNN architecture used, the dimension in which to combine the embeddings, and the quantity and dimension of the hidden layers used to reduce dimensionality, among others.
• To solve the second difficulty, it is necessary to make improvements focused on the amount of data used for training. Although a horizon of 50 years was used, the amount of data is far from the optimal quantity for a Deep Learning model. The power of these architectures lies in harnessing a big data source capable of exhibiting a large number of patterns and labels for analysis. For this reason, we propose working with a high-frequency price source that allows the proposed models to exploit these features.
• Finally, regarding the broader view of the market as a whole, a greater number of variables that affect the price of gold, such as the oil price and inflation, could be used in the model. There are numerous studies relating macroeconomic variables to the gold price (Batten, Ciner, & Lucey, 2010; Shafiee & Topal, 2010; Elder, Miao, & Ramchander, 2012).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Andrés Vidal: Conceptualization, Methodology, Software, Writing - original draft, Validation. Werner Kristjanpoller: Supervision, Validation, Writing - review & editing.

References

Apergis, N. (2014). Can gold prices forecast the Australian dollar movements? International Review of Economics and Finance, 29, 75–82.
Batten, J. A., Ciner, C., & Lucey, B. M. (2010). The macroeconomic determinants of volatility in precious metals markets. Resources Policy, 35, 65–71.
Baur, D. G., & Lucey, B. M. (2010). Is gold a hedge or a safe haven? An analysis of stocks, bonds and gold. Financial Review, 45, 217–229.
Baur, D. G., & McDermott, T. K. (2010). Is gold a safe haven? International evidence. Journal of Banking & Finance, 34, 1886–1898.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.
Bredin, D., Conlon, T., & Potí, V. (2015). Does gold glitter in the long-run? Gold as a hedge and safe haven across time and investment horizon. International Review of Financial Analysis, 41, 320–328.
Bruno, S., & Chincarini, L. (2010). A historical examination of optimal real return portfolios for non-US investors. Review of Financial Economics, 19, 161–178.
Campanharo, A. S. L. O., Sirer, M. I., Malmgren, R. D., Ramos, F. M., & Amaral, L. A. N. (2011). Duality between time series and networks. PLoS ONE, 6, e23378.
Capie, F., Mills, T. C., & Wood, G. (2005). Gold as a hedge against the dollar. Journal of International Financial Markets, Institutions and Money, 15, 343–352.
Choudhury, S., Ghosh, S., Bhattacharya, A., Fernandes, K. J., & Tiwari, M. K. (2014). A real time clustering and SVM based price-volatility prediction for optimal trading strategy. Neurocomputing, 131, 419–426.
Chua, J. H., Sick, G., & Woodward, R. S. (1990). Diversifying with gold stocks. Financial Analysts Journal, 46, 76–79.
Dash, R., & Dash, P. (2016). An evolutionary hybrid fuzzy computationally efficient EGARCH model for volatility prediction. Applied Soft Computing, 45, 40–60.
Ding, X., Zhang, Y., Liu, T., & Duan, J. (2015). Deep learning for event-driven stock prediction. In IJCAI (pp. 2327–2333).
Elder, J., Miao, H., & Ramchander, S. (2012). Impact of macroeconomic news on metal futures. Journal of Banking and Finance, 36, 51–65.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987–1007.
Franc, V., & Cech, J. (2018). Learning CNNs from weakly annotated facial images. Image and Vision Computing, 77, 10–20.
Fuertes, A., Izzeldin, M., & Kalotychou, E. (2009). On forecasting daily stock volatility: The role of intraday information and market conditions. International Journal of Forecasting, 25, 259–281.
Gunduz, H., Yaslan, Y., & Cataltepe, Z. (2017). Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations. Knowledge-Based Systems, 137, 138–148.
Hansen, P. R., Lunde, A., & Nason, J. M. (2003). Choosing the best volatility models: The model confidence set approach. Oxford Bulletin of Economics and Statistics, 65, 839–861.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79, 453–497.
Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6, 107–116.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
Joy, M. (2011). Gold and the US dollar: Hedge or haven? Finance Research Letters, 8, 120–131.
Kim, H. Y., & Won, C. H. (2018). Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Systems with Applications, 103, 25–37.
Kristjanpoller, W., & Michell, K. (2018). A stock market risk forecasting model through integration of switching regime, ANFIS and GARCH techniques. Applied Soft Computing, 67, 106–116.
Kristjanpoller, W., & Minutolo, M. (2015). Gold price volatility: A forecasting approach using the artificial neural network–GARCH model. Expert Systems with Applications, 42, 7245–7251.
Kroner, K. F., Kneafsey, K. P., & Claessens, S. (1995). Forecasting volatility in commodity markets. Journal of Forecasting, 14, 77–95.
LeCun, Y., Kavukcuoglu, K., & Farabet, C. (2010). Convolutional networks and applications in vision. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems (pp. 253–256).
Lu, Y., & Hamori, S. (2013). Gold prices and exchange rates: A time-varying copula analysis. Applied Financial Economics, 24, 41–50.
Mitra, V., Sivaraman, G., Nam, H., Espy-Wilson, C., Saltzman, E., & Tiede, M. (2018). Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition. Speech Communication, 89, 103–112.
Parisi, A., Parisi, F., & Díaz, D. (2008). Forecasting gold price changes: Rolling and recursive neural network models. Journal of Multinational Financial Management, 18, 477–487.
Patton, A. J. (2011). Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics, 160, 246–256.
Reboredo, J. C. (2013). Is gold a safe haven or a hedge for the US dollar? Implications for risk management. Journal of Banking and Finance, 37, 2665–2676.
Reboredo, J. C., & Rivera-Castro, M. A. (2014). Can gold hedge and preserve value when the US dollar depreciates? Economic Modelling, 39, 168–173.
Sanderson, H. (2015). Gold rises amid expectation of ECB move on QE. The Financial Times.

Shafiee, S., & Topal, E. (2010). An overview of global gold market and gold price forecasting. Resources Policy, 35, 178–189.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.
Sumner, S., Johnson, R., & Soenen, L. (2010). Spillover effects among gold, stocks, and bonds. Journal of Centrum Cathedra, 3, 106.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the 28th Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Trück, S., & Liang, K. (2012). Modelling and forecasting volatility in the gold market. International Journal of Banking and Finance, 9, 48–80.
Tully, E., & Lucey, B. M. (2007). A power GARCH examination of the gold market. Research in International Business and Finance, 21, 316–325.
Wang, S., Xiang, J., Zhong, Y., & Zhou, Y. (2018). Convolutional neural network-based hidden Markov models for rolling element bearing fault identification. Knowledge-Based Systems, 144, 65–76.
Wang, Z., & Oates, T. (2015). Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.
Yazdani-Chamzini, A., Yakhchali, S. H., Volungeviciene, D., & Zavadskas, E. K. (2012). Forecasting gold price changes by using adaptive network fuzzy inference system. Journal of Business Economics and Management, 13, 994–1010.
Zhu, C., Yin, J., & Li, Q. (2014). A stock decision support system based on DBNs. Journal of Computational Information Systems, 10, 883–893.
