

Review
Comprehensive Review of Deep Reinforcement
Learning Methods and Applications in Economics
Amirhosein Mosavi 1,2,*, Yaser Faghan 3, Pedram Ghamisi 4,5, Puhong Duan 6,
Sina Faizollahzadeh Ardabili 7, Ely Salwana 8 and Shahab S. Band 9,10,*
1 Environmental Quality, Atmospheric Science and Climate Change Research Group,
Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Instituto Superior de Economia e Gestao, University of Lisbon, 1200-781 Lisbon, Portugal;
yaser.kord@yahoo.com
4 Helmholtz-Zentrum Dresden-Rossendorf, Chemnitzer Str. 40, D-09599 Freiberg, Germany;
pedram.ghamisi@uantwerpen.be
5 Department of Physics, Faculty of Science, the University of Antwerp, Universiteitsplein 1,
2610 Wilrijk, Belgium
6 College of Electrical and Information Engineering, Hunan University, Changsha 410082, China;
puhong_duan@hnu.edu.cn
7 Department of Biosystem Engineering, University of Mohaghegh Ardabili, Ardabil 5619911367, Iran;
Sina.faiz@uma.ac.ir
8 Institute of IR4.0, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia; elysalwana@ukm.edu.my
9 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
10 Future Technology Research Center, College of Future,
National Yunlin University of Science and Technology, 123 University Road, Section 3,
Douliou, Yunlin 64002, Taiwan
* Correspondence: amirhosein.mosavi@tdtu.edu.vn (A.M.);
shamshirbandshahaboddin@duytan.edu.vn (S.S.B.)

Received: 10 May 2020; Accepted: 15 September 2020; Published: 23 September 2020

Abstract: The popularity of deep reinforcement learning (DRL) applications in economics has
increased exponentially. DRL, through a wide range of capabilities from reinforcement learning (RL)
to deep learning (DL), offers vast opportunities for handling sophisticated dynamic economics systems.
DRL is characterized by scalability with the potential to be applied to high-dimensional problems in
conjunction with noisy and nonlinear patterns of economic data. In this paper, we initially consider a
brief review of DL, RL, and deep RL methods in diverse applications in economics, providing an
in-depth insight into the state-of-the-art. Furthermore, the architecture of DRL applied to economic
applications is investigated in order to highlight the complexity, robustness, accuracy, performance,
computational tasks, risk constraints, and profitability. The survey results indicate that DRL can
provide better performance and higher efficiency as compared to the traditional algorithms while
facing real economic problems in the presence of risk parameters and the ever-increasing uncertainties.

Keywords: economics; deep reinforcement learning; deep learning; machine learning; mathematics;
applied informatics; big data; survey; literature review; explainable artificial intelligence; ensemble;
anomaly detection; 5G; fraud detection; COVID-19; Prisma; data science; supervised learning

1. Introduction
Deep learning (DL) techniques are based on the use of multi-neurons that rely on the multi-layer
architectures to accomplish a learning task. In DL, the neurons are linked to the input data in


conjunction with a loss function for the purpose of updating their weights and maximizing the fitting
to the inbound data [1,2]. In a multi-layer structure, every node takes the outputs of all the
prior layers in order to represent an output set that diminishes the approximation error of the primary input
data, while multiple neurons learn various weights for the same data at the same time. There is a great
demand for the appropriate mechanisms to improve productivity and product quality in the current
market development. DL enables predicting and investigating complicated market trends compared
to the traditional algorithms in ML. DL presents great potential to provide powerful tools to learn from
stochastic data arising from multiple sources that can efficiently extract complicated relationships and
features from the given data. DL is reported as an efficient predictive tool to analyze the market [3,4].
Additionally, compared to the traditional algorithms, DL is able to prevent the over-fitting problem,
to provide more efficient sample fitting associated with complicated interactions, and to outstretch
input data to cover all the essential features of the relevant problem [5].
Reinforcement learning (RL) [6] is a powerful mathematical framework for experience-driven
autonomous learning [7]. In RL, the agents interact directly with the environment by taking actions
to enhance its efficiency by trial-and-error to optimize the cumulative reward without requiring
labeled data. Policy search and value function approximation are critical tools of autonomous
learning. The search policy of RL is to detect an optimal (stochastic) policy applying gradient-based or
gradient-free approaches dealing with both continuous and discrete state-action settings [8]. The value
function strategy is to estimate the expected return in order to find the optimal policy dealing with all
possible actions based on the given state. When considering an economic problem, in contrast to traditional
approaches [9], reinforcement learning methods prevent suboptimal performance, namely, by imposing
significant market constraints that lead to finding an optimal strategy in terms of market analysis and
forecast [10]. Despite RL successes in recent years [11–13], these results suffer from a lack of scalability
and cannot manage high-dimensional problems. The DRL technique, by combining both RL and DL
methods, where DL is equipped with the vigorous function approximation, representation learning
properties of deep neural networks (DNN), and handling complex and nonlinear patterns of economic
data, can efficiently overcome these problems [14,15]. Ultimately, the purpose of this paper is to
comprehensively provide an overview of the state-of-the-art in the application of both DL and DRL
approaches in economics. In particular, we focus on the state-of-the-art papers that employ
DL, RL, and DRL methods in economic issues. The main contributions of this paper can be summarized
as follows:

• Classification of the existing DL, RL, and DRL approaches in economics.


• Providing extensive insights into the accuracy and applicability of DL-, RL-, and DRL-based
economic models.
• Discussing the core technologies and architecture of DRL in economic technologies.
• Proposing a general architecture of DRL in economics.
• Presenting open issues and challenges in current deep reinforcement learning models in economics.

The survey is organized as follows. We briefly review the common DL and DRL techniques in
Section 2. Section 3 proposes the core architecture and applicability of DL and DRL approaches in
economics. Finally, we discuss and present real-world challenges of DRL models in economics in
Section 4, with a conclusion of the work in Section 5.

2. Methodology and Taxonomy of the Survey


The survey adopts the Prisma standard to identify and review the DL and DRL methods used in
economics. As stated in [16], a systematic review based on the Prisma method includes four steps:
(1) identification, (2) screening, (3) eligibility, (4) inclusion. In the identification stage, the documents
are identified through an initial search among the mentioned databases. Through Thomson Reuters
Web-of-Science (WoS) and Elsevier Scopus, 400 of the most relevant articles are identified. The screening
step includes two stages in which, first, duplicate articles are eliminated. As a result, 200 unique articles


moved to the next stage, where the relevance of the articles is examined on the basis of their title and
abstract. The result of this step was 80 articles for further consideration. The next step of the Prisma
model is eligibility, in which the full text of articles was read by the authors, and 57 of them were considered
eligible for final review in this study. The last step of the Prisma model is the creation of the database of
the study, which is used for qualitative and quantitative analyses. The database of the current research
comprises 57 articles, and all the analyses in this study took place based on these articles. Figure 1
illustrates the steps of creating the database of the current research based on the Prisma method.

Figure 1. Diagram of the systematic selection of the survey database.
Taxonomy of the survey is given in Figure 2, where the notable DRL methods are presented.

Figure 2. Taxonomy of the notable DRL applied in economics.

2.1. Deep Learning Methods


In this section, we review the most commonly used DL algorithms, which have been applied in
various fields [16–20]. These deep networks comprise stacked auto-encoders (SAEs), deep belief
networks (DBNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). A
fundamental structure of a neural network is presented in Figure 3.
Figure 3. Structure of a simple neural network.

2.2. Stacked Auto-Encoders (SAEs)

The basic building block of the stacked AE is called an AE, which includes one visible input layer and one hidden layer [17]. It has two steps in the training process. In mathematics, they can be explained as Equations (1) and (2):

h = f(w_h x + b_h)   (1)

Y = f(w_y x + b_y)   (2)
The hidden layer h can be transformed to provide the output value Y. Here, x ∈ R^d represents the input values, and h ∈ R^L denotes the hidden layer, i.e., the encoder. w_h and w_y represent the input-to-hidden and hidden-to-input weights, respectively. b_h and b_y refer to the bias of the hidden and output terms, respectively, and f(.) indicates an activation function. One can estimate the error term utilizing the Euclidean distance for approximating input data x while minimizing ||x − y||_2^2. Figure 4 presents the architecture of an AE.
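To make Equations (1) and (2) concrete, the following is a minimal Python/NumPy sketch of a single auto-encoder forward pass; the layer sizes, the sigmoid activation, and the random data are illustrative assumptions rather than details taken from the surveyed papers.

import numpy as np

def sigmoid(z):
    # activation function f(.) used in Equations (1) and (2)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, L = 8, 3                                # input dimension d and hidden (code) dimension L
x = rng.normal(size=d)                     # input vector x in R^d

W_h = rng.normal(scale=0.1, size=(L, d))   # input-to-hidden weights
b_h = np.zeros(L)                          # hidden bias
W_y = rng.normal(scale=0.1, size=(d, L))   # hidden-to-output weights
b_y = np.zeros(d)                          # output bias

h = sigmoid(W_h @ x + b_h)                 # Equation (1): encoder
y = sigmoid(W_y @ h + b_y)                 # Equation (2): reconstruction
error = np.sum((x - y) ** 2)               # squared Euclidean reconstruction error ||x - y||^2
print(h.shape, y.shape, error)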

2.3. Deep Belief Networks (DBNs)


The basic building block of deep belief networks is known as a restricted Boltzmann machine
(RBM) which is a layer-wise training model [18]. It contains a two-layer network with visible and
hidden units. One can express the joint configuration energy of the units as Equation (3):

E(v, h; θ) = − Σ_{i=1}^{d} b_i v_i − Σ_{j=1}^{L} a_j h_j − Σ_{i=1}^{d} Σ_{j=1}^{L} w_{ij} v_i h_j = −b^T v − a^T h − v^T w h   (3)

where b_i and a_j are the bias terms of the visible and hidden units, respectively. Here, w_ij denotes the
weight between the visible unit i and hidden unit j. In RBM, the hidden units can capture an unbiased
sample from the given data vector, as they are conditionally independent from knowing the visible
states. One can improve the feature representation of a single RBM by cumulating diverse RBMs one after another, which constructs a DBN for detecting a deep hierarchical representation of the training data. Figure 5 presents a simple architecture of an RBM.

Figure 4. The simple architecture of an AE.

Figure 5. A simple architecture of an RBM with hidden layers.
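As an illustration of Equation (3), the short NumPy sketch below evaluates the joint energy of a visible/hidden configuration for a small RBM; the dimensions and the random parameters are assumptions made purely for demonstration.

import numpy as np

rng = np.random.default_rng(1)
d, L = 6, 4                                    # number of visible and hidden units

v = rng.integers(0, 2, size=d).astype(float)   # binary visible vector
h = rng.integers(0, 2, size=L).astype(float)   # binary hidden vector
b = rng.normal(scale=0.1, size=d)              # visible biases b_i
a = rng.normal(scale=0.1, size=L)              # hidden biases a_j
W = rng.normal(scale=0.1, size=(d, L))         # weights w_ij

# Equation (3): E(v, h; theta) = -b^T v - a^T h - v^T W h
energy = -b @ v - a @ h - v @ W @ h
print(energy)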
2.4. Convolutional Neural Networks (CNNs)
CNNs are composed of a stack of periodic convolution layers and pooling layers with multiple
fully connected layers. In the convolutional layer, CNNs employ a set of kernels to convolve the input
data and intermediate features to yield various feature maps. In general, the pooling layer follows a
convolutional layer, which is utilized to diminish the dimensions of feature maps and the network
parameters. Finally, by utilizing the fully connected layers, these obtained maps can be transformed
into feature vectors. We present the formula of the vital parts of the CNNs, which are the convolution
layers. Assume that X is the input cube with the size of m × n × d where m × n refers to the spatial level
of X, and d counts the channels. The j-th filter is specified with weight w_j and bias b_j. Then, one can express the j-th output associated with the convolution layer as Equation (4):

y_j = Σ_{i=1}^{d} f(x_i ∗ w_j + b_j),  j = 1, 2, . . . , k   (4)
Here, the activation function f(.) is used to enhance the network's nonlinearity. Currently,
ReLU [19] is the most popular activation function that leads to notably rapid convergence and
robustness in terms of gradient vanishing [20]. Figure 6 presents the convolution procedure.
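The sketch below is a naive NumPy illustration of the convolutional feature maps in the spirit of Equation (4); here the channel contributions are summed before the ReLU, as is standard in CNN implementations, and the input size, kernel size, and "valid" padding are illustrative assumptions.

import numpy as np

def relu(z):
    # ReLU activation f(.)
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
m, n, d = 6, 6, 3                              # input cube X of size m x n x d
k, s = 2, 3                                    # k filters of spatial size s x s
X = rng.normal(size=(m, n, d))
W = rng.normal(scale=0.1, size=(k, s, s, d))   # filter weights w_j
b = np.zeros(k)                                # filter biases b_j

out = np.zeros((m - s + 1, n - s + 1, k))      # 'valid' convolution output
for j in range(k):                             # j-th feature map y_j
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = X[r:r + s, c:c + s, :]     # local receptive field over all d channels
            out[r, c, j] = relu(np.sum(patch * W[j]) + b[j])
print(out.shape)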

Figure 6. Convolution procedure: (a) kernel; (b) input; (c) output.

2.5. Recurrent Neural Networks (RNNs)

RNNs extended the conventional neural network with loops in connections and were developed in [21]; RNNs can identify patterns in sequential data and dynamic temporal specifications by utilizing recurrent hidden states, compared to the feedforward neural network. Suppose that x is the input vector. The recurrent hidden state h^{<t>} of the RNN can be updated by Equation (5):

h^{<t>} = 0,                          if t = 0
h^{<t>} = f_1(h^{<t−1>}, x^{<t>}),    otherwise   (5)
where f_1 denotes a nonlinear function, such as a hyperbolic tangent function. The update rule of the recurrent hidden state can be expressed as Equation (6):

h^{<t>} = f_1(u h^{<t−1>} + w x^{<t>} + b_h),   (6)

where w and u indicate the coefficient matrices for the input in the current state and the activation of recurrent hidden units at the prior step, respectively. b_h is the bias vector. Therefore, the output Y^{<t>} at time t is presented as Equation (7):

Y^{<t>} = f_2(p h^{<t>} + b_y).   (7)

Here, f_2 is the nonlinear function, p is the coefficient matrix for the activation of recurrent hidden units in the current step, and b_y represents the bias vector. Due to the vanishing gradient of traditional RNNs, long short-term memory (LSTM) [22] and gated recurrent units [23] were introduced to handle huge sequential data. The convolution procedure is presented in [24]. Figure 7 illustrates a schematic for an RNN.
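To illustrate Equations (5)–(7), here is a minimal NumPy sketch of a vanilla RNN unrolled over a short input sequence; the dimensions, the tanh nonlinearity for f_1, and the identity output mapping for f_2 are assumptions made for demonstration only.

import numpy as np

rng = np.random.default_rng(3)
input_dim, hidden_dim, output_dim, T = 4, 5, 2, 6

u = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))   # recurrent weights
w = rng.normal(scale=0.1, size=(hidden_dim, input_dim))    # input weights
p = rng.normal(scale=0.1, size=(output_dim, hidden_dim))   # output weights
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

xs = rng.normal(size=(T, input_dim))     # input sequence x^<1..T>
h = np.zeros(hidden_dim)                 # h^<0> = 0, Equation (5)
outputs = []
for x_t in xs:
    h = np.tanh(u @ h + w @ x_t + b_h)   # Equation (6): recurrent hidden state update
    y_t = p @ h + b_y                    # Equation (7): output at time t (f_2 taken as identity)
    outputs.append(y_t)
print(np.stack(outputs).shape)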
Figure 7. An illustration of RNN.
2.6. Deep Reinforcement Learning Methods

In this section, we mainly focus on the most commonly used deep RL algorithms, such as value-based methods, policy gradient methods, and model-based methods.

We first describe how to formulate the RL problem for an agent dealing with an environment, where the goal is to maximize cumulative rewards. The two important characteristics of RL are as follows: first, the agent has the capability of learning good behavior incrementally and, second, the RL agent enjoys the trial-and-error experience by only dealing with the environment and gathers information (see Figure 6). It is worth mentioning that RL methods are able to practically provide the most appropriate method in terms of computational efficiency as compared to some traditional approaches. One can model the RL problem with a Markov Decision Process (MDP) with a 5-tuple (S, A, T, R, γ), where S is the state space, A the action space, T ∈ [0,1] the transition function, R the reward function, and γ ∈ [0,1) the discount factor. The RL agent aims to search for the optimal expected return based on the value function V^π(s) in Equation (8):

V^π(s) = E[ Σ_{k=0}^{∞} γ^k r_{k+t} | s_t = s, π ],  where V* = max_{π∈Π} V^π(s)   (8)

where:

r_t = E_{a∼π(s_t,·)} R(s_t, a, s_{t+1}),   (9)

P(s_{t+1} | s_t, a_t) = T(s_t, a_t, s_{t+1}),  with a_t ∼ π(s_t, ·)   (10)

Analogously, the Q value function can be expressed as Equation (11):

Q^π(s, a) = E[ Σ_{k=0}^{∞} γ^k r_{k+t} | s_t = s, a_t = a, π ],  where Q* = max_{π∈Π} Q^π(s, a)   (11)

One can see the general architecture of the DRL algorithms in Figure 8, adapted from [25].
Figure 8. Interaction between the agent and environment and the general structure of the DRL approaches.

2.6.1. Value-Based Methods

The value-based algorithms allow us to construct a value function for defining a policy. We discuss the Q-learning algorithm [26] and the deep Q-network (DQN) algorithm [6], which achieved great success in playing ATARI games. We then give a brief review of the improved DQN algorithm.

2.6.2. Q-Learning

The basic value-based algorithm is called the Q-learning algorithm. Assume Q is the value function; then, the optimal value of the Q-learning algorithm using the Bellman equation [27] can be expressed as Equation (12):

Q*(s, a) = (B Q*)(s, a)   (12)

where the Bellman operator (B) can be described as Equation (13):

(B K)(s, a) = Σ_{s'∈S} T(s, a, s') ( R(s, a, s') + γ max_{a'∈A} K(s', a') )   (13)
ś∈S 𝑎́ ∈ 𝒜

Here,
Here, the
the unique
uniqueoptimal
optimalsolution
solutionofofthe
theQQvalue functionisis 𝑄Q∗(s,(s,a).
valuefunction a). One
Onecan
can check
check out
out the
the
theoretical analysis of the optimal Q function in discrete space with sufficient exploration
theoretical analysis of the optimal Q function in discrete space with sufficient exploration guarantee guarantee
in
in [26].
[26]. In
Inpractice,
practice, aaparameterized
parameterized value
valuefunction
function isisable
ableto
toovercome
overcome the thehigh
highdimensional
dimensional problems
problems
(possibly continuous space).
(possibly continuous space).
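The following is a small, self-contained Python sketch of tabular Q-learning on a toy random MDP, illustrating how the Bellman backup of Equations (12) and (13) is approximated from sampled transitions; the environment, learning rate, and episode budget are all illustrative assumptions and not taken from the surveyed works.

import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1

# toy MDP: random transition probabilities T(s, a, s') and rewards R(s, a, s')
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions, n_states))

Q = np.zeros((n_states, n_actions))
for episode in range(2000):
    s = rng.integers(n_states)
    for _ in range(20):
        # epsilon-greedy behavior policy for exploration
        a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s_next = rng.choice(n_states, p=T[s, a])
        r = R[s, a, s_next]
        # sampled Bellman backup: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
print(Q)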

2.6.3. Deep Q-Networks (DQN)

The DQN algorithm was presented by Mnih et al. [6] and can obtain good results for ATARI games in an online framework. In deep Q-learning, we make use of a neural net to estimate a complex, nonlinear Q-value function. Imagine the target function as Equation (14):

Y_k^Q = r + γ max_{a'∈A} Q(s', a'; θ_k^−)   (14)

The θ_k^− parameter, which defines the values of the Q function at the k-th iteration, will be updated only every A ∈ N iterations to keep the stability and diminish the risk of divergence. In order to bound the instabilities, DQN uses two heuristics, i.e., the target Q-network and the replay memory [28]. Additionally, DQN takes advantage of other heuristics such as clipping the rewards for maintaining reasonable target values and for ensuring proper learning. An interesting aspect of DQN is that a variety of deep learning techniques are used to practically improve its performance, such as the preprocessing step of the inputs, convolutional layers, and the optimization (stochastic gradient descent) [29]. We show the general scheme of the DQN algorithm [25] in Figure 9.

Figure 9. Basic structure of the DQN algorithm.


Figure 9. Basic structure of the DQN algorithm.
2.6.4. Double DQN
2.6.4. Double DQN
In Q-learning, the DQN-value function utilizes a similar amount as Q-value in order to identify
andIn Q-learning,
evaluate the DQN-value
an action function
which may cause utilizes a similar
overestimated valuesamount as Q-value
and upward bias inin
theorder to identify
algorithm. Thus,
and evaluate an action which may cause overestimated values and upward bias in the
the double estimator method can be used for each variable to efficiently remove the positive bias in algorithm.
Thus, the double
the action estimator
estimation method
process [30]. can
The be used DQN
Double for each variable to efficiently
is independent remove
of any source the positive
of error, namely,
bias in the environmental
stochastic action estimation process
error. [30].value
The target The Double
functionDQN
in theisdouble
independent of any source
DQN (DDQN) of error,
can be described
namely, stochastic
as Equation (15):environmental error. The target value function in the double DQN (DDQN) can be
described as Equation (15): YDDQN = r + γQ(ś, argmax Q(ś, a; θ ); θ ) (15)
k k k
a∈A
𝑌𝑘𝐷𝐷𝑄𝑁 = 𝑟 + 𝛾𝑄 (𝑠́ , arg max 𝑄 (𝑠́ , 𝑎; 𝜃𝑘 ); 𝜃𝑘̅ )
Compared to the Q-network, DDQN is usually and to obtain a(15)
able to improve stability
𝑎∈𝒜 more
accurate Q-value function as well.
Compared to the Q-network, DDQN is usually able to improve stability and to obtain a more
accurate Q-value function as well.
2.6.5. Distributional DQN
The idea of approaches
2.6.5. Distributional DQN explained in the previous subsections was to estimate the expected
cumulative return. Another interesting method is to represent a value distribution which allows
The idea of approaches explained in the previous subsections was to estimate the expected
us to better detect the inherent stochastic rewards and agent transitions in conjunction with the
cumulative return. Another interesting method is to represent a value distribution which allows us to
environment. One can define the random distribution return function associated with policy π as
better detect the inherent stochastic rewards and agent transitions in conjunction with the
follows in Equation (16):
environment. One can define the random distribution return function associated with policy 𝜋 as
Zπ (s, a) = R(s, a, Ś)+γZπ (Ś, Á) (16)
follows in Equation (16):
Equation (16) includes random𝜋state-action pairś (Ś, Á) and Á ∼ π(.|Ś). Thus, the Q value function
Ζ (s, a) = ℛ(s, a, 𝑆) + 𝛾Ζ 𝜋 (𝑆́, 𝐴́) (16)
can be expressed as Equation (17):
E[Zπ (𝑆
Qπ (s, a) =pairs
Equation (16) includes random state-action (s,́ , a𝐴)]́ ) and 𝐴́ ~ 𝜋(. |𝑆́). Thus, the Q value (17)
function can be expressed as Equation (17):
In practice, the distributional Bellman equation, which interacts with deep learning, can play the
role of the approximation function [31–33].𝑄 𝜋 (s,
The
𝜋
𝔼[Ζbenefit
a) =main (s, a)] of this approach is the implementation (17) of
risk-aware behavior
In practice, [34], and improved
the distributional Bellmanlearning
equation, provides a richer set
which interacts of deep
with training signalscan
learning, [35].
play the
role of the approximation function [31-33]. The main benefit of this approach is the implementation of
2.7. Policy Gradient Methods
risk-aware behavior [34], and improved learning provides a richer set of training signals [35].
This section discusses policy gradient (PG) methods that are frequently used algorithms in
2.7. Policy Gradient
reinforcement Methods
learning [36] which follows a class of policy-based methods. The method is to find a
neural network parameterized policy in order to maximize the expected cumulative reward [37].
This section discusses policy gradient (PG) methods that are frequently used algorithms in
reinforcement learning [36] which follows a class of policy-based methods. The method is to find a
neural network parameterized policy in order to maximize the expected cumulative reward [37].

2.7.1. Stochastic Policy Gradient (SPG)


The easiest approach to obtain the policy gradient estimator could be to utilize the algorithm of [38].
The general approach to derive the estimated gradient is shown in Equation (18):

∇_ω π_ω(s, a) = π_ω(s, a) ∇_ω log(π_ω(s, a))   (18)

while

∇_ω V^{π_ω}(s_0) = E_{s∼ρ^{π_ω}, a∼π_ω}[ ∇_ω(log π_ω(s, a)) Q^{π_ω}(s, a) ]   (19)

Note that, in these methods, the policy evaluation estimates the Q-function, and the policy
improvement optimizes the policy by taking a gradient step, utilizing the value function approximation.
The easy way of estimating the Q-function is to exchange it with a cumulative return from entire
trajectories. A value-based method such as the actor-critic method can be used to estimate the
return efficiently. In general, an entropy function can be used for the policy randomness and efficient
exploration purpose. Additionally, it is common to employ an advantage value function, which measures how each action compares to the expected return of the current policy. In practice, this replacement
improves the numerical efficiency.
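As a concrete illustration of the score-function gradient in Equations (18) and (19), the NumPy sketch below computes a Monte Carlo policy-gradient update for a softmax policy on a one-step, bandit-style problem; the problem setup and the use of the sampled return in place of Q^{π_ω} are simplifying assumptions.

import numpy as np

rng = np.random.default_rng(5)
n_actions = 3
omega = np.zeros(n_actions)                     # policy parameters (one logit per action)
true_rewards = np.array([1.0, 2.0, 0.5])        # hidden mean reward of each action

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

for step in range(2000):
    probs = softmax(omega)                       # pi_omega(a)
    a = rng.choice(n_actions, p=probs)
    reward = true_rewards[a] + rng.normal(scale=0.1)   # sampled return, stands in for Q
    # gradient of log pi_omega(a) for a softmax policy: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    omega += 0.05 * grad_log_pi * reward         # Equation (19): E[grad log pi * Q]
print(softmax(omega))                            # probability mass concentrates on the best action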

2.7.2. Deterministic Policy Gradient (DPG)


The DPG approach is the expected gradient of the action–value function. The deterministic
policy gradient can be approximated without using an integral term over the action space. It can be
demonstrated that the DPG algorithms can perform better than SPG algorithms in high-dimensional
action spaces [39]. NFQ and DQN algorithms can resolve the problematic discrete actions using the
Deep Deterministic Policy Gradient (DDPG) [39] and the Neural Fitted Q Iteration with Continuous
Actions (NFQCA) [40] algorithms, with the direct representation of a policy. An approach proposed
by [41] was developed to overcome the global optimization problem while updating greedy policy
at each step. They defined a differentiable deterministic policy that can be moved to the gradient
direction of the value function for deriving the DDPG algorithm, Equation (20):

∇_ω V^{π_ω}(s_0) = E_{s∼ρ^{π_ω}}[ ∇_ω(π_ω) ∇_a(Q^{π_ω}(s, a))|_{a=π_ω(s)} ]   (20)

which shows that Equation (20) is based on ∇a (Qπω (s, a)).
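To show what the chain rule in Equation (20) looks like in code, here is a toy NumPy sketch of deterministic policy-gradient steps with a linear policy a = π_ω(s) = ωs and a known quadratic critic Q(s, a); both the hand-written critic and the one-dimensional setting are assumptions chosen so the gradients can be computed explicitly.

import numpy as np

rng = np.random.default_rng(6)
omega = 0.0                         # parameter of the deterministic policy a = omega * s
a_star = 1.5                        # action that maximizes the toy critic

def dq_da(a):
    # gradient of the toy critic Q(s, a) = -(a - a_star)^2 with respect to the action
    return -2.0 * (a - a_star)

for step in range(500):
    s = rng.uniform(0.5, 1.5)                    # state sampled from the behavior distribution
    a = omega * s                                # deterministic action a = pi_omega(s)
    # Equation (20): grad_omega V = E[ grad_omega pi_omega(s) * grad_a Q(s, a)|_{a=pi_omega(s)} ]
    grad_omega_pi = s                            # d(omega * s) / d omega
    omega += 0.05 * grad_omega_pi * dq_da(a)
print(omega)                                     # converges toward the omega maximizing the expected critic value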

2.7.3. Actor–Critic Methods


An actor–critic architecture is a common approach where the actor updates the policy distribution
with policy gradients, and the critic estimates the value function for the current policy [42], Equation (21).

∇_ω V^{π_ω}(s_0) = E_{s∼ρ^{π_β}, a∼π_β}[ ∇_θ(log π_ω(s, a)) Q^{π_ω}(s, a) ].   (21)

where β is the behavior policy that makes the gradient biased, and the critic with parameter θ estimates
the value function, Q(s, a; θ), with the current policy π.
In deep reinforcement learning, the actor–critic functions can be parameterized with nonlinear
neural networks [36]. The approach proposed by Sutton [7] was quite simple but not computationally
efficient. The idea is to design an architecture that profits from the reasonably fast reward propagation,
the stability, and the capability using replay memory. However, the new approach utilized in the
actor–critic framework presented by Wang et al. [43] and Gruslys et al. [44] has sample efficiency and
is computationally efficient as well.
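The sketch below is a minimal one-step actor–critic loop in the spirit of Equation (21): a tabular critic Q(s, a; θ) is learned by temporal-difference updates while a softmax actor is updated with the policy gradient weighted by the critic; the toy two-state MDP and the step sizes are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(7)
n_states, n_actions, gamma = 2, 2, 0.9
theta = np.zeros((n_states, n_actions))      # critic parameters Q(s, a; theta)
omega = np.zeros((n_states, n_actions))      # actor parameters (softmax logits per state)

def step(s, a):
    # toy dynamics: action 0 keeps the state, action 1 flips it; being in state 1 pays reward 1
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

s = 0
for t in range(5000):
    probs = softmax(omega[s])
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)
    # critic: one-step TD update of Q(s, a; theta) under the current policy
    a_next = rng.choice(n_actions, p=softmax(omega[s_next]))
    td_target = r + gamma * theta[s_next, a_next]
    theta[s, a] += 0.1 * (td_target - theta[s, a])
    # actor: Equation (21), grad log pi(s, a) weighted by the critic estimate
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    omega[s] += 0.01 * grad_log_pi * theta[s, a]
    s = s_next
print(softmax(omega[0]), softmax(omega[1]))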

2.7.4. Combined Policy Gradient and Q-Learning


In order to improve the policy strategy in RL, an efficient technique needs to be engaged, such as
a policy gradient, applying a sample-efficient approach and value function approximation associated

with the policy. These algorithms enable us to work with continuous action spaces, to construct the
policies for explicit exploration, and to apply the policies to a multiagent setting where the problem
deals with the stochastic optimal policy. However, the idea of combining policy gradient methods
with optimal policy Q-learning was proposed by O'Donoghue et al. [45] while summing the equations
with an entropy function, Equation (22):

∇_ω V^{π_ω}(s_0) = E_{s,a}[ ∇_ω(log π_ω(s, a)) Q^{π_ω}(s, a) ] + α E_s[ ∇_ω H^{π_ω}(s) ]   (22)

where

H^π(s) = Σ_a π(s, a) log π(s, a)   (23)

It was shown that, in some specific settings, both value-based and policy-based approaches have a
fairly similar structure [46–48].
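To make Equations (22) and (23) concrete, the fragment below adds an entropy term to the bandit-style policy-gradient update used earlier; the weight α and the problem setup are illustrative assumptions, and the code uses the standard entropy H(π) = −Σ_a π log π with its gradient written out explicitly for the softmax policy.

import numpy as np

rng = np.random.default_rng(8)
n_actions, alpha_entropy, lr = 3, 0.05, 0.05
omega = np.zeros(n_actions)
true_rewards = np.array([1.0, 1.05, 0.2])        # two nearly equivalent good actions

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

for step in range(3000):
    probs = softmax(omega)
    a = rng.choice(n_actions, p=probs)
    reward = true_rewards[a] + rng.normal(scale=0.1)
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # gradient of the entropy H(pi) = -sum_a pi log pi with respect to the softmax logits
    entropy_grad = -probs * (np.log(probs) + 1.0)
    entropy_grad -= probs * np.sum(entropy_grad)     # chain rule through the softmax Jacobian
    # Equation (22): policy-gradient term plus alpha-weighted entropy term
    omega += lr * (grad_log_pi * reward + alpha_entropy * entropy_grad)
print(softmax(omega))    # stays more stochastic over the two good actions than the unregularized version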

2.8. Model-Based Methods


We have discussed, so far, the value-based and policy-based methods, which belong to the
model-free approach. In this section, we focus on the model-based approach where the model deals
with the dynamics of the environment and the reward function.

2.8.1. Pure Model-Based Methods


When the explicit model is not known, it can be learned from experience by the function
approximators [49–51]. The model plays the actual environment role to recommend an action.
The common approach in the case of discrete actions is look ahead search, and trajectory optimization
can be utilized in a continuous case. A lookahead search is to generate potential trajectories with the
difficulty of exploration and exploitation trade-off in sampling trajectories. The popular approaches
for lookahead search are Monte Carlo tree search (MCTS) methods such that the MCTS algorithm
recommends an action (see Figure 10). Recently, instead of using explicit tree search techniques,
learning an end-to-end model was developed [52] with improved sample efficiency and performance
as well. Lookahead search techniques are useless in a continuous environment. Another approach
is PILCO, i.e., apply Gaussian processes in order to produce a probabilistic model with reasonable
sample efficiency [53]. However, Gaussian processes are reliable only in low-dimensional problems.
One can take the benefit of the generalization capabilities of DL approaches to build the model of the
environment in higher dimensions. For example, a DNN can be utilized in a latent state space [54].
Another approach aims at leveraging the trajectory optimization such as guided policy search [55] by
taking a few sequences of actions and then learns the policy from these sequences. The illustration of
the MCTS process is presented in Figure 10 and was adopted from [25].
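As a simple illustration of model-based action selection by lookahead (the trajectory-sampling idea behind the tree-search and trajectory-optimization approaches described above, though much cruder than MCTS), the sketch below rolls out random action sequences through an assumed toy model and recommends the first action of the best rollout; the model, horizon, and number of sampled trajectories are assumptions.

import numpy as np

rng = np.random.default_rng(9)
n_actions, horizon, n_trajectories, gamma = 2, 5, 64, 0.95
goal = 3

def model(s, a):
    # assumed (learned or known) model of the environment: move left/right on a line
    s_next = s + (1 if a == 1 else -1)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward

def plan(s0):
    best_return, best_first_action = -np.inf, 0
    for _ in range(n_trajectories):               # sample candidate action sequences
        actions = rng.integers(0, n_actions, size=horizon)
        s, ret = s0, 0.0
        for t, a in enumerate(actions):           # roll the model forward
            s, r = model(s, a)
            ret += (gamma ** t) * r
        if ret > best_return:
            best_return, best_first_action = ret, actions[0]
    return best_first_action                      # recommend the first action of the best rollout

print(plan(0))    # from state 0 the planner usually recommends moving toward the goal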

2.8.2. Integrating Model-Free and Model-Based Methods (IMF&MBM)


The choice of model-free versus model-based approaches mainly depends on the model architecture
such as policy and value function. As an example to clearly explain the key point, assume that an
agent needs to pass the street randomly while the best choice is to take the step unless something
unusual happens in front of the agent. In this situation, using the model-based approach may be
problematic due to the randomness of the model, while a model-free approach to find the optimal
policy is highly recommended. There is a possibility of integrating planning and learning to produce a
practical algorithm which is computationally efficient. In the absence of a model where the limited
number of trajectories is known, one approach is to build an algorithm to generalize or to construct a
model to generate more samples for model-free problems [56]. Another choice could be utilizing a
model-based approach to accomplish primary tasks and apply model-free fine-tuning to achieve the goal
successfully [57]. Tree search techniques can also employ a model directly [58]. The notion of a neural
network is able to combine these two approaches. The model proposed by Heess et al. [59] engaged a
backpropagation algorithm to estimate a value function. Another example is the work presented by Schema Networks [60], which uses a prolific structured architecture that leads to robust generalization.

Figure 10. Illustration of the MCTS process.
3. Review Section
This section provides an overview of various interesting uses of both DL and deep RL approaches
in economics.

3.1. Deep Learning Application in Economics


The recent attractive application of deep learning in a variety of economics domains is discussed
in this section.

3.1.1. Deep Learning in Stock Pricing


From an economic point of view, the stock market value and its development are essential to
business growth. In the current economic situation, many investors around the world are interested
in the stock market in order to receive a quicker and better return compared to other sectors.
The presence of uncertainty and risk in the forecasting of stock pricing brings challenges to
researchers designing market models for prediction. Despite all advances in developing mathematical
models for forecasting, they are still not that successful [61]. Deep learning attracts scientists and
practitioners as it is useful for generating high revenue while enhancing the prediction accuracy.
Table 1 presents recent research.

Table 1. Application of deep learning in stock price prediction.

Reference   Methods                                      Application
[62]        Two-streamed gated recurrent unit network    Deep learning framework for stock value prediction
[63]        Filtering methods                            Novel filtering approach
[64]        Pattern techniques                           Pattern matching algorithm for forecasting the stock value
[65]        Multilayer deep approach                     Advanced DL framework for the stock value price

According to Table 1, Minh et al. [62] presented a more realistic framework for forecasting stock
price movement concerning financial news and sentiment dictionary, as previous studies mostly relied
on an inefficient sentiment dataset, which is crucial in stock trends and which led to poor performance. They proposed the Two-stream Gated Recurrent Unit (TGRU) using deep learning techniques that performs better than the LSTM model, as it takes advantage of applying two states that enable the model to provide much better information. They presented a sentiment Stock2Vec embedding with proof of the model robustness in terms of market risk while using Harvard IV-4. Additionally, they provided a simulation system for investors in order to calculate their actual return. Results were evaluated using accuracy, precision, and recall values to compare TGRU and LSTM techniques with GRU. Figure 11 presents the relative percentage of performance factors for TGRU and LSTM in comparison with that for GRU.
Figure 11. Performance factors for comparing TGRU and LSTM with GRU.

As is clear from Figure 11, TGRU presents higher improvement in relative values for performance factors in comparison with LSTM. Also, TGRU provides higher improvement in recall values.
As is clear from Figure 11, TGRU presents higher improvement in relative values for
Song
factors et al [63] presented
in comparison with LSTM.work to apply
Also, TGRUspotlighted deepimprovement
provides higher learning techniques
in recallfor forecasting
values.
performance factors in comparison with LSTM. Also, TGRU provides higher improvement in recall
stockSongtrends. The research developed a deep learning model with a novel input-feature
et al. [63] presented work to apply spotlighted deep learning techniques for forecasting mainly
stock
values.
focused
trends. Theon filtering
researchtechniques
developedinaterms of delivering
deep learning better
model withtraining
a novelaccuracy. Results
input-feature werefocused
mainly evaluatedon
Song et al [63] presented work to apply spotlighted deep learning techniques for forecasting
by accuracy
filtering valuesininterms
techniques a training step. better
of delivering Comparing
trainingprofit values
accuracy. for the
Results weredeveloped
evaluated approaches
by accuracy
stock trends. The research developed a deep learning model with a novel input-feature mainly
indicated
values in athat the novel
training filtering technology
step. Comparing and for
profit values stock
the price modelapproaches
developed by employing 715 features
indicated that the
focused on filtering techniques in terms of delivering better training accuracy. Results were evaluated
provided the highest
novel filtering returnand
technology value by more
stock pricethan
model130$.
by Figure 12 presents
employing a visualized
715 features comparison
provided for
the highest
by accuracy values in a training step. Comparing profit values for the developed approaches
accuracy values. The comparative analysis for the methods developed by Song
return value by more than 130$. Figure 12 presents a visualized comparison for accuracy values.et al [63] is presented
indicated that the novel filtering technology and stock price model by employing 715 features
in
TheFigure 12.
comparative analysis for the methods developed by Song et al. [63] is presented in Figure 12.
provided the highest return value by more than 130$. Figure 12 presents a visualized comparison for
accuracy values. The comparative analysis for the methods developed by Song et al [63] is presented
in Figure 12.

Figure 12.
Figure Comparing accuracy
12. Comparing accuracy values
values for
for the
the methods.
methods.

Based on Figure 12, the highest accuracy is related to a simple feature model without filtering,
Based on Figure 12, the highest accuracy is related to a simple feature model without filtering,
and the lowest accuracyFigure
is related to a novelaccuracy
12. Comparing featurevalues
model forwith
the plunge filtering. By considering
methods.
and the lowest accuracy is related to a novel feature model with plunge filtering. By considering the
trend, it can be concluded that the absence of the filtering process has a considerable effect on
Based on Figure 12, the highest accuracy is related to a simple feature model without filtering,
increasing the accuracy.
and the lowest accuracy is related to a novel feature model with plunge filtering. By considering the
In another study, Go and Hong [64] employed the DL technique to forecast stock value streams
trend, it can be concluded that the absence of the filtering process has a considerable effect on
while analysing the pattern in stock price. The study designed a DNN deep learning algorithm to
increasing the accuracy.
In another study, Go and Hong [64] employed the DL technique to forecast stock value streams

the trend, it can be concluded that the absence of the filtering process has a considerable effect on
increasing the accuracy.
In another
Mathematics 2020, 8, study, Go and
x FOR PEER Hong [64] employed the DL technique to forecast stock value streams
REVIEW 15 of 44
while analysing the pattern in stock price. The study designed a DNN deep learning algorithm to find
the pattern
find utilizing
the pattern the time
utilizing seriesseries
the time technique which
technique had ahad
which higha accuracy performance.
high accuracy Results
performance. were
Results
evaluated
were by theby
evaluated percentage of test sets
the percentage of 20
of test companies.
sets The accuracy
of 20 companies. value for value
The accuracy DNN was calculated
for DNN was
to be 86%. to
calculated However, DNN had DNN
be 86%. However, some disadvantages such as over
had some disadvantages fitting
such andfitting
as over complexity. Therefore,
and complexity.
it was proposed
Therefore, it wasto employ CNN
proposed and RNN.
to employ CNN and RNN.
In the
In the study
study by by Das and Mishra
Mishra [65], a new multilayer
multilayer deep
deep learning
learning approach
approach waswas used
used byby
employing the time series concept for data representation to forecast the close price of
employing the time series concept for data representation to forecast the close price of current stock. current stock.
Results were
Results were evaluated
evaluatedby byprediction
predictionerror and
error andaccuracy
accuracyvalues compared
values comparedto the
to results obtained
the results from
obtained
the related
from studies.
the related Based
studies. on the
Based results,
on the thethe
results, prediction
predictionerror was
error wasvery
verylow
lowbased
basedonon the outcome
outcome
graph, and
graph, and the
the predicted
predicted price
price was
was fairly
fairly close
close to the reliable price in a time series data. Figure
Figure 1313
provides a comparison of the the proposed
proposed method
method with
with the
the similar
similar method
method reported
reported byby different
different studies
studies
in terms
in terms of accuracy. Figure 13 reports the comparative analysis from Das and Mishra [65]. [65].

Figure 13.
Figure The accuracy
13. The accuracy values.
values.

Based on
Based on Figure
Figure 13,
13, the
the proposed
proposed method
method byby employing
employing thethe related
related dataset
dataset could
could considerably
considerably
improve the
improve theaccuracy
accuracy value
value by about
by about 34,and
34, 13, 13,1.5%
andcompared
1.5% compared
with Huynhwithet Huynh et al. [66],
al [66], Mingyue et
Mingyue et al. [67], and Weng et al.
al [67], and Weng et al [68], respectively. [68], respectively.
The used
The used approach
approach inin [62]
[62] has
has the
the strong
strong advantage
advantage ofof utilizing
utilizing forward
forward and
and backward
backward learning
learning
states at
states at the
the same
same time
time to
to present
present more
more useful
useful information. Part of
information. Part of the
the mathematical
mathematical forward
forward pass
pass
formula for the update gate is given in Equation
formula for the update gate is given in Equation (24): (24):

⃗⃗⃗⃗⃗𝓏→𝑥𝑡 + ⃗⃗⃗⃗ → →⃗⃗⃗⃗


𝓏𝑡→
⃗⃗⃗ zt==𝜎(𝑊
σ(W 𝑈→ ℎ⃗⃗⃗⃗⃗⃗⃗⃗
𝑡−1++bz𝑏)𝓏 ) (24)
z xt +U𝓏z ht−1 (24)
and the backward pass formula is shown in Equation (25):
and the backward pass formula is shown in Equation (25):
𝓏𝑡 = 𝜎(𝑊
⃖⃗⃗⃗ 𝑈𝓏 ⃖⃗⃗⃗⃗⃗⃗⃗⃗
⃖⃗⃗⃗⃗⃗𝓏 𝑥𝑡 + ⃖⃗⃗⃗⃗ ℎ←𝑡−1 ←+ ⃖⃗⃗⃗⃗
𝑏𝓏 ) (25)
← ← ←
where 𝑥𝑡 is the input vector, and b zt = σ ( Wz x t + U z ht−1 +
is the bias. 𝜎 denotes bz ) logistic
the (25)
function. ℎ𝑡 is the activation
function and W, U are the weights. In the construction of TGRU, both forward and backward passes
where xt is the input vector, and b is the bias. σ denotes the logistic function. ht is the activation function
linked into a single context for the stock forecasting, which led to dramatically enhanced accuracy of
and W, U are the weights. In the construction of TGRU, both forward and backward passes linked
the prediction by applying more efficient financial indicators regarding financial analysis. In Figure
into a single context for the stock forecasting, which led to dramatically enhanced accuracy of the
13 one can see the whole architecture of the proposed model. The TGRU structure is presented in
prediction by applying more efficient financial indicators regarding financial analysis. In Figure 13 one
Figure 14, adopted from [62].
can see the whole architecture of the proposed model. The TGRU structure is presented in Figure 14,
adopted from [62].
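To illustrate the forward and backward update gates of Equations (24) and (25), the following is a minimal NumPy sketch that computes the two gates over a toy sequence (one left-to-right pass and one right-to-left pass); the dimensions, random parameters, and the simplified state update are illustrative assumptions, and the full TGRU cell of [62] contains additional gates not shown here.

import numpy as np

rng = np.random.default_rng(10)
input_dim, hidden_dim, T = 4, 3, 5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# separate parameters for the forward and backward passes
W_fwd = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
U_fwd = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_bwd = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
U_bwd = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_z = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))          # input sequence x_1..x_T
h_fwd = np.zeros(hidden_dim)
h_bwd = np.zeros(hidden_dim)
for t in range(T):
    # Equation (24): forward update gate, reading the sequence left to right
    z_fwd = sigmoid(W_fwd @ xs[t] + U_fwd @ h_fwd + b_z)
    h_fwd = z_fwd * h_fwd + (1.0 - z_fwd) * np.tanh(W_fwd @ xs[t])           # simplified state update
    # Equation (25): backward update gate, reading the sequence right to left
    z_bwd = sigmoid(W_bwd @ xs[T - 1 - t] + U_bwd @ h_bwd + b_z)
    h_bwd = z_bwd * h_bwd + (1.0 - z_bwd) * np.tanh(W_bwd @ xs[T - 1 - t])   # simplified state update

context = np.concatenate([h_fwd, h_bwd])      # forward and backward states linked into one context
print(context.shape)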
Mathematics 2020, 8, x FOR PEER REVIEW 16 of 44

Figure 14. Illustration


Illustration of the TGRU structure.

3.1.2. Deep
3.1.2. Deep Learning
Learning in
in Insurance
Insurance
Another application
Another application of of DL
DL methods
methods is is the
the insurance
insurance sector. One of
sector. One of the
the challenges
challenges of
of insurance
insurance
companies is to efficiently manage fraud detection (see Table 2). In recent years,
companies is to efficiently manage fraud detection (see Table 2). In recent years, ML techniques have ML techniques
have widely
been been widely
used used to develop
to develop practical
practical algorithms
algorithms in field
in this this field
due due to high
to the the high market
market demand
demand for
for new
new approaches
approaches compared
compared with traditional
with traditional methods methods to practically
to practically measuremeasure
all typesall types(Brockett
of risks of risks
(Brockett
et al. 2002;etPathak
al. 2002;
et Pathak
al. 2005,etDerrig,
al. 2005, Derrig,
2002). For 2002). Forthere
instance, instance, there demands
are many are many for
demands for car
car insurance
insurance that forces companies to find novel strategies in order to meliorate and upgrade
that forces companies to find novel strategies in order to meliorate and upgrade their system. Table their system.
Table 2 summarizes
2 summarizes the notable
the most most notable
studiesstudies
for thefor the application
application of DLof DL techniques
techniques in insurance.
in insurance.

Table 2. Application
Table 2. Application of
of deep
deep learning
learning in
in the
the Insurance industry.
Insurance industry.
Reference Methods Application
References Methods Application
[69] Cycling algorithms Fraud detection in car insurance
[69][70] Cycling algorithms
LDA-based appraoch Fraud detection
Insuranceinfraud
car insurance
[71] Autoencoder technique Evaluation of risk in car insurance
[70] LDA-based appraoch Insurance fraud

[71] Autoencoder technique Evaluation of risk in car insurance


Bodaghi and Teimourpour [69] proposed a new method to detect professional fraud in car insurance for big data by using social network analysis. Their approach employed cycling, which plays crucial roles in network systems, to construct an indirect collisions network and to then identify doubtful cycles in order to make more profit under a more realistic market assumption. Fraud detection may affect pricing strategies and long-term profit while dealing with the insurance industry. Evaluation of the methods for suspicious components was performed by the probability of being fraudulent in the actual data. Fraud probability was calculated for different numbers of nodes in various community IDs and cycle IDs. Based on the results, the highest fraud probability for a community was obtained at node number 10 (3.759), and the lowest fraud probability for a community was obtained at node number 25 (0.638). Also, the highest fraud probability for a cycle was obtained at node number 10 (7.898), which was about 110% higher than that for the community, and the lowest fraud probability for the cycle was obtained at node number 12 (1.638), which was about 156% higher than that for the community.
Recently, a new deep learning model was presented to investigate fraud in car insurance by Wang and Xu [70]. The proposed model outperforms the traditional method where the combination of
latent Dirichlet allocation (LDA) [60] and the DNN technique is utilized to extract the text features of the accidents, comprising traditional features and text features. Another important topic that brings interest to the insurance industry is telematics devices that deal with detecting hidden information inside the data efficiently. Results were evaluated by accuracy and precision performance factors in two scenarios, "with LDA" and "without LDA", to consider the effect of LDA on the prediction process. Figure 14 presents the visualized results. A comparative analysis of SVM, RF, and DNN by Wang and Xu [70] is illustrated in Figure 15.

Figure 15. Comparative analysis of SVM, RF, and DNN.
According to Figure 15, LDA has a positive effect on the accuracy of DNN and RF, successfully increasing the accuracy of DNN and RF by about 7% and 1%, respectively, but it reduces the accuracy of SVM by about 1.2%. On the other hand, LDA could successfully increase the precision of SVM, RF, and DNN by about 10%, 1.2%, and 4%, respectively.
Recent work proposed an algorithm combining an auto-encoder technique with telematics data to forecast the risk associated with insurance customers [71]. To efficiently deal with a large dataset, one requires powerful, updated tools for detecting valuable information, such as telematics devices [71]. The work utilized a conceptual model in conjunction with telematics technology to forecast the risk (see Figure 16), where the risk score (RS) is calculated by Equation (26):
(RS)j = Σi Wci · Oij , where Wci = ( Σi Σj Oij ) / ( Σj Oij ) (26)
where W and O represent the risk weight and driving style, respectively. Their experimental results showed the superiority of their proposed approach (see the figure adopted for the model [71]).
[71]).

Figure 16. The graph of the conceptual model.
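To make the weighting in Equation (26) concrete, the short Python sketch below computes a risk score per driver from a matrix of driving-style observations, following the weight definition as reconstructed above; the matrix O, the event labels, and the variable names are illustrative assumptions rather than the implementation of [71].

import numpy as np

# O[i, j]: count of driving-style event i observed for driver j (illustrative data).
O = np.array([
    [12.0, 3.0, 7.0],    # e.g., harsh braking
    [2.0, 9.0, 1.0],     # e.g., sharp cornering
    [5.0, 4.0, 11.0],    # e.g., speeding
])

# Risk weight per style as in Equation (26): total observations divided by the
# observations of that style, so rarer styles receive larger weights.
W_c = O.sum() / O.sum(axis=1)

# Risk score of driver j: RS_j = sum_i W_ci * O_ij.
RS = W_c @ O
for j, score in enumerate(RS):
    print(f"driver {j}: risk score = {score:.2f}")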



3.1.3. Deep Learning in Auction Mechanisms


Auction design has a major importance in practice, as it allows organizations to present better services to their customers. A great challenge in learning a trustable auction is that its bidders require an optimal strategy for maximizing profit. In this direction, Myerson designed an optimal auction with only a single item [72]. There are many works with results for single bidders, but most often with partial optimality [73–75]. Table 3 presents the notable studies developed by DL techniques in auction mechanisms.
Table 3. Application of deep learning in auction design.

Reference Methods Application
[76] Augmented Lagrangian technique Optimal auction design
[77] Extended RegretNet method Maximized return in auction
[78] Data-driven method Mechanism design in auction
[79] Multi-layer neural network method Auction in mobile networks

Dütting et al. [80] designed a compatible auction with multi-bidders that maximizes the profit by applying multi-layer neural networks for encoding its mechanisms. The proposed method was able to solve much more complex tasks when using the augmented Lagrangian technique than the LP-based approach. Results were evaluated by comparing the total revenue. Despite all previous results, the proposed approach had great capability to be applied to the large setting with high profit and low regret. Figure 17 presents the revenue outcomes for the study by Dütting et al. [80].

Figure 17. The comparison of revenue for the study.

According to Figure 17, RegretNet as the proposed technique increased the revenue by about 2.32% and 2% compared with VVCA and AMAbsym, respectively.
Another study used the deep learning approach to extend the result in [76] in terms of both budget constraints and Bayesian compatibility [77]. The method demonstrated that neural networks are able to efficiently design novel optimal-revenue auctions by focusing on multiple setting problems with different valuation distributions. Additionally, a new method proposed by [78] improved the result in [80] by constructing different mechanisms to apply DL techniques. The approach makes use of a strategy under the assumption that multiple bids can be applied to each bidder. Another attractive approach, applied to mobile blockchain networks, constructed an effective auction using a multi-layer neural network technique [79]. Neural networks trained by formulating parameters maximize the profit of the problem, which considerably outperformed the baseline approach. The main recent work mentioned in Table 3 indicates that this field is growing fast.
The proposed approach in [77] modified the regret definition in order to handle budget
constraints while designing an auction with multiple item settings (see Figure 18). The main expected
regret is shown in Equation (27):
rgti = E ti∼Fi [ max t′i∈Ti χ( pi(t′i) ≤ bi ) ( Ui(ti, t′i) − Ui(ti, ti) ) ] (27)

where χ is the indicator function, U is the interim utility, and p is the interim payment. The method improved the state of the art by utilizing DL concepts to optimally design an auction applied to multiple items with high profit. Figure 18 represents an adaptation of RegretNet [77].
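As a rough illustration of Equation (27), the following Python sketch estimates the expected budgeted regret of one bidder by Monte Carlo sampling: valuations are drawn from an assumed uniform prior Fi, and misreports whose interim payment exceeds the budget are masked out by the indicator. The allocation, payment, and utility functions are toy placeholders, not the trained RegretNet networks of [77].

import numpy as np

rng = np.random.default_rng(0)

def allocation(t_report):
    """Toy allocation rule: winning probability grows with the report."""
    return min(t_report, 1.0)

def payment(t_report):
    """Toy interim payment p_i(t'_i)."""
    return 0.5 * t_report ** 2

def utility(t_true, t_report):
    """Placeholder interim utility U_i(t_i, t'_i)."""
    return t_true * allocation(t_report) - payment(t_report)

def expected_regret(budget, n_samples=2000, n_misreports=50):
    """Monte Carlo estimate of rgt_i in Equation (27) under an assumed U[0, 1] prior."""
    regrets = []
    for _ in range(n_samples):
        t_i = rng.uniform(0.0, 1.0)                          # t_i ~ F_i
        truthful = utility(t_i, t_i)
        best_gain = 0.0
        for t_mis in rng.uniform(0.0, 1.0, n_misreports):    # candidate misreports t'_i
            if payment(t_mis) <= budget:                     # indicator chi(p_i(t'_i) <= b_i)
                best_gain = max(best_gain, utility(t_i, t_mis) - truthful)
        regrets.append(best_gain)
    return float(np.mean(regrets))

print("estimated budgeted regret:", expected_regret(budget=0.3))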

Figure 18. Illustration of a budgeted RegretNet.

3.1.4. Deep Learning in Banking and Online Markets


With current technological improvements, fraud detection is a challenging application of deep learning, namely in online shopping and credit cards. There is a high market demand to construct an efficient system for fraud detection in order to keep the involved systems safe (see Table 4).
Table 4. Application of deep learning in the banking system and online market.
Reference Methods Application
[81] AE Fraud detection in unbalanced datasets
[82] Network topology Credit card transactions
[83] Natural language processing Anti-money laundering detection
[84] AE and RBM architecture Fraud detection in credit cards
of fraud and change in customer’s behavior. An interesting work relied on employing deep learning
approaches are able to accurately detect credit cards using a huge dataset. Although deep learning
methods such as AE and RBM to mimic irregularity from regular patterns by rebuilding regular
transactions in real-time [84]. The applied fundamental experiments to confirm that AE and RBM
approaches are able to accurately detect credit cards using a huge dataset. Although deep learning
approaches enable us to fairly detect the fraud problem in credit cards, model building makes use of
diverse parameters that affect its outcomes. The work of Abhimanyu [82] evaluated commonly used
methods in deep learning to efficiently check out previous fraud detection problems in terms of class
inconsistency and scalability. Comprehensive advice was provided by the authors regarding the analysis of model parameter sensitivity and its tuning while applying neural network architectures to fraud detection problems in credit cards. Another study [81] designed an autoencoder algorithm in order to model fraudulent activities, as efficient automated tools are needed to accurately handle the huge number of daily transactions around the world. The model enables investigators to report on unbalanced datasets without the need for data balancing approaches such as under-sampling. One of the world's largest industries is money laundering, which is the unlawful process of hiding the original source of unlawfully received money by transmitting it through complicated banking transactions. A recent work considering anti-money laundering detection designed a new framework
using natural language processing (NLP) technology [83]. Here, the main reason for constructing a deep
learning framework is decreasing the cost of human capital and time consumption. The distributed
and scalable method (e.g., NLP) performs complex mechanisms associated with various data sources
such as news and tweets in order to make the decision simpler while providing more records.
It is a great challenge to design a practical algorithm to prevent fraudulent transactions in financial
sectors while dealing with a credit card. The work in [81] presented an efficient algorithm that has
the superiority of controlling unbalanced data compared to traditional algorithms. Since anomaly detection can be handled by the reconstruction loss function, we depict their proposed AE algorithm in Algorithm 1 [81].

Algorithm 1. AE pseudo algorithm.

Step 1: Prepare the input data
    Input: matrix X // input dataset
    Parameters (w, bx, bh) // w: weights between layers, bx: encoder's parameters, bh: decoder's parameters

Step 2: Initialize variables
    h ← null // vector for the hidden layer
    X̂ ← null // reconstructed x
    L ← null // vector for the loss function
    l ← batch number
    i ← 0
    θ ← <null matrix> // objective function

Step 3: Loop statement
    While i < l do
        // The encoder function maps an input x to the hidden representation h:
        h = f(p[i]·w + p[i]·bx)
        // The decoder function maps the hidden representation h back to a reconstruction X̂:
        X̂ = g(h·w′ + p[i]·bh)
        // For nonlinear reconstruction, the reconstruction loss is generally the cross-entropy:
        L = −sum(x·log(X̂) + (1 − x)·log(1 − X̂))
        // For linear reconstruction, the reconstruction loss is generally the squared error:
        L = sum((x − X̂)²)
        θ[i] = argmin L(x, X̂)
        i ← i + 1
    End while
    Return θ

Step 4: Output
    // Training the auto-encoder amounts to finding the parameters (w, bx, bh) that minimize
    // the reconstruction loss on the given dataset X and the objective function.
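As a concrete counterpart to Algorithm 1, the following PyTorch sketch trains a small auto-encoder on synthetic, standardized transaction features and flags the records with the largest reconstruction error as candidate fraud cases; the layer sizes, threshold rule, and data are illustrative assumptions and not the exact model of [81].

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for standardized transaction features (rows = transactions).
X = torch.randn(1024, 16)

# Encoder f and decoder g, as in Algorithm 1 (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),   # encoder: h = f(x)
    nn.Linear(8, 16),              # decoder: x_hat = g(h)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()             # squared-error reconstruction loss

# Minimize the reconstruction loss over mini-batches.
for epoch in range(50):
    for batch in X.split(128):
        optimizer.zero_grad()
        loss = loss_fn(model(batch), batch)
        loss.backward()
        optimizer.step()

# Score each transaction by its reconstruction error; the largest errors
# (here, the top 1%) are flagged for manual fraud review.
with torch.no_grad():
    errors = ((model(X) - X) ** 2).mean(dim=1)
threshold = torch.quantile(errors, 0.99)
flagged = (errors > threshold).nonzero().flatten()
print(f"flagged {len(flagged)} suspicious transactions")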

3.1.5. Deep Learning in Macroeconomics


Macroeconomic prediction approaches have gained much interest in recent years, as they are helpful for investigating economic growth and business changes [85]. There are many proposed methods that can forecast macroeconomic indicators, but these approaches require huge amounts of data and suffer from model dependency. Table 5 shows the recent results, which are more acceptable than the previous ones.

Table 5. Application of deep learning in macroeconomics.

Reference Methods Application
[86] Encoder–decoder Indicator prediction
[87] Backpropagation approach Forecasting inflation
[88] Feed-forward neural network Asset allocation

Application of deep learning in macroeconomics has been growing exponentially during the past few years [89–91]. Smalter and Cook [92] presented a new robust model called an encoder–decoder that makes use of a deep neural architecture to increase the accuracy of prediction with a low data demand concerning unemployment problems. The mean absolute error (MAE) was employed for evaluating the results. Figure 19 visualizes the average values of MAE obtained by Smalter and Cook [88]. This visualization compares average MAE values for CNN, LSTM, AE, DARM, and the Survey of Professional Forecasters (SPF).

Figure 19. Average MAE reports.

According to Figure 19, the lowest average MAE is related to AE, followed by SPF, with additional advantages such as good single-series efficiency, higher prediction accuracy, and better model specification.
Haider and Hanif [87] employed an ANN method to forecast macroeconomic indicators for inflation. Results were evaluated by RMSE values comparing the performance of ANN, AR, and ARIMA techniques. Figure 20 presents the average RMSE values for comparison.

Figure 20. Average RMSE reports.

According to Figure 20, the study with model simulation based on a backpropagation approach outperformed previous models such as the autoregressive integrated moving average (ARIMA) model. Another useful application of DL architecture is to deal with investment decisions in conjunction with macroeconomic data. Chakravorty et al. [88] used a feed-forward neural network to perform tactical asset allocation while applying macroeconomic indicators and price-volume trends. They proposed two different methods in order to build a portfolio; the first one estimated expected returns and uncertainty, and the second obtained the allocation directly using a neural network architecture and optimized the portfolio Sharpe ratio. Their methods, with the adopted trading strategy, demonstrated comparable achievement to previous results.

A new technique was used in [86] to enhance the indicator prediction accuracy; the model requires
few data. Experimental results indicate that the encoder–decoder outperformed the highly cited SPF
prediction. The results in Table 6 showed that the encoder–decoder is more responsive than the SPF
prediction or is more adaptable to data changes [86].

Table 6. Inflection point prediction for Unemployment around 2007.

Time Horizon SPF Encoder–Decoder


3-month horizon model Q3 2007 Q1 2007
6-month horizon model Q3 2007 Q2 2007
9-month horizon model Q2 2007 Q3 2007
12-month horizon model Q3 2008 Q1 2008
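For readers less familiar with the encoder–decoder architecture referred to above, the following minimal PyTorch sketch encodes a window of past quarterly observations of an indicator and decodes a multi-quarter forecast; the layer sizes, horizon, and synthetic data are assumptions for illustration only and do not reproduce the model of [86].

import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Minimal sequence-to-sequence forecaster: encode m past quarters of an
    indicator, then decode a k-quarter-ahead path. Illustrative only."""
    def __init__(self, hidden=32, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, past):                      # past: (batch, m, 1)
        _, state = self.encoder(past)
        step = past[:, -1:, :]                    # start decoding from the last observation
        outputs = []
        for _ in range(self.horizon):
            out, state = self.decoder(step, state)
            step = self.head(out)                 # next-quarter prediction, (batch, 1, 1)
            outputs.append(step)
        return torch.cat(outputs, dim=1)          # (batch, horizon, 1)

# Toy usage: 8 past quarters of an unemployment-like series, 4-quarter forecast.
model = EncoderDecoder()
past = torch.randn(16, 8, 1)
forecast = model(past)
mae = (forecast - torch.randn(16, 4, 1)).abs().mean()   # MAE against (random) targets
print(forecast.shape, float(mae))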

3.1.6. Deep Learning in Financial Markets (Service & Risk Management)


In financial markets, it is crucial to efficiently handle the risk arising from credit. Due to recent advances in big data technology, DL methods can be used to design a reliable financial model in order to forecast credit risk in banking systems (see Table 7).

Table 7. Application of deep learning in financial markets (services and risk management).

Reference Methods Application


[89] Binary Classification Technique Loan pricing
[90] Feature selection Credit risk analysis
[91] AE Portfolio management
[92] Likelihood estimation Mortgage risk

Addo et al. [89] employed a binary classification technique to identify essential features of selected
ML and DL models in order to evaluate the stability of the classifiers based on their performance.
The study used the models separately to forecast loan default probability, by considering the selected
features and the selected algorithm, which are the crucial keys in loan pricing processes. In credit
risk management, it is important to distinguish the good and the bad customers, which obliges us to construct very efficient classification models using deep learning tools. Figure 21 presents the visualized results for the study by Addo et al. [89], comparing the performance of LR, RF, Gboosting, and DNN in terms of RMSE and area under the curve (AUC) values.
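The following scikit-learn sketch illustrates the style of comparison reported by Addo et al. [89], scoring several classifiers on default probability by AUC and RMSE; the synthetic data, the chosen models (with an MLP standing in for the DNN), and the hyper-parameters are assumptions, not the original experiment.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for borrower features and a default/no-default label (10% defaults).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBoosting": GradientBoostingClassifier(random_state=0),
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}

# Compare classifiers on the predicted default probability by AUC and RMSE.
for name, model in models.items():
    proba = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, proba)
    rmse = mean_squared_error(y_test, proba) ** 0.5
    print(f"{name}: AUC={auc:.3f} RMSE={rmse:.3f}")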

Figure 21. AUC (a) and RMSE (b) results.

In general, credit data include useless and unneeded features that need to be filtered by the feature selection strategy to provide the classifier with high accuracy. Ha and Nguyen [90] studied a novel feature selection method that helps financial institutions to perform credit assessment with less workload while focusing on important variables and to improve the classification accuracy in terms of credit scoring and customer rating. The accuracy value was employed for comparing the performance of linear SVM, CART, k-NN, Naïve Bayes, MLP, and RF techniques. Figure 22 presents the comparison results related to the study by Ha and Nguyen [90].

Figure 22. The comparison results for the study.

Figure 22 contains the base type of the methods and their integration with GA, PSO, LR, LDA, and t-tests as different scenarios, except RF, which is in its baseline type. In all the cases except linear SVM, integrating with optimizers and filters such as GA, PSO, and LR improved the accuracy compared with their baseline type. However, in SVM, the baseline model provided higher accuracy compared with the other scenarios.
A study applied hierarchical models to present high performance regarding big data [91]. They constructed a deep portfolio with the four processes of auto-encoding, calibrating, validating, and verifying the application of a portfolio including put and call options with underlying stocks. Their methods are able to find the optimal strategy in the theory of deep portfolios by ameliorating the deep feature. Another work developed a deep learning model in mortgage risk applied to huge data sets that discovered the nonlinear relationship between the variables and debtor behavior, which can be affected by the local economic situation [92]. Their research demonstrated that the unemployment variable is substantial in the risk of mortgage, which brings important implications to the relevant practitioners.
is substantial between
in the risk the variables
of mortgage and brings
which debtor behavior which can
the importance of
be affected by the local economic situation
implications to the relevant practitioners. [92]. Their research demonstrated that the unemployment
variable is substantial in the risk of mortgage which brings the importance of implications to the
3.1.7. Deep
relevant Learning in Investment
practitioners.

3.1.7.Financial problems
Deep Learning generally need to be analyzed in terms of datasets from multiple sources.
in Investment
Thus, it is substantial to construct a reliable model for handling unusual interactions and features
from Financial problems
the data for generally
efficient needTable
forecasting. to be8analyzed
comprises inthe
terms of datasets
recent from
results of multiple
using sources.
deep learning
Thus, it is substantial to construct
approaches in financial investment. a reliable model for handling unusual interactions and features from
the data for efficient forecasting. Table 8 comprises the recent results of using deep learning approaches
in financial investment. Table 8. Application of deep learning in stock price prediction.

References Table 8. Application


Methods of deep learning in stock price prediction.
Application

[93] Reference LSTM and AE


Methods Market investment
Application
[94] [93] LSTM and AE
Hyper-parameter Market
Optioninvestment
pricing in finance
[94] Hyper-parameter Option pricing in finance
[95] [95] LSTM
LSTMand SVR
and SVR Quantitative
Quantitative strategy
strategy in investment
in investment
[96] [96] R-NN and genetic method
R-NN and genetic method Smart financial
Smart investment
financial investment

Aggarwal and andAggarwal


Aggarwal [93][93]
designed a deep
designed a learning model applied
deep learning model to an economic
applied to an investment
economic
problem with the capability of extracting nonlinear data patterns. They presented
investment problem with the capability of extracting nonlinear data patterns. They presented a decision model a
using neural
decision modelnetwork
using architecture
neural network sucharchitecture
as LSTM, auto-encoding,
such as LSTM,and smart indexing
auto-encoding, andtosmart
betterindexing
estimate
thebetter
to risk of portfolio
estimate the selection with securities
risk of portfolio selectionforwith thesecurities
investment problem.
for the investmentCulkin and Das
problem. [94]
Culkin
investigated the option pricing problem using DNN architecture to reconstruct
and Das [94] investigated the option pricing problem using DNN architecture to reconstruct the well-the well-known Black
and Scholes
known Blackformula
and Scholeswithformula
considerable accuracy. The
with considerable study tried
accuracy. to revisit
The study triedthe
to previous
revisit theresult [97]
previous
and made the model more accurate with the diverse selection of hyper-parameters.
result [97] and made the model more accurate with the diverse selection of hyper-parameters. The The experimental
result showedresult
experimental that the modelthat
showed can the
price the options
model can pricewiththeminor error.
options Fang
with et al.error.
minor [95] investigated
Fang et al [95]the
option pricingthe
investigated problem
optionin pricing
conjunction with transaction
problem in conjunction complexity where the research
with transaction complexitygoal iswhere
to explore
the
efficient investment
research strategyefficient
goal is to explore in a high-frequency trading manner.
investment strategy The work used
in a high-frequency the delta
trading hedging
manner. The
concept
work to the
used handle
deltathe risk in concept
hedging additiontotohandle
the several
the riskkeyincomponents of the
addition to the pricing
several keysuch as implied
components of
volatility
the pricingand
suchhistorical
as impliedvolatility, etc. and
volatility LSTM-SVR
historicalmodels wereetc.
volatility, applied to forecast
LSTM-SVR modelsthe final
weretransaction
applied to
price with
forecast theexperimental
final transactionresults.
priceResults were evaluated
with experimental by the
results. deviation
Results value andby
were evaluated were
thecompared
deviation
with single
value RF and
and were single LSTM
compared methods.
with single Figure
RF and 23 LSTM
single presents the visualized
methods. Figure 23results to compare
presents results
the visualized
more clearly for the study by Culkin and Das [94].
results to compare results more clearly for the study by Culkin and Das [94].

Figure 23. Deviation results for LSTM, RF, and LSTM-SVR.
Mathematics
Mathematics 2020,
2020, 8,
8, x1640
FOR PEER REVIEW 25
24of
of44
42

Based on the results, the single DL approach outperforms the traditional RF approach with
higherBased on in
return theadopted
results, the single DLstrategy,
investment approach outperforms
but its deviation theistraditional RF approach
considerably higher thanwiththat
higher
for
return
the in adopted
hybrid investment
DL (LSTM-SVR) strategy,
method. A but its learning
novel deviationGenetic
is considerably
Algorithm higher than that
is proposed byfor the hybrid
Serrano [96]
concerning Smartmethod.
DL (LSTM-SVR) Investment thatlearning
A novel uses theGenetic
R-NN Algorithm
model to emulate
is proposed human behavior.
by Serrano [96]The model
concerning
makes use of complicated
Smart Investment that usesdeep learning
the R-NN architecture
model to emulatewhere
humanreinforcement
behavior. Thelearning
modeloccurs
makesforusefast
of
decision-making purposes, deep learning for constructing stock identity, clusters for the overall
complicated deep learning architecture where reinforcement learning occurs for fast decision-making
decision-making purpose, for
purposes, deep learning andconstructing
genetics for the transfer
stock purpose.
identity, Thefor
clusters created algorithm
the overall improved the
decision-making
return
purpose,regarding the market
and genetics for therisk with purpose.
transfer minor error Theperformance.
created algorithm improved the return regarding
The authors
the market in [96]
risk with constructed
minor a complex model while the genetic algorithm using network
error performance.
weights
Thewas usedinto[96]
authors imitate the human
constructed brain where
a complex modelapplying
while thethe following
genetic formula
algorithm usingfor the encoding–
network weights
decoding
was used organism,
to imitate theEquation
human(28):
brain where applying the following formula for the encoding–decoding
organism, Equation (28):
min ‖X − ζ(W2 ζ(X W1))‖  s.t.  W1 ≥ 0 (28)

and

W2 = pinv(ζ(X W1)) X ,  pinv(x) = (x^T x)^(−1) x^T (29)

where 𝑥x is
where is the
the input vector, 𝑊
inputvector, W isisthe
theweight matrix.𝑄1Q
weightmatrix. = 1𝜁(𝑋𝑊 ) represents
= ζ(1XW the neuron vector, and
1 ) represents the neuron vector,
𝑄and
2 = 𝜁(𝑊
Q2 = 𝑄
2 ζ ) represents the cell vector. We illustrate the whole structure of
1 (W2 Q1 ) represents the cell vector. We illustrate the whole structure the model in Figure
of the model 24
in
adopted
Figure 24from [96]. from [96].
adopted

Figure 24. Structure of the smart investment model.


Figure 24. Structure of the smart investment model.
3.1.8. Deep Learning in Retail

New applications of DL, as well as novel DRL, in the retail industry are emerging at a fast pace [98–117]. Augmented reality (AR) enables customers to improve their experience while buying/finding a product in real stores. This algorithm is frequently used by researchers in the field. Table 9 presents the notable studies.
Table 9. Application of deep learning in retail markets.

Reference Methods Application
[98] Augmented reality and image classification Improving shopping in retail markets
[99] DNN methods Sale prediction
[100] CNN Investigation in retail stores
[101] Adaptable CNN Validation in the food industry
Cruz et al. [98] combined the DL technique and augmented reality approaches in order to provide information for clients. They presented a mobile application that is able to locate clients by means of the image classification technique in deep learning. AR approaches were then employed to efficiently advise on finding the selected product, with all helpful information regarding that product, in large stores. Another interesting application of deep learning methods is in fashion retail with large supply and demand, where precise sales prediction is tied to the company's profit. Nogueira et al. [99] designed a novel DNN to accurately forecast future sales, where the model uses a huge and completely different set of variables such as physical specifications of the product and the ideas of subject-matter experts. Their experimental work indicated that the DNN model performance is comparable with other shallow approaches such as RF and SVR. An important issue for retail is to analyze client behavior in order to maximize revenue, which can be performed by computer vision techniques, as in the study by De Sousa Ribeiro et al. [100]. The proposed CNN regression model deals with counting problems, assessing the number of persons present in the stores and detecting crucial spots. The work also designed a foreground/background approach for detecting the behavior of people in retail markets. The results of the proposed method were compared with a common CNN technique in terms of accuracy. Figure 25 presents the visualized results for the study by De Sousa Ribeiro et al. [100].

Figure 25. Comparison of CNN for two separate data sets.


Figure 25. Comparison of CNN for two separate data sets.
According to Figure 25, the proposed model is robust enough and performs better than common CNN methods in both datasets. In the current situation of the food retail industry, it is crucial to distribute the products to the market with proper information and to remove the possibility of mislabeling in terms of public health.
A new adaptable convolutional neural network approach for optical character verification was presented with the aim of automatically identifying the use-by dates in a large dataset [101]. The model applies both a k-means algorithm and the k-nearest neighbor algorithm to incorporate the calculated centroids into the CNN for efficient separation and adaptation. The developed models allow us to better manage the precision and readability of important use-by dates and to better control the traceability information in food producing processes. The most notable research is collected in Table 9 with the application of DL in the retail market. The adaptation method in [117] interconnects the C and Z cluster centroids to construct a new augmented cluster including information from both networks, CNN1 and CNN2, using k-NN classification [101] (see Figure 26). The methodologies lead to considerably improved food safety and correct labelling.

Figure 26. Full structure of the adaptable CNN framework.

3.1.9. Deep Learning in Business (Intelligence)

Nowadays, big data solutions play the key role in business services and productivity to efficiently reinforce the market. To solve the complex business intelligence (BI) problems dealing with market data, DL techniques are useful (see Table 10). Table 10 presents the notable studies for the application of deep learning in business intelligence.
Table 10. Application of deep learning in business intelligence.

Reference Methods Application
[102] MLP BI with client data
[103] MLS and SAE Feature selection in market data
[104] RNN Information detection in business data
[105] RNN Predicting procedure in business
Fombellida et al. [102] engaged the concept of metaplasticity, which has the capability of improving the flexibility of learning mechanisms to detect more useful information and learn from data. The study focused on MLP where the output is in the application of BI while utilizing the client data. The developed work showed that the model reinforces learning over the classical MLPs and other systems in terms of learning evolution and ultimate accomplishment regarding precision and stability. The results of the proposed method were compared with MOE developed by West [106], MCQP developed by Peng et al. [107], and Fombellida et al. [102] in terms of accuracy and sensitivity. Results are presented in Figure 27.

Figure 27. The visualized results considering sensitivity and accuracy.

As is clear from Figure 27, the proposed method provided higher sensitivity compared with the other methods and a lower accuracy value.
Business Intelligence Net, a recurrent neural network for exploiting irregularity in business procedures, was designed by Nolle et al. [104] to mainly manage the data aspect of business procedures, where the Net does not depend on any given information about the procedure or a clean dataset. The study presented a heuristic setting that decreased the manual workload. The proposed approach can be applied to model the time dimension in sequential phenomena, which is useful for anomalies. It is proven by the experimental results that BI Net is a trustable approach with a high capability of anomaly detection in business event logs. An alternative strategy to the ML algorithm is needed to manage the huge amounts of data generated by the enlargement and broader use of digital technology. DL enables us to dramatically handle large datasets with heterogeneous properties. Singh and Verma [103] designed a new multi-layer feature selection approach interacting with a stacked auto-encoder (SAE) to detect crucial representations of data. The novel presented method performed better than commonly used ML algorithms applied to the Farm Ads dataset. Results were evaluated by employing accuracy and area under the curve factors. Figure 28 presents the visualized results according to Singh and Verma [103].

Figure 28. Comparative analysis of RF, linear SVM, and RBF SVM.
According to Figure 28, RBF-based SVM provided higher accuracy and AUC, and the lowest
accuracy and AUC were related to the RF method. The kernel type is the most important factor for the
performance of SVM, and the RBF kernel provides higher performance compared with that for the
linear kernel.
A new approach [105] used recurrent neural network architecture for forecasting the business
procedure manner. Their approach was equipped with an explicit process characteristic, with no need
for a model, where the RNN inputs build by means of an embedded space with demonstrated results
regarding the validation accuracy and the feasibility of this method.
The work in [103] utilized novel multi-layer SAE architecture to handle Farm Ads data setting
where the traditional ML algorithm did not perform well regarding high dimensionality and the huge
sample size characteristics of Farm Ads data. Their feature selection approach has high capability of
dimensionality reduction and enhancing the classifier performance. We depict the complete process of the algorithm in Figure 29.
the algorithm in Figure 29.

Figure 29. Multi-Layer SAE architecture.


Figure 29. Multi-Layer SAE architecture.
Although DL utilizes the representation learning property of DNN and handles complex problems involving nonlinear patterns of economic data, DL methods generally cannot align the learning problem with the goal of the trader and the significant market constraint to detect the optimal strategy associated with economics problems. RL approaches enable us to fill the gap with their powerful mathematical structure.
3.2. Deep Reinforcement Learning Application in Economics

Despite the traditional approaches, DRL has the important capability of capturing substantial market conditions to provide the best strategy in economics, which also provides the potential of scalability and efficient handling of high-dimensional problems. Thus, we are motivated to consider the recent advances of deep RL applications in economics and the financial market.

3.3. Deep Reinforcement Learning in Stock Trading


Financial companies need to detect the optimal strategy while dealing with stock trading in the
dynamic and complicated environment in order to maximize their revenue. Traditional methods
applied to stock market trading are quite difficult for experimentation when the practitioner wants to
consider transaction costs. RL approaches are not efficient enough to find the best strategy due to the
lack of scalability of the models to handle high-dimensional problems [108]. Table 11 presents the most
notable studies developed by deep RLs in the stock market.
Mathematics 2020, 8, 1640 29 of 42

Table 11. Application of deep RLs in the stock market.

Reference Methods Application


[109] DDPG Dynamic stock market
[110] Adaptive DDPG Stock portfolio strategy
[111] DQN methods Efficient market strategy
[112] RCNN Automated trading

Xiong et al. [109] used the deep deterministic policy gradient (DDPG) algorithm as an alternative
to explore the optimal strategy in dynamic stock markets. The algorithm components handle large
action-state space, taking care of the stability, removing sample correlation, and enhancing data
utilization. Results demonstrated that the applied model is robust in terms of equilibrating risk
and performs better as compared to traditional approaches with the guarantee of higher return.
Xinyi et al. [110] designed a new Adaptive Deep Deterministic Reinforcement Learning framework
(Adaptive DDPG) dealing with detecting optimal strategy in dynamic and complicated stock markets.
The model combined an optimistic and pessimistic deep RL that relies on both negative and positive
forecasting errors. Under complicated market situations, the model has the capability to gain better
portfolio profit based on the Dow Jones stocks.
Li et al. [111] surveyed deep RL approaches, analyzing multiple algorithms for the stock decision-making mechanism. Their experimental results, based on three classical models (DQN, Double DQN, and Dueling DQN), showed that, among them, the DQN model attains the better investment strategy for optimizing the return in stock trading, where empirical data were applied to validate the model. Azhikodan et al. [112] focused on automating swing (oscillation) trading in securities using deep RL, employing a recurrent convolutional neural network (RCNN) approach for forecasting stock value from economic news. The original goal of the work was to demonstrate that deep RL techniques are able to efficiently learn stock trading tactics. The recent deep RL results applied to stock markets can be viewed in Table 12. The adaptive DDPG [110] comprises both an actor network $\mu(s|\theta^{\mu})$ and a critic network $Q(s,a|\theta^{Q})$, utilized for the stock portfolio with a different update structure than the standard DDPG approach, given by Equation (30):

$$\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\,\theta^{\mu'} \tag{30}$$

where the new target function is Equation (31):

$$Y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'(s_{i+1}|\theta^{\mu'})\,\middle|\,\theta^{Q'}\right) \tag{31}$$
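To make these update rules concrete, the following minimal Python sketch (our illustration, not code from [109] or [110]) applies the soft target update of Equation (30) to lists of parameter arrays and computes the target value of Equation (31); target_actor, target_critic, tau, and gamma are assumed placeholders for the target networks and hyperparameters.

def soft_update(theta_target, theta_online, tau):
    """Equation (30): theta' <- tau * theta + (1 - tau) * theta', element-wise."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(theta_online, theta_target)]

def ddpg_target(reward, next_state, target_actor, target_critic, gamma):
    """Equation (31): Y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    next_action = target_actor(next_state)          # mu'(s_{i+1} | theta_mu')
    return reward + gamma * target_critic(next_state, next_action)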

Table 12. Application of deep reinforcement learning in portfolio management.

References Methods Application


[109,113] DDPG Algorithmic trading
[114] Model-less CNN Financial portfolio algorithm
[15] Model-free Advanced strategy in portfolio trading
[115] Model-based Dynamic portfolio optimization

Experimental results indicated that the model outperformed the DDPG with high portfolio return as compared to the previous methods. One can check the structure of the model in Figure 30 adopted from [110].

Figure 30. Architecture for the adaptive deep deterministic reinforcement learning framework (Adaptive DDPG): the actor network µ(s), perturbed by exploration noise N(t), feeds actions into the critic network Q(s, a).

3.3.1. Deep Reinforcement Learning in Portfolio Management

The algorithmic trading area is currently using deep RL techniques for portfolio management with a fixed allocation of capital into various financial products. Table 12 presents the notable studies in the application of deep reinforcement learning in portfolio management.
Liang et al. [113] engaged different RL methods, such as DDPG, Proximal Policy Optimization (PPO), and PG, to obtain the strategy associated with a financial portfolio in continuous action-space. They compared the performance of the models in various settings in conjunction with the China asset market and indicated that PG is more favorable in stock trading than the other two. The study also presented a novel adversarial training approach for improving the training efficiency and the mean return, where the new training approach improved the performance of PG as compared to uniform constant rebalanced portfolios (UCRP). Recent work by Jiang et al. [114] employed a deterministic deep reinforcement learning approach based on cryptocurrency to obtain the optimal strategy in financial problem settings. Cryptocurrencies, such as Bitcoin, can be used in place of governmental money. The study designed a model-less convolutional neural network (CNN) whose inputs are historical asset prices from a cryptocurrency exchange, with the aim of producing the set of portfolio weights. The proposed models try to optimize the return using the network reward function in the reinforcement learning manner. Their CNN approach obtained good results as compared to other benchmark algorithms in portfolio management. Another trading algorithm was proposed as a model-free reinforcement learning framework [15], where the reward function was engaged by applying a fully exploiting DPG approach so as to optimize the accumulated return. The model includes an Ensemble of Identical Independent Evaluators (EIIE) topology to incorporate large sets of neural nets in terms of weight-sharing, and a portfolio-vector memory (PVM) in order to prevent the gradient destruction problem. An online stochastic batch learning (OSBL) scheme consecutively analyzes steadily inbound market information, in conjunction with CNN, LSTM, and RNN models. The results indicated that the models, while interacting with benchmark models, performed better than previous portfolio management algorithms on a cryptocurrency market database. A novel model-based deep RL scheme was designed by Yu et al. [115] in the sense of automated trading, to take actions and make decisions sequentially associated with global goals. The model architecture includes an infused prediction module (IPM), a generative adversarial data augmentation module (DAM), and a behavior cloning module (BCM), dealing with designed back-testing. Empirical results, using historical market data, proved that the model is stable and gains more return as compared to baseline approaches and other recent model-free methods. Portfolio optimization is a challenging task while trading stock in the market. The authors in [115] utilized a novel efficient RL architecture associated with a risk-sensitive portfolio, combining IPM to forecast the stock trend with historical asset prices for improving the performance of the RL agent, DAM to control the over-fitting problem, and BCM to handle unexpected movement in portfolio weights and to retain a portfolio with low volatility (see Figure 31). The results demonstrate that the complex model is more robust and profitable as compared to the prior approaches [115].
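As a rough illustration of how the portfolio-weight output described for the EIIE framework [15] is commonly obtained (our sketch, not the authors' code), the final per-asset scores of the policy network can be mapped to non-negative weights that sum to one via a softmax:

import numpy as np

def portfolio_weights(scores):
    """Map raw per-asset network scores (including a cash asset) to
    non-negative portfolio weights that sum to 1 (softmax)."""
    z = np.exp(scores - np.max(scores))   # shift for numerical stability
    return z / z.sum()

# Hypothetical scores for [cash, BTC, ETH, LTC] produced by the policy network
print(portfolio_weights(np.array([0.1, 0.5, -0.2, 0.3])))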

Figure 31. Trading architecture (data fetcher, agent with IPM, DAM, and BCM modules, broker, and trade execution engine interacting with the market).

3.3.2. Deep Reinforcement Learning in Online Services

In the current development of online services, users face the challenge of efficiently detecting the items that interest them, and recommendation techniques enable us to give the right solutions to this problem. Various recommendation methods have been presented, such as content-based collaborative filtering, factorization machines, and multi-armed bandits, to name a few. These proposed approaches are mostly limited to settings where the users and recommender systems interact statically and focus on short-term rewards. Table 13 presents the notable studies in the application of deep reinforcement learning in online services.
Table 13. Application of deep reinforcement learning in online services.

Reference Methods Application
[116] Actor–critic method Recommendation architecture
[117] SS-RTB method Bidding optimization in advertising
[118] DDPG and DQN Pricing algorithm for online market
[119] DQN scheme Online news recommendation
Feng et al. [116] applied a new recommendation approach using an actor–critic model in RL that can explicitly model dynamic interaction and long-term rewards in consecutive decision-making processes. The experimental work, utilizing practical datasets, proved that the presented approach outperforms the prior methods. A key task in online advertising is to optimize the advertisers' gain in the framework of bidding optimization.
Zhao et al. [117] focused on real-time bidding (RTB) applied to sponsored search (SS) auctions in a complicated stochastic environment associated with user actions and bidding policies (see Algorithm 2). Their model, the so-called SS-RTB, engages reinforcement learning concepts to adjust a robust Markov Decision Process (MDP) model in a changing environment and is based on a suitable aggregation level of datasets from the auction market. Empirical results of both online and offline evaluation based on the Alibaba auction platform indicated the usefulness of the proposed method. With the rapid growth of business, online retailers face more complicated operations that heavily require the detection of updated
pricing methodology in order to optimize their profit. Liu et al. [118] proposed a pricing algorithm that models the problem in the MDP framework based on an e-commerce platform. They used deep reinforcement learning approaches to deal effectively with the dynamic market environment and market changes and to set efficient reward functions associated with the complex environment. Online testing for such models is impossible, as legally the same prices have to be presented to the various customers simultaneously. The model engages deep RL techniques to maximize the long-term return while applying both discrete and continuous settings for pricing problems. Empirical experiments showed that the policies obtained from the DDPG and DQN methods were much better than other pricing strategies based on various business databases. Based on the reports of Kompan and Bieliková [120], news aggregator services are favorite online services able to supply a massive volume of content to users. Therefore, it is important to design news recommendation approaches for boosting the user experience. Results were evaluated by precision and recall values in two scenarios: SME.SK and manually annotated. Figure 32 presents the visualized results for the study by Kompan and Bieliková [120].
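For reference, the precision and recall values used in such evaluations follow the standard set-based definitions; the snippet below is a generic illustration with hypothetical item IDs, not code from [120].

def precision_recall(recommended, relevant):
    """Standard precision and recall for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: three recommended articles, two of which the user actually read
print(precision_recall(["a1", "a2", "a3"], ["a2", "a3", "a7", "a9"]))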

Algorithm 2 [117]: DQN Learning

1: for episode = 1 to n do
2:   Initialize replay memory D to capacity N
3:   Initialize the action-value functions (Q_train, Q_ep, Q_target) with weights θ_train, θ_ep, θ_target
4:   for t = 1 to m do
5:     With probability ε select a random action a_t,
6:     otherwise select a_t = argmax_a Q_ep(s_t, a; θ_ep)
7:     Execute action a_t in the auction simulator and observe state s_{t+1} and reward r_t
8:     if the budget of s_{t+1} < 0 then continue
9:     Store transition (s_t, a_t, r_t, s_{t+1}) in D
10:    Sample a random minibatch of transitions (s_j, a_j, r_j, s_{j+1}) from D
11:    if j = m then
12:      Set y_j = r_j
13:    else
14:      Set y_j = r_j + γ max_{a'} Q_target(s_{j+1}, a'; θ_target)
15:    end if
16:    Perform a gradient descent step on the loss (y_j − Q_train(s_j, a_j; θ_train))^2
17:  end for
18:  Update θ_ep with θ_train
19:  Every C steps, update θ_target with θ_train
20: end for
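A minimal Python sketch of the generic DQN steps in Algorithm 2 (our illustration, not the implementation from [117]) is given below; q_ep and q_target stand for the evaluation and target action-value functions, and the budget check of line 8 is omitted.

import numpy as np

def epsilon_greedy(q_ep, state, n_actions, epsilon, rng=np.random.default_rng()):
    """Lines 5-6: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax([q_ep(state, a) for a in range(n_actions)]))

def dqn_target(reward, next_state, terminal, q_target, n_actions, gamma):
    """Lines 11-15: bootstrap from the target network unless the episode ended."""
    if terminal:
        return reward
    return reward + gamma * max(q_target(next_state, a) for a in range(n_actions))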

As is clear from Figure 32, in both scenarios, the proposed method provided the lowest recall and precision compared with TFIDF. Recent research by Zheng et al. [119] designed a new framework using the DQN scheme for online news recommendation with the capability of taking both current and future rewards into account. The study considered user activeness and also used the Dueling Bandit Gradient Descent approach to improve the recommendation accuracy. Extensive empirical results, based on online and offline tests, indicated the superiority of their novel approach.

Figure 32. Comparative analysis of TFIDF for DDPG and DQN.
To deal with the bidding optimization problem is a real-world challenge in online advertising. In contrast to the prior methods, recent work [117] used an approach called SS-RTB to handle sophisticated changing environments associated with bidding policies (see Table 13). More importantly, the model was extended to the multi-agent problem using robust MDP and was then evaluated with the performance metric PUR_AMT/COST, on which it showed considerably better results. PUR_AMT/COST is a metric of optimizing the buying amount with a high correlation to minimizing the cost.

4. Discussion

A comparison of the DL and DRL models applied to the multiple economic domains is given in Table 14 and Figure 33. Different aspects of the proposed models are summarized, such as model complexity, accuracy and speed, source of dataset, internal detection property of the models, and profitability of the model regarding revenue and risk management. Our work indicated that both DL and DRL algorithms are useful for prediction purposes with almost the same quality of prediction in terms of statistical error. Deep learning models such as the AE methods in risk management [91] and the LSTM-SVR approaches [95] in investment problems showed that they enable agents to considerably maximize their revenue while taking care of risk constraints with reasonably high performance, which is quite important to the economic markets. In addition, reinforcement learning is able to simulate more efficient models with more realistic market constraints, while deep RL goes further to solve the scalability problem of RL algorithms, which is crucial for fast user and market growth, and to work efficiently with high-dimensional settings, as is highly desired in the financial market. DRL can give notable help in designing more efficient algorithms for forecasting and analyzing the market with real-world parameters. The deep deterministic policy gradient (DDPG) method used by [109] in stock trading demonstrated how the model is able to handle large settings concerning stability, improving data use, and equilibrating risk while optimizing the return with a high performance guarantee. Another example in the deep RL framework made use of the DQN scheme [119] to ameliorate the news recommendation accuracy while dealing with huge numbers of users at the same time, with a considerably high performance guarantee of the method. Our review demonstrated that there is great potential in improving deep learning methods applied to RL problems to better analyze the relevant problem for finding the best strategy in a wide range of economic domains. Furthermore, RL is a widely growing research area in DL. However, most of the well-known results in RL have been in single-agent environments, where the cumulative reward involves only a single state-action space. It would be interesting to consider multi-agent scenarios in RL that comprise more than one agent, where the cumulative reward can be affected by the actions of other agents. There is some recent research that applied multi-agent reinforcement learning (MARL) scenarios to a small set of agents

but not to a large set of agents [121]. For instance, in the financial markets, where there is a huge number of agents acting at the same time, the actions of particular agents, concerning the optimization problem, might be affected by the decisions of the other agents. MARL can provide an imprecise model, and hence inexact predictions, when the number of agents grows toward infinity. On the other hand, mathematicians use the theory of mean-field games (MFGs) to model a huge variety of non-cooperative agents in a complicated multi-agent dynamic environment; in theory, however, MFG modeling often leads to unsolvable partial differential equations (PDEs) when dealing with large numbers of agents. Therefore, the big challenge in the application of MARL is to efficiently model a huge number of agents and then solve the real-world problem using MFGs, dealing with economics and financial markets. The open question is thus how to apply MFGs to MARL systems that effectively involve infinite agents; MFGs are able to model the problem efficiently but can give unsolvable equations [122,123]. Recent advanced research demonstrated that using MFGs can diminish the complexity of a MARL system dealing with infinite agents. Thus, the question is how to effectively combine MFGs (mathematically) and MARL so as to help solve otherwise unsolvable equations applied to economic and financial problems. For example, the goal of all high-frequency traders is to dramatically optimize their profit; such MARL scenarios can be modeled by MFGs, assuming that all the agents have the same cumulative reward.
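For orientation, a standard textbook form of the MFG system (our notation, not reproduced from [122,123]) couples a backward Hamilton–Jacobi–Bellman equation for the representative agent's value function u with a forward Fokker–Planck equation for the population distribution m, and it is this coupling that quickly becomes analytically intractable:

\begin{aligned}
  -\partial_t u - \nu \Delta u + H(x, \nabla u) &= f(x, m(t)), \\
  \partial_t m - \nu \Delta m - \operatorname{div}\!\big(m\, \partial_p H(x, \nabla u)\big) &= 0,
\end{aligned}
\qquad u(T, x) = g(x, m(T)), \quad m(0) = m_0.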

Table 14. Comparative study of DL and DRL models in economics.

Methods | Dataset Type | Complexity | Detection Efficiency | Accuracy & Processing Rate | Profit & Risk | Application
TGRU | Historical | High | Reasonable | High-Reasonable | Reasonable profit | Stock pricing
DNN | Historical | Reasonably high | Reasonable | High-Reasonable | Reasonable profit | Stock pricing
AE | Historical | High | High | High-Reasonably high | Reasonably low risk | Insurance
RegretNet | Historical | High | Reasonable | High-Reasonable | High profit | Auction design
AE & RBM | Historical | Reasonably high | Reasonable | High-Reasonable | Low risk | Credit card fraud
ENDE | Historical | Reasonable | Reasonable | High-Reasonable | — | Macroeconomic
AE | Historical | High | High | High-Reasonable | High-Low | Risk management
LSTM-SVR | Historical | High | Reasonable | Reasonably high-High | High-Low | Investment
CNNR | Historical | Reasonable | High | Reasonably high-High | Reasonably high profit | Retail market
RNN | Historical | Reasonable | High | Reasonable-Reasonable | — | Business intelligence
DDPG | Historical | High | High | Reasonably high-High | High-Low | Stock trading
IMF&MBM | Historical | High | High | Reasonably high-High | High profit | Portfolio managing
DQN | Historical | High | High | High-High | High-Low | Online services
Figure 33. The ranking score for each method in its own application.

Figure 34 illustrates a comprehensive evaluation of the models' performance in the surveyed manuscripts. LSTM, CNN, DNN, RNN, and GRU generally produced higher RMSE. The identified DRL methods with higher performance can be further used in emerging applications, e.g., pandemic modeling, financial markets, and 5G communication.

Figure 34. Comparative analysis of the models' performance.

5. Conclusions

In the current period of fast economic and market growth, there is a high demand for appropriate mechanisms that considerably enhance productivity and the quality of the product. DL can contribute to effectively forecasting and detecting complex market trends, as compared to the traditional ML algorithms, with the major advantage of a high-level feature extraction property and the proficiency of its problem-solving methods. Furthermore, reinforcement learning enables us to construct more efficient frameworks that integrate the prediction problem with the portfolio structuring task while considering crucial market constraints and achieving better performance; deep reinforcement learning architectures, combining both DL and RL approaches, allow RL to resolve the problem of scalability and to be applied to the high-dimensional problems encountered in real-world market settings. Several DL and deep RL approaches, such as DNN, Autoencoder, RBM, LSTM-SVR, CNN, RNN, DDPG, and DQN, among others, were reviewed in various applications of economic and market domains, where the advanced models improved prediction, extracted better information, and found the optimal strategy, mostly under complicated and dynamic market conditions. This brief survey highlights the basic issue that all the proposed approaches must deal fairly with model complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. Practitioners can employ a variety of both DL and deep RL techniques, with their relevant strengths and weaknesses, to enable the machine to detect the optimal strategy associated with the market in economic problems. Recent works showed that novel DNN techniques, and their recent interaction with reinforcement learning, the so-called deep RL, have the potential to considerably enhance model performance and accuracy while handling real-world economic problems. Our work indicates that the recent approaches in both DL and deep RL perform better than the classical ML approaches. Significant progress can be obtained by designing more efficient novel algorithms that use deep neural architectures in conjunction with reinforcement learning concepts to detect the optimal strategy, namely, to optimize the profit and minimize the loss while considering the risk parameters in a highly competitive market. For future research, a survey of machine learning and deep learning in 5G, cloud computing, cellular networks, and the COVID-19 outbreak is suggested.

Author Contributions: Conceptualization, A.M.; methodology, A.M.; software, Y.F.; validation, A.M., P.G. and
P.D.; formal analysis, S.F.A. and E.S.; investigation, A.M.; resources, S.S.B.; data curation, A.M.; writing—original
draft preparation, A.M.; writing—review and editing, P.G.; visualization, S.F.A.; supervision, P.G. and S.S.B.;
project administration, A.M.; funding acquisition, A.M. All authors have read and agreed to the published version
of the manuscript.
Funding: We acknowledge the financial support of this work by the Hungarian State and the European Union
under the EFOP-3.6.1-16-2016-00010 project and the 2017-1.3.1-VKE-2017-00025 project. The research presented in
this paper was carried out as part of the EFOP-3.6.2-16-2017-00016 project in the framework of the New Szechenyi
Plan. The completion of this project is funded by the European Union and co-financed by the European Social
Fund. We acknowledge the financial support of this work by the Hungarian State and the European Union under
the EFOP-3.6.1-16-2016-00010 project.
Acknowledgments: We acknowledge the financial support of this work by the Hungarian-Mexican bilateral
Scientific and Technological (2019-2.1.11-TÉT-2019-00007) project. The support of the Alexander von Humboldt
Foundation is acknowledged.
Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
AE Autoencoder
ALT Augmented Lagrangian Technique
ARIMA Autoregressive Integrated Moving Average
BCM Behavior Cloning Module
BI Business Intelligence
CNN Convolutional Neural Network
CNNR Convolutional Neural Network Regression
DAM Data Augmentation Module
DBNs Deep Belief Networks
DDPG Deep Deterministic Policy Gradient
DDQN Double Deep Q-network
DDRL Deep Deterministic Reinforcement Learning
DL Deep Learning
DNN Deep Neural Network
DPG Deterministic Policy Gradient
DQN Deep Q-network
DRL Deep Reinforcement Learning
EIIE Ensemble of Identical Independent Evaluators
ENDE Encoder-Decoder
GANs Generative Adversarial Nets
IPM Infused Prediction Module
LDA Latent Dirichlet Allocation
MARL Multi-agent Reinforcement Learning
MCTS Monte-Carlo Tree Search
MDP Markov Decision Process
MFGs Mean Field Games
ML Machine Learning
MLP Multilayer Perceptron
MLS Multi Layer Selection
NFQCA Neural Fitted Q Iteration with Continuous Actions
NLP Natural Language Processing
OSBL Online Stochastic Batch Learning
PCA Principal Component Analysis
PDE Partial Differential Equations
PG Policy Gradient

PPO Proximal Policy Optimization


PVM Portfolio-Vector Memory
RBM Restricted Boltzmann Machine
RCNN Recurrent Convolutional Neural Network
RL Reinforcement Learning
RNN Recurrent Neural Network
R-NN Random Neural Network
RS Risk Score
RTB Real-Time Bidding
SAE Stacked Auto-encoder
SPF Survey of Professional Forecasters
SPG Stochastic Policy Gradient
SS Sponsored Search
SVR Support Vector Regression
TGRU Two-Stream Gated Recurrent Unit
UCRP Uniform Constant Rebalanced Portfolios

References
1. Erhan, D.; Bengio, Y.; Courville, A.; Vincent, P. Visualizing higher-layer features of a deep network.
Univ. Montr. 2009, 1341, 1.
2. Olah, C.; Mordvintsev, A.; Schubert, L. Feature visualization. Distill 2017, 2, e7. [CrossRef]
3. Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings
of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina,
25–31 July 2015.
4. Pacelli, V.; Azzollini, M. An artificial neural network approach for credit risk management. J. Intell. Learn.
Syst. Appl. 2011, 3, 103–112. [CrossRef]
5. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent
neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
6. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.;
Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518,
529–533. [CrossRef] [PubMed]
7. Sutton, R.S. Generalization in reinforcement learning: Successful examples using sparse coarse coding.
In Proceedings of the Advances in Neural Information Processing Systems 9, Los Angeles, CA, USA,
September 1996; pp. 1038–1044.
8. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief
survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [CrossRef]
9. Moody, J.; Wu, L.; Liao, Y.; Saffell, M. Performance functions and reinforcement learning for trading systems
and portfolios. J. Forecast. 1998, 17, 441–470. [CrossRef]
10. Dempster, M.A.; Payne, T.W.; Romahi, Y.; Thompson, G.W. Computational learning techniques for intraday
FX trading using popular technical indicators. IEEE Trans. Neural Netw. 2001, 12, 744–754. [CrossRef]
11. Bekiros, S.D. Heterogeneous trading strategies with adaptive fuzzy actor–critic reinforcement learning:
A behavioral approach. J. Econ. Dyn. Control 2010, 34, 1153–1170. [CrossRef]
12. Kearns, M.; Nevmyvaka, Y. Machine learning for market microstructure and high frequency trading. In High
Frequency Trading: New Realities for Traders, Markets, and Regulators; Easley, D., de Prado, M.L., O’Hara, M.,
Eds.; Risk Books: London, UK, 2013.
13. Britz, D. Introduction to Learning to Trade with Reinforcement Learning. Available online: http://www.wildml.com/2018/02/introduction-to-learning-to-tradewith-reinforcement-learning (accessed on 1 August 2018).
14. Guo, Y.; Fu, X.; Shi, Y.; Liu, M. Robust log-optimal strategy with reinforcement learning. arXiv 2018,
arXiv:1805.00205.
15. Jiang, Z.; Xu, D.; Liang, J. A deep reinforcement learning framework for the financial portfolio management
problem. arXiv 2017, arXiv:1706.10059.
16. Nosratabadi, S.; Mosavi, A.; Duan, P.; Ghamisi, P. Data science in economics. arXiv 2020, arXiv:2003.13422.

17. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J.
Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107. [CrossRef]
18. Chen, Y.; Zhao, X.; Jia, X. Spectral–spatial classification of hyperspectral data based on deep belief network.
IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2381–2392. [CrossRef]
19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks.
In Proceedings of the Advances in Neural Information Processing Systems 25, Los Angeles, CA, USA,
1 June 2012; pp. 1097–1105.
20. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult.
IEEE Trans. Neural Netw. 1994, 5, 157–166. [CrossRef]
21. Williams, R.J.; Zipser, D. A learning algorithm for continually running fully recurrent neural networks.
Neural Comput. 1989, 1, 270–280. [CrossRef]
22. Schmidhuber, J.; Hochreiter, S. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
23. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on
sequence modeling. arXiv 2014, arXiv:1412.3555.
24. Wu, J. Introduction to Convolutional Neural Networks; National Key Lab for Novel Software Technology, Nanjing
University: Nanjing, China, 2017; Volume 5, p. 23.
25. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement
learning. Found. Trends® Mach. Learn. 2018, 11, 219–354. [CrossRef]
26. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [CrossRef]
27. Bellman, R.E.; Dreyfus, S. Applied Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1962.
28. Lin, L.-J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn.
1992, 8, 293–321. [CrossRef]
29. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent
magnitude. COURSERA Neural Netw. Mach. Learn. 2012, 4, 26–31.
30. Hasselt, H.V. Double Q-learning. In Proceedings of the Advances in Neural Information Processing Systems
23, San Diego, CA, USA, July 2010; pp. 2613–2621.
31. Bellemare, M.G.; Dabney, W.; Munos, R. A distributional perspective on reinforcement learning.
In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017;
Volume 70, pp. 449–458.
32. Dabney, W.; Rowland, M.; Bellemare, M.G.; Munos, R. Distributional reinforcement learning with quantile
regression. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans,
LA, USA, 2–7 February 2018.
33. Rowland, M.; Bellemare, M.G.; Dabney, W.; Munos, R.; Teh, Y.W. An analysis of categorical distributional
reinforcement learning. arXiv 2018, arXiv:1802.08163.
34. Morimura, T.; Sugiyama, M.; Kashima, H.; Hachiya, H.; Tanaka, T. Nonparametric return distribution
approximation for reinforcement learning. In Proceedings of the 27th International Conference on Machine
Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 799–806.
35. Jaderberg, M.; Mnih, V.; Czarnecki, W.M.; Schaul, T.; Leibo, J.Z.; Silver, D.; Kavukcuoglu, K. Reinforcement
learning with unsupervised auxiliary tasks. arXiv 2016, arXiv:1611.05397.
36. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous
methods for deep reinforcement learning. In Proceedings of the 12th International Conference, MLDM 2016,
New York, NY, USA, 18–20 July 2016; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9729.
37. Salimans, T.; Ho, J.; Chen, X.; Sidor, S.; Sutskever, I. Evolution strategies as a scalable alternative to
reinforcement learning. arXiv 2017, arXiv:1703.03864.
38. Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning.
Mach. Learn. 1992, 8, 229–256. [CrossRef]
39. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient
algorithms. In Proceedings of the 31st International Conference on International Conference on Machine
Learning, Beijing, China, 21–26 June 2014.
40. Hafner, R.; Riedmiller, M. Reinforcement learning in feedback control. Mach. Learn. 2011, 84, 137–169.
[CrossRef]
41. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control
with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.

42. Konda, V.R.; Tsitsiklis, J.N. Actor-critic algorithms. In Proceedings of the Advances in Neural Information
Processing Systems 12, Chicago, IL, USA, 20 June 2000; pp. 1008–1014.
43. Wang, Z.; Bapst, V.; Heess, N.; Mnih, V.; Munos, R.; Kavukcuoglu, K.; de Freitas, N. Sample efficient
actor-critic with experience replay. arXiv 2016, arXiv:1611.01224.
44. Gruslys, A.; Azar, M.G.; Bellemare, M.G.; Munos, R. The reactor: A sample-efficient actor-critic architecture.
arXiv 2017, arXiv:1704.04651.
45. O’Donoghue, B.; Munos, R.; Kavukcuoglu, K.; Mnih, V. Combining policy gradient and Q-learning. arXiv
2016, arXiv:1611.01626.
46. Fox, R.; Pakman, A.; Tishby, N. Taming the noise in reinforcement learning via soft updates. arXiv 2015,
arXiv:1512.08562.
47. Haarnoja, T.; Tang, H.; Abbeel, P.; Levine, S. Reinforcement learning with deep energy-based policies.
In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017;
Volume 70, pp. 1352–1361.
48. Schulman, J.; Chen, X.; Abbeel, P. Equivalence between policy gradients and soft q-learning. arXiv 2017,
arXiv:1704.06440.
49. Oh, J.; Guo, X.; Lee, H.; Lewis, R.L.; Singh, S. Action-conditional video prediction using deep networks in
atari games. In Proceedings of the Advances in Neural Information Processing Systems 28: 29th Annual
Conference on Neural Information Processing Systems, Seattle, WA, USA, May 2015; pp. 2863–2871.
50. Mathieu, M.; Couprie, C.; LeCun, Y. Deep multi-scale video prediction beyond mean square error. arXiv
2015, arXiv:1511.05440.
51. Finn, C.; Goodfellow, I.; Levine, S. Unsupervised learning for physical interaction through video prediction.
In Proceedings of the Advances in Neural Information Processing Systems Advances in Neural Information,
Processing Systems 29, Barcelona, Spain, 5–10 December 2016; pp. 64–72.
52. Pascanu, R.; Li, Y.; Vinyals, O.; Heess, N.; Buesing, L.; Racanière, S.; Reichert, D.; Weber, T.; Wierstra, D.;
Battaglia, P. Learning model-based planning from scratch. arXiv 2017, arXiv:1707.06170.
53. Deisenroth, M.; Rasmussen, C.E. PILCO: A model-based and data-efficient approach to policy search.
In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA,
28 June–2 July 2011; pp. 465–472.
54. Wahlström, N.; Schön, T.B.; Deisenroth, M.P. From pixels to torques: Policy learning with deep dynamical
models. arXiv 2015, arXiv:1502.02251.
55. Levine, S.; Koltun, V. Guided policy search. In Proceedings of the International Conference on Machine
Learning, Oslo, Norway, February 2013; pp. 1–9.
56. Gu, S.; Lillicrap, T.; Sutskever, I.; Levine, S. Continuous deep q-learning with model-based acceleration.
In Proceedings of the International Conference on Machine Learning(ICML 2013), Atlanta, GA, USA,
June 2013; pp. 2829–2838.
57. Nagabandi, A.; Kahn, G.; Fearing, R.S.; Levine, S. Neural network dynamics for model-based deep
reinforcement learning with model-free fine-tuning. In Proceedings of the 2018 IEEE International Conference
on Robotics and Automation (ICRA), Brisbane, Australia, 21–26 May 2018; pp. 7559–7566.
58. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.;
Panneershelvam, V.; Lanctot, M. Mastering the game of Go with deep neural networks and tree search.
Nature 2016, 529, 484–489. [CrossRef]
59. Heess, N.; Wayne, G.; Silver, D.; Lillicrap, T.; Erez, T.; Tassa, Y. Learning continuous control policies by
stochastic value gradients. In Proceedings of the Advances in Neural Information Processing Systems;
Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada, December 2015;
pp. 2944–2952.
60. Kansky, K.; Silver, T.; Mély, D.A.; Eldawy, M.; Lázaro-Gredilla, M.; Lou, X.; Dorfman, N.; Sidor, S.;
Phoenix, S.; George, D. Schema networks: Zero-shot transfer with a generative causal model of intuitive
physics. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia,
6–11 August 2017; Volume 70, pp. 1809–1818.
61. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using trend
deterministic data preparation and machine learning techniques. Expert Syst. Appl. 2015, 42, 259–268.
[CrossRef]

62. Lien Minh, D.; Sadeghi-Niaraki, A.; Huy, H.D.; Min, K.; Moon, H. Deep learning approach for short-term
stock trends prediction based on two-stream gated recurrent unit network. IEEE Access 2018, 6, 55392–55404.
[CrossRef]
63. Song, Y.; Lee, J.W.; Lee, J. A study on novel filtering and relationship between input-features and target-vectors
in a deep learning model for stock price prediction. Appl. Intell. 2019, 49, 897–911. [CrossRef]
64. Go, Y.H.; Hong, J.K. Prediction of stock value using pattern matching algorithm based on deep learning.
Int. J. Recent Technol. Eng. 2019, 8, 31–35. [CrossRef]
65. Das, S.; Mishra, S. Advanced deep learning framework for stock value prediction. Int. J. Innov. Technol.
Explor. Eng. 2019, 8, 2358–2367. [CrossRef]
66. Huynh, H.D.; Dang, L.M.; Duong, D. A new model for stock price movements prediction using deep neural
network. In Proceedings of the Eighth International Symposium on Information and Communication
Technology, Ann Arbor, MI, USA, June 2016; pp. 57–62.
67. Mingyue, Q.; Cheng, L.; Yu, S. Application of the artifical neural network in predicting the direction of
stock market index. In Proceedings of the 2016 10th International Conference on Complex, Intelligent,
and Software Intensive Systems (CISIS), Fukuoka, Japan, 6–8 July 2016; pp. 219–223.
68. Weng, B.; Martinez, W.; Tsai, Y.T.; Li, C.; Lu, L.; Barth, J.R.; Megahed, F.M. Macroeconomic indicators alone
can predict the monthly closing price of major US indices: Insights from artificial intelligence, time-series
analysis and hybrid models. Appl. Soft Comput. 2017, 71, 685–697. [CrossRef]
69. Bodaghi, A.; Teimourpour, B. The detection of professional fraud in automobile insurance using social
network analysis. arXiv 2018, arXiv:1805.09741.
70. Wang, Y.; Xu, W. Leveraging deep learning with LDA-based text analytics to detect automobile insurance
fraud. Decis. Support Syst. 2018, 105, 87–95. [CrossRef]
71. Siaminamini, M.; Naderpour, M.; Lu, J. Generating a risk profile for car insurance policyholders: A deep
learning conceptual model. In Proceedings of the Australasian Conference on Information Systems, Geelong,
Australia, 3–5 December 2012.
72. Myerson, R.B. Optimal auction design. Math. Oper. Res. 1981, 6, 58–73. [CrossRef]
73. Manelli, A.M.; Vincent, D.R. Bundling as an optimal selling mechanism for a multiple-good monopolist.
J. Econ. Theory 2006, 127, 1–35. [CrossRef]
74. Pavlov, G. Optimal mechanism for selling two goods. BE J. Theor. Econ. 2011, 11, 122–144. [CrossRef]
75. Cai, Y.; Daskalakis, C.; Weinberg, S.M. An algorithmic characterization of multi-dimensional mechanisms.
In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing; Association for
Computing Machinery, New York, NY, USA, 19–22 May 2012; pp. 459–478.
76. Dutting, P.; Zheng, F.; Narasimhan, H.; Parkes, D. Optimal economic design through deep learning.
In Proceedings of the Conference on Neural Information Processing Systems (NIPS), Berlin, Germany,
4–9 December 2017.
77. Feng, Z.; Narasimhan, H.; Parkes, D.C. Deep learning for revenue-optimal auctions with budgets.
In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems,
Stockholm, Sweden, 10–15 July 2018; pp. 354–362.
78. Sakurai, Y.; Oyama, S.; Guo, M.; Yokoo, M. Deep false-name-proof auction mechanisms. In Proceedings
of the International Conference on Principles and Practice of Multi-Agent Systems, Kolkata, India,
12–15 November 2010; pp. 594–601.
79. Luong, N.C.; Xiong, Z.; Wang, P.; Niyato, D. Optimal auction for edge computing resource management
in mobile blockchain networks: A deep learning approach. In Proceedings of the 2018 IEEE International
Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6.
80. Dütting, P.; Feng, Z.; Narasimhan, H.; Parkes, D.C.; Ravindranath, S.S. Optimal auctions through deep
learning. arXiv 2017, arXiv:1706.03459.
81. Al-Shabi, M. Credit card fraud detection using autoencoder model in unbalanced datasets. J. Adv. Math.
Comput. Sci. 2019, 33, 1–16. [CrossRef]
82. Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; Beling, P. Deep learning detecting fraud in credit card
transactions. In Proceedings of the 2018 Systems and Information Engineering Design Symposium (SIEDS),
Charlottesville, VA, USA, 27 April 2018; IEEE: Trondheim, Norway, 2018; pp. 129–134.

83. Han, J.; Barman, U.; Hayes, J.; Du, J.; Burgin, E.; Wan, D. Nextgen aml: Distributed deep learning based
language technologies to augment anti money laundering investigation. In Proceedings of the ACL 2018,
System Demonstrations, Melbourne, Australia, 15–20 July 2018.
84. Pumsirirat, A.; Yan, L. Credit card fraud detection using deep learning based on auto-encoder and restricted
boltzmann machine. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 18–25. [CrossRef]
85. Estrella, A.; Hardouvelis, G.A. The term structure as a predictor of real economic activity. J. Financ. 1991, 46,
555–576. [CrossRef]
86. Smalter Hall, A.; Cook, T.R. Macroeconomic indicator forecasting with deep neural networks. In Federal
Reserve Bank of Kansas City Working Paper; Federal Reserve Bank of Kansas City: Kansas City, MO, USA, 2017;
Volume 7, pp. 83–120. [CrossRef]
87. Haider, A.; Hanif, M.N. Inflation forecasting in Pakistan using artificial neural networks. Pak. Econ. Soc. Rev.
2009, 47, 123–138.
88. Chakravorty, G.; Awasthi, A. Deep learning for global tactical asset allocation. SSRN Electron. J. 2018, 3242432.
[CrossRef]
89. Addo, P.; Guegan, D.; Hassani, B. Credit risk analysis using machine and deep learning models. Risks 2018,
6, 38. [CrossRef]
90. Ha, V.-S.; Nguyen, H.-N. Credit scoring with a feature selection approach based deep learning. In Proceedings
of the MATEC Web of Conferences, Beijing, China, 25–27 May 2018; p. 05004.
91. Heaton, J.; Polson, N.; Witte, J.H. Deep learning for finance: Deep portfolios. Appl. Stoch. Models Bus. Ind.
2017, 33, 3–12. [CrossRef]
92. Sirignano, J.; Sadhwani, A.; Giesecke, K. Deep learning for mortgage risk. arXiv 2016, arXiv:1607.02470.
[CrossRef]
93. Aggarwal, S.; Aggarwal, S. Deep investment in financial markets using deep learning models. Int. J.
Comput. Appl. 2017, 162, 40–43. [CrossRef]
94. Culkin, R.; Das, S.R. Machine learning in finance: The case of deep learning for option pricing. J. Invest. Manag.
2017, 15, 92–100.
95. Fang, Y.; Chen, J.; Xue, Z. Research on quantitative investment strategies based on deep learning. Algorithms
2019, 12, 35. [CrossRef]
96. Serrano, W. The random neural network with a genetic algorithm and deep learning clusters in fintech:
Smart investment. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications
and Innovations, Hersonissos, Crete, Greece, 24–26 May 2019; pp. 297–310.
97. Hutchinson, J.M.; Lo, A.W.; Poggio, T. A nonparametric approach to pricing and hedging derivative securities
via learning networks. J. Financ. 1994, 49, 851–889. [CrossRef]
98. Cruz, E.; Orts-Escolano, S.; Gomez-Donoso, F.; Rizo, C.; Rangel, J.C.; Mora, H.; Cazorla, M. An augmented
reality application for improving shopping experience in large retail stores. Virtual Real. 2019, 23, 281–291.
[CrossRef]
99. Loureiro, A.L.; Miguéis, V.L.; da Silva, L.F. Exploring the use of deep neural networks for sales forecasting in
fashion retail. Decis. Support Syst. 2018, 114, 81–93. [CrossRef]
100. Nogueira, V.; Oliveira, H.; Silva, J.A.; Vieira, T.; Oliveira, K. RetailNet: A deep learning approach for people
counting and hot spots detection in retail stores. In Proceedings of the 2019 32nd SIBGRAPI Conference on
Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil, 28–31 October 2019; pp. 155–162.
101. Ribeiro, F.D.S.; Caliva, F.; Swainson, M.; Gudmundsson, K.; Leontidis, G.; Kollias, S. An adaptable deep
learning system for optical character verification in retail food packaging. In Proceedings of the 2018 IEEE
Conference on Evolving and Adaptive Intelligent Systems (EAIS), Rhodes, Greece, 25–27 May 2018; pp. 1–8.
102. Fombellida, J.; Martín-Rubio, I.; Torres-Alegre, S.; Andina, D. Tackling business intelligence with bioinspired
deep learning. Neural Comput. Appl. 2018, 1–8. [CrossRef]
103. Singh, V.; Verma, N.K. Deep learning architecture for high-level feature generation using stacked auto
encoder for business intelligence. In Complex Systems: Solutions and Challenges in Economics, Management and
Engineering; Springer: Berlin/Heidelberg, Germany, 2018; pp. 269–283.
104. Nolle, T.; Seeliger, A.; Mühlhäuser, M. BINet: Multivariate business process anomaly detection using deep
learning. In Proceedings of the International Conference on Business Process Management, Sydney, Australia,
9–14 September 2018; pp. 271–287.

105. Evermann, J.; Rehse, J.-R.; Fettke, P. Predicting process behaviour using deep learning. Decis. Support Syst.
2017, 100, 129–140. [CrossRef]
106. West, D. Neural network credit scoring models. Comput. Oper. Res. 2000, 27, 1131–1152. [CrossRef]
107. Peng, Y.; Kou, G.; Shi, Y.; Chen, Z. A multi-criteria convex quadratic programming model for credit data
analysis. Decis. Support Syst. 2008, 44, 1016–1030. [CrossRef]
108. Kim, Y.; Ahn, W.; Oh, K.J.; Enke, D. An intelligent hybrid trading system for discovering trading rules for the
futures market using rough sets and genetic algorithms. Appl. Soft Comput. 2017, 55, 127–140. [CrossRef]
109. Xiong, Z.; Liu, X.-Y.; Zhong, S.; Yang, H.; Walid, A. Practical deep reinforcement learning approach for stock
trading. arXiv 2018, arXiv:1811.07522.
110. Li, X.; Li, Y.; Zhan, Y.; Liu, X.-Y. Optimistic bull or pessimistic bear: Adaptive deep reinforcement learning
for stock portfolio allocation. arXiv 2019, arXiv:1907.01503.
111. Li, Y.; Ni, P.; Chang, V. An empirical research on the investment strategy of stock market based on deep
reinforcement learning model. Comput. Sci. Econ. 2019. [CrossRef]
112. Azhikodan, A.R.; Bhat, A.G.; Jadhav, M.V. Stock Trading Bot Using Deep Reinforcement Learning.
In Innovations in Computer Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2019; pp. 41–49.
113. Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; Li, Y. Adversarial deep reinforcement learning in portfolio management.
arXiv 2018, arXiv:1808.09940.
114. Jiang, Z.; Liang, J. Cryptocurrency portfolio management with deep reinforcement learning. In Proceedings
of the 2017 Intelligent Systems Conference (IntelliSys), London, UK, 7–8 September 2017; pp. 905–913.
115. Yu, P.; Lee, J.S.; Kulyatin, I.; Shi, Z.; Dasgupta, S. Model-based Deep Reinforcement Learning for Dynamic
Portfolio Optimization. arXiv 2019, arXiv:1901.08740.
116. Feng, L.; Tang, R.; Li, X.; Zhang, W.; Ye, Y.; Chen, H.; Guo, H.; Zhang, Y. Deep reinforcement learning based
recommendation with explicit user-item interactions modeling. arXiv 2018, arXiv:1810.12027.
117. Zhao, J.; Qiu, G.; Guan, Z.; Zhao, W.; He, X. Deep reinforcement learning for sponsored search real-time
bidding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, London, UK, 19–23 August 2018; pp. 1021–1030.
118. Liu, J.; Zhang, Y.; Wang, X.; Deng, Y.; Wu, X. Dynamic Pricing on E-commerce Platform with Deep
Reinforcement Learning. arXiv 2019, arXiv:1912.02572.
119. Zheng, G.; Zhang, F.; Zheng, Z.; Xiang, Y.; Yuan, N.J.; Xie, X.; Li, Z. DRN: A deep reinforcement learning
framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference, Lyon, France,
23–27 April 2018; pp. 167–176.
120. Kompan, M.; Bieliková, M. Content-based news recommendation. In Proceedings of the International
Conference on Electronic Commerce and Web Technologies, Valencia, Spain, September 2015; pp. 61–72.
121. Jaderberg, M.; Czarnecki, W.M.; Dunning, I.; Marris, L.; Lever, G.; Castaneda, A.G.; Beattie, C.;
Rabinowitz, N.C.; Morcos, A.S.; Ruderman, A. Human-level performance in first-person multiplayer
games with population-based deep reinforcement learning. arXiv 2018, arXiv:1807.01281.
122. Guéant, O.; Lasry, J.-M.; Lions, P.-L. Mean field games and applications. In Paris-Princeton Lectures on
Mathematical Finance 2010; Springer: Berlin/Heidelberg, Germany, 2011; pp. 205–266.
123. Vasiliadis, A. An introduction to mean field games using probabilistic methods. arXiv 2019, arXiv:1907.01411.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
