
Bayesian Compressed Sensing-Based Hybrid Models for Stock Price Forecasting


Somaya Sadik ∗ , Mohamed Et-tolba † , Benayad Nsiri ‡
∗ ‡ Research center STIS, Team M2CS, ENSAM, Mohammed V University, Rabat, Morocco
† Institut National des Postes et Télécommunications, Rabat, Morocco

E-mail: ∗ somaya sadik@um5.ac.ma, † ettolba@inpt.ac.ma, ‡ benayad.nsiri@ensam.um5.ac.ma

Abstract—Nowadays, conventional statistical approaches to stock price forecasting fail to provide accurate predictions because financial data are affected by noise from different sources. To deal with this issue, we propose to apply Bayesian compressed sensing (BCS) for noise removal before performing any prediction. This results in a hybrid forecasting model combining BCS-based denoising and a prediction technique. The BCS approach was chosen instead of traditional compressed sensing (CS) due to its superiority in terms of signal recovery accuracy. In the prediction step, we consider three models, namely the autoregressive integrated moving average (ARIMA), long short-term memory (LSTM), and forward neural networks (FNN). The Standard & Poor 500 index (SP500), the Hang Seng index (HSI), and the Euro Stock 50 index (EU50) series are used as sample data for validation. In terms of accuracy, numerical results show that the proposed BCS-based hybrid models provide better performance compared to their single counterparts.

Index Terms—Bayesian compressed sensing, Stock price forecasting, Neural networks, LSTM, FNN, ARIMA, Denoising.

I. INTRODUCTION

Developing accurate frameworks for stock price forecasting has long been regarded as a challenging subject of study. Although the efficient market hypothesis argues that the prediction of future price values is impossible, many existing propositions in the literature prove that this is feasible by employing properly designed predictive models [1]. Traditional predictive models, such as the autoregressive integrated moving average (ARIMA) [2], and machine learning (ML) techniques, including long short-term memory (LSTM) and forward neural networks (FNN) [3], [4], have already been proposed for stock price forecasting. The ARIMA approach assumes that the original price is linear, which can lead to unsatisfactory results when the stock price is nonstationary [5]. As for artificial intelligence (AI) tools, they are capable of modeling the hidden nonlinear patterns of the data without any statistical assumptions on them.

It has also been determined that the accuracy of a stock price forecasting model depends on the variables taken into consideration in the model, the used algorithms, and the adjustments made for optimizing the model. Hybrid models are often the result of tuning the predictive models by either adding a pre-processing step, reconfiguring some parameters, or extending the model by an enhancing procedure [6].

Stock price time series are known to be noisy due to diverse factors such as fast transmission, storage errors, and a high number of transactions in a small period. The noise effect renders the forecasting models unstable and prone to overfitting [7]. Consequently, one has to integrate a denoising (pre-processing) method in the forecasting models for more accurate predictions.

Data denoising has been carried out in many studies as a pre-processing method for stock price forecasting. In [8], a neural network model aided by an exponential smoothing denoising method was presented for stock market forecasting. In [9], the authors proposed an AI learning scheme for crude oil price forecasting using compressed sensing (CS) for noise removal. A bagging method based on the empirical mode decomposition and the Holt-Winters seasonal method was proposed for predicting stock market time series [10]. Additionally, in [11], Tang et al. combined different denoising methods with LSTM to build a data prediction model. In [12], the authors suggested combining convolutional neural networks with CS for data denoising and recovery. These empirical studies have thoroughly demonstrated the success of denoising-based hybrid models in comparison to their single counterparts. Particularly, it has been proven that CS-based data pre-processing strengthens the forecasting performance [9], [12].

In this paper, we propose a hybrid model for financial time series forecasting using Bayesian compressed sensing (BCS) as a denoiser. The idea is to exploit the power of BCS over conventional CS [13] in terms of signal reconstruction efficiency and noise removal [14]. In the second phase of the proposed approach, the noise-free time series are modeled, and their future values are predicted using a traditional model or an ML model. In this work, we consider the traditional model ARIMA and the two ML models LSTM and FNN.

The remainder of this paper is organized as follows. In Section II, we present the proposed BCS-based denoising approach. In Section III, we study the forecasting models for stock price prediction. In Section IV, we give the numerical results and their analysis. Finally, the paper is concluded in Section V.

II. FINANCIAL DATA DENOISING WITH BAYESIAN COMPRESSED SENSING

Let p = (p_0, ..., p_t, ..., p_n)^T represent an (n + 1)-dimensional stock price time series. For stationarity reasons, we consider the log-returns of the stock price, represented by the n-dimensional vector r = (r_1, ..., r_t, ..., r_n)^T whose elements are given by,

r_t = \log(p_t) - \log(p_{t-1})    (1)

In the rest of this paper, the stock price is represented by the vector r. Since the stock price time series is noisy, we assume that the return vector r is composed of a vector r* representing the trend, and a vector n representing the noise [15]. Then, its t-th element is expressed as,

r_t = r_t^* + n_t    (2)

A. CS-based denoising formulation

Generally, CS needs the input signal to be sparse to guarantee its recovery from a small number of noisy measurements [13]. Luckily, financial signals are sparse or admit a sparse representation in a transform domain such as the Fourier transform (FT), the wavelet transform (WT), or the discrete cosine transform (DCT) [16].

Let us denote by Φ the transform matrix (or the sparsifying matrix), whose size is n × n. Then, the sparse representation of the trend r* in the transform domain Φ is written as,

r^* = \Phi s    (3)

where s is said to be k-sparse, i.e., it has k non-zero elements while the others are equal to zero or negligible.

After the sparse representation, the original stock price time series r is acquired in an m-dimensional vector y using an m × n sensing matrix Ψ, where m ≪ n. This is expressed as,

y = \Psi r = \Psi(r^* + n) = \Psi\Phi s + n^* = \Theta s + n^*    (4)

where n* = Ψn and Θ = ΨΦ. Note that Ψ is a Gaussian matrix whose elements are independent and identically distributed (i.i.d.). This ensures that Ψ is incoherent with any sparsifying matrix Φ [17].

From (4), the reconstruction problem is reduced to estimating the sparse vector s from the measurement vector y. Accordingly, the reconstruction problem of the original time series r from its compressed version y can be expressed as an l0-norm problem,

\hat{s} = \arg\min_{s} \|s\|_0, \quad \text{s.t.} \ \|\Theta s - y\|_2 \leq \epsilon    (5)

where ϵ is the bound on the noise level (‖n*‖_2 ≤ ϵ). However, (5) is an NP-hard problem, which has an infinite number of solutions. It is approximated by a convex l1-norm minimization problem as follows,

\hat{s} = \arg\min_{s} \|s\|_1, \quad \text{s.t.} \ \|\Theta s - y\|_2 \leq \epsilon    (6)

Solving the optimization problem (6), which is equivalent to noise removal, can be done by different approaches [18], such as the greedy algorithm orthogonal matching pursuit (OMP) [9]. This provides an estimate of the sparse vector s.

B. Bayesian compressed sensing

From a Bayesian perspective, the CS optimization problem aims to derive a posterior probability distribution for s given the assumption that it is sparse and that the set of measurements y is known [19].

Assume that the elements of the noise n* are Gaussian distributed random variables with zero mean and unknown variance σ². The obtained Gaussian likelihood model for s and σ² based on y is expressed as,

p(y \mid s; \sigma^2) = (2\pi\sigma^2)^{-m/2} \exp\left(-\frac{1}{2\sigma^2}\|\Theta s - y\|_2^2\right)    (7)

Given the measurements y and that Θ is known, the problem in (7) corresponds to finding the posterior densities for s and σ². Therefore, if we use the Gaussian likelihood model (7), we can get a solution for the minimization problem in (6) by computing the maximum a posteriori (MAP) estimate of s.

To ensure that s is sparse in the Bayesian framework, a sparsity-inducing prior is chosen to model its distribution [14]. A hierarchical prior is used for both s and σ² to estimate their posterior distribution since it is conjugate to the Gaussian likelihood model. Let s_t be the t-th element of s, which follows a Gaussian distribution with zero mean and variance α_t^{-1}. The prior of the sparse vector s is expressed as,

p(s \mid \alpha) = \prod_{t=1}^{n} \mathcal{N}(s_t \mid 0, \alpha_t^{-1})    (8)

where α = (α_1, ..., α_t, ..., α_n)^T is the hyperparameter vector corresponding to s. If we replace the variance σ² with β, the Gamma priors placed on the hyperparameters α and β are given by,

p(\alpha \mid a, b) = \prod_{t=1}^{n} \Gamma(\alpha_t \mid a, b)    (9)

p(\beta \mid c, d) = \Gamma(\beta \mid c, d)    (10)

Under Bayes' rule, the posterior density of s is given by,

p(s \mid y, \alpha, \beta) = \frac{p(y \mid s, \beta)\, p(s \mid \alpha)}{p(y \mid \alpha, \beta)}    (11)

The marginal likelihood p(y | α, β) can be written as,

p(y \mid \alpha, \beta) = \int p(y \mid s, \beta)\, p(s \mid \alpha)\, ds = (2\pi)^{-m/2}\, |C|^{-1/2} \exp\left(-\frac{1}{2} y^T C^{-1} y\right)    (12)

where C = βI + ΘG⁻¹Θᵀ and G = diag(α_1, α_2, ..., α_n). By substituting (7), (8), and (12) in (11), we obtain the Gaussian posterior for the sparse vector s,

p(s \mid y, \alpha, \beta) \sim \mathcal{N}(\mu, \Sigma)    (13)

where the mean µ and covariance Σ are represented by,

\mu = \beta \Sigma \Theta^T y    (14)

\Sigma = \left(\beta \Theta^T \Theta + G\right)^{-1}    (15)

To find the parameters µ and Σ, we need an estimate of the hyperparameters α and β. This is done by evaluating the maximum likelihood (ML) point estimate (α̂, β̂), which is written as,

(\hat{\alpha}, \hat{\beta}) = \arg\max L(\alpha, \beta) = \arg\max \log p(y \mid \alpha, \beta)    (16)

By differentiating (16) with respect to the hyperparameters α and β, their updated values are given by,

\alpha_t^{new} = \frac{\gamma_t}{\mu_t^2}    (17)

\beta^{new} = \frac{m - \sum_{t=1}^{n} \gamma_t}{\|y - \Theta\mu\|_2^2}    (18)

where γ_t = 1 − α_t Σ_{t,t}, Σ_{t,t} is the t-th diagonal element of Σ, and µ_t is the t-th posterior mean weight of µ.

Consequently, the values of α and β maximizing the marginal likelihood (16) are found iteratively using the relevance vector machine (RVM) algorithm [20]. The hyperparameters α and β are first set to some initial values, and the parameters µ and Σ are then computed according to (14) and (15). Then, new values for α and β are generated based on (17) and (18). The algorithm iterates until the convergence requirement is met. In this paper, a threshold is set for the signal variance so that the noise can be reduced.

Once we get the final estimates of α and β, we can generate the posterior density function of s, which has a general Gaussian form. Its maximization yields the MAP estimate of s, which is equivalent to the mean of the posterior distribution (ŝ = µ). The resulting vector ŝ is regarded as a solution to the minimization problem in (6). Then, we can reconstruct the trend r* knowing the sparsifying matrix Φ. Thus, we get an approximate estimation r* of the original stock price returns r along with a reduction in the noise level.
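To make the denoising pipeline of this section concrete, the following Python sketch chains the sparse representation (3), the Gaussian acquisition (4), and the RVM-style posterior updates (14)-(18). It is a minimal illustration, not the authors' Matlab implementation: the function name, the 50% sampling ratio, the fixed iteration cap, and the small numerical guards are assumptions, and the basis-pruning step of the full RVM algorithm [20] is omitted.

```python
import numpy as np
from scipy.fftpack import dct

def bcs_denoise(r, m_ratio=0.5, n_iter=100, tol=1e-6):
    """Sketch of the BCS denoising step: DCT sparsification (3), Gaussian
    random acquisition (4), and simplified RVM updates (14)-(18)."""
    n = len(r)
    m = int(m_ratio * n)                        # m << n measurements
    D = dct(np.eye(n), axis=0, norm='ortho')    # forward DCT matrix: s = D @ r_trend
    Phi = D.T                                   # sparsifying (synthesis) matrix: r_trend = Phi @ s
    Psi = np.random.randn(m, n)                 # i.i.d. Gaussian sensing matrix
    Theta = Psi @ Phi
    y = Psi @ r                                 # compressed measurements of the noisy returns

    alpha = np.ones(n)                          # precisions of the sparse coefficients
    beta = 1.0 / np.var(y)                      # noise hyperparameter
    mu = np.zeros(n)
    for _ in range(n_iter):
        # Posterior moments, eqs. (14)-(15)
        Sigma = np.linalg.inv(beta * Theta.T @ Theta + np.diag(alpha))
        mu_new = beta * Sigma @ Theta.T @ y
        # Hyperparameter re-estimation, eqs. (17)-(18)
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / (mu_new ** 2 + 1e-12)
        beta = (m - gamma.sum()) / (np.linalg.norm(y - Theta @ mu_new) ** 2 + 1e-12)
        if np.max(np.abs(mu_new - mu)) < tol:
            mu = mu_new
            break
        mu = mu_new
    s_hat = mu                                  # MAP estimate of the sparse vector
    return Phi @ s_hat                          # denoised trend, r* = Phi @ s_hat
```

Under these assumptions, the denoised trend of a price series p would be obtained as bcs_denoise(np.diff(np.log(p))), following the log-return definition (1).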
III. STOCK PRICE FORECASTING MODELS

The proposed BCS-based denoising scheme provides a dataset of the stock price returns with minimum noise. Based on the cleaned data, different forecasting models are used to predict the future values of the stock price time series. The chosen predictive models, namely ARIMA, LSTM, and FNN, are presented hereafter.

A. ARIMA

This is a well-known linear forecasting model that combines an autoregressive (AR) process and a moving average (MA) process to provide a mixed time series model [1]. We assume that the stock price time series s to be estimated is stationary, and ϵ is Gaussian white noise with zero mean and variance σ². Under the ARIMA model, the t-th element of s is estimated by,

s_t = c + \sum_{i=1}^{p} \phi_i s_{t-i} + \epsilon_t + \sum_{i=0}^{q} \theta_i \epsilon_{t-i}    (19)

where c is a constant, s_{t−i} corresponds to the past observation of dataset s at time (t − i), and ϵ_t is the t-th element of the noise vector ϵ. The operator ϕ_i represents the autocorrelation coefficients at lags 1, 2, ..., p, and θ_i are the weights of the stochastic terms in the time series. p and q are called the AR and MA orders, respectively. Identification of the values of p and q will be done by cross-validation.

B. LSTM

LSTM is a subclass of recurrent neural networks (RNNs) that can learn long-term connections by relying on feedback loops [21]. Memory blocks are used in LSTM to replace conventional neurons in the hidden layer. Each LSTM unit is made up of a memory cell as well as three major gates: input, output, and forget. The new information that is recorded into the memory state c_t at time t is controlled by the input gate i_t and a second gate c*_t. The forget gate f_t regulates the previous information that must be erased or preserved on the memory cell at time t − 1, whereas the output gate o_t controls which information may be used for the memory cell's output. The operations performed by an LSTM unit are described by,

i_t = \sigma(U_i s_t + W_i h_{t-1} + b_i)
f_t = \sigma(U_g s_t + W_g h_{t-1} + b_g)
c^*_t = \tanh(U_c s_t + W_c h_{t-1} + b_c)    (20)
c_t = f_t \odot c_{t-1} + i_t \odot c^*_t
o_t = \sigma(U_o s_t + W_o h_{t-1} + b_o)

where s_t is the input, b denotes the bias, W and U are weight matrices, σ is the sigmoid activation function, and ⊙ is the point-wise multiplication. The hidden state h_t, representing the output of the memory cell, is given by,

h_t = o_t \odot \tanh(c_t)    (21)

C. FNN

This model is a standard feed-forward neural network (FNN) with one hidden layer that uses the Levenberg-Marquardt (LM) learning algorithm [22]. Basically, the in-sample dataset is used to train an FNN-based forecasting model to make predictions on the out-of-sample dataset. The minimization of the forecasting error function serves as a validation variable for iteratively updating the model parameters, including connection weights and node biases. The topology of the network consists of five input neurons, a hidden layer with q = 10 neurons, and one output node. Besides, the tan-sigmoid function is applied as the activation function in the hidden layer, while a linear function is used at the output layer. The final output of the FNN-based forecasting model can be formulated as,

s_{t+l} = \sum_{j=1}^{10} w_{jl}\, f\!\left(\sum_{i=1}^{5} w_{ij} s_{t-i} + b_j\right) + b_l    (22)

where s_{t+l} is the predicted future stock price return, l is the horizon, s_{t−i} represents the historical observations, w_{jl} are the connection weights from a hidden layer's node to the output layer, w_{ij} are the weights connecting the input neurons to the hidden neurons, b_j and b_l represent the biases, and f is the activation function.
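As a rough illustration of the two neural predictors, the sketch below implements one LSTM cell step following (20)-(21) and the 5-10-1 FNN forward pass of (22) in NumPy. The function names and the parameter container are hypothetical, and training (backpropagation through time for the LSTM, Levenberg-Marquardt for the FNN) is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(s_t, h_prev, c_prev, P):
    """One LSTM step per eqs. (20)-(21). P is a dict of weight matrices U_*, W_*
    and biases b_* (the forget gate uses the U_g, W_g, b_g notation of the text)."""
    i_t = sigmoid(P['Ui'] @ s_t + P['Wi'] @ h_prev + P['bi'])      # input gate
    f_t = sigmoid(P['Ug'] @ s_t + P['Wg'] @ h_prev + P['bg'])      # forget gate
    c_star = np.tanh(P['Uc'] @ s_t + P['Wc'] @ h_prev + P['bc'])   # candidate state
    c_t = f_t * c_prev + i_t * c_star                              # memory update
    o_t = sigmoid(P['Uo'] @ s_t + P['Wo'] @ h_prev + P['bo'])      # output gate
    h_t = o_t * np.tanh(c_t)                                       # eq. (21)
    return h_t, c_t

def fnn_forecast(s_hist, W_in, b_hidden, w_out, b_out):
    """Forward pass of the 5-10-1 network in eq. (22): five lagged returns in,
    tan-sigmoid hidden layer (tanh), linear output."""
    hidden = np.tanh(W_in @ s_hist + b_hidden)   # W_in: 10x5, s_hist: last 5 returns
    return float(w_out @ hidden + b_out)         # w_out: length-10 output weights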
IV. NUMERICAL RESULTS

To evaluate and analyze the performance of the proposed BCS-based hybrid forecasting models, we considered three daily closing price datasets of stock market indices, namely the Standard & Poor 500 index (SP500), the Hang Seng index (HSI), and the Euro Stock 50 index (EU50). The data was acquired from Yahoo Finance over a period of five years ranging from January 24th, 2018 to December 31st, 2022. The datasets are divided into training and testing sets with a ratio of 4:1. All the experiments were performed on the Matlab 2021a platform installed on a PC with a 1.60 GHz i7-7Y75 processor and 16.0 GB RAM.

During the denoising phase, the sparse representation of the original time series is computed using the DCT. This sparsifying matrix was selected due to its computational simplicity and its energy compaction properties. Then, the acquisition is performed using a random Gaussian matrix with a 50% sampling rate. Afterward, the BCS is performed for clean signal recovery.

Figure 1 presents the log-return signals representing the original noisy time series SP500 and the resulting denoised dataset of size 50. It is observed that the log-returns of the denoised time series show weak fluctuations compared to the original data. In other words, denoising lowers the volatility of the noisy time series.

Fig. 1. Sample signals of size 50 of the original time series and its denoising using the BCS for the SP500 dataset.

The prediction performance of the proposed models is evaluated in terms of the root mean squared error (RMSE), which is expressed as,

RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{x}(t) - x(t)\right)^2}    (23)
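As a small illustration of the evaluation protocol, the helpers below reproduce the chronological 4:1 split and the RMSE of (23). The function names are illustrative, and the paper's experiments were actually carried out in Matlab.

```python
import numpy as np

def train_test_split_4_1(series):
    """Chronological 4:1 train/test split used in the experiments (no shuffling)."""
    cut = int(0.8 * len(series))
    return series[:cut], series[cut:]

def rmse(x_hat, x):
    """Root mean squared error as in eq. (23)."""
    x_hat, x = np.asarray(x_hat), np.asarray(x)
    return float(np.sqrt(np.mean((x_hat - x) ** 2)))
```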
Table I gives the RMSE results of the proposed BCS-based hybrid forecasting models and other benchmark models. It is clearly shown that both ARIMA-BCS and FNN-BCS provide better performance compared with the single models ARIMA, FNN, and LSTM and the other hybrid model LSTM-BCS. It is also noticed that hybrid models integrating the BCS-based denoising outperform their single benchmark models. Additionally, the models based on the LSTM neural networks have an overall weak performance.

TABLE I
PERFORMANCE COMPARISON BASED ON RMSE OF THE PROPOSED MODELS AND THEIR SINGLE COUNTERPARTS FOR THE DATASETS SP500, HSI, AND EU50.

Model        SP500    HSI      EU50
ARIMA        0.0184   0.0194   0.0179
LSTM         0.0974   0.0669   0.0761
FNN          0.0148   0.0147   0.0125
ARIMA-BCS    0.0139   0.0140   0.0137
LSTM-BCS     0.0797   0.0338   0.0396
FNN-BCS      0.0134   0.0142   0.0124

Figure 2 displays a small sample (size 60) of the prediction results for the models FNN and FNN-BCS applied to the dataset SP500. In some parts, the FNN prediction sample gives a delayed response to the original signal, while the FNN-BCS predictions are more in tune with it. This is important in the field of trading since accurate trend forecasting is what generates money and avoids losses.

Fig. 2. Sample signals of size 60 of the original time series and its FNN and FNN-BCS prediction results for the SP500 dataset.

Figure 3 presents the RMSE results of the FNN-based and ARIMA-based forecasting models with various sparsity levels (10% to 80%) for the SP500 dataset. Obviously, beyond the sparsity level of 50%, the RMSE of the ARIMA-BCS rises, while the RMSE of the FNN-BCS continues to decrease. The FNN-BCS model is more noise resistant since the noise present in the measured signal grows with the sparsity level.

Fig. 3. RMSE comparison of the models FNN, FNN-BCS, ARIMA, and ARIMA-BCS for the dataset SP500 with varying sparsity levels.

V. CONCLUSION

The prediction performance of financial forecasting models is degraded by noise arising from several sources. In this paper, we proposed several hybrid forecasting models integrating a novel denoising method based on the BCS approach. We first used the DCT and BCS for the sparsification and reconstruction of the original time series, and then estimated the predictions for the stock price using the forecasting models ARIMA, LSTM, and FNN. Experimental results proved that the proposed denoising scheme enhances the performance of the forecasting models compared with their single counterparts.
REFERENCES

[1] X. Gabaix and K. S. Ralph, "In search of the origins of financial fluctuations: The inelastic markets hypothesis", No. w28967, National Bureau of Economic Research, 2021.
[2] A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, "Stock price prediction using the ARIMA model", Proceedings of the UKSim-AMSS 16th International Conference on Computer Modelling and Simulation (UKSim), 2014. doi:10.1109/UKSim.2014.67.
[3] O. B. Sezer, U. G. Mehmet, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: A systematic literature review: 2005-2019", Applied Soft Computing, vol. 90, p. 106181, 2020.
[4] F. J. Torres, D. Hadjout, A. Sebaa, et al., "Deep learning for time series forecasting: a survey", Big Data, vol. 9, no. 1, pp. 3-21, 2021.
[5] H. Yu, L. J. Ming, R. Sumei, and Z. Shuping, "A hybrid model for financial time series forecasting-integration of EWT, ARIMA with the improved ABC optimized ELM", IEEE Access, vol. 8, pp. 84501-84518, 2020. doi:10.1109/ACCESS.2020.2987547.
[6] M. Durairaj and B. K. Mohan, "A review of two decades of deep learning hybrids for financial time series prediction", International Journal on Emerging Technologies, vol. 10, no. 3, pp. 324-331, 2019.
[7] H. Hassani, A. Dionisio, and M. Ghodsi, "The effect of noise reduction in measuring the linear and nonlinear dependency of financial markets", Nonlinear Analysis: Real World Applications, vol. 11, pp. 492-502, 2010. doi:10.1016/j.nonrwa.2009.01.004.
[8] E. L. de Faria, M. P. Albuquerque, J. L. Gonzalez, J. T. Cavalcante, and M. P. Albuquerque, "Predicting the Brazilian stock market through neural networks and adaptive exponential smoothing methods", Expert Systems with Applications, vol. 36, pp. 12506-12509, 2009. doi:10.1016/j.eswa.2009.04.032.
[9] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting", Energy Economics, vol. 46, pp. 236-245, 2014. doi:10.1016/j.eneco.2014.09.019.
[10] A. M. Awajan, M. T. Ismail, and S. A. Wadi, "Improving forecasting accuracy for stock market data using EMD-HW bagging", PLoS ONE, vol. 13, 2018. doi:10.1371/journal.pone.0199582.
[11] Q. Tang, R. Shi, T. Fan, Y. Ma, and J. Huang, "Prediction of financial time series based on LSTM using wavelet transform and singular spectrum analysis", Mathematical Problems in Engineering, pp. 1-13, 2021. doi:10.1155/2021/9942410.
[12] J. Malczewski and W. Czubak, "Hybrid Convolutional Neural Networks Based Framework for Skimmed Milk Powder Price Forecasting", Sustainability, vol. 13, no. 7, p. 3699, 2021. doi:10.3390/su13073699.
[13] E. J. Candès, "Compressive sampling", Proceedings of the International Congress of Mathematicians, vol. 3, pp. 1433-1452, 2006. ISBN 978-3-03719-022-7.
[14] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine", J. Mach. Learn. Res., vol. 1, pp. 211-244, Sep. 2001.
[15] A. Borovykh, S. Bohte, and C. W. Oosterlee, "Conditional time series forecasting with convolutional neural networks", arXiv preprint arXiv:1703.04691, 2017.
[16] B. Du, D. Fernandez-Reyes, and P. Barucca, "Image processing tools for financial time series classification", arXiv preprint arXiv:2008.06042, 2020.
[17] H. Nouasria and M. Et-tolba, "New constructions of Bernoulli and Gaussian sensing matrices for compressive sensing", 2017 International Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, 2017.
[18] L. Stanković et al., "A tutorial on sparse signal reconstruction and its applications in signal processing", Circuits, Systems, and Signal Processing, vol. 38, pp. 1206-1263, 2019.
[19] S. Ji, Y. Xue, and L. Carin, "Bayesian Compressive Sensing", IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346-2356, June 2008. doi:10.1109/TSP.2007.914345.
[20] T. Fletcher, "Relevance vector machines explained", University College London, London, UK, 2010.
[21] S. Siami-Namini, N. Tavakoli, and A. S. Namin, "The performance of LSTM and BiLSTM in forecasting time series", 2019 IEEE International Conference on Big Data, pp. 3285-3292, IEEE, 2019.
[22] H. Liu, "On the Levenberg-Marquardt training method for feed-forward neural networks", 2010 Sixth International Conference on Natural Computation, Yantai, China, 2010, pp. 456-460. doi:10.1109/ICNC.2010.5583151.
