et de l’analyse de l’information
Students :
Ilyass El fikri
Mohamed Assili
Damien Ergun
Kenza Chebil
Professor :
Youssef Esstafa
December 2023
Table of Contents
Abstract
1 Introduction
2 Models presentation
2.1 ARIMA
2.2 GARCH
2.3 LSTM
2.3.1 Architecture
2.3.2 Mechanism
2.4 Gradient Boosting
2.4.1 Data Preprocessing
2.4.2 Mechanisms of Gradient Boosting and XGBoost
7 Deployment
8 Conclusion
9 Annex
9.1 Annex A - ARIMA
9.2 Annex B - ARIMA-GARCH
9.3 Annex C - ARIMA-EGARCH
9.4 Annex D - FARIMA
9.5 Annex E - Residuals diagnostic for stacking
10 Personal note
10.1 Assili Mohamed
10.2 El Fikri Ilyass
10.3 ERGUN Damien
10.4 Kenza Chebil
11 Bibliography
Abstract
1 Introduction
The examination and projection of stock prices have long fascinated experts in the fields of finance
and economics, largely due to the inherent volatility of the stock market. This volatility is significantly
influenced by a variety of economic indicators, including, but not limited to, the Consumer Price Index
(CPI) and inflation rates. This fluctuation in the market presents a dual-faced scenario : on one hand, it
offers substantial opportunities for traders who aim to capitalize on these market movements for profit ; on
the other hand, it also escalates the potential risks and the likelihood of financial losses. In this complex
environment, the ability to accurately forecast market trends is not just beneficial but a critical tool for
market researchers, traders, and investors alike.
Stock prices, which are meticulously recorded over specific periods, lend themselves to analysis and
modeling using time series analysis. This statistical method is invaluable for drawing meaningful conclu-
sions from stock market data, providing a structured approach to understanding and predicting market
behaviors.
A key focus in this realm is the S&P 500, a predominant index in the American stock market. This
index is a reflection of the market capitalizations of 500 leading companies that are listed on major stock
exchanges like the New York Stock Exchange (NYSE) and the NASDAQ Stock Market. The S&P 500 is
often considered a benchmark for the overall health of the U.S. stock market and, by extension, the U.S.
economy.
In our study, we propose to commence with an in-depth analysis using time series models to forecast
daily stock prices and examine the volatility within the S&P 500 index. Following this, our exploration will
extend to more avant-garde methodologies, incorporating cutting-edge techniques in machine learning and
deep learning. These advanced methods hold the promise of enhancing the accuracy and efficiency of our
predictions, offering a more nuanced understanding of stock market dynamics. This exploration aims not
only to contribute to the academic and practical understanding of financial markets but also to provide
actionable insights for market participants.
2 Models presentation
2.1 ARIMA
In the analysis of financial time series data, the ARIMA model stands out as a fundamental and
versatile tool. ARIMA, an acronym for AutoRegressive Integrated Moving Average, is designed to model
and forecast time series data. It encompasses three key components : Autoregression (AR), Differencing (I),
and Moving Average (MA). The AR part exploits correlations between a time series and lagged versions
of itself, while the I component involves differencing the data to attain stationarity— a state where the
series has constant mean and variance over time. Finally, the MA aspect models the error of the time
series as a linear combination of error terms from the past. Collectively, these components allow ARIMA
to effectively capture various patterns in volatile financial data, making it a cornerstone methodology in
time series forecasting.
Mathematical formulation
An ARIMA model, denoted as ARIMA(p, d, q), is defined by three parameters :
— p - the number of lag observations (lag order).
— d - the degree of differencing .
— q - the size of the moving average window (order of moving average).
The model is represented as :
$$\Phi(B)(1 - B)^d Y_t = \Theta(B)\varepsilon_t$$
Where :
— Yt is the time series.
— B is the backshift operator.
— Φ(B) is the autoregressive polynomial of order p.
— Θ(B) is the moving average polynomial of order q.
— εt is the error term.
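As a minimal illustration of the integrated component, the sketch below applies the operator $(1 - B)$ to a series d times in plain Python. The function name `difference` and the toy series are assumptions made for this example; they are not taken from the study.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - B) to `series` d times."""
    out = list(series)
    for _ in range(d):
        # (1 - B) Y_t = Y_t - Y_{t-1}; each pass shortens the series by one.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Toy series (hypothetical): a linear trend and a quadratic trend.
linear_trend = [2.0 * t + 1.0 for t in range(6)]
quadratic_trend = [float(t * t) for t in range(6)]

first_diff = difference(linear_trend, d=1)      # trend removed after one pass
second_diff = difference(quadratic_trend, d=2)  # trend removed after two passes
print(first_diff)
print(second_diff)
```

A linear trend becomes constant after one difference and a quadratic trend after two, which is why d is chosen as the smallest order that makes the series look stationary.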
2.2 GARCH
An interesting aspect to examine when considering financial data is volatility. It can be thought of as
the variance from a statistical point of view. In fact, the volatility measures the variation or dispersion of
a financial asset or market index over a specific period. It is a crucial concept since it reflects the level of
uncertainty or risk associated with an investment. Moreover, one can base various trading strategies off of
volatility, use the volatility values to calculate the Value At Risk, and last but not least make use of the
volatility in derivatives pricing.
ARCH and GARCH models have been used extensively in financial contexts to model volatility. To model
the volatility of the index considered here, we used the GARCH model. Such models were needed because
it is unrealistic to assume that the variance stays constant over time, which led to the development of
a theory of dynamic volatilities. GARCH stands for Generalized Autoregressive Conditional
Heteroskedasticity; it is a generalization of the ARCH model that allows for a more flexible lag
structure. This extension resembles the one from AR to ARMA models.
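and $(\beta_1, \dots, \beta_q) \in \mathbb{R}^{q-1} \times \mathbb{R}_+^*$ and $\omega > 0$ such that :
$$X_t = y_t + \sigma_t \eta_t$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$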
We can note the two following properties of the conditional mean and conditional variance of a GARCH
model :
It therefore implies the following moments properties (for a stationary GARCH process) :
$$E[X_t] = 0$$
$$V[X_t] = \frac{\omega}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j}$$
Our point of interest being the modelling of volatility, the equation (2) enables us to state that the
conditional variance of the process X is the volatility.
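To make the variance recursion concrete, here is a minimal sketch that simulates a GARCH(p, q) path and computes the stationary variance. It assumes a zero-mean process ($X_t = \sigma_t \eta_t$) with Gaussian innovations; the function names and the coefficient values are hypothetical and are not the estimates of this study.

```python
import math
import random


def unconditional_variance(omega, alphas, betas):
    """Stationary variance omega / (1 - sum(alpha_i) - sum(beta_j))."""
    persistence = sum(alphas) + sum(betas)
    if persistence >= 1.0:
        raise ValueError("sum of coefficients must be below 1 for stationarity")
    return omega / (1.0 - persistence)


def simulate_garch(omega, alphas, betas, n, seed=0):
    """Simulate X_t = sigma_t * eta_t (zero-mean assumption, Gaussian eta_t)."""
    rng = random.Random(seed)
    burn = max(len(alphas), len(betas))
    start = unconditional_variance(omega, alphas, betas)
    x = [0.0] * burn          # pre-sample values of the process
    sigma2 = [start] * burn   # pre-sample conditional variances
    for t in range(burn, burn + n):
        s2 = omega
        s2 += sum(a * x[t - i] ** 2 for i, a in enumerate(alphas, start=1))
        s2 += sum(b * sigma2[t - j] for j, b in enumerate(betas, start=1))
        sigma2.append(s2)
        x.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return x[burn:], sigma2[burn:]


# Hypothetical GARCH(1,1) coefficients, chosen only for illustration.
x, sigma2 = simulate_garch(omega=0.1, alphas=[0.1], betas=[0.8], n=500)
long_run = unconditional_variance(0.1, [0.1], [0.8])
print(long_run)
```

The list `sigma2` is the conditional variance path, that is, the volatility estimate discussed above.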
2.3 LSTM
Neural networks can also be used to predict future stock prices when the problem is framed as a
regression task. Recurrent networks are one of the best-known classes of neural networks. LSTM (Long
Short-Term Memory) belongs to this class, with the addition that it can retain information over long
periods of time. It thus addresses one of the main limitations of recurrent networks, the vanishing
gradient problem, which makes the network unable to learn and retain dependencies between distant
elements of a sequence.
2.3.1 Architecture
LSTM networks have three main components : Memory cells, gates and activation functions.
1. Memory cells : They enable us to perform modeling of dependencies through a long period of
time by maintaining a state in the cells. It consists of a numerical value that the network will use
through different time steps.
2. Input gate : Controls whether the input should modify the cell’s information.
3. Forget gate : Controls what amount of information should be discarded from the cell ( may
potentially discard all the available information).
4. Output gate : Controls the output based on the current cell state, potentially outputting infor-
mation regardless of the cell’s state.
5. Activation functions : These regulate the flow of each of the gates above : They are applied to
the gate’s information which is nothing but a weighted sum of the input’s, output’s and memory
cell’s information. Generally, the sigmoid and hyperbolic tangent are the ones that are used.
2.3.2 Mechanism
The LSTM network processes in the following way :
The input gate determines how much of the new information, given the cell’s state and output information,
should be added to the cell state. The forget gate performs the selection of unnecessary, outdated or
irrelevant information that should be removed.
The combination of the last 2 steps then enables the altering of the cell’s state by exactly adding relevant
information and removing outdated historical context.
The output gate then produces the information that will either be used for prediction directly or fed to the
fully connected layers part of the network, by utilizing the current state and the additional information
as well as taking into account the removals.
Mathematically, it can be described as follows : We consider that $x^t_{\cdot}$ is the input at time t,
$z^t_{\cdot}$ is the output at time t, $s^t_{\cdot}$ is the cell state at time t. We also note
$\omega_{\cdot l}$, $\omega_{\cdot \phi}$, $\omega_{\cdot o}$ respectively the weights for the input,
forget and output gates. Finally, let f be the activation function considered.
For each of the input, forget and output gates we have (with l, φ, o referring to them respectively) :
$$a_l^t = \sum_{i=1}^{I} \omega_{il} x_i^t + \sum_{h=1}^{H} \omega_{hl} z_h^{t-1} + \sum_{c=1}^{C} \omega_{cl} s_c^{t-1}$$
$$a_\phi^t = \sum_{i=1}^{I} \omega_{i\phi} x_i^t + \sum_{h=1}^{H} \omega_{h\phi} z_h^{t-1} + \sum_{c=1}^{C} \omega_{c\phi} s_c^{t-1}$$
$$a_o^t = \sum_{i=1}^{I} \omega_{io} x_i^t + \sum_{h=1}^{H} \omega_{ho} z_h^{t-1} + \sum_{c=1}^{C} \omega_{co} s_c^{t-1}$$
$$b_l^t = f(a_l^t)$$
$$b_\phi^t = f(a_\phi^t)$$
$$b_o^t = f(a_o^t)$$
where the first term represents the effect of the forget gate on the previous cell state, and the second one is
the weighted sum of the input and output, transformed by an activation function and scaled by the value
of the input gate.
Lastly, the output of the LSTM neuron is :
the value of the state cell activated through h and scaled by the output gate’s value.
The weights are calculated during the training.
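The mechanism described in this section can be sketched for a single scalar cell (I = H = C = 1). The gate pre-activations follow the three sums given above. The cell-state update and the output formula used below are the standard forms that match the verbal description (forget gate times previous state, plus input gate times a squashed candidate; output gate times the squashed state). They are assumptions of this sketch, as are the choice of the sigmoid for f, tanh for the squashing functions, and every weight name.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x, z_prev, s_prev, w):
    """One time step of a scalar LSTM cell with peephole connections.

    x: input at time t, z_prev: output at t-1, s_prev: cell state at t-1.
    w: dict of weights; keys are hypothetical names chosen for this sketch.
    """
    # Gate pre-activations: weighted sums of input, previous output, previous state.
    a_in = w["il"] * x + w["hl"] * z_prev + w["cl"] * s_prev
    a_forget = w["if"] * x + w["hf"] * z_prev + w["cf"] * s_prev
    a_out = w["io"] * x + w["ho"] * z_prev + w["co"] * s_prev
    b_in, b_forget, b_out = sigmoid(a_in), sigmoid(a_forget), sigmoid(a_out)

    # Candidate information: weighted sum of input and previous output.
    a_cell = w["ic"] * x + w["hc"] * z_prev

    # Assumed standard update: keep part of the old state, add gated new information.
    s = b_forget * s_prev + b_in * math.tanh(a_cell)
    # Assumed standard output: squashed state scaled by the output gate.
    z = b_out * math.tanh(s)
    return z, s


# With all weights at zero every gate equals 0.5, which makes the step easy to check.
zero_weights = {k: 0.0 for k in
                ["il", "hl", "cl", "if", "hf", "cf", "io", "ho", "co", "ic", "hc"]}
z, s = lstm_step(x=1.0, z_prev=0.0, s_prev=2.0, w=zero_weights)
print(z, s)
```

In practice the weights are learned by backpropagation through time, and a framework implementation would operate on vectors rather than scalars.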
Figure 1 – LSTM Block
— Lag Features : Prior values are utilized as predictors to capture temporal dependencies.
— Moving Window Statistics : Utilized to emphasize both immediate and enduring patterns.
Normalization and Scaling
— Techniques such as Min-Max Scaling or Z-score Scaling are utilized to assure equal contribution of
features.
— Dealing with Outliers : Outliers are managed using robust scaling or capping approaches.
Data Splitting Strategy
— Data splitting strategy is a method used to divide a dataset into separate subsets for analysis
or modeling purposes.
— Temporal Splits : The data is partitioned chronologically, with 80% allocated to the training set
and 20% allocated to the test set.
— Approach for Cross-Validation : Time-based techniques such as rolling or expanding window
cross-validation are employed.
Data Quality Checks
— Techniques such as imputation or forward filling are employed to handle missing values.
— Data are obtained from credible sources such as Yahoo Finance to support their accuracy and
reliability.
This extensive preparation guarantees that the input data is in a format that is favorable for optimizing
the performance of XGBoost and GBM models. Through painstaking feature engineering and data trans-
formation procedures, we establish a strong basis for reliable and precise predictive modeling by effectively
addressing the unique characteristics of financial time series data.
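The preprocessing steps listed above (lag features, moving window statistics, chronological split) can be sketched as follows. This is a minimal sketch in plain Python: the function names, the number of lags and the window length are hypothetical choices for the example, not the settings used in the study.

```python
def make_lag_features(series, n_lags=3, window=3):
    """Build rows of [lag_1, ..., lag_n, rolling_mean] and the matching targets."""
    rows, targets = [], []
    start = max(n_lags, window)
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        past = series[t - window:t]          # only values strictly before t
        rolling_mean = sum(past) / window
        rows.append(lags + [rolling_mean])
        targets.append(series[t])
    return rows, targets


def chronological_split(rows, targets, train_frac=0.8):
    """Keep the time order: first part for training, last part for testing."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], targets[:cut], rows[cut:], targets[cut:]


# Toy series (hypothetical values).
series = [float(v) for v in range(13)]
rows, targets = make_lag_features(series, n_lags=3, window=3)
X_train, y_train, X_test, y_test = chronological_split(rows, targets, train_frac=0.8)
print(rows[0], targets[0], len(X_train), len(X_test))
```

Every feature at time t is computed from values strictly before t, so no information from the future leaks into the predictors.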
where L is the loss function representing the discrepancy between actual values yi and predictions ŷi , and
Ω denotes the regularization term which penalizes the complexity of the model.
Algorithm for Time Series Forecasting :
The algorithmic depiction demonstrates the adaptation of XGBoost specifically for time series forecas-
ting. It is essential to incorporate lagged features as inputs in order to accurately capture time-dependent
trends in the data. The inherent recursive structure of time series forecasting involves utilizing the result
at a given time step to inform the prediction for the subsequent time step. The regularization components,
denoted as Ω, in XGBoost aid in managing the complexity of the model. This ensures that the model
captures the fundamental patterns in financial time series data without excessively fitting to the inherent
noise, hence preventing overfitting.
Algorithm 1 Gradient boosting
Initialize the model with a constant value : $f_0(x) = \arg\min_\gamma \sum_i L(y_i, \gamma)$
for m = 1 to M do
    Compute the residuals, representing the errors of the model's predictions at the previous step :
    $$r_{im} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{m-1}(x)} \quad \text{for all } i$$
    Fit a base learner (e.g., decision tree) $h_m(x)$ to these residuals, $r_{im}$
    Utilize lagged features of the time series as inputs to the base learner to capture temporal dependencies
    Find the best multiplier $\gamma_m$ for $h_m(x)$ :
    $$\gamma_m = \arg\min_\gamma \sum_i L(y_i, f_{m-1}(x_i) + \gamma h_m(x_i))$$
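The loop of Algorithm 1 can be sketched in plain Python under simplifying assumptions that are ours, not the study's: squared-error loss (so the negative gradient is simply the residual y minus the current prediction), a one-split regression stump as base learner, a single lagged value as the only feature, and a fixed learning rate used in place of the line search for $\gamma_m$. The sketch also adds the usual update of the model with the scaled learner after each round. All names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on a single feature."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    _, thr, left_mean, right_mean = best
    return lambda x: left_mean if x <= thr else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss with stumps as base learners."""
    f0 = sum(ys) / len(ys)            # constant minimizing the squared loss
    preds = [f0] * len(ys)
    learners = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        # Assumed update: fixed shrinkage instead of the line search for gamma_m.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in learners)


# Toy series (hypothetical); the feature is the previous value, the target the next one.
series = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0, 9.0, 11.0]
xs, ys = series[:-1], series[1:]
model = gradient_boost(xs, ys)
print(model(9.0))
```

Libraries such as XGBoost add the regularization term Ω and second-order information on top of this basic loop.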
3 S&P500 price prediction
3.1 Methodology
This section outlines the key steps in constructing our time series model over a period of approximately
23 years and 10 months, from January 1, 2000, to November 17, 2023. The methodology is depicted in the
following flow chart (Figure 1).
Our forecasting approach utilizes test data (last 20% of the data). To evaluate forecast accuracy, we
calculate the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as follows :
$$MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| \quad (3)$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} \quad (4)$$
Optimal predictions are indicated by the lowest MAE and RMSE values.
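Equations (3) and (4) translate directly into code. The short sketch below uses plain Python; the function names and the toy values are assumptions made for the example.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error, equation (3)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error, equation (4)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


# Toy values (hypothetical): one large error among otherwise perfect forecasts.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because the errors are squared before averaging, the RMSE penalizes a single large miss more heavily than the MAE does, which is why both are reported.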
3.2 Stationarity
The foundational stage in analyzing time-indexed data is to stabilize the series by converting it from
a non-stationary to a stationary state. This step is critical as many statistical and econometric models,
including ARMA, presuppose stationarity within the dataset to ensure the validity of their application
and the reliability of their results.
This process can be mathematically represented as :
$$\Delta Y_t = Y_t - Y_{t-1}$$
In the preprocessing of financial time series, it is typical to first perform a logarithmic transformation,
then proceed with differencing. The logarithmic transformation is applied because financial data frequently
exhibit exponential growth; it linearizes this growth and helps stabilize the variance of the series.
Subsequently, differencing is employed to remove the trend and stabilize the mean of the time series.
Figures 3 and 4 demonstrate that the S&P 500 price has long-term upward and downward trends, indicating
a non-stationary series. It is clear that both the mean and variance are not constant over time. A
structural break is noticeable between 2008 and 2009, caused by the global financial crisis.
Figure 5 – ACF of the log price Figure 6 – PACF of the log price
Figure 6 shows that the PACF has a significant value at lag 1 and then cuts off, suggesting the model
could be an ARMA(1,0). However, Figure 5 illustrates that the ACF decays extremely slowly, in an almost
linear manner, indicating that the series is trended and non-stationary. To handle this non-stationarity,
we use first differencing.
Figure 7 displays fluctuations around 0, suggesting that the series attains mean stationarity following
the first differencing. Nonetheless, it is clear that the variance of the series remains non-constant. We
also tested for stationarity using the Augmented Dickey-Fuller test, whose null hypothesis is the presence
of a unit root. At the 5% significance level, we reject the null hypothesis for the differenced series and
conclude that, after the first differencing, the data is stationary. Therefore, we will now proceed with
the model estimation.
Test for unit root in    p-Value
Level                    0.5047
First difference         0.01
Figure 9 – ACF of the differenced log price Figure 10 – PACF of the differenced log price
As an alternative approach, model identification can be conducted by choosing the model with the
lowest AIC (Akaike Information Criterion).
Based on this criterion, we have selected an ARIMA(1,1,2) given by :
This formula highlights the utility of differencing log prices. The difference in log prices corresponds to
returns, similar to the percentage changes in stock prices, especially when dealing with daily data. This
can be easily demonstrated using a first-order Taylor expansion :
$$r_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad \ln(1 + r_t) = r_t + o(r_t) \text{ as } r_t \to 0$$
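The approximation can be checked numerically. The sketch below compares simple returns with differences of log prices on a few hypothetical prices; the values and variable names are illustrative only.

```python
import math

# Hypothetical daily closing prices.
prices = [100.0, 101.0, 99.5, 100.2]

simple_returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# For daily moves of about 1%, the gap is of the order of r^2 / 2.
max_gap = max(abs(r - lr) for r, lr in zip(simple_returns, log_returns))
print(max_gap)

# Log returns add up over time: their sum is the log of the total price ratio.
total_log_return = sum(log_returns)
print(total_log_return, math.log(prices[-1] / prices[0]))
```

The additivity shown in the last lines is one practical reason for modeling differences of log prices rather than simple returns.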
Firstly, we check residual normality. The lower right plot in Figure 11 shows non-normal distribution
of residuals. The Jarque-Bera test confirms this non-normality. Later, we will explore the way to address
this issue.
Test           p-Value      Conclusion on residuals
Jarque-Bera    < 2.2e-16    Non-normality
Ljung-Box      0.4103       Independence
We then test the independence of residuals, using the Ljung-Box test. Typically, about 5% of the estimated
autocorrelations are expected to fall outside the confidence bounds by chance alone. Testing up to
ln(number of observations) lags is recommended as a rule of thumb. For our model, we applied the Ljung-Box
test to 4 lags. With a p-value of 0.4103, we fail to reject the null hypothesis of no autocorrelation,
which is consistent with independent residuals. Thus, we infer that our model's error terms behave like
white noise.
In the upper plot of Figure 13, we observe distinct clusters of volatility within the time series. These
clusters become more pronounced when plotting the squared residuals, as shown in Figure 11. This pat-
tern suggests a non-constant variance of residuals, leading to the conclusion that the homoscedasticity
assumption does not hold.
Given the ARIMA residuals in equation (3), the selected model is a GARCH(1,2) expressed as follows :
$$\varepsilon_t = \sigma_t z_t, \qquad z_t \sim N(0, 1)$$
$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + \beta_2 \sigma_{t-2}^2$$
To better fit the distribution of residuals, we shift from assuming normality for $z_t$ to adopting a
skewed Student's t-distribution (sstd). Figure 14 illustrates the inadequacy of a normal distribution in
our context.
The next step involves checking the stationarity of the conditional variance $\sigma_t^2$ in the
GARCH(1,2) model. This assessment can be conducted by confirming that the sum of the coefficients,
$\alpha_1$, $\beta_1$, and $\beta_2$, is less than 1. The detailed summary containing the coefficients of
the ARIMA-GARCH model is provided in the annex.
3.4.2 Comparative Forecasting Analysis of S&P 500 Index Prices : ARIMA(1,1,2) vs.
ARIMA(1,1,2)-GARCH(1,2) Models
In this section, we explore the effectiveness of our forecasting approach in predicting the log prices for
out-of-sample (test) data.
As discussed in Section 3.1, the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
are utilized to assess forecast accuracy. The results indicate a notable improvement in accuracy when
employing the ARIMA(1,1,2)-GARCH(1,2) combined model over the standalone ARIMA(1,1,2) model.
The forecast depicted in Figure 15 clearly illustrates that our predictions align closely with the trend
observed in the test data.
In the following section, we introduce an additional model. This model will be stacked with our current
one, aiming to enhance the accuracy of our predictions.
3.5 ARIMA-EGARCH model
The EGARCH(2,2) model, selected based on the AIC criterion for our residuals, assumes that zt follows
a skewed student’s t-distribution. The residuals εt from the ARIMA(1,1,2) are modeled as εt = σt zt , where
σt is the conditional standard deviation and zt is the standardized residual. The model is expressed as :
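$$\log(\sigma_t^2) = \kappa + \sum_{i=1}^{2} \gamma_i \log(\sigma_{t-i}^2) + \sum_{j=1}^{2} \alpha_j \left[ \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} - E\left( \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} \right) \right] + \sum_{j=1}^{2} \xi_j \frac{\varepsilon_{t-j}}{\sigma_{t-j}}.$$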
Here, κ, γi , αj , and ξj are the model parameters, and σt2 denotes the conditional variance.
However, this model yields less accurate predictions on the test data. Nevertheless, it still manages to
capture the overall upward trend observed in the test data.
The table below presents a comparative analysis of the accuracy of the newly introduced model against
the models discussed earlier :
3.6 Stacking
In order to enhance forecasting accuracy, we assigned specific weights to the forecasts produced by
the two previous models. This was achieved by partitioning our test data into a new training and test
set. To determine these weights effectively, we utilized linear regression, predicting the weights through an
Ordinary Least Squares (OLS) method. The final combined model is expressed as follows :
The accuracy has markedly increased, as demonstrated by Figure 19, which clearly illustrates the rise
in price. However, there is a slight overestimation of this price for almost 20% of the displayed data,
representing the new test data for linear regression.
As a result, we will maintain this model as a benchmark for our trading strategy, given its superior
performance in predicting recent data.
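The weighting step described in this section can be sketched as an ordinary least squares regression of the observed values on the two model forecasts. The version below assumes two weights and no intercept, which may differ from the specification actually fitted; the function name and the toy forecasts are hypothetical.

```python
def ols_two_weights(f1, f2, y):
    """Solve the 2x2 normal equations for y ~ w1 * f1 + w2 * f2 (no intercept)."""
    s11 = sum(a * a for a in f1)
    s22 = sum(b * b for b in f2)
    s12 = sum(a * b for a, b in zip(f1, f2))
    s1y = sum(a * v for a, v in zip(f1, y))
    s2y = sum(b * v for b, v in zip(f2, y))
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("forecasts are collinear; weights are not identified")
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2


# Toy forecasts from two models (hypothetical values) and a target built from them.
forecast_a = [1.0, 2.0, 3.0, 4.0, 5.0]
forecast_b = [2.0, 1.0, 4.0, 3.0, 6.0]
observed = [0.3 * a + 0.7 * b for a, b in zip(forecast_a, forecast_b)]

w1, w2 = ols_two_weights(forecast_a, forecast_b, observed)
stacked = [w1 * a + w2 * b for a, b in zip(forecast_a, forecast_b)]
print(w1, w2)
```

The weights must be estimated on one part of the held-out data and evaluated on another, as done above with the new training and test sets, otherwise the stacked accuracy is overstated.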
3.7 Extension
Another aspect yet to be investigated is the presence of long memory in our data. A time series
characterized by long memory displays an autocorrelation function that decays slowly towards zero.
In our analysis, when examining the ACF of the series $\left( \log \frac{P_t}{P_{t-1}} \right)^2$, we
observe this behavior. Thus, it is apparent that our series exhibits long memory, suggesting that the
FARIMA (Fractional AutoRegressive Integrated Moving Average) models would be more suitable for modeling
this type of data.
The selection of the autoregressive order p and moving average order q was based on opting for the
model with the lowest Akaike Information Criterion (AIC), yielding p = 2 and q = 2.
After training the model, the fractional differencing value was found to be d=0.49, which is lower than
0.5. As a result, the model is expressed by the equation :
After having verified the assumptions underlying our model, we can now move forward to evaluate its
predictive accuracy. It is important to highlight that this modeling approach does not discern between
negative and positive returns, as it is based on the squared returns which constitute a long memory series.
Additionally, it is not capable of predicting the actual price of the index.
The results of the forecasting are depicted in Figures 23 and 24.
By looking at the tails of the fitted distribution in green we can observe that it is heavier than the
one in red, which corresponds to a normal one, therefore suggesting to use a different distribution than
the normal one. To confirm this intuition we use the Jarque Bera test to check if the distribution of log
returns is a normal one.
The table above confirms that the distribution is not normal, thus a distribution similar to the normal
one but with heavier tails is preferred. Consequently, we decided to use the Student or generalized error
distribution when fitting the GARCH process later on.
4.2 GARCH
We first begin by splitting the available data into two sets : one for training purposes representing
80% of the data set, and the remaining 20% for testing and validation purposes. One can note that the
splitting in time series analysis is different than the one traditionally used for machine learning : we split
the data in a continuous manner rather than randomly picking observations from the data set in order to
account for the natural dependency present in this type of data.
In order to model volatility, the first step is to find the suitable ARMA process for the conditional
mean of the series. We will be using the log returns of the S&P 500 as our series, and by looking at the plot
of the series as well as the ACF and PACF, we observe signs of stationarity. To confirm our observations,
we will use the augmented Dickey-Fuller test : We obtain a test statistic of -18.42 and a p-value lower than
0.01, thus confirming the stationarity of the log returns.
To choose the best ARMA for the conditional mean, we fit several models of different orders, namely
ARMA(p,q) for p and q in {1, 2, 3, 4} and we choose the best one utilizing the AIC minimization criterion :
the chosen model corresponds to an ARMA(3,4). Concerning the diagnostic of this choice, we check the
correlation of residuals using a Box-Ljung test with lag 10. A p-value of 0.5681 (> 0.05) confirms that the
residuals are uncorrelated. To further motivate the use of GARCH model to estimate volatility, we use the
ArchLM test to confirm the existence of ARCH effect in the mean model.
The next step is to model the conditional variance of the series, thus giving an estimate of the volatility.
We will use the AIC minimization criterion in order to select the most suitable model among several
GARCH(p,q) ones with p,q in {1, 2, 3}, specifying a student or generalized error underlying distribution.
Order GARCH(p,q)    STD          GED
(1,1)               -6.518204    -6.522004
(2,1)               -6.522558    -6.524094
(3,1)               -6.521499    -6.523678
(1,2)               -6.519319    -6.521493
(2,2)               -6.522471    -6.525927
(3,2)               -6.520311    446.339786
(1,3)               -6.519149    -6.523103
(2,3)               -6.522389    -6.525511
(3,3)               -6.521822    -6.523360
Under the student distributed errors assumption, we find that the GARCH(2,1) model was the best
one with regards to the AIC minimization criterion, whereas with GED errors the GARCH(2,2) model
was found to have the least AIC. We will further examine the performance of the best selected models for
both errors distribution by computing their MAE, MSE, and RMSE on training and test sets.
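The selection rule can be reproduced from the AIC table with a few lines of Python. The values below are copied from the table; only the variable names are ours.

```python
# AIC values by GARCH(p, q) order: (Student errors, GED errors), from the table above.
aic = {
    (1, 1): (-6.518204, -6.522004),
    (2, 1): (-6.522558, -6.524094),
    (3, 1): (-6.521499, -6.523678),
    (1, 2): (-6.519319, -6.521493),
    (2, 2): (-6.522471, -6.525927),
    (3, 2): (-6.520311, 446.339786),
    (1, 3): (-6.519149, -6.523103),
    (2, 3): (-6.522389, -6.525511),
    (3, 3): (-6.521822, -6.523360),
}

# The preferred order is the one with the smallest AIC for each error distribution.
best_std = min(aic, key=lambda order: aic[order][0])
best_ged = min(aic, key=lambda order: aic[order][1])
print(best_std, best_ged)
```

The differences between neighboring orders are small (third decimal place), so the out-of-sample comparison that follows is a useful complement to the AIC ranking.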
With regards to all the error metrics considered, we can observe that the GARCH model fitted using
the generalized error distribution is the one which achieves minimum values for MAE, MSE and RMSE.
Test              Ljung-Box                            Weighted ARCH LM
                  (Standardized Residuals, lag 1)      (lag 3)
Hypotheses        H0 : Residuals are not correlated    H0 : No ARCH effect
                  H1 : Residuals are correlated        H1 : ARCH effect
Model STD
  Test statistic  1.611                                0.08739
  p-value         0.2044                               0.7675
Model GED
  Test statistic  0.7515                               0.02663
  p-value         0.3860                               0.8704
As a final observation of our model, we can look at the prediction of future volatility on our test set.
As we can see, both models fail to capture the relatively large values of volatility at the beginning of
2020 during the COVID period, but overall they give a stable prediction of volatility, which is consistent
with the low variance in the returns after the COVID spikes. Finally, the GARCH(2,1) with Student error
distribution tends to give higher estimates of the volatility compared to the GARCH(2,2) with generalized
error distribution. If we were to choose the best model with regard to error metrics only, we would go
with the latter. However, since their evaluation metrics are not very different from each other,
especially on the training set, we may choose the GARCH(2,1) with Student error distribution if we prefer
its higher volatility estimates, which can be the case in financial applications when calculating the
Value At Risk (VaR).
5 A Deep Learning approach
In this section we will take a look at the performance of a neural network approach in predicting the
S&P500 stock price, by using a LSTM model.
We proceed in the same way as for the other models, by splitting the data into two sets : 80% for training
and 20% for testing purposes. Like machine learning models, deep learning models have hyperparameters that
should be tuned in order to boost the performance of the model. An LSTM network has many different
hyperparameters that can be tuned, such as the number of LSTM units, the input sequence length, the batch
size, the number of epochs, the learning rate, etc. It can be computationally expensive to fine-tune all
the possible hyperparameters, which is why we only tune the following ones : the number of layers, the
learning rate and the number of units. We test values in {1, 2}, {0.001, 0.01} and {50, 100} respectively.
We find that the best model is the one with a learning rate of 0.001, 1 layer and 100 units. The following
plot shows the evolution of the loss on the training set and the test set.
6 A Gradient boosting Approach
This section of the study provides a thorough examination of the outcomes gained by implementing
a Gradient Boosting model for forecasting the S&P 500 index. Our approach, based on meticulous data
preparation and feature engineering, has resulted in a model that accurately represents the intricate pat-
terns present in the financial time series data. The results section will provide a comprehensive analysis of
the model’s prediction performance, focusing on important metrics such as Mean Squared Error (MSE),
Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). These metrics offer valuable in-
formation regarding the precision and dependability of the forecasts generated by the Gradient Boosting
model. In addition, we will examine the residual diagnostics to further evaluate the model’s prediction
skills. Through the analysis of residuals, we can assess the model’s capacity to accurately represent the
fundamental data patterns and detect any possible biases or inefficiencies.
6.4 Model Training, Parameter Tuning, and Evaluation
We used a robust cross-validation approach specifically designed for time series data to train our models.
To respect the intrinsic sequential dependence in financial data, we employed time series cross-validation.
This methodology adheres to the chronological sequence of observations, guaranteeing that only historical
data is used to forecast future values, hence avoiding look-ahead bias. Our data was partitioned into
training and validation sets using an expanding window technique. During each iteration, the model was
trained on the same initial set of data, which was then augmented with additional data points for
succeeding iterations. This approach is well suited to time series data as it closely emulates the way
models are used in practice, incorporating new information as it becomes available.
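The splitting scheme described in this paragraph can be sketched as a generator of index sets in which the training window always starts at the first observation and grows at each iteration. The function name, the number of folds and the minimum training size are hypothetical choices for the example.

```python
def expanding_window_splits(n_obs, n_splits, min_train):
    """Yield (train_indices, validation_indices) with a growing training window."""
    fold = (n_obs - min_train) // n_splits
    if fold < 1:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        train_end = min_train + k * fold
        val_end = train_end + fold
        # Training always starts at index 0; validation follows it immediately.
        yield list(range(0, train_end)), list(range(train_end, val_end))


splits = list(expanding_window_splits(n_obs=10, n_splits=3, min_train=4))
for train_idx, val_idx in splits:
    print(train_idx, val_idx)
```

In every fold the largest training index is smaller than the smallest validation index, which is the property that prevents look-ahead bias.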
Metric                            Value
Mean Squared Error (MSE)          9.18308 × 10^-5
Mean Absolute Error (MAE)         0.00554856
Root Mean Square Error (RMSE)     0.009582839
(a) Residuals over the test time period. (b) Predictions over the test time period.
Figure 29 – Comparative visualization of residuals and predictions from the time series model.
We now examine the residuals over the 3-year period covered by the test data. The residuals plot allows us
to assess the effectiveness of a Generalized Boosting Model (GBM) when applied to a time series dataset.
1. Residual Distribution : The residuals appear randomly scattered. This lack of structure indicates that
the model captures the patterns and cycles in the data without obvious misspecification.
2. Residual Variance : The residuals appear homoscedastic over time. This consistency suggests that the
model's accuracy is roughly constant throughout the period.
3. Outliers : One large peak stands out as an exception. Rare or unexpected events may be missed by the
model. These data points should be examined to determine their nature and their impact on model
performance.
4. Residual Centering : The residuals hover around the red zero line, showing that the model neither
systematically overestimates nor underestimates.
The residuals plot illustrates that the GBM captures the main structure without systematic bias and fits
the time series data well. However, the outliers and the need for a more quantitative analysis suggest
that the model could still be refined.
7 Deployment
To put our predictions into practice, we use the prices predicted by the stacking model, together with
the predicted volatilities, as the basis for two trading strategies. The first is based on a simple
idea: buy if the price predicted for tomorrow is higher than today's price. This first strategy yields
a return of 0.58, beating the S&P 500 benchmark return of 0.50.
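The first strategy can be sketched as follows. This is an illustrative reading of the rule, not the implementation behind the reported figures: the function names are hypothetical, the prices are made up, and the sketch assumes simple returns, no transaction costs, no short selling, and a flat (cash) position on days without a buy signal.

```python
def long_or_flat_return(prices, predicted_next):
    """Cumulative return of holding the index over day t -> t+1 only when
    the prediction made on day t exceeds the price on day t.

    `predicted_next[t]` is the forecast for day t+1 made on day t, so it
    has one element fewer than `prices`.
    """
    if len(predicted_next) != len(prices) - 1:
        raise ValueError("predicted_next must have len(prices) - 1 elements")
    wealth = 1.0
    for t in range(len(prices) - 1):
        if predicted_next[t] > prices[t]:          # buy signal
            wealth *= prices[t + 1] / prices[t]    # earn the realised daily return
        # otherwise stay in cash: wealth is unchanged
    return wealth - 1.0


def buy_and_hold_return(prices):
    """Benchmark: cumulative return of holding the index throughout."""
    return prices[-1] / prices[0] - 1.0


# Made-up prices and forecasts, for illustration only.
toy_prices = [100.0, 102.0, 101.0, 104.0]
toy_forecasts = [101.0, 101.5, 103.0]
strategy = long_or_flat_return(toy_prices, toy_forecasts)
benchmark = buy_and_hold_return(toy_prices)
print(strategy, benchmark)
```

In the toy data the strategy sits out the one day on which the index falls, which is the mechanism by which such a rule can beat a buy-and-hold benchmark when the forecasts are informative.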
8 Conclusion
In this analysis of the S&P 500 stock index, we employed a range of statistical and machine learning
techniques to predict stock prices and volatility. The methods ranged from traditional time series
models such as ARIMA and GARCH to machine learning approaches, namely an LSTM neural network
and gradient boosting.
Our findings illustrate the effectiveness of combining ARIMA with GARCH models to enhance predictive
accuracy, especially for financial time series characterized by volatility clustering. The
ARIMA(1,1,2)-GARCH(1,2) model, in particular, stood out for its ability to closely align predictions
with actual market trends. We also explored ARIMA-EGARCH models, but they proved less accurate than
the combined ARIMA-GARCH model. A further aim of this study was to understand and explain how
machine learning algorithms can be used to forecast time series, as well as the pitfalls and
sensitive points to take into account when using these types of algorithms.
In addition, the application of a Long Short-Term Memory (LSTM) neural network presented an
alternative perspective, leveraging the power of deep learning to capture complex, nonlinear relationships
inherent in stock market data.
A key insight from this study is the relevance of selecting appropriate models based on the specific
characteristics of financial time series, such as non-stationarity and volatility clustering.
For future research, we recommend exploring hybrid models that combine the strengths of statistical
time series analysis and machine learning. Such models could potentially offer more robust and accurate
forecasts by capturing both linear and nonlinear dynamics in stock price movements. Additionally,
incorporating external factors such as macroeconomic indicators or market sentiment analysis could
further enhance the predictive power of these models. Finally, as a limitation concerning deployment,
we had hoped to implement a second strategy based on Bollinger bands, which creates a short signal if
the predicted price is above the upper band and a long signal if it is below the lower band. Because
our predictions always stayed within the bands, no signals were generated over the period considered.
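The Bollinger-band rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the prices are made up, the bands are taken as the rolling mean plus or minus `num_std` population standard deviations, and the defaults of 20 days and 2 standard deviations are the usual convention rather than values from this report.

```python
import statistics


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return a list of (lower, upper) bands, one per full window.

    Element i corresponds to the window ending at prices[window - 1 + i].
    """
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        middle = statistics.fmean(chunk)
        spread = num_std * statistics.pstdev(chunk)
        bands.append((middle - spread, middle + spread))
    return bands


def band_signal(predicted_price, lower, upper):
    """Short above the upper band, long below the lower band, otherwise no signal."""
    if predicted_price > upper:
        return "short"
    if predicted_price < lower:
        return "long"
    return "none"


# Made-up prices with a short window, for illustration only.
toy_prices = [100.0, 101.0, 99.0, 100.0, 100.0]
lower, upper = bollinger_bands(toy_prices, window=5, num_std=2.0)[-1]
print(lower, upper)
print(band_signal(100.5, lower, upper))  # inside the bands
print(band_signal(102.0, lower, upper))  # above the upper band
print(band_signal(98.0, lower, upper))   # below the lower band
```

The sketch also makes the limitation concrete: a forecast that stays close to recent prices will rarely leave a band two standard deviations wide, so `band_signal` returns no signal on every day.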
In conclusion, our research contributes valuable insights into the field of financial forecasting, offering
a foundation for both academic exploration and practical application in financial analysis and trading
strategies.
9 Annex
9.1 Annex A - ARIMA
9.3 Annex C - ARIMA-EGARCH
Figure 35 – Residuals diagnostic
10 Personal note
10.1 Assili Mohamed
In this project, I worked on the third part, which involved modeling the S&P 500 index price using an
ARIMA-GARCH model. This model combines two methods to better predict how the index price might
change. Additionally, I tested another model, FARIMA, on the squared returns of the index. This
showed that our data exhibits "long memory".
A major challenge in this project was making sure the models were based on correct assumptions. It
was difficult to confirm that these assumptions were right. When they weren’t, I had to find ways to make
the models work properly. This part of the project required careful checking and coming up with different
solutions to ensure the models were reliable and gave good results.
To finish, I would like to thank our professor for his guidance throughout this project. His expertise
and timely answers to my questions contributed greatly to this study.
10.3 ERGUN Damien
I worked on developing the gradient boosting models. I was initially hopeful about the predictive
potential of the Gradient Boosting Machine (GBM) model. However, the earliest iterations of the model,
which relied exclusively on prior log returns as predictors, were insufficient. They were unable to
capture the complicated patterns and signals found in financial time series data. This setback
motivated me to learn more about technical analysis. To improve the model's predictive power, I
recognized that it needed to be fed features that encapsulated the underlying market dynamics without
overwhelming it with complexity. The integration of carefully chosen technical indicators was a
turning point. The Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD), and
other indicators gave a distilled view of market sentiment and trends. The challenge was to strike a
balance: with too few indicators, the model would miss important information; with too many, it could
overfit or become bogged down by noise. After much experimentation and fine-tuning, I identified a
selection of indicators that complemented the historical log returns and allowed the GBM model to
improve its predictions substantially.
11 Bibliography
[1] Weiwei Jiang, "Applications of deep learning in stock market prediction: Recent progress,"
Department of Electronic Engineering, Tsinghua University, 2020.
[2] Robert F. Engle III, "Risk and volatility: Econometric models and financial practice," Nobel Lecture,
2003.
[3] Tim Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics,
vol. 31, pp. 307–327, 1986.
[4] Sreelekshmy Selvin, R. Vinayakumar, E.A. Gopalakrishnan, Vijay Krishna Menon, K.P. Soman,
"Stock price prediction using LSTM, RNN and CNN-sliding window model," Centre for Computational
Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa
Vidyapeetham, Amrita University, India, 2017.
[5] Prapanna Mondal, Labani Shit, Saptarsi Goswami, "Study of effectiveness of time series modeling
(ARIMA) in forecasting stock prices," International Journal of Computer Science, Engineering and
Applications (IJCSEA), vol. 4, no. 2, 2014.
[6] Ayodele A. Adebiyi, Aderemi O. Adewumi, Charles K. Ayo, "Stock Price Prediction Using the ARIMA
Model," School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal, Durban,
South Africa, 2014.