Time Series Report

Ecole nationale de la statistique
et de l’analyse de l’information
Time series project
Stock Price Prediction
Students :
Ilyass El fikri
Mohamed Assili
Damien Ergun
Kenza Chebil
Professor :
Youssef Esstafa
December 2023
Table of Contents
Abstract
1 Introduction
2 Models presentation
2.1 ARIMA
2.2 GARCH
2.3 LSTM
2.3.1 Architecture
2.3.2 Mechanism
2.4 Gradient Boosting
2.4.1 Data Preprocessing
2.4.2 Mechanisms of Gradient Boosting and XGBoost
3 S&P500 price prediction
3.1 Methodology
3.2 Stationarity
3.3 ARIMA model
3.3.1 Model identification
3.3.2 Residuals Diagnostics
3.4 ARIMA-GARCH model
3.4.1 Model implementation
3.4.2 Comparative Forecasting Analysis of S&P 500 Index Prices : ARIMA(1,1,2) vs. ARIMA(1,1,2)-GARCH(1,2) Models
3.5 ARIMA-EGARCH model
3.6 Stacking
3.7 Extension
4 S&P 500 Volatility Prediction
4.1 Examining Log returns
4.2 GARCH
4.3 Model diagnostics
5 A Deep Learning approach
6 A Gradient boosting Approach
6.1 Data preparation and preprocessing
6.2 Feature engineering
6.3 Choice of Technical Indicators
6.4 Model Training, Parameter Tuning, and Evaluation
6.4.1 Hyperparameter Tuning
6.4.2 Model evaluation
7 Deployment
8 Conclusion
9 Annex
9.1 Annex A - ARIMA
9.2 Annex B - ARIMA-GARCH
9.3 Annex C - ARIMA-EGARCH
9.4 Annex D - FARIMA
9.5 Annex E - Residuals diagnostic for stacking
10 Personal note
10.1 Assili Mohamed
10.2 El Fikri Ilyass
10.3 ERGUN Damien
10.4 Kenza Chebil
11 Bibliography
Abstract
1 Introduction
The examination and projection of stock prices have long fascinated experts in the fields of finance
and economics, largely due to the inherent volatility of the stock market. This volatility is significantly
influenced by a variety of economic indicators, including, but not limited to, the Consumer Price Index
(CPI) and inflation rates. This fluctuation in the market presents a dual-faced scenario : on one hand, it
offers substantial opportunities for traders who aim to capitalize on these market movements for profit ; on
the other hand, it also escalates the potential risks and the likelihood of financial losses. In this complex
environment, the ability to accurately forecast market trends is not just beneficial but a critical tool for
market researchers, traders, and investors alike.
Stock prices, which are meticulously recorded over specific periods, lend themselves to analysis and modeling using time series analysis. This statistical method is invaluable for drawing meaningful conclusions from stock market data, providing a structured approach to understanding and predicting market behaviors.
A key focus in this realm is the S&P 500, a predominant index in the American stock market. This
index is a reflection of the market capitalizations of 500 leading companies that are listed on major stock
exchanges like the New York Stock Exchange (NYSE) and the NASDAQ Stock Market. The S&P 500 is
often considered a benchmark for the overall health of the U.S. stock market and, by extension, the U.S.
economy.
In this study, we begin with an in-depth analysis using time series models to forecast daily stock prices and examine the volatility of the S&P 500 index. We then extend the analysis to more recent methodologies from machine learning and deep learning. These methods hold the promise of improving the accuracy and efficiency of our predictions and of offering a more nuanced understanding of stock market dynamics. This exploration aims not only to contribute to the academic and practical understanding of financial markets but also to provide actionable insights for market participants.
2 Models presentation
2.1 ARIMA
In the analysis of financial time series data, the ARIMA model stands out as a fundamental and
versatile tool. ARIMA, an acronym for AutoRegressive Integrated Moving Average, is designed to model
and forecast time series data. It encompasses three key components : Autoregression (AR), Integration (I), and Moving Average (MA). The AR part exploits correlations between a time series and lagged versions of itself, while the I component involves differencing the data to attain stationarity, a state where the series has constant mean and variance over time. Finally, the MA part models the current value of the series as a linear combination of past error terms. Collectively, these components allow ARIMA
to effectively capture various patterns in volatile financial data, making it a cornerstone methodology in
time series forecasting.
Mathematical formulation
An ARIMA model, denoted as ARIMA(p, d, q), is defined by three parameters :
— p - the number of lag observations (lag order).
— d - the degree of differencing .
— q - the size of the moving average window (order of moving average).
The model is represented as :
\[ \Phi(B)\,(1 - B)^d\, Y_t = \Theta(B)\,\varepsilon_t \]
Where :
— Yt is the time series.
— B is the backshift operator.
— Φ(B) is the autoregressive polynomial of order p.
— Θ(B) is the moving average polynomial of order q.
— εt is the error term.
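As a worked illustration, the ARIMA(1,1,1) case can be expanded by hand. The sign convention used here, $\Phi(B) = 1 - \phi_1 B$ and $\Theta(B) = 1 + \theta_1 B$, is an assumption, since the definition above does not state one (it matches the signs used later in the report for the fitted models).

```latex
\begin{align*}
(1 - \phi_1 B)(1 - B)\,Y_t &= (1 + \theta_1 B)\,\varepsilon_t \\
\Delta Y_t - \phi_1\,\Delta Y_{t-1} &= \varepsilon_t + \theta_1\,\varepsilon_{t-1},
  \qquad \Delta Y_t = (1 - B)\,Y_t = Y_t - Y_{t-1} \\
\Delta Y_t &= \phi_1\,\Delta Y_{t-1} + \varepsilon_t + \theta_1\,\varepsilon_{t-1}
\end{align*}
```

In other words, an ARIMA(p, 1, q) model is an ARMA(p, q) model applied to the first differences of the series.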
2.2 GARCH
An interesting aspect to examine when considering financial data is volatility. It can be thought of as
the variance from a statistical point of view. In fact, the volatility measures the variation or dispersion of
a financial asset or market index over a specific period. It is a crucial concept since it reflects the level of
uncertainty or risk associated with an investment. Moreover, one can base various trading strategies off of
volatility, use the volatility values to calculate the Value At Risk, and last but not least make use of the
volatility in derivatives pricing.
ARCH and GARCH models have been used extensively in financial contexts to model volatility, and we use a GARCH model for the index considered here. Such models were needed because the variance of a financial series cannot reasonably be assumed constant over time, which led to the development of a theory of dynamic volatilities. GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity; it generalizes the ARCH model by allowing a more flexible lag structure, much as ARMA models extend AR models.
Mathematical formulation and properties

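Let us consider a process $(X_t)_{t \in \mathbb{Z}}$.
$X$ is said to be a GARCH($p$, $q$) process if there exist $(\alpha_1, \dots, \alpha_p) \in \mathbb{R}_+^{p-1} \times \mathbb{R}_+^*$, $(\beta_1, \dots, \beta_q) \in \mathbb{R}_+^{q-1} \times \mathbb{R}_+^*$ and $\omega > 0$ such that :
\[ X_t = y_t + \sigma_t \eta_t \]
\[ \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2 \]
where $(\eta_t)_{t \in \mathbb{Z}}$ is independent and identically distributed with mean 0 and variance 1.
The process $X$ is said to be stationary if :
\[ \sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1 \]
When the mean term $y_t$ is zero, the conditional mean and conditional variance of a GARCH model satisfy the two following properties :
\[ E[X_t \mid X_{t-1}, X_{t-2}, \dots] = 0 \tag{1} \]
\[ V[X_t \mid X_{t-1}, X_{t-2}, \dots] = \sigma_t^2 \tag{2} \]
This implies the following moment properties (for a stationary GARCH process) :
\[ E[X_t] = 0 \]
\[ V[X_t] = \frac{\omega}{1 - \sum_{i=1}^{p} \alpha_i - \sum_{j=1}^{q} \beta_j} \]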
Our point of interest being the modelling of volatility, the equation (2) enables us to state that the
conditional variance of the process X is the volatility.
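The moment properties above can be checked numerically. The following is a minimal sketch, assuming a GARCH(1,1) process with zero mean term and Gaussian innovations; the function name `simulate_garch11` and the parameter values are illustrative choices, not estimates from this report.

```python
import random

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate X_t = sigma_t * eta_t with
    sigma_t^2 = omega + alpha * X_{t-1}^2 + beta * sigma_{t-1}^2."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for stationarity")
    rng = random.Random(seed)
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = omega / (1.0 - alpha - beta)
    xs, sigma2s = [], []
    for _ in range(n):
        x = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)   # eta_t ~ N(0, 1)
        xs.append(x)
        sigma2s.append(sigma2)
        # Conditional variance for the next step.
        sigma2 = omega + alpha * x * x + beta * sigma2
    return xs, sigma2s

omega, alpha, beta = 0.1, 0.1, 0.8        # illustrative values only
xs, sigma2s = simulate_garch11(200_000, omega, alpha, beta)
sample_var = sum(x * x for x in xs) / len(xs)
theoretical_var = omega / (1.0 - alpha - beta)
print("sample variance:", sample_var)
print("omega / (1 - alpha - beta):", theoretical_var)
```

The list `sigma2s` holds the conditional variances, that is, the volatility path that equation (2) refers to, while the sample variance of `xs` should be close to the unconditional variance formula.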
2.3 LSTM
Neural networks can also be used to predict future stock prices when the problem is framed as a regression task. Recurrent neural networks are a well-known class of such models. LSTM (Long Short-Term Memory) networks belong to this class and can retain information over long periods of time, thus addressing one of the main limitations of plain recurrent networks, namely the vanishing gradient issue, which makes the network unable to learn and retain dependencies between distant elements of a sequence.
2.3.1 Architecture
LSTM networks have three main components : Memory cells, gates and activation functions.
1. Memory cells : They enable us to perform modeling of dependencies through a long period of
time by maintaining a state in the cells. It consists of a numerical value that the network will use
through different time steps.
2. Input gate : Controls whether the input should modify the cell’s information.
3. Forget gate : Controls how much information should be discarded from the cell (potentially all of the available information).
4. Output gate : Controls the output based on the current cell state, potentially outputting information regardless of the cell’s state.
5. Activation functions : These regulate the flow of each of the gates above : They are applied to
the gate’s information which is nothing but a weighted sum of the input’s, output’s and memory
cell’s information. Generally, the sigmoid and hyperbolic tangent are the ones that are used.
2.3.2 Mechanism
The LSTM network operates as follows :
The input gate determines how much of the new information, given the cell’s state and output information, should be added to the cell state. The forget gate selects the unnecessary, outdated or irrelevant information that should be removed.
Combining these two steps updates the cell’s state by adding relevant information and removing outdated historical context.
The output gate then produces the information that will either be used directly for prediction or fed to the fully connected layers of the network, based on the updated cell state.
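Mathematically, it can be described as follows : let $x^t_{\cdot}$ be the input at time $t$, $z^t_{\cdot}$ the output at time $t$, and $s^t_{\cdot}$ the cell state at time $t$. We also denote by $\omega_{\cdot l}$, $\omega_{\cdot \phi}$, $\omega_{\cdot o}$ the weights of the input, forget and output gates respectively. Finally, let $f$ be the activation function considered.
For each of the input, forget and output gates we have (with $l$, $\phi$, $o$ referring to them respectively) :
\[ a_l^t = \sum_{i=1}^{I} \omega_{il}\, x_i^t + \sum_{h=1}^{H} \omega_{hl}\, z_h^{t-1} + \sum_{c=1}^{C} \omega_{cl}\, s_c^{t-1} \]
\[ a_\phi^t = \sum_{i=1}^{I} \omega_{i\phi}\, x_i^t + \sum_{h=1}^{H} \omega_{h\phi}\, z_h^{t-1} + \sum_{c=1}^{C} \omega_{c\phi}\, s_c^{t-1} \]
\[ a_o^t = \sum_{i=1}^{I} \omega_{io}\, x_i^t + \sum_{h=1}^{H} \omega_{ho}\, z_h^{t-1} + \sum_{c=1}^{C} \omega_{co}\, s_c^{t-1} \]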
the activation function is then applied and we obtain :
\[ b_l^t = f(a_l^t) \]
\[ b_\phi^t = f(a_\phi^t) \]
\[ b_o^t = f(a_o^t) \]
Concerning the state cell, we first calculate the value

\[ a_c^t = \sum_{i=1}^{I} \omega_{ic}\, x_i^t + \sum_{h=1}^{H} \omega_{hc}\, z_h^{t-1} \]
The information in the cell at time t is then calculated by :
\[ s_c^t = b_\phi^t\, s_c^{t-1} + b_l^t\, g(a_c^t) \]
where the first term represents the effect of the forget gate on the previous cell state, and the second one is
the weighted sum of the input and output, transformed by an activation function and scaled by the value
of the input gate.
Lastly, the output of the LSTM neuron is :
\[ b_c^t = b_o^t\, h(s_c^t) \]
the value of the state cell activated through h and scaled by the output gate’s value.
The weights are calculated during the training.
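The equations above can be sketched as a single forward step in plain Python. This is a minimal illustration for one LSTM block with one memory cell (C = 1), assuming f is the sigmoid and g and h are the hyperbolic tangent; the weight container `weights` and its numeric values are hypothetical, chosen only to make the example run.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def lstm_step(x, z_prev, s_prev, w):
    """One time step of a single LSTM block with one memory cell.
    x: inputs x_i^t, z_prev: outputs z_h^{t-1}, s_prev: cell state s_c^{t-1}."""
    gates = {}
    for name in ("l", "phi", "o"):          # input, forget and output gates
        # a = weighted sum of the input, the previous outputs and the previous cell state
        a = dot(w[name]["x"], x) + dot(w[name]["z"], z_prev) + w[name]["s"] * s_prev
        gates[name] = sigmoid(a)            # b = f(a), with f assumed to be the sigmoid
    # The cell input a_c uses only the input and the previous outputs.
    a_c = dot(w["c"]["x"], x) + dot(w["c"]["z"], z_prev)
    # New cell state: forget gate * old state + input gate * g(a_c), with g = tanh.
    s = gates["phi"] * s_prev + gates["l"] * math.tanh(a_c)
    # Block output: output gate * h(s), with h = tanh.
    b_c = gates["o"] * math.tanh(s)
    return b_c, s, gates

# Hypothetical weights for 2 inputs and 1 recurrent output (H = 1).
weights = {
    "l":   {"x": [0.5, -0.3], "z": [0.1], "s": 0.2},
    "phi": {"x": [0.4, 0.1],  "z": [0.0], "s": -0.1},
    "o":   {"x": [-0.2, 0.3], "z": [0.2], "s": 0.1},
    "c":   {"x": [0.7, -0.5], "z": [0.3]},
}
output, state = 0.0, 0.0
for x_t in [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]:
    output, state, gates = lstm_step(x_t, [output], state, weights)
    print(round(output, 4), round(state, 4))
```

In practice the weights are learned by backpropagation through time in a deep learning library; the loop above only shows how the state and the output are carried from one time step to the next.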
Figure 1 – LSTM block
2.4 Gradient Boosting

To understand and forecast the complicated patterns of financial markets, as represented by indexes such as the S&P 500, we rely on machine learning methods known for their reliability and effectiveness on complex datasets. Among these, Gradient Boosting and its improved counterpart, XGBoost (eXtreme Gradient Boosting), are particularly notable. Financial markets exhibit non-linear and intricate interconnections that conventional linear models frequently fail to capture, which calls for algorithms capable of modeling the interactions between market components. Gradient Boosting and XGBoost uncover non-linear patterns by constructing successive decision trees, each correcting the errors made by its predecessors. This ensemble approach is effective at capturing complex relationships and integrating diverse data types, such as the categorical and numerical inputs commonly found in financial databases.
At the same time, the unpredictable nature of financial markets creates a substantial risk of overfitting, where a model performs well on past data but fails to generalize to new market situations. XGBoost tackles this issue with regularization, specifically L1 (LASSO) and L2 (Ridge) penalties, which penalize the complexity of the model and improve its ability to generalize. The ensemble nature of these models further enhances their robustness, making them dependable tools in the volatile domain of financial prediction. We selected Gradient Boosting and XGBoost for predicting the S&P 500 because of their established success with intricate, non-linear data patterns, their resistance to overfitting, their computational efficiency, and their capacity to offer interpretable insights.
2.4.1 Data Preprocessing

Efficient data preprocessing is essential for the efficacy of machine learning models such as XGBoost
and GBM, particularly in the domain of financial time series prediction. This section provides a concise
overview of the fundamental preprocessing procedures utilized.
Dealing with Time Series Characteristics
— Stationarity : Methods such as differencing or logarithmic transformations are employed to main-
tain the stability of the statistical features of the data over time.
— Seasonality and Trends : Utilizing differencing or seasonal decomposition methods is particularly
effective for analyzing the log returns of indicators.
Feature Engineering
— Lag Features : Prior values are utilized as predictors to capture temporal dependencies.
— Moving Window Statistics : Utilized to emphasize both immediate and enduring patterns.
Normalization and Scaling
— Techniques such as Min-Max Scaling or Z-score Scaling are utilized to assure equal contribution of
features.
— Dealing with Outliers : Outliers are managed using robust scaling or capping approaches.
Data Splitting Strategy
— Data splitting strategy is a method used to divide a dataset into separate subsets for analysis
or modeling purposes.
— Temporal Splits : The data is partitioned chronologically, with 80% allocated to the training set
and 20% allocated to the test set.
— Approach for Cross-Validation : Time-based techniques such as rolling or expanding window
cross-validation are employed.
Data Quality Checks
Techniques such as imputation or forward filling are employed to handle missing values. Obtaining the data from a credible source such as Yahoo Finance helps ensure its accuracy and reliability.
This extensive preparation puts the input data in a format that is favorable to the performance of XGBoost and GBM models. Through careful feature engineering and data transformation, we establish a strong basis for reliable predictive modeling by addressing the specific characteristics of financial time series data.
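The lag features, moving window statistics and temporal split described above can be sketched as follows. This is only an illustration on made-up log returns; the helper names `make_lag_features` and `temporal_split`, the number of lags and the window length are assumptions, not the settings used in this project.

```python
def make_lag_features(series, n_lags, window):
    """Build rows [lag_1, ..., lag_n, rolling_mean] with the current value as target."""
    start = max(n_lags, window)
    rows, targets = [], []
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]   # series[t-1], series[t-2], ...
        rolling_mean = sum(series[t - window:t]) / window       # uses past values only
        rows.append(lags + [rolling_mean])
        targets.append(series[t])
    return rows, targets

def temporal_split(rows, targets, train_fraction=0.8):
    """Chronological split: the first 80% for training, the remaining 20% for testing."""
    cut = int(len(rows) * train_fraction)
    return (rows[:cut], targets[:cut]), (rows[cut:], targets[cut:])

# Toy log returns, not market data.
log_returns = [0.01, -0.02, 0.005, 0.0, 0.015, -0.01,
               0.02, -0.005, 0.01, 0.0, -0.015, 0.005]
X, y = make_lag_features(log_returns, n_lags=2, window=2)
(train_X, train_y), (test_X, test_y) = temporal_split(X, y)
print(len(train_X), "training rows,", len(test_X), "test rows")
```

Every feature at time t is computed from values strictly before t, and the split is never shuffled, so no future information leaks into the training set.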
2.4.2 Mechanisms of Gradient Boosting and XGBoost

Gradient Boosting Mechanism
Loss Function :
\[ L(y, f(x)) = \sum_i (y_i - f(x_i))^2 \]
where y represents the actual time series values, and f (x) is the model’s prediction at each time step.
Algorithm for Time Series Forecasting :
The fundamental adjustment in this algorithm for time series forecasting involves incorporating lagged
data into the model. These characteristics enable the algorithm to acquire knowledge from previous values
of the series, which is essential for capturing the intrinsic temporal relationships in time series data. In
addition, the recursive prediction phase illustrates the utilization of the model’s output at a given time
step as an input for forecasting upcoming stages, which is a prevalent technique in time series analysis.
XGBoost Mechanism
Regularized Learning Objective :
\[ \mathrm{Obj} = \sum_i L(y_i, \hat{y}_i) + \sum_k \Omega(f_k) \]
where $L$ is the loss function representing the discrepancy between actual values $y_i$ and predictions $\hat{y}_i$, and $\Omega$ denotes the regularization term, which penalizes the complexity of each tree $f_k$ in the model.
Algorithm for Time Series Forecasting :
The algorithmic depiction demonstrates the adaptation of XGBoost specifically for time series forecasting. It is essential to incorporate lagged features as inputs in order to accurately capture time-dependent
trends in the data. The inherent recursive structure of time series forecasting involves utilizing the result
at a given time step to inform the prediction for the subsequent time step. The regularization components,
denoted as Ω, in XGBoost aid in managing the complexity of the model. This ensures that the model
captures the fundamental patterns in financial time series data without excessively fitting to the inherent
noise, hence preventing overfitting.
Algorithm 1 Gradient boosting
Initialize the model with a constant value : $f_0(x) = \arg\min_{\gamma} \sum_i L(y_i, \gamma)$
for m = 1 to M do
    Compute the residuals, representing the errors of the model’s predictions at the previous step :
    \[ r_{im} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{m-1}(x)} \quad \text{for all } i \]
    Fit a base learner (e.g., decision tree) $h_m(x)$ to these residuals $r_{im}$
    Utilize lagged features of the time series as inputs to the base learner to capture temporal dependencies
    Find the best multiplier $\gamma_m$ for $h_m(x)$ :
    \[ \gamma_m = \arg\min_{\gamma} \sum_i L(y_i, f_{m-1}(x_i) + \gamma h_m(x_i)) \]
    Update the model : $f_m(x) = f_{m-1}(x) + \gamma_m h_m(x)$
    For recursive prediction, use the model’s output at one step to predict the next step in the time series
end for
The final model $f_M(x)$ represents the combined effect of sequentially fitting base learners to correct the residuals of the model, incorporating temporal dynamics
Algorithm 2 XGBoost algorithm

Initialize model with f0 (x), which can be a simple predictor or a constant value.
for m = 1 to M do
Compute the gradient gim and Hessian him of the loss function for each observation ;
Fit a tree to these gradients and Hessians, capturing the error patterns ;
Use lagged features of the time series as inputs to the tree to incorporate temporal dynamics ;
Prune the tree using the complexity parameter to prevent overfitting ;
Update the model by adding the contribution of the pruned tree ;
Modify the output value of each leaf in the tree, considering the learning rate and regularization ;
For recursive prediction, utilize the model’s output at a previous time step as an input feature for
subsequent predictions ;
end for
The final model fM (x) represents the cumulative learning from the ensemble of pruned trees, with each
tree correcting the residual errors of the preceding ones and considering the temporal structure of the
data ;
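Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the squared loss given above, one-split regression trees (stumps) as base learners, and a toy deterministic series in place of market data. It implements plain gradient boosting only: the Hessians, pruning and regularization of Algorithm 2 are not included, and the names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(X, r):
    """Least-squares regression stump: one split on one feature."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= threshold]
            right = [ri for row, ri in zip(X, r) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:                       # no valid split: constant learner
        mean = sum(r) / len(r)
        return lambda row: mean
    _, j, threshold, lm, rm = best
    return lambda row: lm if row[j] <= threshold else rm

def gradient_boost(X, y, n_rounds=50):
    f0 = sum(y) / len(y)                   # best constant under the squared loss
    learners, pred = [], [f0] * len(y)
    for _ in range(n_rounds):
        # Negative gradient of (y - f)^2 with respect to f: 2 * (y - f).
        r = [2.0 * (yi - pi) for yi, pi in zip(y, pred)]
        h = fit_stump(X, r)
        hx = [h(row) for row in X]
        denom = sum(v * v for v in hx)
        if denom == 0.0:
            break
        # Closed-form line search for the multiplier gamma under the squared loss.
        gamma = sum((yi - pi) * v for yi, pi, v in zip(y, pred, hx)) / denom
        learners.append((gamma, h))
        pred = [pi + gamma * v for pi, v in zip(pred, hx)]
    return lambda row: f0 + sum(g * h(row) for g, h in learners)

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

series = [float((t % 5) + 0.1 * t) for t in range(60)]   # toy series, not market data
n_lags = 3
X = [[series[t - k] for k in range(1, n_lags + 1)] for t in range(n_lags, len(series))]
y = series[n_lags:]
model = gradient_boost(X, y)
baseline_mse = mse([sum(y) / len(y)] * len(y), y)
boosted_mse = mse([model(row) for row in X], y)

# Recursive forecast: each prediction is fed back in as the newest lag.
history, forecast = list(series), []
for _ in range(5):
    nxt = model([history[-k] for k in range(1, n_lags + 1)])
    forecast.append(nxt)
    history.append(nxt)
print(baseline_mse, boosted_mse, forecast)
```

The recursive loop at the end is kept outside the training loop: the model is fitted once on lagged features, and its own outputs are then reused as inputs to forecast several steps ahead.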
3 S&P500 price prediction
3.1 Methodology
This section outlines the key steps in constructing our time series model over a period of almost 24 years, from January 1, 2000, to November 17, 2023. The methodology is depicted in the following flow chart (Figure 2).
Figure 2 – Flow chart
We evaluate our forecasts on the test data (the last 20% of the data). To evaluate forecast accuracy, we calculate the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as follows :
\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t| \tag{3} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2} \tag{4} \]
Optimal predictions are indicated by the lowest MAE and RMSE values.
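Equations (3) and (4) translate directly into code. The sketch below uses made-up log prices purely for illustration; the function names are arbitrary.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error, equation (3)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error, equation (4)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

y_true = [8.40, 8.42, 8.41, 8.45]   # made-up log prices
y_pred = [8.39, 8.44, 8.41, 8.41]
print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Because the RMSE squares the errors before averaging, it is never smaller than the MAE and it penalizes large individual errors more heavily.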
3.2 Stationarity
The foundational stage in analyzing time-indexed data is to stabilize the series by converting it from
a non-stationary to a stationary state. This step is critical as many statistical and econometric models,
including ARMA, presuppose stationarity within the dataset to ensure the validity of their application
and the reliability of their results.
Differencing, the standard way to achieve this, can be mathematically represented as :
\[ \Delta Y_t = Y_t - Y_{t-1} \]
In the preprocessing of financial time series, it is typical to first perform a logarithmic transformation and then proceed with differencing. The logarithmic transformation is applied because financial data frequently exhibit exponential growth, and this transformation aids in smoothing the series. Subsequently, differencing is employed to remove the trend and stabilize the mean of the time series.
Figure 3 – Logarithm of SP500 index price
Figure 4 – SP500 index price
Figures 3 and 4 show that the SP500 price has long-term upward and downward trends, indicating a non-stationary series. Both the mean and the variance are clearly not constant over time. A structural break is noticeable between 2008 and 2009, because of the global financial crisis.
Figure 5 – ACF of the log price
Figure 6 – PACF of the log price
Figure 6 shows that the PACF has a significant value at lag 1 and then cuts off, suggesting the model could be an ARMA(1,0). However, Figure 5 shows that the ACF decays extremely slowly and almost linearly, indicating that the series is trended and non-stationary. To handle this non-stationarity, we use first differencing.
Figure 7 displays fluctuations around 0, suggesting that the series attains mean stationarity after first differencing. Nonetheless, the variance of the series clearly remains non-constant. We also tested for stationarity using the Augmented Dickey-Fuller test. At the 5% significance level, we fail to reject the unit-root null hypothesis for the series in levels (p-value 0.5047) but reject it after first differencing (p-value 0.01), so we conclude that the differenced data is stationary. We now proceed with the model estimation.
Figure 7 – Logarithm of SP500 after first differencing
Test for unit root in    p-Value
Level                    0.5047
First difference         0.01
Figure 8 – Augmented Dickey-Fuller Test
3.3 ARIMA model

3.3.1 Model identification
Based on the preceding analysis, the differencing order d is clearly identified as 1. We first attempt to identify p and q by examining the number of significant peaks in the ACF and PACF plots, but this approach proves inconclusive, as shown below.
Figure 9 – ACF of the differenced log price
Figure 10 – PACF of the differenced log price
As an alternative approach, model identification can be conducted by choosing the model with the
lowest AIC (Akaike Information Criterion).
Based on this criterion, we have selected an ARIMA(1,1,2) given by :
\[ \log(P_t) - \log(P_{t-1}) = \phi_1 \left( \log(P_{t-1}) - \log(P_{t-2}) \right) + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t, \quad \text{where } \varepsilon_t \sim \mathcal{N}(0, 1) \tag{5} \]
This formula highlights the utility of differencing log prices. The difference in log prices corresponds to returns, similar to the percentage changes in stock prices, especially when dealing with daily data. This can be easily demonstrated using a first-order Taylor expansion :
\[ r_t = \frac{P_t - P_{t-1}}{P_{t-1}}, \qquad \log(P_t) - \log(P_{t-1}) = \ln(1 + r_t) = r_t + o(r_t) \text{ as } r_t \to 0 \]
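The quality of this approximation is easy to check numerically. The sketch below uses made-up daily closing prices; for daily-sized moves, the gap between the simple return and the log return is of the order of $r_t^2 / 2$.

```python
import math

prices = [4500.0, 4522.5, 4509.0, 4513.5]    # made-up daily closing prices
pairs = list(zip(prices, prices[1:]))

simple_returns = [(p1 - p0) / p0 for p0, p1 in pairs]
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in pairs]
gaps = [abs(r - lr) for r, lr in zip(simple_returns, log_returns)]

for r, lr, gap in zip(simple_returns, log_returns, gaps):
    print(f"simple = {r:+.6f}   log = {lr:+.6f}   gap = {gap:.2e}")
```

For larger moves (several percent in a day) the gap grows quadratically, which is why the equivalence is stated for small returns only.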
3.3.2 Residuals Diagnostics

After estimating the model’s parameters, we evaluate the model assumptions.
First, we check the normality of the residuals. The lower right plot in Figure 11 shows a non-normal distribution of residuals, and the Jarque-Bera test confirms this non-normality. We explore how to address this issue later.
Figure 11 – Residuals diagnostic
Test           p-Value      Conclusion on residuals
Jarque-Bera    < 2.2e-16    Non-normality
Ljung-Box      0.4103       Independence
Figure 12 – Residuals Test Results
We then test the independence of the residuals using the Ljung-Box test. Even for white noise, about 5% of the sample autocorrelations are expected to fall outside the 95% confidence bounds by chance. As a rule of thumb, testing up to ln(number of observations) lags is recommended. For our model, we applied the Ljung-Box test with 4 lags. The test fails to reject the null hypothesis of no autocorrelation, suggesting independent residuals. Thus, we infer that our model’s error terms are effectively white noise.
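For reference, the Ljung-Box statistic can be computed by hand as $Q = n(n+2) \sum_{k=1}^{h} \hat{\rho}_k^2 / (n-k)$. The sketch below is an illustration on simulated white noise, not on the residuals of this report. It assumes the p-value is taken from a chi-squared law with h degrees of freedom, without any correction for the number of fitted ARMA parameters, and it uses the closed-form survival function that exists when the number of degrees of freedom is even (as with h = 4).

```python
import math
import random

def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box(x, lags):
    """Ljung-Box Q statistic over lags 1..lags."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))

def chi2_sf_even(x, df):
    """Survival function of a chi-squared law with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires an even, positive df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # stand-in for model residuals
q = ljung_box(white_noise, lags=4)
p_value = chi2_sf_even(q, df=4)
print("Q =", q, " p-value =", p_value)
```

A large p-value means the null hypothesis of no autocorrelation up to lag 4 is not rejected; a strongly autocorrelated series would instead give a very large Q and a p-value close to zero.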
In the upper plot of Figure 11, we observe distinct clusters of volatility within the residual series. These clusters become more pronounced when plotting the squared residuals, as shown in Figure 13. This pattern suggests a non-constant variance of residuals, leading to the conclusion that the homoscedasticity assumption does not hold.
Figure 13 – Squared residuals
To model this heteroscedasticity, we use a GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model, which is adept at capturing and reflecting more recent changes and fluctuations in the series.
3.4 ARIMA-GARCH model

3.4.1 Model implementation
As observed in the previous section, even though there is stationarity in mean, stationarity in volatility (or variance) is not present. Consequently, we model the residuals of the ARIMA model using a GARCH model. The orders of the GARCH model are determined based on the Akaike Information Criterion (AIC).
Given the ARIMA residuals in equation (5), the selected model is a GARCH(1,2) expressed as follows :
\[ \varepsilon_t = \sigma_t z_t, \quad z_t \sim \mathcal{N}(0, 1) \]
\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2 + \beta_2 \sigma_{t-2}^2 \]
To better fit the distribution of residuals, we shift from assuming normality for $z_t$ to adopting a skewed Student’s t-distribution (sstd). Figure 14 illustrates the inadequacy of a normal distribution in our context.
Figure 14 – Residuals distribution vs Normal distribution, skewness=-0.36, kurtosis=8.02
The next step involves checking the stationarity of the conditional variance $\sigma_t^2$ in the GARCH(1,2) model. This can be done by confirming that the sum of the coefficients $\alpha_1$, $\beta_1$ and $\beta_2$ is less than 1. The detailed summary containing the coefficients of the ARIMA-GARCH model is provided in the annex.
3.4.2 Comparative Forecasting Analysis of S&P 500 Index Prices : ARIMA(1,1,2) vs.
ARIMA(1,1,2)-GARCH(1,2) Models
In this section, we explore the effectiveness of our forecasting approach in predicting the log prices for
out-of-sample (test) data.
As discussed in Section 3.1, the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE)
are utilized to assess forecast accuracy. The results indicate a notable improvement in accuracy when
employing the ARIMA(1,1,2)-GARCH(1,2) combined model over the standalone ARIMA(1,1,2) model.
Model                      MAE     RMSE
ARIMA(1,1,2)               0.31    0.35
ARIMA(1,1,2)-GARCH(1,2)    0.12    0.15
Table 1 – Comparison of Forecast Accuracy : ARIMA(1,1,2) vs. ARIMA(1,1,2)-GARCH(1,2)
The forecast depicted in Figure 15 clearly illustrates that our predictions align closely with the trend
observed in the test data.
Figure 15 – Comparison of Forecasted Values with Test Data
In the following section, we introduce an additional model. This model will be stacked with our current
one, aiming to enhance the accuracy of our predictions.
3.5 ARIMA-EGARCH model
The EGARCH(2,2) model, selected based on the AIC criterion for our residuals, assumes that zt follows
a skewed student’s t-distribution. The residuals εt from the ARIMA(1,1,2) are modeled as εt = σt zt , where
σt is the conditional standard deviation and zt is the standardized residual. The model is expressed as :
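\[ \log(\sigma_t^2) = \kappa + \sum_{i=1}^{2} \gamma_i \log(\sigma_{t-i}^2) + \sum_{j=1}^{2} \alpha_j \left( \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} - E\left[ \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} \right] \right) + \sum_{j=1}^{2} \xi_j \frac{\varepsilon_{t-j}}{\sigma_{t-j}}. \]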
Here, κ, γi , αj , and ξj are the model parameters, and σt2 denotes the conditional variance.
However, this model yields less accurate predictions on the test data than the ARIMA-GARCH model. Nevertheless, it still manages to capture the overall upward trend observed in the test data.
The table below presents a comparative analysis of the accuracy of the newly introduced model against
the models discussed earlier :
Model                        MAE     RMSE
ARIMA(1,1,2)                 0.31    0.35
ARIMA(1,1,2)-GARCH(1,2)      0.12    0.15
ARIMA(1,1,2)-EGARCH(2,2)     0.22    0.26
Figure 16 – Model Accuracy Comparison
Figure 17 – Comparison of Forecasted Values with Test Data
3.6 Stacking
In order to enhance forecasting accuracy, we combined the forecasts produced by the two previous models using specific weights. To do so, we partitioned our test data into a new training set and a new test set, and estimated the weights by linear regression using the Ordinary Least Squares (OLS) method. The final combined model is expressed as follows :
\[ \log(P_t) = 2.48 \cdot \log(P_t)_{\text{ARIMA-GARCH}} - 1.49 \cdot \log(P_t)_{\text{ARIMA-EGARCH}} \]
Note that the assumptions of this linear regression are not satisfied; a diagnostic of its residuals is given in the annex. The main purpose is only to see whether we can gain in terms of accuracy.
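The way such stacking weights are obtained can be sketched with a two-regressor least-squares fit. This is a minimal illustration, assuming a regression without intercept (as the combined equation above suggests) and made-up forecast values; the weights 2.48 and -1.49 reported above come from the project's own fit and are not reproduced here. The function name `ols_two_regressors` is hypothetical.

```python
def ols_two_regressors(a, b, y):
    """Least-squares weights (w_a, w_b) for y ~ w_a * a + w_b * b, without intercept."""
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * t for u, t in zip(a, y))
    sby = sum(v * t for v, t in zip(b, y))
    det = saa * sbb - sab * sab            # determinant of the 2x2 normal equations
    if det == 0.0:
        raise ValueError("the two forecasts are collinear")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b

# Made-up forecasts of the log price from two models, and a made-up target.
forecast_a = [8.40, 8.43, 8.41, 8.46, 8.44]
forecast_b = [8.38, 8.44, 8.40, 8.47, 8.42]
target = [1.5 * u - 0.5 * v for u, v in zip(forecast_a, forecast_b)]

w_a, w_b = ols_two_regressors(forecast_a, forecast_b, target)
stacked = [w_a * u + w_b * v for u, v in zip(forecast_a, forecast_b)]
print("weights:", w_a, w_b)
```

Because the two forecasts are highly correlated, the normal equations are close to singular, which is one reason why fitted stacking weights can take large values of opposite sign.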
Model       MAE     RMSE
Stacking    0.07    0.09
Figure 18 – Stacking accuracy
Figure 19 – Comparison of Forecasted Values with Test Data
The accuracy has markedly increased, as demonstrated by Figure 19, which clearly illustrates the rise
in price. However, there is a slight overestimation of this price for almost 20% of the displayed data,
representing the new test data for linear regression.
As a result, we will maintain this model as a benchmark for our trading strategy, given its superior
performance in predicting recent data.
3.7 Extension
Another aspect yet to be investigated is the presence of long memory in our data. A time series
characterized by long memory displays an autocorrelation function that decays slowly towards zero.
In our analysis, when examining the ACF of the series $\left( \log \frac{P_t}{P_{t-1}} \right)^2$, we observe this behavior. Thus, it is apparent that our series exhibits long memory, suggesting that FARIMA (Fractional AutoRegressive Integrated Moving Average) models would be more suitable for modeling this type of data.
Figure 20 – ACF of Squared Log Returns of the S&P 500 Prices
The selection of the autoregressive order p and moving average order q was based on opting for the
model with the lowest Akaike Information Criterion (AIC), yielding p = 2 and q = 2.
After training the model, the fractional differencing value was found to be d=0.49, which is lower than
0.5. As a result, the model is expressed by the equation :
$(1 - \phi_1 L - \phi_2 L^2)(1 - L)^{0.49} X_t = (1 + \theta_1 L + \theta_2 L^2)\,\epsilon_t$

where $L$ is the lag operator, and $X_t = \left(\log \frac{P_t}{P_{t-1}}\right)^2$.
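To make the fractional differencing operator $(1 - L)^d$ more concrete, the sketch below expands it into its lag weights using the standard binomial recursion $\pi_0 = 1$, $\pi_k = \pi_{k-1}(k - 1 - d)/k$, and applies a truncated version of the filter. The function names and the truncation length are illustrative assumptions and do not describe the estimation routine used in the project.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of (1 - L)^d = sum_k pi_k L^k."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d, n_weights=50):
    """Apply the truncated filter: out[t] = sum_k pi_k * series[t - k]."""
    w = frac_diff_weights(d, n_weights)
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k in range(min(t + 1, n_weights)):
            acc += w[k] * series[t - k]
        out.append(acc)
    return out


# Weights for the estimated d = 0.49: they decay slowly, unlike d = 1,
# where only the first two weights (1 and -1) are non-zero.
weights_049 = frac_diff_weights(0.49, 5)
```

With d = 1 the filter reduces to ordinary first differencing, which gives a quick sanity check of the recursion.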
The validity of a weak FARIMA model hinges on two key assumptions : firstly, that d < 0.5, which is
confirmed in our case, and secondly, that the residuals should be weak white noise. This implies they have
zero mean, finite variance, and are uncorrelated (i.e., no autocorrelation). From the upper plot of Figure
21, the first two conditions appear to be visually satisfied. The last condition is also confirmed, based on
the Ljung-Box test results.
Test        p-value   Conclusion on residuals
Ljung-Box   0.55      No significant autocorrelation
Figure 22 – Residuals Test Results
After having verified the assumptions underlying our model, we can now move forward to evaluate its
predictive accuracy. It is important to highlight that this modeling approach does not discern between
negative and positive returns, as it is based on the squared returns which constitute a long memory series.
Additionally, it is not capable of predicting the actual price of the index.
The results of the forecasting are depicted in Figures 23 and 24.
Figure 23 – Forecast of squared returns
Figure 24 – Forecast of absolute returns
4 S&P500 Volatility Prediction

4.1 Examining Log returns
We first take a look at the series we are going to use in the GARCH model, the log returns defined by $X_t = \log\left(\frac{P_t}{P_{t-1}}\right)$, where $P_t$ is the closing price of day $t$.
By looking at the tails of the fitted distribution in green we can observe that it is heavier than the
one in red, which corresponds to a normal one, therefore suggesting to use a different distribution than
the normal one. To confirm this intuition we use the Jarque Bera test to check if the distribution of log
returns is a normal one.
Test          Hypotheses           Test statistic   p-value
Jarque-Bera   H0 : Normality       26452            <0.0001
              H1 : Non-normality
The table above confirms that the distribution is not normal, thus a distribution similar to the normal one but with heavier tails is preferred. Consequently, we decided to use the Student or generalized error distribution when fitting the GARCH process later on.
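For reference, the log returns and the Jarque-Bera statistic can be computed as in the following sketch. It uses the textbook form JB = n/6 · (S² + (K − 3)²/4), with S the sample skewness and K the sample kurtosis, and the chi-square(2) survival function exp(−JB/2) for the p-value. The function names and the toy prices are hypothetical; the statistic of 26452 reported above comes from the project's data, not from this snippet.

```python
import math


def log_returns(prices):
    """X_t = log(P_t / P_{t-1}) for consecutive closing prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def jarque_bera(x):
    """Return (JB statistic, asymptotic p-value) for a sample x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under H0 the statistic is asymptotically chi-square with 2 degrees
    # of freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical closing prices, for illustration only.
toy_prices = [100.0, 101.0, 99.5, 100.2, 103.0, 102.1]
toy_stat, toy_p = jarque_bera(log_returns(toy_prices))
```

With a statistic as large as the one reported in the table, exp(−JB/2) is numerically zero, which is consistent with the reported p-value below 0.0001.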
4.2 GARCH
We first begin by splitting the available data into two sets : one for training purposes representing
80% of the data set, and the remaining 20% for testing and validation purposes. One can note that the
splitting in time series analysis is different than the one traditionally used for machine learning : we split
the data in a continuous manner rather than randomly picking observations from the data set in order to
account for the natural dependency present in this type of data.
Figure 25 – Train-test split of the data set.
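The chronological split described above can be sketched as follows. The helper name `chronological_split` is hypothetical; the 80% fraction is the one stated in the text.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling.

    The first train_fraction of the observations form the training set,
    the remaining most recent observations form the test set.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Toy example with ten ordered observations.
train, test = chronological_split(list(range(10)))
```

Every test observation is later in time than every training observation, which is what prevents information from the future leaking into the fit.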
In order to model volatility, the first step is to find the suitable ARMA process for the conditional
mean of the series. We will be using the log returns of the S&P 500 as our series, and by looking at the plot
of the series as well as the ACF and PACF, we observe signs of stationarity. To confirm our observations,
we will use the augmented Dickey-Fuller test : We obtain a test statistic of -18.42 and a p-value lower than
0.01, thus confirming the stationarity of the log returns.

Test                      Hypotheses            Test statistic   p-value
Augmented Dickey-Fuller   H0 : not stationary   -18.42           <0.01
                          H1 : stationary
Table 2 – Stationarity of log returns.
To choose the best ARMA model for the conditional mean, we fit several models of different orders, namely ARMA(p,q) for p and q in {1, 2, 3, 4}, and select the best one by AIC minimization : the chosen model is an ARMA(3,4). As a diagnostic of this choice, we check the autocorrelation of the residuals using a Ljung-Box test with lag 10. A p-value of 0.5681 (> 0.05) means that the null hypothesis of uncorrelated residuals is not rejected. To further motivate the use of a GARCH model to estimate volatility, we use the ARCH LM test to confirm the existence of ARCH effects in the residuals of the mean model.

Test                     Hypotheses                      Test statistic   p-value
Ljung-Box on residuals   H0 : residuals not correlated   8.6238           0.5681
                         H1 : residuals correlated
ARCH LM                  H0 : No ARCH effect             1386.6           <0.0001
                         H1 : ARCH effect
Table 3 – Selected Mean model diagnostic.
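As a worked check of the Ljung-Box result in Table 3, the sketch below computes the statistic Q = n(n + 2) Σ ρ_k² / (n − k) from sample autocorrelations and evaluates the chi-square p-value in closed form. The null hypothesis of this test is the absence of autocorrelation up to the chosen lag. The function names are hypothetical, and using 10 degrees of freedom (the lag, with no adjustment for the fitted ARMA parameters) is an assumption; under that assumption the reported statistic of 8.6238 gives a p-value of approximately 0.568, in line with the reported 0.5681.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def chi2_sf_even_df(q, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = q / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


# p-value implied by the statistic reported in Table 3 (lag 10).
p_table3 = chi2_sf_even_df(8.6238, 10)
```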
The next step is to model the conditional variance of the series, thus giving an estimate of the volatility.
We will use the AIC minimization criterion in order to select the most suitable model among several
GARCH(p,q) ones with p,q in {1, 2, 3}, specifying a student or generalized error underlying distribution.
GARCH(p,q) order   STD         GED
(1,1)              -6.518204   -6.522004
(2,1)              -6.522558   -6.524094
(3,1)              -6.521499   -6.523678
(1,2)              -6.519319   -6.521493
(2,2)              -6.522471   -6.525927
(3,2)              -6.520311   446.339786
(1,3)              -6.519149   -6.523103
(2,3)              -6.522389   -6.525511
(3,3)              -6.521822   -6.523360
Table 4 – AIC for different models.
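The selection rule applied to Table 4 amounts to taking, for each error distribution, the order with the lowest AIC. The short sketch below reproduces this on the values of the table; the dictionary layout and the helper name `best_order` are illustrative choices.

```python
# AIC values copied from Table 4, keyed by GARCH order (p, q).
aic = {
    (1, 1): {"STD": -6.518204, "GED": -6.522004},
    (2, 1): {"STD": -6.522558, "GED": -6.524094},
    (3, 1): {"STD": -6.521499, "GED": -6.523678},
    (1, 2): {"STD": -6.519319, "GED": -6.521493},
    (2, 2): {"STD": -6.522471, "GED": -6.525927},
    (3, 2): {"STD": -6.520311, "GED": 446.339786},
    (1, 3): {"STD": -6.519149, "GED": -6.523103},
    (2, 3): {"STD": -6.522389, "GED": -6.525511},
    (3, 3): {"STD": -6.521822, "GED": -6.523360},
}


def best_order(aic_table, dist):
    """Order (p, q) with the lowest AIC for the given error distribution."""
    return min(aic_table, key=lambda order: aic_table[order][dist])


best_std = best_order(aic, "STD")
best_ged = best_order(aic, "GED")
```

The GED value for order (3,2) is far from all the others, which may point to an estimation issue for that fit; it does not affect the selection since it is the largest value in its column.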
Under the Student distributed errors assumption, the GARCH(2,1) model is the best one with regard to the AIC minimization criterion, whereas with GED errors the GARCH(2,2) model has the lowest AIC. We further examine the performance of the best selected model for each error distribution by computing their MAE, MSE, and RMSE on the training and test sets.
Model   Train MAE   Train MSE   Train RMSE
STD     0.01054     0.000146    0.01211
GED     0.01043     0.000141    0.01191
Table 5 – Errors calculated on training sample for best selected models.
Model   Test MAE   Test MSE   Test RMSE
STD     0.01956    0.000392   0.01995
GED     0.01305    0.00017    0.01312
Table 6 – Errors calculated on test sample for best selected models.
With regards to all the error metrics considered, we can observe that the GARCH model fitted using
the generalized error distribution is the one which achieves minimum values for MAE, MSE and RMSE.
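For completeness, the three error metrics used in Tables 5 and 6 can be computed as in this minimal sketch. The function names and the toy values are hypothetical; how the realized volatility proxy was defined in the project is not specified here.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean Squared Error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error, the square root of the MSE."""
    return math.sqrt(mse(actual, forecast))


# Toy example: a single forecast error of 2 over three observations.
toy_actual = [1.0, 2.0, 3.0]
toy_forecast = [1.0, 2.0, 5.0]
toy_scores = (mae(toy_actual, toy_forecast),
              mse(toy_actual, toy_forecast),
              rmse(toy_actual, toy_forecast))
```

Because the RMSE squares errors before averaging, it penalizes large misses more than the MAE does, which is why the two metrics can rank models differently.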
4.3 Model diagnostics

Table 7 summarizes some of the diagnostics that can be run on both selected models. As we can observe, the Ljung-Box null hypothesis of uncorrelated residuals is not rejected at the 5% significance level, so the standardized residuals can be regarded as white noise. Moreover, the weighted ARCH LM test indicates that no ARCH effects remain in the models. These tests show that the selected GARCH models are able to account for the serial dependence of the S&P500 log returns through the conditional variance equation.
Test             Ljung-Box (Standardized Residuals, lag 1)   Weighted ARCH LM (lag 3)
Hypotheses       H0 : Residuals are not correlated            H0 : No ARCH effect
                 H1 : Residuals are correlated                H1 : ARCH effect
Model STD
Test statistic   1.611                                        0.08739
p-value          0.2044                                       0.7675
Model GED
Test statistic   0.7515                                       0.02663
p-value          0.3860                                       0.8704
Table 7 – Selected models diagnostic.
As a final check of our models, we look at the predicted volatility on the test set. Both models fail to capture the relatively large volatility values at the beginning of 2020, during the COVID period. Beyond that period, they give a stable prediction of volatility, which is consistent with the low variance of the returns after the COVID spikes. Finally, the GARCH(2,1) with Student error distribution tends to give higher volatility estimates than the GARCH(2,2) with generalized error distribution. If we were to choose the best model on error metrics alone, we would go with the latter. However, since their evaluation metrics are close, especially on the training set, we may prefer the GARCH(2,1) with Student error distribution when higher volatility estimates are desirable, which can be the case in financial applications such as the calculation of the Value at Risk (VaR).
Figure 26 – Predicted Volatility on test set
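To illustrate how a GARCH-type model turns past returns into a volatility path, here is a sketch of the simplest case, the GARCH(1,1) recursion σ²_t = ω + α ε²_{t−1} + β σ²_{t−1}. This is a simplification of the GARCH(2,1) and GARCH(2,2) models fitted above, it ignores the ARMA mean equation, and the parameter values are hypothetical, not the estimated ones.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    Returns a list of len(returns) + 1 values: the first is the
    unconditional variance used as a starting point, the last is the
    one-step-ahead forecast after the final observed return.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("parameters must satisfy the stationarity constraints")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters and returns: one large shock followed by calm.
toy_returns = [0.001, -0.05, 0.002, 0.001]
toy_sigma2 = garch11_variance(toy_returns, omega=1e-6, alpha=0.1, beta=0.85)
toy_volatility = [v ** 0.5 for v in toy_sigma2]
```

After a large shock the variance jumps and then decays geometrically at rate β towards its long-run level, so a model of this kind reacts to a spike only after it has occurred. This is one reason why the sudden COVID volatility is hard to anticipate.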
5 A Deep Learning approach
In this section we examine the performance of a neural network approach, an LSTM model, in predicting the S&P500 price.
We proceed in the same way as for the other models, splitting the data into two sets : 80% for training and 20% for testing. Like other machine learning models, deep learning models have hyperparameters that should be tuned in order to improve performance. An LSTM network has many of them, such as the number of LSTM units, the input sequence length, the batch size, the number of epochs and the learning rate. Fine-tuning all of them is computationally expensive, which is why we only tune the following ones : the number of layers, the learning rate and the number of units. We test values in {1, 2}, {0.001, 0.01} and {50, 100} respectively. The best model is the one with a learning rate of 0.001, 1 layer and 100 units. The following plot shows the evolution of the loss on the training set and the test set.
Figure 27 – loss evaluated on training set and test set.
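The grid search over these three hyperparameters can be organized as in the sketch below. It assumes that the values {1, 2} refer to the number of layers and {50, 100} to the number of units, which gives 8 configurations. The `evaluate` callback is a placeholder: in practice it would train an LSTM with the given configuration and return its validation loss. The dummy scoring function is purely illustrative and does not reproduce the project's losses.

```python
from itertools import product

# Assumed search space: 2 x 2 x 2 = 8 configurations.
grid = {
    "n_layers": [1, 2],
    "learning_rate": [0.001, 0.01],
    "n_units": [50, 100],
}


def grid_search(param_grid, evaluate):
    """Evaluate every combination and return (best_config, best_loss)."""
    names = list(param_grid)
    best_config, best_loss = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def dummy_evaluate(config):
    """Placeholder loss, NOT a real training run."""
    return (config["n_layers"] - 1) + config["learning_rate"] + 1.0 / config["n_units"]


best_config, best_loss = grid_search(grid, dummy_evaluate)
```

For time series, the validation loss passed back by `evaluate` should be computed on observations that come after the training window, so that the search itself does not use future information.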
These are the results for LSTM on the testing set.
Figure 28 – Predictions on the test set.
6 A Gradient boosting Approach
This section examines the results obtained with a Gradient Boosting model for forecasting the S&P 500 index. Our approach relies on careful data preparation and feature engineering, with the aim of capturing the intricate patterns present in the financial time series data. We analyze the model's predictive performance using the Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), which quantify the precision and reliability of the forecasts generated by the Gradient Boosting model. In addition, we examine the residual diagnostics : analyzing the residuals lets us assess whether the model captures the underlying data patterns and detect any possible biases or inefficiencies.
6.1 Data preparation and preprocessing

We initiated our study by using the S&P 500 index data, which covers the period from January 1,
2000, to today. This ensures that our analysis takes into account the whole range of market situations
over a duration of two decades. The dataset was obtained from Yahoo Finance, a reputable repository for
financial time series data. In order to capture the time-based relationships within the financial series, we
created lagged features using the logarithmic returns. This procedure entails rearranging the time series
to generate features that represent past time intervals, a method that aids in identifying patterns such
as momentum or mean-reversion. Based on the autocorrelation study, we examined five previous time
intervals and found that recent past values have a substantial impact on the current return.
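The construction of lagged features can be sketched as follows: each row contains the five previous log returns and the target is the current one. The function name `make_lagged_features` and the toy input are hypothetical.

```python
def make_lagged_features(returns, n_lags=5):
    """Build (features, targets) where features[i] holds the n_lags
    returns preceding targets[i], most recent lag first."""
    rows, targets = [], []
    for t in range(n_lags, len(returns)):
        rows.append([returns[t - k] for k in range(1, n_lags + 1)])
        targets.append(returns[t])
    return rows, targets


# Toy series 1..7 so the alignment is easy to read.
toy_features, toy_targets = make_lagged_features([1, 2, 3, 4, 5, 6, 7])
```

The first `n_lags` observations have no complete history and are dropped, so the feature matrix is shorter than the original series by that many rows.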
6.2 Feature engineering

6.3 Choice of Technical Indicators
The technical indicators used in the improved GBM model were chosen deliberately to capture the fundamental patterns and trends in the financial time series data. Log returns are a basic building block of time series analysis : they approximate the percentage change in price and are more stable than raw prices. Temporal dependencies were incorporated by creating lagged features of the log returns, enabling the model to learn from recent market movements. After careful analysis, 5 lagged features were retained as the most suitable choice to capture significant recent events without causing overfitting or excessive complexity.
Furthermore, to offer a comprehensive view of the market, several technical indicators were incorporated alongside the lagged returns.
- The Simple Moving Averages (SMA) and Exponential Moving Averages (EMA) over 20, 50, and 200 days provide a smoothed view of price movements across short, medium, and long-term timeframes.
- The Relative Strength Index (RSI), calculated over periods of 14 and 21 days, assesses the velocity and magnitude of price fluctuations, indicating whether the market is overbought or oversold.
- The Moving Average Convergence Divergence (MACD) and its signal line indicate instances of momentum and trend reversals.
- The MACD Histogram shows the difference between the MACD and its signal line, highlighting changes in momentum.
- The Rate of Change (ROC) measures the speed at which prices move, adding a momentum aspect.
- The Standard Deviation (StdDev) quantifies the level of uncertainty and the potential risk linked to price fluctuations.
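Several of the indicators listed above have simple recursive definitions, sketched below. The EMA uses the common smoothing factor 2/(span + 1) and is seeded with the first price; the MACD periods 12, 26 and 9 are the conventional defaults and are an assumption, since the periods used in the project are not stated. Function names are illustrative.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for t in range(len(prices)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[t + 1 - window:t + 1]) / window)
    return out


def ema(prices, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line, histogram)."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(line, signal)
    histogram = [m - s for m, s in zip(line, signal_line)]
    return line, signal_line, histogram


def roc(prices, period):
    """Rate of change over `period` steps, as a fraction of the past price."""
    return [None] * period + [
        (prices[t] - prices[t - period]) / prices[t - period]
        for t in range(period, len(prices))
    ]
```

All of these features at time t depend only on prices up to t, which matters here: an indicator computed with a centered window would leak future information into the model.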
6.4 Model Training, Parameter Tuning, and Evaluation
We trained our models with a cross-validation scheme specifically designed for time series data. This methodology respects the chronological order of observations, guaranteeing that only historical data is used to forecast future values and hence avoiding look-ahead bias. Our data was partitioned into training and validation sets using a rolling-origin scheme with an expanding window : at each iteration, the model was trained on the initial set of data augmented with the data points added since the previous iteration, and validated on the observations that follow. This approach is well suited to time series data because it mimics the way models are used in practice, incorporating new information as it becomes available.
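The splitting scheme described above, where the training set grows at each iteration, can be sketched as an index generator. The function name and the sizes used in the example are hypothetical.

```python
def expanding_window_splits(n_obs, initial_train, horizon):
    """Return a list of (train_indices, validation_indices) pairs.

    The training set always starts at index 0 and grows by `horizon`
    observations at each iteration; the validation set is the block of
    `horizon` observations immediately after it.
    """
    splits = []
    end = initial_train
    while end + horizon <= n_obs:
        train_idx = list(range(0, end))
        valid_idx = list(range(end, end + horizon))
        splits.append((train_idx, valid_idx))
        end += horizon
    return splits


# Toy example: 10 observations, 4 initial training points, blocks of 3.
toy_splits = expanding_window_splits(n_obs=10, initial_train=4, horizon=3)
```

In every split the largest training index is smaller than the smallest validation index, so the model is never evaluated on data that precedes its training window.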
6.4.1 Hyperparameter Tuning

The hyperparameter tuning process aimed to optimize predictive accuracy while avoiding overfitting. A grid search was performed on both the GBM and XGBoost models, exploring various values for parameters such as tree depth, number of trees, and learning rate. The Gradient Boosting Machine (GBM) was configured with 500 trees, an interaction depth of 30, and a shrinkage rate of 0.1. These parameters were selected based on their performance in the cross-validation procedure, as a trade-off between model complexity and generalizability.
6.4.2 Model evaluation

The models’ performance was assessed using standard metrics, namely Mean Squared Error (MSE),
Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). The derived values were as follows :
Metric                           Value
Mean Squared Error (MSE)         $9.18308 \times 10^{-5}$
Mean Absolute Error (MAE)        0.00554856
Root Mean Squared Error (RMSE)   0.009582839
Table 8 – Performance Metrics
(a) Residuals over the test time period. (b) Predictions over the test time period.
Figure 29 – Comparative visualization of residuals and predictions from the time series model.
We now examine the residuals over the 3-year period of the test data. The Residuals Plot allows us to assess the effectiveness of the Gradient Boosting Machine (GBM) when applied to this time series.
1. Residual Distribution : The residuals appear random. This lack of structure suggests that the model captures the patterns and cycles in the data without obvious misspecification.
2. Residual Variance : The residuals appear homoscedastic over time. This consistency suggests that the model's accuracy is roughly constant throughout the period.
3. Outliers : One large peak stands out and may correspond to an exceptional event. Rare or unexpected events may be missed by the model; these data points should be examined to determine their nature and their impact on model performance.
4. Residual Centering : The residuals hover around the red zero line, showing that the model neither systematically overestimates nor underestimates.
The Residuals Plot indicates that the GBM captures the main structure without systematic bias and fits the time series data well. The outliers, together with the need for a quantitative residual analysis, suggest that the model could still be refined.
7 Deployment
To put the predictions we obtained into practice, we use the prices predicted by the stacking model, together with the predicted volatilities, to implement 2 trading strategies. The first one is based on a simple idea : buy if the predicted price for tomorrow is larger than today's price. With this first strategy, we obtain a return of 0.58, beating the benchmark return of the S&P500, which is equal to 0.50.
Figure 30 – Strategy returns
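The first strategy can be backtested along the lines of the sketch below. It assumes a long-only rule (hold the index for one day when the forecast for tomorrow exceeds today's price, stay in cash otherwise) and ignores transaction costs; these details, the function names and the toy prices are assumptions, since the exact backtest settings are not described in the text.

```python
def strategy_return(prices, predicted_next):
    """Cumulative return of the 'buy if forecast > today' rule.

    prices[t] is the actual close on day t and predicted_next[t] is the
    forecast made on day t for day t + 1.
    """
    wealth = 1.0
    for t in range(len(prices) - 1):
        if predicted_next[t] > prices[t]:
            # Long for one day: earn the realized return from t to t + 1.
            wealth *= prices[t + 1] / prices[t]
    return wealth - 1.0


def buy_and_hold_return(prices):
    """Benchmark: hold the index over the whole period."""
    return prices[-1] / prices[0] - 1.0


# Toy example with hypothetical prices and forecasts.
toy_prices = [100.0, 110.0, 99.0, 108.9]
toy_forecasts = [105.0, 100.0, 105.0, 0.0]
toy_strategy = strategy_return(toy_prices, toy_forecasts)
toy_benchmark = buy_and_hold_return(toy_prices)
```

In this toy example the rule stays out of the market on the day the price falls, so it ends above the buy-and-hold benchmark; on real data the comparison also depends on costs and on how often the forecast direction is right.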
8 Conclusion
In this analysis of the S&P 500 stock index, we employed various statistical and machine learning techniques to predict stock prices and volatility. The methods ranged from traditional time series models such as ARIMA and GARCH to machine learning approaches, namely an LSTM neural network and gradient boosting.
Our findings illustrate the effectiveness of combining ARIMA with GARCH models to enhance predictive accuracy, especially for financial time series data characterized by volatility clustering. The ARIMA(1,1,2)-GARCH(1,2) model, in particular, stood out for its ability to closely align predictions with actual market trends. We also explored the potential of ARIMA-EGARCH models, but they were less accurate than the combined ARIMA-GARCH model. A further aim of this study was to understand and explain how machine learning algorithms can be used in the time series framework for forecasting, as well as the pitfalls and sensitive points to take into account when using this type of algorithm.
In addition, the application of a Long Short-Term Memory (LSTM) neural network presented an
alternative perspective, leveraging the power of deep learning to capture complex, nonlinear relationships
inherent in stock market data.
A key insight from this study is the relevance of selecting appropriate models based on the specific
characteristics of financial time series, such as non-stationarity and volatility clustering.
For future research, we recommend exploring hybrid models that combine the strengths of statistical time series analysis and machine learning. Such models could potentially offer more robust and accurate forecasts by capturing both linear and nonlinear dynamics in stock price movements. Additionally, incorporating external factors such as macroeconomic indicators or market sentiment analysis could further enhance the predictive power of these models. Finally, as a limitation of our findings concerning their deployment, we had hoped to implement a second strategy based on Bollinger bands, which creates a short signal if the predicted price is greater than the upper band and a long signal if the predicted price is lower than the lower band. Because our prediction was always within the bands, we were not able to generate signals for the considered period.
In conclusion, our research contributes valuable insights into the field of financial forecasting, offering
a foundation for both academic exploration and practical application in financial analysis and trading
strategies.
9 Annex
9.1 Annex A - ARIMA
Figure 31 – ARIMA summary
9.2 Annex B - ARIMA-GARCH
Figure 32 – ARIMA-GARCH summary.
9.3 Annex C - ARIMA-EGARCH
Figure 33 – ARIMA-EGARCH summary.
9.4 Annex D - FARIMA
Figure 34 – FARIMA summary.
9.5 Annex E - Residuals diagnostic for stacking
10 Personal note
10.1 Assili Mohamed
In this project, I worked on the third part which involved modeling the S&P 500 index price using a
special type of model called ARIMA-GARCH. This model combines two methods to better predict how the
index price might change. Additionally, I tested another model, called FARIMA, on the squared returns of the index. This showed that our data has "long memory".
A major challenge in this project was making sure the models were based on correct assumptions. It
was difficult to confirm that these assumptions were right. When they weren’t, I had to find ways to make
the models work properly. This part of the project required careful checking and coming up with different
solutions to ensure the models were reliable and gave good results.
To finish, I would like to thank our professor for his guidance throughout this project. His expertise and timely answers to my questions greatly contributed to this study.
10.2 El Fikri Ilyass

During this project, my primary focus was on predicting the volatility of the S&P 500 using the
GARCH model, a complex and sophisticated tool for financial time series analysis. This task allowed me
to fully utilize and apply the extensive knowledge I gained from my time series courses. One of the critical
aspects of this work involved ensuring that various assumptions inherent in the GARCH model were
verified, which is crucial for the model’s accuracy and reliability. Additionally, I also engaged in working
with LSTM (Long Short-Term Memory) models as an approach to model S&P500 index price.
In addition to my individual contributions, I collaborated closely with my colleague Mohamed on
implementing a trading strategy that enabled us to apply our models in real-time conditions. This phase of
the project was particularly challenging, as it required us to adapt and modify our theoretical knowledge
to suit the unpredictable nature of financial markets. However, it was also incredibly rewarding, as it
allowed us to see the practical implications and potential of our work.
10.3 ERGUN Damien
I’ve been developing gradient boosting models. I was initially hopeful about the Gradient Boosting
Machine (GBM) model’s predictive potential. However, the earliest iterations of the model, which relied
exclusively on prior log returns as predictors, were insufficient. They were unable to capture the complicated
patterns and signals found in financial time series data. This setback motivated me to learn more about
technical analysis. To improve the model's predictive power, I recognized that it needed features that encapsulated the underlying market dynamics without overwhelming it with complexity. The integration of carefully chosen technical indicators was a turning point. The Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD), and other indicators gave a distilled view of market sentiment and trends. The challenge was to strike a balance : with too few indicators, the model would miss important information; with too many, it could overfit or become bogged down by noise. After much experimentation and fine-tuning, I identified a selection of indicators that complemented the historical log returns, allowing the GBM model to substantially improve its predictions.
10.4 Kenza Chebil
11 Bibliography
References
[1] Weiwei Jiang, "Applications of deep learning in stock market prediction : Recent progress," Department of Electronic Engineering, Tsinghua University, 2020.
[2] Robert F. Engle III, "Risk and volatility : Econometric models and financial practice," Nobel Lecture, 2003.
[3] Tim Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, pp. 307–327, 1986.
[4] Sreelekshmy Selvin, R. Vinayakumar, E.A. Gopalakrishnan, Vijay Krishna Menon, K.P. Soman, "Stock price prediction using LSTM, RNN and CNN-sliding window model," Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Amrita University, India, 2017.
[5] Prapanna Mondal, Labani Shit, Saptarsi Goswami, "Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices," International Journal of Computer Science, Engineering and Applications (IJCSEA), vol. 4, no. 2, 2014.
[6] Ayodele A. Adebiyi, Aderemi O. Adewumi, Charles K. Ayo, "Stock Price Prediction Using the ARIMA Model," School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal, Durban, South Africa, 2014.