
NATIONAL ECONOMICS UNIVERSITY

Faculty of Mathematical Economics


-------***-------

ASSIGNMENT
ECONOMETRICS 2

Topic: Application of ARIMA and ARCH-GARCH models in forecasting stock prices

Full name: Nguyen Ngoc Minh


Class: Actuary 63
Student’s ID: 11219238

Hanoi, 2023
I. Introduction
Predicting the future movement of stock prices and their volatility is a common area of interest
for both financial professionals and academics. For investors, understanding the future behavior
of stock prices allows them to make informed decisions about how to allocate their assets in
order to maximize their returns. Similarly, financial scholars believe that accurately predicting
the prices of capital assets is crucial for developing more precise asset pricing theories. Over the
years, researchers have developed a variety of quantitative modeling techniques to forecast stock
prices and volatility, including combinations of ARIMA and GARCH models.
Within the scope of the Econometrics II course, this study applies ARIMA and
GARCH models to predict the closing prices and volatility of Binh Duong Mineral and
Construction JSC’s shares over the 10 trading days between August 1st, 2023 and August 14th, 2023.

II. Theoretical Framework


II.1. ARIMA Model
ARIMA stands for "Autoregressive Integrated Moving Average." It is a forecasting method used
in statistics and econometrics, particularly in time series analysis. ARIMA models are used to
understand a time series or to forecast its future values. They are based on the idea that, once
the series has been made stationary, its current value can be predicted from its own past values
and from the shocks accumulated in the past.
ARIMA models are defined by three parameters:
Autoregressive (AR): The AR term represents the dependence of the current value of the time
series on its past values. The number of AR terms in a model is denoted by the order p.
Integrated (I): The I term represents the number of times the time series must be differenced to
make it stationary. Stationarity means that the statistical properties of the time series, such as its
mean and variance, do not change over time. The number of I terms in a model is denoted by the
order d.
Moving Average (MA): The MA term represents the dependence of the current value of the time
series on its past errors or noise. The number of MA terms in a model is denoted by the order q.
The general ARIMA(p, d, q) model is:
$$Y_t = \mu + (\phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p}) + (\varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q})$$

where $Y_t$ is the series after it has been differenced d times (and is therefore stationary), $\mu$ is the intercept, $\varepsilon_t$ is the shock to $Y_t$ at time t, p is the order of the AR part with coefficients $\phi_1, \ldots, \phi_p$, and q is the order of the MA part with coefficients $\theta_1, \ldots, \theta_q$. The parameters p, d and q must be non-negative integers. If
d = 0, the ARIMA(p, d, q) model reduces to an ARMA(p, q) model.
In business and finance, ARIMA models are widely used to forecast future quantities and prices
because of their simplicity and their ability to handle non-stationary series through differencing.
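To make the role of the integration order d concrete, the following minimal Python sketch differences a toy series d times and then undoes one round of differencing. The function names `difference` and `undifference` and the toy numbers are illustrative assumptions and are not part of the original study.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference(first_value, diffs):
    """Invert one round of differencing, given the first level of the series."""
    levels = [first_value]
    for step in diffs:
        levels.append(levels[-1] + step)
    return levels


prices = [1.0, 3.0, 6.0, 10.0]            # toy levels, not KSB data
once = difference(prices, d=1)             # [2.0, 3.0, 4.0]
twice = difference(prices, d=2)            # [1.0, 1.0]
restored = undifference(prices[0], once)   # [1.0, 3.0, 6.0, 10.0]
```

Each round of differencing shortens the series by one observation, and forecasts made on the differenced scale have to be cumulated back to obtain forecasts of the level.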
II.2. ARCH - GARCH Models
The ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized
autoregressive conditional heteroskedasticity) models are statistical models used to model
time series data with time-varying volatility. In other words, they are used to model time series
data where the variance of the error term is not constant and instead changes over time.
ARCH models were introduced by Robert Engle in 1982 to model time-varying volatility.
The ARCH model is autoregressive in the squared errors: the conditional variance of the
current error term is a function of the past squared errors. Unlike a standard AR model, which
assumes a constant error variance, the ARCH model allows the variance of the error term to change
over time, so that large past shocks (of either sign) increase the variance of the current
shock.
GARCH models were developed by Tim Bollerslev in 1986 as an extension of the ARCH model.
The GARCH model includes an additional term that represents the dependence of the current
variance on the past variances. This allows the GARCH model to capture more complex patterns
of volatility than the ARCH model.
The ARCH(q) model is:
$$\sigma_t^2 = w + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_q \varepsilon_{t-q}^2 + v_t$$

where $w$ is a positive constant that determines the long-run level of volatility, $v_t$ is the residual at time t, q is
the number of lagged squared errors included in the model, and $\gamma_1, \ldots, \gamma_q$ are non-negative regression
coefficients such that $\sum_{j=1}^{q} \gamma_j < 1$. In addition, the distribution of $\varepsilon_t$ conditional on its lagged values $\varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ is
assumed to be normal.
The GARCH(p, q) model is:
$$\sigma_t^2 = w + \delta_1 \sigma_{t-1}^2 + \cdots + \delta_p \sigma_{t-p}^2 + \gamma_1 \varepsilon_{t-1}^2 + \cdots + \gamma_q \varepsilon_{t-q}^2 + v_t$$

where p is the number of lagged values of the conditional variance, and $\delta_1, \ldots, \delta_p$ are non-negative
regression coefficients such that $\sum_{i=1}^{p} \delta_i + \sum_{j=1}^{q} \gamma_j < 1$.
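The GARCH recursion can be illustrated with a short Python sketch for the GARCH(1, 1) case. It is a simplified illustration under stated assumptions: the residual term v_t is omitted, the starting variance is supplied by the user, and the function name and toy parameter values are hypothetical rather than estimates from this study.

```python
def garch11_variance_path(residuals, w, gamma1, delta1, initial_variance):
    """Conditional variances of a GARCH(1, 1) recursion.

    sigma2[t] = w + gamma1 * residuals[t-1]**2 + delta1 * sigma2[t-1]
    The returned list starts with the initial variance.
    """
    if w <= 0 or gamma1 < 0 or delta1 < 0 or gamma1 + delta1 >= 1:
        raise ValueError("parameters violate the GARCH(1, 1) constraints")
    sigma2 = [initial_variance]
    for eps in residuals:
        sigma2.append(w + gamma1 * eps ** 2 + delta1 * sigma2[-1])
    return sigma2


# Toy shocks and parameters, chosen only to make the arithmetic easy to follow.
path = garch11_variance_path([1.0, 2.0], w=0.1, gamma1=0.2, delta1=0.3,
                             initial_variance=1.0)
# path is approximately [1.0, 0.6, 1.08]
```

The larger second shock raises the next conditional variance, which is the volatility clustering that the model is designed to capture.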

III. Data & Methodology


III.1. Data
For this study, I collected the closing share prices of Binh Duong Mineral and
Construction JSC (HOSE: KSB) over 392 trading sessions between January 1st, 2022 and July
31st, 2023. All observations are obtained from the trading history of KSB stock on the Ho Chi Minh
City Stock Exchange (HOSE). I chose this stock for two reasons. Firstly, KSB JSC is
one of the country’s leading companies in mineral exploration, exploitation and
processing, and in the production and trading of construction materials. Secondly, KSB stock has an
established trading history, as KSB JSC has been listed on HOSE since 2010. These factors
support the credibility and integrity of the company, which greatly affects the value of its stock.
In addition, my dataset includes a growth rate series and a log return series, which I
derived from the original stock price series using the following formulas:
Growth rate: $gy_t = \dfrac{y_t - y_{t-1}}{y_{t-1}}$

Log return: $ly_t = \ln\left(\dfrac{y_t}{y_{t-1}}\right)$
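The two transformations can be sketched in Python as follows. This is a minimal illustration on toy prices, not the KSB data, and the function names are assumptions introduced for the example.

```python
import math


def growth_rates(prices):
    """Simple growth rate: (y[t] - y[t-1]) / y[t-1]."""
    return [(prices[t] - prices[t - 1]) / prices[t - 1]
            for t in range(1, len(prices))]


def log_returns(prices):
    """Log return: ln(y[t] / y[t-1])."""
    return [math.log(prices[t] / prices[t - 1])
            for t in range(1, len(prices))]


closing = [100.0, 110.0, 99.0]   # toy prices
g = growth_rates(closing)        # approximately [0.10, -0.10]
r = log_returns(closing)         # approximately [0.0953, -0.1054]
```

Both series have one observation fewer than the price series, which is why 392 closing prices yield 391 growth rates and 391 log returns.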

Let $\{ksb_t\}$ be the series of closing prices of KSB shares, $\{gksb_t\}$ be the series of growth rates of
KSB shares, and $\{lksb_t\}$ be the series of log returns on KSB shares. A summary of statistics
and the time series plots for all three series in the data set are provided below:
Table 1. Descriptive Statistics

                Closing price   Growth rate   Log return
                {ksb_t}         {gksb_t}      {lksb_t}
Observations    392             391           391
Mean            28946.68        -0.00046      -0.001
S.D.            9229.71         0.032797      0.033012
Minimum         12500           -0.07         -0.07257
Maximum         53000           0.069767      0.067441
Kurtosis        -0.164          0.121533      0.15684
Skewness        0.717           -0.28768      -0.38716

Figure 1. KSB stock prices series


Figure 2. Growth rate series

Figure 3. Log return series

III.2. Methodology
Predicting KSB stock prices using ARIMA model
In this study, I will apply the Box-Jenkins method to choose the most suitable ARIMA model
for each of the three series (closing price, growth rate, and log return). I will then use those models
to forecast the future values of KSB stock.
Figure 4. Box-Jenkins Methods
Step 1: Selecting data
My dataset will be divided into two parts: the training set and the validation set. The validation set
includes the last 10 observations, from July 18th, 2023 to July 31st, 2023 (the dates reported in Tables 8 to 10), while the training set
consists of the rest of the series. This split makes it possible to compare the forecast prices
with the actual prices.
Step 2: Testing for stationary series
I will use the Dickey-Fuller (DF) test to check for stationarity. The null hypothesis is that the series
has a unit root and is therefore non-stationary. If the absolute value of the DF statistic is larger than the
absolute value of the critical value, the null hypothesis is rejected and the series is considered stationary.
Afterwards, I will plot the ACF and PACF of the stationary series to determine the orders. Candidate
orders for the AR part are the lags at which the partial autocorrelation is significant, whereas candidate
orders for the MA part are the lags at which the autocorrelation is significant.
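The decision rule of the DF test can be sketched in Python for the simplest case without drift, where the regression is Δy_t = ρ·y_(t-1) + e_t and the test statistic is the t-ratio of ρ. This is a simplified illustration: it includes no lagged differences, the function names are hypothetical, and the default critical value of -1.95 is the 5% value for the no-drift case used in this study.

```python
import math


def dickey_fuller_tau(series):
    """Tau statistic of the no-drift DF regression, estimated by OLS
    without an intercept: delta_y[t] = rho * y[t-1] + e[t]."""
    lagged = series[:-1]
    delta = [series[t] - series[t - 1] for t in range(1, len(series))]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, delta)) / sxx
    ssr = sum((d - rho * x) ** 2 for x, d in zip(lagged, delta))
    dof = len(delta) - 1                  # one estimated coefficient
    std_err = math.sqrt(ssr / dof / sxx)
    return rho / std_err


def is_stationary(tau_stat, tau_critical=-1.95):
    """Reject the unit-root null when tau is more negative than the critical value."""
    return tau_stat < tau_critical


toy = [0, 1, 0, -1, 0, 1, 0, -1, 0]   # toy oscillating series, not KSB data
tau = dickey_fuller_tau(toy)           # equals -sqrt(7), about -2.646
```

Because DF statistics and critical values are negative, comparing absolute values is equivalent to checking that the statistic lies below the critical value.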
Step 3: Estimating
Having chosen the candidate orders for ARIMA, I will estimate the intercept and coefficients of these
models. I will then select the optimal models based on the significance of their coefficients and
on the Akaike information criterion (AIC).
Step 4: Diagnostic checking
Firstly, I will plot the inverse roots of the AR and MA characteristic polynomials against the unit
circle. If all inverse roots lie within the unit circle, I can conclude that the AR process in the
ARIMA model is stationary and the MA process is invertible.
Secondly, I will use the ACF and PACF correlograms to check for autocorrelation and partial
autocorrelation in the residual series. If the spikes at all orders are insignificant, the residual
series has no autocorrelation or partial autocorrelation. However, if there are significant
spikes at orders smaller than 10 in the ACF and PACF correlograms, the model is
misspecified and must be removed.
Finally, I will perform the Box-Ljung test of the null hypothesis that the residual series is white
noise. If the p-value of the test is higher than 5%, the null hypothesis cannot be rejected and I treat
the residual series as white noise.
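The Box-Ljung statistic behind this test is Q = n(n+2) Σ r_k² / (n − k), summed over lags k = 1, ..., h, where r_k is the sample autocorrelation at lag k. The following Python sketch computes Q on toy data; the function names are assumptions, and the p-value step (a chi-square tail probability) is left out because it requires a statistical library.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


toy_residuals = [1, -1, 1, -1, 1, -1, 1, -1]     # strongly autocorrelated toy data
q_stat = ljung_box_q(toy_residuals, max_lag=1)   # 8.75
```

For this toy series Q = 8.75 exceeds 3.841, the 5% chi-square critical value with one degree of freedom, so white noise would be rejected. For residuals of a fitted ARMA(p, q) model, the degrees of freedom are usually reduced by p + q.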
Step 5: Forecasting
From the models in the comparison table, I will choose three candidate models based on three criteria:
the AR and MA processes are stationary, the residual series is a white noise process and, most importantly,
the AIC value is the smallest. Then, I will use these models to forecast the 10 observations in the
validation data set. In the case of the growth rate series and the log return series, I will apply the
following formulas to convert the forecasts into KSB stock prices for those 10 days:

For growth rate series: $ksb_t = (1 + gksb_t) \times ksb_{t-1}$

For log return series: $ksb_t = \exp(lksb_t) \times ksb_{t-1}$
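The conversion from forecast rates back to prices is recursive: each price is the previous price scaled by the forecast growth rate or log return. A minimal Python sketch with toy numbers follows; the function names are illustrative assumptions.

```python
import math


def prices_from_growth(last_price, growth_forecasts):
    """Chain growth-rate forecasts into a price path."""
    path, price = [], last_price
    for g in growth_forecasts:
        price = (1.0 + g) * price
        path.append(price)
    return path


def prices_from_log_returns(last_price, log_return_forecasts):
    """Chain log-return forecasts into a price path."""
    path, price = [], last_price
    for r in log_return_forecasts:
        price = math.exp(r) * price
        path.append(price)
    return path


a = prices_from_growth(100.0, [0.10, -0.50])          # approximately [110.0, 55.0]
b = prices_from_log_returns(100.0, [math.log(1.1)])   # approximately [110.0]
```

The starting value is the last observed price before the forecast window, so an error in an early forecast carries over to all later prices.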

III.3. Results of forecasting stock prices using ARIMA model


Stationarity

Table 2. Comparison between DF test and critical values for stock price series
Without drift         With drift            With trend
τ_stat     τ_0.05     τ_stat     τ_0.05     τ_stat     τ_0.05
-1.1962    -1.95      -1.9427    -2.87      -1.2526    -3.42

As we can see, $|\tau_{stat}| < |\tau_{0.05}|$ in all three cases of the DF test (with trend, with drift and without
drift). Therefore, at the 5% significance level, we cannot reject the unit-root null hypothesis and conclude that the stock price series is
non-stationary.
Table 3. Comparison between DF test statistics and critical values for 1st difference series,
growth rate series and log return series
                          Without drift         With drift            With trend
                          τ_stat     τ_0.05     τ_stat     τ_0.05     τ_stat     τ_0.05
First difference series   -13.4043   -1.95      -13.414    -2.87      -13.5548   -3.42
Growth rate series        -13.9902   -1.95      -13.9757   -2.87      -14.0994   -3.42
Log return series         -13.9262   -1.95      -13.9246   -2.87      -14.0577   -3.42
Table 4. DF test’s coefficient estimation results
                                 Without drift    With drift        With trend
1st difference   Intercept       -                -34.22125         -180.22590 *
                 Lagged values   -0.87599 ***     -0.87837 ***      -0.89230 ***
                 Time trend      -                -                 0.74219 *
Growth rate      Intercept       -                -0.0004194        -0.005089
                 Lagged values   -0.90812 ***     -0.9084152 ***    -0.9206 ***
                 Time trend      -                -                 0.0000238
Log return       Intercept       -                -0.0009087        -0.005765 *
                 Lagged values   -0.90225 ***     -0.9036489 ***    -0.9166 ***
                 Time trend      -                -                 0.000025 *
*, **, ***: significant at 10%, 5%, 1%
From Table 3 and Table 4, at the 5% significance level, we can conclude that the 1st difference series,
the growth rate series and the log return series are stationary around 0.
Autocorrelation and Partial Autocorrelation
Figure 5. ACF and PACF correlograms of 1st difference series

From the ACF and PACF correlograms of the 1st difference series, the candidate autoregressive
orders are p = 1, 2, 4, 6 and the candidate moving-average orders are q = 1, 2, 4, 6.
Figure 6. ACF and PACF correlograms of growth rate series

From the ACF and PACF correlograms of the growth rate series, the candidate autoregressive
orders are p = 1, 2, 3, 6 and the candidate moving-average orders are q = 1, 2, 3, 6.
Figure 7. ACF and PACF correlograms of log return series

From the ACF and PACF correlograms of the log return series, the candidate autoregressive
orders are p = 1, 2, 3, 4, 6 and the candidate moving-average orders are q = 1, 2, 3, 4, 6.
Fitting models
Table 5. Three best ARIMA model specifications for the stock price series
                                    ARIMA(4, 1, 2)      ARIMA(6, 1, 2)      ARIMA(4, 1, 4)
Coefficients
  Significant coefficients          5/6                 4/8                 4/8
Stationarity
  Inverse roots in unit circle      All within          All within          All within
Residual diagnostics
  ACF: significant autocorrelation  None at any order   None at any order   None at any order
  p-value of Box-Ljung test         0.2366              0.34                0.2462
Information criteria
  AIC                               6318.79             6322.98             6321.61

From Table 5, ARIMA(4, 1, 2) has the highest share of significant coefficients (5 out of 6) among
the three models and, most importantly, the smallest AIC. I therefore choose
ARIMA(4, 1, 2) to forecast the stock price.
Figure 8. Unit circle for ARIMA (4, 1, 2)
Figure 8 shows that all inverse roots lie inside the unit circle, so we can conclude that the
ARMA process fitted to the 1st difference series is stationary and invertible.

Figure 9. Residual series for ARIMA(4, 1, 2)

For ARIMA(4, 1, 2), the p-value of the Box-Ljung test is 23.66%, larger than the 10% significance level, so we
cannot reject the hypothesis that the residual series of the model is white noise.
Table 6. Three best ARIMA model specifications for the growth rate series
                                    ARIMA(1, 0, 1)      ARIMA(2, 0, 2)      ARIMA(2, 0, 3)
Coefficients
  Significant coefficients          1/2                 4/4                 5/5
Stationarity
  Inverse roots in unit circle      All within          All within          All within
Residual diagnostics
  ACF: significant autocorrelation  None at any order   None at any order   None at any order
  p-value of Box-Ljung test         0.6167              0.527               0.6649
Information criteria
  AIC                               -1524.95            -1528.56            -1530.19

Table 6 shows that ARIMA(2, 0, 3) has the largest number of significant coefficients and the smallest
AIC; it is therefore the most suitable model to forecast the growth rate.

Figure 10. Unit circle for ARIMA(2, 0, 3)

Figure 10 shows that all inverse roots lie inside the unit circle, so we can conclude that
the ARMA process fitted to the growth rate series is stationary and invertible.

Figure 11. Residual series for ARIMA(2, 0, 3)


For ARIMA(2, 0, 3), the p-value is 66.49%, larger than the 10% significance level, and the only
significant spike occurs after order 10, so we cannot reject the hypothesis that the residual series of the model is white noise.
Table 7. Three best ARIMA model specifications for the log return series
                                    ARIMA(2, 0, 6)      ARIMA(6, 0, 2)      ARIMA(4, 0, 2)
Coefficients
  Significant coefficients          5/8                 5/8                 6/6
Stationarity
  Inverse roots in unit circle      All within          All within          All within
Residual diagnostics
  ACF: significant autocorrelation  None at any order   None at any order   Partial autocorrelation at order 6
  p-value of Box-Ljung test         0.5554              0.302               0.09385
Information criteria
  AIC                               -1519.38            -1520.4             -1515.6
ARIMA(4, 0, 2) has the largest number of significant coefficients. However, its residuals show a
significant spike at order 6, and its p-value is smaller than the 10% significance level, so this
model should be removed. The two remaining models have the same number of significant coefficients, but
ARIMA(6, 0, 2) has the smaller AIC and is therefore chosen to forecast the log return
series.
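The selection rule applied here (drop models whose residuals fail the Box-Ljung test, then take the smallest AIC) can be sketched in Python using the values reported in Table 7. The function name `select_model` and the tuple layout are assumptions made for this illustration; the rule ignores the count of significant coefficients, which the text uses as an additional criterion.

```python
# (model, Box-Ljung p-value, AIC) as reported in Table 7
candidates = [
    ("ARIMA(2, 0, 6)", 0.5554, -1519.38),
    ("ARIMA(6, 0, 2)", 0.302, -1520.4),
    ("ARIMA(4, 0, 2)", 0.09385, -1515.6),
]


def select_model(models, alpha=0.10):
    """Keep models whose residuals pass the Box-Ljung test at level alpha,
    then return the one with the lowest AIC."""
    passing = [m for m in models if m[1] > alpha]
    if not passing:
        raise ValueError("no candidate passes the residual test")
    return min(passing, key=lambda m: m[2])


best = select_model(candidates)   # ("ARIMA(6, 0, 2)", 0.302, -1520.4)
```

A lower (more negative) AIC indicates a better trade-off between fit and the number of parameters, so -1520.4 is preferred to -1519.38.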
Figure 12. Residual series for ARIMA (4, 0, 2)
Figure 13. Unit circle of ARIMA (6, 0, 2)

Figure 13 shows that all inverse roots lie inside the unit circle, so we can conclude that
the ARMA process fitted to the log return series is stationary and invertible.

Figure 14. Residual series for ARIMA (6, 0, 2)


For ARIMA(6, 0, 2), the p-value is 30.2%, larger than the 10% significance level, and the only
significant spike occurs after order 10, so we cannot reject the hypothesis that the residual series of the model is white noise.
Forecasting for validation set
To find the best model among ARIMA(4, 1, 2), ARIMA(2, 0, 3) and ARIMA(6, 0, 2), I will
apply all three models to forecast the validation set. The model with the smallest forecast error will be
the optimal model and will be chosen to forecast future stock prices.
Firstly, I will use ARIMA(4, 1, 2) for the stock price series to forecast the 10 validation observations.
Table 8. Forecasting results for stock price using ARIMA (4, 1, 2)
Date Actual price Fitted price Error
18/7/2023 30800 30958.53 0.51%
19/7/2023 30950 30915.81 -0.11%
20/7/2023 32000 31068.15 -2.91%
21/7/2023 32200 31304.93 -2.78%
24/7/2023 32500 31472.47 -3.16%
25/7/2023 31800 31458.79 -1.07%
26/7/2023 31850 31253.61 -1.87%
27/7/2023 31500 30951.40 -1.74%
28/7/2023 32000 30697.20 -4.07%
31/7/2023 31800 30606.20 -3.75%

Similarly, I will do the same task with ARIMA (2, 0, 3) for growth rate series.
Table 9. Forecasting results for stock price using ARIMA (2, 0, 3)

Date Actual price Fitted price Error


18/7/2023 30800 31259.49 1.49%
19/7/2023 30950 31226.07 0.89%
20/7/2023 32000 31150.98 -2.65%
21/7/2023 32200 31239.35 -2.98%
24/7/2023 32500 31124.99 -4.23%
25/7/2023 31800 31161.82 -2.01%
26/7/2023 31850 31158.42 -2.17%
27/7/2023 31500 31079.00 -1.34%
28/7/2023 32000 31147.03 -2.67%
31/7/2023 31800 31066.66 -2.31%
And for log return series:
Table 10. Forecasting results for stock price using ARIMA (6, 0, 2)
Date Actual price Fitted price Error
18/7/2023 30800 31044.91801 0.80%
19/7/2023 30950 31075.13675 0.40%
20/7/2023 32000 31066.17121 -2.92%
21/7/2023 32200 30983.43785 -3.78%
24/7/2023 32500 30971.54714 -4.70%
25/7/2023 31800 30962.3434 -2.63%
26/7/2023 31850 30886.56637 -3.02%
27/7/2023 31500 30853.57234 -2.05%
28/7/2023 32000 30861.07067 -3.56%
31/7/2023 31800 30803.62703 -3.13%

Assessing Forecast errors


Table 11. Forecast errors
ARIMA (4, 1, 2) ARIMA (2, 0, 3) ARIMA (6, 0, 2)
RMSE 938.97 0.0318375 0.03171015
MAE 699.6962 0.02439185 0.02437886
MASE 0.9977031 0.7323125 0.7376793
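The error measures in Table 11 follow standard definitions, sketched below in Python. The function names and toy inputs are assumptions for illustration; `percentage_error` uses the sign convention of Tables 8 to 10, (fitted − actual) / actual, and `mase` scales the MAE by the in-sample MAE of a naive forecast that repeats the previous value.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, training):
    """MAE divided by the in-sample MAE of the naive forecast y[t-1]."""
    naive_mae = sum(abs(training[t] - training[t - 1])
                    for t in range(1, len(training))) / (len(training) - 1)
    return mae(actual, forecast) / naive_mae


def percentage_error(actual, forecast):
    """Signed relative error of one forecast."""
    return (forecast - actual) / actual


toy_actual, toy_forecast = [100.0, 200.0], [110.0, 190.0]   # toy values
toy_rmse = rmse(toy_actual, toy_forecast)                   # 10.0
toy_mae = mae(toy_actual, toy_forecast)                     # 10.0
first_row = percentage_error(30800, 30958.53)               # about 0.0051, the 0.51% in Table 8
```

RMSE and MAE are expressed in the units of the series being forecast, whereas MASE is scale-free, so MASE is the measure that can be compared between a model for prices and a model for returns.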

From Table 11, the RMSE and MAE of ARIMA(6, 0, 2) are smaller than those of the two other models.
Therefore, ARIMA(6, 0, 2), fitted to the log return series, is taken as the best
model for forecasting. The equation of ARIMA(6, 0, 2) is:

$$r_t = -0.2836\, r_{t-1} - 0.915\, r_{t-2} + 0.1883\, r_{t-3} - 0.0814\, r_{t-4} + 0.0188\, r_{t-5} + 0.0422\, r_{t-6} + \varepsilon_t + 0.4969\, \varepsilon_{t-1} + 0.9302\, \varepsilon_{t-2}$$
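As an illustration of how this equation produces a forecast, the Python sketch below computes a one-step-ahead log return from the six most recent returns and the two most recent shocks, setting the unknown future shock to its expected value of zero. The function name and the input ordering (newest value first) are assumptions made for the example.

```python
# Estimated coefficients of the ARIMA(6, 0, 2) equation above
AR = [-0.2836, -0.915, 0.1883, -0.0814, 0.0188, 0.0422]   # on r[t-1] ... r[t-6]
MA = [0.4969, 0.9302]                                     # on eps[t-1], eps[t-2]


def one_step_forecast(past_returns, past_shocks):
    """Forecast r[t] from past values listed newest first.

    The future shock eps[t] is replaced by its expected value, zero.
    """
    if len(past_returns) < len(AR) or len(past_shocks) < len(MA):
        raise ValueError("need at least 6 past returns and 2 past shocks")
    ar_part = sum(c * r for c, r in zip(AR, past_returns))
    ma_part = sum(c * e for c, e in zip(MA, past_shocks))
    return ar_part + ma_part


# Toy inputs: a single unit return yesterday and no past shocks.
example = one_step_forecast([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0])   # about -0.2836
```

For forecasts more than two steps ahead the MA terms drop out, because all shocks beyond the sample are replaced by zero.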

Forecasting for future


In this part, to forecast the stock price for the next 10 trading days after 31/07/2023, I will use
ARIMA(6, 0, 2) for the log return series, estimated on the full data from 1/1/2022 to 31/07/2023 (a total of 391
observations). The forecasting method is mixed. The following table shows the forecast
KSB stock prices for the next 10 trading days.
Table 12. Future forecasting
Date Actual price Forecast price Error
1/8/2023 31600 31625.68 0.08%
2/8/2023 32100 31637.71 -1.44%
3/8/2023 31750 31648.92 -0.32%
4/8/2023 31900 31545.55 -1.11%
7/8/2023 32600 31534.81 -3.27%
8/8/2023 32049 31556.48 -1.54%
9/8/2023 31800 31467.69 -1.05%
10/8/2023 30500 31415.1 3.00%
11/8/2023 31000 31449.82 1.45%
14/8/2023 31250 31400.42 0.48%

From Table 12, all forecasts have an absolute error of less than 5%, which is within the allowable limit.

III.4. ARCH – GARCH modelling results


III.4.1. Models
As the log return series is stationary around zero, I will forecast the volatility of the log
return of KSB stock prices. I apply the ARCH test to the log return series starting from order 1, and
the p-values of the ARCH test for orders 1 to 90 are all smaller than the
5% significance level. Thus, the series has ARCH effects up to order 90. However, to obtain a better
forecast of volatility, I extend the ARCH specification to GARCH. The candidate
GARCH models are presented in the table below.
Table 13. ARCH – GARCH Models
GARCH    (0,1)              (0,2)              (1,1)
w        0.0008929 [***]    0.0007111 [***]    0.00001563 [***]
γ1       0.09994            0.1056             0.07918 [***]
γ2       -                  0.1839 [**]        -
δ1       -                  -                  0.904 [***]
δ2       -                  -                  -

GARCH(1, 1) has the largest number of significant coefficients. Its δ and γ coefficients are all
non-negative, and their sum is smaller than 1 (0.07918 + 0.904 = 0.98318). Therefore, GARCH(1, 1)
is chosen as the best model to forecast the volatility of KSB’s log return.
The model of GARCH (1, 1) is:
$$\sigma_t^2 = 0.00001563 + 0.07918\, \varepsilon_{t-1}^2 + 0.904\, \sigma_{t-1}^2 + v_t$$

III.4.2. Unconditional variance


The estimated unconditional variance of GARCH (1, 1) is:
$$\sigma^2 = \frac{0.00001563}{1 - 0.07918 - 0.904} = 0.000929251$$
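The calculation above can be checked with a few lines of Python. The sketch also includes the textbook multi-step forecast of a GARCH(1, 1) model, in which the forecast variance reverts geometrically toward the unconditional variance. This is an illustrative assumption and not necessarily the mixed forecasting method used to produce the forecasts in this study; the function names are hypothetical.

```python
# GARCH(1, 1) estimates reported in the text
W, GAMMA1, DELTA1 = 0.00001563, 0.07918, 0.904


def unconditional_variance(w, gamma1, delta1):
    """Long-run variance w / (1 - gamma1 - delta1)."""
    persistence = gamma1 + delta1
    if persistence >= 1:
        raise ValueError("variance is not finite when gamma1 + delta1 >= 1")
    return w / (1.0 - persistence)


def variance_forecast(next_variance, horizon, w, gamma1, delta1):
    """h-step-ahead forecast, starting from the one-step-ahead variance."""
    long_run = unconditional_variance(w, gamma1, delta1)
    return long_run + (gamma1 + delta1) ** (horizon - 1) * (next_variance - long_run)


long_run = unconditional_variance(W, GAMMA1, DELTA1)   # about 0.000929251
```

With a persistence of 0.98318, the gap between the forecast and the long-run variance shrinks by only about 1.7% per day, so shocks to volatility die out slowly.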
III.4.3. Forecast for volatility
In this part, to forecast the volatility of the log return series for the next 10 trading days after
31/07/2023, I use the GARCH(1, 1) model above with the mixed forecasting method. The following
table shows the volatility forecasts for the next 10 trading days.
Table 14. Forecasting for volatility

Date Forecast volatility ($\sigma_t^2$)


1/8/2023 0.00038185
2/8/2023 0.00018522
3/8/2023 0.00018891
4/8/2023 0.00018497
7/8/2023 0.00018561
8/8/2023 0.00021729
9/8/2023 0.00018675
10/8/2023 0.00018485
11/8/2023 0.00018475
14/8/2023 0.00018475

IV. Conclusion and Limitations


This study uses the ARIMA model and the ARCH-GARCH model to forecast the KSB stock price and
its volatility in the short term. From the analysis and the forecasts, the following
conclusions can be drawn.
Firstly, the forecast results for August show that the forecast prices are close to the
actual values, with errors smaller than the allowable limit of 5%. This suggests that the
reliability of ARIMA(6, 0, 2) for the log return series is high. Meanwhile,
GARCH(1, 1) describes the volatility of the log return over the 10 days, which helps to estimate
the risk of KSB and how the return may fluctuate in the future.
However, there are some limitations in forecasting with ARIMA and ARCH-GARCH. The
number of observations is small, so the forecast results may not
reflect the true behavior of the stock. Furthermore, a stock and its volatility are affected by
many systematic and firm-specific risks. For example, in some trading sessions,
major external factors such as investor sentiment, lack of information
about major changes, and unpredictable events in firms and the market might make the forecast
error higher. Thus, the results of the model should be interpreted with care and complemented
by other factors. In general, however, it can be concluded that the ARIMA and ARCH-GARCH models
can be used for forecasting in the short run.
V. References
