
Philip Jansson & Hugo Larsson

ARIMA Modeling
Forecasting Indices on the Stockholm Stock Exchange

ARIMA Modellering
Att förutspå index på Stockholmsbörsen

Bachelor Thesis – Finance


Kandidatuppsats – Finansiell ekonomi

Term: HT-19
Supervisor: Klaas Staal

Karlstad Business School


Karlstad University SE-651 88 Karlstad
Phone: +46 54 700 10 00
E-mail: handels@kau.se kau.se/en/hhk
Acknowledgment

First and foremost, we would like to express our gratitude to our supervisor, Klaas Staal, for his
support and guidance throughout the process of writing this thesis. We would further like to thank
Karl-Markus Modén for his input on how to improve the quality of the thesis, Jari Appelgren for the
help with relevant literature, and finally our fellow students for their input during the seminars.

Hugo Larsson & Philip Jansson, Karlstad, January 13th, 2020.

Abstract

The predictability of the stock market has been discussed over a long period of time and is of great
interest to anyone investing in the stock market. Some people argue that the stock market is
impossible to predict, while others believe that the market is somewhat predictable. The purpose of
this study is to construct, evaluate, and compare different ARIMA models’ ability to forecast the
Stockholm Stock Exchange indices, OMXSPI and OMXS30. To find the best-fitted model for each
index respectively, the Expert Modeler tool in the software SPSS is used. In the analysis, the
suggested model is confirmed by following the Box-Jenkins methodology. Furthermore, Akaike’s
information criterion is used to show that this model has the best fit, compared to another estimated
model. Statistical measurements that aim to measure the performance of time series forecasting
models are used to compare the best-fitted model for each index to a model with worse fit, based on
the data. To enable comparison between the models, out-of-sample predictions are made for already
occurred periods and the measurements used measures the mean percentage error from the actual
outcomes. Thereafter, the performances of the best-fitted models are compared to the performances of
models that are used in a previous study, where time series have been predicted using ARIMA
models.

The results of the study showed that the models suggested by SPSS performed better in forecasting
the indices than the models with worse fit to the data. The out-of-sample forecast performance of the
models in this study is in line with that of the models used in a previous study. The mean percentage
errors for the forecasted values of OMXSPI and OMXS30 were 4.74% and 2.22%, respectively. The
mean absolute percentage errors were 6.00% for OMXSPI and 4.35% for OMXS30.

Keywords: Forecasting, ARIMA, Index, MPE, MAPE.

Sammanfattning

The predictability of the stock market has been discussed for a long time and is of great interest to
everyone who invests in the stock market. Some claim that the stock market is completely
unpredictable, while others argue that it is partially predictable. Studies have been conducted with the
aim of creating tools that predict movements in the stock market. This study aims to construct,
evaluate, and compare different ARIMA models' ability to forecast the Stockholm Stock Exchange
indices OMXSPI and OMXS30. To find the model best fitted to the data for each index, the Expert
Modeler tool in SPSS is used. In the analysis, the suggested model is confirmed by following the
so-called Box-Jenkins methodology, and Akaike's information criterion is used to show that the
model is best fitted to the data compared with an alternative model. Using statistical measures that
evaluate the performance of time series forecasting models, the best-fitted model for each index is
compared with a model that fits the data worse. To enable comparison between the models, a period
that has already occurred is forecasted, and the measures used express the average deviation per
forecasted value from the actual outcome. In addition to this comparison, the performance of the
best-fitted models is compared with the performance of models used in a previous study, in which
time series were forecasted with ARIMA models.

The results of the study show that the models suggested by SPSS performed better in forecasting the
indices than the models that fit the data worse. Compared with the models from the previous study,
the best-fitted models in this study performed on par. The mean percentage error for the forecasted
values of OMXSPI and OMXS30 was 4.74% and 2.22%, respectively. In absolute terms, the mean
deviation was 6.00% for OMXSPI and 4.35% for OMXS30.

Keywords: Forecasting, ARIMA, Index, MPE, MAPE.

Table of Contents
1. Introduction
1.1. Research Problem
1.2. Purpose
1.3. Methodology
1.4. Delimitations
1.5. Disposition
2. The Critiques of the Efficient Market Hypothesis
3. Time Series Econometrics
3.1. Time Series Data
3.2. Stochastic Processes
3.3. Autoregressive Model
3.4. Stationary Process
3.5. Nonstationary Process
3.6. Integrated Process
3.7. Deterministic Trend
3.8. Modeling of Time Series Data
3.8.1. Autoregressive Process
3.8.2. Moving Average Process
3.8.3. Autoregressive and Moving Average Process
3.8.4. Autoregressive Integrated Moving Average Process
4. Empirical Strategy
4.1. Detecting Stationarity
4.2. The Box-Jenkins Methodology
4.3. Ljung-Box Statistic
4.4. The Expert Modeler
4.5. Akaike's Information Criterion
4.6. Mean Percentage Error & Mean Absolute Percentage Error
4.7. OMXS30 & OMXSPI
5. Literature Review
5.1. Forecasting the Indian Stock Market
5.2. Comparison of Forecasting Models' Accuracy
5.3. Building a Forecasting Model Using the Box-Jenkins Methodology
5.4. Comparing the AIC of Different Models to Find the Best Fit
6. Analysis
6.1. Descriptive Statistics & Line Charts
6.2. Modeling OMXSPI
6.3. Modeling OMXS30
6.4. Comparison to Previous Studies
7. Conclusion
8. References & Data Sources

1. Introduction
Forecasts are predictions of the future that are based on what is occurring in the present, but also on
what has occurred in the past. Well-allocated and diversified financial assets in the stock market give
a high probability of substantial long-term growth. Therefore, predicting the stock market has
become important and widely popular. However, due to the nature of high volatility and the stochastic
behavior of the stock market, forecasting has proven to be a difficult task. Even though perfect
predictions over time are impossible, there are several models that are relatively accurate when
forecasting a time series such as stock values.

The autoregressive integrated moving average, also known as an ARIMA model, is a forecasting
approach that has become popular. In this paper, different ARIMA models are constructed to forecast
the two major indices OMXS30 and OMXSPI, on the Stockholm Stock Exchange, and the accuracy
of the different models is compared. Out-of-sample predictions are made for periods that have already
occurred to allow for evaluation of the models' forecasting accuracy.

“A Prediction Approach for Stock Market Volatility Based on Time Series Data” is an article where
two Indian stock market indices are modeled using ARIMA modeling (Idrees et al. 2019). The article
will be used as the starting point for the empirical analysis in this thesis and the approach will be
similar to the one in that article when constructing the statistical models to forecast OMXSPI and
OMXS30.

The efficient market hypothesis states that a stock market displays full information and all the
information is instantly included in the price of the stocks listed on the market. Therefore, the market
is assumed to be efficient. According to the hypothesis, stock values exhibit the characteristics of a
random walk and therefore cannot be predicted. However, statisticians and economists have
challenged the hypothesis and claimed that stock values are predictable to some extent by suggesting
several statistical methods and theories that are somewhat efficient in forecasting future stock values.
For example, challengers of the hypothesis have found evidence that stock markets exhibit seasonal
patterns and that stocks are partially dependent on their past values, making future values somewhat
predictable, violating the hypothesis.

1.1. Research Problem

The Expert Modeler in SPSS will be used to estimate appropriate ARIMA models for the time series
and the models will be validated by following the Box-Jenkins methodology. The out-of-sample
forecasting accuracy of the suggested models will be compared to the out-of-sample forecasting
accuracy of models with worse fit. Akaike's information criterion (AIC) will measure the goodness of
fit, and the mean percentage error (MPE) and the mean absolute percentage error (MAPE) will
measure the out-of-sample forecasting accuracy.

1.2. Purpose

The purpose of this study is to construct, evaluate, and compare different ARIMA models’ ability to
forecast the Stockholm Stock Exchange indices, OMXSPI and OMXS30. Akaike’s information
criterion, the mean percentage error, and the mean absolute percentage error will be used to compare
different ARIMA models.

A study has previously been conducted in which the best-fitted ARIMA models were constructed for
two Indian stock market indices. In that study, the AIC was used to decide which model had the
best fit for the data. The best-fitted models were used for out-of-sample forecasting, and to evaluate
the models' ability to forecast the indices, the MPE for the out-of-sample forecasts was computed. In
addition to the purpose stated in the previous paragraph, this thesis aims to investigate if a similar
result is possible when forecasting the two Swedish indices using an ARIMA model that according to
the AIC has the best fit.

1.3. Methodology

The research will be based on secondary data obtained from the Nasdaq website for the two stock
indices, which represent the monthly observed closing prices. The Expert Modeler in SPSS will be
used to estimate an appropriate ARIMA model based on the data for each index. The procedure of
locating an appropriate model is based on a series of steps, including proper transformation and
differencing, detection of ARIMA pattern, estimation of the parameters, and diagnostic checking of
the residuals through the Ljung-Box statistic. Based on the procedure of the Expert Modeler, the
suggested models will be considered to be the best-fitted models for the data. The Box-Jenkins
methodology will be followed to validate the suggested models. For pattern detection, which is the
first step of the Box-Jenkins methodology, the correlograms of the ACF and PACF with the 95%
confidence interval will be used. To test if the models fulfil the requirements of the third step of the
Box-Jenkins methodology, the Ljung-Box statistic will be used. For the test, the five percent
significance level will be used, and the critical value is taken from the chi-square distribution
table.

After an appropriate model is located, an additional model with worse fit will be estimated for each
index. To illustrate goodness of fit, Akaike’s information criterion will be used. The estimated
ARIMA models will be used to forecast twelve months of the indices for periods that have already
occurred, namely the twelve months of 2019. For the out-of-sample forecasts, the mean percentage
error (MPE) and the mean absolute percentage error (MAPE) will be computed to enable a
comparison of the forecasting accuracy between the best-fitted model and the model with worse fit.

The comparison will be used to evaluate whether the lowest AIC gives the best out-of-sample
forecasting model. Moreover, the accuracy of the best-fitted models will be compared to the result of
a previous study.

1.4. Delimitations

The thesis is limited to monthly data for two indices that represent the overall performance
of the Stockholm Stock Exchange, OMXSPI and OMXS30. The lower boundary is set by the earliest
observations available from the source. The data collected for the indices range
from 1987-01-02 to 2018-12-03, which corresponds to 32 years of observations. Since the data is
observed monthly, there are in total 384 observations for each of the two indices. The reasoning
behind the upper boundary is to be able to forecast the twelve months of 2019 and compare the
predictions to the actual outcomes.

1.5. Disposition

The following section introduces the efficient market hypothesis and some of its critiques with their
corresponding counter-arguments. Thereafter the theoretical framework of time series econometrics is
introduced to create a fundamental understanding of the requirements of time series analysis and
ARIMA forecasting. In the fourth section the empirical strategy is presented, including how to detect
stationarity, the modeling approach of the Expert Modeler, the Box-Jenkins methodology, the
Ljung-Box statistic, Akaike's information criterion, an explanation of the used indices, and the measurements
intended to evaluate the ARIMA models’ out-of-sample forecasting accuracy. After the empirical
strategy, previous research closely related to the topic of this thesis is presented and summarized in
the literature review. The analysis follows the literature review, where the descriptive statistics are
presented, followed by a validation of the Expert Modeler’s suggested models and a comparison of
different models using AIC, MPE and MAPE is performed. After the comparison, the out-of-sample
forecasts of the best-fitted models and the line charts of the forecasts are displayed in the analysis.
The analysis also includes a comparison of the results in this study to the results of previous studies
on similar topics. To summarize the thesis, a conclusion based on the empirical strategy and the
analysis is presented in section seven. The conclusion section also includes suggestions for further
research on this topic.

2. The Critiques of the Efficient Market Hypothesis
The efficient market hypothesis states that a capital market reflects full information which means that
news is instantly incorporated in stock prices. The instant price adjustment with regards to new
information makes the market efficient. A random walk describes changes in a time series where an
observation is a random movement from the last recorded observation. According to the efficient
market hypothesis, the price of stocks follows a random walk where the flow of news and information
is considered to be the stochastic component. Since news is random and instantly incorporated in
the price, at a given level of risk, experts are not expected to receive a higher rate of return than an
investor with no investing experience, as long as that investor holds a diversified portfolio. This
implies that neither technical analysis nor fundamental analysis could increase the rate of return
(Malkiel 2003).

In the following, some of the critiques of the efficient market hypothesis and the counter-arguments as
presented by Malkiel (2003) are given. One critique is that stock prices do not always behave as true
random walks. This is based on evidence that stock prices sometimes move in the same direction too
many times consecutively to be regarded as random and that the serial correlations of stock prices are
different from zero in the short-run. On the other hand, these predictable patterns are hard to take
advantage of since they seem to instantly vanish once they are revealed to the public.

Another critique of the efficient market hypothesis is that there sometimes appears to be seasonality in
the stock market. One phenomenon is what has become known as the “January effect” in which some
stock indices tend to give a greater rate of return during the start of January than for the rest of the
year. A presented counter-argument is that if there in fact was a January effect, investors would buy in
late December and sell in early January. However, if many investors would follow this strategy, one
would have to buy and sell earlier than everyone else in order to take advantage of the arbitrage
possibility. This would eventually move the January effect to December, disturbing the seasonality
(Malkiel 2003).

Another predictable pattern that has been discovered in financial markets is the correlation between
dividend yields and the rate of return. Research has shown that the initial dividend yield of a stock
market can explain up to 40 percent of the variance of a stock market's future rate of return. A
counter-argument to this would be that the dividend yields for 10-year periods during the mid-1980s and
1990s were 3% or lower, while the market's average rate of return was 15%. This means that the
forecasted expected return based on dividend yields would have been lower than the actual outcomes
(Malkiel 2003).

Based on the critiques and the counter-arguments presented above, the debate on the validity of the
efficient market hypothesis remains open.

3. Time Series Econometrics

3.1. Time Series Data

Time series data is defined as a collection of values that a variable takes over time. The intervals
between observations of a time series can vary; however, the interval length should be consistent
throughout the observed period, e.g. daily, weekly, or monthly. In empirical work based on time
series, the series is generally assumed to be stationary (Gujarati & Porter 2008).

3.2. Stochastic Processes

A process is said to be stochastic, or random, if the values of a variable are collected over a
sequence of time. A stochastic process can be either stationary or nonstationary (Gujarati & Porter
2008).

3.3. Autoregressive Model

An autoregressive model is a model where the dependent variable is regressed on at least one lagged
period of itself. If an autoregressive model includes one lagged period of itself, it follows a first-order
autoregressive stochastic process, denoted AR(1). Furthermore, if the model includes p number of
lagged periods of the dependent variable, it follows a pth-order autoregressive process, denoted AR(p)
(Gujarati & Porter 2008).

3.4. Stationary Process


There are different types of stationarity. Second order stationary, commonly known as weakly
stationary, is considered to be sufficient in most empirical works. A stochastic process is weakly
stationary if it has constant mean and variance and the covariance is time invariant, i.e. the statistics
do not change over time (Gujarati & Porter 2008).

A white noise process is a special type of stationary stochastic process. A stochastic process is
considered to be white noise if the mean is equal to zero, the variance is constant, and the
observations are serially uncorrelated (Gujarati & Porter 2008).

3.5. Nonstationary Process
A stochastic process that has a time-varying mean, variance, or covariance is said to be nonstationary.
Financial data usually follows a random walk which is a type of nonstationary stochastic process. A
random walk is either with or without drift, indicating the presence of an intercept, and is an AR(1)
process. Regressing Y_t on Y_{t-1} estimates the following

Y_t = \rho Y_{t-1} + u_t

and if 𝜌 equals 1, the model becomes what is known as a random walk (Gujarati & Porter 2008).

A random walk without drift is a process where the dependent variable can be estimated on one
lagged period of itself plus an error term, assumed to be white noise, known as a random shock. The
formula for a random walk without drift excludes the intercept. The mean is constant over time in a
random walk without drift, however, the variance is increasing indefinitely over time, making it a
nonstationary stochastic process (Gujarati & Porter 2008).

Random walk without drift:

Y_t = Y_{t-1} + u_t

Similar to a random walk without drift, a random walk with drift is a process where the variable is
dependent on its own lagged values and a random shock. However, the model that may be used to
estimate a random walk with drift includes an intercept known as the drift parameter, denoted by 𝛿.
This parameter indicates if the time series is trending upwards or downwards, depending on whether 𝛿
is positive or negative. A random walk with drift is a nonstationary stochastic process since the mean
and variance are increasing over time (Gujarati & Porter 2008).

Random walk with drift:

Y_t = \delta + Y_{t-1} + u_t

The preceding random walks have infinite memory which means that the effects of random shocks
persist throughout the whole time period. The random walks are known as difference stationary
processes, meaning that even though the stochastic processes are nonstationary, they become
stationary through the first order difference (Gujarati & Porter 2008).
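
As an illustration of a difference stationary process, the following Python sketch simulates a random walk with drift and takes its first difference; the drift value and sample size are arbitrary choices for the example and are not taken from the thesis data.

```python
# Illustrative sketch: simulate a random walk with drift, Y_t = delta + Y_{t-1} + u_t,
# and show that its first difference behaves like a stationary series.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 384          # arbitrary sample size for the example
delta = 0.5      # assumed drift parameter
u = rng.normal(0.0, 1.0, size=n)   # white-noise shocks

y = np.cumsum(delta + u)           # random walk with drift
dy = np.diff(y)                    # first difference: delta + u_t

print("mean of first difference:", dy.mean())       # close to the drift delta
print("variance of first difference:", dy.var())    # roughly constant (about 1)
```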

3.6. Integrated Process


A nonstationary stochastic process that has to be differenced one time to become stationary, is said to
be integrated of the first order, denoted 𝐼(1). Likewise, a nonstationary stochastic process that has to
be differenced twice to become stationary, is said to be integrated of the second order, denoted 𝐼(2).
Furthermore, this means that a nonstationary stochastic process that has to be differenced 𝑑 times, is
said to be integrated of order 𝑑, denoted Y_t ~ 𝐼(𝑑). A time series that is stationary without any
differencing is integrated of order zero, denoted Y_t ~ 𝐼(0) (Gujarati & Porter 2008).

3.7. Deterministic Trend

A time series that is deterministic can be perfectly forecasted. However, most time series are partially
deterministic and partially stochastic, making them impossible to predict perfectly due to the
probability distribution of future values (Chatfield 2003).

If a variable is dependent on its past values and a time variable, it is estimated by the following:

Y_t = \beta_1 + \beta_2 t + Y_{t-1} + u_t

where 𝑡 is a variable that measures time chronologically and u_t is an error term, assumed to be white
noise. The equation is known as a random walk with drift and deterministic trend and is stochastic but
also partially deterministic, due to the time trend 𝑡 (Gujarati & Porter 2008).

3.8. Modeling of Time Series Data

When working with forecasting of time series data, the underlying time series is assumed to be
stationary. Assuming stationarity, there are several different approaches to construct forecasting
models, for example an autoregressive process, a moving average process, an autoregressive and
moving average process, and an autoregressive integrated moving average process (Gujarati & Porter
2008).

3.8.1. Autoregressive Process

An autoregressive process may be used to forecast a time series. As mentioned earlier, a first-order
autoregressive model is denoted AR(1) and is Y_t regressed on Y_{t-1}. An autoregressive model of the
pth-order is denoted AR(p) and takes the form of

Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_p Y_{t-p} + u_t

where the constant is denoted by 𝛿 and u_t is white noise (Gujarati & Porter 2008).

3.8.2. Moving Average Process

In a moving average process, the dependent variable is regressed on current and lagged error terms
and is therefore estimated through a constant and a moving average of the error terms. If the
dependent variable is regressed on the current and one lagged error term, it follows a first-order
moving average process, denoted MA(1). Moreover, a model that includes q lagged error terms
follows a qth-order moving average process, denoted MA(q). An MA(q) process is defined as

Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} + \dots + \beta_q u_{t-q}

where the error terms 𝑢 are assumed to be white noise and 𝜇 is the constant (Gujarati & Porter 2008).
In an MA model the error terms are usually scaled to make \beta_0 equal to one (Chatfield 2003).

3.8.3. Autoregressive and Moving Average Process

It is possible to combine an autoregressive process and a moving average process since the dependent
variable often possesses characteristics of both. This is called an autoregressive and moving average
process, or ARMA. If both of the underlying AR and MA models are of the first-order, the model is
denoted ARMA(1, 1) and defined as

Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1}

where 𝜃 is the constant. If the underlying autoregressive model is of order p and the moving average
model is of order q, the ARMA process is denoted by ARMA(p, q) (Gujarati & Porter 2008).

3.8.4. Autoregressive Integrated Moving Average Process

If the time series of an ARMA model has to be differenced a certain number of times to become
stationary, the model becomes what is known as an autoregressive integrated moving average model,
or an ARIMA model. As mentioned previously, a time series which has to be differenced d number of
times in order to become stationary, is integrated of order d, denoted 𝐼(𝑑). In its general form, the
ARIMA model is denoted ARIMA(p, d, q) which means that the AR is of the pth-order, the time
series is integrated d number of times, and the moving average is of the qth-order. This further means
that if the underlying AR and MA models are of the first-order, and the time series is stationary at the
first difference, the ARIMA model is denoted ARIMA(1, 1, 1). It is important to note that an ARIMA
model is not derived from any economic theory, that is, it is an atheoretic model. The Box-Jenkins
methodology can be followed to determine p, d, and q and estimate an ARIMA model (Gujarati &
Porter 2008).
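
To make the notation concrete, the sketch below estimates an ARIMA(1, 1, 1) on a simulated price series with the statsmodels package in Python. The series, the chosen order, and the log transformation are illustrative assumptions and do not correspond to the models estimated in the thesis.

```python
# Hedged sketch: estimate an ARIMA(p, d, q) model on a simulated monthly price series.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.005, 0.05, 200))))  # simulated index levels

model = ARIMA(np.log(prices), order=(1, 1, 1))   # order = (p, d, q)
result = model.fit()
print(result.summary())                          # parameter estimates, AIC, etc.
```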

4. Empirical Strategy

4.1. Detecting Stationarity

There are several different methods to identify whether a time series is stationary or not. Graphical
analysis is a visual approach where the time series is plotted against time. The purpose of the graph is
to decide if there is a trend in the time series or if the time series satisfies the requirements of
stationarity (Gujarati & Porter 2008).

Another method to test for stationarity is by computing the autocorrelation function, also known as
the ACF. The autocorrelation function is the ratio between the covariance at a specific lag, generally
expressed as lag 𝑘, to the variance. At lag 𝑘, \rho_k denotes the ACF and is defined as follows:

\rho_k = \frac{\gamma_k}{\gamma_0}

where \gamma_k is the covariance at lag 𝑘 and \gamma_0 is the variance. The ACF can be plotted by using a
correlogram. In the correlogram, if all or most of the lags are statistically insignificant, there is no
specific pattern, the variance is constant, and the autocorrelations at various lags hover around zero, the
time series could be regarded as stationary. This means that a time series is most likely stationary if
the ACF correlogram resembles a white noise process (Gujarati & Porter 2008).

The choice of number of lags is an empirical question and has no obvious answer (Gujarati & Porter
2008). In this thesis, twelve lags are used for the correlograms. This choice is sensible because the
data is observed monthly and twelve months sum to one year.
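
A correlogram such as the ones used in this thesis can be produced with many packages; a minimal Python sketch with statsmodels is shown below, assuming a series of monthly observations (the simulated series is only a placeholder).

```python
# Sketch: plot the ACF over twelve lags with a 95% confidence band.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(1)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.005, 0.05, 384))))  # placeholder series

plot_acf(prices, lags=12, alpha=0.05)   # alpha=0.05 corresponds to the 95% confidence interval
plt.show()
```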

4.2. The Box-Jenkins Methodology


The Box-Jenkins methodology consists of four consecutive steps that should be followed when
building an ARIMA model. The first step is called identification, and the purpose of this step is to
determine appropriate values for p, d, and q. The ACF and the partial autocorrelation function (PACF)
with their respective correlograms are used for pattern detection of p, d, and q in the first step. The
PACF measures the autocorrelation between observations in a time series that are separated by k
lags, while the intermediate autocorrelations between the lags are held constant. Estimation of
the parameters in the model is the second step. Step three is diagnostic checking, which tests the
chosen ARIMA model’s goodness of fit, usually done by testing if the residuals are white noise. In the
case of residuals that are not white noise, step one, two, and three should be repeated using new
values for p, d, and q. However, if the residuals are white noise, the model should be accepted and it is
possible to proceed to step four. Forecasting is the fourth step where the model may be used to predict
desired periods for the time series (Gujarati & Porter 2008).

Figure 1: The Box-Jenkins methodology.

4.3. Ljung-Box Statistic

To test if there is joint autocorrelation for a certain number of lags, the Ljung-Box statistic may be
used. The Ljung-Box statistic has m degrees of freedom, where m is equal to the number of lags, and
follows the chi-square distribution. Furthermore, it can be used to test if a series is white noise for a
certain number of lags and the Ljung-Box statistic may therefore be used for the third step in the
Box-Jenkins methodology, testing whether the residuals of the estimated ARIMA model are white
noise. If the Ljung-Box statistic is statistically insignificant, there is no evidence suggesting that
residuals are not a white noise process. The Ljung-Box statistic is defined as

LB = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^{2}}{n-k}

where 𝑛 denotes the size of the sample, 𝑚 denotes the number of lags, and \hat{\rho}_k is the autocorrelation at the
𝑘th lag (Gujarati & Porter 2008).
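
As a rough illustration of how this test is applied in step three of the Box-Jenkins methodology, the sketch below computes the Ljung-Box statistic at lag twelve for a set of residuals and compares it with the chi-square critical value used in the thesis; the residuals are simulated white noise, not the residuals of the estimated models.

```python
# Sketch: Ljung-Box test of residual autocorrelation at twelve lags.
import numpy as np
from scipy.stats import chi2
from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = np.random.default_rng(2).normal(size=384)   # placeholder residuals

lb = acorr_ljungbox(residuals, lags=[12], return_df=True)
lb_stat = lb["lb_stat"].iloc[0]
critical_value = chi2.ppf(0.95, df=12)                   # about 21.026

print(f"LB statistic: {lb_stat:.3f}, critical value: {critical_value:.3f}")
# If the statistic is below the critical value, there is no evidence that the
# residuals deviate from a white noise process.
```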

4.4. The Expert Modeler


The Expert Modeler in SPSS is a tool that can be used to locate an appropriate ARIMA model based
on the data. The tool follows a series of steps to find an appropriate fitted ARIMA model for the
underlying time series. The first step of the Expert Modeler is to consider the seasonal length of the
time series. Thereafter SPSS applies the proper transformation to the data, and then the time series is
differenced to make it stationary. When the time series is stationary, the ACF and PACF are used to
detect the pattern for the model. Thereafter, the model is fitted through conditional least squares (CLS)
and maximum likelihood, and statistically insignificant parameters in the model are deleted. Following
the fitting of the model, a diagnostic procedure for the residuals is performed using the Ljung-Box
statistic, and the patterns in the ACF and PACF are considered to test if the residuals are white noise.
The model is thereafter given in the SPSS output (IBM 2018).
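
The Expert Modeler's exact algorithm is internal to SPSS, but its overall idea of searching over candidate specifications can be approximated by a simple grid search that keeps the order with the lowest AIC, as in the hypothetical sketch below; this is only a simplified stand-in, not the procedure SPSS actually uses.

```python
# Hypothetical stand-in for automatic order selection: grid search over small
# (p, d, q) orders, keeping the specification with the lowest AIC.
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
log_prices = pd.Series(np.cumsum(rng.normal(0.005, 0.05, 200)))   # placeholder log series

best_aic, best_order = np.inf, None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        result = ARIMA(log_prices, order=(p, d, q)).fit()
    except Exception:
        continue                      # skip orders that fail to estimate
    if result.aic < best_aic:
        best_aic, best_order = result.aic, (p, d, q)

print("best order by AIC:", best_order, "AIC:", round(best_aic, 2))
```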

4.5. Akaike’s Information Criterion


Akaike’s information criterion, or AIC, is a criterion that may be used to choose the model with the
best fit among different models. Using the AIC, it is possible to evaluate the efficiency of regression
models for both in-sample and out-of-sample forecasting. Generally, adding regressors to a model
provides a better-fitted model. However, adding too many regressors to a model will result in adding
unnecessary information. The AIC penalizes the addition of too much information, and the AIC
increases as a model becomes overfitted. Therefore, the model with the lowest AIC is the model with
the best fit, given that the models have the same regressand. The AIC is defined as

AIC = e^{2k/n} \cdot \frac{RSS}{n}

where 𝑘 denotes the number of estimated parameters in the model, 𝑛 is the sample size, and RSS is
the residual sum of squares (Gujarati & Porter 2008).
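
The formula above can be computed directly from a model's residuals, as in the sketch below; the residuals and the parameter count are placeholder values. Note that most software, including statsmodels, reports a log-likelihood-based AIC on a different scale, so values are only comparable within one definition.

```python
# Sketch: AIC = e^(2k/n) * RSS / n, computed from residuals and the number of parameters.
import numpy as np

def aic_rss(residuals, k):
    """AIC based on the residual sum of squares, as defined in Gujarati & Porter."""
    residuals = np.asarray(residuals, dtype=float)
    n = len(residuals)
    rss = np.sum(residuals ** 2)
    return np.exp(2 * k / n) * rss / n

residuals = np.random.default_rng(4).normal(size=384)   # placeholder residuals
print(aic_rss(residuals, k=2))
```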

4.6. Mean Percentage Error & Mean Absolute Percentage Error


The performance of a forecasting model when predicting the future of a given variable is usually of
interest and several different statistical measurements, intended to evaluate the forecasting accuracy of
a model, have been formulated. The forecasting errors are often included in the measurements and
two measurements that are based on the relative forecasting errors are the mean percentage error
(MPE) and the mean absolute percentage error (MAPE). While some measurements are differently
scaled due to the characteristic of the variable and therefore misleading in comparisons, the MPE and
MAPE are easily comparable since they are measured in percent (Montgomery et al. 2015).
MPE = \frac{1}{n} \sum_{t=1}^{n} \left( \frac{y_t - \hat{y}_t(t-1)}{y_t} \right)

MAPE = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t(t-1)}{y_t} \right|

In the equations above, y_t represents the actual outcome of period 𝑡, \hat{y}_t(t-1) represents the
forecasted value of period 𝑡 predicted at period 𝑡 − 1, and 𝑛 represents the number of periods
predicted (Montgomery et al. 2015).
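
The two measures can be computed directly from the actual and forecasted values, as in the sketch below; the example numbers are the first three OMXSPI months from Table 4 and serve only to illustrate the calculation.

```python
# Sketch: mean percentage error and mean absolute percentage error, in percent.
import numpy as np

def mpe(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean((actual - forecast) / actual) * 100

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

actual = [525.81, 568.09, 589.42]      # first three actual OMXSPI values of 2019 (Table 4)
forecast = [565.44, 566.84, 568.25]    # corresponding forecasts (Table 4)
print(f"MPE: {mpe(actual, forecast):.2f}%  MAPE: {mape(actual, forecast):.2f}%")
```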

4.7. OMXS30 & OMXSPI


The index OMXSPI measures the weighted performance of all the stocks listed on the Stockholm Stock
Exchange (Nasdaq n.d.a). OMXS30 is a weighted index that measures the performance of the 30 most
traded stocks on the Stockholm Stock Exchange (Nasdaq n.d.b).

5. Literature Review

5.1. Forecasting the Indian Stock Market

As mentioned previously, the article “A Prediction Approach for Stock Market Volatility Based on
Time Series Data” is the starting point for the empirical analysis in this thesis. The purpose of the
article is to construct an efficient forecasting model for two indices on two separate Indian stock
markets, namely Nifty and Sensex. In the article, the logarithmic transformation is applied to the data
and two ARIMA models are estimated to forecast the two indices. The best estimated models for the
data were two ARIMA(0, 1, 0) models with drift, and the authors of the article conclude that a correctly
chosen ARIMA model is sufficiently accurate in forecasting time series data. The conclusion is based
on the fact that the predicted values of the used models in the article, on average, deviated by
approximately 5% from the actual outcome, computed by the out-of-sample MPE (Idrees et al. 2019).

5.2. Comparison of Forecasting Models' Accuracy

In the article “ARIMA: An Applied Time Series Forecasting Model for the Bovespa Stock Index” the
MAPE is used to determine which model, among several different forecasting models, is the most
accurate in forecasting the Brazilian stock index Bovespa. Among the models, the authors compare an
autoregressive model, two different exponential smoothing models, and an ARIMA(0, 2, 1). The
Box-Jenkins methodology is followed when building the ARIMA model in the article. The authors
conclude that according to the data, an AR(1) is the most accurate model since it has the lowest
out-of-sample MAPE. The authors further conclude that an AR(1) for the Bovespa stock index is an
adequate model to use as a tool to forecast the index (Rotela Junior et al. 2014).

5.3. Building a Forecasting Model Using the Box-Jenkins Methodology

In their research, Paretkar et al. (2010) followed the Box-Jenkins methodology to build a seasonal
autoregressive integrated moving average, or SARIMA, model intended to forecast the
short-term power flows on transmission interties in the USA. A SARIMA is a modified ARIMA that
should be used if there is a seasonal pattern in the time series that is to be forecasted. Each specific
day of the week followed its own pattern in the data; the authors therefore used weekly data for each
Thursday from January 2006 to May 2008 to build the SARIMA model, intended to forecast 16
Thursdays ahead. The conclusion of the study was that by applying the techniques of the
Box-Jenkins methodology, it is possible to build a model that fits the data, and the chosen model in the
research was sufficiently accurate in forecasting the time series. The study thereby suggests that if
there is a seasonal pattern in a time series, a SARIMA can be sufficiently accurate in forecasting it.
Furthermore, the authors concluded that a SARIMA is more accurate in the short-run than in the
long-run, and the parameters should therefore be re-estimated as time goes on, given that long-term
forecasting is desired.

5.4. Comparing the AIC of Different Models to Find the Best Fit

By using Akaike’s information criterion, Snipes & Taylor (2014) performed a research to discover the
best-fitted model to explain the relationship between the rating of wines and the respective price. In
their research, they used what is known as the AICc which is a slightly modified AIC. Similar to AIC,
the AICc penalizes the addition of unnecessary information to a statistical model and the model with
the lowest AICc score, among different models, has the best fit based on the data. To find the
best-fitted model to explain the relationship, Snipes and Taylor estimated nine different regression
models where the next model included either new or additional information compared to the previous.
The conclusion of the research was that they were able to confirm previous studies, and they also
found an additional variable, not considered in earlier studies, that was significant in explaining the
relationship. Moreover, the authors concluded that additional information in a
regression model does not necessarily improve the regression model’s ability to explain the
regressand, since the model that they found to have the best fit had relatively few regressors compared
to many other estimated models in their research. This further means that the AIC finds a
well-balanced model and a more complex regression model is not always the most accurate.

6. Analysis

6.1. Descriptive Statistics & Line Charts


Table 1 presents the descriptive statistics for OMXSPI and OMXS30. As mentioned in the
delimitations section, the data for each index consist of 384 observations over a span of 32 years.
Figure 2 and Figure 4 illustrate two line charts where OMXSPI and OMXS30 are plotted against time.
The line charts resemble two random walks with drift since the mean and variance are increasing over
time, i.e. they are nonstationary. The decaying autocorrelation as lags increase, illustrated in the ACF
correlograms in Figure 3 and Figure 5 for OMXSPI and OMXS30 respectively, further supports that
the indices are nonstationary.

Table 1: Descriptive statistics of OMXSPI and OMXS30.

Variable    N      Minimum    Maximum     Mean       Standard deviation
OMXSPI      384    37.580     614.960     247.753    160.099
OMXS30      384    102.230    1686.640    779.752    470.586

Figure 2: Line chart of OMXSPI plotted against time.

Figure 3: The ACF correlogram of OMXSPI with the 95% confidence interval.

Figure 4: Line chart of OMXS30 plotted against time.

Figure 5: The ACF correlogram of OMXS30 with the 95% confidence interval.

6.2. Modeling OMXSPI

The best-fitted ARIMA model for the data, according to the Expert Modeler, is an ARIMA(0, 1, 1)
with the natural logarithmic transformation. The model includes zero autoregressive components, one
moving average component, and is integrated of the first order. The purpose of the natural logarithmic
transformation is to reduce the influence of outliers. Since the natural logarithmic transformation is
applied to the data and the ARIMA model is integrated of the first order, the right side of the equation
forecasts the rate of return between two periods. In order to forecast a period, the previous period has
to be moved from the left side to the right side of the equation. In the following paragraphs, the Box-
Jenkins methodology is followed to validate the chosen model.
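
The workflow described here can be sketched in Python as follows: fit an ARIMA(0, 1, 1) to the natural logarithm of a monthly index series and transform the twelve forecast steps back to index levels with the exponential function. The simulated series is a placeholder; the thesis estimates the model on the OMXSPI closing prices in SPSS.

```python
# Sketch: ARIMA(0, 1, 1) on the log of a monthly index series, with back-transformed forecasts.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
index_series = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.006, 0.05, 384))))  # placeholder index

fit = ARIMA(np.log(index_series), order=(0, 1, 1)).fit()
log_forecast = fit.forecast(steps=12)     # twelve-step-ahead forecasts on the log scale
level_forecast = np.exp(log_forecast)     # back to index levels
print(level_forecast.round(2))
```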

According to the Expert Modeler, the natural logarithmic transformation of OMXSPI is stationary
after the first difference. This is confirmed in Figure 6, where ln(OMXSPI_t) − ln(OMXSPI_{t-1}) has been
plotted against time. The line chart shows that the mean and variance are constant over time and the
covariance is time invariant, which are the requirements for a time series to be considered stationary.
The correlogram of the ACF for ln(OMXSPI_t) − ln(OMXSPI_{t-1}) shown in Figure 7 further supports that
the time series is stationary, since it resembles a white noise process. The choice of one moving
average component can be motivated in Figure 7 and Figure 8, where the correlograms of the ACF
and PACF are illustrated respectively for the stationary process of OMXSPI. A significant spike at the
first lag in the ACF, combined with the exponential decline as lags increase in the PACF, suggests a
MA(1) process. Finally, the patterns in the correlograms do not support any evidence for an
autoregressive component.

Figure 6: Line chart of ln(OMXSPI_t) − ln(OMXSPI_{t-1}) plotted against time.

Figure 7: The ACF correlogram of ln(OMXSPI_t) − ln(OMXSPI_{t-1}) with the 95% confidence interval.

Figure 8: The PACF correlogram of ln(OMXSPI_t) − ln(OMXSPI_{t-1}) with the 95% confidence interval.

After identification of p, d and q, the second step in the Box-Jenkins methodology is to estimate the
parameters. The moving average parameter for ARIMA(0, 1, 1) of OMXSPI is statistically
significantly different from zero and the parameter estimate is presented in Table 2.

Table 2: Parameter estimate for the ARIMA(0, 1, 1) of OMXSPI.

Variable    Coefficient    Standard error    t         Significance
MA(1)       -0.185         0.050             -3.675    0.000

The correlograms in Figure 9 and Figure 10 presents the ACF and PACF respectively for the residuals
of OMXSPI’s ARIMA model. Since all values at various lags hover around zero, the variance is
constant, there is no serial correlation, and most lags are statistically insignificant, the residuals
resemble a white noise process. The Ljung-Box statistic at the twelfth lag, equal to 15.609 for the
ACF, further supports the previous statement since the test statistic is smaller than the chi-square
critical value of 21.026, with twelve degrees of freedom. The test statistic is statistically insignificant
and the null hypothesis cannot be rejected at the 5% significance level, i.e. there is no statistical
evidence suggesting that the residuals are not a white noise process. The third step in the Box-Jenkins
methodology is satisfied and the model may therefore be used to forecast the index.

Figure 9: The ACF correlogram of the residuals for the ARIMA(0, 1, 1) of OMXSPI with the 95%
confidence interval.

Figure 10: The PACF correlogram of the residuals for the ARIMA(0, 1, 1) of OMXSPI with the 95%
confidence interval.

The ARIMA(0, 1, 1) model is used to forecast the twelve months of 2019 for the stock market index
OMXSPI. An additional model with worse fit is estimated and used to forecast the same period,
enabling comparison of the out-of-sample forecasting accuracy. The estimated AIC, MPE and MAPE
of the two different ARIMA models for OMXSPI are displayed in Table 3. The AIC of the ARIMA(0,
1, 1) is the lowest at 192.56 and, according to Akaike's information criterion, this model should be
used since it has the best fit. The ARIMA with the best fit also has the lowest out-of-sample MPE and
MAPE, indicating that, between the two models, it is the most accurate in forecasting the twelve
months of 2019.

Table 3: Two ARIMA models of OMXSPI with their corresponding AIC, MPE, and MAPE.

Model             AIC       MPE      MAPE
ARIMA(0, 1, 1)    192.56    4.74%    6.00%
ARIMA(2, 1, 2)    194.02    4.81%    6.10%

In Table 4, the forecasted values of the twelve months of 2019 estimated with the ARIMA(0, 1, 1) and
the actual outcomes are presented for each month. The MPE is approximately 4.74% and the MAPE
is approximately 6% for the out-of-sample forecasted values. In Figure 11, a visual demonstration of
the fitted values in relation to the observed values for OMXSPI is illustrated. The figure also displays
the OMXSPI forecasts.

Table 4: Actual outcomes and forecasts for the OMXSPI ARIMA(0, 1, 1).

Date          Actual    Forecast
2019-01-02    525.81    565.44
2019-02-01    568.09    566.84
2019-03-01    589.42    568.25
2019-04-01    596.31    569.66
2019-05-02    618.52    571.07
2019-06-03    577.59    572.49
2019-07-01    620.82    573.91
2019-08-01    619.54    575.33
2019-09-02    605.11    576.76
2019-10-01    620.25    578.19
2019-11-01    649.27    579.62
2019-12-02    650.83    581.06

Figure 11: Line chart of OMXSPI, the fitted values of the ARIMA(0, 1, 1), and the forecasts.

6.3. Modeling OMXS30


For OMXS30, the Expert Modeler proposes an ARIMA(0, 1, 1) model with a natural logarithmic
transformation, to reduce the influence of outliers, as the best fit. The model includes no
autoregressive component and one moving average component, and is integrated of the first
order. Since the natural logarithmic transformation is applied to the data and the ARIMA model is
integrated of the first order, the right side of the equation forecasts the rate of return between two
periods. Similar to section 6.2, the ARIMA(0, 1, 1) for OMXS30 is validated in the following
paragraphs by following the Box-Jenkins methodology.

The Expert Modeler suggests that the natural logarithmic transformation of OMXS30 is stationary
after the first difference, which is shown in Figure 12. According to the figure, the mean and variance
are constant over time and the covariance is time invariant, i.e. the time series is stationary. Figure 13
presents the correlogram for the ACF of ln(OMXS30_t) − ln(OMXS30_{t-1}), where the resemblance of a
white noise process further confirms that the time series is stationary. The correlograms of the ACF
and the PACF of the stationary process for OMXS30, illustrated in Figure 13 and Figure 14,
respectively, show a significant spike at the first lag in the ACF and an exponential decline as lags
increase in the PACF, confirming that the time series is an MA(1) process. The patterns in the
correlograms suggest that there should not be an autoregressive component included in the ARIMA
model for OMXS30.

Figure 12: Line chart of ln(OMXS30_t) − ln(OMXS30_{t-1}) plotted against time.

Figure 13: The ACF correlogram of ln(OMXS30_t) − ln(OMXS30_{t-1}) with the 95% confidence interval.

Figure 14: The PACF correlogram of ln(OMXS30_t) − ln(OMXS30_{t-1}) with the 95% confidence interval.

As mentioned earlier, the second step in the Box-Jenkins methodology after the identification of p, d
and q, is to estimate the model’s parameters. The parameter for the ARIMA(0, 1, 1) of OMXS30 is
statistically significantly different from zero, and the estimated parameter is presented in Table 5.

Table 5: Parameter estimate for the ARIMA(0, 1, 1) of OMXS30.

Variable    Coefficient    Standard error    t         Significance
MA(1)       -0.165         0.050             -3.276    0.001

The Ljung-Box statistic at the twelfth lag computed for the ACF of the residuals for the ARIMA(0, 1,
1) of OMXS30 is equal to 19.512 and does not exceed the chi-square critical value of 21.026, with
twelve degrees of freedom. Therefore, the null hypothesis cannot be rejected at the 5%
significance level. Since the Ljung-Box statistic is statistically insignificant there is no statistical
evidence suggesting that the residuals are not a white noise process. The ACF and PACF
correlograms of the residuals for the ARIMA model, presented in Figure 15 and Figure 16, further
confirm that the residuals follow a white noise process since the lags hover around zero, the variance
is constant, there is no evidence of serial correlation, and most lags are statistically insignificant. The
white noise residuals satisfy the third step in the Box-Jenkins methodology and the ARIMA(0, 1, 1)
of OMXS30 may be used to forecast the time series.

Figure 15: The ACF correlogram of the residuals for the ARIMA(0, 1, 1) of OMXS30 with the 95%
confidence interval.

Figure 16: The PACF correlogram of the residuals for the ARIMA(0, 1, 1) of OMXS30 with the 95%
confidence interval.

The ARIMA(0, 1, 1) is used to forecast the twelve months of 2019 for the stock market index
OMXS30. To compare the out-of-sample forecast accuracy of the ARIMA(0, 1, 1) to a model with a
worse fit, an ARIMA(2, 1, 2) is estimated and used to forecast the same period. In Table 6, the two
ARIMA models of OMXS30 with their corresponding AIC, MPE and MAPE are presented. The
Expert Modeler’s suggested ARIMA(0, 1, 1) of OMXS30, has the lowest AIC at 2126,91, verifying
that this model has the best fit. Furthermore, the out-of-sample MPE and MAPE are also the lowest

25
for the best-fitted model. This indicates that between the two models, the model with the best fit is the
most precise in forecasting 2019.

Table 6: Two ARIMA models of OMXS30 with their corresponding AIC, MPE, and MAPE.

Model             AIC        MPE      MAPE
ARIMA(0, 1, 1)    2126.91    2.22%    4.35%
ARIMA(2, 1, 2)    2149.42    2.23%    4.40%

In Table 7, the forecasted values of the twelve months of 2019 estimated with the ARIMA(0, 1, 1) and
the actual outcomes are presented for each month. The MPE is approximately 2.22% and the MAPE
is approximately 4.35% for the out-of-sample forecasted values. In Figure 17, a visual demonstration
of the fitted values in relation to the observed values for OMXS30 is illustrated. The figure also
displays the forecasts for OMXS30.

Table 7: Actual outcomes and forecasts for the OMXS30 ARIMA(0, 1, 1).

Date          Actual     Forecast
2019-01-02    1405.85    1536.11
2019-02-01    1525.69    1540.19
2019-03-01    1580.22    1544.28
2019-04-01    1580.33    1548.38
2019-05-02    1665.45    1552.50
2019-06-03    1518.23    1556.62
2019-07-01    1642.00    1560.76
2019-08-01    1612.91    1564.90
2019-09-02    1570.19    1569.06
2019-10-01    1635.64    1573.23
2019-11-01    1737.51    1577.41
2019-12-02    1706.82    1581.60

Figure 17: Line chart of OMXS30, the fitted values of the ARIMA(0, 1, 1), and the forecasts.

6.4. Comparison to Previous Studies


The Expert Modeler suggested an ARIMA(0, 1, 1) with the natural logarithmic transformation for
both OMXS30 and OMXSPI. In the research where Indian stock market indices were forecasted, the
best-fitted model for each index was an ARIMA(0, 1, 0) with drift and the natural logarithmic
transformation. Comparing the estimated out-of-sample MPEs of the two best-fitted models for the
Stockholm Stock Exchange indices to the computed MPEs of the models for the Indian indices, the
MPEs of the models for the Swedish indices are lower. However, even though the MPEs are lower,
the results are in line with each other. The authors of the article on the Indian indices concluded that
the models are sufficiently accurate in forecasting the indices due to MPEs approximately equal to
5%. Since the results in this study are in line with the results of the research where the Indian indices
were forecasted, the best-fitted models in this thesis are considered to be sufficiently accurate in
forecasting OMXS30 and OMXSPI.

The patterns in Figure 2 and Figure 4 suggest somewhat deterministic trends in the time series,
since the lines in the charts are increasing over time. This is further supported by the common idea
that the stock market is steadily increasing in the long-run. Therefore, the time series could to some
extent be considered partially predictable and a forecasting model such as an ARIMA model could be
accurate in forecasting a stock market index. However, perfect predictions would render MAPEs and
MPEs of exactly zero, which is not the case in this thesis. This implies that the indices are partially
random and can therefore not be perfectly predicted over time.

In the research concerning the Bovespa index, the authors evaluate the accuracy of different
forecasting models for the Brazilian stock index Bovespa, by comparing the MAPE of each model.
They found that an AR(1) gave the lowest MAPE and was therefore considered the best model to
forecast the index. For the Swedish stock indices, two ARIMA models with different AICs have been
compared. According to Akaike’s information criterion, the model with the lowest AIC has the best fit
and should be considered the best model. When predicting out-of-sample periods for the Swedish
stock indices, the models with the lowest AIC gave the most accurate forecasts and therefore the
lowest MAPE. Connecting these results to the research of the Bovespa index, the ARIMA(0, 1, 1) for
each index in this paper may be considered adequate to forecast the Swedish stock indices, since the
MAPEs of these models were the lowest.

Similar to the research of Snipes & Taylor (2014), the more complex model in this research,
ARIMA(2, 1, 2), had a higher AIC compared to the ARIMA(0, 1, 1) suggested by the Expert Modeler
for each index. Furthermore, the model with the lower AIC had both lower MPE and MAPE. Not only
does this agree with the conclusion of Snipes and Taylor, stating that complexity is not necessary for a
well-fitted model, it also suggests that a less complex model might give better forecasts.

Looking at the two last predictions for each index, in Table 4 and Table 7, the forecasting errors
increase the further into the future the predictions are made. This agrees with the conclusion in the
previous research of Paretkar et al. (2010), that increasing prediction errors suggest that re-estimations
of the parameters should be performed to enhance the accuracy of long-term forecasting.

7. Conclusion
In the thesis, the best-fitted ARIMA models for OMXSPI and OMXS30 were estimated using the
Expert Modeler in SPSS. The models were used to forecast the twelve months of 2019 for the two
indices respectively. The out-of-sample forecasts and the actual outcomes were used to compute the
MPE and MAPE, enabling comparison to worse fitted models and models used in earlier studies on
stock index forecasting.

To validate the models, the Box-Jenkins methodology was followed. When the Expert Modeler
estimates an appropriate ARIMA model for a time series, one of the steps in the procedure is to
determine the stationary process of the time series, followed by pattern detection in the ACF and
PACF. In the analysis, the same p, d, and q for the ARIMA models were detected through graphical
analysis of the line charts and the correlograms of the stationary processes, confirming the suggested
models. Furthermore, diagnostic checking of the models’ residuals was conducted through the
Ljung-Box statistic and graphical analysis of the correlograms of the residuals. The residuals of the
models resembled white noise processes and the models could therefore be used to forecast the twelve
months of 2019 for OMXSPI and OMXS30. The validation of the suggested models by following the
Box-Jenkins methodology indicates that the Expert Modeler was appropriate for estimating ARIMA
models for the used time series.

In addition to the models suggested by the Expert Modeler, an ARIMA model with a greater AIC was
estimated and used to forecast the two indices. Thereafter, a comparison using MPE and MAPE was
made between the suggested model and the model with the greater AIC. The result showed that the
suggested model for each index, with the lower AIC, was more accurate in forecasting the time series
due to a lower MPE and MAPE for the out-of-sample forecasts.

As mentioned earlier, a previous study whose purpose was to forecast the Indian stock market
indices, Nifty and Sensex, was the starting point for the empirical analysis in this thesis. In that study,
the best-fitted ARIMA model was used to forecast each index and the MPE was computed for the
out-of-sample forecasts, which was approximately equal to 5% for both models. The MPEs computed
for the out-of-sample forecasts in this thesis are lower, at approximately 4.74% for OMXSPI and
2.22% for OMXS30. Even though the results in this research are lower, they are in line with the
results of the study where the Indian indices were forecasted. The major difference is that the
appropriate ARIMA models are different for the two studies and this shows that the appropriate p, d,
and q of an ARIMA model are different for different time series.

Supporters of the efficient market hypothesis believe that a stock market is random and can therefore
not be predicted. However, challengers of the hypothesis argue that a stock market is partially
predictable and it is therefore possible to create an idea of future movements on the market. The
authors of the research on the Indian indices consider their ARIMA models sufficiently accurate in
forecasting the underlying time series, due to MPEs around 5%. Since the results in this research for
the Swedish indices are in line with the results of that study, the best-fitted models in this thesis are
assumed to be sufficiently accurate in forecasting 2019. This does not prove that a stock market is
efficient or inefficient, but it is possible to conclude that a well-informed guess could be better than a
random guess. The increasing prediction errors of more distant forecasts, discussed in the analysis,
indicate that an ARIMA model is more accurate in the short-run compared to the long-run. Therefore, the
parameters should be re-estimated over time to increase the odds of sufficiently accurate predictions
in the long-run.

Further research should focus on comparing the forecasting performance of models such as GARCH
and exponential smoothing with the forecasting performance of an ARIMA model. When forecasting
OMXS30 and OMXSPI, different time periods or different observational frequencies could be of
interest and could give better forecasting results. One of the critiques of the efficient market
hypothesis is that there appears to be seasonality in the stock market. Therefore, if evidence for
seasonality can be found in stock market data, a seasonal ARIMA model, or SARIMA, should be
tested to forecast the time series to examine if better out-of-sample forecasting can be achieved
compared to the results of this study.

8. References & Data Sources
Chatfield, C. (2003). The Analysis of Time Series: An Introduction. 6th ed. Boca Raton: Chapman &
Hall/CRC.

Gujarati, D.N. & Porter, D.C. (2008). Basic Econometrics. 5th ed. New York: McGraw-Hill/Irwin.

IBM (2018). IBM SPSS Modeler 18.2 Algorithms Guide. https://www.ibm.com/support/pages/spss-modeler-182-documentation [2019-12-05]

Idrees, S. M., Afshar Alam, M., Agarwal, P. (2019). A Prediction Approach for Stock Market
Volatility Based on Time Series Data. IEEE Access, 7, 17287-17298. doi:
10.1109/ACCESS.2019.2895252

Malkiel, B. G. (2003). The Efficient Market Hypothesis and Its Critics. Journal of Economic
Perspectives, 17(1), 59-82. doi: 10.1257/089533003321164958

Montgomery, D. C., Jennings, C. L., Kulahci, M. (2015). Introduction to Time Series Analysis and
Forecasting. 2. ed. Hoboken: John Wiley & Sons.

Nasdaq (2019a). Historiska Kurser OMXSPI. http://www.nasdaqomxnordic.com/index/historiska_kurser?Instrument=SE0000744195 [2019-12-10]

Nasdaq (2019b). Historiska Kurser OMXS30. http://www.nasdaqomxnordic.com/index/historiska_kurser?Instrument=SE0000337842 [2019-12-10]

Nasdaq (n.d.a.). Vad är aktieindex?

http://www.nasdaqomxnordic.com/utbildning/aktier/vadaraktieindex?languageId=3 [2019-12-10]

Nasdaq (n.d.b.). Vad är OMX Stockholm 30 index?

http://www.nasdaqomxnordic.com/utbildning/optionerochterminer/vadaromxstockholm30index
[2019-12-10]

Paretkar, P. S., Mili, L., Centeno, V., Jin, K., Miller, C. (2010). Short-term Forecasting of Power
Flows over Major Transmission Interties: Using Box and Jenkins ARIMA Methodology. IEEE PES
General Meeting. doi: 10.1109/PES.2010.5589442

Rotela Junior, P., Riêra Salomon, F. L., de Oliveira Pamplona, E. (2014). ARIMA: An Applied Time
Series Forecasting Model for the Bovespa Stock Index. Applied Mathematics, 5, 3383-3391. doi:
10.4236/am.2014.521315

Snipes, M. & Taylor, D. C. (2014). Model selection and Akaike Information Criteria: An example
from wine rating and prices. Wine Economics and Policy, 3(1), 3-9. doi: 10.1016/j.wep.2014.03.001

