
Time series analysis of NASDAQ Composite based on seasonal ARIMA model

Weiqiang Wang
School of Computer Science, Beijing Institute of Technology, Beijing, China, 10081
Department of Statistics, University of California-Los Angeles, Los Angeles
Weiqiang.wang1983@gmail.com

Zhendong Niu
School of Computer Science, Beijing Institute of Technology, Beijing, China, 10081
zniu@bit.edu

ABSTRACT: The autoregressive integrated moving average (ARIMA) model has been one of the most popular linear models in financial time series forecasting. In this context, a time series analysis of the NASDAQ composite indices is provided to study their movement in 1998-2008. This paper proposes a general expression of seasonal ARIMA models with periodicity and provides parameter estimation and diagnostic checking procedures to model and predict NASDAQ data extracted from the Yahoo website, and also compares the seasonal ARIMA model with other models. Experimental results with NASDAQ data sets indicate that the seasonal ARIMA model can be an effective way to forecast financial data.

Keywords: Seasonal ARIMA; Time series; Nasdaq

I. INTRODUCTION
The NASDAQ (acronym of National Association of Securities Dealers Automated Quotations) is an American stock exchange. It is the largest electronic screen-based equity securities trading market in the United States. With approximately 3,200 companies, it has more trading volume per hour than any other stock exchange in the world.[1]

Most nonstationary economic time series are thought to be driven by at least two factors of consequence. One of these factors is nonstationary and its dynamic is represented in some form of a random walk structure. The other factor is thought to be the cyclical deviation and is usually represented as some form of autoregressive structure. Some of the important applications are Nelson and Plosser (1982), Schwartz and Smith (2000), Shirvani and Wilbratte (2007), and Stock and Watson (1988), to name a few. For macroeconomic applications, some authors allow this component structure to be subjected to regime changes of varying descriptions depending on what hypotheses are being tested (Kim & Piger, 2002; Lo & Piger, 2005). Shirvani and Wilbratte (2007) adopted a component-based approach to decompose stock prices into long-term trends and short-term cyclical components. In turn, they are able to explain the trend component with the help of fundamentals for stock valuations.

In this paper, we investigate seasonal ARIMA as an analytical method for time series of interrelated dynamic data. We have selected NASDAQ data extracted from Yahoo and use seasonal ARIMA models to construct an analysis.

II. DATA
The NASDAQ composite indices data was extracted from the Yahoo website. In this paper, we choose the monthly low and high data respectively. A monthly average is calculated from January 1998 to January 2008. The data is organized in a data frame, with value as rows and days as columns, and the data is 12 × 10 with no missing data, so we have 120 observations for each variable.

[Figure 1: Nasdaq 1998-2008. Time series plot of the indices (vertical axis, about 1500 to 4500) against Time (horizontal axis, 1998 to 2008).]

Figure 1 shows the monthly NASDAQ composite indices from January 1998 through January 2008. There is a strong upward trend apart from the peak around 2000, which reflects the “I.T. bubble”, a speculative bubble covering roughly 1995–2001 (with a climax on March 10, 2000, when the NASDAQ peaked at 5132.52) during which stock markets in Western nations saw their value increase rapidly from growth in the new Internet sector and related fields. In this display, observation suggests that the variation of the series is not constant over time and that there is a trend as well as a fluctuation and

978-1-4244-4639-1/09/$25.00 ©2009 IEEE


seasonal pattern in the data. This NASDAQ time series therefore has to be treated as non-stationary.

III. ARIMA MODEL
In the late 1960s, Box and Jenkins advocated the ARIMA methodology for time series based on finite-parameter models.

An autoregressive integrated moving average (ARIMA) model is fitted to time series data either to better understand the data or to predict future points in the series. The model is generally denoted ARIMA(p,d,q), where p is the number of autoregressive terms, d is the number of non-seasonal differences, and q is the number of lagged forecast errors in the prediction equation.

\[ \left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t \tag{1} \]

In equation (1), L is the lag operator, the \phi_i are the parameters of the autoregressive part of the model, the \theta_i are the parameters of the moving average part and the \varepsilon_t are error terms. The error terms are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.

IV. SEASONAL ARIMA MODEL
The seasonal part of an ARIMA model has the same structure as the non-seasonal one: it may have an AR factor, an MA factor, and an order of differencing. The seasonal autoregressive integrated moving average model of Box and Jenkins (1970) [3] is given by

\[ \left[\nabla^d \nabla_s^D Y_t - \mu\right] = \frac{\theta(B)\,\Theta(B^s)}{\phi(B)\,\Phi(B^s)}\, e_t \tag{2} \]

and is denoted as an ARIMA(p,d,q)×(P,D,Q)s model, where P is the number of seasonal autoregressive (SAR) terms, D is the number of seasonal differences, and Q is the number of seasonal moving average (SMA) terms. We can decide s (the seasonal span) by the time span of each cycle: for quarterly data, we apply s = 4; for daily data, we set s = 7; for monthly data, we set s = 12; and similarly for hourly data. Further,

\[ \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p \tag{3} \]

\[ \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q \tag{4} \]

\[ \Phi(B^s) = 1 - \Phi_1 B^{s} - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps} \tag{5} \]

\[ \Theta(B^s) = 1 - \Theta_1 B^{s} - \Theta_2 B^{2s} - \dots - \Theta_Q B^{Qs} \tag{6} \]

and

\[ \omega_t = \nabla_s^D \nabla^d Y_t \tag{7} \]

The last equation illustrates the multiplicative seasonal behavior, indicating that both seasonal and consecutive differencing may be required to induce stationarity. Seasonal ARIMA should be used with discretion. If we applied the seasonal model to non-seasonal data, the forecast would show a cycle that may be far from the truth. We should make sure the data contains seasonality before applying the model.

V. ESTIMATIONS AND DIAGNOSTIC CHECKING
The strong seasonal autocorrelation relationships are shown in Figure 1. Evidence shows that there is substantial other correlation that needs to be modeled. Clearly we need at least one order of differencing.

[Figure 2: First Differences of log(Nasdaq). Panel title: Time Series Plot of the First Differences of log(Nasdaq) Levels; vertical axis: First Differences of log(Nasdaq), -0.2 to 0.2; horizontal axis: Time, 1998 to 2008.]

Figure 2 shows the time series plot of the log(Nasdaq) levels after we take a first difference. The upward trend has now almost disappeared, but the strong seasonality is still present, as evidenced by the behavior shown in Figure 2. In addition, the inconstant variation still seems to be a big problem. Seasonal differencing may bring us to a series that can be modeled more simply.

A. Parameter Estimation
Applying the SARIMA model to our Nasdaq data gives the estimates shown in the following table:

coefficient    θ         Θ
estimate       0.4109    -0.8581
s.e.           0.1005    0.1384
AIC            1421.46

Table 1. Parameter estimation
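As a minimal sketch of the differencing in equation (7), assuming d = D = 1 and s = 12 (the orders of the model used for the Nasdaq data), the following Python code applies one regular and one seasonal difference. The function names and the toy series are hypothetical and are not part of our analysis: the toy series is a linear trend plus a fixed 12-month pattern, for which the doubly differenced series is identically zero.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def seasonal_and_regular_difference(series, s=12):
    """Apply one regular difference, then one seasonal difference at lag s."""
    return difference(difference(series, lag=1), lag=s)


# Toy data (hypothetical, NOT the NASDAQ series):
# a linear trend plus a fixed 12-month seasonal pattern.
pattern = [5, 3, 0, -2, -4, -1, 2, 6, 4, 1, -3, -5]
toy = [10 * t + pattern[t % 12] for t in range(36)]

# Doubly differenced series, i.e. equation (7) with d = D = 1, s = 12.
w = seasonal_and_regular_difference(toy, s=12)

# The same quantity written out term by term:
# Y[t] - Y[t-1] - Y[t-12] + Y[t-13].
direct = [toy[t] - toy[t - 1] - toy[t - 12] + toy[t - 13]
          for t in range(13, len(toy))]

print(len(w))        # 36 observations lose 1 + 12 = 13 points
print(w == direct)   # both constructions agree
print(set(w))        # trend and fixed seasonal pattern are removed
```

On real data the differenced series would not be exactly zero; what remains after differencing is the part that the moving average terms of the model are meant to describe.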
Thus our seasonal ARIMA(0,1,1)×(0,1,1)12 model is:

\[ Y_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + \varepsilon_t + 0.4109\,\varepsilon_{t-1} + 0.8581\,\varepsilon_{t-12} + (0.4109)(0.8581)\,\varepsilon_{t-13} \tag{8} \]

B. Diagnostic Checking
Diagnostic checking is necessary to ensure the best forecasting model has been built. The coefficient estimates are highly significant and the estimated value is small. However, in order to check the estimated ARIMA(0,1,1)×(0,1,1)12 model, there are some other things to check. Here we use histogram and Q-Q plot approaches for testing normality and identifying outliers.

[Figure: Histogram of the residuals. Panel title: Histogram of NASDAQ; vertical axis: Density; horizontal axis: Nasdaq, -0.2 to 0.2.]

By plotting the histogram of the residuals we can see that it is centered near zero, and the shape of the histogram appears “good shaped”.

Another way to test normality is the Q-Q plot, which also lets us determine whether outliers exist. From the normal Q-Q plot of the residuals, we can see that there are not many outliers; almost all the points lie on the line.

[Fig. 1. QQ-plot of the residuals. Panel title: Normal Q-Q Plot; vertical axis: Sample Quantiles, -0.2 to 0.2; horizontal axis: Theoretical Quantiles, -2 to 2.]

Above all, the diagnostic checks, including the histogram of the residuals and the Q-Q plot of the residuals, indicate that the model fits very well. We conclude that the Nasdaq data can be represented very well by ARIMA(0,1,1)×(0,1,1)12.

VI. FORECASTING
To examine our prediction, we compare the actual values with the predicted values. The result appears in Figure 7.

[Fig. 2. Predicted Value. Panel title: Forecasts from ARIMA(0,1,1)(0,1,1)12; vertical axis: 1000 to 4000; horizontal axis: Time, 2000 to 2008.]

Figure 1 is the actual plot from January 1998 to January 2007; the forecast plot covers the same data from January 1999 to January 2007, with the prediction for 2008 added. Figure 6.1 shows the forecasts and 95% forecast limits for a lead time of one year for the ARIMA(0,1,1)×(0,1,1)12 model. The last year of observed data is also shown. The forecasts mimic the stochastic periodicity in the data quite well, and the forecast limits give a good feeling for the precision of the forecasts.

VII. MODEL COMPARISON
In order to support the results so far, we can compare the seasonal ARIMA model to different models such as the non-seasonal ARIMA model and the mixed model (ARMA). The order (p, d, q) remains unchanged for the non-seasonal ARIMA model, to demonstrate the pure effects caused by not considering seasonality for seasonal data; for the ARMA model, we briefly introduce the process of model identification and apply the result of the identification process to the underlying data. We focus on the difference in the criteria measuring goodness of fit between the models, and the forecast outcome from each model is examined closely.

A. Non-seasonal ARIMA Model
If we apply a non-seasonal ARIMA model to the Nasdaq data with the same power transformation, the parameters for ARIMA(0,1,1) are:
coefficient    θ
estimate       0.3231
s.e.           0.0875
AIC            1549.4

The parameter θ changed slightly without considering the seasonality. However, the AIC becomes worse, going from 1421.46 to 1549.4.

B. Mixed ARMA Model
The mixed model is also called ARMA (autoregressive moving average process). In general, if Y_t is a mixed ARMA process of orders p and q, we abbreviate the name to ARMA(p,q), and it can be defined as:

\[ Y_t = \mu_t + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{9} \]

In this case, we can try the ARMA model with p = 2 and q = 2. The parameter estimation is shown in the following table:

coefficient    φ1        φ2         θ1         θ2
estimate       1.9393    -0.9468    -0.7146    -0.2854
s.e.           0.0261    0.0262     0.0990     0.0964
AIC            1562.28

Table 4. Parameters Estimation for ARMA(2,2) Model

So the ARMA(2,2) model is:

\[ Y_t = 1.9393\,Y_{t-1} + (-0.9468)\,Y_{t-2} + \varepsilon_t + 0.7146\,\varepsilon_{t-1} + 0.2854\,\varepsilon_{t-2} \tag{10} \]

The AIC changes significantly, which indicates that the ARMA model may be an inappropriate one for the Nasdaq data. The result showed increasing ranges of AIC, and using the ARMA model provides a poor description of the Nasdaq data.

C. Concluding Remarks
In this section, we reaffirm that the seasonal ARIMA model is the best choice for our data. The forecast from the seasonal ARIMA model follows a trend similar to that of the original data, while the AIC of the non-seasonal ARIMA model is considerably higher. Further, the mixed ARMA model performs even worse. So we can conclude that when dealing with seasonal or non-stationary data, the integrated model as well as the seasonal model should be considered. Otherwise we would either make an inaccurate forecast or choose the incorrect model fitting process.

VIII. CONCLUSIONS
In this paper, we presented a seasonal ARIMA(0,1,1)×(0,1,1)12 model fitted to NASDAQ data extracted from Yahoo, together with parameter estimation and diagnostic checking procedures for modeling and prediction. We also compared it with the non-seasonal ARIMA model and the mixed ARMA model; the experimental results reaffirm that the seasonal ARIMA model fits our data better than the other models. A forecasting plot has been drawn. The forecast of NASDAQ illustrates the pattern as well as the seasonality of the data. The model will be helpful to predict the NASDAQ composite price indices. So we can see that the seasonal ARIMA model has broad applicability in the field.

REFERENCES
[1] "NASDAQ Performance Report". NASDAQ Newsroom. The Nasdaq Stock Market. 2007-01-12.
[2] www.finance.yahoo.com
[3] G.E.P. Box and G.M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, revised edition, 1976.
[4] Bernhard Pfaff. Analysis of Integrated and Cointegrated Time Series with R. Springer, September 2005.
[5] W.N. Venables and B.D. Ripley. Modern Applied Statistics with S. Springer, January 2002.
[6] Robert F. Nau. Introduction to ARIMA: nonseasonal models, 2005. http://www.duke.edu/rnau/411arim.htm.
[7] J. Maindonald and J. Braun. Data Analysis and Graphics Using R: An Example-Based Approach. Cambridge University Press, Second Edition, 2007.
[8] R.H. Shumway and D.S. Stoffer. Time Series Analysis and Its Applications with R Examples. Second Edition, Springer, 2006.
[9] H. Akaike. Time Series Analysis and Control Through Parametric Models. Applied Time Series Analysis, Academic Press, New York, 1978.
[10] Bovas Abraham and Johannes Ledolter. Statistical Methods for Forecasting. John Wiley & Sons, Inc, 2005.
[11] W. Enders. Applied Econometric Time Series. Wiley, 2003.
