
Chapter 1 Introduction.

Instructor: Pan Guang-Ming

Division of Mathematical Sciences, SPMS


Nanyang Technological University, Singapore
Outline

➊ Time series and forecasts

➋ Differences between time series and i.i.d. random
variables

➌ Components of a time series

➍ Measuring forecasting errors
Some Questions

◮ What is a time series?


◮ What are the purposes of time series analysis?
◮ What are the differences between classical (independent and
identically distributed (i.i.d.)) statistical analysis (e.g. inference and
modelling) and time series analysis?
◮ What are the typical components of a time series?
◮ How do we assess a prediction method?
The definition of time series

Definition
A time series is a sequence of observations over time.
Time series in different fields

Time series occur in a variety of fields.


◮ In business and economics, we observe weekly interest rates, daily
closing stock prices, monthly price indices, yearly sales figures, and
so forth.
◮ In engineering, we observe sound, electric signals and voltage.
◮ In medical studies, we measure electroencephalogram (EEG) and
electrocardiogram (EKG) tracings.
◮ In the social sciences, we study annual birth rates, mortality rates,
accident rates and various crime rates.
The list of areas in which time series are studied is virtually endless.
The annual rainfall in California

Example
Exhibit 1 displays a time series plot of the annual rainfall amounts
recorded in Los Angeles, California, over more than 100 years. The plot
shows considerable variation in rainfall amount from year to year: some
years are low, some high, and many are in between. The year 1883
was an exceptionally wet year for Los Angeles, while 1983 was quite dry.
For analysis and modeling purposes we are interested in whether or not
consecutive years are related in some way. If so, we might be able to use
one year’s rainfall value to help forecast next year’s rainfall amount.
The annual abundance of Canadian hare

Example
Our second example concerns the annual abundance of Canadian hare.
Fig 2 gives the time series plot of this abundance over about 30 years.
Neighboring values here are very closely related. Large changes in
abundance do not occur from one year to the next. There is strong
persistence in the plot: low values tend to be followed by low values in
the next year, middle-sized values by middle-sized values, and high
values by high values.
Time plot for the annual rainfall

Figure: Plot of Annual Rainfall


Time plot for Canadian hare
Continuous Time series against discrete time series

Definition
A time series is said to be discrete when the set T0 of times at which
observations are taken is a discrete set. A time series is said to be
continuous when observations are recorded continuously over some time
interval, e.g. when T0 = [0, 1].
Remarks: We are mainly interested in discrete-time series observed at
equally spaced, fixed time intervals, e.g. observations made monthly,
weekly or daily.
Some Representative Time Series (I)

Example
Unemployment Rate (%) in Singapore

year  rate    year  rate    year  rate
1973  4.4     1984  2.7     1995  2.7
1974  3.9     1985  4.1     1996  3.0
1975  4.5     1986  6.5     1997  2.4
1976  4.4     1987  4.7     1998  3.2
1977  3.9     1988  3.3     1999  4.6
1978  3.6     1989  2.2     2000  4.4
1979  3.3     1990  1.7     2001  3.4
1980  3.5     1991  1.9     2002  5.2
1981  2.9     1992  2.7     2003  5.4
1982  2.6     1993  2.7
1983  3.2     1994  2.6
Fig for the unemployment rate

Figure: Plot of the unemployment rate in Singapore (Example 2), 1973-2003


Some Representative Time Series (II)

Figure: Canadian lynx captured (1828-1934); y-axis: number of lynx, x-axis: time (year)


Some Representative Time Series (III)

Figure: Temperature in Hong Kong (1994-1997); y-axis: daily temperature in HK, x-axis: time (daily)


Some Representative Time Series (IV)

Figure: Number of patients with respiratory problems in Hong Kong (1994); y-axis: no. of patients, x-axis: time (daily)
Some Representative Time Series (V)

Figure: Measles cases in London (1944-1978); y-axis: cases of measles, x-axis: time (week)


Some Representative Time Series (VI)

Figure: S&P 500 stock index (1990-2004); y-axis: SP500 index, x-axis: time (daily)


Purpose of Time Series Analysis

◮ Time series data are often examined in hopes of discovering a


historical pattern that can be exploited in the preparation of a
forecast.
◮ More specifically, the first objective of time series analysis
is to understand or model the stochastic mechanism that
gives rise to the observed series. When presented with a time
series, the first step in the analysis is usually to plot the
observations against time to give what is called a time plot,
and then to obtain simple descriptive measures of the main
properties of the series. The power of the time plot is
illustrated in Fig 4, which clearly shows that there is a regular
seasonal effect.
Forecasts

After an appropriate family of models has been chosen and estimated,


the next major objective of time series analysis is to forecast or predict
future values of the series.
Definition
Predictions of future events and conditions are called forecasts, and the
act of making such predictions is called forecasting.
For example,
◮ What will be the unemployment rate next year?
◮ Is there a trend in global temperature?
◮ What is the seasonal effect?
◮ What is the relationship between GDP and interest rate?
Forecasting methods

Broadly speaking, forecasting methods can be divided into two


basic types:
◮ Qualitative forecasting methods: use the opinions of experts
to predict future events subjectively.
◮ Quantitative forecasting methods: based on historical data,
use statistical methods to predict future values of a variable.
The difference between time series and i.i.d. statistics (I)

Time series data are dependent:

◮ there is a natural time order to the observations of a time series;
◮ the observations are correlated, e.g. this month's exchange
rate will be correlated with last month's.
The difficulty with dependence (I)

Consider the i.i.d. case first. Suppose that X1 , X2 , . . . , Xn is an i.i.d.
random sample with mean µ and variance σ². We then estimate µ by

µ̂ = (X1 + X2 + · · · + Xn )/n.

The variance of µ̂ is

var(µ̂) = σ²/n.

It follows that var(µ̂) → 0 as n → ∞.
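The σ²/n decay can be checked with a short simulation. This is an illustrative sketch of mine, not part of the notes; the sample sizes and σ are arbitrary choices:

```python
import numpy as np

# Sketch: for i.i.d. draws, the variance of the sample mean shrinks like sigma^2/n.
rng = np.random.default_rng(0)
sigma = 2.0

for n in (10, 100, 1000):
    # 5000 replications of the sample mean of n i.i.d. N(0, sigma^2) draws
    means = rng.normal(0.0, sigma, size=(5000, n)).mean(axis=1)
    print(n, round(means.var(), 4), sigma**2 / n)
```

Each printed empirical variance should sit close to the theoretical value σ²/n, halving each time n is multiplied by ten... times ten.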
The difficulty with dependence (II)
We now consider the situation where all the Xi ’s are “perfectly”
correlated, i.e.

cov(Xi , Xj ) = σ² for all i, j.

We still estimate µ by

µ̂ = (X1 + X2 + · · · + Xn )/n.

Its variance becomes

var(µ̂) = (1/n²) var(X1 + X2 + · · · + Xn )
       = (1/n²) { ∑ var(Xi ) + ∑_{i≠j} cov(Xi , Xj ) }
       = (1/n²) · n² σ² = σ².

n observations ≡ only one observation!
This example shows that dependence calls for more observations and
for more powerful/complicated mathematics in statistical analysis, and
that we need to distinguish different types of dependence.
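The perfectly correlated case can also be simulated. In the sketch below (my own illustration, not from the notes) every Xi is literally the same draw Z, which is the extreme case where cov(Xi , Xj ) = σ² for all pairs:

```python
import numpy as np

# Sketch of the perfectly correlated case: every X_i is the SAME draw Z,
# so cov(X_i, X_j) = sigma^2 for all i, j and the sample mean equals Z.
rng = np.random.default_rng(1)
sigma = 2.0

for n in (10, 1000):
    z = rng.normal(0.0, sigma, size=5000)   # one draw per replication
    x = np.tile(z[:, None], (1, n))         # n perfectly correlated "observations"
    means = x.mean(axis=1)                  # identical to z: extra obs add no info
    print(n, round(means.var(), 4))         # stays near sigma^2 = 4 for every n
```

Unlike the i.i.d. case, the printed variance does not shrink as n grows.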
Components of a time series

In order to identify the pattern of time series data, it is often


convenient to think of a time series as consisting of several
components.
Definition
The components of a time series are: Trend, Cycle, Seasonal
variations and Irregular fluctuations.
Trend

Trend refers to the upward or downward movement that


characterizes a time series over a period of time. Trend reflects the
long-run growth or decline in the time series.
Reasons for trend: technology improvement; changes in consumer
tastes; increases of per capita income; increase of total population;
market growth; inflation or deflation.
Fig for Trend

Figure: International Airline Passengers, monthly totals, 1949-1960; y-axis: number of passengers, x-axis: time (month)
Cycle

Cycle refers to recurring up and down movements around trend levels.


These fluctuations can have a duration of anywhere from two to ten
years or even longer measured from peak to peak or trough to trough.
Typical examples of cycles: the “business cycle (Karl Marx’s explanation)”;
“natural phenomena”; “interaction of two variables”. Cycles are often
very difficult to explain.
Fig for Cycle

Figure: Wolfer Sunspot Numbers, annual, 1770-1869; y-axis: sunspot number, x-axis: time (year)
Seasonality

Seasonal variations or Seasonality are periodic patterns in a time


series that complete themselves within a calendar year and are then
repeated on a yearly basis.
Typical examples are the average monthly temperature, the number of
monthly housing starts, and department store sales. Reasons for
seasonality: weather; customs.
Fig for Seasonality

Figure: Temperature in Hong Kong (1994-1997); y-axis: daily temperature in HK, x-axis: time (daily)

Irregular fluctuations

Irregular fluctuations represent what is “left over” in a time series after
trend, cycle and seasonal variations have been accounted for. Some
irregular fluctuations in a time series are caused by “unusual” events that
cannot be forecast, such as earthquakes, accidents, hurricanes, wars
and the like.
They are the random movement in a time series.
Possible structures for a time series

These components may be combined in different ways. It is usually
assumed that they combine additively or multiplicatively. This leads to
the following possible structures for a time series.
Additive and Multiplicative

◮ Additive
Yt = Tt + Ct + St + Rt
◮ Multiplicative
Yt = Tt × Ct × St × Rt

where Tt is the trend component (or factor) in time period (or point) t;
St is the seasonal component in time period t; Ct is the cyclical
component in time period t; and Rt is the irregular component in time
period t.
The multiplicative model can be changed into an additive one by taking
logarithms.
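The log transformation can be illustrated on a small synthetic series. All component choices below (growth rate, seasonal amplitude, noise scale) are my own, purely for illustration:

```python
import numpy as np

# Build a synthetic multiplicative series Y_t = T_t * S_t * R_t.
t = np.arange(48)
trend = 100 * 1.01 ** t                            # slow exponential growth
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)    # period-12 seasonality
irregular = np.exp(0.05 * np.random.default_rng(2).standard_normal(48))
y = trend * seasonal * irregular

# Taking logs turns the multiplicative structure into an additive one:
# log Y_t = log T_t + log S_t + log R_t
log_y = np.log(y)
print(np.allclose(log_y, np.log(trend) + np.log(seasonal) + np.log(irregular)))  # True
```

This is why series with growing seasonal swings are often logged before an additive decomposition is attempted.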
Errors in Forecasting

◮ Types of Forecasts: we consider two types of forecasts: the point


forecast and the prediction interval forecast.
◮ A point forecast is a single number that represents our best
prediction (or guess) of the actual value being forecast. Based on
{y1 , y2 , ..., ys }, any estimate of the value at time s + m with m > 0
is a point forecast, denoted by ŷs+m .
Interval forecast

A prediction interval forecast is an interval (or range) of numbers
calculated so that we are very confident (usually at the 95% level) that
it will contain the actual future value.
Based on {y1 , y2 , ..., ys }, such an interval estimate of the value at time
s + m with m > 0 is a prediction interval.
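As a minimal numerical sketch, assume (my assumption, not the notes') that the forecast error is roughly normal with a known standard deviation; the numbers below are made up:

```python
# Hypothetical numbers for illustration only.
point_forecast = 3.4   # e.g. a point forecast of next year's unemployment rate
sigma = 0.5            # assumed standard deviation of the forecast error
z = 1.96               # 97.5% quantile of the standard normal

# A 95% prediction interval under the normal-error assumption
lower = point_forecast - z * sigma
upper = point_forecast + z * sigma
print((round(lower, 2), round(upper, 2)))   # (2.42, 4.38)
```

In later chapters the width of such intervals will come from the fitted model rather than an assumed σ.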
Prediction of unemployment (I)

Figure: Unemployment rate in Singapore, 1990-2006, with predicted values marked by asterisks
Prediction of unemployment (II)

Figure: Unemployment rate in Singapore, 1990-2006
Measuring Forecasting error

All forecasting situations involve some degree of uncertainty. We


recognize this fact by including an irregular component in the description
of a time series. The presence of this irregular component, which
represents unexplained or unpredictable fluctuations in the data, means
that some error in forecasting must be expected.
We now consider the problem of measuring forecasting errors.
Forecast error

Denote the actual value of the variable of interest in time t as yt and the
predicted value of yt by ŷt . We then introduce the following concepts.
Definition

◮ forecast error: et = yt − ŷt ;
◮ plot of forecast errors: plot et against t;
◮ absolute deviation: |et | = |yt − ŷt |;
◮ squared error: et² = (yt − ŷt )².
Measuring Forecasting error

An examination of forecast errors over time can often indicate whether


the forecasting technique being used does or does not match the pattern
of the data. For example, if a forecasting technique is accurately
forecasting the trend, seasonal, or cyclical components that are present in
a time series, the forecast errors should reflect only the irregular
component. In such a case, the forecast errors should be purely random.
See the following figures and we will demonstrate more in the subsequent
lectures.
An example of purely random errors

Example
{yt : 0.1001, 1.6900, 2.3156, 2.7119, 3.7902, 3.6686, 4.6908, 2.7975,
4.4802, 4.8433, 3.8959, 6.2573, 5.4435, 8.4151, 6.6949, 8.5287, 8.7193,
8.0781, 7.3293, 9.9408}

Figure: the series yt (left) and its random forecast errors et (right), plotted against time
General Philosophy

If the forecasting errors over time indicate that the forecasting


methodology is appropriate (random distribution of errors), it is
important to measure the magnitude of the errors so that we can
determine whether accurate forecasting is possible.
The magnitude of the errors

◮ mean of estimation errors:

(1/n) ∑_{t=1}^{n} et ;

◮ mean absolute deviation (MAD):

(1/n) ∑_{t=1}^{n} |et | ;

◮ mean squared error (MSE):

(1/n) ∑_{t=1}^{n} et² .
An example demonstrating the power of the above three
measures

Actual   Predicted   Error           Absolute    Squared
value    value                       deviation   error
yt       ŷt          et = yt − ŷt    |et |       et²
25       22          3               3           9
28       30          -2              2           4
29       30          -1              1           1
mean                 0               2           4.67
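The numbers in the table can be reproduced with a short sketch (function and variable names are my own):

```python
# A minimal sketch of the three error measures.
def error_measures(actual, predicted):
    errors = [y - yhat for y, yhat in zip(actual, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n            # signed errors cancel out
    mad = sum(abs(e) for e in errors) / n   # mean absolute deviation
    mse = sum(e * e for e in errors) / n    # mean squared error
    return mean_error, mad, mse

# Actual values (25, 28, 29) against predictions (22, 30, 30)
me, mad, mse = error_measures([25, 28, 29], [22, 30, 30])
print(me, mad, round(mse, 2))   # 0.0 2.0 4.67
```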
Conclusions from such an example

Therefore, for the above three predictions,

mean prediction error = 0


mean absolute deviation = 2
mean squared error = 4.67.

“Mean prediction error” cannot be used to assess a prediction method
because the positive and negative errors, no matter how large or small,
will cancel each other out.
MAD and MSE can be used as criteria for assessing a prediction method:
an assessment is carried out by comparing their values across competing
methods on the observed data.
One more example(I)
One can propose the following two simple methods for prediction of yt .
Method A:
ŷt = yt−1
Method B:
1
ŷt = (yt−1 + yt−2 )
2

Actual   Predicted    Predicted    Absolute deviation
value    value of A   value of B
yt       ŷA,t         ŷB,t         |eA,t |   |eB,t |
25       –            –            –         –
28       25           –            3         –
29       28           26.5         1         2.5
30       29           28.5         1         1.5
27       30           29.5         3         2.5
23       27           28.5         4         5.5
20       23           25.0         3         5
One more example(II)

We have

MADA = (3 + 1 + 1 + 3 + 4 + 3)/6 = 2.5;

MSEA = (3² + 1² + 1² + 3² + 4² + 3²)/6 = 7.5

and

MADB = (2.5 + 1.5 + 2.5 + 5.5 + 5)/5 = 3.4;

MSEB = (2.5² + 1.5² + 2.5² + 5.5² + 5²)/5 = 14.0.

Both criteria suggest that method A is more accurate than method B.
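The comparison can be checked directly from the data; the sketch below recomputes each method's one-step errors from the series itself (note that method B's forecast of y = 27 is (30 + 29)/2 = 29.5, giving deviation 2.5):

```python
# One-step forecasts: method A uses the last value, method B the mean of the last two.
y = [25, 28, 29, 30, 27, 23, 20]

errors_a = [y[t] - y[t - 1] for t in range(1, len(y))]
errors_b = [y[t] - (y[t - 1] + y[t - 2]) / 2 for t in range(2, len(y))]

mad = lambda es: sum(abs(e) for e in es) / len(es)
mse = lambda es: sum(e * e for e in es) / len(es)

print(mad(errors_a), mse(errors_a))   # 2.5 7.5
print(mad(errors_b), mse(errors_b))   # 3.4 14.0
```

Method A wins on both criteria here.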


Comparison of MAD and MSE (I)

We now describe how these two measures differ. Extreme prediction
errors have a bigger influence on MSE than on MAD.
Example
Suppose we have two other methods whose predictions are

Actual   Predicted   Predicted   Absolute deviation
value    value 1     value 2
yt       ŷ1,t        ŷ2,t        |e1,t |   |e2,t |
25       22          24          3         1
28       30          27          2         1
29       30          30          1         1
30       30          30          0         0
27       30          28          3         1
23       27          30          4         7
20       22          22          2         2
Comparison of MAD and MSE (II)
For method 1:

MAD1 = (1/7) ∑_{t=1}^{7} |e1,t | = 2.14;

MSE1 = (1/7) ∑_{t=1}^{7} e1,t² = 6.14.

For method 2:

MAD2 = (1/7) ∑_{t=1}^{7} |e2,t | = 1.86;

MSE2 = (1/7) ∑_{t=1}^{7} e2,t² = 8.14.

Based on MAD, method 2 is better than method 1;
based on MSE, method 1 is better than method 2.
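A quick recomputation from the table makes the disagreement concrete (an illustrative sketch; the helper names are mine):

```python
# Recomputing MAD and MSE for the two methods, using the data from the table.
actual = [25, 28, 29, 30, 27, 23, 20]
pred_1 = [22, 30, 30, 30, 30, 27, 22]
pred_2 = [24, 27, 30, 30, 28, 30, 22]

def mad(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

# Method 2's single large error (|e| = 7) squares to 49 and dominates its MSE.
print(round(mad(actual, pred_1), 2), round(mse(actual, pred_1), 2))  # 2.14 6.14
print(round(mad(actual, pred_2), 2), round(mse(actual, pred_2), 2))  # 1.86 8.14
```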
Conclusion (I)

It is commonly believed that MAD is a better criterion than MSE.


However, mathematically MSE is more convenient than MAD.
The difficulty with MAD
For example, consider the time series

{1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17}.

Denote the values by y1 , ..., y10 . Suppose that the series follows the model

yt = β + εt .

The prediction is then

ŷt = β.

The aim is to estimate β. Under MAD, we need to estimate β by minimizing

(1/n) ∑_{t=1}^{n} |yt − β|.

There is no simple closed-form answer! [Numerical calculation suggests
the solution β = 1.93.]
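The numerical minimization can be sketched with a grid search. (In fact the MAD criterion is minimized by any sample median; for this even-sized sample every β between the two middle order statistics, 1.90 and 2.06, attains the minimum, which is why 1.93 works.)

```python
# A numerical sketch of minimizing the MAD criterion in beta.
y = [1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17]

def mad_criterion(beta):
    return sum(abs(v - beta) for v in y) / len(y)

grid = [1.0 + 0.001 * k for k in range(3001)]   # candidate betas in [1.0, 4.0]
best = min(grid, key=mad_criterion)             # lands in [1.90, 2.06]
print(round(best, 3), round(mad_criterion(1.93), 4))   # minimum MAD value is 0.60
```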
Tractable MSE

However, under MSE, we need to estimate β by minimizing

(1/n) ∑_{t=1}^{n} (yt − β)².

We can get the solution explicitly (how?):

β̂ = (1/n) ∑_{t=1}^{n} yt = 2.23.
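The closed form comes from setting the derivative of the sum of squares with respect to β to zero, which yields the sample mean. A quick numerical check:

```python
# The MSE criterion is minimized by the sample mean of the observations.
y = [1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17]
beta_hat = sum(y) / len(y)
print(round(beta_hat, 2))   # 2.23
```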
A Scientific view towards statistical forecasting

◮ Statistical forecasting must be based on a very strong assumption:
the future behavior of the time series has the same kind of “statistical
properties” as the observed past.
◮ Usually, statistical methods can only be applied to “short term”
forecasts.
◮ In practice, statistical forecasting should be combined with other
prediction methods.
