
Chapter 1 Introduction.

Instructor: Pan Guang-Ming

Division of Mathematical Sciences, SPMS


Nanyang Technological University, Singapore
Outline

➊ Time series and forecasts

➋ Differences between time series and i.i.d. random
variables

➌ Components of a time series

➍ Measuring forecasting errors
Some Questions

◮ What is a time series?


◮ What are the purposes of time series analysis?
◮ What are the differences between classical (independent and
identically distributed (i.i.d.)) statistical analysis (e.g. inference and
modelling) and time series analysis?
◮ What are the typical components of a time series?
◮ How do we assess a prediction method?
The definition of time series

Definition
A time series is a sequence of observations over time.
Time series in different fields

Time series occur in a variety of fields.


◮ In business and economics, we observe weekly interest rates, daily
closing stock prices, monthly price indices, yearly sales figures, and
so forth.
◮ In engineering, we observe sound, electric signals and voltage.
◮ In medical studies, we measure electroencephalogram (EEG) and
electrocardiogram (EKG) tracings.
◮ In the social sciences, we study annual birth rates, mortality rates,
accident rates and various crime rates.
The list of areas in which time series are studied is virtually endless.
The annual rainfall in California

Example
Exhibit 1 displays a time series plot of the annual rainfall amounts
recorded in Los Angeles, California, over more than 100 years. The plot
shows considerable variation in rainfall amount from year to year: some
years are low, some high, and many are in between. The year 1883
was an exceptionally wet year for Los Angeles, while 1983 was quite dry.
For analysis and modeling purposes we are interested in whether or not
consecutive years are related in some way. If so, we might be able to use
one year’s rainfall value to help forecast next year’s rainfall amount.
The annual abundance of Canadian hare

Example
Our second example concerns the annual abundance of Canadian hare.
Fig 2 gives the time series plot of this abundance over about 30 years.
Neighboring values here are very closely related. Large changes in
abundance do not occur from one year to the next. There is strong
persistence in the plot: low values tend to be followed by low values in
the next year, middle-sized values by middle-sized values, and high
values by high values.
Time plot for the annual rainfall

Figure: Plot of Annual Rainfall


Time plot for Canadian hare
Continuous Time series against discrete time series

Definition
A time series is said to be discrete when the set T0 of times at which
observations are taken is a discrete set. A time series is said to be
continuous when observations are recorded continuously over some time
interval, e.g. when T0 = [0, 1].
Remarks: We are mainly interested in discrete-time series observed at
equally spaced, fixed time intervals, e.g. observations made monthly,
weekly or daily.
Some Representative Time Series (I)

Example
Unemployment Rate (%) in Singapore

year  rate    year  rate    year  rate
1973  4.4     1984  2.7     1995  2.7
1974  3.9     1985  4.1     1996  3.0
1975  4.5     1986  6.5     1997  2.4
1976  4.4     1987  4.7     1998  3.2
1977  3.9     1988  3.3     1999  4.6
1978  3.6     1989  2.2     2000  4.4
1979  3.3     1990  1.7     2001  3.4
1980  3.5     1991  1.9     2002  5.2
1981  2.9     1992  2.7     2003  5.4
1982  2.6     1993  2.7
1983  3.2     1994  2.6
Fig for the unemployment rate

Figure: Plot of the unemployment rate in Singapore (Example 2), 1973-2003


Some Representative Time Series (II)

Figure: Canadian lynx captured (1828-1934); y-axis: number of lynx, x-axis: time (year)


Some Representative Time Series (III)

Figure: Temperature in Hong Kong (1994-1997); y-axis: daily temperature in HK, x-axis: time (daily)


Some Representative Time Series (IV)

Figure: Number of patients with respiratory problems in Hong Kong (1994); y-axis: no. of patients, x-axis: time (daily)
Some Representative Time Series (V)

Figure: Measles cases in London (1944-1978); y-axis: cases of measles, x-axis: time (week)


Some Representative Time Series (VI)

Figure: S&P 500 stock index (1990-2004); y-axis: SP500 index, x-axis: time (daily)


Purpose of Time Series Analysis

◮ Time series data are often examined in hopes of discovering a


historical pattern that can be exploited in the preparation of a
forecast.
◮ More specifically, the first objective of time series analysis
is to understand or model the stochastic mechanism that
gives rise to the observed series. When presented with a time
series, the first step in the analysis is usually to plot the
observations against time to give what is called a time plot,
and then to obtain simple descriptive measures of the main
properties of the series. The power of the time plot is
illustrated in Fig 4, which clearly shows that there is a regular
seasonal effect.
Forecasts

After an appropriate family of models has been chosen and estimated,


the next major objective of time series analysis is to forecast or predict
future values of the series.
Definition
Predictions of future events and conditions are called forecasts, and the
act of making such predictions is called forecasting.
For example,
◮ What will be the unemployment rate next year?
◮ Is there a trend in global temperature?
◮ What is the seasonal effect?
◮ What is the relationship between GDP and interest rate?
Forecasting methods

Broadly speaking, forecasting methods can be divided into two


basic types:
◮ Qualitative forecasting methods: use the opinions of experts
to predict future events subjectively.
◮ Quantitative forecasting methods: based on historical data,
use statistical methods to predict future values of a variable.
The difference between time series and i.i.d. statistics (I)

Time series data are dependent:

◮ there is a natural time order to the observations of a time series;
◮ the observations are correlated, e.g. this month's exchange
rate will be correlated with last month's.
The difficulty with dependence (I)

Consider the i.i.d. case first. Suppose that X1 , X2 , . . . , Xn is an i.i.d.
random sample with mean µ and variance σ². We then estimate µ by

µ̂ = (X1 + X2 + · · · + Xn )/n.

The variance of µ̂ is

var(µ̂) = σ²/n.

It follows that var(µ̂) → 0 as n → ∞.
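The σ²/n decay can be checked with a short simulation. This is an illustrative sketch of mine, not part of the notes; the sample sizes and σ are arbitrary choices:

```python
import numpy as np

# Sketch: for i.i.d. draws, the variance of the sample mean shrinks like sigma^2/n.
rng = np.random.default_rng(0)
sigma = 2.0

for n in (10, 100, 1000):
    # 5000 replications of the sample mean of n i.i.d. N(0, sigma^2) draws
    means = rng.normal(0.0, sigma, size=(5000, n)).mean(axis=1)
    print(n, round(means.var(), 4), sigma**2 / n)
```

Each printed empirical variance should sit close to the theoretical value σ²/n, halving each time n is multiplied by ten... times ten.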
The difficulty with dependence (II)
We now consider the situation where all the Xi ’s are “perfectly”
correlated, i.e.

cov(Xi , Xj ) = σ² for all i, j.

We still estimate µ by

µ̂ = (X1 + X2 + · · · + Xn )/n.

Its variance becomes

var(µ̂) = (1/n²) var(X1 + X2 + · · · + Xn )
       = (1/n²) { ∑ var(Xi ) + ∑_{i≠j} cov(Xi , Xj ) }
       = (1/n²) · n² σ² = σ².

n observations ≡ only one observation!
This example shows that dependence calls for more observations and
for more powerful/complicated mathematics in statistical analysis, and
that we need to distinguish different types of dependence.
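The perfectly correlated case can also be simulated. In the sketch below (my own illustration, not from the notes) every Xi is literally the same draw Z, which is the extreme case where cov(Xi , Xj ) = σ² for all pairs:

```python
import numpy as np

# Sketch of the perfectly correlated case: every X_i is the SAME draw Z,
# so cov(X_i, X_j) = sigma^2 for all i, j and the sample mean equals Z.
rng = np.random.default_rng(1)
sigma = 2.0

for n in (10, 1000):
    z = rng.normal(0.0, sigma, size=5000)   # one draw per replication
    x = np.tile(z[:, None], (1, n))         # n perfectly correlated "observations"
    means = x.mean(axis=1)                  # identical to z: extra obs add no info
    print(n, round(means.var(), 4))         # stays near sigma^2 = 4 for every n
```

Unlike the i.i.d. case, the printed variance does not shrink as n grows.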
Components of a time series

In order to identify the pattern of time series data, it is often


convenient to think of a time series as consisting of several
components.
Definition
The components of a time series are: Trend, Cycle, Seasonal
variations and Irregular fluctuations.
Trend

Trend refers to the upward or downward movement that


characterizes a time series over a period of time. Trend reflects the
long-run growth or decline in the time series.
Reasons for trend: technology improvement; changes in consumer
tastes; increases of per capita income; increase of total population;
market growth; inflation or deflation.
Fig for Trend

Figure: International Airline Passengers, monthly totals, 1949-1960; y-axis: number of passengers, x-axis: time (month)
Cycle

Cycle refers to recurring up and down movements around trend levels.


These fluctuations can have a duration of anywhere from two to ten
years or even longer measured from peak to peak or trough to trough.
Typical examples of cycles: the “business cycle (Karl Marx’s explanation)”;
“natural phenomena”; “interaction of two variables”. Cycles are often
very difficult to explain.
Fig for Cycle

Figure: Wolfer Sunspot Numbers, annual, 1770-1869; y-axis: sunspot number, x-axis: time (year)
Seasonality

Seasonal variations or Seasonality are periodic patterns in a time


series that complete themselves within a calendar year and are then
repeated on a yearly basis.
Typical examples are the average monthly temperature, the number of
monthly housing starts, and department store sales. Reasons for
seasonality: weather; customs.
Fig for Seasonality

Figure: Temperature in Hong Kong (1994-1997); y-axis: daily temperature in HK, x-axis: time (daily)

Irregular fluctuations

Irregular fluctuations represent what is “left over” in a time series after
trend, cycle and seasonal variations have been accounted for. Some
irregular fluctuations in a time series are caused by “unusual” events that
cannot be forecast, such as earthquakes, accidents, hurricanes, wars
and the like.
They are the random movement in a time series.
Possible structures for a time series

These components may be combined in different ways. It is usually
assumed that they combine additively or multiplicatively. This leads to
the following possible structures for a time series.
Additive and Multiplicative

◮ Additive
Yt = Tt + Ct + St + Rt
◮ Multiplicative
Yt = Tt × Ct × St × Rt

where Tt is the trend component (or factor) in time period (or point) t;
St is the seasonal component in time period t; Ct is the cyclical
component in time period t; and Rt is the irregular component in time
period t.
The multiplicative model can be changed into an additive one by taking
logarithms.
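The log transformation can be illustrated on a small synthetic series. All component choices below (growth rate, seasonal amplitude, noise scale) are my own, purely for illustration:

```python
import numpy as np

# Build a synthetic multiplicative series Y_t = T_t * S_t * R_t.
t = np.arange(48)
trend = 100 * 1.01 ** t                            # slow exponential growth
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)    # period-12 seasonality
irregular = np.exp(0.05 * np.random.default_rng(2).standard_normal(48))
y = trend * seasonal * irregular

# Taking logs turns the multiplicative structure into an additive one:
# log Y_t = log T_t + log S_t + log R_t
log_y = np.log(y)
print(np.allclose(log_y, np.log(trend) + np.log(seasonal) + np.log(irregular)))  # True
```

This is why series with growing seasonal swings are often logged before an additive decomposition is attempted.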
Errors in Forecasting

◮ Types of Forecasts: we consider two types of forecasts: the point


forecast and the prediction interval forecast.
◮ A point forecast is a single number that represents our best
prediction (or guess) of the actual value being forecast. Based on
{y1 , y2 , ..., ys }, any estimate of the value at time s + m with m > 0
is a point forecast, denoted by ŷs+m .
Interval forecast

A prediction interval forecast is an interval (or range) of numbers
calculated so that we are very confident (usually at the 95% level) that
it will contain the actual future value.
Based on {y1 , y2 , ..., ys }, such an interval estimate of the value at time
s + m with m > 0 is a prediction interval.
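As a minimal numerical sketch, assume (my assumption, not the notes') that the forecast error is roughly normal with a known standard deviation; the numbers below are made up:

```python
# Hypothetical numbers for illustration only.
point_forecast = 3.4   # e.g. a point forecast of next year's unemployment rate
sigma = 0.5            # assumed standard deviation of the forecast error
z = 1.96               # 97.5% quantile of the standard normal

# A 95% prediction interval under the normal-error assumption
lower = point_forecast - z * sigma
upper = point_forecast + z * sigma
print((round(lower, 2), round(upper, 2)))   # (2.42, 4.38)
```

In later chapters the width of such intervals will come from the fitted model rather than an assumed σ.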
Prediction of unemployment (I)

Figure: Unemployment rate in Singapore, 1990-2006, with predicted values marked by asterisks
Prediction of unemployment (II)

Figure: Unemployment rate in Singapore, 1990-2006
Measuring Forecasting error

All forecasting situations involve some degree of uncertainty. We


recognize this fact by including an irregular component in the description
of a time series. The presence of this irregular component, which
represents unexplained or unpredictable fluctuations in the data, means
that some error in forecasting must be expected.
We now consider the problem of measuring forecasting errors.
Forecast error

Denote the actual value of the variable of interest in time t as yt and the
predicted value of yt by ŷt . We then introduce the following concepts.
Definition

◮ forecast error: et = yt − ŷt ;
◮ plot of forecast errors: plot et against t;
◮ absolute deviation: |et | = |yt − ŷt |;
◮ squared error: et² = (yt − ŷt )².
Measuring Forecasting error

An examination of forecast errors over time can often indicate whether


the forecasting technique being used does or does not match the pattern
of the data. For example, if a forecasting technique is accurately
forecasting the trend, seasonal, or cyclical components that are present in
a time series, the forecast errors should reflect only the irregular
component. In such a case, the forecast errors should be purely random.
See the following figures and we will demonstrate more in the subsequent
lectures.
An example of purely random errors

Example
{yt : 0.1001, 1.6900, 2.3156, 2.7119, 3.7902, 3.6686, 4.6908, 2.7975,
4.4802, 4.8433, 3.8959, 6.2573, 5.4435, 8.4151, 6.6949, 8.5287, 8.7193,
8.0781, 7.3293, 9.9408}

Figure: the series yt (left) and its random forecast errors et (right), plotted against time
General Philosophy

If the forecasting errors over time indicate that the forecasting


methodology is appropriate (random distribution of errors), it is
important to measure the magnitude of the errors so that we can
determine whether accurate forecasting is possible.
The magnitude of the errors

◮ mean of estimation errors:

(1/n) ∑_{t=1}^{n} et ;

◮ mean absolute deviation (MAD):

(1/n) ∑_{t=1}^{n} |et | ;

◮ mean squared error (MSE):

(1/n) ∑_{t=1}^{n} et² .
An example demonstrating the power of the above three
measures

Actual   Predicted   Error           Absolute    Squared
value    value                       deviation   error
yt       ŷt          et = yt − ŷt    |et |       et²
25       22          3               3           9
28       30          -2              2           4
29       30          -1              1           1
mean                 0               2           4.67
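The numbers in the table can be reproduced with a short sketch (function and variable names are my own):

```python
# A minimal sketch of the three error measures.
def error_measures(actual, predicted):
    errors = [y - yhat for y, yhat in zip(actual, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n            # signed errors cancel out
    mad = sum(abs(e) for e in errors) / n   # mean absolute deviation
    mse = sum(e * e for e in errors) / n    # mean squared error
    return mean_error, mad, mse

# Actual values (25, 28, 29) against predictions (22, 30, 30)
me, mad, mse = error_measures([25, 28, 29], [22, 30, 30])
print(me, mad, round(mse, 2))   # 0.0 2.0 4.67
```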
Conclusions from such an example

Therefore, for the above three predictions,

mean prediction error = 0


mean absolute deviation = 2
mean squared error = 4.67.

“Mean prediction error” cannot be used to assess a prediction method
because the positive and negative errors, no matter how large or small,
will cancel each other out.
MAD and MSE can be used as criteria for assessing a prediction method:
an assessment is carried out by comparing their values across competing
methods on the observed data.
One more example(I)
One can propose the following two simple methods for prediction of yt .
Method A:
ŷt = yt−1
Method B:
1
ŷt = (yt−1 + yt−2 )
2

Actual   Predicted    Predicted    Absolute deviation
value    value of A   value of B
yt       ŷA,t         ŷB,t         |eA,t |   |eB,t |
25       –            –            –         –
28       25           –            3         –
29       28           26.5         1         2.5
30       29           28.5         1         1.5
27       30           29.5         3         2.5
23       27           28.5         4         5.5
20       23           25.0         3         5
One more example(II)

We have

MADA = (3 + 1 + 1 + 3 + 4 + 3)/6 = 2.5;

MSEA = (3² + 1² + 1² + 3² + 4² + 3²)/6 = 7.5

and

MADB = (2.5 + 1.5 + 2.5 + 5.5 + 5)/5 = 3.4;

MSEB = (2.5² + 1.5² + 2.5² + 5.5² + 5²)/5 = 14.0.

Both criteria suggest that method A is more accurate than method B.
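The comparison can be checked directly from the data; the sketch below recomputes each method's one-step errors from the series itself (note that method B's forecast of y = 27 is (30 + 29)/2 = 29.5, giving deviation 2.5):

```python
# One-step forecasts: method A uses the last value, method B the mean of the last two.
y = [25, 28, 29, 30, 27, 23, 20]

errors_a = [y[t] - y[t - 1] for t in range(1, len(y))]
errors_b = [y[t] - (y[t - 1] + y[t - 2]) / 2 for t in range(2, len(y))]

mad = lambda es: sum(abs(e) for e in es) / len(es)
mse = lambda es: sum(e * e for e in es) / len(es)

print(mad(errors_a), mse(errors_a))   # 2.5 7.5
print(mad(errors_b), mse(errors_b))   # 3.4 14.0
```

Method A wins on both criteria here.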


Comparison of MAD and MSE (I)

We now describe how these two measures differ. Extreme prediction
errors have a bigger influence on MSE than on MAD.
Example
Suppose we have two other methods whose predictions are

Actual   Predicted   Predicted   Absolute deviation
value    value 1     value 2
yt       ŷ1,t        ŷ2,t        |e1,t |   |e2,t |
25       22          24          3         1
28       30          27          2         1
29       30          30          1         1
30       30          30          0         0
27       30          28          3         1
23       27          30          4         7
20       22          22          2         2
Comparison of MAD and MSE (II)
For method 1:

MAD1 = (1/7) ∑_{t=1}^{7} |e1,t | = 2.14;

MSE1 = (1/7) ∑_{t=1}^{7} e1,t² = 6.14.

For method 2:

MAD2 = (1/7) ∑_{t=1}^{7} |e2,t | = 1.86;

MSE2 = (1/7) ∑_{t=1}^{7} e2,t² = 8.14.

Based on MAD, method 2 is better than method 1;
based on MSE, method 1 is better than method 2.
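A quick recomputation from the table makes the disagreement concrete (an illustrative sketch; the helper names are mine):

```python
# Recomputing MAD and MSE for the two methods, using the data from the table.
actual = [25, 28, 29, 30, 27, 23, 20]
pred_1 = [22, 30, 30, 30, 30, 27, 22]
pred_2 = [24, 27, 30, 30, 28, 30, 22]

def mad(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

# Method 2's single large error (|e| = 7) squares to 49 and dominates its MSE.
print(round(mad(actual, pred_1), 2), round(mse(actual, pred_1), 2))  # 2.14 6.14
print(round(mad(actual, pred_2), 2), round(mse(actual, pred_2), 2))  # 1.86 8.14
```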
Conclusion (I)

It is commonly believed that MAD is a better criterion than MSE.


However, mathematically MSE is more convenient than MAD.
The difficulty with MAD
For example, consider the time series

{1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17}.

Denote the values by y1 , ..., y10 . Suppose that the series follows the model

yt = β + εt .

The prediction is then

ŷt = β.

The aim is to estimate β. Under MAD, we need to estimate β by minimizing

(1/n) ∑_{t=1}^{n} |yt − β|.

There is no simple closed-form answer! [Numerical calculation suggests
the solution β = 1.93.]
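The numerical minimization can be sketched with a grid search. (In fact the MAD criterion is minimized by any sample median; for this even-sized sample every β between the two middle order statistics, 1.90 and 2.06, attains the minimum, which is why 1.93 works.)

```python
# A numerical sketch of minimizing the MAD criterion in beta.
y = [1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17]

def mad_criterion(beta):
    return sum(abs(v - beta) for v in y) / len(y)

grid = [1.0 + 0.001 * k for k in range(3001)]   # candidate betas in [1.0, 4.0]
best = min(grid, key=mad_criterion)             # lands in [1.90, 2.06]
print(round(best, 3), round(mad_criterion(1.93), 4))   # minimum MAD value is 0.60
```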
Tractable MSE

However, under MSE, we need to estimate β by minimizing

(1/n) ∑_{t=1}^{n} (yt − β)².

We can get the solution explicitly (how?):

β̂ = (1/n) ∑_{t=1}^{n} yt = 2.23.
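The closed form comes from setting the derivative of the sum of squares with respect to β to zero, which yields the sample mean. A quick numerical check:

```python
# The MSE criterion is minimized by the sample mean of the observations.
y = [1.81, 2.73, 1.41, 4.18, 1.86, 2.11, 3.07, 2.06, 1.90, 1.17]
beta_hat = sum(y) / len(y)
print(round(beta_hat, 2))   # 2.23
```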
A Scientific view towards statistical forecasting

◮ Statistical forecasting must be based on a very strong assumption:
the future behavior of the time series has the same kind of “statistical
properties” as the observed past.
◮ Usually, statistical methods can only be applied to “short term”
forecasts.
◮ In practice, statistical forecasting should be combined with other
prediction methods.
