Indira Gandhi National Open University
School of Sciences
STATISTICAL QUALITY CONTROL AND TIME SERIES ANALYSIS
Volume
2
TIME SERIES ANALYSIS AND RELIABILITY THEORY
BLOCK 3
Time Series Analysis 5
BLOCK 4
Reliability Theory 147
Curriculum and Course Design Committee
Prof. Sujatha Varma, Former Director, SOS, IGNOU, New Delhi
Prof. Rakesh Srivastava, Department of Statistics, M.S. University of Baroda, Vadodara (Gujarat)
Units 15-18 are adapted from IGNOU course MSTE-001: Industrial Statistics-I of PGDAST programme, Block 4,
Units 13-16.
Formatted and CRC Prepared by Ms Preeti, SOS, IGNOU
Course Coordinator: Dr. Prabhat Kumar Sangal
Programme Coordinators: Dr. Neha Garg and Dr. Prabhat Kumar Sangal
Print Production
Mr. Rajiv Girdhar, Assistant Registrar, MPDD, IGNOU, New Delhi
Mr. Hemant Parida, Section Officer, MPDD, IGNOU, New Delhi
June, 2023
© Indira Gandhi National Open University, 2023
ISBN-978-81-266-
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University may be obtained from the University's Office at Maidan Garhi, New Delhi-110068, or by visiting the University's website http://www.ignou.ac.in
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by the Director, School
of Sciences.
VOLUME 2: TIME SERIES ANALYSIS AND
RELIABILITY THEORY
Dear learners, welcome again to the Course, “Statistical Quality Control and Time Series
Analysis”. In Volume 1: Statistical Quality Control, you have learnt how statistical tools help
in maintaining the quality of the products so that they fulfil their specifications. In this volume,
you will study what a time series is, various methods of estimation of components of time
series, various time series models, basic functions of reliability, and reliability evaluation of
simple and complex systems. This volume also comprises two blocks: Block 3 and Block 4.
Block 3 of this course is titled “Time Series Analysis”. The goal of this block is to build the abilities learners need to apply statistical techniques to analyse time series data and forecast the values of a time series. We start with what a time series is and what its components are, describe various techniques for estimating its components, such as the trend, seasonal, cyclic and random components, and discuss some fundamental concepts that are necessary for a proper understanding of time series modelling, such as stationarity, non-stationarity, and correlation analysis in time series. We then explain various time series models, such as the autoregressive model, moving average model, autoregressive moving average model and autoregressive integrated moving average model.
Block 4 of this course is titled “Reliability Theory” and contains four units that broadly cover another important topic, reliability theory. The objective of this block is to strengthen learners' abilities to use statistical techniques to compute a system's reliability and to improve it. In this block, we explain the reliability of a component/system, which means how long it performs its intended function successfully under given conditions. We describe various basic functions of reliability and the reliability evaluation of simple and complex systems. We also discuss how to improve a system's performance.
If you feel like reading more than what this course contains, you may like to consult the
following books:
BLOCK 3: TIME SERIES ANALYSIS
UNIT 10: Trend Component Analysis
UNIT 11: Seasonal Component Analysis
UNIT 12: Stationary Time Series
UNIT 13: Correlation Analysis in Time Series
UNIT 14: Time Series Modelling Techniques
BLOCK 3: TIME SERIES ANALYSIS
In our day-to-day life, we generally collect data at one point in time; such data are called cross-sectional data. In cross-sectional data, we collect information about different individuals/subjects at the same point in time or during the same period. For example, data related to the income of families living in New Delhi, the GDP of various countries in 2023, the temperature of different states on a particular day, the marks of learners pursuing a programme, etc. For such data, we just describe or summarise the current status of a group using the mean, standard deviation, etc. For example, for the data related to the income of families living in New Delhi, we just find the average income, the number of families below the poverty line, etc., but we do not find out whether the income is increasing or decreasing.
There are so many situations where we collect data over time. For example, in business, we
observe daily sales, weekly interest rates, and daily closing stock prices. In meteorology, we
observe daily high and low temperatures and hourly wind speeds. In agriculture, we record
annual figures for crops and quarterly production. In the biological sciences, we observe the
electrical activity of the heart at millisecond intervals, etc. Such types of data are called time
series data. A time series is a set of numeric data of a variable that is collected over time at
regular intervals and arranged in chronological (time) order.
The objective of this block is to develop the skills which are essential to apply statistical tools
for analysing time series data and forecasting the values of the time series. This block
comprises five units.
In Unit 10: Trend Component Analysis, we shall discuss what a time series is and what its components are. To see the patterns of a time series better, we describe methods of smoothing or filtering, such as simple and weighted moving averages and exponential smoothing. We explain the additive and multiplicative models of time series. We also discuss the estimation of the trend component using the method of least squares and moving averages.
Unit 11: Seasonal Component Analysis deals with some methods for estimating the seasonal and cyclic components. Here, we explain the simple average method, the ratio-to-moving-average method and the ratio-to-trend method, with their merits and demerits, for estimating the seasonal component. We also discuss how to deseasonalise the data and explain the estimation of the trend using deseasonalised data. The estimation of the cyclic and random components is also discussed in this unit. After estimating the components of a time series, we discuss forecasting on the basis of those components.
Unit 12: Stationary Time Series and Unit 13: Correlation Analysis in Time Series are devoted to some fundamental concepts that are necessary for a proper understanding of time series modelling. Unit 12 begins with a simple introduction to stationary and nonstationary time series. We also explain various methods of transforming a nonstationary time series into a stationary one. Unit 13 begins with a simple introduction to the autocovariance, autocorrelation and partial autocorrelation functions in time series. We discuss how to estimate these functions using time series data. We also discuss the correlogram and how to interpret it.
Unit 14: Time Series Modelling Techniques explains various time series models that are used for forecasting. We begin with a simple introduction to why time series models are needed instead of ordinary regression models. We discuss the autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. When you deal with real time series data, the first question that may arise in your mind is how to know which time series model is most suitable for a particular dataset. For that, we discuss time series model selection in this unit.
Expected Learning Outcomes
After completing this block, you should be able to:
understand what a time series is and describe its components;
describe smoothing techniques for forecasting models, including the simple moving
average, weighted moving average, and exponential smoothing;
explain various methods for the estimation of the trend;
apply the various methods for estimating the seasonal component, such as the simple
average method, the ratio-to-trend method and the ratio-to-moving-average method;
discuss the methods of estimation of cyclic and irregular fluctuations in a time series;
use trend, seasonal and cyclical components to forecast future values;
distinguish between stationary and nonstationary time series and transform a
nonstationary time series into a stationary one;
describe the concept of covariance and correlation in time series and explain the
autocovariance and autocorrelation functions;
describe and use autoregressive models, moving average models, autoregressive
moving average models and autoregressive integrated moving average models; and
select a particular time series model for real-life time series data.
Structure
10.1 Introduction
10.1 INTRODUCTION
Most of the data used in statistical analysis is collected at one point of time
such type of data is called cross-sectional data. In cross-sectional data, we
collect information about different individuals/subjects at the same point of
time or during the same time. For example, data related to learners pursuing
the MSCAST programme in July 2023 such as name, qualification, age,
address, marks in graduation, etc., production of milk, import and export,
information on the household income of New Delhi residents, etc. For such
type of data, we just describe the status of the group at a point. For example,
for the data related to the income of families living in New Delhi, we just find
the average income, number of families below the poverty line, etc. but we do
not find whether the income increasing or decreasing.
There are so many situations where we collect data over time. For example, in
business, we observe daily sales, weekly interest rates, and daily closing stock
prices. In meteorology, we observe daily high and low temperatures and
* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
hourly wind speeds. In agriculture, we record annual figures for crops and quarterly production. In the biological sciences, we observe the electrical activity of the heart at millisecond intervals. Such types of data are called
time series data. A time series is a set of numeric data of a variable that
is collected over time at regular intervals and arranged in chronological
(time) order.
In this unit, we shall discuss what a time series is and what its components are. In Sec. 10.2, we discuss what a time series is, with various examples. The components of a time series are described in Sec. 10.3. In Sec. 10.4, we explore different basic models of time series, which show the relationships among the various components of a time series. To see the patterns of a time series better, we describe the methods of smoothing or filtering, such as simple and weighted moving averages and exponential smoothing, in Sec. 10.5. Secs. 10.6 and 10.7 are devoted to the estimation of the trend using the method of least squares (curve fitting) and moving averages, respectively. In the next unit, you will learn various methods of estimating the other components of a time series.
Expected Learning Outcomes
After studying this unit, you would be able to
explain what the time series is;
describe the components of time series;
explain the basic models of time series;
decompose the time series into different components for further analysis;
describe smoothing techniques for forecasting models, including the simple
moving average, weighted moving average, and exponential smoothing;
and
explain various methods for the estimation of the trend.
From the above data, we see that the sales of the commodity vary with time
(quarterly and yearly). The variation occurs because of the effects of the
various forces (such as seasons) at work, commonly known as components of
time series.
In the past, when we analysed a time series, we assumed that the data values of a time series variable are determined by four underlying environmental forces that operate both individually and collectively over time. They are: (i) trend, (ii) seasonal, (iii) cyclic, and (iv) the remaining variation attributed to irregular fluctuations (sometimes referred to as the random component).
A time series thus comprises four components: the long-term or trend component, the seasonal component, the cyclic component, and the irregular or random component.
This approach is not necessarily the best one and we shall discuss the
modern approach in later units. Some or all the components are present in
varying amounts and can be classified into the mentioned four categories. We
shall now discuss these components in more detail one at a time.
10.3.1 Trend Component
Usually, time series data show random variation, but over a long period of
time, there may be a gradual shift in the mean level to a higher or a lower
level. This gradual shift in the level of time series is known as the trend. In
other words, the general tendency of values of the data to increase or
decrease during a long period of time is called the trend.
When time series values are plotted on a graph and the values show an increasing or decreasing (on average) pattern over a long period with reference to time, the time series is said to have a trend effect. A time series may show different types of trends.
Time Series with Upward Trend
When the values of a time series are plotted on a graph and are increasing or showing an upward pattern (as shown in Fig. 10.1) with reference to time, the time series is called a time series with an upward trend. For example, upward tendencies are seen in data on population growth, currency in circulation, prices of petroleum products in India, the number of passengers in the metro, the literacy rate, the GDP of a country, etc.
We plot a time series graph (Fig. 10.1) of the GDP of a country from 2011 to
2020 that shows an upward trend.
Fig. 10.1: GDP of a country from 2011 to 2020, showing an upward trend.
[Figure: Death rate (Y-axis) versus year (X-axis), 2012 to 2022.]
[Figure: Yield (Y-axis) versus year (X-axis).]
It should be clearly understood that a trend is the general, smooth, long-term, average tendency of time series data. The increase or decrease may not necessarily be in the same direction throughout the given period. The tendency of a time series may take the form of either a linear or a nonlinear (curvilinear) trend. If the time series data are plotted and the points on the graph cluster more or less around a straight line, then the tendency shown by the data is called a linear trend. Similarly, if the points plotted on the graph do not cluster around a straight line, then the tendency shown by the data is called a nonlinear or curvilinear trend. Trends are also known as long-term variations. The long-term or long period of time is a relative term which cannot be defined precisely. In some cases, a period of one week may be long, while in other cases a period of 2 years may not be enough. Some of the more important causes of long-term trend movements in a time series include population growth, urbanisation, technological improvements, economic advancements and developments, and shifts in consumer habits and attitudes.
10.3.2 Seasonal Component
In a time series, the variations which occur due to the rhythmic or natural
forces and operate in a regular and periodic manner over a span of less than
or equal to one year are termed as seasonal variations. We generally think
of seasonal movement in time series as occurring yearly, but it can also
represent any regularly repeating pattern that is less than one year in duration.
For example, daily traffic volume data show within-day seasonal behaviour,
with peak levels occurring during rush hours, moderate flow during the rest of
the day, and light flow from midnight to early morning. Thus, in a time series,
seasonal variation may exist if data are recorded quarterly, monthly, daily and
so on. Even though the data may be recorded over a span of three months,
one month, a week or a day, the amplitudes of the seasonal variation may be
different. Most of the time series data of economic or business fields show the
seasonal pattern. For example, the number of farming units (such as ploughs
and tractors) sold quarterly for the period 2019 to 2022 shows a seasonal
effect as shown in Fig. 10.4.
Fig. 10.4: Quarterly sales of farming units from 2019 to 2022, showing a seasonal effect.
The seasonal pattern existing in a time series may be either due to natural
forces or man-made conventions.
Seasonal Variations due to Natural Forces
Variations in a time series that arise due to changes in seasons, weather conditions and climatic changes are known as seasonal variations due to natural forces. For example, sales of umbrellas and raincoats increase very fast in the rainy season, the demand for air conditioners goes up in the summer season, and the sale of woollens goes up in winter, all being driven by natural forces.
Seasonal Variation due to Man-Made Conventions
Variations in time series that arise due to changes in fashions, habits, tastes,
and customs of people in any society are called seasonal variations due to
man-made conventions. For example, in our country sales of gold and clothes
go up in marriage seasons and festivals.
10.3.3 Cyclic Component
Apart from seasonal effects, some time series exhibit variation due to some
other physical causes, which is called cyclic variation. Cyclic variations are
wave-like movements in a time series (as shown in Fig. 10.5), which can vary
greatly in both duration and amplitude. Cyclical variations are recurrent
upward or downward movements in a time series, but the period of a cycle is
greater than a year whereas this period is less than one year in seasonal
variation. Cyclic and seasonal variations are seen as similar, but they are quite
different. If the variations are not of a fixed period, then they are cyclic and if
the period is constant and associated with some aspect of the season, then
the pattern is seasonal. In general, the average length of cycles is longer than
the length of a seasonal pattern, and the magnitude of cycles tends to be more
variable than the magnitude of seasonal variations.
The cyclic variation in a time series is usually called the “business cycle” and comprises four phases of business, i.e., prosperity (boom), recession, depression, and recovery.
Prosperity (boom)
During a period of boom, businessmen and industrialists invest more, the economy surpasses the level of full employment, and the level of production increases. The resulting profits give them an incentive to produce even more.
Recession
When there is excessive expansion, it results in diseconomies that make it difficult to keep up large-scale production. Additionally, it causes higher prices, rising wages, and shortages. In an economic cycle, this is referred to as a recession.
Depression
In this phase of the economic cycle, output, income, and employment all start
to drop rapidly. Also, investments decrease, and businesses are demoralised.
Thus, it leads to pessimism which leads to deflation and depression.
Recovery
The depressive phase does not last forever. After some time, trade cools down and then begins to improve. During the recovery period, old debts are repaid and the weaker units are wound up. As a result, unemployment gradually declines and income is generated.
Year    I    II   III   IV
2014    18   8    12    9
2015    5    8    4     11
2016    4    10   14    18
2017    24   23   27    30
2018    35   32   30    38
2019    32   35   30    24
We plot the time series data by taking sales on the Y-axis and the quarters on the X-axis. We get the time series plot as shown in Fig. 10.6.
The plot shows more clearly the presence of different components in the time
series data. The plot also shows seasonal as well as cyclic effects. If we draw
a free-hand line to show the approximate movement of a curve around the
line, then this line shows the presence of a long-term linear trend.
All time series need not necessarily exhibit all four components. For example, the time series of the annual production of a crop does not have seasonal variations and, similarly, a time series of annual rainfall does not contain cyclical variations.
Before moving to the next section, you can try the following Self Assessment
Question for better understanding.
SAQ 1
What is a time series? Describe its components.
Additive model: yt = Tt + Ct + St + It
Multiplicative model: yt = Tt × Ct × St × It
where Tt, Ct, St and It denote the trend, cyclic, seasonal and irregular variations, respectively. The multiplicative model is really a multiplicative version of the additive model. This model is found appropriate for many business and economic data, for example, the time series of the production of electricity, the time series of the number of passengers who opted for air travel, the time series of the consumption of soft drinks, etc.
You can try the following Self Assessment Question for better understanding.
SAQ 2
What is the difference between additive and multiplicative models?
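To see the difference numerically, here is a toy sketch in Python (all component values below are made up for illustration, not taken from this unit): in the additive model the components are measured in the same units as the series, while in the multiplicative model they act as indices around 1.

```python
# Hypothetical component values combined under the two models.
# Additive: components in the data's units (e.g. sales).
additive_y = 100.0 + 20.0 + 5.0 + (-2.0)   # T + S + C + I

# Multiplicative: trend in data units, other components as indices near 1.
T, S, C, I = 100.0, 1.2, 1.05, 0.98
multiplicative_y = T * S * C * I           # T x S x C x I

print(additive_y)                  # 123.0
print(round(multiplicative_y, 2))  # 123.48
```

In the multiplicative model, a seasonal index of 1.2 means the season lifts the series by 20% of its current level, so seasonal swings grow with the trend; in the additive model the seasonal swing stays a fixed number of units.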
and important smoothing methods for time series data, namely the moving average and exponential smoothing methods.
10.5.1 Simple Moving Average
The moving average (MA) is the simplest method for smoothing time series data. A moving average removes irregular and short-term fluctuations from a time series. This method is based on averaging each observation of a series with its surrounding observations, that is, past and future observations in chronological order. In this method, we find the simple moving averages of the time series data over a span of m periods, and these averages are called m-period moving averages. These averages are smoothed versions of the original time series. In this method, we place the average at the middle value of the set of observations; therefore, we explain the moving average for odd and even periods as follows:
When m is odd
In some situations, the data may show seasonal effects over an odd period of
time, e.g., 5 days, 7 months, etc. It means that after every 5 days or 7 months,
data behave in a similar manner. Therefore, to remove the seasonal effects
from the time series, we calculate the moving average of an odd span/period
of time, in our example, it is 5 days or 7 months. For odd periods, the method
consists of the following steps:
Step 1: We calculate the average of the first m values of the time series. For example, for m = 3,
MA1 = (y1 + y2 + y3)/3
where y1, y2 and y3 are the first three observations of the time series.
Note: The moving average eliminates periodic variations if the span of the moving average (m) is equal to the period of the oscillatory variation. Therefore, we should choose m so that it constitutes a cycle; for example, if a cycle is completed in 3 months, we should calculate 3-month moving averages. If there is variation in the span of the cycles, for example, if the first cycle is completed in 3 months, the second in 3 months and the third in 5 months, then we should use the average of these time spans as m.
Let us take an example to explain the procedure of the moving average
method.
Example 1: The following table shows the number of fire insurance claims
received by an insurance company in each four-month period from 2018 to
2021:
No. of Claims: 17  13  15  19  17  19  22  14  20  23  19  20
Calculate and plot the three-period and five-period moving average series for
the number of fire insurance claims. Compare these two moving average
series.
Solution: Since m = 3 is odd, we take the average of the first three observations and place it in the middle of these observations, that is, against the second observation. The first value of the 3-period moving average is the average of 17, 13 and 15, which equals (17 + 13 + 15)/3 = 15, and we place this value against period II of 2018. The second value of the moving averages is obtained by discarding the first observation, 17, and including the next observation, 19; this gives the average of 13, 15 and 19, which equals (13 + 15 + 19)/3 = 15.67, and we place it against the third observation, that is, period III of 2018. We repeat the same procedure of calculating three-period moving averages until all the data are exhausted. Similarly, we can find the five-period moving averages by following the same procedure: we calculate the average of the first five observations 17, 13, 15, 19 and 17 and place it against the third period. We calculate the rest of the moving averages in the following table:
Year   Period   No. of Claims   3-period MA              5-period MA
2018   I        17              ---                      ---
2018   II       13              (17+13+15)/3 = 15.00     ---
2018   III      15              (13+15+19)/3 = 15.67     (17+13+15+19+17)/5 = 16.20
2019   I        19              (15+19+17)/3 = 17.00     (13+15+19+17+19)/5 = 16.60
2019   II       17              (19+17+19)/3 = 18.33     (15+19+17+19+22)/5 = 18.40
2019   III      19              (17+19+22)/3 = 19.33     (19+17+19+22+14)/5 = 18.20
2020   I        22              (19+22+14)/3 = 18.33     (17+19+22+14+20)/5 = 18.40
2020   II       14              (22+14+20)/3 = 18.67     (19+22+14+20+23)/5 = 19.60
2020   III      20              (14+20+23)/3 = 19.00     (22+14+20+23+19)/5 = 19.60
2021   I        23              (20+23+19)/3 = 20.67     (14+20+23+19+20)/5 = 19.20
2021   II       19              (23+19+20)/3 = 20.67     ---
2021   III      20              ---                      ---
From the above table, we observe that the original series varies between 13 and 22, whereas the moving averages vary between 15.00 and 20.67 (3-period) and between 16.20 and 19.60 (5-period), which are much smoother than the original series. The moving averages fluctuate less than the original observations because they smooth (or filter out) the effect of the seasonal/irregular components. This helps us to appreciate the effect of the trend more clearly.
We now plot the original observations with both the 3-period and 5-period MA
values by taking them on the Y-axis and the period on the X- axis as shown in
Fig. 10.7.
Fig. 10.7: Insurance claims data with 3- and 5-period moving averages.
From a comparison of the line plots of the 3-period and 5-period moving
average values, we can see that there is less fluctuation (greater smoothing)
in the 5-period moving average series than in the 3-period moving average
series.
Therefore, we can conclude that the span m of the moving average affects the degree of smoothing:
• A shorter period (m) produces a more jagged moving average curve.
• A longer period (m) produces a smoother moving average curve.
Now the problem is what should be the value of m. If m is increased then the
series becomes much smoother and it may also smooth out the effect of
cyclical and seasonal components, which are our main interest of study. To
take away seasonality from a series, we would use a moving average with a
length equal to the seasonal span. Thus, in the smoothed series, each
smoothed value has been averaged across all seasons. Sometimes 3-year, 5-
year or 7-year moving averages are used to expose the combined trend and
cyclical movement of time series.
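The centred moving averages of Example 1 can be reproduced with a few lines of Python (a sketch; the function and variable names are ours, not from the unit):

```python
# Centred m-period simple moving averages for the fire-insurance claims of Example 1.
def moving_average(values, m):
    """Return the list of m-period simple moving averages."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]

ma3 = moving_average(claims, 3)   # first value 15.0, placed against period II of 2018
ma5 = moving_average(claims, 5)   # first value 16.2, placed against period III of 2018

print([round(v, 2) for v in ma3])
print([round(v, 2) for v in ma5])
```

Note that each m-period series is m − 1 values shorter than the original, which is why the first and last rows of the table in Example 1 are blank.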
The moving average can also be utilised as a forecasting model with some simple steps. In the simple moving average, we place the moving average at the centre of the set of m observations: as you saw in Example 1, we put the average of 17, 13 and 15, which equals 15, against period II of 2018. It is as if we forecast the value of period II of 2018, and for that we use both the past (the observation of period I) and the future (the observation of period III) of the given time point. In that sense, we cannot use moving averages as such for forecasting because, at the time of forecasting, the future is typically unknown. Hence, for the purpose of forecasting, we use trailing moving averages. In this method, we take the average of the past m consecutive observations of the series as follows:
y′t+1 = (yt + yt−1 + ... + yt−m+1)/m
where y′t+1 is the forecast value of yt+1. For example, if m = 3, we compute the first forecast value by taking the average of the first three values as
y′4 = (y3 + y2 + y1)/3
Furthermore, only the first forecast value is constructed by averaging only the actual values of the series. As we move to the second forecast, the actual values are replaced with the previously forecasted values. For instance, the second forecast value is defined by the following expression:
y′t+2 = (y′t+1 + yt + yt−1 + ... + yt−m+2)/m
For example, if m = 3, we compute the second forecast value as
y′5 = (y′4 + y3 + y2)/3
Forecast Error: The error of an individual forecast is the difference between the actual value and the forecast of that value. If yt and y′t are the actual and forecast values at time t, respectively, then the forecast error can be defined as
et = yt − y′t
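The trailing-average forecasting scheme can be sketched in Python on the claims data of Example 1 (our variable names; only the formulas come from the unit):

```python
# Trailing moving-average forecasting with m = 3 on the claims data of Example 1.
claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
m = 3

series = list(claims[:m])      # the first m actual values seed the forecasts
forecasts = []
for t in range(m, len(claims)):
    f = sum(series[-m:]) / m   # average of the m most recent values (actual or forecast)
    forecasts.append(f)
    series.append(f)           # later forecasts reuse earlier forecast values

# Forecast errors e_t = y_t - y'_t for the periods that were forecast.
errors = [y - f for y, f in zip(claims[m:], forecasts)]
print(round(forecasts[0], 2), round(errors[0], 2))   # y'4 = 15.0, e4 = 19 - 15 = 4.0
```

Here y′4 = (17 + 13 + 15)/3 = 15.0, so the first forecast error is e4 = 19 − 15.0 = 4.0, and y′5 = (15.0 + 15 + 13)/3 ≈ 14.33, exactly as the formulas above prescribe.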
Year   Period   Claims   4-period weighted MA (weights 1, 1, 2, 4)   Centred MA
2018   I        17
       II       13
                         (1×17 + 1×13 + 2×15 + 4×19)/8 = 17.00
       III      15                                                   (17.00 + 16.75)/2 = 16.88
                         (1×13 + 1×15 + 2×19 + 4×17)/8 = 16.75
2019   I        19                                                   (16.75 + 18.00)/2 = 17.38
                         (1×15 + 1×19 + 2×17 + 4×19)/8 = 18.00
       II       17                                                   (18.00 + 20.25)/2 = 19.13
                         (1×19 + 1×17 + 2×19 + 4×22)/8 = 20.25
       III      19                                                   (20.25 + 17.00)/2 = 18.63
                         (1×17 + 1×19 + 2×22 + 4×14)/8 = 17.00
2020   I        22                                                   (17.00 + 18.63)/2 = 17.81
                         (1×19 + 1×22 + 2×14 + 4×20)/8 = 18.63
       II       14                                                   (18.63 + 21.00)/2 = 19.81
                         (1×22 + 1×14 + 2×20 + 4×23)/8 = 21.00
       III      20                                                   (21.00 + 19.50)/2 = 20.25
                         (1×14 + 1×20 + 2×23 + 4×19)/8 = 19.50
2021   I        23                                                   (19.50 + 20.13)/2 = 19.81
                         (1×20 + 1×23 + 2×19 + 4×20)/8 = 20.13
       II       19
       III      20
2. A moving average time series is a smoother series than the original time
series values. It has removed the effect of short-term fluctuations (i.e.
seasonal and irregular fluctuations) from the original observations by
averaging over these short-term fluctuations.
4. The method is very flexible in the sense that the addition of a few more
figures to the data simply results in a few more trend values without
affecting the previous calculations.
Demerits
SAQ 3
The marketing manager of an electricity company recorded the following
quarterly demand levels for electricity (in 1000 megawatts) in a city from 2020
to 2022.
Season 2020 2021 2022
Summer 70 101 146
Monsoon 52 64 92
Winter 22 24 38
Spring 31 45 49
where α is called the exponential smoothing constant and lies between 0 and 1. It controls the rate at which the weights decrease.
This method consists of the following steps:
Step 1: We take the first given value as the first smoothed value, i.e.,
y′1 = y1
Step 2: We compute the second smoothed value using the first smoothed value as
y′2 = α y2 + (1 − α) y′1
Step 3: We repeat this process until all the data are exhausted. We compute the tth smoothed value as follows:
y′t = α yt + (1 − α) y′t−1
A popular choice of the smoothing constant is α = 0.2. With this choice, we assign a weight of 0.2 to the most recent observation and a weight of 1 − 0.2 = 0.8 to the most recent smoothed value.
Similarly, you can compute the remaining values in the same manner, and also for α = 0.4 and 0.8.
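Steps 1 to 3 can be sketched in Python for the claims data of Example 1 with α = 0.2 (a sketch; the variable names are ours):

```python
# Simple exponential smoothing of the claims data of Example 1 with alpha = 0.2.
claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
alpha = 0.2

smoothed = [claims[0]]                    # Step 1: y'1 = y1
for y in claims[1:]:                      # Steps 2-3: y't = a*yt + (1 - a)*y't-1
    smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])

print([round(v, 2) for v in smoothed[:3]])   # [17, 16.2, 15.96]
```

For instance, y′2 = 0.2 × 13 + 0.8 × 17 = 16.2 and y′3 = 0.2 × 15 + 0.8 × 16.2 = 15.96; rerunning with alpha = 0.4 or 0.8 reproduces the other two smoothed series plotted in Fig. 10.8.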
Fig. 10.8: Original and smoothed time series values of the claim data for α = 0.2, 0.4 and 0.8.
3. If we forecast using the moving average method, then the m prior values are required. If we have to forecast many values, this is time-consuming, whereas the exponential smoothing method uses only two pieces of data.
Demerits
1. The method is not flexible in the sense that if some figures are added to
the data, then we have to do all calculations again.
2. This method gives good results in the absence of seasonal or cyclical
variations. As a result, forecasts are not accurate when data with cyclical
or seasonal variations are present.
After understanding the exponential smoothing method, you may be interested in doing the same yourself. For that, you can try the following Self Assessment Question.
SAQ 4
The annual expenditure levels (in millions) to promote products and services
for the financial services sector such as banks, insurance, investments, etc.
from 2015 to 2022 are shown in the following table:
Year 2015 2016 2017 2018 2019 2020 2021 2022
Expenditure 5.5 7.2 8.0 9.6 10.2 11.0 12.5 14.0
We think that you have understood the importance of smoothing and how to apply it to time series data. We shall discuss how to estimate the trend component in the next section.
The coefficients β0 and β1 denote the intercept and the slope of the trend line, respectively. The intercept β0 represents the predicted value of Y when t = 0, and the slope β1 represents the average predicted change in Y resulting from a one-unit change in t.

We can estimate the values of the constants β0 and β1 using the following normal equations:

ΣYt = nβ0 + β1 Σt

ΣtYt = β0 Σt + β1 Σt²

where n is the number of observations in the given time series. We obtain the values of ΣYt, Σt, ΣtYt and Σt² from the given time series data and solve these normal equations for the values of β0 and β1.
ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²

After that, we put the value of Xt in terms of t to find the final trend line.
Let us take an example to understand how to fit a linear trend line for real-life
time series data.
Example 4: The sales director of a real estate company wants to study the
general direction (trend) of future housing sales. For that, he/she recorded the
number of houses sold from 2010 to 2018 as given in the following table:
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018
Sales 52 54 48 60 61 66 70 80 92
(i) Construct a simple trend line for the house sales data for the real estate
company.
(ii) Find the trend values for the given data and find forecast errors.
(iii) Plot the given data with trend values.
(iv) Use the trend line of best fit to estimate the level of house sales for the
year 2022.
Solution: The linear trend line equation is given by

Yt = β0 + β1t
Since n (number of years) = 9 is odd and the middle value is 2014, we make the following transformation in time t:

Xt = t − 2014

The normal equations then become:

ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²

Solving these gives β̂0 = 583/9 = 64.78 and β̂1 = 288/60 = 4.8, so the fitted trend line is Ŷt = 64.78 + 4.8 Xt. We can find the trend values using this trend line by putting values of t. For example, for t = 2010,

Ŷt = 64.78 + 4.8 (2010 − 2014) = 45.58
You can calculate the rest of the values in a similar manner. We have
calculated the same in the above table. We now plot the time series data and
trend line by taking years on the X-axis and the house sales and trend values
on the Y-axis. We get the time series plot as shown in Fig. 10.9.
We can estimate the trend value of house sales for 2022 by putting t = 2022 in
the above linear trend line as follows:
Ŷt = 64.78 + 4.8 ( 2022 − 2014 ) = 103.18 ≈ 103
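Example 4 can be reproduced with a short script. Assuming, as above, that n is odd so that ΣXt = 0 and the normal equations decouple, the helper below is our own sketch (not the unit's notation):

```python
def fit_linear_trend(years, values):
    """Fit Y_t = b0 + b1*X_t with X_t = t - middle year (n odd) via the
    normal equations; since sum(X_t) = 0, the two equations decouple."""
    n = len(values)
    mid = years[n // 2]                     # middle year (n is odd)
    x = [t - mid for t in years]
    b0 = sum(values) / n                    # sum(Y) = n*b0 when sum(X) = 0
    b1 = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return b0, b1, mid

years = list(range(2010, 2019))
sales = [52, 54, 48, 60, 61, 66, 70, 80, 92]   # house sales from Example 4
b0, b1, mid = fit_linear_trend(years, sales)
print(round(b0, 2), round(b1, 2))               # 64.78 4.8
print(round(b0 + b1 * (2022 - mid), 2))         # forecast for 2022: 103.18
```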
Fig. 10.9: House sales data with fitted linear trend line.
We proceed in the same way as for the linear trend line. The normal equations for estimating β0, β1 and β2 after transforming the data are:

ΣYt = nβ0 + β1 ΣXt + β2 ΣXt²

ΣXtYt = β0 ΣXt + β1 ΣXt² + β2 ΣXt³

ΣXt²Yt = β0 ΣXt² + β1 ΣXt³ + β2 ΣXt⁴

After that, we put the value of Xt in terms of t to find the final quadratic trend.
To illustrate this, let us take an example to fit a quadratic trend.
Example 5: Fit a quadratic trend equation for the house sales data of the real estate company given in Example 4. Also

(i) Find the forecast errors.

(ii) Use the quadratic trend equation to estimate the level of house sales for the year 2022.

Solution: The quadratic trend equation is given as
Yt = β0 + β1t + β2t²

ΣYt = nβ0 + β1 ΣXt + β2 ΣXt²

ΣXtYt = β0 ΣXt + β1 ΣXt² + β2 ΣXt³

ΣXt²Yt = β0 ΣXt² + β1 ΣXt³ + β2 ΣXt⁴

By putting the values from the table in the normal equations, we get

583 = 9β0 + 0 × β1 + 60β2  ⇒  9β0 + 60β2 = 583

288 = 0 × β0 + 60β1 + 0 × β2  ⇒  β1 = 288/60 = 4.8

4108 = 60β0 + 0 × β1 + 708β2  ⇒  60β0 + 708β2 = 4108
After solving the above equations for β0, β1 and β2, we get their estimates as β̂0 = 59.99, β̂1 = 4.8 and β̂2 = 0.72. After putting the value of Xt in terms of t, we get the desired quadratic trend equation as follows:

Ŷt = 59.99 + 4.8 (t − 2014) + 0.72 (t − 2014)²

We can find the trend values using the above quadratic trend equation by putting t values. For example, for t = 2010,

Ŷt = 59.99 + 4.8 (2010 − 2014) + 0.72 (2010 − 2014)² = 52.31

e1 = y1 − ŷ1 = 52 − 52.31 = −0.31
You can calculate the rest of the values in a similar manner. We have
calculated the same in the table.
We can estimate the trend value of house sales for 2022 by putting t = 2022 in the above quadratic trend equation as follows:

Ŷt = 59.99 + 4.8 (2022 − 2014) + 0.72 (2022 − 2014)² = 144.47 ≈ 144
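The 3×3 system above simplifies because ΣXt = ΣXt³ = 0 under the symmetric coding Xt = t − 2014, so β1 decouples and β0, β2 solve a 2×2 system. A hedged sketch (helper name ours) that reproduces the estimates:

```python
def fit_quadratic_trend(x, y):
    """Solve the quadratic-trend normal equations.  With sum(x) = sum(x^3) = 0
    (symmetric coding), b1 decouples and b0, b2 solve a 2x2 system."""
    n = len(x)
    s2 = sum(v ** 2 for v in x)
    s4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2y = sum(a * a * b for a, b in zip(x, y))
    b1 = sxy / s2
    # Solve: n*b0 + s2*b2 = sy  and  s2*b0 + s4*b2 = sx2y
    b2 = (n * sx2y - s2 * sy) / (n * s4 - s2 * s2)
    b0 = (sy - s2 * b2) / n
    return b0, b1, b2

x = list(range(-4, 5))                        # X_t = t - 2014
y = [52, 54, 48, 60, 61, 66, 70, 80, 92]      # house sales from Example 4
b0, b1, b2 = fit_quadratic_trend(x, y)
print(round(b0, 2), round(b1, 2), round(b2, 2))   # 59.99 4.8 0.72
```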
We now plot the time series data and trend line by taking years on the X-axis
and the house sales and trend values on the Y-axis. We get the time series
plot as shown in Fig. 10.10.
Fig. 10.10: Fitted quadratic and linear trend models with house sales data.
If we compare the forecast errors of the linear and quadratic fits, we observe that they are smaller for the quadratic fit, so we can say that the quadratic trend fits the number of houses sold by the company better than the linear trend.

Before going to the next section, you may like to fit a linear or quadratic trend yourself. So, try the following Self Assessment Question.
SAQ 5
The following table gives the gross domestic product (GDP) in 100 million for a certain country from 2011 to 2020:

Year 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
GDP  35   37   51   54   62   64   74   71   83   80

(i) Fit a trend line for the GDP data and find the trend values with the help of the trend line.

(ii) Use the best-fit trend model to predict the country's GDP for 2022.
We can transform this model to a linear trend model by taking the natural logarithm (base e) on both sides of the above model as follows:

log(Yt) = log(β0) + β1t log(e) = log(β0) + β1t

Writing Zt = log(Yt) and a = log(β0), this is the equation of a linear trend. Therefore, we proceed in the same way as in the case of the linear trend line and find the estimates of a and β1.

Once the estimates of a and β1, i.e., â and β̂1, are obtained, we can obtain an estimate of β0, i.e., β̂0, with the help of â as

β̂0 = e^â

After that, we put the values of the estimates of β0 and β1 in the exponential form to get the equation of best fit.
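Under the same ΣXt = 0 coding, the log-transformed fit can be sketched as below. The function name and the illustrative revenue-like figures are our own, not the SAQ 6 data:

```python
import math

def fit_exponential_trend(x, z):
    """Fit Z = a + b1*X on log-transformed data (Z = ln Y), then recover
    b0 = e^a for the model Y = b0 * exp(b1 * X).  Assumes sum(x) = 0."""
    n = len(x)
    a = sum(z) / n                       # since sum(x) = 0, a is the mean of Z
    b1 = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
    return math.exp(a), b1               # b0 = e^a

# hypothetical revenue-like series, coded X_t = 2(t - 2018.5) for 2015-2022
x = [-7, -5, -3, -1, 1, 3, 5, 7]
y = [15.0, 22.0, 33.0, 49.0, 73.0, 108.0, 160.0, 238.0]
z = [math.log(v) for v in y]
b0, b1 = fit_exponential_trend(x, z)
```

Because only a linear fit on (Xt, Zt) is needed, any least-squares routine gives the same â and β̂1; the exponential form is recovered at the end via β̂0 = e^â.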
Let us try a Self Assessment Question for the exponential trend in the same
manner as described above.
SAQ 6
The gross revenue of a company from 2015 to 2022 is given in the following
table:
(i) Check which model of trend is the best fit to the given data.
(ii) Fit the suitable model.
(iii) Find the forecast errors.
(iv) Use the best fit trend model to predict the company’s gross revenue for
2025.
10.8 SUMMARY
In this unit, we have discussed:
• A time series is a set of numeric data of a variable that is collected over
time at regular intervals and arranged in chronological (time) order.
• The different components of a time series are the trend, cyclical, seasonal and irregular components;
• The variations which occur due to rhythmic or natural forces and operate
in a regular and periodic manner over a span of less than or equal to one
year are termed as seasonal variations.
• Apart from seasonal effects, some time series exhibit variation due to
some other physical causes, which is called cyclic variation. It has a
period greater than one year.
• The variations in a time series which do not repeat in a definite pattern are
called irregular variations or irregular components.
10.10 SOLUTION/ANSWERS
Self-Assessment Questions (SAQs)
1. Refer Secs. 10.2 and 10.3.
2. Refer Sec. 10.4.
3. Since m = 4 is even, we compute the first moving average as

MA1 = (70 + 52 + 22 + 31) / 4 = 43.75

We put the first MA in the middle of the second and third observations by creating blank rows after each observation, as explained in Example 2. We can also calculate the weighted MAs by taking the weights w1 = 1, w2 = 2, w3 = 3 and w4 = 4 as follows:

WMA1 = (1 × 70 + 2 × 52 + 3 × 22 + 4 × 31) / (1 + 2 + 3 + 4) = 36.40
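The MA and weighted MA calculations above can be checked with a small sketch (helper names ours):

```python
def moving_average(y, m):
    """Simple m-period moving averages of the series."""
    return [sum(y[i:i + m]) / m for i in range(len(y) - m + 1)]

def weighted_moving_average(y, w):
    """Weighted moving average: weights w are applied to each m-long window."""
    m = len(w)
    return [sum(wi * yi for wi, yi in zip(w, y[i:i + m])) / sum(w)
            for i in range(len(y) - m + 1)]

demand = [70, 52, 22, 31, 101, 64, 24, 45]     # first two years of SAQ 3
print(moving_average(demand, 4)[0])            # 43.75
print(weighted_moving_average(demand, [1, 2, 3, 4])[0])   # 36.4
```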
Summer 70
Monsoon 52
51.50 59.30
54.50 64.30
55.00 52.10
69.75 83.10
Summer 146 78.50 84.25
80.25 76.50
81.25 64.00
Winter 38
Spring 49
We now plot the time series data and MAs by taking years on the X-axis and
the demand of electricity and centred MA and WMA on the Y-axis. We get the
time series plot as shown in Fig. 10.11.
Fig. 10.11: Demand of electricity with centred MA and WMA.

From the figure, we observe that the WMA series is smoother than the simple MA series.
4. In the exponential smoothing method, we take the first smoothed (forecast) value as the first given value, i.e.,

y′1 = y1 = 5.5

We can compute the forecast error as

e1 = y1 − y′1 = 5.5 − 5.5 = 0

We can compute the second smoothed value using the first smoothed value as

y′2 = α y2 + (1 − α) y′1

Similarly, you can compute the remaining values in the same manner, in a table with columns Year, Expenditure, Exponential Smoothing (α = 0.8) and Forecast Error.
We now demonstrate the impact of the smoothing factor α using the time series graph shown in Fig. 10.12.

Fig. 10.12: Original and smoothed (α = 0.8) expenditure values.

From the figure, we observe that the smoothed series is not much smoother than the original data, so α = 0.8 has little smoothing effect on the time series.
5. The linear trend line equation is given by

Yt = β0 + β1t

Since n = 10 is even and the two middle years are 2015 and 2016, we take the transformation

Xt = (t − 2015.5) / (1/2) = 2 (t − 2015.5)

After the transformation, the normal equations for the linear trend line are:

ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²

Putting the values from the data, we get

611 = 10 × β0 + β1 × 0  ⇒  β0 = 611/10 = 61.1

889 = β0 × 0 + β1 × 330  ⇒  β1 = 889/330 = 2.69

Thus, the final linear trend line is given by

Ŷt = 61.1 + 2.69 Xt
We can find the trend values using the above line by putting t values. For example, for t = 2011,

Ŷt = 61.1 + 2.69 × 2 (2011 − 2015.5) = 61.1 + 2.69 × (−9) = 36.89

We can compute the forecast error as

e1 = y1 − ŷ1 = 35 − 36.89 = −1.89

You can calculate the rest of the values in a similar manner. We have calculated the same in the above table.

We can estimate the trend value of the GDP of the country for 2022 by putting t = 2022 in the above linear trend line as follows:

Ŷt = 61.1 + 2.69 × 2 (2022 − 2015.5) = 61.1 + 2.69 × 13 = 96.07 ≈ 96
We now plot the time series data and trend line by taking years on the X-axis and the GDP and trend values on the Y-axis. We get the time series plot as shown in Fig. 10.13.

Fig. 10.13: GDP data with fitted linear trend line.
6. To check which model of trend is the best fit to the given data, we first
plot the given data by taking the year on the X-axis and corresponding
gross revenue on the Y-axis as shown in Fig. 10.14.
Fig. 10.14: Time series plot of the gross revenue data.

Since the plotted values increase at an increasing rate, we fit the exponential trend model
Yt = β0 e^(β1t)

Taking the natural logarithm gives the linear form Zt = a + β1t, where Zt = log(Yt) and a = log(β0). Since n = 8 is even, we take the transformation

Xt = (t − 2018.5) / (1/2) = 2 (t − 2018.5)

Therefore, the normal equations for estimating a and β1 are:

ΣZt = na + β1 ΣXt

ΣXtZt = a ΣXt + β1 ΣXt²
We calculate ΣZt, ΣXt, ΣXtZt and ΣXt² in a table with columns Year (t), Gross Revenue (Yt), Xt = 2(t − 2018.5), Zt = log(Yt), XtZt, Xt², Trend Value and Forecast Error. Putting the totals in the normal equations, we get

35.74 = 8 × a + β1 × 0  ⇒  a = 35.74/8 = 4.47

33.36 = a × 0 + β1 × 168  ⇒  β1 = 33.36/168 = 0.20
Therefore,

β̂0 = e^â = e^4.47 = 87.60

and the fitted exponential trend equation is

Ŷt = β̂0 e^(β̂1 t) = 87.60 e^(0.20 × 2(t − 2018.5))
We can find the trend values using the above equation by putting t values. For example, for t = 2015,

Ŷ2015 = 87.60 e^(0.20 × 2(2015 − 2018.5)) = 21.58

and the forecast error is

e1 = y1 − Ŷ2015 = 15 − 21.58 = −6.58

You can calculate the rest of the values in a similar manner. We have calculated the same in the table.
We can estimate the trend value of the gross revenue of the company for 2025 by putting t = 2025 in the above trend equation as follows:

Ŷt = 87.60 e^(0.20 × 2(2025 − 2018.5)) = 1151.79
We now plot the time series data and trend values by taking years on the
X-axis and the gross revenue and trend values on the Y-axis. We get the
time series plot as shown in Fig. 10.15.
Fig. 10.15: Fitted exponential trend model and gross revenue data.
For comparison with the trend line, we also calculate the forecast errors from the moving average trend in the same table. For example,

e2 = y2 − MA = 37 − 41 = −4

By comparing the forecast errors of the moving average and the trend line, we observe that in most cases the error is smaller for the trend line than for the moving average.
We also plot the estimated trend using the method of least squares and the moving average method in Fig. 10.16.
Fig. 10.16: GDP data with trend values using method of least squares and moving
average.
UNIT 11
SEASONAL COMPONENT
ANALYSIS
Structure

11.1 Introduction
     Expected Learning Outcomes
11.2 Estimation of Seasonal Component
11.3 Simple Average Method
11.4 Ratio to Trend Method
11.5 Ratio to Moving Average Method
11.6 Estimation of Trend from Deseasonalised Data
11.7 Estimation of Cyclic Component
     Residual Method
11.8 Estimation of Random Component
11.9 Forecasting
11.10 Summary
11.11 Terminal Questions
11.12 Solution/Answers
11.1 INTRODUCTION
In Unit 10, you have seen that a time series can be decomposed into four components, i.e., trend, seasonal, cyclic and irregular components. Our aim is to estimate these components and use them for forecasting. We have already described some methods for smoothing or filtering the time series, i.e., the simple average, weighted moving average and exponential smoothing methods. We have also described some methods for estimating the trend, i.e., the method of least squares and the moving average method.

In Unit 10, you also studied the seasonal component. Seasonal variations are fluctuations in a time series that are repeated at regular intervals within a year (e.g. daily, weekly, monthly, quarterly). Seasonal variation may be caused by the seasons, temperature, rainfall, public holidays, etc. If the effect of seasonal variation is not removed from the time series data, then the trend estimate will also be affected by seasonal effects. In such cases, we divide the original time series values by the corresponding seasonal indices. This technique is called deseasonalisation of data.
In this unit, we shall discuss some methods for estimating the seasonal and cyclic components. In Sec. 11.2, we discuss the goals of seasonal component analysis. In Secs. 11.3, 11.4 and 11.5, we explain the simple average method, the ratio to trend method, and the ratio to moving average method, with their merits and demerits, for estimating seasonal components, respectively.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
In Sec. 11.6, we explain the estimation of trend using deseasonalised data.
The estimation of cyclic and random components is discussed in Secs. 11.7
and 11.8, respectively. Once we have estimated the trend, seasonal and cyclic components, we shall use them for forecasting purposes. Therefore, Sec. 11.9 is devoted to forecasting the time series. In the next units, you will study the modern approach to forecasting time series.
Expected Learning Outcomes
After studying this unit, you would be able to
• describe the effect of seasonal variation on a time series;
• explain the goals of analysing the seasonal component;
• apply the simple average method for the estimation of seasonal indices;
• use the ratio to trend method for computing seasonal indices;
• apply the ratio to moving average method for the estimation of seasonal indices;
• describe the method of estimation of the trend component from deseasonalised time series data;
• discuss the methods of estimation of cyclic and irregular fluctuations in a time series; and
• use the trend, seasonal and cyclical components to forecast future values.
That is, after calculating the grand average ȳ, we express each seasonal average as a percentage of the grand average. These percentages are known as adjusted seasonal indices. We can compute the ith seasonal index (Si) as

Si = (ȳi / ȳ) × 100
Interpretation of seasonal index

Each (adjusted) seasonal index, also called the seasonal effect or seasonal component, measures the average magnitude of the seasonal influence on the actual values of the time series for a given period within the year; it measures how a particular season compares, on average, to the mean of the cycle. For example, suppose the seasonal index of the season (April-June) for the AC sales of a particular company is 170.5. It means that the sale of ACs in that season, on average, is 70.5% (170.5 − 100) higher than the average. If the seasonal index for the season (October-December) is 75, it indicates that the sale of ACs in that season, on average, is 25% (100 − 75) below the average, i.e., sales are depressed by the presence of seasonal forces by approximately 25%. Alternatively, the AC sales would be about 25% higher if seasonal influences had not been present.
After understanding various steps of computing seasonal indices using the
simple average method and how to interpret these, let us take an example to
apply this method.
Example 1: The marketing manager of an electricity company recorded the
following quarterly (seasonally) demand levels for electricity (in 1000
megawatts) in a city from 2019 to 2022.
Season 2019 2020 2021 2022
Summer 70 101 120 135
Monsoon 52 64 75 82
Winter 22 24 30 34
Spring 31 45 49 50
(i) Calculate the seasonal index for each season by assuming that there
are no trend and cyclic effects.
(ii) Plot seasonal indices and original data on the same graph.
(iii) Also, interpret the seasonal indices.
We now plot the demand of electricity and the seasonal indices on the same graph, taking the quarters on the X-axis and the demand of electricity and seasonal indices on the Y-axis.
After understanding the simple average method and how to calculate seasonal
indices, we now discuss the merits and demerits of this method.
• This method is based on the unrealistic assumption that the trend and
cyclical variations are not present in the time series data. These
assumptions are not met in most of the economic or business time series
because they generally exhibit trends.
Before moving to the next method of measuring seasonal component, you can
try the following Self Assessment Question for better understanding.
SAQ 1
The sales manager of a company recorded the monthly sales (in thousands)
of a product for the years 2020, 2021 and 2022 and are given as follows:
Sales
Months
2020 2021 2022
January 120 150 160
February 110 140 150
March 100 130 140
April 140 160 160
May 150 160 150
June 150 150 170
July 160 170 160
August 130 120 130
September 110 1360 100
October 100 120 100
November 120 130 110
December 150 140 150
Obtain the monthly seasonal indices assuming that there is no trend and cyclic
effects.
After understanding the simple average method and its merits and demerits, we now learn the second method of calculating seasonal variation, that is, the ratio to trend method.
Unit 11 Seasonal Component Analysis
Step 4: To eliminate the trend effect from the data, we express the original time series values as percentages of the trend values, assuming the multiplicative model. After dividing by the trend, these percentages will contain only the seasonal and irregular components.

Step 5: After eliminating the trend effect, the series contains only seasonal and irregular effects. Therefore, we now obtain the seasonal indices free from the irregular variations by following the same procedure discussed for the simple average method in the previous section. Thus, we find the average (mean or median) of the ratio to trend values (or percentage values) for each season over all years. It is suggested to prefer the median over the mean if there are some extreme values (outliers) which are not primarily due to seasonal effects; in this way, the irregular variation is removed. If there are only a few mild abnormal values in the percentage values, then the mean may be preferred to remove the randomness. These averages are known as seasonal indices.
Step 6: If the sum of the seasonal indices is not 400 for quarterly data (or 1200 for monthly data), then we find the adjusted seasonal indices by expressing each seasonal index as a percentage of the average of the seasonal indices:

Adjusted seasonal index = (Seasonal index / Average of seasonal indices) × 100
We first compute the yearly averages:

Year   Quarterly values     Average
2019   70, 52, 22, 31       (70 + 52 + 22 + 31)/4 = 43.75
2020   101, 64, 24, 45      (101 + 64 + 24 + 45)/4 = 58.50
2021   120, 75, 30, 49      (120 + 75 + 30 + 49)/4 = 68.50
2022   135, 82, 34, 50      (135 + 82 + 34 + 50)/4 = 75.25
We now determine the yearly trend by fitting a linear trend by the method of
least square as discussed in Unit 10.
After the transformation, the normal equations for a linear trend line are:

ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²

We calculate the values of ΣYt, ΣXt, ΣXtYt and ΣXt² in the following table:
Year (t)   Yt      Xt = 2(t − 2020.5)   XtYt      Xt²
2019       43.75   −3                   −131.25   9
2020       58.50   −1                   −58.50    1
2021       68.50   1                    68.50     1
2022       75.25   3                    225.75    9
Total      246     0                    104.50    20

Putting these totals in the normal equations gives β0 = 246/4 = 61.5 and β1 = 104.50/20 = 5.23, so the trend line is Ŷt = 61.5 + 5.23 Xt. We now find the trend values by putting Xt = −3, −1, 1, 3 in this trend line equation as follows:
Trend value (2019) = 61.5 + 5.23 × (−3) = 45.81
Trend value (2020) = 61.5 + 5.23 × (−1) = 56.27
Trend value (2021) = 61.5 + 5.23 × 1 = 66.73
Trend value (2022) = 61.5 + 5.23 × 3 = 77.19
These trend values represent the averages of the corresponding year and are
supposed to lie at the centre of the corresponding year. Therefore, we place
these in the middle of the 2nd and 3rd quarters. We now determine the trend
values for each quarter using the yearly increment (slope = 5.23) of the trend
line.
We have the yearly increment = 5.23. Therefore, the quarterly increment will be 5.23/4 = 1.31, and the half-quarterly increment will be 1.31/2 = 0.66.
Thus, the trend value for the 3rd quarter of the year 2019 will be
= Trend value of 2019 + half quarterly increment = 45.81 + 0.66 = 46.47 .
Similarly, the trend value for the 2nd quarter of the year 2019 will be
= Trend value of 2019 – half quarterly increment = 45.81 − 0.66 = 45.15 .
Thus, the trend value for the 1st quarter of the same year will be
= Trend value of 2nd quarter – quarterly increment= 45.15 − 1.31= 43.84 .
In a similar way, the values for the 4th quarter of the same year will be
= Trend value of 3rd quarter + quarterly increment= 46.47 + 1.31= 47.78 .
Similarly, you can compute the trend values for the remaining years as we have computed for all quarters of the year 2019. We have calculated the same in the following table:
Quarterly trend values

Season    2019                   2020                   2021                   2022
Summer    45.15 − 1.31 = 43.84   55.61 − 1.31 = 54.30   66.07 − 1.31 = 64.76   76.53 − 1.31 = 75.22
Monsoon   45.81 − 0.66 = 45.15   56.27 − 0.66 = 55.61   66.73 − 0.66 = 66.07   77.19 − 0.66 = 76.53
Winter    45.81 + 0.66 = 46.47   56.27 + 0.66 = 56.93   66.73 + 0.66 = 67.39   77.19 + 0.66 = 77.85
Spring    46.47 + 1.31 = 47.78   56.93 + 1.31 = 58.24   67.39 + 1.31 = 68.70   77.85 + 1.31 = 79.16
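The interpolation above can be automated. This sketch (helper name ours) uses the unrounded increments 5.23/4 and 5.23/8, so its last decimals can differ by 0.01 from the hand-rounded table values:

```python
def quarterly_trend(yearly_trend, slope):
    """Spread yearly trend values (centred at mid-year) into quarterly values
    using the quarterly increment slope/4 and half-quarterly increment slope/8."""
    q = slope / 4          # quarterly increment
    h = q / 2              # half-quarterly increment
    rows = []
    for t in yearly_trend:
        q2 = t - h         # 2nd quarter: half a quarter before the centre
        q3 = t + h         # 3rd quarter: half a quarter after the centre
        rows.append([q2 - q, q2, q3, q3 + q])
    return rows

yearly = [45.81, 56.27, 66.73, 77.19]     # fitted trend values for 2019-2022
rows = quarterly_trend(yearly, 5.23)
```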
After getting the quarterly trend values, we now remove the trend effect. For
that, we divide each original value into the corresponding trend value and
express them in percentages as shown in the following table:
Season    2019                      2020                      2021                      2022
Monsoon   52/45.15 × 100 = 115.17   64/55.61 × 100 = 115.09   75/66.07 × 100 = 113.52   82/76.53 × 100 = 107.15
Winter    22/46.47 × 100 = 47.34    24/56.93 × 100 = 42.16    30/67.39 × 100 = 44.52    34/77.85 × 100 = 43.67
Spring    31/47.78 × 100 = 64.88    45/58.24 × 100 = 77.27    49/68.70 × 100 = 71.32    50/79.16 × 100 = 63.16
Now, the above data is free from the trend effect, so we can apply the simple average method to calculate the seasonal indices. We now compute the average (seasonal index) ȳi of each season/quarter over the years and then the adjusted seasonal indices as follows:

Calculations for seasonal indices

Season   2019   2020   2021   2022   Average (Seasonal Index)   Adjusted Seasonal Index
The average yearly seasonal indices obtained above are adjusted to a total of
400 because the total of the seasonal Indices for each quarter is 405.04 which
is greater than 400. So we express each seasonal index as the percentage of
the average of seasonal indices. The adjusted seasonal indices for each
quarter are given in the last column of the above table.
We now discuss the merits and demerits of ratio to trend method as follows:
Merits and Demerits
SAQ 2
The marketing manager of a company that manufactures and distributes
farming equipment (harvesters, ploughs and tractors) recorded the number of
farming units sold quarterly for the period 2018 to 2020 which are given in the
following table:
Quarter
Q1 Q2 Q3 Q4
Year
2018 48 41 60 65
2019 58 52 68 74
2020 60 56 75 78
(i) Find the quarterly seasonal indices for farming equipment sold using the ratio to trend method, assuming that there is no cyclic effect.
(ii) Do seasonal forces significantly influence the sale of farming equipment?
Comment.
After understanding the simple average and ratio to trend methods and their
merits and demerits, let us move to the ratio to moving average method.
Year   Season    Demand   4-quarter MA                  Centred MA                  Seasonal Ratio
2019   Summer    70       —                             —                           —
       Monsoon   52       —                             —                           —
                          (70+52+22+31)/4 = 43.75
       Winter    22                                     (43.75 + 51.50)/2 = 47.63   22/47.63 × 100 = 46.19
                          (52+22+31+101)/4 = 51.50
       Spring    31                                     (51.50 + 54.50)/2 = 53.00   31/53 × 100 = 58.49
                          (22+31+101+64)/4 = 54.50
2020   Summer    101                                    (54.50 + 55.00)/2 = 54.75   101/54.75 × 100 = 184.47
                          (31+101+64+24)/4 = 55.00
       Monsoon   64                                     (55.00 + 58.50)/2 = 56.75   64/56.75 × 100 = 112.78
                          (101+64+24+45)/4 = 58.50
       Winter    24                                     (58.50 + 63.25)/2 = 60.88   24/60.88 × 100 = 39.43
                          (64+24+45+120)/4 = 63.25
       Spring    45                                     (63.25 + 66.00)/2 = 64.63   45/64.63 × 100 = 69.63
                          (24+45+120+75)/4 = 66.00
2021   Summer    120                                    (66.00 + 67.50)/2 = 66.75   120/66.75 × 100 = 179.78
                          (45+120+75+30)/4 = 67.50
       Monsoon   75                                     (67.50 + 68.50)/2 = 68.00   75/68 × 100 = 110.29
                          (120+75+30+49)/4 = 68.50
       Winter    30                                     (68.50 + 72.25)/2 = 70.38   30/70.38 × 100 = 42.63
                          (75+30+49+135)/4 = 72.25
       Spring    49                                     (72.25 + 74.00)/2 = 73.13   49/73.13 × 100 = 67.01
                          (30+49+135+82)/4 = 74.00
2022   Summer    135                                    (74.00 + 75.00)/2 = 74.50   135/74.5 × 100 = 181.21
                          (49+135+82+34)/4 = 75.00
       Monsoon   82                                     (75.00 + 75.25)/2 = 75.13   82/75.13 × 100 = 109.15
                          (135+82+34+50)/4 = 75.25
       Winter    34       —                             —                           —
       Spring    50       —                             —                           —
After that, we compute the seasonal ratio for each period (for which we have the moving average) by dividing each actual time series value by its corresponding moving average value:

Seasonal ratio = (Actual value / Moving average) × 100

We have computed these in the last column of the above table.
We now prepare a two-way table consisting of the seasonal ratios quarter-
wise for all years as shown in the following table and calculate median
(seasonal indices) and adjusted seasonal indices as follows:
Season    2019    2020     2021     2022     Seasonal Index (Median)   Adjusted Seasonal Index
Summer    —       184.47   179.78   181.21   181.21                    181.21/100.29 × 100 = 180.70
Monsoon   —       112.78   110.29   109.15   110.29                    110.29/100.29 × 100 = 109.98
Winter    46.19   39.43    42.63    —        42.63                     42.63/100.29 × 100 = 42.51
Spring    58.49   69.63    67.01    —        67.01                     67.01/100.29 × 100 = 66.82
Total                                        401.14                    400
Average                                      100.29                    100
After understanding the ratio to moving average method, let us see the merits
and demerits of this method.
Merits and Demerits
• The ratio to moving average method is the most popular method because
it can also be applied when all four components of a time series are
present.
• It is easier than the ratio to trend method and provides satisfactory
estimates.
• It is better than the simple average method, which assumes that the trend is absent, although it is more difficult to apply than the simple average method.
• Its primary drawback is a loss of information when calculating the trend: the moving averages (trend) of a few seasons at the beginning and an equal number of seasons at the end are not available.
Before moving to the next section, you should try the following Self Assessment Question.
SAQ 3
Compute the quarterly seasonal indexes for the sales of farming equipment
data given in SAQ 2.
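The centred moving averages and seasonal ratios of the worked example can be sketched as follows (helper names ours); positions at both ends of the series have no centred MA:

```python
def centred_ma(y, m=4):
    """Centred moving averages: average each pair of adjacent m-period MAs.
    The first m/2 and last m/2 positions have no value (None)."""
    ma = [sum(y[i:i + m]) / m for i in range(len(y) - m + 1)]
    middle = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    return [None] * (m // 2) + middle + [None] * (m // 2)

def seasonal_ratios(y, m=4):
    """Ratio of each actual value to its centred moving average, in percent."""
    return [None if c is None else v / c * 100
            for v, c in zip(y, centred_ma(y, m))]

# quarterly electricity demand, 2019-2022 (Summer, Monsoon, Winter, Spring)
demand = [70, 52, 22, 31, 101, 64, 24, 45, 120, 75, 30, 49, 135, 82, 34, 50]
ratios = seasonal_ratios(demand)
```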
Year   Season    Demand of Electricity   Seasonal Index   Deseasonalised Demand
2019   Summer    70                      180.70           70/180.70 × 100 = 38.74
       Monsoon   52                      109.98           52/109.98 × 100 = 47.28
       Winter    22                      42.51            22/42.51 × 100 = 51.75
       Spring    31                      66.82            31/66.82 × 100 = 46.39
2020   Summer    101                     180.70           101/180.70 × 100 = 55.89
       Monsoon   64                      109.98           64/109.98 × 100 = 58.19
       Winter    24                      42.51            24/42.51 × 100 = 56.46
       Spring    45                      66.82            45/66.82 × 100 = 67.35
2021   Summer    120                     180.70           120/180.70 × 100 = 66.41
       Monsoon   75                      109.98           75/109.98 × 100 = 68.19
       Winter    30                      42.51            30/42.51 × 100 = 70.57
       Spring    49                      66.82            49/66.82 × 100 = 73.33
2022   Summer    135                     180.70           135/180.70 × 100 = 74.71
       Monsoon   82                      109.98           82/109.98 × 100 = 74.56
       Winter    34                      42.51            34/42.51 × 100 = 79.98
       Spring    50                      66.82            50/66.82 × 100 = 74.83
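The deseasonalisation in the table reduces to one line per value; a sketch (function name ours) using the first year of the data:

```python
def deseasonalise(values, indices):
    """Remove the seasonal effect by dividing each value by its seasonal
    index and multiplying by 100 (multiplicative model)."""
    return [v / s * 100 for v, s in zip(values, indices)]

# one year of electricity demand with the adjusted seasonal indices
demand  = [70, 52, 22, 31]                  # Summer..Spring 2019
indices = [180.70, 109.98, 42.51, 66.82]    # repeated for every year
result = [round(d, 2) for d in deseasonalise(demand, indices)]
print(result)                               # [38.74, 47.28, 51.75, 46.39]
```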
We now determine the trend by fitting a linear trend line by the method of least
square as discussed in Unit 10.
Here, time is in the form of seasons (Summer, Monsoon, Winter, Spring)
which is non-numeric. Therefore, we use the sequential numbering system
and take t = 1, 2, 3, … instead of the name of seasons as shown in the next
table.
The linear trend line equation is given by

Yt = β0 + β1t

After the transformation, the normal equations for the linear trend line are:

ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²
We calculate the required sums in a table with columns Year, Season, t, Deseasonalised Demand of Electricity (Yt), Xt = 2(t − 8.5), XtYt, Xt² and Trend Value. Putting the totals in the normal equations, we get

1004.63 = 16 × β0 + β1 × 0  ⇒  β0 = 1004.63/16 = 62.79

1690.47 = β0 × 0 + β1 × 1360  ⇒  β1 = 1690.47/1360 = 1.24

Thus, the final linear trend line is given by

Ŷt = 62.79 + 1.24 Xt
We calculate the trend value by putting Xt = –15, –13,…, 15 in the above trend
line. The same is calculated in the last column of the above table.
We now plot the original and deseasonalised demand of electricity with the
trend line in Fig. 11.2.
Fig. 11.2: Actual and deseasonalised demand of electricity with trend values.
You should try a Self Assessment Question before going to the next section.
SAQ 4
A manager of a national park recorded the following data on the number of
visitors (in thousands) who visited the park in each quarter of 2021 and 2022:
Season    Seasonal Index   2021   2022
Summer    162              51     54
Monsoon   62               28     32
Winter    87               41     45
Spring    89               36     43
Tt × St × It
and then = Ct × It
Tt
If irregular variations are also removed from the time series data, then we will
be able to isolate the cyclic fluctuations. We follow these steps in the
computation of cyclic variations:

Step 1: We first compute the trend values, preferably by the moving average
        method of suitable order as discussed in Unit 10. Generally, the order
        of moving average is taken as the period of the seasonal effect.

Step 2: After finding the trend values, we find the seasonal indices, preferably
        by the ratio to moving average method as discussed in Sec. 11.5.

Step 3: After estimating the seasonal effects, we obtain deseasonalised values
        by dividing each actual value by its corresponding seasonal index, which
        removes the seasonal effect from the original data.

Step 4: We then fit the trend equation on the deseasonalised data, find the
        trend values, and express the deseasonalised data as a percentage of the
        trend values. (Dividing the deseasonalised values by the trend values
        removes the trend, so only the cyclic and irregular components remain.)

Step 5: If the time series does not contain any random variations, then Step 4
        will provide the cyclical variations. Otherwise, we filter/smooth out
        the random variations by computing moving averages of the values
        obtained in Step 4 with an appropriate period. A weighted moving average
        with suitable weights may also be used, if necessary, for this purpose.
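Steps 3–5 can be sketched in Python as follows (a minimal sketch; the sample numbers are the 2019 deseasonalised and trend values that reappear in the next example):

```python
def cyclic_effect(deseasonalised, trend):
    # Step 4: express deseasonalised values as a percentage of trend values;
    # Step 5 (not shown) would smooth these with a moving average if needed.
    return [round(d / t * 100, 2) for d, t in zip(deseasonalised, trend)]

effects = cyclic_effect([38.74, 47.28, 51.75, 46.39],
                        [44.19, 46.67, 49.15, 51.63])
# effects -> [87.67, 101.31, 105.29, 89.85]
```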
Let us take up an example to illustrate how to calculate the cyclic component.
Example 5: Calculate cyclic effects using the deseasonalised and trend
values for the demand of electricity calculated in Example 4. Also, plot the
original and deseasonalised values and cyclic effect at the same axis.
Solution: The deseasonalised and trend values for the demand of electricity
calculated in Example 4 are given in the following table:
Year   Season    Deseasonalised Demand   Trend    Cyclic Effect
                 of Electricity          Value
2019   Summer         38.74              44.19    (38.74/44.19) × 100 = 87.67
       Monsoon        47.28              46.67    101.31
       Winter         51.75              49.15    105.29
       Spring         46.39              51.63    89.85
2020   Summer         55.89              54.11    103.29
       Monsoon        58.19              56.59    102.83
       Winter         56.46              59.07    95.58
       Spring         67.35              61.55    109.42
2021   Summer         66.41              64.03    103.72
       Monsoon        68.19              66.51    102.53
       Winter         70.57              68.99    102.29
       Spring         73.33              71.47    102.60
2022   Summer         74.71              73.95    101.03
       Monsoon        74.56              76.43    97.55
       Winter         79.98              78.91    101.36
       Spring         74.83              81.39    91.94
We now plot the original and deseasonalised values with the cyclic effect in
Fig. 11.4.
Unit 11 Seasonal Component Analysis
Fig. 11.4: Original and deseasonalised demand of electricity with the cyclic effect.
You should try a Self Assessment Question before going to the next section.
SAQ 5
Compute cyclic component using the deseasonalised and trend values for the
number of visitors (in thousands) calculated in SAQ 4.
11.9 FORECASTING

One of the main purposes of time series analysis is forecasting. In the simplest
terms, time series forecasting is a method for predicting future values over a
period, or at a precise point in the future, using historical and present data.
Forecasting in time series is based on the assumption that the time series will
behave in the future, at least for the period for which we are forecasting, as
it did in the past. This assumption is not very realistic in general, but we
shall assume that, at least for short-term forecasting, the process remains the
same as in the past.
If a time series plot shows that there is no seasonal component, or on a
theoretical basis there is no reason for having a seasonal component, then one
can estimate the trend component and, by using the trend equation, one can
easily forecast as you have seen in Example 1 of Unit 10. If a time series plot
shows that there is a significant seasonal effect, and on theoretical grounds
there is also a valid reason for the presence of such a component, then we have
to take seasonal variations into account in estimation and forecasting.
When we have quarterly (monthly) data and the period of seasonality is one
year, then we have to estimate four (twelve) seasonal indices, one for each
quarter (month). After deseasonalising the data, the trend equation is fitted.
Then we project the trend for the period for which we have to forecast and
adjust it for the seasonal effect by multiplying it by the corresponding
seasonal index. This gives the final forecast value, corrected for the seasonal
effect. For forecasting on the basis of time series components, we follow these
steps:
Step 1: We first compute the trend values preferably by the moving average
method of suitable order as discussed in Unit 10. Generally, the order
of moving average is taken as the period of seasonal effect.
Step 2: After finding the trend values, we find the seasonal indices preferably
by ratio to moving average as discussed in Sec. 11.5.
We forecast the value for the winter of 2023 by putting t = 19 in the above
trend line equation because the numerical coding for the winter of 2023 is 19
as discussed in Example 4.
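This forecast can be sketched in Python (the trend coefficients and the coding Xt = 2(t − 8.5) are from Example 4; the seasonal index used here is a hypothetical stand-in, since the winter index itself is not reproduced above):

```python
def forecast(t, b0, b1, seasonal_index):
    # Project the trend at coded time X_t = 2(t - 8.5), then adjust for the
    # seasonal effect by multiplying with the seasonal index (in per cent).
    x = 2 * (t - 8.5)
    trend = b0 + b1 * x
    return trend * seasonal_index / 100

# Winter 2023 corresponds to t = 19; with index 100 the forecast equals
# the projected trend value 62.79 + 1.24 * 21 = 88.83.
y_hat = forecast(19, 62.79, 1.24, 100.0)
```

Multiplying instead by the estimated winter seasonal index would give the final seasonally adjusted forecast.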
SAQ 6
Forecast the number of visitors for the spring season of 2023 using estimated
seasonal indices and fitted trend line equation in SAQ 4.
11.10 SUMMARY
In this unit, we have discussed:
• The simple average method is used to estimate the seasonal effect from
the given time series data. It is based on the basic assumption that the
data do not contain any trend and cyclic components and consists of
eliminating irregular components by averaging the monthly (or quarterly
or yearly) values over the years.
• The seasonal index, also called the seasonal effect or seasonal component,
measures the average magnitude of the seasonal influence on the actual
values of the time series for a given period within the year; it measures
how a particular season compares, on average, to the mean of the cycle.
• The ratio to trend method is used when cyclical variations are absent
from the data, i.e., the time series variable consists of trend, seasonal
and random components.
• The ratio to moving average method is better than the simple average and
ratio to trend methods because of its accuracy, and it can also be applied
when all components, namely trend, seasonal, cyclic and irregular
variations, are present in the time series.
• For calculating deseasonalised values, we divide the actual value by its
corresponding seasonal index, that is,

Deseasonalised value = (Actual value / Seasonal index) × 100
• It is impossible to construct meaningful typical cycle indexes of the
cyclic component, similar to those developed for trends and seasons,
because successive cycles vary widely in timing, amplitude and pattern and
are inextricably mixed with irregular factors.
• An irregular variation cannot be completely eliminated from a time series
because of its nature, and it is very difficult to devise a formula for its
direct computation. However, this component can be partially removed by
averaging the indices.
• Time series forecasting is a method for predicting future values over a
period, or at a precise point in the future, using historical and present
data.
11.12 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Since time series data of sales are recorded monthly, therefore, we
compute the average and seasonal indices for each month as shown in
the following table:
Months      Sales                      Total   Average     Adjusted
            2020    2021    2022               Seasonal    Seasonal
                                               Index       Index
January 36 45 48 129 43 104.67
February 33 42 45 120 40 97.37
March 30 39 42 111 37 90.07
April 42 48 48 138 46 111.98
May 45 48 45 138 46 111.98
June 45 45 51 141 47 114.41
July 48 51 48 147 49 119.28
August 39 36 39 114 38 92.50
September 33 42 30 105 35 85.20
October 30 36 30 96 32 77.90
November 36 39 33 108 36 87.63
December 45 42 45 132 44 107.11
Total 493 1200.10
Average 41.08 100
2. Since we have to find seasonal indices using the ratio to trend method,
therefore, we first convert the quarterly data into yearly by computing the
average of all quarters of each year as follows:
Year    Q1    Q2    Q3    Q4    Average
2018    48    41    60    65    (48 + 41 + 60 + 65)/4 = 53.50
2019    58    52    68    74    63.00
2020    60    56    75    78    67.25
2021    62    60    82    90    73.50
We now fit the trend line to the yearly data. For that, the linear trend line
equation is given by

Yt = β0 + β1t
Since n (= 4) is even, we make the following transformation in time t:

Xt = (t − average of the two middle values) / (half of the interval in t values)
   = (t − 2019.5) / (1/2) = 2(t − 2019.5)

After the transformation, the normal equations for the linear trend line will be:

ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²
Quarter   2018                  2019                2020                  2021
Q1        54.28 − 0.8 = 53.48   60.7 − 0.8 = 59.9   67.12 − 0.8 = 66.32   73.54 − 0.8 = 72.74
Q2        54.68 − 0.4 = 54.28   61.1 − 0.4 = 60.7   67.52 − 0.4 = 67.12   73.94 − 0.4 = 73.54
Q3        54.68 + 0.4 = 55.08   61.1 + 0.4 = 61.5   67.52 + 0.4 = 67.92   73.94 + 0.4 = 74.34
Q4        55.08 + 0.8 = 55.88   61.5 + 0.8 = 62.3   67.92 + 0.8 = 68.72   74.34 + 0.8 = 75.14
We now remove the trend effect by dividing each original value by the
corresponding trend value and expressing them in percentages as
shown in the following table:
Quarter   2018                        2019                       2020                        2021
Q1        (48/53.48) × 100 = 89.75    (41/59.9) × 100 = 68.45    (60/66.32) × 100 = 90.47    (65/72.74) × 100 = 89.36
Q2        (58/54.28) × 100 = 106.85   (52/60.7) × 100 = 85.67    (68/67.12) × 100 = 101.31   (74/73.54) × 100 = 100.63
Q3        (60/55.08) × 100 = 108.93   (56/61.5) × 100 = 91.06    (75/67.92) × 100 = 110.42   (78/74.34) × 100 = 104.92
Q4        (62/55.88) × 100 = 110.95   (60/62.3) × 100 = 96.31    (82/68.72) × 100 = 119.32   (90/75.14) × 100 = 119.78
Interpretation

The seasonal index for the first quarter is 84.82, which indicates that the
sale of farming equipment is, on average, 15.18 per cent (100 − 84.82) below
the average. Therefore, we can say that the sale of farming equipment would be
about 15% higher in this season if seasonal influences had not been present.
The seasonal indices for the second and third quarters are almost equal to the
average, whereas the fourth quarter is about 12 per cent (111.99 − 100) above
the average.
3. We first arrange the sale of the farming equipment data quarterly in
chronological order of different years and then obtain 4-quarterly moving
averages as follows:
Calculation of moving averages

Year   Quarter   Farming     4-quarterly   Centred          Seasonal
                 Equipment   MA            4-quarterly MA   Ratio
2018   Q1        48
       Q2        41
                             53.50
       Q3        60                        54.75            109.59
                             56
       Q4        65                        57.38            113.28
                             58.75
2019   Q1        58                        59.75            97.07
                             60.75
       Q2        52                        61.88            84.03
                             63
       Q3        68                        63.25            107.51
                             63.50
       Q4        74                        64               115.63
                             64.50
2020   Q1        60                        65.38            91.77
                             66.25
       Q2        56                        66.75            83.90
                             67.25
       Q3        75                        67.5             111.11
                             67.75
       Q4        78                        68.25            114.29
                             68.75
2021   Q1        62                        69.63            89.04
                             70.50
       Q2        60                        72               83.33
                             73.50
       Q3        82
       Q4        90
After that, we compute the seasonal ratios for each period (for which we have
the moving average) by dividing each actual time series value by its
corresponding moving average value as

Seasonal ratio = (Actual value / Moving average) × 100
We have computed the seasonal ratios in the last column of the above
table.
Season   2018     2019     2020     2021     Seasonal Index   Adjusted
                                             (Median)         Seasonal Index
Q1        —       97.07    91.77    89.04    91.77            91.87
Q2        —       84.03    83.90    83.33    83.90            83.99
Q3       109.59   107.51   111.11    —       109.59           109.71
Q4       113.28   115.63   114.29    —       114.29           114.42
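The median-and-adjust calculation in this table can be sketched in Python (a minimal sketch of the same arithmetic):

```python
import statistics

ratios = {                       # seasonal ratios from the moving-average table
    "Q1": [97.07, 91.77, 89.04],
    "Q2": [84.03, 83.90, 83.33],
    "Q3": [109.59, 107.51, 111.11],
    "Q4": [113.28, 115.63, 114.29],
}
medians = {q: statistics.median(v) for q, v in ratios.items()}

# Adjust the medians so that the four indices sum to 400 (i.e. average 100)
factor = 400 / sum(medians.values())
adjusted = {q: round(m * factor, 2) for q, m in medians.items()}
# adjusted -> {'Q1': 91.87, 'Q2': 83.99, 'Q3': 109.71, 'Q4': 114.42}
```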
Year   Season   Number of   Seasonal   Deseasonalised
                Visitors    Index      Number of Visitors
2021 Summer 51 162 31.48
Monsoon 28 62 45.16
Winter 41 87 47.13
Spring 36 89 40.45
2022 Summer 54 162 33.33
Monsoon 32 62 51.61
Winter 45 87 51.72
Spring 43 89 48.31
ΣYt = nβ0 + β1 ΣXt

ΣXtYt = β0 ΣXt + β1 ΣXt²

Yt = 43.65 + 0.93 × 2(t − 4.5)
We calculate the trend values by putting Xt = –7, –5, …, 7 in the above
trend line. The trend values are calculated in the last column of the
above table.
We now plot the original and deseasonalised number of visitors with the
trend line in Fig. 11.5.
Fig. 11.5: Actual and deseasonalised number of visitors with trend values.
Year   Season    Number of   Seasonal   Deseasonalised       Trend   Cyclic
                 Visitors    Index      Number of Visitors   Value   Effect
2021   Summer    51          162        31.48                37.14   84.76
       Monsoon   28           62        45.16                39.00   115.79
       Winter    41           87        47.13                40.86   115.35
       Spring    36           89        40.45                42.72   94.69
2022   Summer    54          162        33.33                44.58   74.76
       Monsoon   32           62        51.61                46.44   111.13
       Winter    45           87        51.72                48.30   107.08
       Spring    43           89        48.31                50.16   96.31

For example, for the summer of 2021, the cyclic effect is
(31.48/37.14) × 100 = 84.76
The rest are calculated in the last column of the table.
6. We have the fitted trend equation:

Yt = 43.65 + 0.93 × 2(t − 4.5)
UNIT 12
STATIONARY TIME SERIES
Structure

12.1  Introduction
      Expected Learning Outcomes
12.2  Stationary and Nonstationary Time Series
      Statistical Tests
12.4  Transforming Nonstationary Time Series into Stationary
      Differencing
      Seasonal Differencing
12.1 INTRODUCTION
In the previous two units of this block, we have seen that time series can be
decomposed into four components, i.e., trend, seasonal, cycle, and irregular
components. We have also discussed some methods for estimating trend,
seasonal and cyclic components and then how to use them for the forecast.
Due to several features of the time series, this approach is not necessarily
the best one.

According to the modern approach, we try to fit a time series model so that we
can forecast the observations. But one of the essential elements of time series
modelling is stationarity. A stationary time series is unaffected by the
instant at which it is viewed. Most time series forecasting models assume that
the underlying time series is stationary. In this unit, you will learn some
fundamental concepts that are necessary for a proper understanding of time
series modelling. We begin with a simple introduction to stationary and
nonstationary time series in Sec. 12.2. Since stationarity is one of the
essential elements of a time series, we discuss various methods of detecting
stationarity in Sec. 12.3. As stationarity is necessary to model a time series,
if a series shows a particular type of non-stationarity, some simple
transformation makes it stationary and we can then model it. Therefore, in
Sec. 12.4, we explain various methods of transforming a nonstationary time
series into a stationary one. In the next unit, you will study the concept of
correlation in time series.

A time series is said to be stationary if the properties of one segment of the
time series are similar to those of other segments. In other words, a
stationary time series is a series whose statistical properties, such as mean,
variance, etc., of one section are much like those of other sections.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
Fig. 12.1: Distributions of stationary and nonstationary time series.
Part (a) of the figure shows that the distribution of the time series remains
the same over different segments; therefore, it is a stationary time series.
Part (b) shows that the distribution of the time series changes with the time
segment; therefore, it is a nonstationary time series.
We now discuss which statistical properties are used to check the same. As you
know, the mean and the variance of data are frequently used as basic statistics
to capture characteristics of data. Also, to describe the broad characteristics
of the data distribution, a histogram is used. Therefore, by obtaining the
mean, the variance and the histogram, we expect to capture some aspects or
features of the data. But it is observed that two quite different time series
may have almost similar histograms. Consider the two segments of the sales data
of a grocery shop in different months as shown in Fig. 12.2.
Fig. 12.2: Two segments of the sales data.
From Fig. 12.2, we can observe that the shapes of the two segments of the time
series appear different. We now plot the histograms of these segments in
Fig. 12.3.
Fig. 12.3 indicates that the histograms of both segments are quite similar, but
our time series are quite different. Therefore, it indicates that a histogram is
not sufficient to check whether the properties/distribution of one segment of
the time series are similar to the other segment. In other words, we can say
that only the univariate (marginal) distribution of the time series cannot check
the same. Therefore, we move to the joint distribution, and we know that joint
distribution describes the properties such as covariance and correlation
coefficients. As you know, to get the idea of correlation, we plot the scatter
diagram. Therefore, we plot the scatter diagrams of both segments of the time
series by taking the time series values, say, Yt on the X-axis and its lag
values, say, Yt+1 on the Y-axis (you will learn about the term lag in the next
sections of this unit) as shown in Fig. 12.4.

The number of intervals between two observations is called the lag. For
example, the lag between the current and the previous observation is one.

Fig. 12.4: Scatterplots with lag of two segments of the sales data.
The scatterplot in Fig. 12.4(a) reveals that the data are uniformly distributed
about the origin in a circle, which suggests that there is minimal connection
between Yt and Yt+1. On the other hand, the scatterplot shown in Fig. 12.4(b) is
concentrated around a line with a positive slope, showing that Yt and Yt+1
have a strong positive correlation.
These examples demonstrate that it is important to take into account not only
the distribution of Yt but also the joint distribution of a time series with its lags
for checking the stationarity of a time series.
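The joint behaviour of a series with its lag can also be summarised numerically by the lag-1 correlation coefficient. A minimal sketch (the sample series is made up for illustration, not the sales data):

```python
import statistics

def lag1_correlation(y):
    # Pearson correlation between the pairs (Y_t, Y_{t+1})
    x, z = y[:-1], y[1:]
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sz = sum((b - mz) ** 2 for b in z) ** 0.5
    return cov / (sx * sz)

trending = [1, 2, 3, 4, 5, 6, 7, 8]   # neighbouring values move together
r = lag1_correlation(trending)        # close to +1, like Fig. 12.4(b)
```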
We shall describe two types of stationarity as follows:
This is one of the most commonly observed series in real-life practice. In this
course whenever we call a times series stationary, we in fact refer to weak
stationarity.
12.2.2 Strict Stationarity
As we described, if the time series data do not follow the normal distribution,
then the mean, variance and covariance do not capture the complete
distribution. Therefore, it is necessary to check the joint distribution
of the time series. In the strict sense, a time series is called stationary if the
joint probability distribution of the observations, Yt, Yt+1, …, Yt+n remains the
same as another set of observations shifted by k (k > 0), k is called lag, time
units, that is, Yt+k, Yt+k+1, …, Yt+k+n. As a result, a time series is strictly
stationary if all statistical measures (such as mean, variance, higher moments,
etc.) are constant with respect to time, i.e., do not depend on t. Thus, when n
= 1, the (univariate) distribution of Yt is the same as that of Yt+k for all t and k.
We can also say that the Y’s are (marginally) identically distributed and
therefore, it then follows for all t and k that the mean and variance are constant
over time, that is,
Mean[Yt] = Mean[Yt+k] = μ (constant) and

Var[Yt] = Var[Yt+k] = σ² (constant)

When n = 2, then according to the definition of stationarity, we see that the
bivariate distribution of Yt and Ys must be the same as that of Yt+k and Ys+k.
It follows that for all t, s and k the covariance between Yt and Ys is constant
over time, that is,

Cov[Yt, Ys] = Cov[Yt+k, Ys+k] = constant

That is, the covariance between Yt and Ys depends on time only through the time
difference t − s and not on the actual times t and s.
In most of the time series analysis, we do not require such strong conditions
and we shall define a weak type of stationarity which will be sufficient for most
of our purposes.
After understanding the stationarity and non-stationarity, we now come to the
answer “Why is stationarity necessary for time series analysis?” There
are two main reasons:
• The first one is that the inferences drawn from a nonstationary process
will not be reliable as its statistical properties/ parameters will keep on
changing with time. Thus, if these parameters are continuously changing,
estimating them by averaging over time will not be accurate.
Fig. 12.5: Time series plots of various data: (a) yearly car sales; (b) a daily
series with increasing variance; (c) monthly sales of new houses; (d) monthly
sales of ice-cream; (e) daily traffic density.
Which of these do you think are stationary? Let us discuss them one at
a time.
If you look at the first plot (car sales) (a), we can see that the mean varies
(increases) with time which results in an upward trend. Thus, this is a
nonstationary series. For a series to be classified as stationary, it should
not exhibit a trend.
Moving on to the second plot (b), we can see that there is no trend in the
series, but the variance of the series increases with time. As mentioned
previously, a stationary series must have a constant variance.
The third plot (c) shows that the sales of new houses first increase and then
decrease over a span of time; therefore, it is also not a stationary series.
The fourth plot (d) shows the seasonality as well as the trend (upward) and a
regularly repeating pattern of highs and lows related to months of the year.
Therefore, it is not a stationary series.
If you look at the fifth plot (e), we can see that there is no consistent trend
(upward or downward) over the entire time span. The series appears to slowly
wander up and down. The horizontal line drawn at 60 indicates the mean of
the series and we notice that the series tends to stay on the same side of the
mean (above or below) for a while and then wanders to the other side. Also,
the variance is constant. Almost by definition, there is no seasonality. So we
can say that this time series is stationary.
This method is only used to get an idea about stationarity and is not
completely reliable.
Autocorrelation Function Plots
The time series plot gives only the idea of stationarity. Autocorrelation function
plot also known as correlogram is another method through which we can look
for the stationarity of a time series more accurately. We will describe it in
the next section.
After understanding how to visually check the stationarity of the data, let us
move to other techniques for detecting stationarity.
12.3.2 Summary Statistics
As we discussed, a stationary time series has a constant mean, variance, etc.
over time. Therefore, we can use summary statistics like mean and variance to
check whether a time series is stationary or not.
In this method, we divide the data into two or more groups and, for each group,
we calculate summary statistics such as the mean, the variance or other
moments. After that, we compare the summary statistics of these groups. If the
means and variances of the groups are very close to each other, the series is
stationary; otherwise, it is not.
For example, we split the data of the traffic density as discussed above into
two halves and calculate the mean and variance of each group as

Mean of the first group = (1/n1) Σ xi = 60

Mean of the second group = (1/n2) Σ xi = 60.5

Variance of the first group = (1/n1) Σ (xi − x̄)² = 5.69

Variance of the second group = (1/n2) Σ (xi − x̄)² = 5.26
We can observe that the mean and variance of both groups are very close to
each other, therefore, the series is stationary.
In a similar way, if we split the data of the monthly sales of new houses sold
in the region into two halves and calculate the mean and variance of each group
as

Mean of the first group = (1/n1) Σ yi = 132

Mean of the second group = (1/n2) Σ yi = 114

Variance of the first group = (1/n1) Σ (yi − ȳ)² = 168.48

Variance of the second group = (1/n2) Σ (yi − ȳ)² = 366.45
We can observe that there is a big difference between the two means and between
the two variances. This implies that the series is not stationary.
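The split-and-compare check can be written as a small helper (a minimal sketch; `pvariance` is the population variance, matching the 1/n formula used above):

```python
import statistics

def split_half_summary(series):
    # Split the series into two halves and return (mean, variance) of each
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return ((statistics.fmean(first), statistics.pvariance(first)),
            (statistics.fmean(second), statistics.pvariance(second)))

# A series wandering around a fixed level: both halves look alike
stable = [60, 58, 62, 61, 59, 60, 63, 57, 61, 59]
first_stats, second_stats = split_half_summary(stable)
# Both halves have mean 60.0, so the series looks stationary in mean
```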
If we want to apply a more effective and practical way to check whether the
series is stationary or not, then we may use different statistical tests which are
discussed in the next sub-sections.
There are both parametric and nonparametric tests (you will learn about
parametric and nonparametric tests in MST-016: Statistical Inference and
MST-019: Classical and Bayesian Estimation) that may be used to check
whether a time series is stationary or not. For testing the stationarity, we
formulate the hypotheses as
For testing these hypotheses, two of the most used tests for stationarity are
the Augmented Dickey-Fuller test and the Kwiatkowski-Phillips-Schmidt-Shin
test. These tests are beyond the scope of this unit; if you are interested, you
may refer to Time Series Analysis: Forecasting and Control, 4th Edition, by
Box, Jenkins and Reinsel.
You now may like to pause here and check your understanding of detecting
stationarity by answering the following Self Assessment Question.
SAQ 1
The following Figs 12.6 (a) and (b) show the production of a company in
different years and temperatures of a process on different days, respectively.
Check visually whether the time series are stationary or not.
Fig. 12.6 (a): Time series plot of the production of a company in different years.
Fig. 12.6 (b): Time series plot of the temperature of a process on different days.
• Seasonal differencing
• Log-Transformation
• Power Transformation
• Box-Cox transformation
Yt′ = Yt − Yt−1

where Yt and Yt−1 are the values of the time series at time points t and t − 1,
respectively, and Yt′ represents the first-order difference.
12.4.3 Log-Transformation

In time series analysis, the log-transformation is often used to stabilise the
variance and remove a non-linear trend from a time series. A time series with
an exponential trend can be made linear by taking the logarithm of the values.
If we denote the original observations as Y1, Y2, ..., YN, then we make the
transformation

Zt = log(Yt)

where Z1, Z2, ..., ZN denote the transformed observations and the logarithm is
natural (i.e., to base e). Since the forecast model is constructed on the
transformed scale, it is important to apply the inverse of the transformation
to the data in order to get back to the original scale. Therefore, we can find
the original values as

Yt = exp(Zt)
George Box and Sir David Cox proposed a very useful family of
transformations called Box-Cox transformations. The beauty of this
transformation is that it includes both logarithms and power
transformations and is defined as follows:
Zt = log(Yt),           if λ = 0
Zt = (Yt^λ − 1) / λ,    otherwise
In this transformation, the logarithm is natural (i.e., to base e). It depends on
the parameter λ which varies from –5 to 5. If λ = 0, then this transformation
uses the log-transformation whereas if λ ≠ 0, then a power transformation.
We consider different values of λ and select the optimal value for our data.
The "optimal value" is the one which makes the variance stationary (constant).
We list some common values used for λ as follows:
• λ = –1 is a reciprocal transform.
• λ = –0.5 is a reciprocal square root transform.
• λ = 0 is a log transform.
• λ = 0.5 is a square root transform.
• λ = 1 is no transform.
In the case of the Box-Cox method, we can find the original values as

Yt = exp(Zt),            if λ = 0
Yt = (λZt + 1)^(1/λ),    otherwise
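The Box-Cox transformation and its inverse can be sketched in Python (a minimal sketch; `lam` plays the role of λ, and the sample values are illustrative):

```python
import math

def boxcox(y, lam):
    # Box-Cox transform: log for lam == 0, power transform otherwise
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def inv_boxcox(z, lam):
    # Inverse transform, used to return forecasts to the original scale
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1) ** (1 / lam) for v in z]

series = [32.0, 34.6, 37.0, 36.7, 37.6]   # illustrative values
z = boxcox(series, 0.5)                   # square-root-type transform
back = inv_boxcox(z, 0.5)                 # recovers the original values
```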
Fig. 12.7: Time series plot of the energy consumption data.
From the plot, we can see that the mean energy consumption varies
(increases) with time which results in an upward trend. Thus, this is a
nonstationary time series. For a series to be classified as stationary, it should
not exhibit a trend.
For removing the trend, we obtain the first-order difference, that is, we
compute the difference between consecutive observations in the series by
subtracting the previous value from each value in the series. Mathematically,
it can be applied as

Yt′ = Yt − Yt−1
The remaining values are calculated in a similar way and are given in the
following table:
Year   Energy             Yt′      Year   Energy             Yt′
       Consumption (Yt)                   Consumption (Yt)
2001 32.0 ̶ 2011 43.4 1.8
2002 34.6 2.6 2012 45.0 1.6
2003 37.0 2.4 2013 45.7 0.7
2004 36.7 –0.3 2014 46.0 0.3
2005 37.6 0.9 2015 49.6 3.6
2006 37.1 –0.5 2016 51.8 2.2
2007 39.1 2.0 2017 54.0 2.2
2008 41.7 2.6 2018 53.8 –0.2
2009 41.8 0.1 2019 55.7 1.9
2010 41.6 –0.2 2020 58.8 3.1
To study the impact of the first-order differencing on the pattern of the time
series, we plot the first-order differences values against time (years) in
Fig. 12.8.
Fig. 12.8: Time series plots after the first difference of the energy consumption data.
If you look at Fig. 12.8, we observe that there is no consistent trend (upward or
downward) over the entire time span. It means that the first-order difference
removes the trend effect, and the time series becomes almost stationary.
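The first-order differencing of the energy consumption series can be reproduced with a few lines of Python (a minimal sketch of the table above):

```python
def first_difference(y):
    # First-order difference: Y'_t = Y_t - Y_{t-1}
    return [round(y[i] - y[i - 1], 1) for i in range(1, len(y))]

energy = [32.0, 34.6, 37.0, 36.7, 37.6, 37.1, 39.1, 41.7, 41.8, 41.6,
          43.4, 45.0, 45.7, 46.0, 49.6, 51.8, 54.0, 53.8, 55.7, 58.8]
diffs = first_difference(energy)
# diffs starts [2.6, 2.4, -0.3, 0.9, -0.5, ...], matching the Y't column
```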
Before going to the next section, you may like to check stationarity yourself.
For that, try a Self Assessment Question.
SAQ 2
Suppose a researcher wants to study the pattern of sales of a store. He records
the monthly sales (in thousands) of the store for 25 months, which are given as
follows:
Month   Sales     Month   Sales
1       150       14      3352
2       184       15      4524
3       245       16      5512
4       284       17      6022
5       301       18      6254
6       325       19      7906
7       526       20      9325
8       852       21      10450
9       1253      22      9358
10      1542      23      13688
11      1425      24      18542
12      1742      25      25142
13      2015
12.5 SUMMARY
In this unit, we have discussed:
• A time series is said to be stationary if the statistical properties of one
segment of the time series are similar to the other segment of the time
series otherwise it is called nonstationary time series.
• Various methods for detecting stationarity such as visualisation, summary
statistics and statistical tests.
• Various methods of transforming nonstationary time series to stationary
such as differencing, seasonal differencing, log-transformation, power
transformation, and Box-Cox transformation.
12.7 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Fig. 12.6 (a) shows that the production of the company increases with
time, therefore, there is a trend effect in the time series. Thus, this is a
nonstationary time series. For a series to be classified as stationary, it
should not exhibit a trend.
Fig. 12.6 (b) shows that there is no consistent trend (upward or
downward) over the entire period. The series appears to slowly wander
up and down. If we plot a line at 135 then it indicates the mean of the
series and we notice that the series tends to stay on the same side of
the mean (above or below) for a while and then wanders to the other
side. Also, the variance is constant. Almost by definition, there is no
seasonality. So we can say that this time series is stationary.
2. We plot the sales data by taking months on the X-axis and the sales
on the Y-axis. We get the time series plot as shown in Fig. 12.9.
Fig. 12.9: Time series plot of the sales data.
From the plot, we observe that the sales increase with time, which results in
an upward trend. For a series to be classified as stationary, it should not
exhibit a trend. Thus, it is a nonstationary series.
According to the log-transformation, we take the logarithm (here to base 10) of
the sales data as shown in the following table:
Month Sales (Yt) Log (Yt) Month Sales (Yt) Log (Yt)
1 150 2.176 14 3352 3.525
2 184 2.265 15 4524 3.656
3 245 2.389 16 5512 3.741
4 284 2.453 17 6022 3.780
5 301 2.479 18 6254 3.796
6 325 2.512 19 7906 3.898
7 526 2.721 20 9325 3.970
8 852 2.930 21 10450 4.019
9 1253 3.098 22 9358 3.971
10 1542 3.188 23 13688 4.136
11 1425 3.154 24 18542 4.268
12 1742 3.241 25 25142 4.400
13 2015 3.304
Fig. 12.10: Time series plot of the log-transformed sales data.
We now obtain the first-order difference of the transformed data, that is,
we compute the difference between consecutive observations in the
series by subtracting the previous value from each value in the series as:
Block 3 Time Series Analysis
Y′t = log(Yt) − log(Yt−1)
Month Sales (Yt) Log (Yt) Y′t Month Sales (Yt) Log (Yt) Y′t
1 150 2.176 -- 14 3352 3.525 0.221
2 184 2.265 0.089 15 4524 3.656 0.130
3 245 2.389 0.124 16 5512 3.741 0.086
4 284 2.453 0.064 17 6022 3.780 0.038
5 301 2.479 0.025 18 6254 3.796 0.016
6 325 2.512 0.033 19 7906 3.898 0.102
7 526 2.721 0.209 20 9325 3.970 0.072
8 852 2.930 0.209 21 10450 4.019 0.049
9 1253 3.098 0.168 22 9358 3.971 −0.048
10 1542 3.188 0.090 23 13688 4.136 0.165
11 1425 3.154 −0.034 24 18542 4.268 0.132
12 1742 3.241 0.087 25 25142 4.400 0.132
13 2015 3.304 0.063
To study the impact of differencing on the time series, we plot the
first-order differences against time (months) in Fig. 12.11.
Fig. 12.11: Time series plot of the first-order differences of the log-transformed sales data.
If you look at Fig. 12.11, you will observe that there is no consistent
trend (upward or downward) over the entire time span. It means that the
first-order difference removes the trend effect, and the time series
becomes almost stationary.
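The log-transform-then-difference steps of this solution can be cross-checked programmatically. The sketch below, which assumes NumPy is available, recomputes the Y′t column from the sales data; it is an illustration, not part of the original solution.

```python
import numpy as np

# Monthly sales data from the SAQ (months 1 to 25)
sales = np.array([150, 184, 245, 284, 301, 325, 526, 852, 1253, 1542,
                  1425, 1742, 2015, 3352, 4524, 5512, 6022, 6254, 7906,
                  9325, 10450, 9358, 13688, 18542, 25142], dtype=float)

log_sales = np.log10(sales)       # the Log(Yt) column of the table
y_prime = np.diff(log_sales)      # Y't = log(Yt) - log(Y(t-1))

# The differenced series no longer grows with t: its early and late values
# are of comparable size, unlike the raw sales figures.
print(np.round(y_prime[:3], 3))   # → [0.089 0.124 0.064]
```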
Terminal Questions (TQs)
1. First, we have to plot the time series data. For that, we take the month
on the X-axis and the sales of new single houses on the Y-axis as
shown in Fig. 12.12.
Fig. 12.12: Time series plots of the sales of new single houses data.
From the figure, we observe that the sales of new houses increase with
time; therefore, there is a trend effect in the time series. Thus, this is a
nonstationary series.
We can remove the trend by transforming the data using differencing. So
we obtain the first-order difference, that is, we compute the difference
between consecutive observations in the series by subtracting the
previous value from each value in the series as:
Y′t = Yt − Yt−1
To study the impact of differencing on the time series, we plot the
first-order differences against time (months) in Fig. 12.13.
If you look at Fig. 12.13, you will observe that there is no consistent
trend (upward or downward) over the entire time span. It means that the
first-order difference removes the trend effect, and the time series
becomes almost stationary.
Fig. 12.13: Time series plots after the first difference of the sales of new single houses.
UNIT 13
CORRELATION ANALYSIS IN
TIME SERIES
Structure
13.1 Introduction
     Expected Learning Outcomes
13.2 Autocovariance and Autocorrelation Functions
13.3 Estimation of Autocovariance and Autocorrelation Functions
13.4 Partial Autocorrelation Function
13.5 Correlogram
13.6 Interpretation of Correlogram
13.7 Summary
13.8 Terminal Questions
13.9 Solution/Answers
13.1 INTRODUCTION
With the help of the time series data, we try to fit a time series model so that
we can forecast the observations. But one of the essential elements of time
series modelling is stationarity. In the previous unit, you have studied what is
stationary and nonstationary time series and how to detect and transform
nonstationary time series to stationary time series. As you know, a time series
is a collection of observations with respect to time, therefore, there is a chance
that a value at the present time may relate/depend on the past value. In most
of the time series, we observe such relationships. To study the degree of
relationship between previous/past values with the current value, we have to
study the covariance and correlation between them before modelling the time
series. Therefore, in this unit, you will study correlation analysis in time series.
We begin with a simple introduction of autocovariance and autocorrelation
functions in time series in Sec. 13.2. In Sec. 13.3, we discuss how to estimate
the autocovariance and autocorrelation functions using time series data.
When we study the autocorrelation between observations in the presence of
the intermediate variables, then it does not give the true picture of the relation.
Therefore, to remove the effect of the same, we use partial autocorrelation
which is discussed in Sec. 13.4. To present the autocorrelation/ partial
autocorrelation in the form of graphs/diagrams, we use a correlogram. In
Sec. 13.5, we describe what a correlogram is and how to plot it. The
interpretation of the correlogram is also explained in Sec. 13.6. In the next
unit, you will study different models for time series.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
the value of another variable and vice versa. A zero value indicates no
relationship between the variables.
The main problem with covariance is that it is hard to interpret due to its
wide range (−∞ to +∞). For example, our data set could return a value of 5,
or 500. Covariance may take a large value simply because the variables X and Y
take large values, so a large covariance does not by itself indicate a strong
relationship between the variables. A value of 500 tells us that the variables
are correlated, but unlike the correlation coefficient, that number does not
tell us exactly how strong the relationship is. Only the sign of the
covariance is directly meaningful, not its magnitude. To overcome this
problem, the covariance is divided by the standard deviations to get the
correlation coefficient.
Correlation
Lag
The number of intervals between the two observations is the lag. For example,
the lag between the current and previous observations is one. If you go back
one more interval, the lag is two, and so on. In mathematical terms, if the
observations Yt and Yt+k are separated by k time units, then the lag is k. This
lag can be days, quarters, or years depending on the nature of the data. When
k = 1, you are assessing adjacent observations.
We now come to our main topic, autocovariance and autocorrelation, and we
now define them formally.
Autocovariance
The autocovariance is the same as the covariance. The only difference is that
the autocovariance is applied to the same time series data, i.e., you compute
the covariance of the data say temperature Y with the same data temperature
Y, but from a previous period.
Autocorrelation
In time series analysis, the autocorrelation is the fundamental technique for
calculating the degree of correlation between a series and its lags. This
method is fairly similar to the Pearson correlation coefficient but
autocorrelation uses the same time series twice: one in its original form and
the second lagged one or more time periods as in autocovariance. We now
define autocorrelation as
Autocorrelation is a measure of the degree of relationship between a
given time series and a lagged version of itself over successive time intervals.
If Yt and Yt+k denote the values of a stationary time series at times t and
t + k, respectively, then the autocorrelation function/coefficient between the
time series Yt and its lagged value Yt+k is defined as
ρk = Cov(Yt, Yt+k) / √(Var(Yt) Var(Yt+k))
Since for a stationary time series the variance of the series remains
constant, therefore,

Var(Yt) = Var(Yt+k) = γ0

Hence,

ρk = Cov(Yt, Yt+k)/Var(Yt) = γk/γ0 = [Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ)] / [Σ_{t=1}^{N} (Yt − μ)²]

In particular, for k = 0,

ρ0 = γ0/γ0 = 1
The degree of correlation between a series and its lags indicates the
pattern/characteristics of the series. For example, if a time series has a
seasonal component, say monthly, then we will observe a strong correlation
with its seasonal lags, say, 12, 24, and 36 months.
Some important properties of time series can be studied with the help of
autocovariance and autocorrelation functions. They measure the linear
relationship between observations at different time lags apart. They provide
useful descriptive properties of the time series under study. This is also an
important tool for guessing a suitable model for the time series data.
After understanding the concept of autocovariance and autocorrelation
functions, we now study how to estimate them using sample data.
ρ̂k = rk = ck/c0 = [Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ)] / [Σ_{t=1}^{n} (yt − ȳ)²]; k = 1, 2, …, n − 1
Calculate mean, variance and autocorrelation functions for the given data.
Solution: As you know, the autocovariance/autocorrelation function is
calculated between variables with multiple values of the same length.
Therefore, to compute the sample autocorrelation, first of all, we make two
series of the same length. If y t denotes the value of the temperature/series at
any particular time t then y t +1 denotes the value of the temperature/series one
time after time t. That is, y t +1 is the lag 1 value of y t as shown in the following
table:
Day Temperature (yt) yt+1 Day Temperature (yt) yt+1
1 22 -- 9 28 28
2 23 22 10 30 28
3 23 23 11 31 30
4 24 23 12 30 31
5 23 24 13 31 30
6 25 23 14 31 31
7 26 25 15 30 31
8 28 26
Day yt yt+1 yt+2 yt+3 yt+4
1 22 -- -- -- --
2 23 22 -- -- --
3 23 23 22 -- --
4 24 23 23 22 --
5 23 24 23 23 22
6 25 23 24 23 23
7 26 25 23 24 23
8 28 26 25 23 24
9 28 28 26 25 23
10 30 28 28 26 25
11 31 30 28 28 26
12 30 31 30 28 28
13 31 30 31 30 28
14 31 31 30 31 30
15 30 31 31 30 31
Total 405
Since for the calculation of the autocorrelation function, we assume that the
time series is stationary, therefore, mean and variance of the series will be
constant. Thus, we calculate the sample mean and variance of the given
original time series and make the necessary calculations for calculating the
autocovariance and autocorrelation function in the following table:
(yt − ȳ) (yt − ȳ)² (yt+1 − ȳ) (yt+2 − ȳ) (yt+3 − ȳ) (yt+4 − ȳ) (yt − ȳ)(yt+1 − ȳ) (yt − ȳ)(yt+2 − ȳ) (yt − ȳ)(yt+3 − ȳ) (yt − ȳ)(yt+4 − ȳ)
–5 25
–4 16 –5 20
–4 16 –4 –5 16 20
–3 9 –4 –4 –5 12 12 15
-4 16 –3 –4 –4 –5 12 16 16 20
–2 4 –4 –3 –4 –4 8 6 8 8
–1 1 –2 –4 –3 –4 2 4 3 4
1 1 –1 –2 –4 –3 –1 –2 –4 –3
1 1 1 –1 –2 –4 1 –1 –2 –4
3 9 1 1 –1 –2 3 3 –3 –6
4 16 3 1 1 -1 12 4 4 –4
3 9 4 3 1 1 12 9 3 3
4 16 3 4 3 1 12 16 12 4
4 16 4 3 4 3 16 12 16 12
3 9 4 4 3 4 12 12 9 12
Total 164 –3 –7 –11 –14 137 111 77 46
Therefore,

Mean = (1/n) Σ_{t=1}^{n} yt = 405/15 = 27

Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 164/15 = 10.933
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = (1/15) × 137 = 9.133

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = (1/15) × 111 = 7.4

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = (1/15) × 77 = 5.133

c4 = (1/n) Σ_{t=1}^{n−4} (yt − ȳ)(yt+4 − ȳ) = (1/15) × 46 = 3.067

Hence, the sample autocorrelation functions are

r1 = c1/c0 = 9.133/10.933 = 0.835, r2 = c2/c0 = 7.4/10.933 = 0.677,
r3 = c3/c0 = 5.133/10.933 = 0.470, r4 = c4/c0 = 3.067/10.933 = 0.280
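The hand computation above can be verified with a short NumPy sketch. The data and the divisor n follow the example's convention (dividing by n, not n − k); this is an illustration added alongside the worked solution, not part of it.

```python
import numpy as np

# Daily temperature data from the example (days 1 to 15)
y = np.array([22, 23, 23, 24, 23, 25, 26, 28, 28, 30, 31, 30, 31, 31, 30],
             dtype=float)
n = len(y)
mean = y.mean()                 # 405 / 15 = 27
d = y - mean

def autocov(k):
    # c_k = (1/n) * sum_{t=1}^{n-k} (y_t - ybar)(y_{t+k} - ybar)
    return (d[:n - k] * d[k:]).sum() / n

c0 = autocov(0)                 # sample variance, 164/15 ≈ 10.933
c = [autocov(k) for k in range(5)]
r = [ck / c0 for ck in c]       # sample autocorrelations r_k = c_k / c_0
```

Running this reproduces c1 ≈ 9.133, c2 ≈ 7.4, c3 ≈ 5.133, c4 ≈ 3.067 and r1 ≈ 0.835.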
SAQ 1
A researcher wants to study the pattern of the unemployment rate in his
country. He collected quarterly unemployment rate data and given in the
following table:
Quarter Unemployment rate Quarter Unemployment rate
1 91 7 64
2 45 8 99
3 89 9 64
4 36 10 89
5 72 11 68
6 51 12 108
Compute:
(i) mean and variance, and
(ii) Autocovariance and autocorrelation functions.
This is the correlation between values two time periods apart, conditional on
knowledge of the value in between. (By the way, the two variances in the
denominator will equal each other in a stationary series.)

The general formula for calculating the partial autocorrelation function looks
daunting, so we calculate it using the autocorrelation function instead.
The 1st order partial autocorrelation function equals the 1st order
autocorrelation function, that is,

φ11 = ρ1
Similarly, we can define the 2nd order (lag) partial autocorrelation function in
terms of autocorrelation function as
φ22 = (ρ2 − ρ1²) / (1 − ρ1²)
where

φk = (φ1k, φ2k, φ3k, …, φkk)′ and Ψk = (ρ1, ρ2, ρ3, …, ρk)′,

and Pk is the k × k autocorrelation matrix

       | 1      ρ1     ρ2     …  ρk−1 |
       | ρ1     1      ρ1     …  ρk−2 |
Pk =   | ρ2     ρ1     1      …  ρk−3 |
       | ⋮                        ⋮   |
       | ρk−1   ρk−2   ρk−3   …  1    |

If |Pk| ≠ 0, then according to Cramer's rule the system has a unique solution.
In the above expression, the last coefficient, φkk, is the partial
autocorrelation of order k, and it is given by

φkk = |Pk*| / |Pk|

where Pk* is the matrix Pk with its last column replaced by Ψk.
As you saw, the autocorrelation function helps assess the properties of a time
series. In contrast, the partial autocorrelation function (PACF) is more useful
for finding the order of an autoregressive (AR) or autoregressive integrated
moving average (ARIMA) model. You will study these models in the next unit.
The sample estimates of the first two partial autocorrelations are

φ̂11 = r1

φ̂22 = (r2 − r1²) / (1 − r1²)
The general form for calculating the sample partial autocorrelation function of
order k is given in matrix form as shown below:

φ̂kk = |P̂k*| / |P̂k|

where

        | 1      r1     r2     …  rk−1 |              | 1      r1     r2     …  r1 |
        | r1     1      r1     …  rk−2 |              | r1     1      r1     …  r2 |
P̂k =   | r2     r1     1      …  rk−3 |  and  P̂k* = | r2     r1     1      …  r3 |
        | ⋮                        ⋮   |              | ⋮                       ⋮  |
        | rk−1   rk−2   rk−3   …  1    |              | rk−1   rk−2   rk−3   …  rk |
Let's consider an example which helps you to understand how to calculate the
sample partial autocorrelation function.
Example 2: For the data given in Example 2 of Unit 12, calculate the sample
partial autocorrelation up to order 3.
Solution: The 1st order sample partial autocorrelation equals the 1st order
sample autocorrelation:

φ̂11 = r1 = 0.835
We can calculate the 2nd order (lag) sample partial autocorrelation function as

φ̂22 = (r2 − r1²) / (1 − r1²) = (0.677 − (0.835)²) / (1 − (0.835)²) = −0.020/0.303 = −0.067
Similarly, the 3rd order sample partial autocorrelation function is the ratio
of determinants φ̂33 = |P̂3*| / |P̂3|, where

        | 1      0.835  0.835 |             | 1      0.835  0.677 |
P̂3* =  | 0.835  1      0.677 |  and  P̂3 = | 0.835  1      0.835 |
        | 0.677  0.835  0.470 |             | 0.677  0.835  1     |

which gives φ̂33 ≈ −0.256.
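The determinant formula can be checked numerically. The sketch below is a minimal illustration using NumPy's `linalg.det`; the dictionary of r values is taken from the example, and the `pacf` helper is a name introduced here, not from the textbook.

```python
import numpy as np

r = {1: 0.835, 2: 0.677, 3: 0.470}   # sample autocorrelations from the example

def pacf(k):
    # phi_kk = det(P_k*) / det(P_k), per the Cramer's-rule formula above.
    # P_k has entry r_|i-j| (with r_0 = 1); P_k* replaces its last column
    # with the vector (r_1, ..., r_k).
    P = np.array([[r.get(abs(i - j), 1.0) for j in range(k)] for i in range(k)])
    Pstar = P.copy()
    Pstar[:, -1] = [r[i + 1] for i in range(k)]
    return np.linalg.det(Pstar) / np.linalg.det(P)
```

Calling `pacf(1)`, `pacf(2)` and `pacf(3)` reproduces 0.835, about −0.067, and about −0.256, respectively.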
SAQ 2
For the data given in SAQ 1, calculate the sample partial autocorrelation
function up to order 2.
13.5 CORRELOGRAM
In the previous sessions, you learnt autocovariance, autocorrelation, and
partial autocorrelation functions which are used to understand the properties of
time series, fit the appropriate models, and forecast future events of the series.
With the help of the autocorrelation/partial autocorrelation function, we can
also diagnose whether the time series is stationary or not. But a group of a
large number of autocorrelation always makes misperceptions to the reader
and he/she may understand it wrongly. If we present the autocorrelation/
partial autocorrelation function in the form of graphs/diagrams, then it attracts
the reader and it can be understood better. 107
Block 3 Time Series Analysis
A plot in which we take the autocorrelation function on the vertical axis and
different lags on the horizontal axis is known as a correlogram. The technique
of drawing a correlogram is the same as that of a simple bar diagram. The
only difference is that we just take a line instead of a bar of the same width.
Each bar in the correlogram represents the level of correlation between the
series and its lags in chronological order. A correlogram is also known as an
autocorrelation function (ACF) plot or autocorrelation plot. It gives
a summary of autocorrelation at different lags. With the help of a
correlogram, we can easily examine the nature of the time series and
diagnose a suitable model for the time series data.
In a typical correlogram, observations at smaller lags are positively
correlated, and the autocorrelation decreases as the lag k increases. In most
time series, it is noticed that the absolute value of rk, i.e., |rk|, decreases
as k increases. This is because observations located far apart in time are not
much related to each other, whereas nearby observations may be strongly
(positively or negatively) correlated.
Let us understand how we plot a correlogram with the help of an example.
Example 3: For the data given in Example 2 of Unit 12, plot the correlogram.

Solution: A correlogram is a plot of the autocorrelation function with respect
to its lag; therefore, first of all, we have to compute the sample
autocorrelation coefficients. In Example 2, we have already calculated these,
so for the sake of brevity, we simply restate them here:

r1 = 0.835, r2 = 0.676, r3 = 0.469, r4 = 0.280
For the correlogram, we take lags on the X-axis and sample autocorrelation
function on the Y-axis. At each lag, we draw a line, which represents the level
of correlation between the series and a lagged version of itself, as shown in
the following Fig. 13.2.
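In practice one would draw the correlogram with a plotting library (for example, a matplotlib stem plot). To keep the sketch dependency-free, the illustration below renders a rough text correlogram instead; the scaling of 20 characters per unit of correlation is an arbitrary choice for display.

```python
# Draw a rough text correlogram: one line per lag, a bar of '*' whose length
# is proportional to |r_k|.
r = [0.835, 0.676, 0.469, 0.280]     # sample autocorrelations from Example 2

lines = []
for k, rk in enumerate(r, start=1):
    width = round(abs(rk) * 20)      # scale: 20 characters == correlation 1.0
    bar = "*" * width
    lines.append(f"lag {k:2d} | {bar} {rk:+.3f}")

print("\n".join(lines))
```

The decreasing bar lengths mirror the decay of the autocorrelations with increasing lag that the figure shows.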
After learning what a correlogram is and how to plot it, we now look at how
the correlogram helps us to recognise the nature of a time series. The
correlogram is helpful for visual inspection to recognise the nature of a time
series, though it is not always easy. We now describe certain types of time
series and the nature of their correlograms.
Random Series

A time series is completely random if it contains only independent
observations. Therefore, the values of the autocorrelation function for such a
series are approximately zero, that is, rk ≈ 0, and the correlogram of such a
random time series moves around the zero line. A typical correlogram is shown
in Fig. 13.3.
Alternating Series

If a time series behaves in a rough, zig-zag manner, alternating between
values above and below the mean, this is indicated by the autocorrelations
alternating in sign, e.g., a negative rk followed by a positive rk+1 and
vice versa. The correlogram of an alternating time series is shown in
Fig. 13.4.
A time series is said to be stationary if its mean, variance and covariance
are almost constant and it is free from trend and seasonal effects.

Fig. 13.6: The correlogram of a time series having a trend effect.
Unit 13 Correlation Analysis in Time Series
SAQ 3
A share market expert wants to study the pattern of a particular share price.
For that, he calculates the autocorrelation for different lags which are given as
follows:
r0 = 1, r1 = 0.482, r2 = 0.050, r3 = −0.159, r4 = 0.253, r5 = −0.024, r6 = 0.053,
r13 = 0.407, r14 = 0.010, r15 = −0.181, r16 = −0.257, r17 = −0.057, r18 = 0.016,
r19 = −0.051
13.7 SUMMARY
In this unit, we have discussed:
• Role of correlation analysis in time series.
• The covariance between a given time series and a lagged version of itself
over successive time intervals is called autocovariance. The formula for
calculating the autocovariance function is given as
γk = γ−k = Cov(Yt, Yt+k) = (1/N) Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ)
and its estimate using sample data is as follows:
γ̂k = ck = (1/n) Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ); k = 1, 2, …, n − 1
• The correlation between a given time series and a lagged version of itself
over successive time intervals is called autocorrelation. The autocorrelation
function is

ρk = γk/γ0 = [Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ)] / [Σ_{t=1}^{N} (Yt − μ)²]

and its estimate using sample data is as follows:

ρ̂k = rk = ck/c0 = [Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ)] / [Σ_{t=1}^{n} (yt − ȳ)²]; k = 1, 2, …, n − 1
• The partial autocorrelation functions:

φ11 = ρ1, φ22 = (ρ2 − ρ1²) / (1 − ρ1²)

and, in general,

φkk = |Pk*| / |Pk|
13.9 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Since there are 12 observations, therefore, we prepare the data up to
n/4 = 12/4 = 3 lags as follows:
Quarter Unemployment (yt) yt+1 yt+2 yt+3
1 91
2 45 91
3 89 45 91
4 36 89 45 91
5 72 36 89 45
6 51 72 36 89
7 64 51 72 36
8 99 64 51 72
9 64 99 64 51
10 89 64 99 64
11 68 89 64 99
12 108 68 89 64
Total 876
(yt − ȳ) (yt − ȳ)² (yt+1 − ȳ) (yt+2 − ȳ) (yt+3 − ȳ) (yt − ȳ)(yt+1 − ȳ) (yt − ȳ)(yt+2 − ȳ) (yt − ȳ)(yt+3 − ȳ)
18 324
–28 784 18 –504
16 256 –28 18 –448 288
–37 1369 16 –28 18 –592 1036 –666
–1 1 –37 16 –28 37 –16 28
–22 484 –1 –37 16 22 814 –352
–9 81 –22 –1 –37 198 9 333
26 676 –9 –22 –1 –234 –572 –26
–9 81 26 –9 –22 –234 81 198
16 256 –9 26 –9 –144 416 –144
–5 25 16 –9 26 –80 45 –130
35 1225 –5 16 –9 –175 560 –315
Total 0 5562 –2154 2661 –1074
Therefore,

Mean = (1/n) Σ_{t=1}^{n} yt = 876/12 = 73,

Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 5562/12 = 463.5
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = (1/12) × (−2154) = −179.5

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = (1/12) × 2661 = 221.75

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = (1/12) × (−1074) = −89.5
After calculating the autocovariance function, we now calculate the
sample autocorrelation function as
r1 = −0.387 , r2 = 0.478 , r3 = −0.193
2. In SAQ 1, we have already calculated the sample autocorrelation
coefficients which are as follows:
r1 = c1/c0 = −179.5/463.5 = −0.387, r2 = c2/c0 = 221.75/463.5 = 0.478,
r3 = c3/c0 = −89.5/463.5 = −0.193
Thus, the sample partial autocorrelations are

φ̂11 = r1 = −0.387

φ̂22 = (r2 − r1²) / (1 − r1²) = (0.478 − (−0.387)²) / (1 − (−0.387)²) = 0.328/0.850 = 0.386
3. For plotting the correlogram, we take lags on the X-axis and sample
autocorrelation coefficients on the Y-axis. At each lag, we draw a line,
which represents the level of correlation between the series and its lags,
as shown in the following Fig. 13.8.
(yt − ȳ) (yt − ȳ)² (yt+1 − ȳ) (yt+2 − ȳ) (yt+3 − ȳ) (yt+4 − ȳ) (yt − ȳ)(yt+1 − ȳ) (yt − ȳ)(yt+2 − ȳ) (yt − ȳ)(yt+3 − ȳ) (yt − ȳ)(yt+4 − ȳ)
17 289
0 0 17 0
11 121 0 17 0 187
–3 9 11 0 17 –33 0 –51
–16 256 –3 11 0 17 48 –176 0 –272
–6 36 –16 –3 11 0 96 18 –66 0
–1 1 4 –6 –16 –3 –4 6 16 3
–6 36 –11 –1 4 –6 66 6 –24 36
Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 1272/14 = 90.86
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = (1/14) × (−110) = −7.86

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = (1/14) × (−41) = −2.93

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = (1/14) × 33 = 2.36

c4 = (1/n) Σ_{t=1}^{n−4} (yt − ȳ)(yt+4 − ȳ) = (1/14) × (−109) = −7.79

The sample autocorrelation functions are

r1 = c1/c0 = −7.86/90.86 = −0.087, r2 = c2/c0 = −2.93/90.86 = −0.032,
r3 = c3/c0 = 2.36/90.86 = 0.026, r4 = c4/c0 = −7.79/90.86 = −0.086
Fig. 13.9: The correlogram of time series data of sales of new single houses.
UNIT 14
TIME SERIES MODELLING
TECHNIQUES
Structure
14.1 Introduction
14.1 INTRODUCTION
In the previous units (Units 12 and 13), you have learnt stationary and
nonstationary time series, the concept of autocorrelation and partial
autocorrelation in time series. When the data is autocorrelated, then most of
the standard modelling methods may become misleading or sometimes even
useless because they are based on the assumption of independent
observations. Therefore, we need to consider alternative methods that take
into account the autocorrelation in the data. Such types of models are known
as time series models. In this unit, you will study time series models, such as
autoregressive (AR), moving average (MA), autoregressive moving average
(ARMA), autoregressive integrated moving average (ARIMA), etc.
In this unit, you will learn some time series models. We begin with a simple
introduction of the necessity of time series models instead of ordinary
regression models in Sec. 14.2. In Secs. 14.3 and 14.4, we discuss the
autoregressive and moving average models with their types and properties,
respectively. In Sec. 14.5, the autoregressive moving average models are
explained. The AR, MA and ARMA models are used for stationary time series.
If a time series is nonstationary, then we use the autoregressive integrated
moving average (ARIMA) model; you will learn about it in Sec. 14.6. When you
deal with real time series data, the first question that may arise in your
mind is how to know which time series model is most suitable for a particular
data set. For that, we discuss time series model selection in Sec. 14.7.
The coefficients β0 and β1 denote the intercept and the slope of the
regression line, respectively. The intercept β0 represents the predicted value
of Y when X = 0 and the slope β1 represents the average predicted change
in Y resulting from a one-unit change in X. Also, each observation Y consists
of the systematic or explained part of the model, β0 + β1X, and a random
error ε. The term "error" in this context refers to a departure from the
underlying straight line model rather than a mistake and it includes everything
affecting Y other than predictor variable X. The error term has the following
assumptions:
• The mean of the error term should be zero, i.e., E [ ε ] =0 .
• The error term should have constant variance, i.e., Var [ ε ] =σ2 =constant
The model expresses the present value as a linear combination of a constant
term δ (read as delta), the previous value of the variable yt−1 and the error
term εt (read as epsilon). The magnitude of the impact of the previous value
on the present value is quantified using a coefficient denoted by φ1 (read as
phi). The "error term" is called white noise, and it is normally distributed
with mean zero and constant variance (σ²).
Taking the variance of both sides of the model,

Var[yt] = Var[δ] + φ1² Var[yt−1] + Var[εt]

Since the time series is stationary, Var[yt] = Var[yt−1]. Therefore,

Var[yt] = φ1² Var[yt] + σ²

Var[yt] = σ² / (1 − φ1²) ≥ 0 when φ1² < 1
Also, the error term εt is uncorrelated with past values of the series, that is,

Cov[εt, yt−k] = 0 for k > 0

Therefore,

γk = φ1 γk−1

Hence,

γ1 = φ1 γ0 = φ1 σ²/(1 − φ1²)     [since γ0 = Var[yt] = σ²/(1 − φ1²)]

γ2 = φ1 γ1 = φ1² σ²/(1 − φ1²)

and, in general,

γk = φ1 γk−1 = φ1^k σ²/(1 − φ1²)

The autocorrelation function (ACF) for an AR(1) model is as follows:

ρk = γk/γ0 = φ1^k for k = 0, 1, 2, …
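The geometric decay ρk = φ1^k can be checked by simulation. The sketch below is an illustration with arbitrarily chosen parameters (δ = 10, φ1 = 0.2, matching the example later in this section); it generates a long AR(1) path and compares the sample ACF with the theoretical one.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, phi1, sigma, n = 10.0, 0.2, 1.0, 20000

# Simulate y_t = delta + phi1 * y_{t-1} + e_t, starting from the process mean.
y = np.empty(n)
y[0] = delta / (1.0 - phi1)
for t in range(1, n):
    y[t] = delta + phi1 * y[t - 1] + rng.normal(0.0, sigma)

d = y - y.mean()
c0 = (d ** 2).mean()
r = [(d[:n - k] * d[k:]).sum() / (n * c0) for k in range(1, 4)]  # sample ACF

theory = [phi1 ** k for k in range(1, 4)]   # rho_k = phi1^k
# For n this large, the sample ACF should sit close to the theoretical curve,
# and y.mean() should be close to delta / (1 - phi1) = 12.5.
```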
We now study the properties of the AR(2) model as we have studied for AR(1).
Mean and Variance
We can find the mean and variance of an AR(2) model as we have obtained
for AR(1). Here, we just write these as follows:
Mean = δ / (1 − φ1 − φ2)

Similarly,

Var[yt] = σ² / (1 − φ1² − φ2²) ≥ 0 when φ1² + φ2² < 1
Autocovariance and Autocorrelation Functions
We can find the autocovariance of an AR(2) model as we have obtained for
AR(1) model. Here, we just write these as follows:
γk = Cov[yt, yt+k] = φ1 γk−1 + φ2 γk−2 + { σ² if k = 0; 0 if k > 0 }

Therefore,

γ0 = φ1 γ1 + φ2 γ2 + σ²

and

γk = φ1 γk−1 + φ2 γk−2; k = 1, 2, …
ρ3 =φ1ρ2 + φ2ρ1
The conditions for stationarity of the AR(2) model are:

• φ1 + φ2 < 1

• φ2 − φ1 < 1
We now study the properties of the AR(p) model as we have studied for AR(1)
and AR(2) models.
Mean and Variance
We can find the mean and variance of an AR(p) model as we have obtained
for AR(1). Here, we just write these as follows:
Mean = δ / (1 − φ1 − φ2 − … − φp)

and

Var[yt] = σ² / (1 − φ1² − φ2² − … − φp²) ≥ 0 when φ1² + φ2² + … + φp² < 1
Autocovariance and Autocorrelation Functions

We can find the autocovariance of an AR(p) model as we have obtained for the
AR(1) model. Here, we just write these as follows:

γk = Cov[yt, yt+k] = φ1 γk−1 + φ2 γk−2 + … + φp γk−p + { σ² if k = 0; 0 if k > 0 }

Therefore,

γ0 = φ1 γ1 + φ2 γ2 + … + φp γp + σ²

and

γk = φ1 γk−1 + φ2 γk−2 + … + φp γk−p; k = 1, 2, …
The autocorrelation function for an AR(p) model can be obtained by dividing γk
by γ0 as follows:
where εt ~ N[0, 1], with δ = 10 and φ1 = 0.2.
γ1 = φ1 γ0 = φ1 σ²/(1 − φ1²) = 0.2 × 1.04 = 0.21

γ2 = φ1 γ1 = 0.2 × 0.21 = 0.04

γ3 = φ1 γ2 = 0.2 × 0.04 = 0.008
We can forecast the next value after y100 using the prediction model

ŷt = 10 + 0.2 yt−1

Therefore, if the current observation is y100 = 7.5, then the forecast of the
next observation is ŷ101 = 10 + 0.2 × 7.5 = 11.5, which is still below the
mean (12.5) but closer to it than y100.
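The one-step forecast above can be reproduced in a couple of lines (the values come from the example; this fragment is an illustration only):

```python
delta, phi1 = 10.0, 0.2
y_100 = 7.5

y_101_hat = delta + phi1 * y_100      # one-step-ahead AR(1) forecast
process_mean = delta / (1.0 - phi1)   # mean of a stationary AR(1): delta/(1 - phi1)
```

Note how the forecast pulls the series back toward the process mean, a characteristic of stationary AR models.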
You may like to try the following Self Assessment Question before studying
further.
SAQ 1
Consider the time series model
yt = 5 + 0.8 yt−1 − 0.5 yt−2 + εt

where εt ~ N[0, 2]
Moving average (MA) models are the models in which the value of a
variable in the current period is regressed against the residuals in the
previous period.
A moving average model states that the current value is linearly dependent on
the past error terms.
A Moving Average model is similar to an autoregressive model, except that
instead of being a linear combination of past time series values, it is a linear
combination of the past error/residual/white noise terms.
Concept of Invertibility
An autoregressive model can be used on time series data if and only if the
time series is stationary. The moving average models are always stationary.
But some restrictions also are imposed on the parameters of the moving
average model as in the case of the autoregressive model.
By the definition of the moving average model, the value of a variable in the
current period is regressed against the residuals in the previous period. If
yt is the value of a variable at time t and εt−1 is the residual at time
t − 1, then we can express the moving average model of first order as

yt = μ + εt + θ1 εt−1

We can write the above expression as

εt = yt − μ − θ1 εt−1

εt = yt − μ − θ1 (yt−1 − μ − θ1 εt−2)

= μ(θ1 − 1) + yt − θ1 yt−1 + θ1² εt−2

Continuing the substitution for εt−2, εt−3, …, we can express the current
error term as a linear combination of present and past observations:

εt = c + Σ_{i=0}^{∞} (−θ1)^i yt−i

where c is a constant. It means that we can convert/invert the past
residuals/errors/noises into past observations. In other words, we can invert
a moving average model into an autoregressive model. This property is called
invertibility. This notion is very important if one wants to forecast the
future values of the dependent variable; otherwise, the forecasting task will
be impossible (the residuals in the past cannot be estimated, as they cannot
be observed). When the model is not invertible, the innovations can still be
represented by observations of the future, but this is not helpful at all for
forecasting purposes.

Note: The MA model is defined in many textbooks and computer software with a
minus sign before the theta terms. Although this switches the algebraic signs
of the estimated coefficient values and of the (unsquared) theta terms in the
formulas for ACFs and variances, it has no effect on the model's overall
theoretical features. To construct the estimated model accurately, you must
examine your software to see whether negative or positive signs are used. R
software uses positive signs in its underlying model, as we do here.
Any autoregressive process is necessarily invertible, but a stationarity
condition must be imposed to ensure the uniqueness of the model for a
particular autocorrelation structure. A moving average process, on the other
hand, is always stationary, but an invertibility condition must be imposed in
order for there to be a unique model for a particular autocorrelation
structure.
We now discuss different types of moving average models on the basis of the
correlation with the residuals of the previous periods in the following sub-
sections.
The moving average model in which the value of a variable in the current
period is regressed against its previous residual is called the first-order
moving average model. For example, if today's price of a share depends on
whatever happened in the other factors on the previous day, except for the
price of the share itself on the previous day, then we use the first-order
moving average model.
If yt is the value of a variable at time t and εt−1 is the residual at time
t − 1, then the moving average model of first order is given as follows:

yt = μ + εt + θ1 εt−1

Taking the expectation of both sides,

E[yt] = E[μ] + E[εt] + θ1 E[εt−1] = μ + 0 + θ1 × 0

since E[εt] = E[εt−1] = 0 because εt ~ N(0, σ²), and μ is a constant so
E[μ] = μ. Therefore,

Mean of MA(1) = μ
Similarly,

Var[yt] = Var[μ] + θ1² Var[εt−1] + Var[εt]

Since Var[μ] = 0 (μ is constant) and Var[εt] = Var[εt−1] = σ² because
εt ~ N(0, σ²), therefore,

Var[yt] = θ1² σ² + σ² = (1 + θ1²) σ² ≥ 0

The autocovariance functions are

γ0 = (1 + θ1²) σ², γ1 = θ1 σ², γk = 0; k > 1

and hence the autocorrelation functions are

ρ1 = γ1/γ0 = θ1/(1 + θ1²), ρk = 0; k > 1

It indicates that the autocorrelation function for the MA(1) model becomes
zero after lag 1.
Conditions for Invertibility

The moving average models are always stationary. However, some restrictions
are imposed on the parameters of the moving average models, otherwise the
inverted (autoregressive) representation does not converge. Therefore, a
constraint on the value of the parameter is required for the invertibility of
the MA(1) model, which is as follows:

|θ1| < 1, that is, −1 < θ1 < 1
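The cut-off of the ACF after lag 1, together with ρ1 = θ1/(1 + θ1²), can be illustrated by simulation. The sketch below uses θ1 = 0.6, an arbitrary invertible choice made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, theta1, n = 0.0, 0.6, 20000            # |theta1| < 1, so the model is invertible

eps = rng.normal(0.0, 1.0, n + 1)
y = mu + eps[1:] + theta1 * eps[:-1]       # y_t = mu + e_t + theta1 * e_{t-1}

d = y - y.mean()
c0 = (d ** 2).mean()
r = [(d[:n - k] * d[k:]).sum() / (n * c0) for k in range(1, 4)]  # sample ACF

rho1 = theta1 / (1.0 + theta1 ** 2)        # theoretical rho_1 for MA(1)
# r[0] should be near rho1, while r[1] and r[2] should be near zero.
```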
As in the MA(1) model, the coefficients θ1 and θ2 represent the magnitude of
the impact of past residuals on the present value.
Let us study the properties of the MA(2) model.
Mean of MA(2) = μ
Var[yt] = Var[μ] + θ1² Var[εt−1] + θ2² Var[εt−2] + Var[εt]

Since Var[μ] = 0 because μ is constant, and Var[εt] = Var[εt−1] = Var[εt−2] = σ²
because εt ~ N(0, σ²), therefore,

Var[yt] = θ1² σ² + θ2² σ² + σ² = (1 + θ1² + θ2²) σ² ≥ 0

γ0 = Var(yt) = (1 + θ1² + θ2²) σ²
γ1
= ( θ1 + θ1θ2 ) σ 2
γ 2 = θ2σ 2
γk 0; k > 2
=
γ2 θ2
ρ=
2 =
γ 0 1 + θ12 + θ22
ρk 0; k > 2
=
The invertibility conditions for the MA(2) model are:

• θ_1 + θ_2 < 1

• θ_2 − θ_1 < 1

• −1 < θ_2 < 1
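As a quick numerical illustration of the formulas above (a sketch of ours, not from the unit), the theoretical autocovariances and autocorrelations of an MA(2) model can be computed directly:

```python
# Theoretical autocovariances of y_t = mu + e_t + theta1*e_{t-1} + theta2*e_{t-2},
# following the gamma formulas above, with Var(e_t) = sigma2.
def ma2_autocovariances(theta1, theta2, sigma2=1.0):
    g0 = (1 + theta1 ** 2 + theta2 ** 2) * sigma2   # gamma_0 = Var(y_t)
    g1 = (theta1 + theta1 * theta2) * sigma2        # gamma_1
    g2 = theta2 * sigma2                            # gamma_2; gamma_k = 0 for k > 2
    return g0, g1, g2

# Illustrative parameter values of our choosing.
g0, g1, g2 = ma2_autocovariances(0.7, -0.2)
print(round(g1 / g0, 3), round(g2 / g0, 3))         # rho_1 and rho_2
```

Dividing each γ_k by γ_0 gives the autocorrelations, and both are zero for k > 2, matching the MA(2) cutoff property.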
After understanding the first and second-order moving average models, you may
like to know the general form of the moving average model. Let us discuss it
now.
A moving average model states that the current value is linearly dependent on
the current and past error terms.

If y_t is the value of a variable at time t and ε_t, ε_{t−1}, ε_{t−2}, ...,
ε_{t−q} are the residuals at times t, t − 1, t − 2, ..., t − q, respectively,
then the moving average model of order q expresses the present value (y_t) as
a linear combination of the mean of the series (μ), the present error term
(ε_t) and the past error terms (ε_{t−1}, ε_{t−2}, ..., ε_{t−q}).
Mathematically, we express a general moving average model as follows:

y_t = μ + ε_t + θ_1ε_{t−1} + θ_2ε_{t−2} + ... + θ_qε_{t−q}

where θ_1, θ_2, ..., θ_q represent the magnitude of the impact of past errors
on the present value.
After understanding the form of the general moving average model, we now
study the properties of the model.
Mean and Variance

The mean and variance of the MA(q) model are given as follows:

Mean = μ

Var[y_t] = (1 + θ_1² + θ_2² + ... + θ_q²)σ² ≥ 0

γ_0 = (1 + θ_1² + θ_2² + ... + θ_q²)σ²

γ_k = (θ_k + θ_1θ_{k+1} + ... + θ_{q−k}θ_q)σ²;  k = 1, 2, ..., q

γ_k = 0;  k > q
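The general γ_k formula can be sketched in code (our illustration, not from the unit). Writing θ_0 = 1, the formula reads γ_k = σ² Σ_{j=0}^{q−k} θ_j θ_{j+k}, which is the same expression with the leading term θ_k written as θ_0θ_k:

```python
# gamma_k for an MA(q) model, writing theta_0 = 1 so that
# gamma_k = sigma2 * sum_{j=0}^{q-k} theta_j * theta_{j+k}, and gamma_k = 0 for k > q.
def maq_autocovariance(thetas, k, sigma2=1.0):
    t = [1.0] + list(thetas)                  # t[0] is theta_0 = 1
    q = len(thetas)
    if k > q:
        return 0.0
    return sigma2 * sum(t[j] * t[j + k] for j in range(q - k + 1))

# Reproduces the MA(1) results: gamma_0 = (1 + theta1^2) sigma^2, gamma_1 = theta1 sigma^2.
print(maq_autocovariance([0.8], 0), maq_autocovariance([0.8], 1), maq_autocovariance([0.8], 2))
```

For q = 1 and q = 2 this function reproduces the special-case formulas derived earlier, which is a useful sanity check on the general expression.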
where ε_t ~ N(0, 1)

(iii) What are the mean and variance of the time series?

Solution: Since the variable in the current period is regressed on its
previous residual, it is a first-order moving average model. To check whether
it is invertible, first of all, we find the parameters of the time series
model by comparing it with its standard form, that is,

y_t = μ + ε_t + θ_1ε_{t−1}

We obtain

μ = 2, θ_1 = 0.8

The invertibility constraint for MA(1) is −1 < θ_1 < 1. Since θ_1 lies between
−1 and 1, the time series model MA(1) is invertible.
Mean = 2

Var[y_t] = (1 + θ_1²)σ² = (1 + 0.64) × 1 = 1.64

γ_0 = Var(y_t) = 1.64

γ_1 = θ_1σ² = 0.8 × 1 = 0.8

γ_k = 0; k > 1

ρ_k = 0; k > 1
We can forecast the next value, y_101, using the prediction model

ŷ_t = 2 + 0.8ε_{t−1}

Therefore, if the residual at t = 100 is 0.23, then the next observation,
ŷ_101 = 2 + 0.8 × 0.23 = 2.18, will be above the mean (2).
You may try the following Self Assessment Question before studying further.

SAQ 2

Consider the time series model

y_t = 42 + ε_t + 0.7ε_{t−1} − 0.2ε_{t−2}

where ε_t ~ N(0, 2)

(v) Suppose the residual errors for time periods 20 and 21 are 0.23 and 0.54,
respectively; forecast the next observation.
ARMA(1,1) Models

ARMA(1,1) models are models in which the value of a variable in the current
period is related to its own value in the previous period as well as to the
value of the residual in the previous period. It is a mixture of AR(1) and
MA(1).

If y_t and y_{t−1} are the values of a variable at times t and t − 1,
respectively, and ε_t and ε_{t−1} are the residuals at times t and t − 1,
respectively, then the ARMA(1,1) model is expressed as follows:

y_t = δ + φ_1y_{t−1} + θ_1ε_{t−1} + ε_t
Var[y_t] = (1 + 2φ_1θ_1 + θ_1²)σ² / (1 − φ_1²) ≥ 0 when φ_1² < 1

γ_0 = Var[y_t] = (1 + 2φ_1θ_1 + θ_1²)σ² / (1 − φ_1²)

γ_1 = (φ_1 + θ_1)(1 + φ_1θ_1)σ² / (1 − φ_1²)

γ_k = φ_1γ_{k−1} for k = 2, 3, ...
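These recursions are easy to evaluate numerically. The sketch below (our own, with illustrative parameter values of our choosing) computes the theoretical ACF of an ARMA(1,1) model from ρ_1 and the geometric recursion ρ_k = φ_1ρ_{k−1}:

```python
# Theoretical ACF of y_t = delta + phi1*y_{t-1} + theta1*e_{t-1} + e_t:
# rho_1 = (phi1 + theta1)(1 + phi1*theta1) / (1 + 2*phi1*theta1 + theta1^2),
# rho_k = phi1 * rho_{k-1} for k >= 2 (geometric decay at rate phi1).
def arma11_acf(phi1, theta1, max_lag):
    rho = [1.0]                               # rho_0 = 1
    rho.append((phi1 + theta1) * (1 + phi1 * theta1)
               / (1 + 2 * phi1 * theta1 + theta1 ** 2))
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1])
    return rho

print([round(r, 3) for r in arma11_acf(0.5, 0.4, 3)])   # [1.0, 0.692, 0.346, 0.173]
```

Notice how the ACF decays geometrically after lag 1; unlike a pure MA model, it never cuts off exactly.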
Solution:

Since the variable in the current period is regressed on its previous value
as well as its previous residual, it is an ARMA model of order (1, 1). To
check whether it is stationary and invertible, first of all, we find the
parameters of the time series model by comparing it with its standard form,
that is,

y_t = δ + φ_1y_{t−1} + θ_1ε_{t−1} + ε_t

We obtain

δ = 20, φ_1 = −0.5, θ_1 = 0.7

The stationarity constraint for ARMA(1, 1) is −1 < φ_1 < 1. Since φ_1 lies
between −1 and 1, the time series model ARMA(1, 1) is stationary.
Similarly, the invertibility constraint for ARMA(1,1) is −1 < θ_1 < 1. Since
θ_1 lies between −1 and 1, the time series model ARMA(1,1) is invertible.

We now calculate the autocorrelation function of ARMA(1, 1) as

ρ_1 = γ_1/γ_0 = (φ_1 + θ_1)(1 + φ_1θ_1)/(1 + 2φ_1θ_1 + θ_1²)

ρ_k = φ_1ρ_{k−1} for k = 2, 3, ...

ρ_1 = (−0.5 + 0.7)(1 − 0.5 × 0.7)/(1 + 2 × (−0.5) × 0.7 + (0.7)²) = 0.13/0.79 = 0.165

ρ_2 = φ_1ρ_1 = −0.5 × 0.165 = −0.083

ρ_3 = φ_1ρ_2 = −0.5 × (−0.083) = 0.041
SAQ 3

Consider the ARMA time series model

y_t = 27 + 0.8y_{t−1} + 0.3ε_{t−1} + ε_t
The general ARIMA model can be written for the differenced series as

y′_t = δ + φ_1y′_{t−1} + φ_2y′_{t−2} + ... + φ_py′_{t−p} − θ_1ε_{t−1} − θ_2ε_{t−2} − ... − θ_qε_{t−q} + ε_t

where y′_t is the differenced series, which may have been differenced more
than once, and p and q are the orders of the autoregressive and moving
average parts.

For the first difference, we can write the ARIMA model as

y_t − y_{t−1} = δ + φ_1(y_{t−1} − y_{t−2}) + φ_2(y_{t−2} − y_{t−3}) + ... + φ_p(y_{t−p} − y_{t−p−1}) − θ_1ε_{t−1} − θ_2ε_{t−2} − ... − θ_qε_{t−q} + ε_t
On the basis of different values of p, d and q, the ARIMA model takes
different forms, which are discussed as follows:
14.6.1 Various Forms of ARIMA Models
The ARIMA model has various forms for different values of the parameters
p, d and q of the model. We discuss some standard forms as follows:
It is important that the choice of order makes sense. For example, suppose
you have had blood pressure readings for every day over the past two years.
You may find that an AR(1) or AR(2) model is appropriate for modelling blood
pressure. However, the PACF may indicate a large partial autocorrelation
value at a lag of 17, but such a large order for an autoregressive model
likely does not make much sense.

Step 1: Since there are two types of models, used for stationary and
nonstationary time series, first of all we plot the time series data and
check whether the time series is stationary or nonstationary, as you have
learned in Unit 13.

Step 2: If the time series is stationary, we have to decide which model out
of AR, MA and ARMA is suitable for our time series data. To distinguish among
them, we calculate the autocorrelation function (ACF) and the partial
autocorrelation function (PACF) as discussed in Unit 13. After that, we plot
the ACF and PACF versus the lag, that is, the correlogram as discussed in
Unit 13, and try to identify the pattern of both. The ACF plot is most useful
for identifying an AR model and the PACF plot for the order of the AR model,
whereas the PACF plot is most useful for identifying an MA model and the ACF
plot for the order of the MA model. We now try to understand how to
distinguish between AR, MA and ARMA models as follows:
Case I (AR model): In the plot of ACF versus the lag (correlogram), if you
see a gradual decline or exponential decay, then this indicates that the
values of the time series are serially correlated and the series can be
modelled through an AR model. For determining the order of an AR model, we
use the plot of PACF versus the lag. If the PACF output cuts off, which means
the PACF is almost zero at lag p + 1, then it indicates an AR model of order
p. We can also calculate the PACF by increasing the order one by one and, as
soon as it lies within the range ± 2/√n (where n is the size of the time
series), we stop and take the last significant PACF as the order of the AR
model (see SAQ 4).

Case II (MA model): In the plot of PACF versus the lag, if you see a gradual
decline or exponential decay, then this indicates that the series can be
modelled through an MA model, and if the ACF output cuts off, which means the
ACF is almost zero at lag q + 1, then it indicates an MA model of order q.
Case III (ARMA model): If the autocorrelation function (ACF) as well as the
partial autocorrelation function (PACF) plots show a gradual decline
(exponential decay) or a damped sinusoid pattern, then this indicates that
the series can be modelled through an ARMA model, but it makes the
identification of the order of the ARMA(p, q) model relatively more
difficult. For that, the extended ACF, generalised sample PACF, etc. are
used, which are beyond the scope of this course. For more detail, you can
consult Time Series Analysis: Forecasting and Control, 4th Edition, by Box,
Jenkins and Reinsel.
Step 3: If the time series is nonstationary, we obtain the first, second,
etc. differences of the time series as discussed in Unit 13 until it becomes
stationary, ensure that the trend and seasonal components are removed, and
find d. Suppose the series becomes stationary after the second difference;
then d is 2. Generally, one or two stages of differencing are sufficient. The
differenced series will be shorter than the source series (as you observed in
Unit 13). An ARMA model is then fitted to the resulting time series. Since
ARIMA models have three parameters, there are many variations of possible
models that could be fitted. We should choose an ARIMA model that is as
simple as possible, i.e. contains as few terms as possible (small values of p
and q). For more detail, you can consult Time Series Analysis: Forecasting
and Control, 4th Edition, by Box, Jenkins and Reinsel.
Step 4: After identifying the model, we estimate the parameters of the model
using the method of moments, maximum likelihood estimation, least squares
methods, etc. The method of moments is the simplest of these. In this method,
we equate the sample autocorrelation functions to the corresponding
population autocorrelation functions, which are functions of the parameters
of the model, and solve these equations for the parameters. However, this is
not a very efficient method of estimation. For moving average processes,
usually the maximum likelihood method is used, which gives more efficient
estimates when n is large. We shall not discuss this any further here; if you
are interested, you may refer to Time Series Analysis: Forecasting and
Control, 4th Edition, by Box, Jenkins and Reinsel.
Step 5: After fitting the best model, we apply diagnostic checks to the
residuals to examine whether the fitted model is adequate or not. This helps
us to ensure that no more information is left for extraction and to check the
goodness of fit. For the residual analysis, we plot the ACF and PACF of the
residuals and check whether there is a pattern or not. For an adequate model,
there should be no structure in the ACF and PACF of the residuals, and they
should not differ significantly from zero for all lags greater than one. For
the goodness of fit, we use Akaike's information criterion (AIC) and the
Bayesian information criterion (BIC). We have not discussed all the above
aspects in detail here, but the interested reader should consult Time Series
Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.
After understanding the procedure of selection of a time series model, let us
take an example.
Example 4: The temperature (in °C) in a particular area on different days,
collected by the meteorological department, is given below:

Day  Temperature    Day  Temperature
 1       27          9       28
 2       29         10       30
 3       31         11       30
 4       27         12       26
 5       28         13       30
 6       30         14       31
 7       32         15       27
 8       29
[Figure: plot of temperature (Y-axis) versus day (X-axis) for the given data]
We now estimate the parameters (δ and φ_1) of the model using the method of
moments. In this method, we equate the sample autocorrelation function to the
population autocorrelation function, which is a function of the parameters of
the model, and solve as below:

r_1 = ρ_1

For AR(1), ρ_1 = φ_1, so φ_1 = r_1 = 0.835.

For estimating the parameter δ, first we find the mean of the given data and
then use the relationship Mean = δ/(1 − φ_1):

Mean = (1/15) Σ y_i = 435/15 = 29

Therefore,

Mean = 29 = δ/(1 − 0.835) ⇒ δ = 29 × 0.165 = 4.785

Therefore, the suitable model for the temperature data is

y_t = 4.785 + 0.835y_{t−1} + ε_t
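The method-of-moments calculation can be mimicked on simulated data (a sketch of ours, not the temperature series): generate an AR(1) series with known parameters, then recover them from r_1 and the sample mean using φ̂_1 = r_1 and δ̂ = mean × (1 − φ̂_1).

```python
import numpy as np

rng = np.random.default_rng(7)
n, delta, phi = 2000, 4.785, 0.835

# Simulate y_t = delta + phi*y_{t-1} + e_t, starting at the process mean.
y = np.empty(n)
y[0] = delta / (1 - phi)
for t in range(1, n):
    y[t] = delta + phi * y[t - 1] + rng.normal()

# Method of moments: phi_hat = r_1 and delta_hat = mean * (1 - phi_hat).
d = y - y.mean()
r1 = np.dot(d[:-1], d[1:]) / np.dot(d, d)     # sample lag-1 autocorrelation
phi_hat = r1
delta_hat = y.mean() * (1 - phi_hat)
print(round(phi_hat, 2), round(delta_hat, 2))
```

With a long enough simulated series, the recovered φ̂_1 and δ̂ land close to the true values, which illustrates why the method of moments is usable despite being less efficient than maximum likelihood.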
Before going to the next section, you may like to do some exercises yourself.
Let us try a Self Assessment Question.

SAQ 4

A researcher wants to develop an autoregressive model for data on COVID-19
patients in a particular city. For that, he collected the data for 100 days
and calculated the autocorrelation functions, which are given as follows:

r_1 = 0.73, r_2 = 0.39, r_3 = 0.07
14.8 SUMMARY
In this unit, we have discussed
14.10 SOLUTIONS/ANSWERS
Self Assessment Questions (SAQs)
1. For checking the stationarity of the time series, first of all, we find
the parameters of the time series model. Since the variable in the current
period is regressed on its previous and previous-to-previous values, it is a
second-order autoregressive model. We now compare it with its standard form,
that is, y_t = δ + φ_1y_{t−1} + φ_2y_{t−2} + ε_t. We obtain

δ = 5, φ_1 = 0.8, φ_2 = −0.5

Checking the stationarity conditions: φ_1 + φ_2 = 0.3 < 1,
φ_2 − φ_1 = −1.3 < 1, and −1 < φ_2 < 1. Since all three conditions for
stationarity are satisfied, the time series is stationary.

Var[y_t] = σ²/(1 − φ_1² − φ_2²) = 2/(1 − 0.64 − 0.25) = 2/0.11 = 18.18

ρ_1 = φ_1/(1 − φ_2) = 0.8/(1 + 0.5) = 0.53

ρ_2 = φ_1ρ_1 + φ_2 = 0.8 × 0.53 − 0.5 = −0.076

ρ_3 = φ_1ρ_2 + φ_2ρ_1 = 0.8 × (−0.076) − 0.5 × 0.53 = −0.061 − 0.265 = −0.326

We can forecast the next value, y_52, using the prediction model

ŷ_t = 5 + 0.8y_{t−1} − 0.5y_{t−2}

ŷ_52 = 5 + 0.8y_51 − 0.5y_50
2. Since the variable in the current period is regressed on its previous and
previous-to-previous residuals, it is a second-order moving average model.
For checking the invertibility of the MA(2) model, first of all, we find the
parameters of the time series model by comparing it with its standard form,
that is, y_t = μ + ε_t + θ_1ε_{t−1} + θ_2ε_{t−2}. We have

μ = 42, θ_1 = 0.7, θ_2 = −0.2

Checking the invertibility conditions:

• θ_1 + θ_2 = 0.5 < 1

• θ_2 − θ_1 = −0.9 < 1

• Since θ_2 = −0.2, it lies between −1 and 1.

Since all three conditions for the invertibility of the MA(2) model are
satisfied, the time series is invertible.

We now calculate the mean and variance of the series as

Mean = 42

Var[y_t] = (1 + θ_1² + θ_2²)σ² = (1 + 0.49 + 0.04) × 2 = 1.53 × 2 = 3.06

ρ_k = 0; k > 2

We can forecast the next value, y_22, using the prediction model

ŷ_t = 42 + 0.7ε_{t−1} − 0.2ε_{t−2}

ŷ_22 = 42 + 0.7ε_21 − 0.2ε_20

ŷ_22 = 42 + 0.7 × 0.54 − 0.2 × 0.23 = 42.332
3. We obtain

δ = 27, φ_1 = 0.8, θ_1 = 0.3

ρ_1 = (φ_1 + θ_1)(1 + φ_1θ_1)/(1 + 2φ_1θ_1 + θ_1²)
    = (0.8 + 0.3)(1 + 0.8 × 0.3)/(1 + 2 × 0.8 × 0.3 + (0.3)²)
    = 1.364/1.57 = 0.869

ρ_2 = φ_1ρ_1 = 0.8 × 0.869 = 0.695

ρ_3 = φ_1ρ_2 = 0.8 × 0.695 = 0.556
4. As we know from Unit 13, the first-order partial autocorrelation equals
the first-order autocorrelation, that is,

φ̂_11 = r_1 = 0.73

φ̂_22 = (r_2 − r_1²)/(1 − r_1²) = (0.39 − (0.73)²)/(1 − (0.73)²)
      = −0.143/0.467 = −0.31

Since the first-order PACF lies outside the range ± 2/√n = ± 2/√100 = ± 0.2,
it is significant, so we calculate the second-order PACF. Since
φ̂_22 = −0.31 also lies outside the range ± 0.2, we proceed to the
third-order PACF. Calculating it in the same way from r_1, r_2 and r_3 gives
φ̂_33 ≈ −0.19, which lies within the range ± 0.2. Therefore, an AR(2) model
will be suitable for this time series.
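The PACF arithmetic in the last solution can be verified quickly. This is a check of ours using the Durbin-Levinson recursion with the r-values from SAQ 4:

```python
# Sample ACF values from SAQ 4.
r1, r2, r3 = 0.73, 0.39, 0.07

phi11 = r1                                        # first-order PACF
phi22 = (r2 - r1 ** 2) / (1 - r1 ** 2)            # second-order PACF

# Third-order PACF via the Durbin-Levinson recursion.
phi21 = phi11 - phi22 * phi11
phi33 = (r3 - phi21 * r2 - phi22 * r1) / (1 - phi21 * r1 - phi22 * r2)

print(round(phi11, 2), round(phi22, 2), round(phi33, 2))   # 0.73 -0.31 -0.19
```

The first two partial autocorrelations fall outside ± 2/√100 = ± 0.2 while the third falls inside, which is the numerical basis for choosing an autoregressive model of order 2.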