
MST-014
STATISTICAL QUALITY CONTROL AND TIME SERIES ANALYSIS

Indira Gandhi National Open University
School of Sciences

Volume 2
TIME SERIES ANALYSIS AND RELIABILITY THEORY
BLOCK 3
Time Series Analysis 5
BLOCK 4
Reliability Theory 147
Curriculum and Course Design Committee
Prof. Sujatha Varma, Former Director, SOS, IGNOU, New Delhi
Prof. Diwakar Shukla, Department of Mathematics and Statistics, Dr. H. S. Gaur Central University, Sagar (MP)
Prof. Gulshan Lal Taneja, Department of Mathematics, M.D. University, Rohtak (HR)
Prof. Gurprit Grover, Department of Statistics, University of Delhi, New Delhi
Prof. H. P. Singh, Department of Statistics, Vikram University, Ujjain (MP)
Prof. Rahul Roy, Mathematics and Statistics Unit, Indian Statistical Institute, New Delhi
Prof. Rajender Prasad, Division of Design of Experiments, IASRI, Pusa, New Delhi
Prof. Rakesh Srivastava, Department of Statistics, M.S. University of Baroda, Vadodara (GUJ)
Prof. Sanjeev Kumar, Department of Statistics, Banaras Hindu University, Varanasi (UP)
Prof. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur (UP)
Prof. V. K. Singh (Retd.), Department of Statistics, Banaras Hindu University, Varanasi (UP)
Prof. Manish Trivedi, SOS, IGNOU
Dr. Taruna Kumari, SOS, IGNOU
Dr. Neha Garg, SOS, IGNOU
Dr. Rajesh, SOS, IGNOU
Dr. Prabhat Kumar Sangal, SOS, IGNOU
Dr. Gajraj Singh, SOS, IGNOU

Course Preparation Team

Course Writers
Dr. Prabhat Kumar Sangal (Units 10-14)
School of Sciences, IGNOU, New Delhi

Dr. Rajesh (Units 15-18)
School of Sciences, IGNOU, New Delhi

Course Editor
Prof. Ram Kishan (Units 10-18)
Department of Statistics
D.A.V. (PG) College
Maa Shakambhari University, Saharanpur (UP)

Units 15-18 are adapted from IGNOU course MSTE-001: Industrial Statistics-I of PGDAST programme, Block 4,
Units 13-16.
Formatted and CRC Prepared by Ms Preeti, SOS, IGNOU
Course Coordinator: Dr. Prabhat Kumar Sangal
Programme Coordinators: Dr. Neha Garg and Dr. Prabhat Kumar Sangal

Print Production
Mr. Rajiv Girdhar, Assistant Registrar, MPDD, IGNOU, New Delhi
Mr. Hemant Parida, Section Officer, MPDD, IGNOU, New Delhi

June, 2023
© Indira Gandhi National Open University, 2023
ISBN-978-81-266-
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University may be obtained from the University's Office at Maidan Garhi, New Delhi-110068, or by visiting the University's website http://www.ignou.ac.in.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by the Director, School of Sciences.
VOLUME 2: TIME SERIES ANALYSIS AND
RELIABILITY THEORY
Dear learners, welcome again to the course "Statistical Quality Control and Time Series Analysis". In Volume 1: Statistical Quality Control, you learnt how statistical tools help in maintaining the quality of products so that they fulfil their specifications. In this volume, you will study what a time series is, various methods of estimating the components of a time series, various time series models, the basic functions of reliability, and the reliability evaluation of simple and complex systems. This volume also comprises two blocks: Block 3 and Block 4.
Block 3 of this course is titled "Time Series Analysis", and the goal of this block is to build the abilities necessary for learners to use statistical techniques to analyse time series data and forecast the values of a time series. We start with what a time series is and what its components are, and describe various techniques for estimating its components, such as trend, seasonal, cyclic and random variations. Some fundamental concepts that are necessary for a proper understanding of time series modelling, such as stationarity, non-stationarity and correlation analysis in time series, are also discussed. We then explain various time series models, namely the autoregressive model, the moving average model, the autoregressive moving average model and the autoregressive integrated moving average model.
Block 4 of this course is titled "Reliability Theory" and contains four units that broadly cover another important topic, reliability theory. The objective of this block is to strengthen the learners' abilities so that they can use statistical techniques to compute a system's reliability and improve the reliability of a system. In this block, we explain the reliability of a component/system, which means how long it performs its intended function successfully under given conditions. We explain various basic functions of reliability and the reliability evaluation of simple and complex systems. We also discuss how to improve a system's performance.

Expected Learning Outcomes


After completing this volume, you should be able to:
• understand what a time series is and describe its components;
• describe smoothing techniques for forecasting models, including the simple moving average, weighted moving average, and exponential smoothing;
• explain various methods for the estimation of the trend;
• apply various methods for estimating the seasonal component, such as the simple average method, the ratio to trend method and the ratio to moving average method;
• discuss the methods of estimation of cyclic and irregular fluctuations in a time series;
• use trend, seasonal and cyclic components to forecast future values;
• distinguish between stationary and nonstationary time series and transform a nonstationary time series into a stationary time series;
• describe the concept of covariance and correlation in time series and explain autocovariance and autocorrelation functions;
• describe and use autoregressive models, moving average models, autoregressive moving average models and autoregressive integrated moving average models;
• select a suitable time series model for real-life time series data;
• define reliability and explain the basic functions, namely, the reliability function, cumulative failure distribution function, failure density function and hazard rate;
• define a simple system and evaluate the reliability of a system when its components are in series, in parallel and in mixed configurations;
• define redundancy, active redundancy and standby (passive) redundancy;
• evaluate the reliability of a k-out-of-n system and a standby system; and
• define a complex system and evaluate the reliability of complex systems.

If you feel like reading more than what this course contains, you may like to consult the following books:

Suggested Further Readings


1. Anderson, T. W. (1971). The Statistical Analysis of Time Series. Wiley, New York.
2. Balagurusamy, E. (1984). Reliability Engineering. Tata McGraw Hill Education Private Limited.
3. Billinton, R. and Allan, R. N. (1983). Reliability Evaluation of Engineering Systems: Concepts and Techniques. Plenum Press, New York and London.
4. Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
5. Ebeling, C. E. (2000). An Introduction to Reliability and Maintainability Engineering. Tata McGraw Hill Education Private Limited.
6. Kitagawa, G. (2010). Introduction to Time Series Modeling. Chapman & Hall/CRC, Taylor & Francis Group.
7. Krispin, R. (2019). Hands-On Time Series Analysis with R. Packt Publishing Ltd.
8. Kumar, U. D., Crocker, J., Chitra, T. and Saranga, H. (2006). Reliability and Six Sigma. Springer, Boston, MA.
9. Montgomery, D. C. and Johnson, L. A. (1977). Forecasting and Time Series Analysis. McGraw Hill.
10. Srinath, L. S. (1975). Concepts in Reliability with an Introduction to Maintainability and Availability. Affiliated East-West Press Pvt. Ltd.
11. Wegner, T. (2016). Applied Business Statistics. Juta and Company Ltd.
Your feedback pertaining to this course will help us undertake maintenance and timely revision
of the course. You may give your feedback using the following link:
Feedback Link: https://forms.gle/21ZrF2Gw8Wxn5hym9
We hope that you will enjoy reading the self-learning material of this course. Wishing you a happy learning experience and all the best in this endeavour!

Course Preparation Team


MST-014
STATISTICAL QUALITY CONTROL AND TIME SERIES ANALYSIS

Indira Gandhi National Open University
School of Sciences

Block 3
TIME SERIES ANALYSIS
TIME SERIES ANALYSIS
UNIT 10
Trend Component Analysis 9
UNIT 11
Seasonal Component Analysis 45
UNIT 12
Stationary Time Series 75
UNIT 13
Correlation Analysis in Time Series 95
UNIT 14
Time Series Modelling Techniques 117

BLOCK 1: Process Control
Unit 1: Overview of Statistical Quality Control

Unit 2: Control Charts for Mean and Variation

Unit 3: Control Charts for Defectives

Unit 4: Control Charts for Defects and Small Shifts

Unit 5: Process Capability and Six-Sigma

BLOCK 2: Product Control


Unit 6: Acceptance Sampling Plans

Unit 7: Rectifying Sampling Plans

Unit 8: Single Sampling Plans

Unit 9: Double Sampling Plans

BLOCK 3: Time Series Analysis


Unit 10: Trend Component Analysis

Unit 11: Seasonal Component Analysis

Unit 12: Stationary Time Series

Unit 13: Correlation Analysis in Time Series

Unit 14: Time Series Modelling Techniques

BLOCK 4: Reliability Theory


Unit 15: Basic Functions of Reliability

Unit 16: Reliability of Series and Parallel Systems

Unit 17: Reliability of Standby Systems

Unit 18: Reliability of Complex Systems

BLOCK 3: TIME SERIES ANALYSIS
In our day-to-day life, we generally collect data at one point in time; such data are called cross-sectional data. In cross-sectional data, we collect information about different individuals/subjects at the same point in time or during the same period. For example, data related to the income of families living in New Delhi, the GDP of various countries in 2023, the temperature of different states on a particular day, the marks of learners pursuing a programme, etc. For such data, we just describe or summarise the current status of a group using the mean, standard deviation, etc. For example, for the data related to the income of families living in New Delhi, we just find the average income, the number of families below the poverty line, etc., but we do not find whether the income is increasing or decreasing.
There are many situations where we collect data over time. For example, in business, we observe daily sales, weekly interest rates and daily closing stock prices. In meteorology, we observe daily high and low temperatures and hourly wind speeds. In agriculture, we record annual figures for crops and quarterly production. In the biological sciences, we observe the electrical activity of the heart at millisecond intervals, etc. Such data are called time series data. A time series is a set of numeric data of a variable that is collected over time at regular intervals and arranged in chronological (time) order.
The objective of this block is to develop the skills which are essential to apply statistical tools
for analysing time series data and forecasting the values of the time series. This block
comprises five units.
In Unit 10: Trend Component Analysis, we shall discuss what a time series is and what its components are. To see the patterns of a time series better, we describe the methods of smoothing or filtering, such as simple and weighted moving averages and exponential smoothing. We explain the additive and multiplicative models of time series. We also discuss the estimation of the trend component using the method of least squares and the moving average.
Unit 11: Seasonal Component Analysis deals with the study of some methods for estimating the seasonal and cyclic components. Here, we explain the simple average method, the ratio to moving average method and the ratio to trend method, with their merits and demerits, for estimating the seasonal component. We also discuss how to deseasonalise the data and explain the estimation of trend using deseasonalised data. The estimation of the cyclic and random components is also discussed in this unit. After estimating the components of a time series, we discuss the method of forecasting on the basis of the components of the time series.
Unit 12: Stationary Time Series and Unit 13: Correlation Analysis in Time Series are
devoted to discussing some fundamental concepts that are necessary for a proper
understanding of time series modelling. Unit 12 begins with a simple introduction of stationary
and nonstationary time series. We also explain various methods of transforming a
nonstationary time series into a stationary one. Unit 13 begins with a simple introduction of
autocovariance, autocorrelation functions and partial autocorrelation in time series. We discuss
how to estimate these functions using time series data. We also discuss correlogram and how
to interpret it.
Unit 14: Time Series Modelling Techniques explains various time series models that are
used for forecasting. We begin with a simple introduction of the necessity of time series models
instead of ordinary regression models. We discuss the autoregressive (AR), moving average
(MA), autoregressive moving average (ARMA) and autoregressive integrated moving average
(ARIMA) models. When you deal with real-life time series data, the first question that may arise in your mind is how to know which time series model is most suitable for a particular
time series data. For that, we discuss time series model selection in this unit.
Expected Learning Outcomes
After completing this block, you should be able to:
• understand what a time series is and describe its components;
• describe smoothing techniques for forecasting models, including the simple moving average, weighted moving average, and exponential smoothing;
• explain various methods for the estimation of the trend;
• apply various methods for estimating the seasonal component, such as the simple average method, the ratio to trend method and the ratio to moving average method;
• discuss the methods of estimation of cyclic and irregular fluctuations in a time series;
• use trend, seasonal and cyclic components to forecast future values;
• distinguish between stationary and nonstationary time series and transform a nonstationary time series into a stationary time series;
• describe the concept of covariance and correlation in time series and explain autocovariance and autocorrelation functions;
• describe and use autoregressive models, moving average models, autoregressive moving average models and autoregressive integrated moving average models; and
• select a suitable time series model for real-life time series data.

The following notations and symbols are used in this block:

Notations and Symbols
Sec./Secs. : Section/Sections
Fig./Figs. : Figure/Figures
Yt and Ŷt : Measured and estimated values of the time series variable
Tt, Ct, St and It : Trend, cyclic, seasonal and irregular variations
m and w : Period of moving average and weight
MA/WMA : Moving average/Weighted moving average
et and α : Forecast error and exponential smoothing constant
β0, β1, ... : Trend equation constants
Var[Yt] and Cov[Yt, Ys] : Variance of Yt and covariance between Yt and Ys
Yt′ : First-order difference of Yt
rXY = rYX : Correlation coefficient between variables X and Y
k and εt : Lag and error term
γk and ρk : Autocovariance and autocorrelation
ck and rk : Sample autocovariance and autocorrelation
φkk : Partial autocorrelation function (PACF)
AR and ARMA : Autoregressive and autoregressive moving average
ARIMA : Autoregressive integrated moving average
φ and θ : Constants of autoregressive and moving average models

Block Preparation Team


UNIT 10
TREND COMPONENT ANALYSIS

Structure
10.1 Introduction
     Expected Learning Outcomes
10.2 Introduction to Time Series
10.3 Components of Time Series
     Trend Component
     Seasonal Component
     Cyclic Component
     Irregular Component
10.4 Basic Models of Time Series
     Additive Model
     Multiplicative Model
10.5 Smoothing Time Series
     Simple Moving Average
     Weighted Moving Average
     Exponential Smoothing
10.6 Estimation of Trend Component Using Method of Least Squares
     Linear Trend
     Quadratic Trend
     Exponential Trend
10.7 Estimation of Trend Component Using Moving Average
10.8 Summary
10.9 Terminal Questions
10.10 Solutions/Answers

10.1 INTRODUCTION
Most of the data used in statistical analysis are collected at one point in time; such data are called cross-sectional data. In cross-sectional data, we collect information about different individuals/subjects at the same point in time or during the same period. For example, data related to learners pursuing the MSCAST programme in July 2023, such as name, qualification, age, address, marks in graduation, etc., production of milk, imports and exports, information on the household income of New Delhi residents, etc. For such data, we just describe the status of the group at a point in time. For example, for the data related to the income of families living in New Delhi, we just find the average income, the number of families below the poverty line, etc., but we do not find whether the income is increasing or decreasing.
There are many situations where we collect data over time. For example, in business, we observe daily sales, weekly interest rates, and daily closing stock prices. In meteorology, we observe daily high and low temperatures and hourly wind speeds. In agriculture, we record annual figures for crops and quarterly production. In the biological sciences, we observe the electrical activity of the heart at millisecond intervals, etc. Such data are called time series data. A time series is a set of numeric data of a variable that is collected over time at regular intervals and arranged in chronological (time) order.
* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
In this unit, we shall discuss what a time series is and what its components are. In Sec. 10.2, we discuss what a time series is, with various examples. The components of a time series are described in Sec. 10.3. In Sec. 10.4, we explore different basic models of time series, which show the relationships among the various components of a time series. To see the patterns of a time series better, we describe the methods of smoothing or filtering, such as simple and weighted moving averages, and exponential smoothing in Sec. 10.5. Secs. 10.6 and 10.7 are devoted to the estimation of the trend effect using the method of least squares (curve fitting) and the moving average, respectively. In the next unit, you will learn various methods of estimating the other components of a time series.
Expected Learning Outcomes
After studying this unit, you should be able to:
• explain what a time series is;
• describe the components of a time series;
• explain the basic models of time series;
• decompose a time series into different components for further analysis;
• describe smoothing techniques for forecasting models, including the simple moving average, weighted moving average, and exponential smoothing; and
• explain various methods for the estimation of the trend.

10.2 INTRODUCTION TO TIME SERIES


A time series is a collection of observations made sequentially through time. In other words, the data on any characteristic collected with respect to time over a span of time periods is called a time series. Normally, we shall assume that observations are available at equal intervals of time, e.g., yearly, monthly, daily, hourly, etc. Some time series cover a period of several years.
Time series data are collected in most fields, ranging from economics to engineering. For example, in business, time series data are gathered on daily sales, daily closing stock prices and the price of an item; in the meteorological department, on daily high and low temperatures and hourly wind speed; in agriculture, on annual figures for crops, yearly production and soil erosion; in the biological sciences, on the electrical activity of the heart at millisecond intervals and brain monitoring (EEG); in ecology, on the abundance of an animal species; and in medicine, on blood pressure tracking, weight tracking, cholesterol measurements, heart rate monitoring, etc.
There are two main goals in analysing a time series:
• First, one may want to describe or summarise the key features of the time series data, and
• Second, to predict what will happen in the future based on past data (this is called forecasting).

Note: Time-series forecasting is a method for predicting future values over a period or at a precise point in the future using historical and present data.

For example, meteorologists forecast future weather conditions based on past observations, a milk production company forecasts the future demand for milk based on the sales of milk on past days, business decision makers predict future sales, etc. Due to several special features of time series, we require different techniques to analyse and model time series data for forecasting. Time series analysis is the art of extracting meaningful insights from time series data by exploring the series' structure and characteristics and identifying patterns that can then be utilised to forecast future events of the series.
Time series analysis assumes that the data values of a time series variable are determined by four underlying environmental forces that operate both individually and collectively over time. They are trend (T), seasonal variations (S), cyclic variations (C) and irregular (random) variations (I). These are called the components of a time series.

10.3 COMPONENTS OF TIME SERIES


Time series data do not remain constant over time; rather, the values of the data vary. For example, the manager of a company collected the quarterly sales data of a commodity for the period 2014-2019, which are given as follows:

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
2014   18          8           12          9
2015   5           8           4           11
2016   4           10          14          18
2017   24          23          27          30
2018   35          32          30          38
2019   32          35          30          24

From the above data, we see that the sales of the commodity vary with time (quarterly and yearly). The variation occurs because of the effects of various forces (such as seasons) at work, commonly known as the components of a time series.
In the classical approach to analysing a time series, it is assumed that the data values of a time series variable are determined by four underlying environmental forces that operate both individually and collectively over time. They are: (i) Trend, (ii) Seasonal, (iii) Cyclic and (iv) the remaining variation attributed to Irregular fluctuations (sometimes referred to as the Random component).

[Chart: A time series decomposes into four components — the long-term or trend component, the seasonal component, the cyclic component, and the irregular or random component.]
This approach is not necessarily the best one, and we shall discuss the modern approach in later units. Some or all of the components are present in varying amounts and can be classified into the four categories mentioned. We shall now discuss these components in more detail, one at a time.
10.3.1 Trend Component
Usually, time series data show random variation, but over a long period of
time, there may be a gradual shift in the mean level to a higher or a lower
level. This gradual shift in the level of time series is known as the trend. In
other words, the general tendency of values of the data to increase or
decrease during a long period of time is called the trend.
When time series values are plotted on a graph and the values show an increasing or decreasing (on average) pattern over a long period with reference to time, then the time series is called a time series with a trend effect. A time series may show different types of trends.
Time Series with Upward Trend
When the values of a time series are plotted on a graph and the values are increasing or showing an upward pattern (as shown in Fig. 10.1) with reference to time, then the time series is called a time series with an upward trend. For example, upward tendencies are seen in the data of population growth, currency in circulation, prices of petroleum products in India, the number of passengers in the metro, the literacy rate, the GDP of a country, etc. We plot a time series graph (Fig. 10.1) of the GDP of a country from 2011 to 2020 that shows an upward trend.

[Time series plot: Year (2011 to 2020) on the X-axis and GDP on the Y-axis, showing a rising pattern.]
Fig. 10.1: GDP of a country from 2011 to 2020.

Time Series with Downward Trend
When the values of a time series are plotted on a graph and the values decrease or show a downward pattern (as shown in Fig. 10.2) with reference to time, then the time series is called a time series with a downward trend. For example, a downward trend is seen in the data of the death rate, birth rate, number of landline phones, etc. The death rate of a country from 2012 to 2022 shows a downward trend, as shown in Fig. 10.2.

[Time series plot: Year (2012 to 2022) on the X-axis and Death rate on the Y-axis, showing a falling pattern.]
Fig. 10.2: Death rate of a country from 2012 to 2022.

Time Series with No Trend
It is to be noted that not all time series show an increasing or decreasing trend. In some cases, the values of a time series fluctuate around a constant level and do not show any trend with respect to time. Therefore, if time series data are plotted on graph paper and do not show any trend, that is, there is neither an upward nor a downward trend reflected in the time series plot, then this kind of time series is called a time series with no trend. For example, the yield of a crop in a particular area from 2002 to 2021 shows no trend, as shown in Fig. 10.3.

[Time series plot: Year (2002 to 2021) on the X-axis and Yield on the Y-axis, fluctuating around a constant level.]
Fig. 10.3: Yield of a crop in a particular area from 2002 to 2021.

It should be clearly understood that a trend is the general, smooth, long-term, average tendency of time series data. The increase or decrease may not necessarily be in the same direction throughout the given period. The tendency of a time series may be found in either the form of a linear or a nonlinear (curvilinear) trend. If the time series data are plotted and the points on the graph cluster more or less around a straight line, then the tendency shown by the data is called a linear trend. Similarly, if the points plotted on the graph do not cluster more or less around a straight line, then the tendency shown by the data is called a nonlinear or curvilinear trend. Trends are also known as long-term variations. The long term or long period of time is a relative term which cannot be defined exactly. In some cases, a period of one week may be long, while in other cases a period of 2 years may not be enough. Some of the more important causes of long-term trend movements in a time series include population growth, urbanisation, technological improvements, economic advancements and developments, and shifts in consumer habits and attitudes.
10.3.2 Seasonal Component
In a time series, the variations which occur due to the rhythmic or natural
forces and operate in a regular and periodic manner over a span of less than
or equal to one year are termed as seasonal variations. We generally think
of seasonal movement in time series as occurring yearly, but it can also
represent any regularly repeating pattern that is less than one year in duration.
For example, daily traffic volume data show within-day seasonal behaviour,
with peak levels occurring during rush hours, moderate flow during the rest of
the day, and light flow from midnight to early morning. Thus, in a time series,
seasonal variation may exist if data are recorded quarterly, monthly, daily and
so on. Even though the data may be recorded over a span of three months,
one month, a week or a day, the amplitudes of the seasonal variation may be
different. Most of the time series data of economic or business fields show the
seasonal pattern. For example, the number of farming units (such as ploughs
and tractors) sold quarterly for the period 2019 to 2022 shows a seasonal
effect as shown in Fig. 10.4.

[Time series plot: Year (2019 to 2022, quarterly) on the X-axis and the number of farming units sold on the Y-axis, showing a repeating within-year pattern.]
Fig. 10.4: Quarterly sales of farming units.

The seasonal pattern existing in a time series may be either due to natural
forces or man-made conventions.
Seasonal Variations due to Natural Forces
Variations in a time series that arise due to changes in seasons, weather conditions and climatic changes are known as seasonal variations due to natural forces. For example, sales of umbrellas and raincoats increase very fast in the rainy season, the demand for air conditioners goes up in the summer season, and the sale of woollens goes up in winter, all driven by natural forces.
Seasonal Variation due to Man-Made Conventions
Variations in time series that arise due to changes in fashions, habits, tastes,
and customs of people in any society are called seasonal variations due to
man-made conventions. For example, in our country sales of gold and clothes
go up in marriage seasons and festivals.
10.3.3 Cyclic Component
Apart from seasonal effects, some time series exhibit variation due to some
other physical causes, which is called cyclic variation. Cyclic variations are
wave-like movements in a time series (as shown in Fig. 10.5), which can vary
greatly in both duration and amplitude. Cyclical variations are recurrent
upward or downward movements in a time series, but the period of a cycle is
greater than a year whereas this period is less than one year in seasonal
variation. Cyclic and seasonal variations are seen as similar, but they are quite
different. If the variations are not of a fixed period, then they are cyclic and if
the period is constant and associated with some aspect of the season, then
the pattern is seasonal. In general, the average length of cycles is longer than
the length of a seasonal pattern, and the magnitude of cycles tends to be more
variable than the magnitude of seasonal variations.
The cyclic variation in a time series is usually called the “business cycle” and comprises four phases of business, i.e., prosperity (boom), recession, depression, and recovery.
Prosperity (boom)
The prosperity of any business is its profit. During a period of boom,
businessmen and industrialists invest more and the economy surpasses the
level of full employment and the level of production increases. These
incentives make them produce more and therefore profit more.
Recession
When there is excessive expansion, it results in diseconomies that make it difficult to keep up with large-scale production. Additionally, it causes higher prices, rising salaries, and shortages. In an economic cycle, this is referred to as a recession.
Depression
In this phase of the economic cycle, output, income, and employment all start
to drop rapidly. Also, investments decrease, and businesses are demoralised.
Thus, it leads to pessimism which leads to deflation and depression.
Recovery
The depressive phase does not last forever. After some time, there is a cooling
down and the improvement of trade begins. During the recovery period, old
debts are repaid, and the units which are weaker are settled. As a result, the
unemployment rate is gradually declining over time and income is generated.

These four phases of a business cycle, i.e., prosperity (boom), recession, depression and recovery, are shown in Fig. 10.5.

Fig. 10.5: Business cycle and its phases.

10.3.4 Irregular Component
Apart from these regular long-term and short-term variations, random or irregular factors, which are not accounted for by trend, seasonal or cyclic variations, exist in almost all time series. The variations in a time series which do not repeat in a definite pattern are called irregular variations or the irregular component of a time series. The irregular variations in a time series may be due to unforeseen one-off events such as natural disasters (floods, droughts, fires) or man-made disasters (strikes, boycotts, accidents, war, riots).
Since occurrences of irregular variations are totally unpredictable and follow no specific pattern, we cannot anticipate their time of occurrence, direction or magnitude. In later units, we shall try to explain this component by probability models such as autoregressive (AR) and moving average (MA) models, etc.

We have now discussed all four components, which affect a time series individually as well as jointly. Now, let us take an example of the quarterly sales data of a commodity for the period 2014-2019, given in the following table:

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
2014   18          8           12          9
2015   5           8           4           11
2016   4           10          14          18
2017   24          23          27          30
2018   35          32          30          38
2019   32          35          30          24

We plot the time series data by taking sales on the Y-axis and the quarters on the X-axis. We get the time series plot as shown in Fig. 10.6.

Fig. 10.6: Trend, cycles and seasonal variation in quarterly sales.

The plot shows more clearly the presence of different components in the time series data. It shows seasonal as well as cyclic effects. If we draw a free-hand line to show the approximate movement of the curve, then this line shows the presence of a long-term linear trend.
All time series need not necessarily exhibit all four components. For example, the time series data of the annual production of a crop do not have seasonal variations and, similarly, a time series of annual rainfall does not contain cyclic variations.
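If you wish to reproduce such a plot yourself, the following short Python sketch (our own illustration, assuming Python with the matplotlib library is available; it is not part of the course material) plots the quarterly sales data given above:

import matplotlib.pyplot as plt

# Quarterly sales of the commodity, Q1 2014 to Q4 2019 (from the table above)
sales = [18, 8, 12, 9, 5, 8, 4, 11, 4, 10, 14, 18,
         24, 23, 27, 30, 35, 32, 30, 38, 32, 35, 30, 24]
quarters = range(1, len(sales) + 1)

plt.plot(quarters, sales, marker="o")        # line plot with markers
plt.xlabel("Quarter (Q1 2014 to Q4 2019)")
plt.ylabel("Sales")
plt.title("Quarterly sales of a commodity")
plt.show()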
Before moving to the next section, you can try the following Self Assessment
Question for better understanding.

SAQ 1
What is a time series? Describe its components.

10.4 BASIC MODELS OF TIME SERIES


In the previous section, we discussed the different types of factors which affect a time series. In this section, we shall discuss the commonly used mathematical models which explain time series data reasonably well. The basic models show the functional relationship among the various components of a time series. While discussing these models, we shall use the notation Yt for the value of the time series at time t. The following are the basic time series models:
10.4.1 Additive Model
The additive model is one of the most widely used models. It assumes that at any time t the time series value Yt is the sum of all four components. According to the additive model, the value of a time series variable can be expressed as

Yt = Tt + Ct + St + It

where Tt, Ct, St and It are the trend, cyclic, seasonal and irregular variations at time t, respectively.

Note: Since in a time series we have measured values of a variable over time, we use “t” as a subscript in the variables of time series models.

In this model, it is assumed that the cyclic effects remain constant for all cycles and the seasonal effects remain constant during any year or corresponding period. It is also assumed that the irregular variation is an independent, identically normally distributed variable with mean 0, i.e., the irregular variation effect remains constant throughout. Obviously, the additive model implies that seasonal variations in different years, cyclic variations in different cycles and irregular variations in different trends show equal absolute effects irrespective of the trend value.
10.4.2 Multiplicative Model
In the additive model, we assumed that a value of a time series variable is the sum of the trend, cyclic, seasonal and irregular components, but this model rests on the assumption that the cyclic effects remain constant for all cycles and the seasonal effects remain constant during any year or corresponding period. However, there are several situations where the seasonal variations exhibit an increasing or decreasing trend over time. When seasonal variations exhibit any change over time in terms of an increasing or decreasing trend, we can try the multiplicative model. In other words, if the various components in a time series operate proportionately to the general level of the series, the multiplicative model is appropriate. The multiplicative model is formed with the assumption that the time series value Yt at time t is the product of the trend, cyclic, seasonal and irregular components of the series. Symbolically, the multiplicative model can be described as

Yt = Tt × Ct × St × It

where Tt, Ct, St and It denote the trend, cyclic, seasonal and irregular variations. On taking logarithms, the multiplicative model reduces to an additive model. The multiplicative model is found appropriate for many business and economic data, for example, the time series of the production of electricity, the time series of the number of passengers who opted for air travel, the time series of the consumption of soft drinks, etc.
You can try the following Self Assessment Question for better understanding.

SAQ 2
What is the difference between additive and multiplicative models?

10.5 SMOOTHING TIME SERIES


For the estimation of trend and seasonal effects, it is very important to smooth out (or filter out) the effect of the irregular fluctuations of a time series so that the effects of the trend and seasonal components can be easily estimated. Smoothing helps us to see the patterns of the time series, such as the trend, better. Generally, smoothing irons out the irregular roughness to reveal a clearer trend. Smoothing techniques are based on averaging values over multiple periods to reduce irregular fluctuations. Since these techniques "smooth out" the short-term/irregular fluctuations from the time series data, they are called smoothing techniques. After smoothing out the short-term fluctuations, we can estimate and forecast the trend effect. These techniques let the data speak for themselves, in the sense that we can estimate time series components directly from the data without a predetermined structure. In this section, we shall discuss two simple and important smoothing methods for time series data, namely the moving average and exponential smoothing methods.
10.5.1 Simple Moving Average
The moving average (MA) is the simplest method for smoothing time series data. A moving average removes irregular and short-term fluctuations from a time series. This method is based on averaging each observation of a series with its surrounding observations, that is, past and future observations in chronological order. In this method, we find the simple moving averages of the time series data over a span/period of m time points, and these averages are called m-period moving averages. These averages are smoothed versions of the original time series. In this method, we put the average on the middle value of the set of observations; therefore, we explain the moving average for odd and even periods as follows:
When m is odd
In some situations, the data may show seasonal effects over an odd period of time, e.g., 5 days, 7 months, etc. It means that after every 5 days or 7 months, the data behave in a similar manner. Therefore, to remove the seasonal effects from the time series, we calculate the moving average over an odd span/period of time; in our example, it is 5 days or 7 months. For odd periods, the method consists of the following steps:
Step 1: We calculate the average of the first m values of the time series and place it against the middle value, i.e., against the ((m + 1)/2)th observation. For example, if m = 3, we compute the average of the first three observations of the time series as

MA1 = (y1 + y2 + y3)/3

where y1, y2 and y3 are the first three observations of the time series, and place it against the ((m + 1)/2)th = ((3 + 1)/2)th = 2nd observation (y2).

Step 2: We discard the first observation and include the next observation. Then we take the average of m values again. For example, if m = 3, we discard the first observation and determine the average of the second, third and fourth observations and place it against the middle of the second, third and fourth observations, that is, against the 3rd observation.

Step 3: We repeat this process until all data are exhausted. These steps provide us with a new time series of m-period moving averages which will be smoother than the original time series.
When m is even
Sometimes, there may be a seasonal effect over an even period, e.g., four quarters or 12 months, etc. This means that after every four quarters or 12 months, the data behave in a similar way. Since there is no single centre observation for an even number of observations, the simple moving averages for even periods need to be centred; hence they are also known as centred moving averages. If we take a centred moving average with m = 4, then it will filter out (or eliminate) the quarterly seasonal effect. This method consists of the following steps:

Step 1: We calculate the average of the first m values of the time series and place it against the middle position, i.e., between the (m/2)th and (m/2 + 1)th observations. For example, if m = 4, we compute the average of the first four observations and place it against the middle of the second and third observations.

Step 2: We discard the first observation and include the next observation, and then take the average of the next m observations again. For example, if m = 4, we discard the first observation, find the average of the second, third, fourth and fifth observations and place it against the middle of the third and fourth observations. We repeat this process until all data are exhausted, as shown in Example 2.

Step 3: Since the period (m) is even and there is no exact midpoint, we cannot put the MA against any single time point. Therefore, we determine the centred moving average. To determine the first centred moving average, we compute the average of the first two moving averages and place it against the middle value of the two moving averages. For example, if m = 4, we calculate the average of the first two moving averages and place it against the middle of the first and second moving averages, i.e., against the third observation.

Step 4: We repeat this process until all data are exhausted.

Note: The moving average eliminates periodic variations if the span of the period of the moving average (m) is equal to the period of the oscillatory variation. Therefore, we should choose an m which constitutes a cycle; for example, if a cycle is completed in 3 months, we should calculate exactly 3-month moving averages. If there is variation in the span of cycles, for example, suppose the first cycle is completed in 3 months, the second in 3 months and the third in 5 months and so on, then we should use the average of these time spans as m.
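The steps above translate directly into a few lines of code. The following Python sketch (our own illustration, not part of the study material) computes m-period moving averages, with the centring described for even m, and is applied to the insurance-claims data of Example 1 below:

def moving_average(y, m):
    # m-period moving averages; for odd m, each average is already centred
    # on the middle observation of its window
    ma = [sum(y[i:i + m]) / m for i in range(len(y) - m + 1)]
    if m % 2 == 1:
        return ma
    # for even m, average successive pairs of moving averages to centre them
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]

# Fire insurance claims, three periods per year, 2018 to 2021 (Example 1)
claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
print([round(v, 2) for v in moving_average(claims, 3)])
# expected: [15.0, 15.67, 17.0, 18.33, 19.33, 18.33, 18.67, 19.0, 20.67, 20.67]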
Let us take an example to explain the procedure of the moving average
method.
Example 1: The following table shows the number of fire insurance claims received by an insurance company in each four-month period from 2018 to 2021:

Year            2018         2019         2020         2021
Period          I   II  III  I   II  III  I   II  III  I   II  III
No. of Claims   17  13  15   19  17  19   22  14  20   23  19  20

Calculate and plot the three-period and five-period moving average series for the number of fire insurance claims. Compare these two moving average series.
Solution: Since m = 3 is odd, we take the average of the first three observations and put it in the middle of these observations, that is, against the second observation. The first value of the 3-period moving average is the average of 17, 13 and 15, which equals (17 + 13 + 15)/3 = 15, and we put this value against period II of 2018. The second value of the moving averages is obtained by discarding the first observation, i.e., 17, and including the next observation, that is, 19; this gives the average of 13, 15 and 19, which is equal to (13 + 15 + 19)/3 = 15.67, and we put it against the third observation, that is, period III of 2018. We repeat the same procedure of calculating three-period moving averages until all data are exhausted. Similarly, we can find the five-period moving averages by following the same procedure, that is, we calculate the average of the first five observations 17, 13, 15, 19 and 17 and put it against the third period. We calculate the rest of the moving averages in the following table:
Year   Period   No. of Claims   3-period MA   5-period MA
2018   I        17              ---           ---
2018   II       13              15.00         ---
2018   III      15              15.67         16.20
2019   I        19              17.00         16.60
2019   II       17              18.33         18.40
2019   III      19              19.33         18.20
2020   I        22              18.33         18.40
2020   II       14              18.67         19.60
2020   III      20              19.00         19.60
2021   I        23              20.67         19.20
2021   II       19              20.67         ---
2021   III      20              ---           ---

(For example, the first 3-period MA is (17 + 13 + 15)/3 = 15.00 and the first 5-period MA is (17 + 13 + 15 + 19 + 17)/5 = 16.20.)

From the above table, we observe that the original series varies between 13 and 23, whereas the moving averages vary between 15.00 and 20.67 (3-period) and between 16.20 and 19.60 (5-period), which are much smoother than the original series. The moving averages fluctuate less than the original observations because averaging smooths out (filters) the effect of the seasonal/irregular components. This helps us to appreciate the effect of the trend more clearly.

We now plot the original observations together with both the 3-period and 5-period MA values by taking them on the Y-axis and the period on the X-axis, as shown in Fig. 10.7.
[Time series plot: Period (I 2018 to III 2021) on the X-axis and No. of Claims on the Y-axis, showing the original data with the 3-period and 5-period moving averages.]
Fig. 10.7: Insurance claims data with 3- and 5-period moving averages.

From a comparison of the line plots of the 3-period and 5-period moving average values, we can see that there is less fluctuation (greater smoothing) in the 5-period moving average series than in the 3-period moving average series.
Therefore, we can conclude that the period, m, of the moving average affects the degree of smoothing:
• A shorter period (m) of the moving average produces a more jagged moving average curve.
• A longer period (m) of the moving average produces a smoother moving average curve.
Now the problem is what the value of m should be. If m is increased, then the series becomes much smoother, and it may also smooth out the effect of the cyclic and seasonal components, which are our main interest of study. To take away seasonality from a series, we would use a moving average with a length equal to the seasonal span. Thus, in the smoothed series, each smoothed value has been averaged across all seasons. Sometimes 3-year, 5-year or 7-year moving averages are used to expose the combined trend and cyclical movement of a time series.
The moving average can also be utilised as a forecasting model with some simple steps. In the simple moving average, we put the moving average at the centre of the set of m observations. As you have seen in Example 1, we put the average of 17, 13 and 15, which equals 15, against period II of 2018. It means we "forecast" the value of period II of 2018, and for that, we use both the past (the observation of period I) and the future (the observation of period III) of the given time point. In that sense, we cannot use centred moving averages as such for forecasting because, at the time of forecasting, the future is typically unknown. Hence, for the purpose of forecasting, we use trailing moving averages. In this method, we take the average of the past m consecutive observations of the series as follows:

y′t+1 = (yt + yt−1 + ... + yt−m+1)/m
where y′t+1 is the forecast value of yt+1. For example, if m = 3, we compute the first forecast value by taking the average of the first three values as

y′4 = (y3 + y2 + y1)/3

Furthermore, only the first forecast value is constructed by averaging only the actual values of the series. As we move to the second forecast, the actual values that are not yet available are replaced with the previously forecast values. For instance, the second forecast value is defined by the following expression:

y′t+2 = (y′t+1 + yt + yt−1 + ... + yt−m+2)/m

For example, if m = 3, we compute the second forecast value as

y′5 = (y′4 + y3 + y2)/3

The error of an individual forecast is the difference between the actual value and the forecast of that value. If yt and y′t are the actual and forecast values at time t, respectively, then we can calculate the forecast error as

et = yt − y′t
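As a check on these formulas, here is a minimal Python sketch (our own illustration) of trailing moving-average forecasting with m = 3 on the claims data of Example 1, where actual values are used when available and earlier forecasts replace them otherwise:

claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
m = 3

forecasts = {}   # forecasts[t] holds the forecast of the observation at index t
for t in range(m, len(claims)):
    # use the actual value where available, the earlier forecast otherwise
    window = [forecasts.get(i, claims[i]) for i in range(t - m, t)]
    forecasts[t] = sum(window) / m
    error = claims[t] - forecasts[t]               # e_t = y_t - y'_t
    print(f"t = {t + 1}: forecast = {forecasts[t]:.2f}, error = {error:.2f}")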

10.5.2 Weighted Moving Average


The simple moving average described in the previous sub-section is not
generally recommended for measuring trend, although it can be useful for
removing seasonal variation. In simple moving averages, we give equal
importance to all observations, therefore, we assign equal weights but
sometimes we may observe that a certain time period has more importance in
comparison to the others. For example, a forecaster might believe that the
previous month’s value is two times as important in forecasting as other
months. Therefore, a moving average in which some time periods are
weighted differently than others is called a weighted moving average (WMA).
Generally, more weights are assigned to the recent observations and less to
past observations in the weighted moving average. We assign weights in such
a way that all weights are positive. If wi denotes the weight assigned to the ith
observation (i = 1, 2, ..., m) then we can compute the first weighted moving
average (WMA) as follows:
w1y1 + w 2 y 2 + ... + w m ym
WMA1 = such that w i ≥ 0
w1 + w 2 + ... + w m
The procedure for getting the smooth trend values by the weighted moving
average is the same as that for the simple moving average in spite of weights.
Therefore, we find the weighted average instead of the simple average in the
weighted moving average.
Let us take an example to illustrate this method.
Example 2: Compute the 4-period weighted moving average for the number of fire insurance claims given in Example 1, using weights 1, 1, 2 and 4.

Solution: Here, the weights are given as

w1 = 1, w2 = 1, w3 = 2, w4 = 4

Thus, we can find the first weighted moving average as

WMA1 = (w1y1 + w2y2 + w3y3 + w4y4)/(w1 + w2 + w3 + w4) = (1 × 17 + 1 × 13 + 2 × 15 + 4 × 19)/(1 + 1 + 2 + 4) = 17.00
In a similar way, you can compute the remaining weighted moving averages. Since the period (4) is even, we put the first WMA in the middle of the second and third observations. We then determine the centred moving averages by averaging successive pairs of WMAs and placing each against the middle value of the two WMAs; for example, the average of the first two WMAs is placed against the third observation. We show these in the following table (each 4-period WMA lies between two periods, so it is written on a separate line between them):

Year   Period   No. of Claims   4-period WMA   Centred 4-period WMA
2018   I        17              ---            ---
2018   II       13                             ---
                                17.00
2018   III      15                             (17.00 + 16.75)/2 = 16.88
                                16.75
2019   I        19                             (16.75 + 18.00)/2 = 17.38
                                18.00
2019   II       17                             (18.00 + 20.25)/2 = 19.13
                                20.25
2019   III      19                             (20.25 + 17.00)/2 = 18.63
                                17.00
2020   I        22                             (17.00 + 18.63)/2 = 17.81
                                18.63
2020   II       14                             (18.63 + 21.00)/2 = 19.81
                                21.00
2020   III      20                             (21.00 + 19.50)/2 = 20.25
                                19.50
2021   I        23                             (19.50 + 20.13)/2 = 19.81
                                20.13
2021   II       19              ---            ---
2021   III      20              ---            ---

(For example, the second WMA is (1 × 13 + 1 × 15 + 2 × 19 + 4 × 17)/(1 + 1 + 2 + 4) = 16.75.)
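The same table can be produced in a few lines. A short Python sketch (our own illustration), assuming the weights 1, 1, 2, 4 of Example 2:

claims  = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
weights = [1, 1, 2, 4]    # the most recent observation gets the largest weight
m, wsum = len(weights), sum(weights)

# 4-period weighted moving averages (each lies between two periods, as m is even)
wma = [sum(w * y for w, y in zip(weights, claims[i:i + m])) / wsum
       for i in range(len(claims) - m + 1)]

# centred WMAs: averages of successive pairs of WMAs
centred = [(a + b) / 2 for a, b in zip(wma, wma[1:])]

print([round(v, 2) for v in wma])      # first few: 17.0, 16.75, 18.0, 20.25
print([round(v, 2) for v in centred])  # first few: 16.88, 17.38, ...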

After understanding the moving average as a smoothing and forecasting technique, we now discuss its merits and demerits.
Merits
1. It is simple as compared to the other method (the exponential method discussed in the next section).
2. A moving average series is smoother than the original time series. It removes the effect of short-term fluctuations (i.e., seasonal and irregular fluctuations) from the original observations by averaging over these short-term fluctuations.
3. If the moving average period coincides with the period of the cyclical variations, the moving average method eliminates the regular cyclical fluctuations. Even if the variations are not eliminated, it reduces their intensity.
4. The method is very flexible in the sense that the addition of a few more figures to the data simply results in a few more trend values without affecting the previous calculations.
5. The method is suitable for determining trend when a linear trend is present in the time series.

Demerits
1. As seen from the moving average calculations, its primary drawback is a loss of information (data values) at both ends of the original time series. However, this is not a significant drawback if the time series is long, say 50 time periods or more.
2. If a series contains a non-linear trend, a moving average will not reveal the trend present.
3. The selection of the extent or period of the moving average is quite difficult, particularly when the time series data exhibit cycles which are not regular in period and amplitude. The effect of an inappropriate selection of the extent or period is that moving averages may generate cycles or other movements which were not present in the original data.
4. The moving averages are greatly affected by extreme values. To overcome this somewhat, a weighted moving average with appropriate weights is used.

Before moving to the next method of smoothing, i.e., exponential smoothing, you can try the following Self Assessment Question for better understanding.

SAQ 3
The marketing manager of an electricity company recorded the following
quarterly demand levels for electricity (in 1000 megawatts) in a city from 2020
to 2022.
Season 2020 2021 2022
Summer 70 101 146
Monsoon 52 64 92
Winter 22 24 38
Spring 31 45 49

(i) Calculate the quarterly moving averages of the electricity demand.


(ii) Calculate the quarterly weighted moving averages taking weights as 1, 2,
3 and 4.
(iii) Plot both the quarterly simple and weighted moving averages together
with the original data on the same axis. Also, compare both methods.

10.5.3 Exponential Smoothing
A popular forecasting method in business is exponential smoothing. Its adaptability, simplicity of automation, low cost, and high performance are the main reasons for its popularity. Simple exponential smoothing is similar to the moving average, except that instead of taking a simple average over the m most recent values, we take a weighted average of all past values such that the weights decrease exponentially into the past. The decay rate of the observation weights is set by the smoothing parameter of the model, α (0 < α < 1), called the exponential smoothing constant. The idea is to give more weight to recent information, but previous information should not be completely ignored. Similar to the moving average, simple exponential smoothing can be used for forecasting, but the main assumption is that the series stays at the same level (that is, the local mean of the series is constant) over time; therefore, this method is suitable for series with neither trend nor seasonal components. As mentioned earlier, such a series can be obtained by removing trend and/or seasonality from the original time series and then applying exponential smoothing to the series of residuals (which are assumed to contain no trend or seasonality). If y1, y2, ..., yt are the observations of a time series, then the smoothed value at time t is given by

y′t = αyt + α(1 − α)yt−1 + α(1 − α)²yt−2 + ...

We can also write the above expression as

y′t = αyt + (1 − α)y′t−1

where α is called the exponential smoothing constant and lies between 0 and 1. It controls the rate at which the weights decrease.
This method consists of the following steps:
Step 1: We take the first given value as the first smoothed value, i.e.,
y1′ = y1

Step 2: We compute the second smoothed value using the first smoothed
26 value. We compute the second smoothed value as
Unit 10 Trend Component Analysis
y′2 = α y 2 + (1 − α ) y1′

Step 3: We repeat this process until all the data are exhausted. We compute the tth smoothed value as follows:

y′t = αyt + (1 − α)y′t−1

A popular choice of the smoothing constant is α = 0.2. For this, we assign a weight of 0.2 to the most recent observation and a weight of 1 − 0.2 = 0.8 to the most recent smoothed (forecast) value.
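If you would like to verify these steps on a computer, the following short Python sketch (our own illustration; the data are the claim counts used in Example 1 and Example 3) implements Steps 1 to 3:

def exponential_smoothing(y, alpha):
    # Step 1: the first smoothed value equals the first observation;
    # Steps 2-3: y'_t = alpha*y_t + (1 - alpha)*y'_{t-1}.
    smoothed = [y[0]]
    for obs in y[1:]:
        smoothed.append(alpha * obs + (1 - alpha) * smoothed[-1])
    return smoothed

claims = [17, 13, 15, 19, 17, 19, 22, 14, 20, 23, 19, 20]
smooth = exponential_smoothing(claims, alpha=0.2)
errors = [y - s for y, s in zip(claims, smooth)]   # e_t = y_t - y'_t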

Let us look at an example which helps you understand how to compute exponentially smoothed values.
Example 3: Consider the data of the number of fire insurance claims received
by an insurance company given in Example 1. Smooth the given time series
data using smoothing factors 0.2, 0.4 and 0.8. Compute and compare the
forecast errors produced by using the different exponential smoothing
constants.
Solution: In the exponential smoothing method, we take the first smoothed (forecast) value as the first given value, i.e.,

y′1 = y1 = 17
We can compute the forecast error as
e1 = y1 − y1′ = 17 − 17 = 0
We can compute the second smoothed value using the first smoothed value
and α = 0.2 as
y′2 = αy2 + (1 − α)y′1 = 0.2 × 13 + (1 − 0.2) × 17 = 16.20

We can compute the forecast error as

e2 = y2 − y′2 = 13 − 16.20 = −3.20

Similarly, you can compute the rest of the values in the same manner, and likewise for α = 0.4 and α = 0.8.

Year | Period | No. of Claims | Smoothed (α = 0.2) | Smoothed (α = 0.4) | Smoothed (α = 0.8) | Error (α = 0.2) | Error (α = 0.4) | Error (α = 0.8)
I 17 17.00 17.00 17.00 0 0 0
2018 II 13 16.20 15.40 13.80 –3.20 –2.40 –0.80
III 15 15.96 15.24 14.76 –0.96 –0.24 0.24
I 19 16.57 16.74 18.15 2.43 2.26 0.85
2019 II 17 16.65 16.85 17.23 0.35 0.15 –0.23
III 19 17.12 17.71 18.65 1.88 1.29 0.35
I 22 18.10 19.42 21.33 3.90 2.58 0.67
2020 II 14 17.28 17.25 15.47 –3.28 –3.25 –1.47
III 20 17.82 18.35 19.09 2.18 1.65 0.91
I 23 18.86 20.21 22.22 4.14 2.79 0.78
2021 II 19 18.89 19.73 19.64 0.11 –0.73 –0.64
III 20 19.11 19.84 19.93 0.89 0.16 0.07
From the above table, we observe that as we increase the value of the smoothing constant, the forecast errors decrease in magnitude. We now demonstrate the impact of the smoothing factor α using the time series graph shown in Fig. 10.8.

Fig. 10.8: Original and smoothed time series values of the claim data.

From the above figure, we observe that as we increase the smoothing constant α, the smoothed series follows the original data more closely; smaller values of α give a smoother series. Therefore, α plays a role in exponential smoothing analogous to that of m in the moving average: a small α smooths the series much as a large m does.
After understanding the exponential smoothing and forecast technique, we
now discuss the merits and demerits of this method.
Merits

1. It is very simple in concept and very easy to understand.


2. The primary merit of the exponential method over the moving average is that there is no loss of information (data values) as in the case of the moving average.

3. If we forecast using the moving average method, then m prior values are required; if we have to forecast many values, this is time-consuming. The exponential method, by contrast, uses only two pieces of data: the most recent observation and the most recent smoothed value.
Demerits
1. The method is not flexible in the sense that if some figures are added to
the data, then we have to do all calculations again.
2. This method gives good results in the absence of seasonal or cyclical
variations. As a result, forecasts are not accurate when data with cyclical
or seasonal variations are present.
After understanding the exponential smoothing method, you may be interested in doing the same yourself. For that, you can try the following Self Assessment Question.

SAQ 4
The annual expenditure levels (in millions) to promote products and services
for the financial services sector such as banks, insurance, investments, etc.
from 2015 to 2022 are shown in the following table:
Year 2015 2016 2017 2018 2019 2020 2021 2022
Expenditure 5.5 7.2 8.0 9.6 10.2 11.0 12.5 14.0

Use exponential smoothing to obtain filtered values by taking α = 0.8 and calculate the forecast errors. Also, plot the original and smoothed values.

We think that you have understood the importance of smoothing and how to apply it to time series data. We shall discuss how to estimate the trend component in the next section.

10.6 ESTIMATION OF TREND COMPONENT USING METHOD OF LEAST SQUARES
There are several ways to determine trend effects in time series data and one
of the more prominent is the method of least squares. This method is one of
the most common methods for identifying and quantifying the relationship
between a dependent variable and single or multiple independent variables. It
can also be used to fit a trend. We can also use fitted trend for forecasting. To
create a trend model that captures a time series with a global trend, the
dependent/ response/output variable (Y) is set as the time series
measurement or some function of it, and the independent/predictor variable
(X) is set as a time period. In this method, we fit a curve in such a way that the sum of the squared forecast errors is minimized.
Many possible trends can be explored with time series data. In this section, we examine only the linear model, the quadratic model and the exponential model because they are the easiest to understand and simplest to compute. Because seasonal effects can confound trend analysis, it is assumed here that no seasonal effects occur in the time series data or that they were removed prior to determining the trend. (You will study how to remove trend/seasonal variations from time series data in the next unit.)

A linear trend means that the values of the series increase or decrease linearly in time, whereas an exponential trend captures an exponential increase or decrease.
10.6.1 Linear Trend
When the values of the time series increase or decrease linearly with time
then we use linear trend.
In the simplest case, the linear trend model allows for a linear relationship
between the forecast variable Y and a single predictor variable time t. In this
case, the linear trend line equation is as follows:
Yt = β0 + β1t

The coefficients β0 and β1 denote the intercept and the slope of the trend line, respectively. The intercept β0 represents the predicted value of Y when t = 0, and the slope β1 represents the average predicted change in Y resulting from a one-unit change in t.
We can estimate the values of the constants β0 and β1 using the following
normal equations:

∑Yt = nβ0 + β1∑t
∑tYt = β0∑t + β1∑t²

where n is the number of observations in the given time series. We obtain the values of ∑Yt, ∑t, ∑tYt and ∑t² from the given time series data and solve these normal equations for the values of β0 and β1.

Generally, the time t is given in years; therefore, calculating the values of ∑t, ∑tYt and ∑t² manually becomes very cumbersome. To simplify the calculations, we may make the following transformation in t:

Xt = (t − middle value)/(interval in t values), when n is odd
Xt = (t − average of the two middle values)/(half of the interval in t values), when n is even

Therefore, the normal equations become:

∑Yt = nβ0 + β1∑Xt
∑XtYt = β0∑Xt + β1∑X²t

If β̂0 and β̂1 represent the estimated values of β0 and β1, respectively, then the fitted trend line for estimating or forecasting the trend values is given as follows:

Ŷt = β̂0 + β̂1Xt

After that, we put the value of Xt in terms of t to find the final trend line.
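As an aside, the whole procedure is easy to carry out in Python; the sketch below (our own helper, assuming the time points are spaced one year apart) solves the normal equations using the fact that ∑Xt = 0 after the transformation:

def fit_linear_trend(years, y):
    # Transform t to X_t so that sum(X_t) = 0 (assumes yearly spacing of 1).
    n = len(years)
    if n % 2 == 1:
        x = [t - years[n // 2] for t in years]            # n odd
    else:
        mid = (years[n // 2 - 1] + years[n // 2]) / 2
        x = [2 * (t - mid) for t in years]                # n even
    b0 = sum(y) / n                                       # first normal equation
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return b0, b1

# For the house sales data of Example 4 below, this returns b0 = 64.78 and b1 = 4.8.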
Let us take an example to understand how to fit a linear trend line for real-life
time series data.
Example 4: The sales director of a real estate company wants to study the
general direction (trend) of future housing sales. For that, he/she recorded the
number of houses sold from 2010 to 2018 as given in the following table:
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018
Sales 52 54 48 60 61 66 70 80 92

(i) Construct a simple trend line for the house sales data for the real estate
company.
(ii) Find the trend values for the given data and find forecast errors.
(iii) Plot the given data with trend values.
(iv) Use the trend line of best fit to estimate the level of house sales for the
year 2022.
Solution: The linear trend line equation is given by

Yt = β0 + β1t

Since n (number of years) = 9 is odd and the middle value is 2014, therefore,
we make the following transformation in time t as
Xt = t − 2014

Therefore, the normal equations for estimating the constants are:

∑Yt = nβ0 + β1∑Xt
∑XtYt = β0∑Xt + β1∑X²t

We calculate the values of ∑Yt, ∑Xt, ∑XtYt and ∑X²t in the following table:

Year (t) | Sales (Yt) | Xt = t − 2014 | XtYt | X²t | Trend Value | Forecast Error
2010 52 –4 –208 16 45.58 6.42
2011 54 –3 –162 9 50.38 3.62
2012 48 –2 –96 4 55.18 –7.18
2013 60 –1 –60 1 59.98 0.02
2014 61 0 0 0 64.78 –3.78
2015 66 1 66 1 69.58 –3.58
2016 70 2 140 4 74.38 –4.38
2017 80 3 240 9 79.18 0.82
2018 92 4 368 16 83.98 8.02
Total 583 0 288 60

Therefore, we find the values of β0 and β1 using the normal equations as

583 = 9 × β0 + 0 × β1 ⇒ β0 = 583/9 = 64.78
288 = 0 × β0 + 60 × β1 ⇒ β1 = 288/60 = 4.8
Thus, the final linear trend line is given by

Ŷt = 64.78 + 4.8Xt
Ŷt = 64.78 + 4.8(t − 2014)

We can find the trend values using the above trend line by putting in values of t. For example, for t = 2010:

Ŷt = 64.78 + 4.8(2010 − 2014) = 45.58

We can compute the forecast error as

e1 = y1 − ŷ1 = 52 − 45.58 = 6.42

You can calculate the rest of the values in a similar manner. We have
calculated the same in the above table. We now plot the time series data and
trend line by taking years on the X-axis and the house sales and trend values
on the Y-axis. We get the time series plot as shown in Fig. 10.9.
We can estimate the trend value of house sales for 2022 by putting t = 2022 in
the above linear trend line as follows:
Ŷt = 64.78 + 4.8 ( 2022 − 2014 ) = 103.18 ≈ 103

Fig. 10.9: Computed trend line and house sales data.

10.6.2 Quadratic Trend


Sometimes the trend is not linear and shows some curvature. The simplest
curvilinear form is a second-degree polynomial. In this case, the quadratic
trend equation is given below:
Yt = β0 + β1t + β2t²

We proceed in the same way as for the trend line; the normal equations for estimating β0, β1 and β2 after transforming the data are given as follows:

∑Yt = nβ0 + β1∑Xt + β2∑X²t
∑XtYt = β0∑Xt + β1∑X²t + β2∑X³t
∑X²tYt = β0∑X²t + β1∑X³t + β2∑X⁴t

The values of ∑Yt, ∑Xt, ∑X²t, ∑XtYt, ∑X³t, ∑X²tYt and ∑X⁴t are obtained from the given data, and we solve the normal equations for the constants β0, β1 and β2.

If β̂0, β̂1 and β̂2 represent the estimated values of β0, β1 and β2, respectively, then the fitted quadratic trend equation for estimating or forecasting the trend values is given as follows:

Ŷt = β̂0 + β̂1Xt + β̂2X²t

After that, we put the value of Xt in terms of t to find the final quadratic trend.
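For readers who prefer to compute, the quadratic fit can be checked with numpy; the sketch below (our own illustration, using the house sales data of Examples 4 and 5) is equivalent to solving the three normal equations:

import numpy as np

years = np.arange(2010, 2019)
sales = np.array([52, 54, 48, 60, 61, 66, 70, 80, 92])
x = years - 2014                       # X_t = t - 2014 (n odd, middle year 2014)
b2, b1, b0 = np.polyfit(x, sales, 2)   # polyfit returns the highest degree first
trend = b0 + b1 * x + b2 * x**2
errors = sales - trend                 # forecast errors; b0, b1, b2 come out near 59.99, 4.8, 0.72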
To illustrate this, let us take an example to fit a quadratic trend.
Example 5: Fit a quadratic trend equation for the house sales data of the real
estate company given in Example 4. Also
(i) Find forecast errors.

(ii) Plot the given data with trend values.

(iii) Use the quadratic trend equation, to estimate the level of house sales for
year 2022.
Solution: The quadratic trend equation is given as

Yt = β0 + β1t + β2t²

Proceeding in the same way as in Example 4, we make the transformation Xt = t − 2014; then the normal equations are given as:

∑Yt = nβ0 + β1∑Xt + β2∑X²t
∑XtYt = β0∑Xt + β1∑X²t + β2∑X³t
∑X²tYt = β0∑X²t + β1∑X³t + β2∑X⁴t

We calculate the values of ∑Yt, ∑Xt, ∑XtYt, ∑X²tYt, ∑X²t, ∑X³t and ∑X⁴t in the following table:
Year (t) | Sales (Yt) | Xt = t − 2014 | XtYt | X²t | X²tYt | X³t | X⁴t | Trend Value | Forecast Error
2010 52 –4 –208 16 832 –64 256 52.31 –0.31
2011 54 –3 –162 9 486 –27 81 52.07 1.93
2012 48 –2 –96 4 192 –8 16 53.27 –5.27
2013 60 –1 –60 1 60 –1 1 55.91 4.09
2014 61 0 0 0 0 0 0 59.99 1.01
2015 66 1 66 1 66 1 1 65.51 0.49
2016 70 2 140 4 280 8 16 72.47 –2.47
2017 80 3 240 9 720 27 81 80.87 –0.87
2018 92 4 368 16 1472 64 256 90.71 1.29

Total 583 0 288 60 4108 0 708

By putting the values from the table in the normal equations, we get

583 = 9 × β0 + 0 × β1 + 60 × β2 ⇒ 9β0 + 60β2 = 583
288 = β0 × 0 + β1 × 60 + β2 × 0 ⇒ β1 = 288/60 = 4.8
4108 = β0 × 60 + β1 × 0 + β2 × 708 ⇒ 60β0 + 708β2 = 4108

After solving the above equations for β0 and β2, we get the estimates

β̂0 = 59.99, β̂1 = 4.8, β̂2 = 0.72

Thus, the final quadratic trend equation is given by

Ŷt = 59.99 + 4.8Xt + 0.72X²t

After putting the value of Xt in terms of t, we get the desired quadratic trend equation as follows:

Ŷt = 59.99 + 4.8(t − 2014) + 0.72(t − 2014)²

We can find the trend values using the above quadratic trend equation by putting in t values. For example, for t = 2010:

Ŷt = 59.99 + 4.8(2010 − 2014) + 0.72(2010 − 2014)² = 52.31

We can compute the forecast error as

e1 = y1 − ŷ1 = 52 − 52.31 = −0.31
You can calculate the rest of the values in a similar manner. We have
calculated the same in the table.
We can estimate the trend value of house sales for 2022 by putting t = 2022 in the above quadratic trend equation as follows:

Ŷt = 59.99 + 4.8(2022 − 2014) + 0.72(2022 − 2014)² = 144.47 ≈ 144

We now plot the time series data and trend line by taking years on the X-axis
and the house sales and trend values on the Y-axis. We get the time series
plot as shown in Fig. 10.10.


Fig. 10.10: Fitted quadratic and linear trend models with house sales data.

If we compare the forecast errors produced by the linear and quadratic fits, we observe that they are smaller for the quadratic model than for the linear one, so we can say that the quadratic trend fits the number of houses sold by the company better than the linear trend.

Before going to the next section, you may like to fit a linear or quadratic trend yourself. So, try the following Self Assessment Question.

SAQ 5
The following table gives the gross domestic product (GDP) in 100 million for a certain country from 2011 to 2020:

Year 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
GDP   35   37   51   54   62   64   74   71   83   80

(i) Fit a trend line for GDP data and find trend values with the help of trend
line.

(ii) Find forecast errors.

(iii) Use the best-fit trend model to predict the country's GDP for 2022.

10.6.3 Exponential Trend


In many situations, time series data relating to business and economic activities show a constant rate of growth rather than a constant annual increase as in the case of a linear trend. In such situations, the trend is best described by an exponential function rather than a linear or quadratic one. This can be represented in exponential form as given below:

Yt = β0 e^(β1 t)

We can transform this model to a linear trend model by taking the natural logarithm (base e) on both sides of the above model as follows:

log(Yt) = log(β0) + β1 t log(e)
Zt = a + β1 t          [since log(e) = 1]

where Zt = log(Yt) and a = log(β0).

This is the equation of a linear trend. Therefore, we proceed in the same way as in the case of the linear trend line and find the estimates of a and β1.

Once the estimates of a and β1, i.e., â and β̂1, are obtained, we can obtain an estimate of β0, i.e., β̂0, with the help of â as

β̂0 = e^â

After that, we put the values of estimates of β0 and β1 in the exponential form
to get the equation of best fit.
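A minimal Python sketch of this back-transformation idea (our own helper names; it assumes the time codes x have already been transformed as above) is:

import numpy as np

def fit_exponential_trend(x, y):
    z = np.log(y)                  # Z_t = log(Y_t) = log(b0) + b1 * X_t
    b1, a = np.polyfit(x, z, 1)    # fit the straight line Z_t = a + b1 * X_t
    return np.exp(a), b1           # back-transform: b0 = e^a, together with b1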
Let us try a Self Assessment Question for the exponential trend in the same
manner as described above.

SAQ 6
The gross revenue of a company from 2015 to 2022 is given in the following
table:

Year 2015 2016 2017 2018 2019 2020 2021 2022

Gross Revenue 15 45 52 75 106 158 241 314

(i) Check which model of trend is the best fit to the given data.
(ii) Fit the suitable model.
(iii) Find the forecast errors.
(iv) Use the best fit trend model to predict the company’s gross revenue for
2025.

10.7 ESTIMATION OF TREND COMPONENT USING MOVING AVERAGE
We have discussed in Sec. 10.5 that the moving average method smooths out (or filters out) the effect of irregular fluctuations by averaging values over multiple periods. Also, if the time series has a seasonality effect, then we can remove it by taking the moving average period equal to the season span. For example, suppose a time series has been recorded monthly and there is a seasonal effect of one year, that is, after twelve months the data behave in a similar way to twelve months ago. If we take a centred moving average with m = 12, it will smooth out (eliminate) the effect of the season. Moreover, if we increase m, the cyclical effect is also smoothed out of the time series. As you have studied in Sec. 10.3, a time series mainly has four components; using the moving average method, we remove the seasonal, cyclic and irregular components, so after applying this method only the trend is left in the time series. Hence, we can use the moving average method to estimate the trend.
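A short sketch of this idea for quarterly data (our own illustration, using pandas and the electricity demand values of SAQ 3) is:

import pandas as pd

demand = pd.Series([70, 52, 22, 31, 101, 64, 24, 45, 120, 75, 30, 49])
ma4 = demand.rolling(window=4).mean()            # 4-period moving averages
trend = ma4.rolling(window=2).mean().shift(-2)   # centre them on the observations
# `trend` now estimates the trend, with seasonal and irregular effects smoothed out.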
We end this unit by giving a summary of its contents.

10.8 SUMMARY
In this unit, we have discussed:
• A time series is a set of numeric data of a variable that is collected over
time at regular intervals and arranged in chronological (time) order.

• The different components of a time series are trend, cyclical, seasonal and irregular;

• The general tendency of values of the data to increase or decrease during


a long period of time is called the trend.

• The variations which occur due to rhythmic or natural forces and operate
in a regular and periodic manner over a span of less than or equal to one
year are termed as seasonal variations.

• Apart from seasonal effects, some time series exhibit variation due to
some other physical causes, which is called cyclic variation. It has a
period greater than one year.

• The variations in a time series which do not repeat in a definite pattern are
called irregular variations or irregular components.

• Basic models of time series such as additive and multiplicative models.

• The methods of smoothing or filtering such as simple, weighted, and


exponential of the time series data.

• Method of least squares and moving average methods to estimate the


trend.

10.9 TERMINAL QUESTIONS


1. What is the difference between seasonal and cyclic components of time
series?
2. Explain various methods of smoothing.
3. For the gross domestic product (GDP) data for a country from 2011 to 2020 given in SAQ 5, find the trend values using a 3-yearly simple moving average. Also, compare with the linear trend.

4. Describe various methods of estimation of trend component.

10.10 SOLUTION/ANSWERS
Self-Assessment Questions (SAQs)
1. Refer Secs. 10.2 and 10.3.
2. Refer Sec. 10.4.
3. Since m = 4 is even, we compute the first moving average as

MA1 = (70 + 52 + 22 + 31)/4 = 43.75

We put the first MA in the middle of the second and third observations by creating blank rows after each observation, as explained in Example 2. We can also calculate the weighted MAs by taking the weights w1 = 1, w2 = 2, w3 = 3 and w4 = 4 as follows:

WMA1 = (1 × 70 + 2 × 52 + 3 × 22 + 4 × 31)/(1 + 2 + 3 + 4) = 36.40

Also, we determine the centred moving average by averaging the first two MAs and placing the result against the third observation, i.e., in the middle of the first and second moving averages. We have calculated the rest of the MAs and WMAs in the following table:

Year | Season | Demand of Electricity | 4-period MA | 4-period WMA | Centred 4-period MA | Centred 4-period WMA
2020 | Summer | 70 | | | |
2020 | Monsoon | 52 | | | |
 | | | 43.75 | 36.40 | |
2020 | Winter | 22 | | | 47.63 | 47.85
 | | | 51.50 | 59.30 | |
2020 | Spring | 31 | | | 53.00 | 61.80
 | | | 54.50 | 64.30 | |
2021 | Summer | 101 | | | 54.75 | 58.20
 | | | 55.00 | 52.10 | |
2021 | Monsoon | 64 | | | 56.75 | 50.10
 | | | 58.50 | 48.10 | |
2021 | Winter | 24 | | | 64.13 | 65.60
 | | | 69.75 | 83.10 | |
2021 | Spring | 45 | | | 73.25 | 87.55
 | | | 76.75 | 92.00 | |
2022 | Summer | 146 | | | 78.50 | 84.25
 | | | 80.25 | 76.50 | |
2022 | Monsoon | 92 | | | 80.75 | 70.25
 | | | 81.25 | 64.00 | |
2022 | Winter | 38 | | | |
2022 | Spring | 49 | | | |

We now plot the time series data and MAs by taking years on the X-axis and
the demand of electricity and centred MA and WMA on the Y-axis. We get the
time series plot as shown in Fig. 10.11.

Fig. 10.11: Demand of electricity with centred MA and WMA.

From the above figure, we observed that WMA is smoother than the simple
MA.
4. In the exponential smoothing method, we take the first smoothed (forecast) value as the first given value, i.e.,

y′1 = y1 = 5.5
We can compute the forecast error as
e1 = y1 − y1′ = 5.5 − 5.5 = 0
We can compute the second smoothed value using the first smoothed value as

y′2 = αy2 + (1 − α)y′1 = 0.8 × 7.2 + (1 − 0.8) × 5.5 = 6.86

We can compute the forecast error as

e2 = y2 − y′2 = 7.2 − 6.86 = 0.34

Similarly, you can compute the rest values in the same manner as
follows:

Year | Expenditure | Exponential Smoothing (α = 0.8) | Forecast Error

2015 5.5 5.50 0

2016 7.2 6.86 0.34

2017 8.0 7.77 0.23

2018 9.6 9.23 0.37

2019 10.2 10.01 0.19

2020 11.0 10.80 0.20

2021 12.5 12.16 0.34

2022 14.0 13.63 0.37

We now demonstrate the impact of the smoothing factor α using the time
series graph shown in Fig. 10.12.


Fig. 10.12: Annual expenditure with exponential smoothing.

From the above figure, we observe that the smoothed series follows the original data closely; with α = 0.8 the smoothing effect on the time series is small.
5. The linear trend line equation is given by
Yt = β0 + β1t

Since n = 10 is even, we make the following transformation in time t:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 2015.5)/(1/2) = 2(t − 2015.5)
After the transformation, the normal equations for the linear trend line are:

∑Yt = nβ0 + β1∑Xt
∑XtYt = β0∑Xt + β1∑X²t

We calculate the values of ∑Yt, ∑Xt, ∑XtYt and ∑X²t in the following table:
Year (t) | GDP (Yt) | Xt = 2(t − 2015.5) | XtYt | X²t | Trend Value | Forecast Error

2011 35 –9 –315 81 36.89 –1.89

2012 37 –7 –259 49 42.27 –5.27

2013 51 –5 –255 25 47.65 3.35

2014 54 –3 –162 9 53.03 0.97

2015 62 –1 –62 1 58.41 3.59

2016 64 1 64 1 63.79 0.21

2017 74 3 222 9 69.17 4.83

2018 71 5 355 25 74.55 –3.55

2019 83 7 581 49 79.93 3.07

2020 80 9 720 81 85.31 –5.31

Total 611 0 889 330

Therefore, we find the values of β0 and β1 using the normal equations as

611 = 10 × β0 + β1 × 0 ⇒ β0 = 611/10 = 61.1
889 = β0 × 0 + β1 × 330 ⇒ β1 = 889/330 = 2.69

Thus, the final linear trend line is given by

Ŷt = 61.1 + 2.69Xt
Ŷt = 61.1 + 2.69 × 2(t − 2015.5)

We can find the trend values using the above line by putting in t values. For example, for t = 2011:

Ŷt = 61.1 + 2.69 × 2(2011 − 2015.5) = 61.1 + 2.69 × (−9) = 36.89

We can compute the forecast error as

e1 = y1 − ŷ1 = 35 − 36.89 = −1.89

You can calculate the rest of the values in a similar manner. We have
calculated the same in the above table.
We can estimate the trend value of the GDP of the country for 2022 by putting t = 2022 in the above linear trend line as follows:

Ŷt = 61.10 + 2.69 × 2(2022 − 2015.5) = 96.07

We now plot the time series data and trend line by taking years on the X-
axis and the GDP and trend values on the Y-axis. We get the time series
plot as shown in Fig. 10.13.


Fig. 10.13: Computed trend line and GDP data.

6. To check which model of trend is the best fit to the given data, we first
plot the given data by taking the year on the X-axis and corresponding
gross revenue on the Y-axis as shown in Fig. 10.14.

Fig. 10.14: Gross revenue data of a company.

The shape of Fig. 10.14 is approximately exponential, so the exponential model may be suitable for the data. The form of the model is given as follows:

Yt = β0 e^(β1 t)

We take the natural logarithm (base e) to convert this model to a linear trend model as follows:

log(Yt) = log(β0) + β1 t log(e)
Zt = a + β1 t          [since log(e) = 1]

where Zt = log(Yt) and a = log(β0).

Since n = 8 is even, we make the following transformation in time t:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 2018.5)/(1/2) = 2(t − 2018.5)
Therefore, the normal equations for estimating a and β1 are:

∑Zt = na + β1∑Xt
∑XtZt = a∑Xt + β1∑X²t

We calculate the values of ∑Zt, ∑Xt, ∑XtZt and ∑X²t in the following table:

Year (t) | Gross Revenue (Yt) | Xt = 2(t − 2018.5) | Zt = log(Yt) | XtZt | X²t | Trend Value | Forecast Error

2015 15 –7 2.71 –18.96 49 21.58 –6.58


2016 45 –5 3.81 –19.03 25 32.29 12.71
2017 52 –3 3.95 –11.85 9 48.03 3.97
2018 75 –1 4.32 –4.32 1 71.45 3.55
2019 106 1 4.66 4.66 1 106.28 –0.28
2020 158 3 5.06 15.19 9 158.11 –0.11
2021 241 5 5.48 27.42 25 235.20 5.80
2022 314 7 5.75 40.25 49 349.88 –35.88
Total 0 35.74 33.36 168

Therefore, we find the values of a and β1 using the normal equations as

35.74 = 8 × a + β1 × 0 ⇒ a = 35.74/8 = 4.47
33.36 = a × 0 + β1 × 168 ⇒ β1 = 33.36/168 = 0.20

Therefore,

β̂0 = e^â = e^4.47 = 87.60

Thus, the best fit of the exponential trend is given by

Ŷt = β̂0 e^(β̂1 t) = 87.60 e^(0.20 × 2(t − 2018.5))

We can find the trend values using the above equation by putting in t values. For example, for t = 2015:

Ŷ2015 = 87.60 e^(0.20 × 2(2015 − 2018.5)) = 21.58

You can calculate the rest of the values in a similar manner. We have
calculated the same in the table.

We also calculate the forecast errors in the same table as follows:

e1 = y1 − Ŷ2015 = 15 − 21.58 = −6.58

We can estimate the trend value of the gross revenue of the company for 2025 by putting t = 2025 in the above trend equation as follows:

Ŷt = 87.60 e^(0.20 × 2(2025 − 2018.5)) = 1151.79

We now plot the time series data and trend values by taking years on the
X-axis and the gross revenue and trend values on the Y-axis. We get the
time series plot as shown in Fig. 10.15.


Fig. 10.15: Fitted exponential trend model and gross revenue data.

Terminal Questions (TQs)

1. Refer Sec. 10.3.

2. Refer Sec. 10.5.

3. Since m = 3 is odd, we compute the first moving average as

MA1 = (35 + 37 + 51)/3 = 41
We place it in the middle of these observations, that is, against 2012. This value is treated as the estimate of the trend for 2012. Similarly, we estimate the trend values for the rest of the years; we have calculated them in the following table:

Year (t) | GDP (Yt) | MA/Trend Value | Forecast Error (MA) | Forecast Error (linear trend)
2011 35 --- --- –1.89
2012 37 41.00 – 4.00 –5.27
2013 51 47.33 3.67 3.35
2014 54 55.67 –1.67 0.97
2015 62 60.00 2.00 3.59
2016 64 66.67 –2.67 0.21
2017 74 69.67 4.33 4.83
2018 71 76.00 –5.00 –3.55
2019 83 78.00 5.00 3.07
2020 80 --- --- –1.89

For comparison with the linear trend, we also calculate the forecast errors in the same table as follows:

e2 = y2 − MA = 37 − 41 = −4

Comparing the forecast errors of the moving average and the trend line, we observe that in most cases the error is smaller for the trend line than for the moving average.
We also plot the estimated trend using the method of least squares and the moving average method in Fig. 10.16.


Fig. 10.16: GDP data with trend values using method of least squares and moving
average.

4. Refer Secs. 10.6 and 10.7.

UNIT 11
SEASONAL COMPONENT
ANALYSIS

Structure
11.1 Introduction
     Expected Learning Outcomes
11.2 Estimation of Seasonal Component
11.3 Simple Average Method
11.4 Ratio to Trend Method
11.5 Ratio to Moving Average Method
11.6 Estimation of Trend from Deseasonalised Data
11.7 Estimation of Cyclic Component
     Residual Method
11.8 Estimation of Random Component
11.9 Forecasting
11.10 Summary
11.11 Terminal Questions
11.12 Solution/Answers

11.1 INTRODUCTION
In Unit 10, you have seen that a time series can be decomposed into four components, i.e., trend, seasonal, cyclic and irregular components. Our aim is to estimate these components and use them for forecasting. We have already described some methods for smoothing or filtering the time series, i.e., the simple moving average, weighted moving average and exponential smoothing methods. We have also described some methods for estimating the trend, i.e., the method of least squares and the moving average method.
In Unit 10, you also studied the seasonal component. Seasonal variations are fluctuations in a time series that are repeated at regular intervals within a year (e.g. daily, weekly, monthly, quarterly). Seasonal variation may be caused by the seasons, temperature, rainfall, public holidays, etc. If the effect of seasonal variation is not removed from the time series data, then the trend estimate will also be affected by seasonal effects. In such cases, we divide the original time series values by the corresponding seasonal indices. This technique is called deseasonalisation of the data.
In this unit, we shall discuss some methods for estimating the seasonal and cyclic components. In Sec. 11.2, we discuss the goals of seasonal component analysis. In Secs. 11.3, 11.4 and 11.5, we explain the simple average method, the ratio to trend method and the ratio to moving average method, with their merits and demerits, for estimating seasonal components, respectively. In Sec. 11.6, we explain the estimation of trend using deseasonalised data. The estimation of cyclic and random components is discussed in Secs. 11.7 and 11.8, respectively. Once we have estimated the trend, seasonal and cyclic components, we shall use them for forecasting purposes. Therefore, Sec. 11.9 is devoted to forecasting the time series. In the next units, you will study the modern approach to forecasting time series.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
Expected Learning Outcomes
After studying this unit, you would be able to
 describe the effect of seasonal variation on a time series;
 describe the goals of analysing the seasonal component;
 apply the simple average method for the estimation of seasonal indices;
 use the ratio to trend method for computing seasonal indices;
 apply the ratio to moving average method for the estimation of seasonal indices;
 describe the method of estimation of the trend component from deseasonalised time series data;
 discuss the methods of estimation of cyclic and irregular fluctuations in a time series; and
 use trend, seasonal and cyclical components to forecast future values.

11.2 ESTIMATION OF SEASONAL COMPONENT


In Unit 10, you have studied what seasonal variation is and how it affects a time series. The variations which occur due to rhythmic or natural forces and
operate regularly and periodically over a span of less than or equal to one
year are termed as seasonal variations. For example, daily traffic volume
data show within-day seasonal behaviour, with peak levels occurring during
rush hours, moderate flow during the rest of the day, and light flow from
midnight to early morning. Thus, in a time series, seasonal variation may exist
if data are recorded quarterly, monthly, daily and so on. The main objectives of the analysis of seasonal variations are:
(i) We analyse the seasonal variations to get an idea of the relative position
of the phenomenon under study in each season and it helps us to
identify the forces behind the seasonal variations so that we can plan for
the season.
(ii) The estimation of seasonal variations is also necessary to eliminate the seasonal effect from the time series data so that we can see other patterns of the time series, such as the trend, better. The resulting data are called deseasonalised data.
Seasonal variation is measured in terms of an index, called a seasonal index. It is an average that can be used to compare an actual observation relative to what it would be if there were no seasonal variations. An index value is attached to each period of the time series within a year. This implies that if monthly data are considered, there are 12 separate seasonal indices, one for each month.
There are various methods of estimating seasonal variations. We list some of
the most popular methods as follows:
1. Simple average method

2. Ratio to trend method

3. Ratio to moving average method

We will discuss one at a time in the subsequent sections.

11.3 SIMPLE AVERAGE METHOD


When we discuss the methods of measuring seasonality, the method of simple
average is the simplest of all. This method is based on the basic assumption
that the data do not contain any trend and cyclic components and consists of
eliminating irregular components by averaging the monthly (quarterly) values
over the years. These assumptions may or may not be true since most of the
economic or business time series exhibit trends.
This method consists of the following steps:
Step 1: If the time series data is given monthly or quarterly of different years,
then we, first of all, arrange the data by months, quarters, etc. of
different years in such a way that the months, quarters, etc. lie in
rows and years in columns or vice-versa (see Example 1).
Step 2: After arranging the time series data, we compute the average (ȳi) of each month or quarter over the different years, which eliminates the irregular fluctuations. These averages are known as seasonal indices. For example, if we have quarterly data for three years, say, 2020, 2021 and 2022, then we compute the average of the ith quarter (Qi) as

ȳi = [Qi(2020) + Qi(2021) + Qi(2022)]/3
Step 3: Each seasonal index has a base of 100. Therefore, the average of all seasonal indices should be 100. In other words, the sum of the seasonal indices should be 400 for quarterly data and 1200 for monthly data. So we calculate the grand average of these averages. For example, if we have quarterly data, then we can compute the grand average as

ȳ = (ȳ1 + ȳ2 + ȳ3 + ȳ4)/4

where ȳ1, ȳ2, ȳ3 and ȳ4 are the averages of the first, second, third and fourth quarters, respectively. If we have monthly data, then we can compute the grand average as

ȳ = (ȳ1 + ȳ2 + ... + ȳ12)/12
Step 4: If the average of all seasonal indices is not 100, then we normalise them so that their mean is 100. For that, we find the adjusted seasonal indices by expressing each seasonal index as a percentage of the average of the seasonal indices:

Adjusted seasonal index = (Seasonal index / Average of seasonal indices) × 100

That is, after calculating the grand average, we express each average as a percentage of the grand average ȳ. These percentages are known as adjusted seasonal indices. We can compute the ith seasonal index (Si) as

Si = (ȳi / ȳ) × 100
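The four steps are easy to express in Python; a minimal sketch (our own illustration, using the electricity data of Example 1 below) is:

data = {
    "Summer": [70, 101, 120, 135], "Monsoon": [52, 64, 75, 82],
    "Winter": [22, 24, 30, 34],    "Spring": [31, 45, 49, 50],
}
means = {s: sum(v) / len(v) for s, v in data.items()}     # Step 2: seasonal averages
grand = sum(means.values()) / len(means)                  # Step 3: grand average
indices = {s: 100 * m / grand for s, m in means.items()}  # Step 4: adjusted indices
# indices: Summer 173.17, Monsoon 110.98, Winter 44.72, Spring 71.14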
Interpretation of seasonal index

Each (adjusted) seasonal index measures the average magnitude of the seasonal influence on the actual values of the time series for a given period within the year, and it measures how a particular season compares, on average, to the mean of the cycle. (A seasonal index is also called a seasonal effect or seasonal component.) For example, suppose the seasonal index of the season (April-June) for the sales of ACs of a particular company is 170.5. It means that the sale of ACs in that season is, on average, 70.5% (170.5 − 100) above the average. If the seasonal index for the season (October-December) is 75, then it indicates that the sale of ACs in that season is, on average, 25% (100 − 75) below the average, and the company's sales may be depressed by the presence of seasonal forces by approximately 25%. Alternatively, the AC sales would be about 25% higher if seasonal influences had not been present.
After understanding various steps of computing seasonal indices using the
simple average method and how to interpret these, let us take an example to
apply this method.
Example 1: The marketing manager of an electricity company recorded the
following quarterly (seasonally) demand levels for electricity (in 1000
megawatts) in a city from 2019 to 2022.
Season 2019 2020 2021 2022
Summer 70 101 120 135
Monsoon 52 64 75 82
Winter 22 24 30 34
Spring 31 45 49 50

(i) Calculate the seasonal index for each season by assuming that there
are no trend and cyclic effects.
(ii) Plot seasonal indices and original data on the same graph.
(iii) Also, interpret the seasonal indices.

Solution: Since there is no trend effect, we can apply the simple average method to obtain seasonal indices. Since the electricity demand is recorded quarterly, we compute the average ȳi of each season/quarter as follows:

ȳ1 = (70 + 101 + 120 + 135)/4 = 106.5,  ȳ2 = (52 + 64 + 75 + 82)/4 = 68.25
ȳ3 = (22 + 24 + 30 + 34)/4 = 27.5,  ȳ4 = (31 + 45 + 49 + 50)/4 = 43.75

After calculating the quarterly averages, we calculate the grand average of all quarterly averages as

ȳ = (ȳ1 + ȳ2 + ȳ3 + ȳ4)/4 = (106.5 + 68.25 + 27.5 + 43.75)/4 = 61.5
We now calculate the adjusted seasonal indices by expressing each average as a percentage of the grand average, one by one, for all i = 1, 2, 3, 4, i.e.,

Seasonal index for summer = (106.5/61.5) × 100 = 173.17
Seasonal index for monsoon = (68.25/61.5) × 100 = 110.98
Seasonal index for winter = (27.5/61.5) × 100 = 44.72
Seasonal index for spring = (43.75/61.5) × 100 = 71.14
We can arrange these in a table as follows:
Season | 2019 | 2020 | 2021 | 2022 | Seasonal Index | Adjusted Seasonal Index
Summer | 70 | 101 | 120 | 135 | 106.5 | 173.17
Monsoon | 52 | 64 | 75 | 82 | 68.25 | 110.98
Winter | 22 | 24 | 30 | 34 | 27.5 | 44.72
Spring | 31 | 45 | 49 | 50 | 43.75 | 71.14
Average | | | | | 61.5 | 100

We now plot the demand of electricity and seasonal indices on the graph
paper as shown below:
Fig. 11.1: Demand of electricity with seasonal index.


Interpretation

The adjusted seasonal indices for the summer and monsoon seasons are 173.17 and 110.98, respectively. They indicate that the demand for electricity in these seasons is, on average, 73.17% (173.17 − 100) and 10.98% (110.98 − 100) above the average, respectively. Similarly, the seasonal indices for the winter and spring seasons are 44.72 and 71.14, respectively. They indicate that the demand for electricity in these seasons is, on average, 55.28% (100 − 44.72) and 28.86% (100 − 71.14) below the average, respectively. We can say that the demand for electricity would be about 55.28% and 28.86% higher in these seasons if seasonal influences had not been present.

After understanding the simple average method and how to calculate seasonal
indices, we now discuss the merits and demerits of this method.

Merits and Demerits

• It is the simplest method of measuring seasonal variations.

• This method is based on the unrealistic assumption that the trend and
cyclical variations are not present in the time series data. These
assumptions are not met in most of the economic or business time series
because they generally exhibit trends.

Before moving to the next method of measuring seasonal component, you can
try the following Self Assessment Question for better understanding.

SAQ 1
The sales manager of a company recorded the monthly sales (in thousands)
of a product for the years 2020, 2021 and 2022 and are given as follows:
Months | Sales 2020 | Sales 2021 | Sales 2022
January 120 150 160
February 110 140 150
March 100 130 140
April 140 160 160
May 150 160 150
June 150 150 170
July 160 170 160
August 130 120 130
September 110 1360 100
October 100 120 100
November 120 130 110
December 150 140 150

Obtain the monthly seasonal indices assuming that there is no trend and cyclic
effects.

After understanding the simple average method and its merits and demerits, we now learn the second method of calculating seasonal variation, that is, the ratio to trend method.

11.4 RATIO TO TREND METHOD


As you have seen, the simple average method is used only when the time series data have no trend or cyclic effects, but most economic or business time series exhibit trends. Therefore, in such cases, we cannot use the simple average method for seasonal indices. For calculating seasonal indices in such cases, we can use another method of measuring seasonal variation, i.e., the ratio to trend method. This method is based on the basic assumption that the data do not contain any cyclic component. It means that if the time series variable consists of trend, seasonal and random components, then we can apply this method to compute the seasonal indices. It is thus an improved version of the simple average method, as it assumes that the seasonal variation for a given period is a constant fraction of the trend. The measurement of the seasonal indices by this method consists of the following steps:
Step 1: In this method, we first convert the quarterly or monthly time series
data into yearly. For that, we compute the average of all quarters
(months) of each year and then take these averages as the yearly
values of the variable.
Step 2: After converting data annually/yearly, we find trend values of this
converted data by the method of least squares by fitting a suitable
mathematical curve, either a trend line (straight line) or second-
degree polynomial (parabola) as discussed in Unit 10.
Step 3: The yearly trend values calculated in Step 2 are the trend values of the mid-periods of the respective years. Based on these trend values, we determine the trend values for each quarter or month, as the case may be, using the yearly increment (slope) of the trend line (see Example 2).

Step 4: To eliminate the trend effect from the data, we express the original time series values as percentages of the trend values, assuming the multiplicative model. Since the cyclic component is not present in the time series, after dividing by the trend only the seasonal and irregular components remain in these percentages.

Step 5: After eliminating the trend effect, the series contains only seasonal and irregular effects. Therefore, we now obtain the seasonal indices free from the irregular variations by following the same procedure discussed for the simple average method in the previous section. Thus, we find the average (mean or median) of the ratio to trend values (or percentage values) for each season over all years. It is suggested to prefer the median over the mean if there are some extreme values (outliers) which are not primarily due to seasonal effects. In this way, the irregular variation is removed. If there are only a few abnormal values in the percentage values, then the mean should be preferred to remove the randomness. These averages are known as seasonal indices.
Step 6: If the sum of the seasonal indices is not 400 for quarterly data (1200 for monthly data), then we find the adjusted seasonal indices by expressing each seasonal index as a percentage of the average of the seasonal indices:

Adjusted seasonal index = (Seasonal index / Average of seasonal indices) × 100
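For illustration, the six steps can be sketched in Python as follows (our own helper, assuming quarterly data arranged one row per year with yearly spacing of one; it spaces the quarterly trend values at one quarter of the fitted per-year slope around each yearly trend value, one common convention for Step 3):

import numpy as np

def ratio_to_trend_indices(values):                 # values[year][quarter]
    values = np.asarray(values, dtype=float)
    yearly = values.mean(axis=1)                    # Step 1: yearly averages
    n = len(yearly)
    x = np.arange(n) - (n - 1) / 2                  # centred yearly time codes
    b1, b0 = np.polyfit(x, yearly, 1)               # Step 2: fitted yearly trend
    offsets = np.array([-1.5, -0.5, 0.5, 1.5]) * (b1 / 4)   # Step 3: quarterly trend
    trend = (b0 + b1 * x)[:, None] + offsets[None, :]
    ratios = 100 * values / trend                   # Step 4: ratio to trend (%)
    idx = ratios.mean(axis=0)                       # Step 5: seasonal indices
    return idx * 100 / idx.mean()                   # Step 6: adjust to average 100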

For a better understanding of the procedure of ratio to trend method, let us


take an example.
Example 2: Compute the seasonal indices by the ratio to trend method for the
data of quarterly demand of electricity (in 1000 megawatts) given in Example 1
by assuming that there is no cyclic pattern.
Solution: We are given quarterly time series data for 4 years. Therefore, we first convert the quarterly data into yearly data by finding the average of all quarters of each year as follows:

Year | Summer | Monsoon | Winter | Spring | Average
2019 | 70 | 52 | 22 | 31 | (70 + 52 + 22 + 31)/4 = 43.75
2020 | 101 | 64 | 24 | 45 | (101 + 64 + 24 + 45)/4 = 58.5
2021 | 120 | 75 | 30 | 49 | (120 + 75 + 30 + 49)/4 = 68.5
2022 | 135 | 82 | 34 | 50 | (135 + 82 + 34 + 50)/4 = 75.25

Therefore, we get the required yearly data as given below:

Year Average

2019 43.75

2020 58.5

2021 68.5

2022 75.25

We now determine the yearly trend by fitting a linear trend by the method of least squares as discussed in Unit 10.

The linear trend line equation is given by

Yt = β0 + β1t

Since n = 4 (number of years) is even, we make the following transformation in time t to simplify the calculations:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 2020.5)/(1/2) = 2(t − 2020.5)

After the transformation, the normal equations for a linear trend line are:

∑Yt = nβ0 + β1∑Xt
∑XtYt = β0∑Xt + β1∑X²t

We calculate the values of ∑Yt, ∑Xt, ∑XtYt and ∑X²t in the following table:
Year (t) | Yt | Xt = 2(t − 2020.5) | XtYt | X²t
2019 43.75 –3 –131.25 9
2020 58.50 –1 –58.50 1
2021 68.50 1 68.50 1
2022 75.25 3 225.75 9
Total 246 0 104.50 20

Therefore, we find the values of β0 and β1 using the normal equations as

246 = 4 × β0 + β1 × 0 ⇒ β0 = 246/4 = 61.50
104.50 = β0 × 0 + β1 × 20 ⇒ β1 = 104.5/20 = 5.23

Thus, the final linear trend line is given by

Yt = 61.5 + 5.23Xt

We now find the trend values by putting Xt = −3, −1, 1, 3 in the above trend line equation as follows:

Trend value (2019) = 61.5 + 5.23 × (−3) = 45.81
Trend value (2020) = 61.5 + 5.23 × (−1) = 56.27
Trend value (2021) = 61.5 + 5.23 × 1 = 66.73
Trend value (2022) = 61.5 + 5.23 × 3 = 77.19
These trend values represent the averages of the corresponding year and are
supposed to lie at the centre of the corresponding year. Therefore, we place
these in the middle of the 2nd and 3rd quarters. We now determine the trend
values for each quarter using the yearly increment (slope = 5.23) of the trend
line.
We have the yearly increment = 5.23.
Therefore, the quarterly increment will be 5.23/4 = 1.31.
Similarly, the half-quarterly increment will be 1.31/2 = 0.66.
Thus, the trend value for the 3rd quarter of the year 2019 will be
= Trend value of 2019 + half quarterly increment = 45.81 + 0.66 = 46.47 .
Similarly, the trend value for the 2nd quarter of the year 2019 will be
= Trend value of 2019 – half quarterly increment = 45.81 − 0.66 = 45.15 .
Thus, the trend value for the 1st quarter of the same year will be
= Trend value of 2nd quarter – quarterly increment= 45.15 − 1.31= 43.84 .
In a similar way, the values for the 4th quarter of the same year will be
= Trend value of 3rd quarter + quarterly increment= 46.47 + 1.31= 47.78 .

Similarly, you can compute the trend values for the remaining years as we have computed them for all quarters of the year 2019. We have calculated them in the following table:
Quarterly trend values

Season 2019 2020 2021 2022

Summer 45.15 – 1.31 = 43.84 55.61 – 1.31 = 54.30 66.07 – 1.31 = 64.76 76.53 – 1.31 = 75.22

Monsoon 45.81 – 0.66 = 45.15 56.27 – 0.66 = 55.61 66.73 – 0.66 = 66.07 77.19 – 0.66 = 76.53

Winter 45.81 + 0.66 = 46.47 56.27 + 0.66 = 56.93 66.73 + 0.66 = 67.39 77.19 + 0.66 = 77.85

Spring 46.47 + 1.31 = 47.78 56.93 + 1.31 = 58.24 67.39 + 1.31 = 68.7 77.85 + 1.31 = 79.16

After getting the quarterly trend values, we now remove the trend effect. For that, we divide each original value by the corresponding trend value and express it as a percentage, as shown in the following table:

Season | 2019 | 2020 | 2021 | 2022
Summer | (70/43.84) × 100 = 159.67 | (101/54.3) × 100 = 186.00 | (120/64.76) × 100 = 185.30 | (135/75.22) × 100 = 179.47
Monsoon | (52/45.15) × 100 = 115.17 | (64/55.61) × 100 = 115.09 | (75/66.07) × 100 = 113.52 | (82/76.53) × 100 = 107.15
Winter | (22/46.47) × 100 = 47.34 | (24/56.93) × 100 = 42.16 | (30/67.39) × 100 = 44.52 | (34/77.85) × 100 = 43.67
Spring | (31/47.78) × 100 = 64.88 | (45/58.24) × 100 = 77.27 | (49/68.7) × 100 = 71.32 | (50/79.16) × 100 = 63.16

Now the above data are free from the trend effect, so we can apply the simple average method to calculate the seasonal indices. We compute the average (seasonal index) ȳi of each season/quarter over the years and then the adjusted seasonal indices as follows:

Calculations for seasonal indices

Season | 2019 | 2020 | 2021 | 2022 | Average (Seasonal Index) | Adjusted Seasonal Index
Summer | 159.67 | 186.00 | 185.30 | 179.47 | (159.67 + 186.00 + 185.30 + 179.47)/4 = 177.61 | (177.61/100.98) × 100 = 175.89
Monsoon | 115.17 | 115.09 | 113.52 | 107.15 | (115.17 + 115.09 + 113.52 + 107.15)/4 = 112.73 | (112.73/100.98) × 100 = 111.64
Winter | 47.34 | 42.16 | 44.52 | 43.67 | (47.34 + 42.16 + 44.52 + 43.67)/4 = 44.42 | (44.42/100.98) × 100 = 43.99
Spring | 64.88 | 77.27 | 71.32 | 63.16 | (64.88 + 77.27 + 71.32 + 63.16)/4 = 69.16 | (69.16/100.98) × 100 = 68.49
Total | | | | | 177.61 + 112.73 + 44.42 + 69.16 = 403.92 | 400
Average | | | | | 403.92/4 = 100.98 | 100

The average yearly seasonal indices obtained above are adjusted to a total of 400 because the total of the seasonal indices over the four quarters is 403.92, which is greater than 400. So we express each seasonal index as a percentage of the average of the seasonal indices. The adjusted seasonal indices for each quarter are given in the last column of the above table.
We now discuss the merits and demerits of ratio to trend method as follows:
Merits and Demerits

• This method is undoubtedly a more rational way to measure seasonal variations than the simple average method.
• The advantage of this method over the ratio to moving average method (which we will discuss in the next section) is that, unlike the ratio to moving average method, we may obtain ratio to trend values for every period for which data are available.
• The main demerit of this method is that if there are cyclical variations in the
series, then the trend whether a straight line or a curve can never follow
the actual data as closely as a moving average does.
• Furthermore, it is more complicated than the simple average method.
Before moving to the next method of measuring seasonal component, you can
try the following Self Assessment Question.

SAQ 2
The marketing manager of a company that manufactures and distributes
farming equipment (harvesters, ploughs and tractors) recorded the number of
farming units sold quarterly for the period 2018 to 2020 which are given in the
following table:
Quarter
Q1 Q2 Q3 Q4
Year

2018 48 41 60 65

2019 58 52 68 74

2020 60 56 75 78

(i) Find the quarterly seasonal indexes for farming equipment sold using the
ratio to trend method by assuming that there is no cyclic effect.
(ii) Do seasonal forces significantly influence the sale of farming equipment?
Comment.

After understanding the simple average and ratio to trend methods and their
merits and demerits, let us move to the ratio to moving average method.

11.5 RATIO TO MOVING AVERAGE METHOD


In the previous sections, we have discussed the simple average method and
ratio to trend method for calculating the seasonal indices. We are now
discussing the most widely used method which is known as ratio to moving
average method. It is better than the previously discussed methods because
of its accuracy. This method does not make any assumption regarding the presence of particular components in the time series. It means that we can apply this method to calculate seasonal indices even if all components, namely trend, seasonal, cyclic and irregular variations, are present in the time series.
The necessary steps for obtaining seasonal indices by this method are as
follows:

Step 1: If the time series data are recorded monthly or quarterly for different years, then we first arrange the data by months or quarters chronologically across the years in such a way that all months or quarters lie in a column (see Example 3).
Step 2: After arranging the data, we calculate the moving average as discussed in Unit 10. If monthly (quarterly) data are given, a twelve (four)-period moving average will isolate the trend/cyclical movements in the time series and produce a base set of time series values.

Step 3: The actual values include all components/effects, i.e., trend, seasonal, cyclic and irregular, whereas the moving averages contain only the trend and cyclical effects. Therefore, if we divide an actual value by the corresponding moving average, the ratio will contain only the seasonal and irregular effects. This ratio is known as the seasonal ratio. We calculate the seasonal ratio for each period (for which we have the moving average) by dividing each actual time series value by its corresponding moving average value:

Seasonal ratio = (Actual value / Moving average) × 100

A seasonal ratio is an index that measures the percentage deviation of each actual value from its moving average value, and it is a measure of the seasonal impact.
Step 4: We now obtain the seasonal indices free from the irregular variations
by following the same procedure discussed in the simple average
method in the previous section. For that, we prepare another two-
way table consisting of the quarter/month-wise percentage values
calculated in Step 3, for all years.
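These four steps can be sketched in Python as follows (our own illustration using pandas; it assumes the quarterly series has an integer index 0, 1, 2, ... starting at the first quarter of the first year, and it uses the median, as in Example 3 below):

import pandas as pd

def ratio_to_ma_indices(y, period=4):
    ma = y.rolling(period).mean()                         # Step 2: 4-period MA
    centred = ma.rolling(2).mean().shift(-(period // 2))  # centred MA
    ratios = 100 * y / centred                            # Step 3: seasonal ratios
    quarter = pd.Series(y.index % period, index=y.index)
    idx = ratios.groupby(quarter).median()                # Step 4: median per quarter
    return idx * 100 / idx.mean()                         # adjust to average 100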
Let us take an example to understand this method more explicitly.
Example 3: Apply ratio to moving average method to obtain seasonal indices
for the data of quarterly demand of electricity given in Example 1.
Solution: As we have described in the procedure of the ratio to moving
average that we first arrange the data quarterly in chronological order of
different years in such a way that all quarters lie in a column and then obtain
the moving average for 4 quarters as described in Unit 10 as follows:
Calculation of quarterly moving averages and seasonal ratios

Year | Quarter | Demand of Electricity | 4-quarterly MA | Centred 4-quarterly MA | Seasonal Ratio
2019 | Summer | 70 | | |
2019 | Monsoon | 52 | | |
 | | | (70 + 52 + 22 + 31)/4 = 43.75 | |
2019 | Winter | 22 | | (43.75 + 51.50)/2 = 47.63 | (22/47.63) × 100 = 46.19
 | | | (52 + 22 + 31 + 101)/4 = 51.50 | |
2019 | Spring | 31 | | (51.50 + 54.50)/2 = 53.00 | (31/53) × 100 = 58.49
 | | | (22 + 31 + 101 + 64)/4 = 54.50 | |
2020 | Summer | 101 | | (54.50 + 55.00)/2 = 54.75 | (101/54.75) × 100 = 184.47
 | | | (31 + 101 + 64 + 24)/4 = 55.00 | |
2020 | Monsoon | 64 | | (55.00 + 58.50)/2 = 56.75 | (64/56.75) × 100 = 112.78
 | | | (101 + 64 + 24 + 45)/4 = 58.50 | |
2020 | Winter | 24 | | (58.50 + 63.25)/2 = 60.88 | (24/60.88) × 100 = 39.43
 | | | (64 + 24 + 45 + 120)/4 = 63.25 | |
2020 | Spring | 45 | | (63.25 + 66.00)/2 = 64.63 | (45/64.63) × 100 = 69.63
 | | | (24 + 45 + 120 + 75)/4 = 66.00 | |
2021 | Summer | 120 | | (66.00 + 67.50)/2 = 66.75 | (120/66.75) × 100 = 179.78
 | | | (45 + 120 + 75 + 30)/4 = 67.50 | |
2021 | Monsoon | 75 | | (67.50 + 68.50)/2 = 68.00 | (75/68) × 100 = 110.29
 | | | (120 + 75 + 30 + 49)/4 = 68.50 | |
2021 | Winter | 30 | | (68.50 + 72.25)/2 = 70.38 | (30/70.38) × 100 = 42.63
 | | | (75 + 30 + 49 + 135)/4 = 72.25 | |
2021 | Spring | 49 | | (72.25 + 74.00)/2 = 73.13 | (49/73.13) × 100 = 67.01
 | | | (30 + 49 + 135 + 82)/4 = 74.00 | |
2022 | Summer | 135 | | (74.00 + 75.00)/2 = 74.50 | (135/74.5) × 100 = 181.21
 | | | (49 + 135 + 82 + 34)/4 = 75.00 | |
2022 | Monsoon | 82 | | (75.00 + 75.25)/2 = 75.13 | (82/75.13) × 100 = 109.15
 | | | (135 + 82 + 34 + 50)/4 = 75.25 | |
2022 | Winter | 34 | | |
2022 | Spring | 50 | | |

After that, we compute the seasonal ratio for each period (for which we have the moving average) by dividing each actual time series value by its corresponding moving average value:

Seasonal ratio = (Actual value / Moving average) × 100

We have computed these in the last column of the above table.
We now prepare a two-way table consisting of the seasonal ratios quarter-wise for all years as shown in the following table, and calculate the medians (seasonal indices) and the adjusted seasonal indices as follows:

Season | 2019 | 2020 | 2021 | 2022 | Seasonal Index (Median) | Adjusted Seasonal Index
Summer | — | 184.47 | 179.78 | 181.21 | 181.21 | (181.21/100.29) × 100 = 180.70
Monsoon | — | 112.78 | 110.29 | 109.15 | 110.29 | (110.29/100.29) × 100 = 109.98
Winter | 46.19 | 39.43 | 42.63 | — | 42.63 | (42.63/100.29) × 100 = 42.51
Spring | 58.49 | 69.63 | 67.01 | — | 67.01 | (67.01/100.29) × 100 = 66.82
Total | | | | | 401.14 | 400
Average | | | | | 100.29 | 100

After understanding the ratio to moving average method, let us see the merits
and demerits of this method.
Merits and Demerits
• The ratio to moving average method is the most popular method because
it can also be applied when all four components of a time series are
present.
• It is easier than the ratio to trend method and provides satisfactory
estimates.
• It is better than the simple average method, which assumes that the trend is absent, although it is more difficult to apply than the simple average method.
• Its primary drawback is a loss of information: the moving averages (trend values) of a few seasons at the beginning and an equal number of seasons at the end are not available.
Before moving to the next section, you should try the following Self Assessment Question.
SAQ 3
Compute the quarterly seasonal indices for the sales of farming equipment data given in SAQ 2.
11.6 ESTIMATION OF TREND FROM DESEASONALISED DATA
In Unit 10, we have already discussed the estimation of trend. However, when a substantial seasonal component is present in the data, it is advisable to first remove the seasonal effect from the data. If we do not remove it, the estimated trend will also be affected by seasonal effects, which will make it unreliable. Hence, after the estimation of the seasonal indices, the seasonal component is removed from the original data; the reduced data are free from seasonal variations and are called deseasonalised data. For calculating deseasonalised values, we divide the actual value by its corresponding seasonal index, that is,
Deseasonalised value = (Actual value / Seasonal index) × 100
Once the data are free from seasonal effect, we estimate the trend by the
method of least squares or moving average method as discussed in Unit 10.
The necessary steps for estimation of the trend from deseasonalised data are
as follows:
Step 1: We first obtain the seasonal indices using the ratio to trend method or the ratio to moving average method as discussed in the previous sections.
Step 2: After that we calculate deseasonalised values by dividing the actual
value by its corresponding seasonal index and multiplying by 100,
that is,
Deseasonalised value = (Actual value / Seasonal index) × 100
Step 3: We estimate trend by the method of least squares or moving average
method as discussed in Unit 10.
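A minimal sketch of these steps with NumPy is given below; the seasonal indices are the adjusted indices obtained in Example 3, and the printed estimates approximately reproduce the values derived in Example 4.

import numpy as np

# Actual demand and the adjusted seasonal indices from Example 3.
demand = np.array([70, 52, 22, 31, 101, 64, 24, 45,
                   120, 75, 30, 49, 135, 82, 34, 50], dtype=float)
index = np.tile([180.70, 109.98, 42.51, 66.82], 4)   # Summer..Spring, 4 years

# Step 2: deseasonalised value = (actual value / seasonal index) x 100.
deseasonalised = demand / index * 100

# Step 3: least-squares linear trend on the coded axis X = 2(t - 8.5),
# chosen so that the X values sum to zero (n = 16 is even).
t = np.arange(1, 17)
X = 2 * (t - 8.5)
beta1, beta0 = np.polyfit(X, deseasonalised, 1)      # slope, then intercept
trend = beta0 + beta1 * X
print(round(beta0, 2), round(beta1, 2))              # approx. 62.79 and 1.24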
Let us take an example to understand how to estimate trend from
deseasonalised data more clearly.
Example 4: Consider the seasonal indices calculated in Example 3.
(i) Obtain the deseasonalised values.
(ii) Estimate trend values from the deseasonalised values.
(iii) Plot original and deseasonalised demand of electricity obtained in part (i)
with trend line obtained in part (ii).
Solution: We have calculated seasonal indices for the demand of electricity in
Example 3. We now calculate the deseasonalised values. For that, we prepare
a table given below and write the corresponding seasonal index in front of
each actual observation and calculate deseasonalised values using the
following formula:
Deseasonalised value = (Actual value / Seasonal index) × 100
Calculations for deseasonalised demand of electricity

Year  Season    Demand of     Seasonal   Deseasonalised Demand of Electricity
                Electricity    Index     = (Demand/Index) × 100
2019  Summer        70         180.70          38.74
      Monsoon       52         109.98          47.28
      Winter        22          42.51          51.75
      Spring        31          66.82          46.39
2020  Summer       101         180.70          55.89
      Monsoon       64         109.98          58.19
      Winter        24          42.51          56.46
      Spring        45          66.82          67.35
2021  Summer       120         180.70          66.41
      Monsoon       75         109.98          68.19
      Winter        30          42.51          70.57
      Spring        49          66.82          73.33
2022  Summer       135         180.70          74.71
      Monsoon       82         109.98          74.56
      Winter        34          42.51          79.98
      Spring        50          66.82          74.83
We now determine the trend by fitting a linear trend line by the method of least squares as discussed in Unit 10.

Here, time is in the form of seasons (Summer, Monsoon, Winter, Spring), which is non-numeric. Therefore, we use the sequential numbering system and take t = 1, 2, 3, … instead of the names of the seasons, as shown in the next table.

The linear trend line equation is given by

Yt = β0 + β1t

Since n (= 16) is even, we make the following transformation in time t:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 8.5)/(1/2) = 2(t − 8.5)
After the transformation, the normal equations for the linear trend line are:

ΣYt = nβ0 + β1ΣXt
ΣXtYt = β0ΣXt + β1ΣXt²

We calculate the values of ΣYt, ΣXt, ΣXtYt and ΣXt² in the following table:
Year  Season    t   Deseasonalised Demand   Xt = 2(t−8.5)    XtYt     Xt²   Trend Value
                    of Electricity (Yt)
2019  Summer    1        38.74                  –15        –581.10    225      44.19
      Monsoon   2        47.28                  –13        –614.64    169      46.67
      Winter    3        51.75                  –11        –569.25    121      49.15
      Spring    4        46.39                   –9        –417.51     81      51.63
2020  Summer    5        55.89                   –7        –391.23     49      54.11
      Monsoon   6        58.19                   –5        –290.95     25      56.59
      Winter    7        56.46                   –3        –169.38      9      59.07
      Spring    8        67.35                   –1         –67.35      1      61.55
2021  Summer    9        66.41                    1          66.41      1      64.03
      Monsoon  10        68.19                    3         204.57      9      66.51
      Winter   11        70.57                    5         352.85     25      68.99
      Spring   12        73.33                    7         513.31     49      71.47
2022  Summer   13        74.71                    9         672.39     81      73.95
      Monsoon  14        74.56                   11         820.16    121      76.43
      Winter   15        79.98                   13        1039.74    169      78.91
      Spring   16        74.83                   15        1122.45    225      81.39
      Total            1004.63                    0        1690.47   1360
Therefore, we find the values of β0 and β1 using the normal equations as

1004.63 = 16β0 + β1 × 0 ⇒ β0 = 1004.63/16 = 62.79
1690.47 = β0 × 0 + 1360β1 ⇒ β1 = 1690.47/1360 = 1.24

Thus, the final linear trend line is given by

Yt = 62.79 + 1.24Xt

We calculate the trend values by putting Xt = –15, –13, …, 15 in the above trend line. The same are calculated in the last column of the above table.
We now plot the original and deseasonalised demand of electricity with the trend line in Fig. 11.2.

Fig. 11.2: Actual and deseasonalised demand of electricity with trend values.
You should try a Self Assessment Question before going to the next section.
SAQ 4
A manager of a national park recorded the following data on the number of visitors (in thousands) who visited the park in each quarter of 2021 and 2022:

Season    Seasonal index   2021   2022
Summer         162           51     54
Monsoon         62           28     32
Winter          87           41     45
Spring          89           36     43
(i) Calculate the deseasonalised values for each quarter.
(ii) Compute the trend values for the deseasonalised values.
(iii) Plot the quarterly number of visitors for 2021 and 2022 along with the deseasonalised values and trend values on the same axes.

After understanding the various measures of seasonal variations, we now discuss the measurement of the cyclic effect.
11.7 ESTIMATION OF CYCLIC COMPONENT
A cycle in a time series means a business cycle, which normally exceeds a year in length. Business cycles are perhaps the most important type of fluctuation for economists and businessmen. Keep in mind that hardly any time series possesses strict cycles, because cycles are never regular in periodicity and amplitude. In most business cycles, an upward movement is followed by a downfall; the cycle touches its lowest level, then a rise starts again until it touches its peak. Therefore, a businessman has to reduce or increase production or stocks according to receding and rising demand, respectively. This is why business cycles are the most difficult type of economic fluctuation to measure.

A business cycle has four phases: prosperity (boom), recession, depression and recovery, as discussed in Unit 10 and shown in Fig. 11.3.
Fig. 11.3: Business cycle and its phases.
It is impossible to construct meaningful typical cycle indexes or curves, similar to those developed for trends and seasons, because successive cycles vary widely in timing, amplitude and pattern and are inextricably mixed with irregular factors. The following methods are used for measuring cyclical variations in a time series:
1. Residual method
2. Reference cycle analysis method
3. Direct method
4. Harmonic analysis method
We shall discuss only the first method, which is the most widely used.
11.7.1 Residual Method
Among all methods of estimating cyclical movements of a time series, the
residual method is most commonly used. This method consists of the isolation
of cyclic variation by eliminating the trend and seasonal effects from the time
series data. After removing the trend and seasonal variation, the series is left
only with the cyclic and irregular variations. Symbolically, we use the
multiplicative model, i.e.
Yt = Tt × St × Ct × It

This method proceeds with

Yt/St = (Tt × St × Ct × It)/St = Tt × Ct × It

and then

(Tt × Ct × It)/Tt = Ct × It
If irregular variations are also removed from the time series data then we will
be able to isolate the cyclic fluctuations. We follow the following steps in the
computation of cyclic variations:
Step 1: We first compute the trend values, preferably by the moving average method of suitable order as discussed in Unit 10. Generally, the order of the moving average is taken as the period of the seasonal effect.

Step 2: After finding the trend values, we find the seasonal indices, preferably by the ratio to moving average method as discussed in Sec. 11.5.

Step 3: After estimating the seasonal effects, we obtain deseasonalised values by dividing each actual value by its corresponding seasonal index, which removes the seasonal effect from the original data. Since all components are present in the time series, dividing by the seasonal component leaves the trend, cyclic and irregular components.

Step 4: We then fit the trend equation on the deseasonalised data, find the trend values, and express the deseasonalised data as a percentage of the trend values. When we divide by the trend values, the cyclic and irregular components remain.

Step 5: If the time series does not contain any random variations, then Step 4 will provide the cyclical variations. Otherwise, we filter/smooth out the random variations by computing moving averages of the values obtained in Step 4 with an appropriate period. A weighted moving average with suitable weights may also be used, if necessary, for this purpose.
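A minimal sketch of these steps in Python, assuming the deseasonalised and trend values of Example 4 have already been computed, is given below.

import numpy as np

# Deseasonalised demand and trend values from Example 4 (illustrative).
deseasonalised = np.array([38.74, 47.28, 51.75, 46.39, 55.89, 58.19,
                           56.46, 67.35, 66.41, 68.19, 70.57, 73.33,
                           74.71, 74.56, 79.98, 74.83])
trend = np.array([44.19, 46.67, 49.15, 51.63, 54.11, 56.59, 59.07, 61.55,
                  64.03, 66.51, 68.99, 71.47, 73.95, 76.43, 78.91, 81.39])

# Step 4: deseasonalised values as a percentage of trend gives C x I.
cyclic_irregular = deseasonalised / trend * 100
print(np.round(cyclic_irregular[:4], 2))   # [ 87.67 101.31 105.29  89.85]

# Step 5: a short moving average (3 periods here) smooths out the
# irregular part, leaving an estimate of the cyclic effect alone.
cyclic = np.convolve(cyclic_irregular, np.ones(3) / 3, mode="valid")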
Let us take up an example to illustrate how to calculate the cyclic component.

Example 5: Calculate the cyclic effects using the deseasonalised and trend values for the demand of electricity calculated in Example 4. Also, plot the original and deseasonalised values and the cyclic effect on the same axes.
Solution: The deseasonalised and trend values for the demand of electricity
calculated in Example 4 are given in the following table:
Year  Season    Deseasonalised Demand   Trend Value   Cyclic Effect
                of Electricity                        = (Deseasonalised/Trend) × 100
2019  Summer         38.74                 44.19           87.67
      Monsoon        47.28                 46.67          101.31
      Winter         51.75                 49.15          105.29
      Spring         46.39                 51.63           89.85
2020  Summer         55.89                 54.11          103.29
      Monsoon        58.19                 56.59          102.83
      Winter         56.46                 59.07           95.58
      Spring         67.35                 61.55          109.42
2021  Summer         66.41                 64.03          103.72
      Monsoon        68.19                 66.51          102.53
      Winter         70.57                 68.99          102.29
      Spring         73.33                 71.47          102.60
2022  Summer         74.71                 73.95          101.03
      Monsoon        74.56                 76.43           97.55
      Winter         79.98                 78.91          101.36
      Spring         74.83                 81.39           91.94
We can obtain the cyclic effect by expressing the deseasonalised values as a percentage of the trend values as follows:

(38.74/44.19) × 100 = 87.67

You can calculate the rest of the cyclic effects in the same way. We have calculated them in the last column of the above table.
We now plot the original and deseasonalised values with the cyclic effect in Fig. 11.4.
Fig. 11.4: Actual and deseasonalised values with cyclic values.
You should try a Self Assessment Question before going to the next section.
SAQ 5
Compute the cyclic component using the deseasonalised and trend values for the number of visitors (in thousands) calculated in SAQ 4.

After understanding the measures of the trend, seasonal and cyclic components, we now move to the measurement of the last component of the time series, that is, the random component.
11.8 ESTIMATION OF RANDOM COMPONENT
Random variations are also known as irregular variations. An irregular variation cannot be completely eliminated from any time series because of its nature, and it is very difficult to devise a formula for its direct computation. However, this component can be smoothed out to some extent by averaging the indices. Like the cyclical variations, this component can also be obtained as a residue after eliminating the effects of the other components. If we use a multiplicative model of the time series, then we can estimate it by dividing the original data by all the other components. If we use the additive model, then it can be estimated by subtracting all the components (trend, seasonal and cyclic) from the original value.
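Under the multiplicative model, a minimal sketch of this division in Python, reusing the combined cyclic-irregular percentages of Example 5 for illustration, is given below.

import numpy as np

# Combined cyclic-irregular percentages from Example 5 (illustrative).
ci = np.array([87.67, 101.31, 105.29, 89.85, 103.29, 102.83, 95.58, 109.42,
               103.72, 102.53, 102.29, 102.60, 101.03, 97.55, 101.36, 91.94])

# Smooth with a 3-period moving average to estimate the cyclic part alone.
cyclic = np.convolve(ci, np.ones(3) / 3, mode="valid")

# Multiplicatively, I = (C x I) / C at the points where the smoothed
# series exists; values close to 1 indicate little randomness.
irregular = ci[1:-1] / cyclic
print(np.round(irregular, 3))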
After studying the estimation of all the components of a time series, you are in a position to forecast future values of a time series based on these components. In the next section, you will learn how to predict future values.
11.9 FORECASTING
One of the main purposes of time series analysis is forecasting. In the simplest
terms, time-series forecasting is a method for predicting future values over a
period or at a precise point in the future using historical and present
data. Forecasting in time series is based on the assumption that the time
series will remain the same in the future, at least for the period for which we are forecasting, as it was in the past. This assumption is not very realistic, and we shall assume that at least for short-term forecasting the process remains the same as in the past.
If a time series plot shows that there is no seasonal component, or on a theoretical basis there is no reason for having a seasonal component, then one can estimate the trend component and, by using the trend equation, one can easily forecast as you have seen in Example 1 of Unit 10. If a time series plot shows that there is a significant seasonal effect, and on theoretical grounds also there is a valid reason for the presence of such a component, then we have to take seasonal variations into account while estimating and forecasting.
When we have quarterly (monthly) data and the period of seasonality is one year, we have to estimate four (twelve) seasonal indices, one for each quarter (month). After the data are deseasonalised, the trend equation is fitted. Then we project the trend for the period for which we have to forecast and adjust it for the seasonal effect by multiplying it by the corresponding seasonal index. This gives the final forecast value, which has been corrected for the seasonal effect. For forecasting on the basis of time series components, we follow the following steps:
Step 1: We first compute the trend values preferably by the moving average
method of suitable order as discussed in Unit 10. Generally, the order
of moving average is taken as the period of seasonal effect.
Step 2: After finding the trend values, we find the seasonal indices preferably
by ratio to moving average as discussed in Sec. 11.5.
Step 3: After estimating the seasonal effects, we obtain deseasonalised values by dividing each actual value by its corresponding seasonal index, which removes the seasonal effect from the original data.

Step 4: We then fit the trend equation for the deseasonalised data and compute the deseasonalised forecast value from the trend equation.

Step 5: To adjust for the seasonal influence in the forecast, we multiply the trend value by the corresponding seasonal index. This is known as seasonalising the trend value. If we use an additive model, then instead of multiplying we have to add the corresponding seasonal values.
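A minimal sketch of seasonalising a trend forecast in Python, assuming the fitted trend line and adjusted seasonal indices of Examples 3 and 4, is given below.

# Fitted trend line and adjusted seasonal indices from Examples 3 and 4.
beta0, beta1 = 62.79, 1.24
seasonal_index = {"Summer": 180.70, "Monsoon": 109.98,
                  "Winter": 42.51, "Spring": 66.82}

def forecast(t, season):
    """Trend value on the coded axis X = 2(t - 8.5), then seasonalised."""
    trend_value = beta0 + beta1 * 2 * (t - 8.5)
    return trend_value * seasonal_index[season] / 100

# Winter 2023 carries the numerical coding t = 19 (see Example 6 below).
print(round(forecast(19, "Winter"), 2))    # approx. 37.76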
Let us look at an example after studying the various steps for forecasting
based on time series components.
Example 6: Forecast the demand of electricity for the winter of 2023 using
estimated seasonal indices and fitted trend line equation for the demand of
electricity in Example 4.
Solution: We have fitted the trend equation in Example 4 as:

Yt = 62.79 + 1.24 × 2(t − 8.5)

We forecast the value for the winter of 2023 by putting t = 19 in the above trend line equation because the numerical coding for the winter of 2023 is 19, as discussed in Example 4.

Ŷt = 62.79 + 1.24 × 2(19 − 8.5) = 88.83
We now adjust the forecast value for the seasonal effect by multiplying it by the corresponding seasonal index as

Electricity demand for the winter of 2023 = (Ŷt × Seasonal index for winter)/100 = (88.83 × 42.51)/100 = 37.76
You should solve the following Self Assessment Question.

SAQ 6
Forecast the number of visitors for the spring season of 2023 using the estimated seasonal indices and fitted trend line equation in SAQ 4.
We end this unit by giving a summary of its contents.

11.10 SUMMARY
In this unit, we have discussed:
• The simple average method is used to estimate the seasonal effect from
the given time series data. It is based on the basic assumption that the
data do not contain any trend and cyclic components and consists of
eliminating irregular components by averaging the monthly (or quarterly
or yearly) values over the years.
• The seasonal index, also called the seasonal effect or seasonal component, measures the average magnitude of the seasonal influence on the actual values of the time series for a given period within the year; it measures how a particular season compares, on average, to the mean of the cycle.
• The ratio to trend method is used when cyclical variations are absent
from the data, i.e., the time series variable consists of trend, seasonal
and random components.
• The ratio to moving average method is better than the simple average and ratio to trend methods because of its accuracy, and it can also be applied when all the components, namely trend, seasonal, cyclic and irregular variations, are present in the time series.
• For calculating deseasonalised values, we divide the actual value by its
corresponding seasonal index, that is,
Deseasonalised value = (Actual value / Seasonal index) × 100
• It is impossible to construct meaningful typical cycle indexes of the cyclic component, similar to those developed for trends and seasons, because successive cycles vary widely in timing, amplitude and pattern and are inextricably mixed with irregular factors.
• An irregular variation cannot be eliminated from any time series because
of its nature. It is very difficult to devise a formula for their direct
computation. But this component can be removed a little bit by averaging
the indices.
• Time series forecasting is a method for predicting future values over a period or at a precise point in the future using historical and present data.

11.11 TERMINAL QUESTIONS
1. What is seasonal index? Describe its significance.
2. Obtain the deseasonalised values using the seasonal indices calculated
in Example 1.

11.12 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Since time series data of sales are recorded monthly, therefore, we
compute the average and seasonal indices for each month as shown in
the following table:
Months   Sales (2020)   Sales (2021)   Sales (2022)   Total   Average Seasonal Index   Adjusted Seasonal Index
January 36 45 48 129 43 104.67
February 33 42 45 120 40 97.37
March 30 39 42 111 37 90.07
April 42 48 48 138 46 111.98
May 45 48 45 138 46 111.98
June 45 45 51 141 47 114.41
July 48 51 48 147 49 119.28
August 39 36 39 114 38 92.50
September 33 42 30 105 35 85.20
October 30 36 30 96 32 77.90
November 36 39 33 108 36 87.63
December 45 42 45 132 44 107.11
Total 493 1200.10
Average 41.08 100

2. Since we have to find seasonal indices using the ratio to trend method,
therefore, we first convert the quarterly data into yearly by computing the
average of all quarters of each year as follows:
Year   Q1   Q2   Q3   Q4   Average
2018   48   41   60   65   (48+41+60+65)/4 = 53.50
2019   58   52   68   74   63.00
2020   60   56   75   78   67.25
2021   62   60   82   90   73.50
Therefore, we get the required yearly data as given below:

Year      2018    2019    2020    2021
Average   53.50   63.00   67.25   73.50

We now fit the trend line to the yearly data. The linear trend line equation is given by

Yt = β0 + β1t
Since n (= 4) is even, we make the following transformation in time t:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 2019.5)/(1/2) = 2(t − 2019.5)

After the transformation, the normal equations for the linear trend line will be:

ΣYt = nβ0 + β1ΣXt
ΣXtYt = β0ΣXt + β1ΣXt²

We calculate the values of ΣYt, ΣXt, ΣXtYt and ΣXt² in the following table:
Year (t)    Yt      Xt = 2(t−2019.5)    XtYt      Xt²   Trend Value
2018       53.50          –3           –160.50     9       54.68
2019       63.00          –1            –63.00     1       61.10
2020       67.25           1             67.25     1       67.52
2021       73.50           3            220.50     9       73.94
Total     257.25           0             64.25    20

Therefore, we find the values of β0 and β1 using the normal equations as

257.25 = 4β0 + β1 × 0 ⇒ β0 = 257.25/4 = 64.31
64.25 = β0 × 0 + 20β1 ⇒ β1 = 64.25/20 = 3.21

Thus, the final linear trend line is given by

Yt = 64.31 + 3.21Xt

Therefore, we find the trend values by putting Xt = –3, –1, 1, 3 in the above trend line. We have calculated the same in the last column of the above table.
We now find the quarterly trend values as follows. We have the yearly increment = 3.21 (i.e., the slope β1). Therefore, the quarterly increment will be 3.21/4 ≈ 0.8. Similarly, the half-quarterly increment will be 0.8/2 = 0.4.
We calculate the quarterly trend values in the following table:

Quarterly trend values

Quarter   2018                  2019                  2020                  2021
Q1    54.28 − 0.8 = 53.48   60.70 − 0.8 = 59.90   67.12 − 0.8 = 66.32   73.54 − 0.8 = 72.74
Q2    54.68 − 0.4 = 54.28   61.10 − 0.4 = 60.70   67.52 − 0.4 = 67.12   73.94 − 0.4 = 73.54
Q3    54.68 + 0.4 = 55.08   61.10 + 0.4 = 61.50   67.52 + 0.4 = 67.92   73.94 + 0.4 = 74.34
Q4    55.08 + 0.8 = 55.88   61.50 + 0.8 = 62.30   67.92 + 0.8 = 68.72   74.34 + 0.8 = 75.14
We now remove the trend effect by dividing each original value by the corresponding trend value and expressing the result as a percentage, as shown in the following table:

Quarter   2018                       2019                       2020                        2021
Q1    (48/53.48)×100 = 89.75    (58/59.90)×100 = 96.83     (60/66.32)×100 = 90.47     (62/72.74)×100 = 85.24
Q2    (41/54.28)×100 = 75.53    (52/60.70)×100 = 85.67     (56/67.12)×100 = 83.43     (60/73.54)×100 = 81.59
Q3    (60/55.08)×100 = 108.93   (68/61.50)×100 = 110.57    (75/67.92)×100 = 110.42    (82/74.34)×100 = 110.30
Q4    (65/55.88)×100 = 116.32   (74/62.30)×100 = 118.78    (78/68.72)×100 = 113.50    (90/75.14)×100 = 119.78
We now compute the average of each quarter over the different years (seasonal indices) and then the adjusted seasonal indices as discussed in Example 1.

Calculations for Seasonal Indices

Quarter    2018     2019     2020     2021    Seasonal Index (Average)   Adjusted Seasonal Index
Q1         89.75    96.83    90.47    85.24          90.57                      90.73
Q2         75.53    85.67    83.43    81.59          81.56                      81.71
Q3        108.93   110.57   110.42   110.30         110.06                     110.26
Q4        116.32   118.78   113.50   119.78         117.10                     117.31
Total                                               399.29                     400.01
Average                                              99.82                     100.00

Interpretation
The adjusted seasonal index for the first quarter is 90.73, which indicates that the sale of farming equipment in this quarter is, on average, 9.27 (100 − 90.73) per cent below the overall average. Therefore, we can say that the sale of farming equipment would be about 9% higher in this quarter if seasonal influences had not been present. Similarly, the second quarter is about 18 per cent below the average, whereas the third and fourth quarters are about 10 and 17 per cent above the average, respectively.
3. We first arrange the sales data of the farming equipment quarterly in chronological order of the different years and then obtain the 4-quarterly moving averages as follows:
Calculation of moving averages

Year  Quarter   Farming      4-quarterly MA   Centred 4-quarterly MA   Seasonal Ratio
                Equipment
2018  Q1           48             —                    —                    —
      Q2           41
                               53.50
      Q3           60                                54.75                109.59
                               56.00
      Q4           65                                57.38                113.28
                               58.75
2019  Q1           58                                59.75                 97.07
                               60.75
      Q2           52                                61.88                 84.03
                               63.00
      Q3           68                                63.25                107.51
                               63.50
      Q4           74                                64.00                115.63
                               64.50
2020  Q1           60                                65.38                 91.77
                               66.25
      Q2           56                                66.75                 83.90
                               67.25
      Q3           75                                67.50                111.11
                               67.75
      Q4           78                                68.25                114.29
                               68.75
2021  Q1           62                                69.63                 89.04
                               70.50
      Q2           60                                72.00                 83.33
                               73.50
      Q3           82             —                    —                    —
      Q4           90             —                    —                    —
After that, we compute the seasonal ratios for each period (for which we have the moving average) by dividing each actual time series value by its corresponding moving average value as

Seasonal ratio = (Actual value / Moving average) × 100

We have computed the seasonal ratios in the last column of the above table.
We prepare a two-way table consisting of the seasonal ratios quarter-wise for all years as shown in the following table and calculate the medians (seasonal indices) and the adjusted seasonal indices as follows:

Season    2018     2019     2020     2021    Seasonal Index (Median)   Adjusted Seasonal Index
Q1          —      97.07    91.77    89.04          91.77                     91.87
Q2          —      84.03    83.90    83.33          83.90                     83.99
Q3       109.59   107.51   111.11      —           109.59                    109.71
Q4       113.28   115.63   114.29      —           114.29                    114.42
Total                                              399.55                    400.00
Average                                             99.89                    100.00
4. To calculate the deseasonalised values, we prepare the table given below, write the corresponding seasonal index in front of each actual value, and calculate the deseasonalised values as

Deseasonalised value = (Actual value / Seasonal index) × 100
Calculations for deseasonalised number of visitors

Year  Season    Number of   Seasonal   Deseasonalised Number of Visitors
                Visitors     Index     = (Visitors/Index) × 100
2021  Summer       51         162           31.48
      Monsoon      28          62           45.16
      Winter       41          87           47.13
      Spring       36          89           40.45
2022  Summer       54         162           33.33
      Monsoon      32          62           51.61
      Winter       45          87           51.72
      Spring       43          89           48.31
We now determine the trend values by fitting a linear trend by the method of least squares as discussed in Unit 10. Here, the time variable is non-numeric as it is in the form of seasons (Summer, Monsoon, Winter, Spring). Therefore, we use the sequential numbering system and take t = 1, 2, 3, … as shown in the following table:

Year  Season    t   Deseasonalised Number   Xt = 2(t−4.5)    XtYt     Xt²   Trend Value
                    of Visitors (Yt)
2021  Summer    1        31.48                  –7         –220.36    49      37.14
      Monsoon   2        45.16                  –5         –225.80    25      39.00
      Winter    3        47.13                  –3         –141.39     9      40.86
      Spring    4        40.45                  –1          –40.45     1      42.72
2022  Summer    5        33.33                   1           33.33     1      44.58
      Monsoon   6        51.61                   3          154.83     9      46.44
      Winter    7        51.72                   5          258.60    25      48.30
      Spring    8        48.31                   7          338.17    49      50.16
      Total             349.19                   0          156.93   168
The linear trend line equation is given by

Yt = β0 + β1t

Since n (= 8) is even, we make the following transformation in time t:

Xt = (t − average of the two middle values)/(half of the interval in t values) = (t − 4.5)/(1/2) = 2(t − 4.5)

The normal equations for the linear trend line are:

ΣYt = nβ0 + β1ΣXt
ΣXtYt = β0ΣXt + β1ΣXt²

We have calculated the values of ΣYt, ΣXt, ΣXtYt and ΣXt² in the above table. Therefore, we find the values of β0 and β1 using the normal equations as
349.19 = 8β0 + β1 × 0 ⇒ β0 = 349.19/8 = 43.65
156.93 = β0 × 0 + 168β1 ⇒ β1 = 156.93/168 = 0.93

Thus, the final linear trend line is given by

Yt = 43.65 + 0.93Xt = 43.65 + 0.93 × 2(t − 4.5)

We calculate the trend values by putting Xt = –7, –5, …, 7 in the above trend line. The trend values are calculated in the last column of the above table.
We now plot the original and deseasonalised number of visitors with the
trend line in Fig. 11.5.
Fig. 11.5: Actual and deseasonalised number of visitors with trend values.
5. The deseasonalised and trend values for the number of visitors calculated in SAQ 4 are given in the following table:

Year  Season    Number of   Seasonal   Deseasonalised Number   Trend   Cyclic
                Visitors     Index     of Visitors             Value   Effect
2021  Summer       51         162           31.48              37.14    84.76
      Monsoon      28          62           45.16              39.00   115.79
      Winter       41          87           47.13              40.86   115.35
      Spring       36          89           40.45              42.72    94.69
2022  Summer       54         162           33.33              44.58    74.76
      Monsoon      32          62           51.61              46.44   111.13
      Winter       45          87           51.72              48.30   107.08
      Spring       43          89           48.31              50.16    96.31
We can obtain the cyclic effect by expressing the deseasonalised values as a percentage of the trend values as follows:

(31.48/37.14) × 100 = 84.76
The rest are calculated in the last column of the table.
6. We have the fitted trend equation:

Yt = 43.65 + 0.93 × 2(t − 4.5)

We forecast the value for the spring season of 2023 by putting t = 12 because the numerical coding for the spring of 2023 is 12, as discussed in Example 4.

Ŷt = 43.65 + 0.93 × 2(12 − 4.5) = 57.60

We now adjust the forecast value for the seasonal effect by multiplying it by the corresponding seasonal index. The number of visitors who visit the park in the spring season of 2023

= (Ŷt × Seasonal index for spring)/100 = (57.60 × 89)/100 = 51.26 ≈ 52
Terminal Questions (TQs)
1. Refer to Sec. 11.4.
2. For finding the deseasonalised values, we prepare the table given below, write the corresponding seasonal index in front of each actual observation, and calculate the deseasonalised values as

Deseasonalised value = (Actual value / Seasonal index) × 100
Year  Season    Demand of     Seasonal   Deseasonalised Demand of Electricity
                Electricity    Index     = (Demand/Index) × 100
2019  Summer        70         173.17          40.42
      Monsoon       52         110.98          46.86
      Winter        22          44.72          49.19
      Spring        31          71.14          43.58
2020  Summer       101         173.17          58.32
      Monsoon       64         110.98          57.67
      Winter        24          44.72          53.67
      Spring        45          71.14          63.26
2021  Summer       120         173.17          69.30
      Monsoon       75         110.98          67.58
      Winter        30          44.72          67.08
      Spring        49          71.14          68.88
2022  Summer       135         173.17          77.96
      Monsoon       82         110.98          73.89
      Winter        34          44.72          76.03
      Spring        50          71.14          70.28
UNIT 12
STATIONARY TIME SERIES
Structure
12.1 Introduction
    Expected Learning Outcomes
12.2 Stationary and Nonstationary Time Series
    Weak Stationarity
    Strict Stationarity
    Non-Stationarity
12.3 Detecting Stationarity
    Visualisation
    Summary Statistics
    Statistical Tests
12.4 Transforming Nonstationary Time Series into Stationary
    Differencing
    Seasonal Differencing
    Log-Transformation
    Power Transformation
    Box-Cox Transformations
12.5 Summary
12.6 Terminal Questions
12.7 Solution/Answers
12.1 INTRODUCTION
In the previous two units of this block, we have seen that a time series can be decomposed into four components, i.e., trend, seasonal, cyclic, and irregular components. We have also discussed some methods for estimating the trend, seasonal and cyclic components and then how to use them for forecasting. Due to several features of the time series, this approach is not necessarily the best one.

According to the modern approach, we try to fit a time series model so that we can forecast the observations. But one of the essential elements of time series modelling is stationarity. A stationary time series is unaffected by the instant at which it is viewed. Most time series forecasting models assume that the underlying time series is stationary. In this unit, you will learn some fundamental concepts that are necessary for a proper understanding of time series modelling. We begin with a simple introduction to stationary and nonstationary time series in Sec. 12.2. Since stationarity is one of the essential elements of a time series, we discuss various methods of detecting stationarity in Sec. 12.3. As stationarity is necessary to model a time series, if a time series shows a particular type of non-stationarity, then some simple transformation can make it stationary so that we can model it. Therefore, in Sec. 12.4, we explain various methods of transforming a nonstationary time series into a stationary one. In the next unit, you will study the concept of correlation in time series.
Expected Learning Outcomes
After studying this unit, you would be able to:
 describe a very useful class of time series, which is called stationary time series;
 define weak and strict stationary time series;
 distinguish between stationary and nonstationary time series;
 apply various methods for detecting stationarity; and
 transform a nonstationary time series into a stationary time series.
12.2 STATIONARY AND NONSTATIONARY TIME SERIES
One of the essential elements of a time series analysis is stationarity. In
simple words, a time series is stationary if it is unaffected by the instant at
which it is viewed. Most of the time series forecasting models assume that the
underlying time series is stationary. Ask yourself, “Why is stationarity necessary in time series analysis?” To answer this question, we first try to understand stationary and nonstationary time series. Therefore, let us take a moment to discuss the stationary process before moving to the time series models.
A time series is said to be stationary if the properties of one segment of the
time series are similar to the other segment of the time series. In other words,
a stationary time series is a series whose statistical properties of one section
are much like other sections. A time series whose statistical properties change
over time is called a nonstationary time series.
The concept of stationary and nonstationary time series is also illustrated with the help of the diagram in Fig. 12.1.

Fig. 12.1: Distributions of stationary and nonstationary time series.
Part (a) of the figure shows that the distribution of the time series remains the same at different segments; therefore, it is a stationary time series. Part (b) of the figure shows that the distribution of the time series changes with the time segment; therefore, it is a nonstationary time series.
We now discuss which statistical properties are used to check the same. As you know, the mean and the variance of data are frequently used as basic statistics to capture the characteristics of data. Also, to describe the broad characteristics of the data distribution, a histogram is used. Therefore, by obtaining the mean, the variance and the histogram, we expect to capture some aspects or features of the data. But it is observed that even for two quite different time series, the shape of the histogram may be almost similar. Consider the two segments of the sales data of a grocery shop in different months as shown in Fig. 12.2.
Fig. 12.2: Two segments of the sales data.

From Fig. 12.2, we can observe that the shapes of the two segments of the time series appear quite different. We now plot their histograms in Fig. 12.3.

Fig. 12.3: Histograms of two segments of the sales data.

Fig. 12.3 indicates that the histograms of both segments are quite similar, but
our time series are quite different. Therefore, it indicates that a histogram is
not sufficient to check whether the properties/distribution of one segment of
the time series are similar to the other segment. In other words, we can say
that only the univariate (marginal) distribution of the time series cannot check
the same. Therefore, we move to the joint distribution, and we know that joint
distribution describes the properties such as covariance and correlation
coefficients. As you know, to get an idea of the correlation, we plot the scatter diagram. Therefore, we plot the scatter diagrams of both segments of the time series by taking the time series values, say, Yt, on the X-axis and their lag values, say, Yt+1, on the Y-axis (you will learn about the term lag in the next sections of this unit), as shown in Fig. 12.4.

The number of intervals between two observations is the lag. For example, the lag between the current and previous observations is one. If you go back one more interval, the lag is two, and so on. In mathematical terms, the observations Yt and Yt+k are separated by k time units; k is the lag.

Fig. 12.4: Scatterplots with lag of two segments of the sales data.

The scatterplot in Fig. 12.4(a) reveals that the data are uniformly distributed
about the origin in a circle, which suggests that there is minimal connection
between Yt and Yt+1. On the other hand, the scatterplot shown in Fig. 12.4(b) is
concentrated around a line with a positive slope, showing that Yt and Yt+1
have a strong positive correlation.
These examples demonstrate that it is important to take into account not only
the distribution of Yt but also the joint distribution of a time series with its lags
for checking the stationarity of a time series.
We shall describe two types of stationarity as follows:
12.2.1 Weak Stationarity
A time series is said to be stationary if its mean, variance and covariance do not change over time. It means that if we find the mean, variance and covariance of one segment of the time series, they will be approximately the same for any other segment of the time series, i.e.,

Mean[yt] = Mean[yt+k] = μ (constant),
Var[yt] = Var[yt+k] = σ² (constant), and
Cov[yt, ys] = Cov[yt+k, ys+k] = constant

where Var and Cov represent the variance and covariance, respectively. Such a time series is called stationary; in this weaker sense it is called weakly stationary.

If the time series data follow a normal distribution, then these properties (mean, variance and covariance) completely describe the distribution, but if not, then these properties are not sufficient to describe the distribution or to check the stationarity. Therefore, a time series with these properties is called weakly stationary or covariance stationary.
A time series is said to be weakly stationary of order m if the moments up to order m are constant and do not depend on time. Any other statistics (like moments of order greater than m) can change over time.
Therefore, if m = 1 then the stationarity is called the first-order stationarity and
it implies that the first moment, that is, the mean of the time series remains
constant over time and any other statistics (like variance) can change with
time. No requirements are placed on the moments of higher order.
If m = 2, then the stationarity is called the second-order stationarity (also
called weak stationarity) and it implies that the time series has moments up to
second order, that is, mean and variance remain constant and finite over time.
Also the correlation/covariance between the observations from different points
in time is independent over time and only depends on the lag. Other statistics
are free to change over time. No requirements are placed on the moments of
higher order. Therefore, we can say that a time series (Yt) is said to be
stationary of second order if
(i) The mean and variance of the series do not change over time, that is, Mean[Yt] = Mean[Yt+k] = μ and Var[Yt] = Var[Yt+k] = σ², independent of t.

(ii) The correlation structure of the series, along with its lags, remains the same over time, that is, Cov[Yt, Yt+k] depends only on the lag k for all t.
This is one of the most commonly observed types of series in real-life practice. In this course, whenever we call a time series stationary, we in fact refer to weak stationarity.
12.2.2 Strict Stationarity
As we described, if the time series data do not follow the normal distribution, then the mean, variance and covariance do not capture the complete distribution. Therefore, it is necessary to check the joint distribution of the time series. In the strict sense, a time series is called stationary if the joint probability distribution of the observations Yt, Yt+1, …, Yt+n remains the same as that of another set of observations shifted by k (k > 0) time units, that is, Yt+k, Yt+k+1, …, Yt+k+n; k is called the lag. As a result, a time series is strictly stationary if all statistical measures (such as the mean, variance, higher moments, etc.) are constant with respect to time, i.e., do not depend on t. Thus, when n = 1, the (univariate) distribution of Yt is the same as that of Yt+k for all t and k. We can also say that the Y's are (marginally) identically distributed, and it then follows for all t and k that the mean and variance are constant over time, that is,

Mean[Yt] = Mean[Yt+k] = μ (constant) and
Var[Yt] = Var[Yt+k] = σ² (constant)

When n = 2, then according to the definition of stationarity, the bivariate distribution of Yt and Ys must be the same as that of Yt+k and Ys+k. It follows that for all t, s and k, the covariance between Yt and Ys is constant over time, that is,

Cov[Yt, Ys] = Cov[Yt+k, Ys+k] = constant

That is, the covariance between Yt and Ys depends on time only through the time difference t − s and not on the actual times t and s.
In most of the time series analysis, we do not require such strong conditions
and we shall define a weak type of stationarity which will be sufficient for most
of our purposes.
After understanding stationarity and non-stationarity, we now come to the answer to “Why is stationarity necessary for time series analysis?” There are two main reasons:

• The first is that the inferences drawn from a nonstationary process will not be reliable, as its statistical properties/parameters keep changing with time. Thus, if these parameters are continuously changing, estimating them by averaging over time will not be accurate.

• The second reason is based on Wold's representation theorem, which states that any stationary process can be represented as a linear combination of white noise (do not worry if you are not familiar with the term white noise; we will discuss it in the next unit). It means that for representing a time series in the form of a model, this assumption is necessary.

Hence, for the forecast to be reliable, it is necessary that the statistical properties of the time series remain the same in the future as they have been in the past.
12.2.3 Non-Stationarity
In simple words, a time series is said to be nonstationary if the statistical
properties of one segment of the time series are not similar to the other
segment of the time series, that is, the mean, variance, and covariance of the
time series change over time. Therefore, a time series which contains trend,
seasonality, cycles, random walks, or combinations of these is nonstationary.
Nonstationary data, as a rule, are unpredictable and cannot be modelled or
forecasted. In order to receive consistent, reliable results, the nonstationary
data needs to be transformed into stationary data.
After understanding stationary and nonstationary time series a question may
arise in your mind, how do we check whether a time series is stationary or
not? We explain it in the next section.
12.3 DETECTING STATIONARITY
As we have seen in the above section, stationarity is necessary for reliable
modelling and forecast, therefore, it is very important to ascertain whether a
given time series is stationary or not. We are describing some ways through
which you can check whether a given time series is stationary or
nonstationary.
12.3.1 Visualisation
The simplest way to check whether a given data comes from a stationary
series or not is to plot the data or some function of it. Both stationary and
nonstationary time series have some properties that can be detected very
easily from the plot of the data. Let us understand how these methods be
80 used to identify stationarity.
Looking at the Data
First, we plot the data with respect to time and try to understand the
pattern of mean and variance. If the plot shows roughly horizontal
(although some cyclic behaviour is possible), with constant variance then
this indicates that the series is stationary. The data points in a stationary
series would constantly move in the direction of the long-run mean with a
constant variance. If the data points show some trend or seasonality, then this is an indication of a nonstationary series. For more explanation, we consider some time series plotted in Fig. 12.5.
Fig. 12.5: Time series plots of various data: (a) the sale of cars in different years; (b) sales of milk on different days; (c) monthly sales of new houses in a region; (d) monthly sales of ice-cream in different months; (e) traffic density in a particular area for 80 consecutive days.
Which of these do you think are stationary? Let us discuss them one at
a time.
If you look at the first plot (car sales) (a), we can see that the mean varies
(increases) with time which results in an upward trend. Thus, this is a
nonstationary series. For a series to be classified as stationary, it should
not exhibit a trend.
Moving on to the second plot (b), we can see that there is no trend in the
series, but the variance of the series increases with time. As mentioned
previously, a stationary series must have a constant variance.
The third plot (c) shows that the sales of new houses first increase and then decrease over the span of time; therefore, it is also not a stationary series.
The fourth plot (d) shows the seasonality as well as the trend (upward) and a
regularly repeating pattern of highs and lows related to months of the year.
Therefore, it is not a stationary series.
If you look at the fifth plot (e), we can see that there is no consistent trend
(upward or downward) over the entire time span. The series appears to slowly
wander up and down. The horizontal line drawn at 60 indicates the mean of
the series and we notice that the series tends to stay on the same side of the
mean (above or below) for a while and then wanders to the other side. Also,
the variance is constant. Almost by definition, there is no seasonality. So we
can say that this time series is stationary.
This method is only used to get an idea about stationarity and is not
completely reliable.
Autocorrelation Function Plots
The time series plot gives only an idea of stationarity. The autocorrelation function plot, also known as a correlogram, is another method through which we can check the stationarity of a time series more accurately. We will describe it in the next unit.

After understanding how to visually check the stationarity of the data, let us move on to other techniques for detecting stationarity.
12.3.2 Summary Statistics
As we discussed, a stationary time series has a constant mean, variance, etc.
over time. Therefore, we can use summary statistics like mean and variance to
check whether a time series is stationary or not.
In this method, we divide the data into two or more groups and, for each group, calculate summary statistics such as the mean and variance (or other moments). After that, we analyse the summary statistics of these groups. If the means and variances of these groups are very close to each other, the series is stationary; otherwise, it is not stationary.
For example, we split the data of the traffic intensity as discussed above into two halves and calculate the mean and variance of each group as

Mean of the first group = (1/n1) Σ xi = 60
Mean of the second group = (1/n2) Σ xi = 60.5
Variance of the first group = (1/n1) Σ (xi − x̄)² = 5.69
Variance of the second group = (1/n2) Σ (xi − x̄)² = 5.26

We can observe that the mean and variance of both groups are very close to each other; therefore, the series is stationary.
In a similar way, we split the data of the monthly sales of new houses sold in the region into two halves and calculate the mean and variance of each group as

Mean of the first group = (1/n1) Σ yi = 132
Mean of the second group = (1/n2) Σ yi = 114
Variance of the first group = (1/n1) Σ (yi − ȳ)² = 168.48
Variance of the second group = (1/n2) Σ (yi − ȳ)² = 366.45

We can observe that there is a big difference between the two means and between the two variances. This implies that the series is not stationary.
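A minimal sketch of this split-and-compare check in Python, assuming the observations are held in a NumPy array in time order (the data below are randomly generated purely for illustration), is given below.

import numpy as np

def split_check(series):
    """Print the mean and variance of the two halves of a series."""
    for name, half in zip(("first", "second"), np.array_split(series, 2)):
        print(f"{name} half: mean = {half.mean():.2f}, "
              f"variance = {half.var():.2f}")

# Large differences between the halves point towards non-stationarity.
rng = np.random.default_rng(0)
split_check(rng.normal(loc=60, scale=2.4, size=80))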
If we want a more effective and practical way to check whether the series is stationary or not, then we may use different statistical tests, which are discussed in the next sub-section.
12.3.3 Statistical Tests
There are both parametric and nonparametric tests (you will learn about
parametric and nonparametric tests in MST-016: Statistical Inference and
MST-019: Classical and Bayesian Estimation) that may be used to check
whether a time series is stationary or not. For testing the stationarity, we
formulate the hypotheses as

Null Hypothesis H0: The time series is not stationary.

Alternative Hypothesis H1: The time series is stationary.

For testing these hypotheses, two of the most commonly used tests for stationarity are the Augmented Dickey-Fuller test and the Kwiatkowski-Phillips-Schmidt-Shin test. These tests are beyond the scope of this unit; we shall not discuss them here, and anyone interested may refer to Time Series Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.
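For readers who want to try such a test, the following is a hedged sketch using the adfuller function of the statsmodels library (assuming it is installed); the data are randomly generated purely for illustration.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
series = rng.normal(loc=60, scale=2.4, size=80)   # illustrative data

# adfuller returns the test statistic and p-value first; its null
# hypothesis is that the series has a unit root (is not stationary).
stat, p_value = adfuller(series)[:2]
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.3f}")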
You may now like to pause here and check your understanding of detecting stationarity by answering the following Self Assessment Question.

SAQ 1
The following Figs 12.6 (a) and (b) show the production of a company in
different years and temperatures of a process on different days, respectively.
Check visually whether the time series are stationary or not.
Fig. 12.6 (a): Time series plot of the production of a company in different years.
Fig. 12.6 (b): Time series plot of the temperature of a process on different days.
12.4 TRANSFORMING NONSTATIONARY TIME SERIES INTO STATIONARY
In the previous sections, you learnt about stationary and nonstationary time series and why stationarity is important in a time series. But if we observe a nonstationary time series, then how can we draw a reliable forecast? One way is to transform the nonstationary time series into a stationary one. We can do that by identifying and removing trends and seasonal effects. The following are the main methods for doing the same:

• Differencing
• Seasonal differencing
• Log-transformation
• Power transformation
• Box-Cox transformation

Let us learn about these one at a time.
12.4.1 Differencing
It is one of the simplest methods for removing a systematic structure from the time series. This method is used when the series exhibits a non-stable mean, that is, trend and seasonality. Therefore, it is typically performed to get rid of the varying mean. For example, a trend can be removed by subtracting the previous value from each value in the series. In this method, we compute the difference between consecutive observations in a series. Mathematically, it can be applied as:

Y′t = Yt − Yt−1

where Yt and Yt−1 are the values of the time series at time points t and t − 1, respectively, and Y′t represents the first-order difference.
This is called first-order differencing. In some situations, a nonstationary time series will not become stationary after a single difference. In that case, second-order differencing is required. Second-order differencing is the change between two consecutive data points of a first-order differenced time series. In general, differencing of order d is used to convert a nonstationary time series into a stationary time series. Inverting it is slightly tricky because the inverse operation of differencing is the cumulative sum. This is not as straightforward as a point transformation because, when we invert the differencing of our forecasts, we must add in the last known observation of the series in order to get back to the original scale.
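A minimal sketch of first- and second-order differencing, and of inverting a first difference, using pandas is given below; the values are the first few energy-consumption figures of Example 1 later in this section, used purely for illustration.

import pandas as pd

# First few energy-consumption values of Example 1 (illustrative).
y = pd.Series([32.0, 34.6, 37.0, 36.7, 37.6, 37.1, 39.1])

first_diff = y.diff()            # Y't = Yt - Yt-1; the first value is NaN
second_diff = first_diff.diff()  # differencing applied twice (order 2)

# Inverting a first difference: cumulative sum plus the first observation.
restored = first_diff.cumsum().add(y.iloc[0]).fillna(y.iloc[0])
print(first_diff.round(1).tolist())   # [nan, 2.6, 2.4, -0.3, 0.9, -0.5, 2.0]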
12.4.2 Seasonal Differencing

If a time series shows a seasonal effect, then to remove this effect, we calculate the difference between an observation and the previous observation from the same season, instead of the difference between consecutive values. For example, if we observe a monthly seasonal effect, then we subtract the observation of, say, the month of July of a year from the observation taken in July of the previous year. If a season has period m, then mathematically it can be written as:

Y′t = Yt − Yt−m

If there is a seasonal component at the level of one week, then we can remove it from an observation today by subtracting the value from last week, that is, m = 7.

The method of differencing has the main drawback of losing one observation each time the difference is calculated, and it does not remove a non-linear trend.
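A minimal sketch of seasonal differencing for quarterly data (m = 4) with pandas, reusing the electricity-demand figures of Unit 11 purely for illustration, is given below.

import pandas as pd

# Three years of quarterly electricity demand from Unit 11 (illustrative).
y = pd.Series([70, 52, 22, 31, 101, 64, 24, 45, 120, 75, 30, 49])

# Seasonal differencing with period m = 4: Y't = Yt - Yt-4 compares each
# quarter with the same quarter of the previous year.
seasonal_diff = y.diff(4)
print(seasonal_diff.dropna().tolist())   # [31.0, 12.0, 2.0, 14.0, ...]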
12.4.3 Log-Transformation

In time series analysis, the log-transformation is often used to stabilise the variance and remove a non-linear trend from a time series. A time series with an exponential trend can be made linear by taking the logarithm of the values. If we denote the original observations as Y1, Y2, …, YN, then we make the transformation as

Zt = log(Yt)

where Z1, Z2, …, ZN denote the transformed observations and the logarithm is natural (i.e., to base e). Since the stationarity of a time series helps us construct the forecast model, it is important to apply the inverse of the transformation to the data in order to get back to the original scale. Therefore, we can find the original values as

Yt = exp(Zt)
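A minimal sketch of this transformation and its inverse with NumPy, assuming strictly positive observations, is given below; the values reuse the first few sales figures of SAQ 2 purely for illustration.

import numpy as np

# Strictly positive observations (first months of SAQ 2, illustrative).
y = np.array([150.0, 184.0, 245.0, 284.0, 301.0])

z = np.log(y)        # Zt = log(Yt), natural logarithm
y_back = np.exp(z)   # Yt = exp(Zt) recovers the original scale
print(np.allclose(y, y_back))   # True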
12.4.4 Power Transformation

The log-transformation has the main drawback that it can be applied only when the observations are positive and non-zero, because the log of a negative observation is not defined and the log of zero is not finite. Other transformations may also be used, such as square roots and cube roots. They are also called power transformations because they take the following form:

Zt = Yt^p

The inverse of the transformation is

Yt = (Zt)^(1/p)
12.4.5 Box-Cox Transformations

George Box and Sir David Cox proposed a very useful family of transformations called Box-Cox transformations. The beauty of this transformation is that it includes both the logarithm and power transformations and is defined as follows:

Zt = log(Yt),           if λ = 0
Zt = (Yt^λ − 1)/λ,      otherwise

In this transformation, the logarithm is natural (i.e., to base e). It depends on the parameter λ, which varies from −5 to 5. If λ = 0, then this transformation is the log-transformation, whereas if λ ≠ 0, it is a power transformation. We consider all values of λ and select the optimal value for our data. The “optimal value” is the one which makes the variance stationary. Some common values used for λ are:

• λ = −1 is a reciprocal transform.
• λ = −0.5 is a reciprocal square root transform.
• λ = 0 is a log transform.
• λ = 0.5 is a square root transform.
• λ = 1 is no transform.

In the case of the Box-Cox method, we can find the original values as

Yt = exp(Zt),           if λ = 0
Yt = (λZt + 1)^(1/λ),   otherwise
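SciPy provides this family directly; the following is a hedged sketch (assuming the scipy library is available), where boxcox also estimates an optimal λ by maximum likelihood when none is supplied. The data are illustrative positive values.

import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

y = np.array([150.0, 184.0, 245.0, 284.0, 301.0, 325.0])   # positive data

z, lam = boxcox(y)              # transformed values and estimated lambda
y_back = inv_boxcox(z, lam)     # inverse of the Box-Cox transformation
print(round(lam, 3), np.allclose(y, y_back))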
After understanding the methods of transforming a nonstationary time series into a stationary one, we now try an example.
Example 1: The data given below presents the information on total annual
energy consumption in a particular area in different years.
Year    Energy Consumption        Year    Energy Consumption
2001 32.0 2012 45.0
2002 34.6 2013 45.7
2003 37.0 2014 46.0
2004 36.7 2015 49.6
2005 37.6 2016 51.8
2006 37.1 2017 54.0
2007 39.1 2018 53.8
2008 41.7 2019 55.7
2009 41.8 2020 58.8
2010 41.6 2021 60.6
2011 43.4
(i) Plot the energy consumption data.
(ii) Is there an indication of nonstationary behaviour in the time series?
(iii) If yes, calculate the first difference of the time series and plot it.
(iv) What impact has differencing had on the time series?
Solution:
First, we plot the time series data by taking years on the X-axis and the
energy consumption on the Y-axis. We get the time series plot as shown in
Fig. 12.7.
Fig. 12.7: Time series plot of the energy consumption data.
From the plot, we can see that the mean energy consumption varies
(increases) with time which results in an upward trend. Thus, this is a
nonstationary time series. For a series to be classified as stationary, it should
not exhibit a trend.
To remove the trend, we obtain the first-order difference, that is, we compute the difference between consecutive observations by subtracting the previous value from each value in the series. Mathematically, it can be written as
Y′t = Yt − Yt−1

Y′2 = Y2 − Y1 = 34.6 − 32.0 = 2.6
The remaining values are calculated in a similar way and are given in the following table:
Year    Energy Consumption (Yt)    Y′t        Year    Energy Consumption (Yt)    Y′t
2001 32.0 ̶ 2011 43.4 1.8
2002 34.6 2.6 2012 45.0 1.6
2003 37.0 2.4 2013 45.7 0.7
2004 36.7 –0.3 2014 46.0 0.3
2005 37.6 0.9 2015 49.6 3.6
2006 37.1 –0.5 2016 51.8 2.2
2007 39.1 2.0 2017 54.0 2.2
2008 41.7 2.6 2018 53.8 –0.2
2009 41.8 0.1 2019 55.7 1.9
2010 41.6 –0.2 2020 58.8 3.1
To study the impact of first-order differencing on the pattern of the time series, we plot the first-difference values against time (years) in Fig. 12.8.
Fig. 12.8: Time series plot after the first difference of the energy consumption data.
If you look at Fig. 12.8, you will observe that there is no consistent trend (upward or downward) over the entire time span. It means that the first-order difference removes the trend effect, and the time series becomes almost stationary. Before going to the next section, you may like to check stationarity yourself. For that, try a Self Assessment Question.
SAQ 2
Suppose a researcher wants to study the pattern of sales of a store. He records the monthly sales (in thousands) of the store for 25 months, which are given as follows:
Month Sales Month Sales
1 150 14 3352
2 184 15 4524
3 245 16 5512
4 284 17 6022
5 301 18 6254
6 325 19 7906
7 526 20 9325
8 852 21 10450
9 1253 22 9358
10 1542 23 13688
11 1425 24 18542
12 1742 25 25142
13 2015
(i) Plot the sales data.
(ii) Is there an indication of nonstationary behaviour in the time series?
(iii) If it is nonstationary, apply the log-transformation and plot the
transformed time series.
(iv) What impact has the log transformation had on the time series?
(v) Is the new time series stationary? If not, calculate the first difference and plot it.
(vi) What impact has differencing had on the time series?
We end this unit by giving a summary of its contents.
12.5 SUMMARY
In this unit, we have discussed:
• A time series is said to be stationary if the statistical properties of one segment of the series are similar to those of any other segment; otherwise, it is called a nonstationary time series.
• Various methods for detecting stationarity such as visualisation, summary
statistics and statistical tests.
• Various methods of transforming nonstationary time series to stationary
such as differencing, seasonal differencing, log-transformation, power
transformation, and Box-Cox transformation.
12.6 TERMINAL QUESTIONS
A researcher wants to study the pattern of sales of new single houses in a region. She collects data on the number of new single-house sales for 15 months in that region, which are given as follows:
Month    Sales of New Single Houses        Month    Sales of New Single Houses
1 116 9 290
2 154 10 300
3 175 11 315
4 207 12 345
5 225 13 353
6 230 14 385
7 245 15 410
8 270
For the data:
(i) Plot the time series data and comment on any features of the data that
you see.
(ii) If the plot shows nonstationarity, then transform the data using the first difference and plot the new time series.
(iii) What impact has differencing had on the time series? Is the new time
series stationary or nonstationary?
12.7 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Fig. 12.6 (a) shows that the production of the company increases with
time, therefore, there is a trend effect in the time series. Thus, this is a
nonstationary time series. For a series to be classified as stationary, it
should not exhibit a trend.
Fig. 12.6 (b) shows that there is no consistent trend (upward or
downward) over the entire period. The series appears to slowly wander
up and down. If we plot a line at 135 then it indicates the mean of the
series and we notice that the series tends to stay on the same side of
the mean (above or below) for a while and then wanders to the other
side. Also, the variance is constant. Almost by definition, there is no
seasonality. So we can say that this time series is stationary.
2. We plot the sales data by taking months on the X-axis and the sales
on the Y-axis. We get the time series plot as shown in Fig. 12.9.
Fig. 12.9: Time series plot of the sales data.
From the plot, we observe that sales increase with time, which results in an upward trend. For a series to be classified as stationary, it should not exhibit a trend. Thus, it is a nonstationary series.
For the log-transformation, we take the logarithm of the sales data (computed here to base 10; the choice of base does not affect the shape of the transformed series) as shown in the following table:
Month  Sales (Yt)  Log(Yt)        Month  Sales (Yt)  Log(Yt)
1      150     2.176              14     3352    3.525
2      184     2.265              15     4524    3.656
3      245     2.389              16     5512    3.741
4      284     2.453              17     6022    3.780
5      301     2.479              18     6254    3.796
6      325     2.512              19     7906    3.898
7      526     2.721              20     9325    3.970
8      852     2.930              21     10450   4.019
9      1253    3.098              22     9358    3.971
10     1542    3.188              23     13688   4.136
11     1425    3.154              24     18542   4.268
12     1742    3.241              25     25142   4.400
13     2015    3.304
We now plot the log-transformed data in the same way, taking the log-transformed sales instead of the original sales on the Y-axis, as shown in Fig. 12.10.
Fig. 12.10: Time series plot of the log-transformed sales data.
From the figure, we still observe an upward trend. Thus, the series remains nonstationary. Hence, we can say that the log-transformation alone has not removed the trend effect from the sales data.

We now obtain the first-order difference of the transformed data, that is,
we compute the difference between consecutive observations in the

series by subtracting the previous value from each value in the series as: 91
Y′t = log(Yt) − log(Yt−1)

Y′2 = log(Y2) − log(Y1) = 2.265 − 2.176 = 0.089
The remaining first-order differences are calculated in the following table:
Month  Sales (Yt)  Log(Yt)  Y′t        Month  Sales (Yt)  Log(Yt)  Y′t
1      150     2.176   –               14     3352    3.525   0.221
2      184     2.265   0.089           15     4524    3.656   0.130
3      245     2.389   0.124           16     5512    3.741   0.086
4      284     2.453   0.064           17     6022    3.780   0.038
5      301     2.479   0.025           18     6254    3.796   0.016
6      325     2.512   0.033           19     7906    3.898   0.102
7      526     2.721   0.209           20     9325    3.970   0.072
8      852     2.930   0.209           21     10450   4.019   0.049
9      1253    3.098   0.168           22     9358    3.971   –0.048
10     1542    3.188   0.090           23     13688   4.136   0.165
11     1425    3.154   –0.034          24     18542   4.268   0.132
12     1742    3.241   0.087           25     25142   4.400   0.132
13     2015    3.304   0.063
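These log differences are quick to verify in Python (a minimal sketch, our own illustration):

```python
import numpy as np

sales = np.array([150, 184, 245, 284, 301, 325, 526, 852, 1253, 1542,
                  1425, 1742, 2015, 3352, 4524, 5512, 6022, 6254, 7906,
                  9325, 10450, 9358, 13688, 18542, 25142], dtype=float)

log_sales = np.log10(sales)          # the Log(Yt) column above
diffs = np.diff(log_sales)           # Y't = log(Yt) - log(Y(t-1))
print(np.round(diffs[:3], 3))        # [0.089 0.124 0.064]
```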
To study the impact of the differencing on the time series, we plot the first-order differences against time (months) in Fig. 12.11.
Fig. 12.11: Time series plot after the first difference of the log-transformed sales.
If you look at Fig. 12.11, you will observe that there is no consistent
trend (upward or downward) over the entire time span. It means that the
first-order difference removes the trend effect, and the time series
becomes almost stationary.
Terminal Questions (TQs)
1. First, we have to plot the time series data. For that, we take the month
on the X-axis and the sales of new single houses on the Y-axis as
shown in Fig. 12.12.
Fig. 12.12: Time series plot of the sales of new single houses data.
From the figure, we observe that the sales of new houses increase with time; therefore, there is a trend effect in the time series. Thus, this is a nonstationary series.
We can remove the trend by transforming the data using differencing. So
we obtain the first-order difference, that is, we compute the difference
between consecutive observations in the series by subtracting the
previous value from each value in the series as:
Y′t = Yt − Yt−1

Y′2 = Y2 − Y1 = 154 − 116 = 38
The remaining first-order differences are calculated in the following table:
Month    Sales of New Single Houses (Yt)    Y′t        Month    Sales of New Single Houses (Yt)    Y′t
1 116 9 290 20
2 154 38 10 300 10
3 175 21 11 315 15
4 207 32 12 345 30
5 225 18 13 353 8
6 230 5 14 385 32
7 245 15 15 410 25
8 270 25
To study the impact of the differencing on the time series, we plot the first-order differences against time (months) in Fig. 12.13.
If you look at Fig. 12.13, you will observe that there is no consistent trend (upward or downward) over the entire time span. It means that the first-order difference removes the trend effect, and the time series becomes almost stationary.
Fig. 12.13: Time series plot after the first difference of the sales of new single houses.
UNIT 13
CORRELATION ANALYSIS IN
TIME SERIES
Structure
13.1 Introduction
     Expected Learning Outcomes
13.2 Autocovariance and Autocorrelation Functions
13.3 Estimation of Autocovariance and Autocorrelation Functions
13.4 Partial Autocorrelation Function
13.5 Correlogram
13.6 Interpretation of Correlogram
13.7 Summary
13.8 Terminal Questions
13.9 Solution/Answers
13.1 INTRODUCTION
With the help of the time series data, we try to fit a time series model so that
we can forecast the observations. But one of the essential elements of time
series modelling is stationarity. In the previous unit, you have studied what is
stationary and nonstationary time series and how to detect and transform
nonstationary time series to stationary time series. As you know, a time series
is a collection of observations with respect to time, therefore, there is a chance
that a value at the present time may relate/depend on the past value. In most
of the time series, we observe such relationships. To study the degree of
relationship between previous/past values with the current value, we have to
study the covariance and correlation between them before modelling the time
series. Therefore, in this unit, you will study correlation analysis in time series.
We begin with a simple introduction of autocovariance and autocorrelation
functions in time series in Sec. 13.2. In Sec. 13.3, we discuss how to estimate
the autocovariance and autocorrelation functions using time series data.
When we study the autocorrelation between observations in the presence of
the intermediate variables, then it does not give the true picture of the relation.
Therefore, to remove the effect of the same, we use partial autocorrelation
which is discussed in Sec. 13.4. To present the autocorrelation/ partial
autocorrelation in the form of graphs/diagrams, we use a correlogram. In
Sec. 13.5, we describe what a correlogram is and how to plot it. The interpretation of the correlogram is explained in Sec. 13.6. In the next unit, you will study different models for time series.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
Expected Learning Outcomes

After studying this unit, you would be able to:
 describe the concept of covariance and correlation in time series;
 explain autocovariance and autocorrelation functions;
 describe partial autocorrelation function; and
 plot and interpret the correlogram.
13.2 AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS
As you know, a time series is a collection of observations with respect to time.
Since time series data are continuous and chronologically arranged, therefore,
there is a chance that a value at the present time may depend on the past
value. For example, the temperature in the next hour is not a random event since, in most cases, it depends on the current temperature or on the temperatures observed during the past 24 hours. Therefore, past temperatures have a strong impact on the future temperature. In other words, we can say that there exists a strong relationship between the current temperature and the next hour's temperature. Similarly, the current sales of a company depend on past sales; if a stock is up today, it is more likely to be up tomorrow; and so on. For measuring such a linear relationship, we use covariance or correlation. The correlation between a series and its lags is called autocorrelation. Since we calculate the covariance/correlation between two values of the same time series, it is called autocovariance/autocorrelation.
The information provided by the autocovariance/ autocorrelation is used to
understand the properties of time series data, fit the appropriate models, and
forecast future events of the series.
You already have some idea about covariance and correlation. Let us now revisit the basic concepts of both.
Covariance
Covariance is defined as a measure of the relationship between two variables.
It measures how much two variables change together. If X and Y are two
variables, then covariance is defined mathematically as
Cov(X, Y) = (1/n) Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ)
It takes values from −∞ to + ∞ . The covariance tells whether both variables
vary in the same direction (positive covariance) or in the opposite direction
(negative covariance). If it is positive, it indicates a direct dependency, i.e., increasing the value of one variable will result in an increase in the value of the other variable, and vice versa. On the other hand, a negative value signifies negative covariance, which indicates that the two variables have an inverse dependency, i.e., increasing the value of one variable will result in a decrease in the value of the other variable. A zero value indicates no linear relationship between the variables.
The main problem with covariance is that it is hard to interpret due to its wide range (−∞ to +∞). For example, our data could return a value of 5, or of 500. The covariance may be large simply because the variables X and Y take large values, so a large covariance does not by itself indicate a strong relationship between the variables. A value of 500 tells us that the variables are related, but unlike the correlation coefficient, that number does not tell us how strong the relationship is; only the sign of the covariance is directly meaningful. To overcome this problem, the covariance is divided by the product of the standard deviations of the two variables to get the correlation coefficient.
Correlation
Correlation is a measure for identifying and quantifying the linear relationship between two variables. This relationship can vary from a full linear dependency between the two to complete independence. Correlation tells us how much a change in one variable is associated with a proportional change in the second variable. One of the most popular measures of the level of correlation between two variables is the Pearson correlation coefficient, which measures the intensity or degree of the linear relationship between the two variables. If X and Y are two variables, then the Pearson correlation coefficient (r) is defined mathematically as

rXY = rYX = Cov(X, Y)/√[Var(X) Var(Y)] = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / √[Σ_{i=1}^{n} (Xi − X̄)² · Σ_{i=1}^{n} (Yi − Ȳ)²]

(When two variables are related in such a way that a change in the value of one variable affects the value of the other, the variables are said to be correlated.)

The value of the coefficient of correlation can range from –1 to +1, with a negative value indicating an inverse relationship and a positive value indicating a direct relationship. It reveals not only the nature of the relationship but also its strength. If it is near ±1, the variables are highly correlated; on the other hand, if it is near zero, it indicates a poor relationship.
To understand autocovariance/autocorrelation, you have to understand what
lag is.
Lag
The number of intervals between two observations is called the lag. For example, the lag between the current and the previous observation is one; if you go back one more interval, the lag is two, and so on. In mathematical terms, if the observations Yt and Yt+k are separated by k time units, the lag is k. The lag can be in days, quarters, or years depending on the nature of the data. When k = 1, you are assessing adjacent observations.
We now come to our main topic and define autocovariance and autocorrelation formally.
Autocovariance
If we are interested in finding a linear relationship between two consecutive observations of a time series, say Yt and Yt+1, or in the relationship between observations k lags apart, i.e. Yt and Yt+k, then we use autocovariance/autocorrelation. Let us start with autocovariance; we will introduce the autocorrelation function after that.
Autocovariance can be defined as follows:

The covariance between a given time series and a lagged version of itself over successive time intervals is called autocovariance.

If Yt and Yt+k (t = 1, 2, …; k = 0, 1, 2, …) denote the values of the time series at times t and t+k, respectively, then the covariance between Yt and Yt+k is called the autocovariance at lag k. Mathematically, we can define the autocovariance function as

γk = Cov(Yt, Yt+k) = γ−k = (1/N) Σ_{t=1}^{N−k} {Yt − mean(Yt)}{Yt+k − mean(Yt+k)}

where N is the size of the time series. (You may notice that the sum is divided by N instead of N − k, as you might expect. This is done because dividing by N ensures that the estimated covariance matrix is a nonnegative definite matrix.)

The autocovariance function is denoted by γk, read as gamma, where k represents the lag. Since the mean of a stationary time series remains constant,

mean(Yt) = mean(Yt+k) = μ

Thus,

γk = γ−k = Cov(Yt, Yt+k) = (1/N) Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ)
When the lag is zero, that is, k = 0, then

γ0 = Cov(Yt, Yt) = (1/N) Σ_{t=1}^{N} (Yt − μ)²

which is simply the variance of the series.

The autocovariance is the same as the covariance. The only difference is that
the autocovariance is applied to the same time series data, i.e., you compute
the covariance of the data say temperature Y with the same data temperature
Y, but from a previous period.
Autocorrelation
In time series analysis, the autocorrelation is the fundamental technique for
calculating the degree of correlation between a series and its lags. This
method is fairly similar to the Pearson correlation coefficient but
autocorrelation uses the same time series twice: one in its original form and
the second lagged one or more time periods as in autocovariance. We now
define autocorrelation as
Autocorrelation is a measure of the degree of relationship between a
given time series and a lagged version of itself over successive time intervals.
If Yt and Yt+k denote the values of a stationary time series at times t and t+k, respectively, then the autocorrelation function/coefficient between Yt and its lagged value Yt+k is defined as

ρk = Cov(Yt, Yt+k) / √[Var(Yt) Var(Yt+k)]

Since for stationary time series variance of the series remains constant,
therefore,
Var ( Yt ) = Var ( Yt +k )

Thus, the autocorrelation function at lag k becomes as


N− k

Cov ( Yt , Yt +k ) ∑ ( Y − μ)( Y
t t +k − μ)
=ρk = t =1

Var ( Yt ) N

∑ ( Y − μ)
2
t
t =1

The autocorrelation function ( ρk ), can also be defined in terms of


autocovariance as
N− k

∑ ( Y − μ)( Y
t t +k − μ)
γk
ρk =
t =1
N
γ0
∑ ( Yt − μ)
2

t =1

When the lag is zero, that is, k = 0, then

ρ0 = γ0/γ0 = 1
The degree of correlation between a series and its lags indicates the
pattern/characteristics of the series. For example, if a time series has a
seasonality component say monthly then we will observe a strong correlation
with its seasonal lags, say, 12, 24, and 36 months.
Some important properties of time series can be studied with the help of
autocovariance and autocorrelation functions. They measure the linear
relationship between observations at different time lags apart. They provide
useful descriptive properties of the time series under study. This is also an
important tool for guessing a suitable model for the time series data.
After understanding the concept of autocovariance and autocorrelation
functions, we now study how to estimate them using sample data.
13.3 ESTIMATION OF AUTOCOVARIANCE AND AUTOCORRELATION FUNCTIONS
In the previous section, we considered the theoretical autocovariance and autocorrelation functions of a time series. In practice, we have a finite time series, and based on its observations we estimate the mean, autocovariance and autocorrelation functions. Suppose y1, y2, ..., yn represent the observations of a finite time series, assumed to be a sample from the theoretical time series Yt. We can estimate the mean μ of the time series by the sample mean

μ̂ = ȳ = (1/n) Σ_{t=1}^{n} yt
and estimate the autocovariance function as

γ̂k = ck = (1/n) Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ);  k = 1, 2, ..., n − 1

It is known as the sample autocovariance function. Notice again that the sum is divided by n instead of n − k; this ensures that the estimated covariance matrix is a nonnegative definite matrix.
Similarly, we estimate the autocorrelation function at lag k as

ρ̂k = rk = ck/c0 = Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ) / Σ_{t=1}^{n} (yt − ȳ)²;  k = 1, 2, ..., n − 1

It is known as the sample autocorrelation function.
As you know, the correlation coefficient is calculated between variables with the same number of values. Therefore, to compute the sample autocorrelation, we first make two series of the same length, as discussed in Example 1.
You will also notice that as we increase lag k, that is, if we calculate
autocorrelation between observations further and further apart, then we create
two variables, yt+k and yt and they will each have n – k observations, therefore,
as we increase k, the number of observations decrease. Therefore, after a
while, the estimates of autocovariance and autocorrelation will become more
and more unreliable. Hence, to find a reliable estimate of the autocorrelation
function, we should require at least 50 observations and the sample
autocorrelation function should be calculated up to lag k = n/4, where n is the
number of observations in the time series. For illustration purposes, we just
consider small time series data (less than 50 observations).
Let's look at an example which helps you to understand how to calculate the
sample autocovariance and autocorrelation functions.
Example 1: The meteorological department collected the following data of
temperature (in oC) in a particular area on different days:
Day Temperature Day Temperature
1 22 9 28
2 23 10 30
3 23 11 31
4 24 12 30
5 23 13 31
6 25 14 31
7 26 15 30
8 28
Calculate mean, variance and autocorrelation functions for the given data.
Solution: As you know that the autocovariance/autocorrelation function is
calculated between variables with multiple values of the same length.
100 Therefore, to compute the sample autocorrelation, first of all, we make two
series of the same length. If y t denotes the value of the temperature/series at
any particular time t then y t +1 denotes the value of the temperature/series one
time after time t. That is, y t +1 is the lag 1 value of y t as shown in the following
table:
Day  Temperature (yt)  yt+1        Day  Temperature (yt)  yt+1
1    22    --                      9    28    28
2    23    22                      10   30    28
3    23    23                      11   31    30
4    24    23                      12   30    31
5    23    24                      13   31    30
6    25    23                      14   31    31
7    26    25                      15   30    31
8    28    26
Since yt and yt+1 have different lengths (the first has 15 observations, while the second has 14), we use the data from day 2 onwards to day 15 to make the two series of equal length for k = 1. Consequently, our data are as follows:
Day  Temperature (yt)  yt+1
1    22    --
2    23    22
3    23    23
4    24    23
5    23    24
6    25    23
7    26    25
8    28    26
9    28    28
10   30    28
11   31    30
12   30    31
13   31    30
14   31    31
15   30    31
Since there are 15 observations, we prepare the data up to n/4 = 15/4 ≈ 4 lags in a similar way, as shown below:
Day  Temperature (yt)  yt+1  yt+2  yt+3  yt+4
1    22
2    23   22
3    23   23   22
4    24   23   23   22
5    23   24   23   23   22
6    25   23   24   23   23
7    26   25   23   24   23
8    28   26   25   23   24
9    28   28   26   25   23
10   30   28   28   26   25
11   31   30   28   28   26
12   30   31   30   28   28
13   31   30   31   30   28
14   31   31   30   31   30
15   30   31   31   30   31
Total 405

(For k = 2, we consider data from day 3 onwards; for k = 3, from day 4; and so on.)

Since for the calculation of the autocorrelation function, we assume that the
time series is stationary, therefore, mean and variance of the series will be
constant. Thus, we calculate the sample mean and variance of the given
original time series and make the necessary calculations for calculating the
autocovariance and autocorrelation function in the following table:

yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   yt+4 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)   (yt − ȳ)(yt+4 − ȳ)
–5 25

–4 16 –5 20

–4 16 –4 –5 16 20

–3 9 –4 –4 –5 12 12 15
-4 16 –3 –4 –4 –5 12 16 16 20

–2 4 –4 –3 –4 –4 8 6 8 8

–1 1 –2 –4 –3 –4 2 4 3 4
1 1 –1 –2 –4 –3 –1 –2 –4 –3
1 1 1 –1 –2 –4 1 –1 –2 –4
3 9 1 1 –1 –2 3 3 –3 –6
4 16 3 1 1 -1 12 4 4 –4
3 9 4 3 1 1 12 9 3 3
4 16 3 4 3 1 12 16 12 4
4 16 4 3 4 3 16 12 16 12
3 9 4 4 3 4 12 12 9 12
Total 164 –3 –7 –11 –14 137 111 77 46
Therefore,

Mean = ȳ = (1/n) Σ_{t=1}^{n} yt = 405/15 = 27

Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 164/15 = 10.933
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = 137/15 = 9.133

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = 111/15 = 7.4

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = 77/15 = 5.133

c4 = (1/n) Σ_{t=1}^{n−4} (yt − ȳ)(yt+4 − ȳ) = 46/15 = 3.067

After calculating the autocovariance function, we now calculate the sample


autocorrelation function as
c1 9.133
=r1 = = 0.835
c 0 10.933
c2 7.4
r2 =
= = 0.677
c 0 10.933
c 3 5.133
r3 =
= = 0.470
c 0 10.933
c 4 3.067
r4 =
= = 0.280
c 0 10.933
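We can verify these hand calculations with a short Python sketch (our own illustration of the estimators defined in this section):

```python
import numpy as np

def sample_autocovariance(y, k):
    """c_k = (1/n) * sum over t of (y_t - ybar)(y_{t+k} - ybar)."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    return np.sum((y[: n - k] - ybar) * (y[k:] - ybar)) / n

def sample_acf(y, max_lag):
    """r_k = c_k / c_0 for k = 1, ..., max_lag."""
    c0 = sample_autocovariance(y, 0)
    return [sample_autocovariance(y, k) / c0 for k in range(1, max_lag + 1)]

temps = [22, 23, 23, 24, 23, 25, 26, 28, 28, 30, 31, 30, 31, 31, 30]
print([round(r, 3) for r in sample_acf(temps, max_lag=4)])
# [0.835, 0.677, 0.47, 0.28]
```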
You may like to try the following Self Assessment Question before studying
further.
SAQ 1
A researcher wants to study the pattern of the unemployment rate in his
country. He collected quarterly unemployment rate data and given in the
following table:
Quarter    Unemployment rate        Quarter    Unemployment rate
1 91 7 64
2 45 8 99
3 89 9 64
4 36 10 89
5 72 11 68
6 51 12 108
Compute:
(i) mean and variance, and
(ii) Autocovariance and autocorrelation functions.
13.4 PARTIAL AUTOCORRELATION FUNCTION
In the previous section, you studied the autocorrelation function, which measures the linear dependency between a time series Yt and its own lagged values Yt+k. However, a time series tends to carry information and dependency structures in steps, and therefore the autocorrelation at lag k is also influenced by the intermediate variables Yt+1, Yt+2, …, Yt+k−1. Hence, autocorrelation is not the correct measure of the mutual correlation between Yt and Yt+k in the presence of the intermediate variables. Partial autocorrelation solves this problem by measuring the correlation between Yt and Yt+k after the influence of the intermediate variables has been removed. Thus, the partial autocorrelation in time series analysis is the correlation between Yt and Yt+k which is not accounted for by the observations at times t+1 to t+k−1. The partial autocorrelation function is similar to the autocorrelation function except that it displays only the correlation between two observations after removing the effect of the intermediate variables.
For example, if we are interested in the direct relationship between today's consumption of petrol and that of a year ago, then we ignore what happens in between. The consumption of 12 months ago has an effect on the consumption of 11 months ago, and this chain continues up to the most recent period. In partial autocorrelation estimates, these indirect effects are ignored. Therefore, we can define the partial autocorrelation function as follows:

The partial autocorrelation function calculates the degree of relationship between a time series Yt and its own lagged values Yt+k after their mutual linear dependency on the intervening variables Yt+1, Yt+2, …, Yt+k−1 has been removed.
You can understand the same using the diagram given in Fig. 13.1.

Fig. 13.1: PACF of order 2.
In other words, we can define the partial autocorrelation function between Yt and Yt+k as follows:

The conditional correlation between Yt and Yt+k, conditional on Yt+1, Yt+2, ..., Yt+k−1 (the set of observations that come between the time points t and t+k), is known as the kth order PACF.
Therefore, we can define the kth order (lag) partial autocorrelation function mathematically as

φkk = Cov(Yt, Yt+k | Yt+1, ..., Yt+k−1) / √[Var(Yt | Yt+1, ..., Yt+k−1) Var(Yt+k | Yt+1, ..., Yt+k−1)]

This is the correlation between values k time periods apart, conditional on knowledge of the values in between. (By the way, the two variances in the denominator are equal in a stationary series.) Therefore,

φkk = Cov(Yt, Yt+k | Yt+1, ..., Yt+k−1) / Var(Yt | Yt+1, ..., Yt+k−1)

The defining formula for the partial autocorrelation function is awkward to work with directly, so in practice we calculate it from the autocorrelation function. The first-order partial autocorrelation function equals the first-order autocorrelation function, that is,

φ11 = ρ1
Similarly, we can define the second-order (lag) partial autocorrelation function in terms of the autocorrelation function as

φ22 = (ρ2 − ρ1²) / (1 − ρ1²)
The general form for calculating the partial autocorrelation function of order k in terms of the ACF is given by the system of equations

Pk φk = Ψk,   i.e.,   φk = Pk⁻¹ Ψk

where

φk = (φ1k, φ2k, φ3k, ..., φkk)′,   Ψk = (ρ1, ρ2, ρ3, ..., ρk)′

and Pk is the k × k autocorrelation matrix

      | 1      ρ1     ρ2     ...  ρk−1 |
      | ρ1     1      ρ1     ...  ρk−2 |
Pk =  | ρ2     ρ1     1      ...  ρk−3 |
      | ...    ...    ...    ...  ...  |
      | ρk−1   ρk−2   ρk−3   ...  1    |

(Cramer's rule can be used for any system of n linear equations in n unknowns: if the determinant of the coefficient matrix is non-zero, the system has a unique solution, and each unknown equals the ratio of two determinants.)
In the above expression, the last coefficient, φkk, is the partial autocorrelation function of order k. Since we are interested only in this coefficient, we can solve the system for φkk using Cramer's rule. We get

φkk = |Pk*| / |Pk|
where |·| denotes the determinant and Pk* is the matrix Pk with its kth column replaced by Ψk:

       | 1      ρ1     ρ2     ...  ρ1 |
       | ρ1     1      ρ1     ...  ρ2 |
Pk* =  | ρ2     ρ1     1      ...  ρ3 |
       | ...    ...    ...    ...  ... |
       | ρk−1   ρk−2   ρk−3   ...  ρk |
Therefore, the third-order partial autocorrelation function is

        | 1    ρ1   ρ1 |
        | ρ1   1    ρ2 |
        | ρ2   ρ1   ρ3 |
φ33 = ------------------
        | 1    ρ1   ρ2 |
        | ρ1   1    ρ1 |
        | ρ2   ρ1   1  |
As you saw, the autocorrelation function helps assess the properties of a time series. In contrast, the partial autocorrelation function (PACF) is more useful for finding the order of an autoregressive (AR) or autoregressive integrated moving average (ARIMA) model. You will study these models in the next unit.
Sample Partial Autocorrelation Function
In practice, we have a finite time series, and on the basis of its observations we estimate the partial autocorrelation function. The estimate of the partial autocorrelation function is known as the sample partial autocorrelation, and its formulae are obtained by replacing the autocorrelation function (ρ) with the sample autocorrelation function (r):

φ̂11 = r1
We define the second-order (lag) sample partial autocorrelation function as

φ̂22 = (r2 − r1²) / (1 − r1²)
The general form for calculating the sample partial autocorrelation function of order k is, in matrix form,

          | 1      r1     r2     ...  r1 |
          | r1     1      r1     ...  r2 |
          | ...    ...    ...    ...  ... |
          | rk−1   rk−2   rk−3   ...  rk |
φ̂kk = ------------------------------------
          | 1      r1     r2     ...  rk−1 |
          | r1     1      r1     ...  rk−2 |
          | ...    ...    ...    ...  ...  |
          | rk−1   rk−2   rk−3   ...  1    |
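Rather than expanding these determinants by hand for large k, one can solve the Yule-Walker system numerically. The following is a minimal Python sketch (our own illustration) that recovers φ̂kk as the last coefficient of the solution of Pk φ = Ψk:

```python
import numpy as np
from scipy.linalg import toeplitz  # builds the symmetric matrix P_k

def pacf_from_acf(r):
    """phi_kk for k = 1..len(r), where r = [r1, r2, ...] are sample ACF values."""
    phis = []
    for k in range(1, len(r) + 1):
        Pk = toeplitz([1.0] + list(r[: k - 1]))    # k x k autocorrelation matrix
        psi = np.array(r[:k])
        phis.append(np.linalg.solve(Pk, psi)[-1])  # last coefficient is phi_kk
    return phis

print([round(p, 3) for p in pacf_from_acf([0.835, 0.677, 0.470])])
# [0.835, -0.067, -0.256]
```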

Let's consider an example which helps you to understand how to calculate the
sample partial autocorrelation function.

Example 2: For the temperature data given in Example 1, calculate the sample partial autocorrelation function up to order 3.

Solution: For calculating the sample partial autocorrelation, first of all, we have to compute the sample autocorrelation function. We have already calculated these values in Example 1, so we simply write them here:

r1 = 0.835, r2 = 0.677, r3 = 0.470, r4 = 0.280

Since the first-order partial autocorrelation function equals the first-order autocorrelation function,

φ̂11 = r1 = 0.835

We can calculate the second-order (lag) sample partial autocorrelation function as

φ̂22 = (r2 − r1²)/(1 − r1²) = [0.677 − (0.835)²]/[1 − (0.835)²] = −0.020/0.303 = −0.067
Similarly, we now compute the third-order sample partial autocorrelation function as

        | 1    r1   r1 |
        | r1   1    r2 |
        | r2   r1   r3 |
φ̂33 = ------------------
        | 1    r1   r2 |
        | r1   1    r1 |
        | r2   r1   1  |

The numerator determinant is

| 1      0.835  0.835 |
| 0.835  1      0.677 |
| 0.677  0.835  0.470 |

= 1 × (0.470 − 0.835 × 0.677) − 0.835 × (0.835 × 0.470 − 0.677 × 0.677) + 0.835 × (0.835 × 0.835 − 0.677 × 1)

= −0.0953 + 0.0550 + 0.0169 = −0.0234

Similarly, the denominator determinant is

| 1      0.835  0.677 |
| 0.835  1      0.835 |
| 0.677  0.835  1     |

= 1 × (1 − 0.835 × 0.835) − 0.835 × (0.835 × 1 − 0.835 × 0.677) + 0.677 × (0.835 × 0.835 − 0.677 × 1)

= 0.3028 − 0.2252 + 0.0137 = 0.0913

Therefore,

φ̂33 = −0.0234/0.0913 = −0.256
Before going to the next section, you may like to compute the sample partial autocorrelation function yourself. Let us try a Self Assessment Question.

SAQ 2
For the data given in SAQ 1, calculate the sample partial autocorrelation
function up to order 2.

13.5 CORRELOGRAM
In the previous sections, you learnt about the autocovariance, autocorrelation and partial autocorrelation functions, which are used to understand the properties of a time series, fit appropriate models, and forecast future values of the series. With the help of the autocorrelation/partial autocorrelation function, we can also diagnose whether the time series is stationary or not. However, a long list of autocorrelation values is difficult to absorb and can easily be misread. If we present the autocorrelation/partial autocorrelation function in the form of a graph, it is much easier to understand.
A plot in which we take the autocorrelation function on the vertical axis and different lags on the horizontal axis is known as a correlogram. The technique of drawing a correlogram is the same as that of a simple bar diagram; the only difference is that we draw a line instead of a bar of the same width. Each line in the correlogram represents the level of correlation between the series and its lags in chronological order. A correlogram is also known as an autocorrelation function (ACF) plot or autocorrelation plot. It gives a summary of the autocorrelation at different lags. With the help of a correlogram, we can easily examine the nature of the time series and diagnose a suitable model for the time series data.

In most time series, the correlogram shows that observations at small lags are positively correlated and that the autocorrelation decreases as the lag k increases; that is, |rk| decreases as k increases. This is because observations located far apart in time are usually not much related to each other, whereas observations close together may be strongly (positively or negatively) correlated.
Let us understand how we plot a correlogram with the help of an example.
Example 3: For the temperature data given in Example 1, plot the correlogram.
Solution: A correlogram is a plot of the autocorrelation function against the lag; therefore, first of all, we have to compute the sample autocorrelation coefficients. We have already calculated these in Example 1, so we simply write them here:

r1 = 0.835, r2 = 0.677, r3 = 0.470, r4 = 0.280

For the correlogram, we take lags on the X-axis and sample autocorrelation
function on the Y-axis. At each lag, we draw a line, which represents the level
of correlation between the series and a lagged version of itself, as shown in
the following Fig. 13.2.
Fig. 13.2: The correlogram for lag k = 1, 2, 3 and 4.
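If you wish to reproduce such a plot yourself, a minimal sketch using Matplotlib (our own illustration) is:

```python
import matplotlib.pyplot as plt

lags = [1, 2, 3, 4]
acf = [0.835, 0.677, 0.470, 0.280]    # sample ACF values from Example 1

plt.stem(lags, acf)                   # one vertical line per lag
plt.axhline(0, linewidth=0.8)         # zero reference line
plt.xlabel("Lag k")
plt.ylabel("Sample autocorrelation")
plt.title("Correlogram")
plt.show()
```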
After learning what a correlogram is and how to plot it, we now study how the correlogram helps us recognise the nature of a time series.

13.6 INTERPRETATION OF CORRELOGRAM
A correlogram is a graph used to interpret a set of autocorrelation values, in which the autocorrelation function is plotted against the lag. Visual inspection of the correlogram is often very helpful for recognising the nature of a time series, though it is not always easy. We now describe certain types of time series and the nature of their correlograms.
Random Series
A time series is completely random if it contains only independent observations. The values of the autocorrelation function for such a series are approximately zero, that is, rk ≈ 0, and the correlogram of such a random time series moves around the zero line. A typical correlogram is shown in Fig. 13.3.
Fig. 13.3: The correlogram of random series.
Alternating Series

If a time series behaves in a rough, zig-zag manner, alternating between values above and below the mean, then this is indicated by autocorrelations of alternating sign: a negative rk followed by a positive rk+1, and vice versa. The correlogram of an alternating time series is shown in Fig. 13.4.
Fig. 13.4: The correlogram of alternating series.
Stationary Time Series

A time series is said to be stationary if its mean, variance and covariance are almost constant and it is free from trend and seasonal effects. The correlogram of a stationary series has a few autocorrelations that are large in absolute value at small lags k, and these tend to zero very rapidly as the lag k increases (see Fig. 13.5). A model called an autoregressive model (which you will study in the next unit) may be appropriate for a series of this type.

Fig. 13.5: The correlogram of a stationary time series.
Nonstationary Time Series
A time series is said to be nonstationary if its mean, variance, and covariance
change over time. Therefore, a time series which contains trend, seasonality
cycles, random walks, or combinations of these is nonstationary. Such a
series is usually very smooth in nature and its autocorrelations go to zero very
slowly as the observations are dominated by trend. We should remove the
trend from such a time series before doing any further analysis. The time
series with trend and seasonal effects are as follows:
(i) Trend Time Series

If a time series has a trend effect, then the time plot will show an upward or downward pattern, as you have seen in Unit 10. For such a time series, the correlogram decreases in an almost linear fashion as the lags increase, as shown in Fig. 13.6. Hence a correlogram of this type is a clear indication of a trend.
Fig. 13.6: The correlogram of a time series having a trend effect.
(ii) Seasonal Time Series

If a time series has a dominant seasonal pattern, then the time plot will show cyclical behaviour with the periodicity of the season, and the correlogram will also exhibit an oscillating behaviour, as shown in Fig. 13.7. If there is a seasonality of, say, 12 months, then the ACF value will be large and positive at lag 12 and possibly also at lags 24, 36, …. Similarly, for quarterly seasonal data, a large ACF value will be seen at lag 4 and possibly also at lags 8, 12, …. In this case, the correlogram may not contain much more information than what is given by the time plot of the series; however, if the seasonal variation is removed from the data, the correlogram of the adjusted series may provide useful information.
Fig. 13.7: The correlogram of time series having seasonal effect.
In general, the interpretation of a correlogram is not easy and requires a lot of experience and insight.
You may like to try the following Self Assessment Question.

SAQ 3
A share market expert wants to study the pattern of a particular share price.
For that, he calculates the autocorrelation for different lags which are given as
follows:
r0 = 1, r1 = 0.482, r2 = 0.050, r3 = −0.159, r4 = 0.253, r5 = −0.024, r6 = 0.053, r7 = 0.025, r8 = −0.252, r9 = −0.177, r10 = 0.006, r11 = 0.390, r12 = −0.838, r13 = 0.407, r14 = 0.010, r15 = −0.181, r16 = −0.257, r17 = −0.057, r18 = 0.016, r19 = −0.051
For the above information:
(i) Plot the correlogram.
(ii) Interpret the correlogram. Is the seasonality apparent in the
correlogram?

We end this unit by giving a summary of its contents.
13.7 SUMMARY
In this unit, we have discussed:
• Role of correlation analysis in time series.
• The covariance between a given time series and a lagged version of itself over successive time intervals is called autocovariance. The formula for calculating the autocovariance function is

γk = γ−k = Cov(Yt, Yt+k) = (1/N) Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ)

and its estimate from sample data is

γ̂k = ck = (1/n) Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ);  k = 1, 2, ..., n − 1
• Autocorrelation is a measure of the degree of relationship between a given time series and a lagged version of itself over successive time intervals. The formula for calculating the autocorrelation function is

ρk = γk/γ0 = Σ_{t=1}^{N−k} (Yt − μ)(Yt+k − μ) / Σ_{t=1}^{N} (Yt − μ)²

and its estimate from sample data is

ρ̂k = rk = ck/c0 = Σ_{t=1}^{n−k} (yt − ȳ)(yt+k − ȳ) / Σ_{t=1}^{n} (yt − ȳ)²;  k = 1, 2, ..., n − 1
• The partial autocorrelation function calculates the degree of relationship between a time series Yt and its own lagged values Yt+k after their mutual linear dependency on the intervening variables Yt+1, Yt+2, …, Yt+k−1 has been removed. We can calculate it from the autocorrelation function as

φ11 = ρ1,   φ22 = (ρ2 − ρ1²)/(1 − ρ1²),   and, in general,   φkk = |Pk*|/|Pk|
• A plot in which we take the autocorrelation function on the vertical axis and different lags on the horizontal axis is known as a correlogram.
13.8 TERMINAL QUESTIONS

1. For the data obtained after taking the first difference of the time series data (sales of new single houses in a region) in TQ 1 of Unit 12:

(i) Calculate the ACF and plot the correlogram.

(ii) Interpret the correlogram. Is the trend apparent in the correlogram?
13.9 SOLUTION/ANSWERS
Self Assessment Questions (SAQs)
1. Since there are 12 observations, therefore, we prepare the data up to
n/4 = 12/4 = 3 lags as follows:
Quarter Unemployment (yt) yt+1 yt+2 yt+3
1 91
2 45 91
3 89 45 91
4 36 89 45 91
5 72 36 89 45
6 51 72 36 89
7 64 51 72 36
8 99 64 51 72
9 64 99 64 51
10 89 64 99 64
11 68 89 64 99
12 108 68 89 64
Total 876
For the calculation of the autocorrelation function, we assume that the time series is stationary; therefore, the mean and variance of the series remain constant. Thus, we calculate the sample mean and variance of the given original time series and make the necessary calculations for the autocovariance and autocorrelation functions in the following table:
yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)
18 324
–28 784 18 –504
16 256 –28 18 –448 288
–37 1369 16 –28 18 –592 1036 –666
–1 1 –37 16 –28 37 –16 28
–22 484 –1 –37 16 22 814 –352
–9 81 –22 –1 –37 198 9 333
26 676 –9 –22 –1 –234 –572 –26
–9 81 26 –9 –22 –234 81 198
16 256 –9 26 –9 –144 416 –144
–5 25 16 –9 26 –80 45 –130
35 1225 –5 16 –9 –175 560 –315
0 5562 –2154 2661 –1074
Therefore,

Mean = ȳ = (1/n) Σ_{t=1}^{n} yt = 876/12 = 73

Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 5562/12 = 463.5
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = −2154/12 = −179.5

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = 2661/12 = 221.75

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = −1074/12 = −89.5

After calculating the autocovariance functions, we now calculate the sample autocorrelation function as

r1 = c1/c0 = −179.5/463.5 = −0.387,  r2 = c2/c0 = 221.75/463.5 = 0.478,  r3 = c3/c0 = −89.5/463.5 = −0.193
2. In SAQ 1, we have already calculated the sample autocorrelation coefficients:

r1 = c1/c0 = −179.5/463.5 = −0.387,  r2 = c2/c0 = 221.75/463.5 = 0.478,  r3 = c3/c0 = −89.5/463.5 = −0.193
Since the first-order partial autocorrelation equals the first-order autocorrelation,

φ̂11 = r1 = −0.387

We can compute the second-order (lag) sample partial autocorrelation function as

φ̂22 = (r2 − r1²)/(1 − r1²) = [0.478 − (−0.387)²]/[1 − (−0.387)²] = 0.328/0.850 = 0.386
3. For plotting the correlogram, we take lags on the X-axis and sample
autocorrelation coefficients on the Y-axis. At each lag, we draw a line,
which represents the level of correlation between the series and its lags,
as shown in the following Fig. 13.8.

Fig. 13.8: The correlogram of time series of the share price.

Since the correlogram shows an oscillation, the time series of the share price is not stationary. The frequency of the oscillations is almost constant; therefore, the series has a seasonal effect.
Terminal Questions (TQs)

1. For calculating the sample autocorrelation of the first-order differenced series, we prepare the lagged data. Since there are 14 observations, we prepare the data up to n/4 = 14/4 ≈ 4 lags as follows:
Month    First Difference (yt)    yt+1    yt+2    yt+3    yt+4
2 38
3 21 38
4 32 21 38
5 18 32 21 38
6 5 18 32 21 38
7 15 5 18 32 21
8 25 15 5 18 32
9 20 25 15 5 18
10 10 20 25 15 5
11 15 10 20 25 15
12 30 15 10 20 25
13 8 30 15 10 20
14 32 8 30 15 10
15 25 32 8 30 15
For the calculation of the autocorrelation function, we assume that the time series is stationary; therefore, the mean and variance of the series remain constant. Thus, we calculate the sample mean and variance of the time series (the first-difference data) and make the necessary calculations for the autocovariance and autocorrelation functions in the following table:
yt − ȳ   (yt − ȳ)²   yt+1 − ȳ   yt+2 − ȳ   yt+3 − ȳ   yt+4 − ȳ   (yt − ȳ)(yt+1 − ȳ)   (yt − ȳ)(yt+2 − ȳ)   (yt − ȳ)(yt+3 − ȳ)   (yt − ȳ)(yt+4 − ȳ)
17     289
0      0     17    0
11     121   0     17                  0      187
–3     9     11    0     17            –33    0      –51
–16    256   –3    11    0     17      48     –176   0     –272
–6     36    –16   –3    11    0       96     18     –66   0
4      16    –6    –16   –3    11      –24    –64    –12   44
–1     1     4     –6    –16   –3      –4     6      16    3
–11    121   –1    4     –6    –16     11     –44    66    176
–6     36    –11   –1    4     –6      66     6      –24   36
9      81    –6    –11   –1    4       –54    –99    –9    36
–13    169   9     –6    –11   –1      –117   78     143   13
11     121   –13   9     –6    –11     –143   99     –66   –121
4      16    11    –13   9     –6      44     –52    36    –24
Total  1272                            –110   –41    33    –109
Mean = ȳ = (1/n) Σ_{t=1}^{n} yt = 294/14 = 21

Variance = c0 = (1/n) Σ_{t=1}^{n} (yt − ȳ)² = 1272/14 = 90.86
Autocovariance functions:

c1 = (1/n) Σ_{t=1}^{n−1} (yt − ȳ)(yt+1 − ȳ) = −110/14 = −7.86

c2 = (1/n) Σ_{t=1}^{n−2} (yt − ȳ)(yt+2 − ȳ) = −41/14 = −2.93

c3 = (1/n) Σ_{t=1}^{n−3} (yt − ȳ)(yt+3 − ȳ) = 33/14 = 2.36

c4 = (1/n) Σ_{t=1}^{n−4} (yt − ȳ)(yt+4 − ȳ) = −109/14 = −7.79
After calculating the autocovariances, we now calculate the sample autocorrelation function as

r1 = c1/c0 = −7.86/90.86 = −0.086,  r2 = c2/c0 = −2.93/90.86 = −0.032,
r3 = c3/c0 = 2.36/90.86 = 0.026,  r4 = c4/c0 = −7.79/90.86 = −0.086
For the correlogram, we take lags on the X-axis and the sample autocorrelation function on the Y-axis. At each lag, we draw a line, which represents the level of correlation between the series and its lags, as shown in Fig. 13.9.
Fig. 13.9: The correlogram of time series data of sales of new single houses.
Since the autocorrelation function is approximately zero at all lags and moves around the zero line, the time series is stationary and no trend appears in the correlogram.
UNIT 14
TIME SERIES MODELLING
TECHNIQUES
Structure
14.1 Introduction
     Expected Learning Outcomes
14.2 Time Series Models
14.3 Autoregressive Models
     First-order Autoregressive Models
     Second-order Autoregressive Models
     pth-order Autoregressive Models
14.4 Moving Average Models
     First-order Moving Average Models
     Second-order Moving Average Models
     qth-order Moving Average Models
14.5 Autoregressive Moving Average Models
     Various Forms of ARMA Models
14.6 Autoregressive Integrated Moving Averages Models
     Various Forms of ARIMA Models
14.7 Time Series Model Selection
14.8 Summary
14.9 Terminal Questions
14.10 Solutions/Answers
14.1 INTRODUCTION
In the previous units (Units 12 and 13), you have learnt stationary and
nonstationary time series, the concept of autocorrelation and partial
autocorrelation in time series. When the data is autocorrelated, then most of
the standard modelling methods may become misleading or sometimes even
useless because they are based on the assumption of independent
observations. Therefore, we need to consider alternative methods that take
into account the autocorrelation in the data. Such types of models are known
as time series models. In this unit, you will study time series models, such as
autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models.
In this unit, you will learn some time series models. We begin with a simple
introduction of the necessity of time series models instead of ordinary
regression models in Sec. 14.2. In Secs. 14.3 and 14.4, we discuss the
autoregressive and moving average models with their types and properties,
respectively. In Sec. 14.5, the autoregressive moving average models are explained. The AR, MA and ARMA models are used for stationary time series. If a time series is nonstationary, we use the autoregressive integrated moving average (ARIMA) model, which you will learn about in Sec. 14.6. When you deal with real time series data, the first question that may arise in your mind is how to know which time series model is most suitable for the data at hand. For that, we discuss time series model selection in Sec. 14.7.

* Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
Expected Learning Outcomes

After studying this unit, you would be able to:
 explain the necessity of time series models;
 describe and use autoregressive models;
 explain and use moving average models;
 explore autoregressive moving average models;
 describe and use autoregressive integrated moving average models; and
 select a suitable time series model for real-life time series data.
14.2 TIME SERIES MODELS
In the previous classes, you have learnt about linear regression, one of the most common methods for identifying and quantifying the relationship between a dependent variable and a single (simple linear regression) or multiple (multivariate linear regression) independent variables. The dependent variable (Y) is also called the regressand, explained or forecast variable, whereas the independent variable (X) is also called the predictor, regressor or explanatory variable.
In the simplest case, the regression model allows for a linear relationship
between the forecast variable Y and a single predictor variable X. In the
case of a single independent variable, the regression equation is as follows:
Y =β0 + β1X + ε

The coefficients β0 and β1 denote the intercept and the slope of the
regression line, respectively. The intercept β0 represents the predicted value
of Y when X = 0 and the slope β1 represents the average predicted change
in Y resulting from a one-unit change in X. Each observation Y consists of the systematic or explained part of the model, β0 + β1X, and a random error ε. The term "error" in this context refers to a departure from the underlying straight-line model rather than a mistake, and it includes everything affecting Y other than the predictor variable X. The error term has the following assumptions:
• The mean of the error term should be zero, i.e., E[ε] = 0.

• The error term should have constant variance, i.e., Var[ε] = σ² = constant.

• The error term should not be correlated with the predictor variable, i.e., Cov[ε, X] = E[εX] = 0.

• The error term should not be correlated with its previous values, i.e., Cov[εi, εi−1] = E[εi εi−1] = 0.
• It should be normally distributed with mean 0 and constant variance.
We may use the regression model in time series, but it has some issues which
are as follows:
• The primary issue is that linear regression models assume that the data
points of a variable are independent of each other, while time series
models work with data points that have a temporal relationship with one
another (autocorrelated). For example, there exists a strong relationship
between today’s temperature and the next day’s temperature.
• A second issue is the error distribution. In the regression model, we
assume that the errors are independent and identically distributed (“IID”)
and the error in the model at one time is uncorrelated with the error at
other times, which is usually not fulfilled for time series data, which are
generally autocorrelated.
Moreover, time series data have their characteristic trend, seasonal, cyclical and irregular components, and therefore they might not fit well with traditional regression models. In such cases, we use autoregressive (AR) models, moving average (MA) models, autoregressive moving average (ARMA) models, autoregressive integrated moving average (ARIMA) models, etc. Of these, the AR, MA and ARMA models are used when the time series is stationary, whereas the ARIMA model is used when the time series is nonstationary. We will discuss them one at a time.
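To see the second issue concretely, the following minimal Python sketch (our own illustration) fits an ordinary least-squares line to a trending series whose errors are autocorrelated, and then computes the lag-1 autocorrelation of the residuals; a value far from zero signals that the independence assumption behind ordinary regression is violated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n, dtype=float)

# Linear trend plus AR(1) noise, i.e. deliberately autocorrelated errors.
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.8 * noise[i - 1] + rng.normal()
y = 1.0 + 0.5 * t + noise

# Ordinary least-squares fit of y on t.
slope, intercept = np.polyfit(t, y, 1)
resid = y - (intercept + slope * t)

# Lag-1 autocorrelation of the residuals: close to 0.8 here, not 0.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(round(r1, 2))
```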
14.3 AUTOREGRESSIVE MODELS
In time series data, it is generally observed that the value of a variable in the current time period is similar to its value in the previous period, or even the period before that, and so on. For example, if the temperature is quite high today, it is reasonable to assume that the temperature will also be high tomorrow; the past temperature thus carries information about the future temperature. In other words, we can say that there exists a strong relationship between today's temperature and the next day's temperature. Therefore, to forecast the temperature, we can take previous days' temperatures as predictors when fitting a regression model to such time series data. Consider one more example: suppose we want to predict today's behaviour of a child. Generally, we see that the behaviour of a child matches that of his/her father/mother or grandfather/grandmother. So if today's behaviour of a child depends on the behaviour of his/her father/mother, then we can take the father/mother as a predictor at lag one (the previous generation), or the grandfather/grandmother at lag two (the generation previous to the previous one). So we can say that there exists autocorrelation, and we can predict today's behaviour of a child using the behaviour of his/her parents or grandparents. The regression models in which the predictors are the past values of the series instead of independent variables are called autoregressive (AR) models. Because such a regression model uses data from the same input variable at previous time steps, it is referred to as an autoregression (regression of self). We can
define autoregressive models as
Autoregressive (AR) models are the models in which the value of a variable in the current period is regressed against its own lagged values, that is, its values in previous periods.
The AR models are very useful in situations where the next forecasted value is a function of the values in previous time periods; for example, if it is rainy today, the data suggest that it is more likely to rain tomorrow than if it is clear today. Autoregressive models are used for stationary time series. On the basis of the correlation with previous values, the autoregressive model has different orders, which are discussed in the next sub-sections.
14.3.1 First-order Autoregressive Models
An autoregressive model in which the value of a variable in the current period is regressed against only its immediately previous value is called a first-order autoregression. The number of lags used as regressors is called the order of the autoregression. So, the preceding model is a first-order autoregression, and it is written as AR(1), where 1 represents the order.
Since in a time series we have measured values of a variable over time, we use "t" as a subscript in the variables of time series models. If y_t and y_{t−1} are the values of a variable at time t and t − 1, respectively, then we can express the first-order autoregressive model as follows:
y_t = δ + φ₁y_{t−1} + ε_t
The model expresses the present value as a linear combination of a constant term δ (read as delta), the previous value of the variable y_{t−1} and the error term ε_t (read as epsilon). The magnitude of the impact of the previous value on the present value is quantified using a coefficient denoted by φ₁ (read as phi). The error term is called white noise, and it is normally distributed with mean zero and constant variance σ².

We now study the properties of the model.

Mean and Variance
(Recall that if X and Y are independent random variables and a, b and c are constants, then E[aX + bY + c] = aE[X] + bE[Y] + c and Var[aX + bY + c] = a²Var[X] + b²Var[Y].)
We can find the mean of an AR(1) model as follows:
E[y_t] = μ = E[δ] + φ₁E[y_{t−1}] + E[ε_t]
Since the time series is stationary, E[y_t] = E[y_{t−1}] = μ, and since ε_t ~ N(0, σ²), E[ε_t] = 0. Therefore,
μ = δ + φ₁μ + 0
Thus,
Mean = μ = δ/(1 − φ₁)
Similarly, we can find the variance of an AR(1) model as follows:
Var[y_t] = Var[δ] + φ₁²Var[y_{t−1}] + Var[ε_t]
Since the time series is stationary, Var[y_t] = Var[y_{t−1}]. Also, δ is a constant, therefore Var[δ] = 0, and since ε_t ~ N(0, σ²), Var[ε_t] = σ². Therefore,
Var[y_t] = φ₁²Var[y_t] + σ²
Var[y_t] = σ²/(1 − φ₁²) ≥ 0 when φ₁² < 1

Autocovariance and Autocorrelation Functions
We now find the autocovariance of a stationary AR(1) model. For k ≥ 1,
γ_k = Cov[y_t, y_{t−k}] = Cov[δ + φ₁y_{t−1} + ε_t, y_{t−k}]
    = Cov[δ, y_{t−k}] + φ₁Cov[y_{t−1}, y_{t−k}] + Cov[ε_t, y_{t−k}]
Since the covariance between a constant and a variable is zero, Cov[δ, y_{t−k}] = 0. Also, Cov[y_{t−1}, y_{t−k}] = γ_{k−1}, and since the white noise ε_t is uncorrelated with past values of the series, Cov[ε_t, y_{t−k}] = 0. Therefore,
γ_k = φ₁γ_{k−1}
Hence,
γ₁ = φ₁γ₀ = φ₁Var(y_t) = φ₁σ²/(1 − φ₁²)      (since Var[y_t] = σ²/(1 − φ₁²))
γ₂ = φ₁γ₁ = φ₁²σ²/(1 − φ₁²)
γ_k = φ₁γ_{k−1} = φ₁^k σ²/(1 − φ₁²)
The autocorrelation function (ACF) for an AR(1) model is as follows:
ρ_k = γ_k/γ₀ = φ₁^k for k = 0, 1, 2, ...
Since φ₁ lies between −1 and 1, for a positive value of φ₁ the ACF (ρ_k = φ₁^k) exponentially decreases to 0 as the lag k increases. For a negative value of φ₁, the ACF also exponentially decays to 0 as the lag increases, but the algebraic signs of the autocorrelations alternate between positive and negative.
Conditions for Stationarity
An autoregressive model can be used on time series data if and only if the time series is stationary. Therefore, a constraint on the value of the parameter is required for stationarity, which is as follows:
|φ₁| < 1 ⇒ −1 < φ₁ < 1
Since stationarity is necessary for applying autoregressive models, before applying them to time series data you will have to check whether the time series is stationary or not. If it is nonstationary, then you will have to apply some transformation (such as differencing, log transformation, etc., as discussed in Unit 13) to convert the series into a stationary one, or use the ARIMA model, which we will discuss later in this unit. A small simulation check of the AR(1) properties derived above is sketched below.
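This minimal Python sketch (the values of δ, φ₁ and σ are illustrative assumptions; numpy is assumed to be available) simulates a long AR(1) series and compares its sample mean, variance and autocorrelations with the theoretical expressions derived above.

import numpy as np

# A minimal sketch: simulate y_t = delta + phi1*y_{t-1} + eps_t and compare
# sample moments with the theory derived above (illustrative values only).
rng = np.random.default_rng(0)
delta, phi1, sigma, n = 5.0, 0.7, 1.0, 100_000

y = np.empty(n)
y[0] = delta / (1 - phi1)                    # start at the theoretical mean
for t in range(1, n):
    y[t] = delta + phi1 * y[t - 1] + rng.normal(0.0, sigma)

print("mean:", y.mean(), " theory:", delta / (1 - phi1))
print("var :", y.var(), " theory:", sigma**2 / (1 - phi1**2))
for k in (1, 2, 3):
    rk = np.corrcoef(y[:-k], y[k:])[0, 1]    # sample rho_k
    print(f"rho_{k}: {rk:.3f}  theory: {phi1**k:.3f}")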
14.3.2 Second-order Autoregressive Models
An autoregressive model in which the value of a variable in the current period is regressed against its two previous values is called a second-order autoregressive model. It is written as AR(2), where 2 represents the order.
If y_t, y_{t−1} and y_{t−2} are the values of a variable at time t, t − 1 and t − 2, respectively, then the second-order autoregressive model is expressed as follows:
y_t = δ + φ₁y_{t−1} + φ₂y_{t−2} + ε_t

We now study the properties of the AR(2) model as we have studied for AR(1).
Mean and Variance
We can find the mean of an AR(2) model as we did for AR(1). Here, we just write it as follows:
Mean = δ/(1 − φ₁ − φ₂)
The variance follows from the autocovariance relation given below: dividing γ₀ = φ₁γ₁ + φ₂γ₂ + σ² through by γ₀ gives
Var[y_t] = γ₀ = σ²/(1 − φ₁ρ₁ − φ₂ρ₂) ≥ 0
Autocovariance and Autocorrelation Functions
We can find the autocovariance of an AR(2) model as we did for the AR(1) model. Here, we just write it as follows:
γ_k = Cov[y_t, y_{t−k}] = φ₁γ_{k−1} + φ₂γ_{k−2} + { σ² if k = 0; 0 if k > 0 }
Therefore,
γ₀ = φ₁γ₁ + φ₂γ₂ + σ²
and
γ_k = φ₁γ_{k−1} + φ₂γ_{k−2};  k = 1, 2, ...
The autocorrelation function for an AR(2) model can be obtained by dividing γ_k by γ₀ as follows:
ρ_k = γ_k/γ₀ = φ₁ρ_{k−1} + φ₂ρ_{k−2} for k = 1, 2, ...
Therefore,
ρ₁ = φ₁ρ₀ + φ₂ρ_{−1}
ρ₁ − φ₂ρ₁ = φ₁      (since ρ₀ = 1 and ρ_{−1} = ρ₁)
ρ₁ = φ₁/(1 − φ₂)
ρ₂ = φ₁ρ₁ + φ₂ρ₀ = φ₁ρ₁ + φ₂
ρ₃ = φ₁ρ₂ + φ₂ρ₁

Conditions for Stationarity
Since an autoregressive model can be used on time series data if and only if the time series is stationary, some constraints on the values of the parameters are required for stationarity, which are as follows (a quick checker is sketched below):
• |φ₂| < 1 ⇒ −1 < φ₂ < 1
• φ₁ + φ₂ < 1
• φ₂ − φ₁ < 1
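These three constraints are easy to check mechanically; the following small Python helper is a sketch (the function name and example values are our own, not from the unit):

def ar2_is_stationary(phi1: float, phi2: float) -> bool:
    """Check the three AR(2) stationarity constraints listed above."""
    return abs(phi2) < 1 and (phi1 + phi2) < 1 and (phi2 - phi1) < 1

# For example, the SAQ 1 model with phi1 = 0.8 and phi2 = -0.5:
print(ar2_is_stationary(0.8, -0.5))   # True, so the series is stationary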

After understanding the first and second-order autoregressive models, you may be interested to know the general form of the autoregressive model. Let's discuss it now.
14.3.3 pth-order Autoregressive Models
More generally, a pth-order autoregression, written as AR(p), is a multiple
linear regression in which the value of the series at any time t is a linear
function of the values at times t−1,t−2,…,t−p.
If y_t, y_{t−1}, ..., y_{t−p} are the values of a variable at times t, t − 1, t − 2, ..., t − p, respectively, then the pth-order autoregressive model is expressed as follows:
y_t = δ + φ₁y_{t−1} + φ₂y_{t−2} + ... + φ_p y_{t−p} + ε_t

We now study the properties of the AR(p) model as we have studied for AR(1)
and AR(2) models.
Mean and Variance
We can find the mean of an AR(p) model as we did for AR(1). Here, we just write it as follows:
Mean = δ/(1 − φ₁ − φ₂ − ... − φ_p)
and, dividing the autocovariance relation γ₀ = φ₁γ₁ + ... + φ_pγ_p + σ² (given below) through by γ₀,
Var[y_t] = γ₀ = σ²/(1 − φ₁ρ₁ − φ₂ρ₂ − ... − φ_pρ_p) ≥ 0
Autocovariance and Autocorrelation Functions
We can find the autocovariance of an AR(p) model as we did for AR(1). Here, we just write it as follows:
γ_k = Cov[y_t, y_{t−k}] = φ₁γ_{k−1} + φ₂γ_{k−2} + ... + φ_pγ_{k−p} + { σ² if k = 0; 0 if k > 0 }
Therefore,
γ₀ = φ₁γ₁ + φ₂γ₂ + ... + φ_pγ_p + σ²
and
γ_k = φ₁γ_{k−1} + φ₂γ_{k−2} + ... + φ_pγ_{k−p};  k = 1, 2, ...
The autocorrelation function for an AR(p) model can be obtained by dividing γ_k by γ₀ as follows:
ρ_k = φ₁ρ_{k−1} + φ₂ρ_{k−2} + ... + φ_pρ_{k−p} for k = 1, 2, ...
Conditions for Stationarity


When p ≥ 3, the restrictions for stationarity are much more complicated; therefore, we do not discuss them here.
After understanding the autoregressive models, let's look at an example which helps you to understand how to check stationarity and calculate the mean, variance, autocovariance and autocorrelation functions of an autoregressive model.
Example 1: Consider a time series model
y_t = 10 + 0.2y_{t−1} + ε_t
where ε_t ~ N(0, 1).
(i) Is this a stationary time series?
(ii) What are the mean and variance of the time series?
(iii) Calculate the autocovariance and autocorrelation functions.
(iv) If the current observation is y₁₀₀ = 7.5, would you expect the next observation to be above or below the mean?
Solution:
For checking the stationarity of the time series, first of all we find the parameters of the time series model. Since the variable in the current period is regressed against its previous value, it is a first-order autoregressive model. We now compare it with its standard form, that is, y_t = δ + φ₁y_{t−1} + ε_t, and we obtain
δ = 10 and φ₁ = 0.2
The condition for stationarity of the first-order autoregressive model is −1 < φ₁ < 1. Since −1 < φ₁ = 0.2 < 1, the series is stationary.

We now calculate the mean and variance of the series as
Mean = δ/(1 − φ₁) = 10/(1 − 0.2) = 12.5
Var[y_t] = σ²/(1 − φ₁²) = 1/(1 − 0.04) = 1/0.96 = 1.04
We now calculate the autocovariance and autocorrelation functions as
γ₁ = φ₁γ₀ = φ₁σ²/(1 − φ₁²) = 0.2 × 1.04 = 0.21
γ₂ = φ₁γ₁ = 0.2 × 0.21 = 0.04
γ₃ = φ₁γ₂ = 0.2 × 0.04 = 0.008
and hence ρ_k = γ_k/γ₀ = (0.2)^k, i.e., ρ₁ = 0.2, ρ₂ = 0.04 and ρ₃ = 0.008.
We can forecast the next value after y₁₀₀ using the prediction model as
ŷ_t = 10 + 0.2y_{t−1}
ŷ₁₀₁ = 10 + 0.2y₁₀₀ = 10 + 0.2 × 7.5 = 10 + 1.5 = 11.5
Therefore, if the current observation is y₁₀₀ = 7.5, then the forecast of the next observation, ŷ₁₀₁ = 11.5, will be below the mean (12.5).
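As a quick numerical cross-check of this example, a few lines of plain Python (nothing beyond the formulas above; the variable names are our own):

# Quick check of Example 1: mean, variance and one-step forecast.
delta, phi1, sigma2, y100 = 10.0, 0.2, 1.0, 7.5
mean = delta / (1 - phi1)             # 12.5
var = sigma2 / (1 - phi1**2)          # about 1.04
y_hat_101 = delta + phi1 * y100       # 10 + 1.5 = 11.5
print(mean, round(var, 2), y_hat_101, y_hat_101 < mean)   # forecast is below the mean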
You may like to try the following Self Assessment Question before studying
further.

SAQ 1
Consider the time series model
y_t = 5 + 0.8y_{t−1} − 0.5y_{t−2} + ε_t
where ε_t ~ N(0, 2)

(i) Is this a stationary time series?


(ii) What are the mean and variance of the time series?
(iii) Calculate the autocorrelation function.
(iv) Plot the correlogram.
(v) Suppose the observations for time periods 50 and 51 are 38 and 40; forecast the next observation.

14.4 MOVING AVERAGE MODELS

In the previous section, you learned about autoregressive models, in which the value of a variable in the current period is regressed against its own lagged values, that is, the values in previous periods. In some cases, the forecasting model is unable to capture all the patterns of the series, and therefore some information is left over in the model residuals, which are called forecasting errors. For example, in our example of today's behaviour of a child, it may be possible that the behaviour of the child does not depend on the behaviour of his/her family but instead depends on the behaviour of his/her friends, so whatever his/her friends did in the past, the child does today. It means that the behaviour may also depend on other unknown factors. In other words, we can say that we are not aware of those factors and they are not considered in the model. As you know, the factors which are not considered in the model come under the residual. It means that the current value of a variable may also depend on the past residuals. The goal of the moving average models is to capture patterns in the residuals, if they exist, by modelling the relationship between the current value of a variable, the error term ε_t, and the past error (residual) terms of the model. We can define moving average models as

Moving average (MA) models are the models in which the value of a variable in the current period is regressed against the residuals of the previous periods.

A moving average model states that the current value is linearly dependent on the past error terms.
A moving average model is similar to an autoregressive model, except that instead of being a linear combination of past time series values, it is a linear combination of the past error/residual/white noise terms.

Note: The MA model is defined in many textbooks and in computer software with a minus sign before the θ terms. This switches the algebraic signs of the estimated coefficient values and of the (unsquared) θ terms in the formulas for the ACF and variances, but it has no effect on the model's overall theoretical features. In order to construct the estimated model accurately, you must examine your software to find out whether negative or positive signs are used. The R software uses positive signs in its underlying model, as we take here.

Concept of Invertibility
An autoregressive model can be used on time series data if and only if the time series is stationary. Moving average models, in contrast, are always stationary, but some restrictions are imposed on the parameters of the moving average model, just as in the case of the autoregressive model.
By the definition of the moving average model, the value of a variable in the current period is regressed against the residuals of the previous period. If y_t is the value of a variable at time t and ε_{t−1} is the residual at time t − 1, then we can express the moving average model of first order as
y_t = μ + ε_t + θ₁ε_{t−1}
We can write the above expression as
ε_t = (y_t − μ) − θ₁ε_{t−1}
Substituting ε_{t−1} = (y_{t−1} − μ) − θ₁ε_{t−2}, and continuing in the same way,
ε_t = (y_t − μ) − θ₁(y_{t−1} − μ) + θ₁²ε_{t−2}
⋮
ε_t = Σ_{i=0}^∞ (−θ₁)^i (y_{t−i} − μ), provided |θ₁| < 1 so that the sum converges.
It means that we can convert/invert the past residuals/errors/noises into past observations. In other words, we can convert/invert a moving average model into an autoregressive model. This property is called invertibility. This notion is very important if one wants to forecast the future values of the dependent variable; otherwise, the forecasting task will be impossible (the residuals in the past cannot be estimated, as they cannot be observed). When the model is not invertible, the innovations can still be represented in terms of future observations, but this is not helpful at all for forecasting purposes.
Any autoregressive process is necessarily invertible, but a stationarity condition must be imposed to ensure the uniqueness of the model for a particular autocorrelation structure. A moving average process, on the other hand, is always stationary, but an invertibility condition must be imposed in order for there to be a unique model for a particular autocorrelation structure. A small numerical illustration of invertibility is sketched below.
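The following minimal Python sketch (simulated data; the values of μ and θ₁ are illustrative assumptions) recovers the shocks of an MA(1) process recursively and shows that the effect of the unknown starting value dies out geometrically when |θ₁| < 1, which is exactly what invertibility buys us.

import numpy as np

# A minimal sketch of invertibility: recover the shocks of an MA(1)
# y_t = mu + eps_t + theta1*eps_{t-1} via e_t = (y_t - mu) - theta1*e_{t-1}.
rng = np.random.default_rng(1)
mu, theta1, n = 2.0, 0.8, 5_000
eps = rng.normal(size=n)
y = mu + eps + theta1 * np.concatenate(([0.0], eps[:-1]))

e = np.zeros(n)                       # reconstructed shocks, started at e_0 = 0
for t in range(1, n):
    e[t] = (y[t] - mu) - theta1 * e[t - 1]

# The reconstruction error equals theta1**t times the initial error, so it
# shrinks toward zero because |theta1| < 1.
print("max error over the last half:", np.max(np.abs(e[n // 2:] - eps[n // 2:])))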
We now discuss different types of moving average models on the basis of the
correlation with the residuals of the previous periods in the following sub-
sections.

14.4.1 First-order Moving Average Models

The moving average model in which the value of a variable in the current period is regressed against its previous residual is called a first-order moving average. For example, if today's price of a share depends on whatever happened in the other factors on the previous day, except for the price of the share on the previous day itself, then we use a first-order moving average.
If y_t is the value of a variable at time t and ε_{t−1} is the residual at time t − 1, then the moving average model of first order is given as follows:
y_t = μ + ε_t + θ₁ε_{t−1}
A first-order moving average model expresses the present value of a variable as a linear combination of the mean of the series μ, the present error term ε_t and the past error term ε_{t−1}. Of course, we do not observe the values of ε_t. The magnitude of the impact of the past residual on the present value is quantified using a coefficient denoted by θ₁. We use θ₁ instead of φ₁, as in the autoregressive model, to avoid confusion.
The number of residuals used as regressors is called the order of the moving average model. So, the first-order moving average is written as MA(1), where 1 represents the order.
After understanding the expression of MA(1) model, we now study its
properties.
Mean and Variance
We can find the mean of an MA(1) model as follows:
E[y_t] = E[μ] + E[ε_t] + θ₁E[ε_{t−1}]
       = μ + 0 + θ₁ × 0      (E[ε_t] = E[ε_{t−1}] = 0 because ε_t ~ N(0, σ²), and μ is a constant, so E[μ] = μ)
Therefore,
Mean of MA(1) = μ
Similarly, we can find the variance of an MA(1) model as follows:
Var[y_t] = Var[μ] + θ₁²Var[ε_{t−1}] + Var[ε_t]
Since Var[μ] = 0 because μ is a constant, and Var[ε_t] = Var[ε_{t−1}] = σ² because ε_t ~ N(0, σ²), therefore,
Var[y_t] = θ₁²σ² + σ²
Var[y_t] = (1 + θ₁²)σ² ≥ 0

Autocovariance and Autocorrelation Functions
The autocovariance function of an MA(1) model is as follows:
γ₀ = Var(y_t) = (1 + θ₁²)σ²
γ₁ = θ₁σ²
γ_k = 0;  k > 1
Similarly, we have the autocorrelation function of the MA(1) model as
ρ₁ = γ₁/γ₀ = θ₁/(1 + θ₁²)
ρ_k = 0;  k > 1
It indicates that the autocorrelation function of an MA(1) model becomes zero after lag 1 (a simulation check is sketched below).
Conditions for Invertibility
The moving average models are always stationary. However, some restrictions are imposed on the parameters of the moving average models, otherwise the inverted representation of the model cannot converge. Therefore, the constraint on the value of the parameter required for the invertibility of the MA(1) model is as follows:
|θ₁| < 1 ⇒ −1 < θ₁ < 1
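A minimal simulation check of these MA(1) properties (illustrative values; numpy assumed) is as follows:

import numpy as np

# A minimal sketch: simulate an MA(1) process and verify that the sample
# autocorrelation is close to theta1/(1 + theta1**2) at lag 1 and close to
# zero at higher lags, as derived above (illustrative values only).
rng = np.random.default_rng(2)
theta1, n = 0.8, 100_000
eps = rng.normal(size=n + 1)
y = eps[1:] + theta1 * eps[:-1]       # mu = 0 for simplicity

print("theory rho_1:", theta1 / (1 + theta1**2))
for k in (1, 2, 3):
    print(f"sample rho_{k}: {np.corrcoef(y[:-k], y[k:])[0, 1]:.3f}")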

14.4.2 Second-order Moving Average Models

The moving average model in which the value of a variable in the current period is regressed against its two previous residuals is called a second-order moving average. It is represented by MA(2). For example, if today's price of a share depends on whatever happened in the other factors on the previous day and on the day before the previous day, then we use a second-order moving average.
If y_t is the value of a variable at time t and ε_t, ε_{t−1} and ε_{t−2} are the residuals at time t, t − 1 and t − 2, respectively, then the moving average model of second order is expressed as follows:
y_t = μ + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2}
As for the MA(1) model, the coefficients θ₁ and θ₂ represent the magnitude of the impact of the past residuals on the present value.
Let us study the properties of the MA(2) model.

Mean and Variance
The mean and variance of the MA(2) model are given as follows:
Mean = μ + 0 + θ₁ × 0 + θ₂ × 0      (E[ε_t] = E[ε_{t−1}] = E[ε_{t−2}] = 0 because ε_t ~ N(0, σ²))
Mean of MA(2) = μ
Similarly, we can find the variance of an MA(2) model as follows:
Var[y_t] = Var[μ] + θ₁²Var[ε_{t−1}] + θ₂²Var[ε_{t−2}] + Var[ε_t]
Since Var[μ] = 0 because μ is a constant, and Var[ε_t] = Var[ε_{t−1}] = Var[ε_{t−2}] = σ² because ε_t ~ N(0, σ²), therefore,
Var[y_t] = θ₁²σ² + θ₂²σ² + σ²
Var[y_t] = (1 + θ₁² + θ₂²)σ² ≥ 0

Autocovariance and Autocorrelation Functions
The autocovariance function of an MA(2) model is given as follows:
γ₀ = Var(y_t) = (1 + θ₁² + θ₂²)σ²
γ₁ = (θ₁ + θ₁θ₂)σ²
γ₂ = θ₂σ²
γ_k = 0;  k > 2
Similarly, we have the autocorrelation function of the MA(2) model as
ρ₁ = γ₁/γ₀ = (θ₁ + θ₁θ₂)/(1 + θ₁² + θ₂²)
ρ₂ = γ₂/γ₀ = θ₂/(1 + θ₁² + θ₂²)
ρ_k = 0;  k > 2
It indicates that the autocorrelation function of an MA(2) model becomes zero after lag 2.
Conditions for Invertibility
The constraints on the values of the parameters for the invertibility of the MA(2) model are as follows:
• |θ₂| < 1 ⇒ −1 < θ₂ < 1
• θ₁ + θ₂ < 1
• θ₂ − θ₁ < 1

After understanding the first and second-order moving average models, you may be interested to know the general form of the moving average model. Let's discuss it now.

14.4.3 qth-order Moving Average Models

A moving average model states that the current value is linearly dependent on the current and past error terms.
If y_t is the value of a variable at time t and ε_t, ε_{t−1}, ε_{t−2}, ..., ε_{t−q} are the residuals at times t, t − 1, t − 2, ..., t − q, respectively, then the moving average model of qth order expresses the present value y_t as a linear combination of the mean of the series μ, the present error term ε_t, and the past error terms ε_{t−1}, ε_{t−2}, ..., ε_{t−q}. Mathematically, we express a general moving average model as follows:
y_t = μ + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + ... + θ_q ε_{t−q}
where θ₁, θ₂, ..., θ_q represent the magnitude of the impact of past errors on the present value.
After understanding the form of the general moving average model, we now
study the properties of the model.
Mean and Variance
The mean and variance of the MA(q) model are given as follows:
Mean = μ
Var[y_t] = (1 + θ₁² + θ₂² + ... + θ_q²)σ² ≥ 0
Autocovariance and Autocorrelation Functions
The autocovariance function of an MA(q) model is given below:
γ₀ = (1 + θ₁² + θ₂² + ... + θ_q²)σ²
γ_k = (θ_k + θ₁θ_{k+1} + ... + θ_{q−k}θ_q)σ²;  k = 1, 2, ..., q
γ_k = 0;  k > q
Similarly, we have the autocorrelation function of the MA(q) model as
ρ_k = γ_k/γ₀ = (θ_k + θ₁θ_{k+1} + ... + θ_{q−k}θ_q)/(1 + θ₁² + θ₂² + ... + θ_q²);  k = 1, 2, ..., q
ρ_k = 0;  k > q
It indicates that the autocorrelation function of an MA(q) model becomes zero after lag q.
Conditions for Invertibility
The constraints on the values of the parameters for the invertibility of the MA(q) model when q ≥ 3 are much more complicated; therefore, we do not discuss them here.
After understanding the moving average models, let's look at an example which helps you to understand how to check invertibility and calculate the mean, variance, autocovariance and autocorrelation functions of moving average models.

Example 2: Consider the time series model
y_t = 2 + ε_t + 0.8ε_{t−1}
where ε_t ~ N(0, 1)
(i) Identify the model.
(ii) Is this a moving average model? If yes, check whether it is invertible.
(iii) What are the mean and variance of the time series?
(iv) Calculate the autocovariance and autocorrelation functions.
(v) Plot the correlogram.
(vi) If the residual at t = 100 is 2.3, would you expect the next observation to be above or below the mean?

Solution: Since the variable in the current period is regressed against its previous residual, it is a first-order moving average model, MA(1). To check whether it is invertible, first of all we find the parameters of the time series model by comparing it with its standard form, that is,
y_t = μ + ε_t + θ₁ε_{t−1}
We obtain
μ = 2, θ₁ = 0.8
The invertibility constraint for MA(1) is −1 < θ₁ < 1. Since θ₁ = 0.8 lies between −1 and 1, the time series model MA(1) is invertible.
We now calculate the mean and variance of the series as
Mean = μ = 2
Var[y_t] = (1 + θ₁²)σ² = (1 + 0.64) × 1 = 1.64

We now calculate the autocovariance function as
γ₀ = Var(y_t) = 1.64
γ₁ = θ₁σ² = 0.8 × 1 = 0.8
γ_k = 0;  k > 1
Similarly, we calculate the autocorrelation function of MA(1) as
ρ₁ = γ₁/γ₀ = 0.8/1.64 = 0.49
ρ_k = 0;  k > 1
We can forecast the next value after y₁₀₀ using the prediction model as
ŷ_t = 2 + 0.8ε_{t−1}
ŷ₁₀₁ = 2 + 0.8ε₁₀₀ = 2 + 0.8 × 2.3 = 2 + 1.84 = 3.84
Therefore, if the residual at t = 100 is 2.3, then the forecast of the next observation, ŷ₁₀₁ = 3.84, will be above the mean (2).

You may try the following Self Assessment Question before studying further.

SAQ 2
Consider the time series model
y_t = 42 + ε_t + 0.7ε_{t−1} − 0.2ε_{t−2}
where ε_t ~ N(0, 2)

(i) Identify the model.
(ii) Is this a moving average model? If yes, check whether it is invertible.
(iii) What are the mean and variance of the time series?
(iv) Calculate the autocorrelation function.
(v) Suppose the residual errors for time periods 20 and 21 are 0.23 and 0.54; forecast the next observation.

14.5 AUTOREGRESSIVE MOVING AVERAGE MODELS

In the previous sections, you have learnt about autoregressive and moving average models, which are used to model time series data. The autoregressive (AR) models are used when the current value of the time series variable depends on the past values of the series, whereas the moving average (MA) models are used when the current value of the time series variable depends on the unpredictable shocks (residuals) of the previous periods. But in real-life data, we also observe that the current value of the time series variable depends not only on its past values but also on the residuals of the previous periods. For example, the sales of a product of a company at the current time depend on the prior sales in past periods, which play the role of the AR component, and they also depend on time-limited campaigns launched by the company, such as the distribution of coupons, buy-one-get-one-free offers, etc., which increase sales temporarily; such changes in sales are captured by the moving average component. Therefore, we need models that simultaneously use past data as a foundation for estimates and can also quickly adjust to unpredictable shocks (residuals). In this section, we are going to talk about one such model, called the autoregressive moving average (ARMA) model, which takes into account past values as well as past errors when constructing future estimates.
Autoregressive moving average (ARMA) models play a key role in the modelling of time series. An ARMA process consists of two components: an autoregressive (AR) part and a moving average (MA) part. In analysis, we tend to put the residual terms at the end of the model equation, which is why the "MA" part comes second. As compared with the pure AR and MA models, ARMA models provide the most effective linear model of stationary time series, since they are capable of modelling the unknown process with the minimum number of parameters. In this model, the impact of previous lags along with the residuals is considered for forecasting the future values of the time series. We can define the ARMA model as follows:
Autoregressive moving average models are simply a combination of an AR model and an MA model.
Autoregressive Moving Average (ARMA) models are models in which the value of a variable in the current period is related to its own values in the previous periods as well as to the values of the residuals of the previous periods.
Since an ARMA model is a combination of both autoregressive terms (p) and moving average terms (q), we represent it as ARMA(p, q). It is also used for stationary time series.
On the basis of different values of p and q, the ARMA model has different forms, which are discussed as follows:

14.5.1 Various Forms of ARMA Models

The ARMA model has various forms for different values of the parameters p and q of the model. We discuss some standard forms as follows:

ARMA(1,1) Models
ARMA(1,1) models are the models in which the value of a variable in the current period is related to its own value in the previous period as well as to the value of the residual in the previous period. It is a mixture of AR(1) and MA(1).
If y_t and y_{t−1} are the values of a variable at time t and t − 1, respectively, and if ε_t and ε_{t−1} are the residuals at time t and t − 1, respectively, then the ARMA(1,1) model is expressed as follows:
y_t = δ + φ₁y_{t−1} + θ₁ε_{t−1} + ε_t
As usual, δ and ε_t denote the intercept/constant term and the error term at time t, respectively, whereas the coefficients φ₁ and θ₁ are the AR and MA coefficients and represent the magnitude of the impact of the past value and the past error on the present value, respectively.
After understanding the form of the ARMA(1,1) model, we now study the
properties of the model.
Mean and Variance
We can find the mean and variance of an ARMA(1,1) model as we found them for the AR and MA models; they are given as follows:
Mean = δ/(1 − φ₁)
Var[y_t] = (1 + 2φ₁θ₁ + θ₁²)σ²/(1 − φ₁²) ≥ 0 when φ₁² < 1
Autocovariance and Autocorrelation Functions
The autocovariance function of an ARMA(1,1) model is given as follows:
γ₀ = Var[y_t] = (1 + 2φ₁θ₁ + θ₁²)σ²/(1 − φ₁²)
γ₁ = (φ₁ + θ₁)(1 + φ₁θ₁)σ²/(1 − φ₁²)
γ_k = φ₁γ_{k−1} for k = 2, 3, ...

Similarly, the autocorrelation function of an ARMA(1,1) model is given as follows:
ρ_k = γ_k/γ₀, i.e.,
ρ₁ = (φ₁ + θ₁)(1 + φ₁θ₁)/(1 + 2φ₁θ₁ + θ₁²)
ρ_k = φ₁ρ_{k−1} for k = 2, 3, ...
The autocorrelation function of an ARMA(1,1) model exhibits exponential decay and/or a damped sinusoid pattern towards zero. It does not cut off but gradually decreases as the lag k increases; the autocorrelation function of an ARMA(1,1) model thus displays the shape of an AR(1) process. The partial autocorrelation function of an ARMA(1,1) model also gradually dies out as k increases (the same property as a moving average model). This makes the identification of the order of an ARMA model relatively difficult.
Conditions for Stationarity and Invertibility
The stationarity of the ARMA(1,1) model is governed by its AR component. Therefore, the stationarity condition discussed for AR(1) also applies to the ARMA(1,1) model, which is as follows:
|φ₁| < 1 ⇒ −1 < φ₁ < 1
Similarly, the invertibility of an ARMA(1,1) model is governed by its MA(1) component. Therefore, the invertibility condition discussed for MA(1) also applies to the ARMA(1,1) model, which is as follows:
|θ₁| < 1 ⇒ −1 < θ₁ < 1
A small simulation check of the ARMA(1,1) autocorrelations is sketched below.
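This minimal Python sketch (illustrative parameter values, numpy assumed; δ is set to 0 for simplicity) simulates an ARMA(1,1) series and compares the sample autocorrelations with the closed-form expressions above.

import numpy as np

# A minimal sketch: simulate y_t = phi1*y_{t-1} + theta1*eps_{t-1} + eps_t
# and compare the sample autocorrelations with the ARMA(1,1) expressions
# derived above (illustrative values only).
rng = np.random.default_rng(3)
phi1, theta1, n = 0.6, 0.4, 200_000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi1 * y[t - 1] + theta1 * eps[t - 1] + eps[t]

rho1 = (phi1 + theta1) * (1 + phi1 * theta1) / (1 + 2 * phi1 * theta1 + theta1**2)
print("theory rho_1:", rho1, " theory rho_2:", phi1 * rho1)
print("sample rho_1:", np.corrcoef(y[:-1], y[1:])[0, 1])
print("sample rho_2:", np.corrcoef(y[:-2], y[2:])[0, 1])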

ARMA(1,2) Models

ARMA(1,2) models are the models in which the value of a variable in the current period is related to its own value in the previous period as well as to the residuals of the two previous periods. It is a mixture of the AR(1) and MA(2) models.
If y_t and y_{t−1} are the values of a variable at time t and t − 1, respectively, and if ε_t, ε_{t−1} and ε_{t−2} are the residuals at time t, t − 1 and t − 2, respectively, then the ARMA(1,2) model is expressed as follows:
y_t = δ + φ₁y_{t−1} + θ₁ε_{t−1} + θ₂ε_{t−2} + ε_t

The expressions for the mean, variance, autocovariance and autocorrelation functions of the ARMA(1,2) model are more complicated; therefore, we do not give them here.
Conditions for Stationarity and Invertibility
The stationarity of the ARMA(1,2) model is governed by its AR component. Therefore, the stationarity condition discussed for AR(1) also applies to the ARMA(1,2) model, which is as follows:
|φ₁| < 1 ⇒ −1 < φ₁ < 1
Similarly, the invertibility of the ARMA(1,2) model is governed by its MA(2) component, and the conditions are as follows:
• |θ₂| < 1 ⇒ −1 < θ₂ < 1
• θ₁ + θ₂ < 1
• θ₂ − θ₁ < 1

ARMA(p, q) Models

More generally, ARMA(p, q) models are the models in which the value of a variable in the current period is related to its own p previous values as well as to q previous values of the residuals. It is a mixture of the AR(p) and MA(q) models.
If y_t, y_{t−1}, ..., y_{t−p} are the values of a variable at times t, t − 1, ..., t − p, respectively, and if ε_t, ε_{t−1}, ..., ε_{t−q} are the residuals at times t, t − 1, ..., t − q, respectively, then the ARMA(p, q) model is expressed as follows:
y_t = δ + φ₁y_{t−1} + φ₂y_{t−2} + ... + φ_p y_{t−p} + θ₁ε_{t−1} + θ₂ε_{t−2} + ... + θ_q ε_{t−q} + ε_t
The expressions for the mean, variance, autocovariance and autocorrelation functions of ARMA(p, q) models are more complicated; therefore, we do not give them here.
Autocorrelation and Partial Autocorrelation Functions
The autocorrelation function of an ARMA(p, q) model exhibits exponential decay and/or a damped sinusoid pattern towards zero. It does not cut off but gradually decreases as the lag k increases, and it displays the shape of an AR(p) process. The partial autocorrelation function of an ARMA(p, q) model also gradually dies out as k increases (the same property as a moving average model). This makes the identification of the order of an ARMA model relatively difficult.
Conditions for Stationarity and Invertibility of the ARMA(p, q) Model
When p ≥ 3, the restrictions for stationarity are much more complicated. Similarly, when q ≥ 3, the restrictions for invertibility become more complicated; therefore, we do not discuss them here.
After understanding the ARMA models, let's look at an example which helps you to understand how to identify the order, check stationarity and invertibility, and calculate the mean, variance, autocovariance and autocorrelation functions for ARMA models.
Example 3: Consider the time series model
y_t = 20 + ε_t − 0.5y_{t−1} + 0.7ε_{t−1}
Assume that the variance of the white noise is 2.
(i) Identify the model.
(ii) Check whether the model is stationary and invertible.
(iii) Calculate the autocorrelations ρ₁, ρ₂ and ρ₃.
Solution:
Since the variable in the current period is regressed against its previous value as well as its previous residual, it is an ARMA model of order (1, 1). To check whether it is stationary and invertible, first of all we find the parameters of the time series model by comparing it with its standard form, that is,
y_t = δ + φ₁y_{t−1} + θ₁ε_{t−1} + ε_t
We obtain
δ = 20, φ₁ = −0.5, θ₁ = 0.7
The stationarity constraint for ARMA(1,1) is −1 < φ₁ < 1. Since φ₁ = −0.5 lies between −1 and 1, the time series model is stationary.
Similarly, the invertibility constraint for ARMA(1,1) is −1 < θ₁ < 1. Since θ₁ = 0.7 lies between −1 and 1, the time series model is invertible.
We now calculate the autocorrelation function of ARMA(1,1) using
ρ₁ = (φ₁ + θ₁)(1 + φ₁θ₁)/(1 + 2φ₁θ₁ + θ₁²) and ρ_k = φ₁ρ_{k−1} for k = 2, 3, ...
ρ₁ = (−0.5 + 0.7)(1 − 0.5 × 0.7)/(1 + 2 × (−0.5) × 0.7 + (0.7)²) = 0.2 × 0.65/0.79 = 0.13/0.79 = 0.165
ρ₂ = φ₁ρ₁ = −0.5 × 0.165 = −0.082
ρ₃ = φ₁ρ₂ = −0.5 × (−0.082) = 0.041

Now, let us try one more Self Assessment Question.

SAQ 3
Consider the ARMA time series model
y_t = 27 + 0.8y_{t−1} + 0.3ε_{t−1} + ε_t
Assume that the variance of the white noise is 1.5.


(i) Is the process stationary and invertible?
(ii) Find ρ1,ρ2 and ρ3 for the process.

14.6 AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS

In the previous sections of this unit, you have learnt different time series models such as the autoregressive (AR), moving average (MA) and autoregressive moving average (ARMA) models. These models are based on the assumption that the time series is stationary. But in the real world, most time series variables are nonstationary. In general, trend and periodicity exist in many time series data. Hence, the AR, MA and ARMA models do not apply to nonstationary time series, so there is a need to remove these effects before applying such models. Therefore, if the input time series is nonstationary, then first we have to transform the series from nonstationary into stationary and after that we apply models such as the AR, MA and ARMA. For transforming a nonstationary time series into a stationary one, we may apply differencing, as discussed in Unit 13, once, twice, three times, and so on until the series is at least approximately stationary. As AR and MA processes are described by their order, in a similar way the differencing process is also described by the order of differencing, as 1, 2, 3, .... Therefore, to describe a model for a nonstationary time series, the elements make up a triple (p, d, q) instead of the pair (p, q) that defines the type of model applied, where the degree of differencing is represented by the parameter d. Combining the differencing of a nonstationary time series with the ARMA model provides a powerful family of models that can be applied in a wide range of situations: the autoregressive integrated moving average (ARIMA) model. Here the letter "I" in ARIMA refers to the fact that the time series data have been initially differenced, and when the modelling is completed the results then have to be summed, or integrated, to produce the final estimates and forecasts. Box and Jenkins played a significant role in the development of this extended variant of the model; therefore, ARIMA models are also referred to as Box-Jenkins models. The ARIMA model is discussed below:

The Autoregressive Integrated Moving Average (ARIMA) model is a combination of differencing with autoregressive and moving average models.

We can express the ARIMA model as follows:
y′_t = δ + φ₁y′_{t−1} + φ₂y′_{t−2} + ... + φ_p y′_{t−p} + θ₁ε_{t−1} + θ₂ε_{t−2} + ... + θ_q ε_{t−q} + ε_t
where y′_t is the differenced series (which may have been differenced more than once) and p and q are the orders of the autoregressive and moving average parts.
For the first difference, we can write the ARIMA model as
y_t − y_{t−1} = δ + φ₁(y_{t−1} − y_{t−2}) + φ₂(y_{t−2} − y_{t−3}) + ... + φ_p(y_{t−p} − y_{t−p−1}) + θ₁ε_{t−1} + θ₂ε_{t−2} + ... + θ_q ε_{t−q} + ε_t
On the basis of different values of p, d and q, the ARIMA model has different forms, which are discussed as follows:
14.6.1 Various Forms of ARIMA Models

The ARIMA model has various forms for different values of the parameters p, d and q of the model. We discuss some standard forms as follows:
• ARIMA(0,0,0) — White noise: y_t = ε_t. Use: the errors are uncorrelated across time.
• ARIMA(1,0,0) — First-order autoregressive model: y_t = μ + φ₁y_{t−1} + ε_t. Use: the series is stationary and autocorrelated with its previous value.
• ARIMA(0,1,0) — Random walk: y_t − y_{t−1} = μ + ε_t, i.e., y_t = μ + y_{t−1} + ε_t. Use: the series is not stationary.
• ARIMA(1,1,0) — Differenced first-order autoregressive model: y_t − y_{t−1} = μ + φ₁(y_{t−1} − y_{t−2}) + ε_t. Use: the series is not stationary, but its differences are autocorrelated with their previous values.
• ARIMA(0,1,1) — Simple exponential smoothing model: y_t − y_{t−1} = θ₁ε_{t−1} + ε_t, i.e., y_t = y_{t−1} + θ₁ε_{t−1} + ε_t. Use: the series is not stationary and the errors are correlated across time.
• ARIMA(0,1,1) with constant — Simple exponential smoothing with growth: y_t − y_{t−1} = μ + θ₁ε_{t−1} + ε_t, i.e., y_t = μ + y_{t−1} + θ₁ε_{t−1} + ε_t. Use: the series is not stationary, shows growth, and the errors are correlated across time.
• ARIMA(1,1,1) — Damped-trend linear exponential smoothing model: y_t − y_{t−1} = μ + φ₁(y_{t−1} − y_{t−2}) + θ₁ε_{t−1} + ε_t. Use: the series is not stationary and has a trend.
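The effect of the differencing ("I") step is easy to see numerically. In this minimal Python sketch (a simulated random walk, illustrative only; numpy assumed), the raw series has lag-1 autocorrelation near 1, while its first difference behaves like stationary white noise:

import numpy as np

# A minimal sketch of the "I" in ARIMA: a random walk with drift is
# nonstationary, but its first difference is (stationary) white noise
# around the drift, so d = 1 suffices here.
rng = np.random.default_rng(4)
y = np.cumsum(1.0 + rng.normal(size=5_000))   # y_t = y_{t-1} + 1 + eps_t

dy = np.diff(y)                               # first difference y_t - y_{t-1}
print("lag-1 autocorr of y :", np.corrcoef(y[:-1], y[1:])[0, 1])    # near 1
print("lag-1 autocorr of dy:", np.corrcoef(dy[:-1], dy[1:])[0, 1])  # near 0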
After understanding various time series models, we now discuss an important topic: how to select a suitable time series model for real-life time series data. This is the subject of the next section.

14.7 TIME SERIES MODEL SELECTION

I hope you have understood the various time series models. Broadly, you can divide all time series models into two categories: the models which are used for stationary time series, such as AR, MA and ARMA, and the models which are used for nonstationary time series, such as ARIMA. When you deal with real time series data, the first question that may arise in your mind is how to know which time series model is most suitable for particular time series data. Don't worry; here we describe the methodology for the same in steps so that you can easily identify/select the model and its order for the given time series data. It has the following steps:

Step 1: Since there are two types of models, used for stationary and nonstationary time series respectively, first of all we plot the time series data and check whether the time series data is stationary or nonstationary, as you have learned in Unit 13.
Step 2: If the time series is stationary, we have to decide which model out of AR, MA and ARMA is suitable for our time series data. To distinguish among them, we calculate the autocorrelation function (ACF) and the partial autocorrelation function (PACF) as discussed in Unit 13. After that, we plot the ACF and PACF versus the lag, that is, the correlogram as discussed in Unit 13, and try to identify the pattern of both. The ACF plot is most useful for identifying the AR model and the PACF plot for the order of the AR model, whereas the PACF plot is most useful for identifying the MA model and the ACF plot for the order of the MA model. We now try to understand how to distinguish between the AR, MA and ARMA models as follows:
Case I (AR model): In the plot of ACF versus the lag (correlogram), if you see a gradual diminishing in amount or exponential decay, then this indicates that the values of the time series are serially correlated, and the series can be modelled through an AR model. For determining the order of an AR model, we use a plot of PACF versus the lag. If the PACF output cuts off, which means the PACF is almost zero at lag p + 1, then it indicates an AR model of order p. We can also calculate the PACF by increasing the order one by one, and as soon as it lies within the range of ±2/√n (where n is the size of the time series), we should stop and take the last significant PACF as the order of the AR model (see SAQ 4).
Note: It is important that the choice of order makes sense. For example, suppose you have blood pressure readings for every day over the past two years. You may find that an AR(1) or AR(2) model is appropriate for modelling blood pressure. The PACF may indicate a large partial autocorrelation value at a lag of 17, but such a large order for an autoregressive model likely does not make much sense.
Case II (MA model): In the plot of PACF versus the lag, if you see a gradual diminishing in amount or exponential decay, then this indicates that the series can be modelled through an MA model, and if the ACF output cuts off, meaning the ACF is almost zero at lag q + 1, then it indicates an MA model of order q.
Case III (ARMA model): If the autocorrelation function (ACF) as well as the partial autocorrelation function (PACF) plots show a gradual diminishing in amount (exponential decay) or a damped sinusoid pattern, then this indicates that the series can be modelled through an ARMA model, but it makes the identification of the order of the ARMA(p, q) model relatively more difficult. For that, the extended ACF, generalised sample PACF, etc. are used, which are beyond the scope of this course. For more detail, you can consult Time Series Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.
A small sketch of the Step 2 computations is given below.
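This minimal Python sketch (simulated AR(1) data; the PACF formulas are the ones used in this unit, and the helper name is our own) computes the first two sample PACF values and compares them with the ±2/√n band:

import numpy as np

def sample_acf(y, k):
    """Lag-k sample autocorrelation r_k."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(d[:-k] * d[k:]) / np.sum(d * d))

# Simulated stationary AR(1) data, purely for illustration.
rng = np.random.default_rng(5)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()

r1, r2 = sample_acf(y, 1), sample_acf(y, 2)
phi11 = r1                                  # first-order PACF
phi22 = (r2 - r1**2) / (1 - r1**2)          # second-order PACF (Unit 13 formula)
band = 2 / np.sqrt(len(y))
print(f"phi11 = {phi11:.2f}, phi22 = {phi22:.2f}, band = ±{band:.2f}")
# If phi11 is well outside the band while phi22 falls inside it,
# an AR(1) model is indicated.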
Step 3: If the time series is nonstationary, we obtain the first, second, etc. differences of the time series as discussed in Unit 13 until it becomes stationary, ensure that the trend and seasonal components are removed, and thereby find d. Suppose that after the second difference the series becomes stationary; then d is 2. Generally, one or two stages of differencing are sufficient. The differenced series will be shorter than the source series (as you have observed in Unit 13). An ARMA model is then fitted to the resulting time series. Since ARIMA models have three parameters, there are many variations of the possible models that could be fitted. We should choose an ARIMA model that is as simple as possible, i.e., one that contains as few terms as possible (small values of p and q). For more detail, you can consult Time Series Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.

Step 4: After identifying the model, we estimate the parameters of the model using the method of moments, maximum likelihood estimation, least squares methods, etc. The method of moments is the simplest of these. In this method, we equate the sample autocorrelation functions to the corresponding population autocorrelation functions, which are functions of the parameters of the model, and solve these equations for the parameters of the model. However, this method is not a very efficient method of estimation of parameters. For moving average processes, usually the maximum likelihood method is used, which gives more efficient estimates when n is large. We shall not discuss this any further here; if you are interested, you may refer to Time Series Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.

Step 5: After fitting the best model, we give a diagnostic check to the residuals to examine whether the fitted model is adequate or not. It helps us to ensure that no more information is left for extraction and to check the goodness of fit. For the residual analysis, we plot the ACF and PACF of the residuals and check whether there is a pattern or not. For an adequate model, there should be no structure in the ACF and PACF of the residuals, and they should not differ significantly from zero for all lags greater than one. For the goodness of fit, we use Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). We have not discussed all the above aspects in detail here, but the interested reader should consult Time Series Analysis: Forecasting and Control, 4th Edition, by Box, Jenkins and Reinsel.
After understanding the procedure of selection of a time series model, let us
take an example.
Example 4: The temperature (in °C) in a particular area on different days, collected by the meteorological department, is given below:
Day:         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Temperature: 27  29  31  27  28  30  32  29  28  30  30  26  30  31  27

(i) Examine which model (AR or MA) is suitable for this data.
(ii) Find the order and estimate the parameters of the selected model.
(iii) Write the model.
Solution: First of all, we check whether the given time series is stationary or
nonstationary. For that, we plot the time series data by taking days on the
X-axis and temperature on the Y-axis. We get the time series plot as shown in
Fig. 14.1.


Fig. 14.1: Time series plots of the temperature data.


Fig. 14.1 shows that there is no consistent trend (upward or downward) over the entire period; the series appears to slowly wander up and down around a constant level, and the variance is roughly constant. There is no seasonality or trend, so we can say that this time series is stationary.
To examine the model and its order, we have to compute the sample autocorrelation (ACF) and partial autocorrelation (PACF) functions as discussed in Unit 13. For the sake of brevity, we just write them here:
r₁ = 0.835, r₂ = 0.676, r₃ = 0.469, r₄ = 0.280


φ̂₁₁ = r₁ = 0.835, φ̂₂₂ = 0.088
Since the autocorrelation function (ACF) gradually diminishes (decreases) in amount, it indicates that the series can be modelled through an AR model, and since the PACF output is almost zero at lag 2, it indicates that the AR model is of order 1. Hence, we may conclude that the AR(1) model is suitable for this data. Therefore, the model is
y_t = δ + φ₁y_{t−1} + ε_t
We now estimate the parameters (δ and φ₁) of the model using the method of moments. In this method, we equate the sample autocorrelation functions to the population autocorrelation functions, which are functions of the parameters of the model, and solve these as below:
r₁ = ρ₁
0.835 = φ₁ ⇒ φ̂₁ = 0.835
For estimating the parameter δ, first we find the mean of the given data and then we use the relationship Mean = δ/(1 − φ₁):
Mean = (1/15) Σ yᵢ = 435/15 = 29
Therefore,
Mean = 29 = δ/(1 − 0.835) ⇒ δ = 29 × 0.165 = 4.785
Therefore, the suitable model for the temperature data is
y_t = 4.785 + 0.835y_{t−1} + ε_t

Before going to the next section, you may like to do some exercise yourself. Let us try a Self Assessment Question.

SAQ 4
A researcher wants to develop an autoregressive model for the data on COVID-19 patients in a particular city. For that, he collected the data for 100 days and calculated the autocorrelation coefficients, which are given as follows:
r₁ = 0.73, r₂ = 0.39, r₃ = 0.07
By calculating the sample PACF, estimate the order of the autoregressive model to be fitted.
We end this unit by giving a summary of what we have covered in it.

14.8 SUMMARY
In this unit, we have discussed:

• The necessity of time series models in comparison with regression models.
• Various time series models such as AR, MA, ARMA and ARIMA.
• The various properties of these models.
• The selection of a particular time series model for real-life time series data.
• The stationarity and invertibility conditions of the models, and the role of autocorrelations and partial autocorrelations in the identification of the models.

14.9 TERMINAL QUESTIONS

1. Define the various components of the ARIMA model.
2. Define the parameters of the ARIMA model.
3. For time series data, a researcher obtained the following information:
n = 100, Mean = 26, Error variance = 2.2
r₁ = 0.69, r₂ = 0.54, r₃ = 0.38, r₄ = 0.38, r₅ = 0.29, r₆ = 0.05
(i) Plot the correlogram.
(ii) Which one of the AR and MA models will be more suitable?
(iii) Fit the suitable model.

14.10 SOLUTIONS/ANSWERS
Self Assessment Questions (SAQs)
1. For checking the stationarity of the time series, first of all, we find the parameters of the time series model. Since the variable in the current period is regressed against its previous value and the value before that, it is a second-order autoregressive model. We now compare it with its standard form, that is, y_t = δ + φ₁y_{t−1} + φ₂y_{t−2} + ε_t. We obtain
δ = 5, φ₁ = 0.8, φ₂ = −0.5
We now check the stationarity conditions for the second-order autoregressive model:
|φ₂| < 1 ⇒ 0.5 < 1
φ₁ + φ₂ < 1 ⇒ 0.8 − 0.5 = 0.3 < 1
φ₂ − φ₁ < 1 ⇒ −0.5 − 0.8 = −1.3 < 1
Since all three conditions for stationarity are satisfied, the time series is stationary.
We now calculate the mean of the series as
Mean = δ/(1 − φ₁ − φ₂) = 5/(1 − 0.8 + 0.5) = 5/0.7 = 7.14
We now calculate the autocorrelation function as
ρ₁ = φ₁/(1 − φ₂) = 0.8/(1 + 0.5) = 0.53
ρ₂ = φ₁ρ₁ + φ₂ = 0.8 × 0.53 − 0.5 = −0.076
ρ₃ = φ₁ρ₂ + φ₂ρ₁ = 0.8 × (−0.076) − 0.5 × 0.53 = −0.061 − 0.265 = −0.326
The variance of the series is
Var[y_t] = σ²/(1 − φ₁ρ₁ − φ₂ρ₂) = 2/(1 − 0.8 × 0.53 − (−0.5) × (−0.076)) = 2/(1 − 0.424 − 0.038) = 2/0.538 = 3.72
We can forecast the next value, y₅₂, using the prediction model as
ŷ_t = 5 + 0.8y_{t−1} − 0.5y_{t−2}
ŷ₅₂ = 5 + 0.8y₅₁ − 0.5y₅₀
ŷ₅₂ = 5 + 0.8 × 40 − 0.5 × 38 = 5 + 32 − 19 = 18

2. Since the variable in the current period is regressed against its previous residual and the residual before that, it is a second-order moving average model. For checking the invertibility of the MA(2) model, first of all, we find the parameters of the time series model. We now compare it with its standard form, that is, y_t = μ + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2}. We have
μ = 42, θ₁ = 0.7, θ₂ = −0.2
The conditions for the invertibility of the MA(2) model are
• |θ₂| < 1 ⇒ −1 < θ₂ < 1
• θ₁ + θ₂ < 1
• θ₂ − θ₁ < 1
Since θ₂ = −0.2 lies between −1 and 1,
θ₁ + θ₂ = 0.7 − 0.2 = 0.5 < 1
θ₂ − θ₁ = −0.2 − 0.7 = −0.9 < 1
all three conditions for the invertibility of the MA(2) model are satisfied; hence the time series model is invertible.
We now calculate the mean and variance of the series as
Mean = μ = 42
Var[y_t] = (1 + θ₁² + θ₂²)σ² = (1 + 0.49 + 0.04) × 2 = 1.53 × 2 = 3.06
We now calculate the autocorrelation function as
ρ₁ = (θ₁ + θ₁θ₂)/(1 + θ₁² + θ₂²) = (0.7 + 0.7 × (−0.2))/1.53 = 0.56/1.53 = 0.366
ρ₂ = θ₂/(1 + θ₁² + θ₂²) = −0.2/1.53 = −0.131
ρ_k = 0; k > 2
We can forecast the next observation, y₂₂, using the prediction model as
ŷ_t = 42 + 0.7ε_{t−1} − 0.2ε_{t−2}
ŷ₂₂ = 42 + 0.7ε₂₁ − 0.2ε₂₀
ŷ₂₂ = 42 + 0.7 × 0.54 − 0.2 × 0.23 = 42.332

3. For checking whether the model is stationary and invertible, first of all, we find the parameters of the time series model. We now compare it with its standard form, that is, y_t = δ + φ₁y_{t−1} + θ₁ε_{t−1} + ε_t.
We obtain
δ = 27, φ₁ = 0.8, θ₁ = 0.3
The stationarity constraint for ARMA(1,1) is −1 < φ₁ < 1. Since φ₁ = 0.8 lies between −1 and 1, the time series model is stationary.
Similarly, the invertibility constraint for ARMA(1,1) is −1 < θ₁ < 1. Since θ₁ = 0.3 lies between −1 and 1, the time series model is invertible.
We now calculate the autocorrelation function of ARMA(1,1) as
ρ₁ = (φ₁ + θ₁)(1 + φ₁θ₁)/(1 + 2φ₁θ₁ + θ₁²) = (0.8 + 0.3)(1 + 0.8 × 0.3)/(1 + 2 × 0.8 × 0.3 + (0.3)²) = 1.1 × 1.24/1.57 = 1.364/1.57 = 0.869
ρ₂ = φ₁ρ₁ = 0.8 × 0.869 = 0.695
ρ₃ = φ₁ρ₂ = 0.8 × 0.695 = 0.556

4. As we know from Unit 13, the first-order partial autocorrelation equals the first-order autocorrelation, that is,
φ̂₁₁ = r₁ = 0.73
The significance band of the PACF is ±2/√n = ±2/10 = ±0.2.
Since φ̂₁₁ = 0.73 lies outside this band, we have to calculate the next-order PACF.
The second-order (lag) sample partial autocorrelation is
φ̂₂₂ = (r₂ − r₁²)/(1 − r₁²) = (0.39 − (0.73)²)/(1 − (0.73)²) = −0.31
Since φ̂₂₂ = −0.31 lies outside the band ±0.2, the autoregressive model AR(1) is not sufficient and we have to calculate the next PACF.
Similarly, we now compute the third-order sample partial autocorrelation function as
φ̂₃₃ = det(P₃*)/det(P₃), where
P₃* = [1 r₁ r₁; r₁ 1 r₂; r₂ r₁ r₃] and P₃ = [1 r₁ r₂; r₁ 1 r₁; r₂ r₁ 1]
The numerator is
det(P₃*) = |1 0.73 0.73; 0.73 1 0.39; 0.39 0.73 0.07|
= 1 × (0.07 − 0.73 × 0.39) − 0.73 × (0.73 × 0.07 − 0.39 × 0.39) + 0.73 × (0.73 × 0.73 − 0.39 × 1)
= −0.214 + 0.073 + 0.104 = −0.037
Similarly, the denominator is
det(P₃) = |1 0.73 0.39; 0.73 1 0.73; 0.39 0.73 1|
= 1 × (1 − 0.73 × 0.73) − 0.73 × (0.73 × 1 − 0.73 × 0.39) + 0.39 × (0.73 × 0.73 − 0.39 × 1)
= 0.467 − 0.325 + 0.056 = 0.198
Therefore,
φ̂₃₃ = −0.037/0.198 = −0.187
Since φ̂₃₃ = −0.187 lies inside the band ±0.2, an AR model of order 2 will be suitable for this time series.
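The determinant computation above is easy to verify with a few lines of Python (numpy assumed; this merely re-does the arithmetic of SAQ 4):

import numpy as np

# Check of the third-order PACF: phi_33 = det(P*) / det(P), where P is the
# Toeplitz correlation matrix and P* has its last column replaced by
# (r1, r2, r3), as used in the solution above.
r1, r2, r3 = 0.73, 0.39, 0.07
P = np.array([[1, r1, r2],
              [r1, 1, r1],
              [r2, r1, 1]])
P_star = np.array([[1, r1, r1],
                   [r1, 1, r2],
                   [r2, r1, r3]])
phi33 = np.linalg.det(P_star) / np.linalg.det(P)
print(f"phi_33 = {phi33:.3f}")    # about -0.19, inside the ±0.2 band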

Terminal Questions (TQs)


1. The components of the ARIMA model are as follows:

• Autoregression (AR): refers to a model that shows a changing


variable that regresses on its own lagged, or prior, values.

• Integrated (I): represents the differencing of raw observations to


allow the time series to become stationary (i.e., data values are
replaced by the difference between the data values and the
previous values).

• Moving Average (MA): incorporates the dependency between an


observation and a residual error.

2. The ARIMA(p, d, q) model has the parameters p, d, q which are


integers and different values of these indicate the type of ARIMA model
used. The parameters can be defined as:

• p: the number of lag observations in the model, also known as the


lag order.
• d: the number of times the raw observations are differenced; also
known as the degree of differencing.
• q: the size of the moving average window, also known as the order
of the moving average.
3. For the correlogram, we take lags on the X-axis and sample autocorrelation coefficients on the Y-axis. At each lag, we draw a line which represents the level of correlation between the series and its lags, as shown in Fig. 14.2.
[Fig. 14.2: The correlogram of the time series.]
In the plot of ACF versus the lag (correlogram), we see a gradual diminishing in amount, or exponential decay, which indicates that the values of the time series are serially correlated and the series can be modelled through an AR model. For determining the order of the AR model, we use the PACF. We calculate the PACF by increasing the order one by one, and as soon as it lies within the range ±2/√n (where n is the size of the time series), we stop and take the last significant PACF as the order of the AR model.
The first-order partial autocorrelation function equals the first-order autocorrelation function, that is,
φ̂₁₁ = r₁ = 0.69
Since the first-order PACF lies outside the range ±2/√n = ±2/√100 = ±0.2, we calculate the second-order PACF as
φ̂₂₂ = (r₂ − r₁²)/(1 − r₁²) = (0.54 − (0.69)²)/(1 − (0.69)²) = 0.064/0.524 = 0.122
Since φ̂₂₂ = 0.122 lies within the range ±0.2, the AR(1) model will be suitable for this time series.
To fit the model by the method of moments, we take φ̂₁ = r₁ = 0.69 and use Mean = δ/(1 − φ₁), so that δ = 26 × (1 − 0.69) = 8.06. The fitted model is therefore
y_t = 8.06 + 0.69y_{t−1} + ε_t
