
B.Com. (Hons.) I Year Commerce

Paper IV : BUSINESS STATISTICS


SM 2 : Unit (III - V)

SCHOOL OF OPEN LEARNING


(Campus of Open Learning)
University of Delhi

Department of Commerce
Prepared by : Dr. K.L. Dahiya
Graduate Course

Paper IV : Business Statistics

CONTENTS

UNIT - III CORRELATION AND REGRESSION ANALYSIS


Lesson 1 : Correlation Analysis
Lesson 2 : Regression Analysis
UNIT - IV INDEX NUMBERS
Lesson 1 : Index Numbers
UNIT - V TIME SERIES ANALYSIS
Lesson 1 : Time Series Analysis


SCHOOL OF OPEN LEARNING


University of Delhi
5, Cavalry Lane, Delhi-110007
Academic Session 2016-17 (8300 Copies)

© School of Open Learning

Published by : Executive Director, School of Open Learning, 5 Cavalry Lane, Delhi-110007


Printed at : Educational Stores, S-5, Bsr. Road Ind. Area, Ghaziabad (U.P.)
UNIT-3
SIMPLE CORRELATION AND REGRESSION ANALYSIS
LESSON-1 : SIMPLE CORRELATION

1. STRUCTURE
1.0 Objective
1.1 Introduction
1.2 Utility of Correlation
1.3 Difference between Correlation and Causation
1.4 Types of Correlation
1.5 Methods of Studying Correlation
1.5.1 Scatter Diagram
1.5.2 Graphic Method
1.5.3 Karl Pearson’s Coefficient of Correlation
1.5.4 Properties of Coefficient of Correlation
1.5.5 Probable Error of Coefficient of Correlation
1.5.6 Rank Correlation
1.5.7 Concurrent Deviation Method
1.6 Summary
1.7 Self Assessment Questions

1.0 OBJECTIVE
After studying this lesson, you should be able to :
(i) Understand the concept of correlation
(ii) Identify different types of correlation
(iii) Understand the notion and interpretation of coefficient of correlation
(iv) Compute the value of correlation by different methods
(v) Compute correlation coefficient for bivariate frequency distribution.

1.1 INTRODUCTION
In the earlier chapters we discussed univariate distributions, that is, distributions involving one
variable only, and highlighted their important characteristics by different statistical techniques. We
may, however, come across series in which each item assumes the values of two or more variables.
A distribution in which each unit of the series assumes two values is called a bivariate distribution.
In a bivariate distribution, we are interested in finding out whether there is any relationship between
the two variables. Correlation is a statistical technique which studies the relationship between two
or more variables, and correlation analysis involves the various methods and techniques used for
studying and measuring the extent of that relationship. When two variables are related in such a way
that a change in the value of one is accompanied either by a direct change or by an inverse change in
the value of the other, the two variables are said to be correlated. In correlated variables, an increase
in one variable is accompanied by an increase or a decrease in the other variable. For instance, a
relationship exists between the price and demand of a commodity because, other things being equal,
an increase in the price of a commodity causes a decrease in the demand for that commodity. A
relationship might also exist between the heights and weights of students, and between the amount
of rainfall in a city and the sales of raincoats in that city.
The following are some of the important definitions of correlation.
Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and expressing it in a brief formula is
known as correlation”.
A.M. Tuttle says, “Correlation is an analysis of the covariation between two or more variables.”
W.A. Neiswanger says, “Correlation analysis contributes to the understanding of economic
behaviour, aids in locating the critically important variables on which others depend, may reveal to
the economist the connections by which disturbances spread and suggest to him the paths through
which stabilizing forces may become effective.”
L.R. Conner says, “If two or more quantities vary in sympathy so that the movements in one
tends to be accompanied by corresponding movements in others then they are said to be correlated.”

1.2 UTILITY OF CORRELATION


The study of correlation is very useful in practical life, as the following points show.
(1) With the help of correlation analysis, we can measure in one figure the degree of relationship
existing between variables like price, demand, supply, income, expenditure etc. Once we know
that two variables are correlated, we can estimate the value of one variable, given
the value of the other.
(2) Correlation analysis is of great use to economists and businessmen. It reveals to the economist
the disturbing factors and suggests to him the stabilizing forces. In business, it enables the
executive to estimate costs, sales etc. and plan accordingly.
(3) Correlation analysis is helpful to scientists. Nature has been found to be a multiplicity of
inter-related forces.

1.3 DIFFERENCE BETWEEN CORRELATION AND CAUSATION


The term correlation should not be misunderstood as causation. If correlation exists between two
variables, it must not be assumed that a change in one variable is the cause of a change in other
variable. In simple words, a change in one variable may be associated with a change in another
variable but this change need not necessarily be the cause of a change in the other variable. When
there is no cause and effect relationship between two variables but a correlation is found between
the two variables such correlation is known as “spurious correlation” or “nonsense correlation”.
Correlation may exist due to the following:
(1) Pure chance correlation : This happens in a small sample. Correlation may exist between
incomes and weights of four persons although there may be no cause and effect relationship
between incomes and weights of people. This type of correlation may arise due to pure random
sampling variation or because of the bias of the investigator in selecting the sample.
(2) When the correlated variables are influenced by one or more other variables : A high degree of
correlation between the variables may exist where the same cause is affecting each variable or
different causes are affecting each with the same effect. For instance, a degree of correlation may
be found between yield per acre of rice and tea due to the fact that both are related to the
amount of rainfall, but neither of the two variables is the cause of the other.
(3) When the variables mutually influence each other so that neither can be called the cause of
the other : At times it may be difficult to say which of the two variables is the cause and which
is the effect because both may be reacting on each other.

1.4 TYPES OF CORRELATION


Correlation can be categorised as one of the following :
(i) Positive and Negative
(ii) Simple and Multiple
(iii) Partial and Total
(iv) Linear and Non-Linear (Curvilinear)
Positive and Negative Correlation
Positive or direct Correlation refers to the movement of variables in the same direction. The
correlation is said to be positive when the increase (decrease) in the value of one variable is
accompanied by an increase (decrease) in the value of other variable also. Negative or inverse
correlation refers to the movement of the variables in opposite direction. Correlation is said to be
negative, if an increase (decrease) in the value of one variable is accompanied by a decrease (increase)
in the value of other.
Simple and Multiple Correlation
Under simple correlation, we study the relationship between two variables only, e.g., between the
yield of wheat and the amount of rainfall or between demand and supply of a commodity. In case of
multiple correlation, the relationship is studied among three or more variables. For example, the
relationship of yield of wheat may be studied with both chemical fertilizers and pesticides.
Partial and Total Correlation
There are two categories of multiple correlation analysis. Under partial correlation, the relationship
of two or more variables is studied in such a way that only one dependent variable and one independent
variable are considered and all others are kept constant. For example, the coefficient of correlation between
yield of wheat and chemical fertilizers excluding the effects of pesticides and manures is called
partial correlation. Total correlation is based upon all the variables.
Linear and Non-Linear Correlation
When the amount of change in one variable tends to keep a constant ratio to the amount of change
in the other variable, then the correlation is said to be linear. But if the amount of change in one
variable does not bear a constant ratio to the amount of change in the other variable then the correlation
is said to be non-linear. The distinction between linear and non-linear is based upon the consistency
of the ratio of change between the variables.

1.5 METHODS OF STUDYING CORRELATION
There are different methods which help us to find out whether the variables are related or not.
(1) Scatter Diagram Method
(2) Graphic Method
(3) Karl Pearson’s Coefficient of Correlation
(4) Properties of Coefficient of Correlation
(5) Probable Error of Coefficient of Correlation
(6) Rank Method
(7) Concurrent Deviation Method
Let us understand these methods one by one.
1.5.1 Scatter Diagram
A scatter diagram is drawn to visualise the relationship between two variables. The values of the more
important variable are plotted on the X-axis while the values of the other variable are plotted on the
Y-axis. On the graph, dots are plotted to represent different pairs of data. When dots are plotted to
represent all the pairs, we get a scatter diagram. The way the dots scatter gives an indication of the
kind of relationship which exists between the two variables. While drawing a scatter diagram, it is
not necessary to take the zero values of the X and Y variables at the point of origin; the minimum
values of the variables considered may be taken instead.
When there is a positive correlation between the variables, the dots on the scatter diagram run
from left hand bottom to the right hand upper corner. In case of perfect positive correlation all the
dots will lie on a straight line.

When a negative correlation exists between the variables, dots on the scatter diagram run from
the upper left hand corner to the bottom right hand corner. In case of perfect negative correlation, all
the dots lie on a straight line.

If a scatter diagram is drawn and the dots form no pattern, there is no correlation. Students are advised
to prepare two scatter diagrams on the basis of the following data :

(i) Data for the first Scatter Diagram :


Demand Schedule
Price (Rs.) Commodity Demand (units)
6 180
7 150
8 130
9 120
10 125
(ii) Data for the second Scatter Diagram :
Supply Schedule
Price (Rs.) Commodity Supply
50 2,000
51 2,100
52 2,200
53 2,500
54 3,000
55 3,800
56 4,700
Students will find that the first diagram indicates a negative correlation whereas the second diagram
reveals a positive correlation.
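The direction of each relationship can also be checked numerically before drawing the diagrams. The following is a minimal Python sketch, not part of the original lesson; the helper name `sum_of_cross_deviations` is hypothetical. It relies on the fact that the sign of the sum of products of deviations from the means is the sign of the correlation.

```python
def sum_of_cross_deviations(xs, ys):
    """Return sum((x - mean_x) * (y - mean_y)); its sign gives the direction of correlation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Demand schedule: price (Rs.) and commodity demand (units)
demand_price = [6, 7, 8, 9, 10]
demand_units = [180, 150, 130, 120, 125]

# Supply schedule: price (Rs.) and commodity supply
supply_price = [50, 51, 52, 53, 54, 55, 56]
supply_units = [2000, 2100, 2200, 2500, 3000, 3800, 4700]

demand_direction = sum_of_cross_deviations(demand_price, demand_units)
supply_direction = sum_of_cross_deviations(supply_price, supply_units)

# A negative value means the dots fall from upper left to lower right;
# a positive value means they rise from lower left to upper right.
print(demand_direction < 0, supply_direction > 0)
```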
1.5.2 Graphic Method
In this method the individual values of the two variables are plotted on the graph paper. Therefore
two curves are obtained – one for X variable and another for Y variable.
The graph is interpreted as follows :
(i) If both the curves run parallel or nearly parallel, or move in the same direction, there is
positive correlation.
(ii) On the other hand, if the curves move in opposite directions, there is negative
correlation.

Example 1 : Show correlation from the following data by graphic method;
Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Average Income (Rs.) 100 110 125 140 150 180 200 220 250 360
Average Expenditure (Rs.) 90 95 100 120 120 140 150 170 200 260
Solution :

[Graph: Average Income and Average Expenditure (Rs.) plotted as two curves against Years, 2005 to 2014]

The graph prepared shows that income and expenditure have a close positive correlation. As
income increases, the expenditure also increases.
1.5.3 Karl Pearson’s Coefficient of Correlation
Karl Pearson’s method, popularly known as Pearsonian co-efficient of correlation, is most widely
applied in practice to measure correlation. The Pearsonian co-efficient of correlation is represented
by the symbol r.
According to Karl Pearson’s method, coefficient of correlation between the variables is obtained
by dividing the sum of the products of the corresponding deviations of the various items of two
series from their respective means by the product of their standard deviations and the number of
pairs of observations. Symbolically,

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \ldots\text{(i)} $$

where $r$ stands for the coefficient of correlation,
$x_1, x_2, x_3, x_4, \ldots, x_n$ are the deviations of the various items of the first variable from the mean,
$y_1, y_2, y_3, \ldots, y_n$ are the deviations of all items of the second variable from the mean,
$\sum xy$ is the sum of the products of these corresponding deviations, $N$ stands for the number of pairs,
$\sigma_x$ stands for the standard deviation of the X variable and $\sigma_y$ stands for the standard deviation of the Y variable.

$$ \sigma_x = \sqrt{\frac{\sum x^2}{N}} \quad\text{and}\quad \sigma_y = \sqrt{\frac{\sum y^2}{N}} $$

If we substitute the values of $\sigma_x$ and $\sigma_y$ in formula (i), we get

$$ r = \frac{\sum xy}{N \sqrt{\dfrac{\sum x^2}{N} \cdot \dfrac{\sum y^2}{N}}} \quad\text{or}\quad r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$

Degree of correlation varies between +1 and –1; the result will be +1 in case of perfect positive
correlation and –1 in case of perfect negative correlation.
Computation of correlation coefficient can be simplified by dividing the given data by a common
factor. In such a case, the final result is not multiplied by the common factor because coefficient of
correlation is independent of change of scale and origin.
Example 2 : Calculate Coefficient of Correlation from the following data :
X 50 100 150 200 250 300 350
Y 10 20 30 40 50 60 70
Solution :

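Here X̄ = 200 and Ȳ = 40. The deviations are divided by the common factors 50 and 10 respectively.

X      X – X̄     x = (X – X̄)/50    x²     Y     Y – Ȳ     y = (Y – Ȳ)/10    y²     xy
50     – 150     –3                9      10    – 30      –3                9      9
100    – 100     –2                4      20    – 20      –2                4      4
150    – 50      –1                1      30    – 10      –1                1      1
200    0         0                 0      40    0         0                 0      0
250    + 50      +1                1      50    + 10      +1                1      1
300    + 100     +2                4      60    + 20      +2                4      4
350    + 150     +3                9      70    + 30      +3                9      9
Total            Σx = 0            Σx² = 28               Σy = 0            Σy² = 28   Σxy = 28

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$

By substituting the values we get

$$ r = \frac{28}{\sqrt{28 \times 28}} = \frac{28}{28} = 1 $$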
Hence there is perfect positive correlation.
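The claim that r is unaffected by dividing the data by a common factor can be checked on the figures of Example 2. The following Python sketch is an illustration only; the function name `pearson_r` is an assumption of this sketch, and it implements the deviation formula $r = \sum xy / \sqrt{\sum x^2 \cdot \sum y^2}$ given above.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations taken about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

X = [50, 100, 150, 200, 250, 300, 350]
Y = [10, 20, 30, 40, 50, 60, 70]

r_raw = pearson_r(X, Y)
# Divide X by 50 and Y by 10 (change of scale), as done in the worked solution.
r_scaled = pearson_r([x / 50 for x in X], [y / 10 for y in Y])

print(round(r_raw, 6), round(r_scaled, 6))
```

Both calls give the same value, which is why the final result is not multiplied back by the common factors.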
Example 3 : A sample of five items is taken from the production of a firm. The length and weight of the
five items are given below:
Length (inches) 3 4 6 7 10
Weight (ounces) 9 11 14 15 16
Calculate Karl Pearson’s correlation coefficient between length and weight and interpret the
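value of correlation coefficient.

Solution :

$$ \bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6 \quad\text{and}\quad \bar{Y} = \frac{\sum Y}{N} = \frac{65}{5} = 13 $$

X         x = (X – X̄)    x²          Y         y = (Y – Ȳ)    y²          xy
3         –3             9           9         –4             16          12
4         –2             4           11        –2             4           4
6         0              0           14        +1             1           0
7         +1             1           15        +2             4           2
10        +4             16          16        +3             9           12
ΣX = 30   Σx = 0         Σx² = 30    ΣY = 65   Σy = 0         Σy² = 34    Σxy = 30

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} \quad\text{where } \sum xy = 30,\ \sum x^2 = 30 \text{ and } \sum y^2 = 34 $$

$$ r = \frac{30}{\sqrt{30 \times 34}} = \frac{30}{\sqrt{1020}} = 0.939 \quad\text{Ans.} $$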
The value of r indicates that there exists a high degree positive correlation between lengths
and weights.
Example 4. From the following data, compute the coefficient of correlation between X and Y :
X Series Y Series
Number of items 15 15
Arithmetic Mean 25 18
Sum of squares of deviations from Mean 136 138
Summation of product deviations of X and Y from their Arithmetic Means = 122.
Solution : Denoting deviations of X and Y from their arithmetic means by x and y respectively, the
given data are : Σx² = 136, Σxy = 122, and Σy² = 138

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137} = 0.89 \quad\text{Ans.} $$

Short-cut Method: When the means are fractional, deviations from them are difficult to work with. To
avoid such calculations, deviations are taken from assumed means while calculating the coefficient of
correlation. Because the deviations are taken from assumed means, the formula is modified to correct
the standard deviations accordingly. Karl Pearson’s formula for the short-cut method is given below :

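$$ r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}} \quad\text{or}\quad r = \frac{N \sum dx\,dy - \sum dx \cdot \sum dy}{\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}} $$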

Example 5 : Compute the coefficient of correlation from the following data :


Marks in Statistics 20 30 28 17 19 23 35 13 16 38
Marks in Mathematics 18 35 20 18 25 28 33 18 20 40
Solution :

Marks in         dx = (X – 30)    dx²      Marks in      dy = (Y – 30)    dy²      dxdy
Statistics, X                              Maths, Y
20               – 10             100      18            – 12             144      + 120
30               0                0        35            + 5              25       0
28               – 2              4        20            – 10             100      + 20
17               – 13             169      18            – 12             144      + 156
19               – 11             121      25            – 5              25       + 55
23               – 7              49       28            – 2              4        + 14
35               + 5              25       33            + 3              9        + 15
13               – 17             289      18            – 12             144      + 204
16               – 14             196      20            – 10             100      + 140
38               + 8              64       40            + 10             100      + 80
N = 10           Σdx = – 61       Σdx² = 1017            Σdy = – 45       Σdy² = 795   Σdxdy = 804

$$ r = \frac{N \sum dx\,dy - \sum dx \cdot \sum dy}{\sqrt{\left\{N \sum dx^2 - (\sum dx)^2\right\}\left\{N \sum dy^2 - (\sum dy)^2\right\}}} $$

where dx = deviations of X series from an assumed mean 30.
dy = deviations of Y series from an assumed mean 30.
Σdx² = sum of squares of the deviations of X series from assumed mean.
Σdy² = sum of squares of the deviations of Y series from assumed mean.
Σdxdy = sum of the products of deviations of X and Y series from their assumed means.

$$ \therefore\ r = \frac{10 \times 804 - (-61)(-45)}{\sqrt{\left\{10 \times 1017 - (-61)^2\right\}\left\{10 \times 795 - (-45)^2\right\}}} $$

$$ \text{or}\quad r = \frac{8040 - 2745}{\sqrt{(10170 - 3721)(7950 - 2025)}} = \frac{5295}{\sqrt{6449 \times 5925}} = \frac{5295}{6181.45} \approx 0.857 $$

Direct Method of Computing Correlation Coefficient
Correlation coefficient can also be computed from given X and Y values by using the formula given
below :

$$ r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\ \sqrt{N \sum Y^2 - (\sum Y)^2}} $$

This formula gives the same answer as we get by taking deviations from
the actual mean or an arbitrary (assumed) mean.
Example 6. Compute the coefficient of correlations from the following data :
Marks in Statistics 20 30 28 17 19 23 35 13 16 38
Marks in Mathematics 18 35 20 18 25 28 33 18 20 40
Solution :
Marks in         Marks in
Statistics, X    Mathematics, Y    X²            Y²            XY
20               18                400           324           360
30               35                900           1225          1050
28               20                784           400           560
17               18                289           324           306
19               25                361           625           475
23               28                529           784           644
35               33                1225          1089          1155
13               18                169           324           234
16               20                256           400           320
38               40                1444          1600          1520
ΣX = 239         ΣY = 255          ΣX² = 6357    ΣY² = 7095    ΣXY = 6624

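Substitute the computed values in the formula given below,

$$ r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2}\ \sqrt{N \sum Y^2 - (\sum Y)^2}} $$

$$ = \frac{(10 \times 6624) - (239)(255)}{\sqrt{10 \times 6357 - (239)^2}\ \sqrt{10 \times 7095 - (255)^2}} $$

$$ = \frac{66240 - 60945}{\sqrt{63570 - 57121}\ \sqrt{70950 - 65025}} = \frac{5295}{\sqrt{6449}\ \sqrt{5925}} = \frac{5295}{6181.45} \approx 0.857 $$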

Coefficient of Correlation in a Continuous Series
In the case of a continuous series, we assume that every item which falls within a given class
interval falls exactly at the middle of that class. The formula, because of the presence of frequencies
is modified as follows :

$$ r = \frac{\sum f\,dx\,dy - \dfrac{\sum f dx \cdot \sum f dy}{\sum f}}{\sqrt{\left[\sum f dx^2 - \dfrac{(\sum f dx)^2}{\sum f}\right]\left[\sum f dy^2 - \dfrac{(\sum f dy)^2}{\sum f}\right]}} $$

The various values are calculated as follows :

(i) Take the step deviations of variable X and denote them as dx.
(ii) Take the step deviations of variable Y and denote them as dy.
(iii) Multiply dx, dy and the respective frequency of each cell and write the figure obtained in
the right-hand upper corner of each cell.
(iv) Add all the cornered values calculated in step (iii) to get Σfdxdy.
(v) Multiply the frequencies of the variable X by the deviations of X to get Σfdx.
(vi) Take the squares of the deviations of the variable X and multiply them by the respective
frequencies to get Σfdx².
(vii) Multiply the frequencies of the variable Y by the deviations of Y to get Σfdy.
(viii) Take the squares of the deviations of the variable Y and multiply them by the respective
frequencies to get Σfdy².
(ix) Now substitute the values of Σfdxdy, Σfdx, Σfdx², Σfdy and Σfdy² in the formula to get the
value of r.
Example 7 : The following table gives the ages of husbands and wives at the time of their marriages.
Calculate the correlation coefficient between the ages of husbands and wives.
Ages of Husbands
Age of Wives 20–30 30–40 40–50 50–60 60–70 Total
15–25 5 9 3 – – 17
25–35 – 10 25 2 – 37
35–45 – 1 12 2 – 15
45–55 – – 4 16 5 25
55–65 – – – 4 2 6
Total 5 20 44 24 7 100

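Solution : Let X denote the ages of husbands and Y the ages of wives. Taking the mid-points of the
class intervals and the step deviations dx = (X – 45)/10 and dy = (Y – 40)/10, the correlation table
gives Σf = 100, Σfdx = 8, Σfdx² = 92, Σfdy = –34, Σfdy² = 154 and Σfdxdy = 88.

$$ r = \frac{\sum f\,dx\,dy - \dfrac{\sum f dx \cdot \sum f dy}{\sum f}}{\sqrt{\left[\sum f dx^2 - \dfrac{(\sum f dx)^2}{\sum f}\right]\left[\sum f dy^2 - \dfrac{(\sum f dy)^2}{\sum f}\right]}} $$

$$ = \frac{88 - \dfrac{(8)(-34)}{100}}{\sqrt{\left[92 - \dfrac{(8)^2}{100}\right]\left[154 - \dfrac{(-34)^2}{100}\right]}} = \frac{90.72}{\sqrt{91.36 \times 142.44}} = \frac{90.72}{114.08} \approx 0.795 $$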

1.5.4 Properties of Coefficient of Correlation


Following are some of the important properties of r :
(1) The coefficient of correlation lies between –1 and +1 (–1 ≤ r ≤ +1).
(2) The coefficient of correlation is independent of change of scale and origin of the variables
X and Y.
(3) The coefficient of correlation is the geometric mean of the two regression coefficients:

$$ r = \pm\sqrt{b_{xy} \times b_{yx}} $$
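Property (3) can be verified with a short derivation. The sketch below assumes the standard definitions of the two regression coefficients in terms of deviations from the means, $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$, which are not stated in this lesson.

```latex
\begin{align*}
b_{xy} \times b_{yx}
  &= \frac{\sum xy}{\sum y^2} \times \frac{\sum xy}{\sum x^2}
   = \frac{\left(\sum xy\right)^2}{\sum x^2 \cdot \sum y^2}
   = r^2 \\
\therefore\quad r &= \pm\sqrt{b_{xy} \times b_{yx}}
\end{align*}
```

Both regression coefficients carry the sign of $\sum xy$, so r takes that same sign.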
Merits of Pearson’s coefficient of correlation : The coefficient of correlation summarizes in
one figure not only the degree of correlation but also its direction. Its value varies between +1
and –1.
Demerits of Pearson’s coefficient of correlation : It always assumes a linear relationship between
the variables, although this assumption may be wrong. Secondly, it is not easy to interpret the significance
of the correlation coefficient. The method is also time consuming and is affected by extreme items.
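The linearity assumption matters in practice: two variables can be perfectly related and still give r = 0 when the relationship is curvilinear. The following Python sketch uses hypothetical data (Y = X² on values symmetric about zero) chosen only to illustrate this demerit; the function name `pearson_r` is an assumption of this sketch.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations taken about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: Y is completely determined by X, but not linearly.
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x * x for x in X]

r_curvilinear = pearson_r(X, Y)
print(r_curvilinear)
```

A value of r near zero therefore rules out a linear relationship only; a scatter diagram should be inspected before concluding that the variables are unrelated.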
1.5.5 Probable Error of Coefficient of Correlation
It is calculated to find out how far Pearson’s coefficient of correlation is reliable in a particular
case.

$$ \text{P.E. of coefficient of correlation} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

where r = coefficient of correlation and N = number of pairs of items.
If the probable error calculated is added to and subtracted from the coefficient of correlation,
it would give us such limits within which we can expect the value of the coefficient of correlation
to vary.
If r is less than probable error, then there is no real evidence of correlation.
If r is more than 6 times the probable error, the coefficient of correlation is considered highly
significant.
If r is more than 3 times the probable error but less than 6 times, correlation is considered
significant but not highly significant.
If the probable error is not much and the given r is more than the probable error but less than
3 times of it, nothing definite can be concluded.
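The rules above can be collected into a small routine. The following Python sketch is an illustration; the function names and the example values (r = 0.8 with N = 25, and so on) are hypothetical, and the sketch compares the absolute value of r with the probable error, which is an assumption made here so that negative coefficients are handled.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

def interpret(r, n):
    """Apply the rules of thumb stated in the text to |r| (assumption: sign of r is ignored)."""
    pe = probable_error(r, n)
    size = abs(r)
    if size < pe:
        return "no real evidence of correlation"
    if size > 6 * pe:
        return "highly significant"
    if size > 3 * pe:
        return "significant but not highly significant"
    return "nothing definite can be concluded"

pe_example = probable_error(0.8, 25)
limits = (0.8 - pe_example, 0.8 + pe_example)   # r minus P.E. and r plus P.E.

print(round(pe_example, 4), interpret(0.8, 25))
print(interpret(0.1, 9))
print(interpret(0.5, 9))
```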
1.5.6 Rank Correlation
There are many problems of business and industry when it is not possible to measure the variable
under consideration quantitatively or the statistical series is composed of items which can not be
exactly measured. For instance, it may be possible for the two judges to rank six different brands of
cigarettes in terms of taste, whereas it may be difficult to give them a numerical grade in terms of
taste. In such problems, Spearman’s coefficient of rank correlation is used. The formula for rank
correlation is :

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \quad\text{or}\quad \rho = 1 - \frac{6 \sum D^2}{N^3 - N} $$

where ρ stands for rank coefficient of correlation
D refers to the difference of ranks between paired items
N refers to the number of paired observations.
The value of rank correlation coefficient varies between +1 and –1. When ρ = +1,
there is complete agreement in the order of ranks and the ranks will be in the same order. When
ρ = –1, the ranks will be in opposite direction showing complete disagreement in the order of ranks.
Let us understand with the help of an example.
Example 8 : Ranks of 10 individuals at the start and at the finish of a course of training are given :
Individual : A B C D E F G H I J
Rank before : 1 6 3 9 5 2 7 10 8 4
Rank after : 6 8 3 7 2 1 5 9 4 10
Calculate coefficient of correlation.
Solution :
Individual    Rank before    Rank after    D = (R1 – R2)    D²
              R1             R2
A             1              6             –5               25
B             6              8             –2               4
C             3              3             0                0
D             9              7             2                4
E             5              2             3                9
F             2              1             1                1
G             7              5             2                4
H             10             9             1                1
I             8              4             4                16
J             4              10            –6               36
N = 10                                                      ΣD² = 100
By applying the formula,

$$ \rho = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 100}{10^3 - 10} = 1 - 0.606 = 0.394 $$
When we are given the actual data and not the ranks, it becomes necessary for us to assign the
ranks. Ranks can be assigned by taking either the highest value as one or the lowest value as one.
But if we start by taking the highest value or the lowest value we must follow the same order for
both the variables to assign ranks.
Example 9 : Calculate rank correlation from the following data :
X : 17 13 15 16 6 11 14 9 7 12
Y : 36 46 35 24 12 18 27 22 2 8

Solution :
Calculation of Rank Correlation
X         Rank R1    Y         Rank R2    D = (R1 – R2)    D²
17        1          36        2          –1               1
13        5          46        1          +4               16
15        3          35        3          0                0
16        2          24        5          –3               9
6         10         12        8          +2               4
11        7          18        7          0                0
14        4          27        4          0                0
9         8          22        6          +2               4
7         9          2         10         –1               1
12        6          8         9          –3               9
N = 10                                                     ΣD² = 44
Rank correlation coefficient is calculated as follows :

$$ \rho = 1 - \frac{6 \sum D^2}{N^3 - N} $$

$$ \rho = 1 - \frac{6 \times 44}{10^3 - 10} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733 $$
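The ranking and the calculation of Example 9 can be sketched in a few lines. The following Python code is an illustration only; the function names are hypothetical, rank 1 is given to the highest value as in the solution above, and the ranking helper assumes that no two values are equal.

```python
def ranks_descending(values):
    """Rank 1 goes to the highest value. Assumes all values are distinct (no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Return (rho, sum of squared rank differences) using rho = 1 - 6*sum(D^2)/(N^3 - N)."""
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2

X = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
Y = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]

rho, sum_d2 = spearman_rho(X, Y)
print(sum_d2, round(rho, 3))
```

Ranking both series from the lowest value instead would leave every D, and therefore the coefficient, unchanged.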
In some cases it becomes necessary to assign an identical rank to two or more items. In such cases,
it is customary to give each item an average rank. Therefore, if two items are tied for the 4th and 5th
ranks, each item shall be ranked 4.5, i.e., (4 + 5)/2. It means that where two or more items are to be ranked
equal, the rank assigned for purposes of calculating the coefficient of correlation is the average of the ranks
which these items would have got had they differed slightly from each other. When equal ranks are
assigned to some items, the rank correlation formula is also adjusted. The adjustment consists of
adding (1/12)(m³ – m) to the value of ΣD² for each group of tied items, where m stands for the number
of items whose ranks are identical.

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N} $$

Let us take an example to understand this.
Example 10 : Compute the rank correlation coefficient from the following data :
Section A : 115 109 112 87 98 98 120 100 98 118
Section B : 75 73 85 70 76 65 82 73 68 80

Solution :
Computation of Rank correlation coefficient.
Series A    Rank R1    Series B    Rank R2    D = (R1 – R2)    D²
115         8          75          6          +2               4
109         6          73          4.5        +1.5             2.25
112         7          85          10         –3               9
87          1          70          3          –2               4
98          3          76          7          –4               16
98          3          65          1          +2               4
120         10         82          9          +1               1
100         5          73          4.5        +0.5             0.25
98          3          68          2          +1               1
118         9          80          8          +1               1
N = 10                                                         ΣD² = 42.50
Apply the formula to calculate rank correlation :

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N} $$

Item 98 is repeated three times in series A. Hence m = 3. In series B the item 73 is repeated two
times and so m = 2.

$$ \rho = 1 - \frac{6\left[42.50 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} $$

$$ \rho = 1 - \frac{6(42.50 + 2 + 0.50)}{1000 - 10} = 1 - \frac{270}{990} = 0.727 $$
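Average ranks and the tie adjustment of Example 10 can be sketched as follows. This Python code is an illustration only; the function names are hypothetical, and rank 1 is given to the lowest value, as in the solution above.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 goes to the lowest value; tied values share the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_with_ties(xs, ys):
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n), sum_d2

section_a = [115, 109, 112, 87, 98, 98, 120, 100, 98, 118]
section_b = [75, 73, 85, 70, 76, 65, 82, 73, 68, 80]

rho_tied, sum_d2_tied = spearman_rho_with_ties(section_a, section_b)
print(sum_d2_tied, tie_correction(section_a), tie_correction(section_b), round(rho_tied, 3))
```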
1.5.7 Concurrent Deviation Method
This is the simplest method of studying correlation. The only thing to be computed under this
method is the direction of change of both the variables. The formula is

$$ r_c = \pm\sqrt{\pm\left(\frac{2C - N}{N}\right)} $$

where rc = Coefficient of concurrent deviations.
C = Number of concurrent deviations.
N = Number of pairs of deviations compared.
The sign inside and outside the square root is taken as plus when (2C – N) is positive and as minus
when (2C – N) is negative.
The procedure of calculating the coefficient of correlation under this method is quite simple, as
explained below:
(i) Find the direction of change of each variable by comparing every value with the preceding
value, and assign a + sign for an increase, a – sign for a decrease and 0 for no change.
(ii) Denote these two columns by Dx and Dy.
(iii) Multiply Dx with Dy and determine the value of C, which is the number of positive products
(i.e., the number of pairs in which both variables move in the same direction).
(iv) Apply the formula.
We can understand by taking an example.
Example 11 : Calculate the coefficient of correlation by concurrent deviation from the following
data
X : 100 120 135 135 115 110 120
Y : 50 40 60 80 80 55 65
Solution :
X        Dx     Y       Dy     DxDy
100             50
120      +      40      –      –
135      +      60      +      +
135      0      80      +      0
115      –      80      0      0
110      –      55      –      +
120      +      65      +      +
N = 6                          C = 3

$$ r_c = \pm\sqrt{\pm\left(\frac{2C - N}{N}\right)} = \pm\sqrt{\pm\left(\frac{2 \times 3 - 6}{6}\right)} = \pm\sqrt{\frac{0}{6}} = 0 $$

Therefore, no correlation exists between the variables.
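The steps of the concurrent deviation method can be sketched as follows. This Python code is an illustration only; the function names are hypothetical, and the sign convention (the sign of 2C – N is used both inside and outside the square root) is the usual one for this formula.

```python
import math

def direction(previous, current):
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    return (current > previous) - (current < previous)

def concurrent_deviation(xs, ys):
    """Return (r_c, C, N) where C counts pairs of changes in the same direction."""
    dx = [direction(a, b) for a, b in zip(xs, xs[1:])]
    dy = [direction(a, b) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                      # pairs of deviations compared
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # positive products only
    inner = (2 * c - n) / n
    sign = 1 if inner >= 0 else -1
    return sign * math.sqrt(abs(inner)), c, n

X = [100, 120, 135, 135, 115, 110, 120]
Y = [50, 40, 60, 80, 80, 55, 65]

r_c, c, n = concurrent_deviation(X, Y)
print(c, n, r_c)
```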

1.6 SUMMARY
Correlation analysis deals with bivariate and multivariate data.
Correlation is a study of the co-variation of the variables involved.
When changes in the variables occur in the same direction, they are positively correlated and
when the movements are in the opposite directions, the correlation is negative.
Correlation between two variables would result either when one of them is the cause while the
other is the effect or when both of them are affected by some common factors. It may also be
spurious correlation, resulting from chance when the factors affecting each one have nothing in
common.
Correlation between variables may be of varying degrees ranging from perfect to high, moderate,
low and no correlation.
Correlation may be linear or non-linear. Only linear correlation is considered here.
Graphically, correlation is studied by means of a scatter diagram. If dots representing pairs of
data values are seen to fall on a straight line, the correlation is perfect. The degree of correlation
decreases as the points lie farther and farther away from the line.
Widely scattered dots with no clear direction, and dots in a line that is parallel to either of the
axes, mean absence of correlation.
Numerically, the correlation is measured and expressed in terms of Karl Pearson’s coefficient
of correlation which is defined as the ratio of covariance to the product of standard deviations
of the two series involved.
Its calculation can be done by measuring deviations of the observations from their respective
means or assumed mean values, and even by not measuring deviations.
The coefficient of correlation varies between –1 and +1 and is independent of the change of origin
and scale.

1.7 SELF ASSESSMENT QUESTIONS


Exercise 1: True or False statements
(i) The correlation analysis is related to the examination or the nature of relationship between
variables.
(ii) Covariation implies that two variables would vary in the same direction.
(iii) Negative correlation in two series means that as the value of one of the variables decreases,
the other would also decrease.
(iv) Graphic representation of correlation is done by means of a scatter diagram.
(v) A straight line on a scatter diagram having a zero slope implies that there is perfect positive
correlation between the variables.
(vi) The coefficient of correlation contains in a single number the extent and the direction of
relationship between the variables.
(vii) The absolute value of the covariance between X and Y can be at most equal to the product
of the standard deviations of X and Y series.
(viii) The coefficient of correlation always lies between 0 and 1.
(ix) The coefficient of correlation can be calculated without measuring deviations from actual
or assumed means.
(x) The probable error aims at establishing the dependability of the coefficient of correlation.
(xi) For one set of data, r = 0.8 and for another set r = 0.4. It means that correlation is twice as
strong in the first set as in the second.
(xii) A correlation between two variables may exist because both of them may be influenced
by some common factors.
(xiii) For a given set of X and Y values, the coefficient of correlation is found to be equal to 0.6.
If the variables are interchanged so that X becomes Y and Y becomes X, then the coefficient
for the new set of data may or may not be equal to 0.6.
(xiv) If all the X values in a given set of paired data are subtracted from a constant K, it will
have no effect on the value of the correlation coefficient.
(xv) The coefficient of correlation is independent of the change of origin and scale.
(xvi) If the coefficient of correlation between X and Y is 0.7, then the coefficient of correlation
between –X and –Y would be equal to –0.7.
(xvii) The correlation is said to be significant only when |r| > 6PE.
(xviii) For the rank correlation to be calculated, it is necessary that the given variables should
not be quantifiable.
(xix) The coefficient of rank correlation has the same limits as the Karl Pearson’s coefficient of
correlation has.
(xx) Rank correlation can be used even when the variables under consideration are quantifiable
and not normally distributed.
Ans. (i) F, (ii) F, (iii) F, (iv) T, (v) F, (vi) T, (vii) T, (viii) F, (ix) T, (x) T, (xi) F, (xii) T, (xiii) F,
(xiv) F, (xv) T, (xvi) F, (xvii) T, (xviii) F, (xix) T, (xx) T
Exercise 2 : Questions and Answers
(i) What is correlation? Distinguish between positive and negative correlation. How is ‘scatter
diagram’ method helpful in the study of correlation?
(ii) What is a scatter diagram? How does it help in studying the degree and direction of
correlation between two variables? Illustrate with some sketches.
(iii) Define Karl Pearson’s coefficient of correlation. Explain the general rules for interpreting
the coefficient. In this connection, also state the meaning and significance of the concept
of probable error.
(iv) State and explain the properties of the coefficient of correlation. Also, state the assumptions
underlying.
(v) What do you understand by the statement that coefficient of correlation is independent of
the change of origin and scale?
(vi) Does correlation imply the existence of cause and effect relationship between the variables
involved? Does cause and effect relationship between variables result in correlation
between them? Explain with the help of suitable examples.
(vii) Define rank correlation. Write Spearman’s formula for rank correlation coefficient when
some ranks are tied and when ranks are not tied. What are the limits of this coefficient?
Interpret the case where this coefficient assumes the minimum value.
(viii) For a given series of paired data, the following information is available:
Covariance between X and Y series = –32.6

Standard deviation of X series = 8.6
Standard deviation of Y series = 4.8
No. of pairs of observations = 15
Calculate the coefficient of correlation.
(ix) Given the following information:
Number of pairs of observations of X and Y series =15
X series arithmetic mean = 25
Y series arithmetic mean =18
X series standard deviation = 3.0
Y series standard deviation = 3.03
Summation of the products of corresponding deviations of X and Y series = 122
Calculate the coefficient of correlation between X and Y series.
(x) Given:
Total of multiplication of deviations of X and Y = 3,476
No. of pairs of observations = 12
Total of deviations of X = – 176
Total of deviations of Y = – 26
Total of squares of deviations of X = 8,288
Total of squares of deviations of Y = 2,556
Using this information, calculate the coefficient of correlation when the arbitrary mean
values of X and Y are 85 and 22, respectively.
(xi) For a set of bivariate data, you are given the following information:
Σ(X – 58) = 46, Σ(Y – 58) = 19, Σ(X – 58)(Y – 58) = 1,095,
Σ(X – 58)² = 1,483, and Σ(Y – 58)² = 3,086
Number of pairs of observations = 8
Calculate the coefficient of correlation between X and Y.
(xii) The co-efficient of correlation between two variables X and Y is – 0.4 and their covariance
is equal to –16. If variance of Y series is 36, find the second moment about mean of X
series.
(xiii) Given below is the information relating to marks in Statistics (X) and marks in Accountancy
(Y) obtained by the students of a class:
Co-variance between X and Y = 144
Second moment of X about 20 = 244
First moment of X about 20 = 10
Arithmetic mean of Y = 45
Coefficient of correlation between X and Y = 0.75

Calculate coefficient of variation for marks in Statistics and that for marks in Accountancy.
In which subject is the performance of students more consistent?
(xiv) The coefficient of correlation between X and Y for 20 items is 0.3. The mean of X is 15
and that of Y is 20 while the respective standard deviations are 4 and 5. At the time of
calculation, one item 27 has wrongly been taken as 17 in the case of X series and 35
instead of 30 in the case of Y series. Find the correct coefficient of correlation.
(xv) While making calculations about coefficient of correlation, a student obtained the following
results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, and ΣXY = 508
It was discovered later, however, that two pairs of values were wrongly recorded as :
  X    Y                                      X    Y
  6   14    while the correct values were:    8   12
  8    6                                      6    8
Obtain the correct value of the coefficient of correlation.
(xvi) Find the coefficient of correlation between age and playing habits of the following students:

Age of Players: 16 17 18 19 20 21
No. of Students: 2,500 2,000 1,500 1,200 1,000 800
Regular Players: 2,250 1,200 1,050 480 250 120
(xvii) A panel of judges A and B graded seven dramatic performances by independently awarding
marks as given here.
Performance : 1 2 3 4 5 6 7
Marks by A: 46 42 44 40 43 41 45
Marks by B: 40 38 36 35 39 37 41
Show by means of coefficient of correlation whether the marks given by them are correlated.
(xviii) Calculate the coefficient of correlation between height and weight of the students using
the following data:

Height                     Weight (lbs)
(inches)     90–100   100–110   110–120   120–130
50–55 4 7
55–60 6 10 7
60–65 10 12 7
65–70 8 6 13

Ans. 8. –0.790, 9. 0.895, 10. 0.819, 11. 0.512, 12. 44.44, 13. CV : Stats = 40% Accs = 35.56%
14. 0.515, 15. 0.667, 16. – 0.958, 17. r = 0.750, 18. 0.583

LESSON 2
REGRESSION ANALYSIS

2. STRUCTURE
2.0 Objective
2.1 Introduction
2.2 Difference between Correlation and Regression
2.3 Principle of Least Squares
2.4 Methods of Regression Analysis
2.4.1 Graphic Method
2.4.2 Algebraic Method
2.5 Properties of Regression Coefficients
2.6 Standard Error of an Estimate
2.7 Summary
2.8 Self Assessment Questions

2.0 OBJECTIVE
After studying this lesson, you should be able to :
(i) Understand the concept of regression analysis
(ii) Differentiate between correlation and regression analysis
(iii) Compute regression coefficients by different methods and draw regression lines
(iv) Comprehend properties of Regression Coefficients
(v) Apply regression analysis to predict a dependent variable given the independent variable.

2.1 INTRODUCTION
The statistical technique correlation establishes the degree and direction of relationship between
two or more variables. But we may be interested in estimating the value of an unknown variable on
the basis of a known variable. If we know the index of money supply and price-level, we can find
out the degree and direction of relationship between these indices with the help of correlation
technique. But the regression technique helps us in determining what the general price-level would
be assuming a fixed supply of money. Similarly if we know that the price and demand of a commodity
are correlated we can find out the demand for that commodity for a fixed price. Hence, the statistical
tool with the help of which we can estimate or predict the unknown variable from known variable
is called regression. The meaning of the term “Regression” is the act of returning or going back.
This term was first used by Sir Francis Galton in 1877 when he studied the relationship between the
height of fathers and sons. His study revealed a very interesting relationship. Tall fathers tend to
have tall sons and short fathers short sons, but the average height of the sons of a group of tall
fathers was less than that of the fathers, and the average height of the sons of a group of short
fathers was greater than that of the fathers. The line describing this tendency of going back is called
“Regression Line”. Modern writers have started to use the term estimating line instead of regression
line because the expression estimating line is clearer in character. According to Morris Myers
Blair, regression is the measure of the average relationship between two or more variables in terms
of the original units of the data.
Regression analysis is a branch of statistical theory which is widely used in all the scientific
disciplines. It is a basic technique for measuring or estimating the relationship among economic
variables that constitute the essence of economic theory and economic life. The uses of regression
analysis are not confined to economics and business activities. Its applications are extended to
almost all the natural, physical and social sciences. The regression technique can be extended to
three or more variables but we shall limit ourselves to problems having two variables in this lesson.
Regression analysis is of great practical use even more than the correlation analysis. Some of
the uses of the regression analysis are given below :
(i) Regression Analysis helps in establishing a functional relationship between two or more
variables. Once this is established it can be used for various analytic purposes.
(ii) With the use of electronic machines and computers, the tedium of calculating regression
equations, particularly those expressing multiple and non-linear relations, has been reduced
considerably.
(iii) Since most of the problems of economic analysis are based on cause and effect relationship,
the regression analysis is a highly valuable tool in economic and business research.
(iv) The regression analysis is very useful for prediction purposes. Once a functional
relationship is established the value of the dependent variable can be estimated from the
given value of the independent variables.

2.2 DIFFERENCE BETWEEN CORRELATION AND REGRESSION


Both the techniques are directed towards a common purpose of establishing the degree and direction
of relationship between two or more variables but the methods of doing so are different. The choice
of one or the other will depend on the purpose. If the purpose is to know the degree and direction of
relationship, correlation is an appropriate tool but if the purpose is to estimate a dependent variable
with the substitution of one or more independent variables, the regression analysis shall be more
helpful. The points of difference are discussed below:
(i) Degree and Nature of Relationship : The correlation coefficient is a measure of degree of
covariability between two variables whereas regression analysis is used to study the nature of
relationship between the variables so that we can predict the value of one on the basis of
another. The reliance that can be placed on the estimates or predictions depends upon the
closeness of relationship between the variables.
(ii) Cause and Effect Relationship : The cause and effect relationship is explained by regression
analysis. Correlation is only a tool to ascertain the degree of relationship between two variables
and we cannot say that one variable is the cause and the other the effect. A high degree of
correlation between price and demand for a commodity at a particular point of time may not
suggest which is the cause and which is the effect. However, in regression analysis the cause and
effect relationship is clearly expressed: one variable is taken as dependent and the other as
independent.
The variable which is the basis of prediction is called independent variable and the variable
that is to be predicted is called dependent variable. The independent variable is represented by X
and the dependent variable by Y.

2.3 PRINCIPLE OF LEAST SQUARES
Regression refers to the average relationship between a dependent variable and one or more
independent variables. Such a relationship is generally expressed by a line of regression drawn by the
method of the “Least Squares”. This line of regression can be drawn graphically or derived
algebraically with the help of regression equations. Before the equation of the least squares line can
be determined, some criterion must be established as to what conditions the best line should satisfy.
The condition usually stipulated in regression analysis is that the sum of the squares of the deviations
of the observed Y values from the fitted line shall be minimum. This is known as the least squares
or minimum squared error criterion.
A line fitted by the method of least squares is the line of best fit. The line satisfies the following
conditions :
(i) The algebraic sum of the deviations of the points above and below the line is equal to zero:
Σ(x – xc) = 0 and Σ(y – yc) = 0
where xc and yc are the values derived with the help of regression technique.
(ii) The sum of the squares of all these deviations is less than the sum of the squares of
deviations from any other value, i.e.,
Σ(x – xc)² is smaller than Σ(x – A)² and
Σ(y – yc)² is smaller than Σ(y – A)²
where A is some other value or any other straight line.
(iii) The lines of regression (best fit) intersect at the mean values of the variables, i.e., x̄ and ȳ.
(iv) When the data represent a sample from a larger population, the least square line is the
best estimate of the population line.
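
Conditions (i) to (iii) can be checked numerically. The following is a minimal Python sketch, assuming a small made-up data set (the values in xs and ys are illustrative and do not come from this lesson) and two hypothetical helper functions, fit_y_on_x and sum_sq_dev. It fits the line of Y on X and compares it with a few neighbouring lines.

```python
# Illustrative data only (an assumption, not taken from the lesson).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_y_on_x(xs, ys):
    """Return (a, b) of the least squares line Yc = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_sq_dev(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from the line a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_y_on_x(xs, ys)

# Condition (i): deviations above and below the fitted line cancel out.
residual_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

# Condition (ii): every neighbouring line has a larger sum of squares.
best = sum_sq_dev(a, b, xs, ys)
others = [
    sum_sq_dev(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

# Condition (iii): the fitted line passes through the point of means.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
value_at_mean_x = a + b * mean_x
```

Only a handful of alternative lines are tried here, so this illustrates the minimum property rather than proving it.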

2.4 METHODS OF REGRESSION ANALYSIS


We can study regression by the following methods :
1. Graphic method (regression lines)
2. Algebraic method (regression equations)
We shall understand these methods.
2.4.1 Graphic Method
When we apply this method, the different pairs of values of the variables are plotted as points on
graph paper. These points form a scatter diagram. A regression line may be drawn through these
points, either free hand or with a scale, in such a way that the sum of the squares of the vertical or
horizontal distances between the points and the line of regression is minimum. It should be drawn
in such a manner that the line leaves an equal number of points on both sides. However, this is
rather difficult to ensure, and the method renders only a rough estimate which cannot be completely
free from the subjectivity of the person drawing it. Such a line can be a straight line or a curved line
depending upon the scatter of points and the relationship to be established. A non-linear free hand
curve will have a greater element of subjectivity, so a straight line is generally drawn. Let us
understand it with the help of an example :

Example 1 :
Height of fathers Height of sons
(Inches) (Inches)
65 68
63 66
67 68
64 65
68 69
62 66
70 68
66 65
68 71
67 67
69 68
71 70
Solution : The diagram given below shows the height of fathers on x-axis and the height of sons on
y-axis. The line of regression called the regression of y on x is drawn between the scatter dots.

Fig. 1
Another line of regression called the regression line of x on y is drawn amongst the same set of
scatter dots in such a way that the squares of the horizontal distances between dots are minimised.

Fig. 2

Fig. 3
It is clear that the position of the regression line of x on y is not exactly like that of the regression
line of y on x. In the following figure both the regression of y on x and x on y are exhibited.

Fig. 4
When there is either perfect positive or perfect negative correlation between the two variables,
the two regression lines will coincide and we will have only one line. The farther the two regression
lines from each other, the lesser is the degree of correlation and vice-versa. If the variables are
independent, correlation is zero and the lines of regression will be at right angles. It should be noted
that the regression lines cut each other at the point of average of x and y, i.e., if from the point where
both the regression lines cut each other a perpendicular is drawn on the x-axis, we will get the mean
value of x series and if from that point a horizontal line is drawn on the y-axis we will get the mean
of y series.
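
The statement that the two regression lines cut each other at the point of averages can be checked on the data of Example 1. The sketch below is only an illustration: the helper names mean and slope are hypothetical, and the lines are fitted by least squares rather than drawn free hand.

```python
# Data of Example 1 (heights in inches).
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]  # x
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]     # y


def mean(values):
    return sum(values) / len(values)


def slope(dep, ind):
    """Least squares slope of the regression of dep on ind."""
    mean_dep, mean_ind = mean(dep), mean(ind)
    num = sum((i - mean_ind) * (d - mean_dep) for i, d in zip(ind, dep))
    den = sum((i - mean_ind) ** 2 for i in ind)
    return num / den


mean_x, mean_y = mean(fathers), mean(sons)
b_yx = slope(sons, fathers)    # slope of the line of y on x
b_xy = slope(fathers, sons)    # slope of the line of x on y
a_yx = mean_y - b_yx * mean_x  # line of y on x: y = a_yx + b_yx * x
a_xy = mean_x - b_xy * mean_y  # line of x on y: x = a_xy + b_xy * y

# Solve the two equations simultaneously. This needs b_xy * b_yx != 1,
# i.e. the correlation must not be perfect (otherwise the lines coincide).
x_cross = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
y_cross = a_yx + b_yx * x_cross

print(round(x_cross, 3), round(mean_x, 3))
print(round(y_cross, 3), round(mean_y, 3))
```

The product b_xy * b_yx lies strictly between 0 and 1 for these data, so the two lines are distinct and meet at a single point.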
2.4.2 Algebraic Method
The algebraic method for simple linear regression can be understood by two methods:
(i) Regression Equations
(ii) Regression Coefficients
Regression Equations : These equations are known as estimating equations. Regression
equations are algebraic expressions of the regression lines. As there are two regression lines, there
are two regression equations :
(i) X on Y is used to describe the variations in the values of X for given changes in Y.
(ii) Y on X is used to describe the variations in the values of Y for given changes in X.
The regression equation of Y on X is expressed as
Yc = a + bX
The regression equation of X on Y is expressed as
Xc = a + bY
In these equations a and b are constants which determine the position of the line completely.
These constants are called the parameters of the line. If the value of any of these parameters is
changed, another line is determined.
Parameter a refers to the intercept of the line and b to the slope of the line. The symbols Yc and
Xc refer to the value of Y and the value of X computed on the basis of the independent
variable in each case. If the values of both the parameters are obtained, the line is completely
determined. The values of these two parameters a and b can be obtained by the method of least
squares. With a little algebra and differential calculus it can be shown that the following two
equations, if solved simultaneously, will give values of the parameters a and b such that the least
squares requirement is fulfilled:
For regression equation Yc = a + bX
Σy = Na + bΣx
Σxy = aΣx + bΣx²
For regression equation Xc = a + bY
Σx = Na + bΣy
Σxy = aΣy + bΣy²
These equations are usually called the normal equations. In the equations Σx, Σy, Σxy, Σx², Σy²
indicate totals which are computed from the observed pairs of values of the two variables x and y to
which the least squares estimating line is to be fitted, and N is the number of observed pairs of
values. Let us understand by an example.
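
The calculus step referred to above can be sketched as follows for the line of Y on X. The symbol S for the sum of squared deviations is introduced here only for illustration; the normal equations for X on Y follow in the same way with the roles of x and y interchanged.

```latex
% Sum of squared deviations of the observed y values from the line a + bx
S(a, b) = \sum (y - a - bx)^{2}

% Setting the partial derivative with respect to a equal to zero
\frac{\partial S}{\partial a} = -2 \sum (y - a - bx) = 0
\quad\Longrightarrow\quad \sum y = Na + b \sum x

% Setting the partial derivative with respect to b equal to zero
\frac{\partial S}{\partial b} = -2 \sum x\,(y - a - bx) = 0
\quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^{2}
```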
Example 2: From the following data obtain the two regression equations :
x : 6 2 10 4 8
y : 9 11 5 8 7
Solution :
Computation of Regression Equations
   x         y         xy         x²         y²
   6         9         54         36         81
   2        11         22          4        121
  10         5         50        100         25
   4         8         32         16         64
   8         7         56         64         49
Σx = 30   Σy = 40   Σxy = 214  Σx² = 220  Σy² = 340

Regression line of Y on X is expressed by the equation of the form
Yc = a + bX
To determine the values of a and b, the following two normal equations are solved
Σy = Na + bΣx
Σxy = aΣx + bΣx²
Substituting the values, we get
40 = 5a + 30b ...(i)
214 = 30a + 220b ...(ii)
Multiplying equation (i) by 6, we get
240 = 30a + 180b ...(iii)
214 = 30a + 220b ...(iv)
Deduct equation (iv) from (iii)
– 40b = + 26
∴ b = – 0.65
Substitute the value of b in equation (i)
40 = 5a + 30 (– 0.65)
5a = 40 + 19.5 or a = 11.9
Substitute the values of a and b in the equation
Regression line of Y on X is
Yc = 11.9 – 0.65X
Regression line of X on Y is
Xc = a + bY
The corresponding normal equations are
Σx = Na + bΣy
Σxy = aΣy + bΣy²
Substituting the values
30 = 5a + 40b ...(i)
214 = 40a + 340b ...(ii)
Multiply equation (i) by 8
240 = 40a + 320b ...(iii)
214 = 40a + 340b ...(iv)
Deduct equation (iv) from (iii)
–20b = 26 or b = – 1.3
Substitute the value of b in equation (i)
30 = 5a + 40(– 1.3)
5a = 30 + 52 or a = 16.4

Substitute the values of a and b in the equation. Regression line of X on Y is
Xc = 16.4 – 1.3Y
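
The elimination carried out by hand in Example 2 can be cross-checked with a short Python sketch. The function name solve_normal_equations is hypothetical, and Cramer's rule is used here in place of the step-by-step elimination shown above; both approaches solve the same pair of normal equations.

```python
def solve_normal_equations(n, sum_ind, sum_dep, sum_prod, sum_ind_sq):
    """Solve the pair of normal equations
           sum_dep  = n * a       + b * sum_ind
           sum_prod = a * sum_ind + b * sum_ind_sq
    for (a, b) by Cramer's rule. 'ind' is the independent variable and
    'dep' the dependent variable of the regression equation.
    """
    det = n * sum_ind_sq - sum_ind ** 2
    a = (sum_dep * sum_ind_sq - sum_ind * sum_prod) / det
    b = (n * sum_prod - sum_ind * sum_dep) / det
    return a, b


# Totals from the computation table of Example 2.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 5, 30, 40, 214, 220, 340

# Y on X: X is the independent variable.
a_yx, b_yx = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)

# X on Y: Y is the independent variable.
a_xy, b_xy = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)

print(a_yx, b_yx)
print(a_xy, b_xy)
```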
Regression Coefficients : In the regression equation b is the regression coefficient which
indicates the degree and direction of change in the dependent variable with respect to a change in
the independent variable. In the two regression equations:
Xc = a + bY
Yc = a + bX
where bxy (b) and byx (b) are known as the regression coefficients of the two equations. These
coefficients can be obtained independently without using simultaneous normal equations with these
formulae:
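Regression coefficient of X on Y is
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
$$b_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y}\times\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{N\sigma_y^{2}}$$
$$b_{xy} = \frac{\sum xy}{\sum y^{2}}, \quad\text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$
Regression coefficient of Y on X is
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
$$b_{yx} = \frac{\sum xy}{N\sigma_x\sigma_y}\times\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{N\sigma_x^{2}}$$
$$b_{yx} = \frac{\sum xy}{\sum x^{2}}, \quad\text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$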
Example 3: Calculate the regression coefficients from data given below :
                        Series x    Series y
Average                    25          22
Standard deviation          4           5
Coefficient of correlation, r = 0.8
Solution : The coefficient of regression of x on y is
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{4}{5} = 0.64$$
The coefficient of regression of y on x is
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{5}{4} = 1.00$$
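
The two formulae used in Example 3 can be wrapped in a small helper. This is a minimal sketch; the function name regression_coefficients is hypothetical.

```python
import math


def regression_coefficients(r, sd_x, sd_y):
    """Return (b_xy, b_yx) from the correlation coefficient and the two
    standard deviations: b_xy = r * sd_x / sd_y and b_yx = r * sd_y / sd_x."""
    b_xy = r * sd_x / sd_y
    b_yx = r * sd_y / sd_x
    return b_xy, b_yx


# Figures of Example 3: r = 0.8, standard deviations 4 (x) and 5 (y).
b_xy, b_yx = regression_coefficients(0.8, 4, 5)

# The geometric mean of the two coefficients gives back r.
r_back = math.sqrt(b_xy * b_yx)

print(round(b_xy, 2), round(b_yx, 2), round(r_back, 2))
```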

2.5 PROPERTIES OF REGRESSION COEFFICIENTS
(i) The coefficient of correlation is the geometric mean of the two regression coefficients,
r = ±√(bxy × byx).
(ii) Both the regression coefficients are either positive or negative. It means that they always
have identical sign i.e., either both have positive sign or negative sign.
(iii) The coefficient of correlation and the regression coefficients will also have same sign.
(iv) If one of the regression coefficients is numerically more than unity, the other must be
numerically less than unity because the value of coefficient of correlation cannot exceed one
(–1 ≤ r ≤ +1).
(v) Regression coefficients are independent of the change in the origin but not of the scale.
(vi) The arithmetic mean of the two regression coefficients is numerically greater than or equal
to the correlation coefficient.
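
Property (v) can be illustrated on the data of Example 2. In the sketch below the helper name slope is hypothetical and the shift and scale constants (6, 100 and 5) are arbitrary choices made for illustration.

```python
def slope(dep, ind):
    """Least squares regression coefficient of dep on ind."""
    n = len(ind)
    mean_dep = sum(dep) / n
    mean_ind = sum(ind) / n
    num = sum((i - mean_ind) * (d - mean_dep) for i, d in zip(ind, dep))
    den = sum((i - mean_ind) ** 2 for i in ind)
    return num / den


# Data of Example 2.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
b_yx, b_xy = slope(y, x), slope(x, y)

# Change of origin: subtract 6 from every x and add 100 to every y.
x_shift = [v - 6 for v in x]
y_shift = [v + 100 for v in y]
b_yx_shift, b_xy_shift = slope(y_shift, x_shift), slope(x_shift, y_shift)

# Change of scale: multiply every x by 5, leave y unchanged.
x_scaled = [5 * v for v in x]
b_yx_scaled, b_xy_scaled = slope(y, x_scaled), slope(x_scaled, y)

print(b_yx, b_xy)                # original coefficients
print(b_yx_shift, b_xy_shift)    # unchanged by the change of origin
print(b_yx_scaled, b_xy_scaled)  # changed by the change of scale
```

With this change of scale the coefficient of Y on X is divided by 5 and the coefficient of X on Y is multiplied by 5, so their product, and hence r, is unaffected.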
We can compute the regression equations with the help of regression coefficients by the
following equations:
1. Regression equation of X on Y
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
where X̄ is the mean of X series,
Ȳ is the mean of Y series, and
r σx/σy is the regression coefficient of X on Y.
2. Regression equation of Y on X
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
We can explain this by taking an example :
Example 4 : Calculate the following from the below given data :
(a) the two regression equations,
(b) the coefficient of correlation and
(c) the most likely marks in Statistics when the marks in Economics are 30.
Marks in Economics : 25 28 35 32 31 36 29 38 34 32
Marks in Statistics : 43 46 49 41 36 32 31 30 33 39
Solution :
Calculation of Regression Equations and Correlation Coefficient
Marks in    (X – X̄)            Marks in    (Y – Ȳ)
Eco (X)        x       x²      Stats (Y)      y       y²       xy
   25         –7       49         43         +5       25      –35
   28         –4       16         46         +8       64      –32
   35         +3        9         49        +11      121      +33
   32          0        0         41         +3        9        0
   31         –1        1         36         –2        4       +2
   36         +4       16         32         –6       36      –24
   29         –3        9         31         –7       49      +21
   38         +6       36         30         –8       64      –48
   34         +2        4         33         –5       25      –10
   32          0        0         39         +1        1        0
ΣX = 320    Σx = 0  Σx² = 140  ΣY = 380    Σy = 0  Σy² = 398  Σxy = –93
(a) Regression equation of X on Y
X – X̄ = bxy (Y – Ȳ)
$$b_{xy} = \frac{\sum xy}{\sum y^{2}} = \frac{-93}{398} = -0.234$$
$$\bar{X} = \frac{\sum X}{N} = \frac{320}{10} = 32 \quad\text{and}\quad \bar{Y} = \frac{\sum Y}{N} = \frac{380}{10} = 38$$
Substituting the values
X – 32 = – 0.234 (Y – 38)
X – 32 = – 0.234Y + 8.892
or X = 40.892 – 0.234Y
Regression equation of Y on X
Y – Ȳ = byx (X – X̄)
$$b_{yx} = \frac{\sum xy}{\sum x^{2}} = \frac{-93}{140} = -0.664$$
X̄ = 32, Ȳ = 38, byx = – 0.664
∴ Y – 38 = – 0.664 (X – 32)
= – 0.664X + 21.248
or Y = 59.248 – 0.664X

(b) Correlation Coefficient (r) = ±√(bxy × byx) = –√(0.234 × 0.664) = – 0.394


Since both the regression coefficients are negative, value of r must also be negative.
(c) Likely marks in statistics when marks in Economics are 30.
Y = – 0.664 X + 59.248 where X = 30
Y = (– 0.664 × 30) + 59.248 = 39.328 or 39.
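
The working of Example 4 can be reproduced with the deviations-from-actual-means formulae. The sketch below is one possible way to do it; the variable names are chosen for illustration, and the coefficients are kept unrounded, so later decimals may differ slightly from the hand calculation.

```python
import math

# Data of Example 4.
economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]         # X
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # Y

n = len(economics)
mean_x = sum(economics) / n
mean_y = sum(statistics_marks) / n

# Deviations from the actual means.
dev_x = [v - mean_x for v in economics]
dev_y = [v - mean_y for v in statistics_marks]

sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
sum_x2 = sum(a * a for a in dev_x)
sum_y2 = sum(b * b for b in dev_y)

b_xy = sum_xy / sum_y2  # regression coefficient of X on Y
b_yx = sum_xy / sum_x2  # regression coefficient of Y on X

# r is the geometric mean of the coefficients, carrying their common sign.
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

# (c) Estimated marks in Statistics when marks in Economics are 30.
estimate = mean_y + b_yx * (30 - mean_x)

print(round(b_xy, 3), round(b_yx, 3), round(r, 3), round(estimate, 2))
```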
Example 5 : The following scores were worked out from a test in Mathematics and English in an
annual examination.

Scores in Mathematics (x) English (y)
Mean 39.5 47.5
Standard deviation 10.8 16.8 r = + 0.42
Find both the regression equations. Using these regression equations, estimate the value of Y for
X = 50 and the value of X for Y = 30.
Solution : Regression equation of X on Y
$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$
where X̄ = 39.5, Ȳ = 47.5, r = 0.42, σx = 10.8, and σy = 16.8
By substituting values, we get
$$X - 39.5 = 0.42 \times \frac{10.8}{16.8}\,(Y - 47.5)$$
X – 39.5 = 0.27 (Y – 47.5) = 0.27Y – 12.82
or X = 0.27Y – 12.82 + 39.5 = 0.27Y + 26.68
When Y = 30
Value of X = (0.27 × 30 + 26.68) = 34.78
Regression equation of Y on X
$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$
where X̄ = 39.5, Ȳ = 47.5, r = 0.42, σx = 10.8, and σy = 16.8
$$Y - 47.5 = 0.42 \times \frac{16.8}{10.8}\,(X - 39.5)$$
Y – 47.5 = 0.653 (X – 39.5) = 0.653X – 25.79
or Y = 0.653X – 25.79 + 47.5 = 0.653X + 21.71
When X = 50
Value of Y = (0.653 × 50 + 21.71) = 32.65 + 21.71 = 54.36
Thus the regression equations are :
Xc = 0.27Y + 26.68
Yc = 0.653X + 21.71
Value of X when Y = 30 is 34.78
Value of Y when X = 50 is 54.36
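
When only the means, standard deviations and r are known, as in Example 5, both lines follow directly from the summary figures. The sketch below uses a hypothetical helper, line_from_summary, and keeps the coefficients unrounded, so its results can differ from the hand-rounded figures in the second or third decimal place.

```python
def line_from_summary(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (intercept, slope) of the regression of dep on ind, using
    slope = r * sd_dep / sd_ind and the fact that the line passes
    through the point of means."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return intercept, slope


# Figures of Example 5: Mathematics (X) and English (Y).
mean_x, mean_y = 39.5, 47.5
sd_x, sd_y = 10.8, 16.8
r = 0.42

a_xy, b_xy = line_from_summary(mean_x, mean_y, sd_x, sd_y, r)  # X on Y
a_yx, b_yx = line_from_summary(mean_y, mean_x, sd_y, sd_x, r)  # Y on X

x_when_y_30 = a_xy + b_xy * 30
y_when_x_50 = a_yx + b_yx * 50

print(round(b_xy, 3), round(a_xy, 3), round(x_when_y_30, 3))
print(round(b_yx, 3), round(a_yx, 3), round(y_when_x_50, 3))
```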
When the actual means of the variables X and Y come out to be fractions, taking deviations from
the actual means becomes cumbersome and it is advisable to take deviations from assumed means.
When deviations are taken from assumed means, the values of bxy and byx are given by
$$b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^{2} - \dfrac{(\sum dy)^{2}}{N}}$$
where dx = (X – A) and dy = (Y – A), A being the assumed mean of the respective series.
The regression equation of X on Y is :
X – X̄ = bxy (Y – Ȳ)
Similarly the regression equation of Y on X is
Y – Ȳ = byx (X – X̄)
$$b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^{2} - \dfrac{(\sum dx)^{2}}{N}}$$
Let us try to understand with the help of an example :
Example 6 : You are given the data relating to purchases and sales. Compute the two regression
equations by method of least squares and estimate the likely sales when the purchases are 100.
Purchases : 62 72 98 76 81 56 76 92 88 49
Sales : 112 124 131 117 132 96 120 136 97 85
Solution :
Calculations of Regression Equations
Purchases    (X – 76)              Sales     (Y – 120)
    X           dx       dx²         Y          dy       dy²      dxdy
   62          –14       196       112          –8        64      +112
   72           –4        16       124          +4        16       –16
   98          +22       484       131         +11       121      +242
   76            0         0       117          –3         9         0
   81           +5        25       132         +12       144       +60
   56          –20       400        96         –24       576      +480
   76            0         0       120           0         0         0
   92          +16       256       136         +16       256      +256
   88          +12       144        97         –23       529      –276
   49          –27       729        85         –35      1225      +945
            Σdx = –10  Σdx² = 2250          Σdy = –50  Σdy² = 2940  Σdxdy = 1803

$$\bar{X} = A + \frac{\sum dx}{N} = 76 + \frac{-10}{10} = 75 \quad\text{and}\quad \bar{Y} = A + \frac{\sum dy}{N} = 120 + \frac{-50}{10} = 115$$

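Regression Coefficients : X on Y
$$b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^{2} - \dfrac{(\sum dy)^{2}}{N}} = \frac{1803 - \dfrac{(-10)(-50)}{10}}{2940 - \dfrac{(-50)^{2}}{10}} = \frac{1753}{2690} = 0.652$$
Y on X
$$b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^{2} - \dfrac{(\sum dx)^{2}}{N}} = \frac{1803 - \dfrac{(-10)(-50)}{10}}{2250 - \dfrac{(-10)^{2}}{10}} = \frac{1753}{2240} = 0.78$$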
Regression equation : X on Y
X – X̄ = bxy (Y – Ȳ)
Substituting the values
X – 75 = 0.652 (Y – 115) = 0.652Y – 74.98
or X = 0.652Y + 0.02
Regression equation : Y on X
Y – Ȳ = byx (X – X̄)
Y – 115 = 0.78 (X – 75) = 0.78X – 58.5
or Y = 0.78X + 56.5
When X = 100
Y = 0.78 × 100 + 56.5 = 134.5
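
The assumed-mean formulae of Example 6 can be sketched in Python as below. The function name shortcut_coefficients is hypothetical. Because the coefficients are not rounded to two decimals, the estimate comes out slightly above the hand-computed figure. The second call, with assumed means of zero, illustrates that the choice of assumed mean does not affect the result.

```python
def shortcut_coefficients(xs, ys, assumed_x, assumed_y):
    """Regression coefficients and means from deviations about assumed means.

    Returns (b_xy, b_yx, mean_x, mean_y).
    """
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)

    numerator = s_dxdy - s_dx * s_dy / n
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)
    mean_x = assumed_x + s_dx / n
    mean_y = assumed_y + s_dy / n
    return b_xy, b_yx, mean_x, mean_y


# Data of Example 6.
purchases = [62, 72, 98, 76, 81, 56, 76, 92, 88, 49]
sales = [112, 124, 131, 117, 132, 96, 120, 136, 97, 85]

b_xy, b_yx, mean_x, mean_y = shortcut_coefficients(purchases, sales, 76, 120)

# Estimated sales when purchases are 100, from the line of Y on X.
estimated_sales = mean_y + b_yx * (100 - mean_x)

# A different pair of assumed means gives the same coefficients.
b_xy_alt, b_yx_alt, _, _ = shortcut_coefficients(purchases, sales, 0, 0)

print(round(b_xy, 3), round(b_yx, 3), round(estimated_sales, 1))
```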

2.6 STANDARD ERROR OF AN ESTIMATE


Standard error of an estimate is the measure of the spread of observed values from estimated ones,
expressed by regression line or equation. The concept of standard error of an estimate is analogous to
the standard deviation which measures the variation or scatter of individual items about the arithmetic
mean. Therefore, just as the standard deviation is the square root of the average of squared deviations
about the arithmetic mean, the standard error of an estimate is the square root of the average of
squared deviations between the actual or observed values and the estimated values based on the
regression equation. It can also be expressed as the square root of the unexplained variation divided
by N – 2:
$$S_{yx} = \sqrt{\frac{\text{Unexplained variation}}{N-2}} = \sqrt{\frac{\sum (Y - Y_c)^{2}}{N-2}}$$
$$\text{and}\quad S_{xy} = \sqrt{\frac{\sum (X - X_c)^{2}}{N-2}}$$
where Syx refers to standard error of estimate of Y values on X values.
Sxy refers to standard error of estimate of X values on Y values.

Yc and Xc are the estimated values of Y and X variables by means of their regression equations
respectively. N – 2 is used for getting an unbiased estimate of standard error. The usual explanation
given for this division by N – 2 is that the two constants a and b were calculated on the basis of
original data and we lose two degrees of freedom. Degrees of freedom means the number of classes
to which values can be assigned at will without violating any restrictions.
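However, a simpler method of computing Syx and Sxy is to use the following formulae :
$$S_{yx} = \sqrt{\frac{\sum Y^{2} - a\sum Y - b\sum XY}{N-2}}$$
$$\text{and}\quad S_{xy} = \sqrt{\frac{\sum X^{2} - a\sum X - b\sum XY}{N-2}}$$
where a and b are the constants of the respective regression equation.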
The standard error of estimate measures the accuracy of the estimated figures. The smaller the
values of standard error of estimate, the closer will be the dots to the regression line and the better
the estimates based on the equation for this line. If standard error of estimate is zero, then there is no
variation about the line and the correlation will be perfect. Thus with the help of standard error of
estimate it is possible for us to ascertain how good and representative the regression line is as a
description of the average relationship between two series.
Example 7 : Given the following data :
X: 6 2 10 4 8
Y: 9 11 5 8 7
And two regression equations Y = 11.9 – 0.65 X and X = 16.4 – 1.3 Y. Calculate the standard
error of estimate i.e. Syx and Sxy.
Solution :
We can calculate Xc and Yc values from these regression equations.

   X         Y        Yc         Xc      (Y – Yc)²    (X – Xc)²
   6         9        8.0        4.7        1.00         1.69
   2        11       10.6        2.1        0.16         0.01
  10         5        5.4        9.9        0.16         0.01
   4         8        9.3        6.0        1.69         4.00
   8         7        6.7        7.3        0.09         0.49

ΣX = 30   ΣY = 40   ΣYc = 40   ΣXc = 30   Σ(Y – Yc)² = 3.10   Σ(X – Xc)² = 6.20


Thus we can calculate Syx and Sxy from the above calculated values.
$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^{2}}{N-2}} = \sqrt{\frac{3.1}{5-2}} = \sqrt{1.033} = 1.02$$
$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^{2}}{N-2}} = \sqrt{\frac{6.2}{5-2}} = \sqrt{2.067} = 1.44$$
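
Both ways of computing the standard error of estimate can be compared on the data of Example 7. This is a minimal sketch: the constants are those of the regression lines fitted in Example 2, namely Yc = 11.9 – 0.65X and Xc = 16.4 – 1.3Y, and the variable names are chosen for illustration.

```python
import math

# Data of Example 7 (the same pairs as in Example 2).
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
n = len(x)

# Constants of the two regression lines fitted in Example 2.
a_yx, b_yx = 11.9, -0.65  # Yc = a_yx + b_yx * X
a_xy, b_xy = 16.4, -1.3   # Xc = a_xy + b_xy * Y

# Direct method: squared deviations of observed from estimated values.
unexplained_y = sum((yi - (a_yx + b_yx * xi)) ** 2 for xi, yi in zip(x, y))
unexplained_x = sum((xi - (a_xy + b_xy * yi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(unexplained_y / (n - 2))
s_xy = math.sqrt(unexplained_x / (n - 2))

# Shortcut method: uses only the totals and the constants a and b.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)
s_yx_short = math.sqrt((sum_y2 - a_yx * sum_y - b_yx * sum_xy) / (n - 2))
s_xy_short = math.sqrt((sum_x2 - a_xy * sum_x - b_xy * sum_xy) / (n - 2))

print(round(s_yx, 2), round(s_yx_short, 2))
print(round(s_xy, 2), round(s_xy_short, 2))
```

The two methods agree because the shortcut formula is an algebraic rearrangement of the sum of squared deviations, valid when a and b satisfy the normal equations.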

2.7 SUMMARY
Regression analysis deals with estimating values of one variable based on the values of one or
more other variables.
The variable being estimated is called dependent variable while the variable/s used to make
estimates is/are called independent variable/s.
The simple regression analysis involves one independent variable and one dependent variable.
It is based on the assumption of linear relationship between the two variables.
The relationship between variables is presented by means of a regression equation which is
obtained using the principle of least squares.
For a given set of data involving two variables, X and Y, we can derive two regression equations:
one treating Y as the dependent variable and the other treating X as the dependent variable.
When correlation between two variables is perfect, the two regression equations are reversible
because they both actually represent the same line.
The closer the two regression lines to each other, the higher is the degree of correlation.
The sign of the two regression coefficients is always the same as the sign of the coefficient of
correlation.
Standard error of estimate measures the variation around the regression line. A small value of
the standard error implies that the data cluster around the regression line.

2.8 SELF ASSESSMENT QUESTIONS


Exercise 1 : True or False statements :
(i) Regression is a tool for making estimates of an independent variable for a given value of
a dependent variable.
(ii) In the regression equation Yc = a + bX, the variable Y is the independent variable.
(iii) In the regression equation Yc = a + bX, a and b are estimates of the population intercept
and slope respectively.
(iv) The least squares principle ensures that sum of deviations from regression line is equal to
zero and Σ(Y – Ȳ)² is the minimum.
(v) In a regression equation, both a and b must bear the same sign.
(vi) In the case of negative correlation between the variables, one regression line is positively
sloped while the other is negatively sloped.
(vii) The sum of two regression coefficients is always equal to 1.
(viii) If the two regression equations are solved simultaneously, the X and Y values are
respectively the values of X̄ and Ȳ.
(ix) In the case of perfect correlation between two variables, both the regression coefficients
are equal.
(x) It is feasible to have byx = 5.4 and bxy = 0.15 for a given set of data.
(xi) The two regression coefficients byx and bxy cannot both be smaller than 1.
(xii) The difference between Y and Yc is called error.

(xiii) The standard error of estimate can never be equal to zero.
(xiv) The coefficient of correlation is equal to geometric mean of the two regression coefficients.
(xv) The regression coefficients are independent of the change of scale, but they are not
independent of the change of origin.
Ans. 1. F, 2. F, 3. T, 4. F, 5. F, 6. F, 7. F, 8. T, 9. F, 10. T, 11. F, 12. T, 13. F, 14. T, 15. F
Exercise 2 : Questions and Answers
(i) What do you understand by regression? What role does it play in business and economic
analysis?
(ii) Explain in your own words as to why there are two regression lines in the case of paired
values of two variables. At what point do the two regression lines intersect? If the two
regression lines coincide, what does it imply?
(iii) Explain the properties of the regression coefficients. Do you agree that for a given set of
data if each of the X values, is multiplied by 5, then the regression coefficient of Y on X
would also be multiplied by 5 while the regression coefficient of X on Y will be reduced
to 1/5th of its original value? Explain.
(iv) Explain the properties of regression coefficients. What is the difference between Regression
and Correlation Analysis?
(v) Given the following data:
X : 7 9 7 12 12 11 14 16
Y : 6 12 12 14 14 16 18 20
(a) Fit the regression equation of Y on X.
(b) Estimate the value of Y for X = 15.
(vi) Given the following information: ΣX = 56; ΣY = 40; ΣX² = 524; ΣY² = 256; ΣXY = 364;
and n = 8. Obtain the regression equation of X on Y.
(vii) In the estimation of the regression equation of two variables X and Y, the following results
were obtained:
ΣX = 90; ΣY = 70; ΣX² = 6,360; ΣY² = 2,860; ΣXY = 3,900; and n = 10
Obtain the two regression equations.
(viii) The following data relate to 50 workers of a factory in respect of their experience (X) in
months and time needed (Y) in minutes to fit an apparatus.
Mean of X = 50
Mean of Y = 60
Standard deviation of X = 20
Standard deviation of Y = 20
Covariance (XY) = –100
Calculate the two regression coefficients and the coefficient of determination.

(ix) Using the following data,
(a) Obtain the two regression equations.
(b) Find the likely sales when advertising expenditure is Rs. 25 crores.
(c) Estimate the advertising budget to achieve the sales target of Rs. 150 crores ?
Advertising expenditure Sales
(Rs. crores) (Rs. crores)
Mean 20 120
Standard deviation 5 25
Coefficient of correlation 0.8

(x) The following data about the sales and advertisement expenditure of a firm are given :
Sales (X) Advertisement expenditure
(in crores of Rs.) (in lakhs of Rs.)
Mean 40 60
Standard deviation 10 15
Coefficient of correlation 0.9
(a) Estimate the likely sales for a proposed advertisement expenditure of Rs. 100 lakh.
(b) What should the advertisement expenditure be if the firm proposes a sale target of Rs.
60 crores?
(xi) The HR manager of Anomaly International wants to study the relationship between the number
of years' experience and the performance scores of the employees. An analysis of five
employees shows the following results :
No. of years’ experience (X) : 6 2 10 4 8
Performance score (Y) : 19 11 25 18 17
(a) Fit a regression equation of Y on X and interpret it.
(b) Calculate the likely performance score if experience is five years.
(c) Calculate the standard error of estimate and comment on the reliability of the
estimating equation.
(xii) While making calculations about regression equations, a student obtained the following
results :
n = 25, ΣX = 125, ΣX² = 650; ΣY = 100, ΣY² = 460, and ΣXY = 508
It was discovered later, however, that two pairs of values were wrongly recorded as
(X, Y) = (6, 14) and (8, 6), while the correct values were (X, Y) = (8, 12) and (6, 8).
Obtain the two regression equations.

(xiii) The equations of two lines of regression between variables X and Y (not necessarily in
that order) are 2X + 3Y – 8 = 0 and X + 2Y – 5 = 0. The variance of X is 4. Find
(a) Variance of Y.
(b) Coefficient of Determination of X and Y
(c) Standard error of estimate of X on Y, and standard error of estimate of Y on X.
(xiv) Given the following data :
Age Salary (X) Total
(Years) 250–300 300–350 350–400 400–450 450–500
20–30 5 5
30–40 2 3 2 7
40–50 1 6 3 10
50–60 1 2 1 4 8
Total 7 5 10 4 4 30

Obtain the two regression coefficients and the regression equations.


Ans. 5. Y = 0.861 + 1.194X, 18.78, 6. X = – 0.5 + 1.5Y, 7. Y = 1.70 + 0.589X and X = – 0.66 +
1.38Y, 8. byx = bxy = – 0.25, r² = 0.0625, 9. (a) Y = 40 + 4X, X = 0.8 + 0.16Y (b) 140 crores (c) 24.8
crores, 10. (a) 64 crores, (b) 87 lakhs, 11. (a) Y = 9.9 + 1.35X (b) 16.65, (c) 3.006, 12. Y = 0.8X
and X = 2.778 + 0.556Y, 13. (a) 1.333 (b) 0.75 (c) SEyx = 0.5774, SExy = 1, 14. byx = 5.143, bxy = 0.125, Y
= 146.86 + 5.143X, X = – 3.07 + 0.125Y

UNIT-IV : INDEX NUMBERS
LESSON 1 : INDEX NUMBERS

1. STRUCTURE
1.0 Objective
1.1 Introduction
1.2 Features of Index Numbers
1.3 Problems of Index Numbers
1.3.1 The purpose of Index Numbers
1.3.2 Selection of Items
1.3.3 Price Quotations
1.3.4 Selection of the Base Period
1.3.5 The Choice of an Average
1.3.6 Selection of Appropriate Weights
1.4 Methods of Constructing Index Numbers
1.4.1 Unweighted Index Numbers
1.4.2 Weighted Index Numbers
1.5 Tests of Adequacy
1.6 Chain Base Index
1.7 Splicing
1.8 Consumer Price Index
1.9 Index Number of Industrial Production
1.10 Limitations of Index Numbers
1.11 Construction of BSE Sensex and NSE NIFTY
1.12 Summary
1.13 Self Assessment Questions

1.0 OBJECTIVE
After studying this lesson you should be able to :
(i) Understand the meaning and uses of index numbers
(ii) Identify various problems faced in the construction of index numbers
(iii) Learn different methods of constructing index numbers including BSE Sensex and NSE NIFTY
(iv) Appreciate different tests of consistency of index numbers
(v) Learn the consumer price index and its computation
(vi) Learn the process of base shifting, splicing and deflating of index numbers

1.1 INTRODUCTION
Economic activities have a constant tendency to change. Prices of commodities, which are the combined
result of a number of economic activities, also tend to fluctuate. The problem of change in
prices is very important, but it is not simple to study this problem and derive conclusions
because the prices of different commodities change by different degrees. Hence, there is a great need for
a device which can smooth out the irregularities in the prices to obtain a conclusion. This need is
satisfied by Index Numbers, which make use of percentages and averages for achieving the desired
objective. An Index Number is a device for comparing the general level of the magnitude of a group of
distinct but related variables in two or more situations. Index Numbers are used to feel the pulse of
the economy and they reveal inflationary or deflationary tendencies. Index Numbers
are described as barometers of economic activity because anyone who wants an idea of what is
happening in an economy should check important indices such as the index numbers of industrial
production, agricultural production, business activity etc.
The various definitions of Index Numbers are discussed under three heads :
(i) Measure of change
(ii) Device to measure change
(iii) A series representing the process of change.
According to Maslow, an index number is a numerical value characterising the change in a complex
economic phenomenon over a period of time.
Spiegel explains that an index number is a statistical measure designed to show changes in a variable
or a group of related variables with respect to time, geographical location or other characteristics.
Gregory and Ward describe it as a measure over time designed to show average change in the
price, quantity or value of a group of items.
Croxton and Cowden say that index numbers are devices for measuring differences in the magnitude
of a group of related variables.
A.L. Bowley describes Index Numbers as a series which reflects in its trend and fluctuations
the movements of some quantity to which it is related.
Blair describes Index Numbers as a specialised kind of average.

1.2 FEATURES OF INDEX NUMBERS


Index Numbers have the following features :
(i) Index numbers are specialised averages which are capable of being expressed in percentage.
(ii) Index numbers measure the changes in the level of a given phenomenon.
(iii) Index numbers measure the effect of changes over a period of time.
Index Numbers are indispensable tools of economic and business analysis. Their significance can
be appreciated from the following points:
1. Index numbers help in measuring relative changes in a set of items.
2. Index numbers provide a good basis of comparison because they are expressed in an abstract unit
distinct from the units of the underlying items.
3. Index numbers help in framing suitable policies for business and economic activities.
4. Index numbers help in measuring the general trend of the phenomenon.
5. Index numbers are used in deflating. They are used to adjust the original data for price changes
or to adjust wages for cost of living changes.
6. The utility of index numbers has increased a great deal because of the method of splicing
whereby the index prepared on any one base can be adjusted with reference to any other base.
7. As a measure of average change in a group of elements, index numbers can be used for
forecasting future events. Whereas a trend line gives an average rate of change in a single
phenomenon, an index number indicates the trend for a group of commodities.
8. Index numbers are helpful in the study of the comparative purchasing power of money in different
countries of the world.
9. Index numbers of business activities throw light on the economic progress made by various
countries.

1.3 PROBLEMS OF INDEX NUMBERS


While constructing Index Numbers, the following problems arise:
1.3.1 The Purpose of Index Numbers
Before constructing an Index Number, it is necessary to define precisely the purpose for which it
is to be constructed. A single Index cannot fulfil all purposes. Index Numbers are specialised
tools which are more efficient and useful when properly used. If the purpose is not clear, the data
used may be unsuitable and the indices obtained may be misleading. If it is desired to construct a
Cost of Living Index Number for the labour class, then only those items will be included which are
required by the labour class.
1.3.2 Selection of Items
The list of commodities included in the Index numbers is called the ‘Regimen’. Because it may not
be possible to include all the items, it becomes necessary to decide what items are to be included.
Only those items should be selected which are representative of the data, e.g. in a consumer Price
Index for working class, items like scooters, cars, refrigerators, cosmetics, etc. find no place. There
is no hard and fast rule regarding the inclusion of number of commodities while constructing Index
Numbers. The number of commodities should be such as to permit the influence of the inertia of
large numbers. At the same time the numbers should not be so large as to make the work of
computation uneconomical and even difficult. The number of commodities should therefore be
reasonable. The following points should be considered while selecting the items to be included in
the Index :
(i) The items should be representative.
(ii) The items should be of a standard quality.
(iii) Non-tangible items should be excluded.
(iv) The items should be reasonable in number.
1.3.3 Price Quotations
It is neither possible nor necessary to collect the prices of a commodity from all the markets in the
country where it is traded. We should take a sample of the markets. The places and persons selected
must be representative, and the places should be well known for trading in these
commodities.
It is also necessary to select a reliable agency from which price quotations are obtained.

1.3.4 Selection of the Base Period
In the construction of Index Numbers, the selection of the base period is a very important step. Since
the base period serves as a reference period and the prices for a given period are expressed as
percentages of those for the base year, it is necessary that
(i) the base period should be normal and
(ii) it should not be too far in the past.
There are two methods by which the base period can be selected: (i) Fixed base method and (ii)
Chain base method.
Fixed base method : According to this method, any one year is taken as the base. Prices during that year are
taken as equal to 100 and the prices of other years are shown as percentages of the prices of the base
year. Thus if indices for 1998, 1999, 2000 and 2001 are calculated with 1997 as the base year, such
indices will be called fixed base indices.
Chain base method: According to this method, relatives of each year are calculated on the
basis of the prices of the preceding year. The chain base index numbers are called Link Relatives,
e.g., if index numbers are constructed for 1997, 1998, 1999, 2000 and 2001, then for 1998, 1997 will be
the base, for 1999, 1998 will be the base, and so on.
1.3.5 The Choice of an Average
An Index Number is a technique of ‘averaging’ all the changes in a group of series over a period of
time. The main problem is to select an average which can summarise the change in the
component series adequately. The Median, Mode and Harmonic Mean are rarely used in the construction
of index numbers. In practice, the choice is between the Arithmetic Mean and the Geometric Mean, and
the merits and demerits of the two have to be compared. Theoretically the G.M. is superior to the A.M.
in many respects, but due to the difficulty of its computation it is not widely used for this purpose.
1.3.6 Selection of Appropriate Weights
The term weight refers to the relative importance of the
different items in the construction of index numbers. All items are not of equal importance and
hence it is necessary to find some suitable method by which the varying importance of the
different items is taken into account. The system of weighting depends upon the purpose of the index
numbers, but the weights ought to reflect the relative importance of the commodities in the regimen. The
system may be either arbitrary or rational. The weightage may be according to either:
(1) the value or quantity produced, or
(2) the value or quantity consumed, or
(3) the value or quantity sold or put on sale.
There are two methods of assigning weights:
(i) Implicit and (ii) Explicit.
Implicit : Under this method, the commodity to which greater importance has to be given is
repeated a number of times, i.e., a number of varieties of such a commodity are included in the index
numbers as separate items.
Explicit : In this case, the weights are explicitly assigned to commodities. Only one kind of a
commodity is included in the construction of Index Numbers but its price relative is multiplied by
the weight assigned to it. There has to be some logic in assigning weights of this type.

1.4 METHODS OF CONSTRUCTING INDEX NUMBERS
The methods of constructing index numbers are divided into two heads :
(1) Unweighted Indices ; and
(2) Weighted Indices.
Each one of these types is further sub-divided under two categories :
(i) Simple aggregative; and
(ii) Average of price relatives.
1.4.1 Unweighted Index Numbers
(i) Simple aggregative method : Under this method the total of the current year prices for various
commodities is divided by the total of the base year prices and the quotient is multiplied by 100.
Symbolically,

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$

where P01 represents the Price Index, p1 represents prices of current year and p0 prices of base
year.
Example 1 : From the following data construct the index for 2013 taking 2000 as base year.
Commodity Prices in 2000 Prices in 2013
(Rs.) (Rs.)
A 30 30
B 35 50
C 45 75
D 45 70
E 25 40
Solution : Construction of Price Index.
Commodity    Prices in 2000 (Rs.)    Prices in 2013 (Rs.)
A            30                      30
B            35                      50
C            45                      75
D            45                      70
E            25                      40
Total        Σp0 = 180               Σp1 = 265

Price Index for 2013 with 2000 as base = (Sum of prices in 2013 / Sum of prices in 2000) × 100

Symbolically,

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{265}{180} \times 100 = 147.2$$

Hence there is an increase of 47.2% in the prices of commodities in 2013 as compared
to 2000.
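The arithmetic of Example 1 can be checked with a short Python sketch. The function name `simple_aggregative_index` is only an illustrative choice and is not part of the lesson.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Simple aggregative price index: (sum of p1 / sum of p0) * 100."""
    return sum(current_prices) / sum(base_prices) * 100


# Prices of commodities A to E taken from Example 1
prices_2000 = [30, 35, 45, 45, 25]
prices_2013 = [30, 50, 75, 70, 40]

index_2013 = simple_aggregative_index(prices_2000, prices_2013)
print(round(index_2013, 1))  # 147.2
```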
(ii) Average of Price Relative Method: Under this method, first calculate the price relatives for the
various items included in the index and then average the price relatives by using any of the
measures of central value, i.e. the arithmetic mean, the median, the mode, the geometric mean or the
harmonic mean.

(a) When arithmetic mean is used

$$P_{01} = \frac{\sum \left( \dfrac{p_1}{p_0} \times 100 \right)}{N}$$

(b) When geometric mean is used

$$P_{01} = \mathrm{AL} \left[ \frac{\sum \log \left( \dfrac{p_1}{p_0} \times 100 \right)}{N} \right]$$

where AL denotes the antilogarithm and N refers to the number of items whose price relatives are averaged.
Example 2 : Calculate Index Numbers for 2011, 2012 and 2013 taking 2010 as base from the
following data by average of relatives method.
Commodity 2010 2011 2012 2013

A 2 5 4 3

B 8 11 13 6

C 4 5 6 8

D 6 4 5 7

E 5 4 6 3

Solution :
Construction of Index Numbers based on Mean of Relatives.
Commodity    2010                2011                  2012                  2013
             p0    Relative      p1    (p1/p0)×100     p2    (p2/p0)×100     p3    (p3/p0)×100
A            2     100           5     250.0           4     200.0           3     150.0
B            8     100           11    137.5           13    162.5           6     75.0
C            4     100           5     125.0           6     150.0           8     200.0
D            6     100           4     66.7            5     83.3            7     116.7
E            5     100           4     80.0            6     120.0           3     60.0
Total              500                 659.2                 715.8                 601.7

P01 = Index with 2010 as base and 2011 as current year

$$P_{01} = \frac{\sum \left( \dfrac{p_1}{p_0} \times 100 \right)}{N} = \frac{659.2}{5} = 131.84$$

P02 = Index with 2010 as base and 2012 as current year

$$P_{02} = \frac{\sum \left( \dfrac{p_2}{p_0} \times 100 \right)}{N} = \frac{715.8}{5} = 143.16$$

P03 = Price Index with 2010 as base and 2013 as current year

$$P_{03} = \frac{\sum \left( \dfrac{p_3}{p_0} \times 100 \right)}{N} = \frac{601.7}{5} = 120.34$$
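As a cross-check of Example 2, the arithmetic mean of price relatives can be sketched in Python as follows. The helper name `mean_of_relatives` is an assumption made for illustration. Because the sketch does not round the individual relatives, its results differ from the hand-worked figures in the second decimal place (about 131.83, 143.17 and 120.33).

```python
def mean_of_relatives(base_prices, current_prices):
    """Arithmetic mean of the price relatives (p1 / p0) * 100."""
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)


# Prices of commodities A to E from Example 2
prices = {
    2010: [2, 8, 4, 6, 5],
    2011: [5, 11, 5, 4, 4],
    2012: [4, 13, 6, 5, 6],
    2013: [3, 6, 8, 7, 3],
}

# Index for every year with 2010 as the base
indices = {year: mean_of_relatives(prices[2010], p) for year, p in prices.items()}
for year, value in indices.items():
    print(year, round(value, 2))
```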

1.4.2 Weighted Index Numbers


(I) Aggregative Method: These indices are of the simple aggregative type with the only difference
that the weights are assigned to the various items included in the index. This method in fact
can be described as an extension of the simple aggregative method in the sense that the weights
are assigned to the different commodities included in the index. There are various methods by
which weights can be assigned and hence a large number of formulae for constructing Index
Numbers have been devised. Some commonly used methods suggested by different authorities
are as follows :
(i) Laspeyre’s method.
(ii) Paasche’s method.

(iii) Fisher’s ideal method.
(iv) Marshall Edgeworth method.
(v) Kelly’s method.
(vi) Dorbish and Bowley’s method.
(i) Laspeyre’s Method
Laspeyre suggested that for calculating Price Indices, the quantities in the base year should be
used as weights. Hence the formula for computing the price Index number would be :

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

where P01 refers to the Price Index,
p refers to the price of each commodity,
q refers to the quantity of each commodity,
the subscript 0 refers to the base year,
the subscript 1 refers to the current year, and
Σ refers to the summation of items.
The steps for calculating Index Numbers are :
(a) Multiply the price of each commodity for the current year by its respective quantity for the
base year (p1 × q0) and then find the total of these products, Σ (p1q0).
(b) Multiply the price of each commodity for the base year by the respective quantity for
the base year (p0 × q0) and then find the total of these products for the different commodities,
Σ (p0q0).
(c) Divide Σ (p1q0) by Σ (p0q0) and multiply the quotient by 100.
On the other hand, if a Quantity Index is to be calculated, the prices of the base year will be used
as weights. Symbolically,

$$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$$

Example 3 : Compute Price Index and Quantity Index from data given below by Laspeyre’s method.
Items    Base year                  Current year
         Quantity     Price         Quantity     Price
A        6 units      40 paise      7 units      30 paise
B        4 units      45 paise      5 units      50 paise
C        0.5 units    90 paise      1.5 units    40 paise
Solution : Computation of Price and Quantity Indices.
         Base year      Current year
Items    q0     p0      q1     p1      p0q0     p1q0     p0q1     p1q1
A        6      40      7      30      240      180      280      210
B        4      45      5      50      180      200      225      250
C        0.5    90      1.5    40      45       20       135      60
Total                                  Σp0q0 = 465   Σp1q0 = 400   Σp0q1 = 640   Σp1q1 = 520

$$\text{Price Index } (P_{01}) = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{400}{465} \times 100 = 86.02$$

$$\text{Quantity Index } (Q_{01}) = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{640}{465} \times 100 = 137.63$$
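The Laspeyre computation can be reproduced with the Python sketch below. It assumes a base-year quantity of 0.5 units for item C, which is the value implied by the products p0q0 = 45 and p1q0 = 20 in the solution table. The function names are illustrative only.

```python
def weighted_sum(xs, ys):
    """Sum of pairwise products, e.g. the total of p * q over all items."""
    return sum(x * y for x, y in zip(xs, ys))


def laspeyres_price_index(p0, q0, p1):
    """Price index with base-year quantities as weights."""
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100


def laspeyres_quantity_index(p0, q0, q1):
    """Quantity index with base-year prices as weights."""
    return weighted_sum(q1, p0) / weighted_sum(q0, p0) * 100


# Items A, B, C from Example 3 (prices in paise)
q0, p0 = [6, 4, 0.5], [40, 45, 90]
q1, p1 = [7, 5, 1.5], [30, 50, 40]

price_index = laspeyres_price_index(p0, q0, p1)
quantity_index = laspeyres_quantity_index(p0, q0, q1)
print(round(price_index, 2), round(quantity_index, 2))  # 86.02 137.63
```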

(ii) Paasche’s Method: Under this method of calculating the Price Index, the quantities of the current
year are used as weights, as compared to the base year quantities used by Laspeyre. Symbolically,

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

The steps for constructing the Index according to Paasche’s method are :
(i) Calculate the product of the current year prices of different commodities and their respective
quantities for the current year (p1 × q1) and find the total of these products for the different
commodities, Σ (p1q1).
(ii) Calculate the product of p0 and q1 for the different commodities and aggregate them, Σ (p0q1).
(iii) Divide Σ (p1q1) by Σ (p0q1) and multiply the quotient by 100 to obtain the Price Index.
Similarly, the quantity index is calculated using the current year prices as weights. Symbolically,

$$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$$
Example 4 : From the data of the previous illustration, calculate (i) Price Index (ii) Quantity Index by
Paasche’s method.
         Base year      Current year
Items    q0     p0      q1     p1      p0q0     p1q0     p0q1     p1q1
A        6      40      7      30      240      180      280      210
B        4      45      5      50      180      200      225      250
C        0.5    90      1.5    40      45       20       135      60
Total                                  465      400      640      520

$$\text{Price Index } P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{520}{640} \times 100 = 81.25$$

$$\text{Quantity Index } Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{520}{400} \times 100 = 130$$
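A matching Python sketch for Paasche’s method is given below. It works directly from the column totals of the table, so it does not depend on any single cell. The ratio 520/640 comes to 81.25 and 520/400 to 130. The function names are illustrative only.

```python
def paasche_price_index(sum_p1q1, sum_p0q1):
    """Price index with current-year quantities as weights."""
    return sum_p1q1 / sum_p0q1 * 100


def paasche_quantity_index(sum_q1p1, sum_q0p1):
    """Quantity index with current-year prices as weights."""
    return sum_q1p1 / sum_q0p1 * 100


# Column totals from the table of Example 4
sum_p1q0, sum_p0q1, sum_p1q1 = 400, 640, 520

price_index = paasche_price_index(sum_p1q1, sum_p0q1)
quantity_index = paasche_quantity_index(sum_p1q1, sum_p1q0)
print(price_index, quantity_index)  # 81.25 130.0
```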

(iii) Fisher’s Ideal Index : Laspeyre has used base year quantities as weights whereas Paasche
has used current year quantities as weights for the computation of the Index Number of prices. Fisher
suggested that both the current year quantities and the base year quantities should be used, and that
the geometric mean of the two indices be calculated and taken as the Index Number. Symbolically,

$$\text{Fisher's Price Index } P_{01} = \sqrt{ \left( \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \right) \left( \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 \right) } = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } \times 100$$

$$\text{Fisher's Index} = \sqrt{ \text{Laspeyre's Index} \times \text{Paasche's Index} }$$

On the other hand, if quantity indices are to be calculated by this method, the geometric mean of
the Index Number of quantities with base year prices as weights and the Index Number of quantities
with current year prices as weights is found. Symbolically,

$$\text{Fisher's Quantity Index } Q_{01} = \sqrt{ \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} } \times 100$$

Example 5 : Construct Index Number of Prices and Quantities from the following data using Fisher’s
method (2010 = 100).
             2010              2014
Commodity    Price    Qty.     Price    Qty.
A            2        8        4        6
B            5        10       6        5
C            4        14       5        10
D            2        19       2        13

Solution : Calculation of Price and Quantity Indices.

         2010                       2014
Items    Price (p0)   Qty. (q0)     Price (p1)   Qty. (q1)     p0q0    p1q1    p1q0    p0q1
A        2            8             4            6             16      24      32      12
B        5            10            6            5             50      30      60      25
C        4            14            5            10            56      50      70      40
D        2            19            2            13            38      26      38      26
Total                                                          160     130     200     103

$$P_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} } \times 100 = \sqrt{ \frac{200}{160} \times \frac{130}{103} } \times 100 = 125.6$$

$$Q_{01} = \sqrt{ \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} } \times 100 = \sqrt{ \frac{103}{160} \times \frac{130}{200} } \times 100 = 64.7$$

(iv) Marshall & Edgeworth’s Method: In this method also both current year as well as base
year prices and quantities are considered. The formula is as follows :

$$P_{01} = \frac{\sum (q_0 + q_1)\, p_1}{\sum (q_0 + q_1)\, p_0} \times 100 = \frac{\sum q_0 p_1 + \sum q_1 p_1}{\sum q_0 p_0 + \sum q_1 p_0} \times 100$$

and the Quantity Index is calculated by the formula

$$Q_{01} = \frac{\sum (p_0 + p_1)\, q_1}{\sum (p_0 + p_1)\, q_0} \times 100 = \frac{\sum p_0 q_1 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_1 q_0} \times 100$$

(v) Kelly’s Method: Truman Kelly has suggested the following formula for constructing Index
Number.

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100 \quad \text{where} \quad q = \frac{q_0 + q_1}{2}$$

where q refers to the average quantity of two periods. This is also known as fixed aggregative
method.
(vi) Dorbish & Bowley’s Method: Dorbish & Bowley have suggested the simple arithmetic
mean of Laspeyre’s and Paasche’s formulae. Symbolically,

$$P_{01} = \frac{ \dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1} }{2} \times 100$$
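To compare the weighted aggregative formulae side by side, the sketch below applies Fisher’s, Marshall & Edgeworth’s and Dorbish & Bowley’s price formulae to the data of Example 5. This is only an illustrative sketch and the variable names are assumptions. Kelly’s formula with q taken as the average of q0 and q1 gives the same value as Marshall & Edgeworth’s formula, since the factor 1/2 cancels.

```python
import math


def weighted_sum(xs, ys):
    """Sum of pairwise products, e.g. the total of p * q over all items."""
    return sum(x * y for x, y in zip(xs, ys))


# Commodities A to D from Example 5 (2010 = base year, 2014 = current year)
p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]

laspeyres = weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100
paasche = weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100

# Geometric mean of the Laspeyre and Paasche indices
fisher = math.sqrt(laspeyres * paasche)

# Sum of base and current quantities used as weights
marshall_edgeworth = (
    (weighted_sum(p1, q0) + weighted_sum(p1, q1))
    / (weighted_sum(p0, q0) + weighted_sum(p0, q1))
    * 100
)

# Arithmetic mean of the Laspeyre and Paasche indices
dorbish_bowley = (laspeyres + paasche) / 2

print(round(fisher, 1), round(marshall_edgeworth, 1), round(dorbish_bowley, 1))
```

On this data the three formulae give values close to one another (roughly 125.5 to 125.6), which is typical when the Laspeyre and Paasche indices do not differ much.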

(II) Weighted Average of Price Relatives : This method is also known as the Family Budget Method.
Weights are values (p0q0) of the base year in this method. The Index Number for the current year is
calculated by dividing the sum of the products of the current year’s price relatives and base year
values by the total of the weights, i.e., the weighted arithmetic average of the price relatives gives
the required index number. Symbolically,

$$\text{Weighted Index Number of the current year} = \frac{\sum IV}{\sum V}$$

where I stands for Price Relatives of the current year and V stands for the values of the base year.

Example 6 : From the data given below, calculate the Weighted Index Number by using the weighted
average of Relatives.
Commodities    Units      Base Yr. Qty.    Base Year’s Price    Current Yr. Price
A              Quintal    7                16                   19.6
B              Kg.        6                2                    3.2
C              Dozen      16               5.6                  7.0
D              Metre      21               1.5                  1.4
Solution :

The Price Relative of the current year = (Current Year's Price / Base Year's Price) × 100
Value weight = Quantity of base year × Price of the base year

Commodities    Price Relatives          Value Weights    Weights × Price Relatives
               I = (p1/p0) × 100        V = p0q0         V × I
A              122.5                    112.0            13,720
B              160.0                    12.0             1,920
C              125.0                    89.6             11,200
D              93.3                     31.5             2,939
Total                                   ΣV = 245.1       ΣIV = 29,779

$$\text{Weighted Index Number of the current year} = \frac{\sum IV}{\sum V} = \frac{29779}{245.1} = 121.5 \text{ Ans.}$$
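The calculation of Example 6 can be sketched in Python as below. The function name is an illustrative assumption. Since the weights are base-year values p0q0, each product I × V reduces to 100 × p1q0, so this index coincides with the base-year-weighted aggregative (Laspeyre) index.

```python
def weighted_average_of_relatives(p0, q0, p1):
    """Weighted arithmetic mean of price relatives with base-year values as weights."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]   # I = (p1 / p0) * 100
    weights = [a * q for a, q in zip(p0, q0)]           # V = p0 * q0
    return sum(i * v for i, v in zip(relatives, weights)) / sum(weights)


# Commodities A to D from Example 6
q0 = [7, 6, 16, 21]
p0 = [16, 2, 5.6, 1.5]
p1 = [19.6, 3.2, 7.0, 1.4]

index = weighted_average_of_relatives(p0, q0, p1)
print(round(index, 1))  # 121.5
```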
In the weighted average of relatives, the Geometric mean may be used instead of the arithmetic mean.
The weighted geometric mean of relatives is calculated by applying logarithms to the relatives.
When this mean is used, the formula is:

$$P_{01} = \text{Antilog} \left[ \frac{\sum V \log I}{\sum V} \right] \quad \text{where} \quad I = \frac{p_1}{p_0} \times 100 \quad \text{and} \quad V = p_0 q_0$$

Example 7 : Find out the price index by weighted average of price relatives from the following
commodities using geometric mean :
Commodities    p0     q0    p1
X              3.0    20    4.0
Y              1.5    40    1.6
Z              1.0    10    1.5
Solution :
Calculation of Index Number

Commodities    p0     q0    p1     V = p0q0    I = (p1/p0) × 100           Log I     V. log I
X              3.0    20    4.0    60          (4/3) × 100 = 133.33        2.1249    127.494
Y              1.5    40    1.6    60          (1.6/1.5) × 100 = 106.7     2.0282    121.692
Z              1.0    10    1.5    10          (1.5/1.0) × 100 = 150.0     2.1761    21.761
Total                              ΣV = 130                                          ΣV log I = 270.947
By applying the formula:

$$P_{01} = \mathrm{AL} \left[ \frac{\sum V \log I}{\sum V} \right] = \mathrm{AL} \left[ \frac{270.947}{130} \right] = \mathrm{AL}\,(2.084) = 121.3$$

1.5 TESTS OF ADEQUACY OR CONSISTENCY


Since several formulae have been suggested for the construction of index numbers, the question
arises as to which method is the most suitable in a given situation. There are some tests
to choose an appropriate index:
(i) Unit Test: It requires that the method of constructing the index should be independent of the
units of the problem. All the methods except the simple aggregative method satisfy this test.
(ii) Circular Test: It is based on the shiftability of the base. Accordingly, the index should
work in a circular fashion, i.e., if an index number is computed for period 1 on the base
period 0, another for period 2 on the base period 1, and so on up to period n on the base
period n – 1, and finally an index is computed for period 0 on the base period n, then the
product of all these indices (omitting the factor 100 from each) should be equal to one.

$$P_{01} \times P_{12} \times P_{23} \times \cdots \times P_{(n-1)n} \times P_{n0} = 1$$

If the test is applied to the simple aggregative method for three periods after the base, we get

$$\frac{\sum p_1}{\sum p_0} \times \frac{\sum p_2}{\sum p_1} \times \frac{\sum p_3}{\sum p_2} \times \frac{\sum p_0}{\sum p_3} = 1$$

The test is met by the simple aggregative method, the simple geometric mean of price relatives and
the weighted aggregative method with fixed weights.
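A minimal Python sketch of the circular test for the simple aggregative method is given below. It uses the yearly prices of Example 2 purely as convenient data, and the names are illustrative assumptions. The indices are kept as ratios, i.e. the factor 100 is omitted.

```python
import math

# Prices of commodities A to E for four successive years (data of Example 2)
prices = {
    2010: [2, 8, 4, 6, 5],
    2011: [5, 11, 5, 4, 4],
    2012: [4, 13, 6, 5, 6],
    2013: [3, 6, 8, 7, 3],
}


def simple_aggregative_ratio(base_year, current_year):
    """Simple aggregative index as a ratio (factor 100 omitted)."""
    return sum(prices[current_year]) / sum(prices[base_year])


# P01 x P12 x P23 x P30, going round the full circle back to the base year
years = [2010, 2011, 2012, 2013, 2010]
product = 1.0
for base, current in zip(years, years[1:]):
    product *= simple_aggregative_ratio(base, current)

print(math.isclose(product, 1.0))  # True
```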

(iii) Time Reversal Test: According to Prof. Fisher, the formula for calculating an index number
should be such that it gives the same ratio between one point of time and the other, no
matter which of the two times is taken as the base. In other words, when the data for any
two years are treated by the same method, but with the base reversed, the two index
numbers should be reciprocals of each other.
P01 × P10 = 1 (omitting the factor 100 from each index),
where P01 denotes the index for current year 1 based on the base year 0 and P10 is for current
year 0 on the base year 1.
It can be easily verified that the simple geometric mean of price relatives, the weighted aggregative
formula with fixed weights, the weighted geometric mean of relatives with fixed weights, Marshall
Edgeworth’s method and Fisher’s ideal method satisfy the test.
Let us see how Fisher’s ideal method satisfies the test.

$$P_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} }$$

By changing time from 0 to 1 and 1 to 0

$$P_{10} = \sqrt{ \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0} }$$

The test requires P01 × P10 = 1. Substituting the values of P01 and P10,

$$P_{01} \times P_{10} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0} } = 1$$

(iv) Factor Reversal Test: It says that the product of a price index and the quantity index
should be equal to the value index. In the words of Fisher, just as each formula should permit
the interchange of the two times without giving inconsistent results, similarly it should
permit interchanging the prices and quantities without giving inconsistent results, which
means the two results multiplied together should give the true value ratio. The test says that
the change in price multiplied by the change in quantity should be equal to the total change in
value. If P01 is a price index for the current year with reference to the base year and Q01 is the
quantity index for the current year, then

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

This test is satisfied only by Fisher’s ideal index method.

$$P_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} }$$

Changing p to q and q to p,

$$Q_{01} = \sqrt{ \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} }$$

$$P_{01} \times Q_{01} = \sqrt{ \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} } = \sqrt{ \frac{(\sum p_1 q_1)^2}{(\sum p_0 q_0)^2} } = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

In other words, the factor reversal test is based on the following analogy. If the price per unit of a
commodity increases from Rs. 10 in 1995 to Rs. 15 in 1998, and the quantity of consumption
changes from 100 units to 140 units during the same period, then the price ratio and the quantity
ratio for 1998 are 1.5 and 1.4 respectively. The values of consumption (p × q) were Rs. 1000 in 1995
and Rs. 2100 in 1998, giving a value ratio of

$$\frac{p_1 q_1}{p_0 q_0} = \frac{2100}{1000} = 2.1$$

Thus we find that the product of price ratio and quantity ratio equals the value ratio :
1.5 × 1.4 = 2.1
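For several commodities the same check can be made numerically. The sketch below uses the column totals of Example 5 (Σp0q0 = 160, Σp1q0 = 200, Σp0q1 = 103, Σp1q1 = 130) and compares the product of the price and quantity indices with the value ratio for Fisher’s and Laspeyre’s formulae. The variable names are illustrative assumptions.

```python
import math

# Column totals from Example 5
sum_p0q0, sum_p1q0, sum_p0q1, sum_p1q1 = 160, 200, 103, 130

value_ratio = sum_p1q1 / sum_p0q0  # true value ratio = 0.8125

# Fisher's price and quantity indices as ratios (factor 100 omitted)
fisher_price = math.sqrt((sum_p1q0 / sum_p0q0) * (sum_p1q1 / sum_p0q1))
fisher_quantity = math.sqrt((sum_p0q1 / sum_p0q0) * (sum_p1q1 / sum_p1q0))

# Laspeyre's price and quantity indices as ratios
laspeyres_price = sum_p1q0 / sum_p0q0
laspeyres_quantity = sum_p0q1 / sum_p0q0

print(round(fisher_price * fisher_quantity, 4))        # 0.8125, equals the value ratio
print(round(laspeyres_price * laspeyres_quantity, 4))  # 0.8047, does not
```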

1.6 CHAIN BASE INDEX


The various formulae discussed so far assume that the base period is some fixed previous period. The
index of a given year on a given fixed base is not affected by changes in the prices or the quantities
of any other year. On the other hand, in the chain base method, the value of each period is related
to that of the immediately preceding period and not to any fixed period. To construct index
numbers by the chain base method, a series of index numbers is computed for each year with the preceding
year as the base. These index numbers are known as Link Relatives. The link relatives, when multiplied
successively (the chaining process), are linked to a common base. The products obtained are
expressed as percentages and give the required index numbers. The steps of the chain base method are :
(i) Express the figures of each period as a percentage of the preceding period to obtain Link Relatives
(LR).
(ii) These link relatives are chained together by successive multiplication to get chain indices
by the formula:

Chain Base Index (CBI) = (Current year LR × Preceding year Chain Index) / 100

(iii) The link relatives can be converted into a fixed base index by this formula :

Fixed Base Index (FBI) = (Current year LR × Previous year FBI) / 100

Chain relatives are computed from link relatives whereas fixed base relatives are computed
directly from the original data. For a single series, the results obtained by the fixed base and
chain base methods are the same.
We shall understand the process by taking some examples.
Example 8 : Construct Index Numbers by chain base method from the following data of wholesale
prices.
Year : 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Prices : 75 50 65 60 72 70 69 75 84 80
Solution :
Computation of Chain Index
Year    Price    Link Relatives             Chain Base Index                 Fixed Base Index
2005    75       100                        100                              100
2006    50       (50/75) × 100 = 66.67      (66.67 × 100)/100 = 66.67        (50/75) × 100 = 66.67
2007    65       (65/50) × 100 = 130        (130 × 66.67)/100 = 86.67        (65/75) × 100 = 86.67
2008    60       (60/65) × 100 = 92.31      (92.31 × 86.67)/100 = 80.00      (60/75) × 100 = 80
2009    72       (72/60) × 100 = 120        (120 × 80)/100 = 96.00           (72/75) × 100 = 96
2010    70       (70/72) × 100 = 97.22      (97.22 × 96)/100 = 93.33         (70/75) × 100 = 93.33
2011    69       (69/70) × 100 = 98.57      (98.57 × 93.33)/100 = 92.00      (69/75) × 100 = 92
2012    75       (75/69) × 100 = 108.70     (108.70 × 92)/100 = 100.00       (75/75) × 100 = 100
2013    84       (84/75) × 100 = 112        (112 × 100)/100 = 112.00         (84/75) × 100 = 112
2014    80       (80/84) × 100 = 95.24      (95.24 × 112)/100 = 106.67       (80/75) × 100 = 106.67
It may be seen that the indices obtained by the chain base and fixed base methods are the same.
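The three columns of Example 8 can be generated with a short Python sketch. The variable names are illustrative, and the sketch keeps full precision rather than rounding the link relatives at each step.

```python
import math

years = list(range(2005, 2015))
prices = [75, 50, 65, 60, 72, 70, 69, 75, 84, 80]

# Link relative: each price as a percentage of the preceding year's price
link_relatives = [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

# Chain index: current link relative x preceding chain index / 100
chain_indices = [100.0]
for lr in link_relatives[1:]:
    chain_indices.append(lr * chain_indices[-1] / 100)

# Fixed base index: each price as a percentage of the 2005 price
fixed_indices = [p / prices[0] * 100 for p in prices]

for row in zip(years, link_relatives, chain_indices, fixed_indices):
    print(row[0], *(round(x, 2) for x in row[1:]))
```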

Example 9 : Construct chain index numbers from the link relatives given below:
Year : 2011 2012 2013 2014 2015
Link Relatives : 100 105 95 115 102
Solution :
Calculations for Chain Base Index
Year    Link Relatives    Chain Index Number
2011    100               100
2012    105               (105 × 100)/100 = 105.00
2013    95                (95 × 105)/100 = 99.75
2014    115               (115 × 99.75)/100 = 114.71
2015    102               (102 × 114.71)/100 = 117.00
Base Shifting: Sometimes it becomes necessary to change the base of an index number series
from one period to another for the purpose of comparison. In such circumstances it is necessary to
recompute all the index numbers using the new base period. This is done by dividing the index
number of each period by the index number corresponding to the
new base period and expressing the result as a percentage. This process is known as Base
Shifting.
Example 10 : Compute Index Numbers from the following taking 2012 as the base and shift the
base to 2014.
Solution :
Year    Price    Index Number (Base 2012)     Shift of base from 2012 to 2014
2012    10       100                          (100/150) × 100 = 67
2013    12       (12/10) × 100 = 120          (120/150) × 100 = 80
2014    15       (15/10) × 100 = 150          100
2015    21       (21/10) × 100 = 210          (210/150) × 100 = 140
2016    20       (20/10) × 100 = 200          (200/150) × 100 = 133
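Base shifting is a single division per year, as the following Python sketch of Example 10 shows. The function name `shift_base` is an illustrative assumption.

```python
def shift_base(indices, new_base_year):
    """Divide every index by the index of the new base year and express as a percentage."""
    base_value = indices[new_base_year]
    return {year: value / base_value * 100 for year, value in indices.items()}


# Index numbers with 2012 as base, from Example 10
index_2012_base = {2012: 100, 2013: 120, 2014: 150, 2015: 210, 2016: 200}

index_2014_base = shift_base(index_2012_base, 2014)
for year, value in index_2014_base.items():
    print(year, round(value))
```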

1.7 SPLICING
On several occasions a change of base year may cause a discontinuity in a series of index numbers.
We would always like to compare figures with a recent year and not with the distant past. For example,
the weights of an index number may become out of date and we may construct another index with
new weights. Two indices would then appear, and it becomes necessary to convert these two indices into a
continuous series. The procedure employed to do the conversion is known as splicing. The formulae
are :
For Forward Splicing :
Spliced Index Number = (Old index of the New Base Year × Index to be adjusted) / 100
For Backward Splicing:
Spliced Index Number = (100 / Old index of the New Base Year) × Index to be adjusted
Example 11 : Splice the following two Index number series, A series forward and B series backward:

Year : 2010 2011 2012 2013 2014 2015

Series A : 100 120 150 — — —

Series B : — — 100 110 120 150


Solution :
Splicing of two Index Number Series
Year    Series A    Series B    Index Number Spliced Forward (to Series A)    Index Number Spliced Backward (to Series B)
2010    100         -           -                                             (100/150) × 100 = 66.67
2011    120         -           -                                             (100/150) × 120 = 80.00
2012    150         100         (150/100) × 100 = 150                         (100/150) × 150 = 100.00
2013    -           110         (150/100) × 110 = 165                         -
2014    -           120         (150/100) × 120 = 180                         -
2015    -           150         (150/100) × 150 = 225                         -
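The two splicing formulae can be sketched in Python using the data of Example 11. The function names `splice_forward` and `splice_backward` are hypothetical; the overlap year 2012 is the new base year, where the old index (Series A) stands at 150.

```python
def splice_forward(index_to_adjust, old_index_of_new_base):
    """Express a new-series value on the base of the old series."""
    return old_index_of_new_base * index_to_adjust / 100.0


def splice_backward(index_to_adjust, old_index_of_new_base):
    """Express an old-series value on the base of the new series."""
    return 100.0 / old_index_of_new_base * index_to_adjust


# Data from Example 11; 2012 is the year in which both series are available.
series_a = {2010: 100, 2011: 120, 2012: 150}
series_b = {2012: 100, 2013: 110, 2014: 120, 2015: 150}
old_index_of_new_base = series_a[2012]

forward = {y: splice_forward(v, old_index_of_new_base) for y, v in series_b.items()}
backward = {y: splice_backward(v, old_index_of_new_base) for y, v in series_a.items()}

print(forward)
print({y: round(v, 2) for y, v in backward.items()})
```

Forward splicing continues Series A beyond 2012, while backward splicing carries Series B back to 2010, so either result is one continuous series.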

Deflating: It means making allowance for changes in the purchasing power of money due
to a change in the general price level. It is the technique of converting a series of values calculated at
current prices into a series at constant prices of a given year. In other words, the process of removing
the effects of price changes from current money values is called deflation. By this process the
real value of the phenomenon, free from the influence of price changes, is calculated.
Deflation is used in the computation of national income and other economic variables. The relevant
price index, whether the wholesale price index or the consumer price index, is called the deflator.
Normally separate price deflators are worked out for deflating the national income data of
different sectors of the economy, considering the changes in prices in those sectors. The method is :

\[ \text{Deflated value} = \frac{\text{Current value}}{\text{Deflator}} \times 100 \]
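A minimal sketch of deflation in Python follows. The wage and price index figures are hypothetical numbers invented for this illustration (they do not come from the lesson), and `deflate` is an assumed helper name.

```python
def deflate(current_value, deflator):
    """Deflated (real) value = current value / deflator x 100."""
    return current_value / deflator * 100.0


# Hypothetical data: money wages (Rs.) and a price index with 2014 = 100.
money_wages = {2014: 20000, 2015: 22000, 2016: 25000}
price_index = {2014: 100, 2015: 110, 2016: 125}

real_wages = {year: deflate(money_wages[year], price_index[year]) for year in money_wages}

for year, value in real_wages.items():
    print(year, round(value, 2))
```

With these assumed figures the money wage rises every year but the real wage stays at Rs. 20,000, because prices rise in the same proportion.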

1.8 CONSUMER PRICE INDEX


The consumer price index, also known as the cost of living index, is calculated to measure the average change
over time in the prices of commodities consumed by consumers. The need to construct consumer
price indices arises because general index numbers fail to give an exact idea of the effect of a
change in the general price level on the cost of living of different classes of people: a given
change in the level of prices affects different classes of people in different ways. Different people
consume different commodities, and even the same commodities in different proportions. The consumer
price index helps us determine the effect of a rise or fall in prices on different classes of consumers
living in different areas. The consumer price index is significant because demands for higher
wages are based on the cost of living index, and wages and salaries in most nations are adjusted
according to this index. We should understand that the cost of living index measures neither the
actual cost of living nor the fluctuations in the cost of living due to causes other than changes in the
price level; its object is to find out how much more the consumers of a particular class have to pay
for a certain basket of goods and services. That is why the term cost of living index has been
replaced by the terms price of living index, cost of living price index or consumer price index.
The significance of studying the consumer price index is that it helps in wage negotiations and
wage contracts. It also helps in preparing wage policy, price policy, rent control, taxation and general
economic policies. This index is also used to find out the changing purchasing power of different
currencies.
Consumer Price Index can be prepared by two methods :
(i) Aggregative Method;
(ii) Weighted Relatives Method.
When the aggregative method is used to prepare the consumer price index, the aggregate expenditures
for the current year and the base year are calculated and the formula given below is applied.

\[ \text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \]

When the weighted relatives method is used, the family budgets of a large number of people
for whom the index is meant are carefully studied and the aggregate expenditure of an average
family on various items is estimated. These expenditures are the weights. In other words, the weights are calculated
by multiplying the base year quantities and prices (p0q0). The price relatives for all the commodities
are computed and multiplied by the weights. The consumer price index is then calculated by applying
the formula:

\[ \text{Consumer Price Index} = \frac{\sum IV}{\sum V}, \quad \text{where } I = \frac{p_1}{p_0} \times 100 \text{ and } V = p_0 q_0 \]

Example 12 : Prepare the Consumer price index for 2013 on the basis of 2010 from the following
data by both methods.
Commodities    Quantities Consumed (2010)    Prices (2010)    Prices (2013)
A              6                             5.75             6.00
B              6                             5.00             8.00
C              1                             6.00             9.00
D              6                             8.00             10.00
E              4                             2.00             1.50
F              1                             20.00            15.00

Solution :
Consumer Price Index by Aggregative Method
Commodities    q0    p0       p1       p1q0     p0q0
A              6     5.75     6.00     36.00    34.50
B              6     5.00     8.00     48.00    30.00
C              1     6.00     9.00     9.00     6.00
D              6     8.00     10.00    60.00    48.00
E              4     2.00     1.50     6.00     8.00
F              1     20.00    15.00    15.00    20.00
                                       Σp1q0 = 174    Σp0q0 = 146.5

\[ \text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{174}{146.5} \times 100 = 118.77 \]

Consumer Price Index by Weighted Relatives
Commodities    q0    p0       p1       I         V        IV
A              6     5.75     6.00     104.35    34.50    3600
B              6     5.00     8.00     160.00    30.00    4800
C              1     6.00     9.00     150.00    6.00     900
D              6     8.00     10.00    125.00    48.00    6000
E              4     2.00     1.50     75.00     8.00     600
F              1     20.00    15.00    75.00     20.00    1500
                                                 ΣV = 146.5    ΣIV = 17400

\[ \text{Consumer Price Index} = \frac{\sum IV}{\sum V} = \frac{17400}{146.5} = 118.77 \]
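Both methods of Example 12 can be checked with a short Python sketch. The variable names are illustrative assumptions; the quantities and prices are those given in the example. Because I × V = (p1/p0 × 100) × p0q0 = 100 × p1q0, the two methods must give the same result.

```python
# (commodity, q0, p0, p1) from Example 12: base year 2010, current year 2013.
data = [
    ("A", 6, 5.75, 6.00),
    ("B", 6, 5.00, 8.00),
    ("C", 1, 6.00, 9.00),
    ("D", 6, 8.00, 10.00),
    ("E", 4, 2.00, 1.50),
    ("F", 1, 20.00, 15.00),
]

# Aggregative method: sum(p1*q0) / sum(p0*q0) * 100.
sum_p1q0 = sum(p1 * q0 for _, q0, p0, p1 in data)
sum_p0q0 = sum(p0 * q0 for _, q0, p0, p1 in data)
cpi_aggregative = sum_p1q0 / sum_p0q0 * 100.0

# Weighted relatives method: sum(I*V) / sum(V), with I = p1/p0*100 and V = p0*q0.
sum_iv = sum((p1 / p0 * 100.0) * (p0 * q0) for _, q0, p0, p1 in data)
sum_v = sum(p0 * q0 for _, q0, p0, p1 in data)
cpi_weighted = sum_iv / sum_v

print(round(cpi_aggregative, 2), round(cpi_weighted, 2))
```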

1.9 INDEX NUMBER OF INDUSTRIAL PRODUCTION


The Index Number of industrial production is prepared to know the increase or decrease in the level
of industrial production in a given period compared with some other period. This index measures
the changes in quantum of production. To prepare such an index it is necessary for us to compute
the production for two periods i.e. for the current year and for the base year. Generally the data are
collected under these heads :
(i) Textile industries – cotton, woollen, silk etc.
(ii) Mining industries – iron-ore, coal, copper, petroleum etc.
(iii) Metallurgical industries – iron and steel etc.
(iv) Mechanical industries – locomotives, ships, aeroplane etc.
(v) Miscellaneous – glass, soap, chemical, cement etc.
The output of the various industries is computed. Weights are assigned to the industries on
the basis of some criterion such as capital invested, turnover, net output, production etc. We apply this
formula:
\[ \text{Index of Industrial Production} = \frac{\sum IW}{\sum W} \]
where \( I = \frac{q_1}{q_0} \) (the quantity relative) and W = relative importance of the different outputs.
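The weighted-average formula for the index of industrial production can be sketched in Python. All figures below are hypothetical and serve only to show the arithmetic; the sketch also assumes that the quantity relative is expressed as a percentage (q1/q0 × 100) so that the result reads on a base of 100.

```python
# Hypothetical data: (industry group, base-year output q0, current-year output q1, weight W).
industries = [
    ("Textile", 100, 120, 30),
    ("Mining", 200, 220, 20),
    ("Metallurgical", 50, 60, 25),
    ("Mechanical", 40, 50, 15),
    ("Miscellaneous", 80, 76, 10),
]

# Assumption: quantity relative I = q1 / q0 * 100 (percentage form).
sum_iw = sum((q1 * 100.0 / q0) * w for _, q0, q1, w in industries)
sum_w = sum(w for _, q0, q1, w in industries)

index_of_industrial_production = sum_iw / sum_w
print(round(index_of_industrial_production, 2))
```

Under these assumed figures the index stands at 116.25, i.e. weighted output is about 16 per cent above the base year even though the miscellaneous group declined.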

1.10 LIMITATIONS OF INDEX NUMBERS


1. They are only approximate indicators of the relative level of a phenomenon.
2. Index numbers that are good for achieving one objective may be unsuitable for another.
3. Index numbers can be manipulated in such a manner as to draw the desired conclusion.

1.11 CONSTRUCTION OF BSE SENSEX AND NSE NIFTY


Bombay Stock Exchange
The first organised stock exchange was established in July 1875 as an association of native brokers,
named as Native Shares and Stock Brokers Association. Its formal deed of association was executed
in 1887. This stock exchange is now popularly known as the Bombay Stock Exchange (BSE). This
stock exchange played a significant role during the phase of recovery from several years of depression.
It was the first to be recognised by the Government of India.
The Exchange, while providing an efficient and transparent market for trading in securities,
debt and derivatives, upholds the interests of the investors and ensures redressal of their grievances
whether against the companies or its own member-brokers. It also strives to educate and enlighten
the investors by conducting investor education programmes and making available to them necessary
informative inputs.
SENSEX – The Barometer of Indian Capital Markets
Bombay Stock Exchange (BSE) Sensitive Index Number of equity prices (SENSEX) is the most
widely used and accepted equity price index in the country. SENSEX, first compiled in 1986, was
calculated on a “Market Capitalization-Weighted” methodology of 30 component stocks representing
large, well-established and financially sound companies across key sectors. The base year of SENSEX
was taken as 1978-79. SENSEX today is widely reported in both domestic and international markets
through print as well as electronic media. It is scientifically designed and is based on globally
accepted construction and review methodology. Since September 1, 2003, SENSEX is being
calculated on a free-float market capitalization methodology. The “free-float market capitalization-
weighted” methodology is a widely followed index construction methodology on which majority of
global equity indices are based; all major index providers like MSCI, FTSE, STOXX, S&P and
Dow Jones use the free-float methodology.
Index Specification
Base Year : 1978-79
Base Index Value : 100
Date of Launch : 01-01-1986
Method of calculation : Launched on full market capitalization method; effective September 01, 2003, calculation method shifted to free-float market capitalization.
Number of scrips : 30
Index calculation frequency : Real Time
Historical Value of Index : Index, Price Earnings, Price to Book Value Ratio and Dividend Yield %

SENSEX Calculation Methodology
SENSEX is calculated using the “Free-float Market Capitalization” methodology, wherein, the
level of index at any point of time reflects the free-float market value of 30 component stocks
relative to a base period. The market capitalization of a company is determined by multiplying the
price of its stock by the number of shares issued by the company. This market capitalization is
further multiplied by the free-float factor to determine the free-float market capitalization.
The base period of SENSEX is 1978-79 and the base value is 100 index points. This is often
indicated by the notation 1978-79 = 100. The calculation of SENSEX involves dividing the free-
float market capitalization of 30 companies in the Index by a number called the Index Divisor. The
Divisor is the only link to the original base period value of the SENSEX. It keeps the Index comparable
over time and is the adjustment point for all Index adjustment arising out of corporate actions,
replacement of scrips, etc. During market hours, prices of the index scrips, at which latest trades are
executed, are used by the trading system to calculate SENSEX on a continuous basis.
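The free-float calculation described above can be sketched in Python for a toy index of two stocks. Every figure here (prices, share counts, free-float factors and the base-period market value) is hypothetical, and the relation divisor = base-period free-float market value / base index value is an assumption consistent with "1978-79 = 100", not a statement of the exchange's exact procedure.

```python
def free_float_market_cap(price, shares_issued, free_float_factor):
    """Market capitalisation (price x shares) scaled by the free-float factor."""
    return price * shares_issued * free_float_factor


# Hypothetical constituents: (price, shares issued, free-float factor).
constituents = {
    "Stock X": (200.0, 1000, 0.50),
    "Stock Y": (50.0, 4000, 0.25),
}

total_free_float_cap = sum(
    free_float_market_cap(price, shares, factor)
    for price, shares, factor in constituents.values()
)

# Assumed base period: free-float market value of 75,000 set equal to 100 index points.
base_free_float_cap = 75000.0
base_index_value = 100.0
index_divisor = base_free_float_cap / base_index_value

index_level = total_free_float_cap / index_divisor
print(total_free_float_cap, index_divisor, index_level)
```

In this toy case the free-float market value has doubled relative to the base period, so the index reads 200. When a corporate action or a replacement of scrips changes the market value without any price movement, the divisor would be adjusted so that the index level is unchanged at that moment.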
Calculation of BSE SENSEX
Before the shift to free-float in September 2003, Sensex was calculated using a full “Market Capitalisation-Weighted” methodology. As per this
methodology, the level of index at any point of time reflects the total market value of 30 component
stocks relative to a base period. (The market capitalisation of a company is determined by multiplying
the price of its stock by the number of shares issued by the company). Statisticians call an index of
a set of combined variables (such as price and number of shares) a composite index. A single
indexed number is used to represent the results of this calculation in order to make the value easier
to work with and track over time. It is much easier to graph a chart based on indexed values than
one based on actual values.
The base period of Sensex is 1978-79. The actual total market value of the stocks in the index
during the base period has been set equal to an indexed value of 100. This is often indicated by the
notation 1978-79 = 100. The formula used to calculate the index is fairly straightforward. However,
the calculation of the adjustments to the index (commonly called Index maintenance) is more
complex.
The calculation of Sensex involves dividing the total market capitalisation of 30 companies in the
index by a number called the Index Divisor. The Divisor is the only link to the original base period
value of the Sensex. It keeps the index comparable over time and is the adjustment point for all
index maintenance adjustments. During market hours, prices of the index scrips, at which latest
trades are executed, are used by the trading system to calculate Sensex every 15 seconds, and the
value is disseminated all over the country through BOLT terminals in real time.
Calculation of Closing SENSEX
The closing Sensex is computed taking the weighted average of all the trades on Sensex constituents
in the last 15 minutes of trading session. If a Sensex constituent has not traded in the last 15 minutes,
the last traded price is taken for computation of the index closure. If a Sensex constituent has not
traded at all in a day, then its last day’s closing price is taken for computation of index closure. The
use of Index Closure Algorithm prevents any intentional manipulation of the closing index value.
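The fallback logic for a constituent's closing price can be sketched in Python. The text does not say what the weights of the weighted average are; this sketch assumes they are traded quantities, and the function name and arguments are hypothetical.

```python
def closing_price(trades_last_15_min, last_traded_price=None, previous_close=None):
    """Closing price of one constituent under the rules described above.

    trades_last_15_min: list of (price, quantity) pairs from the last 15 minutes.
    Assumption: the weighted average is weighted by traded quantity.
    """
    if trades_last_15_min:
        total_quantity = sum(quantity for _, quantity in trades_last_15_min)
        total_value = sum(price * quantity for price, quantity in trades_last_15_min)
        return total_value / total_quantity
    if last_traded_price is not None:
        # Traded earlier in the day but not in the last 15 minutes.
        return last_traded_price
    # Not traded at all during the day.
    return previous_close


traded_late = closing_price([(100.0, 10), (102.0, 30)])
traded_earlier = closing_price([], last_traded_price=98.5, previous_close=97.0)
not_traded = closing_price([], previous_close=97.0)
print(traded_late, traded_earlier, not_traded)
```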

BSE Sensex as on January 7, 2012
Columns: Scrip code; Company; Close price; No. of shares (normal); Full mkt. cap. (Rs. crore); Free-float adj. factor; Free-float mkt. cap. (Rs. crore); Weight in index (%)
500209 Infosys Ltd. 2,837.50 574,203,082 162,930.12 0.85 138,490.61 10.66
500325 Reliance 714.70 3,274,452,139 234,025.09 0.55 128,713.80 9.91
500875 ITC Ltd. 201.60 7,789,453,850 157,035.39 0.70 109,924.77 8.46
500010 Housing Deve. 668.05 1,470,391,801 98,229.52 0.95 93,318.05 7.18
532174 ICICI Bank Ltd. 745.45 1,152,540,454 85,916.13 1.00 85,916.13 6.61
500180 HDFC Bank Ltd. 451.10 2,337,610,305 105,449.60 0.80 84,359.68 6.49
532540 TCS Ltd. 1,171.35 1,957,220,996 229,259.08 0.30 68,777.72 5.29
500510 Larsen & Toubro 1,081.45 611,844,627 66,167.94 0.90 59,551.14 4.58
500112 State Bank of India 1,669.75 634,999,595 106,029.06 0.45 47,713.08 3.67
532454 Bharti Airtel 330.90 3,797,530,096 125,660.27 0.35 43,981.09 3.39
500312 ONG Corp. Ltd. 256.65 8,555,490,120 219,576.65 0.20 43,915.33 3.38
500696 Hind Uni Ltd. 396.00 2,160,391,918 85,551.52 0.50 42,775.76 3.29
500570 Tata Motors 203.55 2,691,486,150 54,785.20 0.65 35,610.38 2.74
500520 Mahindra & M 654.25 613,974,839 40,169.30 0.75 30,126.98 2.32
532555 NTPC Ltd. 157.15 8,245,464,400 129,577.47 0.20 25,915.49 2.00
507685 Wipro Ltd. 406.55 2,457,821,578 99,922.74 0.25 24,980.68 1.92
500470 Tata Steel 362.80 959,214,779 34,800.31 0.70 24,360.22 1.88
500103 BHEL 250.65 2,447,600,000 61,349.09 0.35 21,472.18 1.65
532977 Bajaj Auto 1,448.05 289,367,020 41,901.79 0.50 20,950.90 1.61
524715 Sun Pharmace 500.40 1,035,550,385 51,818.94 0.40 20,727.58 1.60
533278 Coal India 319.80 6,316,364,400 201,997.33 0.10 20,199.73 1.56
532286 Jindal Steel 466.05 934,509,595 43,552.82 0.45 19,598.77 1.51
500087 CIPLA Ltd. 335.80 802,921,357 26,962.10 0.65 17,525.36 1.35
500182 Hero Moto Co. 1,729.80 199,687,500 34,541.94 0.50 17,270.97 1.33
500440 Hindalco In 117.95 1,918,596,448 22,629.85 0.70 15,840.89 1.22
500400 Tata Power 91.95 2,373,072,360 21,820.40 0.70 15,274.28 1.18
500900 Sterlite In 94.35 3,360,700,478 31,708.21 0.45 14,268.69 1.10
532500 Maruti Suzuki 954.75 288,910,060 27,583.69 0.50 13,791.84 1.06
532868 DLF Limited 176.60 1,698,157,659 29,989.46 0.25 7,497.37 0.58
532532 Jaiprak Asso 51.90 2,126,433,182 11,036.19 0.55 6,069.90 0.47
Total 2,641,977.20 1,298,919.37
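The columns of the table are linked by simple arithmetic, which can be verified in Python for two rows. The figures are taken from the table; the conversion 1 crore = 10,000,000 and the helper name `row_values` are the only additions.

```python
CRORE = 10_000_000  # 1 crore = 10 million

# Total free-float market capitalisation of the 30 scrips (Rs. crore), from the table.
TOTAL_FREE_FLOAT_CAP = 1_298_919.37


def row_values(price, shares, free_float_factor):
    """Return (full cap, free-float cap, weight %) for one scrip, caps in Rs. crore."""
    full_cap = price * shares / CRORE
    free_float_cap = full_cap * free_float_factor
    weight = free_float_cap / TOTAL_FREE_FLOAT_CAP * 100.0
    return full_cap, free_float_cap, weight


infosys = row_values(2837.50, 574_203_082, 0.85)
reliance = row_values(714.70, 3_274_452_139, 0.55)

print([round(x, 2) for x in infosys])
print([round(x, 2) for x in reliance])
```

The weight of a scrip is its free-float market capitalisation as a percentage of the total, so a company with a large full market capitalisation but a small free-float factor carries a smaller weight than its size alone would suggest.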

Sectorwise Market Capitalisation of SENSEX as on January 7, 2012
Sl. No.  Sector  Free-float market capitalization (Rs. crore)  %
SENSEX 1,298,919.37 100.00
1. Finance 311,306.94 23.97
2. Information Technology 232,249.01 17.88
3. Oil & Gas 172,629.13 13.29
4. FMCG 152,700.53 11.76
5. Transport Equipments 117,751.07 9.07
6. Metal, Metal Product & Mining 94,268.30 7.26
7. Capital Goods 81,023.32 6.24
8. Telecom 43,981.09 3.39
9. Power 41,189.77 3.17
10. Healthcare 38,252.94 2.94
11. Housing Related 13,567.27 1.04

Free-float Methodology for Calculating Sensex


From 1st September 2003, the country’s equity benchmark Sensex is being calculated based on the
Free-float methodology. Prior to 1-9-2003, the Sensex was calculated based on the full market
capitalisation methodology.
Globally, the Free-float methodology of index construction is considered to be an industry best
practice and all major index providers like MSCI, FTSE, S&P and STOXX have adopted the same.
The MSCI India Standard Index is also based on the Free-float methodology.
In India, BSE pioneered the concept of Free-float with the launch of the country’s first Free-
float based index – BSE TECk in July 2001 and BANKEX in June 2003. The shifting of Sensex to
this methodology is a culmination of successful experiences with these two indices and a series of
debates and discussions in the last few years.
The new methodology would align the Sensex with the best global practice in index construction.
A smooth transition from full market capitalisation to Free-float market capitalisation methodology
would ensure that the basic characteristics of Sensex are retained. Importantly, the Free-float
methodology will further improve the benchmarking qualities of Sensex while maintaining its
historical continuity.
The following Free-float factors will be applied to the Sensex companies. A Free-float factor
of say 0.9 means that only 90% of the total market capitalisation of that company would be taken
into consideration for index calculation.

Free-float Index
Currently all equity indices in India, except the BSE-TECk Index and BANKEX, are calculated
using the ‘full-market capitalisation’ methodology. Under the ‘full-market capitalisation’
methodology, the total market capitalisation of a company, irrespective of who is holding the shares,
is taken into consideration for computation of an index. However, if instead of taking the total
market capitalisation, only the Free-float market capitalisation of a company is considered for index
calculation, it is called the Free-float methodology. Free-float market capitalisation is defined as
that proportion of total shares issued by the company, which are readily available for trading in the
market. It generally excludes promoters’ holding, government holding, strategic holding and other
locked-in shares, which will not come to the market for trading in the normal course. Thus, the
market capitalisation of each company in a Free-float index is reduced to the extent of its Free-float
available in the market.
National Stock Exchange (NSE)
In order to provide a nationwide stock trading facility to investors and to bring the Indian financial
market in line with international market, the National Stock Exchange (NSE) was set up and it
started its operations by the end of 1993. Further, it started trading in debt instruments in May, 1994
and in equity shares by the end of November 1994. The NSE uses the electronic trading system and
computerised settlement system. This system is so designed that it can be extended to every corner
of the country through the medium of electronic network. It was recently accorded recognition as a
stock exchange by the Department of Company Affairs. The instruments traded are treasury bills,
government security and bonds issued by public sector companies.
The exchange has two separate segments, i.e., capital market segment and money market
segment. The former is concerned with trading in equity shares, convertible debentures and debt
instruments such as non-convertible debentures. The money market segment, also known as the wholesale
debt market segment, facilitates trading in debts, public sector bonds, mutual fund units, treasury
bills, government securities, call money instruments, etc. The transactions in this segment are of
high values. The main participants, in this market are usually banks, financial institutions and other
financial agencies.
NSE-50, NIFTY
The NSE-50 index, NIFTY, was launched by the National Stock Exchange of India Limited (NSE)
in April 1996, taking as base the closing prices of November 3, 1995, when one
year of operations of its capital market segment was completed. According to the NSE, the index
was introduced with the objectives of:
1. reflecting market movement more accurately,
2. providing fund managers with a tool for measuring portfolio returns vis-a-vis market
returns, and
3. providing a basis for introducing index-based derivatives.
The index is based on the prices of shares of 50 companies (chosen from among the companies
traded on the NSE), each with a market capitalisation of at least Rs. 500 crores and having a high
degree of liquidity. The methodology used for the computation of this index is ‘market capitalisation
weightage’ as followed by the S&P-500. The base value of the index has been set at 1000, and not
the usual 100.
S&P CNX NIFTY
The S&P CNX Nifty is the headline index on the National Stock Exchange of India Ltd (NSE). It
includes 50 of the approximately 1,300 companies listed on the NSE, captures approximately 60%
of its equity market capitalization and is a true reflection of the Indian stock market.
S&P CNX Nifty tracks the behaviour of a portfolio of blue chip companies, the largest and
most liquid Indian securities. It covers 25 sectors of the Indian economy and offers investment
managers exposure to the Indian market in one efficient portfolio. The index has been trading since
April of 1996 and is well suited for benchmarking, index funds, and index-based derivatives.
The S&P CNX Nifty index is owned and managed by the Indian Index Services and Products
Ltd. (IISL), with which Standard and Poor’s has a consulting and licensing agreement. IISL is a
joint venture between NSE and CRISIL (formerly Credit Rating Information Services of India Ltd.).
Index Methodology
S&P CNX Nifty is maintained by IISL’s Index Policy Committee, which manages policy and
guidelines for all CNX (CRISIL/NSE) indices. This Index Policy Committee follows a clear published
set of rules for index revision and meets quarterly to consider their application. Additionally, the
IISL’s Index Maintenance Sub-Committee reviews decisions about additions and deletions to the
index on a quarterly basis. Complete details of these rules are available on the website at
www.indices.standardandpoors.com.
NIFTY COMPOSITION
Columns: (1) Sl. No.; (2) Scrip; (3) Equity Capital; (4) Free Float; (5) Weightage %; (6) Beta; (7) R2; (8) Volatility; (9) Monthly Returns; (10) Impact Cost
1. ACC 1,877,452,660 9,629 0.59 0.72 0.48 1.23 6.43 0.07
2. Ambuja Cem 3,063,349,822 10,652 0.65 0.97 0.48 1.61 –3.22 0.08
3. Axis Bank 4,116,997,330 34,573 2.10 1.36 0.74 1.86 3.72 0.07
4. Bajaj Auto 2,893,670,200 19,722 1.20 0.75 0.49 1.07 4.44 0.06
5. Bharti Airtel 18,987,650,480 52,650 3.20 0.76 0.46 2.05 10.72 0.07
6. BHEL 4,895,200,000 29,067 1.77 0.86 0.58 1.99 –10.29 0.06
7. BPCL 3,615,421,240 8,500 0.52 0.79 0.45 0.92 1.17 0.07
8. CAIRN 19,022,340,290 11,439 0.70 0.59 0.40 1.79 –1.11 0.07
9. CIPLA 1,605,842,714 15,558 0.95 0.71 0.51 1.26 –7.39 0.07
10. DLF 3,395,150,248 8,380 0.51 1.42 0.66 2.59 9.76 0.07
11. DR Reddy 846,959,090 19,974 1.22 0.59 0.46 1.55 3.53 0.08

12. GAIL 12,684,774,000 20,712 1.26 0.62 0.46 1.34 4.51 0.08
13. GRASIM 917,018,380 13,942 0.85 0.67 0.49 1.37 4.78 0.08
14. HCL Tech 1,376,020,536 11,839 0.72 0.95 0.60 1.21 –1.61 0.06
15. HDFC 2,936,038,650 89,641 5.45 1.19 0.71 1.26 –2.45 0.07
16. HDFC Bank 4,667,711,360 87,080 5.30 1.06 0.74 1.06 –3.24 0.07
17. Hero Honda 399,375,000 17,035 1.04 0.53 0.25 1.20 –4.91 0.06
18. Hindal Co. 1,914,419,297 21,647 1.32 1.42 0.67 2.44 –6.70 0.06
19. Hind Unilever 2,160,683,560 33,208 2.02 0.58 0.42 0.84 –5.72 0.05
20. ICICI Bank 11,518,614,870 119,419 7.27 1.40 0.78 1.47 –5.29 0.06
21. IDFC 14,627,715,770 15,139 0.92 1.47 0.71 2.45 –3.93 0.07
22. INFY 2,870,938,460 133,825 8.14 0.83 0.62 1.38 –4.62 0.04
23. ITC 7,738,144,280 110,948 6.75 0.75 0.54 1.26 2.66 0.06
24. Jindal Steel 934,509,595 22,847 1.39 0.99 0.67 1.21 –9.87 0.06
25. JP Associate 4,252,866,364 7,527 0.46 1.78 0.70 1.69 –17.73 0.07
26. Kotak Bank 3,689,051,845 15,763 0.96 1.20 0.67 2.02 –7.16 0.08
27. LT 1,220,046,436 92,399 5.62 1.18 0.73 1.68 –5.38 0.06
28. M&M 3,069,874,195 33,234 2.02 1.20 0.65 1.99 2.69 0.07
29. Maruti 1,444,550,300 15,963 0.97 0.82 0.56 1.14 4.03 0.06
30. NTPC 82,454,644,000 22,507 1.37 0.75 0.57 1.23 –5.78 0.07
31. ONGC 42,777,450,600 36,322 2.21 0.71 0.48 0.94 –1.82 0.07
32. PNB 3,168,121,570 14,955 0.91 0.97 0.64 1.73 3.18 0.08
33. Power Grid 46,297,253,530 14,879 0.91 0.53 0.45 0.96 –4.02 0.06
34. Ranbaxy 2,106,816,150 8,231 0.50 0.89 0.50 1.29 –0.35 0.06
35. RCom 10,320,134,405 6,736 0.41 1.22 0.47 4.16 6.11 0.08
36. Rel Capital 2,456,328,000 6,484 0.39 1.24 0.57 1.54 –0.50 0.06
37. Reliance 32,738,103,000 139,612 8.50 0.96 0.68 1.60 –7.85 0.06
38. Re Infra 2,653,702,620 7,502 0.46 1.13 0.46 2.08 1.07 0.07
39. R Power 28,051,264,660 6,149 0.37 0.98 0.52 1.19 –3.41 0.07
40. SAIL 41,304,005,450 7,400 0.45 1.06 0.61 1.64 –8.11 0.07
41. SBIN 6,349,989,910 60,447 3.68 1.25 0.69 1.35 –2.49 0.04
42. SESAGOA 869,101,423 10,727 0.65 1.05 0.51 1.95 –2.48 0.06
43. Siemens 680,589,800 7,881 0.48 0.52 0.33 1.17 3.86 0.07
44. Ster 3,361,568,684 22,704 1.38 1.21 0.61 1.58 –5.05 0.06
45. Sun Pharma 1,035,581,955 19,472 1.18 0.68 0.45 1.17 4.08 0.07

46. Tata Motors 5,382,725,080 33,259 2.02 1.27 0.61 1.93 –4.64 0.05
47. Tata Power 2,373,072,360 20,733 1.26 0.57 0.46 1.30 –2.07 0.07
48. Tata Steel 9,592,144,500 37,547 2.28 1.15 0.70 1.14 –7.58 0.05
49. TCS 1,957,220,996 57,746 3.51 0.95 0.61 1.15 –3.99 0.06
50. WIPRO 4,911,272,378 19,741 1.20 0.83 0.56 1.37 –6.78 0.06

1.12 SUMMARY
An index number measures relative changes in the value of some economic variable/s over a
period of time. It is always expressed in terms of a base of usually 100.
Index numbers showing changes in the values of one variable over time are called univariate
while those showing changes in a group of variables are known as composite index numbers.
The base of an index may either be fixed or chained. In fixed base index numbers, the base
period is common while in chain base indices, for every period its immediately preceding
period is taken as the base.
In order to compare two time series of price relatives, it is necessary that each series should
have the same base period.
The composite index numbers may be simple or weighted, and aggregative or average-of-
relatives.
Simple aggregative price index shows the aggregate of the current year prices as a percentage
of the aggregate of the base year prices.
For weighted aggregative price indices, the quantities are used as weights whereas for
aggregative quantity indices, the prices are used as weights.
The two basic aggregative price indices are those using base year quantities as weights, known
as Laspeyre’s index and current year quantities as weights, known as Paasche’s index.
Fisher’s index is equal to the geometric mean of Laspeyre’s and Paasche’s indexes.
There are four tests of adequacy of index numbers: units test, time-reversal test, factor-reversal
test and circular test. Fisher’s method is the only one that satisfies the first three of these tests.
Value index may be calculated by taking the ratio of current year value to base year value
expressed as a percentage.
Purchasing power of rupee varies inversely with the price index.
Splicing refers to joining two or more series of index numbers for the reason of continuity.

1.13 SELF ASSESSMENT QUESTIONS


Exercise 1 : True or False Statements
(i) Index numbers are used to measure changes in the magnitude of one variable or a group
of distinct but related variables.

(ii) Univariate index numbers involve only one variable.
(iii) Index numbers are called specialized averages because they are used only for special
purposes.
(iv) For a given set of data, the simple average-of-price relatives index using arithmetic mean
would be smaller than simple average-of-price relatives index using geometric mean if
the current year prices are lower than the corresponding base year prices.
(v) Laspeyre’s formula uses base year quantities as weights both for price and quantity index
numbers.
(vi) Laspeyre’s price index can never be smaller than 100 in value.
(vii) Since it is not practical to include all commodities in the construction of an index number,
the sample of commodities should be selected by the method of simple random sampling
for reasons of objectivity.
(viii) For a given set of data, Fisher’s price index number cannot exceed both Laspeyre’s and
Paasche’s price indices.
(ix) Fixed base index numbers are also called link relatives while price relatives is another
name for chain base indices.
(x) In weighted average-of-relatives index, the weights used are either the base year quantities
or the current year quantities.
(xi) Paasche’s price index is calculated by using current year values as weights.
(xii) Laspeyre’s price index always has greater value than Paasche’s price index.
(xiii) Dorbish-Bowley index is equal to the arithmetic mean of Laspeyre’s and Paasche’s price
indices.
(xiv) The value index is given by the product of price and quantity index numbers.
(xv) If an index satisfies time-reversal test, it means that for that index, P01 and P10 are reciprocals
of each other.
(xvi) Simple aggregative index satisfies circular test.
(xvii) Fisher’s index is called ‘ideal’ because it satisfies all the tests of adequacy of index numbers.
(xviii) Splicing refers to connecting two or more series of index numbers for the purpose of
continuity.
(xix) The purchasing power of rupee is inversely related to the price index.
Ans. 1. T, 2. T, 3. F, 4. F, 5. F, 6. F, 7. F, 8. T, 9. F, 10. F, 11. F, 12. F, 13. T, 14. F, 15. T, 16. T, 17. F,
18. T, 19. T
Exercise 2 : Questions and Answers
(i) What are index numbers? Why are they called specialized averages?
(ii) “Index numbers are said to be economic barometers.” Explain this statement.

(iii) Distinguish between simple and weighted index numbers. Explain the importance of
weighting in the construction of index numbers. Enumerate some of the important methods
of weighting a price index and discuss their relative merits and demerits.
(iv) Laspeyre’s price index generally shows an upward bias while Paasche’s index shows a
downward bias. Do you agree? Explain.
(v) Explain and illustrate the following:
(a) Base shifting
(b) Splicing
(c) Deflating
(vii) What is consumer price index number and what is its utility? What methods are used to
calculate these numbers?
(viii) State and explain the tests of adequacy of index numbers. Which of these are satisfied by
the following index numbers?
(a) Laspeyre
(b) Paasche
(c) Fisher
(d) Dorbish-Bowley
(e) Marshall-Edgeworth
(ix) Information on the price of a commodity over the last few years is given below:
Year: 2004 2005 2006 2007 2008 2009 2010 2011 2012
Price in Rs per kg: 62 68 63 69 75 78 82 98 100
(a) Construct price index numbers taking (i) 2004 as base, and (ii) 2008 as base.
(b) Calculate chain base index numbers.
(x) The price index for 2006 stood at 100. It increased by 8 percent in 2007, decreased by 6
percent in 2008, decreased by 2 percent in 2009, increased by 14 percent in 2010, remained
unchanged in 2011, and increased by 12 percent in 2012. Calculate index numbers for
the years 2006 through 2012 taking 2006 = 100, and then shift the origin to 2008.
(xi) Using the following link relatives, calculate price relatives taking 2005 = 100.
Year: 2005 2006 2007 2008 2009 2010 2011 2012
Link relative: 100 114 120 120 114 136 105 110
(xii) Using following data, calculate the simple average-of-price relatives index by taking
(a) Prices of 2005 as the base
(b) Average prices as the base

Year Commodity
A B C D E
2005 20 12 40 24 36
2008 30 20 20 30 15
2012 40 10 20 42 15
(xiii) Calculate Laspeyre’s and Paasche’s price index numbers for the year 2012 using the
following data about five commodities:
                      Commodity
            Year    A     B     C      D     E
Quantity :  2010    12    20    180    36    48
            2012    9     32    244    25    60
Price :     2010    24    18    24     48    24
            2012    28    20    22     52    20
Further, test whether each of these satisfies the time-reversal test.
(xiv) From the following data, construct quantity index numbers using (a) Fisher’s method,
and (b) Marshall-Edgeworth method.
Commodity Base year Current year
Price Quantity Price Value
A 16 30 40 720
B 25 40 50 2,000
C 24 25 30 1,200
D 54 16 40 1,320
E 22 44 45 1,350
(xv) Using the following data, show that Fisher’s formula satisfies the time-reversal and factor-
reversal tests.
Commodity Base year Current year
Price Value Price Value
A 6 300 10 440
B 3 300 8 1,200
C 12 720 15 1,050
D 15 600 20 900
E 4 60 5 80
(xvi) An enquiry into the budgets of the middle class families of a certain city revealed that on
an average the percentage expenses on the different groups were: Food 45, Clothing 12,
Fuel & Light 8, and Miscellaneous 20. The increases in group index numbers for the
current year as compared with a fixed base period were, respectively, 310, 50, 243, 148
and 185.

(a) Calculate the consumer price index number for the current year.
(b) A person was getting Rs. 14,400 pm in the base year and Rs. 28,300 pm in the current
year. State how much he ought to have received as extra allowance to maintain his former
standard of living.
(xvii) From the information given below about the consumer price index number for a certain
group of families in a city, obtain the percentage weights assigned to (a) clothing, and (b)
housing. The consumer price index number is known to be 152.3.
Group: Food Clothing Housing Fuel and electricity Miscellaneous
Index: 140 185 205 120 156
Weight: 60 ? ? 8 10
(xviii) The monthly income of a person is Rs. 21,000. It is given that the consumer price index
number for a particular month is 136. Find out the amount spent by him on (i) food, and
(ii) clothing.
Group Expenditure Index
Food ? 180
Rent 2,940 100
Clothing ? 150
Fuel and power 3,360 110
Miscellaneous 3,780 80
(xix) Owing to a sudden price disturbance, the consumer price index of a working class in a
certain area increased in a month by one-quarter of what it was before, to 225. The index
of food became 252 from 198, that of clothing from 185 to 205, that of fuel & lighting
from 175 to 195, and that of miscellaneous from 138 to 212. The index of rent, however,
remained unchanged at 150. It was known that the weights of clothing, rent and fuel &
lighting were the same. Find out the exact percentage weights of each of the groups.
Ans. 12. (a) 106.67, 110 (b) 108.20, 95.95, 95.84; 13. L = 94.52, P = 94.16, No; 14. (a) 109.66
(b) 107.61; 16. (a) 325 (b) 18500; 17. (a) 10 (b) 12; 18. (i) 8400 (ii) 2520; 19. F = 54,
C = FL = R = 10, Misc = 16

UNIT V
LESSON 1
TIME SERIES ANALYSIS

1. STRUCTURE
1.0 Objective
1.1 Introduction
1.2 Components of Time Series
1.2.1 Secular Trend
1.2.2 Seasonal Variations
1.2.3 Business Cycle
1.2.4 Irregular Variations
1.3 Models of Time Series
1.4 Methods of Measuring Trend
1.4.1 Freehand Curve Method
1.4.2 Moving Averages
1.4.3 Semi-average Method
1.4.4 Method of Least Squares
1.5 Second Degree Parabola
1.6 Exponential Trend
1.7 Shifting the Trend Origin
1.8 Conversion of Annual Trend to Monthly Trend
1.9 Measurement of Seasonal Variations
1.9.1 Method of Simple Averages
1.9.2 Ratio-to-Moving Average Method
1.9.3 Ratio-to-Trend Method
1.9.4 Link Relatives Method
1.10 Summary
1.11 Self Assessment Questions

1.0 OBJECTIVE
After studying this lesson, you should be able to :
(i) Understand the meaning of time series analysis
(ii) Understand the importance and components of time series
(iii) Measure trend and seasonal variations by different methods
(iv) Measure trend by second degree parabola and exponential technique
(v) Learn shifting the trend origin and conversion of annual trend to monthly basis and vice-
versa.

1.1 INTRODUCTION
When quantitative data are arranged in the order of their occurrence, the resulting statistical series
is called a time series. The quantitative values are usually recorded at equal time intervals: daily,
weekly, monthly, quarterly, half-yearly, yearly, or by any other time measure. Monthly statistics of
industrial production in India, annual birth-rate figures for the entire world, yield on ordinary shares,
weekly wholesale price of rice, daily records of tea sales or census data are some of the examples of
time series. Each has a common characteristic of recording magnitudes that vary with passage of
time.
Time series are influenced by a variety of forces. Some are continuously effective, others make
themselves felt at recurring time intervals, and still others are non-recurring or random in nature.
Therefore, the first task is to break down the data and study each of these influences in isolation.
This is known as decomposition of the time series. It enables us to understand fully the nature of the
forces at work. We can then analyse their combined interactions. Such a study is known as time-
series analysis.

1.2 COMPONENTS OF TIME SERIES


A time series consists of the following four components or elements :
1. Basic or Secular or Long-time trend;
2. Seasonal variations;
3. Business cycles or cyclical movement; and
4. Erratic or Irregular fluctuations.
These components provide a basis for the explanation of the past behaviour. They help us to
predict the future behaviour. The major tendency of each component or constituent is largely due to
causal factors. Therefore a brief description of the components and the causal factors associated
with each component is given before proceeding further.
1.2.1 Secular Trend
Basic trend underlines the tendency to grow or decline over a period of years. It is the movement
that the series would have taken, had there been no seasonal, cyclical or erratic factors. It is the
effect of such factors which are more or less constant for a long time or which change very gradually
and slowly. Such factors are gradual growth in population, tastes and habits or the effect on industrial
output due to improved methods. Increase in production of automobiles and a gradual decrease in
production of foodgrains are examples of increasing and decreasing secular trend.
All basic trends are not of the same nature. Sometimes the predominating tendency will be a
constant amount of growth. This type of trend movement takes the form of a straight line when the
trend values are plotted on a graph paper. Sometimes the trend will be a constant percentage increase
or decrease. This type takes the form of a straight line when the trend values are plotted on a semi-
logarithmic chart. Other types of trend encountered are “logistic”, “S-curves”, etc.
Properly recognising and accurately measuring basic trends is one of the most important
problems in time series analysis. Trend values are used as the base from which other three movements
are measured. Therefore, any inaccuracy in its measurement may vitiate the entire work. Fortunately,
the causal elements controlling trend growth are relatively stable. Trends do not commonly change
their nature quickly and without warning. It is therefore reasonable to assume that a representative
trend, which has characterized the data for a past period, is prevailing at present, and that it may be
projected into the future for a year or so.

1.2.2 Seasonal Variations
The two principal factors responsible for seasonal changes are the climate or weather and customs. Since
the growth of all vegetation depends upon temperature and moisture, agricultural activity is confined
largely to warm weather in the temperate zones and to the rainy or post-rainy season in the torrid
zone (tropical countries or sub-tropical countries like India). Winter and dry season make farming a
highly seasonal business. This high irregularity of month to month agricultural production determines
largely all harvesting, marketing, canning, preserving, storing, financing, and pricing of farm products.
Manufacturers, bankers and merchants who deal with farmers find their business taking on the
same seasonal pattern which characterises the agriculture of their area.
The second cause of seasonal variation is custom, education or tradition. Such traditional days
as Diwali, Christmas, Id, etc., produce marked variations in business activity, travel, sales, gifts,
finance, accidents, and vacationing.
The successful operation of any business requires that its seasonal variations be known,
measured and exploited fully. Frequently, the purchase of seasonal items is made from six months to
a year in advance. Departments with opposite seasonal changes are frequently combined in the
same firm to avoid dull seasons and to keep sales or production up during the entire year.
Seasonal variations are measured as a percentage of the trend rather than in absolute quantities.
The seasonal index for any month (week, quarter etc.) may be defined as the ratio of the normally
expected value (excluding the business cycle and erratic movements) to the corresponding trend
value. When cyclical movement and erratic fluctuations are absent in a time series, such a series is
called normal. Normal values thus consist of trend and seasonal components. Thus when
normal values are divided by the corresponding trend values, we obtain seasonal component of
time series.
1.2.3 Business Cycle
Because of the persistent tendency for business to prosper, decline, stagnate, recover, and prosper
again, the third characteristic movement in economic time series is called the business cycle. The
business cycle does not recur regularly like seasonal movement, but moves in response to causes
which develop intermittently out of complex combinations of economic and other considerations.
When the business of a country or a community is above or below normal, the excess or deficiency
is usually attributed to the business cycle. Its measurement becomes a process of contrasting occurrences
with a normal estimate arrived at by combining the calculated trend and seasonal movements. The
measurement of the variations from normal may be made in terms of actual quantities or it may be
made in such terms as percentage deviations, which is generally the more satisfactory method as it
places the measure of cyclical tendencies on a comparable base throughout the entire period under
analysis.
1.2.4 Irregular Variations
These movements are exceedingly difficult to dissociate quantitatively from the business cycle.
Their causes are irregular and unpredictable happenings such as wars, droughts, floods, fires,
pestilence, fads and fashions which operate as spurs or deterrents upon the progress of the cycle.
Examples of such movements are: high activity in the middle forties due to the erratic effects of the Second World
War, the depression of the thirties throughout the world, and the export boom associated with the Korean War in 1950.
The common denominator of every random factor is that it does not come about as a result of the
ordinary operation of the business system and does not recur in any meaningful manner.

1.3 MODELS OF TIME SERIES
A time series may not be affected by all types of variations. Some of these types of variations may
affect a few time series, while other series may be affected by all of them. Hence, in analysing
time series, these effects are isolated. In classical time series analysis it is assumed that any given
observation is made up of trend, seasonal, cyclical and irregular movements and these four
components have multiplicative relationship.
Symbolically:
O =T×S×C×I
where O refers to original data,
T refers to trend,
S refers to seasonal variations,
C refers to cyclical variations and
I refers to irregular variations.
This is the most commonly used model in the decomposition of time series.
There is another model called Additive model in which a particular observation in a time
series is the sum of these four components.
O =T+S+C+I
To prevent confusion between the two models, it should be made clear that in Multiplicative
model S, C, and I are indices expressed as decimal percents whereas in Additive model S, C and I
are quantitative deviations about trend that can be expressed as seasonal, cyclical and irregular in
nature.
If in a multiplicative model, T = 500, S = 1.4, C = 1.20 and I = 0.7 then
O =T×S×C×I
By substituting the values we get
O = 500 × 1.4 × 1.20 × 0.7 = 588
In additive model, T = 500, S = 100, C = 25, I = –50
O = 500 + 100 + 25 – 50 = 575
The assumption underlying the two schemes of analysis is that whereas there is no interaction
among the different constituents or components under the additive scheme, such interaction is very
much present in the multiplicative scheme. Time series analysis generally proceeds on the assumption
of multiplicative formulation.
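The two numerical illustrations above can be reproduced with a minimal Python sketch. The function names `multiplicative` and `additive` are illustrative choices, not part of the lesson; S, C and I are passed as ratios in the first model and as absolute deviations in the second.

```python
def multiplicative(trend, seasonal, cyclical, irregular):
    """O = T x S x C x I, with S, C and I given as ratios (1.0 means no effect)."""
    return trend * seasonal * cyclical * irregular


def additive(trend, seasonal, cyclical, irregular):
    """O = T + S + C + I, with S, C and I given as deviations in the units of T."""
    return trend + seasonal + cyclical + irregular


# Figures taken from the illustrations in the text.
mult_value = multiplicative(500, 1.4, 1.20, 0.7)
add_value = additive(500, 100, 25, -50)

print(round(mult_value, 2))  # 588.0
print(add_value)             # 575
```

In the multiplicative case a component can be isolated by division (for example O / T gives S × C × I), whereas in the additive case it is isolated by subtraction.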

1.4 METHODS OF MEASURING TREND


Trend can be determined by: (i) the freehand curve method; (ii) the moving averages method;
(iii) the semi-averages method; and (iv) the least-squares method. Each of these methods is described below.
1.4.1 Freehand Curve Method
The term freehand is applied to any non-mathematical curve in statistical analysis even if it is drawn
with the aid of drafting instruments. This is the simplest method of studying trend of a time series.
The procedure for drawing free hand curve is as follows :

(i) The original data are first plotted on a graph paper.
(ii) The direction of the plotted data is carefully observed.
(iii) A smooth line is drawn through the plotted points.
While fitting a trend line by the freehand method, an attempt should be made that the fitted
curve conforms to these conditions.
(i) The curve should be smooth, either a straight line or a combination of long gradual curves.
(ii) The trend line or curve should be drawn through the graph of the data in such a way that
the areas below and above the trend line are equal to each other.
(iii) The vertical deviations of the data above the trend line must equal the deviations below
the line.
(iv) Sum of the squares of the vertical deviations of the observations from the trend should be
minimum.
Example 1 : Draw a time series graph relating to the following data and fit the trend by freehand
method :
Year Production of Steel
(million tonnes)
2007 20
2008 22
2009 30
2010 28
2011 32
2012 25
2013 29
2014 35
2015 40
2016 32
[Figure: Trend of steel production. X-axis: years (2007 to 2016); Y-axis: production of steel (million tonnes). The graph shows the actual data and the freehand trend line.]

The trend line drawn by the freehand method can be extended to project future values. However,
the free-hand curve fitting is too subjective and should not be used as a basis for prediction.

1.4.2 Moving Averages
The moving average is a simple and flexible process of trend measurement which is quite accurate
under certain conditions. This method establishes a trend by means of a series of averages covering
overlapping periods of the data.
The process of successively averaging, say, three years data, and establishing each average as
the moving-average value of the central year in the group, should be carried throughout the entire
series. For a five-item, seven-item or other moving averages, the same procedure is followed : the
average obtained each time being considered as representative of the middle period of the group.
The choice of a 5-year, 7-year, 9-year, or other moving average is determined by the length of
period necessary to eliminate the effects of the business cycle and erratic fluctuations. A good trend
must be free from such movements, and if there is any definite periodicity to the cycle, it is well to
have the moving average to cover one cycle period. Ordinarily, the necessary periods will range
between three and ten years for general business series but even longer periods are required for
certain industries.
In the preceding discussion, the moving averages of odd number of years were representatives
of the middle years. If the moving average covers an even number of years, each average will still
be representative of the mid-point of the period covered, but this mid-point will fall half way between
the two middle years. In the case of a four-year moving average, for instance each average represents
a point half way between the second and third years. In such a case, a second moving average may
be used to ‘recentre’ the averages. That is, if the first moving average gives averages centred
half-way between the years, a further two-point moving average will recentre the data exactly on
the years.
This method, however, is valuable in approximating trends in a period of transition when the
mathematical lines or curves may be inadequate. This method provides a basis for testing other
types of trends, even though the data are not such as to justify its use otherwise.
Example 2 : Calculate 5-yearly moving average trend for the time series given below.
Year : 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
Quantity : 239 242 238 252 257 250 273 270 268 288 284
Year : 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Quantity : 282 300 303 298 313 317 309 329 333 327
Solution :
Year Quantity 5-yearly moving total 5-yearly moving average
1995 239
1996 242
1997 238 1228 245.6
1998 252 1239 247.8
1999 257 1270 254.0
2000 250 1302 260.4
2001 273 1318 263.6
2002 270 1349 269.8
2003 268 1383 276.6
2004 288 1392 278.4
2005 284 1422 284.4
2006 282 1457 291.4
2007 300 1467 293.4
2008 303 1496 299.2
2009 298 1531 306.2
2010 313 1540 308.0
2011 317 1566 313.2
2012 309 1601 320.2
2013 329 1615 323.0
2014 333
2015 327

To simplify calculation work: Obtain the total of the first five years' data. Subtract the first term
from the sixth term and add this difference to the total to obtain the total of the second to sixth terms. In this
way the difference between the term to be included and the term to be omitted is added to the
preceding total in order to obtain the next successive total.
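The running-total shortcut described above can be sketched in Python as follows. This is only an illustration under the assumption of an odd period; the function name `moving_average` and its return format are choices made here, not part of the lesson. The data are those of Example 2.

```python
def moving_average(values, period):
    """Moving totals and averages for an odd period.

    Returns a list of (index, total, average) tuples, where index is the
    position in `values` of the middle item of each window.
    """
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    total = sum(values[:period])              # total of the first window
    results = [(half, total, total / period)]
    for i in range(period, len(values)):
        # Add the incoming term and drop the outgoing one.
        total += values[i] - values[i - period]
        results.append((i - half, total, total / period))
    return results


years = list(range(1995, 2016))
quantity = [239, 242, 238, 252, 257, 250, 273, 270, 268, 288, 284,
            282, 300, 303, 298, 313, 317, 309, 329, 333, 327]

trend = moving_average(quantity, 5)
for idx, total, avg in trend:
    print(years[idx], total, round(avg, 1))
```

The first printed row corresponds to 1997 (total 1228, average 245.6) and the last to 2013, so no trend value is produced for the first two and last two years.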
Example 3 : Fit a trend line by the method of four-yearly moving average to the following time
series data.
Year : 2001 2002 2003 2004 2005 2006 2007 2008
Sugar production (lakh tons) : 5 6 7 7 6 8 9 10
Year : 2009 2010 2011 2012
Sugar production (lakh tons) : 9 10 11 11

Solution :
Year    Sugar production    4-yearly        4-yearly          2-yearly moving     Centred 4-yearly
        (lakh tons)         moving total    moving average    total of col. 4     moving average (trend)
1.      2.                  3.              4.                5.                  6.
2001    5
2002    6
2003    7                   25              6.25              12.75               6.375
2004    7                   26              6.50              13.50               6.75
2005    6                   28              7.00              14.50               7.25
2006    8                   30              7.50              15.75               7.875
2007    9                   33              8.25              17.25               8.625
2008    10                  36              9.00              18.50               9.25
2009    9                   38              9.50              19.50               9.75
2010    10                  40              10.00             20.25               10.125
2011    11                  41              10.25
2012    11
Remark : Observe carefully the placement of totals, averages between the lines.
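The two-stage calculation for an even period (first the plain moving averages, then a two-point average to recentre them) can be sketched in Python as below. The function name `centred_moving_average` is an illustrative choice; the data are those of Example 3.

```python
def centred_moving_average(values, period):
    """Centred moving average for an even period.

    Returns (index, trend value) pairs, where index is the position in
    `values` on which the centred average falls.
    """
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    # Step 1: plain moving averages; each falls between two observations.
    averages = [sum(values[i:i + period]) / period
                for i in range(len(values) - period + 1)]
    # Step 2: average adjacent pairs to bring the values back onto the years.
    half = period // 2
    return [(i + half, (averages[i] + averages[i + 1]) / 2)
            for i in range(len(averages) - 1)]


years = list(range(2001, 2013))
sugar = [5, 6, 7, 7, 6, 8, 9, 10, 9, 10, 11, 11]

centred = centred_moving_average(sugar, 4)
for idx, value in centred:
    print(years[idx], value)
```

The output runs from 2003 (6.375) to 2010 (10.125), matching column 6 of the solution table.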
Merits
1. This is a very simple method.
2. The element of flexibility is always present in this method as all the calculations do not
have to be altered if some data are added. It only provides additional trend values.
3. If there is a coincidence of the period of moving averages and the period of cyclical
fluctuations, the fluctuations automatically disappear.
4. The pattern of moving average is determined in the trend of data and remains unaffected
by the choice of method to be employed.
5. It can be put to utmost use in case of series having strikingly irregular trend.
Limitations
1. It is not possible to have a trend value for each and every year. As the period of moving
average increases, there is always an increase in the number of years for which trend
values cannot be calculated and known. For example, in a five yearly moving average,
trend value cannot be obtained for the first two years and last two years, in a seven yearly
moving average for the first three years and last three years and so on. But usually values
of the extreme years are of great interest.
2. There is no hard and fast rule for the selection of a period of moving average.
3. Forecasting is one of the leading objectives of trend analysis. But this objective remains
unfulfilled because moving average is not represented by a mathematical function.
4. Theoretically it is claimed that cyclical fluctuations are ironed out if the period of the moving
average coincides with the period of the cycle, but in practice cycles are not perfectly periodic.
1.4.3 Semi-average Method
This simple method can be used if a straight line trend is to be obtained. Since the location of only
two points is necessary to obtain a straight line equation, it is obvious that we may select two
representative points and connect them by a straight line. Data are divided into two halves and an
average is obtained for each half. When each such average is plotted against the mid-point of its half
period, we obtain two points on a graph paper. By joining these points, a straight line trend is
obtained.
The method is to be commended for its simplicity and used to some extent in practical work.
This method is also flexible, for it is permissible to select representative periods to determine the
two points. Unrepresentative years may be ignored.
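As a hedged illustration of the semi-average method, the sketch below applies it to the steel production data of Example 1 (the lesson itself gives no worked example for this method, so the choice of data is an assumption made here). The function name `semi_average_trend` is hypothetical, and leaving out the middle observation when the number of items is odd is one common convention rather than something stated in the text.

```python
def semi_average_trend(years, values):
    """Straight-line trend through the averages of the two halves of the data.

    Returns (intercept, slope, origin_year), so that
    trend = intercept + slope * (year - origin_year).
    With an odd number of observations the middle one is left out
    (an assumed convention).
    """
    n = len(values)
    half = n // 2
    y1 = sum(values[:half]) / half          # average of the first half
    y2 = sum(values[n - half:]) / half      # average of the second half
    t1 = sum(years[:half]) / half           # mid-point of the first half
    t2 = sum(years[n - half:]) / half       # mid-point of the second half
    slope = (y2 - y1) / (t2 - t1)
    return y1, slope, t1


years = list(range(2007, 2017))
steel = [20, 22, 30, 28, 32, 25, 29, 35, 40, 32]   # million tonnes

intercept, slope, origin = semi_average_trend(years, steel)
trend_2016 = intercept + slope * (2016 - origin)
print(intercept, round(slope, 2), origin, round(trend_2016, 2))
```

Here the two half-period averages are 26.4 (centred on 2009) and 32.2 (centred on 2014), so the trend rises by about 1.16 million tonnes a year.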
1.4.4 Method of Least Squares
If a straight line fitted to the data will serve as a satisfactory trend, perhaps the most accurate
method of fitting is that of least squares. This method is designed to accomplish two results.

(i) The sum of the vertical deviations from the straight line must equal zero.
(ii) The sum of the squares of all deviations must be less than the sum of the squares for any
other conceivable straight line.
There will be many straight lines which can meet the first condition. Among all different lines,
only one line will satisfy the second condition. It is because of this second condition that this
method is known as the method of least squares. It may be mentioned that a line fitted to satisfy the
second condition, will automatically satisfy the first condition.
The formula for a straight-line trend can most simply be expressed as
Yc = a + bX
where X represents time variable, Yc is the dependent variable for which trend values are to be
calculated and a and b are the constants of the straight line to be found by the method of least
squares.
Constant a is the Y-intercept. This is the distance between the origin (O) and the
point where the trend line and the Y-axis intersect. It shows the value of Y when X = 0. Constant b
indicates the slope, which is the change in Y for each unit change in X.
Let us assume that we are given observations of Y for n years. We wish to find the
values of constants a and b in such a manner that the two conditions laid down above are satisfied
by the fitted equation.
Mathematical reasoning suggests that, to obtain the values of constants a and b according to
the Principle of Least Squares, we have to solve simultaneously the following two equations.
$\sum Y = na + b\sum X$ ...(i)
$\sum XY = a\sum X + b\sum X^2$ ...(ii)
Solution of the two normal equations yields the following values for the constants a and b:

$b = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$

and $a = \dfrac{\sum Y - b\sum X}{n}$
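As a brief sketch of where the two normal equations come from (a standard calculus argument that the lesson does not spell out), let S denote the sum of squared vertical deviations from the line. Setting the partial derivatives of S with respect to a and b equal to zero gives the two equations; the first of them also shows that the sum of the deviations is zero, which is why a line meeting the second condition automatically meets the first.

```latex
S(a, b) = \sum (Y - a - bX)^2

\frac{\partial S}{\partial a} = -2 \sum (Y - a - bX) = 0
  \;\Rightarrow\; \sum Y = na + b \sum X

\frac{\partial S}{\partial b} = -2 \sum X (Y - a - bX) = 0
  \;\Rightarrow\; \sum XY = a \sum X + b \sum X^2
```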
Least Squares Long Method : It makes use of the above mentioned two normal equations
without attempting to shift the time variable to convenient mid-year. This method is illustrated by
the following example.
Example 4 :
Fit a linear trend curve by the least-squares method to the following data :
Year Production (Kg.)
2006 3
2007 5
2008 6
2009 6
2010 8
2011 10
2012 11
2013 12
2014 13
2015 15
Solution : The first year 2006 is assumed to be 0, 2007 would become 1, 2008 would be 2 and so
on. The various steps are outlined in the following table.
Year    Production (Y)    X    XY    X^2
1       2                 3    4     5
2006 3 0 0 0
2007 5 1 5 1
2008 6 2 12 4
2009 6 3 18 9
2010 8 4 32 16
2011 10 5 50 25
2012 11 6 66 36
2013 12 7 84 49
2014 13 8 104 64
2015 15 9 135 81
Total 89 45 506 285

The above table yields the following values for various terms mentioned below :
$n = 10$, $\sum X = 45$, $\sum X^2 = 285$, $\sum Y = 89$, and $\sum XY = 506$
Substituting these values in the two normal equations, we obtain
89 = 10a + 45b ...(i)
506 = 45a + 285b ...(ii)
Multiplying equation (i) by 9 and equation (ii) by 2, we obtain
801 = 90a + 405b ...(iii)
1012 = 90a + 570b ...(iv)
Subtracting equation (iii) from equation (iv), we obtain
211 = 165 b or b = 211/165 = 1.28
Substituting the value of b in equation (i), we obtain
89 = 10a + 45 × 1.28
89 = 10a + 57.60
10a = 89 – 57.6
10a = 31.4
a = 31.4/10 = 3.14
Substituting these values of a and b in the linear equation, we obtain the following trend line
Yc = 3.14 + 1.28X
Inserting various values of X in this equation, we obtain the trend values as below :
Year Observed Y a b × X Yc (Col. 3 plus Col. 4)
1 2 3 4 5
2006 3 3.14 1.28 × 0 3.14
2007 5 3.14 1.28 × 1 4.42
2008 6 3.14 1.28 × 2 5.70
2009 6 3.14 1.28 × 3 6.98
2010 8 3.14 1.28 × 4 8.26
2011 10 3.14 1.28 × 5 9.54
2012 11 3.14 1.28 × 6 10.82
2013 12 3.14 1.28 × 7 12.10
2014 13 3.14 1.28 × 8 13.38
2015 15 3.14 1.28 × 9 14.66
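The long method of Example 4 can be checked with a short Python sketch that evaluates the closed-form solution of the two normal equations. The function name `fit_linear_trend` is an illustrative choice. Note that the code carries b unrounded, so it gives a of about 3.15, whereas the hand calculation above rounds b to 1.28 first and arrives at 3.14.

```python
def fit_linear_trend(x, y):
    """Solve the two normal equations for Yc = a + b*X."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


production = [3, 5, 6, 6, 8, 10, 11, 12, 13, 15]   # 2006 to 2015
x = list(range(len(production)))                   # 2006 -> 0, 2007 -> 1, ...

a, b = fit_linear_trend(x, production)
print(round(a, 2), round(b, 2))  # 3.15 1.28
```

Trend values for any year then follow from `a + b * X`, with X counted in years from 2006.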
Least Squares Short-cut Method : We can take any other year as the origin, and for that year X would
be 0. Considerable saving of both time and effort is possible if the origin is taken in the middle of
the whole time span covered by the entire series. The origin would then be located at the mean of
the X values. Sum of the X values would then equal 0. The two normal equations would then be
simplified to
$\sum Y = Na$ or $a = \dfrac{\sum Y}{N}$ ...(i)
and $\sum XY = b\sum X^2$ or $b = \dfrac{\sum XY}{\sum X^2}$ ...(ii)
Two cases of the short-cut method are given below. In the first case there is an odd number of years
while in the second case the number of observations is even.
Example 5 : Fit a straight line trend on the following data :
Year 2008 2009 2010 2011 2012 2013 2014 2015 2016
Y 4 7 7 8 9 11 13 14 17

Solution : Since we have 9 observations, the origin is taken at 2012, for which X is assumed to
be 0.
Year Y X XY X^2
2008 4 –4 – 16 16
2009 7 –3 – 21 9
2010 7 –2 – 14 4
2011 8 –1 –8 1
2012 9 0 0 0
2013 11 1 11 1
2014 13 2 26 4
2015 14 3 42 9
2016 17 4 68 16
Total 90 0 88 60

Thus $n = 9$, $\sum Y = 90$, $\sum X = 0$, $\sum XY = 88$, and $\sum X^2 = 60$


Substituting these values in the two normal equations, we get
90 = 9a or a = 90/9 or a = 10
88 = 60b or b = 88/60 or b = 1.47
Trend equation is : Yc = 10 + 1.47 X
Inserting the various values of X, we obtain the trend values as below.
Years Observed Y X a b×X Yc(Col. 4 plus Col. 5)
2008 4 –4 10 1.47 × –4 = – 5.88 4.12
2009 7 –3 10 1.47 × –3 = – 4.41 5.59
2010 7 –2 10 1.47 × –2 = – 2.94 7.06
2011 8 –1 10 1.47 × –1 = – 1.47 8.53
2012 9 0 10 1.47 × 0 = 0 10.00
2013 11 1 10 1.47 × 1 = 1.47 11.47
2014 13 2 10 1.47 × 2 = 2.94 12.94
2015 14 3 10 1.47 × 3 = 4.41 14.41
2016 17 4 10 1.47 × 4 = 5.88 15.88
Example 6 : Fit a straight line trend to the data which gives number of passenger cars sold (millions)
Year 2009 2010 2011 2012 2013 2014 2015 2016
No. of cars 6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1
(millions)

Solution : Here there are two mid-years viz; 2012 and 2013. The mid-point of the two years is
assumed to be 0 and the time of six months is treated to be the unit. On this basis the calculations
are as shown below :
Year Observed Y X XY X2
2009 6.7 –7 – 46.9 49
2010 5.3 –5 – 26.5 25
2011 4.3 –3 –12.9 9
2012 6.1 –1 – 6.1 1
2013 5.6 1 5.6 1
2014 7.9 3 23.7 9
2015 5.8 5 29.0 25
2016 6.1 7 42.7 49
Total 47.8 0 8.6 168
From the above computations, we get the following values.
$n = 8$, $\sum Y = 47.8$, $\sum X = 0$, $\sum XY = 8.6$, $\sum X^2 = 168$
Substituting these values in the two normal equations, we obtain
47.8 = 8a or a = 47.8/8 or a = 5.98
and 8.6 = 168b or b = 8.6/168 or b = 0.051
The equation for the trend line is : Yc = 5.98 + 0.051X
(origin: mid-point of 2012 and 2013; X unit = six months)
Trend values generated by this equation are given below.
Years Observed Y X a b×X Yc(Col. 4 plus Col. 5)
2009 6.7 –7 5.98 .051 × –7 = –.357 5.623
2010 5.3 –5 5.98 .051 × –5 = –.255 5.725
2011 4.3 –3 5.98 .051 × –3 = –.153 5.827
2012 6.1 –1 5.98 .051 × –1 = –.051 5.929
2013 5.6 1 5.98 .051 × 1 = .051 6.031
2014 7.9 3 5.98 .051 × 3 = .153 6.133
2015 5.8 5 5.98 .051 × 5 = .255 6.235
2016 6.1 7 5.98 .051 × 7 = .357 6.337
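Both cases of the short-cut method (odd and even number of years) can be sketched in Python as follows. The helper names `coded_time` and `fit_trend_short_cut` are illustrative choices made here. For an even number of observations the time unit is half a period, as in Example 6.

```python
def coded_time(n):
    """Time deviations from the middle of the series, summing to zero.

    Odd n:  ..., -2, -1, 0, 1, 2, ...   (unit = one period)
    Even n: ..., -3, -1, 1, 3, ...      (unit = half a period)
    """
    if n % 2 == 1:
        return [i - n // 2 for i in range(n)]
    return [2 * i - (n - 1) for i in range(n)]


def fit_trend_short_cut(y):
    """With sum(X) = 0: a = sum(Y)/N and b = sum(XY)/sum(X^2)."""
    x = coded_time(len(y))
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


# Example 5: nine years, origin 2012, unit one year.
a5, b5 = fit_trend_short_cut([4, 7, 7, 8, 9, 11, 13, 14, 17])

# Example 6: eight years, origin mid-way between 2012 and 2013, unit six months.
a6, b6 = fit_trend_short_cut([6.7, 5.3, 4.3, 6.1, 5.6, 7.9, 5.8, 6.1])

print(round(a5, 2), round(b5, 2))
print(round(a6, 3), round(b6, 3))
```

The results agree with the hand calculations up to rounding: roughly a = 10, b = 1.47 for Example 5 and a = 5.975, b = 0.051 for Example 6.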

1.5 SECOND DEGREE PARABOLA


The simplest example of the non-linear trend is the second degree parabola, whose equation is written
in the form:
$Y_c = a + bX + cX^2$

When numerical values for a, b and c have been derived, the trend value for any year may be
computed by substituting in the equation the value of X for that year. The values of a, b and c can be
determined by solving the following three normal equations simultaneously:
(i) $\sum Y = Na + b\sum X + c\sum X^2$
(ii) $\sum XY = a\sum X + b\sum X^2 + c\sum X^3$
(iii) $\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$
Note that the first equation is merely the summation of the given function, the second is the
summation of X multiplied into the given function, and the third is the summation of $X^2$ multiplied
into the given function.
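When the time origin is taken at the middle of the series (the middle year, or the mid-point between the
two middle years), $\sum X$ and $\sum X^3$ would be zero. In that case the equations
are reduced to:
(i) $\sum Y = Na + c\sum X^2$
(ii) $\sum XY = b\sum X^2$
(iii) $\sum X^2 Y = a\sum X^2 + c\sum X^4$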
The value of b can now directly be obtained from equation (ii) and value of a and c by solving
equations (i) and (iii) simultaneously. Thus,

$a = \dfrac{\sum Y - c\sum X^2}{N}$; $b = \dfrac{\sum XY}{\sum X^2}$; $c = \dfrac{N\sum X^2 Y - \sum X^2 \sum Y}{N\sum X^4 - (\sum X^2)^2}$
Example 7 : The price of a commodity during 2010–2015 is given below. Fit a parabola
$Y = a + bX + cX^2$ to this data. Estimate the price of the commodity for the year 2016.
Year Price Year Price
2010 100 2013 140
2011 107 2014 181
2012 128 2015 192
Also plot the actual and trend values on graph.
Solution : To determine the values of a, b and c, we solve the following normal equations:
$\sum Y = Na + b\sum X + c\sum X^2$
$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$
$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$
Year    Y      X     X^2    X^3    X^4    XY      X^2 Y    Yc
2010    100    –2    4      –8     16     –200    400      97.744
2011    107    –1    1      –1     1      –107    107      110.426
2012    128    0     0      0      0      0       0        126.680
2013    140    +1    1      +1     1      +140    140      146.506
2014    181    +2    4      +8     16     +362    724      169.904
2015    192    +3    9      +27    81     +576    1728     196.874
$N = 6$, $\sum Y = 848$, $\sum X = 3$, $\sum X^2 = 19$, $\sum X^3 = 27$, $\sum X^4 = 115$, $\sum XY = 771$, $\sum X^2 Y = 3099$, $\sum Y_c = 848.134$
848 = 6a + 3b + 19c ...(i)
771 = 3a + 19b + 27c ...(ii)
3,099 = 19a +27b + 115c ...(iii)
Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i), we get
35b + 35c = 694 ...(iv)
Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting the latter from the former, we get
5352 = 280b + 168c ...(v)
Solving Eqns. (iv) and (v), we get
c = 1.786
Substituting the value of c in Eqn. (iv), we get
b = 18.04 [35b + (35 × 1.786) = 694]
Putting the value of b and c in Eqn. (i), we get
a = 126.68 [848 = 6a + (3 × 18.04) + (19 × 1.786)]
Thus a = 126.68, b = 18.04 and c = 1.786
Substituting the values in the equation
$Y_c = 126.68 + 18.04X + 1.786X^2$
When X = –2, Y = 126.68 + 18.04(–2) + 1.786(–2)^2
= 126.68 – 36.08 + 7.144 = 97.744
When X = –1, Y = 126.68 + 18.04(–1) + 1.786(–1)^2
= 126.68 – 18.04 + 1.786 = 110.426
When X = 0, Y = 126.68
When X = 1, Y = 126.68 + 18.04 + 1.786 = 146.506
When X = 2, Y = 126.68 + 18.04(2) + 1.786(2)^2
= 126.68 + 36.08 + 7.144 = 169.904
When X = 3, Y = 126.68 + 18.04(3) + 1.786(3)^2
= 126.68 + 54.12 + 16.074 = 196.874
Price for 2016 (X = 4): Y = 126.68 + 18.04(4) + 1.786(4)^2
= 126.68 + 72.16 + 28.576 = 227.416
Thus the likely price of the commodity for the year 2016 is Rs. 227.416.
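The parabola fit of Example 7 can be cross-checked with the Python sketch below, which builds the three normal equations and solves them by Gaussian elimination. The helper names `solve_linear_system` and `fit_parabola` are illustrative choices, not part of the lesson. Because the code carries full precision, it gives a of about 126.66 (with b about 18.04 and c about 1.786), slightly different from the hand-computed 126.68, and a 2016 estimate of about 227.4.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_parabola(x, y):
    """Build and solve the three normal equations for Yc = a + bX + cX^2."""
    def s(p):                       # sum of X^p
        return sum(xi ** p for xi in x)

    def sy(p):                      # sum of X^p * Y
        return sum((xi ** p) * yi for xi, yi in zip(x, y))

    matrix = [[len(x), s(1), s(2)],
              [s(1),   s(2), s(3)],
              [s(2),   s(3), s(4)]]
    return solve_linear_system(matrix, [sy(0), sy(1), sy(2)])


x = [-2, -1, 0, 1, 2, 3]                 # origin at 2012
price = [100, 107, 128, 140, 181, 192]   # 2010 to 2015

a, b, c = fit_parabola(x, price)
estimate_2016 = a + b * 4 + c * 4 ** 2
print(round(a, 2), round(b, 2), round(c, 3), round(estimate_2016, 1))
```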

1.6 EXPONENTIAL TRENDS
The equation of the exponential curve is
$Y = ab^X$
Putting the equation in logarithmic form, we get
$\log Y = \log a + X \log b$.
When plotted on a semi-logarithmic graph, the curve gives a straight line. However, on an
arithmetic chart the curve gives a non-linear trend. To obtain the value of the constants a and b, the
two normal equations to be solved are :
$\sum \log Y = N \log a + \log b \sum X$
$\sum (X \log Y) = \log a \sum X + \log b \sum X^2$
where a is the Y intercept and b the slope of the curve.
When deviations are taken from middle year, $\sum X = 0$, the above equation becomes

$\sum \log Y = N \log a$ or $\log a = \dfrac{\sum \log Y}{N}$

and $\sum (X \log Y) = \log b \sum X^2$ or $\log b = \dfrac{\sum (X \log Y)}{\sum X^2}$
Steps. The steps in fitting a curve are :
(i) Find the time deviation of each year from the middle year and denote these deviations
by X.
(ii) Square these deviations and obtain $X^2$.
(iii) Obtain logarithms of the variable Y.
(iv) Multiply log Y by the corresponding time deviation and obtain X log Y.
(v) Divide $\sum \log Y$ by N. This would give the value of log a.
(vi) Divide $\sum (X \log Y)$ by $\sum X^2$. This would give the value of log b, i.e., rate of growth or the
slope of the line.
(vii) Put the value of log a against the middle year and add or subtract the slope of the line, i.e.,
the value of log b, to get trend ordinates in logarithms.
(viii) Take the antilogs of these logs to arrive at the actual trend values.
Example 8 : The sales of a company for the years 2009 to 2015 are given below :
Years : 2009 2010 2011 2012 2013 2014 2015
Sales (Rs. million) : 32 47 65 92 190 132 275
Estimate sales figure for the year 2018 using the equation of the form $Y = ab^X$ where X = years
and Y = Sales

Solution : Fitting Equation Y = ab^X
Year     Sales (Rs. million) Y     X      log Y      X^2     X log Y
2009             32               –3     1.5051      9     – 4.5153
2010             47               –2     1.6721      4     – 3.3442
2011             65               –1     1.8129      1     – 1.8129
2012             92                0     1.9638      0       0
2013            190               +1     2.2788      1     + 2.2788
2014            132               +2     2.1206      4     + 4.2412
2015            275               +3     2.4393      9     + 7.3179
N = 7        ΣY = 833           ΣX = 0   Σ log Y     ΣX^2    Σ X log Y
                                         = 13.7926   = 28    = 4.1655
log a = Σ log Y / N = 13.7926 / 7 = 1.9704
log b = Σ X log Y / ΣX^2 = 4.1655 / 28 = 0.149
log Y = 1.9704 + 0.149 X
For 2018, X = 2018 – 2012 = 6, so that log Y = 1.9704 + 0.149(6) = 2.8644 and
Y = antilog (2.8644) = 731.8 approximately.
Thus the estimated sales for the year 2018 are about Rs. 732 million.
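The fitting steps of Example 8 can be sketched in Python as a check on the hand calculation. This is only an illustrative sketch: the function names are assumptions of this sketch, not part of the text, and common (base 10) logarithms are used as in the table. Because the code keeps full precision, its estimate for 2018 may differ slightly from a figure worked out with four-figure logarithms and a rounded value of log b.

```python
import math

def fit_exponential_trend(years, values):
    """Fit Y = a * b**X by least squares on log10(Y).

    X is measured as the deviation of each year from the middle (mean)
    year, so that sum(X) == 0 and the two normal equations separate.
    """
    n = len(years)
    middle = sum(years) / n
    x = [year - middle for year in years]
    log_y = [math.log10(v) for v in values]
    log_a = sum(log_y) / n                                    # sum(log Y) / N
    log_b = (sum(xi * ly for xi, ly in zip(x, log_y))
             / sum(xi * xi for xi in x))                      # sum(X log Y) / sum(X^2)
    return middle, log_a, log_b

def exponential_trend_value(year, middle, log_a, log_b):
    """Trend value for a given year: antilog of (log a + X log b)."""
    return 10 ** (log_a + log_b * (year - middle))

# Data of Example 8
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015]
sales = [32, 47, 65, 92, 190, 132, 275]

middle, log_a, log_b = fit_exponential_trend(years, sales)
estimate_2018 = exponential_trend_value(2018, middle, log_a, log_b)
print(round(log_a, 4), round(log_b, 4), round(estimate_2018, 1))
```

Taking deviations from the mean year makes ΣX equal to zero for any equally spaced series, which is why the two constants can be computed separately.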

1.7 SHIFTING THE TREND ORIGIN


Trends are usually fitted to annual data with the middle of the series as origin. At times it may be
necessary to change the origin of the trend equation to some other point in the series. For example,
annual trend values may be changed to monthly or quarterly values if we wish to study seasonal or
cyclical patterns.
For an arithmetic straight line we have to find out the new Y intercept, i.e., the value of ‘a’. The
value of ‘b’ remains unchanged, since the slope of the trend line remains same irrespective of the
origin. The procedure of shifting the origin may be done by the expression :
Yt = a + b(X + k)
where k is the number of time units shifted. If the origin is shifted forward in time, k is positive; if
it is shifted backward in time, k is negative.
Example 9 : You are given the trend equation
Yc = 110 + 2X
(origin 2008, time unit 1 year)
Shift the origin to 2012.
Solution : We are required to shift the origin to 2012, 4 years forward. Here k = 4. The required
equation can be obtained as :
Yt = a + b (X + k)
= 110 + 2(X + 4) = 110 + 2X + 8 = 118 + 2X
(origin 2012, X unit = 1 year)

Example 10 : You are given the trend equation
Yc = 210 – 1.5X
(origin 2012, time unit 1 year)
Shift the origin to 2007.
Solution : Changing origin from 2012 to 2007 means going back by 5 years. Using the formula
Yt = a + b(X + k)
= 210 – 1.5(X – 5) = 210 – 1.5X + 7.5 = 217.5 – 1.5X
(origin 2007, time period one year)
Example 11 : You are given the following equation :
Y = 126.55 + 18.04X + 1.786X^2
(origin 2011–12)
Shift the origin to 2012.
Solution : To shift the origin of this equation to 2012, i.e., half a year forward (k = 0.5), we follow
the procedure explained above :
Yt = 126.55 + 18.04(X + 0.5) + 1.786(X + 0.5)^2
= 126.55 + 18.04X + 9.02 + 1.786(X^2 + X + 0.25)
= 126.55 + 18.04X + 9.02 + 1.786X^2 + 1.786X + 0.4465
= 136.0165 + 19.826X + 1.786X^2
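Shifting the origin amounts to replacing X by (X + k) and collecting terms. A minimal Python sketch of this substitution is given below; the function name shift_origin and the list layout of the constants are assumptions made for illustration only. It handles a straight line as well as a second-degree trend.

```python
import math

def shift_origin(coeffs, k):
    """Return the trend constants after replacing X by (X + k).

    coeffs lists the constants in ascending powers of X, e.g. [a, b] for
    a + bX or [a, b, c] for a + bX + cX^2. k is the number of time units
    shifted: positive for a forward shift, negative for a backward shift.
    """
    shifted = [0.0] * len(coeffs)
    for power, coeff in enumerate(coeffs):
        # Expand coeff * (X + k)**power by the binomial theorem and add
        # each term to the constant of the matching power of X.
        for j in range(power + 1):
            shifted[j] += coeff * math.comb(power, j) * k ** (power - j)
    return shifted

example_9 = shift_origin([110, 2], 4)                   # origin 2008 -> 2012
example_10 = shift_origin([210, -1.5], -5)              # origin 2012 -> 2007
example_11 = shift_origin([126.55, 18.04, 1.786], 0.5)  # half a year forward
print(example_9, example_10, example_11)
```

For a straight line only the intercept changes, in agreement with the remark that the slope ‘b’ is unaffected by the choice of origin.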

1.8 CONVERSION OF ANNUAL TREND TO MONTHLY TREND


From annual trend equations we can obtain monthly trend equations without any loss in accuracy.
When the Y units are annual totals, an annual trend equation can be converted into an equation
of monthly totals by dividing the computed constant ‘a’ by 12 and the value of ‘b’ by 144. The justification
is that the data are sums of 12 months, hence both ‘a’ and ‘b’ must be divided by 12; ‘b’ is then
divided by 12 again so that the time units (X’s) are in months as well,
i.e., ‘b’ gives monthly increments. Therefore the monthly trend equation becomes:
Y = a/12 + (b/144) X
The annual trend equation can also be reduced to a quarterly trend equation, which will be given
by :
Y = a/4 + (b/(4 × 4)) X or a/4 + (b/16) X
Example 12 : The trend of the annual sales of ABC Co. Ltd. is given by the following equation :
Yc = 30 + 3.6X (origin 2012, X unit = 1 year, Y unit = annual sales)
Convert the equation on monthly basis.
Solution : To convert an annual trend equation on monthly basis, the value of ‘a’ is divided by 12
and the value of ‘b’ by 144. The equation on monthly basis is
Yc = 30/12 + (3.6/144) X
Yc = 2.5 + 0.025X
If the annual trend equation is of second degree, the corresponding monthly trend equation is
obtained by dividing ‘a’ by 12, ‘b’ by 144 and ‘c’ by 1728 (the last being identical to dividing ‘c’ by
12 three times).
Example 13 : Convert the following annual trend equation on a monthly basis :
Yc = 10.6 + 0.8X + 0.64X^2
Solution : To convert annual trend equation of the second degree on monthly basis, divide ‘a’ by
12, ‘b’ by 144 and ‘c’ by 1,728. Thus, the required equation will be :
Yc = 10.6/12 + (0.8/144) X + (0.64/1728) X^2
= 0.883 + 0.0056X + 0.00037X^2
Where the data are given as monthly averages per year, the value of ‘a’ remains unchanged and ‘b’
is divided by 12 only once. The reason is that ‘a’ is already at the monthly level and ‘b’ now represents
the annual change in monthly magnitudes. In the case of a second-degree trend equation, the value of
‘c’ is divided by 144.
Example 14 : You are given the following trend equation:
Yc = 280 – 1.8X (origin June 30, 2012,
Y unit = annual monthly average sales)
Convert this equation into monthly terms and shift the origin half a month forward.
Solution : (i) The given annual trend equation reduced to monthly values will be :
Yc = 280 – (1.8/12) X
= 280 – 0.15X
(origin: June 30, 2012; X unit = 1 month; Y unit = average monthly sales)
(ii) Shifting the origin half a month forward :
Yt = 280 – 0.15 (X + 0.5)
= 280 – 0.15X – 0.075
= 279.925 – 0.15X
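The conversion rules of this section can be summarised in a short Python sketch. The function name and the flag y_is_annual_total are hypothetical, chosen only for illustration: changing the X unit from years to months divides the constant on X^p by 12^p, and every constant is divided by a further 12 when Y is an annual total rather than a monthly average.

```python
def annual_to_monthly(coeffs, y_is_annual_total=True):
    """Convert annual trend constants [a, b, c, ...] to a monthly basis.

    coeffs lists the constants in ascending powers of X (X in years).
    Set y_is_annual_total=False when Y is already a monthly average.
    """
    extra = 12 if y_is_annual_total else 1
    return [coeff / (extra * 12 ** power) for power, coeff in enumerate(coeffs)]

example_12 = annual_to_monthly([30, 3.6])                            # annual totals
example_13 = annual_to_monthly([10.6, 0.8, 0.64])                    # second degree
example_14 = annual_to_monthly([280, -1.8], y_is_annual_total=False)  # monthly averages
print(example_12, example_13, example_14)
```

With annual totals the divisors are 12, 144 and 1,728; with monthly averages they are 1, 12 and 144, as stated in the text.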

1.9 MEASUREMENT OF SEASONAL VARIATIONS


Seasonal variations are those rhythmic changes in the time series data that occur regularly each
year. They have their origin in climatic or institutional factors that affect either supply or demand or
both. It is important that these variations be measured accurately for three reasons. First, the
investigator wants to eliminate seasonal variations from the data he is studying. Second, a precise
knowledge of the seasonal pattern aids in planning future operations. Lastly, complete knowledge of
seasonal variations is of use to those who are trying to remove the cause of seasonals or are attempting
to mitigate the problem by diversification, offsetting opposing seasonal patterns, or some other
means.
Since the number of calendar days and working days varies from month to month, it
is essential to adjust the monthly figures if they are based on daily quantities. There
is no need for such adjustment when we deal with the volume of inventories or of bank deposits,
because these values are not influenced by the number of calendar days or working days.
Methods of Measuring Seasonal Variations
1. Method of Simple Averages (Weekly, Monthly or Quarterly).
2. Ratio-to-Moving Average Method.
3. Ratio-to-Trend Method.
4. Link Relatives Method.
1.9.1 Method of Simple Averages
This is the simplest method of obtaining a seasonal index. The following steps are necessary for
calculating the index:
(i) Arrange the unadjusted data by years and months (or quarters if quarterly data are given).
(ii) Find totals of January, February etc.
(iii) Divide each total by the number of years for which data are given. For example, if we are
given monthly data for five years then we shall first obtain total for each month for five
years and divide each total by 5 to obtain an average.
(iv) Obtain an average of monthly averages by dividing the total of monthly averages by 12.
(v) Taking the average of monthly average as 100, compute the percentage of various monthly
averages as follows:
Seasonal Index for January = (Monthly average for January / Average of monthly averages) × 100
If the monthly totals are used instead of the monthly averages, we get the
same result. The following example illustrates the method.
Example 15 : Monthly consumption of electric power (in millions of kWh) for street lighting in
India during 2011 – 2015 is given below:
Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec
2011 318 281 278 250 231 216 223 245 269 302 325 347
2012 342 309 299 268 249 236 242 262 288 321 342 364
2013 367 328 320 287 269 251 259 284 309 345 367 394
2014 392 349 342 311 290 273 282 305 328 364 389 417
2015 420 378 370 334 314 296 305 330 356 396 422 452
Find out seasonal variation by the method of monthly averages.
Solution : Computation of Seasonal Indices by Monthly Averages

          Consumption of monthly electric power     Monthly total   Five-yearly   Percentage
Month     2011   2012   2013   2014   2015          for 5 years     average
(1)       (2)    (3)    (4)    (5)    (6)           (7)             (8)           (9)
Jan.      318    342    367    392    420           1,839           367.8         116.1
Feb.      281    309    328    349    378           1,645           329.0         103.9
March     278    299    320    342    370           1,609           321.8         101.6
April     250    268    287    311    334           1,450           290.0          91.6
May       231    249    269    290    314           1,353           270.6          85.4
June      216    236    251    273    296           1,272           254.4          80.3
July      223    242    259    282    305           1,311           262.2          82.8
Aug.      245    262    284    305    330           1,426           285.2          90.1
Sept.     269    288    309    328    356           1,550           310.0          97.9
Oct.      302    321    345    364    396           1,728           345.6         109.1
Nov.      325    342    367    389    422           1,845           369.0         116.5
Dec.      347    364    394    417    452           1,974           394.8         124.7
Total                                               19,002          3,800.4       1,200
Average                                             1,583.5         316.7         100

The above calculations are explained below:


(i) Column No. 7 gives the total for each month for five years.
(ii) In column No. 8 each total of column No. 7 has been divided by 5 to obtain an average for
each month.
(iii) The average of monthly averages is obtained by dividing the total of monthly averages by
12.
(iv) In column No. 9 each monthly average has been expressed as percentage of the average
of monthly averages. Thus, the percentage for January
= (367.8 / 316.7) × 100 = 116.1
Percentage for February = (329.0 / 316.7) × 100 = 103.9
If instead of monthly data, we are given weekly or quarterly data, we shall compute weekly or
quarterly averages by following the same procedure.
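The method of simple averages can be sketched in Python as follows, using the data of Example 15. The function name is an assumption of this sketch. The October 2013 figure is taken as 345, the value consistent with the October total of 1,728 shown in the solution.

```python
def seasonal_indices_simple_average(data):
    """Seasonal indices by the method of simple averages.

    data maps each season (month or quarter) to its values over the years.
    Each seasonal average is expressed as a percentage of the average of
    all the seasonal averages.
    """
    averages = {season: sum(vals) / len(vals) for season, vals in data.items()}
    grand_average = sum(averages.values()) / len(averages)
    return {season: 100 * avg / grand_average for season, avg in averages.items()}

# Example 15: consumption for 2011 to 2015, month by month
consumption = {
    "Jan.":  [318, 342, 367, 392, 420],
    "Feb.":  [281, 309, 328, 349, 378],
    "March": [278, 299, 320, 342, 370],
    "April": [250, 268, 287, 311, 334],
    "May":   [231, 249, 269, 290, 314],
    "June":  [216, 236, 251, 273, 296],
    "July":  [223, 242, 259, 282, 305],
    "Aug.":  [245, 262, 284, 305, 330],
    "Sept.": [269, 288, 309, 328, 356],
    "Oct.":  [302, 321, 345, 364, 396],
    "Nov.":  [325, 342, 367, 389, 422],
    "Dec.":  [347, 364, 394, 417, 452],
}

indices = seasonal_indices_simple_average(consumption)
for month, index in indices.items():
    print(month, round(index, 1))
```

By construction the twelve indices average 100, so their total is 1,200.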
1.9.2 Ratio-to-moving average method
The method of monthly totals or monthly averages does not give any consideration to the trend
which may be present in the data. The ratio-to-moving average method is one of the simplest of the
commonly used devices for measuring seasonal variation which takes the trend into consideration:
The steps to compute seasonal variation are as follows :-
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values by the method of moving averages. For this purpose take a 12-month
moving average followed by a two-month moving average to centre the trend
values.
(iii) Express the data for each month as a percentage ratio of the corresponding moving-
average trend value.

(iv) Arrange these ratios by months and years.
(v) Aggregate the ratios for January, February etc.
(vi) Find the average ratio for each month.
(vii) Adjust the average monthly ratios found in step (vi) so that they will themselves average
100 per cent. These adjusted ratios will be the seasonal indices for various months.
A seasonal index computed by the ratios-to-moving-average method ordinarily does not fluctuate
so much as the index based on straight-line trends. This is because the 12-month moving average
follows the cyclical course of the actual data quite closely. Therefore the index ratios obtained by
this method are often more representative of the data from which they are obtained than is the case
in the ratio-to-trend method which will be discussed later on.
Example 16 : Prepare a monthly seasonal index from the following data, using moving averages
method :
Monthly Sales of XYZ Products Co. Ltd. (Rs.)
Year
Month 2010 2011 2012
January 3,639 3,913 4,393
February 3,591 3,856 4,530
March 3,326 3,714 4,287
April 3,469 3,820 4,405
May 3,321 3,647 4,024
June 3,320 3,498 3,992
July 3,205 3,476 3,795
August 3,205 3,354 3,492
September 3,255 3,594 3,571
October 3,550 3,830 3,923
November 3,771 4,183 3,984
December 3,772 4,482 3,880
Solution :
Computation of Ratios to 12-month centred moving averages for sales (Rs.)
Year &      Sales     12-month    12-month    Centred     Ratio to
month       (Rs.)     moving      moving      12-month    moving
                      total       average     moving      average
                                              average
(1)         (2)       (3)         (4)         (5)         (6)
2010
Jan.        3,639
Feb.        3,591
March       3,326
April       3,469
May         3,321
June        3,320
                      41,424      3,452
July        3,205                             3,463        92.55
                      41,698      3,475
Aug.        3,205                             3,486        91.94
                      41,963      3,497
Sept.       3,255                             3,513        92.66
                      42,351      3,529
Oct.        3,550                             3,543       100.20
                      42,702      3,558
Nov.        3,771                             3,572       105.57
                      43,028      3,586
Dec.        3,772                             3,593       104.98
                      43,206      3,601
2011
Jan.        3,913                             3,612       108.33
                      43,477      3,623
Feb.        3,856                             3,630       106.23
                      43,626      3,636
March       3,714                             3,650       101.75
                      43,965      3,664
April       3,820                             3,675       103.95
                      44,245      3,687
May         3,647                             3,704        98.46
                      44,657      3,721
June        3,498                             3,751        93.26
                      45,367      3,781
July        3,476                             3,801        91.45
                      45,847      3,821
Aug.        3,354                             3,849        87.14
                      46,521      3,877
Sept.       3,594                             3,901        92.13
                      47,094      3,925
Oct.        3,830                             3,949        96.99
                      47,679      3,973
Nov.        4,183                             3,989       104.86
                      48,056      4,005
Dec.        4,482                             4,025       111.35
                      48,550      4,046
2012
Jan.        4,393                             4,059       108.23
                      48,869      4,072
Feb.        4,530                             4,078       111.08
                      49,007      4,084
March       4,287                             4,083       105.00
                      48,984      4,082
April       4,405                             4,086       107.81
                      49,077      4,090
May         4,024                             4,081        98.60
                      48,878      4,073
June        3,992                             4,048        98.62
                      48,276      4,023
July        3,795
Aug.        3,492
Sept.       3,571
Oct.        3,923
Nov.        3,984
Dec.        3,880
Arranging the ratios-to-moving average by months and years we obtain the following table
from which the seasonal index for each month is also obtained.
Computation of Seasonal Index by Ratios-to-Moving Averages of XYZ Products Co. Ltd.
                        Year                                      Seasonal
Month        2010      2011      2012      Total     Average     Index
January      –         108.33    108.23    216.56    108.28      107.7
February     –         106.23    111.08    217.31    108.65      108.1
March        –         101.75    105.00    206.75    103.37      102.8
April        –         103.95    107.81    211.76    105.88      105.3
May          –          98.46     98.60    197.06     98.53       98.0
June         –          93.26     98.62    191.88     95.94       95.4
July         92.55      91.45    –         184.00     92.00       91.5
August       91.94      87.14    –         179.08     89.54       89.0
September    92.66      92.13    –         184.79     92.40       91.9
October      100.20     96.99    –         197.19     98.60       98.1
November     105.57    104.86    –         210.43    105.21      104.6
December     104.98    111.35    –         216.33    108.16      107.6
Total of Monthly Averages                           1206.56
Average of Monthly Averages                          100.55

Putting the average of monthly averages as 100, the monthly averages have been adjusted to obtain the seasonal
index for each month.
For example, Seasonal Index for January = (108.28 / 100.55) × 100 = 107.7
for February = (108.65 / 100.55) × 100 = 108.1
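The computations of Example 16 can be sketched in Python as below. The function names are assumptions of this sketch, and the series is assumed to start with the first season (January). Since the code does not round the moving averages to whole rupees, its ratios and indices may differ from the tabulated ones in the second decimal place.

```python
def centred_moving_average(values, period):
    """Centred moving average for an even period.

    Returns a list aligned with values; positions at either end for which
    no centred average exists hold None.
    """
    half = period // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        first = sum(values[i - half:i + half])            # total ending just before i + half
        second = sum(values[i - half + 1:i + half + 1])   # next total, one period later
        out[i] = (first + second) / (2 * period)          # mean of the two moving averages
    return out

def seasonal_indices_ratio_to_moving_average(values, period=12):
    """Seasonal indices from ratios of actual values to the centred moving average."""
    trend = centred_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for i, (actual, t) in enumerate(zip(values, trend)):
        if t is not None:
            ratios[i % period].append(100 * actual / t)
    averages = [sum(r) / len(r) for r in ratios]
    grand_average = sum(averages) / period
    # Adjust the average ratios so that they themselves average 100.
    return [100 * avg / grand_average for avg in averages]

# Monthly sales of Example 16, January 2010 to December 2012
sales = [
    3639, 3591, 3326, 3469, 3321, 3320, 3205, 3205, 3255, 3550, 3771, 3772,
    3913, 3856, 3714, 3820, 3647, 3498, 3476, 3354, 3594, 3830, 4183, 4482,
    4393, 4530, 4287, 4405, 4024, 3992, 3795, 3492, 3571, 3923, 3984, 3880,
]

trend = centred_moving_average(sales, 12)
indices = seasonal_indices_ratio_to_moving_average(sales, 12)
print([round(v, 1) for v in indices])
```

The first six and the last six months receive no centred average, which is the loss of data mentioned among the limitations of this method.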
Merits
This method is more widely used in practice than other methods. The index calculated by the ratio-
to-moving average method does not fluctuate very much. The 12-month moving average follows
the cyclical course of the actual data closely, so the index ratios are truly representative of the data
from which they have been obtained.
Limitations
Ratios, and hence seasonal index numbers, cannot be calculated for every month for which data are available. When
a four-month moving average is taken, 2 months at the beginning and 2 months at the end are left out.
With a 12-month moving average, as in Example 16, the first six and the last six months have no ratio.
1.9.3 Ratio-to-trend method
The ratio-to-trend method is similar to ratio-to-moving-average method. The only difference is the
way of obtaining the trend values. Whereas in the ratio-to-moving-average method, the trend values
are obtained by the method of moving averages, in the ratio-to-trend method, the corresponding
trend is obtained by the method of least squares.
The steps in the calculation of seasonal variation are as follows :
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values for each month with the help of least squares equation.
(iii) Express the data for each month as a percentage ratio of the corresponding trend value.
(iv) Aggregate the January’s ratios, February’s ratios, etc., computed previously
(v) Find the average ratio for each month.
(vi) Adjust the average ratios found in step (v) so that they will themselves average 100 per
cent.
The last step gives us the seasonal index for each month.
Sometimes the median is used in place of the arithmetic average of the ratios-to-trend. The
choice depends upon circumstances but there is a preference for the median if several erratic ratios
are found. In fact, if a fairly large number of years, say, 15 or 20, are used in the computation, it is
not uncommon to omit extremely erratic ratios from the computation of average of monthly ratios.
Only the arithmetic average should be used for small number of years.
This method has the advantage of simplicity and ease of interpretation. Although it makes
allowance for the trend, it may be influenced by errors in the calculation of the trend. The method
may also be influenced by cyclical and erratic influences. This source of possible error is eliminated
by the selection of a period of time in which depression is offset by prosperity.

Example 17 : Find seasonal variations by the ratio-to-trend method from the following data:
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
2010 30 40 36 34
2011 34 52 50 44
2012 40 58 54 48
2013 54 76 68 62
2014 80 92 86 82
Solution : To find seasonal variations by the ratio-to-trend method, the trend is first fitted to the
yearly data and the yearly trend values are then converted into quarterly trend values.
Year     Yearly    Average of          X      XY      X^2    Trend values
         totals    quarterly                                 Yc = 56 + 12X
                   values (Y)
2010     140       35                 –2     – 70     4      56 + 12(–2) = 32
2011     180       45                 –1     – 45     1      56 + 12(–1) = 44
2012     200       50                  0        0     0      56 + 12(0) = 56
2013     260       65                  1       65     1      56 + 12(1) = 68
2014     340       85                  2      170     4      56 + 12(2) = 80
Total    1120      280                 0    + 120    10      280
Y = a + bX
a = ΣY / N = 280 / 5 = 56
b = ΣXY / ΣX^2 = 120 / 10 = 12
The trend value for the middle of 2010, i.e., the point which comes between the 2nd and 3rd
quarters, is 32.
Quarterly increment = Yearly increment / 4 = 12 / 4 = 3
Therefore, the trend value for the 2nd quarter will be 32 – 3/2 = 30.5
The trend value for the 3rd quarter is 32 + 3/2 = 33.5
Similarly other values will be calculated.
Quarterly Trend Values
Year 1st 2nd 3rd 4th Total
2010 27.5 30.5 33.5 36.5 128
2011 39.5 42.5 45.5 48.5 176
2012 51.5 54.5 57.5 60.5 224
2013 63.5 66.5 69.5 72.5 272
2014 75.5 78.5 81.5 84.5 320

Now we express the given values as percentages of the corresponding quarterly trend values, i.e., (O / T) × 100,
where O is the given (observed) value and T the trend value.

Year 1st 2nd 3rd 4th


2010 109.1 131.1 107.5 93.1
2011 86.1 122.4 109.9 90.7
2012 77.7 106.4 93.9 79.3
2013 85.0 114.3 97.8 85.5
2014 106.0 117.2 105.5 97.0
Total 463.9 591.4 514.6 445.6
Average 92.78 118.28 102.92 89.12

The average of the quarterly averages of the trend ratios :
(92.78 + 118.28 + 102.92 + 89.12) / 4 = 403.10 / 4 = 100.775
Quarterly seasonal Index for 1st Quarter : (92.78 / 100.775) × 100 = 92.1
Quarterly seasonal Index for 2nd Quarter : (118.28 / 100.775) × 100 = 117.4
Quarterly seasonal Index for 3rd Quarter : (102.92 / 100.775) × 100 = 102.1
Quarterly seasonal Index for 4th Quarter : (89.12 / 100.775) × 100 = 88.4
The total of the quarterly seasonal indices should be equal to 400 and that of monthly indices should be
1200.
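The ratio-to-trend computation of Example 17 can be sketched in Python as below. The function name is an assumption of this sketch. The third-quarter figure for 2011 is taken as 50, the value consistent with the yearly total of 180 and the ratio of 109.9 used in the solution. Because intermediate ratios are not rounded, the indices may differ from the worked values in the first decimal place.

```python
def seasonal_indices_ratio_to_trend(quarterly):
    """Quarterly seasonal indices by the ratio-to-trend method.

    quarterly is a list of years, each a list of four quarterly figures.
    A straight-line trend is fitted by least squares to the yearly averages
    of the quarterly values, with X measured from the middle of the series.
    """
    n = len(quarterly)
    yearly_avg = [sum(year) / 4 for year in quarterly]
    x = [i - (n - 1) / 2 for i in range(n)]           # deviations, sum(x) == 0
    a = sum(yearly_avg) / n
    b = sum(xi * yi for xi, yi in zip(x, yearly_avg)) / sum(xi * xi for xi in x)
    step = b / 4                                      # quarterly increment
    ratios = [[] for _ in range(4)]
    for xi, year in zip(x, quarterly):
        mid_year_trend = a + b * xi                   # falls between the 2nd and 3rd quarters
        for q, actual in enumerate(year):
            trend = mid_year_trend + (q - 1.5) * step
            ratios[q].append(100 * actual / trend)
    averages = [sum(r) / len(r) for r in ratios]
    grand_average = sum(averages) / 4
    return [100 * avg / grand_average for avg in averages]

# Data of Example 17 (2010 to 2014)
data = [
    [30, 40, 36, 34],
    [34, 52, 50, 44],
    [40, 58, 54, 48],
    [54, 76, 68, 62],
    [80, 92, 86, 82],
]

indices = seasonal_indices_ratio_to_trend(data)
print([round(v, 1) for v in indices])
```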
Merits
(i) This method is based on a logical procedure for measuring seasonal variations. This
procedure has advantage over the moving average method for it has a ratio-to-trend value
for each month for which data is available. So this method avoids loss of data which is
inherent in the case of moving averages. If the period of time series is very short then the
advantage becomes more prominent.
(ii) It is a simple method.
(iii) It is easy to understand.
Limitations :
If the cyclical swings in the time series are very wide, the least-squares trend cannot follow the actual data as
closely as a 12-month moving average does. A seasonal index computed by the ratio-to-trend method
will therefore be more biased than one computed by the ratio-to-moving average method.

1.9.4 Link Relatives Method
Among all the methods of measuring seasonal variation, link relatives method is the most difficult
one. When this method is adopted the following steps are taken to calculate the seasonal variation
indices:
(i) Calculate the link relatives of the seasonal figures. Link relatives are calculated by dividing
the figure of each season by the figure of the immediately preceding season and multiplying
it by 100 :
Link relative = (Current season’s figure / Previous season’s figure) × 100
These percentages are called link relatives since they link each month (or quarter or other time
period) to the preceding one.
(ii) Calculate the average of the link relatives for each season. While calculating the average
we might take the arithmetic average, but the median is probably better. The arithmetic average
would give undue weight to extreme cases which were not primarily due to seasonal
influences.
(iii) Convert these averages into chain relatives on the base of the first season.
(iv) Calculate the chain relatives of the first season on the basis of the last season. There will
be some difference between the chain relative of the first season and the chain relative
calculated by the previous method. This difference will be due to long-term changes. It is
therefore necessary to correct these chain relatives.
(v) For correction, the chain relative of the first season calculated by the first method (i.e., 100) is deducted
from the chain relative (of the first season) calculated by the second method. The difference
is divided by the number of seasons. The resulting figure multiplied by 1, 2, 3 (and so on)
is deducted respectively from the chain relatives of the 2nd, 3rd, 4th (and so on) seasons.
These are the corrected chain relatives.
(vi) Express the corrected chain relatives as percentage of their averages. These provide the
required seasonal indices by the method of link relatives.
The following example will illustrate the process.
Example 18 : Apply method of link relatives to the following data and calculate seasonal indices.

Quarterly Figures
Quarter 2011 2012 2013 2014 2015
I 6.0 5.4 6.8 7.2 6.6
II 6.5 7.9 6.5 5.8 7.3
III 7.8 8.4 9.3 7.5 8.0
IV 8.7 7.3 6.4 8.5 7.1

Solution : Calculation of Seasonal Indices by Method of Link Relatives
Link Relatives
                        Quarter
Year        I          II         III        IV
2011        –          108.3      120.0      111.5
2012        62.1       146.3      106.3       86.9
2013        93.2        95.6      143.1       68.8
2014        112.5       80.6      129.3      113.3
2015        77.6       110.6      109.6       88.8
Total       345.4      541.4      608.3      469.3

Quarter   Arithmetic average     Chain relative                     Corrected chain relative     Seasonal index
I         345.4 / 4 = 86.35      100                                100                          (100 / 113.4) × 100 = 88.18
II        541.4 / 5 = 108.28     (100 × 108.28) / 100 = 108.28      108.28 – 1.675 = 106.605     (106.605 / 113.4) × 100 = 94.0
III       608.3 / 5 = 121.66     (121.66 × 108.28) / 100 = 131.73   131.73 – 3.35 = 128.38       (128.38 / 113.4) × 100 = 113.21
IV        469.3 / 5 = 93.86      (93.86 × 131.73) / 100 = 123.64    123.64 – 5.025 = 118.615     (118.615 / 113.4) × 100 = 104.60
The correction factor is calculated as follows :
Chain relative of the first quarter (on the basis of first quarter) = 100
Chain relative of the first quarter (on the basis of the last quarter) = (86.35 × 123.6) / 100 = 106.7
Difference between these chain relatives = 106.7 – 100 = 6.7
Difference per quarter = 6.7 / 4 = 1.675
Adjusted chain relatives are obtained by subtracting 1 × 1.675, 2 × 1.675, 3 × 1.675 from the
chain relatives of 2nd, 3rd and 4th quarters, respectively.
Seasonal variation indices are calculated as below:
Average of corrected chain relatives = (100 + 106.605 + 128.38 + 118.615) / 4 = 453.6 / 4 = 113.4
Seasonal variation index = (Corrected chain relative × 100) / 113.4
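The link relatives procedure of Example 18 can be sketched in Python as below. The function name is an assumption of this sketch, and the arithmetic average is used throughout. The second-quarter figures for 2013 and 2015 are taken as 6.5 and 7.3, the values consistent with the link relatives shown in the solution. Since no intermediate rounding is done, the indices may differ from the worked values in the second decimal place.

```python
def seasonal_indices_link_relatives(table):
    """Seasonal indices by the link relatives method (arithmetic averages).

    table is a list of years, each a list of seasonal figures in time order.
    """
    seasons = len(table[0])
    flat = [value for year in table for value in year]
    # Step (i): link relative of each figure to the immediately preceding one.
    links = [[] for _ in range(seasons)]
    for i in range(1, len(flat)):
        links[i % seasons].append(100 * flat[i] / flat[i - 1])
    # Step (ii): average link relative for each season.
    avg = [sum(season_links) / len(season_links) for season_links in links]
    # Step (iii): chain relatives on the base of the first season (= 100).
    chain = [100.0]
    for s in range(1, seasons):
        chain.append(avg[s] * chain[s - 1] / 100)
    # Step (iv): chain relative of the first season on the basis of the last season.
    first_via_last = avg[0] * chain[-1] / 100
    # Step (v): spread the discrepancy evenly over the seasons.
    correction = (first_via_last - 100) / seasons
    corrected = [chain[s] - s * correction for s in range(seasons)]
    # Step (vi): express corrected chain relatives as percentages of their average.
    mean = sum(corrected) / seasons
    return [100 * c / mean for c in corrected]

# Data of Example 18: one row per year (2011 to 2015), quarters I to IV
figures = [
    [6.0, 6.5, 7.8, 8.7],
    [5.4, 7.9, 8.4, 7.3],
    [6.8, 6.5, 9.3, 6.4],
    [7.2, 5.8, 7.5, 8.5],
    [6.6, 7.3, 8.0, 7.1],
]

indices = seasonal_indices_link_relatives(figures)
print([round(v, 2) for v in indices])
```

Replacing the arithmetic average in step (ii) by the median, as the text suggests, would need only that one line to change.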
Meaning of “Normal” in Business Statistics
Business is often said to be “above normal” or “below normal”. When so used, the term “normal” is
generally recognized to mean a level of activity which is characterized by the presence of basic
trend and seasonal variation. This implies that the influence of business cycles and erratic fluctuations
on the level of activity is assumed to be insignificant. Therefore, the trend value for any
period, when adjusted by the seasonal index for that period, gives us an estimate of the normal
activity during that period.
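As a small illustration, the "normal" level for a period may be computed as the trend value multiplied by the seasonal index expressed as a percentage. The sketch below assumes this multiplicative adjustment and uses, purely for illustration, the trend value (78.5) and seasonal index (117.4) obtained for the second quarter of 2014 in Example 17; the function name is hypothetical.

```python
def normal_value(trend_value, seasonal_index):
    """Estimate of 'normal' activity: trend adjusted by the seasonal index (in per cent)."""
    return trend_value * seasonal_index / 100

# Second quarter of 2014 in Example 17: trend 78.5, seasonal index 117.4
normal_q2_2014 = normal_value(78.5, 117.4)
print(round(normal_q2_2014, 2))
```

The actual figure for that quarter can then be compared with this estimate to judge whether activity was above or below normal.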

1.10 SUMMARY
A time series refers to the observations of a random variable like sales, employment, etc.
placed in a chronological order.
The twin reasons for studying the time series include a historical understanding of the past
data and to make forecast for the future.
There are four components of a time series: (i) Secular trend, (ii) Cyclical variations, (iii)
Seasonal variations, and (iv) Irregular variations.
Secular trend refers to the general pattern of the values in a time series – it is the long-term
tendency of the movement of the variable.
The cyclical variations are caused by business cycles. A business cycle has four phases: (i)
peak time or prosperity, (ii) recession, (iii) trough or depression, and (iv) recovery.
Seasonal variations, which are caused by weather, customs, festivals, etc show themselves in
a period of one year. They repeat year after year.
Irregular variations or random fluctuations are those which result from unpredictable events
like strikes, natural or other calamities etc.
The two models used for the purpose of decomposing are (i) additive model, and (ii)
multiplicative model.
The additive model is based on the assumption that the four components add up to make time
series. They are assumed to be independent.
The multiplicative model is based on the assumption that a time series is the product of the
four components.
The linear trend is obtained by fitting a straight line to the given data. It is fitted on the principle
of least squares.
It is possible to shift the origin of an equation as Yt = a + b (X ± k).
The annual trend equations can be changed on a monthly or quarterly basis, and reverse is also
possible.
The parabolic trend involves fitting a second-degree parabola to the given data. It is of the
form Yt = a + bX + cX^2.
The exponential trend is appropriate where the variable in consideration grows or declines
exponentially. It takes the form Yt = ab^X.
The method of moving averages is another way of obtaining trend. Beginning with a certain
number of time periods, average is calculated and then successive averages are calculated by
dropping the first of the values and including the next one.
Seasonal variations are measured and expressed as seasonal indices.
The methods of simple averages, ratio-to-moving averages and ratio-to-trend are primarily
used for the purpose.

1.11 SELF ASSESSMENT QUESTIONS
Exercise 1: Mark the following statements as True or False:
(i) A time series refers to a sequence of observations of a random variable over time and
placed in chronological order.
(ii) The components of a time series are: secular trend, cyclical variations, seasonal variations
and chance variations.
(iii) Secular trend and cyclical variations are related to long-term movements while seasonal
variations and random variations refer to the short-term changes.
(iv) Seasonal variations are highly predictable.
(v) Factors like population change, tastes, consumer incomes, etc. explain cyclical variations.
(vi) Technological innovations cause cyclical variations.
(vii) Business cycles relate to the economy as a whole and never to a particular industry.
(viii) The seasonal variations are caused by the changing seasons in a country.
(ix) The seasonal variations component is most important to analyse for purposes of forecasting
and planning in the short term.
(x) Irregular variations are erratic in nature.
(xi) In additive model, S, C and I are expressed as deviations from their respective mean values.
(xii) Using the multiplicative model of analysis, all the components of a time series are expressed
as percentages.
(xiii) In obtaining a trend equation for a given set of data, the origin should be taken in such a
manner that ΣX must work out to be equal to zero.
(xiv) The value of ‘a’, the intercept of a trend equation, is related to its origin.
(xv) The trend values for various years given in the data and the projected values do not change
with a shift in the origin.
(xvi) The monthly trend equations can also be converted into annual trend equations, and for this
we first need to shift the origin of the trend equation to July 1 of the year of origin.
(xvii) In exponential trend, a straight line trend is fitted to the log values of the Y variable.
(xviii) The exponential trend is an example of non-linear trend.
(xix) Moving averages require centering whenever the underlying period, n, used in their
calculation is even.
(xx) A monthly sales budget can be drawn up by multiplying monthly seasonal indices by average
monthly sales and dividing each by 100.
Ans. 1. T, 2. T, 3. F, 4. T, 5. F, 6. F, 7. F, 8. F, 9. T, 10. T, 11. F, 12. F, 13. F, 14. T, 15. T, 16. T, 17. T,
18. T, 19. T, 20. T
Exercise 2: Questions and Answers
(i) What is a time series? What are its components? With which component of a time series
would you mainly associate each of the following?
(a) Wild cat strike in a factory, interrupting production for 15 days.
(b) Increase in sales in a departmental store on Diwali.
(c) An era of prosperity.
(d) Fall in death rate due to advances in medical science.

(ii) What is meant by decomposition of a time series? Explain the difference between additive
and multiplicative models of analysing time series.
(iii) Explain the rules of converting annual trend equation (i) on a monthly basis, and (ii) on a
quarterly basis. How can a quarterly trend equation be converted on an annual basis?
(iv) How are seasonal variations measured under the multiplicative model of analysing time
series? How are the seasonal indices interpreted?
(v) Explain the following methods of calculating seasonal indices:
(a) Method of Simple Averages
(b) Ratio-to-trend Method
(c) Ratio-to-moving averages Method
(vi) The following data relates to gross ex-factory value (in Rs. crores) of output of a factory
over the last few years:
Year : 2006 2007 2008 2009 2010 2011 2012
Value : 320 360 368 332 376 396 368
(a) Fit a straight line trend by the method of least squares, taking the year of origin as
2006.
(b) What is the average annual change in the value of output?
(c) Obtain trend equation using the year 2009 as the origin. How does it compare with
equation obtained in (a) above?
(vii) Demand (in ‘000 metric tonnes) for sugar of Sweet India is given here:
Year : 2006 2007 2008 2009 2010 2011 2012
Demand : 77 88 94 85 91 98 90
(a) Fit a straight line trend by the method of least squares.
(b) Calculate trend values and plot observed values and trend values on a graph.
(c) Eliminate trend component using the multiplicative model.
(d) Obtain the forecast of demand for the year 2014.
(viii) Below are given figures of production of a sugar factory:
Year : 2005 2006 2007 2008 2009 2010 2011 2012
Production : 88 98 100 91 102 107 100 118
(‘000 tons)
(a) Fit straight line trend to the above data by the method of least squares.
(b) What is the average annual change in the sugar production?
(c) Obtain trend values for various years. Show that the sum of difference between actual
and trend values is equal to zero.
(d) Eliminate the trend using multiplicative model. What components are thus left over?
(e) Convert the trend equation on a month-to-month basis and shift the origin to January
2006.
(ix) For each of the following derive the monthly trend equation:
(a) Yt = 960 + 72X; Origin: 2008, X Unit = 1 Year, Y unit = Annual sales of coffee in Rs.
(b) Yt = 169.58 + 78X; Origin: 2009, X Unit = 1 Year, Y unit = Average monthly production
(c) Yt = 2,760 + 212X; Origin: 2007, X Unit = 1/2 Year, Y unit = Annual earnings in Rs.
(d) Yt = 72 + 12X; Origin: 2010, X Unit = 1/2 Year, Y unit = Average monthly production

(x) Given the trend equation:
Yt = 204 + 24X
(2008 = 0, X unit = 1 Year, Y unit = Average monthly values)
(a) Convert this equation on a monthly basis.
(b) Shift the origin of the monthly trend equation to January, 2007.
(c) Estimate the value for January 2010.
(xi) Given the following trend equation:
Yt= 1,880+ 6X
[2009 = 0, X unit = 1 Year, Y unit = Average monthly sales (in ‘000 Rs.)]
(a) Convert this equation on a yearly basis.
(b) Estimate sales for the year 2013.
(c) Obtain a quarterly trend equation from (a) above.
(d) Obtain quarterly trend equation with origin at I Quarter, 2010.
(xii) Given below is a trend equation:
Yt = 372 + 288X
(Origin 2006, X unit = 1 Year, Y unit = Annual sales) Convert the above equation:
(a) To monthly trend equation with January 2007 as origin and estimate sales for March,
2007.
(b) To quarterly trend equation with first quarter, 2007 as the origin and estimate sales for
third quarter of 2007.
(xiii) Convert the following trend equation on a monthly basis and obtain the trend value for
November 2012:
Yt = 432 + 144X – 60X^2
2010 = 0; X unit = 1 Year; Y unit = Yearly production in ’000 units
(xiv) The sales made by a company in the years 2006 through 2012 are given here :
Year : 2006 2007 2008 2009 2010 2011 2012
Sales (in millions of Rs.) : 30 38 75 90 88 140 188
(a) Fit an exponential trend Yt = ab^X to the data and obtain the trend equation.
(b) Plot the data on a graph and also plot the trend line.
(c) Find the projected sales for the year 2014.
(d) What is the average rate of growth of sales?
(xv) From the following data, estimate the trend values by taking 4-yearly moving averages :
Year Sales (Rs. lakh) Year Sales (Rs. Lakh)
1993 200 1999 360
1994 120 2000 400
1995 280 2001 320
1996 240 2002 360
1997 160 2003 360
1998 320

(xvi) The trend equation for quarterly sales of a firm is estimated as Y = 20 + 2X, where
Y is sales per quarter in millions of rupees, unit of X is one quarter and the origin is the
middle of the first quarter (Jan.-Mar.) of 2005. The seasonal indices of sales for the four
quarters are given below :
Quarter : I II III IV
Seasonal Index: 120 105 85 90
Estimate the sales for each quarter of 2010.
(xvii) Calculate seasonal indices from the following data by ratio-to-moving averages method:
Year      Quarter
          I      II     III    IV
2007      48     52     44     60
2008      60     72     64     80
2009      92     84     88     88
2010      96     100    96     104
2011      102    108    96     112
2012      108    116    120    116
(xviii) The ratios of observed values to moving averages in percentages are given in the following
table:
Year      Quarter
          I        II       III      IV
2009      —        —        112.2    96.8
2010      110.8    118.6    98.3     92.2
2011      102.4    116.2    96.3     88.0
2012      89.8     96.3     —        —
Calculate the quarterly seasonal indices.
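
The calculation asked for in exercise (xviii) can be sketched in Python. This is only a minimal illustration, not part of the original lesson: the function name `seasonal_indices` is hypothetical, and the sketch assumes that the ratios for each quarter are averaged with a simple arithmetic mean and then adjusted so that the four indices add up to 400.

```python
# Ratios of observed values to moving averages (per cent), from exercise (xviii).
# None marks a quarter for which no ratio is available.
ratios = {
    2009: [None, None, 112.2, 96.8],
    2010: [110.8, 118.6, 98.3, 92.2],
    2011: [102.4, 116.2, 96.3, 88.0],
    2012: [89.8, 96.3, None, None],
}


def seasonal_indices(ratios_by_year):
    """Average the ratios quarter by quarter, then scale the averages to total 400."""
    quarter_means = []
    for q in range(4):
        values = [row[q] for row in ratios_by_year.values() if row[q] is not None]
        quarter_means.append(sum(values) / len(values))
    # Correction factor: the four seasonal indices should add up to 400.
    correction = 400.0 / sum(quarter_means)
    return [round(mean * correction, 2) for mean in quarter_means]


indices = seasonal_indices(ratios)
print(indices)
```

The unadjusted quarterly averages add up to slightly more than 400, so the correction factor is a little below 1 and scales each average down in the same proportion.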

Ans.
6. Yt = 336 + 8X, 8 crores, 360 + 8X
7. Yt = 89 + 2X, 99000 mt
8. Yt = 99 + 3X (2008 = 0), 3000 tons, 90, 93, 96 etc. C & I, Yt = 7.635 + 0.0208X (Jan 2006 = 0)
9. Yt = 80 + 0.5X (July 2008 = 0), Yt = 169.58 + 6.5X (July 1, 2009 = 0), Yt = 230 + 2.944X (July 1, 2007 = 0), Yt = 72 + 2X (July 1, 2010 = 0)
10. Yt = 204 + 2X (July 1, 2008 = 0), Yt = 169 + 2X (Jan 2007 = 0), 241
11. Yt = 22,560 + 72X (2009 = 0), Y 2013 = 22,848, Yt = 5651.25 + 4.5X (Q1, 2010 = 0)
12. Yt = 44 + 2X (Jan 2007 = 0), 48, Yt = 138 + 18X (Q1, 2007 = 0), 174
13. Yt = 36 + X – 0.0347X^2 (July 1, 2010 = 0), Y Nov. 2012 = 36.315
14. (a) Yt = 78.16 (1.3438)^x, (c) 342.5, (d) 34.38%
15. 205, 225, 260, 290, 330, 355, 360
16. 75.6, 66.15, 53.55, 56.7
17. 101.92, 103.17, 92.08, 102.84
18. 99.52, 108.74, 100.76, 90.98
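
The conversion and origin-shifting steps behind exercises (x) and (xii) can be checked with a short Python sketch. It is an illustration only and not part of the original lesson: the function names `annual_to_monthly` and `shift_origin` are hypothetical, and the sketch assumes the convention used in the answers above, namely that a yearly trend equation has its origin at the middle of the year (July 1) and a monthly one at the middle of the month.

```python
def annual_to_monthly(a, b, y_is_annual_total):
    """Convert Yt = a + bX (X in years) into a monthly equation (X in months)."""
    if y_is_annual_total:
        # Annual totals: divide a by 12 and b by 12 * 12 = 144.
        return a / 12, b / 144
    # Y is already a monthly average: only the time unit of X changes.
    return a, b / 12


def shift_origin(a, b, k):
    """Move the origin k time units forward (a negative k moves it back)."""
    return a + b * k, b


# Exercise (x): Yt = 204 + 24X, 2008 = 0, Y = average monthly values.
a_x, b_x = annual_to_monthly(204, 24, y_is_annual_total=False)
# The middle of January 2007 lies 17.5 months before July 1, 2008.
a_x_jan07, _ = shift_origin(a_x, b_x, -17.5)
# January 2010 is 36 months after January 2007.
jan_2010 = a_x_jan07 + b_x * 36

# Exercise (xii): Yt = 372 + 288X, 2006 = 0, Y = annual sales.
a_xii, b_xii = annual_to_monthly(372, 288, y_is_annual_total=True)
# The middle of January 2007 lies 6.5 months after July 1, 2006.
a_xii_jan07, _ = shift_origin(a_xii, b_xii, 6.5)
# March 2007 is 2 months after January 2007.
mar_2007 = a_xii_jan07 + b_xii * 2

print(f"(x):   Yt = {a_x_jan07:g} + {b_x:g}X, January 2010 = {jan_2010:g}")
print(f"(xii): Yt = {a_xii_jan07:g} + {b_xii:g}X, March 2007 = {mar_2007:g}")
```

The same two steps apply to quarterly equations, with divisors 4 and 16 for annual totals and with the shift measured in quarters.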

This question paper contains 16 printed pages
Your Roll No. ............
5565
B.Com. (Hons.)/I

Paper Code : A-105


Paper IV—BUSINESS STATISTICS
(New Course : Admissions of 2004 and onwards)

Time : 3 Hours Maximum Marks: 55

(Write your Roll No. on the top immediately on receipt of this question paper.)

Note :— The maximum marks printed on the question paper are applicable for the students of the
regular colleges (Cat. ‘A’). These marks will, however, be scaled up proportionately in
respect of the students of SOL at the time of posting of awards for compilation of result.
Note : Answers may be written either in English or in Hindi; but the same medium should be
used throughout the paper.

Attempt All questions.


All questions carry equal marks.
1. (a) Distinguish between Mean Deviation and Standard Deviation. Why is Standard Deviation
considered a better measure of variation as compared to Mean Deviation? 6
(b) In 2000 and 2010 the population of a country was 151.3 million and 179.3 million
respectively:
(i) What was the average percentage increase per year?
(ii) Calculate the population for the year 2004.
(iii) Calculate population for the year 2020. 5
Or
(a) The median and mode of the following wage distribution are known to be Rs. 33.5 and
Rs. 34 respectively. Three frequency values from the table are, however, missing. Find
these missing values : 6
Daily wages (in Rs.)    Frequencies
0 – 10                  10
10 – 20                 10
20 – 30                 ?
30 – 40                 ?
40 – 50                 ?
50 – 60                 6
60 – 70                 4
Total                   230
(b) The following table gives heights of boys and girls studying in a college. Find :
(i) Standard deviation of the heights of boys and girls taken together.
(ii) Whose heights are more variable? 5
Boys Girls
Number 400 100
Average height 68 inches 65 inches
Variance 9 4
2. (a) If the first four moments of a distribution about the value 5 are equal to –4, 22, –117 and
560, determine the corresponding moments : 6
(i) About the mean, and
(ii) About zero.
(b) Given the bivariate data :
X Y
1 6
5 1
3 0
2 0
1 1
1 2
7 1
3 5
(i) Fit a regression line of Y on X and hence predict Y if X = 5
(ii) Fit a regression line of X on Y and hence predict X if Y = 2.5
(iii) Calculate Karl Pearson’s coefficient of correlation. 5
Or
(a) What is Kurtosis ? Explain the significance of studying Kurtosis. 5
(b) Coefficient of correlation between X and Y for 20 items is 0.3, mean of X is 15 and that
of Y is 20, standard deviations are 4 and 5 respectively. At the time of calculation one
item was wrongly taken as 17 instead of 27 in the X series and as 35 instead of 20 in the
Y series. Find the correct coefficient of correlation. 6

3. (a) Two sets of Indices one with 1998 as base and the other with 2006 are given below: 6
Year Index A Year Index B
1998 100 2006 100
1999 110 2007 105
2000 120 2008 90
2001 190 2009 95
2002 300 2010 102
2003 330 2011 110
2004 360 2012 96
2005 390
2006 400
You are required to splice the Index B to Index A. Then also shift the base to 2008.
(b) Calculate 5 yearly moving average of the number of students studying in a college shown
below : 5
Year No. of Students
2003 332
2004 317
2005 357
2006 392
2007 402
2008 405
2009 410
2010 427
2011 405
2012 431
Or
(a) From the data given below, calculate price index number for 2012 with 2011 as base year :
Commodities 2011 2012
Price Quantity Price Quantity
A 20 8 40 6
B 50 10 60 5
C 40 15 50 15
D 20 20 20 25
Calculate the following :
(i) Laspeyre’s Index

(ii) Paasche Index
(iii) Bowley’s Index and
(iv) Fisher’s Ideal Index. 6
(b) Identify the component of a time series with which each of the following would be associated
and also give reasons why :
(i) A fire in factory delaying production for three weeks
(ii) An era of prosperity
(iii) Sale of sweets during Deepawali
(iv) A need for increased wheat production due to constant increase in population.
(v) The increase in day temperature from winter to summer. 5
4. (a) Define probability. Discuss the importance of probability in decision-making. 5
(b) A bag contains 2 white balls and 3 black balls. Four persons A, B, C, D in the order
named each draws one ball and does not replace it. The first to draw a white ball receives
Rs. 20. Determine their expectations. 6
Or
(a) A man has five coins, one of which has two heads. He randomly takes out a coin and
tosses it three times :
(i) What is the probability that it will fall head upward all the times?
(ii) If it always falls head upward, what is the probability that it is the coin with two
heads?
(b) In a binomial distribution consisting of 5 independent trials, probabilities of 1 and 2
successes are 0.4096 and 0.2048. Find the parameter ‘p’ of the distribution.
5. A food products company is contemplating the introduction of a revolutionary new product
with new packaging to replace the existing product at much higher price (S1) or a moderate
change in the composition of the existing product with a new packaging at a small increase in
price (S2) or a small change in the composition of the existing product except the word ‘New’ with a
negligible increase in price (S3). The three possible states of nature are :
(i) high increase in sales (N1),
(ii) No change in sales (N2) and
(iii) decrease in sales (N3).
The marketing department of the company worked out the payoffs in terms of yearly net profits
for each of the strategies of these events. This is represented in the following table : 11
State of Nature
Payoffs (in Rs.)
Strategies N1 N2 N3
S1 7,00,000 3,00,000 1,50,000
S2 5,00,000 4,50,000 0
S3 3,00,000 3,00,000 3,00,000

Which strategy should the executive choose on the basis of:
(i) Maximin Criterion
(ii) Maximax Criterion
(iii) Minimax Regret Criterion, and
(iv) Laplace Criterion.
Or
(a) What is Standard Error of Estimate ? Why is it calculated? 4
(b) The arithmetic mean of a set of statistical observations is 20 while its geometric mean is
19 and Harmonic Mean is 25. Comment on the statement. 3
(c) The Mean and Standard Deviation of two brands are given below :
Brand-I Brand-II
Mean 800 hours 770 hours
Standard Deviation 100 hours 60 hours
Calculate a measure of relative dispersion for the two brands and interpret the result. 4
