) I Year Commerce

SM 2 : Unit (III - V)

(Campus of Open Learning)

University of Delhi

Department of Commerce

Prepared by : Dr. K.L. Dahiya

Graduate Course

CONTENTS

Lesson 1 : Correlation Analysis

Lesson 2 : Regression Analysis

UNIT - IV INDEX NUMBERS

Lesson 1 : Index Numbers

UNIT - V TIME SERIES ANALYSIS

Lesson 1 : Time Series Analysis

University of Delhi

5, Cavalry Lane, Delhi-110007

Academic Session 2016-17 (8300 Copies)

Printed at : Educational Stores, S-5, Bsr. Road Ind. Area, Ghaziabad (U.P.)

UNIT-3

SIMPLE CORRELATION AND REGRESSION ANALYSIS

LESSON-1 : SIMPLE CORRELATION

1. STRUCTURE

1.0 Objective

1.1 Introduction

1.2 Utility of Correlation

1.3 Difference between Correlation and Causation

1.4 Types of Correlation

1.5 Methods of Studying Correlation

1.5.1 Scatter Diagram

1.5.2 Graphic Method

1.5.3 Karl Pearson’s Coefficient of Correlation

1.5.4 Properties of Coefficient of Correlation

1.5.5 Probable Error of Coefficient of Correlation

1.5.6 Rank Correlation

1.5.7 Concurrent Deviation Method

1.6 Summary

1.7 Self Assessment Questions

1.0 OBJECTIVE

After studying this lesson, you should be able to :

(i) Understand the concept of correlation

(ii) Identify different types of correlation

(iii) Understand the notion and interpretation of coefficient of correlation

(iv) Compute the value of correlation by different methods

(v) Compute correlation coefficient for bivariate frequency distribution.

1.1 INTRODUCTION

In the earlier chapters we discussed univariate distributions and the statistical techniques that highlight their important characteristics. A univariate distribution concerns one variable only. We may, however, come across series in which each item assumes the values of two or more variables. A distribution in which each unit of the series assumes two values is called a bivariate distribution. In a bivariate distribution, we are interested in finding out whether there is any relationship between the two variables. Correlation is a statistical technique which studies the relationship between two or more variables, and correlation analysis comprises the methods and techniques used for studying and measuring the extent of that relationship. When two variables are related in such a way that a change in the value of one is accompanied either by a direct change or by an inverse change in the value of the other, the two variables are said to be correlated. In other words, in correlated variables an increase in one variable is accompanied by an increase or a decrease in the other. For instance, a relationship exists between the price and demand of a commodity because, other things being equal, an increase in the price of a commodity causes a decrease in the demand for that commodity. A relationship may also exist between the heights and weights of students, or between the amount of rainfall in a city and the sales of raincoats in that city.

These are some of the important definitions of correlation.

Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.”

A.M. Tuttle says, “Correlation is an analysis of the covariation between two or more variables.”

W.A. Neiswanger says, “Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective.”

L.R. Conner says, “If two or more quantities vary in sympathy so that the movements in one tend to be accompanied by corresponding movements in others, then they are said to be correlated.”

1.2 UTILITY OF CORRELATION

The study of correlation is very useful in practical life, as the following points show.

(1) With the help of correlation analysis, we can measure in one figure the degree of relationship existing between variables like price, demand, supply, income, expenditure etc. Once we know that two variables are correlated, we can estimate the value of one variable given the value of the other.

(2) Correlation analysis is of great use to economists and businessmen. It reveals to the economist the disturbing factors and suggests to him the stabilizing forces. In business, it enables the executive to estimate costs, sales etc. and plan accordingly.

(3) Correlation analysis is helpful to scientists. Nature has been found to be a multiplicity of inter-related forces.

1.3 DIFFERENCE BETWEEN CORRELATION AND CAUSATION

The term correlation should not be misunderstood as causation. If correlation exists between two variables, it must not be assumed that a change in one variable is the cause of a change in the other. In simple words, a change in one variable may be associated with a change in another variable, but it need not be the cause of that change. When a correlation is found between two variables that have no cause and effect relationship, such correlation is known as “spurious correlation” or “nonsense correlation”. Correlation may exist due to the following:

(1) Pure chance correlation : This happens in a small sample. Correlation may exist between incomes and weights of four persons although there may be no cause and effect relationship between incomes and weights of people. This type of correlation may arise due to pure random sampling variation or because of the bias of the investigator in selecting the sample.

(2) When the correlated variables are influenced by one or more other variables : A high degree of correlation between the variables may exist where the same cause affects each variable, or where different causes affect each with the same effect. For instance, a degree of correlation may be found between the yield per acre of rice and of tea because both are related to the amount of rainfall, but neither variable is the cause of the other.

(3) When the variables mutually influence each other so that neither can be called the cause of the other : At times it may be difficult to say which of the two variables is the cause and which is the effect, because both may be reacting on each other.

1.4 TYPES OF CORRELATION

Correlation can be categorised as one of the following :

(i) Positive and Negative

(ii) Simple and Multiple

(iii) Partial and Total

(iv) Linear and Non-Linear (Curvilinear)

Positive and Negative Correlation

Positive or direct Correlation refers to the movement of variables in the same direction. The

correlation is said to be positive when the increase (decrease) in the value of one variable is

accompanied by an increase (decrease) in the value of other variable also. Negative or inverse

correlation refers to the movement of the variables in opposite direction. Correlation is said to be

negative, if an increase (decrease) in the value of one variable is accompanied by a decrease (increase)

in the value of other.

Simple and Multiple Correlation

Under simple correlation, we study the relationship between two variables only, e.g., between the yield of wheat and the amount of rainfall, or between the demand and supply of a commodity. In the case of multiple correlation, the relationship is studied among three or more variables. For example, the relationship of the yield of wheat may be studied with both chemical fertilizers and pesticides.

Partial and Total Correlation

There are two categories of multiple correlation analysis. Under partial correlation, the relationship of two or more variables is studied in such a way that only one dependent variable and one independent variable are considered while all others are kept constant. For example, the coefficient of correlation between the yield of wheat and chemical fertilizers, excluding the effects of pesticides and manures, is called partial correlation. Total correlation is based upon all the variables.

Linear and Non-Linear Correlation

When the amount of change in one variable tends to keep a constant ratio to the amount of change

in the other variable, then the correlation is said to be linear. But if the amount of change in one

variable does not bear a constant ratio to the amount of change in the other variable then the correlation

is said to be non-linear. The distinction between linear and non-linear is based upon the consistency

of the ratio of change between the variables.


1.5 METHODS OF STUDYING CORRELATION

There are different methods which help us to find out whether the variables are related or not.

(1) Scatter Diagram Method

(2) Graphic Method

(3) Karl Pearson’s Coefficient of Correlation

(4) Properties of Coefficient of Correlation

(5) Probable Error of Coefficient of Correlation

(6) Rank Method

(7) Concurrent Deviation Method

Let us understand these methods one by one.

1.5.1 Scatter Diagram

A scatter diagram is drawn to visualise the relationship between two variables. The values of the more important variable are plotted on the X-axis while the values of the other variable are plotted on the Y-axis. On the graph, dots are plotted to represent the different pairs of data. When dots are plotted to represent all the pairs, we get a scatter diagram. The way the dots scatter gives an indication of the kind of relationship which exists between the two variables. While drawing a scatter diagram, it is not necessary to take the zero values of the X and Y variables at the point of origin; the minimum values of the variables considered may be taken instead.

When there is a positive correlation between the variables, the dots on the scatter diagram run

from left hand bottom to the right hand upper corner. In case of perfect positive correlation all the

dots will lie on a straight line.

When a negative correlation exists between the variables, dots on the scatter diagram run from

the upper left hand corner to the bottom right hand corner. In case of perfect negative correlation, all

the dots lie on a straight line.

If a scatter diagram is drawn and no path is formed, there is no correlation. Students are advised to prepare two scatter diagrams on the basis of the following data :

(i) Data for the first Scatter Diagram :

Demand Schedule

Price (Rs.) Commodity Demand (units)

6 180

7 150

8 130

9 120

10 125

(ii) Data for the second Scatter Diagram :

Supply Schedule

Price (Rs.) Commodity Supply

50 2,000

51 2,100

52 2,200

53 2,500

54 3,000

55 3,800

56 4,700

Students will find that the first diagram indicates a negative correlation, whereas the second diagram reveals a positive correlation.

1.5.2 Graphic Method

In this method the individual values of the two variables are plotted on the graph paper. Therefore

two curves are obtained – one for X variable and another for Y variable.

The graph is interpreted as follows :

(i) If both the curves run parallel or nearly parallel, or move in the same direction, there is positive correlation.

(ii) On the other hand, if both the curves move in the opposite direction, there is a negative

correlation.


Example 1 : Show correlation from the following data by graphic method;

Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

Average Income (Rs.) 100 110 125 140 150 180 200 220 250 360

Average Expenditure (Rs.) 90 95 100 120 120 140 150 170 200 260

Solution :

[Graph: Average Income and Average Expenditure (Rs.) plotted against years, 2005 to 2014]

The graph prepared shows that income and expenditure have a close positive correlation. As

income increases, the expenditure also increases.

1.5.3 Karl Pearson’s Coefficient of Correlation

Karl Pearson’s method, popularly known as Pearsonian co-efficient of correlation, is most widely

applied in practice to measure correlation. The Pearsonian co-efficient of correlation is represented

by the symbol r.

According to Karl Pearson’s method, coefficient of correlation between the variables is obtained

by dividing the sum of the products of the corresponding deviations of the various items of two

series from their respective means by the product of their standard deviations and the number of

pairs of observations. Symbolically,

$$r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \ldots(i)$$

where $r$ stands for the coefficient of correlation; $x_1, x_2, x_3, \ldots, x_n$ are the deviations of the various items of the first variable from their mean; $y_1, y_2, y_3, \ldots, y_n$ are the deviations of the items of the second variable from their mean; $\sum xy$ is the sum of the products of these corresponding deviations; $N$ stands for the number of pairs; $\sigma_x$ stands for the standard deviation of the X variable and $\sigma_y$ stands for the standard deviation of the Y variable.

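$$\sigma_x = \sqrt{\frac{\sum x^2}{N}} \quad\text{and}\quad \sigma_y = \sqrt{\frac{\sum y^2}{N}}$$

If we substitute the values of $\sigma_x$ and $\sigma_y$ in the above formula for r, we get

$$r = \frac{\sum xy}{N \sqrt{\dfrac{\sum x^2}{N}} \sqrt{\dfrac{\sum y^2}{N}}} \quad\text{or}\quad r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$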

Degree of correlation varies between +1 and –1; the result will be +1 in case of perfect positive

correlation and –1 in case of perfect negative correlation.

Computation of correlation coefficient can be simplified by dividing the given data by a common

factor. In such a case, the final result is not multiplied by the common factor because coefficient of

correlation is independent of change of scale and origin.
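The definition above, together with the claim that r does not depend on origin or scale, can be checked with a small sketch. The function name `pearson_r` and the height and weight figures below are hypothetical, chosen only for illustration; the computation uses the deviation form r = Σxy / √(Σx² Σy²).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations taken about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - mean of X
    dev_y = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Hypothetical illustrative data (not taken from the lesson)
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 61, 70, 74]

r_original = pearson_r(heights, weights)

# Change of origin (subtract 170) and of scale (divide by 10) applied to X
coded_heights = [(h - 170) / 10 for h in heights]
r_coded = pearson_r(coded_heights, weights)

print(round(r_original, 4), round(r_coded, 4))  # the two values agree
```

Dividing by a common factor or shifting the origin changes the individual deviations, but numerator and denominator change in the same proportion, which is why the coded series gives the same r.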

Example 2 : Calculate Coefficient of Correlation from the following data :

X 50 100 150 200 250 300 350

Y 10 20 30 40 50 60 70

Solution :

Taking x = (X – X̄)/50 and y = (Y – Ȳ)/10, with X̄ = 200 and Ȳ = 40 :

X      (X – X̄)    x     x²    Y     (Y – Ȳ)    y     y²    xy
50     –150       –3    9     10    –30        –3    9     9
100    –100       –2    4     20    –20        –2    4     4
150    –50        –1    1     30    –10        –1    1     1
200    0          0     0     40    0          0     0     0
250    +50        +1    1     50    +10        +1    1     1
300    +100       +2    4     60    +20        +2    4     4
350    +150       +3    9     70    +30        +3    9     9

Σx = 0, Σx² = 28, Σy = 0, Σy² = 28, Σxy = 28

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

By substituting the values we get

$$r = \frac{28}{\sqrt{28 \times 28}} = \frac{28}{28} = 1$$

Hence there is perfect positive correlation.

Example 3 : A sample of five items is taken from the production of a firm, length and weight of the

five items are given below:


Length (inches) 3 4 6 7 10

Weight (ounces) 9 11 14 15 16

Calculate Karl Pearson’s correlation coefficient between length and weight and interpret the

value of correlation coefficient.

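Solution : $\bar{X} = \dfrac{\sum X}{N} = \dfrac{30}{5} = 6$ and $\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{65}{5} = 13$

Here x = (X – X̄) and y = (Y – Ȳ).

X      x     x²    Y     y     y²    xy
3      –3    9     9     –4    16    12
4      –2    4     11    –2    4     4
6      0     0     14    +1    1     0
7      +1    1     15    +2    4     2
10     +4    16    16    +3    9     12

ΣX = 30, Σx = 0, Σx² = 30, ΣY = 65, Σy = 0, Σy² = 34, Σxy = 30

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad\text{where } \sum xy = 30,\ \sum x^2 = 30 \text{ and } \sum y^2 = 34$$

$$r = \frac{30}{\sqrt{30 \times 34}} = \frac{30}{\sqrt{1020}} = 0.939 \text{ Ans.}$$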

The value of r indicates that there exists a high degree positive correlation between lengths

and weights.

Example 4. From the following data, compute the coefficient of correlation between X and Y :

X Series Y Series

Number of items 15 15

Arithmetic Mean 25 18

Square of deviation from Mean 136 138

Summation of product deviations of X and Y from their Arithmetic Means = 122.

Solution : Denoting deviations of X and Y from their arithmetic means by x and y respectively, the given data are : Σx² = 136, Σxy = 122, and Σy² = 138

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137} = 0.89 \text{ Ans.}$$

Short-cut Method: To avoid difficult calculations due to mean being in fraction, deviations

are taken from assumed means while calculating coefficient of correlation. The formula is also

modified for standard deviations because deviations are taken from assumed means. Karl Pearson’s

formula for short-cut method is given below :

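$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}\ \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}} \quad\text{or}\quad r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\ \sqrt{N \sum dy^2 - (\sum dy)^2}}$$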

Example 5 : Compute the coefficient of correlation from the following data :

Marks in Statistics 20 30 28 17 19 23 35 13 16 38

Marks in Mathematics 18 35 20 18 25 28 33 18 20 40

Solution :

Statistics, X    dx     dx²    Maths, Y    dy     dy²    dxdy
20               –10    100    18          –12    144    +120
30               0      0      35          +5     25     0
28               –2     4      20          –10    100    +20
17               –13    169    18          –12    144    +156
19               –11    121    25          –5     25     +55
23               –7     49     28          –2     4      +14
35               +5     25     33          +3     9      +15
13               –17    289    18          –12    144    +204
16               –14    196    20          –10    100    +140
38               +8     64     40          +10    100    +80

N = 10, Σdx = –61, Σdx² = 1017, Σdy = –45, Σdy² = 795, Σdxdy = 804

$$r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{\{N \sum dx^2 - (\sum dx)^2\}\{N \sum dy^2 - (\sum dy)^2\}}}$$

where dx = deviations of X series from an assumed mean 30.

dy = deviations of Y series from an assumed mean 30.

dx² = squares of the deviations of X series from assumed mean.

dy² = squares of the deviations of Y series from assumed mean.

dxdy = the product of deviations of X and Y series from their assumed means.

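$$r = \frac{10 \times 804 - (-61)(-45)}{\sqrt{\{10 \times 1017 - (-61)^2\}\{10 \times 795 - (-45)^2\}}}$$

$$\text{or}\quad r = \frac{8040 - 2745}{\sqrt{(10170 - 3721)(7950 - 2025)}} = \frac{5295}{\sqrt{6449 \times 5925}} = 0.856$$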
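The short-cut calculation can be reproduced with a brief sketch. The function name `pearson_r_assumed_mean` is hypothetical; the marks are those used in the solution above, and the second call uses a different pair of assumed means (25 and 20, chosen arbitrarily) to illustrate that the choice of assumed mean should not affect r.

```python
from math import sqrt


def pearson_r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Short-cut method: deviations are taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sums = {
        "dx": sum(dx),
        "dy": sum(dy),
        "dx2": sum(d * d for d in dx),
        "dy2": sum(d * d for d in dy),
        "dxdy": sum(a * b for a, b in zip(dx, dy)),
    }
    numerator = n * sums["dxdy"] - sums["dx"] * sums["dy"]
    denominator = sqrt(
        (n * sums["dx2"] - sums["dx"] ** 2) * (n * sums["dy2"] - sums["dy"] ** 2)
    )
    return numerator / denominator, sums


statistics_marks = [20, 30, 28, 17, 19, 23, 35, 13, 16, 38]
mathematics_marks = [18, 35, 20, 18, 25, 28, 33, 18, 20, 40]

# Assumed mean 30 for both series, as in the worked solution
r_30, sums_30 = pearson_r_assumed_mean(statistics_marks, mathematics_marks, 30, 30)

# A different, arbitrary pair of assumed means
r_other, _ = pearson_r_assumed_mean(statistics_marks, mathematics_marks, 25, 20)

print(sums_30)
print(r_30, r_other)
```

The column totals returned in `sums_30` can be compared against the totals row of the table before the final division is carried out.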


Direct Method of Computing Correlation Coefficient

Correlation coefficient can also be computed from given X and Y values by using the formula given

below :

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\ \sqrt{N \sum Y^2 - (\sum Y)^2}}$$

This formula gives the same answer as the one obtained by taking deviations from the actual mean or from an arbitrary mean.
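A short derivation, offered here as a sketch, shows why the direct formula agrees with the deviation formula. It uses only the definitions x = X – X̄ and y = Y – Ȳ, with X̄ = ΣX/N and Ȳ = ΣY/N.

```latex
\begin{align*}
\sum xy &= \sum (X - \bar{X})(Y - \bar{Y})
         = \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + N\bar{X}\bar{Y}
         = \sum XY - \frac{\sum X \sum Y}{N}, \\
\sum x^2 &= \sum X^2 - \frac{(\sum X)^2}{N}, \qquad
\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}, \\
r &= \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}
   = \frac{N\sum XY - \sum X \sum Y}
          {\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}.
\end{align*}
```

The last step multiplies the numerator and the denominator by N. The same algebra, with X and Y replaced by deviations from assumed means, gives the short-cut formula.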

Example 6. Compute the coefficient of correlations from the following data :

Marks in Statistics 20 30 28 17 19 23 35 13 16 38

Marks in Mathematics 18 35 20 18 25 28 33 18 20 40

Solution :

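Marks in Statistics, X    Marks in Mathematics, Y    X²      Y²      XY
20                        18                         400     324     360
30                        35                         900     1225    1050
28                        20                         784     400     560
17                        18                         289     324     306
19                        25                         361     625     475
23                        28                         529     784     644
35                        33                         1225    1089    1155
13                        18                         169     324     234
16                        20                         256     400     320
38                        40                         1444    1600    1520

ΣX = 239, ΣY = 255, ΣX² = 6357, ΣY² = 7095, ΣXY = 6624

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\ \sqrt{N \sum Y^2 - (\sum Y)^2}}$$

$$= \frac{10 \times 6624 - 239 \times 255}{\sqrt{10 \times 6357 - (239)^2}\ \sqrt{10 \times 7095 - (255)^2}}$$

$$= \frac{66240 - 60945}{\sqrt{63570 - 57121}\ \sqrt{70950 - 65025}} = \frac{5295}{\sqrt{6449 \times 5925}} = \frac{5295}{6181.45} = 0.856$$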

Coefficient of Correlation in a Continuous Series

In the case of a continuous series, we assume that every item which falls within a given class

interval falls exactly at the middle of that class. The formula, because of the presence of frequencies

is modified as follows :

$$r = \frac{\sum f\,dx\,dy - \dfrac{\sum f\,dx \cdot \sum f\,dy}{\sum f}}{\sqrt{\sum f\,dx^2 - \dfrac{(\sum f\,dx)^2}{\sum f}}\ \sqrt{\sum f\,dy^2 - \dfrac{(\sum f\,dy)^2}{\sum f}}}$$

(i) Take the step deviations of variable X and denote them as dx.

(ii) Take the step deviations of variable Y and denote them as dy.

(iii) Multiply dx, dy and the respective frequency of each cell and write the figure obtained in the right-hand upper corner of each cell.

(iv) Add all the cornered values calculated in step (iii) to get Σfdxdy.

(v) Multiply the frequencies of the variable X by the deviations of X to get Σfdx.

(vi) Take the squares of the deviations of the variable X and multiply them by the respective frequencies to get Σfdx².

(vii) Multiply the frequencies of the variable Y by the deviations of Y to get Σfdy.

(viii) Take the squares of the deviations of the variable Y and multiply them by the respective frequencies to get Σfdy².

(ix) Now substitute the values of Σfdxdy, Σfdx, Σfdx², Σfdy and Σfdy² in the formula to get the value of r.

Example 7 : The following table gives the ages of husbands and wives at the time of their marriages.

Calculate the correlation coefficient between the ages of husbands and wives.

Ages of Husbands

Age of Wives 20–30 30–40 40–50 50–60 60–70 Total

15–25 5 9 3 – – 17

25–35 – 10 25 2 – 37

35–45 – 1 12 2 – 15

45–55 – – 4 16 5 25

55–65 – – – 4 2 6

Total 5 20 44 24 7 100

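Solution : Let X denote the age of husbands and Y the age of wives. The totals used below are Σf = 100, Σfdx = 8, Σfdy = –34, Σfdx² = 92, Σfdy² = 154 and Σfdxdy = 88.

$$r = \frac{\sum f\,dx\,dy - \dfrac{\sum f\,dx \cdot \sum f\,dy}{\sum f}}{\sqrt{\sum f\,dx^2 - \dfrac{(\sum f\,dx)^2}{\sum f}}\ \sqrt{\sum f\,dy^2 - \dfrac{(\sum f\,dy)^2}{\sum f}}}$$

$$= \frac{88 - \dfrac{(8)(-34)}{100}}{\sqrt{92 - \dfrac{(8)^2}{100}}\ \sqrt{154 - \dfrac{(-34)^2}{100}}} = \frac{90.72}{\sqrt{91.36}\ \sqrt{142.44}} = \frac{90.72}{114.08} \approx 0.795$$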
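The totals used in this solution can be rebuilt from the frequency table with a short sketch. The coding of the step deviations is an assumption: mid-values are measured from 45 for husbands and 40 for wives and divided by the class width 10, which reproduces the totals appearing in the calculation above. All variable names are illustrative.

```python
from math import sqrt

# Rows: age of wives (15-25 ... 55-65); columns: age of husbands (20-30 ... 60-70)
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]

# Assumed step deviations: (mid-value - assumed mean) / class width
dx = [-2, -1, 0, 1, 2]   # husbands: mid-values 25..65, assumed mean 45
dy = [-2, -1, 0, 1, 2]   # wives: mid-values 20..60, assumed mean 40

col_totals = [sum(row[j] for row in freq) for j in range(len(dx))]
row_totals = [sum(row) for row in freq]
total_f = sum(row_totals)

sum_fdx = sum(f * d for f, d in zip(col_totals, dx))
sum_fdx2 = sum(f * d * d for f, d in zip(col_totals, dx))
sum_fdy = sum(f * d for f, d in zip(row_totals, dy))
sum_fdy2 = sum(f * d * d for f, d in zip(row_totals, dy))

# Each cell contributes frequency x dx x dy (the "cornered" values)
sum_fdxdy = sum(
    freq[i][j] * dx[j] * dy[i]
    for i in range(len(dy))
    for j in range(len(dx))
)

numerator = sum_fdxdy - sum_fdx * sum_fdy / total_f
denominator = sqrt(sum_fdx2 - sum_fdx ** 2 / total_f) * sqrt(
    sum_fdy2 - sum_fdy ** 2 / total_f
)
r = numerator / denominator
print(total_f, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2, sum_fdxdy)
print(r)
```

Because r is independent of origin and scale, any other choice of assumed means would change the intermediate totals but not the final coefficient.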

1.5.4 Properties of Coefficient of Correlation

Following are some of the important properties of r :

(1) The coefficient of correlation lies between –1 and +1, i.e. $-1 \le r \le +1$.

(2) The coefficient of correlation is independent of change of scale and origin of the variables X and Y.

(3) The coefficient of correlation is the geometric mean of the two regression coefficients, r taking the sign common to both :

$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$

Merits of Pearson’s coefficient of correlation : The coefficient of correlation summarizes in one figure not only the degree of correlation but also its direction. Its value varies between +1 and –1.

Demerits of Pearson’s coefficient of correlation : It always assumes a linear relationship between the variables, even though the assumption may be wrong. Secondly, it is not easy to interpret the significance of the correlation coefficient. The method is also time consuming and is affected by extreme items.

1.5.5 Probable Error of Coefficient of Correlation

It is calculated to find out how far the Pearson’s coefficient of correlation is reliable in a particular

case.

$$\text{P.E. of coefficient of correlation} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$$

where r = coefficient of correlation and N= number of pairs of items.

If the probable error calculated is added to and subtracted from the coefficient of correlation,

it would give us such limits within which we can expect the value of the coefficient of correlation

to vary.

If r is less than probable error, then there is no real evidence of correlation.

If r is more than 6 times the probable error, the coefficient of correlation is considered highly

significant.

If r is more than 3 times the probable error but less than 6 times, correlation is considered

significant but not highly significant.

If the probable error is not much and the given r is more than the probable error but less than

3 times of it, nothing definite can be concluded.
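The rules above can be collected into a small sketch. The function names are hypothetical, the sample values of r and N are invented for illustration, and the comparison uses the absolute value of r (an assumption made so that negative coefficients are treated the same way as positive ones).

```python
from math import sqrt


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def interpret(r, n):
    """Apply the rule-of-thumb comparisons of r with its probable error."""
    pe = probable_error(r, n)
    size = abs(r)  # assumption: the rules are applied to the magnitude of r
    if size < pe:
        return "no real evidence of correlation"
    if size > 6 * pe:
        return "highly significant"
    if size > 3 * pe:
        return "significant but not highly significant"
    return "nothing definite can be concluded"


# Hypothetical values of r and N
for r, n in [(0.8, 25), (0.5, 10), (0.1, 9)]:
    pe = probable_error(r, n)
    print(r, n, round(pe, 4), (round(r - pe, 4), round(r + pe, 4)), interpret(r, n))
```

The pair printed in brackets is r – P.E. and r + P.E., the limits within which the coefficient is expected to vary.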

1.5.6 Rank Correlation

There are many problems of business and industry when it is not possible to measure the variable

under consideration quantitatively or the statistical series is composed of items which can not be

exactly measured. For instance, it may be possible for the two judges to rank six different brands of

cigarettes in terms of taste, whereas it may be difficult to give them a numerical grade in terms of

taste. In such problems, Spearman’s coefficient of rank correlation is used. The formula for rank

correlation is :

$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \quad\text{or}\quad \rho = 1 - \frac{6 \sum D^2}{N^3 - N}$$

where ρ stands for the rank coefficient of correlation,

D refers to the difference of ranks between paired items,

N refers to the number of paired observations.

The value of the rank correlation coefficient varies between +1 and –1. When ρ = +1, there is complete agreement in the order of ranks and the ranks will be in the same order. When ρ = –1, the ranks will be in opposite directions, showing complete disagreement in the order of ranks.

Let us understand with the help of an example.

Example 8 : Ranks of 10 individuals at the start and at the finish of a course of training are given :

Individual : A B C D E F G H I J

Rank before : 1 6 3 9 5 2 7 10 8 4

Rank after : 6 8 3 7 2 1 5 9 4 10

Calculate coefficient of correlation.

Solution :

Individual Rank before Rank after (R1 – R2)

R1 R2 D D2

A 1 6 –5 25

B 6 8 –2 4

C 3 3 0 0

D 9 7 2 4

E 5 2 3 9

F 2 1 1 1

G 7 5 2 4

H 10 9 1 1

I 8 4 4 16

J 4 10 –6 36

N = 10, ΣD² = 100

By applying the formula,

$$\rho = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 100}{10^3 - 10} = 1 - \frac{600}{990} = 1 - 0.606 = 0.394$$

When we are given the actual data and not the ranks, it becomes necessary for us to assign the

ranks. Ranks can be assigned by taking either the highest value as one or the lowest value as one.

But if we start by taking the highest value or the lowest value we must follow the same order for

both the variables to assign ranks.

Example 9 : Calculate rank correlation from the following data :

X : 17 13 15 16 6 11 14 9 7 12

Y : 36 46 35 24 12 18 27 22 2 8


Solution :

Calculation of Rank Correlation

X (Ranks) Y (Ranks) D D2

R1 R2 (R1 – R2)

17 1 36 2 –1 1

13 5 46 1 +4 16

15 3 35 3 0 0

16 2 24 5 –3 9

6 10 12 8 +2 4

11 7 18 7 0 0

14 4 27 4 0 0

9 8 22 6 +2 4

7 9 2 10 –1 1

12 6 8 9 –3 9

N = 10, ΣD² = 44

Rank correlation coefficient is calculated as follows :

$$\rho = 1 - \frac{6 \sum D^2}{N^3 - N} = 1 - \frac{6 \times 44}{10^3 - 10} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733$$
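When actual values are given, the ranking and the formula can be combined in a short sketch. The function names are hypothetical; ranks are assigned with the highest value as 1 for both series, and the simple ranking helper assumes there are no tied values.

```python
def ranks_highest_first(values):
    """Rank 1 for the highest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Rank correlation 1 - 6*sum(D^2)/(N^3 - N) for untied data."""
    r1 = ranks_highest_first(xs)
    r2 = ranks_highest_first(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2


x = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
y = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]

rho, sum_d2 = spearman_rho(x, y)
print(ranks_highest_first(x))
print(ranks_highest_first(y))
print(sum_d2, rho)
```

Ranking both series from the lowest value instead would reverse every rank in both columns and leave each difference D, and hence ρ, unchanged in magnitude.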

In some cases it becomes necessary to assign an identical rank to two or more items. In such cases, it is customary to give each item an average rank. Therefore, if two items tie for the 4th and 5th ranks, each item is ranked $\frac{4 + 5}{2} = 4.5$. That is, where two or more items are to be ranked equal, the rank assigned for the purpose of calculating the coefficient of correlation is the average of the ranks which these items would have got had they differed slightly from each other. When equal ranks are assigned to some items, the rank correlation formula is also adjusted. The adjustment consists of adding $\frac{1}{12}(m^3 - m)$ to the value of $\sum D^2$ for each group of tied items, where m stands for the number of items whose ranks are identical.

$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N}$$

Let us take an example to understand this.

Example 10 : Compute the rank correlation coefficient from the following data :

Section A : 115 109 112 87 98 98 120 100 98 118

Section B : 75 73 85 70 76 65 82 73 68 80


Solution :

Computation of Rank correlation coefficient.

Series Ranks Series Ranks D D2

A R1 B R2 (R1 – R2)

115 8 75 6 2 4

109 6 73 4.5 1.5 2.25

112 7 85 10 –3 9

87 1 70 3 –2 4

98 3 76 7 –4 16

98 3 65 1 2 4

120 10 82 9 1 1

100 5 73 4.5 0.5 0.25

98 3 68 2 1 1

118 9 80 8 1 1

N = 10 D2 = 42.50

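Apply the formula to calculate rank correlation :

$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \ldots\right]}{N^3 - N}$$

Item 98 is repeated three times in series A. Hence m = 3. In series B the item 73 is repeated two times and so m = 2.

$$\rho = 1 - \frac{6\left[42.50 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6(42.50 + 2 + 0.5)}{1000 - 10} = 1 - \frac{270}{990} = 0.727$$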
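The tied-rank procedure can be sketched as follows. The function names are hypothetical; ranks are assigned with the lowest value as 1, as in the table above, and tied items receive the average of the positions they occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 for the lowest value; tied values share the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_with_ties(xs, ys):
    r1, r2 = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n), sum_d2


section_a = [115, 109, 112, 87, 98, 98, 120, 100, 98, 118]
section_b = [75, 73, 85, 70, 76, 65, 82, 73, 68, 80]

rho, sum_d2 = spearman_rho_with_ties(section_a, section_b)
print(average_ranks(section_a))
print(average_ranks(section_b))
print(sum_d2, tie_correction(section_a), tie_correction(section_b), rho)
```

Each group of tied values contributes its own correction term, so a series with several separate ties simply adds one term per group.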

1.5.7 Concurrent Deviation Method

This is the simplest method of studying correlation. The only thing to be computed under this

method is the direction of change of both the variables. The formula is

$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$

where rc = Coefficient of concurrent deviations.

C = Number of concurrent deviations.

N = Number of pairs of deviations compared.


The procedure of calculating coefficient of correlation under this method is quite simple as

explained below:

(i) Compute the direction of change for both the variables by comparing each value with the preceding value, and assign a + sign for an increase, a – sign for a decrease and 0 for no change.

(ii) Denote these two columns by Dx and Dy.

(iii) Multiply Dx with Dy and determine the value of C, which is the number of positive signs in the DxDy column.

(iv) Apply the formula.

We can understand by taking an example.

Example 11 : Calculate the coefficient of correlation by concurrent deviation from the following

data

X : 100 120 135 135 115 110 120

Y : 50 40 60 80 80 55 65

Solution :

X Dx Y Dy DxDy

100 50

120 + 40 – –

135 + 60 + +

135 0 80 + 0

115 – 80 0 0

110 – 55 – +

120 + 65 + +

N=6 C=3

$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}} = \pm\sqrt{\pm\frac{2 \times 3 - 6}{6}} = \sqrt{\frac{0}{6}} = 0$$

Therefore the correlation does not exist between the variables.
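The procedure can be traced with a small sketch. The function names are hypothetical, and the code assumes the usual form of the coefficient, r_c = ±√(±(2C – N)/N), where the sign of (2C – N) is carried both inside and outside the square root.

```python
from math import sqrt


def sign(value):
    """Return +1 for an increase, -1 for a decrease and 0 for no change."""
    return (value > 0) - (value < 0)


def concurrent_deviation(xs, ys):
    # Direction of change of each value compared with the preceding one
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                   # pairs of deviations compared
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations
    inner = (2 * c - n) / n
    magnitude = sqrt(abs(inner))
    r_c = magnitude if inner >= 0 else -magnitude
    return r_c, c, n, dx, dy


x = [100, 120, 135, 135, 115, 110, 120]
y = [50, 40, 60, 80, 80, 55, 65]

r_c, c, n, dx, dy = concurrent_deviation(x, y)
print(dx)
print(dy)
print(c, n, r_c)
```

Only the directions of change enter the calculation, so the magnitudes of the movements in X and Y have no influence on the result.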

1.6 SUMMARY

Correlation analysis deals with bivariate and multivariate data.

Correlation is a study of the co-variation of the variables involved.

When changes in the variables occur in the same direction, they are positively correlated and

when the movements are in the opposite directions, the correlation is negative.

Correlation between two variables would result either when one of them is the cause while the other is the effect, or when both of them are affected by some common factors. It may also be spurious correlation, resulting from chance when the factors affecting each one have nothing in common.

Correlation between variables may be of varying degrees ranging from perfect to high, moderate,

low and no correlation.

Correlation may be linear or non-linear. Only linear correlation is considered here.

Graphically, correlation is studied by means of a scatter diagram. If the dots representing pairs of data values are seen to fall on a straight line, the correlation is perfect. The degree of correlation decreases as the points lie farther and farther away from the line.

Widely scattered dots with no clear direction, and dots in a line that is parallel to either of the axes, mean absence of correlation.

Numerically, the correlation is measured and expressed in terms of Karl Pearson’s coefficient

of correlation which is defined as the ratio of covariance to the product of standard deviations

of the two series involved.

Its calculation can be done by measuring deviations of the observations from their respective

means or assumed mean values, and even by not measuring deviations.

The coefficient of correlation varies between –1 and +1 and is independent of the change of origin and scale.

Exercise 1: True or False statements

(i) The correlation analysis is related to the examination or the nature of relationship between

variables.

(ii) Covariation implies that two variables would vary in the same direction.

(iii) Negative correlation in two series means that as the value of one of the variables decreases,

the other would also decrease.

(iv) Graphic representation of correlation is done by means of a scatter diagram.

(v) A straight line on a scatter diagram having a zero slope implies that there is perfect positive

correlation between the variables.

(vi) The coefficient of correlation contains in a single number the extent and the direction of

relationship between the variables.

(vii) The absolute value of the covariance between X and Y can be at most equal to the product

of the standard deviations of X and Y series.

(viii) The coefficient of correlation always lies between 0 and 1.

(ix) The coefficient of correlation can be calculated without measuring deviations from actual

or assumed means.

(x) The probable error aims at establishing the dependability of the coefficient of correlation.

(xi) For one set of data, r = 0.8 and for another set r = 0.4. It means that correlation is twice as

strong in the first set than in the second.


(xii) A correlation between two variables may exist because both of them may be influenced

by some common factors.

(xiii) For a given set of X and Y values, the coefficient of correlation is found to be equal to 0.6.

If the variables are interchanged so that X becomes Y and Y becomes X, then the coefficient

for the new set of data may or may not be equal to 0.6.

(xiv) If all the X values in a given set of paired data are subtracted from a constant K, it will

have no effect on the value of the correlation coefficient.

(xv) The coefficient of correlation is independent of the change of origin and scale.

(xvi) If the coefficient of correlation between X and Y is 0.7, then the coefficient of correlation

between –X and –Y would be equal to –0.7.

(xvii) The correlation is said to be significant only when | r| > 6PE.

(xviii) For the rank correlation to be calculated, it is necessary that the given variables should

not be quantifiable.

(xix) The coefficient of rank correlation has the same limits as the Karl Pearson’s coefficient of

correlation has.

(xx) Rank correlation can be used even when the variables under consideration are quantifiable

and not normally distributed.

Ans. 1. F 2. F 3. F 4. T 5. F 6. T 7. T 8. F 9. T 10. T 11. F 12. T 13. F 14. F 15. T 16. F 17. T 18. F 19. T 20. T

Exercise 2 : Questions and Answers

(i) What is correlation? Distinguish between positive and negative correlation. How is ‘scatter

diagram’ method helpful in the study of correlation?

(ii) What is a scatter diagram? How does it help in studying the degree and direction of

correlation between two variables? Illustrate with some sketches.

(iii) Define Karl Pearson’s coefficient of correlation. Explain the general rules for interpreting

the coefficient. In this connection, also state the meaning and significance of the concept

of probable error.

(iv) State and explain the properties of the coefficient of correlation. Also, state the assumptions

underlying.

(v) What do you understand by the statement that coefficient of correlation is independent of

the change of origin and scale?

(vi) Does correlation imply the existence of cause and effect relationship between the variables

involved? Does cause and effect relationship between variables result in correlation

between them? Explain with the help of suitable examples.

(vii) Define rank correlation. Write Spearman’s formula for rank correlation coefficient when

some ranks are tied and when ranks are not tied. What are the limits of this coefficient?

Interpret the case where this coefficient assumes the minimum value.

(viii) For a given series of paired data, the following information is available:

Covariance between X and Y series = –32.6


Standard deviation of X series = 8.6

Standard deviation of Y series = 4.8

No. of pairs of observations = 15

Calculate the coefficient of correlation.

(ix) Given the following information:

Number of pairs of observations of X and Y series =15

X series arithmetic mean = 25

Y series arithmetic mean =18

X series standard deviation = 3.0

Y series standard deviation = 3.03

Summation of the products of corresponding deviations of X and Y series = 122

Calculate the coefficient of correlation between X and Y series.

(x) Given:

Total of multiplication of deviations of X and Y = 3,476

No. of pairs of observations = 12

Total of deviations of X = – 176

Total of deviations of Y = – 26

Total of squares of deviations of X = 8,288

Total of squares of deviations of Y = 2,556

Using this information, calculate the coefficient of correlation when the arbitrary mean

values of X and Y are 85 and 22, respectively.

(xi) For a set of bivariate data, you are given the following information:

$\sum (X - 58) = 46$, $\sum (Y - 58) = 19$, $\sum (X - 58)(Y - 58) = 1{,}095$,

$\sum (X - 58)^2 = 1{,}483$, and $\sum (Y - 58)^2 = 3{,}086$

Number of pairs of observations = 8

Calculate the coefficient of correlation between X and Y.

(xii) The co-efficient of correlation between two variables X and Y is – 0.4 and their covariance

is equal to –16. If variance of Y series is 36, find the second moment about mean of X

series.

(xiii) Given below is the information relating to marks in Statistics (X) and marks in Accountancy

(Y) obtained by the students of a class:

Co-variance between X and Y = 144

Second moment of X about 20 = 244

First moment of X about 20 = 10

Arithmetic mean of Y = 45

Coefficient of correlation between X and Y = 0.75


Calculate coefficient of variation for marks in Statistics and that for marks in Accountancy.

In which subject is the performance of students more consistent?

(xiv) The coefficient of correlation between X and Y for 20 items is 0.3. The mean of X is 15

and that of Y is 20 while the respective standard deviations are 4 and 5. At the time of

calculation, one item 27 has wrongly been taken as 17 in the case of X series and 35

instead of 30 in the case of Y series. Find the correct coefficient of correlation.

(xv) While making calculations about coefficient of correlation, a student obtained the following

results:

n = 25, $\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$, and $\sum XY = 508$

It was discovered later, however, that two pairs of values were wrongly recorded:

Recorded (wrong) values of (X, Y) : (6, 14) and (8, 6)

Correct values of (X, Y) : (8, 12) and (6, 8)

Obtain the correct value of the coefficient of correlation.

(xvi) Find the coefficient of correlation between age and playing habits of the following students:

Age of Players: 16 17 18 19 20 21

No. of Students: 2,500 2,000 1,500 1,200 1,000 800

Regular Players: 2,250 1,200 1,050 480 250 120

(xvii) A panel of judges A and B graded seven dramatic performances by independently awarding

marks as given here.

Performance : 1 2 3 4 5 6 7

Marks by A: 46 42 44 40 43 41 45

Marks by B: 40 38 36 35 39 37 41

Show by means of coefficient of correlation whether the marks given by them are correlated.

(xviii) Calculate the coefficient of correlation between height and weight of the students using

the following data:

(inches) 90–100 100–110 110–120 120–130

50–55 4 7

55–60 6 10 7

60–65 10 12 7

65–70 8 6 13

Ans. 8. –0.790, 9. 0.895, 10. 0.819, 11. 0.512, 12. 44.44, 13. CV : Stats = 40% Accs = 35.56%

14. 0.515, 15. 0.667, 16. – 0.958, 17. r = 0.750, 18. 0.583


LESSON 2

REGRESSION ANALYSIS

2. STRUCTURE

2.0 Objective

2.1 Introduction

2.2 Difference between Correlation and Regression

2.3 Principle of Least Squares

2.4 Methods of Regression Analysis

2.4.1 Graphic Method

2.4.2 Algebraic Method

2.5 Properties of Regression Coefficients

2.6 Standard Error of an Estimate

2.7 Summary

2.8 Self Assessment Questions

2.0 OBJECTIVE

After studying this lesson, you should be able to :

(i) Understand the concept of regression analysis

(ii) Differentiate between correlation and regression analysis

(iii) Compute regression coefficients by different methods and draw regression lines

(iv) Comprehend properties of Regression Coefficients

(v) Apply regression analysis to predict a dependent variable given the independent variable.

2.1 INTRODUCTION

The statistical technique correlation establishes the degree and direction of relationship between

two or more variables. But we may be interested in estimating the value of an unknown variable on

the basis of a known variable. If we know the index of money supply and price-level, we can find

out the degree and direction of relationship between these indices with the help of correlation

technique. But the regression technique helps us in determining what the general price-level would

be assuming a fixed supply of money. Similarly if we know that the price and demand of a commodity

are correlated we can find out the demand for that commodity for a fixed price. Hence, the statistical

tool with the help of which we can estimate or predict the unknown variable from known variable

is called regression. The meaning of the term “Regression” is the act of returning or going back.

This term was first used by Sir Francis Galton in 1877 when he studied the relationship between the

height of fathers and sons. His study revealed a very interesting relationship. All tall fathers tend to

have tall sons and all short fathers short sons but the average height of the sons of a group of tall

fathers was less than that of the fathers and the average height of the sons of a group of short fathers

was greater than that of the fathers. The line describing this tendency of going back is called

“Regression Line”. Modern writers have started to use the term estimating line instead of regression

line because the expression estimating line is clearer in character. According to Morris Myers

Blair, regression is the measure of the average relationship between two or more variables in terms

of the original units of the data.

Regression analysis is a branch of statistical theory which is widely used in all the scientific

disciplines. It is a basic technique for measuring or estimating the relationship among economic

variables that constitute the essence of economic theory and economic life. The uses of regression

analysis are not confined to economics and business activities. Its applications are extended to

almost all the natural, physical and social sciences. The regression technique can be extended to

three or more variables but we shall limit ourselves to problems having two variables in this lesson.

Regression analysis is of great practical use even more than the correlation analysis. Some of

the uses of the regression analysis are given below :

(i) Regression Analysis helps in establishing a functional relationship between two or more

variables. Once this is established it can be used for various analytic purposes.

(ii) With the use of electronic machines and computers, the tedium of calculating regression

equations, particularly those expressing multiple and non-linear relations, has been reduced

considerably.

(iii) Since most of the problems of economic analysis are based on cause and effect relationship,

the regression analysis is a highly valuable tool in economic and business research.

(iv) The regression analysis is very useful for prediction purposes. Once a functional

relationship is established the value of the dependent variable can be estimated from the

given value of the independent variables.

2.2 DIFFERENCE BETWEEN CORRELATION AND REGRESSION

Both the techniques are directed towards a common purpose of establishing the degree and direction

of relationship between two or more variables but the methods of doing so are different. The choice

of one or the other will depend on the purpose. If the purpose is to know the degree and direction of

relationship, correlation is an appropriate tool but if the purpose is to estimate a dependent variable

with the substitution of one or more independent variables, the regression analysis shall be more

helpful. The points of difference are discussed below:

(i) Degree and Nature of Relationship : The correlation coefficient is a measure of degree of

covariability between two variables whereas regression analysis is used to study the nature of

relationship between the variables so that we can predict the value of one on the basis of

another. The reliance on the estimates or predictions depends upon the closeness of relationship

between the variables.

(ii) Cause and Effect Relationship : The cause and effect relationship is explained by regression

analysis. Correlation is only a tool to ascertain the degree of relationship between two variables

and we cannot say that one variable is the cause and the other the effect. A high degree of correlation

between price and demand for a commodity at a particular point of time may not suggest

which is the cause and which is the effect. However, in regression analysis cause and effect

relationship is clearly expressed: one variable is taken as dependent and the other as

independent.

The variable which is the basis of prediction is called independent variable and the variable

that is to be predicted is called dependent variable. The independent variable is represented by X

and the dependent variable by Y.


2.3 PRINCIPLE OF LEAST SQUARES

Regression refers to an average of relationship between a dependent variable with one or more

independent variables. Such relationship is generally expressed by a line of regression drawn by the

method of the “Least Squares”. This line of regression can be drawn graphically or derived

algebraically with the help of regression equations. Before the equation of the least squares line can be

determined some criterion must be established as to what conditions the best line should satisfy.

The condition usually stipulated in regression analysis is that the sum of the squares of the deviations

of the observed Y values from the fitted line shall be minimum. This is known as the least squares

or minimum squared error criterion.

A line fitted by the method of least squares is the line of best fit. The line satisfies the following

conditions :

(i) The algebraic sum of the deviations of the points above and below the line is equal to zero:

$\sum (x - x_c) = 0$ and $\sum (y - y_c) = 0$

where $x_c$ and $y_c$ are the values derived with the help of regression technique.

(ii) The sum of the squares of all these deviations is less than the sum of the squares of

deviations from any other value, i.e.,

$\sum (x - x_c)^2$ is smaller than $\sum (x - A)^2$ and

$\sum (y - y_c)^2$ is smaller than $\sum (y - A)^2$

where A is the corresponding value read from any other straight line.

(iii) The lines of regression (best fit) intersect at the mean values of the variables, i.e., $\bar{x}$ and $\bar{y}$.

(iv) When the data represent a sample from a larger population, the least square line is the

best estimate of the population line.
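The conditions listed above can be checked numerically. The following Python sketch is only an illustration: the helper names `fit_line` and `sum_sq_error` are hypothetical, and the five pairs of values are those used in Example 2 of this lesson. It fits the line of Y on X by least squares and then checks that the deviations sum to zero, that the line passes through the point of means, and that shifting or tilting the line increases the sum of squared deviations.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_sq_error(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Data of Example 2 (used here purely as an illustration).
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
a, b = fit_line(xs, ys)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Condition (i): deviations above and below the line cancel out.
residual_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

# Condition (iii): the fitted line passes through the point of means.
value_at_mean_x = a + b * mean_x

# Condition (ii): any other line gives a larger sum of squared deviations.
best = sum_sq_error(xs, ys, a, b)
shifted = sum_sq_error(xs, ys, a + 0.5, b)   # same slope, intercept moved
tilted = sum_sq_error(xs, ys, a, b + 0.1)    # same intercept, slope changed

print("a =", round(a, 4), "b =", round(b, 4))
print("sum of deviations =", round(residual_sum, 10))
print("squared error: best", round(best, 4),
      "shifted", round(shifted, 4), "tilted", round(tilted, 4))
```

The two comparison lines are arbitrary choices; any line other than the fitted one would give a larger total.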

2.4 METHODS OF REGRESSION ANALYSIS

We can study regression by the following methods :

1. Graphic method (regression lines)

2. Algebraic method (regression equations)

We shall understand these methods.

2.4.1 Graphic Method

When we apply this method different points are plotted on a graph paper representing different

pairs of variables. These points give a picture of a scatter diagram with several points spread over.

A regression line may be drawn between these points either by free hand or by a scale in such a way

that the sum of the squares of the vertical or horizontal distances between the points and the line of regression

is minimum. It should be drawn in such a manner that the line leaves equal number of points on

both sides. However, to ensure this is rather difficult and the method only renders a rough estimate

which can not be completely free from subjectivity of person drawing it. Such a line can be a

straight line or a curved line depending upon the scatter of points and relationship to be established.

A non-linear free hand curve will have more element of subjectivity and a straight line is generally

drawn. Let us understand it with the help of an example :


Example 1 :

Height of fathers Height of sons

(Inches) (Inches)

65 68

63 66

67 68

64 65

68 69

62 66

70 68

66 65

68 71

67 67

69 68

71 70

Solution : The diagram given below shows the height of fathers on x-axis and the height of sons on

y-axis. The line of regression called the regression of y on x is drawn between the scatter dots.

Fig. 1

Another line of regression called the regression line of x on y is drawn amongst the same set of

scatter dots in such a way that the sum of the squares of the horizontal distances between the dots and the line is minimised.

Fig. 2


Fig. 3

It is clear that the position of the regression line of x on y is not exactly like that of the regression

line of y on x. In the following figure both the regression of y on x and x on y are exhibited.

Fig. 4

When there is either perfect positive or perfect negative correlation between the two variables,

the two regression lines will coincide and we will have only one line. The farther the two regression

lines from each other, the lesser is the degree of correlation and vice-versa. If the variables are

independent, correlation is zero and the lines of regression will be at right angles. It should be noted

that the regression lines cut each other at the point of average of x and y, i.e., if from the point where

both the regression lines cut each other a perpendicular is drawn on the x-axis, we will get the mean

value of x series and if from that point a horizontal line is drawn on the y-axis we will get the mean

of y series.
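The statement that the two regression lines cut each other at the point of averages can be verified for the father and son heights of Example 1. The Python sketch below is a minimal illustration; the function names `regression_lines` and `intersection` are hypothetical and not part of the lesson.

```python
def regression_lines(xs, ys):
    """Return ((a1, byx), (a2, bxy)) for  y = a1 + byx*x  and  x = a2 + bxy*y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    byx = sxy / sxx          # slope of Y on X
    bxy = sxy / syy          # slope of X on Y
    return (mean_y - byx * mean_x, byx), (mean_x - bxy * mean_y, bxy)


def intersection(y_on_x, x_on_y):
    """Point where the two lines meet (undefined when correlation is perfect)."""
    a1, byx = y_on_x
    a2, bxy = x_on_y
    # Substitute y = a1 + byx*x into x = a2 + bxy*y and solve for x.
    x = (a2 + bxy * a1) / (1 - bxy * byx)
    y = a1 + byx * x
    return x, y


# Heights from Example 1 (inches).
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

y_on_x, x_on_y = regression_lines(fathers, sons)
px, py = intersection(y_on_x, x_on_y)

mean_fathers = sum(fathers) / len(fathers)
mean_sons = sum(sons) / len(sons)

print("intersection:", round(px, 3), round(py, 3))
print("means       :", round(mean_fathers, 3), round(mean_sons, 3))
```

When the correlation is perfect the product of the two slopes equals 1, the denominator becomes zero and the two lines coincide, so no single point of intersection exists.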

2.4.2 Algebraic Method

The algebraic method for simple linear regression can be understood by two methods:

(i) Regression Equations

(ii) Regression Coefficients

Regression Equations : These equations are known as estimating equations. Regression

equations are algebraic expressions of the regression lines. As there are two regression lines, there

are two regression equations :

(i) X on Y is used to describe the variations in the values of X for given changes in Y.


(ii) Y on X is used to describe the variations in the values of Y for given changes in X.

The regression equation of Y on X is expressed as

Yc = a + bX

The regression equation of X on Y is expressed as

Xc = a + bY

In these equations a and b are constants which determine the position of the line completely.

These constants are called the parameters of the line. If the value of any of these parameters is

changed, another line is determined.

Parameter a refers to the intercept of the line and b to the slope of the line. The symbols Yc and

Xc refer to the computed values of Y and X respectively, each computed on the basis of the independent

variable of its own equation. If the values of both the parameters are obtained, the line is completely

determined. The values of these two parameters a and b can be obtained by the method of least

squares. With a little algebra and differential calculus it can be shown that the following two equations,

when solved simultaneously, will give values of the parameters a and b such that the least squares

requirement is fulfilled;

For regression equation Yc = a + bX

$$\sum y = Na + b\sum x$$

$$\sum xy = a\sum x + b\sum x^2$$

For regression equation Xc = a + bY

$$\sum x = Na + b\sum y$$

$$\sum xy = a\sum y + b\sum y^2$$

These equations are usually called the normal equations. In the equations $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$

indicate totals which are computed from the observed pairs of values of two variables x and y to

which the least squares estimating line is to be fitted and N is the number of observed pairs of

values. Let us understand by an example.

Example 2: From the following data obtain the two regression equations :

x : 6 2 10 4 8

y : 9 11 5 8 7

Solution :

Computation of Regression Equations

x      y      xy     x²     y²

6      9      54     36     81

2      11     22     4      121

10     5      50     100    25

4      8      32     16     64

8      7      56     64     49

$\sum x = 30$   $\sum y = 40$   $\sum xy = 214$   $\sum x^2 = 220$   $\sum y^2 = 340$


Regression line of Y on X is expressed by the equation of the form

Yc = a + bX

To determine the values of a and b, the following two normal equations are solved

$$\sum y = Na + b\sum x$$

$$\sum xy = a\sum x + b\sum x^2$$

Substituting the values, we get

40 = 5a + 30b ...(i)

214 = 30a + 220b ...(ii)

Multiplying equation (i) by 6, we get

240 = 30a + 180b ...(iii)

214 = 30a + 220b ...(iv)

Deduct equation (iv) from (iii)

– 40b = + 26

b = – 0.65

Substitute the value of b in equation (i)

40 = 5a + 30 (– 0.65)

5a = 40 + 19.5 or a = 11.9

Substitute the values of a and b in the equation

Regression line of Y on X is

Yc = 11.9 – 0.65X

Regression line of X on Y is

Xc = a + bY

The corresponding normal equations are

$$\sum x = Na + b\sum y$$

$$\sum xy = a\sum y + b\sum y^2$$

Substituting the values

30 = 5a + 40b ...(i)

214 = 40a + 340b ...(ii)

Multiply equation (i) by 8

240 = 40a + 320b ...(iii)

214 = 40a + 340b ...(iv)

Deduct equation (iv) from (iii)

–20b = 26 or b = – 1.3

Substitute the value of b in equation (i)

30 = 5a + 40(– 1.3)

5a = 30 + 52 or a = 16.4


Substitute the values of a and b in the equation. Regression line of X on Y is

Xc = 16.4 – 1.3Y
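The elimination carried out by hand in Example 2 can also be done in one step. The sketch below is an illustration only: `solve_normal_equations` is a hypothetical helper that solves the pair of normal equations by Cramer's rule, given the totals from the computation table. The argument names `sum_u` (independent variable) and `sum_v` (dependent variable) are assumptions introduced here so that the same function serves both regression equations.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uv, sum_uu):
    """Solve   sum_v  = n*a     + b*sum_u
               sum_uv = a*sum_u + b*sum_uu
    for (a, b), where u is the independent and v the dependent variable."""
    det = n * sum_uu - sum_u ** 2
    a = (sum_v * sum_uu - sum_u * sum_uv) / det
    b = (n * sum_uv - sum_u * sum_v) / det
    return a, b


# Totals from the computation table of Example 2.
n = 5
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 30, 40, 214, 220, 340

# Y on X: X is the independent variable.
a_yx, b_yx = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)

# X on Y: Y is the independent variable.
a_xy, b_xy = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)

print("Yc =", round(a_yx, 2), "+ (", round(b_yx, 2), ") X")
print("Xc =", round(a_xy, 2), "+ (", round(b_xy, 2), ") Y")
```

The results agree with the equations Yc = 11.9 – 0.65X and Xc = 16.4 – 1.3Y obtained above.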

Regression Coefficients : In the regression equation b is the regression coefficient which

indicates the degree and direction of change in the dependent variable with respect to a change in

the independent variable. In the two regression equations:

Xc = a + bY

Yc = a + bX

where bxy (b) and byx (b) are known as the regression coefficients of the two equations. These

coefficients can be obtained independently without using simultaneous normal equations with these

formulae:

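Regression coefficient of X on Y is

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

$$b_{xy} = \frac{\sum xy}{N\sigma_x\sigma_y} \times \frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{N\sigma_y^2}$$

$$b_{xy} = \frac{\sum xy}{\sum y^2} \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$

Regression coefficient of Y on X is

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

$$b_{yx} = \frac{\sum xy}{N\sigma_x\sigma_y} \times \frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{N\sigma_x^2}$$

$$b_{yx} = \frac{\sum xy}{\sum x^2} \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$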

Example 3: Calculate the regression coefficients from data given below :

Series x Series y

Average 25 22

Standard deviation 4 5 r = 0.8

Solution : The coefficient of regression of x on y is

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{4}{5} = 0.64$$

The coefficient of regression of y on x is

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{5}{4} = 1.00$$
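The arithmetic of Example 3 can be reproduced with a few lines of Python. This is a minimal sketch; the function name `regression_coefficients` is hypothetical. It also recovers r from the two coefficients as a check.

```python
import math


def regression_coefficients(r, sd_x, sd_y):
    """Return (bxy, byx) from the correlation coefficient and the two
    standard deviations: bxy = r*sd_x/sd_y and byx = r*sd_y/sd_x."""
    bxy = r * sd_x / sd_y
    byx = r * sd_y / sd_x
    return bxy, byx


# Values given in Example 3.
bxy, byx = regression_coefficients(r=0.8, sd_x=4, sd_y=5)

# r is the geometric mean of the two coefficients and carries their sign.
r_recovered = math.copysign(math.sqrt(bxy * byx), bxy)

print("bxy =", round(bxy, 2))
print("byx =", round(byx, 2))
print("r   =", round(r_recovered, 2))
```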


2.5 PROPERTIES OF REGRESSION COEFFICIENTS

(i) The coefficient of correlation is the geometric mean of the two regression coefficients,

$r = \pm\sqrt{b_{xy} \times b_{yx}}$.

(ii) Both the regression coefficients are either positive or negative. It means that they always

have identical sign i.e., either both have positive sign or negative sign.

(iii) The coefficient of correlation and the regression coefficients will also have same sign.

(iv) If one of the regression coefficients is more than unity, the other must be less than unity

because the value of coefficient of correlation cannot exceed one in magnitude ($|r| \le 1$).

(v) Regression coefficients are independent of the change in the origin but not of the scale.

(vi) The arithmetic mean of the two regression coefficients is never smaller in magnitude than the correlation coefficient, i.e., $\tfrac{1}{2}\,|b_{xy} + b_{yx}| \ge |r|$.
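These properties can be checked on any set of paired data. The Python sketch below is illustrative: the helper `coefficients` is hypothetical, the data are those of Example 2, and the shift of 100 and –50 and the scale factor of 10 are arbitrary choices made only for the demonstration.

```python
import math


def coefficients(xs, ys):
    """Return (bxy, byx) computed from deviations about the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / syy, sxy / sxx


xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
bxy, byx = coefficients(xs, ys)

# Properties (i) to (iii): r is the geometric mean and shares the common sign.
same_sign = bxy * byx > 0
r = math.copysign(math.sqrt(bxy * byx), byx)

# Property (vi): the arithmetic mean is at least |r| in magnitude.
arithmetic_mean = abs(bxy + byx) / 2

# Property (v): a change of origin leaves both coefficients unchanged ...
bxy_shift, byx_shift = coefficients([x + 100 for x in xs], [y - 50 for y in ys])

# ... but a change of scale does not: multiplying X by 10 multiplies bxy
# by 10 and divides byx by 10.
bxy_scale, byx_scale = coefficients([10 * x for x in xs], ys)

print("bxy, byx, r:", round(bxy, 3), round(byx, 3), round(r, 3))
print("after change of origin:", round(bxy_shift, 3), round(byx_shift, 3))
print("after X multiplied by 10:", round(bxy_scale, 3), round(byx_scale, 3))
```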

We can compute the regression equations with the help of regression coefficients by the

following equations:

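1. Regression equation X on Y

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

where $\bar{X}$ is the mean of X series

$\bar{Y}$ is the mean of Y series

$r\,\frac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y

2. Regression equation Y on X

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

We can explain this by taking an example :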

Example 4 : Calculate the following from the below given data :

(a) the two regression equations,

(b) the coefficient of correlation and

(c) the most likely marks in Statistics when the marks in Economics are 30.

Marks in Economics : 25 28 35 32 31 36 29 38 34 32

Marks in Statistics : 43 46 49 41 36 32 31 30 33 39

Solution :

Calculation of Regression Equations and Correlation Coefficient

Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Marks in                      Marks in
Eco (X)      x       x²       Stats (Y)     y       y²      xy

25          –7       49       43           +5       25     – 35

28          –4       16       46           +8       64     – 32

35          +3        9       49          + 11     121     + 33

32           0        0       41           +3        9        0

31          –1        1       36           –2        4      + 2

36          +4       16       32           –6       36     – 24

29          –3        9       31           –7       49     + 21

38          +6       36       30           –8       64     – 48

34          +2        4       33           –5       25     – 10

32           0        0       39           +1        1        0

$\sum X = 320$   $\sum x = 0$   $\sum x^2 = 140$   $\sum Y = 380$   $\sum y = 0$   $\sum y^2 = 398$   $\sum xy = -93$

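(a) Regression equation X on Y

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{-93}{398} = -0.234$$

$$\bar{X} = \frac{\sum X}{N} = \frac{320}{10} = 32 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{N} = \frac{380}{10} = 38$$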

Substituting the values

X – 32 = – 0.234 (Y – 38)

X – 32 = – 0.234Y + 8.892

or X = 40.892 – 0.234Y

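Regression equation Y on X

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{-93}{140} = -0.664$$

Y – 38 = – 0.664 (X – 32)

= – 0.664X + 21.248

or Y = 59.248 – 0.664X

(b) Since both the regression coefficients are negative, value of r must also be negative.

$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.234 \times 0.664} = -0.394$$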

(c) Likely marks in statistics when marks in Economics are 30.

Y = – 0.664 X + 59.248 where X = 30

Y = (– 0.664 × 30) + 59.248 = 39.328 or 39.
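The whole of Example 4 can be reproduced with the deviation formulas. The Python sketch below is an illustration under the same data; the variable names are hypothetical. Because it does not round the coefficients before substituting, the last decimal places differ slightly from the hand calculation.

```python
import math

marks_eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]     # X
marks_stats = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # Y

n = len(marks_eco)
mean_x = sum(marks_eco) / n
mean_y = sum(marks_stats) / n

# Deviations from the actual means.
dev_x = [x - mean_x for x in marks_eco]
dev_y = [y - mean_y for y in marks_stats]

sum_xy = sum(u * v for u, v in zip(dev_x, dev_y))
sum_x2 = sum(u * u for u in dev_x)
sum_y2 = sum(v * v for v in dev_y)

bxy = sum_xy / sum_y2          # regression coefficient of X on Y
byx = sum_xy / sum_x2          # regression coefficient of Y on X

# r is the geometric mean of the coefficients, with their common sign.
r = math.copysign(math.sqrt(bxy * byx), byx)

# (c) Most likely marks in Statistics when marks in Economics are 30.
predicted_stats = mean_y + byx * (30 - mean_x)

print("bxy =", round(bxy, 3), " byx =", round(byx, 3), " r =", round(r, 3))
print("predicted marks in Statistics:", round(predicted_stats, 2))
```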

Example 5 : The following scores were worked out from a test in Mathematics and English in an

annual examination.


Scores in Mathematics (x) English (y)

Mean 39.5 47.5

Standard deviation 10.8 16.8 r = + 0.42

Find both the regression equations. Using these regression equations, estimate the value of Y for

X = 50 and the value of X for Y = 30.

Solution : Regression of X on Y

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

By substituting values, we get

$$X - 39.5 = 0.42 \times \frac{10.8}{16.8}\,(Y - 47.5)$$

= 0.27 (Y – 47.5) = 0.27 Y – 12.82

or X = 0.27Y – 12.82 + 39.5 = 0.27Y + 26.68

when Y = 30

Value of X = (0.27 × 30 + 26.68) = 34.78

Regression equation of Y on X

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

$$Y - 47.5 = 0.42 \times \frac{16.8}{10.8}\,(X - 39.5)$$

Y – 47.5 = 0.653 (X – 39.5) = 0.653 X – 25.79

or Y = 0.653 X – 25.79 + 47.5 = 0.653X + 21.71

When X = 50

Value of Y = (0.653 × 50 + 21.71) = 32.65 + 21.71 = 54.36

Thus the regression equations are :

Xc = 0.27Y + 26.68

Yc = 0.653X + 21.71

Value of X when Y = 30 is 34.78

Value of Y when X = 50 is 54.36
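When only the means, standard deviations and r are known, as in Example 5, each regression line follows directly from its slope and the point of means. The sketch below illustrates this; `line_from_summary` is a hypothetical helper, and its results are not rounded at intermediate steps, so the intercepts differ from the hand calculation in the second or third decimal place.

```python
def line_from_summary(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (intercept, slope) of the regression line of the dependent
    variable on the independent variable, from summary measures only."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return intercept, slope


# Summary measures from Example 5: Mathematics is X, English is Y.
mean_x, sd_x = 39.5, 10.8
mean_y, sd_y = 47.5, 16.8
r = 0.42

a_xy, b_xy = line_from_summary(mean_x, mean_y, sd_x, sd_y, r)   # X on Y
a_yx, b_yx = line_from_summary(mean_y, mean_x, sd_y, sd_x, r)   # Y on X

x_when_y_30 = a_xy + b_xy * 30
y_when_x_50 = a_yx + b_yx * 50

print("Xc =", round(b_xy, 3), "Y +", round(a_xy, 2))
print("Yc =", round(b_yx, 3), "X +", round(a_yx, 2))
print("X for Y = 30:", round(x_when_y_30, 2))
print("Y for X = 50:", round(y_when_x_50, 2))
```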

When the actual means of the variables X and Y come out to be fractions, taking deviations from the

actual means makes the calculation tedious and it is advisable to take deviations from assumed means. Thus

when deviations are taken from assumed means, the values of bxy and byx are given by

$$b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}$$

where dx = (X – A) and dy = (Y – A), A being the assumed mean of the respective series.

The regression equation is :

$$(X - \bar{X}) = b_{xy}\,(Y - \bar{Y})$$

Similarly the regression equation of Y on X is

$$(Y - \bar{Y}) = b_{yx}\,(X - \bar{X})$$

$$b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}$$

Let us try to understand with the help of an example :

Example 6 : You are given the data relating to purchases and sales. Compute the two regression

equations by method of least squares and estimate the likely sales when the purchases are 100.

Purchases : 62 72 98 76 81 56 76 92 88 49

Sales : 112 124 131 117 132 96 120 136 97 85

Solution :

Calculations of Regression Equations

Here dx = (X – 76) and dy = (Y – 120).

Purchases                     Sales
X          dx       dx²       Y         dy       dy²      dxdy

62        – 14      196       112       – 8       64     + 112

72         – 4       16       124       + 4       16      – 16

98        + 22      484       131      + 11      121     + 242

76           0        0       117       – 3        9         0

81         + 5       25       132      + 12      144      + 60

56        – 20      400        96      – 24      576     + 480

76           0        0       120         0        0         0

92        + 16      256       136      + 16      256     + 256

88        + 12      144        97      – 23      529     – 276

49        – 27      729        85      – 35     1225     + 945

$\sum dx = -10$   $\sum dx^2 = 2250$   $\sum dy = -50$   $\sum dy^2 = 2940$   $\sum dx\,dy = 1803$

$$\bar{X} = A + \frac{\sum dx}{N} = 76 + \frac{-10}{10} = 75 \quad \text{and} \quad \bar{Y} = A + \frac{\sum dy}{N} = 120 + \frac{-50}{10} = 115$$

N 10 N 10


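Regression Coefficients : X on Y

$$b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}} = \frac{1803 - \dfrac{(-10)(-50)}{10}}{2940 - \dfrac{(-50)^2}{10}} = \frac{1753}{2690} = 0.652$$

Y on X

$$b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}} = \frac{1803 - \dfrac{(-10)(-50)}{10}}{2250 - \dfrac{(-10)^2}{10}} = \frac{1753}{2240} = 0.78$$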

Regression equation : X on Y

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

Substituting the values

X – 75 = 0.652 (Y – 115) = 0.652Y – 74.98

or X = 0.652Y + 0.02

Regression equation : Y on X

$$(Y - \bar{Y}) = b_{yx}\,(X - \bar{X})$$

Y – 115 = 0.78 (X – 75) = 0.78 X – 58.5

Y = 0.78 X + 56.5

when X = 100

Y = 0.78 × 100 + 56.5= 134.5
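The assumed mean formulas of Example 6 can be sketched in Python as follows. The function name `coefficients_assumed_mean` is hypothetical. The second call uses assumed means of zero, an arbitrary choice made here to show that the result does not depend on which assumed means are taken.

```python
def coefficients_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Return (bxy, byx, mean_x, mean_y) using deviations from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)

    numerator = sum_dxdy - sum_dx * sum_dy / n
    bxy = numerator / (sum_dy2 - sum_dy ** 2 / n)
    byx = numerator / (sum_dx2 - sum_dx ** 2 / n)
    mean_x = assumed_x + sum_dx / n
    mean_y = assumed_y + sum_dy / n
    return bxy, byx, mean_x, mean_y


purchases = [62, 72, 98, 76, 81, 56, 76, 92, 88, 49]
sales = [112, 124, 131, 117, 132, 96, 120, 136, 97, 85]

bxy, byx, mean_x, mean_y = coefficients_assumed_mean(purchases, sales, 76, 120)

# A different pair of assumed means gives the same coefficients.
bxy_alt, byx_alt, _, _ = coefficients_assumed_mean(purchases, sales, 0, 0)

# Likely sales when purchases are 100, from the equation of Y on X.
sales_at_100 = mean_y + byx * (100 - mean_x)

print("bxy =", round(bxy, 3), " byx =", round(byx, 3))
print("means:", mean_x, mean_y)
print("estimated sales for purchases of 100:", round(sales_at_100, 2))
```

The estimate printed here is a little above the 134.5 obtained by hand, because the hand calculation rounds byx to 0.78 before substituting.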

2.6 STANDARD ERROR OF AN ESTIMATE

Standard error of an estimate is the measure of the spread of observed values from estimated ones,

expressed by regression line or equation. The concept of standard error of an estimate is analogous to

the standard deviation which measures the variation or scatter of individual items about the arithmetic

mean. Therefore, like the standard deviation which is the square root of the average of squared deviations about the

arithmetic mean, the standard error of an estimate is the square root of the average of the squared deviations between

the actual or the observed values and the estimated values based on the regression equation. It can

also be expressed as the root of the measure of unexplained variations divided by N – 2:

$$S_{yx} = \sqrt{\frac{\text{Unexplained variation}}{N - 2}} = \sqrt{\frac{\sum (Y - Y_c)^2}{N - 2}}$$

and

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N - 2}}$$

where Syx refers to standard error of estimate of Y values on X values.

Sxy refers to standard error of estimate of X values on Y values.


Yc and Xc are the estimated values of Y and X variables by means of their regression equations

respectively. N – 2 is used for getting an unbiased estimate of standard error. The usual explanation

given for this division by N – 2 is that the two constants a and b were calculated on the basis of

original data and we lose two degrees of freedom. Degrees of freedom means the number of classes

to which values can be assigned at will without violating any restrictions.

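However a simpler method of computing Syx and Sxy is to use the following formulae, in which a and b are the constants of the respective regression equation :

$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N - 2}}$$

and

$$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N - 2}}$$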

The standard error of estimate measures the accuracy of the estimated figures. The smaller the

values of standard error of estimate, the closer will be the dots to the regression line and the better

the estimates based on the equation for this line. If standard error of estimate is zero, then there is no

variation about the line and the correlation will be perfect. Thus with the help of standard error of

estimate it is possible for us to ascertain how good and representative the regression line is as a

description of the average relationship between two series.

Example 7 : Given the following data :

X: 6 2 10 4 8

Y: 9 11 5 8 7

And two regression equations Y = 11.9 – 0.65 X and X = 16.4 – 1.3 Y. Calculate the standard

error of estimate, i.e., Syx and Sxy.

Solution :

We can calculate Xc and Yc values from these regression equations.

X      Y      Yc      Xc      (Y – Yc)²     (X – Xc)²

6      9      8.0     4.7     1.00          1.69

2      11     10.6    2.1     0.16          0.01

10     5      5.4     9.9     0.16          0.01

4      8      9.3     6.0     1.69          4.00

8      7      6.7     7.3     0.09          0.49

Total                         3.10          6.20

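Thus we can calculate Syx and Sxy from the above calculated values.

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N - 2}} = \sqrt{\frac{3.1}{5 - 2}} = \sqrt{1.03} = 1.01$$

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N - 2}} = \sqrt{\frac{6.2}{5 - 2}} = \sqrt{2.07} = 1.44$$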
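Both ways of computing the standard error of estimate, from the deviations and from the simpler formula based on totals, can be compared in a short Python sketch. The helper names `fit`, `std_error_direct` and `std_error_shortcut` are hypothetical; the data are those of Example 7.

```python
import math


def fit(us, vs):
    """Return (a, b) of the least squares line v = a + b*u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    b = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
         / sum((u - mean_u) ** 2 for u in us))
    return mean_v - b * mean_u, b


def std_error_direct(us, vs, a, b):
    """Root of the sum of squared deviations from the line, divided by N - 2."""
    unexplained = sum((v - (a + b * u)) ** 2 for u, v in zip(us, vs))
    return math.sqrt(unexplained / (len(us) - 2))


def std_error_shortcut(us, vs, a, b):
    """Same quantity from totals: (sum v^2 - a*sum v - b*sum uv) / (N - 2)."""
    sum_v = sum(vs)
    sum_v2 = sum(v * v for v in vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    return math.sqrt((sum_v2 - a * sum_v - b * sum_uv) / (len(us) - 2))


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit(X, Y)      # Y on X
a_xy, b_xy = fit(Y, X)      # X on Y

s_yx = std_error_direct(X, Y, a_yx, b_yx)
s_xy = std_error_direct(Y, X, a_xy, b_xy)
s_yx_short = std_error_shortcut(X, Y, a_yx, b_yx)
s_xy_short = std_error_shortcut(Y, X, a_xy, b_xy)

print("Syx:", round(s_yx, 4), "(shortcut", round(s_yx_short, 4), ")")
print("Sxy:", round(s_xy, 4), "(shortcut", round(s_xy_short, 4), ")")
```

Without intermediate rounding Syx comes to about 1.017; the figure of 1.01 in the worked solution arises from rounding 3.1/3 to 1.03 before taking the root.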


2.7 SUMMARY

Regression analysis deals with estimating values of one variable based on the values of one or

more other variables.

The variable being estimated is called dependent variable while the variable/s used to make

estimates is/are called independent variable/s.

The simple regression analysis involves one independent variable and one dependent variable.

It is based on the assumption of linear relationship between the two variables.

The relationship between variables is presented by means of a regression equation which is

obtained using the principle of least squares.

For a given set of data involving two variables, X and Y, we can derive two regression equations:

one treating Y as the dependent variable and the other treating X as the dependent variable.

When correlation between two variables is perfect, the two regression equations are reversible

because they both actually represent the same line.

The closer the two regression lines to each other, the higher is the degree of correlation.

The sign of the two regression coefficients is always the same as the sign of the coefficient of

correlation.

Standard error of estimate measures the variation around the regression line. A small value of

the standard error implies that the data cluster around the regression line.

2.8 SELF ASSESSMENT QUESTIONS

Exercise 1 : True or False statements :

(i) Regression is a tool for making estimates of an independent variable for a given value of

a dependent variable.

(ii) In the regression equation Yc = a + bX, the variable Y is the independent variable.

(iii) In the regression equation Yc = a + bX, a and b are estimates of the population intercept

and slope respectively.

(iv) The least squares principle ensures that sum of deviations from regression line is equal to

zero and $\sum (Y - \bar{Y})^2$ is the minimum.

(v) In a regression equation, both a and b must bear the same sign.

(vi) In the case of negative correlation between the variables, one regression line is positively

sloped while the other is negatively sloped.

(vii) The sum of two regression coefficients is always equal to 1.

(viii) If the two regression equations are solved simultaneously, the X and Y values are

respectively the values of $\bar{X}$ and $\bar{Y}$.

(ix) In the case of perfect correlation between two variables, both the regression coefficients

are equal.

(x) It is feasible to have byx = 5.4 and bxy = 0.15 for a given set of data.

(xi) The two regression coefficients byx and bxy cannot both be smaller than 1.

(xii) The difference between Y and Yc is called error.


(xiii) The standard error of estimate can never be equal to zero.

(xiv) The coefficient of correlation is equal to geometric mean of the two regression coefficients.

(xv) The regression coefficients are independent of the change of scale, but they are not

independent of the change of origin.

Ans. 1. F, 2. F, 3. T, 4. F, 5. F, 6. F, 7. F, 8. T, 9. F, 10. T, 11. F, 12. T, 13. F, 14. T, 15. F

Exercise 2 : Questions and Answers

(i) What do you understand by regression? What role does it play in business and economic

analysis?

(ii) Explain in your own words as to why there are two regression lines in the case of paired

values of two variables. At what point do the two regression lines intersect? If the two

regression lines coincide, what does it imply?

(iii) Explain the properties of the regression coefficients. Do you agree that for a given set of

data if each of the X values, is multiplied by 5, then the regression coefficient of Y on X

would also be multiplied by 5 while the regression coefficient of X on Y will be reduced

to 1/5th of its original value? Explain.

(iv) Explain the properties of regression coefficients. What is the difference between Regression

and Correlation Analysis?

(v) Given the following data:

X : 7 9 7 12 12 11 14 16

Y : 6 12 12 14 14 16 18 20

(a) Fit the regression equation of Y on X.

(b) Estimate the value of Y for X = 15.

(vi) Given the following information: $\sum X = 56$; $\sum Y = 40$; $\sum X^2 = 524$; $\sum Y^2 = 256$; $\sum XY = 364$;

and n = 8. Obtain the regression equation of X on Y.

(vii) In the estimation of the regression equation of two variables X and Y, the following results

were obtained:

$\sum X = 90$; $\sum Y = 70$; $\sum X^2 = 6{,}360$; $\sum Y^2 = 2{,}860$; $\sum XY = 3{,}900$; and n = 10

Obtain the two regression equations.

(viii) The following data relate to 50 workers of a factory in respect of their experience (X) in

months and time needed (Y) in minutes to fit an apparatus.

Mean of X = 50

Mean of Y = 60

Standard deviation of X = 20

Standard deviation of Y = 20

Covariance (XY) = –100

Calculate the two regression coefficients and the coefficient of determination.


(ix) Using the following data,

(a) Obtain the two regression equations.

(b) Find the likely sales when advertising expenditure is Rs. 25 crores.

(c) Estimate the advertising budget to achieve the sales target of Rs. 150 crores ?

                              Advertising expenditure    Sales
                              (Rs. crores)               (Rs. crores)
Mean                          20                         120
Standard deviation            5                          25
Coefficient of correlation    0.8

(x) The following data about the sales and advertisement expenditure of a firm are given :

                              Sales (X)                  Advertisement expenditure
                              (in crores of Rs.)         (in lakhs of Rs.)
Mean                          40                         60
Standard deviation            10                         15
Coefficient of correlation    0.9

(a) Estimate the likely sales for a proposed advertisement expenditure of Rs. 100 lakh.

(b) What should the advertisement expenditure be if the firm proposes a sale target of Rs.

60 crores?

(xi) The HR manager of Anomaly International wants to study the relationship between the number of years’ experience and the performance scores of the employees. An analysis of five employees shows the following results :

No. of years’ experience (X) : 6 2 10 4 8

Performance score (Y) : 19 11 25 18 17

(a) Fit a regression equation of Y on X and interpret it.

(b) Calculate the likely performance score if experience is five years.

(c) Calculate the standard error of estimate and comment on the reliability of the

estimating equation.

(xii) While making calculations about regression equations, a student obtained the following

results :

$n = 25$, $\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$ and $\sum XY = 508$

It was discovered later, however, that two pairs of values were wrongly recorded as

X    Y                                       X    Y
6    14    while the correct values were :   8    12
8    6                                       6    8

Obtain the two regression equations.


(xiii) The equations of two lines of regression between variables X and Y (not necessarily in

that order) are 2X + 3Y – 8 = 0 and X + 2Y – 5 = 0. The variance of X is 4. Find

(a) Variance of Y.

(b) Coefficient of Determination of X and Y

(c) Standard error of estimate of X on Y, and standard error of estimate of Y on X.

(xiv) Given the following data :

Age Salary (X) Total

(Years) 250–300 300–350 350–400 400–450 450–500

20–30 5 5

30–40 2 3 2 7

40–50 1 6 3 10

50–60 1 2 1 4 8

Total 7 5 10 4 4 30

Ans. (v) Y = 0.861 + 1.194X, 18.78; (vi) X = –0.5 + 1.5Y; (vii) Y = 1.70 + 0.589X and X = –0.66 + 1.38Y; (viii) byx = bxy = –0.25, r² = 0.0625; (ix) (a) Y = 40 + 4X, X = 0.8 + 0.16Y (b) Rs. 140 crores (c) Rs. 24.8 crores; (x) (a) Rs. 64 crores (b) Rs. 87 lakhs; (xi) (a) Y = 9.9 + 1.35X (b) 16.65 (c) 3.006; (xii) Y = 0.8X and X = 2.778 + 0.556Y; (xiii) (a) 1.333 (b) 0.75 (c) SEyx = 0.5774, SExy = 1; (xiv) byx = 5.143, bxy = 0.125, Y = 146.86 + 5.143X, X = –3.07 + 0.125Y


UNIT-IV : INDEX NUMBERS

LESSON 1 : INDEX NUMBERS

1. STRUCTURE

1.0 Objective

1.1 Introduction

1.2 Features of Index Numbers

1.3 Problems of Index Numbers

1.3.1 The purpose of Index Numbers

1.3.2 Selection of Items

1.3.3 Price Quotations

1.3.4 Selection of the Base Period

1.3.5 The Choice of an Average

1.3.6 Selection of Appropriate Weights

1.4 Methods of Constructing Index Numbers

1.4.1 Unweighted Index Numbers

1.4.2 Weighted Index Numbers

1.5 Tests of Adequacy

1.6 Chain Base Index

1.7 Splicing

1.8 Consumer Price Index

1.9 Index Number of Industrial Production

1.10 Limitations of Index Numbers

1.11 Construction of BSE Sensex and NSE NIFTY

1.12 Summary

1.13 Self Assessment Questions

1.0 OBJECTIVE

After studying this lesson you should be able to :

(i) Understand the meaning and uses of index numbers

(ii) Identify various problems faced in the construction of index numbers

(iii) Learn different methods of constructing index numbers, including BSE Sensex and NSE NIFTY

(iv) Appreciate different tests of consistency of index numbers

(v) Learn the consumer price index and its computation

(vi) Learn the process of base shifting, splicing and deflating of index numbers

1.1 INTRODUCTION

Economic activities tend to change constantly. Prices of commodities, which are the combined result of a number of economic activities, also tend to fluctuate. The problem of changing prices is very important, but it is not simple to study it and draw conclusions, because the prices of different commodities change by different degrees. Hence there is a great need for a device which can smooth out the irregularities in prices so that a conclusion can be reached. This need is satisfied by index numbers, which make use of percentages and averages to achieve the desired objective. An index number is a device for comparing the general level of the magnitude of a group of distinct but related variables in two or more situations. Index numbers are used to feel the pulse of the economy, and they reveal inflationary or deflationary tendencies. Index numbers are described as barometers of economic activity because anyone who wants an idea of what is happening in an economy should check the important indices, such as the index numbers of industrial production, agricultural production and business activity.

The various definitions of Index Numbers are discussed under three heads :

(i) Measure of change

(ii) Device to measure change

(iii) A series representing the process of change.

According to Maslow, an index number is a numerical value characterising the change in a complex economic phenomenon over a period of time.

Spiegel explains that an index number is a statistical measure designed to show changes in a variable or a group of related variables with respect to time, geographical location or other characteristics.

Gregory and Ward describe it as a measure over time designed to show average change in the price, quantity or value of a group of items.

Croxton and Cowden say that index numbers are devices for measuring differences in the magnitude of a group of related variables.

A.L. Bowley describes index numbers as a series which reflects in its trend and fluctuations the movements of some quantity to which it is related.

Blair describes index numbers as a specialised kind of average.

1.2 FEATURES OF INDEX NUMBERS

Index Numbers have the following features :

(i) Index numbers are specialised averages which are capable of being expressed in percentage.

(ii) Index numbers measure the changes in the level of a given phenomenon.

(iii) Index numbers measure the effect of changes over a period of time.

Index numbers are indispensable tools of economic and business analysis. Their significance can be appreciated from the following points:

1. Index numbers help in measuring relative changes in a set of items.

2. Index numbers provide a good basis of comparison because they are expressed in abstract units distinct from the units of the elements.

3. Index numbers help in framing suitable policies for business and economic activities.

4. Index numbers help in measuring the general trend of the phenomenon.

5. Index numbers are used in deflating. They are used to adjust the original data for price changes

or to adjust wages for cost of living changes.


6. The utility of index numbers has increased a great deal because of the method of splicing

whereby the index prepared on any one base can be adjusted with reference to any other base.

7. As a measure of average change in a group of elements, index numbers can be used for forecasting future events. Whereas a trend line gives the average rate of change in a single phenomenon, an index number indicates the trend for a group of commodities.

8. Index numbers are helpful in studying the comparative purchasing power of money in different countries of the world.

9. Index numbers of business activities throw light on the economic progress made by various

countries.

1.3 PROBLEMS OF INDEX NUMBERS

While constructing index numbers, the following problems arise:

1.3.1 The Purpose of Index Numbers

Before constructing an index number, it is necessary to define precisely the purpose for which it is to be constructed. A single index cannot fulfil all purposes. Index numbers are specialised tools which are more efficient and useful when properly used. If the purpose is not clear, the data used may be unsuitable and the indices obtained may be misleading. For example, if it is desired to construct a Cost of Living Index Number for the labour class, then only those items which are required by the labour class will be included.

1.3.2 Selection of Items

The list of commodities included in an index number is called the ‘Regimen’. Because it may not be possible to include all items, it becomes necessary to decide which items are to be included. Only those items should be selected which are representative of the data; e.g., in a Consumer Price Index for the working class, items like scooters, cars, refrigerators and cosmetics find no place. There is no hard and fast rule regarding the number of commodities to be included while constructing index numbers. The number should be large enough to permit the influence of the inertia of large numbers, but not so large as to make the work of computation uneconomical or difficult. The number of commodities should therefore be reasonable. The following points should be considered while selecting the items to be included in the index :

(i) The items should be representative.

(ii) The items should be of a standard quality.

(iii) Non-tangible items should be excluded.

(iv) The items should be reasonable in number.

1.3.3 Price Quotations

It is neither possible nor necessary to collect the prices of commodities from every market in the country where they are traded; a sample of markets should be taken. The places and persons selected must be representative, and the places should be well known for trading in these commodities.

It is also necessary to select a reliable agency from which price quotations are obtained.


1.3.4 Selection of the Base Period

In the construction of index numbers, the selection of the base period is a very important step. Since the base period serves as a reference period and the prices for a given period are expressed as percentages of those for the base year, it is necessary that

(i) the base period should be normal and

(ii) it should not be too far in the past.

There are two methods by which the base period can be selected: (i) Fixed base method and (ii) Chain base method.

Fixed base method : Under this method, any one year is taken as the base. Prices during that year are taken as equal to 100 and the prices of other years are shown as percentages of the prices of the base year. Thus if indices for 1998, 1999, 2000 and 2001 are calculated with 1997 as the base year, such indices will be called fixed base indices.

Chain base method : Under this method, the relatives of each year are calculated on the basis of the prices of the preceding year. Chain base index numbers are called Link Relatives. For example, if index numbers are constructed for 1997, 1998, 1999, 2000 and 2001, then for 1998 the base will be 1997, for 1999 the base will be 1998, and so on.

1.3.5 The Choice of an Average

An index number is a technique of ‘averaging’ all the changes in a group of series over a period of time. The main problem is to select an average which can summarise the change in the component series adequately. The median, mode and harmonic mean are rarely used in the construction of index numbers, so in practice a choice has to be made between the arithmetic mean and the geometric mean, and the merits and demerits of the two have to be compared. Theoretically, the G.M. is superior to the A.M. in many respects, but because it is difficult to compute it is not widely used for this purpose.

1.3.6 Selection of Appropriate Weights

The term weight refers to the relative importance of the different items in the construction of index numbers. All items are not of equal importance, and hence it is necessary to find a suitable method by which the varying importance of the different items is taken into account. The system of weighting depends upon the purpose of the index number, but the weights ought to reflect the relative importance of the commodities in the regimen. The system may be either arbitrary or rational. The weights may be assigned according to either:

(1) the value or quantity produced, or

(2) the value or quantity consumed, or

(3) the value or quantity sold or put on sale.

There are two methods of assigning weights.

(i) Implicit and (ii) Explicit.

Implicit : Under this method, the commodity to which greater importance has to be given is

repeated a number of times i.e., a number of varieties of such commodities are included in the index

numbers as separate items.

Explicit : In this case, the weights are explicitly assigned to commodities. Only one kind of a

commodity is included in the construction of Index numbers but its price relative is multiplied by

the figure of weights assigned to it. There has to be some logic in assigning such type of weights.


1.4 METHODS OF CONSTRUCTING INDEX NUMBERS

The methods of constructing index numbers are divided under two heads :

(1) Unweighted Indices ; and

(2) Weighted Indices.

Each one of these types is further sub-divided under two categories :

(i) Simple aggregative; and

(ii) Average of price relatives.

1.4.1 Unweighted Index Numbers

(i) Simple aggregative method : Under this method the total of the current year prices for the various commodities is divided by the total of the base year prices and the quotient is multiplied by 100. Symbolically,

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$

where $P_{01}$ represents the price index, $p_1$ represents the prices of the current year and $p_0$ the prices of the base year.

Example 1 : From the following data construct the index for 2013 taking 2000 as the base year.

Commodity    Prices in 2000 (Rs.)    Prices in 2013 (Rs.)
A            30                      30
B            35                      50
C            45                      75
D            45                      70
E            25                      40

Solution : Construction of Price Index.

Commodity    Prices in 2000 (Rs.)    Prices in 2013 (Rs.)
A            30                      30
B            35                      50
C            45                      75
D            45                      70
E            25                      40
Total        Σp0 = 180               Σp1 = 265

$$\text{Price Index for 2013 with 2000 as base} = \frac{\text{sum of prices in 2013}}{\text{sum of prices in 2000}} \times 100$$

Symbolically,

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{265}{180} \times 100 = 147.2$$

Hence there is an increase of 47.2% in the prices of commodities in 2013 as compared to 2000.
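The arithmetic of Example 1 can be checked with a short Python sketch. The function name `simple_aggregative_index` and the dictionary layout are assumptions made here for illustration; they are not part of the lesson.

```python
# Prices from Example 1 (Rs.); base year 2000, current year 2013.
base_prices = {"A": 30, "B": 35, "C": 45, "D": 45, "E": 25}
current_prices = {"A": 30, "B": 50, "C": 75, "D": 70, "E": 40}


def simple_aggregative_index(p0, p1):
    """Return (sum of current prices / sum of base prices) * 100."""
    return sum(p1.values()) / sum(p0.values()) * 100


index_2013 = simple_aggregative_index(base_prices, current_prices)
print(round(index_2013, 1))  # 147.2
```

Because the prices are simply added, the result depends on the units in which each price is quoted, which is why this method is sensitive to the choice of units.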

(ii) Average of Price Relatives Method : Under this method, first calculate the price relatives for the various items included in the index and then average the price relatives by using any of the measures of central value, i.e. the arithmetic mean, the median, the mode, the geometric mean or the harmonic mean.

(a) When the arithmetic mean is used,

$$P_{01} = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right)}{N}$$

(b) When the geometric mean is used,

$$P_{01} = \text{AL} \left[ \frac{\sum \log \left( \frac{p_1}{p_0} \times 100 \right)}{N} \right]$$

where AL denotes the antilogarithm and N refers to the number of items whose price relatives are averaged.

Example 2 : Calculate Index Numbers for 2011, 2012 and 2013 taking 2010 as base from the following data by the average of relatives method.

Commodity    2010    2011    2012    2013
A            2       5       4       3
B            8       11      13      6
C            4       5       6       8
D            6       4       5       7
E            5       4       6       3


Solution :

Construction of Index Numbers based on Mean of Relatives.

Commodity    2010                2011                 2012                 2013
             p0    Relative      p1    p1/p0 × 100    p2    p2/p0 × 100    p3    p3/p0 × 100
A            2     100           5     250.0          4     200.0          3     150.0
B            8     100           11    137.5          13    162.5          6     75.0
C            4     100           5     125.0          6     150.0          8     200.0
D            6     100           4     66.7           5     83.3           7     116.7
E            5     100           4     80.0           6     120.0          3     60.0
Total              500                 659.2                715.8                601.7

$P_{01}$ = Index with 2010 as base and 2011 as current year

$$P_{01} = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right)}{N} = \frac{659.2}{5} = 131.84$$

$P_{02}$ = Index with 2010 as base and 2012 as current year

$$P_{02} = \frac{\sum \left( \frac{p_2}{p_0} \times 100 \right)}{N} = \frac{715.8}{5} = 143.16$$

$P_{03}$ = Index with 2010 as base and 2013 as current year

$$P_{03} = \frac{\sum \left( \frac{p_3}{p_0} \times 100 \right)}{N} = \frac{601.7}{5} = 120.34$$
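The mean-of-relatives calculation in Example 2 can be sketched in Python as below. The helper name `mean_of_relatives` is hypothetical. The sketch keeps full precision, so its results (about 131.83, 143.17 and 120.33) differ from the worked figures in the second decimal place, since the worked solution rounds each relative to one decimal before adding.

```python
# Prices from Example 2; each list holds the 2010, 2011, 2012 and 2013 price.
prices = {
    "A": [2, 5, 4, 3],
    "B": [8, 11, 13, 6],
    "C": [4, 5, 6, 8],
    "D": [6, 4, 5, 7],
    "E": [5, 4, 6, 3],
}
years = [2010, 2011, 2012, 2013]


def mean_of_relatives(prices, position):
    """Arithmetic mean of the price relatives (p_n / p_0 * 100) for one year."""
    relatives = [row[position] / row[0] * 100 for row in prices.values()]
    return sum(relatives) / len(relatives)


indices = {year: mean_of_relatives(prices, i) for i, year in enumerate(years)}
for year, value in indices.items():
    print(year, round(value, 2))
```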

1.4.2 Weighted Index Numbers

(I) Weighted Aggregative Method : These indices are of the simple aggregative type, with the only difference that weights are assigned to the various items included in the index. There are various methods by which weights can be assigned, and hence a large number of formulae for constructing index numbers have been devised. Some commonly used methods suggested by different authorities are as follows :

(i) Laspeyre’s method.

(ii) Paasche’s method.

(iii) Fisher’s ideal method.

(iv) Marshall Edgeworth method.

(v) Kelly’s method.

(vi) Dorbish and Bowley’s method.

(i) Laspeyre’s Method

Laspeyre suggested that for calculating price indices, the quantities of the base year should be used as weights. Hence the formula for computing the price index number would be :

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

where

p refers to the price of each commodity,

q refers to the quantity of each commodity,

the subscript 0 refers to the base year,

the subscript 1 refers to the current year, and

Σ refers to the summation over items.

The steps for calculating index numbers are :

(a) Multiply the price of each commodity for the current year by its respective quantity for the base year ($p_1 \times q_0$) and then find the total of these products, $\sum p_1 q_0$.

(b) Multiply the price of each commodity for the base year by its respective quantity for the base year ($p_0 \times q_0$) and then find the total of these products for the different commodities, $\sum p_0 q_0$.

(c) Divide $\sum p_1 q_0$ by $\sum p_0 q_0$ and multiply the quotient by 100.

On the other hand, if a quantity index is to be calculated, the prices of the base year are used as weights. Symbolically,

$$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$$

Example 3 : Compute Price Index and Quantity Index from the data given below by Laspeyre’s method.

Items    Base year                  Current year
         Quantity     Price         Quantity     Price
A        6 units      40 paise      7 units      30 paise
B        4 units      45 paise      5 units      50 paise
C        0.5 units    90 paise      1.5 units    40 paise


Solution : Computation of Price and Quantity Indices.

         Base year      Current year
Items    q0     p0      q1     p1      p0q0    p1q0    p0q1    p1q1
A        6      40      7      30      240     180     280     210
B        4      45      5      50      180     200     225     250
C        0.5    90      1.5    40      45      20      135     60
Total                                  465     400     640     520

$$\text{Price Index } (P_{01}) = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{400}{465} \times 100 = 86.02$$

$$\text{Quantity Index } (Q_{01}) = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{640}{465} \times 100 = 137.63$$
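The Laspeyre computation of Example 3 can be reproduced with the Python sketch below. The function names are hypothetical, and the sketch assumes that item C's base-year quantity is 0.5 units, which is the value consistent with the products 45 and 20 shown for C in the solution table.

```python
# Example 3 data as (q0, p0, q1, p1); prices are in paise.
# Assumption: item C's base-year quantity is 0.5 units, the value that
# reproduces the products 45 and 20 shown in the solution table.
items = {
    "A": (6, 40, 7, 30),
    "B": (4, 45, 5, 50),
    "C": (0.5, 90, 1.5, 40),
}


def laspeyres_price_index(items):
    """Sum(p1*q0) / Sum(p0*q0) * 100: base-year quantities as weights."""
    numerator = sum(p1 * q0 for q0, p0, q1, p1 in items.values())
    denominator = sum(p0 * q0 for q0, p0, q1, p1 in items.values())
    return numerator / denominator * 100


def laspeyres_quantity_index(items):
    """Sum(q1*p0) / Sum(q0*p0) * 100: base-year prices as weights."""
    numerator = sum(q1 * p0 for q0, p0, q1, p1 in items.values())
    denominator = sum(q0 * p0 for q0, p0, q1, p1 in items.values())
    return numerator / denominator * 100


price_index = laspeyres_price_index(items)
quantity_index = laspeyres_quantity_index(items)
print(round(price_index, 2), round(quantity_index, 2))  # 86.02 137.63
```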

(ii) Paasche’s Method : Under this method of calculating the price index, the quantities of the current year are used as weights, as compared to the base year quantities used by Laspeyre. Symbolically,

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

The steps for constructing the index according to Paasche’s method are :

(i) Calculate the product of the current year prices of the different commodities and their respective quantities for the current year ($p_1 \times q_1$) and find the total of these products, $\sum p_1 q_1$.

(ii) Calculate the product of $p_0$ and $q_1$ for the different commodities and aggregate them to obtain $\sum p_0 q_1$.

(iii) Divide $\sum p_1 q_1$ by $\sum p_0 q_1$ and multiply the quotient by 100 to obtain the price index.

Similarly, the quantity index is calculated using the current year prices as weights. Symbolically,

$$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100$$

Example 4 : From the data of the previous example, calculate (i) Price Index (ii) Quantity Index by Paasche’s method.

Solution :

         Base year      Current year
Items    q0     p0      q1     p1      p0q0    p1q0    p0q1    p1q1
A        6      40      7      30      240     180     280     210
B        4      45      5      50      180     200     225     250
C        0.5    90      1.5    40      45      20      135     60
Total                                  465     400     640     520

$$\text{Price Index } P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{520}{640} \times 100 = 81.25$$

$$\text{Quantity Index } Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{520}{400} \times 100 = 130$$
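A matching sketch for Paasche's method is given below, using the same data and the same assumption about item C's base-year quantity (0.5 units). The function names are hypothetical.

```python
# Same data as Example 3, as (q0, p0, q1, p1); prices are in paise.
# Assumption: item C's base-year quantity is 0.5 units (see the solution table).
items = {
    "A": (6, 40, 7, 30),
    "B": (4, 45, 5, 50),
    "C": (0.5, 90, 1.5, 40),
}


def paasche_price_index(items):
    """Sum(p1*q1) / Sum(p0*q1) * 100: current-year quantities as weights."""
    numerator = sum(p1 * q1 for q0, p0, q1, p1 in items.values())
    denominator = sum(p0 * q1 for q0, p0, q1, p1 in items.values())
    return numerator / denominator * 100


def paasche_quantity_index(items):
    """Sum(q1*p1) / Sum(q0*p1) * 100: current-year prices as weights."""
    numerator = sum(q1 * p1 for q0, p0, q1, p1 in items.values())
    denominator = sum(q0 * p1 for q0, p0, q1, p1 in items.values())
    return numerator / denominator * 100


paasche_price = paasche_price_index(items)
paasche_quantity = paasche_quantity_index(items)
print(round(paasche_price, 2), round(paasche_quantity, 2))  # 81.25 130.0
```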

(iii) Fisher’s Ideal Index : Laspeyre used base year quantities as weights, whereas Paasche used current year quantities as weights for the computation of the index number of prices. Fisher suggested that both the current year quantities and the base year quantities should be used, and that the geometric mean of the two resulting indices should be taken as the index number. Symbolically,

$$\text{Fisher's Price Index } P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

On the other hand, if quantity indices are to be calculated by this method, the geometric mean of the index number of quantities with base year prices as weights and the index number of quantities with current year prices as weights is found. Symbolically,

$$\text{Fisher's Quantity Index } Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$$

Example 5 : Construct Index Numbers of Prices and Quantities from the following data using Fisher’s method (2010 = 100).

             2010              2014
Commodity    Price    Qty.     Price    Qty.
A            2        8        4        6
B            5        10       6        5
C            4        14       5        10
D            2        19       2        13

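Solution :

         2010                       2014
Items    Price (p0)    Qty. (q0)    Price (p1)    Qty. (q1)    p0q0    p1q1    p1q0    p0q1
A        2             8            4             6            16      24      32      12
B        5             10           6             5            50      30      60      25
C        4             14           5             10           56      50      70      40
D        2             19           2             13           38      26      38      26
Total                                                          160     130     200     103

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{200}{160} \times \frac{130}{103}} \times 100 = 125.6$$

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{103}{160} \times \frac{130}{200}} \times 100 = 64.7$$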
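Fisher's indices for Example 5 can be verified with the following Python sketch. The helper names `aggregate` and `fisher_indices` are assumptions made for illustration.

```python
import math

# Example 5 data as (p0, q0, p1, q1) for commodities A to D.
data = [(2, 8, 4, 6), (5, 10, 6, 5), (4, 14, 5, 10), (2, 19, 2, 13)]


def aggregate(data):
    """Return the four sums used by the weighted aggregative formulae."""
    p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data)
    p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data)
    p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data)
    p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data)
    return p0q0, p1q0, p0q1, p1q1


def fisher_indices(data):
    """Geometric mean of the Laspeyre and Paasche indices, for price and quantity."""
    p0q0, p1q0, p0q1, p1q1 = aggregate(data)
    price = math.sqrt((p1q0 / p0q0) * (p1q1 / p0q1)) * 100
    quantity = math.sqrt((p0q1 / p0q0) * (p1q1 / p1q0)) * 100
    return price, quantity


fisher_price, fisher_quantity = fisher_indices(data)
print(round(fisher_price, 1), round(fisher_quantity, 1))  # 125.6 64.7
```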

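(iv) Marshall & Edgeworth’s Method : In this method also, both current year and base year prices and quantities are considered. The formulae are as follows :

$$P_{01} = \frac{\sum (q_0 + q_1) p_1}{\sum (q_0 + q_1) p_0} \times 100 = \frac{\sum q_0 p_1 + \sum q_1 p_1}{\sum q_0 p_0 + \sum q_1 p_0} \times 100$$

$$Q_{01} = \frac{\sum (p_0 + p_1) q_1}{\sum (p_0 + p_1) q_0} \times 100 = \frac{\sum p_0 q_1 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_1 q_0} \times 100$$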

(v) Kelly’s Method : Truman Kelly has suggested the following formula for constructing index numbers :

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100 \quad \text{where } q = \frac{q_0 + q_1}{2}$$

Here q refers to the average quantity of the two periods. This is also known as the fixed weight aggregative method.

(vi) Dorbish & Bowley’s Method : Dorbish and Bowley have suggested the simple arithmetic mean of Laspeyre’s and Paasche’s formulae. Symbolically,

$$P_{01} = \frac{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}{2} \times 100$$
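As an illustration only, the three formulae above (Marshall and Edgeworth, Kelly, Dorbish and Bowley) can be applied to the data of Example 5. The lesson does not work these figures out, so the results below are a sketch under that assumption, and the function names are hypothetical.

```python
# Example 5 data as (p0, q0, p1, q1) for commodities A to D.
data = [(2, 8, 4, 6), (5, 10, 6, 5), (4, 14, 5, 10), (2, 19, 2, 13)]


def marshall_edgeworth(data):
    """Weights are the sum of base-year and current-year quantities."""
    numerator = sum(p1 * (q0 + q1) for p0, q0, p1, q1 in data)
    denominator = sum(p0 * (q0 + q1) for p0, q0, p1, q1 in data)
    return numerator / denominator * 100


def kelly(data):
    """Weights are the average quantity of the two periods."""
    numerator = sum(p1 * (q0 + q1) / 2 for p0, q0, p1, q1 in data)
    denominator = sum(p0 * (q0 + q1) / 2 for p0, q0, p1, q1 in data)
    return numerator / denominator * 100


def dorbish_bowley(data):
    """Arithmetic mean of the Laspeyre and Paasche price ratios."""
    laspeyres = sum(p1 * q0 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)
    paasche = sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q1 for p0, q0, p1, q1 in data)
    return (laspeyres + paasche) / 2 * 100


me_index = marshall_edgeworth(data)
kelly_index = kelly(data)
db_index = dorbish_bowley(data)
print(round(me_index, 2), round(kelly_index, 2), round(db_index, 2))  # 125.48 125.48 125.61
```

When Kelly's weight is the simple average of the two periods' quantities, the factor 1/2 cancels between numerator and denominator, so the result coincides with the Marshall and Edgeworth index.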

(II) Weighted Average of Price Relatives : This method is also known as the Family Budget Method. In this method the weights are the values ($p_0 q_0$) of the base year. The index number for the current year is calculated by dividing the sum of the products of the current year’s price relatives and the base year values by the total of the weights, i.e., the weighted arithmetic average of the price relatives gives the required index number. Symbolically,

$$\text{Weighted Index Number of the current year} = \frac{\sum IV}{\sum V}$$

where I stands for the price relatives of the current year and V stands for the values of the base year.


Example 6 : From the data given below, calculate the Weighted Index Number by using the weighted average of relatives.

Commodities    Units      Base Yr. Qty.    Base Year’s Price    Current Yr. Price
A              Quintal    7                16                   19.6
B              Kg.        6                2                    3.2
C              Dozen      16               5.6                  7.0
D              Metre      21               1.5                  1.4

Solution :

$$\text{Price relative of the current year} = \frac{\text{Current Year's Price}}{\text{Base Year's Price}} \times 100$$

Value weight = Quantity of the base year × Price of the base year

Commodities    Price Relatives          Value Weights    Weights × Price Relatives
               I = p1/p0 × 100          V = p0q0         V × I
A              122.5                    112.0            13,720
B              160.0                    12.0             1,920
C              125.0                    89.6             11,200
D              93.3                     31.5             2,939
Total                                   ΣV = 245.1       ΣIV = 29,779

$$\text{Weighted Index Number of the current year} = \frac{\sum IV}{\sum V} = \frac{29779}{245.1} = 121.5 \text{ Ans.}$$
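The weighted average of relatives in Example 6 can be sketched in Python as follows. The function name `weighted_average_of_relatives` is an assumption made for illustration.

```python
# Example 6 data as (base-year quantity q0, base-year price p0, current price p1).
commodities = {
    "A": (7, 16, 19.6),
    "B": (6, 2, 3.2),
    "C": (16, 5.6, 7.0),
    "D": (21, 1.5, 1.4),
}


def weighted_average_of_relatives(data):
    """Sum(I * V) / Sum(V), with I = p1/p0 * 100 and V = p0 * q0."""
    weighted_sum = 0.0
    total_weight = 0.0
    for q0, p0, p1 in data.values():
        relative = p1 / p0 * 100   # price relative I
        value = p0 * q0            # base-year value weight V
        weighted_sum += relative * value
        total_weight += value
    return weighted_sum / total_weight


weighted_index = weighted_average_of_relatives(commodities)
print(round(weighted_index, 1))  # 121.5
```

Since I × V = (p1/p0 × 100) × p0q0 = p1q0 × 100, this index with base-year value weights gives the same result as Laspeyre's price index.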

In the weighted average of relatives, the geometric mean may be used instead of the arithmetic mean. The weighted geometric mean of relatives is calculated by applying logarithms to the relatives. When this mean is used, the formula is :

$$P_{01} = \text{AL} \left[ \frac{\sum V \log I}{\sum V} \right] \quad \text{where } I = \frac{p_1}{p_0} \times 100 \text{ and } V = p_0 q_0$$

Example 7 : Find the price index by the weighted average of price relatives for the following commodities, using the geometric mean :

Commodities    p0     q0    p1
X              3.0    20    4.0
Y              1.5    40    1.6
Z              1.0    10    1.5


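Solution :

Calculation of Index Number

Commodities    p0     q0    p1     V = p0q0    I = p1/p0 × 100            log I     V log I
X              3.0    20    4.0    60          4/3 × 100 = 133.33         2.1249    127.494
Y              1.5    40    1.6    60          1.6/1.5 × 100 = 106.7      2.0282    121.692
Z              1.0    10    1.5    10          1.5/1.0 × 100 = 150.0      2.1761    21.761
Total                              ΣV = 130                                         ΣV log I = 270.947

By applying the formula :

$$P_{01} = \text{AL} \left[ \frac{\sum V \log I}{\sum V} \right] = \text{AL} \left[ \frac{270.947}{130} \right] = \text{AL } 2.084 = 121.3$$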
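The weighted geometric mean of Example 7 can be checked with the sketch below, which uses `math.log10` in place of log tables. The function name is hypothetical. Working at full precision gives about 121.4; the figure 121.3 in the worked solution reflects four-figure logarithms and the truncation of 270.947/130 to 2.084.

```python
import math

# Example 7 data as (p0, q0, p1).
commodities = {"X": (3.0, 20, 4.0), "Y": (1.5, 40, 1.6), "Z": (1.0, 10, 1.5)}


def weighted_gm_index(data):
    """Antilog of Sum(V * log I) / Sum(V), with I = p1/p0 * 100 and V = p0 * q0."""
    total_weight = 0.0
    weighted_logs = 0.0
    for p0, q0, p1 in data.values():
        value = p0 * q0
        relative = p1 / p0 * 100
        total_weight += value
        weighted_logs += value * math.log10(relative)
    return 10 ** (weighted_logs / total_weight)


gm_index = weighted_gm_index(commodities)
print(round(gm_index, 1))
```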

1.5 TESTS OF ADEQUACY

Since several formulae have been suggested for the construction of index numbers, the question arises as to which method is the most suitable in a given situation. The following tests help in choosing an appropriate index :

(i) Unit Test : It requires that the method of constructing an index should be independent of the units in which the prices and quantities are quoted. All the methods except the simple aggregative method satisfy this test.

(ii) Circular Test : It is based on the shiftability of the base. Accordingly, the index should work in a circular fashion, i.e., if an index number is computed for period 1 on the base period 0, another index is computed for period 2 on the base period 1, and so on, with a final index computed for period 0 on the base of the last period n, then the product of all these indices (omitting the factor 100 from each) should be equal to one.

$$P_{01} \times P_{12} \times P_{23} \times \dots \times P_{n0} = 1$$

If the test is applied to the simple aggregative method with three periods, we get

$$\frac{\sum p_1}{\sum p_0} \times \frac{\sum p_2}{\sum p_1} \times \frac{\sum p_0}{\sum p_2} = 1$$

The test is met by the simple aggregative method, the simple geometric mean of price relatives and the weighted aggregative method with fixed weights.


(iii) Time Reversal Test : According to Prof. Fisher, the formula for calculating an index number should be such that it gives the same ratio between one point of time and the other, no matter which of the two is taken as the base. In other words, when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers should be reciprocals of each other.

$$P_{01} \times P_{10} = 1 \quad \text{(omitting the factor 100 from each index)}$$

where $P_{01}$ denotes the index for current year 1 on the base year 0 and $P_{10}$ the index for current year 0 on the base year 1.

It can be easily verified that the simple geometric mean of price relatives, the weighted aggregative formula with fixed weights, the weighted geometric mean of relatives with fixed weights, the Marshall Edgeworth method and Fisher’s ideal method satisfy the test.

Let us see how Fisher’s ideal method satisfies the test.

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}}$$

Substituting the values of $P_{01}$ and $P_{10}$,

$$P_{01} \times P_{10} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{1} = 1$$
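The time reversal test can also be checked numerically. The sketch below uses the data of Example 5 purely as an illustration; the function names are hypothetical, and the indices are computed as ratios (the factor 100 is omitted).

```python
# Example 5 data as (p0, q0, p1, q1); used here only as an illustration.
data = [(2, 8, 4, 6), (5, 10, 6, 5), (4, 14, 5, 10), (2, 19, 2, 13)]


def laspeyres(data):
    """Price ratio with base-period quantities as weights (factor 100 omitted)."""
    return sum(p1 * q0 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)


def paasche(data):
    """Price ratio with current-period quantities as weights (factor 100 omitted)."""
    return sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q1 for p0, q0, p1, q1 in data)


def fisher(data):
    """Geometric mean of the Laspeyre and Paasche ratios."""
    return (laspeyres(data) * paasche(data)) ** 0.5


def reverse_time(data):
    """Swap the roles of the base period and the current period."""
    return [(p1, q1, p0, q0) for p0, q0, p1, q1 in data]


fisher_product = fisher(data) * fisher(reverse_time(data))
laspeyres_product = laspeyres(data) * laspeyres(reverse_time(data))
print(round(fisher_product, 4), round(laspeyres_product, 4))  # 1.0 0.9904
```

The product is exactly 1 for Fisher's formula but only about 0.99 for Laspeyre's, so Laspeyre's formula does not satisfy the test on these data.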

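(iv) Factor Reversal Test : It says that the product of a price index and the corresponding quantity index should be equal to the value index. In the words of Fisher, just as each formula should permit the interchange of the two times without giving inconsistent results, so it should permit the interchange of prices and quantities without giving inconsistent results, which means that the two results multiplied together should give the true value ratio. The test says that the change in price multiplied by the change in quantity should be equal to the total change in value. If $P_{01}$ is the price index for the current year with reference to the base year and $Q_{01}$ is the quantity index for the current year, then

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

For Fisher’s ideal index,

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$

Changing p to q and q to p,

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}$$

$$P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{(\sum p_1 q_1)^2}{(\sum p_0 q_0)^2}} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$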

In other words, the factor reversal test is based on the following analogy. If the price per unit of a commodity increases from Rs. 10 in 1995 to Rs. 15 in 1998, and the quantity consumed changes from 100 units to 140 units during the same period, then the price ratio is 15/10 = 1.5 and the quantity ratio is 140/100 = 1.4. The values of consumption (p × q) were Rs. 1000 in 1995 and Rs. 2100 in 1998, giving a value ratio

$$\frac{p_1 q_1}{p_0 q_0} = \frac{2100}{1000} = 2.1$$

Thus we find that the product of the price ratio and the quantity ratio equals the value ratio :

1.5 × 1.4 = 2.1
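The factor reversal test can be checked numerically in the same spirit. The sketch below again borrows the data of Example 5 as an illustration; the function names are hypothetical and the factor 100 is omitted.

```python
# Example 5 data as (p0, q0, p1, q1); used here only as an illustration.
data = [(2, 8, 4, 6), (5, 10, 6, 5), (4, 14, 5, 10), (2, 19, 2, 13)]


def laspeyres(data):
    """Ratio with base-period weights (factor 100 omitted)."""
    return sum(p1 * q0 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)


def paasche(data):
    """Ratio with current-period weights (factor 100 omitted)."""
    return sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q1 for p0, q0, p1, q1 in data)


def fisher(data):
    return (laspeyres(data) * paasche(data)) ** 0.5


def swap_factors(data):
    """Interchange prices and quantities, so a price formula yields a quantity index."""
    return [(q0, p0, q1, p1) for p0, q0, p1, q1 in data]


value_ratio = sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)
fisher_pq = fisher(data) * fisher(swap_factors(data))
laspeyres_pq = laspeyres(data) * laspeyres(swap_factors(data))
print(round(value_ratio, 4), round(fisher_pq, 4), round(laspeyres_pq, 4))  # 0.8125 0.8125 0.8047
```

Fisher's price and quantity indices multiply to the value ratio 130/160 = 0.8125, whereas the Laspeyre pair gives about 0.8047 and so fails the test on these data.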

1.6 CHAIN BASE INDEX

The various formulae discussed so far assume that the base period is some fixed previous period. The index of a given year on a given fixed base is not affected by changes in the prices or the quantities of any other year. In the chain base method, on the other hand, the value of each period is related to that of the immediately preceding period and not to any fixed period. To construct index numbers by the chain base method, a series of index numbers is computed for each year with the preceding year as the base. These index numbers are known as Link Relatives. The link relatives, when multiplied successively (a process known as chaining), are linked to a common base. The products obtained are expressed as percentages and give the required index numbers. The steps for constructing a chain base index are :

(i) Express the figures of each period as a percentage of the preceding period to obtain Link Relatives (LR).

(ii) Chain these link relatives together by successive multiplication to get chain indices by the formula :

$$\text{Chain Base Index (CBI)} = \frac{\text{Link relative of current year} \times \text{Chain index of previous year}}{100}$$

(iii) The chain index can be converted into a fixed base index by this formula :

$$\text{Fixed Base Index (FBI)} = \frac{\text{Link relative of current year} \times \text{Fixed base index of previous year}}{100}$$

Chain relatives are computed from link relatives, whereas fixed base relatives are computed directly from the original data. For a single series of prices, the results obtained by the fixed base and chain base methods are the same.

We shall understand the process by taking some examples.

Example 8 : Construct index numbers by the chain base method from the following data of wholesale prices.

Year :      2005    2006    2007    2008    2009    2010    2011    2012    2013    2014
Prices :    75      50      65      60      72      70      69      75      84      80

Solution :

Computation of Chain Index

Year    Price    Link Relatives             Chain Base Index                 Fixed Base Index
2005    75       100                        100                              100
2006    50       50/75 × 100 = 66.67        66.67 × 100/100 = 66.67          50/75 × 100 = 66.67
2007    65       65/50 × 100 = 130          130 × 66.67/100 = 86.67          65/75 × 100 = 86.67
2008    60       60/65 × 100 = 92.31        92.31 × 86.67/100 = 80.00        60/75 × 100 = 80
2009    72       72/60 × 100 = 120          120 × 80/100 = 96.00             72/75 × 100 = 96
2010    70       70/72 × 100 = 97.22        97.22 × 96/100 = 93.33           70/75 × 100 = 93.33
2011    69       69/70 × 100 = 98.57        98.57 × 93.33/100 = 92.00        69/75 × 100 = 92
2012    75       75/69 × 100 = 108.70       108.70 × 92/100 = 100.00         75/75 × 100 = 100
2013    84       84/75 × 100 = 112          112 × 100/100 = 112.00           84/75 × 100 = 112
2014    80       80/84 × 100 = 95.24        95.24 × 112/100 = 106.67         80/75 × 100 = 106.67

It may be seen that the index by the chain base method and by the fixed base method comes to the same.
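The chain base computation of Example 8 can be sketched in Python as follows. The variable names are assumptions made for illustration. The sketch builds the link relatives, chains them, and compares the result with the fixed base index computed directly from the prices.

```python
# Wholesale prices from Example 8 for the years 2005 to 2014.
years = list(range(2005, 2015))
prices = [75, 50, 65, 60, 72, 70, 69, 75, 84, 80]

# Link relative: each year's price as a percentage of the preceding year's price.
link_relatives = [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

# Chain index: link relative of current year x chain index of previous year / 100.
chain_index = []
for relative in link_relatives:
    previous = chain_index[-1] if chain_index else 100.0
    chain_index.append(relative * previous / 100)

# Fixed base index computed directly from the original data (2005 = 100).
fixed_index = [price / prices[0] * 100 for price in prices]

for year, chain, fixed in zip(years, chain_index, fixed_index):
    print(year, round(chain, 2), round(fixed, 2))
```

For a single price series the chained product telescopes to current price / base price, which is why the last two columns agree for every year.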


Example 9 : Construct chain index numbers from the link relatives given below :

Year :              2011    2012    2013    2014    2015
Link Relatives :    100     105     95      115     102

Solution :

Calculations for Chain Base Index

Year    Link Relatives    Chain Index Number
2011    100               100
2012    105               105 × 100/100 = 105.00
2013    95                95 × 105/100 = 99.75
2014    115               115 × 99.75/100 = 114.71
2015    102               102 × 114.71/100 = 117.01

Base Shifting : Sometimes it becomes necessary to change the base of an index number series from one period to another for the purpose of comparison. In such circumstances all the index numbers have to be recomputed on the new base period. This is done by dividing the index number of each period by the index number of the new base period and expressing the result as a percentage. This process is known as base shifting.

Example 10 : Compute index numbers from the following prices taking 2012 as the base, and then shift the base to 2014.

Solution :

Year    Price    Index Number (Base 2012)    Shift of base from 2012 to 2014
2012    10       100                         100/150 × 100 = 67
2013    12       12/10 × 100 = 120           120/150 × 100 = 80
2014    15       15/10 × 100 = 150           100
2015    21       21/10 × 100 = 210           210/150 × 100 = 140
2016    20       20/10 × 100 = 200           200/150 × 100 = 133
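Base shifting as used in Example 10 can be sketched in a few lines of Python. The function name `shift_base` is hypothetical.

```python
# Index numbers from Example 10 with 2012 as the base.
index_2012_base = {2012: 100, 2013: 120, 2014: 150, 2015: 210, 2016: 200}


def shift_base(series, new_base_year):
    """Divide every index by the index of the new base year and express as a percentage."""
    divisor = series[new_base_year]
    return {year: value / divisor * 100 for year, value in series.items()}


index_2014_base = shift_base(index_2012_base, 2014)
for year, value in index_2014_base.items():
    print(year, round(value))
```

Rounded to whole numbers, the shifted series is 67, 80, 100, 140 and 133, as in the solution table.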


1.7 SPLICING

On several occasions a change of base year may introduce a discontinuity in a series of index numbers. We would always like to compare figures with a recent year and not with the distant past. For example, the weights of an index number may become out of date and another index may be constructed with new weights, so that two indices appear. It then becomes necessary to convert these two indices into a continuous series. The procedure employed for this conversion is known as splicing. The formulae are :

For Forward Splicing :

$$\text{Spliced Index Number} = \frac{\text{Old index of the New Base Year} \times \text{Index to be adjusted}}{100}$$

For Backward Splicing :

$$\text{Spliced Index Number} = \frac{100}{\text{Old index of the New Base Year}} \times \text{Index to be adjusted}$$

Example 11 : Splice the following two index number series, A series forward and B series backward :

Solution :

Splicing of two Index Number Series

Year    Series A    Series B    Index Numbers Spliced         Index Numbers Spliced
                                forward to Series A           backward to Series B
2010    100                                                   100/150 × 100 = 66.67
2011    120                                                   100/150 × 120 = 80.00
2012    150         100         150/100 × 100 = 150           100/150 × 150 = 100.00
2013                110         150/100 × 110 = 165
2014                120         150/100 × 120 = 180
2015                150         150/100 × 150 = 225
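The two splicing formulae can be applied to the series of Example 11 with the Python sketch below. The function names `splice_forward` and `splice_backward` are assumptions made for illustration; 2012 is the overlapping year in which the old series A stands at 150 and the new series B at 100.

```python
# Series from Example 11; 2012 is the year common to both series.
series_a = {2010: 100, 2011: 120, 2012: 150}              # old series
series_b = {2012: 100, 2013: 110, 2014: 120, 2015: 150}   # new series
overlap_year = 2012


def splice_forward(old, new, overlap):
    """Continue the old series: old index of new base year x new index / 100."""
    factor = old[overlap] / 100
    spliced = dict(old)
    spliced.update({year: value * factor for year, value in new.items()})
    return spliced


def splice_backward(old, new, overlap):
    """Extend the new series backwards: 100 / old index of new base year x old index."""
    factor = 100 / old[overlap]
    spliced = {year: value * factor for year, value in old.items()}
    spliced.update(new)
    return spliced


forward = splice_forward(series_a, series_b, overlap_year)
backward = splice_backward(series_a, series_b, overlap_year)
print({year: round(value, 2) for year, value in forward.items()})
print({year: round(value, 2) for year, value in backward.items()})
```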


Deflating: It means making allowance for the changes in the purchasing power of money due

to a change in general price level. It is the technique of converting a series of value calculated at

current prices in to a series at constant prices of a given year. In other words the process of removing

the effects of price changes from the current money values is called Deflation. By this process the

real value of the phenomenon is calculated which is free from the influence of price changes.

Deflation is used in computation of national income and other economic variables. The relevant

price index is called the deflator whether it is to be the wholesale price index or consumer price

index. Normally separate price deflators are found out for deflating the national income data from

different sectors of the economy considering the changes in prices in those sectors. The method is :

\[ \text{Deflated value} = \frac{\text{Current value}}{\text{Deflator}} \times 100 \]
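As a small illustration of the deflation formula, the following Python sketch converts a money value at current prices into a value at base-year prices. The wage and index figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def deflate(current_value, deflator):
    """Deflated value = (current value / deflator) x 100."""
    return current_value / deflator * 100


# Hypothetical figures: a money wage of Rs. 12,000 when the price index
# (base year = 100) stands at 150.
money_wage = 12000
price_index = 150

real_wage = deflate(money_wage, price_index)
print(real_wage)  # value of the wage at base-year prices
```

With these assumed figures the real wage works out to Rs. 8,000: one third of the money wage is absorbed by the rise in prices since the base year.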

The consumer price index, also known as the cost of living index, is calculated to know the average change over time in the prices of commodities consumed by consumers. The need to construct consumer price indices arises because general index numbers fail to give an exact idea of the effect of a change in the general price level on the cost of living of different classes of people, since a given change in the level of prices affects different classes of people in different ways. Different people consume different commodities, and the same commodities in different proportions. The consumer price index helps us in determining the effect of a rise or fall in prices on different classes of consumers living in different areas. The consumer price index is significant because the demand for higher

wages is based on the cost of living index and the wages and salaries in most nations are adjusted

according to this index. We should understand that the cost of living index does not measure the

actual cost of living nor the fluctuations in the cost of living due to causes other than the change in

price level but its object is to find out how much the consumers of a particular class have to pay

more for a certain basket of goods and services. That is why the term cost of living index has been

replaced by the term price of living index, cost of living price index or consumer price index.

The significance of studying the consumer price index is that it helps in wage negotiations and

wage contracts. It also helps in preparing wage policy, price policy, rent control, taxation and general

economic policies. This index is also used to find out the changing purchasing power of different

currencies.

Consumer Price Index can be prepared by two methods :

(i) Aggregative Method;

(ii) Weighted Relatives Method.

When the aggregative method is used to prepare the consumer price index, the aggregate expenditures for the current year and the base year are calculated and the formula given below is applied.

\[ \text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \]


When weighted relatives method is used then the family budgets of a large number of people

for whom the index is meant are carefully studied and the aggregative expenditure of an average

family on various items is estimated. These will be weights. In other words, the weights are calculated

by multiplying the base year quantities and prices (p0q0). The price relatives for all the commodities

are prepared and multiplied by the weights. By applying the formula, we can calculate Consumer

price index.

\[ \text{Consumer Price Index} = \frac{\sum IV}{\sum V} \quad \text{where } I = \frac{p_1}{p_0} \times 100 \text{ and } V = p_0 q_0 \]

Example 12 : Prepare the Consumer price index for 2013 on the basis of 2010 from the following

data by both methods.

Commodities    Quantities Consumed (2010)    Prices (2010)    Prices (2013)

A 6 5.75 6.00

B 6 5.00 8.00

C 1 6.00 9.00

D 6 8.00 10.00

E 4 2.00 1.50

F 1 20.00 15.00

Solution :

Consumer Price Index by Aggregative Method

Commodities q0 p0 p1 p1q0 p0q0

A 6 5.75 6.00 36.00 34.50

B 6 5.00 8.00 48.00 30.00

C 1 6.00 9.00 9.00 6.00

D 6 8.00 10.00 60.00 48.00

E 4 2.00 1.50 6.00 8.00

F 1 20.00 15.00 15.00 20.00

\[ \text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{174}{146.5} \times 100 = 118.77 \]


Consumer Price Index by Weighted Relatives

Commodities q0 p0 p1 I V IV

A 6 5.75 6.00 104.34 34.50 3600

B 6 5.00 8.00 160.00 30.00 4800

C 1 6.00 9.00 150.00 6.00 900

D 6 8.00 10.00 125.00 48.00 6000

E 4 2.00 1.50 75.00 8.00 600

F 1 20.00 15.00 75.00 20.00 1500

ΣV = 146.5    ΣIV = 17400

\[ \text{Consumer Price Index} = \frac{\sum IV}{\sum V} = \frac{17400}{146.5} = 118.77 \]
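The agreement between the two methods in Example 12 can be verified with a short Python sketch. The data are those of Example 12; the function names are illustrative assumptions.

```python
# Data of Example 12 (commodities A to F).
q0 = [6, 6, 1, 6, 4, 1]                      # quantities consumed in 2010
p0 = [5.75, 5.00, 6.00, 8.00, 2.00, 20.00]   # prices in 2010
p1 = [6.00, 8.00, 9.00, 10.00, 1.50, 15.00]  # prices in 2013


def cpi_aggregative(p0, p1, q0):
    """Aggregative method: sum(p1*q0) / sum(p0*q0) x 100."""
    current_cost = sum(p * q for p, q in zip(p1, q0))
    base_cost = sum(p * q for p, q in zip(p0, q0))
    return current_cost / base_cost * 100


def cpi_weighted_relatives(p0, p1, q0):
    """Weighted relatives method: sum(I*V) / sum(V), with I = p1/p0 x 100 and V = p0*q0."""
    relatives = [new / old * 100 for old, new in zip(p0, p1)]
    weights = [p * q for p, q in zip(p0, q0)]
    return sum(i * v for i, v in zip(relatives, weights)) / sum(weights)


print(round(cpi_aggregative(p0, p1, q0), 2))
print(round(cpi_weighted_relatives(p0, p1, q0), 2))
```

The two results coincide because I × V = (p1/p0 × 100) × p0q0 = 100 × p1q0, so ΣIV/ΣV reduces to the aggregative formula whenever the weights are the base year values p0q0.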

The Index Number of industrial production is prepared to know the increase or decrease in the level

of industrial production in a given period compared with some other period. This index measures

the changes in quantum of production. To prepare such an index it is necessary for us to compute

the production for two periods i.e. for the current year and for the base year. Generally the data are

collected under these heads :

(i) Textile industries – cotton, woollen, silk etc.

(ii) Mining industries – iron-ore, coal, copper, petroleum etc.

(iii) Metallurgical industries – iron-ore, coal, copper, petroleum etc.

(iv) Mechanical industries – locomotives, ships, aeroplane etc.

(v) Miscellaneous – glass, soap, chemical, cement etc.

The output of the various industries is computed. Weights are assigned to the various industries on the basis of criteria such as capital invested, turnover, net output, production etc. We apply this formula:

\[ \text{Index of Industrial Production} = \frac{\sum IW}{\sum W} \]

where \( I = \dfrac{q_1}{q_0} \) and W = Relative importance of different outputs.
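The weighted formula for the index of industrial production can be sketched in Python as follows. The outputs and weights are hypothetical, and the quantity relatives are multiplied by 100 (an assumption, in line with the price relatives used for the consumer price index) so that the result reads as an index with base 100.

```python
def industrial_production_index(base_output, current_output, weights):
    """Weighted average of quantity relatives: sum(I*W) / sum(W)."""
    # Assumption: relatives are expressed as percentages, I = q1/q0 x 100.
    relatives = [100 * q1 / q0 for q0, q1 in zip(base_output, current_output)]
    return sum(i * w for i, w in zip(relatives, weights)) / sum(weights)


# Hypothetical outputs of three industry groups and their relative importance.
base_output = [100, 200, 50]
current_output = [120, 220, 45]
weights = [50, 30, 20]

index = industrial_production_index(base_output, current_output, weights)
print(round(index, 2))
```

With these assumed figures the relatives are 120, 110 and 90, and the weighted index is 111, i.e. production is 11 per cent above the base year level.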

1. Index numbers are only approximate indicators of the relative level of a phenomenon.

2. Index numbers that are good for achieving one objective may be unsuitable for another.

3. Index numbers can be manipulated in such a manner as to draw the desired conclusion.

Bombay Stock Exchange

The first organised stock exchange was established in July 1875 as an association of native brokers,

named as Native Shares and Stock Brokers Association. Its formal deed of association was executed

in 1887. This stock exchange is now popularly known as the Bombay Stock Exchange (BSE). This

stock exchange played a significant role during the phase of recovery from several years of depression.

It was the first to be recognised by the Government of India.

The Exchange, while providing an efficient and transparent market for trading in securities, debt and derivatives, upholds the interests of the investors and ensures redressal of their grievances, whether against the companies or its own member-brokers. It also strives to educate and enlighten the investors by conducting investor education programmes and making available to them necessary informative inputs.

SENSEX – The Barometer of Indian Capital Markets

Bombay Stock Exchange (BSE) Sensitive Index Number of equity prices (SENSEX) is the most

widely used and accepted equity price index in the country. SENSEX, first compiled in 1986, was

calculated on a “Market Capitalization-Weighted” methodology of 30 component stocks representing

large, well-established and financially sound companies across key sectors. The base year of SENSEX

was taken as 1978-79. SENSEX today is widely reported in both domestic and international markets

through print as well as electronic media. It is scientifically designed and is based on globally

accepted construction and review methodology. Since September 1, 2003, SENSEX has been calculated on a free-float market capitalization methodology. The “free-float market capitalization-weighted” methodology is a widely followed index construction methodology on which the majority of global equity indices are based; all major index providers like MSCI, FTSE, STOXX, S&P and Dow Jones use the free-float methodology.

Index Specification

Base Year : 1978-79

Base Index Value : 100

Date of Launch : 01-01-1986

Method of calculation : Launched on full market capitalization method; effective September 01, 2003, calculation method shifted to free-float market capitalization.

Number of scrips : 30

Index calculation frequency : Real Time

Historical Value of Index : Index, Price Earnings, Price to Book Value Ratio and Dividend Yield %


SENSEX Calculation Methodology

SENSEX is calculated using the “Free-float Market Capitalization” methodology, wherein, the

level of index at any point of time reflects the free-float market value of 30 component stocks

relative to a base period. The market capitalization of a company is determined by multiplying the

price of its stock by the number of shares issued by the company. This market capitalization is

further multiplied by the free-float factor to determine the free-float market capitalization.

The base period of SENSEX is 1978-79 and the base value is 100 index points. This is often

indicated by the notation 1978-79 = 100. The calculation of SENSEX involves dividing the free-

float market capitalization of 30 companies in the Index by a number called the Index Divisor. The

Divisor is the only link to the original base period value of the SENSEX. It keeps the Index comparable

over time and is the adjustment point for all Index adjustment arising out of corporate actions,

replacement of scrips, etc. During market hours, prices of the index scrips, at which latest trades are

executed, are used by the trading system to calculate SENSEX on a continuous basis.
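The steps described above (price × shares × free-float factor, then division by the Index Divisor) can be sketched in Python. The Infosys figures are taken from the table of SENSEX constituents as on January 7, 2012, given later in this lesson; the divisor and the capitalizations passed to index_level are hypothetical, since the lesson does not state the actual Index Divisor.

```python
CRORE = 10_000_000  # 1 crore = 10 million rupees


def free_float_market_cap(price, shares_issued, free_float_factor):
    """Full market capitalization (price x shares) scaled by the free-float factor."""
    full_market_cap = price * shares_issued
    return full_market_cap * free_float_factor


def index_level(free_float_caps, index_divisor):
    """Index = total free-float market capitalization / Index Divisor."""
    return sum(free_float_caps) / index_divisor


# Infosys Ltd.: close price, number of shares and free-float factor from the table.
infosys_ff_cap = free_float_market_cap(2837.50, 574_203_082, 0.85)
print(round(infosys_ff_cap / CRORE, 2))  # free-float market cap in Rs. crore

# Hypothetical two-stock index with an assumed divisor of 5.
print(index_level([200.0, 300.0], 5.0))
```

Keeping the divisor as a separate input mirrors the text: when a scrip is replaced or a corporate action changes the capitalization, only the divisor needs to be adjusted for the index to remain comparable over time.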

Calculation of BSE SENSEX

Sensex is calculated using a “Market Capitalisation-Weighted” methodology. As per this

methodology, the level of index at any point of time reflects the total market value of 30 component

stocks relative to a base period. (The market capitalisation of a company is determined by multiplying

the price of its stock by the number of shares issued by the company). Statisticians call an index of

a set of combined variables (such as price and number of shares) a composite index. A single

indexed number is used to represent the results of this calculation in order to make the value easier

to work with and track over time. It is much easier to graph a chart based on indexed values than

one based on actual values.

The base period of Sensex is 1978-79. The actual total market value of the stocks in the index

during the base period has been set equal to an indexed value of 100. This is often indicated by the

notation 1978-79 = 100. The formula used to calculate the index is fairly straightforward. However,

the calculation of the adjustments to the index (commonly called Index maintenance) is more

complex.

The calculation of Sensex involves dividing the total market capitalisation of 30 companies in the

index by a number called the Index Divisor. The Divisor is the only link to the original base period

value of the Sensex. It keeps the index comparable over time and is the adjustment point for all

index maintenance adjustments. During market hours, prices of the index scrips, at which latest trades are executed, are used by the trading system to calculate Sensex every 15 seconds, and the value is disseminated all over the country through BOLT terminals in real time.

Calculation of Closing SENSEX

The closing Sensex is computed taking the weighted average of all the trades on Sensex constituents

in the last 15 minutes of trading session. If a Sensex constituent has not traded in the last 15 minutes,

the last traded price is taken for computation of the index closure. If a Sensex constituent has not

traded at all in a day, then its last day’s closing price is taken for computation of index closure. The

use of Index Closure Algorithm prevents any intentional manipulation of the closing index value.
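The fall-back rules for the closing price of a single constituent can be sketched in Python as below. The lesson does not say what the weights in the weighted average are; this sketch assumes that each trade is weighted by the quantity traded, and all names and figures are hypothetical.

```python
def closing_price(last_15_min_trades, last_traded_price, previous_close):
    """Closing price of one constituent under the rules described above.

    last_15_min_trades: list of (price, quantity) pairs from the last 15 minutes.
    last_traded_price: price of the last trade of the day, or None if not traded today.
    previous_close: closing price of the previous trading day.
    """
    if last_15_min_trades:
        # Assumption: the weighted average uses traded quantity as the weight.
        total_quantity = sum(qty for _, qty in last_15_min_trades)
        return sum(price * qty for price, qty in last_15_min_trades) / total_quantity
    if last_traded_price is not None:
        # Traded today, but not in the last 15 minutes.
        return last_traded_price
    # Not traded at all today.
    return previous_close


print(closing_price([(100.0, 10), (110.0, 30)], 105.0, 99.0))
print(closing_price([], 105.0, 99.0))
print(closing_price([], None, 99.0))
```

Averaging over all trades in the closing window means that a single trade at an unusual price in the final seconds has only a limited effect on the closing value.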


BSE Sensex as on January 7, 2012

Scrip code    Company    Close price    No. of shares (normal)    Full mkt. cap. (Rs. crore)    Free-float adj. factor    Free-float mkt. cap. (Rs. crore)    Weight in index (%)

500209 Infosys Ltd. 2,837.50 574,203,082 162,930.12 0.85 138,490.61 10.66

500325 Reliance 714.70 3,274,452,139 234,025.09 0.55 128,713.80 9.91

500875 ITC Ltd. 201.60 7,789,453,850 157,035.39 0.70 109,924.77 8.46

500010 Housing Deve. 668.05 1,470,391,801 98,229.52 0.95 93,318.05 7.18

532174 ICICI Bank Ltd. 745.45 1,152,540,454 85,916.13 1.00 85,916.13 6.61

500180 HDFC Bank Ltd. 451.10 2,337,610,305 105,449.60 0.80 84,359.68 6.49

532540 TCS Ltd. 1,171.35 1,957,220,996 229,259.08 0.30 68,777.72 5.29

500510 Larsen & Toubro 1,081.45 611,844,627 66,167.94 0.90 59,551.14 4.58

500112 State Bank of India 1,669.75 634,999,595 106,029.06 0.45 47,713.08 3.67

532454 Bharti Airtel 330.90 3,797,530,096 125,660.27 0.35 43,981.09 3.39

500312 ONG Corp. Ltd. 256.65 8,555,490,120 219,576.65 0.20 43,915.33 3.38

500696 Hind Uni Ltd. 396.00 2,160,391,918 85,551.52 0.50 42,775.76 3.29

500570 Tata Motors 203.55 2,691,486,150 54,785.20 0.65 35,610.38 2.74

500520 Mahindra & M 654.25 613,974,839 40,169.30 0.75 30,126.98 2.32

532555 NTPC Ltd. 157.15 8,245,464,400 129,577.47 0.20 25,915.49 2.00

507685 Wipro Ltd. 406.55 2,457,821,578 99,922.74 0.25 24,980.68 1.92

500470 Tata Steel 362.80 959,214,779 34,800.31 0.70 24,360.22 1.88

500103 BHEL 250.65 2,447,600,000 61,349.09 0.35 21,472.18 1.65

532977 Bajaj Auto 1,448.05 289,367,020 41,901.79 0.50 20,950.90 1.61

524715 Sun Pharmace 500.40 1,035,550,385 51,818.94 0.40 20,727.58 1.60

533278 Coal India 319.80 6,316,364,400 201,997.33 0.10 20,199.73 1.56

532286 Jindal Steel 466.05 934,509,595 43,552.82 0.45 19,598.77 1.51

500087 CIPLA Ltd. 335.80 802,921,357 26,962.10 0.65 17,525.36 1.35

500182 Hero Moto Co. 1,729.80 199,687,500 34,541.94 0.50 17,270.97 1.33

500440 Hindalco In 117.95 1,918,596,448 22,629.85 0.70 15,840.89 1.22

500400 Tata Power 91.95 2,373,072,360 21,820.40 0.70 15,274.28 1.18

500900 Sterlite In 94.35 3,360,700,478 31,708.21 0.45 14,268.69 1.10

532500 Maruti Suzuki 954.75 288,910,060 27,583.69 0.50 13,791.84 1.06

532868 DLF Limited 176.60 1,698,157,659 29,989.46 0.25 7,497.37 0.58

532532 Jaiprak Asso 51.90 2,126,433,182 11,036.19 0.55 6,069.90 0.47

Total 2,641,977.20 1,298,919.37


Sectorwise Market Capitalisation of SENSEX as on January 7, 2012

Sl. No.    SENSEX / Sectors    Free float market capitalization (Rs. crore)    %

SENSEX 1,298,919.37 100.00

1. Finance 311,306.94 23.97

2. Information Technology 232,249.01 17.88

3. Oil & Gas 172,629.13 13.29

4. FMCG 152,700.53 11.76

5. Transport Equipments 117,751.07 9.07

6. Metal, Metal Product & Mining 94,268.30 7.26

7. Capital Goods 81,023.32 6.24

8. Telecom 43,981.09 3.39

9. Power 41,189.77 3.17

10. Healthcare 38,252.94 2.94

11. Housing Related 13,567.27 1.04

From 1st September 2003, the country’s equity benchmark Sensex is being calculated based on the

Free-float methodology. Prior to 1-9-2003, the Sensex was calculated based on the full market

capitalisation methodology.

Globally, the Free-float methodology of index construction is considered to be an industry best

practice and all major index providers like MSCI, FTSE, S&P and STOXX have adopted the same.

The MSCI India Standard Index is also based on the Free-float methodology.

In India, BSE pioneered the concept of Free-float with the launch of the country’s first Free-

float based index – BSE TECk in July 2001 and BANKEX in June 2003. The shifting of Sensex to

this methodology is a culmination of successful experiences with these two indices and a series of

debates and discussions in the last few years.

The new methodology would align the Sensex with the best global practice in index construction.

A smooth transition from full market capitalisation to Free-float market capitalisation methodology

would ensure that the basic characteristics of Sensex are retained. Importantly, the Free-float

methodology will further improve the benchmarking qualities of Sensex while maintaining its

historical continuity.

The following Free-float factors will be applied to the Sensex companies. A Free-float factor

of say 0.9 means that only 90% of the total market capitalisation of that company would be taken

into consideration for index calculation.


Free-float Index

Currently all equity indices in India, except the BSE-TECk Index and BANKEX, are calculated

using the ‘full-market capitalisation’ methodology. Under the ‘full-market capitalisation’

methodology, the total market capitalisation of a company, irrespective of who is holding the shares,

is taken into consideration for computation of an index. However, if instead of taking the total

market capitalisation, only the Free-float market capitalisation of a company is considered for index

calculation, it is called the Free-float methodology. Free-float market capitalisation is defined as

that proportion of total shares issued by the company, which are readily available for trading in the

market. It generally excludes promoters’ holding, government holding, strategic holding and other

locked-in shares, which will not come to the market for trading in the normal course. Thus, the

market capitalisation of each company in a Free-float index is reduced to the extent of its Free-float

available in the market.

National Stock Exchange (NSE)

In order to provide a nationwide stock trading facility to investors and to bring the Indian financial

market in line with international market, the National Stock Exchange (NSE) was set up and it

started its operations by the end of 1993. Further, it started trading in debt instruments in May, 1994

and in equity shares by the end of November 1994. The NSE uses the electronic trading system and

computerised settlement system. This system is so designed that it can be extended to every corner

of the country through the medium of electronic network. It was recently accorded recognition as a

stock exchange by the Department of Company Affairs. The instruments traded are treasury bills,

government security and bonds issued by public sector companies.

The exchange has two separate segments, i.e., the capital market segment and the money market segment. The former is concerned with trading in equity shares, convertible debentures and debt instruments such as non-convertible debentures. The money market segment, also known as the wholesale debt market segment, facilitates trading in debts, public sector bonds, mutual fund units, treasury bills, government securities, call money instruments, etc. The transactions in this segment are of

high values. The main participants, in this market are usually banks, financial institutions and other

financial agencies.

NSE-50, NIFTY

The NSE-50 index, NIFTY, was launched by the National Stock Exchange of India Limited (NSE) in April 1996, taking as base the closing prices of November 3, 1995, when one year of operations of its capital market segment was completed. According to the NSE, the index was introduced with the objectives of:

1. reflecting market movement more accurately,

2. providing fund managers with a tool for measuring portfolio returns vis-a-vis market

returns, and

3. providing a basis for introducing index-based derivatives.

The index is based on the prices of shares of 50 companies (chosen from among the companies


traded on the NSE), each with a market capitalisation of at least Rs. 500 crores and having a high

degree of liquidity. The methodology used for the computation of this index is ‘market capitalisation

weightage’ as followed by the S&P-500. The base value of the index has been set at 1000, and not

the usual 100.

S&P CNX NIFTY

The S&P CNX Nifty is the headline index on the National Stock Exchange of India Ltd (NSE). It includes 50 of the approximately 1,300 companies listed on the NSE, captures approximately 60% of its equity market capitalization and is a true reflection of the Indian stock market.

S&P CNX Nifty tracks the behaviour of a portfolio of blue chip companies, the largest and most liquid Indian securities. It covers 25 sectors of the Indian economy and offers investment managers exposure to the Indian market in one efficient portfolio. The index has been trading since April 1996 and is well suited for benchmarking, index funds, and index-based derivatives.

The S&P CNX Nifty index is owned and managed by the Indian Index Services and Products

Ltd. (IISL), with which Standard and Poor’s has a consulting and licensing agreement. IISL is a

joint venture between NSE and CRISIL (formerly Credit Rating Information Services of India Ltd.).

Index Methodology

S&P CNX Nifty is maintained by IISL’s Index Policy Committee, which manages policy and

guidelines for all CNX (CRISIL/NSE) indices. This Index Policy Committee follows a clear published

set of rules for index revision and meets quarterly to consider their application. Additionally, the

IISL’s Index Maintenance Sub-Committee reviews decisions about additions and deletions to the

index on a quarterly basis. Complete details of these rules are available on the website at

www.indices.standardandpoors.com.

NIFTY COMPOSITION

Sl. No.    Scrip    Equity Capital    Free Float    Weightage %    Beta    R2    Volatility    Monthly Returns    Impact Cost

(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)

1. ACC 1,877,452,660 9,629 0.59 0.72 0.48 1.23 6.43 0.07

2. Ambuja Cem 3,063,349,822 10,652 0.65 0.97 0.48 1.61 –3.22 0.08

3. Axis Bank 4,116,997,330 34,573 2.10 1.36 0.74 1.86 3.72 0.07

4. Bajaj Auto 2,893,670,200 19,722 1.20 0.75 0.49 1.07 4.44 0.06

5. Bharti Airtel 18,987,650,480 52,650 3.20 0.76 0.46 2.05 10.72 0.07

6. BHEL 4,895,200,000 29,067 1.77 0.86 0.58 1.99 –10.29 0.06

7. BPCL 3,615,421,240 8,500 0.52 0.79 0.45 0.92 1.17 0.07

8. CAIRN 19,022,340,290 11,439 0.70 0.59 0.40 1.79 –1.11 0.07

9. CIPLA 1,605,842,714 15,558 0.95 0.71 0.51 1.26 –7.39 0.07

10. DLF 3,395,150,248 8,380 0.51 1.42 0.66 2.59 9.76 0.07

11. DR Reddy 846,959,090 19,974 1.22 0.59 0.46 1.55 3.53 0.08


12. GAIL 12,684,774,000 20,712 1.26 0.62 0.46 1.34 4.51 0.08

13. GRASIM 917,018,380 13,942 0.85 0.67 0.49 1.37 4.78 0.08

14. HCL Tech 1,376,020,536 11,839 0.72 0.95 0.60 1.21 –1.61 0.06

15. HDFC 2,936,038,650 89,641 5.45 1.19 0.71 1.26 –2.45 0.07

16. HDFC Bank 4,667,711,360 87,080 5.30 1.06 0.74 1.06 –3.24 0.07

17. Hero Honda 399,375,000 17,035 1.04 0.53 0.25 1.20 –4.91 0.06

18. Hindal Co. 1,914,419,297 21,647 1.32 1.42 0.67 2.44 –6.70 0.06

19. Hind Unilever 2,160,683,560 33,208 2.02 0.58 0.42 0.84 –5.72 0.05

20. ICICI Bank 11,518,614,870 119,419 7.27 1.40 0.78 1.47 –5.29 0.06

21. IDFC 14,627,715,770 15,139 0.92 1.47 0.71 2.45 –3.93 0.07

22. INFY 2,870,938,460 133,825 8.14 0.83 0.62 1.38 –4.62 0.04

23. ITC 7,738,144,280 110,948 6.75 0.75 0.54 1.26 2.66 0.06

24. Jindal Steel 934,509,595 22,847 1.39 0.99 0.67 1.21 –9.87 0.06

25. JP Associate 4,252,866,364 7,527 0.46 1.78 0.70 1.69 –17.73 0.07

26. Kotak Bank 3,689,051,845 15,763 0.96 1.20 0.67 2.02 –7.16 0.08

27. LT 1,220,046,436 92,399 5.62 1.18 0.73 1.68 –5.38 0.06

28. M&M 3,069,874,195 33,234 2.02 1.20 0.65 1.99 2.69 0.07

29. Maruti 1,444,550,300 15,963 0.97 0.82 0.56 1.14 4.03 0.06

30. NTPC 82,454,644,000 22,507 1.37 0.75 0.57 1.23 –5.78 0.07

31. ONGC 42,777,450,600 36,322 2.21 0.71 0.48 0.94 –1.82 0.07

32. PNB 3,168,121,570 14,955 0.91 0.97 0.64 1.73 3.18 0.08

33. Power Grid 46,297,253,530 14,879 0.91 0.53 0.45 0.96 –4.02 0.06

34. Ranbaxy 2,106,816,150 8,231 0.50 0.89 0.50 1.29 –0.35 0.06

35. RCom 10,320,134,405 6,736 0.41 1.22 0.47 4.16 6.11 0.08

36. Rel Capital 2,456,328,000 6,484 0.39 1.24 0.57 1.54 –0.50 0.06

37. Reliance 32,738,103,000 139,612 8.50 0.96 0.68 1.60 –7.85 0.06

38. Re Infra 2,653,702,620 7,502 0.46 1.13 0.46 2.08 1.07 0.07

39. R Power 28,051,264,660 6,149 0.37 0.98 0.52 1.19 –3.41 0.07

40. SAIL 41,304,005,450 7,400 0.45 1.06 0.61 1.64 –8.11 0.07

41. SBIN 6,349,989,910 60,447 3.68 1.25 0.69 1.35 –2.49 0.04

42. SESAGOA 869,101,423 10,727 0.65 1.05 0.51 1.95 –2.48 0.06

43. Siemens 680,589,800 7,881 0.48 0.52 0.33 1.17 3.86 0.07

44. Ster 3,361,568,684 22,704 1.38 1.21 0.61 1.58 –5.05 0.06

45. Sun Pharma 1,035,581,955 19,472 1.18 0.68 0.45 1.17 4.08 0.07


46. Tata Motors 5,382,725,080 33,259 2.02 1.27 0.61 1.93 –4.64 0.05

47. Tata Power 2,373,072,360 20,733 1.26 0.57 0.46 1.30 –2.07 0.07

48. Tata Steel 9,592,144,500 37,547 2.28 1.15 0.70 1.14 –7.58 0.05

49. TCS 1,957,220,996 57,746 3.51 0.95 0.61 1.15 –3.99 0.06

50. WIPRO 4,911,272,378 19,741 1.20 0.83 0.56 1.37 –6.78 0.06

1.12 SUMMARY

An index number measures relative changes in the value of some economic variable/s over a

period of time. It is always expressed in terms of a base of usually 100.

Index numbers showing changes in the values of one variable over time are called univariate

while those showing changes in a group of variables are known as composite index numbers.

The base of an index may either be fixed or chained. In fixed base index numbers, the base

period is common while in chain base indices, for every period its immediately preceding

period is taken as the base.

In order to compare two time series of price relatives, it is necessary that each series should

have the same base period.

The composite index numbers may be simple or weighted, and aggregative or average-of-

relatives.

Simple aggregative price index shows the aggregate of the current year prices as a percentage

of the aggregate of the base year prices.

For weighted aggregative price indices, the quantities are used as weights whereas for

aggregative quantity indices, the prices are used as weights.

The two basic aggregative price indices are those using base year quantities as weights, known

as Laspeyre’s index and current year quantities as weights, known as Paasche’s index.

Fisher’s index is equal to the geometric mean of Laspeyre’s and Paasche’s indexes.

There are four tests of adequacy of index numbers: units test, time-reversal test, factor-reversal

test and circular test. Fisher’s method is the only one that satisfies the first three of these tests.

Value index may be calculated by taking the ratio of current year value to base year value

expressed as a percentage.

Purchasing power of rupee varies inversely with the price index.

Splicing refers to joining two or more series of index numbers for the reason of continuity.

Exercise 1 : True or False Statements

(i) Index numbers are used to measure changes in the magnitude of one variable or a group

of distinct but related variables.


(ii) Univariate index numbers involve only one variable.

(iii) Index numbers are called specialized averages because they are used only for special

purposes.

(iv) For a given set of data, the simple average-of-price relatives index using arithmetic mean

would be smaller than simple average-of-price relatives index using geometric mean if

the current year prices are lower than the corresponding base year prices.

(v) Laspeyre’s formula uses base year quantities as weights both for price and quantity index

numbers.

(vi) Laspeyre’s price index can never be smaller than 100 in value.

(vii) Since it is not practical to include all commodities in the construction of an index number, the sample of commodities should be selected by the method of simple random sampling for reasons of objectivity.

(viii) For a given set of data, Fisher’s price index number cannot exceed both Laspeyre’s and Paasche’s price indices.

(ix) Fixed base index numbers are also called link relatives while price relatives is another

name for chain base indices.

(x) In weighted average-of-relatives index, the weights used are either the base year quantities

or the current year quantities.

(xi) Paasche’s price index is calculated by using current year values as weights.

(xii) Laspeyre’s price index always has greater value than Paasche’s price index.

(xiii) Dorbish-Bowley index is equal to the arithmetic mean of Laspeyre’s and Paasche’s price

indices.

(xiv) The value index is given by the product of price and quantity index numbers.

(xv) If an index satisfies time-reversal test, it means that for that index, P01 and P10 are reciprocals of each other.

(xvi) Simple aggregative index satisfies circular test.

(xvii) Fisher’s index is called ‘ideal’ because it satisfies all the tests of adequacy of index numbers.

(xviii) Splicing refers to connecting two or more series of index numbers for the purpose of

continuity.

(xix) The purchasing power of rupee is inversely related to the price index.

Ans. 1. T, 2. T, 3. F, 4. F, 5. F, 6. F, 7. F, 8. T, 9. F, 10. F, 11. F, 12. F, 13. T, 14. F, 15. T, 16. T, 17. F,

18. T, 19. T

Exercise 2 : Questions and Answers

(i) What are index numbers? Why are they called specialized averages?

(ii) “Index numbers are said to be economic barometers.” Explain this statement.


(iii) Distinguish between simple and weighted index numbers. Explain the importance of

weighting in the construction of index numbers. Enumerate some of the important methods

of weighting a price index and discuss their relative merits and demerits.

(iv) Laspeyre’s price index generally shows an upward bias while Paasche’s index shows a downward bias. Do you agree? Explain.

(v) Explain and illustrate the following:

(a) Base shifting

(b) Splicing

(c) Deflating

(vii) What is consumer price index number and what is its utility? What methods are used to

calculate these numbers?

(viii) State and explain the tests of adequacy of index numbers. Which of these are satisfied by

the following index numbers?

(a) Laspeyre

(b) Paasche

(c) Fisher

(d) Dorbish-Bowley

(e) Marshall-Edgeworth

(ix) Information on the price of a commodity over the last few years is given below:

Year: 2004 2005 2006 2007 2008 2009 2010 2011 2012

Price in Rs per kg: 62 68 63 69 75 78 82 98 100

(a) Construct price index numbers taking (i) 2004 as base, and (ii) 2008 as base.

(b) Calculate chain base index numbers.

(x) The price index for 2006 stood at 100. It increased by 8 percent in 2007, decreased by 6 percent in 2008, decreased by 2 percent in 2009, increased by 14 percent in 2010, remained unchanged in 2011, and increased by 12 percent in 2012. Calculate index numbers for the years 2006 through 2012 taking 2006 = 100, and then shift the origin to 2008.

(xi) Using the following link relatives, calculate price relatives taking 2005 = 100.

Year: 2005 2006 2007 2008 2009 2010 2011 2012

Link relative: 100 114 120 120 114 136 105 110

(xii) Using following data, calculate the simple average-of-price relatives index by taking

(a) Prices of 2005 as the base

(b) Average prices as the base


Year Commodity

A B C D E

2005 20 12 40 24 36

2008 30 20 20 30 15

2012 40 10 20 42 15

(xiii) Calculate Laspeyre’s and Paasche’s price index numbers for the year 2012 using the

following data about five commodities:

            Year    Commodity
                    A     B     C     D     E

Quantity:   2010    12    20    180   36    48

            2012    9     32    244   25    60

Price:      2010    24    18    24    48    24

            2012    28    20    22    52    20

Further, test whether each of these satisfies the time-reversal test.

(xiv) From the following data, construct quantity index numbers using (a) Fisher’s method,

and (b) Marshall-Edgeworth method.

Commodity Base year Current year

Price Quantity Price Value

A 16 30 40 720

B 25 40 50 2,000

C 24 25 30 1,200

D 54 16 40 1,320

E 22 44 45 1,350

(xv) Using the following data, show that Fisher’s formula satisfies the time-reversal and factor-

reversal tests.

Commodity Base year Current year

Price Value Price Value

A 6 300 10 440

B 3 300 8 1,200

C 12 720 15 1,050

D 15 600 20 900

E 4 60 5 80

(xvi) An enquiry into the budgets of the middle class families of a certain city revealed that on

an average the percentage expenses on the different groups were: Food 45, Clothing 12,

Fuel & Light 8, and Miscellaneous 20. The increases in group index numbers for the

current year as compared with a fixed base period were, respectively, 310, 50, 243, 148

and 185.


(a) Calculate the consumer price index number for the current year.

(b) A person was getting Rs. 14,400 pm in the base year and Rs. 28,300 pm in the current

year. State how much he ought to have received as extra allowance to maintain his former

standard of living.

(xvii) From the information given below about the consumer price index number for a certain

group of families in a city, obtain the percentage weights assigned to (a) clothing, and (b)

housing. The consumer price index number is known to be 152.3.

Group: Food Clothing Housing Fuel and electricity Miscellaneous

Index: 140 185 205 120 156

Weight: 60 ? ? 8 10

(xviii) The monthly income of a person is Rs. 21,000. It is given that the consumer price index

number for a particular month is 136. Find out the amount spent by him on (i) food, and

(ii) clothing.

Group Expenditure Index

Food ? 180

Rent 2,940 100

Clothing ? 150

Fuel and power 3,360 110

Miscellaneous 3,780 80

(xix) Owing to a sudden price disturbance, the consumer price index of a working class in a

certain area increased in a month by one-quarter of what it was before, to 225. The index

of food became 252 from 198, that of clothing from 185 to 205, that of fuel & lighting

from 175 to 195, and that of miscellaneous from 138 to 212. The index of rent, however,

remained unchanged at 150. It was known that the weights of clothing, rent and fuel &

lighting were the same. Find out the exact percentage weights of each of the groups.

Ans. 12. (a) 106.67, 110 (b) 108.20, 95.95, 95.84, 13. L = 94.52, P = 94.16, No 14. (a) = 109.66

(b) = 107.61, 16. (a) = 325 (b) = 18500, 17. (a) = 10 (b) = 12, 18. (i) = 8400 (ii) = 2520, 19. F = 54,

C = FL = R = 10, Misc = 16


UNIT V

LESSON 1

TIME SERIES ANALYSIS

1. STRUCTURE

1.0 Objective

1.1 Introduction

1.2 Components of Time Series

1.2.1 Secular Trend

1.2.2 Seasonal Variations

1.2.3 Business Cycle

1.2.4 Irregular Variations

1.3 Models of Time Series

1.4 Methods of Measuring Trend

1.4.1 Freehand Curve Method

1.4.2 Moving Averages

1.4.3 Semi-average Method

1.4.4 Method of Least Squares

1.5 Second Degree Parabola

1.6 Exponential Trend

1.7 Shifting the Trend Origin

1.8 Conversion of Annual Trend to Monthly Trend

1.9 Measurement of Seasonal Variations

1.9.1 Method of Simple Averages

1.9.2 Ratio-to-Moving Average Method

1.9.3 Ratio-to-Trend Method

1.9.4 Link Relatives Method

1.10 Summary

1.11 Self Assessment Questions

1.0 OBJECTIVE

After studying this lesson, you should be able to :

(i) Understand the meaning of time series analysis

(ii) Understand the importance and components of time series

(iii) Measure trend and seasonal variations by different methods

(iv) Measure trend by second degree parabola and exponential technique

(v) Learn shifting the trend origin and conversion of annual trend to monthly basis and vice-

versa.

1.1 INTRODUCTION

When quantitative data are arranged in the order of their occurrence, the resulting statistical series

is called a time series. The quantitative values are usually recorded at equal time intervals: daily,

weekly, monthly, quarterly, half yearly, yearly, or any other time measure. Monthly statistics of

industrial production in India, annual birth-rate figures for the entire world, yield on ordinary shares,

weekly wholesale price of rice, daily records of tea sales or census data are some of the examples of

time series. Each has a common characteristic of recording magnitudes that vary with passage of

time.

Time series are influenced by a variety of forces. Some are continuously effective, others make

themselves felt at recurring time intervals, and still others are non-recurring or random in nature.

Therefore, the first task is to break down the data and study each of these influences in isolation.

This is known as decomposition of the time series. It enables us to understand fully the nature of the

forces at work. We can then analyse their combined interactions. Such a study is known as time-

series analysis.

1.2 COMPONENTS OF TIME SERIES

A time series consists of the following four components or elements :

1. Basic or Secular or Long-time trend;

2. Seasonal variations;

3. Business cycles or cyclical movement; and

4. Erratic or Irregular fluctuations.

These components provide a basis for the explanation of the past behaviour. They help us to

predict the future behaviour. The major tendency of each component or constituent is largely due to

causal factors. Therefore a brief description of the components and the causal factors associated

with each component should be given before proceeding further.

1.2.1 Secular Trend

The basic trend is the underlying tendency of a series to grow or decline over a period of years. It is the movement

that the series would have taken, had there been no seasonal, cyclical or erratic factors. It is the

effect of such factors which are more or less constant for a long time or which change very gradually

and slowly. Such factors are gradual growth in population, tastes and habits or the effect on industrial

output due to improved methods. Increase in production of automobiles and a gradual decrease in

production of foodgrains are examples of increasing and decreasing secular trend.

Not all basic trends are of the same nature. Sometimes the predominating tendency will be a

constant amount of growth. This type of trend movement takes the form of a straight line when the

trend values are plotted on a graph paper. Sometimes the trend will be a constant percentage increase

or decrease. This type takes the form of a straight line when the trend values are plotted on a semi-

logarithmic chart. Other types of trend encountered are “logistic”, “S-curves”, etc.

Properly recognising and accurately measuring basic trends is one of the most important

problems in time series analysis. Trend values are used as the base from which the other three movements

are measured. Therefore, any inaccuracy in its measurement may vitiate the entire work. Fortunately,

the causal elements controlling trend growth are relatively stable. Trends do not commonly change

their nature quickly and without warning. It is therefore reasonable to assume that a representative

trend, which has characterized the data for a past period, is prevailing at present, and that it may be

projected into the future for a year or so.


1.2.2 Seasonal Variations

The two principal factors responsible for seasonal changes are the climate or weather and customs. Since

the growth of all vegetation depends upon temperature and moisture, agricultural activity is confined

largely to warm weather in the temperate zones and to the rainy or post-rainy season in the torrid

zone (tropical countries or sub-tropical countries like India). Winter and dry season make farming a

highly seasonal business. This high irregularity of month to month agricultural production determines

largely all harvesting, marketing, canning, preserving, storing, financing, and pricing of farm products.

Manufacturers, bankers and merchants who deal with farmers find their business taking on the

same seasonal pattern which characterises the agriculture of their area.

The second cause of seasonal variation is custom, education or tradition. Such traditional days

as Diwali, Christmas, Id, etc., produce marked variations in business activity, travel, sales, gifts,

finance, accidents, and vacationing.

The successful operation of any business requires that its seasonal variations be known,

measured and exploited fully. Frequently, the purchase of seasonal items is made from six months to

a year in advance. Departments with opposite seasonal changes are frequently combined in the

same firm to avoid dull seasons and to keep sales or production up during the entire year.

Seasonal variations are measured as a percentage of the trend rather than in absolute quantities.

The seasonal index for any month (week, quarter etc.) may be defined as the ratio of the normally

expected value (excluding the business cycle and erratic movements) to the corresponding trend

value. When cyclical movement and erratic fluctuations are absent in a time series, such a series is

called normal. Normal values thus consist of trend and seasonal components. When

normal values are divided by the corresponding trend values, we obtain the seasonal component of

time series.

1.2.3 Business Cycle

Because of the persistent tendency for business to prosper, decline, stagnate, recover, and prosper

again, the third characteristic movement in economic time series is called the business cycle. The

business cycle does not recur regularly like seasonal movement, but moves in response to causes

which develop intermittently out of complex combinations of economic and other considerations.

When the business of a country or a community is above or below normal, the excess or deficiency

is usually attributed to the business cycle. Its measurement becomes a process of contrasting occurrences

with a normal estimate arrived at by combining the calculated trend and seasonal movements. The

measurement of the variations from normal may be made in terms of actual quantities or it may be

made in such terms as percentage deviations, which is generally the more satisfactory method as it

places the measure of cyclical tendencies on a comparable base throughout the entire period under

analysis.

1.2.4 Irregular Variations

These movements are exceedingly difficult to dissociate quantitatively from the business cycle.

Their causes are irregular and unpredictable happenings such as wars, droughts, floods, fires,

pestilence, fads and fashions, which operate as spurs or deterrents upon the progress of the cycle.

Examples of such movements are : high activity in the mid-forties due to the erratic effects of the Second World

War, the depression of the thirties throughout the world, and the export boom associated with the Korean War in 1950.

The common denominator of every random factor is that it does not come about as a result of the

ordinary operation of the business system and does not recur in any meaningful manner.


1.3 MODELS OF TIME SERIES

A time series may not be affected by all types of variations. Some of these types of variations may

affect a few time series, while other series may be affected by all of them. Hence, in analysing

time series, these effects are isolated. In classical time series analysis it is assumed that any given

observation is made up of trend, seasonal, cyclical and irregular movements and these four

components have multiplicative relationship.

Symbolically:

O =T×S×C×I

where O refers to original data,

T refers to trend,

S refers to seasonal variations,

C refers to cyclical variations and

I refers to irregular variations.

This is the most commonly used model in the decomposition of time series.

There is another model called Additive model in which a particular observation in a time

series is the sum of these four components.

O =T+S+C+I

To prevent confusion between the two models, it should be made clear that in Multiplicative

model S, C, and I are indices expressed as decimal percents whereas in Additive model S, C and I

are quantitative deviations about trend that can be expressed as seasonal, cyclical and irregular in

nature.

If in a multiplicative model, T = 500, S = 1.4, C = 1.20 and I = 0.7 then

O =T×S×C×I

By substituting the values we get

O = 500 × 1.4 × 1.20 × 0.7 = 588

In additive model, T = 500, S = 100, C = 25, I = –50

O = 500 + 100 + 25 – 50 = 575

The assumption underlying the two schemes of analysis is that whereas there is no interaction

among the different constituents or components under the additive scheme, such interaction is very

much present in the multiplicative scheme. Time series analysis generally proceeds on the assumption

of multiplicative formulation.
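The two schemes can be compared with a short Python sketch. It simply recombines the component values used in the illustration above; the function names `multiplicative` and `additive` are illustrative choices and are not part of the lesson.

```python
# Recombine the four components under each classical model.
# Function names are illustrative assumptions, not standard terminology.

def multiplicative(trend, seasonal, cyclical, irregular):
    """O = T x S x C x I, where S, C and I are index ratios (1.0 means 100%)."""
    return trend * seasonal * cyclical * irregular


def additive(trend, seasonal, cyclical, irregular):
    """O = T + S + C + I, where S, C and I are absolute deviations about trend."""
    return trend + seasonal + cyclical + irregular


# Component values taken from the illustration in the text.
o_mult = multiplicative(500, 1.4, 1.20, 0.7)
o_add = additive(500, 100, 25, -50)

print("Multiplicative model:", round(o_mult, 2))
print("Additive model:", o_add)
```

In the multiplicative form a change in trend scales the seasonal, cyclical and irregular effects with it, whereas in the additive form each effect stays the same absolute size whatever the trend level.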

1.4 METHODS OF MEASURING TREND

Trend can be determined by : (i) freehand curve method ; (ii) moving averages method ; (iii) semi-

averages method ; and (iv) least-squares method. Each of these methods is described below.

1.4.1 Freehand Curve Method

The term freehand is applied to any non-mathematical curve in statistical analysis even if it is drawn

with the aid of drafting instruments. This is the simplest method of studying trend of a time series.

The procedure for drawing free hand curve is as follows :


(i) The original data are first plotted on a graph paper.

(ii) The direction of the plotted data is carefully observed.

(iii) A smooth line is drawn through the plotted points.

While fitting a trend line by the freehand method, an attempt should be made that the fitted

curve conforms to these conditions.

(i) The curve should be smooth, either a straight line or a combination of long gradual curves.

(ii) The trend line or curve should be drawn through the graph of the data in such a way that

the areas below and above the trend line are equal to each other.

(iii) The vertical deviations of the data above the trend line must equal the deviations below

the line.

(iv) Sum of the squares of the vertical deviations of the observations from the trend should be

minimum.

Example 1 : Draw a time series graph relating to the following data and fit the trend by freehand

method :

Year Production of Steel

(million tonnes)

2007 20

2008 22

2009 30

2010 28

2011 32

2012 25

2013 29

2014 35

2015 40

2016 32

[Figure : Trend of steel production. X-axis : years ; Y-axis : production of steel. The graph shows the actual data and the freehand trend line.]

The trend line drawn by the freehand method can be extended to project future values. However,

the free-hand curve fitting is too subjective and should not be used as a basis for prediction.


1.4.2 Moving Averages

The moving average is a simple and flexible process of trend measurement which is quite accurate

under certain conditions. This method establishes a trend by means of a series of averages covering

overlapping periods of the data.

The process of successively averaging, say, three years data, and establishing each average as

the moving-average value of the central year in the group, should be carried throughout the entire

series. For a five-item, seven-item or other moving averages, the same procedure is followed : the

average obtained each time being considered as representative of the middle period of the group.

The choice of a 5-year, 7-year, 9-year, or other moving average is determined by the length of

period necessary to eliminate the effects of the business cycle and erratic fluctuations. A good trend

must be free from such movements, and if there is any definite periodicity to the cycle, it is well to

have the moving average to cover one cycle period. Ordinarily, the necessary periods will range

between three and ten years for general business series but even longer periods are required for

certain industries.

In the preceding discussion, the moving averages of odd number of years were representatives

of the middle years. If the moving average covers an even number of years, each average will still

be representative of the mid-point of the period covered, but this mid-point will fall half way between

the two middle years. In the case of a four-year moving average, for instance each average represents

a point half way between the second and third years. In such a case, a second moving average may

be used to ‘recentre’ the averages. That is, if the first moving average gives averages centering

half-way between the years, a further two-point moving average will recentre the data exactly on

the years.

This method, however, is valuable in approximating trends in a period of transition when the

mathematical lines or curves may be inadequate. This method provides a basis for testing other

types of trends, even though the data are not such as to justify its use otherwise.

Example 2 : Calculate 5-yearly moving average trend for the time series given below.

Year : 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005

Quantity : 239 242 238 252 257 250 273 270 268 288 284

Year : 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

Quantity : 282 300 303 298 313 317 309 329 333 327

Solution :

Year Quantity 5-yearly moving total 5-yearly moving average

1995 239

1996 242

1997 238 1228 245.6

1998 252 1239 247.8

1999 257 1270 254.0

2000 250 1302 260.4

2001 273 1318 263.6

2002 270 1349 269.8


2003 268 1383 276.6

2004 288 1392 278.4

2005 284 1422 284.4

2006 282 1457 291.4

2007 300 1467 293.4

2008 303 1496 299.2

2009 298 1531 306.2

2010 313 1540 308.0

2011 317 1566 313.2

2012 309 1601 320.2

2013 329 1615 323.0

2014 333

2015 327

To simplify calculation work: Obtain the total of the first five years' data. Subtract the first term

from this total and add the sixth term to obtain the total of the second to sixth terms. In this

way the difference between the term to be included and the term to be omitted is added to the

preceding total in order to obtain the next successive total.
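The running-total shortcut described above can be sketched in Python as follows. This is a minimal illustration using the data of Example 2; the helper name `moving_averages` is an assumption made for this sketch.

```python
# Odd-period moving average computed with a running total:
# each new total = previous total + term entering - term leaving.

def moving_averages(values, period):
    """Return (moving totals, moving averages) for an odd period."""
    if period % 2 == 0 or period > len(values):
        raise ValueError("period must be odd and no longer than the series")
    total = sum(values[:period])          # total of the first `period` terms
    totals = [total]
    for i in range(period, len(values)):
        total += values[i] - values[i - period]
        totals.append(total)
    return totals, [t / period for t in totals]


years = list(range(1995, 2016))
quantity = [239, 242, 238, 252, 257, 250, 273, 270, 268, 288, 284,
            282, 300, 303, 298, 313, 317, 309, 329, 333, 327]

totals, averages = moving_averages(quantity, 5)

# Each average is placed against the middle year of its five-year group,
# so the first two and last two years have no trend value.
half = 5 // 2
for year, total, average in zip(years[half:], totals, averages):
    print(year, total, round(average, 1))
```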

Example 3 : Fit a trend line by the method of four-yearly moving average to the following time

series data.

Year : 2001 2002 2003 2004 2005 2006 2007 2008

Sugar production (lakh tons) : 5 6 7 7 6 8 9 10

Year : 2009 2010 2011 2012

Sugar production (lakh tons) : 9 10 11 11

Solution :

Columns : (1) Year ; (2) Sugar production (lakh tons) ; (3) 4-yearly moving total ; (4) 4-yearly moving average ;

(5) 2-yearly centred moving total ; (6) Trend value (2-yearly centred moving average)

2001 5

2002 6

2003 7 25 6.25 12.75 6.375

2004 7 26 6.50 13.50 6.75

2005 6 28 7.00 14.50 7.25

2006 8 30 7.50 15.75 7.875

2007 9 33 8.25 17.25 8.625


2008 10 36 9.00 18.50 9.25

2009 9 38 9.50 19.50 9.75

2010 10 40 10.00 20.25 10.125

2011 11 41 10.25

2012 11

Remark : Observe carefully the placement of totals, averages between the lines.
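The recentring step for an even period can be sketched in Python as below, using the sugar production data of Example 3. The function name `centred_moving_average` is an illustrative assumption.

```python
# Even-period moving average, recentred with a further 2-point average.

def centred_moving_average(values, period):
    """Return centred trend values for an even moving-average period."""
    if period % 2 != 0:
        raise ValueError("period must be even")
    # First pass: plain moving averages, each falling between two years.
    raw = [sum(values[i:i + period]) / period
           for i in range(len(values) - period + 1)]
    # Second pass: average adjacent pairs so each value sits on a year.
    return [(raw[i] + raw[i + 1]) / 2 for i in range(len(raw) - 1)]


sugar = [5, 6, 7, 7, 6, 8, 9, 10, 9, 10, 11, 11]   # 2001 to 2012
trend = centred_moving_average(sugar, 4)

# The first centred value belongs to the third year of the series.
first_year = 2001 + 4 // 2
for offset, value in enumerate(trend):
    print(first_year + offset, value)
```

With twelve observations and a four-year period, the sketch yields eight trend values, for 2003 to 2010, so two years are lost at each end of the series.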

Merits

1. This is a very simple method.

2. The element of flexibility is always present in this method, as the calculations do not have

to be altered if some more data are added. The new data only provide additional trend values.

3. If there is a coincidence of the period of moving averages and the period of cyclical

fluctuations, the fluctuations automatically disappear.

4. The pattern of moving average is determined in the trend of data and remains unaffected

by the choice of method to be employed.

5. It can be put to utmost use in case of series having strikingly irregular trend.

Limitations

1. It is not possible to have a trend value for each and every year. As the period of moving

average increases, there is always an increase in the number of years for which trend

values cannot be calculated and known. For example, in a five yearly moving average,

trend value cannot be obtained for the first two years and last two years, in a seven yearly

moving average for the first three years and last three years and so on. But usually values

of the extreme years are of great interest.

2. There is no hard and fast rule for the selection of a period of moving average.

3. Forecasting is one of the leading objectives of trend analysis. But this objective remains

unfulfilled because moving average is not represented by a mathematical function.

4. Theoretically it is claimed that cyclical fluctuations are ironed out if the period of the moving

average coincides with the period of the cycle, but in practice cycles are not perfectly periodic.

1.4.3 Semi-average Method

This simple method can be used if a straight line trend is to be obtained. Since the location of only

two points is necessary to obtain a straight line equation, it is obvious that we may select two

representative points and connect them by a straight line. Data are divided into two halves and an

average is obtained for each half. When each such average is plotted against the mid-point of its half

period, we obtain two points on a graph paper. By joining these points, a straight line trend is

obtained.

The method is to be commended for its simplicity and is used to some extent in practical work.

This method is also flexible, for it is permissible to select representative periods to determine the

two points. Unrepresentative years may be ignored.
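As a rough illustration of the semi-average method, the Python sketch below reuses the steel production figures of Example 1. The function name `semi_average_trend` is hypothetical, and the handling of an odd number of years (leaving out the middle year) is an assumption of this sketch; the lesson does not prescribe it.

```python
# Semi-average method: average each half of the series and join the two
# points (mid-year of half, average of half) by a straight line.

def semi_average_trend(years, values):
    """Return the two semi-average points and the slope of the joining line."""
    n = len(values)
    half = n // 2
    # Assumption: with an odd number of years the middle year is omitted.
    first = slice(0, half)
    second = slice(n - half, n)
    x1 = sum(years[first]) / half
    y1 = sum(values[first]) / half
    x2 = sum(years[second]) / half
    y2 = sum(values[second]) / half
    slope = (y2 - y1) / (x2 - x1)
    return (x1, y1), (x2, y2), slope


years = list(range(2007, 2017))
steel = [20, 22, 30, 28, 32, 25, 29, 35, 40, 32]   # data of Example 1

p1, p2, slope = semi_average_trend(years, steel)


def trend_value(year):
    """Trend value read off the straight line through the two points."""
    return p1[1] + slope * (year - p1[0])


print("First point:", p1, "Second point:", p2)
print("Annual increment:", round(slope, 2))
```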

1.4.4 Method of Least Squares

If a straight line will serve as a satisfactory trend for the data, perhaps the most accurate

method of fitting it is that of least squares. This method is designed to accomplish two results.

(i) The sum of the vertical deviations from the straight line must equal zero.

(ii) The sum of the squares of all deviations must be less than the sum of the squares for any

other conceivable straight line.

There will be many straight lines which can meet the first condition. Among all different lines,

only one line will satisfy the second condition. It is because of this second condition that this

method is known as the method of least squares. It may be mentioned that a line fitted to satisfy the

second condition, will automatically satisfy the first condition.

The formula for a straight-line trend can most simply be expressed as

Yc = a + bX

where X represents time variable, Yc is the dependent variable for which trend values are to be

calculated and a and b are the constants of the straight line to be found by the method of least

squares.

Constant a is the Y-intercept. This is the distance between the point of the origin (O) and the

point where the trend line and Y-axis intersect. It shows the value of Y when X = 0. Constant b

indicates the slope which is the change in Y for each unit change in X.

Let us assume that we are given observations of Y for n number of years. We wish to find the

values of constants a and b in such a manner that the two conditions laid down above are satisfied

by the fitted equation.

Mathematical reasoning suggests that, to obtain the values of constants a and b according to

the Principle of Least Squares, we have to solve simultaneously the following two equations.

$$\sum Y = na + b\sum X \qquad \ldots\text{(i)}$$

$$\sum XY = a\sum X + b\sum X^2 \qquad \ldots\text{(ii)}$$

Solution of the two normal equations yields the following values for the constants a and b:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

and

$$a = \frac{\sum Y - b\sum X}{n}$$

Least Squares Long Method : It makes use of the above mentioned two normal equations

without attempting to shift the time variable to convenient mid-year. This method is illustrated by

the following example.

Example 4 :

Fit a linear trend curve by the least-squares method to the following data :

Year Production (Kg.)

2006 3

2007 5

2008 6

2009 6


2010 8

2011 10

2012 11

2013 12

2014 13

2015 15

Solution : The first year 2006 is assumed to be 0, 2007 would become 1, 2008 would be 2 and so

on. The various steps are outlined in the following table.

Year Production (Y) X XY $X^2$

1 2 3 4 5

2006 3 0 0 0

2007 5 1 5 1

2008 6 2 12 4

2009 6 3 18 9

2010 8 4 32 16

2011 10 5 50 25

2012 11 6 66 36

2013 12 7 84 49

2014 13 8 104 64

2015 15 9 135 81

Total 89 45 506 285

The above table yields the following values for various terms mentioned below :

$n = 10$, $\sum X = 45$, $\sum X^2 = 285$, $\sum Y = 89$, and $\sum XY = 506$

Substituting these values in the two normal equations, we obtain

89 = 10a + 45b ...(i)

506 = 45a + 285b ...(ii)

Multiplying equation (i) by 9 and equation (ii) by 2, we obtain

801 = 90a + 405b ...(iii)

1012 = 90a + 570b ....(iv)

Subtracting equation (iii) from equation (iv), we obtain

211 = 165 b or b = 211/165 = 1.28

Substituting the value of b in equation (i), we obtain

89 = 10a + 45 × 1.28


89 = 10a + 57.60

10a = 89 – 57.6

10a = 31.4

a = 31.4/10 = 3.14

Substituting these values of a and b in the linear equation, we obtain the following trend line

Yc = 3.14 + 1.28X

Inserting various values of X in this equation, we obtain the trend values as below :

Year Observed Y a b×X Yc (Col. 3 plus Col. 4)

1 2 3 4 5

2006 3 3.14 1.28 × 0 3.14

2007 5 3.14 1.28 × 1 4.42

2008 6 3.14 1.28 × 2 5.70

2009 6 3.14 1.28 × 3 6.98

2010 8 3.14 1.28 × 4 8.26

2011 10 3.14 1.28 × 5 9.54

2012 11 3.14 1.28 × 6 10.82

2013 12 3.14 1.28 × 7 12.10

2014 13 3.14 1.28 × 8 13.38

2015 15 3.14 1.28 × 9 14.66
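The long method can be checked with a short Python sketch that evaluates the closed-form solutions of the two normal equations on the data of Example 4. The function name `fit_line` is an illustrative assumption. Because the sketch does not round b to 1.28 before computing a, its results may differ slightly in the second decimal place from the hand calculation above.

```python
# Least-squares straight line from the two normal equations (long method).

def fit_line(x, y):
    """Return (a, b) of the least-squares line Yc = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


production = [3, 5, 6, 6, 8, 10, 11, 12, 13, 15]   # 2006 to 2015
x = list(range(len(production)))                   # 2006 = 0, 2007 = 1, ...

a, b = fit_line(x, production)
trend = [a + b * xi for xi in x]
residuals = [yi - ti for yi, ti in zip(production, trend)]

print("a =", round(a, 2), " b =", round(b, 2))
# The first condition of least squares: deviations from the line sum to zero.
print("Sum of deviations:", round(sum(residuals), 6))
```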

Least Squares Method : We can take any other year as the origin, and for that year X would

be 0. Considerable saving of both time and effort is possible if the origin is taken in the middle of

the whole time span covered by the entire series. The origin would then be located at the mean of

the X values. Sum of the X values would then equal 0. The two normal equations would then be

simplified to

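$$\sum Y = Na \quad\text{or}\quad a = \frac{\sum Y}{N} \qquad \ldots\text{(i)}$$

and

$$\sum XY = b\sum X^2 \quad\text{or}\quad b = \frac{\sum XY}{\sum X^2} \qquad \ldots\text{(ii)}$$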

Two cases of the short cut method are given below. In the first case there is an odd number of years

while in the second case the number of observations is even.

Example 5 : Fit a straight line trend on the following data :

Year 2008 2009 2010 2011 2012 2013 2014 2015 2016

Y 4 7 7 8 9 11 13 14 17


Solution : Since we have 9 observations, the origin, is taken at 2012 for which X is assumed to

be 0.

Year Y X XY $X^2$

2008 4 –4 – 16 16

2009 7 –3 – 21 9

2010 7 –2 – 14 4

2011 8 –1 –8 1

2012 9 0 0 0

2013 11 1 11 1

2014 13 2 26 4

2015 14 3 42 9

2016 17 4 68 16

Total 90 0 88 60

Substituting these values in the two normal equations, we get

90 = 9a or a = 90/9 or a = 10

88 = 60b or b = 88/60 or b = 1.47

Trend equation is : Yc = 10 + 1.47 X

Inserting the various values of X, we obtain the trend values as below.

Years Observed Y X a b×X Yc(Col. 4 plus Col. 5)

2008 4 –4 10 1.47 × –4 = – 5.88 4.12

2009 7 –3 10 1.47 × –3 = – 4.41 5.59

2010 7 –2 10 1.47 × –2 = – 2.94 7.06

2011 8 –1 10 1.47 × –1 = – 1.47 8.53

2012 9 0 10 1.47 × 0 = 0 10.00

2013 11 1 10 1.47 × 1 = 1.47 11.47

2014 13 2 10 1.47 × 2 = 2.94 12.94

2015 14 3 10 1.47 × 3 = 4.41 14.41

2016 17 4 10 1.47 × 4 = 5.88 15.88

Example 6 : Fit a straight line trend to the data which gives number of passenger cars sold (millions)

Year 2009 2010 2011 2012 2013 2014 2015 2016

No. of cars 6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1

(millions)


Solution : Here there are two mid-years, viz., 2012 and 2013. The mid-point of the two years is

assumed to be 0 and a period of six months is treated as the unit of time. On this basis the calculations

are as shown below :

Year Observed Y X XY $X^2$

2009 6.7 –7 – 46.9 49

2010 5.3 –5 – 26.5 25

2011 4.3 –3 –12.9 9

2012 6.1 –1 – 6.1 1

2013 5.6 1 5.6 1

2014 7.9 3 23.7 9

2015 5.8 5 29.0 25

2016 6.1 7 42.7 49

Total 47.8 0 8.6 168

From the above computations, we get the following values.

$n = 8$, $\sum Y = 47.8$, $\sum X = 0$, $\sum XY = 8.6$, $\sum X^2 = 168$

Substituting these values in the two normal equations, we obtain

47.8 = 8a or a = 47.8/8 or a = 5.98

and 8.6 = 168b or b = 8.6/168 or b = 0.051

The equation for the trend line is : Yc = 5.98 + 0.051X

Trend values generated by this equation are given below.

Years Observed Y X a b×X Yc(Col. 4 plus Col. 5)

2009 6.7 –7 5.98 .051 × –7 = –.357 5.623

2010 5.3 –5 5.98 .051 × –5 = –.255 5.725

2011 4.3 –3 5.98 .051 × –3 = –.153 5.827

2012 6.1 –1 5.98 .051 × –1 = –.051 5.929

2013 5.6 1 5.98 .051 × 1 = .051 6.031

2014 7.9 3 5.98 .051 × 3 = .153 6.133

2015 5.8 5 5.98 .051 × 5 = .255 6.235

2016 6.1 7 5.98 .051 × 7 = .357 6.337
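Both cases of the short cut method can be reproduced with one Python sketch, shown here on the data of Examples 5 and 6. The helper names `centred_codes` and `fit_line_centred` are assumptions made for illustration. The time codes follow the convention used in the two solutions: unit steps for an odd number of years and half-year units (steps of 2) for an even number.

```python
# Short cut method: choose time codes that sum to zero so that
# a = sum(Y) / N and b = sum(XY) / sum(X^2).

def centred_codes(n):
    """Time codes summing to zero for a series of n equally spaced periods."""
    if n % 2 == 1:
        mid = n // 2
        return [i - mid for i in range(n)]        # ..., -1, 0, 1, ...
    return [2 * i - (n - 1) for i in range(n)]    # ..., -3, -1, 1, 3, ...


def fit_line_centred(y):
    """Return (codes, a, b) for the line Yc = a + bX with centred codes."""
    x = centred_codes(len(y))
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return x, a, b


# Example 5: nine years, origin at 2012, X unit = 1 year.
x5, a5, b5 = fit_line_centred([4, 7, 7, 8, 9, 11, 13, 14, 17])

# Example 6: eight years, origin between 2012 and 2013, X unit = 6 months.
x6, a6, b6 = fit_line_centred([6.7, 5.3, 4.3, 6.1, 5.6, 7.9, 5.8, 6.1])

print("Example 5: a =", round(a5, 2), " b =", round(b5, 2))
print("Example 6: a =", round(a6, 2), " b =", round(b6, 3))
```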

1.5 SECOND DEGREE PARABOLA

The simplest example of a non-linear trend is the second degree parabola, whose equation is written

in the form:

$$Y_c = a + bX + cX^2$$

When numerical values for a, b and c have been derived, the trend value for any year may be

computed substituting in the equation the value of X for that year. The values of a, b and c can be

determined by solving the following three normal equations simultaneously:

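$$\text{(i)}\quad \sum Y = Na + b\sum X + c\sum X^2$$

$$\text{(ii)}\quad \sum XY = a\sum X + b\sum X^2 + c\sum X^3$$

$$\text{(iii)}\quad \sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4$$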

Note that the first equation is merely the summation of the given function, the second is the

summation of X multiplied into the given function, and the third is the summation of $X^2$ multiplied

into the given function.

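When the time origin is taken at the middle of the period (at the middle year, or between the two middle years), $\sum X$ and $\sum X^3$ would be zero. In that case the equations

are reduced to:

$$\text{(i)}\quad \sum Y = Na + c\sum X^2$$

$$\text{(ii)}\quad \sum XY = b\sum X^2$$

$$\text{(iii)}\quad \sum X^2Y = a\sum X^2 + c\sum X^4$$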

The value of b can now directly be obtained from equation (ii) and value of a and c by solving

equations (i) and (iii) simultaneously. Thus,

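$$a = \frac{\sum Y - c\sum X^2}{N}; \qquad b = \frac{\sum XY}{\sum X^2}; \qquad c = \frac{N\sum X^2Y - \sum X^2 \sum Y}{N\sum X^4 - \left(\sum X^2\right)^2}$$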

Example 7 : The price of a commodity during 2010–2015 is given below. Fit a parabola

$Y = a + bX + cX^2$ to this data. Estimate the price of the commodity for the year 2016.

Year Price Year Price

2010 100 2013 140

2011 107 2014 181

2012 128 2015 192

Also plot the actual and trend values on graph.

Solution : To determine the value a, b and c, we solve the following normal equations:

$$\sum Y = Na + b\sum X + c\sum X^2$$

$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$$

$$\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4$$

Year Y X $X^2$ $X^3$ $X^4$ XY $X^2Y$ Yc

2010 100 –2 4 –8 16 – 200 400 97.744

2011 107 –1 1 –1 1 – 107 107 110.426

2012 128 0 0 0 0 0 0 126.680


2013 140 +1 1 +1 1 + 140 140 146.506

2014 181 +2 4 +8 16 + 362 724 169.904

2015 192 +3 9 +27 81 +576 1728 196.874

Total : $N = 6$, $\sum Y = 848$, $\sum X = 3$, $\sum X^2 = 19$, $\sum X^3 = 27$, $\sum X^4 = 115$, $\sum XY = 771$, $\sum X^2Y = 3099$, $\sum Y_c = 848.134$

848 = 6a + 3b + 19c ...(i)

771 = 3a + 19b + 27c ...(ii)

3,099 = 19a +27b + 115c ...(iii)

Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i), we get

35b + 35c = 694 ...(iv)

Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting (iii) from (ii), we get

5352 = 280b + 168c ...(v)

Solving Eqns. (iv) and (v), we get

c = 1.786

Substituting the value of c in Eqn. (iv), we get

b = 18.04 [35b + (35 × 1.786) = 694]

Putting the value of b and c in Eqn. (i), we get

a = 126.68 [848 = 6a + (3 × 18.04) + (19 × 1.786)]

Thus a = 126.68, b = 18.04 and c = 1.786

Substituting the values in the equation

$$Y_c = 126.68 + 18.04X + 1.786X^2$$

When $X = -2$, $Y = 126.68 + 18.04(-2) + 1.786(-2)^2$

$= 126.68 - 36.08 + 7.144 = 97.744$

When $X = -1$, $Y = 126.68 + 18.04(-1) + 1.786(-1)^2$

$= 126.68 - 18.04 + 1.786 = 110.426$

When $X = 0$, $Y = 126.68$

When $X = 1$, $Y = 126.68 + 18.04 + 1.786 = 146.506$

When $X = 2$, $Y = 126.68 + 18.04(2) + 1.786(2)^2$

$= 126.68 + 36.08 + 7.144 = 169.904$

When $X = 3$, $Y = 126.68 + 18.04(3) + 1.786(3)^2$

$= 126.68 + 54.12 + 16.074 = 196.874$

Price for 2016 (when $X = 4$) : $Y = 126.68 + 18.04(4) + 1.786(4)^2$

$= 126.68 + 72.16 + 28.576 = 227.416$

Thus the likely price of the commodity for the year 2016 is Rs. 227.416.
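The three normal equations of Example 7 can also be solved numerically. The Python sketch below builds them from the data and solves them by elimination; the helper names `solve_linear_system` and `fit_parabola` are illustrative assumptions. Since the sketch carries full precision throughout, its constants may differ from the hand-rounded figures above in the second decimal place.

```python
# Second degree parabola Yc = a + bX + cX^2 fitted by least squares.

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola(x, y):
    """Return (a, b, c) from the three normal equations."""
    n = len(x)

    def power_sum(p):
        return sum(xi ** p for xi in x)

    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2y = sum(xi * xi * yi for xi, yi in zip(x, y))
    matrix = [[n, power_sum(1), power_sum(2)],
              [power_sum(1), power_sum(2), power_sum(3)],
              [power_sum(2), power_sum(3), power_sum(4)]]
    return solve_linear_system(matrix, [sum_y, sum_xy, sum_x2y])


prices = [100, 107, 128, 140, 181, 192]   # 2010 to 2015
x = [-2, -1, 0, 1, 2, 3]                  # origin at 2012

a, b, c = fit_parabola(x, prices)
forecast_2016 = a + b * 4 + c * 4 ** 2    # X = 4 for 2016

print("a =", round(a, 2), " b =", round(b, 2), " c =", round(c, 3))
print("Estimated price for 2016:", round(forecast_2016, 1))
```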


1.6 EXPONENTIAL TRENDS

The equation of the exponential curve is

$$Y = ab^X$$

Putting the equation in logarithmic form, we get

log Y = log a +X log b.

When plotted on a semi-logarithmic graph, the curve gives a straight line. However, on an

arithmetic chart the curve gives a non-linear trend. To obtain the value of the constants a and b, the

two normal equations to be solved are :

$$\sum \log Y = N \log a + \log b \sum X$$

$$\sum (X \log Y) = \log a \sum X + \log b \sum X^2$$

where a is the Y intercept and b the slope of the curve.

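When deviations are taken from the middle year, $\sum X = 0$, and the above equations become

$$\sum \log Y = N \log a \quad\Rightarrow\quad \log a = \frac{\sum \log Y}{N}$$

and

$$\sum (X \log Y) = \log b \sum X^2 \quad\Rightarrow\quad \log b = \frac{\sum X \log Y}{\sum X^2}$$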

Steps. The steps in fitting a curve are :

(i) Find the time deviation of each year from the middle year and denote these deviations

by X.

(ii) Square these deviations and obtain $\sum X^2$.

(iii) Obtain logarithms of the variable Y.

(iv) Multiply log Y by the corresponding time deviation and obtain $\sum X \log Y$.

(v) Divide $\sum \log Y$ by N. This would give the value of log a.

(vi) Divide $\sum (X \log Y)$ by $\sum X^2$. This would give the value of log b, i.e., rate of growth or the

slope of the line.

(vii) Put the value of log a before the middle year and add or subtract the slope of the line, i.e.,

the value of log b to get trend ordinates in logarithms.

(viii) Take the antilogs of these logs to arrive at the actual trend values.

Example 8 : The sales of a company for the years 2009 to 2015 are given below :

Years : 2009 2010 2011 2012 2013 2014 2015

Sales (Rs. million) : 32 47 65 92 190 132 275

Estimate sales figure for the year 2018 using the equation of the form $Y = ab^X$ where X = years

and Y = Sales

Solution : Fitting Equation $Y = ab^X$

Year Sales (Rs. million) Y X log Y $X^2$ X log Y

2009 32 –3 1.5051 9 – 4.5153

2010 47 –2 1.6721 4 – 3.3442

2011 65 –1 1.8129 1 – 1.8129

2012 92 0 1.9638 0 0

2013 190 +1 2.2788 1 + 2.2788

2014 132 +2 2.1206 4 + 4.2412

2015 275 +3 2.4393 9 + 7.3179

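Total : $N = 7$, $\sum Y = 833$, $\sum X = 0$, $\sum \log Y = 13.7926$, $\sum X^2 = 28$, $\sum X \log Y = 4.1655$

$$\log a = \frac{\sum \log Y}{N} = \frac{13.7926}{7} = 1.9704$$

$$\log b = \frac{\sum X \log Y}{\sum X^2} = \frac{4.1655}{28} = 0.149$$

$$\log Y = 1.9704 + 0.149X$$

For 2018, X = 6, so log Y = 1.9704 + (0.149 × 6) = 2.8644 and Y = antilog (2.8644) = 731.8 approximately.

Thus the estimated sales for the year 2018 are about Rs. 731.8 million.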
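The steps for fitting the exponential trend can be sketched in Python on the data of Example 8. The function name `fit_exponential` is an illustrative assumption, and the sketch assumes the time deviations already sum to zero. It keeps log b unrounded, so its 2018 estimate may differ slightly from a hand calculation that rounds log b to 0.149.

```python
import math

# Exponential trend Y = a * b**X fitted through logarithms (base 10).

def fit_exponential(x, y):
    """Return (log a, log b); assumes the X deviations sum to zero."""
    logs = [math.log10(value) for value in y]
    log_a = sum(logs) / len(y)
    log_b = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
    return log_a, log_b


sales = [32, 47, 65, 92, 190, 132, 275]   # 2009 to 2015, Rs. million
x = [-3, -2, -1, 0, 1, 2, 3]              # deviations from 2012

log_a, log_b = fit_exponential(x, sales)
a, b = 10 ** log_a, 10 ** log_b

# 2018 is six years after the origin, so X = 6; take the antilog.
forecast_2018 = 10 ** (log_a + log_b * 6)

print("log a =", round(log_a, 4), " log b =", round(log_b, 4))
print("a =", round(a, 2), " b =", round(b, 3))
print("Estimated sales for 2018 (Rs. million):", round(forecast_2018, 1))
```

The value of b is the ratio of each year's trend value to the previous year's, so b minus 1 is the annual rate of growth implied by the fitted curve.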

1.7 SHIFTING THE TREND ORIGIN

Trends are usually fitted to annual data with the middle of the series as origin. At times it may be

necessary to change the origin of the trend equation to some other point in the series. For example,

annual trend values may be changed to monthly or quarterly values if we wish to study seasonal or

cyclical patterns.

For an arithmetic straight line we have to find out the new Y intercept, i.e., the value of ‘a’. The

value of ‘b’ remains unchanged, since the slope of the trend line remains same irrespective of the

origin. The procedure of shifting the origin may be done by the expression :

Yt = a + b(X + k)

where k is the number of time units shifted. If the origin is shifted forward in time, k is positive; if it is shifted backward in time, k is negative.

Example 9 : You are given the trend equation

Yc = 110 + 2X

(origin 2008, time unit 1 year)

Shift the origin to 2012.

Solution : We are required to shift the origin to 2012, 4 years forward. Here k = 4. The required

equation can be obtained as :

Yt = a + b (X + k)

= 110 + 2(X + 4) = 110 + 2X + 8 = 118 + 2X

(origin 2012, X unit = 1 year)


Example 10 : You are given the trend equation

Yc = 210 – 1.5X

(origin 2012, time unit 1 year)

Shift the origin to 2007.

Solution : Changing the origin from 2012 to 2007 means going back by 5 years, so k = –5. Using the formula

Yt = a + b(X + k)

= 210 – 1.5(X – 5) = 210 – 1.5X + 7.5 = 217.5 – 1.5X

(origin 2007, time period one year)

Example 11 : You are given the following equation :

$Y = 126.55 + 18.04X + 1.786X^2$

(origin 2011–12)

Solution : If we wish to shift the origin of this equation to 2012, i.e., half a year forward (k = 0.5), we may follow the procedure explained above.

$Y_t = 126.55 + 18.04(X + 0.5) + 1.786(X + 0.5)^2$

$= 126.55 + 18.04X + 9.02 + 1.786(X^2 + X + 0.25)$

$= 126.55 + 18.04X + 9.02 + 1.786X^2 + 1.786X + 0.4465$

$= 136.0165 + 19.826X + 1.786X^2$
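The substitution of X + k for X used in Examples 9 to 11 can be sketched as a small helper. This is a minimal illustration under the assumption that the trend is at most of the second degree; the name `shift_origin` is hypothetical and not taken from the lesson.

```python
def shift_origin(a, b, k, c=0.0):
    """Coefficients of a + bX + cX^2 after replacing X by (X + k).

    k > 0 moves the origin forward in time, k < 0 moves it backward.
    Expanding a + b(X + k) + c(X + k)^2 gives the three terms below.
    """
    new_a = a + b * k + c * k * k      # constant term
    new_b = b + 2 * c * k              # coefficient of X
    new_c = c                          # coefficient of X^2 is unchanged
    return new_a, new_b, new_c

print(shift_origin(110, 2, 4))                    # Example 9: 2008 -> 2012
print(shift_origin(210, -1.5, -5))                # Example 10: 2012 -> 2007
print(shift_origin(126.55, 18.04, 0.5, c=1.786))  # Example 11: half a year forward
```

For a straight line (c = 0) only the intercept changes, which is why the lesson says that ‘b’ remains unchanged; for a parabola the coefficient of X changes as well.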

From annual trend equations we can obtain monthly trend equations without any loss in accuracy. When the Y units are annual totals, an annual trend equation can be converted into an equation of monthly totals by dividing the computed constant ‘a’ by 12 and the value of ‘b’ by 144. The justification is that the data are sums of 12 months, hence both ‘a’ and ‘b’ must be divided by 12, and ‘b’ is divided by 12 once more so that the time units (X’s) are in months as well, i.e., ‘b’ gives monthly increments. Therefore the monthly trend equation becomes:

$Y = \dfrac{a}{12} + \dfrac{b}{144}X$

The annual trend equation can also be reduced to a quarterly trend equation, which is given by :

$Y = \dfrac{a}{4} + \dfrac{b}{4 \times 4}X$  or  $Y = \dfrac{a}{4} + \dfrac{b}{16}X$

Example 12 : The trend of the annual sales of ABC Co. Ltd. is given by the following equation :

Yc = 30 + 3.6X (origin 2012, X unit = 1 year, Y unit = annual sales)

Convert the equation on monthly basis.

Solution : To convert an annual trend equation on monthly basis, the value of ‘a’ is divided by 12

and the value of ‘b’ by 144. The equation on monthly basis is

$Y_c = \dfrac{30}{12} + \dfrac{3.6}{144}X$

$Y_c = 2.5 + 0.025X$

If the annual trend equation is of second degree, the corresponding monthly trend equation is

obtained by dividing ‘a’ by 12, ‘b’ by 144 and ‘c’ by 1728 (the last being identical to dividing ‘c’ by

12 three times).

Example 13 : Convert the following annual trend equation on a monthly basis :

$Y_c = 10.6 + 0.8X + 0.64X^2$

Solution : To convert annual trend equation of the second degree on monthly basis, divide ‘a’ by

12, ‘b’ by 144 and ‘c’ by 1,728. Thus, the required equation will be :

$Y_c = \dfrac{10.6}{12} + \dfrac{0.8}{144}X + \dfrac{0.64}{1728}X^2$

$= 0.883 + 0.0056X + 0.00037X^2$

Where data are given as monthly averages per year, the value of ‘a’ remains unchanged and ‘b’ is divided by 12 only once. The reason is that ‘a’ is already at the monthly level and ‘b’ now represents the annual change in monthly magnitudes. In the case of a second-degree trend equation, the value of ‘c’ is divided by 144.
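The two sets of divisors (12, 144 and 1728 for annual totals; 1, 12 and 144 for monthly averages) can be collected in one helper. The sketch below is illustrative; the function name `annual_to_monthly` and its flag `y_is_annual_total` are assumptions of this sketch, not notation used in the lesson.

```python
def annual_to_monthly(a, b, c=0.0, y_is_annual_total=True):
    """Convert an annual trend a + bX + cX^2 (X in years) to a monthly basis.

    If Y is an annual total, every coefficient is divided by 12 to reach
    monthly level, and b and c are further divided by 12 and 144 because
    X is now measured in months. If Y is already a monthly average, only
    the change of the X unit applies.
    """
    if y_is_annual_total:
        return a / 12, b / 144, c / 1728
    return a, b / 12, c / 144

print(annual_to_monthly(30, 3.6))                             # Y = annual sales
print(annual_to_monthly(10.6, 0.8, 0.64))                     # second-degree trend
print(annual_to_monthly(280, -1.8, y_is_annual_total=False))  # Y = monthly average
```

The first two calls reproduce the coefficients of Examples 12 and 13 (2.5 and 0.025; 0.883, 0.0056 and 0.00037 after rounding). The third shows that a trend stated in monthly averages keeps its intercept and only has its slope divided by 12.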

Example 14 : You are given the following trend equation:

Yc = 280 – 1.8X (origin June 30, 2012,

Y unit = annual monthly average sales)

Convert this equation into monthly terms and shift the origin half a month forward.

Solution : (i) The given annual trend equation reduced to monthly values will be :

$Y_c = 280 - \dfrac{1.8}{12}X$

= 280 – 0.15X

(origin: June 30, 2012; X unit = 1 month; Y unit = average monthly sales)

(ii) Shifting the origin half a month forward :

Yt = 280 – 0.15 (X + 0.5)

= 280 – 0.15X – 0.075

= 279.925 – 0.15X

Seasonal variations are those rhythmic changes in the time series data that occur regularly each

year. They have their origin in climatic or institutional factors that affect either supply or demand or

both. It is important that these variations be measured accurately for three reasons. First, the

investigator wants to eliminate seasonal variations from the data he is studying. Second, a precise

knowledge of the seasonal pattern aids in planning future operations. Lastly, complete knowledge of

seasonal variations is of use to those who are trying to remove the cause of seasonals or are attempting

to mitigate the problem by diversification, offsetting opposing seasonal patterns, or some other

means.

Since the number of calendar days and working days varies from month to month, it is essential to adjust the monthly figures if they are based on daily quantities. No such adjustment is needed when we deal with the volume of inventories or of bank deposits, because these values are not influenced by the number of calendar days or working days.

Methods of Measuring Seasonal Variations

1. Method of Simple Averages (Weekly, Monthly or Quarterly).

2. Ratio-to-Moving Average Method.

3. Ratio-to-Trend Method.

4. Link Relatives Method.

1.9.1 Method of Simple Averages

This is the simplest method of obtaining a seasonal index. The following steps are necessary for

calculating the index:

(i) Arrange the unadjusted data by years and months (or by quarters, if quarterly data are given).

(ii) Find totals of January, February etc.

(iii) Divide each total by the number of years for which data are given. For example, if we are

given monthly data for five years then we shall first obtain total for each month for five

years and divide each total by 5 to obtain an average.

(iv) Obtain an average of monthly averages by dividing the total of monthly averages by 12.

(v) Taking the average of monthly averages as 100, compute the percentage of each monthly average as follows:

$\text{Seasonal Index for January} = \dfrac{\text{Monthly average for January}}{\text{Average of monthly averages}} \times 100$

If, instead of the average for each month, the total for each month is used, we get the same result. The following example illustrates the method.

Example 15 : Monthly consumption of electric power (in million kWh) for street lighting in India during 2011 – 2015 is given below:

Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec

2011 318 281 278 250 231 216 223 245 269 302 325 347

2012 342 309 299 268 249 236 242 262 288 321 342 364

2013 367 328 320 287 269 251 259 284 309 345 367 394

2014 392 349 342 311 290 273 282 305 328 364 389 417

2015 420 378 370 334 314 296 305 330 356 396 422 452

Find out seasonal variation by the method of monthly averages.

Solution : Computation of Seasonal Indices by Monthly Averages

Month     2011   2012   2013   2014   2015   Monthly total   Five-yearly   Percentage
                                             for 5 years     average
(1)       (2)    (3)    (4)    (5)    (6)    (7)             (8)           (9)
Jan.      318    342    367    392    420    1,839           367.8         116.1
Feb.      281    309    328    349    378    1,645           329.0         103.9
March     278    299    320    342    370    1,609           321.8         101.6
April     250    268    287    311    334    1,450           290.0          91.6
May       231    249    269    290    314    1,353           270.6          85.4
June      216    236    251    273    296    1,272           254.4          80.3
July      223    242    259    282    305    1,311           262.2          82.8
Aug.      245    262    284    305    330    1,426           285.2          90.1
Sept.     269    288    309    328    356    1,550           310.0          97.9
Oct.      302    321    345    364    396    1,728           345.6         109.1
Nov.      325    342    367    389    422    1,845           369.0         116.5
Dec.      347    364    394    417    452    1,974           394.8         124.7
Total                                        19,002          3,800.4       1,200
Average                                      1,583.5         316.7         100

(i) Column No. 7 gives the total for each month for five years.

(ii) In column No. 8 each total of column No. 7 has been divided by 5 to obtain an average for

each month.

(iii) The average of monthly averages is obtained by dividing the total of monthly averages by

12.

(iv) In column No. 9 each monthly average has been expressed as a percentage of the average of monthly averages. Thus, the percentage for January

$= \dfrac{367.8}{316.7} \times 100 = 116.1$

$\text{Percentage for February} = \dfrac{329.0}{316.7} \times 100 = 103.9$

If instead of monthly data, we are given weekly or quarterly data, we shall compute weekly or

quarterly averages by following the same procedure.
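The steps of the method of simple averages can be sketched in Python as follows. The sketch uses the data of Example 15, taking the October 2013 figure as 345, the value implied by the printed monthly total of 1,728. The variable names are illustrative assumptions, not part of the lesson.

```python
# Monthly consumption (million kWh), January to December, one row per year.
consumption = {
    2011: [318, 281, 278, 250, 231, 216, 223, 245, 269, 302, 325, 347],
    2012: [342, 309, 299, 268, 249, 236, 242, 262, 288, 321, 342, 364],
    2013: [367, 328, 320, 287, 269, 251, 259, 284, 309, 345, 367, 394],
    2014: [392, 349, 342, 311, 290, 273, 282, 305, 328, 364, 389, 417],
    2015: [420, 378, 370, 334, 314, 296, 305, 330, 356, 396, 422, 452],
}
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

n_years = len(consumption)

# Steps (ii)-(iii): total for each month over all years, divided by the number of years.
monthly_avg = [sum(row[m] for row in consumption.values()) / n_years
               for m in range(12)]

# Step (iv): average of the twelve monthly averages.
grand_avg = sum(monthly_avg) / 12

# Step (v): each monthly average as a percentage of the grand average.
seasonal_index = [100 * avg / grand_avg for avg in monthly_avg]

for name, idx in zip(months, seasonal_index):
    print(f"{name}: {idx:.1f}")
```

Because every index is a monthly average divided by the mean of the twelve averages, the twelve indices necessarily add up to 1,200.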

1.9.2 Ratio-to-moving average method

The method of monthly totals or monthly averages does not give any consideration to the trend

which may be present in the data. The ratio-to-moving average method is one of the simplest of the

commonly used devices for measuring seasonal variation which takes the trend into consideration.

The steps to compute seasonal variation are as follows :

(i) Arrange the unadjusted data by years and months.

(ii) Compute the trend values by the method of moving averages. For this purpose take a 12-month moving average followed by a two-month moving average to centre the trend values.

(iii) Express the data for each month as a percentage ratio of the corresponding moving-

average trend value.


(iv) Arrange these ratios by months and years.

(v) Aggregate the ratios for January, February etc.

(vi) Find the average ratio for each month.

(vii) Adjust the average monthly ratios found in step (vi) so that they will themselves average

100 per cent. These adjusted ratios will be the seasonal indices for various months.

A seasonal index computed by the ratios-to-moving-average method ordinarily does not fluctuate

so much as the index based on straight-line trends. This is because the 12-month moving average

follows the cyclical course of the actual data quite closely. Therefore the index ratios obtained by

this method are often more representative of the data from which they are obtained than is the case

in the ratio-to-trend method which will be discussed later on.

Example 16 : Prepare a monthly seasonal index from the following data, using moving averages

method :

Monthly Sales of XYZ Products Co. Ltd. (Rs.)

Year

Month 2010 2011 2012

January 3,639 3,913 4,393

February 3,591 3,856 4,530

March 3,326 3,714 4,287

April 3,469 3,820 4,405

May 3,321 3,647 4,024

June 3,320 3,498 3,992

July 3,205 3,476 3,795

August 3,205 3,354 3,492

September 3,255 3,594 3,571

October 3,550 3,830 3,923

November 3,771 4,183 3,984

December 3,772 4,482 3,880

Solution :

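Computation of Ratios to 12-month Centred Moving Averages for Sales (Rs.)

Columns : (1) Year and month; (2) Sales (Rs.); (3) 12-month moving total; (4) 12-month moving average; (5) Centred 12-month moving average; (6) Ratio to moving average (per cent). Rows that show only columns (3) and (4) fall between two consecutive months.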

2010

Jan. 3,639

Feb. 3,591


March 3,326

April 3,469

May 3,321

June 3,320

41,424 3,452

July 3,205 3,463 92.55

41,698 3,475

Aug. 3,205 3,486 91.94

41,963 3,497

Sept. 3,255 3,513 92.66

42,351 3,529

Oct. 3,550 3,543 100.20

42,702 3,558

Nov. 3,771 3,572 105.57

43,028 3,586

Dec. 3,772 3,593 104.98

2011 43,206 3,601

Jan. 3,913 3,612 108.33

43,477 3,623

Feb. 3,856 3,630 106.23

43,626 3,636

March 3,714 3,650 101.75

43,965 3,664

April 3,820 3,675 103.95

44,245 3,687

May 3,647 3,704 98.46

44,657 3,721

June 3,498 3,751 93.26

45,367 3,781

July 3,476 3,801 91.45

45,847 3,821

Aug. 3,354 3,849 87.14

46,521 3,877

Sept. 3,594 3,901 92.13

47,094 3,925

Oct. 3,830 3,949 96.99

47,679 3,973

Nov. 4,183 3,989 104.86

48,056 4,005


Dec. 4,482 4,025 111.35

2012 48,550 4,046

Jan. 4,393 4,059 108.23

48,869 4,072

Feb. 4,530 4,078 111.08

49,007 4,084

March 4,287 4,083 105.00

48,984 4,082

April 4,405 4,086 107.81

49,077 4,090

May 4,024 4,081 98.60

48,878 4,073

June 3,992 4,048 98.62

48,276 4,023

July 3,795

Aug. 3,492

Sept. 3,571

Oct. 3,923

Nov. 3,984

Dec. 3,880

Arranging the ratios-to-moving average by months and years we obtain the following table

from which the seasonal index for each month is also obtained.

Computation of Seasonal Index by Ratios-to-Moving Averages of XYZ Products Co. Ltd.

Month        2010      2011      2012      Total     Average    Seasonal Index
January      -         108.33    108.23    216.56    108.28     107.6
February     -         106.23    111.08    217.31    108.65     108.1
March        -         101.75    105.00    206.75    103.37     102.8
April        -         103.95    107.81    211.76    105.88     105.3
May          -          98.46     98.60    197.06     98.53      98.0
June         -          93.26     98.62    191.88     95.94      95.4
July          92.55     91.45    -         184.00     92.00      91.5
August        91.94     87.14    -         179.08     89.54      89.0
September     92.66     92.13    -         184.79     92.40      91.9
October      100.20     96.99    -         197.19     98.60      98.1
November     105.57    104.86    -         210.43    105.21     104.6
December     104.98    111.35    -         216.33    108.16     107.6

Total of Monthly Averages      1206.56
Average of Monthly Averages     100.55

Putting the average of monthly averages as 100, the monthly averages have been adjusted to obtain the seasonal index for each month.

For example, Seasonal Index for January $= \dfrac{108.28}{100.55} \times 100 = 107.6$

for February $= \dfrac{108.65}{100.55} \times 100 = 108.1$
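The computation in Example 16 can be sketched in Python as below. This is a minimal illustration with hypothetical variable names. It keeps full precision throughout, so individual ratios and indices may differ slightly in the last digit from the printed table, which rounds the moving averages to whole rupees.

```python
# Monthly sales (Rs.), January 2010 to December 2012, in chronological order.
sales = [
    3639, 3591, 3326, 3469, 3321, 3320, 3205, 3205, 3255, 3550, 3771, 3772,  # 2010
    3913, 3856, 3714, 3820, 3647, 3498, 3476, 3354, 3594, 3830, 4183, 4482,  # 2011
    4393, 4530, 4287, 4405, 4024, 3992, 3795, 3492, 3571, 3923, 3984, 3880,  # 2012
]

# 12-month moving averages; the i-th average covers months i .. i+11
# and falls between months i+5 and i+6.
moving_avg = [sum(sales[i:i + 12]) / 12 for i in range(len(sales) - 11)]

# Centring: the mean of two consecutive moving averages lines up with month i+6.
centred = {i + 6: (moving_avg[i] + moving_avg[i + 1]) / 2
           for i in range(len(moving_avg) - 1)}

# Ratio of each original value to its centred moving average, in per cent.
ratios = {t: 100 * sales[t] / centred[t] for t in centred}

# Average the ratios for each calendar month (0 = January).
by_month = [[] for _ in range(12)]
for t, r in ratios.items():
    by_month[t % 12].append(r)
monthly_avg = [sum(r) / len(r) for r in by_month]

# Adjust so that the twelve indices average 100.
scale = 100 / (sum(monthly_avg) / 12)
seasonal_index = [avg * scale for avg in monthly_avg]

print([round(v, 1) for v in seasonal_index])
```

Only 24 of the 36 months receive a ratio, because the centred 12-month average cannot be formed for the first six and the last six months of the series.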

Merits

This method is more widely used in practice than other methods. The index calculated by the ratio-

to-moving average method does not fluctuate very much. The 12-month moving average follows

the cyclical course of the actual data closely. So the index ratios are more representative of the data from which they have been obtained.

Limitations

Seasonal index numbers cannot be calculated for every month for which data are available. When a four-month moving average is taken, 2 months at the beginning and 2 months at the end are left out, and no seasonal index numbers can be calculated for them. With a 12-month moving average, as in Example 16, six months are lost at each end.

1.9.3 Ratio-to-trend method

The ratio-to-trend method is similar to ratio-to-moving-average method. The only difference is the

way of obtaining the trend values. Whereas in the ratio-to-moving-average method, the trend values

are obtained by the method of moving averages, in the ratio-to-trend method, the corresponding

trend is obtained by the method of least squares.

The steps in the calculation of seasonal variation are as follows :

(i) Arrange the unadjusted data by years and months.

(ii) Compute the trend values for each month with the help of least squares equation.

(iii) Express the data for each month as a percentage ratio of the corresponding trend value.

(iv) Aggregate the January ratios, the February ratios, etc., computed in step (iii).

(v) Find the average ratio for each month.

(vi) Adjust the average ratios found in step (v) so that they will themselves average 100 per

cent.

The last step gives us the seasonal index for each month.

Sometimes the median is used in place of the arithmetic average of the ratios-to-trend. The

choice depends upon circumstances but there is a preference for the median if several erratic ratios

are found. In fact, if a fairly large number of years, say, 15 or 20, are used in the computation, it is

not uncommon to omit extremely erratic ratios from the computation of average of monthly ratios.

Only the arithmetic average should be used for small number of years.

This method has the advantage of simplicity and ease of interpretation. Although it makes

allowance for the trend, it may be influenced by errors in the calculation of the trend. The method

may also be influenced by cyclical and erratic influences. This source of possible error is eliminated

by the selection of a period of time in which depression is offset by prosperity.


Example 17 : Find seasonal variations by the ratio-to-trend method from the following data:

Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter

2010 30 40 36 34

2011 34 52 50 44

2012 40 58 54 48

2013 54 76 68 62

2014 80 92 86 82

Solution : To find seasonal variations by the ratio-to-trend method, the trend is first fitted to the yearly data and then converted into quarterly trend values.

Year     Yearly    Average of quarterly    X     XY      $X^2$    Trend values
         totals    values (Y)                                     Y = 56 + 12X
2010     140       35                      –2    –70     4        56 + 12(–2) = 32
2011     180       45                      –1    –45     1        56 + 12(–1) = 44
2012     200       50                       0      0     0        56 + 12(0) = 56
2013     260       65                       1     65     1        56 + 12(1) = 68
2014     340       85                       2    170     4        56 + 12(2) = 80
Total    1120      280                      0    +120    10       280

$Y = a + bX$

$a = \dfrac{\sum Y}{N} = \dfrac{280}{5} = 56$

$b = \dfrac{\sum XY}{\sum X^2} = \dfrac{120}{10} = 12$

The trend value for the middle of 2010, i.e., the point that falls between the 2nd and 3rd quarters, is 32.

Quarterly increment $= \dfrac{\text{Yearly increment}}{4} = \dfrac{12}{4} = 3$

Therefore, the trend value for the 2nd quarter is $32 - \dfrac{3}{2} = 30.5$

The trend value for the 3rd quarter is $32 + \dfrac{3}{2} = 33.5$

Similarly other values will be calculated.

Quarterly Trend Values

Year 1st 2nd 3rd 4th Total

2010 27.5 30.5 33.5 36.5 128

2011 39.5 42.5 45.5 48.5 176

2012 51.5 54.5 57.5 60.5 224

2013 63.5 66.5 69.5 72.5 272

2014 75.5 78.5 81.5 84.5 320

Now we express the original values as percentages of the quarterly trend values, i.e., $\dfrac{O}{T} \times 100$ :

Year       1st       2nd       3rd       4th
2010       109.1     131.1     107.5     93.1
2011        86.1     122.4     109.9     90.7
2012        77.7     106.4      93.9     79.3
2013        85.0     114.3      97.8     85.5
2014       106.0     117.2     105.5     97.0
Total      463.9     591.4     514.6     445.6
Average     92.78    118.28    102.92    89.12

Average of the quarterly averages $= \dfrac{92.78 + 118.28 + 102.92 + 89.12}{4} = \dfrac{403.10}{4} = 100.775$

Quarterly seasonal index for 1st Quarter : $\dfrac{92.78}{100.775} \times 100 = 92.1$

Quarterly seasonal index for 2nd Quarter : $\dfrac{118.28}{100.775} \times 100 = 117.4$

Quarterly seasonal index for 3rd Quarter : $\dfrac{102.92}{100.775} \times 100 = 102.1$

Quarterly seasonal index for 4th Quarter : $\dfrac{89.12}{100.775} \times 100 = 88.4$

The total of the quarterly seasonal indices should be equal to 400, and that of monthly indices should be 1200.
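The whole of Example 17 can be reproduced with the following Python sketch. It is illustrative only and uses hypothetical variable names. The third-quarter figure for 2011 is taken as 50, the value implied by the printed yearly total of 180 and the printed ratio of 109.9. Ratios are kept at full precision, so the indices may differ from the printed ones in the first decimal place.

```python
quarterly = {
    2010: [30, 40, 36, 34],
    2011: [34, 52, 50, 44],
    2012: [40, 58, 54, 48],
    2013: [54, 76, 68, 62],
    2014: [80, 92, 86, 82],
}

years = sorted(quarterly)
mid_year = years[len(years) // 2]                 # origin at the middle year
x = [year - mid_year for year in years]
y_avg = [sum(quarterly[year]) / 4 for year in years]   # average quarterly value

# Least-squares straight line through the yearly averages (sum of X is zero).
a = sum(y_avg) / len(years)
b = sum(xi * yi for xi, yi in zip(x, y_avg)) / sum(xi * xi for xi in x)
quarterly_step = b / 4                            # yearly increment / 4

ratios = []                                       # one row of four ratios per year
for xi, year in zip(x, years):
    mid_trend = a + b * xi                        # trend at the middle of the year
    # Quarter q (0..3) is centred (q - 1.5) quarters away from mid-year.
    trend = [mid_trend + (q - 1.5) * quarterly_step for q in range(4)]
    ratios.append([100 * o / t for o, t in zip(quarterly[year], trend)])

quarter_avg = [sum(row[q] for row in ratios) / len(ratios) for q in range(4)]
scale = 100 / (sum(quarter_avg) / 4)              # make the indices average 100
seasonal_index = [avg * scale for avg in quarter_avg]

print(f"a = {a}, b = {b}")
print([round(v, 1) for v in seasonal_index])
```

The offset (q - 1.5) reflects the fact that the yearly trend value belongs to the middle of the year, so the 2nd and 3rd quarters lie half a quarterly step on either side of it.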

Merits

(i) This method is based on a logical procedure for measuring seasonal variations. This

procedure has advantage over the moving average method for it has a ratio-to-trend value

for each month for which data is available. So this method avoids loss of data which is

inherent in the case of moving averages. If the period of time series is very short then the

advantage becomes more prominent.

(ii) It is a simple method.

(iii) It is easy to understand.

Limitations :

If the cyclical changes in the time series are very wide, the trend fitted under the ratio-to-trend method cannot follow the actual data as closely as a 12-month moving average does. A seasonal index computed by the ratio-to-trend method is therefore likely to be more biased.


1.9.4 Link Relatives Method

Among all the methods of measuring seasonal variation, link relatives method is the most difficult

one. When this method is adopted the following steps are taken to calculate the seasonal variation

indices:

(i) Calculate the link relatives of the seasonal figures. Link relatives are calculated by dividing the figure of each season by the figure of the immediately preceding season and multiplying it by 100.

$\text{Link relative} = \dfrac{\text{Current season's figure}}{\text{Previous season's figure}} \times 100$

These percentages are called link relatives since they link each month (or quarter or other time period) to the preceding one.

(ii) Calculate the average of the link relatives for each season. The arithmetic average may be used, but the median is probably better, since the arithmetic average gives undue weight to extreme cases which are not primarily due to seasonal influences.

(iii) Convert these averages into chain relatives on the base of the first season.

(iv) Calculate the chain relative of the first season on the basis of the last season. There will be some difference between this chain relative and the chain relative of the first season taken as the base in step (iii), i.e., 100. This difference is due to long-term changes. It is therefore necessary to correct the chain relatives.

(v) For correction, the chain relative of the first season obtained in step (iii) is deducted from the chain relative of the first season obtained in step (iv). The difference is divided by the number of seasons. The resulting figure multiplied by 1, 2, 3 (and so on) is deducted respectively from the chain relatives of the 2nd, 3rd, 4th (and so on) seasons. These are the corrected chain relatives.

(vi) Express the corrected chain relatives as percentage of their averages. These provide the

required seasonal indices by the method of link relatives.

The following example will illustrate the process.

Example 18 : Apply method of link relatives to the following data and calculate seasonal indices.

Quarterly Figures

Quarter 2011 2012 2013 2014 2015

I 6.0 5.4 6.8 7.2 6.6

II 6.5 7.9 6.5 5.8 7.3

III 7.8 8.4 9.3 7.5 8.0

IV 8.7 7.3 6.4 8.5 7.1


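Solution : Calculation of Seasonal Indices by Method of Link Relatives

Link relatives

Year                     Quarter I    Quarter II    Quarter III    Quarter IV
2011                     -            108.3         120.0          111.5
2012                      62.1        146.3         106.3           86.9
2013                      93.2         95.6         143.1           68.8
2014                     112.5         80.6         129.3          113.3
2015                      77.6        110.6         109.6           88.8
Arithmetic average        86.35       108.28        121.66          93.86
Number of relatives       4           5             5               5

Chain relatives :

Quarter I = 100

Quarter II $= \dfrac{100 \times 108.28}{100} = 108.28$

Quarter III $= \dfrac{121.66 \times 108.28}{100} = 131.73$

Quarter IV $= \dfrac{93.86 \times 131.73}{100} = 123.64$

Corrected chain relatives : Quarter I = 100, Quarter II = 106.605, Quarter III = 128.38, Quarter IV = 118.615

Seasonal indices :

Quarter I $= \dfrac{100}{113.4} \times 100 = 88.18$

Quarter II $= \dfrac{106.605}{113.4} \times 100 = 94.0$

Quarter III $= \dfrac{128.38}{113.4} \times 100 = 113.21$

Quarter IV $= \dfrac{118.615}{113.4} \times 100 = 104.60$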

The correction factor is calculated as follows :

Chain relative of the first quarter (on the basis of the first quarter) = 100

Chain relative of the first quarter (on the basis of the last quarter) $= \dfrac{86.35 \times 123.64}{100} = 106.7$

Difference between these chain relatives = 106.7 – 100 = 6.7

Difference per quarter $= \dfrac{6.7}{4} = 1.675$

Adjusted chain relatives are obtained by subtracting 1 × 1.675, 2 × 1.675 and 3 × 1.675 from the chain relatives of the 2nd, 3rd and 4th quarters, respectively.

Seasonal variation indices are calculated as below:

Average of the corrected chain relatives $= \dfrac{100 + 106.605 + 128.38 + 118.615}{4} = \dfrac{453.6}{4} = 113.4$

$\text{Seasonal variation index} = \dfrac{\text{Corrected chain relative} \times 100}{113.4}$
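The link relatives procedure of Example 18 can be sketched in Python as follows. The sketch is illustrative and its variable names are assumptions. The second-quarter figures for 2013 and 2015 are taken as 6.5 and 7.3, the values implied by the printed link relatives (95.6 and 143.1; 110.6 and 109.6). The arithmetic average is used, as in the example, and full precision is kept, so results may differ slightly from the printed figures.

```python
# Quarterly figures for 2011..2015, one list per quarter.
data = {
    "I":   [6.0, 5.4, 6.8, 7.2, 6.6],
    "II":  [6.5, 7.9, 6.5, 5.8, 7.3],
    "III": [7.8, 8.4, 9.3, 7.5, 8.0],
    "IV":  [8.7, 7.3, 6.4, 8.5, 7.1],
}
quarters = ["I", "II", "III", "IV"]
n_years = 5

# Flatten into chronological order: 2011-I, 2011-II, ..., 2015-IV.
series = [data[q][yr] for yr in range(n_years) for q in quarters]

# Step (i): link relatives; the very first quarter has no predecessor.
link = [None] + [100 * series[t] / series[t - 1] for t in range(1, len(series))]

# Step (ii): arithmetic average of the link relatives for each quarter.
avg_link = []
for q in range(4):
    vals = [link[t] for t in range(q, len(series), 4) if link[t] is not None]
    avg_link.append(sum(vals) / len(vals))

# Step (iii): chain relatives on the base of the first quarter.
chain = [100.0]
for q in range(1, 4):
    chain.append(avg_link[q] * chain[q - 1] / 100)

# Steps (iv)-(v): chain relative of quarter I on the basis of quarter IV,
# and the correction spread evenly over the four quarters.
first_on_last = avg_link[0] * chain[3] / 100
correction = (first_on_last - 100) / 4
corrected = [chain[q] - q * correction for q in range(4)]

# Step (vi): corrected chain relatives as percentages of their average.
mean_corrected = sum(corrected) / 4
seasonal_index = [100 * c / mean_corrected for c in corrected]

print([round(v, 2) for v in seasonal_index])
```

Replacing the arithmetic average in step (ii) by the median, as the text suggests, only requires changing how `avg_link` is computed.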

Meaning of “Normal” in Business Statistics

Business is often said to be “above normal” or “below normal”. When so used the term “normal” is

generally recognized to mean a level of activity which is characterized by the presence of basic


trend and seasonal variation. This implies that the influence of business cycles and erratic fluctuations

on the level of activity is assumed to be insignificant. Therefore, the trend value for any period, multiplied by the seasonal index for that period (expressed as a ratio), gives us an estimate of the normal activity during that period.

1.10 SUMMARY

A time series refers to the observations of a random variable like sales, employment, etc.

placed in a chronological order.

The twin reasons for studying time series are to gain a historical understanding of the past data and to make forecasts for the future.

There are four components of a time series: (i) Secular trend, (ii) Cyclical variations, (iii)

Seasonal variations, and (iv) Irregular variations.

Secular trend refers to the general pattern of the values in a time series – it is the long-term

tendency of the movement of the variable.

The cyclical variations are caused by business cycles. A business cycle has four phases: (i)

peak time or prosperity, (ii) recession, (iii) trough or depression, and (iv) recovery.

Seasonal variations, which are caused by weather, customs, festivals, etc show themselves in

a period of one year. They repeat year after year.

Irregular variations or random fluctuations are those which result from unpredictable events

like strikes, natural or other calamities etc.

The two models used for the purpose of decomposing are (i) additive model, and (ii)

multiplicative model.

The additive model is based on the assumption that the four components add up to make time

series. They are assumed to be independent.

The multiplicative model is based on the assumption that a time series is the product of the

four components.

The linear trend is obtained by fitting a straight line to the given data. It is fitted on the principle

of least squares.

It is possible to shift the origin of an equation as Yt = a + b (X ± k).

The annual trend equations can be changed on a monthly or quarterly basis, and reverse is also

possible.

The parabolic trend involves fitting a second-degree parabola to the given data. It is of the form $Y_t = a + bX + cX^2$.

The exponential trend is appropriate where the variable in consideration grows or declines exponentially. It takes the form $Y_t = ab^X$.

The method of moving averages is another way of obtaining trend. Beginning with a certain

number of time periods, average is calculated and then successive averages are calculated by

dropping the first of the values and including the next one.

Seasonal variations are measured and expressed as seasonal indices.

The methods of simple averages, ratio-to-moving averages and ratio-to-trend are primarily

used for the purpose.


1.11 SELF ASSESSMENT QUESTIONS

Exercise 1: Mark the following statements as True or False:

(i) A time series refers to a sequence of observations of a random variable over time and

placed in chronological order.

(ii) The components of a time series are: secular trend, cyclical variations, seasonal variations

and chance variations.

(iii) Secular trend and cyclical variations are related to long-term movements while seasonal

variations and random variations refer to the short-term changes.

(iv) Seasonal variations are highly predictable.

(v) Factors like population change, tastes, consumer incomes, etc. explain cyclical variations.

(vi) Technological innovations cause cyclical variations.

(vii) Business cycles relate to the economy as a whole and never to a particular industry.

(viii) The seasonal variations are caused by the changing seasons in a country.

(ix) The seasonal variations component is most important to analyse for purposes of forecasting

and planning in the short term.

(x) Irregular variations are erratic in nature.

(xi) In additive model, S, C and I are expressed as deviations from their respective mean values.

(xii) Using the multiplicative model of analysis, all the components of a time series are expressed

as percentages.

(xiii) In obtaining a trend equation for a given set of data, the origin should be taken in such a manner that $\sum X$ must work out to be equal to zero.

(xiv) The value of ‘a’, the intercept of a trend equation, is related to its origin.

(xv) The trend values for various years given in the data and the projected values do not change

with a shift in the origin.

(xvi) The monthly trend equations can also be converted into annual trend equations, and for this

we first need to shift the origin of the trend equation to July 1 of the year of origin.

(xvii) In exponential trend, a straight line trend is fitted to the log values of the Y variable.

(xviii) The exponential trend is an example of non-linear trend.

(xix) Moving averages require centering whenever the underlying period, n, used in their

calculation is even.

(xx) A monthly sales budget can be drawn up by multiplying monthly seasonal indices by average

monthly sales and dividing each by 100.

Ans. 1. T, 2. T, 3. F, 4. T, 5. F, 6. F, 7. F, 8. F, 9. T, 10. T, 11. F, 12. F, 13. F, 14. T, 15. T, 16. T, 17. T,

18. T, 19. T, 20. T

Exercise 2: Questions and Answers

(i) What is a time series? What are its components? With which component of a time series

would you mainly associate each of the following?

(a) Wild cat strike in a factory, interrupting production for 15 days.

(b) Increase in sales in a departmental store on Diwali.

(c) An era of prosperity.

(d) Fall in death rate due to advances in medical science.


(ii) What is meant by decomposition of a time series? Explain the difference between additive

and multiplicative models of analysing time series.

(iii) Explain the rules of converting annual trend equation (i) on a monthly basis, and (ii) on a

quarterly basis. How can a quarterly trend equation be converted on an annual basis?

(iv) How are seasonal variations measured under the multiplicative model of analysing time

series? How are the seasonal indices interpreted?

(v) Explain the following methods of calculating seasonal indices:

(a) Method of Simple Averages

(b) Ratio-to-trend Method

(c) Ratio-to-moving averages Method

(vi) The following data relates to gross ex-factory value (in Rs. crores) of output of a factory

over the last few years:

Year : 2006 2007 2008 2009 2010 2011 2012

Value : 320 360 368 332 376 396 368

(a) Fit a straight line trend by the method of least squares, taking the year of origin as

2006.

(b) What is the average annual change in the value of output?

(c) Obtain trend equation using the year 2009 as the origin. How does it compare with

equation obtained in (a) above?

(vii) Demand (in ‘000 metric tonnes) for sugar of Sweet India is given here:

Year : 2006 2007 2008 2009 2010 2011 2012

Demand : 77 88 94 85 91 98 90

(a) Fit a straight line trend by the method of least squares.

(b) Calculate trend values and plot observed values and trend values on a graph.

(c) Eliminate trend component using the multiplicative model.

(d) Obtain the forecast of demand for the year 2014.

(viii) Below are given figures of production of a sugar factory:

Year : 2005 2006 2007 2008 2009 2010 2011 2012

Production : 88 98 100 91 102 107 100 118

(‘000 tons)

(a) Fit straight line trend to the above data by the method of least squares.

(b) What is the average annual change in the sugar production?

(c) Obtain trend values for various years. Show that the sum of difference between actual

and trend values is equal to zero.

(d) Eliminate the trend using multiplicative model. What components are thus left over?

(e) Convert the trend equation on a month-to-month basis and shift the origin to January

2006.

(ix) For each of the following derive the monthly trend equation:

(a) Yt = 960 + 72X; Origin: 2008, X Unit = 1 Year, Y unit = Annual sales of coffee in Rs.

(b) Yt = 169.58 + 78X; Origin: 2009, X Unit = 1 Year, Y unit = Average monthly production

(c) Yt = 2,760 + 212X; Origin: 2007, X Unit = 1/2 Year, Y unit = Annual earnings in Rs.

(d) Yt = 72 + 12X; Origin: 2010, X Unit = 1/2 Year, Y unit = Average monthly production


(x) Given the trend equation:

Yt = 204 + 24X

(2008 = 0, X unit = 1 Year, Y unit = Average monthly values)

(a) Convert this equation on a monthly basis.

(b) Shift the origin of the monthly trend equation to January, 2007.

(c) Estimate the value for January 2010.

(xi) Given the following trend equation:

Yt= 1,880+ 6X

[2009 = 0, X unit = 1 Year, Y unit = Average monthly sales (in ‘000 Rs.)]

(a) Convert this equation on a yearly basis.

(b) Estimate sales for the year 2013.

(c) Obtain a quarterly trend equation from (a) above.

(d) Obtain quarterly trend equation with origin at I Quarter, 2010.

(xii) Given below is a trend equation:

Yt = 372 + 288X

(Origin 2006, X unit = 1 Year, Y unit = Annual sales) Convert the above equation:

(a) To monthly trend equation with January 2007 as origin and estimate sales for March,

2007.

(b) To quarterly trend equation with first quarter, 2007 as the origin and estimate sales for

third quarter of 2007.

(xiii) Convert the following trend equation on a monthly basis and obtain the trend value for

November 2012:

$Y_t = 432 + 144X - 60X^2$

2010 = 0; X unit = 1 Year; Y unit = Yearly production in ‘000 units

(xiv) The sales made by a company in the years 2006 through 2012 are given here :

Year : 2006 2007 2008 2009 2010 2011 2012

Sales (in millions of Rs.) : 30 38 75 90 88 140 188

(a) Fit an exponential trend $Y_t = ab^X$ to the data and obtain the trend equation.

(b) Plot the data on a graph and also plot the trend line.

(c) Find the projected sales for the year 2014.

(d) What is the average rate of growth of sales?

(xv) From the following data, estimate the trend values by taking 4-yearly moving averages :

Year Sales (Rs. lakh) Year Sales (Rs. Lakh)

1993 200 1999 360

1994 120 2000 400

1995 280 2001 320

1996 240 2002 360

1997 160 2003 360

1998 320


(xvi) The trend equation for quarterly sales of a firm is estimated to be as: Y = 20 + 2X, where

Y is sales per quarter in millions of rupees, unit of X is one quarter and the origin is the

middle of the first quarter (Jan.-Mar.) of 2005. The seasonal indices of sales for the four

quarters are given below :

Quarter : I II III IV

Seasonal Index: 120 105 85 90

Estimate the sales for each quarter of 2010.

(xvii) Calculate seasonal indices from the following data by ratio-to-moving averages method:

Year Quarter

I II III IV

2007 48 52 44 60

2008 60 72 64 80

2009 92 84 88 88

2010 96 100 96 104

2011 102 108 96 112

2012 108 116 120 116

(xviii) The ratios of observed values to moving averages in percentages are given in the following

table:

Year Quarter

I II III IV

2009 — — 112.2 96.8

2010 110.8 118.6 98.3 92.2

2011 102.4 116.2 96.3 88.0

2012 89.8 96.3 — —

Calculate the quarterly seasonal indices.
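
For exercise (xviii), a short Python sketch of the final steps of the ratio-to-moving-average method: average the available ratios for each quarter, then scale the four averages so that they total 400. Using the simple arithmetic mean (rather than the median) of the ratios is an assumption of this sketch.

```python
# Ratios to moving averages (per cent), taken from the table above
ratios = {
    "I": [110.8, 102.4, 89.8],
    "II": [118.6, 116.2, 96.3],
    "III": [112.2, 98.3, 96.3],
    "IV": [96.8, 92.2, 88.0],
}

# Step 1: average ratio for each quarter (assumption: arithmetic mean)
means = {q: sum(v) / len(v) for q, v in ratios.items()}

# Step 2: adjust so that the four indices add up to 400
factor = 400 / sum(means.values())
indices = {q: round(m * factor, 2) for q, m in means.items()}

print(indices)
```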

Ans.

6. Yt = 336 + 8X, 8 crores, 360 + 8X

7. Yt = 89 + 2X, 99000 mt

8. Yt = 99 + 3X (2008 = 0), 3000 tons, 90, 93, 96 etc. C & I, Yt = 7.635 + 0.0208X (Jan 2006 = 0)

9. Yt = 80 + 0.5X (July 2008 = 0), Yt = 169.58 + 6.5X (July 1, 2009 = 0), Yt = 230 + 2.944X (July 1, 2007 = 0), Yt = 72 + 2X (July 1, 2010 = 0)

10. Yt = 204 + 2X (July 1, 2008 = 0), Yt = 169 + 2X (Jan 2007 = 0), 241

11. Yt = 22.56 + 72X (2009 = 0), Y 2013 = 22.848, Yt = 5651.25 + 4.5X (Q1, 2010 = 0)

12. Yt = 44 + 2X (Jan 2007 = 0), 48, Yt = 138 + 18X (Q1, 2007 = 0), 174

13. Yt = 36 + X – 0.0347X^2 (2005 = 0), Y Nov. 2012 = 36.315

14. (a) Yt = 78.16 (1.3438)^x, (c) 342.5, (d) 34.38%

15. 205, 225, 260, 290, 330, 355, 360

16. 75.6, 66.15, 53.55, 56.7

17. 101.92, 103.17, 92.08, 102.84

18. 99.52, 108.74, 100.76, 90.98

This question paper contains 16 printed pages

Your Roll No. ............

5565

B.Com. (Hons.)/I

Paper IV—BUSINESS STATISTICS

(New Course : Admissions of 2004 and onwards)

(Write your Roll No. on the top immediately on receipt of this question paper.)

Note :— The maximum marks printed on the question paper are applicable for the students of the

regular colleges (Cat. ‘A’). These marks will, however, be scaled up proportionately in

respect of the students of SOL at the time of posting of awards for compilation of result.

Note : Answers may be written either in English or in Hindi; but the same medium should be

used throughout the paper.

All questions carry equal marks.

1. (a) Distinguish between Mean Deviation and Standard Deviation. Why is Standard Deviation

considered a better measure of variation as compared to Mean Deviation? 6

(b) In 2000 and 2010 the population of a country was 151.3 million and 179.3 million

respectively:

(i) What was the average percentage increase per year?

(ii) Calculate the population for the year 2004.

(iii) Calculate population for the year 2020. 5

Or

(a) The median and mode of the following wage distribution are known to be Rs. 33.5 and

Rs. 34 respectively. Three frequency values from the table are, however, missing. Find

these missing values : 6

Daily wages (in Rs.) Frequencies

0 – 10 10

10 – 20 10

20 – 30 ?

30 – 40 ?

40 – 50 ?

50 – 60 6

60 – 70 4

Total 230

(b) The following table gives heights of boys and girls studying in a college. Find :

(i) Standard deviation of the heights of boys and girls taken together.

(ii) Whose heights are more variable? 5

Boys Girls

Number 400 100

Average height 68 inches 65 inches

Variance 9 4

2. (a) If the first four moments of a distribution about the value 5 are equal to –4, 22, –117 and

560, determine the corresponding moments : 6

(i) About the mean, and

(ii) About zero.

(b) Given the bivariate data :

X Y

1 6

5 1

3 0

2 0

1 1

1 2

7 1

3 5

(i) Fit a regression line of Y on X and hence predict Y if X = 5

(ii) Fit a regression line of X on Y and hence predict X if Y = 2.5

(iii) Calculate Karl Pearson’s coefficient of correlation. 5

Or

(a) What is Kurtosis ? Explain the significance of studying Kurtosis. 5

(b) Coefficient of correlation between X and Y for 20 items is 0.3, mean of X is 15 and that

of Y is 20, standard deviations are 4 and 5 respectively. At the time of calculation one

item 27 has wrongly been taken as 17 in case of X series and 35 instead of 20 in case of

Y series. Find the correct coefficient of correlation. 6

3. (a) Two sets of Indices one with 1998 as base and the other with 2006 are given below: 6

Year Index A Year Index B

1998 100 2006 100

1999 110 2007 105

2000 120 2008 90

2001 190 2009 95

2002 300 2010 102

2003 330 2011 110

2004 360 2012 96

2005 390

2006 400

You are required to splice the Index B to Index A. Then also shift the base to 2008.

(b) Calculate 5 yearly moving average of the number of students studying in a college shown

below : 5

Year No. of Students

2003 332

2004 317

2005 357

2006 392

2007 402

2008 405

2009 410

2010 427

2011 405

2012 431

Or

(a) From the data given below, calculate price index number for 2012 with 2011 as base year :

Commodities 2011 2012

Price Quantity Price Quantity

A 20 8 40 6

B 50 10 60 5

C 40 15 50 15

D 20 20 20 25

Calculate the following :

(i) Laspeyre’s Index

(ii) Paasche Index

(iii) Bowley’s Index and

(iv) Fisher’s Ideal Index. 6
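
The four index numbers asked for above can be sketched in Python from their standard definitions. Here `p0`, `q0` are base-year (2011) prices and quantities and `p1`, `q1` are current-year (2012) values. The helper name `total` is illustrative, and Bowley's index is taken as the arithmetic mean of the Laspeyres and Paasche indices.

```python
import math

# Commodities A, B, C, D
p0 = [20, 50, 40, 20]  # 2011 prices
q0 = [8, 10, 15, 20]   # 2011 quantities
p1 = [40, 60, 50, 20]  # 2012 prices
q1 = [6, 5, 15, 25]    # 2012 quantities


def total(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = total(p1, q0) / total(p0, q0) * 100  # base-year weights
paasche = total(p1, q1) / total(p0, q1) * 100    # current-year weights
bowley = (laspeyres + paasche) / 2               # arithmetic mean of the two
fisher = math.sqrt(laspeyres * paasche)          # geometric mean of the two

print(f"Laspeyres {laspeyres:.2f}, Paasche {paasche:.2f}, "
      f"Bowley {bowley:.2f}, Fisher {fisher:.2f}")
```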

(b) Identify the component of a time series with which each of the following would be associated

and also give reasons why :

(i) A fire in factory delaying production for three weeks

(ii) An era of prosperity

(iii) Sale of sweets during Deepawali

(iv) A need for increased wheat production due to constant increase in population.

(v) The increase in day temperature from winter to summer. 5

4. (a) Define probability. Discuss the importance of probability in decision-making. 5

(b) A bag contains 2 white balls and 3 black balls. Four persons A, B, C, D in the order

named each draws one ball and does not replace it. The first to draw a white ball receives

Rs. 20. Determine their expectations. 6

Or

(a) A man has five coins, one of which has two heads. He randomly takes out a coin and

tosses it three times :

(i) What is the probability that it will fall head upward all the times?

(ii) If it always falls head upward, what is the probability that it is the coin with two

heads?

(b) In a binomial distribution consisting of 5 independent trials, probabilities of 1 and 2

successes are 0.4096 and 0.2048. Find the parameter ‘p’ of the distribution.

5. A food products company is contemplating the introduction of a revolutionary new product

with new packaging to replace the existing product at much higher price (S1) or a moderate

change in the composition of the existing product with a new packaging at a small increase in

price (S2) or a small change in the composition of the existing product except the word ‘New’ with a

negligible increase in price (S3). The three possible states of nature are :

(i) high increase in sales (N1),

(ii) No change in sales (N2) and

(iii) decrease in sales (N3).

The marketing department of the company worked out the payoffs in terms of yearly net profits

for each of the strategies of these events. This is represented in the following table : 11

State of Nature

Payoffs (in Rs.)

Strategies N1 N2 N3

S1 7,00,000 3,00,000 1,50,000

S2 5,00,000 4,50,000 0

S3 3,00,000 3,00,000 3,00,000

Which strategy should the executive choose on the basis of:

(i) Maximin Criterion

(ii) Maximax Criterion

(iii) Minimax Regret Criterion, and

(iv) Laplace Criterion.
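
The four criteria can be applied to the payoff table mechanically. The Python sketch below uses the payoffs given above. The dictionary layout and variable names are illustrative, and the Laplace criterion is taken as the simple average payoff with equal probabilities for N1, N2 and N3.

```python
# Payoffs in Rs. for states of nature N1, N2, N3
payoffs = {
    "S1": [700_000, 300_000, 150_000],
    "S2": [500_000, 450_000, 0],
    "S3": [300_000, 300_000, 300_000],
}

# Maximin: best of the worst payoffs
maximin = max(payoffs, key=lambda s: min(payoffs[s]))

# Maximax: best of the best payoffs
maximax = max(payoffs, key=lambda s: max(payoffs[s]))

# Minimax regret: regret = best payoff in that state minus actual payoff
best_per_state = [max(column) for column in zip(*payoffs.values())]
regret = {
    s: [best - p for best, p in zip(best_per_state, row)]
    for s, row in payoffs.items()
}
minimax_regret = min(regret, key=lambda s: max(regret[s]))

# Laplace: highest average payoff, all states equally likely
laplace = max(payoffs, key=lambda s: sum(payoffs[s]) / len(payoffs[s]))

print(maximin, maximax, minimax_regret, laplace)
```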

Or

(a) What is Standard Error of Estimate ? Why is it calculated? 4

(b) The arithmetic mean of a set of statistical observations is 20 while its geometric mean is

19 and Harmonic Mean is 25. Comment on the statement. 3

(c) The Mean and Standard Deviation of two brands are given below :

Brand-I Brand-II

Mean 800 hours 770 hours

Standard Deviation 100 hours 60 hours

Calculate a measure of relative dispersion for the two brands and interpret the result. 4
