Sh. K.B.Gupta
Content Writers
Dr. Alok Kumar, Dr. Rakesh Kumar Gupta
Revised by
Dr. Sumita Jain
Content Reviewer from the DDCE/COL/SOL
Dr. Neha Nainwal and Dr. Promila Bharadwaj
Academic Coordinator
Mr. Deekshant Awasthi
Published by:
Department of Distance and Continuing Education
Campus of Open Learning/School of Open Learning,
University of Delhi, Delhi-110007
Printed by:
School of Open Learning, University of Delhi
DISCLAIMER
• Units I-V are edited versions of study material prepared for the courses under Annual & CBCS Mode.
• Corrections/modifications/suggestions proposed by the Statutory Body, DU/stakeholder(s) in the Self Learning Material (SLM) will be incorporated in the next edition. In the meantime, these corrections/modifications/suggestions will be uploaded on the website https://sol.du.ac.in. Any feedback or suggestions can be sent to the email feedbackslm@col.du.ac.in.
Printed at: Taxmann Publications Pvt. Ltd., 21/35, West Punjabi Bagh,
New Delhi - 110026 (18000 Copies, 2023)
UNIT 1
Lesson 1 : Preparation of Frequency Distribution and their Graphical Presentation
1.1 Learning Objectives 3
1.2 What is Frequency Distribution 3
1.3 Types of Frequency Distribution 4
1.4 Principles of Frequency Distribution 8
1.5 Graphs 13
1.6 Summary 25
1.7 Self-Assessment Questions 26
© Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
B.COM. (PROGRAMME)
Lesson 5 : Moments
5.1 Learning Objectives 153
5.2 Concept of Central Moments 153
5.3 Sheppard’s Method 163
5.4 Coefficients of Moments
5.5 Summary 167
5.6 Self-Assessment Questions 168
CONTENTS
UNIT 2
Lesson 1 : Theory of Probability
1.1 Learning Objectives 177
1.2 Probability Distribution 177
1.3 Basic Terminology in Probability 180
1.4 Methods of Assigning Probability 185
1.5 Computation of Probability 189
1.6 Laws of Probability 194
1.7 Bayes’ Theorem 202
1.8 Expected Value 206
1.9 Summary 208
1.10 Self-Assessment Questions 210
3.7 Decision Tree 263
3.8 Summary 268
3.9 Self-Assessment Questions 269
UNIT 3
Lesson 1 : Simple Correlation
1.1 Learning Objectives 279
1.2 Introduction 279
1.3 Utility of Correlation 280
1.4 Difference between Correlation and Causation 281
1.5 Types of Correlation 282
1.6 Methods of Studying Correlation 283
1.7 Summary 301
1.8 Self-Assessment Questions 302
UNIT 4
Lesson 1 : Index Numbers
1.1 Learning Objectives 335
1.2 Introduction 336
1.3 Features of Index Numbers 337
1.4 Problems of Index Numbers 338
1.5 Methods of Constructing Index Numbers 341
1.6 Tests of Adequacy or Consistency 350
1.7 Chain Base Index 353
1.8 Splicing 356
1.9 Consumer Price Index 358
1.10 Index Number of Industrial Production 360
1.11 Limitations of Index Numbers 361
1.12 Construction of BSE Sensex and NSE Nifty 361
1.13 Summary 370
1.14 Self-Assessment Questions 371
UNIT 5
Lesson 1 : Time Series Analysis
1.1 Learning Objectives 379
1.2 Introduction 379
1.3 Components of Time Series 380
1.4 Models of Time Series 383
1.5 Methods of Measuring Trend 384
1.6 Second Degree Parabola 396
1.7 Exponential Trend 398
1.8 Shifting the Trend Origin 400
1.9 Conversion of Annual Trend to Monthly Trend 401
1.10 Measurement of Seasonal Variations 403
1.11 Summary 418
1.12 Self-Assessment Questions 419
Glossary 431
UNIT-1
LESSON 1
Preparation of Frequency Distribution and their Graphical Presentation
STRUCTURE
1.1 Learning Objectives
1.2 What is Frequency Distribution
1.3 Types of Frequency Distribution
1.4 Principles of Frequency Distribution
1.5 Graphs
1.6 Summary
1.7 Self-Assessment Questions
BUSINESS STATISTICS
classify the data into different classes by dividing the entire group of values of the variable into a suitable number of groups and then recording the number of observations in each group. Thus, if we divide the total range of values of the variable (marks of 50 students), i.e. 78 – 15 = 63, into groups of 10 each, we get 63/10 = 6.3, i.e. 7 groups, and the distribution of marks is displayed by the following frequency distribution:
Marks of 50 Students
Marks (X)    Tally Bars          Number of Students (f)
15–25        |||                  3
25–35        |||| ||||            9
35–45        |||| |||| |||       13
45–55        |||| |||| |||       13
55–65        |||| ||||            9
65–75        ||                   2
75–85        |                    1
Total                            50
The various groups into which the values of the variable are classified are known as classes; the length of the class interval (10) is called the width or magnitude of the class. The two values specifying a class are called the class limits. The presentation of the data in continuous classes with the corresponding frequencies is known as a continuous frequency distribution. There are two methods of classifying the data according to class intervals:
(i) Exclusive method
(ii) Inclusive method
In the exclusive method, the class intervals are fixed in such a manner that the upper limit of one class becomes the lower limit of the following class. Moreover, an item equal to the upper limit of a class is excluded from that class and included in the next class. The following data are classified on this basis.
Income (Rs.) No. of Persons
200–250 50
250–300 100
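The exclusive rule can be sketched in Python, assuming equal-width classes that start at a chosen lower limit. The function `classify_exclusive` and the sample incomes are hypothetical illustrations, not taken from the text; the standard-library `bisect.bisect_right` places a value equal to a class boundary to the right of it, so an item equal to an upper limit is counted in the next class.

```python
from bisect import bisect_right

def classify_exclusive(values, lower, width, n_classes):
    """Count values into exclusive classes [lower + k*width, lower + (k+1)*width).

    A value equal to a class's upper limit is counted in the next class,
    as the exclusive method requires.
    """
    # Class boundaries: lower, lower + width, ..., lower + n_classes * width
    edges = [lower + k * width for k in range(n_classes + 1)]
    counts = [0] * n_classes
    for v in values:
        if not edges[0] <= v < edges[-1]:
            raise ValueError(f"{v} lies outside {edges[0]}-{edges[-1]}")
        # bisect_right puts a value equal to an edge to the right of that edge,
        # i.e. into the class whose lower limit it equals
        counts[bisect_right(edges, v) - 1] += 1
    labels = [f"{edges[k]}-{edges[k + 1]}" for k in range(n_classes)]
    return dict(zip(labels, counts))

# Hypothetical incomes (Rs.): 250 goes to 250-300, not to 200-250
incomes = [200, 215, 249, 250, 270, 299, 300]
print(classify_exclusive(incomes, lower=200, width=50, n_classes=3))
# {'200-250': 3, '250-300': 3, '300-350': 1}
```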
2. Number of Classes: The choice of the number of classes into which a given frequency distribution should be divided depends upon:
(i) The total frequency, i.e. the total number of observations in the distribution.
(ii) The nature of the data, i.e. the size or magnitude of the values of the variable.
(iii) The desired accuracy.
(iv) The convenience of computing the various descriptive measures of the frequency distribution, such as the mean, variance, etc.
The number of classes should be neither too small nor too large. If there are few classes, the classification becomes very broad and rough, which might obscure some important features and characteristics of the data. The accuracy of the results decreases as the number of classes becomes smaller. On the other hand, too many classes will result in very few frequencies in each class. This gives an irregular pattern of frequencies across the classes and thus makes the frequency distribution irregular. Moreover, a large number of classes will render the distribution too unwieldy to handle. The computational work for further processing of the data will become quite tedious and time-consuming without any proportionate gain in the accuracy of the results. Hence a balance should be maintained between the loss of information in the first case and the irregularity of the frequency distribution in the second case, to arrive at a sensible compromise giving the optimum number of classes. Normally, the number of classes should be not less than 5 and not more than 20. Prof. Sturges has given a formula:
$$k = 1 + 3.322 \log_{10} n$$
where k refers to the number of classes and n is the total frequency or number of observations. The value of k is rounded to the next higher integer:
If n = 100: k = 1 + 3.322 log 100 = 1 + 6.644 = 7.644, i.e. k = 8
If n = 10,000: k = 1 + 3.322 log 10,000 = 1 + 13.288 = 14.288, i.e. k = 15
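The rule can be checked with a short Python sketch. The function name `sturges_classes` is hypothetical, and the logarithm is taken to base 10, as in the worked figures (log 100 = 2).

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + 3.322 log10(n),
    rounded up to the next higher integer."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return math.ceil(1 + 3.322 * math.log10(n))

for n in (100, 10_000):
    print(n, sturges_classes(n))
# 100 8
# 10000 15
```

Since 3.322 is approximately 1/log₁₀ 2, the rule is essentially k = 1 + log₂ n.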
However, this rule should be applied only when the number of observations is not very small.
Moreover, the number of class intervals should be such that they give a uniform and unimodal distribution, which means that the frequencies in the given classes increase and decrease steadily and there are no sudden jumps. The number of classes should be an integer, preferably 5 or a multiple of 5 such as 10, 15, 20, 25, etc., which are quite convenient for numerical computations.
3. Size of class intervals: Because the size of the class interval is inversely proportional to the number of classes in a given distribution, the choice of the size of the class interval will also depend upon the sound subjective judgment of the statistician. An approximate value of the magnitude of the class interval, say i, can be calculated with the help of Sturges' rule:
$$i = \frac{\text{Range}}{1 + 3.322 \log_{10} n}$$
where i stands for the class magnitude or interval, Range is the difference between the largest and smallest values of the distribution, and n refers to the total number of observations.
If we are given the following information: n = 400, Largest item = 1300 and Smallest item = 340,
then
$$i = \frac{1300 - 340}{1 + 3.322 \log 400} = \frac{960}{1 + 3.322 \times 2.6021} = \frac{960}{9.644} = 99.54 \approx 100$$
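The same calculation as a hypothetical helper, `sturges_interval`, which returns the unrounded interval. Rounding to a convenient figure is left to judgment; the rounding to a multiple of 10 shown here is only one assumed choice.

```python
import math

def sturges_interval(largest, smallest, n):
    """Approximate class interval: i = (largest - smallest) / (1 + 3.322 log10 n)."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

i = sturges_interval(1300, 340, 400)
print(round(i, 2))            # 99.54
print(10 * round(i / 10))     # 100, one convenient rounding (assumed, not prescribed)
```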
Another rule of thumb for determining the size of the class interval is that the length of the class interval should not be greater than one-fourth of the estimated population standard deviation. If σ is the estimate of the population standard deviation, then the length of the class interval is given by:
$$i \le \frac{\sigma}{4}$$
The size of class intervals should be taken as 5 or a multiple of 5, such as 10, 15 or 20, for easy computation of the various statistical measures of the frequency distribution. Class intervals should be so fixed that each class has a convenient mid-point around which all the observations in that class cluster. This means that the entire frequency of the class is assumed to be concentrated at the mid-value of the class. This assumption will be true only if the frequencies of the different classes are uniformly distributed in the respective class intervals.
6. Open end classes: A classification is termed an open end classification if the lower limit of the first class or the upper limit of the last class, or both, are not specified; such classes, in which one of the limits is missing, are called open end classes. Examples are classes such as 'marks less than 20' or 'age below 60 years'. As far as possible, open end classes should be avoided because in such classes the mid-value cannot be accurately obtained. But if open end classes are inevitable, it is customary to estimate the class mark or mid-value of the first class with reference to the succeeding class. In other words, we assume that the magnitude of the first class is the same as that of the second class.
Example 1 : Construct a frequency distribution from the following data
by inclusive method taking 4 as the class interval :
10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 19 23 31
Solution: Because the minimum value of the variable is 10, which is a very convenient figure for the lower limit of the first class, and the magnitude of the class interval is given to be 4, the classes for preparing the frequency distribution by the inclusive method will be 10–13, 14–17, 18–21, 22–25, ..., 38–41.
Frequency Distribution
Class Interval    Tally Bars      Frequency (f)
10–13             ||||             5
14–17             |||| |||         8
18–21             |||| |||         8
22–25             |||| ||          7
26–29             ||||             5
30–33             ||||             4
34–37             ||               2
38–41             |                1
Total                             40
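Example 1 can be reproduced with a short Python sketch, assuming integer data (as here). With inclusive classes of width 4 starting at 10, both limits belong to the class, so a value x falls in class number (x − 10) // 4. The helper name `inclusive_distribution` is hypothetical.

```python
data = [
    10, 17, 15, 22, 11, 16, 19, 24, 29, 18,
    25, 26, 32, 14, 17, 20, 23, 27, 30, 12,
    15, 18, 24, 36, 18, 15, 21, 28, 33, 38,
    34, 13, 10, 16, 20, 22, 29, 19, 23, 31,
]

def inclusive_distribution(values, first_lower, width):
    """Frequency table with inclusive classes such as 10-13, 14-17, ...
    Assumes integer values no smaller than first_lower."""
    if min(values) < first_lower:
        raise ValueError("a value lies below the first class")
    n_classes = (max(values) - first_lower) // width + 1
    freq = [0] * n_classes
    for x in values:
        freq[(x - first_lower) // width] += 1
    # (lower limit, upper limit, frequency) for each class
    return [(first_lower + k * width, first_lower + k * width + width - 1, f)
            for k, f in enumerate(freq)]

table = inclusive_distribution(data, first_lower=10, width=4)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(f for _, _, f in table))   # Total: 40
```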
1.5 Graphs
The guiding principles for the graphic representation of frequency distributions are precisely the same as for the diagrammatic and graphic representation of other types of data. The information contained in a frequency distribution can be shown in graphs, which reveal important characteristics and relationships that are not easily discernible on a simple examination of the frequency tables. The most commonly used graphs for charting a frequency distribution, for a general understanding of the details of the data, are:
1. Histogram
2. Frequency polygon
3. Smoothed frequency curves/Frequency Curves
4. Ogives or cumulative frequency curves.
1.5.1 Histogram
The term ‘histogram’ must not be confused with the term ‘historigram’
which relates to time charts. A histogram is the best way of presenting a simple frequency distribution graphically. Statistically, a histogram is a graph that represents the class frequencies in a frequency distribution by adjacent vertical rectangles.
While constructing a histogram, the variable is always taken on the X-axis and the corresponding frequencies on the Y-axis. Each class is then represented by a distance on the scale that is proportional to its class interval. The width of each rectangle on the X-axis remains the same if the class intervals are uniform throughout; if they differ, the widths of the rectangles change proportionately. The Y-axis represents the frequency of each class, which constitutes the height of its rectangle. We thus get a series of rectangles, each having its class interval as its width and its frequency as its height. The area of the histogram represents the total frequency.
The histogram should be clearly distinguished from a bar diagram. A bar diagram is one-dimensional, i.e. only the length of the bar is important and not its width, whereas a histogram is two-dimensional, i.e. both the length and the width are important. However, a histogram can be misleading if the distribution has unequal class intervals and suitable adjustments in frequencies are not made.
The technique of constructing histogram is explained for :
(i) distributions having equal class-intervals and
(ii) distributions having unequal class-intervals.
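For unequal class intervals, one common adjustment (assumed here; the text does not spell it out at this point) is to scale each frequency to a standard width, usually the smallest class width: adjusted height = f × (standard width / class width). The area of each rectangle then stays proportional to its frequency. A minimal sketch, with a hypothetical helper `histogram_bars` and hypothetical classes:

```python
def histogram_bars(classes, base_width=None):
    """Return (left edge, width, height) for each histogram rectangle.

    classes: list of (lower, upper, frequency) tuples in exclusive form.
    Heights are frequencies scaled to base_width (default: the smallest
    class width), so rectangle area stays proportional to frequency.
    """
    widths = [upper - lower for lower, upper, _ in classes]
    if base_width is None:
        base_width = min(widths)
    return [(lower, w, f * base_width / w)
            for (lower, _, f), w in zip(classes, widths)]

# Hypothetical unequal classes: 10-30 is twice as wide as the others
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 8)]
for left, width, height in histogram_bars(classes):
    print(left, width, height)
# 0 10 5.0
# 10 20 10.0
# 30 10 8.0
```

With equal class intervals every scale factor is 1, so the heights are simply the frequencies.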
When class intervals are equal, take frequency on the Y-axis and the variable on the X-axis and construct rectangles. In such a case the heights of the rectangles will be proportional to the frequencies.
Example 3 : Draw a histogram from the following data :
Classes Frequency
0 – 10 5
10 – 20 11
20 – 30 19
30 – 40 21
40 – 50 16
50 – 60 10
60 – 70 8
70 – 80 6
80 – 90 3
90 – 100 1
Solution:
[Figure: Histogram (X-axis: Monthly Income)]
When mid-points are given, we first ascertain the upper and lower limits of each class and then construct the histogram in the same manner.
Example 5 : Draw a histogram of the following distribution :
Life of Electric Lamps (hours)    Firm A    Firm B
1010                               10        287
1030                              130        105
1050                              482         26
1070                              360        230
1090                               18        352
Solution: Since we are given the mid-points, we should ascertain the class limits. To calculate the class limits of the various classes, take the difference between two consecutive mid-points and divide it by 2; then subtract the value obtained from, and add it to, each mid-point to obtain the lower and upper class limits.
Life of Electric Lamps (hours)    Frequency Firm A    Frequency Firm B
1000–1020                           10                  287
1020–1040                          130                  105
1040–1060                          482                   26
1060–1080                          360                  230
1080–1100                           18                  352
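A minimal sketch of the mid-point rule used in Example 5, assuming equally spaced mid-points; the hypothetical helper `limits_from_midpoints` rejects anything else.

```python
def limits_from_midpoints(midpoints):
    """Return (lower, upper) class limits for equally spaced mid-points.

    Half the gap between consecutive mid-points is subtracted from, and
    added to, each mid-point.
    """
    gaps = {b - a for a, b in zip(midpoints, midpoints[1:])}
    if len(gaps) != 1:
        raise ValueError("need at least two equally spaced mid-points")
    half = gaps.pop() / 2
    return [(m - half, m + half) for m in midpoints]

print(limits_from_midpoints([1010, 1030, 1050, 1070, 1090]))
# [(1000.0, 1020.0), (1020.0, 1040.0), (1040.0, 1060.0), (1060.0, 1080.0), (1080.0, 1100.0)]
```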
[Figure: Histograms for Firm A and Firm B (X-axis: Life of Lamps; Y-axis: Frequency)]
[Figure (X-axis: Class Mark; Y-axis: Number of Students (Frequency))]
[Figure: Histogram with frequency polygon and frequency curve (Y-axis: No. of Leaves; X-axis: class boundaries 6.5 to 14.5)]
[Figure: Ogive (X-axis: Sizes, 90.5 to 230.5; Y-axis: Cumulative Frequency)]
[Figure: Ogive (X-axis: Sizes, 90.5 to 230.5; Y-axis: Cumulative Frequency)]
Solution:
Profit (Rs. Crores)    No. of Companies
Less than 20             8
Less than 30            20
Less than 40            40
Less than 50            64
Less than 60            79
Less than 70            89
Less than 80            96
Less than 90            99
[Figure: Ogive by 'less than' method (X-axis: Profit; Y-axis: No. of Companies), read at Rs. 45 crores (51) and Rs. 75 crores (92); 92 – 51 = 41]
It is clear from the graph that the number of companies getting profits
less than Rs. 75 crores is 92 and the number of companies getting profits
less than Rs. 45 crores is 51. Hence the number of companies getting
profits between Rs. 45 crores and Rs. 75 crores is 92–51 = 41.
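Reading an ogive can be imitated by straight-line interpolation between the plotted points (upper class limit, cumulative frequency). The sketch below, with a hypothetical helper `less_than_cf`, uses the cumulative table above. Straight-line interpolation gives 92.5 and 52; readings taken from a smoothed freehand curve, as in the text (92 and 51), may differ slightly.

```python
def less_than_cf(upper_limits, cum_freq, x):
    """Read a 'less than' ogive at x by straight-line interpolation between
    the plotted points (upper class limit, cumulative frequency)."""
    if not upper_limits[0] <= x <= upper_limits[-1]:
        raise ValueError("x lies outside the plotted range")
    points = list(zip(upper_limits, cum_freq))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

limits = [20, 30, 40, 50, 60, 70, 80, 90]
cf = [8, 20, 40, 64, 79, 89, 96, 99]
below_75 = less_than_cf(limits, cf, 75)   # 92.5
below_45 = less_than_cf(limits, cf, 45)   # 52.0
print(below_75 - below_45)                # 40.5 companies between Rs. 45 and 75 crores
```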
Example 8: The following distribution relates to the weight in grams of mangoes of a given variety. If mangoes weighing less than 443 grams are considered unsuitable for the foreign market, what percentage of the total yield is suitable for it? Assume the given frequency distribution to be typical of the variety:
Weight in gms. No. of Mangoes
410–419 10
420–429 20
430–439 42
440–449 54
450–459 45
460–469 18
470–479 7
Draw an ogive of ‘more than’ type of the above data and deduce how
many mangoes will be more than 443 grams.
Solution: Mangoes weighing more than 443 gms. are suitable for the foreign market. The mangoes weighing more than 443 gms lie in the last four classes. The number of mangoes weighing between 444 and 449 grams would be:
$$\frac{6}{10} \times 54 = \frac{324}{10} = 32.4$$
Total number of mangoes weighing more than 443 gms. = 32.4 + 45 + 18 + 7 = 102.4
$$\text{Percentage of mangoes} = \frac{102.4}{196} \times 100 = 52.24$$
Therefore, the percentage of the total mangoes suitable for the foreign market is 52.24.
OGIVE BY MORE THAN METHOD
Weight more than (gms.)    No. of Mangoes
410                         196
420                         186
430                         166
440                         124
450                          70
460                          25
470                           7
[Figure: Ogive by 'more than' method (X-axis: Weight in grams)]
From the graph it can be seen that about 103 mangoes weigh more than 443 gms and are suitable for the foreign market.
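The arithmetic of Example 8 can be checked in Python. The sketch builds the 'more than' cumulative frequencies from the class frequencies and repeats the text's proportional step for the 440–449 class. Counting the 6 integer weights 444–449 as 6/10 of that class is the text's approximation and assumes the mangoes are spread evenly across the class.

```python
freq = [10, 20, 42, 54, 45, 18, 7]   # classes 410-419, 420-429, ..., 470-479

# 'More than' cumulative frequencies: mangoes in a class and in all classes above it
more_than = [sum(freq[k:]) for k in range(len(freq))]
print(more_than)   # [196, 186, 166, 124, 70, 25, 7]

# Proportional step: 6 of the 10 integer weights in 440-449 (444-449)
# exceed 443 g, so 6/10 of that class's 54 mangoes are counted
suitable = freq[3] * 6 / 10 + sum(freq[4:])
share = suitable / sum(freq) * 100
print(round(suitable, 1), round(share, 2))   # 102.4 52.24
```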
1.6 Summary
A frequency distribution aims to reduce the size of the given set of data
for a better comprehension. An array, which is an arrangement of data
in an ascending or descending order of magnitude, is a useful step in
preparing a frequency distribution. To prepare a frequency distribution, we
have to decide about the class intervals to be taken. The width of class
intervals depends on the number of classes. The number of classes should
not be very small or very large. Given values are considered one by one
and placed in appropriate class intervals. The number of observations in
each class is called the class frequency.
The class intervals may be exclusive (overlapping), like 10–20, 20–30, etc., or inclusive, like 10–19, 20–29, etc. Inclusive class intervals should be transformed into exclusive classes, depending on the way the given data are recorded. Class mid-points are the points that lie halfway between the two class limits. The frequencies of a distribution can also be cumulated in ascending or descending order; these are known as ACF and DCF, respectively. The ACF are 'less than' cumulative frequencies, while the DCF are 'more than' cumulative frequencies. Absolute class frequencies may also be expressed as relative frequencies, either as proportions or as percentages. A frequency distribution may have class intervals of equal or unequal width.
A frequency distribution may be shown graphically by a histogram and a frequency polygon. A histogram consists of bars drawn over the class limits, with heights such that the areas of the bars are proportional to the frequencies of the various class intervals. A frequency polygon is a line chart drawn by joining the points given by the class mid-points and class frequencies. Cumulative frequencies are shown graphically by means of ogives.
(vi) From the time cards of a factory, the following information has been obtained about the number of days each of the 48 workers reported late for work during the last month:
3 0 5 0 6 2 1 0 4 6 5 2 1 1 1 3 4 2 2 5 6 3 0 2
2 3 2 5 4 2 4 3 5 2 2 2 4 6 4 0 3 1 1 4 5 2 1 1
Draw a 'less than' ogive and answer the following from it:
(a) What are the limits within which the incomes of the middle 50 per cent of the families lie?
(b) It is decided that 80 per cent of the families should pay income tax. What is the minimum taxable income?
(c) What is the minimum income of the richest 30 per cent of the families?
Ans.
(x) (b) = 47 and 70
(xi) (b) = 126 and 86
(xii) (a) = 5000 – 9250
(b) = 4600
(c) = 8250
LESSON 2
Measures of Central Tendency–Mathematical and Positional Averages
STRUCTURE
2.1 Learning Objectives
2.2 What is Central Tendency?
2.3 Objectives of Central Tendency
2.4 Characteristics
2.5 Types of Averages
2.6 Mean
2.7 Geometric Mean
2.8 Harmonic Mean
2.9 Median
2.10 Other Positional Averages
2.11 Calculation of Missing Frequencies
2.12 Mode
2.13 Summary
2.14 Self-Assessment Questions
The second objective is that, since an average represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared with the average performance of a different group.
The third objective is that an average helps in computing various other statistical measures, such as dispersion, skewness, kurtosis, etc.
2.4 Characteristics
Since an average represents the statistical data and is used for purposes of comparison, it must possess the following properties:
1. It must be rigidly defined and not left to the mere estimation of the observer. If the definition is rigid, the values of the average computed by different persons will be the same.
2. It must be based upon all the values given in the distribution. If it is not based on all the values, it might not be representative of the entire group of data.
3. It should be easily understood. The average should possess simple and obvious properties; it should not be too abstract for common people.
4. It should be capable of being calculated with reasonable care and rapidity.
5. It should be stable and unaffected by sampling fluctuations.
6. It should be capable of further algebraic manipulation.
2.6 Mean
$$\bar{X} = \frac{\Sigma X}{N}$$
$$\bar{X} = \frac{250 + 275 + 265 + 280 + 400 + 490 + 670 + 890 + 1100 + 1250}{10} = \frac{5870}{10} = \text{Rs. } 587$$
Short-cut Method: The direct method is suitable where the number of items is moderate and the figures are small integers. But if the number of items is large and/or the values of the variate are big, adding together all the values may be a lengthy process. To overcome this computational difficulty, a short-cut method may be used. The short-cut method is based on an important characteristic of the arithmetic mean: the algebraic sum of the deviations of a series of individual observations from their mean is always equal to zero. Thus, the deviations of the various values of the variate from an assumed mean are computed and their sum is divided by the number of items. The quotient obtained is added to the assumed mean to find the arithmetic mean.
Symbolically,
$$\bar{X} = A + \frac{\Sigma d_x}{N}$$
where A is the assumed mean and the deviations are $d_x = X - A$.
We can solve the previous example by short-cut method.
Computation of Arithmetic Mean
Serial Number    Salary (Rupees) X    Deviations from assumed mean, dx = (X – A), A = 400
1.                250                  – 150
2.                275                  – 125
3.                265                  – 135
4.                280                  – 120
5.                400                     0
6.                490                  + 90
7.                670                  + 270
8.                890                  + 490
9.               1100                  + 700
10.              1250                  + 850
N = 10                                 Σdx = 1870
$$\bar{X} = A + \frac{\Sigma d_x}{N}$$
By substituting the values in the formula, we get
$$\bar{X} = 400 + \frac{1870}{10} = \text{Rs. } 587$$
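Both methods can be compared in a few lines of Python; the function names are hypothetical. Because the deviations from the true mean sum to zero, the short-cut formula returns the same mean whatever assumed mean A is chosen.

```python
salaries = [250, 275, 265, 280, 400, 490, 670, 890, 1100, 1250]

def mean_direct(xs):
    """X-bar = sum(X) / N."""
    return sum(xs) / len(xs)

def mean_short_cut(xs, assumed):
    """X-bar = A + sum(d) / N, where d = X - A."""
    return assumed + sum(x - assumed for x in xs) / len(xs)

print(mean_direct(salaries))            # 587.0
print(mean_short_cut(salaries, 400))    # 587.0
print(mean_short_cut(salaries, 1000))   # 587.0, a different A gives the same mean
print(sum(x - 587 for x in salaries))   # 0, deviations from the mean cancel out
```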
Computation of Arithmetic Mean in Discrete Series: In a discrete series, the arithmetic mean may be computed by both the direct and short-cut methods. The formula according to the direct method is:
$$\bar{X} = \frac{1}{N}\left(f_1 X_1 + f_2 X_2 + \cdots + f_n X_n\right) = \frac{\Sigma fX}{N}$$
where the variable values $X_1, X_2, \ldots, X_n$ have frequencies $f_1, f_2, \ldots, f_n$ and $N = \Sigma f$.
Example 2: The following table gives the distribution of 100 accidents over the days of the week in a given month. During that month there were 5 Fridays and 5 Saturdays, and only 4 each of the other days. Calculate the average number of accidents per day.
Days:                  Sun   Mon   Tue   Wed   Thur   Fri   Sat   Total
Number of accidents:    20    22    10     9     11     8    20     100
Solution:
Calculation of Number of Accidents per Day
Day          No. of Accidents (X)    No. of days in month (f)    Total accidents (fX)
Sunday       20                      4                            80
Monday       22                      4                            88
Tuesday      10                      4                            40
Wednesday     9                      4                            36
Thursday     11                      4                            44
Friday        8                      5                            40
Saturday     20                      5                           100
Total       100                      N = 30                      ΣfX = 428
$$\bar{X} = \frac{\Sigma fX}{N} = \frac{428}{30} = 14.27 \approx 14 \text{ accidents per day}$$
The formula for computation of the arithmetic mean according to the short-cut method is
$$\bar{X} = A + \frac{\Sigma f d_x}{N}$$
where A is the assumed mean, $d_x = X - A$ and $N = \Sigma f$.
We can solve the previous example by the short-cut method as given below:
Calculation of Average Accidents per Day
Day        X     dx = X – A (A = 10)    f    fdx
Sunday     20    + 10                   4    + 40
Monday     22    + 12                   4    + 48
Tuesday    10      0                    4      0
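A sketch of the direct and short-cut methods for a discrete series, using the X and f columns of the accidents table exactly as they appear in the solution; the variable names are illustrative.

```python
x = [20, 22, 10, 9, 11, 8, 20]   # X: accidents, Sunday to Saturday
f = [4, 4, 4, 4, 4, 5, 5]        # f: number of such days in the month

n = sum(f)                                              # N = sum of f
direct = sum(fi * xi for fi, xi in zip(f, x)) / n       # sum(fX) / N

A = 10                                                  # assumed mean
short_cut = A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / n   # A + sum(f dx) / N

print(n, round(direct, 2), round(short_cut, 2))   # 30 14.27 14.27
```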
We can observe that the answer for average marks, i.e. 28.8, is identical under all methods.
(iv) If we are given the arithmetic means and numbers of items of two or more groups, we can compute the combined average of these groups by applying the following formula:
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
Example 6: The mean age of a combined group of men and women is 30 years. If the mean age of the group of men is 32 and that of the group of women is 27, find the percentage of men and women in the group.
Solution: Let us take the group of men as the first group and the women as the second group. Therefore, $\bar{X}_1$ = 32 years, $\bar{X}_2$ = 27 years, and $\bar{X}_{12}$ = 30 years. In the problem, we are not given the numbers of men and women. We can assume $N_1 + N_2 = 100$, and therefore $N_1 = 100 - N_2$.
Apply
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
$$30 = \frac{32N_1 + 27N_2}{100} \qquad (\text{substitute } N_1 = 100 - N_2)$$
$$30 \times 100 = 32(100 - N_2) + 27N_2 \;\Rightarrow\; 3000 = 3200 - 5N_2 \;\Rightarrow\; 5N_2 = 200$$
$$N_2 = 200/5 = 40\%$$
$$N_1 = 100 - N_2 = 100 - 40 = 60\%$$
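The combined-mean formula and its rearrangement for Example 6 can be sketched as follows. Solving $\bar{X}_{12}(N_1 + N_2) = N_1\bar{X}_1 + N_2\bar{X}_2$ for the share of the first group gives $N_1/(N_1+N_2) = (\bar{X}_{12} - \bar{X}_2)/(\bar{X}_1 - \bar{X}_2)$. The helper names are hypothetical.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups: (N1*X1 + N2*X2) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

def share_of_first_group(mean1, mean2, combined):
    """Percentage of items in group 1: (X12 - X2) / (X1 - X2) * 100."""
    if mean1 == mean2:
        raise ValueError("group means must differ")
    return (combined - mean2) / (mean1 - mean2) * 100

men = share_of_first_group(32, 27, 30)
women = 100 - men
print(round(men, 2), round(women, 2))                 # 60.0 40.0
print(round(combined_mean(men, 32, women, 27), 2))    # 30.0
```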
$$\Sigma X = N\bar{X}$$
That is, if we replace each item in the series by the mean, the sum of these substitutions will be equal to the sum of the individual items. This property is used to find aggregate values and corrected averages. We can understand the property with the help of an example.
Example 7: The mean of 100 observations is found to be 44. If, at the time of computation, two items were wrongly taken as 30 and 27 in place of 3 and 72, find the corrected average.
Solution:
$$\bar{X} = \frac{\Sigma X}{N}$$
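Using $\Sigma X = N\bar{X}$, the correction in Example 7 can be sketched in Python: rebuild the total from the wrong mean, swap the wrong items for the correct ones, and divide by N again. The function name is hypothetical.

```python
def corrected_mean(n, wrong_mean, wrong_items, correct_items):
    """Correct a mean using sum(X) = N * X-bar."""
    wrong_total = n * wrong_mean                     # sum(X) as originally computed
    right_total = wrong_total - sum(wrong_items) + sum(correct_items)
    return right_total / n

print(corrected_mean(100, 44, [30, 27], [3, 72]))   # 44.18
```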
by 10. Therefore, the appropriate assumption in this case would be that the lower limit of the first class is zero and the upper limit of the last class is 150. In other open-end class distributions, the first class limit should be fixed on the basis of the succeeding class interval and the last class limit on the basis of the preceding class interval. If the class intervals are of varying width, an effort should be made to avoid calculating the mean and mode; it is advisable to calculate the median.
where W1, W2, W3, W4 are the weights and X1, X2, X3, X4 represent the prices of the 4 varieties of toy.
Hence, by substituting the values of W1, W2, W3, W4 and X1, X2, X3, X4, we get
$$\bar{X}_w = \frac{(50 \times 3) + (25 \times 5) + (15 \times 7) + (10 \times 9)}{50 + 25 + 15 + 10} = \frac{150 + 125 + 105 + 90}{100} = \frac{470}{100} = \text{Rs. } 4.70$$
The table given below demonstrates the procedure of computing the weighted mean.
Weighted Arithmetic Mean of Toys Sold by the Raja Shop
Toy | Price per toy (Rs.) X | Number sold W | Price × weight WX
Car | 3 | 50 | 150
Locomotive | 5 | 25 | 125
Aeroplane | 7 | 15 | 105
Double Decker | 9 | 10 | 90
Total | | ΣW = 100 | ΣWX = 470
$$\therefore \bar{X}_w = \frac{\Sigma WX}{\Sigma W} = \frac{470}{100} = \text{Rs. } 4.70$$
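The weighted arithmetic mean ΣWX/ΣW can be sketched in a few lines of Python; `weighted_mean` is an illustrative helper name, and the data are the Raja Shop figures from the table above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Toy prices in Rs. (X) and numbers sold (W): Car, Locomotive, Aeroplane, Double Decker
prices = [3, 5, 7, 9]
units_sold = [50, 25, 15, 10]
print(round(weighted_mean(prices, units_sold), 2))  # 4.7
```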
Example 8 : The table below shows the number of skilled and unskilled workers in two localities along with their average hourly wages.
Worker category | Ram Nagar: Number | Ram Nagar: Wages (per hour) | Shyam Nagar: Number | Shyam Nagar: Wages (per hour)
Skilled | 150 | 1.80 | 350 | 1.75
Unskilled | 850 | 1.30 | 650 | 1.25
Determine the average hourly wage in each locality. Also give reasons why the average hourly wage in Shyam Nagar exceeds the average hourly wage in Ram Nagar, even though in Shyam Nagar the average hourly wages of both categories of workers are lower.
The problem requires computing the weighted mean.
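Solution :
Worker category | Ram Nagar: X | W | WX | Shyam Nagar: X | W | WX
Skilled | 1.80 | 150 | 270 | 1.75 | 350 | 612.50
Unskilled | 1.30 | 850 | 1105 | 1.25 | 650 | 812.50
Total | | 1000 | 1375 | | 1000 | 1425
Ram Nagar: $\bar{X}_w = \dfrac{1375}{1000} = \text{Rs. } 1.375$     Shyam Nagar: $\bar{X}_w = \dfrac{1425}{1000} = \text{Rs. } 1.425$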
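The higher average in Shyam Nagar arises because a larger proportion of its workers belong to the better-paid skilled category: 350 out of 1,000 (35%), against 150 out of 1,000 (15%) in Ram Nagar. In other words, the weights are more evenly assigned to the different categories of workers in Shyam Nagar than in Ram Nagar.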
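The effect of the weights in Example 8 can be made visible with a short Python sketch that prints each locality's weighted mean wage next to its share of skilled workers. The dictionary layout and the helper name `weighted_mean` are illustrative choices, not part of the text.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# (skilled, unskilled) hourly wages and numbers of workers
localities = {
    "Ram Nagar": ([1.80, 1.30], [150, 850]),
    "Shyam Nagar": ([1.75, 1.25], [350, 650]),
}
for name, (wages, workers) in localities.items():
    skilled_share = workers[0] / sum(workers)
    print(f"{name}: mean wage Rs. {weighted_mean(wages, workers):.3f}, "
          f"skilled share {skilled_share:.0%}")
```

Although both wage rates are lower in Shyam Nagar, its larger skilled share pulls its weighted mean above that of Ram Nagar.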
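In the case of a discrete series, if x1, x2, ..., xn occur f1, f2, ..., fn times respectively and N is the total frequency (i.e., N = f1 + f2 + ... + fn), then
$$\text{G.M.} = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}} = \text{AL}\left(\frac{\Sigma f \log x}{N}\right)$$
where AL stands for antilog. Thus, in a discrete series,
$$\text{G.M.} = \text{AL}\left(\frac{\Sigma f \log x}{N}\right)$$
and in the case of a continuous series,
$$\text{G.M.} = \text{AL}\left(\frac{\Sigma f \log m}{N}\right)$$
where m is the mid-point of each class.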
$$\text{G.M.} = \text{AL}\left(\frac{\Sigma f \log x}{N}\right) = \text{AL}\left(\frac{36.1281}{40}\right) = \text{AL}(0.9032) = 8.002$$
Example 11 : Calculate G.M. from the following data :
X | f
9.5–14.5 | 10
14.5–19.5 | 15
19.5–24.5 | 17
24.5–29.5 | 25
29.5–34.5 | 18
34.5–39.5 | 12
39.5–44.5 | 8
Solution :
Calculation of G.M.
X | m | log m | f | f log m
9.5–14.5 | 12 | 1.0792 | 10 | 10.7920
14.5–19.5 | 17 | 1.2304 | 15 | 18.4560
19.5–24.5 | 22 | 1.3424 | 17 | 22.8208
24.5–29.5 | 27 | 1.4314 | 25 | 35.7850
29.5–34.5 | 32 | 1.5051 | 18 | 27.0918
34.5–39.5 | 37 | 1.5682 | 12 | 18.8184
39.5–44.5 | 42 | 1.6232 | 8 | 12.9856
Total | | | N = 105 | Σf log m = 146.7496
$$\text{G.M.} = \text{AL}\left(\frac{146.7496}{105}\right) = \text{AL}(1.3976) = 24.98$$
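The log-based formula for a grouped series can be sketched in Python, using base-10 logarithms to mirror the log tables in the text. The helper name `grouped_geometric_mean` is hypothetical. Because full-precision logarithms are used, the last decimal may differ slightly from a hand calculation done with four-figure logs.

```python
import math


def grouped_geometric_mean(midpoints, freqs):
    """G.M. = AL(sum(f * log m) / N), computed with base-10 logarithms."""
    n = sum(freqs)
    log_sum = sum(f * math.log10(m) for m, f in zip(midpoints, freqs))
    return 10 ** (log_sum / n)


# Example 11: class mid-points and frequencies
mids = [12, 17, 22, 27, 32, 37, 42]
freqs = [10, 15, 17, 25, 18, 12, 8]
print(round(grouped_geometric_mean(mids, freqs), 2))  # about 24.98
```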
$$r = \sqrt[10]{\frac{2000}{1000}} - 1 = \sqrt[10]{2} - 1 = \text{AL}\left[\frac{\log 2}{10}\right] - 1 = \text{AL}\left[\frac{0.30103}{10}\right] - 1 = 1.0718 - 1 = 0.0718 = 7.18\%$$
Hence, the rate of growth in GNP is 7.18%.
Example 13 : The price of a commodity increased by 5 per cent from 1998 to 1999, 8 per cent from 1999 to 2000 and 77 per cent from 2000 to 2001. The average increase from 1998 to 2001 is quoted at 26 per cent and not 30 per cent. Explain this statement and verify your result.
Solution : Taking Po as the price at the beginning and Pn as the price at the end of the period, we can substitute the values of Pn and Po in the compound interest formula. Taking Po = 100, the price at the end of 2001 is Pn = 100 × 1.05 × 1.08 × 1.77 = 200.72.
$$P_n = P_o(1 + r)^n$$
$$200.72 = 100(1 + r)^3 \quad\text{or}\quad (1 + r)^3 = \frac{200.72}{100} \quad\text{or}\quad 1 + r = \sqrt[3]{\frac{200.72}{100}}$$
$$r = \sqrt[3]{\frac{200.72}{100}} - 1 = 1.261 - 1 = 0.261 \approx 26\%$$
Thus the average increase is not the simple average (5 + 8 + 77)/3 = 30 per cent; it is about 26 per cent, as found by the G.M. The simple average overstates the increase because each year's percentage is applied to a different, already increased, base.
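The compound-growth reasoning used in the GNP example (1000 growing to 2000 over 10 years) and in Example 13 can be sketched as follows. `average_growth_rate` is an illustrative helper that solves $P_n = P_o(1 + r)^n$ for r directly, instead of going through log tables.

```python
def average_growth_rate(start, end, periods):
    """Average (compound) rate r from end = start * (1 + r) ** periods."""
    return (end / start) ** (1 / periods) - 1


# GNP example: 1000 grows to 2000 in 10 years
print(f"{average_growth_rate(1000, 2000, 10):.2%}")  # 7.18%

# Example 13: yearly price increases of 5%, 8% and 77%
end_price = 100 * 1.05 * 1.08 * 1.77
r = average_growth_rate(100, end_price, 3)
naive = (5 + 8 + 77) / 3
print(f"G.M.-based average increase: {r:.1%}; simple average: {naive:.0f}%")
```

The second line prints about 26.1% against the simple average of 30%, matching the explanation above.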
$$\text{G.M. (weighted)} = \text{AL}\left[\frac{\Sigma(\log x \times w)}{\Sigma w}\right]$$
Example 14 : Find the weighted G.M. from the following data :
Group | Index number | Weights
Food | 352 | 48
Fuel | 220 | 10
Cloth | 230 | 8
House Rent | 160 | 12
Misc. | 190 | 15
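Solution :
Calculation of Weighted G.M.
Group | Index Number (x) | Weights (w) | log x | w log x
Food | 352 | 48 | 2.5465 | 122.2320
Fuel | 220 | 10 | 2.3424 | 23.4240
Cloth | 230 | 8 | 2.3617 | 18.8936
House Rent | 160 | 12 | 2.2041 | 26.4492
Misc. | 190 | 15 | 2.2788 | 34.1820
Total | | Σw = 93 | | Σw log x = 225.1808
$$\text{G.M. (weighted)} = \text{AL}\left[\frac{\Sigma w \log x}{\Sigma w}\right] = \text{AL}\left[\frac{225.1808}{93}\right] = \text{AL}(2.4213) = 263.8$$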
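A minimal sketch of the weighted G.M. formula, AL[Σ(w log x)/Σw], applied to the index numbers of Example 14. The helper name `weighted_geometric_mean` is hypothetical, and base-10 logarithms are used to match the text.

```python
import math


def weighted_geometric_mean(values, weights):
    """Weighted G.M. = AL(sum(w * log x) / sum(w)), with base-10 logs."""
    log_sum = sum(w * math.log10(x) for x, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


# Example 14: Food, Fuel, Cloth, House Rent, Misc.
index_numbers = [352, 220, 230, 160, 190]
weights = [48, 10, 8, 12, 15]
print(round(weighted_geometric_mean(index_numbers, weights), 1))  # 263.8
```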
Example 15 : A machine depreciates at the rate of 35.5% per annum in
the first year, at the rate of 22.5% per annum in the second year, and
at the rate of 9.5% per annum in the third year, each percentage being
computed on the actual value. What is the average rate of depreciation?
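Here x is the value left at the end of each year out of 100, i.e., 64.5, 77.5 and 90.5. Apply
$$\text{G.M.} = \text{AL}\left[\frac{\Sigma \log x}{N}\right] = \text{AL}\left[\frac{5.6555}{3}\right] = \text{AL}(1.8852) = 76.77$$
∴ Average rate of depreciation = 100 – 76.77 = 23.23%.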
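The reducing-balance logic of Example 15 can be sketched in Python: each year the machine keeps (100 − rate)% of its value, so the average rate is 100 minus the G.M. of those percentages. The function name `average_depreciation` is illustrative.

```python
def average_depreciation(rates_percent):
    """Average reducing-balance depreciation rate via the G.M.

    Each year's value is (100 - rate)% of the previous year's value, so the
    average rate is 100 minus the geometric mean of those percentages.
    """
    product = 1.0
    for rate in rates_percent:
        product *= 100 - rate
    gm = product ** (1 / len(rates_percent))
    return 100 - gm


print(round(average_depreciation([35.5, 22.5, 9.5]), 2))  # 23.23
```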
Example 16 : The arithmetic mean and geometric mean of two values are 10 and 8 respectively. Find the values.
Solution : If the two values are taken as a and b, then
$$\frac{a + b}{2} = 10 \quad\text{and}\quad \sqrt{ab} = 8$$
or a + b = 20 and ab = 64.
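Once a + b and ab are known, the two values are the roots of the quadratic $t^2 - (a + b)t + ab = 0$. A small Python sketch of that step (the helper `values_from_am_gm` is hypothetical and assumes A.M. ≥ G.M., which always holds for positive values):

```python
import math


def values_from_am_gm(am, gm):
    """Two positive numbers a >= b with the given A.M. and G.M.

    a + b = 2 * am and a * b = gm ** 2, so a and b are the roots of
    t**2 - (a + b) * t + a * b = 0.
    """
    s, p = 2 * am, gm ** 2
    root = math.sqrt(s * s - 4 * p)  # real only when am >= gm
    return (s + root) / 2, (s - root) / 2


print(values_from_am_gm(10, 8))  # (16.0, 4.0)
```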
$$\text{H.M.} = \frac{N}{\Sigma\left(f \times \frac{1}{x}\right)}$$
and in the case of a continuous series,
$$\text{H.M.} = \frac{N}{\Sigma\left(f \times \frac{1}{m}\right)}$$
where m is the mid-point of each class.
It may be noted that none of the values of the variable should be zero.
Example 17 : Calculate the harmonic mean of the following data : 5, 15, 25, 35 and 45.
Solution :
X | 1/X
5 | 0.200
15 | 0.067
25 | 0.040
35 | 0.029
45 | 0.022
N = 5 | Σ(1/X) = 0.358
$$\text{H.M.} = \frac{N}{\Sigma\left(\frac{1}{X}\right)} = \frac{5}{0.358} = 14 \text{ approx.}$$
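The simple harmonic mean N / Σ(1/X) can be checked against Python's standard library, which provides `statistics.harmonic_mean`. With unrounded reciprocals the result is about 13.99, which the text rounds to 14.

```python
from statistics import harmonic_mean

data = [5, 15, 25, 35, 45]

# Direct formula: N / sum(1/x)
hm_manual = len(data) / sum(1 / x for x in data)
print(round(hm_manual, 2), round(harmonic_mean(data), 2))  # 13.99 13.99
```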
Example 18 : From the following data, compute the value of the harmonic mean :
x : 5 15 25 35 45
f : 5 15 10 15 5
Solution :
Calculation of Harmonic Mean
x | f | 1/x | f × (1/x)
5 | 5 | 0.200 | 1.000
15 | 15 | 0.067 | 1.005
25 | 10 | 0.040 | 0.400
35 | 15 | 0.029 | 0.435
45 | 5 | 0.022 | 0.110
Σf = 50 | | | Σ(f × 1/x) = 2.950
$$\text{H.M.} = \frac{N}{\Sigma\left(f \times \frac{1}{x}\right)} = \frac{50}{2.95} = 17 \text{ approx.}$$
Example 19 : Calculate the harmonic mean of the following distribution:
x | f
0–10 | 5
10–20 | 15
20–30 | 10
30–40 | 15
40–50 | 5
Solution : First of all, we find the mid-points of the various classes. They are 5, 15, 25, 35 and 45. Then we calculate the H.M. by applying the following formula :
$$\text{H.M.} = \frac{N}{\Sigma\left(f \times \frac{1}{m}\right)}$$
Calculation of Harmonic Mean
x (mid-points, m) | f | 1/m | f × (1/m)
5 | 5 | 0.200 | 1.000
15 | 15 | 0.067 | 1.005
25 | 10 | 0.040 | 0.400
35 | 15 | 0.029 | 0.435
45 | 5 | 0.022 | 0.110
Σf = 50 | | | Σ(f × 1/m) = 2.950
$$\text{H.M.} = \frac{N}{\Sigma\left(f \times \frac{1}{m}\right)} = \frac{50}{2.950} = 17 \text{ approximately}$$
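For a frequency distribution the same idea becomes N / Σ(f/x), with class mid-points used as x in a continuous series. A minimal sketch follows; `weighted_harmonic_mean` is a hypothetical helper that refuses zero values, in line with the note that no value of the variable may be zero.

```python
def weighted_harmonic_mean(values, freqs):
    """H.M. = N / sum(f / x); undefined if any value is zero."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Examples 18 and 19: x (or class mid-points m) and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [5, 15, 10, 15, 5]
print(round(weighted_harmonic_mean(mids, freqs), 2))  # 17.01
```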
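$$\bar{X} = \frac{x + y}{2} = 19.5$$
or x + y = 39 ...(i)
Also, $\text{G.M.} = \sqrt{xy} = 18$
or xy = 324 ...(ii)
Now, $(x - y)^2 = (x + y)^2 - 4xy = 39^2 - 4(324) = 1521 - 1296 = 225$, so x − y = 15; together with (i), this gives x = 27 and y = 12.
Now, for two values, $\text{G.M.} = \sqrt{\text{A.M.} \times \text{H.M.}}$
$$\text{G.M.}^2 = \text{A.M.} \times \text{H.M.}$$
$$\text{or}\quad \text{H.M.} = \frac{\text{G.M.}^2}{\text{A.M.}} = \frac{18^2}{19.5} = 16.62$$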
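The identity G.M.² = A.M. × H.M. (valid for two values) can be checked numerically. The sketch assumes the two values 27 and 12, which follow from x + y = 39 and xy = 324.

```python
import math

# Two values with A.M. 19.5 and G.M. 18 (x + y = 39, xy = 324)
x, y = 27, 12

am = (x + y) / 2
gm = math.sqrt(x * y)
hm = 2 * x * y / (x + y)  # harmonic mean of two values

print(am, gm, round(hm, 2))            # 19.5 18.0 16.62
print(math.isclose(gm ** 2, am * hm))  # True
```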
2.9 Median
The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater than the median and the other all values less than it. The median of a distribution may be defined as that value of the variable which exceeds and is exceeded by the same number of observations; that is, the number of observations above it is equal to the number of observations below it. Whereas the arithmetic mean is based on all items of the distribution, the median is a positional average, that is, it depends upon the position occupied by a value in the frequency distribution.
In the case of individual observations, when the items of a series are arranged in ascending or descending order of magnitude, the value of the middle item in the series is known as the median. Symbolically,
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item}$$
If the number of items is even, there is no value exactly in the middle of the series. In such a situation the median is conventionally taken to be halfway between the two middle items. Symbolically,
$$\text{Median} = \frac{\text{size of } \left(\frac{N}{2}\right)\text{th item} + \text{size of } \left(\frac{N}{2} + 1\right)\text{th item}}{2}$$
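Example 23: Find the median of the following series:
(i) 8, 4, 8, 3, 4, 8, 6, 5, 10.
(ii) 15, 12, 5, 7, 9, 5, 11, 28.
Solution :
Computation of Median (items arranged in ascending order)
Serial No. | X in series (i) | X in series (ii)
1 | 3 | 5
2 | 4 | 5
3 | 4 | 7
4 | 5 | 9
5 | 6 | 11
6 | 8 | 12
7 | 8 | 15
8 | 8 | 28
9 | 10 | –
N = 9 for series (i) and N = 8 for series (ii).
For series (i): Median = size of the $\left(\frac{N+1}{2}\right)$th item = size of the $\left(\frac{9+1}{2}\right)$th item = size of the 5th item = 6
For series (ii): Median = size of the $\left(\frac{N+1}{2}\right)$th item = size of the $\left(\frac{8+1}{2}\right)$th item = size of the 4.5th item = (9 + 11)/2 = 10, i.e., the average of the 4th and 5th items.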
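Python's standard `statistics.median` follows the same rule: it sorts the data and, for an even count, averages the two middle items. A quick check of Example 23:

```python
from statistics import median

series_i = [8, 4, 8, 3, 4, 8, 6, 5, 10]
series_ii = [15, 12, 5, 7, 9, 5, 11, 28]

# median() sorts internally; for an even count it averages the
# (N/2)th and (N/2 + 1)th items
print(median(series_i), median(series_ii))  # 6 10.0
```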
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of } \left(\frac{1059+1}{2}\right)\text{th item} = 530\text{th item}$$
The 530th item falls within the cumulative frequency of 692, and the value corresponding to it is 4.
Therefore, Median = 4 rooms.
In a continuous series, the median is computed in the following manner :
(i) Arrange the given data in ascending or descending order.
(ii) If an inclusive series is given, it must be converted into an exclusive series to find the real class intervals.
(iii) Find the cumulative frequencies.
(iv) Apply Median = size of $\left(\frac{N}{2}\right)$th item to ascertain the median class.
(v) Apply the formula of interpolation to ascertain the value of the median:
$$\text{Median} = l_1 + \frac{\frac{N}{2} - cf_0}{f} \times (l_2 - l_1) \quad\text{or}\quad \text{Median} = l_2 - \frac{\frac{N}{2} - cf_0}{f} \times (l_2 - l_1)$$
where l1 refers to the lower limit of the median class
l2 refers to the upper limit of the median class
cf0 refers to the cumulative frequency of the class preceding the median class
f refers to the frequency of the median class.
The first form applies when the classes are arranged in ascending order, and the second when they are arranged in descending order.
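Steps (iii) to (v) for an ascending, equal-width series can be sketched in Python. The helper `grouped_median` is hypothetical, and purely for illustration it is applied to the distribution of Example 31 later in this lesson (0–10: 5, 10–20: 8, 20–30: 15, 30–40: 12, 40–50: 7).

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series with equal class width (ascending order).

    Finds the median class as the first class whose cumulative frequency
    reaches N/2, then applies l1 + (N/2 - cf0) / f * (l2 - l1).
    """
    half = sum(freqs) / 2
    cf_before = 0
    for l1, f in zip(lower_limits, freqs):
        if cf_before + f >= half:
            return l1 + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty distribution")


print(round(grouped_median([0, 10, 20, 30, 40], 10, [5, 8, 15, 12, 7]), 2))  # 27.0
```

Here N/2 = 23.5 falls in the class 20–30 (cumulative frequency 28), with cf0 = 13 and f = 15.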
Example 25 : The following table gives the distribution of marks secured by some students in an examination:
Marks | No. of Students
0–20 | 42
21–30 | 38
31–40 | 120
41–50 | 84
51–60 | 48
61–70 | 36
71–80 | 31
Find the median marks.
Solution :
Calculation of Median Marks
Marks (x) | No. of students (f) | cf
0–20 | 42 | 42
21–30 | 38 | 80
31–40 | 120 | 200
41–50 | 84 | 284
$$\text{Third quartile } (Q_3) = \text{size of } \frac{3(N+1)}{4}\text{th item}$$
$$\text{First decile } (D_1) = \text{size of } \left(\frac{N+1}{10}\right)\text{th item}$$
$$\text{Sixth decile } (D_6) = \text{size of } \frac{6(N+1)}{10}\text{th item}$$
$$\text{First percentile } (P_1) = \text{size of } \left(\frac{N+1}{100}\right)\text{th item}$$
Once the positions of the items are found, the formulae of interpolation are applied to ascertain the values of Q1, Q3, D1, D4, P40, etc.
Example 26: Calculate Q1, Q3, D2 and P5 from the following data :
Marks | Below 10 | 10–20 | 20–40 | 40–60 | 60–80 | Above 80
No. of Students | 8 | 10 | 22 | 25 | 10 | 5
Solution:
Calculation of Positional Values
Marks | No. of Students (f) | c.f.
Below 10 | 8 | 8
10–20 | 10 | 18
20–40 | 22 | 40
40–60 | 25 | 65
60–80 | 10 | 75
Above 80 | 5 | 80
Total | N = 80 |
Q1 = size of $\left(\frac{N}{4}\right)$th item = $\left(\frac{80}{4}\right)$th item = 20th item.
Hence Q1 lies in the class 20–40. Apply
$$Q_1 = l_1 + \frac{\frac{N}{4} - cf_0}{f} \times i$$
where l1 = 20, N/4 = 20, cf0 = 18, f = 22 and i = 20.
By substituting the values, we get
$$Q_1 = 20 + \frac{20 - 18}{22} \times 20 = 20 + 1.8 = 21.8$$
Similarly, we can calculate
Q3 = size of $\left(\frac{3N}{4}\right)$th item = $\left(\frac{3 \times 80}{4}\right)$th item = 60th item.
Hence Q3 lies in the class 40–60.
$$Q_3 = l_1 + \frac{\frac{3N}{4} - cf_0}{f} \times i$$
where l1 = 40, 3N/4 = 60, cf0 = 40, f = 25, i = 20.
$$\therefore Q_3 = 40 + \frac{60 - 40}{25} \times 20 = 40 + 16 = 56$$
D2 = size of $\left(\frac{2N}{10}\right)$th item = 16th item. Hence D2 lies in the class 10–20.
$$D_2 = l_1 + \frac{\frac{2N}{10} - cf_0}{f} \times i$$
where l1 = 10, 2N/10 = 16, cf0 = 8, f = 10, i = 10.
$$D_2 = 10 + \frac{16 - 8}{10} \times 10 = 10 + 8 = 18$$
P5 = size of $\left(\frac{5N}{100}\right)$th item = $\left(\frac{5 \times 80}{100}\right)$th item = 4th item. Hence P5 lies in the class 0–10.
$$P_5 = l_1 + \frac{\frac{5N}{100} - cf_0}{f} \times i$$
where l1 = 0, 5N/100 = 4, cf0 = 0, f = 8, i = 10.
$$P_5 = 0 + \frac{4 - 0}{8} \times 10 = 0 + 5 = 5$$
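All the partition values of Example 26 use the same interpolation, l1 + (position − cf0)/f × i, with position k·N/parts. A sketch with a hypothetical helper `partition_value`; as in the text, "Below 10" is treated as 0–10. The width of 20 for "Above 80" is our own assumption, and none of the values computed here falls in that class.

```python
def partition_value(k, parts, lower_limits, widths, freqs):
    """k-th of `parts` partition values (quartile, decile, percentile).

    The target position is k * N / parts; inside the class containing it,
    value = l1 + (position - cf0) / f * i.
    """
    position = k * sum(freqs) / parts
    cf_before = 0
    for l1, i, f in zip(lower_limits, widths, freqs):
        if cf_before + f >= position:
            return l1 + (position - cf_before) / f * i
        cf_before += f
    raise ValueError("position beyond the last class")


# Example 26 ("Below 10" read as 0-10; "Above 80" width assumed to be 20)
lowers = [0, 10, 20, 40, 60, 80]
widths = [10, 10, 20, 20, 20, 20]
freqs = [8, 10, 22, 25, 10, 5]

q1 = partition_value(1, 4, lowers, widths, freqs)
q3 = partition_value(3, 4, lowers, widths, freqs)
d2 = partition_value(2, 10, lowers, widths, freqs)
p5 = partition_value(5, 100, lowers, widths, freqs)
print(round(q1, 1), round(q3, 1), round(d2, 1), round(p5, 1))  # 21.8 56.0 18.0 5.0
```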
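$$\therefore 50 = 40 + \frac{50 - (14 + x)}{27} \times (60 - 40) \quad\text{or}\quad 50 = 40 + \frac{50 - (14 + x)}{27} \times 20$$
$$\text{or}\quad 50 - 40 = \frac{36 - x}{27} \times 20 \quad\text{or}\quad 50 - 40 = (36 - x) \times \frac{20}{27}$$
$$\text{or}\quad 10 \times 27 = 720 - 20x \quad\text{or}\quad 270 = 720 - 20x$$
$$\therefore 20x = 720 - 270 = 450, \qquad x = \frac{450}{20} = 22.5$$
By substituting the value of x in the equation x + y = 44, we get 22.5 + y = 44, so y = 44 – 22.5 = 21.5.
Hence the frequency for the class 20–40 is 22.5 and for the class 60–80 is 21.5.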
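Since the median formula is linear in the unknown frequency, it can be solved in one line. A sketch using the values from the working above (median 50, median class 40–60 with f = 27, N/2 = 50, cf0 = 14 + x, and x + y = 44); the function name and keyword arguments are illustrative only.

```python
def missing_frequency(median, l1, width, f, half_n, known_cf):
    """Solve median = l1 + (half_n - (known_cf + x)) / f * width for x."""
    return half_n - known_cf - (median - l1) * f / width


x = missing_frequency(median=50, l1=40, width=20, f=27, half_n=50, known_cf=14)
y = 44 - x
print(x, y)  # 22.5 21.5
```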
Merits of Median:
(i) It is very easy to understand.
(ii) It can be easily calculated even when an open-ended class distribution is given, because the median is the middle value of a distribution and does not depend on the extreme classes.
2.12 Mode
The mode is the value of the variable which is repeated the maximum number of times, i.e., which has the highest frequency in a data series. The mode is the most “fashionable” size in the sense that it is the most common and typical, and it is defined by Zizek as “the value occurring most frequently in series of items and around which the other items are distributed most densely.” In the words of Croxton and Cowden, the mode of a distribution is the value at the point where the items tend to be most heavily concentrated. According to A.M. Tuttle, the mode is the value which has the greatest frequency density in its immediate neighbourhood. In the case of individual observations, the mode is that value which is repeated the maximum number of times in the series. The mode is also denoted by the letter Z.
Example 28 : Calculate the mode from the following data:
Sr. Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Marks obtained | 10 | 27 | 24 | 12 | 27 | 27 | 20 | 18 | 15 | 30
Solution :
Marks | No. of Students
10 | 1
12 | 1
15 | 1
18 | 1
20 | 1
24 | 1
27 | 3
30 | 1
The above table clearly shows that the mode is 27 marks, because this value has the highest frequency: 3 students obtained 27 marks.
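For individual observations, counting each value and picking the most frequent one is exactly what `collections.Counter.most_common` does. `statistics.multimode` (Python 3.8+) lists every value that ties for the highest frequency, which makes bimodal series visible.

```python
from collections import Counter
from statistics import multimode

marks = [10, 27, 24, 12, 27, 27, 20, 18, 15, 30]

value, frequency = Counter(marks).most_common(1)[0]
print(value, frequency)  # 27 3
print(multimode(marks))  # [27]
```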
Calculation of Mode in a Discrete Series. In a discrete series, the mode is quite often determined by inspection. We can understand this with the help of an example :
X | 1 | 2 | 3 | 4 | 5 | 6 | 7
f | 4 | 5 | 13 | 6 | 12 | 8 | 6
By inspection, the modal size is 3, as it has the maximum frequency. But this test of greatest frequency is not foolproof, because it is not only the frequency of a single class but also the frequencies of the neighbouring classes that decide the mode. In such cases, we use the method of grouping and analysis tables.
Size of shoe | 1 | 2 | 3 | 4 | 5 | 6 | 7
Frequency | 4 | 5 | 13 | 6 | 12 | 8 | 6
Solution : By inspection, the mode is 3, but the modal size may actually be 5. This is because the neighbouring frequencies of size 5 (6 and 8) are greater than the neighbouring frequencies of size 3 (5 and 6). This effect of neighbouring frequencies is brought out with the help of the grouping and analysis table technique.
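Grouping Table
Column 1 (individual frequencies): size 1 = 4, size 2 = 5, size 3 = 13, size 4 = 6, size 5 = 12, size 6 = 8, size 7 = 6
Column 2 (totals of two, starting with size 1): sizes 1–2 = 9, sizes 3–4 = 19, sizes 5–6 = 20
Column 3 (totals of two, starting with size 2): sizes 2–3 = 18, sizes 4–5 = 18, sizes 6–7 = 14
Column 4 (totals of three, starting with size 1): sizes 1–3 = 22, sizes 4–6 = 26
Column 5 (totals of three, starting with size 2): sizes 2–4 = 24, sizes 5–7 = 26
Column 6 (totals of three, starting with size 3): sizes 3–5 = 31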
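The grouping and analysis procedure can be sketched in Python, assuming the six-column layout shown above (singles; pairs starting at the 1st and 2nd items; triples starting at the 1st, 2nd and 3rd items). In each column, every size belonging to a maximum-total group scores one mark. Counting all tied groups, as in column 3, is our assumption about tie handling. `grouping_analysis` is a hypothetical name.

```python
def grouping_analysis(sizes, freqs):
    """Marks from the analysis table and the size with the most marks."""
    n = len(freqs)
    marks = {s: 0 for s in sizes}
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]  # (group size, start)
    for group, start in columns:
        groups = [(sum(freqs[i:i + group]), sizes[i:i + group])
                  for i in range(start, n - group + 1, group)]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:  # every tied maximum group is counted
                for s in members:
                    marks[s] += 1
    return marks, max(marks, key=marks.get)


shoe_sizes = [1, 2, 3, 4, 5, 6, 7]
frequency = [4, 5, 13, 6, 12, 8, 6]
marks, modal_size = grouping_analysis(shoe_sizes, frequency)
print(marks)       # {1: 0, 2: 1, 3: 3, 4: 3, 5: 5, 6: 3, 7: 1}
print(modal_size)  # 5
```

Size 5 collects the most marks, which confirms that the mode is 5 rather than 3, the value chosen by inspection.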
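Solution :
Grouping Table
Column 1 (individual frequencies): 0–10 = 5, 10–20 = 10, 20–30 = 15, 30–40 = 14, 40–50 = 10, 50–60 = 5, 60–70 = 3
Column 2 (totals of two, starting with 0–10): 0–20 = 15, 20–40 = 29, 40–60 = 15
Column 3 (totals of two, starting with 10–20): 10–30 = 25, 30–50 = 24, 50–70 = 8
Column 4 (totals of three, starting with 0–10): 0–30 = 30, 30–60 = 29
Column 5 (totals of three, starting with 10–20): 10–40 = 39, 40–70 = 18
Column 6 (totals of three, starting with 20–30): 20–50 = 39
Analysis Table
Column | Classes with maximum frequency
1 | 20–30
2 | 20–30, 30–40
3 | 10–20, 20–30
4 | 0–10, 10–20, 20–30
5 | 10–20, 20–30, 30–40
6 | 20–30, 30–40, 40–50
The modal group is 20–30 because it occurs the maximum number of times (6). Applying the formula of interpolation,
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1)$$
where f1 is the frequency of the modal class, f0 that of the preceding class and f2 that of the succeeding class,
$$= 20 + \frac{15 - 10}{30 - 10 - 14} \times (30 - 20) = 20 + \frac{5}{6} \times 10 = 28.3$$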
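The interpolation formula for the mode translates directly into Python. `interpolated_mode` is an illustrative helper, and the arguments below are the modal class 20–30 (f1 = 15) with its neighbours (f0 = 10, f2 = 14) from the example above.

```python
def interpolated_mode(l1, width, f0, f1, f2):
    """Mode = l1 + (f1 - f0) / (2*f1 - f0 - f2) * (l2 - l1)."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * width


print(round(interpolated_mode(20, 10, 10, 15, 14), 2))  # 28.33
```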
Calculation of mode where it is ill-defined. The above formula cannot be applied where there are several modal values in a series or distribution, for instance when two or more items have the maximum frequency. In such cases the series is known as a bimodal or multimodal series, and the mode is said to be ill-defined. The following empirical formula is then applied:
Mode = 3 Median – 2 Mean.
Example 30: Calculate the mode of the following frequency data :
Variate value | Frequency
10–20 | 5
20–30 | 9
30–40 | 13
40–50 | 21
50–60 | 20
60–70 | 15
70–80 | 8
80–90 | 3
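Solution : First of all, ascertain the modal group with the help of the process of grouping.
Grouping Table
Column 1 (individual frequencies): 10–20 = 5, 20–30 = 9, 30–40 = 13, 40–50 = 21, 50–60 = 20, 60–70 = 15, 70–80 = 8, 80–90 = 3
Column 2 (totals of two, starting with 10–20): 10–30 = 14, 30–50 = 34, 50–70 = 35, 70–90 = 11
Column 3 (totals of two, starting with 20–30): 20–40 = 22, 40–60 = 41, 60–80 = 23
Column 4 (totals of three, starting with 10–20): 10–40 = 27, 40–70 = 56
Column 5 (totals of three, starting with 20–30): 20–50 = 43, 50–80 = 43
Column 6 (totals of three, starting with 30–40): 30–60 = 54, 60–90 = 26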
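$$\text{Mean} = A + \frac{\Sigma fd'}{N} \times i = 45 + \frac{40}{94} \times 10 = 45 + 4.255 = 49.255$$
$$\text{Median} = l_1 + \frac{\frac{N}{2} - cf_0}{f} \times i = 40 + \frac{47 - 27}{21} \times 10 = 40 + \frac{200}{21} = 49.524$$
Mode = 3 Median – 2 Mean
= 3(49.524) – 2(49.255) = 148.572 – 98.510 = 50.06 (approx. 50.1)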
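The empirical relation can be checked end to end for Example 30 with a short Python sketch. Here the mean is computed directly from the class mid-points, which gives the same value as the step-deviation method with A = 45. Both helper names are illustrative.

```python
def grouped_mean(lowers, width, freqs):
    """Mean from class mid-points: sum(f * m) / N."""
    mids = [l + width / 2 for l in lowers]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


def grouped_median(lowers, width, freqs):
    """l1 + (N/2 - cf0) / f * i for the first class whose c.f. reaches N/2."""
    half, cf = sum(freqs) / 2, 0
    for l1, f in zip(lowers, freqs):
        if cf + f >= half:
            return l1 + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")


lowers = [10, 20, 30, 40, 50, 60, 70, 80]
freqs = [5, 9, 13, 21, 20, 15, 8, 3]
mean = grouped_mean(lowers, 10, freqs)
med = grouped_median(lowers, 10, freqs)
mode = 3 * med - 2 * mean  # empirical relation for an ill-defined mode
print(round(mean, 3), round(med, 3), round(mode, 2))  # 49.255 49.524 50.06
```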
Mode can also be determined graphically with the help of a histogram. The following steps are taken:
(i) Draw a histogram of the data.
(ii) Inside the modal class rectangle, draw two diagonal lines: one from the upper-left corner of the modal rectangle to the upper-left corner of the adjacent succeeding rectangle, and the other from the upper-right corner of the modal rectangle to the upper-right corner of the adjacent preceding rectangle.
(iii) Draw a perpendicular from the point of intersection of the two diagonal lines to the X-axis. The abscissa of the point at which the perpendicular meets the X-axis is the value of the mode.
Example 31 : Construct a histogram for the following distribution and determine the mode graphically :
X | 0–10 | 10–20 | 20–30 | 30–40 | 40–50
f | 5 | 8 | 15 | 12 | 7
Verify the result with the help of interpolation.
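Solution :
[Figure: Histogram of the distribution (X from 0 to 50); the perpendicular from the intersection of the diagonals in the modal class 20–30 meets the X-axis at 27, marked as the Mode.]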
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times (l_2 - l_1) = 20 + \frac{15 - 8}{30 - 8 - 12} \times (30 - 20) = 20 + \frac{7}{10} \times 10 = 27$$
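The graphical construction and the interpolation formula agree because the two diagonals are straight lines whose crossing point can be solved exactly. A sketch under the corner convention described in step (ii): line A joins (l1, f1) to (l2, f2), and line B joins (l1, f0) to (l2, f1). The function names are illustrative.

```python
import math


def mode_by_diagonals(l1, width, f0, f1, f2):
    """Abscissa where the two diagonals drawn in the modal bar intersect.

    Line A: from (l1, f1) to (l1 + width, f2); line B: from (l1, f0) to
    (l1 + width, f1). Both are written as y = start + slope * (x - l1).
    """
    slope_a = (f2 - f1) / width
    slope_b = (f1 - f0) / width
    # f1 + slope_a * t == f0 + slope_b * t  =>  t = (f1 - f0) / (slope_b - slope_a)
    t = (f1 - f0) / (slope_b - slope_a)
    return l1 + t


def mode_by_formula(l1, width, f0, f1, f2):
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Example 31: modal class 20-30 with f0 = 8, f1 = 15, f2 = 12
graphical = mode_by_diagonals(20, 10, 8, 15, 12)
algebraic = mode_by_formula(20, 10, 8, 15, 12)
print(round(graphical, 2), round(algebraic, 2), math.isclose(graphical, algebraic))
```

Solving the two line equations gives t = (f1 − f0)·i / (2f1 − f0 − f2), which is the interpolation formula itself.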
2.13 Summary
Central tendency indicates the location of the centre of a set of data. An average is a typical value which is used to represent the entire set of values and serves as a benchmark for making comparisons. A good average is expected to be based on all values, not to be unduly affected by the presence of extremely large or small values in the data, to be amenable to further algebraic treatment, and to have sampling stability. Averages are distinguished as mathematical and positional.
The arithmetic mean is a mathematical average which is the most commonly used and understood, and it is very extensively used in statistical work. Obtained by dividing the sum of the values by their number, it is easy to calculate. It enjoys well-defined algebraic properties such as the zero sum of deviations, least squares, and the combined mean. It meets most of the requisites of a good average. The geometric mean and harmonic mean are other mathematical averages, but they have limited and specific uses. Their calculation is restricted to positive values. It is possible to calculate a combined average of two or more sets of data for each of these. The geometric mean, which is the nth root of the product of n values, is basically applied to obtain average growth rates, price changes and depreciation rates. The harmonic mean is equal to the reciprocal of the arithmetic mean of the reciprocals, and it is used to average rates. The harmonic mean is used when the weights are in terms of the numerator factor of the given rates; the arithmetic mean is correct to use when the weights are in terms of the denominator factor.
Mathematical averages can be simple or weighted, and are used according as all values enjoy equal or unequal weightage. Their values can be calculated only by using well-defined formulae and cannot be obtained graphically. Being based on all values, they are affected in a larger measure by the presence of extreme values. The positional averages include the median and mode. While the median refers to the central value in a set of arrayed values, the mode is the value in a series which appears the maximum number of times. A given set of individual values or a frequency distribution may have one or more modal values, and if the values in a given set of data are all unique, there is no mode; these are drawbacks of the mode. The positional averages do not possess any mathematical properties, except that the sum of absolute deviations from the median is the least.
In addition to the median, there are a number of partition values that divide a given distribution into a certain number of parts. They include quartiles, deciles and percentiles. The partition values are not averages, but they are discussed here because their calculation proceeds in the same manner as that of the median. They are used to locate the relative position of different values clearly (as with the use of percentiles in the CAT entrance examinations) and also to calculate measures of variation, skewness, etc.
(iii) In the deviation method of calculating arithmetic mean, the mean is obtained by adding the mean of the deviations to the assumed mean value.
(iv) Arithmetic mean is not suitable for open-ended frequency distributions.
(v) All averages can be distinguished as being simple and weighted.
(vi) In the weighted arithmetic mean calculation, it is immaterial whether
the weights are expressed as, say, 20% and 80% or as 4 and 16.
(vii) The sum of squares of deviations as well as the sum of deviations
from mean is equal to zero.
(viii) For calculating different measures of central tendency, it is necessary
that all class intervals have equal width.
(ix) Median cannot be calculated in open-ended class frequency distributions.
(x) In an array of 41 items, median is equal to (41 + 1 )/2 = 21.
(xi) Two sets of values, A and B, are identical except that their respective largest values are 80 and 8,000. The median of both the distributions shall be the same.
(xii) The sum of absolute deviations from median is equal to zero.
(xiii) The quartiles divide a distribution into four equal parts.
(xiv) A distribution has 10 deciles and 100 percentiles.
(xv) In a distribution of wages of the workers of a factory, the 95th
percentile indicates the maximum wage earned by the top 95 per
cent of the workers.
(xvi) The lower quartile in a distribution with a total frequency of 800
is equal to n/4 = 200.
(xvii) The median, quartiles and percentiles can be determined graphically
only by means of a “less than” ogive.
(xviii) The lower and the upper quartiles mark off the limits within which
the middle 50 per cent of the cases fall.
(xix) It is possible to have more than one median in a given distribution.
(xx) For every frequency distribution, the upper and lower quartiles are
located at equal distance from median.
(v) Define median, quartiles, deciles and percentiles. State the property of median. Does it have any practical application?
(vi) Write a detailed note on the choice of an average. Which average
would be more suitable in the following cases:
(a) Average size of ready-made garments sold by a store.
(b) Average intelligence level of students of a class.
(c) Average rate of growth of population per decade.
(vii) A taxi ride in New Delhi costs Rs. 20 for the first kilometer and Rs. 11 per kilometer thereafter. Assume that the cost of each kilometer is incurred at the beginning of the kilometer. The waiting charges are Rs. 30 per hour or part thereof, subject to a minimum of 15 minutes' stay. Calculate the effective average cost per kilometer to a customer who rides a taxi from the Railway Station to her home 21.7 kilometers away and chooses to stop for coffee for 25 minutes on the way.
(viii) Find the arithmetic mean of the first 100 natural numbers.
(ix) The arithmetic mean of a distribution is known to be 55.45. It is
written below with the variate given in codified values. You are
required to determine the class intervals.
d' | –3 | –2 | –1 | 0 | 1 | 2 | 3
f | 10 | 28 | 30 | 42 | 65 | 15 | 10
It is known that the various d' values have been calculated as (X – A)/10.
(x) In a hotel, a total of 500 bulbs were installed simultaneously and
their failure over time was observed as detailed below.
End of week : 1 2 3 4 5 6 7
No. of failures : 12 40 108 242 346 428 500
You are required to calculate the mean life of the bulbs.
(xi) A factory employs 100 workers. The mean daily wages of 99 of
these workers is Rs. 85 while the wages of the 100th worker are
Rs. 99 more than the mean wages of all the workers. Obtain mean
wages of the workers of the factory.
(xii) The following data gives the distribution of accidents in a large
city over weekdays of the last month :
(xx) The number of sales made by a shoe store in the City Mall during the past 20 days is as follows:
7 6 13 16 8 5 9 9 10 19
16 8 11 13 7 24 22 15 21 21
Find the 50th, 75th and 88th percentiles.
Ans.
(vii) Rs. 12.14/km (viii) 50.5 (ix) 20–30, 30–40, etc.
(x) 4.15 weeks (xi) Rs. 86 (xii) 14.27
(xiii) A: Rs. 138, B: Rs. 143 (xiv) Rs. 73.004 (xv) 750
(xvi) Mean = 56.4, Median = 58.5 approx. (xvii) 2 (xviii) Median = 33.5, 25–43
(xix) 15 and 10 (xx) 12, 18.25 and 21.48
LESSON 3
Measures of Variation – Absolute and Relative
STRUCTURE
3.1 Learning Objectives
3.2 Need and Importance
3.3 What is Variation?
3.4 Requisites of a Good Measure of Variation
3.5 Types of Variation
3.6 Methods of Computing Variation
3.7 Revisionary Problems
3.8 Summary
3.9 Self-Assessment Questions
peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread or the extent of variability in the individual items of the distribution. This can be known from certain other measures, known as ‘Measures of Dispersion’ or Variation.
We can understand variation with the help of the following example:
Series I | Series II | Series III
10 | 2 | 10
10 | 8 | 12
10 | 20 | 8
ΣX = 30 | ΣX = 30 | ΣX = 30
For each series, $\bar{X} = \dfrac{\Sigma X}{N} = \dfrac{30}{3} = 10$
In all three series, the value of the arithmetic mean is 10. On the basis of this average alone, we might say that the series are alike. If we carefully examine the composition of the three series, however, we find the following differences:
(i) In the 1st series, the values are all equal; but in the 2nd and 3rd series, the values are unequal and do not follow any specific order.
(ii) The magnitude of deviation, item-wise, is different for the 1st, 2nd and 3rd series. But these deviations cannot be ascertained if only the value of the simple mean is taken into consideration.
(iii) Although the arithmetic mean of all three series is 10, their medians may differ from each other. This can be understood as follows (values arranged in ascending order):
Series I: 10, 10, 10 (Median = 10)
Series II: 2, 8, 20 (Median = 8)
Series III: 8, 10, 12 (Median = 10)
The value of the median is 10 in the 1st series, 8 in the 2nd series and 10 in the 3rd series. Therefore, the values of the mean and median are not identical in every series.
(iv) Even though the average remains the same, the nature and extent of the distribution of the size of the items may vary. In other words, the structure of the frequency distributions may differ even though their means are identical.
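The point of this comparison can be reproduced with Python's `statistics` module: all three series share the same mean, while their medians and ranges (largest minus smallest value, defined in 3.6.1) differ.

```python
from statistics import mean, median

series = {
    "I": [10, 10, 10],
    "II": [2, 8, 20],
    "III": [10, 12, 8],
}
for name, values in series.items():
    spread = max(values) - min(values)
    print(f"Series {name}: mean={mean(values)}, median={median(values)}, range={spread}")
```

Identical means hide very different spreads (ranges of 0, 18 and 4), which is why measures of variation are needed.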
3.6.1 Range
It is the simplest method of studying dispersion. Range is the difference between the largest value and the smallest value of a series. While computing the range, we do not take into account the frequencies of different groups.
Formula: Absolute Range = L – S
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
where L represents the largest value in a distribution and S represents the smallest value in a distribution.
We can understand the computation of range with the help of examples of different series.
$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{35 - 12}{35 + 12} = \frac{23}{47} = 0.49 \text{ approx.}$$
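The two range measures can be computed with a small helper. `range_measures` is an illustrative name that works on any list of values. Because only the extreme values L = 35 and S = 12 of this example are recoverable here, only those two are passed in.

```python
def range_measures(values):
    """Absolute range L - S and coefficient of range (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)


absolute, coefficient = range_measures([35, 12])
print(absolute, round(coefficient, 2))  # 23 0.49
```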
Range is the simplest method of studying dispersion, and it takes little time to compute the ‘absolute’ and ‘relative’ range. However, range does not take into account all the values of a series, i.e., it considers only the extreme items, and the middle items are not given any importance. Therefore, range cannot tell us anything about the character of the distribution. Range cannot be computed in the case of an ‘open-end’ distribution, i.e., a distribution where the lower limit of the first group and the upper limit of the last group are not given.
The concept of range is useful in the field of quality control and in studying variations in the prices of shares, etc.
Solution: We first compute the values of Q1 and Q3.
Salaries (Rs.) (X) | No. of workers (f) | Cumulative frequency (c.f.)
60 | 4 | 4
100 | 20 | 24 ← Q1 lies in this cumulative frequency
120 | 21 | 45
140 | 16 | 61
160 | 9 | 70
 | N = Σf = 70 |
Calculation of Q1:
$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)\text{th item} = \left(\frac{70+1}{4}\right)\text{th item} = 17.75\text{th item}$$
17.75 lies in the cumulative frequency 24, which corresponds to the value Rs. 100.
∴ Q1 = Rs. 100
Calculation of Q3:
$$Q_3 = \text{size of } 3\left(\frac{N+1}{4}\right)\text{th item} = 3\left(\frac{70+1}{4}\right)\text{th item} = 53.25\text{th item}$$
53.25 lies in the cumulative frequency 61, which corresponds to the value Rs. 140.
∴ Q3 = Rs. 140
(i) Inter-quartile range = Q3 − Q1 = Rs. 140 − Rs. 100 = Rs. 40
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{140 - 100}{2}$ = Rs. 20
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{140 - 100}{140 + 100} = \frac{40}{240}$ = 0.17 approx.
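The discrete-series rule used above (locate the (N + 1)/4th and 3(N + 1)/4th items in the cumulative frequency column and read off the value) can be sketched in Python. The helper discrete_quartile is a hypothetical name, and it assumes the values are listed in ascending order.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_quartile(values, freqs, k):
    """k-th quartile (k = 1 or 3) of a discrete series.

    Uses the 'size of k(N + 1)/4 th item' rule: the quartile is the value
    whose cumulative frequency first reaches or exceeds that position.
    Values must be listed in ascending order.
    """
    cum = list(accumulate(freqs))        # cumulative frequencies
    position = k * (cum[-1] + 1) / 4
    return values[bisect_left(cum, position)]


salaries = [60, 100, 120, 140, 160]      # X (Rs.)
workers = [4, 20, 21, 16, 9]             # f, N = 70

q1 = discrete_quartile(salaries, workers, 1)   # 17.75th item
q3 = discrete_quartile(salaries, workers, 3)   # 53.25th item
print(q1, q3)                                  # 100 140
print(q3 - q1, (q3 - q1) / 2)                  # 40 20.0
print(round((q3 - q1) / (q3 + q1), 2))         # 0.17
```

bisect_left returns the first cumulative frequency that is greater than or equal to the position, which mirrors the "lies in the cumulative frequency" step of the manual method.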
Calculation of inter-quartile range, semi-quartile range and coefficient
of quartile deviation in the case of a continuous series
Example 6: We are given the following data:
Salaries (Rs.) | No. of workers
10–20 | 4
20–30 | 6
$\frac{3N}{4} = \frac{3 \times 25}{4} = \frac{75}{4} = 18.75$, which lies in the cumulative frequency 20,
corresponding to the class 30–40.
Therefore, the Q3 group is 30–40.
$$Q_3 = l_1 + \frac{\frac{3N}{4} - cf_0}{f} \times i$$
where $l_1 = 30$, $i = 10$, $\frac{3N}{4} = 18.75$, $cf_0 = 10$, $f = 10$
$$Q_3 = 30 + \frac{18.75 - 10}{10} \times 10 = \text{Rs. } 38.75$$
Therefore:
(i) Inter-quartile range = Q3 − Q1 = Rs. 38.75 − Rs. 23.75 = Rs. 15.00
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15.00}{2}$ = Rs. 7.50
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{38.75 - 23.75}{38.75 + 23.75} = \frac{15}{62.50}$ = 0.24
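The interpolation formula for a continuous series can be sketched in Python as below. The helper interpolate_quartile is a hypothetical name. N = 25 is inferred from 3N/4 = 18.75, and the Q1 inputs (class 20–30, cumulative frequency of the preceding class 4, f = 6) are read from the first two classes of Example 6; with these assumptions the sketch reproduces the Q1 of Rs. 23.75 used above.

```python
def interpolate_quartile(l1, position, cf0, f, i):
    """Q = l1 + (position - cf0) / f * i for a continuous series.

    l1       : lower limit of the quartile class
    position : N/4 for Q1 or 3N/4 for Q3
    cf0      : cumulative frequency of the class preceding the quartile class
    f        : frequency of the quartile class
    i        : width of the quartile class
    """
    return l1 + (position - cf0) / f * i


n = 25                                                 # total frequency (inferred)
q1 = interpolate_quartile(20, n / 4, 4, 6, 10)         # class 20-30
q3 = interpolate_quartile(30, 3 * n / 4, 10, 10, 10)   # class 30-40
print(q1, q3)                                          # 23.75 38.75
print(q3 - q1, (q3 - q1) / 2)                          # 15.0 7.5
print(round((q3 - q1) / (q3 + q1), 2))                 # 0.24
```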
Similarly, we sometimes calculate a percentile range, say, between the 90th
and the 10th percentiles, as it gives a slightly better measure of dispersion
in certain cases. The calculations are:
(i) Absolute percentile range = $P_{90} - P_{10}$
(ii) Coefficient of percentile range = $\frac{P_{90} - P_{10}}{P_{90} + P_{10}}$
(iv) Apply the formula to get Average Deviation about Mean or Median
or Mode.
Example 7 : Suppose the values are 5, 5, 10, 15, 20. We want to calculate
Average Deviation and Coefficient of Average Deviation about Mean or
Median or Mode.
Solution:
Average Deviation about Mean (Absolute and Coefficient)
$$\bar{X} = \frac{\Sigma X}{N}$$
X | Deviation from mean (d) | Deviation after ignoring signs $|d|$
5 | −6 | 6
$$\text{Average deviation about Median} = \frac{\Sigma |d|}{N} = \frac{25}{5} = 5$$
$$\text{Coefficient of Average Deviation about Median} = \frac{\text{A.D. about Median}}{\text{Median}} = \frac{5}{10} = 0.5$$
Average Deviation (Absolute and Coefficient) about Mode
X | Deviation from mode (d) | $|d|$
5 | 0 | 0
5 (Mode) | 0 | 0
10 | +5 | 5
15 | +10 | 10
20 | +15 | 15
N = 5 | | $\Sigma |d|$ = 30
$$\text{Average deviation about Mode} = \frac{\Sigma |d|}{N} = \frac{30}{5} = 6$$
$$\text{Coefficient of Average Deviation about Mode} = \frac{\text{A.D. about Mode}}{\text{Mode}} = \frac{6}{5} = 1.2$$
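As a check on Example 7, the Python sketch below computes the average deviation about each of the three averages using the standard-library statistics module. The helper average_deviation is our own illustrative name, not something defined in the text.

```python
from statistics import mean, median, mode


def average_deviation(values, centre):
    """Average of the absolute deviations |X - centre| (signs ignored)."""
    return sum(abs(x - centre) for x in values) / len(values)


values = [5, 5, 10, 15, 20]              # Example 7
for label, centre in (("mean", mean(values)),
                      ("median", median(values)),
                      ("mode", mode(values))):
    ad = average_deviation(values, centre)
    # coefficient of A.D. = A.D. / value of the average used
    print(f"{label}: centre = {centre}, A.D. = {ad:.2f}, coefficient = {ad / centre:.3f}")
# A.D. about mean 5.2, about median 5.0, about mode 6.0
```

The median gives the smallest average deviation of the three, consistent with the minimum property of absolute deviations about the median.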
Average deviation in case of discrete and continuous series
$$\text{Average Deviation about Mean or Median or Mode} = \frac{\Sigma f |d|}{N}$$
where N = No. of items,
|d| = deviations from Mean or Median or Mode, after ignoring
negative signs.
$$\text{Coefficient of A.D. about Mean or Median or Mode} = \frac{\text{A.D. about Mean or Median or Mode}}{\text{Value of Mean or Median or Mode}}$$
Example 8: Suppose we want to calculate the coefficient of Average Deviation
about Mean from the following discrete series:
X | Frequency
10 | 5
15 | 10
20 | 15
25 | 10
30 | 5
Solution: First of all, we calculate the value of the arithmetic mean.
Calculation of Arithmetic Mean
X | f | fX
10 | 5 | 50
15 | 10 | 150
20 | 15 | 300
25 | 10 | 250
30 | 5 | 150
 | N = 45 | ΣfX = 900
$$\bar{X} = \frac{\Sigma fX}{N} = \frac{900}{45} = 20$$
Calculation of Coefficient of Average Deviation about Mean
X | f | Deviation from mean (d) | Deviation after ignoring negative signs $|d|$ | $f|d|$
10 | 5 | −10 | 10 | 50
15 | 10 | −5 | 5 | 50
20 | 15 | 0 | 0 | 0
25 | 10 | +5 | 5 | 50
30 | 5 | +10 | 10 | 50
 | N = 45 | | | $\Sigma f|d|$ = 200
$$\text{Average Deviation about Mean} = \frac{\Sigma f |d|}{N} = \frac{200}{45} = 4.44 \text{ approx.}$$
$$\text{Coefficient of Average Deviation about Mean} = \frac{\text{A.D. about Mean}}{\text{Mean}} = \frac{4.44}{20} = 0.22$$
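For a discrete series each absolute deviation is weighted by its frequency. A minimal Python sketch of the Example 8 calculation follows; the helper name weighted_average_deviation is hypothetical.

```python
def weighted_average_deviation(values, freqs, centre):
    """A.D. = sum(f * |X - centre|) / N, where N = sum(f)."""
    n = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


x = [10, 15, 20, 25, 30]                 # Example 8 values
f = [5, 10, 15, 10, 5]                   # frequencies
n = sum(f)                                                  # 45
mean_x = sum(fi * xi for xi, fi in zip(x, f)) / n           # 900 / 45 = 20.0
ad = weighted_average_deviation(x, f, mean_x)               # 200 / 45
print(round(ad, 2), round(ad / mean_x, 2))                  # 4.44 0.22
```

The same function works for a continuous series if the class mid-points are passed as the values.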
Suppose we want to calculate the coefficient of Average Deviation about
Median from the following data:
Class Interval | Frequency
10–14 | 5
15–19 | 10
20–24 | 15
∴ Median size = $\left(\frac{N}{2}\right)$th item, i.e., $\frac{45}{2} = 22.5$th item.
It lies in the cumulative frequency 30, which corresponds to the class
interval 19.5–24.5.
Median group is 19.5–24.5.
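$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - (\bar{X})^2} \quad \text{or} \quad \sigma^2 = \frac{\Sigma X^2}{N} - (\bar{X})^2$$
where N = Number of the items,
X = Given values in the series,
$\bar{X}$ = Arithmetic mean of the values.
We can also write the formula as follows:
$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}, \quad \text{where } \bar{X} = \frac{\Sigma X}{N}$$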
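Solution: We are required to calculate the values of ΣX and ΣX². They are
calculated as follows:
X | X²
2 | 4
4 | 16
6 | 36
8 | 64
10 | 100
N = 5 | ΣX² = 220
$$\bar{X} = \frac{\Sigma X}{N} = \frac{30}{5} = 6$$
$$\sigma = \sqrt{\frac{220}{5} - (6)^2} = \sqrt{44 - 36} = \sqrt{8} = 2.828$$
$$\text{Variance } (\sigma^2) = (\sqrt{8})^2 = 8$$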
The standard deviation can also be calculated by the following methods; which
one is convenient depends on the type of problem:
(ii) When the deviations are taken from actual mean
$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} \quad \text{where } N = \text{No. of items and } x = (X - \bar{X})$$
Steps to Calculate σ
(i) Compute the deviations of the given values from the actual mean, i.e., $(X - \bar{X})$,
and represent them by x.
(ii) Square these deviations and aggregate them.
(iii) Use the formula $\sigma = \sqrt{\frac{\Sigma x^2}{N}}$
Example 10: We are given the values 2, 4, 6, 8, 10. We want to find
out the standard deviation.
X | $x = (X - \bar{X})$ | x²
2 | 2 − 6 = −4 | (−4)² = 16
4 | 4 − 6 = −2 | (−2)² = 4
6 | 6 − 6 = 0 | 0
8 | 8 − 6 = +2 | (2)² = 4
10 | 10 − 6 = +4 | (4)² = 16
N = 5 | | Σx² = 40
$$\bar{X} = \frac{\Sigma X}{N} = \frac{30}{5} = 6$$
$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.828$$
(iii) When the deviations are taken from assumed mean
$$\sigma = \sqrt{\frac{\Sigma dx^2}{N} - \left(\frac{\Sigma dx}{N}\right)^2}$$
where N = No. of items,
dx = deviations from assumed mean, i.e., (X − A),
A = assumed mean.
Steps to Calculate σ:
(i) We take any value as the assumed mean. The value may or may not be one
of the values in the series.
(ii) We take deviations from the assumed value, i.e., (X − A), to obtain
dx for the series and aggregate them to find Σdx.
(iii) We square these deviations to obtain dx² and aggregate them to find
Σdx².
(iv) Apply the formula given above to get the standard deviation.
Example 11: Suppose the values are given as 2, 4, 6, 8 and 10. We can
obtain the standard deviation (assumed mean A = 4) as follows:
X | dx = (X − A) | dx²
2 | (2 − 4) = −2 | 4
4 (assumed mean A) | (4 − 4) = 0 | 0
6 | (6 − 4) = +2 | 4
8 | (8 − 4) = +4 | 16
10 | (10 − 4) = +6 | 36
N = 5 | Σdx = 10 | Σdx² = 60
$$\sigma = \sqrt{\frac{\Sigma dx^2}{N} - \left(\frac{\Sigma dx}{N}\right)^2} = \sqrt{\frac{60}{5} - \left(\frac{10}{5}\right)^2} = \sqrt{12 - 4} = \sqrt{8} = 2.828$$
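X | d = (X − A) | $dx = \frac{d}{i}$, i = 2 | dx²
2 | −2 | −1 | 1
4 (A) | 0 | 0 | 0
6 | +2 | +1 | 1
8 | +4 | +2 | 4
10 | +6 | +3 | 9
N = 5 | | Σdx = 5 | Σdx² = 15
$$\sigma = \sqrt{\frac{\Sigma dx^2}{N} - \left(\frac{\Sigma dx}{N}\right)^2} \times i \quad \text{where } N = 5,\ i = 2,\ \Sigma dx = 5,\ \Sigma dx^2 = 15$$
$$\sigma = \sqrt{\frac{15}{5} - \left(\frac{5}{5}\right)^2} \times 2 = \sqrt{3 - 1} \times 2 = \sqrt{2} \times 2 = 1.414 \times 2 = 2.828$$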
Note: The standard deviation is identical (2.828) under all four methods.
Therefore, any of the four formulae can be applied to find the value of
standard deviation. The suitability of a formula, however, depends on the
magnitude of the items in a question.
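The agreement of the four methods can be verified in a few lines of Python. The function names below are illustrative only; A = 4 and i = 2 are the assumed mean and common factor used in the worked examples on the values 2, 4, 6, 8, 10.

```python
from math import isclose, sqrt


def sd_actual_values(xs):
    """sigma = sqrt(sum(X^2)/N - mean^2)."""
    n = len(xs)
    return sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


def sd_actual_mean(xs):
    """sigma = sqrt(sum(x^2)/N), with x = X - mean."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / n)


def sd_assumed_mean(xs, a):
    """sigma = sqrt(sum(dx^2)/N - (sum(dx)/N)^2), with dx = X - A."""
    n = len(xs)
    dx = [x - a for x in xs]
    return sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)


def sd_step_deviation(xs, a, i):
    """sigma = sqrt(sum(dx^2)/N - (sum(dx)/N)^2) * i, with dx = (X - A)/i."""
    n = len(xs)
    dx = [(x - a) / i for x in xs]
    return sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2) * i


values = [2, 4, 6, 8, 10]
results = [
    sd_actual_values(values),
    sd_actual_mean(values),
    sd_assumed_mean(values, a=4),
    sd_step_deviation(values, a=4, i=2),
]
print([round(r, 3) for r in results])              # [2.828, 2.828, 2.828, 2.828]
print(all(isclose(r, sqrt(8)) for r in results))   # True: every method gives sqrt(8)
```

The assumed-mean and step-deviation forms only change the arithmetic, not the result, which is why they are preferred when the actual mean is a fraction or the values are large.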
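$$\text{Coefficient of Standard Deviation} = \frac{\sigma}{\bar{X}}$$
In the above example, σ = 2.828 and $\bar{X}$ = 6.
$$\therefore \text{Coefficient of standard deviation} = \frac{2.828}{6} = 0.471$$
Coefficient of Variation (C.V.)
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{2.828}{6} \times 100 = 47.1\%$$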
Generally, the coefficient of variation is used to compare two or more series.
If the coefficient of variation (C.V.) of one series is greater than that of
another, there is more variation in that series, i.e., less stability or
consistency in its composition. A series with a smaller coefficient of
variation is more stable or consistent. Hence, the series with the smaller
coefficient of variation (or coefficient of standard deviation) is regarded
as the better one.
Example 13: Suppose we want to compare two firms where the salaries
of the employees are given as follows:
 | Firm A | Firm B
No. of workers | 100 | 100
Mean salary (Rs.) | 100 | 80
Standard deviation (Rs.) | 40 | 45
Solution: We can compare these firms with the help of either the coefficient
of standard deviation or the coefficient of variation. If we use the
coefficient of variation, we apply the formula:
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$
Firm A ($\bar{X}$ = 100, σ = 40): C.V. = $\frac{40}{100} \times 100$ = 40%
Firm B ($\bar{X}$ = 80, σ = 45): C.V. = $\frac{45}{80} \times 100$ = 56.25%
Because the coefficient of variation is smaller for firm A than for
firm B, firm A is better (its salaries are more consistent).
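The comparison in Example 13 reduces to computing the C.V. of each firm and picking the smaller one. A minimal Python sketch, with illustrative names:

```python
def coefficient_of_variation(sd, mean):
    """C.V. = sigma / mean * 100, expressed as a percentage."""
    return sd / mean * 100


firms = {"A": (100, 40), "B": (80, 45)}   # (mean salary, standard deviation) in Rs.
cv = {name: coefficient_of_variation(sd, m) for name, (m, sd) in firms.items()}
print(cv)                                  # {'A': 40.0, 'B': 56.25}
more_consistent = min(cv, key=cv.get)      # lower C.V. means more consistent
print(more_consistent)                     # A
```

Because C.V. is a pure number, the comparison is valid even though the two firms have different mean salaries.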
Calculation of standard deviation in discrete and continuous series
We use the same formula for calculating standard deviation for a continuous
series and a discrete series. The only difference is that in a discrete
series, values and frequencies are given, whereas in a continuous series,
class intervals and frequencies are given. When the mid-points of these class
intervals are obtained, a continuous series takes the shape of a discrete
series. The letter X denotes values in a discrete series and mid-points in
a continuous series.
When the deviations are taken from actual mean
We use the following formula for both discrete and continuous series:
$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}}, \quad \text{where } x = (X - \bar{X}) \text{ and } N = \Sigma f$$
$$\text{Therefore, } \bar{X} = \frac{\Sigma fm}{N} = \frac{990}{45} = 22, \quad \text{where } N = 45,\ \Sigma fm = 990$$
Calculation of Standard Deviation
Class Intervals | f | Mid-points (X) | Deviations from actual mean 22 (x) | x² | fx²
10–14 | 5 | 12 | −10 | 100 | 500
15–19 | 10 | 17 | −5 | 25 | 250
$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}}, \quad \text{where } N = 45,\ \Sigma f x^2 = 1500$$
$$\sigma = \sqrt{\frac{1500}{45}} = \sqrt{33.33} = 5.77 \text{ approx.}$$
When the deviations are taken from assumed mean
In some cases, the value of the arithmetic mean may be in fractions; then it
becomes time-consuming to take deviations and square them. Alternatively,
we can take deviations from an assumed mean.
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2}$$
where N = number of items,
dx = deviations from assumed mean (X − A),
f = frequency of the different groups,
A = assumed mean and
X = values or mid-points.
Steps to calculate σ
(i) Take the assumed mean from the given values or mid-points.
(ii) Take deviations from the assumed mean and represent them by dx.
(iii) Square the deviations to get dx².
(iv) Multiply f with dx of the different groups to obtain fdx and add them
up to get Σfdx.
(v) Multiply f with dx² of the different groups to obtain fdx² and add them
up to get Σfdx².
(vi) Apply the formula to get the value of standard deviation.
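$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2}, \quad \text{where } N = 45,\ \Sigma f dx^2 = 2625,\ \Sigma f dx = 225$$
$$\therefore \sigma = \sqrt{\frac{2625}{45} - \left(\frac{225}{45}\right)^2} = \sqrt{58.33 - 25} = \sqrt{33.33} = 5.77 \text{ approx.}$$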
When the step deviations are taken from the assumed mean
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2} \times i$$
where N = Number of the items (Σf),
i = common factor,
f = frequencies corresponding to the different groups,
dx = step deviations $\left(\frac{X - A}{i}\right)$
Steps to calculate σ
(i) Take deviations of the calculated mid-points from the assumed mean,
divide all deviations by a common factor (i) and represent these
values by dx.
(ii) Square these step deviations dx to obtain dx² for the different groups.
(iii) Multiply f with dx of the different groups to find fdx and add them to
obtain Σfdx.
(iv) Multiply f with dx² of the different groups to find fdx² and add them
to obtain Σfdx².
(v) Apply the formula to get the standard deviation.
Example 16: Suppose we are given the following series and we want to
calculate the standard deviation with the help of the step deviation method.
According to the given formula, we are required to calculate the values of
i, N, Σfdx and Σfdx².
Class Intervals | Frequency (f) | Mid-point (X) | Deviations from assumed mean 22 (x) | $dx = \frac{X - A}{i}$, i = 5 | dx² | fdx | fdx²
10–14 | 5 | 12 | −10 | −2 | 4 | −10 | 20
15–19 | 10 | 17 | −5 | −1 | 1 | −10 | 10
20–24 | 15 | 22 | 0 | 0 | 0 | 0 | 0
25–29 | 10 | 27 | +5 | +1 | 1 | 10 | 10
30–34 | 5 | 32 | +10 | +2 | 4 | 10 | 20
 | N = 45 | | | | | Σfdx = 0 | Σfdx² = 60
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2} \times i, \quad \text{where } N = 45,\ i = 5,\ \Sigma f dx = 0,\ \Sigma f dx^2 = 60$$
$$\therefore \sigma = \sqrt{\frac{60}{45} - \left(\frac{0}{45}\right)^2} \times 5 = \sqrt{\frac{4}{3}} \times 5 = \sqrt{1.33} \times 5 = 1.1547 \times 5 = 5.77 \text{ approx.}$$
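A Python sketch of the grouped-series methods, assuming the class mid-points are used as X. The helper names are hypothetical; setting i = 1 in grouped_sd_step gives the plain assumed-mean method. The same code also handles the Example 24 classes (−30 to −20, ..., 10 to 20) with A = −5 and i = 10, where the step-deviation formula gives √(60/45) × 10 ≈ 11.55.

```python
from math import sqrt


def midpoints(classes):
    """Mid-point of each (lower, upper) class."""
    return [(lo + hi) / 2 for lo, hi in classes]


def grouped_sd_actual_mean(mids, freqs):
    """sigma = sqrt(sum(f * x^2) / N), where x = X - mean."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(mids, freqs)) / n
    return sqrt(sum(f * (x - m) ** 2 for x, f in zip(mids, freqs)) / n)


def grouped_sd_step(mids, freqs, a, i=1):
    """sigma = sqrt(sum(f dx^2)/N - (sum(f dx)/N)^2) * i, with dx = (X - A)/i.
    With i = 1 this is the plain assumed-mean method."""
    n = sum(freqs)
    dx = [(x - a) / i for x in mids]
    sum_fdx = sum(f * d for d, f in zip(dx, freqs))
    sum_fdx2 = sum(f * d * d for d, f in zip(dx, freqs))
    return sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2) * i


f = [5, 10, 15, 10, 5]
mids = midpoints([(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)])  # 12.0 ... 32.0
print(round(grouped_sd_actual_mean(mids, f), 2))        # 5.77
print(round(grouped_sd_step(mids, f, a=22), 2))         # assumed mean: 5.77
print(round(grouped_sd_step(mids, f, a=22, i=5), 2))    # step deviation: 5.77

mids_24 = midpoints([(-30, -20), (-20, -10), (-10, 0), (0, 10), (10, 20)])
print(round(grouped_sd_step(mids_24, f, a=-5, i=10), 2))  # 11.55
```

Any assumed mean gives the same σ; for instance a = 17 reproduces the sums Σfdx = 225 and Σfdx² = 2625 used in the assumed-mean calculation of this section.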
Disadvantages
(i) It is difficult to compute.
(ii) It assigns more weight to extreme items and less weight to items
that are nearer to the mean. This is because the squares of large
deviations are proportionately greater than the squares of
comparatively small deviations.
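Example 17: Find the combined standard deviation of two series from
the information given below:
 | First Series | Second Series
No. of items | 10 | 15
Arithmetic mean | 15 | 20
Standard deviation | 4 | 5
Solution: Since we are considering two series, the combined
standard deviation is computed by the following formula:
$$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$ and $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$ is the combined mean.
The standard deviation of the first N natural numbers is given by
$$\sigma = \sqrt{\frac{1}{12}(N^2 - 1)}$$
where N represents the number of items.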
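The combined mean and standard deviation of two series can be sketched as follows, using the standard two-series formula σ12 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²)/(N1 + N2)] with d1 = X̄1 − X̄12 and d2 = X̄2 − X̄12. The function name combined_mean_sd is hypothetical. For the Example 17 data it gives a combined mean of 18 and a combined standard deviation of about 5.23.

```python
from math import sqrt


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two series:
    sigma12 = sqrt((N1 s1^2 + N2 s2^2 + N1 d1^2 + N2 d2^2) / (N1 + N2)),
    where d1 = mean1 - mean12 and d2 = mean2 - mean12."""
    n = n1 + n2
    mean12 = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean12, mean2 - mean12
    var12 = (n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / n
    return mean12, sqrt(var12)


m12, s12 = combined_mean_sd(10, 15, 4, 15, 20, 5)   # Example 17
print(m12, round(s12, 2))                            # 18.0 5.23
```

When the two means are equal, as in exercise (xxx), the d terms vanish; combined_mean_sd(30, 0, 16, 50, 0, 12) returns a standard deviation of about 13.64 for any common mean (0 is only a placeholder here).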
$$\text{Corrected } \sigma = \sqrt{\frac{\text{Corrected } \Sigma X^2}{N} - (\text{Corrected } \bar{X})^2}$$
(a) Compute corrected $\bar{X} = \frac{\text{Corrected } \Sigma X}{N}$,
where Corrected ΣX = ΣX + Correct items − Wrong items,
and ΣX = N$\bar{X}$.
(b) Compute Corrected ΣX² = ΣX² + (Each correct item)² − (Each
wrong item)²,
where ΣX² = Nσ² + N$\bar{X}^2$.
Example 19: (a) Find out the coefficient of variation of a series for
which the following results are given:
N = 50, ΣX′ = 25, ΣX′² = 500, where X′ = deviation from the assumed
average 5.
(b) For a frequency distribution of marks in statistics of 100 candidates
(grouped in class intervals of 0–10, 10–20), the mean and standard
deviation were found to be 45 and 20. Later it was discovered that the
score 54 was misread as 64 in obtaining the frequency distribution. Find
out the correct mean and correct standard deviation of the frequency
distribution.
(c) Can the coefficient of variation be greater than 100%? If so, when?
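Solution: (a) We want to calculate the coefficient of variation, which is
$\frac{\sigma}{\bar{X}} \times 100$.
Therefore, we are required to calculate the mean and standard deviation.
Calculation of arithmetic mean:
$$\bar{X} = A + \frac{\Sigma X'}{N}, \quad \text{where } A = 5,\ N = 50,\ \Sigma X' = 25$$
$$\bar{X} = 5 + \frac{25}{50} = 5.5$$
Calculation of standard deviation:
$$\sigma = \sqrt{\frac{\Sigma X'^2}{N} - \left(\frac{\Sigma X'}{N}\right)^2} = \sqrt{\frac{500}{50} - \left(\frac{25}{50}\right)^2} = \sqrt{10 - 0.25} = \sqrt{9.75} = 3.1225$$
Calculation of coefficient of variation:
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{3.1225}{5.5} \times 100 = 56.77\%$$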
(b) Given: $\bar{X}$ = 45, σ = 20, N = 100, wrong value = 64, correct value = 54.
Since this is a case of a continuous series, we apply the formulae for
mean and standard deviation that are applicable to a continuous series.
Calculation of correct Mean
$$\bar{X} = \frac{\Sigma fX}{N} \quad \text{or} \quad N\bar{X} = \Sigma fX$$
By substituting the values, we get ΣfX = 100 × 45 = 4500.
Correct ΣfX = 4500 − 64 + 54 = 4490
$$\text{Correct } \bar{X} = \frac{4490}{100} = 44.9$$
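$$\text{or } 400 = \frac{\Sigma fX^2}{100} - 2025$$
$$\text{or } 400 + 2025 = \frac{\Sigma fX^2}{100}$$
$$\text{or } \Sigma fX^2 = 2425 \times 100 = 242500$$
Correct ΣfX² = 242500 − (64)² + (54)² = 242500 − 4096 + 2916 = 241320
$$\text{Correct } \sigma = \sqrt{\frac{\text{Correct } \Sigma fX^2}{N} - (\text{Correct } \bar{X})^2} = \sqrt{\frac{241320}{100} - (44.9)^2} = \sqrt{2413.20 - 2016.01} = \sqrt{397.19} = 19.93 \text{ approx.}$$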
(c) The coefficient of variation can be greater than 100% only when the
value of the standard deviation is greater than the value of the mean.
This will happen when the data contain a large number of small items and
a few items that are very large. In such a case, the value of the
arithmetic mean will be pulled down and the value of the standard
deviation will go up.
Similarly, if there are negative items in a series, the value of the mean
will come down, while the standard deviation is not reduced, because the
deviations are squared.
Example 20: In a distribution of 10 observations, the values of the mean
and standard deviation are given as 20 and 8. By mistake, two values
are taken as 2 and 6 instead of 4 and 8. Find out the correct
mean and variance.
Solution: We are given N = 10, $\bar{X}$ = 20, σ = 8.
Wrong values = 2 and 6; Correct values = 4 and 8
Calculation of correct Mean
$$\bar{X} = \frac{\Sigma X}{N} \quad \text{or} \quad N\bar{X} = \Sigma X$$
∴ ΣX = 10 × 20 = 200
But ΣX is incorrect. Therefore, we shall find the correct ΣX.
Correct ΣX = 200 − 2 − 6 + 4 + 8 = 204
$$\text{Correct Mean} = \frac{\text{Correct } \Sigma X}{N} = \frac{204}{10} = 20.4$$
Calculation of correct variance
$$\sigma^2 = \frac{\Sigma X^2}{N} - (\bar{X})^2$$
$$\text{or } (8)^2 = \frac{\Sigma X^2}{10} - (20)^2$$
$$\text{or } 64 = \frac{\Sigma X^2}{10} - 400$$
$$\text{or } 64 + 400 = \frac{\Sigma X^2}{10}$$
or ΣX² = 4640
But this is wrong, and hence we shall compute the correct ΣX².
Correct ΣX² = 4640 − 2² − 6² + 4² + 8²
= 4640 − 4 − 36 + 16 + 64
= 4680
$$\text{Correct } \sigma^2 = \frac{\text{Correct } \Sigma X^2}{N} - (\text{Correct } \bar{X})^2 = \frac{4680}{10} - (20.4)^2 = 468 - 416.16 = 51.84$$
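The correction procedure of Examples 19(b) and 20 can be automated. A minimal sketch, assuming each misread item is replaced one-for-one by its correct value (the function name is hypothetical). It rebuilds ΣX = N·X̄ and ΣX² = Nσ² + N·X̄², swaps the items, and recomputes. For Example 20 it gives 20.4 and 51.84; for Example 19(b) it gives a mean of 44.9 and a standard deviation of about 19.93.

```python
def corrected_mean_variance(n, mean, sd, wrong, correct):
    """Recompute mean and variance after replacing misread items.

    Uses sum(X) = N * mean and sum(X^2) = N * sd^2 + N * mean^2,
    then swaps the wrong items for the correct ones."""
    sum_x = n * mean
    sum_x2 = n * sd**2 + n * mean**2
    sum_x += sum(correct) - sum(wrong)
    sum_x2 += sum(c * c for c in correct) - sum(w * w for w in wrong)
    new_mean = sum_x / n
    return new_mean, sum_x2 / n - new_mean**2


# Example 20: N = 10, mean 20, s.d. 8; 2 and 6 were recorded instead of 4 and 8.
m, var = corrected_mean_variance(10, 20, 8, wrong=[2, 6], correct=[4, 8])
print(m, round(var, 2))                           # 20.4 51.84

# Example 19(b): N = 100, mean 45, s.d. 20; 54 was misread as 64.
m, var = corrected_mean_variance(100, 45, 20, wrong=[64], correct=[54])
print(m, round(var, 2), round(var ** 0.5, 2))     # 44.9 397.19 19.93
```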
[Figure: Lorenz curves for Area A and Area B companies, with the line of equal distribution]
In the above diagram, the Lorenz curve for Area B companies lies farther
from the line of equal distribution than the Lorenz curve for Area A.
Therefore, we can conclude that there is more variability among Area B
companies than among Area A companies.
Farm Size (acres) | No. of farms
0–40 | 394
41–80 | 461
81–120 | 391
121–160 | 334
161–200 | 169
201–240 | 113
241 and over | 148
Solution: In this case, the real limits of the class intervals can be
obtained by subtracting 0.5 from the lower limits and adding 0.5 to the
upper limits of the class intervals. This adjustment is necessary to
calculate the median and quartiles of the series.
Farm Size (acres) | No. of farms | Cumulative frequency (c.f.)
0–40 | 394 | 394
41–80 | 461 | 855
81–120 | 391 | 1246
121–160 | 334 | 1580
161–200 | 169 | 1749
201–240 | 113 | 1862
241 and over | 148 | 2010
 | N = 2010 |
$$Q_1 = l_1 + \frac{\frac{N}{4} - c.f._0}{f} \times i$$
$$\frac{N}{4} = \frac{2010}{4} = 502.5\text{th item}$$
Q1 lies in the cumulative frequency of the group 41–80, whose real limits
are 40.5–80.5; so $l_1$ = 40.5, f = 461, i = 40, $c.f._0$ = 394 and $\frac{N}{4}$ = 502.5.
$$\therefore Q_1 = 40.5 + \frac{502.5 - 394}{461} \times 40 = 40.5 + 9.4 = 49.9 \text{ acres}$$
Similarly,
$$Q_3 = l_1 + \frac{\frac{3N}{4} - c.f._0}{f} \times i$$
$$\frac{3N}{4} = \frac{3 \times 2010}{4} = 1507.5\text{th item}$$
Q3 lies in the cumulative frequency of the group 121–160, whose real limits
are 120.5–160.5; so $l_1$ = 120.5, i = 40, f = 334, $\frac{3N}{4}$ = 1507.5 and $c.f._0$ = 1246.
$$\therefore Q_3 = 120.5 + \frac{1507.5 - 1246}{334} \times 40 = 120.5 + 31.3 = 151.8 \text{ acres}$$
Inter-quartile range = Q3 − Q1 = 151.8 − 49.9 = 101.9 acres
$$\text{Semi-quartile range} = \frac{Q_3 - Q_1}{2} = \frac{151.8 - 49.9}{2} = 50.95 \text{ acres approx.}$$
$$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{151.8 - 49.9}{151.8 + 49.9} = \frac{101.9}{201.7} = 0.505 \text{ approx.}$$
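The real-limit adjustment and interpolation can be sketched in Python. The helper grouped_quantile is hypothetical; it assumes that every class containing a quartile has width i = 40 and that no quartile falls in the open-ended class "241 and over", both of which hold for this data.

```python
from bisect import bisect_left
from itertools import accumulate


def grouped_quantile(stated_lower, width, freqs, fraction):
    """Quantile of a grouped series by interpolation:
    Q = l1 + (fraction * N - c.f.0) / f * i.

    stated_lower: lower limits as printed (0, 41, 81, ...); 0.5 is
    subtracted to get the real lower limit l1 of the quantile class.
    width: class width i, assumed common to the classes involved.
    """
    cum = list(accumulate(freqs))
    position = fraction * cum[-1]                 # N/4 for Q1, 3N/4 for Q3
    k = bisect_left(cum, position)                # first class with c.f. >= position
    cf0 = cum[k - 1] if k > 0 else 0              # c.f. of the preceding class
    l1 = stated_lower[k] - 0.5                    # real lower limit
    return l1 + (position - cf0) / freqs[k] * width


lower = [0, 41, 81, 121, 161, 201, 241]           # last class is "241 and over"
farms = [394, 461, 391, 334, 169, 113, 148]       # N = 2010
q1 = grouped_quantile(lower, 40, farms, 0.25)
q3 = grouped_quantile(lower, 40, farms, 0.75)
print(round(q1, 1), round(q3, 1))                 # 49.9 151.8
print(round((q3 - q1) / 2, 2))                    # semi-quartile range: 50.95
print(round((q3 - q1) / (q3 + q1), 3))            # coefficient of Q.D.: 0.505
```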
Example 23: Calculate the mean and the coefficient of mean deviation about
mean from the following data:
Marks less than | No. of students
10 | 4
20 | 10
30 | 20
40 | 40
50 | 50
60 | 56
70 | 60
Solution: In this question, we are given a 'less than' series along with
the cumulative frequencies. Therefore, we first need to find the class
intervals and frequencies before calculating the mean and the coefficient
of mean deviation about mean.
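The conversion step and the remaining arithmetic can be sketched in Python. This assumes classes 0–10, 10–20, ..., 60–70, i.e. a first lower limit of 0 and a width of 10, which the 'less than' limits imply but do not state. The names are illustrative, and the figures printed are our own arithmetic on that assumption rather than values quoted from the text.

```python
def frequencies_from_less_than(cum_counts):
    """Turn 'less than' cumulative counts into class frequencies."""
    return [c - p for c, p in zip(cum_counts, [0] + cum_counts[:-1])]


upper = [10, 20, 30, 40, 50, 60, 70]          # "marks less than"
cum = [4, 10, 20, 40, 50, 56, 60]             # no. of students
f = frequencies_from_less_than(cum)           # [4, 6, 10, 20, 10, 6, 4]
mids = [u - 5 for u in upper]                 # assumes classes 0-10, 10-20, ...
n = sum(f)
avg = sum(fi * m for fi, m in zip(f, mids)) / n                 # arithmetic mean
md = sum(fi * abs(m - avg) for fi, m in zip(f, mids)) / n       # mean deviation about mean
print(avg, round(md, 2), round(md / avg, 3))                    # 35.0 11.33 0.324
```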
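Solution:
Calculation of Standard Deviation
Class Intervals | Frequency (f) | Mid-points (X) | Deviations from assumed mean, A = −5 (X′ = X − A) | Step deviations $dx = \frac{X - A}{i}$, i = 10 | dx² | fdx | fdx²
−30 to −20 | 5 | −25 | −20 | −2 | 4 | −10 | 20
−20 to −10 | 10 | −15 | −10 | −1 | 1 | −10 | 10
−10 to 0 | 15 | −5 | 0 | 0 | 0 | 0 | 0
0 to 10 | 10 | 5 | +10 | +1 | 1 | 10 | 10
10 to 20 | 5 | 15 | +20 | +2 | 4 | 10 | 20
 | N = 45 | | | | | Σfdx = 0 | Σfdx² = 60
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2} \times i, \quad \text{where } N = 45,\ i = 10,\ \Sigma f dx = 0,\ \Sigma f dx^2 = 60$$
$$\therefore \sigma = \sqrt{\frac{60}{45} - \left(\frac{0}{45}\right)^2} \times 10 = \sqrt{1.33} \times 10 = 1.1547 \times 10 = 11.55 \text{ approx.}$$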
Example 25: For two firms A and B belonging to the same industry, the
following details are available:
 | Firm A | Firm B
Number of employees | 100 | 200
Average wage per month | Rs. 240 | Rs. 170
Standard deviation of the wage per month | Rs. 6 | Rs. 8
Find (i) Which firm pays out a larger amount as monthly wages?
(ii) Which firm shows greater variability in the distribution of
wages?
(iii) Find the average monthly wage and the standard deviation of the
wages of all employees of both firms.
Solution: (i) To find out which firm pays the larger amount, we have
to find out ΣX.
$$\bar{X} = \frac{\Sigma X}{N} \quad \text{or} \quad \Sigma X = N\bar{X}$$
Firm A: N = 100, $\bar{X}$ = 240 ∴ ΣX = 100 × 240 = 24000
Firm B: N = 200, $\bar{X}$ = 170 ∴ ΣX = 200 × 170 = 34000
Hence, firm B pays the larger amount as monthly wages.
(ii) To find out which firm shows greater variability in the distribution
of wages, we have to calculate the coefficient of variation.
Firm A: C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{6}{240} \times 100$ = 2.50%
Firm B: C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{8}{170} \times 100$ = 4.71%
Since the coefficient of variation is greater for firm B, it shows greater
variability in the distribution of wages.
(iii) Combined wage:
$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$
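(i) $\bar{X} = A + \frac{\Sigma f dx}{N} \times i$, where N = 360, A = 143, i = 5, Σfdx = 182
$$\therefore \bar{X} = 143 + \frac{182}{360} \times 5 = 143 + 2.53 = 145.53$$
(ii) C.V. = $\frac{\sigma}{\bar{X}} \times 100$
$$\sigma = \sqrt{\frac{\Sigma f dx^2}{N} - \left(\frac{\Sigma f dx}{N}\right)^2} \times i = \sqrt{\frac{1618}{360} - \left(\frac{182}{360}\right)^2} \times 5$$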
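$$= \sqrt{4.494 - 0.256} \times 5 = \sqrt{4.238} \times 5 = 2.059 \times 5 = 10.29$$
$$\text{C.V.} = \frac{10.29}{145.53} \times 100 = 7.07 \text{ per cent}$$
(iii) Q.D. = $\frac{Q_3 - Q_1}{2}$
$$Q_1 = \text{Size of } \frac{N}{4}\text{th observation} = \frac{360}{4} = 90\text{th observation}$$
Q1 lies in the class 136–140, but the real limits of this class are
135.5–140.5.
$$Q_1 = l_1 + \frac{\frac{N}{4} - cf_0}{f} \times i = 135.5 + \frac{90 - 75}{48} \times 5 = 135.5 + 1.56 = 137.06$$
$$Q_3 = \text{Size of } \frac{3N}{4}\text{th observation} = 3 \times \frac{360}{4} = 270\text{th observation}$$
Q3 lies in the class 151–155, but the real limits of this class are
150.5–155.5.
$$Q_3 = l_1 + \frac{\frac{3N}{4} - cf_0}{f} \times i = 150.5 + \frac{270 - 234}{55} \times 5 = 150.5 + 3.27 = 153.77$$
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{153.77 - 137.06}{2} = 8.355$$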
3.8 Summary
While averages summarize data in a single number, variation is studied to
get a better idea of the nature of the data. Variation can be absolute or
relative. Absolute variation refers to the amount of variation in a set of
data, while relative variation serves to compare variability across
different sets of data. The 'distance' measures of variation include the
range and partial ranges, such as the inter-quartile range and the
inter-percentile range, which are used in addition to, or as surrogates for,
the range. Range is commonly used in reporting price movements, quality
control, etc. Coefficient of range is a relative measure. The measures
involving deviations include quartile deviation and its coefficient; mean
deviation and its coefficient; and standard deviation, variance and
coefficient of variation.
Quartile deviation is a quick, inspectional measure of variability and is
used when the data include scattered or extreme values. A measure based on
every observation in the data is the mean deviation, which is equal to the
arithmetic mean of the absolute deviations of the observations from their
mean or median. The relative measure related to it is the coefficient of
mean deviation. Standard deviation is also based on all observations. It is
regarded as the best measure of variation because of its mathematical
properties. Coefficient of standard deviation is sometimes used instead of
coefficient of variation. All coefficients are pure numbers with no units
associated with them; hence they are used for making comparisons of
variability.
Graphically, the Lorenz curve is used to describe inequalities of income.
The extent to which the curve of the actual distribution of income departs
from the line of equal distribution indicates the degree of inequality of
income. A well-defined relationship exists between the values of quartile
deviation, mean deviation and standard deviation in the case of normal
distributions. The relationship works well even for distributions that
deviate moderately from normality.
(viii) Since the sum of absolute deviations measured from median is
the minimum, this serves as the most appropriate average for
calculating mean deviation.
(ix) Since the sum of deviations of a set of values from their mean is
equal to zero, it follows that mean deviation from mean would
always be equal to zero.
(x) Mean deviation can never be negative.
(xi) Mean deviation cannot be calculated for distributions with open-
ended classes.
(xii) The arithmetic mean is used for measuring deviations in calculating
standard deviation due to its least squares property.
(xiii) Standard deviation cannot be equal to zero.
(xiv) Standard deviation can never exceed the arithmetic mean.
(xv) Standard deviation is positive or negative depending upon the sign
of deviations of various values from their mean.
(xvi) Variance is the square root of standard deviation.
(xvii) Coefficient of variation is always expressed as a percentage.
(xviii) Coefficient of standard deviation is equal to the ratio of standard
deviation to arithmetic mean of the data.
(xix) Coefficient of variation expresses arithmetic mean as a percentage
of standard deviation.
(xx) If each of the values of a set of data is increased by 5, the mean
and standard deviation would both increase by 5.
(xxi) If each of the values of a set of data is multiplied by –5, the
standard deviation would also be multiplied by the same number
and hence become negative.
(xxii) If each of the values of a set of data is increased by K, the
coefficient of variation would also increase by K.
(xxiii) When each value of a given set of data is multiplied by K, the
revised coefficient of variation would be K times the original
coefficient value.
(xxiv) The combined standard deviation of two sets of data will always
lie between the standard deviation values of the two sets.
(viii) What is a Lorenz curve? How is it obtained? Discuss its significance
as a tool for studying variation.
(ix) Determine (i) the weekly and (ii) the monthly range of gold prices
(per 10 g) from the following data for a month:
Week | High | Low
1 | 28,122 | 27,880
2 | 29,208 | 28,890
3 | 28,890 | 28,706
4 | 29,225 | 28,930
(x) The heights of 11 men are measured as 65, 68, 70, 69, 58, 66, 71,
65, 67, 69 and 73 inches. Calculate the range. If the shortest and
the tallest of them are omitted, what is the percentage change in
range?
(xi) Draw a “less than” ogive from the following data and obtain the
lower and upper quartiles therefrom. Also, calculate the values of the
quartile deviation and its coefficient.
Wages (in Rs.) | No. of workers
5,000 or more | Nil
4,500 or more | 4
4,000 or more | 18
3,500 or more | 38
3,000 or more | 60
2,500 or more | 75
2,000 or more | 85
1,500 or more | 93
1,000 or more | 100
(xii) The following table shows the percentage of different age groups
to the total population of a certain country:
Age group (years) | Percentage of the total population
0–14 | 42.0
15–19 | 8.7
20–24 | 7.9
25–29 | 7.4
40 is added to every observation of the same data set, then the
coefficient of variation of the resulting set of data is 10%. Find
the mean and standard deviation of the original set of data.
(xxix) A set of 40 numbers has mean and standard deviation equal to
$\bar{X}$ and σ respectively. If each of the values of the set is multiplied
by 16, the coefficient of variation works out to be 25%, while
if each value of the set is increased by 16, the coefficient of
variation becomes 20%. Find the mean and standard deviation of
the set of numbers.
(xxx) Two groups of workers, consisting of 30 and 50 persons, have
the same mean wages but different standard deviations. The
respective standard deviations are Rs. 16 and Rs. 12. Obtain the
combined standard deviation of their wages.
Ans.
(x) 15.6
(xi) Q1 = 2500, Q3 = 3825, QD = 662.5, CQD = 0.209
(xii) 28.214, 14.107, 0.612
(xiii) 13.575, 0.206
(xiv) 147.33, 19.198, 13.03%
(xv) 8, 16
(xvi) 7, 3.742
(xvii) 20-40, 40-60 etc.
(xviii) 4.47
(xix) 4, 9
(xx) 15, 10
(xxi) 5.568%, 6.38%, 5.562%, 5.876%
(xxii) 1106.67, 1050, 16.65%, 11.86%
(xxiii) 73.5, 5.85
(xxiv) A, 285000, 293000, 525.45, 56.72
(xxv) 2.5%, 4.71%, 5.96, 2.88%
(xxvi) 50,155, 38
(xxvii) 36
(xxviii) 80, 12
(xxix) 64, 16
(xxx) 13.64
LESSON 4
Skewness and Kurtosis
STRUCTURE
4.1 Learning Objectives
4.2 Tests of Skewness
4.3 Nature of Skewness
4.4 Characteristics of Skewness
4.5 Methods of Skewness
4.6 Measures of Kurtosis
4.7 Comparison among Variation, Skewness and Kurtosis
4.8 Summary
4.9 Self-Assessment Questions
$$\text{Absolute Skewness} = Q_3 + Q_1 - 2\,\text{Median}$$
$$Q_1 = l + \frac{n/4 - c.f.}{f} \times i$$
n/4 = 45/4 = 11.25, which lies in the cumulative frequency corresponding to the class interval (10–20).
$$Q_1 = 10 + \frac{11.25 - 5}{10} \times 10 = 16.25$$
$$Q_3 = l + \frac{3n/4 - c.f.}{f} \times i$$
3n/4 = 33.75, which lies in the cumulative frequency 40, corresponding to the group (30–40).
$$Q_3 = 30 + \frac{33.75 - 30}{10} \times 10 = 33.75$$
Now we have Q3 = 33.75, Q1 = 16.25 and Median = 25.
$$\therefore \text{Absolute Skewness} = 33.75 + 16.25 - 2(25) = 50 - 50 = 0$$
$$\text{Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2(\text{Median})}{Q_3 - Q_1}$$
$$\therefore \text{Coefficient of Skewness} = \frac{33.75 + 16.25 - 2(25)}{33.75 - 16.25} = \frac{0}{17.5} = 0$$
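As a check on the working above, here is a minimal Python sketch of the two formulas used. The helper names `grouped_quantile` and `bowley_coefficient` are our own; the class limits, the cumulative frequencies before each quartile class (5 and 30) and the class frequencies (10 each) are those used in the calculation, and the median of 25 is taken as given rather than recomputed.

```python
def grouped_quantile(lower, position, cf_before, freq, width):
    """Interpolate within the quantile class: l + (position - c.f.) / f * i."""
    return lower + (position - cf_before) / freq * width


def bowley_coefficient(q1, median, q3):
    """Bowley's coefficient of skewness: (Q3 + Q1 - 2 Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


n = 45
q1 = grouped_quantile(lower=10, position=n / 4, cf_before=5, freq=10, width=10)
q3 = grouped_quantile(lower=30, position=3 * n / 4, cf_before=30, freq=10, width=10)
median = 25  # value stated in the worked example

print(q1, q3, bowley_coefficient(q1, median, q3))
```

A coefficient of zero means the two quartiles are equidistant from the median.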
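X   x = (X – 21)   x²
12   –9   81
18   –3   9
18   –3   9
22   +1   1
35   +14   196
N = 5      Σx² = 296
$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}, \text{ where } x = (X - \bar{X})$$
$$\therefore \sigma = \sqrt{\frac{296}{5}} = \sqrt{59.2} = 7.7$$
$$\text{Coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
Substitute Mean = 21, Mode = 18, Standard deviation = 7.7.
$$\therefore SK = \frac{21 - 18}{7.7} = \frac{3}{7.7} = +0.4 \text{ (approx.)}$$
Calculation of Karl Pearson’s coefficient of skewness by using the following formula:
$$\text{Coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
For the given data X = 12, 18, 18, 22, 35: Mean = 21, Median = 18, σ = 7.7.
$$\therefore \text{Coefficient of skewness} = \frac{3(21 - 18)}{7.7} = \frac{3 \times 3}{7.7} = \frac{9}{7.7} = 1.17$$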
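The same figures can be reproduced with Python's standard `statistics` module. This is only a sketch: `pstdev` divides by N, matching the formula σ = √(Σx²/N) used above, and because σ is not rounded to 7.7 the coefficients come out slightly more precise than in the hand calculation.

```python
import statistics

data = [12, 18, 18, 22, 35]

mean = statistics.mean(data)       # arithmetic mean
median = statistics.median(data)   # middle value of the sorted data
mode = statistics.mode(data)       # most frequent value
sigma = statistics.pstdev(data)    # population SD: sqrt(sum(x^2) / N)

sk_mode = (mean - mode) / sigma            # (Mean - Mode) / sigma
sk_median = 3 * (mean - median) / sigma    # 3(Mean - Median) / sigma

print(mean, median, mode, round(sigma, 3), round(sk_mode, 2), round(sk_median, 2))
```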
$$\text{Median} = L_1 + \frac{n/2 - c.f.}{f} \times i$$
∴ N/2 = 600/2 = 300; it lies in the cumulative frequency 420, which corresponds to the group 201–300.
But the real limits of this class interval are 200.5–300.5.
$$\therefore \text{Median} = 200.5 + \frac{300 - 180}{240} \times 100 = 200.5 + \frac{120}{240} \times 100 = 250.5$$
$$Q_1 = L_1 + \frac{n/4 - c.f.}{f} \times i$$
N/4 = 600/4 = 150. It lies in the cumulative frequency 180, which corresponds to the class interval 151–200. But the real limits of this class interval are 150.5–200.5.
$$\therefore Q_1 = 150.5 + \frac{150 - 60}{120} \times 50 = 150.5 + \frac{90}{120} \times 50 = 150.5 + 37.5 = \text{Rs. } 188$$
$$Q_3 = L_1 + \frac{3n/4 - c.f.}{f} \times i$$
where 3n/4 is used to locate the upper quartile group.
∴ 3n/4 = (3 × 600)/4 = 450. It lies in the cumulative frequency 556, which corresponds to the group 301–500. The real limits of this class interval are 300.5–500.5.
$$\therefore Q_3 = 300.5 + \frac{450 - 420}{136} \times 200 = 300.5 + \frac{30}{136} \times 200 = 300.5 + 44.12 = \text{Rs. } 344.62$$
Hence,
$$\text{Coefficient of skewness} = \frac{344.62 + 188 - 2(250.5)}{344.62 - 188} = \frac{532.62 - 501}{156.62} = \frac{31.62}{156.62} = +0.2 \text{ (approx.)}$$
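The step of converting stated class limits into real limits before interpolating is easy to get wrong, so here is a small Python sketch of it. It assumes, as in this example, that consecutive inclusive classes are separated by a gap of 1 (e.g. 200 and 201); the function names are illustrative, and the cumulative and class frequencies are those quoted in the working.

```python
def real_limits(lower, upper, gap=1):
    """Turn inclusive class limits such as 201-300 into real limits 200.5-300.5."""
    return lower - gap / 2, upper + gap / 2


def interpolate(position, lower, upper, cf_before, freq):
    """L1 + (position - c.f.) / f * i, where L1 and i come from the real limits."""
    l1, u1 = real_limits(lower, upper)
    return l1 + (position - cf_before) / freq * (u1 - l1)


n = 600
median = interpolate(n / 2, 201, 300, cf_before=180, freq=240)
q1 = interpolate(n / 4, 151, 200, cf_before=60, freq=120)
q3 = interpolate(3 * n / 4, 301, 500, cf_before=420, freq=136)

bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(median, 2), round(q1, 2), round(q3, 2), round(bowley, 3))
```

Note that the class width i is taken from the real limits (100, 50 and 200 here), not from the stated limits.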
Example 2 : Calculate the appropriate measure of skewness from the
following cumulative frequency distribution :
Age (under years) : 20 30 40 50 60 70
No. of persons : 12 29 48 75 94 106
Solution: In this problem, we are given the upper limits of classes along
with the cumulative frequency. Therefore, we have to find out the lower
limits and frequencies for the given data.
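Recovering class frequencies from "less than" cumulative frequencies is a matter of taking successive differences. The Python sketch below does this for the data above; it assumes equal class widths of 10, so the first class is taken as 10–20, which the question itself does not state.

```python
upper_limits = [20, 30, 40, 50, 60, 70]
cumulative = [12, 29, 48, 75, 94, 106]   # number of persons "under" each age

# Frequency of a class = its cumulative frequency minus the previous one.
frequencies = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Assumption: every class is 10 years wide, so lower limit = upper limit - 10.
width = 10
classes = [(upper - width, upper) for upper in upper_limits]

for (low, high), f in zip(classes, frequencies):
    print(f"{low}-{high}: {f}")
```

The class 40–50 receives a frequency of 27 and is preceded by a cumulative frequency of 48, the values used for the median in this example.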
$$\text{Bowley's coefficient of skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
Thus, we have to calculate the values of Q3, Q1 and median.
$$\text{Median} = L_1 + \frac{N/2 - c.f.}{f} \times i$$
The median has N/2 = 106/2 = 53 items below it. Therefore, it lies in the cumulative frequency 75, which corresponds to the class interval (40–50). Hence, the median group is (40–50),
where L1 = 40, i = 10, f = 27, N/2 = 53, c.f. = 48.
$$\therefore \text{Median} = 40 + \frac{53 - 48}{27} \times 10 = 40 + \frac{5}{27} \times 10 = 40 + 1.9 = 41.9$$
$$Q_1 = L_1 + \frac{N/4 - c.f.}{f} \times i$$
Q1 has N/4 = 106/4 = 26.5 items below it.
$$\text{Median} = L_1 + \frac{N/2 - c.f.}{f} \times i$$
The median has N/2 = 150/2 = 75 items below it. It lies in the cumulative frequency 80, which corresponds to the group (40–50). Therefore, the median group is 40–50,
where L1 = 40, N/2 = 75, f = 10, c.f. = 70, i = 10.
$$\text{Median} = 40 + \frac{75 - 70}{10} \times 10 = 45$$
Calculation of Mean and Standard Deviation
Marks   Frequency (f)   Mid-points (X)   Deviations from assumed mean (X – 45)   dx = (X – 45)/10   dx²   fdx   fdx²
0–10   10   5   –40   –4   16   –40   160
10–20   40   15   –30   –3   9   –120   360
20–30   20   25   –20   –2   4   –40   80
30–40   0   35   –10   –1   1   0   0
40–50   10   45   0   0   0   0   0
50–60   40   55   +10   +1   1   40   40
60–70   16   65   +20   +2   4   32   64
70–80   14   75   +30   +3   9   42   126
N = 150      Σfdx = –86   Σfdx² = 830
$$\bar{X} = A + \frac{\Sigma fdx}{N} \times i, \text{ where } A = 45,\ N = 150,\ i = 10,\ \Sigma fdx = -86$$
$$\therefore \bar{X} = 45 + \frac{-86}{150} \times 10 = 45 - 5.73 = 39.27$$
$$\sigma = \sqrt{\frac{\Sigma fdx^2}{N} - \left(\frac{\Sigma fdx}{N}\right)^2} \times i$$
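The step-deviation calculation of the mean and standard deviation can be sketched in Python as follows. The mid-points, frequencies, A = 45 and i = 10 are taken from the table; the code simply evaluates the two formulas above.

```python
import math

mid_points = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [10, 40, 20, 0, 10, 40, 16, 14]
A, i = 45, 10  # assumed mean and class width, as in the table

n = sum(freqs)
dx = [(x - A) // i for x in mid_points]   # step deviations -4 ... +3 (exact multiples of i)
sum_fdx = sum(f * d for f, d in zip(freqs, dx))
sum_fdx2 = sum(f * d * d for f, d in zip(freqs, dx))

mean = A + sum_fdx / n * i
sd = math.sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2) * i
print(n, sum_fdx, sum_fdx2, round(mean, 2), round(sd, 2))
```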
(c) In a positively skewed curve, the mean is greater than the median, which in turn is greater than the mode. In other words, mean > median > mode.
In the given problem, for finding out the degree of skewness, we have to compute the coefficient of skewness.
The type of curve is judged from β2, where:
β2 = 3: Mesokurtic curve
β2 < 3: Platykurtic curve
β2 > 3: Leptokurtic curve
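The three cases can be written as a tiny Python function. The optional tolerance `tol` is our own addition (default 0, i.e. the strict rule stated above) for treating computed values very close to 3 as mesokurtic.

```python
def kurtosis_type(beta2, tol=0.0):
    """Classify a frequency curve by beta-2 (equal to 3 for the normal curve)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


print(kurtosis_type(3))      # mesokurtic
print(kurtosis_type(1.44))   # platykurtic
print(kurtosis_type(4.2))    # leptokurtic
```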
$$\mu_2 = \frac{\Sigma fx^2}{N} = \frac{60000}{45} = 133.33$$
$$\therefore \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{40000}{(133.33)^2} = 3$$
4.8 Summary
Both skewness and kurtosis are related to the shape of the frequency
curve. Skewness means lack of symmetry, which implies that the mean,
median and mode are unequal. Skewness is positive when the longer
tail of the curve is to the right and negative when it is to the left.
There are three measures of skewness: Karl Pearson’s, which is based
on averages and the standard deviation; Bowley’s, which uses the median
and quartiles; and Kelly’s, which is based on the median and the tenth
and ninetieth percentiles. Kurtosis refers to the relative peakedness of
the frequency curve. On this basis, distributions can be mesokurtic,
leptokurtic or platykurtic.
(vi) A longer tail to the right indicates positive skewness while a longer
tail to the left indicates negative skewness in the data.
(vii) For any skewed distribution, Mean – Mode = 3(Mean – Median).
(viii) Positive skewness is indicated when X̄ > Me > Mo and negative
skewness when X̄ < Me < Mo.
(ix) In a highly skewed distribution, the value of second quartile may be
different from that of the median.
(x) In every distribution, the lower and upper quartiles are equidistant
from median.
(xi) Bowley’s measure of skewness can vary between ±3.
(xii) Negatively skewed distributions are usually platykurtic.
(xiii) Bowley’s measure of skewness is more appropriate to use in an
open-ended distribution.
(xiv) A distribution more peaked than normal distribution is called
platykurtic distribution.
(xv) Kelly’s measure of kurtosis can vary between the limits of – 0.2631
to +0.2369.
(xvi) The five-point summary of a distribution includes mean, median,
mode, lower quartile and upper quartile.
(xvii) A distribution with lower quartile = 127.8, median = 135.2 and
upper quartile = 148.8 has negative skewness.
Ans.
(i) F (ii) T (iii) F (iv) T (v) T (vi) T (vii) F
(viii) T (ix) F (x) F (xi) F (xii) F (xiii) T (xiv) F
(xv) T (xvi) F (xvii) F
(iii) State the empirical relationship between mean, median and mode for
unimodal frequency curves that are moderately skewed.
(iv) Explain the measures of skewness given by Karl Pearson and Bowley.
(v) What is kurtosis? How is it measured in terms of Kelly’s formula
and the beta co-efficient?
(vi) “Averages, measures of variation, skewness and kurtosis are complementary
in understanding a frequency distribution.” Explain.
(vii) For the distribution of daily wages of a factory employing 880 workers,
the co-efficient of quartile deviation is 3/5 and the co-efficient of
skewness based on quartiles is 1/3. The median wage is known to
be Rs. 90. Calculate the lower and upper quartile wages.
(viii) In a symmetrical distribution, the mean, standard deviation and
range of marks for a group of 20 students are 40, 12 and 60. Find
the standard deviation of marks if the students with highest and
lowest marks are excluded.
(ix) Given that median = 133.5 and mode = 134, obtain the missing
frequencies for the following distribution and then calculate Bowley’s
co-efficient of skewness:
Class Interval Frequency
100–110 8
110–120 32
120–130 ?
130–140 ?
140–150 ?
150–160 12
160–170 8
Total 460
(x) For a distribution, Bowley’s co-efficient of skewness is 0.6. If the
sum of the upper and the lower quartiles is 100 and median is 38,
find the values of the upper and lower quartiles.
(xi) For a distribution, Bowley’s co-efficient of skewness is – 0.36, lower
quartile is 8.6 and median is 12.3. Calculate the co-efficient of
quartile deviation for this distribution.
(xii) The following table gives the distribution of monthly wages of 500
workers in a factory:
Monthly Wages No. of Workers
(in Rs.)
1,500 – 2,000 10
2,000 – 2,500 25
2,500 – 3,000 145
3,000 – 3,500 220
3,500 – 4,000 70
4,000 – 4,500 30
Compute average monthly wage, mode, standard deviation and Karl
Pearson’s co-efficient of skewness.
(xiii) Given that median = 46 and mode = 37, find the missing frequencies
of the following distribution and also calculate Karl Pearson’s co-
efficient of skewness:
Class Interval : 20–30 30–40 40–50 50–60 60–70 70–80 80–90 Total
Frequency : 12 ? ? ? 12 9 7 100
(xiv) Consider the following data about two distributions:
Distribution A Distribution B
Mean 120 110
Median 110 120
Standard deviation 10 10
Examine the following statements, stating with reasons whether each
of them is true or false:
(a) Distribution A has the same degree of variation as distribution
B has.
(b) Distribution A has the same degree of skewness as distribution
B has.
(xv) Karl Pearson’s co-efficient of skewness of a distribution is 0.40. Its
standard deviation is 8 and mean is 30. Find the median and mode
of the distribution.
(xvi) For a moderately skewed distribution of the retail prices of children’s
shoes, it is found that the mean price is Rs. 180 and the median
price is Rs. 164. If the co-efficient of variation is 20%, find
Karl Pearson’s co-efficient of skewness.
(xvii) For a distribution, mean = 65, median = 70 and co-efficient of
skewness = –0.6. Find the mode and co-efficient of variation.
(xviii) Using mean and median, calculate Karl Pearson’s co-efficient of
skewness for the following distribution:
Marks : 10-100 10-80 10-60 10-50 10-40 10-30 10-20
Frequency : 100 88 73 57 35 17 5
(xix) Given mean = 50, co-efficient of variation = 40% and Karl Pearson’s
co-efficient of skewness J = –0.4, find the mode, median and standard deviation.
(xx) The sum of 20 observations is 300 and the sum of their squares is
5000. Find the co-efficient of variation and co-efficient of skewness,
given further that median =15.
(xxi) Pearson’s co-efficient of skewness for a data distribution is 0.5 and
co-efficient of variation is 40%. Its mode is 80. Find the mean and
median of the distribution.
Ans.
(ix) –0.115
(x) 70, 30
(xi) 0.24
(xii) 3.345, 3.167, 503.46, 0.354
(xiii) 26, 20, 14, 0.7
(xiv) (a) False (CVA < CVB) (b) True
(xv) 38.93, 36.8
(xvi) 1.33
(xvii) 80, 38.46%
(xviii) 0.12
(xix) 58, 52.67, 20
(xx) 33.33%, 0
(xxi) 100, 93.33
LESSON 5
Moments
STRUCTURE
5.1 Learning Objectives
5.2 Concept of Central Moments
5.3 Sheppard’s Method
5.4 Coefficients of Moments
5.5 Summary
5.6 Self-Assessment Questions
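$$\mu_2 = \frac{\Sigma x^2}{N}$$
$$\mu_3 = \frac{\Sigma x^3}{N}$$
$$\mu_4 = \frac{\Sigma x^4}{N}$$
In case of frequency distribution apply:
$$\mu_1 = \frac{\Sigma fx}{N}$$
$$\mu_2 = \frac{\Sigma fx^2}{N}$$
$$\mu_3 = \frac{\Sigma fx^3}{N}$$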
$$\mu_4 = \frac{\Sigma fx^4}{N}$$
Let us take an example to understand the computation of the moments
about the mean.
Example 1 : Calculate the first four moments about the mean from the
following set of numbers 2, 3, 7, 8, 10.
Solution :
Calculation of Moments
X   x = (X – X̄)   x²   x³   x⁴
2   –4   16   –64   256
3   –3   9   –27   81
7   1   1   1   1
8   2   4   8   16
10   4   16   64   256
ΣX = 30   Σx = 0   Σx² = 46   Σx³ = –18   Σx⁴ = 610
$$\bar{X} = \frac{\Sigma X}{N} = \frac{30}{5} = 6, \text{ where } N = 5$$
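Moments of the data can be computed by using the values calculated
above.
$$\mu_1 = \frac{\Sigma x}{N} = \frac{0}{5} = 0$$
$$\mu_2 = \frac{\Sigma x^2}{N} = \frac{46}{5} = 9.2$$
$$\mu_3 = \frac{\Sigma x^3}{N} = \frac{-18}{5} = -3.6$$
$$\mu_4 = \frac{\Sigma x^4}{N} = \frac{610}{5} = 122$$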
Therefore, the first four central moments about the mean are : 0, 9.2,
–3.6 and 122 respectively.
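The direct method amounts to averaging powers of the deviations from the mean. A minimal Python sketch (the function name `central_moments` is our own), applied to the numbers of Example 1:

```python
def central_moments(values, orders=(1, 2, 3, 4)):
    """Moments about the mean: mu_r = sum((X - mean) ** r) / N."""
    n = len(values)
    mean = sum(values) / n
    return [sum((x - mean) ** r for x in values) / n for r in orders]


mu1, mu2, mu3, mu4 = central_moments([2, 3, 7, 8, 10])
print(mu1, mu2, mu3, mu4)
```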
$$\therefore \bar{X} = \frac{\Sigma fX}{N} = \frac{3100}{100} = 31 \text{ marks}$$
Now, we can calculate the moments about the mean as follows:
$$\mu_1 = \frac{\Sigma fx}{N} = \frac{0}{100} = 0$$
$$\mu_4 = \frac{\Sigma fx^4}{N} = \frac{1{,}59{,}75{,}200}{100} = 1{,}59{,}752$$
Therefore, the central moments are: 0, 254, 1,212 and 1,59,752 respectively.
5.2.2 Short-Cut-Method
If the arithmetic mean is a fraction, it is tedious to calculate the
deviations (x) from the arithmetic mean. The short-cut method is used in
such cases.
(i) Take any value as an arbitrary mean (A).
(ii) Calculate the deviations (d) from A and calculate the first four moments
in the same way as in the direct method.
These moments are called moments about an arbitrary origin and are
represented by the Greek letter ν (read as nu). The formulae for these
moments are:
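$$\nu_1 = \frac{\Sigma (X - A)}{N} = \frac{\Sigma d}{N}, \text{ where } d = X - A$$
$$\nu_2 = \frac{\Sigma (X - A)^2}{N} = \frac{\Sigma d^2}{N}$$
$$\nu_3 = \frac{\Sigma (X - A)^3}{N} = \frac{\Sigma d^3}{N}$$
$$\nu_4 = \frac{\Sigma (X - A)^4}{N} = \frac{\Sigma d^4}{N}$$
In case of frequency distribution,
$$\nu_1 = \frac{\Sigma f(X - A)}{N} = \frac{\Sigma fd}{N}$$
$$\nu_2 = \frac{\Sigma f(X - A)^2}{N} = \frac{\Sigma fd^2}{N}$$
$$\nu_3 = \frac{\Sigma f(X - A)^3}{N} = \frac{\Sigma fd^3}{N}$$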
$$\nu_4 = \frac{\Sigma f(X - A)^4}{N} = \frac{\Sigma fd^4}{N}$$
After calculating the moments about an arbitrary origin, convert them into
moments about the mean by using the following equations:
$$\mu_1 = \nu_1 - \nu_1 = 0$$
$$\mu_2 = \nu_2 - \nu_1^2 = \sigma^2$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$
We can calculate the moments about an arbitrary origin from the moments
about the mean by the following relationships:
$$\nu_1 = \mu_1 + d$$
$$\nu_2 = \mu_2 + d^2$$
$$\nu_3 = \mu_3 + 3\mu_2 d + d^3$$
$$\nu_4 = \mu_4 + 4\mu_3 d + 6\mu_2 d^2 + d^4$$
where d is the difference between the mean and the origin about which the
moments are to be calculated, i.e. d = X̄ – A.
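Both sets of relationships can be written as a pair of Python functions, which makes it easy to confirm that one undoes the other. The function names are our own; for a check we use ν1 = 0.45, ν2 = 8.73, ν3 = 8.91 and ν4 = 204.93, the moments about A = 67 obtained in the heights example (Example 4) of this lesson.

```python
def raw_to_central(v1, v2, v3, v4):
    """Moments about an arbitrary origin A -> moments about the mean."""
    mu1 = v1 - v1
    mu2 = v2 - v1 ** 2
    mu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
    mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4
    return mu1, mu2, mu3, mu4


def central_to_raw(mu2, mu3, mu4, d):
    """Moments about the mean -> moments about A, where d = mean - A."""
    v1 = d
    v2 = mu2 + d ** 2
    v3 = mu3 + 3 * mu2 * d + d ** 3
    v4 = mu4 + 4 * mu3 * d + 6 * mu2 * d ** 2 + d ** 4
    return v1, v2, v3, v4


mu = raw_to_central(0.45, 8.73, 8.91, 204.93)
back = central_to_raw(*mu[1:], d=0.45)   # should reproduce the nu values
print([round(m, 4) for m in mu])
print([round(v, 4) for v in back])
```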
Example 3 : We are given the following set of numbers 1, 3, 7, 9, 10.
Calculate the first four moments about the origin 4.
Solution :
Calculation of First Four Moments about A = 4
X   d = (X – A)   d²   d³   d⁴
1   –3   9   –27   81
3   –1   1   –1   1
7   3   9   27   81
9   5   25   125   625
10   6   36   216   1296
N = 5   Σd = 10   Σd² = 80   Σd³ = 340   Σd⁴ = 2084
$$\nu_1 = \frac{\Sigma d}{N} = \frac{10}{5} = 2$$
$$\nu_2 = \frac{\Sigma d^2}{N} = \frac{80}{5} = 16$$
$$\nu_3 = \frac{\Sigma d^3}{N} = \frac{340}{5} = 68$$
$$\nu_4 = \frac{\Sigma d^4}{N} = \frac{2084}{5} = 416.8$$
Therefore the Moments about an arbitrary origin are 2, 16, 68 and 416.8
respectively.
Example 4 : Calculate the first four moments about the mean for the following
distribution of heights of 100 students.
Heights (Inches) 61 64 67 70 73
Number of Students 5 18 42 27 8
Solution :
Calculation of Central Moments (short-cut method), A = 67
Heights (X)   No. of students (f)   d = (X – 67)   fd   fd²   fd³   fd⁴
61   5   –6   –30   180   –1,080   6,480
64   18   –3   –54   162   –486   1,458
67   42   0   0   0   0   0
70   27   +3   81   243   729   2,187
73   8   +6   48   288   1,728   10,368
N = 100      Σfd = 45   Σfd² = 873   Σfd³ = 891   Σfd⁴ = 20,493
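Now we can substitute the calculated values in the formulae:
$$\nu_1 = \frac{\Sigma fd}{N} = \frac{45}{100} = 0.45$$
$$\nu_2 = \frac{\Sigma fd^2}{N} = \frac{873}{100} = 8.73$$
$$\nu_3 = \frac{\Sigma fd^3}{N} = \frac{891}{100} = 8.91$$
$$\nu_4 = \frac{\Sigma fd^4}{N} = \frac{20493}{100} = 204.93$$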
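For a frequency distribution the short-cut method simply weights each power of d by its frequency. A minimal Python sketch using the heights data and A = 67 from the table:

```python
heights = [61, 64, 67, 70, 73]
students = [5, 18, 42, 27, 8]
A = 67  # arbitrary origin

n = sum(students)
# nu_r = sum(f * (X - A) ** r) / N for r = 1, ..., 4
nu = [sum(f * (x - A) ** r for x, f in zip(heights, students)) / n for r in range(1, 5)]
print(nu)
```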
Moments about the mean can be calculated as follows:
$$\mu_1 = \nu_1 - \nu_1 = 0.45 - 0.45 = 0$$
$$\bar{X} = A + \frac{\Sigma fd}{N} = 67 + \frac{45}{100} = 67.45$$
$$d = \bar{X} - A = 67.45 - 67 = 0.45$$
$$\nu_1 = \mu_1 + d = 0 + 0.45 = 0.45$$
$$\nu_2 = \mu_2 + d^2 = 8.5275 + (0.45)^2 = 8.5275 + 0.2025 = 8.73$$
$$\nu_3 = \mu_3 + 3\mu_2 d + d^3 = -2.69325 + 3 \times 8.5275 \times 0.45 + (0.45)^3 = -2.69325 + 11.512125 + 0.091125 = 8.91$$
$$\nu_4 = \mu_4 + 4\mu_3 d + 6\mu_2 d^2 + d^4 = 199.3759 + 4 \times (-2.69325) \times 0.45 + 6 \times 8.5275 \times (0.45)^2 + (0.45)^4$$
$$= 199.3759 - 4.84785 + 10.36091 + 0.041006 = 204.93$$
∴ Moments about the arbitrary origin (67) are: 0.45, 8.73, 8.91 and 204.93.
5.2.3 Step-Deviation-Method
It is the most appropriate method to calculate central moments in problems
of continuous frequency distributions with equal class-intervals. Step-
deviation method is similar to short cut method. The only difference is
that in case of step-deviation method, we take a common factor from
among the deviations (d)which are taken from assumed mean (A).
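Dividing the deviations by a common factor C and multiplying each moment back by C^r gives the same moments about A. The sketch below reuses the heights data of Example 4; the choice C = 3 is ours (every deviation there is a multiple of 3), and the rule ν_r = (Σfd′^r / N) × C^r is the standard step-deviation adjustment.

```python
heights = [61, 64, 67, 70, 73]
students = [5, 18, 42, 27, 8]
A, C = 67, 3  # assumed mean and common factor (C chosen for illustration)

n = sum(students)
steps = [(x - A) // C for x in heights]   # step deviations d' = (X - A) / C

# Moments about A: nu_r = (sum(f * d'^r) / N) * C ** r
nu = [sum(f * s ** r for s, f in zip(steps, students)) / n * C ** r for r in range(1, 5)]
print([round(v, 2) for v in nu])
```

The results agree with the short-cut method, which is why the step-deviation method is preferred when the deviations share a common factor: the products to be tabulated are much smaller.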
$$\mu_2 = \nu_2 - \nu_1^2 = \sigma^2$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$
Solution :
Calculation of Moments about Arbitrary Mean
(Mid-points) A = 35, C = 10
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 = 30 - 3 \times 191 \times (-0.3) + 2(-0.3)^3 = 30 + 171.9 - 0.054 = 201.846$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$
Solution : We are given all the values of the four moments and the class interval.
μ1 and μ3 need no correction.
$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{i^2}{12} = 254 - \frac{10^2}{12} = 254 - \frac{100}{12} = 245.667$$
$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{1}{2}\mu_2 i^2 + \frac{7}{240} i^4 = 1{,}59{,}752 - \frac{1}{2} \times 254 \times 10^2 + \frac{7}{240} \times 10^4$$
$$= 1{,}59{,}752 - 12{,}700 + 291.667 = 1{,}47{,}343.667$$
Therefore, the corrected values of the four moments are 0, 245.667, 1,212
and 1,47,343.667 respectively.
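Sheppard's corrections reduce to two one-line adjustments, sketched below in Python (the function name is our own), applied to the values of this example with i = 10.

```python
def sheppard_corrections(mu2, mu4, i):
    """Sheppard's corrections for grouping; mu1 and mu3 need no correction."""
    mu2_corrected = mu2 - i ** 2 / 12
    mu4_corrected = mu4 - 0.5 * mu2 * i ** 2 + 7 / 240 * i ** 4
    return mu2_corrected, mu4_corrected


mu2_c, mu4_c = sheppard_corrections(mu2=254, mu4=159752, i=10)
print(round(mu2_c, 3), round(mu4_c, 3))
```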
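Alpha-Coefficients
$$\alpha_3 = \frac{\mu_3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}}, \quad \alpha_4 = \frac{\mu_4}{\sigma^4} = \frac{\mu_4}{\mu_2^2}$$
Beta-Coefficients
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \alpha_3^2, \quad \beta_2 = \frac{\mu_4}{\mu_2^2} = \alpha_4$$
Gamma-Coefficients
$$\gamma_1 = \sqrt{\beta_1} = \alpha_3, \quad \gamma_2 = \beta_2 - 3 = \frac{\mu_4 - 3\mu_2^2}{\mu_2^2}$$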
Beta-Coefficients (β1 and β2) are used to find the skewness and kurtosis
of a distribution. Let us take an illustration to understand the use of these coefficients.
Example 7: The values of μ1, μ2, μ3 and μ4 are 0, 9.2, 3.6 and 122
respectively. Find out the skewness and kurtosis of the distribution.
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{122}{(9.2)^2} = \frac{122}{84.64} = 1.44$$
Hence the distribution is positively skewed and the curve is platykurtic
or flat at the top.
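The alpha, beta and gamma coefficients defined above can be computed together. A minimal Python sketch (the function name is our own), applied to the moments of Example 7:

```python
def moment_coefficients(mu2, mu3, mu4):
    """Alpha, beta and gamma coefficients from the central moments."""
    alpha3 = mu3 / mu2 ** 1.5      # mu3 / sigma^3
    alpha4 = mu4 / mu2 ** 2        # mu4 / sigma^4
    return {
        "alpha3": alpha3,
        "alpha4": alpha4,
        "beta1": mu3 ** 2 / mu2 ** 3,
        "beta2": alpha4,
        "gamma1": alpha3,           # square root of beta1, carrying the sign of mu3
        "gamma2": alpha4 - 3,
    }


coeffs = moment_coefficients(mu2=9.2, mu3=3.6, mu4=122)
for name, value in coeffs.items():
    print(f"{name}: {value:.4f}")
```

A positive γ1 indicates positive skewness, and a negative γ2 (that is, β2 < 3) indicates a platykurtic curve, in line with the conclusion of Example 7.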
Example 8 : Calculate the first four Moments about an arbitrary origin.
Convert them into Moments about mean. Applying Sheppard’s corrections,
calculate corrected Moments and beta coefficients from the following data:
Experience No. of Employees
(years)
0–1 15
1–2 22
2–3 45
3–4 35
4–5 30
5–6 20
6–7 16
7–8 10
8–9 5
9–10 2
Solution :
Calculation of Moments (A = 4.5)
Experience (years)   Mid-points (X)   No. of employees (f)   d = (X – A)   fd   fd²   fd³   fd⁴
0–1   0.5   15   –4   –60   240   –960   3,840
1–2   1.5   22   –3   –66   198   –594   1,782
2–3   2.5   45   –2   –90   180   –360   720
3–4   3.5   35   –1   –35   35   –35   35
4–5   4.5   30   0   0   0   0   0
5–6   5.5   20   1   20   20   20   20
6–7   6.5   16   2   32   64   128   256
7–8   7.5   10   3   30   90   270   810
8–9   8.5   5   4   20   80   320   1,280
9–10   9.5   2   5   10   50   250   1,250
Total      N = 200      Σfd = –139   Σfd² = 957   Σfd³ = –961   Σfd⁴ = 9,993
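We can find out the moments about A as follows:
$$\nu_1 = \frac{\Sigma fd}{N} = \frac{-139}{200} = -0.695$$
$$\nu_2 = \frac{\Sigma fd^2}{N} = \frac{957}{200} = 4.785$$
$$\nu_3 = \frac{\Sigma fd^3}{N} = \frac{-961}{200} = -4.805$$
$$\nu_4 = \frac{\Sigma fd^4}{N} = \frac{9993}{200} = 49.965$$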
Computed moments are moments about an arbitrary point, 4.5. The central
moments are calculated below:
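$$\mu_1 = \nu_1 - \nu_1 = -0.695 - (-0.695) = 0$$
$$\mu_2 = \nu_2 - \nu_1^2 = 4.785 - (-0.695)^2 = 4.785 - 0.483 = 4.302$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 = -4.805 - 3(4.785)(-0.695) + 2(-0.695)^3 = -4.805 + 9.9767 - 0.6714 = 4.500$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4 = 49.965 - 4(-4.805)(-0.695) + 6 \times 4.785 \times (-0.695)^2 - 3(-0.695)^4$$
$$= 49.965 - 13.358 + 13.868 - 0.700 = 49.775$$
Applying Sheppard’s corrections with i = 1 (μ1 and μ3 need no correction):
$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{i^2}{12} = 4.302 - \frac{1}{12} = 4.302 - 0.083 = 4.219$$
$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{1}{2}\mu_2 i^2 + \frac{7}{240} i^4 = 49.775 - \frac{1}{2}(4.302) + \frac{7}{240} = 49.775 - 2.151 + 0.029 = 47.653$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(4.500)^2}{(4.219)^3} = \frac{20.250}{75.098} = 0.270$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{47.653}{(4.219)^2} = \frac{47.653}{17.800} = 2.677$$
Therefore, the central moments after correction are 0, 4.219, 4.500 and
47.653, with β1 = 0.270 and β2 = 2.677.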
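Because this example involves many hand calculations, it is worth recomputing it directly from the frequency table. The Python sketch below follows the same steps (moments about A = 4.5, conversion to central moments, Sheppard's corrections with i = 1, then β1 and β2 using the corrected μ2); the variable names are our own.

```python
mid_points = [k + 0.5 for k in range(10)]      # 0.5, 1.5, ..., 9.5
employees = [15, 22, 45, 35, 30, 20, 16, 10, 5, 2]
A, i = 4.5, 1                                   # arbitrary origin and class width

n = sum(employees)
d = [x - A for x in mid_points]                 # deviations from A

# Moments about the arbitrary origin A
v1, v2, v3, v4 = (sum(f * dk ** r for f, dk in zip(employees, d)) / n for r in (1, 2, 3, 4))

# Conversion to central moments
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

# Sheppard's corrections (mu3 needs none)
mu2_c = mu2 - i ** 2 / 12
mu4_c = mu4 - 0.5 * mu2 * i ** 2 + 7 / 240 * i ** 4

beta1 = mu3 ** 2 / mu2_c ** 3
beta2 = mu4_c / mu2_c ** 2

print(f"nu: {v1:.3f} {v2:.3f} {v3:.3f} {v4:.3f}")
print(f"mu: 0 {mu2:.3f} {mu3:.3f} {mu4:.3f}")
print(f"corrected: {mu2_c:.3f} {mu4_c:.3f}; beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```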
5.5 Summary
Moments provide a useful method to study various characteristics of a
set of data. Moments calculated about the mean are called central moments.
There can also be moments about any given value A. When A = 0,
the moments calculated are called moments about the origin. It is possible to
convert moments about A into central moments and vice versa. The first
moment about zero is equal to the mean and the first moment about the mean
is equal to zero.
The second central moment is the variance of the distribution. The beta
and gamma statistics are calculated from the central moments and are used to learn
about skewness and kurtosis: β1 and γ1 are measures of skewness, while
β2 and γ2 measure kurtosis. If the third central moment is positive, the
(b) Calculate the first four central moments and the beta co-efficient
of skewness.
(c) Comment on the results.
(xii) You are given the following frequency distribution:
Class Interval Frequency
80–100 12
100–120 15
120–140 20
140–160 38
160–180 60
180–200 33
200–220 14
220–240 8
(a) Calculate moments about 170.
(b) Convert these to central moments and obtain the values of
beta and gamma statistics. Also, comment on the shape of
the distribution.
(xiii) For the following distribution, calculate first four moments about
A = 150 and obtain central moments from these. Also, calculate
beta co-efficients and comment on the skewness and kurtosis.
Class Interval Frequency
80–100 3
100–120 5
120–140 20
140–160 16
160–180 10
180–200 4
200–220 2
(xiv) The first four moments of a distribution are 2, 20, 40, and 500,
respectively. Comment on the shape of the distribution.
(xv) The first four moments of a distribution are calculated as 6; 235;
1,248; and 24,680. You are required to examine the skewness and
kurtosis of the distribution.
(xvi) From the following information about two distributions, find which
is more skewed?
Distribution Second Moment Third Moment
A 16 –15.7
B 40 25.8
(xvii) For a mesokurtic distribution, the fourth central moment is 768.
Obtain its standard deviation.
(xviii) The first four moments of a distribution about the value 4 of the
variable are –1.5, 17, –30 and 108. Its mean is given to be 2.5.
You are required to:
(a) Calculate the central moments and moments about origin.
(b) Determine the co-efficient of variation and variance of the
distribution.
(c) Examine the shape of the distribution.
(xix) For a mesokurtic distribution, β1 = 0.004 and μ3 = 16. Calculate the
value of its fourth central moment.
(xx) For a mesokurtic distribution, co-efficient of variation = 40% and
arithmetic mean = 40. Find the value of its fourth central moment.
(xxi) If variance = 42, then what values of μ4 would make a distribution
(i) mesokurtic, (ii) platykurtic, and (iii) leptokurtic?
(xxii) The first two moments about 40 for a set of 25 values were calculated
as equal to 65 and 2,985, respectively. Test if the calculations are
consistent.
(xxiii) You are given here the results of calculations in respect of a
negatively skewed distribution:
N = …, mean = …, variance = …, β1 = … and β2 = 2.4.
It was discovered later on that an item 12 was wrongly recorded as
2. Find the corrected values of mean, variance and the two beta
constants.
(xxiv) For a mesokurtic distribution, it is known that the first moment about
UNIT-2
LESSON 1
Theory of Probability
STRUCTURE
Jacob Bernoulli, Abraham de Moivre, Reverend Thomas Bayes and Joseph Lagrange
contributed significantly to the development of probability formulas and techniques.
However, it was Pierre-Simon, Marquis de Laplace, who in the early nineteenth
century popularized the general theory of probability by compiling all these
earlier ideas.
Probability is the branch of mathematics concerned with the outcomes of
random events. Probability refers to the chance or potential of an outcome;
it explains the likelihood of occurrence of specific event(s). Thus,
probability is defined as the degree to which something is likely to occur.
The probability that an event will occur ranges from 0 to 1, where 0 denotes
an impossible event and 1 denotes a certain event.
Figure 1: Sample space for the pair of dice experiment (First Die on the horizontal axis and Second Die on the vertical axis, each taking the values 1 to 6)
Initially, probability theory was applied in gambling to determine the chance
of losing or winning. Eventually, in the nineteenth century, the concept of
probability was used in the insurance industry to quantify the risk of loss
in order to calculate the insurance premiums to be charged to policyholders.
Over time, its application extended to social and economic problems. For
instance, most decision-making situations in business management involve
uncertainty. Since uncertainty is present and is an important aspect in
determining the consequences of various …
One of the simplest sample spaces can be the set of outcomes when
a pair of coins is tossed. It consists of four outcomes which can be
conveniently represented as :
S = {HH, HT, TH, TT}
where H denotes a head and T denotes a tail.
We can consider the case of a manufacturer who produces electric bulbs
in large batches. From each batch, a sample of 80 items is selected at
random, and the number of defective items is recorded. Although the
number of defectives in any sample cannot be predicted with certainty, all
of the possible outcomes are known: the number of defective items
in a sample can be any integer from 0 to 80. Here the sample space is:
S = {0, 1, 2, 3, ..., 80}
In the same manner, when a pair of dice is tossed, a total of 36 outcomes
are possible. This can be represented as shown in Figure 1:
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Or, simply, the sample space can be obtained as:
S = (all possible outcomes of Die 1) × (all possible outcomes of Die 2), which gives 6 × 6 = 36 outcomes.
In all three examples, the number of outcomes of the experiment is
known to be finite. While this is so in most cases, it is not a rule:
the number of outcomes can be infinite as well. For example, if we
consider the experiment of observing the lifetime of an electric bulb
in hours, the outcome can be any non-negative real number. Thus, this
sample space contains an infinite number of sample points.
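The three finite sample spaces described above can be enumerated in Python with `itertools.product`, which forms exactly the Cartesian product "outcomes of Die 1 × outcomes of Die 2". This is a sketch only; the infinite sample space of bulb lifetimes obviously cannot be listed this way.

```python
from itertools import product

coins = ["".join(outcome) for outcome in product("HT", repeat=2)]
dice = list(product(range(1, 7), repeat=2))   # (first die, second die)
defectives = range(0, 81)                      # 0, 1, ..., 80 defective bulbs

print(coins)
print(len(dice), dice[:3])
print(len(defectives))
```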
Examples of random experiments are tossing a coin, rolling a die and drawing a card from a deck.
1.3.3 Event
An event refers to any set of possible outcomes in a sample space. If a sample space S has the elements S1, S2, S3, ..., Sn, an event in S would be any one of, or any collection of, S1, S2, S3, ..., Sn.
In a sample space, every combination of sample points may be defined
There are three basic ways of assigning probability to an event. They are :
(i) Classical approach,
(ii) Relative frequency approach, and
(iii) Subjective approach.
$\therefore P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$; and
$P(B) = \frac{n(B)}{n(S)} = \frac{4}{6} = \frac{2}{3}$
Example 4 : A card is drawn at random from a deck of playing cards. Find the chance that (i) it is a face card, (ii) it is a black ace.
Solution : Let A : the event that the card is a face (picture) card
B : the event that the card is a black ace
We have n(S) = 52 (there being 52 cards)
n(A) = 12 (there being 4 J, 4 Q and 4 K cards with faces)
n(B) = 2 (there being 2 black aces)
$\therefore P(A) = \frac{12}{52} = \frac{3}{13}$, and
$P(B) = \frac{2}{52} = \frac{1}{26}$
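A minimal Python sketch of the classical approach for Example 4, assuming a standard 52-card deck in which spades and clubs are the black suits. The helper `classical_probability` is a hypothetical name introduced here, and `Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs; spades and clubs are the black suits
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))

def classical_probability(event, space):
    """P(A) = n(A) / n(S), valid only when all outcomes are equally likely."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

def is_face_card(card):
    return card[0] in {"J", "Q", "K"}

def is_black_ace(card):
    return card[0] == "A" and card[1] in {"spades", "clubs"}

print(classical_probability(is_face_card, deck))   # 3/13
print(classical_probability(is_black_ace, deck))   # 1/26
```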
Example 5 : Find the probability that a leap year selected at random will contain 53 Sundays.
Solution : Like every year, a leap year has 52 full weeks. The remaining two days of the year could be:
Sunday and Monday, Monday and Tuesday, Tuesday and Wednesday, Wednesday and Thursday, Thursday and Friday, Friday and Saturday, or Saturday and Sunday.
We observe here that n(S) = 7. Since two of the above combinations include a Sunday, we have n(A) = 2.
Therefore, $P(A) = \frac{n(A)}{n(S)} = \frac{2}{7}$.
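The reasoning of Example 5 can be checked by enumeration. This sketch assumes, as the solution does, that each of the seven possible starting weekdays is equally likely; the weekday numbering (0 = Monday, ..., 6 = Sunday) is an assumption of the sketch.

```python
from fractions import Fraction

DAYS_IN_LEAP_YEAR = 366
SUNDAY = 6  # assumed numbering: 0 = Monday, ..., 6 = Sunday

def sundays_in_leap_year(first_weekday):
    """Count the Sundays in a 366-day year that starts on `first_weekday`."""
    return sum(1 for day in range(DAYS_IN_LEAP_YEAR)
               if (first_weekday + day) % 7 == SUNDAY)

# Treat each of the 7 possible starting weekdays as equally likely
counts = [sundays_in_leap_year(d) for d in range(7)]
p_53_sundays = Fraction(sum(1 for c in counts if c == 53), len(counts))

print(counts)         # [52, 52, 52, 52, 52, 53, 53]
print(p_53_sundays)   # 2/7
```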
The classical theory, under the assumption of equally likely outcomes, depends on logical reasoning. It works very well when we are concerned with balanced coins, perfect dice, well-shuffled packs of cards and all those situations where all outcomes are equally likely. However, problems are immediately encountered when we have to deal with unbalanced coins, loaded dice and so on. In such situations, we have to depend on the relative frequency approach.
Limitations of Classical Approach:
1. Classical probability cannot be applied in situations in which the possible outcomes cannot be regarded as equally likely. Thus, it has serious limitations when dealing with decision-making problems in management.
2. It cannot be used to calculate probabilities when the sample space is infinite.
3. It rests on assumptions that ignore scenarios which are extremely unlikely but theoretically possible, such as a coin landing on its edge.
examination may be placed at 80 per cent by one person, while another might estimate the chances to be 95 per cent. Accordingly, the two would assign probabilities of 0.80 and 0.95, respectively, to the event.
It may be mentioned that the three approaches to defining probability are not competing but complementary in nature.
Solution: There are 4 distinct letters in the given word, so n = 4. All 4 letters are to be used in each arrangement, so r = 4.
Therefore, ${}^{n}P_{r} = {}^{4}P_{4} = \frac{4!}{(4-4)!} = \frac{4!}{0!} = 4!$ (since 0! is always 1).
The letters of the word GOAL can therefore be arranged in 4! = 24 ways.
Example 7: How many words can be formed using the letters of the
word GOAL starting with G and ending with L?
Solution: If we fix G at the beginning and L at the end, we are left with two distinct letters (O and A) to fill the two empty places, as shown below:
G _ _ L
Therefore, the remaining 2 letters can be arranged in ${}^{2}P_{2} = 2! = 2$ ways.
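A short Python sketch that lists the arrangements of GOAL explicitly and compares the counts with ${}^{n}P_{r}$ from the standard-library `math.perm` (Python 3.8+). It covers both GOAL examples above; the variable names are illustrative.

```python
from itertools import permutations
from math import perm

word = "GOAL"
arrangements = ["".join(p) for p in permutations(word)]  # every ordering of all 4 letters

print(len(arrangements), perm(4, 4))   # 24 24

# Keep only the arrangements that start with G and end with L
g_to_l = [w for w in arrangements if w.startswith("G") and w.endswith("L")]
print(g_to_l, perm(2, 2))              # ['GOAL', 'GAOL'] 2
```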
Example 8: How many words can be formed using the letters of the
word MATHEMATICS?
Solution: Here we need to adjust our solution because not all the letters of the given word are distinct: the letters M, A and T each appear twice.
Hence, the required number of arrangements is obtained by dividing ${}^{11}P_{11} = 11!$ by the factorial of the number of times each letter is repeated:
Required number of arrangements $= \frac{{}^{11}P_{11}}{2! \times 2! \times 2!} = \frac{11!}{2! \times 2! \times 2!} = 4989600$
Here the first 2! is used for M being repeated twice, the second 2! for A being repeated twice and the third 2! for T being repeated twice.
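The division-by-factorials rule can be sketched in Python as follows. `distinct_arrangements` is a hypothetical helper that counts how often each letter occurs with `collections.Counter`.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """n! divided by k! for every letter that occurs k times in the word."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)   # each partial quotient is still an integer
    return result

print(Counter("MATHEMATICS"))                 # letter counts: M, A and T appear twice
print(distinct_arrangements("MATHEMATICS"))   # 4989600
```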
Combinations: A combination is a selection of elements from a set in which (unlike permutations) the order of selection does not matter. Every combination of r elements corresponds to r! permutations of those same elements, so the number of permutations is always greater than or equal to the number of combinations when selecting r elements out of n elements. It is feasible to count the combinations directly in small cases, but the number of possible combinations grows rapidly with the size of the set. As a result, a formula has been developed to determine the number of ways in which things can be selected.
Thus, a combination is created by selecting r items from a group of n items without replacement and without regard to their order:
${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$
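As a check on the formula, this Python sketch compares a direct implementation of ${}^{n}C_{r}$ with the standard-library `math.comb` and with brute-force enumeration via `itertools.combinations`; `n_c_r` is an illustrative name.

```python
from itertools import combinations
from math import comb, factorial

def n_c_r(n, r):
    """nCr = n! / (r! (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Cross-check the formula against direct enumeration and math.comb for n = 10
items = range(10)
for r in range(11):
    enumerated = sum(1 for _ in combinations(items, r))
    assert n_c_r(10, r) == comb(10, r) == enumerated

print(n_c_r(10, 3))  # 120
```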
(1) If a job can be done in m ways and another job can be done in n
ways, then the total number of ways in which both of them can be done
is m × n. This is the fundamental multiplication rule.
Example 11 : A man can go from city A to city B by three routes and come back by any of four routes. In how many ways can he make the round trip?
Solution : He can make the journey in a total of 3 × 4 = 12 different ways.
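The multiplication rule can be seen by listing every (outward, return) pair. The route labels below are hypothetical; only the counts 3 and 4 come from Example 11.

```python
from itertools import product

# Hypothetical route labels; only the counts (3 out, 4 back) come from the example
routes_out = ["A1", "A2", "A3"]
routes_back = ["B1", "B2", "B3", "B4"]

journeys = list(product(routes_out, routes_back))  # every (out, back) pairing
print(len(journeys))  # 12 = 3 x 4
```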
Example 12 : Three balanced dice are tossed. Find the chance that the sum of the numbers on the three dice is equal to 10.
Solution : Total number of ways in which three dice can fall = 6 × 6 × 6 = 216. Total number of ways in which a total of 10 can appear = 27 (as shown below):
(1, 3, 6), (1, 4, 5), (1, 5, 4), (1, 6, 3), (2, 2, 6), (2, 3, 5), (2, 4, 4),
(2, 5, 3), (2, 6, 2), (3, 1, 6), (3, 2, 5), (3, 3, 4), (3, 4, 3), (3, 5, 2),
(3, 6, 1), (4, 1, 5), (4, 2, 4), (4, 3, 3), (4, 4, 2), (4, 5, 1), (5, 1, 4),
(5, 2, 3), (5, 3, 2), (5, 4, 1), (6, 1, 3), (6, 2, 2), (6, 3, 1)
Accordingly, $P(\text{total of } 10) = \frac{27}{216} = \frac{1}{8}$
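A brute-force check of Example 12 in Python, enumerating all 216 equally likely outcomes of three fair dice:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=3))   # 6 x 6 x 6 = 216 ordered outcomes
favourable = [o for o in outcomes if sum(o) == 10]

print(len(outcomes), len(favourable))             # 216 27
print(Fraction(len(favourable), len(outcomes)))   # 1/8
```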
${}^{n}P_{r} = \frac{n!}{(n-r)!}$, so ${}^{10}P_{4} = \frac{10!}{(10-4)!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{6!} = 5040$
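The permutation formula can be sketched as a one-line function and compared with `math.perm` (Python 3.8+); `n_p_r` is an illustrative name.

```python
from math import factorial, perm

def n_p_r(n, r):
    """nPr = n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

print(n_p_r(10, 4), perm(10, 4), 10 * 9 * 8 * 7)  # 5040 5040 5040
```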
(4) If, out of n objects, k1 are alike, k2 are alike, k3 are alike, and so on, such that k1 + k2 + k3 + ... = n, the number of arrangements of the n objects would be equal to:
${}^{n}P_{k_1, k_2, k_3, \ldots} = \frac{n!}{k_1!\,k_2!\,k_3!\cdots}$
Example 15 : In how many ways can the letters in the word STATISTICS be arranged?
Solution : Here n = 10, k1(S) = 3, k2(T) = 3, k3(I) = 2, k4(A) = 1 and k5(C) = 1.
Accordingly,
${}^{10}P_{3,3,2,1,1} = \frac{10!}{3!\,3!\,2!\,1!\,1!} = 50400$.
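Rule (4) translates directly into code. In this sketch, `arrangements_with_repeats` is a hypothetical helper that takes the group sizes k1, k2, ... explicitly; for STATISTICS the sizes are obtained with `collections.Counter`.

```python
from collections import Counter
from math import factorial

def arrangements_with_repeats(n, group_sizes):
    """n! / (k1! k2! k3! ...), where the k's are the sizes of groups of alike objects."""
    if sum(group_sizes) != n:
        raise ValueError("group sizes must add up to n")
    denominator = 1
    for k in group_sizes:
        denominator *= factorial(k)
    return factorial(n) // denominator

# Group sizes for STATISTICS: S = 3, T = 3, I = 2, A = 1, C = 1
sizes = list(Counter("STATISTICS").values())
print(sorted(sizes, reverse=True))            # [3, 3, 2, 1, 1]
print(arrangements_with_repeats(10, sizes))   # 50400
```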
${}^{n}C_{r} = \frac{n!}{(n-r)!\,r!}$
Example 16 : In how many ways can a committee of 3 persons be chosen
out of a total of 10 persons?
Solution : Here n = 10 and r = 3. The total number of committees would be:
${}^{10}C_{3} = \frac{10!}{(10-3)!\,3!} = \frac{10 \times 9 \times 8 \times 7!}{7! \times 3 \times 2} = 120$
Example 17 : A committee of four is to be selected randomly out of a
total of 10 executives, 3 of which are chartered accountants. Find the
probability that the committee would include exactly 2 CAs.
Solution : The committee of 4 executives can be selected out of a total of 10 executives in ${}^{10}C_{4} = 210$ ways. The number of ways in which 2 CAs can be selected out of 3 is ${}^{3}C_{2} = 3$, while the number of ways in which the remaining 2 members can be selected out of the 7 other executives is ${}^{7}C_{2} = 21$.
$\therefore P(\text{committee includes exactly 2 CAs}) = \frac{{}^{3}C_{2} \times {}^{7}C_{2}}{{}^{10}C_{4}} = \frac{63}{210} = \frac{3}{10}$
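Example 17 can be verified by listing all possible committees. The executive labels are hypothetical; only the numbers (3 CAs among 10 executives, committees of 4) come from the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels: 3 chartered accountants and 7 other executives
executives = ["CA1", "CA2", "CA3"] + [f"E{i}" for i in range(1, 8)]

committees = list(combinations(executives, 4))   # 10C4 = 210 committees
exactly_two_cas = [c for c in committees
                   if sum(name.startswith("CA") for name in c) == 2]

print(len(committees), len(exactly_two_cas))             # 210 63
print(Fraction(len(exactly_two_cas), len(committees)))   # 3/10
```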
Example 18 : Two cards are drawn at random from a well-shuffled deck
of cards. Find the probability that both are ace cards.
Solution : No. of ways in which 2 cards can be selected out of 52 cards
$= {}^{52}C_{2} = \frac{52 \times 51 \times 50!}{50! \times 2!} = 1326$
No. of ways in which 2 aces can be selected out of 4 ace cards
$= {}^{4}C_{2} = \frac{4!}{2!\,2!} = 6$
$\therefore P(\text{2 ace cards}) = \frac{6}{1326} = \frac{1}{221}$
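A sketch that computes Example 18 with the formula and cross-checks it by listing all 1326 unordered pairs of cards; the (rank, suit) deck representation is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

# Formula used in the example: 4C2 / 52C2
p_formula = Fraction(comb(4, 2), comb(52, 2))

# Cross-check by listing every unordered pair of cards from a 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, ["S", "H", "D", "C"]))   # (rank, suit) pairs
pairs = list(combinations(deck, 2))                  # 1326 unordered pairs
ace_pairs = [p for p in pairs if p[0][0] == "A" and p[1][0] == "A"]
p_enumerated = Fraction(len(ace_pairs), len(pairs))

print(len(pairs), len(ace_pairs), p_formula, p_enumerated)   # 1326 6 1/221 1/221
```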
$P(\text{not hitting the target}) = 1 - \frac{3}{5} = \frac{2}{5}$
Solution : (i) Let A be the event that the first ball drawn is white and B be the event that the second ball drawn is black. From the given information, $P(A) = \frac{3}{8}$, since there are three white balls in a total of eight balls. (ii) To determine the conditional probability of B given A, P(B/A), which is the probability of drawing a black ball on the second draw after drawing a white ball on the first draw, note that if A has already occurred, there are seven balls remaining and five of them are black. Thus, $P(B/A) = \frac{5}{7}$.
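The full question for this solution is not reproduced here, so the sketch below only assumes the urn implied by the solution (3 white and 5 black balls, two balls drawn without replacement) and obtains P(A) and P(B/A) by counting ordered draws. The ball labels are an assumption of the sketch.

```python
from fractions import Fraction
from itertools import permutations

# Urn implied by the solution: 3 white (W) and 5 black (B) balls, numbered so each is distinct
urn = [("W", i) for i in range(3)] + [("B", i) for i in range(5)]

draws = list(permutations(urn, 2))   # ordered draws without replacement: 8 x 7 = 56
first_white = [d for d in draws if d[0][0] == "W"]
white_then_black = [d for d in first_white if d[1][0] == "B"]

p_a = Fraction(len(first_white), len(draws))                     # P(A)
p_b_given_a = Fraction(len(white_then_black), len(first_white))  # P(B/A)
print(p_a, p_b_given_a, p_a * p_b_given_a)                       # 3/8 5/7 15/56
```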
$\therefore P(A \cap B) = P(A) \times P(B/A) = \frac{8}{15} \times \frac{7}{14} = \frac{4}{15}$
Rule 6 : If the events A and B are independent, the probability that both events occur can be determined from P(A) and P(B). As mentioned earlier, two events are independent if the occurrence of one has no effect upon the occurrence of the other. More formally, if A and B are independent,
P(A/B) = P(A), and P(B/A) = P(B).
That is, if A and B are independent, the conditional probability of A given B is the same as P(A), since the occurrence of event B does not affect the occurrence of event A.
The joint probability of independent events A and B is the product of their individual probabilities, since:
$P(A/B) = \frac{P(A \cap B)}{P(B)} = P(A)$, and hence $P(A \cap B) = P(A) \times P(B)$
To generalize, for independent events A, B, C, ... we have
$P(A \cap B \cap C \cap \ldots) = P(A) \times P(B) \times P(C) \times \ldots$
Example 25 : Two balls are selected one after the other from an urn
containing 7 black and 8 green balls. The first ball is replaced before
the second one is drawn. Find the probability that both would be green.
Solution : Let A and B be the events that the first and the second ball,
respectively, would be green.
From the given information,
$P(A) = \frac{8}{15}$ and $P(B) = \frac{8}{15}$
Accordingly,
$P(A \cap B) = P(A) \times P(B) = \frac{8}{15} \times \frac{8}{15} = \frac{64}{225}$
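Sampling with replacement in Example 25 can be checked by enumerating all ordered pairs, allowing the same ball to be drawn twice; the labelling of the balls is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import product

# Urn from Example 25: 7 black (B) and 8 green (G) balls, numbered to make them distinct
urn = [("B", i) for i in range(7)] + [("G", i) for i in range(8)]

# With replacement: every ordered pair, including the same ball twice
draws = list(product(urn, repeat=2))   # 15 x 15 = 225
both_green = [d for d in draws if d[0][0] == "G" and d[1][0] == "G"]
p_both_green = Fraction(len(both_green), len(draws))

print(p_both_green)   # 64/225
```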
It is significant to note that this relation works in both directions: if the condition $P(A \cap B) = P(A) \times P(B)$ is satisfied, then the events A and B are independent, just as, when they are independent, this relation is satisfied. This condition can therefore be employed to determine whether the given events A and B are independent.
Example 26 : For the data given in example, test whether the events of
an employee selected being male and an employee selected being clerk
are independent.
Solution : Let A be the event that an employee selected is male and B
be the event that an employee selected would be a clerk. From the given
information,
the number of employees who are males = 700
the number of employees who are clerks = 400
the number of employees who are males and clerks = 300
Accordingly,
$P(A) = \frac{700}{1000}$, $P(B) = \frac{400}{1000}$, and $P(A \cap B) = \frac{300}{1000}$
Here, since $\frac{700}{1000} \times \frac{400}{1000} = 0.28 \neq \frac{300}{1000}$, the events A and B are not independent.
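The independence test of Example 26 in Python. Exact fractions are used so that the equality check is not affected by floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

total = 1000
males, clerks, male_clerks = 700, 400, 300   # counts from the example

p_a = Fraction(males, total)          # P(male)
p_b = Fraction(clerks, total)         # P(clerk)
p_ab = Fraction(male_clerks, total)   # P(male and clerk)

independent = (p_a * p_b == p_ab)
print(p_a * p_b, p_ab, independent)   # 7/25 3/10 False
```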
Now we shall discuss the theorem of total probability, also called the theorem of elimination.
Rule 7 : If H1, H2, ..., Hn are n mutually exclusive events, each with a non-zero probability, and E is an event defined on the same sample space that can occur only together with one of them, the total probability of event E is given by:
P(E) = P(H1) × P(E/H1) + P(H2) × P(E/H2) + ... + P(Hn) × P(E/Hn)
Alternatively,
$P(E) = \sum_{i=1}^{n} \left[ P(H_i) \times P(E/H_i) \right]$
wins, the probability that a product will be introduced is 0.80, while if
the second set wins, the probability for the product to be introduced is
0.30. Determine the probability that the product will be introduced.
Solution : If H1 is the event that the first set wins,
H2 is the event that the second set wins, and
E is the event that the product is introduced, then
P(H1) = 0.60, P(E/H1) = 0.80, P(H2) = 0.40, P(E/H2) = 0.30
Accordingly,
P(E) = P(H1) × P(E/H1) + P(H2) × P(E/H2)
= 0.6 × 0.8 + 0.4 × 0.3 = 0.48 + 0.12 = 0.60
Thus, there is a sixty per cent chance that the product shall be introduced.
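Rule 7 applied to the example above, as a minimal sketch. `total_probability` is an illustrative helper that assumes, as the rule states, that the events H_i are mutually exclusive.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(E) = sum of P(H_i) * P(E/H_i) over mutually exclusive events H_i."""
    return sum(prior * lik for prior, lik in zip(priors, likelihoods))

priors = [Fraction(60, 100), Fraction(40, 100)]        # P(H1), P(H2)
likelihoods = [Fraction(80, 100), Fraction(30, 100)]   # P(E/H1), P(E/H2)
print(total_probability(priors, likelihoods))          # 3/5, i.e. 0.60
```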
Figure 4: Bayes’ Theorem (diagram of events H1, H2, H3 and H4)
We shall illustrate the concept with an example and then make a
generalization.
Example 28 : Box 1 contains 5 white balls and 3 red balls. Box 2 contains 4 white balls and 4 red balls. A box is selected at random and one ball is randomly taken from that box. If the ball is white, what is the probability that it came from box 1? From box 2?
Solution : Let H1 : the box 1 is selected,
H2 : the box 2 is selected, and
E : the ball is white.
From the given information, $P(H_1) = \frac{1}{2}$, $P(H_2) = \frac{1}{2}$, $P(E/H_1) = \frac{5}{8}$, and $P(E/H_2) = \frac{4}{8}$.
Here we wish to calculate P(H1/E) and P(H2/E).
From the theorem of conditional probability,
$P(H_1/E) = \frac{P(H_1 \cap E)}{P(E)}$ and $P(H_2/E) = \frac{P(H_2 \cap E)}{P(E)}$
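$P(H_1 \cap E)$ is the probability of first selecting box 1 and then selecting one white ball from it:
$P(H_1 \cap E) = P(H_1) \times P(E/H_1) = \frac{1}{2} \times \frac{5}{8} = \frac{5}{16}$
Similarly, $P(H_2 \cap E)$ is the probability of first selecting box 2 and then selecting one white ball from it:
$P(H_2 \cap E) = P(H_2) \times P(E/H_2) = \frac{1}{2} \times \frac{4}{8} = \frac{4}{16}$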
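Since the ball selected can be from box 1 or box 2, we have
$P(E) = P(H_1 \cap E) + P(H_2 \cap E)$
$= P(H_1) \times P(E/H_1) + P(H_2) \times P(E/H_2)$
$= \left(\frac{1}{2} \times \frac{5}{8}\right) + \left(\frac{1}{2} \times \frac{4}{8}\right) = \frac{5}{16} + \frac{4}{16} = \frac{9}{16}$.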
Accordingly,
$P(H_1/E) = \frac{P(H_1 \cap E)}{P(E)} = \frac{P(H_1 \cap E)}{P(H_1 \cap E) + P(H_2 \cap E)} = \frac{5/16}{9/16} = \frac{5}{9}$
Also, $P(H_2/E) = \frac{P(H_2 \cap E)}{P(E)} = \frac{4/16}{9/16} = \frac{4}{9}$
Notice that either box 1 or box 2 must have been selected. When nothing is known about the colour of the ball, the probability that box 1 is selected is 1/2, and so is the probability that box 2 is selected. Thus, P(H1) = 1/2 and P(H2) = 1/2 are the prior probabilities. Having learned later that the ball selected is white, we revise these probabilities to P(H1/E) = 5/9 and P(H2/E) = 4/9. These revised probabilities are known as posterior probabilities. Thus the prior probabilities are transformed into posterior probabilities by incorporating the additional information, with the help of conditional and joint probabilities. The information in the above example can be restated as follows:
Event    Prior Prob.    Conditional Prob.    Joint Prob.    Posterior Prob.
(Hi)     P(Hi)          P(E/Hi)              P(Hi ∩ E)      P(Hi/E)
H1       1/2            5/8                  5/16           5/9
H2       1/2            4/8                  4/16           4/9
Total                                        P(E) = 9/16
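The table can be reproduced with a small Python sketch. `bayes_table` is a hypothetical helper that computes the joint probabilities, P(E) by the theorem of total probability, and the posterior probabilities; exact fractions avoid rounding.

```python
from fractions import Fraction

def bayes_table(priors, likelihoods):
    """Return the joint probabilities P(H_i ∩ E), P(E) and the posteriors P(H_i/E)."""
    joints = [prior * lik for prior, lik in zip(priors, likelihoods)]   # P(H_i) * P(E/H_i)
    p_e = sum(joints)                                                   # total probability
    posteriors = [joint / p_e for joint in joints]                      # Bayes' theorem
    return joints, p_e, posteriors

priors = [Fraction(1, 2), Fraction(1, 2)]        # box 1, box 2
likelihoods = [Fraction(5, 8), Fraction(4, 8)]   # P(white / box)
joints, p_e, posteriors = bayes_table(priors, likelihoods)

# joints are 5/16 and 4/16 (shown reduced as 1/4), P(E) = 9/16, posteriors 5/9 and 4/9
print(joints, p_e, posteriors)
```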
We can now formally state Bayes’ theorem as follows: If H1, H2, ..., Hn are mutually exclusive and collectively exhaustive events and E is an arbitrary event defined on this sample space such that P(E) > 0, then Bayes’ theorem states that:
$P(H_i/E) = \frac{P(H_i \cap E)}{P(E)}$, where $P(E) = \sum_{i=1}^{n} P(H_i \cap E)$
$P(H_1/E) = \frac{P(H_1 \cap E)}{P(E)} = \frac{0.120}{0.155} = 0.77$
$P(H_2/E) = \frac{P(H_2 \cap E)}{P(E)} = \frac{0.035}{0.155} = 0.23$
For a game to be fair, its expected net payoff must be equal to zero. We have:
Colour of Ball    Payoff           Probability    Expected Value
White             6                4/20           24/20
Green             2                8/20           16/20
Red               –x (suppose)     8/20           –8x/20
Total                                             0
$\therefore \frac{24}{20} + \frac{16}{20} - \frac{8x}{20} = 0$, or $\frac{40}{20} = \frac{8x}{20}$,
or x = 40/8 = Rs. 5.
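Solving the fair-game condition for x in Python with exact fractions; the payoffs and probabilities are those in the table, and the variable names are illustrative.

```python
from fractions import Fraction

# Probabilities and known payoffs from the table; the red-ball loss x is unknown
p_white, p_green, p_red = Fraction(4, 20), Fraction(8, 20), Fraction(8, 20)
pay_white, pay_green = 6, 2

# Fair game: 6 * P(white) + 2 * P(green) - x * P(red) = 0, solved for x
x = (pay_white * p_white + pay_green * p_green) / p_red
print(x)  # 5

# Check: the expected payoff with this x is zero
expected_value = pay_white * p_white + pay_green * p_green - x * p_red
print(expected_value)  # 0
```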
1.9 Summary
Probability is the likelihood that something will happen. When we calculate the probability of an event, we assign it a number between zero and one, depicting how likely it is to happen. There are three approaches to calculating the probability of an event: (i) the classical approach, where the probability of an event is the ratio of the number of favourable outcomes to the total number of possible outcomes; (ii) the relative frequency approach, where an estimate of the probability is given by the ratio of the number of favourable outcomes to the number of trials made; and (iii) the personalistic (subjective) approach, where the probability of an event is assigned by an individual depending on his or her degree of belief in the occurrence of the event.
There are several theorems of probability, which are used to calculate
probabilities in different situations.
Theorem of Complementary events: This is used to determine the probability
of an event happening by subtracting the probability of the event not
happening from 1.
Theorem of Addition: It deals with the probability of occurrence of
either of the events when they are mutually exclusive or when they are
overlapping. According to this, the probability that either of the events
will happen is equal to the sum of their individual probabilities less the
probability of their joint occurrence.
Theorem of Multiplication: This theorem deals with the calculation of the
probability when our interest is in the occurrence of the events jointly.
For independent events, it uses multiplication of individual probabilities
while for events which are not independent, it uses conditional probability.
A conditional probability is the likelihood that an event will happen, given that another event has already happened.
A probability tree provides a useful way of handling and analysing conditional probabilities occurring at multiple levels. It represents the given information through various branches on a set of chance nodes.
Bayes’ Theorem: This theorem provides a method of revising given
probabilities on the basis of additional information. This involves
transforming prior probabilities into posterior probabilities with the help
of conditional and joint probabilities.
Ans.
(i) T (ii) F (iii) T (iv) F (v) F (vi) T
(vii) T (viii) F (ix) F (x) T (xi) F (xii) F
(xiii) F (xiv) T (xv) T (xvi) T
Exercise 2 : Questions and Answers
(i) What is probability? Explain the calculation of probability under
the classical approach.
(ii) Which probability approach would you use to calculate the following
probabilities? Give reasons also:
(a) The next toss of a fair coin will land on heads.
(b) India will win the next match with England.
(c) The sum of the faces of two dice will be eight.
(d) The success of a new product launched in the market.
(iii) “Complementary events are mutually exclusive but mutually exclusive
events may not be complementary.” Discuss with examples.
(iv) Distinguish between mutually exclusive and overlapping events. How
is the theorem of addition applied in both these cases?
(v) Distinguish clearly between mutually exclusive and independent
events. Can two events be mutually exclusive and independent
simultaneously? Do you agree that on tossing a coin once, the
appearance of heads and appearance of tails represent independent
as well as mutually exclusive events?
(vi) In each of the following cases, examine whether events are mutually
exclusive, overlapping, complementary, independent or not-independent:
(a) On a single toss of a die, appearance of 5 or 6 or appearance of a number smaller than 4.
(b) A bank employee being an assistant manager or being a female.
(c) A claim adjuster in an insurance company being a male or
above 50 years of age.
(d) An employee being a clerk or a sportsman.
(e) A person in a hospital being a heart specialist or over 45
years of age or a lab technician.
(f) A two-shift factory employee working in morning shift or
evening shift.
(g) A teacher in a college working in the commerce department
or the chemistry department.
(xvii) Two unbiased dice are tossed. What is the probability that the total
of numbers on them would be a multiple of 3?
(xviii) A pack contains 30 tickets numbered consecutively from 1 to 30.
A ticket is chosen at random from this. Find the chance that the
number on it would be (i) a multiple of 6 or 7 and (ii) a multiple
of 3 or 5.
(xix) Five candidates A, B, C, D and E appear for an interview. Two
candidates D and E are eliminated in the first round of the interview.
A is twice as likely to be selected as B, and B is twice as likely to be selected as C, in the final interview. D bets that either A or B
will be selected and E bets that either B or C will be selected. Who
is likely to win the bet?
(xx) Given the following probability table of television viewing frequencies
(X) and the income levels (Y):
Viewing Income levels (Y) Total
frequency (X) High Middle Low
Regular 0.10 0.15 0.05 0.30
Occasional 0.10 0.20 0.10 0.40
Rarely 0.05 0.05 0.20 0.30
Total 0.25 0.40 0.35 1.00
(a) What is the probability that a person is a low income individual
and views TV regularly?
(b) If an individual is at low income level, what is the probability
that he/she views TV regularly?
(c) What is the probability that given an individual does not have
high income, he/she rarely watches TV?
(d) If an individual occasionally watches TV, what is the probability
that he/she is a high income earner or a middle income earner?
(e) Is viewing TV regularly independent of earning high income?
Explain.
(xxi) The probability that a contractor will not get a plumbing contract
is 2/3 and the probability that he will get an electric contract is
5/9. If the probability of getting at least one contract is 4/5, what is the probability that he will get both contracts?
(xxii) An unbiased die and a biased die are tossed together. Find the
probability that the sum of digits obtained on them is even, given
that on the biased die, it is thrice as likely to show an even number
as an odd one when tossed once.
(xxiii) A six-faced die is so biased that the digits 1, 3 or 5 on it are
thrice as likely as the digits 2, 4 or 6, when tossed once. Find the
probability that in two tosses of this die, the sum of digits would
be odd.
(xxiv) A husband and wife appear in an interview for two vacancies for
the same post. The probability of the husband’s selection is 1/7 and
that of wife’s selection is 1/5. What is the probability that:
(a) Both of them will be selected?
(b) Only one of them will be selected?
(c) None of them will be selected?
(xxv) (a) In rolling a pair of dice, what is the probability of rolling a
total of 21 on the first two rolls?
(b) Given that P(A) = 0.65, P(B) = 0.80, P(A/B) = P(A) and P(B/A)
= 0.85. Is this a consistent assignment of probabilities?
(xxvi) An MBA applies for one job in two firms X and Y. The probability
of his being selected in firm X is 0.7 and his being rejected in firm
Y is 0.5. The probability of at least one of his applications being
rejected is 0.6. What is the probability that he will be selected in
one or both of the firms?
(xxvii) During a survey of road safety, it was found that 60 per cent of
accidents occur at night, 52 per cent are alcohol related, and 37
per cent are alcohol related and occur at night:
(a) What is the probability that an accident was alcohol related
given that it occurred at night?
(b) What is the probability that an accident occurred at night
given that it was alcohol related?
(xxviii) An advertising executive is studying television-viewing habits
of married men and women during prime time hours. On the basis
of past viewing records, the executive has determined that during
prime time, husbands are watching television 60% of the time.
It has also been determined that when the husband is watching
television, 40% of the time the wife is also watching. When the
husband is not watching television, 30% of the time the wife is
watching television. Find the probability that:
(a) If the wife is watching television, the husband is also watching
television.
(b) The wife is watching television during prime time.
(xxix) In a small factory, machines A, B and C manufacture 35%, 25% and
40% respectively of the total output. Of their output, respectively,
0.5, 4 and 2 per cent are defective. One item is drawn and found
to be defective. What are the respective probabilities that it was
produced by machines A, B and C?
(xxx) Reliance Industries Limited is determining whether it should submit
a bid for an oil exploration contract. In the past, RIL's main competitor, ONGC, has submitted bids 66 per cent of the time. If ONGC
does not bid for oil exploration contract, the probability that RIL
will get the contract is 0.45. If ONGC does bid for oil exploration
contract, the probability that RIL will get the contract is 0.25:
(a) If Reliance Industries gets the contract, what is the probability
that ONGC did not bid?
(b) What is the probability that Reliance Industries will get the
contract?
Ans.
(x) 0.33, 0.22, 0.44
(xi) 498/2018, 932/2018, 808/2018, 169/2018, 507/2018
(xii) 0.3, 0.25, 0.4
(xiii) 3/32, 1/64
(xiv) 0.4545, 0, 0.0303, 0.0303
(xv) 0.306, 0.195, 0.4993
(xvi) 20/455, 13/455, 84/455
(xvii) 0.33
(xviii) 9/30, 14/30
(xix) D is likely to win
(xx) 0.0575, 0.01429, 0.333, 0.75, No
(xxi) 14/45
(xxii) 0.5
(xxiii) 0.375
(xxiv) 1/35, 10/35, 24/35
(xxv) 20/1296, Not consistent
(xxvi) 0.8
(xxvii) 0.617, 0.712
(xxviii) 0.67, 0.36
(xxix) 0.362, 0.406, 0.232
(xxx) 0.519, 0.318
LESSON 2
Probability Distributions
STRUCTURE
2.1 Learning Objectives
2.2 Probability Distribution
2.3 Binomial Distribution
2.4 Poisson Distribution
2.5 Normal Distribution
2.6 Summary
2.7 Self-Assessment Exercise
(iv) The mean and the variance of the binomial distribution are np and npq, respectively, so its standard deviation is $\sqrt{npq}$.
(v) The other constants of the distribution can be calculated:
$\mu_2 = npq$
$\mu_3 = npq(q - p)$
$\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$
We can calculate the values of β1 and β2 to measure the nature (skewness and kurtosis) of the distribution:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{n^2p^2q^2(q-p)^2}{n^3p^3q^3} = \frac{(q-p)^2}{npq}$
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3n^2p^2q^2 + npq(1-6pq)}{n^2p^2q^2} = 3 + \frac{1-6pq}{npq}$
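To check the moment formulas numerically, this sketch computes the central moments directly from the binomial probabilities ${}^{n}C_x p^x q^{n-x}$ and compares them with the closed forms. The values n = 10 and p = 0.3 are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from math import comb, isclose

def binomial_central_moments(n, p):
    """Mean and central moments mu2, mu3, mu4 computed directly from the binomial pmf."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    mu = {k: sum((x - mean) ** k * px for x, px in enumerate(pmf)) for k in (2, 3, 4)}
    return mean, mu

def close(a, b):
    return isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

n, p = 10, 0.3      # illustrative values, not taken from the text
q = 1 - p
mean, mu = binomial_central_moments(n, p)

# Compare the direct computation with the closed-form results
assert close(mean, n * p)
assert close(mu[2], n * p * q)
assert close(mu[3], n * p * q * (q - p))
assert close(mu[4], 3 * n**2 * p**2 * q**2 + n * p * q * (1 - 6 * p * q))

beta1 = mu[3] ** 2 / mu[2] ** 3
beta2 = mu[4] / mu[2] ** 2
assert close(beta1, (q - p) ** 2 / (n * p * q))
assert close(beta2, 3 + (1 - 6 * p * q) / (n * p * q))
print(round(beta1, 4), round(beta2, 4))
```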
Probability of getting more than 6 heads $= {}^{8}C_{7}\, q^{1} p^{7} + {}^{8}C_{8}\, q^{0} p^{8} = \frac{8}{256} + \frac{1}{256} = \frac{9}{256}$
Ans.
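The same calculation as a short Python sketch with exact fractions. It assumes, consistent with the denominator 256, a fair coin (p = q = 1/2) tossed 8 times; `binomial_pmf` is an illustrative helper.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p = Fraction(1, 2)  # assumed fair coin
more_than_six = sum(binomial_pmf(x, 8, p) for x in (7, 8))
print(more_than_six)  # 9/256
```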
The binomial distribution is used in studies where there are only two alternative outcomes, both of which can be counted. However, there are many situations in which the number of successes can be counted but it is impossible to count the number of failures. An example is the number of vehicles arriving at a toll gate during a day: the arrival of a vehicle at the toll gate is a success, which can be counted, whereas the non-arrival of a vehicle is a failure, which cannot be counted. In such scenarios, the binomial distribution cannot be used to calculate the probability of success. The Poisson distribution is appropriate for such situations, since it does not require knowing the total number of possible outcomes.
The Poisson distribution is a discrete probability distribution that models the number of events occurring within a fixed interval of time or space, given a known average rate of occurrence and assuming the events to be independent. This distribution is characterized by a single parameter, denoted λ, representing the average rate of occurrence of the event in a given interval.
It was introduced by the French mathematician Simeon Denis Poisson in 1837. The distribution is used to calculate the probability of a given number of independent events occurring in a fixed interval of time or space with a constant mean rate, such as the number of calls at a call center in an hour, the number of errors on a page, or the number of deaths in a district in a year. In other words, it is a distribution of counts: it describes how many times an event is likely to occur in a given period of time or region of space. The interval of time may be of any length, such as a minute, an hour, a month or a year, and the interval of space may be a piece of paper, a district, a country, etc.
Let X be the Poisson random variable, which may take any whole-number value (X = 0, 1, 2, …), representing the number of events occurring in a given interval of time or space. The probability that exactly x events occur in a given interval is
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
where λ is the parameter of the distribution, i.e. the mean number of occurrences of the event (or successes),
e = 2.71828 (the base of natural logarithms),
and x is the value of the Poisson random variable.
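A direct translation of the formula into code is sketched below; the function name poisson_pmf is an illustrative choice. It uses math.exp for e^(−λ) and math.factorial for x!, and shows that the probabilities over x = 0, 1, 2, … add up to 1.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0  # defined only for non-negative integers
    return exp(-lam) * lam**x / factorial(x)

lam = 2  # illustrative mean rate
probs = [poisson_pmf(x, lam) for x in range(30)]
print([round(p, 4) for p in probs[:5]])
print(round(sum(probs), 6))  # the terms beyond x = 29 are negligible
```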
Properties of Poisson Distribution
The Poisson distribution is useful for rare, independent events that have a large number of opportunities to occur. The following are the properties of the Poisson distribution:
1. The events are assumed to be independent of each other.
2. The Poisson distribution is defined only for non-negative integer values of the random variable.
3. The Poisson distribution is characterized by only one parameter, λ.
4. The average number of successes (or occurrences of the event), λ, remains constant for intervals of equal length.
5. Only one event can occur at a given instant (or point in space); two events cannot happen at exactly the same time.
6. The mean and the variance of the Poisson distribution are equal, both being λ.
7. The Poisson distribution has the property of memorylessness, which means the probability of an event occurring in the future is not affected by events in the past.
8. The Poisson distribution is positively skewed.
9. The Poisson distribution can approximate a binomial distribution if n is large and p is small.
10. The Poisson distribution is approximated by the normal distribution when λ is large.
11. The distribution is commonly used to model real-world scenarios involving rare events, such as the number of accidents at an intersection, the number of bacteria in a given culture, or the number of customers at the checkout counter of a departmental store.
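Properties 6 and 9 can be illustrated numerically. The sketch below, with n = 1000 and p = 0.002 chosen only for illustration, compares binomial probabilities with Poisson probabilities using λ = np, and then computes the mean and variance of the Poisson distribution by summing over its probabilities.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 1000, 0.002          # large n, small p (illustrative values)
lam = n * p                 # Poisson parameter λ = np

# Property 9: binomial and Poisson probabilities side by side
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))

# Property 6: mean and variance of the Poisson distribution are both λ
xs = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(round(mean, 6), round(var, 6))
```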
Example 3: The average number of customers arriving at the checkout
counter of a store is 3 customers per hour. Find the probability that
during a given hour:
(i) No customer appears
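For part (i), the Poisson formula with λ = 3 customers per hour gives P(X = 0) = e^(−3)·3⁰/0! = e^(−3). A minimal sketch of the arithmetic:

```python
from math import exp, factorial

lam = 3  # average number of customers per hour (from the example)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

p_none = poisson_pmf(0, lam)   # probability that no customer appears
print(round(p_none, 4))        # 0.0498
```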
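$$e^{-\lambda}\left(1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots + \frac{\lambda^x}{x!} + \cdots\right)$$
where e is a constant whose value is 2.71828 and λ is the parameter of the distribution, i.e. the average number of occurrences of an event.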
A classical example of the Poisson distribution is given by road accidents. The number of people travelling on the road is very large, i.e. n is large, while the probability that any specific individual meets with an accident is very small. However, np, the average number of road accidents on any particular day, is a finite constant. Therefore, x (the number of road accidents on a particular day) follows the Poisson distribution.
The various parameters of the Poisson distribution are:
Mean = λ = np
Variance = np = λ
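Standard deviation σ = √(np) = √λ
μ2 = np = λ
μ3 = λ
μ4 = λ + 3λ²
$$\therefore\ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{\lambda^2}{\lambda^3} = \frac{1}{\lambda}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\lambda + 3\lambda^2}{\lambda^2} = 3 + \frac{1}{\lambda}$$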
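These moment formulas can be verified by summing over the Poisson probabilities. The sketch below uses λ = 4 as an illustrative value and truncates the infinite sum at 60 terms, which is more than enough for that λ.

```python
from math import exp, factorial

def poisson_central_moments(lam, terms=60):
    """Return (mu2, mu3, mu4) of a Poisson distribution by direct summation."""
    probs = [exp(-lam) * lam**x / factorial(x) for x in range(terms)]
    mean = sum(x * p for x, p in enumerate(probs))
    return tuple(sum((x - mean) ** r * p for x, p in enumerate(probs)) for r in (2, 3, 4))

lam = 4.0  # illustrative value
mu2, mu3, mu4 = poisson_central_moments(lam)
beta1 = mu3**2 / mu2**3
beta2 = mu4 / mu2**2
print(round(mu2, 6), round(mu3, 6), round(mu4, 6))   # expected: λ, λ, λ + 3λ²
print(round(beta1, 6), round(beta2, 6))              # expected: 1/λ, 3 + 1/λ
```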
Example 4: One house in 1000 in a district has a fire in a year. What is the probability that exactly 5 houses will have a fire during the year if there are 2000 houses?
Solution: We shall apply the Poisson distribution with λ = np, where n = 2000 and p = 1/1000.
$$\therefore\ \lambda = np = 2000 \times \frac{1}{1000} = 2$$
$$P(x) = \frac{e^{-\lambda}\lambda^x}{x!}$$
$$P(5) = \frac{e^{-2}\,2^5}{5!} = (2.7183)^{-2} \times \frac{2\times2\times2\times2\times2}{5\times4\times3\times2\times1} = (2.7183)^{-2} \times \frac{4}{15}$$
Here (2.7183)^(–2) = Reciprocal of antilog (2 log 2.7183) = Reciprocal of 7.389 = 0.1353, so
$$P(5) = 0.1353 \times \frac{4}{15} = 0.036 \text{ Ans.}$$
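The calculation can be reproduced as below; the sketch also computes the exact binomial probability for n = 2000 and p = 1/1000, to show how close the Poisson approximation is in this case.

```python
from math import comb, exp, factorial

n, p = 2000, 1 / 1000   # houses and probability of fire (from the example)
lam = n * p             # λ = np = 2

poisson_p5 = exp(-lam) * lam**5 / factorial(5)
binomial_p5 = comb(n, 5) * p**5 * (1 - p)**(n - 5)   # exact binomial, for comparison

print(round(poisson_p5, 4))    # 0.0361
print(round(binomial_p5, 4))
```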
Example 5: If 3% of the bulbs manufactured are defective, use the Poisson distribution to calculate the probability that a sample of 100 bulbs will contain (i) no defective bulb and (ii) exactly one defective bulb.
Solution: The proportion of defective bulbs is p = 3% = 3/100.
$$\therefore\ \lambda = np = 100 \times \frac{3}{100} = 3$$
Probability of no defective bulb in a sample of 100 is $P(0) = \frac{e^{-3}\,3^0}{0!} = e^{-3} = 0.0498$.
Hence Quartile Deviation = $\frac{Q_3 - Q_1}{2} = 0.6745\sigma$
6. Mean deviation about the mean is approximately $\frac{4}{5}\sigma$; more precisely, MD = 0.7979σ.
7. The points of inflexion of the normal curve occur at x = μ + σ and x = μ – σ.
8. The tails of the curve extend to infinity on both sides of the mean. The maximum ordinate, at X = μ, is given by $y = \frac{1}{\sigma\sqrt{2\pi}}$.
9. Almost all of the area under the curve (99.73%) lies within μ ± 3σ.
Distance from the mean ordinate (in terms of ± σ)    % of total area under the normal curve
Mean ± 1σ                                            68.27
Mean ± 2σ                                            95.45
Mean ± 3σ                                            99.73
10. All odd central moments are equal to zero: μ1 = μ3 = 0.
Hence β1 = 0 and β2 = 3; thus the curve is mesokurtic.
11. The normal distribution is the distribution of a continuous variable.
12. The fourth central moment is equal to 3σ⁴ for a normal distribution.
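Several of the constants listed above can be checked with Python's standard library, whose statistics.NormalDist class (Python 3.8 and later) provides cdf, inv_cdf and pdf. The sketch uses the standard normal curve (μ = 0, σ = 1), so each printed value is a multiple of σ.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal curve

# Property 9: percentage of area within mean ± kσ
for k in (1, 2, 3):
    print(f"Mean ± {k}σ: {100 * (z.cdf(k) - z.cdf(-k)):.2f}%")

# Quartile deviation (Q3 − Q1)/2, in units of σ
qd = (z.inv_cdf(0.75) - z.inv_cdf(0.25)) / 2
# Mean deviation about the mean, in units of σ, equals sqrt(2/π)
md = sqrt(2 / pi)
# Maximum ordinate 1/(σ√(2π)) at x = μ
y_max = z.pdf(0)
print(round(qd, 4), round(md, 4), round(y_max, 4))
```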
The equation of the normal curve gives the ordinate of the curve corresponding to any given value of x. But we are usually interested in finding the area under the normal curve rather than its ordinate (y). A normal curve with mean 0 and standard deviation 1 is known as the standard normal curve. Statistical tables give the areas and ordinates of the normal curve corresponding to the standard normal variate
$$z = \frac{x - \mu}{\sigma}$$
and not corresponding to x.
Let us see the normal curve area on the x-scale and the z-scale.
Fig. 1
(i) The area under the curve from Z = 0 (when X = μ) to a particular value of Z gives the proportion of the total area under the curve that lies in this part of the curve. Thus, the area from Z = 0 to Z = 1.42 is 0.4222. Naturally, this is taken as the probability that the variable in question will assume a value within these limits.
(ii) Since the normal curve is symmetrical about the mean, the area between μ (Z = 0) and a particular value of Z to its right is the same as the area between μ and the same value of Z to its left. Thus, the area between Z = 0 and Z = 1.5 is equal to the area between Z = 0 and Z = –1.5. Remember that for values of X greater than μ the Z value is positive, while for X < μ the value of Z is negative.
(iii) The general procedure for calculating probabilities is as follows:
(a) Specify clearly the area under the curve that is of interest.
(b) Determine the Z value(s).
(c) Obtain the required area(s) from the normal area table.
Example 6 : Find the area under the normal curve :
(i) between Z = 0 and Z = 1.20
(ii) between Z = 1.0 and Z = 2.43
(iii) to the right of Z = 1.37
(iv) between Z = –1.3 and Z = 1.49
(v) to the right of Z = –1.78
Solution : For each of these, the relevant portion of the normal curve is shown shaded, and the area is determined from the normal area table.
Figure 2
Figure 3
(ii) Area between Z = 0 and Z = 1.0 is 0.3413
Area between Z = 0 and Z = 2.43 is 0.4925
∴ Area between Z = 1.0 and Z = 2.43 is 0.4925 – 0.3413 = 0.1512.
Figure 4
(iii) Area between Z = 0 and Z = 1.37 is 0.4147.
The total area under the curve being equal to 1, the area to the right of Z = 0 is 0.5, as is the area to its left.
∴ Area beyond Z = 1.37 is 0.5000 – 0.4147 = 0.0853.
Figure 5
(iv) Area between Z = 0 and Z = –1.3 is 0.4032 (by symmetry, the same as the area between Z = 0 and Z = 1.3).
Area between Z = 0 and Z = 1.49 is 0.4319.
∴ Area between Z = –1.3 and Z = 1.49 is 0.4032 + 0.4319 = 0.8351.
Figure 6
(v) Area between Z = 0 and Z = –1.78 is 0.4625.
Area to the right of Z = 0 is 0.5.
∴ Area to the right of Z = –1.78 is 0.4625 + 0.5 = 0.9625.
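The five areas of Example 6 can be computed from the cumulative distribution function Φ(z), which gives the area from −∞ to z. This sketch uses statistics.NormalDist from Python's standard library; its results may differ from the four-figure table in the last digit, because the table rounds each area before the areas are added or subtracted.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal: area from −∞ to z

areas = {
    "(i) 0 to 1.20": Phi(1.20) - Phi(0),
    "(ii) 1.0 to 2.43": Phi(2.43) - Phi(1.0),
    "(iii) right of 1.37": 1 - Phi(1.37),
    "(iv) -1.3 to 1.49": Phi(1.49) - Phi(-1.3),
    "(v) right of -1.78": 1 - Phi(-1.78),
}
for label, area in areas.items():
    print(label, round(area, 4))
```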
Example 7 : Balls are tested by dropping them from a certain height and measuring the height of the bounce. A ball is said to be fast if it rises above 36 inches. The height of the bounce may be taken to be normally distributed with mean 33 inches and standard deviation 1.2 inches. If a ball is drawn at random, what is the chance that it would be fast?
Solution : The given information is depicted in figure 7. Here we have to calculate the probability that the height of the bounce, X, would be greater than 36. This is shown shaded in figure 7.
Figure 7
We have X = 36, μ = 33 and σ = 1.2, so
$$Z = \frac{X - \mu}{\sigma} = \frac{36 - 33}{1.2} = 2.5$$
From the normal area table, area between Z = 0 and Z = 2.5 is equal
to 0.4938. So area beyond Z = 2.5 is 0.5 – 0.4938 = 0.0062. Therefore,
P(X > 36) = 0.0062, the chance of getting a fast ball.
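The same probability can be obtained without the table, using a normal distribution object with the mean and standard deviation of the example (a sketch using Python's statistics.NormalDist):

```python
from statistics import NormalDist

bounce = NormalDist(mu=33, sigma=1.2)    # height of bounce (inches)
z = (36 - bounce.mean) / bounce.stdev    # standard normal variate for X = 36
p_fast = 1 - bounce.cdf(36)              # P(X > 36)
print(round(z, 2), round(p_fast, 4))     # 2.5 0.0062
```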
Example 8 : The life (x) of electric bulbs in hours is assumed to be normally distributed as
$$f(x) = \frac{1}{19\sqrt{2\pi}}\, e^{-\frac{(x-155)^2}{722}}$$
What is the probability that the life of a bulb will be :
(i) less than 117 hours (ii) more than 193 hours (iii) between 117 and 193 hours?
Solution : Given μ = 155 and σ = 19 (since 2σ² = 2 × 361 = 722).
Therefore, corresponding to x = 117 the standard normal variate is
$$z = \frac{117 - 155}{19} = -2$$
Figure 8
We have to obtain the area to the left of Z = –2, i.e. Pr(Z < –2).
From the table we find the area between z = 0 and z = –2 and subtract it from 0.5:
∴ 0.5 – 0.4772 = 0.0228
Hence the probability that the life of a bulb is less than 117 hours is 0.0228.
To obtain the probability that the life of the bulb is more than 193 hours, we obtain the corresponding standard normal variate
$$z = \frac{193 - 155}{19} = +2$$
By symmetry, the area to the right of Z = +2 is also 0.5 – 0.4772 = 0.0228, which is the probability that the life of a bulb exceeds 193 hours.
Figure 9
The probability that the life of a bulb lies between 117 hours and 193 hours is the area between Z = –2 and Z = +2:
Figure 10
Hence Pr (–2 < Z < +2) = Pr (117 < x < 193)
= 0.4772 + 0.4772 = 0.9544 Ans.
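All three parts of Example 8 can be computed directly from the distribution with μ = 155 and σ = 19. This is a sketch using Python's statistics.NormalDist; its value for part (iii) is 0.9545, slightly different from the table-based 0.9544 because of rounding in the table.

```python
from statistics import NormalDist

life = NormalDist(mu=155, sigma=19)   # life of a bulb in hours

p_less_117 = life.cdf(117)                     # (i)
p_more_193 = 1 - life.cdf(193)                 # (ii)
p_between = life.cdf(193) - life.cdf(117)      # (iii)
print(round(p_less_117, 4), round(p_more_193, 4), round(p_between, 4))
```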
Example 9 : The results of a particular examination are given below in
a summary form :
Result Percentage of Candidates
Total Passed 80
Passed with distinction 10
Failed 20
It is known that a candidate fails if he obtains less than 40 marks (out
of 100), while he must obtain at least 75 marks in order to pass with
distinction. Determine the mean and the standard deviation of marks
assuming distribution of marks to be normal.
Solution : According to the given information,
Percentage of students getting marks less than 40 = 20,
Percentage of students getting marks between 40 and 75 = 70, and
Percentage of students getting marks above 75 = 10.
The relevant area is shown in figure 11.
Figure 11
Here P(X < 40) = 0.20, P (40 < X < 75) = 0.70 and P(X > 75) = 0.10
Let μ and σ represent the mean and standard deviation of the distribution. The area between X = 40 and μ is 0.50 – 0.20 = 0.30, and the area between μ and X = 75 is 0.50 – 0.10 = 0.40.
Now we have,
For X = 40, $Z = \frac{40 - \mu}{\sigma}$, and
For X = 75, $Z = \frac{75 - \mu}{\sigma}$.
Corresponding to the area 0.30 in the normal area table, Z = 0.84. Thus, for X = 40, we have Z = – 0.84 (since the value 40 lies to the left of μ). Similarly, for the area equal to 0.40, we have Z = 1.28.
We then have
$$\frac{40 - \mu}{\sigma} = -0.84 \quad \text{....(i)}$$
$$\frac{75 - \mu}{\sigma} = 1.28 \quad \text{....(ii)}$$
Rearranging the above equations, we get
μ – 0.84σ = 40 ....(iii)
μ + 1.28σ = 75 ....(iv)
Subtracting equation (iii) from equation (iv), we get
2.12σ = 35
or σ = 35/2.12 = 16.51
Substituting the value of σ in equation (iii) and solving for μ, we get
μ – (0.84)(16.51) = 40
or μ = 40 + 13.87 = 53.87
Thus, Mean = 53.87 marks and standard deviation = 16.51 marks.
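The two linear equations (iii) and (iv) can be solved in code. The sketch below first uses the rounded table values 0.84 and 1.28 from the solution, then repeats the calculation with exact quantiles from statistics.NormalDist.inv_cdf (20% of marks below 40, 90% below 75); the helper name solve_mean_sd is an illustrative choice.

```python
from statistics import NormalDist

def solve_mean_sd(x1, z1, x2, z2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma for (mu, sigma)."""
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

# With the rounded table values used in the text
mu, sigma = solve_mean_sd(40, -0.84, 75, 1.28)
print(round(mu, 2), round(sigma, 2))          # 53.87 16.51

# With exact quantiles: P(X < 40) = 0.20 and P(X < 75) = 0.90
z40 = NormalDist().inv_cdf(0.20)
z75 = NormalDist().inv_cdf(0.90)
mu_exact, sigma_exact = solve_mean_sd(40, z40, 75, z75)
print(round(mu_exact, 2), round(sigma_exact, 2))
```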
Example 10 : There are 900 students in the B.Com (Hons.) course of a college, and the probability of a student needing a particular book on a day is 0.10. How many copies of the book should be kept in the library so that there is at least a 0.90 chance that a student needing the book will not be disappointed? Assume the normal approximation to the binomial distribution.
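One way to carry out this calculation is sketched below, assuming the normal approximation without a continuity correction: the number of students needing the book is taken as approximately normal with mean np and standard deviation √(npq), and the number of copies k must satisfy P(X ≤ k) ≥ 0.90, with k rounded up to a whole number.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, p = 900, 0.10
mean = n * p                    # np = 90
sd = sqrt(n * p * (1 - p))      # sqrt(npq) = 9

# Smallest k with P(X <= k) >= 0.90 under the normal approximation
k = mean + NormalDist().inv_cdf(0.90) * sd
copies = ceil(k)
print(round(k, 2), copies)
```

Using the table value Z = 1.28 instead of the exact quantile gives k = 90 + 1.28 × 9 = 101.52, which rounds up to the same whole number of copies.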
Table 2
Areas under the Standard Normal Curve
An entry in the table is the area under the curve between Z = 0 and a positive value of Z. Areas for negative values of Z are obtained by symmetry.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2703 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
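Entries of Table 2 can be regenerated from the error function, since the area between 0 and z equals ½·erf(z/√2). The sketch below uses only Python's math module and rebuilds the row for Z = 1.4; a regenerated value may differ in the fourth decimal where the exact area lies very close to a rounding boundary.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (Φ(z) − 0.5)."""
    return 0.5 * erf(z / sqrt(2))

# Rebuild one row of the table, e.g. Z = 1.4, columns 0.00 to 0.09
row = 1.4
print(row, " ".join(f"{area_0_to_z(row + c / 100):.4f}" for c in range(10)))
```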
LESSON 3
Statistical Decision Theory
STRUCTURE
3.1 Learning Objectives
3.2 Probability in Decision Making
3.3 Decision Making Process
3.4 Decision Under Uncertainty
3.5 Decision Under Risk
3.6 Expected Value of Perfect Information (EVPI)
3.7 Decision Tree
3.8 Summary
3.9 Self-Assessment Questions
might result from each alternative. Decision theory involves selecting an
alternative and having a reasonable idea of the economic consequences
of choosing that action.
Decision theory may be applied to problems whether the time span is one
day or five years, whether it involves financial management or a plant
assembly line. Most of these problems have common characteristics. The
elements common to most decision theory problems are: (i) An objective,
(ii) Several courses of action, (iii) A calculable measure of the benefit
or worth of various alternatives, (iv) Events beyond the control of the
decision maker, and (v) Uncertainty about which outcome or state of
nature will actually happen.
Most complex managerial decisions are made with some uncertainty.
Managers authorize substantial capital investments with incomplete knowledge
about product demand. When decisions are made under uncertain future
conditions, use of probabilities provides us with a rational technique for
making choices.
Example 1 : A bakery bakes cakes at a cost of Rs. 6 each and sells them for Rs. 10 each. A cake not sold on a particular day is worthless. The baker's problem is to determine the optimum number of cakes to be made each day. On days when his stock is more than his sales, his profits are reduced by the cost of the unsold cakes. On days when demand exceeds his stock, he loses sales and makes smaller profits than he could have. He has kept a record of his sales for the past 100 days to show the historical pattern of sales.
Daily sales of cakes No. of days sold
300 15
400 20
500 45
600 15
700 5
Total 100
Solution : On the basis of the above information, we can assign probabilities to the sale of cakes in different quantities. For example, a probability of 0.45 (= 45/100) is assigned to the sale figure of 500 cakes. A table assigning probabilities to the various quantities can be prepared in the same way.
earns nor loses, because by selling 300 units he makes a profit of 300 × 4 = Rs. 1200 but loses Rs. 1200, the cost of the 200 units that remain unsold, so the net result is zero profit. But when he produces 600 cakes and sells only 300, he incurs a loss of Rs. 600 [(300 × 4) – (300 × 6) = 1200 – 1800 = – 600].
not regret if it had gone for the full product line. Had it decided on the partial product line, it would regret somewhat, while a decision for the minimal product line would cause a greater regret. The regret with these two policies would be Rs. 10,000 (= 80,000 – 70,000) and Rs. 30,000 (= 80,000 – 50,000), respectively.
Table 1 : Decision-making Using Different Criteria
Anticipated Profit (in Rs.)
Event / Act                  Full Product Line   Partial Product Line   Minimal Product Line
Good Product Acceptance 80,000 70,000 50,000
Fair Product Acceptance 50,000 45,000 40,000
Poor Product Acceptance –25,000 –10,000 0
Maximum 80,000 70,000 50,000
Minimum –25,000 –10,000 0
Average 35,000 35,000 30,000
Table 2 : Conditional Regret Table
Event / Act                  Full Product Line   Partial Product Line   Minimal Product Line
Good Product Acceptance 0 10,000 30,000
Fair Product Acceptance 0 5,000 10,000
Poor Product Acceptance 25,000 10,000 0
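Table 1 and Table 2 can be reproduced, and several of the criteria named in this lesson applied to them, with a few lines of Python. This is a sketch: the dictionary layout and the act labels are illustrative choices, and the Hurwicz criterion is left out because it needs a coefficient of optimism that is not given for this table.

```python
# Pay-offs (Rs.) from Table 1: rows are events, columns are acts
acts = ["Full", "Partial", "Minimal"]
payoff = {
    "Good": [80_000, 70_000, 50_000],
    "Fair": [50_000, 45_000, 40_000],
    "Poor": [-25_000, -10_000, 0],
}

columns = list(zip(*payoff.values()))   # pay-offs of each act across the events

maximax = max(range(3), key=lambda j: max(columns[j]))   # best of the best pay-offs
maximin = max(range(3), key=lambda j: min(columns[j]))   # best of the worst pay-offs
laplace = [sum(col) / len(col) for col in columns]       # simple average (Laplace)

# Regret = best pay-off for the event minus the act's pay-off (Table 2)
regret = {e: [max(row) - v for v in row] for e, row in payoff.items()}
max_regret = [max(col) for col in zip(*regret.values())]
minimax_regret = min(range(3), key=lambda j: max_regret[j])

print("Maximax:", acts[maximax])
print("Maximin:", acts[maximin])
print("Laplace averages:", laplace)
print("Minimax regret:", acts[minimax_regret], max_regret)
```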
The elements of a decision process are:
A decision-maker.
A set of possible outcomes, or events, in the decision situation.
A set of courses of action available to the decision-maker.
A set of conditional pay-offs corresponding to various possible combinations
of events and actions.
Selection of a particular course of action based on some criterion.
After setting up the pay-off table and the regret table, we proceed to take a decision. There are several rules, or criteria, on the basis of which a decision may be taken. The selection of an appropriate criterion depends on factors such as the nature of the decision situation, the attitude of the decision-maker, etc. We shall first discuss the decision rules for taking decisions in conditions of uncertainty and then those for conditions of risk.
The Laplace principle is based on the simple rule that if we are uncertain about the various events, we may treat them as equally probable. The expected pay-off of each strategy is therefore calculated as the simple average of its pay-offs, and the strategy with the highest average is adopted. The expected pay-offs for the various courses of action are:
Full product line : (80,000 + 50,000 – 25,000)/3 = Rs. 35,000
Partial product line : (70,000 + 45,000 – 10,000)/3 = Rs. 35,000
Minimal product line : (50,000 + 40,000 + 0)/3 = Rs. 30,000
Since the highest expected pay-off (Rs. 35,000) is shared by the full product line and the partial product line, either of these could be adopted by the management.
3. Considering the best of the weighted averages of the best and worst pay-offs under every act.
4. Considering the best of the simple averages of all pay-offs under every act.
5. Considering the act with the least of the maximum regret values associated with all acts.
The calculation of the expected pay-off values (EP) for the various acts is shown below:
for A1, EP1 = 0.15 × 1200 + 0.20 × 1200 + 0.45 × 1200 + 0.15 × 1200
+ 0.05 × 1200 = Rs. 1200
for A2, EP2 = 0.15 × 600 + 0.20 × 1600 + 0.45 × 1600 + 0.15 × 1600
+ 0.05 × 1600 = Rs. 1450
for A3, EP3 = 0.15 × 0 + 0.20 × 1000 + 0.45 × 2000 + 0.15 × 2000 +
0.05 × 2000 = Rs. 1500
for A4, EP4 = 0.15 × –600 + 0.20 × 400 + 0.45 × 1400 + 0.15 × 2400
+ 0.05 × 2400 = Rs. 1100
for A5, EP5 = 0.15 × –1200 + 0.20 × –200 + 0.45 × 800 + 0.15 × 1800
+ 0.05 × 2800 = Rs. 550
Since the maximum expected pay-off is associated with strategy A3, the best course of action is to produce 500 cakes.
for A2, ER2 = 0.15 × 600 + 0.20 × 0 + 0.45 × 400 + 0.15 × 800 + 0.05 × 1200 = Rs. 450
for A3, ER3 = 0.15 × 1200 + 0.20 × 600 + 0.45 × 0 + 0.15 × 400 + 0.05
× 800 = Rs. 400
for A4, ER4 = 0.15 × 1800 + 0.20 × 1200 + 0.45 × 600 + 0.15 × 0 +
0.05 × 400 = Rs. 800
for A5, ER5 = 0.15 × 2400 + 0.20 × 1800 + 0.45 × 1200 + 0.15 × 600
+ 0.05 × 0 = Rs. 1350
Under this criterion, the optimal strategy is the one that minimizes the expected regret. Since the minimum value occurs at A3, it represents the optimal decision. This is the same decision as under the expected pay-off criterion.
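The expected pay-offs and expected regrets of Example 1 can be computed together from the cost, price and sales record. This sketch assumes, consistently with the figures above, that acts A1 to A5 mean baking 300, 400, 500, 600 and 700 cakes, and that unsold cakes are worthless.

```python
cost, price = 6, 10                      # Rs. per cake (from Example 1)
demand = [300, 400, 500, 600, 700]
prob = [0.15, 0.20, 0.45, 0.15, 0.05]    # 15, 20, 45, 15 and 5 days out of 100
stock = demand                           # assumed acts A1..A5: bake 300, ..., 700 cakes

def payoff(q, d):
    """Profit when q cakes are baked and d are demanded; unsold cakes are worthless."""
    return price * min(q, d) - cost * q

# Expected pay-off EP of each act
ep = [sum(p * payoff(q, d) for d, p in zip(demand, prob)) for q in stock]

# Expected regret ER: best pay-off for each demand minus the act's pay-off
best = {d: max(payoff(q, d) for q in stock) for d in demand}
er = [sum(p * (best[d] - payoff(q, d)) for d, p in zip(demand, prob)) for q in stock]

for q, e, r in zip(stock, ep, er):
    print(q, round(e), round(r))
```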
Example 3 : A grocery shop is faced with the problem of how many cakes to buy in order to meet the day's demand. Left-over cakes are a total loss. If a customer's demand is not satisfied, the sale is lost. The shopkeeper has the following information on sales for the past 200 days:
Sales per day No. of days Probability
25 20 0.10
26 60 0.30
27 100 0.50
28 20 0.10
(i) Prepare the payoff matrix and opportunity loss (regret) matrix.
(ii) Find the optimal number of cakes that should be bought each day.
(iii) Find EVPI.
The cost of a cake is Rs. 8 and it is sold for Rs. 10 each.
Solution :
(i) Profit = (Cakes sold × (selling price – cost price)) – (Cakes unsold × cost price)
Opportunity loss (regret) = Maximum profit in a row – Profit under each column in that row.
Pay-off Table (Rs.)
Demand / Cakes bought   25   26   27   28   Probability
25 50 42 34 26 0.10
26 50 52 44 36 0.30
27 50 52 54 46 0.50
28 50 52 54 56 0.10
EMV 50 51 49 42
Regret Table (Rs.)
Demand / Cakes bought   25   26   27   28   Probability
25 0 8 16 24 0.10
26 2 0 8 16 0.30
27 4 2 0 8 0.50
28 6 4 2 0 0.10
EOL 3.20 2.20 4.20 11.20
(ii) Now we can calculate the expected monetary value (EMV) and the expected opportunity loss (EOL) of each purchase quantity.
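The pay-off table, regret table, EMV, EOL and EVPI for Example 3 can be sketched as below. EPPI (expected profit with perfect information) and EVPI = EPPI – EMV follow the usage in this lesson; the variable names are illustrative.

```python
cost, price = 8, 10                     # Rs. per cake (from the example)
sales = [25, 26, 27, 28]
prob = [0.10, 0.30, 0.50, 0.10]

def profit(bought, demand):
    sold = min(bought, demand)
    unsold = bought - sold
    return sold * (price - cost) - unsold * cost

emv = {b: sum(p * profit(b, d) for d, p in zip(sales, prob)) for b in sales}
best_row = {d: max(profit(b, d) for b in sales) for d in sales}
eol = {b: sum(p * (best_row[d] - profit(b, d)) for d, p in zip(sales, prob)) for b in sales}

eppi = sum(p * best_row[d] for d, p in zip(sales, prob))   # profit with perfect information
optimal = max(emv, key=emv.get)                            # purchase quantity with highest EMV
evpi = eppi - emv[optimal]

print({b: round(v, 2) for b, v in emv.items()})
print({b: round(v, 2) for b, v in eol.items()})
print(optimal, round(eppi, 2), round(evpi, 2))
```

The optimal quantity also has the smallest EOL, and that smallest EOL equals the EVPI.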
Example 5 : Each unit of a product produced and sold yields a profit of Rs. 50, but a unit produced and not sold results in a loss of Rs. 30. The probability distribution of the number of units demanded is as follows:
No. of Units Demanded Probability
0 0.20
1 0.20
2 0.25
3 0.30
4 0.05
How many units should be produced to maximise the expected profit? Also calculate the EVPI.
Solution :
Given : Profit for units produced and sold = Rs. 50
Loss for units produced and not sold = Rs. 30
Pay Off Table (Rs.)
Demand  Probability   Production 0    Production 1    Production 2    Production 3    Production 4
                      Pay-off  EMV    Pay-off  EMV    Pay-off  EMV    Pay-off  EMV    Pay-off  EMV
0       0.20          0        0      (30)     (6)    (60)     (12)   (90)     (18)   (120)    (24)
1       0.20          0        0      50       10     20       4      (10)     (2)    (40)     (8)
2       0.25          0        0      50       12.50  100      25     70       17.50  40       10
3       0.30          0        0      50       15     100      30     150      45     120      36
4       0.05          0        0      50       2.50   100      5      150      7.50   200      10
Total                          0               34              52              50              24
Note: Values in brackets are negative.
We should produce 2 units because EMV = Rs. 52 (maximum).
Further, EVPI = EPPI – EMV
EPPI Table
Demand Probability Max. pay off EPPI
0 0.2 0 0 × 0.2 = 0
1 0.2 50 50 × 0.2 = 10
2 0.25 100 100 × 0.25 = 25
3 0.30 150 150 × 0.30 = 45
4 0.05 200 200 × 0.05 = 10
Total 90
∴ EVPI = 90 – 52 = Rs. 38
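The pay-off table, the EMV of each production level, and the EVPI of Example 5 can be checked with the sketch below; the payoff function encodes the stated Rs. 50 profit per unit sold and Rs. 30 loss per unit unsold.

```python
profit_sold, loss_unsold = 50, 30        # Rs. per unit (from the example)
demand = [0, 1, 2, 3, 4]
prob = [0.20, 0.20, 0.25, 0.30, 0.05]

def payoff(produced, demanded):
    sold = min(produced, demanded)
    return profit_sold * sold - loss_unsold * (produced - sold)

# EMV of each production level
emv = {q: sum(p * payoff(q, d) for d, p in zip(demand, prob)) for q in demand}
best_q = max(emv, key=emv.get)

# EPPI: for each demand, take the best pay-off, then weight by probability
eppi = sum(p * max(payoff(q, d) for q in demand) for d, p in zip(demand, prob))
evpi = eppi - emv[best_q]

print({q: round(v, 2) for q, v in emv.items()})
print(best_q, round(eppi, 2), round(evpi, 2))
```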
Decision trees use standard symbols: squares represent decision points and circles represent chance events. Branches are drawn from each square and each circle; those from a square represent the available courses of action, and those from a circle represent the possible outcomes or states of nature.
Steps in Decision Tree Analysis
In a decision tree analysis, the decision-maker follows six steps:
1. Define the Problem in Structured Terms: First of all, the factors
relevant to the solution should be determined. Then probability
distributions that are appropriate to describe future behaviour of
those factors are estimated.
2. Model the Decision Process: A decision tree that illustrates all the
alternatives in a problem is constructed. The entire decision process
is presented in an organised step-by-step procedure.
3. Apply the Appropriate Probability Values and Financial Data:
To each of the branches and sub-branches of the decision tree
the appropriate probability values and financial data are applied.
This will help us to distinguish between the probability value and
conditional monetary value associated with each outcome.
4. “Solve” the Decision Tree: Using the method explained above, locate the branch of the tree that has the largest expected value or that maximises the decision criterion.
5. Perform Sensitivity Analysis: Determine how the solution reacts to changes in the inputs. Changing the probability values and the conditional financial values enables the decision-maker to test the magnitude and the direction of the reaction.
6. List the Underlying Assumptions: The accounting, cost finding and
other assumptions used to arrive at a function should be explained.
This will help others to know what risks they are taking when they
use the results of decision tree analysis.
Advantages of Decision Tree Approach
The decision tree analysis is important because of the following:
1. It structures the decision process enabling decisions to be made in
an orderly manner.
Expected net profit for investment A = 0.4 × 75,000 + 0.4 × 55,000 + 0.2
× 35,000 – 30,000 = Rs. 29,000
Expected net profit for investment B = 0.3 × 100,000 + 0.4 × 80,000 + 0.3
× 70,000 – 50,000 = Rs. 33,000
Since the expected net profit for investment B is more than A, the
businessman should invest in B.
Example 9 : A person has two independent investments, A and B, available to him, but owing to certain constraints he can undertake only one at a time. He can choose A first and then stop, or, if A is successful, then take B, or vice versa. The probability of success of A is 0.6, while for B it is 0.4. Both investments require an initial capital outlay of Rs. 10,000, and both return nothing if the venture is unsuccessful. Successful completion of A will return Rs. 20,000 (over cost) and successful completion of B will return Rs. 24,000 (over cost). Draw a decision tree and determine the best strategy.
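The rollback of a decision tree for this problem (evaluating chance nodes by expected value and decision nodes by choosing the best branch, working from right to left) can be sketched as below. The sketch assumes that an unsuccessful venture loses the Rs. 10,000 outlay, that the stated returns "over cost" are net gains, and that the person stops after a failure; the helper names are hypothetical.

```python
outlay = 10_000
invest = {                       # (probability of success, net gain if successful)
    "A": (0.6, 20_000),
    "B": (0.4, 24_000),
}

def ev_single(name):
    """Chance node: success earns the net gain, failure loses the outlay."""
    p, gain = invest[name]
    return p * gain - (1 - p) * outlay

def ev_sequence(first, second):
    """Take `first`; after a success, take `second` only if its expected value is positive."""
    p, gain = invest[first]
    follow_up = max(ev_single(second), 0)               # decision node after success
    return p * (gain + follow_up) - (1 - p) * outlay    # chance node for `first`

options = {
    "A only": ev_single("A"),
    "B only": ev_single("B"),
    "A, then B if A succeeds": ev_sequence("A", "B"),
    "B, then A if B succeeds": ev_sequence("B", "A"),
    "Do nothing": 0,
}
for name, value in options.items():
    print(f"{name}: Rs. {value:,.0f}")
print("Best:", max(options, key=options.get))
```

Under these assumptions the best strategy is to undertake A first and, if it succeeds, to undertake B.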
(xv) In decision trees, no more than two alternative courses of action can emanate from a decision node.
(xvi) In decision trees, the probabilities of all events at chance nodes and
the monetary evaluations of different alternatives must all be known
in advance.
(xvii) The probabilities of various outcomes at each chance node should
always add up to one.
(xviii) A decision taken on the basis of expected monetary value would
always prove to be the right decision.
Ans.
(i) True (ii) False (iii) True (iv) False (v) False (vi) True
(vii) False (viii) True (ix) True (x) True (xi) False (xii) True
(xiii) True (xiv) True (xv) False (xvi) True (xvii) True (xviii) False
Exercise 2 : Questions and Answers
(i) Describe the steps involved in the process of decision-making.
(ii) What are pay-off and regret functions? How can entries in a regret
table be derived from a pay-off table?
(iii) Explain and illustrate the following principles of decision-making: (a) Laplace, (b) Maximax, (c) Maximin, (d) Hurwicz, and (e) Savage.
(iv) How are the maximum likelihood and expectation principles of choice differentiated? Do they always lead to the same decisions?
(v) Define the term EPPI. How is it calculated? What does it signify?
(vi) What do you understand by EVPI? How is it calculated?
(vii) Explain the procedure of analysing a decision tree.
(viii) The research department of Hindustan Lever has recommended that the marketing department launch a shampoo of three different types. The marketing manager has to decide which type of shampoo to launch under the following estimated pay-offs (in millions of Rs.) for various levels of sales:
to buy, drill for oil and, if oil is found, exercise the option; and (c) not buying the land or obtaining the option. There are three possibilities on such land: large oil reserves may be found, minor reserves may be found, or there may be no oil. The pay-offs (in lacs of Rs.) resulting from various combinations of acts and events are tabulated below:
                              Acts
Events             Buy land    Obtain option    No action
Large reserves        40            28              0
Minor reserves        10             1              0
No oil               −25            −2              0
What action should be taken by the company when the decision
criterion is:
(a) Laplace, (b) Maximin, (c) Maximax, (d) Minimax Regret, and
(e) Expected pay-off (when the probabilities of obtaining large,
minor, and no reserves are estimated to be 0.2, 0.5 and 0.3,
respectively)?
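All five criteria can be applied mechanically to the pay-off table above. The Python sketch below is one way to do it; the names are hypothetical, the regret for each event is taken as the best pay-off for that event minus the pay-off of the act, and ties are reported rather than broken. Its results agree with the answers listed for (x): Obtain option, No action, Buy land, Obtain option, and a tie between buying the land and obtaining the option (expected pay-off 5.5 lacs each).

```python
from fractions import Fraction

ACTS = ["Buy land", "Obtain option", "No action"]
# Pay-offs in lacs of Rs.; rows: large reserves, minor reserves, no oil
PAYOFF = [
    [40, 28, 0],
    [10, 1, 0],
    [-25, -2, 0],
]
PROBS = [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)]  # used only in (e)

def column(j):
    return [row[j] for row in PAYOFF]

def best(scores, minimize=False):
    """Return every act that attains the best score, so ties stay visible."""
    target = min(scores.values()) if minimize else max(scores.values())
    return [act for act, s in scores.items() if s == target]

laplace = {a: Fraction(sum(column(j)), len(PAYOFF)) for j, a in enumerate(ACTS)}
maximin = {a: min(column(j)) for j, a in enumerate(ACTS)}
maximax = {a: max(column(j)) for j, a in enumerate(ACTS)}
# Regret = best pay-off for that event minus the pay-off of the act
regret = [[max(row) - v for v in row] for row in PAYOFF]
max_regret = {a: max(r[j] for r in regret) for j, a in enumerate(ACTS)}
expected = {a: sum(p * v for p, v in zip(PROBS, column(j))) for j, a in enumerate(ACTS)}

for name, scores, minimize in [("Laplace", laplace, False),
                               ("Maximin", maximin, False),
                               ("Maximax", maximax, False),
                               ("Minimax regret", max_regret, True),
                               ("Expected pay-off", expected, False)]:
    print(name, best(scores, minimize))
```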
(xi) A firm is considering the purchase of some complex equipment
from either of the two suppliers S1 and S2. Supplier S1 is capable of
supplying the equipment on time to meet a certain desired deadline.
The price chargeable by S1 is, however, considerably higher than
that of S2. The management of the firm feels that S2 may or may not be able to deliver the equipment on time. It is
even suspected that supplier S2 may never be able to deliver the
equipment to the specifications. However, the management believes
that if it waits for some months, it may get better information on
S2’s capabilities of supplying the equipment.
The management is considering three alternative courses of action.
A1: Order from S1. If it later becomes clear that S2 can supply, the order from S1 can be cancelled. Of course, delay would be caused when the order is given to S2.
A2: Order from supplier S2. If it is known later on that S2 cannot
supply the equipment, the order may be switched to S1.
A3: Wait till the time information on S2’s capabilities is known.
This would obviously cause delay. The outcomes (profits) in the
various possible situations are:
(xiv) Three types of souvenirs can be sold outside a stadium. From the following conditional pay-off table, construct the opportunity loss table. (Sales depend on which team wins.)
                          Types of Souvenir
                   I (Rs.)     II (Rs.)     III (Rs.)
Team A wins         1,200         800          300
Team B wins           250         700        1,100
Point out which type of souvenir should be bought if the probability of Team A winning is 0.6.
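The opportunity loss table and the expected losses can be sketched in Python as follows (names are illustrative). The loss for each event is the best pay-off under that event minus the pay-off of each souvenir type. Type I has the smallest expected opportunity loss (Rs. 340), which is also the type with the largest expected pay-off (Rs. 820).

```python
from fractions import Fraction

TYPES = ["I", "II", "III"]
PAYOFF = {"Team A wins": [1200, 800, 300], "Team B wins": [250, 700, 1100]}
PROB = {"Team A wins": Fraction(6, 10), "Team B wins": Fraction(4, 10)}

# Opportunity loss = best pay-off under that event minus the type's pay-off
loss = {event: [max(row) - v for v in row] for event, row in PAYOFF.items()}
expected_loss = [sum(PROB[e] * loss[e][j] for e in PAYOFF) for j in range(len(TYPES))]
expected_payoff = [sum(PROB[e] * PAYOFF[e][j] for e in PAYOFF) for j in range(len(TYPES))]
best_type = TYPES[expected_loss.index(min(expected_loss))]

print(loss)             # {'Team A wins': [0, 400, 900], 'Team B wins': [850, 400, 0]}
print(*expected_loss)   # 340 400 540
print(best_type)        # I
```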
(xv) Chemical Products Ltd. produces a compound which must be sold
within the month it is produced, if the normal price of Rs. 100 per
drum is to be obtained. Anything unsold in that month is sold in a
different market for Rs. 20 per drum. The variable cost is Rs. 55
per drum.
During the last three years, monthly demand was recorded and showed
the following frequencies:
Monthly demand (No. of drums): 2,000 3,000 6,000
Frequency (No. of months): 8 16 12
(a) Prepare an appropriate pay-off table.
(b) Advise the production management on the number of drums
that should be produced next month.
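One way to build the pay-off table and pick the production level is sketched below. It assumes, as is usual for such problems, that only the three recorded demand levels are considered as production levels and that the 36 recorded months give the probabilities. Each unsold drum recovers Rs. 20 against a variable cost of Rs. 55. The names are hypothetical.

```python
from fractions import Fraction

PRICE, SALVAGE, VARIABLE_COST = 100, 20, 55   # Rs. per drum
DEMAND_FREQ = {2000: 8, 3000: 16, 6000: 12}    # drums demanded: no. of months
MONTHS = sum(DEMAND_FREQ.values())              # 36 months

def payoff(produced, demand):
    """Contribution when `produced` drums are made and `demand` drums are wanted."""
    sold = min(produced, demand)
    unsold = produced - sold                     # sold in the other market
    return PRICE * sold + SALVAGE * unsold - VARIABLE_COST * produced

# Assumption: only the recorded demand levels are considered as production levels
table = {q: {d: payoff(q, d) for d in DEMAND_FREQ} for q in DEMAND_FREQ}
emv = {q: sum(Fraction(f, MONTHS) * table[q][d] for d, f in DEMAND_FREQ.items())
       for q in DEMAND_FREQ}
best_q = max(emv, key=emv.get)

for q in table:
    print(q, table[q], round(float(emv[q]), 2))
print("Produce", best_q, "drums")  # Produce 3000 drums
```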
(xvi) A stockist of a particular commodity makes a profit of Rs. 30 on
each sale made within the same week of purchase; otherwise he
incurs a loss of Rs. 30 on each item.
No. of items sold within the same week:   5   6   7   8   9   10   11
Frequency:                                 0   9  12  24   9    6    0
(a) Find out the optimum number of items the stockist should buy
every week in order to maximize the profit.
(b) Calculate the expected value of perfect information.
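Both parts can be checked with the sketch below (names are hypothetical). The expected profit is computed for each stock level from the 60 recorded weeks; with perfect information the stockist would buy exactly the week's demand, so EVPI is the expected profit under certainty minus the best expected profit. It reproduces the listed answers: 8 units, expected profit Rs. 210 and EVPI Rs. 25.50.

```python
from fractions import Fraction

PROFIT, LOSS = 30, 30   # Rs. per item sold / left unsold within the week
FREQ = {5: 0, 6: 9, 7: 12, 8: 24, 9: 9, 10: 6, 11: 0}
WEEKS = sum(FREQ.values())                       # 60 weeks
PROB = {d: Fraction(f, WEEKS) for d, f in FREQ.items()}

def payoff(stock, demand):
    sold = min(stock, demand)
    return PROFIT * sold - LOSS * (stock - sold)

expected_profit = {q: sum(PROB[d] * payoff(q, d) for d in FREQ) for q in FREQ}
best_q = max(expected_profit, key=expected_profit.get)
# With perfect information the stockist buys exactly the week's demand
eppi = sum(PROB[d] * PROFIT * d for d in FREQ)
evpi = eppi - expected_profit[best_q]

print(best_q, expected_profit[best_q], float(evpi))  # 8 210 25.5
```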
(xvii) A physician purchases a particular vaccine on Monday of each week.
The vaccine must be used within the following week; otherwise it becomes worthless. The vaccine costs Rs. 20 per dose and
the physician charges Rs. 60 per dose. In the past 50 weeks, the physician has administered the vaccine in the following quantities:
Doses per week:   20   25   40   60
No. of weeks:      5   15   25    5
(a) Draw up a pay-off matrix.
(b) Obtain a regret matrix.
(c) Determine the optimum number of doses the physician should
buy.
(d) Determine the maximum amount the physician would be willing to pay per week for perfect information about the number of doses expected to be demanded in a week.
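All four parts can be worked through in a short sketch. It assumes that only the four recorded weekly quantities are considered as purchase levels; the names are illustrative. The regret of a purchase level is the best pay-off for that demand minus its own pay-off, and EVPI equals the expected regret of the optimal act. Under these assumptions the optimum is 40 doses with EMV Rs. 1,210, and EVPI works out to Rs. 210 (the expected pay-off under perfect information, Rs. 1,420, minus Rs. 1,210).

```python
from fractions import Fraction

COST, PRICE = 20, 60                         # Rs. per dose
WEEKS = {20: 5, 25: 15, 40: 25, 60: 5}       # doses demanded: no. of weeks
N = sum(WEEKS.values())                      # 50 weeks
PROB = {d: Fraction(w, N) for d, w in WEEKS.items()}
LEVELS = list(WEEKS)                         # purchase quantities considered (assumed)

def payoff(bought, demand):
    return PRICE * min(bought, demand) - COST * bought

# (a) pay-off matrix and (b) regret matrix, indexed [demand][quantity bought]
payoff_matrix = {d: {q: payoff(q, d) for q in LEVELS} for d in LEVELS}
regret_matrix = {d: {q: max(row.values()) - v for q, v in row.items()}
                 for d, row in payoff_matrix.items()}
# (c) optimum quantity by expected monetary value
emv = {q: sum(PROB[d] * payoff_matrix[d][q] for d in LEVELS) for q in LEVELS}
best_q = max(emv, key=emv.get)
# (d) EVPI equals the expected regret of the optimal act
evpi = sum(PROB[d] * regret_matrix[d][best_q] for d in LEVELS)

print(best_q, emv[best_q], evpi)  # 40 1210 210
```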
Ans.
(viii) Egg, Deluxe, Deluxe, Deluxe
(ix) S3, S1, S1, S1
(x) Obtain option, No action, Buy land, Obtain option, Buy land or obtain option
(xi) A3, A1, A2, A2, A2 or A3
(xii) Property, Exp. Return = 6%
(xiii) (i) (ii)
(xiv) Type I, ER = 340
(xv) 3,000 drums
(xvi) 8 units, EP = 210, EVPI = 25.50
(xvii) 40 doses, EMV = 1,210, EVPI = 210
UNIT-3
LESSON 1
Simple Correlation
STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Utility of Correlation
1.4 Difference between Correlation and Causation
1.5 Types of Correlation
1.6 Methods of Studying Correlation
1.7 Summary
1.8 Self-Assessment Questions
1.2 Introduction
In the earlier chapters, we discussed univariate distributions and used different statistical techniques to highlight their important characteristics. A univariate distribution relates to one variable only. We may, however, come across series in which each item assumes the values of two or more variables. A distribution in which each unit of the series assumes two values is called a bivariate distribution. In a bivariate distribution, we are interested in finding out whether there is any relationship between the two variables. Correlation is a statistical technique which studies the relationship between two or more variables, and correlation analysis involves various methods and techniques
used for studying and measuring the extent of relationship between the two variables. When two variables are related in such a way that a change in the value of one is accompanied either by a direct change or by an inverse change in the values of the other, the two variables are said to be correlated. In correlated variables, an increase in one variable is accompanied by an increase or a decrease in the other variable. For instance, a relationship exists between the price and demand of a commodity because, other things being equal, an increase in the price of a commodity causes a decrease in the demand for that commodity. A relationship might also exist between the heights and weights of students, and between the amount of rainfall in a city and the sales of raincoats in that city.
Some of the important definitions of correlation are given below.
Croxton and Cowden say, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.”
A.M. Tuttle says, “Correlation is an analysis of the covariation between two or more variables.”
W.A. Neiswanger says, “Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces may become effective.”
L.R. Conner says, “If two or more quantities vary in sympathy so that movements in one tend to be accompanied by corresponding movements in others, then they are said to be correlated.”
a constant ratio to the amount of change in the other variable, then the correlation is said to be non-linear. The distinction between linear and non-linear correlation is based upon the consistency of the ratio of change between the variables.
Example 1 : Show correlation from the following data by the graphic method:
Year                         2005  2006  2007  2008  2009  2010  2011  2012  2013  2014
Average Income (Rs.)          100   110   125   140   150   180   200   220   250   360
Average Expenditure (Rs.)      90    95   100   120   120   140   150   170   200   260
Solution:
[Figure: average income and average expenditure (Rs.) plotted against years, 2005 to 2014]
The graph shows a close positive correlation between income and expenditure: as income increases, expenditure also increases.
Covariance
Covariance measures the relationship between two random variables, say X and Y. It shows the direction of the relationship, that is, whether the two variables covary positively or negatively. If X and Y tend to be above (or below) their respective means at the same time, the covariance between X and Y is positive. If, on the other hand, X and Y are inversely related, so that one tends to be above its mean when the other is below its mean, the covariance between them is negative. It is defined as follows:
$$\text{Covariance}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} = \frac{\sum xy}{n}$$
where $\bar{X} = \frac{\sum X}{n}$ and, similarly, $\bar{Y} = \frac{\sum Y}{n}$.
Example: Calculate Covariance (X, Y) if
X: 1 2 3 4 5
Y: 6 7 8 9 10
Solution:
X     Y     x = X − X̄    y = Y − Ȳ    xy = (X − X̄)(Y − Ȳ)
1     6        −2           −2              4
2     7        −1           −1              1
3     8         0            0              0
4     9         1            1              1
5    10         2            2              4
ΣX = 15   ΣY = 40                       Σxy = 10
$\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{40}{5} = 8$
$\text{Covariance}(X, Y) = \frac{\sum xy}{n} = \frac{10}{5} = 2$
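The covariance formula can be sketched in plain Python (`covariance` is a hypothetical helper, not a library function). It divides by n, as in the formula above. Note that Python's standard `statistics.covariance` (Python 3.10+) computes the sample covariance with divisor n − 1, so it would give 2.5 for the same data.

```python
def covariance(xs, ys):
    """Covariance with divisor n, as in the formula above: sum(xy) / n."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty series of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # x = X - mean of X, y = Y - mean of Y; multiply the deviations and add
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_xy / n

print(covariance([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # 2.0
```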
$y_1, y_2, y_3, \ldots, y_n$ are the deviations of all items of the second variable from its mean, $\sum xy$ is the sum of the products of these corresponding deviations, N stands for the number of pairs, $\sigma_x$ stands for the standard deviation of the X variable and $\sigma_y$ stands for the standard deviation of the Y variable:
$$\sigma_x = \sqrt{\frac{\sum x^2}{N}} \quad \text{and} \quad \sigma_y = \sqrt{\frac{\sum y^2}{N}}$$
If we substitute the values of $\sigma_x$ and $\sigma_y$ in the above formula for computing r, we get
$$r = \frac{\sum xy}{N\sqrt{\frac{\sum x^2}{N} \times \frac{\sum y^2}{N}}} \quad \text{or} \quad r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
The value of r lies between −1 and +1: it is +1 in the case of perfect positive correlation and −1 in the case of perfect negative correlation.
The computation of the correlation coefficient can be simplified by subtracting a constant from the given data and dividing by a common positive factor. In such a case, the final result is not multiplied back by the common factor, because the coefficient of correlation is independent of change of origin and scale.
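The deviation formula and the invariance property can both be checked numerically with the sketch below. `pearson_r` is a hypothetical helper; the sample data are the lengths and weights of Example 3 below. Shifting the origin and dividing by positive constants leaves r unchanged, whereas changing the sign of one series reverses the sign of r.

```python
import math

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), x and y being deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a series has no variation")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)

length = [3, 4, 6, 7, 10]     # data of Example 3
weight = [9, 11, 14, 15, 16]
r = pearson_r(length, weight)

# Change of origin and scale: u = (X - 5) / 2, v = (Y - 10) / 3 (positive divisors)
r_uv = pearson_r([(x - 5) / 2 for x in length], [(y - 10) / 3 for y in weight])
print(round(r, 3), round(r_uv, 3))  # 0.939 0.939
```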
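Example 2 : Calculate the coefficient of correlation from the following data:
X 50 100 150 200 250 300 350
Y 10 20 30 40 50 60 70
Solution: Here $\bar{X} = 200$ and $\bar{Y} = 40$. Take $x = \frac{X - \bar{X}}{50}$ and $y = \frac{Y - \bar{Y}}{10}$.
X      X − X̄     x     x²     Y     Y − Ȳ     y     y²     xy
50     −150     −3     9     10     −30     −3     9      9
100    −100     −2     4     20     −20     −2     4      4
150     −50     −1     1     30     −10     −1     1      1
200       0      0     0     40       0      0     0      0
250      50      1     1     50      10      1     1      1
300     100      2     4     60      20      2     4      4
350     150      3     9     70      30      3     9      9
Total                 28                           28     28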
By substituting the values, we get $r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{28}{\sqrt{28 \times 28}} = \frac{28}{28} = 1$.
Hence there is perfect positive correlation.
Example 3 : A sample of five items is taken from the production of a firm. The lengths and weights of the five items are given below:
Length (inches) 3 4 6 7 10
Weight (ounces) 9 11 14 15 16
Calculate Karl Pearson’s correlation coefficient between length and weight
and interpret the value of correlation coefficient.
Solution: $\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6$ and $\bar{Y} = \frac{\sum Y}{N} = \frac{65}{5} = 13$
X      x = X − X̄    x²     Y      y = Y − Ȳ    y²     xy
3         −3          9      9        −4         16     12
4         −2          4     11        −2          4      4
6          0          0     14        +1          1      0
7         +1          1     15        +2          4      2
10        +4         16     16        +3          9     12
ΣX = 30    0         30   ΣY = 65      0         34     30
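$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}, \quad \text{where } \sum xy = 30,\ \sum x^2 = 30 \text{ and } \sum y^2 = 34$$
$$r = \frac{30}{\sqrt{30 \times 34}} = \frac{30}{\sqrt{1020}} = +0.939 \quad \text{Ans.}$$
The value of r indicates that there is a high degree of positive correlation between lengths and weights.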
Example 4 : From the following data, compute the coefficient of correlation between X and Y:
                                            X Series    Y Series
Number of items                                 15          15
Arithmetic mean                                 25          18
Sum of squares of deviations from mean         136         138
Sum of the products of the deviations of X and Y from their arithmetic means = 122.
Solution: Denoting the deviations of X and Y from their arithmetic means by x and y respectively, the given data are: Σx² = 136, Σxy = 122 and Σy² = 138.
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137} = 0.89 \quad \text{Ans.}$$
The formula given above yields the same answer as that obtained by taking deviations from the actual mean or from an arbitrary mean.
Example 6 : Compute the coefficient of correlation from the following data:
Marks in Statistics:    20  30  28  17  19  23  35  13  16  38
Marks in Mathematics:   18  35  20  18  25  28  33  18  20  40
Solution:
Marks in          Marks in
Statistics, X     Mathematics, Y     X²       Y²       XY
20                18                 400      324      360
30                35                 900     1225     1050
28                20                 784      400      560
17                18                 289      324      306
19                25                 361      625      475
23                28                 529      784      644
35                33                1225     1089     1155
13                18                 169      324      234
16                20                 256      400      320
38                40                1444     1600     1520
ΣX = 239          ΣY = 255        ΣX² = 6357   ΣY² = 7095   ΣXY = 6624
Substitute the computed values in the formula given below:
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
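The substitution into the raw-score formula can be sketched as follows (`pearson_raw` is a hypothetical helper). It recomputes the five column totals of the table and evaluates r directly from them. The final figure for Example 6 is not shown in this section; on these data the sketch gives r ≈ 0.857 (5295 / √(6449 × 5925)).

```python
import math

def pearson_raw(xs, ys):
    """r from raw sums: (N*SXY - SX*SY) / sqrt((N*SXX - SX^2) * (N*SYY - SY^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den, (sx, sy, sxx, syy, sxy)

statistics_marks = [20, 30, 28, 17, 19, 23, 35, 13, 16, 38]
maths_marks = [18, 35, 20, 18, 25, 28, 33, 18, 20, 40]
r, sums = pearson_raw(statistics_marks, maths_marks)
print(sums)          # (239, 255, 6357, 7095, 6624)
print(round(r, 3))   # 0.857
```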
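$$r = \frac{\sum f d_x d_y - \frac{\sum f d_x \cdot \sum f d_y}{\sum f}}{\sqrt{\left\{\sum f d_x^2 - \frac{(\sum f d_x)^2}{\sum f}\right\}\left\{\sum f d_y^2 - \frac{(\sum f d_y)^2}{\sum f}\right\}}}$$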
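The grouped-data formula above can be sketched in Python. The frequency table and step deviations below are hypothetical, chosen only to exercise the formula. As a self-check, the sketch also expands every cell into its individual pairs and applies the ungrouped deviation formula; both routes should give the same r (5/7 for this table).

```python
import math

def grouped_r(freq, dx, dy):
    """Correlation from a bivariate frequency table.

    freq[i][j] is the frequency of the cell with x-class i and y-class j;
    dx[i] and dy[j] are the step deviations of the class mid-points.
    """
    cells = [(f, dx[i], dy[j]) for i, row in enumerate(freq) for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    s_fdx = sum(f * a for f, a, _ in cells)
    s_fdy = sum(f * b for f, _, b in cells)
    s_fdx2 = sum(f * a * a for f, a, _ in cells)
    s_fdy2 = sum(f * b * b for f, _, b in cells)
    s_fdxdy = sum(f * a * b for f, a, b in cells)
    num = s_fdxdy - s_fdx * s_fdy / n
    den = math.sqrt((s_fdx2 - s_fdx ** 2 / n) * (s_fdy2 - s_fdy ** 2 / n))
    return num / den

# Hypothetical 3 x 3 table with step deviations -1, 0, 1 on both axes
dx = [-1, 0, 1]
dy = [-1, 0, 1]
freq = [[3, 1, 0],
        [1, 4, 2],
        [0, 2, 5]]
r_grouped = grouped_r(freq, dx, dy)

# Self-check: expand each cell into individual (dx, dy) pairs
pairs = [(dx[i], dy[j]) for i, row in enumerate(freq) for j, f in enumerate(row)
         for _ in range(f)]
xs, ys = zip(*pairs)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
r_expanded = (sum((x - mx) * (y - my) for x, y in pairs)
              / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)))
print(round(r_grouped, 4), round(r_expanded, 4))
```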
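$r = \sqrt{b_{xy} \times b_{yx}}$, where r takes the common sign of the two regression coefficients $b_{xy}$ and $b_{yx}$.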
Merits of Pearson’s coefficient of correlation: The coefficient of correlation summarizes in one figure both the degree and the direction of correlation. Its value varies between −1 and +1.
Demerits of Pearson’s coefficient of correlation: It always assumes a linear relationship between the variables, although this assumption may be wrong. Secondly, it is not easy to interpret the significance of a correlation coefficient. Thirdly, the method is time-consuming and is affected by extreme items.
If r is less than the probable error, there is no real evidence of correlation.
If r is more than 6 times the probable error, the coefficient of correlation is considered highly significant.
If r is more than 3 times the probable error but less than 6 times, the correlation is considered significant but not highly significant.
If the probable error is small and r is more than the probable error but less than 3 times it, nothing definite can be concluded.
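These rules of thumb can be sketched as a small classifier. The definition of the probable error is not shown in this part of the lesson; the sketch assumes the usual formula PE = 0.6745 × (1 − r²) / √N, and the coefficients and N = 25 in the usage loop are hypothetical.

```python
import math

def probable_error(r, n):
    """Assumed standard formula: PE = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply the rules of thumb stated above to |r| and its probable error."""
    pe = probable_error(r, n)
    size = abs(r)
    if size < pe:
        return "no real evidence of correlation"
    if size < 3 * pe:
        return "nothing definite can be concluded"
    if size <= 6 * pe:
        return "significant but not highly significant"
    return "highly significant"

# Hypothetical coefficients, each based on N = 25 pairs
for r in (0.1, 0.3, 0.5, 0.8):
    print(r, round(probable_error(r, 25), 4), interpret(r, 25))
```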
Solution:
Individual    Rank before R1    Rank after R2    D = R1 − R2    D²
A                  1                 6               −5          25
B                  6                 8               −2           4
C                  3                 3                0           0
D                  9                 7                2           4
E                  5                 2                3           9
F                  2                 1                1           1
G                  7                 5                2           4
H                 10                 9                1           1
I                  8                 4                4          16
J                  4                10               −6          36
N = 10                                                       ΣD² = 100
By applying the formula,
$$\rho = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 100}{10^3 - 10} = 1 - \frac{600}{990} = 1 - 0.606 = 0.394$$
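The rank-difference formula can be sketched with exact fractions (`spearman_from_ranks` is a hypothetical helper). It accepts only untied ranks 1 to N and reproduces ΣD² = 100 and ρ = 13/33 ≈ 0.394 for the ranks above.

```python
from fractions import Fraction

def spearman_from_ranks(r1, r2):
    """rho = 1 - 6*sum(D^2) / (N^3 - N) for untied ranks 1..N."""
    n = len(r1)
    expected = list(range(1, n + 1))
    if sorted(r1) != expected or sorted(r2) != expected:
        raise ValueError("expects untied ranks 1..N for both variables")
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - Fraction(6 * d2, n ** 3 - n), d2

before = [1, 6, 3, 9, 5, 2, 7, 10, 8, 4]   # individuals A to J
after = [6, 8, 3, 7, 2, 1, 5, 9, 4, 10]
rho, d2 = spearman_from_ranks(before, after)
print(d2, rho, round(float(rho), 3))   # 100 13/33 0.394
```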
When the actual data rather than the ranks are given, ranks must first be assigned. Rank 1 can be given either to the highest value or to the lowest value, but whichever order is chosen must be followed for both variables.
Example 9 : Calculate rank correlation from the following data:
X : 17 13 15 16 6 11 14 9 7 12
Y : 36 46 35 24 12 18 27 22 2 8
Solution:
Calculation of Rank Correlation
X      Rank R1     Y      Rank R2     D = R1 − R2     D²
17        1        36        2            −1           1
13        5        46        1            +4          16
15        3        35        3             0           0
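The ranking step for Example 9 (highest value = rank 1 for both series) and the resulting coefficient can be sketched as below. The helper name is hypothetical and it deliberately refuses tied values. The remainder of this example's table is not shown in this section; on these data the sketch gives ΣD² = 44 and ρ = 1 − 264/990 ≈ 0.733. Ranking both series from the lowest value instead gives the same ρ.

```python
def ranks_highest_first(values):
    """Rank 1 = highest value; this simple version assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

x = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
y = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]
r1, r2 = ranks_highest_first(x), ranks_highest_first(y)
d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
n = len(x)
rho = 1 - 6 * d2 / (n ** 3 - n)
print(r1)                  # [1, 5, 3, 2, 10, 7, 4, 8, 9, 6]
print(r2)                  # [2, 1, 3, 5, 8, 7, 4, 6, 10, 9]
print(d2, round(rho, 3))   # 44 0.733
```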
Example 10 : Compute the rank correlation coefficient from the following data:
Section A : 115 109 112 87 98 98 120 100 98 118
Section B : 75 73 85 70 76 65 82 73 68 80
Solution :
Computation of Rank Correlation Coefficient
Series A    Rank R1    Series B    Rank R2    D = R1 − R2    D²
115            8          75          6            2          4
109            6          73          4.5          1.5        2.25
112            7          85         10           −3          9
87             1          70          3           −2          4
98             3          76          7           −4         16
98             3          65          1            2          4
120           10          82          9            1          1
100            5          73          4.5          0.5        0.25
98             3          68          2            1          1
118            9          80          8            1          1
N = 10                                                   ΣD² = 42.50
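For tied values such as those in Example 10, the sketch below assigns average ranks (lowest value = rank 1, as in the table) and then adds the commonly used correction (m³ − m)/12 to ΣD² for every group of m tied values. This correction is assumed here, since the rest of the worked solution is not shown in this section. On these data it gives ΣD² = 42.5, a total correction of 2.5, and ρ = 1 − 6 × 45/990 ≈ 0.727.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = lowest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1          # first position occupied by v
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2  # average of first .. first+count-1
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

section_a = [115, 109, 112, 87, 98, 98, 120, 100, 98, 118]
section_b = [75, 73, 85, 70, 76, 65, 82, 73, 68, 80]
r1, r2 = average_ranks(section_a), average_ranks(section_b)
d2 = sum((p - q) ** 2 for p, q in zip(r1, r2))
n = len(section_a)
correction = tie_correction(section_a) + tie_correction(section_b)
rho = 1 - 6 * (d2 + correction) / (n ** 3 - n)
print(d2, correction, round(rho, 3))   # 42.5 2.5 0.727
```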
X      Dx     Y      Dy     DxDy
120    +      65     +      +
N = 6    C = 3
$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}} = \pm\sqrt{\pm\frac{2 \times 3 - 6}{6}} = \pm\sqrt{\frac{0}{6}} = 0$$
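The coefficient of concurrent deviations can be sketched as below. The sketch assumes that n counts the pairs of deviations (one fewer than the number of observations), that a deviation is only its direction (+1 up, −1 down, 0 unchanged), and that a pair is concurrent when the product of the two directions is positive. The same sign is applied inside and outside the square root, following the sign of 2C − n. The usage series are hypothetical.

```python
import math

def rc_from_counts(c, n):
    """r_c = ±sqrt(±(2C - n) / n), both signs following the sign of (2C - n)."""
    k = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(k)), k)

def concurrent_deviation_r(xs, ys):
    # Direction of each successive change: +1 up, -1 down, 0 unchanged
    def directions(series):
        return [(b > a) - (b < a) for a, b in zip(series, series[1:])]
    dx, dy = directions(xs), directions(ys)
    n = len(dx)                                        # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)    # concurrent pairs
    return rc_from_counts(c, n), c, n

print(rc_from_counts(3, 6))   # 0.0, as in the worked figures above
# Hypothetical series with 7 observations, hence 6 pairs of deviations
r, c, n = concurrent_deviation_r([60, 62, 65, 61, 64, 66, 63],
                                 [30, 33, 34, 31, 32, 35, 36])
print(c, n, round(r, 3))      # 5 6 0.816
```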
1.7 Summary
Correlation analysis deals with bivariate and multivariate data.
Correlation is a study of the co-variation of the variables involved.
When changes in the variables occur in the same direction, they are
positively correlated and when the movements are in the opposite
directions, the correlation is negative. Correlation between two
variables would result either when one of them is the cause and the other is the effect, or when both of them are affected by some common factors. The correlation may also be spurious, resulting from chance when the factors affecting each variable have nothing in common. Correlation between variables may be of varying degrees, ranging from perfect to high, moderate, low and no correlation. Correlation may be linear or non-linear; only linear correlation is considered here. Graphically, correlation is studied by means of a scatter diagram. If the dots representing pairs of data values fall on a straight line, the correlation is perfect. The degree of correlation decreases as the points lie farther and farther away from the line. Widely scattered dots with no clear direction, or dots lying on a line parallel to either of the axes, indicate absence of correlation. Numerically, the
correlation is measured and expressed in terms of Karl Pearson’s
coefficient of correlation which is defined as the ratio of covariance
to the product of standard deviations of the two series involved. Its
calculation can be done by measuring deviations of the observations
from their respective means or assumed mean values, and even
(xiv) If all the X values in a given set of paired data are subtracted from a constant K, it will have no effect on the value of the correlation coefficient.
(xv) The coefficient of correlation is independent of the change of origin
and scale.
(xvi) If the coefficient of correlation between X and Y is 0.7, then the
coefficient of correlation between –X and –Y would be equal to
–0.7.
(xvii) The correlation is said to be significant only when |r| > 6 PE.
(xviii) For the rank correlation to be calculated, it is necessary that the
given variables should not be quantifiable.
(xix) The coefficient of rank correlation has the same limits as the Karl
Pearson’s coefficient of correlation has.
(xx) Rank correlation can be used even when the variables under consideration
are quantifiable and not normally distributed.
Ans.
(i) F (ii) F (iii) F (iv) T (v) F (vi) T (vii) T
(viii) F (ix) T (x) T (xi) F (xii) T (xiii) F (xiv) F
(xv) T (xvi) F (xvii) T (xviii) F (xix) T (xx) T
Exercise 2 : Questions and Answers
(i) What is correlation? Distinguish between positive and negative correlation. How is the ‘scatter diagram’ method helpful in the study of correlation?
(ii) What is a scatter diagram? How does it help in studying the degree
and direction of correlation between two variables? Illustrate with
some sketches.
(iii) Define Karl Pearson’s coefficient of correlation. Explain the general
rules for interpreting the coefficient. In this connection, also state
the meaning and significance of the concept of probable error.
(iv) State and explain the properties of the coefficient of correlation. Also state the underlying assumptions.
(v) What do you understand by the statement that coefficient of
correlation is independent of the change of origin and scale?
(vi) Does correlation imply the existence of cause and effect relationship
between the variables involved? Does cause and effect relationship
between variables result in correlation between them? Explain with
the help of suitable examples.
(vii) Define rank correlation. Write Spearman’s formula for rank correlation
coefficient when some ranks are tied and when ranks are not tied.
What are the limits of this coefficient? Interpret the case where
this coefficient assumes the minimum value.
(viii) For a given series of paired data, the following information is
available:
Covariance between X and Y series = –32.6
Standard deviation of X series = 8.6
Standard deviation of Y series = 4.8
No. of pairs of observations = 15
Calculate the coefficient of correlation.
(ix) Given the following information:
Number of pairs of observations of X and Y series = 15
X series arithmetic mean = 25
Y series arithmetic mean = 18
X series standard deviation = 3.0
Y series standard deviation = 3.03
Summation of the products of corresponding deviations of X and Y
series = 122
Calculate the coefficient of correlation between X and Y series.
(x) Given:
Total of multiplication of deviations of X and Y = 3,476
No. of pairs of observations = 12
Total of deviations of X = – 176
Total of deviations of Y = – 26
Total of squares of deviations of X = 8,288
Total of squares of deviations of Y = 2,556
Using this information, calculate the coefficient of correlation when
the arbitrary mean values of X and Y are 85 and 22, respectively.
(xi) For a set of bivariate data, you are given the following information:
Σ(X − 58) = 46, Σ(Y − 58) = 19, Σ(X − 58)(Y − 58) = 1,095,
Σ(X − 58)² = 1,483, and Σ(Y − 58)² = 3,086
Number of pairs of observations = 8
Calculate the coefficient of correlation between X and Y.
(xii) The coefficient of correlation between two variables X and Y is −0.4 and their covariance is equal to −16. If the variance of the Y series is
36, find the second moment about mean of X series.
(xiii) Given below is the information relating to marks in Statistics (X)
and marks in Accountancy (Y) obtained by the students of a class:
Co-variance between X and Y = 144
Second moment of X about 20 = 244
First moment of X about 20 = 10
Arithmetic mean of Y = 45
Coefficient of correlation between X and Y = 0.75
Calculate coefficient of variation for marks in Statistics and that
for marks in Accountancy. In which subject is the performance of students more consistent?
(xiv) The coefficient of correlation between X and Y for 20 items is
0.3. The mean of X is 15 and that of Y is 20 while the respective
standard deviations are 4 and 5. At the time of calculation, one
item 27 has wrongly been taken as 17 in the case of X series and
35 instead of 30 in the case of Y series. Find the correct coefficient
of correlation.
(xv) While calculating the coefficient of correlation, a student obtained the following results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, and ΣXY = 508
It was discovered later, however, that two pairs of values were
wrongly recorded as:
(X, Y) = (6, 14) and (8, 6), while the correct values were (X, Y) = (8, 12) and (6, 8).
Obtain the correct value of the coefficient of correlation.
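Exercise (xv) can be solved without the raw data by removing each wrong pair's contribution from the totals and adding the correct pair's contribution, then using the raw-score formula. The sketch below does this (names are hypothetical); the corrected totals are ΣY² = 436 and ΣXY = 520 with the other sums unchanged, which gives r = 500/750 ≈ 0.667, in line with the answer listed below.

```python
import math

def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Raw-score formula for r from the five totals."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

n = 25
sums = {"sx": 125, "sy": 100, "sxx": 650, "syy": 460, "sxy": 508}
wrong = [(6, 14), (8, 6)]
correct = [(8, 12), (6, 8)]

# Remove each wrongly recorded pair's contribution and add the correct one
for (xw, yw), (xc, yc) in zip(wrong, correct):
    sums["sx"] += xc - xw
    sums["sy"] += yc - yw
    sums["sxx"] += xc ** 2 - xw ** 2
    sums["syy"] += yc ** 2 - yw ** 2
    sums["sxy"] += xc * yc - xw * yw

print(sums)  # {'sx': 125, 'sy': 100, 'sxx': 650, 'syy': 436, 'sxy': 520}
print(round(r_from_sums(n, **sums), 3))  # 0.667
```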
(xvi) Find the coefficient of correlation between age and playing habits
of the following students:
Age of Players: 16 17 18 19 20 21
No. of Students: 2,500 2,000 1,500 1,200 1,000 800
Regular Players: 2,250 1,200 1,050 480 250 120
(xvii) A panel of judges A and B graded seven dramatic performances by
independently awarding marks as given here.
Performance : 1 2 3 4 5 6 7
Marks by A: 46 42 44 40 43 41 45
Marks by B: 40 38 36 35 39 37 41
Show by means of coefficient of correlation whether the marks given
by them are correlated.
(xviii) Calculate the coefficient of correlation between height and weight
of the students using the following data:
Height Weight (lbs)
(inches) 90–100 100–110 110–120 120–130
50–55 4 7
55–60 6 10 7
60–65 10 12 7
65–70 8 6 13
Ans.
(viii) −0.790 (ix) 0.895 (x) 0.819 (xi) 0.512
(xii) 44.44 (xiii) CV: Statistics = 40%, Accountancy = 35.56% (xiv) 0.515 (xv) 0.667
(xvi) −0.958 (xvii) r = 0.750 (xviii) 0.583
LESSON 2
Regression Analysis
STRUCTURE
2.1 Learning Objectives
2.2 Introduction
2.3 Difference between Correlation and Regression
2.4 Principle of Least Squares
2.5 Methods of Regression Analysis
2.6 Properties of Regression Coefficients
2.7 Standard Error of an Estimate
2.8 Summary
2.9 Self-Assessment Questions
2.2 Introduction
Correlation establishes the degree and direction of the relationship between two or more variables. But we may be interested in estimating the value of an unknown variable on the basis of a known variable. If we know the indices of money supply and price level, we can find out the degree and direction of the relationship between these indices with the help of correlation. Regression, however, helps us determine what the general price level would be for a given supply of money.
Similarly, if we know that the price and demand of a commodity are correlated, regression helps us estimate the demand for that commodity at a given price. Hence, the statistical tool with the help of which we can estimate or predict the unknown variable from a known variable is called regression.
The term “Regression” means the act of returning or going back. It was first used by Sir Francis Galton in 1877 when he studied the relationship between the heights of fathers and sons. His study revealed a very interesting relationship. Tall fathers tend to have tall sons and short fathers short sons, but the average height of the sons of a group of tall fathers was less than that of the fathers, and the average height of the sons of a group of short fathers was greater than that of the fathers. The line describing this tendency of going back is called the “Regression Line”. Modern writers have started to use the term estimating line instead of regression line because the expression estimating line is clearer. According to Morris Myers Blair, regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
Regression analysis is a branch of statistical theory which is widely used
in all the scientific disciplines. It is a basic technique for measuring or
estimating the relationship among economic variables that constitute the
essence of economic theory and economic life. The uses of regression
analysis are not confined to economics and business activities. Its
applications are extended to almost all the natural, physical and social
sciences. The regression technique can be extended to three or more
variables but we shall limit ourselves to problems having two variables
in this lesson.
Regression analysis is of even greater practical use than correlation analysis. Some of the uses of regression analysis are given below:
(i) Regression analysis helps in establishing a functional relationship between two or more variables. Once this is established, it can be used for various analytical purposes.
(ii) With the use of electronic machines and computers, the effort of calculating regression equations, particularly those expressing multiple and non-linear relations, has been reduced considerably.
(iii) Since most of the problems of economic analysis are based on cause
and effect relationships, regression analysis is a highly valuable
tool in economic and business research.
(iv) Regression analysis is very useful for prediction purposes. Once
a functional relationship is established, the value of the dependent
variable can be estimated from given values of the independent
variable(s).
Fig. 1
Another line of regression, called the regression line of x on y, is drawn
amongst the same set of scatter dots in such a way that the sum of the
squares of the horizontal distances between the dots and the line is
minimised.
Fig. 2
Fig. 3
It is clear that the position of the regression line of x on y is not exactly
like that of the regression line of y on x. In the following figure, both
regression lines (y on x and x on y) are shown.
Fig. 4
When there is either perfect positive or perfect negative correlation
between the two variables, the two regression lines coincide and we have
only one line. The farther apart the two regression lines are, the lower
the degree of correlation, and vice versa. If the variables are
independent, correlation is zero and the lines of regression are at right
angles to each other. It should be noted that the regression lines cut
each other at the point of averages of x and y: if a perpendicular is
drawn from the point of intersection to the x-axis, we get the mean of
the x series, and if a horizontal line is drawn from that point to the
y-axis, we get the mean of the y series.
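As a quick numerical check that the two regression lines cut each other at the point of means, the sketch below fits both lines by least squares to five (x, y) pairs (the values used in the worked solution that follows) and solves them simultaneously. It is a minimal illustration in Python. The function names are hypothetical. The intersection formula breaks down when byx × bxy = 1, which is exactly the case of perfect correlation in which the two lines coincide.

```python
from statistics import fmean


def regression_lines(xs, ys):
    """Fit Y = a_yx + b_yx*X and X = a_xy + b_xy*Y by least squares."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx          # regression coefficient of Y on X
    b_xy = sxy / syy          # regression coefficient of X on Y
    return my - b_yx * mx, b_yx, mx - b_xy * my, b_xy


def intersection(a_yx, b_yx, a_xy, b_xy):
    """Solve the two regression equations simultaneously."""
    denom = 1 - b_xy * b_yx   # equals 1 - r**2, zero when |r| = 1
    if abs(denom) < 1e-12:
        raise ValueError("perfect correlation: the two lines coincide")
    x = (a_xy + b_xy * a_yx) / denom
    return x, a_yx + b_yx * x


xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
point = intersection(*regression_lines(xs, ys))
print(point, (fmean(xs), fmean(ys)))  # both approximately (6.0, 8.0)
```

Substituting one equation into the other is used here only because it gives a closed form. Reading the intersection off a graph, as described above, gives the same point.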
Solution :
Computation of Regression Equations

x      y      xy     x²     y²
6      9      54     36     81
2      11     22     4      121
10     5      50     100    25
4      8      32     16     64
8      7      56     64     49
Σx = 30   Σy = 40   Σxy = 214   Σx² = 220   Σy² = 340

The regression line of Y on X is expressed by an equation of the form
$$Y_c = a + bX$$
To determine the values of a and b, the following two normal equations
are solved:
$$\Sigma Y = Na + b\Sigma X$$
$$\Sigma XY = a\Sigma X + b\Sigma X^2$$
Substituting the values, we get
40 = 5a + 30b ...(i)
214 = 30a + 220b ...(ii)
Multiplying equation (i) by 6, we get
240 = 30a + 180b ...(iii)
214 = 30a + 220b ...(iv)
Subtracting equation (iv) from (iii):
–40b = +26
∴ b = –0.65
Substituting the value of b in equation (i):
40 = 5a + 30(–0.65)
5a = 40 + 19.5 = 59.5, or a = 11.9
Substituting the values of a and b in the equation, the regression line of Y on X is
$$Y_c = 11.9 - 0.65X$$
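The elimination carried out above can be reproduced in a few lines. The sketch below is in Python, and the function name fit_y_on_x is hypothetical. It forms the sums ΣX, ΣY, ΣXY and ΣX² and solves the two normal equations by Cramer's rule. This gives the same result as multiplying and subtracting by hand.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations
         ΣY  = N*a + b*ΣX
         ΣXY = a*ΣX + b*ΣX²
    for the line Yc = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx          # determinant of the coefficient matrix
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n            # back-substitute into the first equation
    return a, b


a, b = fit_y_on_x([6, 2, 10, 4, 8], [9, 11, 5, 8, 7])
print(round(a, 2), round(b, 2))  # 11.9 -0.65
```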
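$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$
Regression Coefficient of Y on X is
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
$$b_{yx} = \frac{\Sigma xy}{N\sigma_x\sigma_y} \times \frac{\sigma_y}{\sigma_x} = \frac{\Sigma xy}{N\sigma_x^2}$$
$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$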
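Example 3 : Calculate the regression coefficients from the data given below:
                      Series x    Series y
Average               25          22
Standard deviation    4           5
r = 0.8
Solution : The coefficient of regression of x on y is
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = 0.8 \times \frac{4}{5} = +0.64$$
The coefficient of regression of y on x is
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{5}{4} = +1.00$$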
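$$b_{yx} \times b_{xy} = r\frac{\sigma_y}{\sigma_x} \times r\frac{\sigma_x}{\sigma_y} = r^2$$
Taking the square root of both sides gives the coefficient of correlation:
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$
where r takes the common sign of the two regression coefficients.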
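Using the figures of Example 3, both coefficients and the identity byx × bxy = r² can be checked numerically. This is a small Python sketch with a hypothetical function name. The sign of r is taken from the common sign of the two coefficients.

```python
import math


def coefficients_from_summary(r, sd_x, sd_y):
    """Regression coefficients from r and the two standard deviations."""
    b_xy = r * sd_x / sd_y   # X on Y
    b_yx = r * sd_y / sd_x   # Y on X
    return b_xy, b_yx


b_xy, b_yx = coefficients_from_summary(r=0.8, sd_x=4, sd_y=5)
# Square root of the product recovers |r|; the sign follows the coefficients.
r_back = math.copysign(math.sqrt(b_xy * b_yx), b_xy)
print(round(b_xy, 2), round(b_yx, 2), round(r_back, 2))  # 0.64 1.0 0.8
```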
$$b_{yx} = \frac{\text{Cov}(X,Y)}{\sigma_x^2} \quad \text{and} \quad b_{xy} = \frac{\text{Cov}(X,Y)}{\sigma_y^2}$$
Hence both regression coefficients have the same sign as Cov(X, Y). The
covariance may be positive or negative, but the variances σx² and σy²
can never be negative. So when the covariance is positive, bxy and byx
are both positive, and when it is negative, both are negative.
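(iv) If one of the regression coefficients is greater than unity, the other
must be less than unity, because the coefficient of correlation cannot
exceed one in absolute value, i.e. r varies from –1 to +1. Since
$$b_{yx} \times b_{xy} = r^2 \le 1,$$
if $b_{yx} > 1$, then $b_{xy} \le \frac{1}{b_{yx}} < 1$.
Further, since the arithmetic mean of two positive numbers is never less
than their geometric mean, for positive regression coefficients
$$\frac{b_{yx} + b_{xy}}{2} \ge \sqrt{b_{yx} \times b_{xy}} = r$$
i.e. the arithmetic mean of the two regression coefficients is at least
as large as the coefficient of correlation.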
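The properties above can be illustrated with the five pairs X = 6, 2, 10, 4, 8 and Y = 9, 11, 5, 8, 7 from the earlier worked example, for which byx = –0.65 and bxy = –1.3. The Python sketch below is only an illustration on that data, not a proof. It shows that both coefficients share the sign of r and that their product equals r². It also shows that one coefficient exceeds unity in absolute value while the other does not. Finally, because both coefficients are negative here, their mean lies further from zero than r.

```python
import math
from statistics import fmean

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

mx, my = fmean(xs), fmean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # -26
sxx = sum((x - mx) ** 2 for x in xs)                      # 40
syy = sum((y - my) ** 2 for y in ys)                      # 20

b_yx = sxy / sxx                   # -0.65
b_xy = sxy / syy                   # -1.3
r = sxy / math.sqrt(sxx * syy)     # about -0.919

same_sign = (b_yx < 0) == (b_xy < 0) and (b_xy < 0) == (r < 0)
product_is_r2 = math.isclose(b_yx * b_xy, r * r)
one_above_unity = abs(b_xy) > 1 and abs(b_yx) < 1
mean_beyond_r = abs((b_yx + b_xy) / 2) >= abs(r)

print(same_sign, product_is_r2, one_above_unity, mean_beyond_r)
# True True True True
```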
We can compute the regression equations with the help of the regression
coefficients by the following equations:
1. Regression equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
2. Regression equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
We can explain this by taking an example:
Marks in    x =         x²     Marks in     y =         y²     xy
Eco (X)     (X – X̄)           Stats (Y)    (Y – Ȳ)
25          –7          49     43           +5          25     –35
28          –4          16     46           +8          64     –32
35          +3          9      49           +11         121    +33
32          0           0      41           +3          9      0
31          –1          1      36           –2          4      +2
36          +4          16     32           –6          36     –24
29          –3          9      31           –7          49     +21
38          +6          36     30           –8          64     –48
34          +2          4      33           –5          25     –10
32          0           0      39           +1          1      0
ΣX = 320    Σx = 0    Σx² = 140    ΣY = 380    Σy = 0    Σy² = 398    Σxy = –93
(a) Regression equation of X on Y
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{-93}{398} = -0.234$$
Since both the regression coefficients are negative, the value of r must
also be negative.
(c) Likely marks in Statistics when marks in Economics are 30:
Y = –0.664X + 59.248, where X = 30
Y = (–0.664 × 30) + 59.248 = 39.328, or about 39.
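The deviation method used for the Economics and Statistics marks can be sketched as follows. The sketch is in Python, and the function name is hypothetical. It computes Σxy, Σx² and Σy² about the actual means (32 and 38), forms both coefficients, and predicts the Statistics marks for Economics marks of 30. Because it keeps byx unrounded, it returns about 39.33 rather than 39.328.

```python
from statistics import fmean

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]     # X
stats = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # Y


def deviation_coefficients(xs, ys):
    """Return the means and both regression coefficients using actual means."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / syy, sxy / sxx    # (X̄, Ȳ, b_xy, b_yx)


mx, my, b_xy, b_yx = deviation_coefficients(eco, stats)
y_at_30 = my + b_yx * (30 - mx)   # regression of Y on X
print(mx, my, round(b_xy, 3), round(b_yx, 3), round(y_at_30, 2))
# 32.0 38.0 -0.234 -0.664 39.33
```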
Example 5 : The following scores were worked out from a test in
Mathematics and English in an annual examination.
                      Scores in          Scores in
                      Mathematics (x)    English (y)
Mean                  39.5               47.5
Standard deviation    10.8               16.8
r = +0.42
Find both the regression equations. Using these regression equations,
estimate the value of Y for X = 50 and the value of X for Y = 30.
Solution : Regression equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
$$X - 39.5 = 0.42 \times \frac{10.8}{16.8}(Y - 47.5) = 0.27(Y - 47.5)$$
X = 0.27Y – 12.825 + 39.5 = 0.27Y + 26.68
When Y = 30, X = (0.27 × 30) + 26.68 = 8.10 + 26.68 = 34.78
Regression equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$Y - 47.5 = 0.42 \times \frac{16.8}{10.8}(X - 39.5)$$
Y – 47.5 = 0.653(X – 39.5) = 0.653X – 25.79
or Y = 0.653X – 25.79 + 47.5 = 0.653X + 21.71
When X = 50,
Value of Y = (0.653 × 50) + 21.71 = 32.65 + 21.71 = 54.36
Thus the regression equations are:
Xc = 0.27Y + 26.68
Yc = 0.653X + 21.71
Value of X when Y = 30 is 34.78
Value of Y when X = 50 is 54.36
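When only the means, standard deviations and r are known, as in Example 5, each estimate is a one-line calculation. The following Python sketch uses hypothetical function names and keeps the slopes unrounded. It gives 54.36 for Y and 34.775 for X, which the text rounds to 34.78.

```python
def estimate_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Y on X: Y - Ȳ = r (σy/σx) (X - X̄)."""
    return mean_y + r * sd_y / sd_x * (x - mean_x)


def estimate_x(y, mean_x, mean_y, sd_x, sd_y, r):
    """X on Y: X - X̄ = r (σx/σy) (Y - Ȳ)."""
    return mean_x + r * sd_x / sd_y * (y - mean_y)


summary = dict(mean_x=39.5, mean_y=47.5, sd_x=10.8, sd_y=16.8, r=0.42)
print(round(estimate_y(50, **summary), 3))   # 54.36
print(round(estimate_x(30, **summary), 3))   # 34.775
```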
When the actual means of the variables X and Y come out in fractions,
taking deviations from the actual means becomes tedious, and it is
advisable to take deviations from assumed means. When deviations are
taken from assumed means, bxy and byx are given by
$$b_{xy} = \frac{\Sigma d_x d_y - \frac{(\Sigma d_x)(\Sigma d_y)}{N}}{\Sigma d_y^2 - \frac{(\Sigma d_y)^2}{N}}, \quad \text{where } d_x = X - A_x \text{ and } d_y = Y - A_y$$
(A_x and A_y being the assumed means of X and Y respectively)
The regression equation of X on Y is:
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
Similarly, the regression equation of Y on X is
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$b_{yx} = \frac{\Sigma d_x d_y - \frac{(\Sigma d_x)(\Sigma d_y)}{N}}{\Sigma d_x^2 - \frac{(\Sigma d_x)^2}{N}}$$
Let us try to understand with the help of an example :
Example 6 : You are given the data relating to purchases and sales.
Compute the two regression equations by the method of least squares and
estimate the likely sales when the purchases are 100.
Purchases : 62 72 98 76 81 56 76 92 88 49
Sales : 112 124 131 117 132 96 120 136 97 85
Solution :
Calculation of Regression Equations

Purchases   (X – 76)          Sales   (Y – 120)
X           dx      dx²       Y       dy      dy²      dxdy
62          –14     196       112     –8      64       +112
72          –4      16        124     +4      16       –16
98          +22     484       131     +11     121      +242
76          0       0         117     –3      9        0
81          +5      25        132     +12     144      +60
56          –20     400       96      –24     576      +480
76          0       0         120     0       0        0
92          +16     256       136     +16     256      +256
88          +12     144       97      –23     529      –276
49          –27     729       85      –35     1225     +945
Σdx = –10   Σdx² = 2250   Σdy = –50   Σdy² = 2940   Σdxdy = 1803

$$\bar{X} = A + \frac{\Sigma d_x}{N} = 76 - \frac{10}{10} = 75 \quad \text{and} \quad \bar{Y} = A + \frac{\Sigma d_y}{N} = 120 - \frac{50}{10} = 115$$
Regression Coefficients:
X on Y
$$b_{xy} = \frac{\Sigma d_x d_y - \frac{(\Sigma d_x)(\Sigma d_y)}{N}}{\Sigma d_y^2 - \frac{(\Sigma d_y)^2}{N}} = \frac{1803 - \frac{(-10)(-50)}{10}}{2940 - \frac{(-50)^2}{10}} = \frac{1753}{2690} = 0.652$$
Y on X
$$b_{yx} = \frac{\Sigma d_x d_y - \frac{(\Sigma d_x)(\Sigma d_y)}{N}}{\Sigma d_x^2 - \frac{(\Sigma d_x)^2}{N}} = \frac{1803 - \frac{(-10)(-50)}{10}}{2250 - \frac{(-10)^2}{10}} = \frac{1753}{2240} = 0.78$$
Regression equation of X on Y:
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
X – 75 = 0.652(Y – 115) = 0.652Y – 74.98
X = 0.652Y + 0.02
Regression equation of Y on X:
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
Y – 115 = 0.78(X – 75) = 0.78X – 58.5
Y = 0.78X + 56.5
When X = 100,
Y = 0.78 × 100 + 56.5 = 134.5
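The assumed-mean formulas used in Example 6 can be sketched in Python as below. The function name is illustrative, and the assumed means (76 and 120) are the ones chosen in the solution. Because the slope is kept unrounded (1753/2240 ≈ 0.7826), the estimate for purchases of 100 comes out at about 134.56 instead of 134.5.

```python
def assumed_mean_regression(xs, ys, a_x, a_y):
    """Regression coefficients from deviations about assumed means."""
    n = len(xs)
    dx = [x - a_x for x in xs]
    dy = [y - a_y for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(u * v for u, v in zip(dx, dy))
    s_dx2 = sum(u * u for u in dx)
    s_dy2 = sum(v * v for v in dy)
    numerator = s_dxdy - s_dx * s_dy / n
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)   # X on Y
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)   # Y on X
    mean_x, mean_y = a_x + s_dx / n, a_y + s_dy / n
    return mean_x, mean_y, b_xy, b_yx


purchases = [62, 72, 98, 76, 81, 56, 76, 92, 88, 49]
sales = [112, 124, 131, 117, 132, 96, 120, 136, 97, 85]
mx, my, b_xy, b_yx = assumed_mean_regression(purchases, sales, 76, 120)
likely_sales = my + b_yx * (100 - mx)
print(mx, my, round(b_xy, 3), round(b_yx, 3), round(likely_sales, 2))
# 75.0 115.0 0.652 0.783 134.56
```

Any assumed means give the same coefficients. The correction terms (Σdx)(Σdy)/N, (Σdx)²/N and (Σdy)²/N remove the effect of the choice.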
and
$$S_{xy} = \sqrt{\frac{\Sigma(X - X_c)^2}{N - 2}}$$
where Syx refers to the standard error of estimate of Y values on X values,
and Sxy refers to the standard error of estimate of X values on Y values.
Yc and Xc are the values of Y and X estimated from their respective
regression equations. N – 2 is used in the denominator so that the squared
standard error is an unbiased estimate of the variance about the regression
line. The usual explanation for this division by N – 2 is that the two
constants a and b were calculated from the original data, so two degrees
of freedom are lost. Degrees of freedom means the number of classes to
which values can be assigned at will without violating any restrictions.
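However, a simpler method of computing Syx and Sxy is to use the
following formulae:
$$S_{yx} = \sqrt{\frac{\Sigma Y^2 - a\Sigma Y - b\Sigma XY}{N - 2}}$$
$$S_{xy} = \sqrt{\frac{\Sigma X^2 - a\Sigma X - b\Sigma XY}{N - 2}}$$
where a and b are the constants of the regression equation of Y on X in
the first formula, and of the regression equation of X on Y in the second.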
The standard error of estimate measures the accuracy of the estimated
figures. The smaller the value of the standard error of estimate, the closer
the dots will be to the regression line and the better the estimates based
on the equation of this line. If the standard error of estimate is zero,
there is no variation about the line and the correlation is perfect.
Thus, with the help of the standard error of estimate, it is possible to
ascertain how good and representative the regression line is as a
description of the average relationship between two series.
Example 7 : Given the following data:
X : 6 2 10 4 8
Y : 9 11 5 8 7
and the two regression equations Y = 11.9 – 0.65X and X = 16.4 – 1.3Y,
calculate the standard errors of estimate, i.e. Syx and Sxy.
Solution :
We can calculate the Xc and Yc values from these regression equations.
X      Y      Yc      Xc     (Y – Yc)²    (X – Xc)²
6      9      8.0     4.7    1.00         1.69
2      11     10.6    2.1    0.16         0.01
10     5      5.4     9.9    0.16         0.01
4      8      9.3     6.0    1.69         4.00
8      7      6.7     7.3    0.09         0.49
ΣX = 30   ΣY = 40   ΣYc = 40   ΣXc = 30   Σ(Y – Yc)² = 3.10   Σ(X – Xc)² = 6.20
Thus we can calculate Syx and Sxy from the above calculated values.
$$S_{yx} = \sqrt{\frac{\Sigma(Y - Y_c)^2}{N - 2}} = \sqrt{\frac{3.1}{5 - 2}} = \sqrt{1.033} = 1.02$$
$$S_{xy} = \sqrt{\frac{\Sigma(X - X_c)^2}{N - 2}} = \sqrt{\frac{6.2}{5 - 2}} = \sqrt{2.067} = 1.44$$
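Both ways of computing the standard error of estimate can be checked against Example 7 in Python. The sketch assumes that a and b are the exact least-squares constants (11.9 and –0.65 for Y on X, 16.4 and –1.3 for X on Y). The shortcut formula relies on this, so it would not reproduce the direct result for arbitrarily rounded constants. Unrounded, Syx is about 1.017 and Sxy about 1.438.

```python
import math

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
n = len(xs)
a_yx, b_yx = 11.9, -0.65    # Y = 11.9 - 0.65X
a_xy, b_xy = 16.4, -1.3     # X = 16.4 - 1.3Y

# Direct method: squared deviations from the estimated values Yc and Xc.
sse_y = sum((y - (a_yx + b_yx * x)) ** 2 for x, y in zip(xs, ys))
sse_x = sum((x - (a_xy + b_xy * y)) ** 2 for x, y in zip(xs, ys))

# Shortcut method: ΣY² - aΣY - bΣXY (and the analogue for X).
sum_xy = sum(x * y for x, y in zip(xs, ys))
short_y = sum(y * y for y in ys) - a_yx * sum(ys) - b_yx * sum_xy
short_x = sum(x * x for x in xs) - a_xy * sum(xs) - b_xy * sum_xy

s_yx = math.sqrt(sse_y / (n - 2))
s_xy = math.sqrt(sse_x / (n - 2))
print(round(sse_y, 2), round(short_y, 2), round(s_yx, 3))   # 3.1 3.1 1.017
print(round(sse_x, 2), round(short_x, 2), round(s_xy, 3))   # 6.2 6.2 1.438
```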
2.8 Summary
Regression analysis deals with estimating values of one variable based on
the values of one or more other variables. The variable being estimated
is called dependent variable while the variable/s used to make estimates
is/are called independent variable/s. The simple regression analysis
involves one independent variable and one dependent variable. It is based
on the assumption of linear relationship between the two variables. The
relationship between variables is presented by means of a regression
equation which is obtained using the principle of least squares. For a
given set of data involving two variables, X and Y, we can derive two
regression equations: one treating Y as the dependent variable and the
other treating X as the dependent variable.
When the correlation between two variables is perfect, the two regression
equations are reversible because they both actually represent the same
line. The closer the two regression lines are to each other, the higher
the degree of correlation. The sign of the two regression coefficients is
always the same as the sign of the coefficient of correlation. Standard
error of estimate measures the variation around the regression line. A
small value of the standard error implies that the data cluster around
the regression line.
two regression lines intersect? If the two regression lines coincide,
what does it imply?
(iii) Explain the properties of the regression coefficients. Do you agree
that, for a given set of data, if each of the X values is multiplied by
5, then the regression coefficient of Y on X would also be multiplied
by 5, while the regression coefficient of X on Y would be reduced to
1/5th of its original value? Explain.
(iv) Explain the properties of regression coefficients. What is the difference
between Regression and Correlation Analysis?
(v) Given the following data:
X : 7 9 7 12 12 11 14 16
Y : 6 12 12 14 14 16 18 20
(a) Fit the regression equation of Y on X.
(b) Estimate the value of Y for X = 15.
(vi) Given the following information: ΣX = 56; ΣY = 40; ΣX² = 524;
ΣY² = 256; ΣXY = 364; and n = 8. Obtain the regression equation
of X on Y.
(vii) In the estimation of the regression equation of two variables X and
Y, the following results were obtained:
ΣX = 90; ΣY = 70; ΣX² = 6,360; ΣY² = 2,860; ΣXY = 3,900; and n
= 10
Obtain the two regression equations.
(viii) The following data relate to 50 workers of a factory in respect of
their experience (X) in months and time needed (Y) in minutes to
fit an apparatus.
Mean of X = 50
Mean of Y = 60
Standard deviation of X = 20
Standard deviation of Y = 20
Covariance (XY) = –100
Obtain the two regression coefficients and the regression equations.
Ans.
(v) Y = 0.861 + 1.194X; 18.78
(vi) X = –0.5 + 1.5Y
(vii) Y = 1.70 + 0.589X and X = –0.66 + 1.38Y
(viii) byx = bxy = –0.25, r² = 0.0625
(ix) (a) Y = 4 + 4X, X = 0.8 + 0.16Y (b) 140 crores (c) 24.8 crores
(x) (i) 64 crores, (ii) 87 lakhs
(xi) (i) Y = 9.9 + 1.35X (ii) 16.65, (iii) 3.006
(xii) Y = + 0.8 X and X = 2.778
(xiii) (a) 1.333 (b) 0.75 (c) SEyx = 0.5774, SExy = 1
(xiv) byx = 5.143, bxy = 0.125, Y = 146.86 + 5.143X, X = – 3.07 +
0.125Y
UNIT-4
LESSON 1
Index Numbers
STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Features of Index Numbers
1.4 Problems of Index Numbers
1.5 Methods of Constructing Index Numbers
1.6 Tests of Adequacy or Consistency
1.7 Chain Base Index
1.8 Splicing
1.9 Consumer Price Index
1.10 Index Number of Industrial Production
1.11 Limitations of Index Numbers
1.12 Construction of BSE Sensex and NSE Nifty
1.13 Summary
1.14 Self-Assessment Questions
and ability to represent trends and relative changes make them valuable
tools. Here are some common uses of index numbers:
Inflation Measurement: Consumer Price Index (CPI) and Producer
Price Index (PPI) are used to track inflation rates by measuring
changes in the prices of goods and services over time. These indexes
help policymakers and economists understand price trends and their
impact on the economy.
Economic Indicators: Index numbers are used to calculate economic
indicators such as Gross Domestic Product (GDP) deflator and
real GDP growth. These indicators provide insights into economic
performance while adjusting for inflation.
Cost-of-Living Adjustment: Index numbers help determine Cost-of-
Living Adjustments (COLA) for salaries, pensions, and benefits.
This ensures that payments keep pace with changes in the price
level, maintaining purchasing power.
items should be selected which are representative of the data; e.g. in
a consumer price index for the working class, items like scooters, cars,
refrigerators, cosmetics, etc. find no place. There is no hard and fast rule
regarding the number of commodities to be included while constructing
index numbers. The number of commodities should be large enough to permit
the law of inertia of large numbers to operate. At the same time, the number
should not be so large as to make the computation uneconomical or even
difficult. The number of commodities should therefore be reasonable. The
following points should be considered while selecting the items to be
included in the index:
(i) The items should be representative.
(ii) The items should be of a standard quality.
(iii) Non-tangible items should be excluded.
(iv) The items should be reasonable in number.
Fixed Base Method : Under this method a particular year is taken as the
base. Prices during that year are taken as equal to 100, and the prices of
other years are shown as percentages of the base-year prices. Thus if
indices for 1998, 1999, 2000 and 2001 are calculated with 1997 as the base
year, they are called fixed base indices.
Chain Base Method : Under this method, the relatives of each year are
calculated on the basis of the prices of the preceding year. The chain
base relatives so obtained are called link relatives. For example, if index
numbers are constructed for 1997, 1998, 1999, 2000 and 2001, then 1997
will be the base for 1998, 1998 will be the base for 1999, and so on.
(b) When geometric mean is used:
$$P_{01} = \text{AL}\left\{\frac{\Sigma \log\left(\frac{p_1}{p_0} \times 100\right)}{N}\right\}$$
where N refers to the number of items whose price relatives are averaged.
Example 2 : Calculate Index Numbers for 2011, 2012 and 2013 taking
2010 as base from the following data by average of relatives method.
Commodity 2010 2011 2012 2013
A 2 5 4 3
B 8 11 13 6
C 4 5 6 8
D 6 4 5 7
E 5 4 6 3
Solution :
Construction of Index Numbers based on Mean of Relatives
(Rel. = price relative, p/p0 × 100)
Commodity   2010          2011            2012            2013
            p0    Rel.    p1    Rel.      p2    Rel.      p3    Rel.
A           2     100     5     250.0     4     200.0     3     150.0
B           8     100     11    137.5     13    162.5     6     75.0
C           4     100     5     125.0     6     150.0     8     200.0
D           6     100     4     66.7      5     83.3      7     116.7
E           5     100     4     80.0      6     120.0     3     60.0
Total             500           659.2           715.8           601.7
P01 = Index with 2010 as base and 2011 as current year
$$P_{01} = \frac{\Sigma\left(\frac{p_1}{p_0} \times 100\right)}{N} = \frac{659.2}{5} = 131.84$$
Similarly, P02 = 715.8/5 = 143.16 and P03 = 601.7/5 = 120.34.
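The simple average of price relatives for all three current years of Example 2 can be sketched in Python as follows. It uses the standard-library functions statistics.fmean and statistics.geometric_mean (Python 3.8 or later). This lets it show both the arithmetic-mean version used in the text and the geometric-mean version of formula (b). When commodity D's relative is not rounded to 66.7, the 2011 figure is 131.83.

```python
from statistics import fmean, geometric_mean

prices = {                 # 2010, 2011, 2012, 2013
    "A": [2, 5, 4, 3],
    "B": [8, 11, 13, 6],
    "C": [4, 5, 6, 8],
    "D": [6, 4, 5, 7],
    "E": [5, 4, 6, 3],
}
years = [2010, 2011, 2012, 2013]


def simple_relatives_index(t, use_gm=False):
    """Average of price relatives (p_t / p_0 * 100) over all commodities."""
    relatives = [p[t] / p[0] * 100 for p in prices.values()]
    return geometric_mean(relatives) if use_gm else fmean(relatives)


am = {years[t]: simple_relatives_index(t) for t in range(1, 4)}
gm = {years[t]: simple_relatives_index(t, use_gm=True) for t in range(1, 4)}
for year in am:
    print(year, round(am[year], 2), round(gm[year], 2))
```

The geometric-mean index is never larger than the arithmetic-mean index for the same relatives.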
Advantages
The index number based on the simple average of price relatives
is not influenced by the units in which prices are quoted.
The index number based on this method is not influenced by extreme
price quotations.
Disadvantages
It gives equal importance to all the items and thus neglects their
relative importance in the group. This drawback can be removed
by taking the weighted average of the price relatives.
There is considerable difficulty in selecting an appropriate average.
The G.M., though difficult to compute, is theoretically a better average
than the arithmetic mean. However, because of its computational ease,
the arithmetic mean is used in practice.
Example 3 : Compute the Price Index and Quantity Index from the data given
below by Laspeyres’ method.
Items    Base year                 Current year
         Quantity     Price        Quantity     Price
A        6 units      40 paise     7 units      30 paise
B        4 units      45 paise     5 units      50 paise
C        0.5 units    90 paise     1.5 units    40 paise
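$$\text{Price Index } (P_{01}) = \frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times 100 = \frac{400}{465} \times 100 = 86.02$$
$$\text{Quantity Index } (Q_{01}) = \frac{\Sigma q_1 p_0}{\Sigma q_0 p_0} \times 100 = \frac{640}{465} \times 100 = 137.63$$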
(ii) Paasche’s Method: Under this method of calculating the price index, the
quantities of the current year are used as weights, instead of the base
year quantities used by Laspeyres. Symbolically,
$$P_{01} = \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1} \times 100$$
The steps for constructing an index according to Paasche’s method are:
(i) Multiply the current year price of each commodity by its current year
quantity (p1 × q1) and find the total of these products, Σp1q1.
(ii) Multiply p0 by q1 for each commodity and aggregate the products, Σp0q1.
(iii) Divide Σp1q1 by Σp0q1 and multiply the quotient by 100 to obtain the
price index. Similarly, the quantity index is calculated using the
current year prices as weights. Symbolically,
$$Q_{01} = \frac{\Sigma q_1 p_1}{\Sigma q_0 p_1} \times 100$$
Example 4 : From the data of the previous example, calculate (i) the Price
Index and (ii) the Quantity Index by Paasche’s method.
$$\text{Quantity Index } Q_{01} = \frac{\Sigma q_1 p_1}{\Sigma q_0 p_1} \times 100 = \frac{520}{400} \times 100 = 130$$
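Laspeyres’ and Paasche’s indices for the data of Examples 3 and 4 can be computed together. This Python sketch takes commodity C's base-year quantity as 0.5 units, which is the value consistent with the totals Σp0q0 = 465, Σp1q0 = 400, Σq1p0 = 640 and Σq0p1 = 400 used in the solutions. Prices are in paise. The tuple layout and the helper name agg are illustrative.

```python
# (q0, p0, q1, p1) per item; prices in paise
data = {
    "A": (6, 40, 7, 30),
    "B": (4, 45, 5, 50),
    "C": (0.5, 90, 1.5, 40),   # base quantity taken as 0.5 units
}


def agg(price_idx, qty_idx):
    """Σ p*q using the chosen columns (0 = q0, 1 = p0, 2 = q1, 3 = p1)."""
    return sum(row[price_idx] * row[qty_idx] for row in data.values())


p0q0, p1q0 = agg(1, 0), agg(3, 0)
p0q1, p1q1 = agg(1, 2), agg(3, 2)

laspeyres_price = p1q0 / p0q0 * 100        # base-year quantities as weights
laspeyres_qty = p0q1 / p0q0 * 100          # base-year prices as weights
paasche_price = p1q1 / p0q1 * 100          # current-year quantities as weights
paasche_qty = p1q1 / p1q0 * 100            # current-year prices as weights

print(round(laspeyres_price, 2), round(laspeyres_qty, 2))   # 86.02 137.63
print(round(paasche_price, 2), round(paasche_qty, 2))       # 81.25 130.0
```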
(iii) Fisher’s Ideal Index : Laspeyres used base-year quantities as
weights, whereas Paasche used current-year quantities as weights, for
computing the index number of prices. Fisher suggested that both the
current-year and the base-year quantities should be used: the geometric
mean of the two indices should be calculated, and that figure should be
the index number. Symbolically,
$$\text{Fisher's Price Index } P_{01} = \sqrt{\frac{\Sigma p_1 q_0}{\Sigma p_0 q_0} \times \frac{\Sigma p_1 q_1}{\Sigma p_0 q_1}} \times 100$$
i.e. Fisher’s Index = √(Laspeyres’ Index × Paasche’s Index)
On the other hand, if quantity indices are to be calculated by this
method, the geometric mean is taken of the quantity index with base-year
prices as weights and the quantity index with current-year prices as
weights. Symbolically,
$$\text{Fisher's Quantity Index } Q_{01} = \sqrt{\frac{\Sigma q_1 p_0}{\Sigma q_0 p_0} \times \frac{\Sigma q_1 p_1}{\Sigma q_0 p_1}} \times 100$$
Example 5 : Construct index numbers of prices and quantities from the
following data using Fisher’s method (2010 = 100).
              2010            2014
Commodity     Price   Qty.    Price   Qty.
A             2       8       4       6
B             5       10      6       5
C             4       14      5       10
D             2       19      2       13
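A Python sketch of Fisher’s price and quantity indices for the data of Example 5 follows. It is an illustration only, and the function names are hypothetical. The last line checks that the product of the two indices (taken as ratios) equals the value ratio Σp1q1/Σp0q0, which is the factor reversal property of Fisher’s formula.

```python
import math

# commodity: (p0, q0, p1, q1) for 2010 and 2014
data = {"A": (2, 8, 4, 6), "B": (5, 10, 6, 5), "C": (4, 14, 5, 10), "D": (2, 19, 2, 13)}

p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())   # 160
p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())   # 200
p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())   # 103
p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())   # 130


def fisher_price():
    """Geometric mean of Laspeyres' and Paasche's price indices."""
    return math.sqrt((p1q0 / p0q0) * (p1q1 / p0q1)) * 100


def fisher_quantity():
    """Geometric mean of the two quantity indices."""
    return math.sqrt((p0q1 / p0q0) * (p1q1 / p1q0)) * 100


P, Q = fisher_price(), fisher_quantity()
print(round(P, 2), round(Q, 2))                          # 125.61 64.69
print(math.isclose((P / 100) * (Q / 100), p1q1 / p0q0))  # True
```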
$$\text{Weighted Index Number of the Current year} = \frac{\Sigma IV}{\Sigma V} = \frac{29779}{245} = 121.5$$
In the weighted average of relatives, the geometric mean may be used
instead of the arithmetic mean. The weighted geometric mean of relatives
is calculated by applying logarithms to the relatives. When this mean is
used, the formula is:
$$P_{01} = \text{Antilog}\left\{\frac{\Sigma V \log I}{\Sigma V}\right\}, \quad \text{where } I = \frac{p_1}{p_0} \times 100 \text{ and } V = p_0 q_0$$
Solution :
Calculation of Index Number
Commodities   p0     q0    p1     V = p0q0   I = (p1/p0) × 100   log I     V log I
X             3.0    20    4.0    60         133.33              2.1249    127.494
Y             1.5    40    1.6    60         106.7               2.0282    121.692
Z             1.0    10    1.5    10         150.0               2.1761    21.761
ΣV = 130   ΣV log I = 270.947
By applying the formula:
$$P_{01} = \text{AL}\left\{\frac{\Sigma V \log I}{\Sigma V}\right\} = \text{AL}\left(\frac{270.947}{130}\right) = \text{AL}\ 2.084 = 121.3$$
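The weighted arithmetic and geometric means of price relatives for the three commodities above can be sketched in Python as below, with V = p0q0 as weights. Computed from unrounded logarithms, the geometric-mean index comes to about 121.38; the text's 121.3 reflects rounding the mean logarithm to 2.084. The arithmetic version ΣIV/ΣV is shown for comparison.

```python
import math

# commodity: (p0, q0, p1)
data = {"X": (3.0, 20, 4.0), "Y": (1.5, 40, 1.6), "Z": (1.0, 10, 1.5)}

weights = [p0 * q0 for p0, q0, p1 in data.values()]          # V = p0*q0
relatives = [p1 / p0 * 100 for p0, q0, p1 in data.values()]  # I = p1/p0*100

# Weighted arithmetic mean of relatives: ΣIV / ΣV
weighted_am = sum(i * v for i, v in zip(relatives, weights)) / sum(weights)

# Weighted geometric mean: antilog of ΣV·log I / ΣV
mean_log = sum(v * math.log10(i) for i, v in zip(relatives, weights)) / sum(weights)
weighted_gm = 10 ** mean_log

print(round(weighted_am, 2), round(weighted_gm, 2))   # 122.31 121.38
```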
(iv) Factor Reversal Test: This test requires that the product of the price
index and the quantity index should equal the value index. In the words of
Fisher, just as each formula should permit the interchange of the two
times without giving inconsistent results, so it should permit
interchanging the prices and quantities without giving inconsistent
results, i.e. the two results multiplied together should give the
true value ratio. In other words, the change in price multiplied
by the change in quantity should equal the total change in value. If
P01 is the price index for the current year with reference to the base year
and Q01 is the quantity index for the current year, then
$$P_{01} \times Q_{01} = \frac{\Sigma p_1 q_1}{\Sigma p_0 q_0}$$
Interchanging p and q in Fisher’s price index formula gives
$$Q_{01} = \sqrt{\frac{\Sigma q_1 p_0}{\Sigma q_0 p_0} \times \frac{\Sigma q_1 p_1}{\Sigma q_0 p_1}}$$
Thus we find that the product of the price ratio and the quantity ratio
equals the value ratio:
1.5 × 1.4 = 2.1
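The Python sketch below uses the aggregates of Examples 3 and 4 (with commodity C's base-year quantity taken as 0.5 units, which is consistent with the totals used there). It shows that Laspeyres’ price and quantity indices do not satisfy the factor reversal test, while Fisher’s do. Only the four aggregates are needed, and the variable names are illustrative.

```python
import math

# Aggregates from Examples 3 and 4 (prices in paise)
p0q0, p1q0, p0q1, p1q1 = 465, 400, 640, 520
value_ratio = p1q1 / p0q0

laspeyres_p, laspeyres_q = p1q0 / p0q0, p0q1 / p0q0
paasche_p, paasche_q = p1q1 / p0q1, p1q1 / p1q0
fisher_p = math.sqrt(laspeyres_p * paasche_p)
fisher_q = math.sqrt(laspeyres_q * paasche_q)

print(round(laspeyres_p * laspeyres_q, 4), round(value_ratio, 4))  # 1.184 1.1183
print(math.isclose(fisher_p * fisher_q, value_ratio))              # True
```

Algebraically, the product of Fisher's two indices reduces to √((Σp1q1)²/(Σp0q0)²), which is exactly the value ratio. The Laspeyres product has no such cancellation.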
The various formulas discussed so far assume that the base period is some
fixed previous period. The index of a given year on a given fixed base is
not affected by changes in the prices or quantities of any other year.
In the chain base method, on the other hand, the value of each period
is related to that of the immediately preceding period and not to any
fixed period. To construct index numbers by the chain base method, a
series of index numbers is computed for each year with the preceding year
as the base. These index numbers are known as link relatives. When the
link relatives are multiplied successively (a step known as the chaining
process), they are linked to a common base. The products obtained are
expressed as percentages and give the required index number. The steps of
the chain base index are:
(i) Express the figure of each period as a percentage of the preceding
period to obtain the link relatives (LR).
(ii) Chain these link relatives together by successive multiplication
to get the chain indices, using the formula:
$$\text{Chain Base Index (CBI)} = \frac{\text{Current year LR} \times \text{Preceding year CBI}}{100}$$
(iii) The chain index can be converted into a fixed base index by the
formula:
$$\text{Fixed Base Index (FBI)} = \frac{\text{Current year LR} \times \text{Previous year FBI}}{100}$$
Chain relatives are computed from link relatives, whereas fixed base
relatives are computed directly from the original data. For a single
series, the results obtained by the fixed base and chain base methods
are the same.
We shall understand the process by taking some examples.
Example 8 : Construct Index Numbers by chain base method from the
following data of wholesale prices.
Year : 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Prices : 75 50 65 60 72 70 69 75 84 80
Solution :
Computation of Chain Index
Year   Price   Link Relatives          Chain Base Index             Fixed Base Index
2005   75      100                     100                          100
2006   50      (50/75) × 100 = 66.67   (66.67 × 100)/100 = 66.67    (50/75) × 100 = 66.67
2007   65      (65/50) × 100 = 130     (130 × 66.67)/100 = 86.67    (65/75) × 100 = 86.67
2008   60      (60/65) × 100 = 92.31   (92.31 × 86.67)/100 = 80.00  (60/75) × 100 = 80
2009   72      (72/60) × 100 = 120     (120 × 80)/100 = 96.00       (72/75) × 100 = 96
2010   70      (70/72) × 100 = 97.22   (97.22 × 96)/100 = 93.33     (70/75) × 100 = 93.33
2011   69      (69/70) × 100 = 98.57   (98.57 × 93.33)/100 = 92.00  (69/75) × 100 = 92
2012   75      (75/69) × 100 = 108.70  (108.70 × 92)/100 = 100.00   (75/75) × 100 = 100
2013   84      (84/75) × 100 = 112     (112 × 100)/100 = 112.00     (84/75) × 100 = 112
2014   80      (80/84) × 100 = 95.24   (95.24 × 112)/100 = 106.67   (80/75) × 100 = 106.67
It may be seen that the index numbers obtained by the chain base and fixed
base methods are the same.
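The three columns of the table can be generated mechanically. The Python sketch below uses hypothetical function names. It computes link relatives from the prices, chains them with CBI = LR × previous CBI / 100, and compares the result with the fixed base indices. The same chain_index function also applies directly to given link relatives, as in Example 9 below.

```python
def link_relatives(prices):
    """Each period as a percentage of the preceding one (first period = 100)."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]


def chain_index(links):
    """Chain base index: current LR x preceding chain index / 100."""
    chain = [links[0]]
    for lr in links[1:]:
        chain.append(lr * chain[-1] / 100)
    return chain


def fixed_base(prices):
    """Each period as a percentage of the first period."""
    return [p / prices[0] * 100 for p in prices]


prices = [75, 50, 65, 60, 72, 70, 69, 75, 84, 80]      # 2005 to 2014
chained = chain_index(link_relatives(prices))
fixed = fixed_base(prices)
print([round(c, 2) for c in chained])
print(all(abs(c - f) < 1e-9 for c, f in zip(chained, fixed)))   # True

ex9 = chain_index([100, 105, 95, 115, 102])
print([round(c, 2) for c in ex9])   # [100, 105.0, 99.75, 114.71, 117.01]
```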
Example 9 : Construct chain index numbers from the link relatives given
below:
Year : 2011 2012 2013 2014 2015
Link Relatives : 100 105 95 115 102
Solution :
Calculations for Chain Base Index
Year   Link Relatives   Chain Index Number
2011   100              100
2012   105              (105 × 100)/100 = 105
2013   95               (95 × 105)/100 = 99.75
2014   115              (115 × 99.75)/100 = 114.71
2015   102              (102 × 114.71)/100 = 117.00

2012   10   (10/10) × 100 = 100   (100/150) × 100 = 67
2013   12   (12/10) × 100 = 120   (120/150) × 100 = 80
2015   21   (21/10) × 100 = 210   (210/150) × 100 = 140
2016   20   (20/10) × 100 = 200   (200/150) × 100 = 133
1.8 Splicing
On several occasions a change of base year may cause a discontinuity in a
series of index numbers. We would always like to compare figures with a
recent year and not with the distant past. For example, the weights of
an index number may become out of date, and we may construct another
index with new weights. Two indices would then appear, and it becomes
necessary to convert them into a continuous series. The procedure
employed for this conversion is known as splicing. The formulae are:
For Forward Splicing:
$$\text{Spliced Index Number} = \frac{\text{Old index of the New Base Year} \times \text{Index to be adjusted}}{100}$$
For Backward Splicing:
$$\text{Spliced Index Number} = \frac{100}{\text{Old index of the New Base Year}} \times \text{Index to be adjusted}$$
Example 11 : Splice the following two Index number series, A series
forward and B series backward:
Year : 2010 2011 2012 2013 2014 2015
Series A : 100 120 150 — — —
Series B : — — 100 110 120 150
Solution :
Splicing of two Index Number Series
Year   Series   Series   Index Numbers Spliced      Index Numbers Spliced
       A        B        forward to Series A        backward to Series B
2010   100                                          (100/150) × 100 = 66.67
2011   120                                          (100/150) × 120 = 80.00
2012   150      100      (150/100) × 100 = 150      (100/150) × 150 = 100.00
2013            110      (150/100) × 110 = 165
2014            120      (150/100) × 120 = 180
2015            150      (150/100) × 150 = 225
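The two splicing formulas amount to rescaling one series by the ratio of the two indices in the overlapping year. The Python sketch below uses hypothetical function names and reproduces Example 11, where 2012 is the overlap year and its old index is 150.

```python
def splice_forward(new_series, old_index_of_new_base):
    """Express the new series on the old base: old index x new / 100."""
    return [old_index_of_new_base * v / 100 for v in new_series]


def splice_backward(old_series, old_index_of_new_base):
    """Express the old series on the new base: 100 / old index x old."""
    return [100 / old_index_of_new_base * v for v in old_series]


series_a = [100, 120, 150]        # 2010-2012, base 2010
series_b = [100, 110, 120, 150]   # 2012-2015, base 2012

print(splice_forward(series_b, 150))                           # [150.0, 165.0, 180.0, 225.0]
print([round(v, 2) for v in splice_backward(series_a, 150)])   # [66.67, 80.0, 100.0]
```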
When the weighted relatives method is used, the family budgets of a
large number of people for whom the index is meant are carefully studied,
and the aggregate expenditure of an average family on various items is
estimated. These expenditures serve as the weights. In other words, the
weights are calculated by multiplying the base year prices and quantities
(p0q0). Price relatives are prepared for all the commodities and multiplied
by the weights. By applying the following formula, we can calculate the
Consumer Price Index.
$$\text{Consumer Price Index} = \frac{\sum IV}{\sum V}$$
where I = (p1/p0) × 100 and V = p0q0
Example 12 : Prepare the Consumer price index for 2013 on the basis
of 2010 from the following data by both methods.
Commodities  Quantity consumed (2010)  Price (2010)  Price (2013)
A            6                         5.75          6.00
B            6                         5.00          8.00
C            1                         6.00          9.00
D            6                         8.00          10.00
E            4                         2.00          1.50
F            1                         20.00         15.00
Solution :
Consumer Price Index by Aggregative Method
Commodities  q0   p0      p1      p1q0     p0q0
A            6    5.75    6.00    36.00    34.50
B            6    5.00    8.00    48.00    30.00
C            1    6.00    9.00    9.00     6.00
D            6    8.00    10.00   60.00    48.00
E            4    2.00    1.50    6.00     8.00
F            1    20.00   15.00   15.00    20.00
Total                             Σp1q0 = 174   Σp0q0 = 146.5
$$\text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{174}{146.5} \times 100 = 118.77$$
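Consumer Price Index by Weighted Relatives (Family Budget) Method
$$\text{Consumer Price Index} = \frac{\sum IV}{\sum V} = \frac{17400}{146.5} = 118.77$$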
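The short Python sketch below computes both versions of the Example 12 index from the data table. The function names are illustrative. It also shows why the two methods must agree. Each weighted relative is I × V = (p1/p0 × 100)(p0q0) = 100 p1q0, so ΣIV/ΣV reduces algebraically to Σp1q0/Σp0q0 × 100.

```python
# Example 12 data: commodity -> (q0 = quantity 2010, p0 = price 2010, p1 = price 2013)
DATA = {
    "A": (6, 5.75, 6.00),
    "B": (6, 5.00, 8.00),
    "C": (1, 6.00, 9.00),
    "D": (6, 8.00, 10.00),
    "E": (4, 2.00, 1.50),
    "F": (1, 20.00, 15.00),
}


def cpi_aggregative(data):
    """Aggregative expenditure method: sum(p1*q0) / sum(p0*q0) * 100."""
    num = sum(p1 * q0 for q0, p0, p1 in data.values())
    den = sum(p0 * q0 for q0, p0, p1 in data.values())
    return num / den * 100


def cpi_family_budget(data):
    """Weighted relatives: I = p1/p0 * 100, V = p0*q0, CPI = sum(I*V) / sum(V)."""
    sum_iv = sum((p1 / p0 * 100) * (p0 * q0) for q0, p0, p1 in data.values())
    sum_v = sum(p0 * q0 for q0, p0, p1 in data.values())
    return sum_iv / sum_v


print(round(cpi_aggregative(DATA), 2), round(cpi_family_budget(DATA), 2))
```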
$$\text{Index of Industrial Production} = \frac{\sum IW}{\sum W}$$
where I = (q1/q0) × 100 and W = relative importance of different outputs
Scrip   Company         Close    No. of shares    Full mkt. cap.  Free-float   Free-float mkt.   Weight in
code                    price    (normal)         (Rs. crore)     adj. factor  cap. (Rs. crore)  index (%)
532500  Maruti Suzuki   954.75   288,910,060      27,583.69       0.50         13,791.84         1.06
532868  DLF Limited     176.60   1,698,157,659    29,989.46       0.25         7,497.37          0.58
532532  Jaiprak Asso    51.90    2,126,433,182    11,036.19       0.55         6,069.90          0.47
Total                                             2,641,977.20                 1,298,919.37
culmination of successful experiences with these two indices and a series
of debates and discussions in the last few years.
The new methodology would align the Sensex with the best global practice
in index construction. A smooth transition from full market capitalisation
to Free-float market capitalisation methodology would ensure that the
basic characteristics of Sensex are retained. Importantly, the Free-float
methodology will further improve the benchmarking qualities of Sensex
while maintaining its historical continuity.
The following Free-float factors will be applied to the Sensex companies.
A Free-float factor of say 0.9 means that only 90% of the total market
capitalisation of that company would be taken into consideration for
index calculation.
Free-float Index
Currently all equity indices in India, except the BSE-TECk Index and
BANKEX, are calculated using the ‘full-market capitalisation’ methodology.
Under the ‘full-market capitalisation’ methodology, the total market
capitalisation of a company, irrespective of who is holding the shares,
is taken into consideration for computation of an index. However, if
instead of taking the total market capitalisation, only the Free-float market
capitalisation of a company is considered for index calculation, it is called
the Free-float methodology. Free-float market capitalisation is the market
value of that proportion of the total shares issued by the company which
is readily available for trading in the market. It generally excludes
promoters’ holding, government holding, strategic holding and other
locked-in shares, which would not come to the market for trading in the
normal course. Thus, the
market capitalisation of each company in a Free-float index is reduced
to the extent of its Free-float available in the market.
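The Python sketch below reproduces the three Sensex rows in the table above, to make the free-float arithmetic concrete. Full market capitalisation is close price × number of shares, converted to Rs. crore (1 crore = 10,000,000). Free-float market capitalisation is that figure × the free-float factor. The weight is the company's share of the total free-float capitalisation of Rs. 1,298,919.37 crore in the table's "Total" row. The variable names are illustrative.

```python
CRORE = 10_000_000
TOTAL_FREE_FLOAT_CAP = 1_298_919.37  # Rs. crore, "Total" row of the table

# (company, close price in Rs., no. of shares, free-float adjustment factor)
ROWS = [
    ("Maruti Suzuki", 954.75, 288_910_060, 0.50),
    ("DLF Limited", 176.60, 1_698_157_659, 0.25),
    ("Jaiprak Asso", 51.90, 2_126_433_182, 0.55),
]

results = {}
for company, price, shares, factor in ROWS:
    full_cap = price * shares / CRORE              # full market cap, Rs. crore
    free_float_cap = full_cap * factor             # only the freely tradable part
    weight = free_float_cap / TOTAL_FREE_FLOAT_CAP * 100
    results[company] = (full_cap, free_float_cap, weight)
    print(f"{company:<14} {full_cap:>10,.2f} {free_float_cap:>10,.2f} {weight:5.2f}")
```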
National Stock Exchange (NSE)
In order to provide a nationwide stock trading facility to investors and
to bring the Indian financial market in line with international markets, the
National Stock Exchange (NSE) was set up and it started its operations
by the end of 1993. Further, it started trading in debt instruments in
May, 1994 and in equity shares by the end of November 1994. The NSE
uses the electronic trading system and computerised settlement system.
This system is so designed that it can be extended to every corner of
the country through the medium of an electronic network. It was recently
accorded recognition as a stock exchange by the Department of Company
Affairs. The instruments traded are treasury bills, government securities
and bonds issued by public sector companies.
The exchange has two separate segments, i.e., the capital market segment
and the money market segment. The former is concerned with trading in
equity shares, convertible debentures and debt instruments such as
non-convertible debentures. The money market segment, also known as the
wholesale debt market segment, facilitates trading in debt, public sector
bonds, mutual fund units, treasury bills, government securities, call money
instruments, etc. The transactions in this segment are of high value. The
main participants in this market are usually banks, financial institutions
and other financial agencies.
NSE-50, NIFTY
The NSE-50 index, NIFTY, was launched by the National Stock Exchange
of India Limited (NSE) in April 1996, taking as base the closing prices
of November 3, 1995, when one year of operations of its capital market
segment was completed. According to the NSE, the
index was introduced with the objectives of:
1. reflecting market movement more accurately,
2. providing fund managers with a tool for measuring portfolio returns
vis-a-vis market returns, and
3. providing a basis for introducing index-based derivatives.
The index is based on the prices of shares of 50 companies (chosen
from among the companies traded on the NSE), each with a market
capitalisation of at least Rs. 500 crores and having a high degree of
liquidity. The methodology used for the computation of this index is
‘market capitalisation weightage’ as followed by the S&P-500. The base
value of the index has been set at 1000, and not the usual 100.
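A market-capitalisation-weighted index can be sketched in Python as the total current market capitalisation divided by the total base-period market capitalisation, multiplied by the base value (1000 for NIFTY). The two stocks and their market caps below are hypothetical and for illustration only. Real index providers also adjust the base market capitalisation for events such as new issues and changes in constituents, and this sketch ignores that.

```python
BASE_VALUE = 1000  # NIFTY's base value, as stated in the text


def cap_weighted_index(current_caps, base_caps, base_value=BASE_VALUE):
    """Index = total current market cap / total base-period market cap * base value."""
    return sum(current_caps.values()) / sum(base_caps.values()) * base_value


# Hypothetical two-stock market, market caps in Rs. crore (illustrative only).
base_caps = {"Stock X": 400.0, "Stock Y": 600.0}
current_caps = {"Stock X": 500.0, "Stock Y": 660.0}

level = cap_weighted_index(current_caps, base_caps)
# Each stock's weightage (%) is its share of the current total market cap.
weights = {s: c / sum(current_caps.values()) * 100 for s, c in current_caps.items()}
print(round(level, 2), {s: round(w, 2) for s, w in weights.items()})
```

A stock's influence on the index therefore grows with its market capitalisation, which is the sense in which the index is "weighted".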
S&P CNX NIFTY
The S&P CNX Nifty is the headline index on the National Stock Exchange
of India Ltd. (NSE). It includes 50 of the approximately 1,300 companies
listed on the NSE, captures approximately 60% of its equity market
capitalization and is a true reflection of the Indian stock market.
S&P CNX Nifty tracks the behaviour of a portfolio of blue chip companies,
the largest and most liquid Indian securities. It covers 25 sectors of the
Indian economy and offers investment managers exposure to the Indian
market in one efficient portfolio. The index has been trading since April
of 1996 and is well suited for benchmarking, index funds, and index-
based derivatives.
The S&P CNX Nifty index is owned and managed by the Indian Index
Services and Products Ltd. (IISL), with which Standard and Poor’s has a
consulting and licensing agreement. IISL is a joint venture between NSE
and CRISIL (formerly Credit Rating Information Services of India Ltd.).
Index Methodology
S&P CNX Nifty is maintained by IISL’s Index Policy Committee, which
manages policy and guidelines for all CNX (CRISIL/NSE) indices. This
Index Policy Committee follows a clear published set of rules for index
revision and meets quarterly to consider their application. Additionally, the
IISL’s Index Maintenance Sub-Committee reviews decisions about additions
and deletions to the index on a quarterly basis. Complete details of these
rules are available on the website at www.indices.standardandpoors.com.
NIFTY COMPOSITION
Sl.  Scrip          Equity Capital   Free Float  Weightage %  Beta  R²    Volatility  Monthly Returns  Impact Cost
(1)  (2)            (3)              (4)         (5)          (6)   (7)   (8)         (9)              (10)
1.   ACC            1,877,452,660    9,629       0.59         0.72  0.48  1.23        6.43             0.07
2.   Ambuja Cement  3,063,349,822    10,652      0.65         0.97  0.48  1.61        –3.22            0.08
3.   Axis Bank      4,116,997,330    34,573      2.10         1.36  0.74  1.86        3.72             0.07
4.   Bajaj Auto     2,893,670,200    19,722      1.20         0.75  0.49  1.07        4.44             0.06
5.   Bharti Airtel  18,987,650,480   52,650      3.20         0.76  0.46  2.05        10.72            0.07
6.   BHEL           4,895,200,000    29,067      1.77         0.86  0.58  1.99        –10.29           0.06
7.   BPCL           3,615,421,240    8,500       0.52         0.79  0.45  0.92        1.17             0.07
8.   CAIRN          19,022,340,290   11,439      0.70         0.59  0.40  1.79        –1.11            0.07
9.   CIPLA          1,605,842,714    15,558      0.95         0.71  0.51  1.26        –7.39            0.07
10.  DLF            3,395,150,248    8,380       0.51         1.42  0.66  2.59        9.76             0.07
11.  DR Reddy       846,959,090      19,974      1.22         0.59  0.46  1.55        3.53             0.08
12.  GAIL           12,684,774,000   20,712      1.26         0.62  0.46  1.34        4.51             0.08
13.  GRASIM         917,018,380      13,942      0.85         0.67  0.49  1.37        4.78             0.08
14.  HCL Tech       1,376,020,536    11,839      0.72         0.95  0.60  1.21        –1.61            0.06
15.  HDFC           2,936,038,650    89,641      5.45         1.19  0.71  1.26        –2.45            0.07
16.  HDFC Bank      4,667,711,360    87,080      5.30         1.06  0.74  1.06        –3.24            0.07
17.  Hero Honda     399,375,000      17,035      1.04         0.53  0.25  1.20        –4.91            0.06
18.  Hindal Co.     1,914,419,297    21,647      1.32         1.42  0.67  2.44        –6.70            0.06
19.  Hind Unilever  2,160,683,560    33,208      2.02         0.58  0.42  0.84        –5.72            0.05
20.  ICICI Bank     11,518,614,870   119,419     7.27         1.40  0.78  1.47        –5.29            0.06
21.  IDFC           14,627,715,770   15,139      0.92         1.47  0.71  2.45        –3.93            0.07
22.  INFY           2,870,938,460    133,825     8.14         0.83  0.62  1.38        –4.62            0.04
23.  ITC            7,738,144,280    110,948     6.75         0.75  0.54  1.26        2.66             0.06
24.  Jindal Steel   934,509,595      22,847      1.39         0.99  0.67  1.21        –9.87            0.06
25.  JP Associate   4,252,866,364    7,527       0.46         1.78  0.70  1.69        –17.73           0.07
26.  Kotak Bank     3,689,051,845    15,763      0.96         1.20  0.67  2.02        –7.16            0.08
27.  LT             1,220,046,436    92,399      5.62         1.18  0.73  1.68        –5.38            0.06
28.  M&M            3,069,874,195    33,234      2.02         1.20  0.65  1.99        2.69             0.07
29.  Maruti         1,444,550,300    15,963      0.97         0.82  0.56  1.14        4.03             0.06
30.  NTPC           82,454,644,000   22,507      1.37         0.75  0.57  1.23        –5.78            0.07
31.  ONGC           42,777,450,600   36,322      2.21         0.71  0.48  0.94        –1.82            0.07
32.  PNB            3,168,121,570    14,955      0.91         0.97  0.64  1.73        3.18             0.08
33.  Power Grid     46,297,253,530   14,879      0.91         0.53  0.45  0.96        –4.02            0.06
34.  Ranbaxy        2,106,816,150    8,231       0.50         0.89  0.50  1.29        –0.35            0.06
35.  RCom           10,320,134,405   6,736       0.41         1.22  0.47  4.16        6.11             0.08
36.  Rel Capital    2,456,328,000    6,484       0.39         1.24  0.57  1.54        –0.50            0.06
37.  Reliance       32,738,103,000   139,612     8.50         0.96  0.68  1.60        –7.85            0.06
38.  Re Infra       2,653,702,620    7,502       0.46         1.13  0.46  2.08        1.07             0.07
39.  R Power        28,051,264,660   6,149       0.37         0.98  0.52  1.19        –3.41            0.07
40.  SAIL           41,304,005,450   7,400       0.45         1.06  0.61  1.64        –8.11            0.07
41.  SBIN           6,349,989,910    60,447      3.68         1.25  0.69  1.35        –2.49            0.04
42.  SESAGOA        869,101,423      10,727      0.65         1.05  0.51  1.95        –2.48            0.06
43.  Siemens        680,589,800      7,881       0.48         0.52  0.33  1.17        3.86             0.07
44.  Ster           3,361,568,684    22,704      1.38         1.21  0.61  1.58        –5.05            0.06
45.  Sun Pharma     1,035,581,955    19,472      1.18         0.68  0.45  1.17        4.08             0.07
46.  Tata Motors    5,382,725,080    33,259      2.02         1.27  0.61  1.93        –4.64            0.05
47.  Tata Power     2,373,072,360    20,733      1.26         0.57  0.46  1.30        –2.07            0.07
48.  Tata Steel     9,592,144,500    37,547      2.28         1.15  0.70  1.14        –7.58            0.05
49.  TCS            1,957,220,996    57,746      3.51         0.95  0.61  1.15        –3.99            0.06
50.  WIPRO          4,911,272,378    19,741      1.20         0.83  0.56  1.37        –6.78            0.06
1.13 Summary
An index number measures relative changes in the value of some economic
variable or variables over a period of time. It is usually expressed with
the base taken as 100. Index numbers showing changes in the values of one
variable over time are called univariate, while those showing changes in a
group of variables are known as composite index numbers. The base of
an index may be either fixed or chained. In fixed base index numbers,
the base period is common to all periods, while in chain base indices,
each period takes its immediately preceding period as the base.
In order to compare two time series of price relatives, both series must
have the same base period. Composite index numbers may be simple or
weighted, and aggregative or average-of-relatives. A simple aggregative
price index shows the aggregate of the current year prices as a percentage
of the aggregate of the base year prices. In weighted aggregative price
indices, quantities are used as weights, whereas in weighted aggregative
quantity indices, prices are used as weights. The two basic weighted
aggregative price indices are Laspeyre’s index, which uses base year
quantities as weights, and Paasche’s index, which uses current year
quantities as weights.
Fisher’s index is the geometric mean of Laspeyre’s and Paasche’s
indices. There are four tests of adequacy of index numbers: the units
test, the time-reversal test, the factor-reversal test and the circular test.
Among the commonly used formulae, Fisher’s is the only one that
satisfies the first three of these tests. A value index is calculated as the
ratio of current year value to base year value, expressed as a percentage.
The purchasing power of the rupee varies inversely with the price index.
Splicing refers to joining two or more series of index numbers for the
sake of continuity.
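The Python sketch below checks two of the adequacy tests summarised above, using hypothetical prices and quantities for three commodities (not taken from the text). For the time-reversal test, the index for period 1 on base 0 multiplied by the index for period 0 on base 1 should equal 1. For the factor-reversal test, the price index multiplied by the quantity index should equal the value ratio Σp1q1/Σp0q0. Fisher's index passes both tests, while Laspeyre's generally fails the time-reversal test. The helper names are illustrative.

```python
from math import isclose, sqrt


def agg(prices, quantities):
    """Aggregate value sum(p * q)."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, p1, q0, q1):
    return agg(p1, q0) / agg(p0, q0)      # base-year quantities as weights


def paasche(p0, p1, q0, q1):
    return agg(p1, q1) / agg(p0, q1)      # current-year quantities as weights


def fisher(p0, p1, q0, q1):
    return sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))


# Hypothetical data for three commodities (not taken from the text).
p0, q0 = [10, 8, 5], [4, 6, 10]
p1, q1 = [12, 10, 4], [5, 5, 12]

L, P, F = (f(p0, p1, q0, q1) for f in (laspeyres, paasche, fisher))

# Time-reversal test: P01 x P10 = 1 (the periods swap roles for P10).
fisher_trt = fisher(p0, p1, q0, q1) * fisher(p1, p0, q1, q0)
laspeyres_trt = laspeyres(p0, p1, q0, q1) * laspeyres(p1, p0, q1, q0)

# Factor-reversal test: P01 x Q01 = sum(p1q1) / sum(p0q0).
# The quantity index is obtained by swapping the roles of prices and quantities.
value_ratio = agg(p1, q1) / agg(p0, q0)
fisher_frt = fisher(p0, p1, q0, q1) * fisher(q0, q1, p0, p1)

print(f"L={L * 100:.2f} P={P * 100:.2f} F={F * 100:.2f}")
print("Fisher time-reversal:", isclose(fisher_trt, 1.0))
print("Laspeyres time-reversal:", isclose(laspeyres_trt, 1.0))
print("Fisher factor-reversal:", isclose(fisher_frt, value_ratio))
```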
(v) Laspeyre’s formula uses base year quantities as weights both for
price and quantity index numbers.
(vi) Laspeyre’s price index can never be smaller than 100 in value.
(vii) Since it is not practical to include all commodities in the construction
of an index number, the sample of commodities should be selected
by the method of simple random sampling for reasons of objectivity.
(viii) For a given set of data, Fisher’s price index number cannot exceed
both Laspeyre’s and Paasche’s price indices.
(ix) Fixed base index numbers are also called link relatives while price
relatives is another name for chain base indices.
(x) In weighted average-of-relatives index, the weights used are either
the base year quantities or the current year quantities.
(xi) Paasche’s price index is calculated by using current year values as
weights.
(xii) Laspeyre’s price index always has greater value than Paasche’s price
index.
(xiii) Dorbish-Bowley index is equal to the arithmetic mean of Laspeyre’s
and Paasche’s price indices.
(xiv) The value index is given by the product of price and quantity index
numbers.
(xv) If an index satisfies the time-reversal test, it means that for that
index, P01 and P10 are reciprocals of each other.
(xvi) Simple aggregative index satisfies circular test.
(xvii) Fisher’s index is called ‘ideal’ because it satisfies all the tests of
adequacy of index numbers.
(xviii) Splicing refers to connecting two or more series of index numbers
for the purpose of continuity.
(xix) The purchasing power of rupee is inversely related to the price
index.
Ans.
(i) T (ii) T (iii) F (iv) F (v) F
(vi) F (vii) F (viii) T (ix) F (x) F
(xi) F (xii) F (xiii) T (xiv) F (xv) T
(xvi) T (xvii) F (xviii) T (xix) T
Year:                 2004  2005  2006  2007  2008  2009  2010  2011  2012
Price (Rs. per kg):   62    68    63    69    75    78    82    98    100
(a) Construct price index numbers taking (i) 2004 as base, and (ii)
2008 as base.
(b) Calculate chain base index numbers.
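Here is a Python sketch of the exercise above, using the price data exactly as given. A fixed base index expresses every price as a percentage of the base-year price. A chain base index expresses each year's price as a percentage of the preceding year's price. For the first year, which has no predecessor, this sketch assumes the usual convention of taking the chain index as 100.

```python
YEARS = list(range(2004, 2013))
PRICES = [62, 68, 63, 69, 75, 78, 82, 98, 100]  # Rs. per kg


def fixed_base(prices, years, base_year):
    """Each price as a percentage of the base-year price."""
    base = prices[years.index(base_year)]
    return [p / base * 100 for p in prices]


def chain_base(prices):
    """Each price as a percentage of the preceding year's price (first year taken as 100)."""
    out = [100.0]
    for prev, cur in zip(prices, prices[1:]):
        out.append(cur / prev * 100)
    return out


for label, series in [
    ("2004 = 100", fixed_base(PRICES, YEARS, 2004)),
    ("2008 = 100", fixed_base(PRICES, YEARS, 2008)),
    ("chain base", chain_base(PRICES)),
]:
    print(label, [round(v, 2) for v in series])
```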
(ix) The price index for 2006 stood at 100. It increased by 8 per cent
in 2007, decreased by 6 per cent in 2008, decreased by 2 per cent
in 2009, increased by 14 per cent in 2010, remained unchanged
in 2011, and increased by 12 per cent in 2012. Calculate index
numbers for the years 2006 through 2012 taking 2006 = 100, and
then shift the origin to 2008.
(x) Using the following link relatives, calculate price relatives taking
2005 = 100.
Year: 2005 2006 2007 2008 2009 2010 2011 2012
Link relative: 100 114 120 120 114 136 105 110
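Exercises (ix) and (x) both require chaining successive year-on-year changes into a single index. A minimal Python sketch follows; the helper names are illustrative. A link relative is the current value as a percentage of the previous one, so a percentage change c corresponds to a link relative of 100 + c. Each chain index is the previous chain index multiplied by the link relative / 100. To shift the base, divide every value by the new base year's value and multiply by 100.

```python
def chain_index(link_relatives):
    """Convert link relatives (each year as % of the previous year) into a chain
    index with the first year = 100. The first link relative is not used."""
    index = [100.0]
    for lr in link_relatives[1:]:
        index.append(index[-1] * lr / 100)
    return index


def shift_base(index, base_position):
    """Re-express an index so that the value at base_position becomes 100."""
    base = index[base_position]
    return [v / base * 100 for v in index]


# Exercise (x): link relatives for 2005-2012, 2005 = 100.
LINK_RELATIVES_X = [100, 114, 120, 120, 114, 136, 105, 110]
index_x = chain_index(LINK_RELATIVES_X)

# Exercise (ix): 2006 = 100, then percentage changes for 2007-2012,
# each turned into a link relative of 100 + change.
CHANGES_IX = [8, -6, -2, 14, 0, 12]
index_ix = chain_index([100] + [100 + c for c in CHANGES_IX])   # 2006..2012
index_ix_2008 = shift_base(index_ix, 2)                         # 2008 is the third year

print([round(v, 2) for v in index_x])
print([round(v, 2) for v in index_ix])
print([round(v, 2) for v in index_ix_2008])
```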
(xi) Using the following data, calculate the simple average-of-price-relatives
index by taking
(a) Prices of 2005 as the base
(b) Average prices as the base
Year   Price of commodity
       A    B    C    D    E
2005   20   12   40   24   36
2008   30   20   20   30   15
2012   40   10   20   42   15
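The following Python sketch of exercise (xi) takes the arithmetic mean of the five price relatives for each year. For part (b), the base price of each commodity is assumed to be the arithmetic mean of its three recorded prices; this is how "average prices" is read here.

```python
# Prices of commodities A-E (exercise xi)
PRICES = {
    2005: [20, 12, 40, 24, 36],
    2008: [30, 20, 20, 30, 15],
    2012: [40, 10, 20, 42, 15],
}


def simple_average_of_relatives(current, base):
    """Arithmetic mean of the price relatives (current / base * 100)."""
    relatives = [c / b * 100 for c, b in zip(current, base)]
    return sum(relatives) / len(relatives)


# (a) prices of 2005 as the base
base_2005 = PRICES[2005]
part_a = {yr: simple_average_of_relatives(p, base_2005) for yr, p in PRICES.items()}

# (b) average price of each commodity over the three years as the base
average_prices = [sum(col) / len(col) for col in zip(*PRICES.values())]
part_b = {yr: simple_average_of_relatives(p, average_prices) for yr, p in PRICES.items()}

print({yr: round(v, 2) for yr, v in part_a.items()})
print({yr: round(v, 2) for yr, v in part_b.items()})
```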
(xii) Calculate Laspeyre’s and Paasche’s price index numbers for the year
2012 using the following data about five commodities:
Year             A    B    C     D    E
Quantity 2010    12   20   180   36   48
Quantity 2012    9    32   244   25   60
(xvi) From the information given below about the consumer price index
number for a certain group of families in a city, obtain the percentage
weights assigned to (a) clothing, and (b) housing. The consumer
price index number is known to be 152.3.
Group    Food   Clothing   Housing   Fuel and electricity   Miscellaneous
Index    140    185        205       120                    156
Weight   60     ?          ?         8                      10
(xvii) The monthly income of a person is Rs. 21,000. It is given that the
consumer price index number for a particular month is 136. Find
out the amount spent by him on (i) food, and (ii) clothing.
Group Expenditure Index
Food ? 180
Rent 2,940 100
Clothing ? 150
Fuel and power 3,360 110
Miscellaneous 3,780 80
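Exercises (xvi) and (xvii) each reduce to two linear equations in two unknowns. The first equation says that the parts add up to the whole. The second says that Σ(index × weight) / Σ(weight) equals the given CPI. The Python sketch below solves both with Cramer's rule, using exact fractions. Two assumptions are made: in (xvi) the percentage weights add up to 100, and in (xvii) the five expenditures add up to the monthly income of Rs. 21,000. The helper name solve_2x2 is illustrative.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


# (xvi) Percentage weights sum to 100; CPI = sum(I * W) / sum(W) = 152.3.
known_w = 60 + 8 + 10
known_iw = 140 * 60 + 120 * 8 + 156 * 10
clothing_w, housing_w = solve_2x2(
    Fraction(1), Fraction(1), Fraction(100 - known_w),
    Fraction(185), Fraction(205), Fraction("152.3") * 100 - known_iw,
)

# (xvii) Expenditures sum to the income; CPI = sum(I * E) / sum(E) = 136.
income = 21000
known_e = 2940 + 3360 + 3780
known_ie = 100 * 2940 + 110 * 3360 + 80 * 3780
food_e, clothing_e = solve_2x2(
    Fraction(1), Fraction(1), Fraction(income - known_e),
    Fraction(180), Fraction(150), Fraction(136 * income - known_ie),
)

print(clothing_w, housing_w, food_e, clothing_e)
```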
(xviii) Owing to a sudden price disturbance, the consumer price index of a
working class in a certain area increased in a month by one-quarter
of what it was before, to 225. The index of food became 252 from
198, that of clothing from 185 to 205, that of fuel & lighting from
175 to 195, and that of miscellaneous from 138 to 212. The index
of rent, however, remained unchanged at 150. It was known that
the weights of clothing, rent and fuel & lighting were the same.
Find out the exact percentage weights of each of the groups.
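Exercise (xviii) has three unknowns: f (food), w (the common weight of clothing, rent, and fuel & lighting) and m (miscellaneous). Because 225 is one-quarter higher than the old index, the old CPI was 225 / 1.25 = 180. The three equations are then: the weights add up to 100, the old group indices give a CPI of 180, and the new group indices give a CPI of 225. The Python sketch below solves this system exactly with Cramer's rule. The helper names det3 and cramer3 are illustrative.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cramer3(a, b):
    """Solve the 3x3 linear system a x = b exactly with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        replaced = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det3(replaced), d))
    return solution


old_cpi = Fraction(225) / Fraction(5, 4)   # 225 is one-quarter above the old value

# Unknowns: f (food), w (clothing = rent = fuel & lighting), m (miscellaneous)
A = [
    [1, 3, 1],                           # f + 3w + m = 100
    [198, 185 + 150 + 175, 138],         # old group indices, sum(I * W) = 180 * 100
    [252, 205 + 150 + 195, 212],         # new group indices, sum(I * W) = 225 * 100
]
B = [100, int(old_cpi * 100), 225 * 100]

food, common, misc = cramer3(A, B)
print(food, common, misc)
```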
Ans.
(xi) (a) 2008: 106.67, 2012: 110.00 (b) 2005: 108.20, 2008: 95.96, 2012: 95.84
(xii) L = 94.52, P = 94.16, No
(xiii) (a) = 109.66 (b) = 107.61
(xiv) (a) = 325 (b) = 18500
(xvi) (a) Clothing = 10 (b) Housing = 12
(xvii) (i) Food = 8400 (ii) Clothing = 2520
(xviii) Food = 54, Clothing = Fuel & lighting = Rent = 10, Miscellaneous = 16
UNIT-5
LESSON 1
Time Series Analysis
STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Components of Time Series
1.4 Models of Time Series
1.5 Methods of Measuring Trend
1.6 Second Degree Parabola
1.7 Exponential Trend
1.8 Shifting the Trend Origin
1.9 Conversion of Annual Trend to Monthly Trend
1.10 Measurement of Seasonal Variations
1.11 Summary
1.12 Self-Assessment Questions
1.2 Introduction
When quantitative data are arranged in the order of their occurrence, the resulting statistical
series is called a time series. The values are usually recorded at equal time intervals: daily,
weekly, monthly, quarterly, half-yearly, yearly or any other time measure.
These components provide a basis for explaining past behaviour, and
they help us to predict future behaviour. The major tendency of each
component or constituent is largely due to causal factors. Therefore, a
brief description of the components and the causal factors associated with
each component is given before proceeding further.
and pricing of farm products. Manufacturers, bankers and merchants who
deal with farmers find their business taking on the same seasonal pattern
which characterises the agriculture of their area.
The second cause of seasonal variation is custom, education or tradition.
Traditional festivals such as Diwali, Christmas and Id produce marked
variations in business activity, travel, sales, gifts, finance, accidents and
vacationing.
The successful operation of any business requires that its seasonal
variations be known, measured and exploited fully. Frequently, the
purchase of seasonal items is made six months to a year in advance.
Departments with opposite seasonal changes are frequently combined in
the same firm to avoid dull seasons and to keep sales or production up
during the entire year.
Seasonal variations are measured as a percentage of the trend rather
than in absolute quantities. The seasonal index for any month (week,
quarter etc.) may be defined as the ratio of the normally expected value
(excluding the business cycle and erratic movements) to the corresponding
trend value. When cyclical movement and erratic fluctuations are absent
in a time series, such a series is called normal. Normal values thus
consist of the trend and seasonal components only. Thus, when normal
values are divided by the corresponding trend values, we obtain the
seasonal component of the time series.
This is the most commonly used model in the decomposition of time series.
(ii) Additive Model
There is another model called Additive model in which a particular
observation in a time series is the sum of these four components.
O = T + S + C + I
To prevent confusion between the two models, it should be made clear
that in the Multiplicative model S, C and I are indices expressed as
decimals (for example, 1.20 for 120 per cent), whereas in the Additive
model S, C and I are quantitative deviations about the trend, in the same
units as the data, attributable to seasonal, cyclical and irregular factors.
If in a multiplicative model, T = 500, S = 1.4, C = 1.20 and I = 0.7 then
O = T × S × C × I
By substituting the values we get
O = 500 × 1.4 × 1.20 × 0.7 = 588
In additive model, T = 500, S = 100, C = 25, I = –50
O = 500 + 100 + 25 – 50 = 575
The assumption underlying the two schemes of analysis is that there is
no interaction among the different constituents or components under the
additive scheme, whereas such interaction is very much present under
the multiplicative scheme. Time series analysis generally proceeds on the
assumption of the multiplicative formulation.
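The two worked examples above can be reproduced in a few lines of Python. The sketch also shows the practical difference between the models. Under the multiplicative model the trend is removed by dividing O by T, which leaves S × C × I as a ratio. Under the additive model it is removed by subtracting T, which leaves S + C + I in the original units. The function names are illustrative.

```python
def multiplicative(t, s, c, i):
    """O = T x S x C x I, with S, C and I as decimal indices."""
    return t * s * c * i


def additive(t, s, c, i):
    """O = T + S + C + I, with S, C and I in the same units as the data."""
    return t + s + c + i


o_mult = multiplicative(500, 1.4, 1.20, 0.7)
o_add = additive(500, 100, 25, -50)
print(round(o_mult, 2), o_add)

# Removing the trend: divide under the multiplicative model,
# subtract under the additive model.
print(round(o_mult / 500, 3), o_add - 500)
```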
(i) The original data are first plotted on graph paper.
(ii) The direction of the plotted data is carefully observed.
(iii) A smooth line is drawn through the plotted points.
While fitting a trend line by the freehand method, an attempt should be
made to ensure that the fitted curve conforms to the following conditions.
(i) The curve should be smooth: either a straight line or a combination
of long gradual curves.
(ii) The trend line or curve should be drawn through the graph of the
data in such a way that the areas below and above the trend line
are equal to each other.
(iii) The sum of the vertical deviations of the data above the trend line
must equal the sum of the deviations below the line.
(iv) The sum of the squares of the vertical deviations of the observations
from the trend should be the minimum.
Example 1 : Draw a time series graph relating to the following data and
fit the trend by freehand method :
Year    Production of Steel (million tonnes)
2007    20
2008    22
2009    30
2010    28
2011    32
2012    25
2013    29
2014    35
2015    40
2016    32
[Figure: Trend of Steel Production, showing the actual data and the freehand trend line; X-axis: Years, Y-axis: Production of Steel (million tonnes).]
The trend line drawn by the freehand method can be extended to project
future values. However, freehand curve fitting is too subjective to be
used as a basis for prediction.
Year : 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
Quantity : 239 242 238 252 257 250 273 270 268 288 284
Year : 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
Quantity : 282 300 303 298 313 317 309 329 333 327
Solution :
Year   Quantity   5-yearly moving total   5-yearly moving average
1995 239
1996 242
1997 238 1228 245.6
1998 252 1239 247.8
1999 257 1270 254.0
2000 250 1302 260.4
2001 273 1318 263.6
2002 270 1349 269.8
2003 268 1383 276.6
2004 288 1392 278.4
2005 284 1422 284.4
2006 282 1457 291.4
2007 300 1467 293.4
2008 303 1496 299.2
2009 298 1531 306.2
2010 313 1540 308.0
2011 317 1566 313.2
2012 309 1601 320.2
2013 329 1615 323.0
2014 333
2015 327
To simplify the calculation: obtain the total of the first five years' data.
Then subtract the first term from this total and add the sixth term to
obtain the total of the second to sixth terms. In this way, the difference
between the term to be included and the term to be omitted is added to
the preceding total in order to obtain the next successive total.
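Here is a Python sketch of the running-total shortcut just described, applied to the quantities in the example above. Each new 5-yearly total is the previous total plus the incoming term minus the outgoing term. Each total is centred on the middle year of its five years, so the first one belongs to 1997. The helper name is illustrative.

```python
# Quantity data of the example above, 1995-2015
QUANTITY = [239, 242, 238, 252, 257, 250, 273, 270, 268, 288, 284,
            282, 300, 303, 298, 313, 317, 309, 329, 333, 327]
START_YEAR = 1995
PERIOD = 5


def moving_totals(values, k):
    """k-period moving totals, each obtained from the previous one by adding
    the incoming term and dropping the outgoing term."""
    total = sum(values[:k])
    totals = [total]
    for i in range(k, len(values)):
        total += values[i] - values[i - k]
        totals.append(total)
    return totals


totals = moving_totals(QUANTITY, PERIOD)
averages = [t / PERIOD for t in totals]
first_centre = START_YEAR + PERIOD // 2     # 1997 for an odd period of 5

for offset, (t, avg) in enumerate(zip(totals, averages)):
    print(first_centre + offset, t, round(avg, 1))
```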
Example 3 : Fit a trend line by the method of four-yearly moving average
to the following time series data.
Year :                          2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
Sugar production (lakh tons) :  5    6    7    7    6    8    9    10   9    10   11   11
Solution :
Year  Sugar        4-yearly   4-yearly   2-yearly    2-yearly centred
      production   moving     moving     centred     moving average
      (lakh tons)  total      average    total       (trend values)
(1)   (2)          (3)        (4)        (5)         (6)
2001  5
2002  6
                   25         6.25
2003  7                                  12.75       6.375
                   26         6.50
2004  7                                  13.50       6.750
                   28         7.00
2005  6                                  14.50       7.250
                   30         7.50
2006  8                                  15.75       7.875
                   33         8.25
2007  9                                  17.25       8.625
                   36         9.00
2008  10                                 18.50       9.250
                   38         9.50
2009  9                                  19.50       9.750
                   40         10.00
2010  10                                 20.25       10.125
                   41         10.25
2011  11
2012  11
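With an even period such as 4, each moving average falls between two years. This is why the table places the 4-yearly totals and averages between the lines and then averages adjacent pairs to centre them. A minimal Python sketch of that two-step centring on the sugar data follows; the function name is illustrative.

```python
SUGAR = [5, 6, 7, 7, 6, 8, 9, 10, 9, 10, 11, 11]   # lakh tons, 2001-2012
START_YEAR = 2001


def centred_moving_average(values, k):
    """Centred moving average for an even period k.

    The k-period averages fall between two years, so adjacent pairs are
    averaged again (a 2-period moving average) to centre them on a year.
    """
    if k % 2:
        raise ValueError("centring is only needed for an even period")
    averages = [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


trend = centred_moving_average(SUGAR, 4)
first_year = START_YEAR + 4 // 2      # first centred value belongs to 2003
for offset, value in enumerate(trend):
    print(first_year + offset, value)
```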
Remark : Observe carefully the placement of totals, averages between
the lines.
Merits
1. This is a very simple method.
2. The method is flexible: if more data are added, the existing
calculations need not be altered; the new data only provide additional
trend values.
3. If the period of the moving average coincides with the period of
the cyclical fluctuations, the fluctuations automatically disappear.
Example 4: Fit a trend line to the following data by the method of
semi-averages:
Year                 2000  2001  2002  2003  2004  2005  2006
Sales (lakh units)   100   105   115   110   120   105   115
Solution: Since the data consist of seven years, the middle year (2003) is
left out, and the averages of the first three years and the last three years
are obtained. The average of the first three years is (100 + 105 + 115)/3
= 320/3 = 106.67, and the average of the last three years is
(120 + 105 + 115)/3 = 340/3 = 113.33.
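The solution above stops at the two semi-averages. A minimal Python sketch of the usual next step follows. Each semi-average is plotted against the middle year of its half (2001 and 2005), and the two points are joined by a straight line. The slope is (340/3 - 320/3) / (2005 - 2001) = 5/3 per year. Exact fractions are used so that the trend values come out without rounding.

```python
from fractions import Fraction

YEARS = [2000, 2001, 2002, 2003, 2004, 2005, 2006]
SALES = [100, 105, 115, 110, 120, 105, 115]   # lakh units

half = len(SALES) // 2                    # 3; with an odd count the middle year is dropped
x1 = Fraction(sum(YEARS[:half]), half)    # middle year of first half: 2001
x2 = Fraction(sum(YEARS[-half:]), half)   # middle year of second half: 2005
y1 = Fraction(sum(SALES[:half]), half)    # 320/3
y2 = Fraction(sum(SALES[-half:]), half)   # 340/3

slope = (y2 - y1) / (x2 - x1)             # change in trend per year
trend = {yr: y1 + slope * (yr - x1) for yr in YEARS}

print(float(y1), float(y2), float(slope))
print({yr: round(float(v), 2) for yr, v in trend.items()})
```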
It is because of this second condition that this method is known as the
method of least squares. It may be mentioned that a line fitted to satisfy
the second condition, will automatically satisfy the first condition.
The formula for a straight-line trend can most simply be expressed as
Yc = a + bX
where X represents time variable, Yc is the dependent variable for which
trend values are to be calculated and a and b are the constants of the
straight line to be found by the method of least squares.
Constant a is the Y-intercept: the distance between the origin (O) and
the point where the trend line intersects the Y-axis. It shows the value
of Y when X = 0. Constant b indicates the slope, which is the change in
Y for each unit change in X.
Let us assume that we are given observations of Y for n years, and that
we wish to find the values of the constants a and b such that the fitted
equation satisfies the two conditions laid down above. Mathematical
reasoning shows that, to obtain the values of a and b according to the
Principle of Least Squares, we have to solve the following two equations
(the normal equations) simultaneously.
$$\sum Y = na + b\sum X \qquad \text{...(i)}$$
$$\sum XY = a\sum X + b\sum X^2 \qquad \text{...(ii)}$$
Solving the two normal equations yields the following values for the
constants a and b:
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \frac{\sum Y - b\sum X}{n}$$
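The closed-form solution above translates directly into a short Python function. Because the data of Example 5 are not reproduced here, the sketch uses the nine observations of Example 6, with X counted from 1 (2008) to 9 (2016) instead of from the middle year. The point is that the long method gives the same slope as the short-cut method and the same trend value for 2012, even though the intercept differs because the origin is different. The function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Least squares straight line Yc = a + bX from the closed-form solution
    of the two normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Data of Example 6, with X = 1 for 2008, ..., X = 9 for 2016.
Y = [4, 7, 7, 8, 9, 11, 13, 14, 17]
X = list(range(1, len(Y) + 1))
a, b = least_squares_line(X, Y)
print(round(a, 4), round(b, 4))
```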
Least Squares Long Method : This method uses the two normal equations
above directly, without shifting the origin of the time variable to a
convenient middle year. It is illustrated by the following example.
Example 5 :
Fit a linear trend curve by the least-squares method to the following data :
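mean of the X values. The sum of the X values would then equal 0, and
the two normal equations would then simplify to
$$\sum Y = Na \quad \text{or} \quad a = \frac{\sum Y}{N} \qquad \text{...(i)}$$
$$\sum XY = b\sum X^2 \quad \text{or} \quad b = \frac{\sum XY}{\sum X^2} \qquad \text{...(ii)}$$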
Two cases of the short-cut method are given below. In the first case
there is an odd number of years, while in the second case the number of
observations is even.
Example 6 : Fit a straight line trend on the following data :
Year 2008 2009 2010 2011 2012 2013 2014 2015 2016
Y 4 7 7 8 9 11 13 14 17
Solution : Since we have 9 observations, the origin is taken at the middle
year, 2012, for which X is assumed to be 0.
Year Y X XY X2
2008 4 – 4 – 16 16
2009 7 – 3 – 21 9
2010 7 – 2 – 14 4
2011 8 – 1 – 8 1
2012 9 0 0 0
2013 11 1 11 1
2014 13 2 26 4
2015 14 3 42 9
2016 17 4 68 16
Total 90 0 88 60
Thus n = 9, ΣY = 90, ΣX = 0, ΣXY = 88 and ΣX² = 60.
Substituting these values in the two normal equations, we get
90 = 9a, or a = 90/9 = 10
88 = 60b, or b = 88/60 = 1.47
∴ Trend equation is : Yc = 10 + 1.47X
Inserting the various values of X, we obtain the trend values as below.
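The trend values can be generated directly from the column totals of the table. This sketch keeps b as the exact fraction 88/60 rather than the rounded 1.47, so its values may differ from hand-computed ones in the second decimal place. It also confirms the least-squares property that the deviations of the actual values from the trend sum to zero.

```python
from fractions import Fraction

n, sum_y, sum_xy, sum_x2 = 9, 90, 88, 60     # column totals from the table above
a = Fraction(sum_y, n)                       # a = ΣY / N = 10
b = Fraction(sum_xy, sum_x2)                 # b = ΣXY / ΣX² = 22/15

years = range(2008, 2017)
actual = [4, 7, 7, 8, 9, 11, 13, 14, 17]
trend = {yr: a + b * (yr - 2012) for yr in years}   # X = year - 2012

for yr, y in zip(years, actual):
    print(yr, y, f"{float(trend[yr]):.2f}")

deviation_total = sum(y - trend[yr] for yr, y in zip(years, actual))
print("sum of (Y - Yc):", deviation_total)
```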
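When X is measured from the middle of the series, so that ΣX = 0 and ΣX³ = 0, the normal equations for the parabola give
$$a = \frac{\Sigma Y - c\,\Sigma X^2}{N}, \qquad b = \frac{\Sigma XY}{\Sigma X^2}, \qquad c = \frac{N\,\Sigma X^2Y - \Sigma X^2\,\Sigma Y}{N\,\Sigma X^4 - (\Sigma X^2)^2}$$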
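These three formulas translate directly into code. The sketch below (function name hypothetical) refuses input whose X values are not centred, and checks itself on an exact parabola Y = 2 + 3X + 0.5X², a made-up series chosen only so that the expected constants are known in advance.

```python
def short_cut_parabola(x, y):
    """Fit Y = a + bX + cX² when X is coded so that ΣX = 0 and ΣX³ = 0."""
    if abs(sum(x)) > 1e-9 or abs(sum(xi ** 3 for xi in x)) > 1e-9:
        raise ValueError("X must be measured from the middle of the series")
    n = len(x)
    s_y = sum(y)
    s_x2 = sum(xi ** 2 for xi in x)
    s_x4 = sum(xi ** 4 for xi in x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    b = s_xy / s_x2
    c = (n * s_x2y - s_x2 * s_y) / (n * s_x4 - s_x2 ** 2)
    a = (s_y - c * s_x2) / n
    return a, b, c


x = [-2, -1, 0, 1, 2]                          # origin at the middle year
y = [2 + 3 * xi + 0.5 * xi ** 2 for xi in x]   # made-up exact parabola
result = short_cut_parabola(x, y)
print("a = %.2f, b = %.2f, c = %.2f" % result)
```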
Example 8 : The price of a commodity during 2010–2015 is given below. Fit a parabola Y = a + bX + cX² to these data and estimate the price of the commodity for the year 2016.
Year Price Year Price
2010 100 2013 140
2011 107 2014 181
2012 128 2015 192
Also plot the actual and trend values on a graph.
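Solution : To determine the values of a, b and c, we solve the following normal equations:
$$\Sigma Y = Na + b\,\Sigma X + c\,\Sigma X^2$$
$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2 + c\,\Sigma X^3$$
$$\Sigma X^2Y = a\,\Sigma X^2 + b\,\Sigma X^3 + c\,\Sigma X^4$$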
Year     Y      X     X²    X³     X⁴     XY      X²Y      Yc
2010     100    –2    4     –8     16     –200    400      97.744
2011     107    –1    1     –1     1      –107    107      110.426
2012     128    0     0     0      0      0       0        126.680
2013     140    +1    1     +1     1      +140    140      146.506
2014     181    +2    4     +8     16     +362    724      169.904
2015     192    +3    9     +27    81     +576    1,728    196.874
Totals   ΣY     ΣX    ΣX²   ΣX³    ΣX⁴    ΣXY     ΣX²Y     ΣYc
(N = 6)  848    3     19    27     115    771     3,099    848.134
848 = 6a + 3b + 19c ...(i)
771 = 3a + 19b + 27c ...(ii)
3,099 = 19a + 27b + 115c ...(iii)
Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i), we get
35b + 35c = 694 ...(iv)
Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting the second product from the first, we get
5,352 = 280b + 168c ...(v)
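As a check on the arithmetic, the three normal equations of Example 8 can be built and solved exactly with Python's `fractions` module. This is a sketch using our own Gauss-Jordan routine (the helper names are hypothetical), not a library solver. Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i) gives 35b + 35c = 1,542 − 848 = 694, and the exact solution is a = 4433/35 ≈ 126.657, b = 1263/70 ≈ 18.043 and c = 25/14 ≈ 1.786. With these constants the trend values sum to exactly 848; the Yc column in the table above, which totals 848.134, appears to have been computed from rounded constants.

```python
from fractions import Fraction


def parabola_normal_equations(x, y):
    """Matrix and right-hand side of the normal equations for Y = a + bX + cX²."""
    def s(k):      # ΣX^k
        return sum(Fraction(xi) ** k for xi in x)

    def sy(k):     # ΣX^k·Y
        return sum(Fraction(xi) ** k * yi for xi, yi in zip(x, y))

    matrix = [[len(x), s(1), s(2)],
              [s(1), s(2), s(3)],
              [s(2), s(3), s(4)]]
    return matrix, [sy(0), sy(1), sy(2)]


def gauss_jordan(matrix, rhs):
    """Solve matrix · v = rhs exactly by Gauss-Jordan elimination."""
    size = len(matrix)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]          # make the pivot 1
        for r in range(size):
            if r != col:                                   # clear the column elsewhere
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [row[size] for row in rows]


x = [-2, -1, 0, 1, 2, 3]                   # 2010 ... 2015, origin at 2012
price = [100, 107, 128, 140, 181, 192]
matrix, rhs = parabola_normal_equations(x, price)
a, b, c = gauss_jordan(matrix, rhs)
print("ΣY, ΣXY, ΣX²Y =", [int(v) for v in rhs])
print(f"a = {float(a):.3f}, b = {float(b):.3f}, c = {float(c):.3f}")
trend = [a + b * xi + c * xi ** 2 for xi in x]
print("trend values:", [round(float(t), 3) for t in trend], "total:", sum(trend))
print("estimate for 2016 (X = 4):", float(a + 4 * b + 16 * c))
```

Exact fractions avoid the rounding drift that creeps in when intermediate results are rounded by hand; for larger systems a floating-point solver would normally be used instead.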
totals by dividing the computed constant ‘a’ by 12 and the value of ‘b’ by 144. The justification for dividing ‘a’ and ‘b’ by 12 and 144 respectively is that the data are sums of 12 months, so both ‘a’ and ‘b’ must be divided by 12 to give monthly values; ‘b’ is then divided by 12 again so that the time unit of X is also one month, i.e., ‘b’ gives the monthly increment. Therefore the monthly trend equation becomes:
$$Y = \frac{a}{12} + \frac{b}{144}\,X$$
The annual trend equation can also be reduced to a quarterly trend equation, which is given by:
$$Y = \frac{a}{4} + \frac{b}{4 \times 4}\,X \quad\text{or}\quad Y = \frac{a}{4} + \frac{b}{16}\,X$$
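The conversion rule can be written as a small function. The sketch below (hypothetical name `annual_to_periodic`) assumes, as in the text, that Y is an annual total and X is in years; it leaves the origin unchanged, so any shift of origin to the middle of a month or quarter must still be made separately. It is checked against the figures of Example 13.

```python
def annual_to_periodic(a, b, periods_per_year):
    """Convert Y = a + bX (Y: annual total, X: years) to a per-period equation.

    a is divided by p so that Y becomes a per-period value, and b by p * p:
    once for the per-period value and once more because X is now counted in
    periods instead of years.  p = 12 gives monthly, p = 4 quarterly figures.
    """
    p = periods_per_year
    return a / p, b / (p * p)


monthly = annual_to_periodic(30, 3.6, 12)     # Example 13
quarterly = annual_to_periodic(30, 3.6, 4)
print("monthly:   Yc = %.4f + %.4fX" % monthly)
print("quarterly: Yc = %.4f + %.4fX" % quarterly)
```

If Y is already an average monthly value rather than an annual total, only the time unit changes, so a is left as it is and b alone is divided by 12.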
Example 13 : The trend of the annual sales of ABC Co. Ltd. is given
by the following equation :
Yc = 30 + 3.6X (origin 2012, X unit = 1 year, Y unit = annual sales)
Convert the equation to a monthly basis.
Solution : To convert an annual trend equation to a monthly basis, the value of ‘a’ is divided by 12 and the value of ‘b’ by 144. The equation on a monthly basis is
$$Y_c = \frac{30}{12} + \frac{3.6}{144}\,X = 2.5 + 0.025X$$
If the annual trend equation is of the second degree, the corresponding monthly trend equation is obtained by dividing ‘a’ by 12, ‘b’ by 144 and ‘c’ by 1,728 (the last being identical to dividing ‘c’ by 12 three times).
Example 14 : Convert the following annual trend equation to a monthly basis:
Yc = 10.6 + 0.8X + 0.64X²
Solution : To convert an annual trend equation of the second degree to a monthly basis, divide ‘a’ by 12, ‘b’ by 144 and ‘c’ by 1,728. Thus, the required equation will be:
$$Y_c = \frac{10.6}{12} + \frac{0.8}{144}\,X + \frac{0.64}{1728}\,X^2 = 0.8833 + 0.0056X + 0.00037X^2$$
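The same idea extends to any degree: the coefficient of X^k is divided by 12^(k+1). A minimal sketch (function name hypothetical), applied to Example 14 and, as before, leaving the origin unchanged:

```python
def annual_polynomial_to_monthly(coefficients):
    """[a, b, c, ...] of an annual-total trend -> monthly trend coefficients.

    The coefficient of X**k is divided by 12**(k + 1): a/12, b/144, c/1728, ...
    """
    return [coef / 12 ** (k + 1) for k, coef in enumerate(coefficients)]


a_m, b_m, c_m = annual_polynomial_to_monthly([10.6, 0.8, 0.64])   # Example 14
print(f"Yc = {a_m:.4f} + {b_m:.4f}X + {c_m:.5f}X²")
```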
variations is of use to those who are trying to remove the causes of seasonal fluctuations, or who are attempting to mitigate the problem by diversification, by offsetting opposing seasonal patterns, or by some other means.
Since the number of calendar days and working days varies from month to month, monthly figures that are based on daily quantities must be adjusted for these differences. No such adjustment is needed when we deal with the volume of inventories or of bank deposits, because such values are not influenced by the number of calendar days or working days in the month.
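The text does not prescribe a particular adjustment formula. One common approach, assumed here, is to rescale each monthly figure to a month of average length (days in the year divided by 12). The sketch below (hypothetical function name and hypothetical figures) uses Python's standard `calendar` module and handles calendar days only; a working-day adjustment would substitute a count of working days.

```python
import calendar


def adjust_to_average_month(year, month, value):
    """Rescale a monthly figure built from daily quantities to a month of average length.

    Assumption: the 'average' month has (days in the year / 12) calendar days.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    days_in_year = 366 if calendar.isleap(year) else 365
    return value * (days_in_year / 12) / days_in_month


# Hypothetical figures: the same daily rate of 10 units in a 28-day and a 31-day month
feb = adjust_to_average_month(2015, 2, 280)
mar = adjust_to_average_month(2015, 3, 310)
print(round(feb, 2), round(mar, 2))
```

After adjustment the two months show the same figure, which is the point: the raw difference between 280 and 310 came only from the length of the month.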
Three Reasons for Studying Seasonal Variations
1. To determine the effect of seasonal variations on the value of a given phenomenon.
2. To project past patterns into the future.
3. To eliminate the seasonal component from the series.
Methods of Measuring Seasonal Variations
1. Method of Simple Averages (Weekly, Monthly or Quarterly).
2. Ratio-to-Moving Average Method.
3. Ratio-to-Trend Method.
4. Link Relatives Method.
(v) Taking the average of the monthly averages as 100, compute the percentage of each monthly average as follows:
$$\text{Seasonal Index for January} = \frac{\text{Monthly average for January}}{\text{Average of monthly averages}} \times 100$$
If the total of each month is used instead of its average, the result is the same. The following example illustrates the method.
Example 16 : Monthly consumption of electric power (in million kWh) for street lighting in India during 2011–2015 is given below:
Year Jan. Feb. Mar. Apr. May June July Aug. Sept. Oct. Nov. Dec.
2011 318 281 278 250 231 216 223 245 269 302 325 347
2012 342 309 299 268 249 236 242 262 288 321 342 364
2013 367 328 320 287 269 251 259 284 309 345 367 394
2014 392 349 342 311 290 273 282 305 328 364 389 417
2015 420 378 370 334 314 296 305 330 356 396 422 452
Find the seasonal variations by the method of monthly averages.
Solution : Computation of Seasonal Indices by Monthly Averages
Month    Consumption of electric power (million kWh)    Monthly total   Five-yearly   Percentage
         2011    2012    2013    2014    2015           for 5 years     average       (seasonal index)
(1)      (2)     (3)     (4)     (5)     (6)            (7)             (8)           (9)
Jan.     318     342     367     392     420            1,839           367.8         116.1
Feb.     281     309     328     349     378            1,645           329.0         103.9
March    278     299     320     342     370            1,609           321.8         101.6
April    250     268     287     311     334            1,450           290.0         91.6
May      231     249     269     290     314            1,353           270.6         85.4
June     216     236     251     273     296            1,272           254.4         80.3
July     223     242     259     282     305            1,311           262.2         82.8
Aug.     245     262     284     305     330            1,426           285.2         90.1
Sept.    269     288     309     328     356            1,550           310.0         97.9
Oct.     302     321     345     364     396            1,728           345.6         109.1
Nov.     325     342     367     389     422            1,845           369.0         116.5
Dec.     347     364     394     417     452            1,974           394.8         124.7
Total                                                   19,002          3,800.4       1,200
Average                                                 1,583.5         316.7         100
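The computation in the table can be reproduced in a few lines. The sketch below reads the October 2013 figure as 345, which is the value consistent with the table's October total of 1,728 and average of 345.6; the indices are the monthly averages expressed as percentages of their overall average.

```python
consumption = {   # million kWh, 2011 to 2015 (October 2013 taken as 345)
    "Jan": [318, 342, 367, 392, 420], "Feb": [281, 309, 328, 349, 378],
    "Mar": [278, 299, 320, 342, 370], "Apr": [250, 268, 287, 311, 334],
    "May": [231, 249, 269, 290, 314], "Jun": [216, 236, 251, 273, 296],
    "Jul": [223, 242, 259, 282, 305], "Aug": [245, 262, 284, 305, 330],
    "Sep": [269, 288, 309, 328, 356], "Oct": [302, 321, 345, 364, 396],
    "Nov": [325, 342, 367, 389, 422], "Dec": [347, 364, 394, 417, 452],
}

monthly_avg = {m: sum(v) / len(v) for m, v in consumption.items()}
grand_avg = sum(monthly_avg.values()) / len(monthly_avg)      # average of monthly averages
index = {m: 100 * avg / grand_avg for m, avg in monthly_avg.items()}

for m in consumption:
    print(f"{m}: average {monthly_avg[m]:.1f}, seasonal index {index[m]:.1f}")
print(f"overall average {grand_avg:.1f}, indices total {sum(index.values()):.1f}")
```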
Example 17 : Prepare a monthly seasonal index from the following data, using the ratio-to-moving-average method:
Monthly Sales of XYZ Products Co. Ltd. (Rs.)
Month 2010 2011 2012
January 3,639 3,913 4,393
February 3,591 3,856 4,530
March 3,326 3,714 4,287
April 3,469 3,820 4,405
May 3,321 3,647 4,024
June 3,320 3,498 3,992
July 3,205 3,476 3,795
August 3,205 3,354 3,492
September 3,255 3,594 3,571
October 3,550 3,830 3,923
November 3,771 4,183 3,984
December 3,772 4,482 3,880
Solution :
Computation of Ratios to 12-month Centred Moving Averages for Sales (Rs.)
Year &   Sales    12-month   12-month   Centred 12-month   Ratio to moving
month    (Rs.)    moving     moving     moving average     average (%)
                  total      average
(1)      (2)      (3)        (4)        (5)                (6)
2010
Jan.     3,639
Feb.     3,591
March    3,326
April    3,469
May      3,321
June     3,320
                  41,424     3,452
July     3,205                          3,463              92.55
                  41,698     3,475
Aug.     3,205                          3,486              91.94
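The full computation, of which only the first rows are reproduced above, can be sketched in Python. The helper names are hypothetical, and the sketch assumes that the series starts in January and that the ratios for each month are averaged with the arithmetic mean (some texts use the median instead) before the indices are adjusted to total 1,200. Unrounded moving averages are used, so individual ratios may differ slightly from the table.

```python
def centred_moving_average(values, period=12):
    """Centred moving average for an even period: the mean of two successive
    period-length moving averages, placed against an observation; None where
    it cannot be computed (the first and last period/2 observations)."""
    half = period // 2
    averages = [sum(values[i:i + period]) / period
                for i in range(len(values) - period + 1)]
    centred = [None] * len(values)
    for i in range(len(averages) - 1):
        centred[i + half] = (averages[i] + averages[i + 1]) / 2
    return centred


def ratio_to_moving_average_indices(values, period=12):
    """Seasonal indices; values must start with the first season (January)."""
    centred = centred_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for i, (value, trend) in enumerate(zip(values, centred)):
        if trend is not None:
            ratios[i % period].append(100 * value / trend)
    means = [sum(r) / len(r) for r in ratios]        # arithmetic mean of the ratios
    scale = 100 * period / sum(means)                # make the indices total 1,200
    return [m * scale for m in means]


sales = [3639, 3591, 3326, 3469, 3321, 3320, 3205, 3205, 3255, 3550, 3771, 3772,   # 2010
         3913, 3856, 3714, 3820, 3647, 3498, 3476, 3354, 3594, 3830, 4183, 4482,   # 2011
         4393, 4530, 4287, 4405, 4024, 3992, 3795, 3492, 3571, 3923, 3984, 3880]   # 2012
centred = centred_moving_average(sales)
print(f"July 2010: centred MA {centred[6]:.1f}, ratio {100 * sales[6] / centred[6]:.2f}")
indices = ratio_to_moving_average_indices(sales)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
for month, si in zip(months, indices):
    print(month, round(si, 1))
```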
1.10.3 Ratio-to-Trend Method
The ratio-to-trend method is similar to ratio-to-moving-average method.
The only difference is the way of obtaining the trend values. Whereas in
the ratio-to-moving-average method, the trend values are obtained by the
method of moving averages, in the ratio-to-trend method, the corresponding
trend is obtained by the method of least squares.
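A compact sketch of the idea, with its assumptions stated plainly: the straight-line trend is fitted by least squares directly to the quarterly figures (time coded 0, 1, 2, ...), the ratios for each quarter are averaged with the arithmetic mean, and the averages are scaled to total 400. Textbook treatments often fit the trend to annual averages and derive quarterly trend values from it instead. The quarterly series and function names are hypothetical.

```python
def least_squares_trend(values):
    """Trend values from a straight line fitted by least squares (t = 0, 1, 2, ...)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = y_mean - b * t_mean
    return [a + b * t for t in range(n)]


def ratio_to_trend_indices(values, period=4):
    """Seasonal indices: average ratio (%) of actual to trend for each season."""
    trend = least_squares_trend(values)
    ratios = [[] for _ in range(period)]
    for i, (y, tr) in enumerate(zip(values, trend)):
        ratios[i % period].append(100 * y / tr)
    means = [sum(r) / len(r) for r in ratios]
    scale = 100 * period / sum(means)          # indices total 100 x period
    return [m * scale for m in means]


quarterly = [30, 40, 36, 34, 34, 52, 50, 44, 40, 58, 54, 48]   # hypothetical, 3 years
print([round(si, 1) for si in ratio_to_trend_indices(quarterly)])
```

A series that lies exactly on a straight line has no seasonal pattern, so every index comes out as 100; this is a quick sanity check on any implementation.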
Limitations:
If the cyclical changes in the time series are very wide, a trend fitted by least squares can never follow the actual data as closely as a 12-month moving average does. A seasonal index computed by the ratio-to-trend method will therefore contain more bias than one computed by the ratio-to-moving-average method.
(vi) Express the corrected chain relatives as percentages of their average. These give the required seasonal indices by the method of link relatives.
The following example illustrates the process.
Example 19 : Apply method of link relatives to the following data and
calculate seasonal indices.
Quarterly Figures
Quarter 2011 2012 2013 2014 2015
I 6.0 5.4 6.8 7.2 6.6
II 6.5 7.9 6.5 5.8 7.3
III 7.8 8.4 9.3 7.5 8.0
IV 8.7 7.3 6.4 8.5 7.1
Solution : Calculation of Seasonal Indices by Method of Link Relatives
Quarter
Year I II III IV
2011 — 108.3 120.0 111.5
2012 62.1 146.3 106.3 86.9
2013 93.2 95.6 143.1 68.8
2014 112.5 80.6 129.3 113.3
2015 77.6 110.6 109.6 88.8
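Only step (vi) of the procedure survives in this extract, so the earlier steps in the sketch below follow the usual textbook sequence and should be read as assumptions: link relatives, their arithmetic means, chain relatives starting from 100, a second chain relative for the first quarter, and a linear correction that spreads the discrepancy evenly. The data use 6.5 for quarter II of 2013 and 7.3 for quarter II of 2015, the values that reproduce the link relatives in the table (for example 6.5/6.8 × 100 = 95.6 and 8.0/7.3 × 100 = 109.6). The function name is hypothetical.

```python
def link_relative_indices(data):
    """Seasonal indices by link relatives; data = one list of season values per year."""
    seasons = len(data[0])
    flat = [v for year in data for v in year]
    # (i) link relatives: each value as a percentage of the preceding one
    links = [[] for _ in range(seasons)]
    for i in range(1, len(flat)):
        links[i % seasons].append(100 * flat[i] / flat[i - 1])
    # (ii) average link relative of each season (arithmetic mean assumed)
    avg = [sum(lr) / len(lr) for lr in links]
    # (iii) chain relatives, taking the first season as 100
    chain = [100.0]
    for s in range(1, seasons):
        chain.append(chain[-1] * avg[s] / 100)
    # (iv) a second chain relative for the first season, computed from the last one
    second_first = chain[-1] * avg[0] / 100
    # (v) correction: subtract d, 2d, 3d, ... from the later seasons
    d = (second_first - 100) / seasons
    corrected = [value - s * d for s, value in enumerate(chain)]
    # (vi) corrected chain relatives as percentages of their average
    mean = sum(corrected) / seasons
    return links, [100 * value / mean for value in corrected]


quarterly = [            # 2011 ... 2015, quarters I to IV
    [6.0, 6.5, 7.8, 8.7],
    [5.4, 7.9, 8.4, 7.3],
    [6.8, 6.5, 9.3, 6.4],
    [7.2, 5.8, 7.5, 8.5],
    [6.6, 7.3, 8.0, 7.1],
]
links, indices = link_relative_indices(quarterly)
print("seasonal indices:", [round(si, 1) for si in indices])
```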
Therefore, the product of the trend value for any period and the seasonal index for that period (divided by 100) gives an estimate of the normal activity during that period.
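Under the multiplicative model this statement amounts to simple arithmetic, sketched below with hypothetical numbers and function names. The reverse operation, dividing an actual value by its seasonal index and multiplying by 100, removes the seasonal effect.

```python
def normal_activity(trend_value, seasonal_index):
    """Trend value adjusted for season: T x S / 100 (S expressed as a percentage)."""
    return trend_value * seasonal_index / 100


def deseasonalise(actual_value, seasonal_index):
    """Actual value with the seasonal effect removed: Y / S x 100."""
    return actual_value / seasonal_index * 100


# Hypothetical quarter: trend value 250, seasonal index 112, actual value 266
print(normal_activity(250, 112))
print(round(deseasonalise(266, 112), 2))
```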
1.11 Summary
A time series refers to observations of a random variable, such as sales or employment, arranged in chronological order. The twin reasons for studying a time series are to understand the past behaviour of the data and to make forecasts for the future. There are four components of a time series: (i) secular trend, (ii) cyclical variations, (iii) seasonal variations, and (iv) irregular variations.
Secular trend refers to the general pattern of the values in a time series; it is the long-term tendency of the movement of the variable. Cyclical variations are caused by business cycles. A business cycle has four phases: (i) peak or prosperity, (ii) recession, (iii) trough or depression, and (iv) recovery. Seasonal variations, which are caused by weather, customs, festivals, etc., occur within a period of one year and repeat year after year. Irregular variations, or random fluctuations, result from unpredictable events such as strikes and natural or other calamities.
The two models used for decomposing a time series are (i) the additive model and (ii) the multiplicative model. The additive model assumes that the four components, taken to be independent of one another, add up to the time series. The multiplicative model assumes that a time series is the product of the four components. A linear trend is obtained by fitting a straight line to the given data on the principle of least squares. The origin of a trend equation can be shifted, giving Yt = a + b(X ± k). Annual trend equations can be converted to a monthly or quarterly basis, and the reverse is also possible. A parabolic trend involves fitting a second-degree parabola of the form Yt = a + bX + cX² to the given data. An exponential trend is appropriate where the variable under consideration grows or declines exponentially; it takes the form $Y_t = ab^X$.
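Because log Y = log a + X log b, the exponential trend is fitted by a least-squares straight line on the logarithms of Y, and the antilogarithms of the intercept and slope give a and b. A minimal sketch (function name hypothetical), checked on a made-up series that follows Y = 5 × 2^X exactly; the quantity (b − 1) × 100 is the average rate of growth per period.

```python
import math


def exponential_trend(x, y):
    """Fit Y = a * b**X by a least-squares straight line on log10(Y)."""
    n = len(x)
    logs = [math.log10(v) for v in y]
    x_mean = sum(x) / n
    log_mean = sum(logs) / n
    slope = (sum((xi - x_mean) * (li - log_mean) for xi, li in zip(x, logs))
             / sum((xi - x_mean) ** 2 for xi in x))
    intercept = log_mean - slope * x_mean        # log a
    return 10 ** intercept, 10 ** slope          # a, b = antilogs


a, b = exponential_trend([-2, -1, 0, 1, 2], [1.25, 2.5, 5, 10, 20])   # made-up: Y = 5 x 2**X
print(f"a = {a:.4f}, b = {b:.4f}, growth rate = {100 * (b - 1):.1f}% per period")
```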
The method of moving averages is another way of obtaining the trend. Beginning with a certain number of time periods, an average is calculated,
and successive averages are then calculated by dropping the first of the values and including the next one. Seasonal variations are measured and expressed as seasonal indices. The methods of simple averages, ratio-to-moving averages, ratio-to-trend and link relatives are used for this purpose.
(xiv) The value of ‘a’, the intercept of a trend equation, is related to its origin.
(xv) The trend values for various years given in the data and the projected
values do not change with a shift in the origin.
(xvi) The monthly trend equations can also be converted into annual trend
equations, and for this we first need to shift the origin of the trend
equation to July 1 of the year of origin.
(xvii) In exponential trend, a straight line trend is fitted to the log values
of the Y variable.
(xviii) The exponential trend is an example of non-linear trend.
(xix) Moving averages require centering whenever the underlying period,
n, used in their calculation is even.
(xx) A monthly sales budget can be drawn up by multiplying monthly
seasonal indices by average monthly sales and dividing each by
100.
Ans.
(i) T (ii) T (iii) F (iv) T (v) F
(vi) F (vii) F (viii) F (ix) T (x) T
(xi) F (xii) F (xiii) F (xiv) T (xv) T
(xvi) T (xvii) T (xviii) T (xix) T (xx) T
Exercise 2: Questions and Answers
(i) What is a time series? What are its components? With which
component of a time series would you mainly associate each of
the following?
(a) Wild cat strike in a factory, interrupting production for 15
days
(b) Increase in sales in a departmental store on Diwali
(c) An era of prosperity
(d) Fall in death rate due to advances in medical science.
(ii) What is meant by decomposition of a time series? Explain the
difference between additive and multiplicative models of analysing
time series.
(iii) Explain the rules of converting annual trend equation (i) on a
monthly basis, and (ii) on a quarterly basis. How can a quarterly
trend equation be converted on an annual basis?
(iv) How are seasonal variations measured under the multiplicative model of analysing time series? How are the seasonal indices interpreted?
(v) Explain the following methods of calculating seasonal indices:
(a) Method of Simple Averages
(b) Ratio-to-trend Method
(c) Ratio-to-moving Averages Method
(vi) The following data relates to gross ex-factory value (in Rs. crores)
of output of a factory over the last few years:
Year : 2006 2007 2008 2009 2010 2011 2012
Value : 320 360 368 332 376 396 368
(a) Fit a straight line trend by the method of least squares, taking
the year of origin as 2006.
(b) What is the average annual change in the value of output?
(c) Obtain trend equation using the year 2009 as the origin. How
does it compare with equation obtained in (a) above?
(vii) Demand (in ‘000 metric tonnes) for sugar of Sweet India is given
here:
Year : 2006 2007 2008 2009 2010 2011 2012
Demand : 77 88 94 85 91 98 90
(a) Fit a straight line trend by the method of least squares
(b) Calculate trend values and plot observed values and trend values
on a graph
(c) Eliminate trend component using the multiplicative model
(d) Obtain the forecast of demand for the year 2014.
(viii) Below are given figures of production of a sugar factory:
Year : 2005 2006 2007 2008 2009 2010 2011 2012
Production (‘000 tons) : 88 98 100 91 102 107 100 118
(a) Fit straight line trend to the above data by the method of least
squares
(b) What is the average annual change in the sugar production?
(c) Obtain trend values for various years. Show that the sum of the differences between actual and trend values is equal to zero.
(d) Eliminate the trend using the multiplicative model. What components are thus left over?
(e) Convert the trend equation on a month-to-month basis and shift
the origin to January 2006
(ix) For each of the following derive the monthly trend equation:
(a) Yt = 960 + 72X Origin: 2008, X Unit = 1 Year, Y unit = Annual
sales of coffee in Rs.
(b) Yt = 169.58 + 78X Origin: 2009, X Unit = 1 Year, Y unit =
Average monthly production
(c) Yt = 2,760 + 212X Origin: 2007, X Unit = 1/2 Year, Y unit =
Annual earnings in Rs.
(d) Yt= 72 + 12X Origin: 2010, X Unit = 1/2 Year, Y unit = Average
monthly production
(x) Given the trend equation:
Yt = 204 + 24X
(2008 = 0, X unit = 1 Year, Y unit = Average monthly values)
(a) Convert this equation on a monthly basis
(b) Shift the origin of the monthly trend equation to January, 2007
(c) Estimate the value for January 2010
(xi) Given the following trend equation:
Yt= 1,880+ 6X
[2009 = 0, X unit = 1 Year, Y unit = Average monthly sales (in
‘000 Rs.)]
(a) Convert this equation on a yearly basis
(b) Estimate sales for the year 2013
(c) Obtain a quarterly trend equation from (a) above
(d) Obtain quarterly trend equation with origin at I Quarter, 2010.
(xii) Given below is a trend equation:
Yt= 372 + 288X
(Origin 2006, X unit = 1 Year, Y unit = Annual sales) Convert the
above equation:
(a) To monthly trend equation with January 2007 as origin and
estimate sales for March, 2007.
(b) To quarterly trend equation with first quarter, 2007 as the origin and estimate sales for third quarter of 2007.
(xiii) Convert the following trend equation on a monthly basis and obtain
the trend value for November 2012:
Yt = 432 + 144X – 60X²
2010 = 0; X unit = 1 Year; Y unit = Yearly production in ‘000 units
(xiv) The sales made by a company in the years 2006 through 2012 are
given here:
Year : 2006 2007 2008 2009 2010 2011 2012
Sales (in millions of Rs.) : 30 38 75 90 88 140 188
(a) Fit an exponential trend $Y_t = ab^X$ to the data and obtain the trend equation.
(b) Plot the data on a graph and also plot the trend line.
(c) Find the projected sales for the year 2014.
(d) What is the average rate of growth of sales?
(xv) From the following data, estimate the trend values by taking 4-yearly
moving averages:
Year Sales (Rs. lakh) Year Sales (Rs. Lakh)
1993 200 1999 360
1994 120 2000 400
1995 280 2001 320
1996 240 2002 360
1997 160 2003 360
1998 320
(xvi) The trend equation for quarterly sales of a firm is estimated to
be as: Y = 20 + 2X, where Y is sales per quarter in millions of
rupees, unit of X is one quarter and the origin is the middle of the
first quarter (Jan.-Mar.) of 2005. The seasonal indices of sales for
the four quarters are given below:
Quarter : I II III IV
Seasonal Index: 120 105 85 90
Estimate the sales for each quarter of 2010.
(xvii) Calculate seasonal indices from the following data by ratio-to-moving
averages method:
(a) A man has five coins, one of which has two heads. He randomly takes out a coin and tosses it three times:
(i) What is the probability that it will fall head upward all the
times?
(ii) If it always falls head upward, what is the probability that it
is the coin with two heads?
(b) In a binomial distribution consisting of 5 independent trials, probabilities
of 1 and 2 successes are 0.4096 and 0.2048. Find the parameter ‘p’
of the distribution.
5. A food products company is contemplating three strategies: introducing a revolutionary new product with new packaging to replace the existing product at a much higher price (S1); making a moderate change in the composition of the existing product, with new packaging, at a small increase in price (S2); or making a small change in the composition of the existing product, adding the word “New”, at a negligible increase in price (S3). The three possible states of nature are:
(i) high increase in sales (N1),
(ii) no change in sales (N2), and
(iii) decrease in sales (N3).
The marketing department of the company worked out the payoffs in terms of yearly net profits for each strategy under each of these events. These are shown in Table 11 below:
State of Nature
Payoffs (in Rs.)
Strategies N1 N2 N3
S1 7,00,000 3,00,000 1,50,000
S2 5,00,000 4,50,000 0
S3 3,00,000 3,00,000 3,00,000
Which strategy should the executive choose on the basis of:
(i) Maximin Criterion
(ii) Maximax Criterion
(iii) Minimax Regret Criterion, and
(iv) Laplace Criterion.
Or
(a) What is the Standard Error of Estimate? Why is it calculated? 4
(b) The arithmetic mean of a set of statistical observations is 20, while its geometric mean is 19 and its harmonic mean is 25. Comment on the statement. 3
(c) The mean and standard deviation of two brands are given below:
Brand-I Brand-II
Mean 800 hours 770 hours
Standard Deviation 100 hours 60 hours
Calculate a measure of relative dispersion for the two brands and
interpret the result. 4
Glossary
Arithmetic Mean: The arithmetic mean, obtained by dividing the sum of the observations by their number, is the most commonly used and understood mathematical average and is very extensively used in statistical work.
Average: An average is a typical value which is used to represent the entire set of values
and is used as a benchmark to make comparisons.
Average Deviation: It is the average of the deviations of the various items from a measure of central tendency (mean, median or mode), taken after ignoring negative signs.
Binomial Distribution: A discrete probability distribution giving the probabilities of 0, 1, 2, ..., n successes in n independent trials, each with the same probability of success p. It is a systematic arrangement of the probabilities of mutually exclusive and collectively exhaustive elementary events of an experiment.
Bombay Stock Exchange: The first organised stock exchange in India was established in July 1875 as an association of native brokers, named the Native Share and Stock Brokers Association. Its formal deed of association was executed in 1887. This stock exchange is now popularly known as the Bombay Stock Exchange (BSE).
Bowley’s Method of Skewness: Bowley’s method of skewness is based on the values of
median, lower and upper quartiles.
Central Moments: Moments calculated about mean are called central moments.
Circular Test: It is based on the shiftability of the base: the index should work in a circular fashion. That is, if an index number is computed for period 1 on base period 0, another for period 2 on base period 1, and a third for period 0 on base period 2, then the product of the three (expressed as ratios) should be equal to one.
Class Boundaries: The lower and upper class limits of new exclusive type classes are
called class boundaries.
Classical Approach: The probability of an event is the ratio of the number of favourable outcomes to the total number of equally likely possible outcomes.
Coefficients of Moments: There are three coefficients, namely the α (Alpha), β (Beta) and γ (Gamma) coefficients. These coefficients are calculated from the relationships among the various moments.
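Complementary Events: The complement of an event A is the event that A does not occur; it consists of all sample points not in A. An event and its complement are mutually exclusive and together exhaustive.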
characteristics and the number of cases which fall in each class are recorded.
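Frequency Polygon: It is a graph of a frequency distribution obtained by plotting the class frequencies against the class mid-points and joining the points with straight lines, so that it has many (more than four) sides. It is particularly effective in comparing two or more frequency distributions.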
Harmonic Mean: Harmonic mean is equal to the reciprocal of the
arithmetic mean of reciprocals. It is used to average rates.
Histogram: A histogram is a graph that represents the class frequencies in a frequency distribution by adjacent vertical rectangles. It is one of the most widely used ways of presenting a simple frequency distribution graphically.
Independent Events: Two events are said to be independent if the
occurrence of one event in no way influences the occurrence of the other
event.
Independent Variable: Regression analysis deals with estimating the values of one variable on the basis of the values of one or more other variables. The variable(s) on whose values the estimate is based is/are called the independent variable(s); the variable being estimated is the dependent variable.
Index Numbers: Index numbers are specialised averages, expressed as percentages, that measure changes in the level of a given phenomenon. Index numbers measure the effect of changes over a period of time.
Kurtosis: Kurtosis refers to the relative height (peakedness) of a frequency curve. When two or more symmetrical distributions are compared, the differences in their peakedness are studied with kurtosis.
Laplace Principle: The Laplace principle is based on the simple rule
that if we are uncertain about various events, then we may treat them
as equally probable.
Linear and Non-Linear Correlation: When the amount of change in
one variable tends to keep a constant ratio to the amount of change in
the other variable, then the correlation is said to be linear. But if the
amount of change in one variable does not bear a constant ratio to the
amount of change in the other variable then the correlation is said to be
non-linear. The distinction between linear and non-linear is based upon
the consistency of the ratio of change between the variables.
said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable also. Negative or inverse correlation refers to the movement of the variables in opposite directions: correlation is said to be negative if an increase (decrease) in the value of one variable is accompanied by a decrease (increase) in the value of the other.
Poisson Distribution: This is also a discrete distribution. It was developed by the French mathematician Siméon Denis Poisson in 1837. The Poisson distribution is the limiting form of the binomial distribution as n becomes infinitely large and p approaches zero, with np remaining constant; in practice it is used as an approximation when n > 20 and p < 0.05.
Probability: It is the likelihood that something will happen. When we
calculate the probability of an event, we assign it a number between zero
and one, depicting how likely it is to happen.
Quartile Deviation: It is half the difference between the upper and lower quartiles, (Q3 − Q1)/2. It is a quick, inspectional measure of variability, used when the data include scattered or extreme values.
Random Experiment: A random experiment is defined as an experiment
whose outcome cannot be predicted with certainty.
Ratio-To-Trend Method: It is similar to ratio-to-moving-average method.
The only difference is the way of obtaining the trend values. Whereas in
the ratio-to-moving-average method, the trend values are obtained by the
method of moving averages, in the ratio-to-trend method, the corresponding
trend is obtained by the method of least squares.
Regression Analysis: Regression Analysis is a branch of statistical theory
which is widely used in all the scientific disciplines. It is a basic technique
for measuring or estimating the relationship among economic variables
that constitute the essence of economic theory and economic life.
Relative Frequency Approach: Probability is estimated from actual observation, as the relative frequency with which an event occurs in a large number of trials.
Sample Point: Each element of a sample space is termed as sample point.
Sample Space: The set of all possible outcomes of a random experiment.
Seasonal Variations: Seasonal variations are those rhythmic changes in
the time series data that occur regularly each year. They have their origin
in climatic or institutional factors that affect either supply or demand