FOR B. Com
Unit – II:
Unit – III:
Unit – IV:
Index Number – Definition and Uses of Index Numbers, Construction of Index Numbers –
Simple & Weighted Index Numbers – Tests for an Ideal Index Number – Chain and Fixed Base Index –
Cost of Living Index Numbers.
Unit – V:
Analysis of Time Series – Definition – Components and Uses of Time Series. Measures of
Secular Trend, Measures of Seasonal Variation – Method of Simple Averages only.
Text Books:
1. Business Mathematics and Statistics – P.A. Navanithan (2007), Jai Publishers, Trichy – 21.
Reference Books:
1. Statistical Methods – S.P. Gupta.
2. Statistics – D.C. Sanchati and V.K. Kapoor.
3. Elements of Statistics – Donald R. Byrkt.
4. Statistical Theory and Practice – R.S.N. Pillai and V. Bagavathi (2001), S. Chand & Company Ltd., 2009.
Note:
1) Problems : 80% & Theory : 20%
2) This paper has to be taught by a statistics teacher.
UNIT – I
Introduction
The word statistics is derived from the Latin word “status”, an Italian word or the German word “Statistik”, all of which mean a political state.
In early times, statistics meant a collection of facts about the people of a state, gathered for administration or for political purposes.
State administration requires data regarding births, deaths, income, employment, etc.
Nowadays statistics is used to collect quantitative information in various fields such as economics, finance, production, agriculture, medicine and health care, and to analyse it by a statistical approach.
The term statistics is used in two senses: the singular form and the plural form. In the singular form it indicates the statistical methods, such as collection, classification, frequency distribution and interpretation of data.
In the plural form it refers to the numerical information collected in a systematic manner.
Definition:
Bowley defines statistics as “numerical statements of facts in any department of enquiry placed in relation to each other.”
Yule and Kendall define statistics as follows: “By statistics we mean quantitative data affected to a marked extent by a multiplicity of causes.”
This definition is comprehensive and exhaustive and highlights certain characteristics of statistical data. Statistics are:
Aggregates of facts
Affected to a marked extent by a multiplicity of causes
Numerically expressed
Enumerated (or) estimated according to reasonable standards of accuracy
Collected in a systematic manner
Collected for a predetermined purpose
Placed in relation to each other
Functions:
Statistics, by definition, covers the following stages:
Collection
Classification
Tabulation
Analysis
Interpretation
Definition 2: Statistics may be called the science of averages.
CHARACTERISTICS OF STATISTICS:
Limitations of statistics:
1. It does not deal with individual items.
2. It deals only with quantitative data.
3. It can be misused.
4. Statistical laws are true only on average.
Explanations:
1. Statistics deals with groups of items, e.g. the heights of 5 students, not with a single item.
2. It does not deal with qualitative characteristics, e.g. honesty, colour, intelligence, beauty.
3. If figures are given without details, we may arrive at wrong and misleading conclusions.
4. If the average life of human beings is 65 years, it does not mean that all human beings die at the age of 65 years.
Functions of statistics:
1. It simplifies unwieldy and complex data.
2. It facilitates comparison.
3. It helps in formulating and testing hypotheses.
4. It studies the relationship between variables.
5. It provides material for the businessman as well as the administrator, serving as a guide in planning and in shaping future policies and programmes.
Uses of Statistics:
It helps in presenting a large quantity of data in a simple and classified form.
It gives methods for comparing data.
Time series analysis helps in forecasting and consequent planning.
Regression analysis establishes the relationship between two variables.
In demography it is useful for calculating mortality rates and other vital statistics.
Types of Statistics:
Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and inferential statistics.
DESCRIPTIVE STATISTICS:
Descriptive statistics consists of methods for organizing, displaying and describing data by using tables, graphs and summary measures.
INFERENTIAL STATISTICS:
Inferential statistics consists of methods that use sample results to help make decisions or predictions
about a population.
COLLECTION OF DATA:
DATA:
Data are of two types: primary and secondary.
Primary:
1. Primary data are statistical data which are collected for the first time and are original in nature.
2. Primary data are collected from the individuals directly, and these data have never been used for any purpose earlier.
Merits:
Demerits:
1. Under this method the investigator contacts witnesses, neighbours or friends who are capable of supplying the necessary information.
Merits:
Demerits:
Merits:
1. It is relatively cheap
2. It is widely used when the area of investigation is large
3. It saves money and time
Demerits:
1. In this method there is no direct contact between the investigator and the respondent; therefore we cannot be sure about the accuracy and reliability of the data.
2. This method is suitable only for literate people.
3. People may not give correct answers.
It is the most widely used method of collecting primary data. A number of enumerators are selected and trained.
They are provided with a standardised questionnaire and are given specific training in interviewing and in filling up the schedules. Each enumerator is in charge of a certain area.
The investigator goes to the informant with the questionnaire, obtains the replies and records the answers. Public organisations and research institutions use this method.
Merits:
Demerits:
1. It is a very costly method, as the enumerators have to be trained and paid.
2. This method is time consuming, because the enumerators go personally to obtain the information.
3. Personal bias of the enumerators may lead to false conclusions.
CHARACTERISTICS OF A GOOD QUESTIONNAIRE
SECONDARY DATA:
It is statistical information which has already been collected by someone for his own purpose and is available for use by others. (Or)
If the data have already been collected by some person (or) institution and are made available for a statistical investigation, they are known as secondary data.
Sources of secondary data:
1. Publications of official government agencies as well as non-official private agencies.
2. Information available in daily newspapers and magazines.
3. Published reports of the state and central governments and of the union territories.
CLASSIFICATION:
Definition:
The process of arranging data into groups according to some common characteristics. (Or)
The process of arranging (or) grouping a large number of individual facts (or) observations on the basis of similarity among the items is called classification.
Data can be classified on the basis of the following:
Geographical (or) spatial
Chronological (or) temporal (or) historical
Qualitative
Quantitative
Spatial: Series which are arranged on the basis of place are called spatial series.
TIRUCHI 13000
MADURAI 11000
COIMBATORE 8000
KANYAKUMARI 4000
Qualitative:
Classification is based on attributes (or) characteristics, i.e. non-measurable characteristics are qualitative.
If the statistical data are collected about qualities such as male, female, employed, Indian or foreigner, this is qualitative classification.
One-way classification means classification of data on the basis of only one characteristic, e.g.:
POPULATION → MALE, FEMALE
Two-way classification is based on two characteristics. For example, the number of persons leaving India for employment opportunities in four different countries (USA, Canada, …), classified according to sex and according to 4 different cities.
POPULATION → MALE, FEMALE
Quantitative:
Measurable characteristics are quantitative, i.e. the statistical data are classified according to numerically measurable characteristics such as age, height and weight. A quantitative phenomenon is called a variable.
VARIABLE
1. Quantitative
   (a) Discrete (e.g. number of houses, cars, accidents)
   (b) Continuous (e.g. length, age, height, weight, time)
2. Qualitative or categorical (e.g. make of a computer, hair colour, gender)
Tabulation:
Statistical data collected either through a primary source or a secondary source have to be classified first, and then the classified data have to be presented in tabular form in an orderly way before analysis and interpretation.
Definition:
Tabulation is defined as the orderly (or) systematic presentation of numerical data in rows and columns, designed to facilitate comparison between the figures.
Tabulation is a statistical tool used for condensing the data in a statistical process. The parts of a table are:
1. Identification no
2. Title
3. Head note
4. Stubs
5. Captions
6. Body of the table
7. Foot notes
8. Source
MEASURES OF CENTRAL TENDENCY:
The first of these ways of defining the central tendency leads to the mean, the second leads to the median of the distribution, and the third is known as the mode. All three, as a class, are known as measures of central tendency.
Though ‘average’ is the popular term for the arithmetic mean, in statistical work ‘average’ is the general term for any measure of central tendency.
INTRODUCTION:
DEFINITION:
According to Clark, “average is an attempt to find one single figure from a group of figures.”
An average is a value which is representative of a set of data.
OBJECTIVES OF AN AVERAGE
1. To facilitate quick understanding: The purpose of an average is to reduce a mass of complex data to a single figure so that it can be easily and quickly understood. The single figure represents the characteristics of the whole group.
2. To facilitate comparison: Two or more sets of values can be easily compared on the basis of their averages. For example, the monthly average sales of one company can be compared with the monthly average sales of another company. Such a comparison is helpful in making decisions.
3. To establish mathematical relationships: An average helps to establish mathematical relationships between variables. For example, to say that the income of an Indian is less than the income of an American is not precise. But if the respective incomes are expressed as averages, the comparison becomes definite and clear.
4. To take policy decisions: In the process of experimentation and research, averages are valuable in setting standards, in estimation and in other managerial decisions.
CHARACTERISTICS OF A GOOD AVERAGE:
The commonly used averages are:
1. Arithmetic mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean
The arithmetic mean can be computed by three methods:
Direct method
Shortcut method
Step deviation method
Arithmetic mean refers to the simple average. To a layman it is simply the average, but to a statistician it is the arithmetic mean. It is the simplest of all averages and is widely used in practice.
Definition: The arithmetic mean is defined as the sum total of all the values divided by their number. There are two types of arithmetic mean: the simple arithmetic mean and the weighted arithmetic mean.
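The three methods listed above give the same answer; they differ only in how much hand arithmetic is needed. A minimal Python sketch for an individual series, assuming an arbitrary assumed mean `A` (shortcut method) and a common factor `c` (step deviation method); the function names are illustrative, and the data are the marks of 10 students from the worked example in this unit:

```python
def mean_direct(xs):
    """Direct method: sum of the values divided by their number."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, A):
    """Shortcut method: X-bar = A + (sum of d) / N, where d = X - A."""
    d = [x - A for x in xs]
    return A + sum(d) / len(xs)


def mean_step_deviation(xs, A, c):
    """Step deviation method: X-bar = A + (sum of d') / N * c, where d' = (X - A) / c."""
    d_step = [(x - A) / c for x in xs]
    return A + sum(d_step) / len(xs) * c


marks = [40, 50, 55, 78, 58, 60, 73, 35, 43, 48]
print(mean_direct(marks))              # 54.0
print(mean_shortcut(marks, A=50))      # 54.0
print(mean_step_deviation(marks, A=50, c=5))
```

Any value of `A` (and any non-zero `c`) gives the same mean; a convenient choice simply keeps the deviations small.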
PROPERTIES OF MEAN:
1. The sum of the deviations of the items from the arithmetic mean is always zero.
$\sum (X - \bar{X}) = 0$
2. The sum of the squared deviations of the items from the mean is a minimum.
$\sum (X - \bar{X})^2 = \text{minimum}$
3. If any two of the three values, the A.M. ($\bar{X}$), the number of items ($N$) and the total of the values ($\sum X$), are known, the third can be found out:
$\bar{X} = \frac{\sum X}{N}, \quad \sum X = \bar{X} N, \quad N = \frac{\sum X}{\bar{X}}$
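The three properties can be checked numerically. A small sketch, reusing the marks of 10 students from this unit (an illustrative choice); the helper `sum_sq_dev` is hypothetical and only evaluates Σ(X − a)² about any point a:

```python
xs = [40, 50, 55, 78, 58, 60, 73, 35, 43, 48]
N = len(xs)
mean = sum(xs) / N                      # 54.0

# Property 1: deviations from the mean sum to zero
sum_dev = sum(x - mean for x in xs)

# Property 2: squared deviations are smallest when taken about the mean
def sum_sq_dev(about):
    return sum((x - about) ** 2 for x in xs)

# Property 3: any two of X-bar, N and sum X give the third
total = mean * N                        # sum X = X-bar * N

print(sum_dev)                                          # 0.0
print(sum_sq_dev(mean), sum_sq_dev(50), sum_sq_dev(60))  # 1720.0 1880 2080
print(total, total / mean)                              # 540.0 10.0
```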
MERITS OF ARITHMETIC MEAN:
1. It is easy to understand.
2. It is easy to calculate.
3. It can easily be used in further calculations.
4. It is rigidly defined.
5. It is based on the value of every item in the series.
6. It provides a good basis for comparison.
7. It can be used for further analysis and algebraic treatment.
8. The mean is a more stable measure of central tendency.
Demerits:
Arithmetic mean gives equal importance to all the items, but there are situations where items differ in importance. In such cases it is necessary to assign weights in proportion to the relative importance of the various items; hence the weighted arithmetic mean is calculated. The weighted arithmetic mean is especially useful in problems relating to the construction of index numbers and standardised birth and death rates.
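The weighted arithmetic mean is described above in words only. The usual formula is X̄w = ΣWX / ΣW, where W is the weight attached to each value X. A minimal Python sketch; the values and weights are hypothetical, chosen only to show how weighting moves the result away from the simple mean:

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: three items whose importance differs
values = [20, 30, 50]
weights = [5, 3, 2]
print(weighted_mean(values, weights))   # 29.0
print(sum(values) / len(values))        # simple mean, about 33.33
```

With equal weights the formula reduces to the simple arithmetic mean.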
THE MEDIAN:
Median is the value of the item that divides the series into two equal parts.
The median is the value of the middle item in a series when the items are arranged according to magnitude.
Merits of Median:
THE MODE:
Mode is the value which occurs the greatest number of times in a series.
Mode is the size of the item which has the maximum frequency.
Merits:
Demerits:
The concept of mode is used by people in their everyday life. For example, a manufacturer of banyans, readymade garments, shoes etc. is interested in the modal size and manufactures it in large quantities.
Mode helps the manufacturer in deciding which sizes to produce, so it is useful in industry and business.
Weather forecasts are also based on the mode. It is very useful to agriculturists, businessmen etc. Mode is also used in socio-economic surveys and is widely used in business and commerce.
Geometric Mean:
Demerits:
1. It is difficult to understand.
2. Non-mathematical persons cannot easily do the calculations.
3. The geometric mean cannot be computed if any item in the series is negative or zero.
4. It has restricted application.
Harmonic mean:
Merits:
1. It is rigidly defined.
2. It is based on all the observations of the series.
3. It is suitable in case of series having wide dispersion
4. It is suitable for further mathematical treatment.
5. It gives less weight to large items and more weight to small items.
Demerits of Harmonic mean:
If all the items in a variable are the same, the arithmetic mean, the geometric mean and the harmonic mean are equal, i.e. if all the items in a distribution have the same value, then
$\bar{X} = G.M. = H.M.$
But if the sizes vary, as will generally be the case, the arithmetic mean will be greater than the geometric mean, and the geometric mean will be greater than the harmonic mean. This is because the geometric mean gives larger weight to smaller items and the harmonic mean gives the largest weight to the smallest items. Hence,
$\bar{X} > G.M. > H.M.$
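The relation above can be checked on any set of positive values. A minimal sketch (requires Python 3.8+ for `math.prod`); the values 3, 6, 24 and 48 are borrowed from the geometric mean example later in this unit:

```python
import math


def am_gm_hm(xs):
    """Return (arithmetic mean, geometric mean, harmonic mean) of positive values."""
    n = len(xs)
    am = sum(xs) / n
    gm = math.prod(xs) ** (1 / n)          # Nth root of the product
    hm = n / sum(1 / x for x in xs)        # reciprocal of the mean of the reciprocals
    return am, gm, hm


print(am_gm_hm([5, 5, 5, 5]))              # all three equal (up to floating-point rounding)
am, gm, hm = am_gm_hm([3, 6, 24, 48])
print(round(am, 2), round(gm, 2), round(hm, 2))   # 20.25 12.0 7.11
```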
Frequency Distribution:
Definition:
Frequency distribution is a statistical table which shows the set of all distinct values of the variable arranged in order of magnitude, either individually or in groups, with their frequencies side by side.
Example: The weekly wages in Rs. paid by a house building contractor to the workers are given below. Form a discrete frequency distribution.
300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120
300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150
Solution:
In the first column all the different values are written in ascending order. Each of the given values is considered and a tally mark is put in the appropriate place in the second column. After all the given values have been considered, the tally marks against each value are counted and their number is written in numerals in the third column. These numerals are the frequencies, and their total is also given.
In the final table only two columns (excluding the tally marks) are given.
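The tally-mark procedure above can be sketched in Python, using `collections.Counter` in place of tally marks (an implementation choice, not part of the text). The wages are the 30 values of the example:

```python
from collections import Counter

wages = [300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120,
         300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150]

freq = Counter(wages)                  # value -> number of tally marks
print("Wages (Rs.)  Frequency")
for value in sorted(freq):             # ascending order, as in the first column
    print(f"{value:>11}  {freq[value]:>9}")
print(f"{'Total':>11}  {sum(freq.values()):>9}")
```

This prints the frequencies 8, 12, 8 and 2 for the wages Rs. 120, 150, 240 and 300, with a total of 30.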
Class interval: A class interval is an interval of values which constitutes a class or group. In the example data under quantitative classification, the class intervals have been taken as 0-39, 40-49, 50-59 and 60-100.
Example: Given
In this example, d, the gap between the upper limit of one class and the lower limit of the next, is 1 (40 − 39 = 50 − 49 = 60 − 59), and ½d = 0.5. d can also be different from 1.
The size of a class interval is also called its length: Size = Upper boundary − Lower boundary.
Example: From the following observations prepare a frequency distribution table in ascending order, starting with 100-110 (exclusive method).
Income in (Rs)
125 108 112 126 110 132 136 130 149 155 120 130 136 138 125
111 119 125 140 148 147 137 145 150 142 135 137 132 165 154
Solution: Proceeding as explained under the previous example, the following table is obtained. Since this is the exclusive method, the given values 110, 120, 130, 140 and 150 are included in the class intervals of which they are the lower boundary.
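The exclusive method can be sketched in Python: each class is taken as [lower, lower + width), so a value equal to an upper limit goes into the next class. The function name `exclusive_classes` is illustrative; the incomes are the 30 observations of the example:

```python
incomes = [125, 108, 112, 126, 110, 132, 136, 130, 149, 155, 120, 130, 136, 138, 125,
           111, 119, 125, 140, 148, 147, 137, 145, 150, 142, 135, 137, 132, 165, 154]


def exclusive_classes(data, start, width):
    """Count data into classes [lower, lower + width): the upper limit is excluded."""
    counts = {}
    lower = start
    while lower <= max(data):
        upper = lower + width
        counts[(lower, upper)] = sum(lower <= x < upper for x in data)
        lower = upper
    return counts


table = exclusive_classes(incomes, start=100, width=10)
for (lo, hi), f in table.items():
    print(f"{lo}-{hi}: {f}")
```

This gives the frequencies 1, 4, 5, 10, 6, 3 and 1 for the classes 100-110 up to 160-170 (total 30).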
Example: Formation of the two cumulative frequency distributions, as explained above, from a given discrete frequency distribution.
The less than cumulative frequency 28 corresponding to the wage 240 means that 28 workers have wages less than or equal to Rs. 240. Similarly, the more than cumulative frequency 10 corresponding to the wage 240 means that 10 workers have wages more than or equal to Rs. 240.
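Both cumulative distributions are running totals, one from the top and one from the bottom. A minimal sketch with `itertools.accumulate`, using the frequencies obtained by counting the wage data in the first example (120: 8, 150: 12, 240: 8, 300: 2):

```python
from itertools import accumulate

wages = [120, 150, 240, 300]
freqs = [8, 12, 8, 2]

less_than = list(accumulate(freqs))                    # workers with wage <= value
more_than = list(accumulate(reversed(freqs)))[::-1]    # workers with wage >= value

for w, f, lt, mt in zip(wages, freqs, less_than, more_than):
    print(w, f, lt, mt)
```

For the wage Rs. 240 this gives 28 (less than) and 10 (more than), matching the interpretation above.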
Example: Formation of the cumulative frequency distribution (as explained earlier) from a given continuous
frequency distribution (inclusive method)
A measure of central tendency gives a single representative value for a set of usually unequal values. The single value is the point of location around which the individual values of the set cluster. Measures of central tendency are hence known as measures of location. The commonly used measures are:
1. Arithmetic mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean.
6. Combined mean
ARITHMETIC MEAN
When the observed values are given individually, such as $X_1, X_2, \dots, X_N$, the arithmetic mean is calculated as follows.
Direct method: Arithmetic mean = Total of the observations / Number of the observations
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$
i) Arithmetic mean:
1) Individual series:
Problem: Calculate the arithmetic mean of the following expenditures of 10 families.
Family:             A   B   C   D   E    F  G   H    I   J
Expenditure (Rs.):  30  70  10  75  500  8  42  250  40  36
Solution:
Family   Expenditure (Rs.)
A        30
B        70
C        10
D        75
E        500
F        8
G        42
H        250
I        40
J        36
Total    ∑X = 1061
$\bar{X} = \frac{\sum X}{N} = \frac{1061}{10} = \text{Rs. } 106.1$
Problem: Find the mean marks of the following 10 students.
R. No.:  1   2   3   4   5   6   7   8   9   10
Marks:   40  50  55  78  58  60  73  35  43  48
Solution: Calculation of mean
R. No.   Marks
1        40
2        50
3        55
4        78
5        58
6        60
7        73
8        35
9        43
10       48
N = 10   ∑X = 540
$\bar{X} = \frac{\sum X}{N} = \frac{540}{10} = 54$ marks.
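2) Discrete series:
The actual values with corresponding frequencies are given in the following form.
Observed value   Frequency
X1               f1
X2               f2
.                .
.                .
Xn               fn
Mean, $\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$. For example, if $\sum fX = 400$ and $N = 100$, then $\bar{X} = \frac{400}{100} = 4$.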
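Problem: Calculate the arithmetic mean of the following discrete series.
Value (X):      1   2   3   4   5   6   7   8  9   10
Frequency (f):  21  30  28  40  26  34  40  9  15  57
Solution:
X    f    fX
1    21   21
2    30   60
3    28   84
4    40   160
5    26   130
6    34   204
7    40   280
8    9    72
9    15   135
10   57   570
Total  N = 300   ∑fX = 1716
Mean, $\bar{X} = \frac{\sum fX}{N} = \frac{1716}{300} = 5.72$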
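The discrete-series computation above, X̄ = ΣfX / N, as a short Python sketch using the data of the problem:

```python
values = list(range(1, 11))
freqs = [21, 30, 28, 40, 26, 34, 40, 9, 15, 57]

N = sum(freqs)                                        # total frequency
sum_fx = sum(f * x for x, f in zip(values, freqs))    # sum of f * X
mean = sum_fx / N
print(N, sum_fx, round(mean, 2))                      # 300 1716 5.72
```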
TYPE III [Continuous series – Exclusive class intervals]
This is the most important form; most often data are available in this form. As seen later, data of type IV to type VII are to be rewritten in type III form for proper use of the formulae for median, mode, quartiles etc.
The formulae for continuous series and discrete series are the same (but the definitions of d, d' and f differ, as explained earlier), and hence the steps are the same after identifying m, the mid values of the class intervals.
$\bar{X} = \frac{\sum fm}{N} = \frac{2460}{50} = 49.20$
Problem 1: The annual profits of 90 companies are given below. Find the arithmetic mean.
Arithmetic mean, $\bar{X} = \frac{\sum fm}{N} = \frac{4875}{90} = 54.17$
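Problem 2: Find the mean height of the soldiers from the following data.
Height below (cms):  150  155  160  165  170  175  180  185
No. of soldiers:     0    23   77   152  266  419  472  500
Solution: The ‘less than’ cumulative frequencies are converted into class frequencies by subtraction.
Height (cms)   f     Mid value m   fm
150-155        23    152.5         3507.5
155-160        54    157.5         8505.0
160-165        75    162.5         12187.5
165-170        114   167.5         19095.0
170-175        153   172.5         26392.5
175-180        53    177.5         9407.5
180-185        28    182.5         5110.0
N = 500                            ∑fm = 84205.0
Mean height = $\bar{X} = \frac{\sum fm}{N} = \frac{84205.0}{500} = 168.41$ cms.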
Weight above (kgs)   No. of boys  |  Weight (kgs)   No. of boys (f)   Mid value (m)   fm
20                   160          |  20-25          160 − 145 = 15    22.5            337.5
25                   145          |  25-30          145 − 100 = 45    27.5            1237.5
30                   100          |  30-35          100 − 50 = 50     32.5            1625.0
35                   50           |  35-40          50 − 9 = 41       37.5            1537.5
40                   9            |  40-45          9                 42.5            382.5
                                                    N = 160                           ∑fm = 5120
Arithmetic mean $\bar{X} = \frac{\sum fm}{N} = \frac{5120}{160} = 32.00$ kgs.
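The soldiers' heights and the boys' weights above are given as cumulative (‘less than’ and ‘more than’) counts, which must be differenced into class frequencies before Σfm/N is applied. A minimal Python sketch of both conversions; the helper names are illustrative, and for the ‘more than’ data the last class is assumed to have the same width as the others (as the mid value 42.5 in the worked table implies):

```python
def from_less_than(bounds, cum):
    """'Less than' cumulative counts at each bound -> list of ((lower, upper), f)."""
    return [((bounds[k], bounds[k + 1]), cum[k + 1] - cum[k]) for k in range(len(bounds) - 1)]


def from_more_than(bounds, cum, width):
    """'More than' cumulative counts -> ((lower, upper), f); last class assumed `width` wide."""
    classes = []
    for k, lower in enumerate(bounds):
        nxt = cum[k + 1] if k + 1 < len(cum) else 0
        classes.append(((lower, lower + width), cum[k] - nxt))
    return classes


def grouped_mean(classes):
    """X-bar = sum(f * m) / N, with m the mid value of each class."""
    N = sum(f for _, f in classes)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in classes) / N


heights = from_less_than([150, 155, 160, 165, 170, 175, 180, 185],
                         [0, 23, 77, 152, 266, 419, 472, 500])
weights = from_more_than([20, 25, 30, 35, 40], [160, 145, 100, 50, 9], width=5)
print(round(grouped_mean(heights), 2))    # 168.41
print(round(grouped_mean(weights), 2))    # 32.0
```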
Combined mean:
Let there be $N_1$ items in the first group with mean $\bar{X}_1$ and $N_2$ items in the second group with mean $\bar{X}_2$. When these two groups merge, there are $N_1 + N_2$ items whose total is $N_1\bar{X}_1 + N_2\bar{X}_2$.
∴ The mean of the combined group, $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$
In a similar manner, when there is a third group of $N_3$ items with mean $\bar{X}_3$, the combined arithmetic mean of the three groups is
$\bar{X}_{123} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$
Problem: 1
There are two branches of an establishment employing 100 and 80 persons respectively. If the arithmetic means of the monthly salaries paid by the two branches are Rs. 275 and Rs. 225 respectively, find the arithmetic mean of the salaries of the employees of the establishment as a whole.
Solution: $N_1 = 100$, $N_2 = 80$; $\bar{X}_1 = 275$, $\bar{X}_2 = 225$
The arithmetic mean of the salaries of the employees of the establishment as a whole:
$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{27500 + 18000}{180} = \frac{45500}{180} = \text{Rs. } 252.78$
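The combined-mean formula extends to any number of groups: multiply each group mean by its size, add, and divide by the total size. A minimal sketch (`combined_mean` is an illustrative name), applied to the two branches of the problem:

```python
def combined_mean(groups):
    """groups: list of (N_k, mean_k). Returns sum(N_k * mean_k) / sum(N_k)."""
    total_items = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_items


print(round(combined_mean([(100, 275), (80, 225)]), 2))   # 252.78
```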
MEDIAN
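Definition: Median is the value of the middlemost item when all the items are arranged in order of magnitude.
Individual observations:
Median = size of the $\left(\frac{N+1}{2}\right)$th item
For example, with 9 items, position of median $= \frac{9+1}{2} = 5$, i.e. the median is the 5th item.
Problem: Find the median for the following data: 57, 58, 61, 42, 38, 65, 72, 66.
Solution: Values in ascending order: 38, 42, 57, 58, 61, 65, 66, 72.
Position of median $= \frac{N+1}{2} = \frac{8+1}{2} = 4.5$
Since the position is fractional, the median is the average of the value at the $\frac{N}{2} = \frac{8}{2} =$ 4th position (58) and the value at the $\frac{N}{2} + 1 = 4 + 1 =$ 5th position (61):
Median $= \frac{58 + 61}{2} = 59.5$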
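The (N + 1)/2 rule for an individual series, including the even-N case where the two middle items are averaged, as a short Python sketch (`median_individual` is an illustrative name):

```python
def median_individual(xs):
    """Median = size of the (N + 1)/2-th item of the sorted series."""
    s = sorted(xs)
    N = len(s)
    pos = (N + 1) / 2                 # 1-based position of the median
    k = int(pos)                      # whole part of the position
    if pos == k:                      # odd N: a single middle item
        return s[k - 1]
    return (s[k - 1] + s[k]) / 2      # even N: mean of the N/2-th and (N/2 + 1)-th items


print(median_individual([57, 58, 61, 42, 38, 65, 72, 66]))   # 59.5
```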
Discrete series:
Median = size of the $\left(\frac{N+1}{2}\right)$th item
Problem: The marks (out of a maximum of 10) scored by the students of a class are given below. Find the median mark.
Marks:                  3  4  5   6   7   8   9   10  Total
No. of students:        1  5  6   7   10  15  10  5   59
Cumulative frequency:   1  6  12  19  29  44  54  59
Solution:
Position of median $= \frac{N+1}{2} = \frac{59+1}{2} = 30$th item
When all the 59 items are arranged in ascending order, the 30th item is included in cf = 44, which corresponds to 8 marks.
Median = 8.
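For a discrete series the median is the first value whose cumulative frequency reaches the (N + 1)/2-th position. A minimal sketch with the marks data of the problem (`median_discrete` is an illustrative name):

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches the (N + 1)/2-th item."""
    N = sum(freqs)
    pos = (N + 1) / 2
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= pos:
            return value


marks = [3, 4, 5, 6, 7, 8, 9, 10]
students = [1, 5, 6, 7, 10, 15, 10, 5]
print(median_discrete(marks, students))   # 8
```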
Problem 2: Locate the median from the following data.
Solution:
Median = size of the $\left(\frac{173+1}{2}\right)$th item = 87th item
Median = 7.
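Continuous series:
Median $= L + \frac{\frac{N}{2} - cf}{f} \times i$
where L is the lower boundary of the median class, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and i is the class width.
Problem:
Solution: The class intervals are continuous and in ascending order. N/2 = 30/2 = 15.
The 15th cumulative frequency is included in the interval 155-160, which is the median class.
L = 155, f = 10, i = 160 − 155 = 5, cf = 7
$M = L + \frac{\frac{N}{2} - cf}{f} \times i = 155 + \frac{15 - 7}{10} \times 5 = 155 + \frac{8 \times 5}{10} = 155 + 4 = 159$ cms.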
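The interpolation formula can be sketched as a function that locates the median class by cumulative frequency and then applies L + (N/2 − cf)/f × i. The original frequency table is not reproduced in this section, so the table below is hypothetical: only the median class (155-160, f = 10), the preceding cumulative frequency (cf = 7) and N = 30 are taken from the worked problem.

```python
def median_continuous(lowers, freqs, width):
    """Median = L + (N/2 - cf) / f * i for a continuous (exclusive) series."""
    N = sum(freqs)
    half = N / 2
    cf = 0                                   # cumulative frequency before the current class
    for L, f in zip(lowers, freqs):
        if cf + f >= half:                   # first class whose cumulative frequency reaches N/2
            return L + (half - cf) / f * width
        cf += f


# Hypothetical table (classes 145-150 ... 165-170); only N = 30, cf = 7 and
# f = 10 for the class 155-160 come from the worked problem.
lowers = [145, 150, 155, 160, 165]
freqs = [2, 5, 10, 8, 5]
print(median_continuous(lowers, freqs, width=5))   # 159.0
```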
Mode
Definition: Mode is defined as the value of the variable which occurs most frequently in a distribution.
Individual Series:
The value or values which occur most often are identified.
(i) 320, 395, 342, 444, 557, 395, 425, 417, 395, 401, 390, 400.
Solution: (i) 395 repeats three times; therefore the mode is 395 (unimodal).
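Identifying the most frequent value (or values, for a multimodal series) can be sketched with `collections.Counter`; the function name `modes` is illustrative:

```python
from collections import Counter


def modes(xs):
    """All values that occur most often (one value: unimodal; several: multimodal)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([320, 395, 342, 444, 557, 395, 425, 417, 395, 401, 390, 400]))   # [395]
```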
Discrete series:
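Size:       10  11  12  13  14  15  16  17  18
Frequency:  10  12  15  19  20  8   4   3   2
Solution: The greatest frequency is 20, but the mode need not be 14, because the difference between the greatest frequency 20 and the next lower frequency 19 is very small. Further, 19 has the support of the neighbouring frequency 15, while 20 has the support of 8 only. The grouping table and the analysis table are formed as explained earlier.
Grouping table:
Column (1) gives the frequencies; columns (2) and (3) give sums of two, starting from the first and the second frequency; columns (4), (5) and (6) give sums of three, starting from the first, second and third frequency.
(1): 10, 12, 15, 19, 20, 8, 4, 3, 2 (maximum 20: size 14)
(2): 10+12 = 22, 15+19 = 34, 20+8 = 28, 4+3 = 7 (maximum 34: sizes 12, 13)
(3): 12+15 = 27, 19+20 = 39, 8+4 = 12, 3+2 = 5 (maximum 39: sizes 13, 14)
(4): 10+12+15 = 37, 19+20+8 = 47, 4+3+2 = 9 (maximum 47: sizes 13, 14, 15)
(5): 12+15+19 = 46, 20+8+4 = 32 (maximum 46: sizes 11, 12, 13)
(6): 15+19+20 = 54, 8+4+3 = 15 (maximum 54: sizes 12, 13, 14)
Analysis Table:
Size:                   11  12  13  14  15
No. of times maximum:   1   3   5   4   1
Size 13 is marked the maximum number of times (5).
∴ Mode = 13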
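The grouping and analysis tables amount to a small counting procedure: total the frequencies in non-overlapping windows of one, two and three values (with every possible starting offset), find the largest total in each column, and give one mark to every value inside a largest window. A minimal Python sketch of that procedure; `grouping_method_mode` is an illustrative name, and it assumes that when a column has tied maxima all of them receive marks (the text does not say how ties are treated):

```python
from collections import Counter


def grouping_method_mode(values, freqs):
    """Grouping/analysis-table method for the mode of a discrete series."""
    n = len(freqs)
    # (window size, starting offset) for columns (1) to (6) of the grouping table
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    marks = Counter()                       # the analysis table: value -> number of marks
    for size, start in columns:
        # totals of consecutive, non-overlapping windows; incomplete windows are dropped
        windows = [(i, sum(freqs[i:i + size])) for i in range(start, n - size + 1, size)]
        best = max(total for _, total in windows)
        for i, total in windows:
            if total == best:               # every value inside a largest window gets a mark
                for j in range(i, i + size):
                    marks[values[j]] += 1
    top = max(marks.values())
    return [v for v in values if marks[v] == top], marks


sizes = [10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [10, 12, 15, 19, 20, 8, 4, 3, 2]
modal_values, marks = grouping_method_mode(sizes, freqs)
print(modal_values)                         # [13]
print(dict(sorted(marks.items())))          # {11: 1, 12: 3, 13: 5, 14: 4, 15: 1}
```

Passing class labels such as "20-25" instead of sizes also reproduces the analysis table of the continuous example that follows, where the class 20-25 receives 5 marks.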
Continuous series:
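Mode, $Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where Z = mode, L = lower limit of the modal class, $f_1$ = frequency of the modal class, $f_0$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class succeeding the modal class and i = class width.
Problem: Find the mode for the following data using the grouping and analysis tables.
C.I.:  0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
f:     9    12    15     16     17     15     10     13
i) Grouping table
(1): 9, 12, 15, 16, 17, 15, 10, 13 (maximum 17: 20-25)
(2): 9+12 = 21, 15+16 = 31, 17+15 = 32, 10+13 = 23 (maximum 32: 20-25, 25-30)
(3): 12+15 = 27, 16+17 = 33, 15+10 = 25 (maximum 33: 15-20, 20-25)
(4): 9+12+15 = 36, 16+17+15 = 48 (maximum 48: 15-20, 20-25, 25-30)
(5): 12+15+16 = 43, 17+15+10 = 42 (maximum 43: 5-10, 10-15, 15-20)
(6): 15+16+17 = 48, 15+10+13 = 38 (maximum 48: 10-15, 15-20, 20-25)
ii) Analysis table
C.I.:                   5-10  10-15  15-20  20-25  25-30
No. of times maximum:   1     2      4      5      2
The modal class is 20-25.
Mode, $Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
Modal class = 20-25: L = 20, $f_1$ = 17, $f_0$ = 16, $f_2$ = 15, i = 5
$Z = 20 + \frac{17 - 16}{2 \times 17 - 16 - 15} \times 5 = 20 + \frac{1}{3} \times 5 = 20 + 1.67 = 21.67$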
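Once the modal class is known, the interpolation step is a one-line formula. A minimal sketch; `mode_continuous` is an illustrative name, `modal_index` is the position of the modal class (here taken from the analysis table above), and treating a missing neighbouring class as frequency 0 at either end is an assumption not stated in the text:

```python
def mode_continuous(lowers, freqs, width, modal_index=None):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i.

    modal_index defaults to the class with the highest frequency (inspection);
    pass the class found from the grouping/analysis table when it differs.
    """
    if modal_index is None:
        modal_index = max(range(len(freqs)), key=lambda k: freqs[k])
    f1 = freqs[modal_index]
    f0 = freqs[modal_index - 1] if modal_index > 0 else 0            # assumed 0 if no class before
    f2 = freqs[modal_index + 1] if modal_index + 1 < len(freqs) else 0  # assumed 0 if no class after
    L = lowers[modal_index]
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * width


lowers = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [9, 12, 15, 16, 17, 15, 10, 13]
print(round(mode_continuous(lowers, freqs, width=5, modal_index=4), 2))   # 21.67
```

Here inspection and the grouping method pick the same class (20-25); in the discrete example above they did not, which is why the function lets the caller supply the class.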
Geometric mean:
Definition: The geometric mean of N values is the Nth root of the product of the N values.
If $X_1, X_2, X_3, \dots, X_N$ are the values, their geometric mean is $\sqrt[N]{X_1 \times X_2 \times X_3 \times \dots \times X_N}$.
Formulae:
G.M. $= \text{Antilog}\left[\frac{\sum \log X}{N}\right]$ for individual observations
G.M. $= \text{Antilog}\left[\frac{\sum f \log X}{N}\right]$ for discrete observations
G.M. $= \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$ for continuous observations
Individual Series:
Solution:
X    log X
3    0.4771
6    0.7782
24   1.3802
48   1.6812
     ∑log X = 4.3167
∴ G.M. $= \text{Antilog}\left[\frac{\sum \log X}{N}\right] = \text{Antilog}\left[\frac{4.3167}{4}\right] = \text{Antilog}[1.0792] = 12.00$
Discrete series:
X:  10  15  25  40  50
f:  4   6   10  7   3
Solution:
X    f    log X    f log X
10   4    1.0000   4.0000
15   6    1.1761   7.0566
25   10   1.3979   13.9790
40   7    1.6021   11.2147
50   3    1.6990   5.0970
     N = 30        ∑f log X = 41.3473
G.M. $= \text{Antilog}\left[\frac{\sum f \log X}{N}\right] = \text{Antilog}\left[\frac{41.3473}{30}\right] = \text{Antilog}[1.3782] = 23.89$
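The log-table procedure above (average the base-10 logarithms, then take the antilog) can be sketched in Python; `gm_logs` is an illustrative name. For a continuous series, pass the mid values m as the values. All values must be positive, since the logarithm of zero or a negative number is undefined:

```python
import math


def gm_logs(values, freqs=None):
    """G.M. = antilog( sum(f * log X) / N ), using base-10 logs as in the tables."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series: every f = 1
    N = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / N
    return 10 ** mean_log                  # antilog


print(round(gm_logs([3, 6, 24, 48]), 2))                           # 12.0
print(round(gm_logs([10, 15, 25, 40, 50], [4, 6, 10, 7, 3]), 2))   # 23.89
```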
Continuous Series:
G.M = 25.63
Harmonic mean
Definition: The harmonic mean is the reciprocal of the mean of the reciprocals of the values.
Formula:
H.M. $= \frac{N}{\sum \left(\frac{1}{X}\right)}$ for individual observations
H.M. $= \frac{N}{\sum \left(\frac{f}{X}\right)}$ for discrete observations
H.M. $= \frac{N}{\sum \left(\frac{f}{m}\right)}$ for continuous observations
Individual series:
Problem 1: Find the harmonic mean for the following individual data.
Solution:
Value X   1/X
6         0.1667
15        0.0667
35        0.0286
40        0.0250
900       0.0011
520       0.0019
300       0.0033
400       0.0025
1800      0.0006
2000      0.0005
          ∑(1/X) = 0.2969
∴ H.M. $= \frac{N}{\sum (1/X)} = \frac{10}{0.2969} = 33.68$
Discrete Series:
X:  10  12  14  16  18  20
f:  5   18  20  10  6   1
Solution:
X    f    f/X
10   5    0.5000
12   18   1.5000
14   20   1.4286
16   10   0.6250
18   6    0.3333
20   1    0.0500
     N = 60   ∑(f/X) = 4.4369
∴ H.M. $= \frac{N}{\sum (f/X)} = \frac{60}{4.4369} = 13.52$
Continuous Series:
Class interval   Frequency (f)   Mid value (m)   f/m
0-10             8               5               1.6000
10-20            12              15              0.8000
20-30            20              25              0.8000
30-40            6               35              0.1714
40-50            4               45              0.0889
                 N = 50                          ∑(f/m) = 3.4603
∴ H.M. $= \frac{N}{\sum (f/m)} = \frac{50}{3.4603} = 14.45$
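All three harmonic mean formulae are N / Σ(f/X), with f = 1 for an individual series and the mid values m in place of X for a continuous series. A minimal sketch (`hm` is an illustrative name) using the three examples above:

```python
def hm(values, freqs=None):
    """H.M. = N / sum(f / X); for a continuous series pass the mid values as X."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series: every f = 1
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


print(round(hm([6, 15, 35, 40, 900, 520, 300, 400, 1800, 2000]), 2))   # 33.69
print(round(hm([10, 12, 14, 16, 18, 20], [5, 18, 20, 10, 6, 1]), 2))    # 13.52
print(round(hm([5, 15, 25, 35, 45], [8, 12, 20, 6, 4]), 2))             # 14.45
```

The first result differs slightly from the table value 33.68 because the table rounds each reciprocal to four decimal places before adding.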
UNIT – II
Measures of Dispersion
Introduction:
In a series, all the items are not equal. There is difference or variation among the values. The
degree of variation is evaluated by various measures of dispersion.
Averages are central values. They enable comparison of two or more sets of data, but they are not sufficient to depict the true nature of the sets. For example, consider the following marks of two students.
Student I Student II
68 85
75 90
65 80
67 25
70 65
Both have got a total of 345 and an average of 69 each. The fact is that the second student has
failed in one paper. When the averages alone are considered, the two students are equal.
What is Dispersion?
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or quantities of the items of a group or series. “Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate the spread of the data.
In all these definitions we can find the basic property of dispersion: a value that indicates the extent to which all other values are dispersed about the central value in a particular distribution.
Types of Dispersion
The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will provide the value in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of absolute dispersion.
‘Relative’ dispersion, or the ‘coefficient’ of dispersion, is the ratio or the percentage of a measure of absolute dispersion to an appropriate average. The basic advantage of this measure is that two or more series can be compared with each other even though they are expressed in different units. Theoretically, the ‘absolute measure’ of dispersion is better. But from a practical point of view, the relative measure or coefficient of dispersion is considered better, as it is used to make comparisons between series.
Methods of Dispersion
Methods of studying dispersion are divided into two types :
(i) Mathematical Methods: We can study the ‘degree’ and ‘extent’ of variation by these methods. In
this category, commonly used measures of dispersion are :
(a) Range
(b) Quartile Deviation
(c) Average Deviation
(d) Standard deviation and coefficient of variation.
(ii) Graphic Methods: Where we want to study only the extent of variation, whether it is higher or lower, a Lorenz curve is used.
Mathematical Methods
Range
It is the simplest method of studying dispersion. Range is the difference between the largest value and the smallest value of a series. While computing range, we do not take into account the frequencies of different groups.
Formula: Absolute Range = L − S
Coefficient of Range $= \frac{L - S}{L + S}$
where L represents the largest value in a distribution and S represents the smallest value in a distribution. We can understand the computation of range with the help of examples of different series.
(i) Raw Data: Marks out of 50 in a subject of 12 students in a class are given as follows:
12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12 and 35.
In the example, the highest marks obtained by a candidate are ‘35’ and the lowest marks obtained by a candidate are ‘12’. Therefore, we can calculate the range:
L = 35 and S = 12
Absolute Range = L − S = 35 − 12 = 23 marks
Coefficient of Range $= \frac{L - S}{L + S} = \frac{35 - 12}{35 + 12} = \frac{23}{47} = 0.49$
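(ii) Discrete series:
----------------------------------------------------------
Marks of the students in Statistics (out of 50)   No. of students
----------------------------------------------------------
Smallest   10                                      4
           12                                      10
           18                                      16
Largest    20                                      15
----------------------------------------------------------
Total                                              45
----------------------------------------------------------
L = 20 and S = 10. Absolute Range = 20 − 10 = 10 marks; Coefficient of Range $= \frac{20 - 10}{20 + 10} = 0.33$
(iii) Continuous series:
------------------------------------------
X           Frequencies
------------------------------------------
10 – 15     4
15 – 20     10
20 – 25     26
25 – 30     8
------------------------------------------
S = 10 (lower limit of the first class) and L = 30 (upper limit of the last class). Absolute Range = 30 − 10 = 20; Coefficient of Range $= \frac{30 - 10}{30 + 10} = 0.5$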
Range is the simplest method of studying dispersion. It takes less time to compute the ‘absolute’ and ‘relative’ range. Range does not take into account all the values of a series, i.e. it considers only the extreme items, and the middle items are not given any importance. Therefore, range cannot tell us anything about the character of the distribution. Range cannot be computed in the case of ‘open-end’ distributions, i.e. distributions where the lower limit of the first group and the upper limit of the last group are not given.
The concept of range is useful in the field of quality control and in studying the variations in the prices of shares etc.
Merits:
1. It is simple to compute and understand
2. It gives a rough but quick answer.
Demerits:
1. It is not reliable because it is affected by the extreme items.
2. It cannot be applied to open-end cases.
3. It is not suitable for mathematical treatment.
Uses:
1. Range is used in industries for statistical quality control (SQC) of manufactured products, in the construction of control charts.
2. Range is useful in studying the variation in the prices of stocks, shares and other commodities that are sensitive to price changes from one period to another.
3. The meteorological department uses the range for weather forecasts.
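The absolute range and the coefficient of range depend only on the two extreme values. A minimal sketch (`range_and_coefficient` is an illustrative name) applied to the three series above; for the discrete series the frequencies are ignored, and for the continuous series S and L are the lower limit of the first class and the upper limit of the last class:

```python
def range_and_coefficient(values):
    """Absolute range L - S and coefficient of range (L - S) / (L + S)."""
    L, S = max(values), min(values)
    return L - S, (L - S) / (L + S)


raw = [12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12, 35]
for name, vals in [("raw", raw), ("discrete", [10, 12, 18, 20]), ("continuous", [10, 30])]:
    r, c = range_and_coefficient(vals)
    print(name, r, round(c, 2))
```

This prints 23 and 0.49 for the raw data, 10 and 0.33 for the discrete series, and 20 and 0.5 for the continuous series.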
Quartile Deviation
Solution:
In the case of quartile deviation, it is necessary to calculate the values of Q1 and Q3 by arranging the given data in ascending or descending order.
Therefore, the arranged data are (in ascending order):
X = 10, 12, 18, 20, 25, 32
No. of items, N = 6
Q1 = value of the $\left(\frac{N+1}{4}\right)$th item = value of the $\left(\frac{7}{4}\right)$th item = 1.75th item
= value of 1st item + 0.75 (value of 2nd item − value of 1st item)
= 10 + 0.75 (12 − 10) = 10 + 0.75(2) = 10 + 1.50 = 11.50
Q3 = value of the $\left(\frac{3(N+1)}{4}\right)$th item = value of the $3\left(\frac{7}{4}\right)$th item = 5.25th item
= value of 5th item + 0.25 (value of 6th item − value of 5th item)
= 25 + 0.25 (32 − 25) = 25 + 0.25 (7) = 25 + 1.75 = 26.75
Therefore,
(i) Inter-quartile range = Q3 − Q1 = 26.75 − 11.50 = 15.25
(ii) Semi-quartile range $= \frac{Q_3 - Q_1}{2} = \frac{15.25}{2} = 7.625$
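The positional rule above (Qj is the value of the j(N + 1)/4-th item, interpolating between neighbouring items when the position is fractional) can be sketched as follows; `quartile_individual` is an illustrative name. Other conventions exist and give slightly different quartiles; Python's `statistics.quantiles(data, n=4)` with its default 'exclusive' method uses the same (N + 1) positions, so it is used here as a cross-check (Python 3.8+):

```python
import statistics


def quartile_individual(xs, j):
    """Q_j = value of the j(N + 1)/4-th item, interpolating between neighbouring items."""
    s = sorted(xs)
    pos = j * (len(s) + 1) / 4          # 1-based position, may be fractional
    k = int(pos)                        # whole part of the position
    frac = pos - k
    if k < 1:
        return s[0]
    if k >= len(s):
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])


data = [10, 12, 18, 20, 25, 32]
q1, q3 = quartile_individual(data, 1), quartile_individual(data, 3)
print(q1, q3)                           # 11.5 26.75
print(q3 - q1, (q3 - q1) / 2)           # 15.25 7.625
print(statistics.quantiles(data, n=4))  # cross-check: first and last entries match Q1 and Q3
```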
Example:
----------------------------------------
Value (Rs.)   Frequency (f)
----------------------------------------
60            4
100           20
120           21
140           16
160           9
----------------------------------------
Solution:
--------------------------------------------------------------------------
Value (Rs.)   f    Cumulative frequency (cf)
--------------------------------------------------------------------------
60            4    4
100           20   24   – Q1 lies in this cumulative frequency
120           21   45
140           16   61   – Q3 lies in this cumulative frequency
160           9    70
--------------------------------------------------------------------------
N = ∑f = 70
--------------------------------------------------------------------------
Calculation of Q1:
Q1 = size of the $\left(\frac{N+1}{4}\right)$th item = size of the $\left(\frac{71}{4}\right)$th item = 17.75th item
17.75 lies in the cumulative frequency 24, which corresponds to the value Rs. 100.
Q1 = Rs. 100
Calculation of Q3:
Q3 = size of the $\left(\frac{3(N+1)}{4}\right)$th item = size of the $\left(\frac{3 \times 71}{4}\right)$th item = 53.25th item
53.25 lies in the cumulative frequency 61, which corresponds to Rs. 140.
Q3 = Rs. 140
---------------------------------------------
---------------------------------------------
10 – 20 4
20 – 30 6
30 – 10 10
40 – 50 5
----------------------------------------
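Solution:
N = ∑f = 25 and the cumulative frequencies are 4, 10, 20, 25. In this example, the values of Q1 and Q3 are obtained as follows:
Q1 = size of $\frac{N}{4}$th item = $\frac{25}{4}$ = 6.25th item. It lies in the cumulative frequency 10, which corresponds to the class 20 – 30.
Therefore, the Q1 group is 20 – 30,
where L = 20, f = 6, i = 10 and cf = 4.
$Q_1 = L + \frac{N/4 - cf}{f} \times i = 20 + \frac{6.25 - 4}{6} \times 10 = 20 + 3.75$ = Rs. 23.75
Q3 = size of $\frac{3N}{4}$th item = $\frac{3 \times 25}{4}$ = 18.75th item, which lies in the cumulative frequency 20, which corresponds to the class 30 – 40. Therefore, the Q3 group is 30 – 40,
where L = 30, i = 10, cf = 10 and f = 10.
$Q_3 = L + \frac{3N/4 - cf}{f} \times i = 30 + \frac{18.75 - 10}{10} \times 10 = 30 + 8.75$ = Rs. 38.75
Therefore:
(i) Inter-quartile range = Q3 – Q1 = Rs. 38.75 – Rs. 23.75 = Rs. 15.00
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15.00}{2}$ = Rs. 7.50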
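The interpolation formula for a continuous series, $L + \frac{pN - cf}{f} \times i$, can be sketched as a small Python function. This is an illustrative sketch: `quantile_grouped` is a hypothetical name, the classes are assumed continuous and in ascending order, and the third class of the example is taken as 30 – 40.

```python
def quantile_grouped(classes, freqs, p):
    """Interpolated quantile of a continuous series.

    classes: list of (lower, upper) class limits in ascending order
    freqs:   frequency of each class
    p:       0.25 for Q1, 0.5 for the median, 0.75 for Q3
    Applies  L + (pN - cf) / f * i, where cf is the cumulative frequency
    before the quantile class and i is its width.
    """
    target = p * sum(freqs)                 # N/4, N/2 or 3N/4
    cf_before = 0
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:   # quantile class found
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("p must be between 0 and 1")


classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 6, 10, 5]
q1 = quantile_grouped(classes, freqs, 0.25)
q3 = quantile_grouped(classes, freqs, 0.75)
print(q1, q3, (q3 - q1) / 2)    # 23.75 38.75 7.5
```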
Merits:
1. It is simple to understand and easy to compute
2. It is not influenced by the extreme items
3. It can be found for open-end distributions.
4. It is not affected by presence of extreme items.
Demerits:
1. It ignores the first 25% of the items and the last 25% of the items.
2. It is a positional average; hence it is not amenable to further mathematical treatment.
3. Its value is affected by sampling fluctuations
4. It gives only a rough measure.
Standard Deviation
Merits:
1. It is rigidly defined
2. It is the most important and widely used measure of dispersion
3. It is possible for further algebraic treatment
4. The standard deviation provides the unit of measurement for the normal distribution
5. Standard deviation is used in finding the coefficient of variation.
Demerits:
1. It is not easy to understand and it is difficult to calculate
2. It gives more weight to extreme values
3. It is affected by the value of every item in the series
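4. Being an absolute measure, it cannot be used for the purpose of comparing series expressed in different units; the coefficient of variation is used instead.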
Uses:
Standard deviation is the best measure of dispersion. It is widely used in statistics because it possesses most of the characteristics of an ideal measure of dispersion. It is widely used in sampling theory and by biologists. It is used in computing the coefficient of correlation and in the study of symmetrical frequency distributions.
Skewness
Consider the following three continuous series with common mid-values. Coefficients of skewness are calculated later, but the values of the averages and quartiles are presented in tabular form now, and the frequency curves are drawn at the bottom of the table. How does a symmetric curve look, what is the relation between the averages in such a case, and how are the quartiles then related? These are a few of the questions answered here; the corresponding aspects of skewed curves are also brought out.
Quartiles            | Q1 = 42.95      | Q1 = 39.81                     | Q1 = 42.19
                     | M = 50.00       | M = 46.73                      | M = 53.27
                     | Q3 = 57.05      | Q3 = 57.81                     | Q3 = 60.19
Relation             | Q3 – M = M – Q1 | Q3 – M > M – Q1                | Q3 – M < M – Q1
Nature of the curve  | Bell shaped     | Longer tail on the right side  | Longer tail on the left side
                     |                 | (skewed to the right)          | (skewed to the left)
Absolute Measures: The following are the two absolute measures of skewness. They are of no practical use for comparison. They indicate whether there is skewness or not and, when there is skewness, whether it is positive or negative. They cannot be used for comparison.
1. Mean – Mode
2. (Q3 – M) – (M – Q1)
Even Mean – Median and Median – Mode are suggested as measures of skewness.
Relative Measures: The following five are the relative measures. According to G. Simpson and F. Kafka, “the same amount of skewness (absolute) has different meanings in distributions with small variation and in distributions with large variation.” Absolute measures are therefore divided by certain measures of dispersion to eliminate the influence of variation. Relative measures are called coefficients. They are used to compare two or more series.
1. Karl Pearson (1857 – 1936) was a great British biometrician and statistician. He introduced the formula given below.
Karl Pearson’s Coefficient of skewness,
$SK_P = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$
Theoretically, no limit can be found for this measure, but it is mostly found to vary between -1 and +1. Based on the interrelation between mean, median and mode in a moderately skewed distribution (Mode = 3 Median – 2 Mean, so that Mean – Mode = 3(Mean – Median)), his second formula is:
$SK_P = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$
It can be used when the mode is ill defined. Theoretically, this measure lies between -3 and +3, but it rarely lies outside -1 and +1.
3. Kelly’s Coefficient of skewness,
$SK_K = \frac{D_9 + D_1 - 2M}{D_9 - D_1}$
where D1 and D9 are the first and ninth deciles and M is the median.
This method is also useful where there is an open-end class interval or extreme values are present. This formula is better than Bowley’s: Bowley’s formula ignores the lowest 25% and the highest 25% of the items, whereas this formula ignores only the lowest 10% and the highest 10%. However, Kelly’s coefficient is very rarely used.
4. $\beta_1$ (read, beta one) $= \frac{\mu_3^2}{\mu_2^3}$, where μ2, the second central moment, and μ3, the third central moment, are considered later under moments in this chapter.
5. Moment Coefficient of Skewness,
$\gamma_1$ (read, gamma one) $= \frac{\mu_3}{\mu_2^{3/2}}$
Moment measures are calculated for distributions such as the Binomial, Poisson, Normal, Chi-square, Student’s t and F. Karl Pearson’s coefficient is widely used for numerical data, because it is based on the best measure of central tendency, the mean, and the best measure of dispersion, the standard deviation.
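The moment measures β1 and γ1 can be computed directly from the central moments. A minimal sketch, assuming individual observations and moments with divisor N; `moment_skewness` is a hypothetical helper, and the two small data sets are made-up illustrations, not data from these notes.

```python
def moment_skewness(values):
    """Return (beta1, gamma1) for a set of individual observations.

    mu2 and mu3 are the second and third central moments (divisor N).
    beta1 = mu3^2 / mu2^3 is never negative, so gamma1 = mu3 / mu2^(3/2)
    is the measure that shows the direction of skewness.
    """
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((x - mean) ** 2 for x in values) / n
    mu3 = sum((x - mean) ** 3 for x in values) / n
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5
    return beta1, gamma1


print(moment_skewness([1, 2, 3, 4, 5]))   # symmetric: (0.0, 0.0)
print(moment_skewness([0, 0, 0, 1]))      # long right tail, so gamma1 > 0
```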
Range:
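Definition: Range is the difference between the greatest (largest) and the smallest of the values.
In symbols, Range = L – S, where L is the largest and S the smallest value.
In individual observations and discrete series, L and S are easily identified. In continuous series, either of the following two methods is followed: (i) L is the upper boundary of the highest class and S the lower boundary of the lowest class; (ii) L and S are the mid-values of the highest and the lowest classes.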
Problem 1: Find the value of range and its coefficient for the following data.
8 10 5 9 12 11
Solution: L = 12, S = 5
Range = L – S = 12 – 5 = 7
Coefficient of Range = $\frac{L - S}{L + S} = \frac{12 - 5}{12 + 5} = \frac{7}{17}$ = 0.4118
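Problem 2: Calculate range and its coefficient from the following distribution:
Solution:
Method (i), using class boundaries: the lower boundary of the lowest class, S = 59.5, and the upper boundary of the highest class, L = 74.5.
Range = L – S = 74.5 – 59.5 = 15
Coefficient of Range = $\frac{L - S}{L + S} = \frac{74.5 - 59.5}{74.5 + 59.5} = \frac{15}{134}$ = 0.1119
Method (ii), using mid-values: L = 73 (mid-value of the highest class) and S = 61 (mid-value of the lowest class).
Range = L – S = 73 – 61 = 12
Coefficient of Range = $\frac{L - S}{L + S} = \frac{73 - 61}{73 + 61} = \frac{12}{134}$ = 0.0896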
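Range and its coefficient reduce to two lines of arithmetic. The sketch below, with the hypothetical helper `range_and_coefficient`, reproduces Problems 1 and 2; for the continuous series it simply takes L and S as supplied, whether they are class boundaries or mid-values.

```python
def range_and_coefficient(largest, smallest):
    """Range = L - S and Coefficient of Range = (L - S) / (L + S)."""
    r = largest - smallest
    return r, r / (largest + smallest)


# Problem 1: individual observations
data = [8, 10, 5, 9, 12, 11]
print(range_and_coefficient(max(data), min(data)))
# Problem 2: class boundaries, then mid-values of the extreme classes
print(range_and_coefficient(74.5, 59.5))
print(range_and_coefficient(73, 61))
```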
Definition: Quartile Deviation is half of the difference between the first and the third quartiles. Hence it is called Semi Inter Quartile Range.
In symbols, Q.D. = $\frac{Q_3 - Q_1}{2}$. Q.D. is the abbreviation. Among the quartiles Q1, Q2 and Q3, the range is Q3 – Q1. Hence, Q3 – Q1 is called the inter quartile range and $\frac{Q_3 - Q_1}{2}$ the semi inter quartile range.
Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
As mentioned in the previous chapter, 25% of the items are below or equal to Q1 and 25% are above or equal to Q3. Q3 – Q1 is the distance between Q1 and Q3, and the central 50% of the items lie between Q1 and Q3. It is customary to consider $\frac{Q_3 - Q_1}{2}$ as an absolute measure of dispersion. Definitions and calculations of Q1 and Q3 for all types of data were considered in the previous chapter.
Individual Series:
Problem 1: What do you mean by Quartile Deviation? Find the Quartile Deviation for the following.
391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488.
Solution: The given values in ascending order: 384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488.
Position of Q1 = $\frac{N+1}{4} = \frac{10+1}{4}$ = 2.75
Q1 = value of 2nd item + 0.75 (value of 3rd item – value of 2nd item) = 391 + 0.75 (407 – 391)
= 391 + 12
∴ Q1 = 403
Discrete Series:
Problem: Weekly wages of labourers are given below. Calculate Q.D. and Coefficient of Q.D.
Solution:
Position of Q1 = $\frac{N+1}{4} = \frac{52+1}{4}$ = 13.25
Continuous Series:
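Problem 1: For the data given here, give the quartile deviation.
$\frac{N}{4} = \frac{400}{4}$ = 100: the Q1 class is 500.5 – 650.5 ∴ L = 500.5; f = 189; cf = 48; i = 150
∴ $Q_1 = L + \frac{N/4 - cf}{f} \times i$
$= 500.5 + \frac{100 - 48}{189} \times 150$
$= 500.5 + \frac{150 \times 52}{189}$
= 500.5 + 41.27 = 541.77
$\frac{3N}{4}$ = 3 × 100 = 300: the Q3 class is 650.5 – 800.5 ∴ L = 650.5; f = 88; cf = 237; i = 150
∴ $Q_3 = L + \frac{3N/4 - cf}{f} \times i$
$= 650.5 + \frac{300 - 237}{88} \times 150$
$= 650.5 + \frac{150 \times 63}{88}$
= 650.5 + 107.39 = 757.89
∴ Q.D. = $\frac{Q_3 - Q_1}{2} = \frac{757.89 - 541.77}{2}$ = 108.06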
Standard Deviation
Definition: Standard Deviation is the root mean square deviation of the values from their arithmetic mean.
S.D. is the abbreviation and σ (read, sigma) is the symbol. The mean square deviation of the values from their A.M. is the Variance and is denoted by σ². S.D. is the positive square root of the variance. Karl Pearson introduced the concept of standard deviation in 1893. S.D. is also called root mean square deviation. Mean deviation has the mathematical deficiency of ignoring negative signs; standard deviation avoids it by squaring the deviations. Standard deviation possesses most of the desirable properties of a good measure of dispersion. It is the most widely used absolute measure of dispersion. The corresponding relative measure is the Coefficient of Variation. It is so popular and so extensively used as to raise a doubt whether there is any other relative measure of dispersion.
Coefficient of Variation = $\frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$
Individual Observation:
Problem 1: Ten students of the B.Com class of a college have obtained the following marks in Statistics out of 100 marks. Calculate the Standard Deviation.
S. No:  1   2   3   4   5   6   7   8   9   10
Marks:  5   10  20  25  40  42  45  48  70  80
Solution:
S. No | Marks (X) | X²
1     | 5         | 25
2     | 10        | 100
3     | 20        | 400
4     | 25        | 625
5     | 40        | 1600
6     | 42        | 1764
7     | 45        | 2025
8     | 48        | 2304
9     | 70        | 4900
10    | 80        | 6400
Total | ∑X = 385  | ∑X² = 20143
Standard Deviation: Formula, $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$
$= \sqrt{\frac{20143}{10} - \left(\frac{385}{10}\right)^2}$
$= \sqrt{2014.3 - (38.5)^2}$
$= \sqrt{2014.30 - 1482.25}$
$= \sqrt{532.05}$
= 23.07
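The shortcut formula used above can be checked in Python. This sketch assumes the population S.D. (divisor N) used throughout these notes; `sd_shortcut` is a hypothetical name, and the standard-library `statistics.pstdev` is shown only as a cross-check.

```python
import math
import statistics


def sd_shortcut(values):
    """Population S.D.: sqrt(sum(X^2)/N - (sum(X)/N)^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


marks = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
print(round(sd_shortcut(marks), 2))        # 23.07
# statistics.pstdev uses the same divisor N, so it should agree;
# statistics.stdev divides by N - 1 and gives a larger value.
print(round(statistics.pstdev(marks), 2))  # 23.07
```

The shortcut form is convenient by hand, but with very large values it can lose accuracy through cancellation; taking deviations from the mean first avoids that.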
Discrete Series:
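X     | f      | fX       | fX²
0     | 1      | 0        | 0
1     | 2      | 2        | 2
2     | 4      | 8        | 16
3     | 3      | 9        | 27
4     | 0      | 0        | 0
5     | 2      | 10       | 50
Total | N = 12 | ∑fX = 29 | ∑fX² = 95
Standard Deviation, $\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$
$= \sqrt{\frac{95}{12} - \left(\frac{29}{12}\right)^2}$
$= \sqrt{7.9167 - (2.4167)^2}$
$= \sqrt{7.9167 - 5.8404}$
$= \sqrt{2.0763}$ = 1.44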
Continuous Series:
Problem 3: The following data were obtained while observing the life span of a few neon lights of a company. Calculate S.D.
Life Span (Years) | No. of Neon Lights (f) | Mid Value (m) | fm        | fm²
4 – 6             | 10                     | 5             | 50        | 250
6 – 8             | 17                     | 7             | 119       | 833
8 – 10            | 32                     | 9             | 288       | 2592
10 – 12           | 21                     | 11            | 231       | 2541
12 – 14           | 20                     | 13            | 260       | 3380
Total             | N = 100                | –             | ∑fm = 948 | ∑fm² = 9596
Standard Deviation, $\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2}$
$= \sqrt{\frac{9596}{100} - \left(\frac{948}{100}\right)^2}$
$= \sqrt{95.96 - (9.48)^2}$
$= \sqrt{95.96 - 89.8704} = \sqrt{6.0896}$ = 2.47 years
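Discrete and continuous series use the same weighted formula, with X replaced by the mid-value m for a continuous series. A minimal sketch with a hypothetical helper `sd_frequency`, applied to the two frequency tables above:

```python
import math


def sd_frequency(xs, fs):
    """Population S.D. of a frequency distribution.

    xs: the values X (discrete series) or class mid-values m (continuous)
    fs: the corresponding frequencies f
    """
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Discrete series: X = 0..5 with N = 12
print(round(sd_frequency([0, 1, 2, 3, 4, 5], [1, 2, 4, 3, 0, 2]), 2))     # 1.44
# Continuous series: mid-values of 4-6, 6-8, 8-10, 10-12, 12-14 years
print(round(sd_frequency([5, 7, 9, 11, 13], [10, 17, 32, 21, 20]), 2))    # 2.47
```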
Definition: Variance is the mean square deviation of the values from their arithmetic mean.
Individual series:
2 0 1 3 0 4 3 1 1 2
Calculate variance.
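Solution:
X  | X²
2  | 4
0  | 0
1  | 1
3  | 9
0  | 0
4  | 16
3  | 9
1  | 1
1  | 1
2  | 4
∑X = 17 | ∑X² = 45
Mean, $\bar{X} = \frac{\sum X}{N} = \frac{17}{10}$ = 1.7
Variance, $\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2 = \frac{45}{10} - (1.7)^2$ = 4.5 – 2.89 = 1.61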
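The variance calculation can be checked with a few lines of Python. The sketch assumes the divisor N, as in these notes, and shows why ΣX²/N by itself (4.5 here) is only the mean of the squares: the square of the mean must still be subtracted. `variance` is a hypothetical helper; `statistics.pvariance` is the standard-library equivalent.

```python
import statistics


def variance(values):
    """Variance with divisor N: mean of squares minus square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n   # 4.5 for the data below
    return mean_of_squares - mean ** 2


x = [2, 0, 1, 3, 0, 4, 3, 1, 1, 2]
print(round(variance(x), 2))                # 1.61
print(round(statistics.pvariance(x), 2))    # 1.61 (standard-library check)
```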
Discrete Series:
Problem 2: From the following data on daily sales of TV sets, calculate variance.
Variance, $\sigma^2 = \frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2 = \frac{4873}{25} - \left(\frac{317}{25}\right)^2$
= 194.92 – 160.7824
= 34.14
Continuous Series:
Problem 3: The heights of the recruits are noted as follows. Calculate the variance.
Height (cms): 150 – 155, 155 – 160, 160 – 165, 165 – 170, 170 – 175
Variance, $\sigma^2 = \frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2 = \frac{2657775.00}{100} - \left(\frac{16290.0}{100}\right)^2$
= 26577.75 – (162.90)²
= 26577.75 – 26536.41
= 41.34
Coefficient of Variation
Formula: Coefficient of Variation = $\frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$
C.V. is the abbreviation. ∴ C.V. = $\frac{\text{S.D.}}{\text{A.M.}} \times 100 = \frac{\sigma}{\bar{X}} \times 100$
Individual Series:
Problem 1: The means and standard deviations of the number of runs of two players A and B are 55, 65 and 4.2, 7.8 respectively. Who is the more consistent player?
Solution: Given: mean of A = 55, S.D. of A = 4.2; mean of B = 65, S.D. of B = 7.8.
∴ Coefficient of Variation of player A = $\frac{\text{S.D.}}{\text{A.M.}} \times 100 = \frac{4.2}{55} \times 100$ = 7.64
∴ Coefficient of Variation of player B = $\frac{\text{S.D.}}{\text{A.M.}} \times 100 = \frac{7.8}{65} \times 100$ = 12.00
The coefficient of variation of player A is less. Therefore, Player A is the more consistent player.
40 41 45 49 50 51 55 59 60 60
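Solution: Mean, $\bar{X} = \frac{\sum X}{N} = \frac{510}{10}$ = 51.00
X        | X – X̄ (X̄ = 51) | (X – X̄)²
40       | -11            | 121
41       | -10            | 100
45       | -6             | 36
49       | -2             | 4
50       | -1             | 1
51       | 0              | 0
55       | 4              | 16
59       | 8              | 64
60       | 9              | 81
60       | 9              | 81
∑X = 510 |                | ∑(X – X̄)² = 504
S.D., $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{504}{10}} = \sqrt{50.4}$ = 7.10
C.V. = $\frac{\sigma}{\bar{X}} \times 100 = \frac{7.10}{51.00} \times 100$ = 13.92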
Discrete Series:
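Problem 4: From the following prices of gold in two cities (A and B) during a week, find the city in which the price was more stable.
Solution:
City A (X1) | X1 – X̄1 (X̄1 = 503) | (X1 – X̄1)²       | City B (X2) | X2 – X̄2 (X̄2 = 501) | (X2 – X̄2)²
498         | -5                  | 25               | 500         | -1                  | 1
500         | -3                  | 9                | 505         | 4                   | 16
505         | 2                   | 4                | 502         | 1                   | 1
504         | 1                   | 1                | 498         | -3                  | 9
502         | -1                  | 1                | 496         | -5                  | 25
509         | 6                   | 36               | 505         | 4                   | 16
∑X1 = 3018  | –                   | ∑(X1 – X̄1)² = 76 | ∑X2 = 3006  | –                   | ∑(X2 – X̄2)² = 68
City A: $\bar{X}_1 = \frac{\sum X_1}{N} = \frac{3018}{6}$ = Rs. 503          City B: $\bar{X}_2 = \frac{\sum X_2}{N} = \frac{3006}{6}$ = Rs. 501
S.D., $\sigma_1 = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2}{N}} = \sqrt{\frac{76}{6}} = \sqrt{12.6667}$ = Rs. 3.56          S.D., $\sigma_2 = \sqrt{\frac{68}{6}} = \sqrt{11.3333}$ = Rs. 3.37
C.V. (City A) = $\frac{\sigma_1}{\bar{X}_1} \times 100 = \frac{3.56}{503} \times 100$ = 0.71          C.V. (City B) = $\frac{\sigma_2}{\bar{X}_2} \times 100 = \frac{3.37}{501} \times 100$ = 0.67
The coefficient of variation of price in City B is less. Hence, the price was more stable in City B.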
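Comparing stability by the coefficient of variation can be sketched as follows. This assumes, as in the worked solution, that σ is the population S.D. (divisor N); `coefficient_of_variation` is a hypothetical helper built on the standard-library `statistics` module.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = sigma / mean x 100, with sigma the population S.D."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


city_a = [498, 500, 505, 504, 502, 509]
city_b = [500, 505, 502, 498, 496, 505]
cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)
print(round(cv_a, 2), round(cv_b, 2))                         # 0.71 0.67
print("More stable:", "City A" if cv_a < cv_b else "City B")  # City B
```

The smaller C.V. marks the more stable (more consistent) series; the same comparison applies to the two players and the two football teams.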
Discrete Series:
Problem 5: Goals scored by two teams A and B in a series of football matches were observed as follows.
Solution: The goals (X) are common; the numbers of matches (f) differ between the teams.
Mean, $\bar{X}_1 = \frac{\sum f_1 X}{N_1} = \frac{49}{25}$ = 1.96          $\bar{X}_2 = \frac{\sum f_2 X}{N_2} = \frac{54}{24}$ = 2.25
S.D., $\sigma_1 = \sqrt{\frac{\sum f_1 X^2}{N_1} - \left(\frac{\sum f_1 X}{N_1}\right)^2}$ = 1.61          S.D., $\sigma_2 = \sqrt{\frac{\sum f_2 X^2}{N_2} - \left(\frac{\sum f_2 X}{N_2}\right)^2}$ = 1.61
C.V. (Team A) = $\frac{\sigma_1}{\bar{X}_1} \times 100 = \frac{1.61}{1.96} \times 100$ = 82.14          C.V. (Team B) = $\frac{\sigma_2}{\bar{X}_2} \times 100 = \frac{1.61}{2.25} \times 100$ = 71.56
The coefficient of variation of Team B is less. Hence, Team B is the more consistent team.
Mean deviation
M.D. about mean: The mean, $\bar{X} = \frac{\sum X}{N}$, is calculated first. From each X the mean is subtracted and the signs of the deviations are ignored; then M.D. (about mean) = $\frac{\sum |X - \bar{X}|}{N}$.
M.D. about median or mode: The median or mode, whichever is required, is calculated first. Then, as in M.D. about mean, the other calculations follow.
Example 1:
Daily earnings (Rs. X) of 10 coolies are given. Calculate all the three mean deviations and the corresponding relative measures.
X: 32 51 23 46 20 78 57 56 57 30
Solution:
Mode, Z = Rs. 57
Discrete series: The measure of central tendency (mean, median or mode) is calculated first. The following formulae are used later.
Mean Deviation (About Mean) = $\frac{\sum f|X - \bar{X}|}{N}$
The mean, $\bar{X} = \frac{\sum fX}{N}$, is calculated first.
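For Example 1 (daily earnings of 10 coolies), the three mean deviations can be computed as in the sketch below. It assumes the usual relative measure, coefficient of M.D. = M.D. ÷ the average used; `mean_deviation` is a hypothetical helper, and the averages come from the standard-library `statistics` module.

```python
import statistics


def mean_deviation(values, about):
    """M.D. = sum(|X - A|) / N about a chosen average A."""
    return sum(abs(x - about) for x in values) / len(values)


earnings = [32, 51, 23, 46, 20, 78, 57, 56, 57, 30]
averages = {
    "mean": statistics.mean(earnings),      # 45
    "median": statistics.median(earnings),  # 48.5
    "mode": statistics.mode(earnings),      # 57
}
for name, a in averages.items():
    md = mean_deviation(earnings, a)
    # relative measure: coefficient of M.D. = M.D. / average
    print(name, a, round(md, 2), round(md / a, 4))
```

Run as written, it gives M.D. about mean = 15.0, about median = 14.8 and about mode = 16.2, with coefficients of about 0.333, 0.305 and 0.284 respectively.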
Problem 1: From the marks secured by 120 students in Section A and 120 students in Section B of a class, the following measures are obtained:
Marks of Section A are more skewed. However, the marks of Section A are negatively skewed and the marks of Section B are positively skewed.
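Problem 2: From a moderately skewed distribution of retail prices for men’s shoes, it is found that the mean price is Rs. 20 and the median price is Rs. 17. If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness of the distribution.
Solution: Consider C.V. = $\frac{\sigma}{\bar{X}} \times 100$
By substituting the given values, $20 = \frac{\sigma}{20} \times 100$
∴ $\sigma = \frac{20 \times 20}{100}$ = 4
Since the mode is not given, Karl Pearson’s second formula is used:
$SK_P = \frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(20 - 17)}{4}$ = 2.25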
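Problem 3: The sum and the sum of the squares of 60 items are 1860 and 67100 respectively. Mode is 28.49. Find Pearson’s coefficient of skewness.
Solution: Mean, $\bar{X} = \frac{\sum X}{N} = \frac{1860}{60}$ = 31
S.D., $\sigma = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2} = \sqrt{\frac{67100}{60} - (31)^2}$
$= \sqrt{1118.3333 - 961}$
$= \sqrt{157.3333}$
= 12.54
$SK_P = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{31 - 28.49}{12.54} = \frac{2.51}{12.54}$ = 0.20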
Individual Series:
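Problem 1: Calculate Karl Pearson’s coefficient of skewness for the following data:
25 15 23 40 27 25 23 25 20
Solution:
X       | X²
25      | 625
15      | 225
23      | 529
40      | 1600
27      | 729
25      | 625
23      | 529
25      | 625
20      | 400
∑X = 223 | ∑X² = 5887
Mean, $\bar{X} = \frac{\sum X}{N} = \frac{223}{9}$ = 24.78
S.D., $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{5887}{9} - \left(\frac{223}{9}\right)^2}$
$= \sqrt{654.1111 - 613.9383} = \sqrt{40.1728}$ = 6.34
Mode, Z = 25.00 (25 occurs most often, three times)
$SK_P = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{24.78 - 25.00}{6.34} = \frac{-0.22}{6.34}$ = -0.035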
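Karl Pearson’s two coefficients can be wrapped in small functions. The sketch below uses the unrounded mean 223/9 and the population S.D., so its σ may differ in the second decimal from a hand calculation that squares the rounded mean 24.78; the helper names `sk_pearson_mode` and `sk_pearson_median` are hypothetical.

```python
import statistics


def sk_pearson_mode(mean, mode, sd):
    """Karl Pearson's first coefficient: (Mean - Mode) / S.D."""
    return (mean - mode) / sd


def sk_pearson_median(mean, median, sd):
    """Karl Pearson's second coefficient: 3(Mean - Median) / S.D."""
    return 3 * (mean - median) / sd


data = [25, 15, 23, 40, 27, 25, 23, 25, 20]
mean = statistics.mean(data)     # 223/9, kept unrounded
sd = statistics.pstdev(data)     # divisor N, as in the notes
mode = statistics.mode(data)     # 25 occurs most often
print(round(sd, 2), round(sk_pearson_mode(mean, mode, sd), 3))   # 6.34 -0.035

# Summary-statistics problems from this section
print(sk_pearson_median(20, 17, 4))                  # shoe prices: 2.25
print(round(sk_pearson_mode(31, 28.49, 12.54), 2))   # 60 items: 0.2
```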
Discrete series:
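Problem 2: Calculate Karl Pearson’s coefficient of skewness for the following data:
Mean, $\bar{X} = \frac{\sum fX}{N} = \frac{5025}{200}$ = 25.125
S.D., $\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{141415}{200} - \left(\frac{5025}{200}\right)^2}$ = 8.71; Mode, Z = 25
$SK_P = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{25.125 - 25}{8.71}$ = 0.014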
Continuous Series:
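Mean, $\bar{X} = \frac{\sum fm}{N} = \frac{3360}{100}$ = 33.60
S.D., $\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2} = \sqrt{\frac{128100}{100} - \left(\frac{3360}{100}\right)^2}$ = 12.33; Mode, Z = 35.56
$SK_P = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{33.60 - 35.56}{12.33}$ = -0.16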
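Bowley’s Coefficient
Formula: $SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$
         | Q1    | M     | Q3
Series A | 40    | 60    | 80
Series B | 62.85 | 65.25 | 72.15
Series A: $SK_B = \frac{80 + 40 - 2(60)}{80 - 40} = \frac{0}{40}$ = 0
Series B: $SK_B = \frac{72.15 + 62.85 - 2(65.25)}{72.15 - 62.85} = \frac{4.50}{9.30}$ = 0.4839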
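No. of children per family (X) | No. of families (f) | Cum. freq. (cf)
0                              | 7                   | 7
1                              | 10                  | 17
2                              | 16                  | 33 ←
3                              | 25                  | 58 ←
4                              | 18                  | 76 ←
5                              | 11                  | 87
6                              | 8                   | 95
Total                          | N = 95              | ----
Position of Q1 = $\frac{N+1}{4} = \frac{95+1}{4}$ = 24th item ∴ Q1 = 2
Position of M = $\frac{N+1}{2} = \frac{95+1}{2}$ = 48th item ∴ M = 3
Position of Q3 = $\frac{3(N+1)}{4} = \frac{3(95+1)}{4}$ = 72nd item ∴ Q3 = 4
$SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{4 + 2 - 2(3)}{4 - 2}$ = 0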
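Annual sales (in Rs. 000): 0-20 | 20-50 | 50-100 | 100-250 | 250-500 | 500-1000
No. of items:               20   | 50    | 69     | 30      | 25      | 19
Solution:
Annual Sales (in Rs. 000) | No. of Items (f) | Cum. Freq. (cf)
0-20                      | 20               | 20
20-50                     | 50               | 70 ←
50-100                    | 69               | 139 ←
100-250                   | 30               | 169 ←
250-500                   | 25               | 194
500-1000                  | 19               | 213
Total                     | N = 213          | ----
$\frac{N}{4} = \frac{213}{4}$ = 53.25 ∴ 20-50 is the Q1 class. ∴ L = 20; i = 50 – 20 = 30; f = 50; cf = 20
∴ $Q_1 = L + \frac{N/4 - cf}{f} \times i = 20 + \frac{53.25 - 20}{50} \times 30 = 20 + \frac{30 \times 33.25}{50}$
= 20 + 19.95 = 39.95
$\frac{3N}{4}$ = 3 × 53.25 = 159.75 ∴ 100-250 is the Q3 class. ∴ L = 100; i = 150; f = 30; cf = 139
∴ $Q_3 = L + \frac{3N/4 - cf}{f} \times i = 100 + \frac{159.75 - 139}{30} \times 150 = 100 + \frac{150 \times 20.75}{30}$
= 100 + 103.75 = 203.75
$\frac{N}{2} = \frac{213}{2}$ = 106.5 ∴ 50-100 is the median class. ∴ L = 50; i = 50; f = 69; cf = 70
∴ $M = L + \frac{N/2 - cf}{f} \times i = 50 + \frac{106.5 - 70}{69} \times 50 = 50 + \frac{50 \times 36.5}{69}$
= 50 + 26.45 = 76.45
$SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \frac{203.75 + 39.95 - 2(76.45)}{203.75 - 39.95} = \frac{90.80}{163.80}$ = 0.5543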
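Bowley’s coefficient for a continuous series with unequal class widths needs Q1, M and Q3 by interpolation first. A minimal sketch with hypothetical helpers `quantile_grouped` and `bowley`, assuming continuous classes in ascending order and the N/4, N/2 and 3N/4 positions used above:

```python
def quantile_grouped(classes, freqs, p):
    """Interpolated quantile: L + (pN - cf) / f * i for the class holding pN."""
    target = p * sum(freqs)
    cf_before = 0
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("p must be between 0 and 1")


def bowley(q1, m, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * m) / (q3 - q1)


classes = [(0, 20), (20, 50), (50, 100), (100, 250), (250, 500), (500, 1000)]
items = [20, 50, 69, 30, 25, 19]
q1 = quantile_grouped(classes, items, 0.25)   # 39.95
m = quantile_grouped(classes, items, 0.50)    # about 76.45
q3 = quantile_grouped(classes, items, 0.75)   # 203.75
print(round(bowley(q1, m, q3), 4))            # 0.5543
```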
UNIT – III
CORRELATION
Introduction:
So far we have confined ourselves to univariate distributions, i.e., distributions involving only one variable. Often we come across situations in which our focus is simultaneously on two or more variables, and invariably we observe that movements in one variable are accompanied by movements in the other variable. For example, husband’s age and wife’s age move together, and scores on an I.Q. test move with scores in university examinations. The study of variables exhibiting such accompanying behaviour is of great interest in statistics.
Meaning of Correlation:
In a bivariate distribution we may be interested to find out whether there is any correlation or covariation between the two variables under study. If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
Uses of Correlation:
It is used in deriving precisely the degree and direction of the relationship between variables like price and demand, advertising expenditure and sales, rainfall and crop yield, etc.
It is used in reducing the range of uncertainty in the matter of prediction.
It is used in developing the concept of regression and ratio of variation, which help in estimating the values of one variable for a given value of another variable.
In the field of economics it is used in understanding economic behaviour and in locating the important variables on which the others depend.
In the field of business it is used advantageously to estimate the cost of sales, volume of sales, sales price and any other values on the basis of some other variables which are financially related to each other.
In the natural sciences also, it is used in observing the multiplicity of inter-related forces.
Types of Correlation:
METHODS OF STUDYING CORRELATION:
(i) Graphic Method:
1. Scatter diagram or dot diagram.
2. Simple graph or correlogram.
(ii) Mathematical Method:
1. Karl Pearson’s coefficient of correlation.
2. Spearman’s rank correlation coefficient.
3. Coefficient of concurrent deviation.
4. Method of least square.
Definition:
“Correlation analysis attempts to determine the degree of relationship between variables.” It is denoted by r. Example: price and demand of a commodity. (or)
Definition:
The term correlation refers to the relationship between variables. Simple correlation refers to the relationship between two variables.
Types of correlation:
Positive correlation: When the values of two variables change in the same direction, there is positive correlation between the two variables.
Example 1:
X: 50 60 70 95 100 105
Y: 23 32 37 41 46 50
Example 2:
X: 34 25 18 10 7
Y: 51 49 42 33 19
In the two examples X and Y change in the same direction (X and Y increase in Example 1 and they decrease in Example 2). Hence, there is positive correlation. Positive correlation is generally found between the following pairs of variables.
Negative correlation: When the values of two variables change in opposite directions, there is negative correlation between the two variables.
Example 1:
X: 50 60 70 95 100 105
Y: 50 46 40 30 24 15
Example 2:
X: 45 43 39 24 28
Y: 14 20 28 29 34
If two variables tend to move together in opposite directions, the correlation is called negative or inverse correlation.
Ex: price and demand of a commodity; the volume and pressure of a perfect gas.
X: 10 20 30 40 50
Y: 50 40 25 15 10
(2) Simple Correlation:
When only two variables are considered, as under positive or negative correlation above, the correlation between them is called simple correlation. (or)
When we study only two variables, the relationship is described as simple correlation.
(3) Multiple correlation:
When more than two variables are considered, the correlation between one of them and its estimate based on the group consisting of the other variables is called multiple correlation.
Partial correlation: When more than two variables are considered, the study of the correlation between two of them, excluding the influence of the other variables, is called partial correlation.
Eg: We study price and demand, eliminating the supply side. In total correlation, all the facts are taken into account.
If the ratio of change between two variables is uniform, then there will be linear correlation.
X: 5 10 15 20
Y: 4 8 12 16
If the ratio of change between two variables is not uniform, then there will be non-linear correlation.
X: 2 6 8 10
Y: 5 4 10 9
No Correlation:
When the points are scattered neither around a line nor around a curve, there is no
correlation between the two variables.
Methods:
The following four methods are available under simple linear correlation, and among them the product moment method is the best one.
i) Scatter diagram.
ii) Karl Pearson’s correlation coefficient or product moment correlation coefficient (r).
iii) Spearman’s rank correlation coefficient (ρ).
iv) Correlation coefficient by the concurrent deviation method (r_c).
Scatter diagram:
Let (Xi, Yi), i = 1, 2, 3, …, N be the pairs of values of two variables X and Y. A point is plotted on a graph sheet corresponding to each pair of values. The resulting diagram with N points is called a scatter diagram.
The possible types of scatter diagrams under simple linear correlation are given below; from a diagram, it can be found out whether the correlation is perfect, high or low.
2. Simple Graph:
When the values of the two variables are plotted on graph paper, we get two curves, one for the X variable and another for the Y variable. These two curves reveal the direction and closeness of the two series, and also whether or not the variables are related. If both curves move in the same direction, parallel to each other, either upward or downward, the correlation is said to be positive; on the other hand, if they move in opposite directions, the correlation is said to be negative.
Karl – Pearson’s Coefficient of correlation
Correlation coefficient between two random variables x and y, usually denoted by r(x, y) or simply r_xy, is a numerical measure of linear relationship between them and is defined as:
$r(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
Karl Pearson’s correlation coefficient is also called the product-moment correlation coefficient, since Cov(x, y) = E[{x − E(x)}{y − E(y)}] = μ11.
Properties:
Merits:
1. Karl Pearson’s correlation coefficient is the most popular correlation coefficient. It is used in regression equations also.
2. It is superior to other methods. It is calculated directly from the numerical values of each and every pair; even if one value changes, r changes.
3. The population correlation coefficient can be estimated from the sample value.
4. The significance of the sample correlation coefficient can be tested.
Demerits:
1. The correlation coefficient is unduly affected by extreme values.
2. From the value of r, it cannot be known whether the assumption of a linear relationship between the variables holds or not.
3. Compared with other correlation coefficients, Karl Pearson’s correlation coefficient is the most difficult one to calculate.
Spearman’s Rank Correlation Coefficient: This method is based on ranks. This measure is useful in dealing with qualitative characteristics such as intelligence, beauty, morality, character, etc. Such characteristics cannot be measured quantitatively, as is required for Pearson’s coefficient of correlation; instead, the measure is based on the ranks given to the observations. It can be used when the data are irregular or when extreme items are erratic or inaccurate, because the rank correlation coefficient is not based on the assumption of normality of data.
The formula for Spearman’s rank correlation, which is denoted by ρ, is:
$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$
We may come across two types of problems.
When the actual ranks are given, the steps followed are:
1. Compute the difference of the two ranks (R1 and R2) and denote it by d.
2. Square d and get ∑d².
3. Substitute the figures in the formula.
When two or more items have equal values, it is difficult to give ranks to them. In that case the items are given the average of the ranks they would have received if they were not tied. For example, if two individuals are placed in the seventh place, they are each given the rank $\frac{7+8}{2} = 7.5$, which is the common rank to be assigned, and the next rank will be 9; if three are ranked equal at the seventh place, they are each given the rank $\frac{7+8+9}{3} = 8$, which is the common rank to be assigned to each, and the next rank will be 10 in this case. A slightly different formula is used when there is more than one item having the same value. The formula is:
$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \cdots\right]}{N(N^2 - 1)}$
where m is the number of items sharing a common rank, one correction term being added for each such group.
Merits:
Demerits:
1. If N is large, it is very difficult to rank the items and to calculate ρ.
2. It cannot be calculated from a bivariate frequency table.
3. It is not used much.
Coefficient of Concurrent Deviations: A very simple and casual method of finding correlation, when we are not serious about the magnitude of the two variables, is the application of concurrent deviations. The deviation in an X-value and the corresponding Y-value is said to be concurrent if both deviations have the same sign.
$r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$
where r_c = Coefficient of correlation by the concurrent deviation method
C = Number of concurrent deviations
N = Number of pairs of deviations compared.
STEPS:
1. Find out the direction of change of the x variable. Take the first value of x as base and note down whether the second value is increasing, decreasing or constant. If it increases in relation to the previous one, mark a plus (+) sign against it; if it decreases, put a minus (-) sign; and if it is equal, put zero. For the third value, the second value is the base; repeat the above method till the last item. The heading of the column is Dx.
2. Find out the direction of change of the y variable, following the above step. The heading of the column is Dy.
3. Multiply Dx by Dy and find out the value of C, i.e., the number of positive products.
4. Substitute the figures in the formula.
If $\frac{2C - N}{N}$ is negative, it is multiplied by the minus sign inside the root to make it positive so that the square root can be taken; the minus sign outside the root is then retained, so the result is negative. If $\frac{2C - N}{N}$ is positive, then all the signs are taken as positive.
Merits:
Demerits:
1. It is not precise. It just gives a rough idea about the existing correlation between two
variables.
2. It does not consider the quantum of deviations.
3. It cannot be calculated from a bivariate frequency table.
4. It is not used much.
Regression
Introduction:
Regression literally means stepping back towards the average.
The term was first used by the British biometrician Sir Francis Galton (1822 – 1911) in connection with the inheritance of stature.
“Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.”
Regression equation: The value of the dependent variable is estimated corresponding to any value of the independent variable by using the regression equation.
In regression there are two types of variables:
(i) Dependent variable (ii) Independent variable
Both the methods are based on the principle of least squares, and they give the same results.
Properties:
1. The two regression equations are generally different and are not to be interchanged in their usage.
The regression equation of Y on X is to be used to find the value of Y corresponding to any specified value of X. Similarly, the regression equation of X on Y is to be used to find the value of X corresponding to any specified value of Y. The two regression equations become one and the same when r = -1 or +1. In such cases, both X and Y are to be found from that equation.
4. The two regression coefficients and the correlation coefficient have the same sign.
Both $b_{YX}$ and $b_{XY}$ have the same sign, and r is also of the same sign. In other words, there are only two possibilities: $b_{XY}$, $b_{YX}$ and r are all positive, or they are all negative.
5. Both the regression coefficients cannot be numerically greater than 1 simultaneously:
When the signs are ignored, $b_{XY}$ and $b_{YX}$ cannot both be greater than 1, since $b_{XY} \times b_{YX} = r^2 \le 1$; either both are less than 1 or one of them is less than 1.
6. Regression coefficients are independent of change of origin but are affected by change of scale.
If u = (X – a)/c and v = (Y – b)/d, then
$b_{XY} = \frac{c}{d} b_{uv}$ and $b_{YX} = \frac{d}{c} b_{vu}$;
the origin constants a and b do not enter these relations.
7. Each regression coefficient is expressed in the unit of measurement of the dependent variable (per unit of the independent variable).
8. Each regression coefficient indicates the quantum of change in the dependent variable corresponding to a unit increase in the independent variable.
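Properties 4 to 6 can be checked numerically. The sketch below uses the aptitude-score data of Example 1 later in this unit; `regression_coefficients` is a hypothetical helper, and the origin and scale constants (a = 60, c = 2, b = 69, d = 3) are arbitrary choices made only for the illustration.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using deviations from the actual means.

    b_yx = sum(xy) / sum(x^2)   regression coefficient of Y on X
    b_xy = sum(xy) / sum(y^2)   regression coefficient of X on Y
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


X = [57, 58, 59, 59, 60, 61, 62, 64]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
b_yx, b_xy = regression_coefficients(X, Y)

# Properties 4 and 5: same sign, and b_yx * b_xy = r^2 <= 1
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))

# Property 6: u = (X - a)/c, v = (Y - b)/d
a, c = 60, 2
b, d = 69, 3
U = [(x - a) / c for x in X]
V = [(y - b) / d for y in Y]
b_vu, b_uv = regression_coefficients(U, V)
print(math.isclose(b_yx, d / c * b_vu), math.isclose(b_xy, c / d * b_uv))   # True True
```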
Uses of regression:
1. It is a more widely used method than correlation analysis.
2. It is used to estimate the relationship between two economic variables such as income and expenditure.
3. It predicts the value of the dependent variable from the values of the independent variable.
4. We can calculate the coefficient of correlation (r) and the coefficient of determination (r²).
5. It is used in the estimation of demand curves, supply curves and production functions.
Correlation | Regression
1. Correlation is the relationship between two or more variables. It is expressed numerically. | 1. Regression means going back. The average relation between the variables is given as an equation.
2. Between the two variables, neither is identified as the independent or the dependent variable. | 2. One of the variables is the independent variable and the other is the dependent variable.
3. It does not study the cause and effect relationship between the variables. | 3. It indicates the cause and effect relationship between the variables and establishes a functional relationship.
8. If the coefficient is positive, then the two variables are positively correlated, and vice versa. | 8. The regression coefficient explains that the decrease in one variable is associated with the increase in the other variable.
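Karl Pearson’s coefficient of correlation is also called the product moment correlation coefficient and is denoted by r. It is the covariance between the two variables divided by the product of their standard deviations. It can be calculated by using any one of the following formulae; the choice of formula depends on the nature of the data:
(i) Deviations from actual means (x = X – X̄, y = Y – Ȳ): $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$
(ii) Direct method: $r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$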
Example1:
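The following table gives aptitude test scores and productivity indices of 8 randomly selected workers.
Aptitude score (X):      57 58 59 59 60 61 62 64
Productivity index (Y):  67 68 65 68 72 72 69 71
Calculate the correlation coefficient between aptitude score and productivity index.
Solution:
X = aptitude score; Y = productivity index. Here x = X – X̄ and y = Y – Ȳ are small integers, and hence the formula based on deviations from actual means, $r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, is used. ∑x = ∑(X – X̄) = 0 and ∑y = ∑(Y – Ȳ) = 0 are the properties used as a check.
X        | Y        | x = X – X̄ (X̄ = 60) | y = Y – Ȳ (Ȳ = 69) | xy       | x²       | y²
57       | 67       | -3                 | -2                 | 6        | 9        | 4
58       | 68       | -2                 | -1                 | 2        | 4        | 1
59       | 65       | -1                 | -4                 | 4        | 1        | 16
59       | 68       | -1                 | -1                 | 1        | 1        | 1
60       | 72       | 0                  | 3                  | 0        | 0        | 9
61       | 72       | 1                  | 3                  | 3        | 1        | 9
62       | 69       | 2                  | 0                  | 0        | 4        | 0
64       | 71       | 4                  | 2                  | 8        | 16       | 4
∑X = 480 | ∑Y = 552 | ∑x = 0             | ∑y = 0             | ∑xy = 24 | ∑x² = 36 | ∑y² = 44
$\bar{X} = \frac{\sum X}{N} = \frac{480}{8}$ = 60          $\bar{Y} = \frac{\sum Y}{N} = \frac{552}{8}$ = 69
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} = \frac{24}{39.80}$ = 0.60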
Example 2:
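X: 10 12 18 8 13 20 22 15 5 17
Y: 88 90 94 86 87 92 96 94 88 85
Solution:
X        | Y        | XY          | X²         | Y²
10       | 88       | 880         | 100        | 7744
12       | 90       | 1080        | 144        | 8100
18       | 94       | 1692        | 324        | 8836
8        | 86       | 688         | 64         | 7396
13       | 87       | 1131        | 169        | 7569
20       | 92       | 1840        | 400        | 8464
22       | 96       | 2112        | 484        | 9216
15       | 94       | 1410        | 225        | 8836
5        | 88       | 440         | 25         | 7744
17       | 85       | 1445        | 289        | 7225
∑X = 140 | ∑Y = 900 | ∑XY = 12718 | ∑X² = 2224 | ∑Y² = 81130
$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{10(12718) - 140 \times 900}{\sqrt{[10(2224) - 140^2][10(81130) - 900^2]}}$
$= \frac{127180 - 126000}{\sqrt{(22240 - 19600)(811300 - 810000)}} = \frac{1180}{\sqrt{2640 \times 1300}} = \frac{1180}{1852.57}$ = 0.64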
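The direct formula for r can be sketched in Python and checked against both examples. `pearson_r` is a hypothetical helper; on Python 3.10 and later the standard-library `statistics.correlation` computes the same Pearson coefficient.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums:
    [N sum XY - sum X sum Y] / sqrt([N sum X^2 - (sum X)^2][N sum Y^2 - (sum Y)^2])
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


x = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]
print(round(pearson_r(x, y), 3))                      # 0.637

aptitude = [57, 58, 59, 59, 60, 61, 62, 64]
productivity = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(aptitude, productivity), 3))    # 0.603
```

The deviation formula and the direct formula give the same r; the choice only changes how much hand arithmetic is needed.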
$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$ when there is no tie; d = difference between the x and y ranks.
$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)}$ when one value occurs m times.
$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \cdots\right]}{N(N^2 - 1)}$ when more than one value is repeated.
It is calculated when ranks are given or when rank correlation co-efficient is required. Rank
correlation co-efficient also lies between -1 and +1.
Problem 3: Rankings of 10 trainees at the beginning (x) and at the end (y) of a certain course are given below.
Trainees: A  B  C  D  E  F  G  H   I  J
X:        1  6  3  9  5  2  7  10  8  4
Y:        6  8  3  7  2  1  5  9   4  10
Solution:
X  | Y  | d      | d²
1  | 6  | -5     | 25
6  | 8  | -2     | 4
3  | 3  | 0      | 0
9  | 7  | 2      | 4
5  | 2  | 3      | 9
2  | 1  | 1      | 1
7  | 5  | 2      | 4
10 | 9  | 1      | 1
8  | 4  | 4      | 16
4  | 10 | -6     | 36
   |    | ∑d = 0 | ∑d² = 100
$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 100}{10 \times 99}$
= 1 - 0.6061
= 0.3939
Problem 3:
Find the rank correlation co-efficient for the percentage of marks secured by a group of 8 students
in economics and statistics.
Marks in economics 50 60 65 70 75 40 70 80
Marks in statistics 80 71 60 75 90 82 70 50
Solution:
X     Y     Rank of X   Rank of Y   d      d²
50    80    7           3            4     16
60    71    6           5            1     1
65    60    5           7           -2     4
70    75    3.5         4           -0.5   0.25
75    90    2           1            1     1
40    82    8           2            6     36
70    70    3.5         6           -2.5   6.25
80    50    1           8           -7     49
Σd = 0   Σd² = 113.5

$$\rho = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12}\right\}}{N(N^2 - 1)}$$

The mark 70 occurs twice in economics, so m = 2 and m(m² - 1)/12 = 2 × 3/12 = 0.5.

$$\therefore\ \rho = 1 - \frac{6(113.5 + 0.5)}{8(8^2 - 1)} = 1 - \frac{6 \times 114}{8 \times 63} = 1 - 1.3571 = -0.3571$$
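Both ranking problems above can be checked with a short Python sketch of Spearman's formula, including the tie correction m(m² - 1)/12. The helper names (`average_ranks`, `tie_correction`, `spearman`) are illustrative, and ranks are assigned with the largest value ranked 1, as in the marks problem; tied values share the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values with the largest as rank 1; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # rank of the first occurrence of v
        m = ordered.count(v)              # number of times v occurs
        ranks.append(first + (m - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    counts = [values.count(v) for v in set(values)]
    return sum(m * (m * m - 1) / 12 for m in counts if m > 1)


def spearman(rank_x, rank_y, correction=0.0):
    """rho = 1 - 6(sum d^2 + correction) / (N(N^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Ranks of the 10 trainees (no ties)
x_rank = [1, 6, 3, 9, 5, 2, 7, 10, 8, 4]
y_rank = [6, 8, 3, 7, 2, 1, 5, 9, 4, 10]
rho_trainees = spearman(x_rank, y_rank)

# Marks of 8 students (70 occurs twice in economics)
economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics_marks = [80, 71, 60, 75, 90, 82, 70, 50]
correction = tie_correction(economics) + tie_correction(statistics_marks)
rho_marks = spearman(average_ranks(economics), average_ranks(statistics_marks), correction)

print(round(rho_trainees, 4), round(rho_marks, 4))   # 0.3939 -0.3571
```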
$$r_c = +\sqrt{+\frac{2C - N}{N}} \quad \text{when } 2C - N > 0$$
$$r_c = 0 \quad \text{when } 2C - N = 0$$
$$r_c = -\sqrt{-\frac{2C - N}{N}} \quad \text{when } 2C - N < 0$$
$$\therefore\ r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$
N denotes the number of entries and C the number of + signs (concurrent deviations) in the Dxy column.
If a value is greater than the preceding value, a + sign is put; if it is less than the preceding one, a - sign is marked; if it is equal to the preceding one, the deviation is 0. Dx denotes such deviations among the values of the variable x and Dy those of y. Dxy denotes the product of the entries under Dx and Dy.
Problem: 1
Calculate the co-efficient of correlation from the data given below by the method of concurrent
deviations.
Imports (X)   Prices (Y)   Dx   Dy   Dxy
85            110
82            115          -    +    -
89            112          +    -    -
95            118          +    +    +
104           120          +    +    +
108           109          +    -    -
112           98           +    -    -
100           102          -    +    -
99            103          -    +    -
93            105          -    +    -
90            107          -    +    -
Here N = 10 and C = 2.
$$\therefore\ r_c = -\sqrt{-\frac{2C - N}{N}} = -\sqrt{-\frac{2 \times 2 - 10}{10}} = -\sqrt{\frac{6}{10}} = -0.7746$$
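A minimal Python sketch of the concurrent deviation method on the imports and prices data: each value is compared with the preceding one, C counts the + signs in Dxy, and N is the number of pairs of deviations. The function names `signs` and `concurrent_r` are hypothetical.

```python
import math


def signs(series):
    """+1, -1 or 0 for each value compared with the preceding one."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def concurrent_r(x, y):
    """Return (r_c, C, N) for the method of concurrent deviations."""
    dx, dy = signs(x), signs(y)
    dxy = [a * b for a, b in zip(dx, dy)]
    n = len(dxy)                          # number of pairs of deviations
    c = sum(1 for p in dxy if p > 0)      # concurrent deviations (+ signs)
    k = (2 * c - n) / n
    # the same sign is used inside and outside the square root
    rc = math.copysign(math.sqrt(abs(k)), k) if k else 0.0
    return rc, c, n


imports = [85, 82, 89, 95, 104, 108, 112, 100, 99, 93, 90]
prices = [110, 115, 112, 118, 120, 109, 98, 102, 103, 105, 107]
rc, c, n = concurrent_r(imports, prices)
print(c, n, round(rc, 4))
```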
UNIT-IV
INDEX NUMBER
Definition: A price index number is the percentage change in the price of one commodity or a group of commodities in the current year compared with the base year. A similar calculation for quantities results in a quantity index number.
The units of measurement of different commodities differ, yet a price index number gives the average percentage change in prices. Hence, index numbers are a special type of average. For example, let the commodities be rice, kerosene and cloth: the price of rice per kilogram, the price of kerosene per litre and the price of cloth per metre are considered, and the index number indicates the average change in these prices.
3. Index numbers indicate percentage changes which cannot be measured otherwise.
No other statistical tool is as effective in studying such a wide variety of situations.
Index numbers were devised to compare two different times; comparisons of two different places or situations are also possible with index numbers.
Uses:
1. Index numbers provide scope for comparison: prices, production, value, etc. at two times are compared by index numbers.
2. Index numbers are economic barometers. The dictionary meaning of the word barometer is an “instrument measuring atmospheric pressure, used for forecasting weather and ascertaining height above sea level”. Index numbers of wholesale prices, industrial production, etc. similarly indicate the condition of the economy.
3. Index numbers serve as guides. Being economic barometers, they foretell the direction in which the economy is likely to move, guiding the Government, businessmen, etc.
4. Index numbers are the pulse of an economy. The condition of an economy is known from the index numbers of various economic activities.
$$\text{Real Wage} = \frac{\text{Money Wage}}{\text{Price Index (or Cost of Living Index)}} \times 100$$
7. Index numbers are deflators. A deflator is one which makes allowance for the change in the prices of commodities.
8. Index numbers are useful in formulating policies. Based on the relevant index numbers, suitable policies are framed by businessmen and economists. Governments and industrialists also use them to understand prevailing conditions and to plan.
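To illustrate the real wage formula and the use of index numbers as deflators, here is a small sketch with hypothetical wage and index figures (they are not from the text): a money wage that rises from 2000 to 3000 while the cost of living index rises from 100 to 150 leaves the real wage unchanged at 2000. The function name `real_wage` is illustrative.

```python
def real_wage(money_wage, index):
    """Deflate a money figure by a price (or cost of living) index with base 100."""
    return money_wage / index * 100


# Hypothetical figures, for illustration only
money_wage = {"base year": 2000, "current year": 3000}
cost_of_living_index = {"base year": 100, "current year": 150}

real = {yr: real_wage(money_wage[yr], cost_of_living_index[yr]) for yr in money_wage}
print(real)
```

Here the money wage has risen by 50%, but prices have risen by the same proportion, so the purchasing power of the wage has not changed.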
The following aspects are to be carefully considered during the construction of an index number:
1. The purpose. The purpose of the index number, for whom it is meant, by whom it is to be used, etc. are to be clearly spelt out.
2. The base period. The period may be one year or a few years. The base period is to be chosen according to the purpose.
(i) It should be a normal period. There should not have been natural calamities such as famine, flood and earthquake, political upheavals, war, etc.
(ii) It should not be too distant in the past. This keeps the index numbers useful.
3. The items. Including all the items in a study is neither feasible nor useful. Only those items which concern the people for whom the index number is intended are to be included. For example, for considering the living conditions of people in hill stations, woollen clothes should be included.
4. The price quotations. The prices are to be properly gathered. For a consumer price index number, retail prices are necessary; for wholesale price indices, wholesale prices are needed.
5. The average. For arriving at the average value of a group of items, a suitable average is to be decided. In some contexts A.M. may be more useful, as it is simple to understand and easy to calculate. The merits of G.M. are:
(i) G.M. is the appropriate average to measure relative changes. Hence, index numbers, wherein relative changes are expressed as percentages, give scope for G.M.
(ii) It gives more weightage to smaller items.
(iii) It facilitates the change of the base period. The base cannot be kept the same for a long time, because the purpose and all-round changes may warrant a change in the base period.
6. Weighting: By the unweighted method, an equal weightage of unity is given to all the items. In weighted methods, the weights may be:
(i) Base year quantity, as in Laspeyres' method, or current year quantity, as in Paasche's method, for a price index number.
(ii) Base year value (price × quantity), as in the consumer price index number by the family budget method.
(iii) Some fixed weight, based neither on base year quantity nor on current year quantity but on some other consideration, as in Kelly's method.
7. The formula: As seen in the following pages, many formulas are available.
The period is referred to as a year hereafter, and the following notations are used (suffix 0 denotes the base year and suffix 1 the current year):
p – price of a commodity.
q – quantity of a commodity.
V or W – weight of a commodity.
$$P = \frac{p_1}{p_0} \times 100, \qquad Q = \frac{q_1}{q_0} \times 100$$
(P is the price relative and Q the quantity relative.)
P01 – price index number of the current year compared with the base year.
Q01 – quantity index number of the current year compared with the base year.
Formulae:
Methods
Simple aggregative method:
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
When a quantity index number is required,
$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$
The drawbacks of this method are:
(i) It does not satisfy even the unit test, which is explained later. The defect is due to the fact that unit prices are added as such, even though the units of measurement are different, such as kg, litre, etc.
(ii) It does not distinguish between the commodities with regard to their relative importance.
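(iii) Fisher's formula:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$
(iv) Marshall-Edgeworth formula:
$$P_{01}^{ME} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$
(v) Bowley's formula:
$$P_{01}^{B} = \frac{1}{2}\left(\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right) \times 100 = \frac{P_{01}^{L} + P_{01}^{P}}{2}$$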
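$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
By Laspeyres' formula,
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$
By Paasche's formula,
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$
By Fisher's formula,
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$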
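In the following tests the factor 100 is omitted.
2. Time Reversal Test (T.R. test):
$$P_{01} \times P_{10} = 1$$
3. Factor Reversal Test (F.R. test):
$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
4. Circular Test:
$$P_{01} \times P_{12} \times P_{20} = 1$$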
5. Fixed Base: When the data are available for more than two years, the question 'which is the base year?' arises. Under the fixed base method, the base year is the same for all the different years under consideration. Base year figures may be the figures of any one year, the averages of a few years, the totals of a few years, or those suggested. When nothing is indicated, the first year in the series of years in chronological order is taken as the base.
If no method is suggested, the method which is suitable for the data under consideration is to be chosen. For the given data, although the index number can be calculated by more than one method, the result is obtained by only one method unless stated otherwise. The method is selected in the following order:
(i) Fisher's formula (or)
(ii) Weighted A.M. method (or)
(iii) Unweighted A.M. method.
For each commodity, the price in a year is divided by that in 1995 and multiplied by 100 to get the price relative. Using A.M., the price indices are calculated and are given in the last column of the above table.
For the first year, which is the base year, the fixed base index number as well as each P is 100.
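6. Chain Base index:
$$\text{Chain Index} = \frac{\text{Current year link relative} \times \text{Preceding year chain index}}{100}$$
where the link relative of a year is its value divided by the preceding year's value, multiplied by 100.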
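The chain base procedure can be sketched as follows, using a hypothetical price series of one commodity (the values are made up for illustration). The link relative of the first year is taken as 100 by assumption. For a single series, the chain index built from link relatives coincides with the fixed base index on the first year, which the final comparison confirms.

```python
def fixed_base(prices, base=0):
    """Each year's price as a percentage of the base year's price."""
    return [p / prices[base] * 100 for p in prices]


def link_relatives(prices):
    """Current year price / preceding year price * 100 (first year taken as 100)."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]


def chain_index(links):
    """Chain index = current year link relative * preceding year chain index / 100."""
    chain = [links[0]]
    for lr in links[1:]:
        chain.append(lr * chain[-1] / 100)
    return chain


prices = [50, 60, 66, 55]           # hypothetical prices of one commodity over four years
links = link_relatives(prices)
chain = chain_index(links)
fixed = fixed_base(prices)
print([round(v, 2) for v in links])
print([round(v, 2) for v in chain], [round(v, 2) for v in fixed])
```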
A cost of living index number shows the impact of changes in the prices of a number of commodities and services on a particular class of people in the current year in comparison with the base year.
Uses:
1. Cost of living index numbers are indicators of changes in real wages. Money wages are changing, and so are prices. Cost of living index numbers help to know whether money wages overtake the rising prices or are overpowered by them.
2. Decisions on dearness allowance are based on the cost of living indices.
3. They are further used for deflation of income and value in national accounts.
Commodity   Price 1994 (p0)   Price 1995 (p1)   P = (p1/p0) × 100   log P
A           50                70                140.00              2.1461
B           40                60                150.00              2.1761
C           80                90                112.50              2.0512
D           110               120               109.09              2.0378
E           20                20                100.00              2.0000
Total       Σp0 = 300         Σp1 = 360         ΣP = 611.59         Σlog P = 10.4112
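By Aggregative Method,
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{360}{300} \times 100 = 120$$
Using G.M.,
$$P_{01} = \text{Antilog}\left(\frac{\sum \log P}{N}\right) = \text{Antilog}\left(\frac{10.4112}{5}\right) = \text{Antilog}\ 2.0822 = 120.84$$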
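The two results above can be reproduced with a short Python sketch. It also computes the simple A.M. of the price relatives (ΣP/N = 611.59/5 ≈ 122.32), which the table totals but the text does not evaluate. With full-precision logarithms the G.M. is about 120.85; the 120.84 above comes from four-figure log tables.

```python
import math

p0 = {"A": 50, "B": 40, "C": 80, "D": 110, "E": 20}    # 1994 prices
p1 = {"A": 70, "B": 60, "C": 90, "D": 120, "E": 20}    # 1995 prices

# Simple aggregative method
aggregative = sum(p1.values()) / sum(p0.values()) * 100

# Price relatives P = p1/p0 * 100, then their A.M. and G.M.
relatives = [p1[k] / p0[k] * 100 for k in p0]
am = sum(relatives) / len(relatives)
gm = 10 ** (sum(math.log10(P) for P in relatives) / len(relatives))

print(round(aggregative, 2), round(am, 2), round(gm, 2))
```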
Problem :2 Compute (i) Laspeyre’s (ii) Paasche’s and (iii) Fisher’s index number.
Price Quantity
Item Base year Current year Base year Current year
A 6 10 50 50
B 2 2 100 120
C 4 6 60 60
D 10 12 30 25
Solution:
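Commodity   p0   p1   q0    q1    p0q0   p1q0   p0q1   p1q1
A           6    10   50    50    300    500    300    500
B           2    2    100   120   200    200    240    240
C           4    6    60    60    240    360    240    360
D           10   12   30    25    300    360    250    300
Total                             1040   1420   1030   1400

(i) Laspeyres' formula:
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{1420}{1040} \times 100 = 136.54$$
(ii) Paasche's formula:
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1400}{1030} \times 100 = 135.92$$
(iii) Fisher's formula:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{1420}{1040} \times \frac{1400}{1030}} \times 100 = 136.23$$
(or)
$$P_{01}^{F} = \sqrt{136.54 \times 135.92} = 136.23$$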
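The weighted aggregative formulas can be checked on this data with the Python sketch below. It also evaluates the Marshall-Edgeworth and Bowley formulas for comparison; those two values are not part of the original problem.

```python
import math

# item: (p0, p1, q0, q1)
data = {
    "A": (6, 10, 50, 50),
    "B": (2, 2, 100, 120),
    "C": (4, 6, 60, 60),
    "D": (10, 12, 30, 25),
}

p0q0 = sum(p0 * q0 for p0, p1, q0, q1 in data.values())
p1q0 = sum(p1 * q0 for p0, p1, q0, q1 in data.values())
p0q1 = sum(p0 * q1 for p0, p1, q0, q1 in data.values())
p1q1 = sum(p1 * q1 for p0, p1, q0, q1 in data.values())

laspeyres = p1q0 / p0q0 * 100
paasche = p1q1 / p0q1 * 100
fisher = math.sqrt(laspeyres * paasche)          # geometric mean of L and P
marshall_edgeworth = (p1q0 + p1q1) / (p0q0 + p0q1) * 100
bowley = (laspeyres + paasche) / 2               # arithmetic mean of L and P

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche), ("Fisher", fisher),
                    ("Marshall-Edgeworth", marshall_edgeworth), ("Bowley", bowley)]:
    print(f"{name}: {value:.2f}")
```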
Problem: 1 Calculate the index number of prices for 1998 on the basis of 1995 from the data given below.
Commodity   Weight (W)   Price 1995 (p0)   Price 1998 (p1)   P = (p1/p0) × 100   WP      log P    W log P
A           40           16                20                125                 5000    2.0969   83.8760
B           25           40                60                150                 3750    2.1761   54.4025
C           5            2                 3                 150                 750     2.1761   10.8805
D           20           5                 7                 140                 2800    2.1461   42.9220
E           10           2                 4                 200                 2000    2.3010   23.0100
Total       ΣW = 100                                                             ΣWP = 14300      ΣW log P = 215.0910
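The text stops at the column totals. Assuming the weighted A.M. (ΣWP/ΣW) and the weighted G.M. (Antilog of ΣW log P/ΣW) of price relatives are intended, this sketch finishes the calculation: the weighted A.M. is 14300/100 = 143 and the weighted G.M. is about 141.55.

```python
import math

# commodity: (weight W, price in 1995, price in 1998)
data = {
    "A": (40, 16, 20),
    "B": (25, 40, 60),
    "C": (5, 2, 3),
    "D": (20, 5, 7),
    "E": (10, 2, 4),
}

sum_w = sum(w for w, _, _ in data.values())
relatives = {k: p1 / p0 * 100 for k, (w, p0, p1) in data.items()}
sum_wp = sum(data[k][0] * relatives[k] for k in data)
sum_wlogp = sum(data[k][0] * math.log10(relatives[k]) for k in data)

weighted_am = sum_wp / sum_w                 # sum(WP) / sum(W)
weighted_gm = 10 ** (sum_wlogp / sum_w)      # Antilog[sum(W log P) / sum(W)]
print(round(weighted_am, 2), round(weighted_gm, 2))
```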
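Problem: 1 Show that Fisher's ideal index satisfies both the time reversal and factor reversal tests, using the following data.
Commodity   p0 (1990)   q0 (1990)   p1 (1992)   q1 (1992)   p0q0   p1q0   p0q1   p1q1
A           6           50          10          56          300    500    336    560
B           2           100         2           120         200    200    240    240
C           4           60          6           60          240    360    240    360
D           10          30          12          24          300    360    240    288
E           8           40          12          36          320    480    288    432
Total                                                       Σp0q0 = 1360   Σp1q0 = 1900   Σp0q1 = 1344   Σp1q1 = 1880

By Fisher's formula, after ignoring the factor 100,
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}}$$
$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$$
and so
$$P_{01} \times P_{10} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1880} \times \frac{1360}{1900}} = \sqrt{1} = 1$$
Also,
$$Q_{01} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$$
$$P_{01} \times Q_{01} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1360} \times \frac{1880}{1900}} = \sqrt{\frac{1880^2}{1360^2}} = \frac{1880}{1360} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
Using the given data, Fisher's index is found to satisfy both the time reversal and factor reversal tests.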
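The same check can be done numerically. The sketch recomputes the four column totals from the raw prices and quantities, then verifies that P01 × P10 = 1 and P01 × Q01 = Σp1q1/Σp0q0, up to floating-point rounding.

```python
import math

# (p0, q0, p1, q1) for commodities A-E: 1990 price and quantity, 1992 price and quantity
data = [(6, 50, 10, 56), (2, 100, 2, 120), (4, 60, 6, 60), (10, 30, 12, 24), (8, 40, 12, 36)]

p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data)
p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data)
p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data)
p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data)

# Fisher's indices with the factor 100 ignored
P01 = math.sqrt((p1q0 / p0q0) * (p1q1 / p0q1))
P10 = math.sqrt((p0q1 / p1q1) * (p0q0 / p1q0))
Q01 = math.sqrt((p0q1 / p0q0) * (p1q1 / p1q0))

print("Time reversal:   P01 x P10 =", P01 * P10)
print("Factor reversal: P01 x Q01 =", P01 * Q01, "vs", p1q1 / p0q0)
```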
Problem: 1 Calculate fixed base index numbers from the following prices:
For the first year which is the base year, fixed base index number as well as each P is 100.
Problem: 1 Prepare index numbers from the average prices of three groups of commodities given below
by taking the base year 1998 and the weights as 5, 3, and 2 respectively.
The price of each commodity in every year is divided by its price in 1998 and multiplied by 100 to get the price relative (P). The price relatives of the three commodities are multiplied by 5, 3 and 2 respectively to get the WP values. They are added year-wise (ΣWP), and the total is divided by 10 (ΣW) to get the fixed base index numbers.
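Since the price table is not reproduced here, the procedure is sketched with hypothetical average prices for the three groups (the numbers are invented for illustration); only the weights 5, 3 and 2 and the base year 1998 come from the problem. The function name `fixed_base_index` is illustrative.

```python
weights = [5, 3, 2]                 # from the problem
# Hypothetical average prices of the three groups (illustrative, not the book's table)
prices = {
    1998: [20, 40, 10],
    1999: [22, 44, 12],
    2000: [25, 42, 15],
}
base = prices[1998]


def fixed_base_index(year_prices):
    """Weighted A.M. of price relatives: sum(W * P) / sum(W)."""
    relatives = [p / b * 100 for p, b in zip(year_prices, base)]
    sum_wp = sum(w * r for w, r in zip(weights, relatives))
    return sum_wp / sum(weights)


index = {year: fixed_base_index(ps) for year, ps in prices.items()}
for year, value in index.items():
    print(year, round(value, 2))
```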
Problem: 2 From the following prices of three groups of commodities for the years 1993 to 1997, find the chain base index numbers.
Problem:1 Construct a cost of living index for 2000, taking 1999 as the base year, from the following data using the 'Aggregate Expenditure' method.
Article   Quantity 1999 (q0)   Price 1999 (p0)   Price 2000 (p1)   p1q0     p0q0
A         6                    5.75              6.00              36.00    34.50
B         1                    5.00              8.00              8.00     5.00
C         6                    6.00              9.00              54.00    36.00
D         4                    8.00              10.00             40.00    32.00
E         2                    2.00              1.80              3.60     4.00
F         1                    20.00             15.00             15.00    20.00
Total                                                              156.60   131.50

$$\text{Cost of Living Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{156.60}{131.50} \times 100 = 119.09$$
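The aggregate expenditure calculation can be reproduced in Python. Recomputing the p1q0 column from the quantities and prices gives Σp1q0 = 156.60, and 156.60/131.50 × 100 = 119.09. The sketch also evaluates the family budget method (weights W = p0q0 applied to price relatives), which algebraically gives the same index.

```python
# article: (q0, p0, p1) = (quantity 1999, price 1999, price 2000)
data = {
    "A": (6, 5.75, 6.00),
    "B": (1, 5.00, 8.00),
    "C": (6, 6.00, 9.00),
    "D": (4, 8.00, 10.00),
    "E": (2, 2.00, 1.80),
    "F": (1, 20.00, 15.00),
}

# Aggregate expenditure method
sum_p1q0 = sum(q0 * p1 for q0, p0, p1 in data.values())
sum_p0q0 = sum(q0 * p0 for q0, p0, p1 in data.values())
aggregate_expenditure = sum_p1q0 / sum_p0q0 * 100

# Family budget method: weights W = p0*q0 applied to price relatives P = p1/p0*100
W = {k: q0 * p0 for k, (q0, p0, p1) in data.items()}
P = {k: p1 / p0 * 100 for k, (q0, p0, p1) in data.items()}
family_budget = sum(W[k] * P[k] for k in data) / sum(W.values())

print(round(sum_p1q0, 2), round(sum_p0q0, 2),
      round(aggregate_expenditure, 2), round(family_budget, 2))
```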
Problem:2 Calculate the cost of living index number from the following data.
Problem: 3 Using geometric mean, calculate the cost of living index number for the year 2000.
$$\text{Cost of Living Index Number} = \text{Antilog}\left(\frac{\sum W \log P}{\sum W}\right) = \text{Antilog}\left(\frac{225.4112}{100}\right) = \text{Antilog}\ 2.2541 = 179.51$$
UNIT – V
A series of values may be observed at regular intervals of time, such as daily sales, annual profits and the decennial census.
The observations at past periods of time indicate the conditions which existed, and a detailed study enables us to know more.
If the past conditions had continued, what would the present position be? What is the actual position now? What are the causes of the difference? Are we satisfied with the present? Thinking along these lines helps not only to assess the present but also to plan for the future.
There are many methods in Statistics to estimate the value of a variable at a certain time in the future, and arguments for and against each method are available in plenty. Forecasts based on the analysis of time series have been found to be the most reliable.
1) Secular trend:
The general tendency of time series data to increase, decrease or stagnate over a long period of time is called the secular trend (or long-term trend).
The concept of trend does not include short-range oscillations, but rather steady movements over a long time.
An upward tendency is usually observed in most series relating to economics and business, e.g. population, production, prices and income.
A downward tendency is observed in series such as deaths and epidemics.
Trend is the general, smooth, long-term average tendency.
Mathematically, trend may be (i) linear or (ii) non-linear.
Graphically, a linear trend is a straight line. The discussion in this chapter is restricted to linear trend; a parabolic trend equation, if necessary, can be formed as explained in 'Method of Least Squares'. Trend is the major component; all the other components put together are generally small.
2) Seasonal fluctuation:
A season is a period of less than one year; it may be a period of 6 months, 4 months, 3 months, 1 month, etc. A certain pattern is observed in the first season, another pattern in the second season, and so on. Further, the same pattern is observed in a given season every year; in other words, the different patterns recur year after year in the respective seasons. These variations are called seasonal fluctuations.
The factors which cause seasonal variations are of the following two kinds:
(i) Climate and weather conditions: Sales of ice cream, khadi and cotton clothes, etc. are higher during summer. Sales of umbrellas peak during the rainy season. Production of paddy, wheat, etc. is higher in a few months and lower in the other months of the year. Climate and weather cause this kind of variation.
(ii) Customs, traditions and habits of the people: Sales of crackers and fireworks are found to be higher during Deepavali every year. Cloth shops register very good sales during festival seasons such as Deepavali, Pongal, Ramzan and Christmas, and the work load of sorting and delivering greetings also rises during festivals. All these variations in sales, work load, etc. are due to the customs, traditions and habits of the people.
3. Cyclical Fluctuations:
Cyclical fluctuations are similar to seasonal variations; the difference is in the interval of recurrence. In seasonal fluctuations, a pattern of the series recurs at an interval of one year, whereas a cyclical fluctuation recurs at an interval of 3 or more years. The fitting example is the business cycle. In economics and business, there are many time series which have certain wave-like movements called business cycles. In one period, profits are easily made and are made in plenty, and prices are high. This period is called prosperity. After this (peak) condition, things decline instead of improving. High wages, decreasing efficiency, increasing interest rates, etc. cause the decline. This is the period of recession. After touching the bottom, which is called depression, the condition improves. The recovery from depression leads to prosperity. The four phases of a business cycle, namely (i) prosperity, (ii) recession, (iii) depression and (iv) recovery, recur one after another regularly.
4. Irregular variations:
Variations which do not come under the other three components are called irregular variations. The other three components have a certain regularity, but this component is irregular. Fire, floods, earthquakes, wars, lock-outs, strikes, etc. cause irregular variations. Sometimes the causes of irregular variations, as above, are known; sometimes they may not be known. For example, there may be very poor sales on a particular day in a leading cloth shop on the eve of Deepavali, and the cause of such a happening may not be known.
Irregular variation is also called random variation or erratic fluctuation.
Models: There exist certain relations between the components and the series of observations. The relation between the observed value and the components is called a model. Many models exist; in this book, only two models are considered.
Let Y be the observed data, T or Y_t the trend, S the seasonal variation, C the cyclical variation and I the irregular variation. The two models are:
Additive model: Y = T + S + C + I
Multiplicative model: Y = T × S × C × I
Short-term variation = Y - Y_t
Many time series in economics and business are found to follow the multiplicative model; a few other series are found to follow the additive model.
I. The number of points above the line is equal to the number of points below the line, as far as possible.
II. The sum of the vertical distances of the points above the line equals that of the points below the line.
III. The sum of the squares of the vertical distances of all the points from the line is the minimum.
Merits:
1. It is a simple method.
2. It is flexible: based on the positions of the points, a trend line or a trend curve (non-linear) can be drawn.
Demerits:
1. It is subjective; different persons get different trend lines (or trend curves).
2. It is not relied on for prediction because of its subjective character.
When there is an even number of years, the series is divided into two halves, and the middle-most year and the arithmetic mean of the observed values are found for each half.
When there is an odd number of years, the middle-most year and the corresponding observed value are omitted. The middle-most year and the arithmetic mean of the observed values are then found for each half.
Based on them, two points are marked on a graph sheet. The two points are joined by a straight line which is extended on either side; it is the trend line.
The trend at any point of time can be found from that line. Since only two points are needed to fix a line, there is no difficulty in drawing it.
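A minimal Python sketch of the semi-average method, assuming the years and values are given as equal-length lists; the helper names are hypothetical. As an illustration it is applied to the production data of the graphic-method solution below (1995 to 2001, an odd number of years, so 1998 is omitted) and to the two semi-average points of Problem 2. The semi-average estimate for 2003 (about 31.08) differs from the graphic estimate of 32.2, as different methods generally give different trend lines.

```python
from statistics import mean


def semi_average_points(years, values):
    """Return the two (mid-year, mean) points of the semi-average method.

    For an odd number of years the middle-most year is left out.
    """
    n = len(years)
    half = n // 2
    first, second = slice(0, half), slice(n - half, n)
    return ((mean(years[first]), mean(values[first])),
            (mean(years[second]), mean(values[second])))


def trend_at(p, q, x):
    """Value at x on the straight line through the points p and q."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    return p[1] + slope * (x - p[0])


# Production data of the graphic-method example (1995 to 2001)
years = list(range(1995, 2002))
production = [20, 22, 25, 26, 25, 27, 30]
p, q = semi_average_points(years, production)
estimate_2003 = trend_at(p, q, 2003)

# Semi-average points stated in Problem 2
estimate_2002 = trend_at((1992.5, 275.0), (1998.5, 215.0), 2002)
print(p, q, round(estimate_2003, 2), estimate_2002)
```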
Merits:
Demerits:
1. It is not flexible.
2. It is based on arithmetic mean.
For a series there is only one arithmetic mean, but there are many moving averages. Moving totals are found and divided by an appropriate number to get the moving averages. The following two cases arise:
Case 1. Period of moving averages is an odd number such as 3 or 5 or 7 ……
Moving totals are found and written against the middle-most years. Each moving total is divided by the period of moving average to get the corresponding moving average.
The moving average is the trend. If short-term fluctuations are required, the trend is subtracted from the observed value.
Let a, b, c, … be the observed values. When 3-yearly moving averages are required, a+b+c, b+c+d, c+d+e, … are the moving totals corresponding to the second, third, fourth, … years. Each total is then divided by 3 to get the moving average.
That is, (a+b+c)/3, (b+c+d)/3, (c+d+e)/3, … are the moving averages corresponding to the second, third, fourth, … years. There is no moving total or moving average corresponding to the first year and the last year.
When 5-yearly moving averages are required, a+b+c+d+e, b+c+d+e+f, c+d+e+f+g, … are the moving totals corresponding to the third, fourth, fifth, … years. Each total is then divided by 5 to get the moving average. That is, (a+b+c+d+e)/5, (b+c+d+e+f)/5, (c+d+e+f+g)/5, … are the moving averages corresponding to the third, fourth, fifth, … years. For the first two years and the last two years, there is no moving total or moving average.
Case 2. Period of moving averages is an even number such as 4 or 6 ……
The mid years of the moving totals are not the given years in this case. Hence, 2-period moving totals of the moving totals are found; the given years are the mid years of these totals. The 2-period moving totals are divided by twice the period of moving average to get the centered moving averages. The centered moving averages are the trend values.
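Case 2 can be sketched as below. The function names are hypothetical, and the data are the production figures for 1981 to 1990 used in Problem 4 (464, 515, 518, 467, 502, 540, 557, 571, 586, 612), which are the values that reproduce the moving totals in its solution. The centered value for 1984 comes out as 503.625 and the short-term fluctuation as -36.625.

```python
def moving_totals(values, k):
    """Successive k-period moving totals."""
    return [sum(values[i:i + k]) for i in range(len(values) - k + 1)]


def centred_moving_average(values, k):
    """Trend by an even-period (k) moving average, centred with 2-period totals.

    The result is aligned with `values`; years without a trend value hold None.
    """
    if k % 2:
        raise ValueError("k must be even")
    totals = moving_totals(values, k)
    two_period = [a + b for a, b in zip(totals, totals[1:])]
    trend = [t / (2 * k) for t in two_period]     # divide by twice the period
    pad = [None] * (k // 2)
    return pad + trend + pad


years = list(range(1981, 1991))
production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]
trend = centred_moving_average(production, 4)
short_term = [None if t is None else y - t for y, t in zip(production, trend)]

for year, y, t, s in zip(years, production, trend, short_term):
    print(year, y, "-" if t is None else t, "-" if s is None else s)
```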
Merits:
Demerits:
1. The calculations are tedious when the period of moving average is large and even.
2. The period of moving average should suit the nature of the series, or else a distorted picture of the time series will emerge.
By taking the time (X) as the independent variable and the observed values (Y) as the dependent variable, the trend line of the form Y = a + bX can be formed as discussed in the chapter 'Method of Least Squares'. … has been adopted as such; afterwards, the method has been used as if it were non-mathematical.
Merits:
1. The method of least squares is an objective method; everyone gets the same trend equation for a given set of data.
2. The trend line obtained by this method is called the line of best fit: $\sum (y - y_t) = 0$ and $\sum (y - y_t)^2$ is the least for this line.
Demerits:
1. It is neither simple nor easy; it requires more time than the other methods.
2. Extreme values affect the results unduly, unlike in the method of moving averages.
SEASONAL FLUCTUATIONS:
The following four methods are used to estimate the seasonal variations.
Merits:
Demerits:
1. It assumes the absence of trend in a time series. This assumption is not always true.
2. It assumes that the averaging process eliminates the cyclical and irregular fluctuations. This is also not always true.
Solution:
Year is represented on the X axis and production on the Y axis. The points (1995, 20), (1996, 22), (1997, 25), (1998, 26), (1999, 25), (2000, 27) and (2001, 30) are plotted on a graph sheet. A line through the middle of these points is drawn such that it satisfies the three conditions.
Graph 1. Trend line by the graphic method. Corresponding to X = 2003, the Y coordinate of the point on the line is found to be 32.2. Thus, the estimated production in the year 2003 is 32.2 units.
Problem:2 The sales (in tonnes) of a commodity varied from 1990 to 2001 as under:
280, 300, 280, 280, 270, 240, 230, 220, 220, 210, 200. Fit a trend line by the method of semi-averages and estimate the sales in 2002.
Graph 2. Trend line by the method of semi-averages. The points (1992.5, 275.0) and (1998.5, 215.0) are marked on a graph sheet, and a line is drawn through them; it is the trend line. Corresponding to X = 2002, Y = 180 from the line. Hence, the estimated sales in 2002 are 180 tonnes.
Problem:3 Calculate the 5-yearly moving averages of the number of students studying in a commerce college, as shown by the following figures:
Year No. of students Year No. of students
1987 332 1992 405
1988 311 1993 410
1989 357 1994 427
1990 392 1995 405
1991 402 1996 438
Solution:
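The worked solution is not reproduced in this extract. Following Case 1 above, a short sketch computes the 5-yearly moving totals, divides them by 5 and places each average against the middle-most year; there are no averages for the first two and last two years. The averages come out as 358.8, 373.4, 393.2, 407.2, 409.8 and 417.0 for 1989 to 1994. The function name is illustrative.

```python
def odd_moving_average(values, k):
    """k-period moving average (k odd), aligned with the middle-most year."""
    if k % 2 == 0:
        raise ValueError("k must be odd; use a centred moving average for even k")
    half = k // 2
    averages = [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]
    return [None] * half + averages + [None] * half


years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]
ma5 = odd_moving_average(students, 5)

for year, n, avg in zip(years, students, ma5):
    print(year, n, "-" if avg is None else round(avg, 1))
```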
Problem: 4 Using four-yearly moving averages, calculate the trend values and short-term fluctuations.
Year         1981   1982   1983   1984   1985   1986   1987   1988   1989   1990
Production   464    515    518    467    502    540    557    571    586    612
Solution:
Year   Production (y)   4-yearly moving totals (between this year and the next)   2-period moving totals   4-yearly centered moving average (y_t)   Short-term fluctuation (y - y_t)
1981   464              -                                                          -                        -                                        -
1982   515              1964                                                       -                        -                                        -
1983   518              2002                                                       3966                     495.75                                   22.25
1984   467              2027                                                       4029                     503.63                                   -36.63
1985   502              2066                                                       4093                     511.63                                   -9.63
1986   540              2170                                                       4236                     529.50                                   10.50
1987   557              2254                                                       4424                     553.00                                   4.00
1988   571              2326                                                       4580                     572.50                                   -1.50
1989   586              -                                                          -                        -                                        -
1990   612              -                                                          -                        -                                        -
4. METHOD OF LEAST SQUARES:
Problem: 5 Fit a straight line trend equation to the following data by the method of least squares and estimate the value of sales for the year 1985.
Let Y = A + Bx be the equation of the trend line, where x = X - 1981 (X denotes the year) and Y denotes sales.
To find the values of A and B, the normal equations are
$$\sum y = NA + B\sum x$$
$$\sum xy = A\sum x + B\sum x^2$$
Year (X)   Sales (y)   x = X - 1981   xy     x²   Trend (y_t)
1979       100         -2             -200   4    100
1980       120         -1             -120   1    120
1981       140          0              0     0    140
1982       160          1              160   1    160
1983       180          2              360   4    180
Total      Σy = 700    Σx = 0         Σxy = 200   Σx² = 10
Since Σx = 0, A = Σy/N = 700/5 = 140 and B = Σxy/Σx² = 200/10 = 20.
That is, y = 140 + 20(X - 1981).
Corresponding to different values of X, the right-hand side gives the trend component (y_t); hence, the equation is written as
$$y_t = 140 + 20(X - 1981)$$
For 1985, y_t = 140 + 20(1985 - 1981) = 220.
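The least squares fit can be checked with a few lines of Python. Taking the origin at the middle year 1981 makes Σx = 0, so the normal equations reduce to A = Σy/N and B = Σxy/Σx²; the general solution of the two normal equations is used below anyway, so the sketch also works for other origins. The function name `trend` is illustrative.

```python
years = [1979, 1980, 1981, 1982, 1983]
sales = [100, 120, 140, 160, 180]
origin = 1981                                   # middle year, so that sum(x) = 0

x = [yr - origin for yr in years]
n = len(x)
sum_x, sum_y = sum(x), sum(sales)
sum_xy = sum(a * b for a, b in zip(x, sales))
sum_x2 = sum(a * a for a in x)

# Solve the normal equations  sum(y) = nA + B sum(x)  and  sum(xy) = A sum(x) + B sum(x^2)
B = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
A = (sum_y - B * sum_x) / n


def trend(year):
    """Trend value y_t = A + B(year - origin)."""
    return A + B * (year - origin)


print(A, B, trend(1985))   # 140.0 20.0 220.0
```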
SEASONAL FLUCTUATIONS
The following four methods are used to estimate the seasonal variations.
This method assumes the absence of trend in a time series. The following are the steps:
(i) The data are arranged season-wise in chronological order.
(ii) For each season, the total of the values is found and called the seasonal total.
(iii) Each seasonal total is divided by the number of years to obtain the seasonal average.
(iv) The total and the average of the seasonal averages are found. The average is called the grand average.
(v) The seasonal index of every season is calculated as follows:
$$\text{Seasonal index} = \frac{\text{Seasonal average}}{\text{Grand average}} \times 100$$
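The steps translate directly into Python; the sketch below applies them to the quarterly figures of Problem 6. Variable names are illustrative. Rounded to one decimal, the indices are 98.4, 92.1, 108.9 and 100.5, and their unrounded sum is exactly 400.

```python
quarters = {            # year: [I, II, III, IV]
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}

n_years = len(quarters)
# Seasonal totals (one per quarter) and seasonal averages
seasonal_totals = [sum(season) for season in zip(*quarters.values())]
seasonal_averages = [t / n_years for t in seasonal_totals]
# Grand average = average of the seasonal averages
grand_average = sum(seasonal_averages) / len(seasonal_averages)
# Seasonal index = seasonal average / grand average * 100
seasonal_indices = [a / grand_average * 100 for a in seasonal_averages]

print(seasonal_totals, seasonal_averages, round(grand_average, 2))
print([round(s, 1) for s in seasonal_indices])
```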
Problem: 6
Assuming no trend in the series, calculate seasonal indices for the following data.
QUARTER
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82
Solution:
                   Quarter
Year               I       II      III     IV
1994               78      66      84      80
1995               76      74      82      78
1996               72      68      80      70
1997               74      70      84      74
1998               76      74      86      82
Seasonal total     376     352     416     384
Seasonal average   75.2    70.4    83.2    76.8    Total = 305.6   Grand average = 76.4
Seasonal index     98.4    92.1    108.9   100.5   Total = 400.0