
ALLIED STATISTICS

FOR B. Com

(Common to B.Com (C.A), B.Com (C.S) and B.Com (Co – operation))


(For the candidates admitted from 2017 – 2018 onwards)
SEMESTER III: Allied Statistics
Paper - I: Business Statistical Methods
Unit – I:

Introduction – Types of data – Classification and Tabulation of Statistical data – Central Tendency – Measures of Central Tendency – Mean, Median, Mode, Harmonic Mean and Geometric Mean, Combined Mean.

Unit – II:

Dispersion: Measures of Dispersion – Range – Quartile Deviation – Mean Deviation – Standard Deviation and their coefficients. Skewness: Measures of Skewness – Karl Pearson's and Bowley's Coefficients of Skewness.

Unit – III:

Correlation – Types of Correlation – Measures of Correlation – Karl Pearson's Coefficient of Correlation – Spearman's Rank Correlation Coefficient. Simple regression analysis – Fitting of Regression lines.

Unit – IV:
Index Number – Definition and Uses of Index Numbers, Construction of Index Numbers –
Simple & Weighted Index numbers – test for an Ideal index Number – Chain and Fixed base index –
Cost of living index numbers.

Unit – V:
Analysis of Time Series – Definition – Components and Uses of Time Series. Measures of
Secular trend, Measure of Seasonal Variation – Method of Simple average Only.

Text Books:
1. Business Mathematics and Statistics – P.A. Navanithan (2007), Jai Publishers, Trichy – 21.
Reference Books:
1. Statistical Methods – S.P. Gupta.
2. Statistics – D.C. Sancheti and V.K. Kapoor.
3. Elements of Statistics – Donald R. Byrkt.
4. Statistical Theory and Practice – R.S.N. Pillai and V. Bagavathi (2001), S. Chand & Company Ltd., 2009.

Note:
1) Problems : 80% & Theory : 20%
2) This paper has to be taught by a statistics teacher.
UNIT – I

Introduction

 The word statistics is derived from the Latin word “status”, the Italian word “statista” or the German word “Statistik”, each of which means a political state.
 In early times, statistics meant a collection of facts about the people in the state, gathered for administrative or political purposes.
 The state administration requires data regarding births, deaths, income, employment, etc.
 Nowadays statistics is used to collect quantitative information in various fields such as economics, finance, protection, agriculture, medicine and health care, and to study it with a statistical approach.
 The term statistics is used in two different senses: singular and plural. In the singular form it refers to statistical methods such as collection, classification, frequency distribution and interpretation.
 In the plural form it refers to numerical information collected in a systematic manner.
Definition:

 Bowley defines statistics as “numerical statements of facts in any department of enquiry placed in relation to each other.”
 Yule and Kendall define it as follows: “By statistics we mean quantitative data affected to a marked extent by a multiplicity of causes.”
 This definition is comprehensive and exhaustive and highlights certain characteristics of statistical data:
 Statistics are aggregates of facts
 Affected to a marked extent by a multiplicity of causes
 Numerically expressed
 Enumerated (or) estimated according to reasonable standards of accuracy
 Collected in a systematic manner
 Collected for a predetermined purpose
 Placed in relation to each other

Functions:

1. Statistics presents facts in a definite manner
2. Comparison of facts
3. Establishment of relationship
4. To measure the effects
5. To formulate policies in different fields

Statistics definition:

Statistics is defined as the collection, classification, tabulation, frequency distribution and interpretation of data. (Or)

Statistics is derived from the Latin word “status”, which means “political state”. It includes

 Collection
 Classification
 Tabulation
 Analysis
 Interpretation
Definition 2: It may be called science of Average

Definition 3: It is the science of counting

CHARACTERISTICS OF STATISTICS:

1. Statistics are aggregates of facts
2. Statistics are affected to a marked extent by a multiplicity of causes
3. Statistics are numerically expressed
4. Statistics should be enumerated or estimated
5. Statistics should be collected with a reasonable standard of accuracy
6. Statistics should be collected in a systematic manner for a predetermined purpose
7. Statistics should be placed in relation to each other

Limitations of statistics:
1. It does not deal with individual items
2. It deals with quantitative data
3. It can be misused
4. Statistical laws are true only on average
Explanations:
1. Statistics deals with groups of items, not with a single item.
E.g. the heights of 5 students.
2. It does not deal with qualitative characteristics.
E.g. honesty, colour, intelligence, beauty.
3. If figures are given without details, we may arrive at wrong and misleading conclusions.
4. The average life of human beings is 65 years. It does not mean that all human beings die at the age of 65 years.

Functions of statistics:
1. It simplifies the unwieldy and complex data
2. It facilitates comparison
3. It helps to formulate and test hypotheses
4. It studies relationships
5. It tries to give material to the businessman as well as the administrator so as to serve as a guide in planning and in shaping future policies and programmes
Uses of Statistics:
 It helps in presenting large quantity of data in a simple and classified form
 It gives the method of comparison of data.
 Time series analysis helps in forecasting and consequent planning
 Regression analysis establishes the relationship between two variables
 Vital statistics and demography are useful for the calculation of mortality rates
Type of Statistics:

Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and inferential statistics.
DESCRIPTIVE STATISTICS:

Descriptive statistics consists of methods for organizing, displaying and describing data by using tables, graphs and summary measures.

INFERENTIAL STATISTICS:

Inferential statistics consists of methods that use sample results to help make decisions or predictions
about a population.

COLLECTION OF DATA:

DATA:

1. Collection of data is nothing but the collection of information.
2. The basic problem of a statistical enquiry is to collect facts and figures relating to a particular study.
3. According to Bowley, scientific, reliable, precise and finite conclusions can be drawn only if relevant data are collected.
4. The investigator is the person who conducts the statistical enquiry. Respondents are the persons from whom the information is collected.

[Chart: DATA is divided into PRIMARY and SECONDARY.]

SOURCES OF COLLECTING DATA:

Primary:

1. Primary data are those statistical data which are collected for the first time and are original in nature.
2. Primary data are those which are collected from the individual directly and have never been used for any purpose earlier.

Method of Collecting Primary Data:


 Direct personal interviews
 Indirect oral investigation
 Through investigation
 Mailed questionnaire method
 Sample surveys
 Schedules sent through enumerators
DIRECT PERSONAL INTERVIEW:
 Under this method the investigator collects the data personally
 The persons from whom information is collected are known as informants or respondents
 The investigator personally meets them and asks questions

Merits:

1. Data are originally collected


2. True and reliable data
3. Higher degree of accuracy
4. Uniformity and homogeneity can be maintained

Demerits:

1. It is unsuitable when the area is large
2. It is expensive and time consuming

INDIRECT ORAL INVESTIGATIONS:

1. Under this method the investigator contacts witnesses (or) neighbours (or) friends who are capable of supplying the necessary information

Merits:

1. It is simple and convenient
2. It saves time and money
3. The information is unbiased
4. It can be used in the investigation of a large area

Demerits:

1. The information cannot be fully relied upon
2. An interview with an unsuitable person will spoil the result
3. A careless attitude on the part of the informant will affect the degree of accuracy
4. In order to get the real position, a sufficient number of persons must be interviewed
MAILED QUESTIONNAIRE:

1. In this method a questionnaire consisting of a list of questions pertaining to the enquiry is prepared.
2. There are blank spaces for answers. This questionnaire is sent to the respondents, who are expected to write the answers in the blank spaces.
3. A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full co-operation by returning the questionnaire duly filled in, on time.
4. Research workers, private individuals, non-official agencies, and state and central governments adopt this method.

Merits:

1. It is relatively cheap
2. It is widely used when the area of investigation is large
3. It saves money and time

Demerits:

1. In this method there is no direct contact between the investigator and the respondent. Therefore we cannot be sure about the accuracy and reliability of the data.
2. This method is suitable only for literate people.
3. People may not give the correct answers.

SCHEDULES SENT THROUGH ENUMERATORS:

 It is the most widely used method of collecting primary data. A number of enumerators are selected and trained.
 They are provided with standardised questionnaires. Specific training and instructions are given to them for filling up the schedules. Each enumerator will be in charge of a certain area.
 The enumerator goes to the informant along with the questionnaire, gets the replies and records the answers. Public organizations and research institutions use this method.

Merits:

1. It is very useful in extensive enquiries
2. It yields reliable and accurate results, because the enumerators are educated & trained
3. The scope of the enquiry can also be greatly enlarged
4. Even when the respondents are illiterate, this technique can be used

Demerits:

1. It is a very costly method, as the enumerators have to be trained and paid
2. This method is time consuming, because the enumerators go personally to obtain the information
3. Personal bias of the enumerators may lead to false conclusions
CHARACTERISTICS OF A GOOD QUESTIONNAIRE

1. The number of questions should be kept to a minimum
2. Questions should be in a logical order. E.g.: Are you married? If yes, how many children do you have?
3. Questions should be short & simple
4. Questions requiring lengthy answers are to be avoided. Questions fetching yes (or) no answers are preferable.
5. Personal questions are to be avoided
6. There should not be leading questions such as “Are you rich?” Instead, the question can be “What is your annual income?”

SECONDARY DATA:

 It is the statistical information which has already been collected by someone for his own purpose and is available for use by others. (Or)
 If the data have already been collected by some person (or) institution and are made available for statistical investigation, they are known as secondary data

They are collected mainly from the following sources

1. Publications of official agencies of the government, as well as of non-official private agencies
2. Information available in daily newspapers and magazines.
3. Published reports of state and central governments and union territories

CLASSIFICATION:

Definition:
 The process of arranging data into groups according to some common characteristics
 The process of arranging (or) grouping a large number of individual facts (or) observations on the basis of similarity among the items is called classification.
Data can be classified on the basis of the following

 Geographical (or)spatial
 Chronological (or)temporal (or)historical
 Qualitative
 Quantitative

Geographical (or) spatial classification:


The classification is based on place (or) region such as states, towns, cities and villages.

Spatial: Series which are arranged on the basis of place are called spatial series.

Geographical classification is illustrated in the following table:


Sales data (of pressure cookers) for 2015 (T.N.)

NAME OF TOWN     NUMBER OF COOKERS
MADRAS           15000
TIRUCHI          13000
MADURAI          11000
COIMBATORE        8000
KANYAKUMARI       4000

Chronological (or) temporal (or) historical:

The classification is based on time and arranged chronologically (or) historically.

Chronological classification is illustrated below

Population of India from 1921 to 1981

Year     Population (in million)
1921     248
1931     276
1941     313
1951     357
1961     438
1971     536
1981     684

Qualitative:

 Classification is based on attributes (or) characteristics, i.e. non-measurable characteristics are qualitative.
 If the statistical data are collected about qualities like male, female, employed, Indian, foreigner, this is qualitative classification.

The qualitative classification is of two types.


1. One way classification
2. Two way (or) manifold classification

One way classification means classification of data on the basis of only one consideration.

[Chart: POPULATION is divided into MALE and FEMALE.]

Two way classification is based on two considerations. For e.g.: the number of persons leaving India for four different countries (USA, Canada, etc.) for employment opportunities, classified according to sex, from 4 different cities.
[Chart: POPULATION is divided into MALE and FEMALE; each of these is divided into LITERATE and ILLITERATE; each of these in turn is divided into MARRIED and UNMARRIED.]

Quantitative classification:

Measurable characteristics are quantitative, i.e. the statistical data are classified according to numerical measurements such as age, height and weight. A quantitative phenomenon is called a variable.

[Chart: a VARIABLE is either QUANTITATIVE or QUALITATIVE/CATEGORICAL (e.g. make of a computer, hair color, gender). A QUANTITATIVE variable is either DISCRETE (e.g. number of houses, cars, accidents) or CONTINUOUS (e.g. length, age, height, weight, time).]
Tabulation:

Statistical data collected either through a primary source (or) a secondary source have to be classified first, and the classified data then have to be presented in tabular form in an orderly way before analysis & interpretation of the data.

Definition:
Tabulation is defined as the orderly (or) systematic presentation of numerical data in rows & columns, designed to facilitate comparison between the figures.
Tabulation is a statistical tool used for condensation of the data in a statistical process.

Characteristic of good table:


1. A statistical table should contain a clear & precise title.
2. When a number of tables are presented in the analysis of statistical data, serial numbers should be given to the tables.
3. Descriptions of columns, rows, sub-columns and sub-rows should be well defined.
4. The unit of measurement used should be clearly indicated. These units are normally mentioned at the top of the columns.
5. Data which are comparable should be given side by side.
6. The table should be neat & attractive.
7. Horizontal and vertical lines should be drawn to separate adjacent rows & columns. Thin & thick lines may be drawn to distinguish sub-rows & sub-columns from the main rows & main columns.
8. Column totals & row totals should be shown.
9. Explanatory notes should be given as foot notes for proper understanding of the table.
10. If the data have several sub-classifications, they can be presented in more than one table.
11. The date of preparation of the table & the source of information should be mentioned at the bottom of the table.

Parts of the table:

1. Identification no
2. Title
3. Head note
4. Stubs
5. Captions
6. Body of the table
7. Foot notes
8. Source
MEASURES OF CENTRAL TENDENCY:

The central tendency of a distribution may be defined in three ways:

(i) The arithmetic average of the distribution,
(ii) The point exactly mid-way between the top and bottom halves of the distribution, and
(iii) The most frequently occurring score or the mid-point of the most frequent measurement class.

The first of these ways of defining the central tendency leads to the mean, the second leads to the median of the distribution, and the third is known as the mode. All these three, as a class, are known as measures of central tendency.

Though ‘average’ is the popular term for the arithmetic mean, in statistical work ‘average’ is the general term for any measure of central tendency.

INTRODUCTION:

 A measure of central tendency gives a single value from a group of values
 One of the most important objectives of statistical analysis is to get one single value that describes the entire mass of unwieldy data
 Such a value is called a central value or an average
 It is also known as a measure of location (or) average (or) central value.

DEFINITION:

 According to Clark, “average is an attempt to find one single figure to describe a whole group of figures.”
 An average is a value which is representative of a set of data.
OBJECTIVES OF AN AVERAGE

1. To facilitate quick understanding: the purpose of an average is to reduce the mass of complex data into a single figure so that it can be easily and quickly understood. The single figure represents the characteristics of the whole group.
2. To facilitate comparison: two or more sets of values can be easily compared on the basis of their averages. For example, the monthly average sales of a company can be compared with the monthly average sales of another company. Such a comparison will be helpful in making decisions.
3. To establish mathematical relationship: an average helps to establish a mathematical relationship between variables. For example, to say that the income of an Indian is less than the income of an American is not clear. But if the respective incomes are expressed in terms of averages, the statement will be definite and clear.
4. To take policy decisions: in the process of experimentation and research, averages are valuable in setting standards, estimation and other managerial decisions.
CHARACTERISTICS OF A GOOD AVERAGE:

A good average should possess the following characteristics.

1. It should be simple to understand and easy to calculate.


2. It should be based on all the observations.
3. It should be well defined.
4. It should be capable of being used in further statistical computations.
For example : Arithmetic mean is suitable in calculating standard deviation.
5. It should not be affected by the extreme items
6. It should not be affected by sampling fluctuations.
7. It should be capable of further algebraic treatment. If we know the average of two or more sets of
data, it is possible to get the average of all the groups combined.
TYPES OF AVERAGE:

1. Arithmetic mean

i) Simple ii) weighted

2. Median
3. Mode
4. Geometric mean
5. Harmonic mean

ARITHMETIC MEAN: -METHODS (RAW DATA)

 Direct method
 Shortcut method
 Step deviation method
Arithmetic mean refers to the simple average. To a layman it is the average, but for a statistician it is the arithmetic mean. It is the simplest of all averages and is widely used in practice.

Definition: The arithmetic mean is defined as the sum total of all values divided by their number. There are two types of arithmetic mean:

i) Simple arithmetic mean and ii) weighted arithmetic mean

PROPERTIES OF MEAN:

1. The sum of the deviations of the items from the arithmetic mean is always zero: $\sum (X - \bar{X}) = 0$
2. The sum of the squared deviations of the items from the mean is minimum: $\sum (X - \bar{X})^2$ = minimum
3. If any two of the three values A.M. ($\bar{X}$), number of items ($N$) and total of the values ($\sum X$) are known, the third can be found out: $\bar{X} = \frac{\sum X}{N}$ (or) $\sum X = N \bar{X}$
PROPERTIES OF ARITHMETIC MEAN:

1. The sum of the deviations of the items from the arithmetic mean is zero.
2. The sum of the squared deviations of the items from the arithmetic mean is minimum.
3. Two kinds of arithmetic mean, namely simple arithmetic mean and weighted arithmetic mean, are defined.
4. If the arithmetic mean and the number of items of two or more related groups are known, then their combined mean can be calculated.
5. If a constant number is added to or subtracted from each item in a series, then the mean will increase or decrease by the same amount.
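
The properties above can be checked numerically. The following is a minimal sketch in Python; the five observations are hypothetical and chosen only for illustration, and exact fractions are used so that rounding does not blur the results.

```python
from fractions import Fraction

# Hypothetical observations, used only to illustrate the properties.
values = [Fraction(v) for v in (4, 8, 15, 16, 23)]
n = len(values)
mean = sum(values) / n

# Property 1: the deviations from the mean add up to zero.
sum_dev = sum(x - mean for x in values)

# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(center):
    return sum((x - center) ** 2 for x in values)

# Property 5: adding a constant to every item shifts the mean by that constant.
shifted_mean = sum(x + 10 for x in values) / n

print("mean =", mean)
print("sum of deviations =", sum_dev)
print("shift in mean =", shifted_mean - mean)
```

Trying `sum_sq_dev` with any value other than the mean gives a larger total, which is what property 2 states.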
Merits:

1. It is easy to understand
2. It is easy to calculate
3. It can be used in further calculation
4. It is rigidly defined
5. It is based on the value of every item in the series
6. It provides a good basis for comparison
7. It can be used for further analysis and algebraic treatment
8. The mean is a more stable measure of central tendency.

Demerits:

1. The mean is unduly affected by the extreme items.
2. It is unrealistic
3. It may lead to a false conclusion
4. It cannot be accurately determined if even one of the values is not known
5. It is not useful for the study of qualities like intelligence, honesty and character.
6. It cannot be located by observation or by the graphic method
7. It gives greater importance to bigger items of a series and lesser importance to smaller items.

ii) Weighted Arithmetic mean:

The simple arithmetic mean gives equal importance to all the items, but there are situations where items differ in importance. In such cases it is necessary to assign weights in proportion to the relative importance of the various items, and hence the weighted arithmetic mean is calculated. The weighted arithmetic mean is especially useful in problems relating to the construction of index numbers and standardized birth and death rates.
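
As an illustration, the weighted arithmetic mean is usually computed as $\bar{X}_w = \frac{\sum wX}{\sum w}$. The sketch below assumes that formula; the function name `weighted_mean`, the marks and the weights are hypothetical and are not taken from these notes.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not add up to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical marks in three papers, weighted 1 : 2 : 3.
marks = [60, 75, 90]
weights = [1, 2, 3]

print("simple mean   =", sum(marks) / len(marks))
print("weighted mean =", weighted_mean(marks, weights))
```

Because the largest weight is attached to the highest mark, the weighted mean here comes out above the simple mean.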
THE MEDIAN:

Median is the value of the item that divides the series into two equal parts.
The median is the value of the middle item in a series, when the items are arranged according to magnitude.
Merits of Median:

1. It is easy to understand and easy to compute
2. It is quite rigidly defined
3. It eliminates the effect of extreme items.
4. Since it is a positional average, the median can be computed even if the items at the extremes are unknown
5. Median can be calculated even for qualitative phenomena such as honesty, character, etc.
6. Its value generally lies in the distribution.
Demerits of median:

1. A typical representative of the observations cannot be computed if the distribution of items is irregular.
2. Where the number of items is large, the prerequisite process of arranging the items in order is difficult.
3. It ignores the extreme items.
4. In the case of a continuous series the median is estimated, not calculated.
5. It is more affected by fluctuations of sampling than the mean.
6. Median is not amenable to further algebraic manipulation.
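
The definition of the median can be sketched in Python as follows. The data are hypothetical, and taking the average of the two middle items when the number of items is even is the usual convention, assumed here rather than stated in these notes.

```python
def median(values):
    """Middle item of the values once they are arranged by magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle items

# Hypothetical series containing one extreme item (500).
series = [36, 8, 500, 10, 30]
print("median =", median(series))
print("mean   =", sum(series) / len(series))
```

The extreme item pulls the mean far above most of the observations, while the median stays at the middle item.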
MODE:

Mode is the value which occurs with the greatest frequency in a series.

Mode is the size of the item which has the maximum frequency.

Merits:

1. It is easy to understand as well as easy to calculate


2. It is usually an actual value
3. It is not affected by extreme values
4. It is simple and precise
5. It is the most representative average.

Demerits:

1. It is not suitable for further mathematical treatment


2. It may not give weight to extreme items
3. It is stable only when the sample is large
4. Mode is influenced by magnitude of the class intervals.
USES:

The concept of mode is used by people in their everyday life. For example, a manufacturer of banyans, readymade garments, shoes, etc. is interested in the modal size and manufactures that size in large quantities.

Mode helps the manufacturer in deciding the models. It is useful in industry and business. Weather forecasts are also based on mode. It is very useful to agriculturists, businessmen, etc. Mode is also used in socio-economic surveys. Mode is mostly used in business and commerce.
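
A minimal sketch of finding the modal size with Python's `collections.Counter` is given here. The shoe sizes are hypothetical, and returning every value that shares the highest frequency is a design choice made for this illustration.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the maximum frequency, in ascending order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical shoe sizes sold in one day.
sizes_sold = [7, 8, 8, 9, 8, 7, 10, 8, 9]
print("modal size(s):", modes(sizes_sold))
```

When two sizes tie for the highest frequency the function returns both, which signals that the series does not have a single mode.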

Geometric Mean:

Merits of Geometric mean:

1. It is based on all observations.


2. It is rigidly defined.
3. It is capable of further algebraic treatment.
4. It is the most suitable average when large weight has to be given to small items and small weight to large items.
5. It is less affected by the extreme values.
6. It is useful in studying economic and social data
7. It is suitable for averaging ratios, rates and percentages.
Demerits of Geometric mean:

1. It is difficult to understand.
2. Non-mathematical persons cannot do the calculations.
3. The geometric mean cannot be computed if any item in the series is negative or zero.
4. It has restricted application.
Harmonic mean:

Merits of Harmonic Mean:

1. It is rigidly defined.
2. It is based on all the observations of the series.
3. It is suitable in case of series having wide dispersion
4. It is suitable for further mathematical treatment.
5. It gives less weight to large items and more weight to small items.
Demerits of Harmonic mean:

1. It is difficult to calculate and is not easy to understand.
2. All the values must be available for computation.
3. It is not popular.
4. It is usually a value which does not exist in the series.
RELATIONSHIP BETWEEN MEAN, GEOMETRIC MEAN AND HARMONIC MEAN:

If all the items of a variable have the same value, the arithmetic mean, the geometric mean and the harmonic mean are equal:

$\bar{X} = G.M. = H.M.$

But if the sizes vary, as will generally be the case, the mean will be greater than the geometric mean, and the geometric mean will be greater than the harmonic mean. This is because of the property of the geometric mean to give larger weight to smaller items and of the harmonic mean to give the largest weight to the smallest item. Hence,

$\bar{X} > G.M. > H.M.$
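
The relationship can be checked with Python's standard `statistics` module. The sketch assumes the usual definitions (the geometric mean is the $N$-th root of the product of the items and the harmonic mean is the reciprocal of the mean of the reciprocals) and uses hypothetical positive values.

```python
import statistics

def three_means(values):
    """Return (arithmetic mean, geometric mean, harmonic mean) of positive values."""
    am = statistics.mean(values)
    gm = statistics.geometric_mean(values)  # available from Python 3.8
    hm = statistics.harmonic_mean(values)
    return am, gm, hm

# Hypothetical series with unequal items: A.M. > G.M. > H.M.
am, gm, hm = three_means([2, 4, 8])
print(f"A.M. = {am:.4f}, G.M. = {gm:.4f}, H.M. = {hm:.4f}")

# Hypothetical series with equal items: all three means coincide.
same = three_means([5, 5, 5])
print("equal items:", [round(m, 4) for m in same])
```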
Frequency Distribution:

Definition:

A frequency distribution is a statistical table which shows the set of all distinct values of the variable, arranged in order of magnitude either individually or in groups, with their frequencies side by side.

Discrete Frequency Distribution:

In a discrete frequency distribution, values are given individually (ungrouped).

Example: The weekly wages (in Rs.) paid by a house building contractor to the workers are given below. Form a discrete frequency distribution.

300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120

300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150

Solution:

In the first column all the different values are written in ascending order. Each of the given values is considered and a tally mark is put in the appropriate place in the second column. After considering all the given values, the tally marks against each value are counted and their number is written in numerals in the third column. These numerals are the frequencies and their total is also given.

In the final table, only two columns (excluding the tally marks) are given.

Weekly wages    Tally marks         No. of workers (frequencies)
120             ||||| |||           8
150             ||||| ||||| ||      12
240             ||||| |||           8
300             ||                  2

DISCRETE FREQUENCY DISTRIBUTION

Weekly wages (Rs.):   120   150   240   300   Total
No. of workers:         8    12     8     2      30
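
The tallying procedure can also be reproduced in code. The sketch below uses Python's `collections.Counter` on the thirty wages given in the example; the variable names are illustrative.

```python
from collections import Counter

# Weekly wages (Rs.) of the 30 workers from the example above.
wages = [
    300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120,
    300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150,
]

# Counter plays the role of the tally column: one count per distinct value.
freq = Counter(wages)

print("Weekly wages (Rs.)  No. of workers")
for wage in sorted(freq):            # distinct values in ascending order
    print(f"{wage:>18}  {freq[wage]:>14}")
total = sum(freq.values())
print(f"{'Total':>18}  {total:>14}")
```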

Continuous Frequency Distribution:

In a continuous frequency distribution the values are grouped; each group is an interval of values which constitutes a class. In the example data under quantitative classification the class intervals have been taken as 0-39, 40-49, 50-59 and 60-100.

Example: Given

Class intervals    True class intervals
0-39               -0.5-39.5
40-49              39.5-49.5
50-59              49.5-59.5
60-100             59.5-100.5

In this example, d = 1 (40 - 39 = 50 - 49 = 60 - 59) and ½ d = 0.5, where d is the gap between the upper limit of one class and the lower limit of the next. d can be different from 1 also; sometimes it is seen that d = 0.1 or 0.01.

Size of the class interval:

The size of a class interval is also called its length. Size = Upper boundary – Lower boundary.

$\text{Mid-value} = \frac{\text{lower boundary} + \text{upper boundary}}{2}$

(or) $= \frac{\text{lower limit} + \text{upper limit}}{2}$
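
A small sketch of these calculations in Python follows. It assumes that the true class boundaries are obtained by subtracting ½ d from the lower limit and adding ½ d to the upper limit; the function names are illustrative.

```python
def true_boundaries(lower_limit, upper_limit, d=1):
    """Convert the limits of an inclusive class into its true class boundaries."""
    half = d / 2
    return lower_limit - half, upper_limit + half

def mid_value(lower, upper):
    """Mid-value of a class; limits or boundaries give the same result."""
    return (lower + upper) / 2

limits = [(0, 39), (40, 49), (50, 59), (60, 100)]
for lo, hi in limits:
    lb, ub = true_boundaries(lo, hi)
    size = ub - lb                      # size = upper boundary - lower boundary
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, size {size}, mid-value {mid_value(lb, ub)}")
```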

Example : From the following observations prepare a frequency distribution table in ascending order starting
with 100- 110 (exclusive method).

Income in (Rs)

125 108 112 126 110 132 136 130 149 155 120 130 136 138 125

111 119 125 140 148 147 137 145 150 142 135 137 132 165 154

Solution: Proceeding as explained under the previous example, the following table is obtained. This being the exclusive method, the given values 110, 120, 130, 140 and 150 are included in the class intervals of which they are the lower boundary.

CONTINUOUS FREQUENCY DISTRIBUTION

Income (Rs):   100-110   110-120   120-130   130-140   140-150   150-160   160-170   Total
Frequency:           1         4         5        10         6         3         1      30
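
The grouping can be verified with a short Python sketch. It applies the exclusive method, in which a value equal to a class boundary is counted in the class where it is the lower boundary; the constant and variable names are illustrative.

```python
# Incomes (Rs) of the 30 persons from the example above.
incomes = [
    125, 108, 112, 126, 110, 132, 136, 130, 149, 155, 120, 130, 136, 138, 125,
    111, 119, 125, 140, 148, 147, 137, 145, 150, 142, 135, 137, 132, 165, 154,
]

START, STOP, WIDTH = 100, 170, 10
classes = [(lo, lo + WIDTH) for lo in range(START, STOP, WIDTH)]
freq = {c: 0 for c in classes}

for x in incomes:
    # Integer division puts a boundary value such as 110 into 110-120, not 100-110.
    lo = START + ((x - START) // WIDTH) * WIDTH
    freq[(lo, lo + WIDTH)] += 1

for (lo, hi), f in freq.items():
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(freq.values()))
```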
Cumulative Frequency Distribution:

There are two kinds of cumulative frequency distribution.

i) Less than cumulative frequency distribution

Frequency distributions, both ungrouped and grouped, are to be taken in ascending order. The total of the frequencies from the beginning up to and including each frequency is found. That cumulative frequency shows how many items are less than or equal to the corresponding value or upper boundary of the class interval.

ii) More than cumulative frequency distribution

Frequency distributions, both ungrouped and grouped, are to be taken in ascending order. The total of the frequencies from each frequency up to and including the last is found. That cumulative frequency shows how many items are more than or equal to the corresponding value or lower boundary of the class interval.

Example: Formation of the two cumulative frequency distribution as explained above from a given discrete
frequency distribution.

Given discrete f. d.                 Less than cumulative f. d.                 More than cumulative f. d.

Weekly wages (Rs)  No. of workers (f)   Weekly wages (Rs)  Less than cumulative frequencies   Weekly wages (Rs)  More than cumulative frequencies
120                8                    120                8                                  120                N = 30
150                12                   150                8 + 12 = 20                        150                30 - 8 = 22
240                8                    240                20 + 8 = 28                        240                22 - 12 = 10
300                2                    300                28 + 2 = 30                        300                10 - 8 = 2

For example:

The less than cumulative frequency 28 corresponding to the wage 240 means that 28 workers have wages less than or equal to Rs. 240. Similarly, the more than cumulative frequency 10 corresponding to the wage 240 means that 10 workers have wages more than or equal to Rs. 240.
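
Both cumulative columns can be produced from the frequencies with `itertools.accumulate`. This is a minimal sketch using the wage distribution above; the variable names are illustrative.

```python
from itertools import accumulate

wages = [120, 150, 240, 300]   # weekly wages (Rs) in ascending order
freqs = [8, 12, 8, 2]          # number of workers

# Less than c.f.: running total from the first frequency onwards.
less_than = list(accumulate(freqs))

# More than c.f.: running total taken from the last frequency backwards.
more_than = list(accumulate(reversed(freqs)))[::-1]

print("Wage  f  Less than c.f.  More than c.f.")
for w, f, lt, mt in zip(wages, freqs, less_than, more_than):
    print(f"{w:>4} {f:>2} {lt:>15} {mt:>15}")
```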

Example: Formation of the cumulative frequency distribution (as explained earlier) from a given continuous
frequency distribution (inclusive method)

Given continuous f. d.            Less than cumulative f. d.                More than cumulative f. d.

Marks    No. of students     Marks    Less than cumulative frequencies   Marks    More than cumulative frequencies
0-19     10                  0-19     10                                 0-19     250
20-39    25                  20-39    35                                 20-39    240
40-59    103                 40-59    138                                40-59    215
60-79    72                  60-79    210                                60-79    112
80-99    40                  80-99    250                                80-99    40
MEASURES OF CENTRAL TENDENCY
Definition: “Average is a value which is typical or representative of a set of data.”

A measure of central tendency gives a single representative value for a set of usually unequal values. The single value is the point of location around which the individual values of the set cluster. The measures of central tendency are hence known as measures of location. They are popularly called averages. The various measures of central tendency are the following.

1. Arithmetic mean
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean.
6. Combined mean

ARITHMETIC MEAN

TYPE I: [Individual observation or raw data]

When the observed values are given individually, such as $X_1, X_2, \ldots, X_N$, the methods of calculation of the arithmetic mean are as follows.

Direct method: $\text{Arithmetic mean} = \frac{\text{Total of the observations}}{\text{Number of the observations}}$

$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$

i) Arithmetic mean:
1) Individual series:

Problem 1: The expenditure of 10 families in Rupees is given below.

Family:              A    B    C    D    E     F    G    H     I    J
Expenditure (Rs.):   30   70   10   75   500   8    42   250   40   36

Calculate the Arithmetic mean.

Solution: X – Expenditure, N = 10. Formula: $\bar{X} = \frac{\sum X}{N}$

Family: Expenditure(Rs.)
A 30
B 70
C 10
D 75
E 500
F 8
G 42
H 250
I 40
J 36
Total     $\sum X = 1061$

$\bar{X} = \frac{\sum X}{N} = \frac{1061}{10} = 106.1$

The arithmetic mean is Rs. 106.10
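
The direct method is easy to check in code. The sketch below recomputes the answer to Problem 1 in Python; the dictionary layout is only an illustrative choice.

```python
# Expenditure (Rs.) of the 10 families in Problem 1.
expenditure = {
    "A": 30, "B": 70, "C": 10, "D": 75, "E": 500,
    "F": 8, "G": 42, "H": 250, "I": 40, "J": 36,
}

total = sum(expenditure.values())   # sum of X
n = len(expenditure)                # N
mean = total / n                    # direct method: sum of X divided by N

print(f"Sum of X = {total}, N = {n}, mean = Rs. {mean:.2f}")
```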

Problem 2: Calculate mean from the following data.

R. No’s: 1 2 3 4 5 6 7 8 9 10
Marks: 40 50 55 78 58 60 73 35 43 48
Solution: Calculation of mean

R.No’s Marks
1 40
2 50
3 55
4 78
5 58
6 60
7 73
8 35
9 43
10 48
N = 10     $\sum X = 540$

$\bar{X} = \frac{\sum X}{N} = \frac{540}{10} = 54$ marks.

TYPE II: [Discrete series]

The actual values with corresponding frequencies are given in the following form.

Observed Frequency

X1 F1

X2 F2

. .

. .

. .

The method of calculation of the arithmetic mean is illustrated below.

Direct method: Arithmetic mean, $\bar{X} = \frac{\sum fX}{N}$, where $N = \sum f$

Problem 1: Calculate the mean number of persons per house given.


No.of.persons per house: 2 3 4 5 6 Total
No. of. houses: 10 25 30 25 10 100

Solution: X – No. of .persons per house, f – No. of. Houses

No. of. persons per house No. of. houses


X F fx
2 10 20
3 25 75
4 30 120
5 25 125
6 10 60
Total     N = 100     $\sum fX = 400$

Mean, $\bar{X} = \frac{\sum fX}{N} = \frac{400}{100} = 4$

Problem : 2 Calculate mean from the following data.

Value: 1 2 3 4 5 6 7 8 9 10
F: 21 30 28 40 26 34 40 9 15 57
Solution:

X F FX
1 21 21
2 30 60
3 28 84
4 40 160
5 26 130
6 34 204
7 40 280
8 9 72
9 15 135
10 57 570
Total N = 300 ∑fx = 1716

Mean, $\bar{X} = \frac{\sum fx}{N} = \frac{1716}{300} = 5.72$

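
The discrete-series calculation multiplies each value by its frequency before dividing by the total frequency. A minimal Python sketch of that step follows; the name `mean_discrete` is an assumption made for illustration.

```python
def mean_discrete(values, freqs):
    """Arithmetic mean of a discrete series: sum(f * x) / sum(f)."""
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    return total / n

# Problem 1: persons per house and number of houses
mean_persons = mean_discrete([2, 3, 4, 5, 6], [10, 25, 30, 25, 10])

# Problem 2: values 1 to 10 with their frequencies
values = list(range(1, 11))
freqs = [21, 30, 28, 40, 26, 34, 40, 9, 15, 57]
mean_value = mean_discrete(values, freqs)  # 1716 / 300
print(mean_persons, mean_value)
```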
TYPE III [Continuous series – Exclusive class Intervals]

This is the most important form, since data are most often available in it. As seen later, data of Type IV to Type VII are to be rewritten in Type III form for proper use of the formulae for median, mode, quartiles, etc.

The formulae for continuous series and discrete series are the same (but the definitions of d, d' and f differ, as explained earlier), and hence the steps are the same once m, the mid values of the class intervals, have been identified.

Problem 1: Calculate Arithmetic mean for the following.

Marks: 20-30 30-40 40-50 50-60 60-70 70-80


No. of. students: 5 8 12 15 6 4

Solution: Formula, arithmetic mean $\bar{X} = \frac{\sum fm}{N}$

Marks     No. of students (f)    Mid value (m)    fm
20-30     5                      25               125
30-40     8                      35               280
40-50     12                     45               540
50-60     15                     55               825
60-70     6                      65               390
70-80     4                      75               300
          N = 50                                  ∑fm = 2460

$\bar{X} = \frac{\sum fm}{N} = \frac{2460}{50} = 49.20$

TYPE IV [Continuous series – Inclusive class Intervals]

Problem 1: The annual profits of 90 companies are given below find the arithmetic mean.

Annual profit (Rs .lakhs): 0-19 20-39 40-59 60-79 80-99


No. of. Companies: 5 17 32 24 12

Solution: Arithmetic mean $\bar{X} = \frac{\sum fm}{N}$

Annual profit (Rs. lakhs)    No. of companies (f)    Mid value (m)    fm
0-19                         5                       9.5              47.5
20-39                        17                      29.5             501.5
40-59                        32                      49.5             1584.0
60-79                        24                      69.5             1668.0
80-99                        12                      89.5             1074.0
                             N = 90                                   ∑fm = 4875.0

Arithmetic mean, $\bar{X} = \frac{\sum fm}{N} = \frac{4875}{90}$ = Rs. 54.17 lakhs.
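
For both exclusive (Type III) and inclusive (Type IV) class intervals, the mid value is the average of the two class limits, and the mean is then found as in a discrete series. The sketch below assumes a helper named `grouped_mean`, which is not from the text.

```python
def grouped_mean(class_limits, freqs):
    """Mean of a continuous series using the mid value of each class."""
    mids = [(lower + upper) / 2 for lower, upper in class_limits]
    total = sum(f * m for m, f in zip(mids, freqs))
    return total / sum(freqs)

# Type III: exclusive class intervals (marks)
marks_classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
marks_freqs = [5, 8, 12, 15, 6, 4]
mean_marks = grouped_mean(marks_classes, marks_freqs)      # 2460 / 50

# Type IV: inclusive class intervals (annual profit, Rs. lakhs)
profit_classes = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 99)]
profit_freqs = [5, 17, 32, 24, 12]
mean_profit = grouped_mean(profit_classes, profit_freqs)   # 4875 / 90
print(mean_marks, round(mean_profit, 2))
```

The mid values are unchanged whether or not the inclusive classes are first rewritten with true boundaries (for example 19.5 to 39.5), which is why the mean can be computed directly from the stated limits.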

TYPE V [Continuous series less than cumulative frequencies]

Problem 1: Calculate the mean height

Height below(cms): 150 155 160 165 170 175 180 185
No. of. soldiers: 0 23 77 152 266 419 472 500
Solution:

Height below (cms)    No. of soldiers (cf)    Height (cms)    No. of soldiers (f)    Mid value (m)    fm
150                   0                       150-155         23 - 0 = 23            152.5            3507.5
155                   23                      155-160         77 - 23 = 54           157.5            8505.0
160                   77                      160-165         152 - 77 = 75          162.5            12187.5
165                   152                     165-170         266 - 152 = 114        167.5            19095.0
170                   266                     170-175         419 - 266 = 153        172.5            26392.5
175                   419                     175-180         472 - 419 = 53         177.5            9407.5
180                   472                     180-185         500 - 472 = 28         182.5            5110.0
185                   500                     Total           N = 500                                 ∑fm = 84205.0

Mean height, $\bar{X} = \frac{\sum fm}{N} = \frac{84205.0}{500} = 168.41$ cms.

TYPE VI [Continuous series more than cumulative frequencies]

Problem: 1 Calculate the arithmetic mean from the following.

Weight above (kgs): 20 25 30 35 40


No. of. Boys: 160 145 100 50 9

Solution: Arithmetic mean $\bar{X} = \frac{\sum fm}{N}$

Weight above (kgs)    No. of boys (cf)    Weight (kgs)    No. of boys (f)     Mid value (m)    fm
20                    160                 20-25           160 - 145 = 15      22.5             337.5
25                    145                 25-30           145 - 100 = 45      27.5             1237.5
30                    100                 30-35           100 - 50 = 50       32.5             1625.0
35                    50                  35-40           50 - 9 = 41         37.5             1537.5
40                    9                   -               9                   42.5             382.5
                                                          N = 160                              ∑fm = 5120

Arithmetic mean, $\bar{X} = \frac{\sum fm}{N} = \frac{5120}{160} = 32.00$ kgs.
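
The conversions used in Type V and Type VI (cumulative frequencies to ordinary class frequencies) can be sketched as below. The helper names are hypothetical, and the upper limit 45 for the last weight class is an assumption implied by the mid value 42.5 used in the solution.

```python
def freqs_from_less_than(cumulative):
    """Class frequencies from 'less than' cumulative frequencies."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

def freqs_from_more_than(cumulative):
    """Class frequencies from 'more than' cumulative frequencies.
    The last class keeps the final cumulative value."""
    diffs = [earlier - later for earlier, later in zip(cumulative, cumulative[1:])]
    return diffs + [cumulative[-1]]

def mean_from_bounds(bounds, freqs):
    """Mean using mid values of consecutive class boundaries."""
    mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

# Type V: heights of soldiers (less-than cumulative frequencies)
height_bounds = [150, 155, 160, 165, 170, 175, 180, 185]
height_freqs = freqs_from_less_than([0, 23, 77, 152, 266, 419, 472, 500])
mean_height = mean_from_bounds(height_bounds, height_freqs)

# Type VI: weights of boys (more-than cumulative frequencies)
weight_bounds = [20, 25, 30, 35, 40, 45]  # 45 is an assumed upper limit
weight_freqs = freqs_from_more_than([160, 145, 100, 50, 9])
mean_weight = mean_from_bounds(weight_bounds, weight_freqs)
print(height_freqs, mean_height, weight_freqs, mean_weight)
```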

Combined mean:

Let there be $N_1$ items in the first group with mean $\bar{x}_1$ and $N_2$ items in the second group with mean $\bar{x}_2$.

The total of the $N_1$ items = $N_1 \bar{x}_1$ and the total of the $N_2$ items = $N_2 \bar{x}_2$.

When these two groups merge, there are $N_1 + N_2$ items whose total = $N_1 \bar{x}_1 + N_2 \bar{x}_2$.

∴ The mean of the combined group, $\bar{X}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$

In a similar manner, when there is a third group of $N_3$ items with mean $\bar{x}_3$, the combined arithmetic mean of the three groups is

$\bar{X}_{123} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + N_3 \bar{x}_3}{N_1 + N_2 + N_3}$

Problem: 1
There are two branches of an establishment employing 100 and 80 persons respectively. If the arithmetic means of the monthly salaries paid by the two branches are Rs. 275 and Rs. 225 respectively, find the arithmetic mean of the salaries of the employees of the establishment as a whole.

Solution: Given $N_1 = 100$; $N_2 = 80$

$\bar{x}_1 = 275$; $\bar{x}_2 = 225$

The arithmetic mean of the salaries of the employees of the establishment as a whole is

$\bar{X}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} = \frac{100 \times 275 + 80 \times 225}{100 + 80} = \frac{27500 + 18000}{180} = \frac{45500}{180}$ = Rs. 252.78
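
The combined mean is simply a size-weighted mean of the group means, and the same expression extends to any number of groups. A short sketch, assuming an illustrative function name `combined_mean`:

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(N_i * mean_i) / sum(N_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Two branches with 100 and 80 employees and mean salaries Rs. 275 and Rs. 225
combined = combined_mean([100, 80], [275, 225])  # 45500 / 180
print(round(combined, 2))
```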

MEDIAN

Definition: Median is the value of the middle most item when all the items are in the order of
Magnitude.

Individual observations:

Position of the median = $\left(\frac{N+1}{2}\right)$th item, after the values are arranged in order of magnitude.

Problem: 1 [ N is an odd number]

Find median for the following data. 6 9 21 5 7 -2 0 32 and 9.

Solution: Values in ascending order : -2 , 0, 5, 6, 7, 9, 9, 21, 32.

Position of the median = $\frac{N+1}{2} = \frac{9+1}{2} = 5$

Median = 7 [It is the value at the 5th position]

The position can be verified from either order of arrangement:

Position    Ascending order values (x)    Descending order values (x)


1 -2 32
2 0 21
3 5 9
4 6 9
5 7 [Median] 7
6 9 6
7 9 5
8 21 0
9 32 -2

Problem: 2 [N is an even number]

Find median for the following data. 57, 58, 61, 42, 38, 65, 72, 66.

Solution: Values in ascending order: 38, 42, 57, 58, 61, 65, 66, 72.

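Position of the median = $\frac{N+1}{2} = \frac{8+1}{2} = 4.5$

Since the position is fractional, the median is the average of the two middle values. The value at the ($\frac{N}{2} = \frac{8}{2}$ =) 4th position is 58 and the value at the ($\frac{N}{2} + 1 = 4 + 1$ =) 5th position is 61.

Median = $\frac{\text{value at 4th position} + \text{value at 5th position}}{2} = \frac{58 + 61}{2} = 59.5$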
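
Both cases (N odd and N even) can be handled by one routine that sorts the values and picks the middle item or the average of the two middle items. The function name `median_individual` below is an assumption for illustration.

```python
def median_individual(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # ((N + 1) / 2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of (N/2)th and (N/2 + 1)th

median_odd = median_individual([6, 9, 21, 5, 7, -2, 0, 32, 9])
median_even = median_individual([57, 58, 61, 42, 38, 65, 72, 66])
print(median_odd, median_even)
```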

Discrete series:

Median = size of the $\left(\frac{N+1}{2}\right)$th item.

Problem: The marks (out of a maximum of 10) scored by the students of a class are given below. Find the median mark.

Marks: 3 4 5 6 7 8 9 10 Total
No. of. students: 1 5 6 7 10 15 10 5 59
Solution:

Marks No .of .students Cumulative frequency


X F CF
3 1 1
4 5 1+5=6
5 6 6+6=12
6 7 12+7=19
7 10 19+10=29
8 15 29+15=44
9 10 44+10=54
10 5 54+5=59
N= 59

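Position of the median = $\frac{N+1}{2} = \frac{59+1}{2} = 30$.

When all the 59 items are arranged in ascending order, the median is the item in the 30th position. The 30th item is included in the cumulative frequency 44, which corresponds to the mark 8.

Median = 8.
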
Problem: 2 Locate median from the following

Solution:

Size of shoes Frequency CF


5 10 10
5.5 16 26
6 28 54
6.5 15 69
7 30 99
7.5 40 139
8 34 173
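N = 173

Median = size of the $\left(\frac{N+1}{2}\right)$th item = size of the $\left(\frac{173+1}{2}\right)$th item = size of the 87th item

The 87th item is included in the cumulative frequency 99, which corresponds to size 7.

Median = 7.
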
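
Locating the median of a discrete series amounts to finding the first cumulative frequency that reaches the position (N + 1)/2. The sketch below assumes the values are already in ascending order; `median_discrete` is an illustrative name, not one used in the text.

```python
from itertools import accumulate

def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches position (N + 1) / 2.
    Assumes `values` are listed in ascending order."""
    position = (sum(freqs) + 1) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x

# Problem 1: marks out of 10
median_marks = median_discrete([3, 4, 5, 6, 7, 8, 9, 10],
                               [1, 5, 6, 7, 10, 15, 10, 5])
# Problem 2: sizes of shoes
median_shoes = median_discrete([5, 5.5, 6, 6.5, 7, 7.5, 8],
                               [10, 16, 28, 15, 30, 40, 34])
print(median_marks, median_shoes)
```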
Continuous series:

Median = $L + \left[\frac{\frac{N}{2} - cf}{f}\right] \times i$

where L is the lower boundary of the median class,

f is the frequency of the median class,

i is the size or length of the class interval,

cf is the cumulative frequency of the class preceding the median class.

Problem:

Calculate the median heights.

Heights (cms):      145-150   150-155   155-160   160-165   165-170   170-175
No. of students:    2         5         10        8         4         1
Solution:

Heights (cms) No. of. students Cumulative frequency


F Cf
145-150 2 2
150-155 5 7
155-160 10 17
160-165 8 25
165-170 4 29
170-175 1 30
N= 30 -

Class intervals are continuous and are in ascending order. N/2 = 30/2 = 15.

The 15th item is included in the cumulative frequency 17, which corresponds to the interval 155-160. It is the median class interval.

L = 155, f = 10, i = 160 - 155 = 5, cf = 7

Median = $L + \left[\frac{\frac{N}{2} - cf}{f}\right] \times i = 155 + \left[\frac{15 - 7}{10}\right] \times 5 = 155 + \frac{5 \times 8}{10} = 155 + 4 = 159$ cms.
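
The interpolation formula for a continuous series can be sketched as follows. The function name and argument layout are assumptions; classes are taken to be continuous, in ascending order and of equal width.

```python
def median_continuous(lower_bounds, width, freqs):
    """Median = L + ((N/2 - cf) / f) * i for the class containing item N/2."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if cf + f >= half:
            return lower + (half - cf) / f * width
        cf += f

# Heights (cms): 145-150, 150-155, ..., 170-175
median_height = median_continuous([145, 150, 155, 160, 165, 170], 5,
                                  [2, 5, 10, 8, 4, 1])
print(median_height)
```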

Mode
Definition: Mode is defined as the value of the variable which occurs most frequently in a distribution.

Individual Series:

The value or values that occur the greatest number of times are identified.

Problem: Determine the mode.

(i) 320, 395, 342, 444, 557, 395, 425, 417, 395, 401, 390, 400.

(ii) 40, 44, 57, 78, 48

(iii) 45, 55, 50, 45, 40, 55, 45, 55

Solution:

(i) 395 occurs three times, more often than any other value; therefore the mode is 395 (unimodal).

(ii) No value is repeated, so there is no mode.

(iii) 45 and 55 occur three times each, so the series is bimodal. Mode = 45 and 55.

Discrete series:

Problem: Calculate the mode from the following

Size: 10 11 12 13 14 15 16 17 18
Frequency: 10 12 15 19 20 8 4 3 2
Solution: The greatest frequency is 20, but the mode need not be 14, because the difference between the greatest frequency 20 and the next lower frequency 19 is very small. Further, 19 has the support of the neighbouring frequency 15 while 20 has the support of 8 only. A grouping table and an analysis table are therefore formed, as explained earlier.

Grouping table (each sum is written against the first size of its group)

Size (X)   (1) F    (2)    (3)    (4)    (5)    (6)
10         10       22            37
11         12              27            46
12         15       34                          54
13         19              39     47
14         20       28                   32
15         8               12                   15
16         4        7             9
17         3
18         2

Analysis Table:

Size (X)   (1)   (2)   (3)   (4)   (5)   (6)   Total
10                                             -
11                                 1           1
12               1                 1     1     3
13               1     1     1     1     1     5
14         1           1     1           1     4
15                           1                 1
16                                             -
17                                             -
18                                             -

∴ Mode = 13, the size with the highest total (5) in the analysis table.
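
The grouping and analysis tables can be reproduced mechanically. The sketch below assumes the usual six-column layout: column (1) holds the single frequencies, columns (2) and (3) hold sums in twos starting from the first and second items, and columns (4) to (6) hold sums in threes starting from the first, second and third items. The function name `grouping_mode` is hypothetical.

```python
def grouping_mode(values, freqs):
    """Return (modes, votes) using the grouping and analysis tables."""
    n = len(freqs)
    votes = [0] * n  # 'Total' column of the analysis table
    # (group size, index of the first item) for columns (1) to (6)
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, start in columns:
        groups = [(sum(freqs[i:i + size]), i)
                  for i in range(start, n - size + 1, size)]
        best = max(total for total, _ in groups)
        for total, i in groups:
            if total == best:
                # every value inside the largest group gets one mark
                for j in range(i, i + size):
                    votes[j] += 1
    top = max(votes)
    return [v for v, c in zip(values, votes) if c == top], votes

sizes = [10, 11, 12, 13, 14, 15, 16, 17, 18]
frequencies = [10, 12, 15, 19, 20, 8, 4, 3, 2]
modes, votes = grouping_mode(sizes, frequencies)
print(modes, votes)
```

For the data of this problem the highest total falls on size 13, even though the largest single frequency (20) belongs to size 14.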

Continuous series:

Mode, $Z = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$

where Z = mode,

L = lower boundary of the modal class interval,

$f_1$ = frequency of the modal class,

$f_0$ = frequency of the class preceding the modal class,

$f_2$ = frequency of the class succeeding the modal class,

i = size or length of the modal class interval.


Problem: 1

Find the mode for the following data using the grouping and analysis tables.

Class interval:   0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
Frequency:        9     12     15      16      17      15      10      13
Solution:

A grouping table and an analysis table are formed as explained earlier.

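i) Grouping table (each sum is written against the first class of its group)

C.I      (1) F    (2)    (3)    (4)    (5)    (6)
0-5      9        21            36
5-10     12              27            43
10-15    15       31                          48
15-20    16              33     48
20-25    17       32                   42
25-30    15              25                   38
30-35    10       23
35-40    13

ii) Analysis table

C.I      (1)   (2)   (3)   (4)   (5)   (6)   Total
0-5                                          -
5-10                             1           1
10-15                            1     1     2
15-20                1     1     1     1     4
20-25    1     1     1     1           1     5
25-30          1           1                 2
30-35                                        -
35-40                                        -

The modal class is 20-25, the class with the highest total (5).

Mode, $Z = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$

L = 20, $f_1$ = 17, $f_0$ = 16, $f_2$ = 15, i = 5

Mode, $Z = 20 + \left[\frac{17 - 16}{2 \times 17 - 16 - 15}\right] \times 5 = 20 + \frac{1}{3} \times 5$
= 20 + 1.67
= 21.67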
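
Once the modal class has been identified, the interpolation step is a one-line calculation. A minimal sketch, with the function name `mode_continuous` chosen only for illustration:

```python
def mode_continuous(lower, width, f1, f0, f2):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i for the modal class."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Modal class 20-25: f1 = 17, preceding f0 = 16, succeeding f2 = 15, i = 5
z = mode_continuous(lower=20, width=5, f1=17, f0=16, f2=15)
print(round(z, 2))
```

The mode is pulled towards the neighbouring class with the larger frequency; here f0 exceeds f2, so Z lies nearer the lower boundary of the modal class.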

Geometric mean:

Definition: The geometric mean of N values is the Nth root of the product of the N values.

If $x_1, x_2, x_3, \ldots, x_N$ are the values, their geometric mean is $\sqrt[N]{x_1 \times x_2 \times x_3 \times \cdots \times x_N}$.

Formulae:

G.M = Antilog $\left[\frac{\sum \log X}{N}\right]$ for individual observations

G.M = Antilog $\left[\frac{\sum f \log X}{N}\right]$ for discrete series

G.M = Antilog $\left[\frac{\sum f \log m}{N}\right]$ for continuous series

Individual Series:

Problem 1: Find the geometric mean of 3 6 24 48.

Solution:

X       Log X
3       0.4771
6       0.7782
24      1.3802
48      1.6812
        ∑ log X = 4.3167

∴ G.M = Antilog $\left[\frac{\sum \log X}{N}\right]$ = Antilog $\left[\frac{4.3167}{4}\right]$ = Antilog [1.0792] = 12.00

Discrete series:

Problem: Calculate Geometric mean for the data given below.

X: 10 15 25 40 50
F: 4 6 10 7 3
Solution:
X F Log X F log X
10 4 1.0000 4.0000
15 6 1.1761 7.0566
25 10 1.3979 13.9790
40 7 1.6021 11.2147
50 3 1.6990 5.0970
N = 30                     ∑ f log X = 41.3473

G.M = Antilog $\left[\frac{\sum f \log X}{N}\right]$ = Antilog $\left[\frac{41.3473}{30}\right]$ = Antilog [1.3782]

G.M = 23.89

Continuous Series:

Problem 1: Compute the geometric mean of the following series.

Marks: 0-10 10-20 20-30 30-40 40-50


No. of. Students: 5 7 15 25 8
Solution:

Marks     No. of students (f)    Mid value (m)    Log m     f log m
0-10      5                      5                0.6990    3.4950
10-20     7                      15               1.1761    8.2327
20-30     15                     25               1.3979    20.9685
30-40     25                     35               1.5441    38.6025
40-50     8                      45               1.6532    13.2256
          N = 60                                            ∑ f log m = 84.5243

G.M = Antilog $\left[\frac{\sum f \log m}{N}\right]$ = Antilog $\left[\frac{84.5243}{60}\right]$ = Antilog [1.4087]

G.M = 25.63
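
The logarithmic method for all three types of series can be sketched with one function. Here `math.log10` plays the role of the log table and `10 ** x` the role of the antilog; the name `geometric_mean` and the optional `freqs` argument are illustrative choices. Results may differ slightly from the worked answers because the tables use four-figure logarithms.

```python
import math

def geometric_mean(values, freqs=None):
    """G.M = antilog(sum(f * log10(x)) / N); freqs default to 1 each."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)

gm_individual = geometric_mean([3, 6, 24, 48])
gm_discrete = geometric_mean([10, 15, 25, 40, 50], [4, 6, 10, 7, 3])
# Continuous series: mid values of 0-10, 10-20, ..., 40-50
gm_continuous = geometric_mean([5, 15, 25, 35, 45], [5, 7, 15, 25, 8])
print(round(gm_individual, 2), round(gm_discrete, 2), round(gm_continuous, 2))
```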

Harmonic mean

Definition: Harmonic mean is the reciprocal of the mean of the reciprocals of the values.

Formula:

H.M = $\frac{N}{\sum \left(\frac{1}{X}\right)}$ for individual observations

H.M = $\frac{N}{\sum \left(\frac{f}{X}\right)}$ for discrete series

H.M = $\frac{N}{\sum \left(\frac{f}{m}\right)}$ for continuous series

Individual series:

Problem 1: Find the Harmonic mean for the following Individual data.

6, 15, 35, 40, 900, 520, 300, 400, 1800, 2000.

Solution:

Value X     1/X
6           0.1667
15          0.0667
35          0.0286
40          0.0250
900         0.0011
520         0.0019
300         0.0033
400         0.0025
1800        0.0006
2000        0.0005
            ∑(1/X) = 0.2969

∴ H.M = $\frac{N}{\sum \left(\frac{1}{X}\right)} = \frac{10}{0.2969} = 33.68$

Discrete Series:

Problem 1: Calculate the Harmonic mean from the following data.

X: 10 12 14 16 18 20
F: 5 18 20 10 6 1
Solution:

X       F       F/X
10      5       0.5000
12      18      1.5000
14      20      1.4286
16      10      0.6250
18      6       0.3333
20      1       0.0500
        N = 60  ∑(F/X) = 4.4369

∴ H.M = $\frac{N}{\sum \left(\frac{f}{X}\right)} = \frac{60}{4.4369} = 13.52$

Continuous Series:

Problem 1: Calculate the Harmonic mean for the following data

Value: 0-10 10-20 20-30 30-40 40-50


Frequency: 8 12 20 6 4
Solution:

Value     Frequency (f)    Mid value (m)    f/m
0-10      8                5                1.6000
10-20     12               15               0.8000
20-30     20               25               0.8000
30-40     6                35               0.1714
40-50     4                45               0.0889
          N = 50                            ∑(f/m) = 3.4603

∴ H.M = $\frac{N}{\sum \left(\frac{f}{m}\right)} = \frac{50}{3.4603} = 14.45$
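
The three harmonic mean formulae differ only in whether frequencies are present and whether X or the mid value m is used. A sketch with an assumed function name `harmonic_mean` follows. Because the code keeps full precision for the reciprocals while the tables round them to four decimals, the last decimal place may differ slightly from the worked answers.

```python
def harmonic_mean(values, freqs=None):
    """H.M = N / sum(f / x); freqs default to 1 each."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

hm_individual = harmonic_mean([6, 15, 35, 40, 900, 520, 300, 400, 1800, 2000])
hm_discrete = harmonic_mean([10, 12, 14, 16, 18, 20], [5, 18, 20, 10, 6, 1])
# Continuous series: mid values of 0-10, 10-20, ..., 40-50
hm_continuous = harmonic_mean([5, 15, 25, 35, 45], [8, 12, 20, 6, 4])
print(hm_individual, hm_discrete, hm_continuous)
```

In the individual series the two smallest values (6 and 15) contribute most of the reciprocal total, which is why the harmonic mean is so much smaller than the larger observations.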
UNIT – II

Measures of Dispersion

Introduction:

In a series, not all the items are equal; there is difference or variation among the values. The
degree of variation is evaluated by various measures of dispersion.

Averages are central values. They enable comparison of two or more sets of data. They are not
sufficient to depict the true nature of the sets. For example, consider the following marks of two
students.

Student I Student II
68 85
75 90
65 80
67 25
70 65
Both have got a total of 345 and an average of 69 each. The fact is that the second student has
failed in one paper. When the averages alone are considered, the two students are equal.
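
The point of the example, equal averages but very different spread, can be seen numerically. The short sketch below uses the simplest measure of spread (largest value minus smallest value); the helper names are illustrative.

```python
student_1 = [68, 75, 65, 67, 70]
student_2 = [85, 90, 80, 25, 65]

def mean(values):
    return sum(values) / len(values)

def spread(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)

mean_1, mean_2 = mean(student_1), mean(student_2)
spread_1, spread_2 = spread(student_1), spread(student_2)
print(mean_1, mean_2, spread_1, spread_2)
```

Both means are equal, but the marks of the second student are spread over a much wider interval than those of the first.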

What is Dispersion?
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or
quantities of the items of a group or series. “Dispersion is the extent to which the magnitudes or
quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate
the spread of the data.
In all these definitions, the basic property of dispersion is that it is a value indicating the extent to
which all other values are dispersed about the central value in a particular distribution.

Properties of a good measure of Dispersion


There are certain pre-requisites for a good measure of dispersion:
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each individual item of the distribution.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be unduly affected by the extreme items.

Types of Dispersion
The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are
expressed in the same units in which the original data are expressed. For example, if the series is
expressed as marks of the students in a particular subject, the absolute dispersion will provide the value
in marks. The only difficulty is that if two or more series are expressed in different units, the series
cannot be compared on the basis of absolute dispersion.
A ‘relative’ measure or ‘coefficient’ of dispersion is the ratio or the percentage of a measure of absolute dispersion
to an appropriate average. The basic advantage of this measure is that two or more series can be
compared with each other despite the fact that they are expressed in different units. Theoretically, the ‘absolute
measure’ of dispersion is better. But from a practical point of view, the relative measure or coefficient of dispersion
is considered better, as it is used to make comparisons between series.

Methods of Dispersion
Methods of studying dispersion are divided into two types :
(i) Mathematical Methods: We can study the ‘degree’ and ‘extent’ of variation by these methods. In
this category, commonly used measures of dispersion are :
(a) Range
(b) Quartile Deviation
(c) Average Deviation
(d) Standard deviation and coefficient of variation.
(ii) Graphic Methods: Where we want to study only the extent of variation, whether it is higher or
lower, a Lorenz curve is used.

Mathematical Methods

Range

It is the simplest method of studying dispersion. Range is the difference between the smallest
value and the largest value of a series. While computing range, we do not take into account the frequencies
of different groups.

Formula: Absolute Range = L – S

Coefficient of Range = $\frac{L - S}{L + S}$

where L represents the largest value in a distribution and S represents the smallest value in a distribution. We can
understand the computation of range with the help of examples of different series.

(i) Raw Data: Marks out of 50 in a subject of 12 students in a class are given as follows:
12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12 and 35.
In this example, the highest mark obtained by a candidate is 35 and the lowest mark
obtained by a candidate is 12. Therefore, we can calculate the range:
L = 35 and S = 12
Absolute Range = L – S = 35 – 12 = 23 marks

Coefficient of Range = $\frac{L - S}{L + S} = \frac{35 - 12}{35 + 12} = \frac{23}{47} = 0.4894$

(ii) Discrete Series

Marks of the students in Statistics (out of 50) (X)    No. of students (f)
10 (Smallest)                                          4
12                                                     10
18                                                     16
20 (Largest)                                           15
                                                       Total = 45

Absolute Range = 20 – 10 = 10 marks

Coefficient of Range = $\frac{L - S}{L + S} = \frac{20 - 10}{20 + 10} = \frac{10}{30} = 0.3333$

(iii) Continuous Series

X           Frequencies
10 – 15     4
15 – 20     10
20 – 25     26
25 – 30     8

S = 10 (lower limit of the lowest class), L = 30 (upper limit of the highest class)

Absolute Range = L – S = 30 – 10 = 20 marks

Coefficient of Range = $\frac{L - S}{L + S} = \frac{30 - 10}{30 + 10} = \frac{20}{40} = 0.5$

Range is the simplest method of studying dispersion. It takes less time to compute the ‘absolute’ and
‘relative’ range. Range does not take into account all the values of a series, i.e. it considers only the
extreme items, and the middle items are not given any importance. Therefore, range cannot tell us anything
about the character of the distribution. Range cannot be computed in the case of an ‘open-end’ distribution,
i.e. a distribution where the lower limit of the first group or the upper limit of the highest group is not
given.
The concept of range is useful in the field of quality control and in studying the variations in the prices of
shares etc.
Merits:
1. It is simple to compute and understand
2. It gives a rough but quick answer.
Demerits:
1. It is not reliable because it is affected by the extreme items.
2. It cannot be applied to open-end cases.
3. It is not suitable for mathematical treatment.
Uses:
1. Range is used in industry for the statistical quality control (SQC) of manufactured products, in the
construction of control charts.
2. Range is useful in studying the variation in the prices of stocks, shares and other commodities
that are sensitive to price changes from one period to another.
3. The meteorological department uses the range for weather forecasts.

(b) Quartile Deviations (Q.D.)


The concept of ‘Quartile Deviation’ takes into account only the values of the ‘Upper quartile’ (Q3)
and the ‘Lower quartile’ (Q1). Quartile Deviation is also called the ‘semi-inter-quartile range’. It is a better
method when we are interested in knowing the range within which a certain proportion of the items fall.
The related measures are obtained as:
(i) Inter-quartile range = $Q_3 - Q_1$
(ii) Semi-quartile range (Quartile Deviation) = $\frac{Q_3 - Q_1}{2}$
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Example: Calculation of inter-quartile range, semi-quartile range and coefficient of quartile deviation in the case of raw data

Suppose the values of X are: 20, 12, 18, 25, 32, 10

Solution:
In the case of quartile deviation, it is necessary to calculate the values of Q1 and Q3 by arranging the
given data in ascending or descending order.
Therefore, the arranged data are (in ascending order):
X = 10, 12, 18, 20, 25, 32
No. of items = 6
Q1 = the value of the $\left(\frac{N+1}{4}\right)$th item = the value of the $\left(\frac{6+1}{4}\right)$th item = the value of the 1.75th item
= the value of 1st item + 0.75 (value of 2nd item – value of 1st item)
= 10 + 0.75 (12 – 10) = 10 + 0.75(2) = 10 + 1.50 = 11.50
Q3 = the value of the $3\left(\frac{N+1}{4}\right)$th item
= the value of the 3(7/4)th item = the value of the 5.25th item
= the value of 5th item + 0.25 (value of 6th item – value of 5th item)
= 25 + 0.25 (32 – 25) = 25 + 0.25 (7) = 26.75

Therefore,
(i) Inter-quartile range = Q3 – Q1 = 26.75 – 11.50 = 15.25
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15.25}{2} = 7.625$
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{15.25}{38.25} = 0.3987$
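
The positional rule used above (Q1 at the (N + 1)/4th item, Q3 at the 3(N + 1)/4th item, with linear interpolation between neighbouring items) can be sketched as below. The helper names are assumptions, and the sketch assumes at least three observations so that both positions fall inside the data.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    frac = position - whole
    if frac == 0:
        return ordered[whole - 1]
    # interpolate between the whole-th and (whole + 1)-th items
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])

def quartiles(values):
    """Return (Q1, Q3) using positions (N + 1)/4 and 3(N + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    return (value_at_position(ordered, (n + 1) / 4),
            value_at_position(ordered, 3 * (n + 1) / 4))

q1, q3 = quartiles([20, 12, 18, 25, 32, 10])
inter_quartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, inter_quartile_range, quartile_deviation, round(coefficient_qd, 4))
```

Other software may use different quartile conventions, so results from library routines need not match this textbook rule exactly.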

Example: Calculation of inter-quartile range, semi-quartile range and coefficient of quartile deviation in a discrete series

Suppose a series consists of the salaries (Rs.) and the number of workers in a factory:

Salaries (Rs.)     No. of workers
60                 4
100                20
120                21
140                16
160                9

Solution:

In this problem, we first compute the values of Q3 and Q1.

Salaries (Rs.) (x)    No. of workers (f)    Cumulative frequencies (c.f.)
60                    4                     4
100                   20                    24   (Q1 lies in this cumulative frequency)
120                   21                    45
140                   16                    61   (Q3 lies in this cumulative frequency)
160                   9                     70
                      N = ∑f = 70

Calculation of Q1:
Q1 = size of the $\left(\frac{N+1}{4}\right)$th item = size of the $\left(\frac{70+1}{4}\right)$th item = size of the 17.75th item
The 17.75th item lies in the cumulative frequency 24, which corresponds to the value Rs. 100.
Q1 = Rs. 100

Calculation of Q3:
Q3 = size of the $3\left(\frac{N+1}{4}\right)$th item = size of the $3\left(\frac{70+1}{4}\right)$th item = size of the 53.25th item
The 53.25th item lies in the cumulative frequency 61, which corresponds to Rs. 140.
Q3 = Rs. 140

(i) Inter-quartile range = Q3 – Q1 = Rs. 140 – Rs. 100 = Rs. 40

(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{40}{2}$ = Rs. 20

(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40}{240} = 0.1667$

Example: Calculation of inter-quartile range, semi-quartile range and coefficient of quartile deviation in the case of a continuous series

We are given the following data:

Salaries (Rs.)     No. of workers
10 – 20            4
20 – 30            6
30 – 40            10
40 – 50            5

Solution:
In this example, the values of Q3 and Q1 are obtained as follows:

Salaries (Rs.) (x)    No. of workers (f)    Cumulative frequencies (c.f.)
10 – 20               4                     4
20 – 30               6                     10
30 – 40               10                    20
40 – 50               5                     25
                      N = 25

Q1 item: $\frac{N}{4} = \frac{25}{4} = 6.25$. It lies in the cumulative frequency 10, which corresponds to the class 20 – 30.
Therefore, the Q1 group is 20 – 30,
where $l_1$ = 20, f = 6, i = 10 and $cf_0$ = 4.
$Q_1 = l_1 + \frac{\frac{N}{4} - cf_0}{f} \times i = 20 + \frac{6.25 - 4}{6} \times 10 = 20 + 3.75$ = Rs. 23.75

Q3 item: $\frac{3N}{4} = \frac{3 \times 25}{4} = 18.75$, which lies in the cumulative frequency 20, which corresponds to the
class 30 – 40. Therefore, the Q3 group is 30 – 40,
where L = 30, i = 10, cf = 10 and f = 10.
$Q_3 = L + \frac{\frac{3N}{4} - cf}{f} \times i = 30 + \frac{18.75 - 10}{10} \times 10 = 30 + 8.75$ = Rs. 38.75

Therefore:
(i) Inter-quartile range = Q3 – Q1 = Rs. 38.75 – Rs. 23.75 = Rs. 15.00
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15.00}{2}$ = Rs. 7.50
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{15.00}{62.50} = 0.24$
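
For a continuous series, the quartiles are found by the same interpolation formula as the median, with N/4 or 3N/4 in place of N/2. A sketch under the assumption of continuous classes of equal width follows; the function name `quartile_continuous` is illustrative.

```python
def quartile_continuous(lower_bounds, width, freqs, k):
    """k-th quartile (k = 1, 2, 3): L + ((k*N/4 - cf) / f) * i."""
    target = k * sum(freqs) / 4
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if cf + f >= target:
            return lower + (target - cf) / f * width
        cf += f

# Salaries (Rs.): 10-20, 20-30, 30-40, 40-50
lower_bounds = [10, 20, 30, 40]
workers = [4, 6, 10, 5]
q1 = quartile_continuous(lower_bounds, 10, workers, 1)
q3 = quartile_continuous(lower_bounds, 10, workers, 3)
semi_quartile_range = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, semi_quartile_range, coefficient)
```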

Merits:
1. It is simple to understand and easy to compute.
2. It is not influenced by the extreme items.
3. It can be found for open-end distributions.
Demerits:
1. It ignores the first 25% of the items and the last 25% of the items.
2. It is a positional measure; hence it is not amenable to further mathematical treatment.
3. Its value is affected by sampling fluctuations.
4. It gives only a rough measure.

(c) Average Deviation (or) Mean Deviation:


Average deviation is defined as the value obtained by taking the average of the deviations of the
various items from a measure of central tendency (mean, median or mode), ignoring negative signs.
Generally, the measure of central tendency from which the deviations are taken is specified in the
problem. If nothing is mentioned regarding the measure of central tendency, then the deviations are
taken from the median, because the sum of the deviations (after ignoring negative signs) is then a minimum.
Merits:
1. It is simple to understand and easy to compute.
2. M.D is a calculated value.
3. It is not much affected by the fluctuations of sampling.
4. It is based on all items of the series and gives weight according to their size.
5. It is less affected by the extreme items.
6. It is rigidly defined.
7. It is flexible.
8. It is a better measure for comparison.
Demerits:
1. It ignores the algebraic signs of the deviations, so it is not amenable to algebraic treatment.
2. It is not an accurate measure of dispersion.
3. It is not suitable for further mathematical calculation.
4. It is rarely used. It is not as popular as standard deviation.
Uses:
It helps in understanding the standard deviation. It is useful in marketing problems. It is useful
when using small samples. It is used in the statistical analysis of economic, business and social phenomena.
It is useful in calculating the distribution of wealth in a community or a nation. It is useful in forecasting
business cycles.

Standard Deviation
Merits:
1. It is rigidly defined
2. It is the most important and widely used measure of dispersion
3. It is possible for further algebraic treatment
4. The standard deviation provides the unit of measurement for the normal distribution
5. Standard deviation is used in finding the coefficient of variation.
Demerits:
1. It is not easy to understand and it is difficult to calculate.
2. It gives more weight to extreme values.
3. It is affected by the value of every item in the series.
4. It cannot be used for the purpose of comparison.
Uses:
Standard deviation is the best measure of dispersion. It is widely used in statistics because it
possesses most of the characteristics of an ideal measure of dispersion. It is widely used in sampling
theory and by biologists. It is used in the coefficient of correlation and in the study of symmetrical
frequency distributions.
Skewness

Definition: Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.

Consider the following three continuous series with common mid values.

Coefficients of skewness are calculated later, but the values of the averages and quartiles are
presented in tabular form now, together with the nature of each frequency curve.
The table answers a few questions: how a symmetric curve looks, what the relation between the averages is in
such a case, and how the quartiles are then related. The same aspects of
skewed curves are also shown.

Mid value               Series A (Frequency)      Series B (Frequency)       Series C (Frequency)
20                      1                         1                          1
30                      12                        12                         27
40                      55                        91                         40
50                      91                        55                         55
60                      55                        40                         91
70                      12                        27                         12
80                      1                         1                          1

Nature of skewness      Symmetry                  Asymmetry                  Asymmetry
                        (No skewness)             (Positive skewness)        (Negative skewness)

Averages                X̄ = M = Z                 X̄ > M > Z                  X̄ < M < Z
                        50 = 50 = 50              49.07 > 46.73 > 41.87      50.93 < 53.27 < 58.13

Quartiles               Q3 - M = M - Q1           Q3 - M > M - Q1            Q3 - M < M - Q1
                        Q1 = 42.95                Q1 = 39.81                 Q1 = 42.19
                        M = 50.00                 M = 46.73                  M = 53.27
                        Q3 = 57.05                Q3 = 57.81                 Q3 = 60.19

Nature of the curve     Bell shaped               Longer tail on the right   Longer tail on the left
                                                  side (skewed to the        side (skewed to the
                                                  right)                     left)

Frequency curve         Symmetric curve           Positively skewed          Negatively skewed

Absolute Measures: The following are the two absolute measures of skewness. They are of no
practical use. They indicate whether there is skewness or not and, when there is skewness, whether it is
positive or negative. They cannot be used for comparison.

1. Mean – Mode
2. (Q3 – M) – (M – Q1)

Even Mean – Median and Median – Mode are suggested as measures of skewness.

Relative Measures: The following five are the relative measures. According to G. Simpson and
F. Kafka, the same amount of (absolute) skewness has different meanings in distributions
with small variation and in distributions with large
variation. Absolute measures are therefore divided by certain measures of dispersion to eliminate the
influence of variation. Relative measures are called coefficients. They are used to compare two or
more series.

1. Karl Pearson (1857 – 1936) was a great British biometrician and statistician. He introduced the
formula given below.

Karl Pearson’s coefficient of skewness, $SK_P = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$

Theoretically, no limit can be found for this measure, but it is found mostly to vary between -1
and +1. Based on the interrelation between mean, median and mode in a moderately skewed distribution,
his second formula is:

$SK_P = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$

It can be used when the mode is ill defined. Theoretically, this measure lies between -3 and +3, but it
rarely lies outside -1 and +1.

2. The following is due to Prof. Bowley.

Bowley’s coefficient of skewness, $SK_B = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

This is a quartile measure of skewness, and its value lies between -1 and +1. This method
is useful where there is an open-end class interval or where extreme values are present.

3. Kelly’s coefficient of skewness, $SK_K = \frac{P_{90} + P_{10} - 2M}{P_{90} - P_{10}} = \frac{D_9 + D_1 - 2M}{D_9 - D_1}$

This method is also useful where there is an open-end class interval or where extreme values are
present. This formula is better than Bowley’s. Bowley’s formula ignores the lowest 25% and the
highest 25% of the items, whereas this formula ignores only the lowest 10% and the highest 10%. However, Kelly’s
coefficient is very rarely used.

4. Moment coefficient of skewness,

$\beta_1$ (read, beta one) $= \frac{\mu_3^2}{\mu_2^3}$, where $\mu_2$, the second central moment, and
$\mu_3$, the third central moment, are considered later under moments in this chapter.

5. Moment coefficient of skewness,

$\gamma_1$ (read, gamma one) $= \frac{\mu_3}{\mu_2^{3/2}}$

$\beta_1$ and $\gamma_1$ are related: $\gamma_1 = \pm\sqrt{\beta_1}$, taking the sign of $\mu_3$.

Moment measures are calculated for distributions such as the Binomial, Poisson, Normal, Chi-
square, Student’s t and F. Karl Pearson’s coefficient is widely used with numerical data.

It is based on the best measure of central tendency, the mean, and the best measure of dispersion, the
standard deviation.
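
As an illustration of the relative measures, the sketch below applies Bowley’s and Karl Pearson’s coefficients to the three series tabulated earlier in this section. The quartiles, mean and mode are the values quoted in that table. The standard deviation of Series B is computed here from its frequencies with the usual formula (root of the mean squared deviation, dividing by N), which is an assumption since the text gives no standard deviation for these series. All function names are illustrative.

```python
import math

def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2M) / (Q3 - Q1)"""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def pearson_skewness(mean, mode, std_dev):
    """(Mean - Mode) / Standard Deviation"""
    return (mean - mode) / std_dev

def grouped_std_dev(mid_values, freqs):
    """Standard deviation of a frequency distribution (divisor N)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mid_values, freqs)) / n
    variance = sum(f * (m - mean) ** 2 for m, f in zip(mid_values, freqs)) / n
    return math.sqrt(variance)

bowley_a = bowley_skewness(42.95, 50.00, 57.05)  # Series A (symmetric)
bowley_b = bowley_skewness(39.81, 46.73, 57.81)  # Series B
bowley_c = bowley_skewness(42.19, 53.27, 60.19)  # Series C

mid_values = [20, 30, 40, 50, 60, 70, 80]
series_b = [1, 12, 91, 55, 40, 27, 1]
sd_b = grouped_std_dev(mid_values, series_b)
pearson_b = pearson_skewness(49.07, 41.87, sd_b)
print(bowley_a, bowley_b, bowley_c, sd_b, pearson_b)
```

The signs agree with the table: the coefficient is zero (up to rounding) for Series A, positive for Series B and negative for Series C.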

Relationship between Absolute and Relative Measures

Absolute measures                                   Relative measures
1. Range                                            1. Coefficient of Range
2. Quartile Deviation (Q.D) or                      2. Coefficient of Quartile Deviation
   Semi Inter Quartile Range
3. (i) Mean Deviation (M.D) (about mean)            3. (i) Coefficient of Mean Deviation (about mean)
   (ii) Mean Deviation (M.D) (about median)            (ii) Coefficient of Mean Deviation (about median)
   (iii) Mean Deviation (M.D) (about mode)             (iii) Coefficient of Mean Deviation (about mode)
4. Standard Deviation (S.D)                         4. Coefficient of Variation
5. Variance
Measures of Dispersion

Range:

Definition: Range is the difference between the greatest (largest) and the smallest of the values.

In symbols, Range = L – S

L = Largest value, S = Smallest value

In individual observations and discrete series, L and S are easily identified. In continuous series,
the following two methods are followed.

Method 1: L = Upper boundary of the highest class

S = Lower boundary of the lowest class

Method 2: L = Mid value of the highest class

S = Mid value of the lowest class

Coefficient of Range = $\frac{L - S}{L + S}$

Problem 1: Find the value of range and its coefficient for the following data.

8 10 5 9 12 11

Solution: L = 12, S = 5

Range = L - S = 12 - 5 = 7

Coefficient of Range = $\frac{L - S}{L + S} = \frac{12 - 5}{12 + 5} = \frac{7}{17} = 0.4118$

Problem 2: Calculate range and its Coefficient from the following distribution:

Size: 60-62 63-65 66-68 69-71 72-74


Number: 5 18 42 27 8
Solution:

Method 1: After rewriting the class intervals continuously,

the lower boundary of the lowest class, S = 59.5 and the upper boundary of the highest class, L = 74.5

Range = L – S = 74.5 - 59.5 = 15

Coefficient of Range = $\frac{L - S}{L + S} = \frac{74.5 - 59.5}{74.5 + 59.5} = \frac{15}{134} = 0.1119$

Method 2: Mid value of the lowest class, S = 61

Mid value of the highest class, L = 73

Range = L - S = 73 - 61 = 12

Coefficient of Range = $\frac{L - S}{L + S} = \frac{73 - 61}{73 + 61} = \frac{12}{134} = 0.0896$
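
The range and its coefficient need only the two extreme values, however L and S are chosen. The following sketch reproduces Problem 1 and both methods of Problem 2; the function name is an assumption.

```python
def range_and_coefficient(largest, smallest):
    """Return (L - S, (L - S) / (L + S))."""
    return largest - smallest, (largest - smallest) / (largest + smallest)

# Problem 1: raw data
data = [8, 10, 5, 9, 12, 11]
range_1, coeff_1 = range_and_coefficient(max(data), min(data))

# Problem 2, Method 1: class boundaries 59.5 and 74.5
range_m1, coeff_m1 = range_and_coefficient(74.5, 59.5)

# Problem 2, Method 2: mid values 61 and 73 of the extreme classes
range_m2, coeff_m2 = range_and_coefficient(73, 61)
print(range_1, round(coeff_1, 4), range_m1, round(coeff_m1, 4),
      range_m2, round(coeff_m2, 4))
```

The two methods give different answers for the same continuous series, so the method used should be stated along with the result.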

Quartile Deviation (Q.D)

Definition: Quartile Deviation is half of the difference between the first and the third quartiles. Hence it
is called the Semi Inter Quartile Range.

In symbols, Q.D = $\frac{Q_3 - Q_1}{2}$, where Q.D is the abbreviation. Among the quartiles $Q_1$, $Q_2$ and $Q_3$, the range is $Q_3 - Q_1$.

Hence, $Q_3 - Q_1$ is called the inter quartile range and $\frac{Q_3 - Q_1}{2}$ the semi inter quartile range.

Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

As mentioned in the previous chapter, 25% of the items are less than or equal to $Q_1$ and 25% are above or equal to $Q_3$. $Q_3 - Q_1$ is the distance between
$Q_1$ and $Q_3$. The central 50% of the items lie between $Q_1$ and $Q_3$. It is customary to consider $\frac{Q_3 - Q_1}{2}$ as an
absolute measure of dispersion. The definition and calculation of $Q_1$ and $Q_3$ for all types of data were
considered in the previous chapter.

Individual Series:

Problem 1: What do you mean by Quartile Deviation? Find the Quartile Deviation for the following.

391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488.

Solution: The given values in ascending order: 384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488.

Position of Q1 is (N + 1) / 4 = (10 + 1) / 4 = 2.75

∴ Q1 = 2nd value + 0.75 (3rd value – 2nd value)

= 391 + 0.75 (407 – 391)

= 391 + 0.75 x 16

= 391 + 12

∴ Q1 = 403

Position of Q3 is 3 (N + 1) / 4 = 3 x 2.75 = 8.25

∴ Q3 = 8th value + 0.25 (9th value – 8th value)

= 777 + 0.25 (1490 – 777)

= 777 + 0.25 x 713

= 777 + 178.25 = 955.25

∴ Q.D = (Q3 − Q1) / 2 = (955.25 − 403.00) / 2 = 552.25 / 2 = 276.13
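The positional rule used above (locate the (N + 1)/4-th and 3(N + 1)/4-th items and interpolate between neighbours) can be sketched in Python as follows. The helper names are illustrative, and the sketch assumes at least three observations so that both positions fall inside the data.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position by linear interpolation."""
    whole = int(position)
    fraction = position - whole
    lower = sorted_values[whole - 1]
    if fraction == 0 or whole >= len(sorted_values):
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


def quartile_deviation(values):
    """Return (Q1, Q3, Q.D) for individual observations."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2


data = [391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488]
q1, q3, qd = quartile_deviation(data)
print(q1, q3, qd)
```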

Discrete Series:

Problem: Weekly wages of a labourer are given below. Calculate Q.D and Coefficient of Q.D.

Weekly wages (Rs.):   100   200   400   500   600   Total
No. of weeks:           5     8    21    12     6      52

Solution:

Weekly wage (Rs.)   No. of weeks   Cumulative frequency
100                   5              5
200                   8             13
400                  21             34
500                  12             46
600                   6             52
Total               N = 52

Position of Q1 is (N + 1) / 4 = (52 + 1) / 4 = 13.25

∴ Q1 = 13th value + 0.25 (14th value − 13th value)

= 200 + 0.25 (400 − 200) = 200 + 0.25 x 200 = 250

Position of Q3 is 3 (N + 1) / 4 = 3 x 13.25 = 39.75

∴ Q3 = 39th value + 0.75 (40th value − 39th value)

= 500 + 0.75 (500 – 500)

= 500 + 0.75 x 0 = 500 + 0 = 500

∴ Q.D = (Q3 − Q1) / 2 = (500 − 250) / 2 = 250 / 2 = 125

∴ Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1) = (500 − 250) / (500 + 250) = 250 / 750 = 0.3333

Continuous Series:

Problem 1: For the data given here, give the quartile deviation.

X:   351-500   501-650   651-800   801-950   951-1100
F:      48       189        88        47        28

Solution:

X           F     True class intervals   cf
351-500      48   350.5 - 500.5           48
501-650     189   500.5 - 650.5          237  ← (Q1 class)
651-800      88   650.5 - 800.5          325  ← (Q3 class)
801-950      47   800.5 - 950.5          372
951-1100     28   950.5 - 1100.5         400
          N = 400

N/4 = 400/4 = 100; Q1 class is 500.5 - 650.5 ∴ L = 500.5; f = 189;

i = 650.5 − 500.5 = 150; cf = 48.

∴ Q1 = L + [ i (N/4 − cf) / f ]

= 500.5 + [ 150 (100 − 48) / 189 ]

= 500.5 + [ 150 x 52 / 189 ]

= 500.5 + 41.27 = 541.77

3N/4 = 3 x 100 = 300; Q3 class is 650.5 - 800.5

∴ L = 650.5; f = 88; i = 800.5 − 650.5 = 150; cf = 237

∴ Q3 = L + [ i (3N/4 − cf) / f ]

= 650.5 + [ 150 (300 − 237) / 88 ]

= 650.5 + [ 150 x 63 / 88 ]

= 650.5 + 107.39 = 757.89

∴ Q.D = (Q3 − Q1) / 2 = (757.89 − 541.77) / 2 = 216.12 / 2 = 108.06
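For a continuous series, the interpolation formula L + i (pN − cf) / f can be sketched as one Python function that serves for Q1 (p = 1/4), the median (p = 1/2) and Q3 (p = 3/4). The function name and argument layout are assumptions made for illustration only.

```python
def grouped_quantile(boundaries, frequencies, fraction):
    """Quantile of a continuous series.

    boundaries  : list of (lower, upper) true class boundaries, in order
    frequencies : class frequencies in the same order
    fraction    : 0.25 for Q1, 0.5 for the median, 0.75 for Q3
    """
    n = sum(frequencies)
    target = fraction * n
    cumulative = 0  # cf of the classes before the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= target:
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


boundaries = [(350.5, 500.5), (500.5, 650.5), (650.5, 800.5),
              (800.5, 950.5), (950.5, 1100.5)]
frequencies = [48, 189, 88, 47, 28]
q1 = grouped_quantile(boundaries, frequencies, 0.25)
q3 = grouped_quantile(boundaries, frequencies, 0.75)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```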

Standard Deviation

Definition: Standard Deviation is the root mean square deviation of the values from their arithmetic
mean.

S.D is the abbreviation and σ (read, sigma) is the symbol. The mean square deviation of the values
from their A.M is the Variance and is denoted by σ². S.D is the positive square root of variance. Karl
Pearson introduced the concept of standard deviation in 1893. S.D is also called root mean square
deviation. Mean deviation has the mathematical deficiency of ignoring negative signs; standard deviation
avoids this by squaring the deviations. Standard deviation possesses most of the desirable properties of
a good measure of dispersion. It is the most widely used absolute measure of dispersion. The corresponding
relative measure is the Coefficient of Variation. It is so popular and so extensively used that one may
doubt whether there is any other relative measure of dispersion.

Coefficient of Variation = (Standard Deviation / Arithmetic Mean) x 100

Individual Observation:

Problem: 1 Ten students of a B.Com class of a college have obtained the following marks in Statistics
out of 100 marks. Calculate the Standard Deviation.

S. No:   1   2   3   4   5   6   7   8   9   10
Marks:   5  10  20  25  40  42  45  48  70   80

Solution:

S. No   Marks (X)   X²
1         5           25
2        10          100
3        20          400
4        25          625
5        40         1600
6        42         1764
7        45         2025
8        48         2304
9        70         4900
10       80         6400
Total   ΣX = 385   ΣX² = 20143

Standard Deviation: Formula, σ = √[ ΣX²/N − (ΣX/N)² ]

= √[ 20143/10 − (385/10)² ]

= √[ 2014.3 − (38.5)² ]

= √(2014.30 − 1482.25)

= √532.05

= 23.07
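The same shortcut formula, σ = √[ΣX²/N − (ΣX/N)²], can be written as a small Python function. This is a minimal sketch; the function name is illustrative.

```python
import math


def standard_deviation(values):
    """Population S.D. by the shortcut: sqrt(mean of squares - square of mean)."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)


marks = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
sigma = standard_deviation(marks)
print(round(sigma, 2))
```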

Discrete Series:

Problem: 2 Calculate the Standard Deviation.

No. of goals scored in a match (X):   0   1   2   3   4   5
No. of matches (f):                   1   2   4   3   0   2

Solution:

X       f        fX        fX²
0       1         0          0
1       2         2          2
2       4         8         16
3       3         9         27
4       0         0          0
5       2        10         50
Total   N = 12   ΣfX = 29   ΣfX² = 95

Standard Deviation, σ = √[ ΣfX²/N − (ΣfX/N)² ]

= √[ 95/12 − (29/12)² ]

= √[ 7.9167 − (2.4167)² ]

= √(7.9167 − 5.8404)

= √2.0763 = 1.44
Continuous Series:

Problem 3: The following data were obtained while observing the life span of a few neon lights of a
company. Calculate S.D.

Life Span (Years):     4-6   6-8   8-10   10-12   12-14   Total
No. of Neon Lights:     10    17     32      21      20     100

Solution:

Life Span (Years)   No. of Neon Lights (f)   Mid Value (m)   fm          fm²
4-6                  10                        5                50         250
6-8                  17                        7               119         833
8-10                 32                        9               288        2592
10-12                21                       11               231        2541
12-14                20                       13               260        3380
Total               N = 100                    -              Σfm = 948   Σfm² = 9596

Standard Deviation, σ = √[ Σfm²/N − (Σfm/N)² ]

= √[ 9596/100 − (948/100)² ]

= √[ 95.96 − (9.48)² ]

= √(95.9600 − 89.8704) = √6.0896 = 2.47
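For discrete and continuous series the only change is that each value (or mid value) is weighted by its frequency. A minimal Python sketch, with illustrative names, is given below; it is checked against the goals data and the neon light data worked above.

```python
import math


def frequency_sd(values, frequencies):
    """S.D. of a frequency distribution: sqrt(sum(f x^2)/N - (sum(f x)/N)^2)."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, frequencies)) / n
    return math.sqrt(mean_of_squares - mean ** 2)


# Discrete series: goals scored per match
sd_goals = frequency_sd([0, 1, 2, 3, 4, 5], [1, 2, 4, 3, 0, 2])

# Continuous series: use the mid value of each class
classes = [(4, 6), (6, 8), (8, 10), (10, 12), (12, 14)]
mid_values = [(low + high) / 2 for low, high in classes]
sd_lights = frequency_sd(mid_values, [10, 17, 32, 21, 20])

print(round(sd_goals, 2), round(sd_lights, 2))
```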


Variance

Definition: Variance is the mean square deviation of the values from their arithmetic mean.

Individual series:

Problem: 1 Number of goals scored by a team in different matches.

2 0 1 3 0 4 3 1 1 2

Calculate variance.

Solution:

X        X²
2         4
0         0
1         1
3         9
0         0
4        16
3         9
1         1
1         1
2         4
ΣX = 17   ΣX² = 45

Mean, X̄ = ΣX / N = 17 / 10 = 1.7

Variance, σ² = ΣX²/N − (ΣX/N)² = 45/10 − (1.7)² = 4.50 − 2.89 = 1.61

Discrete Series:

Problem: 2 From the following data on daily sales of TV sets, calculate variance.

No. of TV sets:   5   7   10   11   15   25   30
No. of Days:      1   3    7    6    5    2    1

Solution:

No. of TV sets (X)   No. of Days (f)   fX          fX²
5                     1                  5            25
7                     3                 21           147
10                    7                 70           700
11                    6                 66           726
15                    5                 75          1125
25                    2                 50          1250
30                    1                 30           900
Total                N = 25            ΣfX = 317   ΣfX² = 4873

Variance, σ² = ΣfX²/N − (ΣfX/N)² = 4873/25 − (317/25)²

= 194.9200 – 160.7824

= 34.14
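The shortcut used above and the definition (mean square deviation from the arithmetic mean) give the same variance. The Python sketch below computes both for the TV sales data; the function names are illustrative.

```python
def variance_shortcut(values, frequencies):
    """sum(f X^2)/N - (sum(f X)/N)^2"""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return sum(f * x * x for x, f in zip(values, frequencies)) / n - mean ** 2


def variance_by_definition(values, frequencies):
    """Mean of the squared deviations from the arithmetic mean."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n


tv_sets = [5, 7, 10, 11, 15, 25, 30]
days = [1, 3, 7, 6, 5, 2, 1]
v_short = variance_shortcut(tv_sets, days)
v_defn = variance_by_definition(tv_sets, days)
print(round(v_short, 2), round(v_defn, 2))
```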
Continuous Series:

Problem 3: The heights of the recruits are noted as follows. Calculate the variance.

Height (cms):       150-155   155-160   160-165   165-170   170-175
No. of Recruits:      15        18        27        24        16

Solution:

Heights (cms)   No. of Recruits (f)   Mid value (m)   fm              fm²
150-155          15                    152.5            2287.5         348843.75
155-160          18                    157.5            2835.0         446512.50
160-165          27                    162.5            4387.5         712968.75
165-170          24                    167.5            4020.0         673350.00
170-175          16                    172.5            2760.0         476100.00
Total           N = 100                                Σfm = 16290.0   Σfm² = 2657775.00

Variance, σ² = Σfm²/N − (Σfm/N)²

= 2657775.00/100 − (16290.0/100)²

= 26577.75 – (162.90)²

= 26577.75 – 26536.41

= 41.34

Coefficient of Variation

Formula: Coefficient of Variation = (Standard Deviation / Arithmetic Mean) x 100

C.V is the abbreviation. ∴ C.V. = (S.D / A.M) x 100 = (σ / X̄) x 100

Individual Series:

Problem: 1 The means and standard deviations of the number of runs of two players A and B are
55; 65 and 4.2; 7.8 respectively. Who is the more consistent player?

Solution: Given:

Player A: Mean = 55; Standard Deviation = 4.2

Player B: Mean = 65; Standard Deviation = 7.8

∴ Coefficient of Variation of player A = (S.D / A.M) x 100 = (4.2 / 55) x 100 = 7.64

∴ Coefficient of Variation of player B = (S.D / A.M) x 100 = (7.8 / 65) x 100 = 12.00

Coefficient of Variation of player A is less. Therefore, Player A is the more consistent player.

Problem: 3 Calculate the coefficient of variation of the following:

40 41 45 49 50 51 55 59 60 60

Solution: Mean, X̄ = ΣX / N = 510 / 10 = 51.00

X          X − X̄ (X̄ = 51)   (X − X̄)²
40         -11                121
41         -10                100
45          -6                 36
49          -2                  4
50          -1                  1
51           0                  0
55           4                 16
59           8                 64
60           9                 81
60           9                 81
ΣX = 510                      Σ(X − X̄)² = 504

S.D, σ = √[ Σ(X − X̄)² / N ] = √(504 / 10) = √50.4 = 7.10

C.V. = (σ / X̄) x 100 = (7.10 / 51.00) x 100 = 13.92

Discrete Series:
Problem: 4 From the following prices of gold in a week, find the city in which the price was more
stable.

Day      Mon   Tues   Wed   Thurs   Fri   Sat
City A   498   500    505   504     502   509
City B   500   505    502   498     496   505

Solution:

City A (X̄1 = 503)                     City B (X̄2 = 501)
X1     X1 − X̄1   (X1 − X̄1)²          X2     X2 − X̄2   (X2 − X̄2)²
498     -5         25                  500     -1          1
500     -3          9                  505      4         16
505      2          4                  502      1          1
504      1          1                  498     -3          9
502     -1          1                  496     -5         25
509      6         36                  505      4         16
ΣX1 = 3018        Σ(X1 − X̄1)² = 76    ΣX2 = 3006        Σ(X2 − X̄2)² = 68

City A:
X̄1 = ΣX1 / N = 3018 / 6 = Rs. 503
S.D, σ1 = √[ Σ(X1 − X̄1)² / N ] = √(76 / 6) = √12.6667 = Rs. 3.56
C.V. = (σ1 / X̄1) x 100 = (3.56 / 503) x 100 = 0.71

City B:
X̄2 = ΣX2 / N = 3006 / 6 = Rs. 501
S.D, σ2 = √[ Σ(X2 − X̄2)² / N ] = √(68 / 6) = √11.3333 = Rs. 3.37
C.V. = (σ2 / X̄2) x 100 = (3.37 / 501) x 100 = 0.67

Coefficient of Variation of price in City B is less. Hence, the price was more stable in City B.
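A comparison of this kind can be sketched in Python with the standard library. `statistics.pstdev` divides by N, which matches the S.D. formula used in these notes; the function name `cv_from_data` is illustrative.

```python
from statistics import mean, pstdev


def cv_from_data(values):
    """Coefficient of variation = (S.D. / A.M.) x 100, with S.D. computed over N."""
    return pstdev(values) / mean(values) * 100


city_a = [498, 500, 505, 504, 502, 509]
city_b = [500, 505, 502, 498, 496, 505]
cv_a = cv_from_data(city_a)
cv_b = cv_from_data(city_b)
more_stable = "City A" if cv_a < cv_b else "City B"
print(round(cv_a, 2), round(cv_b, 2), more_stable)
```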

Discrete Series:

Problem 5: Goals scored by two teams A and B in a series of football matches were observed as
follows.

No. of goals scored in a match   No. of matches
                                 Team A   Team B
0                                5        4
1                                7        5
2                                5        5
3                                3        4
4                                2        3
5                                3        3

Which team, A or B, may be considered the more consistent team?

Solution: Goals (X) are common. No. of matches (f) differ between the teams.

Goals (X)   Team A (f1)   Team B (f2)   f1X          f1X²           f2X          f2X²
0            5             4             0             0             0             0
1            7             5             7             7             5             5
2            5             5            10            20            10            20
3            3             4             9            27            12            36
4            2             3             8            32            12            48
5            3             3            15            75            15            75
Total       N1 = 25       N2 = 24       Σf1X = 49    Σf1X² = 161    Σf2X = 54    Σf2X² = 184

Team A:
Mean, X̄1 = Σf1X / N1 = 49 / 25 = 1.96
S.D., σ1 = √[ Σf1X²/N1 − (Σf1X/N1)² ] = √[ 161/25 − (1.96)² ] = √(6.44 − 3.8416) = √2.5984 = 1.61
C.V. = (σ1 / X̄1) x 100 = (1.61 / 1.96) x 100 = 82.14

Team B:
Mean, X̄2 = Σf2X / N2 = 54 / 24 = 2.25
S.D., σ2 = √[ Σf2X²/N2 − (Σf2X/N2)² ] = √[ 184/24 − (2.25)² ] = √(7.6667 − 5.0625) = √2.6042 = 1.61
C.V. = (σ2 / X̄2) x 100 = (1.61 / 2.25) x 100 = 71.56

Coefficient of variation of Team B is less. Hence, Team B is the more consistent team.

Mean deviation

Computation of mean deviation - Individual observation:

Mean Deviation (About Mean) = Σ|X − X̄| / N

The mean, X̄ = ΣX / N, is calculated first. From each X, the mean is subtracted and the absolute
deviation |X − X̄| is found.

Mean Deviation about Median = Σ|X − M| / N and

Mean Deviation about Mode = Σ|X − Z| / N

Median or mode, whichever is required, is calculated first. Then, as in M.D. about mean, the other
calculations follow.

Example: 1

Daily earnings (in Rs. X) of 10 coolies are given. Calculate all the three mean deviations and the
corresponding relative measures.

X: 32 51 23 46 20 78 57 56 57 30

Solution:

X (Rs.)    |X − X̄| (X̄ = 45)   |X − M| (M = 48.5)   |X − Z| (Z = 57)
20          25                   28.5                 37
23          22                   25.5                 34
30          15                   18.5                 27
32          13                   16.5                 25
46           1                    2.5                 11
51           6                    2.5                  6
56          11                    7.5                  1
57          12                    8.5                  0
57          12                    8.5                  0
78          33                   29.5                 21
ΣX = 450   Σ|X − X̄| = 150      Σ|X − M| = 148.0     Σ|X − Z| = 162

X̄ = ΣX / N = 450 / 10 = Rs. 45

Mean Deviation (About Mean) = Σ|X − X̄| / N = 150 / 10 = Rs. 15

Coefficient of M.D about mean = M.D about mean / Mean = 15 / 45 = 0.3333

Position of Median, M is (N + 1) / 2 = (10 + 1) / 2 = 5.5

∴ M = (5th value + 6th value) / 2 = (46 + 51) / 2 = Rs. 48.5

Mean Deviation about Median = Σ|X − M| / N = 148 / 10 = Rs. 14.80

Coefficient of M.D about median = M.D about median / Median = 14.80 / 48.50 = 0.3052

Mode, Z = Rs. 57

Mean Deviation about Mode = Σ|X − Z| / N = 162 / 10 = Rs. 16.20

Coefficient of M.D about mode = M.D about mode / Mode = 16.20 / 57 = 0.2842
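The three mean deviations differ only in the central value from which the deviations are taken, so one Python function can serve all three. The sketch below uses the standard `statistics` module; the function name `mean_deviation` is illustrative.

```python
from statistics import mean, median, mode


def mean_deviation(values, centre):
    """Mean of the absolute deviations of the values from the given centre."""
    return sum(abs(x - centre) for x in values) / len(values)


earnings = [32, 51, 23, 46, 20, 78, 57, 56, 57, 30]
centres = {"mean": mean(earnings), "median": median(earnings), "mode": mode(earnings)}

results = {}
for name, centre in centres.items():
    md = mean_deviation(earnings, centre)
    results[name] = (md, md / centre)  # (mean deviation, its coefficient)
    print(name, centre, round(md, 2), round(md / centre, 4))
```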

Discrete series: The measure of central tendency (Mean or Median or Mode) is calculated first. The
following formulae are used later.

Mean Deviation (About Mean) = Σf|X − X̄| / N

The mean, X̄ = ΣfX / N, is calculated first.

Mean Deviation about Median = Σf|X − M| / N and

Mean Deviation about Mode = Σf|X − Z| / N
SKEWNESS

Karl- Pearson’s coefficients

Problem: 1 From the marks secured by 120 students in Section A and 120 students in Section B of a class, the
following measures are obtained:

Section A: X̄ = 46.83; S.D = 14.8; Mode = 51.67.

Section B: X̄ = 47.83; S.D = 14.8; Mode = 47.07.

Determine which distribution of marks is more skewed.

Solution: Karl Pearson's coefficient of skewness for Section A:

SKP = (X̄ − Z) / σ = (46.83 − 51.67) / 14.8 = −4.84 / 14.80 = −0.3270

For Section B:

SKP = (X̄ − Z) / σ = (47.83 − 47.07) / 14.8 = 0.76 / 14.80 = 0.0514

Marks of Section A are more skewed. Further, marks of Section A are negatively skewed and marks of
Section B are positively skewed.

Problem: 2 From a moderately skewed distribution of retail prices for men's shoes, it is found that the mean price
is Rs. 20 and the median price is Rs. 17. If the coefficient of variation is 20%, find the Pearsonian coefficient of
skewness of the distribution.

Solution: Consider C.V = (σ / X̄) x 100

By substituting the given values, 20 = (σ / 20) x 100

∴ σ = (20 x 20) / 100 = 4

Since the median is given, the Pearsonian coefficient of skewness is

SKP = 3 (X̄ − M) / σ = 3 (20 − 17) / 4 = 9 / 4 = 2.25

Problem: 3 The sum and the sum of the squares of 60 items are 1860 and 67100 respectively. Mode is 28.49. Find
Pearson's coefficient of skewness.

Solution: Given N = 60; ΣX = 1860; ΣX² = 67100; Z = 28.49

∴ Mean, X̄ = ΣX / N = 1860 / 60 = 31

S.D., σ = √[ ΣX²/N − X̄² ]

= √[ 67100/60 − (31)² ]

= √(1118.3333 − 961)

= √157.3333

= 12.54

Pearson's coefficient of skewness, SKP = (X̄ − Z) / σ = (31.00 − 28.49) / 12.54 = 2.51 / 12.54 = 0.2002
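When only N, ΣX, ΣX² and the mode are known, the whole calculation reduces to a few lines. The Python sketch below uses an illustrative function name; because it does not round σ to 12.54 before dividing, it gives about 0.2001 instead of 0.2002.

```python
import math


def pearson_skewness_from_sums(n, sum_x, sum_x2, mode):
    """Karl Pearson's coefficient (mean - mode) / S.D. from summary totals."""
    mean = sum_x / n
    sd = math.sqrt(sum_x2 / n - mean ** 2)
    return (mean - mode) / sd


sk = pearson_skewness_from_sums(60, 1860, 67100, 28.49)
print(round(sk, 4))
```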

Individual Series:

Problem: 1 Calculate Karl Pearson's coefficient of skewness for the following data:

25 15 23 40 27 25 23 25 20

Solution:

X          X²
25          625
15          225
23          529
40         1600
27          729
25          625
23          529
25          625
20          400
ΣX = 223   ΣX² = 5887

Mean, X̄ = ΣX / N = 223 / 9 = 24.78

S.D., σ = √[ ΣX²/N − (ΣX/N)² ] = √[ 5887/9 − (24.78)² ] = √(654.1111 − 614.0484)

= √40.0627 = 6.33

Mode, Z = 25.00

Karl Pearson's coefficient of skewness, SKP = (X̄ − Z) / σ = (24.78 − 25.00) / 6.33 = −0.22 / 6.33 = −0.0348

Discrete series:

Problem: 2 Calculate Karl Pearson's coefficient of skewness for the following data:

Wage per Item (Rs.):   12   15   20   25   30   40   50
Number of Items:       10   25   40   70   32   13   10

Solution:

Wage per Item (Rs.) (X)   No. of Items (f)   fX           fX²
12                         10                  120           1440
15                         25                  375           5625
20                         40                  800          16000
25                         70                 1750          43750
30                         32                  960          28800
40                         13                  520          20800
50                         10                  500          25000
Total                     N = 200            ΣfX = 5025   ΣfX² = 141415

Mean, X̄ = ΣfX / N = 5025 / 200 = 25.125

S.D., σ = √[ ΣfX²/N − (ΣfX/N)² ]

= √[ 141415/200 − (5025/200)² ] = 8.71; Mode, Z = 25

Karl Pearson's coefficient of skewness, SKP = (X̄ − Z) / σ = (25.13 − 25) / 8.71 = 0.13 / 8.71 = 0.0149

Continuous Series:

Problem 3: Calculate the coefficient of skewness by Karl Pearson's method.

Profit (Rs. Lakhs):    10-20   20-30   30-40   40-50   50-60
No. of Companies:        18      20      30      22      10

Solution:

Profit (Rs. Lakhs)   No. of Companies (f)   Mid value (m)   fm           fm²
10-20                 18                      15              270           4050
20-30                 20                      25              500          12500
30-40                 30                      35             1050          36750
40-50                 22                      45              990          44550
50-60                 10                      55              550          30250
Total                N = 100                  ---            Σfm = 3360   Σfm² = 128100

Mean, X̄ = Σfm / N = 3360 / 100 = 33.60

S.D., σ = √[ Σfm²/N − (Σfm/N)² ]

= √[ 128100/100 − (3360/100)² ] = 12.33; Mode, Z = 35.56

Karl Pearson's coefficient of skewness, SKP = (X̄ − Z) / σ = (33.60 − 35.56) / 12.33 = −1.96 / 12.33 = −0.1590

Bowley’s Coefficient

Formula: SKB = (Q3 + Q1 − 2M) / (Q3 − Q1)

Problem: 1 Compare the skewness of A and B.

            Q1      M       Q3
Series A    40      60      80
Series B    62.85   65.25   72.15

Solution:

Series A: SKB = (Q3 + Q1 − 2M) / (Q3 − Q1) = (80 + 40 − 2 x 60) / (80 − 40) = 0 / 40 = 0

Series B: SKB = (Q3 + Q1 − 2M) / (Q3 − Q1) = (72.15 + 62.85 − 2 x 65.25) / (72.15 − 62.85) = 4.50 / 9.30 = 0.4839

In series A, there is no skewness. In series B, there is moderate positive skewness.
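Bowley's formula needs only the three quartile values, so it is a one-line function in Python. The function name below is illustrative.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2M) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


sk_a = bowley_skewness(40, 60, 80)
sk_b = bowley_skewness(62.85, 65.25, 72.15)
print(sk_a, round(sk_b, 4))
```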

Problem: 2 Calculate Bowley's coefficient of skewness.

No. of children per family:   0   1    2    3    4    5   6
No. of Families:              7   10   16   25   18   11   8

Solution:

No. of Children per family (X)   No. of Families (f)   Cum. Freq. (cf)
0                                 7                      7
1                                10                     17
2                                16                     33  ← (Q1)
3                                25                     58  ← (M)
4                                18                     76  ← (Q3)
5                                11                     87
6                                 8                     95
                                 N = 95                 ----

Position of Q1 is (N + 1) / 4 = (95 + 1) / 4 = 24 ∴ Q1 = 2

Position of M is (N + 1) / 2 = (95 + 1) / 2 = 48 ∴ M = 3

Position of Q3 is 3 (N + 1) / 4 = 3 x 24 = 72 ∴ Q3 = 4

Bowley's coefficient of skewness, SKB = (Q3 + Q1 − 2M) / (Q3 − Q1) = (4 + 2 − 2 x 3) / (4 − 2) = 0 / 2 = 0
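In a discrete series the quartiles are read off the cumulative frequency column: the required value is the first X whose cumulative frequency reaches the position. A minimal Python sketch of that lookup follows. The names are illustrative, and it assumes the positions are whole numbers, as they are here; fractional positions would need interpolation between neighbouring items.

```python
def value_at_rank(values, frequencies, rank):
    """First value whose cumulative frequency is at least the given rank."""
    cumulative = 0
    for x, f in zip(values, frequencies):
        cumulative += f
        if cumulative >= rank:
            return x
    raise ValueError("rank exceeds the total frequency")


children = [0, 1, 2, 3, 4, 5, 6]
families = [7, 10, 16, 25, 18, 11, 8]
n = sum(families)

q1 = value_at_rank(children, families, (n + 1) / 4)
m = value_at_rank(children, families, (n + 1) / 2)
q3 = value_at_rank(children, families, 3 * (n + 1) / 4)
sk_b = (q3 + q1 - 2 * m) / (q3 - q1)
print(q1, m, q3, sk_b)
```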

Problem 3: Calculate Bowley's coefficient of skewness.

Annual sales (in Rs. '000):   0-20   20-50   50-100   100-250   250-500   500-1000
No. of Items:                  20     50      69       30        25        19

Solution:

Annual Sales (in Rs. '000)   No. of Items (f)   Cum. Freq. (cf)
0-20                          20                  20
20-50                         50                  70  ← (Q1 class)
50-100                        69                 139  ← (Median class)
100-250                       30                 169  ← (Q3 class)
250-500                       25                 194
500-1000                      19                 213
Total                        N = 213             ----

N/4 = 213/4 = 53.25 ∴ 20-50 is the Q1 class. ∴ L = 20; i = 50 − 20 = 30; f = 50; cf = 20

∴ Q1 = L + [ i (N/4 − cf) / f ]

= 20 + [ 30 (53.25 − 20) / 50 ]

= 20 + [ 30 x 33.25 / 50 ]

= 20 + 19.95 = 39.95

3N/4 = 3 x 53.25 = 159.75; Q3 class is 100-250

∴ L = 100; f = 30; i = 250 − 100 = 150; cf = 139

∴ Q3 = L + [ i (3N/4 − cf) / f ]

= 100 + [ 150 (159.75 − 139) / 30 ]

= 100 + [ 150 x 20.75 / 30 ]

= 100 + 103.75 = 203.75

N/2 = 213/2 = 106.5 ∴ 50-100 is the median class. ∴ L = 50; f = 69; i = 100 − 50 = 50; cf = 70

∴ M = L + [ i (N/2 − cf) / f ]

= 50 + [ 50 (106.5 − 70) / 69 ]

= 50 + [ 50 x 36.5 / 69 ]

= 50 + 26.45 = 76.45

Bowley's coefficient of skewness, SKB = (Q3 + Q1 − 2M) / (Q3 − Q1) = (203.75 + 39.95 − 2 x 76.45) / (203.75 − 39.95)

= 90.80 / 163.80 = 0.5543

UNIT – III
CORRELATION
Introduction:
So far we have confined ourselves to univariate distributions, i.e., distributions involving
only one variable. Often we come across situations in which our focus is simultaneously on two or more
variables, and invariably we observe that movements in one variable are accompanied by movements in
the other variable. For example, husband's age and wife's age move together, and scores on an I.Q. test
move with scores in university examinations. The study of variables showing such accompanying
behaviour is of great interest in statistics.
Meaning of Correlation:
In a bivariate distribution we may be interested to find out whether there is any correlation or
covariation between the two variables under study. If a change in one variable affects a change in the
other variable, the variables are said to be correlated.
Uses of Correlation:
• It is used in deriving precisely the degree and direction of relationship between variables like price
and demand, advertising expenditure and sales, rainfall and crop yield, etc.
• It is used in reducing the range of uncertainty in the matter of prediction.
• It is used in developing the concepts of regression and ratio of variation, which help in estimating the
values of one variable for a given value of another variable.
• In the field of economics it is used in understanding economic behaviour and in locating the
important variables on which the others depend.
• In the field of business it is used advantageously to estimate the cost of sales, volume of sales, sales
price, and any other values on the basis of some other variables which are functionally related to each
other.
• In the field of nature also, it is used in observing the multiplicity of the inter-related forces.
Types of Correlation:
METHODS OF STUDYING CORRELATION:
(i) Graphic Method:
1. Scatter diagram.
2. Simple graph or correlogram.
(ii) Mathematical Method:
1. Karl Pearson’s coefficient of correlation.
2. Spearman’s rank correlation coefficient.
3. Coefficient of concurrent deviation.
4. Method of least square.

Definition:
“Correlation analysis attempts to determine the degree of relationship between variables”. It is
denoted by r. Example: price and demand of a commodity. (or)
Definition:

The term correlation refers to the relationship between the variables. Simple correlation refers
to the relationship between two variables.

Types of correlation:

The types of correlation are considered under the following heads.

Positive correlation (or) negative correlation:

When the values of two variables change in the same direction, there is positive correlation
between the two variables.

Example 1:

X   50   60   70   95   100   105
Y   23   32   37   41    46    50

Example 2:

X   34   25   18   10    7
Y   51   49   42   33   19

In the two examples X and Y change in the same direction (X and Y increase in Example 1 and they
decrease in Example 2). Hence, there is positive correlation. Positive correlation is generally found
between the following pairs of variables.

1. Price and supply.

2. Sales and expenditure on advertisement.

3. Yield and fertilizer applied.

When the values of two variables change in opposite directions, there is negative correlation
between the two variables.

Example 1:

X   50   60   70   95   100   105
Y   50   46   40   30    24    15

Example 2:

X   45   43   39   24   28
Y   14   20   28   29   34

In the two examples X and Y move in opposite directions (in Example 1, X increases and Y decreases;
in Example 2, X generally decreases and Y increases). Hence, there is negative correlation. Negative
correlation generally exists between the following pairs of variables.

1. Price and demand.

2. Number of members in a family and monthly expenditure of each member.

3. Yield and weed.

(1) Positive and Negative Correlation:


Positive and negative correlation depend upon the direction of change of the variables.
If two variables tend to move together in the same direction, then the correlation is called positive
or direct correlation.
Eg: height and weight, rainfall and yield of crops, price and supply.

X: 10 20 30 40 50
Y: 50 60 70 80 90

If two variables tend to move together in opposite directions, then the correlation is called
negative or inverse correlation.
Eg: price and demand of a commodity, the volume and pressure of a perfect gas.

X: 10 20 30 40 50
Y: 50 40 25 15 10
(2) Simple Correlation:
When only two variables are considered, as under positive or negative correlation above,
the correlation between them is called simple correlation. (or)
When we study only two variables, the relationship is described as simple correlation.

Eg: Quantity of money and price level, demand and price.

(3) Multiple correlation:

When more than two variables are considered, the correlation between one of them and its
estimate based on the group consisting of the other variables is called multiple correlation. (or)

When we study more than two variables simultaneously, the relationship is described as multiple
correlation.

Eg: the relationship of price, demand and supply of a commodity.
(4) Partial and Total Correlation:
When more than two variables are considered, the correlation between two of them when all
other variables are held constant, i.e., when the linear effects of all other variables on them are
removed, is called partial correlation. (or)

The study of two variables excluding some other variables is called partial correlation.

Eg: We study price and demand, eliminating the supply side. In total correlation, all the facts are
taken into account.

(5) Linear or Non-linear Correlation or No correlation:

Corresponding to each pair of values of the two variables, plot a point on a graph sheet. Consider
all the points so obtained. When all the points lie exactly on a line or are scattered around a line,
there is linear correlation between the two variables. When all the points lie exactly on a curve or are
scattered around a curve, there is non-linear correlation between the two variables. When the points
are scattered neither around a line nor around a curve, there is no correlation between the two
variables.

If the ratio of change between two variables is uniform, then there will be linear correlation.

X: 5 10 15 20
Y: 4 8 12 16

If the ratio of change between two variables is not uniform, then there will be non-linear
correlation.

X: 2 6 8 10
Y: 5 4 10 9
Methods:

The following four methods are available under simple linear correlation and, among them, the
product moment method is the best one.

i) Scatter diagram.
ii) Karl Pearson's correlation co-efficient or product moment correlation co-efficient (r).
iii) Spearman's rank correlation co-efficient (ρ).
iv) Correlation co-efficient by concurrent deviation method (r).

Scatter diagram:

Let (Xi, Yi), i = 1, 2, 3, ..., N be the pairs of values of two variables X and Y. A point is plotted
on a graph sheet corresponding to each pair of values. The resulting diagram with N points is called a
scatter diagram.

Possible types of scatter diagrams under simple linear correlation are as given below. From a
diagram, it can be found out whether the correlation is perfect or high or low.

2. Simple Graph:
The values of the two variables are plotted on a graph paper and we get two curves, one for the X
variable and another for the Y variable. These two curves reveal the direction and closeness of the two
variables, and also reveal whether or not the variables are related. If both the curves move in the same
direction, i.e., parallel to each other, either upward or downward, the correlation is said to be
positive. On the other hand, if they move in opposite directions, the correlation is said to be negative.
Karl – Pearson’s Coefficient of correlation

The correlation coefficient between two random variables x and y, usually denoted by r(x, y) or
simply rxy, is a numerical measure of the linear relationship between them and is defined as:

r(x, y) = cov(x, y) / (σx σy)

Karl Pearson's correlation coefficient is also called the product-moment correlation coefficient,
since cov(x, y) = E[{x − E(x)}{y − E(y)}] = µ11.
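The definition r = cov(x, y) / (σx σy) can be sketched directly in Python. The function name and the two small data sets are illustrative: the first is perfectly linear and the second moves in opposite directions.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation: cov(x, y) / (sd_x * sd_y), all divided by N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


r_linear = pearson_r([5, 10, 15, 20], [4, 8, 12, 16])
r_negative = pearson_r([10, 20, 30, 40, 50], [50, 40, 25, 15, 10])
print(round(r_linear, 4), round(r_negative, 4))
```

The first pair gives r = +1 because Y is an exact multiple of X; the second gives a value close to −1.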

Properties:

1. −1 ≤ r ≤ +1, i.e., the correlation co-efficient cannot be greater than 1 numerically.

2. Correlation co-efficient is independent of change of origin. That is why we do not add A or B
when we use U and V, although we have subtracted them from X and Y while finding U and V.
3. Correlation co-efficient is independent of change of scale. That is why we do not multiply by
C or D when we use U and V, although we have divided X and Y by them while finding U and V.
4. Correlation co-efficient is a pure number. It is not in any unit of measurement.

Merits:

1. Karl Pearson's correlation co-efficient is the most popular correlation co-efficient. It is used in
regression equations also.
2. It is superior to other methods. It is calculated directly from the numerical values of each and
every pair. Even if one value changes, r changes.
3. The population correlation co-efficient can be estimated from the sample value.
4. The significance of the sample correlation co-efficient can be tested.

Demerits:
1. The correlation co-efficient is unduly affected by extreme values.
2. From the value of r, it cannot be known whether the assumption of linear relationship between
the variables holds or not.
3. Compared with other correlation co-efficients, Karl Pearson's correlation co-efficient is the most
difficult one to calculate.

Spearman’s rank correlation co-efficient:

This method is based on ranks. This measure is useful in dealing with qualitative characteristics
such as intelligence, beauty, morality, character, etc. Such characteristics cannot be measured
quantitatively, as is required for Pearson's coefficient of correlation; the measure is instead based on
the ranks given to the observations. It can be used when the data are irregular or extreme items are
erratic or inaccurate, because the rank correlation coefficient is not based on the assumption of
normality of the data.
The formula for Spearman's rank correlation, which is denoted by ρ, is:

ρ = 1 − [ 6 Σd² / (N (N² − 1)) ]
We may come across two types of problem.

 Where ranks are given.


 Where ranks are not given.

1. Where ranks are given:

When the actual ranks are given, the steps followed are;
1. Compute the difference of the two ranks (R1and R2) and denote by d.
2. Square the d and get ∑d2.
3. Substitute the figures in the formula.

2. Where ranks are not given:


When no ranks are given, but actual data are given, then we must assign ranks. We can give ranks
by taking the highest as 1 or the lowest as 1, the next to the highest (lowest) as 2, and so on,
following the same procedure for both the variables.
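The steps above (rank both variables, take the differences d, square and add them, substitute in the formula) can be sketched in Python. The function names and the data are hypothetical, and the sketch assumes that no two values are equal.

```python
def ranks_highest_first(values):
    """Rank 1 for the highest value, 2 for the next, and so on (no ties assumed)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (N * (N^2 - 1))"""
    n = len(xs)
    rank_x = ranks_highest_first(xs)
    rank_y = ranks_highest_first(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical marks of five students in two subjects
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 20, 45, 40]
rho = spearman_rho(xs, ys)
print(ranks_highest_first(xs), ranks_highest_first(ys), rho)
```

Here the ranks are [5, 4, 3, 2, 1] and [5, 3, 4, 1, 2], so Σd² = 4 and ρ = 1 − 24/120 = 0.8.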

Equal or Repeated ranks:

When two or more items have equal values, it is difficult to give ranks to them. In that
case the items are given the average of the ranks they would have received if they were not tied. For
example, if two individuals are placed in the seventh place, they are each given the rank
(7 + 8) / 2 = 7.5, which is the common rank to be assigned, and the next rank will be 9. If three are
ranked equal at the seventh place, they are each given the rank (7 + 8 + 9) / 3 = 8, which is the common
rank to be assigned to each, and the next rank will be 10 in this case. A slightly different formula is
used when there is more than one item having the same value. The formula is:
Merits:

1. Spearman's rank correlation co-efficient is useful in qualitative analysis. For example, it is
sufficient for the judges to rank the competitors. Judges need not assign scores. It is more
difficult to assign scores to the competitors than to rank them.
2. It is the only method when ranks are given.
3. It can also be calculated when the values of the variables are given.
4. It is simple to understand.
5. It is generally easy to calculate.

Demerits:
1. If N is large, it is very difficult to rank the items and to calculate ρ.
2. It cannot be calculated from a bivariate frequency table.
3. It is not used much.

Concurrent Deviation Method:

A very simple and casual method of finding correlation, when we are not serious about the
magnitude of the two variables, is the application of concurrent deviations. The deviation in the
X-value and the deviation in the corresponding Y-value are said to be concurrent if both the deviations
have the same sign.

r(c) = ± √[ ± (2C − N) / N ]

Where r(c) = Coefficient of correlation by the concurrent deviation method
C = Number of concurrent deviations
N = Number of pairs of deviations compared.
STEPS:
1. Find out the direction of change of the x variable. Take the first value of x as base and note down
whether the second value is increasing or decreasing or constant. If it increases in relation to the
previous one, mark a plus (+) sign against it; if it decreases, put a minus (-) sign; and if it is equal,
put zero. In the case of the third value, the second value is the base, and the above method is repeated
till the last item. The heading of the column is denoted by Dx.
2. Find out the direction of change of the y variable, following the above step. The heading of the
column is denoted by Dy.
3. Multiply Dx by Dy and find out the value of C, i.e., the number of positive items.
4. Substitute the figures in the formula.

If (2C − N) / N is negative, the minus sign inside the root is taken, so that the quantity under the
root becomes positive and the square root can be found; the minus sign outside the root is then also
taken, so that r(c) is negative. The square root of a negative quantity cannot be taken. If (2C − N) / N
is positive, then all the signs will be positive.
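The four steps and the sign rule can be sketched in Python as follows. The names and the data are hypothetical. A pair of deviations is counted as concurrent only when the product Dx x Dy is positive, so a zero deviation is not counted.

```python
import math


def direction_signs(values):
    """+1, -1 or 0 for each value compared with the one before it."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(values, values[1:])]


def concurrent_deviation_r(xs, ys):
    dx = direction_signs(xs)
    dy = direction_signs(ys)
    n = len(dx)                                      # pairs of deviations compared
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # concurrent deviations
    ratio = (2 * c - n) / n
    sign = 1 if ratio >= 0 else -1                   # same sign inside and outside the root
    return sign * math.sqrt(abs(ratio))


# Hypothetical data: four of the five deviation pairs are concurrent
xs = [10, 12, 11, 15, 14, 18]
ys = [5, 7, 8, 9, 7, 10]
r_c = concurrent_deviation_r(xs, ys)
print(round(r_c, 4))
```

With C = 4 and N = 5, (2C − N)/N = 0.6 and r(c) = +√0.6, about 0.7746.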

Correlation co-efficient by concurrent deviations:

Merits:

1. Correlation co-efficient by concurrent deviations can be calculated without much
difficulty even when N is large and the values are large.
2. It is simple to understand.
3. It is easy to calculate.

Demerits:

1. It is not precise. It just gives a rough idea about the existing correlation between two
variables.
2. It does not consider the quantum of deviations.
3. It cannot be calculated from a bivariate frequency table.
4. It is not used much.
Regression
Introduction:
• Regression literally means stepping back towards the average.
• The term was used by the British biometrician Sir Francis Galton (1822-1911) in connection with the
inheritance of stature.
• “Regression analysis is a mathematical measure of the average relationship between
two or more variables in terms of the original units of the data.”
• Regression equation: The value of the dependent variable is estimated corresponding
to any value of the independent variable by using the regression equation.
• In regression there are two types of variables:
(i) Dependent variable (ii) Independent variable

Method of forming the regression equations

Both the methods are based on the principle of least squares. They give the same
results:

1. Regression Equations on the basis of Normal Equations.

2. Regression Equations on the basis of X̄, Ȳ, bXY and bYX.

Properties of Regression Lines and Coefficients

1. The two regression equations are generally different and are not be interchanged in their
usage.
The regression equation of Y on X is to be used to find the value of Y corresponding to
any specified value of X. Similarly, the regression equation of X
On Y is to be used to find the value of X corresponding to any specified value of Y. The
two regression equations become one and the same when r= -1 or +1. In such cases, both
X and Y are to be found from that equation.

2. The two regression lines intersect at ( X́ , Ý ).


When there are two regression lines, they interest at ( X́ , Ý ).Hence, the values obtained
for X and Y by solving the two regression equations simultaneously are X́
and Ý respectively.

3. Correlation coefficient is the geometric mean of the two regression coefficients.


That is, correlation coefficient is the square root of the product of the two regression
coefficients.
r = ± √ bYX . b XY

4. The two regression coefficients and the correlation coefficient have the same sign.
Both $b_{YX}$ and $b_{XY}$ have the same sign, and r is also of the same sign. In other words, there are
only two possibilities: $b_{XY}$, $b_{YX}$ and r are all positive, or $b_{XY}$, $b_{YX}$ and r are all negative.

5. Both the regression coefficients cannot be greater than 1 numerically at the same time.
When the signs are ignored, $b_{XY}$ and $b_{YX}$ cannot both be greater than 1; either
both are less than 1 or one of them is less than 1.

6. Regression coefficients are independent of change of origin but are affected by change of
scale. If u = (X - a)/c and v = (Y - b)/d, then

$$b_{XY} = \frac{c}{d}\, b_{uv} \quad \text{and} \quad b_{YX} = \frac{d}{c}\, b_{vu}$$

The origins a and b do not appear in these relations; only the scale factors c and d do.

7. Each regression coefficient is expressed in the unit of measurement of the dependent
variable (per unit of the independent variable).

8. Each regression coefficient indicates the quantum of change in the dependent variable
corresponding to a unit increase in the independent variable.
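
Properties 2, 3 and 4 can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up data set (the numbers and the function name `regression_summary` are illustrative only and do not come from the text). It computes the two regression coefficients from deviations about the means and then recovers r as the signed geometric mean of the two.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return the means, both regression coefficients and r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx  # regression coefficient of Y on X
    b_xy = sxy / syy  # regression coefficient of X on Y
    # Property 3: r is the geometric mean of the two coefficients,
    # Property 4: r carries the same sign as the coefficients.
    r = sqrt(b_yx * b_xy)
    if b_yx < 0:
        r = -r
    return mean_x, mean_y, b_yx, b_xy, r

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, mean_y, b_yx, b_xy, r = regression_summary(xs, ys)

def y_on_x(x):
    """Regression line of Y on X: Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

def x_on_y(y):
    """Regression line of X on Y: X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)

print(b_yx, b_xy, round(r, 4))
```

Evaluating `y_on_x(mean_x)` returns `mean_y` and `x_on_y(mean_y)` returns `mean_x`, which is Property 2: both lines pass through the point of means.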

Uses of regression:
1. It is a more widely used method than correlation analysis.
2. It is used to estimate the relationship between two economic variables, such as income and expenditure.
3. It predicts the value of the dependent variable from the values of the independent variable.
4. We can calculate the coefficient of correlation (r) and the coefficient of determination (r²).
5. It is used in the estimation of demand curves, supply, production, etc.

Difference between Correlation and Regression.

1. Correlation: Correlation is the relationship between two or more variables. It is expressed numerically.
   Regression: Regression means going back. The average relation between the variables is given as an equation.

2. Correlation: Between two variables, none is identified as independent or dependent variable.
   Regression: One of the variables is the independent variable and the other is the dependent variable.

3. Correlation: It does not study the cause and effect relationship between the variables.
   Regression: It indicates the cause and effect relationship between the variables and establishes a functional relationship.

4. Correlation: The coefficient of correlation is a relative measure. It lies between -1 and +1.
   Regression: The regression coefficient is an absolute measure. If we know the value of the independent variable, we can find the value of the dependent variable.

5. Correlation: It is not useful for further mathematical treatment.
   Regression: It is useful for further mathematical treatment.

6. Correlation: It has limited application because it is confined to linear relationship between the variables.
   Regression: It has wider application, as it studies linear and non-linear relationship between the variables.

7. Correlation: There is spurious or nonsense correlation.
   Regression: There is no such nonsense regression.

8. Correlation: If the coefficient is positive, then the two variables are positively correlated, and vice versa.
   Regression: The regression coefficient explains that the decrease in one variable is associated with the increase in the other variable.

9. Correlation: The correlation coefficient is independent of change of origin and scale.
   Regression: Regression coefficients are independent of change of origin but are affected by change of scale.

Correlation and Regression

KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

This is also called the product moment correlation coefficient and is denoted by r. It is the
covariance between the two variables divided by the product of their standard deviations. It can be
calculated by using any one of several formulae; the choice of a formula depends on the nature of the data.

Example1:

The following table gives aptitude test scores and productivity indices of 8 randomly selected
workers.

Aptitude score:      57 58 59 59 60 61 62 64
Productivity index:  67 68 65 68 72 72 69 71
Calculate the correlation co-efficient between aptitude score and productivity index.

Solution:

X - aptitude score; Y - productivity index. x = X - $\bar{X}$ and y = Y - $\bar{Y}$ are small integers and hence
the following formula is used. ∑x = ∑(X - $\bar{X}$) = 0 and ∑y = ∑(Y - $\bar{Y}$) = 0 are the properties.

X        Y        x = X - 60   y = Y - 69   xy        x²        y²
57       67       -3           -2           6         9         4
58       68       -2           -1           2         4         1
59       65       -1           -4           4         1         16
59       68       -1           -1           1         1         1
60       72       0            3            0         0         9
61       72       1            3            3         1         9
62       69       2            0            0         4         0
64       71       4            2            8         16        4
∑X=480   ∑Y=552   ∑x=0         ∑y=0         ∑xy=24    ∑x²=36    ∑y²=44

$$\bar{X} = \frac{\sum X}{N} = \frac{480}{8} = 60, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{552}{8} = 69$$

Karl Pearson’s correlation coefficient, where ∑x = 0 and ∑y = 0:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} = 0.6030$$
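
The deviation formula used in Example 1 can be sketched in a few lines of Python. This is only an illustrative check of the arithmetic; the function name `pearson_r` is an assumption made for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # x = X - mean of X
    dy = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Data of Example 1
aptitude = [57, 58, 59, 59, 60, 61, 62, 64]
productivity = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(aptitude, productivity)
print(round(r, 4))
```

The same function applies to data whose deviations are not integers, since it works directly from the means.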

Example 2:

Compute the co-efficient of correlation between X-advertisement expenditure and Y- sales.

x 10 12 18 8 13 20 22 15 5 17
y 88 90 94 86 87 92 96 94 88 85
Solution:

X Y XY X2 Y2
10 88 880 100 7744
12 90 1080 144 8100
18 94 1692 324 8836
08 86 688 64 7396
13 87 1131 169 7569
20 92 1840 400 8464
22 96 2112 484 9216
15 94 1410 225 8836
05 88 440 25 7744
17 85 1445 289 7225
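∑X = 140   ∑Y = 900   ∑XY = 12718   ∑X² = 2224   ∑Y² = 81130

Correlation coefficient,

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

$$= \frac{10 \times 12718 - 140 \times 900}{\sqrt{10 \times 2224 - 140^2}\,\sqrt{10 \times 81130 - 900^2}} = \frac{1180}{\sqrt{2640}\,\sqrt{1300}}$$

r = 0.6370.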

Spearman’s rank correlation co-efficient(ρ)

When there is no tie,

$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$

where d is the difference between the x and y ranks.

When one value occurs m times,

$$\rho = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12}\right\}}{N(N^2 - 1)}$$

When more than one value is repeated,

$$\rho = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \dots\right\}}{N(N^2 - 1)}$$

It is calculated when ranks are given or when rank correlation co-efficient is required. Rank
correlation co-efficient also lies between -1 and +1.

Problem: 3 Rankings of 10 trainees at the beginning (x) and at the end (y) of a certain course are given
below.

Calculate spearman’s rank correlation co-efficient.

Trainees: A B C D E F G H I J

X 1 6 3 9 5 2 7 10 8 4
Y 6 8 3 7 2 1 5 9 4 10
Solution:

X Y d d2
1 6 -5 25
6 8 -2 4
3 3 0 0
9 7 2 4
5 2 3 9
2 1 1 1
7 5 2 4
10 9 1 1
8 4 4 16
4 10 -6 36
∑d = 0   ∑d² = 100

$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 100}{10 \times 99} = 1 - 0.6061 = 0.3939$$
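
When the ranks are already given and there are no ties, the calculation reduces to a single sum. A minimal Python sketch follows (the function name `spearman_rho` is chosen here for illustration), applied to the rankings of the ten trainees.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation when no rank is tied: 1 - 6*sum(d^2) / (N(N^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Rankings of trainees A to J at the beginning (x) and at the end (y) of the course
rank_x = [1, 6, 3, 9, 5, 2, 7, 10, 8, 4]
rank_y = [6, 8, 3, 7, 2, 1, 5, 9, 4, 10]
rho = spearman_rho(rank_x, rank_y)
print(round(rho, 4))
```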

Problem 3:

Find the rank correlation co-efficient for the percentage of marks secured by a group of 8 students
in economics and statistics.

Marks in economics 50 60 65 70 75 40 70 80
Marks in statistics 80 71 60 75 90 82 70 50
Solution:

Let X -marks in economics; Y – marks in statistics

X     Y     Rank of X   Rank of Y   d      d²
50    80    7           3           4      16
60    71    6           5           1      1
65    60    5           7           -2     4
70    75    3.5         4           -0.5   0.25
75    90    2           1           1      1
40    82    8           2           6      36
70    70    3.5         6           -2.5   6.25
80    50    1           8           -7     49
                                    ∑d=0   ∑d²=113.5

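$$\rho = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12}\right\}}{N(N^2 - 1)}$$

The value 70 occurs twice among the marks in economics, so m = 2 and

$$\frac{m(m^2 - 1)}{12} = \frac{2 \times 3}{12} = 0.5$$

$$\therefore \rho = 1 - \frac{6(113.5 + 0.5)}{8(8^2 - 1)} = 1 - \frac{6 \times 114}{8 \times 63} = 1 - 1.3571 = -0.3571$$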
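
When marks rather than ranks are given, the ranks have to be assigned first, with tied values sharing the average of the positions they occupy. The sketch below is one possible way to do this in Python; the helper names `average_ranks` and `spearman_rho_with_ties` are assumptions made for illustration, and the highest mark is given rank 1 as in the solution above.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho_with_ties(xs, ys):
    """Spearman's rho with the correction m(m^2 - 1)/12 added for every repeated value."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = 0.0
    for series in (xs, ys):
        for m in Counter(series).values():
            if m > 1:  # only repeated values contribute a correction term
                correction += m * (m ** 2 - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Marks of the 8 students in economics and statistics
economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics = [80, 71, 60, 75, 90, 82, 70, 50]
rho = spearman_rho_with_ties(economics, statistics)
print(round(rho, 4))
```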

Coefficient of correlation by concurrent deviation method( r c )

$$r_c = +\sqrt{\frac{2C - N}{N}} \quad \text{when } 2C - N > 0$$

$$r_c = 0 \quad \text{when } 2C - N = 0$$

$$r_c = -\sqrt{\frac{-(2C - N)}{N}} \quad \text{when } 2C - N < 0$$

$$\therefore r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$

N denotes the number of entries in the $D_{xy}$ column (one fewer than the number of observations) and
C denotes the number of + signs (concurrent deviations) in the $D_{xy}$ column.

$r_c$ also lies between -1 and +1.

If a value is greater than the preceding value, a + sign is put. If it is less than the preceding one, a - sign is
marked. If it is equal to the preceding one, the deviation is 0. $D_x$ denotes such deviations among the values
of the variable x and $D_y$ denotes those of y. $D_{xy}$ denotes the product of the entries under $D_x$ and $D_y$.

Problem: 1

Calculate the co-efficient of correlation from the data given below by the method of concurrent
deviations.

Year               1959  1960  1961  1962  1963  1964
Index of imports   85    82    89    95    104   108
Index of prices    110   115   112   118   120   109

Year               1965  1966  1967  1968  1969
Index of imports   112   100   99    93    90
Index of prices    98    102   103   105   107

Solution:

Index of Imports (X)   Index of Prices (Y)   Dx   Dy   Dxy
85                     110
82                     115                   -    +    -
89                     112                   +    -    -
95                     118                   +    +    +
104                    120                   +    +    +
108                    109                   +    -    -
112                    98                    +    -    -
100                    102                   -    +    -
99                     103                   -    +    -
93                     105                   -    +    -
90                     107                   -    +    -

N = 10; C = 2; 2C - N = 2 × 2 - 10 = -6 < 0

$$\therefore r_c = -\sqrt{\frac{-(2C - N)}{N}} = -\sqrt{\frac{-(2 \times 2 - 10)}{10}} = -\sqrt{\frac{6}{10}} = -0.7746$$
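
The sign-counting procedure can be sketched as follows. This is a minimal Python illustration (the function names are assumptions), using the price series as it appears in the solution table, that is, with 103 as the price index for 1967.

```python
from math import sqrt

def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)

def concurrent_deviation_r(xs, ys):
    """Coefficient of correlation by the method of concurrent deviations."""
    # Direction of change of each value relative to the preceding value
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                         # number of pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)     # number of + signs in the Dxy column
    k = (2 * c - n) / n
    # The sign of r_c follows the sign of 2C - N
    return sqrt(k) if k >= 0 else -sqrt(-k)

imports = [85, 82, 89, 95, 104, 108, 112, 100, 99, 93, 90]
prices = [110, 115, 112, 118, 120, 109, 98, 102, 103, 105, 107]
r_c = concurrent_deviation_r(imports, prices)
print(round(r_c, 4))
```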

UNIT-IV

INDEX NUMBER

Introduction: An index number is a statistical measure designed to show changes in a variable
or a group of related variables with respect to time, geographic location or other characteristics
such as income, profession, etc.

Definition: A price index number is the percentage of change in the price of one commodity or
one group of commodities in the current year compared with the base year. A similar calculation
on quantity results in a quantity index number.

Characteristics of index number:

1. Index numbers are a special type of average:

The units of measurements of commodities are different. But, a price index number gives
the percentage of change in prices on the average. Hence, index numbers are a special
type of averages. For example, let the commodities be rice, kerosene and cloth. The price
of rice per kilogram is considered; the price of kerosene per litre and the price of a cloth
per metre are considered. The average change in prices is indicated by the index number.

2. Index numbers are percentages.


The price in the current year is divided by the price in the base year to get the ratio of
change in price. It is multiplied by 100. Interpretation of an index number is made easy by
this procedure.

3. Index numbers indicate the percentage of change which is not possible otherwise.
No other statistical tool is so effective in studying such a wide variety of situations.

4. Index numbers are meant for comparisons.

Index numbers have been devised to compare two different times. Comparisons of two
different places or situations are also possible with index numbers.

Uses:

1. Index numbers provide scope for comparisons. Price, production, value, etc. at two times
are compared by index numbers.

2. Index numbers are economic barometers. The dictionary meaning of the word
barometer is that it is an "instrument measuring atmospheric pressure, used for forecasting
weather and ascertaining height above sea level". In the same way, index numbers of wholesale
prices, industrial production, etc. indicate the condition of an economy.

3. Index numbers serve as guides. Being economic barometers, they foretell to the Government,
businessmen, etc. the direction in which the economy is likely to move.

4. Index numbers are the pulse of an economy. The condition of an economy is known from
the index numbers of various economic activities.

5. Index numbers measure the purchasing power of money.

$$\text{Purchasing power of one rupee} = \frac{100}{\text{Price index}}$$

6. Index numbers help to calculate real wages.

$$\text{Real wage} = \frac{\text{Money wage}}{\text{Price index or cost of living index}} \times 100$$

7. Index numbers are deflators. A deflator is one which makes allowance for the change in
the prices of commodities.

8. Index numbers are useful to formulate policies. Based on the relevant index numbers,
suitable policies are framed by businessmen and economists. Governments and industrialists also
make use of the prevailing conditions and benefit through planning.
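
The two formulas under uses 5 and 6 can be illustrated with a short Python sketch. The price index of 125 and the money wage of Rs 5000 are hypothetical figures chosen only for illustration, and the function names are assumptions.

```python
def purchasing_power(price_index):
    """Purchasing power of one rupee = 100 / price index."""
    return 100 / price_index

def real_wage(money_wage, price_index):
    """Real wage = (money wage / price index) x 100."""
    return money_wage / price_index * 100

# Hypothetical figures: the price index has risen from 100 to 125
index_now = 125
rupee_value = purchasing_power(index_now)
wage_in_base_prices = real_wage(5000, index_now)
print(rupee_value, wage_in_base_prices)
```

With these assumed figures a rupee buys 0.8 of what it bought in the base year, and a money wage of Rs 5000 is worth Rs 4000 at base year prices.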

General Problems in the construction of index numbers.

The following aspects are to be carefully considered during the construction of an index number,

1. The purpose: The purpose of the index number is to be clearly known. For whom it is meant,
by whom it is to be used, etc. are to be spelt out.

2. The base period: The period may be one year or a few years. The base period is to be chosen
according to the purpose.

(i) It should be a normal period. There should not have been natural calamities such as famine,
flood and earthquake, political upheavals, war, etc.

(ii) It should not be too distant in the past. This is to keep the index numbers useful.

3. The items: Including all the items in a study is neither feasible nor useful. Only those items
which concern the people for whom the index number is intended are to be included. For example,
in considering the living conditions of people in hill stations, woollen clothes should be included.

4. The price quotations: The prices are to be properly gathered. For consumer price index
numbers, retail prices are necessary. For wholesale price indices, wholesale prices are
needed.

5. The average: For arriving at the average value of a group of items, a suitable average is to be
decided. G.M. has the merits listed below. In other contexts A.M. may be more useful, as it is
simple to understand and easy to calculate.

(i) G.M. is the appropriate average to measure relative changes. Hence, index numbers,
wherein the relative changes are expressed as percentages, give scope for G.M.
(ii) It gives more weightage to smaller items.
(iii) It facilitates the change of the base period. The base cannot be kept the same for a long
time because the purpose and all-round changes may warrant a change in the base
period.

6. Weighting: By the unweighted method, equal weightage of unity is given to all the items.
Under the weighted methods, the weight may be any one of the following:

(i) Base year quantity as in Laspeyre’s method or current year quantity as in Paasche’s
method for price index number.
(ii) Base year value (price × quantity) as in consumer price index number by the Family
Budget method.
(iii) Some fixed weight based on neither base year quantity nor current year quantity but
on some other consideration, as in Kelly’s method.

7 . The formula: As seen in the following pages, many formulas are available.

Period is referred to as year here after and the following notations are used.

P0 - price of a commodity in the base year

P1 - Price of a commodity in the current year.

q0 – quantity of a commodity in the base year

q1 – quantity of a commodity in the current year.

P – Price of a commodity.

q – quantity of a commodity.

V or W – weight of a commodity.

I or P –Price relative or price Index number of a commodity.

Q – quantity relative or quantity index number of a commodity.

$$P = \frac{P_1}{P_0} \times 100, \qquad Q = \frac{q_1}{q_0} \times 100$$

P01 – price index number of the current year compared with the base year.

Q01 – quantity index number of the current year compared with the base year.

Formulae :

All the formulae can be brought under four groups as follows.

Methods

1. Unweighted (simple)
   (a) Simple aggregative method
   (b) Simple average of relatives method
2. Weighted
   (a) Weighted aggregative method
   (b) Weighted average of relatives method

1. SIMPLE OR UNWEIGHTED AGGREGATIVE METHOD

It is based on the aggregates or the totals, as shown below.

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

When a quantity index number is required,

$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100$$

The drawbacks of this method are:
(i) It does not satisfy even the unit test, which is explained later. The defect is due to the
fact that the unit prices are added as such even though the units of measurement
are different, such as kg, litre, etc.
(ii) It does not distinguish between the commodities with regard to their relative
importance.

2. SIMPLE OR UNWEIGHTED AVERAGE OF RELATIVES METHOD

Price index ($P_{01}$)

(i) Using A.M., $P_{01} = \dfrac{\sum P}{N}$

(ii) Using G.M., $P_{01} = \text{Antilog}\left(\dfrac{\sum \log P}{N}\right)$

Both these formulae can be found to satisfy the unit test.

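3. WEIGHTED AGGREGATIVES METHOD

Price indices ($P_{01}$)

(i) Laspeyre’s formula:

$$P_{01}^{L} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$

(ii) Paasche’s formula:

$$P_{01}^{P} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$$

(iii) Fisher’s formula:

$$P_{01}^{F} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100$$

(iv) Marshall-Edgeworth formula:

$$P_{01}^{ME} = \frac{\sum P_1 (q_0 + q_1)}{\sum P_0 (q_0 + q_1)} \times 100 = \frac{\sum P_1 q_0 + \sum P_1 q_1}{\sum P_0 q_0 + \sum P_0 q_1} \times 100$$

(v) Bowley’s formula:

$$P_{01}^{B} = \frac{1}{2}\left(\frac{\sum P_1 q_0}{\sum P_0 q_0} + \frac{\sum P_1 q_1}{\sum P_0 q_1}\right) \times 100 = \frac{P_{01}^{L} + P_{01}^{P}}{2}$$

(vi) Kelly’s formula:

$$P_{01}^{K} = \frac{\sum P_1 q}{\sum P_0 q} \times 100$$
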
4. WEIGHTED AVERAGES OF RELATIVES METHOD

Price indices ($P_{01}$)

(i) Using A.M., $P_{01} = \dfrac{\sum WP}{\sum W}$

(ii) Using G.M., $P_{01} = \text{Antilog}\left[\dfrac{\sum W \log P}{\sum W}\right]$

This method is better than the corresponding unweighted method in showing the relative
change. From the data available under this method, index numbers by unweighted
averages of relatives also could be calculated. This method provides scope for replacing
one or more items at a later stage.

TESTS OF CONSISTENCY AND ADEQUACY

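1. Unit Test:

By the simple aggregative method,

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

By Laspeyre’s formula,

$$P_{01}^{L} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$

By Paasche’s formula,

$$P_{01}^{P} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$$

By Fisher’s formula,

$$P_{01}^{F} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100$$

2. Time Reversal Test (T.R. test):

$$P_{01} \times P_{10} = 1$$

3. Factor Reversal Test (F.R. test):

$$P_{01} \times Q_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_0}$$

4. Circular Test:

$$P_{01} \times P_{12} \times P_{20} = 1$$
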

5. Fixed Base: When the data are available for more than two years, the question ‘which is the
base year?’ arises. Under the fixed base method, the base year is the same for all the different years
under consideration. Base year figures may be the figures of any one year, the averages of a
few years, the totals of a few years or those suggested. When nothing is indicated, the first
year in the series of years in chronological order is to be taken as the base.
If no method is suggested, the method which is suitable for the
data under consideration is to be chosen. For the given data, although index numbers can
be calculated by more than one method, the result is obtained by only one method unless
stated otherwise. The method is selected in the following order.
(i) Fisher’s formula (or)
(ii) Weighted A.M. method (or)
(iii) Unweighted A.M. method.
6. Chain Base Index:

$$\text{Chain index} = \frac{\text{Current year link relative} \times \text{Preceding year chain index}}{100}$$

$$\text{Current year F.B.I} = \frac{\text{Current year C.B.I} \times \text{Preceding year F.B.I}}{100}$$

Cost of Living Index :

The cost of living index number shows the impact of changes in the prices of a number of
commodities and services on a particular class of people in the current year in
comparison with the base year.

Formula:

Two formulae are available. They are given below.

(i) Aggregate Expenditure Method or Weighted Aggregative Method.

$$\text{Cost of living index number} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$

(ii) Family Budget Method or Weighted Averages of Relatives Method.

$$\text{Cost of living index number} = \frac{\sum WP}{\sum W}$$

$$\text{Cost of living index number} = \text{Antilog}\left(\frac{\sum W \log P}{\sum W}\right)$$

Uses:

1. Cost of living index numbers are the indicators of changes in real wages. Money
wages are changing and so are prices. Cost of living index numbers help to know
whether money wages overtake the rising prices or are overpowered by them.
2. Decisions on dearness allowance are based on the cost of living indices.
3. They are further used for deflation of income and value in national accounts.

INDEX NUMBER

UNWEIGHTED AGGREGATIVE METHOD AND UNWEIGHTED AVERAGES OF RELATIVES METHOD:

Problem: 1 From the following data, construct an index for 1995 taking 1994 as base:

Commodities A B C D E

Price in 1994 (Rs) 50 40 80 110 20

Price in 1995 (Rs) 70 60 90 120 20


Solution:

Commodity   Price in 1994 (P0)   Price in 1995 (P1)   P = (P1/P0) × 100   log P
A           50                   70                   140.00              2.1461
B           40                   60                   150.00              2.1761
C           80                   90                   112.50              2.0512
D           110                  120                  109.09              2.0378
E           20                   20                   100.00              2.0000
Total       ∑P0 = 300            ∑P1 = 360            ∑P = 611.59         ∑ log P = 10.4112

By the aggregative method,

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{360}{300} \times 100 = 120$$

Using A.M.,

$$P_{01} = \frac{\sum P}{N} = \frac{611.59}{5} = 122.32$$

Using G.M.,

$$P_{01} = \text{Antilog}\left(\frac{\sum \log P}{N}\right) = \text{Antilog}\left(\frac{10.4112}{5}\right) = 120.84$$
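
The three unweighted indices of this problem can be reproduced with a short Python sketch. The function names are assumptions made for illustration; the G.M. is obtained with `log10` in place of a logarithm table, so the last decimal may differ slightly from the table-based value.

```python
from math import log10

def simple_aggregative(p0, p1):
    """Simple aggregative index: (sum of current prices / sum of base prices) x 100."""
    return sum(p1) / sum(p0) * 100

def price_relatives(p0, p1):
    """Price relative of each commodity: P = (P1 / P0) x 100."""
    return [b / a * 100 for a, b in zip(p0, p1)]

def am_of_relatives(p0, p1):
    """Unweighted arithmetic mean of the price relatives."""
    rel = price_relatives(p0, p1)
    return sum(rel) / len(rel)

def gm_of_relatives(p0, p1):
    """Unweighted geometric mean of the price relatives, computed through logarithms."""
    rel = price_relatives(p0, p1)
    return 10 ** (sum(log10(r) for r in rel) / len(rel))

prices_1994 = [50, 40, 80, 110, 20]
prices_1995 = [70, 60, 90, 120, 20]
agg = simple_aggregative(prices_1994, prices_1995)
am = am_of_relatives(prices_1994, prices_1995)
gm = gm_of_relatives(prices_1994, prices_1995)
print(round(agg, 2), round(am, 2), round(gm, 2))
```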

WEIGHTED AGGREGATIVES METHOD

Problem :2 Compute (i) Laspeyre’s (ii) Paasche’s and (iii) Fisher’s index number.

Price Quantity
Item Base year Current year Base year Current year
A 6 10 50 50
B 2 2 100 120
C 4 6 60 60
D 10 12 30 25

Solution:
Commodity   Price                      Quantity
            Base year   Current year   Base year   Current year
            P0          P1             q0          q1             P0q0   P1q0   P0q1   P1q1
A           6           10             50          50             300    500    300    500
B           2           2              100         120            200    200    240    240
C           4           6              60          60             240    360    240    360
D           10          12             30          25             300    360    250    300
Total                                                             1040   1420   1030   1400

(i) Laspeyre’s formula:

$$P_{01}^{L} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{1420}{1040} \times 100 = 136.54$$

(ii) Paasche’s formula:

$$P_{01}^{P} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100 = \frac{1400}{1030} \times 100 = 135.92$$

(iii) Fisher’s formula:

$$P_{01}^{F} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100 = \sqrt{\frac{1420}{1040} \times \frac{1400}{1030}} \times 100 = 136.23$$

(or)

$$P_{01}^{F} = \sqrt{\text{Laspeyre’s} \times \text{Paasche’s}} = \sqrt{136.54 \times 135.92} = 136.23$$
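
A minimal Python sketch of the three weighted aggregative formulas, applied to the data of this problem, is given below. The function names are illustrative assumptions, not part of the text.

```python
from math import sqrt

def sum_products(prices, quantities):
    """Aggregate value: sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, q0, p1, q1):
    """Laspeyre's index: base year quantities as weights."""
    return sum_products(p1, q0) / sum_products(p0, q0) * 100

def paasche(p0, q0, p1, q1):
    """Paasche's index: current year quantities as weights."""
    return sum_products(p1, q1) / sum_products(p0, q1) * 100

def fisher(p0, q0, p1, q1):
    """Fisher's index: geometric mean of Laspeyre's and Paasche's indices."""
    return sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))

# Items A, B, C, D
p0 = [6, 2, 4, 10]       # base year prices
p1 = [10, 2, 6, 12]      # current year prices
q0 = [50, 100, 60, 30]   # base year quantities
q1 = [50, 120, 60, 25]   # current year quantities

index_l = laspeyres(p0, q0, p1, q1)
index_p = paasche(p0, q0, p1, q1)
index_f = fisher(p0, q0, p1, q1)
print(round(index_l, 2), round(index_p, 2), round(index_f, 2))
```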

WEIGHTED AVERAGES OF RELATIVES METHOD

Problem: 1 Calculate the index number of prices for 1998 on the basis of 1995 from the data given below.

Commodity Weights Price (1995) Price(1998)


A 40 16 20
B 25 40 60
C 5 2 3
D 20 5 7
E 10 2 4

Solution: Either G.M or A.M can be used

Commodity   Weight W   Price 1995 (P0)   Price 1998 (P1)   P = (P1/P0) × 100   WP      log P    W log P
A           40         16                20                125                 5000    2.0969   83.8760
B           25         40                60                150                 3750    2.1761   54.4025
C           5          2                 3                 150                 750     2.1761   10.8805
D           20         5                 7                 140                 2800    2.1461   42.9220
E           10         2                 4                 200                 2000    2.3010   23.0100
Total       ∑W = 100                                                           ∑WP = 14300      ∑W log P = 215.0910

(i) Using A.M.,

$$P_{01} = \frac{\sum WP}{\sum W} = \frac{14300}{100} = 143$$

(ii) Using G.M.,

$$P_{01} = \text{Antilog}\left[\frac{\sum W \log P}{\sum W}\right] = \text{Antilog}\left[\frac{215.0910}{100}\right] = 141.55$$
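
The weighted averages of relatives can be checked with the following Python sketch (function names are assumptions made for illustration). As before, `log10` stands in for the logarithm table.

```python
from math import log10

def weighted_am_index(weights, p0, p1):
    """Weighted A.M. of price relatives: sum(W x P) / sum(W)."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(w * r for w, r in zip(weights, relatives)) / sum(weights)

def weighted_gm_index(weights, p0, p1):
    """Weighted G.M. of price relatives: antilog of sum(W x log P) / sum(W)."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    mean_log = sum(w * log10(r) for w, r in zip(weights, relatives)) / sum(weights)
    return 10 ** mean_log

# Commodities A to E
weights = [40, 25, 5, 20, 10]
price_1995 = [16, 40, 2, 5, 2]
price_1998 = [20, 60, 3, 7, 4]

index_am = weighted_am_index(weights, price_1995, price_1998)
index_gm = weighted_gm_index(weights, price_1995, price_1998)
print(round(index_am, 2), round(index_gm, 2))
```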

TIME REVERSAL AND FACTOR REVERSAL TEST

Problem: 1 Show that Fisher’s ideal index satisfies both time reversal and factor reversal tests, using the
following data commonly.

Commodity Price(1990) Qty(1990) Price (1992) Qty (1992)


A 6 50 10 56
B 2 100 2 120
C 4 60 6 60
D 10 30 12 24
E 8 40 12 36
Solution:

Commodity   1990          1992
            P0     q0     P1     q1     P0q0   P1q0   P0q1   P1q1
A           6      50     10     56     300    500    336    560
B           2      100    2      120    200    200    240    240
C           4      60     6      60     240    360    240    360
D           10     30     12     24     300    360    240    288
E           8      40     12     36     320    480    288    432
Total                                   ∑P0q0 = 1360   ∑P1q0 = 1900   ∑P0q1 = 1344   ∑P1q1 = 1880

By Fisher’s formula, after ignoring the factor 100,

$$P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}}$$

$$P_{10} = \sqrt{\frac{\sum P_0 q_1}{\sum P_1 q_1} \times \frac{\sum P_0 q_0}{\sum P_1 q_0}} = \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$$

and so

$$P_{01} \times P_{10} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1880} \times \frac{1360}{1900}} = \sqrt{1} = 1$$

Hence the time reversal test is satisfied.

$$Q_{01} = \sqrt{\frac{\sum P_0 q_1}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_1 q_0}} = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$$

$$P_{01} \times Q_{01} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1360} \times \frac{1880}{1900}} = \sqrt{\frac{1880^2}{1360^2}} = \frac{1880}{1360} = \frac{\sum P_1 q_1}{\sum P_0 q_0}$$

Hence the factor reversal test is satisfied.

Using the given data, Fisher’s index is found to satisfy both time reversal and factor
reversal tests.
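
Both tests can be verified numerically. The sketch below, in Python, ignores the factor 100 as in the solution; the function names are assumptions made for illustration. The quantity index is obtained by interchanging the roles of price and quantity in Fisher's formula.

```python
from math import sqrt, isclose

def sum_products(a, b):
    """Sum of pairwise products, e.g. the sum of P1 x q0."""
    return sum(x * y for x, y in zip(a, b))

def fisher_price(p0, q0, p1, q1):
    """Fisher's price index P01 without the factor 100."""
    laspeyres = sum_products(p1, q0) / sum_products(p0, q0)
    paasche = sum_products(p1, q1) / sum_products(p0, q1)
    return sqrt(laspeyres * paasche)

def fisher_quantity(p0, q0, p1, q1):
    """Fisher's quantity index Q01: price and quantity interchanged."""
    return fisher_price(q0, p0, q1, p1)

# Commodities A to E
p0 = [6, 2, 4, 10, 8]        # prices in 1990
q0 = [50, 100, 60, 30, 40]   # quantities in 1990
p1 = [10, 2, 6, 12, 12]      # prices in 1992
q1 = [56, 120, 60, 24, 36]   # quantities in 1992

p01 = fisher_price(p0, q0, p1, q1)
p10 = fisher_price(p1, q1, p0, q0)   # base and current years interchanged
q01 = fisher_quantity(p0, q0, p1, q1)

time_reversal_ok = isclose(p01 * p10, 1.0)
factor_reversal_ok = isclose(p01 * q01, sum_products(p1, q1) / sum_products(p0, q0))
print(time_reversal_ok, factor_reversal_ok)
```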

FIXED BASE INDEX

Problem: 1 Calculate fixed base index numbers from the following prices:

Commodity 1995 1996 1997 1998 1999 2000


I 4 5 6 6 8 10
II 5 7 8 10 13 15
III 6 9 12 12 15 15
Solution:

Year   Prices (Commodity)   Price Relatives [P] (Commodity)   Total   Index No.
       I    II   III        I     II    III                   [∑P]    [∑P ÷ N]
1995 4 5 6 100 100 100 300 100.00
1996 5 7 9 125 140 150 415 138.33
1997 6 8 12 150 160 200 510 170.00
1998 6 10 12 150 200 200 550 183.33
1999 8 13 15 200 260 250 710 236.67
2000 10 15 15 250 300 250 800 266.67
For each commodity the price in a year is divided by that in 1995 and is multiplied by 100 to get
the price relative. Using A.M., the price indices are calculated and are given in the last column of the
above table.

For the first year which is the base year, fixed base index number as well as each P is 100.
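
The fixed base calculation described above can be sketched in Python as follows. The function name and the dictionary layout are assumptions made for illustration.

```python
def fixed_base_indices(prices, base=0):
    """Unweighted A.M. of price relatives for every year, all relative to one base year."""
    series = list(prices.values())
    n_years = len(series[0])
    indices = []
    for t in range(n_years):
        # Price relative of each commodity with respect to the base year
        relatives = [s[t] / s[base] * 100 for s in series]
        indices.append(sum(relatives) / len(relatives))
    return indices

# Prices of commodities I, II and III for 1995 to 2000 (1995 is the base)
prices = {
    "I": [4, 5, 6, 6, 8, 10],
    "II": [5, 7, 8, 10, 13, 15],
    "III": [6, 9, 12, 12, 15, 15],
}
fbi = fixed_base_indices(prices)
print([round(v, 2) for v in fbi])
```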

CHAIN BASE INDEX

Problem: 1 Prepare index numbers from the average prices of three groups of commodities given below
by taking the base year 1998 and the weights as 5, 3, and 2 respectively.

Group 1998 1999 2000 2001 2002


I 50 55 52 49 55
II 4 5 3 5 6
III 10 10 11 10 9
Solution:

Year   Prices            Price Relatives [P]   WP                ∑WP    F.B.I
       I    II   III     I     II    III       I     II    III
1998 50 4 10 100 100 100 500 300 200 1000 100.0
1999 55 5 10 110 125 100 550 375 200 1125 112.5
2000 52 3 11 104 75 110 520 225 220 965 96.5
2001 49 5 10 98 125 100 490 375 200 1065 106.5
2002 55 6 9 110 150 90 550 450 180 1180 118.0

The price of each commodity in every year is divided by its price in 1998 and is multiplied by
100 to get the price relative (P). The price relatives of the three commodities are multiplied by 5, 3, and
2 respectively to get WP values. They are added year-wise (∑WP) and the total is divided by 10
(∑W) to get fixed base index numbers.
Problem: 2 from the following prices of three groups of commodities for the years 1993 to 1997 find
the chain base index numbers.

Groups 1993 1994 1995 1996 1997


I 4 6 8 10 12
II 16 20 24 30 36
III 8 10 16 20 24
Solution:

Year   Prices           Link Relatives (P)       Total    Mean     Chain base
       I    II   III    I        II     III      ∑P       ∑P/N     index
1993   4    16   8      100.00   100    100      300.00   100.00   100.00
1994   6    20   10     150.00   125    125      400.00   133.33   133.33
1995   8    24   16     133.33   120    160      413.33   137.78   183.70
1996   10   30   20     125.00   125    125      375.00   125.00   229.63
1997   12   36   24     120.00   120    120      360.00   120.00   275.56
The price of each commodity in every year is divided by its price in the preceding year
and is multiplied by 100 to get the link relative (P) As no weight is given, link relatives are added
year wise and the total is divided by 3. The average of each year is multiplied by the chain index
number of the preceding year and is divided by 100 to get the chain index number of that year.
For the first year(1993) the link relatives and the chain base index number are taken as 100 each.
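
The chaining procedure just described can be sketched in Python as below. The function name and data layout are assumptions made for illustration; unweighted link relatives are averaged because no weights are given.

```python
def chain_base_indices(prices):
    """Chain base index numbers from unweighted means of link relatives."""
    series = list(prices.values())
    n_years = len(series[0])
    chain = [100.0]  # the first year is taken as 100
    for t in range(1, n_years):
        # Link relative: price relative to the preceding year
        links = [s[t] / s[t - 1] * 100 for s in series]
        mean_link = sum(links) / len(links)
        # Chain index = mean link relative x preceding chain index / 100
        chain.append(mean_link * chain[-1] / 100)
    return chain

# Prices of groups I, II and III for 1993 to 1997
prices = {
    "I": [4, 6, 8, 10, 12],
    "II": [16, 20, 24, 30, 36],
    "III": [8, 10, 16, 20, 24],
}
cbi = chain_base_indices(prices)
print([round(v, 2) for v in cbi])
```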

COST OF LIVING INDEX

Problem:1 Construct cost of living index, for 2000 taking 1999 as the base year from the following data
using ‘Aggregate Expenditure’ Method.

Article   Quantity in 1999 (kg)   Price (Rs per kg)
                                  1999      2000
A         6                       5.75      6.00
B         1                       5.00      8.00
C         6                       6.00      9.00
D         4                       8.00      10.00
E         2                       2.00      1.80
F         1                       20.00     15.00

Solution:

Article   Quantity 1999 (q0)   Price 1999 (p0)   Price 2000 (p1)   p1q0     p0q0
A         6                    5.75              6.00              36.00    34.50
B         1                    5.00              8.00              8.00     5.00
C         6                    6.00              9.00              54.00    36.00
D         4                    8.00              10.00             40.00    32.00
E         2                    2.00              1.80              3.60     4.00
F         1                    20.00             15.00             15.00    20.00
Total                                                              156.60   131.50

$$\text{Cost of living index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{156.60}{131.50} \times 100 = 119.09$$
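
The aggregate expenditure method can be checked with a few lines of Python. The function name is an assumption made for illustration, and the quantity of article D is taken as 4 kg, the value used in the solution table.

```python
def cost_of_living_aggregate(q0, p0, p1):
    """Aggregate expenditure method: sum(p1 x q0) / sum(p0 x q0) x 100."""
    current_cost = sum(p * q for p, q in zip(p1, q0))  # base year basket at current prices
    base_cost = sum(p * q for p, q in zip(p0, q0))     # base year basket at base prices
    return current_cost / base_cost * 100

# Articles A to F
quantity_1999 = [6, 1, 6, 4, 2, 1]
price_1999 = [5.75, 5.00, 6.00, 8.00, 2.00, 20.00]
price_2000 = [6.00, 8.00, 9.00, 10.00, 1.80, 15.00]

index = cost_of_living_aggregate(quantity_1999, price_1999, price_2000)
print(f"{index:.2f}")
```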

Problem:2 Calculate the cost of living index number from the following data.

Item Base year price Current year price Weight


Food 39 47 4
Fuel 8 12 1
Clothing 14 18 3
House Rent 12 15 2
Miscellaneous 25 30 1
Solution:
Item            P0    P1    Weight W   P = (P1/P0) × 100   WP
Food            39    47    4          120.51              482.04
Fuel            8     12    1          150.00              150.00
Clothing        14    18    3          128.57              385.71
House Rent      12    15    2          125.00              250.00
Miscellaneous   25    30    1          120.00              120.00
Total                       ∑W = 11                        ∑WP = 1387.75

$$\text{Cost of living index number} = \frac{\sum WP}{\sum W} = \frac{1387.75}{11} = 126.16$$

Problem: 3 Using geometric mean, calculate the cost of living index number for the year 2000.

Commodity Price(1990) Price(2000) Weight


Food 60 108 40
Clothing 50 94 17
fuel 40 65 13
House Rent 125 225 27
Miscellaneous 120 240 3
Solution:

Commodity       P0     P1     W     P = (P1/P0) × 100   log P    W log P
Food            60     108    40    180.0               2.2553   90.2120
Clothing        50     94     17    188.0               2.2742   38.6614
Fuel            40     65     13    162.5               2.2109   28.7417
House Rent      125    225    27    180.0               2.2553   60.8931
Miscellaneous   120    240    3     200.0               2.3010   6.9030
Total                         ∑W = 100                           ∑W log P = 225.4112

$$\text{Cost of living index number} = \text{Antilog}\left(\frac{\sum W \log P}{\sum W}\right) = \text{Antilog}\left(\frac{225.4112}{100}\right) = \text{Antilog } 2.2541 = 179.51$$
UNIT – V

Analysis of Time series

Definition of Time series:

A time series is a collection of observations made sequentially in time.

The series of values might have been observed at regular intervals of time such as daily sales,
Annual profits and decennial census.

E.g. Year: 1991 1992 1993 1994

Production of gold: 121 101 130 132.

Uses of Time Series:


Variables such as Sales, Production, Profit and Population have different values at different
points of time.
(i) The Analysis of Time series helps to know the past conditions.

The observations at the past periods of time indicate the conditions which existed. A
detailed study enables us to know further.

(ii) It helps in assessing the present conditions.

If the past conditions had continued what would be the present position?

What is the actual position now? What are the causes for the difference? Are we satisfied
with the present? Thinking in these lines helps not only to assess the present but also to plan
for the future.

(iii) It helps to predict reliably.

There are many methods in Statistics to estimate the value of a variable at a certain time in
the future. Arguments for and against each method are available in plenty. It
has been found that the forecasts by analysis of time series are most reliable.

(iv) It facilitates Comparison.

Relevant time series could be compared and vital inference be drawn.


For example, the production of motor cycles of two companies can be compared over a
period of time. Whether their market shares are increasing or decreasing could be seen. History
repeats itself. It may be worthwhile to watch one series to know the future of a similar series.
(v) It forewarns.
As it predicts the future most reliably, the future could be met with due preparedness. If the
sales in a cloth shop are likely to fall, an advertisement campaign can be tried to increase the
sales, the services of certain staff may be terminated, unnecessary godown facilities may
be surrendered, etc. In short, losses, if any, could be minimized. Profits, if any, could be
maximized.
Thus, wherever time-related series of values are there, such as in Economics,
Business, Research and Planning, the analysis of time series provides the opportunity to
see them in proper perspective.

Components of Time series:

 Long – Term Effect


 Short term Fluctuations.

Long Term – Effect:

1. Secular Trend

Short term variations:


2. Seasonal fluctuations
3. Cyclical fluctuations
4. Irregular variations.

1) Secular trend:

• The general tendency of the time series data to increase, decrease or stagnate during a
long period of time is called secular trend or long-term trend.
• The concept of trend does not include short-range oscillations, but rather steady movements over
a long time.
• This phenomenon is usually observed in most of the series relating to economics and business.
• Upward tendency: e.g. population, production, price, income.
• Downward tendency: e.g. deaths, epidemics.
• Trend is the general, smooth, long-term average tendency.
Mathematically, trend may be
i) Linear or ii) Non – linear.

Graphically, linear trend is a straight line. The discussion in this chapter is restricted to linear
trend. Parabolic trend equation, if necessary, can be formed as explained in ‘Method of Least
Squares’. Trend is the major component. All the other components put together are generally
small.

2) Seasonal fluctuation:

A season is a period which is less than one year. It may be a period of 6 months, 4 months,
3 months, 1 month, etc. A certain nature is observed in the first season, another nature is observed in
the second season, etc. Further, the same nature is observed in a season in every year. In other words, the
different natures recur year after year at the respective seasons. These variations over time are called
seasonal fluctuations.

The factors which cause seasonal variations are of the following two kinds:

1. Climate and weather conditions

2. Customs, traditions and habits of the people

(i) Climate and weather conditions: Sales of ice-cream, khadi and cotton clothes, etc. are higher
during summer. Sales of umbrellas are at their peak during the rainy season. Production of paddy,
wheat, etc. is more in a few months and less in the other months of a year. Climate and weather
cause this kind of variation.
(ii) Customs, traditions and habits of the people: Sales of crackers and fireworks are found to be higher
during Deepavali every year. Cloth shops register very good sales during festival seasons such
as Deepavali, Pongal, Ramzan and Christmas. The work load of sorting and delivering greetings
is also heavy during these seasons. All these
variations in sales, work load, etc. are due to the customs, traditions and habits of the people.

3. Cyclical Fluctuations:
Cyclical fluctuations are similar to seasonal variations. The difference is in the interval of
recurrence. In seasonal fluctuations, a nature of the series recurs at an interval of one year. A cyclical
fluctuation recurs at an interval of 3 or more years. The fitting example is the business cycle. In
economics and business, there are many time series which have certain wave-like movements
called business cycles. In one period, profits are easily made and
are made in plenty also. Prices are high. This period is called prosperity. After this (peak)
condition, things decline instead of improving. High wages, decreasing efficiency, increasing
interest rates, etc. cause the decline. This is the period of recession. After touching the bottom,
which is called depression, the condition improves. The recovery from depression leads to
prosperity. The four phases of a business cycle, namely, (i) Prosperity (ii) Recession (iii)
Depression and (iv) Recovery recur one after another regularly.

[Diagram: Business Cycle]

4. Irregular variations:
Variations which do not come under the other three components are called irregular
variations. The other three components have certain regularity, but this one is irregular. Fire, floods,
earthquakes, wars, lock-outs, strikes, etc. cause irregular variations. Sometimes the causes
of irregular variations are known, as above. Sometimes the causes may not be known. For example, there
may be very poor sales on a particular day in a leading cloth shop on the eve of Deepavali. The cause
of such a happening may not be known.
Irregular variation is also called random variation or erratic fluctuation.
Models: There exist certain relations between the components and the series of
observations. The relation between the observed value and the components is called a model. Many
models exist. In this book, only two models are considered.
Let Y be the observed data, T or $Y_t$ be the trend, S be seasonal variation, C be cyclical
variation and I be irregular variation.

(i) Additive Model:


According to this model, Y = T+S+C+I

When short-term variations are to be found as per this model,

Short-term variation = Y - Y_t

(ii) Multiplicative Model:

According to this model, Y = T×S×C×I

Many time series in Economics and Business are found to follow the multiplicative model. A few
other series are found to follow the additive model.
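
The two models can be compared on a small example. The following is a minimal Python sketch, not a prescribed procedure. The figures in `observed` and `trend` are hypothetical, and expressing the multiplicative short-term part as a percentage of trend is an assumption of this sketch.

```python
# Hypothetical observed values (Y) and trend values (T) for three periods.
observed = [110, 100, 121]
trend = [100, 105, 110]

# Additive model, Y = T + S + C + I: short-term variation is the difference Y - T.
additive_short_term = [y - t for y, t in zip(observed, trend)]

# Multiplicative model, Y = T x S x C x I: the short-term part is the ratio Y / T,
# shown here as a percentage of trend (an assumption of this sketch).
multiplicative_short_term = [round(100 * y / t, 2) for y, t in zip(observed, trend)]

print("Y - T      :", additive_short_term)
print("100 * Y / T:", multiplicative_short_term)
```

Under the additive model the short-term part is in the units of the data. Under the multiplicative model it is a pure ratio, where 100 means "exactly on trend".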

Long term: 1. Secular trend


There are four methods to estimate secular trend. They are:
1. Graphical Method
2. Method of Semi Averages
3. Method of Moving averages
4. Method of Least Squares.

(i) Graphic method:


• It is also known as the free-hand method. The X axis represents time and the Y axis represents the
observed data.
• Corresponding to each pair of time and observed value, a point is marked on a graph
sheet. After marking all such points, the best line is drawn. It is the trend line.
• The trend at any point of time can be found from that line. All the marked points do not
lie on a line. Hence the line is drawn such that the following three conditions are satisfied.

I. The number of points above the line is equal to the number of points below the line, as far as
possible.
II. The sum of the vertical distances of the points above the line equals that of the points below
the line.
III. The sum of the squares of the vertical distances of all the points from the line is the minimum.
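
The three conditions can be checked numerically for any candidate line. The following is a minimal Python sketch using the production figures of Problem 1 of this unit. The candidate line (25 units in 1998, rising by 1.4 units a year) is a hypothetical free-hand line chosen only for illustration, and the function name `check_line` is not from the text.

```python
years = [1995, 1996, 1997, 1998, 1999, 2000, 2001]
production = [20, 22, 25, 26, 25, 27, 30]

def check_line(xs, ys, origin, level, slope):
    """Return (points above, points below, net deviation, sum of squared deviations)."""
    deviations = [y - (level + slope * (x - origin)) for x, y in zip(xs, ys)]
    above = sum(1 for d in deviations if d > 0)   # condition I
    below = sum(1 for d in deviations if d < 0)   # condition I
    net = sum(deviations)                         # condition II: should be near zero
    squares = sum(d * d for d in deviations)      # condition III: smaller is better
    return above, below, net, squares

# Hypothetical free-hand line: 25 units in 1998, rising 1.4 units per year.
above, below, net, squares = check_line(years, production, 1998, 25.0, 1.4)
print(above, below, round(net, 2), round(squares, 2))
```

A net deviation of zero means the distances above the line balance those below it. Comparing the sum of squares for two candidate lines shows which one satisfies condition III better.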

Merits:

1. It is a simple method.
2. It is flexible. Based on the positions of the points, a trend line or a trend curve (non-linear) can be
drawn.

Demerits:

1. It is subjective. Different persons get different trend lines (or trend curves).
2. It cannot be relied upon for prediction because of its subjective character.

(ii) Method of semi average:

The time series is divided into two halves.

• When there is an even number of years, the middle most year and the arithmetic mean of the
observed values are found for each half.
• When there is an odd number of years, the middle most year of the series and the corresponding observed
value are omitted. The middle most year and the arithmetic mean of the observed values
are then found for each half.
• Based on them, two points are marked on a graph sheet. The two points are joined by a straight
line which is extended on either side. It is the trend line.
• The trend at any point of time can be found from that line. Only two points are marked on the graph.
There is no difficulty in drawing the line through the two points.
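
The steps above can be sketched in Python. This is an illustrative sketch only: the function name `semi_average_points` is hypothetical, the years are assumed to be equally spaced (so the mean of the years in a half is its middle most year), and the data are the seven production figures of Problem 1, used here to show the odd case.

```python
def semi_average_points(years, values):
    """Return the two (middle most year, mean value) points of the semi-average method."""
    n = len(years)
    half = n // 2  # size of each half; the middle year is left out when n is odd
    points = []
    for part in (slice(0, half), slice(n - half, n)):
        mid_year = sum(years[part]) / half      # valid for equally spaced years
        mean_value = sum(values[part]) / half   # arithmetic mean of the half
        points.append((mid_year, mean_value))
    return points

years = [1995, 1996, 1997, 1998, 1999, 2000, 2001]
production = [20, 22, 25, 26, 25, 27, 30]

# 1998 is omitted; the halves are 1995-1997 and 1999-2001.
first, second = semi_average_points(years, production)
print(first, second)
```

The two points returned are the ones to be marked on the graph sheet and joined by the trend line.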

Merits:

1. It is not a subjective method.

2. It involves very simple calculations. It is easy to adopt.

3. The trend at any point of time can be found.

Demerits:

1. It is not flexible.
2. It is based on the arithmetic mean, so extreme values affect it.

(iii) Method of Moving Averages :(odd and Even)


The method of moving averages is one of the most useful methods of estimating trend. It is
an algebraic method. A graph sheet is not used for calculating the trend.

For a series, there is only one arithmetic mean, but there are many moving averages. Moving
totals are found and they are divided by the appropriate number to get the moving averages. The
following two cases arise:
Case 1. Period of Moving Averages is an odd number such as 3 or 5 or 7……

Moving totals are found and written against the middle most years. Each moving total is divided
by the period of moving average and the corresponding moving average is found.
The moving average is the trend. If short-term fluctuations are required, the trend is subtracted from the observed
value.

Let a, b, c, …. be the observed values. When 3 yearly moving averages are required, a+b+c,
b+c+d, c+d+e, …. are the moving totals corresponding to the second, third, fourth, …. years. Each total is
then divided by 3 to get the moving average.

That is, (a+b+c)/3, (b+c+d)/3, (c+d+e)/3, …. are the moving averages corresponding to the second, third,
fourth, …. years. There is no moving total or moving average corresponding to the first year and the last
year.

When 5 yearly moving averages are required, a+b+c+d+e, b+c+d+e+f, c+d+e+f+g, ….
are the moving totals corresponding to the third, fourth, fifth, …. years. Each total is then divided by 5 to
get the moving average. That is, (a+b+c+d+e)/5, (b+c+d+e+f)/5, (c+d+e+f+g)/5, …. are the moving
averages corresponding to the third, fourth, fifth, …. years. For the first two years and the last two years,
there is no moving total or moving average.

7 yearly, 9 yearly, …. moving averages are calculated in a similar manner.

Case 2. Period of Moving Averages is an even number such as 4 or 6 or 8……

The mid years of the moving totals are not the given years in this case. Hence, 2 period moving totals of
the moving totals are found. The given years are the mid years of these totals. The 2 period moving
totals are divided by twice the period of the moving average to get the centered moving averages. The
centered moving averages are the trend values.

Merits:

1. It is a simple method. The calculations are easy.


2. It is an objective method. For a given problem, everyone gets the same moving averages.
3. It is highly suitable when there is considerable fluctuation in the data.

Demerits:

1. The calculations are tedious when the period of the moving average is large and an even number.
2. The period of the moving average should suit the nature of the series, or else a distorted picture of the
time series will emerge.

(iv) Method of Least Squares:

By taking the time (X) as the independent variable and the observed values (Y) as the dependent
variable, the trend line of the form Y = a + bX can be fitted in the same way as a regression line.
The method is adopted here as such and is applied without repeating its mathematical
derivation.
Merits:

1. The method of least squares is an objective method. Everyone gets the same trend equation for a given set of
data.
2. The trend line obtained by this method is called the line of best fit: ∑(y - y_t) = 0 and ∑(y - y_t)^2 is
the least for the line.

Demerits:

1. It is neither simple nor easy. It requires more time than the other methods.
2. Extreme values affect the results unduly unlike in the method of moving average.

SEASONAL FLUCTUATIONS:

The following four methods are used to estimate the seasonal variations.

1. Method of simple average


2. Method of moving average
(a) Difference from moving average
(b) Ratio – to – moving average
3. Ratio – to – trend method
4. Method of link relatives.

1. Method of simple average:


This method assumes the absence of trend in a time series. The following are the steps.
i. The data are arranged season-wise in chronological order.
ii. For each season, the total of the seasonal values is found. It is called the seasonal total.
iii. Each seasonal total is divided by the number of years and the seasonal average is obtained.
iv. The total and the average of the seasonal averages are found. The average is called the grand
average.
v. The seasonal index of every season is calculated as follows.
Seasonal index = (seasonal average / grand average) × 100

Merits:

1. It is the easiest method.


2. It is simple and the least time-consuming method.

Demerits:

1. It assumes the absence of trend in a time series. This assumption is not always true.
2. It assumes that the averaging process eliminates the cyclical and irregular fluctuations. This is also not always true.

ANALYSIS OF TIME SERIES

(i) GRAPHIC METHOD


Problem 1: Draw the trend line by graphic method and estimate the production in 2003.

Year          1995   1996   1997   1998   1999   2000   2001
Production      20     22     25     26     25     27     30

Solution:
The year is represented on the X axis and production on the Y axis. The points (1995,20), (1996,22),
(1997,25), (1998,26), (1999,25), (2000,27) and (2001,30) are marked on a graph sheet. A central
line through the middle of those points is drawn such that the line satisfies the three conditions.

Graph 1. Trend line by the graphic method. Corresponding to X = 2003, the Y coordinate of the
point on the line is found to be 32.2. Thus, the estimated production in the year 2003 is 32.2 units.

(ii) METHOD OF SEMI AVERAGE:

Problem:2 The sales in tonnes of a commodity varied from 1990 to 2001 as under:
280, 300, 280, 280, 270, 240, 230, 220, 220, 210, 200. Fit a trend line by the method of semi-averages and estimate
the sales in 2002.

Year    Sales in tonnes    Middle most year    Mean sales

1990    280
1991    300
1992    280                1992.5              1650/6 = 275.0
1993    280
1994    270
1995    240
1996    230
1997    220                1998.5              1290/6 = 215.0
1998    220
1999    210
2000    200

Graph 2. Trend line by the method of semi-averages. The points (1992.5, 275.0) and (1998.5, 215.0) are
marked on a graph sheet. A line is drawn through them. It is the trend line. Corresponding to X = 2002, Y =
180 from the line. Hence, the estimated sales in 2002 are 180 tonnes.
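
The reading from the graph can be cross-checked by arithmetic. The following is a minimal Python sketch, assuming the trend line is the straight line through the two semi-average points stated above. The names `p1`, `p2` and `trend` are illustrative only.

```python
# The two semi-average points: (middle most year, mean sales in tonnes).
p1 = (1992.5, 275.0)
p2 = (1998.5, 215.0)

# Slope of the line joining the two points: change in sales per year.
slope = (p2[1] - p1[1]) / (p2[0] - p1[0])

def trend(year):
    """Trend value read from the straight line through p1 and p2."""
    return p1[1] + slope * (year - p1[0])

estimate_2002 = trend(2002)
print(slope, estimate_2002)
```

The slope shows that sales fall by 10 tonnes a year along the trend line, which gives the same estimate for 2002 as the graph.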

3. METHOD OF MOVING AVERAGE:

Case 1. Period of moving average is an odd number such as 3 or 5 or 7……

Problem:3 Calculate the 5 yearly moving averages of the number of students studying in a commerce college
as shown by the following figures:
Year No. of students Year No. of students
1987 332 1992 405
1988 311 1993 410
1989 357 1994 427
1990 392 1995 405
1991 402 1996 438

Solution:

Year    No. of students    Moving totals    5 yearly moving average

1987    332                -                -
1988    311                -                -
1989    357                1794             358.8
1990    392                1867             373.4
1991    402                1966             393.2
1992    405                2036             407.2
1993    410                2049             409.8
1994    427                2085             417.0
1995    405                -                -
1996    438                -                -
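
The moving totals and averages can be recomputed mechanically. The following is a minimal Python sketch for an odd period. The function name `moving_average` is hypothetical, and `None` is used here to mark the years that have no moving total.

```python
def moving_average(values, period):
    """Moving totals and averages for an odd period, written against the middle year."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    totals = [None] * len(values)
    averages = [None] * len(values)
    for mid in range(half, len(values) - half):
        total = sum(values[mid - half: mid + half + 1])
        totals[mid] = total
        averages[mid] = total / period
    return totals, averages

years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]

totals, averages = moving_average(students, 5)
for year, total, average in zip(years, totals, averages):
    print(year, total, None if average is None else round(average, 1))
```

The first two and the last two years print `None`, in line with the rule that a 5 yearly moving average has no value for those years.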

Case 2: period of moving average is an even number such as 4 or 6 or 8……

Problem: 4 Using four yearly moving averages calculate the trend values and short term fluctuations.

Year          1981   1982   1983   1984   1985   1986   1987   1988   1989   1990
Production     464    515    518    467    502    540    557    571    586    612
Solution:

                      4 yearly         2 period         4 yearly centered        Short term
Year    Production    moving totals    moving totals    moving average (y_t)     fluctuation (y - y_t)
1981    464           -                -                -                        -
1982    515           1964             -                -                        -
1983    518           2002             3966             495.75                   22.25
1984    467           2027             4029             503.63                   -36.63
1985    502           2066             4093             511.63                   -9.63
1986    540           2170             4236             529.50                   10.50
1987    557           2254             4424             553.00                   4.00
1988    571           2326             4580             572.50                   -1.50
1989    586           -                -                -                        -
1990    612           -                -                -                        -
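
The centering steps can be traced in code. The following is a minimal Python sketch, assuming the ten-year production series that reproduces the 4 yearly moving totals 1964, 2002, …, 2326 of the table. The variable names are illustrative only.

```python
years = list(range(1981, 1991))
production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]
period = 4

# Step 1: 4 yearly moving totals (each falls between two given years).
moving_totals = [sum(production[i:i + period])
                 for i in range(len(production) - period + 1)]

# Step 2: 2 period moving totals of those totals (each falls on a given year).
two_period_totals = [a + b for a, b in zip(moving_totals, moving_totals[1:])]

# Step 3: divide by twice the period to get the centered moving averages (trend).
centered = [t / (2 * period) for t in two_period_totals]

# Step 4: short-term fluctuation = observed value - trend.
first = period // 2  # index of the first year that has a trend value
short_term = [production[first + k] - y_t for k, y_t in enumerate(centered)]

for k, y_t in enumerate(centered):
    print(years[first + k], production[first + k], round(y_t, 2), round(short_term[k], 2))
```

The first centered value belongs to the third year and the last to the third year from the end, so two years at each end of the series have no trend value.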
4. METHOD OF LEAST SQUARES:

Problem: 5 Fit a straight line trend equation to the following data by the method of least squares and
estimate the value of sales for the year 1985

Year             1979   1980   1981   1982   1983
Sales (in Rs)     100    120    140    160    180
Solution:

Let Y = a + bX be the equation of the trend line, where X is the year and Y is the sales.

As the values of X are large, consider x = X - X̄ = X - 1981.

Let the resulting equation be y = A + Bx, where y = Y.

For finding the values of A and B, the normal equations are
∑y = NA + B∑x
∑xy = A∑x + B∑x^2

Year X    Sales Y = y    x = X - 1981    xy           x^2          Trend y_t
1979      100            -2              -200         4            100
1980      120            -1              -120         1            120
1981      140            0               0            0            140
1982      160            1               160          1            160
1983      180            2               360          4            180

Total     ∑y = 700       ∑x = 0          ∑xy = 200    ∑x^2 = 10    ∑y_t = 700

By substituting the values from the table,

5A + 0B = 700 ∴ A = 140

0A + 10B = 200 ∴ B = 20

The trend equation is y = 140 + 20x

That is, Y = 140 + 20(X - 1981)

Corresponding to different values of X, the right hand side gives the trend component (y_t). Hence,
the equation is written as

y_t = 140 + 20(X - 1981)

Putting X = 1979, trend y_t = 140 + 20(-2) = 100

Putting X = 1980, trend y_t = 140 + 20(-1) = 120

Putting X = 1981, trend y_t = 140 + 20(0) = 140

Putting X = 1982, trend y_t = 140 + 20(1) = 160

Putting X = 1983, trend y_t = 140 + 20(2) = 180

Putting X = 1985, trend y_t = 140 + 20(4) = 220

Thus, the estimated sales for the year 1985 are Rs 220.
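
The fit can be reproduced in a few lines. The following is a minimal Python sketch, taking the 1982 sales as 160, the value used in the solution table (it gives ∑y = 700). Because the years are measured from their mean, ∑x = 0 and the normal equations reduce to A = ∑y/N and B = ∑xy/∑x^2. The names `origin` and `trend` are illustrative only.

```python
years = [1979, 1980, 1981, 1982, 1983]
sales = [100, 120, 140, 160, 180]

n = len(years)
origin = sum(years) / n                  # mean year, used as the origin for x
x = [X - origin for X in years]          # deviations, so that sum(x) is zero

# With sum(x) = 0 the normal equations give A and B directly.
A = sum(sales) / n
B = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi * xi for xi in x)

def trend(year):
    """Trend value y_t = A + B * (year - origin)."""
    return A + B * (year - origin)

print(A, B, [trend(X) for X in years], trend(1985))
```

For a series with an even number of years the mean year falls between two years, so the deviations are halves. The same formulas still apply.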

SEASONAL FLUCTUATIONS

MEASURE OF SEASONAL VARIATION:

The following four methods are used to estimate the seasonal variations.

5. Method of simple average


6. Method of moving average
(c) Difference from moving average
(d) Ratio – to – moving average
7. Ratio – to – trend method
8. Method of line relatives.

1. METHOD OF SIMPLE AVERAGE:


Problem: 6

Assuming no trend in the series, calculate seasonal indices for the following data.

QUARTER
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82
Solution:

                    QUARTER
Year                I        II       III      IV       Total    Grand average
1994                78       66       84       80
1995                76       74       82       78
1996                72       68       80       70
1997                74       70       84       74
1998                76       74       86       82
Seasonal total      376      352      416      384
Seasonal average    75.2     70.4     83.2     76.8     305.6    76.4
Seasonal index      98.4     92.2     108.9    100.5    400.0    -
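
The table can be recomputed step by step. The following is a minimal Python sketch of the method of simple averages applied to the quarterly data of this problem. The dictionary layout and the variable names are choices made for this sketch only.

```python
# Quarterly data: year -> values for quarters I, II, III, IV.
data = {
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}

number_of_years = len(data)

# Seasonal total and seasonal average of each quarter (column-wise).
seasonal_totals = [sum(quarter) for quarter in zip(*data.values())]
seasonal_averages = [total / number_of_years for total in seasonal_totals]

# Grand average = average of the seasonal averages.
grand_average = sum(seasonal_averages) / len(seasonal_averages)

# Seasonal index = (seasonal average / grand average) x 100.
seasonal_indices = [100 * average / grand_average for average in seasonal_averages]

print(seasonal_totals)
print(seasonal_averages, round(grand_average, 1))
print([round(index, 2) for index in seasonal_indices])
```

The indices add up to 400 for quarterly data, which is a useful check on the arithmetic. An index above 100 marks a quarter that is above the average level, and an index below 100 marks one that is below it.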
