
UNIT-I

INTRODUCTION TO STATISTICS:
The word statistics is derived from the Latin word ‘Status’, the Italian word ‘Statista’ or
the German word ‘Statistik’, each of which means a political state. The term statistics was applied to
mean facts and figures which were needed by the state in its day-to-day life. Statistics was
regarded as a by-product of the administrative activities of the State. Statistics is a tool for solving or
analyzing the problems of the State.

The word “statistics” is used in two different senses: plural and singular. When used in the
plural sense, statistics means a set of numerical data; when used in the singular sense, it means the
science of statistical methods embodying the theory and techniques used for collecting, analyzing
and drawing inferences from numerical data.

STATISTICS DEFINITION:
➢ “Statistics is the science which deals with the collection, classification, presentation,
analysis and interpretation of numerical data.”
➢ “Statistics may be called the science of counting”. - A. L. Bowley
➢ “Statistics may rightly be called the science of averages”. - A. L. Bowley
➢ “Statistics is the science of estimates and probabilities”. - Boddington

CHARACTERISTICS OF STATISTICS:
✓ Statistics are aggregates of facts.
✓ Statistics are affected to a marked extent by a multiplicity of causes.
✓ Statistics are numerically expressed.
✓ Statistics should be enumerated or estimated.
✓ Statistics should be collected with reasonable standard of accuracy.
✓ Statistics should be collected in a systematic manner for a pre-determined purpose.
✓ Statistics should be placed in relation to each other.

1. STATISTICS ARE AGGREGATES OF FACTS:

Statistics deals with groups, not individual items. For instance, one accident, one
birth, one death, etc., cannot be called statistics. But the aggregate of figures relating to accidents,
births, deaths, etc., over different times or places can be called statistics.

A single accident is not statistics. But the total number of accidents in a city during a
month is statistics.

2. STATISTICS ARE AFFECTED TO A MARKED EXTENT BY A MULTIPLICITY OF CAUSES:

Quantitative data or statistical data are influenced by a number of factors. The social
sciences (economics, history, sociology, etc.) are affected by many factors, and statistics is most
commonly used in the social sciences. For instance, a fall in the sales of a commodity is affected by a
number of factors: supply, demand, market conditions, general recession in trade, storage facilities,
currency, imports, exports, competition in the market and consumer taste. It is not possible to single out one
cause.

3. STATISTICS ARE NUMERICALLY EXPRESSED:

Numerical data constitute statistics. Students can be classified as very good, good,
average, poor, etc., on the basis of their performance in tests, but these are qualitative
expressions and are not statistics. In particular, qualitative characteristics such as honesty, beauty
and intelligence, which cannot be measured numerically, are not statistics.

4. STATISTICS SHOULD BE ENUMERATED OR ESTIMATED:


The numerical data pertaining to any field of enquiry can be obtained either by
enumeration or by estimation. If the field of enquiry is not large, enumeration can be conducted.
If the field of enquiry is wide and large, enumeration is not possible and in such cases, data can
be estimated.

5. STATISTICS SHOULD BE COLLECTED WITH REASONABLE STANDARD OF ACCURACY:

A reasonable standard of accuracy is needed in both enumeration and estimation. For
instance, if the weights of students are being measured, fractions of a kilogram can be ignored;
when measuring the distance from Madras to Kanyakumari, fractions of a kilometer can easily be
ignored. No hard and fast rule can be laid down for all cases. Hence mathematical accuracy
cannot be attained in statistical studies.

6. STATISTICS SHOULD BE COLLECTED IN A SYSTEMATIC MANNER FOR A PRE-DETERMINED PURPOSE:

The data should be collected in a systematic manner through some suitable plan. If not,
there will be wastage of time, energy and money.

For instance, if we collect income data from rich people only, ignoring the poor, it
will only inflate the national income data. The purpose of data collection must be decided in
advance, and the investigator must be aware of the purpose.

If the object is not known to the investigator, it is possible that he may collect
unnecessary data, which may not be of any use, while ignoring necessary data. Thus, without a
pre-determined purpose, the collected data may not yield the desired results.

7. STATISTICS SHOULD BE PLACED IN RELATION TO EACH OTHER:

Statistical data are mostly collected for the purpose of comparison. In order to make
valid comparison the data should be homogeneous, i.e., they should relate to the same
phenomenon or subject. For instance, weights of the boys in a class are to be compared with the
corresponding weights of boys in another class. But it would be meaningless to compare the
height of the student with the height of trees.

FUNCTIONS OF STATISTICS:
It presents the data in a definite form.
It studies the relationship between variables.
It simplifies complex data.
It provides a technique of comparison.
It plays a significant role in forecasting and planning.
It helps in formulating policies in business, industry and government organizations.
LIMITATIONS OF STATISTICS:
1. Statistics does not deal with individual items:
Statistics deals with groups or aggregates only. The scope of statistics lies outside
the study of individual facts.
2. Statistics deals with quantitative data only:
Statistics is a numerical statement of facts and deals only with quantitative
data. For example, per capita income, population growth, etc., can be studied by
statistics; but qualitative aspects such as honesty, intelligence, poverty, etc., cannot be
studied directly.

3. Statistics may lead to wrong conclusions in the absence of details:

If figures are given without details, we may arrive at wrong and misleading
conclusions.

4. Statistical laws are true only on average:

Statistical results are true only on the average. For example, if population statistics say
that the average life span in India is 45, it does not mean that all men die at the age of 45; it
is only the average age at death. Statistical laws are based on probability.
5. Statistics does not reveal the entire story:

Statistics simplifies complicated data. Before using the data, the background of
the data should be studied.

6. Statistical data should be uniform and homogeneous:

Comparison is one of the important characters of statistical data.


7. Statistics is liable to be misused:

This is the most important limitation of statistics. Statisticians must know the uses
and limitations of statistics. Only then can they make use of it to get fruitful results and
avoid dangerous, wrong and misleading results.
USES OF STATISTICS:
✓ It has a wide range of topics in biology.
✓ It has particular application to agriculture and medicine.
✓ It is used in the design and analysis of clinical trials in medicine.
✓ It is used in public health, services research, nutrition and environmental health.
✓ It is used in genomics and population genetics.
DATA:
✓ The raw material of statistics is data.
✓ It should be numerically expressed

DISCRETE VARIABLES:
Countable
They are obtained by enumeration (i.e., counting) and are also called discontinuous variables.
Example: number of children per family. We can count the number of children in a family
as 0, 1, 2, 3 and so on, but we cannot count 1.3 or 2.6 children per family.

CONTINUOUS VARIABLES:
Measurable, but not countable
They can take an infinite number of values between any two fixed points.
Example: length of fish measured in cm to the nearest mm. The length of a fish can be
measured as 1.5, 1.6, 1.7 cm and so on.

COLLECTION OF DATA:
It is the first step in a statistical enquiry.
It is to be planned and executed properly.
All the aspects of the survey, starting from planning and ending with the writing of the final
report, are broadly classified into two categories:

1. Planning a survey
2. Executing a survey

PLANNING A SURVEY:
Purpose of a survey
Scope of a survey
Nature of information required
Sources of data
Accuracy aimed

EXECUTING A SURVEY:
Setting up an administrative organization
Designing of forms
Selecting, training and supervising the field investigators
Reducing non response
Presenting the information
Analyzing the information
Preparing the reports

SOURCES OF COLLECTING DATA:

Primary:
Primary data are those statistical data which are collected for the first time and are
original in nature.
Primary data are those which are collected from the individual directly, and these data
have never been used for any purpose earlier.

Method of Collecting Primary Data:


Direct personal interviews
Indirect oral investigation
Information from correspondents
Mailed questionnaire method
Schedules sent through enumerators

DIRECT PERSONAL INTERVIEW:


The persons from whom information is collected are known as informants or
respondents.
The investigator personally meets them and asks questions to gather the necessary
information.

Merits:
1. Data are originally collected
2. True and reliable data
3. Higher degree of accuracy
4. Uniformity and homogeneity can be maintained
Demerits:
1. It is unsuitable when the area is large
2. It is expensive and time consuming

INDIRECT ORAL INVESTIGATIONS:
Under this method the investigator contacts witnesses, neighbours or friends who are
capable of supplying the necessary information.
This method is preferred if the required information is on addiction or the cause of a fire,
theft or murder.
For example, an alcohol addict may not willingly give information on how the habit
started, the quantity of his daily consumption, or how he feels with and without alcohol.
He may confide in his friend or his doctor but not in a social worker who is
collecting the information.

Merits:
1. It is simple and conventional
2. It saves time and money
3. It can be used in the investigation of a large area
Demerits:
1. The information cannot always be relied upon
2. An interview with an improper person will spoil the results
3. The careless attitude of the informant will affect the degree of accuracy

INFORMATION FROM CORRESPONDENTS:

Under this method, local agents or correspondents will be appointed. They collect the
information and transmit it to the office or person. This system is adopted by newspapers,
periodicals, agencies, etc., when information is needed in different fields.

Merits:
1. It is relatively cheap
2. Requires less time
Demerits:

1. Local agents and correspondents are not likely to be serious and careful.

MAILED QUESTIONNAIRE METHOD:


In this method a questionnaire consisting of a list of questions pertaining to the
enquiry is prepared and sent to all the informants.
There are blank spaces for answers, and the informants are expected to write their answers in
the blank spaces.
A covering letter is also sent along with the questionnaire, requesting the
respondents to extend their full co-operation by returning the questionnaire duly filled in and in
time.
Research workers, private individuals, non-official agencies, and state
and central governments adopt this method.

Merits:
1. It is relatively cheap
2. It is widely used when the area of investigation is large
3. It saves money and time
Demerits:
1. In this method there is no direct contact between the investigator and the respondent.
Therefore we cannot be sure about the accuracy and reliability of the data
2. This method is suitable only for literate people
3. People may not give the correct answers

QUESTIONNAIRE:

The list of questions is technically called questionnaire.


CHARACTERISTICS OF A GOOD QUESTIONNAIRE
1. The number of questions should be kept to a minimum
2. Questions should be in logical order, e.g., “Are you married? If yes, how many children do
you have?”
3. Questions should be short and simple
4. Questions requiring lengthy answers are to be avoided. Questions fetching yes (or) no
answers are preferable.
5. Personal questions are to be avoided
6. There should not be leading questions such as “Are you rich?” Instead, the question
can be “What is your annual income?”
7. The wording of the questions should be proper, without hurting the feelings of the respondent.
8. Questions should be capable of an objective answer
SCHEDULES SENT THROUGH ENUMERATORS:
It is the most widely used method of collecting primary data. A number of
enumerators are selected and trained.
They are provided with a standard questionnaire. Specific training and interviews
are given to them for filling up the schedules. Each enumerator will be in charge
of a certain area.
The enumerator goes to the informant along with the questionnaire, gets the
replies and records the answers. Public organizations and research institutions use
this method.
Merits:
1. It is very useful in extensive enquiries
2. It yields reliable and accurate results, because the enumerators are educated
and trained
3. The scope of the enquiry can also be greatly enlarged
4. This technique can be used even when the respondents are illiterate
Demerits:
1. It is very costly method as the enumerators are trained and paid for.
2. This method is time consuming, because the enumerators go personally to obtain the
information.
3. Personal bias of the enumerators may lead to false conclusion.
SECONDARY DATA:

➢ It is statistical information which has already been collected by someone for his own
purpose and is available for use by others. (Or)
➢ If the data have already been collected by some person (or) institution and are made
available for statistical investigation, they are known as secondary data.
SOURCES OF SECONDARY DATA

Published Sources
Unpublished sources

1. Published Sources:
Various governmental, International and local agencies publish statistical data, and chief
among them are:

International publications: International agencies and international bodies publish
regular and occasional reports on economic and statistical matters. They are the I.M.F., U.N.O., etc.
Official publications of Central and State Governments: Departments of the Union and
State Governments regularly publish reports on a number of subjects. They gather additional
information.
Semi-official publications: Semi-government institutions, like Municipal
Corporations, District Boards and Panchayats, publish reports.
Publications of research institutions: Indian Statistical Institute (I.S.I.), Indian
Council of Agricultural Research (I.C.A.R.), Indian Agricultural Statistics Research Institute
(I.A.S.R.I.), etc., publish the findings of their research programmes.
Publications of commercial and financial institutions.
Reports of various committees and commissions appointed by the Government: For
example, commission reports on taxation, pay commission reports, etc., are sources of
secondary data.
Journals and newspapers: Current and important material on statistics and socio-
economic problems can be obtained from journals and newspapers like Economic Times,
Indian Finance, Monthly Statistics of Trade, etc.

2. Unpublished Sources:
There are various sources of unpublished data. They are the records maintained by
various government and private offices, the researches carried out by individual research
scholars in the universities.

CLASSIFICATION AND TABULATION:

DEFINITION:
• The process of arranging data into groups according to some common characteristics.
• The process of arranging (or) grouping a large number of individual facts (or) observations on
the basis of similarity among the items is called classification.

OBJECTS (OR) PURPOSE:

To condense the mass of data


To present the facts in a simple form
To bring out clearly the points of similarity and dissimilarity
To facilitate comparison
To bring out the relationship
To prepare data for tabulation
To facilitate statistical treatment of the data
To facilitate easy interpretation
To eliminate unnecessary details.
DATA CAN BE CLASSIFIED ON THE BASIS OF THE FOLLOWING

❖ Geographical (or) spatial


❖ Chronological (or)temporal (or)historical
❖ Qualitative
❖ Quantitative

[Diagram: Classification, with four branches: Geographical, Chronological, Qualitative and Quantitative.]

Geographical (or) spatial classification:

The classification is based on place (or) region such as state, town, city or village.

Region          Number of cancer-affected persons
Tamil Nadu      …………..
Kerala          ………….
Maharashtra     ………….
Delhi           …………….
Kolkata         …………..

Chronological (or) temporal (or) historical:

The classification is based on time, with the data arranged chronologically (or) historically
by years, months, weeks, days, hours, etc.

Year Rainfall in cms


1990-91 …………..
1991-92 ………….
1992-93 ………….
1993-94 …………….
1994-95 …………..

Qualitative:

• Classification is based on attributes (or) characteristics; i.e., non-measurable
characteristics are qualitative.

Sex Number of students


Male 1407
Female 538
Total 1945

Qualitative classification can be of two types as follows:

1. Simple classification or one way classification


2. Manifold classification or two way classification

Simple classification or one way classification
One way classification means classification of data on the basis of only one consideration;
it is based on only one quality.
If the data are classified into only two classes, such as literate and illiterate or honest and
dishonest or skilled and unskilled, the classification is termed as simple classification.

For example:

[Diagrams: Population divided into Male and Female; Population divided into Literate and Illiterate.]

MANIFOLD CLASSIFICATION:

This is based on more than one quality. For example, college students can be
classified on the basis of three attributes (sex, subject of study and religion).
Manifold classification: In manifold classification, the universe is classified on the basis
of more than one attribute at a time.

[Diagram: Population is divided into Male and Female; each is divided into Literate and Illiterate; each of these is further divided into M and UM.]

QUANTITATIVE CLASSIFICATION:

Measurable characteristics are quantitative.

That is, the statistical data are classified according to numerical measurements such as age, height and
weight; a quantitative phenomenon is called a variable.
Some data can be classified in terms of magnitude. The marks of the students are
given below:

Marks Number of students


0-39 124
40-49 471
50-59 908
60-69 442
Total 1945

FREQUENCY DISTRIBUTION:

It is simply a table in which the data are grouped into classes and the number of cases which fall
in each class is recorded. Frequency distributions can be of two kinds:

1. Univariate Frequency Distribution


2. Bivariate Frequency Distribution ( Two – way Frequency Distribution)

INDIVIDUAL DATA:

For some statistical calculations, the series of individual observations is to be arranged in
either ascending or descending order. This is called an array.

Marks: 40 33 27 38 41 48 44 51 39 35.

Arraying

Observed Values   Ascending Order   Descending Order
40                27                51
33                33                48
27                35                44
38                38                41
41                39                40
48                40                39
44                41                38
51                44                35
39                48                33
35                51                27
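
The arraying step can be checked with a short Python sketch. It only sorts the ten marks listed above; the variable names are illustrative and not part of the original notes.

```python
# Marks of the ten students, in the order observed.
marks = [40, 33, 27, 38, 41, 48, 44, 51, 39, 35]

# An array is the same data arranged in order of magnitude.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print("Ascending :", ascending)
print("Descending:", descending)
```

The smallest observed mark (27) heads the ascending array and the largest (51) heads the descending array.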

DISCRETE FREQUENCY DISTRIBUTION:

Here each class is distinct and separate from the other classes. We have to count the number
of times each value of the variable is repeated in the data and it is called the frequency of that
class.

Consider the marks scored by 30 students

9 7 5 3 4 8 6 0 6 5 9 1 7 2 3 8 6 8 7 4 9 4 5 10 5 9 6 9 5 6

Form a discrete frequency table.

Marks Tally marks Number of student (Frequency)


0 l 1
1 l 1
2 l 1
3 ll 2
4 lll 3
5 llll 5
6 llll 5
7 lll 3
8 lll 3
9 llll 5
10 l 1
Total 30

RESULT:

The discrete frequency table is given below

Marks 0 1 2 3 4 5 6 7 8 9 10
No of student 1 1 1 2 3 5 5 3 3 5 1
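
The tally-mark procedure amounts to counting how often each value occurs. A minimal sketch using Python's standard `collections.Counter`, with the 30 marks given above (variable names are illustrative):

```python
from collections import Counter

# Marks scored by the 30 students, as listed in the example.
marks = [9, 7, 5, 3, 4, 8, 6, 0, 6, 5, 9, 1, 7, 2, 3,
         8, 6, 8, 7, 4, 9, 4, 5, 10, 5, 9, 6, 9, 5, 6]

# Counter plays the role of the tally column: one count per distinct value.
frequency = Counter(marks)

for value in sorted(frequency):
    print(value, frequency[value])
print("Total", sum(frequency.values()))
```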

CONTINUOUS OR GROUPED FREQUENCY DISTRIBUTION:

Continuous series is one where measurements are only approximations and are expressed
in class intervals. Collection of items, which cannot be exactly measured, but placed within
certain limits, is called continuous series.

Class limits:

The class – limits are the smallest or the lowest and the largest or the highest values in the
class. For example take the class 10-20. The lowest value is 10 and the highest value is 20. The
two boundaries of the class are known as the lower limit and the upper limit of the class. Class
limits are also known as class boundaries.

Class intervals:

The difference between the lower limit and the upper limit of the class is known as the
class-interval; for example in the class 10-20 the class interval is 10. (i.e., 20 -10).

1) Exclusive method (overlapping)
2) Inclusive method (non-overlapping)

Exclusive method:

Marks      No. of students
10 - 20    15
20 - 30    20
30 - 40    10
Total      45

Inclusive method:

Marks      No. of students
10 - 19    17
20 - 29    15
30 - 39    12
40 - 49    10
Total      54
Example: Marks obtained by 50 students.

78 25 25 50 30 29 55 52 43 43
44 20 48 44 43 58 36 46 48 47
56 60 31 47 53 65 68 73 59 12
34 74 79 20 16 70 65 39 60 45
60 20 47 49 51 38 49 35 52 61

Form a frequency table using inclusive method.

Marks Tally Marks No. of Students


10-14 l 1
15-19 l 1
20-24 lll 3
25-29 lll 3
30-34 lll 3
35-39 llll 4
40-44 llll 5
45-49 llll llll 9
50-54 llll 5
55-59 llll 4
60-64 llll 4
65-69 lll 3
70-74 lll 3
75-79 ll 2

Continuous Table:

Marks  10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64  65-69  70-74  75-79
f      1      1      3      3      3      4      5      9      5      4      4      3      3      2
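
The grouping into inclusive classes can be sketched in Python as follows. The class width of 5 and the starting value of 10 are taken from the table above; the variable names are illustrative.

```python
# Marks obtained by the 50 students, row by row as given in the example.
marks = [78, 25, 25, 50, 30, 29, 55, 52, 43, 43,
         44, 20, 48, 44, 43, 58, 36, 46, 48, 47,
         56, 60, 31, 47, 53, 65, 68, 73, 59, 12,
         34, 74, 79, 20, 16, 70, 65, 39, 60, 45,
         60, 20, 47, 49, 51, 38, 49, 35, 52, 61]

start, width = 10, 5  # inclusive classes 10-14, 15-19, ..., 75-79

# Prepare one counter per class so that the classes stay in order.
counts = {(lower, lower + width - 1): 0 for lower in range(start, 80, width)}

for mark in marks:
    # Lower limit of the inclusive class that contains this mark.
    lower = start + ((mark - start) // width) * width
    counts[(lower, lower + width - 1)] += 1

for (lower, upper), f in counts.items():
    print(f"{lower}-{upper}: {f}")
```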

TABULATION:

Tabulation is the process of arranging data systematically in rows and columns of a table.
It is designed to simplify presentation and facilitate comparison and analysis.

OBJECTS:

Large and complex data can be presented in a neat and compact form.
Nature of the data can be easily understood.
Much of the time which is otherwise necessary to look through the data is saved.
The data are so placed in a table that proper comparison is made easier.
A table facilitates further analysis of data.
A table is a convenient form for diagrammatic representation of data.
Voluminous data can be presented in a small space.
It remains a permanent record and enables ready reference.
Sometimes omissions and errors can be detected.
PARTS OF A TABLE:

Table number
Title
Prefatory note or head note
Stubs
Captions
Body of the table
Foot-notes
Source notes.
TABLE NUMBER:

A table should always be numbered for easy identification and future reference. The
table number may be placed at the top of the table, either in the centre above the title or on the left
side of the table.
TITLE:
Each table should be given a suitable title. It must describe the contents of the table.

PREFATORY NOTE OR HEAD NOTE:

It is a statement, given below the title and enclosed in brackets. For example, unit of
measurement such as crores of rupees.
STUBS:
These are the row headings. These constitute the first column and explain what the rows
are about.

CAPTIONS:
These are the column headings. These tell what the columns are about. There can be sub
headings.

BODY OF THE TABLE:


It contains the numerical information. It is the most important part of the table. The
arrangement in the body is generally from left to right in rows and from top to bottom in
columns.
FOOT NOTES:
If any explanation or elaboration regarding any item is necessary, foot notes should be
given.
SOURCE NOTE:
It refers to the source from where information has been taken. It is useful to the reader to
check the figures and gather additional information.
FORMAT OF A TABLE:

[Diagram: layout of a table, showing the title and head note at the top, the stub heading and caption headings above the stub entries and the body, and the table number and foot note.]

DIFFERENCE BETWEEN CLASSIFICATION AND TABULATION:

CLASSIFICATION | TABULATION
This is the process of dividing the data into homogeneous subgroups. | This is the process of arranging the classified data systematically in rows and columns of a table.
This condenses the mass of data and makes it easier to grasp its nature. | This provides the data in a readily referable and almost permanent form.
This foreruns tabulation. | This completes an important stage of enumeration.
This is a process of analysis of data. | This is a process of presentation of data.
Careful planning for tabulation is necessary even at this stage. | This is a mechanical function after classification.

Measures of Central Tendency:

A measure of central tendency gives a single representative value for a set of unequal
values. The measures of central tendency are known as ‘Measures of location’. They are
popularly called averages. Various measures of central tendency are the following.

Arithmetic mean
Median
Mode
Geometric mean
Harmonic mean

DEFINITION: ARITHMETIC MEAN:

Arithmetic mean is the total of the items divided by their number.

ARITHMETIC MEAN:

Arithmetic average is also called as Mean.


It is the most common type and widely used measure of central tendency.

Merits of Arithmetic mean:

It is easy to understand.
It is easy to calculate.
It is used in further calculation.
It is rigidly defined.
It is based on the value of every item in the series.
It provides a good basis for comparison.
It can be used for further analysis and algebraic treatment.
The mean is a more stable measure of central tendency.
The arithmetic average is not indefinite.

Demerits (limitations) of arithmetic mean:

The mean is unduly affected by the extreme items.


It is unrealistic.
It may lead to a false conclusion.
It cannot be accurately determined even if one of the values is not known.
It is not useful for the study of qualities like intelligence, honesty, and character.
It cannot be located by observation or the graphic method.

Uses of Arithmetic Mean:

It is considered to be the best of all averages.


It is familiar to everyone.
Arithmetic mean is called the Ideal Average.
It is used in social, economic and business problems.
When we speak of the average cost of production, average income or average price, we mean the
arithmetic average.

INDIVIDUAL CASE (OR) RAW DATA:

Example 1: The expenditure of 10 families in rupees are given below

Family A B C D E F G H I J
Expenditure 30 70 10 75 500 8 42 250 40 36
Calculate the Arithmetic mean.

Solution:

Family    Expenditure (Rs.) x
A         30
B         70
C         10
D         75
E         500
F         8
G         42
H         250
I         40
J         36
TOTAL     ∑x = 1061

$$\bar{X} = \frac{\sum x}{N} = \frac{1061}{10} = 106.1$$
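
As a quick check of the arithmetic, the mean of raw data is the total divided by the number of items. A minimal Python sketch with the ten expenditures above (names are illustrative):

```python
# Expenditure (Rs.) of families A to J.
expenditure = [30, 70, 10, 75, 500, 8, 42, 250, 40, 36]

total = sum(expenditure)   # sum of x
n = len(expenditure)       # N
mean = total / n

print(total, n, mean)
```

Note how the two large values (500 and 250) pull the mean well above most of the individual expenditures.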

DISCRETE SERIES:

EXAMPLE 1:

Calculate the mean number of persons per house. Given

No. of persons: 2 3 4 5 6

No. of houses: 10 25 30 25 10
Solution:

x        f            fx
2        10           20
3        25           75
4        30           120
5        25           125
6        10           60
Total    ∑f = 100     ∑fx = 400

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{400}{100} = 4$$
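
For a discrete series each value is weighted by its frequency. The same calculation can be sketched in Python (variable names are illustrative):

```python
persons = [2, 3, 4, 5, 6]        # x: number of persons per house
houses = [10, 25, 30, 25, 10]    # f: number of houses

# Multiply each value by its frequency, then divide by the total frequency.
total_fx = sum(x * f for x, f in zip(persons, houses))
total_f = sum(houses)
mean = total_fx / total_f

print(total_fx, total_f, mean)
```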

CONTINUOUS SERIES:

Example 1: Calculate Arithmetic mean for the following:

Marks: 20 - 30 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80
No. of students: 5 8 12 15 6 4

Solution:

Marks (x)    f           m     fm
20 – 30      5           25    125
30 – 40      8           35    280
40 – 50      12          45    540
50 – 60      15          55    825
60 – 70      6           65    390
70 – 80      4           75    300
Total        ∑f = 50           ∑fm = 2460

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{2460}{50} = 49.2$$
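
In a continuous series the mid value m of each class stands in for all the items in that class. A short Python sketch of the calculation above (names are illustrative):

```python
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freq = [5, 8, 12, 15, 6, 4]

# Mid value of each class: (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

total_fm = sum(f * m for f, m in zip(freq, midpoints))
mean = total_fm / sum(freq)

print(midpoints, total_fm, mean)
```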

Example 2: The annual profits of 90 companies are given below. Find the arithmetic mean
(Inclusive method)

Annual profit(Rs.lakhs) 0 – 19 20 – 39 40 – 59 60 – 79 80 – 99
No.of companies 5 17 32 24 12
Solution:

Annual profit (Rs. lakhs)   True class interval   No. of companies (f)   Mid value (m)   fm
0 – 19                      -0.5 – 19.5           5                      9.5             47.5
20 – 39                     19.5 – 39.5           17                     29.5            501.5
40 – 59                     39.5 – 59.5           32                     49.5            1584.0
60 – 79                     59.5 – 79.5           24                     69.5            1668.0
80 – 99                     79.5 – 99.5           12                     89.5            1074.0
Total                                             ∑f = 90                                ∑fm = 4875.0

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{4875}{90} = 54.17$$

Arithmetic mean = Rs. 54.17 lakhs
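
The conversion from inclusive limits to true class intervals can be sketched in Python. Half of the gap between consecutive classes (here 0.5) is subtracted from each lower limit and added to each upper limit; the variable names are illustrative.

```python
classes = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 99)]  # inclusive limits
freq = [5, 17, 32, 24, 12]

# Gap between the first upper limit and the second lower limit is 1, so half is 0.5.
half_gap = (classes[1][0] - classes[0][1]) / 2

# True class intervals (class boundaries).
boundaries = [(lower - half_gap, upper + half_gap) for lower, upper in classes]
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

total_fm = sum(f * m for f, m in zip(freq, midpoints))
mean = total_fm / sum(freq)

print(boundaries)
print(midpoints, total_fm, round(mean, 2))
```

The mid values are the same whether they are computed from the inclusive limits or from the true class intervals, because the same amount is subtracted at one end and added at the other.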

Example 3: Calculate the mean from the following data:

Value Frequency
Less than 10 4
Less than 20 10
Less than 30 15
Less than 40 25
Less than 50 30
Less than 60 35
Less than 70 45
Less than 80 65

Solution: In this problem cumulative frequencies and classes are given. We will first convert the
data into a simple series from the given cumulative frequencies. After this, the calculation of the mean is
done. This is illustrated below:

Value Individual frequency m fm


0 – 10 4 5 20
10 – 20 10 – 4 = 6 15 90
20 – 30 15 – 10 = 5 25 125
30 – 40 25 – 15 = 10 35 350
40 – 50 30 – 25 = 5 45 225
50 – 60 35 – 30 = 5 55 275
60 – 70 45 – 35 = 10 65 650
70 – 80 65 – 45 = 20 75 1500
Total ∑f = 65 ∑fm = 3235

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{3235}{65} = 49.77$$
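
The conversion of "less than" cumulative frequencies into individual class frequencies is a matter of successive differences. A minimal Python sketch, assuming classes of width 10 starting at 0 as in the solution table (names are illustrative):

```python
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
less_than_cf = [4, 10, 15, 25, 30, 35, 45, 65]
width = 10

# First class keeps its own count; each later class is the difference
# between its cumulative frequency and the previous one.
freq = [less_than_cf[0]] + [b - a for a, b in zip(less_than_cf, less_than_cf[1:])]
midpoints = [upper - width / 2 for upper in upper_limits]

total_fm = sum(f * m for f, m in zip(freq, midpoints))
mean = total_fm / sum(freq)

print(freq, total_fm, round(mean, 2))
```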

Example 4: From the following information pertaining to 150 workers, calculate the average wage
paid to the workers.

Wages (Rs.) No. of workers


More than 75 150
More than 85 140
More than 95 115
More than 105 95
More than 115 70
More than 125 60
More than 135 40
More than 145 25

Solution: There are no workers who received wages less than Rs.75. The lower limit of the first
class is 75. The class intervals would be 75 – 85, 85 – 95 and so on.

Wages (x)    m      No. of workers (f)    fm
75 – 85      80     150 – 140 = 10        800
85 – 95      90     140 – 115 = 25        2250
95 – 105     100    115 – 95 = 20         2000
105 – 115    110    95 – 70 = 25          2750
115 – 125    120    70 – 60 = 10          1200
125 – 135    130    60 – 40 = 20          2600
135 – 145    140    40 – 25 = 15          2100
145 – 155    150    25                    3750
Total               ∑f = 150              ∑fm = 17450

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{17450}{150} = 116.33$$
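
The "more than" case can be checked the same way. In this Python sketch the class width of 10 and the closing class 145 – 155 are the assumptions made in the solution; the names are illustrative. Taking the differences directly gives 25 workers in the 105 – 115 class (95 – 70) and a total of 150 workers, which agrees with the 150 workers stated in the question.

```python
lower_limits = [75, 85, 95, 105, 115, 125, 135, 145]
more_than_cf = [150, 140, 115, 95, 70, 60, 40, 25]
width = 10  # assumed common class width; the last class is taken as 145-155

# Each class frequency is the drop to the next cumulative frequency;
# the last class keeps its own cumulative count.
freq = [a - b for a, b in zip(more_than_cf, more_than_cf[1:])] + [more_than_cf[-1]]
midpoints = [lower + width / 2 for lower in lower_limits]

total_fm = sum(f * m for f, m in zip(freq, midpoints))
mean = total_fm / sum(freq)

print(freq, sum(freq), total_fm, round(mean, 2))
```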

Example 5: Find mean of the following data

Class interval 50 – 59 40 – 49 30 – 39 20 – 29 10 – 19 0 – 9
Frequency 1 3 9 10 15 2

Solution:

Convert the inclusive classes to exclusive classes by subtracting 0.5 from each lower limit and adding 0.5 to each upper
limit. The exclusive class interval series is then 49.5 – 59.5, 39.5 – 49.5 and so on. There is no need to
arrange the data in ascending order, beginning with 0 – 9.

Class interval Mid value f c.f fm


49.5 – 59.5 54.5 1 1 54.5
39.5 – 49.5 44.5 3 4 133.5
29.5 – 39.5 34.5 9 13 310.5
19.5 – 29.5 24.5 10 23 245
9.5 – 19.5 14.5 15 38 217.5
-0.5 – 9.5 4.5 2 40 9
Total ∑f = 40 ∑fm = 970

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{970}{40} = 24.25$$

DEFINITION: MEDIAN:

Median is the value of the middlemost item when all the items are arranged in order of
magnitude.

MEDIAN:

Median divides the series into two equal parts.

It is the value of the middlemost item.
It is also called a positional average.

Merits of median:

It is easy to understand and easy to compute.

It is quite rigidly defined.
It eliminates the effect of extreme items.
Median can be calculated even for qualitative phenomena, i.e., honesty,
character, etc.
Median can sometimes be known by simple inspection.
Its value generally lies in the distribution.

Demerits of median:

A typical representative of the observations cannot be computed if the distribution
of items is irregular.
Where the number of items is large, the prerequisite process of arraying the items is tedious.
It ignores the extreme items.
In the case of a continuous series, the median is estimated, not calculated.
It is more affected by fluctuations of sampling than the mean is.
Median is not amenable to further algebraic treatment.

Characteristics of Median:

Unlike the mean, the median can be computed from an open-ended distribution.
In the case of qualitative data, where the items are not counted or measured but are
scored or ranked, it is the most appropriate measure of central tendency.
The median can be determined graphically, whereas the mean cannot.

INDIVIDUAL CASE (OR) RAW DATA: (ODD ORDER)

Example 1: The following are the marks scored by 7 students; find out the median marks:

Roll no: 1 2 3 4 5 6 7
Marks: 45 32 18 57 65 28 46

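Solution:

Original order        Ascending order
R. No    Marks        R. No    Marks
1        45           3        18
2        32           6        28
3        18           2        32
4        57           1        45
5        65           7        46
6        28           4        57
7        46           5        65

$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{7+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the 4th item}$$

Median = 45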

Example 2: (Even numbers)

Find out the median from the following:

57 58 61 42 38 65 72 66

Solution:

SI. No Values
1 38
2 42
3 57
4 58
5 61
6 65
7 66
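8 72

$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the } \left(\frac{8+1}{2}\right)^{\text{th}} \text{ item} = \text{size of the 4.5th item}$$

$$\text{Size of the 4.5th item} = \frac{\text{4th item} + \text{5th item}}{2} = \frac{58 + 61}{2} = 59.5$$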
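
Both the odd and the even case can be covered by one small function. This is a minimal Python sketch of the rule used in Examples 1 and 2; the function name `median` is illustrative (Python's standard `statistics.median` follows the same rule).

```python
def median(values):
    """Return the middlemost value of the data after arraying it."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of items: the (N+1)/2-th item.
        return ordered[middle]
    # Even number of items: average of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([45, 32, 18, 57, 65, 28, 46]))       # Example 1 (7 items)
print(median([57, 58, 61, 42, 38, 65, 72, 66]))   # Example 2 (8 items)
```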

Example 3: Discrete series:

Locate median from the following:

Size of shoes: 5 5.5 6 6.5 7 7.5 8


Frequency: 10 16 28 15 30 40 34
Solution:

Size of shoes f cf
5 10 10
5.5 16 26
6 28 54
6.5 15 69
7 30 99
7.5 40 139
8 34 173
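
Median = size of the $\left(\frac{N+1}{2}\right)^{\text{th}}$ item

= size of the $\left(\frac{173+1}{2}\right)^{\text{th}}$ item

= size of the 87th item

The 87th item falls within the cumulative frequency 99, which corresponds to size 7.

Median size of shoe = 7.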

Example 4: Continuous series

Calculate the median from the following data:

Marks: 10 - 25 25 - 40 40 - 55 55 - 70 70 - 85 85 -100
Frequency: 6 20 44 26 3 1

Solution: Computation of median:

Marks Frequency Cumulative frequency


X f cf
10 – 25 6 6
25 – 40 20 26
40 – 55 44 70
55 – 70 26 96
70 – 85 3 99
85 – 100 1 100

Median item = $\frac{N}{2} = \frac{100}{2} = 50$

Median lies between 40 – 55, so L = 40, cf = 26, f = 44 and i = 15.

Median = $L + \frac{\frac{N}{2} - cf}{f} \times i$

= $40 + \frac{50 - 26}{44} \times 15$

= 40 + 8.18

= 48.18 marks

Example: Calculate the median for the following (inclusive method)

value 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69
Frequency 328 720 664 598 524 378 244
Solution:

Value      Frequency (f)   True class intervals   Cumulative frequency (cf)
0 – 9      328             -0.5 – 9.5             328
10 – 19    720             9.5 – 19.5             1048
20 – 29    664             19.5 – 29.5            1712
30 – 39    598             29.5 – 39.5            2310
40 – 49    524             39.5 – 49.5            2834
50 – 59    378             49.5 – 59.5            3212
60 – 69    244             59.5 – 69.5            3456
Total      N = 3456

Second lower limit – First upper limit = 10 – 9 = 1

Half of the difference = $\frac{1}{2}$ = 0.5

0.5 has been added to each upper limit and 0.5 has been subtracted from each lower limit to get
the boundaries of the true class intervals. It is the required form for the calculation of median.

$\frac{N}{2} = \frac{3456}{2} = 1728$. Hence, the median class interval is 29.5 – 39.5

L = 29.5; f = 598; i = 39.5 – 29.5 = 10; cf = 1712

By substituting these values in

$M = L + \left[\frac{\frac{N}{2} - cf}{f}\right] \times i$

= $29.5 + \left[\frac{1728 - 1712}{598}\right] \times 10$

= $29.5 + \frac{10 \times 16}{598}$

= 29.5 + 0.27

= 29.77
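
The interpolation formula for a continuous series can be sketched as below. It is a minimal sketch, assuming the classes are passed as true class boundaries (so an inclusive series must first be converted as shown above); the name grouped_median is hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Median = L + ((N/2 - cf) / f) * i for a continuous series.

    boundaries: list of (lower, upper) true class limits, in ascending order
    freqs:      list of class frequencies, same length as boundaries
    """
    half = sum(freqs) / 2               # N / 2
    cf = 0                              # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cf + f >= half:    # first class whose cf reaches N / 2
            return lower + (half - cf) / f * (upper - lower)
        cf += f
    raise ValueError("the distribution has no items")


# Example 4 (class width 15)
marks = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
print(round(grouped_median(marks, [6, 20, 44, 26, 3, 1]), 2))
```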

DEFINITION: MODE:

Mode is the value which has the greatest frequency density.


Mode is defined as the value of the variable which occurs most frequently in a
distribution.

MODE:

Mode is the most common item of a series.


Mode is the value which occurs with the greatest frequency in a series.

Merits of mode:

It is easy to understand as well as easy to calculate.


It is usually an actual value as it occurs most frequently.
It is not affected by extreme values as in the average.
It is simple and precise.
It is the most representative average.
The value of mode can be determined by the graphic method.

Demerits of mode:

It is not suitable for further mathematical treatment.


It may not give weight to extreme items.
In a bimodal distribution there are two modal classes.
It is stable only when the sample is large.
It will not give the aggregate value as in average.
It is difficult to compute when there are both positive and negative items in a series
or when one or more items are zero.

Uses of Mode:

The concept of mode is used by the people in their everyday life.


For example, a manufacturer of banians, ready-made garments, or shoes etc., is
interested in the modal size and manufactures it in larger quantities.
Mode helps the manufacturer in deciding the models.
It is useful in industry and business.
Weather forecasts are also based on mode.
It is very useful to agriculturists, businessmen, etc.
Mode is also used in socio-economic surveys.
Mode is also mostly used in business and commerce.

INDIVIDUAL SERIES:

Example 1:

10 persons have the following incomes: Rs. 850, 750, 600, 825, 850, 725, 600, 850, 640 and
530. Find the mode.

850 occurs three times, more often than any other value.
Therefore the modal income is Rs. 850.

NO MODE AND MULTI MODAL

Example: (a) 40, 44, 57, 78, 48 (No mode)

(b) 45, 55, 50, 45, 40, 55, 45, 55 (multimodal)

Example: Discrete series:

Calculate the mode from the following:

Size: 10 11 12 13 14 15 16 17 18
Frequency: 10 12 15 19 20 8 4 3 2

Solution: Grouping table:

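Column (2) adds the frequencies in pairs starting from the first item and column (3) in pairs
starting from the second item. Columns (4), (5) and (6) add them in threes starting from the
first, second and third items respectively. A dot marks an empty cell.

Size   f    (2)   (3)   (4)   (5)   (6)
10     10   .     .     .     .     .
11     12   22    .     37    .     .
12     15   .     27    .     46    .
13     19   34    .     .     .     54
14     20   .     39    47    .     .
15     8    28    .     .     32    .
16     4    .     12    .     .     15
17     3    7     .     9     .     .
18     2    .     5     .     .     .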

Analysis Table:
X     1    2    3    4    5    6    TOTAL
10    .    .    .    .    .    .    -
11    .    .    .    .    1    .    1
12    .    1    .    .    1    1    3
13    .    1    1    1    1    1    5
14    1    .    1    1    .    1    4
15    .    .    .    1    .    .    1
16    .    .    .    .    .    .    -
17    .    .    .    .    .    .    -
18    .    .    .    .    .    .    -

MODE =13

Example: Calculate the mode (inclusive method)

Marks 0 – 19 20 – 39 40 – 59 60 – 79 80 – 99
No. of Students 5 20 35 20 12
Solution:

Marks No.of Students Marks(True class interval)


0 – 19 5 -0.5 – 19.5
20 – 39 20 19.5 – 39.5
40 – 59 35 39.5 – 59.5
60 – 79 20 59.5 – 79.5
80 – 99 12 79.5 – 99.5

Greatest frequency = 35. ∴Modal class: 39.5 – 59.5.


∴ L = 39.5; i = 59.5 − 39.5 = 20; f1 − f0 = 35 − 20 = 15; f1 − f2 = 35 − 20 = 15

∴ $Z = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$

= $39.5 + \left[\frac{15}{15 + 15}\right] \times 20$

= $39.5 + \frac{300}{30}$

= 39.5 + 10.0

= 49.5

Example: Find mode for the following data

CI 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40


F 9 12 15 16 17 15 10 13

Grouping Table:
Size of item   f    (2)   (3)   (4)   (5)   (6)
0 – 5          9    .     .     .     .     .
5 – 10         12   21    .     36    .     .
10 – 15        15   .     27    .     43    .
15 – 20        16   31    .     .     .     48
20 – 25        17   .     33    48    .     .
25 – 30        15   32    .     .     42    .
30 – 35        10   .     25    .     .     38
35 – 40        13   23    .     .     .     .

Analysis Table:

X          1    2    3    4    5    6    TOTAL
0 – 5      .    .    .    .    .    .    -
5 – 10     .    .    .    .    1    .    1
10 – 15    .    .    .    .    1    1    2
15 – 20    .    .    1    1    1    1    4
20 – 25    1    1    1    1    .    1    5
25 – 30    .    1    .    1    .    .    2
30 – 35    .    .    .    .    .    .    -
35 – 40    .    .    .    .    .    .    -

Greatest frequency =17. ∴Modal class: 20-25


∴ L = 20; i = 5; f1 = 17; f0 = 16; f2 = 15

∴ $Z = L + \left[\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right] \times i$

= $20 + \left[\frac{17 - 16}{34 - 16 - 15}\right] \times 5$

= $20 + \frac{1}{3} \times 5$

= 20 + 1.67

= 21.67
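
The mode formula for a continuous series can be sketched as below. This is an illustrative sketch only: the name grouped_mode is hypothetical, and the modal class is chosen by simple inspection (the greatest frequency), not by the grouping and analysis tables.

```python
def grouped_mode(boundaries, freqs):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i for a continuous series.

    boundaries: list of (lower, upper) true class limits
    freqs:      list of class frequencies
    Assumes the denominator 2*f1 - f0 - f2 is not zero.
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class index
    lower, upper = boundaries[k]
    f1 = freqs[k]                                       # modal class frequency
    f0 = freqs[k - 1] if k > 0 else 0                   # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # succeeding class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


# Inclusive example after conversion to true class intervals
true_ci = [(-0.5, 19.5), (19.5, 39.5), (39.5, 59.5), (59.5, 79.5), (79.5, 99.5)]
print(grouped_mode(true_ci, [5, 20, 35, 20, 12]))       # 49.5
```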

DEFINITION: GEOMETRIC MEAN:

Geometric mean is defined as Nth root of the product of N items.

GEOMETRIC MEAN:

G.M. is the abbreviation.


If there are two items, we take the square root; if three, the cube root; and so on.
The G.M. is never larger than the arithmetic mean.
If there are zeros or negative values in the series, the G.M. cannot be used.
Thus the GM is obtained by multiplying together all the values of the series and
then calculating the root of their product corresponding to the number of items in
the group.

Merits of G.M:

It is based on all observations.


It is rigidly defined.
It is capable of further algebraic treatment.
It is less affected by extreme values.
It is useful in studying economic and social data.
It is suitable for averaging ratios, rates and percentages.

Demerits of G.M:

It is difficult to understand.
Persons without a mathematical background find the calculation difficult.
The G.M. cannot be computed if any item in the series is negative or zero.
It has restricted application.

Uses of G.M:

G.M. is highly useful in averaging ratios, percentages and rate of increase between
two periods.
G.M. is important in the construction of index numbers.
In economic and social sciences, where we want to give more weight to smaller
items and smaller weight to large items, G.M. is appropriate.

Example: Individual series:

Calculate geometric mean of the following:

50 72 54 82 93

Solution:

X log X
50 1.6990
72 1.8573
54 1.7324
82 1.9138
93 1.9685
∑ log X = 9.1710

G.M. = Antilog $\left(\frac{\sum \log X}{N}\right)$

= Antilog $\left(\frac{9.1710}{5}\right)$

= Antilog (1.8342)

= 68.26

Example: Discrete series:

The following table gives the weight of 31 persons in sample survey. Calculate G.M.

Weight (lbs): 130 135 140 145 146 148 149 150 157
No.of Persons: 3 4 6 6 3 5 2 1 1

Solution:

X f log x f log x
130 3 2.1139 6.3417
135 4 2.1303 8.5212
140 6 2.1461 12.8766
145 6 2.1614 12.9684
146 3 2.1644 6.4932
148 5 2.1703 10.8515
149 2 2.1732 4.3464
150 1 2.1761 2.1761
157 1 2.1959 2.1959
∑ f = 31 ∑ f log x = 66.7710

G.M. = Antilog $\left(\frac{\sum f \log x}{\sum f}\right)$

= Antilog $\left(\frac{66.7710}{31}\right)$

= Antilog (2.1539)

G.M. = 142.5
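
The logarithmic method used in the two examples above can be sketched as below. It is a sketch only; the function name geometric_mean and the optional freqs argument are assumptions. The computer uses full-precision logarithms, so the last decimal may differ slightly from four-figure log tables.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M. = Antilog(sum(f * log x) / sum(f)); every value must be positive."""
    if freqs is None:
        freqs = [1] * len(values)       # individual series: each item once
    if any(v <= 0 for v in values):
        raise ValueError("G.M. is not defined for zero or negative items")
    total_log = sum(f * math.log10(v) for v, f in zip(values, freqs))
    return 10 ** (total_log / sum(freqs))


print(geometric_mean([50, 72, 54, 82, 93]))              # individual series
weights = [130, 135, 140, 145, 146, 148, 149, 150, 157]
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]
print(geometric_mean(weights, persons))                  # discrete series
```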

Example: Continuous series:

Find out the geometric mean.

Yield of wheat (mounds) No. of farms


7.5 – 10.5 5
10.5 – 13.5 9
13.5 – 16.5 19
16.5 – 19.5 23
19.5 – 22.5 7
22.5 – 25.5 4
25.5 – 28.5 1

Solution:

X f m log m f log m
7.5 – 10.5 5 9 0.9542 4.7710
10.5 – 13.5 9 12 1.0792 9.7128
13.5 – 16.5 19 15 1.1761 22.3459
16.5 – 19.5 23 18 1.2553 28.8719
19.5 – 22.5 7 21 1.3222 9.2554
22.5 – 25.5 4 24 1.3802 5.5208
25.5 – 28.5 1 27 1.4314 1.4314
∑ f = 68 ∑ f log m = 81.9092

G.M = Antilog $\left(\frac{\sum f \log m}{\sum f}\right)$

= Antilog $\left(\frac{81.9092}{68}\right)$

= Antilog (1.2045)

G.M = 16.02

DEFINITION: HARMONIC MEAN:

Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the
values.

HARMONIC MEAN:

Harmonic mean, like geometric mean, is a measure of central tendency used in solving
special types of problems.
The reciprocal of a number is that value, which is obtained by dividing one by the
value.

Merits of Harmonic mean:

It is rigidly defined.
It is based on all the observations of the series.
It is suitable in case of series having wide dispersion.
It is suitable for further mathematical treatment.
It gives less weight to large items and more weight to small items.

Demerits of Harmonic mean:

It is difficult to calculate and to understand.


All the values must be available for computation.
It is not popular.
It is usually a value which does not exist in series.

Example: Individual series:

The monthly incomes of 10 families in rupees in a certain village are given below. Calculate
the harmonic mean.

Family: 1 2 3 4 5 6 7 8 9 10
Income: 85 70 10 75 500 8 42 250 40 36
Solution:

Family   Income (X)   1/X
1        85           0.01176
2        70           0.01429
3        10           0.10000
4        75           0.01333
5        500          0.00200
6        8            0.12500
7        42           0.02381
8        250          0.00400
9        40           0.02500
10       36           0.02778
                      ∑ (1/X) = 0.34697

Harmonic mean = $\frac{N}{\sum \left(\frac{1}{X}\right)}$

= $\frac{10}{0.34697}$

Harmonic mean = 28.82

Example: Discrete series: Calculate H.M from the following data

Size of items: 6 7 8 9 10 11
Frequency: 4 6 9 5 2 8

Solution:

x     f     1/x      f(1/x)
6     4     0.1667   0.6668
7     6     0.1429   0.8574
8     9     0.1250   1.1250
9     5     0.1111   0.5555
10    2     0.1000   0.2000
11    8     0.0909   0.7272
∑ f = 34             ∑ f(1/x) = 4.1319

Harmonic mean = $\frac{N}{\sum f\left(\frac{1}{x}\right)} = \frac{34}{4.1319}$ = 8.23

Example:

Calculate H.M of the following data:

Marks: 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80 80- 90 90 -100


Frequency: 15 13 8 6 15 7 6
Solution:

x          f     m     1/m       f(1/m)
30 – 40    15    35    0.02857   0.42855
40 – 50    13    45    0.02222   0.28886
50 – 60    8     55    0.01818   0.14544
60 – 70    6     65    0.01538   0.09228
70 – 80    15    75    0.01333   0.19995
80 – 90    7     85    0.01176   0.08232
90 – 100   6     95    0.01053   0.06318
∑ f = 70                         ∑ f(1/m) = 1.30058

Harmonic mean = $\frac{N}{\sum f\left(\frac{1}{m}\right)} = \frac{70}{1.30058}$ = 53.82
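
The three harmonic mean calculations above follow one pattern, which can be sketched as below. The function name harmonic_mean is an assumption; for a continuous series the mid-values m are passed as the values.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = sum(f) / sum(f / x); no value may be zero."""
    if freqs is None:
        freqs = [1] * len(values)       # individual series: each item once
    if any(v == 0 for v in values):
        raise ValueError("H.M. is not defined when an item is zero")
    return sum(freqs) / sum(f / v for v, f in zip(values, freqs))


# Discrete series example: size of items and their frequencies
print(round(harmonic_mean([6, 7, 8, 9, 10, 11], [4, 6, 9, 5, 2, 8]), 2))
```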

WEIGHTED ARITHMETIC MEAN:

If the values (X): $X_1, X_2, X_3, \ldots$ have

weights (W): $W_1, W_2, W_3, \ldots$

then the weighted A.M is

$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots}{W_1 + W_2 + W_3 + \cdots} = \frac{\sum WX}{\sum W}$

EXAMPLE: 1

Calculate the simple average and the weighted average of the following data and account for
the difference in the averages.

Items 68 85 101 102 108 110 112 113 124 128


weights 1 45 31 1 11 7 23 17 14 14
Solution:

Items X Weight W WX
68 1 68
85 45 3825
101 31 3131
102 1 102
108 11 1188
110 7 770
112 23 2576
113 17 1921
124 14 1736
128 14 1792
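∑ X = 1051 ∑ W = 164 ∑ WX = 17109

Simple average = $\frac{\sum X}{N} = \frac{1051}{10}$ = 105.10

Weighted average = $\frac{\sum WX}{\sum W} = \frac{17109}{164}$ = 104.32

The weighted average is lower than the simple average because the heaviest weights (45 and 31)
are attached to the smaller items 85 and 101.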
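
The comparison of the simple and the weighted average can be checked with a short sketch. The function name weighted_mean is an assumption used only for illustration.

```python
def weighted_mean(values, weights):
    """Weighted A.M. = sum(W * X) / sum(W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


items = [68, 85, 101, 102, 108, 110, 112, 113, 124, 128]
weights = [1, 45, 31, 1, 11, 7, 23, 17, 14, 14]

print(sum(items) / len(items))                  # simple average
print(round(weighted_mean(items, weights), 2))  # weighted average
```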

FORMULA FOR COMBINED MEAN:

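Let there be $N_1$ items in the first group with mean $\bar{X}_1$ and $N_2$ items in the second group with
mean $\bar{X}_2$.

∴ The total of $N_1$ items = $N_1 \bar{X}_1$ and

the total of $N_2$ items = $N_2 \bar{X}_2$

When these two groups merge together, there are $N_1 + N_2$ items whose total = $N_1 \bar{X}_1 + N_2 \bar{X}_2$

$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$

Similarly, for three groups,

$\bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3}$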

EXAMPLE: 1

$N_1 = 100$, $N_2 = 80$, $\bar{X}_1 = 275$, $\bar{X}_2 = 225$. Find the mean of the salaries of the employees of the
establishment as a whole.

Solution:

$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$

= $\frac{100 \times 275 + 80 \times 225}{100 + 80}$

= $\frac{27500 + 18000}{180}$

= $\frac{45500}{180}$

= Rs. 252.78
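
The combined mean formula extends to any number of groups, as the sketch below shows. The function name combined_mean is hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean = sum(N_k * mean_k) / sum(N_k) over all groups."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Two groups of employees: 100 with mean 275 and 80 with mean 225
print(round(combined_mean([100, 80], [275, 225]), 2))
```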

UNIT-II
MEASURES OF DISPERSION
INTRODUCTION:

In a series, all the items are not equal. There is difference or variation among the values.
The degree of variation is evaluated by various measures of dispersion.

Averages are central values. They enable comparison of two or more sets of data. They
are not sufficient to depict the true nature of the sets. For example, consider the following marks
of two students.

Student I Student II
68 85
75 90
65 80
67 25
70 65

Both have got a total of 345 and an average of 69 each. The fact is that the second student has
failed in one paper. When the averages alone are considered, the two students are equal.

Less variation is a desirable characteristic.

DEFINITION:

“Dispersion is the measure of the variation of the items.”


“Dispersion is a measure of extent to which the individual items vary.”
The degree to which numerical data tend to spread about an average value is
called the variation or dispersion of the data.

Criteria or requisites or characteristics or desirable properties of a measure of dispersion:

It should be rigidly defined.


It should be based on all the items.
It should not be unduly affected by extreme items.
It should lend itself for algebraic manipulation.
It should be simple to understand and easy to calculate.
It should have sampling stability.

Absolute and Relative Measures:

Dispersion

Absolute Relative

Absolute measure:
It indicates the amount of variation in a set of values.
They are quoted in terms of the units of observations
For example, when rainfall on different days is available in cm, any absolute measure of
dispersion gives the variation in rainfall in cm; if it is in mm, then the absolute measures of
dispersion are quoted in mm.

Relative measure:

It is used to compare the variation in two or more sets


They are free from units of measurements. They are pure numbers.
When the rainfall on different days is given in cm, a relative measure such as coefficient
of variation does not give variation in cm.
Rainfall in two places, say one in cm and other in inch can be compared using coefficient
of variation.
The set which has less variation is said to be less variable or more stable or more
consistent or more homogenous or more uniform.

Various absolute and relative measure:

Absolute Measures                           Relative Measures
Range                                       Coefficient of Range
Quartile Deviation (Q.D.) or                Coefficient of Quartile Deviation
Semi Inter Quartile Range
(i) Mean Deviation (M.D.) about mean        (i) Coefficient of Mean Deviation (M.D.) about mean
(ii) Mean Deviation (M.D.) about median     (ii) Coefficient of Mean Deviation (M.D.) about median
(iii) Mean Deviation (M.D.) about mode      (iii) Coefficient of Mean Deviation (M.D.) about mode
Standard Deviation                          Coefficient of variation
Variance

METHODS OF MEASURING DISPERSION:

The following are the important methods of studying variation.

1. Range.
2. Inter – quartile range.
3. Mean – deviation.
4. Standard deviation.
5. Lorenz curve.

Range:

Range is the difference between the greatest and the smallest of the values.
The range is the simplest measure of dispersion.
It is a rough measure of dispersion.
Its measure depends upon the extreme items and not on all the items.

Range = Largest value – smallest value

R = L – S

Coefficient of range = $\frac{L - S}{L + S}$

Merits of Range:

It is simple to compute and understand.


It gives a rough but quick answer.

Demerits of Range:

It is not reliable.
It is affected by the extreme items.
It is an unsatisfactory measure.
It cannot be applied to open end classes.
It is not suitable for mathematical treatment.

Uses of Range:

Range is used in finding the control limits of Mean chart and Range chart in
S.Q.C.
While quoting the prices of shares, bonds, gold, etc. on daily basis or yearly basis,
the minimum and the maximum prices are mentioned.
The minimum and the maximum temperature likely to prevail on each day are
forecasted.

INDIVIDUAL SERIES:

Example: Find the range of weights of 7 students from the following 27, 30, 35, 36, 38, 40, 43

Solution:

Range = L – S

= 43 – 27

= 16

Coefficient of range = $\frac{L - S}{L + S}$

= $\frac{43 - 27}{43 + 27}$

= 0.23

DISCRETE SERIES:

Example: Calculate range and its coefficient for the following data.

x: 10 20 30 40 50
f: 2 5 5 7 6
Solution:

Range = L – S

= 50 – 10

= 40

Coefficient of Range = $\frac{L - S}{L + S}$

= $\frac{50 - 10}{50 + 10}$

= $\frac{40}{60}$

= 0.667

Example: (Continuous series)

x: 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
f: 3 4 2 6 7
Solution:

Range = L – S

= 60 – 10

= 50

Coefficient of Range = $\frac{L - S}{L + S}$

= $\frac{60 - 10}{60 + 10}$

= $\frac{50}{70}$

= 0.7143
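
Range and its coefficient depend only on the two extreme values, as the sketch below shows. The function name range_and_coefficient is an assumption; for a continuous series the lowest and highest class limits are passed as the values.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the given values."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)


print(range_and_coefficient([27, 30, 35, 36, 38, 40, 43]))  # weights of 7 students
print(range_and_coefficient([10, 60]))                      # class limits 10 and 60
```

Frequencies play no part in the calculation, which is why the discrete series example uses only the smallest and the largest value of x.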

QUARTILE DEVIATION:

To obtain a measure of variation, we use the distance between the first and the
third quartiles.
Quartile deviation is defined as half the distance between the third and first
quartiles. Hence it is called Semi Inter Quartile Range.

Semi – inter quartile range = $\frac{Q_3 - Q_1}{2}$

Quartile deviation is an absolute measure of dispersion. The relative measure of
dispersion known as coefficient of Quartile deviation is calculated as follows:

Coefficient of Quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Merits of Quartile deviation:

It is simple to understand and easy to compute.


It is not influenced by the extreme values.
It can be found out with open end distribution.

Demerits of Quartile deviation:

It ignores the first 25% of the items and the last 25% of the items.
It is positional average; hence not amenable to further mathematical treatment.
Its value is affected by sampling fluctuations.
It gives only a rough measure.
It is not the representative value of the data.

Example: Individual series

Calculate Quartile deviation and its coefficient for the following data.

5, 7, 9, 12, 15, 19, 3, 7, 2, 9.

Solution:

Rearrange the given data in ascending order.

2, 3, 5, 7, 7, 9, 9, 12, 15, 19.

$Q_1$ = Value of the $\left(\frac{N+1}{4}\right)^{\text{th}}$ item

= Value of the $\left(\frac{10+1}{4}\right)^{\text{th}}$ item

= Value of 2.75th item

= 2nd Value + 0.75(3rd value – 2nd value)

= 3 + 0.75(5 - 3)

Q1 = 4.5

$Q_3$ = Value of the $3\left(\frac{N+1}{4}\right)^{\text{th}}$ item

= Value of the $3\left(\frac{10+1}{4}\right)^{\text{th}}$ item

= Value of 3(2.75)th item

= Value of 8.25th item

= 8th Value + 0.25(9th value – 8th value)

= 12 + 0.25(15 - 12)

= 12 + 0.75

Q3 = 12.75

Quartile deviation = $\frac{Q_3 - Q_1}{2}$

= $\frac{12.75 - 4.5}{2}$ = 4.125

Coefficient of Quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

= $\frac{8.25}{17.25}$

= 0.4783
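
The interpolation between neighbouring items used above can be sketched as below. The names item_at and quartile_deviation are assumptions; the sketch assumes at least three items, so that the 3(N + 1)/4 position does not run past the last item.

```python
def item_at(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    value = ordered[whole - 1]
    if fraction:                        # interpolate towards the next item
        value += fraction * (ordered[whole] - value)
    return value


def quartile_deviation(values):
    """Return (Q1, Q3, Q.D., coefficient of Q.D.) for raw data."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


print(quartile_deviation([5, 7, 9, 12, 15, 19, 3, 7, 2, 9]))
```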

Example: Discrete series:

Calculate the quartile deviation and its coefficient for the following data.

x: 20 21 22 23 24 25 26 27 28
f: 8 10 11 16 20 25 15 9 6

Solution:

x: 20 21 22 23 24 25 26 27 28
f: 8 10 11 16 20 25 15 9 6
cf: 8 18 29 45 65 90 105 114 120

$Q_1$ = Value of the $\left(\frac{N+1}{4}\right)^{\text{th}}$ item

= Value of the $\left(\frac{120+1}{4}\right)^{\text{th}}$ item

= Value of 30.25th item

= 30th Value + 0.25(31st value – 30th value)

= 23 + 0.25(23 – 23)

Q1 = 23

$Q_3$ = Value of the $3\left(\frac{N+1}{4}\right)^{\text{th}}$ item

= Value of the $3\left(\frac{120+1}{4}\right)^{\text{th}}$ item

= Value of 3(30.25)th item

= Value of 90.75th item

= 90th Value + 0.75(91st value – 90th value)

= 25 + 0.75(26 - 25)

= 25 + 0.75

Q3 = 25.75

Quartile deviation = $\frac{Q_3 - Q_1}{2}$

= $\frac{25.75 - 23}{2}$

= 1.375

Coefficient of Quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

= $\frac{25.75 - 23}{25.75 + 23}$

= 0.0564

Example: Continuous series:

Calculate the semi – inter quartile range of wages and coefficient of Q.D.

Wages(Rs) Labourers
30 – 32 12
32 – 34 18
34 – 36 16
36 – 38 14
38 – 40 12
40 – 42 8
42 – 44 6
Solution:

x f C.F
30 – 32 12 12
32 – 34 18 30
34 – 36 16 46
36 – 38 14 60
38 – 40 12 72
40 – 42 8 80
42 – 44 6 86

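$Q_1$ = Size of the $\left(\frac{N}{4}\right)^{\text{th}}$ item

= Size of the $\left(\frac{86}{4}\right)^{\text{th}}$ item

= Size of the 21.5th item

$Q_1$ lies between 32 – 34

$Q_1 = L + \frac{\frac{N}{4} - cf}{f} \times i$

= $32 + \frac{21.5 - 12}{18} \times 2$

= 32 + 1.06 = 33.06

$Q_3$ = Size of the $3\left(\frac{N}{4}\right)^{\text{th}}$ item

= Size of the $3\left(\frac{86}{4}\right)^{\text{th}}$ item

= Size of the 3(21.5)th item = 64.5th item

$Q_3$ lies between 38 – 40

$Q_3 = L + \frac{3\left(\frac{N}{4}\right) - cf}{f} \times i$

= $38 + \frac{64.5 - 60}{12} \times 2$

= 38 + 0.75 = 38.75

Quartile deviation = $\frac{Q_3 - Q_1}{2}$

= $\frac{38.75 - 33.06}{2}$

= 2.85

Coefficient of Quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

= $\frac{38.75 - 33.06}{38.75 + 33.06}$

= 0.08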

MEAN DEVIATION:

M.D. is the arithmetic mean of the absolute deviations of the values about their
arithmetic mean or median or mode.
M.D. is the abbreviation of Mean Deviation. There are three kinds of mean
deviations, viz.,
❖ Mean deviation or mean deviation about mean.
❖ Mean deviation about median.
❖ Mean deviation about mode.

Merits of M.D:

It is simple to understand and easy to compute.


It is not much affected by the fluctuations of sampling.
It is based on all items of the series and gives weight according to their size.
It is rigidly defined.
It is flexible, because it can be calculated from any measure of central
tendency.
It is a better measure for comparison.

Demerits of M.D:

It ignores the algebraic signs of the deviations.


It is not a very accurate measure of dispersion.
It is not suitable for further mathematical calculation.
It is rarely used.
It is not as popular as S.D.

Mean deviation about mean (individual series):
Calculate mean deviation from mean for the following data: 100, 150, 200, 250, 360, 490,
500, 600, 671.
Solution:
X      |X − X̄|
100    269
150    219
200    169
250    119
360    9
490    121
500    131
600    231
671    302
∑ X = 3321    ∑ |X − X̄| = 1570

Mean $\bar{X} = \frac{\sum X}{N} = \frac{3321}{9}$ = 369

Mean deviation from mean = $\frac{\sum |X - \bar{X}|}{N}$

= $\frac{1570}{9}$ = 174.44

Coefficient of mean deviation from mean = $\frac{\text{M.D from mean}}{\bar{X}}$

= $\frac{174.44}{369}$ = 0.47

Discrete Series:

Calculate M.D from mean for the following data:

x: 2 4 6 8 10
f: 1 4 6 4 1
Solution:

x     f    fx    |X − X̄|    f|X − X̄|
2     1    2     4          4
4     4    16    2          8
6     6    36    0          0
8     4    32    2          8
10    1    10    4          4
∑ f = 16    ∑ fx = 96    ∑ f|X − X̄| = 24

$\bar{X} = \frac{\sum fx}{N} = \frac{96}{16}$ = 6

M.D = $\frac{24}{16}$ = 1.5

Coefficient of M.D = $\frac{1.5}{6}$

= 0.25

Continuous Series:

Calculate the M.D from mean for the following data:

C.I: 2 – 4 4 – 6 6 – 8 8 – 10
F: 3 4 2 1

Solution:

x         f    m    fm    |m − X̄|    f|m − X̄|
2 – 4     3    3    9     2.2        6.6
4 – 6     4    5    20    0.2        0.8
6 – 8     2    7    14    1.8        3.6
8 – 10    1    9    9     3.8        3.8
∑ f = 10    ∑ fm = 52    ∑ f|m − X̄| = 14.8

$\bar{X} = \frac{\sum fm}{N} = \frac{52}{10}$ = 5.2

M.D from mean = $\frac{\sum f|m - \bar{X}|}{\sum f}$

= $\frac{14.8}{10}$ = 1.48

Coefficient of M.D from mean = $\frac{\text{M.D from mean}}{\bar{X}}$

= $\frac{1.48}{5.2}$

= 0.28
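
The mean deviation calculations follow the same steps whichever average is chosen, as sketched below. The function name mean_deviation and its centre argument are assumptions; the centre may be the mean, the median or the mode, and for a continuous series the mid-values are passed as the values.

```python
def mean_deviation(values, freqs, centre):
    """Return (M.D., coefficient of M.D.) about the given centre."""
    n = sum(freqs)
    md = sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n
    return md, md / centre


# Discrete series about the mean
x = [2, 4, 6, 8, 10]
f = [1, 4, 6, 4, 1]
mean = sum(fi * xi for xi, fi in zip(x, f)) / sum(f)   # 96 / 16 = 6.0
print(mean_deviation(x, f, mean))                      # (1.5, 0.25)
```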

Mean deviation from median (Individual series):

Calculate M.D from median and its coefficient:

15, 25, 30, 35, 40, 45, 50, 50

Solution:

X     |X − M|
15    22.5
25    12.5
30    7.5
35    2.5
40    2.5
45    7.5
50    12.5
50    12.5
      ∑ |X − M| = 80

Median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ item

= 4.5th item

= $\frac{\text{4th value} + \text{5th value}}{2}$

= $\frac{35 + 40}{2}$ = 37.5

M.D from median = $\frac{\sum |X - M|}{N} = \frac{80}{8}$ = 10

Coefficient of M.D from median = $\frac{10}{37.5}$ = 0.2667

Discrete Series:

x: 10 12 13 14 15 16
f: 2 3 7 20 8 9
Solution:

x     f     cf    |X − M|    f|X − M|
10    2     2     4          8
12    3     5     2          6
13    7     12    1          7
14    20    32    0          0
15    8     40    1          8
16    9     49    2          18
∑ f = 49          ∑ f|X − M| = 47

Median = $\left(\frac{49+1}{2}\right)^{\text{th}}$ item

= $\left(\frac{50}{2}\right)^{\text{th}}$ item = 25th item

Median = 14

M.D from median = $\frac{\sum f|X - M|}{\sum f} = \frac{47}{49}$ = 0.96

Coefficient of M.D from median = $\frac{0.96}{14}$ = 0.069

Continuous series:

C.I 16 – 20 21 – 25 26 – 30 31 – 35 36 – 40 41 – 45 46 – 50 51 – 55 56 – 60
F: 8 15 13 20 11 7 3 2 1

Solution:

C.I        f     True C.I       m     cf    |m − M|    f|m − M|
16 – 20    8     15.5 – 20.5    18    8     13.5       108.0
21 – 25    15    20.5 – 25.5    23    23    8.5        127.5
26 – 30    13    25.5 – 30.5    28    36    3.5        45.5
31 – 35    20    30.5 – 35.5    33    56    1.5        30.0
36 – 40    11    35.5 – 40.5    38    67    6.5        71.5
41 – 45    7     40.5 – 45.5    43    74    11.5       80.5
46 – 50    3     45.5 – 50.5    48    77    16.5       49.5
51 – 55    2     50.5 – 55.5    53    79    21.5       43.0
56 – 60    1     55.5 – 60.5    58    80    26.5       26.5
∑ f = 80                                    ∑ f|m − M| = 582.0

Median = size of the $\left(\frac{N}{2}\right)^{\text{th}}$ item

= size of the $\left(\frac{80}{2}\right)^{\text{th}}$ item

= size of the 40th item

Median lies between 30.5 – 35.5

Median = $L + \left[\frac{\frac{N}{2} - cf}{f}\right] \times i$

= $30.5 + \left[\frac{40 - 36}{20}\right] \times 5$

= 30.5 + 1 = 31.5

M.D from median = $\frac{\sum f|m - M|}{N}$

= $\frac{582}{80}$ = 7.28

Coefficient of M.D from median = $\frac{\text{M.D from median}}{\text{Median}}$

= $\frac{7.28}{31.50}$ = 0.231

M.D from mode: Individual series

Calculate M.D from mode and its coefficient.

32 51 23 46 20 78 57 56 57 30

Solution:

X     |X − z|
32    25
51    6
23    34
46    11
20    37
78    21
57    0
56    1
57    0
30    27
      ∑ |X − z| = 162

Mode = 57

M.D from mode = $\frac{\sum |X - z|}{N} = \frac{162}{10}$ = 16.2

Coefficient of M.D from mode = $\frac{16.2}{57}$ = 0.28

Discrete series:

x: 21 25 27 32 41 46 50 55
f: 2 3 10 20 15 10 8 2
Solution:

x     f     |X − z|    f|X − z|
21    2     11         22
25    3     7          21
27    10    5          50
32    20    0          0
41    15    9          135
46    10    14         140
50    8     18         144
55    2     23         46
∑ f = 70               ∑ f|X − z| = 558

Z = 32

M.D from mode = $\frac{\sum f|X - z|}{N} = \frac{558}{70}$

= 7.97

Coefficient of M.D from mode = $\frac{7.97}{32}$ = 0.2491

Continuous series:

x: 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f: 4 15 28 16 7
Solution:

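The modal class is 20 – 30 (greatest frequency 28), so L = 20, f1 = 28, f0 = 15, f2 = 16 and i = 10.

$Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 20 + \frac{28 - 15}{56 - 15 - 16} \times 10$ = 20 + 5.2 = 25.2

x          f     m     |m − z|    f|m − z|
0 – 10     4     5     20.2       80.8
10 – 20    15    15    10.2       153.0
20 – 30    28    25    0.2        5.6
30 – 40    16    35    9.8        156.8
40 – 50    7     45    19.8       138.6
∑ f = 70                          ∑ f|m − z| = 534.8

M.D about mode = $\frac{\sum f|m - z|}{\sum f} = \frac{534.8}{70}$ = 7.64

Coefficient of M.D about mode = $\frac{\text{M.D about mode}}{\text{mode}} = \frac{7.64}{25.2}$ = 0.30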

Standard Deviation:

S.D is also called Root mean square deviation or Mean Error or Mean square
Error.
The reason is that it is the square root of the mean of the squared deviations from
the arithmetic mean.
It provides accurate result.

Merits of S.D:

It is rigidly defined and its value is always definite and based on all the
observations and the actual signs of deviations are used.
As it is based on A.M., it has all the merits of A.M.
It is the most important and widely used measures of dispersion.
It is possible for further algebraic treatment.

It is less affected by the fluctuations of sampling and hence stable.
It is the basis for measuring the coefficient of correlation, sampling and statistical
inferences.

Demerits of S.D:

It is not easy to understand, and it is difficult to calculate.


It gives more weight to extreme values, because the values are squared up.
It is affected by the value of every item in the series.
As it is an absolute measure of variability, it cannot be used for the purpose of
comparison.
It has not found favour with the economists and businessmen.
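Individual series:

Calculate the standard deviation of the following marks:

S.No: 1 2 3 4 5 6 7 8 9 10
Marks: 5 10 20 25 40 42 45 48 70 80
Solution:
S.No    X     X²
1       5     25
2       10    100
3       20    400
4       25    625
5       40    1600
6       42    1764
7       45    2025
8       48    2304
9       70    4900
10      80    6400
∑ X = 385    ∑ X² = 20143

$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$

$\sigma = \sqrt{\frac{20143}{10} - \left(\frac{385}{10}\right)^2}$

$\sigma = \sqrt{2014.3 - 1482.25}$

$\sigma = \sqrt{532.05}$

σ = 23.07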

Individual series:

Calculate standard deviation from the following data:

14, 22, 9, 15, 20, 17, 12, 11.

Solution:

X X2
14 196
22 484
9 81
15 225
20 400
17 289
12 144
11 121
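∑ X = 120    ∑ X² = 1940

$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$

$\sigma = \sqrt{\frac{1940}{8} - \left(\frac{120}{8}\right)^2}$

$\sigma = \sqrt{242.5 - 225} = \sqrt{17.5}$

σ = 4.18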

Discrete series:

Calculate standard deviation from the following:

Marks: 10 20 30 40 50 60
No. of Students: 8 12 20 10 7 3

Solution:

x     f     fx     fx²
10    8     80     800
20    12    240    4800
30    20    600    18000
40    10    400    16000
50    7     350    17500
60    3     180    10800
∑ f = 60    ∑ fx = 1850    ∑ fx² = 67900

$\sigma = \sqrt{\frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2}$

= $\sqrt{\frac{67900}{60} - \left(\frac{1850}{60}\right)^2}$

= $\sqrt{1131.67 - (30.83)^2}$

= $\sqrt{1131.67 - 950.69} = \sqrt{180.98}$

σ = 13.45

Continuous series:

Class(x): 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency: 8 12 17 14 9 7 4

Solution:

x          f     m     m²      fm     fm²
0 – 10     8     5     25      40     200
10 – 20    12    15    225     180    2700
20 – 30    17    25    625     425    10625
30 – 40    14    35    1225    490    17150
40 – 50    9     45    2025    405    18225
50 – 60    7     55    3025    385    21175
60 – 70    4     65    4225    260    16900
∑ f = 71    ∑ fm = 2185    ∑ fm² = 86975

$\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2}$

$\sigma = \sqrt{\frac{86975}{71} - \left(\frac{2185}{71}\right)^2}$

= $\sqrt{1225.00 - 947.08} = \sqrt{277.92}$

= 16.67
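
The direct formula for the standard deviation can be sketched as below. The function name std_dev is an assumption; frequencies are optional, and for a continuous series the mid-values m are passed as the values. It divides by N, as these notes do, not by N − 1.

```python
import math


def std_dev(values, freqs=None):
    """sigma = sqrt(sum(f*x^2)/N - (sum(f*x)/N)^2), dividing by N."""
    if freqs is None:
        freqs = [1] * len(values)       # individual series: each item once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(0.0, mean_of_squares - mean ** 2))


marks = [5, 10, 20, 25, 40, 42, 45, 48, 70, 80]
print(round(std_dev(marks), 2))         # first individual series example
```

Recomputing the fx² or fm² column in this way is a quick check on a hand-built table, because a single wrong product changes the final answer.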

DEFINITION: SKEWNESS:

“When a series is not symmetrical it is said to be asymmetrical or skewed.”

SKEWNESS:

A distribution which is not symmetrical is called a skewed distribution and in such


distributions, the Mean, the Median and the Mode will not coincide, but the values
are pulled apart.
If the curve has a longer tail towards the right, it is said to be positively skewed.
If the curve has a longer tail towards the left, it is said to be negatively skewed.

CHARACTERISTICS OF DISPERSION AND SKEWNESS

Dispersion                                        Skewness
It shows us the spread of individual values       It shows us departure from symmetry.
about the central value.
It is useful to study the variability in data.    It is useful to study the concentration in
                                                  lower or higher variables.
It judges the truthfulness of the central         It judges the differences between the central
tendency.                                         tendencies.
It is a type of average of deviation-average      It is not an average, but is measured by the
of the second order.                              use of the mean, the median and mode.
It shows the degree of variability.               It shows whether the concentration is in
                                                  higher or lower values.

MEASURES OF SKEWNESS:

The measures of asymmetry are usually called measures of skewness.
Measures of skewness indicate not only the extent of skewness, but also the direction; i.e.,
the manner in which the deviations are distributed.
These measures can be absolute or relative.
The absolute measures are also known as measures of skewness.
The relative measures are known as the coefficients of skewness.

Absolute skewness = Mean – Mode

If the value of the Mean is greater than the Mode, the skewness is positive (+).
If the value of the Mode is greater than the Mean, the skewness is negative (−).

OBJECTIVE OF SKEWNESS:

Measures of skewness tell us the direction and extent of asymmetry in a series, and permit
us to compare two or more series with regard to these.

Measures of skewness give an idea about the nature of variation of the items about the
central value.

RELATIVE MEASURE OF SKEWNESS:

There are three important measures of relative skewness:

Karl Pearson’s Coefficient of Skewness.


Bowley’s Coefficient of Skewness.
Kelly’s Coefficient of Skewness.

KARL PEARSON’S COEFFICIENT OF SKEWNESS:

The absolute skewness = Mean – Mode.


This measure is not suitable for making valid comparison of the skewness in two or more
distributions, because,
➢ the unit of measurement may be different in different series, and
➢ the same size of skewness has different significance with small or large variation
in two series.
Hence a relative measure is adopted.
Karl Pearson’s coefficient of skewness is defined as

Coefficient of Skewness (SKp) = $\frac{\bar{X} - \text{Mode}}{\sigma}$

In case the mode is ill-defined, the coefficient can be determined by the changed formula:

Coefficient of Skewness (SKp) = $\frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(\bar{X} - M)}{\sigma}$

Individual Series:

Calculate Karl Pearson’s coefficient of skewness for the following data:

25 15 23 40 27 25 23 25 20

Solution:

S. No    Marks (X)    X²
1        25           625
2        15           225
3        23           529
4        40           1600
5        27           729
6        25           625
7        23           529
8        25           625
9        20           400
Total    ∑ X = 223    ∑ X² = 5887

Standard Deviation: Formula, $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$

= $\sqrt{\frac{5887}{9} - \left(\frac{223}{9}\right)^2}$

= $\sqrt{654.11 - (24.78)^2}$

= $\sqrt{654.11 - 614.0484}$

= $\sqrt{40.0616}$

σ = 6.33

$\bar{X} = \frac{\sum X}{N}$

= $\frac{223}{9}$

$\bar{X}$ = 24.78

Mode Z = 25

Karl Pearson’s coefficient of skewness = $\frac{\text{Mean} - \text{Mode}}{\text{S.D}}$

= $\frac{24.78 - 25}{6.33}$

= $\frac{-0.22}{6.33}$

Coefficient of skewness = -0.03

Discrete Series:

Find the coefficient of skewness from the data given below:-

Size 3 4 5 6 7 8 9 10
frequency 7 10 14 35 102 136 43 8
Solution:-

X     f      X²     fX      fX²
3     7      9      21      63
4     10     16     40      160
5     14     25     70      350
6     35     36     210     1260
7     102    49     714     4998
8     136    64     1088    8704
9     43     81     387     3483
10    8      100    80      800
Total N = 355    ∑ fX = 2610    ∑ fX² = 19818

Standard Deviation, $\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}$

= $\sqrt{\frac{19818}{355} - \left(\frac{2610}{355}\right)^2}$

= $\sqrt{55.82 - (7.35)^2}$

= $\sqrt{55.82 - 54.05}$

= $\sqrt{1.77}$

= 1.33

Mean $\bar{X} = \frac{\sum fX}{N}$

= $\frac{2610}{355}$

Mean = 7.35

Mode = 8

Karl Pearson’s coefficient of skewness:

= $\frac{\text{Mean} - \text{Mode}}{\text{S.D}}$

= $\frac{7.35 - 8}{1.33}$

= -0.49
Continuous Series:

Find the standard deviation and coefficient of skewness for the given distribution:

Variable 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40


Frequency 2 5 7 13 21 16 8 3
Solution:-

Variable (X) Frequency (f) Mid Value (m) 𝒎𝟐 fm f𝒎𝟐


0-5 2 2.5 6.25 5 12.5
5-10 5 7.5 56.25 37.5 281.25
10-15 7 12.5 156.25 87.5 1093.75
15-20 13 17.5 306.25 227.5 3981.25
20-25 21 22.5 506.25 472.5 10631.25
25-30 16 27.5 756.25 440 12100
30-35 8 32.5 1056.25 260 8450
35-40 3 37.5 1406.25 112.5 4218.75
Total N = 75    ∑ fm = 1642.5    ∑ fm² = 40768.75

Standard Deviation, $\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2}$

= $\sqrt{\frac{40768.75}{75} - \left(\frac{1642.5}{75}\right)^2}$

= $\sqrt{543.58 - (21.9)^2}$

= $\sqrt{543.58 - 479.61}$

= $\sqrt{63.97}$

= 7.998 (or) 8

Mean $\bar{X} = \frac{\sum fm}{N}$

= $\frac{1642.5}{75}$

= 21.9

Mode lies in the 20-25 group, which contains the maximum frequency.

$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$

= $20 + \frac{21 - 13}{2(21) - 13 - 16} \times 5$

= $20 + \frac{40}{13}$

= 23.08

Karl Pearson’s coefficient of skewness:

= $\frac{\text{Mean} - \text{Mode}}{\text{S.D}}$

= $\frac{21.9 - 23.08}{8}$

= -0.148
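
Both forms of Karl Pearson's coefficient reduce to one line each, as sketched below. The function names are assumptions; the mean, mode (or median) and standard deviation are taken as already computed.

```python
def pearson_skewness(mean, mode, sd):
    """SKp = (Mean - Mode) / S.D."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """SKp = 3 * (Mean - Median) / S.D., used when the mode is ill-defined."""
    return 3 * (mean - median) / sd


print(round(pearson_skewness(21.9, 23.08, 8), 3))   # continuous series above
print(pearson_skewness_median(20, 17, 4))           # 2.25
```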

EXAMPLE:

From a moderately skewed distribution of retail prices for men’s shoes, it is found that the
mean price is RS. 20 and the median price is Rs. 17. If the coefficient of variation is 20% , find
the pearson’s coefficient of skewness of the distribution.

Solution:
SK = $\frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$

Here, mean = 20 and median = 17 are given in the problem. To find the coefficient of skewness
we need standard deviation.

C.V. = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100$

20 = $\frac{\sigma}{20} \times 100$

5σ = 20

σ = 4

SK = $\frac{3(20 - 17)}{4}$

= $\frac{3(3)}{4}$

= 2.25

BOWLEY’S COEFFICIENT OF SKEWNESS:

In Karl Pearson’s method of measuring skewness, the whole of the series is needed.


Prof. Bowley has suggested a formula based on relative position of quartiles.

Absolute SK = (Q3 – Median) – (Median – Q1)

= Q3 + Q1 – 2Median

Q 3 + Q1 − 2Median
Coefficient of SK =
Q 3 − Q1

BOWLEY’S COEFFICIENT OF SKEWNESS

From the information given below calculate quartile or Bowley’s coefficient of skewness:

Measure place
Median 201.0
S.D. 215.4
Third quartile 260.0
First quartile 157.0

Solution:

Quartile coefficient of skewness:

$SK = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

$= \dfrac{260 + 157 - 2(201)}{260 - 157}$

$= \dfrac{417 - 402}{103}$

$= \dfrac{15}{103}$

$= 0.146$

Continuous Series:

Calculate Bowley’s measure of skewness from the following data:

Payment of commission RS No. of. salesmen
1000-1200 4
1200-1400 10
1400-1600 16
1600-1800 29
1800-2000 52
2000-2200 80
2200-2400 32
2400-2600 23
2600-2800 17
2800-3000 7
Solution:

Payment of commission RS No. of. salesmen Cumulative Frequency c.f


1000-1200 4 4
1200-1400 10 14
1400-1600 16 30
1600-1800 29 59
1800-2000 52 111
2000-2200 80 191
2200-2400 32 223
2400-2600 23 246
2600-2800 17 263
2800-3000 7 270

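Median M = size of the $\dfrac{N}{2}$th item

$= \dfrac{270}{2}$ = 135th item, which lies in the class 2000-2200

Median $= L_1 + \left[\dfrac{\frac{N}{2} - c.f.}{f}\right] \times i$

$= 2000 + \left[\dfrac{135 - 111}{80}\right] \times 200$

$= 2000 + 60$

$= 2060$

$Q_1$ = size of the $\dfrac{N}{4}$th item

$= \dfrac{270}{4}$ = 67.5th item, which lies in the class 1800-2000

$Q_1 = L_1 + \left[\dfrac{\frac{N}{4} - c.f.}{f}\right] \times i$

$= 1800 + \left[\dfrac{67.5 - 59}{52}\right] \times 200$

$= 1800 + 32.69$

$Q_1 = 1832.69$

$Q_3$ = size of the $\dfrac{3N}{4}$th item

$= \dfrac{3 \times 270}{4}$ = 202.5th item, which lies in the class 2200-2400

$Q_3 = L_1 + \left[\dfrac{\frac{3N}{4} - c.f.}{f}\right] \times i$

$= 2200 + \left[\dfrac{202.5 - 191}{32}\right] \times 200$

$= 2200 + 71.88$

$Q_3 = 2271.88$

Bowley’s coefficient of skewness:

$SK = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

$= \dfrac{2271.88 + 1832.69 - 2 \times 2060}{2271.88 - 1832.69}$

$= \dfrac{4104.57 - 4120}{439.19}$

$= -0.035$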
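
The quartile interpolation used above is the same for Q1, the median and Q3; only the position (N/4, N/2, 3N/4) changes. The following sketch illustrates this with a hypothetical helper `grouped_quantile`, which is not part of the original notes.

```python
# Commission classes (Rs) and number of salesmen from the worked example.
classes = [(1000, 1200), (1200, 1400), (1400, 1600), (1600, 1800), (1800, 2000),
           (2000, 2200), (2200, 2400), (2400, 2600), (2600, 2800), (2800, 3000)]
freqs = [4, 10, 16, 29, 52, 80, 32, 23, 17, 7]


def grouped_quantile(classes, freqs, position):
    """Interpolate the value of the item at `position` (e.g. N/4) in grouped data."""
    cum = 0  # cumulative frequency of the classes below the current one
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= position:
            return lo + (position - cum) / f * (hi - lo)
        cum += f
    raise ValueError("position exceeds the total frequency")


n = sum(freqs)
q1 = grouped_quantile(classes, freqs, n / 4)
median = grouped_quantile(classes, freqs, n / 2)
q3 = grouped_quantile(classes, freqs, 3 * n / 4)

# Bowley's coefficient of skewness.
bowley_sk = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(q1, 2), round(median, 2), round(q3, 2), round(bowley_sk, 3))
```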

UNIT-IV

CORRELATION

INTRODUCTION :

Correlation refers to the relationship between two or more variables. We can find
some relationship between two variables.
Correlation is the statistical analysis which measures and analyses the degree or
extent to which two variables fluctuate with reference to each other.
There may be fluctuations or co-variation between the values of the variables.

DEFINITION:

“Correlation analysis is the degree of relationship between variables”. It is denoted by r.


Example: Price and demand of a commodity.

TYPES OF CORRELATION:

Correlation is classified into many types, but the important ones are:

Positive and negative
Simple and multiple
Partial and total
Linear and non-linear
No correlation

POSITIVE AND NEGATIVE CORRELATION :

Positive correlation:

If two variables tend to move together in the same direction, i.e., an increase in the value of
one variable is accompanied by an increase in the value of the other variable, or a decrease
in the value of one variable is accompanied by a decrease in the value of the other
variable, then the correlation is called positive.
Ex: Height and weight, rainfall and yield of crops, price and supply.

X 10 20 30 40 50
Y 50 60 70 80 90
Negative correlation:

If two variables tend to move in opposite directions, so that an increase or
decrease in the value of one variable is accompanied by a decrease or increase in
the value of the other variable, then the correlation is called negative.
Ex: Price and demand, yield and weeds.

x 10 20 30 40 50
y 50 40 25 15 10

ii) Simple and multiple correlation :

Simple correlation:

When we study only two variables, the relationship is described as simple correlation.
Eg: quantity of money and price level, demand and price.

Multiple correlation:

When we study more than two variables simultaneously, the relationship is described as multiple correlation.
Eg: the relationship of price, demand and supply of a commodity.

iii) Partial and total:

Partial correlation:

The study of two variables, excluding some other variables, is called partial correlation.
Eg: we study price and demand, eliminating the supply side.

Total correlation:

In total correlation, all the facts are taken into account.

iv) Linear and Non-Linear:

If the ratio of change between two variables is uniform, there will be linear correlation.

X 5 10 15 20
Y 4 8 12 16

If we plot the values on a graph and they form a straight line, then such a correlation is
called linear correlation.
In a curvilinear or non-linear correlation, the amount of change in one variable does not
bear a constant ratio to the amount of change in the other variable.
If we plot the values on a graph and they form a curve, or are scattered around a curve, then the
correlation is called curvilinear.

v) No Correlation:

When the points are scattered there is no correlation between the two variables.

Methods of Correlation:

1. Graphic Method:
• Scatter Diagram
• Simple Graph

2. Mathematical Method:
• Karl Pearson’s Coefficient of Correlation
• Spearman’s Rank coefficient of Correlation
• Coefficient of Concurrent Deviation.
• Method of least squares.
i) Scatter diagram :
This is the simple method of finding out whether there is any relationship between two
variables by plotting the values on a chart, known as Scatter diagram.
If the plotted points form a straight line running from the lower left-hand corner to the
upper right-hand corner, then there is a perfect positive correlation i.e., r = +1
On the other hand , if the points are in a straight line, running from the upper left-hand
corner to the lower right – hand corner, it reveals that there is a perfect negative or
inverse correlation i.e., r = - 1,

[Figure: scatter diagrams showing a high degree of positive correlation, a high degree of negative correlation, and no correlation]

Karl Pearson’s Coefficient of Correlation:

This is also called the product moment correlation coefficient.
It is denoted by r:

$r = \dfrac{N\sum xy - \sum x \sum y}{\sqrt{N\sum x^2 - (\sum x)^2}\,\sqrt{N\sum y^2 - (\sum y)^2}}$

Merits:

Karl Pearson’s correlation coefficient is the most popular correlation coefficient. It is
also used in regression equations.
The coefficient of correlation summarises in one figure the degree of correlation and its
direction. Moreover, we can estimate the value of the dependent variable from known
values of the independent variable.
The population correlation coefficient can be estimated from the sample value.
The significance of the sample correlation coefficient can be tested.

Demerits:

The Correlation coefficient is unduly affected by extreme values.


From the value of r, it cannot be known whether the assumption of a linear relationship
between the variables holds or not.
Compared with other correlation coefficients, Karl Pearson’s coefficient is the most
difficult one to calculate.
The calculation of the coefficient of correlation is time-consuming.

Example 1
Calculate karl Pearson’s Correlation Coefficient from the following data.

X 40 45 50 53 60 57 51 48 45 47
Y 75 69 64 70 71 75 83 90 92 65

Solution:

X Y X2 Y2 XY
40 75 1600 5625 3000
45 69 2025 4761 3105
50 64 2500 4096 3200
53 70 2809 4900 3710
60 71 3600 5041 4260
57 75 3249 5625 4275
51 83 2601 6889 4233
48 90 2304 8100 4320
45 92 2025 8464 4140
47 65 2209 4225 3055
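∑X = 496 ∑Y = 754 ∑X² = 24922 ∑Y² = 57726 ∑XY = 37298

$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$

$= \dfrac{10 \times 37298 - 496 \times 754}{\sqrt{10(24922) - (496)^2}\,\sqrt{10(57726) - (754)^2}}$

$= \dfrac{372980 - 373984}{\sqrt{249220 - 246016}\,\sqrt{577260 - 568516}}$

$= \dfrac{-1004}{\sqrt{3204}\,\sqrt{8744}}$

$= \dfrac{-1004}{56.6039 \times 93.5094}$

$= \dfrac{-1004}{5292.9967}$

$= -0.1897$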
Example 2:
Calculate Coefficient of Correlation from the following data.

X 12 9 8 10 11 13 7
Y 14 8 6 9 11 12 3

Solution:

X Y X2 Y2 XY
12 14 144 196 168
9 8 81 64 72
8 6 64 36 48
10 9 100 81 90
11 11 121 121 121
13 12 169 144 156
7 3 49 9 21
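∑X = 70 ∑Y = 63 ∑X² = 728 ∑Y² = 651 ∑XY = 676

$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$

$= \dfrac{7 \times 676 - 70 \times 63}{\sqrt{7(728) - (70)^2}\,\sqrt{7(651) - (63)^2}}$

$= \dfrac{4732 - 4410}{\sqrt{5096 - 4900}\,\sqrt{4557 - 3969}}$

$= \dfrac{322}{\sqrt{196 \times 588}}$

$= \dfrac{322}{339.48}$

$r = 0.95$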
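
Both worked examples follow the same raw-sums formula, so they can be verified with one small function. This is a minimal sketch; the name `pearson_r` is chosen here for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's coefficient computed from the raw sums, as in the worked examples."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


# Example 1
x1 = [40, 45, 50, 53, 60, 57, 51, 48, 45, 47]
y1 = [75, 69, 64, 70, 71, 75, 83, 90, 92, 65]

# Example 2
x2 = [12, 9, 8, 10, 11, 13, 7]
y2 = [14, 8, 6, 9, 11, 12, 3]

r1 = pearson_r(x1, y1)
r2 = pearson_r(x2, y2)
print(round(r1, 4), round(r2, 2))
```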

Spearman’s Rank Correlation Coefficient:

This method is based on ranks.
This measure is useful in dealing with qualitative characteristics, such as intelligence,
beauty, morality, character, etc.
The formula for Spearman’s rank correlation, which is denoted by ρ, is:

$\rho = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

We may come across two types of problem:

• Where ranks are given
• Where ranks are not given

Where ranks are given:

When the actual ranks are given, the steps followed are:

1. Compute the difference of the two ranks (R1 and R2) and denote it by d.
2. Square each d and get ∑d².
3. Substitute the figures in the formula.

Where ranks are not given:

When no ranks are given, but actual data are given, then we must assign ranks. We can assign
ranks by taking the highest as 1 or the lowest as 1, the next to the highest (lowest) as 2, and so on, following the
same procedure for both the variables.

Equal or repeated rank:

When two or more items have equal values, it is difficult to give ranks to them. In that case
the items are given the average of the ranks they would have received if they were not tied. For
example, if two individuals are placed in the seventh place, they are each given the rank (7+8)/2 =
7.5, which is the common rank to be assigned, and the next rank will be 9; and if three are ranked equal at
the seventh place, they are given the rank (7+8+9)/3 = 8, which is the common rank to be
assigned to each, and the next rank will be 10 in this case. A slightly different formula is used
when there is more than one item having the same value. The formula is

$\rho = 1 - \dfrac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right\}}{N^3 - N}$

where m is the number of items sharing a common rank.
Merits:

Spearman’s rank correlation coefficient is useful in qualitative analysis. For example, it is
sufficient for the judges to rank the competitors.
Judges need not assign scores. It is more difficult to assign scores to the competitors than
to rank them.
It is the only method available when only ranks are given.
It can also be calculated when the values of the variables are given.
It is simple to understand.
It is generally easy to calculate.

Demerits:

1. If N is large, it is very difficult to rank the items and to calculate ρ.
2. It cannot be calculated from a bivariate frequency distribution.
3. It is not used much.

Problem - 1

Calculate spearman’s Coefficient of rank correlation for the following data.


X 1 2 3 4 5 6 7 8 9 10
Y 2 4 1 5 3 9 7 10 6 8
Solution

RX RY D D2
1 2 -1 1
2 4 -2 4
3 1 2 4
4 5 -1 1
5 3 2 4
6 9 -3 9
7 7 0 0
8 10 -2 4
9 6 3 9
10 8 2 4
∑D² = 40

$\rho = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

$= 1 - \dfrac{6 \times 40}{10(10^2 - 1)}$

$= 1 - \dfrac{240}{10 \times 99}$

$= 1 - \dfrac{240}{990}$

$= 1 - 0.24$

$\rho = 0.76$

Example 2
Calculate the rank Correlation coefficient for the following data:

X 65 68 67 69 66 70 71 75
Y 65 68 67 64 72 70 86 73
Solution:

X Y RX RY d = RX - RY d2
65 65 8 7 1 1
68 68 5 5 0 0
67 67 6 6 0 0
69 64 4 8 -4 16
66 72 7 3 4 16
70 70 3 4 -1 1
71 86 2 1 1 1
75 73 1 2 -1 1

∑d² = 36, N = 8

$\rho = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

$= 1 - \dfrac{6 \times 36}{8(8^2 - 1)}$

$= 1 - \dfrac{216}{504}$

$= 1 - 0.4286$

$\rho = 0.5714$

EXAMPLE 3: FOR REPEATED RANKS


Calculate rank correlation Coefficient

Marks in Economics (x) 50 60 65 70 75 40 70 80
Marks in Statistics (y) 80 71 60 75 90 82 70 50

Solution :

X Y RX RY d = R X - RY d2
50 80 7 3 4 16
60 71 6 5 1 1
65 60 5 7 -2 4
70 75 3.5 4 -0.5 0.25
75 90 2 1 1 1
40 82 8 2 6 36
70 70 3.5 6 -2.5 6.25
80 50 1 8 -7 49
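∑d² = 113.5
N = 8
70 is repeated 2 times, m = 2

$\rho = 1 - \left[\dfrac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12}\right\}}{N(N^2 - 1)}\right]$

$= 1 - \left[\dfrac{6\left\{113.5 + \frac{2(2^2 - 1)}{12}\right\}}{8(8^2 - 1)}\right]$

$= 1 - \left[\dfrac{6\{113.5 + 0.5\}}{8 \times 63}\right]$

$= 1 - \left[\dfrac{6 \times 114}{504}\right]$

$= 1 - \dfrac{684}{504}$

$= 1 - 1.3571$

$\rho = -0.3571$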

Problem : REPEATED RANKS

Marks obtained by 8 students in Maths and statistics are given below. Compute the rank
Correlation.

Marks in Maths (x) 15 20 28 12 40 50 20 80
Marks in Statistics (y) 40 30 55 60 40 30 60 70
Solution:

X Y RX RY d = R X - RY d2
15 40 7 5.5 1.5 2.25
20 30 5.5 7.5 -2 4
28 55 4 4 0 0
12 60 8 2.5 5.5 30.25
40 40 3 5.5 -2.5 6.25
50 30 2 7.5 -5.5 30.25
20 60 5.5 2.5 3 9
80 70 1 1 0 0
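∑d² = 82, N = 8

In X, 20 is repeated 2 times, m = 2

In Y, 60 is repeated 2 times, m = 2

In Y, 40 is repeated 2 times, m = 2

In Y, 30 is repeated 2 times, m = 2

$\rho = 1 - \left[\dfrac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12}\right\}}{N(N^2 - 1)}\right]$

$= 1 - \left[\dfrac{6\left\{82 + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right\}}{8(8^2 - 1)}\right]$

$= 1 - \left[\dfrac{6\{82 + 0.5 + 0.5 + 0.5 + 0.5\}}{8 \times 63}\right]$

$= 1 - \dfrac{6 \times 84}{504}$

$= 1 - \dfrac{504}{504}$

$= 1 - 1$

$\rho = 0$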
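
The ranking and tie-correction steps used in the examples above can be sketched as follows. The helper names `average_ranks` and `spearman_rho` are illustrative, and the sketch assumes the highest value is ranked 1, as in the worked solutions.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the highest as 1; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_rho(x, y):
    """Spearman's rho with the m(m^2 - 1)/12 correction added for every group of ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(
        m * (m * m - 1) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Example 3 (one tie in X) and the repeated-ranks problem (ties in X and Y).
economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics_marks = [80, 71, 60, 75, 90, 82, 70, 50]
maths = [15, 20, 28, 12, 40, 50, 20, 80]
stats2 = [40, 30, 55, 60, 40, 30, 60, 70]

print(round(spearman_rho(economics, statistics_marks), 4))
print(round(spearman_rho(maths, stats2), 4))
```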

Problem:

Ten Competitors in a beauty contest are ranked by three judges in the following order:

I Judge 1 5 4 8 9 6 10 7 3 2
II Judge 4 8 7 6 5 9 10 3 2 1
III Judge 6 7 8 1 5 10 9 2 3 4

Use the rank correlation coefficient to discuss which pair of judges has the nearest approach to
common tastes in beauty.

Solution:

R1    R2    R3    (R1 – R2)²    (R2 – R3)²    (R1 – R3)²
1     4     6     9             4             25
5     8     7     9             1             4
4     7     8     9             1             16
8     6     1     4             25            49
9     5     5     16            0             16
6     9     10    9             1             16
10    10    9     0             1             1
7     3     2     16            1             25
3     2     3     1             1             0
2     1     4     1             9             4
Total             74            44            156

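1st & 2nd Judge

$\rho_{12} = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

$= 1 - \dfrac{6 \times 74}{10(10^2 - 1)}$

$= 1 - \dfrac{444}{990}$

$= 1 - 0.448$

$\rho_{12} = 0.552$

2nd & 3rd Judge

$\rho_{23} = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

$= 1 - \dfrac{6 \times 44}{10(10^2 - 1)}$

$= 1 - \dfrac{264}{990}$

$= 1 - 0.267$

$\rho_{23} = 0.733$

1st & 3rd Judge

$\rho_{13} = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

$= 1 - \dfrac{6 \times 156}{10(10^2 - 1)}$

$= 1 - \dfrac{936}{990}$

$= 1 - 0.945$

$\rho_{13} = 0.055$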

The Second and third judges have the nearest approach in common tastes in beauty, because the
coefficient of correlation is highest between them.

Concurrent Deviation method:

In this method, only the direction of change in the variables X and Y is taken into account. It
is the simplest method of finding out correlation. It is based on the signs of the deviations: for
each term, the change in the value of the variable from its preceding value
may be plus (+) or minus (-). The formula is:

$r_{(c)} = \pm\sqrt{\pm\dfrac{2C - N}{N}}$

Where,

r(c) = Coefficient of Correlation by the concurrent deviation method

C = No.of Concurrent deviations

N = No.of Pairs of deviation Compared.

Steps:

1. Find out the direction of change of the X variable. Take the first value of X as the base and
note whether the second value is increasing, decreasing or constant. If it
increases in relation to the previous one, mark a plus (+) sign against it; if it decreases, put a
minus (-) sign; and if it is equal, put zero. In the case of the third value, the second value
is the base; repeat the above method till the last item. The heading of the column is
denoted by Dx.
2. Find out the direction of change of the Y variable, following the above step. The heading of
the column is denoted by Dy.
3. Multiply Dx by Dy and find out the value of C, i.e., the number of positive items.
4. Substitute the figures in the formula.
If $\dfrac{2C - N}{N}$ is negative, the negative value multiplied by the minus sign inside the root will make
it positive and we can take the square root; the minus sign outside the root then makes r(c) negative,
because we cannot take the square root of a negative number directly.
If $\dfrac{2C - N}{N}$ is positive, then all the signs will be positive.

Merits:

1. It is easiest and the simplest method.


2. It is used in the study of short time oscillation.
3. This method can be used when the numbers of items are very large; and we get a quick
idea of the degree of relationship.

Demerits:

1. It gives equal weight to small and big changes.


2. It provides only a rough measure of coefficient of correlation.
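
The steps of the concurrent deviation method described above can be sketched in a few lines. The notes give no worked example for this method, so the data below are hypothetical and the function names are illustrative; C is counted as the number of positive products Dx × Dy, as stated in step 3.

```python
from math import sqrt


def direction_signs(series):
    """Return +1, -1 or 0 for each change from the preceding value."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def concurrent_deviation(x, y):
    """Coefficient of correlation by the concurrent deviation method."""
    dx, dy = direction_signs(x), direction_signs(y)
    n = len(dx)                                        # pairs of deviations compared
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)    # concurrent deviations
    ratio = (2 * c - n) / n
    # Take the root of the absolute value and carry the sign outside the root.
    return sqrt(abs(ratio)) if ratio >= 0 else -sqrt(abs(ratio))


# Hypothetical data, for illustration only.
x = [10, 12, 11, 15, 14]
y = [5, 7, 6, 5, 4]
print(round(concurrent_deviation(x, y), 4))
```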

Regression

Introduction:

Regression literally means stepping back towards the average.
The term was used by the British biometrician Sir Francis Galton (1822-1911) in connection with the
inheritance of stature.
“Regression analysis is a mathematical measure of the average relationship between two or
more variables in terms of the original units of the data.”
Regression equation: The value of the dependent variable is estimated corresponding to
any value of the independent variable by using the regression equation.
In regression there are two types of variable:
(i) Dependent variable (ii) Independent variable
The variable whose value is influenced or is to be predicted is called the dependent variable,
and the variable which influences the values or is used for prediction is called the
independent variable.
In regression analysis the independent variable is also known as the regressor, predictor or
explanatory variable, while the dependent variable is also known as the regressed or
explained variable.

Properties of Regression Lines and Coefficients

The two regression equations are generally different and are not to be interchanged in their
usage.
The two regression lines intersect at $(\bar{X}, \bar{Y})$.
The correlation coefficient is the geometric mean of the two regression coefficients.
That is, the correlation coefficient is the square root of the product of the two regression
coefficients:
$r = \pm\sqrt{b_{YX} \cdot b_{XY}}$

The two regression coefficients and the correlation coefficient have the same sign.
Both the regression coefficients cannot be numerically greater than 1 simultaneously.
Regression coefficients are independent of change of origin but are affected by change of
scale.
Each regression coefficient is in the unit of measurement of the dependent
variable.
Each regression coefficient indicates the quantum of change in the dependent variable
corresponding to a unit increase in the independent variable.
Uses of regression:
It is a more widely used method than correlation analysis.
It is used to estimate the relationship between two economic variables, such as income and
expenditure.
It predicts the value of the dependent variable from the values of the independent variable.
We can calculate the coefficient of correlation (r) and the coefficient of determination (r²).
It is used in the estimation of demand curves, supply, production, etc.
Difference between Correlation and Regression.

Correlation:

1. Correlation is the relationship between two or more variables. It is expressed numerically.
2. Between two variables, none is identified as the independent or the dependent variable.
3. It does not study the cause and effect relationship between the variables.
4. The coefficient of correlation is a relative measure. The range of relationship lies between -1 and +1.
5. It is not useful for further mathematical treatment.
6. It has limited application because it is confined to the linear relationship between the variables.
7. There is spurious or nonsense correlation.
8. If the coefficient is positive, then the two variables are positively correlated, and vice versa.
9. The correlation coefficient is independent of change of origin and scale.

Regression:

1. Regression means going back. The average relation between the variables is given as an equation.
2. One of the variables is the independent variable and the other is the dependent variable.
3. It indicates the cause and effect relationship between the variables and establishes a functional relationship.
4. The regression coefficient is an absolute measure. If we know the value of the independent variable, we can find the value of the dependent variable.
5. It is useful for further mathematical treatment.
6. It has wider application, as it studies linear and non-linear relationships between the variables.
7. There is no such nonsense regression.
8. The regression coefficient explains that the decrease in one variable is associated with the increase in the other variable.
9. Regression coefficients are independent of change of origin but are affected by change of scale.

Regression Line:

Linear regression attempts to model the relationship between two variables by fitting a
linear equation to observed data. A linear regression line has an equation of the form Y = a + bX,
where X is the explanatory variable and Y is the dependent variable.

Method of forming the regression equations

i) Regression equations on the basis of normal equations.
ii) Regression equations on the basis of $\bar{X}$, $\bar{Y}$, $b_{XY}$ and $b_{YX}$.

REGRESSION EQUATIONS ON THE BASIS OF NORMAL EQUATIONS

Regression line of Y on X:

This gives the most probable values of Y for given values of X.

Y = a + bX
Normal equations:

∑Y = na + b∑X (1)

∑XY = a∑X + b∑X² (2)

Regression line of X on Y:
This gives the most probable values of X for given values of Y.
X = a + bY
Normal equations:

∑X = na + b∑Y (1)

∑XY = a∑Y + b∑Y² (2)

The algebraic expressions of these regression lines are called regression equations.

ii) Regression equations on the basis of bXY & bYX

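• Regression coefficient of X on Y:

$b_{XY} = r \cdot \dfrac{\sigma_X}{\sigma_Y}$

$b_{XY} = \dfrac{\sum xy}{\sum y^2}$ (x and y being deviations from the respective means) (or) $b_{XY} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$

• Regression coefficient of Y on X:

$b_{YX} = r \cdot \dfrac{\sigma_Y}{\sigma_X}$ (or) $b_{YX} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$

Formula for regression equation:

Regression equation of X on Y:
$(X - \bar{X}) = b_{XY}(Y - \bar{Y})$
Regression equation of Y on X:
$(Y - \bar{Y}) = b_{YX}(X - \bar{X})$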
Method 1: Problem 1:
From the following data, obtain the two regression equations by the method of least squares.

X 6 2 10 4 8
Y 9 11 5 8 7

Solution:

X      Y      X²     Y²     XY
6      9      36     81     54
2      11     4      121    22
10     5      100    25     50
4      8      16     64     32
8      7      64     49     56
∑X = 30   ∑Y = 40   ∑X² = 220   ∑Y² = 340   ∑XY = 214

Here, n=5

i) Regression equation of X on Y:

X = a + bY (1)
Normal equations:

∑X = na + b∑Y (2)

∑XY = a∑Y + b∑Y² (3)

5a + 40b = 30 (4)
40a + 340b = 214 (5)

(4) × 8 → 40a + 320b = 240
          40a + 340b = 214
          ------------------------
Subtracting:   –20b = 26

b = –26/20

b = –1.3
Substituting b = –1.3 in equation (4):

5a + 40(–1.3) = 30
5a = 30 + 52 = 82
a = 16.4
Substituting a = 16.4 and b = –1.3 in equation (1):
X = a + bY
X = 16.4 – 1.3Y

ii) Regression equation of Y on X:

Y = a + bX (1)
Normal equations:

∑Y = na + b∑X (2)

∑XY = a∑X + b∑X² (3)

5a + 30b = 40 (4)
30a + 220b = 214 (5)

(4) × 6 → 30a + 180b = 240
          30a + 220b = 214
          ------------------------
Subtracting:   –40b = 26

b = –0.65
Substituting b = –0.65 in equation (4):

5a + 30(–0.65) = 40

5a = 40 + 19.5

a = 11.9

Substituting a = 11.9 and b = –0.65 in equation (1):

Y = a + bX

Y = 11.9 – 0.65X
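
The two pairs of normal equations can also be solved programmatically. The sketch below uses a hypothetical helper `fit_line` that eliminates a from the two normal equations. It takes the last Y value as 7, which is the value consistent with the Y² = 49 and XY = 56 entries and with ∑Y = 40 used in the solution. Solving 5a + 40b = 30 with b = –1.3 gives a = 16.4 for the X on Y line.

```python
def fit_line(u, v):
    """Fit v = a + b*u by solving the normal equations
    sum(v) = n*a + b*sum(u) and sum(u*v) = a*sum(u) + b*sum(u*u)."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    b = (n * suv - su * sv) / (n * suu - su * su)
    a = (sv - b * su) / n
    return a, b


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]   # last value taken as 7 (see the note above)

a_xy, b_xy = fit_line(Y, X)   # X on Y: X = a + bY
a_yx, b_yx = fit_line(X, Y)   # Y on X: Y = a + bX
print(round(a_xy, 2), round(b_xy, 2), round(a_yx, 2), round(b_yx, 2))
```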

Method :2

Problem:
Calculate the two regression equations from the following data.

X 10 12 13 12 16 15
Y 40 38 43 45 37 43

Also estimate Y when X = 20.

Solution:

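X      Y      X²      Y²       XY
10     40     100     1600     400
12     38     144     1444     456
13     43     169     1849     559
12     45     144     2025     540
16     37     256     1369     592
15     43     225     1849     645
∑X = 78   ∑Y = 246   ∑X² = 1038   ∑Y² = 10136   ∑XY = 3192

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{78}{6} = 13$

$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{246}{6} = 41$

$b_{XY} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$

$= \dfrac{6 \times 3192 - 78 \times 246}{6 \times 10136 - (246)^2}$

$= \dfrac{19152 - 19188}{60816 - 60516}$

$= \dfrac{-36}{300}$

$b_{XY} = -0.12$

$b_{YX} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$

$= \dfrac{6 \times 3192 - 78 \times 246}{6 \times 1038 - (78)^2}$

$= \dfrac{19152 - 19188}{6228 - 6084}$

$= \dfrac{-36}{144}$

$b_{YX} = -0.25$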
Regression equation of X on Y:

$X - \bar{X} = b_{XY}(Y - \bar{Y})$

$X - 13 = -0.12(Y - 41)$

$X - 13 = -0.12Y + 4.92$

$X = -0.12Y + 17.92$

Regression equation of Y on X:

$Y - \bar{Y} = b_{YX}(X - \bar{X})$

$Y - 41 = -0.25(X - 13)$

$Y - 41 = -0.25X + 3.25$

$Y = -0.25X + 44.25$

Estimated value of Y when X = 20:

$Y = -0.25(20) + 44.25$

$= -5 + 44.25$

$Y = 39.25$
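
The regression-coefficient method can be sketched as below. The function names are illustrative only. Expanding X – 13 = –0.12(Y – 41) and Y – 41 = –0.25(X – 13) gives intercepts of 13 + 0.12 × 41 = 17.92 and 41 + 0.25 × 13 = 44.25, which the script reproduces by predicting at zero.

```python
def regression_coefficients(x, y):
    """Return (mean of x, mean of y, b_xy, b_yx) from the raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    b_xy = numerator / (n * syy - sy * sy)   # regression coefficient of X on Y
    b_yx = numerator / (n * sxx - sx * sx)   # regression coefficient of Y on X
    return sx / n, sy / n, b_xy, b_yx


X = [10, 12, 13, 12, 16, 15]
Y = [40, 38, 43, 45, 37, 43]
x_bar, y_bar, b_xy, b_yx = regression_coefficients(X, Y)


def predict_y(x):
    """Regression of Y on X: Y - y_bar = b_yx (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def predict_x(y):
    """Regression of X on Y: X - x_bar = b_xy (Y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print(predict_y(20))
```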

Method 2:

Case(ii)

Problem:

From the following information on the values of two variables X and Y, find the two regression
equations and the correlation coefficient. N = 10, ∑X = 20, ∑Y = 40, ∑X² = 240, ∑Y² = 410, ∑XY = 200

Solution:

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{20}{10} = 2$

$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{40}{10} = 4$

$b_{XY} = \dfrac{10 \times 200 - 20 \times 40}{10 \times 410 - (40)^2}$

$= \dfrac{2000 - 800}{4100 - 1600}$

$b_{XY} = 0.48$

$b_{YX} = \dfrac{10 \times 200 - 20 \times 40}{10 \times 240 - (20)^2}$

$= \dfrac{2000 - 800}{2400 - 400}$

$b_{YX} = 0.6$

Regression equation of X on Y:

$X - \bar{X} = b_{XY}(Y - \bar{Y})$

$X - 2 = 0.48(Y - 4)$
$X = 0.48Y + 0.08$

Regression equation of Y on X:

$Y - \bar{Y} = b_{YX}(X - \bar{X})$
$Y - 4 = 0.6(X - 2)$
$Y = 0.6X + 2.8$

Correlation coefficient:

$r = \pm\sqrt{b_{XY} \cdot b_{YX}}$
$= \pm\sqrt{0.48 \times 0.6}$
$= \pm\sqrt{0.288}$
$r = 0.5367$
Method :2

Case (iii)

Problem :

The following are two regression equations of the X and Y series:

8X – 10Y = –66

40X – 18Y = 214

Find: (i) Average values of X and Y.

(ii) Correlation coefficient between the two variables.

(iii) Standard deviation of Y, given that the variance of X = 9.

Solution:

1) The average values of X and Y can be obtained by solving the given equations:

8X – 10Y = –66 (1)
40X – 18Y = 214 (2)
Multiply equation (1) by 5 and subtract equation (2) from it:

40X – 50Y = –330
40X – 18Y = 214
------------------
     –32Y = –544

Y = 17

Substituting the value of Y in equation (1):

8X – (10 × 17) = –66
8X = –66 + 170

X = 104/8

X = 13
Hence the mean values are X = 13 and Y = 17.
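
The solution above covers part (i) only. The sketch below shows one way parts (ii) and (iii) could be approached, under a stated assumption: the first equation is read as the regression of Y on X and the second as the regression of X on Y, since that reading keeps the product of the two coefficients below 1. The helper names are illustrative. Under this assumption the script gives r = 0.6 and a standard deviation of Y equal to 4.

```python
from math import sqrt

# Each line is stored as (a, b, c) for a*X + b*Y = c.
line1 = (8, -10, -66)
line2 = (40, -18, 214)


def intersection(l1, l2):
    """Solve the two equations by Cramer's rule; the solution is (mean of X, mean of Y)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def coefficients(y_on_x, x_on_y):
    """Return (b_yx, b_xy) when the lines are read as Y on X and X on Y respectively."""
    a1, b1, _ = y_on_x
    a2, b2, _ = x_on_y
    return -a1 / b1, -b2 / a2


mean_x, mean_y = intersection(line1, line2)

b_yx, b_xy = coefficients(line1, line2)
if b_yx * b_xy > 1:                      # inadmissible reading: swap the roles
    b_yx, b_xy = coefficients(line2, line1)

# r takes the common sign of the two regression coefficients.
r = sqrt(b_yx * b_xy) if b_yx > 0 else -sqrt(b_yx * b_xy)

sigma_x = sqrt(9)                        # variance of X is given as 9
sigma_y = b_yx * sigma_x / r             # from b_yx = r * sigma_y / sigma_x
print(mean_x, mean_y, round(r, 2), round(sigma_y, 2))
```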

UNIT-IV

INDEX NUMBER

DEFINITION:

➢ An index number is a statistical measure designed to show changes in a variable or a
group of related variables with respect to time, geographic location or other
characteristics such as income, profession, etc.
➢ Index numbers which show changes in price or quantity in one time compared with
another alone are discussed here.
➢ The year for which the index number is calculated is called the current year. The year with
which the current year is compared is called the base year.
➢ A price index number is the percentage of change in the price of one commodity or one
group of commodities in the current year compared with the base year. A similar
calculation in quantity results in a quantity index number.

CHARACTERISTICS OF INDEX NUMBER:

1. Index numbers are a special type of average:

Index numbers are a special type of averages. For example, let the
commodities be rice, kerosene and cloth. The price of rice per kilogram is
considered; the price of kerosene per liter and the price of a cloth per metre are
considered. The average change in prices is indicated by the index number.

2. Index numbers are percentages:


The price in the current year is divided by the price in the base year to get
the ratio of change in price. It is multiplied by 100.
3. Index numbers indicate the percentage of change which is not possible
otherwise:
No other statistical tool is so effective in studying such a wide variety of
situations.
4. Index numbers are meant for comparisons:

Index numbers have been devised to compare two different times.
Comparisons of two different places or situations are also possible with index
numbers.

USES OF INDEX NUMBERS:

1 .Index numbers provide scopes for comparisons

2 .Index numbers are Economic Barometers

3 .Index numbers serve as guides

4. Index numbers are the pulse of an economy

5. Index numbers measure the purchasing power of money

Purchasing power of one rupee $= \dfrac{100}{\text{Price index}}$

6. Index numbers help to calculate real wages.

Real wage $= \dfrac{\text{Money wage}}{\text{Price index or cost of living index}} \times 100$

7. Index numbers are deflators .

8. Index numbers are useful to formulate policies.

CONSTRUCTION OF INDEX NUMBERS:

The following aspects are to be carefully considered during the construction of an index
number,

1. The purpose:

The purpose of the index number is to be clearly known; for whom it is meant, by
whom it is to be used, etc., are to be spelt out.

2. The base period:

The period may be one year or a few years. The base period is to be chosen
according to the purpose.
(i) It should be a normal period. There should not have been natural calamities
such as famine, flood and earthquake, or political upheavals, war, etc.

(ii) It should not be too distant in the past; this is to keep the index numbers
useful.

3. The items:

Including all the items in a study is neither feasible nor useful. Only
those items which concern the people for whom the index number is intended are to be
included. For example, in considering the living conditions of people in hill stations, woolen clothes
should be included.

4. The price quotations:

The prices are to be properly gathered. For a consumer
price index number, retail prices are necessary; for wholesale price indices,
wholesale prices are needed.

5. The average:

For arriving at the average value of a group of items, a suitable
average is to be decided. In other contexts A.M. may be more useful. It is simple
to understand and easy to calculate.

➢ G.M. is the appropriate average to measure relative changes. Hence,
index numbers, wherein the relative changes are expressed as percentages,
give scope for G.M.
➢ It facilitates the change of the base period. The base cannot be kept the same
for a long time because the purpose and all-round changes may warrant
a change in the base period.

6. Weighting:

By the unweighted method, an equal weightage of unity is given to all the items. In the weighted methods, the weights may be:

➢ Base year quantity as in Laspeyre’s method, or current year quantity as in
Paasche’s method, for price index numbers.
➢ Base year value (price × quantity) as in the consumer price index number
by the family budget method.
➢ Some fixed weight based on neither base year quantity nor current year
quantity but on some other consideration, as in Kelly’s method.

7 . The formula:

As seen in the following pages, many formulas are available.

Period is referred to as year hereafter, and the following notations are used.

p0 – price of a commodity in the base year

p1 – price of a commodity in the current year

q0 – quantity of a commodity in the base year

q1 – quantity of a commodity in the current year

p – price of a commodity

q – quantity of a commodity

V or W – weight of a commodity

I or P – price relative or price index number of a commodity

Q – quantity relative or quantity index number of a commodity

$P = \dfrac{p_1}{p_0} \times 100$, $Q = \dfrac{q_1}{q_0} \times 100$

P01 – price index number of the current year compared with the base year

Q01 – quantity index number of the current year compared with the base year

Formulae:

All the formulae can be brought under four groups as follows.

Methods
    Unweighted (simple)
        1. Simple aggregative method
        2. Simple average of relatives method
    Weighted
        3. Weighted aggregative method
        4. Weighted average of relatives method

1. SIMPLE OR UNWEIGHTED AGGREGATIVE METHOD.

It is based on the aggregates or totals, as shown below.

$P_{01} = \dfrac{\sum p_1}{\sum p_0} \times 100$

When a quantity index number is required,

$Q_{01} = \dfrac{\sum q_1}{\sum q_0} \times 100$

The drawbacks of this method are:

(i) It does not satisfy even the unit test, which is explained later. The defect is due
to the fact that the unit prices are added as such even though the units of
measurement are different, such as kg, litre, etc.
(ii) It does not distinguish between the commodities with regard to their
relative importance.

2. SIMPLE OR UNWEIGHTED AVERAGE OF RELATIVES METHOD:

The price relatives, P, for the price index number and the quantity relatives, Q,
for the quantity index number are calculated and their A.M. or G.M. is found.

Price index (P01):

(i) Using A.M., $P_{01} = \dfrac{\sum P}{N}$
(ii) Using G.M., $P_{01} = \text{Antilog}\left(\dfrac{\sum \log P}{N}\right)$

Both these formulae can be found to satisfy the unit test.

Problem 1: From the following data, construct an index for 1995 taking 1994 as base:

Commodities A B C D E

Price in 1994 (Rs) 50 40 80 110 20

Price in 1995 (Rs) 70 60 90 120 20


Solution:

Commodity   Price 1994 (p0)   Price 1995 (p1)   P = (p1/p0) × 100   log P
A           50                70                140.00              2.1461
B           40                60                150.00              2.1761
C           80                90                112.50              2.0512
D           110               120               109.09              2.0378
E           20                20                100.00              2.0000
Total       ∑p0 = 300         ∑p1 = 360         ∑P = 611.59         ∑log P = 10.4112

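By aggregative method,

$P_{01} = \dfrac{\sum p_1}{\sum p_0} \times 100 = \dfrac{360}{300} \times 100 = 120$

Using A.M., $P_{01} = \dfrac{\sum P}{N} = \dfrac{611.59}{5} = 122.32$

Using G.M., $P_{01} = \text{Antilog}\left(\dfrac{\sum \log P}{N}\right) = \text{Antilog}\left(\dfrac{10.4112}{5}\right) = 120.84$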
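
The three unweighted indices computed above can be reproduced with the following sketch. The function names are illustrative. The geometric mean is computed from unrounded logarithms, so it may differ from the four-figure log-table result in the second decimal place.

```python
from math import log10

p0 = [50, 40, 80, 110, 20]   # prices in 1994 (base year)
p1 = [70, 60, 90, 120, 20]   # prices in 1995 (current year)


def simple_aggregative(p0, p1):
    """Ratio of the price totals, expressed as a percentage."""
    return sum(p1) / sum(p0) * 100


def price_relatives(p0, p1):
    """P = p1 / p0 x 100 for each commodity."""
    return [b / a * 100 for a, b in zip(p0, p1)]


def relatives_am(p0, p1):
    """Arithmetic mean of the price relatives."""
    rel = price_relatives(p0, p1)
    return sum(rel) / len(rel)


def relatives_gm(p0, p1):
    """Geometric mean of the price relatives, via antilog of the mean log."""
    rel = price_relatives(p0, p1)
    return 10 ** (sum(log10(r) for r in rel) / len(rel))


print(round(simple_aggregative(p0, p1), 2),
      round(relatives_am(p0, p1), 2),
      round(relatives_gm(p0, p1), 2))
```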

3. WEIGHTED AGGREGATIVES METHOD.

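Price indices (P01):

(i) Laspeyre’s formula: $P_{01}^{L} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

(ii) Paasche’s formula: $P_{01}^{P} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$

(iii) Fisher’s formula: $P_{01}^{F} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

(iv) Marshall-Edgeworth formula: $P_{01}^{ME} = \dfrac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \dfrac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$

(v) Bowley’s formula: $P_{01}^{B} = \dfrac{1}{2}\left(\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}\right) \times 100 = \dfrac{P_{01}^{L} + P_{01}^{P}}{2}$

(vi) Kelly’s formula: $P_{01}^{K} = \dfrac{\sum p_1 q}{\sum p_0 q} \times 100$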

Problem :1 Compute (i) Laspeyre’s (ii) Paasche’s and (iii) Fisher’s index number.

Price Quantity
Item Base year Current year Base year Current year
A 6 10 50 50
B 2 2 100 120
C 4 6 60 60
D 10 12 30 25
Solution:

Commodity   p0    p1    q0     q1     p0q0    p1q0    p0q1    p1q1
A           6     10    50     50     300     500     300     500
B           2     2     100    120    200     200     240     240
C           4     6     60     60     240     360     240     360
D           10    12    30     25     300     360     250     300
Total                                 1040    1420    1030    1400

(p0, q0: base year price and quantity; p1, q1: current year price and quantity)

(i) Laspeyre’s formula: $P_{01}^{L} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

$= \dfrac{1420}{1040} \times 100 = 136.54$

(ii) Paasche’s formula: $P_{01}^{P} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$

$= \dfrac{1400}{1030} \times 100 = 135.92$

(iii) Fisher’s formula: $P_{01}^{F} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

$= \sqrt{\dfrac{1420}{1040} \times \dfrac{1400}{1030}} \times 100$

$= 136.23$ (or)

$P_{01}^{F} = \sqrt{\text{Laspeyre’s} \times \text{Paasche’s}}$

$= \sqrt{136.54 \times 135.92}$

$= 136.23$
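
The weighted aggregative indices of this problem can be reproduced with a short script. This is a sketch with illustrative function names. For item D, p0q1 = 10 × 25 = 250, which is the value consistent with the total ∑p0q1 = 1030 used in Paasche’s index.

```python
from math import sqrt

p0 = [6, 2, 4, 10]       # base year prices
p1 = [10, 2, 6, 12]      # current year prices
q0 = [50, 100, 60, 30]   # base year quantities
q1 = [50, 120, 60, 25]   # current year quantities


def total(p, q):
    """Sum of price x quantity over all commodities."""
    return sum(a * b for a, b in zip(p, q))


def laspeyres(p0, p1, q0, q1):
    return total(p1, q0) / total(p0, q0) * 100


def paasche(p0, p1, q0, q1):
    return total(p1, q1) / total(p0, q1) * 100


def fisher(p0, p1, q0, q1):
    """Geometric mean of Laspeyre's and Paasche's indices."""
    return sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))


print(round(laspeyres(p0, p1, q0, q1), 2),
      round(paasche(p0, p1, q0, q1), 2),
      round(fisher(p0, p1, q0, q1), 2))
```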

4. WEIGHTED AVERAGES OF RELATIVES METHOD:

Price indices (P01):

(i) Using A.M., $P_{01} = \dfrac{\sum WP}{\sum W}$
(ii) Using G.M., $P_{01} = \text{Antilog}\left[\dfrac{\sum W \log P}{\sum W}\right]$
This method is better than the corresponding unweighted method in showing the
relative change. From the data available under this method, index numbers by
unweighted averages of relatives could also be calculated. This method provides
scope for replacing one or more items at a later stage.
Problem: 1 Calculate the index number of prices for 1998 on the basis of 1995 from the data given
below.

Commodity Weights Price (1995) Price(1998)


A 40 16 20
B 25 40 60
C 5 2 3
D 20 5 7
E 10 2 4

Solution: Either G.M or A.M can be used

            Weights   Price    Price    P = (p1/p0) x 100
Commodity   W         1995     1998                          WP       Log P     W log P
A           40        16       20       125                  5000     2.0969    83.8760
B           25        40       60       150                  3750     2.1761    54.4025
C           5         2        3        150                  750      2.1761    10.8805
D           20        5        7        140                  2800     2.1461    42.9220
E           10        2        4        200                  2000     2.3010    23.0100
Total       ∑W = 100  ----     ----     ----                 ∑WP =    ----      ∑W log P =
                                                             14300              215.0910

(i) Using A.M., $P_{01} = \dfrac{\sum WP}{\sum W} = \dfrac{14300}{100} = 143$

(ii) Using G.M., $P_{01} = \text{Antilog} \left[ \dfrac{\sum W \log P}{\sum W} \right] = \text{Antilog} \left[ \dfrac{215.0910}{100} \right] = 141.55$
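The same two averages can be computed directly, which avoids log-table lookups. A minimal sketch, assuming the row layout `(weight, price 1995, price 1998)`; the function name is hypothetical and not from the notes.

```python
from math import log10

# Data of the problem: (weight, price in 1995, price in 1998) for A to E.
rows = [(40, 16, 20), (25, 40, 60), (5, 2, 3), (20, 5, 7), (10, 2, 4)]


def weighted_relatives_index(rows):
    """Return the weighted A.M. and weighted G.M. of the price relatives."""
    total_w = sum(w for w, _, _ in rows)
    # Price relative P = p1 / p0 * 100 for each commodity.
    relatives = [(w, p1 / p0 * 100) for w, p0, p1 in rows]
    am = sum(w * p for w, p in relatives) / total_w
    # Antilog of the weighted mean of log P.
    gm = 10 ** (sum(w * log10(p) for w, p in relatives) / total_w)
    return am, gm


am, gm = weighted_relatives_index(rows)
print(round(am, 2), round(gm, 2))
```

The geometric mean comes out slightly below the arithmetic mean, as in the solution.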

TESTS OF CONSISTENCY AND ADEQUACY

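➢ Unit Test: By simple Aggregative Method,

$P_{01} = \dfrac{\sum P_1}{\sum P_0} \times 100$

By Laspeyre’s formula,

$P_{01} = \dfrac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$

By Paasche’s formula,

$P_{01} = \dfrac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$

By Fisher’s Formula,

$P_{01}^{F} = \sqrt{\dfrac{\sum P_1 q_0}{\sum P_0 q_0} \times \dfrac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100$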

➢ Time Reversal Test (T.R test)

$P_{01} \times P_{10} = 1$

P10 is the index number of the base year in comparison with the current year. That
is, the base year figure will be in the numerator and the current year figure will be in
the denominator. Hence, it is expected to be the reciprocal of P01. In other words, the
product of P01 and P10 is expected to be unity.
➢ Factor Reversal Test (F.R test)

$P_{01} \times Q_{01} = \dfrac{\sum P_1 q_1}{\sum P_0 q_0}$

P01 gives the relative change in price while Q01 gives the relative change in
quantity. Hence, P01 × Q01 should give the relative change in price multiplied by
quantity, and so should be equal to $\dfrac{\sum P_1 q_1}{\sum P_0 q_0}$.
➢ Circular Test:
Circular test is an extension of the time reversal test. If three years 0, 1,
and 2 are under consideration, this requires the formula to be such that,

P01 xP12 xP20 = 1


P01 is the index number of the second year in comparison with the first year,
P12 is the index number of the third year in comparison with the second
year,
P20 is the index number of the first year in comparison with the third.
➢ Fixed Base:
When the data are available for more than two years, the question ‘which
is the base year’ arises. Under the fixed base method, the base year is the same for all the
different years under consideration. Base year figures may be the figures of any one year
or the averages of a few years or the totals of a few years or those suggested. When
nothing is indicated, the first year in the series of years in chronological order is to be
taken as the base.
If no method is suggested, the method which is
suitable for the data under consideration is to be chosen. For the given data, although
index numbers can be calculated by more than one method, the result is obtained by
only one method unless stated otherwise. The method is selected in the following
order.
(i) Fisher’s formula (or)
(ii) Weighted A.M. method (or)
(iii) Unweighted A.M. method.

For each commodity, the price in a year is divided by that in the base year and is
multiplied by 100 to get the price relative (P). Using A.M., the price index of a year is
the average of the price relatives of that year.

For the first year, which is the base year, the fixed base index number as well as each
P is 100.

➢ Chain Base index:

$\text{Chain Index} = \dfrac{\text{Current year link relative} \times \text{Preceding year chain index}}{100}$

$\text{Current year F.B.I} = \dfrac{\text{Current year C.B.I} \times \text{Preceding year F.B.I}}{100}$

Cost of Living Index :

The cost of living index number shows the impact of changes in the prices of a
number of commodities and services on a particular class of people in the current
year in comparison with the base year.

Formula:

Two formulae are available. They are given below.

(i) Aggregate Expenditure Method or Weighted Aggregative Method.

Cost of Living Index Number $= \dfrac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$

(ii) Family Budget Method or Weighted Averages of Relatives Method.

Cost of Living Index Number $= \dfrac{\sum WP}{\sum W}$ (using A.M.)

Cost of Living Index Number $= \text{Antilog} \left( \dfrac{\sum W \log P}{\sum W} \right)$ (using G.M.)

Uses:

1. Cost of living index numbers are the indicators of changes in real wages. Money
wages are changing and so are prices. Cost of living index numbers help to know
whether money wages overtake the rising prices or are overpowered by them.
2. Decisions on dearness allowance are based on the cost of living indices.

3. They are further used for deflation of income and value in national accounts.

TIME REVERSAL AND FACTOR REVERSAL TEST

Problem: 1 Show that Fisher’s ideal index satisfies both time reversal and factor reversal tests,
using the following data commonly.

Commodity Price(1990) Qty(1990) Price (1992) Qty (1992)


A 6 50 10 56
B 2 100 2 120
C 4 60 6 60
D 10 30 12 24
E 8 40 12 36

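Solution:

            1990        1992
Commodity   p0    q0    p1    q1     p0q0    p1q0    p0q1    p1q1
A           6     50    10    56     300     500     336     560
B           2     100   2     120    200     200     240     240
C           4     60    6     60     240     360     240     360
D           10    30    12    24     300     360     240     288
E           8     40    12    36     320     480     288     432
Total       ---   ---   ---   ---    ∑p0q0   ∑p1q0   ∑p0q1   ∑p1q1
                                     = 1360  = 1900  = 1344  = 1880

By Fisher’s formula, after ignoring the factor 100,

$P_{01} = \sqrt{\dfrac{\sum P_1 q_0}{\sum P_0 q_0} \times \dfrac{\sum P_1 q_1}{\sum P_0 q_1}} = \sqrt{\dfrac{1900}{1360} \times \dfrac{1880}{1344}}$

$P_{10} = \sqrt{\dfrac{\sum P_0 q_1}{\sum P_1 q_1} \times \dfrac{\sum P_0 q_0}{\sum P_1 q_0}} = \sqrt{\dfrac{1344}{1880} \times \dfrac{1360}{1900}}$ and so

$P_{01} \times P_{10} = \sqrt{\dfrac{1900}{1360} \times \dfrac{1880}{1344}} \times \sqrt{\dfrac{1344}{1880} \times \dfrac{1360}{1900}} = \sqrt{\dfrac{1900}{1360} \times \dfrac{1880}{1344} \times \dfrac{1344}{1880} \times \dfrac{1360}{1900}} = \sqrt{1} = 1$

$Q_{01} = \sqrt{\dfrac{\sum P_0 q_1}{\sum P_0 q_0} \times \dfrac{\sum P_1 q_1}{\sum P_1 q_0}} = \sqrt{\dfrac{1344}{1360} \times \dfrac{1880}{1900}}$

$P_{01} \times Q_{01} = \sqrt{\dfrac{1900}{1360} \times \dfrac{1880}{1344}} \times \sqrt{\dfrac{1344}{1360} \times \dfrac{1880}{1900}} = \sqrt{\dfrac{1880 \times 1880}{1360 \times 1360}} = \dfrac{1880}{1360} = \dfrac{\sum P_1 q_1}{\sum P_0 q_0}$

Using the given data, Fisher’s index is found to satisfy both time reversal and
factor reversal tests.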
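Both tests can also be verified numerically. The following is a minimal sketch for the data of this problem; the variable names (`s00`, `s10` and so on) are illustrative assumptions, and the factor 100 is ignored as in the solution.

```python
from math import isclose, sqrt

# Data of the problem: (p0, q0, p1, q1) for commodities A to E.
rows = [(6, 50, 10, 56), (2, 100, 2, 120), (4, 60, 6, 60),
        (10, 30, 12, 24), (8, 40, 12, 36)]

s00 = sum(p0 * q0 for p0, q0, p1, q1 in rows)  # sum of p0*q0
s10 = sum(p1 * q0 for p0, q0, p1, q1 in rows)  # sum of p1*q0
s01 = sum(p0 * q1 for p0, q0, p1, q1 in rows)  # sum of p0*q1
s11 = sum(p1 * q1 for p0, q0, p1, q1 in rows)  # sum of p1*q1

p01 = sqrt(s10 / s00 * s11 / s01)  # Fisher price index, current on base
p10 = sqrt(s01 / s11 * s00 / s10)  # Fisher price index, base on current
q01 = sqrt(s01 / s00 * s11 / s10)  # Fisher quantity index

time_reversal_ok = isclose(p01 * p10, 1.0)
factor_reversal_ok = isclose(p01 * q01, s11 / s00)
print(time_reversal_ok, factor_reversal_ok)
```

`isclose` is used instead of `==` because the square roots are floating-point values.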

COST OF LIVING INDEX

Problem:1

Construct cost of living index, for 2000 taking 1999 as the base year from the
following data using ‘Aggregate Expenditure’ Method.

Article   Quantity in 1999   Price (Rs. per kg)
          (kg)               1999      2000
A         6                  5.75      6.00
B         1                  5.00      8.00
C         6                  6.00      9.00
D         4                  8.00      10.00
E         2                  2.00      1.80
F         1                  20.00     15.00
Solution :

Article   Quantity     Price        Price        p1q0      p0q0
          1999 (q0)    1999 (p0)    2000 (p1)
A         6            5.75         6.00         36.00     34.50
B         1            5.00         8.00         8.00      5.00
C         6            6.00         9.00         54.00     36.00
D         4            8.00         10.00        40.00     32.00
E         2            2.00         1.80         3.60      4.00
F         1            20.00        15.00        15.00     20.00
Total                                            156.60    131.50

Cost of Living Index $= \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \dfrac{156.60}{131.50} \times 100 = 119.09$
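The aggregate expenditure method reduces to two weighted sums. A minimal sketch, assuming a quantity of 4 kg for article D as used in the solution table; the tuple layout `(q0, p0, p1)` is an illustrative choice.

```python
# (quantity in 1999, price in 1999, price in 2000) for articles A to F.
articles = [(6, 5.75, 6.00), (1, 5.00, 8.00), (6, 6.00, 9.00),
            (4, 8.00, 10.00), (2, 2.00, 1.80), (1, 20.00, 15.00)]

current_cost = sum(q0 * p1 for q0, p0, p1 in articles)  # sum of p1*q0
base_cost = sum(q0 * p0 for q0, p0, p1 in articles)     # sum of p0*q0
cost_of_living_index = current_cost / base_cost * 100

print(round(current_cost, 2), round(base_cost, 2), round(cost_of_living_index, 2))
```

The base-year basket is priced at both years' prices, so this is Laspeyre's formula applied to a consumption basket.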

Problem: Calculate the cost of living index number from the following data.

Item Base year price Current year price Weight


Food 39 47 4
Fuel 8 12 1
Clothing 14 18 3
House Rent 12 15 2
Miscellaneous 25 30 1
Solution:

Item            P0     P1     Weight W    P = (P1/P0) x 100    WP
Food            39     47     4           120.51               482.04
Fuel            8      12     1           150.00               150.00
Clothing        14     18     3           128.57               385.71
House Rent      12     15     2           125.00               250.00
Miscellaneous   25     30     1           120.00               120.00
Total           ----   ----   ∑W = 11     ----                 ∑WP = 1387.75

Cost of Living Index Number $= \dfrac{\sum WP}{\sum W} = \dfrac{1387.75}{11} = 126.16$

Problem: 3

Using geometric mean, calculate the cost of living index number for the year 2000.

Commodity Price(1990) Price(2000) Weight


Food 60 108 40
Clothing 50 94 17
Fuel 40 65 13
House Rent 125 225 27
Miscellaneous 120 240 3
Solution:

Commodity       P0     P1     W     P = (P1/P0) x 100    Log P     W log P
Food            60     108    40    180.0                2.2553    90.2120
Clothing        50     94     17    188.0                2.2742    38.6614
Fuel            40     65     13    162.5                2.2109    28.7417
House Rent      125    225    27    180.0                2.2553    60.8931
Miscellaneous   120    240    3     200.0                2.3010    6.9030
Total                         ∑W = 100                             ∑W log P = 225.4112

Cost of Living Index Number $= \text{Antilog} \left( \dfrac{\sum W \log P}{\sum W} \right) = \text{Antilog} \left( \dfrac{225.4112}{100} \right)$

= Antilog 2.2541 = 179.5


FIXED BASE INDEX

Problem: 1 Calculate fixed base index numbers from the following prices:

Commodity 1995 1996 1997 1998 1999 2000


I 4 5 6 6 8 10
II 5 7 8 10 13 15
III 6 9 12 12 15 15
Solution:

       Prices of Commodity    Price Relatives [P] of Commodity    Total    Index No.
Year   I     II    III        I      II     III                   [∑P]     [∑P ÷ N]
1995 4 5 6 100 100 100 300 100.00
1996 5 7 9 125 140 150 415 138.33
1997 6 8 12 150 160 200 510 170.00
1998 6 10 12 150 200 200 550 183.33
1999 8 13 15 200 260 250 710 236.67
2000 10 15 15 250 300 250 800 266.67
For each commodity the price in a year is divided by that in 1995 and is multiplied by 100
to get the price relative. Using A.M., the price indices are calculated and are given in the last
column of the above table.

For the first year which is the base year, fixed base index number as well as each P is 100.
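The table above can be reproduced in a few lines. A minimal sketch of the unweighted A.M. of price relatives with 1995 as the fixed base; the dictionary layout and names are illustrative assumptions.

```python
# Prices of commodities I, II, III for each year.
prices = {
    1995: [4, 5, 6],
    1996: [5, 7, 9],
    1997: [6, 8, 12],
    1998: [6, 10, 12],
    1999: [8, 13, 15],
    2000: [10, 15, 15],
}

base = prices[1995]  # fixed base year

# Index of a year = arithmetic mean of its price relatives (p / p_base * 100).
fixed_base_index = {
    year: sum(p / b * 100 for p, b in zip(row, base)) / len(row)
    for year, row in prices.items()
}

for year, index in fixed_base_index.items():
    print(year, round(index, 2))
```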

CHAIN BASE INDEX

Problem: 1 Prepare index numbers from the average prices of three groups of commodities
given below by taking the base year 1998 and the weights as 5, 3, and 2 respectively.

Group 1998 1999 2000 2001 2002


I 50 55 52 49 55
II 4 5 3 5 6
III 10 10 11 10 9
Solution:

       Prices            Price Relatives [P]    WP
Year   I     II    III   I      II     III      I      II     III     ∑WP     F.B.I
1998   50    4     10    100    100    100      500    300    200     1000    100.0
1999   55    5     10    110    125    100      550    375    200     1125    112.5
2000   52    3     11    104    75     110      520    225    220     965     96.5
2001   49    5     10    98     125    100      490    375    200     1065    106.5
2002   55    6     9     110    150    90       550    450    180     1180    118.0

The price of each commodity in every year is divided by its price in 1998 and is
multiplied by 100 to get the price relative (P). The price relatives of the three commodities are
multiplied by 5, 3, and 2 respectively to get WP values. They are added year wise (∑WP) and the
total is divided by 10 (∑W) to get fixed base index numbers.

Problem: 2 from the following prices of three groups of commodities for the years 1993 to 1997
find the chain base index numbers.

Groups 1993 1994 1995 1996 1997


I 4 6 8 10 12
II 16 20 24 30 36
III 8 10 16 20 24
Solution:

       Prices            Link Relatives (P)         Total     Mean      Chain base
Year   I     II    III   I        II     III        ∑P        ∑P/N      Index
1993   4     16    8     100.00   100    100        300.00    100.00    100.00
1994   6     20    10    150.00   125    125        400.00    133.33    133.33
1995   8     24    16    133.33   120    160        413.33    137.78    183.70
1996   10    30    20    125.00   125    125        375.00    125.00    229.63
1997   12    36    24    120.00   120    120        360.00    120.00    275.56

The price of each commodity in every year is divided by its price in the preceding
year and is multiplied by 100 to get the link relative (P). As no weight is given, link
relatives are added year wise and the total is divided by 3. The average of each year is
multiplied by the chain index number of the preceding year and is divided by 100 to get
the chain index number of that year. For the first year (1993) the link relatives and the
chain base index number are taken as 100 each.
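The chaining step described above can be sketched as a loop over consecutive years. This is an illustrative sketch only; the names `prices`, `mean_link` and `chain` are assumptions, not notation from the notes.

```python
# Prices of groups I, II, III for each year.
prices = {
    1993: [4, 16, 8],
    1994: [6, 20, 10],
    1995: [8, 24, 16],
    1996: [10, 30, 20],
    1997: [12, 36, 24],
}

years = sorted(prices)
chain = {years[0]: 100.0}  # the first year is taken as 100

for prev, cur in zip(years, years[1:]):
    # Link relative = price in the current year / price in the preceding year * 100.
    links = [c / p * 100 for c, p in zip(prices[cur], prices[prev])]
    mean_link = sum(links) / len(links)
    # Chain index = mean link relative * preceding chain index / 100.
    chain[cur] = mean_link * chain[prev] / 100

for year in years:
    print(year, round(chain[year], 2))
```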

UNIT-V

ANALYSIS OF TIME SERIES


Definition of Time series:

A time series is a collection of observations made sequentially in time.

The series of values might have been observed at regular intervals of time, such as daily
sales, annual profits and the decennial census.

E.g. Year:                1991   1992   1993   1994

     Production of gold:  121    101    130    132

Uses of Time Series:

Variables such as Sales, Production, Profit and Population have different values at
different points of time.
(i) The analysis of time series helps to know the past conditions.
(ii) It helps in assessing the present conditions.
(iii) It helps to predict the future reliably.
(iv) It facilitates comparison.
(v) It forewarns.

Components of Time series:

➢ There are a large number of forces affecting a time series. As a result, there are fluctuations
in the time series.
➢ There are 4 basic types of variations and these are called the components (or) elements of
a time series. They are grouped as follows.

✓ Long term effect
✓ Short term fluctuations

Long term effect:

1. Secular Trend

Short term variations:

2. Seasonal variation
3. Cyclical variation
4. Irregular variations

[Chart: Components of time series. Long term: Secular. Short term: Regular (Cyclical, Seasonal) and Irregular.]

Secular trend:

➢ It is also called long term trend or simply trend.
➢ It is the general tendency of a series to increase (or) decrease over a period of
time.
➢ An increasing trend is observed in population, price, production, literacy, etc.
➢ A decreasing trend is observed in the birth rate.

Additive model:

The additive model assumes that all the components of the time series are independent of one
another and describes all the components as absolute values.

➢ It is rarely used. In terms of the 4 components,

Y = T + S + C + I
Y = Observed value in a given series
T = Trend
S = Seasonal variations
C = Cyclical variations
I = Irregular variations

Multiplicative model:

It assumes that all the components are due to different causes but that they can affect one
another.

➢ It describes only the trend as an absolute value, while the other components are
expressed as a rate or %.

Y = T × S × C × I

➢ In a traditional (or) classical time series analysis, the most commonly assumed mathematical model
is the multiplicative model.

Mixed model:

➢ A mixed model is a mathematical relation which is expressed as a combination of
multiplicative and additive components of a time series.
Examples:
Y = T + S × C + I
Y = T + S × C × I

➢ Under the additive model, to get the trend values we have to segregate S, C and I:

T = Y - (S + C + I)

                          Additive                       Multiplicative
Expression                Y = T + C + S + I              Y = T × C × S × I
Absolute values / Rates   All components of a time       Only the trend is expressed as an
                          series are expressed as        absolute value and the others
                          absolute values.               are expressed as a rate (or) %.

Long Term:

1) Secular Trend
There are 4 methods of estimating the secular trend. They are

➢ Graphic Method
➢ Method of Semi Averages
➢ Method of Moving Averages
➢ Method of Least Squares
(i) Graphic method:
➢ It is also known as the free-hand method. The x axis represents time and the y axis
represents the observed data.
➢ After marking all the points, the best line is drawn. It is called the trend line.
➢ The line is drawn such that the following three conditions are satisfied.

I. The number of points above the line is equal to the number of points below the line.
II. The sum of the vertical distances of the points above the line equals that of the points
below the line.
III. The sum of the squares of the vertical distances of all the points from the line is the
minimum.

Merits:

1. It is the simplest, easiest and quickest method. It saves time and labour.
2. It is adaptable and flexible. It can be used to describe all types of trend, i.e., linear (or) non-linear.
3. It gives a better estimate when it is used by an experienced statistician.

Demerits:

1. It is subjective; different persons get different trend lines.

ii) Method of semi average:

The time series is considered.

➢ Here the series of data is divided into two halves. Then the average (A.M.) of the
observed values is found out for each half.
➢ When the number of years is even, the series divides into two equal halves.
➢ When it is odd, the observed value of the middle most year is
omitted.
➢ The average values are plotted on the graph paper against the mid points of the
respective halves.
➢ The points marked on the graph sheet are joined by a
straight line called the trend line.

Example-1:

Year   Sales   Semi-average
2002   60
2003   75      216/3 = 72
2004   81

2005   110
2006   106     336/3 = 112
2007   120

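The two semi-averages of Example-1 can be obtained with a small helper. A minimal sketch; the function name `semi_averages` is hypothetical, and the slicing rule for an odd number of values follows the omission of the middle year described in the steps.

```python
def semi_averages(values):
    """Return the averages of the first and second halves of a series.

    When the number of values is odd, the middle value is left out.
    """
    n = len(values)
    half = n // 2
    first = values[:half]
    second = values[n - half:]  # skips the middle value when n is odd
    return sum(first) / half, sum(second) / half


sales = [60, 75, 81, 110, 106, 120]  # 2002 to 2007
first_avg, second_avg = semi_averages(sales)
print(first_avg, second_avg)  # 72.0 112.0
```

Each average is then plotted against the middle year of its half (2003 and 2006 here) and the two points are joined to give the trend line.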
Example-2:

Year sales
2001 110
2002 105 330/3=110
2003 115
2004 112 Left out
Merits:

1. It is simple and easier to understand than the moving average and least squares methods.

2. It does not depend on personal judgement. Everyone gets the same trend line.

Demerits:

1. It is not adequate for estimating the future trend or for removing the trend from the original data.

iii) Method of Moving Averages: (odd and even)

1. The method of moving averages is one of the most useful methods of estimating
trend.
2. It is an improvement over the semi-average method.
3. It is an algebraic method; a graph sheet is not used.
Example:

When the period of the moving average is odd, such as 3 or 5 or 7, and the observed values are a, b, c, d, e, ..., the 3 period moving averages are

$\dfrac{a+b+c}{3}, \; \dfrac{b+c+d}{3}, \; \dfrac{c+d+e}{3}$, and so on,

and the first 5 period moving average is

$\dfrac{a+b+c+d+e}{5}$

Case 1. Period of Moving Average is an odd number such as 3 or 5 or 7 etc.

Example 1: Using three year moving average determine the trend and short-term fluctuation.

Year 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
production 21 22 23 25 24 22 25 26 27 26

Solution:

Year   Production (Y)   3 yrs Moving Total   Trend (or) 3 yrs Moving Average (Yt)   Short term fluctuation (Y - Yt)
1983 21 - - -
1984 22 66 22.00 0
1985 23 70 23.33 -0.33
1986 25 72 24.00 1.00
1987 24 71 23.67 0.33
1988 22 71 23.67 -1.67
1989 25 73 24.33 0.67
1990 26 78 26.00 0
1991 27 79 26.33 0.67
1992 26 - - -
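The moving totals, trend values and short-term fluctuations of this example can be generated programmatically. A minimal sketch for an odd period; the function name and the use of `None` for years without a trend value are illustrative choices.

```python
def moving_average(values, period):
    """Centred moving averages for an odd period; None where undefined."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period
    return trend


production = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]  # 1983 to 1992
trend = moving_average(production, 3)

# Short-term fluctuation = observed value minus trend value.
fluctuation = [None if t is None else y - t for y, t in zip(production, trend)]

for year, y, t, f in zip(range(1983, 1993), production, trend, fluctuation):
    print(year, y, None if t is None else round(t, 2),
          None if f is None else round(f, 2))
```

Calling `moving_average(values, 5)` gives the 5 yearly moving averages in the same way, with two undefined values at each end.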

Example 2:

Calculate 5 yearly moving average of number of students studying in a commerce college


as shown by the following figures:

year No.of Students Year No.of Students


1987 332 1992 405
1988 311 1993 410
1989 357 1994 427
1990 392 1995 405
1991 402 1996 438
Solution:

Year   No. of Students   5 Yearly Moving Totals   5 Yearly Moving Averages
1987   332               -                        -
1988   311               -                        -
1989   357               1794                     358.8
1990   392               1867                     373.4
1991   402               1966                     393.2
1992   405               2036                     407.2
1993   410               2049                     409.8
1994   427               2085                     417.0
1995   405               -                        -
1996   438               -                        -

Case 2. Period of Moving Average is an even numbers such as 4 or 6 or 8 etc..

Example 1:

Using four yearly moving averages, calculate the trend values and short term fluctuations
Year 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
production 464 515 518 467 502 540 557 571 586 612

Solution:

Year   Production   4 Yearly        2 Period        4 Yearly Centered        Short term
       (Y)          Moving Totals   Moving Totals   Moving Average (Yt)      Fluctuations (Y - Yt)
1981   464          -               -               -                        -
1982   515          -               -               -                        -
                    1964
1983   518                          3966            495.75                   22.25
                    2002
1984   467                          4029            503.63                   -36.63
                    2027
1985   502                          4093            511.63                   -9.63
                    2066
1986   540                          4236            529.50                   10.50
                    2170
1987   557                          4424            553.00                   4.00
                    2254
1988   571                          4580            572.50                   -1.50
                    2326
1989   586          -               -               -                        -
1990   612          -               -               -                        -
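When the period is even, each moving total falls between two years, so two adjacent totals are added and divided by twice the period to centre the average on a year. A minimal sketch of that centring step; the function name is an assumption.

```python
def centered_moving_average(values, period):
    """Centred moving averages for an even period; None where undefined."""
    # Moving totals of `period` consecutive values.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    half = period // 2
    trend = [None] * len(values)
    for i in range(len(totals) - 1):
        # Two adjacent totals cover 2 * period values and centre on index i + half.
        trend[i + half] = (totals[i] + totals[i + 1]) / (2 * period)
    return trend


production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]  # 1981 to 1990
trend = centered_moving_average(production, 4)

for year, y, t in zip(range(1981, 1991), production, trend):
    print(year, y, t, None if t is None else round(y - t, 2))
```

The same function with `period=6` applies to 6 yearly centered moving averages.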

Example 2:

Calculate 6 yearly centered moving averages of the earnings per share (EPS) of a
company.

Plot the actual data and trend values on a graph sheet.

Year 1985 1986 1987 1988 1989 1990 1991 1992


EPS(Rs.) 10 12 13 15 14 14 16 18

Year 1993 1994 1995 1996 1997 1998 1999 2000


EPS(Rs.) 22 24 26 29 25 21 25 27

Solution:

Year   EPS (Y)   6 Yearly Moving Totals   2 Period Moving Totals   6 Yearly Centered Moving Average (Yt)
1985 10 - -
1986 12 - -
1987 13 - -
78
1988 15 162 13.50
84
1989 14 174 14.50
90
1990 14 189 15.75
99
1991 16 207 17.25
108
1992 18 228 19.00
120
1993 22 255 21.25
135
1994 24 279 23.25
144
1995 26 291 24.25
147
1996 29 297 24.75
150
1997 25 303 25.25
153
1998 21 - -
1999 25 - -
2000 27 - -

Merits:

➢ It is an objective method. Everyone gets the same moving averages.
➢ It eliminates short term fluctuations.
➢ It reduces the effect of extreme values.

(iv) Method of Least Squares:

➢ By the method of least squares, a straight line trend can be fitted to the given time
series data.
➢ It is a mathematical as well as an analytical method.
➢ It helps in forecasting and predicting.

Example 8: Fit a straight line trend equation to the following data by the method of least squares
and estimate the value of sales for the year 1985.

Year 1979 1980 1981 1982 1983


Sales(in Rs.) 100 120 140 160 180

Solution: Let $Y = a + bX$ be the equation of the trend line, where X is the year and Y is the sales.

As the X values are large, consider $x = X - \bar{X} = X - 1981$

Let the resulting equation be $y = A + Bx$, where $y = Y$

For finding the values of A and B, the normal equations are

$\sum y = NA + B \sum x$

$\sum xy = A \sum x + B \sum x^2$

Year X   Sales (in Rs.) Y = y   x = X - 1981   xy          x²          Trend Yt
1979     100                    -2             -200        4           100
1980     120                    -1             -120        1           120
1981     140                    0              0           0           140
1982     160                    1              160         1           160
1983     180                    2              360         4           180
Total    ∑y = 700               ∑x = 0         ∑xy = 200   ∑x² = 10    ∑Yt = 700

By substituting the values from the table in $\sum y = NA + B \sum x$,

700 = 5A + 0B

A = 140

By substituting the values in $\sum xy = A \sum x + B \sum x^2$,

200 = A(0) + 10B

B = 20

The trend equation is y = 140 + 20x

That is, Y = 140 + 20(X - 1981)

Corresponding to different values of X, the right hand side gives the trend component (Yt).

Hence, the equation is written as

Yt = 140+20(X-1981)

Putting X=1979, trend, Yt =140+20(-2)

=100

Putting X=1980, trend, Yt =140+20(-1)

=120

Putting X=1981, trend, Yt =140+20(0)

=140

Putting X=1982, trend, Yt =140+20(1)

=160

Putting X=1983, trend, Yt =140+20(2)

=180

Putting X=1985, trend, Yt =140+20(4)

=220

The value of sales in 1985 is estimated to be Rs.220

Example 2: Fit a linear trend equation by the method of least squares and estimate
the net profit in 2003.

What is the amount of change per annum in net profit?

year 1995 1996 1997 1998 1999 2000 2001


Net profit(Rs.Crores) 32 36 44 37 71 72 109

Plot the observed values on a graph sheet. Draw the trend line also.

Solution:

Year X   Net profit (Rs. Crores) Y = y   x = X - 1998   xy          x²          Trend Yt
1995     32                              -3             -96         9           21.92
1996     36                              -2             -72         4           33.71
1997     44                              -1             -44         1           45.50
1998     37                              0              0           0           57.29
1999     71                              1              71          1           69.08
2000     72                              2              144         4           80.87
2001     109                             3              327         9           92.66
Total    ∑y = 401                        ∑x = 0         ∑xy = 330   ∑x² = 28    ∑Yt = 401.03

Mid year, $\bar{X} = 1998$

Trend of the mid year, $a = \dfrac{\sum y}{N} = \dfrac{401}{7} = 57.29$

Annual change in trend, $b = \dfrac{\sum xy}{\sum x^2} = \dfrac{330}{28} = 11.79$

Linear trend equation: 𝑌𝑡 = 𝑎 + 𝑏(𝑋 − 𝑋̅)

i.e., 𝑌𝑡 = 57.29 + 11.79(𝑋 − 1998)

By substituting X = 1995, trend 𝑌𝑡 = 57.29 + 11.79(−3)

= 57.29 – 35.37

= 21.92

The amount of change per annum in net profit, b = Rs.11.79 Crores.

By adding 11.79 to the trend of each year, the trend of the next year is found.

The net profit in 2003 = 57.29 + 11.79(2003 - 1998)

= 57.29 + 11.79 x 5

= 57.29 + 58.95

= Rs. 116.24 Crore
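The fit can be reproduced without rounding the intermediate values. A minimal sketch, assuming the years are equally spaced so that the deviations from the mid year sum to zero; the function name `fit_linear_trend` is hypothetical.

```python
def fit_linear_trend(years, values):
    """Fit Yt = a + b * (X - mid) by least squares; returns (mid, a, b)."""
    n = len(years)
    mid = sum(years) / n                  # mid year, so that sum(x) = 0
    x = [year - mid for year in years]
    a = sum(values) / n                   # from sum(y) = N*a when sum(x) = 0
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return mid, a, b


years = list(range(1995, 2002))
profit = [32, 36, 44, 37, 71, 72, 109]

mid, a, b = fit_linear_trend(years, profit)
forecast_2003 = a + b * (2003 - mid)
print(round(a, 2), round(b, 2), round(forecast_2003, 2))
```

With unrounded coefficients the estimate for 2003 is about Rs. 116.21 crores; the figure of 116.24 in the solution comes from rounding a and b to two decimals before substituting.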

Merits:

1. The Method is Mathematically Sound.


2. The algebraic sum of the deviations of the actual values from the trend values is zero: $\sum (y - y_t) = 0$
3. The sum of squares of the deviations is minimum: $\sum (y - y_t)^2$ = least.
4. The estimates a and b are unbiased.
5. The estimates a and b have minimum variance.

Demerits:

1. It is not appropriate for projections.
2. It ignores cyclical, seasonal and irregular fluctuations and it is based only on long term variation.
3. This cannot be used.

The straight line trend (or) the first degree parabola is given by
$Y = a + bx$
The equation for the second degree parabola is
$Y = a + bx + cx^2$
The equation for the third degree parabola is
$Y = a + bx + cx^2 + dx^3$
Fitting of first degree parabola:
The normal equations to find a and b are:
$\sum y = Na + b \sum x$
$\sum xy = a \sum x + b \sum x^2$
Types:
1. Linear trend
2. Non-linear (or) Curvilinear trend
➢ If we get a straight line when the values of the time series are plotted on a graph, then it
is called a straight line trend or linear trend.
➢ If the plotted values form a curve, then it is called a non-linear (or)
curvilinear trend.

Uses of trend:

➢ The trend describes the basic growth tendency ignoring short time fluctuations.
➢ It describes the pattern of behavior which has characterized the series in the past.
➢ Future behavior can be forecasted
➢ Trend analysis facilitates us to compare two (or) more time series over different period of
time and this helps to draw conclusion about them.

Seasonal variations (or) Fluctuations:

➢ A season is a period which is less than one year.
➢ It may be a period of 6 months (or) 4 months (or) 3 months.
➢ A certain pattern is observed in the first season, second season and so on.

Definition:

A variation which occurs with some degree of regularity within a specified period of
one year (or) shorter is called seasonal variation.

The seasonal variation may occur due to

(a) Climate and natural forces
(b) Customs and habits

(a) Climate and natural forces:
➢ Sales of ice cream, khadi and cotton clothes are more during summer.
➢ Sales of umbrellas are at their peak during the rainy season.
➢ Production of paddy, wheat, etc. is more in a few months and less in other months of a
year.

(b) Customs, traditions and habits of the people:
➢ Sales of crackers and fireworks are found to be more during Deepavali every year.
➢ Cloth shops register very good sales during festival seasons such as Deepavali,
Ramzan and Christmas.
➢ All these variations in sales and work load are due to customs.

Cyclical fluctuations:

➢ Cyclical fluctuations are similar to seasonal variations. The difference is in the
interval of recurrence.
➢ In seasonal fluctuations, a pattern of the series recurs at an interval of one year.
➢ Cyclical fluctuations recur at an interval of 3 or more years.
➢ For example, in economics and business there are many time series which
have certain wave-like movements called business cycles.
➢ In one period, profits are easily made and are made in plenty also. Prices are
high. This period is called prosperity.
➢ After this peak condition, things decline instead of improving. High wages,
decreased efficiency and increasing interest rates cause the decline. This is the period
of recession.
➢ The four phases of the business cycle,
a) Prosperity
b) Recession
c) Depression
d) Recovery
recur one after another.

Irregular variation:

➢ The other names for this are random variation (or) erratic fluctuations.
➢ Variations which do not come under the other three components are called
irregular variations.
➢ Fire, floods and earthquakes cause irregular variations.
➢ For example, there may be very poor sales on a particular day in a leading
cloth shop on the eve of Deepavali.
➢ Causes for such a happening may not be known.

Models:

➢ There exist certain relations between the components and the series of
observations.
➢ The relation between the observed value and the components is called a model.
a) Additive model b) Multiplicative model

Method of least squares:

➢ By the method of least squares, a straight line trend can be fitted to the given
time series.
➢ It is a mathematical as well as an analytical method.
➢ It helps in forecasting and prediction.
➢ The trend line is called the line of best fit.
➢ The sum of the deviations of the actual values from the trend values is zero.
➢ The sum of squares of the deviations of the actual values from the trend values is least.
(i.e.,)
$\sum (Y - Y_c) = 0$, $\sum (Y - Y_c)^2$ = least

SEASONAL FLUCTUATIONS

MEASURE OF SEASONAL VARIATION:

The following four methods are used to estimate the seasonal variations.

1. Method of simple average
2. Method of moving average
   (a) Difference from moving average
   (b) Ratio-to-moving average
3. Ratio-to-trend method
4. Method of link relatives

1. METHOD OF SIMPLE AVERAGE:

This method assumes absence of trend in a time series. The following are the steps.
➢ The data are arranged season-wise in chronological order.
➢ For each season, the total of the seasonal values is found and is called the seasonal total.
➢ Each seasonal total is divided by the number of years and the seasonal average is obtained.
➢ The total and the average of the seasonal averages are found. The average is called the
grand average.
➢ The seasonal index of every season is calculated as follows.

$\text{Seasonal index} = \dfrac{\text{seasonal average}}{\text{grand average}} \times 100$

Problem: 1

Assuming no trend in the series, calculate seasonal indices for the following data.

QUARTER
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82

Solution:

                   QUARTER
Year               I       II      III      IV
1994               78      66      84       80
1995               76      74      82       78
1996               72      68      80       70
1997               74      70      84       74
1998               76      74      86       82
Seasonal total     376     352     416      384      Total     Grand average
Seasonal average   75.2    70.4    83.2     76.8     305.6     76.4
Seasonal index     98.4    92.2    108.9    100.5    400.0     -
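The steps of the simple average method map directly onto a few list operations. A minimal sketch using the data of Problem 1; the variable names are illustrative.

```python
# Quarterly values (I, II, III, IV) for each year.
data = {
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}

n_years = len(data)

# Seasonal total and seasonal average for each quarter (column-wise).
seasonal_total = [sum(quarter) for quarter in zip(*data.values())]
seasonal_average = [total / n_years for total in seasonal_total]

# Grand average = average of the seasonal averages.
grand_average = sum(seasonal_average) / len(seasonal_average)

# Seasonal index = seasonal average / grand average * 100.
seasonal_index = [avg / grand_average * 100 for avg in seasonal_average]

print(seasonal_total)
print([round(index, 2) for index in seasonal_index])
```

The unrounded indices add up to 400. The second quarter works out to about 92.15; the table shows it as 92.2 so that the indices rounded to one decimal still total 400.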

Merits:

i) It is the easiest method.


ii) It is the simplest and least time consuming method.

Demerits:

i) It assumes the absence of trend in a time series. This assumption is not
always true.
ii) It assumes that the averaging process eliminates the seasonal fluctuation. It is also not
true.
