
Module 1: Introduction to Statistics

In the modern world of computers and information technology, the
importance of statistics is well recognised across all disciplines.
Statistics originated as a science of statehood and has gradually found
applications in agriculture, economics, commerce, biology, medicine,
industry, planning, education and so on.

Module-I
Definition of Statistics

Statistics is a branch of science dealing with collecting, organising,
summarising, analysing and making decisions from data.

Data Analysis

Data can be collected with reference to time, to geographical location,
or to both time and location.
Any statistical data can be classified into two categories, depending
on the source used:
(1). Primary data
(2). Secondary data

1. Primary data

Primary data are data collected by the investigator himself or herself for
the purpose of a specific inquiry or study. Such data are original in
character and are generated by surveys conducted by individuals, research
institutions or other organisations.
Example
If a researcher wants to know the impact of a noon-meal scheme for school
children, he or she has to undertake a survey and collect data on the
opinions of parents and children by asking relevant questions. Data
collected for this purpose are called primary data.

Methods for Collecting Primary data

Primary data can be collected by the following five methods:

1 Direct personal interviews
2 Indirect oral interviews
3 Information from correspondents
4 Mailed questionnaire method
5 Schedules sent through enumerators

2. Secondary data

Secondary data are those data which have been already collected and
analysed by some earlier agency for its own use; and later the same data
are used by a different agency.

Classification of Data

1 Raw data (individual or ungrouped data)
2 Discrete data, or discrete frequency distribution
(ungrouped frequency distribution)
3 Continuous data, or continuous frequency distribution
(grouped data)

1. Raw data or Ungrouped data

Statistical data, as first collected, are generally raw (ungrouped) data.
Example 1
Consider the daily wages (in Rs.) of 30 labourers in a factory: 800,
700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650, 700, 800,
820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650, 450, 750.
Example 2
Consider the marks out of 10 (MM = 10) of a class of 100 students, written as:
2, 3, 7, 9, 0, 10, 4, 8, 3, 6, 3, 2, 7, 9, 1, 0, 10, ...
Note: Raw data in this form make it hard to analyse the performance of the
class, so we convert them into a frequency distribution.
2. Discrete (or) Ungrouped frequency distribution

In this form of distribution, repeated values are written together with
their frequencies.

Example
In a survey of 40 families in a village, the number of children per family
was recorded and the following data were obtained.

1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5

The raw data of the above example can be written in the form of a discrete
frequency distribution (0 is repeated 3 times, 1 is repeated 7 times, and
so on):

Number of children   Frequency
0                    3
1                    7
2                    10
3                    8
4                    6
5                    4
6                    2
Total                40
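
The tallying step above can be sketched in Python. This is only an illustrative sketch: the helper name `frequency_table` is hypothetical, and the data are the 40 family sizes from the example.

```python
from collections import Counter

# Number of children per family for the 40 surveyed families (example data).
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]

def frequency_table(values):
    """Return {value: frequency}, ordered by value."""
    counts = Counter(values)
    return {x: counts[x] for x in sorted(counts)}

table = frequency_table(children)
for x, f in table.items():
    print(f"{x:>2} {f:>3}")
print("Total", sum(table.values()))
```

Sorting the keys keeps the rows in the same order as the hand-written table.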

3. Continuous frequency distribution (Grouped data)

This form of distribution refers to groups (classes) of values. It becomes
necessary for variables which can take any fractional value and for which
an exact measurement is not possible. A discrete variable can also be
presented in the form of a continuous frequency distribution.

Example

Wage distribution of 100 employees

Weekly wages (Rs)   Number of employees
50-100              4
100-150             12
150-200             22
200-250             33
250-300             16
300-350             8
350-400             5
Total               100

Measures of Central Tendency

1 Arithmetic Mean
2 Median
3 Mode

1. Arithmetic Mean or Average

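For n observations $x_1, x_2, \ldots, x_n$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If $x_i \mid f_i$, $i = 1, 2, \ldots, n$, is a frequency distribution:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \text{where } N = \sum_{i=1}^{n} f_i$$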
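
As a minimal sketch of both forms of the arithmetic mean, the Python below uses hypothetical helper names (`mean_raw`, `mean_freq`) and applies the frequency form to the children-per-family distribution from the earlier example.

```python
def mean_raw(xs):
    """Arithmetic mean of raw observations: (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)

def mean_freq(xs, fs):
    """Arithmetic mean of a frequency distribution: (1/N) * sum(f_i * x_i)."""
    n_total = sum(fs)  # N = sum of frequencies
    return sum(f * x for x, f in zip(xs, fs)) / n_total

# Children-per-family distribution (values and their frequencies).
x = [0, 1, 2, 3, 4, 5, 6]
f = [3, 7, 10, 8, 6, 4, 2]
print(mean_freq(x, f))
```

Both functions give the same answer when the frequency table is expanded back into raw data.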

2. Median

The median of a distribution is the value of the variable which divides it
into two equal parts.
It is the value which exceeds and is exceeded by the same number of
observations.
In other words, the number of observations above the median is equal to
the number of observations below it.
The median is thus a positional average.

In the case of ungrouped data, if the number of observations is odd, the
median is the middle value after the values have been arranged in
ascending or descending order of magnitude.
If the number of observations is even, there are two middle terms, and the
median is obtained by taking the arithmetic mean of the two middle terms.
For example, the median of the values 25, 20, 15, 35, 18, i.e., of 15, 18,
20, 25, 35, is 20.
The median of 8, 20, 50, 25, 15, 30, i.e., of 8, 15, 20, 25, 30, 50, is
$\frac{1}{2}(20 + 25) = 22.5$.
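
The odd and even cases can be sketched in a few lines of Python. The function name `median_raw` is hypothetical, and the two calls reuse the values from the examples above.

```python
def median_raw(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even number of observations

print(median_raw([25, 20, 15, 35, 18]))
print(median_raw([8, 20, 50, 25, 15, 30]))
```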

2A. Median of discrete frequency distribution

In the case of a discrete frequency distribution, the median is obtained
by considering the cumulative frequencies. The steps for calculating the
median are given below:
1. Find N/2, where $N = \sum_i f_i$.
2. Find the (less than) cumulative frequency (c.f.) just greater than N/2.
3. The corresponding value of x is the median.

Example
Obtain the median for the following frequency distribution:

x: 1 2 3 4 5 6 7 8 9
f: 8 10 11 16 20 25 15 9 6

Solution:
x f c.f.
1 8 8
2 10 18
3 11 29
4 16 45
5 20 65
6 25 90
7 15 105
8 9 114
9 6 120

Here N = 120, so N/2 = 60. The cumulative frequency (c.f.) just greater
than N/2 is 65, and the value of x corresponding to 65 is 5. Therefore,
the median is 5.
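
The three steps for a discrete frequency distribution can be sketched as follows. The function name `median_discrete` is hypothetical, and the sketch assumes the x values are already in ascending order.

```python
from itertools import accumulate

def median_discrete(xs, fs):
    """Return the x whose cumulative frequency is the first to exceed N/2."""
    half = sum(fs) / 2                       # step 1: N/2
    for x, cf in zip(xs, accumulate(fs)):    # step 2: running (less than) c.f.
        if cf > half:
            return x                         # step 3: corresponding x
    raise ValueError("empty distribution")

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]
print(median_discrete(x, f))
```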
Statistics for Engineers (MAT2001) Module-I 19 / 54
2B. Median of continuous frequency distribution

In the case of a continuous frequency distribution, the class corresponding
to the c.f. just greater than N/2 is called the median class, and the value
of the median is obtained by the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$

where l is the lower limit of the median class, f is the frequency of the
median class, h is the magnitude of the median class, c is the c.f. of the
class preceding the median class, and $N = \sum_i f_i$.

Example
Find the median wage of the following distribution:
Wages (in Rs.) : 20–30 30–40 40–50 50–60 60–70
No. of labourers : 3 5 20 10 5

Solution:
Wages (in Rs.) No. of labourers c.f.
20–30 3 3
30–40 5 8
40–50 20 28
50–60 10 38
60–70 5 43

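Here N/2 = 43/2 = 21.5. The cumulative frequency just greater than 21.5 is
28, so the median class is 40–50. Thus l = 40, f = 20, h = 10 and c = 8, and

$$\text{Median} = 40 + \frac{10}{20}(21.5 - 8) = 46.75$$

The median wage is therefore Rs. 46.75.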
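
As a rough check, the median formula for a continuous frequency distribution can be applied to this wage table in Python. The function name `median_grouped` is hypothetical, and the sketch assumes equal-width classes listed in ascending order.

```python
from itertools import accumulate

def median_grouped(lower_limits, width, freqs):
    """Median = l + (h / f) * (N / 2 - c) for equal-width classes."""
    half = sum(freqs) / 2
    cum = list(accumulate(freqs))
    for k, cf in enumerate(cum):
        if cf > half:                        # first c.f. just greater than N/2
            l = lower_limits[k]              # lower limit of the median class
            f = freqs[k]                     # frequency of the median class
            c = cum[k - 1] if k > 0 else 0   # c.f. of the preceding class
            return l + (width / f) * (half - c)
    raise ValueError("empty distribution")

print(median_grouped([20, 30, 40, 50, 60], 10, [3, 5, 20, 10, 5]))
```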


3. Mode

The mode is the value which occurs most frequently in a set of observations
and around which the other items of the set cluster densely.
In other words, the mode is the value of the variable which is predominant
in the series.

Definition (Discrete frequency distribution)
In the case of a discrete frequency distribution, the mode is the value of
x corresponding to the maximum frequency.

For example, in the following frequency distribution:
x: 1 2 3 4 5 6 7 8
f: 4 9 16 25 22 15 7 3
the value of x corresponding to the maximum frequency, viz., 25, is 4.
Hence the mode is 4.
However, in any one (or more) of the following cases:
(i) if the maximum frequency is repeated,
(ii) if the maximum frequency occurs at the very beginning or at the end
of the distribution, and
(iii) if there are irregularities in the distribution,
the value of the mode is determined by the method of grouping, which is
illustrated below by an example.

3A. Discrete frequency distribution

Example
Find the mode of the following frequency distribution:
Size (x): 1 2 3 4 5 6 7 8 9 10 11 12
Frequency (f): 3 8 15 23 35 40 32 28 20 45 14 6

(a) [Table: grouping of the frequencies]
(b) [Table: analysis of the column maxima]
From table (b), the value 6 occurs the maximum number of times; hence the
mode is 6.
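
Since the grouping tables are not reproduced here, the following Python sketch shows one way the method of grouping can be carried out. It assumes the usual six-column textbook layout (single frequencies, pairs starting from the first and second items, and triples starting from the first, second and third items). The function name `mode_by_grouping` is hypothetical.

```python
from collections import Counter

def mode_by_grouping(xs, fs):
    """Tally the x values that make up the maximum of each grouping column."""
    tally = Counter()
    # (group size, starting offset) for the six assumed columns.
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [
            (sum(fs[i:i + size]), xs[i:i + size])
            for i in range(start, len(fs) - size + 1, size)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                tally.update(members)   # every x in the maximal group gets one mark
    top = max(tally.values())
    modes = [x for x in xs if tally[x] == top]
    return modes, tally

sizes = list(range(1, 13))
freqs = [3, 8, 15, 23, 35, 40, 32, 28, 20, 45, 14, 6]
modes, tally = mode_by_grouping(sizes, freqs)
print(modes, dict(tally))
```

Note that the largest single frequency (45, at x = 10) receives only one mark, which is why the grouping method does not pick it as the mode.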
3B. Continuous frequency distribution

In the case of a continuous frequency distribution, the mode is given by
the formula

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

Here l is the lower limit, h the magnitude and $f_1$ the frequency of the
modal class, and $f_0$ and $f_2$ are the frequencies of the classes
preceding and succeeding the modal class respectively.

Example
Find the mode of the following frequency distribution:

Class-interval: 0–10 10–20 20–30 30–40 40–50 50–60 60–70 7


Frequency: 5 8 7 12 28 20 10

Solution: The maximum frequency is 28, so the class 40–50 is the modal
class.
l = 40, the lower limit of the modal class
h = 10, the magnitude of the modal class
f1 = 28, the frequency of the modal class
f0 = 12 and f2 = 20

$$\text{Mode} = 40 + \frac{10(28 - 12)}{2 \times 28 - 12 - 20} = 40 + \frac{160}{24} \approx 46.67$$
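
The mode formula can be checked with a short Python sketch. The function name `mode_grouped` is hypothetical. Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the notes.

```python
def mode_grouped(lower_limits, width, freqs):
    """Mode = l + h * (f1 - f0) / (2 * f1 - f0 - f2) for equal-width classes."""
    k = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # assumption: 0 if no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # assumption: 0 if no succeeding class
    return lower_limits[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)

lowers = [0, 10, 20, 30, 40, 50, 60]
freqs = [5, 8, 7, 12, 28, 20, 10]
print(round(mode_grouped(lowers, 10, freqs), 2))
```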
Example
The median and mode of the following wages distribution are known to be
Rs.33.50 and Rs.34 respectively. Find the value of f3 , f4 , f5 .
Class-interval: 0–10 10–20 20–30 30–40 40–50 50–60 60–70 7
Frequency: 4 16 f3 f4 f5 6 4
Solution:

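Since the median is 33.5, which lies in the class 30–40, the class 30–40 is
the median class. Taking the total frequency as N = 230, so that
f3 + f4 + f5 = 200 and N/2 = 115, the c.f. of the class preceding the
median class is c = 4 + 16 + f3 = 20 + f3. Using the median formula
$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$, we get

$$33.5 = 30 + \frac{10}{f_4}(115 - 20 - f_3) \implies f_3 = 95 - 0.35 f_4$$

The mode being 34, the modal class is also 30–40. Using the mode formula
$\text{Mode} = l + \frac{h(f_4 - f_3)}{2f_4 - f_3 - f_5}$, we get

$$34 = 30 + \frac{10(f_4 - f_3)}{2f_4 - f_3 - f_5}$$

Substituting $f_3 = 95 - 0.35 f_4$ and $f_3 + f_5 = 200 - f_4$ gives
$4(3f_4 - 200) = 10(1.35 f_4 - 95)$, so that f4 = 100, f3 = 60 and f5 = 40.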

Relation between Mean, Median and Mode

1 In a symmetrical distribution: Mean = Median = Mode
2 In a positively skewed distribution: Mode < Median < Mean
3 In a negatively skewed distribution: Mean < Median < Mode

Geometric Mean and Harmonic Mean

Definition (Geometric Mean)
The geometric mean of a set of n observations is the nth root of their
product. Thus the geometric mean GM of n observations $x_i$,
$i = 1, 2, \ldots, n$, is

$$GM = (x_1 x_2 \cdots x_n)^{1/n}$$

Definition (Harmonic Mean)
The harmonic mean of n observations is the reciprocal of the arithmetic
mean of the reciprocals of the given values. Thus the harmonic mean HM of
n observations $x_i$, $i = 1, 2, \ldots, n$, is

$$HM = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}}$$
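
A minimal Python sketch of the two definitions is given below. The function names and the sample values 1, 2, 4 are hypothetical. The sketch assumes all observations are positive.

```python
import math

def geometric_mean(xs):
    """nth root of the product of n positive observations."""
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals (no zero values)."""
    return len(xs) / sum(1 / x for x in xs)

sample = [1, 2, 4]  # hypothetical observations
print(geometric_mean(sample), harmonic_mean(sample))
```

For long lists of large values the product can overflow. Averaging logarithms is a common alternative.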

Partition Values

These are the values which divide the series into a number of equal parts.
The three points which divide the series into four equal parts are called
quartiles.
The nine points which divide the series into ten equal parts are called
deciles.
The ninety-nine points which divide the series into hundred equal parts
are called percentiles.
Among the quartiles, the first, second and third points are known as the
first, second and third quartiles respectively. The first quartile, Q1, is
the value which exceeds 25% of the observations and is exceeded by 75% of
the observations. The second quartile, Q2, coincides with the median. The
third quartile, Q3, is the point which has 75% of the observations before
it and 25% of the observations after it.
Measures of Dispersion

1 Range
2 Quartile deviation or Semi-interquartile range
3 Mean deviation
4 Standard deviation and Variance

Measures of Dispersion (or) Measures of Variance

1. Definition (Range)
The range is the difference between the two extreme observations of the
distribution.

Range = Maximum value − Minimum value

Demerits of the range:
-> It is the simplest measure of dispersion, but it is based only on the
two extreme observations.
-> Hence it is strongly affected by fluctuations in those extreme values.
-> Therefore, it is not a reliable measure of dispersion.


2. Definition (Quartile deviation)
The quartile deviation or semi-interquartile range Q is given by

$$Q = \frac{1}{2}(Q_3 - Q_1)$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution
respectively.
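
For grouped data, the quartiles can be located in the same way as the median, using the formula $Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - c\right)$ listed in the summary of this module. The Python below is a sketch under that assumption. The function name `quartile_grouped` is hypothetical, and the data are the wage distribution from the median example (classes 20–30 to 60–70).

```python
from itertools import accumulate

def quartile_grouped(i, lower_limits, width, freqs):
    """Q_i = l + (h / f) * (i * N / 4 - c), for i = 1, 2, 3 and equal-width classes."""
    target = i * sum(freqs) / 4
    cum = list(accumulate(freqs))
    for k, cf in enumerate(cum):
        if cf > target:                      # class containing the i-th quartile
            c = cum[k - 1] if k > 0 else 0   # c.f. of the preceding class
            return lower_limits[k] + (width / freqs[k]) * (target - c)
    raise ValueError("quartile class not found")

lowers = [20, 30, 40, 50, 60]
freqs = [3, 5, 20, 10, 5]
q1 = quartile_grouped(1, lowers, 10, freqs)
q3 = quartile_grouped(3, lowers, 10, freqs)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)
```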


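3. Definition (Mean deviation)
If $x_i \mid f_i$, $i = 1, 2, \ldots, n$, is the frequency distribution,
then the mean deviation about the average $\bar{x}$ is given by

$$\text{Mean deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|, \qquad \sum_i f_i = N$$

where $|x_i - \bar{x}|$ represents the modulus or the absolute value of the
deviation $(x_i - \bar{x})$, when the −ve sign is ignored.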
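
A short Python sketch of the mean deviation about the mean follows. The function name `mean_deviation` is hypothetical, and the data are the children-per-family distribution used earlier in this module.

```python
def mean_deviation(xs, fs):
    """(1/N) * sum(f_i * |x_i - mean|) for a frequency distribution."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n_total

x = [0, 1, 2, 3, 4, 5, 6]
f = [3, 7, 10, 8, 6, 4, 2]
print(mean_deviation(x, f))
```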


4. Definition (Standard Deviation)
If $x_i \mid f_i$, $i = 1, 2, \ldots, n$, is the frequency distribution,
then the standard deviation is given by

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$

where $\bar{x}$ represents the arithmetic mean of the distribution and
$\sum_i f_i = N$.

The square of the standard deviation is called the variance and is given by

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$


Definition (Root mean square deviation)
If $x_i \mid f_i$, $i = 1, 2, \ldots, n$, is the frequency distribution,
then the root mean square deviation is given by

$$s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$$

where A is any arbitrary number and $\sum_i f_i = N$.

$s^2$ is called the mean square deviation.

The standard deviation formula uses deviations from the arithmetic mean
$\bar{x}$, whereas the root mean square deviation formula uses an arbitrary
constant A. When $A = \bar{x}$, s reduces to σ.
Example

Calculate the mean and standard deviation for the following age
distribution of 542 members.
Age in years:    20–30 30–40 40–50 50–60 60–70 70–80 80–90
No. of members:  3     61    132   153   140   51    2
Solution:
Mean x̄ = 54.72 years
Standard deviation σ = 11.88 years

Different formulae for calculating variance

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$

$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2$$

$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2, \quad \text{where } d_i = x_i - A, \quad \bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$

In the case of a grouped or continuous frequency distribution,
$d_i = \frac{x_i - A}{h}$, where h is the common magnitude of the class
intervals and A is an arbitrary point.

For the example above, take $d = \frac{x - A}{h} = \frac{x - 55}{10}$.

Age group   Mid-value (x)   Frequency (f)   d = (x − 55)/10   fd     fd²
20–30       25              3               -3                -9     27
30–40       35              61              -2                -122   244
40–50       45              132             -1                -132   132
50–60       55              153             0                 0      0
60–70       65              140             1                 140    140
70–80       75              51              2                 102    204
80–90       85              2               3                 6      18
Total                       542                               -15    765

$$\bar{x} = A + h\,\frac{\sum f d}{N} = 55 + \frac{10 \times (-15)}{542} = 54.72 \text{ years}$$

$$\sigma_x^2 = h^2\left[\frac{1}{N}\sum_i f_i d_i^2 - \left(\frac{1}{N}\sum_i f_i d_i\right)^2\right] = 100\left[\frac{765}{542} - \left(\frac{-15}{542}\right)^2\right] \approx 141.07$$

Standard deviation σ ≈ 11.88 years
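
The step-deviation computation in the table can be re-run with a short Python sketch. The function name `step_deviation_stats` is hypothetical. A = 55 and h = 10 are the values chosen in the worked solution.

```python
import math

def step_deviation_stats(mid_values, freqs, A, h):
    """Mean, variance and standard deviation via d = (x - A) / h."""
    n_total = sum(freqs)
    d = [(x - A) / h for x in mid_values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = A + h * sum_fd / n_total
    variance = h ** 2 * (sum_fd2 / n_total - (sum_fd / n_total) ** 2)
    return mean, variance, math.sqrt(variance)

mids = [25, 35, 45, 55, 65, 75, 85]
members = [3, 61, 132, 153, 140, 51, 2]
mean, variance, sd = step_deviation_stats(mids, members, A=55, h=10)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```

Any other choice of A gives the same mean and variance. A value near the centre of the distribution simply keeps the hand arithmetic small.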

Summary

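$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$$

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

$$Q_i = l + \frac{h}{f}\left(\frac{iN}{4} - c\right), \qquad \text{Quartile deviation } Q = \frac{1}{2}(Q_3 - Q_1)$$

$$\text{Standard deviation } \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$

$$\text{C.V.} = 100 \times \frac{\sigma}{\bar{x}}$$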


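$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2, \quad d_i = x_i - A, \quad \bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$

In a continuous frequency distribution,

$$\sigma_x^2 = h^2\left[\frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2\right], \quad d_i = \frac{x_i - A}{h}, \quad \bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i$$

$$\text{C.V.} = 100 \times \frac{\sigma}{\bar{x}}$$

$$s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$$

$$\text{Mean deviation} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|$$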

Some Solved examples


Exercise:
Moments
The rth moment of a variable x about any point x = A, usually denoted by
$\mu'_r$, is given by

$$\mu'_r = \frac{1}{N}\sum_i f_i (x_i - A)^r = \frac{1}{N}\sum_i f_i d_i^r$$

where $d_i = x_i - A$. The rth moment of a variable about the mean
$\bar{x}$, usually denoted by $\mu_r$, is given by

$$\mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r$$

NOTE: Putting r = 0, 1 and 2 in this formula gives the relations
$\mu_0 = 1$
$\mu_1 = 0$
$\mu_2 = \sigma^2$ (variance)
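
The three relations in the note can be checked numerically. The Python below is a sketch with a hypothetical function name `central_moment`. It reuses the children-per-family distribution from earlier in the module.

```python
def central_moment(xs, fs, r):
    """r-th moment about the mean: (1/N) * sum(f_i * (x_i - mean)**r)."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n_total

x = [0, 1, 2, 3, 4, 5, 6]
f = [3, 7, 10, 8, 6, 4, 2]
for r in range(3):
    print(r, central_moment(x, f, r))
```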


Pearson’s β and γ coefficients

Karl Pearson defined the following four coefficients, based upon the first
four moments about the mean:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = +\sqrt{\beta_1}$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$

Skewness β1 and Kurtosis β2

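Skewness β1
In a symmetrical distribution x̄ = Md = M0 : β1 = 0 (µ3 = 0)
In a positively skewed distribution x̄ > Md > M0 : β1 > 0 with µ3 > 0
In a negatively skewed distribution x̄ < Md < M0 : β1 > 0 with µ3 < 0
Since β1 = µ3²/µ2³ cannot be negative, the direction of skewness is given
by the sign of µ3.

Kurtosis β2
Mesokurtic: β2 = 3
Leptokurtic: β2 > 3
Platykurtic: β2 < 3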
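
The coefficients can be computed from the central moments as in the following Python sketch. The function names, the tolerance used to decide β2 = 3, and the small symmetric distribution are all hypothetical choices made for illustration.

```python
import math

def central_moment(xs, fs, r):
    """r-th moment about the mean of a frequency distribution."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n_total

def pearson_coefficients(xs, fs):
    """Return mu3, beta1, gamma1, beta2 and gamma2 as a dictionary."""
    mu2, mu3, mu4 = (central_moment(xs, fs, r) for r in (2, 3, 4))
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return {"mu3": mu3, "beta1": beta1, "gamma1": math.sqrt(beta1),
            "beta2": beta2, "gamma2": beta2 - 3}

def kurtosis_type(beta2, tol=1e-9):
    """Classify by beta2; tol is an assumed tolerance for the equality case."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

coeffs = pearson_coefficients([1, 2, 3], [1, 2, 1])  # hypothetical symmetric data
print(coeffs, kurtosis_type(coeffs["beta2"]))
```

The sign of `mu3` is returned alongside β1 because β1 alone does not show the direction of skewness.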
