Professional Documents
Culture Documents
Statistics
Classification of Data
There are five measures of central tendency that are in common use
1) Arithmetic average or arithmetic mean or simple mean
2) Median
3) Mode
4) Geometric mean
5) Harmonic mean
Arithmetic mean or Average:
1. Ungrouped Data: Consider set of observations , , ,……,
∑
. .= =
Measures of Central Tendency
Arithmetic mean or Average:
2. Ungrouped Frequency distribution: Consider the frequency Distribution
: :
: :
∑ ∑
. = = =
∑
Measures of Central Tendency
Q.1 Find mean of the following observations
(a) 480,485,492,495,500,505,515
(b)
x 32 33 34 35 36
f 4 10 13 8 5
Solution :
(a) Since, given data is ungrouped data
k
fx
i 1
i i
32 4 33 10 34 13 35 8 36 5
x n x x 34
4 10 13 8 5
f
i 1
i
Measures of Central Tendency
Arithmetic mean or Average:
3. Grouped Frequency distribution: Consider the frequency Distribution
Class f
+
−
2
+
−
2
: : :
: : :
+
−
2
∑ ∑ Where, ℎ ∶ ℎ ℎ
. .= = +ℎ = +ℎ
∑ ∶
−
=
ℎ
Measures of Central Tendency
Q.2 The marks obtained in paper of Mathematics are given in the following table. Find the
Arithmetic mean of the distribution.
Solution :
Measures of Central Tendency
Mid Value No. of students −
Marks Obtained =
ℎ
0-10 5 8 4 32
10-20 15 20 3 60
20-30 25 14 2 28
30-40 35 16 1 16
40-50 45 20 0 0
50-60 55 25 1 25
60-70 65 13 2 26
70-80 75 10 3 30
80-90 85 5 4 20
90-100 95 2 5 10
Total f 133 fu 25
= middle value of = 45
ℎ = 10
Measures of Central Tendency
We have
n
fu
i 1
i i
x Ah n
f
i 1
i
25
x 45 10
133
x 43.12
Dispersion
Disperson
In some cases average will not give the full information. In this we check how much is
scatter/ Disperse from its mean value. To do this we use following Methods of Dispersion
1. Standard Deviation
2. Coefficient of Variation
Dispersion
Standard deviation
1. Ungrouped Data: Consider set of observations , , ,……,
1
= = −
1
= = −
: :
: :
Dispersion
Remarks
1) = ∴ =
Ungrouped Data
1 1
= − = −
1 k 2 2
xi x
k i 1
Dispersion
Remarks
1. Standard Deviation gives information about distribution of data from its mean value.
2. Coefficient of Variation gives information about distribution of data from its mean value
in percentage
3. If the standard deviation is 0.20 and the mean is 0.50, then the C.V. = 40%. So CV helps
us see that even a lower standard deviation doesn't mean less variable data.
Solution : 1 x i
Standard deviation for ungrouped data is = − where, x i 1
49 2401
2
63 3969 2 1 448
25412 40.5
46 2116 8 8
59 3481
6.3640
65 4225
52 2704
60 3600
54 2916
∑ = 448 ∑ = 25412
Dispersion
Q.2 The scores obtained by two batsman A and B in 10 matches are given below. Determine
who is more consistent?
Batsman A 30 44 66 62 60 34 80 46 20 38
Batsman B 34 46 70 38 55 48 60 34 45 30
Solution :
Here, we have to find coefficient of variation of Batsman A i.e. ( . . ) & coefficient of
variation of Batsman B i.e. ( . . )
Where,
1 k 2 2 x i
2
xi x
k i 1
x i 1
k
Dispersion
Case I : Coefficient of variation (A) = ( . .) = ̅
× 100 C.V . A 100
Batsman A 1 k 2
A
x A
xi 2 x
2
k i 1
17.6409
30 900 C.V . A 100
2 48
44 1936 1 480
2 26152
66 4356 10 10 C.V . A 36.7519
62 3844 2 311.2
60 3600
34 1156
17.6409
80 6400 A 17.6409
46 2116
480
20 400 & x A
10
48
38 1444
∑ = 480 ∑ = 26152
Dispersion
Case II : Coefficient of variation (B) = ( . .) = ̅
× 100 C.V . B 100
Batsman B 1 k 2
B
x B
xi 2 x
2
k i 1
12.1078
34 1156 C.V . B 100
2 46
46 2116 1 460
2 22626
70 4900 10 10 C.V . B 26.3214
38 1444 2 146.6
55 3025
48 2304
12.1078
60 3600 B 12.1078
34 1156
460
45 2025 & x B
10
46
30 900
∑ = 460 ∑ = 22626
Dispersion
Since
x 20 & 3.....(1)
k
x i
1 k 2 2
i.e. x i 1
k
2
& xi x
k i 1
30
x i
1 30 2 2
i.e. i 1
20 & xi 9
20 From equation (1)
30 30 i 1
Dispersion
28
x 22 15 i
1 28 2 2 2 2
i.e. i 1
20 & xi 22 15 20 9
30 30 i 1
28 28
2
i.e. xi 563 &
i 1
i 11561
x
i 1
Find new A.M. and S.D.
Case I : when item 22 & 15 are dropped Calculate A.M. and S.D.
if item 22 is replaced by
k
8 and 15 is replaced by
xi 1
i 17.
x
k
28
x i 563
x i 1 20.1071 x 20.1071
28 28
Dispersion
28 28
1 k 2 2
2
xi x
k i 1
i.e. x
i 1
i 563 & x
i 1
i
2
11561
1 28 2
2 2
xi 20.1071
28 i 1
1 2
2 11561
20.1071
28
2 8.5974
2.9321
Case II : when item 22 is replaced by 8 & 15 is replaced by 17
x1 , x2 , x3 ,...8,17,.... x30
Dispersion
k 28 28
x
i 1
i i.e. x i 563 & x i
2
11561
x i 1 i 1
k
28
x 8 17
i 1
i
563 8 17
x 19.6 x 19.6
30 30
1 k 2 2
2
and xi x
k i 1
1 28 2 2 2
xi 8 17 2 19.6
2
30 i 1
1 2
2 11561 82 17 2 19.6
30
2 12.9733 3.6019
Moments, Skewness and kurtosis
Moment :
The moment about mean ̅ is denoted by and defined as
1
= −
Also, the moment about any Number ‘A’ is denoted by ′ and defined as
1
′ = −
standard deviation 2
3 2
1 2
2
Middle value
separating the greater
Median 1, 2, 2, 3, 4, 7, 9 3
and lesser halves of a
data set
2. Negative Skewness : If the frequency curve has a longer tail to the left of the central
maximum than to the right, the distribution is said to be skewed to the left or to have
negative skewed. In
a negatively skewed distribution, Mode > Median > Mean.
Moments, Skewness and kurtosis
Remarks :
Coefficient of skewness
3 2
1 2
2
1. >0 ℎ
<0 ℎ
=0 ℎ
Moments, Skewness and kurtosis
Kurtosis : Kurtosis is a measure of peakedness of a distribution relative to the normal
distribution(or symmetrical distribution)
A distribution having a relatively high peak is called leptokurtic. A distribution which is flat
topped is called platykurtic. The normal distribution which is neither very peaked nor very
flat-topped is also called mesokurtic.
Moments, Skewness and kurtosis
Remarks :
Coefficient of kurtosis
4
2 2
2
2. >3 ℎ
=3 ℎ
<3 ℎ
Moments, Skewness and kurtosis
Q.1 The first four moments of a distribution about the value 5 are 2,20,40 and 50. From the
given information obtain the first four central moments, mean, standard deviation and
coefficients of skewness and kurtosis.
Solution :
Given that
A 5, 1 ' 2, 2 ' 20, 3 ' 40, 4 ' 50
The first four central moments or moments about mean are
1 0
2 2
2 2 ' 1 ' 20 2 16 2 16
3 3
3 3 ' 31 ' 2 ' 2 1 ' 40 3 2 20 2 2 64 3 64
2 4 2 4
4 4 ' 41 ' 3 ' 6 1 ' 2 ' 3 1 ' 50 4 2 40 6 2 20 3 2 162
4 162
Moments, Skewness and kurtosis
Mean
mean x A 1 '
mean x 5 2 mean x 7
Standard Deviation
standard deviation 2
Coefficient of skewness
2
3 2 64
1 2 1 2
1 1
2 16
Coefficient of kurtosis
162
2 42 2 2 2 0.6328
2 16
Moments, Skewness and kurtosis
Q.2 Find the four moments about the mean of the following
61 64 67 70 73
5 18 42 27 8
Solution :
1
The moment about mean ̅ is given below = −
n n
1 1 3
1
N
f x x
i 1
i i 3
N
f x x
i 1
i i
n n
1 2 1 4
2
N
f x x
i 1
i i 4
N
f x x
i 1
i i
Moments, Skewness and kurtosis
− ̅ − ̅ − ̅ − ̅
61 5 305 -32.25 208.0125 -1341.6806 8653.84
64 18 1152 -62.1 214.245 -739.1453 2550.051
67 42 2814 -18.9 8.505 -3.8273 1.722263
70 27 1890 68.85 175.5675 447.6971 1141.628
73 8 584 44.4 246.42 1367.6310 7590.352
n n
1 1
fi xi
6745
1
N
i 1
fi xi x
100
0 0
i 1
x n 67.45
100
f i
1 n 2 1
i
2
N
f x x
i 1
i i
100
852.75 8.5275
Moments, Skewness and kurtosis
n
1 3 1
3
N
f x x
i 1
i i
100
269.3250 2.6933
n
1 4 1
4
N
f x x
i 1
i i
100
19937.5931 199.3759
Coefficient of skewness
2
3 2 2.6933
1 2 1 2
1 0.0998
2 8.5275
Coefficient of kurtosis
4 199.3759
2 2 2 2 2 2.7418
2 8.5275
Correlation
Correlation :
To measure linear relationship between two variables, Karl Pearson developed a formula
called correlation coefficient.
Coefficient of Correlation between two variables and denoted by ( , ) and defined as
( , )
, =
Where
1 1
, = − − , = − −
1 1
, = − . , = − .
1 1 1 1
= − = − = − = −
Correlation
Remarks :
1. −1 < <1
1 n 2 2
x 2
xi x
n i 1
y 21
x 20
1 2
x 2 2680 20
6
x 2 46.6667 x 6.8313
1 n 2 2
1
y 2
yi y
n i 1
y 2 3276 21
6
2
y 2 105 y 10.2470
Correlation
cov x, y cov x, y 42
r x, y
x y
x 6.8313
42
r x, y y 10.2470
6.831310.2470
r x, y 0.6
EXAMPLES
Q.2 Find the coefficient of correlation for the following table
x 6 2 10 4 8
y 9 11 5 8 7
Solution :
cov x, y
We know, Karl Pearson’s coefficient of correlation , r x, y
x y
Correlation
1 n 1 n 2 2 1 n 2 2
where, cov x, y xi yi x. y
n i 1
x 2
xi x
n i 1
y 2
yi y
n i 1
xy x 2 y2
6 9 54 36 81
2 11 22 4 121
10 5 50 100 25
4 8 32 16 64
8 7 56 64 49
∑ = 30 ∑ = 40
214 220 340
n
x i
n
x i 1
30
6 x6 y
i 1
i
40 y 8
n 5 y 8
n 5
Correlation
1 n
cov x, y xi yi x. y
n i 1
1
cov x, y 214 6 8
5
cov x, y 5.2
1 n 2 2
x 2
xi x
n i 1
y 8
x6
2 1 2
x 220 6
5
x2 8 x 2.8284
1 n 2 2
1
y 2
yi y
n i 1
2
y 2 340 8 y 2 4
5
y 2
Correlation
cov x, y cov x, y 5.2
r x, y
x y
x 2.8284
5.2
r x, y
2.8284 2 y 2
r x, y 0.9192
Solution :
We know that
n n
x
i 1
i
40 y i
40
x 2 & y i 1
2
n 20 n 20
Correlation
cov x, y Q.3 Find the coefficient of correlation for the following table
r x, y n 20, x 40, xi 2 190, yi 40, yi 2 200, xi yi 150,
x y
n x2 y2
1
where, cov x, y
n i 1
xi yi x. y
1
150 2 2
20 3.5
r x, y
2.3452 2.4495
3.5
cov x, y 3.5 r x, y 0.6093
1 n 2 1 2
x 2
xi x
n i 1
20
2
190 2 5.5 x 2.3452
1 n 2 2 1
y2 2
yi y 200 2 6 y 2.4495
n i 1 20
Regression
Regression :
In statistical modeling, regression analysis is a statistical process for estimating the
relationships among variables.
The line of regression gives best estimate for the value of one variable for some specified
value of other variable
Regression line on
y cov x, y y cov x, y
− = − where byx r byx
x x . y x x2
Regression line on
x cov x, y x cov x, y
− = − where bxy r bxy
y x . y y y2
1 n 1 n 2 2 1 n 2 2
n i 1
where cov x, y xi yi x. y x 2
xi x
n i 1
y 2
yi y
n i 1
Regression
Remarks :
1) = , = are called regression coefficients
2) = .
y y byx x x
x x bxy y y
Regression
Q.1 Obtain regression lines for the following data :
x 6 2 10 4 8
y 9 11 5 8 7
Solution :
y y byx x x
x x bxy y y
where where
cov x, y cov x, y
byx bxy
x2 y2
and
1 n 1 n 2 2 1 n 2 2
cov x, y xi yi x. y
n i 1
x 2
xi x
n i 1
y 2
yi y
n i 1
Regression
6 9 54 36 81
2 11 22 4 121
10 5 50 100 25
4 8 32 16 64
8 7 56 64 49
∑ = 30 ∑ = 40 ∑ = 214 ∑ = 220 ∑ = 340
n n
xi 30 yi 40
1 n
cov x, y xi yi x. y
x i 1
6 y i 1
8 n i 1
n 5 n 5
1 n 2 2
x6 y 8 x 2
n i 1
xi x
1 n 1 1 n 2 2
cov x, y xi yi x. y
n i 1
214 6 8 5.2
5
y 2
yi y
n i 1
Regression
cov x, y 5.2
1 n 2 2
1
x 2
xi x
n i 1
2
220 6 8
5
x2 8 1 n
cov x, y xi yi x. y
n i 1
1 n 2 2 1
y 2
yi y
n i 1
5
2
340 8 4
x 2 1 n 2
xi x
2
n i 1
y2 4
1 n 2 2
y 2
yi y
n i 1
cov x, y 5.2
byx 0.65 byx 0.65 y 8
x 2
8 x6
cov x, y 5.2
bxy 1.3 bxy 1.3
y2 4
Regression
Case I : Regression y on x
byx 0.65
y y byx x x bxy 1.3
y 8 0.65 x 6
x6 y 8
y 8 0.65 x 3.9
y 0.65 x 11.9
Case II : Regression x on y
x x bxy y y
x 6 1.3 y 8
x 6 1.3 y 10.4
x 1.3 y 16.4
Regression
Q.2 Obtain regression lines for the following data :
x 10 14 19 26 30 34 39
y 12 16 18 26 29 35 38
y y byx x x
x x bxy y y
where where
cov x, y cov x, y
byx bxy
x2 y2
and
1 n 1 n 2 2 1 n 2 2
cov x, y xi yi x. y
n i 1
x 2
xi x
n i 1
y 2
yi y
n i 1
Regression
x i
172
y i
174 1 n
x i 1
24.5714 y i 1
24.8571 n i 1
cov x, y xi yi x. y
n 7 n 7
1 n 2 2
x 24.5714 y 24.8571 x 2
xi x
n i 1
1 n 1 1 n 2
cov x, y xi yi x. y
n i 1
7
4904 24.5714 24.8571 y 2
yi y
n i 1
2
Regression
cov x, y 89.7977
1 n 2 2 1
x 2
n i 1
xi x 4910 24.5714
7
2
x 2 97.6749 1 n
cov x, y xi yi x. y
n i 1
1 n 2 2
1
y 2
n i 1
yi y 4910 24.8571
7
2
x 2 1 n 2
xi x
2
n i 1
y 2 83.5532
1 n 2 2
y 2
yi y
n i 1
cov x, y 89.7977
byx byx 0.9194 x 24.5714
x2 97.6749
cov x, y 89.7977 y 24.8571
bxy bxy 1.0747
y2 83.5532
Regression
Case I : Regression y on x byx 0.9194
y y byx x x bxy 1.0747
x x bxy y y bxy 1.0747
x 29.5612
Regression
Q.3 The regression equations are 8 x 10 y 66 0 and 40 x 18 y 214.The value of
variance of x is 9. Find
a) The mean values of x and y
b) The correlation x and y and
c) The standard deviation of y
Solution
8 x 10 y 66 0 and 40 x 18 y 214
x 13 and y 17
Regression
Case II : Coefficient of correlation 8 x 10 y 66 0
Regression y on x : y y byx x x 1 r 1
Regression x on y : x x bxy y y
To find regression of y on x and regression of x on y
8 66 18 214
y x x y
10 10 40 40
8 18
byx 0.8 bxy 0.45
10 40
x = var x
x = 9 x =3
Regression
Q.4 If the two lines of regression are 9 x y 0 and 4 x y and the means of
x & y are 2 & 3 respectively, find the values of , and the coefficient of
correlation between x & y.
Solution
The regression lines are 9 x y and 4 x y
Since, we know that the point ̅, satisfies both regression lines
9 x y and 4 x y
15 and 5
Regression y on x : y y byx x x 1 r 1
Regression x on y : x x bxy y y
To find regression of y on x and regression of x on y
9 x y 15 r bxy .byx 4x y 5
y 9 x 15 byx 9
1) r 9 0.25
y 4 x 5 byx 4
r 1.5
1 15 1 2) r ( 0.1111) 4
x y bxy 0.1111 1 5 1
9 9 9 x y bxy 0.25
r 0.6666 4 4 4
Regression
Regression y on x : Regression x on y :
1 15
x y y 4 x 5
9 9
1 byx 4
bxy 0.1111
9
r (0.1111) 4
r 0.6666
Since both regression coefficients and are negative
∴ we take
r 0.6666