Unit III Statistics

Pimpri Chinchwad Education Trust's
Pimpri Chinchwad College of Engineering

Department of Computer Engineering
ENGINEERING MATHEMATICS III
Mrs. Ashwini Vaze

Statistics
Def: Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.
Classification of Data
1. Ungrouped Frequency Distribution
2. Grouped Frequency Distribution

Example: Consider the marks of 10 students (out of 25):

23, 10, 12, 23, 20, 12, 20, 15, 15, 25   (Ungrouped Data)

Ungrouped Frequency Distribution
Marks (x)   Frequency (f)
23          2
10          1
12          2
20          2
15          2
25          1
Total       $N = \sum f = 10$

Grouped Frequency Distribution
Class       Frequency (f)
0-5         0
5-10        0
10-15       3
15-20       2
20-25       5
Total       $N = \sum f = 10$
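The two tabulations of the marks can be reproduced with a short Python sketch. The rule for boundary values is an assumption made here so that the counts match the table: a mark on a class boundary goes to the upper class, except the maximum mark 25, which is counted in the last class 20-25.

```python
from collections import Counter

marks = [23, 10, 12, 23, 20, 12, 20, 15, 15, 25]

# Ungrouped frequency distribution: how often each distinct mark occurs.
ungrouped = Counter(marks)

# Grouped frequency distribution with class width 5.
# Assumption: classes are [low, high), and the last class also
# includes its upper limit (the maximum mark 25).
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
grouped = {}
for low, high in classes:
    is_last = (low, high) == classes[-1]
    grouped[f"{low}-{high}"] = sum(
        1 for m in marks if low <= m < high or (is_last and m == high)
    )

total = sum(grouped.values())  # N, the total frequency
```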
Measures of Central Tendency
A measure of central tendency is a typical value around which the other observations cluster.
There are five measures of central tendency in common use:
1) Arithmetic average or arithmetic mean or simple mean
2) Median
3) Mode
4) Geometric mean
5) Harmonic mean
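Arithmetic mean or Average:
1. Ungrouped Data: Consider a set of observations $x_1, x_2, x_3, \ldots, x_k$.
$A.M. = \bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$
2. Ungrouped Frequency Distribution: Consider the frequency distribution
x : $x_1, x_2, \ldots, x_n$
f : $f_1, f_2, \ldots, f_n$
$A.M. = \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$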
Q.1 Find mean of the following observations
(a) 480,485,492,495,500,505,515
(b)
x 32 33 34 35 36
f 4 10 13 8 5
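Solution:
(a) Since the given data is ungrouped data,
$\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i = \frac{480 + 485 + 492 + 495 + 500 + 505 + 515}{7} = \frac{3472}{7}$
$\therefore \bar{x} = 496$
(b) Since the given data is an ungrouped frequency distribution,
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{(32 \times 4) + (33 \times 10) + (34 \times 13) + (35 \times 8) + (36 \times 5)}{4 + 10 + 13 + 8 + 5} = \frac{1360}{40}$
$\therefore \bar{x} = 34$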
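As a quick check of both parts of Q.1, the two mean formulas can be written as small Python functions. This is only an illustrative sketch; the function names `mean` and `weighted_mean` are chosen here and are not part of the notes.

```python
def mean(values):
    """Arithmetic mean of ungrouped data: (1/k) * sum of x_i."""
    return sum(values) / len(values)


def weighted_mean(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# Q.1 (a): ungrouped observations
mean_a = mean([480, 485, 492, 495, 500, 505, 515])

# Q.1 (b): values x with frequencies f
mean_b = weighted_mean([32, 33, 34, 35, 36], [4, 10, 13, 8, 5])
```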
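3. Grouped Frequency Distribution: Consider the frequency distribution
Class            Mid value (x)                  Frequency (f)
$a_1 - b_1$      $x_1 = \frac{a_1 + b_1}{2}$    $f_1$
$a_2 - b_2$      $x_2 = \frac{a_2 + b_2}{2}$    $f_2$
:                :                              :
$a_n - b_n$      $x_n = \frac{a_n + b_n}{2}$    $f_n$
$A.M. = \bar{x} = A + h\,\frac{\sum f_i u_i}{\sum f_i} = A + h\,\frac{\sum f_i u_i}{N}$
where $h$ : width of the class interval, $A$ : a middle value of $x$, and $u_i = \frac{x_i - A}{h}$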
Q.2 The marks obtained in paper of Mathematics are given in the following table. Find the
Arithmetic mean of the distribution.
Marks Obtained No. of students

0-10 8
10-20 20
20-30 14
30-40 16
40-50 20
50-60 25
60-70 13
70-80 10
80-90 5
90-100 2
Solution:
Marks Obtained   Mid Value (x)   No. of students (f)   $u = \frac{x - A}{h}$   $fu$
0-10             5               8                     -4                      -32
10-20            15              20                    -3                      -60
20-30            25              14                    -2                      -28
30-40            35              16                    -1                      -16
40-50            45              20                    0                       0
50-60            55              25                    1                       25
60-70            65              13                    2                       26
70-80            75              10                    3                       30
80-90            85              5                     4                       20
90-100           95              2                     5                       10
Total                            $\sum f = 133$                                $\sum fu = -25$
$A$ = middle value of $x$ = 45
$h = 10$
We have
$\bar{x} = A + h\,\frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i}$
$\therefore \bar{x} = 45 + 10\left(\frac{-25}{133}\right)$
$\therefore \bar{x} = 43.12$
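The step-deviation computation can be sketched in Python and compared with the direct formula $\sum f x / \sum f$ applied to the class mid values. The function name `step_deviation_mean` is an illustrative choice, not notation from the notes.

```python
def step_deviation_mean(mid_values, freqs, assumed_mean, width):
    """Mean of grouped data via x_bar = A + h * sum(f*u) / sum(f),
    where u = (x - A) / h for each class mid value x."""
    us = [(x - assumed_mean) / width for x in mid_values]
    return assumed_mean + width * sum(f * u for f, u in zip(freqs, us)) / sum(freqs)


# Q.2: mid values of the classes 0-10, 10-20, ..., 90-100 and their frequencies
mids = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
freqs = [8, 20, 14, 16, 20, 25, 13, 10, 5, 2]

x_bar = step_deviation_mean(mids, freqs, assumed_mean=45, width=10)

# Direct formula on the mid values, for comparison
direct = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
```

Both routes give the same value; the step-deviation form only makes the hand arithmetic smaller.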
Dispersion
In some cases the average alone does not give full information about the data. Dispersion measures how much the observations are scattered about their mean value. We use the following measures of dispersion:
1. Standard Deviation
2. Coefficient of Variation
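Standard deviation
1. Ungrouped Data: Consider a set of observations $x_1, x_2, x_3, \ldots, x_k$.
$S.D. = \sigma = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(x_i - \bar{x}\right)^2}$
2. Ungrouped Frequency Distribution: Consider the frequency distribution
x : $x_1, x_2, \ldots, x_n$
f : $f_1, f_2, \ldots, f_n$
$S.D. = \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^2}$, where $N = \sum_{i=1}^{n} f_i$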
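Remarks
1) Variance $= \sigma^2$ $\therefore \sigma = \sqrt{\text{Variance}}$
2) For calculation purposes use the following formulae for the standard deviation:
Ungrouped Data
$\sigma = \sqrt{\frac{1}{k}\sum_{i=1}^{k} x_i^2 - \bar{x}^2} = \sqrt{\frac{1}{k}\sum_{i=1}^{k} x_i^2 - \left(\frac{1}{k}\sum_{i=1}^{k} x_i\right)^2}$
Ungrouped Frequency Distribution
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \bar{x}^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2}$
Coefficient of Variation (C.V.):
$C.V. = \frac{S.D.}{A.M.} \times 100 = \frac{\sigma}{\bar{x}} \times 100$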
Derivation of the calculation formula for ungrouped data:
$\sigma^2 = \frac{1}{k}\sum_{i=1}^{k}\left(x_i - \bar{x}\right)^2$
$= \frac{1}{k}\sum_{i=1}^{k}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right)$
$= \frac{1}{k}\sum_{i=1}^{k} x_i^2 - \frac{2}{k}\sum_{i=1}^{k} x_i\bar{x} + \frac{1}{k}\sum_{i=1}^{k}\bar{x}^2$
$= \frac{1}{k}\sum_{i=1}^{k} x_i^2 - 2\bar{x}^2 + \bar{x}^2$
$= \frac{1}{k}\sum_{i=1}^{k} x_i^2 - \bar{x}^2$
Remarks
1. The standard deviation measures how far the data are spread about the mean value.
2. The coefficient of variation expresses this spread as a percentage of the mean.
3. If the standard deviation is 0.20 and the mean is 0.50, then C.V. = 40%. So the C.V. shows that a small standard deviation does not by itself mean less variable data.
4. More C.V. ⇒ More Variability ⇒ Less Consistency
   Less C.V. ⇒ Less Variability ⇒ More Consistency
5. More $\sigma$ ⇒ More Variability ⇒ Less Consistency
   Less $\sigma$ ⇒ Less Variability ⇒ More Consistency
Examples:
Q.1 Find the standard deviation for the following data
49, 63, 46, 59, 65, 52, 60, 54
Solution:
The standard deviation for ungrouped data is $\sigma = \sqrt{\frac{1}{k}\sum_{i=1}^{k} x_i^2 - \bar{x}^2}$, where $\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$.
$x$      $x^2$
49       2401
63       3969
46       2116
59       3481
65       4225
52       2704
60       3600
54       2916
$\sum x = 448$, $\sum x^2 = 25412$
$\sigma^2 = \frac{1}{8}(25412) - \left(\frac{448}{8}\right)^2 = 40.5$
$\therefore \sigma = 6.3640$
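The calculation formula used in this solution can be sketched in Python and cross-checked against the standard library's population standard deviation, `statistics.pstdev`. The function name `std_dev` is an illustrative choice.

```python
import math
import statistics


def std_dev(values):
    """Population standard deviation via the calculation formula
    sigma^2 = (1/k) * sum(x^2) - x_bar^2."""
    k = len(values)
    x_bar = sum(values) / k
    variance = sum(x * x for x in values) / k - x_bar ** 2
    return math.sqrt(variance)


data = [49, 63, 46, 59, 65, 52, 60, 54]
sigma = std_dev(data)

# The deviation form (1/k) * sum((x - x_bar)^2) gives the same result.
sigma_check = statistics.pstdev(data)
```

Note that these notes divide by $k$ (population form), which is why `pstdev` rather than `stdev` is the matching library function.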
Q.2 The scores obtained by two batsmen A and B in 10 matches are given below. Determine who is more consistent.
Batsman A 30 44 66 62 60 34 80 46 20 38
Batsman B 34 46 70 38 55 48 60 34 45 30
Solution:
Here we have to find the coefficient of variation of Batsman A, $(C.V.)_A$, and the coefficient of variation of Batsman B, $(C.V.)_B$, where
$(C.V.)_A = \frac{\sigma_A}{\bar{x}_A} \times 100$
$(C.V.)_B = \frac{\sigma_B}{\bar{x}_B} \times 100$
with
$\sigma^2 = \frac{1}{k}\sum_{i=1}^{k} x_i^2 - \bar{x}^2$ and $\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$
Case I: Coefficient of variation of Batsman A, $(C.V.)_A = \frac{\sigma_A}{\bar{x}_A} \times 100$
$x$      $x^2$
30       900
44       1936
66       4356
62       3844
60       3600
34       1156
80       6400
46       2116
20       400
38       1444
$\sum x = 480$, $\sum x^2 = 26152$
$\bar{x}_A = \frac{480}{10} = 48$
$\sigma_A^2 = \frac{1}{10}(26152) - \left(\frac{480}{10}\right)^2 = 311.2$
$\therefore \sigma_A = 17.6409$
$(C.V.)_A = \frac{17.6409}{48} \times 100$
$\therefore (C.V.)_A = 36.7519$
Case II: Coefficient of variation of Batsman B, $(C.V.)_B = \frac{\sigma_B}{\bar{x}_B} \times 100$
$x$      $x^2$
34       1156
46       2116
70       4900
38       1444
55       3025
48       2304
60       3600
34       1156
45       2025
30       900
$\sum x = 460$, $\sum x^2 = 22626$
$\bar{x}_B = \frac{460}{10} = 46$
$\sigma_B^2 = \frac{1}{10}(22626) - \left(\frac{460}{10}\right)^2 = 146.6$
$\therefore \sigma_B = 12.1078$
$(C.V.)_B = \frac{12.1078}{46} \times 100$
$\therefore (C.V.)_B = 26.3214$
Since
$(C.V.)_A = 36.7519 > (C.V.)_B = 26.3214$

Therefore, Batsman A has more variability than Batsman B,
i.e. Batsman B is more consistent than Batsman A.
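The comparison of the two batsmen can be sketched in Python as follows. The helper name `coefficient_of_variation` is an illustrative choice; it uses the same population formula for $\sigma$ as the notes.

```python
import math


def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, with sigma^2 = (1/k)*sum(x^2) - mean^2."""
    k = len(values)
    x_bar = sum(values) / k
    sigma = math.sqrt(sum(x * x for x in values) / k - x_bar ** 2)
    return sigma / x_bar * 100


batsman_a = [30, 44, 66, 62, 60, 34, 80, 46, 20, 38]
batsman_b = [34, 46, 70, 38, 55, 48, 60, 34, 45, 30]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)

# The smaller C.V. indicates the more consistent batsman.
more_consistent = "A" if cv_a < cv_b else "B"
```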

Q.3 The arithmetic mean and standard deviation of 30 items are 20 and 3 respectively. Out of these 30 items, the items 22 and 15 are dropped. Find the new A.M. and S.D. Also calculate the A.M. and S.D. if item 22 is replaced by 8 and item 15 is replaced by 17.
Solution:
Let $x_1, x_2, x_3, \ldots, 22, 15, \ldots, x_{30}$ be the given items.
Given that the A.M. $\bar{x}$ and the S.D. $\sigma$ are 20 and 3 respectively,
$\therefore \bar{x} = 20$ and $\sigma = 3$ ..... (1)
We have
$\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i$ and $\sigma^2 = \frac{1}{k}\sum_{i=1}^{k} x_i^2 - \bar{x}^2$
i.e. $\frac{1}{30}\sum_{i=1}^{30} x_i = 20$ and $\frac{1}{30}\sum_{i=1}^{30} x_i^2 - (20)^2 = 9$   [from equation (1)]
Writing the items 22 and 15 separately from the remaining 28 items,
$\frac{1}{30}\left(\sum_{i=1}^{28} x_i + 22 + 15\right) = 20$ and $\frac{1}{30}\left(\sum_{i=1}^{28} x_i^2 + 22^2 + 15^2\right) - (20)^2 = 9$
i.e. $\sum_{i=1}^{28} x_i = 563$ and $\sum_{i=1}^{28} x_i^2 = 11561$
Case I: when the items 22 and 15 are dropped
For the remaining 28 items, $\sum_{i=1}^{28} x_i = 563$ and $\sum_{i=1}^{28} x_i^2 = 11561$.
$\bar{x} = \frac{1}{28}\sum_{i=1}^{28} x_i = \frac{563}{28} = 20.1071$
$\therefore \bar{x} = 20.1071$
$\sigma^2 = \frac{1}{28}\sum_{i=1}^{28} x_i^2 - \bar{x}^2$
$\sigma^2 = \frac{1}{28}(11561) - (20.1071)^2$
$\therefore \sigma^2 = 8.5974$
$\therefore \sigma = 2.9321$
Case II: when item 22 is replaced by 8 and item 15 is replaced by 17
Now consider the items
$x_1, x_2, x_3, \ldots, 8, 17, \ldots, x_{30}$
with $\sum_{i=1}^{28} x_i = 563$ and $\sum_{i=1}^{28} x_i^2 = 11561$ for the 28 unchanged items.
$\bar{x} = \frac{1}{30}\left(\sum_{i=1}^{28} x_i + 8 + 17\right) = \frac{563 + 8 + 17}{30} = 19.6$
$\therefore \bar{x} = 19.6$
and
$\sigma^2 = \frac{1}{30}\left(\sum_{i=1}^{28} x_i^2 + 8^2 + 17^2\right) - \bar{x}^2$
$\sigma^2 = \frac{1}{30}\left(11561 + 8^2 + 17^2\right) - (19.6)^2$
$\therefore \sigma^2 = 12.9733$ $\therefore \sigma = 3.6019$
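The bookkeeping in Q.3 (recover $\sum x$ and $\sum x^2$ from the given mean and S.D., then adjust the sums) can be sketched in Python. The helper name `mean_sd` and the variable names are illustrative.

```python
import math

n, x_bar, sigma = 30, 20, 3

# Recover the totals from the given mean and S.D.:
# sum(x) = n * x_bar and sum(x^2) = n * (sigma^2 + x_bar^2)
total = n * x_bar
total_sq = n * (sigma ** 2 + x_bar ** 2)


def mean_sd(sum_x, sum_x2, count):
    """Mean and population S.D. from the sums of x and x^2."""
    m = sum_x / count
    return m, math.sqrt(sum_x2 / count - m ** 2)


# Sums over the 28 items that are neither 22 nor 15
sum28 = total - 22 - 15
sumsq28 = total_sq - 22 ** 2 - 15 ** 2

# Case I: items 22 and 15 dropped
mean_1, sd_1 = mean_sd(sum28, sumsq28, 28)

# Case II: 22 replaced by 8 and 15 replaced by 17
mean_2, sd_2 = mean_sd(sum28 + 8 + 17, sumsq28 + 8 ** 2 + 17 ** 2, 30)
```

Without intermediate rounding, Case I gives $\sigma \approx 2.9318$; the hand computation above rounds the mean to 20.1071 before squaring, which is why it shows 2.9321.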
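Moments, Skewness and Kurtosis
Moment:
The $r$-th moment about the mean $\bar{x}$ is denoted by $\mu_r$ and defined as
$\mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^r$
Also, the $r$-th moment about any number $A$ is denoted by $\mu_r'$ and defined as
$\mu_r' = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - A\right)^r$
Relation between $\mu_r$ and $\mu_r'$:
$\mu_1 = 0$
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4$
Remarks:
(1) Mean: $\bar{x} = A + \mu_1'$
(2) Standard deviation: $\sigma = \sqrt{\mu_2}$
(3) Coefficient of skewness: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
(4) Coefficient of kurtosis: $\beta_2 = \frac{\mu_4}{\mu_2^2}$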
Type              Description                                                           Example                Result
Arithmetic mean   Sum of values of a data set divided by number of values               (1+2+2+3+4+7+9) / 7    4
Median            Middle value separating the greater and lesser halves of a data set   1, 2, 2, 3, 4, 7, 9    3
Mode              Most frequent value in a data set                                     1, 2, 2, 3, 4, 7, 9    2
Skewness: It measures the imbalance or asymmetry of a data distribution about its mean.
There are two types of skewness:

1. Positive Skewness: If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or positively skewed. In a positively skewed distribution, Mean > Median > Mode.
2. Negative Skewness: If the frequency curve has a longer tail to the left of the central maximum than to the right, the distribution is said to be skewed to the left, or negatively skewed. In a negatively skewed distribution, Mode > Median > Mean.
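Remarks:
Coefficient of skewness
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
1. Since $\beta_1$ cannot be negative, the direction of skewness is read from the sign of $\mu_3$:
$\mu_3 > 0$: the distribution is positively skewed
$\mu_3 < 0$: the distribution is negatively skewed
$\mu_3 = 0$: the distribution is symmetric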
Kurtosis : Kurtosis is a measure of peakedness of a distribution relative to the normal
distribution(or symmetrical distribution)
A distribution having a relatively high peak is called leptokurtic. A distribution which is flat
topped is called platykurtic. The normal distribution which is neither very peaked nor very
flat-topped is also called mesokurtic.
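Remarks:
Coefficient of kurtosis
$\beta_2 = \frac{\mu_4}{\mu_2^2}$
2. $\beta_2 > 3$: the distribution is leptokurtic
$\beta_2 = 3$: the distribution is mesokurtic
$\beta_2 < 3$: the distribution is platykurtic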
Q.1 The first four moments of a distribution about the value 5 are 2,20,40 and 50. From the
given information obtain the first four central moments, mean, standard deviation and
coefficients of skewness and kurtosis.
Solution:
Given that
$A = 5,\ \mu_1' = 2,\ \mu_2' = 20,\ \mu_3' = 40,\ \mu_4' = 50$
The first four central moments (moments about the mean) are
$\mu_1 = 0$
$\mu_2 = \mu_2' - (\mu_1')^2 = 20 - (2)^2 = 16$ $\therefore \mu_2 = 16$
$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = 40 - 3(2)(20) + 2(2)^3 = -64$ $\therefore \mu_3 = -64$
$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 50 - 4(2)(40) + 6(2)^2(20) - 3(2)^4 = 162$
$\therefore \mu_4 = 162$
Mean
$\bar{x} = A + \mu_1' = 5 + 2$ $\therefore \bar{x} = 7$
Standard Deviation
$\sigma = \sqrt{\mu_2} = \sqrt{16}$ $\therefore \sigma = 4$
Coefficient of skewness
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-64)^2}{(16)^3}$ $\therefore \beta_1 = 1$
Coefficient of kurtosis
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{162}{(16)^2}$ $\therefore \beta_2 = 0.6328$
Q.2 Find the first four moments about the mean of the following data
x 61 64 67 70 73
f 5 18 42 27 8
Also calculate $\beta_1$ and $\beta_2$.
Solution:
The $r$-th moment about the mean $\bar{x}$ is $\mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^r$, so
$\mu_1 = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)$, $\mu_2 = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^2$, $\mu_3 = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^3$, $\mu_4 = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)^4$
The mean is
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6745}{100} = 67.45$
$x$     $f$     $fx$     $f(x - \bar{x})$   $f(x - \bar{x})^2$   $f(x - \bar{x})^3$   $f(x - \bar{x})^4$
61      5       305      -32.25             208.0125             -1341.6806           8653.84
64      18      1152     -62.1              214.245              -739.1453            2550.051
67      42      2814     -18.9              8.505                -3.8273              1.722263
70      27      1890     68.85              175.5675             447.6971             1141.628
73      8       584      44.4               246.42               1367.6310            7590.352
$\sum x = 335$, $N = \sum f = 100$, $\sum fx = 6745$, $\sum f(x - \bar{x}) = 0$, $\sum f(x - \bar{x})^2 = 852.75$, $\sum f(x - \bar{x})^3 = -269.3250$, $\sum f(x - \bar{x})^4 = 19937.5931$
$\mu_1 = \frac{1}{100}(0) = 0$
$\mu_2 = \frac{1}{100}(852.75) = 8.5275$
$\mu_3 = \frac{1}{100}(-269.3250) = -2.6933$
$\mu_4 = \frac{1}{100}(19937.5931) = 199.3759$
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-2.6933)^2}{(8.5275)^3}$ $\therefore \beta_1 = 0.0117$
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{199.3759}{(8.5275)^2}$ $\therefore \beta_2 = 2.7418$
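Correlation
To measure the linear relationship between two variables, Karl Pearson developed a formula called the correlation coefficient.
The coefficient of correlation between two variables $x$ and $y$ is denoted by $r(x, y)$ and defined as
$r(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
where
1. Ungrouped Data
$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$
$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
2. Ungrouped Frequency Distribution
$\mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{n} f_i\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i y_i - \bar{x}\,\bar{y}$
$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{n} f_i y_i^2 - \bar{y}^2$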
Remarks:
1. $-1 \le r \le 1$
2. If $r$ is close to 0, there is little or no linear relationship between the variables. If $r$ is positive, then as one variable gets larger the other tends to get larger. If $r$ is negative, then as one variable gets larger the other tends to get smaller.
EXAMPLES
Q.1 Find the coefficient of correlation for the following table
x 10 14 18 22 26 30
y 18 12 24 6 30 36
Solution:
We know Karl Pearson's coefficient of correlation,
$r(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
where $\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$, $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
$x$     $y$     $xy$     $x^2$    $y^2$
10      18      180      100      324
14      12      168      196      144
18      24      432      324      576
22      6       132      484      36
26      30      780      676      900
30      36      1080     900      1296
$\sum x = 120$, $\sum y = 126$, $\sum xy = 2772$, $\sum x^2 = 2680$, $\sum y^2 = 3276$
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{120}{6} = 20$ $\therefore \bar{x} = 20$
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{126}{6} = 21$ $\therefore \bar{y} = 21$
$\mathrm{cov}(x, y) = \frac{1}{6}(2772) - (20)(21)$ $\therefore \mathrm{cov}(x, y) = 42$
$\sigma_x^2 = \frac{1}{6}(2680) - (20)^2 = 46.6667$ $\therefore \sigma_x = 6.8313$
$\sigma_y^2 = \frac{1}{6}(3276) - (21)^2 = 105$ $\therefore \sigma_y = 10.2470$
$r(x, y) = \frac{42}{(6.8313)(10.2470)}$
$\therefore r(x, y) = 0.6$
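The same computation can be sketched as a small Python function that follows the calculation formulas for $\mathrm{cov}(x, y)$, $\sigma_x$ and $\sigma_y$ used in this solution. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient r = cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    sigma_x = math.sqrt(sum(x * x for x in xs) / n - x_bar ** 2)
    sigma_y = math.sqrt(sum(y * y for y in ys) / n - y_bar ** 2)
    return cov / (sigma_x * sigma_y)


# Data of Q.1
r1 = pearson_r([10, 14, 18, 22, 26, 30], [18, 12, 24, 6, 30, 36])
```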
Q.2 Find the coefficient of correlation for the following table
x 6 2 10 4 8
y 9 11 5 8 7
Solution:
We know Karl Pearson's coefficient of correlation,
$r(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
where $\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$, $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
$x$     $y$     $xy$     $x^2$    $y^2$
6       9       54       36       81
2       11      22       4        121
10      5       50       100      25
4       8       32       16       64
8       7       56       64       49
$\sum x = 30$, $\sum y = 40$, $\sum xy = 214$, $\sum x^2 = 220$, $\sum y^2 = 340$
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{30}{5} = 6$ $\therefore \bar{x} = 6$
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{40}{5} = 8$ $\therefore \bar{y} = 8$
$\mathrm{cov}(x, y) = \frac{1}{5}(214) - (6)(8)$ $\therefore \mathrm{cov}(x, y) = -5.2$
$\sigma_x^2 = \frac{1}{5}(220) - (6)^2 = 8$ $\therefore \sigma_x = 2.8284$
$\sigma_y^2 = \frac{1}{5}(340) - (8)^2 = 4$ $\therefore \sigma_y = 2$
$r(x, y) = \frac{-5.2}{(2.8284)(2)}$
$\therefore r(x, y) = -0.9192$

Q.3 Find the coefficient of correlation for the following data
$n = 20,\ \sum x_i = 40,\ \sum x_i^2 = 190,\ \sum y_i = 40,\ \sum y_i^2 = 200,\ \sum x_i y_i = 150$
Solution:
We know that
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{40}{20} = 2$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{40}{20} = 2$
$r(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
where
$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y} = \frac{1}{20}(150) - (2)(2)$ $\therefore \mathrm{cov}(x, y) = 3.5$
$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{20}(190) - (2)^2 = 5.5$ $\therefore \sigma_x = 2.3452$
$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2 = \frac{1}{20}(200) - (2)^2 = 6$ $\therefore \sigma_y = 2.4495$
$r(x, y) = \frac{3.5}{(2.3452)(2.4495)}$
$\therefore r(x, y) = 0.6093$
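When only the summary totals are given, as in Q.3, the coefficient can be computed without the raw data. The following Python sketch assumes the six totals are passed in directly; the function name `r_from_sums` and its parameter names are illustrative.

```python
import math


def r_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Correlation coefficient from n and the totals of x, x^2, y, y^2, xy."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    cov = sum_xy / n - x_bar * y_bar
    var_x = sum_x2 / n - x_bar ** 2
    var_y = sum_y2 / n - y_bar ** 2
    return cov / math.sqrt(var_x * var_y)


# Q.3: n = 20, sum x = 40, sum x^2 = 190, sum y = 40, sum y^2 = 200, sum xy = 150
r3 = r_from_sums(20, 40, 190, 40, 200, 150)
```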
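Regression
In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables.
The line of regression gives the best estimate of the value of one variable for a specified value of the other variable.
Regression line of $y$ on $x$
$y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$ where $b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}\cdot\frac{\sigma_y}{\sigma_x}$ $\therefore b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}$
Regression line of $x$ on $y$
$x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$ where $b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}\cdot\frac{\sigma_x}{\sigma_y}$ $\therefore b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2}$
where $\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$, $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$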
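Remarks:
1) $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$ are called regression coefficients
2) $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
3) If $x$ is given and the corresponding value of $y$ is to be found, use the regression line of $y$ on $x$
4) If $y$ is given and the corresponding value of $x$ is to be found, use the regression line of $x$ on $y$
5) If $b_{xy} > 0$ and $b_{yx} > 0$ then $r > 0$
6) If $b_{xy} < 0$ and $b_{yx} < 0$ then $r < 0$
7) The point $(\bar{x}, \bar{y})$ satisfies the equations of both lines of regression
$y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$
$x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$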
Q.1 Obtain regression lines for the following data:
x 6 2 10 4 8
y 9 11 5 8 7
Solution:
Case I: Regression of $y$ on $x$: $y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$, where $b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}$
Case II: Regression of $x$ on $y$: $x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$, where $b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2}$
and
$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$, $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
$x$     $y$     $xy$     $x^2$    $y^2$
6       9       54       36       81
2       11      22       4        121
10      5       50       100      25
4       8       32       16       64
8       7       56       64       49
$\sum x = 30$, $\sum y = 40$, $\sum xy = 214$, $\sum x^2 = 220$, $\sum y^2 = 340$
$\bar{x} = \frac{30}{5} = 6$ $\therefore \bar{x} = 6$
$\bar{y} = \frac{40}{5} = 8$ $\therefore \bar{y} = 8$
$\mathrm{cov}(x, y) = \frac{1}{5}(214) - (6)(8) = -5.2$ $\therefore \mathrm{cov}(x, y) = -5.2$
$\sigma_x^2 = \frac{1}{5}(220) - (6)^2 = 8$ $\therefore \sigma_x^2 = 8$
$\sigma_y^2 = \frac{1}{5}(340) - (8)^2 = 4$ $\therefore \sigma_y^2 = 4$
$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{-5.2}{8} = -0.65$ $\therefore b_{yx} = -0.65$
$b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2} = \frac{-5.2}{4} = -1.3$ $\therefore b_{xy} = -1.3$
Case I: Regression of $y$ on $x$
$y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$
$y - 8 = -0.65\,(x - 6)$
$y - 8 = -0.65x + 3.9$
$y = -0.65x + 11.9$
Case II: Regression of $x$ on $y$
$x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$
$x - 6 = -1.3\,(y - 8)$
$x - 6 = -1.3y + 10.4$
$x = -1.3y + 16.4$
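The two regression lines of Q.1 can be reproduced with a short Python sketch. Each line is returned as a (slope, intercept) pair; the function name `regression_lines` is an illustrative choice.

```python
def regression_lines(xs, ys):
    """Return ((b_yx, c_y), (b_xy, c_x)) so that
    y = b_yx * x + c_y  is the regression line of y on x, and
    x = b_xy * y + c_x  is the regression line of x on y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    b_yx = cov / var_x
    b_xy = cov / var_y
    # Both lines pass through the point (x_bar, y_bar).
    return (b_yx, y_bar - b_yx * x_bar), (b_xy, x_bar - b_xy * y_bar)


(b_yx, c_y), (b_xy, c_x) = regression_lines([6, 2, 10, 4, 8], [9, 11, 5, 8, 7])
```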
Q.2 Obtain regression lines for the following data:
x 10 14 19 26 30 34 39
y 12 16 18 26 29 35 38
and estimate $y$ for $x = 14.5$ and $x$ for $y = 29.5$.
Solution:
Case I: Regression of $y$ on $x$: $y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$, where $b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}$
Case II: Regression of $x$ on $y$: $x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$, where $b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2}$
and
$\mathrm{cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$, $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$, $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
$x$     $y$     $xy$     $x^2$    $y^2$
10      12      120      100      144
14      16      224      196      256
19      18      342      361      324
26      26      676      676      676
30      29      870      900      841
34      35      1190     1156     1225
39      38      1482     1521     1444
$\sum x = 172$, $\sum y = 174$, $\sum xy = 4904$, $\sum x^2 = 4910$, $\sum y^2 = 4910$
$\bar{x} = \frac{172}{7} = 24.5714$ $\therefore \bar{x} = 24.5714$
$\bar{y} = \frac{174}{7} = 24.8571$ $\therefore \bar{y} = 24.8571$
$\mathrm{cov}(x, y) = \frac{1}{7}(4904) - (24.5714)(24.8571)$ $\therefore \mathrm{cov}(x, y) = 89.7977$
$\sigma_x^2 = \frac{1}{7}(4910) - (24.5714)^2$ $\therefore \sigma_x^2 = 97.6749$
$\sigma_y^2 = \frac{1}{7}(4910) - (24.8571)^2$ $\therefore \sigma_y^2 = 83.5532$
$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{89.7977}{97.6749}$ $\therefore b_{yx} = 0.9194$
$b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2} = \frac{89.7977}{83.5532}$ $\therefore b_{xy} = 1.0747$
Case I: Regression of $y$ on $x$
$y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$
$y - 24.8571 = 0.9194\,(x - 24.5714)$
$y - 24.8571 = 0.9194x - 22.5909$
$y = 0.9194x + 2.2662$
For $x = 14.5$:
$y = 0.9194\,(14.5) + 2.2662$
$y = 15.5975$
Case II: Regression of $x$ on $y$
$x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$
$x - 24.5714 = 1.0747\,(y - 24.8571)$
$x - 24.5714 = 1.0747y - 26.7139$
$x = 1.0747y - 2.1425$
For $y = 29.5$:
$x = 1.0747\,(29.5) - 2.1425$
$x = 29.5612$
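The two estimates asked for in Q.2 can be checked with a short Python sketch. It keeps full floating-point precision, so the results differ from the hand computation in the third or fourth decimal place, where the hand computation rounds intermediate values to four decimals.

```python
xs = [10, 14, 19, 26, 30, 34, 39]
ys = [12, 16, 18, 26, 29, 35, 38]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
var_x = sum(x * x for x in xs) / n - x_bar ** 2
var_y = sum(y * y for y in ys) / n - y_bar ** 2

b_yx = cov / var_x   # slope of the regression line of y on x
b_xy = cov / var_y   # slope of the regression line of x on y

# To estimate y from a given x, use the line of y on x.
y_hat = y_bar + b_yx * (14.5 - x_bar)

# To estimate x from a given y, use the line of x on y.
x_hat = x_bar + b_xy * (29.5 - y_bar)
```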
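Q.3 The regression equations are $8x - 10y + 66 = 0$ and $40x - 18y = 214$. The variance of $x$ is 9. Find
a) the mean values of $x$ and $y$,
b) the coefficient of correlation between $x$ and $y$, and
c) the standard deviation of $y$.
Solution:
The regression lines are $8x - 10y + 66 = 0$ and $40x - 18y = 214$.
Case I: Mean values of $x$ and $y$
Since the point $(\bar{x}, \bar{y})$ satisfies both regression lines,
$8\bar{x} - 10\bar{y} + 66 = 0$ and $40\bar{x} - 18\bar{y} = 214$
Solving these two equations, $\bar{x} = 13$ and $\bar{y} = 17$.
Case II: Coefficient of correlation
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, with $-1 \le r \le 1$
Regression of $y$ on $x$: $y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$
Regression of $x$ on $y$: $x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$
We have to decide which equation is the regression of $y$ on $x$ and which is the regression of $x$ on $y$.
1) Take $8x - 10y + 66 = 0$ as the regression of $y$ on $x$ and $40x - 18y = 214$ as the regression of $x$ on $y$:
$y = \frac{8}{10}x + \frac{66}{10}$ $\therefore b_{yx} = \frac{8}{10} = 0.8$
$x = \frac{18}{40}y + \frac{214}{40}$ $\therefore b_{xy} = \frac{18}{40} = 0.45$
$r = \sqrt{0.8 \times 0.45}$ $\therefore r = 0.6$
2) Take $8x - 10y + 66 = 0$ as the regression of $x$ on $y$ and $40x - 18y = 214$ as the regression of $y$ on $x$:
$x = \frac{10}{8}y - \frac{66}{8}$ $\therefore b_{xy} = \frac{10}{8} = 1.25$
$y = \frac{40}{18}x - \frac{214}{18}$ $\therefore b_{yx} = \frac{40}{18} = 2.2222$
$r = \sqrt{1.25 \times 2.2222}$ $\therefore r = 1.6667$, which is not possible since $-1 \le r \le 1$.
Hence choice 1) is the correct one: $b_{yx} = 0.8$, $b_{xy} = 0.45$ and $r = 0.6$.
Case III: Standard deviation of $y$
standard deviation $= \sqrt{\text{variance}}$ $\therefore \sigma_x = \sqrt{\mathrm{var}(x)}$
Given that $\mathrm{var}(x) = 9$,
$\sigma_x = \sqrt{9}$ $\therefore \sigma_x = 3$
Since $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$,
$\sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{(0.8)(3)}{0.6}$ $\therefore \sigma_y = 4$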
Q.4 If the two lines of regression are $9x + y - \lambda = 0$ and $4x + y = \mu$ and the means of $x$ and $y$ are 2 and $-3$ respectively, find the values of $\lambda$, $\mu$ and the coefficient of correlation between $x$ and $y$.
Solution:
The regression lines are $9x + y = \lambda$ and $4x + y = \mu$.
Case I: Values of $\lambda$ and $\mu$
Since the point $(\bar{x}, \bar{y})$ satisfies both regression lines,
$9\bar{x} + \bar{y} = \lambda$ and $4\bar{x} + \bar{y} = \mu$
Given that $\bar{x} = 2$ and $\bar{y} = -3$,
$9(2) + (-3) = \lambda$ and $4(2) + (-3) = \mu$
$\therefore \lambda = 15$ and $\mu = 5$
∴ The regression lines are $9x + y = 15$ and $4x + y = 5$.
Case II: Coefficient of correlation
$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$, with $-1 \le r \le 1$
Regression of $y$ on $x$: $y - \bar{y} = b_{yx}\left(x - \bar{x}\right)$
Regression of $x$ on $y$: $x - \bar{x} = b_{xy}\left(y - \bar{y}\right)$
We have to decide which equation is the regression of $y$ on $x$ and which is the regression of $x$ on $y$.
1) Take $9x + y = 15$ as the regression of $y$ on $x$ and $4x + y = 5$ as the regression of $x$ on $y$:
$y = -9x + 15$ $\therefore b_{yx} = -9$
$x = -\frac{1}{4}y + \frac{5}{4}$ $\therefore b_{xy} = -\frac{1}{4} = -0.25$
$|r| = \sqrt{(-9)(-0.25)} = 1.5$, which is not possible since $-1 \le r \le 1$.
2) Take $9x + y = 15$ as the regression of $x$ on $y$ and $4x + y = 5$ as the regression of $y$ on $x$:
$x = -\frac{1}{9}y + \frac{15}{9}$ $\therefore b_{xy} = -\frac{1}{9} = -0.1111$
$y = -4x + 5$ $\therefore b_{yx} = -4$
$|r| = \sqrt{(-0.1111)(-4)} = 0.6667$
Hence choice 2) is the correct one. Since both regression coefficients $b_{xy}$ and $b_{yx}$ are negative,
∴ we take
$r = -0.6667$
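The trial-and-error step in Q.3 and Q.4 (deciding which equation is the regression of $y$ on $x$) can be sketched in Python. Each line is written as $ax + by = c$ and passed as a tuple `(a, b, c)`; this representation and the function name `analyse_regression_pair` are assumptions made for this illustration.

```python
import math


def analyse_regression_pair(line1, line2):
    """Given two regression lines (a, b, c) meaning a*x + b*y = c,
    return (x_bar, y_bar, b_yx, b_xy, r)."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # The means are the intersection point of the two lines (Cramer's rule).
    det = a1 * b2 - a2 * b1
    x_bar = (c1 * b2 - c2 * b1) / det
    y_bar = (a1 * c2 - a2 * c1) / det

    # Try both assignments; only one gives 0 <= b_yx * b_xy <= 1.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # y = -(a/b) x + c/b
        b_xy = -x_on_y[1] / x_on_y[0]   # x = -(b/a) y + c/a
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the regression coefficients.
            r = math.copysign(math.sqrt(product), b_yx)
            return x_bar, y_bar, b_yx, b_xy, r
    raise ValueError("the two lines are not a valid pair of regression lines")


# Q.3: 8x - 10y + 66 = 0 and 40x - 18y = 214, with var(x) = 9
x3, y3, byx3, bxy3, r3 = analyse_regression_pair((8, -10, -66), (40, -18, 214))
sigma_y3 = byx3 * math.sqrt(9) / r3   # from b_yx = r * sigma_y / sigma_x

# Q.4: 9x + y = 15 and 4x + y = 5
x4, y4, byx4, bxy4, r4 = analyse_regression_pair((9, 1, 15), (4, 1, 5))
```

The two candidate products are reciprocals of each other, so at most one assignment can satisfy $|r| \le 1$ unless the lines coincide.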
