
Probability & Statistics

Dr. Santosh Kumar Yadav


Assistant Professor

Department of Mathematics
Lovely Professional University, Phagwara,
Punjab.

Dr. S. K. Yadav, LPU Punjab 1 / 25


Correlation and Regression
Analysis
In this series of lectures we mainly discuss the correlation of bivariate data, the mathematical measurement of the linear relationship between two variables, the properties of the correlation coefficient, rank correlation, and regression lines and coefficients.



Correlation

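Definition: If a change in one variable is accompanied by a change in the other variable, we say that the variables are correlated. There are two types of correlation:
Positive correlation: the two variables change in the same direction (both increase or both decrease).
Negative correlation: the two variables change in opposite directions (one increases as the other decreases).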



Examples

Let X and Y be random variables representing, respectively, the first and second quantities in each of the following pairs:

advertisement and sales.


price and demand of a product.
income and expenditure.
competition and sales in commodity.

Note: a quantitative measure of the relationship between two variables is required to interpret the correlation between them.



Scatter Diagram

The simplest diagrammatic representation of bivariate data is the scatter plot.
For a bivariate distribution (X_i, Y_i), i = 1, 2, …, n, if each pair is plotted as a dot with its X value along the x-axis and its Y value along the y-axis, the resulting plot is called a scatter plot or scatter diagram.

If the dots lie close together (tightly clustered around a line), a strong correlation is indicated; if they are widely scattered, a weak correlation is expected.



Karl Pearson’s Coefficient of Correlation
Karl Pearson’s correlation coefficient measures the degree of linear relationship between two variables.
The correlation coefficient between two variables X and Y is denoted by r (or by ρ) and is defined as

$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, $$

where Cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are the standard deviations of X and Y, respectively.



Computation of Correlation Coefficient

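$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, $$

where

$$ \operatorname{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} X_i Y_i - \bar{X}\,\bar{Y}, $$

$$ \sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2, $$

$$ \sigma_Y^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}^2. $$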
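As a rough check of these formulas, the Python sketch below evaluates r directly from the 1/n definitions of the covariance and the variances. The function name `pearson_r` and the small data set are illustrative assumptions, not part of the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the 1/n definitions of Cov(X, Y), sigma_X^2 and sigma_Y^2."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    return cov / (sqrt(var_x) * sqrt(var_y))

# Hypothetical data, used only to exercise the function.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

Because the 1/n factors cancel in the ratio, using n − 1 in both the covariance and the variances would give the same r.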



Cont...
(Direct method)

$$ r(X, Y) = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\,\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}} $$

(Prove this from the definition.)
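One possible route for the suggested proof, sketched here rather than taken from the lecture: substitute X̄ = ΣX/n and Ȳ = ΣY/n into the definitions of Cov(X, Y), σ_X² and σ_Y², and bring each to a common denominator n².

```latex
\begin{aligned}
\operatorname{Cov}(X,Y) &= \frac{1}{n}\sum XY - \frac{\sum X}{n}\cdot\frac{\sum Y}{n}
  = \frac{n\sum XY - \sum X \sum Y}{n^2},\\[4pt]
\sigma_X^2 &= \frac{1}{n}\sum X^2 - \left(\frac{\sum X}{n}\right)^2
  = \frac{n\sum X^2 - \left(\sum X\right)^2}{n^2},
\qquad
\sigma_Y^2 = \frac{n\sum Y^2 - \left(\sum Y\right)^2}{n^2},\\[4pt]
r(X,Y) &= \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}
  = \frac{\left(n\sum XY - \sum X\sum Y\right)/n^2}
         {\sqrt{n\sum X^2 - \left(\sum X\right)^2}\,\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}\,/\,n^2}.
\end{aligned}
```

The common factor 1/n² cancels between numerator and denominator, which leaves the direct formula.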



Cont...
(Shortcut method)

$$ r(X, Y) = \frac{n\sum UV - \sum U \sum V}{\sqrt{n\sum U^2 - \left(\sum U\right)^2}\,\sqrt{n\sum V^2 - \left(\sum V\right)^2}}, $$

where U = X − A_x and V = Y − A_y, and A_x and A_y are assumed means of the X and Y data.
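A minimal Python sketch of the shortcut method, assuming small integer data (the values and the helper name `r_shortcut` are hypothetical). Running it with several assumed means illustrates that the choice of A_x and A_y does not affect r.

```python
from math import sqrt

def r_shortcut(xs, ys, ax, ay):
    """Shortcut-method r with U = X - ax and V = Y - ay (ax, ay: assumed means)."""
    us = [x - ax for x in xs]
    vs = [y - ay for y in ys]
    n = len(us)
    num = n * sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs)
    den = (sqrt(n * sum(u * u for u in us) - sum(us) ** 2)
           * sqrt(n * sum(v * v for v in vs) - sum(vs) ** 2))
    return num / den

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # hypothetical data
for ax, ay in [(0, 0), (3, 4), (10, -7)]:    # arbitrary assumed means
    print(ax, ay, round(r_shortcut(xs, ys, ax, ay), 4))
```

With integer data the numerator and both radicands come out identical for every choice of assumed means, so the printed r is the same on each line.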



Example 1

Find the correlation coefficient for the following heights (in inches) of fathers (X) and sons (Y):
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
Take the assumed means A_x = 68 and A_y = 69.



Cont..

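With U = X − 68 and V = Y − 69 (n = 8) we get ΣU = 0, ΣV = 0, ΣUV = 24, ΣU² = 36 and ΣV² = 44. Then, by the shortcut formula,

$$ r(X, Y) = \frac{8 \times 24 - 0 \times 0}{\sqrt{8 \times 36 - 0^2}\,\sqrt{8 \times 44 - 0^2}} \approx 0.603. $$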
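The sums quoted above can be reproduced with the short Python sketch below, which tabulates U and V for the Example 1 heights using the same assumed means (the dictionary keys are illustrative names).

```python
from math import sqrt

# Heights from Example 1 and the assumed means used on the slide.
X = [65, 66, 67, 67, 68, 69, 70, 72]
Y = [67, 68, 65, 68, 72, 72, 69, 71]
AX, AY = 68, 69

U = [x - AX for x in X]
V = [y - AY for y in Y]
n = len(X)
sums = {
    "sum_U": sum(U),
    "sum_V": sum(V),
    "sum_UV": sum(u * v for u, v in zip(U, V)),
    "sum_U2": sum(u * u for u in U),
    "sum_V2": sum(v * v for v in V),
}
num = n * sums["sum_UV"] - sums["sum_U"] * sums["sum_V"]
den = (sqrt(n * sums["sum_U2"] - sums["sum_U"] ** 2)
       * sqrt(n * sums["sum_V2"] - sums["sum_V"] ** 2))
r = num / den
print(sums)          # {'sum_U': 0, 'sum_V': 0, 'sum_UV': 24, 'sum_U2': 36, 'sum_V2': 44}
print(round(r, 3))   # 0.603
```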





Example 2
The joint probability distribution of X and Y is given below:

Y \ X     1      −1
  0      1/8    3/8
  1      2/8    2/8

Find the correlation coefficient between X and Y.
Ans: r(X, Y) = 1/√15 ≈ 0.2582
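A sketch that recomputes this answer exactly, using Python's `fractions` module for the probabilities; storing the table as a dictionary keyed by (x, y) and the helper name `expect` are illustrative choices.

```python
from fractions import Fraction as F
from math import sqrt

# Joint pmf from the table: keys are (x, y), values are P(X = x, Y = y).
pmf = {
    (1, 0): F(1, 8), (-1, 0): F(3, 8),
    (1, 1): F(2, 8), (-1, 1): F(2, 8),
}
assert sum(pmf.values()) == 1  # sanity check: probabilities sum to one

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - ex * ey
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
r = float(cov) / sqrt(float(var_x * var_y))
print(ex, ey, cov, var_x, var_y)   # -1/4 1/2 1/8 15/16 1/4
print(round(r, 4))                 # 0.2582
```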



Properties of correlation coefficient

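The correlation coefficient cannot exceed unity in absolute value:

$$ -1 \le r(X, Y) \le 1. $$

The correlation coefficient is independent of change of origin and scale, apart from its sign: for constants a, b, c, d with a, c ≠ 0,

$$ r(aX + b,\; cY + d) = \frac{ac}{|ac|}\, r(X, Y). $$

In particular, r(X, Y) = r(U, V) whenever U and V are obtained from X and Y by a change of origin and a positive change of scale, e.g. the shortcut variables U = X − A_x and V = Y − A_y.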
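The sketch below checks the origin-and-scale property numerically on the Example 1 heights. The transformation constants (a, b, c, d) are arbitrary, hypothetical choices, and `pearson_r` is a helper written only for this illustration (deviation form, algebraically equal to the 1/n formulas).

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Example 1 heights, reused purely for illustration.
xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(xs, ys)

results = []
for a, b, c, d in [(2, 5, 3, -1), (-2, 5, 3, -1), (0.5, 0, -4, 10)]:
    r_new = pearson_r([a * x + b for x in xs], [c * y + d for y in ys])
    sign = (a * c) / abs(a * c)          # ac / |ac| is +1 or -1
    results.append((r_new, sign * r))
    print(f"a={a}, c={c}: r(aX+b, cY+d) = {r_new:.4f}, (ac/|ac|) r(X,Y) = {sign * r:.4f}")

print(all(isclose(rn, expected, abs_tol=1e-9) for rn, expected in results))
```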



Cont...

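Two independent variables are uncorrelated, but the converse is not true.
For example, consider the following data:
X -3 -2 -1 1 2 3
Y 9 4 1 1 4 9
Here Y = X², so Y is completely determined by X; yet ΣX = 0 and ΣXY = ΣX³ = 0, so Cov(X, Y) = (1/n)ΣXY − X̄Ȳ = 0 and hence r(X, Y) = 0.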
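A quick numerical check of this example in plain Python (illustrative only):

```python
X = [-3, -2, -1, 1, 2, 3]
Y = [x * x for x in X]   # Y = X^2: a perfect, but non-linear, dependence
n = len(X)
# Cov(X, Y) = (1/n) * sum(XY) - mean(X) * mean(Y)
cov = sum(x * y for x, y in zip(X, Y)) / n - (sum(X) / n) * (sum(Y) / n)
print(Y, cov)
```

Since the covariance is zero, r(X, Y) is zero as well, even though Y is a function of X.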



Spearman’s Rank Correlation Coefficient

This method is used when the ranks of the observations are given (or when we assign ranks to the observations).

Let (X_i, Y_i), i = 1, 2, …, n, be n observations. Rank the X values and the Y values separately, and let d = R_x − R_y be the difference of ranks for each pair. Then the rank correlation coefficient is given by

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}. $$
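A minimal sketch of the formula, assuming the ranks are already assigned and there are no ties; `spearman_from_ranks` is a hypothetical helper name and the rank lists are made-up test cases.

```python
def spearman_from_ranks(rx, ry):
    """Spearman's rho from two rank lists (no ties): 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rx)
    if n < 2 or n != len(ry):
        raise ValueError("need two rank lists of equal length n >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # identical rankings
print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # reversed rankings
```

Identical rankings give ρ = 1 and completely reversed rankings give ρ = −1, the two extremes of the coefficient.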



Example 3
Find the rank correlation coefficient for the following data, where X is the price of tea and Y is the price of coffee:
X     Y     R_x   R_y   d = R_x − R_y   d²
75    120   ...   ...   ...             ...
88    134   ...   ...   ...             ...
95    150   ...   ...   ...             ...
70    115   ...   ...   ...             ...
60    110   ...   ...   ...             ...
80    140   ...   ...   ...             ...
81    142   ...   ...   ...             ...
50    100   ...   ...   ...             ...
                        Σd = ...        Σd² = ...
Ans: ρ ≈ 0.929
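The sketch below reproduces the answer. It assumes rank 1 goes to the largest value; ranking in ascending order instead changes the sign of every d and so leaves Σd² and ρ unchanged. The helper `ranks_desc` is an illustrative name and only handles data without ties.

```python
X = [75, 88, 95, 70, 60, 80, 81, 50]          # price of tea
Y = [120, 134, 150, 115, 110, 140, 142, 100]  # price of coffee

def ranks_desc(values):
    """Rank 1 = largest value (assumes no ties, as in this example)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

rx, ry = ranks_desc(X), ranks_desc(Y)
d = [a - b for a, b in zip(rx, ry)]
d2 = sum(di * di for di in d)
n = len(X)
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(rx, ry, d)
print(sum(d), d2, round(rho, 3))
```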



Rank Correlation Coefficient (Tied Case)

If an item is repeated, i.e. several observations receive the same rank in the order of merit, then each repeated item is assigned a common rank.

This common rank is the average of the ranks the items would have received if they were different from each other, and the next item gets the rank following those already used. If m is the number of times an item is repeated, the formula above becomes

$$ \rho = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}, $$

where the second sum runs over every group of tied values, in X and in Y.
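The averaging rule and the correction term can be sketched as follows; `average_ranks` and `tie_correction` are illustrative helper names, rank 1 is assumed to go to the largest value, and the input list is made up for the test.

```python
def average_ranks(values):
    """Rank 1 = largest; tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1          # first rank position occupied by v
        m = order.count(v)                  # size of the tie group containing v
        ranks.append(first + (m - 1) / 2)   # average of first, ..., first + m - 1
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over each distinct value; untied values (m = 1) add 0."""
    return sum(m * (m * m - 1) / 12 for m in (values.count(v) for v in set(values)))

sample = [75, 80, 75, 64, 64, 64]   # hypothetical data with a 2-way and a 3-way tie
print(average_ranks(sample))
print(tie_correction(sample))
```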



Example 4
Find the rank correlation coefficient for the following data:
X     Y     R_x   R_y   d = R_x − R_y   d²
68    62    ...   ...   ...             ...
64    58    ...   ...   ...             ...
75    68    2.5   3.5   ...             ...
50    45    ...   ...   ...             ...
64    81    ...   ...   ...             ...
80    60    ...   ...   ...             ...
75    68    2.5   3.5   ...             ...
40    48    ...   ...   ...             ...
55    50    ...   ...   ...             ...
64    70    ...   ...   ...             ...
                        Σd = ...        Σd² = ...



Cont...

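Rank assignment: the two tied X values 75 occupy ranks 2 and 3, so each gets R_x = (2 + 3)/2 = 2.5; the three tied X values 64 occupy ranks 5, 6 and 7, so each gets R_x = (5 + 6 + 7)/3 = 6; the two tied Y values 68 get R_y = (3 + 4)/2 = 3.5. Hence the total correction for X is 2(2² − 1)/12 + 3(3² − 1)/12 = 1/2 + 2 = 5/2, and for Y it is 2(2² − 1)/12 = 1/2. With Σd² = 72, the rank correlation coefficient is

$$ \rho = 1 - \frac{6\left[72 + \frac{5}{2} + \frac{1}{2}\right]}{10(10^2 - 1)} = 1 - \frac{450}{990} \approx 0.545. $$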
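A self-contained Python sketch that repeats the whole calculation for this example, assuming (as on the slide) rank 1 for the largest value and average ranks for tied values; the helper names are illustrative.

```python
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

def average_ranks(values):
    order = sorted(values, reverse=True)  # rank 1 = largest value
    # first occupied rank + (m - 1)/2 is the average rank of a tie group of size m
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def tie_correction(values):
    # m(m^2 - 1)/12 for each distinct value; untied values (m = 1) add 0
    return sum(m * (m * m - 1) / 12 for m in (values.count(v) for v in set(values)))

rx, ry = average_ranks(X), average_ranks(Y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
cx, cy = tie_correction(X), tie_correction(Y)
n = len(X)
rho = 1 - 6 * (d2 + cx + cy) / (n * (n * n - 1))
print(d2, cx, cy, round(rho, 3))
```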





Regression Analysis
Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

If the variables are related, the points in the scatter plot will cluster around some curve, called the “curve of regression”.

If this curve is a straight line, the regression is called linear regression; otherwise it is called curvilinear or non-linear regression.

The regression line is the “best-fit line” for the given data points and is obtained by using the “principle of least squares”.



Regression Line

Fitting a straight line (linear regression):

We want to fit a line to the given n data points (X_i, Y_i) by using the principle of least squares. Let the best-fit line be

y = a + bx.

Using the principle of least squares, a and b are chosen to minimise Σ(Y_i − a − bX_i)².
(REST OF THE DERIVATION IS DONE IN CLASS ON BOARD)
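Since the derivation is left to the board, here is a hedged sketch of where it ends up: the standard least-squares normal equations for y = a + bx, namely Σy = na + bΣx and Σxy = aΣx + bΣx², solved in closed form. The helper name `fit_line` is hypothetical, and the Example 1 heights are reused only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b x from the normal equations
         sum(y)  = n a + b sum(x)
         sum(xy) = a sum(x) + b sum(x^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sxy - sx * sy) / det   # slope
    a = (sy - b * sx) / n           # intercept: the line passes through (mean x, mean y)
    return a, b

a, b = fit_line([65, 66, 67, 67, 68, 69, 70, 72], [67, 68, 65, 68, 72, 72, 69, 71])
print(round(a, 3), round(b, 3))
```

For the Example 1 heights this gives b = 24/36 = 2/3 and a = 69 − (2/3)(68) = 71/3 ≈ 23.667.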



Non-linear Regression

Fitting non-linear curves:

PROPERTIES OF REGRESSION COEFFICIENTS
(THESE TOPICS ARE DONE IN CLASS ON BOARD)



Review Problems
The coefficient of correlation between X and Y is 0.6, their covariance is 4.8, and the variance of X is 9. Find the S.D. of Y.
Derive the normal equations for fitting a curve of the form y = ab^x to the given n data points.
Derive the normal equations for fitting a line of the form y = a + bx to the given n data points.
Derive the normal equations for fitting a non-linear curve of the form y = a + bx + cx² to the given n data points.
Define the regression coefficients and write the lines of regression. Also show that the correlation coefficient is the geometric mean of the regression coefficients.



Review Problems

For the two given regression lines x + 2y − 5 = 0 and 2x + 3y − 8 = 0, find the mean values of X and Y.
Write the lines of regression. Also show that the regression coefficients are independent of change of origin.
Write the formulae for the regression coefficients. Also show that the regression coefficients are not independent of change of scale.

Define the coefficient of correlation and show that it is independent of change of scale and origin.



THANK YOU

