
Rajib Dolai

Correlation and Regression https://rajib1.weebly.com/

 Correlation is concerned with the measurement of the ‘strength of association’ between variables.
 Regression is concerned with the ‘prediction’ of the most likely value of one variable when the
value of the other variable is known.

 When statistical data relate to simultaneous measurements on two variables, each pair of
observations can be represented geometrically as a point; such a representation is known as a
Scatter Diagram (a minimal plotting sketch is given below).
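
The following is a minimal sketch of drawing such a scatter diagram (in Python, assuming the matplotlib library is available; the height–weight pairs are made-up values for illustration only, not data from these notes):

import matplotlib.pyplot as plt

# Made-up paired observations (x = height in cm, y = weight in kg)
heights = [150, 155, 160, 165, 170, 175, 180]
weights = [52, 55, 60, 63, 68, 72, 77]

plt.scatter(heights, weights)      # one point per (x, y) pair
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Scatter diagram")
plt.show()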

 The word ‘correlation’ is used to denote the degree of association between variables.


 If y tends to increase as x increases the variables are said to be positively correlated.
 If y tends to decrease as x increases the variables are negatively correlated.
 If the values of y are not affected by changes in the values of x , the variables are said to be
uncorrelated.

Cov(x, y) = (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ)
          = (1/n) Σ xᵢyᵢ − x̄ ȳ
 Variance must be always positive, covariance may be positive, negative or zero.
 If x and y are two independent variables, then their covariance is zero, i.e. Cov(x, y) = 0 (a quick numerical check is sketched below).
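
As a quick numerical check (a Python/NumPy sketch with made-up values; not part of the original notes), the two forms of the covariance formula give the same answer, and the covariance of two independently generated variables comes out close to zero:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

cov_xy  = np.mean((x - x.mean()) * (y - y.mean()))   # (1/n) Σ (x − x̄)(y − ȳ)
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # (1/n) Σ xy − x̄ ȳ
print(cov_xy, cov_alt)                               # the two forms agree (up to rounding)

# Independent variables: their covariance is approximately zero in a large sample
rng = np.random.default_rng(0)
a = rng.normal(size=10_000)
b = rng.normal(size=10_000)
print(np.mean(a * b) - a.mean() * b.mean())          # close to 0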

ASSUMPTIONS:
1. X and Y have a linear relationship.
2. Both variables should be normally distributed.
3. Homoscedasticity of the variables.

r = Cov(x, y) / (σ_x σ_y)
 r is independent of the choice of both origin and scale of observation.
 Correlation co-efficient between x and y = Correlation co-efficient between u and v.
where u = (x − a)/c and v = (y − a′)/c′, with c, c′ > 0.
 r is a pure number and is unit free.
 r lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
When r = +1, there is perfect positive correlation between the variables; when r = −1, there is
perfect negative correlation between the variables.
 r is a measure of degree of association between two variables.
 The correlation coefficient was introduced by Karl Pearson.
 If two variables are independent, their correlation coefficient is zero; but the converse is not true
(a short computational sketch of the correlation coefficient is given below).
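
A short computational sketch (Python/NumPy, illustrative values only) of r = Cov(x, y)/(σ_x σ_y), showing also that r is unchanged by a change of origin and scale:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

def pearson_r(x, y):
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())     # Cov(x, y) / (σ_x σ_y)

# Change of origin and scale: u = (x − a)/c, v = (y − a′)/c′ with c, c′ > 0
u = (x - 10.0) / 2.0
v = (y - 3.0) / 5.0

print(pearson_r(x, y))    # r for the original variables ...
print(pearson_r(u, v))    # ... is the same for the transformed variables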

 Total variation = Unexplained variation + Explained variation


 1 = UV/TV + EV/TV
Now, r² = Explained variation (EV) / Total variation (TV)
 When r² = 1, TV = EV and UV = 0
 When r² = 0, EV = 0
 The sign of r only indicates whether x and y move in the same direction or in opposite directions, but r² is
always positive.
 Coefficient of Non-Determination: K² = (Total variation − Explained variation) / Total variation
= 1 − EV/TV
= 1 − r²
 Coefficient of Alienation: K = ±√(1 − r²). (A numerical check of this variance decomposition is sketched below.)
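
A numerical sketch (Python/NumPy, illustrative values) of the decomposition: for the least-squares line of y on x, the ratio of explained to total variation equals r², and the leftover share is the coefficient of non-determination K² = 1 − r²:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

b_yx  = np.mean((x - x.mean()) * (y - y.mean())) / x.var()   # regression coefficient of y on x
y_hat = y.mean() + b_yx * (x - x.mean())                     # fitted values on the line

TV = np.sum((y - y.mean()) ** 2)       # total variation
EV = np.sum((y_hat - y.mean()) ** 2)   # explained variation
UV = np.sum((y - y_hat) ** 2)          # unexplained variation

r = np.corrcoef(x, y)[0, 1]
print(EV / TV, r ** 2)                 # r² = EV / TV
print(UV / TV, 1 - r ** 2)             # K² = 1 − r²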
 The correlation coefficient is a symmetric function of x and y, i.e. r_xy = r_yx. But the regression
coefficients are not symmetric functions of x and y, i.e. b_yx ≠ b_xy.

b_yx = Cov(x, y) / σ_x² = r σ_y / σ_x, and similarly b_xy = Cov(x, y) / σ_y² = r σ_x / σ_y.

1. The regression line of y on x is y − ȳ = b_yx (x − x̄), and the regression line of x on y is
x − x̄ = b_xy (y − ȳ),
where b_yx and b_xy are respectively the regression coefficients of y on x and of x on y.
2. The product of the two regression coefficients is equal to the square of the correlation coefficient:
b_yx · b_xy = r²
3. r, b_yx and b_xy all have the same sign. If the correlation coefficient r is zero, the regression coefficients
b_yx and b_xy are also zero.
4. The regression lines always intersect at the point (x̄, ȳ). The slopes of the regression line of y on x and
the regression line of x on y are respectively b_yx and 1/b_xy.
5. The angle between the two regression lines depends on the correlation coefficient r. When r = 0, the
two lines are perpendicular to each other; when r = +1 or r = −1, they coincide. As r increases
numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
6. The two regression equations are usually different. However, when r = ±1, they become identical;
in this case there is an exact linear relationship between the variables. When r = 0, the regression
equations reduce to y = ȳ and x = x̄, and neither y nor x can be estimated from a linear regression
equation.
7. If the variables are uncorrelated i.e. r = 0 then the lines are perpendicular.
8. If one of the regression coefficients is greater than one (in magnitude), the other must be less than one.
9. The A.M. of the regression coefficients, (b_yx + b_xy)/2, is greater than or equal to the correlation coefficient r (for positive r).
10. Regression coefficients are independent of change of origin but not of scale. (A numerical check of several of these properties is sketched below.)
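
A numerical sketch (Python/NumPy, illustrative values) checking a few of these properties: b_yx · b_xy = r², the A.M. of the regression coefficients is at least r (r > 0 here), and the two regression lines intersect at (x̄, ȳ):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

cov  = np.mean((x - x.mean()) * (y - y.mean()))
b_yx = cov / x.var()                  # regression coefficient of y on x
b_xy = cov / y.var()                  # regression coefficient of x on y
r    = np.corrcoef(x, y)[0, 1]

print(b_yx * b_xy, r ** 2)            # property 2: product of the coefficients = r²
print((b_yx + b_xy) / 2 >= r)         # property 9: A.M. of the coefficients ≥ r (r > 0 here)

# Property 4: solve the two regression lines as a 2×2 linear system; the intersection is (x̄, ȳ)
A = np.array([[-b_yx, 1.0], [1.0, -b_xy]])
c = np.array([y.mean() - b_yx * x.mean(), x.mean() - b_xy * y.mean()])
print(np.linalg.solve(A, c), (x.mean(), y.mean()))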
 Correlation need not imply cause and effect relationship between the variables. But regression
analysis clearly indicates the cause and effect relationship between variables.
Example 1:
 Let the two regression lines be given as: 3x = 10 + 5y and 4y = 5 + 15x. Then the correlation
coefficient between x and y is ……

Suppose the first line is the regression of x on y and the second is the regression of y on x:
x = 10/3 + (5/3) y ……… (1)
y = 5/4 + (15/4) x ……… (2)
Then r² = b_xy × b_yx = (5/3) × (15/4) = 25/4, so r = 5/2 = 2.5 > 1 [this is impossible].
So the 1st and 2nd equations must instead be read as:
y = −10/5 + (3/5) x and x = −5/15 + (4/15) y
r² = b_yx × b_xy = (3/5) × (4/15) = 4/25, so r = 2/5 = 0.4 < 1
Since both regression coefficients are positive, the answer is r = 0.4.
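
The same reasoning can be checked mechanically. The sketch below (Python, using exact fractions; written only for this example) tries both readings of the two given lines and keeps the one whose product of regression coefficients does not exceed 1:

from fractions import Fraction

# Reading 1: 3x = 10 + 5y as x on y, 4y = 5 + 15x as y on x
b_xy, b_yx = Fraction(5, 3), Fraction(15, 4)
print(b_xy * b_yx)                     # 25/4 > 1, so this reading is impossible

# Reading 2: 3x = 10 + 5y as y on x, 4y = 5 + 15x as x on y
b_yx, b_xy = Fraction(3, 5), Fraction(4, 15)
r_squared = b_yx * b_xy
print(r_squared)                       # 4/25 ≤ 1, so this reading is valid
print(float(r_squared) ** 0.5)         # r = 0.4 (positive, since both coefficients are positive)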

Example 2:
 In a two-variable regression, Y is the dependent variable and X is the independent variable. The correlation
coefficient between Y and X is 0.6. What proportion of the variation in Y is explained by X?

Y = a + bX, where Y = dependent variable and X = independent variable.
Here r = 0.6, so r² = 0.36.

So 36% of the variation in Y is explained by X.
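
A one-line computational restatement of the same arithmetic (Python), also showing the unexplained share K² = 1 − r²:

r = 0.6
r_squared = r ** 2            # 0.36 → 36% of the variation in Y is explained by X
k_squared = 1 - r_squared     # 0.64 → 64% remains unexplained
print(f"{r_squared:.0%} explained, {k_squared:.0%} unexplained")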