
CORRELATION AND REGRESSION

Covariance
Covariance is a measure of how much two random variables change together:
$$\operatorname{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$$
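The definition can be made concrete with a short Python sketch. It assumes the population form (dividing by n), which matches the expectation E[(X − E(X))(Y − E(Y))]; the sample covariance divides by n − 1 instead. The function name `covariance` and the sample values are illustrative only.

```python
def covariance(xs, ys):
    """Population covariance: the mean of (x - mean_x) * (y - mean_y)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data: in the first pair y rises with x, in the second it falls.
print(covariance([1, 2, 3], [2, 4, 6]))   # positive
print(covariance([1, 2, 3], [6, 4, 2]))   # negative
```

The sign of the covariance shows the direction in which the variables move together; its size depends on the units of X and Y, which is why the correlation coefficient normalises it.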
Correlation

• If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
• Correlation is a statistical measure that
indicates the extent to which two or more
variables fluctuate together.
Correlation coefficient
It is a measure of the linear correlation
(dependence) between two variables X and Y,
giving a value between −1 and +1 inclusive,
where +1 indicates perfect positive linear correlation, 0 no linear
correlation, and −1 perfect negative linear correlation.
It is widely used in the sciences as a measure
of the degree of linear dependence between
two variables.
COVARIANCE PROBLEMS
CORRELATION
• Let X and Y measure some characteristics of a
particular system. To study the system, it
is necessary to measure the interdependence
of X and Y. If X and Y vary in such a way that a
change in one variable affects the other
variable, then X and Y are correlated.
• E.g. the price of a commodity and the demand for it.
Types of correlation
• i) Positive and negative
• ii) Simple, partial and multiple
• iii) Linear and non-linear
• Positive/Negative correlation:
• If an increase in one variable is accompanied by a
proportionate increase in the other variable,
then the variables are positively correlated.
• If an increase in one variable is accompanied by a
proportionate decrease in the other variable,
then the variables are negatively correlated.
Coefficient of correlation:
It is a measure of the linear relationship between
the two variables and is denoted by r.
Methods of studying correlation:
1) Scatter diagram method
2) Karl Pearson’s coefficient of correlation
Karl Pearson’s coefficient of correlation
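• The coefficient of correlation between X and Y
is defined as
$$r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}$$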
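As a sketch of the definition (not a prescribed implementation), r can be computed directly as Cov(X, Y) / (σX σY). The population form is assumed here; using n − 1 instead would give the same r, because the factor cancels between numerator and denominator. The name `pearson_r` is hypothetical; the data are the sales and advertising figures from Problem 1 below.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y), population form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# Sales (X) and advertising expenditure (Y) from Problem 1
sales = [15, 18, 25, 27, 30, 35]
advertising = [50, 65, 82, 95, 110, 120]
print(round(pearson_r(sales, advertising), 2))   # 0.99
```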
PROPERTIES OF CORRELATION
COEFFICIENT
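1) The coefficient of correlation lies between −1 and +1.
2) When r = 1, there is a perfect positive correlation;
r = −1, there is a perfect negative correlation;
r = 0, there is no linear correlation (the variables are uncorrelated, which does not by itself mean they are independent).
3) r is independent of change of origin and scale of X
and Y, provided the scale factors have the same sign (otherwise only the sign of r changes).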
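The caveat in property 2 (r = 0 does not imply independence) can be checked numerically. In this hypothetical sketch Y = X² is completely determined by X, yet r is zero because the relationship is not linear. The helper name `pearson_r` and the data are assumptions of the sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]        # symmetric about 0
ys = [x ** 2 for x in xs]     # Y depends exactly on X
print(pearson_r(xs, ys))      # 0.0
```

The positive and negative deviations of X cancel against the symmetric values of Y, so the covariance, and hence r, vanishes even though the variables are strongly dependent.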
NOTE

• For a frequency distribution
• Change of origin and scale does not affect the
coefficient of correlation, i.e. $r_{XY} = r_{UV}$, where $U = (X - a)/h$ and $V = (Y - b)/k$ with $h, k > 0$.
PROBLEMS
• 1. Find the correlation coefficient for the
following data
Sales (X)   Advertising Expenditure (Y)   X²     Y²      XY
15          50                            225    2500    750
18          65                            324    4225    1170
25          82                            625    6724    2050
27          95                            729    9025    2565
30          110                           900    12100   3300
35          120                           1225   14400   4200
Σ = 150     522                           4028   48974   14035

r ≈ 0.99
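Substituting the column totals into the computational form of Karl Pearson's formula (with n = 6) gives the value of r stated for this problem:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} = \frac{6(14035) - 150(522)}{\sqrt{\left(6(4028) - 150^2\right)\left(6(48974) - 522^2\right)}} = \frac{5910}{\sqrt{1668 \times 21360}} \approx 0.99$$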
PROBLEM 2
• While calculating r_xy from 25 pairs of observations, a computer
obtained the following constants:
n = 25
Σx = 125, Σx² = 650, Σy = 100, Σy² = 460, Σxy = 508
• A recheck showed that two pairs of values, (6, 14) and (8, 6), were
wrong, while the correct values were (8, 12) and (6, 8). Obtain
the correct correlation coefficient.
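A sketch of the correction step: subtract each wrong pair's contribution from every sum, add the correct pair's contribution, then apply the computational form of Pearson's formula. The function name `r_from_sums` is illustrative. Note that Σx and Σx² do not change, because the wrong and correct x values are the same two numbers swapped.

```python
import math


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson's r from raw sums: (n*Sxy - Sx*Sy) / sqrt(...)."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


# Sums reported by the computer (Problem 2)
n, sx, sy, sxx, syy, sxy = 25, 125, 100, 650, 460, 508

wrong = [(6, 14), (8, 6)]
right = [(8, 12), (6, 8)]

# Remove the wrong pairs' contributions and add the correct ones
for sign, pairs in ((-1, wrong), (+1, right)):
    for x, y in pairs:
        sx += sign * x
        sy += sign * y
        sxx += sign * x * x
        syy += sign * y * y
        sxy += sign * x * y

print(sx, sy, sxx, syy, sxy)                   # 125 100 650 436 520
print(r_from_sums(n, sx, sy, sxx, syy, sxy))   # 2/3, about 0.667
```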
PROBLEM 3
• For the following bivariate distribution,
calculate the value of the correlation coefficient.
Y \ X   0       1       2       3
1       5/48    7/48    0       0
2       9/48    5/48    5/48    0
3       1/12    1/12    1/12    5/48
• E(X) = 49/48, E(Y) = 101/48, E(X²) = 97/48
• E(Y²) = 241/48, E(XY) = 118/48
• Cov(X, Y) = 0.3103
• r = 0.4072
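The moments can be checked with exact fractions. This sketch sums over every cell of the joint table. It reproduces Cov(X, Y) ≈ 0.3103 and r ≈ 0.4072 and confirms that these values require E(Y) = 101/48 and E(XY) = 118/48. The helper name `expect` is hypothetical.

```python
from fractions import Fraction as F
import math

# Joint probabilities p(x, y): x = 0..3 across each row, y = 1..3 as keys
pmf = {
    1: [F(5, 48), F(7, 48), F(0), F(0)],
    2: [F(9, 48), F(5, 48), F(5, 48), F(0)],
    3: [F(1, 12), F(1, 12), F(1, 12), F(5, 48)],
}

assert sum(sum(row) for row in pmf.values()) == 1  # valid distribution


def expect(g):
    """E[g(X, Y)] summed over every cell of the table."""
    return sum(p * g(x, y) for y, row in pmf.items() for x, p in enumerate(row))


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
ex2, ey2 = expect(lambda x, y: x * x), expect(lambda x, y: y * y)
exy = expect(lambda x, y: x * y)

cov = exy - ex * ey
r = float(cov) / math.sqrt(float(ex2 - ex ** 2) * float(ey2 - ey ** 2))
print(ex, ey, ex2, ey2, exy)   # Fractions print in lowest terms
print(float(cov), r)
```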
PROBLEM 4
• The joint pdf of the two-dimensional random variable (X, Y) is

• Find the correlation coefficient between X and Y


PROBLEM 5
• Two random variables X and Y are defined
with Y=4X+9. Find the correlation coefficient
between X and Y.
• Solution: r = 1
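The result follows directly from the definition of r, because Y is an exact linear function of X with a positive slope (assuming Var(X) > 0):

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(X, 4X + 9) = 4\operatorname{Var}(X), \qquad \sigma_Y = 4\sigma_X$$
$$r = \frac{4\operatorname{Var}(X)}{\sigma_X \cdot 4\sigma_X} = 1$$

With a negative slope the same steps would give r = −1.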
PROBLEM 6
• Find Karl Pearson’s correlation coefficient for
the following data.
• X: 78 89 97 69 59 79 61 61
• Y: 175 137 156 112 107 136 123 108

• (Use U=X-75,V=Y-125)
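A sketch of Problem 6 using the suggested shifts U = X − 75 and V = Y − 125, with NumPy assumed to be available. Computing r on U and V, and cross-checking it against `numpy.corrcoef` on the raw X and Y, also illustrates that a change of origin leaves r unchanged.

```python
import numpy as np

X = np.array([78, 89, 97, 69, 59, 79, 61, 61], dtype=float)
Y = np.array([175, 137, 156, 112, 107, 136, 123, 108], dtype=float)

# Change of origin suggested in the problem
U = X - 75
V = Y - 125

n = len(U)
num = n * (U * V).sum() - U.sum() * V.sum()
den = np.sqrt((n * (U ** 2).sum() - U.sum() ** 2) * (n * (V ** 2).sum() - V.sum() ** 2))
r_uv = num / den

# Cross-check: r computed directly on X and Y should agree
r_xy = np.corrcoef(X, Y)[0, 1]
print(round(r_uv, 4), round(r_xy, 4))
```

The shifted values are small integers (ΣU = −7, ΣV = 54), which is why the change of origin makes hand calculation easier.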
Regression
A linear regression consists of finding the best-fitting
(least-squares) straight line through the points.
Lines of regression
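• Regression line of y on x
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
• Regression line of x on y
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$
Where
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$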
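Coefficient of determination
$$b_{xy}\, b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \cdot r\,\frac{\sigma_y}{\sigma_x} = r^2$$
Correlation coefficient
$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$
where r takes the common sign of b_yx and b_xy.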
Regression, Correlation
From the following table, find
(a) the two regression lines;
(b) the correlation coefficient between the
marks in economics and statistics;
(c) the most likely marks in statistics when the marks in
economics are 30.
Marks in Economics (x)    25  28  35  32  31  36  29  38  34  32
Marks in Statistics (y)   43  46  49  41  36  32  31  30  33  39
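A sketch of all three parts, assuming NumPy. The regression coefficients come from deviation sums about the means, r is recovered as ±√(b_yx b_xy) with the common sign of the two coefficients, and part (c) evaluates the line of y on x at x = 30.

```python
import numpy as np

x = np.array([25, 28, 35, 32, 31, 36, 29, 38, 34, 32], dtype=float)  # economics
y = np.array([43, 46, 49, 41, 36, 32, 31, 30, 33, 39], dtype=float)  # statistics

dx, dy = x - x.mean(), y - y.mean()
sxy = (dx * dy).sum()

b_yx = sxy / (dx ** 2).sum()   # slope of the regression line of y on x
b_xy = sxy / (dy ** 2).sum()   # slope of the regression line of x on y

# r takes the common sign of the two regression coefficients
r = np.sign(b_yx) * np.sqrt(b_yx * b_xy)

print(f"y - {y.mean():.0f} = {b_yx:.4f} (x - {x.mean():.0f})")
print(f"x - {x.mean():.0f} = {b_xy:.4f} (y - {y.mean():.0f})")
print(f"r = {r:.4f}")

# (c) most likely statistics mark when the economics mark is 30
y_at_30 = y.mean() + b_yx * (30 - x.mean())
print(f"estimated y at x = 30: {y_at_30:.2f}")
```

The line of y on x is used for part (c) because it minimises errors in y, the variable being predicted.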
Example
The two lines of regression are 8x-10y+66=0 and
40x-18y-214=0. Find
1. The mean values of x and y
2. Correlation coefficient of x and y.
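Both regression lines pass through (x̄, ȳ), so the means are their intersection. For r, this sketch tries both ways of assigning the lines (y on x versus x on y) and keeps the one where b_yx · b_xy lies between 0 and 1, since that product equals r². The helper names are hypothetical.

```python
from fractions import Fraction as F
import math

# Lines written as a*x + b*y + c = 0
line1 = (F(8), F(-10), F(66))
line2 = (F(40), F(-18), F(-214))


def intersect(l1, l2):
    """Solve a1*x + b1*y = -c1, a2*x + b2*y = -c2 by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    x = (-c1 * b2 + c2 * b1) / det
    y = (-a1 * c2 + a2 * c1) / det
    return x, y


def slope_y_on_x(line):
    a, b, _ = line
    return -a / b          # from b*y = -a*x - c


def slope_x_on_y(line):
    a, b, _ = line
    return -b / a          # from a*x = -b*y - c


x_bar, y_bar = intersect(line1, line2)

r = None
# Try both assignments; only a valid one gives 0 <= b_yx * b_xy <= 1
for ly, lx in ((line1, line2), (line2, line1)):
    b_yx, b_xy = slope_y_on_x(ly), slope_x_on_y(lx)
    if 0 <= b_yx * b_xy <= 1:
        r = math.copysign(math.sqrt(float(b_yx * b_xy)), float(b_yx))
        break

print(x_bar, y_bar, r)
```

Here the first line is the regression of y on x (b_yx = 4/5) and the second is x on y (b_xy = 9/20); the other assignment gives a product of 100/36, which exceeds 1 and is rejected.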
PARTIAL-SIMPLE CORRELATION
• Partial correlation analysis involves studying the linear
relationship between two variables after excluding the
effect of one or more independent factors.
• Simple correlation can be misleading in such
circumstances. To get a correct picture of the
relationship between two variables, we should first
eliminate the influence of the other variables.
• For example, study of partial correlation between price
and demand would involve studying the relationship
between price and demand excluding the effect of
money supply, exports, etc.
Partial correlation
r12.3: the coefficient of partial correlation
between X and Y keeping Z constant.
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}$$
$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}$$
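All three formulas have the same shape, so one function covers them once the arguments are permuted. This is a sketch; the name `partial_r` and the correlation values in the usage lines are hypothetical.

```python
import math


def partial_r(r12, r13, r23):
    """r_{12.3}: correlation of X1 and X2 with X3 held constant."""
    den = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if den == 0:
        raise ValueError("undefined when X3 is perfectly correlated with X1 or X2")
    return (r12 - r13 * r23) / den


# Hypothetical simple correlations, for illustration only
r12, r13, r23 = 0.8, 0.5, 0.5
print(partial_r(r12, r13, r23))   # r_{12.3}
print(partial_r(r13, r12, r23))   # r_{13.2}
print(partial_r(r23, r12, r13))   # r_{23.1}
```

The first argument is always the simple correlation of the pair of interest, and the other two are the correlations of that pair with the variable held constant.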
• Multiple Correlation
• Another technique used to overcome the
drawbacks of simple correlation is multiple
regression analysis.
• Here, we study the effects of all the independent
variables simultaneously on a dependent
variable. For example, the correlation co-efficient
between the yield of paddy (X1) and the other
variables, viz. type of seedlings (X2), manure (X3),
rainfall (X4), humidity (X5) is the multiple
correlation co-efficient R1.2345. This co-efficient
takes values between 0 and +1.
Introduction
• Multiple Regression and Correlation allow
us to:
1. Disentangle and examine the separate effects
of the independent variables.
2. Use all of the independent variables to
predict Y.
3. Assess the combined effects of the
independent variables on Y.
Multiple correlation
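R1.23: the coefficient of multiple correlation
with X as the dependent variable and Y, Z as the
independent variables.
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$$
$$R_{2.13} = \sqrt{\frac{r_{21}^2 + r_{23}^2 - 2 r_{21} r_{23} r_{13}}{1 - r_{13}^2}}$$
$$R_{3.12} = \sqrt{\frac{r_{31}^2 + r_{32}^2 - 2 r_{31} r_{32} r_{12}}{1 - r_{12}^2}}$$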
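As with partial correlation, one function covers all three formulas by permuting its arguments. This is a sketch; `multiple_R` and the usage values are hypothetical.

```python
import math


def multiple_R(r12, r13, r23):
    """R_{1.23}: multiple correlation of X1 on X2 and X3."""
    if abs(r23) >= 1:
        raise ValueError("undefined when X2 and X3 are perfectly correlated")
    r_sq = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_sq)


# Hypothetical simple correlations, for illustration only
r12, r13, r23 = 0.8, 0.5, 0.5
print(multiple_R(r12, r13, r23))   # R_{1.23}
print(multiple_R(r12, r23, r13))   # R_{2.13}  (r21 = r12)
print(multiple_R(r13, r23, r12))   # R_{3.12}  (r31 = r13, r32 = r23)
```

For a valid correlation matrix the result lies between 0 and 1 and is never smaller than the absolute value of either simple correlation with the dependent variable.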
Multiple Regression
$$Y = b_0 + b_1 X_1 + b_2 X_2$$
To find b0, b1 and b2, solve the normal equations
$$\sum Y = n b_0 + b_1 \sum X_1 + b_2 \sum X_2$$
$$\sum X_1 Y = b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$$
$$\sum X_2 Y = b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$$
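The three normal equations form a 3×3 linear system in b0, b1, b2. This sketch builds that system from the sums and solves it with `numpy.linalg.solve`. The function name `normal_equation_fit` is hypothetical, and the check data are made up so that they lie exactly on Y = 1 + 2X1 + 3X2.

```python
import numpy as np


def normal_equation_fit(x1, x2, y):
    """Solve the three normal equations for Y = b0 + b1*X1 + b2*X2."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    n = len(y)
    # Coefficient matrix built from the sums in the normal equations
    M = np.array([
        [n,         x1.sum(),        x2.sum()],
        [x1.sum(),  (x1 ** 2).sum(), (x1 * x2).sum()],
        [x2.sum(),  (x1 * x2).sum(), (x2 ** 2).sum()],
    ])
    rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    return np.linalg.solve(M, rhs)   # array([b0, b1, b2])


# Hypothetical check data lying exactly on Y = 1 + 2*X1 + 3*X2
print(normal_equation_fit([1, 2, 3, 4], [2, 1, 4, 3], [9, 8, 19, 18]))
```

Solving the normal equations directly is fine for small textbook problems; for larger or nearly collinear data, a least-squares routine such as `numpy.linalg.lstsq` is numerically safer.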
Example
The table shows the corresponding values of three variables
X,Y and Z.
X 3 5 6 8 12 14
Y 16 10 7 4 3 2
Z 90 72 54 42 30 12

a) Find the regression equation of Z on X and Y.
b) Estimate Z when X = 10 and Y = 6.
c) Find the simple correlation coefficient between X and Y and interpret it.
d) Find the coefficient of determination between X and Y.
e) Find the partial correlation coefficient between X and Y (keeping Z constant).
f) Find the multiple correlation coefficient of Z on X and Y.
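A sketch of parts (a) to (f), assuming NumPy. The regression is fitted by least squares with `numpy.linalg.lstsq`, which gives the same coefficients as the normal equations. The correlations come from `numpy.corrcoef`, and the partial and multiple coefficients use the formulas given earlier, with Z playing the role of the third variable.

```python
import numpy as np

X = np.array([3, 5, 6, 8, 12, 14], dtype=float)
Y = np.array([16, 10, 7, 4, 3, 2], dtype=float)
Z = np.array([90, 72, 54, 42, 30, 12], dtype=float)

# (a) least-squares fit of Z = b0 + b1*X + b2*Y
A = np.column_stack([np.ones_like(X), X, Y])
(b0, b1, b2), *_ = np.linalg.lstsq(A, Z, rcond=None)

# (b) estimate of Z at X = 10, Y = 6
z_hat = b0 + b1 * 10 + b2 * 6

# (c), (d) simple correlation between X and Y and its square
R = np.corrcoef([X, Y, Z])
r_xy, r_xz, r_yz = R[0, 1], R[0, 2], R[1, 2]
r2_xy = r_xy ** 2

# (e) partial correlation of X and Y, keeping Z constant
partial_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# (f) multiple correlation of Z on X and Y
R_z_xy = np.sqrt((r_xz ** 2 + r_yz ** 2 - 2 * r_xz * r_yz * r_xy) / (1 - r_xy ** 2))

print(f"Z = {b0:.4f} + ({b1:.4f})X + ({b2:.4f})Y")
print(f"Z at X=10, Y=6: {z_hat:.4f}")
print(f"r_XY = {r_xy:.4f}, r^2 = {r2_xy:.4f}")
print(f"partial r_XY.Z = {partial_xy_z:.4f}, R_Z.XY = {R_z_xy:.4f}")
```

In this data r_XY is strongly negative, while the partial correlation keeping Z constant is positive. This shows how controlling for a third variable can change the apparent relationship.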
Regression, Correlation
The following table shows the weights z to the nearest
pound, heights x to the nearest inch, and ages y to
the nearest year of 12 boys.
(a) Find the least squares regression equation of z on x
and y.
(b) Determine the estimated values of z from the given
values of x and y.
(c) Estimate the weight of a boy who is 9 years old and
54 inches tall.
Weight (z)   64  71  53  67  55  58  77  57  56  51  76  68
Height (x)   57  59  49  62  51  50  55  48  52  42  61  57
Age (y)       8  10   6  11   8   7  10   9  10   6  12   9
Example
Given that r(1,2) = 0.6, r(1,3) = 0.7 and r(2,3) = 0.65:
1. Determine R(1.23) and r(23.1).
2. If R(1.23) = 0, does it follow that R(2.13) = 0?
3. If R(1.23) = 1, does it follow that R(2.13) = 1?
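For part 1, substituting the given values into the formulas for the multiple correlation R1.23 and the partial correlation r23.1 gives:

$$R_{1.23} = \sqrt{\frac{0.6^2 + 0.7^2 - 2(0.6)(0.7)(0.65)}{1 - 0.65^2}} = \sqrt{\frac{0.304}{0.5775}} \approx 0.7255$$
$$r_{23.1} = \frac{0.65 - (0.6)(0.7)}{\sqrt{(1 - 0.6^2)(1 - 0.7^2)}} = \frac{0.23}{\sqrt{0.3264}} \approx 0.4026$$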
