Professional Documents
Culture Documents
Chapter
30
Correlation and Regression
n
1
Correlation or Cov (x , y )
n (x y i i x y) , where
Univariate and Bivariate distribution i 1
n n
1 1
“If it is proved true that in a large number of instances two
variables tend always to fluctuate in the same or in opposite directions,
x
n x
i 1
i and y
n y
i 1
i are means of variables x and y
we consider that the fact is established and that a relationship exists.
This relationship is called correlation.” respectively.
(1) Univariate distribution : These are the distributions in which Covariance is not affected by the change of origin, but it is
there is only one variable such as the heights of the students of a class. affected by the change of scale.
(2) Bivariate distribution : Distribution involving two discrete Correlation
variable is called a bivariate distribution. For example, the heights and The relationship between two variables such that a change in one
the weights of the students of a class in a school. variable results in a positive or negative change in the other variable is
(3) Bivariate frequency distribution : Let x and y be two known as correlation.
variables. Suppose x takes the values x 1 , x 2 ,....., x n and y takes (1) Types of correlation : (i) Perfect correlation : If the two
variables vary in such a manner that their ratio is always constant, then
the values y 1 , y 2 ,....., y n , then we record our observations in the correlation is said to be perfect.
the form of ordered pairs ( x 1 , y1 ) , where 1 i n,1 j n . (ii) Positive or direct correlation : If an increase or decrease in
one variable corresponds to an increase or decrease in the other, the
If a certain pair occurs fij times, we say that its frequency is fij . correlation is said to be positive.
The function which assigns the frequencies fij ’s to the pairs (iii) Negative or indirect correlation : If an increase or decrease
in one variable corresponds to a decrease or increase in the other, the
( x i , y j ) is known as a bivariate frequency distribution. correlation is said to be negative.
(2) Karl Pearson's coefficient of correlation : The correlation
Covariance coefficient r( x , y ) , between two variable x and y is given by,
Let ( x i , y i ); i 1, 2,....., n be a bivariate distribution, Cov ( x , y ) Cov ( x , y )
r( x , y ) or ,
where x 1 , x 2 ,....., x n are the values of variable x and Var ( x ) Var (y ) x y
y 1 , y 2 ,....., y n those of y. Then the covariance Cov (x, y)
between x and y is given by n n n
1
n n
x iyi
xi
yi
Cov (x , y )
n (x
i 1
i x )(y i y )
r( x , y ) i 1 i 1 i 1
2 2
n n n n
n x
i 1
i
2
xi
i 1
n y
i 1
i
2
yi
i 1
2 Correlation and Regression
r 1 , it means that there is perfect correlation in the two
(x x )(y y ) dxdy characteristics i.e., every individual is getting the same ranks in the two
r . characteristics.
( x x )2 (y y )2 dx 2 dy 2 r 1 , it means that if the rank of one characteristics is high,
(3) Modified formula : then that of the other is low or if the rank of one characteristics is low,
dxdy n
dx . dy then that of the other is high.
r 1 , it means that there is perfect negative correlation in
r the two characteristics i.e, an individual getting highest rank in one
dx dy dy
2 2
characteristic is getting the lowest rank in the second characteristic.
dx 2
n
n
2
Here the rank, in the two characteristics in a group of n individuals are
of the type (1, n), (2, n 1),....., (n, 1) .
where dx x x ; dy y y r 0 , it means that no relation can be established between the
two characteristics.
Cov ( x , y ) Cov ( x , y )
Also, rxy . Standard error and probable error
x y Var ( x ).Var (y )
(1) Standard error of prediction : The deviation of the
(4) Step deviation method : Let A and B are assumed mean of predicted value from the observed value is known as the standard error
xi and y i respectively, then
prediction and is defined as S y
(y y p)
2
.
1
r(x , y )
u . v
uivi
n
i i
n
where y is actual value and y p is predicted value.
u n u v n v
1 2 1 2 2 2
i i i i In relation to coefficient of correlation, it is given by
, (i) Standard error of estimate of x is S x x 1 r 2
where u i x i A, v i y i B .
(ii) Standard error of estimate of y is S y y 1 r 2 .
Rank correlation (2) Relation between probable error and standard error: If r
Let us suppose that a group of n individuals is arranged in order is the correlation coefficient in a sample of n pairs of observations,
of merit or proficiency in possession of two characteristics A and B. 1 r2
These rank in two characteristics will, in general, be different. then its standard error S.E. (r) and probable error P.E.
n
For example, if we consider the relation between intelligence and
beauty, it is not necessary that a beautiful individual is intelligent also. 1 r2
(r) = 0.6745 (S.E.)= 0.6745 . The probable error or the
Rank Correlation : 1
6 d 2
, which is the
n
n(n 2 1) standard error are used for interpreting the coefficient of correlation.
Spearman's formulae for rank correlation coefficient. (i) If r P.E. (r) , there is no evidence of correlation.
Where d 2
= sum of the squares of the difference of two (ii) If r 6 P.E. (r) , the existence of correlation is certain.
ranks and n is the number of pairs of observations. The square of the coefficient of correlation for a bivariate
We always have, distribution is known as the “Coefficient of determination”.
d i (x i yi) x i y i n( x ) n(y ) 0
Regression
, (x y )
Linear regression
If all d's are zero, then r 1 , which shows that there is perfect
rank correlation between the variable and which is maximum value of If a relation between two variates x and y exists, then the dots of
the scatter diagram will more or less be concentrated around a curve
r. If however some values of x i are equal, then the coefficient of which is called the curve of regression. If this curve be a straight line,
rank correlation is given by then it is known as line of regression and the regression is called
linear regression.
1
6 d 2 (m 3 m ) Line of regression : The line of regression is the straight line
12
r 1 which in the least square sense gives the best fit to the given
2 frequency.
n(n 1)
Equations of lines of regression
where m is the number of times a particular x i is repeated.
Positive and Negative rank correlation coefficients (1) Regression line of y on x : If value of x is known, then value of
y can be found as
Let r be the rank correlation coefficient then, if r 0 , it means
that if the rank of one characteristic is high, then that of the other is
also high or if the rank of one characteristic is low, then that of the
other is also low.
Correlation and Regression 3
Equation of the two lines of regression are both b yx and b xy are negative, the r will be negative.
y y b yx (x x ) and x x b xy (y y ) .
We have, m 1 slope of the line of regression of y on x
y
byx r.
x
1 y Correlation is a pure number and hence unitless.
m 2 Slope of line of regression of x on y .
b xy r. x Correlation coefficient is not affected by change of origin and
scale.
y r y If two variate are connected by the linear relation
x y K , then x, y are in perfect indirect correlation. Here
m 2 m1 r x x
tan
r y y r 1 .
1 m 1m 2
1 . If x, y are two independent variables, then
x r x x2 y2
(x y, x y ) .
( y r 2 y ) x (1 r 2 ) x y x2 y2
.
r x2 r y2 r( x2 y2 ) If regression lines are y ax b and x cy d , then
Correlation
1. If Var (x) = 8.25, Var (y) = 33.96 and Cov (x, y) = 10.2, then
the correlation coefficient is
(a) 0.89 (b) – 0.98
(c) 0.61 (d) – 0.16
2. When the origin is changed, then the coefficient of correlation
(a) Becomes zero (b) Varies
(c) Remains fixed (d) None of these
3. If r is the correlation coefficient between two variables, then
[MP PET 1995]
(a) r 1 (b) r 1
(c) | r | 1 (d) | r | 1
4. If r is the coefficient of correlation and Y a bX , then
| r|
a b
(a) (b)
b a
(c) 1 (d) None of these
5. Karl Pearson's coefficient of correlation between the heights (in
inches) of teachers and students corresponding to the given data
is [MP PET 1993]
Height of teachers x : 66 67 68 69 70
Height of students y : 68 66 69 72 70
1
(a) (b) 2
2
1
(c) (d) 0
2