In this module, you will learn about bivariate correlation as well as part
correlation, partial correlation and multiple correlation, which involve more than
two variables.
Objectives
Discussion
[Figure: panels (a), (b), and (c), each plotting Y against X]
In describing the strength of association between two sets of data, we
refer to the table below:
$$r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

$$r_{xy} = \frac{512500 - 499273}{\sqrt{(18801)(11749)}}$$

$$r_{xy} = 0.89$$
The above result indicates that there is a high positive (or direct)
relationship between performance in English and performance in Mathematics.
This implies that students who excel in English also tend to excel in Mathematics.
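As a minimal sketch (not part of the original module), the raw-score formula for rxy can be written as a short Python function. The name `pearson_r` is an illustrative choice. Because the full English and Mathematics scores are not reproduced here, the last lines only re-check the arithmetic of the worked example from the two intermediate quantities given in the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed with the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Re-check of the worked example from the intermediate values in the text.
r_check = (512500 - 499273) / sqrt(18801 * 11749)
print(round(r_check, 2))
```

The function expects two equal-length lists of scores, one pair per student.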
Step 1. Set your calculator to the LR mode (LR means Linear Regression).
78 XD,YD 85 DATA
45 XD,YD 48 DATA
. . . .
. . . .
63 XD,YD 68 DATA
Self-Test 1
Compute the correlation coefficient rxy between general average grade and
aptitude test score. Verify your answer using the calculator. Interpret the result.
Ŷ = A + BX
where Ŷ is the predicted value of Y
X is the independent variable
A is the y-intercept
B is the slope of regression.
$$B = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$$
$$\hat{Y} = r_{xy}\,\frac{S_y}{S_x}\,(X - \bar{X}) + \bar{Y}$$
The prediction equation is the equation of the line that indicates the
relationship between two variables. The approximate regression line is obtained
by taking the plot of X vs. Y. The regression line or the line of best fit is given by
X vs. Ŷ.
To illustrate the above concepts, we consider the following data:
X 2 5 8 10 13
Y 4 6 7 9 11
The predicted values Ŷ are obtained using the regression equation given
previously. The following values were substituted into the equation:
rxy = 0.99
X̄ = 7.6
Ȳ = 7.4
Sx = 4.28
Sy = 2.70.
Further, we can verify the Ŷ-values using the calculator set on the LR-mode. In
addition to the Pearson r, we can easily obtain the values of A and B:
A = 2.644
B = 0.626.
Ŷ = 2.644 + 0.626X.
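The calculator results can be cross-checked with a short Python sketch. It assumes the standard least-squares intercept A = Ȳ − B·X̄ and sample standard deviations (N − 1 in the denominator); variable names are illustrative. It also confirms that the two forms of the prediction equation give the same predicted values.

```python
from math import sqrt

xs = [2, 5, 8, 10, 13]
ys = [4, 6, 7, 9, 11]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
mean_x, mean_y = sum_x / n, sum_y / n

# Slope from the raw-score formula; intercept from A = mean_y - B * mean_x
# (the intercept formula is an assumption, it is not printed in the text).
B = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
A = mean_y - B * mean_x

# Sample standard deviations and Pearson r from deviation scores.
sx = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
sy = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

# Predictions from both forms of the regression equation.
pred_ab = [A + B * x for x in xs]
pred_r = [r * (sy / sx) * (x - mean_x) + mean_y for x in xs]

print(round(A, 3), round(B, 3), round(r, 2))
```

The computed A and B agree with the calculator values to within rounding in the third decimal place.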
The regression line indicating the relationship between X and Y is shown
in Figure 2.
$$S_e = S_y\sqrt{1 - r_{xy}^2}$$

$$S_e = 2.70\sqrt{1 - (0.99)^2}$$

$$S_e = 0.38$$
[Figure 2. Regression line (X vs. Ŷ) and approximate line (X vs. Y), with X on the horizontal axis and Y on the vertical axis]
Approximately 68% of the samples will have actual scores that lie within
one Se of their predicted scores. For a predicted score of 7.4 (the value of Ŷ
at X = X̄), this gives 7.4 ± 0.38; in other words, about 68% have scores within
the range 7.02 to 7.78. Likewise, approximately 95% of the samples will have
actual scores that lie within two Se's of their predicted scores:
7.4 ± 0.76 (or 6.64 to 8.16).
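The computation of Se and the two ranges can be reproduced with a few lines of Python. This is only a sketch using the rounded values quoted in the text (rxy = 0.99, Sy = 2.70, predicted score 7.4); the variable names are illustrative.

```python
from math import sqrt

r_xy = 0.99
s_y = 2.70
predicted = 7.4  # predicted score used in the text

# Se = Sy * sqrt(1 - r^2)
s_e = s_y * sqrt(1 - r_xy ** 2)

# Ranges covering roughly 68% and 95% of actual scores
band_68 = (predicted - s_e, predicted + s_e)
band_95 = (predicted - 2 * s_e, predicted + 2 * s_e)

print(round(s_e, 2))
print([round(v, 2) for v in band_68])
print([round(v, 2) for v in band_95])
```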
Self-Test 2
a. Draw the approximate line and regression line indicating the relationship
between X and Y.
$$r_s = 1 - \frac{6\sum (X - Y)^2}{N(N^2 - 1)}$$
$$r_s = \frac{3}{N-1}\left[\frac{4\sum XY}{N(N+1)} - (N+1)\right]$$
Table 4. Computation of rs
$$r_s = 1 - \frac{6\sum (X - Y)^2}{N(N^2 - 1)} = 1 - \frac{6(12)}{8\left[(8)^2 - 1\right]} = 1 - \frac{72}{504}$$

$$r_s = 0.86$$
The result of the analysis indicates that there is a high correlation between
creativity and academic performance.
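The Spearman computation can be sketched in Python as follows. The first function uses the Σ(X − Y)² form with the values from the worked example (Σ(X − Y)² = 12, N = 8). The second uses the ΣXY form with (N + 1) as the subtracted term, which is the version that agrees algebraically with the first. Since the ranks of Table 4 are not reproduced here, the rank lists below are hypothetical, chosen only so that Σ(X − Y)² = 12.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman rs from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def spearman_from_xy(rank_x, rank_y):
    """Spearman rs from the sum of rank products (untied ranks 1..N)."""
    n = len(rank_x)
    sum_xy = sum(x * y for x, y in zip(rank_x, rank_y))
    return 3 / (n - 1) * (4 * sum_xy / (n * (n + 1)) - (n + 1))

# Values from the worked example
rs_text = spearman_from_d2(12, 8)

# Hypothetical ranks (NOT the module's Table 4) with sum of d^2 equal to 12
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [3, 2, 1, 5, 4, 6, 8, 7]
sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
rs_alt = spearman_from_xy(rank_x, rank_y)

print(round(rs_text, 2), sum_d2, round(rs_alt, 2))
```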
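$$r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_x}\sqrt{\frac{N_1 N}{N_0 (N-1)}}$$
where X̄1 is the mean of X for the group coded 1, X̄ is the mean of X for all
cases, sx is the standard deviation of X, N1 and N0 are the numbers of cases
coded 1 and 0, and N = N1 + N0.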
For example, consider the correlation between ability in statistics (interval)
and sex (real nominal). For the variable sex, male and female are coded 1 and 0,
respectively.
Table 5. Data for the Computation of rpb
X 25 18 24 13 10 20 28 21 15 9 11
Y 1 0 0 1 1 0 1 1 0 0 1
N1 = number of males = 6
N0 = number of females = 5
$$r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_x}\sqrt{\frac{N_1 N}{N_0 (N-1)}} = \frac{18 - 17.6}{6.52}\sqrt{\frac{(6)(11)}{5(11-1)}}$$

$$r_{pb} = 0.07$$
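A short Python sketch of the same computation from the Table 5 data is given below, assuming the sample standard deviation (N − 1 in the denominator). Variable names are illustrative. Note that with the unrounded overall mean (about 17.64) the coefficient comes out near 0.06; the value 0.07 in the worked example results from rounding the mean to 17.6 before substituting.

```python
from math import sqrt

x = [25, 18, 24, 13, 10, 20, 28, 21, 15, 9, 11]  # ability in statistics
y = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]            # sex: 1 = male, 0 = female

n = len(x)
group1 = [xi for xi, yi in zip(x, y) if yi == 1]
n1 = len(group1)
n0 = n - n1

mean_all = sum(x) / n          # mean of X for all cases
mean_1 = sum(group1) / n1      # mean of X for cases coded 1
s_x = sqrt(sum((xi - mean_all) ** 2 for xi in x) / (n - 1))

r_pb = (mean_1 - mean_all) / s_x * sqrt(n1 * n / (n0 * (n - 1)))

print(n1, n0, round(mean_1, 1), round(mean_all, 2), round(s_x, 2), round(r_pb, 3))
```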
Phi-Coefficient
          X = 0   X = 1   Total
Y = 1      11      14      25
Y = 0      18       7      25
Total      29      21      50
$$r_{\phi} = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}}$$
where the values of a, b, c and d, and the sums a+c, b+d, a+b, and c+d, are
obtained from the cells of the contingency table below:

          X = 0   X = 1   Total
Y = 1       a       b      a+b
Y = 0       c       d      c+d
Total      a+c     b+d

Thus,
$$r_{\phi} = \frac{(14)(18) - (11)(7)}{\sqrt{(29)(21)(25)(25)}}$$

$$r_{\phi} = 0.28$$
The result suggests that there is a slightly greater tendency that persons
from the urban areas will pursue graduate education.
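The phi-coefficient computation can be sketched in Python as follows. The cell letters follow the lettered contingency table (row Y = 1 holds a and b, row Y = 0 holds c and d); the variable names are illustrative.

```python
from math import sqrt

# Cell counts: row Y = 1 holds a, b; row Y = 0 holds c, d
a, b = 11, 14
c, d = 18, 7

phi = (b * c - a * d) / sqrt((a + c) * (b + d) * (a + b) * (c + d))
print(round(phi, 2))
```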
$$r_{rb} = \frac{P - Q}{N_1 N_0}$$
where
P = sum of all agreements
Q = sum of all disagreements

Alternatively,
$$r_{rb} = \frac{2(\bar{Y}_1 - \bar{Y}_0)}{N}$$
where
Ȳ1 = mean rank of samples coded with 1 in X
Ȳ0 = mean rank of samples coded with 0 in X
We now have an illustrative problem. A researcher wants to study the
relationship between sex (Male-1, Female-0) and psychomotor skill (ranked data)
of Physical Education students.
Sex (X)                  1   0   1   0   0   1   0   1   1   0   1
Psychomotor skill (Y)   10   5   3   9   2   1   4   6   8   7  11
Rank (X = 1)   Rank (X = 0)     P     Q
     11                         5
     10                         5
                     9                4
      8                         4
                     7                3
      6                         3
                     5                2
                     4                2
      3                         1
                     2                1
      1                         0
TOTAL                          18    12
$$r_{rb} = \frac{P - Q}{N_1 N_0} = \frac{18 - 12}{(6)(5)}$$

$$r_{rb} = 0.20$$

Alternatively, let's find Ȳ1 and Ȳ0:
Ȳ1 = 6.5
Ȳ0 = 5.4

$$r_{rb} = \frac{2(\bar{Y}_1 - \bar{Y}_0)}{N} = \frac{2(6.5 - 5.4)}{11}$$

$$r_{rb} = 0.20$$
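Both routes to the rank-biserial coefficient can be sketched in Python using the data of the illustrative problem. Here P is counted as the number of (coded 1, coded 0) pairs in which the member coded 1 has the higher rank, and Q as the number in which it has the lower rank; this reading of "agreements" and "disagreements" is an assumption that reproduces the totals 18 and 12.

```python
sex = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
rank = [10, 5, 3, 9, 2, 1, 4, 6, 8, 7, 11]

ranks_1 = [r for s, r in zip(sex, rank) if s == 1]
ranks_0 = [r for s, r in zip(sex, rank) if s == 0]
n1, n0 = len(ranks_1), len(ranks_0)

# Pair counting: compare every rank coded 1 with every rank coded 0
P = sum(1 for r1 in ranks_1 for r0 in ranks_0 if r1 > r0)
Q = sum(1 for r1 in ranks_1 for r0 in ranks_0 if r1 < r0)
rrb_pairs = (P - Q) / (n1 * n0)

# Mean-rank form
mean_1 = sum(ranks_1) / n1
mean_0 = sum(ranks_0) / n0
rrb_means = 2 * (mean_1 - mean_0) / (n1 + n0)

print(P, Q, round(rrb_pairs, 2), round(rrb_means, 2))
```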
Kendall Tau
Kendall Tau (τ) measures the degree of agreement between two judges
who give their ratings in rank order. This coefficient is computed using the formula:
$$\tau = \frac{P - Q}{N(N-1)/2}$$
or
$$\tau = 1 - \frac{4Q}{N(N-1)}$$
or
$$\tau = \frac{4P}{N(N-1)} - 1$$
Suppose eight subjects are rated by two evaluators X and Y. The ranks
are given in the following table:
Table 10. Data for the Computation of τ
Subject X Y
a 3 3
b 2 1
c 1 2
d 5 4
e 4 5
f 8 8
g 6 7
h 7 6
X Y P Q
8 8 7 0
7 6 5 1
6 7 5 0
5 4 3 1
4 5 3 0
3 3 2 0
2 1 0 1
1 2 0 0
Total 25 3
$$\tau = \frac{25 - 3}{8(8-1)/2}$$

$$\tau = 0.79$$
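The counts P and Q for Kendall Tau can also be obtained by comparing every pair of subjects directly, as in the Python sketch below. A pair counts toward P when both evaluators order the two subjects the same way and toward Q when they order them oppositely. The code assumes untied ranks, as in Table 10.

```python
x = [3, 2, 1, 5, 4, 8, 6, 7]  # ranks from evaluator X (subjects a to h)
y = [3, 1, 2, 4, 5, 8, 7, 6]  # ranks from evaluator Y
n = len(x)

P = Q = 0
for i in range(n):
    for j in range(i + 1, n):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            P += 1   # same ordering: agreement
        elif product < 0:
            Q += 1   # opposite ordering: disagreement

tau = (P - Q) / (n * (n - 1) / 2)
print(P, Q, round(tau, 2))
```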
When three or more judges or raters are involved, we use the Kendall Coefficient of Concordance, an extension of Kendall Tau.
Tetrachoric Coefficient
$$r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{bc}{ad}}}\right)$$

         1     0
0        a     b
1        c     d
Using the computational formula, we get:
$$r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{(78)(43)}{(47)(62)}}}\right) = \cos 86.84^\circ$$

$$r_t = 0.06$$
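The tetrachoric approximation can be checked with a few lines of Python. Because the data table for this example is not reproduced here, the sketch works directly from the two products bc and ad that appear in the worked computation, and it assumes the ratio bc/ad is taken under a square root, which is what reproduces the angle 86.84°.

```python
from math import cos, radians, sqrt

bc = 78 * 43  # product b*c as given in the worked example
ad = 47 * 62  # product a*d as given in the worked example

angle_deg = 180 / (1 + sqrt(bc / ad))
r_t = cos(radians(angle_deg))

print(round(angle_deg, 2), round(r_t, 2))
```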
Biserial Correlation
$$r_b = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\cdot\frac{N_1 N_0}{\mu N\sqrt{N^2 - N}}$$
where μ is the ordinate of the unit normal curve at the point that divides the
area under the curve into the proportions N1/N and N0/N. The value of μ is
obtained from the Table of the Areas of the Unit Normal (z) Distribution.
[Figure 3. The areas N0/N and N1/N in the Unit Normal (z) Distribution]
Table 14. Areas of the Unit-Normal (z) Distribution
X Y
0 5
1 11
0 9
1 13
0 10
1 8
0 6
1 12
1 7
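We find that N1 = 5, N0 = 4, N = 9, Ȳ1 = 10.2, Ȳ0 = 7.5, and sy = 2.74. We
also obtain the areas N0/N = 0.4444 and N1/N = 0.5556, for which the ordinate
is μ = 0.3951.
Therefore,
$$r_b = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\cdot\frac{N_1 N_0}{\mu N\sqrt{N^2 - N}} = \frac{10.2 - 7.5}{2.74}\cdot\frac{(5)(4)}{(0.3951)(9)\sqrt{9^2 - 9}}$$

$$r_b = 0.65$$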
The above result shows that score in a test item correlates substantially
with total score in the test.
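The biserial computation can be sketched in Python with the standard library. Instead of reading the ordinate from a printed table, the sketch uses `statistics.NormalDist` (available from Python 3.8) to find the z value that cuts off the area N1/N and then evaluates the normal density there. The sample standard deviation (N − 1 in the denominator) is assumed, since that reproduces sy = 2.74.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

x = [0, 1, 0, 1, 0, 1, 0, 1, 1]       # dichotomous variable
y = [5, 11, 9, 13, 10, 8, 6, 12, 7]   # continuous variable

y1 = [yi for xi, yi in zip(x, y) if xi == 1]
y0 = [yi for xi, yi in zip(x, y) if xi == 0]
n1, n0, n = len(y1), len(y0), len(y)

s_y = stdev(y)  # sample standard deviation

# Ordinate of the unit normal curve at the point cutting off area N1/N
std_normal = NormalDist()
z = std_normal.inv_cdf(n1 / n)
ordinate = std_normal.pdf(z)

r_b = (mean(y1) - mean(y0)) / s_y * (n1 * n0) / (ordinate * n * sqrt(n ** 2 - n))

print(round(s_y, 2), round(ordinate, 4), round(r_b, 2))
```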
Self-Test 3
You have just covered several statistical tools. Take a rest and then move on to the last lesson. I hope that you are enjoying your lessons…
Part Correlation
The part correlation coefficient, with symbol rx(y.z), indicates the correlation
of X with Y after the portion of Y that can be predicted linearly from Z has been
removed from Y. Part correlation involves more than two variables. The
computational formula is as follows:
$$r_{x(y.z)} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{1 - r_{yz}^2}}$$
Table 16. Data for the Computation of rx(y.z)
rxy = 0.93
rxz = 0.69
ryz = 0.85
$$r_{x(y.z)} = \frac{0.93 - (0.69)(0.85)}{\sqrt{1 - (0.85)^2}}$$

$$r_{x(y.z)} = 0.65$$
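The part correlation in the worked example can be reproduced with a small Python function. The function name `part_correlation` is illustrative; its inputs are the three pairwise Pearson coefficients quoted in the text.

```python
from math import sqrt

def part_correlation(r_xy, r_xz, r_yz):
    """Correlation of X with the part of Y not linearly predictable from Z."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_yz ** 2)

r_part = part_correlation(r_xy=0.93, r_xz=0.69, r_yz=0.85)
print(round(r_part, 2))
```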
Partial Correlation
Partial correlation, also involving more than two variables, measures the
correlation between X and Y with Z "partialed out" or held constant. The partial
correlation coefficient has the symbol rxy.z.
The formula for rxy.z is
$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
rxy = 0.57
rxz = 0.94
ryz = 0.57
rxy.z = 0.12
The result indicates that there is a slight correlation between school ability
and physical fitness with age held constant.
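As a check on the value rxy.z = 0.12, the sketch below applies the standard first-order partial correlation formula to the three coefficients quoted in the text. The function name `partial_correlation` is illustrative.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_partial = partial_correlation(r_xy=0.57, r_xz=0.94, r_yz=0.57)
print(round(r_partial, 2))
```

Although rxy alone is 0.57, most of that association is shared with Z, which is why the partial coefficient drops to a small value.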
Multiple Correlation
When two variables X1 and X2 are correlated with a single variable Y, the
appropriate measure is multiple correlation. The coefficient (rY.12) is given
by
$$r_{Y.12} = \sqrt{\frac{r_{Y1}^2 + r_{Y2}^2 - 2\,r_{Y1}\,r_{Y2}\,r_{12}}{1 - r_{12}^2}}$$
where the subscripts 1 and 2 refer to the variables X1 and X2, respectively. The
variables X1 and X2 are called predictors of variable Y. The prediction equation is
Ŷ = a + b1X1 + b2X2.
The correlation coefficients rY1, rY2 and r12 are computed using the
Pearson r.
Table 18. Data for the Computation of ry.12
X1 X2 Y
4 5 5
5 6 5
5 5 6
6 4 5
6 6 8
6 8 7
7 7 8
7 9 9
8 10 9
9 8 8
10 9 10
Means:
X̄1 = 6.64    X̄2 = 7.00    Ȳ = 7.27
Standard Deviations:
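Correlation Coefficients:
rY1 = 0.84    rY2 = 0.86    r12 = 0.74

$$b_1 = \frac{r_{Y1} - (r_{Y2})(r_{12})}{1 - r_{12}^2}\cdot\frac{s_Y}{s_1}$$
b1 = 0.45.

$$b_2 = \frac{r_{Y2} - (r_{Y1})(r_{12})}{1 - r_{12}^2}\cdot\frac{s_Y}{s_2}$$
b2 = 0.48.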
$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$$
a = 0.92.
Ŷ = a + b1X1 + b2X2
Ŷ = 0.92 + 0.45X1 + 0.48X2
$$r_{Y.12} = \sqrt{\frac{r_{Y1}^2 + r_{Y2}^2 - 2(r_{Y1})(r_{Y2})(r_{12})}{1 - (r_{12})^2}}$$

$$r_{Y.12} = \sqrt{\frac{(0.84)^2 + (0.86)^2 - 2(0.84)(0.86)(0.74)}{1 - (0.74)^2}}$$

$$r_{Y.12} = 0.91$$
The multiple correlation coefficient indicates that X1 and X2, taken as a
whole, correlate highly with Y. Together with rY1 = 0.84 and rY2 = 0.86, rY.12 =
0.91 reveals that X1 (mental ability) and X2 (aptitude) are good predictors of Y
(academic performance).
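The whole multiple-correlation example can be recomputed from the Table 18 data with the Python sketch below. The helper names (`mean`, `sd`, `pearson`) are illustrative, and sample standard deviations (N − 1 in the denominator) are assumed because the module's standard deviation values are not reproduced here. Since the code carries unrounded correlations, b1 and b2 may differ from the hand-computed values in the second decimal place.

```python
from math import sqrt

x1 = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 10]
x2 = [5, 6, 5, 4, 6, 8, 7, 9, 10, 8, 9]
y = [5, 5, 6, 5, 8, 7, 8, 9, 9, 8, 10]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(v)
    return sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))

def pearson(u, v):
    """Pearson r from deviation scores."""
    mu, mv = mean(u), mean(v)
    cross = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cross / ((len(u) - 1) * sd(u) * sd(v))

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# Regression weights and intercept of the prediction equation
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2) * sd(y) / sd(x1)
b2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2) * sd(y) / sd(x2)
a = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Multiple correlation coefficient
R = sqrt((r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2))

print(round(r_y1, 2), round(r_y2, 2), round(r_12, 2))
print(round(b1, 2), round(b2, 2), round(a, 2), round(R, 2))
```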
Self-Test 4
a) Compute rX(Y.Z). Interpret the result.
Congratulations on completing Module 3! Before you proceed to the next module, please submit your answers to the Self-Tests to your professor.