
MARIO P. OBRERO, Ph.D.

University of Northern Philippines


Vigan City, Ilocos Sur
About Module 3

Module 3 deals with the measures of correlation or relationship among
variables. It discusses various statistical tools, which can be used to determine
the relationship of any pair of nominal, ordinal, interval or ratio variables. The
most popular among these techniques is the Pearson Product Moment
Coefficient of Correlation, which is appropriate for two sets of interval data. On
the other hand, Spearman Ranks Correlation is applicable for two sets of ordinal
data. Other measures of correlation include the phi-coefficient, point-biserial
correlation, rank biserial correlation, biserial correlation, tetrachoric coefficient,
and Kendall Tau.

In this module, you will learn about bivariate correlation as well as part
correlation, partial correlation and multiple correlation, which involve more than
two variables.

The following topics are included in this learning module:

Lesson 1. The Concept of Correlation


➢ Pearson Product-Moment Coefficient of Correlation
➢ Linear Regression

Lesson 2. Other Bivariate Correlation Coefficients


➢ Spearman Ranks Correlation
➢ Point-Biserial Correlation
➢ Phi-Coefficient
➢ Rank Biserial Coefficient
➢ Kendall Tau
➢ Tetrachoric Coefficient
➢ Biserial Correlation

Lesson 3. Correlation Involving More than Two Variables


➢ Part Correlation
➢ Partial Correlation
➢ Multiple Correlation

Objectives

After completing this module, you will be able to:

➢ explain the concepts of correlation and regression,
➢ present graphically the relationship between two variables and
derive the regression equation,
➢ distinguish between and among the different correlation
coefficients,
➢ compute the correlation coefficient that gives the relationship
between two variables, measured in nominal, ordinal or interval
scales,
➢ differentiate part correlation, partial correlation, and multiple
correlation, and
➢ develop computational skills in correlation analyses involving two or
more variables.

How to go about the Module

As you go along this module, you will encounter self-tests designed to
help you assess your understanding of each topic. Answer the questions and
write your solutions in a notebook. In case you find difficulty in answering the
tests, I suggest that you read the topic covered once more.

Okay, you may now proceed to the first lesson.
I hope that you will enjoy the lessons…

Discussion

Lesson 1. The Concept of Correlation

It is common knowledge that the intelligence of a child tends to resemble
the intelligence of his parents. Likewise, children of tall parents tend to be taller
than the children of shorter parents. These observations suggest that the
intelligence of a child and the intelligence of his parents are correlated; and that
the height of a child is correlated with the height of his parents. Several
characteristics of family members such as attitudes, ability, personality and
interest are found to have high degrees of relationship.

In education, there have been studies that show that mathematical
achievement is associated with sex, that is, males tend to be more
mathematically proficient than females. Educational researchers may study the
relationships among various student-, teacher- and school-related variables.

The correlation between two variables can be presented graphically.
Figure 1 illustrates positive correlation, negative correlation and no correlation
between variables X and Y.

Pearson Product-Moment Coefficient of Correlation

Karl Pearson (1857-1936) introduced a method of quantifying the
relationship between two variables. This method is called the Pearson Product
Moment Coefficient of Correlation, denoted by rxy (or simply r).

For two variables X and Y, measured in interval scale, Pearson r or rxy
indicates both the magnitude (or degree or strength) and the direction of
relationship. The value of rxy ranges from -1 to +1, so its magnitude ranges
from 0 to 1. If rxy is zero, there is no linear correlation between X and Y. If rxy
is +1 or -1, the relationship between X and Y is perfect. The direction of
relationship is indicated by the sign of the coefficient. If rxy is positive, the
relationship is direct. If rxy is negative, the relationship is inverse.

[Figure 1. Types of Correlation: (a) Positive Correlation (b) Negative Correlation (c) No Correlation Between X and Y. Each panel plots Y against X.]

In describing the strength of association between two sets of data, we
refer to the table below:

Table 1. Values of rxy and Degrees of Correlation

Value of rxy (magnitude)   Degree of Correlation
1.00                       Perfect
0.81 to 0.99               High to Very High
0.61 to 0.80               Marked, Substantial
0.41 to 0.60               Moderate
0.21 to 0.40               Definite but Small
0.01 to 0.20               Almost Negligible to Slight
0.00                       No Correlation

The computational formula for rxy is as follows:

$$ r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} $$

To illustrate the computation of rxy, let us consider the following example.
Given are the final test scores of Grade VI pupils in English (X) and Mathematics
(Y).

Table 2. Computation of the Pearson Product-Moment Coefficient of Correlation

Sample No.   Score in English (X)   Score in Mathematics (Y)   XY      X²      Y²
1            78                     85                         6630    6084    7225
2            45                     48                         2160    2025    2304
3            55                     65                         3575    3025    4225
4            76                     73                         5548    5776    5329
5            85                     87                         7395    7225    7569
6            72                     77                         5544    5184    5929
7            90                     83                         7470    8100    6889
8            65                     74                         4810    4225    5476
9            54                     71                         3834    2916    5041
10           63                     68                         4284    3969    4624
N = 10       ΣX = 683               ΣY = 731                   ΣXY = 51250   ΣX² = 48529   ΣY² = 54611

$$ r_{xy} = \frac{10(51250) - (683)(731)}{\sqrt{\left[10(48529) - (683)^2\right]\left[10(54611) - (731)^2\right]}} $$

$$ r_{xy} = \frac{512500 - 499273}{\sqrt{(18801)(11749)}} \approx \frac{13227}{14862.5} $$

$$ r_{xy} = 0.89 $$

The above result indicates that there is a high positive (or direct)
relationship between performance in English and performance in Mathematics.
This implies that students who excel in English tend to excel also in Mathematics.
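
As a cross-check of the hand computation, the short Python sketch below applies the computational formula to the scores in Table 2. The names `english`, `mathematics` and `pearson_r` are illustrative choices and are not part of the module.

```python
from math import sqrt

# Final test scores from Table 2: English (X) and Mathematics (Y).
english = [78, 45, 55, 76, 85, 72, 90, 65, 54, 63]
mathematics = [85, 48, 65, 73, 87, 77, 83, 74, 71, 68]


def pearson_r(x, y):
    """Pearson r from the raw-score (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_xy = pearson_r(english, mathematics)
print(round(r_xy, 2))  # 0.89
```

The printed value agrees with the result obtained by hand.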

Finding rxy Using the Calculator

To facilitate the computation of the Pearson Product Moment
Coefficient of Correlation (rxy), we can use the scientific calculator. Just follow
the steps below:

Step 1. Set your calculator to the LR mode (LR means Linear Regression).

Step 2. Delete any stored data.

Step 3. Enter the data (see previous example) as follows:

78 XD,YD 85 DATA

45 XD,YD 48 DATA
. . . .
. . . .
63 XD,YD 68 DATA

Step 4. Find the Pearson r.

Self-Test 1

1. The following data were obtained from a sample of students entering a
National High School:

Sample General Average Grade Aptitude Test Score


1 93 94
2 88 82
3 87 90
4 89 87
5 84 86
6 82 85
7 80 76
8 91 86
9 87 80
10 86 81
11 81 84
12 88 80
13 90 85
14 82 84
15 83 88

Compute the correlation coefficient rxy between general average grade and
aptitude test score. Verify your answer using the calculator. Interpret the result.

2. Compute the Pearson r between IQ score and Mathematics Test Score.
Verify your answer using the calculator. Interpret the result.

Pupil IQ Mathematical Test Score


1 106 16
2 118 21
3 80 10
4 120 25
5 117 16
6 94 11
7 106 6
8 119 32
9 107 20
10 128 15
11 122 26
12 116 18
Linear Regression

A prediction equation (or regression equation), which is used to predict the
value of the dependent variable based on the independent variable, can be written
as follows:

Ŷ = A + BX
where Ŷ is the predicted value of Y
X is the independent variable
A is the y-intercept
B is the slope of regression.

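The slope of regression can be obtained from the formula:

$$ B = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} $$

The y-intercept then follows from $A = \bar{Y} - B\bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the means of X and Y.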

An equivalent regression equation is given by

$$ \hat{Y} = r_{xy}\,\frac{S_y}{S_x}\,(X - \bar{X}) + \bar{Y} $$

where rxy is the Pearson correlation coefficient,

X̄ is the mean of the data for the independent variable X,

Ȳ is the mean of the data for the dependent variable Y,

Sx is the standard deviation of the data for X, and

Sy is the standard deviation of the data for Y.

To set the limits around a predicted score Ŷ within which a person's
actual score is likely to fall, the standard error of estimate (Se) is computed using
the equation:

$$ S_e = S_y\sqrt{1 - r_{xy}^2} $$

The prediction equation is the equation of the line that indicates the
relationship between two variables. The approximate regression line is obtained
by taking the plot of X vs. Y. The regression line or the line of best fit is given by
X vs. Ŷ.

To illustrate the above concepts, we consider the following data:

Table 3. Data for Variables X and Y

X 2 5 8 10 13

Y 4 6 7 9 11

The predicted values Ŷ are obtained using the regression equation given
previously. The following values were substituted into the equation:

rxy = 0.99

X̄ = 7.6

Ȳ = 7.4

Sx = 4.28

Sy = 2.70.

Further, we can verify the Ŷ-values using the calculator set on the LR-mode. In
addition to the Pearson r, we can easily obtain the values of A and B:

A = 2.644

B = 0.626.

Thus, the regression equation is

Ŷ = 2.644 + 0.626X.
The regression line indicating the relationship between X and Y is shown
in Figure 2.

The standard error of estimate (Se) is

$$ S_e = S_y\sqrt{1 - r_{xy}^2} = 2.70\sqrt{1 - (0.99)^2} $$

$$ S_e = 0.38. $$
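
The regression quantities for Table 3 can also be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes that Sx and Sy are sample standard deviations (divisor N - 1), which is the assumption that matches the values 4.28 and 2.70 quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

# Data from Table 3.
x = [2, 5, 8, 10, 13]
y = [4, 6, 7, 9, 11]
n = len(x)

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Slope from the raw-score formula; intercept from the two means.
slope = (n * sum_xy - sum(x) * sum(y)) / (n * sum_x2 - sum(x) ** 2)
intercept = mean(y) - slope * mean(x)

# Sample standard deviations, and r recovered from the slope.
s_x, s_y = stdev(x), stdev(y)
r_xy = slope * s_x / s_y

# Standard error of estimate, with unrounded and with rounded inputs.
se_unrounded = s_y * sqrt(1 - r_xy ** 2)
se_rounded = 2.70 * sqrt(1 - 0.99 ** 2)

print(round(slope, 3), round(intercept, 3))
print(round(se_unrounded, 2), round(se_rounded, 2))
```

With unrounded inputs the standard error comes out near 0.37; the value 0.38 results from substituting the rounded figures rxy = 0.99 and Sy = 2.70, as the text does. The intercept evaluates to about 2.645, against the 2.644 shown on the calculator display.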
[Figure 2. Regression Line Indicating Relationship Between X and Y. The plot shows Y against X with two lines: the regression line (X vs. Ŷ) and the approximate line (X vs. Y).]

Approximately 68% of the samples will have actual scores that lie within
one Se of their predicted scores, Ŷ ± 0.38. For example, at X = 7.6 the predicted
score is Ŷ = 7.4, so about 68% of such samples have scores within 7.4 ± 0.38, that
is, within the range 7.02 - 7.78. Likewise, approximately 95% of the samples will
have actual scores that lie within two Se's of their predicted scores:
7.4 ± 0.76 (or 6.64 - 8.16).

Self-Test 2

Consider the data in problem 2 of Self-Test 1. Let X be IQ score and Y be
Mathematics Test Score.

a. Draw the approximate line and regression line indicating the relationship
between X and Y.

b. Write down the prediction equation.

c. Estimate the range of scores in the Mathematics Test in which approximately
c.1 68 %
c.2 95 % of the scores will likely fall.

Take a rest for a while
before you proceed to
the second lesson.

Lesson 2. Other Bivariate Correlation Coefficients

Spearman Ranks Correlation

The Spearman Ranks Correlation coefficient, denoted by rs, is appropriate
for determining the relationship between two sets of ordinal (ranked) data. The
computational formula for rs is as follows:

$$ r_s = 1 - \frac{6\sum (X - Y)^2}{N(N^2 - 1)} $$

Another formula for rs is

$$ r_s = \frac{3}{N-1}\left[\frac{4\sum XY}{N(N+1)} - (N+1)\right] $$

Let us now consider an example. Eight students are ranked according to
creativity (X) and academic performance (Y).

Table 4. Computation of rs

Student No.   Creativity (X)   Academic Performance (Y)   X - Y   (X - Y)²
1             6                8                          -2      4
2             2                3                          -1      1
3             1                1                           0      0
4             3                2                           1      1
5             4                5                          -1      1
6             5                4                           1      1
7             8                6                           2      4
8             7                7                           0      0
N = 8                                                             Σ(X - Y)² = 12

$$ r_s = 1 - \frac{6\sum (X - Y)^2}{N(N^2 - 1)} = 1 - \frac{6(12)}{8\left[(8)^2 - 1\right]} = 1 - \frac{72}{504} $$

$$ r_s = 0.86 $$

The result of the analysis indicates that there is a high correlation between
creativity and academic performance.
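
The same result can be checked with the following Python sketch, which evaluates both formulas for rs on the ranks in Table 4. The variable names are illustrative, and both formulas assume ranks without ties, as in this example.

```python
# Ranks from Table 4: creativity (X) and academic performance (Y).
creativity = [6, 2, 1, 3, 4, 5, 8, 7]
academic = [8, 3, 1, 2, 5, 4, 6, 7]
n = len(creativity)

# First formula: based on the squared rank differences.
sum_d2 = sum((x - y) ** 2 for x, y in zip(creativity, academic))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Second formula: based on the sum of rank products.
sum_xy = sum(x * y for x, y in zip(creativity, academic))
r_s_alt = 3 / (n - 1) * (4 * sum_xy / (n * (n + 1)) - (n + 1))

print(sum_d2, round(r_s, 2), round(r_s_alt, 2))
```

Both formulas give the same coefficient here, which is a quick way to confirm that the ranks were entered correctly.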

Point Biserial Correlation Coefficient

The point-biserial correlation coefficient (rpb) can be used to analyze the
relationship between real nominal dichotomous data and interval data. The
formula for rpb is

$$ r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_x}\sqrt{\frac{N_1 N}{N_0 (N-1)}} $$

For example, consider the correlation between ability in statistics (interval)
and sex (real nominal). For the variable sex, male and female are coded with 1
and 0, respectively.

Table 5. Data for the Computation of rpb

X   25  18  24  13  10  20  28  21  15   9  11
Y    1   0   0   1   1   0   1   1   0   0   1

X̄1 = mean for males = 18

X̄ = total mean = 17.6

Sx = total standard deviation = 6.52

N1 = number of males = 6

N0 = number of females = 5

N = N1 + N0 = total number of samples = 11

Using the formula, we get

$$ r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_x}\sqrt{\frac{N_1 N}{N_0 (N-1)}} = \frac{18 - 17.6}{6.52}\sqrt{\frac{(6)(11)}{5(11-1)}} $$

$$ r_{pb} = 0.07 $$

The value of rpb shows that there is an almost negligible correlation
between ability in statistics and sex.
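
A minimal Python sketch of the same computation on the data of Table 5 is given below. The variable names are illustrative, and the standard deviation is assumed to be the sample standard deviation (divisor N - 1), which matches Sx = 6.52.

```python
from math import sqrt
from statistics import mean, stdev

# Data from Table 5: statistics scores (X) and sex (Y; male = 1, female = 0).
scores = [25, 18, 24, 13, 10, 20, 28, 21, 15, 9, 11]
sex = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1]

males = [s for s, g in zip(scores, sex) if g == 1]
n = len(scores)
n1 = len(males)
n0 = n - n1

# Point-biserial formula from the text, using unrounded intermediate values.
r_pb = (mean(males) - mean(scores)) / stdev(scores) * sqrt(n1 * n / (n0 * (n - 1)))

print(n1, n0, round(stdev(scores), 2), round(r_pb, 3))
```

Carried at full precision, the coefficient is about 0.064. The hand computation gives 0.07 because it uses the rounded total mean 17.6; the interpretation is the same either way.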

Try using your calculator (LR mode) to find rs and rpb. What do you find?

Phi-Coefficient

When the relationship between two sets of dichotomous data is analyzed,
the phi-coefficient (rφ) is appropriate.

Suppose we want to know if place of residence [urban (1), rural (0)] is
associated with pursuit of graduate education [pursued graduate education (1),
did not pursue graduate education (0)].

Table 6. Data for the Computation of rφ

            X = 0   X = 1   Total
Y = 1       11      14      25
Y = 0       18      7       25
Total       29      21      50

We now use the computational formula:

$$ r_\phi = \frac{bc - ad}{\sqrt{(a+c)(b+d)(a+b)(c+d)}} $$

where the values of a, b, c and d; and the sums a+c, b+d, a+b, and c+d are
obtained from the cells of the contingency table below:

Table 7. Contingency Table for rφ

            X = 0   X = 1   Total
Y = 1       a       b       a+b
Y = 0       c       d       c+d
Total       a+c     b+d     N

Thus,

$$ r_\phi = \frac{(14)(18) - (11)(7)}{\sqrt{(29)(21)(25)(25)}} $$

$$ r_\phi = 0.28 $$

The result suggests that there is a slightly greater tendency that persons
from the urban areas will pursue graduate education.
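
For readers who prefer to verify the arithmetic by computer, the sketch below evaluates the phi-coefficient from the four cell counts of Table 6, laid out as in Table 7. The variable names are illustrative.

```python
from math import sqrt

# Cell counts from Table 6, arranged as in Table 7:
#          X = 0   X = 1
# Y = 1      a       b
# Y = 0      c       d
a, b, c, d = 11, 14, 18, 7

phi = (b * c - a * d) / sqrt((a + c) * (b + d) * (a + b) * (c + d))

print(round(phi, 2))  # 0.28
```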

Rank Biserial Coefficient

The rank-biserial coefficient rrb is suitable for analyzing the relationship
between real dichotomous data and continuous ordinal data.

Below is the formula for rrb:

$$ r_{rb} = \frac{P - Q}{N_1 N_0} $$

where
P = sum of all agreements

Q = sum of all inversions

N1 = number of samples coded with 1 in X

N0 = number of samples coded with 0 in X

Another formula for rrb is

$$ r_{rb} = \frac{2(\bar{Y}_1 - \bar{Y}_0)}{N} $$

where
Ȳ1 = mean rank of samples coded with 1 in X

Ȳ0 = mean rank of samples coded with 0 in X

N = total number of samples.

We now have an illustrative problem. A researcher wants to study the
relationship between sex (Male-1, Female-0) and psychomotor skill (ranked data)
of Physical Education students.

Table 8. Data for the Computation of rrb

Sex (X)                  1   0   1   0   0   1   0   1   1   0   1
Psychomotor skill (Y)   10   5   3   9   2   1   4   6   8   7  11

Arrange the data as shown in the table below:

Table 9. Computation of rrb

Ranks (Y) listed under their value of X

X = 1   X = 0   P    Q
11              5
10              5
        9            4
8               4
        7            3
6               3
        5            2
        4            2
3               1
        2            1
1               0
TOTAL           18   12

Note that a column P (agreements) is provided for the ranks in column 1, and a
column Q (inversions) for the ranks in column 0. To get P corresponding to rank
11, we observe that there are five numbers less than 11 below rank 11 in column
0 (9, 7, 5, 4 and 2). Hence, P = 5. Also, P = 5 for rank 10. For rank 9, Q = 4 since
there are four numbers less than 9 below rank 9 in column 1 (8, 6, 3 and 1).

$$ r_{rb} = \frac{P - Q}{N_1 N_0} = \frac{18 - 12}{(6)(5)} $$

$$ r_{rb} = 0.20. $$

Alternatively, let's find Ȳ1 and Ȳ0:

Ȳ1 = 6.5

Ȳ0 = 5.4

$$ r_{rb} = \frac{2(\bar{Y}_1 - \bar{Y}_0)}{N} = \frac{2(6.5 - 5.4)}{11} $$

$$ r_{rb} = 0.20 $$

The rrb obtained indicates that there is a slight correlation between
psychomotor skill and sex.
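
The counting of agreements and inversions can be sketched in Python as follows, using the data of Table 8. The names are illustrative. P is counted as the number of (group 1, group 0) pairs in which the group 1 rank is higher, and Q as the number of pairs in which the group 0 rank is higher, which is what the P and Q columns of Table 9 add up to.

```python
from statistics import mean

# Data from Table 8: sex (X; male = 1, female = 0) and psychomotor rank (Y).
sex = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
rank = [10, 5, 3, 9, 2, 1, 4, 6, 8, 7, 11]

ones = [r for r, g in zip(rank, sex) if g == 1]
zeros = [r for r, g in zip(rank, sex) if g == 0]

# Agreements (P) and inversions (Q) over all cross-group pairs.
p = sum(1 for r1 in ones for r0 in zeros if r1 > r0)
q = sum(1 for r1 in ones for r0 in zeros if r1 < r0)

r_rb = (p - q) / (len(ones) * len(zeros))

# Alternative formula based on the mean ranks of the two groups.
r_rb_alt = 2 * (mean(ones) - mean(zeros)) / len(rank)

print(p, q, round(r_rb, 2), round(r_rb_alt, 2))
```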

Kendall Tau

Kendall Tau (rτ) measures the degree of agreement between two judges
who give their rating in rank order. This coefficient is computed using the formula:

$$ r_\tau = \frac{P - Q}{N(N-1)/2} $$

where P is the total number of agreements and Q is the total number of
disagreements. Alternatively,

$$ r_\tau = 1 - \frac{4Q}{N(N-1)} $$

or

$$ r_\tau = \frac{4P}{N(N-1)} - 1 $$

Suppose eight subjects are rated by two evaluators X and Y. The ranks
are given in the following table:

Table 10. Data for the Computation of rτ

Subject X Y
a 3 3
b 2 1
c 1 2
d 5 4
e 4 5
f 8 8
g 6 7
h 7 6

First, we arrange X (or Y) in natural order and then determine the Ps
(agreements) and the Qs (disagreements).

Table 11. Computation of rτ

X Y P Q
8 8 7 0
7 6 5 1
6 7 5 0
5 4 3 1
4 5 3 0
3 3 2 0
2 1 0 1
1 2 0 0
Total 25 3

Note that P for a rank R in Y is obtained by counting the numbers less
than the value of R, below it in column Y. Q for R is determined by counting the
numbers greater than the magnitude of R, below it in column Y.

We then use the first computational formula:

$$ r_\tau = \frac{25 - 3}{8(8-1)/2} = \frac{22}{28} $$

$$ r_\tau = 0.79 $$

The coefficient indicates that there is a substantial agreement between the
ranks given by the two judges or evaluators.
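
The totals P and Q can also be obtained by comparing every pair of subjects directly, as in the Python sketch below (names are illustrative). A pair counts as an agreement when both evaluators order the two subjects the same way and as a disagreement when they order them oppositely; the sketch assumes no tied ranks, as in Table 10.

```python
from itertools import combinations

# Ranks from Table 10, subjects a to h.
x = [3, 2, 1, 5, 4, 8, 6, 7]
y = [3, 1, 2, 4, 5, 8, 7, 6]
n = len(x)

p = q = 0
for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
    direction = (x1 - x2) * (y1 - y2)
    if direction > 0:
        p += 1  # both evaluators order this pair the same way
    elif direction < 0:
        q += 1  # the evaluators order this pair oppositely

tau = (p - q) / (n * (n - 1) / 2)

print(p, q, round(tau, 2))
```

The two alternative formulas, 1 - 4Q/[N(N-1)] and 4P/[N(N-1)] - 1, return the same value for these data.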

When three or more judges or raters are involved, we use Kendall Coefficient of
Concordance, an extension of Kendall Tau.

Tetrachoric Coefficient

The tetrachoric coefficient (rt) is useful when one investigates the
relationship between two artificial dichotomous variables.

The computational formula is given as follows:

$$ r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{bc/ad}}\right) $$

considering the contingency table:

Table 12. Contingency Table for rt

        1    0
0       a    b
1       c    d

To illustrate the method, we consider the question: Is passing the
Licensure Examination for Teachers correlated with the type of school calendar
of the examinees?

Table 13. Data for the Computation of rt

School Calendar    LET Performance
                   Passed (1)   Failed (0)
Semestral (0)      47           78
Trimestral (1)     43           62

Using the computational formula, we get:

$$ r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{\dfrac{(78)(43)}{(47)(62)}}}\right) = \cos 86.84^\circ $$

$$ r_t = 0.06 $$

The result indicates that there is almost negligible relationship between
the LET Performance and the type of school calendar.
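
The cosine formula is easy to mistype on a calculator, so a short Python sketch of the same computation may help. The variable names are illustrative, and the cells a, b, c and d follow the layout of Table 12.

```python
from math import cos, radians, sqrt

# Cell counts from Table 13, arranged as in Table 12.
a, b = 47, 78   # Semestral: passed, failed
c, d = 43, 62   # Trimestral: passed, failed

# Angle in degrees, then its cosine.
angle = 180 / (1 + sqrt((b * c) / (a * d)))
r_t = cos(radians(angle))

print(round(angle, 2), round(r_t, 3))
```

The angle is about 86.84 degrees and its cosine is about 0.055, which the text reports as 0.06.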

Biserial Correlation

The biserial correlation coefficient (rb) is used to determine the relationship
between an artificial nominal dichotomous variable and an interval variable.

The coefficient rb is computed using the formula

$$ r_b = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\cdot\frac{N_1 N_0}{\mu N\sqrt{N^2 - N}} $$

where μ is the ordinate of the unit normal curve at the point that divides the
area under the curve into the proportions N1/N and N0/N. The value of μ is
obtained from the Table of the Areas of the Unit Normal (z) Distribution.

[Figure 3. The areas N0/N and N1/N in the Unit Normal (z) Distribution]

Table 14. Areas of the Unit-Normal (z) Distribution

z Below z Above z Ordinate z Below z Above z Ordinate
0.00 .5000 .5000 .3989 0.41 .6591 .3409 .3668
0.01 .5040 .4960 .3989 0.42 .6628 .3372 .3653
0.02 .5080 .4920 .3989 0.43 .6664 .3336 .3637
0.03 .5120 .4880 .3988 0.44 .6700 .3300 .3621
0.04 .5160 .4840 .3986 0.45 .6736 .3264 .3605
0.05 .5199 .4801 .3984 0.46 .6772 .3228 .3589
0.06 .5239 .4761 .3982 0.47 .6808 .3192 .3572
0.07 .5279 .4721 .3980 0.48 .6844 .3156 .3555
0.08 .5319 .4681 .3977 0.49 .6879 .3121 .3538
0.09 .5359 .4641 .3973 0.50 .6915 .3085 .3521
0.10 .5398 .4602 .3970 0.51 .6950 .3050 .3503
0.11 .5438 .4562 .3965 0.52 .6985 .3015 .3485
0.12 .5478 .4522 .3961 0.53 .7019 .2981 .3467
0.13 .5517 .4483 .3956 0.54 .7054 .2946 .3448
0.14 .5557 .4443 .3951 0.55 .7088 .2912 .3429
0.15 .5596 .4404 .3945 0.56 .7123 .2877 .3410
0.16 .5636 .4364 .3939 0.57 .7157 .2843 .3391
0.17 .5675 .4325 .3932 0.58 .7190 .2810 .3372
0.18 .5714 .4286 .3925 0.59 .7224 .2776 .3352
0.19 .5753 .4247 .3918 0.60 .7257 .2743 .3332
0.20 .5793 .4207 .3910 0.61 .7291 .2709 .3312
0.21 .5832 .4168 .3902 0.62 .7324 .2676 .3292
0.22 .5871 .4129 .3894 0.63 .7357 .2643 .3271
0.23 .5910 .4090 .3885 0.64 .7389 .2611 .3251
0.24 .5948 .4052 .3876 0.65 .7422 .2578 .3230
0.25 .5987 .4013 .3867 0.66 .7454 .2546 .3209
0.26 .6026 .3974 .3857 0.67 .7486 .2514 .3187
0.27 .6064 .3936 .3847 0.68 .7517 .2483 .3166
0.28 .6103 .3897 .3836 0.69 .7549 .2451 .3144
0.29 .6141 .3859 .3825 0.70 .7580 .2420 .3123
0.30 .6179 .3821 .3814 0.71 .7611 .2389 .3101
0.31 .6217 .3783 .3802 0.72 .7642 .2358 .3079
0.32 .6255 .3745 .3790 0.73 .7673 .2327 .3056
0.33 .6293 .3707 .3778 0.74 .7704 .2296 .3034
0.34 .6331 .3669 .3765 0.75 .7734 .2266 .3011
0.35 .6368 .3632 .3752 0.76 .7764 .2236 .2989
0.36 .6406 .3594 .3739 0.77 .7794 .2206 .2966
0.37 .6443 .3557 .3725 0.78 .7823 .2177 .2943
0.38 .6480 .3520 .3712 0.79 .7852 .2148 .2920
0.39 .6517 .3483 .3697 0.80 .7881 .2119 .2897
0.40 .6554 .3446 .3683 0.81 .7910 .2090 .2874
0.82 .7939 .2061 .2850 1.25 .8944 .1056 .1826
0.83 .7967 .2033 .2827 1.26 .8962 .1038 .1804
0.84 .7995 .2005 .2803 1.27 .8980 .1020 .1781
0.85 .8023 .1977 .2780 1.28 .8997 .1003 .1758
0.86 .8051 .1949 .2756 1.29 .9015 .0985 .1736
0.87 .8078 .1922 .2732 1.30 .9032 .0968 .1714
0.88 .8106 .1894 .2709 1.31 .9049 .0951 .1691
0.89 .8133 .1867 .2685 1.32 .9066 .0934 .1669
0.90 .8159 .1841 .2661 1.33 .9082 .0918 .1647
0.91 .8186 .1814 .2637 1.34 .9099 .0901 .1626
0.92 .8212 .1788 .2613 1.35 .9115 .0885 .1604
0.93 .8238 .1762 .2589 1.36 .9131 .0869 .1582
0.94 .8264 .1736 .2565 1.37 .9147 .0853 .1561
0.95 .8289 .1711 .2541 1.38 .9162 .0838 .1539
0.96 .8315 .1685 .2516 1.39 .9177 .0823 .1518
0.97 .8340 .1660 .2492 1.40 .9192 .0808 .1497
0.98 .8365 .1635 .2468 1.41 .9207 .0793 .1476
0.99 .8389 .1611 .2444 1.42 .9222 .0778 .1456
1.00 .8413 .1587 .2420 1.43 .9236 .0764 .1435
1.01 .8438 .1562 .2396 1.44 .9251 .0749 .1415
1.02 .8461 .1539 .2371 1.45 .9265 .0735 .1394
1.03 .8485 .1515 .2347 1.46 .9279 .0721 .1374
1.04 .8508 .1492 .2323 1.47 .9292 .0708 .1354
1.05 .8531 .1469 .2299 1.48 .9306 .0694 .1334
1.06 .8554 .1446 .2275 1.49 .9319 .0681 .1315
1.07 .8577 .1423 .2251 1.50 .9332 .0668 .1295
1.08 .8599 .1401 .2227 1.51 .9345 .0655 .1276
1.09 .8621 .1379 .2203 1.52 .9357 .0643 .1257
1.10 .8643 .1357 .2179 1.53 .9370 .0630 .1238
1.11 .8665 .1335 .2155 1.54 .9382 .0618 .1219
1.12 .8686 .1314 .2131 1.55 .9394 .0606 .1200
1.13 .8708 .1292 .2107 1.56 .9406 .0594 .1182
1.14 .8729 .1271 .2083 1.57 .9418 .0582 .1163
1.15 .8749 .1251 .2059 1.58 .9429 .0571 .1145
1.16 .8770 .1230 .2036 1.59 .9441 .0559 .1127
1.17 .8790 .1210 .2012 1.60 .9452 .0548 .1109
1.18 .8810 .1190 .1989 1.61 .9463 .0537 .1092
1.19 .8830 .1170 .1965 1.62 .9474 .0526 .1074
1.20 .8849 .1151 .1942 1.63 .9484 .0516 .1057
1.21 .8869 .1131 .1919 1.64 .9495 .0505 .1040
1.22 .8888 .1112 .1895 1.65 .9505 .0495 .1023
1.23 .8907 .1093 .1872 1.66 .9515 .0485 .1006
1.24 .8925 .1075 .1849 1.67 .9525 .0475 .0989
1.96 .9750 .0250 .0584 3.00 .99865 .00135 .0044
2.00 .9772 .0228 .0540 3.50 .99977 .00023 .00087
2.50 .9938 .0062 .0175 4.00 .999968 .000032 .00013
Source: Glass & Hopkins (1984)

As an example, we consider the correlation between score in a test item
(X) and the total score in the test (Y).

Table 15. Data for the Computation of rb

X Y
0 5
1 11
0 9
1 13
0 10
1 8
0 6
1 12
1 7

We find that N1 = 5, N0 = 4, N = 9, Ȳ1 = 10.2, Ȳ0 = 7.5, and sy = 2.74. We
also obtain the areas N0/N = 0.4444 and N1/N = 0.5556, and find that the ordinate μ is
0.3951.

Therefore,

$$ r_b = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\cdot\frac{N_1 N_0}{\mu N\sqrt{N^2 - N}} = \frac{10.2 - 7.5}{2.74}\cdot\frac{(5)(4)}{(0.3951)(9)\sqrt{(9)^2 - 9}} $$

$$ r_b = 0.65 $$

The above result shows that score in a test item correlates substantially
with total score in the test.
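
The table lookup for the ordinate can be replaced by a direct computation. The Python sketch below is one way to do this, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative, and the standard deviation is taken as the sample standard deviation, which matches sy = 2.74.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Data from Table 15: item score (X) and total test score (Y).
item = [0, 1, 0, 1, 0, 1, 0, 1, 1]
total = [5, 11, 9, 13, 10, 8, 6, 12, 7]

y1 = [t for t, i in zip(total, item) if i == 1]
y0 = [t for t, i in zip(total, item) if i == 0]
n1, n0, n = len(y1), len(y0), len(total)

# Ordinate of the unit normal curve at the z that cuts off the area N1/N.
unit_normal = NormalDist()
z = unit_normal.inv_cdf(n1 / n)
ordinate = unit_normal.pdf(z)

r_b = (mean(y1) - mean(y0)) / stdev(total) * (n1 * n0) / (ordinate * n * sqrt(n ** 2 - n))

print(round(z, 2), round(ordinate, 4), round(r_b, 2))
```

The computed ordinate agrees with the tabled value 0.3951 at z = 0.14, and the coefficient rounds to 0.65.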

Self-Test 3

Given the following data:


Subject X1 X2 X3 X4 X5 X6 X7
1 0 1 0 9 6 90 89
2 1 1 0 3 7 88 87
3 0 1 1 1 5 85 83
4 1 0 1 12 14 86 85
5 1 0 0 7 10 82 90
6 0 1 1 8 8 90 85
7 0 0 1 2 4 89 81
8 1 1 1 11 13 83 79
9 0 0 0 15 11 80 75
10 0 1 1 10 9 78 76
11 1 0 1 4 1 95 93
12 0 1 0 5 3 91 88
13 1 0 1 14 12 75 83
14 1 0 0 13 15 70 77
15 1 1 0 6 2 87 81
where X1 = Sex (Male=1, Female=0)
X2 = Educational Attainment (With Master's Degree=1, Without Master's
Degree=0)
X3 = Type of Employment (Public=1, Private=0)
X4 = Rank in Leadership Qualities
X5 = Rank in Creativity Test
X6 = Verbal Aptitude
X7 = Numerical Proficiency
Identify the appropriate correlation analysis to determine the relationship
between:
a. X1 and X2
b. X1 and X3
c. X2 and X3
d. X2 and X4
e. X2 and X6
f. X4 and X5
g. X4 and X6
h. X5 and X6
i. X5 and X7
j. X6 and X7
Interpret the results.

You have just covered several statistical tools. Take a rest and then
move on to the last lesson. I hope that you are enjoying your lessons…

Lesson 3. Correlation Coefficients Involving More than Two Variables

Part Correlation
Part correlation coefficient with symbol rx(y.z) indicates the correlation of X
with Y after the portion of Y that can be predicted linearly from Z has been
removed from Y. Part correlation involves more than two variables. The
computational formula is as follows:

$$ r_{x(y.z)} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{1 - r_{yz}^2}} $$

If the three variables X, Y and Z are measured in interval scale, the
correlation coefficients rxy, rxz, and ryz are computed using the Pearson Product
Moment Coefficient of Correlation.

As an example, we consider the correlation between aptitude (X) and posttest
score (Y) with the linear effect of the pretest score (Z) removed from Y.

Table 16. Data for the Computation of rx(y.z)

Aptitude (X)   Posttest Score (Y)   Pretest Score (Z)
11 16 13
13 15 12
16 13 8
18 17 14
20 18 10
25 20 15
26 22 14
29 24 16
31 26 18

The correlation coefficients are

rxy = 0.93

rxz = 0.69

ryz = 0.85

and the part correlation coefficient is

$$ r_{x(y.z)} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{1 - r_{yz}^2}} = \frac{0.93 - (0.69)(0.85)}{\sqrt{1 - (0.85)^2}} $$

$$ r_{x(y.z)} = 0.65 $$

There is a marked correlation between aptitude and posttest score.
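
The whole procedure, from the raw data in Table 16 to the part correlation, can be sketched in Python as follows. The helper `pearson_r` and the variable names are illustrative. The three Pearson coefficients are rounded to two decimals before substitution, mirroring the hand computation; carrying full precision instead gives a slightly smaller value of about 0.64.

```python
from math import sqrt

# Data from Table 16.
aptitude = [11, 13, 16, 18, 20, 25, 26, 29, 31]   # X
posttest = [16, 15, 13, 17, 18, 20, 22, 24, 26]   # Y
pretest = [13, 12, 8, 14, 10, 15, 14, 16, 18]     # Z


def pearson_r(x, y):
    """Pearson r from the raw-score (computational) formula."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den


# Round to two decimals, as in the worked example.
r_xy = round(pearson_r(aptitude, posttest), 2)
r_xz = round(pearson_r(aptitude, pretest), 2)
r_yz = round(pearson_r(posttest, pretest), 2)

part_r = (r_xy - r_xz * r_yz) / sqrt(1 - r_yz ** 2)

print(r_xy, r_xz, r_yz, round(part_r, 2))
```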

Partial Correlation

Partial correlation, also involving more than two variables, measures the
correlation between X and Y with Z "partialed out" or held constant. The partial
correlation coefficient has the symbol rxy.z.

The formula for rxy.z is

$$ r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} $$

Suppose we administer a school ability test and physical fitness test to
Grade IV and VI pupils and the data are as follows:

Table 17. Data for the Computation of rxy.z

School Ability (X)   Physical Fitness (Y)   Age (Z)
12 16 10
13 15 10
15 13 10
18 16 10
20 12 11
22 18 11
24 19 11
25 20 12
29 18 12
30 17 12

The question we want to answer is: Is school ability correlated with
physical fitness with age held constant?

We then obtain the Pearson Correlation Coefficients:

rxy = 0.57

rxz = 0.94

ryz = 0.57

The partial correlation coefficient is

$$ r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} = \frac{0.57 - (0.94)(0.57)}{\sqrt{\left[1 - (0.94)^2\right]\left[1 - (0.57)^2\right]}} $$

$$ r_{xy.z} = 0.12 $$

The result indicates that there is a slight correlation between school ability
and physical fitness with age held constant.
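
The drop from rxy = 0.57 to rxy.z = 0.12 can be reproduced with the short Python sketch below. The function name `partial_r` is illustrative, and the inputs are the rounded Pearson coefficients quoted in the text.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Rounded Pearson coefficients for school ability (X),
# physical fitness (Y) and age (Z).
r_xy_z = partial_r(r_xy=0.57, r_xz=0.94, r_yz=0.57)

print(round(r_xy_z, 2))  # 0.12
```

Because school ability is so strongly related to age (rxz = 0.94), most of the apparent association between school ability and physical fitness disappears once age is held constant.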

Multiple Correlation

When two variables X1 and X2 are correlated with a single variable Y, the
appropriate measure is multiple correlation. The coefficient (ry.12) is given
by

$$ r_{y.12} = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2\,r_{y1}\,r_{y2}\,r_{12}}{1 - r_{12}^2}} $$

where the subscripts 1 and 2 refer to the variables X1 and X2, respectively. The
variables X1 and X2 are called predictors of variable Y. The prediction equation is

$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 $$

where

$$ b_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}\cdot\frac{s_y}{s_1} $$

$$ b_2 = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}\cdot\frac{s_y}{s_2} $$

$$ a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 $$

The correlation coefficients ry1, ry2 and r12 are computed using the
Pearson r.

To illustrate this method, we calculate the correlation of academic
performance (Y) with mental ability (X1) and aptitude (X2). The data are given in
Table 18:

Table 18. Data for the Computation of ry.12

X1 X2 Y
4 5 5
5 6 5
5 5 6
6 4 5
6 6 8
6 8 7
7 7 8
7 9 9
8 10 9
9 8 8
10 9 10

Means:

X̄1 = 6.64   X̄2 = 7.00   Ȳ = 7.27

Standard Deviations:

S1 = 1.80 S2 = 1.95 SY = 1.79

Correlation Coefficients:

r12 = 0.74   ry1 = 0.84   ry2 = 0.86.

The constants b1, b2 and a are computed as follows:

$$ b_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}\cdot\frac{s_y}{s_1} = \frac{0.84 - (0.86)(0.74)}{1 - (0.74)^2}\cdot\frac{1.79}{1.80} $$

$$ b_1 = 0.45. $$

$$ b_2 = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}\cdot\frac{s_y}{s_2} = \frac{0.86 - (0.84)(0.74)}{1 - (0.74)^2}\cdot\frac{1.79}{1.95} $$

$$ b_2 = 0.48. $$

$$ a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 7.27 - (0.45)(6.64) - (0.48)(7.00) = 7.27 - 2.99 - 3.36 $$

$$ a = 0.92. $$

Thus, the prediction equation is

Ŷ = a + b1X1 + b2X2

Ŷ = 0.92 + 0.45X1 + 0.48X2 .

The above regression equation is important because it can be used to
predict Y (academic performance) from new values of X1 and X2.

To obtain the multiple correlation coefficient, we use

$$ r_{y.12} = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2\,r_{y1}\,r_{y2}\,r_{12}}{1 - r_{12}^2}} = \sqrt{\frac{(0.84)^2 + (0.86)^2 - 2(0.84)(0.86)(0.74)}{1 - (0.74)^2}} $$

$$ r_{y.12} = 0.91. $$

The multiple correlation coefficient indicates that X1 and X2, taken as a
whole, correlate highly with Y. Together with ry1 = 0.84 and ry2 = 0.86, ry.12 =
0.91 reveals that X1 (mental ability) and X2 (aptitude) are good predictors of Y
(academic performance).
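
The computations for this example can be gathered into one Python sketch. It starts from the summary statistics quoted for Table 18; the names, including the helper `predict`, are illustrative. As in the hand computation, b1 and b2 are rounded to two decimals before the constant a is computed.

```python
from math import sqrt

# Summary statistics quoted in the text for Table 18.
r_y1, r_y2, r_12 = 0.84, 0.86, 0.74
mean_1, mean_2, mean_y = 6.64, 7.00, 7.27
s_1, s_2, s_y = 1.80, 1.95, 1.79

# Multiple correlation of Y with X1 and X2 taken together.
r_multiple = sqrt((r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2))

# Regression weights, rounded to two decimals as in the worked example.
b_1 = round((r_y1 - r_y2 * r_12) / (1 - r_12 ** 2) * s_y / s_1, 2)
b_2 = round((r_y2 - r_y1 * r_12) / (1 - r_12 ** 2) * s_y / s_2, 2)
a = mean_y - b_1 * mean_1 - b_2 * mean_2


def predict(x1, x2):
    """Predicted academic performance for given X1 and X2."""
    return a + b_1 * x1 + b_2 * x2


print(round(r_multiple, 2), b_1, b_2, round(a, 2))
print(round(predict(6.64, 7.00), 2))  # prediction at the means returns the mean of Y
```

Evaluating the prediction equation at the two predictor means returns the mean of Y, which is a convenient check on the constant a.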

In general, if there are k predictors, the regression equation is written
as

$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k $$

Considering more relevant predictors generally makes the prediction more
accurate, although not every added predictor contributes meaningfully. To find
out which predictors are significant, we can employ a Stepwise Regression
Analysis using the computer.

Self-Test 4

Given the following data:

Age (V)   Aptitude (W)   Mental Ability (X)   Achievement Test/Post Test (Y)   Pretest Score (Z)
16 17 18 23 19
15 22 19 28 22
18 16 15 24 17
14 23 25 23 18
17 19 23 27 22
16 20 21 30 26
18 26 28 32 24
19 25 29 21 13
20 27 35 25 19
18 24 36 32 26

a) Compute rX(Y.Z). Interpret the result.

b) Compute rXY.Z. Interpret the result.

c) Compute rY.WX. Interpret the result.

d) Determine the regression equation considering W and X as predictors.
Are W and X good predictors of performance in the Achievement Test?

Congratulations for completing Module 3!
…Before you proceed to the next module, please submit
your answers to the Self-Tests to your professor...

References

Deauna, M. Applied Statistics (Part II).

Glass, G. V. & Hopkins, K. D. (1984). Statistical Methods in Education and
Psychology. (2nd ed.). New Jersey: Prentice-Hall, Inc.

Snedecor, G. W. & Cochran, W. G. (1980). Statistical Methods. (7th ed.). Iowa:
The Iowa State University Press.

Spiegel, M. R. Theory and Problems of Statistics. (2nd ed.). McGraw-Hill Book
Company.
