
CONTENTS

• Meaning and Uses of Correlation
• Graphs of Degree of Correlation and the Correlation Coefficient
• Pearson r or Product-Moment Coefficient of Correlation
• Spearman Rho Coefficient of Correlation
• Test of Significance
• The Chi-Square Test
• Kendall's Tau Correlation Between Ranks - No Ties
• Kendall's Tau Correlation Between Ranks - With Ties
• Kendall's Coefficient of Concordance W
• The Phi or Fourfold Coefficient
• The Tetrachoric r
• No Special Correlation Coefficient: Partial r and Multiple Correlation
• Gamma
• Correlation Between Nominal Data: Lambda
• Correlation Between an Interval and a Nominal Variable: The Correlation Ratio

Meaning and Uses of Correlation

Correlation is a measure of the relationship between two variables. The coefficient of correlation helps determine the validity, reliability and objectivity of an examination that has been prepared. It also indicates the amount of agreement or disagreement between groups of scores, measurements, or individuals. Correlation ranges in value from +1.00 through 0.00 down to -1.00.

The interpretation of the ranges is shown below:

    0               - No correlation
    ±0.01 to ±0.20  - Slight correlation, almost negligible relationship
    ±0.21 to ±0.40  - Slight correlation, but small relationship
    ±0.41 to ±0.70  - Moderate correlation, substantial relationship
    ±0.71 to ±0.90  - High correlation, marked relationship
    ±0.91 to ±0.99  - Very high correlation, very dependable relationship
    ±1.00           - Perfect correlation

Graphs of Degree of Correlation and the Correlation Coefficient

The degree of correlation can be indicated by a number called the correlation coefficient, the symbol for which is the small letter r. Its value ranges:

– from 0 to +1 for positive correlation
– from -1 to 0 for negative correlation

These varying degrees of correlation may be depicted by scatter graphs plotted on x-y axes, each corresponding to a value of the correlation coefficient:

• Perfect positive correlation, r = +1
• Some positive correlation, r between 0 and +1
• No correlation, r = 0 (nil)
• Some negative correlation, r between -1 and 0
• Perfect negative correlation, r = -1

Pearson r or Product-Moment Coefficient of Correlation

The Pearson r is used to find the correlation between individual data.

a. The Pearson r from raw scores

The formula is:

    r = (NΣXY − ΣXΣY) / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

Where:
    N   – number of cases
    ΣXY – sum of the products of X and Y
    ΣX  – sum of X
    ΣY  – sum of Y
    ΣX² – sum of the squares of X
    ΣY² – sum of the squares of Y

The steps are as follows:

1. Get the totals of the data under X and Y, here 268 and 291.
2. Square each of the data in X and Y and get their totals, here 7296 and 7419.
3. Find the product of each pair of X and Y values and get the sum.
4. Substitute the corresponding values of N, ΣXY, ΣX, ΣY, ΣX² and ΣY² in the formula.

Example 1

STUDENT      X      Y      XY      X²      Y²
1           21     27     567     441     729
2           22     28     616     484     784
3           28     27     756     784     729
4           27     10     270     729     100
5           48     30    1440    2304     900
6           22     21     462     484     441
7           27     27     329     729     729
8            6     21     176      36     441
9           11     21     231     121     441
10          12     28     336     144     784
11          16     30     480     256     900
12          28     31     588     784     441
Total      268    291    6601    7296    7419

    r = (NΣXY − ΣXΣY) / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
    r = [12(6601) − (268)(291)] / √{[12(7296) − (268)²][12(7419) − (291)²]}
    r = 1224 / √[(15728)(4347)]
    r = 1224 / 8268.59
    r = 0.148 or 0.15

There is a slight positive correlation between variables X and Y.

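As a quick check of the arithmetic, here is a minimal Python sketch (not part of the original handout) that plugs the summary totals used above into the raw-score formula. The helper name is ours.

```python
from math import sqrt

def pearson_r_from_sums(n, sx, sy, sxy, sx2, sy2):
    # r = (NΣXY − ΣXΣY) / sqrt[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Summary totals from Example 1: N, ΣX, ΣY, ΣXY, ΣX², ΣY²
r = pearson_r_from_sums(12, 268, 291, 6601, 7296, 7419)
print(round(r, 2))  # 0.15, matching the worked answer
```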
b. The Pearson r from deviation scores

Example 2: Compute the coefficient of correlation by the product-moment method using the same scores given in Example 1.

The formula is:

    rxy = Σdxdy / √[(Σd²x)(Σd²y)]

Where:
    rxy   – the coefficient of correlation by the product-moment method
    Σd²x  – the sum of the column d²x
    Σd²y  – the sum of the column d²y
    Σdxdy – the sum of the column dxdy

STUDENT  TEST X  TEST Y  dx (x − x̄)  dy (y − ȳ)    d²x      d²y      dxdy
1          21      27      -1.33        2.75        1.77     7.56     -3.66
2          22      28      -0.33        3.75        0.11    14.06     -1.24
3          28      27       5.67        2.75       32.15     7.56     15.59
4          27      10       4.67      -14.25       21.81   203.06    -66.55
5          48      30      25.67        5.75      658.95    33.06    147.60
6          22      21      -0.33       -3.25        0.11    10.56      1.07
7          27      27       4.67        2.75       21.81     7.56     12.84
8           6      21     -16.33       -3.25      266.67    10.56     53.07
9          11      21     -11.33       -3.25      128.37    10.56     36.82
10         12      28     -10.33        3.75      106.71    14.06    -38.74
11         16      30      -6.33        5.75       40.06    33.06    -36.40
12         28      31       5.67       -3.25       32.15    10.56    -18.43

Σx = 268, Σy = 291, x̄ = 22.33, ȳ = 24.25
Σd²x = 1310.69, Σd²y = 362.22, Σdxdy = 101.97

The steps are as follows:

1. Get the totals of the data under test X and test Y, 268 and 291.
2. Get the mean of test X: x̄ = 268/12 = 22.33.
3. Get the mean of test Y: ȳ = 291/12 = 24.25.
4. Get the deviations dx and dy by getting the difference between each score and its mean.
5. Square each deviation to obtain d²x and d²y.
6. Get the summations of d²x and d²y, 1310.69 and 362.22.
7. Get the product of dx and dy for each pair to obtain dxdy.
8. Get the summation of dxdy: Σdxdy = 101.97.

Applying the formula:

    rxy = Σdxdy / √[(Σd²x)(Σd²y)]
    rxy = 101.97 / √[(1310.69)(362.22)]
    rxy = 0.15
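A small check (not part of the handout) that the deviation-score formula reproduces the same value as Example 1, using the column totals listed above:

```python
from math import sqrt

sum_dxdy, sum_d2x, sum_d2y = 101.97, 1310.69, 362.22
r = sum_dxdy / sqrt(sum_d2x * sum_d2y)   # r = Σdxdy / sqrt(Σd²x · Σd²y)
print(round(r, 2))  # 0.15
```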
SPEARMAN RHO COEFFICIENT OF CORRELATION

The Spearman rho coefficient of correlation is used for ordinal data.

The formula is:

    rho = 1 − 6ΣD² / [N(N² − 1)]

Where:
    rho – the coefficient of correlation by the rank-difference method
    ΣD² – the sum of the column D²
    N   – the number of pairs of scores or measures
    1 and 6 – constants

The steps are as follows:

1. Follow the same steps 1–5 as in the previous method, then rank the scores under test X and test Y to obtain the columns Rx and Ry.
2. Find the difference between the two sets of ranks under columns Rx and Ry. It does not matter whether the difference is found by subtracting the Rx values from the Ry values or the Ry values from the Rx values, as long as the smaller value is subtracted from the larger one.
3. Write the difference of the values under the Rx and Ry columns under column D, which means difference.
4. Square each value under column D and write the result under column D².
5. Get the sum of the values under column D² and call this ΣD².

Example 1

STUDENT  TEST X  TEST Y    Rx     Ry      D       D²
1          21      27       8      6      2        4
2          22      28      6.5    3.5     3        9
3          28      27      2.5     6     3.5     12.25
4          27      10      4.5    12     7.5     56.25
5          48      30       1     1.5    0.5      0.25
6          22      21      6.5    9.5     3        9
7          27      27      4.5     6     1.5      2.25
8           6      21      12     9.5    2.5      6.25
9          11      21      11     9.5    1.5      2.25
10         12      28      10     3.5    6.5     42.25
11         16      30       9     1.5    7.5     56.25
12         28      21      2.5    9.5     7       49
                                        ΣD² = 249

By substitution:

    rho = 1 − 6ΣD² / [N(N² − 1)]
    rho = 1 − 6(249) / [12(12² − 1)]
    rho = 1 − 1494 / [12(143)]
    rho = 1 − 0.87
    rho = 0.13

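Here is a minimal sketch (function name ours) of the rank-difference formula, plugging in the totals from the worked example:

```python
def spearman_rho(sum_d2, n):
    # rho = 1 − 6ΣD² / [N(N² − 1)]
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(round(spearman_rho(249, 12), 2))  # 0.13
```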
TEST OF SIGNIFICANCE OF A CORRELATION

EXAMPLE: To establish the relation between the grades of students and the number of hours that they spend studying a subject, the grades of a sample of 8 students are taken after an examination on the subject. Find the:

a. Standard error of estimate.
b. Coefficient of determination, and interpret the results.

The columns for hours of study (X) and grade (Y) are shown below. The other columns, which are derived from the X and Y columns, are used in the computations.

Hrs. of Study X   Grade Y      XY       X²       Y²
      23            80        1840      529     6400
      32            96        3072     1024     9216
      39            92        3588     1521     8464
      25            72        1800      625     5184
      27            85        2295      729     7225
      37           100        3700     1369    10000
      28            78        2184      784     6084
      21            69        1449      441     4761
ΣX = 232   ΣY = 672   ΣXY = 19928   ΣX² = 7022   ΣY² = 57334

All the terms in the formula are the sums of these columns. Substituting these sums into the formula, we have:

    r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
    r = [8(19928) − (232)(672)] / √{[8(7022) − (232)²][8(57334) − (672)²]}
    r = 0.86

Using the Pearson correlation coefficient table, the critical values for this sample are:

    5% or 0.05 level of significance: 0.707
    1% or 0.01 level of significance: 0.834

Solving for the coefficient of determination:

    r² = (0.86)² = 0.7396 or 74%

Conclusion:

Since the computed value r of 0.86 is greater than the tabular value of 0.707, we reject Ho (and accept Ha). Therefore there is sufficient evidence to conclude that there is a significant linear correlation between the two variables, hours of study (X) and grade (Y).

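A hedged sketch of the significance check described above: compute r from the column sums and compare it with the tabled critical value quoted in the handout for 8 pairs at the 0.05 level.

```python
from math import sqrt

n, sx, sy, sxy, sx2, sy2 = 8, 232, 672, 19928, 7022, 57334
r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
r_critical = 0.707   # tabular value quoted above for the 0.05 level

print(round(r, 2), round(r ** 2, 2))   # 0.86 and the coefficient of determination 0.74
print("reject Ho" if abs(r) > r_critical else "fail to reject Ho")
```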
THE CHI-SQUARE TEST (χ²)

The chi-square test does not require the basic assumption of normality of distribution before it can be used. It is a distribution-free, or non-parametric, test. Chi-square is used for data presented in frequencies (nominal data).

TEST OF GOODNESS OF FIT

This is a one-sample (one-variable) chi-square with two or more categories. The test of goodness of fit is used to determine whether a certain distribution differs from some pre-determined theoretical distribution.

EXAMPLE 1: A 25-centavo coin is tossed 60 times. It was observed that 25 heads and 35 tails appear. Is the coin fair?

If the coin is fair, the expected frequencies would be 30 heads and 30 tails. The hypothesis can be tested using the general formula for chi-square, which is as follows:

    χ² = Σ (O − E)² / E

Where:
    O – the observed frequencies
    E – the expected frequencies

        O      E      O − E    (O − E)²    (O − E)²/E
H      25     30       -5        25           0.83
T      35     30        5        25           0.83
                                        χ² = 1.66

    df = k − 1

Where k = the number of categories. The degrees of freedom for this example is 2 − 1 = 1.

Using the table for chi-square, the tabular value for 1 degree of freedom at the 0.05 level of significance is 3.841. The computed value of 1.66 is less than the tabular value, hence we accept the null hypothesis of no difference between the observed and expected frequencies at the 0.05 level of significance. Therefore, we say that the coin is fair.

TEST OF INDEPENDENCE

The test of independence can be used if two variables are involved, each variable consisting of two or more categories. The object of the test is to determine whether the variables are related, or whether one of the variables is dependent on the other.

Example 2: To test the hypothesis that educational attainment is independent of socio-economic status, a survey was conducted. The results are given below.
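A minimal sketch of the goodness-of-fit computation for the coin example, assuming SciPy is available (the hand formula above would do just as well):

```python
from scipy.stats import chisquare

observed = [25, 35]
expected = [30, 30]
stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2))   # 1.67, essentially the 1.66 obtained above from rounded terms
print(p > 0.05)         # True: fail to reject the hypothesis that the coin is fair
```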

Educational                Socio-economic status
attainment           High    Average    Low    Total
ELEMENTARY            10       20        30      60
SECONDARY             25       22        20      67
TERTIARY              35       30        15      80
Total                 70       72        65     207

Use the 0.05 level to test the hypothesis that educational attainment is independent of SES.

The same formula is applied:

    χ² = Σ (O − E)² / E

The expected frequencies can be found using the formula:

    E = (row total × column total) / grand total

    df = (c − 1)(r − 1)

Where:
    c = the number of columns
    r = the number of rows

Computation

     O        E       O − E     (O − E)²    (O − E)²/E
    10      20.29    -10.29     105.8841      5.2185
    20      20.87     -0.87       0.7569      0.0363
    30      18.84     11.16     124.5456      6.6107
    25      22.66      2.34       5.4756      0.2416
    22      23.30     -1.30       1.6900      0.0725
    20      21.04     -1.04       1.0816      0.0514
    35      27.05      7.95      63.2025      2.3365
    30      27.83      2.17       4.7089      0.1692
    15      25.12    -10.12     102.4144      4.0770
                                         χ² = 18.8137
For the example,

    df = (3 − 1)(3 − 1) = 2(2) = 4

Conclusion:

The computed value of 18.8137 is greater than the tabular value of 13.277 at the 0.01 level of significance with df = 4. The null hypothesis of no relationship between educational attainment and SES is therefore rejected: educational attainment is dependent on SES.

Example 3: The sales manager is asked by the corporate board to provide a more specific answer, through empirical research, with regard to the importance of location in selling a particular type of product. The sales manager gives you the following data to find out whether a dependency relationship exists between sales volume and the location of 80 establishments.

                            Location
Sales Volume     Quiapo   Cubao   Pasig   Makati   Total
High               10        8       9      10       37
Moderate            8        6       5       4       23
Average             3        3       4       2       12
Low                 2        3       2       1        8
TOTAL              23       20      20      17       80

     O        E       O − E     (O − E)²    (O − E)²/E
    10      10.64     -0.64      0.4096       0.0385
     8       9.25     -1.25      1.5625       0.1689
     9       9.25     -0.25      0.0625       0.0068
    10       7.86      2.14      4.5796       0.5826
     8       6.61      1.39      1.9321       0.2923
     6       5.75      0.25      0.0625       0.0109
     5       5.75     -0.75      0.5625       0.0978
     4       4.89     -0.89      0.7921       0.1620
     3       3.45     -0.45      0.2025       0.0587
     3       3.00      0.00      0.0000       0.0000
     4       3.00      1.00      1.0000       0.3333
     2       2.55     -0.55      0.3025       0.1186
     2       2.30     -0.30      0.0900       0.0391
     3       2.00      1.00      1.0000       0.5000
     2       2.00      0.00      0.0000       0.0000
     1       1.70     -0.70      0.4900       0.2882
                                         χ² = 2.6977

    df = (c − 1)(r − 1)
    df = (4 − 1)(4 − 1)
    df = (3)(3)
    df = 9

Conclusion:

The computed value of χ² = 2.6977 is less than the tabular value of 21.666 at the 1% level of significance with df = 9. We therefore accept the null hypothesis: no dependency relationship exists between sales volume and the location of the 80 establishments.

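A hedged sketch of the test of independence for the sales-volume data, assuming SciPy is available; its routine computes the expected counts with the same row total × column total / grand total rule used above.

```python
from scipy.stats import chi2_contingency

table = [
    [10, 8, 9, 10],   # High
    [8, 6, 5, 4],     # Moderate
    [3, 3, 4, 2],     # Average
    [2, 3, 2, 1],     # Low
]
stat, p, dof, expected = chi2_contingency(table)
print(round(stat, 2), dof)   # roughly 2.7 with df = 9
print(p > 0.01)              # True: no dependency between sales volume and location
```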
Kendall's Tau Correlation Between Ranks

Kendall's tau (T) can be applied wherever the Spearman rank-order coefficient is applicable. As can be seen below, it is somewhat harder to compute than rs. When there are no ties, the solution is short and simple, as with the data in table 8.3. It is imperative that one of the series of ranks be in its natural order, that is, starting from high to low. For each individual, the number of ranks below it that are higher and lower on the Y variable is counted. For example, 3 persons rank higher than the first individual on test Y and 3 rank lower. In the case of the second individual, 4 rank higher and 1 ranks lower. This counting continues until the columns in table 8.3 are completed. In doing this, one looks only at the Ry's that lie below the score of the individual being examined. Each column is then summed, the column with the number of ranks higher being called P and the other column Q.

Table 8.3 Calculation of Kendall's T Coefficient - No Ties

Tau is then calculated:

    T = (P − Q) / [N(N − 1)/2]
    T = (17 − 4) / [7(6)/2]
    T = 13 / 21
    T = 0.62

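Here is a minimal sketch (helper name ours) of the pair counting described above, assuming two lists of rank numbers with no ties:

```python
def kendall_tau(x_ranks, y_ranks):
    pairs = sorted(zip(x_ranks, y_ranks))          # put the X ranks in natural order
    y = [ry for _, ry in pairs]
    n = len(y)
    P = sum(1 for i in range(n) for j in range(i + 1, n) if y[j] > y[i])  # concordant pairs
    Q = sum(1 for i in range(n) for j in range(i + 1, n) if y[j] < y[i])  # discordant pairs
    return (P - Q) / (n * (n - 1) / 2)

# With the worked totals above (P = 17, Q = 4, N = 7):
print(round((17 - 4) / (7 * 6 / 2), 2))  # 0.62
```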
Table 8.4 Calculation of Kendall's T Coefficient - With Ties

When ties appear, certain adjustments have to be made. The number of individuals ranking higher and lower than each individual on the Y variable is again determined, resulting in P = 33 and Q = 11.

In handling ties we first take the X distribution and for each set of ties determine x(x − 1), where x is the number tied for a particular rank. These are summed and divided by 2. We have:

    [2(2 − 1) + 3(3 − 1)] / 2 = (2 + 6) / 2 = 8/2 = 4

We repeat the process for the Y distribution:

    2(2 − 1) / 2 = 2/2 = 1

Next we calculate:

    N(N − 1)/2 = 10(9)/2 = 45

The correction obtained above for each distribution is subtracted from this:

    45 − 4 = 41
    45 − 1 = 44

We next multiply these two terms:

    (44)(41) = 1804

and take the square root of the product:

    √1804 = 42.5

Then:

    T = (P − Q) / 42.5
    T = 22 / 42.5
    T = 0.52

Kendall's tau, like rs, has many applications. For reasons beyond the scope of this text, many statisticians prefer tau over rs. As illustrated, when both rs and tau are computed for the same data, tau is the smaller. The range of tau is the same as that of rs, and both statistics are interpreted in the same way.
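A sketch of the tie-corrected denominator used above, assuming the tie-group sizes and the P, Q totals are already known (P = 33, Q = 11, N = 10 in the example):

```python
from math import sqrt

def tau_with_ties(P, Q, n, ties_x, ties_y):
    # ties_x / ties_y: sizes of each group of tied ranks in the X and Y series
    Tx = sum(t * (t - 1) for t in ties_x) / 2
    Ty = sum(t * (t - 1) for t in ties_y) / 2
    pairs = n * (n - 1) / 2
    return (P - Q) / sqrt((pairs - Tx) * (pairs - Ty))

print(round(tau_with_ties(33, 11, 10, ties_x=[2, 3], ties_y=[2]), 2))  # 0.52
```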
Kendall's Coefficient of Concordance W

If we wish to determine the relationship among three or more sets of ranks, one set could be selected and a Spearman rs coefficient computed between it and each of the others, and this process could be continued until an rs coefficient has been obtained between each pair of rank sets. These rs's could then be averaged for an overall measure of relationship.

Kendall, though, has developed a technique and a statistic that make all of this unnecessary. Suppose that five judges (m) rank the projects of ten individuals (N) in a judging contest, and we wish to determine the overall relationship among the ratings of the five judges. The rankings of these judges have been set up in table 8.5. First the rankings given by the five judges to each of the projects are summed. The sums appear in column 3. Then column 3 is summed to give the total sum of the ranks. This can be checked against the total sum of ranks computed as follows:

    Total sum of ranks = m(N)(N + 1)/2 = 5(10)(11)/2 = 275

If there were no relationship among the ranks, we should expect the sum of the ranks for each row to be equal. For this case the sum of each would be the average sum of ranks, 275/10, which equals 27.5. We next obtain the difference from this mean of the sum of the ranks of each row and square these differences. Then these squares are summed. This work appears in columns 4 and 5 of Table 8.5.

Table 8.5: CALCULATION OF THE COEFFICIENT OF CONCORDANCE, THE DATA CONSISTING OF THE RANKINGS OF 10 PROJECTS BY 5 JUDGES

                   JUDGES' RANKS
INDIVIDUAL     1    2    3    4    5    SUM OF      D        D²
PROJECTS                                 RANKS
1              2    1    2    3    4      12       15.5    240.25
2              1    3    1    2    2       9       18.5    342.25
3              3    4    4    1    3      15       12.5    156.25
4              4    5    5    5    1      21        6.5     42.25
5              5    2    6    7    6      25        2.5      6.25
6              6    8    3    4    7      29        1.5      2.25
7              7    6    8    6    5      31        3.5     12.25
8              8    7    7    8    9      39       11.5    132.25
9              9   10   10    9    8      46       18.5    342.25
10            10    9    9   10   10      48       20.5    420.25
                                      Σ = 275         ΣD² = 1696.50

To compute W, we use the following formula:

    W = 12ΣD² / [m²(N)(N² − 1)]
    W = 12(1696.5) / [25(10)(100 − 1)]
    W = 0.82

INTERPRETATION OF W

The size of this coefficient of concordance indicates that there is high agreement among these five judges in the ranking of the ten projects. Perfect agreement is indicated by W = 1 and lack of agreement by W = 0.
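A minimal sketch (function name ours) of W for the judging example: sum each project's ranks, take squared deviations from the mean rank sum, and apply the formula.

```python
def kendall_w(rank_sums, m):
    N = len(rank_sums)
    mean = sum(rank_sums) / N
    s = sum((r - mean) ** 2 for r in rank_sums)      # ΣD²
    return 12 * s / (m ** 2 * N * (N ** 2 - 1))

sums = [12, 9, 15, 21, 25, 29, 31, 39, 46, 48]       # column 3 of table 8.5
print(round(kendall_w(sums, m=5), 2))                # 0.82
```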
LINEAR RELATIONSHIP

The Point-Biserial Coefficient

There are circumstances, especially in the field of test construction and test validation, where one of the variables is continuous and the other is conceived of as a dichotomy. In the usual scoring of items, the procedure is to mark the item either right or wrong. The right-wrong scoring is regarded as being a true dichotomy.

The point-biserial coefficient will be considered in some detail because it is a Pearson product-moment coefficient, and it is widely used in test construction and analysis. First, we shall show that the point-biserial is just a special case of the product-moment correlation, and then we shall consider a method of calculating it.

Suppose we call the test scores (the continuous variable) Y and the responses to the dichotomous variable X. In this case X is either right or wrong and is scored either 1 or 0. It follows then that ΣX is actually the number of individuals who responded correctly to the item. This will be designated as Np, the number passing the item or answering it correctly. Since each X is either a 1 or a 0, ΣX² will also be equal to Np, and we shall call the total frequency Nt. Next we obtain the sum of squares for X:

    Σx² = ΣX² − (ΣX)²/Nt
        = Np − Np²/Nt
        = Np(Nt − Np)/Nt

Since Nt − Np = Nw, the number missing the item or responding to it incorrectly:

    Σx² = NpNw/Nt

Also, in obtaining ΣXY, only those values where X = 1 enter into the calculations, so ΣXY may actually be written as ΣfpY, each Y value multiplied by the frequency passing.

Let us now take equation (7.2) and write it with the three basic parts as fractions:

    r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N][ΣY² − (ΣY)²/N]}

and substitute the information presented above into it. We now have:

    r = [ΣfpY − Np(ΣfY)/Nt] / √{(NpNw/Nt)[ΣfY² − (ΣfY)²/Nt]}

When this is cleared of fractions, as in formula (7.2), it becomes:

    rpb = [Σf(ΣfpY) − Σfp(ΣfY)] / √{ΣfpΣfw[Σf(ΣfY²) − (ΣfY)²]}    (8.5)

The use of this formula is illustrated with the data in table 8.6, which presents the scores of 90 individuals on a short test of perception (Y) and on a test of ability to visualize the number of blocks in a geometric figure (X). The test was scored right or wrong, 1 or 0.

Table 8.6: SCORES ON A CONTINUOUS AND ON A DICHOTOMOUS VARIABLE TO ILLUSTRATE THE COMPUTATION OF THE POINT-BISERIAL

 (1)     (2)     (3)     (4)     (5)      (6)      (7)
  Y       fp      fw      f       fY      fY²      fpY
 10        2       0       2      20      200       20
  9        4       0       4      36      324       36
  8        6       1       7      56      448       48
  7        7       1       8      56      392       49
  6        8       2      10      60      360       48
  5        6       4      10      50      250       30
  4        5       6      11      44      176       20
  3        3       8      11      33       99        9
  2        2       7       9      18       36        4
  1        1       8       9       9        9        1
  0        0       9       9       0        0        0
        Σfp=44  Σfw=46   Σf=90  ΣfY=382  ΣfY²=2294  ΣfpY=265

Column 1 consists of the scores on test Y. In column 2 are the frequencies of those who responded correctly to test X, tallied against the various Y scores. In column 3 are the frequencies of those who failed test X. Column 4 presents the total frequencies for each Y score. The values in column 5 (fY) are the products obtained by multiplying the respective values in columns 1 and 4. Values in column 6 (fY²) are obtained by multiplying the values in column 5 by their respective values in column 1. Column 7 (fpY) values are obtained by multiplying the values in column 1 by the values in column 2. Then the column sums are entered into equation (8.5):
    rpb = [90(265) − 44(382)] / √{44(46)[90(2294) − (382)²]}
    rpb = (23850 − 16808) / √[2024(206460 − 145924)]
    rpb = 7042 / √[2024(60536)]
    rpb = 7042 / 11069
    rpb = 0.64

Some comments on the point-biserial r. In test analysis and construction work, where this statistic is widely used, computer programs exist that make obtaining these correlation coefficients both rapid and efficient. When the continuous variable is normally distributed and the dichotomous variable is split 50-50, that is, p = .50, it is generally accepted that rpb has a maximum value of .80. Later research (Karabinus, 1975) has shown that when the shape of the distribution of the continuous variable departs from normal, the computed rpb may exceed .80. Also, if the continuous variable is platykurtic or rectangular, coefficients in the .80s may be expected. If the continuous variable is bimodal, coefficients above .90 may result.
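A hedged sketch (function name ours) of formula (8.5) applied to the column totals of table 8.6:

```python
from math import sqrt

def point_biserial(sum_f, sum_fp, sum_fw, sum_fY, sum_fY2, sum_fpY):
    num = sum_f * sum_fpY - sum_fp * sum_fY
    den = sqrt(sum_fp * sum_fw * (sum_f * sum_fY2 - sum_fY ** 2))
    return num / den

print(round(point_biserial(90, 44, 46, 382, 2294, 265), 2))  # 0.64
```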

THE BISERIAL r. The student may encounter in the literature a correlation coefficient known as the biserial correlation coefficient. For many years this statistic was much used in test construction work, and numerous charts and other aids were devised to help in obtaining it rapidly. This statistic is only an estimate of a Pearson r and is a less reliable statistic. In this age of digital computers, there is little need for it.

THE PHI OR FOURFOLD COEFFICIENT

The phi coefficient is used when each of the variables is a dichotomy. To illustrate this technique, let us suppose that we are making an analysis of the relation between an opinion item and an information item that have been administered to 200 students. Suppose 100 students agree with the opinion and the other 100 disagree. We set the data up as shown in table 8.7, with the agree and disagree groups on the side and the scores of right and wrong across the top. The number in the agree and disagree groups that answer the information item correctly and the number that answer it incorrectly are entered in the cells. Note that the cells are lettered and that the marginal values have been computed and given letters.

TABLE 8.7 RESPONSES OF THE 200 STUDENTS TO THE TEST ITEMS

              RIGHT       WRONG
AGREE         70 (a)      30 (b)     100 (k)
DISAGREE      30 (c)      70 (d)     100 (l)
             100 (m)     100 (n)     200 (N)

    Φ = (ad − bc) / √(klmn)    (8.6)

where the various letters are the frequencies as shown in table 8.7. Solving for these data, we have:

    Φ = [70(70) − 30(30)] / √[100(100)(100)(100)]
    Φ = (4900 − 900) / 10000
    Φ = 4000 / 10000
    Φ = 0.40

The phi coefficient, like the point-biserial coefficient, is a product-moment correlation coefficient. This makes it a desirable one, and one that is very useful in test construction and analysis. A major limitation of this statistic is that the size of the coefficient is related to the way in which the two variables are split. When both variables are evenly divided, as are the data in table 8.7, the maximum limit of the correlation coefficient, +1, may be obtained. If the marginal totals are unequal, the maximum value will be less than 1.00 and will differ from case to case.

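A minimal sketch (function name ours) of the fourfold formula for the data of table 8.7:

```python
from math import sqrt

def phi(a, b, c, d):
    k, l, m, n = a + b, c + d, a + c, b + d     # marginal totals
    return (a * d - b * c) / sqrt(k * l * m * n)

print(round(phi(70, 30, 30, 70), 2))  # 0.4
```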
As an illustration, take the following example:

            30 (a)    20 (b)     50
            10 (c)    40 (d)     50
            40        60        100

Using formula (8.6), phi is found to be 0.41. The maximum value for such marginal totals would be obtained with a distribution like the following:

            40 (a)    10 (b)     50
             0 (c)    50 (d)     50
            40        60        100

The Tetrachoric r, rt

The tetrachoric correlation rt is a statistic that was widely used in the past as a measure of the relationship between data that were reduced to two dichotomies. This statistic, like the biserial r, is only an estimate of the Pearson r and is a quite unreliable statistic. At the present time there is little justification for the use of this statistic, as the phi coefficient is far superior.

NO SPECIAL CORRELATION COEFFICIENT

PARTIAL r

The relationship between two variables is frequently influenced by a third variable. For example, suppose that we have the correlation between intelligence test scores and arithmetic grades for a set of students, and also the correlation of the same intelligence scores with grades in English. In addition we have the correlation between the arithmetic grades and the English grades. Both of these school subjects are related to intelligence test scores, and they seem to be related to each other. With the partial correlation coefficient it is possible to control these effects of intelligence, or to "partial them out". We might ask what the relation is between English grades and arithmetic grades with the effect of intelligence partialed out. A situation like this is referred to as a partial r of the first order, based upon the zero-order r's.

The general formula for the first-order partial r is:

    r12.3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)]

Similarly, it is possible to write comparable equations for r13.2 and r23.1. Suppose that we have the following three variables:

1. Chronological age
2. Weight
3. Scores on an arithmetic test

For several hundred students, we compute the correlations among the three variables and obtain the following:

    r12 = 0.80
    r13 = 0.60
    r23 = 0.50

From this we see that we have a correlation between weight and the arithmetic test that, with a sample of this size, is significant. Suppose that we investigate the relation between weight and scores on the arithmetic test with the effects of chronological age partialed out:

    r23.1 = (r23 − r12 r13) / √[(1 − r12²)(1 − r13²)]
    r23.1 = [0.50 − (0.80)(0.60)] / √[(1 − 0.80²)(1 − 0.60²)]
    r23.1 = (0.50 − 0.48) / √[(0.36)(0.64)]
    r23.1 = 0.02 / 0.48
    r23.1 = 0.04

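A minimal sketch (function name ours) of the first-order partial correlation formula, checked against the worked example:

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    # correlation of x and y with z held constant
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# r12 = age-weight, r13 = age-arithmetic, r23 = weight-arithmetic
print(round(partial_r(0.50, 0.80, 0.60), 2))  # r23.1 = 0.04
```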
MULTIPLE CORRELATION

Suppose that we have the following three coefficients based upon three variables for a large group of first-year students of a university:

1. Grades
2. Scores on the Ohio State Psychological Examination
3. Scores on the Cooperative Mathematical Test

Thus,

    r12 = 0.50
    r13 = 0.60
    r23 = 0.40

We want to compute the multiple correlation coefficient between grades and the combined effects of the two tests. The formula is:

    R1.23 = √[(r12² + r13² − 2 r12 r13 r23) / (1 − r23²)]

    R1.23 = √{[0.50² + 0.60² − 2(0.50)(0.60)(0.40)] / (1 − 0.40²)}
    R1.23 = √[(0.25 + 0.36 − 0.24) / 0.84]
    R1.23 = √(0.37 / 0.84)
    R1.23 = √0.4405
    R1.23 = 0.66
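A minimal sketch (function name ours) of the multiple correlation of variable 1 with variables 2 and 3, using the three zero-order coefficients from the example:

```python
from math import sqrt

def multiple_r(r12, r13, r23):
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

print(round(multiple_r(0.50, 0.60, 0.40), 2))  # 0.66
```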
GAMMA

An alternative to the rank-order correlation coefficient is Goodman and Kruskal's gamma (G). The value of one variable can be estimated or predicted from the other variable when you have knowledge of their values. Gamma can also be used when ties are found in the ranking of the data.

The formula is:

    G = (Ns − N1) / (Ns + N1)

Where:
    Ns = the number of pairs ordered in the parallel direction
    N1 = the number of pairs ordered in the opposite direction
    G  = the difference between the proportion of pairs ordered in the parallel direction and the proportion of pairs ordered in the opposite direction

Example 1: Compute the gamma for the data shown in table 6.1.

Table 6.1 Employees Ranked on Socio-Economic Status and Educational Status

                         Educational Status
Socio-Economic
Status             Upper    Middle    Lower    Total
Upper               24        19        5        48
Middle              12        54       29        95
Lower                9        26       25        60
Total               45        99       59       203

Solution:

Step 1. Arrange the ordering for one of the two characteristics from the highest to the lowest (or vice versa) from top to bottom through the rows, and for the other characteristic from the highest to the lowest (or vice versa) from left to right through the columns.

Step 2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in all of the cells which are both below and to the right of the original cell, and then sum up the products obtained.

    Ns = [24(54) + 24(29) + 24(26) + 24(25)] + [19(29) + 19(25)] + [12(26) + 12(25)] + [54(25)]
    Ns = 1296 + 696 + 624 + 600 + 551 + 475 + 312 + 300 + 1350
    Ns = 6204

The procedure can also be written as:

    Ns = 24(54 + 29 + 26 + 25) + 19(29 + 25) + 12(26 + 25) + 54(25)
    Ns = 6204

Step 3. To solve N1, you simply reverse part of the process described in Step 2. You multiply the frequency of every cell by the sum of the frequencies in all of the cells which are both below and to the left of the original cell, and then sum up the products obtained.

    N1 = 5(12 + 54 + 9 + 26) + 19(12 + 9) + 29(9 + 26) + 54(9)
    N1 = 60 + 270 + 45 + 130 + 228 + 171 + 261 + 754 + 486
    N1 = 2405

Step 4. Apply the gamma formula:

    G = (Ns − N1) / (Ns + N1)
    G = (6204 − 2405) / (6204 + 2405)
    G = 3799 / 8609
    G = 0.441282
    G = 0.44  (Answer)
A gamma coefficient of +0.44 indicates a moderately small positive correlation between socio-economic status and educational status. The result suggests a correlation based on a dominance of the parallel direction of the two variables. This means there is a 44 percent greater chance of a parallel direction than of an opposite direction for the variables socio-economic status and educational status. If the gamma coefficient were -0.44, it would instead indicate a moderately small negative correlation based on a dominance of the opposite direction.
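A sketch (function name ours) of the Ns / N1 counting, assuming the table is laid out with both variables ordered from highest to lowest as required by Step 1:

```python
def gamma(table):
    rows, cols = len(table), len(table[0])
    Ns = N1 = 0
    for i in range(rows):
        for j in range(cols):
            below_right = sum(table[r][c] for r in range(i + 1, rows)
                              for c in range(j + 1, cols))
            below_left = sum(table[r][c] for r in range(i + 1, rows)
                             for c in range(j))
            Ns += table[i][j] * below_right   # pairs ordered in the parallel direction
            N1 += table[i][j] * below_left    # pairs ordered in the opposite direction
    return (Ns - N1) / (Ns + N1)

status = [[24, 19, 5], [12, 54, 29], [9, 26, 25]]   # table 6.1
print(round(gamma(status), 2))                      # 0.44
```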

Correlation Between Nominal Data

Lambda

The lambda coefficient is represented by the lower-case Greek letter λ and is also known as Guttman's coefficient of predictability. It is defined as a proportionate-reduction-in-error measure, an index of how much the error in predicting the values of one variable from the values of another is reduced. It is another way of measuring to what degree the accuracy of the prediction can be improved. If you have a lambda of 0.80, you have reduced the error of your prediction about the values of the dependent variable by 80 percent; if your lambda is 0.30, you have reduced the error of your prediction by only 30 percent. The lambda coefficient is a measure of association for comparing several groups or categories at the nominal level.

If the dependent variable is regarded as the column variable, the formula is:

    λc = (ΣFbi − Mbc) / (N − Mbc)

Where:
    λc  = the lambda coefficient
    Fbi = the biggest cell frequency in the ith row (with the sum taken over all of the rows)
    Mbc = the biggest of the column totals
    N   = the total number of observations

However, if the dependent variable is regarded as the row variable, the formula to use is:

    λr = (ΣFbj − Mbr) / (N − Mbr)

Where:
    λr  = the lambda coefficient
    Fbj = the biggest cell frequency in the jth column (with the sum taken over all of the columns)
    Mbr = the biggest of the row totals
    N   = the total number of observations

Example 1: Compute λc and λr for the data in table 6.2.

Table 6.2. A Segment of the Filipino Electorate According to Religion and Political Party

                      KBL     LABAN    UNIDO    TOTAL
CATHOLIC               49       25       18       92
IGLESIA NI CRISTO      34       72       21      127
PROTESTANT             26       25       20       71
TOTAL                 109      122       59      290

Solution:

    λc = (ΣFbi − Mbc) / (N − Mbc)
    λc = [(49 + 72 + 26) − 122] / (290 − 122)
    λc = 25 / 168
    λc = 0.15  (Answer)

    λr = (ΣFbj − Mbr) / (N − Mbr)
    λr = [(49 + 72 + 21) − 127] / (290 − 127)
    λr = 15 / 163
    λr = 0.09  (Answer)

The obtained lambda coefficient of .15 indicates that when religion is treated as the independent variable, the error in the prediction is reduced (its accuracy increased) by about 15 percent. The obtained lambda coefficient of .09 indicates that when political party is treated as the independent variable, the error in the prediction is reduced by about 9 percent. These results show that religion predicts political party better than political party predicts religion.
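A sketch of both lambda coefficients for table 6.2, assuming the table is given as a list of rows; the helper names are ours.

```python
def lambda_column_dependent(table):
    # predict the column variable (here, political party) from the row variable
    N = sum(sum(row) for row in table)
    F = sum(max(row) for row in table)             # biggest cell frequency in each row
    Mbc = max(sum(col) for col in zip(*table))     # biggest column total
    return (F - Mbc) / (N - Mbc)

def lambda_row_dependent(table):
    # predict the row variable (here, religion) from the column variable
    N = sum(sum(row) for row in table)
    F = sum(max(col) for col in zip(*table))       # biggest cell frequency in each column
    Mbr = max(sum(row) for row in table)           # biggest row total
    return (F - Mbr) / (N - Mbr)

religion_party = [[49, 25, 18], [34, 72, 21], [26, 25, 20]]
print(round(lambda_column_dependent(religion_party), 2))  # 0.15
print(round(lambda_row_dependent(religion_party), 2))     # 0.09
```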

CORRELATION BETWEEN AN INTERVAL AND NOMINAL DATA

The Correlation Ratio

If the variables you would consider can be measured on two different scales, one on a nominal scale, say X, and the second on an interval scale, say Y, the correlation ratio can best be applied to measure the association of these bivariate variables.

Basil P. Korin (1977) gives an easier formula for the correlation ratio, shown below:

    E² = (ΣNiȲi² − NȲ²) / (ΣrΣcYi² − NȲ²)

Where:
    N        = the total number of observations
    Ni       = the number of observations in the ith group
    Ȳi       = the mean of the ith group
    Ȳ        = the mean of all the observations
    ΣrΣcYi²  = the sum of the squares of all the observations

Example:
Solve for the correlation ratio between sex and the scores obtained by eight male and five female psychology majors in an Abnormal Psychology class at the Philippine Normal College.

Male      13   19   11   26   22   30   25   29
Female    14    9   12    8   15

Solution:

    N₁ = 8     Ȳ₁ = 175/8 = 21.88
    N₂ = 5     Ȳ₂ = 58/5 = 11.60
    N  = 13    Ȳ  = 233/13 = 17.92

    ΣrΣcYi² = 13² + 19² + 11² + 26² + 22² + 30² + 25² + 29² + 14² + 9² + 12² + 8² + 15²
    ΣrΣcYi² = 169 + 361 + 121 + 676 + 484 + 900 + 625 + 841 + 196 + 81 + 144 + 64 + 225
    ΣrΣcYi² = 4887

    E² = (ΣNiȲi² − NȲ²) / (ΣrΣcYi² − NȲ²)
    E² = [8(21.88)² + 5(11.60)² − 13(17.92)²] / [4887 − 13(17.92)²]
    E² = [8(478.73) + 5(134.56) − 13(321.13)] / (4887 − 4174.69)
    E² = (3829.84 + 672.80 − 4174.69) / 712.31
    E² = 327.95 / 712.31
    E² = 0.46  (Answer)

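A minimal sketch (function name ours) of Korin's formula for the sex-by-score example above, with groups supplied as lists of scores:

```python
def correlation_ratio(groups):
    all_scores = [y for g in groups for y in g]
    N = len(all_scores)
    grand_mean = sum(all_scores) / N
    # numerator: ΣNiȲi² − NȲ²;  denominator: ΣΣYi² − NȲ²
    between = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups) - N * grand_mean ** 2
    total = sum(y ** 2 for y in all_scores) - N * grand_mean ** 2
    return between / total

male = [13, 19, 11, 26, 22, 30, 25, 29]
female = [14, 9, 12, 8, 15]
print(round(correlation_ratio([male, female]), 2))  # 0.46
```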