Educ 98 6. Pearson R To Spearman Rho
Correlation analysis is a method used to measure the strength of the relationship between two or more variables.
The correlation coefficient may be positive or negative. A positive correlation is present when high values of one variable are associated with high values of the other variable, and low values with low values. On the other hand, when high values of one variable are associated with low values of the other variable, and vice versa, a negative correlation is present. A perfect positive correlation is represented by a value of +1.00, while a perfect negative correlation is represented by a value of -1.00.
[Figure: scatter plot of y (low to high) against X (low to high); the points trend upward, r = +]
If the points of the scatter plot trend upward, the value of r is positive. This indicates that as the value of x increases, the value of y also increases. Likewise, as the value of x decreases, the value of y also decreases, x and y being positively correlated.
[Figure: scatter plot of y (low to high) against X (low to high); the points trend downward, r = -]
If the points trend downward, the value of r is negative. This indicates that as the value of x increases, the corresponding value of y decreases, x and y being negatively correlated.
[Figure: scatter plot of y (low to high) against X (low to high); the points show no upward or downward trend, r = 0]
If no upward or downward trend can be established, then r = 0, indicating that there is no (linear) correlation between the x and y variables.
The degree of relationship can be interpreted using the following ranges of values of the Pearson Product Moment Correlation Coefficient:
Range of Values                  Interpretation
1.00 (-1.00)                     Perfect positive (negative) correlation
0.91 to 0.99 (-0.91 to -0.99)    Very high positive (negative) correlation
0.71 to 0.90 (-0.71 to -0.90)    High positive (negative) correlation
0.51 to 0.70 (-0.51 to -0.70)    Moderate positive (negative) correlation
0.31 to 0.50 (-0.31 to -0.50)    Low positive (negative) correlation
0.01 to 0.30 (-0.01 to -0.30)    Little, if any, correlation
0.00                             No correlation
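The range table works like a lookup. Below is a minimal Python sketch of that lookup using a hypothetical helper, `interpret_r`, which is not part of the original handout. The table leaves small gaps between its printed ranges. The sketch assumes that a value falling in one of those gaps (for example 0.905) takes the label of the band below it, and that any nonzero value under 0.01 counts as "little, if any" correlation.

```python
def interpret_r(r: float) -> str:
    """Return the verbal interpretation of r from the range-of-values table.

    Hypothetical helper. Assumption: values in the gaps between printed
    ranges (e.g. 0.905) take the label of the band below them, and any
    nonzero value under 0.01 is treated as "little, if any" correlation.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    sign = "positive" if r > 0 else "negative"
    if size == 1.0:
        return f"Perfect {sign} correlation"
    if size >= 0.91:
        return f"Very high {sign} correlation"
    if size >= 0.71:
        return f"High {sign} correlation"
    if size >= 0.51:
        return f"Moderate {sign} correlation"
    if size >= 0.31:
        return f"Low {sign} correlation"
    if size > 0:
        return "Little, if any correlation"
    return "No correlation"


print(interpret_r(0.96))   # Very high positive correlation
print(interpret_r(-0.88))  # High negative correlation
```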
The formulae for the Pearson Product Moment Coefficient of Correlation are:
A. RAW SCORE
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
Where:
r = the Pearson Product Moment Coefficient of Correlation
n = sample size
∑xy = the sum of the products of x and y
∑x∑y = the product of the sum of x and the sum of y
∑x² = the sum of the squares of x
∑y² = the sum of the squares of y
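The raw-score formula needs only five column sums. Here is a minimal Python sketch of it. The function name `pearson_r_raw` is an illustrative choice, not something from the handout. The check uses the trainee scores from Example 1 below, for which the formula gives 276/288.

```python
from math import sqrt


def pearson_r_raw(xs, ys):
    """Pearson r from raw scores:
    r = (n*Sxy - Sx*Sy) / sqrt[(n*Sxx - Sx**2) * (n*Syy - Sy**2)]
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Knowledge (X) and practical (Y) scores of the six trainees in Example 1
X = [2, 4, 4, 5, 7, 8]
Y = [4, 10, 8, 8, 14, 16]
print(round(pearson_r_raw(X, Y), 3))  # 0.958
```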
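B. STANDARD
r = \frac{\sum Z_x Z_y}{n}, \quad \sigma_x = \sqrt{\frac{\sum x^2}{n}}, \quad \sigma_y = \sqrt{\frac{\sum y^2}{n}}
where x = X - \bar{X} and y = Y - \bar{Y} are deviation scores, Z_x = x/\sigma_x and Z_y = y/\sigma_y.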
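The standard-score method averages the products of paired z-scores. Below is a minimal Python sketch of it. The helper name `pearson_r_standard` is hypothetical. As in the formula, σ divides by n (not n - 1). Run on the Example 1 trainee scores, the z-score products sum to 5.75, so r = 5.75/6.

```python
from math import sqrt


def pearson_r_standard(xs, ys):
    """Pearson r by the standard-score method: r = sum(Zx * Zy) / n.

    Deviation scores are x = X - mean(X); sigma divides by n, as in the formula.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sigma_x = sqrt(sum(d * d for d in dev_x) / n)
    sigma_y = sqrt(sum(d * d for d in dev_y) / n)
    sum_zz = sum((dx / sigma_x) * (dy / sigma_y) for dx, dy in zip(dev_x, dev_y))
    return sum_zz / n


# Trainee scores from Example 1
print(round(pearson_r_standard([2, 4, 4, 5, 7, 8], [4, 10, 8, 8, 14, 16]), 3))  # 0.958
```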
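C. METHOD OF DIFFERENCE
r = \frac{S_x^2 + S_y^2 - S_d^2}{2 S_x S_y}
where D = X - Y and
\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{n}, \quad S_d^2 = \frac{\sum d^2}{n}
\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{n}, \quad S_x^2 = \frac{\sum x^2}{n}
\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}, \quad S_y^2 = \frac{\sum y^2}{n}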
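The method of difference follows from the identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y). The minimal Python sketch below applies the formula exactly as written. It takes D = X - Y and divides each sum of squared deviations by n. The function name is illustrative only.

```python
from math import sqrt


def pearson_r_difference(xs, ys):
    """Pearson r by the method of difference, with D = X - Y:
    r = (Sx**2 + Sy**2 - Sd**2) / (2 * Sx * Sy),
    where each S**2 is a sum of squared deviations divided by n."""
    n = len(xs)
    ds = [x - y for x, y in zip(xs, ys)]

    def variance(values):
        # sum of squared deviations = sum(V**2) - (sum V)**2 / n, then divide by n
        return (sum(v * v for v in values) - sum(values) ** 2 / n) / n

    sx2, sy2, sd2 = variance(xs), variance(ys), variance(ds)
    return (sx2 + sy2 - sd2) / (2 * sqrt(sx2) * sqrt(sy2))


# Trainee scores from Example 1: Sx**2 = 4, Sy**2 = 16, Sd**2 = 28/6
print(round(pearson_r_difference([2, 4, 4, 5, 7, 8], [4, 10, 8, 8, 14, 16]), 3))  # 0.958
```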
Example 1. A personnel manager would like to know if there is a relationship between the knowledge factors and the practical factors of a training course. The following scores were obtained by six trainees on the knowledge factors, X, and the practical factors, Y, of a training course.
Compute the correlation coefficient and test its significance at the 0.05 level.
A. RAW SCORE
Trainee   X   Y    XY    X²   Y²
1         2   4    8     4    16
2         4   10   40    16   100
3         4   8    32    16   64
4         5   8    40    25   64
5         7   14   98    49   196
6         8   16   128   64   256
ΣX = 30   ΣY = 60   ΣXY = 346   ΣX² = 174   ΣY² = 696
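r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} = \frac{6(346) - (30)(60)}{\sqrt{[6(174) - (30)^2][6(696) - (60)^2]}} = \frac{276}{\sqrt{(144)(576)}} = \frac{276}{288} = 0.958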
B. STANDARD
(x = X - X̄, y = Y - Ȳ, Zx = x/σx, Zy = y/σy, with X̄ = 5 and Ȳ = 10)
Trainee   X   Y    x    y    x²   y²   xy   Zx     Zy     ZxZy
1         2   4    -3   -6   9    36   18   -1.5   -1.5   2.25
2         4   10   -1   0    1    0    0    -0.5   0      0
3         4   8    -1   -2   1    4    2    -0.5   -0.5   0.25
4         5   8    0    -2   0    4    0    0      -0.5   0
5         7   14   2    4    4    16   8    1      1      1
6         8   16   3    6    9    36   18   1.5    1.5    2.25
ΣX = 30   ΣY = 60   Σx² = 24   Σy² = 96   Σxy = 46   ΣZxZy = 5.75
σx = √(24/6) = 2, σy = √(96/6) = 4
r = ΣZxZy / n = 5.75 / 6 = 0.958
C. METHOD OF DIFFERENCE
Trainee   X   Y    X²   Y²    D = X - Y   D²
1         2   4    4    16    -2          4
2         4   10   16   100   -6          36
3         4   8    16   64    -4          16
4         5   8    25   64    -3          9
5         7   14   49   196   -7          49
6         8   16   64   256   -8          64
ΣX = 30   ΣY = 60   ΣX² = 174   ΣY² = 696   ΣD = -30   ΣD² = 178
\sum d^2 = \sum D^2 - \frac{(\sum D)^2}{n} = 178 - \frac{900}{6} = 178 - 150 = 28, \quad S_d^2 = \frac{28}{6} = 4.67
\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{n} = 174 - \frac{(30)^2}{6} = 174 - 150 = 24, \quad S_x^2 = \frac{24}{6} = 4, \quad S_x = 2
\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 696 - \frac{(60)^2}{6} = 696 - 600 = 96, \quad S_y^2 = \frac{96}{6} = 16, \quad S_y = 4
r = \frac{S_x^2 + S_y^2 - S_d^2}{2 S_x S_y} = \frac{4 + 16 - 4.67}{2(2)(4)} = \frac{20 - 4.67}{16} = \frac{15.33}{16} = 0.958 \approx 0.96
(tabular value of r: 0.811, df = 4, 0.05 level)
Computation:
t = r\sqrt{\frac{n-2}{1-r^2}} = 0.96\sqrt{\frac{6-2}{1-0.96^2}} = 0.96\sqrt{\frac{4}{1-0.9216}} = 0.96\sqrt{\frac{4}{0.0784}} = 0.96\sqrt{51.02} = 0.96(7.1428) = 6.86
Decision: Reject H₀ and accept Hₐ, since |t computed| = 6.86 > t tabular = 2.776 with df = 4 at the 0.05 level of significance.
Conclusion: Therefore, there is a significant relationship between the knowledge and practical factors in the training course; that is, the obtained correlation is significantly different from zero.
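The significance test turns r into a t statistic with n - 2 degrees of freedom. Here is a minimal Python sketch of that step using a hypothetical helper, `t_for_r`. The critical value is not computed. It is copied from the worked example (2.776 for df = 4, two-tailed, 0.05), as it would be read from a t table.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, t = r * sqrt((n - 2) / (1 - r**2)), df = n - 2."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt((n - 2) / (1 - r * r))


t = t_for_r(0.96, 6)
t_tab = 2.776  # two-tailed critical t for df = 4 at 0.05, as used in the worked example
print(round(t, 2), abs(t) > t_tab)  # 6.86 True
```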
Example 2. Below are the midterm (x) and the final (y) grades.
x 75 70 65 90 85 85 80 70 65 90
y 80 75 65 95 90 85 90 75 70 90
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
= \frac{10(64{,}000) - (775)(815)}{\sqrt{[10(60{,}925) - (775)^2][10(67{,}325) - (815)^2]}}
= \frac{8{,}375}{\sqrt{[609{,}250 - 600{,}625][673{,}250 - 664{,}225]}}
= \frac{8{,}375}{\sqrt{[8{,}625][9{,}025]}} = \frac{8{,}375}{\sqrt{77{,}840{,}625}} = \frac{8{,}375}{8{,}822.73}
r = 0.949 \approx 0.95 (very high positive correlation; tabular r = 0.63 at df = 8, 0.05 level)
t = r\sqrt{\frac{n-2}{1-r^2}} = 0.95\sqrt{\frac{10-2}{1-0.95^2}} = 0.95\sqrt{\frac{8}{1-0.9025}} = 0.95\sqrt{\frac{8}{0.0975}} = 0.95\sqrt{82.05} = 0.95(9.06) = 8.61
V. Decision Rule: If the computed r value is greater than the tabular r value, reject H₀.
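The column sums behind the Example 2 computation can be checked mechanically. This short Python sketch recomputes Σx, Σy, Σxy, Σx², and Σy² from the midterm and final grades, then applies the raw-score formula. The dictionary keys are illustrative names only.

```python
from math import sqrt

# Midterm (x) and final (y) grades from Example 2
x = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]
y = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]

n = len(x)
sums = {
    "sum_x": sum(x),
    "sum_y": sum(y),
    "sum_xy": sum(a * b for a, b in zip(x, y)),
    "sum_x2": sum(a * a for a in x),
    "sum_y2": sum(b * b for b in y),
}
numerator = n * sums["sum_xy"] - sums["sum_x"] * sums["sum_y"]
denominator = sqrt((n * sums["sum_x2"] - sums["sum_x"] ** 2)
                   * (n * sums["sum_y2"] - sums["sum_y"] ** 2))
r = numerator / denominator

print(sums)
print(numerator, round(r, 3))  # 8375 0.949
```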
Coefficient of Determination
The coefficient of determination is r² × 100%. It expresses the percentage of the variation in y that is explained by its linear relationship with the independent variable x, that is, the extent to which y depends on x.
Example 1. What is the coefficient of determination when r = 0.949?
CD = r² × 100%
   = (0.949)² × 100%
   = 0.9006 × 100%
   = 90.06%
This value indicates that 90.06% of the variation in the final examination grades is accounted for by the midterm grades. Thus, the final grade is strongly associated with the midterm grade.
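The coefficient of determination is a single squaring step. This one-function Python sketch (the name is illustrative) reproduces the 90.06% figure for r = 0.949.

```python
def coefficient_of_determination(r):
    """CD = r**2 * 100%, the percent of the variation in y associated with x."""
    return r * r * 100


print(f"{coefficient_of_determination(0.949):.2f}%")  # 90.06%
```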
* The table of critical values is for two-tailed tests. For a one-tailed test at level α, use the column for 2α.
If the absolute value of the calculated Pearson correlation coefficient is greater than the critical value from the table, reject the null hypothesis that there is no correlation, i.e., that the correlation coefficient is zero.
The simple linear regression equation is:
y = a + bx
Where:
y = the dependent variable
x = the independent variable
a = the y intercept
b = the slope of the line
Example 1. Based on the midterm (x) and final (y) grades in Example 2 of the Pearson Product Moment Coefficient of Correlation r, suppose a student's midterm grade is x = 88. What is the predicted final grade?
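Solution:
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{8{,}375}{8{,}625}
b = 0.971
a = \bar{y} - b\bar{x}, with \bar{x} = 775/10 = 77.5 and \bar{y} = 815/10 = 81.5
  = 81.5 - 0.971(77.5)
  = 81.5 - 75.25
a = 6.25
y = a + bx
y = 6.25 + 0.971x
y = 6.25 + 0.971(88)
  = 6.25 + 85.45
y = 91.70, or 92 (predicted final grade)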
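The slope and intercept above come from the least-squares formulas b = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²) and a = ȳ - b·x̄. Here is a minimal Python sketch with a hypothetical function name. The code does not round b before computing a, so the intermediate values differ slightly from the hand computation. The prediction still comes out at about 91.70.

```python
def least_squares_line(xs, ys):
    """Slope b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2) and intercept a = mean_y - b*mean_x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


midterm = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]
final = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]
a, b = least_squares_line(midterm, final)
print(round(b, 3), round(a, 2))  # 0.971 6.25
print(round(a + b * 88, 2))      # 91.7
```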
Example 2. A study is conducted on the relationship between the number of absences (x) and the grades (y) of 15 students in English. Using r at the 0.05 level of significance and the hypothesis that there is no significant relationship between the absences and the grades of students in English, determine the relationship using the following data.
II. Hypotheses:
H₀: There is no significant relationship between the number of absences and the grades of 15 students in an English class.
Hₐ: There is a significant relationship between the number of absences and the grades of 15 students in an English class.
III. Level of Significance:
α = 0.05, df = n - 2 = 15 - 2 = 13, CV = 2.160
IV. Test Statistic:
Pearson Product Moment Coefficient of Correlation, r
x   y    x²   y²     xy
1   90   1    8100   90
2   85   4    7225   170
2   80   4    6400   160
3   75   9    5625   225
3   80   9    6400   240
8   65   64   4225   520
6   70   36   4900   420
1   95   1    9025   95
4   80   16   6400   320
5   80   25   6400   400
5   75   25   5625   375
1   92   1    8464   92
2   89   4    7921   178
1   80   1    6400   80
9   65   81   4225   585
Σx = 53   Σy = 1201   Σx² = 281   Σy² = 97,335   Σxy = 3950
n = 15    x̄ = 3.53    ȳ = 80.07
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}
= \frac{15(3950) - (53)(1201)}{\sqrt{[15(281) - (53)^2][15(97{,}335) - (1201)^2]}}
= \frac{59{,}250 - 63{,}653}{\sqrt{[4215 - 2809][1{,}460{,}025 - 1{,}442{,}401]}}
= \frac{-4403}{\sqrt{[1406][17{,}624]}} = \frac{-4403}{\sqrt{24{,}779{,}344}} = \frac{-4403}{4977.88}
r = -0.88
t = r\sqrt{\frac{n-2}{1-r^2}} = -0.88\sqrt{\frac{15-2}{1-0.88^2}} = -0.88\sqrt{\frac{13}{1-0.7744}} = -0.88\sqrt{\frac{13}{0.2256}} = -0.88\sqrt{57.62} = -0.88(7.59) = -6.68
V. Decision Rule: If the absolute value of the computed t is greater than the critical value, reject H₀.
VI. Decision & Conclusion: The absolute value of the computed t, 6.68, is beyond the critical value of 2.160 at the 0.05 level of significance with 13 degrees of freedom, so the null hypothesis is rejected. This means that there is a significant relationship between the number of absences and the grades of students in English. Since the value of r is negative, students who had more absences tended to have lower grades.
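The absences example can be re-run end to end in a few lines of Python. This sketch keeps full precision throughout. The hand computation rounds r to -0.88 before computing t and gets -6.68. Carrying the unrounded r (about -0.8845) gives a t of about -6.84 instead. Either way the value is well beyond the critical value, so the decision is the same.

```python
from math import sqrt

absences = [1, 2, 2, 3, 3, 8, 6, 1, 4, 5, 5, 1, 2, 1, 9]
grades = [90, 85, 80, 75, 80, 65, 70, 95, 80, 80, 75, 92, 89, 80, 65]

n = len(absences)
sx, sy = sum(absences), sum(grades)
sxy = sum(x * y for x, y in zip(absences, grades))
sxx = sum(x * x for x in absences)
syy = sum(y * y for y in grades)

# Raw-score Pearson r, then t = r * sqrt((n - 2) / (1 - r**2)) with df = n - 2
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(r, 2), round(t, 2))  # -0.88 -6.84
```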
Suppose we want to predict the grade (y) of a student who has incurred 7 absences (x). To get the value of y for a given value of x, simple linear regression analysis is used, with the regression equation y = a + bx.
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{-4403}{1406} = -3.13
a = \bar{y} - b\bar{x} = 80.07 - (-3.13)(3.53) = 91.12
y = a + bx
  = 91.12 + (-3.13)x
  = 91.12 - 3.13x
  = 91.12 - 3.13(7)
  = 91.12 - 21.91
y = 69.21, or a grade of 69
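The regression coefficients for the absences data can be computed directly from the sums. In this Python sketch the intercept is computed as (Σy - bΣx)/n without rounding the means first. It therefore comes out at about 91.13 rather than the hand value of 91.12. The predicted grade for 7 absences still rounds to 69.21.

```python
absences = [1, 2, 2, 3, 3, 8, 6, 1, 4, 5, 5, 1, 2, 1, 9]
grades = [90, 85, 80, 75, 80, 65, 70, 95, 80, 80, 75, 92, 89, 80, 65]

n = len(absences)
sx, sy = sum(absences), sum(grades)
sxy = sum(x * y for x, y in zip(absences, grades))
sxx = sum(x * x for x in absences)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # least-squares slope
a = (sy - b * sx) / n                          # intercept = mean_y - b * mean_x
predicted = a + b * 7                          # predicted grade for 7 absences
print(round(b, 2), round(a, 2), round(predicted, 2))  # -3.13 91.13 69.21
```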
Nonparametric tests do not require a normal distribution. When the skewness is either positive or negative and the kurtosis is greater or less than 0.265, the distribution is said to be non-normal. When the kurtosis is less than 0.265, the distribution is leptokurtic; when it is greater than 0.265, it is platykurtic.
Nonparametric tests can also be used with both nominal and ordinal data. Nominal data are expressed in categories, while ordinal data are expressed as ranks.
The most commonly used nonparametric tests are the Chi-Square test, U-test, H-test, Spearman Rank Order Coefficient of Correlation, Sign Test (Median Test), McNemar's Test, Friedman Test, and Kendall's Coefficient of Concordance W.
THE KRUSKAL-WALLIS TEST, ALSO CALLED THE KRUSKAL-WALLIS H-TEST
This test is used to compare three or more independent groups. It is a nonparametric test that does not require a normal distribution, and it is an alternative to the F-test (one-way ANOVA) among the parametric tests. The formula for this test is:
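H = \frac{12}{n(n+1)} \sum \frac{R_i^2}{n_i} - 3(n+1)
Where:
H = Kruskal-Wallis test statistic
n = total number of observations in all groups
R_i = sum of the ranks in group i
n_i = number of observations in group i
12 and 3 = constants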
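Once the rank sums are known, H is plain arithmetic. This minimal Python sketch takes the rank sums and group sizes as inputs. Like the formula above, it applies no correction for ties. The usage lines plug in the rank sums reported in Examples 1 and 2 below.

```python
def kruskal_wallis_h(rank_sums, group_sizes):
    """H = 12 / (n(n+1)) * sum(R_i**2 / n_i) - 3(n+1), with n = total observations.

    No tie-correction factor is applied, matching the formula above.
    """
    n = sum(group_sizes)
    total = sum(r * r / size for r, size in zip(rank_sums, group_sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


print(round(kruskal_wallis_h([89.5, 57.5, 24], [6, 7, 5]), 3))             # 10.458
print(round(kruskal_wallis_h([51.5, 55.5, 56.5, 46.5], [5, 5, 5, 5]), 3))  # 0.354
```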
Example 1. Consider the examination scores of samples of high school students who are taught English using three different methods: Method 1 (classroom instruction and language laboratory), Method 2 (classroom instruction only), and Method 3 (self-study in the language laboratory only). Use the H-test at the 0.05 level of significance to test the null hypothesis that there is no significant difference in the average scores of the three groups. Consider the following data.
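Arrange the scores jointly from the lowest to the highest, then rank them; tied scores receive the average of the ranks they occupy.
Method 1   Rank   Method 2   Rank   Method 3   Rank
94         17     85         8.5    89         12
88         10.5   88         10.5   78         3
90         14     90         14     75         2
95         18     80         6      65         1
92         16     79         4      80         6
90         14     85         8.5
                  80         6
n1 = 6, ΣR1 = 89.5     n2 = 7, ΣR2 = 57.5     n3 = 5, ΣR3 = 24
H = \frac{12}{18(18+1)}\left[\frac{(89.5)^2}{6} + \frac{(57.5)^2}{7} + \frac{(24)^2}{5}\right] - 3(18+1) = \frac{12}{342}(1335.04 + 472.32 + 115.20) - 57 = 67.458 - 57 = 10.458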
V. Decision Rule: If the computed H value is greater than the tabular χ² value, reject H₀.
VI. Decision & Conclusion: Since the computed H value of 10.458 is greater than the tabular χ² value of 5.991 at the 0.05 level of significance with 2 degrees of freedom, the research hypothesis is accepted. This means that there is a significant difference in the average scores obtained under the three methods of teaching English. It can also be concluded that the three methods are not equally effective.
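The joint ranking step, including average ranks for ties, is where hand computations most often go wrong. This Python sketch uses a hypothetical helper, `joint_average_ranks`. It pools the three methods' scores, ranks them from lowest (rank 1) to highest, and gives tied scores the mean of the positions they occupy. It then totals the ranks per group, which should match the rank sums in the table above.

```python
def joint_average_ranks(groups):
    """Rank all scores together from lowest (rank 1) to highest;
    tied scores share the average of the positions they occupy."""
    pooled = sorted(score for group in groups for score in group)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        # 0-based positions i..j are tied, i.e. 1-based ranks i+1..j+1
        rank_of[pooled[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [[rank_of[score] for score in group] for group in groups]


method1 = [94, 88, 90, 95, 92, 90]
method2 = [85, 88, 90, 80, 79, 85, 80]
method3 = [89, 78, 75, 65, 80]
ranks = joint_average_ranks([method1, method2, method3])
print([sum(r) for r in ranks])  # [89.5, 57.5, 24.0]
```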
Example 2. The following are the mileage yields per gallon obtained by a test driver for 5 tankfuls each of four kinds of gasoline. Use the H-test at the α = 0.05 level of significance to check the claim that there is no significant difference in the true average mileage yield of the four kinds of gasoline.
Gasoline C 24 17 21 31 22
Gasoline P 21 31 32 19 17
Gasoline S 28 23 26 31 14
Gasoline T 29 16 18 31 20
Arrange the scores jointly from the lowest to the highest, then rank them.
C    Rc     P    Rp     S    Rs     T    RT
24   12     21   8.5    28   14     29   15
17   3.5    31   17.5   23   11     16   2
21   8.5    32   20     26   13     18   5
31   17.5   19   6      31   17.5   31   17.5
22   10     17   3.5    14   1      20   7
ΣRc = 51.5   ΣRp = 55.5   ΣRs = 56.5   ΣRT = 46.5
H = \frac{12}{n(n+1)} \sum \frac{R_i^2}{n_i} - 3(n+1)
= \frac{12}{20(20+1)}\left[\frac{(51.5)^2}{5} + \frac{(55.5)^2}{5} + \frac{(56.5)^2}{5} + \frac{(46.5)^2}{5}\right] - 3(20+1)
= \frac{12}{420}(530.45 + 616.05 + 638.45 + 432.45) - 63
= \frac{12(2217.4)}{420} - 63
= \frac{26{,}608.8}{420} - 63
= 63.354 - 63
H = 0.354
V. Decision Rule: If the computed H value is greater than the tabular χ² value, reject H₀.
VI. Decision & Conclusion: Since the computed H value of 0.354 is less than the tabular χ² value of 7.815 at the 0.05 level of significance with 3 degrees of freedom, the null hypothesis is accepted. This means that there is no significant difference in the average mileage yield of the four kinds of gasoline.
THE SPEARMAN RANK ORDER COEFFICIENT OF CORRELATION, rₛ
The Spearman Rank Order Coefficient of Correlation is denoted by rₛ. Unlike the Pearson Product Moment Coefficient of Correlation, which is denoted by a lowercase r, this test of correlation does not require the stringent assumption of normality. The formula is:
r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}
Where:
rₛ = Spearman Rank Order Coefficient of Correlation
ΣD² = sum of the squares of the differences between the rank of x and the rank of y
n = number of data pairs
6 = constant
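Below is a minimal Python sketch of the Spearman computation, with hypothetical helper names. It follows the convention of Example 1 below: the highest value gets rank 1, and tied values share the average of their positions. Ranking from lowest to highest instead gives the same rₛ, provided both variables are ranked the same way.

```python
def descending_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # 1-based position of the first copy of v
        count = ordered.count(v)       # number of tied copies of v
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_rs(xs, ys):
    """rs = 1 - 6 * sum(D**2) / (n * (n**2 - 1)), with D = rank of x - rank of y."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(descending_ranks(xs), descending_ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


hours = [5, 6, 11, 20, 19, 20, 10, 12, 8, 15, 18, 10]
grades = [50, 60, 79, 90, 85, 92, 80, 82, 65, 85, 94, 70]
print(round(spearman_rs(hours, grades), 2))  # 0.94
```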
Example 1. The following are the numbers of hours that 12 students studied for a midterm examination and the grades they obtained in English. Calculate rₛ and test its significance at the 0.05 level.
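(Ranks are assigned from the highest value, rank 1, to the lowest; tied values receive the average rank.)
Hours (x)   Grade (y)   Rank x   Rank y   D      D²
5           50          12       12       0      0
6           60          11       11       0      0
11          79          7        8        -1     1
20          90          1.5      3        -1.5   2.25
19          85          3        4.5      -1.5   2.25
20          92          1.5      2        -0.5   0.25
10          80          8.5      7        1.5    2.25
12          82          6        6        0      0
8           65          10       10       0      0
15          85          5        4.5      0.5    0.25
18          94          4        1        3      9
10          70          8.5      9        -0.5   0.25
ΣD² = 17.5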
r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(17.5)}{12(12^2 - 1)} = 1 - \frac{105}{1716} = 1 - 0.06 = 0.94
V. Decision Rule: If the computed rₛ value is greater than the tabular rₛ value, reject H₀.
VI. Decision & Conclusion: Since the computed rₛ value of 0.94 is greater than the tabular rₛ value of 0.618 at the 0.05 level of significance, the research hypothesis is accepted. A significant relationship between the number of hours spent studying English and the grade in the midterm examination in English is established. It implies that the more hours devoted to studying, the higher the result in the examination.
Example 2: The following are the rankings given by two judges to the work of 8 artists. Use rₛ at the 0.05 level of significance to test the null hypothesis that there is no significant relationship between the rankings of the two judges, i.e., that the judges differ in their opinions about these artists.
Judge A Judge B
5 8
8 5
4 6
2 4
1 2
7 1
3 3
6 7
Judge A   Judge B   D    D²
5         8         -3   9
8         5         3    9
4         6         -2   4
2         4         -2   4
1         2         -1   1
7         1         6    36
3         3         0    0
6         7         -1   1
ΣD² = 64
r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(64)}{8(8^2 - 1)} = 1 - \frac{384}{504} = 1 - 0.76 = 0.24
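When the data are already ranks, as with the two judges, no ranking step is needed. Only D = rank A - rank B and the formula remain. This short Python sketch reproduces ΣD² = 64 and rₛ ≈ 0.24.

```python
judge_a = [5, 8, 4, 2, 1, 7, 3, 6]
judge_b = [8, 5, 6, 4, 2, 1, 3, 7]

n = len(judge_a)
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))  # sum of squared rank differences
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rs, 2))  # 64 0.24
```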
Month                           June  July  August  Sept  Oct  Nov  Dec  Jan  Feb  March
x (number of theft cases)       6     15    30      12    20   9    2    10   11   28
y (number of vandalism cases)   3     6     15      5     15   7    0    21   4    12
Test the hypothesis using the 0.05 level of significance. CV(t) = 2.306 (Pearson r, df = 8)