
EDUCATIONAL DATA REASONING

BBD 30402

By:

Faculty of Technical & Vocational Education,


University Tun Hussein Onn Malaysia, Malaysia
Learning Outcomes
Upon completion of this chapter, you should be able to:
• Construct a scatter diagram given two sets of data
• Interpret a given scatter plot in terms of strength of relationship and
direction of relationships
• Decide whether the relationship between two sets of data is
linear/non-linear given the scatter diagram
• Calculate Pearson correlation given the data
• Calculate Spearman correlation given the data
• Interpret a given correlation coefficient
• Decide on whether to use Pearson or Spearman correlation given the
data sets
Correlation
• A correlation shows the degree or strength of the
relationship between two variables.
• The population correlation is denoted by ρ (rho).
• The sample correlation is denoted by r.
• Usually, the variables are denoted by X and Y.
• The correlation coefficient (r) can take any value from -1 to +1.
• Three methods can be used to describe the relationship and
estimate the association between variables:
a) Scatter plot
b) Pearson’s Correlation Coefficient
c) Spearman’s Rank Correlation Coefficient
Scatter Plot / Diagram
• A scatter diagram shows the relationship between two
variables. For example, you might want to compare the speed
you drive with the time it takes you to get to work, or to
compare the heights and weights of children, or to compare
the steam usage in a plant to the outside temperature.
• Scatter plots usually consist of a large body of data.
• The closer the plotted data points come to forming a straight
line, the higher the correlation between the two variables, or the
stronger the relationship.
• A positive linear relationship indicates that as the X scores
increase, the Y scores also tend to increase.
• A negative linear relationship indicates that as the X scores
increase, the Y scores tend to decrease.
• A nonlinear relationship indicates that as the X scores
increase, the Y scores do not follow a straight-line pattern; they
may, for example, rise and then fall.
[Figure: six example scatter plots. Perfect positive correlation (r = 1); strong positive correlation (r = 0.99); positive correlation (r = 0.80); strong negative correlation (r = -0.98); no correlation (r = 0.00); non-linear relationship.]
Strength of Relationship
r                     Relationship Between Two Variables
r = -1.00             perfect negative linear relationship
r = +1.00             perfect positive linear relationship
-1.00 < r < -0.50     strong negative linear relationship
+0.50 < r < +1.00     strong positive linear relationship
-0.50 < r < 0         weak negative linear relationship
0 < r < +0.50         weak positive linear relationship
r = 0                 no linear relationship
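The table above can be turned into a small lookup, sketched here in Python. The function name `describe_strength` is hypothetical, and because the table leaves r = ±0.50 unassigned, this sketch assumes that boundary counts as "strong".

```python
def describe_strength(r):
    """Return the verbal label from the strength table for a coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1.0:
        return f"perfect {direction} linear relationship"
    # Assumption: the table does not assign |r| = 0.50, so it is treated as strong.
    strength = "strong" if abs(r) >= 0.5 else "weak"
    return f"{strength} {direction} linear relationship"


print(describe_strength(0.84))   # strong positive linear relationship
print(describe_strength(-0.30))  # weak negative linear relationship
```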
Important Things for Plotting Scatter Diagram

• We need a scatter plot to find out whether the relationship between X
and Y is a linear relationship.
• It can be a positive linear relationship or a negative linear
relationship.
• Identify the independent and dependent variables
• Plot the scatter diagram
– Title of the diagram
– Label the X-axis
– Label the Y-axis
– Plot the data points
Pearson’s correlation coefficient (r)
Synonyms:
product moment correlation coefficient
simple linear correlation coefficient

Definition
Pearson’s correlation coefficient measures the strength or the
degree of the linear relationship between two variables.
• It is assumed that both variables (often called X and Y) are on an
interval or ratio scale.
• The data are assumed to be approximately normally distributed.
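• Pearson’s correlation coefficient is usually denoted by r for a
sample and by ρ (rho) for a population.
• The formula for computing the Pearson correlation is given as:

$$r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}}$$

$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$

$$r_p = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Where:
X̄ = mean of X
Ȳ = mean of Y
N = number of paired observations in the sample
SP = sum of products of deviations; SS_X, SS_Y = sums of squared deviations of X and Y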
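The raw-score form of the formula translates almost line for line into code. The following is a minimal Python sketch using only the standard library; the function name `pearson_r` and the four-point data set are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation using the raw-score (computational) formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / denominator


# Illustrative data only: Y is exactly twice X, so r should be +1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```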
How to choose the correlation?
Start
– Is the data interval/ratio? If no, use Spearman rank correlation.
– If yes, is the data normally distributed? If no, use Spearman rank correlation.
– If yes, use Pearson correlation.
End
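The decision flow can be written as a two-question check. This is only a sketch: the function name `choose_correlation` and its boolean parameters are hypothetical, and deciding whether data are "normally distributed" is left to the analyst.

```python
def choose_correlation(is_interval_or_ratio, is_normally_distributed):
    """Follow the flowchart: Pearson only if both answers are yes."""
    if is_interval_or_ratio and is_normally_distributed:
        return "Pearson"
    # Ordinal data, or interval/ratio data that are not normally distributed.
    return "Spearman rank"


print(choose_correlation(True, True))    # Pearson
print(choose_correlation(True, False))   # Spearman rank
print(choose_correlation(False, False))  # Spearman rank
```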
Example 1
A high school guidance counselor is interested in the relationship between
proximity to school and participation in extracurricular activities.
He collects data on the distance from home to school (in
miles) and the number of clubs joined for a sample of 10 juniors.
Using the following data, compute Pearson’s correlation coefficient and
determine whether it is significant.

Student      Distance to school (miles), X    Number of clubs joined, Y
Lee          4                                3
Rhonda       2                                1
Jess         7                                5
Evelyn       1                                2
Mohammad     4                                1
Steve        6                                1
George       9                                9
Juan         7                                6
Chi          7                                5
David        10                               8
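Solution
Step 1: Compute the required sums and means from the data.
N = 10, ΣXY = 299, ΣX² = 401, ΣY² = 247, X̄ = 5.7, Ȳ = 4.1

Step 2: Substitute into the formula.

$$r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}}$$

$$r_p = \frac{299 - (10)(5.7)(4.1)}{\sqrt{\left(401 - (10)(5.7^2)\right)\left(247 - (10)(4.1^2)\right)}}$$

$$r_p = \frac{65.3}{\sqrt{76.1\,(78.9)}}$$

$$r_p = 0.84$$
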

Interpretation
Pearson's correlation coefficient was +0.84, indicating that there
was a strong positive linear relationship between distance from
school and the number of clubs joined.
Pearson’s Correlation Coefficient: Test of Significance

• The significance of Pearson’s correlation coefficient can be determined by
using the critical value from Pearson’s table or a t-test.
• To test the significance of a measure of correlation, we usually
set up the hypotheses:

H0: ρ = 0 (null hypothesis)
Ha: ρ ≠ 0 (alternative hypothesis; for a one-tailed test, ρ > 0 or ρ < 0)

Degrees of freedom, df = n - 2

Test statistic:

$$T = r_p \sqrt{\frac{n - 2}{1 - r_p^2}}$$
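As a rough illustration of the test statistic, the Python sketch below evaluates T for r = 0.84 and n = 10 from Example 1. The function name `correlation_t_statistic` is hypothetical, and the critical value is deliberately not computed here; it would still come from a t or Pearson table.

```python
import math


def correlation_t_statistic(r, n):
    """Return (T, df) for testing H0: rho = 0 with a sample correlation r."""
    if n <= 2:
        raise ValueError("n must be greater than 2 so that df = n - 2 > 0")
    if abs(r) >= 1:
        raise ValueError("T is undefined for |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


t, df = correlation_t_statistic(0.84, 10)
print(f"T = {t:.2f}, df = {df}")  # T = 4.38, df = 8
```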
Example 2

From Example 1, test the significance of Pearson’s correlation coefficient
rp at the significance level α = 0.05.

Step 1: State H0 and Ha

The correlation value rp is positive (+0.84), so a one-tailed hypothesis
test is used.

H0: There is no relationship between distance from school and
participation in club activities.
H0: ρ = 0
Ha: There is a relationship between distance from school and participation
in club activities.
Ha: ρ > 0

• Test result:
Degrees of freedom
df = n - 2
= 10 - 2
= 8

Compare the obtained Pearson’s rp with the appropriate critical value of
Pearson’s r in Table F (the table of critical values for Pearson’s r).

From the table, rp > r critical, so the null hypothesis is rejected and
there is strong evidence to conclude that ρ > 0.

In conclusion, there is a significant relationship at the 0.05
significance level: the farther a student lives from school, the greater
the participation in club activities.
Example 3
A teacher wants to demonstrate to students the harm that playing on the
computer does to their academic performance. The teacher believes that
the more time (hours per week) students spend playing on the computer,
the lower their examination marks. A random sample of 10 students was
taken to obtain data. The data are as follows.

Time (hours per week)    4   10  14  12  4   5   8   11  13  15
Examination marks        26  17  7   12  30  40  20  15  10  5
X     Y     XY     X²     Y²
4     26    104    16     676
10    17    170    100    289
14    7     98     196    49
12    12    144    144    144
4     30    120    16     900
5     40    200    25     1600
8     20    160    64     400
11    15    165    121    225
13    10    130    169    100
15    5     75     225    25

ΣX = 96    ΣY = 182    ΣXY = 1366    ΣX² = 1076    ΣY² = 4408

$$r_p = \frac{10(1366) - 96(182)}{\sqrt{\left[10(1076) - (96)^2\right]\left[10(4408) - (182)^2\right]}}$$

$$r_p = -0.927$$
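The column totals and the coefficient for Example 3 can be checked with a short script. This is a sketch in plain Python; the variable names are illustrative, while the data are taken from the example.

```python
import math

hours = [4, 10, 14, 12, 4, 5, 8, 11, 13, 15]    # X: hours per week
marks = [26, 17, 7, 12, 30, 40, 20, 15, 10, 5]  # Y: examination marks

n = len(hours)
sum_x = sum(hours)                                 # 96
sum_y = sum(marks)                                 # 182
sum_xy = sum(x * y for x, y in zip(hours, marks))  # 1366
sum_x2 = sum(x * x for x in hours)                 # 1076
sum_y2 = sum(y * y for y in marks)                 # 4408

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_p = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 96 182 1366 1076 4408
print(round(r_p, 3))                         # -0.927
```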

The Pearson correlation coefficient of -0.927 indicates a strong negative
linear relationship between the time spent playing on the computer
and student performance. We can therefore conclude that if students
spend a lot of time playing on the computer, it will adversely affect
their studies.
Spearman Rank Correlation
Coefficient Test
Introduction

• It is used to measure the degree or strength of the relationship
between two variables when the data (as can be seen from the scatter
diagram) are not normally distributed (non-parametric), or when the
data are at the ordinal level.
• It is also used when the data are at the interval/ratio level but
not normally distributed.
• Denoted by rs for sample data and ρs for population
data.
Theory

• The rank representation of variable X is denoted by the symbol U.
• The rank representation of variable Y is denoted by the symbol V.
Simplified Formula

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

• Where,
d = u - v (difference between each pair of ranks)
n = number of pairs
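A minimal Python sketch of the simplified formula follows. It takes ranks that have already been assigned; the function name `spearman_rs` and the four-item rank lists are made up for illustration.

```python
def spearman_rs(rank_u, rank_v):
    """Spearman's rs from two lists of ranks, using the simplified formula."""
    n = len(rank_u)
    if n != len(rank_v) or n < 2:
        raise ValueError("both rank lists must have the same length (at least 2)")
    sum_d2 = sum((u - v) ** 2 for u, v in zip(rank_u, rank_v))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Illustrative ranks only: identical order gives +1, reversed order gives -1.
print(spearman_rs([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(spearman_rs([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```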
Example 1

• The marks for a random sample of eight candidates in English and
Mathematics are:

Candidate      1    2    3    4    5    6    7    8
English (x)    50   58   35   86   76   43   40   60
Maths (y)      65   72   54   82   32   74   40   53

• Rank the results and hence find Spearman’s rank
correlation coefficient between the two sets of
marks. Comment on the value obtained.
Solution

Eng (x)    Rank (u)    Maths (y)    Rank (v)    d = u - v    d²
50         4           65           5           -1           1
58         5           72           6           -1           1
35         1           54           4           -3           9
86         8           82           8           0            0
76         7           32           1           6            36
43         3           74           7           -4           16
40         2           40           2           0            0
60         6           53           3           3            9
                                                             ∑d² = 72
• Calculate the value of the test statistic, rs:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \quad \text{where } d = u - v$$

$$r_s = 1 - \frac{6(72)}{8(8^2 - 1)}$$

$$r_s = 1 - \frac{432}{504}$$

$$r_s = 0.143$$

Interpretation: There is a very weak positive correlation between the
English and Mathematics rankings.
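The ranking and the calculation for the English and Mathematics marks can be reproduced with a short script. This is a sketch in plain Python; the helper `ranks` is hypothetical and assumes there are no tied marks, which holds for this data set.

```python
def ranks(values):
    """Rank values from smallest (1) to largest; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]

u = ranks(english)  # [4, 5, 1, 8, 7, 3, 2, 6]
v = ranks(maths)    # [5, 6, 4, 8, 1, 7, 2, 3]

n = len(u)
sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(sum_d2)         # 72
print(round(r_s, 3))  # 0.143
```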
Exercise 2
• Early in the first semester, 10 students were
asked to sit a test to determine their
Mathematics ability. At the end of the first
semester they sat their Mathematics
examination. The distribution of the data is not
normal. Calculate the Spearman rank
correlation coefficient for the two sets of
marks and interpret the results.

Student    Pre-test    Examination marks
1          45          92
2          23          86
3          50          97
4          46          95
5          33          87
6          21          76
7          13          72
8          30          84
9          34          85
10         50          98
Solution
• Spearman rho, rs = 0.94

• Interpretation: There is a strong positive correlation between the
pre-test and examination marks, which means that students
who score high marks in the pre-test tend to
score high marks in the examination as well.
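The result can be reproduced with the sketch below. Two assumptions are not stated in the exercise: the two students tied on 50 in the pre-test each receive the average of the ranks they occupy (9.5), and the simplified formula is still applied despite the tie. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank from smallest to largest; tied values share the average rank."""
    ordered = sorted(values)
    result = []
    for v in values:
        first = ordered.index(v) + 1  # first position taken by this value
        count = ordered.count(v)      # how many times the value occurs
        result.append(first + (count - 1) / 2)
    return result


pre_test = [45, 23, 50, 46, 33, 21, 13, 30, 34, 50]
exam = [92, 86, 97, 95, 87, 76, 72, 84, 85, 98]

u = average_ranks(pre_test)
v = average_ranks(exam)

n = len(u)
sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(sum_d2)         # 10.5
print(round(r_s, 2))  # 0.94
```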
