Professional Documents
Culture Documents
Credits
The content was based on previous semester’s course slides created by all previous
lecturers.
2
CORRELATION - STATPROB FASILKOM UI
References
Introduction to Probability and Statistics for Engineers & Scientists, 4th ed., Sheldon M.
Ross, Elsevier, 2009.
Applied Statistics for the Behavioral Sciences, 5th Edition, Hinkle., Wiersma., Jurs.,
Houghton Mifflin Company, New York, 2003.
3
CORRELATION - STATPROB FASILKOM UI
Introduction
While studying babies’ development, we have the variables:
Introduction
While studying babies’ development, we have the variables:
Correlation
5
CORRELATION - STATPROB FASILKOM UI
Scatterplot
Pictures the relationship between variables
6
CORRELATION - STATPROB FASILKOM UI
Scatterplot (2)
What can we infer from this scatterplot?
We need a number!
7
CORRELATION - STATPROB FASILKOM UI
Correlation Coefficient
Correlation coefficient is a measure of the
relationship between two variables.
8
CORRELATION - STATPROB FASILKOM UI
9
CORRELATION - STATPROB FASILKOM UI
10
CORRELATION - STATPROB FASILKOM UI
11
CORRELATION - STATPROB FASILKOM UI
Pearson 𝑟
One of the well-known correlation coefficients is Pearson product-moment correlation
coefficient, symbolized by 𝑟 or Pearson 𝑟.
Pearson 𝑟 was developed by Karl Pearson (1857 - 1936).
Pearson 𝑟 is used most often in the behavioral sciences [Hinkle, 2003].
12
CORRELATION - STATPROB FASILKOM UI
Pearson 𝑟
Suppose there is positive correlation.
ത this
If an individual has a score on variable X that is above the mean of (𝑋),
ത
individual is likely to have a score on the Y variable that is above the mean of Y (𝑌).
The same rationale can be applied for negative correlation (opposite direction).
Karl Pearson defined a correlation coefficient between two variables (Pearson 𝑟) as:
rxy = (z z ) x y
13
CORRELATION - STATPROB FASILKOM UI
rxy =
(z z )
x y
n −1
14
CORRELATION - STATPROB FASILKOM UI
𝑋ത = 534 𝑋ത = 51.47
𝑠𝑥 = 96.53 𝑠𝑦 = 10.11
15
CORRELATION - STATPROB FASILKOM UI
12.67
rxy = = 0.9
14
16
CORRELATION - STATPROB FASILKOM UI
Instead, transform that formula into another formula that doesn’t need to compute z-
score directly
Deviation score formula
Raw score formula
Covariance to find Pearson 𝑟
17
CORRELATION - STATPROB FASILKOM UI
18
CORRELATION - STATPROB FASILKOM UI
rxy =
(xy )
( x y )
2 2
𝑋ത = 534 𝑌ത = 51.47
19
CORRELATION - STATPROB FASILKOM UI
12332
rxy = = 0.90
(130460)(1429.72)
𝑋ത = 534 𝑌ത = 51.47
𝑠𝑥 = 96.53 𝑠𝑦 = 10.11
20
CORRELATION - STATPROB FASILKOM UI
n (XY ) − X Y
rxy =
(n X 2
− ( X ) n Y − ( Y )
2
)( 2 2
)
Advantages
We only need raw scores here
𝑋, 𝑌 are raw scores for each variable
21
CORRELATION - STATPROB FASILKOM UI
n ( XY ) − X Y
rxy =
(n X 2
)(
− ( X ) n Y 2 − ( Y )
2 2
)
15(424.580 ) − (8.010)(772)
rxy = = 0.90
(15(4.407.800) − 8010 )(15(41.162) − 772 )
2 2
22
CORRELATION - STATPROB FASILKOM UI
(X - X)(Y - Y) (xy )
s xy = =
n -1 n −1
s xy ( x − x)( y
i i − y)
rxy = = i =1
s xs y (n − 1)s x s y
23
CORRELATION - STATPROB FASILKOM UI
Pearson 𝑟
Two Conditions For Computing Pearson 𝑟
The two variables to be correlated must be paired observations for the same set of individuals
or object
We use mean and variance in computing Pearson 𝑟 → the variables being correlated must be
measured on an interval or ratio scale.
Factors affecting the size of Pearson 𝑟
24
CORRELATION - STATPROB FASILKOM UI
Linearity
Relationship between two variables can be:
linear
Linear
Curvilinear
curvilinear
Pearson 𝑟 is an index of the linear
relationship between two variables.
curvilinear
25
CORRELATION - STATPROB FASILKOM UI
Linearity (2)
If the Pearson 𝑟 is applied to variables
that are curvilinear, it will underestimate
the relationship between the variables
Pearson r of the data in the picture is 0.
But, it has a perfect relation 𝑦 = 𝑥 2
Here, Pearson 𝑟 is not suitable.
26
CORRELATION - STATPROB FASILKOM UI
Homogeneity
Homogeneity vs Variance?
27
CORRELATION - STATPROB FASILKOM UI
28
CORRELATION - STATPROB FASILKOM UI
29
CORRELATION - STATPROB FASILKOM UI
30
CORRELATION - STATPROB FASILKOM UI
31
CORRELATION - STATPROB FASILKOM UI
32
CORRELATION - STATPROB FASILKOM UI
Coefficient of Determination
The square of correlation coefficient (𝑟2) equals the proportion of the total variance in 𝑌
that can be associated with variance in 𝑋, or the coefficient of determination.
s 2A
r = 2
sY2
Previously, 𝑟 = 0.69, 𝑟2 = 0.48 so, 48% variance in 𝑌 can be associated with the variance
in 𝑋.
33
CORRELATION - STATPROB FASILKOM UI
34
CORRELATION - STATPROB FASILKOM UI
Why?
Rankings are ordinal data, the Pearson 𝑟 is not applicable to them.
35
CORRELATION - STATPROB FASILKOM UI
36
CORRELATION - STATPROB FASILKOM UI
6 d 2
= 1−
n( n 2 − 1)
37
CORRELATION - STATPROB FASILKOM UI
6 d 2
= 1−
n( n 2 − 1)
6(36.5)
𝜌=1−
15(225 − 1)
𝜌 = 0.93
38
CORRELATION - STATPROB FASILKOM UI
39
CORRELATION - STATPROB FASILKOM UI
Two variables with high correlation show that they have strong association.
But, it doesn’t necessarily follow that scores on one variable are directly caused by scores
on the other variable.
A third, fourth, or a combination of other variables may be causing the two correlated
variables.
Examples?
40
CORRELATION - STATPROB FASILKOM UI
41
𝑃𝑒𝑎𝑟𝑠𝑜𝑛 𝑟 = 0.81
CORRELATION - STATPROB FASILKOM UI
42
CORRELATION - STATPROB FASILKOM UI
43