
Planning Analytical Method (PL5101)

Association Between Variables


Measured at the Nominal, Ordinal, and
Interval-Ratio Level

Group 1
25422001-Gugun Muhammad Fauzi
25422033-Sulthon Kamel Machmud
25422016-Ahmad Rowatul Irham
Contents

INTRODUCTION: Definition, Concept, Function

NOMINAL: Chi-Square, PRE

ORDINAL: Continuous, Collapsed

INTERVAL-RATIO: Pearson's R, Regression, ANOVA


INTRODUCTION
Definition Concept Function

Association: measures of association are statistics that summarize the strength and direction of the relationship between variables. Most generally, two variables are said to be associated if the distribution of one of them changes under the various categories or scores of the other.

Measures of association help us trace relationships among variables and reason about causality in those relationships, and they are our most important and powerful statistical tools for documenting, measuring, and analyzing cause-and-effect relationships.

Types of relationship: correlational and experimental

Correlational: The relationship between variables is observed in the natural setting, with no control over the values the objects have on the variables. This approach only observes the relationship between variables and cannot show cause and effect.

Experimental: The relationship between variables is observed in an experiment that controls the value of one variable for the objects under study and observes the accompanying changes in other variables. This approach can show cause-and-effect relationships.

Source: Gulö, W. (2005)


• Ordinal and interval/ratio variables can also be analyzed with the tools for the nominal level.
• Interval/ratio variables can also be analyzed with the tools for the ordinal level.

Correlation (association) analysis
• Nominal
  - Chi-square based: phi coefficient, Cramér's V, and contingency coefficient
  - PRE: lambda
  These answer the questions: Does a relationship exist? How strong is the relationship?
• Ordinal
  - Continuous: Spearman's rho and Kendall's tau
  - Collapsed: gamma, Somers' d, and Kendall's tau-b
  In addition, these answer the question: What is the direction of the relationship?
• Interval/ratio: Pearson's r

Experimental analysis
• Interval/ratio: regression and ANOVA
  In addition, these answer the question: What is the causality of the relationship?
Functions of association analysis:
1. Understanding key variables
2. Experimental input
3. Hypothesis generation
4. Prediction
5. Validity assessment
6. Reliability assessment
7. The basis for many advanced statistical analyses
8. Identification of surrogate variables
Association Between Variables
Measured at the Nominal Level
ASSOCIATION (NOMINAL LEVEL)
Chi-Square PRE

Introduction Phi Coefficient Cramér's V Contingency

Definition Calculation Example

The chi-square (χ²) test is probably the most frequently used hypothesis test. It can be used with variables measured at the nominal level (the lowest level of measurement), and it is nonparametric, meaning that it requires no assumptions about the shape of the population or of the sampling distribution. Chi-square is calculated from a bivariate table, that is, a table that displays the scores of two variables at the same time. This table can then be used to determine the relationship between the two variables.
$$\chi^2_{\text{obtained}} = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o = observed frequency, f_e = expected frequency, and N = number of paired observations.
Example: employment status by accreditation status

Employment Status                  Accredited   Not Accredited   Totals
Working as a social worker             30             10            40
Not working as a social worker         25             35            60
Totals                                 55             45           100

Expected frequencies are f_e = (row marginal × column marginal) / N; for the first cell, f_e = (40 × 55) / 100 = 22.

(1) f_o   (2) f_e   (3) f_o − f_e   (4) (f_o − f_e)²   (5) (f_o − f_e)²/f_e
   30        22           8                64                 2.91
   10        18          −8                64                 3.56
   25        33          −8                64                 1.94
   35        27           8                64                 2.37
N = 100   N = 100                                       χ² = 10.78
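As a check on the arithmetic above, here is a minimal Python sketch using only the standard library; `chi_square` is our own illustrative helper, not something from the source. It assumes each expected frequency is (row total × column total) / N, which reproduces the f_e column of the example.

```python
def chi_square(observed):
    """Return (chi-square, expected table) for a bivariate table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # f_e = (row marginal x column marginal) / N for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return chi2, expected


# Rows: working / not working as a social worker; columns: accredited / not accredited
observed = [[30, 10], [25, 35]]
chi2, expected = chi_square(observed)
print(expected)        # [[22.0, 18.0], [33.0, 27.0]]
print(round(chi2, 2))  # 10.77
```

Summing the unrounded terms gives about 10.77; the 10.78 in the table comes from adding the four terms after each was rounded to two decimals.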
The phi coefficient (φ) is a coefficient of association used to determine the relationship between variables measured at the nominal level. It is used for tables with two rows and two columns (2×2). The principle is to divide chi-square by the sample size and take the square root; the result is then interpreted for strength.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$

where χ² = chi-square value and N = number of paired observations.

Value          Relationship
0.00 – 0.10    Weak
0.11 – 0.30    Moderate
> 0.30         Strong
Using χ² = 10.78 and N = 100 from the employment and accreditation example:

$$\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{10.78}{100}} = 0.33$$

Conclusion: There is a strong association between accreditation and employment as a social worker.
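The phi calculation can be sketched the same way. `phi_coefficient` and `phi_strength` are hypothetical helper names; the strength labels follow the cut-offs in the phi interpretation table (0.00–0.10 weak, 0.11–0.30 moderate, above 0.30 strong), treating values between 0.10 and 0.11 as weak.

```python
import math


def phi_coefficient(chi2, n):
    """phi = sqrt(chi2 / N), intended for a 2x2 table."""
    return math.sqrt(chi2 / n)


def phi_strength(phi):
    """Verbal label from the phi interpretation table."""
    if phi <= 0.10:
        return "weak"
    if phi <= 0.30:
        return "moderate"
    return "strong"


phi = phi_coefficient(10.78, 100)
print(round(phi, 2), phi_strength(phi))  # 0.33 strong
```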
Cramér's V is a coefficient of association used to determine the relationship between variables measured at the nominal level. It is used for tables larger than two rows by two columns (larger than 2×2). Cramér's V is more general than the phi coefficient: for tables larger than 2×2, phi can exceed 1.00, which makes it difficult to interpret. Cramér's V divides chi-square by the sample size multiplied by the smaller of (number of rows − 1) and (number of columns − 1), takes the square root, and the result is then interpreted for strength.
$$V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\ c - 1)}}$$

where χ² = chi-square value, N = number of paired observations, and min(r − 1, c − 1) = the smaller of (number of rows − 1) and (number of columns − 1).
Example: academic achievement by membership

                          Membership
Academic      Fraternity    Other           No
Achievement   or Sorority   Organization    Memberships   Totals
Low                4             4              17           25
Moderate          15             6               4           25
High               4            16               5           25
Totals            23            26              26           75

χ² = 31.5 and α = 0.05

$$V = \sqrt{\frac{31.5}{75 \cdot 2}} = \sqrt{0.21} = 0.46$$

Conclusion: There is a strong association between membership in student organizations and academic achievement.
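A sketch of Cramér's V for the membership example, assuming min(r − 1, c − 1) = 2 for the 3×3 table; the helper names are our own. Recomputing χ² directly from the cell counts with the same expected-frequency rule gives about 32.1, slightly above the 31.5 quoted for this example; both values round to V ≈ 0.46.

```python
import math


def chi_square(observed):
    """Chi-square with f_e = row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (fo - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for fo, c in zip(row, col_totals)
    )


def cramers_v(chi2, n, n_rows, n_cols):
    """V = sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Rows: low / moderate / high achievement; columns: fraternity or sorority / other / none
table = [[4, 4, 17], [15, 6, 4], [4, 16, 5]]
print(round(cramers_v(31.5, 75, 3, 3), 2))  # 0.46, using the quoted chi-square
chi2_exact = chi_square(table)
print(round(chi2_exact, 1), round(cramers_v(chi2_exact, 75, 3, 3), 2))  # 32.1 0.46
```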
The contingency coefficient (C) is a coefficient of association used to determine the relationship between variables measured at the nominal level. It is commonly used for tables with a large sample. The principle is to divide chi-square by the sum of the sample size and chi-square, take the square root, and interpret the result for strength.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$

where χ² = chi-square value and N = number of paired observations.

Procedure:
• Determine H0 and H1
• Determine the level of significance (α)
• Calculate χ²
• Calculate C
• Calculate the upper limit, C*
• Compare C and C*
• Determine the critical region
• Make a conclusion

Interpretation (comparing C and C*):
Value          Relationship
0              No relationship
< 0.50         Weak
0.50 – 0.75    Moderate
0.75 – 0.90    Strong
0.90 – 1       Perfect
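The formula for the upper limit C* is not given here, so the sketch below assumes the commonly used C* = √((k − 1)/k) with k = min(rows, columns), and it also assumes that the interpretation table is read on the ratio C / C*. Because this section has no worked example of its own, the sketch reuses the 3×3 membership table (χ² = 31.5, N = 75) purely as an illustration; all function names are hypothetical.

```python
import math


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


def c_upper_limit(n_rows, n_cols):
    """Assumed upper limit C* = sqrt((k - 1) / k), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt((k - 1) / k)


def c_strength(ratio):
    """Label the ratio C / C* with the interpretation table (assumed reading)."""
    if ratio == 0:
        return "no relationship"
    if ratio < 0.50:
        return "weak"
    if ratio < 0.75:
        return "moderate"
    if ratio < 0.90:
        return "strong"
    return "perfect"


# Illustration only: the 3x3 membership table (chi-square = 31.5, N = 75)
c = contingency_coefficient(31.5, 75)
c_star = c_upper_limit(3, 3)
print(round(c, 2), round(c_star, 2), round(c / c_star, 2), c_strength(c / c_star))
# 0.54 0.82 0.67 moderate
```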

Lambda

Proportional reduction in error (PRE) measures of association, such as lambda, tell us how much knowledge of the independent variable improves our predictions of the dependent variable. For nominal-level variables, we first predict the category of the dependent variable (Y) into which each case will fall while ignoring the independent variable (X). With this first prediction, we will often predict a case's value on the dependent variable incorrectly. In the second prediction, we take the independent variable into account. If the two variables are related, the additional information provided by the independent variable will reduce our prediction errors. The stronger the relationship between the variables, the greater the reduction in error.
$$\lambda = \frac{E_1 - E_2}{E_1}$$

where
• E1 = number of prediction errors made when ignoring the independent variable
• E2 = number of prediction errors made when taking the independent variable into account

• Lambda values range from 0.00 to 1.00.
• 0.00 = the variables are not related at all (E1 equals E2).
• 1.00 = perfect association (E2 is zero, and the score on the dependent variable can be predicted without error from the independent variable).
Example: height by gender

              Gender
Height     Male   Female   Totals
Tall        44       8       52
Short        6      42       48
Totals      50      50      100

$$\lambda = \frac{E_1 - E_2}{E_1} = \frac{48 - 14}{48} = \frac{34}{48} = 0.71$$

Conclusion: When multiplied by 100, lambda indicates the strength of the association as a percentage reduction in error. Thus, the lambda just calculated means that knowledge of gender improves our ability to predict height by 71%; that is, we make 71% fewer errors in predicting height when we know gender.
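The example reports E1 = 48 and E2 = 14 without spelling out how they are counted. This sketch assumes the usual modal-category rule: E1 = N minus the largest row total of the dependent variable, and E2 = the sum, over the columns of the independent variable, of each column total minus the largest cell in that column. Under that assumption it reproduces the numbers above; `lambda_pre` is a hypothetical name.

```python
def lambda_pre(table):
    """Lambda for a table whose rows are the dependent variable (Y)
    and whose columns are the independent variable (X).

    E1: errors when every case is assigned to the largest row (the mode of Y).
    E2: errors when, within each column, every case is assigned to that
        column's largest cell.
    """
    row_totals = [sum(row) for row in table]
    e1 = sum(row_totals) - max(row_totals)
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1, e1, e2


# Rows: tall / short (height); columns: male / female (gender)
lam, e1, e2 = lambda_pre([[44, 8], [6, 42]])
print(e1, e2, round(lam, 2))  # 48 14 0.71
```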
Association Between Variables
Measured at the Ordinal Level
ASSOCIATION (ORDINAL LEVEL)
Introduction Collapsed Continuous

Continuous ordinal
• The variable has a wide range of possible values.
• Example: the ranks of the cases on the variable itself, without categorization (X: rank 1, 2, 3, 4, 5, 6, 7, ...).

Collapsed ordinal
• Each variable has only a few values (no more than 5 or 6), generated by grouping continuous ordinal data into categories.
• Example: ordinal data collapsed into lower, middle, and upper groups (X: Low, Medium, High).
Gamma, Somers' d, and Kendall's Tau-b

Goodman and Kruskal's gamma is a symmetric measure used to predict the order of pairs of cases.

Purpose: Predict whether one case will score higher or lower than another on the variable of interest.

Characteristics
• It does not distinguish between independent and dependent variables.
• It responds to the existence, strength, and direction of the relationship.

Principle
• Predict the order of a pair of cases on one variable while ignoring their order on the other variable.
• Repeat the predictions while taking into account the order on the other variable.
• Interpret the result as a proportional reduction in error.
$$G = \frac{N_s - N_d}{N_s + N_d}$$

where
Ns = number of pairs of cases ranked in the same order on both variables
Nd = number of pairs of cases ranked in different orders on the two variables

• If Nd = 0, G = 1.
• If Ns = 0, G = −1.
• If Ns = Nd, G = 0.

Value               Interpretation
Existence
  0                 No relationship
  ±1                Relationship
Strength
  0.00 – 0.30       Weak
  0.31 – 0.60       Moderate
  > 0.60            Strong
Direction
  Positive (+)      Same direction
  Negative (−)      Opposite direction
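Counting Ns and Nd in a 3×3 table (each cell is named by its column category, then its row category; both are ordered Low, Med, High):
• Ns: multiply the frequency of each cell by the sum of the frequencies in all cells below and to the right of it, then add the products (cells LL, ML, LM, and MM). For the cell in the Low row and Low column (frequency Fx), whose cells below and to the right have frequencies Fb, Fc, Fe, and Ff: LLs = Fx(Fb + Fc + Fe + Ff).
• Nd: multiply the frequency of each cell by the sum of the frequencies in all cells below and to the left of it, then add the products (cells HL, ML, HM, and MM). For the cell in the Low row and High column (frequency Fz), whose cells below and to the left have frequencies Fa, Fb, Fd, and Fe: HLs = Fz(Fa + Fb + Fd + Fe).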
Somers' d is a modification of the gamma coefficient that also counts the pairs of cases tied within one column or one row. It therefore depends on which variable is treated as dependent, and it comes in two asymmetric versions, dxy and dyx, which add either the pairs tied on X (Tx) or the pairs tied on Y (Ty) to the denominator of gamma.

Ns : number of pairs of cases ranked in the same order on the two variables
Nd : number of pairs of cases ranked in different orders on the two variables
Ty : pairs tied on Y (cases in the same row)
Tx : pairs tied on X (cases in the same column)

Cells of a 3×3 table (X in the columns, Y in the rows):
            X: Low   Med   High
Y: Low         Fx    Fy    Fz
   Med         Fa    Fb    Fc
   High        Fe    Fd    Ff

Tx: multiply each cell by the total of the cells below it in the same column, working column by column from left to right, then along row 2:
X1 = Fx(Fa + Fe), X2 = Fy(Fb + Fd), X3 = Fz(Fc + Ff), X4 = Fa(Fe), X5 = Fb(Fd), X6 = Fc(Ff)

Ty: multiply each cell by the total of the cells to its right in the same row, working row by row from top to bottom, then along column 2:
Y1 = Fx(Fy + Fz), Y2 = Fa(Fb + Fc), Y3 = Fe(Fd + Ff), Y4 = Fy(Fz), Y5 = Fb(Fc), Y6 = Fd(Ff)
Kendall's tau-b takes into account the pairs tied on both the dependent variable (Ty) and the independent variable (Tx).

$$\tau_b = \frac{N_s - N_d}{\sqrt{(N_s + N_d + T_y)(N_s + N_d + T_x)}}$$

Ns : number of pairs of cases ranked in the same order on the two variables
Nd : number of pairs of cases ranked in different orders on the two variables
Ty : pairs tied on Y
Tx : pairs tied on X

• Kendall's tau-b is used for symmetric (square) tables.
• Tau-b can reach a value of ±1 only if the table is square.
• Kendall's tau-b is not recommended for rectangular tables.
In principle, gamma, Somers' d, and Kendall's tau-b differ in the complexity of their calculation. Gamma, G = (Ns − Nd)/(Ns + Nd), is the simplest of the collapsed ordinal association measures, while Somers' d (dxy, dyx) and Kendall's tau-b are more complex because they also take the pairs tied on Y and X (Ty and Tx) into account.
Example: political rights by income level

                      Income Level
Political Rights   Low   Middle   High   Total
Low                 15     12       2      29
Middle               8     20       3      31
High                 0      7      22      29
Total               23     39      27      89

Calculate Ns
LL : 15(20 + 7 + 3 + 22) = 780
ML : 12(3 + 22) = 300
LM : 8(7 + 22) = 232
MM : 20(22) = 440
Ns = 1752

Calculate Nd
HL : 2(20 + 7 + 8 + 0) = 70
ML : 12(8 + 0) = 96
HM : 3(7 + 0) = 21
MM : 20(0) = 0
Nd = 187

$$G = \frac{N_s - N_d}{N_s + N_d} = \frac{1752 - 187}{1752 + 187} = 0.81$$

Gamma is 0.81, which indicates a strong, positive relationship between income level and level of political rights.
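The cell-by-cell counting of Ns and Nd can be automated for any ordered table. A sketch in Python, assuming rows and columns are both ordered from low to high as in the political rights example; `concordant_discordant` is a hypothetical helper name.

```python
def concordant_discordant(table):
    """Return (Ns, Nd) for a table whose rows and columns are ordered low -> high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cells below and to the right: the pair is ordered the same way (Ns)
            below_right = sum(table[r][c] for r in range(i + 1, n_rows)
                              for c in range(j + 1, n_cols))
            # Cells below and to the left: the pair is ordered differently (Nd)
            below_left = sum(table[r][c] for r in range(i + 1, n_rows)
                             for c in range(j))
            ns += table[i][j] * below_right
            nd += table[i][j] * below_left
    return ns, nd


# Rows: political rights (low, middle, high); columns: income level (low, middle, high)
political_rights = [[15, 12, 2], [8, 20, 3], [0, 7, 22]]
ns, nd = concordant_discordant(political_rights)
gamma = (ns - nd) / (ns + nd)
print(ns, nd, round(gamma, 2))  # 1752 187 0.81
```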
Calculate Tx and Ty (same table: rows = political rights Low, Middle, High; columns = income level Low, Middle, High)

Tx
X1 : 15(8 + 0) = 120
X2 : 12(20 + 7) = 324
X3 : 2(3 + 22) = 50
X4 : 8(0) = 0
X5 : 20(7) = 140
X6 : 3(22) = 66
Tx = 700

Ty
Y1 : 15(12 + 2) = 210
Y2 : 8(20 + 3) = 184
Y3 : 0(7 + 22) = 0
Y4 : 12(2) = 24
Y5 : 20(3) = 60
Y6 : 7(22) = 154
Ty = 632

Somers' d: dxy = 0.60, dyx = 0.59

$$\tau_b = \frac{1752 - 187}{\sqrt{(1752 + 187 + 632)(1752 + 187 + 700)}} = 0.60$$
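The tie counts and the two tie-adjusted measures can be sketched as follows. Tau-b is computed as (Ns − Nd)/√((Ns + Nd + Tx)(Ns + Nd + Ty)). For Somers' d the sketch assumes the common definition (Ns − Nd)/(Ns + Nd + T), where T is the number of pairs tied on one of the variables; because textbooks differ on which version is labelled dyx and which dxy, the code names each version by the ties it uses. The two results (about 0.61 and 0.59) are close to the Somers' d values reported for this example. Function names are hypothetical.

```python
import math


def tied_pairs(table):
    """Return (Tx, Ty): Tx = pairs sharing a column (tied on the column variable X),
    Ty = pairs sharing a row (tied on the row variable Y)."""
    n_rows, n_cols = len(table), len(table[0])
    # Each cell times the cells below it in the same column
    tx = sum(table[i][j] * sum(table[r][j] for r in range(i + 1, n_rows))
             for i in range(n_rows) for j in range(n_cols))
    # Each cell times the cells to its right in the same row
    ty = sum(table[i][j] * sum(table[i][c] for c in range(j + 1, n_cols))
             for i in range(n_rows) for j in range(n_cols))
    return tx, ty


def somers_d(ns, nd, ties):
    """(Ns - Nd) / (Ns + Nd + ties on one variable)."""
    return (ns - nd) / (ns + nd + ties)


def tau_b(ns, nd, tx, ty):
    return (ns - nd) / math.sqrt((ns + nd + tx) * (ns + nd + ty))


political_rights = [[15, 12, 2], [8, 20, 3], [0, 7, 22]]
ns, nd = 1752, 187                  # from the gamma calculation
tx, ty = tied_pairs(political_rights)
d_with_ty = somers_d(ns, nd, ty)
d_with_tx = somers_d(ns, nd, tx)
tb = tau_b(ns, nd, tx, ty)
print(tx, ty, round(d_with_ty, 3), round(d_with_tx, 3), round(tb, 2))
# 700 632 0.609 0.593 0.6
```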
Spearman's Rho and Kendall's Tau

Spearman's rho (rs) is a measure of association for ordinal-level variables that have a broad range of different scores and few ties between cases on either variable. It measures the relationship between two variables by looking at the differences between the rankings of the cases on the two variables.

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where D = the difference between the ranks of a case on the two variables and N = number of cases.

Value               Interpretation
Existence
  0                 No relationship
  ±1                Relationship
Strength
  0.00 – 0.30       Weak
  0.31 – 0.60       Moderate
  > 0.60            Strong
Direction
  Positive (+)      Same direction
  Negative (−)      Opposite direction

rs² represents the proportional reduction in error when predicting the ranking of cases on one variable from their ranking on the other. For example, if rs = 0.86, then rs² = 0.74, so using the ranking on one variable reduces the errors in predicting the ranking on the other by 74%.
Example (step by step)

Object    X    Rank    Y    Rank     D      D²
A        18     1     15     3       2      4
B        17     2     18     1      −1      1
C        15     3     12     4       1      1
D        12     4     16     2      −2      4
E        10     5      6     8       3      9
F         9     6     10     5      −1      1
G         8    7.5     8     6     −1.5    2.25
H         8    7.5     7     7     −0.5    0.25
I         5     9      5     9       0      0
J         1    10      2    10       0      0
Total                                0     22.5

Step 1
• Create a table like the one above from the given data (object, variables, ranks, D, and D²).
Step 2
• Rank the cases from high to low on each variable (X and Y): give the highest score on each variable the first rank (1) and continue on both variables until the last rank.
• If cases have the same score on a variable, assign each of them the average of the ranks they occupy.
Step 3
• Calculate D, the difference between rank Y and rank X (rank Y − rank X). Note: the total of column D is 0.
• Square each D and enter the result in column D².
• Calculate rs with the Spearman's rho formula.
Example: ethnic diversity (X) and economic inequality (Y)

Country        X    Rank     Y     Rank    D     D²
India          91     1     29.7    13     12    144
South Africa   87     2     58.4     1     −1      1
Kenya          83     3     57.5     2     −1      1
Canada         75     4     31.5    11      7     49
Malaysia       72     5     48.4     4     −1      1
Kazakhstan     69     6     32.7     8      2      4
Egypt          65     7     32.0    10      3      9
USA            63     8     41.0     5     −3      9
Sri Lanka      57     9     30.1    12      3      9
Mexico         50    10     50.3     3     −7     49
Spain          44    11     32.5     9     −2      4
Australia      31    12     33.7     7     −5     25
Finland        26    13     25.6    15      2      4
Ireland         4    14     35.9     6     −8     64
Poland          3    15     27.2    14     −1      1
Total                                            374

$$r_s = 1 - \frac{6(374)}{15(15^2 - 1)} = 0.33, \qquad r_s^2 = 0.11$$

An rs of 0.33 indicates a moderate, positive relationship between the level of ethnic diversity and economic inequality: the more ethnically diverse a country is, the higher its inequality tends to be. An rs² of 0.11 indicates that the error in predicting is reduced by 11% in this case.
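A Python sketch of the Spearman procedure (steps 1–3): rank from high to low with tied scores sharing the average rank, take D = rank Y − rank X, and apply rs = 1 − 6ΣD²/(N(N² − 1)). The function names are our own. With tied scores (objects G and H), this formula is the usual textbook approximation rather than an exact correlation of the ranks.

```python
def rank_high_to_low(values):
    """Rank 1 = highest score; tied scores share the average of their positions."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Return (rs, sum of D^2) with rs = 1 - 6*sum(D^2) / (N(N^2 - 1)), D = rank Y - rank X."""
    rx, ry = rank_high_to_low(x), rank_high_to_low(y)
    n = len(x)
    sum_d2 = sum((b - a) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


# Objects A to J from the step-by-step example
x1 = [18, 17, 15, 12, 10, 9, 8, 8, 5, 1]
y1 = [15, 18, 12, 16, 6, 10, 8, 7, 5, 2]
# Countries from India to Poland (X = ethnic diversity, Y = economic inequality)
x2 = [91, 87, 83, 75, 72, 69, 65, 63, 57, 50, 44, 31, 26, 4, 3]
y2 = [29.7, 58.4, 57.5, 31.5, 48.4, 32.7, 32.0, 41.0, 30.1, 50.3, 32.5, 33.7, 25.6, 35.9, 27.2]

rs1, d2_1 = spearman_rho(x1, y1)
rs2, d2_2 = spearman_rho(x2, y2)
print(d2_1, round(rs1, 2))                      # 22.5 0.86
print(d2_2, round(rs2, 2), round(rs2 ** 2, 2))  # 374.0 0.33 0.11
```

The first data set gives rs ≈ 0.86 (rs² ≈ 0.74), which matches the figures used in the rs² example.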
Kendall's tau is a measure of association that traces the order of cases on the two variables simultaneously.

$$\tau = \frac{S}{\frac{1}{2}n(n - 1)}$$

where S = the total of the positive and negative counts for all cases, and n = number of cases.

Assumptions
• The independent variable summarized in the table must be sorted first.
• The dependent variable summarized in the table does not have to be sorted.
Example (step by step)

Object    X    Rank    Y    Rank    >     <     S
A        18     1     15     3      7    −2     5
B        17     2     18     1      8     0     8
C        15     3     12     4      6    −1     5
D        12     4     16     2      6     0     6
E        10     5      6     8      2    −3    −1
F         9     6     10     5      4     0     4
G         8    7.5     8     6      3     0     3
H         8    7.5     7     7      2     0     2
I         5     9      5     9      1     0     1
J         1    10      2    10      0     0     0
Total                                          33

Step 1
• Create a table like the one above from the given data (object, variables, ranks, >, <, and S).
Step 2
• Arrange the cases by their rank on the independent variable (X), from the first rank to the last.
• The dependent variable (Y) follows the resulting order and does not need to be sorted.
• Rank the cases from high to low on Y.
• If cases have the same score on a variable, assign each of them the average of the ranks they occupy.
Step 3
• Based on the ranks of the dependent variable (Y), enter in the > column the number of cases further down the list whose Y rank is greater than the Y rank of the current case; likewise, enter in the < column, with a minus sign, the number of cases further down whose Y rank is smaller.
• Calculate S for each case (the > count minus the < count).
• Calculate τ with Kendall's tau formula.
Example: ethnic diversity (X) and economic inequality (Y)

Country        X    Rank     Y     Rank    >     <     S
India          91     1     29.7    13     2   −12   −10
South Africa   87     2     58.4     1    13     0    13
Kenya          83     3     57.5     2    12     0    12
Canada         75     4     31.5    11     3    −8    −5
Malaysia       72     5     48.4     4     9    −1     8
Kazakhstan     69     6     32.7     8     5    −4     1
Egypt          65     7     32.0    10     3    −5    −2
USA            63     8     41.0     5     6    −1     5
Sri Lanka      57     9     30.1    12     2    −4    −2
Mexico         50    10     50.3     3     5     0     5
Spain          44    11     32.5     9     2    −2     0
Australia      31    12     33.7     7     2    −1     1
Finland        26    13     25.6    15     0    −2    −2
Ireland         4    14     35.9     6     1     0     1
Poland          3    15     27.2    14     0     0     0
Total                                                 25

$$\tau = \frac{S}{\frac{1}{2}n(n - 1)} = \frac{25}{\frac{1}{2}(15)(15 - 1)} = \frac{25}{105} = 0.24$$

A tau of 0.24 indicates a weak, positive relationship between the level of ethnic diversity and economic inequality.
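The counting rule for S in step 3 translates directly into code. This sketch follows the procedure above (cases listed by X from highest to lowest, Y ranked high to low, then the > and < counts); like that procedure it ignores ties, so it computes Kendall's tau rather than tau-b. Function names are hypothetical.

```python
def rank_high_to_low(values):
    """Rank 1 = highest score; tied scores share the average of their positions."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def kendall_tau(x, y):
    """Return (S, tau) with tau = S / (n(n - 1)/2), using the > and < counts."""
    n = len(x)
    # Step 2: list the cases by X from highest to lowest (equal X keep input order)
    order = sorted(range(n), key=lambda i: x[i], reverse=True)
    y_rank = rank_high_to_low(y)
    ranks = [y_rank[i] for i in order]
    s = 0
    for i, r in enumerate(ranks):
        later = ranks[i + 1:]
        greater = sum(1 for q in later if q > r)  # entry in the ">" column
        smaller = sum(1 for q in later if q < r)  # entry in the "<" column (shown as negative)
        s += greater - smaller
    return s, s / (n * (n - 1) / 2)


x1 = [18, 17, 15, 12, 10, 9, 8, 8, 5, 1]
y1 = [15, 18, 12, 16, 6, 10, 8, 7, 5, 2]
x2 = [91, 87, 83, 75, 72, 69, 65, 63, 57, 50, 44, 31, 26, 4, 3]
y2 = [29.7, 58.4, 57.5, 31.5, 48.4, 32.7, 32.0, 41.0, 30.1, 50.3, 32.5, 33.7, 25.6, 35.9, 27.2]

s1, tau1 = kendall_tau(x1, y1)
s2, tau2 = kendall_tau(x2, y2)
print(s1, round(tau1, 2))  # 33 0.73
print(s2, round(tau2, 2))  # 25 0.24
```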
Association Between Variables Measured
at the Interval and Ratio Level
ASSOCIATION (INTERVAL/RATIO LEVEL)

R Pearson Regression ANOVA Case Study

Introduction Calculation Example

The correlation coefficient r measures the association between interval-ratio variables (Healey, 2010).

A key assumption for the use of the correlation coefficient is that the variables are random variables measured on either an interval or a ratio scale (Kachigan, 1986).

Steps:
1. Determine the presence or absence of a relationship
2. Find the regression line
3. Calculate the correlation coefficient (Pearson's r)
1. Determining the presence or absence of a relationship
   a. The 2×2 contingency table (figure; source: Kachigan, 1986)
   b. Scattergrams (scatter diagrams) (figure; source: Kachigan, 1986)
   One of the most important reasons for examining the scattergram before proceeding with statistical analysis is to see whether there is an association between the variables (Healey, 2010).
2. Find the regression line (figures; sources: Kachigan, 1986; Healey, 2010)

A key assumption underlying these statistical techniques is that the two variables have an essentially linear relationship. If the relationship is nonlinear, you might need to treat the variables as if they were ordinal in level of measurement (Healey, 2010).

The correlation coefficient r is appropriate only for measuring the degree of relationship between variables that are linearly related (Kachigan, 1986).
3. Calculate the correlation coefficient (Pearson's r)

Notation: X = independent variable, Y = dependent variable, N = number of paired observations, X̄ = mean of X, Ȳ = mean of Y.

(a) Raw-score formula (Kachigan, 1986):

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

(b) Alternative formula using the means and standard deviations (Kachigan, 1986).

(c) Alternative formula used when the regression equation is calculated at the same time (Sugiyono, 2018).

The formulas are equivalent and give the same value of r. Which procedure is used is a matter of personal preference, the nature of the data, and the availability of computing facilities (Kachigan, 1986).
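The raw-score formula (a) can be checked in a few lines of Python. The sketch also includes an equivalent deviation-score form (our own choice, not necessarily the document's formula (b) or (c)) to show that both give the same r. As test data it assumes the twelve paired observations from the children/housework regression example (X = number of children, Y = husband's housework); the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Raw-score formula: (N*SumXY - SumX*SumY) / sqrt([N*SumX^2 - (SumX)^2][N*SumY^2 - (SumY)^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviation(x, y):
    """Equivalent deviation-score form: Sum(X - mean)(Y - mean) / sqrt(Sum(X - mean)^2 * Sum(Y - mean)^2)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Paired data from the regression example: number of children (X), husband's housework (Y)
children = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
housework = [1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4]
r = pearson_r(children, housework)
print(round(r, 2), round(pearson_r_deviation(children, housework), 2), round(r ** 2, 2))
# 0.5 0.5 0.25
```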
Interpretation of Pearson's r

Values between 0.00 and 0.30 are described as weak, values between 0.30 and 0.60 as moderate, and values greater than 0.60 as strong (Healey, 2010).

Coefficient (Pearson's r)   Relationship Level
0.00 – 0.199                Very weak
0.20 – 0.399                Weak
0.40 – 0.599                Moderate
0.60 – 0.799                Strong
0.80 – 1.000                Very strong
Source: Sugiyono (2018)
Calculate the coefficient of determination

The coefficient of determination (r²) indicates precisely the extent to which X helps us predict, understand, or explain Y (Healey, 2010). It compares predicting Y scores without X with predicting Y using knowledge of X (that is, using the regression equation). Example: when r = 0.5, the coefficient of determination is r² = 0.25, which indicates that X explains 25% of the total variation in Y.

Calculate the significance test of the correlation coefficient

The model assumptions for the test of significance of Pearson's r are that both variables are normally distributed and that the relationship is homoscedastic. Homoscedasticity means that the variance of the Y scores is uniform for all values of X; this can be checked by seeing whether the Y scores are evenly spread above and below the regression line along its entire length (Healey, 2010).
Test of significance for Pearson's r (figure source: Healey, 2010)

H0: no relationship (ρ = 0); H1: relationship (ρ ≠ 0)

$$t_{\text{obtained}} = r\sqrt{\frac{N - 2}{1 - r^2}}$$

Information:
α = 0.05
N = 10
Degrees of freedom (df) = N − 2 = 10 − 2 = 8
t(critical) = 2.306 (from the t table)
r = 0.9129

Because t(obtained) ≈ 6.33 is greater than t(critical) = 2.306, t(obtained) falls in the critical region: the null hypothesis is rejected and the alternative hypothesis (a relationship exists) is accepted.
Source: Sugiyono, 2018
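A sketch of the significance test, assuming t(obtained) = r√((N − 2)/(1 − r²)) with df = N − 2 and the tabled two-tailed critical value of 2.306 for α = 0.05; `t_obtained` is a hypothetical helper name.

```python
import math


def t_obtained(r, n):
    """t = r * sqrt((N - 2) / (1 - r^2)); degrees of freedom = N - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.9129, 10
t_critical = 2.306  # two-tailed, alpha = 0.05, df = 8 (t table)
t = t_obtained(r, n)
print(round(t, 2), "reject H0" if abs(t) > t_critical else "fail to reject H0")
# 6.33 reject H0
```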
The Interpretation of Correlation

• The issue of causality: the existence of a correlation between variables does not imply causality.
• A correlation does serve a data-reduction, descriptive function.
• The descriptive power of correlation analysis is most evident in its potential for predicting the values of one variable given information about another variable. Whatever the limitations on its theoretical interpretation, it has practical applications.
• Another interpretation of the correlation between two variables concerns the degree to which they covary.
Regression analysis provides us with an equation describing the nature of the relationship between two variables. Because the regression equation can predict values of the criterion variable, it is more than just a curve-fitting technique. Regression analysis can be used with both correlational and experimental data (Kachigan, 1986).

Types: simple regression and multiple regression

Purpose:
• To determine whether there is a relationship between two variables
• To describe the nature of the relationship, if one exists, in the form of a mathematical equation
• To assess the degree of accuracy of the description or prediction achieved by the regression equation
• In the case of multiple regression, to assess the relative importance of the various predictor variables in their contribution to the variation of the criterion variable
Formula

Slope (b):
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

Y intercept (a):
$$a = \bar{Y} - b\bar{X}$$

Squared deviation*:
$$\sum (x - x')^2$$
where x = observed value and x′ = predicted value.

*This formula will be discussed by the regression analysis group in week 5.
Example: number of children and husband's housework

Number of        Husband's         Conditional
Children (X)     Housework (Y)     Mean of Y
1                1, 2, 3, 5        2.75
2                3, 1              2.00
3                5, 0              2.50
4                6, 3              4.50
5                7, 4              5.50

$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{18.32}{26.68} = 0.69$$

The slope is 0.69, which means that for each one-unit change in X, Y changes by 0.69.
Y intercept (a):

$$a = \bar{Y} - b\bar{X} = 3.33 - (0.69)(2.67) = 3.33 - 1.84 = 1.49$$

Model: least-squares regression line

$$Y = a + bX = 1.49 + (0.69)(6) = 1.49 + 4.14 = 5.63$$

Interpretation: In a dual-earner family with 6 children (X), the husband is predicted to devote 5.63 hours (Y) a week to household chores.
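A least-squares sketch for the children/housework data, using b = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)² and a = Ȳ − bX̄; `least_squares` is a hypothetical helper name. Without intermediate rounding it gives b = 0.6875, a = 1.50, and a prediction of 5.625 hours for X = 6; the 1.49 and 5.63 in the worked example come from rounding b to 0.69 before computing a.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # Σ(X − X̄)(Y − Ȳ)
    sxx = sum((xi - mean_x) ** 2 for xi in x)                           # Σ(X − X̄)²
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


children = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
housework = [1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4]
a, b = least_squares(children, housework)
print(round(b, 4), round(a, 2), round(a + b * 6, 3))  # 0.6875 1.5 5.625
```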
Analysis of variance is useful for identifying and describing a linear or other systematic relationship between variables, including qualitative variables; it can also be used to identify relationships between predictor and criterion variables. ANOVA has also proved useful in guiding the efficient design of data collection schemes, especially complex experiments (Kachigan, 1986).

Analysis of variance is the appropriate test of significance when the independent variable has more than two categories (a t test can be used only when the independent variable has exactly two categories). For ANOVA, the null hypothesis is that the populations from which the samples are drawn have the same mean score on the dependent variable (Healey, 2010).
Principle: ANOVA compares the variance within groups with the variance between groups.

1. Total sum of squares (SST)
$$SST = SSB + SSW$$
2a. Within sum of squares (SSW)
2b. Between sum of squares (SSB)
3. Calculate the F ratio
$$F = \frac{\text{Mean square between}}{\text{Mean square within}}$$
Example: scores by religious affiliation (N = 20, k = 5 groups)

Protestant    Catholic      Jewish        None          Other
X     X²      X     X²      X     X²      X     X²      X     X²
8     64      12    144     12    144     15    225     10    100
12    144     20    400     13    169     16    256     18    324
13    169     25    625     18    324     23    529     12    144
17    289     27    729     21    441     28    784     12    144
50    666     84    1898    64    1078    82    1794    52    712   (totals)

1. Find SST (grand mean X̄ = 16.6)
$$SST = \sum X^2 - N\bar{X}^2 = (666 + 1898 + 1078 + 1794 + 712) - (20)(16.6)^2 = 6148 - (20)(275.56) = 636.80$$

2. Find SSB
$$SSB = \sum N_k(\bar{X}_k - \bar{X})^2 = 67.24 + 77.44 + 1.44 + 60.84 + 51.84 = 258.80$$

3. Find SSW
$$SSW = SST - SSB = 636.80 - 258.80 = 378.00$$

4. Calculate the degrees of freedom
dfw (within) = N − k = 20 − 5 = 15
dfb (between) = k − 1 = 5 − 1 = 4

5. Calculate the mean squares within and between
$$\text{Mean square within} = \frac{SSW}{df_w} = \frac{378.00}{15} = 25.20$$
$$\text{Mean square between} = \frac{SSB}{df_b} = \frac{258.80}{4} = 64.70$$

6. Find the F ratio
$$F = \frac{\text{Mean square between}}{\text{Mean square within}} = \frac{64.70}{25.20} = 2.57$$

Significance (ANOVA)
H0: μ1 = μ2 = μ3 = μ4 = μ5
H1: at least one population mean is different
Sampling distribution = F distribution; α = 0.05; F(critical) = 3.06
Because F(obtained) = 2.57 is less than F(critical) = 3.06, H0 is accepted (not rejected) and H1 is rejected.
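The six ANOVA steps can be sketched in Python for the five groups above. `one_way_anova` is a hypothetical helper; SST uses the computational form ΣX² − N·X̄², SSB = Σ N_k(X̄_k − X̄)², and F(critical) = 3.06 is taken from the example rather than computed.

```python
def one_way_anova(groups):
    """Return SST, SSB, SSW, dfb, dfw and F for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    sst = sum(x * x for x in scores) - n * grand_mean ** 2  # ΣX² − N·X̄²
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sst - ssb
    dfb, dfw = k - 1, n - k
    f = (ssb / dfb) / (ssw / dfw)  # mean square between / mean square within
    return sst, ssb, ssw, dfb, dfw, f


groups = {
    "Protestant": [8, 12, 13, 17],
    "Catholic": [12, 20, 25, 27],
    "Jewish": [12, 13, 18, 21],
    "None": [15, 16, 23, 28],
    "Other": [10, 18, 12, 12],
}
sst, ssb, ssw, dfb, dfw, f = one_way_anova(list(groups.values()))
f_critical = 3.06  # alpha = 0.05, df = (4, 15), from the F table
print(round(sst, 2), round(ssb, 2), round(ssw, 2), dfb, dfw, round(f, 2))
# 636.8 258.8 378.0 4 15 2.57
print("reject H0" if f > f_critical else "fail to reject H0")  # fail to reject H0
```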
In summary, analysis of variance can be viewed as testing hypotheses about the presence of relationships between predictor and criterion variables, regression analysis as describing the nature of those relationships, and r² as measuring the strength of the relationships (Kachigan, 1986).

1
Urban form, land use, and cover change and their impact on carbon emissions in the
Monterrey Metropolitan area, Mexico
by Carpio, Alejandro et al (2021)

This study analyzes the urban expansion of the Monterrey Metropolitan Area (MMA), Mexico, from 1990 to 2019 using satellite imagery and GIS to determine its relation to carbon emissions.

The analysis considers the following variables: population data, urban expansion, gross domestic product, motor vehicle inventory, vegetation displacement, and energy use in the residential and commercial sectors.

The research uses Pearson correlation coefficient analysis to measure the magnitude of the relationships between the variables for the periods 1990, 1995, 2000, 2005, 2010, 2015, and 2019.
Tools: Minitab 18

Variables:
A) Population
B) MMA gross domestic product
C) Urban growth
D) Peri-urban growth
E) Population density
F) Vehicle units
G) Vegetation displacement
H) CO2 carbon sink
I) CO2 emissions (residential and commercial sector)
• GDP (B) is positively correlated with urban population (A), motor vehicle acquisition (F), urban growth (C), peri-urban growth (D), and emissions from the residential and commercial sector (I). GDP (B) shows a negative correlation with density (E).
• Population (A) is positively correlated with almost all variables except vegetation displacement (G), and shows a clear negative correlation with density (E).
• Urban growth (C) has a high positive correlation with peri-urban growth (D) and vehicle units (F), and is also related to emissions from the residential and commercial sectors (I). Urban growth (C) has a strong negative correlation with density (E).
• Peri-urban growth (D) has a strong positive correlation with the increase in vehicle units (F) and with emissions from the residential and commercial sector (I).
• Population density (E) has a strong negative correlation with almost every variable except vegetation displacement (G) and CO2 sink removal (H).
• Vehicle units (F) have a strong positive correlation with emissions from the commercial and residential sector (I).

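Pearson's r can be checked with the following formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

where
r = Pearson's correlation coefficient
n = number of pairs of scores
Σxy = sum of the products of the paired scores
Σx = sum of the x scores
Σy = sum of the y scores
Σx² = sum of the squared x scores
Σy² = sum of the squared y scores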
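The raw-score formula translates directly into code. The Python sketch below uses only the sums n, Σx, Σy, Σxy, Σx² and Σy² defined in the list. The function name is hypothetical, and the scores in the usage are made-up numbers for illustration.

```python
from math import sqrt


def pearson_r_from_sums(x, y):
    """Pearson's r from the raw-score sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must hold the same number (>= 2) of paired scores")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # Σxy
    sum_x2 = sum(a * a for a in x)             # Σx²
    sum_y2 = sum(b * b for b in y)             # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Made-up paired scores for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r_from_sums(x, y)
print(round(r, 4))  # 0.7746
```

Here n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86. This gives r = 30 / √1500 ≈ 0.7746. The raw-sum form mirrors hand calculation but can lose precision when scores are very large, so software usually works with deviations from the mean instead.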

Calculate with:
1) Microsoft Excel
2) STATA

2
Does local planning of fast-growing medium-sized towns lead to higher urban
intensity or to sprawl? Cases from Zhejiang Province
by Guan, ChengHe et al. (2022)

This paper uses open-source geospatial data and regulatory detailed planning to measure the urban intensity of existing and planned development.

Variables:
• Building density
• Diversity of land use function
• Accessibility to destinations
• Compactness of development
• Composite score

The research uses Pearson correlation coefficient analysis to determine whether any of the urban intensity variables are highly correlated.
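To check whether any pair of intensity variables is highly correlated, one can screen a correlation matrix against a cut-off. The Python sketch below assumes the matrix is already available as a dictionary keyed by variable pairs. It also assumes that |r| ≥ 0.7 counts as "high"; this cut-off is an assumption, not a value from Guan et al. The variable names v1 to v3 and their r values are hypothetical.

```python
def highly_correlated_pairs(matrix, threshold=0.7):
    """Return (pair, r) items with |r| >= threshold, strongest first."""
    flagged = [(pair, r) for pair, r in matrix.items() if abs(r) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical coefficients for three generic variables v1, v2, v3.
example = {("v1", "v2"): -0.75, ("v1", "v3"): 0.10, ("v2", "v3"): 0.82}
for pair, r in highly_correlated_pairs(example):
    print(pair, r)
```

Screening on the absolute value keeps strong negative pairs as well as strong positive ones, because both signal that two measures carry overlapping information.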

The results show that:
• In the existing forms, no pair exhibits a significant correlation.
• In the planned forms, diversity of land use function and building density have a significant negative correlation.
• In both the existing and planned forms, the composite score is positively associated with compactness of development.
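Whether a correlation is "significant" depends on both r and the number of pairs n. The paper's exact test is not stated here. One standard option tests H0: ρ = 0 by converting r into t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, then comparing |t| with a critical value from a t table. The Python sketch below implements that test. The function names are hypothetical, and the r and n in the usage are made-up values.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least three pairs of scores")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


def is_significant(r, n, t_critical):
    """Two-tailed decision: reject H0 when |t| exceeds the tabled critical value."""
    return abs(t_for_r(r, n)) > t_critical


# Hypothetical example: r = 0.50 from n = 27 pairs (df = 25).
# 2.060 is the two-tailed critical t at alpha = 0.05 for df = 25.
t = t_for_r(0.50, 27)
print(round(t, 3), is_significant(0.50, 27, 2.060))  # 2.887 True
```

The same r can be significant with many pairs and non-significant with few. This is why "no significant correlation" in the existing forms does not by itself mean the coefficients were close to zero.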

The paper notes that the purpose of this analysis is to identify the potential contribution of the proposed urban intensity measures. For that reason, correlation analysis was selected instead of regression analysis.

The limitations of pairwise correlation analysis include its inability to confirm relationships between variables and its limited explanatory power.

3
Evaluation of pavement condition index by different methods: Case study of
Maringá, Brazil
by Pinatt, J. M. et al. (2020)

The objectives of this study were to analyze the objective and subjective evaluations of the Pavement Condition Index (PCI) used in the Urban Pavement Management System (UPMS) using GIS, and to identify the most damaged roads. The research was carried out in the state of Paraná (PR), Brazil. A functional evaluation was performed, with defects identified by visual analysis using the PCI method. Two types of evaluation, objective and subjective, were performed and compared with each other using Pearson's correlation coefficient.

Pearson's correlation coefficient between the two sets of PCI values was 0.95, which indicates a strong positive correlation: the closer r is to 1, the stronger the positive linear relationship. The authors therefore conclude that the subjective evaluation can be used to define the pavement condition index as a simplified alternative to the objective evaluation by the PCI method.
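A related quantity is the coefficient of determination r². It is the share of the variance in one set of scores that is linearly accounted for by the other. Applied to the reported coefficient as simple arithmetic (not a figure quoted from the paper):

$$ r^{2} = 0.95^{2} = 0.9025 $$

Roughly 90% of the variation in one set of PCI values is therefore shared with the other. This is why the subjective method can stand in for the objective one.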

4
An object-based image analysis in QGIS for image classification and assessment of
coastal spatial planning
by Zaki, A. et al. (2022)

One methodology for assessing spatial planning consists of classifying imagery, projecting a future land cover map, and comparing the outcome of the projection with the spatial planning map (Amri et al., 2017; Hakim et al., 2020). From the perspective of urban and regional planning practice, pixel-based image analysis is a commonly used method for image classification. The study also calculates Pearson's correlation between the spatial variables of land cover change and the probability of each land cover changing to other land covers.

Most of the variables have negligible correlation (Pearson's correlation = 0.00 to 0.30 or 0.00 to −0.30), but three pairs of variables have higher correlation (Table 3):
• a low positive correlation between land value and population density (Pearson's correlation = 0.30 to 0.50),
• a low negative correlation between distance from built-up areas and population density (Pearson's correlation = −0.30 to −0.50), and
• a high positive correlation between distance from built-up areas and distance from roads (Pearson's correlation = 0.70 to 0.90).
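The bands quoted in these results can be turned into a small labelling helper. The Python sketch below uses the 0.30, 0.50 and 0.70 to 0.90 cut-offs given in the results. The "moderate" (0.50 to 0.70) and "very high" (0.90 to 1.00) bands are an assumption: they complete the scale following a commonly used rule of thumb and are not taken from Zaki et al. The function name is hypothetical.

```python
def describe_correlation(r):
    """Label the strength and direction of Pearson's r.

    Cut-offs at 0.30, 0.50 and 0.70-0.90 follow the bands quoted in the text;
    the 'moderate' (0.50-0.70) and 'very high' (0.90-1.00) labels are an
    assumed completion of that scale. A value exactly on a boundary is
    placed in the higher band (a design choice).
    """
    if not -1 <= r <= 1:
        raise ValueError("Pearson's r must lie between -1 and 1")
    size = abs(r)
    if size < 0.30:
        return "negligible"
    if size < 0.50:
        strength = "low"
    elif size < 0.70:
        strength = "moderate"
    elif size < 0.90:
        strength = "high"
    else:
        strength = "very high"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for value in (0.12, 0.42, -0.38, 0.85):
    print(value, describe_correlation(value))
```

Strength depends only on |r|, and the sign is reported separately as direction. This matches how the results describe a "low negative" and a "low positive" correlation within the same 0.30 to 0.50 band.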

The high correlation between built-up areas and the road network is shown by the fact that residential buildings within the study area stretch along the roads. This is also supported by the theory that people prefer to live in areas with higher road accessibility (Patarasuk, 2013). Moreover, the transition probabilities suggest that 25 percent of paddy fields and bare land were converted into waterbodies and 14 percent into built-up areas in 2015–2020.

5
Landscape index for indicating water quality and application to master plan of
regional lake cluster restoration
by Xinxia He et al. (2021)

The study examined the impacts of landscape pattern changes on water quality across lake clusters, taking the aquaculture area in the Lixia River hinterland of China as a case. Multi-temporal Landsat remote sensing data from 1985 to 2018 were used, and the space-for-time substitution (SFTS) method was applied to explore the relationship between landscape pattern and water quality.

The SFTS results showed that PD_A had a positive correlation with total nitrogen (TN) (r = 0.26), ammonia nitrogen (NH3-N) (r = 0.21), and Chlorophyta (r = 0.33); that is, water quality degraded as PD_A increased. Hence, PD_A could serve as a water quality indicator for lakes in the Lixia River hinterland. The study is expected to provide a viable method for designing regional restoration plans for degraded and over-developed wetland areas.

6
Where do networks really work? The effects of the Shenzhen greenway network on
supporting physical activities
by Kun Liu et al. (2016)

In metropolitan areas, greenways are increasingly interconnected, forming a greenway network (GN). A GN is considered to encourage physical activities, but verifying this claim is difficult because traditional social survey methods do not obtain fine-grained geographic activity data on a large scale. To address this shortcoming, volunteered geographic information and geographic information system techniques were used to describe the distribution of physical activities in a GN and to explore the effects of greenway network features on supporting activities.

In the paper's statistical analyses, GN density negatively influenced the presence of physical activities in Model 1, while being positively related to physical activity diversity. Shenzhen's GN planning assumes that the network increases the likelihood of people using the greenways, but the models revealed contradictory results. To explain these results, bivariate correlation analyses were conducted to examine the relations between GN density and the surrounding variables.
CONCLUSION

Association at the Nominal Level
• Nominal variable analysis can answer the existence and strength of the relationship between variables.
• It is divided into two measures of association (correlative type): those based on chi-square and those based on PRE.
• The chi-square-based measures of association are the Phi coefficient, Cramér's V and the contingency coefficient.
• The PRE-based measure of association is lambda.

Association at the Ordinal Level
• Ordinal variable analysis can answer the existence, strength and direction of the relationship between variables.
• It is divided into two measures of association (correlative type): continuous ordinal and collapsed ordinal.
• The collapsed ordinal measures of association are gamma, Somers' d and Kendall's tau-b.
• The continuous ordinal measures of association are Spearman's rho and Kendall's tau.

Association at the Interval/Ratio Level
• Interval/ratio variable analysis can answer the existence, strength, direction and nature of the relationship between variables.
• It can answer the type of relationship in both a correlative and an experimental manner.
• The correlative type of relationship is measured with the product-moment coefficient, i.e. Pearson's correlation r.
• The experimental type of relationship is analysed with ANOVA and regression.
Literature
• Kachigan (1986). Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods. New York: Radius Press.
• Gulö, W. (2005). Metodologi Penelitian. Jakarta: PT Grasindo.
• Healey, J. F. (2010). Statistics: A Tool for Social Research (9th ed.). Wadsworth: Cengage Learning.
• Sugiyono (2018). Metode Penelitian Kuantitatif. Bandung: Alfabeta CV.
• Carpio, A., et al. (2021). Urban form, land use, and cover change and their impact on carbon emissions in the Monterrey Metropolitan area, Mexico. Urban Climate, 39, 1–17.
• He, X., Chen, C., He, M., Chen, Q., Zhang, J., Li, G., … Dong, J. (2021). Landscape index for indicating water quality and application to master plan of regional lake cluster restoration. Ecological Indicators, 126, 107668.
• Liu, K., Siu, K. W. M., Gong, X. Y., Gao, Y., & Lu, D. (2016). Where do networks really work? The effects of the Shenzhen greenway network on supporting physical activities. Landscape and Urban Planning, 152, 49–58.
• Guan, C., et al. (2022). Does local planning of fast-growing medium-sized towns lead to higher urban intensity or to sprawl? Cases from Zhejiang Province. Cities, 130, 1–13.
• Pinatt, J. M., Chicati, M. L., Ildefonso, J. S., & Filetti, C. R. G. D. arc. (2020). Evaluation of pavement condition index by different methods: Case study of Maringá, Brazil. Transportation Research Interdisciplinary Perspectives, 4, 100100.
• Zaki, A., Buchori, I., Sejati, A. W., & Liu, Y. (2022). An object-based image analysis in QGIS for image classification and assessment of coastal spatial planning. Egyptian Journal of Remote Sensing and Space Science, 25(2), 349–359.
Thank You
