PL5101 - Group 1 - Association Analysis

Planning Analytical Method (PL5101)
Association Between Variables Measured at the Nominal, Ordinal, and Interval-Ratio Level
Group 1
25422001-Gugun Muhammad Fauzi
25422033-Sulthon Kamel Machmud
25422016-Ahmad Rowatul Irham
MAIN MENU
INTRODUCTION: Definition, Concept, Function
NOMINAL: Chi-Square, PRE
ORDINAL: Continuous, Collapsed
INTERVAL-RATIO: Pearson's r, Regression, ANOVA

INTRODUCTION
Definition
Association: statistics that summarize the strength and direction of the relationship between variables. Most generally, two variables are said to be associated if the distribution of one of them changes under the various categories or scores of the other.
Measures of association help us trace relationships among variables and investigate causality in those relationships; they are our most important and powerful statistical tools for documenting, measuring, and analyzing cause-and-effect relationships.

Type
Correlation: The relationship between variables is observed in the natural environment (there is no control over the values of the variables held by the object). It only observes the relationship between variables and cannot show cause and effect.
Experimental: The relationship between variables is observed through experiments that control the value of one variable in the object under study and observe the accompanying changes in other variables. It is able to show cause-and-effect relationships.
Source: Gulö, W (2005)

• Variables measured at the ordinal and interval/ratio levels can also be analyzed with the tools for the nominal level.
• Variables measured at the interval/ratio level can also be analyzed with the tools for the ordinal level.

Association analysis (correlation type)
Nominal: Chi-Square (Phi coefficient, Cramér's V, and Contingency coefficient) and PRE (Lambda). These answer the questions: Does a relationship exist? How strong is the relationship?
Ordinal: Continuous (Spearman's rho and Kendall's tau) and Collapsed (Gamma, Somers' d, and Kendall's tau-b). These additionally answer the question: What is the direction of the relationship?
Interval/Ratio: Pearson's r

Association analysis (experimental type)
Interval/Ratio: Regression and ANOVA. These answer the question: What is the causality of the relationship?
1 Understanding key variables
2 Experimental input
3 Hypothesis generation
4 Prediction
5 Validity assessment
6 Reliability assessment
7 The basis for many advanced statistical analyses
8 Identification of surrogate variables
Measured at the Nominal Level
ASSOCIATION (NOMINAL LEVEL)
Chi-Square: Introduction, Phi Coefficient, Cramér's V, Contingency Coefficient
PRE: Lambda
The chi-square (χ²) test is probably the most frequently used hypothesis test. It can be carried out with variables measured at the nominal level (the lowest level of measurement) and is nonparametric, meaning that it does not require any assumptions about the shape of the population or of the sampling distribution.
Chi-square is calculated from a bivariate table, i.e., a table that displays the scores of two variables at the same time. This table can then be used to determine the relationship between the two variables.
$$\chi^2_{\text{obtained}} = \sum \frac{(f_o - f_e)^2}{f_e}$$
f_o = observed frequency
f_e = expected frequency
N = number of paired observations
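Observed frequencies:
Employment Status \ Accreditation Status | Accredited | Not Accredited | Totals
Working as a social worker | 30 | 10 | 40
Not working as a social worker | 25 | 35 | 60
Totals | 55 | 45 | 100

Computation:
(1) f_o | (2) f_e | (3) f_o − f_e | (4) (f_o − f_e)² | (5) (f_o − f_e)² / f_e
30 | 22 | 8 | 64 | 2.91
10 | 18 | −8 | 64 | 3.56
25 | 33 | −8 | 64 | 1.94
35 | 27 | 8 | 64 | 2.37
Σf_o = N = 100 | Σf_e = N = 100 | | | χ² = 10.78
Each expected frequency is (row total × column total) / N; for example, f_e = (40 × 55) / 100 = 22.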
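A minimal Python sketch of the same computation, useful for checking the hand arithmetic. It assumes the observed frequencies are given as a list of rows (working / not working as a social worker by accredited / not accredited); the helper names are illustrative only.

```python
def expected_frequencies(observed):
    """Expected frequency for each cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Obtained chi-square: sum over all cells of (fo - fe)^2 / fe."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Rows: working / not working as a social worker; columns: accredited / not accredited.
observed = [[30, 10], [25, 35]]
chi2 = chi_square(observed)
print(expected_frequencies(observed))
print(round(chi2, 2))  # about 10.77 before rounding the individual terms
```

Summing the unrounded terms gives about 10.77; the 10.78 in the table comes from adding the rounded values in column (5). SciPy's `scipy.stats.chi2_contingency` returns the same expected frequencies, but note that by default it applies Yates' continuity correction to 2x2 tables (`correction=True`), so its statistic will differ from this uncorrected value.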
The phi coefficient (φ) is a measure of association used to determine the relationship between variables measured at the nominal level. It is used for tables with two rows and two columns (2x2). The principle is to divide chi-square by the number of cases and take the square root; the result is then interpreted for strength.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
χ² = chi-square value
N = number of paired observations

Value | Relationship
0.00 – 0.10 | Weak
0.11 – 0.30 | Moderate
> 0.30 | Strong
Applying the formula to the employment example (χ² = 10.78, N = 100):
$$\phi = \sqrt{\frac{10.78}{100}} = 0.33$$
Conclusion: There is a strong association between accreditation and employment as a social worker.
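The phi calculation and the strength labels can be written as a small Python helper. The cut-offs (0.10 and 0.30) come from the table above; how values falling between 0.10 and 0.11 are labelled is an assumption of this sketch.

```python
import math


def phi_coefficient(chi2, n):
    """Phi for a 2x2 table: square root of chi-square divided by N."""
    return math.sqrt(chi2 / n)


def phi_strength(phi):
    """Verbal label using the slide's cut-offs (0.10 and 0.30)."""
    if phi <= 0.10:
        return "weak"
    if phi <= 0.30:
        return "moderate"
    return "strong"


phi = phi_coefficient(10.78, 100)
print(round(phi, 2), phi_strength(phi))  # 0.33 strong
```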
Cramér's V is a measure of association used to determine the relationship between variables measured at the nominal level. It is used for tables larger than two rows by two columns (more than 2x2). Cramér's V is more general than the phi coefficient: for tables larger than 2x2, phi can exceed 1.00, which makes it difficult to interpret. The principle of Cramér's V is to divide chi-square by the number of cases multiplied by the smaller of (number of rows − 1) and (number of columns − 1), take the square root, and interpret the result for strength.
$$V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\ c - 1)}}$$
χ² = chi-square value
N = number of paired observations
min(r − 1, c − 1) = the smaller of (number of rows − 1) and (number of columns − 1)

Academic Achievement \ Membership | Fraternity or Sorority | Other Organization | No Memberships | Totals
Low | 4 | 4 | 17 | 25
Moderate | 15 | 6 | 4 | 25
High | 4 | 16 | 5 | 25
Totals | 23 | 26 | 26 | 75
χ² = 31.5 and α = 0.05
$$V = \sqrt{\frac{31.5}{(75)(2)}} = \sqrt{0.21} = 0.46$$
Conclusion: There is a strong association between membership in student organizations and academic achievement.
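A Python sketch of Cramér's V, assuming a 3x3 table with rows = academic achievement and columns = membership type. It evaluates V both from the chi-square reported on the slide (31.5) and from a chi-square recomputed from the frequencies. The recomputed value is about 32.14; the material does not say where the small difference comes from, and V rounds to 0.46 either way.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


def chi_square(observed):
    """Obtained chi-square, with expected = row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (fo - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for fo, c in zip(row, col_totals)
    )


# Rows: low / moderate / high achievement; columns: fraternity or sorority, other, none.
table = [[4, 4, 17], [15, 6, 4], [4, 16, 5]]
v_slide = cramers_v(31.5, 75, 3, 3)               # chi-square as reported on the slide
v_table = cramers_v(chi_square(table), 75, 3, 3)  # chi-square recomputed from the table
print(round(v_slide, 2), round(v_table, 2))
```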
The contingency coefficient (C) is a measure of association used to determine the relationship between variables measured at the nominal level. It is commonly used in tables with a large number of samples. The principle is to divide chi-square by the sum of chi-square and the number of cases, take the square root, and interpret the result for strength.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
χ² = chi-square value
N = number of paired observations
Upper limit:
$$C^* = \sqrt{\frac{k - 1}{k}}, \quad k = \min(\text{rows}, \text{columns})$$

Procedure:
• Determine H0 and H1
• Determine the level of significance (α)
• Calculate χ² (obtained)
• Calculate C
• Calculate the upper limit C*
• Compare C and C*
• Determine the critical region
• Make a conclusion

Interpretation (C compared with C*):
Value | Relationship
0 | No relationship
< 0.50 | Weak
0.50 – 0.75 | Moderate
0.75 – 0.90 | Strong
0.90 – 1 | Perfect
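The contingency coefficient and its upper limit can be sketched in Python as follows. The sketch assumes C* = sqrt((k − 1)/k) with k = min(rows, columns) and expresses "compare C and C*" as the ratio C / C*; that reading is an assumption. The numbers reuse the 2x2 social-work example purely for illustration.

```python
import math


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


def c_upper_limit(n_rows, n_cols):
    """Upper limit C* = sqrt((k - 1) / k), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt((k - 1) / k)


# Illustration only: the 2x2 social-work example (chi2 = 10.78, N = 100).
c = contingency_coefficient(10.78, 100)
c_star = c_upper_limit(2, 2)
print(round(c, 2), round(c_star, 2), round(c / c_star, 2))
```

For this table C ≈ 0.31 and C* ≈ 0.71, so C / C* ≈ 0.44.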
Lambda
A proportional reduction in error (PRE) measure of association such as lambda tells us how much knowledge of the independent variable improves our predictions of the dependent variable. For nominal-level variables, we first predict the category into which each case will fall on the dependent variable (Y) while ignoring the independent variable (X); in this first round we will often predict a case's value incorrectly. In the second prediction, we take the independent variable into account. If the two variables are related, the additional information provided by the independent variable will reduce our prediction errors. The stronger the relationship between the variables, the greater the reduction in error.
$$\lambda = \frac{E_1 - E_2}{E_1}$$
E1 = number of prediction errors made when ignoring the independent variable
E2 = number of prediction errors made when considering the independent variable
• Lambda values range from 0.00 to 1.00
• 0.00 = the variables are not related at all (E1 equals E2)
• 1.00 = perfect association (E2 is zero, and the score on the dependent variable can be predicted without error from the independent variable)
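Height \ Gender | Male | Female | Totals
Tall | 44 | 8 | 52
Short | 6 | 42 | 48
Totals | 50 | 50 | 100
Ignoring gender, we predict the modal category (Tall) for every case and are wrong for the 48 short people, so E1 = 48. Using gender, we predict Tall for males (6 errors) and Short for females (8 errors), so E2 = 14.
$$\lambda = \frac{E_1 - E_2}{E_1} = \frac{48 - 14}{48} = \frac{34}{48} = 0.71$$
Conclusion: When multiplied by 100, lambda indicates the strength of the association as a percentage reduction in error. The lambda just calculated means that knowledge of gender improves our ability to predict height by 71%; that is, we make 71% fewer errors predicting height when we know gender.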
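The two rounds of prediction behind lambda can be sketched in Python. This assumes the dependent variable (height) is in the rows and the independent variable (gender) in the columns; `lambda_pre` is a hypothetical helper name.

```python
def lambda_pre(table):
    """Goodman-Kruskal lambda with the dependent variable in the rows.

    E1: errors when every case is predicted to be in the modal row (ignoring X).
    E2: errors when, within each column of X, cases are predicted to be in
        that column's modal row.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    e1 = n - max(row_totals)
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return e1, e2, (e1 - e2) / e1


# Rows: tall / short (dependent); columns: male / female (independent).
e1, e2, lam = lambda_pre([[44, 8], [6, 42]])
print(e1, e2, round(lam, 2))  # 48 14 0.71
```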
Measured at the Ordinal Level
ASSOCIATION (ORDINAL LEVEL)
Introduction: Continuous and Collapsed ordinal data
Continuous Ordinal
• This data has a wide range of possible values.
• Example: the rank belonging to the variable itself (without being categorized).
  X variable: Rank 1, 2, 3, 4, 5, 6, 7, ...
Collapsed Ordinal
• This data has only a few (no more than 5 or 6) values for each variable and is generated by grouping continuous ordinal data into categories.
• Example: ordinal data collapsed by forming categories of upper, middle, and lower groups.
  X variable: Category Low, Medium, High
Gamma
Gamma (Goodman and Kruskal) is a symmetrical measure used to estimate the order of pairs of cases.
Purpose: Predict whether one case will score higher or lower than another on the variable of interest.
Characteristics:
• Does not distinguish between independent and dependent variables
• Responds to the existence, strength, and direction of the relationship
Principle:
• Predict the order of a pair of cases on one variable while ignoring their order on the other variable
• Repeat the predictions while taking the order on the other variable into account
• Interpret the result as a proportional reduction in error
$$G = \frac{N_s - N_d}{N_s + N_d}$$
If Nd = 0, G = 1; if Ns = 0, G = −1; if Ns = Nd, G = 0.
Ns: number of pairs of cases ranked in the same order on both variables
Nd: number of pairs of cases ranked in different orders on the two variables
Value interpretation:
Existence: 0 = no relationship; nonzero (up to ±1) = relationship
Strength: 0.00 – 0.30 weak; 0.31 – 0.60 moderate; > 0.60 strong
Direction: positive (+) = same direction; negative (−) = opposite direction
[Figure: counting pairs in a 3x3 table with rows and columns Low, Med, High. For Ns, each cell frequency is multiplied by the sum of the frequencies below and to the right of it, e.g., LLs: Fx(Fb + Fc + Fe + Ff). For Nd, each cell frequency is multiplied by the sum of the frequencies below and to the left of it, e.g., HLs: Fz(Fa + Fb + Fd + Fe).]
Somers' d
Somers' d is a modification of the gamma coefficient that also counts pairs tied within one row or one column, which means it takes into account which variable is dependent: pairs tied on X (Tx) or pairs tied on Y (Ty).
$$d_{yx} = \frac{N_s - N_d}{N_s + N_d + T_y} \qquad d_{xy} = \frac{N_s - N_d}{N_s + N_d + T_x}$$
Ns: number of pairs of cases ranked in the same order on both variables
Nd: number of pairs of cases ranked in different orders on the two variables
Ty: pairs tied on Y
Tx: pairs tied on X
Cell layout (columns X = Low, Med, High; rows Y = Low, Med, High):
Low: Fx Fy Fz
Med: Fa Fb Fc
High: Fe Fd Ff
Tx (pairs in the same column; multiply each cell by the cells below it, column by column from left to right, then row 2):
X1: Fx(Fa + Fe)   X2: Fy(Fb + Fd)   X3: Fz(Fc + Ff)   X4: Fa(Fe)   X5: Fb(Fd)   X6: Fc(Ff)
Ty (pairs in the same row; multiply each cell by the cells to its right, row by row from top to bottom, then column 2):
Y1: Fx(Fy + Fz)   Y2: Fa(Fb + Fc)   Y3: Fe(Fd + Ff)   Y4: Fy(Fz)   Y5: Fb(Fc)   Y6: Fd(Ff)
Kendall's tau-b takes into account pairs tied on both the dependent variable (Ty) and the independent variable (Tx).
Tau-b
$$\tau_b = \frac{N_s - N_d}{\sqrt{(N_s + N_d + T_y)(N_s + N_d + T_x)}}$$
Ns: number of pairs of cases ranked in the same order on both variables
Nd: number of pairs of cases ranked in different orders on the two variables
Ty: pairs tied on Y
Tx: pairs tied on X
• Kendall's tau-b is used for symmetric (square) tables
• Kendall's tau-b reaches ±1 only if the table is square
• Kendall's tau-b is not recommended for rectangular tables
In principle, Gamma, Somers' d, and Kendall's tau-b are distinguished by the complexity of their calculations. Gamma is the simplest of the collapsed ordinal association measures, while Somers' d and Kendall's tau-b are more complex because they also take ties on Y and X into account.
Political Rights \ Income Level | Low | Middle | High | Total
Low | 15 | 12 | 2 | 29
Medium | 8 | 20 | 3 | 31
High | 0 | 7 | 22 | 29
Total | 23 | 39 | 27 | 89

Calculate Ns (each cell times the cells below and to its right):
LL: 15(20 + 3 + 7 + 22) = 780
ML: 12(3 + 22) = 300
LM: 8(7 + 22) = 232
MM: 20(22) = 440
Ns = 1752

Calculate Nd (each cell times the cells below and to its left):
HL: 2(8 + 20 + 0 + 7) = 70
ML: 12(8 + 0) = 96
HM: 3(0 + 7) = 21
MM: 20(0) = 0
Nd = 187

Gamma:
$$G = \frac{N_s - N_d}{N_s + N_d} = \frac{1752 - 187}{1752 + 187} = 0.81$$
Gamma is 0.81, which indicates a strong, positive relationship between income level and level of political rights.
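A Python sketch of the Ns and Nd counting rules, assuming both the rows (political rights) and the columns (income) are ordered from low to high, as in the table above.

```python
def concordant_discordant(table):
    """Ns and Nd for a table whose rows and columns are both ordered low -> high.

    Ns: each cell times the sum of the cells below and to the right.
    Nd: each cell times the sum of the cells below and to the left.
    """
    ns = nd = 0
    for i, row in enumerate(table):
        below = table[i + 1:]
        for j, freq in enumerate(row):
            ns += freq * sum(sum(r[j + 1:]) for r in below)
            nd += freq * sum(sum(r[:j]) for r in below)
    return ns, nd


def gamma(ns, nd):
    return (ns - nd) / (ns + nd)


# Rows: political rights (low, medium, high); columns: income (low, middle, high).
rights_by_income = [[15, 12, 2], [8, 20, 3], [0, 7, 22]]
ns, nd = concordant_discordant(rights_by_income)
print(ns, nd, round(gamma(ns, nd), 2))  # 1752 187 0.81
```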
Calculate Tx and Ty (same table)
Tx (pairs tied on income, i.e., in the same column):
X1: 15(8 + 0) = 120
X2: 12(20 + 7) = 324
X3: 2(3 + 22) = 50
X4: 8(0) = 0
X5: 20(7) = 140
X6: 3(22) = 66
Tx = 700
Ty (pairs tied on political rights, i.e., in the same row):
Y1: 15(12 + 2) = 210
Y2: 8(20 + 3) = 184
Y3: 0(7 + 22) = 0
Y4: 12(2) = 24
Y5: 20(3) = 60
Y6: 7(22) = 154
Ty = 632
Somers' d:
$$d_{yx} = \frac{1752 - 187}{1752 + 187 + 632} = \frac{1565}{2571} = 0.61 \qquad d_{xy} = \frac{1752 - 187}{1752 + 187 + 700} = \frac{1565}{2639} = 0.59$$
Kendall's tau-b:
$$\tau_b = \frac{1565}{\sqrt{(2571)(2639)}} = 0.60$$
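The tie counts and the two coefficients can be checked with the Python sketch below. It reuses Ns = 1752 and Nd = 187 from the gamma calculation and assumes rows = Y (political rights) and columns = X (income). Textbooks differ in which subscript they attach to which denominator; here d_yx treats Y as dependent and therefore puts Ty in the denominator, which is an assumption of the sketch.

```python
import math


def same_group_pairs(groups):
    """Pairs sharing a group: each cell times the cells after it in the same group."""
    total = 0
    for group in groups:
        for k, freq in enumerate(group):
            total += freq * sum(group[k + 1:])
    return total


# Rows: political rights (Y); columns: income level (X).
table = [[15, 12, 2], [8, 20, 3], [0, 7, 22]]
ns, nd = 1752, 187                          # taken from the gamma calculation
ty = same_group_pairs(table)                # same row    -> tied on Y
tx = same_group_pairs(list(zip(*table)))    # same column -> tied on X

d_yx = (ns - nd) / (ns + nd + ty)           # Y treated as dependent
d_xy = (ns - nd) / (ns + nd + tx)           # X treated as dependent
tau_b = (ns - nd) / math.sqrt((ns + nd + ty) * (ns + nd + tx))
print(tx, ty, round(d_yx, 2), round(d_xy, 2), round(tau_b, 2))
```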
Spearman's Rho
Spearman's rho (rs) is a measure of association for ordinal-level variables that have a broad range of different scores and few ties between cases on either variable. It is based on the differences between the rankings of cases on the two variables.
$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
D = difference between the ranks of a case on the two variables; N = number of cases
Value interpretation:
Existence: 0 = no relationship; nonzero (up to ±1) = relationship
Strength: 0.00 – 0.30 weak; 0.31 – 0.60 moderate; > 0.60 strong
Direction: positive (+) = same direction; negative (−) = opposite direction
rs² represents the proportional reduction in error when predicting the ranking of one variable from the other. Example: if rs = 0.86, then rs² = 0.74, so using the ranking on one variable reduces the error in predicting the other by 74%.
Object | X | Rank X | Y | Rank Y | D | D²
A | 18 | 1 | 15 | 3 | 2 | 4
B | 17 | 2 | 18 | 1 | −1 | 1
C | 15 | 3 | 12 | 4 | 1 | 1
D | 12 | 4 | 16 | 2 | −2 | 4
E | 10 | 5 | 6 | 8 | 3 | 9
F | 9 | 6 | 10 | 5 | −1 | 1
G | 8 | 7.5 | 8 | 6 | −1.5 | 2.25
H | 8 | 7.5 | 7 | 7 | −0.5 | 0.25
I | 5 | 9 | 5 | 9 | 0 | 0
J | 1 | 10 | 2 | 10 | 0 | 0
Step 1
• Create a table like the one above for the given data (object, variables, ranks, D, and D²).
Step 2
• Rank the cases from high to low on each variable (X and Y): give the highest score on each variable rank 1 and continue on both variables until the last rank.
• If cases have the same score on a variable, give each of them the average of the ranks they occupy.
Step 3
• Calculate D, the difference between rank Y and rank X (rank Y − rank X). Note: the D column sums to 0.
• Square each D and enter it in the D² column.
• Calculate rs with the Spearman's rho formula.
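A Python sketch of Steps 1 to 3 for objects A to J, assuming ranks run from high to low and tied scores receive the average rank. D is computed as rank Y − rank X, which gives object B a D of −1 and object E a D of +3, so ΣD = 0; it then yields ΣD² = 22.5 and rs ≈ 0.86. The helper names are illustrative.

```python
def average_ranks(values):
    """Rank from high to low (rank 1 = highest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions pos..end hold ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D = rank Y - rank X."""
    rx, ry = average_ranks(x), average_ranks(y)
    d = [b - a for a, b in zip(rx, ry)]
    n = len(x)
    sum_d2 = sum(v * v for v in d)
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), d, sum_d2


x = [18, 17, 15, 12, 10, 9, 8, 8, 5, 1]  # objects A..J
y = [15, 18, 12, 16, 6, 10, 8, 7, 5, 2]
rs, d, sum_d2 = spearman_rho(x, y)
print(sum(d), sum_d2, round(rs, 2))  # 0.0 22.5 0.86
```

With tied ranks, this textbook formula is an approximation; library routines such as `scipy.stats.spearmanr` instead compute Pearson's r on the ranks, which can differ slightly.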
Example: ethnic diversity (X) and economic inequality (Y)
Country | X | Rank X | Y | Rank Y | D | D²
India | 91 | 1 | 29.7 | 13 | 12 | 144
South Africa | 87 | 2 | 58.4 | 1 | −1 | 1
Kenya | 83 | 3 | 57.5 | 2 | −1 | 1
Canada | 75 | 4 | 31.5 | 11 | 7 | 49
Malaysia | 72 | 5 | 48.4 | 4 | −1 | 1
Kazakhstan | 69 | 6 | 32.7 | 8 | 2 | 4
Egypt | 65 | 7 | 32.0 | 10 | 3 | 9
USA | 63 | 8 | 41.0 | 5 | −3 | 9
Sri Lanka | 57 | 9 | 30.1 | 12 | 3 | 9
Mexico | 50 | 10 | 50.3 | 3 | −7 | 49
Spain | 44 | 11 | 32.5 | 9 | −2 | 4
Australia | 31 | 12 | 33.7 | 7 | −5 | 25
Finland | 26 | 13 | 25.6 | 15 | 2 | 4
Ireland | 4 | 14 | 35.9 | 6 | −8 | 64
Poland | 3 | 15 | 27.2 | 14 | −1 | 1
Total | | | | | | 374
$$r_s = 1 - \frac{6(374)}{15(15^2 - 1)} = 0.33 \qquad r_s^2 = 0.11$$
An rs of 0.33 indicates a moderate, positive relationship between the level of ethnic diversity and economic inequality: the more ethnically diverse a country is, the higher its inequality tends to be. The rs² value of 0.11 indicates that the error in predicting the ranking is reduced by 11% in this case.
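Because the ranks are already given in the table, the country example reduces to a few lines of Python; this sketch assumes the rank columns exactly as listed (India to Poland).

```python
def spearman_from_ranks(rank_x, rank_y):
    """rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for already-ranked data."""
    n = len(rank_x)
    sum_d2 = sum((ry - rx) ** 2 for rx, ry in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))


# Ranks in table order, India ... Poland (X = ethnic diversity, Y = inequality).
rank_x = list(range(1, 16))
rank_y = [13, 1, 2, 11, 4, 8, 10, 5, 12, 3, 9, 7, 15, 6, 14]
sum_d2, rs = spearman_from_ranks(rank_x, rank_y)
print(sum_d2, round(rs, 2), round(rs ** 2, 2))  # 374 0.33 0.11
```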
Kendall's Tau
Kendall's tau is a measure of association that traces the order of the two variables simultaneously.
$$\tau = \frac{S}{\frac{1}{2}n(n - 1)}$$
Assumptions
• The independent variable summarized in the table must be sorted first.
• The dependent variable summarized in the table does not have to be sorted.

Object | X | Rank X | Y | Rank Y | > | < | S
A | 18 | 1 | 15 | 3 | 7 | −2 | 5
B | 17 | 2 | 18 | 1 | 8 | 0 | 8
C | 15 | 3 | 12 | 4 | 6 | −1 | 5
D | 12 | 4 | 16 | 2 | 6 | 0 | 6
E | 10 | 5 | 6 | 8 | 2 | −3 | −1
F | 9 | 6 | 10 | 5 | 4 | 0 | 4
G | 8 | 7.5 | 8 | 6 | 3 | 0 | 3
H | 8 | 7.5 | 7 | 7 | 2 | 0 | 2
I | 5 | 9 | 5 | 9 | 1 | 0 | 1
J | 1 | 10 | 2 | 10 | 0 | 0 | 0
Total | | | | | | | 33
Step 1
• Create a table like the one above for the given data (object, variables, ranks, >, <, and S).
Step 2
• Arrange the cases by the rank of the independent variable (X), from the first rank to the last.
• The dependent variable (Y) follows the resulting order and does not need to be sorted.
• Rank the cases from high to low on the variable Y.
• If cases have the same score on a variable, give each of them the average of the ranks they occupy.
Step 3
• Based on the ranks of the dependent variable (Y), enter in the > column the number of cases further down the list whose Y rank is greater than this case's Y rank, and in the < column the number whose Y rank is smaller, with a minus (−) sign.
• Calculate S for each case as the sum of the > and < columns (positive count minus negative count).
• Calculate tau with the Kendall's tau formula.
Example: ethnic diversity (X) and economic inequality (Y)
Country | X | Rank X | Y | Rank Y | > | < | S
India | 91 | 1 | 29.7 | 13 | 2 | −12 | −10
South Africa | 87 | 2 | 58.4 | 1 | 13 | 0 | 13
Kenya | 83 | 3 | 57.5 | 2 | 12 | 0 | 12
Canada | 75 | 4 | 31.5 | 11 | 3 | −8 | −5
Malaysia | 72 | 5 | 48.4 | 4 | 9 | −1 | 8
Kazakhstan | 69 | 6 | 32.7 | 8 | 5 | −4 | 1
Egypt | 65 | 7 | 32.0 | 10 | 3 | −5 | −2
USA | 63 | 8 | 41.0 | 5 | 6 | −1 | 5
Sri Lanka | 57 | 9 | 30.1 | 12 | 2 | −4 | −2
Mexico | 50 | 10 | 50.3 | 3 | 5 | 0 | 5
Spain | 44 | 11 | 32.5 | 9 | 2 | −2 | 0
Australia | 31 | 12 | 33.7 | 7 | 2 | −1 | 1
Finland | 26 | 13 | 25.6 | 15 | 0 | −2 | −2
Ireland | 4 | 14 | 35.9 | 6 | 1 | 0 | 1
Poland | 3 | 15 | 27.2 | 14 | 0 | 0 | 0
Total | | | | | | | 25
$$\tau = \frac{S}{\frac{1}{2}n(n - 1)} = \frac{25}{\frac{1}{2}(15)(15 - 1)} = \frac{25}{105} = 0.24$$
A tau of 0.24 indicates a weak, positive relationship between the level of ethnic diversity and economic inequality.
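A Python sketch of the S count, assuming the cases are already sorted by the X rank and only the Y ranks are passed in. Like the worked example, it does not adjust for the tie between G and H on X. For the countries it gives S = 25 and tau = 25/105 ≈ 0.238.

```python
def kendall_s(y_ranks):
    """S for cases already sorted by X: for each case, count later cases with a
    larger Y rank (+) and with a smaller Y rank (-), as in the > and < columns."""
    s = 0
    for i, yi in enumerate(y_ranks):
        later = y_ranks[i + 1:]
        s += sum(1 for yj in later if yj > yi) - sum(1 for yj in later if yj < yi)
    return s


def kendall_tau(y_ranks):
    """tau = S / (n(n - 1) / 2)."""
    n = len(y_ranks)
    s = kendall_s(y_ranks)
    return s, s / (n * (n - 1) / 2)


# Y ranks after sorting the cases by X rank.
objects = [3, 1, 4, 2, 8, 5, 6, 7, 9, 10]                        # A..J
countries = [13, 1, 2, 11, 4, 8, 10, 5, 12, 3, 9, 7, 15, 6, 14]  # India..Poland
print(kendall_tau(objects))    # S = 33, tau about 0.73
print(kendall_tau(countries))  # S = 25, tau about 0.24
```

For data with many ties, `scipy.stats.kendalltau` computes the tie-corrected tau-b instead of this simple S / (n(n − 1)/2) version.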
Association Between Variables Measured at the Interval and Ratio Level
ASSOCIATION (INTERVAL/RATIO LEVEL)
Pearson's r, Regression, ANOVA, Case Study
The correlation coefficient r measures the association between interval-ratio variables (Healey, 2010).
A key assumption for the use of the correlation coefficient is that the variables are random variables measured on either an interval or a ratio scale (Kachigan, 1986).
1 Determine the presence or absence of a relationship
2 Find the regression line
3 Calculate the correlation coefficient (Pearson's r)
1 Determining the presence or absence of a relationship
a The 2x2 contingency table (Source: Kachigan, 1986)
b Scattergrams or scatter diagrams
One of the most important reasons for examining the scattergram before proceeding with statistical analysis is to see whether there is an association between the variables (Healey, 2010).
[Figures: example scattergrams. Source: Kachigan, 1986]
2 Find the regression line
A key assumption underlying these statistical techniques is that the two variables have an essentially linear relationship. If the relationship is nonlinear, you might need to treat the variables as if they were ordinal in level of measurement (Healey, 2010).
The correlation coefficient r is only appropriate for measuring the degree of relationship between variables that are linearly related (Kachigan, 1986).
[Figure source: Healey, 2010]
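3 Calculate the correlation coefficient: formula of Pearson's r
(a) Raw-score formula (Kachigan, 1986):
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
(b) Formula using the mean and standard deviation (Kachigan, 1986):
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N s_X s_Y}$$
(c) Alternative formula, used when the regression equation is calculated at the same time (Sugiyono, 2018):
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad x = X - \bar{X},\ y = Y - \bar{Y}$$
X = independent variable; Y = dependent variable; N = number of paired observations; X̄ = mean of X; Ȳ = mean of Y; s_X, s_Y = standard deviations of X and Y (computed with N in the denominator).
The formulas are equivalent and will result in the same value of r. Which procedure is used is a matter of personal preference, the nature of the data, and the availability of computing facilities (Kachigan, 1986).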
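To see that the formulas agree, the Python sketch below evaluates the raw-score and deviation forms on the children/housework data that appear in the regression example of this section. That choice of data is illustrative only; the original material does not compute r for them. Both forms give r ≈ 0.50.

```python
import math

# Number of children (X) and husband's housework hours (Y), one pair per case.
x = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4]


def pearson_raw(x, y):
    """Raw-score form: (N*sumXY - sumX*sumY) / sqrt([N*sumX2 - sumX^2][N*sumY2 - sumY^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_deviation(x, y):
    """Deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


print(round(pearson_raw(x, y), 4), round(pearson_deviation(x, y), 4))
```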
The Interpretation of Pearson's r
Healey (2010): values between 0.00 and 0.30 are described as weak, 0.30–0.60 as moderate, and greater than 0.60 as strong.
Sugiyono (2018):
Coefficient (Pearson's r) | Relationship Level
0.00 – 0.199 | Very Weak
0.20 – 0.399 | Weak
0.40 – 0.599 | Moderate
0.60 – 0.799 | Strong
0.80 – 1.000 | Very Strong
Source: Sugiyono (2018)
Calculate the coefficient of determination (r²)
The coefficient of determination indicates precisely the extent to which X helps us predict, understand, or explain Y (Healey, 2010). It compares predicting the Y scores without X (using the mean of Y) with predicting Y using knowledge of X (using the regression equation).
Example: when r = 0.5, the coefficient of determination is r² = 0.25, which indicates that X explains 25% of the total variation in Y.
Calculate the significance test of the correlation coefficient
The model assumptions for the test of significance of Pearson's r are that both variables are normally distributed and that the relationship is homoscedastic.
Homoscedasticity: the variance of the Y scores is uniform for all values of X. It can be checked by seeing whether the Y scores are evenly spread above and below the regression line along its entire length (Healey, 2010).

$$t_{\text{obtained}} = r\sqrt{\frac{N - 2}{1 - r^2}}$$
Information:
Alpha = 0.05
N = 10
Degrees of freedom (df) = N − 2 = 10 − 2 = 8
t (critical) = 2.306 (from the t table, two-tailed)
r = 0.9129
t (obtained) = 0.9129 √(8 / (1 − 0.9129²)) ≈ 6.33
t (obtained) falls in the rejection region because t (obtained) > t (critical), so the null hypothesis (no relationship) is rejected and the alternative hypothesis (relationship) is accepted.
Source: Sugiyono, 2018
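The t test for r can be sketched in Python as follows. The critical value 2.306 is taken from the slide (two-tailed, α = 0.05, df = 8) rather than computed.

```python
import math


def t_for_r(r, n):
    """t(obtained) = r * sqrt((N - 2) / (1 - r^2)), with df = N - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))


r, n = 0.9129, 10
t_obtained = t_for_r(r, n)
t_critical = 2.306  # two-tailed, alpha = 0.05, df = 8 (from the t table)
print(round(t_obtained, 2), t_obtained > t_critical)  # 6.33 True
```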
The Interpretation of Correlation
• Causality: the existence of a correlation between variables does not imply causality.
• A correlation does serve a data-reduction, descriptive function.
• The descriptive power of correlation analysis is most evident in its potential for predicting the values of one variable given information on another variable. Whatever the limitations on its theoretical interpretation, it has practical applications.
• Another interpretation of the correlation between two variables concerns the degree to which they covary.
Regression analysis provides us with an equation describing the nature of the relationship between two variables. The regression equation can predict values on the criterion variable, which makes it more than just a curve-fitting technique. Regression analysis can be used with both correlational and experimental data (Kachigan, 1986).
Types: simple regression and multiple regression
Purpose:
• To determine whether there is a relationship between two variables
• To describe the nature of the relationship, if one exists, in the form of a mathematical equation
• To assess the level of accuracy of the description or prediction achieved by the regression equation
• In the case of multiple regression, to assess the relative importance of the various predictor variables in their contribution to the variation of the criterion variable
Formula
Slope (b):
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
Y intercept (a):
$$a = \bar{Y} - b\bar{X}$$
Squared deviations*:
$$\sum (Y - Y')^2$$
Y = observed value, Y' = predicted value
*This formula will be discussed with the regression analysis group in week 5.
Number of Children (X) | Husband's Housework (Y) | Conditional Mean of Y
1 | 1, 2, 3, 5 | 2.75
2 | 3, 1 | 2.00
3 | 5, 0 | 2.50
4 | 6, 3 | 4.50
5 | 7, 4 | 5.50
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{18.32}{26.68} = 0.69$$
The slope is 0.69, which means that each one-unit increase in X (one more child) is associated with an increase of 0.69 in Y.
Y intercept (a):
$$a = \bar{Y} - b\bar{X} = 3.33 - (0.69)(2.67) = 3.33 - 1.84 = 1.49$$
Model: least-squares regression line
$$Y' = a + bX = 1.49 + (0.69)(6) = 1.49 + 4.14 = 5.63$$
Interpretation: In a dual-earner family with 6 children (X), the husband is predicted to devote 5.63 hours (Y) a week to household chores.
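A least-squares sketch in Python for the same data, with the Y scores listed per case. Without intermediate rounding it gives b = 0.6875 and a = 1.50; the slide's a = 1.49 results from plugging in the rounded b = 0.69 and means 3.33 and 2.67. The prediction for 6 children is about 5.63 either way, although X = 6 lies outside the observed range (1 to 5 children), so it is an extrapolation.

```python
def least_squares(x, y):
    """Slope b = sum((X - meanX)(Y - meanY)) / sum((X - meanX)^2); intercept a = meanY - b*meanX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b


x = [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # number of children
y = [1, 2, 3, 5, 3, 1, 5, 0, 6, 3, 7, 4]  # husband's housework (hours per week)
a, b = least_squares(x, y)
y_pred = a + b * 6                        # predicted Y for a family with 6 children
print(round(b, 4), round(a, 2), round(y_pred, 2))
```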
Analysis of variance is useful for identifying and describing a linear or other systematic relationship involving qualitative variables. It can also be used to identify relationships between predictor and criterion variables. ANOVA has also been found useful in guiding the efficient design of data collection schemes, especially complex experiments (Kachigan, 1986).
Analysis of variance is appropriate for significance testing when the independent variable has more than two categories (the t test can be used only when the independent variable has exactly two categories). For ANOVA, the null hypothesis is that the populations from which the samples are drawn have the same mean score on the dependent variable (Healey, 2010).
Principle: ANOVA compares the within-groups variance with the between-groups variance.
1 Total sum of squares (SST)
$$SST = SSB + SSW = \sum X^2 - N\bar{X}^2$$
2a Within sum of squares (SSW)
2b Between sum of squares (SSB)
$$SSB = \sum N_k(\bar{X}_k - \bar{X})^2$$
3 Calculate the F ratio
$$F = \frac{\text{Mean square between}}{\text{Mean square within}}$$

1. Find SST
Group | Scores (X) | ΣX | ΣX²
Protestant | 8, 12, 13, 17 | 50 | 666
Catholic | 12, 20, 25, 27 | 84 | 1898
Jewish | 12, 13, 18, 21 | 64 | 1078
None | 15, 16, 23, 28 | 82 | 1794
Other | 10, 18, 12, 12 | 52 | 712
$$SST = (666 + 1898 + 1078 + 1794 + 712) - (20)(16.6)^2 = 6148 - (20)(275.56) = 636.80$$
2. Find SSB (grand mean = 16.6)
$$SSB = 67.24 + 77.44 + 1.44 + 60.84 + 51.84 = 258.80$$
3. Find SSW
$$SSW = SST - SSB = 636.80 - 258.80 = 378.00$$
4. Calculate degrees of freedom
dfw (within) = N − k = 20 − 5 = 15
dfb (between) = k − 1 = 5 − 1 = 4
5. Calculate mean square within and between
$$\text{Mean square within} = \frac{SSW}{df_w} = \frac{378.00}{15} = 25.20$$
$$\text{Mean square between} = \frac{SSB}{df_b} = \frac{258.80}{4} = 64.70$$
6. Find the F ratio
$$F = \frac{\text{Mean square between}}{\text{Mean square within}} = \frac{64.70}{25.20} = 2.57$$
Significance (ANOVA)
H0: μ1 = μ2 = μ3 = μ4 = μ5
Sampling distribution = F distribution; alpha = 0.05
F (critical) = 3.06 (df = 4, 15)
F (obtained) = 2.57 < F (critical) = 3.06, so H0 is not rejected (H1 is rejected).
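The six ANOVA steps can be reproduced with the Python sketch below, assuming the five religious groups and scores from the table. F (critical) = 3.06 is taken from the slide, not computed.

```python
groups = {
    "Protestant": [8, 12, 13, 17],
    "Catholic": [12, 20, 25, 27],
    "Jewish": [12, 13, 18, 21],
    "None": [15, 16, 23, 28],
    "Other": [10, 18, 12, 12],
}


def one_way_anova(groups):
    """SST, SSB, SSW, degrees of freedom, mean squares, and F for a one-way ANOVA."""
    scores = [v for g in groups.values() for v in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    sst = sum(v * v for v in scores) - n * grand_mean ** 2
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ssw = sst - ssb
    dfb, dfw = k - 1, n - k
    msb, msw = ssb / dfb, ssw / dfw
    return {"SST": sst, "SSB": ssb, "SSW": ssw, "dfb": dfb, "dfw": dfw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova(groups)
f_critical = 3.06  # alpha = 0.05, df = (4, 15), from the F table
print({key: round(val, 2) for key, val in result.items()})
print("reject H0" if result["F"] > f_critical else "fail to reject H0")
```

If SciPy is available, `scipy.stats.f_oneway(*groups.values())` should report the same F statistic together with a p-value.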
Taken together, the analysis of variance technique tests hypotheses about the presence of relationships between predictor and criterion variables, regression analysis describes the nature of those relationships, and r² measures the strength of the relationships (Kachigan, 1986).
1
Urban form, land use, and cover change and their impact on carbon emissions in the
Monterrey Metropolitan area, Mexico
by Carpio, Alejandro et al (2021)
This study analyses the urban expansion of the Monterrey Metropolitan Area (MMA), Mexico, from 1990 to 2019 using satellite imagery and GIS to determine its relation to carbon emissions.
The analysis considers as variables: population data, urban expansion, gross domestic product, motor vehicle inventory, vegetation displacement, and energy usage from the residential and commercial sectors.
The research uses Pearson correlation coefficient analysis to measure the magnitude of the relationships between the variables for the periods 1990, 1995, 2000, 2005, 2010, 2015, and 2019.
Tools: Minitab 18
Variables:
A) Population
B) MMA Gross Domestic Product
C) Urban Growth
D) Peri-Urban Growth
E) Population Density
F) Vehicle Units
G) Vegetation Displacement
H) CO2 Carbon Sink
I) CO2 Emissions (Residential and Commercial Sector)
GDP (B) is positively correlated with increasing urban population (A), motor vehicle acquisition (F), urban growth (C), peri-urban growth (D), and emissions from the residential and commercial sectors (I). GDP (B) shows a negative correlation with urban density (E).
Population (A) has a positive correlation with almost all variables except vegetation displacement (G), and shows a clear negative correlation with density (E).
Urban growth (C) has a high positive correlation with peri-urban growth (D) and vehicle units (F), and is also related to emissions from the residential and commercial sectors (I). Urban growth (C) has a strong negative correlation with density (E).
Peri-urban growth (D) has a strong positive correlation with the increase in vehicle units (F) and with emissions from the residential and commercial sectors (I).
Population density (E) has a strong negative correlation with almost every variable except vegetation displacement (G) and CO2 sink removal (H).
Vehicle units (F) have a strong positive correlation with emissions from the commercial and residential sectors (I).
Check Pearson's r using this formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
r = Pearson coefficient
n = number of pairs of scores
Σxy = sum of the products of the paired scores
Σx = sum of the x scores
Σy = sum of the y scores
Σx² = sum of the squared x scores
Σy² = sum of the squared y scores
Calculate with:
1) Microsoft Excel
2) STATA
2
Does local planning of fast-growing medium-sized towns lead to higher urban
intensity or to sprawl ? Cases from Zhejiang Province
by Guan, ChengHe et al (2021)
This paper uses open-source geospatial data and regulatory detailed planning to measure the urban intensity of existing and planned urban forms.
Variables:
• Building density
• Diversity of land use function
• Accessibility to destinations
• Compactness of development
• Composite score
The research uses Pearson correlation coefficient analysis to determine whether any of the urban intensity variables are highly correlated.
The results show that:
• In existing forms, no pair exhibits a significant correlation
• In planned forms, diversity of land use function and building density have a significant negative correlation
• In both existing and planned forms, the composite score is positively associated with compactness of development
The paper notes that the purpose of this analysis is to identify the potential contribution of the proposed urban intensity measures; as a result, correlation analysis was selected instead of regression analysis.
The limitations of pairwise correlation analysis include its inability to confirm relationships between variables and its limited explanatory power in the scientific scope.
3
Evaluation of pavement condition index by different methods: Case study of
Maringá, Brazil
by Pinatt, J.M et al. (2020)
The objectives of this study were to analyze the objective and subjective

evaluations of the Pavement Condition Index (PCI) used in the Urban Pavement
Management System (UPMS) using GIS and identify the most damaged pathways.
This research was carried out in the state of Paraná (PR), Brazil. A functional
evaluation was performed, with defect identification by means of visual analysis
using the PCI method. Two types of evaluation were performed, objective and
subjective, which were compared to each other using the coefficient of
Pearson's correlation.
The result obtained from the calculation of

Pearson's correlation coefficient was 0.95;
that is, there is a strong correlation between
the PCI values, because the closer the value is
to 1, the stronger the relationship is. Thus, it
is possible to affirm that the subjective
evaluation can be used to define the
condition index of the pavement as a
simplified alternative to the
objective evaluation by the PCI method.
4
An object-based image analysis in QGIS for image classification and assessment of
coastal spatial planning
by Zaki, A. et al. (2022)
A methodology used in the assessment of spatial planning consists of classifying imagery, projecting a future land cover map, and comparing the outcome of the projection with the spatial planning map (Amri et al., 2017; Hakim et al., 2020). From the perspective of urban and regional planning practice, pixel-based image analysis is a commonly used method for image classification.
Furthermore, the study calculates Pearson's correlation between the spatial variables of land cover change and the probability of each land cover changing to other land covers.
Most of the variables have negligible correlation (Pearson’s correlation = 0.00 to 0.30 or 0.00 to 0.30), but there are three pairs of variables
having higher correlation (Table 3). For example, there are a low positive correlation between land value and population density (Pearson’s
correlation = 0.30 to 0.50), a low negative correlation between distance from the built-up areas and population density (Pearson’s correlation
= 0.30 to 0.50), and a high positive correlation between distance from the built-up areas and distance from the roads (Pearson’s correlation =
0.70 to 0.90).
The high correlation between built-up areas and the road network is shown by the fact that residential buildings within the study area stretch along the roads. It is also supported by the theory that people prefer to live in areas with higher road accessibility (Patarasuk, 2013).
Moreover, the transition probabilities suggest that 25 percent of paddy fields and bare land were converted into waterbodies, and 14 percent into built-up areas, in 2015–2020.
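The strength labels used above can be applied mechanically to any r. The helper below is an illustrative assumption and not part of Zaki et al. It takes the negligible (0.00–0.30), low (0.30–0.50), and high (0.70–0.90) bands quoted above and assumes moderate (0.50–0.70) and very high (0.90–1.00) bands from the common rule of thumb to fill the gaps. A value falling exactly on a boundary is assigned to the stronger band, which is an arbitrary choice. The function name describe_r is hypothetical.

```python
def describe_r(r):
    """Label a Pearson r by strength and direction using rule-of-thumb bands.

    Negligible, low and high follow the bands quoted in the text; the moderate
    and very high bands are ASSUMED. Boundary values go to the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson r must lie in [-1, 1]")
    size = abs(r)
    if size < 0.30:
        return "negligible"  # no direction reported for negligible values
    elif size < 0.50:
        strength = "low"
    elif size < 0.70:
        strength = "moderate"  # assumed band
    elif size < 0.90:
        strength = "high"
    else:
        strength = "very high"  # assumed band
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_r(0.42))   # low positive
print(describe_r(-0.35))  # low negative
print(describe_r(0.81))   # high positive
```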
5
Landscape index for indicating water quality and application to master plan of
regional lake cluster restoration
by Xinxia He et al. (2021)
The study examined the impacts of landscape pattern changes on water quality across lake clusters, taking the aquaculture area in the Lixia River hinterland of China as a case. Multi-temporal Landsat remote sensing data from 1985 to 2018 were used, and the space-for-time substitution (SFTS) method was applied to explore the relationship between landscape pattern and water quality.
The SFTS results showed that PD_A had a positive correlation with total nitrogen (TN) (r = 0.26), ammonia nitrogen (NH3-N) (r = 0.21), and Chlorophyta (r = 0.33), and that water quality degraded as PD_A increased. Hence, PD_A could serve as a water quality indicator for lakes in the Lixia River hinterland. The study is expected to provide a viable method for designing regional restoration plans for degraded and over-developed wetland areas.
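Whether a coefficient such as r = 0.33 differs significantly from zero depends on the number of observed pairs. A common check is t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below computes only this statistic. The sample sizes are hypothetical, because the number of observations in He et al. is not given here, and the critical value still has to come from a t table or a statistics library. The function name t_for_r is hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# The same r = 0.33 evaluated at two hypothetical sample sizes
for n in (30, 50):
    print(n, round(t_for_r(0.33, n), 2))
# 30 1.85
# 50 2.42
```

With 28 and 48 degrees of freedom, the two-tailed 5% critical values are about 2.05 and 2.01. The same r = 0.33 would therefore not reach significance with 30 pairs but would with 50.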
6
Where do networks really work? The effects of the Shenzhen greenway network on
supporting physical activities
by Kun Liu et al. (2016)
In metropolitan areas, more and more greenways are interconnected, forming a greenway network (GN). A GN is considered to encourage physical activities, but verifying this claim is difficult because traditional social survey methods cannot obtain fine-grained geographic data on activities at a large scale. To address this shortcoming, volunteered geographic information and geographic information system techniques were used to describe the distribution of physical activities in a GN and to explore how greenway network features support those activities.
As shown in the statistical analyses above, GN density negatively influenced the presence of physical activities in Model 1 while relating positively to physical activity diversity. Shenzhen's GN planning assumes that the network increases the likelihood of people using the greenways, but the models revealed contradictory results. To explain these results, bivariate correlation analyses were conducted to examine the relations between GN density and the surrounding variables.
CONCLUSION
Association at the Nominal Level
• Nominal variable analysis can answer whether a relationship between variables exists and how strong it is.
• Its measures of association (correlative type) are divided into two groups: those based on chi-square and those based on proportional reduction in error (PRE).
• The chi-square-based measures are the phi coefficient, Cramér's V, and the contingency coefficient.
• The PRE-based measure is lambda.
Association at the Ordinal Level
• Ordinal variable analysis can answer whether a relationship exists, how strong it is, and what its direction is.
• Its measures of association (correlative type) are divided into two groups: those for continuous ordinal variables and those for collapsed ordinal variables.
• The collapsed ordinal measures are Goodman and Kruskal's gamma, Somers' d, and Kendall's tau-b.
• The continuous ordinal measures are Spearman's rho and Kendall's tau.
Association at the Interval/Ratio Level
• Interval/ratio variable analysis can answer whether a relationship exists, how strong it is, its direction, and its nature.
• It can analyze relationships of both the correlative and the experimental type.
• Correlative relationships are measured with the Pearson product-moment correlation coefficient, r.
• Experimental relationships are analyzed with ANOVA and regression.
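All of the nominal-level measures summarized above can be computed from a contingency table of counts. The minimal Python sketch below assumes the table is stored as a list of rows and that every row and column total is positive, so no expected count is zero. The function names and counts are hypothetical. Phi is intended for 2 × 2 tables. Cramér's V generalizes phi using k, the smaller of the number of rows and columns. Lambda is computed asymmetrically here, treating the column variable as the one being predicted.

```python
import math

def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts.

    Assumes every row and column total is positive (no zero expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def chi_square_measures(table):
    """Phi, Cramér's V and the contingency coefficient C, each derived from chi-square."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # the smaller of the number of rows and columns
    return {
        "chi2": chi2,
        "phi": math.sqrt(chi2 / n),  # intended for 2 x 2 tables
        "cramers_v": math.sqrt(chi2 / (n * (k - 1))),
        "contingency_c": math.sqrt(chi2 / (chi2 + n)),
    }

def lambda_yx(table):
    """Asymmetric lambda: proportional reduction in error predicting Y (columns) from X (rows)."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    e1 = n - max(col_totals)  # errors when every case is guessed as Y's overall mode
    e2 = sum(sum(row) - max(row) for row in table)  # errors when guessing each row's own mode
    if e1 == 0:
        raise ValueError("lambda is undefined when all cases fall in one Y category")
    return (e1 - e2) / e1

# Hypothetical counts: rows = categories of X, columns = categories of Y
table = [[30, 10], [10, 30]]
print({k: round(v, 3) for k, v in chi_square_measures(table).items()})
# {'chi2': 20.0, 'phi': 0.5, 'cramers_v': 0.5, 'contingency_c': 0.447}
print(lambda_yx(table))  # 0.5
```

Here λ = 0.5 means that knowing a case's row category halves the errors made when guessing its column category.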
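The ordinal-level measures can be sketched in the same way. For collapsed ordinal data, gamma, Somers' d, and Kendall's tau-b all compare concordant (ns) and discordant (nd) pairs and differ only in how tied pairs enter the denominator. For continuous ordinal data, Spearman's rho works from the rank difference D of each object. The Python sketch below assumes that rows and columns are ordered from low to high and treats Y (the columns) as the dependent variable for Somers' d. The data and function names are hypothetical. The formula 1 − 6ΣD² / (n(n² − 1)) is exact only when there are no tied ranks; ties receive average ranks here, which gives an approximation.

```python
import math

def pair_counts(table):
    """Pair counts for an ordered table: rows = X, columns = Y, both from low to high.

    ns = concordant pairs, nd = discordant pairs,
    tx = pairs tied on X only, ty = pairs tied on Y only.
    """
    rows, cols = len(table), len(table[0])
    ns = nd = tx = ty = 0
    for i in range(rows):
        for j in range(cols):
            f = table[i][j]
            # Cells below and to the right are higher on both X and Y: concordant
            ns += f * sum(table[k][m] for k in range(i + 1, rows) for m in range(j + 1, cols))
            # Cells below and to the left are higher on X but lower on Y: discordant
            nd += f * sum(table[k][m] for k in range(i + 1, rows) for m in range(j))
            # Same row, later column: same X category, different Y
            tx += f * sum(table[i][m] for m in range(j + 1, cols))
            # Same column, later row: same Y category, different X
            ty += f * sum(table[k][j] for k in range(i + 1, rows))
    return ns, nd, tx, ty

def collapsed_ordinal_measures(table):
    """Gamma, Somers' d (Y dependent) and Kendall's tau-b; assumes ns + nd > 0."""
    ns, nd, tx, ty = pair_counts(table)
    return {
        "gamma": (ns - nd) / (ns + nd),
        "somers_d_yx": (ns - nd) / (ns + nd + ty),
        "tau_b": (ns - nd) / math.sqrt((ns + nd + tx) * (ns + nd + ty)),
    }

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # 0-based positions to 1-based rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), with D the rank difference per object."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical data
table = [[10, 5], [5, 10]]  # rows: X low/high, columns: Y low/high
print({k: round(v, 3) for k, v in collapsed_ordinal_measures(table).items()})
# {'gamma': 0.6, 'somers_d_yx': 0.333, 'tau_b': 0.333}
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
```

Gamma comes out larger than Somers' d and tau-b because it leaves tied pairs out of the denominator.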
Literature
• Kachigan, S. K. (1986). Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods. New York: Radius Press.
• Gulö, W. (2005). Metodologi Penelitian [Research Methodology]. Jakarta: PT Grasindo.
• Healey, J. F. (2010). Statistics: A Tool for Social Research (9th ed.). Wadsworth, Cengage Learning.
• Sugiyono. (2018). Metode Penelitian Kuantitatif [Quantitative Research Methods]. Bandung: CV Alfabeta.
• Carpio, A., et al. (2021). Urban form, land use, and cover change and their impact on carbon emissions in the Monterrey Metropolitan area, Mexico. Urban Climate, 39, 1–17.
• He, X., Chen, C., He, M., Chen, Q., Zhang, J., Li, G., … Dong, J. (2021). Landscape index for indicating water quality and application to master plan of regional lake cluster restoration. Ecological Indicators, 126, 107668.
• Liu, K., Siu, K. W. M., Gong, X. Y., Gao, Y., & Lu, D. (2016). Where do networks really work? The effects of the Shenzhen greenway network on supporting physical activities. Landscape and Urban Planning, 152, 49–58.
• Guan, C., et al. (2022). Does local planning of fast-growing medium sized towns lead to higher urban intensity or to sprawl? Cases from Zhejiang Province. Cities, 130, 1–13.
• Pinatt, J. M., Chicati, M. L., Ildefonso, J. S., & Filetti, C. R. G. D. arc. (2020). Evaluation of pavement condition index by different methods: Case study of Maringá, Brazil. Transportation Research Interdisciplinary Perspectives, 4, 100100.
• Zaki, A., Buchori, I., Sejati, A. W., & Liu, Y. (2022). An object-based image analysis in QGIS for image classification and assessment of coastal spatial planning. Egyptian Journal of Remote Sensing and Space Science, 25(2), 349–359.
Thank You
