PL5101 - Group 1 - Association Analysis
Group 1
25422001-Gugun Muhammad Fauzi
25422033-Sulthon Kamel Machmud
25422016-Ahmad Rowatul Irham
Association: statistics that summarize the strength and direction of the relationship between variables. Most generally, two variables are said to be associated if the distribution of one of them changes under the various categories or scores of the other.
Measures of association help us trace relationships among variables and reason about causality in those relationships. They are our most important and powerful statistical tools for documenting, measuring, and analyzing cause-and-effect relationships.
[Diagram: overview of association analysis. Correlation. Ordinal association, continuous: Spearman's rho and Kendall's tau. Ordinal association, collapsed: gamma, Somers' d, and Kendall's tau-b. Questions to answer: Is there a relationship? What is the direction of the relationship?]
2 Experimental input
8 Identification of surrogate variables
3 Hypothesis generation
4 Prediction
5 Validity assessment
6 Reliability assessment
Association Between Variables
Measured at the Nominal Level
ASSOCIATION (NOMINAL LEVEL)
Chi-Square PRE
The chi square (χ²) test is probably the most frequently used hypothesis test. It can be carried out with variables measured at the nominal level (the lowest level of measurement) and is nonparametric, meaning that it requires no assumptions about the shape of the population or the sampling distribution.
Chi square is computed from a bivariate table, which is a table that displays the scores of two variables at the same time. The table can then be used to examine the relationship between the two variables.
$$\chi^2(\text{obtained}) = \sum \frac{(f_o - f_e)^2}{f_e}$$
f_o = observed frequency
f_e = expected frequency
N = number of paired observations
                                  Accredited   Not accredited   Totals
Working as a social worker            30             10            40
Not working as a social worker        25             35            60
Totals                                55             45           100

f_o    f_e    f_o - f_e    (f_o - f_e)²    (f_o - f_e)²/f_e
30     22         8             64               2.91
10     18        -8             64               3.56
25     33        -8             64               1.94
35     27         8             64               2.37
N = 100   N = 100                           χ² = 10.78

Expected frequency for the first cell: f_e = (40 × 55) / 100 = 22
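The calculation in the table can be checked with a short Python sketch. The function name `chi_square` and the list layout are illustrative assumptions, not part of the slides. Expected frequencies are taken as row total times column total divided by N, which reproduces f_e = 22 for the first cell.

```python
# Observed frequencies from the 2x2 table above.
# Rows: working / not working as a social worker; columns as in the table.
observed = [
    [30, 10],
    [25, 35],
]


def chi_square(table):
    """Return chi square (obtained) for a bivariate frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            # Expected frequency: row total x column total / N
            f_e = row_totals[i] * col_totals[j] / n
            total += (f_o - f_e) ** 2 / f_e
    return total


chi2 = chi_square(observed)
print(f"chi square = {chi2:.2f}")
```

Without intermediate rounding the result is about 10.77. The 10.78 on the slide comes from adding the four cell contributions after each was rounded to two decimals.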
The phi coefficient (φ) is a chi square-based measure of association for variables measured at the nominal level. It is used for tables with two rows and two columns (2x2). The principle is to divide chi square by the sample size and take the square root; the result is then interpreted for strength.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$

Value          Relationship
0.00 - 0.10    Weak
0.11 - 0.30    Moderate
> 0.30         Strong

$$\phi = \sqrt{\frac{10.78}{100}} = 0.33$$

Conclusion:
There is a strong association between accreditation and employment in the social department.
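As a minimal sketch, the phi calculation and the strength cut-offs from the interpretation table can be written in Python. The function names `phi` and `strength` are assumptions made for this illustration.

```python
import math


def phi(chi2, n):
    """Phi coefficient: square root of chi square divided by sample size."""
    return math.sqrt(chi2 / n)


def strength(value):
    """Label a coefficient using the cut-offs from the interpretation table."""
    if value <= 0.10:
        return "Weak"
    if value <= 0.30:
        return "Moderate"
    return "Strong"


phi_value = phi(10.78, 100)  # chi square and N from the worked example
print(round(phi_value, 2), strength(phi_value))
```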
Cramér's V is a chi square-based measure of association for variables measured at the nominal level. It is used for tables larger than two rows and two columns (more than 2x2). Cramér's V is more general than the phi coefficient: in tables larger than 2x2 the upper limit of phi can exceed 1.00, which makes phi difficult to interpret. Cramér's V divides chi square by the sample size multiplied by the smaller of (number of rows - 1) and (number of columns - 1), then takes the square root; the result is then interpreted for strength.
Academic        Membership
Achievement     Fraternity or Sorority   Other Organization   No Memberships   Totals
Low                       4                      4                  17            25
Moderate                 15                      6                   4            25
High                      4                     16                   5            25
Totals                   23                     26                  26            75

χ² = 31.5 and α = 0.05

$$V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\; c - 1)}} = \sqrt{\frac{31.5}{75 \cdot 2}} = 0.46$$

Conclusion:
There is a strong association between membership in student organizations and academic achievement.
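The V calculation for this table can be sketched as follows, assuming the usual definition of Cramér's V with min(rows - 1, columns - 1) in the denominator. The function name `cramers_v` is hypothetical; the chi square value is the one quoted on the slide rather than recomputed.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an r x c table, given chi square and sample size."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Values from the membership by academic achievement example (3x3 table)
v = cramers_v(31.5, 75, 3, 3)
print(round(v, 2))
```

For a 2x2 table min(r - 1, c - 1) is 1, so V reduces to phi.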
The contingency coefficient (C) is a chi square-based measure of association for variables measured at the nominal level. It is commonly used for tables with a large number of samples. The principle is to divide chi square by the sum of chi square and the sample size and take the square root; the result is then interpreted for strength.
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
χ² = chi square value

Procedure:
• Determine H0 and H1
• Determine the level of significance (α)
• Calculate χ²
• Calculate C
• Calculate the upper limit, C*
• Compare C and C*
• Determine the critical region

Interpretation (comparing C and C*):
Value          Relationship
0              No relationship
< 0.50         Weak
0.50 - 0.75    Moderate
0.75 - 0.90    Strong
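The "Calculate C" step can be sketched in Python. The slides give no worked example for C, so the chi square and N below are borrowed from the earlier 2x2 example purely as an illustration; the function name `contingency_coefficient` is likewise an assumption.

```python
import math


def contingency_coefficient(chi2, n):
    """C: square root of chi square over (chi square + sample size)."""
    return math.sqrt(chi2 / (chi2 + n))


# Illustrative inputs only: chi square = 10.78, N = 100
c = contingency_coefficient(10.78, 100)
print(round(c, 2))
```

Because the denominator always exceeds the numerator, C stays below 1 even for a perfect association, which is why the procedure compares it with an upper limit C*.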
Lambda
A Proportional Reduction in Error (PRE) measure of association such as lambda tells us how much knowledge of the independent variable improves our predictions of the dependent variable. For nominal-level variables, we first predict the category of the dependent variable (Y) into which each case will fall while ignoring the independent variable (X). In this first prediction we will often misclassify cases. In the second prediction, we take the independent variable into account. If the two variables are related, the additional information provided by the independent variable will reduce our prediction errors. The stronger the relationship between the variables, the greater the reduction in errors.
           Gender
Height     Male   Female   Totals
Tall        44       8       52
Short        6      42       48
Totals      50      50      100

$$\lambda = \frac{E_1 - E_2}{E_1} = \frac{48 - 14}{48} = \frac{34}{48} = 0.71$$

Conclusion:
When multiplied by 100, the lambda value indicates the strength of the association in terms of the percentage of error reduction. Thus, the lambda just calculated means that knowledge of gender increases our ability to predict height by 71%. That is, we are 71% better off knowing gender when trying to predict height.
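A small sketch shows where E1 = 48 and E2 = 14 come from. It assumes the dependent variable (height) is in the rows and the independent variable (gender) in the columns; the function name `lambda_pre` is made up for this example.

```python
def lambda_pre(table):
    """Return (E1, E2, lambda) for a table with the dependent variable in rows."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # E1: errors when always predicting the largest category of Y
    e1 = n - max(row_totals)
    # E2: errors when predicting the largest Y category within each column of X
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return e1, e2, (e1 - e2) / e1


height_by_gender = [
    [44, 8],   # Tall:  male, female
    [6, 42],   # Short: male, female
]
e1, e2, lam = lambda_pre(height_by_gender)
print(e1, e2, round(lam, 2))
```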
Association Between Variables
Measured at the Ordinal Level
ASSOCIATION (ORDINAL LEVEL)
Introduction Collapsed Continuous
Continuous ordinal variable (X): ranks 1, 2, 3, 4, 5, 6, 7, ...
Collapsed ordinal variable (X): categories Low, Medium, High
Gamma (G)

$$G = \frac{N_s - N_d}{N_s + N_d}$$

Ns: number of pairs of cases ranked in the same order on both variables
Nd: number of pairs of cases ranked in different order on the two variables

If Nd = 0, then G = 1
If Ns = 0, then G = -1
If Ns = Nd, then G = 0

Value          Interpretation
Existence
0              No relationship
±1             Relationship
Strength
0.00 - 0.30    Weak
0.31 - 0.60    Moderate
> 0.60         Strong
Direction
Positive (+)   Same direction
Negative (-)   Opposite direction
[Figure: 3x3 tables (Low/Med/High by Low/Med/High) marking, for each starting cell, the cells whose frequencies are summed when counting Ns and Nd.]
Ns, starting from the Low-Low cell: Fx(Fb + Fc + Fe + Ff), the cell frequency times the sum of the cells below and to the right.
Nd, starting from the High-Low cell: Fz(Fa + Fb + Fd + Fe), the cell frequency times the sum of the cells below and to the left.
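The counting scheme for Ns and Nd can be sketched in Python. The 3x3 frequencies below are hypothetical (the slides show only cell labels), and the sketch assumes rows and columns are both ordered from Low to High. The function name `gamma` is an illustrative choice.

```python
def gamma(table):
    """Return (Ns, Nd, G) for a table whose rows and columns run low to high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cells below and to the right: pairs ranked in the same order
            below_right = sum(table[k][m]
                              for k in range(i + 1, n_rows)
                              for m in range(j + 1, n_cols))
            # Cells below and to the left: pairs ranked in different order
            below_left = sum(table[k][m]
                             for k in range(i + 1, n_rows)
                             for m in range(j))
            ns += table[i][j] * below_right
            nd += table[i][j] * below_left
    return ns, nd, (ns - nd) / (ns + nd)


# Hypothetical frequencies, not taken from the slides
example = [
    [10, 5, 2],
    [4, 8, 6],
    [1, 5, 12],
]
ns, nd, g = gamma(example)
print(ns, nd, round(g, 2))
```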
Kendall's tau-b also takes into account the pairs of cases tied on the dependent variable (Ty) and on the independent variable (Tx).

Spearman's rho (rs) is a measure of association for ordinal-level variables that have a broad range of different scores and few ties between cases on either variable. It works from the differences between the ranks of the cases on the two variables.

Value          Interpretation
Existence
0              No relationship
±1             Relationship
Strength
0.00 - 0.30    Weak
0.31 - 0.60    Moderate
> 0.60         Strong
Direction
Positive (+)   Same direction
Negative (-)   Opposite direction

rs² represents the proportional reduction in prediction error when predicting the rank on one variable from the rank on the other. Example: if rs = 0.86, then rs² = 0.74, so prediction errors are reduced by 74% when the rank on the other variable is taken into account.
Case   X    Rank X   Y    Rank Y   D      D²
C      15   3        12   4        1      1
D      12   4        16   2        -2     4
E      10   5        6    8        3      9
F      9    6        10   5        -1     1
G      8    7.5      8    6        -1.5   2.25
H      8    7.5      7    7        -0.5   0.25
I      5    9        5    9        0      0
J      1    10       2    10       0      0

Step 2
• Rank the cases from high to low on each variable (X and Y).
• Assign rank 1 to the highest score on each variable and continue until the last rank.
• If cases have the same score on a variable, give each of them the average of the ranks they would occupy.
Step 3
• Calculate D, the difference between rank Y and rank X (rank Y - rank X). Note: column D sums to 0.
• Square each D and enter it in column D².
• Calculate rs with the Spearman's rho formula.
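Steps 2 and 3 can be sketched in Python. The scores for cases A to J are the ones listed in the Kendall's tau table of this deck, and the sketch assumes the usual formula rs = 1 - 6ΣD² / (N(N² - 1)), which the slides reference but do not show. The helper names are illustrative.

```python
def average_ranks(scores):
    """Rank from high (rank 1) to low; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1   # first position held by this score
        count = ordered.count(s)       # number of cases sharing the score
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rho(x, y):
    """Return (sum of D squared, rs) from two lists of scores."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    d = [ry - rx for rx, ry in zip(rank_x, rank_y)]  # D = rank Y - rank X
    sum_d2 = sum(v ** 2 for v in d)
    n = len(x)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Cases A..J
x = [18, 17, 15, 12, 10, 9, 8, 8, 5, 1]
y = [15, 18, 12, 16, 6, 10, 8, 7, 5, 2]
sum_d2, rs = spearman_rho(x, y)
print(sum_d2, round(rs, 2))
```

With these scores ΣD² is 22.5 and rs rounds to 0.86, the value used in the rs² example.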
the inequality value will be. The R² value of 0.11 indicates that Finland 26 13 25,6 15 2 4
Object   X    Rank   Y    Rank   >   <    S
A        18   1      15   3      7   -2   5
B        17   2      18   1      8   0    8
C        15   3      12   4      6   -1   5
D        12   4      16   2      6   0    6
E        10   5      6    8      2   -3   -1
F        9    6      10   5      4   0    4
G        8    7.5    8    6      3   0    3
H        8    7.5    7    7      2   0    2
I        5    9      5    9      1   0    1
J        1    10     2    10     0   0    0
Total                                     33

Step 1
• Create a table like the one above from the given data (object, variable, rank, >, < and S).
Step 2
• Arrange the cases by the independent variable (X), from the first rank to the last rank.
• The dependent variable (Y) follows the resulting order and is not sorted itself.
• Rank the cases from high to low on the variable (Y).
• If cases have the same score on a variable, use the average of their ranks.
Step 3
• Working down the Y rank column, enter in the > column the number of objects further down the list whose Y rank is greater than that object's Y rank, and in the < column the number whose Y rank is smaller, with a (-) sign.
• Calculate the S value of each object (positive count minus negative count).
• Calculate tau with the Kendall's tau formula.
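The counting in Step 3 can be sketched in Python. The input is the Y rank column listed in X-rank order, and tau is taken as S divided by n(n - 1)/2, as in the worked example of this deck. The function name `kendall_s` is an assumption for illustration, and ties are not adjusted for.

```python
def kendall_s(rank_y):
    """Sum of (count of later larger ranks - count of later smaller ranks)."""
    s = 0
    for i, r in enumerate(rank_y):
        later = rank_y[i + 1:]
        greater = sum(1 for v in later if v > r)   # the > column
        smaller = sum(1 for v in later if v < r)   # the < column
        s += greater - smaller
    return s


# Y ranks of objects A..J, already ordered by rank on X
rank_y = [3, 1, 4, 2, 8, 5, 6, 7, 9, 10]
s = kendall_s(rank_y)
n = len(rank_y)
tau = s / (0.5 * n * (n - 1))
print(s, round(tau, 2))
```

The same function reproduces the Total of 25 for the fifteen-country example when given its Y rank column.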
Kendall's tau example

Country        X    Rank   Y      Rank   >    <     S
India          91   1      29.7   13     2    -12   -10
South Africa   87   2      58.4   1      13   0     13
Kenya          83   3      57.5   2      12   0     12
Canada         75   4      31.5   11     3    -8    -5
Malaysia       72   5      48.4   4      9    -1    8
Kazakhstan     69   6      32.7   8      5    -4    1
Egypt          65   7      32.0   10     3    -5    -2
USA            63   8      41.0   5      6    -1    5
Sri Lanka      57   9      30.1   12     2    -4    -2
Mexico         50   10     50.3   3      5    0     5
Spain          44   11     32.5   9      2    -2    0
Australia      31   12     33.7   7      2    -1    1
Finland        26   13     25.6   15     0    -2    -2
Ireland        4    14     35.9   6      1    0     1
Poland         3    15     27.2   14     0    0     0
Total                                               25

$$\tau = \frac{S}{\frac{1}{2} n (n - 1)} = \frac{25}{\frac{1}{2} \cdot 15 (15 - 1)} = 0.24$$

A tau of 0.24 indicates that there is a weak and positive relationship between the level of ethnic diversity and economic inequality.
Association Between Variables Measured
at the Interval and Ratio Level
ASSOCIATION (INTERVAL/RATIO LEVEL)
A key assumption for the use of the correlation coefficient is that the variables are random variables measured on either an interval or ratio scale (Kachigan, 1986).
1. Determine the presence or absence of a relationship
2. Find the regression line
3. Calculate the correlation coefficient (Pearson's r)
2. Find the regression line
Source: Kachigan, 1986
[Formula images; surviving labels: number of paired observations, mean X, mean Y]
(a) This formula calculates r using raw scores (Kachigan, 1986).
(c) This formula is used when the regression equation is calculated at the same time (Sugiyono, 2018).
The formulas are equivalent and will result in the same value of r. Which procedure is used will be a matter of personal preference, the nature of the data, and the availability of computing facilities (Kachigan, 1986).
The model assumption in the test of significance for Pearson's r is that both variables are normally distributed and that the relationship is homoscedastic.
Homoscedasticity: the variance of the Y scores is uniform for all values of X. It can be checked by seeing whether the Y scores are evenly spread above and below the regression line for the entire length of the line (Healey, 2010).
[Figure: sampling distribution with the "Accepted" region in the center and "Rejected" (relationship) regions in both tails]
Information:
Alpha = 0.05
N = 10
Degrees of freedom (df) = N - 2 = 10 - 2 = 8
t (critical) = 2.306 (see the t table)
r = 0.9129
Source: Sugiyono, 2018
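The test statistic itself is not shown on the slide. As a sketch, assuming the commonly used formula t = r·sqrt((N - 2) / (1 - r²)), the obtained t for these values can be computed and compared with the critical value; the function name `t_for_r` is hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic for testing Pearson's r against zero, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.9129, 10
t_critical = 2.306              # from the t table: alpha = 0.05, df = 8
t_obtained = t_for_r(r, n)
reject_h0 = abs(t_obtained) > t_critical
print(round(t_obtained, 2), reject_h0)
```

The obtained t falls well beyond the critical value, so it lies in the rejection region.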
The issue of causality: the existence of a correlation between variables does not imply causality.
A correlation serves a data-reduction, descriptive function.
The descriptive power of correlation analysis is most evident in its potential for predicting the values of one variable given information on another variable. Despite the limitations on its theoretical interpretation, it has practical applications.
Another interpretation of the correlation between two variables is concerned with the degree to which they covary.
Regression analysis provides us with an equation describing the nature of the relationship between two variables. The regression equation can predict values on the criterion variable, making it more than just a curve-fitting technique. Regression analysis can be used with both correlational and experimental data (Kachigan, 1986).
Formula
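The formula image did not survive, so the following is only a sketch under the assumption that the usual least-squares line Y = a + bX is meant, with b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = mean(Y) - b·mean(X). The data and the function name `regression_line` are hypothetical.

```python
def regression_line(x, y):
    """Return (a, b) of the least-squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b


# Hypothetical scores, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
predicted = a + b * 6   # predicted criterion value for X = 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```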
Analysis of variance is useful for identifying and describing a linear or other systematic relationship between qualitative variables. It can also be used to identify relationships between predictor and criterion variables. ANOVA has also been found useful in guiding the efficient design of data collection schemes, especially complex experiments (Kachigan, 1986).
Analysis of variance is appropriate for testing significance when the independent variable has more than two categories (the t test can be used only in situations in which the independent variable has exactly two categories). For ANOVA, the null hypothesis is that the populations from which the samples are drawn have the same mean score on the dependent variable (Healey, 2010).
1. Find SST
Protestant Catholic Jewish None Other
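The scores for the example categories did not survive, so the sketch below uses three made-up groups to illustrate a one-way ANOVA routine that starts with SST. It assumes the standard decomposition SST = SSB + SSW and F = (SSB / dfb) / (SSW / dfw); the function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """Return (SST, SSB, SSW, F) for a list of groups of scores."""
    scores = [s for g in groups for s in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    # Total sum of squares: variation of every score around the grand mean
    sst = sum((s - grand_mean) ** 2 for s in scores)
    # Between-group sum of squares: variation of group means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sst - ssb                      # within-group sum of squares
    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))
    return sst, ssb, ssw, f_ratio


# Hypothetical scores for three categories, not taken from the slides
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, ssb, ssw, f_ratio = one_way_anova(groups)
print(round(sst, 2), round(ssb, 2), round(ssw, 2), round(f_ratio, 2))
```

The F ratio would then be compared with the critical F for (k - 1, N - k) degrees of freedom.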
1
Urban form, land use, and cover change and their impact on carbon emissions in the
Monterrey Metropolitan area, Mexico
by Carpio, Alejandro et al (2021)
The analyses considers as variables: population data, urban expansion, gross domestic product, motor vehicle
inventory, vegetation displacement, and energy usage from residential and commercial sectors.
GDP (B) is positively correlated with urban population (A), motor vehicle acquisition (F), urban growth (C), peri-urban growth (D), and emissions from the residential and commercial sectors (I). GDP (B) shows a negative correlation with urban density (E).
Population (A) has a positive correlation with almost all variables except vegetation displacement (G), and shows a clear negative correlation with density (E).
Urban growth (C) has a high positive correlation with peri-urban growth (D) and vehicle units (F), and is also related to emissions from the residential and commercial sectors (I). Urban growth (C) has a strong negative correlation with density (E).
Peri-urban growth (D) has a strong positive correlation with the increase in vehicle units (F) and with emissions from the residential and commercial sectors (I).
Population density (E) has a strong negative correlation with almost every variable except vegetation displacement (G) and CO2 sink removal (H).
Vehicle units (F) have a strong positive correlation with emissions from the commercial and residential sectors (I).
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
r = Pearson correlation coefficient
n = number of pairs of scores
Σxy = sum of the products of the paired scores
Σx = sum of the x scores
Σy = sum of the y scores
Σx² = sum of the squared x scores
Σy² = sum of the squared y scores
Calculate with:
1) Microsoft Excel
2) STATA
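Besides Microsoft Excel or STATA, the raw-score calculation can be sketched in a few lines of Python. It assumes the standard raw-score formula built from the sums listed above; the data and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r from raw scores, using the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores, not taken from the case studies
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```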
2
Does local planning of fast-growing medium-sized towns lead to higher urban
intensity or to sprawl ? Cases from Zhejiang Province
by Guan, ChengHe et al. (2022)
This paper used open source geospatial data and regulatory detailed planning to measure urban intensity of the existing
and the planned.
Variables:
• Building density
• Diversity of land use function
• Accessibility to destination
• Compactness of development
• Composite score
This research uses Pearson correlation coefficient analysis to find out whether any of the urban intensity variables are highly correlated.
3
Evaluation of pavement condition index by different methods: Case study of
Maringá, Brazil
by Pinatt, J.M et al. (2020)
4
An object-based image analysis in QGIS for image classification and assessment of
coastal spatial planning
by Zaki, A. et al. (2022)
A methodology used in the assessment of spatial planning consists of classifying imagery, projecting a future land cover map, and comparing the outcome of the projection with the spatial planning map (Amri et al., 2017; Hakim et al., 2020). From the perspective of urban and regional planning practices, pixel-based image analysis is a commonly used method for image classification.
Furthermore, the study calculates Pearson's correlation between the spatial variables of land cover change and the probability of each land cover changing to other land covers.
Most of the variables have negligible correlation (Pearson's correlation = 0.00 to 0.30 or 0.00 to -0.30), but there are three pairs of variables with higher correlation (Table 3). For example, there is a low positive correlation between land value and population density (Pearson's correlation = 0.30 to 0.50), a low negative correlation between distance from the built-up areas and population density (Pearson's correlation = -0.30 to -0.50), and a high positive correlation between distance from the built-up areas and distance from the roads (Pearson's correlation = 0.70 to 0.90).
The high correlation between built-up areas and the road network is shown by the fact that residential buildings within the study area stretch along the roads. It is also supported by the theory that humans prefer to live in areas with higher road accessibility (Patarasuk, 2013).
Moreover, the transition probability suggests that 25 percent of paddy fields and bare land were converted into waterbodies and 14 percent of paddy fields and bare land into built-up areas in 2015–2020.
5
Landscape index for indicating water quality and application to master plan of
regional lake cluster restoration
by Xinxia He et al. (2021)
Studied the impacts of landscape pattern changes on water quality over lake clusters, taking the aquaculture
area in the Lixia River hinterland of China as a case. Multi-temporal Landsat series of remote sensing data
from 1985 to 2018 was used and space-for-time substitution (SFTS) method was applied to explore the
relationship between landscape pattern and water quality.
6
Where do networks really work? The effects of the Shenzhen greenway network on
supporting physical activities
by Kun Liu et al. (2016)
In metropolitan areas, more greenways are interconnected, forming a greenway network (GN). A GN is considered to encourage physical
activities, but verifying this statement is difficult, as traditional social survey methods do not obtain fine-grain activity geographic data on a
large scale. In view of this shortcoming, the volunteered geographic information and the geographic information system techniques were used
to describe the distribution of physical activities in a GN, to explore the effects of greenway network features on supporting activities.
As shown in the statistical analyses above, GN density negatively influenced the presence of physical activities in Model 1, while relating positively to physical activity diversity. The Shenzhen GN planning assumes that the network could increase the possibility of people using the greenways, but the models revealed confusing results. To explain these results, bivariate correlation analyses were conducted to examine the relations between GN density and surrounding variables.
CONCLUSION
Association Nominal Level
• The chi square-based measures of association are the phi coefficient, Cramér's V, and the contingency coefficient.
• The PRE-based measure of association is lambda.
Association Ordinal Level
• The measures of association for collapsed ordinal variables are gamma, Somers' d, and Kendall's tau-b.
• The measures of association for continuous ordinal variables are Spearman's rho and Kendall's tau.
Association Interval/Ratio Level
• The method for correlational relationships is the product-moment correlation coefficient (Pearson's r).
• The methods for experimental relationships are ANOVA and regression.
Literature
• Kachigan. 1986. Statistical Analysis An Interdisciplinary Introduction to Univariate & Multivariate Methods. New York: Radius Press.
• Gulö, W. 2005. Metodologi Penelitian. Jakarta: PT Grasindo.
• Healey, Joseph F. 2010. Statistics: A Tool For Social Research, Ninth Edition. Wadsworth: Cengage Learning
• Sugiyono. 2018. Metode Penelitian Kuantitatif. Bandung: ALFABETA CV
• Carpio, Alejandro et al. 2021. Urban form, land use, and cover change and their impact on carbon emissions in the Monterrey Metropolitan
area, Mexico. Urban Climate, 39, 1-17.
• He, X., Chen, C., He, M., Chen, Q., Zhang, J., Li, G., … Dong, J. (2021). Landscape index for indicating water quality and application to master
plan of regional lake cluster restoration. Ecological Indicators, 126, 107668.
• Liu, K., Siu, K. W. M., Gong, X. Y., Gao, Y., & Lu, D. (2016). Where do networks really work? The effects of the Shenzhen greenway network on
supporting physical activities. Landscape and Urban Planning, 152, 49–58.
• Guan, ChengHe et al. 2022. Does local planning of fast-growing medium-sized towns lead to higher urban intensity or to sprawl? Cases from Zhejiang Province. Cities, 130, 1-13.
• Pinatt, J. M., Chicati, M. L., Ildefonso, J. S., & Filetti, C. R. G. D. arc. (2020). Evaluation of pavement condition index by different methods: Case
study of Maringá, Brazil. Transportation Research Interdisciplinary Perspectives, 4, 100100.
• Zaki, A., Buchori, I., Sejati, A. W., & Liu, Y. (2022). An object-based image analysis in QGIS for image classification and assessment of coastal
spatial planning. Egyptian Journal of Remote Sensing and Space Science, 25(2), 349–359.
Thank You