Professional Documents
Culture Documents
PRE - by - 6
PRE - by - 6
Why not χ2 ?
Reduction of Error z
Phi
z c) Sum based an observed and expected z Phi
frequencies many different tables may have z - measures based on χ2
sum χ2 value.
Association for nominal variable χ2
z
Φ=
n
χ2 χ2
c= V=
χ2 + n n(min(r − 1),(c − 1))
1
Example: Canadian firms
χ2=32920
Total
Firm type
domestic foreign
32920
widely held 197221 44579 241800
φ= = 0.259
490682
level of effective
87984 15843 103827
ownership control
legal
84414 60641 145055 32920
control
c= = 0.251
Total 369619 121063 490682 32920 + 490682
lambda
z Note that although values of Φ and C aren’t z Its main advantage relates to its
equal, they are of the same magnitude but ‘they asymmetrical nature.
are not particularly large’ is not a very
satisfactory way to have an interpretation. z Contrary to other tests, the way variables are
paired is of utmost importance; rows and columns
z Alternatives to χ2 are based on the idea of are not interchangeable.
proportional reduction in error or PRE.
z They have a clean interpretation based on how z Another advantage is the absence of
well you can predict the value of dependent constraints on the distribution of the variables
variables if you know the value of the
independent.
z Fw=197221+60641=257
F − M dv
λ= w
862
z Mdv=241800
N − Md v z N=490682
2
Calculating Lambda
z Lambda (λ) z Assuming the rows represent the dependent
variable
z Very common measure of association.
z Three Steps
z Compares two types of errors z Find (E1)
z Error made while ignoring the independent z Subtract the largest row total from N.
z Find (E2)
variable (E1).
z For each column, subtract the largest cell
z Errors made taking into account the frequency from the column total and then add
independent variable (E2). the subtotals together.
z Calculate Lambda
z Subtract (E1 - E2) and then divide by E1.
λ (lambda)
z λ = misclassified in situation 1 - misclassified in situation 2 z For situation 2 use the other variable.
misclassified in situation 1 z The rule is straight forward :
z For each category of the independent
z Use most common category to predict if you variable, predicts the category of the
don’t know anything else. dependent that occurs most frequently.
z For situation 1 use one of the variables, say
the row, this should be the dependent
variable and count the number of cases
misclassified.
Example
z E1 = 490682 – 241800 = 248882
Total
firm type z E2 = (369619 – 197221) = 172398
domestic foreign +
(121063 – 60641) = 60422
widely
197221 44579 241800 = 232820
held
level of effective
87984 15843 103827
ownership control
E1 − E 2
legal
84414 60641 145055
λ=
control E1
Total 369619 121063 490682
3
z The λ tells you the proportion by which you
248882−232820 reduce your error in predicting the dependent
λ= variable if you know the independent that’s
248882 why its called a Proportional Reduction In
Error measure.
λ = 0.065 z The largest the value can be is 1.
z When variables are independent, λ = 0.
z λ is not symmetric its value depends on which is
the independent variable.
Symmetric λ
z Suppose we take the column as dependent z If you have no reason to pick one as dependent or
independent, use symmetric 8 .
z Symmetric λ =Σ of 2 differences/Σ of denominator
121063 − 121063 Example
λ=
z
121063 16062
λ= = 0.043
369945
λ=0
4
Example Cross tabulation form
Retirees
Pop (000s) Rank Class Class
City (000s) Retirees class
z Let x be an
independent variable
( P − Q) ( 9 − 2 )
γ = = = 0.636 with three values and X
( P + Q) 11 let y be a dependent
with two values, with a,
1 2 3
Y 1 a b c
b, ..., f being the cell
counts in the resulting 2 d e f
table
5
Example for 2 by 3 table Kendall’s tau - b
Tied on y a(b+c)+bc+d(e+f)+ef Ty
P Q T Tx Ty
Tau - C
1, 2 X
1, 3 X
1, 4 X
1, 5 X X
2m( P − Q)
1, 6 X X z Where : m is the Tc =
smaller number of rows ( P − Q) 2 (m − 1)
1, 7 X
2, 3 X X X
2, 4
2, 5
X
X
X
X
and columns
2, 6 X X z Note : There is no simple
2, 7 X X
3, 4 X X
proportional reduction of
3, 5 X error interpretation. 2(3)(9 − 2) 6(7)
3, 6 X
Tc = = = 0.438
3, 7
4, 5
X
X
X
X
7 2 (3 − 1) 49(2)
4, 6 X
4, 7 X
5, 6 X
5, 7 X X X
m=3 because the cross tabulation table is a 3 by 3 table
6, 7 X X X (3 classes of population and 3 classes of retirees)
9 2 10 5 5
6
Somer’s d
z gamma, Tb, Tc are all symmetric measures
z same as gamma except the denominator is
sum of all pairs if cases that are not tied on
independent variables.
z i.e.
( P − Q) ( 9 − 2)
d= d= = 0.437
P + Q + ( pick Tx , Ty ) (9 + 2 + 5)