
ASSOCIATION OF ATTRIBUTES

Chi-Square


MUHAMMAD USMAN
ROLL 553-07-09
Data Types
Data are either Quantitative or Qualitative.
Quantitative data are further divided into Continuous and Discrete.
Quantitative Data
Quantitative Data usually consist of
measurable characteristics called variables. For
example,
The annual income of a family,
The weight/ height of a student,
The age of a child,
The price of a commodity

Qualitative Data
Qualitative data cannot be measured accurately, but they can be divided into classes and the number of observations in each class can be counted. They consist of non-measurable characteristics. A characteristic which cannot be measured numerically (only its presence or absence can be described) is called an Attribute. Attributes are recorded on a nominal or ordinal scale. For example,
Division by Gender (Male, Female)
Marital Status (Single, Married, Divorced, or Widowed)
Employment Status (Employed, Unemployed)
Measurement Scales
The four scales of measurement are

Nominal Scale
Ordinal or Ranking Scale
Interval Scale
Ratio Scale
Nominal Scale
It is the classification of observations into mutually exclusive qualitative categories. For example,
Students are classified as MALE or FEMALE; the numbers 1 and 2 may also be used to identify these two categories.
Rainfall may be classified as HEAVY, MODERATE or LIGHT; the numbers 1, 2 and 3 might be used to denote the three classes.

NOTE: There is no particular order among the categories of a nominal scale.
Ordinal or Ranking Scale
It includes the characteristics of a nominal scale and in addition has the property of ordering or ranking. For example,
The performance of students is rated as EXCELLENT, GOOD, FAIR or POOR; here the numbers 1, 2, 3 and 4 are used to indicate ranks.
NOTE: The only relation that holds between any pair of categories is that of "greater than" (or "more preferred").
Interval Scale
A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. For example,
Temperature in degrees Celsius is an outstanding example of an interval scale, because the same difference exists between 20°C and 40°C as between 5°C and 25°C. However, it cannot be said that a temperature of 40°C is twice as hot as a temperature of 20°C; the ratio 40/20 has no meaning.
NOTE: The arithmetic operations of addition and subtraction are meaningful, but ratios are not.
Ratio Scale
It is a special kind of interval scale in which the scale of measurement has a true zero point as its origin.
The ratio scale is used to measure weight, volume, length, distance, money, etc., for which the zero point is meaningful.
NOTE: The zero point is meaningful for the ratio scale but not for the interval scale.
Hypothesis Tests for Qualitative Data
Proportion, 1 population: Z test
Proportion, 2 populations: Z test
Proportion, more than 2 populations: χ² test
Independence: χ² test
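Hypothesis for One Proportion
ASSUMPTION
Random sample selected from a binomial population. The normal approximation can be used if $n p_0 > 15$ and $n q_0 > 15$.
$$H_0: p \le p_0 \quad\text{or}\quad p = p_0 \quad\text{or}\quad p \ge p_0$$
$$H_1: p > p_0 \quad\text{or}\quad p \ne p_0 \quad\text{or}\quad p < p_0$$
Z-test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$
where
$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{sample size}}, \qquad q_0 = 1 - p_0$$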
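A minimal Python sketch of the one-proportion Z test above, using only the standard library. The function name `one_proportion_z_test` and the data in the usage line (60 successes in 100 trials, p0 = 0.5) are hypothetical and only illustrate the arithmetic.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x: int, n: int, p0: float) -> tuple[float, float]:
    """Return (Z, two-sided p-value) for H0: p = p0, with p-hat = x / n."""
    q0 = 1 - p0
    # Condition from the slide for using the normal approximation
    if not (n * p0 > 15 and n * q0 > 15):
        raise ValueError("normal approximation not justified: need n*p0 > 15 and n*q0 > 15")
    p_hat = x / n                                  # number of successes / sample size
    z = (p_hat - p0) / math.sqrt(p0 * q0 / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative p != p0
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5
z, p = one_proportion_z_test(60, 100, 0.5)
print(f"Z = {z:.2f}, two-sided p-value = {p:.4f}")
```

For the one-sided alternatives (p > p0 or p < p0), only the single upper or lower tail area of Z would be used instead of doubling it.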
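Hypothesis for Two Proportions
Research question                    H0              Ha
No Difference vs. Any Difference     p1 − p2 = 0     p1 − p2 ≠ 0
Pop 1 ≥ Pop 2 vs. Pop 1 < Pop 2      p1 − p2 ≥ 0     p1 − p2 < 0
Pop 1 ≤ Pop 2 vs. Pop 1 > Pop 2      p1 − p2 ≤ 0     p1 − p2 > 0

Z-Test Statistic for Two Proportions
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{X_1}{n_1}, \qquad \hat{p}_2 = \frac{X_2}{n_2}, \qquad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$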
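The pooled two-proportion Z statistic can be sketched in Python as below, assuming the hypothesised difference p1 − p2 is 0 (the boundary value in every H0 of the table). The function name and the sample counts (30 of 100 versus 20 of 100) are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (Z, two-sided p-value) for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)          # pooled estimate of the common proportion
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0.0) / se     # hypothesised difference p1 - p2 = 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical samples: 30 successes out of 100 versus 20 successes out of 100
z, p = two_proportion_z_test(30, 100, 20, 100)
print(f"Z = {z:.3f}, two-sided p-value = {p:.4f}")
```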
Chi Square Test Basic Idea
1. Compares the observed count with the expected count, assuming the null hypothesis is true.
2. The closer the observed count is to the expected count, the more consistent the data are with H0.
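Chi Square Test for k Proportions
1. Hypotheses
$$H_0: p_1 = p_{1,0},\; p_2 = p_{2,0},\; \ldots,\; p_k = p_{k,0}$$
$$H_a: \text{at least one } p_i \text{ is different from the value above}$$
2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_i - E(n_i)\right]^2}{E(n_i)}, \qquad E(n_i) = n\,p_{i,0}$$
where $n_i$ is the observed (actual) count, $E(n_i)$ is the expected count and $p_{i,0}$ is the hypothesized probability.
3. Degrees of Freedom: k − 1, where k is the number of outcomes.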
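The three steps above translate directly into a short Python sketch. The function name and the counts (20, 30, 50 for k = 3 outcomes, with hypothesised probabilities 0.25, 0.25, 0.50) are hypothetical.

```python
def chi_square_k_proportions(observed: list[int], p0: list[float]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for H0: p_i = p_i0 for all i."""
    if abs(sum(p0) - 1.0) > 1e-9:
        raise ValueError("hypothesised probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in p0]                 # E(n_i) = n * p_i0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1                 # df = k - 1

# Hypothetical counts for k = 3 outcomes, H0: p = (0.25, 0.25, 0.50)
chi2, df = chi_square_k_proportions([20, 30, 50], [0.25, 0.25, 0.50])
print(chi2, df)
```

With k = 3 the statistic is compared with the tabulated critical value 5.991 (α = .05, df = 2); here 2.0 < 5.991, so H0 would not be rejected.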
Finding Critical Value
What is the critical χ² value if k = 3 and α = .05?
df = k − 1 = 2
χ² Table (Portion), upper-tail area:
DF    .995    .95     .05
1     ...     0.004   3.841
2     0.010   0.103   5.991
The critical value is 5.991: reject H0 if χ² > 5.991; otherwise do not reject H0. If n_i = E(n_i) for every outcome, χ² = 0.
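As a cross-check on the printed table, the critical value can also be computed numerically. This standard-library sketch uses the series for the regularised lower incomplete gamma function plus bisection; the helper names `chi2_sf` and `chi2_critical` are hypothetical, and this is one simple approach, not how the table was produced. With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same value.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X > x) of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(s, z)
    term = total = 1.0 / s
    n = 1
    while term > 1e-16 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)

def chi2_critical(alpha: float, df: int) -> float:
    """Value c whose upper-tail area is alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:      # widen the bracket until it contains the root
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:    # tail still too big: critical value lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2):
    print(df, round(chi2_critical(0.05, df), 3))
```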
χ² Test of Independence Example
As a realtor, you want to determine whether house style and house location are related. At the .05 level of significance, is there evidence of a relationship?
                 House Location
House Style      Urban   Rural   Total
Split-Level      63      49      112
Ranch            15      33      48
Total            78      82      160

Chi Square Test of Independence: Contingency Table
A contingency table shows the number of observations from one sample classified jointly on two qualitative variables. In the house example, the rows (house styles) are the levels of variable 1 and the columns (house locations) are the levels of variable 2.
Expected Count Example
Using the house style and location counts (n = 160):
Marginal probability of Split-Level = 112/160
Marginal probability of Urban = 78/160
Joint probability of Split-Level and Urban, if the two variables are independent = (112/160)(78/160)
Expected count = 160 × (112/160) × (78/160) = 54.6
Expected Count Calculation
$$E_{ij} = \frac{R_i\,C_j}{n}$$
where $R_i$ is the i-th row total, $C_j$ is the j-th column total and n is the overall sample size.
                 Urban                         Rural
House Style      Obs.   Exp.                   Obs.   Exp.                   Total
Split-Level      63     112 × 78 / 160 = 54.6  49     112 × 82 / 160 = 57.4  112
Ranch            15     48 × 78 / 160 = 23.4   33     48 × 82 / 160 = 24.6   48
Total            78     78                     82     82                     160
E_ij > 5 in all cells.
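The rule E_ij = R_i C_j / n applies to every cell at once, so it can be sketched as a small Python function. The name `expected_counts` is hypothetical; the observed counts are the house example's.

```python
def expected_counts(table: list[list[int]]) -> list[list[float]]:
    """E_ij = R_i * C_j / n for every cell of an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Split-Level, Ranch; columns: Urban, Rural (observed counts from the example)
observed = [[63, 49], [15, 33]]
expected = expected_counts(observed)
print(expected)
# Rule of thumb used on the slide: every expected count should exceed 5
print(all(e > 5 for row in expected for e in row))
```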
χ² Test of Independence Solution
                 Urban           Rural
House Style      Obs.   Exp.     Obs.   Exp.     Total
Split-Level      63     54.6     49     57.4     112
Ranch            15     23.4     33     24.6     48
Total            78     78       82     82       160

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(n_{11} - E_{11})^2}{E_{11}} + \frac{(n_{12} - E_{12})^2}{E_{12}} + \frac{(n_{21} - E_{21})^2}{E_{21}} + \frac{(n_{22} - E_{22})^2}{E_{22}}$$
$$= \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 8.41$$
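The same sum over all cells, written as a Python sketch with the observed and expected counts of the house example; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed: list[list[float]], expected: list[list[float]]) -> float:
    """Sum of (n_ij - E_ij)^2 / E_ij over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[63, 49], [15, 33]]
expected = [[54.6, 57.4], [23.4, 24.6]]    # expected counts of the house example
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))
```

The unrounded value is about 8.405, which rounds to the 8.41 used on the slides.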
χ² Test of Independence Solution
H0: No relationship (house style and house location are independent)
Ha: Relationship (house style and house location are related)
α = .05
df = (2 − 1)(2 − 1) = 1
Critical value: 3.841 (reject H0 if χ² > 3.841)
Test statistic: χ² = 8.41
p-value: less than .05, because 8.41 > 3.841
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship between house style and house location.
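The p-value can also be computed exactly with the standard library. For df = 1, a χ² variable is the square of a standard normal Z, so the upper-tail area is P(|Z| > √χ²) = erfc(√(χ²/2)). This shortcut holds only for df = 1; the variable names are illustrative.

```python
import math

alpha = 0.05
df = (2 - 1) * (2 - 1)          # (rows - 1)(columns - 1)
critical = 3.841                # tabulated value for df = 1, alpha = .05
chi2 = 8.41                     # test statistic from the solution

# For df = 1 only: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

reject = chi2 > critical
print(f"df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value comes out near 0.004, well below α = .05, which agrees with the critical-value decision.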
Yates Correction for Continuity
In applying the chi-square approximation, we are required to combine small expected counts (< 5) with larger ones. But when there are only 2 classes, the smaller frequency cannot be pooled into the larger one. Frank Yates in 1934 showed that the chi-square approximation is markedly improved if we use the following formula:
$$\chi^2 = \sum_{i=1}^{2} \frac{\left(|o_i - e_i| - 0.5\right)^2}{e_i}$$
where $o_i$ and $e_i$ are the observed and expected counts.
It should be used only when d.f. = 1 and only one e_i is small.
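A Python sketch of the corrected statistic. It assumes, as is common practice for a 2 × 2 table (df = 1), that the same 0.5 correction is applied to each of the four cells. The house counts are reused purely as an illustration: none of their expected counts is small, so the correction is not actually required there. The function name is hypothetical.

```python
def yates_chi_square(observed: list[float], expected: list[float]) -> float:
    """Chi-square with Yates' continuity correction: sum of (|o - e| - 0.5)^2 / e."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# Cells of the 2 x 2 house example, flattened row by row (illustration only)
observed = [63, 49, 15, 33]
expected = [54.6, 57.4, 23.4, 24.6]
print(round(yates_chi_square(observed, expected), 2))
```

The corrected value (about 7.43) is smaller than the uncorrected 8.41. The correction always pulls χ² down, making the test more conservative.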

Chi-Square Table
Coefficient of Contingency
The chi-square statistic does not tell us anything about the strength of the association. For this purpose Karl Pearson (1857-1936) defined the coefficient of mean square contingency, C:
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
where n is the sample size. This coefficient measures the strength of the association or dependence between the two variables of classification in the contingency table.
C = 0 indicates complete independence.
For perfect association, $C = \sqrt{(k - 1)/k}$, where k is the smaller of r (number of rows) and c (number of columns).
C therefore lies between 0 and $\sqrt{(k - 1)/k}$.
The larger the value of C, the stronger the association.
C suffers from the disadvantage that, unlike a correlation coefficient, it does not range from -1 to 1: it is never negative, it never reaches 1, and its upper limit depends on k.
It should, therefore, not be used to compare associations among tables with different numbers of categories.
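A short Python sketch computes C together with its upper limit, using the house example's χ² = 8.41 and n = 160 for a 2 × 2 table. The function name is hypothetical.

```python
import math

def contingency_coefficient(chi2: float, n: int, r: int, c: int) -> tuple[float, float]:
    """Return (C, C_max) with C = sqrt(chi2 / (chi2 + n)) and C_max = sqrt((k - 1) / k)."""
    k = min(r, c)                        # smaller of the number of rows and columns
    c_value = math.sqrt(chi2 / (chi2 + n))
    c_max = math.sqrt((k - 1) / k)       # value reached under perfect association
    return c_value, c_max

# House style x location example: chi-square = 8.41, n = 160, 2 x 2 table
C, C_max = contingency_coefficient(8.41, 160, 2, 2)
print(f"C = {C:.3f}, upper limit = {C_max:.3f}")
```

For a 2 × 2 table the ceiling is only about 0.707, so C ≈ 0.223 should be judged against that limit rather than against 1.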
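Phi-Coefficient
The phi coefficient is defined as
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
where χ² is Pearson's chi-square statistic and N is the grand total of the observations.
Phi varies from -1 to 1:
0 indicates no association
1 corresponds to complete association
-1 corresponds to complete inverse association
The square root above gives only the size of φ. For a 2 x 2 table with cells a, b (first row) and c, d (second row), the sign is that of ad − bc, since $\phi = (ad - bc)/\sqrt{(a + b)(c + d)(a + c)(b + d)}$.
This coefficient can only be calculated for frequency data represented in 2 x 2 tables.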
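A Python sketch of the signed phi coefficient for a 2 × 2 table, applied to the house example. Squaring φ and multiplying by N recovers the χ² statistic, which ties this form back to √(χ²/N). The function name is hypothetical.

```python
import math

def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Signed phi for the 2 x 2 table [[a, b], [c, d]]: (ad - bc) / sqrt(product of margins)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

# House example: Split-Level (63 urban, 49 rural), Ranch (15 urban, 33 rural)
phi = phi_coefficient(63, 49, 15, 33)
print(round(phi, 3), round(phi ** 2 * 160, 2))   # phi^2 * N gives back chi-square
```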
Cramér's Coefficient of Contingency
Cramér's coefficient of contingency is defined as
$$Q = \sqrt{\frac{\chi^2}{n\,(k - 1)}}$$
where n is the total sample size and k is the smaller of r and c.
If Q = 0, the variables are completely independent.
If Q = 1, there is a perfect relationship.
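A Python sketch of Cramér's coefficient (often written V) for the house example, using χ² = 8.41, n = 160 and a 2 × 2 table. The function name `cramers_q` is hypothetical.

```python
import math

def cramers_q(chi2: float, n: int, r: int, c: int) -> float:
    """Cramer's coefficient Q = sqrt(chi2 / (n * (k - 1))), with k = min(r, c)."""
    k = min(r, c)
    return math.sqrt(chi2 / (n * (k - 1)))

q = cramers_q(8.41, 160, 2, 2)
print(round(q, 3))
```

For a 2 × 2 table k − 1 = 1, so Q reduces to √(χ²/n), the magnitude of the phi coefficient. For larger tables, Q still ranges from 0 to 1, which makes it more suitable than C for comparing tables of different sizes.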




Thanks
