
Basic Statistics

The Chi Square Test of Independence
Data Types
• Data
– Quantitative (Discrete, Continuous)
– Qualitative

EPI809/Spring 2008 2
Qualitative Data
1. Qualitative Random Variables Yield Responses That Can Be Put In Categories. Example: Gender (Male, Female)
2. Measurement or Count Reflects # in Category
3. Nominal (No Order) or Ordinal (Order) Scale
4. Data can be collected as continuous but recoded to categorical data. Example: Systolic Blood Pressure (Hypotension, Normal Tension, Hypertension)
Hypothesis Tests
Qualitative Data
• Proportion
– 1 pop.: Z Test
– 2 pop.: Z Test
– 2 or more pop.: χ² Test
• Independence: χ² Test

Chi Square Test of Independence
• A measure of association similar to the correlations
we will see later.
• t tests and the Pearson and Spearman correlations are not applicable if
the data are at the nominal level of measurement.
• Chi Square is used for nominal data placed in a
contingency table.
– A contingency table is a two-way table showing the
joint frequencies of two variables, where each variable
has been classified into mutually exclusive categories
and the cell entries are frequencies.
χ² Test of Independence
1. Shows If a Relationship Exists Between 2 Qualitative Variables, but Does Not Show Causality
2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5
3. Uses Two-Way Contingency Table

χ² Test of Independence
Contingency Table
1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables

                  Residence
Disease Status    Urban    Rural    Total
Disease              63       49      112
No disease           15       33       48
Total                78       82      160

Rows are the levels of variable 1 (Disease Status); columns are the levels of variable 2 (Residence).
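χ² Test of Independence
Hypotheses & Statistic
1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)
2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})}$$

where $n_{ij}$ is the observed count and $E(n_{ij})$ is the expected count in the cell at row i, column j.
Degrees of Freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns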
χ² Test of Independence
Expected Counts
1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
2. Compute Marginal Probabilities & Multiply for Joint Probability
3. Expected Count Is Sample Size Times Joint Probability
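
The three steps above can be collapsed into a single expression. As a sketch, write $R_i$ for the total of row i, $C_j$ for the total of column j, and $n$ for the sample size. These symbols are introduced here for illustration and do not appear on the slides. The marginal probabilities are estimated from the sample as $R_i / n$ and $C_j / n$, so:

$$E(n_{ij}) = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n}$$

For the Disease/Urban cell of the residence table, for instance, this gives 112 × 78 / 160 = 54.6.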

An Example
• Suppose that the state legislature is considering
a bill to lower the legal drinking age to 18. A
political scientist is interested in whether there
is a relationship between party affiliation and
attitude toward the bill. A random sample of
150 registered Republicans and 200 registered
Democrats is asked for their opinion about the
proposed bill. The data are presented on the
next slide.
Political Party and
Legal Drinking Age Bill
              Attitude Toward Bill
Party         For    Undecided    Against    Total
Republican     38           17         95      150
Democrat       92           18         90      200
Total         130           35        185      350

The cell counts are the observed frequencies (fo)


Determining the Expected
Frequencies (fe)
• First, add the columns and the rows to get
the totals as shown in the previous slide.
• To obtain the expected frequency within a
particular cell, perform the following
operation: Multiply the row total and the
column total for the cell in question and
then divide that product by the Total
number of all respondents.
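
The rule above can be sketched in a few lines of Python. This is only an illustration using the counts from the party and attitude table; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are chosen here and are not part of the original slides.

```python
# Observed frequencies (fo) from the party-by-attitude table.
observed = {
    "Republican": {"For": 38, "Undecided": 17, "Against": 95},
    "Democrat": {"For": 92, "Undecided": 18, "Against": 90},
}
columns = ["For", "Undecided", "Against"]

# Add the rows and the columns to get the totals.
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {c: sum(observed[p][c] for p in observed) for c in columns}
grand_total = sum(row_totals.values())

# Expected frequency (fe) = row total * column total / total respondents.
expected = {
    p: {c: row_totals[p] * col_totals[c] / grand_total for c in columns}
    for p in observed
}

for p in observed:
    print(p, [round(expected[p][c], 1) for c in columns])
# Republican [55.7, 15.0, 79.3]
# Democrat [74.3, 20.0, 105.7]
```

Keeping the unrounded values in `expected` and rounding only for display avoids carrying rounding error into later steps.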
Calculating the Expected Value for a
Particular Cell
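              Attitude Toward Bill
Party         For          Undecided    Against    Total
Republican    38 (55.7)    17           95         150
Democrat      92           18           90         200
Total         130          35           185        350

The expected value for the Republican/For cell (shown in parentheses) is obtained in two steps:
1. 130*150 = 19500
2. 19500/350 = 55.7
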

Political Party and Attitude toward Bill
              Attitude Toward Bill
Party         For          Undecided    Against       Total
Republican    38 (55.7)    17 (15)      95 (79.3)     150
Democrat      92 (74.3)    18 (20)      90 (105.7)    200
Total         130          35           185           350

In each cell, the first number is the obtained frequency (fo) and the number in parentheses is the expected frequency (fe).
The Null Hypothesis and the
Expected Values
• The Null Hypothesis under investigation in
the Chi Square Test of Independence is that
the two variables are independent (not
related).
• In this example, the Null Hypothesis is that
there is NO relationship between political
party and attitude toward lowering the legal
drinking age.
Understanding the Expected Values

• If the Null is true, then the percentage of
those who favor lowering the drinking age
would be equal for each political party.
• Notice that the expected values for each
opinion are proportional to the number of
persons surveyed in each party.
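
As a rough check of this idea, the sketch below compares the percentage in favor that the Null implies for each party with the percentage actually observed. It uses only the "For" column of the table; the variable names are illustrative and not from the slides.

```python
# Observed counts in favor of the bill and sample size per party (from the table).
observed_for = {"Republican": 38, "Democrat": 92}
party_size = {"Republican": 150, "Democrat": 200}

# Under the Null, every party shares the overall rate in favor: 130 / 350.
overall_rate = sum(observed_for.values()) / sum(party_size.values())

expected_pct = {}
observed_pct = {}
for party, n in party_size.items():
    expected_for = overall_rate * n                # expected frequency (fe)
    expected_pct[party] = 100 * expected_for / n   # the same for both parties
    observed_pct[party] = 100 * observed_for[party] / n
    print(f"{party}: expected {expected_pct[party]:.1f}%, "
          f"observed {observed_pct[party]:.1f}%")
# Republican: expected 37.1%, observed 25.3%
# Democrat: expected 37.1%, observed 46.0%
```

The expected percentage is identical across parties, while the observed percentages differ; the chi square statistic measures how large that gap is across all cells.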
Political Party and Attitude toward Bill
              Attitude Toward Bill
Party         For          Undecided    Against       Total    % of Total
Republican    38 (55.7)    17 (15)      95 (79.3)     150      42.9%
Democrat      92 (74.3)    18 (20)      90 (105.7)    200      57.1%
Total         130          35           185           350

The last column shows each Party's percentage of the total (150/350 and 200/350).
The Expected Values
• The expected value for each cell is also
equal to that party's percentage of the sample
applied to the column total.
• For example, Republicans were 42.9% of the
total persons surveyed (150/350).
• If 130 people were in favor of the bill, then
about 42.9% of them should be Republican
(150/350 × 130 = 55.7), if there is no
relationship between the variables.
Calculating the Chi Square Statistic

The Chi Square statistic is obtained via this formula:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

The Chi Square statistic is
(1) the sum over all cells of
(2) the difference between the obtained value and the
expected value SQUARED, which is then
(3) divided by the expected frequency.

The next slide illustrates this calculation for each cell


Calculating the Chi Square Statistic

$$\frac{(38 - 55.7)^2}{55.7} = 5.62 \qquad \frac{(17 - 15)^2}{15} = 0.27 \qquad \frac{(95 - 79.3)^2}{79.3} = 3.11$$

$$\frac{(92 - 74.3)^2}{74.3} = 4.22 \qquad \frac{(18 - 20)^2}{20} = 0.20 \qquad \frac{(90 - 105.7)^2}{105.7} = 2.33$$

$$\chi^2 = 5.62 + 0.27 + 3.11 + 4.22 + 0.20 + 2.33 = 15.75$$
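
The same calculation can be sketched in Python. This version is an illustration, not part of the original slides: it keeps the expected frequencies unrounded, so the total comes out at about 15.77 rather than the 15.75 obtained above from expected values rounded to one decimal place. The variable names are chosen here for readability.

```python
# Observed frequencies (fo); columns are For, Undecided, Against.
observed = [
    [38, 17, 95],   # Republican
    [92, 18, 90],   # Democrat
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (fo - fe)^2 / fe over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency
        chi_square += (fo - fe) ** 2 / fe

print(round(chi_square, 2))  # 15.77
```

The small gap between 15.77 and 15.75 comes only from rounding and does not affect the conclusion.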


Interpreting the Results
The calculated value for the chi square statistic χ² is
compared to the critical value found in Table H, page 544.

Note: The distribution of the Chi Square Statistic is not
normal, and the critical values are only on one side (the
upper tail). If the obtained values are close to the expected
values, then the chi square statistic will approach 0. The more
the obtained values differ from the expected values, the larger
the value of chi square. This is reflected in the values found
in Table H.

The Degrees of Freedom for the Chi Square Test of
Independence are the product of the number of rows
minus 1 and the number of columns minus 1.
Interpreting Our Results
• In our study, we had two rows (Republicans and
Democrats) and three columns (For, Undecided,
Against).
• Therefore, the degrees of freedom for our study are
(2-1)(3-1) = 1(2) = 2.
• Using an α of .05, the critical value from Table H
would be 5.991.
• Since our calculated chi square of 15.75 exceeds
5.991, we reject the Null Hypothesis and conclude
that there IS a relationship between political party
and opinion on lowering the drinking age.
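
The decision rule can also be checked without a printed table. The sketch below relies on a property that holds only for 2 degrees of freedom: the upper-tail probability of the chi square distribution is exp(-x/2), so the critical value is -2·ln(α). For other degrees of freedom a table or a statistics library would be needed. The variable names are illustrative.

```python
import math

chi_square = 15.75   # value calculated on the slides
rows, cols = 2, 3
alpha = 0.05

df = (rows - 1) * (cols - 1)
assert df == 2  # the closed forms below are valid only for df = 2

# For df = 2: P(chi-square > x) = exp(-x / 2).
critical_value = -2 * math.log(alpha)    # value where the upper tail equals alpha
p_value = math.exp(-chi_square / 2)      # upper-tail probability of the statistic

reject_null = chi_square > critical_value
print(round(critical_value, 3), reject_null)  # 5.991 True
```

The resulting `p_value` is far below .05, which agrees with rejecting the Null Hypothesis.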
