
12.2 Chi-Squared Tests for Independence

The χ² test for independence is the third and final type of χ² test we will learn. It tests for an association
between 2 categorical variables in 1 population.

How is this different from the other tests?


Goodness of Fit: 1 variable in 1 population

ex: take 1 sample and ask 1 question

Homogeneity of Proportions: 1 variable in 2 or more populations or treatments

ex: take 2 or more samples or treatments and ask 1 question

Independence: 2 variables in 1 population

ex: take 1 sample and ask 2 questions

Among Monte Vista High School seniors, is there an association between gender and having a driver’s
license? Suppose that a random sample of 100 seniors was taken and the subjects were asked their gender
and whether or not they had a driver’s license. Do the data suggest that an association exists?
            Male   Female   Total
License       36       32      68
No License    11       21      32
Total         47       53     100

If there were no association between the two variables, what would the expected counts be?

Give blank table and let them think.


            Male   Female   Total
License                        68
No License                     32
Total         47       53     100

Calculating expected cell counts: If gender and having a license are independent, then:
$$P(M \text{ and } L) = P(M) \cdot P(L) = \left(\frac{47}{100}\right)\left(\frac{68}{100}\right)$$

Thus, since there are 100 people in our sample, the expected number of males with licenses should be:
$$100 \cdot P(M \text{ and } L) = 100\left(\frac{47}{100}\right)\left(\frac{68}{100}\right) = \frac{47 \cdot 68}{100} = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = 31.96$$

            Male    Female   Total
License     31.96    36.04      68
No License  15.04    16.96      32
Total       47       53        100
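The "row total × column total ÷ grand total" rule can be sketched in a few lines of Python. This is an illustration added to the notes, not part of the original lesson, and `expected_counts` is a hypothetical helper name. It rebuilds the expected table above from the observed counts and checks the large-sample condition that every expected count is at least 5.

```python
# Observed counts: rows = License, No License; columns = Male, Female
observed = [
    [36, 32],
    [11, 21],
]

def expected_counts(table):
    """Expected count for each cell if the two variables are independent:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(x, 2) for x in row])
# [31.96, 36.04]
# [15.04, 16.96]

# Large-sample condition: every expected count should be at least 5
print(all(x >= 5 for row in expected for x in row))  # True
```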
Note: Even though this is not a homogeneity of proportions (HOP) test, the method for calculating expected cell counts is the same.
5 steps:

1. At first glance, it appears that there is an association between gender and having a license, since the
observed counts are different from the expected counts. However, it is possible that the variables have no
association and the differences we see are due to sampling variability. To decide, we will conduct a
Chi-square test for Independence (α = .05).

2. H₀: There is no association between gender and having a license
(Gender and having a license are independent).
Hₐ: There is an association between gender and having a license
(Gender and having a license are not independent).

3. Conditions:
a. The data comes from a random sample of Monte Vista seniors? Given.

b. The sample size is large? Yes, all expected cell counts are ≥ 5 (see table above).

c. Sample < 10% of population? This requires more than 1000 Monte Vista seniors (10 × 100), which is not a
reasonable assumption. Proceed with caution.

4. $\chi^2 = \dfrac{(36 - 31.96)^2}{31.96} + \cdots = 3.01$, $df = (2 - 1)(2 - 1) = 1$, $P(\chi^2 > 3.01) = .0827$

5. Since the P-value (.0827) > α (.05), we fail to reject the null hypothesis and cannot conclude that there is an association
between gender and having a license.
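As a cross-check on steps 4 and 5, here is a minimal pure-Python sketch (an illustration, not the notes' own method) that recomputes the statistic, the degrees of freedom, and the P-value from the observed table. It assumes df = 1, where the χ² distribution is the square of a standard normal Z, so the P-value P(|Z| > √χ²) can be written with `math.erfc` instead of a χ² table or calculator. The function names are hypothetical.

```python
import math

observed = [
    [36, 32],  # License:    Male, Female
    [11, 21],  # No License: Male, Female
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count for this cell
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_upper_tail_df1(x):
    # Assumes df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2))

stat, df = chi_square_independence(observed)
p_value = chi2_upper_tail_df1(stat)
print(round(stat, 2), df, round(p_value, 4))  # 3.01 1 0.0827
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # fail to reject H0
```

The erfc shortcut only covers df = 1; larger tables need a general χ² upper-tail function.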
Innovative Machines Incorporated has developed two new letter arrangements for computer keyboards. The
company randomly selected 300 beginning typing students and randomly assigned them to keyboard types. The
company wishes to see if there is any relationship between the arrangement of letters on the keyboard and
the number of hours it takes a new typing student to learn to type at 20 words per minute. Or, from another
point of view, is the time it takes a student to learn to type INDEPENDENT of the arrangement of the letters
on a keyboard? Perform a hypothesis test based on the data in the chart below:

1. At first glance, it appears that there is an association between keyboard type and the number of hours it takes to
master the keyboard, since the observed counts are different from the expected counts. However, it is possible
that the variables have no association and the differences we see are due to sampling variability. To decide,
we will conduct a Chi-square test for Independence (α = .05).

2. H₀: There is no association between keyboard type and the time it takes to master it
(Keyboard type and Mastery Hours are independent).
Hₐ: There is an association between keyboard type and the time it takes to master it
(Keyboard type and Mastery Hours are not independent).

3. Conditions:
a. The data comes from a random sample of new typists? Given.

b. The sample size is large? Yes, all expected cell counts are ≥ 5 (see table above).

c. Sample < 10% of population? Yes, assuming more than 3000 beginning typists (10 × 300).

4. $\chi^2 = \dfrac{(25 - 24)^2}{24} + \cdots = 13.32$, $df = (3 - 1)(3 - 1) = 4$, $P(\chi^2 > 13.32) = .0098$

5. Since the P-value (.0098) < α (.05), we reject the null hypothesis and conclude that there is an association between
keyboard type and the number of hours required to master it. The time it takes a student to learn to type is not
independent of the keyboard type.
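The keyboard data table is not reproduced in these notes, so the sketch below only checks the P-value in step 4 from the reported statistic χ² = 13.32 and df = 4. It assumes an even number of degrees of freedom, for which the χ² upper tail has the closed form P(χ² > x) = e^(−x/2) · Σ (x/2)^i / i!, summed for i = 0 to df/2 − 1. The function name is hypothetical, and this is not a general χ² routine.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi2 > x) for a positive even df:
    exp(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)**i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

rows, cols = 3, 3             # table dimensions implied by df = (3 - 1)(3 - 1)
df = (rows - 1) * (cols - 1)  # 4
p_value = chi2_upper_tail_even_df(13.32, df)
print(df, round(p_value, 4))  # 4 0.0098
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # reject H0
```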
HW #94: 12.16, 12.30, 12.40, 12.43
