Slide 17-1
Chi-Square Tests
Chi-Squared Analysis: Testing for Patterns in Qualitative Data
Slide
17-2 Chi-Squared Tests
• Hypothesis Tests for Qualitative Data
– Categories instead of numbers
– Based on counts
• the number of sampled items falling into each category
– Chi-squared statistic
• Measures the difference between the observed (actual) counts and the expected counts (the counts expected if the null hypothesis H0 is true)

$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where the sum extends over all categories or combinations of
categories
– Significant if the chi-squared statistic is large enough, that is, larger than the critical value for the chosen significance level and degrees of freedom
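The formula above translates directly into code. A minimal Python sketch, assuming the observed and expected counts are already lined up category by category; the function name `chi_squared_statistic` and the counts are hypothetical, chosen only for illustration:

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories (or cells)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories (illustration only)
observed = [30, 50, 20]
expected = [25, 50, 25]
print(chi_squared_statistic(observed, expected))  # 2.0
```

Each category contributes its squared deviation scaled by its expected count, so a given gap between observed and expected counts weighs more in a category where few items are expected.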
Slide
17-3 Summarizing Qualitative Data
• Use Counts and Percentages, for Example:

– How many (of the people sampled) prefer your product?
– What percentage of sophisticated product users said they would like to purchase an upgrade?
– What percentage of products produced today are both
• designed for export
• and imperfect?
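Counts and percentages like these come from tallying how many sampled items fall into each category. A minimal Python sketch using the standard library's `collections.Counter`; the survey answers and production records are hypothetical data invented for illustration:

```python
from collections import Counter

# Hypothetical survey answers to "Do you prefer our product?" (illustration only)
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "yes"]
counts = Counter(responses)
n = len(responses)
for category, count in counts.most_common():
    print(f"{category:<10} {count:>3} {100 * count / n:5.1f}%")

# Hypothetical production records: (designed_for_export, imperfect)
products = [(True, False), (True, True), (False, True), (True, True), (False, False)]
# A combined category: count only the items that satisfy both conditions
both = sum(1 for export, imperfect in products if export and imperfect)
print(f"{100 * both / len(products):.1f}% are both designed for export and imperfect")
```

Here 5 of the 8 respondents (62.5%) answer "yes", and 2 of the 5 products (40.0%) fall into the combined export-and-imperfect category.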
Slide
17-4 Testing Population Percentages
• Testing if Population Percentages are Equal to
Known Reference Values
– The chi-squared test for equality of percentages
– Could a table of observed counts have reasonably come
from a population with known percentages (the
reference values)?
– The data: A table indicating the count for each
category for a single qualitative variable
– The hypotheses:
• H0: The population percentages are equal to a set of known,
fixed reference values
• H1: The population percentages are not equal to a set of
known, fixed reference values
Slide
17-5 Testing Percentages (continued)
– The expected counts:
• For each category, multiply the population reference
proportion by the sample size, n
– The assumptions:
1. Data set is a random sample from the population of interest
2. At least 5 counts are expected in each category
– The chi-squared statistic:

$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

– The degrees of freedom: Number of categories minus 1


– The test result: Significant if the chi-squared statistic is larger than the critical value from the chi-squared table (for the chosen significance level and degrees of freedom)
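The whole procedure for testing percentages (expected counts, assumption check, statistic, degrees of freedom, comparison with the table) can be sketched in a few lines of Python. This is a minimal sketch, assuming a 5% significance level; the critical values are standard chi-squared table entries for 1 to 5 degrees of freedom only, and the function name and example counts are hypothetical:

```python
import math

# Upper 5% critical values of the chi-squared distribution (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def equal_percentages_test(observed, reference_proportions):
    """Chi-squared test of observed counts against known reference proportions."""
    if not math.isclose(sum(reference_proportions), 1.0):
        raise ValueError("reference proportions must sum to 1")
    n = sum(observed)
    expected = [p * n for p in reference_proportions]  # reference proportion x n
    if min(expected) < 5:
        raise ValueError("assumption violated: an expected count is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return stat, df, stat > CRITICAL_05[df]

# Hypothetical: 100 sampled items in 3 categories, reference values 25%, 50%, 25%
print(equal_percentages_test([30, 50, 20], [0.25, 0.50, 0.25]))  # (2.0, 2, False)
```

With a statistic of 2.0 against a critical value of 5.991, these hypothetical counts are consistent with the reference percentages. The lookup table raises `KeyError` for more than 5 categories' worth of degrees of freedom; a fuller version would compute the critical value or p-value instead.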
Slide
17-6 Independence
• Two Qualitative Variables are Independent if:
– knowledge about the value (i.e., category) of one
variable does not help you predict the other variable
• For Example: Background Sales Information
– The two variables are
• where the customer lives, and
• the customer’s favorite product
– Independence would say that
• customers tend to have the same pattern (distribution) of
favorite products, regardless of where they live, and
• where customers live is not associated with which product is
their favorite
Slide
17-7 Testing for Association
• The chi-squared test for independence
– The data: A table indicating the counts for each
combination of categories for two qualitative variables
– The hypotheses:
H0: The two variables are independent of one another
H1: The two variables are associated; they are not independent
– The expected table:
$$\text{Expected count} = \frac{(\text{Count for category of one variable}) \times (\text{Count for category of other variable})}{n}$$

• Tells you what the counts would have been, on average, if the
variables were independent and there were no randomness
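The expected table is built cell by cell from the row and column totals. A minimal Python sketch, assuming the observed table is given as a list of rows of counts; the function name `expected_table` and the 2×2 counts are hypothetical:

```python
def expected_table(observed):
    """Expected count for each cell under H0 (independence):
    (row total) * (column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table of counts (illustration only)
print(expected_table([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row and column totals as the observed table; only the way the counts are spread across the cells changes.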
Slide
17-8 Testing Association (continued)
– The assumptions:
1. Data set is a random sample from the population of interest
2. At least 5 counts are expected in each combination of
categories
– The chi-squared statistic:
$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
• where the sum extends over all combinations of categories
– The degrees of freedom:
$$\text{df} = (\text{Number of categories for first variable} - 1) \times (\text{Number of categories for second variable} - 1)$$
– The test result: Significant if the chi-squared statistic is larger than the critical value from the chi-squared table (for the chosen significance level and degrees of freedom)
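Putting the pieces of the test for independence together (expected counts, the at-least-5 check, the statistic over all cells, and the degrees of freedom) gives a short procedure. A minimal Python sketch, assuming a 5% significance level and only 1 to 4 degrees of freedom (standard table values); `independence_test` and the 2×2 counts are hypothetical:

```python
# Upper 5% critical values of the chi-squared distribution (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def independence_test(observed):
    """Chi-squared test for independence on a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count for this combination of categories
            if e < 5:
                raise ValueError("assumption violated: an expected count is below 5")
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, stat > CRITICAL_05[df]

# Hypothetical 2x2 table of counts (illustration only)
stat, df, significant = independence_test([[10, 20], [30, 40]])
print(f"{stat:.3f} {df} {significant}")  # 0.794 1 False
```

The total row and column are used only to build the expected counts; they are not cells of the table and do not enter the sum.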
Slide
17-9
Tbl 17.3.1
Example: Market Segmentation
• Data: Rowing Machine Purchases
– Which model rowing machine was purchased?
• Basic, Designer, or Complete
– Which type of customer purchased it?
• Practical or Impulsive

Observed Counts
Model      Practical  Impulsive  Total
Basic             22         25     47
Designer          13         88    101
Complete          54         19     73
Total             89        132    221
Slide
17-10
Tbl 17.3.2
Example (continued)
• Overall Percentages
– Divide each count by the overall total, 221
• e.g., 22/221 = 10.0% of all purchases were basic machines bought by practical customers
• Note: impulsive customers bought most designer machines,
while practical customers bought most complete machines
Overall Percentages
Model      Practical  Impulsive  Total
Basic          10.0%      11.3%  21.3%
Designer        5.9%      39.8%  45.7%
Complete       24.4%       8.6%  33.0%
Total          40.3%      59.7% 100.0%
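The overall percentages can be reproduced from the observed counts of Tbl 17.3.1. A minimal Python sketch; storing the table as a dictionary of (Practical, Impulsive) tuples is an assumption made for illustration:

```python
# Observed counts from Tbl 17.3.1; each tuple is (Practical, Impulsive)
observed = {"Basic": (22, 25), "Designer": (13, 88), "Complete": (54, 19)}
n = sum(sum(row) for row in observed.values())  # overall total, 221

# Divide every cell by the overall total
overall = {model: [100 * count / n for count in row] for model, row in observed.items()}
for model, (practical, impulsive) in overall.items():
    print(f"{model:<9} {practical:5.1f}% {impulsive:5.1f}%")
```

All six cell percentages together add up to 100%.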
Slide
17-11
Tbl 17.3.3
Example (continued)
• Percentages by Model
– Divide each count by the total for its model type
• e.g., 22/47 = 46.8% of the 47 basic machines were purchased
by practical customers
• Note: Practical customers make up 40.3% of all customers, but
represent 74.0% of complete-machine purchasers
Percentages by Model
Model      Practical  Impulsive  Total
Basic          46.8%      53.2% 100.0%
Designer       12.9%      87.1% 100.0%
Complete       74.0%      26.0% 100.0%
Total          40.3%      59.7% 100.0%
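Percentages by model divide each cell by its row total instead of the overall total. A minimal Python sketch on the counts of Tbl 17.3.1 (the dictionary layout is an illustrative assumption):

```python
# Observed counts from Tbl 17.3.1; each tuple is (Practical, Impulsive)
observed = {"Basic": (22, 25), "Designer": (13, 88), "Complete": (54, 19)}

# Divide each cell by its row (model) total, so every row sums to 100%
by_model = {model: [100 * count / sum(row) for count in row] for model, row in observed.items()}
for model, (practical, impulsive) in by_model.items():
    print(f"{model:<9} {practical:5.1f}% {impulsive:5.1f}%")
```

Comparing each row with the Total row (40.3% practical) is what reveals the association: under independence every row would show roughly the same split.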
Slide
17-12
Tbl 17.3.4
Example (continued)
• Percentages by Customer Type
– Divide each count by the total for its customer type
• e.g., 22/89 = 24.7% of the 89 practical customers purchased
basic machines
• Note: Of all machines, 33.0% are complete; looking only at
practical-customer purchases, 60.7% are complete
Percentages by Customer Type
Model      Practical  Impulsive  Total
Basic          24.7%      18.9%  21.3%
Designer       14.6%      66.7%  45.7%
Complete       60.7%      14.4%  33.0%
Total         100.0%     100.0% 100.0%
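Percentages by customer type divide each cell by its column total. A minimal Python sketch on the counts of Tbl 17.3.1 (the dictionary layout is an illustrative assumption):

```python
# Observed counts from Tbl 17.3.1; each tuple is (Practical, Impulsive)
observed = {"Basic": (22, 25), "Designer": (13, 88), "Complete": (54, 19)}
col_totals = [sum(col) for col in zip(*observed.values())]  # [89, 132]

# Divide each cell by its column (customer type) total, so every column sums to 100%
by_type = {
    model: [100 * count / total for count, total in zip(row, col_totals)]
    for model, row in observed.items()
}
for model, (practical, impulsive) in by_type.items():
    print(f"{model:<9} {practical:5.1f}% {impulsive:5.1f}%")
```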
Slide
17-13
Tbl 17.3.5
Example (continued)
• Expected Counts
– Multiply row total by column total, then divide by the
overall total of 221
• e.g., if there were no association between type of customer and type of machine, we would have expected to find (89 × 47)/221 = 18.93 basic machines purchased by practical customers
Expected Counts
Model      Practical  Impulsive  Total
Basic          18.93      28.07  47.00
Designer       40.67      60.33 101.00
Complete       29.40      43.60  73.00
Total          89.00     132.00 221.00
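The expected counts in Tbl 17.3.5 follow from the row total × column total / 221 rule. A minimal Python sketch reproducing them from the observed counts (the list-of-rows layout is an illustrative assumption):

```python
# Observed counts from Tbl 17.3.1; rows: Basic, Designer, Complete; columns: Practical, Impulsive
observed = [[22, 25], [13, 88], [54, 19]]
row_totals = [sum(r) for r in observed]        # [47, 101, 73]
col_totals = [sum(c) for c in zip(*observed)]  # [89, 132]
n = sum(row_totals)                            # 221

# Expected count = row total * column total / overall total
expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

Every expected count is well above 5 (the smallest is 18.93), so the assumption behind the test is satisfied.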
Slide
17-14 Example (continued)
• Chi-squared statistic, summed over the six cells of the table (do not use the total row or column): 66.8
• Degrees of freedom
(Rows - 1)(Columns - 1) = (3 - 1)(2 - 1) = 2
• Result
– If there were no association, such a large chi-squared value would be highly unlikely: 66.8 far exceeds 13.82, the critical value for 2 degrees of freedom at the 0.001 level
– The association between customer type and model purchased is very highly significant (p < 0.001)
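The statistic, degrees of freedom, and p-value for the rowing machine data can be checked in a few lines. A minimal Python sketch; it relies on the fact that for exactly 2 degrees of freedom the chi-squared upper-tail probability has the closed form exp(-x/2), so it does not apply to tables with other dimensions:

```python
import math

# Observed counts from Tbl 17.3.1; rows: Basic, Designer, Complete; columns: Practical, Impulsive
observed = [[22, 25], [13, 88], [54, 19]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over the six cells only (no total row or column)
stat = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
assert df == 2, "the closed-form p-value below holds only for 2 degrees of freedom"
p_value = math.exp(-stat / 2)  # upper-tail probability for 2 degrees of freedom
print(f"chi-squared = {stat:.1f}, df = {df}, p = {p_value:.1e}")
```

The statistic rounds to 66.8, and the p-value is many orders of magnitude below 0.001.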
