8/25/22

Lecture 9:
Chi-squared tests for
qualitative data

Objectives

Chi-squared tests
for qualitative data

• Goodness-of-fit test
• Test of independence

Goodness-of-fit test


Market share
• Firms periodically estimate their market
shares and the market shares of competitors.
• Market shares change over time.
• Firms have to test to determine whether the
actual current market shares are in
accordance with their beliefs.

Problem context
– Two competing companies A and B have
conducted aggressive advertising campaigns.
– Market shares before the campaigns were:
• Company A = 45%
• Company B = 40%
• Other competitors = 15%.

Problem context (cont.)

– To study the effects of the campaigns on the market
shares, 200 customers were asked to indicate their
preference regarding the product advertised.
– Survey results:
• 102 customers preferred Company A's product
• 82 customers preferred Company B's product
• 16 customers preferred the other competitors'
products.
– At the 5% level of significance, can we conclude that
market shares changed?


The technique

– The population investigated is the brand
preferences.
– The data are qualitative (A, B or others).
– This is a multinomial experiment, with three
categories.
– The question of interest is: Are p1, p2 and p3
different after the campaign from their values
before the campaign?

Multinomial experiment

• The multinomial experiment studied is an extension
of the binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into
one of k categories, called cells.
– The probability pi of cell i remains constant for
each trial. Moreover, p1 + p2 + … + pk = 1.
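
The definition above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_multinomial` and the seed are assumptions made for illustration, and the cell probabilities are the pre-campaign market shares from the problem context.

```python
import random
from collections import Counter


def simulate_multinomial(n, probs, seed=None):
    """Run n independent trials; each trial lands in cell i with probability probs[i]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    cells = range(len(probs))
    # The same probabilities are used for every trial (constant p_i).
    draws = rng.choices(cells, weights=probs, k=n)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in cells]


# k = 3 cells (A, B, others), n = 200 trials
counts = simulate_multinomial(200, [0.45, 0.40, 0.15], seed=1)
print(counts, "total =", sum(counts))
```

Each run returns one count per cell, and the counts always add up to n.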

Problem context (cont.)
– The hypotheses are:
H0: p1 = 0.45, p2 = 0.40, p3 = 0.15
HA: at least one pi is not equal to its specified value.

What sample frequency would you expect
for each category if the null hypothesis is true?

A: 90 = 200(0.45)
B: 80 = 200(0.40)
Others: 30 = 200(0.15)

What actual frequencies
did the sample return?

A: 102
B: 82
Others: 16


Problem context (cont.)

Company   Observed       Market      Expected
          frequency oi   shares pi   frequency ei
A              102         0.45           90
B               82         0.40           80
Others          16         0.15           30
TOTAL          200         1             200
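
The expected column of this table can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `expected_frequencies` is an assumption made for illustration, and the numbers are those of the survey.

```python
def expected_frequencies(n, shares):
    """Expected count in each cell under H0: e_i = n * p_i."""
    return [n * p for p in shares]


shares = {"A": 0.45, "B": 0.40, "Others": 0.15}    # market shares under H0
observed = {"A": 102, "B": 82, "Others": 16}       # survey results

n = sum(observed.values())                         # 200 customers
expected = dict(zip(shares, expected_frequencies(n, shares.values())))

for company in shares:
    print(f"{company:<7} o = {observed[company]:>3}   e = {expected[company]:.1f}")
```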


The technique
• To determine whether the market shares have
changed, compare the observed frequencies
with the expected ones.
• If the differences are small → actual shares
are in accord with estimated shares → shares
unchanged.
• If the differences are large → actual shares are
not in accord with estimated shares → shares
changed.


The technique

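Expected frequency: $e_i = n p_i$

The statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

approximately follows a chi-squared
distribution with (k-1) degrees of freedom.

We reject H0 if the value of the statistic exceeds the critical value:
$\chi^2 > \chi^2_{\alpha,\,k-1}$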


Problem context (cont.)
• The rejection region is
$\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{0.05,\,2} = 5.9915$
• Value of the test statistic:
$\chi^2 = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 8.18$

Conclusion: since 8.18 > 5.9915, at the 5%
significance level there is
sufficient evidence to reject
the null hypothesis. At least
one of the probabilities pi
is different. Thus, at least two
market shares have changed.
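
The market-share test can be checked numerically. The sketch below uses only the standard library; it relies on the fact that for exactly 2 degrees of freedom the chi-squared upper-tail probability is exp(-x/2), so the critical value is -2 ln(α). The helper name `chi2_statistic` is an assumption made for illustration.

```python
import math


def chi2_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [102, 82, 16]             # A, B, others
shares = [0.45, 0.40, 0.15]          # H0 market shares
n = sum(observed)
expected = [n * p for p in shares]   # 90, 80, 30

stat = chi2_statistic(observed, expected)

alpha = 0.05
# Valid only for df = k - 1 = 2: P(chi2 > x) = exp(-x / 2)
critical = -2 * math.log(alpha)
p_value = math.exp(-stat / 2)

print(f"chi2 = {stat:.4f}, critical value = {critical:.4f}, p-value = {p_value:.4f}")
print("reject H0" if stat > critical else "do not reject H0")
```

For other degrees of freedom the critical value would come from a chi-squared table or a statistics library rather than this closed form.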


Rule of five
• The test statistic used to perform the test is
only approximately Chi-squared distributed.
• For the approximation to apply, the expected
cell frequency has to be at least five.
• If the expected frequency in a cell is less than
five, combine it with other cells.
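
A quick way to apply the rule is to list the cells whose expected frequency falls below five. This is a sketch only: the function name `small_cells` is an assumption, and the second call uses a hypothetical sample of 20 customers (not part of the lecture data) purely to show a violation.

```python
def small_cells(expected, minimum=5):
    """Return the indices of cells whose expected frequency is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


shares = [0.45, 0.40, 0.15]

# Lecture sample: n = 200, expected frequencies 90, 80, 30
print(small_cells([200 * p for p in shares]))   # no cell below five

# Hypothetical smaller sample: n = 20, expected frequencies 9, 8, 3
print(small_cells([20 * p for p in shares]))    # the third cell is too small
```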


Goodness-of-Fit test

Goodness-of-Fit Test:
comparing an observed set of frequencies
to an expected distribution.

Two cases:
• Equal expected frequencies
• Unequal expected frequencies


Goodness-of-Fit test
Equal expected frequencies
Ho : No difference between ....
Ha : There is a difference between ....
OR
• H0: p1 = 1/k, …, pk = 1/k
• HA: At least one proportion differs from its
specified value.
• Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad e_i = \frac{n}{k}$$

follows chi-squared distribution with k-1 degrees of
freedom

Example
The following data on absenteeism was collected
from a manufacturing plant. At the 0.05 level of
significance, test to determine whether there is a
difference in the absence rate by day of the week.


Solution
H0: No difference in the absence rate by day of the week
HA: There is a difference in the absence rate by day of the week
Test statistic: $\chi^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i}$
Level of significance: α = 0.05
Decision rule: reject H0 if $\chi^2 > \chi^2_{0.05,\,4} = 9.4877$
Value of the test statistic:

Day     Oi    pi    Ei   (Oi-Ei)^2/Ei
MON    120   0.2   89       10.798
TUE     45   0.2   89       21.753
WED     60   0.2   89        9.449
THU     90   0.2   89        0.011
FRI    130   0.2   89       18.888
Total  445                  60.899

Decision: since 60.899 > 9.4877, reject H0.
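
The absenteeism calculation can be reproduced as follows. This is a sketch under one stated assumption: for exactly 4 degrees of freedom the chi-squared upper-tail probability has the closed form exp(-x/2)(1 + x/2), which the hypothetical helper `chi2_upper_tail_4df` implements. The critical value 9.4877 is the tabulated value for α = 0.05 and 4 degrees of freedom.

```python
import math

days = ["MON", "TUE", "WED", "THU", "FRI"]
observed = [120, 45, 60, 90, 130]

n = sum(observed)            # 445 absences
k = len(observed)            # 5 days
expected = [n / k] * k       # equal expected frequencies: 89 per day

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(terms)


def chi2_upper_tail_4df(x):
    """P(chi2 > x) for 4 degrees of freedom only."""
    return math.exp(-x / 2) * (1 + x / 2)


critical = 9.4877            # tabulated chi2 value for alpha = 0.05, df = 4
p_value = chi2_upper_tail_4df(stat)

for day, o, e, t in zip(days, observed, expected, terms):
    print(f"{day}  O = {o:>3}  E = {e:.0f}  term = {t:.3f}")
print(f"chi2 = {stat:.3f}, p-value = {p_value:.2e}")
print("reject H0" if stat > critical else "do not reject H0")
```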


Goodness-of-Fit test
Unequal expected frequencies
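• H0: P1 = p1, …, Pk = pk (the pi are the specified values)
• HA: At least one proportion differs from its
specified value.
• Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad e_i = n p_i$$

follows chi-squared distribution with k-1 degrees
of freedom.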


Test of independence


Choice of MBA major


• In an effort to better predict the demand
for courses offered by a certain MBA
program, it was hypothesised that
students’ academic background affects
their choice of MBA major, thus, their
course selection.


Problem context
• A random sample of last year’s MBA students was selected.
• The following contingency table summarizes the relevant
data

• Is there sufficient evidence to infer that undergraduate
degree affects MBA students' course preferences?


Problem context

There are two ways to address the problem:

• If each classification is considered a qualitative variable, are these two
variables dependent?
• If each undergraduate degree is considered a population, do
these populations differ?


Problem context
H0: The two variables are independent
HA: The two variables are dependent

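• The test statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

The sum runs over all cells; rc is the number of cells in
the contingency table (r rows, c columns).

• The rejection region

$\chi^2 > \chi^2_{\alpha,\,df}$, with df = (r-1)(c-1)

Since eij = npij we need to estimate
the unknown probability from the data,
assuming H0 is true.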

Problem context
Totals from the contingency table: BA = 60, BBA = 39, Marketing = 61, Finance = 44, sample size n = 152.

• Under the null hypothesis the two variables are independent:

P(Marketing and BA) = P(Marketing)*P(BA) = [61/152][60/152].

The number of students expected to fall in the cell ‘Marketing – BA’ is
eMarket–BA = npMarket–BA = 152(61/152)(60/152) = [61*60]/152 = 24.08

The number of students expected to fall in the cell ‘Finance – BBA’ is
eFinance–BBA = npFinance–BBA = 152(44/152)(39/152) = [44*39]/152 = 11.29


Problem context

Expected frequency for a contingency table

$$e_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{sample size}}$$
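
The expected-frequency formula can be checked against the two cells worked out earlier. This is a minimal sketch; the function name `expected_cell` is an assumption, and only the totals quoted in the lecture (60, 39, 61, 44 and n = 152) are used.

```python
def expected_cell(row_total, col_total, n):
    """Expected frequency of one cell under independence."""
    return row_total * col_total / n


n = 152  # sample size

# BA total = 60, Marketing total = 61
e_marketing_ba = expected_cell(60, 61, n)

# BBA total = 39, Finance total = 44
e_finance_bba = expected_cell(39, 44, n)

print(f"Marketing - BA : {e_marketing_ba:.2f}")
print(f"Finance - BBA  : {e_finance_bba:.2f}")
```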


Problem context

Each cell is shown as observed frequency followed by the expected frequency, e.g. 31 (24.08), 5 (6.39), 7 (6.80).

Calculation of the χ² statistic

$$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.39)^2}{6.39} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$

The critical value in our example: $\chi^2_{\alpha,\,(r-1)(c-1)} = 12.5916$

Since 14.70 > 12.5916,
there is sufficient evidence
to infer that undergraduate
degree affects MBA students'
course preferences.
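
The whole procedure can be sketched in Python. Several things here are assumptions: the function names are hypothetical, the small 2×2 table is made-up toy data (not the lecture's contingency table) chosen so that the rows are exactly proportional, and the tail probability uses a closed form that holds only for an even number of degrees of freedom. The last lines assume α = 0.05 and 6 degrees of freedom, the combination whose tabulated critical value is 12.5916.

```python
import math


def independence_statistic(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # e_ij under independence
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """P(chi2 > x); this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Made-up toy table with proportional rows: the statistic is exactly zero.
toy_table = [[10, 20], [30, 60]]
print(independence_statistic(toy_table))

# Lecture result: statistic 14.70, assumed df = 6.
p_lecture = chi2_upper_tail_even_df(14.70, 6)
print(f"p-value = {p_lecture:.4f}")
```

A p-value below 0.05 agrees with the decision reached by comparing 14.70 with 12.5916.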


Rule of five
– The χ² distribution provides an adequate
approximation to the sampling distribution under the
condition that eij ≥ 5 for all the cells.
– When eij < 5, rows or columns must be combined such
that the condition is met.

Example
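Observed frequencies, with expected frequencies in parentheses:

Cells before combining      Combined cell
4 (5.1)     14 (12.8)       4 + 14 = 18   (5.1 + 12.8 = 17.9)
7 (6.3)     16 (16)         7 + 16 = 23   (6.3 + 16 = 22.3)
4 (3.6)      8 (9.2)        4 + 8 = 12    (3.6 + 9.2 = 12.8)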
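
The combining step in this example can be sketched as follows. The variable names are assumptions, and the layout (two groups of three cells being merged pairwise) is read off the sums shown in the example.

```python
# Observed and expected frequencies of the cells being merged
observed_small = [4, 7, 4]
expected_small = [5.1, 6.3, 3.6]      # 3.6 violates the rule of five

observed_other = [14, 16, 8]
expected_other = [12.8, 16.0, 9.2]

# Add observed to observed and expected to expected, cell by cell
merged_observed = [a + b for a, b in zip(observed_small, observed_other)]
merged_expected = [round(a + b, 1) for a, b in zip(expected_small, expected_other)]

print(merged_observed)
print(merged_expected)
print("rule of five satisfied:", all(e >= 5 for e in merged_expected))
```

After merging, the degrees of freedom must be recomputed from the reduced number of rows or columns.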

Summary

• Goodness-of-fit test
• Test of independence
• Rule of five
