BES - Lecture 9 - Chi Squared Tests For Qualitative Data

8/25/22
Lecture 9: Chi-squared tests for qualitative data
Objectives
Chi-squared tests for qualitative data:
• Goodness-of-fit test
• Test of independence
Goodness-of-fit test
Market share
• Firms periodically estimate their market
shares and the market shares of competitors.
• Market shares change over time.
• Firms have to test to determine whether the
actual current market shares are in
accordance with their beliefs.
Problem context
– Two competing companies A and B have
conducted aggressive advertising campaigns.
– Market shares before the campaigns were:
• Company A = 45%
• Company B = 40%
• Other competitors = 15%.
Problem context (cont.)
• To study the effects of the campaigns on the market shares, 200 customers were asked to indicate their preference regarding the product advertised.
• Survey results:
– 102 customers preferred Company A’s product
– 82 customers preferred Company B’s product
– 16 customers preferred the other competitors’ products.
• At the 5% level of significance, can we conclude that market shares changed?
The technique
– The population investigated is the brand preferences.
– The data are qualitative (A, B or others).
– This is a multinomial experiment, with three categories.
– The question of interest is: Are p1, p2 and p3 different after the campaign from their values before the campaign?
Multinomial experiment
• The multinomial experiment studied is an extension of the binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into one of k categories, called cells.
– The probability pi of cell i remains constant for each trial. Moreover, p1 + p2 + … + pk = 1.
Problem context (cont.)
– The hypotheses are:
H0: p1 = 0.45, p2 = 0.40, p3 = 0.15
HA: at least one pi is not equal to its specified value.
What sample frequency would you expect for each category if the null hypothesis is true?
• A: 200(0.45) = 90
• B: 200(0.40) = 80
• Others: 200(0.15) = 30
What actual frequencies did the sample return?
• A: 102
• B: 82
• Others: 16
Company   Observed frequency (oi)   Market share (pi)   Expected frequency (ei)
A         102                       0.45                90
B         82                        0.40                80
Others    16                        0.15                30
TOTAL     200                       1                   200
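The expected frequencies in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers from the slides; the variable names (`hypothesized_shares`, `observed`, `expected`) are illustrative choices, not part of the lecture.

```python
# Expected frequencies under H0 for the market-share survey.
n = 200  # number of customers surveyed

# Market shares before the campaigns (the values specified by H0).
hypothesized_shares = {"A": 0.45, "B": 0.40, "Others": 0.15}

# Preferences reported by the sample.
observed = {"A": 102, "B": 82, "Others": 16}

# e_i = n * p_i for each category.
expected = {name: n * p for name, p in hypothesized_shares.items()}

for name in observed:
    print(f"{name:<7} observed = {observed[name]:>3}   expected = {expected[name]:.0f}")
```

Both columns sum to the sample size, which is a quick check that the hypothesized shares add up to 1.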
The technique
• To determine whether the market shares have
changed or not, the best way is to compare
the observed frequency and the expected
one.
• If the differences are small → actual shares are in accord with estimated shares → shares unchanged.
• If the differences are large → actual shares are not in accord with estimated shares → shares changed.
The technique
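Expected frequency: $e_i = n p_i$
The test statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$
follows approximately a chi-squared distribution with (k-1) degrees of freedom.
We reject Ho if the value of the test statistic satisfies $\chi^2 > \chi^2_{\alpha,\,k-1}$.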
Problem context (cont.)
• The rejection region is $\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{0.05,\,2} = 5.99$.
• Value of the test statistic:
$\chi^2 = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 8.18$
Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities pi is different. Thus, at least two market shares have changed.
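The market-share test can be checked numerically. The sketch below uses only the Python standard library and relies on one fact that is specific to 2 degrees of freedom: the chi-squared upper-tail probability is exp(-x/2), so the critical value equals -2 ln(α). For other degrees of freedom a statistical table or library would be needed. Variable names are illustrative.

```python
import math

observed = [102, 82, 16]        # A, B, Others (survey results)
shares_h0 = [0.45, 0.40, 0.15]  # market shares specified by H0
alpha = 0.05

n = sum(observed)
expected = [n * p for p in shares_h0]

# Chi-squared statistic: sum of (o_i - e_i)^2 / e_i over the k cells.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With k - 1 = 2 degrees of freedom, P(X > x) = exp(-x / 2),
# so the critical value is -2 * ln(alpha). This shortcut holds for df = 2 only.
critical = -2 * math.log(alpha)

reject_h0 = chi2 > critical
print(f"chi2 = {chi2:.2f}, critical value = {critical:.2f}, reject H0: {reject_h0}")
```

Most of the statistic comes from the "Others" cell, where 16 customers were observed against 30 expected.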
Rule of five
• The test statistic used to perform the test is
only approximately Chi-squared distributed.
• For the approximation to apply, the expected
cell frequency has to be at least five.
• If the expected frequency in a cell is less than
five, combine it with other cells.
Goodness-of-fit test
A goodness-of-fit test compares an observed set of frequencies to an expected distribution. There are two cases:
• Equal expected frequencies
• Unequal expected frequencies
Equal expected frequencies
Ho : No difference between ....
Ha : There is a difference between ....
OR
• H0: p1 = 1/k, …, pk = 1/k
• HA: At least one proportion differs from its specified value.
• Test statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, with $e_i = n/k$,
follows a chi-squared distribution with k-1 degrees of freedom.
Example
The following data on absenteeism was collected
from a manufacturing plant. At the 0.05 level of
significance, test to determine whether there is a
difference in the absence rate by day of the week.
Solution
Ho : No difference in the absence rate by day of the week
Ha : There is a difference in the absence rate by day of the week
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Level of significance: α = 0.05
Decision rule: reject Ho if $\chi^2 > \chi^2_{0.05,\,4} = 9.488$
Value of the test statistic:
Day     Oi     pi     Ei     (Oi-Ei)^2/Ei
MON     120    0.2    89     10.798
TUE     45     0.2    89     21.753
WED     60     0.2    89     9.449
THU     90     0.2    89     0.011
FRI     130    0.2    89     18.888
TOTAL   445                  60.899
Decision: since 60.899 > 9.488, reject Ho.
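The absenteeism table can be recomputed as follows. This is a sketch with the standard library only. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it uses the closed form of the chi-squared upper-tail probability that is valid when the degrees of freedom are even (here k - 1 = 4), which avoids needing a statistics package.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Valid for even df only: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

absences = {"MON": 120, "TUE": 45, "WED": 60, "THU": 90, "FRI": 130}
n = sum(absences.values())  # 445 absences in total
k = len(absences)           # 5 days
expected = n / k            # equal expected frequency: 89 per day

# Contribution of each day: (O_i - E_i)^2 / E_i
contributions = {day: (o - expected) ** 2 / expected for day, o in absences.items()}
chi2 = sum(contributions.values())
p_value = chi2_sf_even_df(chi2, k - 1)

for day, c in contributions.items():
    print(f"{day}  {c:.3f}")
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.2e}")
```

A p-value below 0.05 leads to the same decision as comparing the statistic with the critical value: reject Ho.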
Unequal expected frequencies
• H0: P1 = p1, …, Pk = pk
• HA: At least one proportion differs from its specified value.
• Test statistic:
$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, with $e_i = n p_i$,
follows a chi-squared distribution with k-1 degrees of freedom.
Test of independence
Choice of MBA major

• In an effort to better predict the demand
for courses offered by a certain MBA
program, it was hypothesised that
students’ academic background affects
their choice of MBA major, thus, their
course selection.
Problem context
• A random sample of last year’s MBA students was selected.
• The following contingency table summarizes the relevant
data
• Is there sufficient evidence to infer that undergraduate degree affects MBA students’ course preferences?
Problem context
There are two ways to address the problem:
• If each classification is considered a qualitative variable, are these two variables dependent?
• If each undergraduate degree is considered a population, do these populations differ?
Problem context
H0: The two variables are independent
HA: The two variables are dependent
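• The test statistic
$\chi^2 = \sum_{i=1}^{rc} \frac{(o_i - e_i)^2}{e_i}$
where rc is the number of cells in the contingency table (r rows, c columns).
• The rejection region
$\chi^2 > \chi^2_{\alpha,\,df}$, with $df = (r-1)(c-1)$
Since eij = npij we need to estimate the unknown probability from the data, assuming H0 is true.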
Problem context
Totals from the contingency table used below: BA = 60, BBA = 39, Marketing = 61, Finance = 44, sample size = 152.
• Under the null hypothesis the two variables are independent:
P(Marketing and BA) = P(Marketing)*P(BA) = [61/152][60/152].
The number of students expected to fall in the cell ‘Marketing – BA’ is
eMarket–BA = npMarket–BA = 152(61/152)(60/152) = [61*60]/152 = 24.08
The number of students expected to fall in the cell ‘Finance – BBA’ is
eFinance–BBA = npFinance–BBA = 152(44/152)(39/152) = [44*39]/152 = 11.29
Problem context
Expected frequency for a contingency table
$e_{ij} = \frac{(\text{column } j \text{ total})(\text{row } i \text{ total})}{\text{sample size}}$
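The expected-frequency rule is easy to apply in code. The sketch below reproduces the two cells worked out in the slides from their row and column totals; the function name `expected_frequency` is an illustrative choice.

```python
def expected_frequency(row_total, column_total, sample_size):
    """Expected cell count under independence: (row total)(column total) / n."""
    return row_total * column_total / sample_size

n = 152  # sample size

# Cell 'Marketing - BA': BA total is 60, Marketing total is 61.
e_marketing_ba = expected_frequency(60, 61, n)

# Cell 'Finance - BBA': BBA total is 39, Finance total is 44.
e_finance_bba = expected_frequency(39, 44, n)

print(f"Marketing - BA: {e_marketing_ba:.2f}")
print(f"Finance - BBA:  {e_finance_bba:.2f}")
```

Applying the same function to every row and column pair fills in the whole table of expected frequencies.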
Problem context
The expected frequency of each cell is shown next to the observed frequency, for example 31 (24.08), 5 (6.39) and 7 (6.80).
Calculation of the χ² statistic
$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.39)^2}{6.39} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$
The critical value in our example: $\chi^2_{0.05,\,6} = 12.5916$
Since 14.70 > 12.5916, there is sufficient evidence to infer that undergraduate degree affects MBA students’ course preferences.
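The comparison of 14.70 with 12.5916 can be verified without a chi-squared table. The sketch below assumes 6 degrees of freedom, which is the value consistent with the quoted critical value at the 5% level; it uses the closed-form upper-tail probability for 6 degrees of freedom and a simple bisection search. The function names are hypothetical, and only the statistic 14.70 is taken from the slides.

```python
import math

def chi2_sf_6df(x):
    """P(X > x) for chi-squared with 6 degrees of freedom (closed form)."""
    h = x / 2
    return math.exp(-h) * (1 + h + h * h / 2)

def critical_value_6df(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x with P(X > x) = alpha by bisection (the tail probability decreases in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_6df(mid) > alpha:
            lo = mid  # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

chi2 = 14.70  # value of the test statistic from the slides
critical = critical_value_6df(0.05)
p_value = chi2_sf_6df(chi2)

print(f"critical value = {critical:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {chi2 > critical}")
```

The p-value route and the critical-value route give the same decision, since the p-value is below 0.05 exactly when the statistic exceeds the critical value.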
Rule of five
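– The χ² distribution provides an adequate approximation to the sampling distribution under the condition that eij ≥ 5 for all the cells.
– When eij < 5, rows or columns must be combined such that the condition is met.
Example (expected frequencies in parentheses): one cell has an expected frequency of 3.6, so its column is combined with another column.
First column   Second column   Combined
4 (5.1)        14 (12.8)       4 + 14 = 18 (5.1 + 12.8 = 17.9)
7 (6.3)        16 (16)         7 + 16 = 23 (6.3 + 16 = 22.3)
4 (3.6)        8 (9.2)         4 + 8 = 12 (3.6 + 9.2 = 12.8)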
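Combining columns to satisfy the rule of five can be sketched as below. Only the two columns being combined in the example are included, and the helper names `violates_rule_of_five` and `combine_columns` are hypothetical. Observed and expected frequencies are merged in the same way, by adding the corresponding cells.

```python
def violates_rule_of_five(expected):
    """True if any expected cell frequency is below 5."""
    return any(e < 5 for row in expected for e in row)

def combine_columns(table, j, k):
    """Return a new table in which columns j and k are replaced by their sum."""
    combined = []
    for row in table:
        kept = [v for idx, v in enumerate(row) if idx not in (j, k)]
        combined.append(kept + [row[j] + row[k]])
    return combined

# The two columns from the example (rows in the order shown on the slide).
observed = [[4, 14], [7, 16], [4, 8]]
expected = [[5.1, 12.8], [6.3, 16.0], [3.6, 9.2]]

print("before:", violates_rule_of_five(expected))  # the 3.6 cell is too small

observed_combined = combine_columns(observed, 0, 1)
expected_combined = combine_columns(expected, 0, 1)

print("after: ", violates_rule_of_five(expected_combined))
print(observed_combined)
print([[round(e, 1) for e in row] for row in expected_combined])
```

After combining, the number of columns c is smaller, so the degrees of freedom (r-1)(c-1) must be recomputed before looking up the critical value.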
Summary
• Goodness-of-fit test
• Test of independence
• Rule of five
