Statistics TA CHP 12 Multiple Proportions, Test of Independence, Goodness of Fit-2

Chp. 12
Statistics: Comparing Multiple Proportions, Test of Independence, Goodness of Fit
Professor: Lee Yung Hsin 李永新
138332@mail.tku.edu.tw
TA: jenniferlimerthaa@gmail.com
MS Teams : 409595526@o365.tku.edu.tw
12.1 - TESTING THE EQUALITY OF POPULATION PROPORTIONS FOR 3 OR MORE POPULATIONS
* Last semester (Chp 10): statistical inference for population proportions with 2 populations, using the Z table
** This Chp: statistical inference for population proportions with 3 or more populations, using the chi-square table
The hypotheses are:
● H0: p1 = p2 = ... = pk
● Ha: not all population proportions are equal
● If H0 cannot be rejected = we cannot detect a difference among the k population proportions
● If H0 can be rejected = not all k population proportions are equal
EXAMPLE
Q: Organizations such as J.D. Power and Associates use the proportion of owners likely to repurchase a
particular automobile as an indication of customer loyalty for the automobile. An automobile with a greater
proportion of owners likely to repurchase is concluded to have greater customer loyalty. Suppose that in a
particular study we want to compare the customer loyalty for three automobiles: Chevrolet Impala, Ford
Fusion, and Honda Accord. The current owners of each of the three automobiles form the three populations for
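the study. The three population proportions of interest are as follows:
p1 = proportion likely to repurchase, for the population of Chevrolet Impala owners
p2 = proportion likely to repurchase, for the population of Ford Fusion owners
p3 = proportion likely to repurchase, for the population of Honda Accord owners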
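The data for samples of 125 Chevrolet Impala owners, 200 Ford Fusion owners, and 175 Honda Accord owners
are summarized:
Likely to repurchase | Chevrolet Impala | Ford Fusion | Honda Accord | Total
Yes                  | 69               | 120         | 123          | 312
No                   | 56               | 80          | 52           | 188
Total                | 125              | 200         | 175          | 500
(69, 52 and the totals are the values used in the calculations in these notes; the remaining cells follow from those totals.)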
HOW TO DETERMINE IF H0 CAN BE REJECTED?
(1) How to find Expected Frequencies:
From the observed frequencies, we need to first find the Expected Frequencies under the assumption that H0 is true:
expected frequency = (row total x column total) / total sample size
● 69 -> 78, how?
(312 x 125)/500 = 78
● 52 -> 65.8, how?
(188 x 175)/500 = 65.8
If there is a significant difference between Observed & Expected Frequencies (using chi square table) = H0 can be rejected
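The expected-frequency rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `expected_frequencies` is made up for this sketch, and only 69, 52 and the totals are quoted in the notes; the other four cell counts are derived from those totals.

```python
# Observed counts: rows = likely to repurchase (Yes, No),
# columns = Chevrolet Impala, Ford Fusion, Honda Accord.
# Assumption: 120, 123, 56 and 80 are derived from the quoted totals
# (312, 188, 125, 200, 175), e.g. 123 = 175 - 52.
observed = [
    [69, 120, 123],  # Yes
    [56, 80, 52],    # No
]


def expected_frequencies(table):
    """Expected count per cell under H0: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(("Yes", "No"), expected):
    print(label, [round(e, 1) for e in row])
```

The first cell reproduces 69 -> 78 and the last cell reproduces 52 -> 65.8.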
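(2) Find the chi-square Test Statistic:
X² = sum over all cells of (observed frequency - expected frequency)² / expected frequency
For this example, X² = 7.89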
(3) Critical Value Rejection Rule:
● df = k - 1 = 3 - 1 = 2
● Find X²α based on α & df, using the chi-square distribution table (α = 0.05, df = 2)
● ( X² = 7.89 ) > ( X²α = 5.991), so we reject H0
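As a check on steps (2) and (3), the sketch below recomputes the test statistic and applies the critical-value rule. It is an illustration under assumptions: `chi_square_statistic` is a hypothetical helper name, the cell counts other than 69 and 52 are derived from the totals quoted in the notes, and the critical value 5.991 is simply copied from the table lookup above.

```python
# Observed counts (rows: Yes/No; columns: Impala, Fusion, Accord).
# Assumption: cells other than 69 and 52 are derived from the quoted totals.
observed = [
    [69, 120, 123],
    [56, 80, 52],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n
            stat += (f_obs - f_exp) ** 2 / f_exp
    return stat


chi2 = chi_square_statistic(observed)
df = len(observed[0]) - 1      # k - 1 with k = 3 populations
critical_value = 5.991         # table value for alpha = 0.05, df = 2
reject_h0 = chi2 > critical_value
print(round(chi2, 2), df, reject_h0)
```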
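(4) P Value Rejection Rule:
● Estimate the p-value from the chi-square distribution table (df = 2): X² = 7.89 lies between 7.378 (upper-tail area 0.025) and 9.210 (upper-tail area 0.01)
● (0.01 < p-value < 0.025) < (α = 0.05), so we reject H0.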
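The table only brackets the p-value, but for df = 2 the upper-tail area of the chi-square distribution has the closed form exp(-x/2), so the bracket can be checked directly. The sketch below relies on that special case only; for other degrees of freedom a table or a statistics library would be needed. The function name is an assumption of this sketch.

```python
import math


def chi_square_upper_tail_df2(x):
    """Upper-tail area P(X² >= x) for a chi-square distribution with df = 2.

    Valid only for df = 2, where the tail area equals exp(-x / 2).
    """
    return math.exp(-x / 2)


alpha = 0.05
p_value = chi_square_upper_tail_df2(7.89)

print(round(p_value, 4))
print(0.01 < p_value < 0.025)   # agrees with the table bracket
print(p_value < alpha)          # p-value rule: reject H0
```

The computed value is roughly 0.019, which sits inside the bracket read from the table.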
At the 5% significance level, there is enough evidence to suggest that the 3 population proportions are not
all equal, and thus there is a difference in brand loyalties among Chevrolet Impala, Ford Fusion and
Honda Accord owners.
Rejecting H0 tells us that the population proportions for the 3 populations are not all equal,
but it doesn't tell us the details (which proportions are significantly different).
-> To identify where the differences between the proportions exist, we use a Multiple Comparison Procedure.
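MULTIPLE COMPARISON PROCEDURE
(1) Calculate the 3 sample proportions (number likely to repurchase / sample size), e.g. Chevrolet Impala: 69/125 = 0.552
(2) Marascuilo procedure: Find the absolute value of the difference between each pair of sample proportions (pairs 1&2, 1&3, 2&3)
(3) Find the critical value for each pair:
CV(i,j) = sqrt(X²α) x sqrt( pi(1 - pi)/ni + pj(1 - pj)/nj ), where pi and ni are the sample proportion and sample size for population i
If the absolute value of any pairwise sample proportion difference exceeds its corresponding critical value = the two proportions are significantly different
In this example, the only significant difference in customer loyalty occurs between the Chevrolet Impala and the Honda Accord.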
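The three steps of the Marascuilo procedure can be sketched as follows. This is an illustrative sketch: the `marascuilo` function and the `samples` layout are assumptions of this example, the "yes" counts for Ford Fusion (120) and Honda Accord (123) are derived from the totals used earlier in the notes, and the critical value for each pair is taken as sqrt(X²α) times the square root of the sum of the two estimated variances pi(1 - pi)/ni.

```python
import math
from itertools import combinations

# (number likely to repurchase, sample size) per brand.
# Assumption: 120 and 123 are derived from the totals quoted in the notes.
samples = {
    "Chevrolet Impala": (69, 125),
    "Ford Fusion": (120, 200),
    "Honda Accord": (123, 175),
}
chi2_alpha = 5.991  # chi-square table value for alpha = 0.05, df = 2


def marascuilo(samples, chi2_alpha):
    """Return (name_i, name_j, |p_i - p_j|, critical value, significant?) per pair."""
    results = []
    for (name_i, (x_i, n_i)), (name_j, (x_j, n_j)) in combinations(samples.items(), 2):
        p_i, p_j = x_i / n_i, x_j / n_j
        diff = abs(p_i - p_j)
        cv = math.sqrt(chi2_alpha) * math.sqrt(
            p_i * (1 - p_i) / n_i + p_j * (1 - p_j) / n_j
        )
        results.append((name_i, name_j, diff, cv, diff > cv))
    return results


results = marascuilo(samples, chi2_alpha)
for name_i, name_j, diff, cv, sig in results:
    print(f"{name_i} vs {name_j}: diff={diff:.4f}, CV={cv:.4f}, significant={sig}")

significant = [(a, b) for a, b, _, _, sig in results if sig]
```

Only the Impala vs Accord difference exceeds its critical value, which matches the conclusion stated in the notes.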
12.2 - TEST OF INDEPENDENCE
● The test of independence allows us to test if two categorical variables are
independent (not related) or dependent (related).
● It can only show if a relationship exists between two variables, but the test does
not show if one variable causes changes in the other variable.
EXAMPLE
Q: A beer industry association conducts a survey to determine the preferences of beer drinkers for light,
regular, and dark beers. A sample of 200 beer drinkers is taken with each person in the sample asked to
indicate a preference for one of the three types of beers: light, regular, or dark. At the end of the survey
questionnaire, the respondent is asked to provide information on a variety of demographics including gender:
male or female. A research question of interest to the association is whether preference for the three types of
beer is independent of the gender of the beer drinker.
(1) Hypotheses:
● H0 : the two variables are independent
● Ha: the two variables are dependent

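(2) From the observed frequencies, find the Expected Frequencies under the assumption that H0 is true:
expected frequency = (row total x column total) / total sample size
● 51 -> 59.4, how?
(90 x 132)/200 = 59.4
● 8 -> 11.22, how?
(33 x 68)/200 = 11.22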
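The same row-total-times-column-total rule gives every expected frequency for the beer survey. In this sketch only 51, 8 and the totals 90, 33, 132, 68 and 200 come from the notes; the remaining observed counts are derived from those totals, and the layout (beer type as rows, gender as columns) is an assumption chosen to match the df calculation in the notes.

```python
# Rows: Light, Regular, Dark; columns: Male, Female.
# Assumption: cells other than 51 and 8 are derived from the quoted totals,
# e.g. Light/Female = 90 - 51 = 39 and Dark/Male = 33 - 8 = 25.
beer_types = ("Light", "Regular", "Dark")
observed = [
    [51, 39],
    [56, 21],
    [25, 8],
]

row_totals = [sum(row) for row in observed]          # 90, 77, 33
col_totals = [sum(col) for col in zip(*observed)]    # 132, 68
n = sum(row_totals)                                  # 200

# Expected count under independence for each (beer type, gender) cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for beer, row in zip(beer_types, expected):
    print(beer, [round(e, 2) for e in row])
```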
(3) Critical Value Rejection Rule:
● df = (rows - 1)(columns - 1)
With r rows and c columns in the table, the chi-square distribution will have (r - 1)(c - 1) degrees of freedom
provided the expected frequency is at least 5 for each cell.
df = (3 - 1)(2 - 1) = 2 degrees of freedom
● ( X² = 6.45 ) > ( X²α = 5.991), so we reject H0
(4) P Value Rejection Rule:
● Find estimation of the p-value from chi dist. table:
● (0.025 < p-value < 0.05) < (α = 0.05), so we reject H0.
At the 5% significance level, there is enough evidence to conclude that the 2 variables are not independent:
beer preference depends on the gender of the beer drinker.
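The whole test of independence can be sketched end to end. As before, this is an illustration under assumptions: the observed cells other than 51 and 8 are derived from the totals quoted in the notes, `independence_test` is a made-up helper name, and the p-value uses the closed form exp(-x/2), which is valid only because df = 2 here.

```python
import math

# Rows: Light, Regular, Dark; columns: Male, Female.
# Assumption: cells other than 51 and 8 are derived from the quoted totals.
observed = [
    [51, 39],
    [56, 21],
    [25, 8],
]


def independence_test(table):
    """Return (chi-square statistic, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, f_exp)
            stat += (f_obs - f_exp) ** 2 / f_exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected


chi2, df, min_expected = independence_test(observed)
p_value = math.exp(-chi2 / 2)   # upper-tail area; valid for df = 2 only

print(round(chi2, 2), df)
print(min_expected >= 5)        # expected-frequency condition for the test
print(chi2 > 5.991, p_value < 0.05)
```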
12.3 - GOODNESS OF FIT TEST (MULTINOMIAL PROBABILITY DISTRIBUTION)
● The goodness of fit test determines how well sample data fit a hypothesized population
distribution (here, a multinomial distribution). Does the set of observed values match the
expected values under the applicable model?
● It allows us to test if the sample data from a categorical variable fits the pattern of
expected probabilities for the variable. Does a new sample from the population
support the assumed probability distribution or does the sample indicate that there
has been a change in the probability distribution?
EXAMPLE
Q: Over the past year, market shares for a certain product have stabilized at 30% for company A, 50% for company B, and
20% for company C. Since each customer is classified as buying from one of these companies, we have a multinomial
probability distribution with three possible outcomes. The probability for each of the three outcomes is :
pA = probability a customer purchases the company A product
pB = probability a customer purchases the company B product
pC = probability a customer purchases the company C product
The sum of the probabilities for a multinomial probability distribution equals 1.
Using the historical market shares, we have a multinomial probability distribution with pA = 0.30, pB = 0.50, and pC = 0.20.
Company C plans to introduce a “new and improved” product to replace its current entry in the market. Company C has
retained Scott Marketing Research to determine whether the new product will alter or change the market shares for the
three companies. Specifically, the Scott Marketing Research study will introduce a sample of customers to the new company
C product and then ask the customers to indicate a preference for the company A product, the company B product, or the new
company C product.
(1) Hypotheses:
● H0: the population follows a multinomial dist. with specified probabilities for each of the k categories (here pA = 0.30, pB = 0.50, pC = 0.20)
● Ha: the population does not follow a multinomial dist. with specified probabilities for each of the k categories
The market research firm has used a consumer panel of 200 customers. Each
customer was asked to specify a purchase preference among the 3 options:
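(2) How to find Expected Frequencies:
Expected Frequency = Sample Size (200) x category probability
● Company A: 200 x 0.30 = 60
● Company B: 200 x 0.50 = 100
● Company C: 200 x 0.20 = 40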
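(3) Critical Value Rejection Rule:
● df = k - 1 = 3 - 1 = 2
● ( X² = 7.34 ) > ( X²α = 5.991), so we reject H0
(4) P Value Rejection Rule:
● Find estimation of the p-value from chi dist. table:
● (0.025 < p-value < 0.05) < (α = 0.05), so we reject H0.
At the 5% significance level, the sample indicates that the market shares differ from pA = 0.30, pB = 0.50, and pC = 0.20.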
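A goodness of fit calculation along these lines might look like the sketch below. Important assumption: the observed preference counts of the 200 panel members are not given in these notes, so the counts used here (48, 98, 54) are illustrative values that sum to 200 and reproduce the stated X² = 7.34; they should be replaced with the actual survey counts. The p-value again uses exp(-x/2), which holds only for df = 2.

```python
import math

# Hypothesized market shares under H0 (from the notes).
probabilities = {"A": 0.30, "B": 0.50, "C": 0.20}
sample_size = 200

# ASSUMPTION: observed counts are not in the notes; these are illustrative
# values chosen to sum to 200 and to reproduce the stated X² = 7.34.
observed = {"A": 48, "B": 98, "C": 54}

# Expected frequency = sample size x category probability.
expected = {k: sample_size * p for k, p in probabilities.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in probabilities)

df = len(probabilities) - 1          # k - 1 = 2
p_value = math.exp(-chi2 / 2)        # upper-tail area; valid for df = 2 only

print({k: round(e, 1) for k, e in expected.items()})
print(round(chi2, 2), df)
print(chi2 > 5.991, p_value < 0.05)  # both rules reject H0
```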
