
Chp. 12

Statistics: Comparing Multiple Proportions, Test of Independence, Goodness of Fit
Professor: Lee Yung Hsin 李永新
138332@mail.tku.edu.tw

TA: jenniferlimerthaa@gmail.com
MS Teams : 409595526@o365.tku.edu.tw
12.1 - TESTING THE EQUALITY OF POPULATION PROPORTIONS
FOR 3 OR MORE POPULATIONS

* Last semester (Chp. 10): statistical inference for population proportions with 2 populations, using the Z table
** This chapter: statistical inference for population proportions with 3 or more populations, using the chi-square table

Hypotheses:

● H0: p1 = p2 = ... = pk
● Ha: not all population proportions are equal

● If H0 cannot be rejected: we cannot detect a difference among the k population proportions
(the data are consistent with p1 = p2 = ... = pk)
● If H0 is rejected: not all k population proportions are equal
EXAMPLE
Q: Organizations such as J.D. Power and Associates use the proportion of owners likely to repurchase a
particular automobile as an indication of customer loyalty for the automobile. An automobile with a greater
proportion of owners likely to repurchase is concluded to have greater customer loyalty. Suppose that in a
particular study we want to compare the customer loyalty for three automobiles: Chevrolet Impala, Ford
Fusion, and Honda Accord. The current owners of each of the three automobiles form the three populations for
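the study. The three population proportions of interest are as follows:

● p1 = proportion likely to repurchase among all Chevrolet Impala owners
● p2 = proportion likely to repurchase among all Ford Fusion owners
● p3 = proportion likely to repurchase among all Honda Accord owners
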
The data for samples of 125 Chevrolet Impala owners, 200 Ford Fusion owners, and 175 Honda Accord owners
are summarized:
HOW TO DETERMINE IF H0 CAN BE REJECTED?

How to find Expected Frequencies:

(1) From the observed frequencies, first find the Expected Frequencies under the assumption that H0 is true:
Expected frequency = (row total x column total) / total sample size

● 69 -> 78, how?
(312 x 125)/500 = 78

● 52 -> 65.8, how?
(188 x 175)/500 = 65.8

If there is a significant difference between the Observed & Expected Frequencies (judged using the chi-square table), H0 can be rejected.
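The expected-frequency arithmetic in step (1) can be sketched in Python. The observed table is not reproduced in these notes, so the counts below are reconstructed from the figures that are quoted (row totals 312 and 188, sample sizes 125, 200 and 175, observed cells 69 and 52). Treat them as an assumption. The name `expected_frequencies` is only illustrative.

```python
# Observed counts, reconstructed from the totals quoted in these notes
# (an assumption, since the original table is not shown).
# Rows: likely to repurchase (yes, no); columns: Impala, Fusion, Accord.
observed = [
    [69, 120, 123],
    [56, 80, 52],
]


def expected_frequencies(table):
    """Expected count per cell if H0 is true:
    (row total x column total) / total sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The first cell reproduces the 78 worked out above, and the last cell reproduces the 65.8.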
(2) Find the chi-square Test Statistic:
X² = sum over all cells of (observed - expected)² / expected
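A minimal sketch of step (2), assuming the same reconstructed observed counts (the original table is not shown in these notes). The function name `chi_square_statistic` is hypothetical.

```python
# Reconstructed observed counts (assumption): rows yes/no,
# columns Impala, Fusion, Accord.
observed = [
    [69, 120, 123],
    [56, 80, 52],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_exp = row_totals[i] * col_totals[j] / n
            chi2 += (f_obs - f_exp) ** 2 / f_exp
    return chi2


chi2 = chi_square_statistic(observed)
print(round(chi2, 2))  # 7.89
```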
(3) Critical Value Rejection Rule:

● df = k - 1 = 3 - 1 = 2

● Find X²α based on α & df, using the chi-square distribution table (α = 0.05, df = 2)

● (X² = 7.89) > (X²α = 5.991), so we reject H0
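The table lookup in step (3) can be cross-checked without a table in the special case df = 2, where the upper-tail area of the chi-square distribution is exp(-x/2). The sketch below relies on that closed form, so it does not apply to other degrees of freedom. The function name is illustrative.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-tail critical value for chi-square with df = 2 only.
    For df = 2 the upper-tail area is exp(-x / 2), so x = -2 * ln(alpha)."""
    return -2.0 * math.log(alpha)


k = 3            # number of populations
df = k - 1
alpha = 0.05
chi2 = 7.89      # test statistic from step (2)

critical = chi2_critical_df2(alpha)
reject_h0 = chi2 > critical
print(df, round(critical, 3), reject_h0)  # 2 5.991 True
```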
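(4) P Value Rejection Rule:

● Estimate the p-value from the chi-square distribution table (df = 2): X² = 7.89 lies between 7.378 (upper-tail area 0.025) and 9.210 (upper-tail area 0.01)

● (0.01 < p-value < 0.025) < (α = 0.05), so we reject H0.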
At the 5% significance level, there is enough evidence to suggest that the 3 population proportions are not
all equal, and thus there is a difference in brand loyalties among Chevrolet Impala, Ford Fusion and
Honda Accord owners.
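The p-value bounds in step (4) can be checked directly. As a sketch that is valid only for df = 2, the p-value is the upper-tail area exp(-X²/2).

```python
import math

chi2 = 7.89   # test statistic from step (2)
alpha = 0.05

# Upper-tail area of the chi-square distribution with df = 2 (closed form).
p_value = math.exp(-chi2 / 2)

print(round(p_value, 3))             # about 0.019
print(0.01 < p_value < 0.025)        # matches the bounds read from the table
print(p_value < alpha)               # True, so H0 is rejected
```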

This tells us that the population proportions for the 3 populations are not all equal,
but it doesn't tell us which proportions differ significantly.
-> To identify where the differences between the proportions exist, we use a Multiple Comparison Procedure
MULTIPLE COMPARISON PROCEDURE

(1) Calculate the 3 sample proportions:

(2) Marascuilo procedure: Find the absolute value of the difference between the sample proportions for each pair (pairs 1&2, 1&3, 2&3)
(3) Find the critical value for each pair:
CVij = sqrt(X²α) x sqrt( pi(1 - pi)/ni + pj(1 - pj)/nj ), using the sample proportions pi, pj and sample sizes ni, nj

If the absolute value of any pairwise sample proportion difference exceeds its corresponding critical value, the two population proportions are significantly different.

The only significant difference in customer loyalty occurs between the Chevrolet Impala and the Honda Accord.
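The three steps of the Marascuilo procedure can be sketched as follows. The sample sizes and the Impala count of 69 come from these notes. The "likely to repurchase" counts for the Fusion (120) and Accord (123) are reconstructed from the quoted totals and should be read as an assumption.

```python
import math
from itertools import combinations

# (likely to repurchase, sample size); 120 and 123 are reconstructed (assumption).
samples = {
    "Chevrolet Impala": (69, 125),
    "Ford Fusion": (120, 200),
    "Honda Accord": (123, 175),
}
chi2_alpha = 5.991  # table value for alpha = 0.05, df = 2

# Step (1): sample proportions
proportions = {name: yes / n for name, (yes, n) in samples.items()}

significant = []
for a, b in combinations(samples, 2):
    pa, pb = proportions[a], proportions[b]
    na, nb = samples[a][1], samples[b][1]
    # Step (2): absolute pairwise difference
    diff = abs(pa - pb)
    # Step (3): pair-specific critical value
    critical = math.sqrt(chi2_alpha) * math.sqrt(
        pa * (1 - pa) / na + pb * (1 - pb) / nb
    )
    if diff > critical:
        significant.append((a, b))
    print(f"{a} vs {b}: |diff| = {diff:.4f}, CV = {critical:.4f}")

print(significant)
```

Each pair gets its own critical value because the sample sizes and sample proportions differ from pair to pair.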
12.2 - TEST OF INDEPENDENCE
● The test of independence allows us to test if two categorical variables are
independent (not related) or dependent (related).
● It can only show if a relationship exists between two variables, but the test does
not show if one variable causes changes in the other variable.
EXAMPLE
Q: A beer industry association conducts a survey to determine the preferences of beer drinkers for light,
regular, and dark beers. A sample of 200 beer drinkers is taken with each person in the sample asked to
indicate a preference for one of the three types of beers: light, regular, or dark. At the end of the survey
questionnaire, the respondent is asked to provide information on a variety of demographics including gender:
male or female. A research question of interest to the association is whether preference for the three types of
beer is independent of the gender of the beer drinker.

(1) Hypotheses:
● H0 : the two variables are independent
● Ha: the two variables are dependent
How to find Expected Frequencies:

(2) From the observed frequencies, find the Expected Frequencies:

● 51 -> 59.4, how?
(90 x 132)/200 = 59.4

● 8 -> 11.22, how?
(33 x 68)/200 = 11.22
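Why (row total x column total)/n works: if H0 is true, P(beer type and gender) = P(beer type) x P(gender), so the expected count is n times the product of the two marginal shares. The sketch below assumes an observed table reconstructed from the totals quoted above (90, 33, 132, 68, n = 200) and the cells 51 and 8. The table itself is not shown in these notes.

```python
# Reconstructed observed counts (assumption).
# Rows: light, regular, dark; columns: male, female.
observed = [
    [51, 39],
    [56, 21],
    [25, 8],
]

n = sum(sum(row) for row in observed)
row_share = [sum(row) / n for row in observed]        # P(beer type)
col_share = [sum(col) / n for col in zip(*observed)]  # P(gender)

# Under independence: expected count = n * P(beer type) * P(gender)
expected = [[n * r * c for c in col_share] for r in row_share]

for row in expected:
    print([round(e, 2) for e in row])
```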
(3) Find the chi-square Test Statistic
(4) Critical Value Rejection Rule:

● df = (rows - 1)(columns - 1)

With r rows and c columns in the table, the chi-square distribution will have (r – 1)(c – 1) degrees of freedom,
provided the expected frequency is at least 5 for each cell.
df = (3 – 1)(2 – 1) = 2 degrees of freedom

● Find X²α based on α & df, using the chi-square distribution table (α = 0.05, df = 2)

● (X² = 6.45) > (X²α = 5.991), so we reject H0
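Steps (2) to (4) of the test of independence can be put together in one sketch. It assumes the same reconstructed beer-preference table (not shown in these notes) and takes the critical value as an input read from the chi-square table. The function name `independence_test` is hypothetical.

```python
# Reconstructed observed counts (assumption).
# Rows: light, regular, dark; columns: male, female.
observed = [
    [51, 39],
    [56, 21],
    [25, 8],
]


def independence_test(table, critical_value):
    """Return (chi2, df, all expected >= 5, reject H0?) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    expected_ok = min(e for row in expected for e in row) >= 5
    return chi2, df, expected_ok, chi2 > critical_value


chi2, df, expected_ok, reject_h0 = independence_test(observed, critical_value=5.991)
print(round(chi2, 2), df, expected_ok, reject_h0)  # 6.45 2 True True
```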
(5) P Value Rejection Rule:

● Estimate the p-value from the chi-square distribution table:

● (0.025 < p-value < 0.05) < (α = 0.05), so we reject H0.
At the 5% significance level, there is enough evidence to conclude that beer preference and gender are not
independent of each other.
12.3 - GOODNESS OF FIT TEST (MULTINOMIAL PROBABILITY DISTRIBUTION)

● The goodness of fit test determines how well sample data fit a hypothesized probability
distribution (in this section, a multinomial distribution). Does the set of observed values
match the values expected under the hypothesized model?

● It allows us to test if the sample data from a categorical variable fits the pattern of
expected probabilities for the variable. Does a new sample from the population
support the assumed probability distribution or does the sample indicate that there
has been a change in the probability distribution?
EXAMPLE
Q: Over the past year, market shares for a certain product have stabilized at 30% for company A, 50% for company B, and
20% for company C. Since each customer is classified as buying from one of these companies, we have a multinomial
probability distribution with three possible outcomes. The probabilities of the three outcomes are:

pA = probability a customer purchases the company A product
pB = probability a customer purchases the company B product
pC = probability a customer purchases the company C product

The sum of the probabilities for a multinomial probability distribution equals 1.

Using the historical market shares, we have a multinomial probability distribution with pA = 0.30, pB = 0.50, and pC = 0.20.
Company C plans to introduce a “new and improved” product to replace its current entry in the market. Company C has
retained Scott Marketing Research to determine whether the new product will alter or change the market shares for the
three companies. Specifically, the Scott Marketing Research study will introduce a sample of customers to the new company
C product and then ask the customers to indicate a preference for the company A product, the company B product, or the new
company C product.

(1) Hypotheses:
● H0: the population follows a multinomial distribution with the specified probabilities
for each of the k categories (here pA = 0.30, pB = 0.50, pC = 0.20)

● Ha: the population does not follow a multinomial distribution with the specified
probabilities for each of the k categories
The market research firm has used a consumer panel of 200 customers. Each
customer was asked to specify a purchase preference among the 3 options:

How to find Expected Frequencies:

(2) From the observed frequencies, find the Expected Frequencies:
Expected frequency = Sample Size (200) x category probability
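For the goodness of fit test the expected frequencies come straight from the H0 probabilities. This short sketch uses only the figures given above (n = 200 and the historical market shares).

```python
sample_size = 200
h0_probabilities = {"A": 0.30, "B": 0.50, "C": 0.20}

# The category probabilities of a multinomial distribution must sum to 1.
total_probability = sum(h0_probabilities.values())

# Expected frequency = sample size x category probability
expected = {
    company: round(sample_size * p, 2)
    for company, p in h0_probabilities.items()
}
print(expected)
```

Every expected frequency is at least 5, so the chi-square approximation is appropriate here.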
(3) Find the chi-square Test Statistic
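A sketch of the test statistic for this example. The panel's observed counts are not reproduced in these notes. The values 48, 98 and 54 below are assumed for illustration, chosen because they sum to 200 and reproduce the X² = 7.34 quoted in step (4). The function name is hypothetical.

```python
sample_size = 200
h0_probabilities = [0.30, 0.50, 0.20]  # companies A, B, C (from the notes)

# Assumed observed counts for A, B, C: not given in these notes.
observed = [48, 98, 54]


def goodness_of_fit_statistic(observed_counts, probabilities):
    """Sum of (observed - expected)^2 / expected over the k categories."""
    n = sum(observed_counts)
    chi2 = 0.0
    for f_obs, p in zip(observed_counts, probabilities):
        f_exp = n * p
        chi2 += (f_obs - f_exp) ** 2 / f_exp
    return chi2


chi2 = goodness_of_fit_statistic(observed, h0_probabilities)
print(round(chi2, 2))  # 7.34
```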
(4) Critical Value Rejection Rule:

● df = k - 1 = 3 - 1 = 2

● Find X²α based on α & df, using the chi-square distribution table (α = 0.05, df = 2)

● (X² = 7.34) > (X²α = 5.991), so we reject H0
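(5) P Value Rejection Rule:

● Estimate the p-value from the chi-square distribution table:

● (0.025 < p-value < 0.05) < (α = 0.05), so we reject H0.

At the 5% significance level, there is enough evidence to conclude that the market shares no longer follow the
historical pA = 0.30, pB = 0.50, pC = 0.20, so the new company C product is expected to change the market shares.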
