Goodness of Fit Tests

QUANTITATIVE ANALYSIS FOR MANAGEMENT – II
Measures of Association, Goodness-of-Fit Tests and Contingency Tables
Measures of Association
 Measures of association provide a means of summarizing the size
of the association between two variables.
 There are two main classes of measure of association: symmetric
and asymmetric.
 Symmetric measures will be the same if the roles of X and Y are
reversed. In other words, it does not matter which variable is
viewed as the independent variable (X) and which is viewed as the
dependent variable (Y).
 Asymmetric measures will be different if the roles of X and Y are
reversed. In other words, which variable is viewed as the
dependent variable (Y) matters.
How to Interpret Levels of Association
 Perfect positive relationship between variables: +1.0
 Perfect negative relationship between variables: -1.0
 No relationship between variables: 0
 The closer to 0, the weaker the relationship
 The closer to ±1, the stronger the relationship
 There is no universal scale to determine if a relationship is strong
or weak.
Types of Measures of Association
 Chi-square based measures
 Specific measures for ANOVA
Goodness-of-Fit Test: Specified Probabilities
 The most straightforward test of this type is illustrated with a study
that observed a random sample of 300 subjects purchasing a soft
drink. Of these subjects, 75 selected brand A, 110 selected brand
B, and the remaining 115 selected brand C.
 More generally, consider a random sample of $n$ observations that
can be classified according to $K$ categories. Denote the numbers of
observations falling into each category by $O_1, O_2, \dots, O_K$.
 The null hypothesis $H_0$ might be that a randomly chosen
subject is equally likely to select any of the three
brands.
 This null hypothesis, then, specifies that the probability is 1/3
that a sample observation falls into each of the three
categories. To test this hypothesis, it is natural to compare
the sample numbers observed with what would be expected
if the null hypothesis were true.
 A random sample of $n$ observations, each of which can be
classified into exactly one of $K$ categories, is selected.
 Suppose the observed numbers in each category are $O_1, O_2, \dots, O_K$.
 The null hypothesis specifies probabilities $P_1, P_2, \dots, P_K$ for an observation falling
into each of these categories, where
$$P_1 + P_2 + \cdots + P_K = 1$$
 The expected numbers in each category, under the null hypothesis,
will be as follows:
$$E_i = n P_i \quad \text{for } i = 1, 2, \dots, K$$
 If the null hypothesis is true and the sample size is large enough
that the expected values are at least 5, then the random variable
associated with
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
is known as a chi-square random variable and has, to a good
approximation, a chi-square distribution with (K - 1) degrees of
freedom.
 The null hypothesis is that the probabilities are the same for the
three categories. With $n = 300$, the expected number in each category is
$E_i = 300 \times 1/3 = 100$, and the observed numbers are 75, 110, and 115, so
$$\chi^2 = \frac{(75 - 100)^2}{100} + \frac{(110 - 100)^2}{100} + \frac{(115 - 100)^2}{100} = 9.5$$
 There are $K = 3$ categories, so $K - 1 = 2$ degrees of freedom
are associated with the chi-square distribution. At the $\alpha = 0.01$
significance level the tabulated value is $\chi^2_{2,0.01} = 9.210$.
 Because 9.5 exceeds 9.210, the decision rule rejects the null
hypothesis at the 1% significance level. These data contain strong
evidence against the hypothesis that a randomly chosen subject is
equally likely to select any of the three soft drink brands.
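The soft drink calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the figures given above; the function name `chi_square_statistic` and the variable names are illustrative choices, not part of the original notes.

```python
# Soft drink example: observed purchases for brands A, B, C (n = 300).
observed = [75, 110, 300 - 75 - 110]      # brand C is the remainder, 115
probs = [1 / 3, 1 / 3, 1 / 3]             # H0: each brand equally likely


def chi_square_statistic(observed, probs):
    """Sum of (O_i - E_i)^2 / E_i, where E_i = n * P_i."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square_statistic(observed, probs)
critical_value = 9.210                    # tabulated chi-square, 2 d.f., alpha = 0.01
reject_h0 = chi2 > critical_value
print(round(chi2, 3), reject_h0)
```

The statistic comes out at about 9.5, which exceeds the tabulated 9.210, so the sketch agrees with the decision to reject the null hypothesis at the 1% level.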
Testing Market Shares
 Company A has recently conducted aggressive advertising
campaigns to maintain and possibly increase its share of the
market (currently 45%) for fabric softener. Its main competitor,
company B, has 40% of the market, and a number of other
competitors account for the remaining 15%. To determine whether
the market shares changed after the advertising campaign, the
marketing manager for company A solicited the preferences of a
random sample of 200 customers of fabric softener. Of the 200
customers, 102 indicated a preference for company A’s product, 82
preferred company B’s fabric softener, and the remaining 16
preferred the products of one of the competitors. Can the analyst
infer at the 5% significance level that customer preferences have
changed from their levels before the advertising campaigns were
launched?
 Under the null hypothesis the market shares are unchanged (45%, 40%,
and 15%), so with $n = 200$ the expected frequencies are 90, 80, and 30.
 When the null hypothesis is true, the observed and expected
frequencies should be similar, in which case the test statistic will be
small. Thus, a small test statistic supports the null hypothesis. If the
null hypothesis is untrue, some of the observed and expected
frequencies will differ and the test statistic will be large.
 Consequently, we want to reject the null hypothesis when $\chi^2$ is
greater than $\chi^2_{K-1,\alpha}$. In other words, the rejection region is
$$\chi^2 > \chi^2_{K-1,\alpha} = \chi^2_{2,0.05} = 5.99$$
 Because the test statistic is $\chi^2 = 8.18$, we reject the null
hypothesis. The p-value of the test is
$$p\text{-value} = P(\chi^2_{K-1} > 8.18) = 0.0167$$
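The market share figures can be checked numerically. The sketch below is illustrative only and its variable names are assumptions. It relies on one fact not stated in the notes: for exactly 2 degrees of freedom, the upper-tail probability of a chi-square variable equals exp(-x/2), so no statistical library is needed here.

```python
import math

observed = [102, 82, 16]                  # preferences for A, B, others (n = 200)
shares_h0 = [0.45, 0.40, 0.15]            # market shares before the campaign

n = sum(observed)
expected = [n * p for p in shares_h0]     # approximately 90, 80, 30
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid only for 2 degrees of freedom: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 2), round(p_value, 4))
```

The results agree with the statistic of 8.18 and the p-value of 0.0167 quoted in the example, both of which fall in the rejection region at the 5% level.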
Goodness-of-Fit Tests, Population
Parameters Unknown
Idea:
 Test whether data follow a specified distribution (such
as binomial, Poisson, or normal) . . .
 . . . without assuming the parameters of the distribution
are known
 Use sample data to estimate the unknown population
parameters
 Suppose that a null hypothesis specifies category
probabilities that depend on the estimation (from the
data) of m unknown population parameters
 The appropriate goodness-of-fit test is the same as in
the previous section . . .
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
 . . . except that the number of degrees of freedom for
the chi-square random variable is
$$\text{Degrees of Freedom} = K - m - 1$$
 where K is the number of categories
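As an illustration of the degrees-of-freedom rule, consider how m changes with the distribution being fitted. The cases below are standard examples rather than ones worked in these notes: a Poisson fit estimates one parameter (the mean), and a normal fit estimates two (the mean and the variance).

```latex
\begin{align*}
\text{Specified probabilities } (m = 0): &\quad \text{d.f.} = K - 1 \\
\text{Poisson, mean estimated } (m = 1): &\quad \text{d.f.} = K - 2 \\
\text{Normal, mean and variance estimated } (m = 2): &\quad \text{d.f.} = K - 3
\end{align*}
```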
Contingency Tables
 Used to classify sample observations according to a
pair of attributes
 Also called a cross-classification or cross-tabulation
table
 Assume r categories for attribute A and c categories
for attribute B
 Then there are (r x c) possible cross-classifications
r x c Contingency Table

                        Characteristic B
Characteristic A    1      2      ...    c      Totals
1                   O11    O12    ...    O1c    R1
2                   O21    O22    ...    O2c    R2
.                   .      .      ...    .      .
.                   .      .      ...    .      .
.                   .      .      ...    .      .
r                   Or1    Or2    ...    Orc    Rr
Totals              C1     C2     ...    Cc     n
Test for Association
 Consider n observations tabulated in an r x c
contingency table
 Denote by Oij the number of observations in
the cell that is in the ith row and the jth column
 The null hypothesis is
H0: No association exists
between the two attributes in the population
The Chi-Square Test Statistic
The chi-square test statistic is:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{with d.f.} = (r - 1)(c - 1)$$
 where:
Oij = observed frequency in cell (i, j)
Eij = expected frequency in cell (i, j)
r = number of rows
c = number of columns
 The appropriate test is a chi-square test with (r-1)(c-1) degrees of
freedom
 Let Ri and Cj be the row and column totals
 The expected number of observations in the cell in row i and
column j, given that H0 is true, is
$$E_{ij} = \frac{R_i C_j}{n}$$
 A test of association at a significance level α is based
on the chi-square distribution and the following decision
rule
$$\text{Reject } H_0 \text{ if } \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\alpha}$$
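The expected-frequency formula and the test statistic translate directly into code. The following is a sketch under stated assumptions, not a definitive implementation: the function name `chi_square_association` is hypothetical, and the 2 x 2 table is invented purely to exercise the function (it is not data from these notes).

```python
def chi_square_association(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]           # R_i
    col_totals = [sum(col) for col in zip(*table)]     # C_j
    n = sum(row_totals)
    # E_ij = R_i * C_j / n under H0 (no association)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Toy 2 x 2 table, invented for illustration only.
toy_table = [[10, 20], [20, 10]]
stat, df, expected = chi_square_association(toy_table)
print(round(stat, 3), df)
```

The returned statistic is compared with the tabulated value of the chi-square distribution with `df` degrees of freedom at the chosen significance level. The expected counts are returned as well, so that the rule of five can be checked before relying on the result.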
Relationship between Undergraduate Degree
and MBA Major
 The MBA program was experiencing problems scheduling its
courses. The demand for the program’s optional courses and
majors was quite variable from one year to the next. In one year,
students seem to want marketing courses; in other years,
accounting or finance are the rage. In desperation, the dean of the
business school turned to a statistics professor for assistance. The
statistics professor believed that the problem may be the variability
in the academic background of the students and that the
undergraduate degree affects the choice of major. As a start, he
took a random sample of last year’s MBA students and recorded
the undergraduate degree and the major selected in the graduate
program.
 The undergraduate degrees were BA, BEng, BBA, and several
others. There are three possible majors for the MBA students:
accounting, finance, and marketing. The results were summarized
in a cross-classification table, which is shown here. Can the
statistician conclude that the undergraduate degree affects the
choice of major at the 5% significance level?
 One way to solve the problem is to consider that there are two
variables: undergraduate degree and MBA major. Both are
nominal. The values of the undergraduate degree are BA, BEng,
BBA, and other. The values of MBA major are accounting, finance,
and marketing. The problem objective is to analyze the relationship
between the two variables. Specifically, we want to know whether
one variable is related to the other.
 The null hypothesis will specify that there is no relationship
between the two variables. We state this in the following way:
H0: The two variables are independent
 The alternative hypothesis specifies that one variable affects the other,
expressed as
H1: The two variables are dependent
 Using relative frequencies, we calculate the estimated probabilities
for the MBA major and also the estimated probabilities for the
undergraduate degree.
 Assuming that the null hypothesis is true (the two variables are
independent), we can compute the estimated joint probabilities. To
produce the expected values, we multiply the estimated joint
probabilities by the sample size, 𝑛 = 152.
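Written out, this step reproduces the expected-frequency formula given for the test for association. Here $R_i$ and $C_j$ are the row and column totals defined earlier; the hat notation and the event labels $A_i$, $B_j$ are introduced only for this illustration.

```latex
\begin{align*}
\hat{P}(A_i) &= \frac{R_i}{n}, \qquad \hat{P}(B_j) = \frac{C_j}{n} \\
\hat{P}(A_i \cap B_j) &= \hat{P}(A_i)\,\hat{P}(B_j) = \frac{R_i}{n} \cdot \frac{C_j}{n}
  \quad \text{(under independence)} \\
E_{ij} &= n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i C_j}{n}
\end{align*}
```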
 The expected cell frequencies are shown in parentheses in the
following table. As in the case of the goodness-of-fit test, the
expected cell frequencies should satisfy the rule of five (each should be at least 5).
 We can now calculate the value of the test statistic:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 14.70$$
 To determine the rejection region we must know the number of
degrees of freedom associated with the chi-squared statistic. The
number of degrees of freedom for a contingency table with r rows
and c columns is ν = (r − 1)(c − 1). For this example, the number of
degrees of freedom is ν = (r − 1)(c − 1) = (4 − 1)(3 − 1) = 6.
 If we employ a 5% significance level, the rejection region is
$$\chi^2 > \chi^2_{\nu,\alpha} = \chi^2_{6,0.05} = 12.6$$
 Because $\chi^2 = 14.70$, we reject the null hypothesis and conclude
that there is evidence of a relationship between undergraduate
degree and MBA major.
 The p-value of the test statistic is $P(\chi^2 > 14.70) = 0.0227$.
 A chi-squared table lists only selected critical values, so the p-value
is normally obtained with software rather than by hand.
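The quoted p-value can be checked without a statistics library, because this example happens to have an even number of degrees of freedom. The sketch below relies on a fact not stated in the notes: for even degrees of freedom, the chi-square upper tail equals exp(-x/2) multiplied by the sum of (x/2)^j / j! for j from 0 to df/2 - 1. The function name `chi2_upper_tail_even_df` is hypothetical. For odd degrees of freedom this shortcut does not apply and a statistical package is needed.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_upper_tail_even_df(14.70, 6)   # MBA example: chi-square = 14.70, 6 d.f.
print(round(p_value, 4))
```

The result rounds to 0.0227, matching the p-value quoted for the MBA example and lying below the 5% significance level.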
