
# Business Statistics: Communicating with Numbers

## By Sanjiv Jaggia and Alison Kelly

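Chapter 12 Learning Objectives (LOs)

- LO 12.1: Conduct a goodness-of-fit test for a multinomial experiment.
- LO 12.2: Determine whether two classifications of a population are independent.
- LO 12.3: Conduct a goodness-of-fit test for normality.
- LO 12.4: Perform the Jarque-Bera test for normality.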

12-2
Is Brand Loyalty Related to Buyer’s Age?

- The retail analyst for a marketing firm wants to know if different customer groups prefer one brand over another. She looks at data from 600 sales.
- In particular, she feels that the brand Under Armour might appeal more to younger customers.
- The more established brands (Nike and Adidas) might be capturing the older-customer market.

12-3
Is Brand Loyalty Related to Buyer’s Age?

1. Determine whether the two classifications (age and brand name) are dependent at the 5% significance level.
2. Discuss how the findings from the test for independence can be used.

12-4
12.1 Goodness-of-Fit Test for a Multinomial Experiment

- This test determines whether two or more population proportions equal each other or any predetermined set of values.
- For example, are four candidates in an election equally favored by voters?
- Or, do people rate food quality in a restaurant comparably to last year?

12-5
LO 12.1 A Multinomial Experiment

A multinomial experiment consists of a series of n independent trials such that:

1. On each trial there are k possible outcomes (categories).
2. The probability pi of falling into category i is the same on each trial.
3. The k probabilities sum to 1: p1 + p2 + … + pk = 1

12-6
LO 12.1 The Hypothesis Test

- The null hypothesis: the population proportions are equal to one another or they are each equal to a specific value.
- Equal population proportions:
  H0: p1 = p2 = p3 = p4 = 0.25
  HA: Not all population proportions are equal to 0.25.
- Unequal population proportions:
  H0: p1 = 0.4, p2 = 0.3, p3 = 0.2, p4 = 0.1
  HA: At least one pi differs from its hypothesized value.

12-7
LO 12.1 Restaurant Food Quality

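Last year the management at a restaurant surveyed its patrons to rate the quality of its food. The results were as follows:

| Response Category | Percentage Last Year |
|---|---|
| Excellent | 15% |
| Good | 30% |
| Fair | 45% |
| Poor | 10% |

Based on this and other survey results, …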

12-8
LO 12.1 This Year’s Results

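This year, the management surveyed 250 patrons about food quality. Here are the results:

| Response Category | Observed This Year |
|---|---|
| Excellent | 46 |
| Good | 83 |
| Fair | 105 |
| Poor | 16 |

We want to know if the results agree with those from last year, or if there has been a significant change.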

12-9
LO 12.1 Methodology
- Compute an expected frequency for each category and compare it to what we actually observe.
- Compute the difference between what was observed and expected for each category.
- If the results this year are consistent with last year, these differences will be relatively small.

12-10
LO 12.1 The ei (Expected Frequencies)
- We first compute the expected counts based on the survey of 250 restaurant patrons.
- If the survey is consistent with last year’s results, we expect e1 = p1(250) = 0.15(250) = 37.5 responses to be in the “Excellent” category.
- There actually were o1 = 46, a bit more than expected.

12-11
LO 12.1 Computing the Deviations
- In the first category e1 = 37.5 and o1 = 46, so we get (o1 – e1) = ___.
- In the third category, the “Fair” responses, e3 = p3(250) = 0.45(250) = 112.5.
- There are 105 of these responses in the survey, so we compute (o3 – e3) = 105 – 112.5 = ___.

12-12
LO 12.1 Standardizing the Deviations

12-13
LO 12.1 The Chi-Square Test

$$\chi^2_{df} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where df = k – 1 and k is the number of categories,
oi = observed frequency for category i,
ei = expected frequency for category i.

12-14
LO 12.1 The Critical Value (at α = 0.05)
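
With k = 4 response categories the test has df = 3, and the restaurant example quotes a 5% critical value of 7.815. As a rough check that needs no statistical table, the sketch below evaluates the right-tail probability of a chi-square variable with 3 degrees of freedom at that value. The function name `chi2_sf_df3` is hypothetical, and the closed form it uses holds only for df = 3.

```python
import math

def chi2_sf_df3(x):
    """Right-tail probability P(X > x) for a chi-square variable with 3 df.

    Uses the closed form for df = 3 only; other df need a different formula.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Tail area to the right of the critical value quoted in the slides.
alpha_at_critical = chi2_sf_df3(7.815)
print(round(alpha_at_critical, 4))
```

The printed tail area should be very close to 0.05, which is what makes 7.815 the cutoff at the 5% significance level.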

12-15
LO 12.1 The Restaurant Example

12-16
LO 12.1 The Restaurant Example
| Response Category | Percentage Last Year | Observed This Year (oi) | Expected Out of 250 (ei) | (oi – ei) | (oi – ei)²/ei |
|---|---|---|---|---|---|
| Excellent | 15% | 46 | 37.5 | 8.5 | 1.927 |
| Good | 30% | 83 | 75.0 | 8.0 | 0.853 |
| Fair | 45% | 105 | 112.5 | -7.5 | 0.500 |
| Poor | 10% | 16 | 25.0 | -9.0 | 3.240 |
| TOTAL | 100% | 250 | 250 | 0.0 | 6.520 |

- Since the computed test statistic of 6.520 is less than the critical value of 7.815, we do not reject H0.
- The changes did not produce a statistically significant response at the 5% level.
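
The restaurant computation can be reproduced in a few lines. This is a minimal sketch in plain Python: the helper name `chi_square_gof` is an assumption, the data come from the table above, and the critical value is the one quoted in the slides rather than computed.

```python
# Goodness-of-fit statistic for the restaurant example.
last_year_props = [0.15, 0.30, 0.45, 0.10]  # Excellent, Good, Fair, Poor
observed = [46, 83, 105, 16]                # this year's survey, n = 250

def chi_square_gof(observed, props):
    """Return (expected frequencies, chi-square statistic) for hypothesized proportions."""
    n = sum(observed)
    expected = [p * n for p in props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat

expected, stat = chi_square_gof(observed, last_year_props)
critical_value = 7.815  # df = 3, 5% level, as quoted in the slides
reject_h0 = stat > critical_value
print(expected, round(stat, 3), reject_h0)
```

Here `reject_h0` is `False` because the statistic (about 6.52) falls below the critical value, matching the conclusion above.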

12-17
LO 12.1 A Required Condition

- The test requires that the expected frequency (ei) in each cell is at least 5.
- This condition is met in the restaurant example, where the smallest expected frequency is 25.
- When the condition is not met, one way to correct the problem is to combine categories to get ei ≥ 5.

12-18
LO 12.1 Example 12.1
- There are five companies that manufacture a particular product. Their market shares for 2010 are:

| Company | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Market Share | 40% | 32% | 24% | 2% | 2% |

- Current-year shares are not yet known, so a market analyst surveys 200 recent customers.

12-19
LO 12.1 Example 12.1 (continued)
- The survey showed the following results:

| Company | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| Purchases | 70 | 60 | 54 | 10 | 6 | 200 |

- A minor complication is that for the two small companies, a 2% market share yields expected frequencies of 4 (200 × 0.02), below the required minimum of 5.
- We will combine companies 4 and 5 in performing the analysis.

12-20
LO 12.1 Example 12.1 (continued)

12-21
LO 12.1 Example 12.1 Computations
| Company | Market Share in 2010 | Purchases This Year (oi) | Expected Out of 200 (ei) | (oi – ei) | (oi – ei)²/ei |
|---|---|---|---|---|---|
| 1 | 40% | 70 | 80 | -10.0 | 1.250 |
| 2 | 32% | 60 | 64 | -4.0 | 0.250 |
| 3 | 24% | 54 | 48 | 6.0 | 0.750 |
| 4 and 5 | 4% | 16 | 8 | 8.0 | 8.000 |
| TOTAL | 100% | 200 | 200 | 0.0 | 10.250 |

- Because the computed test statistic of 10.250 exceeds the critical value of 7.815 (df = 4 – 1 = 3 after combining categories), we reject H0.
- We conclude that market shares have shifted since 2010.
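
The category-combining step in Example 12.1 can be sketched as follows. The helper `merge_last` is a hypothetical name introduced for illustration, and merging the trailing categories is only one of several reasonable ways to satisfy the ei ≥ 5 condition.

```python
shares_2010 = [0.40, 0.32, 0.24, 0.02, 0.02]  # companies 1 to 5
purchases = [70, 60, 54, 10, 6]               # survey of 200 customers
n = sum(purchases)

def merge_last(values, count):
    """Collapse the last `count` entries of a list into one summed entry."""
    return values[:-count] + [sum(values[-count:])]

# Before merging, companies 4 and 5 each have an expected frequency of about 4.
expected_raw = [p * n for p in shares_2010]
needs_merge = min(expected_raw) < 5

# Merge companies 4 and 5, then compute the chi-square statistic.
shares = merge_last(shares_2010, 2)
observed = merge_last(purchases, 2)
expected = [p * n for p in shares]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(needs_merge, [round(e, 1) for e in expected], round(stat, 2), df)
```

After merging there are four categories, so df = 3 and the statistic of about 10.25 is compared with 7.815.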

12-22
12.2 Chi-Square Test for Independence
LO 12.2 Determine whether two classifications of a population are independent.

- The goodness-of-fit test examines a single qualitative variable. A test of independence, also called a chi-square test of a contingency table, analyzes the relationship between two qualitative variables.
- The competing hypotheses can be expressed as:
  H0: The two classifications are independent
  HA: The two classifications are dependent

12-23
LO 12.2 Contingency Tables

- A contingency table shows the frequencies for two qualitative variables (e.g., brand of product and type of customer).
- The test for independence is based on the expected and observed frequencies for each cell in the table.

12-24
LO 12.2 Example
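Does the brand of compression garment purchased depend on the customer’s age?

Observed frequencies by age group (rows) and brand name (columns):

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 174 | 132 | 90 |
| 35 years and older | 54 | 72 | 78 |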

12-25
LO 12.2 Notation
- We use the notation oij to denote the observed frequency in row i and column j.
- Similarly, eij denotes the expected frequency in row i and column j.
- Under the independence assumption, the expected frequency per cell is:
  eij = (Row i total)(Column j total)/Sample Size

12-26
LO 12.2 The Chi-Square Statistic
We apply the chi-square test statistic in a similar manner as in the goodness-of-fit test. The formula is as follows:

$$\chi^2_{df} = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$

where df = (rows – 1)(columns – 1).

12-27
LO 12.2 Computing Expected Frequencies
| Age Group | Under Armour | Nike | Adidas | Row Totals |
|---|---|---|---|---|
| Under 35 years | 174 | 132 | 90 | 396 |
| 35 years and up | 54 | 72 | 78 | 204 |
| Column Totals | 228 | 204 | 168 | 600 |

- For row 1 and column 1, the expected frequency, e11, is (396)(228)/600 = 150.48.
- For row 1 and column 2, the expected frequency, e12, is (396)(204)/600 = _____.
- For e13, we calculate (396)(___)/600 = _____.

12-28
LO 12.2 Expected Frequencies and Deviations

The expected frequencies (eij) are:

| Age Group | Under Armour | Nike | Adidas | Row Totals |
|---|---|---|---|---|
| Under 35 years | 150.48 | 134.64 | 110.88 | 396.00 |
| 35 years and up | 77.52 | 69.36 | 57.12 | 204.00 |
| Column Totals | 228.00 | 204.00 | 168.00 | 600.00 |

The deviations (oij – eij) are:

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 23.52 | -2.64 | -20.88 |
| 35 years and up | -23.52 | 2.64 | 20.88 |

12-29
LO 12.2 Squared Deviations
- We square each deviation and divide by the respective expected frequency. These values are shown in the following table.

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 3.68 | 0.05 | 3.93 |
| 35 years and up | 7.14 | 0.10 | 7.63 |

- The standardized, squared deviations sum to 22.53, the value of the test statistic.
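
The expected frequencies, deviations, and test statistic for the contingency table can be reproduced with a short sketch. The function name `chi_square_independence` is an assumption made for illustration. The code applies eij = (row total)(column total)/n to each cell and then sums the standardized squared deviations.

```python
# Observed frequencies: rows are age groups, columns are brands.
observed = [
    [174, 132, 90],  # under 35: Under Armour, Nike, Adidas
    [54, 72, 78],    # 35 and older
]

def chi_square_independence(table):
    """Return (expected frequencies, chi-square statistic, df) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

expected, stat, df = chi_square_independence(observed)
print([[round(e, 2) for e in row] for row in expected], round(stat, 2), df)
```

The statistic rounds to 22.53 with df = 2, in line with the slide values.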

12-30
LO 12.2 Summarizing the Example
Competing Hypotheses:
H0: Age and brand name are independent.
HA: Age and brand name are dependent.

The test statistic is calculated using:

$$\chi^2_{df} = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$

where df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2.

The critical value is 5.991 at the 5% significance level.

12-31
LO 12.2 Summarizing the Example
- We reject H0 because the value of the test statistic is larger than the critical value: 22.53 > 5.991. Therefore, age and brand name are not independent of one another.
- Alternatively, by selecting Formulas > Insert Function > CHISQ.DIST.RT and inputting X = 22.53 and Deg_freedom = 2, Excel will compute the p-value for our test, which is very close to 0.
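
For df = 2 specifically, the chi-square right-tail probability has the closed form exp(–x/2), so both the p-value and the 5% critical value can be checked without Excel. The sketch below relies on that special case only; the function name `chi2_sf_df2` is hypothetical, and other degrees of freedom need a different formula or a statistics library.

```python
import math

def chi2_sf_df2(x):
    """Right-tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2)

p_value = chi2_sf_df2(22.53)          # p-value for the brand/age test
critical_value = -2 * math.log(0.05)  # solves exp(-x / 2) = 0.05
print(p_value, round(critical_value, 3))
```

The p-value is on the order of 0.00001, consistent with "very close to 0", and the critical value rounds to 5.991.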

12-32
12.3 Chi-Square Test for Normality
LO 12.3 Conduct a goodness-of-fit test for normality.

- The goodness-of-fit test can also be used to determine if a population has a particular probability distribution. The expected frequencies are determined from this assumed distribution.
- These expected frequencies are then compared to the observed frequencies to compute the familiar chi-square test statistic.

12-33
LO 12.3 Testing for Normality

- The hypotheses for a test for normality:
  H0: The data follow a normal distribution with parameters µ and σ
  HA: The data do not follow this distribution
- The values of µ and σ are typically the point estimates calculated from the sample data.

12-34
LO 12.3 Example: A Sample of 50 Incomes
- Table 12.9 in the text shows 50 household incomes. The sample mean income is 63.80 (in \$1000s) with standard deviation 45.78.
- We next form k = 5 categories (up to 20, 20 to 40, etc.), and count how many households we observe with incomes in each category.
- For the expected frequency, we calculate the probability of an income falling in each category, assuming income follows our hypothesized distribution.

12-35
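LO 12.3 Computing the Expected Counts
- There are 6 households in the first class of less than \$20,000.
- In this interval we expect 0.1685 × 50 ≈ 8.43 households, where 0.1685 is the probability that a normal variable with mean 63.80 and standard deviation 45.78 falls below 20.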
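
The expected count for the first class can be sketched with the normal CDF written in terms of `math.erf`. The helper name `normal_cdf` is an assumption. The slide's figure of 8.43 appears to come from a z-table lookup with z rounded to two decimals, so the unrounded calculation gives a slightly different value, and the code shows both.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, mean, sd = 50, 63.80, 45.78  # sample size, mean, and standard deviation (in $1000s)

# Unrounded probability of an income below 20, and the expected count.
p_first = normal_cdf(20, mean, sd)
expected_first = n * p_first

# Same calculation with z rounded to two decimals, as a z-table would require.
z_rounded = round((20 - mean) / sd, 2)
p_table = normal_cdf(z_rounded, 0, 1)
expected_table = n * p_table
print(round(expected_first, 2), z_rounded, round(p_table, 4), round(expected_table, 2))
```

Expected counts for the remaining classes follow the same pattern, using differences of CDF values at the class boundaries.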

12-36
LO 12.3 Calculations for the Test

- df = k – 1 – 2, because two parameters of the normal distribution (µ and σ) are estimated from the sample.
- With k = 5, df = 2 and the critical value at the 5% significance level is 5.991.

12-37
LO 12.3 Concluding the Test
- Since the value of the test statistic, 8.12, exceeds our critical value of 5.991, we reject the null hypothesis.
- We conclude that these data do not come from a normal distribution with mean 63.8 and standard deviation 45.78.
- A criticism of this method is that we first have to convert raw data into a set of arbitrary classes.
- The result might be different if we had grouped the data differently.

12-38
12.4 The Jarque-Bera Test for Normality
LO 12.4 Perform the Jarque-Bera test for normality.

- An alternative to the goodness-of-fit test for normality is one developed by Jarque and Bera.
- A normal distribution is not skewed and its peak is in a specific ratio to its spread.
- The Jarque-Bera test uses these facts to derive a test statistic.

12-39
LO 12.4 Skewness and Kurtosis
- Skewness is a measure of a distribution’s lack of symmetry; we have S = 0 for any normal distribution.
- Kurtosis is a measure of peakedness; the kurtosis coefficient as Excel reports it (excess kurtosis) is K = 0 for a normal distribution.
- We can obtain the values of S and K from Excel and use them to compute the appropriate test statistic.

12-40
LO 12.4 Hypotheses and Test Statistic
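
The slide's formula is not reproduced in this text. The standard form of the Jarque-Bera statistic is JB = (n/6)[S² + K²/4], with K measured as excess kurtosis, and it is compared against a chi-square distribution with 2 degrees of freedom. The sketch below is illustrative only: the function name `jarque_bera` and the sample values are hypothetical, and S and K are computed from simple moment formulas, whereas Excel's SKEW and KURT apply small-sample adjustments and will give somewhat different numbers.

```python
import math

def jarque_bera(data):
    """Return (S, K, JB, p_value) using moment-based skewness and excess kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    s = m3 / m2 ** 1.5                           # skewness
    k = m4 / m2 ** 2 - 3                         # excess kurtosis (0 for a normal)
    jb = (n / 6) * (s ** 2 + k ** 2 / 4)
    p_value = math.exp(-jb / 2)                  # chi-square right tail, df = 2 only
    return s, k, jb, p_value

# Hypothetical right-skewed sample (not the textbook data).
sample = [12, 15, 18, 21, 22, 25, 28, 31, 40, 55, 70, 120]
s, k, jb, p_value = jarque_bera(sample)
print(round(s, 3), round(k, 3), round(jb, 3), round(p_value, 4))
```

A large JB value, and therefore a small p-value, counts as evidence against normality.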

12-41
LO 12.4 Example 12.3

12-42