Lessons in Business Statistics Prepared by P.K. Viswanathan
Chapter 9: Chi-Square Test and
Analysis of Variance (ANOVA)
Introduction
In the previous chapter, we made inferences about the
difference between two population means based on the
corresponding sample means. When we are interested in
testing the equality of means involving more than two
populations, we have an elegant technique known as
ANOVA, developed in the 1920s by Ronald Fisher, often
called the father of modern statistics. The specialty of
ANOVA is that it is part of the domain called
"Experimental Design", which deals with cause-effect
relationships in an effective manner. A cause-effect
relationship is also reflected in the association of
attributes, a question effectively answered by the
chi-square test. This chapter covers the basic models of
the chi-square test and ANOVA.
1) Chi-Square Analysis-Basics
Chi-square analysis is widely used in research studies
for testing hypotheses involving nominal data.
Nominal data are also known by two other names:
categorical data and attribute data. The symbol χ²
is used to designate the chi-square statistic, whose
distribution depends on the number of degrees of
freedom (d.f.). A chi-square distribution is a skewed
distribution, particularly with smaller d.f. As the d.f.
increase, the χ² distribution becomes more symmetrical,
approaching normality. The general shape of the χ²
distribution for smaller d.f. is given in the next slide.
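As a supplementary note that is not part of the original slides, the standard moments of a chi-square distribution with ν d.f. give one way to see why the skew fades as the d.f. grow (ν is notation introduced here for the d.f.):

```latex
\[
\mathbb{E}\left[\chi^2_\nu\right] = \nu, \qquad
\operatorname{Var}\left[\chi^2_\nu\right] = 2\nu, \qquad
\text{skewness} = \sqrt{8/\nu}
\]
```

For example, the skewness is about 1.41 at ν = 4 and about 0.37 at ν = 60, so the curve looks increasingly symmetric as ν increases.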
1) Chi-Square Analysis-Basics-Picture
[Figure not recoverable: shape of the χ² distribution for smaller d.f.]
Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Where O = Observed Frequency
E = Expected Frequency
2) Chi-Square Test-Goodness of Fit
Example:
Assume that a marketer wishes to compare five different package
designs. He is interested in knowing which is the most preferred
one so that the same can be introduced in the market. A random
sample of 200 consumers gives the following picture:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.850$
The critical χ² for 4 d.f. at the 5% level of significance is 9.49. Since the
calculated value of χ² (4.850) is less than the critical value at the 5% level,
accept the null hypothesis of equal preference. The conclusion is that all
packages are equally preferred and the differences in preference in the sample
survey may have arisen due to chance.
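The goodness-of-fit calculation above can be sketched in a few lines of Python. This is only an illustration: the survey's observed counts are not reproduced in this text, so the `observed` list below is a hypothetical placeholder and the result is not expected to equal 4.850. The function name `chi_square_statistic` is also an assumption made for this sketch.

```python
# Goodness-of-fit sketch. The observed counts are HYPOTHETICAL placeholders,
# not the survey data from the example (that table is not reproduced here).

def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [32, 38, 40, 45, 45]   # hypothetical counts for 5 designs, total 200
n = sum(observed)
k = len(observed)
expected = [n / k] * k            # equal preference: 200 / 5 = 40 per design

chi2 = chi_square_statistic(observed, expected)
df = k - 1                        # 4 d.f. for five categories
critical = 9.49                   # 5% critical value for 4 d.f., as quoted in the text

print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > critical else "do not reject H0")
```

Under the null hypothesis of equal preference, every design has the same expected frequency, which is why `expected` is simply the sample size divided by the number of designs.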
3) Chi-Square Test-Cross Tab
The manager who conducted this survey wants to know whether the brand
preference is associated with the income strata.
Solution:
The null hypothesis is that there is no association between the
brand preference and the income level (These two are
independent). The alternative hypothesis is that the brand and
income level are associated (dependent).
In order to calculate the χ² value, you need to work out the expected
frequency in each cell of the contingency table. Under the null hypothesis,
the expected frequency of a cell is (row total × column total) / grand total.
In our example, there are 4 rows and 3 columns amounting to 12 cells, so
there will be 12 expected frequencies.
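A minimal sketch of the expected-frequency step for a contingency table follows. The 4 x 3 table in it is hypothetical, since the survey's cross-tab is not reproduced in this text; the orientation (rows as brands, columns as income strata) and the function names are likewise assumptions made for illustration.

```python
# Expected frequencies for a contingency table under independence.
# The 4 x 3 table below is HYPOTHETICAL; the survey's cross-tab is not
# reproduced in this text.

def expected_frequencies(table):
    """E[i][j] = (row total i) * (column total j) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [            # assumed layout: 4 brands (rows) x 3 income strata (columns)
    [20, 30, 10],
    [15, 25, 20],
    [25, 15, 20],
    [10, 20, 30],
]
expected = expected_frequencies(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (4 - 1)(3 - 1) = 6

print(len(expected) * len(expected[0]), "expected frequencies, d.f. =", df)
print(f"chi-square = {chi_square(observed):.3f}")
```

Each expected frequency uses only the marginal totals, so the expected table always has the same row and column totals as the observed one.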
The upper critical χ² value at the 5% level for 6 d.f., that is (4 − 1)(3 − 1), is 12.59.
[One-way ANOVA table: only the total row is recoverable. Total: sum of squares = 123.6, d.f. = 14]
5) ANOVA-One Way Classification
Solution continues
Formulation of the Null and Alternative hypothesis
H0: The population means of percentage stock out position for all the
three chains are equal
H1: The population means of percentage stock out position for the
three chains are not all equal
Decision Rule: If the computed F is greater than the critical F, reject the null
hypothesis H0 and accept the alternative H1.
At the 5% level, from the ANOVA output of Excel, we have the computed F = 7.53
and the critical F(2,12) = 3.89. So, reject the null hypothesis and accept the
alternative. The inference is that the population means of percentage stock out
are not the same for all the three chains. So, what do you do? Now, look at the
point estimates from the summary table. Chain 1 has a mean stock out of 16%,
chain 2 has a mean stock out of 10.8% and chain 3 has a mean stock out of 14%.
Chain 2 has the least stock out percentage, followed by chain 3 and then chain 1.
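The computed F can be cross-checked from the summary figures alone. The sketch below uses the three chain means quoted above and the total sum of squares of 123.6 (14 d.f.) that survives from the ANOVA table. It assumes a balanced design with 5 observations per chain, which is inferred from the degrees of freedom (2 between, 12 within) and is not stated explicitly; the raw stock out data are not shown in this text.

```python
# Rebuild the one-way ANOVA F ratio from the summary figures quoted in the text.
# ASSUMPTION: a balanced design with 5 observations per chain, inferred from
# the d.f. (2 between, 12 within, 14 total); the raw data are not shown here.

means = [16.0, 10.8, 14.0]   # mean stock out % for chains 1, 2, 3 (from the text)
n_per_group = 5              # assumed group size
ss_total = 123.6             # total sum of squares (total row of the ANOVA table)

k = len(means)
# A plain average of the group means equals the grand mean only because the
# groups are assumed to be the same size.
grand_mean = sum(means) / k

ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
ss_within = ss_total - ss_between

df_between = k - 1                   # 2
df_within = k * n_per_group - k      # 12

f_ratio = (ss_between / df_between) / (ss_within / df_within)
critical_f = 3.89                    # F(2, 12) at the 5% level, as quoted

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"F = {f_ratio:.2f}")
print("reject H0" if f_ratio > critical_f else "do not reject H0")
```

Under the equal-group-size assumption, the between-chain mean square is 34.4 and the within-chain mean square is about 4.57, which agrees with the F of 7.53 reported from Excel.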
Assumptions involved in using ANOVA
6) ANOVA-Two Way Classification
Example:
A supermarket that has a chain of stores is concerned
about its service quality reputation as perceived by its
customers. The table below shows the perceived
service quality with regard to politeness of the staff.
The number in each cell of the table is the percentage of
people who said that the staff is polite. Perform
the two-way ANOVA and draw your inferences about
the population means of politeness corresponding to the
days as well as the stores.
Day          Store A   Store B   Store C   Store D   Store E
Monday          79        81        74        77        66
Tuesday         78        86        89        97        86
Wednesday       81        87        84        94        82
Thursday        80        83        81        88        83
Friday          70        74        77        89        68
[Two-way ANOVA table: only the total row is recoverable. Total: sum of squares = 1361.76, d.f. = 24]
Interpretation of the results:
Rows are the days and columns are the stores. The
computed F value is greater than the critical F in both
cases. So, reject the null hypothesis of equality of
means in both cases. The conclusion is that the
stores (columns) as well as the days (rows) differ in
mean politeness level. The highest politeness level is
witnessed on Tuesday, and Store D extends the maximum
politeness level.
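The two-way computation for the politeness table can be sketched as follows, using the standard sums-of-squares decomposition for a two-way layout with one observation per cell. The data are taken from the table in the example. The critical value F(4, 16) = 3.01 at the 5% level is not quoted in this text and is an assumption read from standard F tables; the variable names are illustrative.

```python
# Two-way ANOVA without replication for the politeness table
# (rows = days, columns = stores).

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
stores = ["A", "B", "C", "D", "E"]
data = [
    [79, 81, 74, 77, 66],
    [78, 86, 89, 97, 86],
    [81, 87, 84, 94, 82],
    [80, 83, 81, 88, 83],
    [70, 74, 77, 89, 68],
]

r, c = len(data), len(data[0])
grand_total = sum(map(sum, data))
correction = grand_total ** 2 / (r * c)      # correction factor

ss_total = sum(x * x for row in data for x in row) - correction
ss_rows = sum(sum(row) ** 2 for row in data) / c - correction
ss_cols = sum(sum(col) ** 2 for col in zip(*data)) / r - correction
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols                 # 4 * 4 = 16
ms_error = ss_error / df_error

f_rows = (ss_rows / df_rows) / ms_error      # F for days
f_cols = (ss_cols / df_cols) / ms_error      # F for stores
critical_f = 3.01   # ASSUMED: F(4, 16) at the 5% level, from standard tables

row_means = {d: sum(row) / c for d, row in zip(days, data)}
col_means = {s: sum(col) / r for s, col in zip(stores, zip(*data))}

print(f"SS total = {ss_total:.2f} with {r * c - 1} d.f.")
print(f"F (days) = {f_rows:.2f}, F (stores) = {f_cols:.2f}, critical F = {critical_f}")
print("highest day:", max(row_means, key=row_means.get))
print("highest store:", max(col_means, key=col_means.get))
```

The total sum of squares from this sketch matches the 1361.76 with 24 d.f. that survives from the ANOVA table, and both F ratios exceed the assumed critical value, in line with the interpretation above.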