Data Analysis for Managers

Unit IV:
Chi-Square Test and ANOVA
Rajashree Kamath, Ph.D. (Statistics),
Assistant Professor (Economics and Quantitative Techniques),
Coordinator – AcadX, CSR Karma Club, Cell for Sustainable Development,
Documentation
(Kengeri Campus),
School of Business and Management, CHRIST (Deemed to be University), Kengeri,
Bangalore 560074.
Ph.: +918040129879 (O), Cell: +919448067196.
MISSION: CHRIST is a nurturing ground for an individual’s holistic development to make effective contribution to the society in a dynamic environment
VISION: Excellence and Service
CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
CHRIST
Deemed to be University

Introduction

Exercise 1
● With double-digit annual percentage increases in the cost of health
insurance, more and more workers are likely to lack health insurance
coverage (USA Today, January 23, 2004). The following sample data
provide a comparison of workers with and without health insurance
coverage for small, medium, and large companies.
                  Health Insurance
Size of Company   Yes   No
Small             36    14
Medium            65    10
Large             88    12
a. Conduct a test of independence to determine whether employee
health insurance coverage is independent of the size of the company
at the 5% level of significance. What is your conclusion?
b. The USA Today article indicated employees of small companies are
more likely to lack health insurance coverage. Use percentages based
on the preceding data to support this conclusion.
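One possible way to check the hand calculation for this exercise is sketched below in Python, using only the standard library. The variable names are illustrative and not part of the slides. The p-value line relies on the fact that, for 2 degrees of freedom only, the chi-square upper-tail probability equals exp(-x/2).

```python
import math

# Observed counts from Exercise 1: rows = company size, columns = (Yes, No).
observed = {
    "Small": [36, 14],
    "Medium": [65, 10],
    "Large": [88, 12],
}

row_totals = {size: sum(counts) for size, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency E_ij = (row total * column total) / grand total.
expected = {
    size: [row_totals[size] * c / grand_total for c in col_totals]
    for size in observed
}

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for size in observed
    for o, e in zip(observed[size], expected[size])
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

# Valid for df = 2 only: P[chi-square > x] = exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

# Part (b): percentage of sampled workers without coverage, by company size.
pct_uninsured = {
    size: 100 * counts[1] / row_totals[size] for size, counts in observed.items()
}

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
for size, pct in pct_uninsured.items():
    print(f"{size}: {pct:.1f}% without coverage")
```

The statistic comes out near 6.94 with a p-value near 0.031, so the comparison with the 5% level can be made directly from the printed values.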
Exercise 2
● The National Sleep Foundation used a survey to determine whether
hours of sleeping per night are independent of age (Newsweek,
January 19, 2004). The following show the hours of sleep on
weeknights for a sample of individuals age 49 and younger and for a
sample of individuals age 50 and older.
                Hours of Sleep
Age             Fewer than 6   6 to 6.9   7 to 7.9   8 or more
49 or younger   38             60         77         65
50 or older     36             57         75         92
a. Formulate the appropriate null and alternative hypotheses in this
context.
b. Which statistical test will you advise in this scenario?
c. Conduct a test of independence to determine whether the hours of
sleep on weeknights are independent of age at the 10% level of
significance. What is your conclusion?
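A minimal sketch of part (c) in Python follows, assuming the standard library only. The helper names `chi_square_sf` and `independence_test` are hypothetical, not from the slides. The upper-tail probability uses the standard finite-series forms for integer degrees of freedom, which is needed here because this table has 3 degrees of freedom.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P[chi-square(df) > x] for integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r = 0 .. df/2 - 1 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total

def independence_test(table):
    """Return (chi-square statistic, df, p-value) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_sq, df, chi_square_sf(chi_sq, df)

# Exercise 2 counts: rows = age group, columns = hours of sleep category.
sleep = [
    [38, 60, 77, 65],  # 49 or younger
    [36, 57, 75, 92],  # 50 or older
]
chi_sq, df, p_value = independence_test(sleep)
alpha = 0.10
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}: {decision}")
```

Here H0 states that hours of sleep are independent of age, and Ha states that they are not independent.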

Exercise 3

● A large middle school administrator wishes to use celebrity influence
to encourage students to make healthier choices in the school
cafeteria. The cafeteria is situated at the centre of an open space.
Every day at lunch time students get their lunch and a drink in three
separate lines leading to three separate serving stations. As an
experiment, the school administrator displayed a poster of a popular
teen pop star drinking milk at each of the three areas where drinks are
provided, except the milk in the poster is different at each location:
one shows white milk, one shows strawberry-flavoured pink milk, and
one shows chocolate milk. After the first day of the experiment the
administrator noted the students’ milk choices separately for the three
lines.

Exercise 3 (Contd.)
● The data are given in the table provided.
                Student Choice
Poster Choice   Regular   Strawberry   Chocolate
Regular         38        28           40
Strawberry      18        51           24
Chocolate       32        32           53
● Formulate the appropriate null and alternative hypotheses in this
context.
● Which statistical test will you advise in this scenario?
● Test, at the 1% level of significance, whether there is sufficient
evidence in the data to conclude that the posters had some impact on
the students’ drink choices.
● If you are using Excel, calculate the expected frequencies
Eij = (row total × column total) / (grand total) in the same layout as
the observed frequencies Oij. Then, use
CHISQ.TEST(<observed frequency cell range>, <expected frequency cell
range>) to get the p-value.
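The Excel steps above can be mirrored outside Excel as a cross-check. The sketch below, in standard-library Python with illustrative variable names, builds the expected frequencies in the same layout as the observed table and then computes the p-value. The closed form used for the p-value holds for 4 degrees of freedom only, which is what a 3 × 3 table gives.

```python
import math

# Exercise 3 counts: rows = poster shown, columns = student's milk choice.
observed = [
    [38, 28, 40],  # poster: regular
    [18, 51, 24],  # poster: strawberry
    [32, 32, 53],  # poster: chocolate
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequencies, laid out cell-for-cell like the observed table.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)  # (3 - 1) * (3 - 1) = 4

# Valid for df = 4 only: P[chi-square > x] = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

for exp_row in expected:
    print("  ".join(f"{e:7.3f}" for e in exp_row))
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The printed p-value plays the same role as the CHISQ.TEST result and is compared with the 1% level.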

ANOVA (Analysis of Variance)


● ANOVA tests statistically whether the group averages (means) of a
variable are equal.
● The numerator of the sample variance is called the Total Sum of Squares
(TSS).
● It is split up into the sum of squares (SS) due to groups and the SS due
to error (called "within groups" in Excel).
● If groups are formed using just one categorization, it is called One-Way
ANOVA.
● If groups are formed like a crosstabulation, it is called Two-Way
ANOVA. There will be SS due to Rows and SS due to Columns.
● If experiments are repeated, we can test for the interaction of rows
and columns as well.
● Assumptions:
1. For each population, the response variable is normally distributed.
2. The variance of the response variable is the same for all the
populations.
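The split of the TSS into a groups part and an error part can be written out for the one-way case as follows. The notation is an assumption introduced here and is not from the slides: there are k groups, group j has n_j observations x_ij, its mean is x̄_j, and the overall mean is x̄.

```latex
\[
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^{2}}_{\text{TSS}}
\;=\;
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{x}\right)^{2}}_{\text{SS due to groups}}
\;+\;
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^{2}}_{\text{SS due to error (within groups)}}
\]
```

Dividing the TSS by (total sample size − 1) gives the usual sample variance, which is why the TSS is described as its numerator.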

One-Way ANOVA

● Generalization of the 2-sample t-test from 2 to k populations.


● For example, suppose we are comparing ratings given by different
groups of people for a particular brand of a product (or a particular
company policy).
● The interest is to know whether the average rating is the same across
the different groups.
● In general, H0: 𝜇1 = 𝜇2 = … = 𝜇k versus Ha: At least one 𝜇i is different
● Construct an Analysis of Variance table using Excel.
● Reject H0 if the p-value ≤ 𝛼

● Note: The "design of this experiment" is called a Completely
Randomized Design (CRD).


Exercise 4

● Four different paints are advertised as having the same drying time.
To check the manufacturer’s claims, five samples were tested for each
of the paints. The time in minutes until the paint was dry enough for a
second coat to be applied was recorded. The following data were
obtained.
● AndersonData\Ch 13 Exp. Design\Paint.xlsx

● At the 𝛼 = .05 level of significance, test to see whether the mean
drying time is the same for each type of paint.


Solution 4

As the p-value > 0.05, we fail to reject H0 at the 5% level of
significance. That means there is not enough evidence to conclude that
the average drying times of the paints differ.


ANOVA Table Formulae


Source of Variation   SS                                 df                  MS              F             P-value                    F crit
Between Groups        SSR                                df1 = #Groups - 1   MSR = SSR/df1   f = MSR/MSW   P[F > f] (will be given)   F-table value with df1 and df2
Within Groups         SSW = SST - SSR (by subtraction)   df2                 MSW = SSW/df2
Total                 SST                                Sample size - 1
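The table formulae can be traced step by step in a short Python sketch using the standard library only. The three groups of values are hypothetical, chosen only so the arithmetic is easy to follow, and df2 is taken as (sample size − 1) − df1. The last line uses a closed form for P[F > f] that holds only when df1 = 2 (three groups); for any other case the p-value would come from Excel's output or an F-table.

```python
# Hypothetical ratings for three groups (illustrative values only).
groups = {
    "Group A": [10, 12, 14],
    "Group B": [13, 15, 17],
    "Group C": [16, 18, 20],
}

values = [x for g in groups.values() for x in g]
n = len(values)
grand_mean = sum(values) / n

# SST: total sum of squares around the grand mean.
sst = sum((x - grand_mean) ** 2 for x in values)

# SSR: between-groups sum of squares (group size times squared mean gap).
ssr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())

# SSW: within-groups sum of squares, obtained by subtraction.
ssw = sst - ssr

df1 = len(groups) - 1   # #Groups - 1
df2 = (n - 1) - df1     # total df minus between-groups df

msr = ssr / df1
msw = ssw / df2
f = msr / msw

# Valid for df1 = 2 only: P[F > f] = (1 + 2f/df2) ** (-df2/2).
p_value = (1 + 2 * f / df2) ** (-df2 / 2)

print(f"SSR = {ssr}, SSW = {ssw}, SST = {sst}")
print(f"df1 = {df1}, df2 = {df2}, MSR = {msr}, MSW = {msw}")
print(f"f = {f}, p-value = {p_value:.4f}")
```

With these made-up numbers the p-value falls below 0.05, so H0 (equal group means) would be rejected at the 5% level.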


Exercise 5

● The Consumer Reports Restaurant Customer Satisfaction Survey is
based upon 148,599 visits to full-service restaurant chains (Consumer
Reports website). One of the variables in the study is meal price, the
average amount paid per person for dinner and drinks, minus the tip.
Suppose a reporter for the Sun Coast Times thought that it would be
of interest to her readers to conduct a similar study for restaurants
located on the Grand Strand section in Myrtle Beach, South Carolina.
The reporter selected a sample of eight seafood restaurants, eight
Italian restaurants, and eight steakhouses.
● AndersonData\Ch 13 Exp. Design\GrandStand.xlsx
● Use 𝛼 = .05 to test whether there is a significant difference among the
mean meal prices for the three types of restaurants.


Exercise 6

● The U.S. Census Bureau computes quarterly vacancy and
homeownership rates by state and metropolitan statistical area. Each
metropolitan statistical area (MSA) has at least one urbanized area of
50,000 or more inhabitants. The following data are the rental vacancy
rates (%) for MSAs in four geographic regions of the United States
for the first quarter of 2008 (U.S. Census Bureau website, January
2009).
● AndersonData\Ch 13 Exp. Design\RentalVacancy.xlsx

● Use α = .05 to test whether the mean vacancy rate is the same
for each geographic region.

● Note: The groups contain unequal numbers of values. This is
permitted in One-Way ANOVA.

Two-Way ANOVA

● This may be seen as a test conducted on a pivot table where the cells
contain the average or sum of some response variable (instead of a count).
● The TSS is split into SS due to Rows, SS due to Columns, and SS due to
Error.
● There will be two sets of hypotheses:
● H01: The means across the rows are equal.
● Ha1: At least one of the row means is not equal to the others.
● H02: The means across the columns are equal.
● Ha2: At least one of the column means is not equal to the others.
● The conclusions can be drawn based on the respective p-values.
● Note: The "design of this experiment" is called a Randomized Block
Design (RBD).
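The split of the TSS described above can be sketched in Python for a table with one value per cell. The 3 × 3 table is hypothetical and the variable names are illustrative. Only the sums of squares and the two F statistics are computed here. The p-values would be read from Excel's "Anova: Two-Factor Without Replication" output.

```python
# Hypothetical data: rows are blocks, columns are treatments, one value per cell.
table = [
    [8, 10, 12],
    [10, 13, 13],
    [12, 13, 17],
]

n_rows, n_cols = len(table), len(table[0])
grand_mean = sum(map(sum, table)) / (n_rows * n_cols)
row_means = [sum(row) / n_cols for row in table]
col_means = [sum(col) / n_rows for col in zip(*table)]

# Total sum of squares and its three components.
tss = sum((x - grand_mean) ** 2 for row in table for x in row)
ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
ss_error = tss - ss_rows - ss_cols  # by subtraction

df_rows, df_cols = n_rows - 1, n_cols - 1
df_error = df_rows * df_cols

ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error  # tests H01 (row means equal)
f_cols = (ss_cols / df_cols) / ms_error  # tests H02 (column means equal)

print(f"TSS = {tss}, SS rows = {ss_rows}, SS cols = {ss_cols}, SS error = {ss_error}")
print(f"F rows = {f_rows}, F cols = {f_cols}, error df = {df_error}")
```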

Exercise 7

● The Scholastic Aptitude Test (SAT) contains three parts: critical
reading, mathematics, and writing. Each part is scored on an 800-point
scale. Information on test scores for the 2009 version of the SAT
is available at the College Board website. A sample of SAT scores for
six students is in:
● AndersonData\Ch 13 Exp. Design\SATScores.xlsx

a. Using a .05 level of significance, do students perform differently on
the three portions of the SAT?
b. Which portion of the test seems to give the students the most trouble?
Explain.

Solution 7

● For rows, since the p-value < 0.05, we reject H01. That means at least
one of the parts of the SAT has a mean score different from the
others.
● For columns, since the p-value < 0.05, we reject H02. That means at
least one of the students has a mean score different from the others.

Factorial Experiments

● This is an extension of two-way ANOVA, where you may also test for the
interaction effect between the row and column variables (combination
means).
● For this, there should be more than one row of data corresponding to
each combination of row and column variable values. Every such row is
called a replication.
● There will be 3 sets of hypotheses, the first two being the same as in
Two-Way ANOVA, and the third (H03) stating that there is no interaction
between the row and column variables.

● Reject H0i if p-valuei ≤ 𝛼.
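A minimal Python sketch of the extra interaction term follows, assuming a balanced layout (the same number of replications in every cell). The 2 × 2 data with two replications per cell and all names are hypothetical. As before, only sums of squares and F statistics are computed, and the p-values would come from Excel's "Anova: Two-Factor With Replication" output.

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Hypothetical balanced data: (row level, column level) -> replications.
cells = {
    ("A", "X"): [10, 12],
    ("A", "Y"): [14, 16],
    ("B", "X"): [13, 15],
    ("B", "Y"): [23, 25],
}

row_levels = sorted({r for r, _ in cells})
col_levels = sorted({c for _, c in cells})
reps = len(next(iter(cells.values())))
grand = mean(x for obs in cells.values() for x in obs)

def level_mean(index, level):
    """Mean of all values whose row (index 0) or column (index 1) matches."""
    return mean(x for key, obs in cells.items() if key[index] == level for x in obs)

ss_rows = len(col_levels) * reps * sum((level_mean(0, r) - grand) ** 2 for r in row_levels)
ss_cols = len(row_levels) * reps * sum((level_mean(1, c) - grand) ** 2 for c in col_levels)
ss_cells = reps * sum((mean(obs) - grand) ** 2 for obs in cells.values())
ss_inter = ss_cells - ss_rows - ss_cols  # interaction, by subtraction
ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

df_rows = len(row_levels) - 1
df_cols = len(col_levels) - 1
df_inter = df_rows * df_cols
df_within = len(row_levels) * len(col_levels) * (reps - 1)

ms_within = ss_within / df_within
f_rows = (ss_rows / df_rows) / ms_within
f_cols = (ss_cols / df_cols) / ms_within
f_inter = (ss_inter / df_inter) / ms_within

print(f"SS rows = {ss_rows}, SS cols = {ss_cols}, SS interaction = {ss_inter}, SS within = {ss_within}")
print(f"F rows = {f_rows}, F cols = {f_cols}, F interaction = {f_inter}")
```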


Exercise 8
● Suppose that a local chapter of sales professionals in the greater San
Francisco area conducted a survey of its membership to study the
relationship, if any, between the years of experience and salary for
individuals employed in inside and outside sales positions. On the
survey, respondents were asked to specify one of three levels of years
of experience: low (1–10 years), medium (11–20 years), and high (21
or more years). A portion of the data obtained follow. The complete
data set, consisting of 120 observations, is contained in
AndersonData\Ch 13 Exp. Design\SalesSalary.xlsx
a. Test for any significant differences due to position. Use a .05 level
of significance, ignoring the effect of years of experience.
b. Test for any significant differences due to years of experience. Use
a .05 level of significance ignoring the effect of position.
c. At the .05 level of significance, test for any significant differences
due to position, years of experience, and interaction.

Solution 8

a. Sort by position and paste the salaries corresponding to each
position in a separate column (as in SalesSalaryPosition). Do One-Way
ANOVA on this data. (Ans.: The average salaries vary according
to position.)
b. As in (a), arrange the salaries corresponding to Low, Medium and
High in 3 separate columns and do One-Way ANOVA on this data.
c. Sort the data by Experience and then Position. Rearrange the data so
that salaries corresponding to High, Medium and Low levels of
experience are in separate columns. Also copy the first 41 cells of the
Position column as the first column (the header, then 20 cells for Inside
and 20 cells for Outside).
d. The next slide shows the rest of the analyses as done in Excel.
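The rearrangement in steps (a) and (b), from one row per respondent to one column per group, can also be sketched in Python. The six records and the helper name `columns_by` are hypothetical. The actual 120 observations are in the workbook named in the exercise.

```python
from collections import defaultdict

# Hypothetical rows in the long layout: (salary, position, experience).
records = [
    (50000, "Inside", "Low"),
    (52000, "Outside", "Low"),
    (55000, "Inside", "Medium"),
    (61000, "Outside", "Medium"),
    (58000, "Inside", "High"),
    (66000, "Outside", "High"),
]

def columns_by(rows, key_index):
    """Group salaries into one list (column) per level of the chosen factor."""
    columns = defaultdict(list)
    for row in rows:
        columns[row[key_index]].append(row[0])
    return dict(columns)

by_position = columns_by(records, 1)    # layout for part (a)
by_experience = columns_by(records, 2)  # layout for part (b)

print(by_position)
print(by_experience)
```

Each resulting list corresponds to one column in the layout that One-Way ANOVA expects.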

Solution 8 (Contd.)

● Since the p-value corresponding to Position (Sample) is less than 0.05,
we reject H01 at the 5% level of significance. Similarly, we reject H02
[Experience (Columns)] and H03 (Interaction). [Conclude appropriately].