provided the data distribution is normal and the population standard deviations are unknown.
If we simply compare two groups at a time, we will have to carry out several t-tests, and carrying out multiple tests inflates the overall Type I error probability.
12 DS 1 Lecture Notes
So we instead carry out a single overall test of whether there are differences among the groups, which controls the overall Type I error probability. This overall test is called the Analysis of Variance (ANOVA). Here we compare a quantitative variable across several (more than two) groups at the same time. The grouping (categorical) variable is typically referred to as the factor or treatment.
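To see how quickly the overall Type I error grows, the sketch below computes the probability of at least one false rejection when each of the m = k(k − 1)/2 pairwise t-tests among k groups is run at level α. It assumes, for simplicity, that the tests are independent. Pairwise tests that share groups are not actually independent, so treat the numbers as a rough illustration; the Bonferroni bound m·α is always an upper bound. The function name `familywise_error` is illustrative only.

```python
# Probability of at least one Type I error across m tests, each at level alpha,
# ASSUMING the tests are independent (a simplification for illustration).
from math import comb


def familywise_error(alpha, m):
    """1 - (1 - alpha)^m: chance of at least one false rejection in m independent tests."""
    return 1 - (1 - alpha) ** m


alpha = 0.05
for k in (3, 4, 5):
    m = comb(k, 2)  # number of pairwise t-tests among k groups
    print(f"k = {k}: {m} pairwise tests, familywise error ~ {familywise_error(alpha, m):.3f}")
```

With α = 0.05 this gives roughly 0.143, 0.265 and 0.401 for k = 3, 4 and 5 groups, compared with 0.05 for a single overall test.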
Example: Is the amount of time spent sleeping (per day) related to the CGPA (A, B, C) of IIMV students? We can start by visualizing the data using side-by-side boxplots. Suppose the following side-by-side boxplots display the time that 30 randomly selected IIMV students spend sleeping per day, categorized by their CGPA (A, B, or C).
FIGURE 2.1
Questions:
• Does the amount of time spent sleeping appear to be related to CGPA?
• How can we prove or disprove such a claim without inflating the overall Type I error?
Example: Suppose that last Monday, beverage sales at the Moonbucks outlet at Lucknow airport were Rs. 20000 and those at the Varanasi airport outlet were Rs. 15000, a difference of Rs. 5000. What is a major “explanation” for this difference between the two sales? Is this factor the ONLY reason why the two sales differ? Will the difference between the sales at Lucknow and Vizag also be Rs. 5000?
Analysis of Variance: Testing Equality of Means across groups 13
Assumptions of ANOVA
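Suppose independent random samples of sizes $n_1, n_2, \ldots, n_k$ are drawn from the $k$ populations such that each population is normally distributed and all $k$ populations have a common variance $\sigma^2$.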
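Let $x_{ij}$ denote the $j$-th observation in the sample from population $i$, and let $\bar{x}_i$ be the mean of sample $i$. Denote the overall mean by $\bar{x}$ and let $n = \sum_{i=1}^{k} n_i$. Then
$$\mathrm{SS(Total)} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2,$$
$$\mathrm{SS}(P) = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2,$$
$$\mathrm{SSE} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2.$$
These are summarized in the one-way ANOVA table:
$$
\begin{array}{lccccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F & p\text{-value}\\
\hline
\text{Between} & k-1 & \mathrm{SS}(P) & \mathrm{MS}(P)=\dfrac{\mathrm{SS}(P)}{k-1} & F=\dfrac{\mathrm{MS}(P)}{\mathrm{MSE}} & P(F_{k-1,\,n-k}\ge F)\\
\text{Within} & n-k & \mathrm{SSE} & \mathrm{MSE}=\dfrac{\mathrm{SSE}}{n-k} & &
\end{array}
$$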
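The decomposition behind this table can be checked in one line. Adding and subtracting the group mean splits each deviation from the overall mean into a within-group part and a between-group part, and the cross term vanishes because deviations from a group mean sum to zero within each group. This is a standard derivation sketched here for reference, not taken from the notes:

$$
\begin{aligned}
\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2
&= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\big[(x_{ij}-\bar{x}_i)+(\bar{x}_i-\bar{x})\big]^2\\
&= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2}_{\mathrm{SSE}}
 + \underbrace{\sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2}_{\mathrm{SS}(P)}
 + 2\sum_{i=1}^{k}(\bar{x}_i-\bar{x})\underbrace{\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)}_{=\,0}.
\end{aligned}
$$

The degrees of freedom split the same way, $n-1 = (k-1) + (n-k)$. Under the ANOVA assumptions and the null hypothesis that all $k$ population means are equal, $\mathrm{MS}(P)$ and $\mathrm{MSE}$ both estimate the common variance $\sigma^2$, so $F$ should be near 1; a large $F$ is evidence that at least one mean differs.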
Things to remember:
• We call it “one-way” because the k populations differ with respect to a single “categorical” feature.
Example 1: Data on average beverage sales (in Rs. 1000) at the Moonbucks outlets at the Jaipur, Lucknow and Vizag airports, over 3-hour time periods from 4:30 AM to 10:30 PM on Mondays in Q1 of 2019, are given below:
Jaipur Lucknow Vizag
3.1 4.2 3.3
2.5 2.5 2.6
2.2 1.7 1.7
1.5 3.5 3.9
0.7 1.2 2.8
2.4 3.1 3.5
Is there evidence, at the 5% level of significance, that the average Moonbucks beverage sales at these three airports differ? Proceed using Excel:
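Alongside the Excel steps, the same one-way ANOVA can be computed directly from the SS(Total), SS(P) and SSE formulas. The sketch below is plain Python with no libraries, and the function names are hypothetical. For the p-value it uses the closed form P(F with 2 and d degrees of freedom > f) = (1 + 2f/d)^(−d/2), which holds only when the numerator degrees of freedom equal 2 (k = 3 groups, as in this example); other cases would need a library routine such as `scipy.stats.f.sf(f, dfn, dfd)`.

```python
# One-way ANOVA for Example 1, computed from the SS(Total), SS(P), SSE formulas.


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples (one list per population)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_p = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_p, df_e = k - 1, n - k
    ms_p, mse = ss_p / df_p, sse / df_e
    return {
        "SS_total": ss_total, "SS_P": ss_p, "SSE": sse,
        "df_P": df_p, "df_E": df_e,
        "MS_P": ms_p, "MSE": mse, "F": ms_p / mse,
    }


def f_pvalue_df1_2(f_stat, df2):
    """P(F_{2, df2} > f_stat). Closed form valid ONLY when the numerator df is 2."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


jaipur = [3.1, 2.5, 2.2, 1.5, 0.7, 2.4]
lucknow = [4.2, 2.5, 1.7, 3.5, 1.2, 3.1]
vizag = [3.3, 2.6, 1.7, 3.9, 2.8, 3.5]

res = one_way_anova([jaipur, lucknow, vizag])
assert abs(res["SS_total"] - (res["SS_P"] + res["SSE"])) < 1e-9  # SS(Total) = SS(P) + SSE
assert res["df_P"] == 2  # k = 3 groups, so the closed-form p-value applies
p_value = f_pvalue_df1_2(res["F"], res["df_E"])

print(f"F = {res['F']:.3f} on ({res['df_P']}, {res['df_E']}) df, p = {p_value:.3f}")
```

On the Example 1 data this gives SS(P) ≈ 2.564, SSE ≈ 12.947 and F ≈ 1.49 on (2, 15) degrees of freedom, with p ≈ 0.26. By this calculation, there is no evidence at the 5% level that mean sales differ across the three airports.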
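Total SS = Population SS + Block SS + Error SS, or $\mathrm{SS(Total)} = \mathrm{SS}(P) + \mathrm{SS}(B) + \mathrm{SSE}$. Here there are $K$ populations and $B$ blocks, $x_{ij}$ is the observation for population $i$ in block $j$, $\bar{x}_i$ is the mean of population $i$, $\bar{y}_j$ is the mean of block $j$ and $\bar{x}$ is the overall mean:
$$\mathrm{SS(Total)} = \sum_{i=1}^{K}\sum_{j=1}^{B}(x_{ij}-\bar{x})^2,$$
$$\mathrm{SS}(P) = B\sum_{i=1}^{K}(\bar{x}_i-\bar{x})^2,$$
$$\mathrm{SS}(B) = K\sum_{j=1}^{B}(\bar{y}_j-\bar{x})^2,$$
$$\mathrm{SSE} = \sum_{i=1}^{K}\sum_{j=1}^{B}(x_{ij}-\bar{x}_i-\bar{y}_j+\bar{x})^2.$$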
As in one-way ANOVA, the test statistic for testing differences between treatments is $F = \mathrm{MS}(P)/\mathrm{MSE} \sim F_{K-1,\,(B-1)(K-1)}$. The results are summarized in an ANOVA table.
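Laid out like the one-way table, the randomized block ANOVA table might look as follows. The degrees of freedom are chosen to match the F distribution above and add up to the total, $(K-1)+(B-1)+(K-1)(B-1) = KB-1$. The block row is included for completeness, and the exact layout of the Excel output may differ.

$$
\begin{array}{lcccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F\\
\hline
\text{Populations} & K-1 & \mathrm{SS}(P) & \mathrm{MS}(P)=\dfrac{\mathrm{SS}(P)}{K-1} & \dfrac{\mathrm{MS}(P)}{\mathrm{MSE}}\\
\text{Blocks} & B-1 & \mathrm{SS}(B) & \mathrm{MS}(B)=\dfrac{\mathrm{SS}(B)}{B-1} & \\
\text{Error} & (K-1)(B-1) & \mathrm{SSE} & \mathrm{MSE}=\dfrac{\mathrm{SSE}}{(K-1)(B-1)} & \\
\hline
\text{Total} & KB-1 & \mathrm{SS(Total)} & &
\end{array}
$$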
Example 1 (continued): Suppose the data in Example 1 were obtained from 6 blocks, the 3-hour time periods:
Blocks Jaipur Lucknow Vizag
4:30 AM - 7:30 AM 3.1 4.2 3.3
7:30 AM - 10:30 AM 2.5 2.5 2.6
10:30 AM - 1:30 PM 2.2 1.7 1.7
1:30 PM - 4:30 PM 1.5 3.5 3.9
4:30 PM - 7:30 PM 0.7 1.2 2.8
7:30 PM - 10:30 PM 2.4 3.1 3.5
We proceed through the same steps using Excel:
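The randomized block calculation can also be sketched in plain Python from the SS(P), SS(B) and SSE formulas. The rows of the table are the blocks j and the airports are the populations i. As in the one-way case, the closed-form p-value (1 + 2F/d)^(−d/2) is valid only because K − 1 = 2 here. This is a sketch under those assumptions, not a substitute for the Excel steps.

```python
# Randomized block (two-way, no interaction) ANOVA for Example 1.
# Rows are blocks j = 1..B (3-hour periods); columns are populations i = 1..K.
rows = [
    [3.1, 4.2, 3.3],  # 4:30 AM - 7:30 AM
    [2.5, 2.5, 2.6],  # 7:30 AM - 10:30 AM
    [2.2, 1.7, 1.7],  # 10:30 AM - 1:30 PM
    [1.5, 3.5, 3.9],  # 1:30 PM - 4:30 PM
    [0.7, 1.2, 2.8],  # 4:30 PM - 7:30 PM
    [2.4, 3.1, 3.5],  # 7:30 PM - 10:30 PM
]
airports = ["Jaipur", "Lucknow", "Vizag"]

B = len(rows)      # number of blocks
K = len(airports)  # number of populations

grand_mean = sum(x for row in rows for x in row) / (K * B)
pop_means = [sum(row[i] for row in rows) / B for i in range(K)]  # x-bar_i
block_means = [sum(row) / K for row in rows]                     # y-bar_j

ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
ss_p = B * sum((m - grand_mean) ** 2 for m in pop_means)
ss_b = K * sum((m - grand_mean) ** 2 for m in block_means)
sse = sum(
    (rows[j][i] - pop_means[i] - block_means[j] + grand_mean) ** 2
    for j in range(B)
    for i in range(K)
)
# The decomposition SS(Total) = SS(P) + SS(B) + SSE holds up to rounding.
assert abs(ss_total - (ss_p + ss_b + sse)) < 1e-9

df_p, df_e = K - 1, (K - 1) * (B - 1)
ms_p, mse = ss_p / df_p, sse / df_e
f_stat = ms_p / mse

# Closed-form P(F_{2, df_e} > f_stat); valid ONLY because df_p = K - 1 = 2 here.
assert df_p == 2
p_value = (1.0 + 2.0 * f_stat / df_e) ** (-df_e / 2.0)

print(f"SS(P) = {ss_p:.3f}, SS(B) = {ss_b:.3f}, SSE = {sse:.3f}")
print(f"F = {f_stat:.3f} on ({df_p}, {df_e}) df, p = {p_value:.3f}")
```

On these data the sketch gives SS(B) ≈ 8.318 and SSE ≈ 4.629, so F ≈ 2.77 on (2, 10) degrees of freedom with p ≈ 0.11. Removing the time-of-day variation shrinks the error term and raises F above the one-way value for the same data (about 1.49), though by this calculation the difference between airports is still not significant at the 5% level.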
• Comparing several routes for driving to work. Here the response variable is driving time. A confounding factor may be the day of the week. One-way ANOVA treats all driving days for each route as equivalent; two-way ANOVA allows for differences due to the day of the week by treating it as a block.
• Consider an experiment comparing the sales of different oil brands in Reliance Fresh stores in different states in India. Here the response variable is ............., the “population” is ................, and the block is ..............
• Comparing 3 methods of rounding first base in baseball: Round Out, Narrow Angle, and Wide Angle. The one-way ANOVA design assigns each method to some runners, but the runners may differ in their running skills. So, for a two-way ANOVA design, the blocks can be ..............