2022
• Quick Recap
• Variance revisited
• F-Test
• ANOVA
Quick recap…
• Irrespective of how a population of size N (N > 0) is distributed, if we plot the MEANS of ALL possible samples
of size ‘n’ (0 < n < N), the resulting distribution is (almost) always a BELL shaped curve.
• Because this bell shaped curve is found so often, it is also called a NORMAL distribution curve.
• This curve has certain properties:
• It is ‘characterized’ by its MEAN and Standard Deviation (SD) – i.e. a given mean and SD shall ALWAYS
lead to only one specific bell-curve.
• It is symmetric about its PEAK.
• Mean = Median = Mode for an IDEAL bell shaped curve. Even if the population's own mean, median and mode
do not match, the curve of all possible sample means of any size ‘n’ follows this property.
• The PEAK of the curve, for any sample size ‘n’, always coincides with the mean of the population.
• The area under the curve within a given multiple of SD on either side of the PEAK is always the
same, e.g., every such curve has 95% of its sample means lying within ±1.96 SD of the PEAK, and so on.
• We also studied that SD of the curve = σ / SQRT(n), i.e., SD of the curve of means of ALL possible samples of
size ‘n’ = Population SD (σ) / Square Root of ‘n’ (the sample size for which the curve is created).
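The properties above can be checked on a toy case. The sketch below is only an illustration: the six-value population is hypothetical, and samples are drawn WITH replacement (an assumption; without replacement a finite-population correction would apply). It lists every possible sample of size 2, then compares the mean and SD of the sample means with the population values.

```python
import math
from itertools import product
from statistics import NormalDist, pstdev

# Hypothetical population (not from the slides): faces of a die.
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

# Every possible sample of size n, drawn with replacement (assumption).
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]

pop_mean = sum(population) / len(population)
pop_sd = pstdev(population)

mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = pstdev(sample_means)

# Share of a normal curve lying within 1.96 SD on either side of its peak.
coverage = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)

print("population mean:", pop_mean, "| mean of sample means:", mean_of_means)
print("population SD / SQRT(n):", pop_sd / math.sqrt(n), "| SD of sample means:", sd_of_means)
print("share within +/-1.96 SD:", round(coverage, 4))
```

With only n = 2 the curve of means is a triangle rather than a bell; the bell shape appears as ‘n’ grows, while the mean and SD relations already hold.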
…Quick recap
• Statisticians encounter multiple problems to solve:
• To ESTIMATE the population mean from a sample, the properties listed on the previous slide are exploited
(remember the difference between Confidence Level and Confidence Interval).
• To check if a sample is part of a given population, we may need to check whether the population and the
sample are likely to have the same mean and/or variance, because if they do, the sample sits on the same
curve of size ‘n’, and thus belongs to the population from which this curve is drawn.
• To check if a sample and a population are likely to have the same mean, we may use the one-sample t-test
(provided the data meets the assumptions of the one-sample t-test).
• To check if TWO given samples are likely to be part of the same population (and thus are similar), we may
use the two-sample t-test (provided the data meets the assumptions of the two-sample t-test).
• To check for differences between sets of categorical data (observed vs. expected counts), we may use the Chi-Square test.
• To check for differences between the means of sets of continuous data by comparing variances, we may use
ANOVA (Analysis Of Variance) - following slides.
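The first point above (estimating the population mean from a sample) can be sketched as follows. This is a minimal illustration, assuming the population SD is known and the curve of sample means is normal; the function name and the numbers (sample mean 50, population SD 10, n = 25) are hypothetical and not from the slides.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, population_sd, n, confidence=0.95):
    """Interval around the sample mean for a given confidence level.

    Assumes a known population SD and a normal curve of sample means.
    """
    # Multiple of SD that leaves (1 - confidence) / 2 in each tail, e.g. 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * population_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs, for illustration only.
low, high = mean_confidence_interval(sample_mean=50.0, population_sd=10.0, n=25)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Here 95% is the Confidence Level, and the printed range is the Confidence Interval.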
F-Test
• The F-test is based on the F distribution, also called Snedecor's F distribution or the Fisher–Snedecor distribution.
• The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher, who initially developed the statistic as the
variance ratio in the 1920s.
• It is often desirable to compare two variances rather than two averages. For instance:
• College administrators would like two college professors grading exams to have the same variation in their grading.
• In order for a lid to fit a container, the variation in the lid and the container should be approximately the same.
• A service organization such as a car dealership would like to provide customers with a consistent service experience,
implying that the variance between the delivery standards of different workshops or service advisors should be nearly
the same.
Example
PROBLEM:
Two college instructors are interested in whether or not there is any variation in the way they grade math exams. They
each grade the same set of 10 exams. The first instructor's grades have a variance of 52.3. The second instructor's grades
have a variance of 89.9. Test the claim that the first instructor's variance is smaller. The level of significance is 10%.
SOLUTION:
Let 1 and 2 be the subscripts that indicate the first and second instructor, respectively. n1 = n2 = 10.
H0: σ1² ≥ σ2² and Ha: σ1² < σ2²
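The rest of the solution is not on the slide; one way to finish it is sketched below. It takes the variance ratio F = s1²/s2² with (n1 - 1, n2 - 1) degrees of freedom and, because Ha is "less than", reads the p-value from the left tail. The helper functions f_pdf and f_cdf are illustrative names written for this sketch (a simple numerical integration of the F density), not part of the original material, and the test assumes both sets of grades are roughly normally distributed.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by Simpson's rule on [0, x]; steps must be even, d1 > 2 assumed."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

# Values from the problem statement.
var_1, var_2 = 52.3, 89.9
n_1 = n_2 = 10
alpha = 0.10

f_stat = var_1 / var_2            # variance ratio
df_1, df_2 = n_1 - 1, n_2 - 1     # degrees of freedom
p_value = f_cdf(f_stat, df_1, df_2)  # left tail, because Ha: σ1² < σ2²
reject_h0 = p_value < alpha

print(f"F = {f_stat:.4f}, df = ({df_1}, {df_2}), p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the p-value is larger than the 10% level of significance, so H0 is not rejected: the data do not give enough evidence that the first instructor's variance is smaller.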
INTRODUCTION TO ANOVA
• H0 True: all means are nearly similar; minor differences are due to random variations.
• HA True: all means are very different; differences are unlikely to be due to random variations.
Privileged and Confidential. All Rights Reserved © Facts n Data 2022
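The two situations (H0 True vs. HA True) can be told apart with the one-way ANOVA F ratio: variance BETWEEN group means divided by variance WITHIN groups. The sketch below is a minimal hand computation, not a full ANOVA table; the function name and both small data sets are hypothetical, chosen so that the within-group spread is identical and only the distance between group means changes.

```python
def one_way_anova_f(groups):
    """Return the F ratio (mean square between / mean square within)."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: random variation inside each group.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: same spread inside every group in both cases.
close_means = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
far_means = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

print("F when group means are close:", one_way_anova_f(close_means))
print("F when group means are far apart:", one_way_anova_f(far_means))
```

A small F is what random variation alone would typically produce (H0 plausible); a very large F is unlikely under random variation alone (evidence for HA). The cut-off comes from the F distribution with (k - 1, N - k) degrees of freedom.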
EXAMPLE 1
BIMTECH students grew tomato plants under different soil cover conditions. Groups of three plants each received one of the
following treatments: bare soil / a commercial ground cover / black plastic / straw / compost.
All plants were of the same variety and grew under otherwise identical conditions. Students recorded the weight (in grams) of
tomatoes produced by each of the n = 15 plants:
[Figure: three plots of the 15 tomato weights (vertical axis 0 to 10,000; horizontal axis 0 to 16). The last plot marks the means:
Grp 1 Mean 3512; Grp 2 Mean 5504; Grp 3 Mean 6324; Grp 4 Mean 7804; Grp 4 Mean 7591 (label as on the slide; presumably Grp 5); Total Mean 6147]
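The group means marked on the plot are enough to work out the BETWEEN-group part of the ANOVA. The sketch below uses only the means shown and the three plants per group stated in the example; it assumes the second "Grp 4" label (7591) is the fifth group. The WITHIN-group part, and therefore the F ratio itself, needs the 15 individual weights, which are not reproduced here.

```python
# Group means as marked on the plot (grams); the last one is labelled
# "Grp 4" a second time on the slide and is assumed here to be group 5.
group_means = [3512, 5504, 6324, 7804, 7591]
plants_per_group = 3  # stated in the example

# With equal group sizes, the grand mean is the mean of the group means.
grand_mean = sum(group_means) / len(group_means)

# Between-group sum of squares and mean square.
ss_between = plants_per_group * sum((m - grand_mean) ** 2 for m in group_means)
df_between = len(group_means) - 1
ms_between = ss_between / df_between

print("Grand mean:", grand_mean)
print("SS between:", ss_between, "| df between:", df_between)
print("MS between:", ms_between)
```

The grand mean computed this way agrees with the Total Mean of 6147 marked on the plot. Dividing MS between by MS within (from the raw weights, with 15 - 5 = 10 degrees of freedom) would give the F ratio for this example.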
EXAMPLE 2
Thank You!
shekhar@factsNdata.com / 9810228402