Statistics For Decision Making in Python: Session 7, Lecture 8 V Shekhar Avasthy, 18 Feb, 2022

Session 7, Lecture 8, BIMTECH, 18 Feb 2022
Statistics for Decision Making in Python

Session 7, Lecture 8
Business Vertical – DA, Trimester III, Batch ‘21-’23
V Shekhar Avasthy, 18th Feb, 2022
Privileged and Confidential. All Rights Reserved © Facts n Data 2022
What do we intend to cover today?
• Quick Recap
• Variance revisited
• F-Test
• ANOVA
Quick recap…
• Irrespective of the shape of the population distribution (population size N > 0), if we plot the MEANS of ALL possible samples of size ‘n’ (0 < n < N), then the distribution is (almost) always a BELL-shaped curve, and it gets closer to one as ‘n’ grows.
• Because this bell shaped curve is found so often, it is also called a NORMAL distribution curve.
• This curve has certain properties:
• It is ‘characterized’ by its MEAN and Standard Deviation (SD) – i.e. a given mean and SD shall ALWAYS
lead to only one specific bell-curve.
• It is symmetric about its PEAK.
• Mean = Median = Mode for an IDEAL bell-shaped curve. Even if the population's mean, median and mode do not match, the curve of all possible sample means of any size ‘n’ (approximately) follows this property.
• The PEAK of the curve for every sample size ‘n’ always coincides with the mean of the population.
• The curve has specific properties: the area between the PEAK and a given multiple of the SD on either side is always the same, e.g., every such curve has 95% of the sample means lying between −1.96 SD and +1.96 SD of the peak, and so on.
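• We also studied that $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, i.e., the SD of the curve of ALL possible sample means of size ‘n’ (the standard error, $\sigma_{\bar{x}}$) = Population SD ($\sigma$) / Square Root of ‘n’ (the sample size for which the curve is created). Strictly, this holds when sampling with replacement or when N is much larger than n.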
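A quick simulation can make these properties concrete. The sketch below assumes NumPy and a made-up, deliberately skewed (exponential) population; because listing ALL combinations is infeasible, it draws 20,000 random samples of size n = 40 (with replacement) and checks that the sample means centre on the population mean, that their SD is close to σ/√n, and that roughly 95% of them fall within ±1.96 standard errors.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for a reproducible run

# Hypothetical, deliberately skewed population (exponential): NOT data from the lecture
population = rng.exponential(scale=10.0, size=100_000)
mu = population.mean()      # population mean
sigma = population.std()    # population SD (ddof=0, i.e. divide by N)

n = 40                # sample size 'n'
n_samples = 20_000    # number of random samples drawn (stand-in for ALL combinations)

# Each row holds the indices of one sample of size n, drawn with replacement
idx = rng.integers(0, population.size, size=(n_samples, n))
sample_means = population[idx].mean(axis=1)

se_theory = sigma / np.sqrt(n)       # sigma / sqrt(n)
se_observed = sample_means.std()     # SD of the simulated sample means
share_within = np.mean(np.abs(sample_means - mu) <= 1.96 * se_theory)

print(f"population mean = {mu:.2f}, mean of sample means = {sample_means.mean():.2f}")
print(f"sigma/sqrt(n) = {se_theory:.3f}, SD of sample means = {se_observed:.3f}")
print(f"share of sample means within 1.96 SE of the mean = {share_within:.3f}")
```

Changing `n` shows the spread of the sample means shrinking in proportion to 1/√n, even though the population itself is far from bell-shaped.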

…Quick recap
• Statisticians encounter multiple problems to solve:
• To ESTIMATE the population mean from a sample, the properties listed on the previous slide are exploited (remember the difference between Confidence Level and Confidence Interval).
• To check whether a sample is part of a given population, we may need to check whether the population and the sample are likely to have the same mean and/or variance, because if they have the same mean and variance, they are part of the same curve of size ‘n’, and thus belong to the same population from which this curve is drawn.
• To check whether a sample and a population are likely to have the same mean, we may use the one-sample t-test (provided the data meets the assumptions of the one-sample t-test)
• To check whether TWO given samples are likely to be part of the same population (and are thus similar), we may use the two-sample t-test (provided the data meets the assumptions of the two-sample t-test)
• To check whether sets of categorical data (counts in each category) differ from one another or from expected frequencies, we may use the Chi-Square test
• To check whether the means of three or more sets of continuous data differ, we may use ANOVA (Analysis Of Variance), which compares the variance between the groups with the variance within them (following slides)
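The first three checks map directly onto SciPy functions. A minimal sketch, assuming SciPy is installed and using made-up normal samples (not lecture data): a t-based 95% confidence interval, a one-sample t-test against a hypothetical population mean of 50, and a two-sample t-test. Note that `equal_var=False` selects Welch's variant, which does not assume equal population variances; `equal_var=True` gives the classic pooled test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
# Hypothetical samples (NOT from the lecture)
sample_a = rng.normal(loc=52.0, scale=8.0, size=25)
sample_b = rng.normal(loc=50.0, scale=8.0, size=30)

# 95% confidence interval for the population mean behind sample_a (t-based)
mean_a = sample_a.mean()
sem_a = stats.sem(sample_a)   # standard error s / sqrt(n), with ddof=1
ci_low, ci_high = stats.t.interval(0.95, len(sample_a) - 1, loc=mean_a, scale=sem_a)

# One-sample t-test: H0 says the population mean is 50 (a hypothetical value)
one = stats.ttest_1samp(sample_a, popmean=50.0)

# Two-sample t-test: H0 says both populations have the same mean
two = stats.ttest_ind(sample_a, sample_b, equal_var=False)  # Welch's variant

print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"one-sample t = {one.statistic:.3f}, p = {one.pvalue:.3f}")
print(f"two-sample t = {two.statistic:.3f}, p = {two.pvalue:.3f}")
```

Here 0.95 is the Confidence Level, while `(ci_low, ci_high)` is the Confidence Interval it produces.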
Remember session 2 Slide 23!

• Total Population Variance is the average of the squared differences of the individual values from the mean: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
• Why squared difference? To make the distance from the mean a positive value.
• So, why not take the modulus (absolute value)? E.g., -3.4 could be treated as |-3.4| = 3.4. We square instead because we want to AMPLIFY the difference to highlight values that are far away: a difference of, say, 7 becomes 49!
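To see the amplification at work, the sketch below (NumPy, made-up values) computes the population variance, i.e. the average squared distance from the mean with divisor N, alongside the mean absolute deviation that the modulus approach would give, and compares how much the farthest value contributes to each.

```python
import numpy as np

# Hypothetical values (NOT from the lecture); their mean is exactly 5
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu = values.mean()
deviations = values - mu              # signed distances from the mean

population_variance = np.mean(deviations ** 2)    # average squared distance (divide by N)
mean_abs_deviation = np.mean(np.abs(deviations))  # average |distance| (the "modulus" idea)

# How much of the total does the farthest value (9, distance 4) contribute?
far = np.argmax(np.abs(deviations))
share_squared = deviations[far] ** 2 / np.sum(deviations ** 2)
share_abs = np.abs(deviations[far]) / np.sum(np.abs(deviations))

print(population_variance, mean_abs_deviation)       # 4.0 1.5
print(round(share_squared, 3), round(share_abs, 3))  # 0.5 0.333
```

The farthest value supplies half of the squared total but only a third of the absolute total. `np.var(values)` gives the same population variance; `np.var(values, ddof=1)` divides by N − 1 instead, giving the sample variance.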
F-Test
• The test is based on the F distribution, also called Snedecor's F distribution or the Fisher–Snedecor distribution
• The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the
variance ratio in the 1920s.
• It is often desirable to compare two variances rather than two averages. For instance:
• College administrators would like two college professors grading exams to have the same variation in their grading.
• In order for a lid to fit a container, the variation in the lid and the container should be approximately the same.
• A service organization such as a car dealership would like to provide customers with a consistent service experience, implying that the variance in service delivery across different workshops or service advisors should be nearly the same.
• Assumptions/ Necessary Conditions:

1. The populations from which the two samples are drawn are approximately normally distributed.
2. The two populations are independent of each other.
• Unlike most other hypothesis tests, the F test for equality of two variances is very sensitive to departures from normality. If the two (POPULATION) distributions are NOT normal, or close to normal, the test can give misleading results.
Hypotheses for the F-Test
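The usual forms of the hypotheses are summarised below. If the two population variances are equal, the statistic $F = s_1^2 / s_2^2$ follows an F distribution with $n_1 - 1$ (numerator) and $n_2 - 1$ (denominator) degrees of freedom.

$$
\begin{aligned}
&\text{Two-tailed:} && H_0:\ \sigma_1^2 = \sigma_2^2 && H_a:\ \sigma_1^2 \neq \sigma_2^2\\
&\text{Right-tailed:} && H_0:\ \sigma_1^2 \le \sigma_2^2 && H_a:\ \sigma_1^2 > \sigma_2^2\\
&\text{Left-tailed:} && H_0:\ \sigma_1^2 \ge \sigma_2^2 && H_a:\ \sigma_1^2 < \sigma_2^2
\end{aligned}
$$

A left-tailed test can also be run in the upper tail by inverting the ratio to $F = s_2^2 / s_1^2$ with $(n_2 - 1, n_1 - 1)$ degrees of freedom; the example that follows takes this route.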
Example
PROBLEM:
Two college instructors are interested in whether or not there is any variation in the way they grade math exams. They
each grade the same set of 10 exams. The first instructor's grades have a variance of 52.3. The second instructor's grades
have a variance of 89.9. Test the claim that the first instructor's variance is smaller. The level of significance is 10%.
SOLUTION:
Let 1 and 2 be the subscripts that indicate the first and second instructor, respectively; $n_1 = n_2 = 10$.
$H_0: \sigma_1^2 \ge \sigma_2^2$ and $H_a: \sigma_1^2 < \sigma_2^2$
Calculate the test statistic:

Because Ha claims that $\sigma_2^2$ is the larger variance, $s_2^2$ goes in the numerator, so large values of F favour Ha. The F statistic is:
$F_c = s_2^2 / s_1^2 = 89.9 / 52.3 = 1.719$
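Look up the critical value in the F table for α = 0.10 with (9, 9) degrees of freedom: $F_{critical} \approx 2.44$.
Since $F_c = 1.719 < F_{critical}$, we do not reject H0: at the 10% level of significance there is not enough evidence that the first instructor's variance is smaller.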
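Instead of a printed F table, the critical value and p-value can be read from SciPy's F distribution. A minimal sketch, assuming SciPy is installed, using the figures from this example; it also computes the equivalent left-tailed form ($s_1^2 / s_2^2$) to show that both orientations give the same p-value.

```python
from scipy import stats

# Figures taken from the example above
s1_sq, s2_sq = 52.3, 89.9   # sample variances of instructors 1 and 2
n1 = n2 = 10
alpha = 0.10

# Ha: sigma1^2 < sigma2^2, so s2^2 goes on top and large F values favour Ha
f_stat = s2_sq / s1_sq
df_num, df_den = n2 - 1, n1 - 1
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)   # upper-tail critical value
p_value = stats.f.sf(f_stat, df_num, df_den)      # P(F >= f_stat) under H0

# Same test written left-tailed: F = s1^2 / s2^2, small values favour Ha
p_value_left = stats.f.cdf(s1_sq / s2_sq, n1 - 1, n2 - 1)

reject_h0 = f_stat > f_crit
print(f"F = {f_stat:.3f}, F critical = {f_crit:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With (9, 9) degrees of freedom and α = 0.10, `stats.f.ppf` returns a critical value of roughly 2.44, and the p-value exceeds 0.10.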
INTRODUCTION TO ANOVA
• H0 true: all group means are nearly equal; the minor differences are due to random variation.
• HA true: the group means are very different; the differences are unlikely to be due to random variation.
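A small simulation with made-up normal data can illustrate the two cases: when all groups come from populations with the same mean, the ANOVA F statistic tends to be small; when the population means differ, it becomes large. The sketch assumes SciPy's `stats.f_oneway`, which performs a one-way ANOVA; the group sizes and means are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
size, sd = 20, 1.0  # 20 observations per group, same spread in every group

# Case 1 (H0 true): three groups drawn from populations with the SAME mean
same = [rng.normal(10.0, sd, size) for _ in range(3)]
# Case 2 (HA true): three groups drawn from populations with DIFFERENT means
diff = [rng.normal(m, sd, size) for m in (8.0, 10.0, 12.0)]

f_same, p_same = stats.f_oneway(*same)
f_diff, p_diff = stats.f_oneway(*diff)
print(f"H0 true: F = {f_same:.2f}, p = {p_same:.3f}")
print(f"HA true: F = {f_diff:.2f}, p = {p_diff:.2e}")
```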
STEPS FOR ANOVA
Steps for ANOVA (Contd)
Steps for ANOVA (Contd)…
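For reference, a standard one-way ANOVA runs as follows, in notation assumed here rather than taken from the slides: k groups, $n_j$ observations in group j, N observations in total, $\bar{x}_j$ the mean of group j and $\bar{x}$ the total mean. H0 is rejected when F exceeds the critical value from the F table at the chosen level of significance.

$$
\begin{aligned}
&H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k, \qquad H_a:\ \text{at least one } \mu_j \text{ differs}\\
&SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}\right)^2 = SSB + SSW\\
&SSB = \sum_{j=1}^{k} n_j\left(\bar{x}_j - \bar{x}\right)^2, \qquad SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij} - \bar{x}_j\right)^2\\
&MSB = \frac{SSB}{k - 1}, \qquad MSW = \frac{SSW}{N - k}, \qquad F = \frac{MSB}{MSW} \sim F_{k-1,\,N-k} \ \text{under } H_0
\end{aligned}
$$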
EXAMPLE 1
BIMTECH students grew tomato plants under different soil cover conditions. Each group of three plants received one of the following treatments: bare soil / a commercial ground cover / black plastic / straw / compost.
All plants grew under the same conditions and were of the same variety. Students recorded the weight (in grams) of tomatoes produced by each of the n = 15 plants:
1. Plot the points

[Figure: the 15 observations plotted (x-axis 0 to 16, y-axis 0 to 10,000), with the total mean of 6147 marked]
2. Calculate the total variation: the sum of (distance from the total mean)² over all 15 observations

[Figure: the 15 observations and the total mean of 6147 (y-axis 0 to 10,000)]
[Figure: the 15 observations with the group means marked: Grp 1 = 3512, Grp 2 = 5504, Grp 3 = 6324, and 7591 and 7804 (both labelled "Grp 4" on the chart)]
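The group means on the chart are enough to get the between-group part of the ANOVA table. A minimal NumPy sketch, assuming the five plotted means are the five treatment means (each from 3 plants) and taking them exactly as printed, so the result inherits any rounding on the chart. The within-group part needs the individual weights, so it is not computed here.

```python
import numpy as np

# Group means read off the chart above; each treatment group has 3 plants
group_means = np.array([3512, 5504, 6324, 7591, 7804], dtype=float)
n_per_group = 3
k = group_means.size            # number of groups (5 treatments)

# With equal group sizes, the total mean equals the mean of the group means
grand_mean = group_means.mean()

# Between-group sum of squares: sum of n_j * (group mean - total mean)^2
ssb = np.sum(n_per_group * (group_means - grand_mean) ** 2)
msb = ssb / (k - 1)             # between-group mean square, df = k - 1 = 4

print(grand_mean, ssb, msb)     # 6147.0 36656364.0 9164091.0
```

The computed total mean, 6147, matches the total mean shown on the earlier chart.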
Populating the ANOVA Table
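Once SSB and SSW are known, the ANOVA table follows mechanically. A minimal sketch with NumPy and SciPy, using toy data (NOT the tomato data); the helper name `one_way_anova_table` is hypothetical, and SciPy's `stats.f_oneway` is used only as a cross-check of the F statistic.

```python
import numpy as np
from scipy import stats

def one_way_anova_table(groups):
    """Return sums of squares, degrees of freedom, mean squares, F and p for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n_total = len(groups), all_values.size

    ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
    ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)            # within groups
    sst = np.sum((all_values - grand_mean) ** 2)                      # total

    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    f_stat = msb / msw
    p_value = stats.f.sf(f_stat, df_b, df_w)   # upper-tail p-value
    return {
        "Between": (ssb, df_b, msb),
        "Within": (ssw, df_w, msw),
        "Total": (sst, n_total - 1),
        "F": f_stat,
        "p": p_value,
    }

# Toy data, NOT the lecture's tomato data
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = one_way_anova_table(toy)
for source in ("Between", "Within", "Total"):
    print(f"{source:<8}", *table[source])
print("F =", table["F"], "p =", table["p"])

# Cross-check against SciPy's built-in one-way ANOVA
check = stats.f_oneway(*toy)
```

For the toy data, SSB = 54 and SSW = 6 add up to SST = 60, and F = 27 agrees with `stats.f_oneway`.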
EXAMPLE 2
Thank You!
shekhar@factsNdata.com / 9810228402