of Variance
harishram@hotmail.com
KDUE73YTIL
Introduction to Statistics
● Analysis of variance
○ One way ANOVA
■ Total Variation
■ Variation within treatment
■ Variation between treatment
● Appendix
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
● It may also be of interest to know whether the categories are statistically independent
At an emporium, the manager wants to know which age groups visit the mall during the day.
He defines the categories as children (age < 13), teenagers (13 ≤ age < 20),
adults (20 ≤ age < 55) and senior citizens (55 ≤ age). Moreover, he wishes to plan his
inventory of goods accordingly.
He claims that, of all the people who visited, 5% are children, 38% are teenagers, 2%
are senior citizens and the remaining are adults.
● The hypotheses test whether the data fit a specified distribution
● Failing to reject H0 implies that there is no significant difference between the observed
frequencies and the expected frequencies
● The test statistic is χ² = Σ (Oi − Ei)² / Ei, summed over the k classes
where Oi: observed frequency of class i
Ei: estimated (expected) frequency of class i
N: total number of observations
● Under H0, the test statistic follows a χ² distribution with k-p-1 d.f.
where k: number of class frequencies
p: number of parameters estimated for fitting
Ref: Test statistic for goodness of fit (A.1)
Chi-square test for goodness of fit
Decision Rule:
Reject H0 if p-value ≤ α
Question:
At an emporium, the manager wants to know which age groups visit the mall during the day.
He defines the categories as children, teenagers, adults and senior citizens, and plans
his inventory of goods accordingly. He claims that, of all the people who visited, 5% are
children, 38% are teenagers, 2% are senior citizens and the remaining are adults.
From a sample of 180 people, it was seen that 25 were children, 50 were teenagers, 90
were adults and 15 were senior citizens.
Solution:
To test, H0: The manager's claim is correct, against H1: The manager's claim is false
Here there are 4 class frequencies, i.e. k = 4. Since no parameter was estimated, p = 0
From the statistical table for the χ² distribution, χ²(k-p-1),α = χ²(3),0.05 = 7.815
Since the computed test statistic is greater than 7.815, H0 is rejected: the manager's claim
is false, and his claimed proportions differ from what was observed in the data
Test for goodness of fit
Python solution:
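A minimal sketch of the goodness-of-fit calculation for the emporium example, using only the Python standard library. The counts and claimed proportions come from the question; the variable names are illustrative, and the critical value 7.815 is the table value quoted in the solution rather than something computed here.

```python
# Observed counts from the sample of 180 visitors (from the question).
observed = {"children": 25, "teenagers": 50, "adults": 90, "senior citizens": 15}

# Proportions claimed by the manager; adults are the remaining 55%.
claimed = {"children": 0.05, "teenagers": 0.38, "adults": 0.55, "senior citizens": 0.02}

n = sum(observed.values())  # total number of observations (180)

# Expected frequency of each class under H0: n * claimed proportion.
expected = {group: n * share for group, share in claimed.items()}

# Pearson chi-square statistic: sum of (O - E)^2 / E over the classes.
chi_sq = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

num_classes = len(observed)        # k = 4
num_params = 0                     # p = 0, no parameter was estimated
dof = num_classes - num_params - 1 # k - p - 1 = 3

critical_value = 7.815             # chi-square table value for 3 d.f., alpha = 0.05

reject_h0 = chi_sq > critical_value
print(f"chi-square = {chi_sq:.3f}, d.f. = {dof}, reject H0: {reject_h0}")
```

With SciPy available, `scipy.stats.chisquare(f_obs, f_exp)` performs the same calculation and also returns a p-value for the decision rule.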
● The chi-square test is not applicable for testing the goodness of fit of a straight line
or any curve (exponential curve, second-degree curve)
H0: The attributes are independent, against H1: The attributes are dependent
● Under H0, the test statistic follows a χ² distribution with (r-1)(c-1) d.f.
where r is the number of levels of one category and c is the number of levels of the other
Ref: Test statistic for independence of attributes (A.2)
Test for independence of attributes
Question:
A study was conducted to test the effect of the malaria parasite - plasmodium falciparum -
on heterozygous and homozygous humans. The vaccine was given to a cohort of 252
humans. Test whether the heterozygous humans are better protected than the homozygous humans.
                       Infected with             Not infected with
                       plasmodium falciparum     plasmodium falciparum
Heterozygous humans    93                        51
Homozygous humans      68                        40
Solution:
To test, H0: The attributes are independent, against H1: The attributes are dependent
The observed frequencies, with row and column totals, are:
                       Infected with             Not infected with         Total
                       plasmodium falciparum     plasmodium falciparum
Heterozygous humans    93                        51                        144
Homozygous humans      68                        40                        108
Total                  161                       91                        252
The expected frequency of a class is computed as (row total × column total) / grand total.
Similarly compute the expected frequencies for the other classes.
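The expected frequencies and the test statistic for the malaria example can be sketched as follows, using only the standard library. The observed counts come from the question; the variable names are illustrative, and the 5% significance level (table value 3.841 for 1 d.f.) is an assumption, since the question does not state a level.

```python
# Observed 2x2 table from the question: rows are heterozygous and homozygous
# humans; columns are infected and not infected with plasmodium falciparum.
observed = [[93, 51],
            [68, 40]]

row_totals = [sum(row) for row in observed]         # [144, 108]
col_totals = [sum(col) for col in zip(*observed)]   # [161, 91]
grand_total = sum(row_totals)                       # 252

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic summed over all cells of the table.
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

dof = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r-1)(c-1) = 1

critical_value = 3.841  # chi-square table value for 1 d.f., alpha = 0.05 (assumed level)

reject_h0 = chi_sq > critical_value
print(f"expected = {expected}, chi-square = {chi_sq:.4f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls well below the critical value, so H0 is not rejected. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic would differ from this uncorrected value.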
A psychologist wants to study whether the happiness quotient of children in a household is
related to the family income. He collects data on 1300 children. Is there enough evidence to
claim that the two are related?
Ryan wants to study whether all the machines have equal efficiency based on the tensile
strength of the alloy wire.
As the number of t-tests increases, the probability of at least one type I error increases.
However, it is possible to test Ryan's claim by using one way analysis of variance (one way
ANOVA), where the probability of a type I error does not change
● For a true null hypothesis, the probability of not obtaining a significant result is 0.95 if
α = 0.05
● Say you conduct the t-test twice; the probability of not obtaining one or more
significant results is 0.95 x 0.95 = 0.9025
● Thus the probability of at least one type I error is 1 - 0.9025 = 0.0975 (for two t-tests)
As the number of tests increases, the probability of at least one type I error also increases
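The arithmetic above generalises to any number of tests. A small sketch, assuming the tests are independent (pairwise t-tests on shared samples are only approximately so); the function name is illustrative.

```python
from math import comb


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests


alpha = 0.05

# Two t-tests, as in the bullets above: 1 - 0.95 * 0.95.
two_tests = familywise_error_rate(alpha, 2)

# Comparing 4 machines pairwise needs C(4, 2) = 6 t-tests.
num_pairs = comb(4, 2)
six_tests = familywise_error_rate(alpha, num_pairs)

print(f"2 tests: {two_tests:.4f}, {num_pairs} tests: {six_tests:.4f}")
```

For Ryan's four machines, the six pairwise t-tests would push the chance of at least one false rejection to roughly one in four, which is the motivation for a single ANOVA test.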
● R. A. Fisher developed ANOVA while dealing with agronomic data
● The ANOVA technique is used to test the equality of population means for two or more
unrelated samples
● Each sample should be from a normally distributed population
● Failing to reject H0 implies that all treatments have the same average
● Suppose Ryan collects data for the tensile strength of wires produced by each machine
● The four machines A, B, C and D are the treatments, so there are 4 treatments (t = 4)
● In one way ANOVA, the total variation in the data is split into two components: the
variation between treatments and the variation within treatments
● Let xij be the observation in the ith treatment and jth row
● x̄.. is the grand mean, i.e. the mean of all observations
● Let ni be the number of observations in the ith treatment; x̄i. is the mean over the ith
treatment
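The split of the total variation can be checked numerically. The sketch below uses hypothetical readings for four machines (illustrative values only, not the data behind the slides) and illustrative variable names; it computes the total, between-treatment and within-treatment sums of squares from the grand mean and the treatment means.

```python
from statistics import mean

# Hypothetical readings for four machines (illustrative values only).
samples = {
    "A": [10, 12, 14],
    "B": [20, 22, 24],
    "C": [11, 13, 15],
    "D": [19, 21, 23],
}

all_values = [x for group in samples.values() for x in group]
grand_mean = mean(all_values)  # mean of all observations

# Total variation: squared deviations of every observation from the grand mean.
total_ss = sum((x - grand_mean) ** 2 for x in all_values)

# Variation between treatments: treatment means around the grand mean,
# weighted by the number of observations in each treatment.
between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples.values())

# Variation within treatments: observations around their own treatment mean.
within_ss = sum((x - mean(g)) ** 2 for g in samples.values() for x in g)

print(total_ss, between_ss, within_ss)
```

For this hypothetical data the between and within parts add up to the total, which is the identity the ANOVA table relies on.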
● Under H0, the test statistic follows an F-distribution with (dfTr, dfe) degrees of freedom
One way ANOVA
Decision Rule:
Reject H0 if the test statistic is greater than F(t-1,N-t),α
Source of variation | Degrees of freedom | Sum of Squares           | Mean Sum of Squares              | F-ratio
Treatment           | t-1                | Treatment sum of squares | Treatment sum of squares / (t-1) | Mean treatment sum of squares / MSE
Error               | N-t                | ESS                      | MSE = ESS / (N-t)                |
Total               | N-1                | TSS                      |                                  |
The total sum of squares is the sum of the treatment and error sums of squares, so either
one can be obtained as TSS minus the other.
Solution:
Source of variation | Degrees of freedom | Sum of Squares  | Mean Sum of Squares | F-ratio
Error               | N-t = 20-4 = 16    | ESS = 296.06    |                     |
Total               | N-1 = 20-1 = 19    | TSS = 2241.5255 |                     |
Critical value: F(3,16),0.05
One way ANOVA
Python solution:
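A minimal sketch that completes the ANOVA table from the figures given in the solution (N = 20, t = 4, ESS = 296.06, TSS = 2241.5255). The raw tensile-strength readings are not available here, so the treatment sum of squares is obtained as TSS minus ESS; the variable names are illustrative, and the critical value 3.24 is an assumed table lookup for F(3,16) at α = 0.05. Because ESS looks rounded, the derived values are approximate.

```python
# Figures taken from the ANOVA table in the solution.
N, t = 20, 4          # total observations and number of treatments
tss = 2241.5255       # total sum of squares
ess = 296.06          # error sum of squares

df_treatment = t - 1  # 3
df_error = N - t      # 16

# The total variation splits into treatment and error parts,
# so the treatment sum of squares is the difference.
treatment_ss = tss - ess

treatment_ms = treatment_ss / df_treatment  # mean treatment sum of squares
mse = ess / df_error                        # mean error sum of squares
f_ratio = treatment_ms / mse

critical_value = 3.24  # F table value for (3, 16) d.f., alpha = 0.05 (assumed lookup)

reject_h0 = f_ratio > critical_value
print(f"F = {f_ratio:.2f}, d.f. = ({df_treatment}, {df_error}), reject H0: {reject_h0}")
```

When the raw readings are available, `scipy.stats.f_oneway` takes one sequence per machine and returns the F statistic together with its p-value.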
● In the example, Ryan has tested the strength of materials produced by 4 machines
● If we fail to reject the null hypothesis, it implies that all the treatments have the same
effect
● However, if the null hypothesis is rejected, it implies that at least one treatment has a
different average
● A post hoc test is conducted after the null hypothesis of ANOVA is rejected to determine
which treatment(s) differ
● There are various post hoc tests available such as:
○ Tukey’s HSD test (Tukey’s Honest(ly) Significant Difference test)
○ Scheffe test
○ Duncan's Multiple Range test
○ Fisher's LSD test (Fisher's Least Significant Difference test)
○ Bonferroni test
● Consider our example, where Ryan wants to find out which machines had different results
● Each pair of machines is tested for a statistical difference: (A, B), (A, C), (A, D),
(B, C), (B, D) and (C, D)
Post-hoc test
H01: μmachine_A = μmachine_B against H11: μmachine_A ≠ μmachine_B
H02: μmachine_A = μmachine_C against H12: μmachine_A ≠ μmachine_C
H03: μmachine_A = μmachine_D against H13: μmachine_A ≠ μmachine_D
H04: μmachine_B = μmachine_C against H14: μmachine_B ≠ μmachine_C
H05: μmachine_B = μmachine_D against H15: μmachine_B ≠ μmachine_D
H06: μmachine_C = μmachine_D against H16: μmachine_C ≠ μmachine_D
The critical difference for the Tukey HSD test is HSD = q(α; t, f) × √(MSE / n), where
q(α; t, f): critical value of the studentized range distribution
t: total treatments
f: error degrees of freedom
MSE: Mean error sum of squares (from ANOVA table)
n: number of observations in a group
True: reject H0
It can be seen that there is a statistically significant difference between the machine
pairs (A,B), (A,C), (B,D) and (C,D).
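A sketch of the pairwise comparison using the Tukey HSD critical difference q × √(MSE / n), with only the standard library. The machine means below are hypothetical, chosen only so that the flagged pairs match the result stated above; MSE is derived from the ANOVA table (ESS / error d.f.), n = 5 assumes the 20 observations are split equally over the 4 machines, and q = 4.05 is an assumed table lookup of the studentized range value for t = 4, f = 16, α = 0.05. The function name is illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd_pairs(means, mse, n, q_crit):
    """Flag each pair of treatments whose means differ by more than the HSD."""
    hsd = q_crit * sqrt(mse / n)
    results = {}
    for a, b in combinations(sorted(means), 2):
        # True means reject H0 for this pair (the means differ significantly).
        results[(a, b)] = abs(means[a] - means[b]) > hsd
    return hsd, results


# Hypothetical machine means, chosen only for illustration.
means = {"A": 70.0, "B": 85.0, "C": 83.0, "D": 72.0}

mse = 296.06 / 16  # MSE from the ANOVA table: ESS / error d.f.
n = 5              # observations per machine, assuming equal group sizes
q_crit = 4.05      # studentized range table value (assumed lookup)

hsd, reject = tukey_hsd_pairs(means, mse, n, q_crit)
for pair, flag in reject.items():
    print(pair, flag)
```

In practice, the `pairwise_tukeyhsd` function in statsmodels carries out these comparisons directly from the raw data and reports a True/False reject flag for each pair.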
● When each treatment has an equal number of observations, the Tukey HSD test can be used
● However, when the number of observations per treatment is unequal, it is not efficient
[Flowchart: for both the one-way ANOVA and the Kruskal-Wallis H test, rejecting H0 leads to a
post-hoc test; the other branch is failing to reject H0]
One - way ANOVA
● Ryan only considered the effect of the machines on the tensile strength
● What if he considers the effect of work shifts used?
● The effect of the quality of material and the effect due to machine can be tested using
two way ANOVA