DA Chap 2
Statistical Hypotheses
Statistical hypothesis testing and confidence interval
estimation of parameters are the fundamental methods
used at the data analysis stage of a comparative
experiment, in which the engineer is interested, for
example, in comparing the mean of a population to a
specified value.
Definition
How to perform hypothesis testing in machine learning?
• By performing hypothesis testing, we check our assumptions about the population against the data at a desired significance level.
Formulating the hypotheses
• A key step is to formulate the following two hypotheses:
• The null hypothesis, represented as H₀, is the initial claim, based on the prevailing belief about the population.
• The alternative hypothesis, represented as H₁, is the competing claim, which we adopt only if the data provide strong enough evidence against H₀.
• When formulating the null and alternative hypotheses, keep in mind that the null hypothesis always states the existing notion. Hence H₀ carries an equality sign (=, ≤ or ≥), while H₁ carries ≠, < or >.
Determine the significance level
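• Determine the significance level, also known as alpha (α), for the hypothesis test. α is the probability of rejecting H₀ when H₀ is actually true (a type I error); common choices are 0.05 and 0.01.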
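Once α is fixed, it determines the critical values of a standard normal test statistic. The following is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper name `critical_values` is our own illustration and is not part of any library.

```python
from statistics import NormalDist

def critical_values(alpha: float) -> dict:
    """Standard normal critical values for significance level alpha (illustrative helper)."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return {
        # two-sided test: alpha is split equally between the two tails
        "two_sided": z.inv_cdf(1 - alpha / 2),
        # one-sided tests: all of alpha sits in a single tail
        "upper": z.inv_cdf(1 - alpha),
        "lower": z.inv_cdf(alpha),
    }

for alpha in (0.10, 0.05, 0.01):
    cv = critical_values(alpha)
    print(f"alpha={alpha}: z_alpha/2={cv['two_sided']:.3f}, z_alpha={cv['upper']:.3f}")
```

For α = 0.05 this gives z_{α/2} ≈ 1.960 and z_α ≈ 1.645. A smaller α pushes the critical values outward, which makes H₀ harder to reject.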
Statistical Hypotheses
Two-sided Alternative Hypothesis
H0: μ = 50 centimeters per second (null hypothesis)
H1: μ ≠ 50 centimeters per second (alternative hypothesis)
Statistical Hypotheses
Test of a Hypothesis
• A procedure leading to a decision about a particular
hypothesis
Decision criteria for testing H0: μ = 50 centimeters per second versus H1: μ ≠ 50 centimeters per second.
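These decision criteria amount to checking whether the sample mean falls inside an acceptance region around 50 cm/s. The sketch below is minimal. The bounds 48.5 and 51.5 and the sample values are hypothetical, chosen only for illustration, and are not taken from the original figure.

```python
from statistics import mean

# Hypothetical acceptance region for H0: mu = 50 cm/s (illustrative values only)
LOWER, UPPER = 48.5, 51.5

def decide(sample: list[float], lower: float = LOWER, upper: float = UPPER) -> str:
    """Apply the two-sided decision criterion to the sample mean."""
    xbar = mean(sample)
    if lower <= xbar <= upper:
        return "fail to reject H0"
    # xbar falls in one of the two tails of the critical region
    return "reject H0"

# Hypothetical burning-rate measurements in cm/s
print(decide([49.2, 50.8, 50.1, 49.7, 51.0]))  # mean = 50.16 -> fail to reject H0
print(decide([52.3, 51.9, 52.8, 51.6, 52.1]))  # mean = 52.14 -> reject H0
```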
Hypothesis Testing: Type I and Type II Errors
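For a test on a normal mean with known σ, both error probabilities follow from the sampling distribution of x̄, which is normal with standard deviation σ/√n. α is the probability that x̄ lands outside the acceptance region when H0 is true. β is the probability that x̄ lands inside it when the true mean is some other value μ1. The sketch below is minimal and assumes an acceptance region [48.5, 51.5] for H0: μ = 50, with σ = 2.5, n = 10 and a true mean of 52. All of these numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu1, sigma, n, lower, upper):
    """Return (alpha, beta) for the acceptance region [lower, upper] on the sample mean."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    under_h0 = NormalDist(mu0, se)
    under_h1 = NormalDist(mu1, se)
    # Type I error: xbar falls outside the acceptance region although H0 is true
    alpha = under_h0.cdf(lower) + (1 - under_h0.cdf(upper))
    # Type II error: xbar falls inside the acceptance region although mu = mu1
    beta = under_h1.cdf(upper) - under_h1.cdf(lower)
    return alpha, beta

# Illustrative assumptions: H0: mu = 50, true mean 52, sigma = 2.5, n = 10
alpha, beta = error_probabilities(50, 52, 2.5, 10, 48.5, 51.5)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

With these assumptions α is roughly 0.058 and β roughly 0.26. Widening the acceptance region lowers α but raises β. Increasing n shrinks both.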
Tests of Statistical Hypotheses
Definitions
Hypothesis Testing
Definition
One-Sided Tests:
P-Values in Hypothesis Tests
Definition
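The P-value is the smallest significance level at which H0 would be rejected with the observed data. For a standard normal test statistic z0, it is a tail area of the standard normal distribution, and the alternative hypothesis decides which tail(s) count. The sketch below is minimal. The function name `z_p_value` and the `alternative` strings are our own choices.

```python
from statistics import NormalDist

def z_p_value(z0: float, alternative: str = "two-sided") -> float:
    """P-value of an observed standard normal test statistic z0 (illustrative helper)."""
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "two-sided":   # H1: mu != mu0, both tails
        return 2 * (1 - phi(abs(z0)))
    if alternative == "greater":     # H1: mu > mu0, upper tail
        return 1 - phi(z0)
    if alternative == "less":        # H1: mu < mu0, lower tail
        return phi(z0)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Reject H0 at level alpha whenever the P-value is <= alpha
p = z_p_value(2.1)
print(f"P-value = {p:.4f}, reject at alpha = 0.05: {p <= 0.05}")
```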
Fail to reject H0 if $-z_{\alpha/2} < z_0 < z_{\alpha/2}$
Tests on the Mean of a Normal Distribution, Variance
Known
Example 9-2
Figure: The reference distribution for H0: μ = μ0, with the critical region for (a) H1: μ ≠ μ0, (b) H1: μ > μ0, and (c) H1: μ < μ0.
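The three critical regions in the figure translate directly into a one-sample z-test for a normal mean with known σ. The test computes z0 = (x̄ − μ0)/(σ/√n) and compares it with z_{α/2} for case (a) or with z_α for cases (b) and (c). The sketch below is minimal. The function name `z_test` and the summary values in the usage loop (n = 25, x̄ = 51.3, σ = 2) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar: float, mu0: float, sigma: float, n: int,
           alpha: float = 0.05, alternative: str = "two-sided"):
    """One-sample z-test on a normal mean with known variance.

    Returns (z0, reject), where reject is True if z0 lies in the critical region.
    """
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if alternative == "two-sided":      # (a) H1: mu != mu0
        reject = abs(z0) >= std.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":      # (b) H1: mu > mu0
        reject = z0 >= std.inv_cdf(1 - alpha)
    elif alternative == "less":         # (c) H1: mu < mu0
        reject = z0 <= -std.inv_cdf(1 - alpha)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z0, reject

# Hypothetical summary data: n = 25, sample mean 51.3, known sigma = 2, H0: mu = 50
for alt in ("two-sided", "greater", "less"):
    z0, reject = z_test(51.3, 50, 2, 25, alternative=alt)
    print(f"{alt:>9}: z0 = {z0:.2f}, reject H0: {reject}")
```

The same z0 can lead to different decisions depending on the alternative. Here z0 = 3.25 lies in the critical region for (a) and (b) but not for (c).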
Review of ANOVA and linear
regression
Review of simple ANOVA
ANOVA
for comparing means between more
than 2 groups
Hypotheses of One-Way ANOVA
◼ H0 : μ1 = μ2 = μ3 = ⋯ = μc
◼ All population means are equal
◼ i.e., no treatment effect (no variation in means among
groups)
◼ H1 : Not all of the population means are the same
◼ At least one population mean is different
◼ i.e., there is a treatment effect
◼ Does not mean that all population means are different
(some pairs may be the same)
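The F-distribution
◼ A ratio of variances follows an F-distribution:
$$ \frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m} $$
$$ H_0: \sigma^2_{between} = \sigma^2_{within} \qquad H_a: \sigma^2_{between} \neq \sigma^2_{within} $$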
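Sum of Squares Within (SSW), or Sum of Squares Error (SSE)
The (within) group variances:
$$ \frac{\sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\bullet})^2}{10-1} $$
Adding up the numerators:
$$ \sum_{j=1}^{10} (y_{1j} - \bar{y}_{1\bullet})^2 + \sum_{j=1}^{10} (y_{2j} - \bar{y}_{2\bullet})^2 + \sum_{j=1}^{10} (y_{3j} - \bar{y}_{3\bullet})^2 + \sum_{j=1}^{10} (y_{4j} - \bar{y}_{4\bullet})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\bullet})^2 $$
This is the Sum of Squares Within (SSW), or SSE, for chance error.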
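SSW can be computed group by group by summing the squared deviations of each observation from its own group mean. Equivalently, it is the sum of the numerators of the within-group variances, Σ (nᵢ − 1)·sᵢ². The sketch below is minimal. The four groups are small hypothetical data sets with 3 observations each, rather than the slide's 10, so that the arithmetic is easy to follow.

```python
from statistics import mean, variance

def ssw(groups: list[list[float]]) -> float:
    """Sum of Squares Within: squared deviations from each observation's own group mean."""
    total = 0.0
    for g in groups:
        g_mean = mean(g)
        total += sum((y - g_mean) ** 2 for y in g)
    return total

# Hypothetical data: 4 groups of 3 observations
groups = [[4, 5, 6], [6, 7, 8], [5, 5, 5], [9, 10, 11]]
print(ssw(groups))  # 6.0

# Same result from the within-group sample variances: SSW = sum of (n_i - 1) * s_i^2
print(float(sum((len(g) - 1) * variance(g) for g in groups)))  # 6.0
```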
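Sum of Squares Between (SSB), or Sum of Squares Regression (SSR)
Overall mean of all 40 $y_{ij}$ observations ("grand mean"):
$$ \bar{y}_{\bullet\bullet} = \frac{\sum_{i=1}^{4} \sum_{j=1}^{10} y_{ij}}{40} $$
Sum of Squares Between: the squared difference of each group mean from the grand mean, multiplied by the group size (10):
$$ 10 \times \sum_{i=1}^{4} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 $$
Total Sum of Squares (SST): the squared difference of every observation from the overall mean (the numerator of the variance of Y!):
$$ \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\bullet\bullet})^2 $$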
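The grand mean, the between-group sum of squares and the total sum of squares can be computed directly from their definitions. The sketch below is minimal and uses small hypothetical groups. It weights each group by `len(g)`. With equal group sizes that weight is a constant, which corresponds to the factor 10 in the slide's formula.

```python
from statistics import mean

def grand_mean(groups):
    """Mean of all observations pooled together (y-bar dot dot)."""
    all_obs = [y for g in groups for y in g]
    return sum(all_obs) / len(all_obs)

def ssb(groups):
    """Sum of Squares Between: group size times squared deviation of the group mean from the grand mean."""
    gm = grand_mean(groups)
    return sum(len(g) * (mean(g) - gm) ** 2 for g in groups)

def sst(groups):
    """Total Sum of Squares: squared deviation of every observation from the grand mean."""
    gm = grand_mean(groups)
    return sum((y - gm) ** 2 for g in groups for y in g)

# Hypothetical data: 4 groups of 3 observations
groups = [[4, 5, 6], [6, 7, 8], [5, 5, 5], [9, 10, 11]]
print(grand_mean(groups), ssb(groups), sst(groups))  # 6.75 50.25 56.25
```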
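Partitioning of Variance
$$ \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\bullet})^2 + 10 \times \sum_{i=1}^{4} (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2 = \sum_{i=1}^{4} \sum_{j=1}^{10} (y_{ij} - \bar{y}_{\bullet\bullet})^2 $$
i.e., SSW + SSB = total sum of squares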
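The partition can be checked numerically. Its two pieces also give the one-way ANOVA F statistic, F = [SSB/(c − 1)] / [SSW/(N − c)], where c is the number of groups and N the total number of observations. Under H0 this ratio follows an F distribution with c − 1 and N − c degrees of freedom. The sketch below is minimal and uses hypothetical data. It stops at the F statistic. A P-value would additionally need an F-distribution CDF (for example SciPy's `scipy.stats.f.sf`), which is deliberately not used here.

```python
from statistics import mean

def anova_partition(groups):
    """Return (SSW, SSB, SST) for a one-way layout given as a list of groups."""
    all_obs = [y for g in groups for y in g]
    gm = sum(all_obs) / len(all_obs)  # grand mean
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ssb = sum(len(g) * (mean(g) - gm) ** 2 for g in groups)
    sst = sum((y - gm) ** 2 for y in all_obs)
    return ssw, ssb, sst

def f_statistic(groups):
    """One-way ANOVA F = (SSB / (c - 1)) / (SSW / (N - c))."""
    ssw, ssb, _ = anova_partition(groups)
    c = len(groups)
    n_total = sum(len(g) for g in groups)
    return (ssb / (c - 1)) / (ssw / (n_total - c))

groups = [[4, 5, 6], [6, 7, 8], [5, 5, 5], [9, 10, 11]]  # hypothetical data
ssw, ssb, sst = anova_partition(groups)
assert abs((ssw + ssb) - sst) < 1e-9  # partitioning of variance holds
print(ssw, ssb, sst, f_statistic(groups))
```

A large F means that the group means vary more than chance variation within groups would suggest, which is evidence against H0 : μ1 = ⋯ = μc.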