DA Chap 2

Chapter 2
Test of Hypothesis and Significance

Outline
Statistical hypothesis
Null and alternate hypothesis
Test of hypothesis and significance
Type I and Type II errors
Level of significance
Tests involving the normal distribution
One-tailed and two-tailed tests
P value
Special tests of significance for large and small samples (F, chi-square, z, t-test)
ANOVA
Hypothesis Testing
• Hypothesis testing uses sample data to check an observation about the population within the desired error level. Through hypothesis testing, we can determine whether we have enough statistical evidence to reject the hypothesis about the population.
Statistical Hypotheses
Statistical hypothesis testing and confidence interval
estimation of parameters are the fundamental methods
used at the data analysis stage of a comparative
experiment, in which the engineer is interested, for
example, in comparing the mean of a population to a
specified value.
Definition: A statistical hypothesis is a statement about the parameters of one or more populations.
How to perform hypothesis testing in machine learning?
• To trust a model and the predictions it makes, we use hypothesis testing.
• When we use sample data to train a model, we make assumptions about the population.
• By performing hypothesis testing, we validate these assumptions at a desired significance level.
Formulating the hypothesis
• A key step is to formulate the following two hypotheses:
• The null hypothesis, represented as H₀, is the initial claim based on the prevailing belief about the population.
• The alternate hypothesis, represented as H₁, is the challenge to the null hypothesis. It is the claim that we would like to prove true.
• When formulating the null and alternate hypotheses, keep in mind that the null hypothesis always looks at confirming the existing notion. Hence, the null hypothesis carries the sign =, ≥ or ≤, while the alternate hypothesis carries the sign >, < or ≠.
Determine the significance level
• Determine the significance level, also known as alpha or α, for the hypothesis test.
• The significance level is the probability that the test statistic falls in the critical region when the null hypothesis is true. It is usually set at 5% (0.05), which means there is a 5% chance of rejecting the null hypothesis in favor of the alternate hypothesis even though the null hypothesis is true.
• Where the requirement is more critical, we can choose a lower significance level, such as 1%.
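The link between a significance level and the boundary of the critical region can be illustrated with the standard normal distribution. The sketch below is only an illustration: the function name `critical_values` is hypothetical, and it assumes the test statistic is standard normal under the null hypothesis.

```python
from statistics import NormalDist


def critical_values(alpha: float) -> dict:
    """Standard-normal critical values for significance level alpha.

    Assumes the test statistic follows N(0, 1) when H0 is true.
    """
    z = NormalDist()  # mean 0, standard deviation 1
    return {
        # Two-tailed test: alpha is split equally between both tails.
        "two_tailed": z.inv_cdf(1 - alpha / 2),
        # One-tailed (upper) test: all of alpha sits in one tail.
        "one_tailed": z.inv_cdf(1 - alpha),
    }


for alpha in (0.05, 0.01):
    cv = critical_values(alpha)
    print(f"alpha={alpha}: two-tailed +/-{cv['two_tailed']:.3f}, "
          f"one-tailed {cv['one_tailed']:.3f}")
```

Lowering α from 0.05 to 0.01 pushes the critical value further into the tail, so stronger evidence is needed before the null hypothesis is rejected.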
Select the type of Hypothesis test
• We choose the type of test statistic based on the predictor variable –
quantitative or categorical. Below are a few of the commonly used
test statistics for quantitative data
For example, suppose that we are interested in the burning rate of a
solid propellant used to power aircrew escape systems.
• Now burning rate is a random variable that can be described by a
probability distribution.
• Suppose that our interest focuses on the mean burning rate (a
parameter of this distribution).
• Specifically, we are interested in deciding whether or not the
mean burning rate is 50 centimeters per second.
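Two-sided Alternative Hypothesis
H0: μ = 50 centimeters per second (null hypothesis)
H1: μ ≠ 50 centimeters per second (alternative hypothesis)
One-sided Alternative Hypotheses
H0: μ = 50 centimeters per second, H1: μ < 50 centimeters per second
or
H0: μ = 50 centimeters per second, H1: μ > 50 centimeters per second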
Test of a Hypothesis
• A test of a hypothesis is a procedure leading to a decision about a particular hypothesis.
• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.
• If this information is consistent with the hypothesis, we do not reject the hypothesis; if this information is inconsistent with the hypothesis, we conclude that the hypothesis is false.
Tests of Statistical Hypotheses
Figure: Decision criteria for testing H0: μ = 50 centimeters per second versus H1: μ ≠ 50 centimeters per second.
Type I Error and Type II Error
Definitions
• Rejecting the null hypothesis H0 when it is true is a type I error.
• Failing to reject the null hypothesis H0 when it is false is a type II error.
The probability of making a type I error is denoted by the Greek letter α.
Sometimes the type I error probability is called the significance level, or the α-error, or the size of the test.
The standard deviation of the burning rate is σ = 2.5 centimeters per second.
The distribution of the sample mean is approximately normal with mean μ and standard deviation σ/√n, which is 2.5/√10 ≈ 0.79 for a sample of n = 10.
Figure: The probability of type II error when μ = 52 and n = 10.
Figure: The probability of type II error when μ = 50.5 and n = 10.
Figure: The probability of type II error when μ = 52 and n = 16.
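The type I and type II error probabilities for the burning-rate test can be sketched numerically. In this minimal sketch, σ = 2.5, the null mean of 50 and the sample sizes come from the slides; the acceptance region 48.5 ≤ x̄ ≤ 51.5 is an assumption made for illustration, and the names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 2.5                 # standard deviation of burning rate, cm/s (from the text)
MU_0 = 50.0                 # mean burning rate under H0, cm/s (from the text)
LOWER, UPPER = 48.5, 51.5   # ASSUMED acceptance region for the sample mean


def accept_probability(true_mean: float, n: int) -> float:
    """P(LOWER <= sample mean <= UPPER) when the population mean is true_mean."""
    xbar = NormalDist(mu=true_mean, sigma=SIGMA / sqrt(n))
    return xbar.cdf(UPPER) - xbar.cdf(LOWER)


# Type I error: H0 is true (mean = 50) but the sample mean falls outside the region.
alpha = 1 - accept_probability(MU_0, n=10)

# Type II error: H0 is false but the sample mean still falls inside the region.
beta_52 = accept_probability(52.0, n=10)
beta_50_5 = accept_probability(50.5, n=10)
beta_52_n16 = accept_probability(52.0, n=16)

print(f"alpha                 = {alpha:.4f}")
print(f"beta (mu=52,   n=10) = {beta_52:.4f}")
print(f"beta (mu=50.5, n=10) = {beta_50_5:.4f}")
print(f"beta (mu=52,   n=16) = {beta_52_n16:.4f}")
```

Under these assumptions, β grows as the true mean moves closer to 50 and shrinks when the sample size increases.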
One-Sided and Two-Sided Hypotheses
Two-Sided Test:
H0: μ = μ0, H1: μ ≠ μ0
One-Sided Tests:
H0: μ = μ0, H1: μ > μ0
or
H0: μ = μ0, H1: μ < μ0
P-Values in Hypothesis Tests
Definition: The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0 with the given data.
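For a test statistic that is standard normal under H0, the P-value is a tail area of that distribution. The following is a minimal sketch under that assumption; the function name `p_value` and the labels for the alternative hypothesis are hypothetical.

```python
from statistics import NormalDist


def p_value(z0: float, alternative: str = "two-sided") -> float:
    """P-value for an observed z statistic, assuming Z0 ~ N(0, 1) under H0."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if alternative == "two-sided":   # H1: mu != mu0
        return 2 * (1 - phi(abs(z0)))
    if alternative == "greater":     # H1: mu > mu0
        return 1 - phi(z0)
    if alternative == "less":        # H1: mu < mu0
        return phi(z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(f"{p_value(1.96):.4f}")             # two-sided P-value for z0 = 1.96
print(f"{p_value(1.96, 'greater'):.4f}")  # upper-tail P-value for z0 = 1.96
```

H0 is rejected at significance level α whenever the P-value is smaller than α, which gives the same decision as comparing z0 with the critical value.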
General Procedure for Hypothesis Tests
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Determine an appropriate test statistic.
6. State the rejection region for the statistic.
7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value.
8. Decide whether or not H0 should be rejected and report that in the problem context.
Tests on the Mean of a Normal Distribution, Variance Known
Hypothesis Tests on the Mean
We wish to test:
H0: μ = μ0
H1: μ ≠ μ0
The test statistic is:
$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
Reject H0 if the observed value of the test statistic z0 is either:
z0 > z_{α/2} or z0 < -z_{α/2}
Fail to reject H0 if
-z_{α/2} < z0 < z_{α/2}
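The two-sided z-test with known variance can be sketched as follows. The statistic is z0 = (x̄ − μ0)/(σ/√n) and H0 is rejected when |z0| exceeds z_{α/2}. The function name and the ten burning-rate measurements are hypothetical and were invented for illustration; only μ0 = 50 and σ = 2.5 come from the slides.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_two_sided(sample, mu_0, sigma, alpha=0.05):
    """Two-sided z-test on the mean with known standard deviation sigma.

    Returns (z0, critical value, reject H0?).
    """
    n = len(sample)
    z0 = (fmean(sample) - mu_0) / (sigma / sqrt(n))   # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    return z0, z_crit, abs(z0) > z_crit


# Hypothetical burning-rate measurements in cm/s (illustrative data only).
rates = [51.2, 50.8, 52.1, 49.9, 51.6, 50.4, 52.3, 51.0, 50.7, 51.5]

z0, z_crit, reject = z_test_two_sided(rates, mu_0=50.0, sigma=2.5)
print(f"z0 = {z0:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With these made-up data the statistic stays inside the interval (-z_{α/2}, z_{α/2}), so H0: μ = 50 is not rejected at α = 0.05.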
Hypothesis Tests on the Mean (Continued) (Eq. 9-12 & 18)
For the one-sided alternative H1: μ > μ0, reject H0 if z0 > z_α.
For the one-sided alternative H1: μ < μ0, reject H0 if z0 < -z_α.
Type II Error and Choice of Sample Size
Finding the Probability of Type II Error β (Eq. 9-19, Eq. 9-20)
For the two-sided test, when the true mean is μ = μ0 + δ:
$$\beta = \Phi\left(z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right)$$
Figure 9-9: The distribution of Z0 under H0 and H1.
Sample Size Formulas
For a two-sided alternative hypothesis (Eq. 9-22):
$$n \approx \frac{(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{\delta^2}, \qquad \delta = \mu - \mu_0$$
For a one-sided alternative hypothesis (Eq. 9-23):
$$n = \frac{(z_{\alpha} + z_{\beta})^2 \sigma^2}{\delta^2}$$
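The standard sample-size formulas for the z-test, n ≈ (z_{α/2} + z_β)² σ² / δ² for a two-sided alternative and n = (z_α + z_β)² σ² / δ² for a one-sided alternative, can be evaluated with a few lines of code. This is a sketch only: the function name and the example values of δ and β are assumptions, not figures from the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size(delta, sigma, alpha=0.05, beta=0.10, two_sided=True):
    """Smallest n that detects a shift delta in the mean with type II error <= beta.

    delta: true mean minus hypothesized mean (mu - mu0)
    sigma: known population standard deviation
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(1 - beta)
    # Round up because n must be a whole number of observations.
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical requirement: detect a 1 cm/s shift with sigma = 2.5 cm/s.
print(sample_size(delta=1.0, sigma=2.5))                   # two-sided alternative
print(sample_size(delta=1.0, sigma=2.5, two_sided=False))  # one-sided alternative
```

A one-sided alternative needs fewer observations than a two-sided one for the same α, β and δ, because z_α is smaller than z_{α/2}.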

Using Operating Characteristic Curves
Large Sample Test

Tests on the Mean of a Normal Distribution, Variance Unknown
Hypothesis Tests on the Mean
One-Sample t-Test
Figure: The reference distribution for H0: μ = μ0 with critical region for (a) H1: μ ≠ μ0, (b) H1: μ > μ0, and (c) H1: μ < μ0.
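When the variance is unknown, the sample standard deviation s replaces σ and the statistic t0 = (x̄ − μ0)/(s/√n) is compared with a t distribution with n − 1 degrees of freedom. The sketch below computes t0 with the standard library only and takes the critical value from a t table; the function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def t_statistic(sample, mu_0):
    """One-sample t statistic: (sample mean - mu_0) / (s / sqrt(n))."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return (fmean(sample) - mu_0) / (s / sqrt(n))


# Hypothetical burning-rate measurements in cm/s (illustrative data only).
rates = [51.2, 50.8, 52.1, 49.9, 51.6, 50.4, 52.3, 51.0, 50.7, 51.5]

# Tabulated two-sided critical value t_{0.025, 9} for alpha = 0.05 and n - 1 = 9.
T_CRIT = 2.262

t0 = t_statistic(rates, mu_0=50.0)
reject = abs(t0) > T_CRIT
print(f"t0 = {t0:.3f}, critical value = {T_CRIT}, reject H0: {reject}")
```

For other sample sizes or significance levels the critical value has to be looked up again, since it depends on both α and the degrees of freedom.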
Review of ANOVA and linear regression
Review of simple ANOVA
ANOVA is used for comparing means between more than 2 groups.
Hypotheses of One-Way ANOVA
◼ H0: μ1 = μ2 = μ3 = ... = μc
◼ All population means are equal
◼ i.e., no treatment effect (no variation in means among groups)
◼ H1: Not all of the population means are the same
◼ At least one population mean is different
◼ i.e., there is a treatment effect
◼ Does not mean that all population means are different (some pairs may be the same)
The F-distribution
◼ A ratio of variances follows an F-distribution:
$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$
⚫ The F-test tests the hypothesis that two variances are equal.
⚫ F will be close to 1 if the sample variances are equal.
$$H_0: \sigma^2_{between} = \sigma^2_{within}$$
$$H_a: \sigma^2_{between} \neq \sigma^2_{within}$$
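Sum of Squares Within (SSW), or Sum of Squares Error (SSE)
The (within) group variances:
$$\frac{\sum_{j=1}^{10}(y_{1j}-\bar{y}_{1\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{2j}-\bar{y}_{2\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{3j}-\bar{y}_{3\bullet})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{4j}-\bar{y}_{4\bullet})^2}{10-1}$$
Sum of Squares Within (SSW) (or SSE, for chance error):
$$\sum_{j=1}^{10}(y_{1j}-\bar{y}_{1\bullet})^2 + \sum_{j=1}^{10}(y_{2j}-\bar{y}_{2\bullet})^2 + \sum_{j=1}^{10}(y_{3j}-\bar{y}_{3\bullet})^2 + \sum_{j=1}^{10}(y_{4j}-\bar{y}_{4\bullet})^2 = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{i\bullet})^2$$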
Sum of Squares Between (SSB), or Sum of Squares Regression (SSR)
Overall mean of all 40 observations ("grand mean"):
$$\bar{y}_{\bullet\bullet} = \frac{\sum_{i=1}^{4}\sum_{j=1}^{10} y_{ij}}{40}$$
Sum of Squares Between (SSB): variability of the group means compared to the grand mean (the variability due to the treatment).
$$SSB = 10 \times \sum_{i=1}^{4}(\bar{y}_{i\bullet}-\bar{y}_{\bullet\bullet})^2$$
Total Sum of Squares (TSS, also written SST)
Squared difference of every observation from the overall mean (the numerator of the variance of Y):
$$TSS = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{\bullet\bullet})^2$$
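Partitioning of Variance
$$\sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{i\bullet})^2 + 10 \times \sum_{i=1}^{4}(\bar{y}_{i\bullet}-\bar{y}_{\bullet\bullet})^2 = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{\bullet\bullet})^2$$
SSW + SSB = TSS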

ANOVA Table
Source of variation | d.f. | Sum of squares | Mean sum of squares | F-statistic | p-value
Between (k groups) | k-1 | SSB (sum of squared deviations of group means from grand mean) | SSB/(k-1) | [SSB/(k-1)] / [SSW/(nk-k)] | Go to F(k-1, nk-k) chart
Within (n individuals per group) | nk-k | SSW (sum of squared deviations of observations from their group mean) | s² = SSW/(nk-k) | |
Total variation | nk-1 | TSS (sum of squared deviations of observations from grand mean) | | |
TSS = SSB + SSW
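The sums of squares and the F-statistic of a balanced one-way ANOVA (k groups, n observations each) can be computed directly from their definitions. This is a minimal sketch: the function name and the three small groups are hypothetical, and the p-value still has to be read from an F(k-1, nk-k) table because the standard library has no F distribution.

```python
from statistics import fmean


def one_way_anova(groups):
    """Sums of squares and F-statistic for a balanced one-way layout."""
    k = len(groups)       # number of groups
    n = len(groups[0])    # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    all_values = [y for g in groups for y in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]

    # SSB: n times the squared deviations of group means from the grand mean.
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    # SSW: squared deviations of observations from their own group mean.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # TSS: squared deviations of observations from the grand mean.
    tss = sum((y - grand_mean) ** 2 for y in all_values)

    f_stat = (ssb / (k - 1)) / (ssw / (n * k - k))
    return {"SSB": ssb, "SSW": ssw, "TSS": tss, "F": f_stat,
            "df_between": k - 1, "df_within": n * k - k}


# Hypothetical data: 3 groups of 3 observations (illustrative only).
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

In this example SSB + SSW equals TSS, which is the partitioning of variance stated in the table.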
