Lecture Notes
Subject: Probability, Statistics and Information Theory (SC222)
Chapter 8
Hypothesis Testing
Contents
1 Introduction
2 Types of errors
3 Critical region
4 Level of significance
alternative hypothesis.
2 Types of errors
We go out and collect samples, analyse them, and reach a conclusion
about our hypothesis. The question is: does our conclusion match the
actual state of reality? We know it will not do so 100% of the time.
There are two types of errors.
Type I: Test rejects H0 when H0 is true.
Type II: Test fails to reject H0 when H0 is false.
For example,
you will be happy if
H0 : you will get SPI ≥ 7.
you will be upset if
Ha : you will get SPI < 7.
Now, suppose you get an SPI of 7.2 but you are upset. H0 is true, yet you
are rejecting it. This is a Type I error. Suppose you get an SPI of 6.9 but
you are happy with it. H0 is false, yet you are not rejecting it. This is a
Type II error.
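The four possible outcomes of a test can be tabulated in a few lines. The following is a minimal sketch in Python; the function names `classify_outcome` and `spi_outcome`, and the reading of "upset" as "rejecting H0", are assumptions made for illustration.

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Name the outcome of a test, given the true state and the decision."""
    if h0_is_true and h0_rejected:
        return "Type I error"
    if not h0_is_true and not h0_rejected:
        return "Type II error"
    return "correct decision"


def spi_outcome(spi: float, upset: bool) -> str:
    """SPI example: H0 is 'SPI >= 7'; being upset is treated as rejecting H0."""
    return classify_outcome(h0_is_true=spi >= 7, h0_rejected=upset)


print(spi_outcome(7.2, upset=True))   # Type I error
print(spi_outcome(6.9, upset=False))  # Type II error
```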
3 Critical region
Test statistic: A statistic whose value is determined from the sample
data. Depending on the value of the test statistic, the null hypothesis is
either rejected or not rejected. In the above examples, the SPI and the
quantity of water in each sample are the values of the test statistic.
Critical/Rejection region: The set of values of the test statistic for which
the null hypothesis is rejected. We reject H0 if the value of the test
statistic lies in the critical region C, and we do not reject H0 if it does
not. The critical region depends on the number of samples. As the sample
size increases, the critical region grows: its boundary moves closer to the
hypothesised value, so the possibility of rejecting H0 increases. The
critical region is smallest for a single sample.
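How the critical region depends on the sample size can be checked numerically. The sketch below assumes the two-sided test on a normal mean with known σ from Section 5, where H0 is rejected when |X̄ − µ0| ≥ c with c = Zα/2 σ/√n. The values σ = 4 and α = 0.05 and the function name `critical_half_width` are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def critical_half_width(sigma: float, n: int, alpha: float) -> float:
    """Return c such that H0 is rejected when |sample mean - mu0| >= c."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return z * sigma / sqrt(n)


for n in (1, 10, 100):
    c = critical_half_width(sigma=4, n=n, alpha=0.05)
    print(f"n = {n:3d}: reject H0 when |mean - mu0| >= {c:.2f}")
```

The threshold c shrinks as n grows, so a larger sample leaves a narrower non-rejection interval around µ0.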
4 Level of significance
Statisticians refer to the significance level as alpha (α). A hypothesis
test determines whether the evidence in the sample data is strong enough
to conclude that an effect holds in the whole population. Suppose we have
two different data samples. Does that mean the two samples come from two
different populations? The samples only provide evidence for an effect.
The significance level is a measure of how strong the sample evidence must
be before we conclude that the effect is real. A test of level α has the
property that, whenever H0 is true, the probability of rejecting it is ≤ α.
Put simply, the probability of a Type I error is ≤ α. For example,
Example 4.1 An automobile manufacturing company claims that a particular
car gives a mileage of 32 kmpl. A customer researches the car before buying
it and wants to test whether the company's claim is true. For a customer or
researcher who wants to discredit the company's claim,
H0 : mileage = 32 kmpl
Ha : mileage ≠ 32 kmpl
Suppose the hypothesis test results in rejection of H0 at the α = 5% level
of significance. This means that the hypothesis H0 : mileage = 32 is
rejected by a procedure that would reject it only 5% of the time when the
mileage really is 32.
Mathematical expression: P{|X̄ − 32| ≥ c} = α when the mileage is 32, where
X̄ is the sample mean and c is the constant that fixes the critical region.
Generally, statisticians choose α = 1%, 5% or 10%. If α = 5%, then when
H0 is true the test fails to reject it with probability at least
1 − α = 95%. Changing α directly changes the critical region: increasing α
increases the probability of rejecting the null hypothesis, so the
critical/rejection region grows.
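The statement that a level-α test rejects a true H0 with probability at most α can be checked by simulation. The sketch below assumes normally distributed mileage with a known standard deviation. The values σ = 2 kmpl and n = 25 cars, and the function name `estimate_type_i_rate`, are hypothetical and not part of Example 4.1.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def estimate_type_i_rate(mu0=32.0, sigma=2.0, n=25, alpha=0.05,
                         trials=20_000, seed=0):
    """Estimate how often a two-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    c = z_crit * sigma / sqrt(n)  # reject when |sample mean - mu0| >= c
    rejections = 0
    for _ in range(trials):
        # Draw the sample under H0, i.e. with true mean equal to mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(fmean(sample) - mu0) >= c:
            rejections += 1
    return rejections / trials


print(estimate_type_i_rate())  # should be close to alpha = 0.05
```

Re-running with a larger `alpha` gives a proportionally larger rejection rate, which is the sense in which α controls the Type I error.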
5 Tests concerning mean of a normal population
5.1 Unknown mean and Known variance
Suppose that X1, X2, ..., Xn is a sample of size n from a normal
distribution having an unknown mean µ and known variance σ², and suppose we
are interested in testing the null hypothesis
H0 : µ = µ0
against the alternative hypothesis
Ha : µ ≠ µ0
where µ0 is some specified constant.
Since $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a natural point estimator
of µ, H0 is not rejected if X̄ is not too far from µ0. The known variance σ²
is the population variance, not the sample variance. How far X̄ must be from
µ0 for the null hypothesis to be rejected is determined by the critical
region C: H0 is rejected when
$$|\bar{X} - \mu_0| \ge c. \tag{1}$$
[Number line marking µ0 − c, µ0 and µ0 + c]
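The steps between equation (1) and equation (5) are not reproduced here. A sketch of what they presumably contain, inferred from equations (1), (5) and (6): the constant c is chosen so that the test has level α, and X̄ is standardised under H0. The exact form of each step, and how these steps map onto the equation numbers (2) to (4), are assumptions.

```latex
\begin{align*}
  % Level requirement: a true H0 is rejected with probability alpha.
  P_{\mu_0}\bigl\{\, |\bar{X} - \mu_0| \ge c \,\bigr\} &= \alpha \\
  % Under H0 the standardised sample mean is standard normal.
  Z = \frac{\sqrt{n}}{\sigma}\,(\bar{X} - \mu_0) &\sim N(0, 1)
\end{align*}
```

Multiplying both sides of the inequality in (1) by √n/σ then turns the condition on X̄ into a condition on |Z|.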
From equations (3) and (4),
$$|Z| \ge \frac{\sqrt{n}}{\sigma}\, c \tag{5}$$
From equations (1), (2) and (5),
$$P\left\{ |Z| \ge \frac{\sqrt{n}}{\sigma}\, c \right\} = \alpha \tag{6}$$
Figure 1
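Figure 1 is the graph of the standard normal distribution. By the symmetry
of the graph, we can write equation (6) as
$$2P\left\{ Z \ge \frac{\sqrt{n}}{\sigma}\, c \right\} = \alpha$$
$$\therefore\; P\left\{ Z \ge \frac{\sqrt{n}}{\sigma}\, c \right\} = \frac{\alpha}{2} \tag{7}$$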
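As we know,
$$P\{ Z \ge Z_{\alpha/2} \} = \frac{\alpha}{2} \tag{8}$$
By comparing equations (7) and (8),
$$\frac{\sqrt{n}}{\sigma}\, c = Z_{\alpha/2}$$
$$\therefore\; c = Z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}} \tag{9}$$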
Substituting the value of c from equation (9) into (1),
$$|\bar{X} - \mu_0| \ge Z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}} \tag{10}$$
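The rejection rule derived above translates directly into code. The following is a minimal sketch; the function name `reject_h0` and the numbers in the usage lines are hypothetical, and the critical value Zα/2 is taken from Python's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def reject_h0(sample_mean: float, mu0: float, sigma: float, n: int,
              alpha: float) -> bool:
    """Two-sided z-test with known sigma.

    Returns True when |sample_mean - mu0| >= z_{alpha/2} * sigma / sqrt(n).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(sample_mean - mu0) >= z_crit * sigma / sqrt(n)


# Hypothetical numbers: mu0 = 50, sigma = 10, n = 25, so sigma / sqrt(n) = 2
# and the threshold at alpha = 0.05 is about 1.96 * 2 = 3.92.
print(reject_h0(sample_mean=55.0, mu0=50.0, sigma=10.0, n=25, alpha=0.05))  # True
print(reject_h0(sample_mean=52.0, mu0=50.0, sigma=10.0, n=25, alpha=0.05))  # False
```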
Figure 2
H0 is rejected if equation (10) is satisfied. In Figure 2, α = 30%, so
Zα/2 = Z0.15 ≈ 1.04. Thus 30% of the area under the graph is the rejection
region and 70% is the non-rejection region.
Example 5.1.1 A village is said to have an average of 5 family members per
house, with a standard deviation of 4. Suppose we randomly choose 10 houses
and find an average of 3 family members per house. Test the claim at the 5%
level of significance.
Soln : We have n = 10, X̄ = 3, µ0 = 5, σ = 4 and α = 0.05, so
Zα/2 = Z0.025 = 1.96.
H0 : µ = 5
Ha : µ ≠ 5
$$\frac{\sqrt{n}}{\sigma}\, |\bar{X} - \mu_0| = \frac{\sqrt{10}}{4}\, |3 - 5| = 1.58.$$
Since 1.58 < 1.96, the condition of equation (10) is not satisfied, and H0
cannot be rejected.
Figure 3
Our sample mean is in the non-rejection region, as Figure 3 shows.
Now take the level of significance α = 12%, so that
Zα/2 = Z0.06 = 1.55.
Since 1.58 ≥ 1.55, the condition of equation (10) is satisfied, so H0 can be
rejected.
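The arithmetic of Example 5.1.1 can be reproduced at both significance levels. This is a minimal sketch using the numbers given in the example; the variable names are illustrative, and the critical values are computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, mu0, sigma = 10, 3.0, 5.0, 4.0

# Standardised distance of the sample mean from mu0.
z_stat = sqrt(n) / sigma * abs(sample_mean - mu0)
print(f"test statistic = {z_stat:.2f}")

for alpha in (0.05, 0.12):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    decision = "reject H0" if z_stat >= z_crit else "do not reject H0"
    print(f"alpha = {alpha:.2f}: z_crit = {z_crit:.2f} -> {decision}")
```

The same data lead to opposite decisions at the two levels because the test statistic lies between the two critical values.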
Figure 4
Our sample mean is now in the critical region, as Figure 4 shows. Hence,
when the significance level increases, the non-rejection region shrinks and
the rejection region grows. When we increase the sample size, the
probability of rejecting H0 for a given departure of the true mean from µ0
also becomes higher. A dishonest analyst can exploit this by choosing α so
that the critical region is small: the sample mean then falls in the
non-rejection region and H0 cannot be rejected.