
DA-IICT, Gandhinagar

Lecture Notes
Subject: Probability, Statistics and Information Theory (SC222)

Date: 24/03/2021 Reg. No.: 201901305

Name: Piyars Kakadiya Lecture No.: 19

Chapter 8
Hypothesis Testing

Contents
1 Introduction
2 Types of errors
3 Critical region
4 Level of significance
5 Tests concerning mean of a normal population
5.1 Unknown mean and Known variance

1 Introduction
A hypothesis is an idea or assumption put forward as a possible
explanation for something, but not yet known to be true or correct.
Because its truth is unknown, we can put a hypothesis to the test.
A statistical hypothesis is a statement about the nature of a population,
usually stated in terms of a population parameter. For example, a customer
buys a bottle of water from a store, and the label says it contains 500 ml.
The customer assumes the label is true, but is it? A customer cares that there
is at least 500 ml of water in the bottle; a little more is fine. A manufacturer,
however, wants the volume to be exactly 500 ml. The customer may therefore
ask: is there, on average, at least 500 ml of water in each bottle? To find out,
the customer collects 50 bottles from all over the country, so that the sample
is randomized with respect to time, location, and so on, and uses the data to
decide. This process is called 'testing of hypothesis'.
There are two types of hypothesis: the null hypothesis, denoted H0, and the
alternative hypothesis, denoted H1 or Ha. The null hypothesis is a statement
about the population parameter that contains the equality. If we are trying to
establish a certain hypothesis, it should be the alternative hypothesis; if we
are trying to discredit a certain hypothesis, it should be the null hypothesis.
From the customer's perspective, the concern is under-filling, so
H0 : mean quantity of water = 500 ml and Ha : mean quantity of water < 500 ml.
From the manufacturer's perspective, any deviation from 500 ml matters, so
H0 : mean quantity of water = 500 ml and Ha : mean quantity of water ≠ 500 ml.
The null hypothesis is rejected if it appears to be inconsistent with the sample
data; otherwise it is not rejected. In the water bottle example, if the data
indicate that the bottles are being filled properly, the customer fails to reject
the null hypothesis.
This does not mean we have proven the null hypothesis, only that our
assumption has held up against the data. If we reject the null hypothesis,
we conclude that the data support the alternative hypothesis.

2 Types of errors
We go out and collect a sample, analyse it, and reach a conclusion about
our hypothesis. The question is: does our conclusion match the actual state
of reality? We cannot expect it to match 100% of the time. There are two
types of errors.
Type I: The test rejects H0 when H0 is true.
Type II: The test fails to reject H0 when H0 is false.
For example, suppose you will be happy if
H0 : your SPI ≥ 7
holds, and upset if
Ha : your SPI < 7
holds. Here, being upset plays the role of rejecting H0. Suppose your SPI is
7.2 but you are upset: H0 is true, yet you are rejecting it. This is a Type I
error. Suppose your SPI is 6.9 but you are happy: H0 is false, yet you are
not rejecting it. This is a Type II error.
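
The two error types can be summarised as a small decision table. The following is a minimal Python sketch of that table; the function name `classify_outcome` and its boolean arguments are illustrative assumptions, not part of the lecture. In practice the truth of H0 is unknown, so this only labels the four possible situations.

```python
def classify_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Label a test result given the (normally unknown) truth about H0."""
    if h0_is_true and h0_rejected:
        return "Type I error"       # rejected a true H0
    if not h0_is_true and not h0_rejected:
        return "Type II error"      # failed to reject a false H0
    return "correct decision"


# Enumerate all four combinations of truth and decision.
for truth in (True, False):
    for rejected in (True, False):
        label = classify_outcome(truth, rejected)
        print(f"H0 true={truth}, H0 rejected={rejected}: {label}")
```

In the SPI example, an SPI of 7.2 with an upset student corresponds to `classify_outcome(True, True)`, and an SPI of 6.9 with a happy student corresponds to `classify_outcome(False, False)`.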

3 Critical region
Test statistic: A statistic whose value is determined from the sample
data. Depending on the value of the test statistic, the null hypothesis is
rejected or not rejected. In the examples above, the SPI and the mean quantity
of water in the sampled bottles play the role of the test statistic.
Critical/Rejection region: The set of values of the test statistic for which
the null hypothesis is rejected. We reject H0 if the value of the test statistic
lies in the critical region C, and we do not reject H0 if it lies outside C.
The critical region depends on the number of samples. For a test based on the
sample mean, the boundary of the critical region moves closer to the
hypothesized value as the sample size increases, so the critical region grows
and a given departure from H0 leads to rejection more readily. The boundary is
farthest from the hypothesized value for a sample of size 1.

4 Level of significance
Statisticians refer to the significance level as alpha (α). A hypothesis
test determines whether the evidence in the sample data is strong enough to
conclude that an observed effect applies to the whole population. Suppose we
have two different data samples. Do they necessarily represent two different
populations? No, because samples drawn from the same population also differ by
chance; a sample only provides evidence for an effect. The significance level
is a measure of how strong the sample evidence must be before we reject the
null hypothesis. A test at level α has the property that whenever H0 is true,
the probability that it is rejected is ≤ α. In other words, the probability of
a Type I error is ≤ α. For example,
Example 4.1 An automobile manufacturing company claims that a particular
car gives a mileage of 32 kmpl. A customer researches the car before buying
it and wants to test whether the company's claim is true. As a customer or
researcher who wants to discredit the company's claim,
H0 : mileage = 32 kmpl
Ha : mileage ≠ 32 kmpl
Suppose the hypothesis test results in rejection of H0 at the α = 5% level of
significance. This means that the hypothesis H0 : mileage = 32 is rejected by a
procedure that would have led to rejection only 5% of the time when mileage = 32.
Mathematical expression: $P\{|\bar{X} - 32| \ge c\} = \alpha$ when mileage = 32, where
$\bar{X}$ is the sample mean and c is the cutoff that defines the critical region.
Statisticians generally choose α = 1%, 5%, or 10%. If α = 5%, then when the
null hypothesis is true, the test makes the right decision (does not reject it)
with probability 1 − α = 95%. Changing the value of α directly affects the
critical region: increasing α increases the probability of rejecting the null
hypothesis, so the critical/rejection region grows.
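
The meaning of α in Example 4.1 can be illustrated by simulation. The sketch below repeatedly draws samples with true mileage 32 and counts how often H0 is rejected; the fraction should be close to α. The population standard deviation `SIGMA`, the sample size `N`, the number of trials, and all function names are hypothetical choices made for illustration, and the cutoff c uses the standard normal quantile as derived in Section 5.1.

```python
import random
from statistics import NormalDist, fmean

MU0 = 32.0     # mileage claimed under H0 (from Example 4.1)
SIGMA = 2.0    # assumed known population standard deviation (hypothetical)
N = 25         # assumed sample size (hypothetical)
ALPHA = 0.05   # level of significance


def cutoff(alpha: float, sigma: float, n: int) -> float:
    """Return c such that P{|sample mean - mu0| >= c} = alpha when H0 is true."""
    sd_of_mean = sigma / n ** 0.5
    return NormalDist().inv_cdf(1 - alpha / 2) * sd_of_mean


def rejection_rate(true_mean: float, trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated samples for which H0: mileage = MU0 is rejected."""
    rng = random.Random(seed)
    c = cutoff(ALPHA, SIGMA, N)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        if abs(fmean(sample) - MU0) >= c:
            rejections += 1
    return rejections / trials


rate_h0_true = rejection_rate(true_mean=MU0)
print(f"rejection rate when H0 is true: {rate_h0_true:.3f}")
print(f"cutoff at alpha=5%:  {cutoff(0.05, SIGMA, N):.3f}")
print(f"cutoff at alpha=10%: {cutoff(0.10, SIGMA, N):.3f}")
```

The last two lines show the effect of changing α: a larger α gives a smaller cutoff c, that is, a larger rejection region.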

5 Tests concerning mean of a normal population
5.1 Unknown mean and Known variance
Suppose that $X_1, X_2, \ldots, X_n$ is a sample of size n from a normal
distribution having an unknown mean $\mu$ and known variance $\sigma^2$, and suppose
we are interested in testing the null hypothesis
$$H_0 : \mu = \mu_0$$
against the alternative hypothesis
$$H_a : \mu \ne \mu_0$$
where $\mu_0$ is some specified constant.
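Since $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a natural point estimator of $\mu$, H0 is not
rejected if $\bar{X}$ is not too far from $\mu_0$. The known variance $\sigma^2$ is the population
variance, not the sample variance. How far $\bar{X}$ must be from $\mu_0$ for the null
hypothesis to be rejected is decided by the critical region C: H0 is rejected if

$$|\bar{X} - \mu_0| \ge c. \tag{1}$$

On the number line, H0 is rejected to the left of $\mu_0 - c$ and to the right of
$\mu_0 + c$, and it is not rejected (accepted) between $\mu_0 - c$ and $\mu_0 + c$.

For the test to have significance level α, the value c must satisfy

$$P\{|\bar{X} - \mu_0| \ge c\} = \alpha \quad \text{when } \mu = \mu_0. \tag{2}$$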


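When $\mu = \mu_0$, $\bar{X}$ is normally distributed with mean $\mu_0$ and standard deviation
$\sigma/\sqrt{n}$, so we can define the standard normal variable Z:

$$Z \equiv \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu_0) \tag{3}$$

Multiplying both sides of equation (1) by $\sqrt{n}/\sigma$,

$$\frac{\sqrt{n}}{\sigma}\,|\bar{X} - \mu_0| \ge \frac{\sqrt{n}}{\sigma}\,c \tag{4}$$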

From equations (3) and (4),

$$|Z| \ge \frac{\sqrt{n}}{\sigma}\,c \tag{5}$$

From equations (1), (2) and (5),

$$P\left\{|Z| \ge \frac{\sqrt{n}}{\sigma}\,c\right\} = \alpha \tag{6}$$

Figure 1
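Figure 1 is the graph of the standard normal distribution. By the symmetry of
the graph, we can write equation (6) as

$$2P\left\{Z \ge \frac{\sqrt{n}}{\sigma}\,c\right\} = \alpha$$

$$\therefore\; P\left\{Z \ge \frac{\sqrt{n}}{\sigma}\,c\right\} = \frac{\alpha}{2} \tag{7}$$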

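As we know, $Z_{\alpha/2}$ is defined by

$$P\{Z \ge Z_{\alpha/2}\} = \frac{\alpha}{2} \tag{8}$$

By comparing equations (7) and (8),

$$\frac{\sqrt{n}}{\sigma}\,c = Z_{\alpha/2}$$

$$\therefore\; c = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \tag{9}$$

Substituting the value of c from equation (9) into (1),

$$|\bar{X} - \mu_0| \ge Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \tag{10}$$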

Figure 2
H0 is rejected if inequality (10) is satisfied. Figure 2 illustrates α = 30%, for
which $Z_{\alpha/2} \approx 1.04$: 30% of the area under the curve is the rejection region
and 70% is the non-rejection region.
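
The symmetry step between equations (6) and (8) can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `upper_quantile` and `two_sided_tail` are illustrative assumptions. For each α it computes the value Z_(α/2) with upper-tail probability α/2 and confirms that the two tails together have probability α.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_quantile(tail_prob: float) -> float:
    """Return the value z with P{Z >= z} = tail_prob."""
    return std_normal.inv_cdf(1 - tail_prob)


def two_sided_tail(z: float) -> float:
    """Return P{|Z| >= z} = 2 * P{Z >= z}, using symmetry about 0."""
    return 2 * (1 - std_normal.cdf(z))


for alpha in (0.01, 0.05, 0.10, 0.30):
    z = upper_quantile(alpha / 2)
    print(f"alpha={alpha:.2f}  Z_(alpha/2)={z:.3f}  "
          f"P(|Z| >= Z_(alpha/2))={two_sided_tail(z):.3f}")
```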

Example 5.1.1 Suppose there are on average 5 family members per house in a
village, with standard deviation 4. We randomly choose 10 houses and find an
average of 3 family members per house. Test the stated average at the 5% level
of significance.
Soln : We have n = 10, $\bar{X} = 3$, $\mu_0 = 5$, $\sigma = 4$ and α = 0.05, so
$Z_{0.025} = 1.96$.
$$H_0 : \mu = 5$$
$$H_a : \mu \ne 5$$
$$\frac{\sqrt{n}}{\sigma}\,|\bar{X} - \mu_0| = \frac{\sqrt{10}}{4}\,|3 - 5| = 1.58$$
Since 1.58 < 1.96, the condition of equation (10) is not satisfied, so H0 cannot
be rejected.

Figure 3
Our sample mean is in the accepting (non-rejecting) region, as Figure 3 shows.
Now take the level of significance α = 12%, so that $Z_{0.06} = 1.55$.
Since 1.58 ≥ 1.55, the condition of equation (10) is satisfied, so H0 can be
rejected.
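
The arithmetic of Example 5.1.1 can be reproduced with a few lines of Python. This is a sketch under the example's own numbers; the function name `z_test_two_sided` and its return layout are illustrative assumptions. Besides the test statistic, it reports the non-rejection interval from $\mu_0 - c$ to $\mu_0 + c$, with c taken from equation (9).

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar: float, mu0: float, sigma: float, n: int, alpha: float):
    """Apply rule (10): reject H0 when |xbar - mu0| >= Z_(alpha/2) * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # Z_(alpha/2)
    c = z_crit * sigma / sqrt(n)                    # cutoff c from equation (9)
    statistic = sqrt(n) / sigma * abs(xbar - mu0)   # standardized distance from mu0
    reject = abs(xbar - mu0) >= c
    return statistic, z_crit, (mu0 - c, mu0 + c), reject


# Example 5.1.1: n = 10, sample mean 3, mu0 = 5, sigma = 4.
for alpha in (0.05, 0.12):
    stat, z_crit, (low, high), reject = z_test_two_sided(3, 5, 4, 10, alpha)
    print(f"alpha={alpha:.2f}: statistic={stat:.2f}, Z_(alpha/2)={z_crit:.2f}, "
          f"non-rejection interval=({low:.2f}, {high:.2f}), reject H0={reject}")
```

The sample mean 3 lies inside the non-rejection interval at α = 5% and just outside it at α = 12%, which matches the two conclusions of the example.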

Figure 4

Our sample mean is in the critical region, as Figure 4 shows. Hence, when the
significance level increases, the accepting (non-rejecting) region shrinks and
the rejecting region grows. When we increase the sample size, the cutoff c in
equation (9) becomes smaller, so a given difference between the sample mean and
$\mu_0$ leads to rejection of H0 more readily. The value of α must be chosen before
looking at the data. Choosing it afterwards to get a desired result is cheating:
for example, picking α so small that the critical region shrinks, the sample
mean falls in the accepting region, and H0 cannot be rejected.
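
The effect of sample size and significance level on the chance of rejecting H0 can be computed directly from rule (10). The sketch below assumes, purely for illustration, that the true mean is 3 while H0 states $\mu_0 = 5$ with $\sigma = 4$; the function name `rejection_probability` is also an assumption. It relies on the fact that Z in equation (3) is then normal with mean $\sqrt{n}(\mu - \mu_0)/\sigma$ and standard deviation 1.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def rejection_probability(true_mu: float, mu0: float, sigma: float,
                          n: int, alpha: float) -> float:
    """P{|Xbar - mu0| >= Z_(alpha/2) * sigma / sqrt(n)} when the true mean is true_mu."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * (true_mu - mu0) / sigma   # mean of Z under the true mean
    # Z ~ Normal(shift, 1); H0 is rejected when Z <= -z_crit or Z >= z_crit.
    return std_normal.cdf(-z_crit - shift) + (1 - std_normal.cdf(z_crit - shift))


# Hypothetical scenario: true mean 3, H0 claims 5, sigma = 4.
for n in (5, 10, 20, 40):
    p = rejection_probability(true_mu=3, mu0=5, sigma=4, n=n, alpha=0.05)
    print(f"n={n:2d}, alpha=0.05: probability of rejecting H0 = {p:.3f}")

for alpha in (0.01, 0.05, 0.12):
    p = rejection_probability(true_mu=3, mu0=5, sigma=4, n=10, alpha=alpha)
    print(f"n=10, alpha={alpha:.2f}: probability of rejecting H0 = {p:.3f}")
```

When `true_mu` equals `mu0`, the function returns α itself, which is the Type I error probability from Section 4.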
