Debre Tabor University
College of Health Science
SPH unit
Mulu Tiruneh (Asst. Professor in Biostatistics)
tirunehmulu1@gmail.com
January, 2024
At the end of this chapter the student will be able to:
Understand the concepts of null and alternative hypotheses
Explain the meaning and application of statistical significance
Differentiate between type I and type II errors
Describe the different types of statistical tests used when
samples are large and small
Explain the meaning and application of P-values
Understand the concepts of degrees of freedom
HYPOTHESIS TESTING
• Hypothesis: is a statement about one or more populations.
• It is usually concerned with the parameters (mean,
proportion) of the population.
• A statistical hypothesis is an assumption or a statement
which may or may not be true concerning one or more
populations.
E.g. 1) The mean height of the DTU Health Sciences
students is 1.63m.
2) There is no difference between the distribution of P.
falciparum and P. vivax malaria in Ethiopia (distributed in
Statistical hypotheses
There are two hypotheses involved in hypothesis testing
1. Null hypothesis (H0): It is the hypothesis to be
tested. It is also called the hypothesis of no difference or no
effect.
2. Alternative hypothesis (HA): It is a statement of
what we believe is true if our sample data cause us to
reject the null hypothesis.
In general, hypothesis testing in statistics involves the following
steps:
1. Choose the hypothesis that is to be questioned.
2. Choose an alternative hypothesis which is accepted if the original
hypothesis is rejected.
3. Choose a rule for making a decision about when to reject the
original hypothesis and when to fail to reject it.
4. Choose a random sample from the appropriate population and
compute appropriate statistics: that is, mean, variance and so on.
5. Make the decision.
Choosing the Alternative Hypothesis (HA)
• The notation HA is used for the hypothesis that will be accepted if H0 is
rejected.
• HA must also be formulated before a sample is tested, so, like the null
hypothesis (H0), it does not depend on sample values.
Possible choices of HA
If H0 is                                  Then HA is
μ = A (single mean)                       μ ≠ A or μ < A or μ > A
P = B (single proportion)                 P ≠ B or P < B or P > B
μx - μy = C (difference of means)         μx - μy ≠ C or μx - μy < C or μx - μy > C
Px - Py = D (difference of proportions)   Px - Py ≠ D or Px - Py < D or Px - Py > D
Where A, B, C and D are constants.
Cont.…
• A method for making a decision must be agreed upon.
• If HO is rejected, then HA is accepted.
How is a “significant” difference defined?
• A null hypothesis is either true or false, and it is either
rejected or not rejected.
• No error is made if it is true and we fail to reject it, or if it
is false and rejected.
• An error is made, however, if it is true but rejected, or if it
is false and we fail to reject it.
Definitions
• A Type I error is made when H0 is true but rejected.
• A Type II error is made when H0 is false but we fail
to reject it.
Notation:
• α is the probability of a type I error.
• It is called the level of significance.
• β is the probability of a type II error.
• The following table summarizes the relationships
between the null hypothesis and the decision taken .
                     Decision
Null hypothesis      Fail to reject H0 ("accept" H0)    Reject H0
H0 true              Correct decision                   Type I error (α)
H0 false             Type II error (β)                  Correct decision
Level of Significance, α
• Is the probability of rejecting a true Ho
• Defines rejection region of the sampling distribution
• The decision is made on the basis of the level of
significance, designated by α.
• The most frequently used values of α are 0.01, 0.05 and
0.10.
• α is selected by the researcher
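To illustrate that α is the long-run proportion of true null hypotheses that get rejected, here is a minimal simulation sketch. The population mean (50), standard deviation (10), sample size, number of trials and random seed are illustrative assumptions, not values from these slides.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(mu0=50.0, sigma=10.0, n=25, alpha=0.05,
                      trials=4000, seed=1):
    """Estimate how often a two-tailed Z test rejects a TRUE H0: mu = mu0.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = sigma / n ** 0.5                        # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_cal = (mean(sample) - mu0) / se
        if abs(z_cal) > z_tab:
            rejections += 1                      # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

With α = 0.05 the estimated rate should land close to 0.05; it will not be exactly 0.05 because it is based on a finite number of simulated samples.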
One tail and two tail tests
• In a one tail test, the rejection region is at one end of the distribution or the other.
• Consider the situation when HA includes the symbol “ > or < ”. That is,
HA: μ > __ , HA : μ < __,
HA : P > __, HA : P < __,
HA : μx - μy > ___, HA : μx - μy < ___, etc.
• In a two tail test, the rejection region is split between the two tails.
• Consider the situation when HA includes the symbol “≠”. That is,
HA : μ ≠ _ HA : P ≠ __ HA : μx - μy ≠ ___
• Which one is used depends on the way the HA is stated.
• The most frequently used values of α and the
corresponding critical values of Z are:
α (level of significance)    Two-tailed    One-tailed, <    One-tailed, >
0.10                         ± 1.645       - 1.28           1.28
0.05                         ± 1.96        - 1.645          1.645
0.01                         ± 2.58        - 2.33           2.33
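The critical values in this table can be reproduced from the standard normal distribution. The sketch below uses Python's standard library; the function name `z_critical` and its `tail` argument are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

def z_critical(alpha, tail="two"):
    """Return the critical Z value for a given level of significance.

    tail: "two"   (HA uses ≠; use the result as ± value),
          "right" (HA uses >),
          "left"  (HA uses <; the result is negative).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'left' or 'right'")

for alpha in (0.10, 0.05, 0.01):
    print(alpha,
          round(z_critical(alpha, "two"), 3),
          round(z_critical(alpha, "left"), 3),
          round(z_critical(alpha, "right"), 3))
```

Printed tables usually round these values to two or three decimals, so small differences in the last digit are expected.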
Level of Significance and the Rejection Region
e.g. The average survival year after cancer diagnosis
is less than 3 years.
Steps in testing hypothesis
1. Data: understand the nature of data (e.g. counts or
measurements or proportions)
2. Assumptions: about normality of population
distribution, equality of variance, independence of
samples
3. Hypotheses: the H0 and HA should be explicitly stated
3.Hypotheses cont’d
Rules for stating statistical hypotheses
a) What you hope to be able to conclude as a result of the test
usually should be placed in the alternative hypothesis.
b) The null hypothesis should contain a statement of equality, either
=, ≥, or ≤.
c) The null hypothesis is the hypothesis that is tested.
d) The null and alternative hypotheses are complementary.
•That is, the two together exhaust all possibilities regarding the
value that the hypothesized parameter can assume.
4. Test statistic:
Decide on the appropriate test statistic for the hypothesis
(z, t, etc.) based on the
sample size (n < 30 or n ≥ 30),
type of data (counts, i.e. qualitative, or measurements, i.e.
quantitative),
functional form of the distribution (normal or non-normal),
known or unknown population variance,
number of means or proportions, etc.
General formula for test statistic
$$\text{test statistic} = \frac{\text{observed statistic} - \text{hypothesized parameter}}{\text{standard error of the observed statistic}}$$
5. Select the level of significance (α):
(α = 0.05, 0.01, 0.001, etc.). If not given, take α = 0.05.
• The level of significance (α) is the probability of
rejecting a true null hypothesis.
6. Determine the critical value (Ztab, ttab):
• It is the value the test statistic must attain to be declared
significant (i.e. it marks the boundary between the rejection and non-rejection regions).
7. Calculation of test statistic (Zcal, tcal):
Calculate the test statistic chosen in step 4 and compare it with the
critical value.
8. Statistical decision: the statistical decision consists of rejecting or not
rejecting the null hypothesis.
H0 is rejected if the computed value of the test statistic falls in the
rejection region, e.g. for a two-tailed test, reject H0 if |Zcal| > Ztab or |tcal| > ttab.
H0 is not rejected if the computed value of the test statistic falls in the
non-rejection region, e.g. for a two-tailed test, do not reject H0 if |Zcal| ≤ Ztab or
|tcal| ≤ ttab.
9. Conclusion :
• If Ho is rejected, we conclude that HA is true.
• If Ho is not rejected, we conclude that Ho may be true.
10. P-values:
• The P-value is the probability, if H0 is true, of getting a value of the test
statistic as extreme as or more extreme than the observed value just by
random chance.
Reject the null hypothesis if P ≤ α
Do not reject the null hypothesis if P > α
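As a minimal sketch of how a P-value can be obtained for a Z statistic, the function below returns the tail area under the standard normal curve. The name `p_value_z` and the `tail` labels are illustrative; for a two-tailed test it returns the doubled tail area, so the result is compared directly with α.

```python
from statistics import NormalDist

def p_value_z(z_cal, tail="two"):
    """P-value for a computed Z statistic under H0.

    tail: "right" for HA with >, "left" for HA with <,
          "two" for HA with ≠ (both tails added together).
    """
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z_cal)
    if tail == "left":
        return std_normal.cdf(z_cal)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_cal)))
    raise ValueError("tail must be 'two', 'left' or 'right'")

alpha = 0.05
p = p_value_z(1.96, "two")
print(f"P = {p:.4f}; reject H0: {p <= alpha}")
```

The decision is the same as the one obtained by comparing the test statistic with the critical value; the P-value additionally shows how far inside or outside the rejection region the result lies.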
I. Testing a hypothesis about the mean of a
population
1. Data: Determine the variable, sample size (n), sample mean (x̄), and
population standard deviation (σ), or sample standard deviation (s)
if σ is unknown.
2. Assumptions : We have two cases:
• Case1: Population is normally or approximately normally
distributed with known or unknown variance (n may be small
or large),
• Case 2: Population is not normal with known or unknown
variance (n is large, i.e. n ≥ 30).
3.Hypotheses:
we have three cases
• Case I : H0: μ = μ0
HA: μ ≠ μ0
e.g. we want to test that the population mean is different from 50
• Case II : H0: μ ≤ μ0
HA: μ > μ0
e.g. we want to test that the population mean is greater than 50
• Case III : H0: μ ≥ μ0
HA: μ < μ0
e.g. we want to test that the population mean is less than 50
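4. Test Statistic:
• Case 1: Population is normal or approximately normal
i) If σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown and n is large: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
iii) If σ² is unknown and n is small: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, with n - 1 degrees of freedom
• Case 2: Population is not normally distributed and n is large
i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$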
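The choice among these test statistics can be written as a small decision function. This is only a sketch of the cases listed on this slide; the function name, the returned labels and the cut-off parameter `large_n` (set to 30, following the slides) are illustrative. A small sample from a non-normal population is not covered here, so the function returns `None` for it.

```python
def choose_test_statistic(population_normal, sigma_known, n, large_n=30):
    """Pick the test statistic for a single mean, following the slide's cases."""
    if population_normal:
        if sigma_known:
            return "Z using sigma"          # Case 1, variance known
        if n >= large_n:
            return "Z using s"              # Case 1, variance unknown, n large
        return "t using s, df = n - 1"      # Case 1, variance unknown, n small
    if n >= large_n:
        # Case 2: not normal, but n is large
        return "Z using sigma" if sigma_known else "Z using s"
    return None  # not normal and n small: outside the cases on this slide

print(choose_test_statistic(True, True, 10))
print(choose_test_statistic(True, False, 17))
print(choose_test_statistic(False, False, 157))
```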
5. Decision Rule:
i) If HA: μ ≠ μ0
Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z test), or
Reject H0 if T > t1-α/2,n-1 or T < -t1-α/2,n-1 (when using the t test)
ii) If HA: μ > μ0
Reject H0 if Z > Z1-α (when using the Z test), or
Reject H0 if T > t1-α,n-1 (when using the t test)
iii) If HA: μ < μ0
Reject H0 if Z < -Z1-α (when using the Z test), or
Reject H0 if T < -t1-α,n-1 (when using the t test)
Note
• Z1-α/2 , Z1-α are tabulated values obtained from Z
table
• t1-α/2 , t1-α are tabulated values obtained from t table
with (n-1) degree of freedom (df)
6. Decision:
• If we reject H0, we can conclude that HA is true.
• If, however, we do not reject H0, we may conclude
that H0 may be true.
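The Z-test version of steps 4 to 6 can be sketched as one function. The function name, the `alternative` labels and the numbers in the usage lines are hypothetical; the standard deviation passed in is σ when it is known, or s when n is large.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sd, n, alpha=0.05, alternative="two-sided"):
    """One-sample Z test for a mean.

    alternative: "two-sided" (HA: mu != mu0), "greater" (HA: mu > mu0),
                 or "less" (HA: mu < mu0).
    Returns (z_cal, z_tab, reject_h0).
    """
    z_cal = (x_bar - mu0) / (sd / sqrt(n))       # test statistic
    std_normal = NormalDist()
    if alternative == "two-sided":
        z_tab = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_cal) > z_tab
    elif alternative == "greater":
        z_tab = std_normal.inv_cdf(1 - alpha)
        reject = z_cal > z_tab
    elif alternative == "less":
        z_tab = std_normal.inv_cdf(1 - alpha)
        reject = z_cal < -z_tab
    else:
        raise ValueError("unknown alternative")
    return z_cal, z_tab, reject

# Hypothetical numbers, for illustration only.
print(one_sample_z_test(52, 50, 10, 100))
print(one_sample_z_test(51, 50, 10, 100, alternative="greater"))
```

A t-test version would follow the same pattern, with the critical value taken from the t table with n - 1 degrees of freedom instead of the Z table.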
An Alternative Decision Rule using the p - value
• The P-value is defined as the smallest value of α for
which the null hypothesis can be rejected.
• For a one-tailed test, P is the tail area beyond the computed
test statistic: reject H0 if P ≤ α; do not reject H0 if P > α.
• For a two-tailed test, compare the one-tail area with α/2:
reject H0 if P ≤ α/2; do not reject H0 if P > α/2.
Equivalently, double the one-tail area and compare it with α.
Example
• Researchers are interested in the mean age of a certain
population.
• A random sample of 10 individuals drawn from the
population of interest has a mean of 27.
• Assume that the population is approximately normally
distributed with variance 20.
• Can we conclude that the mean is different from 30 years?
(α = 0.05)
Solution
1. Data: variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately normally
distributed with variance 20
3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30
4. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
5. Level of significance: α = 0.05
6. Decision rule and critical value
• The alternative hypothesis is HA: μ ≠ 30
Reject H0 if Zcal > Ztab or Zcal < -Ztab
Generally, when HA: μ ≠ μ0, reject H0 if |Zcal| > Ztab
• Since HA is two-sided, we divide α by 2:
Ztab = Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 in the right tail and -1.96 in the left
tail
7. Calculation of test statistic
• $Z_{cal} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
8. Statistical Decision:
• We reject H0, since -2.12 is in the rejection region,
i.e. |-2.12| > 1.96
9. Conclusion
• We can conclude that the mean age (μ) is different from 30
years
10. P-value: the one-tail area is P(Z ≤ -2.12) = 0.0170 < 0.025, i.e. P ≤ α/2
(equivalently, the two-sided P-value is 2 × 0.0170 = 0.034 < 0.05). Therefore we
reject H0
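The arithmetic of this example can be checked with a few lines of Python. This is a sketch using only the numbers given in the example; the variable names are illustrative, and the P-value is reported as the doubled tail area so that it is compared with α rather than α/2.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: n = 10, sample mean 27, variance 20, H0: mu = 30
n, x_bar, sigma_sq, mu0, alpha = 10, 27, 20, 30, 0.05

z_cal = (x_bar - mu0) / sqrt(sigma_sq / n)       # same as (27 - 30) / (√20/√10)
z_tab = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
p_two_sided = 2 * NormalDist().cdf(-abs(z_cal))  # both tails added together
reject = abs(z_cal) > z_tab

print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.2f}")
print(f"two-sided P = {p_two_sided:.3f}, reject H0: {reject}")
```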
Example
• Among 157 African-American men, the mean systolic
blood pressure was 146 mm Hg with a standard
deviation of 27 mm Hg.
• We wish to know if, on the basis of these data,
we may conclude that the mean systolic blood
pressure for a population of African-American men is greater
than 140 mm Hg.
• Use α = 0.01.
Solution
1. Data: Variable is systolic blood pressure, n = 157,
x̄ = 146, s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is unknown,
n > 30
3. Hypotheses: H0: μ ≤ 140
HA: μ > 140
4. Test statistic:
$Z_{cal} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$
5. Level of significance: α = 0.01.
6. Decision rule:
reject H0 if Zcal > Z1-α
7. Critical value: Ztab = Z0.99 = 2.33 (from the Z table)
8. Statistical decision: We reject H0, since 2.78 > 2.33
9. Conclusion: We may conclude that the mean systolic
blood pressure for a population of African-American men is
greater than 140 mm Hg.
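The same calculation can be sketched in Python for this one-tailed example. Only the numbers from the example are used; the variable names are illustrative, and the sample standard deviation s stands in for σ because n is large.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: n = 157, sample mean 146, s = 27, H0: mu <= 140
n, x_bar, s, mu0, alpha = 157, 146, 27, 140, 0.01

z_cal = (x_bar - mu0) / (s / sqrt(n))        # test statistic
z_tab = NormalDist().inv_cdf(1 - alpha)      # right-tail critical value
p_right = 1 - NormalDist().cdf(z_cal)        # one-tailed P-value
reject = z_cal > z_tab

print(f"Zcal = {z_cal:.2f}, Ztab = {z_tab:.2f}")
print(f"P = {p_right:.4f}, reject H0: {reject}")
```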
Exercise
• A simple random sample of 17 patients with muscle
injury was treated at a research center.
• The variable of interest was the number of days between
injury and recovery. The number of days until recovery
was normally distributed in the population.
• Can we conclude that the mean number of days is not 15
days in the population represented by the sample data?
• See the data below
Table: number of days until recovery for subjects with
muscle injury
Subject Days Subject Days
1 14 11 28
2 9 12 24
3 18 13 24
4 26 14 2
5 12 15 3
6 0 16 14
7 10 17 9
8 4
9 8
10 21
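One possible way to check a hand calculation for this exercise is sketched below. Two things are assumptions rather than part of the exercise: α = 0.05 is used because no level is stated, and the critical value 2.120 is the t-table entry for t0.975 with 16 degrees of freedom, hard-coded because Python's standard library has no t distribution. Since the population is normal, σ is unknown and n is small, the t statistic is used.

```python
from math import sqrt
from statistics import mean, stdev

# Days until recovery for the 17 subjects in the table above
days = [14, 9, 18, 26, 12, 0, 10, 4, 8, 21, 28, 24, 24, 2, 3, 14, 9]
mu0 = 15                      # H0: mu = 15, HA: mu != 15

n = len(days)
x_bar = mean(days)            # sample mean
s = stdev(days)               # sample standard deviation (n - 1 denominator)
t_cal = (x_bar - mu0) / (s / sqrt(n))

T_TAB = 2.120                 # ASSUMED: t(0.975, df = 16) read from a t table, alpha = 0.05
reject = abs(t_cal) > T_TAB

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}")
print(f"tcal = {t_cal:.2f}, reject H0: {reject}")
```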
Hypothesis Testing:
A population proportion:
A single population proportion:
• Testing a hypothesis about a population proportion (P) has
the following steps:
1. Data: sample size (n), sample proportion (p̂),
hypothesized population proportion (P0)
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
2. Assumptions: the sampling distribution of p̂ is approximately normal
3.Hypotheses:
we have three cases
• Case I : H0: P = P0
HA: P ≠ P0
• Case II : H0: P ≤ P0
HA: P > P0
• Case III : H0: P ≥ P0
HA: P < P0
4. Test Statistic: $Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where q0 = 1 - p0
5. Decision Rule:
i) If HA: P ≠ P0
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P > P0
• Reject H0 if Z > Z1-α
iii) If HA: P < P0
• Reject H0 if Z < -Z1-α
Note: Z1-α/2 and Z1-α are tabulated values obtained from the Z table
6. Conclusion: reject or fail to reject H0
Example
• A study of 301 Hispanic women in San Antonio, Texas
investigated the percentage of subjects with impaired fasting
glucose (IFG).
• In the study, 24 women were classified in the IFG stage. The
population estimate of IFG among Hispanic women in Texas
is 6.3%.
• Is there sufficient evidence to indicate that the population of
Hispanic women in San Antonio has a prevalence of IFG higher
than 6.3%?
Solution:
1. Data: n = 301, a = 24, P0 = 6.3/100 = 0.063,
q0 = 1 - P0 = 1 - 0.063 = 0.937, α = 0.05
$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$
2. Assumptions: p̂ is approximately normally
distributed
3. Hypotheses:
H0: P ≤ 0.063
HA: P > 0.063
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.08 - 0.063}{\sqrt{0.063(0.937)/301}} = 1.21$
5. Decision Rule: α = 0.05
Reject H0 if Z > Z1-α
where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Statistical decision: Fail to reject H0,
since Z = 1.21 < Z1-α = 1.645
• Interpretation ?????
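The test statistic for this example can be checked with the short sketch below, which uses only the numbers given in the example (variable names are illustrative). Because it keeps the unrounded sample proportion 24/301 instead of 0.08, the computed Z is slightly smaller than the hand-calculated 1.21, but it leads to the same decision.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: 24 of 301 women with IFG, H0: P <= 0.063
a, n, p0, alpha = 24, 301, 0.063, 0.05

p_hat = a / n                                  # sample proportion (unrounded)
q0 = 1 - p0
z_cal = (p_hat - p0) / sqrt(p0 * q0 / n)       # test statistic
z_tab = NormalDist().inv_cdf(1 - alpha)        # right-tail critical value
p_right = 1 - NormalDist().cdf(z_cal)          # one-tailed P-value
reject = z_cal > z_tab

print(f"p_hat = {p_hat:.4f}, Zcal = {z_cal:.2f}, Ztab = {z_tab:.3f}")
print(f"P = {p_right:.3f}, reject H0: {reject}")
```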
THANK YOU!!!!