
Debre Tabor University

College of Health Science


SPH unit
Mulu Tiruneh (Asst. Professor in Biostatistics)

tirunehmulu1@gmail.com

January, 2024

At the end of this chapter the student will be able to:

• Understand the concepts of null and alternative hypotheses

• Explain the meaning and application of statistical significance

• Differentiate between type I and type II errors

• Describe the different types of statistical tests used when samples are large and small

• Explain the meaning and application of P-values

• Understand the concept of degrees of freedom


HYPOTHESIS TESTING
• Hypothesis: a statement about one or more populations.
• It is usually concerned with the parameters (mean, proportion) of the population.

• A statistical hypothesis is an assumption or a statement, which may or may not be true, concerning one or more populations.

E.g. 1) The mean height of the DTU Health Sciences students is 1.63 m.

2) There is no difference between the distribution of P. falciparum and P. vivax malaria in Ethiopia (distributed in
Statistical hypotheses

There are two hypotheses involved in hypothesis testing:

1. Null hypothesis (H0): the hypothesis to be tested. It is also called the hypothesis of no difference or no effect.

2. Alternative hypothesis (HA): a statement of what we believe is true if our sample data cause us to reject the null hypothesis.
In general, hypothesis testing in statistics involves the following steps:

1. Choose the hypothesis that is to be questioned.

2. Choose an alternative hypothesis, which is accepted if the original hypothesis is rejected.

3. Choose a rule for deciding when to reject the original hypothesis and when to fail to reject it.

4. Choose a random sample from the appropriate population and compute the appropriate statistics, that is, mean, variance and so on.

5. Make the decision.
Choosing the Alternative Hypothesis (HA)

• The notation HA is used for the hypothesis that will be accepted if H0 is rejected.

• HA must also be formulated before a sample is tested, so, like the null hypothesis (H0), it does not depend on sample values.

Possible choices of HA

If H0 is                                   Then HA is
μ = A (single mean)                        μ ≠ A or μ < A or μ > A
P = B (single proportion)                  P ≠ B or P < B or P > B
μx – μy = C (difference of means)          μx – μy ≠ C or μx – μy < C or μx – μy > C
Px – Py = D (difference of proportions)    Px – Py ≠ D or Px – Py < D or Px – Py > D

Where A, B, C and D are constants.

Cont.…

• A method for making a decision must be agreed upon.

• If H0 is rejected, then HA is accepted.

How is a “significant” difference defined?

• A null hypothesis is either true or false, and it is either rejected or not rejected.

• No error is made if it is true and we fail to reject it, or if it is false and we reject it.

• An error is made, however, if it is true but rejected, or if it is false and we fail to reject it.
Definitions

• A Type I error is made when H0 is true but rejected.

• A Type II error is made when H0 is false but we fail to reject it.

Notation:

• α is the probability of a type I error. It is called the level of significance.

• β is the probability of a type II error.
• The following table summarizes the relationships between the null hypothesis and the decision taken.

                   Decision
Null hypothesis    Fail to reject ("accept") H0    Reject H0
H0 true            Correct                         Type I error
H0 false           Type II error                   Correct
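The meaning of α as the probability of a type I error can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `type_i_error_rate` and the values μ0 = 30, σ = 4 and n = 10 are hypothetical choices, not taken from the slides. It draws many samples from a population in which H0 is true and counts how often a two-tailed Z test at α = 0.05 rejects H0.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(mu0=30.0, sigma=4.0, n=10, alpha=0.05, trials=20000, seed=1):
    """Estimate how often a true H0: mu = mu0 is rejected (hypothetical helper)."""
    rng = random.Random(seed)
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample really comes from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_cal = (sum(sample) / n - mu0) / standard_error
        if abs(z_cal) > z_tab:
            rejections += 1                        # a type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is what "α is the probability of rejecting a true H0" means in practice.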
Level of Significance, α

• It is the probability of rejecting a true H0.

• It defines the rejection region of the sampling distribution.

• The decision is made on the basis of the level of significance, designated by α.

• The most frequently used values of α are 0.01, 0.05 and 0.10.

• α is selected by the researcher.
One tail and two tail tests
• In a one tail test, the rejection region is at one end of the distribution or the other.

• This is the situation when HA includes the symbol ">" or "<". That is,

HA: μ > __, HA: μ < __,

HA: P > __, HA: P < __,

HA: μx - μy > __, HA: μx - μy < __, etc.

• In a two tail test, the rejection region is split between the two tails.

• This is the situation when HA includes the symbol "≠". That is,

HA: μ ≠ __, HA: P ≠ __, HA: μx - μy ≠ __

• Which one is used depends on the way HA is stated.
• The most frequently used values of α and the corresponding critical values of Z are:

α (level of significance)    Two-tailed    One-tailed (<)    One-tailed (>)
0.10                         ± 1.64        - 1.28            1.28
0.05                         ± 1.96        - 1.64            1.64
0.01                         ± 2.58        - 2.33            2.33
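The tabulated critical values of Z can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` and its `tail` argument are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha, tail="two"):
    """Critical value of Z for a given alpha (hypothetical helper).

    tail: "two" for HA with "≠", "right" for ">", "left" for "<".
    """
    standard_normal = NormalDist()          # mean 0, standard deviation 1
    if tail == "two":
        return standard_normal.inv_cdf(1 - alpha / 2)   # use ± this value
    if tail == "right":
        return standard_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return standard_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for alpha in (0.10, 0.05, 0.01):
    print(alpha,
          round(z_critical(alpha, "two"), 2),
          round(z_critical(alpha, "left"), 2),
          round(z_critical(alpha, "right"), 2))
```

Rounded to two decimals, the values agree with the table; the unrounded one-tailed value for α = 0.05 is about 1.645.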
Level of Significance and the Rejection Region
E.g. HA: the average survival time after cancer diagnosis is less than 3 years (HA: μ < 3), so the rejection region lies in the left tail.
Steps in testing a hypothesis

1. Data: understand the nature of the data (e.g. counts, measurements or proportions)

2. Assumptions: about normality of the population distribution, equality of variances, independence of samples

3. Hypotheses: H0 and HA should be explicitly stated
3. Hypotheses cont'd
Rules for stating statistical hypotheses

a) What you hope to be able to conclude as a result of the test should usually be placed in the alternative hypothesis.

b) The null hypothesis should contain a statement of equality: =, ≥, or ≤.

c) The null hypothesis is the hypothesis that is tested.

d) The null and alternative hypotheses are complementary.

• That is, the two together exhaust all possibilities regarding the value that the hypothesized parameter can assume.
4. Test statistic:
• Decide on the appropriate test statistic for the hypothesis (Z, t, etc.) based on:
the sample size (n < 30 or n ≥ 30),
the type of data (counts, i.e. qualitative, or measurements, i.e. quantitative),
the functional form of the distribution (normal or non-normal),
known or unknown population variance,
the number of means or proportions, etc.

General formula for the test statistic:

$$\text{test statistic} = \frac{\text{observed statistic} - \text{hypothesized parameter}}{\text{standard error of the observed statistic}}$$
5. Select the level of significance (α):

(α = 0.05, 0.01, 0.001, etc.). If not given, take 0.05.

• The level of significance (α) is the probability of rejecting a true null hypothesis.

6. Determine the critical value (Ztab, ttab):

• It is the value the test statistic must attain to be declared significant (i.e. it marks off the rejection and non-rejection regions).
7. Calculation of test statistic (Zcal, tcal):

Calculate the test statistic chosen in step 4 and compare it with the critical value.

8. Statistical decision: the statistical decision consists of rejecting or not rejecting the null hypothesis.

H0 is rejected if the computed value of the test statistic falls in the rejection region, i.e. for a two-tailed test reject H0 if |Zcal| > Ztab or |tcal| > ttab (for a one-tailed test, compare in the direction stated by HA).

H0 is not rejected if the computed value of the test statistic falls in the non-rejection region, i.e. do not reject ("accept") H0 if |Zcal| ≤ Ztab or |tcal| ≤ ttab.
9. Conclusion:

• If H0 is rejected, we conclude that HA is true.

• If H0 is not rejected, we conclude that H0 may be true.

10. P-values:

• The P-value is the probability of getting a value for the test statistic as extreme as or more extreme than the observed value just by random chance if H0 is true.

Reject the null hypothesis if P ≤ α

Don't reject ("accept") the null hypothesis if P > α
I. Testing a hypothesis about the mean of a
population


1. Data: determine the variable, sample size (n), sample mean (x̄), and the population standard deviation (σ) or, if σ is unknown, the sample standard deviation (s).

2. Assumptions: we have two cases:

• Case 1: the population is normally or approximately normally distributed with known or unknown variance (n may be small or large).

• Case 2: the population is not normal, with known or unknown variance (n is large, i.e. n ≥ 30).
3. Hypotheses:
We have three cases.
• Case I: H0: μ = μ0
HA: μ ≠ μ0
e.g. we want to test that the population mean is different from 50

• Case II: H0: μ ≤ μ0
HA: μ > μ0
e.g. we want to test that the population mean is greater than 50

• Case III: H0: μ ≥ μ0
HA: μ < μ0
e.g. we want to test that the population mean is less than 50
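4. Test statistic:
• Case 1: population is normal or approximately normal

σ² is known (n large or small):
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

σ² is unknown, n large:
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

σ² is unknown, n small:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

• Case 2: population is not normally distributed and n is large

i) If σ² is known:
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

ii) If σ² is unknown:
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$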
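The choice between Z and t described in step 4 can be written as a short function. This is a sketch under stated assumptions: the name `one_sample_statistic` and its arguments are hypothetical, and n ≥ 30 is used as the "large sample" cut-off, as in the assumptions of step 2. The two calls reuse the numbers from the worked examples in this chapter.

```python
from math import sqrt


def one_sample_statistic(xbar, mu0, sd, n, sigma_known, population_normal):
    """Return (name, value) of the test statistic for a single mean.

    sd is sigma when sigma_known is True, otherwise the sample value s.
    """
    large_sample = n >= 30
    if not population_normal and not large_sample:
        # Neither Case 1 nor Case 2 of the slides applies
        raise ValueError("non-normal population with n < 30: Z and t are not justified")
    value = (xbar - mu0) / (sd / sqrt(n))
    # t is used only when sigma is unknown and the sample is small
    name = "Z" if (sigma_known or large_sample) else "t"
    return name, value


# Mean age example: n = 10, sigma^2 = 20 known, population approximately normal
print(one_sample_statistic(27, 30, sqrt(20), 10, sigma_known=True, population_normal=True))
# Blood pressure example: n = 157, s = 27, population not normal
print(one_sample_statistic(146, 140, 27, 157, sigma_known=False, population_normal=False))
```

The formula is the same in every branch; only the reference distribution (Z table or t table with n - 1 df) changes.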
5. Decision rule:
i) If HA: μ ≠ μ0
Reject H0 if Z > Z1-α/2 or Z < - Z1-α/2 (when using the Z-test), or
Reject H0 if t > t1-α/2,n-1 or t < - t1-α/2,n-1 (when using the t-test)

ii) If HA: μ > μ0
Reject H0 if Z > Z1-α (when using the Z-test), or
Reject H0 if t > t1-α,n-1 (when using the t-test)

iii) If HA: μ < μ0
Reject H0 if Z < - Z1-α (when using the Z-test), or
Reject H0 if t < - t1-α,n-1 (when using the t-test)
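The three decision rules can be expressed as one comparison function. The sketch below is illustrative only: `reject_h0` and the labels "two-sided", "greater" and "less" are hypothetical names, and the critical value is passed in as a positive number read from the Z or t table.

```python
def reject_h0(statistic, critical_value, alternative):
    """Apply the decision rule for a computed Z or t statistic.

    critical_value is the positive tabulated value:
    Z(1-alpha/2) or t(1-alpha/2, n-1) for "two-sided",
    Z(1-alpha) or t(1-alpha, n-1) for "greater" and "less".
    """
    if alternative == "two-sided":      # HA: mu != mu0
        return abs(statistic) > critical_value
    if alternative == "greater":        # HA: mu > mu0
        return statistic > critical_value
    if alternative == "less":           # HA: mu < mu0
        return statistic < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(reject_h0(-2.12, 1.96, "two-sided"))   # mean age example
print(reject_h0(2.78, 2.33, "greater"))      # blood pressure example
print(reject_h0(-1.50, 1.645, "less"))       # made-up value, not from the slides
```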
Note
• Z1-α/2 and Z1-α are tabulated values obtained from the Z table.
• t1-α/2 and t1-α are tabulated values obtained from the t table with (n-1) degrees of freedom (df).

6. Decision:

• If we reject H0, we can conclude that HA is true.

• If, however, we do not reject H0, we may conclude that H0 may be true.
An Alternative Decision Rule using the P-value
• The P-value is defined as the smallest value of α for which the null hypothesis can be rejected.

• If the P-value is less than or equal to α, we reject the null hypothesis. For a two-tailed test, either double the one-tail area before comparing it with α or, equivalently, compare the one-tail area with α/2.

• If the P-value is greater than α, we do not reject the null hypothesis (for a two-tailed test: one-tail area > α/2).
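For a Z test, the P-value can be computed directly from the standard normal distribution instead of being read from a table. This is a minimal sketch; the function name `z_p_value` and the labels for the alternative hypothesis are assumptions made for illustration.

```python
from statistics import NormalDist


def z_p_value(z_cal, alternative):
    """P-value of a computed Z statistic (hypothetical helper)."""
    cdf = NormalDist().cdf               # standard normal cumulative probability
    if alternative == "greater":         # HA: parameter > hypothesized value
        return 1 - cdf(z_cal)
    if alternative == "less":            # HA: parameter < hypothesized value
        return cdf(z_cal)
    if alternative == "two-sided":       # HA: parameter != hypothesized value
        return 2 * (1 - cdf(abs(z_cal))) # one-tail area doubled
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


alpha = 0.05
p = z_p_value(-2.12, "two-sided")
print(round(p, 3), "reject H0" if p <= alpha else "do not reject H0")
```

Doubling the tail area inside the function means the result is always compared with α itself, which is equivalent to comparing the one-tail area with α/2.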
Example

• Researchers are interested in the mean age of a certain population.

• A random sample of 10 individuals drawn from the population of interest has a mean of 27.

• Assume that the population is approximately normally distributed with variance 20.

• Can we conclude that the mean is different from 30 years? (α = 0.05)
Solution

1. Data: variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05

2. Assumptions: the population is approximately normally distributed with variance 20

3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30

4. Test statistic:
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
5. Level of significance: α = 0.05
6. Decision rule
• The alternative hypothesis is HA: μ ≠ 30, so reject H0 if Zcal > Ztab or Zcal < - Ztab.
• In general, when HA: μ ≠ μ0, reject H0 if |Zcal| > Ztab.
• Critical value: since HA is two-sided, we divide α by 2: Ztab = Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 in the right tail and -1.96 in the left tail.
7. Calculation of test statistic
$$Z_{\text{cal}} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = \frac{-3}{1.414} = -2.12$$

8. Statistical decision:
• We reject H0, since -2.12 is in the rejection region, i.e. |-2.12| > 1.96.

9. Conclusion
• We can conclude that the mean age (μ) is different from 30 years.

10. P-value: the one-tail area is P(Z ≤ -2.12) = 0.0170 < 0.025, i.e. P ≤ α/2 (equivalently, the two-sided P-value is 0.034 < 0.05). Therefore we reject H0.
Example
• Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27 mm Hg.

• We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140 mm Hg.

• Use α = 0.01.
Solution

1. Data: variable is systolic blood pressure, n = 157, x̄ = 146, s = 27, α = 0.01.

2. Assumptions: population is not normal, σ² is unknown, n > 30

3. Hypotheses: H0: μ ≤ 140
HA: μ > 140

4. Test statistic:
$$Z_{\text{cal}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$$
5. Level of significance: α = 0.01.
6. Decision rule:
• We reject H0 if Zcal > Z1-α

7. Critical value: Ztab = Z0.99 = 2.33 (from the Z table)

8. Statistical decision: we reject H0, since 2.78 > 2.33.

9. Conclusion: we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140 mm Hg.
Exercise

• A simple random sample of 17 patients with muscle injury was treated at a research center.

• The variable of interest was the number of days between injury and recovery. The number of days until recovery was normally distributed in the population.

• Can we conclude that the mean number of days is not 15 days in the population represented by the sample data?

• See the data below.
Table: number of days until recovery for subjects with muscle injury

Subject    Days    Subject    Days
1          14      11         28
2          9       12         24
3          18      13         24
4          26      14         2
5          12      15         3
6          0       16         14
7          10      17         9
8          4
9          8
10         21
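One way to check a hand calculation for this exercise is a short script. The sketch below assumes α = 0.05 (the exercise does not state α, so this follows the "if not given take 0.05" rule) and uses the t statistic because σ is unknown and n = 17 is small. The critical value 2.1199 is the tabulated t0.975 with 16 df, typed in by hand because the standard library has no t table.

```python
from math import sqrt
from statistics import mean, stdev

# Days until recovery for subjects 1 to 17, copied from the table
days = [14, 9, 18, 26, 12, 0, 10, 4, 8, 21, 28, 24, 24, 2, 3, 14, 9]

mu0 = 15                      # hypothesized mean under H0: mu = 15
n = len(days)
xbar = mean(days)             # sample mean
s = stdev(days)               # sample standard deviation (n - 1 in the denominator)
t_cal = (xbar - mu0) / (s / sqrt(n))

T_TAB = 2.1199                # t table: 1 - alpha/2 = 0.975, df = n - 1 = 16 (assumes alpha = 0.05)
reject = abs(t_cal) > T_TAB   # two-tailed rule, HA: mu != 15

print(f"n = {n}, mean = {xbar:.4f}, s = {s:.4f}, t = {t_cal:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```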
Hypothesis Testing:
A population proportion

A single population proportion:

• Testing a hypothesis about a population proportion (P) has the following steps:

1. Data: sample size (n), sample proportion (p̂), hypothesized population proportion (P0)

$$\hat{p} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}} = \frac{a}{n}$$

2. Assumptions: the sampling distribution of p̂ is approximately normal
3. Hypotheses:
We have three cases.
• Case I: H0: P = P0
HA: P ≠ P0
• Case II: H0: P ≤ P0
HA: P > P0
• Case III: H0: P ≥ P0
HA: P < P0

4. Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
5. Decision rule:

i) If HA: P ≠ P0

• Reject H0 if Z > Z1-α/2 or Z < - Z1-α/2

ii) If HA: P > P0

• Reject H0 if Z > Z1-α

iii) If HA: P < P0

• Reject H0 if Z < - Z1-α

Note: Z1-α/2 and Z1-α are tabulated values obtained from the Z table.

6. Conclusion: reject or fail to reject H0
Example

• A study of 301 Hispanic women in San Antonio, Texas investigated the percentage of subjects with impaired fasting glucose (IFG).

• In the study, 24 women were classified in the IFG stage. The population estimate for IFG among Hispanic women in Texas is 6.3%.

• Is there sufficient evidence to indicate that the population of Hispanic women in San Antonio has a prevalence of IFG higher than 6.3%?
Solution:

1. Data: n = 301, P0 = 6.3/100 = 0.063, a = 24,

$$\hat{p} = \frac{a}{n} = \frac{24}{301} \approx 0.08$$

q0 = 1 - P0 = 1 - 0.063 = 0.937, α = 0.05

2. Assumptions: p̂ is approximately normally distributed

3. Hypotheses:
H0: P ≤ 0.063
HA: P > 0.063
4. Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.08 - 0.063}{\sqrt{0.063(0.937)/301}} = 1.21$$

5. Decision rule: α = 0.05
Reject H0 if Z > Z1-α,
where Z1-α = Z1-0.05 = Z0.95 = 1.645

6. Statistical decision: fail to reject H0, since Z = 1.21 < Z1-α = 1.645

• Interpretation?
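The proportion test for the IFG example can be reproduced in a few lines. This is a sketch only: `one_proportion_z_test` is a hypothetical helper name, it covers just the right-tailed case (HA: P > P0) used in the example, and it relies on the normal approximation stated in the assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(a, n, p0, alpha=0.05):
    """Right-tailed Z test for a single proportion, HA: P > p0 (hypothetical helper)."""
    p_hat = a / n                                  # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)       # uses p0 and q0 = 1 - p0, as in the slides
    z_cal = (p_hat - p0) / standard_error
    z_tab = NormalDist().inv_cdf(1 - alpha)        # critical value Z(1 - alpha)
    p_value = 1 - NormalDist().cdf(z_cal)          # right-tail area
    return {"p_hat": p_hat, "z": z_cal, "z_tab": z_tab,
            "p_value": p_value, "reject_h0": z_cal > z_tab}


result = one_proportion_z_test(a=24, n=301, p0=0.063)
for name, value in result.items():
    print(name, value)
```

Because the script keeps p̂ = 24/301 unrounded, it gives Z of about 1.19 rather than the 1.21 obtained with p̂ rounded to 0.08; the decision (fail to reject H0) is the same either way.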
THANK YOU!!!!
