
Type I and Type II Errors, Power of the Test, and p-value

Shair Muhammad Hazara


MSPH, MSBE, BSN, Ped. N
E-mail address: hazara_27@hotmail.com
Learning Objectives
By the end of this session, students should be able to:
• Understand the test of hypothesis using the p-value method for testing a
population mean (large sample):
– Identify appropriate null and alternative hypotheses
– Select a level of significance
– Compute the value of the test statistic
– Compute the p-value for a Z-test (large sample)
– Report an appropriate conclusion based on the p-value
• Understand the two types of possible errors when conducting a
hypothesis test, and the power of the test.
Elements of a Test of Hypothesis

Rejection Region Approach                p-value Approach

- Null Hypothesis                        - Null Hypothesis
- Alternative Hypothesis                 - Alternative Hypothesis
- Choice of appropriate α                - Choice of appropriate α
- Assumptions                            - Assumptions
- Test Statistic                         - Test Statistic
- Identification of Critical Region      - Obtain p-value
- Conclusion                             - Conclusion
What is the difference between critical region and p-value approach?
• Critical region approach: In this approach, we use the value of α and the
researcher's hypothesis (Ha) to select the rejection region, and then compare
the value of the test statistic with that region to decide whether or not to
reject H0.

Test                      Alternative hypothesis    Rejection region
One-tailed, upper tail    Ha: μ > μ0                Z > Zα
One-tailed, lower tail    Ha: μ < μ0                Z < -Zα
Two-tailed                Ha: μ ≠ μ0                Z > Zα/2 or Z < -Zα/2
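
The three rejection regions above can be sketched in a few lines of Python. This is only an illustration: the function name `reject_h0` and its `tail` argument are made up for this example, and the critical values come from the standard normal distribution in Python's `statistics` module rather than from a printed Z table.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, tail="two"):
    """Critical region decision for a large-sample Z-test.

    tail: "upper" (Ha: mu > mu0), "lower" (Ha: mu < mu0),
          "two" (Ha: mu != mu0). Names are illustrative only.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)           # Z > Z_alpha
    if tail == "lower":
        return z < -std_normal.inv_cdf(1 - alpha)          # Z < -Z_alpha
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # |Z| > Z_alpha/2
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

# Illustrative Z values
print(reject_h0(-2.35, alpha=0.05, tail="two"))
print(reject_h0(1.3, alpha=0.05, tail="upper"))
```

With α = 0.05 the critical values are about 1.645 (one-tailed) and 1.96 (two-tailed), so the first call falls inside the rejection region and the second does not.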

• p-value approach: The other approach, and nowadays the most common one, is
to report the extent to which the test statistic disagrees with the null
hypothesis and compare it with the value of α to decide whether to reject the
null hypothesis. This measure of disagreement is called the p-value.
Steps for Calculating the p-value
• Choose the maximum level of α you are willing to tolerate.

• Determine the value of the test statistic Z from the sample data. Look up the
Z-statistic and find the corresponding probability:
– One-tailed test: the p-value = tail area beyond Z in the same direction as
the alternative hypothesis.
– Two-tailed test: the p-value = 2 times the tail area beyond the Z value in
the direction of the sign of Z.
• Reject the null hypothesis (significant) if the observed p-value is less than
α.
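
As a rough sketch, the steps above can be written as a small Python function. The name `p_value` and the `tail` labels are assumptions made for this illustration; the tail areas are taken from `statistics.NormalDist` instead of a Z table.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value for a large-sample Z-test (illustrative helper).

    tail: "upper" (Ha: mu > mu0), "lower" (Ha: mu < mu0),
          "two" (Ha: mu != mu0).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "upper":
        return 1 - std_normal.cdf(z)             # area above Z
    if tail == "lower":
        return std_normal.cdf(z)                 # area below Z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # twice the tail area beyond |Z|
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

alpha = 0.05
z = 1.96  # illustrative test statistic
p = p_value(z, tail="two")
print(f"p-value = {p:.4f}, reject H0: {p < alpha}")
```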
Example: Mean APTT among DVT patients
• A researcher assumes that the activated partial thromboplastin
time (APTT) in the population of patients diagnosed with deep
vein thrombosis (DVT) is approximately normally distributed
with a standard deviation of 7 seconds. A random sample of 30
hospitalized patients suffering from DVT had a mean APTT of
50 seconds. Use a 5 percent level of significance.
– Do the data provide sufficient evidence to conclude that the
mean APTT for DVT patients is different from 53 seconds?
• Let μ = mean APTT of all hospitalized DVT patients
(true/actual).
Example: Mean APTT among DVT patients (Contd.)
Hypothesis                Description
1. H0: μ = 53 seconds     The mean APTT of DVT patients is equal to 53
                          seconds.
   Ha: μ ≠ 53 seconds     The mean APTT of DVT patients is different from
                          53 seconds.
2. α = 0.05
3. Test Statistic:

n = 30; \bar{x} = 50; \mu_0 = 53; \sigma = 7

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{50 - 53}{7/\sqrt{30}} = \frac{-3}{1.28} = -2.35$$
Calculation of p-value?
Example: Mean APTT among DVT patients (Contd.)
How unlikely is a sample mean of 50 seconds if H0 is true?
We need to calculate the chance of observing this or a more extreme sample
mean (the p-value).

Calculation of p-value:
- The value of the test statistic is Z = -2.35
- Ignoring the sign, the area between Z = 0 and Z = 2.35 is 0.4906
- The area beyond Z = 2.35 is 0.5 - 0.4906 = 0.0094
- For a two-tailed hypothesis, the p-value is 2 x 0.0094 = 0.0188
Conclusion:
The p-value (0.0188) is less than α (0.05), so we reject the null
hypothesis, i.e. the mean APTT of DVT patients is different from 53
seconds.
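
The APTT example can be reproduced with a short Python sketch. The variable names are chosen for illustration, and the normal areas come from `statistics.NormalDist` rather than the Z table, so the unrounded p-value (about 0.019) differs slightly from the table-based 0.0188, which uses Z rounded to -2.35.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the APTT example
n = 30          # sample size
x_bar = 50.0    # sample mean APTT (seconds)
mu_0 = 53.0     # hypothesized population mean (seconds)
sigma = 7.0     # assumed population standard deviation (seconds)
alpha = 0.05    # level of significance

# Test statistic: Z = (x_bar - mu_0) / (sigma / sqrt(n))
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed p-value: twice the tail area beyond |Z|
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Z = {z:.2f}")
print(f"p-value = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```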
Example: Frequent users of Narcotics (p-value)
Research claim: frequent users of narcotics have a higher mean anger
expression score than nonusers of narcotics (40).

Hypothesis        Description
1. H0: μ ≤ 40     The mean anger expression score is less than or
                  equal to that of nonusers of narcotics (40).
   Ha: μ > 40     The mean anger expression score is higher than
                  that of nonusers of narcotics (40).
2. α = 0.05
3. Test Statistic: Z = 3.2
Example: Frequent users of Narcotics (p-value)

4. P-value:
- The test statistic value is 3.2
- Ignoring the sign, the area between Z = 0 and Z = 3.2 is 0.4993
- The p-value is the area beyond Z = 3.2, which is
0.5 - 0.4993 = 0.0007
5. Conclusion:
The p-value (0.0007) is less than α (0.05), so we reject H0
(significant), i.e. the mean anger expression score is higher
than that of nonusers of narcotics (40).
Graphical Presentation of p-value
Example: Frequent users of Narcotics (p-value) (contd.)
• Research claim: frequent users of narcotics have a higher mean anger
expression score than nonusers of narcotics (40).

Hypothesis        Description
1. H0: μ ≤ 40     The mean anger expression score is less than or
                  equal to that of nonusers of narcotics (40).
   Ha: μ > 40     The mean anger expression score is higher than
                  that of nonusers of narcotics (40).
2. α = 0.05
3. Test Statistic: Z = 1.3


Example: Frequent users of Narcotics (p-value) (contd.)

4. P-value:
- The test statistic value is 1.3
- Ignoring the sign, the area between Z = 0 and Z = 1.3 is 0.4032
- The p-value is the area above Z = 1.3, which is
0.5 - 0.4032 = 0.0968
5. Conclusion:
The p-value (0.0968) is greater than α (0.05), so we fail to reject
H0 (non-significant), i.e. the data do not provide sufficient
evidence that the mean anger expression score is higher than that
of nonusers of narcotics (40).
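
The two versions of the narcotics example differ only in the test statistic (Z = 3.2 versus Z = 1.3). A minimal sketch that places the two upper-tail p-values side by side is given below; the helper name `upper_tail_p_value` is made up for this illustration.

```python
from statistics import NormalDist

ALPHA = 0.05  # level of significance used in both versions of the example

def upper_tail_p_value(z):
    """Area above z under the standard normal curve (Ha: mu > mu0)."""
    return 1 - NormalDist().cdf(z)

for z in (3.2, 1.3):
    p = upper_tail_p_value(z)
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"Z = {z}: p-value = {p:.4f} -> {decision}")
```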
Errors Involved in Hypothesis Testing

• Type I Error (rejection error or alpha (α) error): It is the
decision to reject H0 when in fact H0 is true.

• Type II Error (non-rejection error or beta (β) error): It is
the decision not to reject H0 when in fact H0 is false.
                       Reject H0               Do not reject H0
                       (conclude μ≠53)         (conclude μ=53)

H0 is true (μ=53)      Type I error            Non-significant result
                                               (correct decision)

H0 is false (μ≠53)     Significant result      Type II error
                       (correct decision)
Probabilities of Errors Involved in Hypothesis Testing
and Power of the test

• Pr(Type I Error) or α or level of significance: It is the risk of
rejecting H0 when in fact H0 is true. The most common values are 0.05 and
0.01.

• Pr(Type II Error) or β: It is the risk of not rejecting H0 when in fact
H0 is false.

• Power = 1 - Pr(Type II Error) or (1 - β): It is the probability of
rejecting H0 when in fact H0 is false. The most common values are 0.80 and
0.90.
                       Reject H0               Do not reject H0
                       (conclude μ≠53)         (conclude μ=53)

H0 is true (μ=53)      Pr(Type I error) = α    Confidence level = (1 - α)

H0 is false (μ≠53)     Power = (1 - β)         Pr(Type II error) = β
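
One way to see that Pr(Type I error) equals α is a small simulation: draw many samples from a population in which H0 is true and count how often the two-tailed Z-test rejects. The sketch below borrows μ0 = 53, σ = 7 and n = 30 from the APTT example; the number of trials, the seed, and the function name are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist

def type_i_error_rate(mu0=53.0, sigma=7.0, n=30, alpha=0.05,
                      trials=10_000, seed=0):
    """Share of samples, drawn with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = sigma / n ** 0.5                         # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Sample of size n from a population where the true mean equals mu0
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / se
        if abs(z) > z_crit:
            rejections += 1                       # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Simulated Pr(Type I error) = {rate:.3f}")
```

The simulated proportion should land close to 0.05, with some run-to-run variation that shrinks as the number of trials grows.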
Example: Mean APTT among DVT patients
• A researcher assumes that the activated partial thromboplastin
time (APTT) in the population of patients diagnosed with deep
vein thrombosis (DVT) is approximately normally distributed
with a standard deviation of 7 seconds. A random sample of
30 hospitalized patients suffering from DVT had a mean
APTT of 50 seconds. Use a 5 percent level of significance.
– State the Type I error, the Type II error and the power of the
test for the hypothesis that the mean APTT for DVT patients is
different from 53 seconds.
• Let μ = mean APTT of all hospitalized DVT patients
(true/actual).
Example: Mean APTT among DVT patients (Contd.)

Hypothesis                Description
1. H0: μ = 53 seconds     The mean APTT of DVT patients is
                          equal to 53 seconds.
   Ha: μ ≠ 53 seconds     The mean APTT of DVT patients is
                          different from 53 seconds.
Example: Mean APTT among DVT patients (Contd.)
Errors Involved in Hypothesis Testing

• Type I Error: Rejecting H0 when it is true.
(Claiming that the mean APTT of DVT patients is different
from 53 seconds, when in fact it is equal to 53 seconds.)

• Type II Error: Not rejecting H0 when it is false.
(Claiming that the mean APTT of DVT patients is equal to
53 seconds, when in fact it is different from 53 seconds.)
Example: Mean APTT among DVT patients (Contd.)
Probability of Errors Involved in Hypothesis Testing

• Pr(Type I Error) or α or level of significance:
It is the probability of claiming that the mean APTT of DVT patients is
different from 53 seconds, when in fact it is equal to 53 seconds.

• Pr(Type II Error) or β:
It is the probability of claiming that the mean APTT of DVT patients is
equal to 53 seconds, when in fact it is different from 53 seconds.

• Power or (1 - β):
It is the probability of correctly concluding that the mean APTT of DVT
patients is different from 53 seconds, when in fact it is different from 53
seconds.
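
Power and β can only be computed once a specific true mean is assumed. The sketch below takes n = 30, σ = 7, μ0 = 53 and α = 0.05 from the example and, purely as a hypothetical, supposes that the true mean is 50 seconds. That value is an assumption made for illustration and is not given in the example.

```python
from statistics import NormalDist

def power_two_sided_z(mu_true, mu0=53.0, sigma=7.0, n=30, alpha=0.05):
    """Power of the two-tailed Z-test if the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # When the true mean is mu_true, Z is normal with this mean and sd 1
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # Pr(Z < -z_crit) + Pr(Z > z_crit) under the shifted distribution
    return std_normal.cdf(-z_crit - shift) + 1 - std_normal.cdf(z_crit - shift)

power = power_two_sided_z(mu_true=50.0)  # hypothetical true mean (assumption)
beta = 1 - power                         # Pr(Type II error) under that assumption
print(f"Power = {power:.2f}, beta = {beta:.2f}")
```

Under this assumption the power comes out at roughly 0.65, below the commonly targeted 0.80. If the true mean is set equal to 53, the function returns α, which matches the definition of the Type I error probability.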
What if your conclusion about test of hypothesis is non-significant?

• If the conclusion of a test of hypothesis is failing to reject the null
hypothesis, or non-significant (i.e. p > 0.05), then either:
• there is no effect, i.e. H0 is true, or
• we have made a Type II error.

• Note: A Type II error may be due to one or more of
the following:
a) Use of too small a sample size
b) High variation between observations
c) Use of an inappropriate test of hypothesis
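
Reasons (a) and (b) can be illustrated numerically. The sketch below reuses the two-tailed Z-test setting of the APTT example (μ0 = 53, α = 0.05) and, as a hypothetical, assumes a true mean of 50 seconds. The sample sizes and standard deviations in the loops are illustrative values, not data from the example.

```python
from statistics import NormalDist

def power_two_sided_z(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of the two-tailed Z-test if the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return std_normal.cdf(-z_crit - shift) + 1 - std_normal.cdf(z_crit - shift)

MU_TRUE, MU_0 = 50.0, 53.0  # MU_TRUE is a hypothetical true mean (assumption)

# (a) Smaller samples give lower power, i.e. a higher Type II error risk
for n in (10, 30, 100):
    print(f"n = {n:3d}, sigma = 7: power = "
          f"{power_two_sided_z(MU_TRUE, MU_0, 7.0, n):.2f}")

# (b) Higher variation between observations also lowers power
for sigma in (5.0, 7.0, 10.0):
    print(f"n =  30, sigma = {sigma:g}: power = "
          f"{power_two_sided_z(MU_TRUE, MU_0, sigma, 30):.2f}")
```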
