
More Chapter 21 (Error and Power)

Meaning of the Terms Fail to Reject H0 and Reject H0


Term                 Meaning
Fail to reject H0    There is not enough evidence in the data (and the test being used) to
                     justify a rejection of H0. This means that we retain H0 with the
                     understanding that we have not proved it to be true beyond all doubt.
Reject H0            There is enough evidence in the data (and the test employed) to justify
                     rejection of H0. This means that we choose the alternative hypothesis Ha,
                     with the understanding that we have not proved Ha to be true beyond all
                     doubt.

Probabilities Associated with a Statistical Test


                 Our Decision
Truth of H0      And if we accept H0 as true            And if we reject H0 as false
H0 is true       Correct decision, with corresponding   Type I error, with corresponding
                 probability 1 − α                      probability α, called the level of
                                                        significance of the test
H0 is false      Type II error, with corresponding      Correct decision, with corresponding
                 probability β                          probability 1 − β, called the power
                                                        of the test
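
The four probabilities in this table can be checked by simulation. The following is a minimal Python sketch, not part of the original notes. It assumes a one-sided z-test of H0: mu = 0 against Ha: mu > 0 with a known standard deviation of 1, a sample size of 25, a 5% level of significance, and a hypothetical true mean of 0.5 when H0 is false. All of these values and all function names are illustrative assumptions.

```python
import random
from statistics import NormalDist

ALPHA = 0.05      # assumed level of significance
N = 25            # assumed sample size
SIGMA = 1.0       # assumed known population standard deviation
MU_ALT = 0.5      # hypothetical true mean when H0 is false

# One-sided critical value: reject H0 (mu = 0) when z exceeds this.
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA)


def rejects_h0(true_mu, rng):
    """Draw one sample of size N and report whether the z-test rejects H0."""
    sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
    z = (sum(sample) / N) / (SIGMA / N ** 0.5)
    return z > Z_CRIT


def rejection_rate(true_mu, trials=20000, seed=1):
    """Estimate the fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    return sum(rejects_h0(true_mu, rng) for _ in range(trials)) / trials


type_i_rate = rejection_rate(0.0)     # H0 true: estimates alpha
power = rejection_rate(MU_ALT)        # H0 false: estimates the power
type_ii_rate = 1 - power              # H0 false: estimates the Type II rate

print(f"Type I error rate  ~ {type_i_rate:.3f}")
print(f"Type II error rate ~ {type_ii_rate:.3f}")
print(f"Power              ~ {power:.3f}")
```

When H0 is true, the rejection rate should land close to the chosen level of significance. When H0 is false, the rejection rate estimates the power, and its complement estimates the probability of a Type II error.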

Example 1: For each of the following, describe the Type I and Type II errors:

Discuss which error you think is more serious as well.

(a) Criminal trial.


H0: Not guilty
Ha: Guilty

Type I: The defendant is not guilty but found guilty.

Type II: The defendant is guilty but found not guilty.

(b) Going to a doctor when you feel sick.


H0: You are well.
Ha: You are sick.

Type I: You are well but diagnosed sick.

Type II: You are sick but diagnosed well.


(c) FDA: Approval of a New Drug

H0: Drug has no effect.
Ha: Drug is effective.

Type I: Drug is not effective but found effective.

Type II: Drug is effective but found not effective.

Example 2: p.501 #16


Spam filters try to sort your e-mails, deciding which are real messages and which are unwanted. One
method used is a point system. The filter reads each incoming e-mail and assigns points to the sender,
the subject, key words in the message, and so on. The higher the point total, the more likely it is that
the message is unwanted. The filter has a cutoff value for the point total; any message rated lower than
that cutoff passes through to your inbox, and the rest, suspected to be spam, are diverted to the junk
mailbox.

We can think of the filter's decision as a hypothesis test. The null hypothesis is that the e-mail is a real
message and should go to your inbox. A higher point total provides evidence that the message may be
spam; when there's sufficient evidence, the filter rejects the null, classifying the message as junk. This
usually works pretty well, but, of course, sometimes the filter makes a mistake.

(a) When the filter allows spam to slip through into your inbox, which kind of error is that?

H0: The email is not junk.
Ha: The email is junk.

Type II; the email is junk, but it is sent to the inbox anyway.

(b) Which kind of error is it when a real message gets classified as junk?

Type I; the email is not junk, but it is sent to the junk mailbox anyway.

(c) Some filters allow the user (that's you) to adjust the cutoff. Suppose your filter has a default
cutoff of 50 points, but you reset it to 60. Is that analogous to choosing a higher or lower value of
α for a hypothesis test? Explain.

Raising the cutoff is analogous to lowering the alpha level. It takes more evidence (a higher point
total) to reject the null and classify the email as junk.

(d) What impact does this change in the cutoff value have on the chance of each type of error?

The probability of a Type I error decreases (lower alpha) and the probability of a Type II error increases.
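
The trade-off in parts (c) and (d) can be sketched numerically. The exercise gives no point-total distributions, so the Python sketch below assumes hypothetical ones: real messages score roughly Normal(30, 12) and spam scores roughly Normal(70, 15). Those distributions and the function name `error_rates` are illustrative assumptions only.

```python
from statistics import NormalDist

# Hypothetical point-total distributions (assumptions, not from the exercise).
REAL_SCORES = NormalDist(mu=30, sigma=12)   # genuine messages
SPAM_SCORES = NormalDist(mu=70, sigma=15)   # spam messages


def error_rates(cutoff):
    """Return (P(Type I), P(Type II)) for a filter that flags scores >= cutoff."""
    type_i = 1 - REAL_SCORES.cdf(cutoff)   # real message sent to junk
    type_ii = SPAM_SCORES.cdf(cutoff)      # spam allowed into the inbox
    return type_i, type_ii


for cutoff in (50, 60):
    type_i, type_ii = error_rates(cutoff)
    print(f"cutoff {cutoff}: P(Type I) = {type_i:.3f}, P(Type II) = {type_ii:.3f}")
```

Under these assumed distributions, moving the cutoff from 50 to 60 lowers the Type I rate and raises the Type II rate, which matches the answers above.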

Example 3: p.501 #18
Consider again the points-based spam filter described in Exercise 16. When the points assigned to
various components of an e-mail exceed the cutoff value you've set, the filter rejects its null hypothesis
(that the message is real) and diverts that e-mail to a junk mailbox.

H0: The email is not junk.
Ha: The email is junk.

(a) In this context, what is meant by the power of the test?

The power of the test is the ability of the filter to detect spam. It is the probability that the test will
correctly send an email to the junk mailbox WHEN the email is indeed spam.

(b) What could you do to increase the filter's power?

Lower the cutoff score (this would be analogous to increasing the alpha level).

(c) What's the disadvantage of doing that?

If the cutoff score is lowered, the risk of a Type I error increases, so a larger number of good emails
will end up in the junk mailbox.

Example 4: p.502 #22
Production managers on an assembly line must monitor the output to be sure that the level of defective
products remains small. They periodically inspect a random sample of the items produced. If they find
a significant increase in the proportion of items that must be rejected, they will halt the assembly
process until the problem can be identified and repaired.

(a) In this context, what is a Type I error?

H0: The assembly process is working fine.
Ha: The assembly process is producing defective items.

The process is actually working fine, but the managers decide it is not and halt the assembly process
unnecessarily.

(b) In this context, what is a Type II error?

The truth is the assembly process is producing defective items but managers determine it is working
fine.

(c) Which type of error would the factory owner consider more serious?

Type II; defects caught in the factory are generally cheaper to correct than defects found after the
point of sale.

(d) Which type of error might customers consider more serious?

Type II; customers don't want to purchase defective items.

Note: the power of the test in this scenario is the probability that the test will, in light of some true
alternative, correctly reject the null hypothesis. In context, it is the probability that the managers
correctly detect the problem, given that the factory really is producing defective items.

Example 5: p.502 #26
Highway safety engineers test new road signs, hoping that increased reflectivity will make them more
visible to drivers. Volunteers drive through a test course with several of the new- and old-style signs
and rate which kind shows up the best.

(a) Is this a one-tailed or a two-tailed test? Why?

H0: Reflective signs are not more visible.
Ha: Reflective signs are more visible.

One-tailed; we want to test whether the signs are MORE visible, not merely whether the visibility is
different.

(b) In this context, what would a Type I error be?

The truth is reflective signs are not more visible but we decide they are.

(c) In this context, what would a Type II error be?

The truth is reflective signs are more visible but we decide they are not.

(d) In this context, what is meant by the power of the test?

With respect to some true alternative, the power of the test is the probability that the test will correctly
reject the null and determine that reflective signs are more visible.

(e) If the hypothesis is tested at the 1% level of significance instead of 5%, how will this affect the
power of the test?

If the alpha level decreases from 5% to 1%, so does the probability of a Type I error. With the sample
size unchanged, it now takes stronger evidence to reject the null, so the probability of a Type II error
increases. Since the power is the complement of the probability of a Type II error, the power decreases.

(f) The engineers hoped to base their decision on the reactions of 50 drivers, but time and budget
constraints may force them to cut back to 20. How would this affect the power of the test?
Explain.

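Larger sample sizes will result in more power. Decreasing the sample size from 50 to 20 will decrease
the power of the test: with fewer drivers, the results vary more from sample to sample, so a real
improvement in visibility is harder to detect.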
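
The answers to parts (e) and (f) can be illustrated with a rough power calculation. The exercise does not specify a test statistic, so this Python sketch assumes a one-sided z-test for a proportion: H0: p = 0.5 (drivers show no preference for the new signs) against Ha: p > 0.5, with a hypothetical true proportion of 0.7. The null value, the true proportion, the normal approximation, and the function name `power` are all assumptions made for illustration.

```python
from statistics import NormalDist

P0 = 0.5        # assumed null value: no preference for the new signs
P_TRUE = 0.7    # hypothetical true proportion preferring the new signs


def power(alpha, n, p0=P0, p_true=P_TRUE):
    """Approximate power of a one-sided z-test for a proportion (Ha: p > p0)."""
    z = NormalDist()
    # Smallest sample proportion that leads to rejecting H0 at level alpha.
    critical = p0 + z.inv_cdf(1 - alpha) * (p0 * (1 - p0) / n) ** 0.5
    # Probability the sample proportion exceeds that value when p = p_true.
    se_true = (p_true * (1 - p_true) / n) ** 0.5
    return 1 - z.cdf((critical - p_true) / se_true)


for alpha, n in [(0.05, 50), (0.01, 50), (0.05, 20)]:
    print(f"alpha = {alpha:.2f}, n = {n}: power ~ {power(alpha, n):.3f}")
```

Under these assumptions, the power is highest with 50 drivers at the 5% level and drops both when the level is tightened to 1% and when the sample is cut to 20 drivers.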
