Introduction to Hypothesis Testing

Sometimes the inference one wishes to make takes the form of a choice between two hypotheses about the nature of reality: H0, called the null hypothesis, and H1, called the alternative hypothesis. One wishes to develop a statistical decision rule that makes this choice based on the available data. Any decision rule, statistical or otherwise, is subject to two sorts of errors:

Definition. Rejecting the null hypothesis when the null hypothesis is true is an error of Type I. Accepting the null hypothesis when the null hypothesis is false is an error of Type II.

A classic illustration is the case of a person charged with a criminal offense. Here the legal system must choose between the pair of hypotheses:

H0: The defendant is innocent.
H1: The defendant is guilty.

What are the Type I and Type II errors in this instance?

Mathematics can design a test with a specified probability of a Type I error and a specified probability of a Type II error; deciding on the relative gravity of the two errors is not a mathematical issue. To make this point, consider the pair of hypotheses displayed above. In our society the more serious error is an error of Type I. The feeling is: Better to let a thousand guilty men go free than punish a single innocent person. This is not a universal feeling, and there are societies for which a Type II error is the more serious error: Better to murder an entire village than to miss punishing one guilty person.

In the following discussion we consider how to design a test of hypothesis for a pair of hypotheses concerning the value of a population mean. The goal is a test that meets the specifications on Type I and Type II error with as small a sample size as possible.

An Application: Consider the testing of a new drug designed to extend the time between periods of severe respiratory difficulty for subjects suffering from asthma. This time interval depends on many variables relating to heredity and environment, so it is appropriate to view it as a random variable. Clinical studies suggest that, for the population of asthmatics who are candidates for the new drug, the time between asthmatic events is normally distributed with a mean of 60 hours and a standard deviation of about 10 hours. The makers of the new drug claim that the mean time between events for those using the new medication is longer than for those not using it, with about the same standard deviation as before, 10 hours. Always skeptical, the FDA seeks to test the pair of hypotheses:

$$H_0: \mu \le 60 \qquad\qquad H_1: \mu > 60$$

where $\mu$ is the mean time between events for patients using the new drug.

In other words, the null hypothesis is that the new drug is, at best, useless; the alternative hypothesis is that the new drug actually provides some relief to asthma sufferers.

There is an intuitive test for this pair of hypotheses: observe T, the time between episodes of respiratory difficulty, for n asthmatics who are given the new medication; then compute the sample average, $\bar{T}$. If $\bar{T} > c$ for some critical value c, then H0 is rejected in favor of H1; otherwise H0 is accepted.
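The decision rule can be written as a short Python function. This is a minimal sketch rather than part of the original text: the name reject_null is hypothetical, and the critical value c is passed in as a parameter because it has not yet been determined at this point in the discussion.

```python
from statistics import fmean


def reject_null(times: list[float], c: float) -> bool:
    """Hypothetical helper: reject H0 (mu <= 60) in favor of H1 (mu > 60)
    exactly when the sample average of the observed times exceeds c."""
    if not times:
        raise ValueError("need at least one observed time interval")
    return fmean(times) > c
```

For example, with an illustrative c = 60.34, the times [61.0, 59.0, 62.0] (average about 60.67) lead to rejecting H0, while [60.0, 60.5, 60.0] (average about 60.17) do not.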

Recall from our previous discussion of hypothesis testing that there are two kinds of errors: a Type I error, which means rejecting H0 when it is true, and a Type II error, which means accepting H0 when it is false. In the context of this application, a Type I error means permitting the sale of a drug that is either useless or actually harmful; a Type II error means that a drug that may give some relief to asthma sufferers will not be available. Let's say the FDA specifies that if the true mean is 58 hours, then the probability of a Type I error should be 0.01; in other words, the probability of accepting H0 should be 0.99. But if the true mean is 62 hours, then the probability of a Type II error should be 0.05; in other words, the probability of accepting H0 should be 0.05. These two specifications suffice to determine both the critical value and the sample size.

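The specification on the probability of committing a Type I error is that, when $\mu = 58$,

$$P\left(\bar{T} > c\right) = 0.01, \qquad\text{that is,}\qquad P\left(\frac{\bar{T} - 58}{10/\sqrt{n}} > \frac{c - 58}{10/\sqrt{n}}\right) = 0.01.$$

Since $T \sim N(\mu, \sigma^2)$ and the n observations are independent, the sample average satisfies $\bar{T} \sim N(\mu, \sigma^2/n)$. So when $\mu = 58$ and $\sigma = 10$, the expression on the left side of the inequality is standard normal; hence, using standard normal tables,

$$\frac{c - 58}{10/\sqrt{n}} = z_{0.01} \approx 2.326,$$

where $z_{0.01}$ is the value that a standard normal variable exceeds with probability 0.01. Solving for c, the critical value, in terms of n, the sample size, gives:

$$c = 58 + \frac{23.26}{\sqrt{n}} \qquad (*)$$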
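As a numerical check on (*), the Python sketch below replaces the printed normal table with the standard library's statistics.NormalDist. It assumes, as the text does, independent normal observations with $\sigma = 10$; the names critical_value_type1 and type1_error are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Type I specification from the text
MU_0 = 58.0    # true mean at which the Type I error is specified (hours)
SIGMA = 10.0   # population standard deviation (hours)
ALPHA = 0.01   # required probability of rejecting H0 when mu = MU_0


def critical_value_type1(n: int) -> float:
    """Equation (*): the c for which P(sample mean > c) = ALPHA when mu = MU_0."""
    z_alpha = NormalDist().inv_cdf(1 - ALPHA)  # upper 1% point, about 2.326
    return MU_0 + z_alpha * SIGMA / sqrt(n)


def type1_error(c: float, n: int) -> float:
    """P(sample mean > c) when the sample mean is N(MU_0, SIGMA**2 / n)."""
    return 1 - NormalDist(MU_0, SIGMA / sqrt(n)).cdf(c)


c = critical_value_type1(99)
print(f"c = {c:.3f}, Type I error = {type1_error(c, 99):.4f}")
# prints: c = 60.338, Type I error = 0.0100
```

Because the standard error $10/\sqrt{n}$ shrinks as n grows, the critical value from (*) moves down toward 58 for larger samples.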

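The specification on the probability of committing a Type II error is that, when $\mu = 62$,

$$P\left(\bar{T} \le c\right) = 0.05, \qquad\text{that is,}\qquad P\left(\frac{\bar{T} - 62}{10/\sqrt{n}} \le \frac{c - 62}{10/\sqrt{n}}\right) = 0.05.$$

The expression on the left side of the inequality is again standard normal, so

$$\frac{c - 62}{10/\sqrt{n}} = -z_{0.05} \approx -1.645,$$

where $z_{0.05}$ is the value that a standard normal variable exceeds with probability 0.05. Solving for c, the critical value, in terms of n, the sample size, gives:

$$c = 62 - \frac{16.45}{\sqrt{n}} \qquad (**)$$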
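The same kind of check works for (**). Again this is a sketch under the text's normality and independence assumptions, using statistics.NormalDist; the names critical_value_type2 and type2_error are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Type II specification from the text
MU_1 = 62.0    # true mean at which the Type II error is specified (hours)
SIGMA = 10.0   # population standard deviation (hours)
BETA = 0.05    # required probability of accepting H0 when mu = MU_1


def critical_value_type2(n: int) -> float:
    """Equation (**): the c for which P(sample mean <= c) = BETA when mu = MU_1."""
    z_beta = NormalDist().inv_cdf(1 - BETA)  # upper 5% point, about 1.645
    return MU_1 - z_beta * SIGMA / sqrt(n)


def type2_error(c: float, n: int) -> float:
    """P(sample mean <= c), i.e. P(accept H0), when the sample mean is N(MU_1, SIGMA**2 / n)."""
    return NormalDist(MU_1, SIGMA / sqrt(n)).cdf(c)


c = critical_value_type2(99)
print(f"c = {c:.3f}, Type II error = {type2_error(c, 99):.4f}")
# prints: c = 60.347, Type II error = 0.0500
```

Evaluating type2_error(60.34, 99) gives about 0.049, so a critical value of 60.34 keeps the Type II error just inside its 0.05 specification.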

Equations (*) and (**) are that mathematical object so dear to the heart of high school students everywhere: two equations in two unknowns! Setting the two expressions for c equal, $58 + 23.26/\sqrt{n} = 62 - 16.45/\sqrt{n}$, gives $\sqrt{n} = 39.71/4 \approx 9.93$, so $n \approx 98.6$. (Recall that n is the sample size, which means n must be an integer, and it's better to round up than round down: rounding down would leave one of the two error probabilities slightly above its specified value.) So n = 99. Substituting this value of n into the expression for c obtained earlier, (*), we find c = 60.34. (The values of c obtained by substitution into (*) and (**), 60.34 and 60.35, are slightly different merely because n was rounded. Do NOT round the value of c: small variations in the sample average are significant!)
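Solving the two equations by hand generalizes directly: equating $\mu_0 + z_\alpha \sigma/\sqrt{n}$ and $\mu_1 - z_\beta \sigma/\sqrt{n}$ gives $\sqrt{n} = (z_\alpha + z_\beta)\,\sigma/(\mu_1 - \mu_0)$. The Python sketch below packages that calculation; the function name design_test and its parameter names are hypothetical, and it assumes the same one-sided, known-$\sigma$ normal setting as the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def design_test(mu0: float, mu1: float, sigma: float,
                alpha: float, beta: float) -> tuple[int, float]:
    """Solve (*) and (**) together for the sample size n and critical value c.

    (*)  c = mu0 + z_alpha * sigma / sqrt(n)
    (**) c = mu1 - z_beta  * sigma / sqrt(n)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha point
    z_beta = NormalDist().inv_cdf(1 - beta)    # upper beta point
    n_exact = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
    n = ceil(n_exact)                          # round the sample size up, never down
    c = mu0 + z_alpha * sigma / sqrt(n)        # substitute n back into (*)
    return n, c


n, c = design_test(mu0=58, mu1=62, sigma=10, alpha=0.01, beta=0.05)
print(n, round(c, 2))  # prints: 99 60.34
```

Substituting into (*) rather than (**) holds the Type I error at exactly 0.01 and lets the rounding of n show up as a Type II error slightly below 0.05, which matches the choice made in the text.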

This completes the design of the test. It remains to actually conduct the test, i.e., to observe 99 time intervals between attacks for users of the new medication and see whether or not the sample average of those 99 times exceeds the critical value.
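One way to see what the finished design promises is to simulate it. The Python sketch below is an illustration under the text's assumptions, not part of the original method: it repeatedly draws 99 independent times from $N(\mu, 10^2)$, applies the rule "reject H0 if the sample average exceeds 60.34", and estimates how often each kind of error occurs. The function name rejection_rate, the number of trials, and the seed are arbitrary choices.

```python
import random
from statistics import fmean

N, C = 99, 60.34   # sample size and critical value from the design above
SIGMA = 10.0       # population standard deviation (hours)


def rejection_rate(mu: float, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated studies that reject H0 when the true mean is mu.

    Each study draws N independent times from N(mu, SIGMA**2) and rejects H0
    when their sample average exceeds C.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, SIGMA) for _ in range(N)]
        if fmean(sample) > C:
            rejections += 1
    return rejections / trials


type1 = rejection_rate(58.0)       # rejecting H0 is a Type I error here
type2 = 1 - rejection_rate(62.0)   # accepting H0 is a Type II error here
print(f"estimated Type I error at mu = 58:  {type1:.4f}")
print(f"estimated Type II error at mu = 62: {type2:.4f}")
```

The estimates should land near 0.01 and 0.05. The exact Type II rate for this design is about 0.049, slightly below 0.05 because n was rounded up, though simulation noise can put an individual estimate on either side.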

