
Hypothesis Testing

Methods for making inferences about population parameters fall into one of two categories. Either we will
estimate the value of the population parameter of interest or we will test a hypothesis about the value of the
parameter.
With confidence interval estimates there was no supposition about the actual value of the parameter prior to
collecting the data. In hypothesis testing, there is a preconceived idea about the value of the population
parameter.
For example, in studying the antipsychotic properties of an experimental compound, we might ask whether the
average shock-avoidance response of rats treated with a specific dose of the compound is greater than 60, $\mu > 60$,
the value that has been observed after extensive testing using a suitable standard drug. Thus, there are two
hypotheses involved in a statistical study. The first is the hypothesis being proposed by the person conducting
the study, called the research hypothesis, $\mu > 60$ in our example. The second is the negation of the research
hypothesis, called the null hypothesis, $\mu \le 60$ in our example. The goal of the study is to decide whether the
sample data tend to support the research hypothesis.
The fundamental idea behind hypothesis testing procedures is this: We reject the null hypothesis (H0) if the
observed sample is very unlikely to have occurred when H0 is true.
We begin by assuming that the null hypothesis is correct. The sample is then examined in light of this
assumption. If the observed sample would not be unusual when H0 is true, then chance variation from one
sample to another is a plausible explanation for what has been observed, and H0 is not rejected. On the other
hand, if the observed sample would have been quite unlikely were H0 true, we take the sample as convincing
evidence against the null hypothesis and reject H0. We base a decision to reject or to fail to reject the null
hypothesis on an assessment of how extreme or unlikely the observed sample is when H0 is true.
A statistical hypothesis test is only capable of demonstrating strong support for the alternative (research)
hypothesis (by rejection of the null hypothesis). When the null hypothesis is not rejected, it does not mean
strong support for H0, only lack of strong evidence against it.

Hypothesis Test for a Population Mean


Using a Large Sample (n>30)
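H0: $\mu = \mu_0$

Ha:
1. $\mu > \mu_0$
2. $\mu < \mu_0$
3. $\mu \ne \mu_0$

Test Statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Rejection Region: For a probability $\alpha$ of a Type-I error, we can reject H0 if
1. $z \ge z_{\alpha}$
2. $z \le -z_{\alpha}$
3. $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

If $\sigma$ (the population standard deviation) is unknown, s may be used instead as an approximation.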

Example The average (mean) live weight of a farmer's steers prior to slaughter was 380 pounds in past years.
This year his 50 steers were fed on a new diet. Suppose we consider these 50 steers on the new diet as a random
sample taken from a population of all possible steers that may be fed the diet now or in the future. Use the
sample data given below and $\alpha = .01$ to test the research hypothesis that the mean live weight for steers on the
new diet is greater than 380.
$\bar{x} = 390$, $s = 35.2$
$\mu$ = population mean live weight of steers fed on the new diet
H0: $\mu = 380$
Ha: $\mu > 380$
Significance Level: $\alpha$ = P(Type-I Error) = .01
Test Statistic: $z = \dfrac{\bar{x} - 380}{s/\sqrt{n}}$

Rejection Region: For $\alpha = .01$, we reject H0 if $z \ge z_{\alpha}$, where $z_{\alpha} = 2.33$

Calculations: $z = \dfrac{390 - 380}{35.2/\sqrt{50}} = 2.01$

Conclusion: Using $\alpha = .01$, we fail to reject H0, because 2.01 is less than the critical value 2.33. There is
not sufficient evidence to conclude that the mean live weight for steers on the new diet is greater than 380.
Example WSU uses thousands of fluorescent light bulbs each year. The brand of bulb it currently uses has a
mean life of 900 hours. A manufacturer claims that its new brand of bulbs, which cost the same as the brand the
university currently uses, has a mean life of more than 900 hours. The university has decided to purchase the
new brand if, when tested, the test evidence supports the manufacturer's claim at $\alpha = .05$. Suppose sixty-four
bulbs were tested with the following results:

$\bar{x} = 930$ hours

$s = 80$ hours

Will WSU purchase the new brand of fluorescent bulbs? Conduct a hypothesis test.
$\mu$ = population mean life for the new brand of bulbs
H0: $\mu = 900$
Ha: $\mu > 900$ (the mean life for the new brand of bulbs is higher than the mean life for the old brand)
Significance Level: $\alpha = .05$

Test Statistic: $z = \dfrac{\bar{x} - 900}{s/\sqrt{n}}$

Rejection Region: For $\alpha = .05$, we reject H0 if $z \ge z_{\alpha}$, where $z_{\alpha} = 1.645$

Calculations: $z = \dfrac{930 - 900}{80/\sqrt{64}} = 3.00$
Conclusion: Using $\alpha = .05$, we reject H0, because 3.00 exceeds the critical value 1.645. There is sufficient
evidence to conclude that the mean life for the new brand of bulbs is greater than 900 hours.
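The two worked examples above follow the same steps, so they can be sketched as one small routine. This is only an illustration: the function name `z_test` and its `alternative` argument are assumptions made here, not part of the handout, and the critical value comes from Python's standard-library `statistics.NormalDist` rather than from a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, s, n, alpha, alternative="greater"):
    """Large-sample z test for a mean (hypothetical helper).

    s is used in place of sigma, as the handout allows for n > 30.
    Returns (z, critical value, reject H0?).
    """
    z = (xbar - mu0) / (s / sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if alternative == "greater":      # Ha: mu > mu0
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z >= crit
    elif alternative == "less":       # Ha: mu < mu0
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z <= -crit
    elif alternative == "two-sided":  # Ha: mu != mu0
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) >= crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, crit, reject


# Values taken from the steer and light bulb examples.
steers = z_test(xbar=390, mu0=380, s=35.2, n=50, alpha=0.01)
bulbs = z_test(xbar=930, mu0=900, s=80, n=64, alpha=0.05)

for name, (z, crit, reject) in [("steers", steers), ("bulbs", bulbs)]:
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{name}: z = {z:.2f}, critical value = {crit:.3f}, {decision}")
```

The decisions agree with the hand calculations: the steer data fail to reject H0 at $\alpha = .01$, while the bulb data reject H0 at $\alpha = .05$.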

P-value Approach to Hypothesis Testing


The p-value associated with the observed value of the test statistic is the probability of getting the observed
value or a value more extreme (in the direction of Ha), assuming H0 is true.
To draw conclusions using the p-value, compare the p-value with $\alpha$ and apply the
following rule.
If the p-value $\le \alpha$, reject H0 and conclude that Ha is true.
If the p-value $> \alpha$, fail to reject H0.
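As a rough illustration of the rule, the p-values for the steer and light bulb examples can be computed from the standard normal distribution. The helper name `p_value` and its `alternative` argument are assumptions introduced for this sketch; the handout itself does not report p-values for these examples.

```python
from math import sqrt
from statistics import NormalDist


def p_value(z, alternative="greater"):
    """p-value of an observed z statistic (hypothetical helper)."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if alternative == "greater":      # Ha: mu > mu0, area to the right of z
        return 1 - phi(z)
    if alternative == "less":         # Ha: mu < mu0, area to the left of z
        return phi(z)
    if alternative == "two-sided":    # Ha: mu != mu0, both tails
        return 2 * (1 - phi(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Test statistics recomputed from the two worked examples.
z_steers = (390 - 380) / (35.2 / sqrt(50))
z_bulbs = (930 - 900) / (80 / sqrt(64))

p_steers = p_value(z_steers)  # compared with alpha = .01
p_bulbs = p_value(z_bulbs)    # compared with alpha = .05

print(f"steers: p-value = {p_steers:.4f}, reject H0: {p_steers <= 0.01}")
print(f"bulbs:  p-value = {p_bulbs:.4f}, reject H0: {p_bulbs <= 0.05}")
```

Both approaches lead to the same decisions as the rejection-region method, since the p-value is at most $\alpha$ exactly when the test statistic falls in the rejection region.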
Type-I Error, Type-II error, and Power in Hypothesis Testing
A Type I error refers to the decision of rejecting H0 when it is actually true. The probability of making a Type I
error is denoted by $\alpha$.
A Type II error refers to the decision of failing to reject H0 when it is false. The probability of making a Type II
error is denoted by $\beta$.
The power of a test is the probability of rejecting a false null hypothesis. It is denoted by $1 - \beta$, where $\beta$ is the
probability of making a Type-II error.
With the sample size held fixed:
If you decrease $\alpha$, $\beta$ will increase.
If you increase $\alpha$, $\beta$ will decrease.
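The trade-off between the two error probabilities can be seen numerically. The sketch below uses the setup of the steer example (null mean 380, n = 50) and assumes, purely for illustration, that the true mean is 395 and that the population standard deviation equals the sample value 35.2; neither assumption comes from the handout. The function name `power_upper_tailed` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed(mu0, mu_true, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 vs. Ha: mu > mu0.

    H0 is rejected when z >= z_alpha. If the true mean is mu_true, z is
    normal with mean (mu_true - mu0) / (sigma / sqrt(n)) and sd 1, so
    power = P(Z >= z_alpha - shift) for a standard normal Z.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_alpha - shift)


# Assumed values: true mean 395 and sigma 35.2 are illustrative only.
alphas = [0.01, 0.05, 0.10]
betas = [1 - power_upper_tailed(380, 395, 35.2, 50, a) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  beta = {b:.3f}  power = {1 - b:.3f}")
```

Under these assumptions, $\beta$ shrinks as $\alpha$ grows from .01 to .10, which is the behavior described above.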
Statistical versus Practical Significance
When the value of the test statistic falls in the rejection region, it is customary to say that the result is
statistically significant at the chosen level $\alpha$. However, a statistically significant result may not have any
practical consequences. This is something to be wary of when a very large sample is used in carrying out the
hypothesis test. The following example illustrates this point.
Example Let $\mu$ denote the population average IQ for children in a certain region of the United States. The
average IQ for all children in the United States is 100. Education authorities are interested in testing
H0: $\mu = 100$

Ha: $\mu > 100$

A sample of 2500 students resulted in the following

$\bar{x} = 101$, $s = 15$
Using $\alpha = .01$, carry out a hypothesis test.

The test statistic is $z = \dfrac{101 - 100}{15/\sqrt{2500}} = 3.33$, which exceeds $z_{\alpha} = 2.33$, so H0 is rejected.
With n = 2500, the point estimate $\bar{x} = 101$ is almost surely very close to the true value of $\mu$. So it looks as though
H0 was rejected because $\mu \approx 101$ rather than 100. And from a practical point of view, a 1-point IQ difference has no
significance. So the statistically significant result does not have any practical consequences.
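To see how much the sample size drives this result, the sketch below recomputes the test statistic for the same 1-point difference ($\bar{x} = 101$, $s = 15$) at several sample sizes. Only n = 2500 comes from the example; the smaller sample sizes and the helper name `z_statistic` are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(xbar, mu0, s, n):
    """Large-sample z statistic for testing a mean (hypothetical helper)."""
    return (xbar - mu0) / (s / sqrt(n))


alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value

# Same observed difference at different sample sizes (only 2500 is from the example).
sample_sizes = [36, 100, 400, 2500]
results = {n: z_statistic(101, 100, 15, n) for n in sample_sizes}

for n, z in results.items():
    decision = "reject H0" if z >= z_crit else "fail to reject H0"
    print(f"n = {n:5d}  z = {z:.2f}  {decision}")
```

The same 1-point difference is statistically significant only at the largest sample size, which is why statistical significance alone says little about practical importance.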
Practice Problems Hypothesis Testing

Practice Problem 1 WSU uses thousands of fluorescent light bulbs each year. The brand of bulb it currently
uses has a mean life of 900 hours. A manufacturer claims that its new brand of bulbs, which cost the same as the
brand the university currently uses, has a mean life of more than 900 hours. The university has decided to
purchase the new brand if, when tested, the test evidence supports the manufacturer's claim at $\alpha = .05$. Suppose
sixty-four bulbs were tested with the following results:

$\bar{x} = 930$ hours

$s = 80$ hours

Will WSU purchase the new brand of fluorescent bulbs? Conduct a hypothesis test. Use both the traditional and the
p-value approach.
Practice Problem 2 A nutritionist believes that a 12-ounce box of breakfast cereal should contain an average of
1.2 ounces of bran. The nutritionist measures a random sample of sixty boxes of a popular cereal for bran content.
Suppose the data yield
$\bar{x} = 1.170$

$s = .111$

Do the data indicate that the mean bran content of all boxes of this brand of cereal differs from 1.2 ounces? Use
$\alpha = .05$. Use both the traditional and the p-value approach.
Practice Problem 3 Speed, size and strength are thought to be important factors in football performance. The
paper "Physical and Performance Characteristics of NCAA Division I Football Players" (Research Quarterly
for Exercise and Sport (1990): 395-401) reported on physical characteristics of Division I starting football
players in the 1988 football season. Information for teams ranked in the top 20 was easily obtained, and it was
reported that the mean weight of starters on top-20 teams was 105 kg. A sample of 33 starting players (various
positions were represented) from Division I teams that were not ranked in the top 20 resulted in a sample mean
of 103.3 kg and a sample standard deviation of 16.3 kg. Is there sufficient evidence to conclude that the mean
weight for non-top-20 starters is less than the known value for top-20 teams? Conduct a hypothesis test using
$\alpha = .01$. Use both the traditional approach and the p-value approach.
Practice Problem 4 Factors Influencing the Power of a Hypothesis Test
Suppose we are interested in testing H0: $\mu = 105$ vs. Ha: $\mu > 105$ using a sample of size n = 35 and $\alpha = .025$. Based
on historical data we estimate $\sigma$ to be 17.
a. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 35 and $\alpha = .025$?
b. If $\mu$ is actually equal to 110, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 35 and $\alpha = .025$?
c. If $\mu$ is actually equal to 114, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 35 and $\alpha = .025$?
What do you learn from the above calculations?
d. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 50 and $\alpha = .025$?
e. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 100 and $\alpha = .025$?
f. If $\mu$ is actually equal to 108, what is the probability that we would reject H0: $\mu = 105$ using a sample of size
n = 200 and $\alpha = .025$?
What do you learn?