
Hypothesis Testing

Statistical Hypotheses

A statistical hypothesis is a mathematical claim about a population parameter.

Examples:

- The mean height of women is less than 65 inches tall.
- The percentage of Floridians favoring a bullet train is 57%.
- The average distance driven per year by Americans is more than 10,000 miles.
- At least 5% of Americans earn more than \$100,000 per year.

We could write the claims above as $\mu < 65$, $p = .57$, $\mu > 10{,}000$, and $p \ge .05$,
respectively.

Hypothesis Testing - Basic Procedure

If we wanted to know whether any of the above hypotheses are true, we would conduct a
hypothesis test. When we test a statistical hypothesis, we use the following basic
procedure:

1. Draw a random sample for the random variable in question.

2. Determine if the results from the sample data are consistent or inconsistent with
the hypothesis.

3. If the sample data is significantly different from the claimed hypothesis, we would
reject the hypothesis as being false. If the data is not significantly different, we
would not reject the hypothesis.

Example: Battery Life (p. 320)

Suppose a battery manufacturer claims that the average life of its batteries is at least 300
minutes ($\mu \ge 300$).

To test the claim, a sample of $n = 100$ batteries is drawn. The sample of batteries is
tested and the mean battery life in the sample is found to be $\bar{x} = 294$ minutes with a
sample standard deviation of $s = 20$ minutes. Is this data sufficiently different from the
manufacturer's claim to justify rejecting the claim as false?

Since we have drawn a large sample, the Central Limit Theorem allows us to conclude
that the distribution of sample means $\bar{x}$ is approximately normal.

If the manufacturer's claim is correct, then $\mu_{\bar{x}} = \mu \ge 300$, and so we will
assume that $\mu_{\bar{x}} = \mu = 300$. From our sample, we can also estimate that

$$\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}} = \frac{20}{\sqrt{100}} = 2.$$

The observed value $\bar{x} = 294$ has a z-score of

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{294 - 300}{2} = -3.0$$

Looking in the Standard Normal Table, we find that $P(\bar{x} \le 294) = P(z \le -3.0) = .0013$.

Now, one of the following must be true:

- Our assumption that $\mu = 300$ is incorrect.

OR

- We have drawn a sample whose mean is so small that only 13 in 10,000
samples have a mean as low.

If $\mu = 300$, a sample like the one in the second statement would be drawn with
probability of only .0013. Thus, we have strong evidence to believe that the first
statement is true, and hence that the manufacturer overstated the mean lifetime of its
batteries.
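
The arithmetic in the battery example can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and `statistics.NormalDist` (Python 3.8+) stands in for the Standard Normal Table.

```python
from math import sqrt
from statistics import NormalDist

# Sample summary from the battery example
n = 100         # sample size
x_bar = 294.0   # sample mean (minutes)
s = 20.0        # sample standard deviation (minutes)
mu_0 = 300.0    # mean assumed under the manufacturer's claim

# Estimated standard deviation of the sample mean
std_error = s / sqrt(n)

# z-score of the observed sample mean
z = (x_bar - mu_0) / std_error

# Probability of a sample mean this low or lower if mu = 300
prob = NormalDist().cdf(z)

print(f"standard error = {std_error}")
print(f"z = {z}")
print(f"P(x_bar <= 294) = {prob:.4f}")
```

The printed probability rounds to the table value of .0013 used above.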

Formal Hypothesis Tests

In the informal hypothesis test above, we decided to reject the manufacturer's claim that
$\mu \ge 300$ minutes, and thus we believe that $\mu < 300$.

In a formal hypothesis test, the opposite claims above would be given the names null
hypothesis and alternative hypothesis. The null hypothesis is denoted by $H_0$ and the
alternative hypothesis is denoted by $H_a$. The null and alternative hypotheses need to be
assigned as follows:

The null hypothesis is the hypothesis being tested. $H_0$ must:

- be the hypothesis we want to reject
- contain the condition of equality ($=$, $\ge$, $\le$)

The alternative hypothesis is always the opposite of the null hypothesis. $H_a$ must:

- be the hypothesis we want to support
- not contain the condition of equality ($\ne$, $<$, $>$)

A formal hypothesis test will always conclude with a decision to reject $H_0$ based on
sample data, or the decision that there is not strong enough evidence to reject $H_0$.

Example: In the battery life example, we had:

$H_0: \mu \ge 300$
$H_a: \mu < 300$

Example: A tire manufacturer claims its tires last on average more than 35,000 miles. If
we think the claim is false, then we would write the claim as $H_0$, remembering to include
the condition of equality. Our hypotheses for this test would be:

$H_0: \mu \ge 35{,}000$
$H_a: \mu < 35{,}000$

We would then hope that our sample data would allow the rejection of the null hypothesis,
refuting the company's claim.

Example: On the other hand, if we worked for the tire company and wanted to gather
evidence to support their claim, then we would make the company's claim $H_a$, and
remember that equality should not be included in the claim. Our hypothesis test would
then use the hypotheses:

$H_0: \mu \le 35{,}000$
$H_a: \mu > 35{,}000$

If the sample data was able to support the rejection of $H_0$, this would be strong evidence
to support the claim $H_a$, which is what the company believes to be true.

Types of Error

Whenever sample data is used to make an estimate of a population parameter, there is
always a probability of error due to drawing an unusual sample. There are two main types
of error that occur in hypothesis tests.

Type I Error – A sample is chosen whose sample data leads to the rejection of the null
hypothesis when, in fact, $H_0$ is true.

Type II Error – A sample is chosen whose sample data leads to not rejecting the null
hypothesis when, in fact, $H_0$ is false.

Summarizing:

|                    | $H_0$ True       | $H_0$ False      |
|--------------------|------------------|------------------|
| $H_0$ Rejected     | Type I Error     | Correct Decision |
| $H_0$ Not Rejected | Correct Decision | Type II Error    |

Exercise: Determine what a Type I Error and a Type II Error would be for our tire
examples.

Level of Significance

In hypothesis tests, a conservative approach is usually taken toward the rejection of the
null hypothesis. That is, we want the probability of making a Type I Error to be small.

The maximum acceptable probability is usually chosen from the beginning of the
hypothesis test, and is called the level of significance for the test. The level of
significance is denoted by $\alpha$, and the most commonly used values are $\alpha = .10$,
$\alpha = .05$, and $\alpha = .01$.

The probability of making a Type II Error in a hypothesis test is denoted by $\beta$. Once
$\alpha$ is determined, the value of $\beta$ is also fixed (for a given sample size and a given
true value of the parameter), but the calculation of this value is beyond the scope of this
course.

Types of Tests

There are three basic types of hypothesis tests:

Left-tailed Test – used when the null hypothesis being tested is a claim that the
population parameter is at least ($\ge$) a given value. Note that the alternative hypothesis
then claims that the parameter is less than ($<$) the value.

Example: From our first tire manufacturer example:

$H_0: \mu \ge 35{,}000$
$H_a: \mu < 35{,}000$

We would reject $H_0$ in the case above if our sample mean was significantly less than
35,000. That is, if our sample mean was in the left tail of the distribution of all sample
means. (See Diagrams on Page 326 in the text)

Right-tailed Test – used when the null hypothesis being tested is a claim that the
population parameter is at most ($\le$) a given value. Note that the alternative hypothesis
then claims that the parameter is greater than ($>$) the value.

Example: From our second tire manufacturer example:

$H_0: \mu \le 35{,}000$
$H_a: \mu > 35{,}000$

We would reject $H_0$ in this case if our sample mean was significantly more than 35,000.
That is, if our sample mean was in the right tail of the distribution of all sample means.
(See Diagrams on Page 326 in the text)

Two-tailed Test – used when the null hypothesis being tested is a claim that the
population parameter is equal to ($=$) a given value. Note that the alternative hypothesis
then claims that the parameter is not equal to ($\ne$) the value.

Example: The Census Bureau claims that the percentage of Tampa Area residents with a
bachelor's degree or higher is 24.4%. We would write the null and alternative hypotheses
for this claim as:

$H_0: p = .244$
$H_a: p \ne .244$

In this case, we would have to reject $H_0$ if our sample percentage was either significantly
more than 24.4%, or significantly less than 24.4%. That is, if our sample proportion was
in either tail of the distribution of all sample proportions.
(See Diagrams on Page 326 in the text)

Testing a Claim about the Mean Using a Large Sample

When a hypothesis test involves a claim about a population mean, then we will draw a
sample and look at the sample mean to test the claim. If the sample drawn is large enough
($n \ge 30$), then the Central Limit Theorem applies, and the distribution of sample means is
approximately normal. As usual, we also have that:

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$

Note: Since $s$ and $n$ are known from the sample data, we have a good estimate of
$\sigma_{\bar{x}}$, but we do not know $\mu$ since this is the parameter we are testing a claim
about. In order to have a value for $\mu$, we will always assume that the null hypothesis is
true in any hypothesis test.

Since the null hypothesis must be of one of the following types:

$\mu \ge \mu_0$, $\mu \le \mu_0$, or $\mu = \mu_0$

where $\mu_0$ is a constant, we will always assume for the purpose of our test that
$\mu = \mu_0$.

The Standardized Test Statistic

There are two methods we will use to determine whether to reject or not reject the null
hypothesis, but in both cases it will be more convenient to convert our sample mean
$\bar{x}$ to a z-score which will be called our standardized test statistic.

Since we are assuming $\mu = \mu_0$, we also have $\mu_{\bar{x}} = \mu_0$, and so:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

As long as $\mu = \mu_0$ as assumed, the distribution of the standardized test statistic $z$
defined above will be the Standard Normal Distribution.

Example: Suppose we believe that the mean body temperature of healthy adults is less
than the commonly accepted measurement of 98.6 °F. A sample of 60 healthy adults is
drawn with an average temperature of $\bar{x} = 98.2$ °F and with a sample standard
deviation of $s = 1.1$.

Our hypotheses in this case would be:

$H_0: \mu \ge 98.6$
$H_a: \mu < 98.6$

So we have a left-tailed test with $\mu_0 = 98.6$.
Based on our sample data, our standardized test statistic is:

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{98.2 - 98.6}{1.1/\sqrt{60}} = \frac{-0.4}{.142} \approx -2.82$$
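
As a quick check of the calculation above, the standardized test statistic can be wrapped in a small helper. The function name `z_statistic` is an illustrative choice, not something taken from the text.

```python
from math import sqrt

def z_statistic(x_bar, mu_0, s, n):
    """Standardized test statistic for a large-sample test about a mean."""
    return (x_bar - mu_0) / (s / sqrt(n))

# Body temperature example: x_bar = 98.2, mu_0 = 98.6, s = 1.1, n = 60
z = z_statistic(x_bar=98.2, mu_0=98.6, s=1.1, n=60)
print(f"z = {z:.2f}")
```

The same helper can be reused for the exercises by changing the arguments.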

Exercise: Suppose we decide to make the stronger claim that the mean body temperature
was less than 98.4. Find the new standardized test statistic.

Exercise: A researcher believes that the average commuting time for Tampa commuters
is more than 25 minutes. A random sample of 100 Tampa commuters finds that the average
commuting time is $\bar{x} = 25.5$ minutes with a standard deviation of $s = 11.5$ minutes.
If we wish to support the researcher's claim, determine the null and alternative hypotheses,
the type of test, and the standardized test statistic.

The P-Value Method

The P-value of a test is the probability, assuming the null hypothesis is true, of drawing
a random sample whose standardized test statistic is at least as contrary to the claim of
the null hypothesis as that observed in the sample group.

Example: In the hypothesis test for body temperature given above, we had:

$H_0: \mu \ge 98.6$
$H_a: \mu < 98.6$

Our sample had a mean temperature of $\bar{x} = 98.2$, which is contrary to the null
hypothesis. Only a sample group with an average temperature less than 98.2 would be
stronger evidence against $H_0$. Thus the P-value of this test is $P(\bar{x} \le 98.2)$. Since
the z-score of $\bar{x}$ is just our standardized test statistic $z$, which has the Standard
Normal Distribution, $P(\bar{x} \le 98.2) = P(z \le -2.82) = .0024$.

Since the probability of drawing a sample as contrary to the null hypothesis as the
observed sample (assuming $H_0$ is true) is small, we would decide to reject $H_0$.

Calculating P-Values

In the example above, we calculated the P-value of the test by finding the area to the left
of the standardized test statistic z on the standard normal curve. Notice that the example
above was also a left-tailed test, and that any hypothesis test which is left-tailed will have
the P-value calculated exactly as above. (See diagram on p. 326)

Similarly, for a right-tailed test, we would calculate the P-value by finding the area to
the right of the standardized test statistic.

For a two-tailed test, the null hypothesis is always claiming that $\mu = \mu_0$, and so the
sample data is contrary to this claim if the sample mean is either much higher or much
lower than $\mu_0$. The P-value for a two-tailed test then is the area in both tails of the
normal distribution more extreme than the standardized test statistic. Since the normal
distribution is symmetric, this is just twice the area in one tail. (See diagram on p. 326)

Example: The P-value of the battery life example would be: $P(z \le -3.0) = .0013$
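
The three tail rules described above can be collected into one function. This is a minimal sketch: the name `p_value` and the `tail` labels are assumptions made for illustration, and `statistics.NormalDist` replaces the table lookup.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value for a z test; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)               # area to the left of z
    if tail == "right":
        return 1.0 - std_normal.cdf(z)         # area to the right of z
    if tail == "two":
        return 2.0 * std_normal.cdf(-abs(z))   # twice the area in one tail
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(f"battery life:     {p_value(-3.0, 'left'):.4f}")
print(f"body temperature: {p_value(-2.82, 'left'):.4f}")
```

Both printed values agree with the table values quoted in the examples (.0013 and .0024).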

Example: A university claims that the average SAT score for its incoming freshmen is
1080. A sample of 56 freshmen at the university is drawn and the average SAT score is
found to be $\bar{x} = 1044$ with a sample standard deviation of $s = 94.7$ points.

Our hypotheses in this case would be:

$H_0: \mu = 1080$
$H_a: \mu \ne 1080$

So we have a two-tailed test with $\mu_0 = 1080$.
Our standardized test statistic is:

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1044 - 1080}{94.7/\sqrt{56}} = \frac{-36}{12.65} \approx -2.85$$

Since the test is two-tailed, our P-value is given by:

$$P(z \le -2.85 \text{ or } z \ge 2.85) = 2 \cdot P(z \le -2.85) = 2(.0022) = .0044$$
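
The SAT example can also be worked from the summary statistics without intermediate rounding. The sketch below uses illustrative variable names; because it carries full precision, its z value (about -2.845) differs slightly from the rounded -2.85 above, while the P-value still comes out near .0044.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the SAT example
n = 56
x_bar = 1044.0
s = 94.7
mu_0 = 1080.0   # value claimed by the null hypothesis

z = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed test: double the area in the tail beyond |z|
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}")
print(f"P-value = {p_two_tailed:.4f}")
```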

Exercise: Find the P-value for the test on Tampa commuting times

Deciding to Reject the Null Hypothesis

In the examples above, we saw that a very small P-value would lead us to reject the null
hypothesis, and a high P-value would not.

Since the P-value of a test is the probability of randomly drawing a sample at least as
contrary to $H_0$ as the observed sample when $H_0$ is true, a small P-value means that
rejecting $H_0$ based on our sample data carries little risk of a Type I Error. More
precisely, if we reject $H_0$ only when the P-value is at most some fixed cutoff, then the
probability of making a Type I Error is at most that cutoff.

Recall that the maximum acceptable probability of making a Type I Error is the level of
significance $\alpha$, and is usually determined at the outset of the hypothesis test.
The rule we will use to decide whether to reject $H_0$ is:

Reject $H_0$ if $P \le \alpha$
Do not reject $H_0$ if $P > \alpha$
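
The decision rule is a single comparison, sketched below. The function name `decide` is illustrative; the first two P-values come from the body temperature and SAT examples, and the third (0.20) is a made-up value used only to show the non-rejection branch.

```python
def decide(p_value, alpha):
    """Apply the P-value rule: reject H0 exactly when P <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

print(decide(0.0024, alpha=0.05))  # body temperature example
print(decide(0.0044, alpha=0.05))  # SAT example
print(decide(0.20, alpha=0.05))    # hypothetical large P-value
```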

Exercise: Is there sufficient evidence to support the researcher's claim that the mean
commuting time of Tampa commuters is more than 25 minutes at a $\alpha = .05$ level of
significance? Is there sufficient evidence to support a claim that the mean commuting time
is more than 20 minutes?

Rejection Regions and Critical Values

A second method to determine whether to reject the null hypothesis is to use rejection
regions and critical values.

A rejection region for a hypothesis test is the range of values for the standardized test
statistic which would cause us to decide to reject the null hypothesis. Critical values for
a hypothesis test are the z-scores which separate the rejection region(s) from the
non-rejection region. The critical values will be denoted by $z_0$.

The rejection region for a test is determined by the type of test (left/right/two-tailed) and
the level of significance $\alpha$ for the test. For a left-tailed test, the rejection region is
a region in the left tail of the normal distribution; for a right-tailed test, it is in the right
tail; and for a two-tailed test, there are two equal rejection regions, one in each tail.
(See Diagram p. 341)

Since the level of significance is the maximum acceptable probability of a Type I Error, we
want the area under the normal curve in the rejection region to be $\alpha$. We can
use this area to find our critical values.
(See Diagram p. 341)

Once we establish the critical values and rejection region, if the standardized test statistic
for a sample data set falls in the rejection region, we will reject the null hypothesis.
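
The link between the area of the rejection region and the probability of a Type I Error can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the course method: it assumes a normal population whose mean really is 300 with standard deviation 20 (numbers borrowed from the battery example), draws many samples of size 100, and counts how often a left-tailed test at $\alpha = .05$ wrongly rejects the true null hypothesis.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def simulated_type_i_rate(mu_0=300.0, sigma=20.0, n=100, alpha=0.05,
                          trials=2000, seed=1):
    """Fraction of samples landing in the left-tail rejection region
    even though H0 is true (the population mean really is mu_0)."""
    rng = random.Random(seed)              # fixed seed for repeatability
    z_0 = NormalDist().inv_cdf(alpha)      # left-tail critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu_0) / (stdev(sample) / sqrt(n))
        if z <= z_0:
            rejections += 1
    return rejections / trials

rate = simulated_type_i_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen level of significance, which is the sense in which $\alpha$ caps the probability of a Type I Error.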

Example: In our body temperature example, we were using a left-tailed test. If the level
of significance was $\alpha = .05$, then the rejection region would be the values in the lowest
5% of the standard normal distribution. Looking up .05 in the standard normal table, we
see that this corresponds to z-scores of at most -1.645, so our critical value is
$z_0 = -1.645$, and our rejection region is $z \le -1.645$. Since our standardized test
statistic $z = -2.82$ falls into this region, we would choose to reject $H_0$.

Example: In our SAT example, we had a two-tailed test, so the rejection region will be
the two tails at either end of the normal distribution. If we again want $\alpha = .05$, then the
area in both rejection regions together should be .05, and so we will look up $\alpha/2 = .025$
in the standard normal table to get critical values of $z_0 = \pm 1.96$. The rejection region
thus consists of: $z \le -1.96$ and $z \ge 1.96$. Since our standardized test statistic
$z = -2.85$ falls in the region, we would reject the university's claim that $\mu = 1080$.
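
The table lookups in these two examples can be reproduced with the inverse of the standard normal distribution. This is a sketch: `critical_values` and `in_rejection_region` are illustrative names, and `NormalDist.inv_cdf` plays the role of reading the table backwards.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        z_0 = std_normal.inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
        return (-z_0, z_0)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z, alpha, tail):
    """True when the test statistic z falls in the rejection region."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return z <= crit[0]
    if tail == "right":
        return z >= crit[0]
    return z <= crit[0] or z >= crit[1]

print(critical_values(0.05, "left"))            # body temperature example
print(critical_values(0.05, "two"))             # SAT example
print(in_rejection_region(-2.82, 0.05, "left"))
print(in_rejection_region(-2.85, 0.05, "two"))
```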

Exercises: Find the critical values and rejection region(s):

- A right-tailed test with $\alpha = .05$
- A left-tailed test with $\alpha = .01$
- A two-tailed test with $\alpha = .10$
- A right-tailed test with $\alpha = .02$

Exercise: Mercury levels in fish are considered dangerous to people if they exceed 0.5 mg
of mercury per kilogram of meat. A sample of 50 tuna is collected, and the mean level of
mercury in these 50 fish is 0.6 mg/kg with a standard deviation of 0.2 mg/kg. A health
warning will be issued if the claim that the mean exceeds 0.5 mg/kg can be supported at
the $\alpha = .10$ level of significance.

Determine the null and alternative hypotheses in this case, the type of test, and the critical
value(s) and rejection region.

Find the standardized test statistic for this data. Should the health warning be issued?
What is the P-value of this test?

Testing a Claim about the Mean Using a Small Sample

Recall that if a sample of values for a normal random variable is drawn and if the sample
size is less than 30, and the population standard deviation is unknown, then the random
variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the Student t-distribution with $n - 1$ degrees of freedom.


When testing a claim about the mean using sample data from a small sample, we should
therefore use the appropriate t-distribution instead of the standard normal distribution to
determine our standardized test statistic, critical values, rejection regions, and P-values.

The standardized test statistic for a t-distribution test will be given by:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Note: Probability values for the t-distributions are not given in the text, so we will limit
our discussion to using the critical value method in this case. However, P-values for
t-distribution tests can be calculated using the TI-83.

Finding Critical Values for the t-distributions

The process for locating the rejection regions for a t-distribution hypothesis test is the
same as for the normal distribution tests. The critical values, however, will be different.

To find the critical value(s) $t_0$ for a test, first determine if the test is one-tailed or
two-tailed, and the level of significance $\alpha$. The critical values can be found in the
t-distribution table in the front of the book by looking up the entry in the column giving
the level of significance and the row giving the degrees of freedom.

Note that:
- For a right-tailed test, $t_0$ is the value in the table
- For a left-tailed test, $t_0$ is the negative of the value in the table
- For a two-tailed test, there are two critical values $\pm t_0$: both the value and its
opposite

Example: SAT Math Scores are normally distributed. A sample of scores for 16 students
has mean score of $\bar{x} = 522.8$ with a sample standard deviation of $s = 154.5$.
Suppose we wished to support the claim that the average SAT Math score exceeds 500
using a $\alpha = .05$ level of significance.

The null and alternative hypotheses in this case would be:

$H_0: \mu \le 500$
$H_a: \mu > 500$

So the test is right-tailed with $\mu_0 = 500$.

Using the t-distribution table with One Tail $\alpha = .05$ and 15 degrees of freedom, we get
a critical value of $t_0 = 1.753$. The rejection region is thus: $t \ge 1.753$.

The standardized test statistic is given by:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{522.8 - 500}{154.5/\sqrt{16}} = \frac{22.8}{38.625} \approx .59$$

Because the standardized test statistic is not in the rejection region, we would not
reject $H_0$, and so the sample data is not sufficient to support the claim that the mean
exceeds 500 at the .05 level of significance.
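
The SAT Math test can be sketched in Python as follows. Python's standard library has no t-distribution, so the critical value 1.753 is simply copied from the table lookup above rather than computed; the function name `t_statistic` is illustrative.

```python
from math import sqrt

def t_statistic(x_bar, mu_0, s, n):
    """Standardized test statistic for a small-sample test about a mean."""
    return (x_bar - mu_0) / (s / sqrt(n))

# SAT Math example: n = 16, so 15 degrees of freedom
t = t_statistic(x_bar=522.8, mu_0=500.0, s=154.5, n=16)

# Table value for one tail, alpha = .05, 15 degrees of freedom (from the text)
t_0 = 1.753

# Right-tailed test: reject H0 only when t >= t_0
reject = t >= t_0

print(f"t = {t:.2f}")
print("reject H0" if reject else "do not reject H0")
```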

Example: A biologist measures the weights of anesthetized female grizzly bears during
winter. A sample of 14 bears is found to have a mean weight of $\bar{x} = 376.6$ lbs. with a
sample standard deviation of $s = 32.5$ lbs. Is there sufficient evidence to support the
claim that the mean weight of all female bears in the area is less than 400 lbs. using a
$\alpha = .01$ level of significance? (Assume that the weights of female grizzly bears are
normally distributed)

The null and alternative hypotheses in this case would be:

$H_0: \mu \ge 400$
$H_a: \mu < 400$

So the test is left-tailed with $\mu_0 = 400$.

Using the t-distribution table with One Tail $\alpha = .01$ and 13 degrees of freedom, we get
a critical value of $t_0 = -2.650$. The rejection region is thus: $t \le -2.650$.

The standardized test statistic is given by:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{376.6 - 400}{32.5/\sqrt{14}} = \frac{-23.4}{8.686} \approx -2.694$$

The standardized test statistic is in the rejection region, so we would reject $H_0$, which
supports the claim that the mean bear weight is less than 400 lbs.
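
The sign convention for table values matters in a left-tailed test like this one. The sketch below turns a positive table entry into the critical value(s) for each type of test and then checks the bear example; `t_critical_from_table` is a hypothetical helper name, and the entry 2.650 is copied from the table lookup above.

```python
from math import sqrt

def t_critical_from_table(table_value, tail):
    """Turn a positive t-table entry into critical value(s) for the test."""
    if tail == "right":
        return (table_value,)
    if tail == "left":
        return (-table_value,)
    if tail == "two":
        return (-table_value, table_value)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Grizzly bear example: n = 14, so 13 degrees of freedom
n, x_bar, s, mu_0 = 14, 376.6, 32.5, 400.0
t = (x_bar - mu_0) / (s / sqrt(n))

# One tail, alpha = .01, 13 degrees of freedom: table entry 2.650
(t_0,) = t_critical_from_table(2.650, "left")

# Left-tailed test: reject H0 only when t <= t_0
reject = t <= t_0

print(f"t = {t:.3f}, critical value = {t_0}")
print("reject H0" if reject else "do not reject H0")
```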

Exercise: A statistics professor gives his class their second exam. In order to estimate the
mean score, the professor chooses 8 exams at random and grades them. The mean grade
for the eight exams chosen is $\bar{x} = 84$ and they have a standard deviation of $s = 7$.
Assume that the students' test scores are normally distributed.

The professor believes the average score will be higher than the mean of 80 on the first
exam. Conduct a hypothesis test using the $\alpha = .05$ level of significance, to support the
professor's claim.

Testing a Claim about a Population Proportion

Recall that if a random sample is drawn and if the sample proportion $\hat{p}$ is measured and
$n\hat{p} \ge 5$, $n\hat{q} \ge 5$, then the distribution of $\hat{p}$ is approximately normal with
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{pq/n}$.

When testing a claim about a population proportion, the null hypothesis has one of the
following forms:

$p \ge p_0$, $p \le p_0$, or $p = p_0$

So as with the mean, we will assume $p = p_0$ and
we will use the following standardized test statistic for a proportion test:

$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$

This random variable should have the standard normal distribution, and so we will
calculate all of our rejection regions, critical values, and P-values using the standard
normal distribution as we did when testing a mean using a large sample.

Example: A computer chip manufacturer tests microprocessors coming off the
production line. In one sample of 577 processors, 37 were found to have defects. The
company wants to claim that the proportion of processors that are defective is only 4%.
Can the company's claim be rejected at the $\alpha = .01$ level of significance?

The null and alternative hypotheses in this case would be:

$H_0: p = .04$
$H_a: p \ne .04$

So the test is two-tailed with $p_0 = .04$.

Since $\hat{p} = 37/577 \approx .064$, the standardized test statistic is given by:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.064 - .04}{\sqrt{(.04)(.96)/577}} \approx \frac{.024}{.008} = 3.0$$

Looking up $z = 3.00$ in the standard normal table we get a value of .9987, so
$P(z \ge 3.00) = 1 - .9987 = .0013$ and since we have a two-tailed test, the P-value is just
twice this amount or .0026. Since this is less than $\alpha = .01$, we can reject the company's
claim.
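
The chip example can be recomputed without rounding the intermediate values. In this sketch (illustrative names, `statistics.NormalDist` in place of the table) the unrounded statistic is a little under 3.0 and the P-value is about .003 rather than .0026, but it is still below $\alpha = .01$, so the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p_0):
    """Standardized test statistic for a claim about a proportion."""
    p_hat = successes / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Chip example: 37 defective out of 577, claimed proportion .04
z = proportion_z(successes=37, n=577, p_0=0.04)

# Two-tailed test: double the area in the tail beyond |z|
p_val = 2 * NormalDist().cdf(-abs(z))

alpha = 0.01
print(f"z = {z:.2f}, P-value = {p_val:.4f}")
print("reject H0" if p_val <= alpha else "do not reject H0")
```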

Example: An opinion poll of 1010 randomly chosen American adults finds that only 47%
approve of the president's job performance. The president's political advisors want to
know if this is sufficient data to show that less than half of Americans approve of the
president's job performance using the $\alpha = .05$ level of significance.

The null and alternative hypotheses in this case would be:

$H_0: p \ge .50$
$H_a: p < .50$

So the test is left-tailed with $p_0 = .5$.

Since $\hat{p} = .47$, the standardized test statistic is given by:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{.47 - .50}{\sqrt{(.5)(.5)/1010}} = \frac{-.03}{.01573} \approx -1.91$$

For a left-tailed test with $\alpha = .05$, we will have $z_0 = -1.645$, and so since
$z = -1.91 < -1.645 = z_0$ the null hypothesis should be rejected, and so the data does
support the claim $p < .50$ at the $\alpha = .05$ level of significance.
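
The critical value method for the poll example can be sketched as below, including the sample-size condition from the start of this section. The variable names are illustrative, and `NormalDist.inv_cdf` stands in for the table lookup of $z_0$.

```python
from math import sqrt
from statistics import NormalDist

n = 1010       # poll size
p_hat = 0.47   # sample proportion approving
p_0 = 0.50     # boundary value from the null hypothesis
alpha = 0.05

# Condition for the normal approximation: n*p_hat >= 5 and n*q_hat >= 5
normal_ok = n * p_hat >= 5 and n * (1 - p_hat) >= 5

# Standardized test statistic, assuming p = p_0
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Left-tailed test: the rejection region is z <= z_0
z_0 = NormalDist().inv_cdf(alpha)
reject = z <= z_0

print(f"normal approximation ok: {normal_ok}")
print(f"z = {z:.2f}, z_0 = {z_0:.3f}")
print("reject H0" if reject else "do not reject H0")
```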

Exercise: Calculate the P-value of the above test.

Exercise: According to the IRS, 23.8% of all 2003 income tax returns were filed
electronically. To estimate whether the number has increased, a random sample of 500 tax
returns for 2004 is selected, and the IRS finds that 132 of the returns were filed
electronically. Conduct a hypothesis test to support the claim that more than 23.8% of
2004 tax returns are filed electronically. What is the test statistic and P-value for the test?

6 STEPS HYPOTHESIS TEST
(SUGGESTED FOR THE COURSE)

1. IDENTIFY THE DEPENDENT AND INDEPENDENT VARIABLES.

2. STATE THE NULL AND ALTERNATIVE HYPOTHESES.

3. ESTABLISH THE LEVEL OF SIGNIFICANCE (USUALLY SET TO THE 0.05 LEVEL FOR SOCIAL
RESEARCH UNLESS SPECIFIED).

4. DETERMINE THE APPROPRIATE TEST STATISTIC.

5. COMPUTATION / ANALYSIS (USE BASIC AVAILABLE STATISTICAL SOFTWARE).

6. INTERPRETATION / IMPLICATION / CONCLUSION.
