
Stat 104, Section 6 Handout (Solutions)

Marion, Thursday, 4pm, SC107

Key Concepts

2-sample t-based confidence intervals for the difference of two means/two proportions
Hypothesis Testing

Review
1. The weight of an adult swan is normally distributed with a mean of 30 pounds
and a standard deviation of 10 pounds. A farmer randomly selected 36 swans
and loaded them into his truck. What is the probability that this flock of
swans weighs more than 1000 pounds?
The wrong approach to this problem is to find the average weight and standard
deviation of a flock of swans and then calculate the Z-score from that. We have
no information about how the weight of a flock of swans is distributed; we are
told only that individual swan weights are normally distributed. If a flock of 36
swans weighs more than 1000 pounds, then the average weight of those swans
must be more than 1000/36 pounds. If the weight of an adult swan is
X ~ N(30, 10), then the average weight of a sample of 36 swans is

$$\bar{X} \sim N\left(30, \frac{10}{\sqrt{36}}\right)$$

We need to find P(X̄ > 1000/36):

$$P\left(\bar{X} > \frac{1000}{36}\right) = P\left(\frac{\bar{X} - 30}{10/\sqrt{36}} > \frac{1000/36 - 30}{10/\sqrt{36}}\right) = P(Z > -1.33) = 0.908$$
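The following is a minimal Python check of the swan calculation. It uses the standard library's `statistics.NormalDist` (Python 3.8+), and the variable names are illustrative. It works with the unrounded z-score of -4/3, so the probability comes out near 0.909. The 0.908 above comes from rounding z to -1.33 before using the table.

```python
from math import sqrt
from statistics import NormalDist

# Individual swan weight ~ N(30, 10); the farmer loads n = 36 swans.
mu, sigma, n = 30, 10, 36
total_limit = 1000  # pounds

# The mean of n independent swans has standard deviation sigma / sqrt(n).
se = sigma / sqrt(n)

# "Total weight > 1000" is the same event as "mean weight > 1000/36".
z = (total_limit / n - mu) / se

# P(Z > z) for a standard normal Z.
prob = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(total > {total_limit}) = {prob:.3f}")
```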

2. The weights of boxes of cookies produced by a certain manufacturer
approximately follow a normal distribution with a mean of 202 g and a
standard deviation of 3 g.
a. Between what values do the middle 95% of the weights of boxes of
cookies lie?
From a normal distribution, we know that 95% of the observations lie within
1.96 standard deviations of the mean. Therefore:
μ ± 1.96*σ = 202 ± 1.96*(3) = (196.12, 207.88)
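The same interval can be read directly from the normal quantile function. The sketch below uses the standard library's `NormalDist.inv_cdf` and takes the 2.5th and 97.5th percentiles.

```python
from statistics import NormalDist

# Box weight in grams ~ N(202, 3)
weights = NormalDist(mu=202, sigma=3)

# The middle 95% runs from the 2.5th to the 97.5th percentile.
lower = weights.inv_cdf(0.025)
upper = weights.inv_cdf(0.975)
print(f"({lower:.2f}, {upper:.2f})")  # (196.12, 207.88)
```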
b. If the manufacturer stamps 206 g on all the boxes, what percent of
boxes of cookies are overweight?
A box is overweight if it weighs more than 206 g.
P(X > 206) = P((X-μ)/σ > (206-202)/3) = P(Z > 1.33) = 1-0.9082 = 0.0918
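Here is a quick Python check of part (b). It is a sketch that uses the standard library's `NormalDist`. With the unrounded z = 4/3, the tail probability is about 0.0912. The table value of 0.0918 comes from rounding z to 1.33, so the two differ slightly.

```python
from statistics import NormalDist

weights = NormalDist(mu=202, sigma=3)  # box weight in grams
label = 206                            # grams stamped on each box

# Fraction of boxes heavier than the stamped weight
p_over = 1 - weights.cdf(label)
print(f"P(X > {label}) = {p_over:.4f}")
```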
c. Suppose the company makes a profit of $2 for each box of cookies that
is not overweight, but only $1.50 for each overweight box of cookies.
If they sell 500 boxes of cookies, what are the expected value and
variance of profit?
This is a binomial problem because each box is either overweight or not
(treating the 500 boxes as independent, each overweight with probability 0.0918).
Let X be the number of overweight boxes, and Y be the profit. The profit equation
is:
Y = 2(500-X) + 1.5(X) = 1000 - 0.5X
The expected value and variance of the number of overweight boxes can be
determined from the binomial distribution:
E(X) = np = 500(0.0918) = 45.9
Var(X) = np(1-p) = 500(0.0918)(0.9082) = 41.69
The profit can be determined from the a+bX transformation:
E(Y) = 1000-0.5*E(X) = 1000-0.5*45.9 = 977.05
Var(Y) = (-0.5)^2*Var(X) = 0.25*41.69 = 10.42
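The binomial moments and the a+bX rules can be checked in a few lines of Python. This sketch takes p = 0.0918 from part (b), rounded as in the handout. It uses only the formulas above.

```python
# Number of overweight boxes X ~ Binomial(n, p); p is the rounded value from part (b).
n, p = 500, 0.0918
e_x = n * p
var_x = n * p * (1 - p)

# Profit Y = 2(500 - X) + 1.5X = 1000 - 0.5X, a linear transformation a + bX.
a, b = 1000, -0.5
e_y = a + b * e_x        # E(a + bX) = a + b E(X)
var_y = b**2 * var_x     # Var(a + bX) = b^2 Var(X)

print(f"E(X) = {e_x:.1f}, Var(X) = {var_x:.2f}")  # E(X) = 45.9, Var(X) = 41.69
print(f"E(Y) = {e_y:.2f}, Var(Y) = {var_y:.2f}")  # E(Y) = 977.05, Var(Y) = 10.42
```

Note that the constant a shifts the mean but drops out of the variance, which is why only b² = 0.25 multiplies Var(X).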

Practice Problems
3. An investigator wants to study whether or not Harvard students have aerobic
conditioning similar to that of the general US population. In the general US
population aged 18-29, it is known that the mean heart rate is 69 beats per
minute with a standard deviation of 6 beats per minute. We would like to use
this section as a random sample in order to investigate this.

a. What are our hypotheses? What should the level of the test (α) be set
to? What would be a Type I error? A Type II?

$$H_0: \mu = 69 \qquad H_a: \mu \neq 69 \quad \text{(2-sided)}$$

$$\alpha = 0.05 \quad \text{(this is the default)}$$

A Type I error would be one where we reject the null hypothesis even though
it is true. In this case, we would claim that Harvard students have a different
mean heart rate even though it is the same as the population's. A Type II error
would be failing to reject the null hypothesis when it is false, that is, claiming
that Harvard students have the same mean heart rate as the population when they do not.
b. Is this a one-sided or two-sided test? What test will we perform? If we
set the α-level to be 0.05, how many standard deviations away from
the hypothesized mean does our sample mean have to be to reject H0?
This is a 2-sided test. The investigator has no a priori assumption about
whether the average heart rate will be lower or higher.
We will be performing a z-test (since we know the true standard deviation in
the population, σ = 6).
If our z-statistic is further in the tails than 1.96 or -1.96, we will reject H0.
That is, the sample mean must be more than 1.96 standard errors (σ/√n)
away from the hypothesized mean of 69.
c. Now take the mean and SD of the heart rates of this class. What is the
conclusion from this test?

Let us assume that the average heart rate of this class is 65.95, the standard
deviation is 8.11, and there are 20 students in this section. Then

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{65.95 - 69}{6/\sqrt{20}} = \frac{-3.05}{1.3416} = -2.27$$

We reject the null hypothesis because |Z| > 1.96. There is enough evidence
that Harvard's mean heart rate is different from that of the rest of the US; in fact,
it is lower.
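The z-test in parts (b) and (c) can be checked with the sketch below. It takes the assumed class numbers from part (c) and uses `NormalDist.inv_cdf` for the two-sided critical value. The p-value is not in the handout; it is added only as an illustration and comes out at roughly 0.02, below α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

# Class data assumed in part (c); sigma = 6 is the known population SD.
xbar, mu0, sigma, n = 65.95, 69, 6, 20
alpha = 0.05

se = sigma / sqrt(n)          # standard error of the sample mean
z = (xbar - mu0) / se

# Two-sided test: reject H0 when |z| exceeds the upper alpha/2 quantile (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```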

d. Without performing any calculations, would you expect the confidence
interval built upon the above sample to include the null mean, μ0, or
not? Calculate this confidence interval.
No, the interval should not include μ0 = 69, since we rejected that hypothesis
in part (c) above. The calculation is:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 65.95 \pm 1.96 \cdot \frac{6}{\sqrt{20}} = 65.95 \pm 2.63 = (63.32,\ 68.58)$$

*Note: we chose z* = 1.96 since that is the value in a standard normal
distribution that puts 0.025 in each tail (so that 95% falls between -1.96 and
1.96).
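The interval in part (d) can be reproduced with the short sketch below. It uses the standard library only and checks whether the null mean of 69 falls inside.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 65.95, 6, 20
mu0 = 69

z_star = NormalDist().inv_cdf(0.975)   # about 1.96: 0.025 in each tail
margin = z_star * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"({lower:.2f}, {upper:.2f}); contains mu0: {lower <= mu0 <= upper}")
```

It prints (63.32, 68.58) and reports that 69 is not contained, matching the rejection in part (c).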

e. How would a 90% CI compare to ours? How about a 99% CI?

A 90% confidence interval would use z* = 1.645, so it would be narrower.
This makes sense: at 90% we are less confident that the interval contains the
true mean, so we do not need to make it as wide. For a 99% confidence
interval, z* = 2.58, so the interval will be wider.
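To compare the three confidence levels side by side, the sketch below loops over 90%, 95% and 99%. It uses the same assumed class data as part (c). Only z* changes between levels, so the width grows with the confidence level.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 65.95, 6, 20

intervals = {}
for level in (0.90, 0.95, 0.99):
    # z* puts (1 - level)/2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    intervals[level] = (xbar - margin, xbar + margin)
    print(f"{level:.0%}: z* = {z_star:.3f}, CI = ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

With these numbers, the 99% interval reaches about 69.41 and so does contain 69. This is consistent with |z| = 2.27 being below 2.576, meaning the same data would not reject H0 at α = 0.01.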

f. If we did not know the true population standard deviation, σ, what do
you propose we use instead? What test should we then perform?
Perform this formal test.

The most logical solution would be to use the sample standard deviation (s)
as an estimate for σ. The resulting test statistic would then no longer follow a
standard normal distribution; it follows a t distribution with n - 1 degrees of freedom.
This would be a t-test based on the same hypotheses. Here, we
found the standard deviation in the sample of data to be s = 8.11. Thus, we
would calculate the t-statistic to be:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{65.95 - 69}{8.11/\sqrt{20}} = \frac{-3.05}{1.813} = -1.68$$

This t-test has df = n - 1 = 19. In the t-table, we find the critical value to be
t* = 2.093 for df = 19 and α = 0.05 for a 2-sided test. Since our t-statistic is not
further out in the tail than ±t* (|t| = 1.68 < 2.093), we cannot reject the null
hypothesis. There is not enough evidence that Harvard's mean heart rate is
different from that of the rest of the US.
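Python's standard library has no t distribution. This sketch therefore hardcodes the table value t* = 2.093 used above (two-sided α = 0.05, df = 19) and compares it with the computed statistic.

```python
from math import sqrt

xbar, mu0, s, n = 65.95, 69, 8.11, 20
df = n - 1

# Two-sided critical value for alpha = 0.05 and df = 19, read from a t table.
t_crit = 2.093

se = s / sqrt(n)              # estimated standard error using the sample SD
t_stat = (xbar - mu0) / se
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f} on {df} df, critical value = {t_crit}, reject H0: {reject}")
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, df)` returns the same critical value (about 2.093) instead of a table lookup.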

4. Is there a significant difference in team batting average in the American
League (AL) vs. the National League (NL) in Major League Baseball? The AL
had an overall batting average of 0.267 with a standard deviation across the
14 teams of 0.0101. The NL had an overall batting average of 0.260 with a
standard deviation across the 16 teams of 0.0099.
a. Calculate the 95% confidence interval to estimate the true mean
difference in batting average between the two leagues.

(.267-.260) +- 1.96 * sqrt((.0101)^2/14 + (.0099)^2/16) = (-.000178, .014178)
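Below is a sketch of the two-sample interval in Python. The handout uses z* = 1.96. Because the Key Concepts list t-based intervals and both samples are small, the sketch also computes a t-based interval. That version uses the conservative choice df = min(n1, n2) - 1 = 13 and t* = 2.160 from a t table; this df choice is an assumption here. The Welch approximation used by most software would give more degrees of freedom and a slightly smaller t*.

```python
from math import sqrt

# League summaries from the problem statement
mean_al, sd_al, n_al = 0.267, 0.0101, 14
mean_nl, sd_nl, n_nl = 0.260, 0.0099, 16

diff = mean_al - mean_nl
se = sqrt(sd_al**2 / n_al + sd_nl**2 / n_nl)   # unpooled standard error

def interval(multiplier):
    """Return (lower, upper) for diff +/- multiplier * se."""
    return diff - multiplier * se, diff + multiplier * se

z_ci = interval(1.96)    # the handout's normal-based interval
t_ci = interval(2.160)   # t* for df = min(n_al, n_nl) - 1 = 13, from a t table

for name, (lo, hi) in (("z", z_ci), ("t", t_ci)):
    print(f"{name}: ({lo:.6f}, {hi:.6f}), contains 0: {lo <= 0 <= hi}")
```

Both intervals contain 0, so the conclusion in part (b) does not depend on the choice between z* and t*.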
b. Is there statistical evidence to support the claim that the batting
average is different in the two leagues?
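No. The confidence interval includes 0, the value of the difference under the
null hypothesis of no difference, so at the 5% level we do not have evidence
that the two leagues' mean batting averages differ.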