Stats solution !!!!

Key Concepts

two proportions

Hypothesis Testing

Review

1. The weight of an adult swan is normally distributed with a mean of 30 pounds

and a standard deviation of 10 pounds. A farmer randomly selected 36 swans

and loaded them into his truck. What is the probability that this flock of

swans weighs more than 1000 pounds?

The wrong approach to this problem is to find the average weight and standard

deviation of a flock of swans and then calculate the Z-score from that. We have

no information about a flock of swans and how that weight is distributed; we are

told that individual swan weights are normally distributed. If a flock of swans

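needs to weigh more than 1000 pounds, then the average weight of the swans
needs to be more than 1000/36 pounds. If the true weight of an adult swan is
$X \sim N(30, 10^2)$, then the mean weight of 36 swans is
$\bar{X} \sim N(30, 10^2/36)$. We then have:

$$P\left(\bar{X} > \frac{1000}{36}\right) = P\left(\frac{\bar{X} - 30}{10/\sqrt{36}} > \frac{1000/36 - 30}{10/\sqrt{36}}\right) = P(Z > -1.33) = 0.908$$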
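As a quick numerical check of the swan calculation, here is a minimal sketch using Python's standard-library `NormalDist`. The function name `flock_exceeds_probability` is a hypothetical helper introduced only for illustration. Because it uses the unrounded z-value of about -1.333 rather than the table value -1.33, the result differs slightly in the third decimal place from the table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def flock_exceeds_probability(mu, sigma, n, total):
    """P(total weight of n swans > total), computed via the sample mean.

    Assumes the n weights are independent Normal(mu, sigma) draws, so the
    sample mean is Normal(mu, sigma / sqrt(n)).
    """
    mean_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean
    threshold = total / n                        # required average weight per swan
    return 1 - mean_dist.cdf(threshold)


p = flock_exceeds_probability(mu=30, sigma=10, n=36, total=1000)
print(round(p, 4))
```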

2. The weights of boxes of cookies approximately follow a normal distribution with a mean of 202 g and a

standard deviation of 3 g.

a. Between what values do the middle 95% of the weights of boxes of

cookies lie?

From a normal distribution, we know that 95% of the observations lie within

1.96 standard deviations of the mean. Therefore:

$\mu \pm 1.96\sigma = 202 \pm 1.96(3) = (196.12, 207.88)$

b. If the manufacturer stamps 206 g on all the boxes, what percent of

boxes of cookies are overweight?

A box is overweight if it weighs more than 206 g.

$P(X > 206) = P\left(\frac{X - \mu}{\sigma} > \frac{206 - 202}{3}\right) = P(Z > 1.33) = 1 - 0.9082 = 0.0918$
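The two normal-distribution calculations for the cookie boxes can be reproduced with a short sketch, assuming only the stated mean of 202 g and standard deviation of 3 g. The variable names are illustrative. The overweight probability comes out near 0.091 rather than 0.0918 because the code does not round z to 1.33.

```python
from statistics import NormalDist

# Box weights, as stated in the problem: Normal(mean 202 g, sd 3 g).
box = NormalDist(mu=202, sigma=3)

# Middle 95%: cut off 2.5% in each tail.
lower = box.inv_cdf(0.025)
upper = box.inv_cdf(0.975)

# Share of boxes heavier than the stamped 206 g.
p_overweight = 1 - box.cdf(206)

print(f"middle 95%: ({lower:.2f}, {upper:.2f})")
print(f"P(overweight) = {p_overweight:.4f}")
```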

c. Suppose the company makes a profit of $2 for each box of cookies that

is not overweight, but only $1.50 for the overweight boxes of cookies.

If they sell 500 boxes of cookies what is the expected value and

variance of profit?

This is a binomial problem because the boxes are either overweight or they

are not.

Let X be the number of overweight boxes, and Y be profit. The profit equation

would be:

Y = 2(500 - X) + 1.5X = 1000 - 0.5X

The expected value and variance of the number of overweight boxes can be
determined from the binomial distribution:

E(X) = np = 500(0.0918) = 45.9

Var(X) = np(1-p) = 500(0.0918)(0.9082) = 41.69

The profit can be determined from the a + bX transformation:

E(Y) = 1000-0.5*E(X) = 1000-0.5*45.9 = 977.05

Var(Y) = (-0.5)^2 * Var(X) = 10.42
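The linear-transformation rules used here, E(a + bX) = a + bE(X) and Var(a + bX) = b^2 Var(X), can be checked with a small sketch. The helper `profit_mean_and_variance` is a hypothetical name, and p = 0.0918 is the rounded overweight probability from part (b).

```python
def profit_mean_and_variance(n, p, a, b):
    """Mean and variance of Y = a + b*X, where X ~ Binomial(n, p)."""
    mean_x = n * p              # E(X) = np
    var_x = n * p * (1 - p)     # Var(X) = np(1 - p)
    return a + b * mean_x, b ** 2 * var_x


# Y = 1000 - 0.5X with X = number of overweight boxes out of 500.
mean_y, var_y = profit_mean_and_variance(n=500, p=0.0918, a=1000, b=-0.5)
print(f"E(Y) = {mean_y:.2f}, Var(Y) = {var_y:.2f}")
```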

Practice Problems

3. An investigator wants to study whether or not Harvard students have similar

aerobic conditioning as the general US population. In the general US

population age 18-29, it is known that the mean heart rate is 69 beats per

minute with a standard deviation of 6 beats per minute. We would like to use

this section as a random sample in order to investigate this.

a. What are our hypotheses? What should the level of the test ($\alpha$) be set
to? What would be a Type I error? A Type II?

$H_0: \mu = 69$
$H_a: \mu \neq 69$ (2-sided)
$\alpha = 0.05$

A Type I error would be one where we reject the null hypothesis even though

it is true. In this case, we would claim that Harvard students have a different mean heart
rate even though it is the same as the population's. A Type II error
would be not rejecting the null hypothesis when it is false, that is, claiming that

Harvard has the same rate as the population when it does not.

b. [...] set the [...] the hypothesized mean does our sample mean have to be to reject $H_0$?

This is a 2-sided test. The investigator has no a priori assumption about

whether the average heartrate will be lower or higher.

We will be performing a z-test (since we know the true standard deviation in

the population, $\sigma = 6$).

If our z-statistic is further in the tails than 1.96 or -1.96, we will reject $H_0$.

c.

Now take the mean and SD of the heartrates of this class. What is the

conclusion from this test?

Let us assume that the average heart rate of this class is 65.95, standard

deviation 8.11, and there are 20 students in this section:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{65.95 - 69}{6/\sqrt{20}} = \frac{-3.05}{1.3416} = -2.27$$

We reject the null hypothesis because |Z| > 1.96. There is enough evidence

that Harvard's mean heart rate is different from that of the rest of the US; in fact, it is
lower.
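The z-test in part (c) can be sketched in a few lines, assuming the class summary quoted in the solution (mean 65.95, n = 20) and the known population standard deviation of 6. The function name `z_test_two_sided` is hypothetical, and the two-sided p-value it returns is an addition for illustration; the solution itself only compares |z| with 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n):
    """Return (z, p_value) for a two-sided one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value


z, p_value = z_test_two_sided(xbar=65.95, mu0=69, sigma=6, n=20)
reject_null = abs(z) > 1.96  # decision rule at the 0.05 level
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```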

d. [...] interval built upon the above sample to include the null mean, $\mu_0$, or

not? Calculate this confidence interval.

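No, the interval should not include $\mu_0 = 69$ since we rejected that hypothesis
in part (c) above. The calculation is:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 65.95 \pm 1.96 \cdot \frac{6}{\sqrt{20}} = 65.95 \pm 2.63 = (63.32, 68.58)$$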

*Note: we chose $z^*$ to be 1.96 since that is the value in a standard normal

distribution that puts 0.025 in each tail (so that 95% fall between -1.96 and

1.96).

e. [...] narrower. This makes sense since at 90% we are less confident the interval

contains the true mean, so we do not need to make it as wide. For a 99%

confidence interval, z = 2.58, so the interval will be wider.
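To see how the confidence level changes the width of the interval in parts (d) and (e), here is a minimal sketch that computes z-based intervals at 90%, 95%, and 99% for the same sample (mean 65.95, known standard deviation 6, n = 20). The helper `z_interval` is a hypothetical name, and the critical values come from `NormalDist.inv_cdf` rather than a table, so they are unrounded.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level):
    """Two-sided z confidence interval for a mean with known sigma."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


intervals = {level: z_interval(65.95, 6, 20, level) for level in (0.90, 0.95, 0.99)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

Only the 99% interval is wide enough to contain 69, which is consistent with rejecting the null hypothesis at the 0.05 level but not at the 0.01 level.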

f. [...] you propose we use instead? What test should we then perform?

Perform this formal test.

The most logical solution would be to use the sample standard deviation (s)

as an estimate for $\sigma$. The resulting test statistic would no longer follow a
normal distribution.

This actually would be a t-test based on the same hypotheses. Here, we

found the standard deviation in the sample of data to be s = 8.11. Thus, we

would calculate the t-statistic to be:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{65.95 - 69}{8.11/\sqrt{20}} = \frac{-3.05}{1.813} = -1.68$$

This t-test has df = n - 1 = 19. In the t-table, we find the critical value to be
$t^* = 2.093$ at $\alpha = 0.05$ for a 2-sided test. Since our t-statistic is not
further out in the tail than $t^*$, we cannot reject the null hypothesis. There is
not enough evidence that Harvard's mean heart rate is different from that of the rest
of the US.
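The t-test decision can be sketched as follows, using the sample figures from the solution (mean 65.95, s = 8.11, n = 20). Python's standard library has no t-distribution, so the critical value 2.093 is hard-coded here as an assumption taken from a standard t-table (df = 19, two-sided, 0.05 level). The names `t_statistic` and `T_CRIT_DF19` are illustrative.

```python
from math import sqrt


def t_statistic(xbar, mu0, s, n):
    """One-sample t-statistic using the sample standard deviation s."""
    return (xbar - mu0) / (s / sqrt(n))


# Assumption: two-sided 5% critical value for df = 19, read from a t-table.
T_CRIT_DF19 = 2.093

t = t_statistic(xbar=65.95, mu0=69, s=8.11, n=20)
reject_null = abs(t) > T_CRIT_DF19
print(f"t = {t:.2f}, reject H0: {reject_null}")
```

The sample mean is the same as in the z-test; only the larger standard error (s = 8.11 instead of 6) and the wider critical value change the conclusion.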

4. [...] American League (AL) vs. the National League (NL) in Major League Baseball? The AL

had an overall batting average of 0.267 with a standard deviation across the

14 teams of 0.0101. The NL had an overall batting average of 0.260 with a

standard deviation across the 16 teams of 0.0099.

a. Calculate the 95% confidence interval to estimate the true mean

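difference in batting average between the two leagues.

$$(\bar{x}_{AL} - \bar{x}_{NL}) \pm z^* \sqrt{\frac{s_{AL}^2}{n_{AL}} + \frac{s_{NL}^2}{n_{NL}}} = 0.007 \pm 1.96(0.003662) = (-0.000178, 0.014178)$$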

b. Is there statistical evidence to support the claim that the batting

average is different in the two leagues?

No, because the confidence interval includes the null value of 0,
which corresponds to no difference.
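The interval for the difference in league batting averages can be checked with a short sketch. It assumes a normal critical value (about 1.96), which is the choice that reproduces an upper bound of 0.014178; with only 14 and 16 teams, a t-based critical value would give a somewhat wider interval. The function name `difference_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_interval(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """z-based confidence interval for the difference of two independent means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    return diff - z_star * standard_error, diff + z_star * standard_error


# AL: mean 0.267, sd 0.0101, 14 teams; NL: mean 0.260, sd 0.0099, 16 teams.
low, high = difference_interval(0.267, 0.0101, 14, 0.260, 0.0099, 16)
contains_zero = low < 0 < high
print(f"({low:.6f}, {high:.6f}), contains 0: {contains_zero}")
```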

