
Stat 1001

Winter 1998
Geyer
Homework 8
Problem 21-4
(a) The observed percentage is 36.1%. We estimate the standard error by the
"bootstrap procedure," plugging in 0.361 for the (unknown) fraction of 1's in
the box model.

SD of the box. $\sqrt{0.361 \times 0.639} = 0.48$.

SE for the percentage. $(0.48 / \sqrt{6{,}000}) \times 100\% = 0.62\%$.

Confidence interval. Observed percentage $\pm$ 2 SE is $36.1\% \pm 1.2\%$.

(b) The observed percentage is 95.2%. We estimate the standard error by the
"bootstrap procedure," plugging in 0.952 for the (unknown) fraction of 1's in
the box model.

SD of the box. $\sqrt{0.952 \times 0.048} = 0.2138$.

SE for the percentage. $(0.2138 / \sqrt{6{,}000}) \times 100\% = 0.276\%$.

Confidence interval. Observed percentage $\pm$ 2 SE is $95.2\% \pm 0.5\%$.
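The arithmetic for both parts follows the same three steps, so it can be checked with a short script. This is only a sketch: the function name `percentage_interval` and its parameters are made up for illustration, and it carries the SD of the box at full precision rather than rounding it first.

```python
import math

def percentage_interval(observed_fraction, n_draws, n_se=2):
    """Bootstrap-style SE and interval for a sample percentage.

    The observed fraction is plugged in for the unknown fraction
    of 1's in the box (the "bootstrap procedure" of the text).
    """
    sd_box = math.sqrt(observed_fraction * (1 - observed_fraction))
    se_percent = sd_box / math.sqrt(n_draws) * 100
    center = observed_fraction * 100
    interval = (center - n_se * se_percent, center + n_se * se_percent)
    return sd_box, se_percent, interval

for label, fraction in [("a", 0.361), ("b", 0.952)]:
    sd_box, se_percent, (low, high) = percentage_interval(fraction, 6000)
    print(f"({label}) SD of box = {sd_box:.4f}, SE = {se_percent:.3f}%, "
          f"interval = {low:.2f}% to {high:.2f}%")
```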

Problem 21-6
The calculation correctly follows the formulas explained in the book, but is
wrong for a reason you have to think to see.

A confidence interval is only as good as the box model on which it is based.
If the box model doesn't correspond to reality, neither does the confidence
interval.
The problem here is that stock price movements are dependent. If the stock
went up today, then it is likely to keep going up tomorrow. Draws from a box
model are independent. They can't model dependent data.

Problem 21-11
(ii) is correct. (i) is incorrect. Suppose the "reliable to within two percentage
points" is intended to represent 95% confidence (which is typical for public
opinion polls). This means that the error is smaller than two percentage points
in 95% of samples and larger than that in 5% of samples. 95% is not "virtually
all."
The Moral of the Story. There is a right way to think about confidence
intervals (ii) and a wrong way (i). The right way keeps sampling variability in
mind, the wrong way forgets. Humans have a hard time relating to randomness.
Even people who know better slip into the wrong way of thinking when they
don't try hard to keep the right way in mind. It's just much easier, and much
more natural to humans, to ignore randomness.
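A small simulation can make the "95% of samples" reading concrete. This is a sketch under made-up assumptions, not anything from the problem itself: the true fraction of 0.5, the sample size of 2,500 (chosen so that 2 SE works out to two percentage points), and the 1,000 repeated polls are all hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_FRACTION = 0.5   # hypothetical fraction of 1's in the box
SAMPLE_SIZE = 2500    # hypothetical; SE = 0.5 / sqrt(2500) = 1 percentage point
N_POLLS = 1000        # hypothetical number of repeated polls

expected_ones = TRUE_FRACTION * SAMPLE_SIZE
max_error_ones = 0.02 * SAMPLE_SIZE  # two percentage points, as a count

within_two_points = 0
for _ in range(N_POLLS):
    # One poll: SAMPLE_SIZE independent draws from the 0-1 box.
    ones = sum(random.random() < TRUE_FRACTION for _ in range(SAMPLE_SIZE))
    if abs(ones - expected_ones) <= max_error_ones:
        within_two_points += 1

share_within = within_two_points / N_POLLS
print(f"{share_within:.1%} of simulated polls were within two points")
```

Under these assumptions the share should come out near 95%, with the remaining polls missing by more than two points: most samples, but not virtually all.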

Problem 21-12
(ii) is the histogram for the values drawn. There are only three possible values
because there are only three different numbers 1, 2, and 5 on the tickets in the
box. This is the only histogram of the three with just three rectangles.
(iii) is the probability histogram for the sum. By the central limit theorem it
is close to the normal curve. Also there are many possible values for the sum of
draws. (i) doesn't have enough rectangles and isn't close enough to the normal
curve.

Problem 21-13
The expected value and the SE are the same for all three parts. They are not
random. As in several previous problems, we don't have to calculate the average
and SD of the box [0 1]. Both are 0.5 (Example 5 of Chapter 17, p. 301).

Expected value. $1{,}000 \times 0.5 = 500$.

SE. $\sqrt{1{,}000} \times 0.5 = 15.81$.

Chance error. This is observed value $-$ expected value. For (a) $529 - 500 = 29$,
for (b) $484 - 500 = -16$, and for (c) $514 - 500 = 14$.
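The same numbers can be reproduced in a few lines. This is an illustrative sketch with made-up variable names; expressing each chance error in SE units is an added convenience, not part of the solution above.

```python
import math

box = [0, 1]
n_draws = 1000

# Average and SD of the box, computed directly from the tickets.
box_average = sum(box) / len(box)
box_sd = math.sqrt(sum((t - box_average) ** 2 for t in box) / len(box))

# Expected value and SE for the sum of the draws.
expected_sum = n_draws * box_average
se_sum = math.sqrt(n_draws) * box_sd

# Observed sums for parts (a), (b), (c), taken from the solution above.
observed = {"a": 529, "b": 484, "c": 514}
chance_errors = {part: value - expected_sum for part, value in observed.items()}

for part, error in chance_errors.items():
    print(f"({part}) chance error = {error:+.0f}, about {error / se_sum:+.2f} SE")
```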

Problem 23-2
It doesn't matter that we don't know the SD of the box. We can estimate the
SD using the SD of the sample and plug that in for the unknown SD of the box.
This is the "bootstrap" principle (p. 416).

The SE for the average of draws is estimated by
$$\text{SE for average of draws} = \frac{\text{SD of box}}{\sqrt{\text{number of draws}}},$$
in this case $2.3 / \sqrt{500} = 0.10$.

Thus (a) and (b) are correct. The observed value is likely to be off by about
1 SE, and observed value $\pm$ 1 SE is a 68% confidence interval.
(c) is wildly wrong. Nothing about the theory we have learned tells us that.
If the box follows the normal curve, about 68% of the tickets are in the range
average of box $\pm$ 1 SD of box. That's $71.3 \pm 2.3$, a much wider interval! If the
box doesn't follow the normal curve, we can't even say that.
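The contrast between the two intervals is easy to see side by side. The sketch below uses the figures quoted in the solution (71.3, 2.3, and 500 draws); the variable names are illustrative only.

```python
import math

sample_average = 71.3  # value the solution uses for the average
sample_sd = 2.3        # sample SD, plugged in for the unknown SD of the box
n_draws = 500

# SE for the average of draws = SD of box / sqrt(number of draws).
se_average = sample_sd / math.sqrt(n_draws)

# 68% confidence interval for the AVERAGE of the box: average +/- 1 SE.
confidence_interval = (sample_average - se_average, sample_average + se_average)

# Range holding about 68% of the TICKETS: average +/- 1 SD.
# This only applies if the box follows the normal curve.
ticket_range = (sample_average - sample_sd, sample_average + sample_sd)

print(f"SE of average: {se_average:.2f}")
print(f"68% confidence interval: {confidence_interval[0]:.1f} to {confidence_interval[1]:.1f}")
print(f"about 68% of tickets:    {ticket_range[0]:.1f} to {ticket_range[1]:.1f}")
```

The ticket range is wider than the confidence interval by a factor of $\sqrt{500}$, roughly 22, which is why statement (c) is so far off.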

Problem 23-7
(iii) This is a convenience sample. Probability doesn't apply. The notion of
standard error is meaningless in this context.

Problem 23-8
We need to work out the SE for the average. Our "bootstrap" estimate is
$6{,}500 / \sqrt{400} = 325$. So the "give or take" number here is one SE.

(a) True. This interval is observed average $\pm$ 1 SE, which indeed is a 68%
confidence interval.
(b) True. That's what a confidence interval says.
(c) False. It would be true that population average $\pm$ population SD contains
about 68% of the population, if the population follows the normal curve.
But a bit of thought tells you this population doesn't. Consider Minnesota,
Macalester, Carleton, and St. Olaf. There are a few really big ones and lots of
little ones.
Even if the population did follow the normal curve, the probability would be
a bit different from 68% because 3,700 is not the population average and 6,500
is not the population SD. They are the sample average and sample SD. Close,
but not exactly right.
(d) False. This is wrong for three reasons. The main reason it is wrong, wildly
wrong, is that it confuses the amount of population covered with the amount of
confidence, the same error made in Problem 23-2 (c).
Two minor reasons it is wrong are the reasons for part (c), which apply here
too.

(e) False. Whether the data follow the normal curve or not is irrelevant. The
central limit theorem says the average follows the normal curve (approximately)
regardless of the distribution of the population whenever the sample size is large
enough. 400 is large enough.

Problem 23-9
Using the "bootstrap" principle we estimate the SD of the box model as 2.3
(the sample SD) and the SE of the average as $2.3 / \sqrt{2{,}500} = 0.046$. Then
sample average $\pm$ 2 SE is $1.7 \pm 0.09$ papers.

Problem 23-11
Average of the box. $(1 + 2 + 3 + 4 + 5) / 5 = 3$.

SD of the box. Deviations from average: $-2$, $-1$, 0, 1, 2. Squared deviations:
4, 1, 0, 1, 4. Sum of squared deviations: 10. Average of squared deviations: 2.
SD of box: $\sqrt{2}$.

SE of the average of draws. $\sqrt{2} / \sqrt{25} = 0.2828$.

Now we can look at the figure. The histogram is centered at 3, which is the
average of the box. That's o.k.
The histogram looks to have an SD of about 0.5. It couldn't have an SD as
small as 0.3. That would have 95% of the area between 2.4 and 3.6, which is
clearly wrong.
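The step-by-step SD calculation above can be mirrored in code. This is a minimal sketch with illustrative variable names; it also prints the range average $\pm$ 2 SE, using the exact SE of about 0.28 rather than the rounded 0.3 mentioned above.

```python
import math

box = [1, 2, 3, 4, 5]
n_draws = 25

# Average of the box.
box_average = sum(box) / len(box)

# Deviations, squared deviations, and their average.
deviations = [t - box_average for t in box]
squared_deviations = [d ** 2 for d in deviations]
average_squared_deviation = sum(squared_deviations) / len(box)

# SD of the box and SE for the average of the draws.
box_sd = math.sqrt(average_squared_deviation)
se_average = box_sd / math.sqrt(n_draws)

# By the normal approximation, about 95% of the area lies within 2 SE.
low, high = box_average - 2 * se_average, box_average + 2 * se_average
print(f"SD of box = {box_sd:.4f}, SE of average = {se_average:.4f}")
print(f"about 95% of the area between {low:.2f} and {high:.2f}")
```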

Problem 23-S-16
The quick answer is that the addition rule doesn't apply, so the answer is false.
What other than the addition rule would give twice the probability for twice the
spins? But the addition rule doesn't apply because the spins are independent,
which means they can't be mutually exclusive. (See Problem 14.5.) If that isn't
enough explanation, we have to actually calculate probabilities.
The pattern of calculation for this problem should be familiar. It is like
Problems 13.8(c) and 14.7 and the paradox of the Chevalier de Méré.
No rule gives directly the probability of "any 7's in 15 spins." Thus we
look at the opposite event, which is "no 7's in 15 spins." This probability we
can calculate by the multiplication rule. The probability of no 7's in one spin
is $37/38$, so the probability of no 7's in 15 spins is $(37/38)^{15} = 0.67$, and the
probability of any 7's is $1 - 0.67 = 0.33$.

The same pattern of calculation for 30 spins gives $1 - (37/38)^{30} = 0.55$.
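The "opposite event" pattern is short enough to write as a function. This is a sketch; the name `prob_at_least_one` is made up here, and the default of 1/38 matches the single-spin chance used above.

```python
def prob_at_least_one(n_spins, p_single=1 / 38):
    """Chance of at least one 7 in n_spins independent spins.

    Uses the opposite event: P(any 7's) = 1 - P(no 7's),
    where P(no 7's) comes from the multiplication rule.
    """
    prob_none = (1 - p_single) ** n_spins
    return 1 - prob_none

p15 = prob_at_least_one(15)
p30 = prob_at_least_one(30)
print(f"15 spins: {p15:.2f}")
print(f"30 spins: {p30:.2f}")
print(f"twice the 15-spin chance would be {2 * p15:.2f}")
```

Doubling the number of spins raises the chance from about 0.33 to about 0.55, not to 0.66, which is the point of the problem.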

Problem 23-S-17
This is a "counting and classification problem": we want to count ones and
sixes, so we recode the tickets, putting ones on the tickets we want to count
and zeros on the rest. The original box is [1 2 3 4 5 6]; after recoding
we have [1 0 0 0 0 1]. The average of this box is $1/3$ and the SD is
$\sqrt{\tfrac{1}{3} \times \tfrac{2}{3}} = 0.4714$. The expected value for the count for 20 rolls is $20/3 = 6.667$.
The SE for the count for 20 rolls is $\sqrt{20} \times 0.4714 = 2.108$.
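The recoding step and the resulting numbers can be sketched as follows. The variable names are illustrative, and the SD uses the shortcut for a 0-1 box (the square root of the fraction of ones times the fraction of zeros), which is the formula applied above.

```python
import math

original_box = [1, 2, 3, 4, 5, 6]
n_rolls = 20

# Recode: 1 on the tickets we want to count (ones and sixes), 0 on the rest.
recoded_box = [1 if ticket in (1, 6) else 0 for ticket in original_box]

# Average of a 0-1 box is the fraction of ones.
box_average = sum(recoded_box) / len(recoded_box)

# SD shortcut for a 0-1 box.
box_sd = math.sqrt(box_average * (1 - box_average))

# Expected value and SE for the count (the sum of the recoded draws).
expected_count = n_rolls * box_average
se_count = math.sqrt(n_rolls) * box_sd

print(f"recoded box: {recoded_box}")
print(f"average = {box_average:.4f}, SD = {box_sd:.4f}")
print(f"expected count = {expected_count:.3f}, SE = {se_count:.3f}")
```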

Problem 23-S-27
(a) $54 / 500 \times 100\% = 10.8\%$.

(b) Using the "bootstrap" principle we assume the observed percentage 10.8%
is like the percentage of draws from a box with 10.8% ones and 89.2% zeros.
The SD for this box is $\sqrt{0.108 \times 0.892} = 0.31$. The SE for sample size 500 is
$(0.31 / \sqrt{500}) \times 100\% = 1.4\%$. Thus the answer to (b) is 1.4%.

(c) The confidence interval is observed percentage $\pm$ 2 SE, which is $10.8 \pm 2.8$,
or from 8.0 to 13.6.


The sentence finishes "... percentage of independents among all registered
voters in Hayward, California."
