
Stat 1001

Winter 1998
Geyer
Homework 7
Problem 18.1
20 and 25.

Problem 18.2
(a)
Average of the box. (1 + 3 + 5 + 7)/4 = 4.

SD of the box. The deviations from the average are −3, −1, 1, 3. The squared
deviations are 9, 1, 1, 9. The sum of the squared deviations is 20. The average
squared deviation is 20/4 = 5 and the SD is √5 = 2.236.

Expected value for the sum of draws. 400 × 4 = 1,600.

SE for the sum of draws. √400 × 2.236 = 44.72.

Conversion to standard units. 1,500 is 100/44.72 = 2.236 SE below the
expected value. That is, −2.236 in standard units.

Table look-up. The area we want to look up is the area above 1,500 in original
units (−2.236 in standard units).

[Figure: normal curve with the area above 1,500 shaded; horizontal axes: value of the sum (1500, 1600, 1700) and standard units (−3 to 3).]

The normal curve tail area table says the tail (the unshaded area) is 1.3%
(between 1.39 and 1.22, roughly 1.27). The shaded area is
100 − 1.27 = 98.7 percent.
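The arithmetic in part (a) can be checked with a few lines of Python. This is only a sketch, not part of the textbook's method: the variable names are made up for illustration, and `statistics.NormalDist` stands in for the normal curve table in the book, so the last digit may differ slightly from a table look-up.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 3, 5, 7]   # the tickets in the box
n_draws = 400        # number of draws, made with replacement

box_avg = mean(box)     # average of the box: 4
box_sd = pstdev(box)    # SD of the box (population SD): sqrt(5), about 2.236

ev_sum = n_draws * box_avg          # expected value for the sum: 1,600
se_sum = sqrt(n_draws) * box_sd     # SE for the sum: about 44.72

z = (1500 - ev_sum) / se_sum             # 1,500 in standard units: about -2.236
chance_above = 1 - NormalDist().cdf(z)   # normal-curve area to the right of z

print(f"EV = {ev_sum}, SE = {se_sum:.2f}, z = {z:.3f}, chance = {chance_above:.1%}")
```

Note that `pstdev` divides by the number of tickets, which is what "SD of the box" means here; `stdev` would divide by one less and give a different number.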

(b) For this problem we have to recode the box to count threes. Thus we use
the box 0 1 0 0.

Average of the box. 1/4.

SD of the box. (1 − 0) × √(1/4 × 3/4) = 0.4330.

Expected value for the sum of draws. 400 × 1/4 = 100.


SE for the sum of draws. √400 × 0.4330 = 8.660.

Conversion to standard units. 90 is 10/8.660 = 1.155 SE below the expected
value. That is, −1.155 in standard units.

Table look-up. The area we want to look up is the area below 90 in original
units (−1.155 in standard units).

[Figure: normal curve for the sum of draws; horizontal axes: value of the sum (80, 100, 120) and standard units (−3 to 3).]

The normal curve tail area table says the tail area is 12.4% (between 12.51
and 11.51, much closer to the former).

Continuity correction. To be finicky, we should use "continuity correction."
The plot we should actually be thinking of is the one below.

[Figure: probability histogram for the sum of draws with the rectangles below 90 shaded; horizontal axes: value of the sum (70 to 100) and standard units (−3.5 to 0).]

"Fewer than 90" means numbers below 90, not including 90. The actual
probability histogram for the sum of draws is shown. We want to calculate
the area of the shaded rectangles, corresponding to numbers below 90. This is
approximated by the area under the normal curve below 89.5 in original units,
which is (89.5 − 100)/8.660 = −1.212 in standard units.

The normal curve tail area table says the tail area is 11.3% (between 11.51
and 10.56, much closer to the former). Exact calculation (we don't know how
to do this having skipped Chapter 15, but suffice it to say that I asked a
computer) shows that the exact answer is 11.2%. So the normal approximation
with continuity correction is correct to almost three figures. Without continuity
correction we don't even get two correct figures (12.4 rounds to 12 whereas 11.2
rounds to 11).
Nevertheless, we count 12.4% as a correct answer; you don't have to use
continuity correction on this problem.
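For anyone curious how one might "ask a computer," here is a rough sketch comparing the three numbers discussed above. It is an illustration under assumptions, not necessarily the calculation behind the 11.2% figure: the exact chance is taken to be a sum of binomial probabilities for 0 through 89 threes in 400 draws, and the helper name `normal_chance_below` is made up for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.25                     # 400 draws; one ticket in four counts as a 1
ev = n * p                           # expected value for the sum: 100
se = sqrt(n) * sqrt(p * (1 - p))     # SE for the sum: about 8.660


def normal_chance_below(cutoff):
    """Normal-curve area to the left of `cutoff`, given in original units."""
    return NormalDist(mu=ev, sigma=se).cdf(cutoff)


without_cc = normal_chance_below(90)   # edge placed at 90 (no continuity correction)
with_cc = normal_chance_below(89.5)    # edge placed halfway between 89 and 90

# Exact chance of 89 or fewer threes: add up the binomial probabilities.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(90))

print(f"without correction: {without_cc:.1%}")
print(f"with correction:    {with_cc:.1%}")
print(f"exact (binomial):   {exact:.1%}")
```

The point of the comparison is the same as in the text: moving the edge from 90 to 89.5 brings the normal approximation noticeably closer to the exact answer.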

Problem 18.4
This problem is like Example 1(a) in Section 4. If you don't use "continuity
correction" you can't do the problem at all.
The question is about the sum of 25 draws from the box 0 1. What is
the chance that the sum of draws is exactly 12? (If you are confused by the
phrasing of the question, "12 heads and 13 tails," you have to realize that this
is just the same as saying "12 heads," because if you get 12 heads in 25 tosses,
then the other 13 tosses must be tails.)
We don't need to calculate the average of the box and the SD of the box.
Both are given in Example 5 of Chapter 17 (p. 301). Both are 1/2.

Expected value for the sum of draws. 25 × 1/2 = 12.5.

SE for the sum of draws. √25 × 1/2 = 2.5.

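[Figure: probability histogram for the sum of draws with the rectangle over 12 shaded; horizontal axes: value of the sum (5 to 20) and standard units (−3 to 3).]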

The picture. The sides of the shaded rectangle are 11.5 and 12.5 in the original
units.
Conversion to standard units. 11.5 converted to standard units is
(11.5 − 12.5)/2.5 = −0.4. 12.5 converted to standard units is 0.

Table look-up. Thus we want the area under the normal curve between
z = −0.4 and z = 0. This is half the area between −0.4 and +0.4. That area, from
the normal curve table in the book, is 31.08%. Thus the answer is 15.54%.
The Moral of the Story. Sometimes you must use "continuity correction."
What would you do otherwise?
• If you say both sides of the rectangle are 12, the width of the rectangle is
zero and the area zero. That's completely wrong.
• If you say one side of the rectangle is 12 and the other 13 so the rectangle
has the right width, the answer you get is 15.85%, which has only two
correct figures instead of the three you get using the continuity correction
properly. (The exact answer, from asking a computer, is 15.50%.)
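The three numbers in this comparison are easy to reproduce. The sketch below is one way to do it, with made-up variable names; it uses `statistics.NormalDist` in place of the table in the book and `math.comb` for the exact binomial count, so treat it as a check on the arithmetic rather than the official method.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 25                                 # number of tosses
ev = n * 0.5                           # expected value for the sum: 12.5
se = sqrt(n) * 0.5                     # SE for the sum: 2.5
curve = NormalDist(mu=ev, sigma=se)    # normal curve in original units

# Rectangle centered over 12, edges at 11.5 and 12.5 (continuity correction).
approx_centered = curve.cdf(12.5) - curve.cdf(11.5)

# Rectangle with the right width but the wrong edges, 12 and 13.
approx_shifted = curve.cdf(13) - curve.cdf(12)

# Exact chance: ways to pick which 12 tosses are heads, out of 2**25 outcomes.
exact = comb(n, 12) / 2**n

print(f"edges 11.5 and 12.5: {approx_centered:.2%}")
print(f"edges 12 and 13:     {approx_shifted:.2%}")
print(f"exact:               {exact:.2%}")
```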

Center the Rectangles over the Numbers


In a classifying and counting problem the sum of draws from the box is a
count (integer). In drawing the probability histogram, center the rectangles
over the integers.

The picture for this problem is an example.

Continuity Correction
When you have centered the rectangles in the probability histogram over
the integers, the side of a rectangle is never an integer; it is always halfway
between two integers (something point five).

This problem and the finicky solution to Problem 18.2 are examples.
The two boxes tell you how to use the normal approximation for probability
histograms the Right Way (with a capital R and a capital W), using what
the book calls "keeping track of the edges of the rectangles" and everyone else
calls "continuity correction" (never mind why; it's a misleading name, but it is
traditional).

Problem 18.6
We should be comparing the error to the SE for the sum of draws. As in
Problem 18.4, the relevant box is 0 1. We are interested in the sum of one
million draws from this box. Also as in Problem 18.4, we don't need to calculate
the average of the box and the SD of the box. Both are given in Example 5 of
Chapter 17 (p. 301). Both are 1/2.
Thus the SE for the sum of draws is √1,000,000 × 1/2 = 1,000 × 1/2 = 500.

The error should be about 500 or so. 95% of the time it will be less than
2 SE, which is 1,000. An error of 2,015 is more than 4 SE. The normal curve
tail area table says this happens less than one time in 10,000. Looks like the
computer program is buggy.
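A quick sketch of this check, with the caveat that it makes one choice the solution above leaves open: it counts an error of 2,015 or more in either direction (two tails). A one-tail count is half as big and leads to the same conclusion. The names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n_tosses = 1_000_000
box_sd = 0.5                           # SD of the 0-1 box for a fair coin

se_sum = sqrt(n_tosses) * box_sd       # SE for the number of heads: 500
error = 2015                           # how far the program's count was off
z = error / se_sum                     # the error in standard units: 4.03

# Chance of an error at least this large, in either direction.
chance = 2 * (1 - NormalDist().cdf(z))

print(f"SE = {se_sum:.0f}, error = {z:.2f} SE, chance = {chance:.6f}")
```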

Problem 18.7
Like box (ii). The roll of each die is like one draw from box (ii). The dice are
independent. Hence the total number of spots on the dice is like the sum of two
draws with replacement from box (ii).

Problem 18.12
(a) No. You aren't given enough information to calculate the average and SD
of the box.
(b) Yes. Knowing the average and the SD of the box and the number of draws
allows you to calculate the expected value and SE for the sum of draws, and
that's enough to use the normal approximation.

Problem 18.13
(a) No. There could be anywhere from zero to four 3's in the box. We would need
to be told how many.
(b) No. The average and SD of the original box are no help. We need to know
the average and SD of the box that has 3's replaced by 1's and all the other
numbers replaced by 0's (as in any "classifying and counting problem"). We
haven't been told that.

Problem 18.14
(a) Yes. We need to know the average and SD of the box that has positive
numbers replaced by 1's and negative numbers replaced by 0's (as in any
"classifying and counting problem"). We know there are four positive numbers
and six negative ones. So the relevant box is 1 1 1 1 0 0 0 0 0 0.
We can calculate the average (0.4) and SD (√(0.4 × 0.6) = 0.49) of this box,
and so forth.
(b) This is now moot. We don't need any extra information; we already had
enough.
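As a sanity check on part (a), the shortcut formula for the SD of a 0-1 box can be compared with the SD computed directly from the ten tickets. This is just an illustrative sketch with made-up names.

```python
from math import sqrt
from statistics import mean, pstdev

# Recoded box: four positive numbers become 1's, six negative numbers become 0's.
box = [1] * 4 + [0] * 6

box_avg = mean(box)                               # fraction of 1's: 0.4
sd_shortcut = sqrt(box_avg * (1 - box_avg))       # sqrt(0.4 * 0.6), about 0.49
sd_direct = pstdev(box)                           # root-mean-square deviation

print(f"average = {box_avg}, SD (shortcut) = {sd_shortcut:.4f}, SD (direct) = {sd_direct:.4f}")
```

Both routes give the same number, which is why the shortcut is safe to use for any box containing only 0's and 1's.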

Problem 19.2
Yes or no, it depends on what "similar" means. I would expect some response
bias in the first survey. If you feed people an answer, some of them will take it,
regardless of whether it is correct. See p. 344. The mere order in which choices
are given affects the choice. Asking to see what detergent is being used leaves
no opportunity for response bias.
If you objected to the word "housewives" in the question, you missed the
point. The authors of the textbook didn't phrase it well, but surely a real
marketing survey would ask whoever does the laundry.

Problem 19.4
There is no random sampling here. People chose the blocks. They thought those
blocks were "most representative," but people aren't any good at such decisions.
That's why random samples are used.
This can be called a quota sample, but quota sampling isn't a good method.
See the box on p. 339 in the textbook.

Problem 19.5
This is a totally bogus way of dealing with nonresponse bias. The 347 households
in the new sample are just like the responders in the first sample. Nothing has
been found out about nonresponders.
There's no way to tell for sure which way the nonresponse bias goes.
A guess is that the smaller households are more likely to be nonresponders (with
fewer people, it's less likely that one is interested in responding). This would
make the estimate too high.
What the finance department should have done is go back to the 347
nonresponding households in the original survey and see if they could get
responses from them on the second try (or third try, etc.).

Problem 20.1
                      Number of heads      Percentage of heads
Number of tosses        EV        SE          EV        SE
         100            50         5         50%        5%
       2,500         1,250        25         50%        1%
      10,000         5,000        50         50%      0.5%
   1,000,000       500,000       500         50%     0.05%

The expected number of heads is the number of draws times 0.5 (the average of
the box). The expected percentage of heads is always 50% (the percentage in the
box, which is the same thing as the population percentage). The standard errors
are given by

SE for number = √(number of draws) × (SD of box)                  (1)
SE for percentage = (SD of box)/√(number of draws) × 100%         (2)

(The SD of the box is also 0.5.)
The Moral of the Story. Repeating what is said at the bottom of p. 360:
• The SE for the sample number goes up like the square root of the sample
size.
• The SE for the sample percentage goes down like the square root of the
sample size.
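The whole table follows from the two SE formulas, so it can be regenerated in a few lines. The sketch below is only an illustration (the function name `coin_row` is invented here); it applies the formula for the number and the formula for the percentage to each number of tosses.

```python
from math import sqrt

BOX_AVG = 0.5   # average of the 0-1 box for a fair coin
BOX_SD = 0.5    # SD of the same box


def coin_row(n_tosses):
    """Return (EV number, SE number, EV percent, SE percent) for n_tosses."""
    ev_number = n_tosses * BOX_AVG
    se_number = sqrt(n_tosses) * BOX_SD            # SE for the number of heads
    ev_percent = BOX_AVG * 100
    se_percent = BOX_SD / sqrt(n_tosses) * 100     # SE for the percentage of heads
    return ev_number, se_number, ev_percent, se_percent


for n in (100, 2_500, 10_000, 1_000_000):
    ev_n, se_n, ev_p, se_p = coin_row(n)
    print(f"{n:>9,}  {ev_n:>9,.0f}  {se_n:>5,.0f}  {ev_p:.0f}%  {se_p:g}%")
```

Reading down the output, the SE for the number grows by a factor of 100 while the number of tosses grows by a factor of 10,000, and the SE for the percentage shrinks by the same factor of 100.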

Problem 20.3
(a) 50,000. The box models the population, not the sample.
(b) A zero or a one. This is a "counting and classifying" problem. We are
counting gross incomes over $50,000.
(c) False. It doesn't even have units of dollars.
(d) True. The number of draws in the box model corresponds to the sample
size in the real sample.
(e)
The box model. The box model has 20% 1's and 80% 0's.
Expected value for the percentage. The expected value for the percentage is
always the same as the population percentage, which is 20%, the percentage
of 1's in the box.

SD for the box.
(bigger number − smaller number) × √(fraction with bigger number × fraction with smaller number)
= (1 − 0) × √(0.20 × 0.80) = 0.4

Caution. Don't use 20% and 80% instead of 0.20 and 0.80 here. If you do, your
answer will be off by a factor of 100.
SE for the percentage. Now we use equation (2). The SE for the percentage
is 0.4/√900 × 100% = 1.333%.

The picture.

[Figure: normal curve for the sample percentage; horizontal axes: value of the percentage (16 to 24) and standard units (−3 to 3).]

Conversion to standard units. 19% is (19 − 20)/1.333 = −0.75 in standard
units. 21% is +0.75.

Table look-up. The normal curve table in the textbook says this area is 54.67%.
(f) No. We aren't told anything about the fraction of incomes in the population
over $75,000. We would need to know that.
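The steps of part (e) can be strung together as a short calculation. This is a sketch with invented variable names, and it uses `statistics.NormalDist` where the solution uses the normal curve table in the textbook, so agreement is to about the precision of the table.

```python
from math import sqrt
from statistics import NormalDist

fraction_ones = 0.20    # fraction of 1's in the box (use 0.20, not 20)
sample_size = 900       # number of draws

box_sd = sqrt(fraction_ones * (1 - fraction_ones))   # SD of the 0-1 box: 0.4
se_percent = box_sd / sqrt(sample_size) * 100        # SE for the percentage: about 1.333

z_low = (19 - 20) / se_percent     # 19% in standard units: -0.75
z_high = (21 - 20) / se_percent    # 21% in standard units: +0.75

# Normal-curve area between the two values.
chance = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"SE = {se_percent:.3f}%, z = {z_low:.2f} to {z_high:.2f}, chance = {chance:.2%}")
```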

Problem 20.4
This problem is tricky. Despite being in the chapter which introduces
"percentage of draws," this problem is about "sum of draws." The "total gross
income of the audited forms" is the sum of all the incomes in the audited sample.
So in this problem we use formulas for sums, not percentages.
(a) 50,000. For the same reason as in Problem 20.3.

(b) Now that we are interested in the sum of incomes, the box with incomes
written on the tickets is the appropriate box.
(c) True.
(d) True. For the same reason as in Problem 20.3.
(e)
The box model. The box model has average $37,000 and SD $20,000 given in
the statement of Problem 20.3.
Expected value for the sum of draws. 900 × $37,000 = $33,300,000.

SE for the sum of draws. √900 × $20,000 = $600,000.

The picture.

[Figure: normal curve for the sum of draws; horizontal axes: value of the sum in millions of dollars (31 to 35) and standard units (−3 to 3).]

Conversion to standard units. 33 million is (33 − 33.3)/0.6 = −0.5 in
standard units.
Table look-up. The normal curve tail area table says the tail (unshaded area)
is 30.85%. The opposites rule gives 100 − 30.85 = 69.15% for the answer.
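Here is the same calculation as a sketch in Python, using the sum-of-draws formulas rather than the percentage formulas. The names are made up, and `statistics.NormalDist` replaces the tail area table.

```python
from math import sqrt
from statistics import NormalDist

n_forms = 900        # number of audited forms (draws)
box_avg = 37_000     # average income in the box, in dollars
box_sd = 20_000      # SD of the incomes in the box, in dollars

ev_sum = n_forms * box_avg           # expected value for the sum: $33,300,000
se_sum = sqrt(n_forms) * box_sd      # SE for the sum: $600,000

z = (33_000_000 - ev_sum) / se_sum        # 33 million in standard units: -0.5
chance_above = 1 - NormalDist().cdf(z)    # area to the right of z

print(f"EV = ${ev_sum:,}, SE = ${se_sum:,.0f}, z = {z}, chance = {chance_above:.2%}")
```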

Problem 20.6
(ii) The sample size for California is 0.001 × 30,000,000 = 30,000. The
sample size for Nevada is 0.001 × 1,000,000 = 1,000. The larger sample size for
California provides more accuracy.
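To see how much more accuracy, one can plug both sample sizes into the SE formula for a percentage. The sketch below has to assume an SD for the 0-1 box, because the solution above does not give one; 0.5 is used purely as a hypothetical value (the same in both states), and the function name is invented.

```python
from math import sqrt

SAMPLING_FRACTION = 0.001   # one-tenth of 1% of each state's population
ASSUMED_BOX_SD = 0.5        # hypothetical SD of the 0-1 box, not given in the problem


def se_percent(population):
    """SE for the sample percentage, under the assumed box SD."""
    sample_size = round(SAMPLING_FRACTION * population)
    return ASSUMED_BOX_SD / sqrt(sample_size) * 100


se_california = se_percent(30_000_000)   # sample of 30,000
se_nevada = se_percent(1_000_000)        # sample of 1,000

print(f"California: {se_california:.3f}%, Nevada: {se_nevada:.3f}%")
print(f"ratio: {se_nevada / se_california:.2f}")
```

Whatever the box SD really is, it cancels in the ratio when it is the same for both states, so under that assumption the Nevada SE is larger by a factor of √30, about 5.5.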

Problem 20.11
(a) ... observed value is 357 but the expected value is 340.
expected value for number = (number of draws) × (average of box)
                          = (sample size) × (fraction in population)
The fraction of undergraduates in the population is 17,000/25,000 = 17/25.
Thus the expected value for the observed number is 500 × 17/25 = 340.
(b) ... observed value is 71.4% but the expected value is 68%. The expected
value for the percentage is the same as the population percentage, 17/25 × 100%.
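The arithmetic for both parts fits in a few lines. This sketch uses exact fractions so that 17/25 is not rounded; the variable names are illustrative only.

```python
from fractions import Fraction

fraction_undergrad = Fraction(17_000, 25_000)   # reduces to 17/25
sample_size = 500
observed_number = 357

expected_number = sample_size * fraction_undergrad                 # 340
expected_percent = fraction_undergrad * 100                        # 68
observed_percent = Fraction(observed_number, sample_size) * 100    # 71.4

print(f"expected {expected_number} ({float(expected_percent)}%), "
      f"observed {observed_number} ({float(observed_percent)}%)")
```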
