Chapter 05 - Sampling and Sampling Distributions

CHAPTER 5
SAMPLING AND SAMPLING DISTRIBUTIONS
5.1. Parameters are numerical measures of populations. Sample statistics are numerical measures of
samples. An estimator is a sample statistic used for estimating a population parameter.
5-2. $\bar{x}$ = 97.9225 (estimate of $\mu$)

s = 51.8303 (estimate of $\sigma$)
$s^2$ = 2,686.38 (estimate of $\sigma^2$, the population variance)
5-3. p̂ = x/n = 5/12 = 0.41667

(5 out of 12 accounts are over $100.)
5-4. $\bar{x}$ = 2121.667, s = 1737.714

Basic Statistics from Raw Data (template output, data treated as a sample)
Measures of Central Tendency
  Mean       2121.6667
Measures of Dispersion
  Variance   3019651.52
  St. Dev.   1737.71445
5.5. average price = 4.367, standard deviation = 0.3486
Basic Statistics from Raw Data (template output, data treated as a sample)
Measures of Central Tendency
  Mean       4.3676471
Measures of Dispersion
  Variance   0.12154412
  St. Dev.   0.34863178
5.6. p̂ = x/n = 11/18 = 0.6111, where x = the number of users of the product.
5.7. We need 25 elements from a population of 950 elements. Use the rows of Table 5-1, the rightmost
3 digits of each group starting in row 1 (left to right). So we skip any such 3-digit number that is
either > 950 or that has been generated earlier in this list, giving us a list of 25 different numbers in
the desired range. The chosen numbers are:
480, 11, 536, 647, 646, 179, 194, 368, 573, 595, 393, 198, 402, 130, 360, 527, 265, 809, 830,
167, 93, 243, 680, 856, 376.
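The selection rule in 5.7 (take the rightmost 3 digits of each group, skip values outside the frame, skip repeats) can be sketched in a few lines of Python. This is only an illustration: the function name `draw_from_table` is made up here, the digit groups are hypothetical and are not copied from Table 5-1, and treating 000 as outside the frame is an assumption.

```python
def draw_from_table(groups, n_needed, max_id, digits=3):
    """Pick distinct IDs in 1..max_id from the rightmost `digits` of each group."""
    chosen = []
    for group in groups:
        candidate = int(group[-digits:])
        # Skip numbers outside the frame (assumption: 000 is not a valid ID)
        # and numbers that were already selected.
        if candidate < 1 or candidate > max_id or candidate in chosen:
            continue
        chosen.append(candidate)
        if len(chosen) == n_needed:
            break
    return chosen


# Hypothetical 5-digit groups, NOT the actual contents of Table 5-1.
example_groups = ["20480", "35011", "41536", "52011", "61998", "70647"]
picked = draw_from_table(example_groups, n_needed=4, max_id=950)
print(picked)  # [480, 11, 536, 647]
```

Here 52011 is skipped because 011 was already chosen, and 61998 is skipped because 998 exceeds 950.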
5.8. We will use again Table 5-1, using columns this time. We will use right-hand columns, first 4 digits
from the right (going down the column):
4,194 3,402 4,830 3,537 1,305.
5.9. We will use Table 5-1, sets of 2 columns using all 5 digits from column 1 and the first 3 digits from
column 2, continuing by reading down in these columns. Then we will continue to the set: column 3
and first 3 digits column 4. We skip any numbers that are > 40,000,000. The resulting voter
numbers are:
10,480,150 22,368,465 24,130,483 37,570,399 1,536,020.
5.10. There are 7 x 24 x 60 minutes in one week: (7)(24)(60) = 10,080 minutes. We will use Table 5-1.
Start in the first row and go across the row, then to the next row (left to right, using all 5 digits
in each set), discarding any of the resulting 5-digit numbers that are > 10,080. The resulting
minute numbers are:
1,536 2,011 6,243 7,856 6,121 6,907
5-11. A sampling distribution is the probability distribution of a sample statistic. The sampling
distribution is useful in determining the accuracy of estimation results.
5.12. Only if the population is itself normal.
5-13. $E(\bar{X}) = \mu = 125$, $SE(\bar{X}) = \sigma/\sqrt{n} = 20/\sqrt{5} = 8.944$
5.14. The fact that, in the limit, the population distribution does not matter. Thus the theorem is very
general.
5.15. When the population distribution is unknown.
5.16. The Central Limit Theorem does not apply.
5.17. P̂ is binomial. Since np = 1.2, the Central Limit Theorem does not apply and we cannot use the
normal distribution.
5.18. $\mu = 1,247$, $\sigma^2 = 10,000$, $n = 100$

$P(\bar{X} < 1,230) = P\left(Z < \frac{1,230 - 1,247}{100/10}\right) = P(Z < -1.7) = .5 - .4554 = 0.0446$

Sampling Distribution of Sample Mean (template output)
                     Mean    Stdev
Population           1247    100
X-bar (n = 100)      1247    10
P(X-bar < 1230) = 0.0446
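The table lookup in 5.18 can be cross-checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `prob_mean_below` is introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(x, mu, sigma, n):
    """P(X-bar < x), treating X-bar as normal with standard error sigma/sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(x)


# Values from 5.18: mu = 1,247, sigma = 100 (sigma^2 = 10,000), n = 100.
p_518 = prob_mean_below(1230, mu=1247, sigma=100, n=100)
print(round(p_518, 4))  # 0.0446
```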
5.19. $P(|\bar{X} - \mu| \ge 8) = 1 - P(|\bar{X} - \mu| < 8) = 1 - P(-8 < \bar{X} - \mu < 8)$
$= 1 - P\left(\frac{-8}{55/\sqrt{150}} < Z < \frac{8}{55/\sqrt{150}}\right) = 1 - P(-1.78 < Z < 1.78)$
$= 1 - 2(.4625) = 0.075$
5.20. $P(\bar{X} > 3.6) = P\left(Z > \frac{3.6 - 3.4}{1.5/\sqrt{100}}\right) = P(Z > 1.333) = 0.0912$

                     Mean    Stdev
Population           3.4     1.5
X-bar (n = 100)      3.4     0.15
P(X-bar > 3.6) = 0.0912
5.21. $P(12 < \bar{X} < 15) = P\left(\frac{12 - 13.1}{1.2/\sqrt{36}} < Z < \frac{15 - 13.1}{1.2/\sqrt{36}}\right)$
$= P(-5.5 < Z < 9.5) = 2(.5) = 1.000$ (approximately)
(Use template: Sampling Distribution.xls, sheet: x-bar)

                     Mean    Stdev
Population           13.1    1.2
X-bar (n = 36)       13.1    0.2
P(12 < X-bar < 15) = 1.0000
5.22. s = 4,500, n = 225

$P(|\bar{X} - \mu| < 800) = P\left(\frac{-800}{4,500/\sqrt{225}} < Z < \frac{800}{4,500/\sqrt{225}}\right) = P\left(\frac{-800}{4,500/15} < Z < \frac{800}{4,500/15}\right)$
$= P(-2.667 < Z < 2.667) = 2(.4961) = 0.9923$
5-23. p = 0.18, n = 200

$P(\hat{P} \ge .20) = P\left(Z \ge \frac{.20 - .18}{\sqrt{(.18)(.82)/200}}\right) = P\left(Z \ge \frac{.02}{.02717}\right) = P(Z \ge .736)$
$= .5 - .2692 = 0.2308$
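The same upper-tail calculation for a sample proportion can be sketched with the standard library. The function name `prob_phat_at_least` is hypothetical, and the sketch assumes the normal approximation to the distribution of the sample proportion, as the solution does.

```python
from math import sqrt
from statistics import NormalDist


def prob_phat_at_least(x, p, n):
    """P(P-hat >= x) under the normal approximation with SE sqrt(p(1-p)/n)."""
    standard_error = sqrt(p * (1 - p) / n)
    return 1 - NormalDist(p, standard_error).cdf(x)


# Values from 5-23: p = 0.18, n = 200, threshold 0.20.
p_523 = prob_phat_at_least(0.20, p=0.18, n=200)
print(round(p_523, 3))  # about 0.231, in line with the table value 0.2308
```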
5.24. The claim is that p = 0.58. We have n = 250 and x / n = 123/250 = 0.492.

$P(\hat{P} \le .492) = P\left(Z \le \frac{.492 - .58}{\sqrt{(.58)(.42)/250}}\right) = P(Z < -2.819) = 0.0024$
5-25. P(X-bar > $3M) = 0.00

                     Mean    Stdev
Population           2.6     0.4
X-bar (n = 75)       2.6     0.04619
P(X-bar > 3) = 0.0000
5-26. $n = 16$, $\mu = 1.5$, $\sigma = 2$
$P(\bar{X} > 0) = P\left(Z > \frac{0 - 1.5}{2/\sqrt{16}}\right) = P(Z > -3) = .5 + .4987 = 0.9987$

                     Mean    Stdev
Population           1.5     2
X-bar (n = 16)       1.5     0.5
P(X-bar > 0) = 0.9987
5.27. p = 1/7
$P(\hat{P} < .10) = P\left(Z < \frac{.10 - .143}{\sqrt{(1/7)(6/7)/180}}\right) = P(Z < -1.648) = 0.5 - 0.4503 = 0.0497$,
a low probability. The sample size, along with np and n(1 – p), is large enough here that
the sampling distribution (over all the different samples of 180 people in the population) of the
proportion of people who get hospitalized during the year is going to be pretty close to normal.
Therefore, any one such sample proportion will be close to the expected value 1/7 with reasonable
probability, and 1/10 is far enough below that mean, given the standard error of about 0.026,
that the probability of falling even farther away on that side is small.
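The reasoning in 5.27 has two steps: confirm that np and n(1 – p) are large enough for the normal approximation, then compute the lower-tail probability. A minimal sketch follows; the threshold of 5 is a common rule of thumb assumed here (it is not stated in the solution), and both function names are made up for illustration. Because the code uses p = 1/7 exactly rather than the rounded .143, its result is close to 0.050 rather than exactly 0.0497.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_ok(p, n, threshold=5):
    """Rule-of-thumb check (assumed threshold): np and n(1-p) both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


def prob_phat_below(x, p, n):
    """P(P-hat < x) under the normal approximation with SE sqrt(p(1-p)/n)."""
    standard_error = sqrt(p * (1 - p) / n)
    return NormalDist(p, standard_error).cdf(x)


# Values from 5.27: p = 1/7, n = 180, threshold 0.10.
p, n = 1 / 7, 180
ok_527 = normal_approx_ok(p, n)       # np is about 25.7, n(1-p) about 154.3
p_527 = prob_phat_below(0.10, p, n)   # roughly 0.05
print(ok_527, round(p_527, 3))
```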
5-28. $\mu = 700$, $\sigma = 100$, $n = 60$

$P(680 \le \bar{X} \le 720) = P\left(\frac{680 - 700}{100/\sqrt{60}} \le Z \le \frac{720 - 700}{100/\sqrt{60}}\right)$
$= 2TA(1.549) = 0.8786$
5-29. $\mu_{\hat{P}} = p = 0.35$, $\sigma_{\hat{P}} = \sqrt{(0.35)(0.65)/500} = 0.0213$

$P(|\hat{P} - p| > 0.05) = P(\hat{P} < 0.30) + P(\hat{P} > 0.40)$
$= P\left(Z < \frac{0.30 - 0.35}{0.0213}\right) + P\left(Z > \frac{0.40 - 0.35}{0.0213}\right)$
$= 1 - 2TA(2.344) = 0.0190$
5.30. Estimator B is better. It has a small bias, but its variance is small. This estimator is more likely to
produce an estimate that is close to the parameter of interest.
5.31. I would use this estimator because consistency means that as $n \to \infty$ the probability of getting close
to the parameter increases. With a generous budget I can get a large sample size, which will make
this probability high.
5.32. $\hat{s}^2 = 1,287$, $s^2 = \left(\frac{n}{n-1}\right)\hat{s}^2 = \left(\frac{100}{99}\right)(1,287) = 1,300$
5.33. Advantage: uses all information in the data.

Disadvantage: may be too sensitive to the influence of outliers.
5.34. Depends also on efficiency and other factors. With respect to the bias:
A has bias = 1/n
B has bias = 0.01
A is better than B when 1/n < 0.01, that is, when n > 1/0.01 = 100
5.35. Consistency is important because it means that as you get more data, your probability of getting
closer to your “target” increases.
5.36. $n_1$ = 30, $n_2$ = 48, $n_3$ = 32. The three sample means are known. The df for deviations from the
three sample means are:
df = $n_1 + n_2 + n_3 - 3$ = 30 + 48 + 32 – 3 = 107
5.37. a) The mean is the best number to use.
mean = 43.667
Sample   Deviation from mean   Deviation squared
34       -9.667                93.45089
51        7.333                53.77289
40       -3.667                13.44689
38       -5.667                32.11489
47        3.333                11.10889
50        6.333                40.10689
52        8.333                69.43889
44        0.333                0.110889
37       -6.667                44.44889
SSD = 358
degrees of freedom = 8
MSD = SSD / df = 358 / 8 = 44.75
b) Choose the means of the respective blocks of numbers: 40.75, 49.667, 40.5. Minimized SSD =
195.917, df = 6, MSD = 32.65283
Sample   Block mean   Deviation   Deviation squared
34       40.75        -6.75       45.5625
51       40.75        10.25       105.0625
40       40.75        -0.75       0.5625
38       40.75        -2.75       7.5625
47       49.667       -2.667      7.112889
50       49.667        0.333      0.110889
52       49.667        2.333      5.442889
44       40.5          3.5        12.25
37       40.5         -3.5        12.25
SSD = 195.9167
c) Each of the numbers themselves. SSD = 0. MSD indicates that the variance is zero, which is
true since we are using each of the individual numbers to reduce SSD to zero.
d) SSD = 719, df = 9, MSD = 79.889

mean = 50
Sample   Deviation   Deviation squared
34       -16         256
51         1         1
40       -10         100
38       -12         144
47        -3         9
50         0         0
52         2         4
44        -6         36
37       -13         169
SSD = 719
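The SSD, df and MSD figures for parts (a), (b) and (d) of 5.37 can be recomputed directly from the nine data values. In this sketch the block boundaries for part (b) (first four, next three, last two values) are inferred from the block means shown in the table, and the helper name `ssd` is introduced only for illustration.

```python
data = [34, 51, 40, 38, 47, 50, 52, 44, 37]


def ssd(values, center):
    """Sum of squared deviations of `values` from `center`."""
    return sum((v - center) ** 2 for v in values)


# (a) Deviations from the overall sample mean: one estimated mean, df = n - 1.
mean_all = sum(data) / len(data)
ssd_a = ssd(data, mean_all)
msd_a = ssd_a / (len(data) - 1)

# (b) Deviations from block means: three estimated means, df = n - 3.
# Block boundaries are inferred from the means 40.75, 49.667 and 40.5.
blocks = [data[:4], data[4:7], data[7:]]
ssd_b = sum(ssd(block, sum(block) / len(block)) for block in blocks)
msd_b = ssd_b / (len(data) - len(blocks))

# (d) Deviations from the fixed value 50: nothing is estimated, df = n.
ssd_d = ssd(data, 50)
msd_d = ssd_d / len(data)

print(round(msd_a, 2), round(msd_b, 4), round(msd_d, 3))
```

Each additional mean estimated from the data lowers SSD and uses up one degree of freedom, which is the pattern parts (a) through (d) are meant to show.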
5.38. No, because there are n – 1 = 19 – 1 = 18 degrees of freedom for these checks once you know their
mean. Since 17 is one less, there is a remaining degree of freedom and you cannot solve for the
missing checks.
5.39. Yes. $(x_1 + \cdots + x_{18} + x_{19})/19 = \bar{x}$. Since 18 of the $x_i$ are known and so is $\bar{x}$, we can solve
the equation for the unknown $x_{19}$.
5.40. df = n-k
as k increases, df decreases, SSD decreases, MSD decreases
5.41. $E(\bar{X}) = \mu = 1,065$, $V(\bar{X}) = \sigma^2/n = 500^2/100 = 2,500$
5.42. $\sigma^2 = 1,000,000$
Want $SD(\bar{X}) \le 25$
$SD(\bar{X}) = \sigma/\sqrt{n} = 1,000/\sqrt{n}$
$1,000/\sqrt{n} \le 25$
$\sqrt{n} \ge 1,000/25 = 40$
$n \ge 1,600$. The sample size must be at least 1,600.
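The algebra in 5.42 amounts to solving sigma/sqrt(n) <= target for n. A short sketch, with the hypothetical helper name `min_n_for_se`:

```python
from math import ceil


def min_n_for_se(sigma, max_se):
    """Smallest integer n with sigma / sqrt(n) <= max_se."""
    return ceil((sigma / max_se) ** 2)


# Values from 5.42: sigma = 1,000 (sigma^2 = 1,000,000), target SD of X-bar = 25.
n_542 = min_n_for_se(1000, 25)
print(n_542)  # 1600
```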
5.43. $\mu = 53$, $\sigma = 10$, $n = 400$
$E(\bar{X}) = \mu = 53$, $SE(\bar{X}) = \sigma/\sqrt{n} = 10/\sqrt{400} = 0.5$

Sampling Distribution of X-bar (template output)
                     Mean    Stdev
Population           53      10
X-bar (n = 400)      53      0.5
5-44. p = 0.5, n = 120

$SE(\hat{P}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.5)(.5)}{120}} = 0.0456$
5.45. $E(\hat{P}) = p = 0.2$
$SE(\hat{P}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.2)(.8)}{90}} = 0.04216$
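5.46. p = 0.5 maximizes the variance of $\hat{P}$. Proof:

$V(\hat{P}) = \frac{p(1-p)}{n}$
$\frac{dV(\hat{P})}{dp} = \frac{1}{n}\frac{d}{dp}(p - p^2) = \frac{1}{n}(1 - 2p)$
Set the derivative to zero:
$\frac{1}{n}(1 - 2p) = 0 \Rightarrow 1 = 2p \Rightarrow p = 1/2$
The second derivative, $-2/n$, is negative, so this stationary point is a maximum.
The assertion may also be demonstrated by trying different values of p.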
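"Trying different values of p" can be done with a simple grid search. The grid spacing of 0.01 and the choice n = 100 are arbitrary assumptions for this sketch; n only rescales the variance and does not affect where the maximum falls.

```python
def var_phat(p, n):
    """Variance of the sample proportion, p(1 - p) / n."""
    return p * (1 - p) / n


# Evaluate the variance on a grid of p values from 0.00 to 1.00.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: var_phat(p, n=100))
print(best_p)  # 0.5
```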
5.47. $P(0.72 < \bar{X} < 0.82) = P(-10.95 < Z < 16.432) = 2(.5) = 1.00$

                     Mean    Stdev
Population           0.76    0.02
X-bar (n = 30)       0.76    0.00365
P(0.72 < X-bar < 0.82) = 1.0000
5.48. $P(\bar{X} \ge 1,000) = P\left(Z \ge \frac{1,000 - 1,065}{500/10}\right) = P\left(Z \ge \frac{-650}{500}\right)$
$= P(Z \ge -1.3) = .5 + .4032 = 0.9032$
We rely on the Central Limit Theorem to treat $\bar{X}$ as approximately normally distributed.
5-49. $\mu = 53$, $\sigma = 10$, $n = 400$
$P(52 < \bar{X} < 54) = P\left(\frac{52 - 53}{10/20} < Z < \frac{54 - 53}{10/20}\right) = P(-2 < Z < 2) = 0.9544$
5-50. p = 0.5, n = 120

$P(\hat{P} \ge .45) = P\left(Z \ge \frac{.45 - .5}{\sqrt{(.5)(.5)/120}}\right) = P(Z \ge -1.095) = 0.8632$
5-51. a. $8,128.08 found by $3.3M/406 = 8,128.08
b. $P(\bar{X} < 7000) = P\left(Z < \frac{7000 - 8128.08}{2000/\sqrt{16}}\right) = P(Z < -2.256)$
$= .5000 - .4880 = 0.012$
Template output: P(X-bar < 7000) = 0.0120
5.52. $0.06 \le p \le 0.10$

$SE(\hat{P}) = \sqrt{p(1-p)/n} \le 0.03$
Assume p = 0.06:
$SE(\hat{P}) = \sqrt{(.06)(.94)/n} \le .03$
$(.06)(.94)/n \le .03^2$
$62.67 \le n$
Now assume the other extreme, p = 0.10:
$SE(\hat{P}) = \sqrt{(.1)(.9)/n} \le .03$
$(.1)(.9)/n \le .03^2$
$100 \le n$
Now, we also know that the function $SE(\hat{P})$ does not have a maximum point between p = 0.06
and p = 0.10 because the only maximum point of the function occurs at p = 0.5 (as we know from
Problem 5-46). Hence $SE(\hat{P})$ is monotonic between p = 0.06 and 0.10, and thus n = 100 is the
minimum required sample size.
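The two endpoint calculations can be reproduced as follows. This sketch uses `fractions.Fraction` so that the boundary case (exactly 100) is not disturbed by floating-point rounding; the function name `min_n_for_proportion_se` is made up for illustration.

```python
from fractions import Fraction
from math import ceil


def min_n_for_proportion_se(p, max_se):
    """Smallest integer n with sqrt(p(1-p)/n) <= max_se, using exact arithmetic."""
    p, max_se = Fraction(p), Fraction(max_se)
    return ceil(p * (1 - p) / max_se ** 2)


# Endpoints of the range 0.06 <= p <= 0.10 with the SE bound 0.03.
n_low = min_n_for_proportion_se("0.06", "0.03")   # 62.67 rounds up to 63
n_high = min_n_for_proportion_se("0.10", "0.03")  # exactly 100
required_n = max(n_low, n_high)
print(n_low, n_high, required_n)  # 63 100 100
```

Taking the larger of the two endpoint values is enough only because the standard error is monotonic in p over this range, as the solution argues.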
5.53. Random samples from the entire population of interest reduce the chance of a bias and increase
the chance of being representative of the entire population. Also, we have a known probability of being
within certain distances of the parameter of interest. We use a frame and a random number
generator or a table of random numbers. A simple random sample is such that every possible set of
n elements has an equal chance of being selected.
5.54. A bias is a systematic deviation away from the target of estimation. A bias takes us away from the
target parameter in repeated sampling. If the bias is small and variance of the estimator is also
small, the bias may be tolerated, especially if the bias decreases as n increases.
5.55. The sample median is unbiased. The sample mean is more efficient; it is also sufficient. This is why
we prefer the sample mean. We must assume normality for using the sample median to estimate $\mu$.
The median is more resistant to outliers.
5.56. $S^2$ has n – 1 in the denominator because there are n – 1 degrees of freedom for deviations from the
sample mean. Using n – 1 instead of n makes $S^2$ an unbiased estimator of $\sigma^2$.
5.57. $\mu = 44$, $\sigma = 7$, $n = 50$
$P(\bar{X} < 35) = P(Z < -9.0914) = .5 - .5 = 0.00$

                     Mean    Stdev
Population           44      7
X-bar (n = 50)       44      0.98995
P(X-bar < 35) = 0.0000
5-58. 95% bounds on $\bar{X}$:

$\mu \pm 1.96\,\sigma/\sqrt{n} = 19.5 \pm 1.96(5.3/10) = [18.4612, 20.5388]$
90% bounds on $\bar{X}$:
$19.5 \pm 1.645(5.3/10) = [18.62815, 20.37185]$
Symmetric Intervals (template output)
x1         P(x1 < X-bar < x2)   x2
18.46122   0.95                 20.538779
18.62823   0.90                 20.371772
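The symmetric bounds can be recomputed without a z table by taking the quantile from `statistics.NormalDist`. In this sketch n = 100 is read off from the factor 5.3/10 in the solution, and the function name `symmetric_bounds` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def symmetric_bounds(mu, sigma, n, coverage):
    """Interval centered at mu that contains X-bar with probability `coverage`."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # e.g. about 1.96 for 0.95
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width


# Values from 5-58: mu = 19.5, sigma = 5.3, sqrt(n) = 10.
lo95, hi95 = symmetric_bounds(19.5, 5.3, 100, 0.95)
lo90, hi90 = symmetric_bounds(19.5, 5.3, 100, 0.90)
print(round(lo95, 4), round(hi95, 4))
print(round(lo90, 4), round(hi90, 4))
```

The small differences between the hand calculation and the template output for the 90% interval come from using z = 1.645 instead of the more precise quantile.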
5-59. $\mu = 3.9$, $\sigma = 0.5$, $n = 25$

$P(\bar{X} > 3.0) = P\left(Z > \frac{3.0 - 3.9}{0.5/\sqrt{25}}\right) = P(Z > -9) = 0.5 + 0.5 = 1.000$
(Use template: Sampling Distribution.xls, sheet: x-bar)

                     Mean    Stdev
Population           3.9     0.5
X-bar (n = 25)       3.9     0.1
P(X-bar > 3) = 1.0000
5.60. df = (rows – 1)(columns – 1) = (5 – 1)(3 – 1) = 8
5-61. p = 0.38, n = 100

$P(\hat{P} > 0.30) = P\left(Z > \frac{.30 - .38}{\sqrt{(.38)(.62)/100}}\right) = P(Z > -1.648)$
$= .5 + .4503 = 0.9503$

                     Mean    Stdev
Population           0.38    0.48539   (stdev = SQRT(.38*.62))
P-hat (n = 100)      0.38    0.04854
P(P-hat > 0.3) = 0.9503
5-62. $\bar{X}$ is normal. But since $\sigma$ is unknown and we use S, the quantity $(\bar{X} - \mu)/(S/\sqrt{n})$
has the $t_{(n-1)}$ distribution rather than the standard normal distribution Z.
5-63. No minimum (n = 1 is enough for normality).
5.64. $\bar{X}$, $\hat{P}$, $S^2$ are unbiased. S is the square root of an unbiased estimator of $\sigma^2$, thus
it is not unbiased. Proof:
Assume $E(S) = \sigma$
then: $(E(S))^2 = \sigma^2$
and: $E(S^2) - (E(S))^2 = \sigma^2 - \sigma^2 = 0$ (since $E(S^2) = \sigma^2$).
But $E(S^2) - (E(S))^2 = V(S)$
V(S) = 0 means that S is not a statistical estimator. The contradiction establishes the proposition
that S is biased.
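The bias of S can also be seen empirically. The simulation below is a sketch under assumed settings (normal population with sigma = 1, samples of size 5, 5,000 replications, fixed seed), none of which come from the problem: the average of the sample variances lands near 1, while the average of the sample standard deviations falls noticeably below 1.

```python
import random
from statistics import mean, stdev, variance

rng = random.Random(1)   # fixed seed so the run is repeatable
sigma = 1.0              # assumed population standard deviation
n = 5                    # assumed sample size
reps = 5000              # assumed number of simulated samples

samples = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(reps)]

# statistics.variance and statistics.stdev both use the n - 1 denominator.
avg_s2 = mean(variance(sample) for sample in samples)
avg_s = mean(stdev(sample) for sample in samples)

print(round(avg_s2, 2), round(avg_s, 2))
```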
5.65. This estimator is also consistent. It is more efficient than $\bar{X}$, because $\sigma^2/n^2 < \sigma^2/n$.
5.66. df = 124 – 3 = 121
a. Normal population requires the smallest minimum n.

b. Mound-shaped population requires the next higher minimum n.
c. Discrete population needs the highest minimum n.
d. Slightly skewed population: n more than for (b), less than for (c).
e. Highly skewed population: n less than for (c), but more than for (d).
The relative minimum required sample sizes are as follows:
$n_a < n_b < n_d < n_e < n_c$
5.67. Yes. $SE(\bar{X})$ decreases as n increases:

$SE(\bar{X}) = \sigma/\sqrt{n}$, which goes to 0 as n goes to $\infty$. Statistically, it is always good to have as
large a sample as possible.
5.68. Draw repeated samples, preferably by simulation on a computer, and determine the empirical
distribution of the statistic: the relative frequency distribution of its values.
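A minimal sketch of the simulation approach described in 5.68 is shown below. The population (the integers 1 to 100), the sample size, the number of replications and the function name `simulate_sampling_distribution` are all assumptions chosen only for illustration.

```python
import random
from statistics import mean, stdev


def simulate_sampling_distribution(population, n, statistic, reps, seed=0):
    """Draw `reps` simple random samples of size n and return the statistic for each."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population: the integers 1..100 (population mean 50.5).
population = list(range(1, 101))
sample_means = simulate_sampling_distribution(
    population, n=10, statistic=mean, reps=2000
)

# The center and spread of the simulated values approximate the sampling
# distribution of the sample mean; a histogram would show its shape.
print(round(mean(sample_means), 2), round(stdev(sample_means), 2))
```

Any other statistic (median, range, sample variance) can be passed in place of `mean` to study its empirical distribution the same way.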
5.69. $P(\hat{P} < .15) = P\left(Z < \frac{.15 - .20}{\sqrt{(.2)(.8)/250}}\right) = P(Z < -1.976) = .5 - .4759 = 0.0241$
5-71. $\mu = 25$, $\sigma = 2$, $n = 100$
$P(\bar{X} < 24) = P\left(Z < \frac{24 - 25}{2/10}\right) = P(Z < -5) = 0.0000003$
Not probable at all.
5.72. $P(|\hat{P} - 0.60| \le 0.07) = P(0.53 \le \hat{P} \le 0.67)$

$= P\left(\frac{.53 - .60}{\sqrt{(.60)(.40)/200}} \le Z \le \frac{.67 - .60}{\sqrt{(.60)(.40)/200}}\right) = P(-2.02 \le Z \le 2.02) = 0.9567$
Sampling Distribution of Sample Proportion (template output)
Population proportion p = 0.6
                     Mean    Stdev
P-hat (n = 200)      0.6     0.03464
P(0.53 < P-hat < 0.67) = 0.9567

5.73. $P(1.52 < \bar{X} < 1.62) = P\left(\frac{1.52 - 1.57}{0.4/\sqrt{200}} < Z < \frac{1.62 - 1.57}{0.4/\sqrt{200}}\right) = 2TA(1.768) = 0.923$
5-74.
a) The point estimate for the sample mean is 52.
                     Mean    Stdev
Population           52      2.4
X-bar (n = 40)       52      0.37947
P(52 < X-bar < 53) = 0.4958
b) $P(52 < \bar{X} < 53) = 0.4958$
5-75 (Use template: Sampling Distribution.xls, sheet: p-hat)

n = 400, p = 0.06
Sampling Distribution of Sample Proportion (template output)
Population proportion p = 0.06
                     Mean    Stdev
P-hat (n = 400)      0.06    0.01187
P(P-hat < 0.05) = 0.1999
5-76 (Use template: Sampling Distribution.xls, sheet: x-bar)

μ = 15830, σ = 458, n = 10

                     Mean     Stdev
Population           15830    458
X-bar (n = 10)       15830    144.832
$P(\bar{X} > 16000) \approx 0.1202$
5-77 (Use template: Sampling Distribution.xls, sheet: x-bar)

μ = 3.42, σ = 1.5, n = 30

                     Mean    Stdev
Population           3.42    1.5
X-bar (n = 30)       3.42    0.27386
$P(\bar{X} > 4.00) \approx 0.0171$
Case 6: Acceptance Sampling of Pins
1) 0.6210. No, it is not an acceptable level of performance.
                     Mean     Stdev
Population           1.008    0.045
X-bar (n = 50)       1.008    0.00636
P(0.99 < X-bar < 1.01) = 0.6210
2) 1.00. This would result in the lowest SSD and MSD.
3) 0.01104
                     Mean     Stdev
Population           1.008    0.01104
X-bar (n = 50)       1.008    0.00156
P(0.99 < X-bar < 1.01) = 0.9000
4) For 95% acceptance: stdev = 0.00861; for 99% acceptance: stdev = 0.00609.
5) It is easier to adjust the mean. The production process will always cause vibrations, etc.
6)
P(0.99 < X-bar < 1.01)   Stdev     Reduction in stdev (thousandths)   Cost
0.90                     0.01104   33.96                              $172,992.24
0.95                     0.00861   36.39                              $198,634.82
0.99                     0.00609   38.91                              $227,098.22
7)
P(0.99 < X-bar < 1.01)   Stdev     Reduction in stdev (thousandths)   Cost         Total Cost
0.90                     0.04302   1.98                               $588.06      $668.06
0.95                     0.03609   8.91                               $11,908.22   $11,988.22
0.99                     0.02749   17.51                              $45,990.02   $46,070.02
8) Use a mean of 1.00 and adjust the standard deviation to 0.02749.