Chapter 4: Probability Distributions


4.1. (a) 1567/(1567 + 433) = 1567/2000 = 0.7835
(b) 1 – 0.7835 = 0.2165
(c) (0.7835)(0.40) = 0.3134
4.2. (0.95)(0.95) = 0.9025
4.3. (a) 210/885 = 0.2373
(b) (i) 50/210 = 0.238
(ii) 95/675 = 0.1407
(c) (i) 50/885 = 0.0565
(ii) (0.238)(0.2373) = 0.0565
(d) (50 + 580)/885 = 630/885 = 0.712
4.4. (a) The number of sports a person can play (y) is discrete, since it must take on integer values.
(b)
y 0 1 2
Probability 0.05 0.85 0.10
(c) 0.05 + 0.85 = 0.90
(d) 0(0.05) + 1(0.85) + 2(0.10) = 1.05
4.5. (a) The probabilities of each y value need to be taken into account (the mean is a weighted average).
(b) 0(0.85) + 1(0.13) + 2(0.01) + 3(0.01) = 0.18
4.6.
y 0 2,000,000
Probability 0.9999999 0.0000001
0(0.9999999) + 2000000(0.0000001) = 0.20 or $0.20.
4.7. (a)
y 0 1 2 3 4 5 6 7 8 9
Probability 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10 0.10
(b) \( \sum y P(y) = 0(0.10) + 1(0.10) + \cdots + 8(0.10) + 9(0.10) = 4.5 \)
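
The means in Exercises 4.4, 4.6, and 4.7 are all weighted averages of the possible values. A minimal sketch of that calculation follows; the function name `expected_value` and the dictionary layout are illustrative choices, not part of the exercises.

```python
def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    total_p = sum(dist.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Weighted average: each value times its probability.
    return sum(y * p for y, p in dist.items())

# Exercise 4.4: number of sports played
sports = {0: 0.05, 1: 0.85, 2: 0.10}
# Exercise 4.6: lottery winnings in dollars
lottery = {0: 0.9999999, 2_000_000: 0.0000001}
# Exercise 4.7: a random digit, each with probability 0.10
digits = {y: 0.10 for y in range(10)}

for name, dist in [("sports", sports), ("lottery", lottery), ("digits", digits)]:
    print(name, round(expected_value(dist), 4))
```

For the ten equally likely digits the weighted average is (0 + 1 + ... + 9)(0.10), which is 4.5.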

4.8. (a) \( P(Z \ge 1) = 1 - P(Z < 1) = 1 - 0.8413 = 0.1587 \)

(b) \( P(Z \le -1) = P(Z \ge 1) = 0.1587 \)

4.9. (a) \( P(\mu - \sigma < X < \mu + \sigma) = P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826 \)

(b) \( P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = P(-1.96 < Z < 1.96) = 0.9750 - 0.0250 = 0.95 \)

(c) \( P(\mu - 3\sigma < X < \mu + 3\sigma) = P(-3 < Z < 3) = 0.9987 - 0.0013 = 0.9974 \)

(d) \( P(\mu - 0.67\sigma < X < \mu + 0.67\sigma) = P(-0.67 < Z < 0.67) = 0.7486 - 0.2514 = 0.4972 \)
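
The four probabilities in Exercise 4.9 can be checked without a table. This is a sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is an illustrative choice. Software values may differ from table-based answers in the fourth decimal place, because the table rounds each cumulative probability.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 1.96, 3, 0.67):
    print(k, round(prob_within(k), 4))
```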

4.10. (a) 2.33
(b) 1.96
(c) 1.64
(d) 1.28
(e) 0.67
(f) 0
4.11. (a) 0.67
(b) 1.64
(c) 1.96
(d) 2.58

Copyright © 2018 Pearson Education Ltd.
24 Statistical Methods for the Social Sciences

4.12. (a) 1.28
(b) 1.64
(c) 2.06
(d) 2.33
4.13. If the interval \( \mu - z\sigma \) to \( \mu + z\sigma \) contains 90% of a normal distribution, there is 5% below \( \mu - z\sigma \) and 5% above \( \mu + z\sigma \). Thus, \( \mu + z\sigma \) equals the 95th percentile.
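
The reasoning in Exercise 4.13 turns a central probability into a percentile: a central area c leaves (1 - c)/2 in each tail, so the z-value is the (1 + c)/2 quantile. A minimal sketch with the standard library follows; the function name `z_for_central` is hypothetical, and the central areas in the loop are inferred from the answers to 4.11, not quoted from the exercise.

```python
from statistics import NormalDist

def z_for_central(c):
    """z such that the interval from -z to z holds central probability c."""
    upper_percentile = (1 + c) / 2  # e.g. c = 0.90 gives the 95th percentile
    return NormalDist().inv_cdf(upper_percentile)

for c in (0.50, 0.90, 0.95, 0.99):
    print(c, round(z_for_central(c), 2))
```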
4.14. (a) (i) 75th percentile
(ii) 25th percentile
(b) z = 0.67; plugging z = 0.67 into the expressions in part (a) shows that \( \mu + 0.67\sigma \) is the upper quartile of a normal distribution and \( \mu - 0.67\sigma \) is the lower quartile of a normal distribution.

4.15. (a) \( P(Z > 2.10) = 0.0179 \)
(b) \( P(Z < -2.10) = 0.0179 \)
(c) \( P(-2.10 \le Z \le 2.10) = 1 - P(Z < -2.10) - P(Z > 2.10) = 1 - 2(0.0179) = 0.9642 \)

4.16. The right-tail probability of 0.01 has z = 2.33.


4.17. (a) The 96th percentile is 1.75 standard deviations above the mean.
(b) The test score for the 96th percentile is 75 + 1.75(8) = 89.
4.18. 40 hours per week has z = (40 – 55)/18 = –0.833, so is 0.833 standard deviations below the mean. The
proportion of self-employed individuals who averaged more than 40 hours per week is 0.7967.
4.19. (a) An MDI of 140 has z = (140 – 100)/16 = 2.5, so is 2.5 standard deviations above the mean. The
proportion of children with an MDI of 140 or more is 0.01.
(b) The MDI score that is the 90th percentile is 1.28 standard deviations above the mean, so this score is 100 + 1.28(16) = 120.48, or 120.
(c) The lower quartile is 0.67 standard deviations below the mean, which gives a lower quartile of 100 – 0.67(16) = 89.28, or 89. Similarly, the upper quartile is 0.67 standard deviations above the mean, which gives an upper quartile of 100 + 0.67(16) = 110.72, or 111. Since the MDI scores are approximately normal, the median will be equal to the mean of 100.
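
The MDI calculations in Exercise 4.19 can be reproduced directly from the mean of 100 and standard deviation of 16 used above. This is a sketch with the standard library; the variable names are illustrative. Because the software uses unrounded z-values (1.2816 and 0.6745 in place of 1.28 and 0.67), its percentiles differ slightly from the table-based answers.

```python
from statistics import NormalDist

mdi = NormalDist(mu=100, sigma=16)

# (a) Proportion of children with an MDI of 140 or more.
p_140_plus = 1 - mdi.cdf(140)

# (b) 90th percentile of MDI scores.
p90 = mdi.inv_cdf(0.90)

# (c) Lower and upper quartiles; the median equals the mean by symmetry.
q1, q3 = mdi.inv_cdf(0.25), mdi.inv_cdf(0.75)

print(round(p_140_plus, 4), round(p90, 2), round(q1, 2), round(q3, 2))
```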
4.20. (a) 258 days has z = (258 – 284)/12 = –2.1667, so is 2.17 standard deviations below the mean. The proportion of babies that would be born prematurely is 0.015.
(b) Since 0.045 (the actual proportion of premature babies) is more than the proportion we would expect if
gestation times were normally distributed, the actual distribution is probably skewed to the left; so the left tail
probability more than 2.17 standard deviations below the mean is larger than the right-tail probability more
than 2.17 standard deviations above the mean.

4.21. (a) 25 gallons per week has z = (25 – 20)/5 = 1.0, and is 1.0 standard deviation above the mean. The proportion of adults who use more than 25 gallons per week is 0.1587.
(b) The 95th percentile is 1.645 standard deviations above the mean. For only 5% of adults to use more than 25 gallons per week, 25 gallons must be the 95th percentile, so we need to solve for \( \mu \) where 1.645 = (25 – \( \mu \))/5. The value of \( \mu \) is 16.775. So, the mean would need to be about 16.8 gallons so that only 5% of adults use more than 25 gallons per week.
(c) If the distribution of ground water use is not actually normal, we should expect it to be skewed to the right, since a small number of adults with very high ground water usage would stretch out the upper tail.
4.22. 90 has z = (90 – 83)/7 = 1.0, so is 1.0 standard deviation above the mean. 95 has z = (95 – 83)/7 = 1.71, so is 1.71 standard deviations above the mean. The proportion of students who earn an A is approximately (0.1587 – 0.0436) = 0.1151.
4.23. A’s score of 500 has z = (500 – 400)/75 = 1.33, so is 1.33 standard deviations above the mean. B’s score of 36 has z = (36 – 25)/4 = 2.75, so is 2.75 standard deviations above the mean. Relatively speaking, B’s score of 36 is higher than A’s score of 500.

4.24. (a) z = (8000 – 5400)/1800 = 1.44
(b) About 0.0749 (7.49%) of the consumption expenditure exceeds $8000.
4.25. (a) 800 kWh is z = (800 – 583)/450 = 0.482 standard deviations above the mean. If the distribution were normal, about 31.56% of the households would use more than 800 kWh.
(b) The distribution is probably skewed to the right due to a few very large homes that have high electricity
usage.
4.26. (a) The probability distribution for y is:
y 0 1 2
Probability 0.1 0.6 0.3
(b) The sampling distribution of the sample proportion p of the students selected who are female is
p 0 0.5 1
Probability 0.1 0.6 0.3
4.27. (a) The sampling distribution of the sample proportion of heads for flipping a balanced coin once is
p 0 1
Probability 0.50 0.50
(b) The sampling distribution of the sample proportion of heads for flipping a balanced coin twice is
p 0 0.5 1
Probability 0.25 0.50 0.25
(c) The sampling distribution of the sample proportion of heads for flipping a balanced coin three times is
p 0 1/3 2/3 1
Probability 0.125 0.375 0.375 0.125
(d) The sampling distribution of the sample proportion of heads for flipping a balanced coin four times is
p 0 0.25 0.50 0.75 1
Probability 0.0625 0.25 0.375 0.25 0.0625
(e) As the number of flips increases the sampling distribution of the sample proportion of heads seems to be
getting more normal, with the probabilities concentrating more around 0.50.
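
The tables in Exercise 4.27 follow the binomial pattern: with n flips of a balanced coin, the sample proportion k/n has probability C(n, k)(1/2)^n. The sketch below rebuilds them with exact fractions; the function name `proportion_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def proportion_distribution(n, p=Fraction(1, 2)):
    """Exact sampling distribution of the sample proportion of heads in n flips."""
    return {
        Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

for n in (1, 2, 3, 4):
    dist = proportion_distribution(n)
    print(n, {str(prop): float(prob) for prop, prob in dist.items()})
```

Calling the function with larger n (say 20 or 50) shows the probabilities piling up near 0.50, which is the pattern described in part (e).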

4.28. (a) The 36 possible pairs are (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5),
(2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2),
(5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6).
(b) The sampling distribution for the sample mean \( \bar{y} \) is
\( \bar{y} \) 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6
Probability 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
(c) (i) The histogram of the probability distribution for each roll is uniform.
(ii) The shape is triangular, but starting to approach a bell shape, compared to the uniform distribution for Y.

(d) The mean of the probability distribution for each roll is
\( \mu = \sum y P(y) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5. \)
The mean of the probability distribution for \( \bar{y} \) is
\( \mu = \sum \bar{y} P(\bar{y}) = 1(1/36) + 1.5(2/36) + 2(3/36) + \cdots + 5.5(2/36) + 6(1/36) = 3.5. \)
The means of the two distributions are the same because they are both symmetric about the mean of 3.5.
(e) There are more \( (y_1, y_2) \) pairs that have a sample mean closer to the true mean for the average of two rolls than for the value of one roll. The spread of the probability distribution for \( \bar{y} \) is less than that for the probability distribution of a single roll.
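
The two-dice sampling distribution in Exercise 4.28 is small enough to enumerate. The sketch below lists all 36 pairs, tabulates the sample mean with exact fractions, and compares the spread with that of a single roll; all variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
pairs = list(product(faces, repeat=2))  # all 36 equally likely (y1, y2) pairs

# Count how many pairs give each sample mean, then convert to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in pairs)
dist = {m: Fraction(c, len(pairs)) for m, c in sorted(counts.items())}

mean_single = Fraction(sum(faces), 6)
mean_of_means = sum(m * p for m, p in dist.items())

var_single = sum((y - mean_single) ** 2 for y in faces) / 6
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

print(mean_single, mean_of_means)  # both centers
print(var_single, var_of_means)    # spread shrinks for the mean of two rolls
```

The variance of the mean of two rolls comes out to exactly half the variance of one roll, which matches the statement in part (e) that the distribution of the sample mean is less spread out.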
4.29. (a) \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 0.5/\sqrt{547} = 0.021378 \)
(b) If actually 50% of the population voted for Y, it would be surprising to obtain 55% in this exit poll for X, since 55% is 5% higher than 50%, and the standard error for the sampling distribution is 2.14%; that is, the sample proportion of 0.55 is approximately 2.34 standard errors above 0.50.
(c) Based on the information from the exit poll, I would be willing to predict that Candidate Y would win the
election.
4.30. (a) The mean is 10.6, and the standard error is \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 4/\sqrt{6} = 1.633 \).
(b) The mean is 10.6, and the standard error is \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 4/\sqrt{49} = 0.5714 \).
(c) The mean is 10.6, and the standard error is \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 4/\sqrt{100} = 0.4 \). As n increases, the standard error decreases in proportion to \( 1/\sqrt{n} \).
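
The three standard errors in Exercise 4.30 all come from dividing the population standard deviation by the square root of the sample size. A minimal sketch follows; the function name `standard_error` is an illustrative choice.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean for a random sample of size n."""
    return sigma / sqrt(n)

for n in (6, 49, 100):
    print(n, round(standard_error(4, n), 4))

# Quadrupling the sample size halves the standard error.
print(standard_error(4, 400) / standard_error(4, 100))
```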
4.31. (a) The mean is 0.20, and the standard error is \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 632.45/\sqrt{1{,}000{,}000} = 0.632 \).

(b) It is very unlikely to come out ahead. The z-score for $1 is (1 – 0.20)/0.63 = 1.27. The probability of winning more than $1 is therefore 0.1020, which is 10.20%.
4.32. (a) y does not have a normal distribution since the standard deviation is almost as large as the mean. This implies that y has a distribution that is skewed to the right.
(b) The sampling distribution of \( \bar{y} \) is approximately normal with mean 6.5 and standard error \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 4.9/\sqrt{700} = 0.1852 \).
(c) The sample mean would almost surely fall within 3 standard errors of the population mean, which is the interval 5.944 to 7.056.

4.33. (a) The probability that PDI is below 95 is \( P(Y < 95) = P\left(Z < \frac{95 - 100}{15}\right) = P(Z < -0.33) = 0.3707 \).
(b) The probability that the sample mean PDI is below 95 is \( P(\bar{Y} < 95) = P\left(Z < \frac{95 - 100}{15/\sqrt{36}}\right) = P(Z < -2.0) = 0.0228 \).
(c) An individual PDI below 95 is not surprising, since the probability is 0.3707. A sample mean PDI below 95 would be surprising, since the probability of 0.0228 is small.
(d) The sketch of the sampling distribution should be less spread out and have a taller peak and thinner tails
than the sketch of the population distribution.
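
The contrast in Exercise 4.33 between one observation and a sample mean comes down to which standard deviation is used: 15 for an individual, 15 divided by the square root of 36 for the mean of 36 observations. This is a sketch with the standard library; variable names are illustrative, and the unrounded z-score gives a slightly different first probability than the table lookup at z = –0.33.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 36

population = NormalDist(mu, sigma)               # distribution of one PDI score
sampling_dist = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean

p_individual = population.cdf(95)
p_sample_mean = sampling_dist.cdf(95)

print(round(p_individual, 4), round(p_sample_mean, 4))
```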

4.34. (a) \( P(\mu - 20 < \bar{y} < \mu + 20) = P\left(\frac{-20}{400/\sqrt{100}} < Z < \frac{20}{400/\sqrt{100}}\right) = P(-0.5 < Z < 0.5) \). Then the probability = 0.6915 – 0.3085 = 0.3830.
(b) If the actual population standard deviation is larger than 400, the probability would be smaller than the
probability found in part (a).
4.35. (a) The variable y is the number of children in a household in Country A.
(b) The center of the population distribution is 3.5 children, with standard deviation 2 children.
(c) The center of the sample data distribution is 4.2 children, with standard deviation 1.5 children.
(d) The center of the sampling distribution of the sample mean for 300 homes is 3.5 children, with standard error \( 2/\sqrt{300} = 0.1155 \).

4.36. (a) Let y = 1 if the student is female and y = 0 if the student is male.
(b) The population distribution of gender at this university has P(Y = 1) = 0.55 and P(Y = 0) = 0.45.
(c) The sample data distribution of gender has P(Y = 1) = 0.53 and P(Y = 0) = 0.47.
(d) In a random sample of size 60, we expect the sampling distribution of the sample proportion of females in the sample to be approximately normal with mean 0.55 and standard error \( \sqrt{0.55(0.45)/60} = 0.064 \).
4.37. (a) The population distribution is skewed to the right with mean 10.4 and standard deviation 6.0.
(b) The sample data distribution is based on the sample of 40 families and is skewed to the right with mean 8.6 and standard deviation 5.2.
(c) The sampling distribution of \( \bar{y} \) is approximately normal with mean 10.4 and standard error \( 6/\sqrt{40} = 0.95 \). This distribution describes the theoretical distribution for the sample mean.
(d) \( P(\mu - 0.5 < \bar{y} < \mu + 0.5) = P\left(\frac{-0.5}{6/\sqrt{40}} < Z < \frac{0.5}{6/\sqrt{40}}\right) = P(-0.53 < Z < 0.53) \). Then the probability equals 0.7019 – 0.2981 = 0.4038.
(e) If the sample were truly random, then the probability that \( \bar{y} \) would be 6.0 or less is \( P(\bar{y} < 6.0) = P\left(Z < \frac{6 - 10.4}{6/\sqrt{40}}\right) = P(Z < -4.64) \approx 0 \).
4.38. (a) Answers will vary.
(b) Answers will vary.
4.39. (a) Answers will vary.
(b) The sample size is not large enough.
4.40. (a) Answers will vary.
(b) Answers will vary.
(c) Answers will vary.

4.41. (a) The probability distribution of y is given below. The mean of the distribution is 0 ( 0.5) + 1 ( 0.5) = 0.5.
y 0 1
Probability 0.5 0.5
(b) The sample data distribution is
y 0 1
Probability 0.4 0.6
(c) Answers will vary.
(d) The theoretical mean of the sampling distribution is 0.5 and the theoretical standard deviation is \( \sigma/\sqrt{n} = 0.5/\sqrt{10{,}000} = 0.005 \).
4.42. Answers will vary.
4.43. (a) The population distribution is skewed to the left with mean 6 years and standard deviation 10 years.
(b) The sample data distribution is skewed to the left with mean 8.3 years and standard deviation 7.0 years.
(c) The sampling distribution of \( \bar{y} \) is approximately normal with mean 6 years and standard error 1.0 years. This distribution specifies probabilities for the possible values of \( \bar{y} \) for all the possible samples.
(d) The probability of observing a child younger than 5 in Rainbow City is P(Y < 5) = P[Z < (5 – 6)/10] = P(Z < –0.1) = 0.46. The probability of observing a sample mean below 5 for a random sample of size 100 is \( P[Z < (5 - 6)/(10/\sqrt{100})] = P(Z < -1) = 0.159 \). Thus, it is not unusual to observe a child younger than 5 years in Rainbow City, and it is not very unusual to observe a random sample of size 100 in the City with an average age below 5.
4.44. (a) The sampling distribution of \( \bar{y} \) for a random sample of size n = 1 is exactly the same as the population distribution.
(b) If you sample all 30,000 children in Rainbow City, there will not be a sampling distribution, and you will
know that the population mean is 6 years and the population standard deviation is 10 years.
4.45. (a) A stem-and-leaf plot of the population is
Stem (10s) Leaves (1s)
2 3
2 578899
3 0123
3 6677899
4 011233
4 556789
5 10014
5 677
6 02234
6 6778
7 01
7 6
8 1
(b) The mean of the \( \bar{y} \)-values in a long run of repeated samples of size 9 should be approximately 47.18.
(c) The standard deviation of the \( \bar{y} \)-values in a long run of repeated samples of size 9 should be approximately 4.9.
4.46. (a) The sample data distribution tends to resemble the population distribution more closely than the sampling
distribution. A random sample of data from a population should be representative of the population, and
its distribution should be similar to the population distribution.
(b) The sample data distribution is the distribution of data that we actually observe. The sampling distribution of \( \bar{y} \) is the probability distribution for the possible values of the sample statistic \( \bar{y} \).

4.47. (a) A lower bound for the mean is
\( \mu = \sum y P(y) = 1(0.02) + 2(0.14) + 3(0.07) + 4(0.33) + 5(0.15) + 6(0.28) = 4.26. \)
(b) Since we know the category of ideal number of children that falls at the 50% point, we can find the
median. The median is 4 children.

4.48. (a) \( P(y < \mu + 0.67\sigma) = P(z < 0.67) = 0.7486 \), or approximately 0.75. Thus, the upper quartile equals \( \mu + 0.67\sigma \).

(b) The IQR for a normal distribution is \( (\mu + 0.67\sigma) - (\mu - 0.67\sigma) = 1.34\sigma \). This gives us \( 1.5(\text{IQR}) = 1.5(1.34\sigma) = 2.01\sigma \). The 1.5(IQR) criterion would tell us that an outlier lies above \( \mu + 0.67\sigma + 2.01\sigma = \mu + 2.68\sigma \), which is about 2.7 standard deviations above the mean. The probability that data from a normal distribution would fall above this point is \( P(y > \mu + 2.7\sigma) = P(z > 2.7) = 0.0035 \). Note that outliers will also fall below \( \mu - 2.7\sigma \), which has area 0.0035 (by symmetry). Therefore, only 0.7% of the data are outliers using the 1.5(IQR) criterion.
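
The outlier calculation in Exercise 4.48 can be repeated with unrounded quartiles. The sketch below works on the standard normal scale, so the fences are in units of standard deviations from the mean; variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

q3 = Z.inv_cdf(0.75)  # upper quartile, about 0.67 standard deviations above the mean
q1 = -q3              # lower quartile, by symmetry
iqr = q3 - q1

upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# Probability of landing beyond either fence.
p_outlier = Z.cdf(lower_fence) + (1 - Z.cdf(upper_fence))

print(round(upper_fence, 3), round(p_outlier, 4))
```

The upper fence lands near 2.7 standard deviations and the two tails together hold roughly 0.7% of the distribution, in line with the answer above.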
4.49. The standard error for the poll, assuming that the true proportion is 0.5, is 0.0083. Since the 50.8% statistic from the exit poll is (0.508 – 0.5)/0.0083 = 0.96 standard errors above the expected mean, which is not unusual, we should not be willing to predict the winner of the race.
4.50. When n = 100, \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 0.5/\sqrt{100} = 0.05 \). The interval 0.5 – 3(0.05) = 0.35 to 0.5 + 3(0.05) = 0.65 is the interval within which the sample proportion is almost certain to fall.
When n = 1000, \( \sigma_{\bar{y}} = 0.5/\sqrt{1000} = 0.016 \). The interval 0.5 – 3(0.016) = 0.453 to 0.5 + 3(0.016) = 0.547 is the interval within which the sample proportion is almost certain to fall.
When n = 10,000, \( \sigma_{\bar{y}} = \sigma/\sqrt{n} = 0.5/\sqrt{10{,}000} = 0.005 \). The interval 0.5 – 3(0.005) = 0.485 to 0.5 + 3(0.005) = 0.515 is the interval within which the sample proportion is almost certain to fall.
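
The three intervals in Exercise 4.50 use the same recipe with different sample sizes. A minimal sketch follows, assuming a true proportion of 0.5 as in the exercise; the function name `three_se_interval` is an illustrative choice.

```python
from math import sqrt

def three_se_interval(p, n):
    """Interval of 3 standard errors around p for a sample proportion with sample size n."""
    se = sqrt(p * (1 - p) / n)
    return p - 3 * se, p + 3 * se

for n in (100, 1000, 10_000):
    low, high = three_se_interval(0.5, n)
    print(n, round(low, 3), round(high, 3))
```

Each tenfold increase in n narrows the interval by a factor of about 3.16, the square root of 10.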

4.51. The correct responses are (a), (c), and (d).
4.52. The correct response is (c).
4.53. False; as the sample size increases, the standard error of the sampling distribution of \( \bar{y} \) decreases, since \( \sigma_{\bar{y}} = \sigma/\sqrt{n} \) decreases as n increases.

4.54. (a) Group A: \( P(y < 500) = P\left(z < \frac{500 - 550}{120}\right) = P(z < -0.42) = 0.3372 \); almost 34% of students from Group A are not admitted to the Junior College.
Group B: \( P(y < 500) = P\left(z < \frac{500 - 600}{120}\right) = P(z < -0.83) = 0.2033 \); about 20% of students from Group B are not admitted to the Junior College.
(b) Of the students who are not admitted, 0.3372/(0.2033 + 0.3372) = 0.3372/0.5405 = 0.6239, or about 62%, are from Group A.
(c) If the new policy is implemented, the proportion of students from Group A that are not admitted would be
0.0188, while the proportion of students from Group B that are not admitted would be 0.0062. In this case,
about 75.2% of the students who are not admitted would be from Group A. Relatively speaking, this policy
would hurt students from Group A more than the current policy.
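
The comparison in Exercise 4.54 can be sketched as a small function of the admission cutoff. Two assumptions are built in and are not stated in the solution: the two groups are taken to be the same size (which is what adding the two probabilities directly implies), and the new-policy cutoff of 300 is a guess inferred from the probabilities 0.0188 and 0.0062. The function name `not_admitted` is hypothetical. Unrounded z-scores give values close to, but not identical with, the table-based figures.

```python
from statistics import NormalDist

def not_admitted(cutoff, mean_a=550, mean_b=600, sigma=120):
    """Return (P(A below cutoff), P(B below cutoff), share of rejections from A).

    Assumes equally sized groups, so the share is p_a / (p_a + p_b).
    """
    p_a = NormalDist(mean_a, sigma).cdf(cutoff)
    p_b = NormalDist(mean_b, sigma).cdf(cutoff)
    return p_a, p_b, p_a / (p_a + p_b)

current = not_admitted(500)
proposed = not_admitted(300)  # cutoff of 300 is an assumption, see lead-in

print([round(x, 4) for x in current])
print([round(x, 4) for x in proposed])
```

Lowering the cutoff shrinks both rejection rates but raises Group A's share of the rejections, which is the point made in part (c).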

4.55. (a) \( \mu = \sum y P(y) = 0(1 - \pi) + 1(\pi) = \pi \);
\( \sigma^2 = \sum (y - \mu)^2 P(y) = (0 - \pi)^2 (1 - \pi) + (1 - \pi)^2 (\pi) = \pi^2 - \pi^3 + \pi - 2\pi^2 + \pi^3 = \pi - \pi^2 = \pi(1 - \pi) \), so \( \sigma = \sqrt{\pi(1 - \pi)} \).

(b) The standard error for a sample proportion for a random sample of size n is \( \sigma/\sqrt{n} = \sqrt{\pi(1 - \pi)}/\sqrt{n} = \sqrt{\pi(1 - \pi)/n} \).

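4.56. Since
\[ f(\mu + c) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-((\mu + c) - \mu)^2/(2\sigma^2)} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-c^2/(2\sigma^2)} \]
and
\[ f(\mu - c) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-((\mu - c) - \mu)^2/(2\sigma^2)} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-c^2/(2\sigma^2)} \]
are equal, \( f(\mu + c) = f(\mu - c) \), and the curve is symmetric.
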

4.57. (a) The finite population correction is \( \sqrt{(30{,}000 - 300)/(30{,}000 - 1)} = \sqrt{0.99} = 0.995 \).
(b) If n = N, the finite population correction is \( \sqrt{(N - N)/(N - 1)} = 0 \), so \( \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{N}} \cdot 0 = 0 \).
(c) When n = 1, the finite population correction is \( \sqrt{(N - 1)/(N - 1)} = 1 \), so \( \sigma_{\bar{y}} = \sigma/\sqrt{1} = \sigma \). Thus, the sampling distribution of \( \bar{y} \) and its standard error are identical to the population distribution and its standard deviation.
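
The three cases in Exercise 4.57 can be checked numerically. The sketch below uses the finite population correction, the square root of (N - n)/(N - 1), with N = 30,000 as in part (a); the function name `fpc` is an illustrative choice.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction for a sample of size n from a population of size N."""
    return sqrt((N - n) / (N - 1))

N = 30_000
print(round(fpc(N, 300), 3))  # (a) sample of 300: correction is close to 1
print(fpc(N, N))              # (b) sampling everyone: no sampling variability
print(fpc(N, 1))              # (c) sample of one: no correction at all
```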
4.58. No solution provided.
4.59. No solution provided.
