
Sample Questions Solution

Term I Week 1

Chapter 1 Data Representation

Sec 1.1 Representing data using tables

(ST1) 1. Relative frequency = (52 + 78 + 134)/400 = 264/400 = 0.66
^
(ST2) 2. (0.42 + 0.1 + 0.06) × 50 = 0.58 × 50 = 29 customers
^
(ST3) 3. Proportion = (72 – 14)/80 = 58/80 = 0.725
^
(ST4) 4. (0.92 – 0.04) × 200 = 176 cars

^
Term I Week 2 and 3

Sec 1.2 Graphical representation of data

(ST5) 5. (5 + 10 + 15)/200 × 100% = 30/200 × 100% = 15%
^
(ST6) 6. (2.5 + 2.5 + 10 + 17.5)/100 × 800 = 32.5/100 × 800 = 260 seedlings
^
(ST7) 7. 4 students
^
(ST8) 8. Proportion = 11/40 = 0.275
^
(ST9) 9. The difference in temperature was 2 – (–3) = 5 °C
^

Level N | 1
Term I Week 4
Chapter 2 Statistical Measures
Sec 2.1 Measures of Center

(ST10) 10. In ascending order, the numbers, except p, are 21, 22, 24, 25, 29, 35, 36, 38, 40.
If Median = 31, p must be in the 6th place, so that (29 + p)/2 = 31, yielding p = 33.
^
(ST11) 11. The median belongs to the interval [58, 61), with the midpoint (58 + 62)/2 = 60
^
(ST12) 12. Mean = (p + 2p + 3p + 4p + 5p + 6p)/6 = 10.50 is equivalent to 21p = 63,
yielding p = 3
^
(ST13) 13. Mean = (15 × 15 + 45 × 25 + 75 × 40 + 105 × 60 + 135 × 10)/150 = 12,000/150 = 80 min
^
(ST14) 14. Mean = 15 × 0.15 + 25 × 0.25 + 35 × 0.4 + 45 × 0.15 + 55 × 0.05 = 32 min
^
(ST15) 15. The relative frequencies are 0.05, 0.2, 0.35, 0.25, 0.1, 0.05
The class centers are 10, 30, 50, 70, 90, 110
Mean = 10 × 0.05 + 30 × 0.2 + 50 × 0.35 + 70 × 0.25 + 90 × 0.1 + 110 × 0.05
= 56 min
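
The weighted-mean arithmetic used in solutions 14 and 15 can be checked with a short Python sketch. The variable names are illustrative and not part of the original solution; the numbers are the class centers and relative frequencies quoted in solution 15.

```python
# Class centers and relative frequencies as quoted in solution 15.
centers = [10, 30, 50, 70, 90, 110]
rel_freqs = [0.05, 0.2, 0.35, 0.25, 0.1, 0.05]

# Relative frequencies should sum to 1 before they are used as weights.
total_weight = sum(rel_freqs)

# Mean of grouped data = sum of (class center x relative frequency).
grouped_mean = sum(c * f for c, f in zip(centers, rel_freqs))

print(round(total_weight, 6), round(grouped_mean, 2))
```

The same two lists, with the centers and weights of solution 14 substituted, reproduce the 32 min result.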
^
(ST16) 16. Let x be the mean score on the remaining two tests.
Then, (3 × 78 + 2 × x)/5 = 80, which yields 234 + 2x = 400. Therefore, x = 83.
^
(ST17) 17. a. The median is the mean of the 43rd and 44th elements in the data set, when
arranged in ascending order. Both these elements belong to the interval
45 - 50, so, the median haircut time is between 45 and 50 minutes.
b. Since the distribution is skewed to the left, it is more likely that the median
haircut time of the 86 clients is greater than the mean.
^
(ST18) 18. The distribution is skewed to the left, since a few of the fruit types are sold in
smaller amounts than the rest.
The median should be used as the measure of central tendency because it is less
affected by the shape of the distribution and the presence of gaps and outliers
than the mean.
The interquartile range should be used as the measure of variation because it is
less affected by the shape of the distribution and the presence of gaps and
outliers than the range.

(ST19) 19. a. The mean would be (45 × 70 – 32 + 52)/70 = 45.29, correct to 2 dp.
b. E
^

Term I Week 5 and 6

Sec 2.3-2.4 Measures of location and Linear Transformation

(ST20) 20. 1. About 30 percent of the houses cost less than 60 thousand dollars.
3. About 20 percent of the houses cost more than 70 thousand dollars.
6. Less than 10 percent of the houses cost more than 80 thousand dollars.
^
(ST21) 21. a. The distribution is skewed to the left, since Median > Mean,
Q1 – Min = 10.56 > Max – Q3 = 4.47, and/or
Q3 – Median= 3.44 < Median – Q1 = 4.86.
b. About 75% of the seedlings have heights between 33.41 cm and 46.18 cm.
c. IQR = 41.71 – 33.41 = 8.3
Q1 – 1.5 ×IQR = 33.41 – 1.5(8.3) = 20.96 > Min.
So, there is at least one outlier on the low side.
Q3 + 1.5 × IQR = 41.71 + 1.5(8.3) = 54.16 > Max.
So, there are no outliers on the high side.
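
The 1.5 × IQR fences in part (c) can be recomputed with a minimal Python sketch. It uses only Q1 = 33.41 and Q3 = 41.71 from the solution; the variable names are illustrative, and the minimum and maximum are not restated here, so the comparison against them is left to the reader.

```python
# Quartiles as quoted in solution 21.
q1, q3 = 33.41, 41.71

# Interquartile range and the usual 1.5 x IQR outlier fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any observation below lower_fence or above upper_fence is flagged as an outlier.
print(round(iqr, 2), round(lower_fence, 2), round(upper_fence, 2))
```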
^
(ST22) 22. It is 0.5, which means that the time it took the student to finish the exam is 0.5
standard deviations below the mean time.
^
(ST23) 23. a. The predicted ideal weight is 7.95 × (128 – 88) = 318 kg more.
b. The required proportion is 0.62. (that is 62% of the variation in the ideal car
weight can be explained in terms of the variation in engine power using the
regression model).
^

Term I Week 7

Sec 2.5 Compare data sets

(ST24) 24. The distribution of the scores obtained by the students in Group A is
approximately normal (or symmetrical), while the distribution of the scores
obtained by the students in Group B is skewed to the left.
The median of the scores obtained by the students in Group A is less than the
median of the scores obtained by the students in Group B.
The mean of the scores obtained by the students in Group A (≈ 84) is less than
the mean of the scores obtained by the students in Group B (calculated to be
86.725).
^
(ST25) 25. 3. The inter-quartile range of data set A is smaller than that of data set B.
5. The standard deviation of data set A is smaller than that of data set B.
^
(ST26) 26. D. The median of the distribution of Semester 1 Exam scores is the same as the
first quartile of the distribution of Semester 2 Exam scores (both 83).
^
(ST27) 27. a. Day shift phone operators tended to work longer overtime
hours.
b. The distribution of day shift phone operators has a larger proportion of
those working more than 10 hours, which is (20 + 20 + 16 + 8)/100 = 0.64,
while for the night shift workers it is (6 + 4 + 2 + 1 + 1)/100 = 0.14.
c. The distribution of day shift phone operators has a larger median, since it is
skewed to the left.
d. The distribution of night shift phone operators appears to have larger
variability, since it has both a larger range and a larger IQR.
^
(ST28) 28. a. The median price of the houses on the right bank is between 110 and 120
thousand dollars, which is higher than the median price of the houses on the
left bank having a value between 100 and 110 thousand dollars.
b. The distribution of the prices of the houses on the right bank is skewed to the
right, so its mean is greater than its median.
The distribution on the left bank is approximately symmetrical, so its mean
is the same as its median (or very close).
Since from part a) we know that the median price of the houses on the right
bank is higher than the median price of the houses on the left bank, then, we
can conclude that the mean price of the houses on the right bank is higher
than the mean price of the houses on the left bank.
^
(ST29) 29. a. Using the minimum and maximum values from each boxplot, the combined
concentration of methane and carbon dioxide for Toronto must be within
6 + 4 = 10 ppb and 22 + 17 = 39 ppb.
For Mexico City: within 13 + 11 = 24 ppb and 31 + 25 = 56 ppb.
For Tokyo: within 8 + 6 = 14 ppb and 22 + 19 = 41 ppb.

b. Mexico City has the largest variation in the distribution of concentrations of
methane (CH4) in the samples collected since it has the largest range, and
since its IQR is larger than that of Tokyo and similar to that of Toronto.
Since 48 > 39 and 48 > 41 but 48 < 56, it is most likely that this sample was
collected from Mexico City.
^
(ST30) 30. a. The median height of the seedlings fed with the chemical fertilizer is 39
which is smaller than the median height of the seedlings fed with the organic
fertilizer which is 41. (OR The median height of the seedlings fed with the
chemical fertilizer is same as the lower quartile of the heights of the seedlings
fed with the organic fertilizer.)
b. The variation of heights of the seedlings fed with the chemical fertilizer is
larger than the variation of the heights of the seedlings fed with the organic
fertilizer since both the range and the interquartile range of variation of
heights of the seedlings fed with the chemical fertilizer are larger.
^

Term I Week 8 and 9

Chapter 3 Linear Regression

(ST31) 31. Since the association is linear, a change in x would produce a roughly
proportionate change in y.
Since the association is positive, an increase in x would produce an increase in
y.
Since the association is strong, the data can be modeled with a good degree of
accuracy using a regression line.
^
(ST32) 32. The actual BMI of this particular 15-year old student is 0.62 kg/m² higher than
would be predicted for a 15-year old student.
^
(ST33) 33. The actual grade point average is 4.72 – 0.09(14) – 0.18 = 3.28.
^
(ST34) 34. A student who is 1 year younger than another student is predicted to have a
lower BMI by 0.91 kg/m².
^
(ST35) 35. a. Point J.
b. Point G.
c. Points M and N.
^
(ST36) 36. After adding point M to the original scatterplot, both the slope of the least
squares regression line and the correlation coefficient decrease.
^
(ST37) 37. a. The scatterplot produced by the first method reveals a weak positive linear
association between the variables, while the scatterplot produced by the
second method reveals a moderate positive linear association.
In general, method 2 would yield a more accurate prediction for the
percentage of iron in the samples examined because the correlation
coefficient of the linear regression between the variables will be larger.
b. The sum of the squares of the residuals when predicting the percentage of
iron using method 2 is less than the sum when predicting the percentage of
iron using method 1.
^
(ST38) 38. Based on the residual graphs above, the height is a better predictor for the arm
span because the corresponding association decreases the variability in the
response variable that has to be explained after prediction.
^

(ST39) 39. Using the graph on the right, there are 5 points below the line y = x, meaning that 5
students scored less on the final exam than what they scored on the trial exam.
Using the equation of the least squares regression line, the predicted score is
7.12 + 0.93 (74) = 75.9, to 1 dp.
^
(ST40) 40. Statements B, C, and D may be justified.
^

Term I Week 10 and 11

Chapter 4 Probability

Sec 4.1 Probability of random events

(ST41) 41. Probability = 4/10 × 3/9 × 2/8 × 1/7 = 24/5040 = 0.005, to 3 dp
^
(ST42) 42. a. Probability = 4/7 × 3/6 × 2/5 = 24/210 = 4/35
b. Probability = 4/7 × 4/7 × 4/7 = 64/343
^
(ST43) 43. a. P(Same age group) = 0.27² + 0.18² + 0.41² + 0.07² + 0.07² = 0.28, to 2 dp
b. P(X ≥ 65 | X ≥ 25) = 0.07/(0.41 + 0.07 + 0.07) = 0.07/0.55 = 0.127, to 3 dp
^
(ST44) 44. a. The probability that the group assigned to take version 1 of the test consists
of two girls is equal to the probability that Boy 1 takes Version 2 and also
Boy 2 takes Version 2 =
P(Boy 1 takes Version 2) × P(Boy 2 takes Version 2) = 1/2 × 1/2 = 1/4 = 0.25.
b. The probability that the group assigned to take version 1 of the test consists
of Boy 1 and Girl 2 is equal to the probability that Boy 1 takes Version 1,
Boy 2 takes Version 2, and Girl 2 takes Version 1 =
P(Boy 1 takes Version 1) × P(Boy 2 takes Version 2) × P(Girl 2 takes Version 1)
= 1/2 × 1/2 × 1/2 = 1/8 = 0.125.
^
(ST45) 45. a. The probability that the group assigned to take version 2 of the test consists
of two girls is equal to the probability that the group assigned to take version
1 of the test consists of the two boys = 2/4 × 1/3 = 1/6 = 0.17 (to 2 dp).
b. The probability that the group assigned to take version 2 of the test consists
of Boy 2 and Girl 2 is equal to the probability that Boy 1 takes version 1, Boy
2 takes version 2, and Girl 1 takes version 1 = 2/4 × 2/3 × 1/2 = 1/6 = 0.17 (to 2 dp).
^
(ST46) 46. Complete the tree diagram

Student chosen randomly
  0.45  Student knows the name
          1          Student answers correctly
          0          Student answers incorrectly
  0.55  Student guesses the name
          1/5 = 0.2  Student answers correctly
          4/5 = 0.8  Student answers incorrectly

Probability (Naming the top winner) = 0.45 × 1 + 0.55 × 0.2 = 0.56


^
(ST47) 47. P(At least one fails) = 1 – 0.85 × 0.9 = 0.235
P(Only A fails | At least one fails) = (0.15 × 0.8)/0.235 = 0.12/0.235 = 0.51, to 2 dp
^
(ST48) 48. a. P(liberal and between 25 and 64) = (146 + 98)/1494 = 244/1494 = 0.16, to 2 dp
b. P(moderate or younger than 25) = (537 + 500 – 164)/1494 = 873/1494 = 0.58, to 2 dp
c. P(liberal | between 50 and 64) = 98/346 = 0.28, to 2 dp
d. P(between 25 and 49 | conservative) = 137/465 = 0.29, to 2 dp
^
(ST49) 49. a. P(Prefers Math) = 92/400 = 0.23
b. P(Math | Grade 8) = 40/180 = 0.22, to 2 dp
c. Based on the results obtained in parts (a) and (b) above, subject preference
and grade level for students at Cyprus Middle School are dependent.
^

Term I Week 12

Sec 4.2 Distributions of random variables

(ST50) 50. a. P(first 12 positive feedbacks) = 0.85¹².
b. P(13th or 14th is the first complaint | first 12 positive feedbacks)
= (0.85¹² × 0.15 + 0.85¹³ × 0.15)/0.85¹² = 0.2775.
^
(ST51) 51. a. P(a winning ticket) = 0.1 + 0.05 + 0.002 = 0.152
b. Expected value = 0 × 0.848 + 10 × 0.1 + 50 × 0.05 + 100 × 0.002 = $3.70
c. Amount raised = 8 × 1,000 – 3.70 × 1,000 = $4,300
^
(ST52) 52. Let random variable X represent the prize a player may win. The table below
shows the probability distribution of X.

Prize won ($)   0      8      10
Probability     0.64   0.16   0.2

Let n be the number of times the game is played.

Expected net income = 5 × n – (8 × 0.16 + 10 × 0.2) × n
= 5n – 3.28n = 1.72n
Solving 1.72n ≥ 200 for n gives n ≥ 200 ÷ 1.72 = 116.279….
Therefore, the game should be played at least 117 times.
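
The expected-value reasoning above can be sketched in Python as a quick check. The names `fee`, `target`, and `prizes` are illustrative labels for the numbers appearing in the solution (5 per play, a goal of 200, and the prize distribution); they are not taken from the original question text.

```python
import math

# Probability distribution of the prize, as tabulated in solution 52.
prizes = {0: 0.64, 8: 0.16, 10: 0.2}
fee = 5        # income per play (the "5 x n" term in the solution)
target = 200   # required expected net income

# Expected prize paid out per play, then expected net income per play.
expected_prize = sum(x * p for x, p in prizes.items())
net_per_game = fee - expected_prize

# Smallest whole number of plays whose expected net income reaches the target.
min_games = math.ceil(target / net_per_game)

print(round(expected_prize, 2), round(net_per_game, 2), min_games)
```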
^
(ST53) 53. a. The expected value of the number of busy lines in the afternoon of a working
day is 0 × 0.36 + 1 × 0.32 + 2 × 0.22 + 3 × 0.10 = 1.06.
b. P(At least 1 line available) = 0.36 + 0.32 + 0.22 = 0.9
(OR 1 – 0.1 = 0.9)
c. P(All available | At least 1 available) = 0.36 ÷ 0.9 = 0.4
^
(ST54) 54. a. E(X) = 36 × 0.2 = 7.2.
b. E(Y) = 7.2 × 5 – 28.8 × 1.25 = 0
c. Expected score = 29 × 5 + 7 × 0 = 145
d. In order to score more than 160 points, the student needs to earn more than
15 points while guessing answers to the remaining 7 questions. If he guesses
n questions right, he scores n × 5 – 1.25(7 – n) = 6.25n – 8.75. Solving 6.25n –
8.75 > 15 yields 6.25n > 23.75 or n > 3.8. So, the student needs to guess at
least 4 right answers.
P(more than 160 points) = P(at least 4 correct guesses)
= C(7, 4) × 0.2⁴ × 0.8³ + C(7, 5) × 0.2⁵ × 0.8² + C(7, 6) × 0.2⁶ × 0.8¹
+ C(7, 7) × 0.2⁷ × 0.8⁰ = 0.033, to 3 dp.
^

(ST55) 55. a. This is a geometric distribution, p = 0.4
P(X = 2) = 0.6 × 0.4 = 0.24.
b. X ~ B(0.4, 20)
P(X = 2) = C(20, 2) × 0.4² × 0.6¹⁸ = 0.003, to 3 dp
c. P(X > 2) = 1 – P(X ≤ 2) = 1 – Σ (x = 0 to 2) C(20, x) × 0.4^x × 0.6^(20 – x) = 0.996, to 3 dp
^
(ST56) 56. a. If the technician does not attempt to repair any board, his Net profit would
be 20 × (–5) = –$100.
On the other hand, Net profit = a × 0 – b = –b.
Therefore, b = 100.
b. If the technician attempts to repair all 20 boards, his Net profit would be 0.8
× 20 × 13 + 0.2 × 20 × (–7) = 208 – 28 = 180.
On the other hand, Net profit = a × 20 – 100 = 20a – 100.
Therefore, 180 = 20a – 100 and thus, a = 280 ÷ 20 = 14.
c. Solving 14n – 100 ≥ 60 for n yields 14n ≥ 160 or n ≥ 11.4
The technician should attempt to repair at least 12 boards.
^
(ST57) 57. Let X be the lowest possible score Steve could have.
Using the Inverse Normal distribution table (or GDC),
(X – 70)/6 ≥ 1.6449, yielding X ≥ 6 × 1.6449 + 70 = 79.8694.
Therefore, the lowest possible score is 80.
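
For readers without a table or GDC, the inverse-normal step can be sketched with Python's standard library. This assumes the cut-off is the 95th percentile, which is the percentile whose z-value is about 1.6449; the question statement itself is not reproduced in this key, so that percentile is an inference.

```python
import math
from statistics import NormalDist

# Scores modelled as N(70, 6^2), as used in solution 57.
scores = NormalDist(mu=70, sigma=6)

# ASSUMPTION: the required percentile is 0.95 (inferred from z = 1.6449).
z_value = NormalDist().inv_cdf(0.95)
cutoff = scores.inv_cdf(0.95)

# Scores are whole numbers, so round the cut-off up.
lowest_score = math.ceil(cutoff)

print(round(z_value, 4), round(cutoff, 2), lowest_score)
```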
^
(ST58) 58. a. Let random variable X represent the prize a ticket may win. The table below
shows the probability distribution of X.

Prize won ($)   0     7     19
Probability     1/2   3/8   1/8

E(X) = 0 × 1/2 + 7 × 3/8 + 19 × 1/8 = 40/8 = $5
The expected money raised from a single ticket is 8 – 5 = $3
b. If n is the number of tickets sold, 3n ≥ 150 yields n ≥ 50.
c. Using the normal distribution N(180, 50²), we have
P(Money collected ≥ 150) = 0.73, to 2 dp
^
(ST59) 59. a. Let random variable XA represent the amount of juice in a carton filled in by
machine A. XA ~ N(255, 8²).
P(XA > 260) = 0.27, to 2 dp
b. Let random variable X represent the amount of juice in a carton filled in by
any machine.
P(X > 260) = 0.45 × 0.27 + 0.55 × 0.42 = 0.35, to 2 dp
c. P(Machine B | X > 260) = (0.55 × 0.42)/0.35 = 0.66
^
(ST60) 60. Based on the given summary statistics, it is not reasonable to assume that the
distribution of the yearly incomes is normal because the maximum value is less
than one standard deviation from the mean.
^

(ST61) 61. The distribution of the weights of the coffee bags filled in by the two machines
combined is bimodal with a mean of about 12 ounces.
The distribution has a variation that is larger than the variation of the
individual distribution of the coffee bags filled in by each of the two machines.
^
(ST62) 62. For the first stack, the expected number is (1/100) × ((1 + 100)/2) × 100 = 50.5.
For the second stack, the expected number is (1/50) × ((201 + 250)/2) × 50 = 225.5.
The expected value of the sum of the two numbers is 50.5 + 225.5 = 276.
^
(ST63) 63. a. Let Y represent the height of an empty pallet and Z represent the total height
of a loaded pallet. Y ~ N(15, 1.3²) and Z ~ N(155, 5.2²).
Z = X1 + X2 + X3 + X4 + Y yields X1 + X2 + X3 + X4 = Z – Y
This means E(X1 + X2 + X3 + X4) = 4E(X) = E(Z – Y) = E(Z) – E(Y).
So, E(X) = 0.25(155) – 0.25(15) = 35
b. X ~ N(35, 2.68²)
P(X ≥ 33) = 1 – P(X < 33)
= 1 – P(Z < (33 – 35)/2.68)
= 1 – P(Z < –0.75)
= 1 – 0.2266 = 0.77, to 2 dp
^

Term I Week 13

Sec 4.3 Sampling distributions

(ST64) 64. a. p = 3,724/19,600 = 0.19
500 × 0.19 = 95 students are expected to be enrolled in vocational training
programs.
b. σ = √(np(1 – p)) = √(500 × 0.19 × 0.81) = √76.95 = 8.772... ≈ 9
c. X ~ N(95, 9²)
P(X ≤ 80) = P((X – 95)/9 ≤ (80 – 95)/9) = P(z ≤ –1.67) ≈ 0.05, to 2 dp
^
(ST65) 65. The variance of the sampling distribution of X̄
= (0.7)² × σ²(full-time)/70 + (0.3)² × σ²(part-time)/30
= (0.7)² × (4.8)²/70 + (0.3)² × (2.25)²/30 = 0.1764675.
To two decimal places, the standard deviation of the sampling distribution of
X̄ is equal to 0.42.
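
The weighted-variance calculation above can be sketched in Python. The tuple layout (weight, standard deviation, sample size) is an illustrative choice; the numbers are those used in solution 65.

```python
from math import sqrt

# (weight, standard deviation, sample size) for each group, from solution 65.
groups = [
    (0.7, 4.8, 70),   # full-time
    (0.3, 2.25, 30),  # part-time
]

# Var(weighted mean) = sum of weight^2 * sigma^2 / n over the groups.
variance = sum(w**2 * sd**2 / n for w, sd, n in groups)
std_dev = sqrt(variance)

print(round(variance, 7), round(std_dev, 2))
```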
^
(ST66) 66. a. X ~ N(60, 3.6²) and P(X > 63) = P((X – 60)/3.6 > (63 – 60)/3.6) = P(z > 0.83) ≈ 1 – 0.7967
≈ 0.203
b. Sample mean has standard deviation 3.6/√9 = 1.2
X̄ ~ N(60, 1.2²) and P(X̄ > 63) = P((X̄ – 60)/1.2 > (63 – 60)/1.2) = P(z > 2.5) ≈ 1 – 0.9938
≈ 0.006
^
(ST67) 67. a. Standard deviation is 0.6/√25 = 0.12.
b. Students are given 60/25 = 2.4 min per question.
If X̄ is the mean time a student needs to answer a question, X̄ ~ N(2.1, 0.12²) and
therefore, the estimated probability that an average student will not be able
to answer all the questions on the exam due to lack of time is
P(X̄ > 2.4) = P((X̄ – 2.1)/0.12 > (2.4 – 2.1)/0.12) = P(z > 2.5) ≈ 1 – 0.9938 ≈ 0.006, to 3 dp.
^

Term II Week 1 and 2

Chapter 5 Samples, Observations and Experiments


(ST68) 68. D. Stratified random sampling
^
(ST69) 69. The answer is D.
Meet with the guardians of 200 kids randomly selected from the identified
1,278 pre-school children and request each guardian to respond to a properly
designed questionnaire.
^
(ST70) 70. a. Melatonin level is the explanatory variable and suffering from dementia is
the response variable.
b. The study conducted is an observation because melatonin deficiency of the
patients was recorded before the study.
c. The results of the study are not conclusive to establish an association
between melatonin levels and dementia because it did not include a follow up
on people known to have no melatonin deficiency before being diagnosed
with dementia in the preceding years.
^
(ST71) 71. D.
[Histogram: frequency (0 to 300) against difference in proportions of success (–0.4 to 0.4)]

^
(ST72) 72. The experimental units are the glass containers with ant farms, the treatments
are the different temperatures in the five boxes and the response is the number
of ants in the glass containers.
^
(ST73) 73. a. The study is an experiment because treatments were imposed on the
participating individuals.
b. The study is completely randomized because of the selection of the sample
and the assignment of the individuals to each group. Thus, the results of the
study cannot be generalized to all middle age females suffering from
insomnia.

^
(ST74) 74. No, because the type of medication is confounded with age.
^

(ST75) 75. a. No, toddlers spending less time watching TV are not necessarily less likely to
suffer from eye problems than those who spend more time watching TV
because the study is an observation and not an experiment.
b. The study does not establish a cause-effect relationship between the time
spent watching TV and the frequency of having eye problems due to the
possibility of the presence of a lurking variable.
^

Term II Week 3 and 4

Chapter 6 Estimation
(ST76) 76. If all possible random samples of same size taken from the families living in the
city are considered and a 90 percent confidence interval is constructed for each
sample, then 90 percent of these intervals will contain the proportion of all
families in the city who have children in their households.
^
(ST77) 77. At the 90 percent confidence level, no prediction can be made about the policy
receiving more than 50 percent of the votes.
^
(ST78) 78. It is given that p̂ = 0.637 and z_{α/2} σ_p̂ = 0.088.
For a 90% interval of confidence, the critical value is z_{α/2} = 1.645.
If n is the size of the sample, then
z_{α/2} σ_p̂ = z_{α/2} √(p̂(1 – p̂)/n) = 1.645 √(0.637 × (1 – 0.637)/n) = 0.088.
Solving for n yields n = (1.645/0.088)² × 0.637 × 0.363 = 80.8, to 1 dp.
The size of the sample of families is 81.
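
Rearranging the margin-of-error formula for n can be checked with a short Python sketch. The variable names are illustrative; the three inputs are the values stated in solution 78.

```python
# Values quoted in solution 78.
z_crit = 1.645   # critical value for a 90% confidence level
margin = 0.088   # stated margin of error
p_hat = 0.637    # sample proportion

# margin = z * sqrt(p_hat * (1 - p_hat) / n)  =>  n = (z / margin)^2 * p_hat * (1 - p_hat)
n_exact = (z_crit / margin) ** 2 * p_hat * (1 - p_hat)
n_families = round(n_exact)

print(round(n_exact, 1), n_families)
```

Here the unknown is the sample size that was actually used, so the value is rounded to the nearest whole number rather than rounded up.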
^
(ST79) 79. The percentage of the citizens of the town who agree with the changes
introduced is likely to be within 4 percentage points from the proportion of
citizens in the sample who agreed with the street renaming.
^
(ST80) 80. a. The estimator of the population proportion is p̂ = 32/160 = 0.2 = a.
b. b = z_{α/2} √(p̂(1 – p̂)/n) = 1.96 √(0.2 × (1 – 0.2)/160) = 0.062, to 3 dp
c. The minimum number of patients in the group who would suffer withdrawal
side effects is 1,500 × (a – b) = 1,500(0.2 – 0.062) = 207.
^
(ST81) 81. a. The margin of error was z_{α/2} √(p̂(1 – p̂)/n) = 1.96 √(0.34 × 0.66/540) = 0.04, to 2 dp.
b. The maximum possible number of adult voters the legislators would expect
to be in favor of the conservative proposal is
1,700,000 × (0.34 + 0.04) = 646,000.
^
(ST82) 82. a. This condition must be satisfied in order to be able to approximate the
distribution of the sampling proportion by a normal distribution.
b. The significance of the difference between the proportion of adult voters in
state X who are in favor of the conservative proposal and the proportion of
those who do not favor this proposal cannot be investigated from the given

statistics using a two-sample z-interval for the difference between two
proportions because the voters surveyed came from the same sample.
^
(ST83) 83. a. The confidence interval provides convincing statistical evidence that the
program is working properly because it includes the value 0.1.
b. The confidence interval does not provide convincing statistical evidence that
the program generates the letter T with a probability of 0.1 because it
includes values other than 0.1
^
(ST84) 84. In the random sample of 850 citizens who participated in the poll, the
proportion of those supporting the proposal submitted by the Department of
Tourism is p̂ = 544/850 = 0.64.

It is reasonable to assume that the population of the city is much larger than
850 × 10 = 8,500. In addition, np̂ = 544 ≥ 10 and
n(1 – p̂) = 850 – 544 = 306 ≥ 10. Therefore, we may use a normal distribution
with mean equal to p̂ = 0.64 and standard deviation equal to
σ_p̂ = √(p̂(1 – p̂)/n) = √(0.64 × 0.36/850) = 0.01646 to establish the interval of
confidence.

For a 99% interval of confidence, the z-value is z_{α/2} = 2.576. The margin of
error is z_{α/2} σ_p̂ = 2.576 × 0.01646 = 0.0424.

The estimated 99% confidence interval for the population proportion is
p̂ ± margin of error, yielding the interval 0.64 ± 0.0424, which may be
expressed using interval notation as (0.5976, 0.6824).

Since 0.7 is not included in the 99 percent confidence interval, it is not a
plausible value for the proportion of citizens who will support the proposal
submitted by the Department of Tourism.
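
The one-sample z-interval above can be reproduced with a minimal Python sketch using only the standard library. The variable names are illustrative; the counts (544 of 850), the 99% level, and the claimed value 0.7 come from the solution.

```python
from math import sqrt
from statistics import NormalDist

successes, n, level = 544, 850, 0.99
claimed = 0.7  # value whose plausibility is being judged

# Point estimate and its standard error.
p_hat = successes / n
std_err = sqrt(p_hat * (1 - p_hat) / n)

# Two-sided critical value, margin of error, and interval endpoints.
z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
margin = z_crit * std_err
low, high = p_hat - margin, p_hat + margin

claimed_is_plausible = low <= claimed <= high
print(round(low, 4), round(high, 4), claimed_is_plausible)
```

The sketch uses the unrounded critical value, so the endpoints can differ from the hand calculation in the fourth decimal place.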
^
(ST85) 85. For any sampling proportion, the margin of error is always less than or equal
to 0.5 z_{α/2}/√n, where z_{α/2} = 1.645, for a 90% interval of confidence.
For the margin of error to be less than 0.02, we must have 0.5 z_{α/2}/√n < 0.02,
where n is the size of the sample.
Solving for n yields n > (0.5 × 1.645/0.02)² = 1,691.3, to one decimal place.
The minimum size of a random sample of students at school age who received
the flu vaccine should be 1,692.
^
(ST86) 86. a. Yes, since the interval (0.28 – 0.03, 0.28 + 0.03) = (0.25, 0.31) contains 0.3.
b. The value of the margin of error would be 0.03/√4 = 0.03/2 = 0.015.
c. No, since the interval (0.28 – 0.015, 0.28 + 0.015) = (0.265, 0.295) does not
contain 0.3.
^
(ST87) 87. By verifying that N ≥ 10n, we make sure that the effect of bias is minimized.
^
(ST88) 88. To invoke the central limit theorem and use the normal distribution to find the
confidence interval
^
(ST89) 89. The standard deviation of the sampling distribution of X̄ would be
6.4/√400 = 6.4/20 = 0.32.
^
(ST90) 90. A t-distribution is used when the standard deviation of the population is
unknown.
^
(ST91) 91. a. Yes, the population distribution can be assumed to be normal because the
data shows no outliers or strong skewness as concluded from the stemplot.
b. Not all conditions have been met to approximate the sampling distribution
by a normal distribution, since the sample size is less than 30.
c. If conditions for inference are met, the appropriate procedure to establish a
90 percent confidence interval is a one sample t-interval for a population
mean because the standard deviation of the population is unknown.
^
(ST92) 92. 1. The two samples were randomly selected from two independent populations.
2. The size of the population of students majoring in Sciences across the state
and the size of the population of students majoring in Economics both are
expected to be greater than or equal to 1,000.

3. The size of each of the two samples is at least 30.

^
(ST93) 93. If all samples of 80 teenage students selected randomly from the schools in the
city considered are surveyed and the corresponding 95 percent confidence
intervals are constructed, then 95 percent of these intervals will contain the
mean of the population of teenage students in this city.
^
(ST94) 94. a. Since the population, which consists of all the Hokuto apples grown in Japan,
has a size larger than 10 times the size of the sample and the conditions for
inference are satisfied, the sampling distribution of the mean is
approximated by a t-distribution with degrees of freedom df = 25 – 1 = 24.
It is given that the mean x̄ = 8.28 and the standard deviation
σ_x̄ = 0.82/√25 = 0.164.
At the confidence level 95%, the table for t-distributions with degrees of
freedom df = 24 gives a critical value t_{α/2} = 2.064.
Therefore, the margin of error is e = 2.064 × 0.164 = 0.34.
b. x̄ ± t_{α/2} × σ_x̄ = 8.28 ± 2.064 × 0.164 = 8.28 ± 0.34, which is written as (7.94,
8.62),
yielding a = 7.94.
c. b = 8.62.
^
(ST95) 95. a. At the confidence level 95%, the table for t-distributions with degrees of
freedom df = 82 – 2 = 80 gives a critical value t_{α/2} = 1.99.
a = 1.7 – 1.99(0.18) = 1.3 (to 1 dp)
b. b = 1.7 + 1.99(0.18) = 2.1 (to 1 dp)
^

Term II Week 5

Chapter 7 Significance Tests


Sec 7.1 Tests For proportions

(ST96) 96. The p-value is less than the significance level α = 0.05. Thus, a type I error may
occur leading to the rejection of H0 in favor of Ha.
As a consequence, the board will finance special assistance programs.
^
(ST97) 97. The probability of type II error would increase and the power of the test would
decrease.
^
(ST98) 98. In this situation, the appropriate test to use is a two-proportion z-test because
the number of those who approve of the new policies and the number of those
who disapprove in each sample are greater than 10.
^
(ST99) 99. a. The sampling distribution of the difference p̂1 – p̂2 is approximately normal
because both counts among each sample (count of those who approve and
count of those who disapprove) are greater than 10.
b. The condition of independence of individuals in each group was guaranteed
since it is reasonable to assume that the size of each sample is less than 10%
of the corresponding population.
c. Since the p-value 0.03 is smaller than the significance level α = 0.05, the data
provided statistical evidence of a significant difference in opinions with
respect to imposing an additional tax on dogs between the population of
adult dog owners and the population of adults who do not own a dog.
^
(ST100) 100. a. If the conditions for inference are verified, the researcher must conduct a
two-sample z-test for comparing two proportions.
b. The condition for invoking the Central Limit Theorem is met because the
proportions of those who said yes and those who said no in each sample are
large relative to the corresponding sample size.
c. The condition of the independence of the two samples may be assumed to be
met because the samples were selected from two different states.
^
(ST101) 101. a. A two-sample z-test should be used.
The null hypothesis H0: pr = pb or pr – pb = 0, the alternative hypothesis
Ha: pr > pb or pr – pb > 0.
b. p̂1 = 585/750 = 0.78, p̂2 = 555/750 = 0.74, and p̂ = (585 + 555)/1,500 = 0.76
Test statistic: z* = (p̂1 – p̂2)/σ_(p̂1 – p̂2) = (p̂1 – p̂2)/√(p̂(1 – p̂)(1/n1 + 1/n2))
= (0.78 – 0.74)/√(0.76(1 – 0.76)(2/750)) = 1.8 (to 1 dp).

c. The p-value = P(z ≥ z*) = P(z ≥ 1.8) = 1 – 0.9641 = 0.0359 (to 4 dp).
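
The pooled two-proportion z-test above can be sketched in Python with the standard library only. The variable names are illustrative, and the upper-tail p-value assumes the one-sided alternative implied by P(z ≥ z*) in part (c).

```python
from math import sqrt
from statistics import NormalDist

# Counts of "successes" and sample sizes, from solution 101.
x1, n1 = 585, 750
x2, n2 = 555, 750

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)

# Standard error under H0 (equal proportions) and the test statistic.
std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = (p1 - p2) / std_err

# Upper-tail p-value, with and without rounding z to 1 dp as the key does.
p_value_exact = 1 - NormalDist().cdf(z_stat)
p_value_key = 1 - NormalDist().cdf(round(z_stat, 1))

print(round(z_stat, 2), round(p_value_exact, 4), round(p_value_key, 4))
```

Rounding z* to one decimal place before looking up the table, as the key does, gives 0.0359; carrying the unrounded statistic gives a slightly smaller p-value, with the same conclusion at the 0.05 level.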
^

Term II Week 6 and 7

Sec 7.2 Test for means

(ST102) 102. H0 :   5
Ha :   5
^
(ST103) 103. The proper test is a two-sample t-test for the difference between two means.
^
(ST104) 104. a. Since the P-value is greater than 0.05, the data do not provide enough
convincing evidence that the drug prescribed is more effective in reducing
systolic blood pressure than just dieting for male adults who are at risk of
hypertension.
b. Since the P-value is less than 0.1, the data provide convincing evidence that
the drug prescribed is more effective in reducing systolic blood pressure
than just dieting for male adults who are at risk of hypertension.
^
(ST105) 105. a. A proper test to use is a two-sample pooled t-test.
b. H0 :  A  B and Ha :  A  B
^
(ST106) 106. a. The independence of the individuals in each sample is ensured by the size
of the samples as compared to the corresponding sizes of the populations
b. The normal approximation is justified by the distribution of the data in
each sample, as observed in the dot plots.
^
(ST107) 107. a. The number of degrees of freedom is df = 10 + 8 – 2 = 16.
b. Since 0.0504 > 0.05, the data do not provide enough evidence that the
means of the amounts of coffee beans dispensed by the two machines are
different.
c. P-value = 0.0504/2 = 0.0252
d. Since 0.0252 < 0.05, the data provide convincing evidence that the mean
amount of coffee beans dispensed by machine A is greater than the mean
amount of coffee beans dispensed by machine B.
^
(ST108) 108. The proper test is a paired t-test with 9 degrees of freedom.
^
(ST109) 109. a. The test statistic is t* = (11 – 0)/(2.67/√10) = 13 (to the nearest whole number).
b. Based on the statistic calculated in part (a) and the corresponding P-value
0.000019, which is smaller than all three commonly used significance
levels, the test provides convincing statistical evidence that supports the
fact that on average the prices in city A are higher than the prices in city
B.
^

Term II Week 8 and 9

Chapter 8 Chi-Square Tests


(ST110) 110. Chi-square test for goodness of fit
                 Willing to work some weekends
Shift worked     Yes    No     Total
Morning          16     15     31
Afternoon        59     22     81
Evening          12     26     38
Total            87     63     150
Using the respective total values, the expected cell count of the employees
who work the afternoon shift and are willing to work some weekends is
(87 × 81)/150 = 46.98 ≈ 47.

^
(ST111) 111. The small p-value of the test suggests that the data provides evidence of an
association between availability to work some weekends and shift assigned.
^
(ST112) 112. a. Since 0.036 > 0.01, we conclude that at the significance level 0.01, the test
does not provide statistical evidence that fertilizing approach affects
hydrangea blooming density.
Since 0.036 < 0.05, we conclude that at the significance level 0.05, the test
provides statistical evidence that fertilizing approach affects hydrangea
blooming density.
b. The gardener’s experiment shows that the slow-release granular formula
affects hydrangea blooming density more than the water-soluble liquid.
^
(ST113) 113. We produce the expected counts in the two-way table. The entries are
calculated to one decimal place,
            Symptoms Relief Timeline (Number of Days)
            1 to 2    3 to 4    5 to 6    7 to 8    Total
Males       46.9      36.5      24.4      12.2      120
Females     30.1      23.5      15.6      7.8       77
Total       77        60        40        20        197
χ² = Σ (Observed – Expected)² / Expected
= (46.9 – 42)²/46.9 + (36.5 – 34)²/36.5 + (24.4 – 28)²/24.4 + (12.2 – 16)²/12.2
+ (30.1 – 35)²/30.1 + (23.5 – 26)²/23.5 + (15.6 – 12)²/15.6 + (7.8 – 4)²/7.8 ≈ 6.14
The number of degrees of freedom is
df = (number of rows – 1) × (number of columns – 1) = 1 × 3 = 3. Thus, the
p-value = 0.105.
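The calculation above can be reproduced with the standard library alone. This is a sketch under stated assumptions: the observed counts are read off the second number in each squared difference, the expected counts are rounded to one decimal place to match the table, and the p-value uses a closed form that holds only for 3 degrees of freedom. All variable names are illustrative.

```python
import math

# Observed counts, taken from the terms of the sum above.
observed = [
    [42, 34, 28, 16],  # Males
    [35, 26, 12, 4],   # Females
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts, rounded to one decimal place as in the table.
expected = [
    [round(r * c / grand_total, 1) for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum of (expected - observed)^2 / expected over all cells.
chi_sq = sum(
    (e - o) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-square variable with exactly 3 degrees
# of freedom (closed form; it does not apply to other values of df).
half = chi_sq / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2 * chi_sq / math.pi) * math.exp(-half)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Using unrounded expected counts would change the statistic slightly, so small differences from software output are to be expected.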
^
(ST114) 114. a. For the day-shift employees, Proportion = (10 + 4)/67 ≈ 0.21.
For the night-shift employees, Proportion = (14 + 8)/83 ≈ 0.27, which is slightly
larger than the proportion for the day-shift employees.
b. Since the p-value is larger than any of the commonly used significance
levels, the test did not provide strong enough evidence to conclude that
day-shift employees produce better work quality than night-shift
employees.
^
(ST115) 115. a. H0: There is no association between time of taking the omega-3 capsule
and severity of nausea.
Ha: There is an association between time of taking the omega-3 capsule
and severity of nausea.

b. Since the sample was selected randomly and all expected values are at least
5, conditions for inference are met.
^
(ST116) 116. a. Since P-value > α, the researcher will conclude that there is no convincing
evidence of an association between time of taking the omega-3 capsule and
severity of nausea.
b. Type II error, since H0 is not rejected although it may be false.
^
(ST117) 117. a. We produce the expected counts in the two-way table. The entries are
calculated to one decimal place,

Weekly average number     Engagement in risk behavior
of classes skipped        No        Yes       Total
0                         74.7      25.3      100
1 to 5                    45.6      15.4      61
6 or more                 21.7      7.3       29
Total                     142       48        190

χ² = Σ (Observed – Expected)² / Expected
= (74.7 – 83)²/74.7 + (25.3 – 17)²/25.3 + (45.6 – 42)²/45.6
+ (15.4 – 19)²/15.4 + (21.7 – 17)²/21.7 + (7.3 – 12)²/7.3 ≈ 8.81
The number of degrees of freedom is
df = (number of rows – 1) × (number of columns – 1) = 2 × 1 = 2. Thus, the
p-value = 0.012.
b. Since p-value < 0.05, it can be concluded that at the significance level
α = 0.05 there is convincing evidence of an association between the weekly
average number of classes skipped and engagement in risk behavior.
Since p-value > 0.01, it can be concluded that at the significance level
α = 0.01 there is no convincing evidence of an association between the
weekly average number of classes skipped and engagement in risk
behavior.
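The p-value and the two decisions can be checked with a short sketch. It assumes the statistic 8.81 and df = 2 from part (a) and relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability equals exp(–x/2). The variable names are illustrative.

```python
import math

chi_sq = 8.81  # test statistic from part (a)
df = 2         # (3 - 1) x (2 - 1) for a table with 3 rows and 2 columns

# For exactly 2 degrees of freedom the chi-square upper-tail probability
# has the closed form exp(-x / 2); this shortcut is specific to df = 2.
p_value = math.exp(-chi_sq / 2)

decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.3f} -> {decisions[alpha]}")
```

For other degrees of freedom a chi-square table or statistical software is needed instead of this shortcut.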
^
(ST118) 118. a. df = (3 – 1)(2 – 1) = 2
b. We produce the expected counts in the two-way table. The entries are
calculated to one decimal place,
                          Response
                          Approving   Disapproving   Neutral   Total
Younger than 40 years     300.0       161.2          298.8     760
40 years or older         413.0       221.8          411.2     1,046
Total                     713         383            710       1,806

χ² = Σ (Observed – Expected)² / Expected
= (300 – 320)²/300 + (161.2 – 136)²/161.2 + (298.8 – 304)²/298.8
+ (413 – 393)²/413 + (221.8 – 247)²/221.8 + (411.2 – 406)²/411.2 ≈ 9.26
Thus, the p-value = 0.0098 (to 2 sf)
Since p-value < 0.01, it can be concluded that at the significance level
α = 0.01 there is convincing evidence that the distribution of adults by
approval of local authorities in city A differs by age group.
^
(ST119) 119. a. df = n – 1 = 5 – 1 = 4
b. Since the P-value is greater than any of the commonly accepted significance
levels, it can be concluded that there is no convincing evidence to believe
that the spinner is biased.
c. The difference between the frequencies reported in the table can be
justified by sampling variation.
^
(ST120) 120. Chi-square test for goodness of fit
