
UNIVERSITY OF TECHNOLOGY - VNUHCM
Faculty of Applied Science

Final Exam
Semester/Acad. year: 1 / 2022-2023
Date: December 24th, 2022
Course title: Probability and Statistics
Course ID: MT2013
Question sheet code: 2211
Duration: 100 minutes
Shift: 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: Hoang Tran ..............................  Invigilator 1: .............................

Student ID: ..............................  Invigilator 2: .............................

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 5% of the valid messages. Also, 10% of the messages
are spam.
1. Find the probability that the message contains the word "free".
A 0.365 B 0.065 C 0.165 D 0.265 E 0.465
P(free) = 0.1 * 0.2 + 0.9 * 0.05 = 0.065 => B
2. Find the probability that the message is spam given that it contains the word "free".
A 0.0077 B 0.3077 C 0.5077 D 0.6077 E 0.4077
P(spam | free) = P(spam & free) / P(free) = 0.1 * 0.2 / 0.065 = 0.3077 => B
3. Compute the probability that the message is spam or contains the word "free".
A 0.145 B 0.445 C 0.545 D 0.345 E 0.245
P(spam or free) = 1 - 0.9 * (1 - 0.05) = 0.145 => A
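
The arithmetic for Questions 1 through 3 can be checked with a short script. This is a minimal sketch using only the percentages stated in the problem; the variable names are illustrative and not part of the exam.

```python
# Probabilities stated in the problem.
p_spam = 0.10                # P(spam)
p_free_given_spam = 0.20     # P("free" | spam)
p_free_given_valid = 0.05    # P("free" | valid)

p_valid = 1 - p_spam

# Q1: law of total probability.
p_free = p_spam * p_free_given_spam + p_valid * p_free_given_valid

# Q2: Bayes' rule.
p_spam_and_free = p_spam * p_free_given_spam
p_spam_given_free = p_spam_and_free / p_free

# Q3: inclusion-exclusion for the union.
p_spam_or_free = p_spam + p_free - p_spam_and_free

print(round(p_free, 4), round(p_spam_given_free, 4), round(p_spam_or_free, 4))
```

The union in Question 3 is computed by inclusion-exclusion, which gives the same value as the complement form 1 - P(valid) * P(no "free" | valid).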
Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 5 packages resulted in the following data: 15, 14.2, 17.8, 14.1,
16.7.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is not equal to 13.7 (%).
mean_sample = u = 15.56
std_sample = sqrt(((15-u)^2 + ... ) / 4) = 1.6288
4. Find the estimated standard deviation of the sample mean.
A 0.7284 B 0.0284 C 1.9284 D 2.4284 E 2.6284
std_SM = std_sample / sqrt(5) = 0.7284 => A

5. Find the test statistic value.


A 1.6535 B 1.8535 C 2.5535 D 4.4535 E 3.0535

6. Find the rejection region.


A (−∞, −1.64) ∪ (1.64, +∞) B (−∞, −2.015) ∪ (2.015, +∞) C (−∞, −1.28) ∪ (1.28, +∞)
D (−∞, −1.533) ∪ (1.533, +∞) E (−∞, −2.132) ∪ (2.132, +∞)

Use t-distribution as N < 30 => t_0.05_df_4 = 2.132 => E


7. Calculate a 90% confidence interval on the population mean.
A [14.3654 , 16.7546] B [14.6276 , 16.4924] C [14.4433 , 16.6767] D [14.007 , 17.113]
E [14.0922 , 17.0278]
CI = 15.56 +/- 2.132 * std_err = 15.56 +/- 2.132 * 0.7284 => D

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1.9, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.25?
z_0.05 * sqrt(1.9) / sqrt(N) <= 0.25 => N >= 82..
A 79 B 82 C 78 D 76 E 87
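
The calculations for Questions 4 through 8 can be sketched as follows. The critical values are hard-coded table values rather than computed: 2.132 for t with 4 degrees of freedom (as in the rejection region options) and 1.64 for z, which is an assumption taken from the rounding used in the answer choices. With z = 1.645 the required sample size would come out one larger.

```python
import math
import statistics

data = [15, 14.2, 17.8, 14.1, 16.7]
mu0 = 13.7          # hypothesized mean
t_crit = 2.132      # t_{0.05, 4}, table value
z_crit = 1.64       # z_{0.05} as rounded in the answer choices (assumption)

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)

# Q4: estimated standard deviation of the sample mean.
se = s / math.sqrt(n)

# Q5 and Q6: two-sided t-test of H0: mu = mu0.
t_stat = (xbar - mu0) / se
reject_h0 = abs(t_stat) > t_crit

# Q7: 90% confidence interval on the mean.
ci = (xbar - t_crit * se, xbar + t_crit * se)

# Q8: smallest n with z * sigma / sqrt(n) <= 0.25 when sigma^2 = 1.9.
n_required = math.ceil((z_crit * math.sqrt(1.9) / 0.25) ** 2)

print(round(se, 4), round(t_stat, 4), reject_h0)
print(tuple(round(v, 4) for v in ci), n_required)
```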

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature (x) and pavement deflection (y). Summary quantities were $n = 19$,
$\sum_{i=1}^{n} x_i = 1059$, $\sum_{i=1}^{n} y_i = 359.288$, $\sum_{i=1}^{n} x_i^2 = 73099$,
$\sum_{i=1}^{n} y_i^2 = 8458.311$, and $\sum_{i=1}^{n} x_i y_i = 24727.568$.

9. Find the sample correlation of this dataset.


A 0.7983 B 0.8145 C 0.5344 D 0.827 E 0.9716

10. If the surface temperature increases by 1°F, the pavement deflection is expected to be
A increased by about 0.3341 units B increased by about 0.6682 units C decreased by about 0.3341 units
D decreased by about 0.2883 units E increased by about 0.2883 units

11. Find the estimated standard error for the fitted slope coefficient β̂1.
A 0.4053 B 0.399 C 0.0197 D 0.2364 E 0.1468

12. Calculate a 99% confidence interval on the slope coefficient β1.


A [-0.5015,1.1697] B [0.2832,0.385] C [0.2769,0.3913] D [0.2834,0.3848] E [0.2881,0.3801]

13. Compute the residual for an observation y = 20.87 at x = 71.


A -2.7973 B -3.1083 C -3.3001 D -3.4707 E -3.1393

14. Find the coefficient of determination (in %) for the linear regression model.
A 84.8381 B 94.3945 C 97.1568 D 86.3936 E 75.5051
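
Questions 9 through 14 all follow from the summary sums through the corrected sums of squares Sxx, Syy and Sxy. The sketch below works from those sums only. The critical value 2.898 for t with 17 degrees of freedom at the 0.005 level is a table value supplied here as an assumption, since the sheet does not print it.

```python
import math

# Summary quantities from the problem statement.
n = 19
sum_x, sum_y = 1059, 359.288
sum_x2, sum_y2 = 73099, 8458.311
sum_xy = 24727.568
t_crit = 2.898      # t_{0.005, 17}, table value (assumption)

# Corrected sums of squares and cross products.
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)              # Q9: sample correlation
b1 = sxy / sxx                              # Q10: fitted slope
b0 = sum_y / n - b1 * sum_x / n             # fitted intercept

sse = syy - b1 * sxy                        # residual sum of squares
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)                # Q11: standard error of the slope
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # Q12: 99% CI on the slope

residual = 20.87 - (b0 + b1 * 71)           # Q13: observed minus fitted at x = 71
r2_percent = 100 * r ** 2                   # Q14: coefficient of determination in %

print(round(r, 4), round(b1, 4), round(se_b1, 4))
print(tuple(round(v, 4) for v in ci_b1), round(residual, 4), round(r2_percent, 4))
```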

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given below.
Algorithm 1   4.7   4.9   5.8   4.8    3.7
Algorithm 2   6.7   5.8   7.7   6.0    8.7
Algorithm 3   8.6   9.6   9.5   10.0   10.8
Consider an ANOVA situation with a significance level α = 0.01.

15. Choose the correct quantity to describe the total variability between treatment means.
A 60.7413 B 672.7391 C 71.4373 D 300.7403 E 10.696

16. Find the test statistic value.


A 36.073 B 38.0729 C 28.0734 D 34.0733 E 32.0732

17. Find the rejection region.


A [3.8, +∞] B [−∞, 6.93] C [8.51, +∞] D [−∞, 3.8] E [6.93, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.8242 B 0.8843 C 2.5643 D 3.2843 E 2.0393

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
B The development cost mean of algorithm 1 is greater than the one of algorithm 2.
C Each of the three algorithms has a different development cost mean.
D There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
E There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.

20. Find a 99% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-3.1356,0.5128] B [-7.0241,-3.3757] C [-4.0242,-0.3758] D [-1.4691,2.1793]
E [-3.6911,-0.0427]
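
The one-way ANOVA quantities behind Questions 15 through 20 can be sketched as below. Both critical values are table values entered by hand: 6.93 for F with (2, 12) degrees of freedom at α = 0.01, as in the rejection region options, and 3.055 for t with 12 degrees of freedom at the 0.005 level, which is an assumption not printed on the sheet.

```python
import math
import statistics

groups = {
    1: [4.7, 4.9, 5.8, 4.8, 3.7],
    2: [6.7, 5.8, 7.7, 6.0, 8.7],
    3: [8.6, 9.6, 9.5, 10.0, 10.8],
}
f_crit = 6.93       # F_{0.01; 2, 12}, table value
t_crit = 3.055      # t_{0.005, 12}, table value (assumption)

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.mean(all_values)
means = {k: statistics.mean(g) for k, g in groups.items()}

# Q15: sum of squares between treatment means, and the within-group sum of squares.
ss_treat = sum(len(g) * (means[k] - grand_mean) ** 2 for k, g in groups.items())
ss_error = sum((v - means[k]) ** 2 for k, g in groups.items() for v in g)

df_treat = len(groups) - 1
df_error = len(all_values) - len(groups)
mse = ss_error / df_error

# Q16 and Q17: F statistic and decision.
f_stat = (ss_treat / df_treat) / mse
reject_h0 = f_stat > f_crit

# Q18: Fisher's LSD for equal group sizes (5 per group).
lsd = t_crit * math.sqrt(2 * mse / 5)

# Q19: which pairs of means differ by more than the LSD.
pairs = [(1, 2), (1, 3), (2, 3)]
differs = {p: abs(means[p[0]] - means[p[1]]) > lsd for p in pairs}

# Q20: 99% CI for mu_1 - mu_2.
diff_12 = means[1] - means[2]
ci_12 = (diff_12 - lsd, diff_12 + lsd)

print(round(ss_treat, 4), round(ss_error, 4), round(f_stat, 4), reject_h0)
print(round(lsd, 4), differs, tuple(round(v, 4) for v in ci_12))
```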

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.
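
One possible way to set up Question 21 is to reduce each part to a binomial count. This is a sketch under stated assumptions, not an official solution: in part (a) a randomly chosen product comes from firm A with probability 0.25 and from firm B otherwise, and in part (b) "from 60 to 95" is read as inclusive. The helper names `poisson_pmf` and `binom_pmf` are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# (a) Probability that one random product has exactly 3 errors,
#     mixing firm A (mean 0.1) and firm B (mean 0.2).
p_three = 0.25 * poisson_pmf(3, 0.1) + 0.75 * poisson_pmf(3, 0.2)
# More than 12 such products out of 15 means 13, 14 or 15.
prob_a = sum(binom_pmf(k, 15, p_three) for k in range(13, 16))

# (b) Probability that a firm A product has at least 1 error.
p_defect = 1 - poisson_pmf(0, 0.1)
# Between 60 and 95 such products (inclusive) out of 100.
prob_b = sum(binom_pmf(k, 100, p_defect) for k in range(60, 96))

print(p_three, prob_a)
print(p_defect, prob_b)
```

Both per-product probabilities are small, so both requested probabilities come out extremely close to zero. The exact binomial sum is used here rather than a normal approximation, which would be unreliable this far into the tail.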

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before   230   243   256   260   295   283   212   287   269   272
After    229   240   267   257   280   280   230   280   270   205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

22)
H_0: u_before - u_after <= 0
H_1: u_before - u_after > 0
u_before = 260.7, u_after = 253.8 => u_before - u_after = 6.9
var_before = 690.2333, var_after = 695.5111 => population variances assumed equal
s_before = 26.2723, s_after = 26.3725
s_p = sqrt(((n_1 - 1) * s_before^2 + (n_2 - 1) * s_after^2) / (n_1 + n_2 - 2)) = 26.3225, where s_p is the pooled standard dev
std_(u_before - u_after) = s_p * sqrt(1/n_1 + 1/n_2) = 11.772
Test statistic = 6.9 / 11.772 = 0.5861
t_0.05_df_18 = 1.734; since 0.5861 < 1.734 => Fail to reject H_0
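
Because each subject is measured both before and after the program, the two samples are paired rather than independent, so a paired t-test on the per-subject differences is the more natural analysis. The sketch below assumes the differences are normally distributed and uses the table value 1.833 for t with 9 degrees of freedom at the 0.05 level, which is an assumption not printed on the sheet.

```python
import math
import statistics

before = [230, 243, 256, 260, 295, 283, 212, 287, 269, 272]
after = [229, 240, 267, 257, 280, 280, 230, 280, 270, 205]
t_crit = 1.833      # t_{0.05, 9}, table value (assumption)

# Per-subject reduction in cholesterol (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)               # sample standard deviation of the differences
se_d = s_d / math.sqrt(n)

# One-sided test of H0: mu_d <= 0 against H1: mu_d > 0.
t_stat = d_bar / se_d
reject_h0 = t_stat > t_crit

print(round(d_bar, 4), round(s_d, 4), round(t_stat, 4), reject_h0)
```

Under these assumptions the test statistic is below the critical value, so the data do not give sufficient evidence at α = 0.05 of a mean reduction in cholesterol.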

