
Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2211
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 5% of the valid messages. Also, 10% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.365 B 0.065 C 0.165 D 0.265 E 0.465

2. Find the probability that the message is spam given that it contains the word "free".
A 0.0077 B 0.3077 C 0.5077 D 0.6077 E 0.4077

3. Compute the probability that the message is spam or contains the word "free".
A 0.145 B 0.445 C 0.545 D 0.345 E 0.245
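
Answers to Questions 1 through 3 can be checked with the law of total probability, Bayes' rule, and inclusion-exclusion. The sketch below is one possible check using this sheet's figures; the function and variable names are illustrative and not part of the exam.

```python
def spam_filter_probabilities(p_spam, p_free_given_spam, p_free_given_valid):
    """Return P(free), P(spam | free), and P(spam or free)."""
    p_valid = 1.0 - p_spam
    p_spam_and_free = p_spam * p_free_given_spam
    # Law of total probability over the two message classes.
    p_free = p_spam_and_free + p_valid * p_free_given_valid
    # Bayes' rule for the posterior probability of spam.
    p_spam_given_free = p_spam_and_free / p_free
    # Inclusion-exclusion for the union of the two events.
    p_spam_or_free = p_spam + p_free - p_spam_and_free
    return p_free, p_spam_given_free, p_spam_or_free


# Figures from question sheet 2211: 10% spam, "free" in 20% of spam and 5% of valid mail.
p_free, p_spam_given_free, p_spam_or_free = spam_filter_probabilities(0.10, 0.20, 0.05)
print(round(p_free, 4), round(p_spam_given_free, 4), round(p_spam_or_free, 4))
```

The other question sheets change only the three input percentages, so the same function can be called with their values.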

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 5 packages resulted in the following data: 15, 14.2, 17.8, 14.1,
16.7.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is not equal to 13.7 (%).

4. Find the estimated standard deviation of the sample mean.


A 0.7284 B 0.0284 C 1.9284 D 2.4284 E 2.6284

5. Find the test statistic value.


A 1.6535 B 1.8535 C 2.5535 D 4.4535 E 3.0535

6. Find the rejection region.


A (−∞, −1.64) ∪ (1.64, +∞) B (−∞, −2.015) ∪ (2.015, +∞) C (−∞, −1.28) ∪ (1.28, +∞)
D (−∞, −1.533) ∪ (1.533, +∞) E (−∞, −2.132) ∪ (2.132, +∞)
7. Calculate a 90% confidence interval on the population mean.
A [14.3654 , 16.7546] B [14.6276 , 16.4924] C [14.4433 , 16.6767] D [14.007 , 17.113]
E [14.0922 , 17.0278]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1.9, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.25?
A 79 B 82 C 78 D 76 E 87
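
The quantities in Questions 4 through 8 follow from the one-sample t procedure and the normal-based sample-size formula. The sketch below is a minimal check, assuming the critical values are read from tables: `t_crit = 2.132` (t with 4 degrees of freedom, upper-tail area 0.05) and `z_crit = 1.64` (a rounded table value for the normal upper-tail area 0.05). The function names are hypothetical.

```python
import math
import statistics


def one_sample_t(data, mu0, t_crit):
    """Return the standard error, t statistic, and two-sided CI for the mean."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    t_stat = (mean - mu0) / se
    return se, t_stat, (mean - t_crit * se, mean + t_crit * se)


def sample_size(variance, z_crit, radius):
    """Smallest n such that z_crit * sqrt(variance / n) <= radius."""
    return math.ceil(z_crit ** 2 * variance / radius ** 2)


data = [15, 14.2, 17.8, 14.1, 16.7]  # sample from question sheet 2211
se, t_stat, ci = one_sample_t(data, mu0=13.7, t_crit=2.132)
n_needed = sample_size(variance=1.9, z_crit=1.64, radius=0.25)
print(round(se, 4), round(t_stat, 4), [round(b, 4) for b in ci], n_needed)
```

The sample size is sensitive to the rounding of the normal critical value: with 1.645 instead of 1.64 the ceiling moves up by one package, so the table in use matters here.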

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature ($x$) and pavement deflection ($y$). Summary quantities were $n = 19$,
$\sum_{i=1}^{n} x_i = 1059$, $\sum_{i=1}^{n} y_i = 359.288$, $\sum_{i=1}^{n} x_i^2 = 73099$,
$\sum_{i=1}^{n} y_i^2 = 8458.311$, and $\sum_{i=1}^{n} x_i y_i = 24727.568$.

9. Find the sample correlation of this dataset.


A 0.7983 B 0.8145 C 0.5344 D 0.827 E 0.9716

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to

A increase by about 0.3341 units B increase by about 0.6682 units C decrease by about 0.3341 units
D decrease by about 0.2883 units E increase by about 0.2883 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.4053 B 0.399 C 0.0197 D 0.2364 E 0.1468

12. Calculate a 99% confidence interval on the slope coefficient β1 .


A [-0.5015,1.1697] B [0.2832,0.385] C [0.2769,0.3913] D [0.2834,0.3848] E [0.2881,0.3801]

13. Compute the residual for an observation y = 20.87 at x = 71.


A -2.7973 B -3.1083 C -3.3001 D -3.4707 E -3.1393

14. Find the coefficient of determination for the linear regression model.
A 84.8381 B 94.3945 C 97.1568 D 86.3936 E 75.5051
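
Questions 9 through 14 can all be worked from the five summary sums. The sketch below shows one way to do this, using the corrected sums of squares S_xx, S_yy and S_xy; the function name and dictionary keys are hypothetical, and the critical value needed for the interval in Question 12 is left to the t-table.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Simple linear regression summaries computed from raw sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    r = s_xy / math.sqrt(s_xx * s_yy)           # sample correlation
    sse = s_yy - slope * s_xy                   # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / s_xx)  # estimated standard error of the slope
    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r ** 2, "se_slope": se_slope}


# Summary quantities from question sheet 2211.
fit = regression_from_sums(19, 1059, 359.288, 73099, 8458.311, 24727.568)
# Residual = observed y minus fitted value at x = 71.
residual = 20.87 - (fit["intercept"] + fit["slope"] * 71)
print({k: round(v, 4) for k, v in fit.items()}, round(residual, 4))
```

The answer options for Question 14 appear to be percentages, so `r_squared` would be multiplied by 100 before comparing.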

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 4.7 4.9 5.8 4.8 3.7
Algorithm 2 6.7 5.8 7.7 6 8.7
Algorithm 3 8.6 9.6 9.5 10 10.8
Consider an ANOVA situation with a significance level α = 0.01.

15. Choose the correct quantity to describe the total variability between treatment means.
A 60.7413 B 672.7391 C 71.4373 D 300.7403 E 10.696

16. Find the test statistic value.


A 36.073 B 38.0729 C 28.0734 D 34.0733 E 32.0732

17. Find the rejection region.


A [3.8, +∞] B [−∞, 6.93] C [8.51, +∞] D [−∞, 3.8] E [6.93, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.8242 B 0.8843 C 2.5643 D 3.2843 E 2.0393

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
B The development cost mean of algorithm 1 is greater than the one of algorithm 2.
C Each of the three algorithms has a different development cost mean.
D There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
E There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.

20. Find a 99% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-3.1356,0.5128] B [-7.0241,-3.3757] C [-4.0242,-0.3758] D [-1.4691,2.1793]
E [-3.6911,-0.0427]
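
The one-way ANOVA quantities in Questions 15 through 20 can be sketched as follows. This is a minimal illustration for equal group sizes, assuming the critical value `t_crit = 3.055` is taken from a t-table (12 degrees of freedom, upper-tail area 0.005); the function and variable names are hypothetical.

```python
import math


def one_way_anova(groups):
    """Return (SS_treatments, SS_error, F) for a one-way layout."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_tr = ss_tr / (len(groups) - 1)
    ms_e = ss_e / (n_total - len(groups))
    return ss_tr, ss_e, ms_tr / ms_e


# Data from question sheet 2211: three algorithms, five projects each.
groups = [
    [4.7, 4.9, 5.8, 4.8, 3.7],
    [6.7, 5.8, 7.7, 6.0, 8.7],
    [8.6, 9.6, 9.5, 10.0, 10.8],
]
ss_tr, ss_e, f_stat = one_way_anova(groups)

ms_e = ss_e / (15 - 3)
t_crit = 3.055  # assumed table value: t with 12 df, upper-tail area 0.005
lsd = t_crit * math.sqrt(2 * ms_e / 5)  # Fisher's LSD for groups of size 5

# Interval for the difference in means between algorithms 1 and 2.
diff_12 = sum(groups[0]) / 5 - sum(groups[1]) / 5
ci_12 = (diff_12 - lsd, diff_12 + lsd)
print(round(ss_tr, 4), round(f_stat, 4), round(lsd, 4), [round(b, 4) for b in ci_12])
```

Two group means would be declared different when the absolute difference of their sample means exceeds `lsd`.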

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.
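
A possible setup for Question 21 is sketched below. It assumes that in part (a) each sampled product comes from firm A with probability 0.25 independently, so the chance of exactly 3 errors is a mixture of two Poisson probabilities, and that both parts are evaluated with the exact binomial distribution. A grader may instead expect a normal approximation in part (b), so this is only one reading. The helper names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binom_range(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(lo, hi + 1))


# (a) Probability that a randomly chosen product has exactly 3 errors,
#     then "more than 12 of 15" means 13, 14 or 15 such products.
p_three = 0.25 * poisson_pmf(3, 0.1) + 0.75 * poisson_pmf(3, 0.2)
prob_a = binom_range(15, p_three, 13, 15)

# (b) Probability that a firm A product has at least 1 error,
#     then between 60 and 95 such products out of 100.
p_error = 1 - math.exp(-0.1)
prob_b = binom_range(100, p_error, 60, 95)
print(p_three, prob_a, p_error, prob_b)
```

Both final probabilities are extremely small under these assumptions, which is worth stating explicitly in a written answer rather than rounding to zero without comment.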

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?
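
Because each subject is measured twice, Question 22 is naturally treated as a paired t-test on the differences Before minus After, with the one-sided alternative that the mean difference is positive. The sketch below is a minimal version of that test, assuming the critical value `t_crit = 1.833` comes from a t-table (9 degrees of freedom, upper-tail area 0.05); variable names are illustrative.

```python
import math
import statistics

before = [230, 243, 256, 260, 295, 283, 212, 287, 269, 272]
after = [229, 240, 267, 257, 280, 280, 230, 280, 270, 205]

# Paired differences: a positive value means cholesterol went down.
diffs = [b - a for b, a in zip(before, after)]
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))  # stdev uses n - 1
t_stat = mean_d / se_d

t_crit = 1.833  # assumed table value: t with 9 df, upper-tail area 0.05
reject_h0 = t_stat > t_crit  # H0: mean difference <= 0 vs H1: mean difference > 0
print(round(mean_d, 4), round(se_d, 4), round(t_stat, 4), reject_h0)
```

If `reject_h0` is false, the data do not support the claimed mean reduction at the 0.05 level.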

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2212
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 5% of the valid messages. Also, 15% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.4725 B 0.1725 C 0.2725 D 0.3725 E 0.0725

2. Find the probability that the message is spam given that it contains the word "free".
A 0.0138 B 0.8138 C 0.7138 D 0.4138 E 0.5138

3. Compute the probability that the message is spam or contains the word "free".
A 0.2925 B 0.0925 C 0.1925 D 0.4925 E 0.3925

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 6 packages resulted in the following data: 15.7, 15.3, 16.6, 16.1,
14.6, 15.3.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.01 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is less than 15.2 (%).

4. Find the estimated standard deviation of the sample mean.


A 1.7852 B 1.3852 C 0.6852 D 0.1852 E 0.2852

5. Find the test statistic value.


A 2.4026 B 2.3026 C 0.0026 D 1.4026 E 3.3026

6. Find the rejection region.


A (−∞, −2.58) B (−∞, −3.365) C (−∞, −4.032) D (−∞, −2.33) E (−∞, −3.707)
7. Calculate a 99% confidence interval on the population mean.
A [14.8642 , 16.3358] B [14.6403 , 16.5597] C [14.5428 , 16.6572] D [14.9355 , 16.2645]
E [14.4501 , 16.7499]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1, how many
packages must be collected to ensure that the radius of a 99% two-sided confidence interval for the
population mean is at most 0.3?
A 74 B 80 C 68 D 83 E 75

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature ($x$) and pavement deflection ($y$). Summary quantities were $n = 17$,
$\sum_{i=1}^{n} x_i = 727$, $\sum_{i=1}^{n} y_i = 330.575$, $\sum_{i=1}^{n} x_i^2 = 44137$,
$\sum_{i=1}^{n} y_i^2 = 8528.0656$, and $\sum_{i=1}^{n} x_i y_i = 19301.825$.

9. Find the sample correlation of this dataset.


A 0.9034 B 0.9868 C 0.7224 D 0.6154 E 0.944

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to

A increase by about 2.5165 units B increase by about 0.7917 units C decrease by about 2.5165 units
D increase by about 0.3959 units E decrease by about 0.3959 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.1035 B 0.3198 C 0.1082 D 0.2824 E 0.0168

12. Calculate a 99% confidence interval on the slope coefficient β1 .


A [-7.0195,7.8113] B [0.3525,0.4392] C [0.3521,0.4396] D [0.3464,0.4454] E [0.3567,0.435]

13. Compute the residual for an observation y = 7.88 at x = 11.


A 1.0253 B 0.9314 C 1.009 D 1.1334 E 1.334

14. Find the coefficient of determination for the linear regression model.
A 97.3693 B 76.9244 C 98.6759 D 92.4794 E 95.5904

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 12 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 6.3 6.6 5 6.4
Algorithm 2 9.9 8.2 6.6 7.5
Algorithm 3 2.4 2.2 1.5 4.4
Consider an ANOVA situation with a significance level α = 0.05.

15. Choose the correct quantity to describe the total variability between treatment means.
A 12.085 B 72.3967 C 920.3087 D 60.3117 E 734.3093

16. Find the test statistic value.


A 16.4579 B 26.4574 C 28.4573 D 18.4578 E 22.4578

17. Find the rejection region.


A [4.26, +∞] B [2.9, +∞] C [−∞, 2.9] D [5.71, +∞] E [−∞, 4.26]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 0.6785 B 2.2385 C 3.6535 D 1.8534 E 3.3235

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.
B There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
C The development cost mean of algorithm 1 is greater than the one of algorithm 2.
D Each of the three algorithms has a different development cost mean.
E There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.

20. Find a 95% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-6.2728,-2.566] B [-3.4953,0.2115] C [-2.9398,0.767] D [-3.8284,-0.1216]
E [-6.8283,-3.1215]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2213
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 15% of the spam messages and only 5% of the valid messages. Also, 15% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.165 B 0.065 C 0.465 D 0.265 E 0.365

2. Find the probability that the message is spam given that it contains the word "free".
A 0.0462 B 0.3462 C 0.1462 D 0.5462 E 0.7462

3. Compute the probability that the message is spam or contains the word "free".
A 0.5925 B 0.0925 C 0.1925 D 0.4925 E 0.2925

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 5 packages resulted in the following data: 13.5, 15.5, 16.4, 15.4,
14.5.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is not equal to 13.9 (%).

4. Find the estimated standard deviation of the sample mean.


A 0.5925 B 0.4925 C 1.8925 D 0.3925 E 2.0925

5. Find the test statistic value.


A 2.0551 B 2.2551 C 2.1551 D 3.4551 E 2.3551

6. Find the rejection region.


A (−∞, −2.132) ∪ (2.132, +∞) B (−∞, −1.64) ∪ (1.64, +∞) C (−∞, −1.28) ∪ (1.28, +∞)
D (−∞, −2.015) ∪ (2.015, +∞) E (−∞, −1.533) ∪ (1.533, +∞)
7. Calculate a 90% confidence interval on the population mean.
A [14.0099 , 16.1101] B [14.0675 , 16.0525] C [14.4295 , 15.6905] D [14.2522 , 15.8678]
E [14.3049 , 15.8151]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 0.8, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.3?
A 18 B 25 C 21 D 24 E 32

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature ($x$) and pavement deflection ($y$). Summary quantities were $n = 17$,
$\sum_{i=1}^{n} x_i = 787$, $\sum_{i=1}^{n} y_i = 175.613$, $\sum_{i=1}^{n} x_i^2 = 49057$,
$\sum_{i=1}^{n} y_i^2 = 2673.0683$, and $\sum_{i=1}^{n} x_i y_i = 11231.343$.

9. Find the sample correlation of this dataset.


A 0.9419 B 0.822 C 0.5064 D 0.5325 E 0.7048

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to

A decrease by about 0.2457 units B increase by about 1.0439 units C decrease by about 1.0439 units
D increase by about 0.4914 units E increase by about 0.2457 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.0226 B 0.3584 C 0.1631 D 0.1677 E 0.2807

12. Calculate a 90% confidence interval on the slope coefficient β1 .


A [0.2086,0.2828] B [2.0756,-1.5843] C [0.2167,0.2747] D [0.2154,0.276]
E [0.206,0.2854]

13. Compute the residual for an observation y = 7.76 at x = 41.


A -1.4957 B -1.6969 C -0.9525 D -1.2695 E -1.2664

14. Find the coefficient of determination for the linear regression model.
A 83.8238 B 94.7123 C 88.7137 D 94.188 E 77.6018

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 10 9.1 10.8 10.6 9.2
Algorithm 2 6.2 6.1 6 6.4 7.9
Algorithm 3 2.8 6.6 4.3 3.5 4.2
Consider an ANOVA situation with a significance level α = 0.01.

15. Choose the correct quantity to describe the total variability between treatment means.
A 104.249 B 259.2485 C 94.3373 D 13.088 E 81.2493

16. Find the test statistic value.


A 33.2476 B 31.2477 C 41.2472 D 37.2476 E 43.2471

17. Find the rejection region.


A [8.51, +∞] B [−∞, 3.8] C [3.8, +∞] D [6.93, +∞] E [−∞, 6.93]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.7779 B 1.1229 C 2.0178 D 2.6679 E 2.9429

19. Choose the correct conclusion.


A Each of the three algorithms has a different development cost mean.
B There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
C There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.
D The development cost mean of algorithm 1 is less than the one of algorithm 2.
E There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.

20. Find a 99% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [3.9573,7.9929] B [-1.5977,2.4379] C [1.4022,5.4378] D [3.4018,7.4374]
E [0.0688,4.1044]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2214
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 5% of the valid messages. Also, 10% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.365 B 0.465 C 0.165 D 0.065 E 0.265

2. Find the probability that the message is spam given that it contains the word "free".
A 0.2077 B 0.4077 C 0.3077 D 0.1077 E 0.0077

3. Compute the probability that the message is spam or contains the word "free".
A 0.345 B 0.045 C 0.245 D 0.145 E 0.445

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 7 packages resulted in the following data: 14.5, 19.5, 15.3, 13,
18.3, 18.9, 16.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is not equal to 17 (%).

4. Find the estimated standard deviation of the sample mean.


A 2.825 B 0.525 C 1.825 D 0.925 E 2.925

5. Find the test statistic value.


A -1.5405 B -0.5405 C -0.7405 D -1.0405 E -0.3405

6. Find the rejection region.


A (−∞, −1.64) ∪ (1.64, +∞) B (−∞, −1.28) ∪ (1.28, +∞) C (−∞, −1.943) ∪ (1.943, +∞)
D (−∞, −1.44) ∪ (1.44, +∞) E (−∞, −1.895) ∪ (1.895, +∞)
7. Calculate a 90% confidence interval on the population mean.
A [15.3159 , 17.6841] B [15.1679 , 17.8321] C [14.9829 , 18.0171] D [14.7026 , 18.2974]
E [14.747 , 18.253]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.2?
A 68 B 66 C 61 D 70 E 75

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature ($x$) and pavement deflection ($y$). Summary quantities were $n = 15$,
$\sum_{i=1}^{n} x_i = 725$, $\sum_{i=1}^{n} y_i = 110.2$, $\sum_{i=1}^{n} x_i^2 = 42535$,
$\sum_{i=1}^{n} y_i^2 = 1137.1594$, and $\sum_{i=1}^{n} x_i y_i = 6723.76$.

9. Find the sample correlation of this dataset.


A 0.7061 B 0.892 C 0.9193 D 0.4739 E 0.7104

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to

A increase by about 1.667 units B decrease by about 1.667 units C decrease by about 0.1865 units
D increase by about 0.373 units E increase by about 0.1865 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.335 B 0.0857 C 0.0262 D 0.425 E 0.0921

12. Calculate a 95% confidence interval on the slope coefficient β1 .


A [0.1351,0.2379] B [3.7878,-3.4149] C [0.1299,0.2431] D [0.1435,0.2295] E [0.1401,0.2329]

13. Compute the residual for an observation y = 3.56 at x = 31.


A -0.4885 B -0.961 C -0.3451 D -0.5542 E -0.8128

14. Find the coefficient of determination for the linear regression model.
A 89.1966 B 59.1154 C 76.2259 D 60.6709 E 79.5603

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 7.2 6.9 8.1 7.5 8.5
Algorithm 2 4 3.5 5.1 4.2 3
Algorithm 3 9.4 8.9 8.6 9.6 8.7
Consider an ANOVA situation with a significance level α = 0.01.

15. Choose the correct quantity to describe the total variability between treatment means.
A 711.8457 B 4.976 C 68.848 D 804.8454 E 73.824

16. Find the test statistic value.


A 79.0161 B 83.0161 C 87.0157 D 89.0156 E 85.0158

17. Find the rejection region.


A [−∞, 6.93] B [−∞, 3.8] C [8.51, +∞] D [6.93, +∞] E [3.8, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.2793 B 2.4693 C 1.2442 D 3.1943 E 2.7193

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
B There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
C Each of the three algorithms has a different development cost mean.
D The development cost mean of algorithm 1 is less than the one of algorithm 2.
E There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.

20. Find a 99% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [4.9909,7.4793] B [4.4354,6.9238] C [3.3244,5.8128] D [-0.0086,2.4798]
E [2.4358,4.9242]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2215
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 10% of the valid messages. Also, 15% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.315 B 0.115 C 0.215 D 0.015 E 0.415

2. Find the probability that the message is spam given that it contains the word "free".
A 0.6609 B 0.2609 C 0.3609 D 0.0609 E 0.4609

3. Compute the probability that the message is spam or contains the word "free".
A 0.535 B 0.635 C 0.435 D 0.235 E 0.135

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 5 packages resulted in the following data: 16.1, 15, 14.3, 16.2,
15.3.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know if the data show enough evidence to prove that the average
level of polyunsaturated fatty acid is not equal to 16 (%).

4. Find the estimated standard deviation of the sample mean.


A 0.5541 B 1.4541 C 1.7541 D 0.3541 E 1.3541

5. Find the test statistic value.


A -1.5508 B -3.7508 C -2.1508 D -1.7508 E -0.0508

6. Find the rejection region.


A (−∞, −2.015) ∪ (2.015, +∞) B (−∞, −1.64) ∪ (1.64, +∞) C (−∞, −1.28) ∪ (1.28, +∞)
D (−∞, −1.533) ∪ (1.533, +∞) E (−∞, −2.132) ∪ (2.132, +∞)
7. Calculate a 90% confidence interval on the population mean.
A [14.9267 , 15.8333] B [14.625 , 16.135] C [14.8371 , 15.9229] D [14.6665 , 16.0935]
E [14.7992 , 15.9608]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 0.7, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.25?
A 33 B 39 C 28 D 40 E 31

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature ($x$) and pavement deflection ($y$). Summary quantities were $n = 19$,
$\sum_{i=1}^{n} x_i = 839$, $\sum_{i=1}^{n} y_i = 247.237$, $\sum_{i=1}^{n} x_i^2 = 56259$,
$\sum_{i=1}^{n} y_i^2 = 4691.2319$, and $\sum_{i=1}^{n} x_i y_i = 16084.097$.

9. Find the sample correlation of this dataset.


A 0.9555 B 0.9858 C 0.7854 D 0.9709 E 0.9627

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to

A decrease by about 0.2689 units B decrease by about 1.1363 units C increase by about 0.5379 units
D increase by about 0.2689 units E increase by about 1.1363 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.3982 B 0.0161 C 0.1534 D 0.2017 E 0.2711

12. Calculate a 90% confidence interval on the slope coefficient β1 .


A [-1.7078,2.2457] B [0.241,0.2969] C [0.2475,0.2904] D [0.2484,0.2895] E [0.2426,0.2953]

13. Compute the residual for an observation y = 6.44 at x = 21.


A -0.3442 B -0.2229 C 0.039 D -0.1144 E -0.1374

14. Find the coefficient of determination for the linear regression model.
A 94.2667 B 98.7098 C 89.3768 D 97.0911 E 80.0438
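
For Questions 9 through 14, a minimal sketch of simple linear regression from summary sums is given below. It assumes the sums are read as Σx, Σy, Σx², Σy² and Σxy in that order, and it hard-codes 1.740 (t with 17 degrees of freedom, upper-tail area 0.05) from a standard t-table. The variable names are illustrative.

```python
import math

# Summary quantities from the question (n = 19).
n = 19
sum_x, sum_y = 839.0, 247.237
sum_x2, sum_y2 = 56259.0, 4691.2319
sum_xy = 16084.097
t_crit = 1.740  # assumption: t_{0.05, 17} read from a standard t-table

# Corrected sums of squares and cross products.
sxx = sum_x2 - sum_x**2 / n
syy = sum_y2 - sum_y**2 / n
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)        # sample correlation
b1 = sxy / sxx                        # fitted slope
b0 = sum_y / n - b1 * sum_x / n       # fitted intercept

sse = syy - b1 * sxy                  # residual sum of squares
sigma2_hat = sse / (n - 2)            # estimate of the error variance
se_b1 = math.sqrt(sigma2_hat / sxx)   # standard error of the slope
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 90% CI on the slope

residual = 6.44 - (b0 + b1 * 21)      # observed minus fitted at x = 21
r2_percent = 100 * r**2               # coefficient of determination, in percent

print(round(r, 4), round(b1, 4), round(se_b1, 4), ci_b1, round(residual, 4))
```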

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 4.1 3.6 4.5 3.5 4.4
Algorithm 2 8.6 7.8 11.1 7.8 8.3
Algorithm 3 6.2 5.9 6.6 6.5 4.8
Consider an ANOVA situation with a significance level α = 0.05.

15. Choose the correct quantity to describe the total variability between treatment means.
A 202.6806 B 55.6813 C 10.476 D 78.681 E 66.1573

16. Find the test statistic values.


A 29.8907 B 31.8908 C 27.8908 D 25.8909 E 35.8904

17. Find the rejection region.


A [−∞, 3.89] B [3.89, +∞] C [5.1, +∞] D [−∞, 2.53] E [2.53, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 2.7827 B 0.8327 C 1.2876 D 3.0827 E 3.2127

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
B There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
C Each of the three algorithms has a different development cost mean.
D There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.
E The development cost mean of algorithm 1 is greater than the one of algorithm 2.

20. Find a 95% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-8.9875,-6.4123] B [-4.5435,-1.9683] C [-7.321,-4.7458] D [-5.9876,-3.4124]
E [-7.8765,-5.3013]
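
The one-way ANOVA quantities in Questions 15 through 20 can be sketched as follows. The critical values 3.89 (F with 2 and 12 degrees of freedom, α = 0.05) and 2.179 (t with 12 degrees of freedom, upper-tail area 0.025) are assumptions taken from standard tables, and the names are illustrative.

```python
import math
import statistics

# Development costs (hours) per algorithm, as given in the question.
groups = {
    1: [4.1, 3.6, 4.5, 3.5, 4.4],
    2: [8.6, 7.8, 11.1, 7.8, 8.3],
    3: [6.2, 5.9, 6.6, 6.5, 4.8],
}
f_crit = 3.89   # assumption: F_{0.05, 2, 12} from a standard F-table
t_crit = 2.179  # assumption: t_{0.025, 12} from a standard t-table

k = len(groups)
all_values = [v for g in groups.values() for v in g]
N = len(all_values)
grand_mean = statistics.mean(all_values)
means = {name: statistics.mean(g) for name, g in groups.items()}

# Between-treatment and within-treatment (error) sums of squares.
ss_treatment = sum(len(g) * (means[name] - grand_mean) ** 2
                   for name, g in groups.items())
ss_error = sum((v - means[name]) ** 2
               for name, g in groups.items() for v in g)

ms_error = ss_error / (N - k)
f_stat = (ss_treatment / (k - 1)) / ms_error

# Fisher's LSD for equal group sizes, and the CI for mu_1 - mu_2.
n_per_group = len(groups[1])
lsd = t_crit * math.sqrt(2 * ms_error / n_per_group)
diff_12 = means[1] - means[2]
ci_12 = (diff_12 - lsd, diff_12 + lsd)

print(round(ss_treatment, 4), round(f_stat, 4), round(lsd, 4), ci_12)
```

Comparing each absolute difference of group means with `lsd` then indicates which pairs differ at the chosen level.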

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.
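
One possible way to set up Question 21 numerically is sketched below. It assumes a product comes from firm A with probability 0.25 and from firm B otherwise, treats "from 60 to 95" as inclusive, and sums exact binomial probabilities instead of using a normal approximation. The helper names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# (a) Probability that a random product has exactly 3 errors (total probability),
#     then more than 12 such products among 15.
p_three = 0.25 * poisson_pmf(3, 0.1) + 0.75 * poisson_pmf(3, 0.2)
prob_a = sum(binom_pmf(k, 15, p_three) for k in range(13, 16))

# (b) Probability that a firm A product has at least 1 error,
#     then between 60 and 95 such products (inclusive) among 100.
p_defect_a = 1 - math.exp(-0.1)
prob_b = sum(binom_pmf(k, 100, p_defect_a) for k in range(60, 96))

print(p_three, prob_a, p_defect_a, prob_b)
```

Both probabilities come out extremely small under these assumptions, because the per-product event probabilities are far below the requested counts.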

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?
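
Question 22 describes paired measurements, so one reasonable approach is a paired t-test on the differences (before minus after) with H1: mean difference > 0. The sketch below assumes that setup and hard-codes 1.833 (t with 9 degrees of freedom, upper-tail area 0.05) from a standard t-table.

```python
import math
import statistics

before = [230, 243, 256, 260, 295, 283, 212, 287, 269, 272]
after = [229, 240, 267, 257, 280, 280, 230, 280, 270, 205]
t_crit = 1.833  # assumption: t_{0.05, 9} from a standard t-table

# Work with per-subject differences; a positive value means a reduction.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

t_stat = d_bar / (s_d / math.sqrt(n))  # H0: mean difference = 0
reject = t_stat > t_crit               # one-sided test at alpha = 0.05

print(d_bar, round(s_d, 4), round(t_stat, 4), reject)
```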

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2216
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 10% of the valid messages. Also, 10% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.41 B 0.01 C 0.31 D 0.11 E 0.21

2. Find the probability that the message is spam given that it contains the word "free".
A 0.0818 B 0.3818 C 0.1818 D 0.5818 E 0.4818

3. Compute the probability that the message is spam or contains the word "free".
A 0.09 B 0.29 C 0.19 D 0.59 E 0.39
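
Questions 1 through 3 rely on the law of total probability, Bayes' rule, and the addition rule. A minimal sketch with the percentages from this sheet follows; the variable names are illustrative.

```python
# Given quantities from the question.
p_spam = 0.10
p_free_given_spam = 0.20
p_free_given_valid = 0.10

# Law of total probability: P(free).
p_free = p_free_given_spam * p_spam + p_free_given_valid * (1 - p_spam)

# Bayes' rule: P(spam | free).
p_spam_and_free = p_free_given_spam * p_spam
p_spam_given_free = p_spam_and_free / p_free

# Addition rule: P(spam or free).
p_spam_or_free = p_spam + p_free - p_spam_and_free

print(round(p_free, 4), round(p_spam_given_free, 4), round(p_spam_or_free, 4))
```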

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 5 packages resulted in the following data: 16.9, 14.4, 15.4, 15.3,
16.3.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know whether the data show enough evidence to conclude that the average
level of polyunsaturated fatty acid is less than 15.5%.

4. Find the estimated standard deviation of the sample mean.


A 0.732 B 2.232 C 1.132 D 0.032 E 0.432

5. Find the test statistic value.


A 0.6704 B 2.1704 C 1.0704 D 0.1704 E 0.3704

6. Find the rejection region.


A (−∞, −2.132) B (−∞, −1.64) C (−∞, −2.015) D (−∞, −1.533) E (−∞, −1.28)
7. Calculate a 90% confidence interval on the population mean.
A [14.739 , 16.581] B [14.7896 , 16.5304] C [14.9516 , 16.3684] D [14.9978 , 16.3222]
E [15.1071 , 16.2129]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1.8, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.1?
A 492 B 479 C 476 D 486 E 485

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature (x) and pavement deflection (y). Summary quantities were n = 17,
$\sum_{i=1}^{n} x_i = 877$, $\sum_{i=1}^{n} y_i = 337.782$, $\sum_{i=1}^{n} x_i^2 = 62737$, $\sum_{i=1}^{n} y_i^2 = 8848.1752$, and
$\sum_{i=1}^{n} x_i y_i = 23453.542$.

9. Find the sample correlation of this dataset.


A 0.986 B 0.7103 C 0.5825 D 0.9398 E 0.9515

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to be

A increased by about 2.0937 units B increased by about 0.3446 units C decreased by about 0.3446 units
D decreased by about 2.0937 units E increased by about 0.6891 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.0151 B 0.0282 C 0.439 D 0.2824 E 0.0728

12. Calculate a 90% confidence interval on the slope coefficient β1 .


A [0.3182,0.371] B [-3.3257,4.0149] C [0.3199,0.3693] D [0.3253,0.3639] E [0.3244,0.3648]

13. Compute the residual for an observation y = 5.33 at x = 11.


A -0.881 B -0.3898 C -0.655 D -0.554 E -0.4848

14. Find the coefficient of determination for the linear regression model.
A 97.2131 B 82.9902 C 92.3232 D 95.4342 E 98.5967

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 12 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 7.5 5.5 7.3 6.3
Algorithm 2 3.1 2.9 5.7 4.1
Algorithm 3 9.2 9.1 10.8 11.2
Consider an ANOVA situation with a significance level α = 0.05.

15. Choose the correct quantity to describe the total variability between treatment means.
A 935.3787 B 439.3803 C 86.3892 D 75.3817 E 11.0075

16. Find the test statistic values.


A 26.8169 B 36.8164 C 30.8169 D 24.817 E 28.8168

17. Find the rejection region.


A [−∞, 2.9] B [2.9, +∞] C [−∞, 4.26] D [4.26, +∞] E [5.71, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.7689 B 0.294 C 2.409 D 3.539 E 2.134

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
B The development cost mean of algorithm 1 is less than the one of algorithm 2.
C There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.
D There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
E Each of the three algorithms has a different development cost mean.

20. Find a 95% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-0.4023,3.1355] B [2.9307,6.4685] C [-0.9578,2.58] D [0.9311,4.4689]
E [2.3752,5.913]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2217
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 15% of the spam messages and only 10% of the valid messages. Also, 15% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.2075 B 0.5075 C 0.3075 D 0.1075 E 0.4075

2. Find the probability that the message is spam given that it contains the word "free".
A 0.1093 B 0.5093 C 0.3093 D 0.4093 E 0.2093

3. Compute the probability that the message is spam or contains the word "free".
A 0.035 B 0.435 C 0.635 D 0.235 E 0.335

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 7 packages resulted in the following data: 16.1, 14, 11.4, 11.6,
13.2, 17.8, 14.6.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.1 is used. Scientists want to know whether the data show enough evidence to conclude that the average
level of polyunsaturated fatty acid is less than 13.8%.

4. Find the estimated standard deviation of the sample mean.


A 2.077 B 0.677 C 1.677 D 0.877 E 1.977

5. Find the test statistic value.


A 0.9421 B 1.8421 C 0.6421 D 0.3421 E 1.5421

6. Find the rejection region.


A (−∞, −1.28) B (−∞, −1.895) C (−∞, −1.64) D (−∞, −1.44) E (−∞, −1.943)
7. Calculate a 90% confidence interval on the population mean.
A [12.9775 , 15.2225] B [12.6618 , 15.5382] C [12.3961 , 15.8039] D [12.4382 , 15.7618]
E [12.8372 , 15.3628]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1.1, how many
packages must be collected to ensure that the radius of a 90% two-sided confidence interval for the
population mean is at most 0.2?
A 74 B 78 C 66 D 75 E 82

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature (x) and pavement deflection (y). Summary quantities were n = 15,
$\sum_{i=1}^{n} x_i = 935$, $\sum_{i=1}^{n} y_i = 425.2$, $\sum_{i=1}^{n} x_i^2 = 64055$, $\sum_{i=1}^{n} y_i^2 = 13305.32$, and
$\sum_{i=1}^{n} x_i y_i = 29067.4$.

9. Find the sample correlation of this dataset.


A 0.8115 B 0.7355 C 0.8969 D 0.8061 E 0.9533

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to be

A increased by about 0.6717 units B decreased by about 0.444 units C decreased by about 0.6717 units
D increased by about 0.888 units E increased by about 0.444 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.1425 B 0.4805 C 0.0242 D 0.107 E 0.039

12. Calculate a 99% confidence interval on the slope coefficient β1 .


A [0.3406,0.5474] B [0.3531,0.5349] C [0.3264,0.5615] D [-1.5793,2.4673] E [0.3433,0.5447]

13. Compute the residual for an observation y = 5.36 at x = 11.


A -0.4053 B -0.1955 C -0.1657 D -0.0693 E -0.1608

14. Find the coefficient of determination for the linear regression model.
A 99.985 B 96.874 C 90.8754 D 85.9855 E 95.3286

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 4.5 3.6 3.8 5.4 5.8
Algorithm 2 8.2 9 8.6 9.1 7.7
Algorithm 3 10.9 9.2 9.4 11.2 9.8
Consider an ANOVA situation with a significance level α = 0.05.

15. Choose the correct quantity to describe the total variability between treatment means.
A 79.5613 B 350.5602 C 722.559 D 87.8773 E 8.316

16. Find the test statistic values.


A 57.4036 B 53.4036 C 59.4033 D 61.4032 E 63.4031

17. Find the rejection region.


A [3.89, +∞] B [−∞, 2.53] C [2.53, +∞] D [−∞, 3.89] E [5.1, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 1.1373 B 0.1973 C 3.0423 D 2.5623 E 1.1472

19. Choose the correct conclusion.


A There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
B The development cost mean of algorithm 1 is greater than the one of algorithm 2.
C There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
D There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.
E Each of the three algorithms has a different development cost mean.

20. Find a 95% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [-7.4916,-5.1972] B [-3.6031,-1.3087] C [-4.7141,-2.4197] D [-5.0472,-2.7528]
E [-3.0476,-0.7532]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?

Semester/Acad. year 1 2022-2023
Final Exam Date December 24th, 2022
Course title Probability and Statistics
UNIVERSITY OF TECHNOLOGY - VNUHCM Course ID MT2013 Question sheet code 2218
Faculty of Applied Science Duration 100 minutes Shift 16:00
Instructions to students:
- You are allowed to use your OWN materials and calculator. Total available score: 10.
- At the beginning of the working time, you MUST fill in your full name and student ID on this question
sheet. There are 22 questions on 4 pages. Do not round between steps. Round your final answers to 4
decimal places.

Student’s full name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Invigilator 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Student Id: . . . . . . . . . . . . . . . . . . . .......................... Invigilator 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Part I: Multiple Choice (7 points, 70 minutes)

Questions 1 through 3. An e-mail filter is planned to separate valid e-mails from spam. The word
"free" occurs in 20% of the spam messages and only 5% of the valid messages. Also, 15% of the messages
are spam.

1. Find the probability that the message contains the word "free".
A 0.0725 B 0.1725 C 0.2725 D 0.3725 E 0.4725

2. Find the probability that the message is spam given that it contains the word "free".
A 0.8138 B 0.4138 C 0.1138 D 0.6138 E 0.2138

3. Compute the probability that the message is spam or contains the word "free".
A 0.3925 B 0.2925 C 0.0925 D 0.1925 E 0.5925

Questions 4 through 8.
A particular brand of diet margarine was analyzed to determine the level of polyunsaturated fatty acid
(in percentages). A random sample of 7 packages resulted in the following data: 15.4, 17.8, 15, 16.8, 16,
14.4, 13.9.
It is assumed that the level of polyunsaturated fatty acid follows a normal distribution and a significance
level of 0.01 is used. Scientists want to know whether the data show enough evidence to conclude that the average
level of polyunsaturated fatty acid is not equal to 16.8%.

4. Find the estimated standard deviation of the sample mean.


A 0.5157 B 2.0157 C 1.7157 D 0.3157 E 0.7157

5. Find the test statistic value.


A -2.3994 B -3.8994 C -4.1994 D -2.2994 E -2.9994

6. Find the rejection region.


A (−∞, −3.5) ∪ (3.5, +∞) B (−∞, −2.33) ∪ (2.33, +∞) C (−∞, −3.707) ∪ (3.707, +∞)
D (−∞, −2.58) ∪ (2.58, +∞) E (−∞, −3.143) ∪ (3.143, +∞)
7. Calculate a 99% confidence interval on the population mean.
A [13.8094 , 17.4191] B [14.2839 , 16.9447] C [14.4128 , 16.8158] D [13.7027 , 17.5259]
E [13.9935 , 17.235]

8. If the population variance of the polyunsaturated fatty acid levels is assumed to be 1.8, how many
packages must be collected to ensure that the radius of a 99% two-sided confidence interval for the
population mean is at most 0.25?
A 202 B 194 C 192 D 189 E 201

Questions 9 through 14.


Regression methods were used to analyze the data from a study investigating the relationship between
roadway surface temperature (x) and pavement deflection (y). Summary quantities were n = 17,
$\sum_{i=1}^{n} x_i = 907$, $\sum_{i=1}^{n} y_i = 248.757$, $\sum_{i=1}^{n} x_i^2 = 64297$, $\sum_{i=1}^{n} y_i^2 = 5214.0775$, and
$\sum_{i=1}^{n} x_i y_i = 18079.647$.

9. Find the sample correlation of this dataset.


A 0.8656 B 0.872 C 0.9445 D 0.9608 E 0.6338

10. If the surface temperature increases by 1 °F, the pavement deflection is expected to be

A increased by about 0.3023 units B increased by about 1.4938 units C decreased by about 0.3023 units
D decreased by about 1.4938 units E increased by about 0.6045 units

11. Find the estimated standard error for the fitted slope coefficient β̂1 .
A 0.2173 B 0.2915 C 0.2053 D 0.0259 E 0.0225

12. Calculate a 99% confidence interval on the slope coefficient β1 .


A [0.2498,0.3547] B [4.7039,-4.0994] C [0.2359,0.3686] D [0.2437,0.3608] E [0.2442,0.3603]

13. Compute the residual for an observation y = 1.37 at x = 21.


A -3.1647 B -3.5631 C -3.6435 D -3.8108 E -3.4837

14. Find the coefficient of determination for the linear regression model.
A 92.3202 B 95.2078 C 73.4308 D 79.6528 E 96.0834

Questions 15 through 20. An article in Communications of the ACM reported on a study of different
algorithms for estimating software development costs. Three algorithms were applied to 15 software
development projects and the development costs (hours) were observed. The data are given as below.
Algorithm 1 8.8 8.2 9.3 8.6 8.3
Algorithm 2 4 3.3 1 3.5 3.5
Algorithm 3 6.3 6.9 4.4 7.9 6.8
Consider an ANOVA situation with a significance level α = 0.01.

15. Choose the correct quantity to describe the total variability between treatment means.
A 815.0787 B 505.0797 C 92.0773 D 12.996 E 79.0813

16. Find the test statistic values.


A 36.5103 B 30.5104 C 42.5098 D 40.5099 E 38.51

17. Find the rejection region.


A [3.8, +∞] B [−∞, 6.93] C [8.51, +∞] D [−∞, 3.8] E [6.93, +∞]

18. Find the least significant difference (LSD) for Fisher’s multiple comparison.
A 2.7058 B 0.1908 C 2.0107 D 1.2458 E 2.7558

19. Choose the correct conclusion.


A The development cost mean of algorithm 1 is less than the one of algorithm 2.
B Each of the three algorithms has a different development cost mean.
C There is insufficient information to confirm that algorithm 2 and algorithm 3 have different
development cost means.
D There is insufficient information to confirm that algorithm 1 and algorithm 2 have different
development cost means.
E There is insufficient information to confirm that algorithm 1 and algorithm 3 have different
development cost means.

20. Find a 99% confidence interval for the difference in the mean costs between algorithms 1 and 2.
A [3.5693,7.5907] B [5.5689,9.5903] C [5.0134,9.0348] D [4.4579,8.4793]
E [6.1244,10.1458]

Part II: Essay (3 points, 30 minutes)

21. A factory has 2 firms producing the same type of product. The numbers of errors per product
produced by firm A and firm B follow Poisson distributions with means of 0.1 and 0.2 respectively.
Furthermore, errors occur independently between products, regardless of the producing firm.
(a) Suppose that the proportion of products produced by firm A is 0.25. In a random sample of 15
products produced by this factory, find the probability that there are more than 12 products
that have exactly 3 errors.
(b) In a random sample of 100 products produced by firm A, find the probability that there are
from 60 to 95 products that have at least 1 error.

22. Ten adult males between the ages of 35 and 50 participated in a study to evaluate the effect of diet
and exercise on blood cholesterol levels. The total cholesterol was measured in each subject initially
and then three months after participating in an aerobic exercise program and switching to a low-fat
diet. The data are shown in the following table.
Before 230 243 256 260 295 283 212 287 269 272
After 229 240 267 257 280 280 230 280 270 205
Suppose that the blood cholesterol levels of adult males between the ages of 35 and 50 follow a
normal distribution. Do the data support the claim that low-fat diet and aerobic exercise are of
value in producing a mean reduction in blood cholesterol levels at the significance level α = 0.05?
