
ENG5001/ENG6001 – Advanced Engineering Data Analysis 1

Tutorial 2 – Problem Set


1. A social researcher in a particular city wishes to obtain information on the number of
children in households that receive welfare support. A random sample of 400 households
is selected from the city welfare rolls. A check on welfare recipient data provides the
number of children in each household.
(a) Identify the population of measurements that is of interest to the researcher.
(b) Identify the sample.
(c) What characteristics of the population are of interest to the researcher?
2. A leakage test was conducted to determine the effectiveness of a seal designed to keep the
inside of a plug airtight. An air needle was inserted into the plug, which was then placed
underwater. Next, the pressure was increased until leakage was observed. The magnitude
of this pressure in psi was recorded for 10 trials:

3.1 3.5 3.3 3.7 4.5 4.2 2.8 3.9 3.5 3.3

Find the sample mean and sample standard deviation for these 10 measurements.
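A hand calculation here can be cross-checked with Python's standard `statistics` module, whose `stdev` function uses the n - 1 divisor. This is only a sketch; the variable names are illustrative and not part of the problem set.

```python
import statistics

# Leakage pressures (psi) from Problem 2
pressure = [3.1, 3.5, 3.3, 3.7, 4.5, 4.2, 2.8, 3.9, 3.5, 3.3]

mean = statistics.mean(pressure)   # sample mean
s = statistics.stdev(pressure)     # sample standard deviation (n - 1 divisor)

print(f"mean = {mean:.3f} psi, s = {s:.3f} psi")
```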
3. The nine measurements that follow are furnace temperatures recorded on successive
batches in a semiconductor manufacturing process (units are °F):

953 955 948 951 957 949 954 950 959

(a) Calculate the sample mean and sample standard deviation.
(b) Find the sample median of the data.
(c) How much could the largest temperature measurement increase without changing
the sample median?

4. The minimum injection pressure (psi) for injection molding specimens of high amylose
corn was determined for eight different specimens (higher pressure corresponds to greater
processing difficulty), resulting in the following observations:

15.0 13.0 18.0 14.5 12.0 11.0 8.9 8.0

(a) Determine the values of the sample mean and sample median.
(b) By how much could the sample observation 8.0 be increased without affecting the
value of the sample median?
(c) Suppose we want the values of the sample mean and median when the observations
are expressed in kilograms per square inch (ksi) rather than pounds per square inch
(psi). Is it necessary to re-express each observation in ksi, or can the values
calculated in part (a) be used directly? Hint: 1 kilogram = 2.2 pounds.
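Part (c) can be explored numerically by comparing two routes: converting every observation and then summarising, versus summarising in psi and converting the summaries. The sketch below assumes the conversion is a single multiplicative factor of 1/2.2, as the hint suggests; the names `FACTOR`, `converted` and so on are illustrative.

```python
import statistics

# Injection pressures (psi) from Problem 4
psi = [15.0, 13.0, 18.0, 14.5, 12.0, 11.0, 8.9, 8.0]
FACTOR = 1 / 2.2  # assumed factor, taken from the hint (1 kilogram = 2.2 pounds)

converted = [FACTOR * x for x in psi]

# Route 1: convert every observation, then summarise
mean_converted = statistics.mean(converted)
median_converted = statistics.median(converted)

# Route 2: summarise in psi, then convert the two summaries
mean_scaled = FACTOR * statistics.mean(psi)
median_scaled = FACTOR * statistics.median(psi)

print(mean_converted, mean_scaled)
print(median_converted, median_scaled)
```

If the two routes agree to within rounding error, that points to how the mean and median behave under a change of units.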

5. Suppose after computing the mean based on n sample observations x_1, ..., x_n, another
observation x_{n+1} becomes available. What is the relationship between the mean of the
first n observations, the new observation, and the mean of all n + 1 observations? (We
can let x̄_n denote the mean computed based on n observations.)
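One way to approach this is to note that n times the mean of the first n observations recovers their sum, so the new mean can be formed without revisiting the individual values. The sketch below checks that idea on made-up numbers; the function name `update_mean` and the sample values are hypothetical, not part of the problem set.

```python
def update_mean(mean_n, n, x_new):
    """Mean of n + 1 observations, from the mean of the first n and the new value."""
    return (n * mean_n + x_new) / (n + 1)

# Made-up numbers for illustration only
values = [4.0, 7.0, 1.0, 8.0]
new_value = 10.0

mean_n = sum(values) / len(values)
mean_all = update_mean(mean_n, len(values), new_value)

# Direct recomputation from all n + 1 values, for comparison
direct = sum(values + [new_value]) / (len(values) + 1)
print(mean_all, direct)
```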

6. In the casino game roulette, if a player bets $1 on red (or on black or on odd or on even),
the probability of winning $1 is 18/38 and the probability of losing $1 is 20/38. Suppose
that a player begins with $5 and makes successive $1 bets. Let Y equal the player’s
maximum capital before losing the $5. One hundred observations of Y were simulated on
a computer, yielding the following data:

25 9 5 5 5 9 6 5 15 45
55 6 5 6 24 21 16 5 8 7
7 5 5 35 13 9 5 18 6 10
19 16 21 8 13 5 9 10 10 6
23 8 5 10 15 7 5 5 24 9
11 34 12 11 17 11 16 5 15 5
12 6 5 5 7 6 17 20 7 8
8 6 10 11 6 7 5 12 11 18
6 21 6 5 24 7 16 21 23 15
11 8 6 8 14 11 6 9 6 10

(a) Find the five-number summary of the data.
(b) Calculate the IQR and identify outliers.
(c) Draw a box plot that shows outliers.
(d) Find the 90th sample percentile.
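The five-number summary and the usual 1.5 x IQR outlier rule can be sketched as follows. Quartile conventions differ between textbooks and software, so the "inclusive" rule of `statistics.quantiles` used here is an assumption and may not match the convention taught in the course. The function names and the small sample are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sorted(x for x in data if x < low_fence or x > high_fence)

# Small made-up sample for illustration (not the roulette data)
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(five_number_summary(sample))
print(outliers(sample))
```

The same two functions could be applied to the 100 simulated values of Y once they are typed into a list.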

7. Noise is measured in decibels, denoted as dB. One decibel is about the level of the weakest
sound that can be heard in a quiet surrounding by someone with good hearing; a whisper
measures about 30 dB; a human voice in normal conversation is about 70 dB; a loud
radio is about 100 dB. Ear discomfort usually occurs at a noise level of about 120 dB.
The following data give noise levels measured at 36 different times directly outside of
Grand Central Station in Manhattan.

82 89 94 110 74 122 112 95 100
78 65 60 90 83 87 75 114 85
69 94 124 115 107 88 97 74 72
68 83 91 90 102 77 125 108 65

(a) Construct a stem-and-leaf diagram.
(b) Convert the stem-and-leaf diagram into an ordered stem-and-leaf diagram.
(c) Determine the quartiles.
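An ordered stem-and-leaf diagram can be checked against a short script. The sketch below takes the stem to be the value with its units digit dropped and the leaf to be the units digit, which is one common choice for two- and three-digit integer data; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Ordered stem-and-leaf: stem = value // 10, leaf = units digit."""
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    # Sort the stems, and sort the leaves within each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}

# Noise levels (dB) from Problem 7
noise = [
    82, 89, 94, 110, 74, 122, 112, 95, 100,
    78, 65, 60, 90, 83, 87, 75, 114, 85,
    69, 94, 124, 115, 107, 88, 97, 74, 72,
    68, 83, 91, 90, 102, 77, 125, 108, 65,
]

diagram = stem_and_leaf(noise)
for stem, leaves in diagram.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```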

8. A study on strength properties of high-performance concrete obtained by using
superplasticizers and certain binders recorded the following data on flexural strength
(in megapascals, MPa):

5.9 7.2 7.3 6.3 8.1 6.8 7.0 7.6 6.8 6.5 7.0 6.3 7.9 9.0
8.2 8.7 7.8 9.7 7.4 7.7 9.7 7.8 7.7 11.6 11.3 11.8 10.7

(a) Construct a stem-and-leaf diagram of the data.
(b) Find the mean and median of this data set.


(c) Does the stem-and-leaf diagram appear to be reasonably symmetric about the
median, or would you describe the shape in some other way?
(d) What proportion of strength observations in this sample exceed 10 MPa?

9. In a study of warp breakage during the weaving of fabric (Technometrics, 1982: 63), 100
specimens of yarn were tested. The number of cycles of strain to breakage was
determined for each yarn specimen, resulting in the following data:

86 146 251 653 98 249 400 292 131 169
175 176 76 264 15 364 195 262 88 264
157 220 42 321 180 198 38 20 61 121
282 224 149 180 325 250 196 90 229 166
38 337 65 151 341 40 40 135 597 246
211 180 93 315 353 571 124 279 81 186
497 182 423 185 229 400 338 290 398 71
246 185 188 568 55 55 61 244 20 284
393 396 203 829 239 236 286 194 277 143
198 264 105 203 124 137 135 350 193 188

(a) Construct a relative frequency histogram based on the class intervals [0,100), [100,200),
[200,300), . . ., and comment on the features of the histogram.
(b) Construct a histogram based on the following class intervals: [0,50), [50,100), [100,150),
[150,200), [200,300), [300,400), [400,500), [500,600), [600,900).
(c) If weaving specifications require a breaking strength of at least 100 cycles, what
proportion of the yarn specimens in this sample would be considered satisfactory?
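With the unequal class widths in part (b), a common convention is to plot density (relative frequency divided by class width) so that the area of each bar, rather than its height, represents the proportion of observations. The sketch below illustrates that convention using only the first row of the data to stay short; the function names `class_counts` and `densities` are hypothetical, and whether the density scale is the one expected in the course is an assumption.

```python
from bisect import bisect_right

def class_counts(data, edges):
    """Count observations in the classes [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if edges[0] <= x < edges[-1]:
            counts[bisect_right(edges, x) - 1] += 1
    return counts

def densities(data, edges):
    """Relative frequency divided by class width, one value per class."""
    n = len(data)
    counts = class_counts(data, edges)
    return [c / n / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]

# First row of the Problem 9 data only, to keep the example short
cycles = [86, 146, 251, 653, 98, 249, 400, 292, 131, 169]
edges = [0, 50, 100, 150, 200, 300, 400, 500, 600, 900]

counts = class_counts(cycles, edges)
dens = densities(cycles, edges)
for lo, hi, c, d in zip(edges, edges[1:], counts, dens):
    print(f"[{lo},{hi}): count = {c}, density = {d:.5f}")
```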

10. Ledolter and Hogg report that a manufacturer of metal alloys is concerned about customer
complaints regarding the lack of uniformity in the melting points of one of the firm's alloy
filaments. Fifty filaments are selected and their melting points determined. The following
results were obtained:

320 326 325 318 322 320 329 317 316 331
320 320 317 329 316 308 321 319 322 335
318 313 327 314 329 323 327 323 324 314
308 305 328 330 322 310 324 314 312 318
313 320 324 311 317 325 328 319 310 324

(a) Construct a frequency table, and display the histogram, of the data.
(b) Calculate the sample mean and sample standard deviation.
(c) Locate x̄, x̄ ± s on your histogram. How many observations lie within one standard
deviation of the mean? How many lie within two standard deviations of the mean?
(d) Find the five-number summary for these melting points.
(e) Construct a box-and-whisker diagram.

11. A small part for an automobile rearview mirror was produced on two different punch
presses. In order to describe the distribution of the weights of those parts, a random
sample was selected, and each piece was weighed in grams, resulting in the following data
set:

3.968 3.534 4.032 3.912 3.572 4.014 3.682 3.608
3.669 3.705 4.023 3.588 3.945 3.871 3.744 3.711
3.645 3.977 3.888 3.948 3.551 3.796 3.657 3.667
3.799 4.010 3.704 3.642 3.681 3.554 4.025 4.079
3.621 3.575 3.714 4.017 4.082 3.660 3.692 3.905
3.977 3.961 3.948 3.994 3.958 3.860 3.965 3.592
3.681 3.861 3.662 3.995 4.010 3.999 3.993 4.004
3.700 4.008 3.627 3.970 3.647 3.847 3.628 3.646
3.674 3.601 4.029 3.603 3.619 4.009 4.015 3.615
3.672 3.898 3.959 3.607 3.707 3.978 3.656 4.027
3.645 3.643 3.898 3.635 3.865 3.631 3.929 3.635
3.511 3.539 3.830 3.925 3.971 3.646 3.669 3.931
4.028 3.665 3.681 3.984 3.664 3.893 3.606 3.699
3.997 3.936 3.976 3.627 3.536 3.695 3.981 3.587
3.680 3.888 3.921 3.953 3.847 3.645 4.042 3.692
3.910 3.672 3.957 3.961 3.950 3.904 3.928 3.984
3.721 3.927 3.621 4.038 4.047 3.627 3.774 3.983
3.658 4.034 3.778

(a) Using about 10 (say, 8 to 12) classes, construct a frequency distribution of the data.
(b) Draw a histogram of the data.
(c) Describe the shape of the distribution represented by the histogram.

12. A transformation of data values by means of some mathematical function, such as √x
or 1/x, can often yield a set of numbers that has “nicer” statistical properties than the
original data. In particular, it may be possible to find a function for which the histogram
of transformed values is more symmetric (or, even better, more like a bell-shaped curve)
than the original data.
For example, in an experiment designed to study the behaviour of certain individual cells
that had been exposed to beryllium, the interdivision times (IDTs) of cells were
determined for a large number of cells both in exposed (treatment) and unexposed (control)
conditions. Consider the following IDT data:

28.1 31.2 13.7 46.0 25.8 16.8 34.8 62.3
28.0 17.9 19.5 21.1 31.9 28.9 60.1 23.7
18.6 21.4 26.6 26.2 32.0 43.5 17.4 38.8
30.6 55.6 25.5 52.1 21.0 22.3 15.5 36.3
19.1 38.4 72.8 48.9 21.4 20.7 57.3 40.9

Construct a histogram of this data based on classes with boundaries 10, 20, 30, .... Then
calculate log10(x) for each observation, and construct a histogram of the transformed data
using class boundaries 1.1, 1.2, 1.3, .... What is the effect of the transformation?
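The class counts for both histograms can be tabulated with a short script before drawing them. The sketch below assumes left-closed classes [a, b) and stops the boundaries at 80 and 1.9, which is enough to cover these 40 observations; the function name `class_counts` is illustrative.

```python
import math
from bisect import bisect_right

def class_counts(data, edges):
    """Count observations in the classes [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if edges[0] <= x < edges[-1]:
            counts[bisect_right(edges, x) - 1] += 1
    return counts

# IDT data from Problem 12
idt = [
    28.1, 31.2, 13.7, 46.0, 25.8, 16.8, 34.8, 62.3,
    28.0, 17.9, 19.5, 21.1, 31.9, 28.9, 60.1, 23.7,
    18.6, 21.4, 26.6, 26.2, 32.0, 43.5, 17.4, 38.8,
    30.6, 55.6, 25.5, 52.1, 21.0, 22.3, 15.5, 36.3,
    19.1, 38.4, 72.8, 48.9, 21.4, 20.7, 57.3, 40.9,
]

raw_edges = list(range(10, 90, 10))                      # 10, 20, ..., 80
log_edges = [round(1.1 + 0.1 * k, 1) for k in range(9)]  # 1.1, 1.2, ..., 1.9

raw_counts = class_counts(idt, raw_edges)
log_counts = class_counts([math.log10(x) for x in idt], log_edges)

print("raw:", raw_counts)
print("log:", log_counts)
```

Comparing the two lists of counts side by side gives a first impression of how the shape changes under the transformation.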

13. A survey on knee injuries recorded the following data on type of injury (A = meniscal
tear, B = MCL tear, C = ACL tear, D = patella dislocation, E = PCL tear):

A B B A C A A D B A C E B
B A A C D C A C B C C C A
B B C A A B C C A C B B D
A B A C B A A C A B B E B
B B C C A C A A B D A A C
B C C A B B A D C A B

(a) Construct a bar chart for this data.
(b) Construct a pie chart for this data.

14. The National Highway Traffic Safety Administration has studied the use of rear-seat
automobile lap and shoulder seat belts. The number of lives potentially saved with the
use of lap and shoulder seat belts is shown for various percentages of use.

Percentage     Lives saved wearing
of use         Lap belt only     Lap and shoulder belt
100            529               678
80             423               543
60             318               407
40             212               271
20             106               136
10             85                108

Suggest an appropriate way to display this data and produce it.

15. Blood cocaine concentration (mg/L) was determined both for a sample of individuals who
had died from cocaine-induced excited delirium and for a sample of those who had died
from a cocaine overdose without excited delirium; survival time for people in both groups
was at most 6 hours. The data is as follows.

ED: 0 0 0 0 0.1 0.1 0.1 0.1 0.2 0.2
0.3 0.3 0.3 0.4 0.5 0.7 0.8 1.0 1.5 2.7
2.8 3.5 4.0 8.9 9.2 11.7 21.0
Non-ED: 0 0 0 0 0 0.1 0.1 0.1 0.1 0.2
0.2 0.2 0.3 0.3 0.3 0.4 0.5 0.5 0.6 0.8
0.9 1.0 1.2 1.4 1.5 1.7 2.0 3.2 3.5 4.1
4.3 4.8 5.0 5.6 5.9 6.0 6.4 7.9 8.3 8.7
9.1 9.6 9.9 11.0 11.5 12.2 12.7 14.0 16.6 17.8

(a) Determine the medians, quartiles and IQRs for the two samples.
(b) Are there any outliers in either sample? Any extreme outliers?
(c) Construct a side-by-side box plot, and use it as a basis for comparing and contrasting
the ED and non-ED samples.

16. Specimens of three different types of rope wire were selected, and the fatigue limit (MPa)
was determined for each specimen, resulting in the accompanying data:

Type 1: 350 350 350 363 370 370 370 371
371 372 372 380 391 391 392
Type 2: 350 354 359 363 365 368 369 371
373 374 376 380 383 388 392
Type 3: 350 361 362 363 364 365 366 371
377 377 377 380 380 380 392

(a) Construct a side-by-side box plot, and comment on similarities and differences.
(b) Construct a stem-and-leaf plot for each of the three types. Comment on similarities
and differences.
(c) Does the side-by-side box plot in part (a) give an informative assessment of
similarities and differences? Explain your reasoning.

17. Wear resistance of certain nuclear reactor components made of Zircaloy-2 is partly
determined by properties of the oxide layer. The following data is from an article that
proposed a new nondestructive testing method to monitor thickness of the layer; the
variables are x = oxide-layer thickness and y = eddy current response:

x: 0 7 17 114 133 142 190 218 237 285
y: 20.3 19.8 19.5 15.9 15.1 14.7 11.9 11.5 8.3 6.6

(a) Construct a scatter plot of the data. How would you describe the nature of the
relationship between the two variables?
(b) Compute the sample correlation coefficient for the data. Does it confirm your
impression from the scatter plot?

18. Express the sample correlation coefficient r in terms of the following sums:
$\sum x_i,\ \sum y_i,\ \sum x_i^2,\ \sum y_i^2,\ \sum x_i y_i$
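A candidate expression can be tested numerically. The sketch below uses one standard computational form of r built from the five sums and the sample size n, and tries it on made-up data, including a rescaling of x by a constant factor. The function names and the sample values are hypothetical; this is an illustration under those assumptions, not the model solution.

```python
import math

def correlation_from_sums(n, sx, sy, sxx, syy, sxy):
    """Sample correlation coefficient r from n and the five sums."""
    s_xy = sxy - sx * sy / n    # corrected sum of cross-products
    s_xx = sxx - sx ** 2 / n    # corrected sum of squares for x
    s_yy = syy - sy ** 2 / n    # corrected sum of squares for y
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation(x, y):
    """Form the five sums from paired data, then apply the formula above."""
    return correlation_from_sums(
        len(x),
        sum(x),
        sum(y),
        sum(a * a for a in x),
        sum(b * b for b in y),
        sum(a * b for a, b in zip(x, y)),
    )

# Made-up paired data for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

r = correlation(x, y)
r_rescaled = correlation([2.2 * a for a in x], y)  # x multiplied by a constant
print(r, r_rescaled)
```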

19. Toughness and fibrousness of asparagus are major determinants of quality. An article
“Postharvest glyphosate application reduces toughening, fiber content, and lignification
of stored asparagus spears” reported the following data on x = shear force (kg) and y =
percent fiber dry weight:

x: 46 48 55 57 60 72 81 85 94
y: 2.18 2.10 2.13 2.28 2.34 2.53 2.28 2.62 2.63
x: 109 121 132 137 148 149 184 185 187
y: 2.50 2.66 2.79 2.80 3.01 2.98 3.34 3.49 3.26

(a) Using the formula obtained in Problem 18, calculate the sample correlation coefficient.
Based on this value, how would you describe the nature of the relationship
between the two variables?
(b) If a first specimen has a larger value of shear force than does a second specimen,
which of the two specimens tends to have the larger percent fiber dry weight?
(c) If shear force is expressed in pounds, what happens to the value of r? Why?

20. An experiment was conducted to investigate how the behaviour of mozzarella cheese
varied with temperature. The following data was obtained, with x = temperature and y
= elongation (%) at failure of the cheese.

x: 59 63 68 72 74 78 83
y: 118 182 247 208 197 135 132

(a) Construct a scatter plot in which the axes intersect at (0, 0). Mark 0, 20, 40, 60, 80,
and 100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
(b) Construct a scatter plot in which the axes intersect at (55, 100). Does this plot seem
preferable to the one in part (a)?
(c) What do the plots suggest about the nature of the relationship between the two
variables?