ESci 117 Answers To LAs in Modules 2
Answers to the Learning Activities
ESci 117- Engineering Data Analysis
Instructor: Meralyn R. Lebante
Lesson 2.1 Organization of Data
Key points:
• Numerically measured or measurement data – discrete (e.g., shoe size,
year level, count) or continuous (e.g., weight in kg, height in m, age in
years)
Note: We can also have discrete (e.g., 5-point rating scale of 1, 2, 3, 4, or 5) or
continuous categories (e.g., income classes)
• Discrete data - organized/displayed using line diagram and histogram
• Continuous data
   o organized/displayed using dotplot, histogram, and stem-and-leaf
     display (stem-and-leaf plot)
   o construction of histogram with equal class widths and unequal class
     widths
bimodal
A. SALD
Stem  Leaf
2L    29
2H    86
3L    13 07 34
3H    86 56 65 85 99
4L    23 07 15 35 18 05 11 09 44
4H    81 80 57 65
5L    06 12
Stem: tens digit (H = high, L = low)
Leaf: ones and tenths digits (Unit = 0.1)
[Figure: relative frequency histogram]
The histogram is negatively skewed since its tail stretches to the left. This indicates that
most of the samples have percentages of respiring bacteria at relatively high levels, with
only a few at relatively low levels.
Lesson 2.2 Measures of Location
Key points:
• Measures of central tendency or average:
   o mean (at least interval, similar values)
   o median (at least ordinal)
   o mode (at least nominal)
29 5 6
30 2 4 6 9 5
31 9 2
32 1
33 3 4
SALD:
29 5 6 Unit = 0.1
30 2 4 5 6 9
31 2 9
32 1
33 3 4
Array: 29.5, 29.6, 30.2, 30.4, 30.5, 30.6, 30.9, 31.2, 31.9, 32.1, 33.3, 33.4
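The sorted display above can be reproduced with a short Python sketch. The function name `stem_and_leaf` and the variable `temps` are illustrative assumptions, not part of the lesson; the sketch assumes a leaf unit of 0.1, as stated beside the SALD.

```python
from collections import defaultdict

# Monthly high temperatures (°C), copied from the array in the text.
temps = [29.5, 29.6, 30.2, 30.4, 30.5, 30.6, 30.9, 31.2, 31.9, 32.1, 33.3, 33.4]


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for a leaf unit of 0.1 (hypothetical helper)."""
    rows = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # integer tenths avoid float round-off
        stem, leaf = divmod(tenths, 10)  # stem = whole part, leaf = tenths digit
        rows[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}


sald = stem_and_leaf(temps)
for stem, leaves in sald.items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
```

Each printed row is a stem followed by its leaves in ascending order, matching the layout of the sorted SALD.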
1. To find the population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N} = \dfrac{373.6}{12} \approx 31.13$℃
Interpretation:
Most of the temperatures are close to 31.13℃. Or, if the monthly high
temperature was constant in 2019, this temperature would be
31.13℃.
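As a quick check of the arithmetic in item 1, the following minimal Python sketch recomputes the sum and the mean. The variable names are illustrative assumptions.

```python
from statistics import fmean

# Monthly high temperatures (°C), copied from the array in the text.
temps = [29.5, 29.6, 30.2, 30.4, 30.5, 30.6, 30.9, 31.2, 31.9, 32.1, 33.3, 33.4]

total = sum(temps)   # numerator of the mean formula
mu = fmean(temps)    # population mean: total divided by N

print(f"sum = {total:.1f}, N = {len(temps)}, mean = {mu:.2f}")
```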
2. To find the population median, $N$ is even: $\tilde{\mu} = \dfrac{x_{N/2} + x_{N/2+1}}{2} = \dfrac{x_6 + x_7}{2} = \dfrac{30.6 + 30.9}{2} = 30.75$℃
Interpretation: Half of the monthly high temperatures in 2019 were below 30.75℃ and half were
above it (as can be observed).
Note: Since the SALD indicates a positively skewed histogram, the median is a better measure of
central tendency than the mean. That is, the middle value better represents the 12
temperatures in terms of “average” level.
3. There is no mode as each monthly high temperature in 2019
is unique.
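The median in item 2 and the no-mode conclusion in item 3 can be checked with a minimal Python sketch. The variable names are illustrative assumptions; a value counts as a mode candidate here only if it occurs more than once.

```python
from collections import Counter
from statistics import median

# Monthly high temperatures (°C), copied from the array in the text.
temps = [29.5, 29.6, 30.2, 30.4, 30.5, 30.6, 30.9, 31.2, 31.9, 32.1, 33.3, 33.4]

# For an even N, statistics.median averages the two middle ordered values.
med = median(temps)

# Count occurrences; any value seen more than once would be a mode candidate.
counts = Counter(temps)
repeated = [x for x, c in counts.items() if c > 1]

print(f"median = {med:.2f}")
print("no mode" if not repeated else f"repeated values: {repeated}")
```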
4. To find the first quartile ($Q_1 = P_{25}$):
a. Finding $\dfrac{nk}{100} = \dfrac{12(25)}{100} = 12\left(\dfrac{1}{4}\right) = 3$
b. Since we obtained an integer in a), we have
$Q_1 = P_{25} = \dfrac{x_3 + x_4}{2} = \dfrac{30.2 + 30.4}{2} = 30.3$℃.
We can say that, in 2019, the lowest 25% of the monthly high
temperatures were below 30.3℃ (as we can observe).
5. To find the third quartile ($Q_3 = P_{75}$):
a. Finding $\dfrac{nk}{100} = \dfrac{12(75)}{100} = 12\left(\dfrac{3}{4}\right) = 9$
b. Since we obtained an integer in a), we have
$Q_3 = P_{75} = \dfrac{x_9 + x_{10}}{2} = \dfrac{31.9 + 32.1}{2} = 32$℃.
We can say that, in 2019, the highest 25% of the monthly high
temperatures were above 32℃ (as we can observe).
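The two-step percentile procedure used in items 4 and 5 can be sketched in Python as follows. The function name `percentile` is an illustrative assumption. The integer case follows the steps shown above; the rule for a non-integer position (round up to the next ordered value) is an assumption, since items 4 and 5 only exercise the integer case.

```python
import math

# Monthly high temperatures (°C), copied from the array in the text.
temps = [29.5, 29.6, 30.2, 30.4, 30.5, 30.6, 30.9, 31.2, 31.9, 32.1, 33.3, 33.4]


def percentile(values, k):
    """k-th percentile (integer k with 0 < k < 100) via the nk/100 position rule."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    x = sorted(values)
    n = len(x)
    if (n * k) % 100 == 0:
        # Step b, integer case: average the i-th and (i+1)-th ordered values.
        i = n * k // 100
        return (x[i - 1] + x[i]) / 2
    # ASSUMPTION: for a non-integer position, round up and take that value.
    i = math.ceil(n * k / 100)
    return x[i - 1]


q1 = percentile(temps, 25)
q3 = percentile(temps, 75)
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}")
```

The integer test uses `n * k` in whole numbers rather than the floating-point quotient, so a position such as 3 or 9 is detected exactly.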
Lesson 2.3 Measures of Variability
Key points:
SALD:
0H 89 96                Unit = 0.1
1L 03 18 27 40 46
1H 61 85
2L 04 12 33 42 49
2H 53 58 71 85
3L 02 24
3H
4L
4H 50
Array:
8.9, 9.6, 10.3, 11.8, 12.7, 14.0, 14.6, 16.1, 18.5, 20.4, 21.2, 23.3, 24.2, 24.9,
25.3, 25.8, 27.1, 28.5, 30.2, 32.4, 45.0
• Compute the appropriate measure of variability and interpret.
Since the SALD shows one value in the upper tail that differs markedly from the
rest, the mean is not a good measure of central tendency, so we need an
alternative to the standard deviation as the measure of variability. Because the
shape of the histogram indicated by the SALD is not symmetric, we use the average
deviation based on the median. Recall that the median of this data set is 21.2.
We now compute the average deviation as follows:

Crack Length $x_i$    $|x_i - \tilde{x}|$
8.9                   12.3
9.6                   11.6
10.3                  10.9
11.8                  9.4
12.7                  8.5
14                    7.2
14.6                  6.6
16.1                  5.1
18.5                  2.7
20.4                  0.8
21.2                  0
23.3                  2.1
24.2                  3
24.9                  3.7
25.3                  4.1
25.8                  4.6
27.1                  5.9
28.5                  7.3
30.2                  9
32.4                  11.2
45                    23.8
Total 444.8           149.8

$\text{A.D.} = \dfrac{\sum_{i=1}^{21} |x_i - 21.2|}{21} = \dfrac{12.3 + 11.6 + \cdots + 23.8}{21} = \dfrac{149.8}{21} \approx 7.13$

Interpretation: On average, the 21 lengths deviated by 7.13 units from their median.
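The average deviation about the median can be recomputed with a minimal Python sketch. The variable names are illustrative assumptions; the data are the 21 crack lengths from the array in this lesson.

```python
from statistics import median

# The 21 crack lengths, copied from the array in the text.
cracks = [8.9, 9.6, 10.3, 11.8, 12.7, 14.0, 14.6, 16.1, 18.5, 20.4, 21.2,
          23.3, 24.2, 24.9, 25.3, 25.8, 27.1, 28.5, 30.2, 32.4, 45.0]

med = median(cracks)                      # middle (11th) value since n is odd
abs_dev = [abs(x - med) for x in cracks]  # |x_i - median| for each length
avg_dev = sum(abs_dev) / len(cracks)      # average deviation about the median

print(f"median = {med}, total |deviation| = {sum(abs_dev):.1f}, A.D. = {avg_dev:.2f}")
```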
2. The CV of this sample data is computed as follows:
$\text{CV} = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{9.0018}{21.18} \times 100\% \approx 42.50\%$
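The coefficient of variation can be checked with a minimal Python sketch. The variable names are illustrative assumptions; `statistics.stdev` computes the sample standard deviation (divisor n − 1), which is assumed to be the $s$ used here.

```python
from statistics import mean, stdev

# The 21 crack lengths, copied from the array in the text.
cracks = [8.9, 9.6, 10.3, 11.8, 12.7, 14.0, 14.6, 16.1, 18.5, 20.4, 21.2,
          23.3, 24.2, 24.9, 25.3, 25.8, 27.1, 28.5, 30.2, 32.4, 45.0]

x_bar = mean(cracks)      # sample mean
s = stdev(cracks)         # sample standard deviation (n - 1 divisor)
cv = s / x_bar * 100      # coefficient of variation, in percent

print(f"mean = {x_bar:.2f}, s = {s:.4f}, CV = {cv:.2f}%")
```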
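Position of $Q_1$: $\dfrac{ni}{4} = \dfrac{21(1)}{4} = 5.25$; position of $Q_3$: $\dfrac{ni}{4} = \dfrac{21(3)}{4} = 15.75$
Since neither position is an integer, each is rounded up to the next ordered value: $Q_1 = x_6 = 14.0$ and $Q_3 = x_{16} = 25.8$, so $\text{IQR} = 25.8 - 14.0 = 11.8$.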
Lower inner fence = 14.0 − 1.5(11.8) = −3.7, not useful (no negative value)
Upper inner fence = 25.8 + 1.5(11.8) = 43.5
Lower outer fence = 14.0 − 3(11.8) = −21.4, not useful
Upper outer fence = 25.8 + 3(11.8) = 61.2
The boxplot for this data set is:
[Boxplot: $Q_1 = 14.0$, median $\tilde{x}$, $Q_3 = 25.8$, IQR = 11.8; the largest data value is marked as a mild outlier since it is beyond the upper inner fence]
Clearly, the largest value, 45.0, is a mild outlier. The sample data is
negatively skewed with the median line closer to the upper quartile.
There is no outlier in the lower half.
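The fence computations and the outlier classification can be sketched in Python as follows. The quartiles are taken directly from the lesson ($Q_1 = 14.0$, $Q_3 = 25.8$); the variable names are illustrative assumptions. A value between an inner and an outer fence is treated as a mild outlier, and a value beyond an outer fence as an extreme outlier.

```python
# The 21 crack lengths, copied from the array in the text.
cracks = [8.9, 9.6, 10.3, 11.8, 12.7, 14.0, 14.6, 16.1, 18.5, 20.4, 21.2,
          23.3, 24.2, 24.9, 25.3, 25.8, 27.1, 28.5, 30.2, 32.4, 45.0]

q1, q3 = 14.0, 25.8          # quartiles as given in the lesson
iqr = q3 - q1                # interquartile range

lower_inner = q1 - 1.5 * iqr
upper_inner = q3 + 1.5 * iqr
lower_outer = q1 - 3 * iqr
upper_outer = q3 + 3 * iqr

# Mild outliers lie between an inner and an outer fence;
# extreme outliers lie beyond an outer fence.
mild = [x for x in cracks
        if lower_outer <= x < lower_inner or upper_inner < x <= upper_outer]
extreme = [x for x in cracks if x < lower_outer or x > upper_outer]

print(f"inner fences: ({lower_inner:.1f}, {upper_inner:.1f})")
print(f"outer fences: ({lower_outer:.1f}, {upper_outer:.1f})")
print("mild outliers:", mild, "| extreme outliers:", extreme)
```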
Thank You!