
Course Statistical Hydrology

Assignment No. 1
Topic Helsel et al. Chapter 1 - Summarizing Data

Exercise 1.1
Yields in wells penetrating rock units without fractures were measured by Wright (1985), and are given
below. Calculate the
a) mean
b) trimmed mean
c) geometric mean
d) median
e) compare these estimates of location. Why do they differ?
Unit well yields (in gal/min/ft) in Virginia (Wright, 1985)
0.001 0.03 0.1 0.003 0.004 0.454
0.007 0.051 0.49 0.02 0.077 1.02

Solution:
a) $n = 12$, $\sum X_i = 2.257$, $\bar{X} = \frac{\sum X_i}{n} = 0.188$

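b)
10 percent trimmed mean: 0.10 × 12 = 1.2, so 1 observation is trimmed from each end (0.001 and 1.02).
0.001 0.003 0.004 0.007 0.02 0.03
0.051 0.077 0.1 0.454 0.49 1.02

10 % trimmed mean = 0.124

20 percent trimmed mean: 0.20 × 12 = 2.4, so 2 observations are trimmed from each end (0.001, 0.003,
0.49 and 1.02).
0.001 0.003 0.004 0.007 0.02 0.03
0.051 0.077 0.1 0.454 0.49 1.02

20 % trimmed mean = 0.0929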
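
The trimming in part b) can be sketched in Python. This is only a minimal sketch: the function name `trimmed_mean` is hypothetical, and rounding the trim count (0.10 × n or 0.20 × n) to the nearest integer is an assumption inferred from the counts of 1 and 2 observations used above.

```python
def trimmed_mean(values, fraction):
    """Mean after dropping the k smallest and k largest observations.

    k is fraction * n rounded to the nearest integer (an assumed rule).
    """
    data = sorted(values)
    k = int(fraction * len(data) + 0.5)   # round to nearest integer
    kept = data[k:len(data) - k]          # k == 0 keeps every observation
    return sum(kept) / len(kept)


# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]

tm10 = trimmed_mean(yields, 0.10)   # trims 1 observation from each end
tm20 = trimmed_mean(yields, 0.20)   # trims 2 observations from each end
print(round(tm10, 3), round(tm20, 4))
```

With n = 12 the sketch averages the middle 10 and the middle 8 observations, which is the arithmetic behind the two values reported above.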

c)
$Y_i = \ln(X_i)$

$X_i$     $Y_i$
0.001   -6.9078
0.003   -5.8091
0.004   -5.5215
0.007   -4.9618
0.02    -3.9120
0.03    -3.5066
0.051   -2.9759
0.077   -2.5639
0.1     -2.3026
0.454   -0.7897
0.49    -0.7133
1.02     0.0198

$\sum Y_i = -39.9445$
$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = -3.328705$
$GM = \exp(\bar{Y}) = 0.0358$
$GM = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = 0.0358$
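
The two expressions for the geometric mean in part c) can be checked against each other with a short Python sketch. The helper name `geometric_mean` is hypothetical; the data are the well yields of Exercise 1.1.

```python
import math

# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]


def geometric_mean(values):
    """exp of the mean natural logarithm; every value must be > 0."""
    logs = [math.log(x) for x in values]
    return math.exp(sum(logs) / len(logs))


gm_from_logs = geometric_mean(yields)
# Equivalent definition: n-th root of the product of the observations
gm_from_product = math.prod(yields) ** (1 / len(yields))
print(round(gm_from_logs, 4), round(gm_from_product, 4))
```

Both routes give the same value up to floating-point error. The log form is usually preferred for longer series because the raw product can underflow.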
d)
0.001 0.003 0.004 0.007 0.02 0.03
0.051 0.077 0.1 0.454 0.49 1.02

Central values: 0.03 and 0.051
median = (0.03 + 0.051) / 2 = 0.0405

e)
The mean is much higher than the median (0.188 compared to 0.0405), which shows that the sample is
positively skewed. The mean differs from the 10 percent and 20 percent trimmed means (0.124 and 0.0929)
because trimming removes the most extreme observations, so the trimmed means reflect the more common
observations. The 20 percent trimmed mean is more resistant to outliers than the 10 percent trimmed mean
because it leaves out more of the extreme observations.

The geometric mean (0.0358) is the mean of the logarithms transformed back to the original units. Because
the logarithms of positively skewed data are more nearly symmetric, it falls close to the median and
describes a typical well yield better than the arithmetic mean does.
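
One way to see why these estimates differ is to change the largest observation and watch how each estimate responds. The sketch below is illustrative only: the value 10.2 is hypothetical (the recorded 1.02 multiplied by 10) and is not part of the Wright (1985) data, and the helper name `location` is also hypothetical.

```python
import statistics

# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]

# Hypothetical sample: the largest value 1.02 replaced by 10.2
inflated = sorted(yields)[:-1] + [10.2]


def location(values):
    """Three estimates of location for one sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "geometric_mean": statistics.geometric_mean(values),
    }


original = location(yields)
changed = location(inflated)
for name in original:
    print(name, round(original[name], 4), round(changed[name], 4))
```

The median does not move at all, the geometric mean rises by a factor of 10^(1/12), and the mean becomes several times larger, which is the sense in which the mean is not resistant to a single extreme value.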

Exercise 1.2
For the well yield data of exercise 1.1, calculate the
a) standard deviation
b) interquartile range
c) MAD
d) skew and quartile skew.
Discuss the differences between a through c.

Solution:
a)
$X_i$     $X_i - \bar{X}$   $(X_i - \bar{X})^2$
0.001   -0.1871   0.0350
0.003   -0.1851   0.0343
0.004   -0.1841   0.0339
0.007   -0.1811   0.0328
0.02    -0.1681   0.0283
0.03    -0.1581   0.0250
0.051   -0.1371   0.0188
0.077   -0.1111   0.0123
0.1     -0.0881   0.0078
0.454    0.2659   0.0707
0.49     0.3019   0.0912
1.02     0.8319   0.6921

$\bar{X} = 0.1881$
$\sum (X_i - \bar{X})^2 = 1.0820$
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = 0.0984$
$SD = s = \sqrt{s^2} = 0.314$
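
The steps in part a) translate directly into code. This is a minimal sketch with hypothetical function names, using the n - 1 divisor shown in the formula for $s^2$.

```python
import math

# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


s2 = sample_variance(yields)
sd = sample_sd(yields)
print(round(s2, 4), round(sd, 3))
```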

b)
position 1 2 3 4 5 6
observ. 0.001 0.003 0.004 0.007 0.02 0.03

position 7 8 9 10 11 12
observ. 0.051 0.077 0.1 0.454 0.49 1.02

$P_{0.75} = X_{9.75}$; interpolation for $X_{9.75}$: $0.1 + 0.75\,(0.454 - 0.1) = 0.3655$
$P_{0.25} = X_{3.25}$; interpolation for $X_{3.25}$: $0.004 + 0.25\,(0.007 - 0.004) = 0.00475$
$IQR = P_{0.75} - P_{0.25} = 0.361$
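
The positions 9.75 and 3.25 used in part b) correspond to (n + 1) × p with n = 12. The sketch below assumes that positioning rule together with linear interpolation between neighbouring observations; the function name `percentile` is hypothetical.

```python
def percentile(values, p):
    """Percentile at 1-based position (n + 1) * p, linearly interpolated."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p
    lower = int(pos)              # position of the observation just below
    frac = pos - lower            # fractional distance to the next one
    if lower < 1:
        return data[0]            # position falls before the first value
    if lower >= n:
        return data[-1]           # position falls after the last value
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]

p25 = percentile(yields, 0.25)
p75 = percentile(yields, 0.75)
iqr = p75 - p25
print(p25, p75, round(iqr, 3))
```

Other software may use a different positioning rule and return slightly different quartiles for the same data.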
c)
$d_i = X_i - \text{median}$, with median of $X_i$ = 0.0405

$X_i$     $|d_i|$    $|d_i|$ from lowest to highest
0.001   0.0395   0.0105
0.003   0.0375   0.0105
0.004   0.0365   0.0205
0.007   0.0335   0.0335
0.02    0.0205   0.0365
0.03    0.0105   0.0365
0.051   0.0105   0.0375
0.077   0.0365   0.0395
0.1     0.0595   0.0595
0.454   0.4135   0.4135
0.49    0.4495   0.4495
1.02    0.9795   0.9795

$MAD = \text{median}\,|d_i| = 0.037$
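
The MAD calculation in part c) takes two medians: one of the data and one of the absolute deviations from it. A minimal sketch, with the hypothetical function name `mad`:

```python
import statistics

# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]


def mad(values):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])


mad_yields = mad(yields)
print(round(mad_yields, 3))
```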

d)
$\bar{X} = 0.1881$, $s = 0.314$

$X_i$     $\frac{(X_i - \bar{X})^3}{s^3}$
0.001   -0.212248
0.003   -0.205513
0.004   -0.2022
0.007   -0.192475
0.02    -0.153926
0.03    -0.128055
0.051   -0.083501
0.077   -0.044431
0.1     -0.022152
0.454    0.6095013
0.49     0.8920706
1.02    18.662827

$\sum \frac{(X_i - \bar{X})^3}{s^3} = 18.9199$
$g = \frac{n}{(n - 1)(n - 2)} \sum \frac{(X_i - \bar{X})^3}{s^3} = 2.06$

$P_{0.25} = 0.00475$
$P_{0.50} = 0.0405$
$P_{0.75} = 0.3655$
$qs = \frac{(P_{0.75} - P_{0.50}) - (P_{0.50} - P_{0.25})}{P_{0.75} - P_{0.25}} = 0.802$
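
Both skewness measures in part d) can be reproduced with a short sketch. The function names `skew_g` and `quartile_skew` are hypothetical, and the quartiles are taken directly from the values listed above rather than recomputed.

```python
import math

# Unit well yields (gal/min/ft) from Exercise 1.1
yields = [0.001, 0.03, 0.1, 0.003, 0.004, 0.454,
          0.007, 0.051, 0.49, 0.02, 0.077, 1.02]


def skew_g(values):
    """Coefficient of skewness g = n / ((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


def quartile_skew(p25, p50, p75):
    """Quartile skew: difference of the two half-spreads over the IQR."""
    return ((p75 - p50) - (p50 - p25)) / (p75 - p25)


g = skew_g(yields)
qs = quartile_skew(0.00475, 0.0405, 0.3655)
print(round(g, 2), round(qs, 3))
```

The single largest observation contributes almost the whole sum in g, while qs depends only on the three quartiles.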

The highest observation (1.02) lies far from the rest of the observations and can be regarded as an
outlier. Because the IQR is not affected by this observation, it is a more resistant measure of spread
than the SD. The MAD (0.037) is likewise unaffected by the size of the largest value.

Exercise 1.3
Ammonia plus organic nitrogen (in mg/L) was measured in samples of precipitation by Oltmann and Shulters
(1989). Some of their data are presented below. Compute summary statistics for these data. Which
observation might be considered an outlier? How should this value affect the choice of summary statistics
used
a) to compute the mass of nitrogen falling per square mile.
b) to compute a "typical" concentration and variability for these data?
0.3 0.9 0.36 0.92 0.5 1
0.7 9.7 0.7 1.3
Solution:
$Y_i = \ln(X_i)$; $d_i = X_i - \text{median}$; the last column lists $|d_i|$ from lowest to highest.

Position  $X_i$   $Y_i$      $X_i - \bar{X}$  $(X_i - \bar{X})^2$  $|d_i|$  $\frac{(X_i - \bar{X})^3}{s^3}$  $|d_i|$ sorted
1         0.3    -1.2040   -1.3380    1.7902   0.5    -0.103574   0.1
2         0.36   -1.0217   -1.2780    1.6333   0.44   -0.090256   0.1
3         0.5    -0.6931   -1.1380    1.2950   0.3    -0.063725   0.1
4         0.7    -0.3567   -0.9380    0.8798   0.1    -0.035686   0.12
5         0.7    -0.3567   -0.9380    0.8798   0.1    -0.035686   0.2
6         0.9    -0.1054   -0.7380    0.5446   0.1    -0.01738    0.3
7         0.92   -0.0834   -0.7180    0.5155   0.12   -0.016005   0.44
8         1       0.0000   -0.6380    0.4070   0.2    -0.011229   0.5
9         1.3     0.2624   -0.3380    0.1142   0.5    -0.00167    0.5
10        9.7     2.2721    8.0620   64.9958   8.9    22.657513   8.9

$n = 10$
$\bar{X} = 1.638$
median of $X_i$ = 0.80
$s^2 = 8.117$
$s = 2.849$

$P_{0.25} = X_{2.75} = 0.465$
$P_{0.50} = X_{5.50} = 0.8$
$P_{0.75} = X_{8.25} = 1.075$

Summary statistics:
sample mean = 1.6
SD = 2.8
median = 0.80
IQR = 0.61
GM = 0.88
MAD = 0.25
g = 3.1
qs = -0.098

a) The highest value (9.7 mg/L) is an outlier, lying far from the other observations.
To compute the mass of nitrogen falling per square mile, the mean should be used, provided the value is
a correct measurement: the mean includes the highest observation, whereas the median would ignore it.

b) To compute a typical concentration and variability, the highest value should not influence the
result, since it is a single value that lies far from the rest. The measures that are not affected by
this highest point are the median and the IQR.
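
The reasoning in part a) can be illustrated numerically. The sketch below assumes, purely for illustration, that every sample represents the same precipitation volume, so that the total mass is proportional to the sum of the concentrations; that assumption is not stated in the exercise.

```python
import statistics

# Ammonia plus organic nitrogen (mg/L) from Exercise 1.3
nitrogen = [0.3, 0.9, 0.36, 0.92, 0.5, 1, 0.7, 9.7, 0.7, 1.3]

n = len(nitrogen)
total = sum(nitrogen)                            # proportional to total mass (assumed)
from_mean = n * statistics.mean(nitrogen)        # recovers the total
from_median = n * statistics.median(nitrogen)    # leaves out the 9.7 event

print(round(total, 2), round(from_mean, 2), round(from_median, 2))
```

Scaling the mean by n recovers the full sum, while scaling the median accounts for less than half of it, because the single 9.7 mg/L sample carries most of the nitrogen.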
Exercises considering the CWD Dataset
$X_i$ = volume (m3/ha); $Y_i = \ln(X_i)$, not defined for $X_i = 0$; $d_i = X_i - \text{median}$; the last
column lists $|d_i|$ from lowest to highest.

Position  $X_i$    $Y_i$    $X_i - \bar{X}$  $(X_i - \bar{X})^2$  $|d_i|$   $\frac{(X_i - \bar{X})^3}{s^3}$  $|d_i|$ sorted
1         0       -       -6.08    36.97    1.88    -0.25    0.34
2         0       -       -6.08    36.97    1.88    -0.25    0.34
3         0.99    -0.01   -5.09    25.91    0.89    -0.15    0.34
4         1.54    0.43    -4.54    20.61    0.34    -0.10    0.34
5         1.54    0.43    -4.54    20.61    0.34    -0.10    0.34
6         1.54    0.43    -4.54    20.61    0.34    -0.10    0.34
7         1.54    0.43    -4.54    20.61    0.34    -0.10    0.89
8         2.22    0.80    -3.86    14.90    0.34    -0.06    1.88
9         2.22    0.80    -3.86    14.90    0.34    -0.06    1.88
10        3.95    1.37    -2.13     4.54    2.07    -0.01    2.07
11        6.17    1.82     0.09     0.01    4.29     0.00    4.29
12        12.09   2.49     6.01    36.12   10.21     0.24   10.21
13        15.79   2.76     9.71    94.28   13.91     1.01   13.91
14        35.53   3.57    29.45   867.30   33.65    28.29   33.65

$n = 14$
$\bar{X} = 6.08$
median of $X_i$ = 1.88
$s^2 = 93.41$
$s = 9.66$

$P_{0.25} = X_{3.75} = 1.40$
$P_{0.50} = X_{7.50} = 1.88$
$P_{0.75} = X_{11.25} = 7.65$

Exercise 1.1
a)
𝑠𝑎𝑚𝑝𝑙𝑒 𝑚𝑒𝑎𝑛 = 6.08

b)
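10 percent trimmed mean: 0.10 × 14 = 1.4, so 1 observation is trimmed from each end (0 and 35.53).
0 0 0.99 1.54 1.54 1.54 1.54
2.22 2.22 3.95 6.17 12.09 15.79 35.53

10 % trimmed mean = 4.13

20 percent trimmed mean: 0.20 × 14 = 2.8, so 3 observations are trimmed from each end (0, 0, 0.99 and
12.09, 15.79, 35.53).
0 0 0.99 1.54 1.54 1.54 1.54
2.22 2.22 3.95 6.17 12.09 15.79 35.53

20 % trimmed mean = 2.59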

c) $GM = 2.99$ (the two zero observations have no logarithm and do not contribute to $\sum Y_i$)
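
The CWD data contain two zeros, for which the logarithm is not defined, so the geometric mean needs a convention. The sketch below is an inference, not something stated in the solution: the reported 2.99 appears to come from summing the 12 defined logarithms and dividing by n = 14, and the alternative of dividing by the 12 positive observations is shown for comparison. All variable names are hypothetical.

```python
import math

# CWD volumes (m3/ha)
cwd = [0, 0, 0.99, 1.54, 1.54, 1.54, 1.54,
       2.22, 2.22, 3.95, 6.17, 12.09, 15.79, 35.53]

# Sum of natural logs over the positive observations only
log_sum = sum(math.log(x) for x in cwd if x > 0)
n_all = len(cwd)                        # 14 observations in total
n_pos = sum(1 for x in cwd if x > 0)    # 12 positive observations

gm_div_all = math.exp(log_sum / n_all)        # divides by all 14
gm_positive_only = math.exp(log_sum / n_pos)  # divides by the 12 positives
print(round(gm_div_all, 2), round(gm_positive_only, 2))
```

Dividing by 14 is equivalent to treating each zero as a value of 1 m3/ha, so whichever convention is used should be stated with the result.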

d) 𝑚𝑒𝑑𝑖𝑎𝑛 = 1.88

e)
The mean (6.08) is higher than the median (1.88), which means that the sample is positively skewed. The
trimmed means eliminate the lowest and highest observations, which are treated as outliers in this case
because of their distance from the central observations. The 20 percent trimmed mean is again the more
resistant of the two, because it leaves out the 3 lowest and the 3 highest observations.
Exercise 1.2
a) 𝑆𝐷 = 9.66

b) 𝐼𝑄𝑅 = 6.25

c) 𝑀𝐴𝐷 = 1.39

d) 𝑔= 2.5
𝑞𝑠 = 0.85

This sample resembles the previous data set: the MAD (1.39) is much smaller than the SD (9.66) and the
IQR (6.25). The IQR can also be considered a more resistant measure of spread than the SD.
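
The three spread measures for the CWD data can be cross-checked with the Python standard library. This is a sketch rather than the method used for the table: `statistics.quantiles` with `method="exclusive"` places the quartiles at positions (n + 1) × p, which matches $X_{3.75}$ and $X_{11.25}$ above.

```python
import statistics

# CWD volumes (m3/ha)
cwd = [0, 0, 0.99, 1.54, 1.54, 1.54, 1.54,
       2.22, 2.22, 3.95, 6.17, 12.09, 15.79, 35.53]

sd = statistics.stdev(cwd)                      # sample SD, n - 1 divisor
q1, q2, q3 = statistics.quantiles(cwd, n=4, method="exclusive")
iqr = q3 - q1
# q2 is the median, so this is the median absolute deviation
mad = statistics.median([abs(x - q2) for x in cwd])

print(round(sd, 2), round(iqr, 2), round(mad, 2))
```

The SD is dominated by the largest volume, while the IQR and MAD depend only on the central part of the sample.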
