3 Data Collection and Preliminary Data Analysis
This chapter starts by presenting some basic notions and characteristics of different types of data collection systems and types of sensors. Next, simple ways of validating and assessing the accuracy of the data collected are addressed. Subsequently, salient statistical measures to describe univariate and multivariate data are presented along with how to use them during basic exploratory data and graphical analyses. The two types of measurement uncertainty (bias and random) are discussed and the concept of confidence intervals is introduced and its usefulness illustrated. Finally, three different ways of determining uncertainty in a data reduction equation by propagating individual variable uncertainty are presented; namely, the analytical, numerical and Monte Carlo methods.

3.1 Generalized Measurement System

There are several books, for example Doebelin (1995) or Holman and Gajda (1984), which provide a very good overview of the general principles of measurement devices and also address the details of specific measuring devices (the common ones being those that measure physical quantities such as temperature, fluid flow, heat flux, velocity, force, torque, pressure, voltage, current, power, …). This section will be limited to presenting only those general principles which would aid the analyst in better analyzing his data. The generalized measurement system can be divided into three parts (Fig. 3.1):
(i) detector-transducer stage, which detects the value of the physical quantity to be measured and transduces or transforms it into another form, i.e., performs either a mechanical or an electrical transformation to convert the signal into a more easily measured and usable form (either digital or analog);
(ii) intermediate stage, which modifies the direct signal by amplification, filtering, or other means so that an output within a desirable range is achieved. If there is a known correction (or calibration) for the sensor, it is done at this stage; and
(iii) output or terminating stage, which acts to indicate, record, or control the variable being measured. The output could be digital or analog.

Ideally, the output or terminating stage should indicate only the quantity to be measured. Unfortunately, there are various spurious inputs which could contaminate the desired measurement and introduce errors. Doebelin (1995) groups the inputs that may cause contamination into two basic types: modifying and interfering (Fig. 3.2).
(i) Interfering inputs introduce an error component to the output of the detector-transducer stage in a rather direct manner, just as does the desired input quantity. For example, if the quantity being measured is the temperature of a solar collector's absorber plate, improper shielding of the thermocouple would result in an erroneous reading due to the solar radiation striking the thermocouple junction. As shown in Fig. 3.3, the calibration of the instrument is no longer a constant but is affected by the time at which the measurement is made, and since this may differ from one day to the next, the net result is improper calibration. Thus, solar radiation would be thought of as an interfering input.
(ii) Modifying inputs have a more subtle effect, introducing errors by modifying the input/output relation between the desired value and the output measurement (Fig. 3.2). An example of this occurrence is when the oil used to lubricate the various intermeshing mechanisms of a system has deteriorated, and the resulting change in viscosity can lead to the input/output relation getting altered in some manner.

One needs also to distinguish between the analog and the digital nature of the sensor output or signal (Doebelin 1995). For analog signals, the precise value of the quantity (voltage, temperature, …) carrying the information is important. Digital signals, however, are basically binary in nature (on/off), and variation in numerical values is associated with changes in the logical state (true/false). Consider a digital electronic system where any voltage in the range of +2 to +5 V produces the on-state, while signals of 0 to +1.0 V correspond to the off-state.
T. Agami Reddy, Applied Data Analysis and Modeling for Energy Engineers and Scientists, 61
DOI 10.1007/978-1-4419-9613-8_3, © Springer Science+Business Media, LLC 2011
Fig. 3.1 Schematic of the generalized measurement system: the quantity to be measured passes through the detector-transducer stage, the intermediate stage and the output stage to yield the quantity recorded
3.2.1 Sensors
[Figure residue from this section: (a) accuracy versus (b) precision; sensitivity drift of the output signal at nominal and off-design temperatures; a least-squares fitted calibration line q_o = 1.08·q_i − 0.85 with ±3s uncertainty limits (±0.54 imprecision, bias of −0.47); sample pressure readings in kPa]
Fig. 3.9 Concept of rise time of the output response to a step input (showing the true signal together with phase-distorted and frequency-distorted output responses)
3.3 Data Validation and Preparation

The aspects of data collection, cleaning, validation and transformation are crucial. However, these aspects are summarily treated in most books, partly because their treatment involves adopting circumstance-specific methods, and also because it is (alas) considered neither of much academic interest nor a field worthy of scientific/statistical endeavor.

Fortunately, many of the measurements made in the context of engineering systems have identifiable limits. Limits are useful in a number of experimental phases such as establishing a basis for appropriate instrumentation and measurement techniques, rejecting individual experimental observations, and bounding/bracketing measurements. Measurements can often be compared with one or more of the following limits: physical, expected and theoretical.
(a) Physical Limits. Appropriate physical limits should be identified in the planning phases of an experiment so that they can be used to check the reasonableness of both raw and post-processed data. Under no circumstance can experimental observations exceed physical limits. For example, in psychrometrics:
− dry bulb temperature ≥ wet bulb temperature ≥ dew point temperature
− 0 ≤ relative humidity ≤ 100%
An example in refrigeration systems is that the refrigerant saturated condensing temperature should always be greater than the outdoor air dry bulb temperature for air-cooled condensers. Another example in solar radiation measurement is that global radiation on a surface should be greater than the beam radiation incident on the same surface.
Experimental observations or processed data that exceed physical limits should be flagged and closely scrutinized to determine the cause and extent of their deviation from the limits. The reason for data being persistently beyond physical limits is usually instrumentation bias or errors in data analysis routines/methods. Data that infrequently exceed physical limits may be caused by noise or other related problems. Resolving problems associated with observations that sporadically exceed physical limits is often difficult. However, if they occur, the experimental equipment and data analysis routines should be inspected and repaired. In situations where data occasionally exceed physical limits, it is often justifiable to purge such observations from the dataset prior to undertaking any statistical analysis or testing of hypotheses.
(b) Expected Limits. In addition to identifying physical limits, expected upper and lower bounds should be identified for each measured variable. During the planning phase of an experiment, determining expected ranges for measured variables facilitates the selection of instrumentation and measurement techniques. Prior to taking data, it is important to ensure that the measurement instruments have been calibrated and are functional over the range of expected operation. An example is that the relative humidity in conditioned office spaces should be in the range of 30–65%. During the execution phase of experiments, the identified bounds serve as the basis for flagging potentially suspicious data. If individual observations exceed the upper or lower range of expected values, those points should be flagged and closely scrutinized to establish their validity and reliability. Another suspicious behavior is constant values when varying values are expected. Typically, this is caused by an incorrect lower or upper bound in the data reporting system so that limit values are being reported instead of actual values.
(c) Theoretical Limits. These limits may be related to physical properties of substances (e.g., fluid freezing point), thermodynamic limits of a subsystem or system (e.g., Carnot efficiency for a vapor compression cycle), or thermodynamic definitions (e.g., heat exchanger effectiveness between zero and one). During the execution phase of experiments, theoretical limits can be used to bound measurements. If individual observations exceed theoretical values, those points should be flagged and closely scrutinized to establish their validity and reliability.

3.3.2 Independent Checks Involving Mass and Energy Balances

In a number of cases, independent checks can be used to establish viability of data once the limit checks have been performed. Examples of independent checks include comparison of measured (or calculated) values with those of other investigators (reported in the published literature) and intra-experiment comparisons (based on component conservation principles) which involve collecting data and applying appropriate conservation principles as part of the validation procedures. The most commonly applied conservation principles used for independent checks include mass and energy balances on components, subsystems and systems. All independent checks should agree to within the range of expected uncertainty of the quantities being compared. An example of a heat balance conservation check as applied to vapor compression chillers is that the chiller cooling capacity and the compressor power should add up to the heat being rejected at the condenser.
Another sound practice is to design some amount of redundancy into the experimental design. This allows consistency and conservation checks to be performed. A simple example of a consistency check is during the measurement of, say, the pressure differences between the indoors and outdoors of a two-story residence. One could measure the pressure difference between the first floor and the outdoors and the second floor and the outdoors, and deduce the difference in pressure between both floors as the difference between both measurements. Redundant consistency checking would involve also measuring the first floor and second floor pressure difference and verifying whether the three measurements are consistent or not. Of course such checks would increase the cost of the instrumentation, and their need would depend on the specific circumstance.

3.3.3 Outlier Rejection by Visual Means

This phase is undertaken after limit checks and independent checks have been completed. Unless there is a definite reason for suspecting that a particular observation is invalid, indiscriminate outlier rejection is not advised.
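The limit checks and the chiller heat-balance check described above can be sketched as follows; the function names, tolerance and sample readings are illustrative assumptions, not values from the text:

```python
# Sketch of limit and independent (energy balance) checks; the function
# names, 5% tolerance and sample readings are hypothetical.

def check_psychrometric_limits(t_db, t_wb, t_dp, rh):
    """Flag a reading that violates the physical limits of item (a)."""
    flags = []
    if not (t_db >= t_wb >= t_dp):
        flags.append("dry bulb >= wet bulb >= dew point violated")
    if not (0.0 <= rh <= 100.0):
        flags.append("relative humidity outside 0-100%")
    return flags

def chiller_heat_balance_ok(q_evap, w_comp, q_cond, rel_tol=0.05):
    """Independent check of Sect. 3.3.2: cooling capacity plus compressor
    power should equal condenser heat rejection within the expected
    uncertainty (here an assumed 5% relative tolerance)."""
    return abs((q_evap + w_comp) - q_cond) <= rel_tol * q_cond

# Hypothetical readings
print(check_psychrometric_limits(25.0, 18.0, 12.0, 55.0))   # -> []
print(check_psychrometric_limits(25.0, 26.0, 12.0, 112.0))  # -> two flags
print(chiller_heat_balance_ok(350.0, 100.0, 460.0))         # -> True
```

A reading is flagged rather than silently discarded, in keeping with the advice above that flagged points be scrutinized before any purging.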
… more problematic than the case of data missing at random.
(iii) … variations (say, the diurnal variation of outdoor dry-bulb temperature). This approach works well when the missing data gaps are short and the process is sort of stable;
(iv) regression methods which use a regression model between the variable whose data is missing and other variables with complete data. Such regression models can be simple regression models or could be multivariate models depending on the specific circumstance. Many of the regression methods (including splines which are accurate especially for cases where data exhibits large sudden changes and which are described in Sect. 5.7.2) can be applied. However, the analyst should be cognizant of the fact that such a method of rehabilitation always poses the danger of introducing, sometimes subtle, biases in the final analysis results. The process of rehabilitation may have unintentionally given a structure or an interdependence which may not have existed in the phenomena or process.

Example 3.3.1 Example of interpolation.
Consider Table 3.1 showing the saturated water vapor pressure (P) against temperature (T). Let us assume that the mid-point (T = 58°C) is missing (see Fig. 3.12). The use of different interpolation methods to determine this point using two data points on either side of the missing point is illustrated below.

Table 3.1 Saturation water pressure with temperature
Temperature T (°C):  50      54      58        62      66
Pressure P (kPa):    12.335  15.002  (missing) 21.840  26.150

(a) Simple linear interpolation: Since the x-axis data are at equal intervals, one would estimate P(58°C) = (15.002 + 21.840)/2 = 18.421, which is 1.5% too high.
(b) Method of undetermined coefficients using a third order model: In this case, a more flexible functional form of the type P = a + b·T + c·T² + d·T³ is assumed, and using data from the four sets of points, the following four simultaneous equations need to be solved for the four coefficients:
12.335 = a + b·(50) + c·(50)² + d·(50)³
15.002 = a + b·(54) + c·(54)² + d·(54)³
21.840 = a + b·(62) + c·(62)² + d·(62)³
26.150 = a + b·(66) + c·(66)² + d·(66)³
Once the polynomial function is known, it can be used to predict the value of P at T = 58°C.
(c) The Gregory-Newton method takes the form:
y = a1 + a2·(x − x1) + a3·(x − x1)·(x − x2) + …
Substituting each set of data points in turn results in:
a1 = y1
a2 = (y2 − a1)/(x2 − x1)
a3 = [(y3 − a1) − a2·(x3 − x1)]/[(x3 − x1)·(x3 − x2)]
and so on. It is left to the reader to use these formulae and estimate the value of P at T = 58°C.
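A short sketch of methods (a) and (c) applied to the Table 3.1 data (a possible implementation, not the author's code; the Gregory-Newton coefficients are built up as divided differences, and the resulting cubic through all four points is the same polynomial as method (b)):

```python
# Interpolating the missing point of Table 3.1 (T = 58 deg C).
T = [50.0, 54.0, 62.0, 66.0]          # temperatures with known pressures
P = [12.335, 15.002, 21.840, 26.150]  # saturation pressures (kPa)

# (a) Simple linear interpolation between the two adjacent points
p_linear = (P[1] + P[2]) / 2.0        # equal 4-degree spacing on each side

# (c) Gregory-Newton form y = a1 + a2(x-x1) + a3(x-x1)(x-x2) + ...
def gregory_newton_coeffs(x, y):
    a = list(y)                       # divided-difference table, in place
    for j in range(1, len(x)):
        for i in range(len(x) - 1, j - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - j])
    return a

def gregory_newton_eval(x, a, xq):
    result, prod = 0.0, 1.0
    for i in range(len(a)):
        result += a[i] * prod
        prod *= (xq - x[i])
    return result

a = gregory_newton_coeffs(T, P)
p_cubic = gregory_newton_eval(T, a, 58.0)

print(round(p_linear, 3))  # 18.421
print(round(p_cubic, 3))   # 18.147 (the linear estimate is ~1.5% higher)
```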
Fig. 3.12 Saturation water pressure versus temperature (degrees C), showing the assumed missing point and the linear trend line joining the two adjacent points

Descriptive summary measures of sample data are meant to characterize salient statistical features of the data for easier reporting, understanding, comparison and evaluation. The following are some of the important ones:
(a) Mean (or arithmetic mean or average) of a set or sample of n numbers is:
x_mean ≡ x̄ = (1/n) · Σ_{i=1..n} x_i    (3.1)
(b) Weighted mean of a set of n numbers is:
x̄ = Σ_{i=1..n} (x_i·w_i) / Σ_{i=1..n} w_i    (3.2)
(k) The variance or the mean square error (MSE) of a set of n numbers is:
s_x² = (1/(n − 1)) · Σ_{i=1..n} (x_i − x̄)² = s_xx/(n − 1)    (3.6a)
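Equations 3.1, 3.2 and 3.6a can be written directly in code (a minimal sketch with made-up numbers):

```python
# Mean (Eq. 3.1), weighted mean (Eq. 3.2) and sample variance (Eq. 3.6a).
def mean(x):
    return sum(x) / len(x)

def weighted_mean(x, w):
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

def variance(x):
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)  # (n - 1) divisor

x = [2.0, 4.0, 6.0, 8.0]
w = [1.0, 1.0, 1.0, 3.0]    # hypothetical weights
print(mean(x))              # 5.0
print(weighted_mean(x, w))  # 6.0
print(variance(x))          # 6.666...
```

Note the (n − 1) divisor in Eq. 3.6a: this is the sample (not population) variance.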
Table 3.2 Values of the heat loss coefficient for 90 homes (Example 3.4.1)
2.97 4.00 5.20 5.56 5.94 5.98 6.35 6.62 6.72 6.78
6.80 6.85 6.94 7.15 7.16 7.23 7.29 7.62 7.62 7.69
7.73 7.87 7.93 8.00 8.26 8.29 8.37 8.47 8.54 8.58
8.61 8.67 8.69 8.81 9.07 9.27 9.37 9.43 9.52 9.58
9.60 9.76 9.82 9.83 9.83 9.84 9.96 10.04 10.21 10.28
10.28 10.30 10.35 10.36 10.40 10.49 10.50 10.64 10.95 11.09
11.12 11.21 11.29 11.43 11.62 11.70 11.70 12.16 12.19 12.28
12.31 12.62 12.69 12.71 12.91 12.92 13.11 13.38 13.42 13.43
13.47 13.60 13.96 14.24 14.35 15.12 15.24 16.06 16.90 18.26
… use of residences in their service territory. The annual heating consumption Q of a residence can be predicted as:
Q = U × A × DD
where U is the overall heat loss coefficient of the residence (includes heat conduction as well as air infiltration, …) and A is the house floor area.
Based on gas bills, a certain electric company calculated the U value of 90 homes in their service territory in an effort to determine which homes were "leaky", and hence are good candidates for weather stripping so as to reduce their energy use. These values (in units which need not concern us here) are given in Table 3.2.
An exploratory data analysis would involve generating the types of pertinent summary statistics or descriptive measures given in Table 3.3. Note that no value is given for "Mode" since there are several possible values in the table. What can one say about the variability in the data? If all homes whose U values are greater than twice the mean value are targeted for further action, how many such homes are there? Such questions and answers are left to the reader to explore.

Table 3.3 Summary statistics for values of the heat loss coefficient (Example 3.4.1)
Count                 90
Average               10.0384
Median                9.835
Mode                  —
Geometric mean        9.60826
5% Trimmed mean       9.98444
Variance              8.22537
Standard deviation    2.86799
Coeff. of variation   28.5701%
Minimum               2.97
Maximum               18.26
Range                 15.29
Lower quartile        7.93
Upper quartile        12.16
Interquartile range   4.23

3.4.2 Covariance and Pearson Correlation Coefficient

Though a scatter plot of bivariate numerical data gives a good visual indication of how strongly variables x and y vary together, a quantitative measure is needed. This is provided by the covariance which represents the strength of the linear relationship between the two variables:
cov(xy) = (1/(n − 1)) · Σ_{i=1..n} (x_i − x̄)·(y_i − ȳ)    (3.9)
where x̄ and ȳ are the mean values of variables x and y.
To remove the effect of magnitude in the variation of x and y, the Pearson correlation coefficient r is probably more meaningful than the covariance since it standardizes the deviations of x and y by their standard deviations:
r = cov(xy)/(s_x·s_y)    (3.10)
where s_x and s_y are the standard deviations of x and y. Hence the absolute value of r is less than or equal to unity. abs(r) = 1 implies that all the points lie on a straight line, while r = 0 implies no linear correlation between x and y. It is pertinent to point out that for linear models r² = R² (the well-known coefficient of determination used in regression and discussed in Sect. 5.3.2), the use of lower case and upper case to denote the same quantity being a historic dichotomy.
Figure 3.13 illustrates how different data scatter affects the magnitude and sign of r. Note that a few extreme points may exert undue influence on r, especially when data sets are small. As a general thumb rule¹, for applications involving engineering data where the random uncertainties are low:
abs(r) > 0.9 → strong linear correlation
0.7 < abs(r) < 0.9 → moderate linear correlation    (3.11)
abs(r) < 0.7 → weak linear correlation

¹ A more statistically sound procedure is described in Sect. 4.2.7 which allows one to ascertain whether observed correlation coefficients are significant or not.
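Equations 3.9 and 3.10 in code form (a sketch using hypothetical spring-type load/extension readings, not the values of Table 3.4):

```python
# Covariance (Eq. 3.9) and Pearson correlation coefficient (Eq. 3.10).
from math import sqrt

def covariance(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # cov(x, x) is simply the variance, so its square root is s_x
    return covariance(x, y) / (sqrt(covariance(x, x)) * sqrt(covariance(y, y)))

# Hypothetical load (N) and extension (mm) readings of a nearly linear spring
load = [2.0, 4.0, 6.0, 8.0, 10.0]
ext = [9.8, 20.1, 29.7, 40.3, 50.0]
r = pearson_r(load, ext)
print(round(r, 4))  # close to 1: a strong linear correlation per Eq. 3.11
```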
Example 3.4.2 The following observations are taken of the extension of a spring under different loads (Table 3.4). The standard deviations of load and extension are 3.7417 and 18.2978 respectively, while the correlation coefficient = 0.9979. This indicates a very strong positive correlation between the two variables, as one should expect.

3.4.3 Data Transformations

Once the above validation checks have been completed, the raw data can be transformed to one on which subsequent analyses are performed. One such transformation is the standardized variable z = (x − x̄)/s_x, where x̄ and s_x are the mean and standard deviation respectively of the x variable.

3.5 Plotting Data

Graphs serve two purposes. During exploration of the data, they provide a better means of assimilating broad qualitative trend behavior of the data than can be provided by tabular data. Second, they provide an excellent manner of communicating to the reader what the author wishes to state or illustrate (recall the adage "a picture is worth a thousand words").
Hence, they can serve as mediums to communicate information, not just to explore data trends (an excellent reference is Tufte 2001). However, it is important to be clear as to the intended message or purpose of the graph, and also to tailor it so as to be suitable for the intended audience's background and understanding. A pretty graph may be visually appealing but may obfuscate rather than clarify or highlight the necessary aspects being communicated. For example, unless one is experienced, it is difficult to read numerical values off of 3-D graphs. Thus, graphs should present data clearly and accurately without hiding or distorting the underlying intent. Table 3.5 provides a succinct summary of graph formats appropriate for different applications.
Graphical methods are recommended after the numerical screening phase is complete since they can point out unflagged data errors. Historically, the strength of a graphical analysis was to visually point out to the analyst relationships (linear or non-linear) between two or more variables in instances when a sound physical understanding is lacking, thereby aiding in the selection of the appropriate regression model. Present day graphical visualization tools allow much more than this simple objective, some of which will become apparent below. There are a very large number of graphical ways of presenting data, and it is impossible to cover them all. Only a small representative set of commonly used plots will be presented below, while operating manuals of several high-end graphical software programs describe complex, and sometimes esoteric, plots which can be generated by their software.

Table 3.5 Type and function of graph message determines format. (Downloaded from http://www.eia.doe.gov/neic/graphs/introduc.htm)
Type of message | Function | Typical format
Component | Shows relative size of various parts of a whole | Pie chart (for 1 or 2 important components); Bar chart; Dot chart; Line chart
Relative amounts | Ranks items according to size, impact, degree, etc. | Bar chart; Line chart; Dot chart
Time series | Shows variation over time | Bar chart (for few intervals); Line chart
Frequency | Shows frequency of distribution among certain intervals | Histogram; Line chart; Box-and-Whisker
Correlation | Shows how changes in one set of data is related to another set of data | Paired bar; Line chart; Scatter diagram

3.5.1 Static Graphical Plots

Graphical representations of data are the backbone of exploratory data analysis. They are usually limited to one-, two- and three-dimensional data. In the last few decades, there has been a dramatic increase in the types of graphical displays largely due to the seminal contributions of Tukey (1988), Cleveland (1985) and Tufte (1990, 2001). A particular graph is selected based on its ability to emphasize certain characteristics or behavior of one-dimensional data, or to indicate relations between two- and three-dimensional data. A simple manner of separating these characteristics is to view them as being:
(i) cross-sectional (i.e., the sequence in which the data has been collected is not retained),
(ii) time series data,
(iii) hybrid or combined, and
(iv) relational (i.e., emphasizing the joint variation of two or more variables).
An emphasis on visualizing data to be analyzed has resulted in statistical software programs becoming increasingly convenient to use and powerful towards this end. Any data analysis effort involving univariate and bivariate data should start by looking at basic plots (higher dimension data require more elaborate plots discussed later).

(a) For univariate data:
Commonly used graphics for cross-sectional representation are mean and standard deviation, stem-and-leaf, histograms, box-whisker-mean plots, distribution plots, bar charts, pie charts, area charts, and quantile plots. Mean and standard deviation plots summarize the data distribution using the two most basic measures; however, this manner is of limited use (and even misleading) when the distribution is skewed. For univariate data, plotting of histograms is very useful since they provide insight into the underlying parent distribution of data dispersion, and can flag outliers as well. There are no hard and fast rules of how to select the number of bins (Nbins) or classes in case of continuous data, probably because there is no theoretical basis. Generally, the larger the number of observations n, the more classes can be used, though as a guide it should be between 5 and 20. Devore and Farnum (2005) suggest:
Number of bins or classes = Nbins = (n)^1/2    (3.14)
which would suggest that if n = 100, Nbins = 10.
Doebelin (1995) proposes another equation:
Nbins = 1.87·(n − 1)^0.4    (3.15)
which would suggest that if n = 100, Nbins = 12.
The box and whisker plots also summarize the distribution, but at different percentiles (see Fig. 3.14). The lower and upper box values (or hinges) correspond to the 25th and 75th percentiles (i.e., the interquartile range (IQR) defined
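The two bin-count rules of Eqs. 3.14 and 3.15 can be sketched as (rounding to the nearest integer is an assumption on my part; the text only gives the formulas):

```python
# Histogram bin-count rules: Eq. 3.14 (square root) and Eq. 3.15 (Doebelin).
def nbins_sqrt(n):
    return round(n ** 0.5)

def nbins_doebelin(n):
    return round(1.87 * (n - 1) ** 0.4)

print(nbins_sqrt(100))      # 10
print(nbins_doebelin(100))  # 12
```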
[Fig. 3.14 Box-whisker percentiles referred to the normal distribution: the box spans the central 50% of the observations with 24.65% beyond on each side; for comparison, ±1σ spans 68.27% with 15.73% in each tail]
in Sect. 3.4.1) while the whiskers extend to 1.5 times the IQR on either side. These allow outliers to be detected. Any observation farther than (3.0 × IQR) from the closest quartile is taken to be an extreme outlier, while if farther than (1.5 × IQR), it is considered to be a mild outlier.
Though plotting a box-and-whisker plot or a plot of the distribution itself can suggest the shape of the underlying distribution, a better visual manner of ascertaining whether a presumed underlying parent distribution applies to the data being analyzed is to plot a quantile plot (also called the probability plot). The observations are plotted against the parent distribution (which could be any of the standard probability distributions presented in Sect. 2.4), and if the points fall on a straight line, this suggests that the assumed distribution is plausible. The example below illustrates this concept.

Example 3.5.1 An instructor wishes to ascertain whether the time taken by his students to complete the final exam follows a normal or Gaussian distribution. The values in minutes shown in Table 3.6 have been recorded.

Table 3.6 Values of time taken (in minutes) for 20 students to complete an exam
37.0 37.5 38.1 40.0 40.2 40.8 41.0
42.0 43.1 43.9 44.1 44.6 45.0 46.1
47.0 62.0 64.3 68.8 70.1 74.5

The quantile plot for this data assuming the parent distribution to be Gaussian is shown in Fig. 3.15. The pattern is obviously nonlinear, so a Gaussian distribution is implausible for this data. The apparent break appearing in the data on the right side of the graph is indicative of data that contains outliers (caused by five students taking much longer to complete the exam).

Example 3.5.2 Consider the same data set as for Example 3.4.1. The following plots have been generated (Fig. 3.16):
(b) Box and whisker plot
(c) Histogram of data (assuming 9 bins)
(d) Normal probability plots
(e) Run chart
It is left to the reader to identify and briefly state his observations regarding this data set. Note that the run chart is meant to retain the time series nature of the data while the other graphics do not. The manner in which the run chart has been generated is meaningless since the data seems to have been entered into the spreadsheet in the wrong sequence, with data entered column-wise instead of row-wise. The run chart, had the data been entered correctly, would have
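The mild/extreme outlier rule quoted above can be sketched as follows; the quartiles are computed here as medians of the lower and upper halves, which is one of several common conventions, and the sample data are made up:

```python
# Classify outliers by distance from the nearest quartile:
# beyond 1.5 x IQR -> mild outlier, beyond 3.0 x IQR -> extreme outlier.
def median(v):
    v = sorted(v)
    n = len(v)
    mid = n // 2
    return v[mid] if n % 2 else (v[mid - 1] + v[mid]) / 2.0

def quartiles(v):
    v = sorted(v)
    n = len(v)
    half = n // 2
    return median(v[:half]), median(v[n - half:])  # Q1, Q3 (median-of-halves)

def classify_outliers(v):
    q1, q3 = quartiles(v)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in v:
        d = max(q1 - x, x - q3, 0.0)   # distance from the closest quartile
        if d > 3.0 * iqr:
            extreme.append(x)
        elif d > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

mild, extreme = classify_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(mild, extreme)  # [] [100]
```

Note that heavily outlier-contaminated samples (such as the exam times of Table 3.6) can inflate the IQR itself, which is one reason visual inspection is recommended alongside such rules.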
[Fig. 3.16 Exploratory plots of the U value data of Example 3.4.1: histogram of U values, normal probability plot (percentage versus U value) and run chart (U value versus observation number)]
[Figure residue: world population (in billions)]
Often values of physical variables need to be plotted against two physical variables. One example is the well-known
Fig. 3.22 Several combination charts are possible. The plots shown allow visual comparison of the standardized (subtracted by the mean and divided by the standard deviation) hourly whole-house electricity use, Z*(r,d), in a large number of residences on the hottest and coolest days (Stratum 5) against the standard normal curve. (From Reddy 1990)

Fig. 3.23 Scatter plot combined with box-whisker-mean (BWM) plot of the same data as shown in Fig. 3.11. (From Haberl and Abbas (1998))

… there are twice as many graphs as needed minimally (since each graph has another one with the axis interchanged), the redundancy is sometimes useful to the analyst in better detecting underlying trends.

3.5.2 High-Interaction Graphical Methods

The above types of plots can be generated by relatively low-end data analysis software programs. More specialized software programs called data visualization software are available which provide much greater insights into data trends, outliers and local behavior, especially when large amounts of data are being considered. Animation has also been used to advantage in understanding system behavior from monitored
Fig. 3.24 Example of a combined box-whisker-component plot depicting how hourly energy use varies with hour of day during a year for different outdoor temperature bins for a large commercial building (panels: weekday temperatures < 45°F, 45–75°F and > 75°F; measured kWh/h versus time of day). (From ASHRAE 2002 © American Society of Heating, Refrigerating and Air-conditioning Engineers, Inc., www.ashrae.org)
data since time sequence can be retained due to, say, seasonal
differences. Animated scatter plots of the x and y variables, as well as animated contour plots with color superimposed, which can provide better visual diagnostics, have also been developed.
More sophisticated software is available which, however,
requires higher user skill. Glaser and Ubbelohde (2001)
describe novel high performance visualization techniques
for reviewing time dependent data common to building
energy simulation program output. Some of these techniques
include: (i) brushing and linking where the user can investi-
gate the behavior during a few days of the year, (ii) tessel-
lating a 2-D chart into multiple smaller 2-D charts giving a
4-D view of the data such that a single value of a representa-
tive sensor can be evenly divided into smaller spatial plots
arranged by time of day, (iii) magic lenses which can zoom
into a certain portion of the room, and (iv) magic brushes.
These techniques enable rapid inspection of trends and sin-
gularities which cannot be gleaned from conventional view-
ing methods.
[Figure: measured hourly electricity use (kWh/h) over the year, 1 APR to 31 DEC. © American Society of Heating, Refrigerating and Air-conditioning Engineers, Inc., www.ashrae.org]
… point the cause of the anomalies. The experimenter is often …
… a bad point at all, but merely the beginning of a new portion of the curve (say, the onset of turbulence in an experiment involving laminar flow). Similarly, even point C may be valid and important. Hence, the only way to remove this ambiguity is to take more observations at the lower end. Thus, a modification of the statistical rejection criterion is that one should do so only if the points to be rejected are center points.
Several advanced books present formal analytical treatment of outliers which allow diagnosing whether the regressor data set is ill-conditioned or not, as well as identifying and rejecting, if needed, the necessary outliers that cause ill-conditioning (for example, Belsley et al. 1980). Consider Fig. 3.31a. The outlier point will have little or no influence on the regression parameters identified, and in fact retaining it would be beneficial since it would lead to a reduction in model parameter variance. The behavior shown in Fig. 3.31b is more troublesome because the estimated slope is almost wholly determined by the extreme point. In fact, one may view this situation as a data set with only two data points, or one may view the single point as a spurious point and remove it from the analysis. Gathering more data at that range would be advisable, but may not be feasible; this is where the judgment of the analyst or prior information about the underlying trend line is useful. How and the extent to which each of the data points will affect the outcome of the regression line will determine whether that particular point is an influence point or not. This aspect is treated more formally in Sect. 5.6.2.
[Figure: scatter plots of a response variable y against a regressor variable x. Fig. 3.30 marks points A, B and C among the data; Fig. 3.31 shows two panels (a) and (b), each with a single outlying point.]

Fig. 3.30 Illustrating different types of outliers. Point A is very probably a doubtful point; point B might be bad but could potentially be a very important point in terms of revealing unexpected behavior; point C is close enough to the general trend and should be retained until more data is collected.

Fig. 3.31 Two other examples of outlier points. While the outlier point in (a) is most probably a valid point, it is not clear for the outlier point in (b). Either more data has to be collected, failing which it is advisable to delete this data from any subsequent analysis. (From Belsley et al. (1980) by permission of John Wiley and Sons)
3 Data Collection and Preliminary Data Analysis
[Figure: frequency distributions of a measured parameter relative to the true value, illustrating four cases: (a) unbiased and precise, (b) biased and precise, (c) unbiased and imprecise, (d) biased and imprecise.]
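These four cases can be reproduced with synthetic readings: a fixed offset shifts the center of the measurement distribution (bias), while random noise widens it (imprecision). A brief sketch with arbitrary values chosen for the true value, bias, and noise level:

```python
import random
import statistics

random.seed(1)
TRUE_VALUE = 100.0

def take_readings(n, bias, noise_sd):
    """Simulate n readings subject to a fixed (bias) error and a
    random (precision) error."""
    return [TRUE_VALUE + bias + random.gauss(0, noise_sd) for _ in range(n)]

biased_precise = take_readings(1000, bias=5.0, noise_sd=0.5)
offset = statistics.mean(biased_precise) - TRUE_VALUE  # estimates the bias (~5)
scatter = statistics.stdev(biased_precise)             # estimates the noise (~0.5)
```

The sample mean exposes the fixed error while the sample standard deviation reflects only the random error, mirroring panels (b) versus (c) above.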
(measurements by one person using a single instrument). For the majority of engineering cases, it is impractical and too costly to perform a true multi-sample experiment. While, strictly speaking, merely taking repeated readings with the same procedure and equipment does not provide multi-sample results, such a procedure is often accepted by the engineering community as a fair approximation of a multi-sample experiment.
Depending upon the sample size of the data (greater or less than about 30 samples), different statistical considerations and equations apply. The issue of estimating confidence levels is further discussed in Chap. 4, but operational equations are presented below. These levels or limits are directly based on the Gaussian and the Student-t distributions presented in Sect. 2.4.3a and b.
(a) Random uncertainty in large samples (n > about 30): The best estimate of a variable x is usually its mean value given by x̄. The limits of the confidence interval are determined from the sample standard deviation sx. The typical procedure is then to assume that the individual data values are scattered about the mean following a certain probability distribution function, within (±z·sx) of the mean. Usually a normal probability curve (Gaussian distribution) is assumed to represent the dispersion in experimental data, unless the process is known to follow one of the standard distributions (discussed in Sect. 2.4). For a normal distribution, the standard deviation indicates the degree of dispersion of the values about the mean (see Table A3). For z = 1.96, 95% of the data will be within (±1.96sx) of the mean. Thus, the z multiplier has a direct relationship with the confidence level selected (assuming a known probability distribution). The confidence interval (CL) for the mean of n multi-sample random data points, i.e., data which do not have any fixed error, is:

x̄min = x̄ − (z·sx/√n) and x̄max = x̄ + (z·sx/√n)   (3.16)

(b) Random uncertainty in small samples (n < about 30): In many circumstances, the analyst will not be able to collect a large number of data points, and may be limited to a data set of less than 30 values (n < 30). Under such conditions, the mean value and the standard deviation are computed as before. The z value applicable for the normal distribution cannot be used for small samples. The new values, called t-values, are tabulated for different degrees of freedom d.f. (ν = n − 1) and for the acceptable degree of confidence (see Table A4).⁵ The confidence interval for the mean value of x, when no fixed (bias) errors are present in the measurements, is given by:

x̄min = x̄ − (t·sx/√n) and x̄max = x̄ + (t·sx/√n)   (3.17)

For example, consider the case of d.f. = 10 and two-tailed significance level α = 0.05. One finds from Table A4 that t = 2.228 for 95% CL. Note that this decreases to t = 2.086 for d.f. = 20 and reaches the z value of 1.96 for d.f. = ∞.

Example 3.6.1 Estimating confidence intervals
(a) The length of a field is measured 50 times. The mean is 30 with a standard deviation of 3. Determine the 95% CL.
This is a large sample case, for which the z multiplier is 1.96. Hence, the 95% CL are

= 30 ± (1.96)·(3)/(50)^1/2 = 30 ± 0.83 = {29.17, 30.83}

(b) Only 21 measurements are taken and the same mean and standard deviation as in (a) are found. Determine the 95% CL.
This is a small sample case for which the t-value = 2.086 for d.f. = 20. Then, the 95% CL will turn out to be wider:

30 ± (2.086)·(3)/(21)^1/2 = 30 ± 1.37 = {28.63, 31.37}

3.6.4 Bias Uncertainty

Estimating the bias or fixed error at a specified confidence level (say, 95% confidence) is described below. The fixed error BX for a given value x is assumed to be a single value drawn from some larger distribution of possible fixed errors. The treatment is similar to that of random errors with the major difference that only one value is considered even though several observations may be taken. Lacking further knowledge, a normal distribution is usually assumed. Hence, if a manufacturer specifies that the fixed uncertainty BX is ±1.0°C with 95% confidence (compared to some standard reference device), then one assumes that the fixed error belongs to a larger distribution (taken to be Gaussian) with a standard deviation SB = 0.5°C (since the corresponding z-value ≈ 2.0).

3.6.5 Overall Uncertainty

The overall uncertainty of a measured variable x has to combine the random and bias uncertainty estimates. Though several forms of this expression appear in different texts, a convenient working formulation is as follows:

Ux = [Bx² + (t·sx/√n)²]^1/2   (3.18)

⁵ Table A4 applies to critical values for one-tailed distributions, while most of the discussion here applies to the two-tailed case. See Sect. 4.2.2 for the distinction between both.
where:
Ux = overall uncertainty in the value x at a specified confidence level
Bx = uncertainty in the bias or fixed component at the specified confidence level
sx = standard deviation estimate for the random component
n = sample size
t = t-value at the specified confidence level for the appropriate degrees of freedom

Example 3.6.2: For a single measurement, the statistical concept of standard deviation does not apply. Nonetheless, one could estimate it from manufacturer's specifications if available. It is desired to estimate the overall uncertainty at 95% confidence level in an individual measurement of water flow rate in a pipe under the following conditions:
(a) full scale meter reading 150 L/s
(b) actual flow reading 125 L/s
(c) random error of instrument is ±6% of full-scale reading at 95% CL
(d) fixed (bias) error of instrument is ±4% of full-scale reading at 95% CL
The solution is rather simple since all stated uncertainties are at 95% CL. It is implicitly assumed that the normal distribution applies. The random error = 150 × 0.06 = ±9 L/s. The fixed error = 150 × 0.04 = ±6 L/s. The overall uncertainty can be estimated from Eq. 3.18 with n = 1:

Ux = (6² + 9²)^1/2 = ±10.82 L/s

The fractional overall uncertainty at 95% CL = Ux/x = 10.82/125 = 0.087 = 8.7%

Example 3.6.3: Consider Example 3.6.2. In an effort to reduce the overall uncertainty, 25 readings of the flow are taken instead of only one reading. The resulting uncertainty in this case is determined as follows.
The bias error remains unchanged at ±6 L/s. The random error decreases by a factor of √n to 9/(25)^1/2 = ±1.8 L/s. The overall uncertainty is thus:

Ux = (6² + 1.8²)^1/2 = ±6.26 L/s

The fractional overall uncertainty at 95% confidence level = Ux/x = 6.26/125 = 0.05 = 5.0%
Increasing the number of readings from 1 to 25 reduces the relative uncertainty in the flow measurement from ±8.7% to ±5.0%. Because of the large fixed error, further increase in the number of readings would result in only a small reduction in the overall uncertainty.

Example 3.6.4: A flow meter manufacturer stipulates a random error of 5% for his meter at 95.5% CL (i.e., at z = 2). Once installed, the engineer estimates that the bias error due to the placement of the meter in the flow circuit is 2% at 95.5% CL. The flow meter takes a reading every minute, but only the mean value of 15 such measurements is recorded once every 15 min. Estimate the overall uncertainty at 99% CL of the mean of the recorded values.
The bias uncertainty can be associated with the normal tables. From Table A3, z = 2.575 has an associated probability of 0.01 which corresponds to the 99% CL. Since 95.5% CL corresponds to z = 2, the bias uncertainty at one standard deviation = 2/2 = 1%.
Since the number of observations is less than 30, the student-t table has to be used for the random uncertainty component. From Table A4, the critical t-value for d.f. = 15 − 1 = 14 and significance level of 0.01 is equal to 2.977. Also, the random uncertainty at one standard deviation = 5.0/2 = 2.5%.
Hence, the overall uncertainty of the recorded values at 99% CL

Ux = {[(2.575)·1]² + [(2.977)·(2.5)/(15)^1/2]²}^1/2 = 0.0322 = 3.22%

3.6.6 Chauvenet's Statistical Criterion of Data Rejection

The statistical considerations described above can lead to analytical screening methods which can point out data errors not flagged by graphical methods alone. Though several types of rejection criteria can be formulated, perhaps the best known is Chauvenet's criterion. This criterion, which presumes that the errors are normally distributed and have constant variance, specifies that any reading out of a series of n readings shall be rejected if the magnitude of its deviation dmax from the mean value of the series is such that the probability of occurrence of such a deviation is less than (1/2n). The limiting deviation ratio is given by:

dmax/sx = 0.819 + 0.544·ln(n) − 0.02346·[ln(n)]²   (3.19)

where sx is the standard deviation of the series and n is the number of data points. The deviation ratio for different numbers of readings is given in Table 3.7. For example, if one takes 10 observations, an observation shall be discarded if its deviation from the mean is dmax ≥ (1.96)sx.
This data rejection should be done only once, and more than one round of elimination using the Chauvenet criterion is not advised. Note that the Chauvenet criterion has inherent assumptions which may not be justified. For example, the
underlying distribution may not be normal, but could have a longer tail. In such a case, one may be throwing out good data. A more scientific manner of dealing with outliers which also yields similar results is to use weighted regression or robust regression, where observations farther away from the mean are given less weight than those from the center (see Sect. 5.6 and 9.6.1 respectively).

Table 3.7 Table for Chauvenet's criterion of rejecting outliers following Eq. 3.19

Number of readings n | Deviation ratio dmax/sx
2 | 1.15
3 | 1.38
4 | 1.54
5 | 1.65
6 | 1.73
7 | 1.80
10 | 1.96
15 | 2.13
20 | 2.24
25 | 2.33
30 | 2.39
50 | 2.57
100 | 2.81
300 | 3.14
500 | 3.29
1000 | 3.48

3.7 Propagation of Errors

In many cases, the variable of interest is not directly measured, but values of several associated variables are measured, which are then combined using a data reduction equation to obtain the value of the desired result. The objective of this section is to present the methodology to estimate overall uncertainty from knowledge of the uncertainties in the individual variables. The random and fixed components, which together constitute the overall uncertainty, have to be estimated separately. The treatment that follows, though limited to random errors, could also apply to bias errors.

3.7.1 Taylor Series Method for Cross-Sectional Data

In general, the standard deviation of a function y = y(x1, x2, …, xn), whose independently measured variables are all given with the same confidence level, is obtained by the first order expansion of the Taylor series:

sy = [Σ(i=1..n) ((∂y/∂xi)·sx,i)²]^1/2   (3.20)

where:
sy = function standard deviation
sx,i = standard deviation of the measured quantity xi

Neglecting terms higher than the first order (as implied by a first order Taylor series expansion), the propagation equations for some of the basic operations are given below. Let x1 and x2 have standard deviations sx1 and sx2. Then:

Addition or subtraction: y = x1 ± x2 and sy = (sx1² + sx2²)^1/2   (3.21)

Multiplication: y = x1·x2 and sy = (x1·x2)·[(sx1/x1)² + (sx2/x2)²]^1/2   (3.22)

Division: y = x1/x2 and sy = (x1/x2)·[(sx1/x1)² + (sx2/x2)²]^1/2   (3.23)

For multiplication and division, the fractional error is given by the same expression. If y = x1·x2/x3, then the fractional standard deviation:

sy/y = [(sx1/x1)² + (sx2/x2)² + (sx3/x3)²]^1/2   (3.24)

The uncertainty in the result depends on the squares of the uncertainties in the independent variables. This means that if the uncertainty in one variable is larger than the uncertainties in the other variables, then it is the largest uncertainty that dominates. To illustrate, suppose there are three variables with an uncertainty of magnitude 1 and one variable with an uncertainty of magnitude 5. The uncertainty in the result would be (5² + 1² + 1² + 1²)^0.5 = (28)^0.5 = 5.29. Clearly, the effect of the largest uncertainty dominates the others.
An analysis involving the relative magnitude of uncertainties plays an important role during the design of an experiment and the procurement of instrumentation. Very little is gained by trying to reduce the "small" uncertainties since it is the "large" ones that dominate. Any improvement in the overall experimental result must be achieved by improving the instrumentation or experimental technique connected with these relatively large uncertainties. This concept is illustrated in Example 3.7.2 below.
Equation 3.20 applies when the measured variables are uncorrelated. If they are correlated, their interdependence can be quantified by the covariance (defined by Eq. 3.9). If two variables x1 and x2 are correlated, then the standard deviation of their sum is given by:

sy = [sx1² + sx2² + 2·rx1x2·sx1·sx2]^1/2   (3.25)
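The propagation rules of Eqs. 3.21–3.24, and the dominance of the largest uncertainty, are straightforward to verify numerically; the helper functions below are a sketch, not part of the text:

```python
import math

def sd_sum(s1, s2):
    """Eq. 3.21: standard deviation of x1 +/- x2 (uncorrelated)."""
    return math.hypot(s1, s2)

def frac_sd_product(*frac_sds):
    """Eqs. 3.22-3.24: the fractional sd of a product or quotient is the
    root-sum-square of the fractional sds of the factors."""
    return math.sqrt(sum(f * f for f in frac_sds))

# dominance example from the text: one uncertainty of 5 and three of 1
total = math.sqrt(5**2 + 1**2 + 1**2 + 1**2)   # (28)**0.5, about 5.29
```

Because the terms combine as squares, shrinking any of the unit uncertainties barely changes the total, which is the design lesson drawn in the text.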
Table 3.8 Error table of the four quantities that define the Reynolds number (Example 3.7.2)

Quantity | Minimum flow | Maximum flow | Random error at full flow (95% CL) | % errorᵃ at minimum | % errorᵃ at maximum
Velocity V (m/s) | 1 | 20 | 0.1 | 10 | 0.5
Pipe diameter d (m) | 0.2 | 0.2 | 0 | 0 | 0
Density ρ (kg/m³) | 1000 | 1000 | 1 | 0.1 | 0.1
Viscosity μ (kg/m-s) | 1.12 × 10⁻³ | 1.12 × 10⁻³ | 0.45 × 10⁻⁵ | 0.4 | 0.4

ᵃ Note that the last two columns under "% error" are computed from the previous three columns of data
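Although the full statement of Example 3.7.2 is not reproduced here, Table 3.8 contains everything needed to apply Eq. 3.24 to the Reynolds number Re = V·d·ρ/μ. A sketch of the root-sum-square combination at both flow extremes:

```python
import math

def frac_uncertainty(*fracs):
    """Eq. 3.24: root-sum-square of fractional uncertainties."""
    return math.sqrt(sum(f * f for f in fracs))

# fractional errors (95% CL) from Table 3.8: velocity, diameter, density, viscosity
re_min_flow = frac_uncertainty(0.10, 0.0, 0.001, 0.004)    # about 0.100
re_max_flow = frac_uncertainty(0.005, 0.0, 0.001, 0.004)   # about 0.0065
```

At minimum flow the 10% velocity error dominates completely, so reducing the density or viscosity errors would gain almost nothing.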
3. The random (precision) error at 95% CL for the type of commercial grade sensor to be used for temperature measurement is 0.2°F. Consequently, the error in the measurement of the temperature difference ∆T = (0.2² + 0.2²)^1/2 = 0.28°F. From manufacturer catalogs, the temperature difference between supply and return chilled water temperatures at full load can be assumed to be 10°F. The fractional uncertainty at full load is then

(U∆T/∆T)² = (0.28/10)² = 0.00078 and (U∆T/∆T) = ±0.028

4. Propagation of the above errors yields the fractional uncertainty at 95% CL at full chiller load of the measured COP:

UCOP/COP = (0.0031 + 0.00012 + 0.00078)^1/2 = 0.063 = 6.3%

It is clear that the fractional uncertainty of the proposed instrumentation is not satisfactory for the intended purpose. The logical remedy is to select a more accurate flow meter or one with a lower maximum flow reading.

Example 3.7.4: Uncertainty in exponential growth models
Exponential growth models are used to model several commonly encountered phenomena, from population growth to consumption of resources. The amount of resource consumed over time Q(t) can be modeled as:

Q(t) = ∫₀ᵗ P0·e^(rt)·dt = (P0/r)·(e^(rt) − 1)   (3.32a)

where P0 = initial consumption rate, and r = exponential rate of growth.
The world coal consumption in 1986 was equal to 5.0 billion (short) tons and the estimated recoverable reserves of coal were estimated at 1000 billion tons.
(a) If the growth rate is assumed to be 2.7% per year, how many years will it take for the total coal reserves to be depleted?
Rearranging Eq. 3.32a results in

t = (1/r)·ln[1 + (Q·r/P0)]   (3.32b)

or

t = (1/0.027)·ln[1 + (1000)(0.027)/5] = 68.75 years

(b) Assume that the growth rate r and the recoverable reserves are subject to random uncertainty. If the uncertainties of both quantities are taken to be normal with one standard deviation values of 0.2% (absolute) and 10% (relative) respectively, determine the lower and upper estimates of the years to depletion at the 95% confidence level.
Though the partial derivatives can be derived analytically, the use of Eq. 3.26 will be illustrated so as to compute them numerically. Let us use Eq. 3.32b with a perturbation multiplier of 1% applied to both the base values of r (= 0.027) and of Q (= 1000). The pertinent results are assembled in Table 3.9.

Table 3.9 Numerical computation of the partial derivatives of t with Q and r

Multiplier | r (assuming Q = 1000) | t (from Eq. 3.32b) | Q (assuming r = 0.027) | t (from Eq. 3.32b)
0.99 | 0.02673 | 69.12924 | 990 | 68.43795
1.00 | 0.02700 | 68.75178 | 1000 | 68.75178
1.01 | 0.02727 | 68.37917 | 1010 | 69.06297

From here:

∂t/∂r = (68.37917 − 69.12924)/(0.02727 − 0.02673) = −1389 and
∂t/∂Q = (69.06297 − 68.43795)/(1010 − 990) = 0.03125

Then:

st = [(∂t/∂r · sr)² + (∂t/∂Q · sQ)²]^1/2
= {[(−1389)(0.002)]² + [(0.03125)(0.1)(1000)]²}^1/2
= (2.778² + 3.125²)^1/2 = 4.181

Thus, the lower and upper limits at the 95% CL (with z = 1.96) are

= 68.75 ± (1.96)(4.181) = {60.55, 76.94} years

The analyst should repeat the above procedure with, say, a perturbation multiplier of 2% in order to evaluate the stability of the numerically derived partial derivatives. If these differ substantially, it is urged that the function be plotted and scrutinized for irregular behavior around the point of interest.

3.7.2 Taylor Series Method for Time Series Data

Uncertainty in time series data differs from that of stationary data in two regards:
(a) the uncertainty in the dependent variable yt at a given time t depends on the uncertainty at the previous time yt−1, and thus, uncertainty compounds over consecutive time steps, i.e., over time; and
(b) some or all of the independent variables x may be cross-correlated, i.e., they have a tendency to either increase or decrease in unison.
The effect of both these factors is to increase the uncertainty as compared to stationary data (i.e., data without time-wise behavior). Consider the function shown below:

y = f(x1, x2, x3)   (3.33)

Following Eq. 3.25, the equation for the propagation of random errors for a data reduction function with variables that exhibit cross correlation (case (b) above) is given by:

Uy² = [(Ux1·SCx1)² + (Ux2·SCx2)² + (Ux3·SCx3)²
+ 2·rx1x2·SCx1·SCx2·Ux1·Ux2
+ 2·rx1x3·SCx1·SCx3·Ux1·Ux3
+ 2·rx2x3·SCx2·SCx3·Ux2·Ux3]   (3.34)

where
Uxi is the uncertainty of variable xi,
SCxi = the sensitivity coefficient of y to variable xi = ∂y/∂xi, and
rxixj = correlation coefficient between variables xi and xj.

Example 3.7.5: Temporal propagation of uncertainty in ice storage inventory
The concept of propagation of errors can be illustrated with time-wise data for an ice storage system. Figure 3.35 is a schematic of a typical cooling system comprising an upstream chiller charging a series of ice tanks. The flow to these tanks can be modulated by means of a three-way valve when partial charging or discharging is to be achieved. The building loads loop also has its dedicated pump and three-way valve. It is common practice to charge and discharge the tanks uniformly. Thus, they can be considered to be one large consolidated tank for analysis purposes. The inventory of the tank is the cooling capacity available at any given time, and is an important quantity for the system operator.

[Fig. 3.35 Schematic layout of a cool storage system with the chiller located upstream of the storage tanks for Example 3.7.5. (From Dorgan and Elleson 1994 © American Society of Heating, Refrigerating and Air-conditioning Engineers, Inc., www.ashrae.org)]

The energy balance of the storage is:

dQ/dt = qin − qloss   (3.35)

where
Q = stored energy amount or inventory of the storage system (say, in kWh or Ton-hours)
t = time
qin = rate of energy flow into (or out of) the tank due to the secondary coolant loop during charging (or discharging)
qloss = rate of heat lost by tank to surroundings

The rate of energy flow into or out of the tank can be deduced by measurements from:

qin = V·ρ·cp·(Tin − Tout)   (3.36)

where
V = volumetric flow rate of the secondary coolant
ρ = density of the coolant
cp = specific heat of coolant
Tout = exit temperature of coolant from tank
Tin = inlet temperature of coolant to tank

The two temperatures and the flow rate can be measured, and thereby qin can be deduced.
The rate of heat loss from the tank to the surroundings can also be calculated as:

qloss = UA·(Ts − Tamb)   (3.37)

where
UA = effective overall heat loss coefficient of tank
Tamb = ambient temperature
Ts = average storage temperature

The UA value can be determined from the physical construction of the tank and the ambient temperature measured. Combining all three above equations:

dQ/dt = V·ρ·cp·(Tin − Tout) − UA·(Ts − Tamb)   (3.38)

Expressing the time rate of change of heat transfer in terms of finite differences results in an expression for stored energy at time (t) with respect to time (t − 1):

∆Q ≡ Qt − Qt−1 = ∆t·[C·∆T − UA·(Ts − Tamb)]   (3.39a)

where
∆t = time step at which observations are made (say, 1 h),
Table 3.10 Storage inventory and uncertainty propagation table for Example 3.7.5

Hour ending (t) | Mode of storage | Change in storage ∆Qt (kWh) | Storage capacity Qt (kWh) | Inlet fluid temp (°C) | Exit fluid temp (°C) | Total flow rate V (L/s) | Change in uncertainty (Eq. 3.40c) | UQ,t (95% CL) | Absolute UQ,t/Qmax | Relative UQ,t/Qt
8 | Idle | 0 | 2967 | – | – | – | 0.00 | 0.00 | 0.000 | 0.000
9 | Idle | 0 | 2967 | – | – | – | 0.00 | 0.00 | 0.000 | 0.000
10 | Discharging | −183 | 2784 | 4.9 | 0.1 | 9.08 | 1345.72 | 36.68 | 0.012 | 0.013
11 | " | −190 | 2594 | 5.5 | 0.2 | 8.54 | 1630.99 | 54.56 | 0.018 | 0.021
12 | " | −327 | 2266 | 6.9 | 0.9 | 12.98 | 2116.58 | 71.37 | 0.024 | 0.031
13 | " | −411 | 1855 | 7.8 | 2.1 | 17.17 | 1960.29 | 83.99 | 0.028 | 0.045
14 | " | −461 | 1393 | 8.3 | 3.1 | 21.11 | 1701.69 | 93.57 | 0.032 | 0.067
15 | " | −443 | 950 | 8.1 | 3.4 | 22.44 | 1439.15 | 100.97 | 0.034 | 0.106
16 | " | −260 | 689 | 6.2 | 1.7 | 13.76 | 1223.73 | 106.86 | 0.036 | 0.155
17 | " | −165 | 524 | 5.3 | 1.8 | 11.22 | 744.32 | 110.28 | 0.037 | 0.210
18 | Idle | 0 | 524 | – | – | – | 0.00 | 110.28 | 0.037 | 0.210
19 | " | 0 | 524 | – | – | – | 0.00 | 110.28 | 0.037 | 0.210
20 | " | 0 | 524 | – | – | – | 0.00 | 110.28 | 0.037 | 0.210
21 | " | 0 | 524 | – | – | – | 0.00 | 110.28 | 0.037 | 0.210
22 | " | 0 | 524 | – | – | – | 0.00 | 110.28 | 0.037 | 0.210
23 | Charging | 265 | 847 | −3.3 | −0.1 | 19.72 | 721.59 | 113.51 | 0.038 | 0.134
24 | " | 265 | 1112 | −3.4 | −0.2 | 19.72 | 721.59 | 116.64 | 0.039 | 0.105
1 | " | 265 | 1377 | −3.4 | −0.2 | 19.72 | 721.59 | 119.70 | 0.040 | 0.087
2 | " | 265 | 1642 | −3.6 | −0.3 | 19.12 | 750.61 | 122.79 | 0.041 | 0.075
3 | " | 265 | 1907 | −3.6 | −0.4 | 19.72 | 721.59 | 125.70 | 0.042 | 0.066
4 | " | 265 | 2172 | −3.8 | −0.6 | 19.72 | 721.59 | 128.53 | 0.043 | 0.059
5 | " | 265 | 2437 | −4 | −0.8 | 19.72 | 721.59 | 131.31 | 0.044 | 0.054
6 | " | 265 | 2702 | −4.4 | −1.1 | 19.12 | 750.61 | 134.14 | 0.045 | 0.050
7 | " | 265 | 2967 | −4.8 | −1.6 | 19.72 | 721.59 | 136.80 | 0.046 | 0.046
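The recursion of Eq. 3.40c can be checked against the first discharging rows of Table 3.10 using the values of Table 3.11; the sketch below converts the flow from L/s to m³/hr and works in kWh:

```python
import math

RHO_CP = 1000 * 4.2 / 3600   # water: rho*cp in kWh/m3-degC
U_C = 7.56                   # kWh/hr-degC, uncertainty of C (= rho*cp*UV)
U_DT = 0.141                 # degC, uncertainty of the temperature difference

def u2_increment(dT, V_litres_per_s):
    """One-step increase of the squared uncertainty (Eq. 3.40c)."""
    C = RHO_CP * V_litres_per_s * 3.6   # C = V*rho*cp in kWh/hr-degC
    return (U_C * dT) ** 2 + (C * U_DT) ** 2

u2 = 0.0                               # uncertainty taken as zero at the start
u2 += u2_increment(4.9 - 0.1, 9.08)    # hour 10: increment about 1345.7
u_hour10 = math.sqrt(u2)               # about 36.7 kWh
u2 += u2_increment(5.5 - 0.2, 8.54)    # hour 11
u_hour11 = math.sqrt(u2)               # about 54.6 kWh
```

Both values agree with the UQ,t column of Table 3.10 (36.68 and 54.56 kWh).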
Table 3.11 Magnitude and associated uncertainty of various quantities used in Example 3.7.5

Quantity | Symbol | Value | Random uncertainty at 95% CL
Density of water | ρ | 1000 kg/m³ | 0.0
Specific heat of water | cp | 4.2 kJ/kg°C | 0.0
Temperature | T | °C | 0.1°C
Flow rate | V | L/s | UV = 6% of full scale reading of 30 L/s = 1.8 L/s = 6.48 m³/hr
Temperature difference | ∆T | °C | U∆T = (0.1² + 0.1²)^1/2 = 0.141

The inlet and outlet temperatures and the fluid flow through the tank are also indicated. These are the operational variables of the system. Table 3.11 gives numerical values of the pertinent variables and their uncertainty values which are used to compute the last four columns of the table.
The uncertainty at 95% CL in the fluid flow rate into the storage is:

UC = ρ·cp·UV = (1000)·(4.2)·(6.48) = 27,216 kJ/hr-°C = 7.56 kWh/hr-°C

Inserting numerical values in Eq. 3.40b and setting the time step as one hour, one gets

U²Q,t − U²Q,t−1 = [(7.56)·∆T]² + [C·(0.141)]² kWh/hr-°C   (3.40c)

The uncertainty at the start of the calculation of the storage inventory is taken to be 0% while the maximum storage capacity Qmax = 2967 kWh. Equation 3.40c is used at each time step, and the time evolution of the uncertainty is shown in the last two columns both as a fraction of the maximum storage capacity (referred to as "absolute", i.e., [UQ,t/Qmax]) and as a relative uncertainty, i.e., as [UQ,t/Qt]. The variation of both these quantities is depicted graphically in Fig. 3.36. Note that the absolute uncertainty at 95% CL increases to 4.6% during the course of the day, while the relative uncertainty goes up to 21% during the hours of the day when the storage is essentially depleted. Further, note that various simplifying assumptions have been made during the above analysis; a detailed evaluation can be quite complex, and so, whenever possible, simplifications should be made depending on the specific system behavior and the accuracy to which the analysis is being done.

3.7.3 Monte Carlo Method

The previous method of ascertaining uncertainty, namely that based on the first order Taylor series expansion, is widely used; but it has limitations. If uncertainty is large, this method may be inaccurate for non-linear functions since it assumes derivatives based on local functional behavior. Further, an implicit assumption is that errors are normally distributed. Finally, in many cases, deriving partial derivatives of complex analytical functions is a tedious and error-prone affair, and even the numerical approach described and illustrated above is limited to cases of small uncertainties. A more general manner of dealing with uncertainty propagation is to use Monte Carlo methods, though these are better suited for more complex situations (and treated at more length in Sects. 11.2.3 and 12.2.7). These are numerical methods for solving problems involving random numbers and require considerations of probability. Monte Carlo, in essence, is a process where the individual basic variables or inputs are sampled randomly from their prescribed probability distributions so as to form one repetition (or run or trial). The corresponding numerical solution is one possible outcome of the function. This process of generating runs is repeated a large number of times, resulting in a distribution of the functional values which can then be represented as probability distributions, as histograms, by summary statistics, or by confidence intervals for any percentile threshold chosen. The last option is of great importance in certain types of studies. The accuracy of the results improves with the number of runs in a square root manner: increasing the number of runs 100 times will approximately reduce the uncertainty by a factor of 10. Thus, the process is computer intensive and requires that thousands of runs be performed. However, the entire process is simple and easily implemented on spreadsheet programs (which have inbuilt functions for generating pseudo-random numbers of selected distributions). Specialized software programs are also available.
There is a certain amount of uncertainty associated with the process because Monte Carlo simulation is a numerical method. Several authors propose approximate formulae for determining the number of trials, but a simple method is as follows. Start with a large number of trials (say, 1000), and generate pseudo-random numbers with the assumed probability distribution. Since they are pseudo-random, the mean and the distribution (say, the standard deviation) may deviate somewhat from the desired ones (depending on the accuracy of the algorithm used). Generate a few such sets and pick one which is closest to the desired quantities. Use this set to simulate the corresponding values of the function. This can be repeated a few times till one finds that the mean and standard deviations stabilize around some average values, which are taken to be the answer. It is also urged that the analyst evaluate the effect of a different number of trials on the results; say, using 3000 trials, and ascertaining that the results of both the 1000-trial and 3000-trial sets are similar. If they are not, sets with increasingly large numbers of trials should be used till the results converge.
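For the coal-depletion model of Example 3.7.4, the procedure just described amounts to repeatedly sampling Q ~ N(1000, 100) and r ~ N(0.027, 0.002) and evaluating Eq. 3.32b, exactly as tabulated in Table 3.12. A compact sketch (the seed and number of trials are arbitrary):

```python
import math
import random
import statistics

random.seed(42)

def years_to_depletion(Q, r, P0=5.0):
    """Eq. 3.32b: t = (1/r) * ln(1 + Q*r/P0)."""
    return math.log(1 + Q * r / P0) / r

# each trial samples both uncertain inputs from their assumed normal distributions
trials = [years_to_depletion(random.gauss(1000, 100), random.gauss(0.027, 0.002))
          for _ in range(5000)]

t_mean = statistics.mean(trials)   # near the deterministic 68.75 years
t_sd = statistics.stdev(trials)    # comparable to the Taylor-series value of 4.18
```

Summary statistics or percentile-based confidence intervals can then be read directly off the list of trials; as noted above, accuracy improves only as the square root of the number of runs.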
[Fig. 3.36 Time variation of the absolute uncertainty (UQ,t/Qmax) and the relative uncertainty (UQ,t/Qt) of the storage inventory by hour of day for Example 3.7.5]

Table 3.12 The first few and last few calculations used to determine uncertainty in variable t using the Monte Carlo method (Example 3.7.6)

Run # | Q (1000, 100) | r (0.027, 0.002) | t (years)
1 | 1000.0000 | 0.0270 | 68.7518
2 | 1050.8152 | 0.0287 | 72.2582
3 | 1171.6544 | 0.0269 | 73.6445
4 | 1098.2454 | 0.0284 | 73.2772
5 | 1047.5003 | 0.0261 | 69.0848
6 | 1058.0283 | 0.0247 | 67.7451
7 | 946.8644 | 0.0283 | 68.5256
8 | 1075.5269 | 0.0277 | 71.8072
9 | 967.9137 | 0.0278 | 68.6323
10 | 1194.7164 | 0.0262 | 73.3758
11 | 747.9499 | 0.0246 | 57.2155

3.8 Planning a Non-intrusive Field Experiment
…protocol and scheme are followed, and that the appropriate data analysis procedures are selected). It is advisable to explicitly adhere to the following steps (ASHRAE 2005):
(a) Identify experimental goals and acceptable accuracy
Identify realistic experimental goals (along with some measure of accuracy) that can be achieved within the time and budget available for the experiment.
(b) Identify variables and relationships
Identify the entire list of relevant measurable variables that should be examined. If some are inter-dependent, or if some are difficult to measure, find alternative variables.
(c) Establish measured variables and limits
For each measured variable, determine its theoretical limits and expected bounds to match the selected instrument limits. Also, determine instrument limits – all sensor and measurement instruments have physical limits that restrict their ability to accurately measure quantities of interest.
(d) Preliminary instrumentation selection
Selection of the equipment should be based on the accuracy, repeatability and features of the instrument, as well as cost. Regardless of the instrument chosen, it should have been calibrated within the last twelve months or within the interval required by the manufacturer, whichever is less. The required accuracy of the instrument will depend upon the acceptable level of uncertainty for the experiment.
(e) Document uncertainty of each measured variable
Utilizing information gathered from manufacturers or past experience with specific instrumentation, document the uncertainty for each measured variable. This information will then be used in estimating the overall uncertainty of results using propagation of error methods.
(f) Perform preliminary uncertainty analysis
An uncertainty analysis of proposed measurement procedures and experimental methodology should be completed before the procedures and methodology are finalized in order to estimate the uncertainty in the final results. The higher the accuracy required of measurements, the higher the accuracy of sensors needed to obtain the raw data. The uncertainty analysis is the basis …
… installation from the manufacturer's recommendations should be documented and the effects of the deviation on instrument performance evaluated. A change in instrumentation or location may be required if in-situ uncertainty exceeds acceptable limits determined by the preliminary uncertainty analysis.
(i) Perform initial data quality verification
To ensure that the measurements taken are not too uncertain and represent reality, instrument calibration and independent checks of the data are recommended. Independent checks can include sensor validation, energy balances, and material balances (see Sect. 3.3).
(j) Collect data
The challenge for data acquisition in any experiment is to collect the required amount of information while avoiding collection of superfluous information. Superfluous information can overwhelm simple measures taken to follow the progress of an experiment and can complicate data analysis and report generation. The relationship between the desired result, whether static, periodic stationary or transient, and time is the determining factor for how much information is required. A static, non-changing result requires only the steady-state result and proof that all transients have died out. A periodic stationary result, the simplest dynamic result, requires information for one period and proof that the one selected is one of three consecutive periods with identical results within acceptable uncertainty. Transient or non-repetitive results, whether a single pulse or a continuing, random result, require the most information. Regardless of the result, the dynamic characteristics of the measuring system and the full transient nature of the result must be documented for some relatively short interval of time. Identifying good models requires a certain amount of diversity in the data, i.e., the data should cover the spatial domain of variation of the independent variables (discussed in Sect. 6.2). Some basic suggestions pertinent to controlled experiments are summarized below; these are also pertinent for non-intrusive data collection.
(i) Range of variability: The most obvious way in which an experimental plan can be made compact
for selection of a measurement system that provides and efficient is to space the variables in a predeter-
acceptable uncertainty at least cost. How to perform mined manner. If a functional relationship between
such a preliminary uncertainty analysis was discussed in an independent variable X and a dependent vari-
Sect. 3.6 and 3.7. able Y is sought, the most obvious way is to select
(g) Final instrument selection and methods end points or limits of the test, thus covering the
Based on the results of the preliminary uncertainty test envelope or domain that encloses the complete
analysis, evaluate earlier selection of instrumentation. family of data. For a model of the type Z = f(X,Y),
Revise selection if necessary to achieve the acceptable a plane area or map is formed (see Fig. 3.37).
uncertainty in the experiment results. Functions involving more variables are usually
(h) Install instrumentation broken down to a series of maps. The above dis-
Instrumentation should be installed in accordance with cussion relates to controllable regressor variables.
manufacturer’s recommendations. Any deviation in the Extraneous variables, by their very nature, cannot
3.8 Planning a Non-intrusive Field Experiment 95
Minimizing fixed errors can be accomplished by careful calibration with referenced standards.
(m) Reporting results
Reporting is the primary means of communicating the results from an experiment. The report should be structured to clearly explain the goals of the experiment and the evidence gathered to achieve them. It is assumed that data reduction, data analysis and uncertainty analysis have processed all data to render them understandable by the intended audiences. Different audiences require different reports with various levels of detail and background information. In any case, all reports should include the results of the uncertainty analysis to an identified confidence level (typically 95%). Uncertainty limits can be given as either absolute or relative (in percentages). Graphical and mathematical representations are often used. On graphs, error bars placed vertically and horizontally on representative points are a very clear way to present expected uncertainty. A data analysis section and a conclusion are critical sections, and should be prepared with great care while being succinct and clear.

Problems

Pr. 3.1 Consider the data given in Table 3.2. Determine
(a) the 10% trimmed mean value
(b) which observations can be considered to be "mild" outliers (> 1.5 × IQR)
(c) which observations can be considered to be "extreme" outliers (> 3.0 × IQR)
(d) identify outliers using Chauvenet's criterion given by Eq. 3.19
(e) compare the results from (b), (c) and (d).

Pr. 3.2 Consider the data given in Table 3.6. Perform an exploratory data analysis involving computing pertinent statistical summary measures, and generating pertinent graphical plots.

Pr. 3.3 A nuclear power facility produces a vast amount of heat which is usually discharged into the aquatic system. This heat raises the temperature of the aquatic system, resulting in a greater concentration of chlorophyll which in turn extends the growing season. To study this effect, water samples were collected monthly at three stations for one year. Station A is located closest to the hot water discharge, and Station C the farthest (Table 3.13).
You are asked to perform the following tasks and annotate with pertinent comments:
(a) flag any outlier points
(b) compute pertinent statistical descriptive measures
(c) generate pertinent graphical plots
(d) compute the correlation coefficients.

Table 3.13 Data table for Problem 3.3
Month      Station A  Station B  Station C
January    9.867      3.723      4.410
February   14.035     8.416      11.100
March      10.700     20.723     4.470
April      13.853     9.168      8.010
May        7.067      4.778      34.080
June       11.670     9.145      8.990
July       7.357      8.463      3.350
August     3.358      4.086      4.500
September  4.210      4.233      6.830
October    3.630      2.320      5.800
November   2.953      3.843      3.480
December   2.640      3.610      3.020

Pr. 3.4 Consider Example 3.7.3 where the uncertainty analysis on chiller COP was done at full load conditions. What about part-load conditions, especially since there is no collected data? One could use data from chiller manufacturer catalogs for a similar type of chiller, or one could assume that part-load operation will affect the inlet minus the outlet chilled water temperatures (∆T) in a proportional manner, as stated below.
(a) Compute the 95% CL uncertainty in the COP at 70% and 40% full load assuming the evaporator water flow rate to be constant. At part load, the evaporator temperature difference is reduced proportionately to the chiller load, while the electric power drawn is assumed to increase from a full load value of 0.8 kW/t to 1.0 kW/t at 70% full load and to 1.2 kW/t at 40% full load.
(b) Would the instrumentation be adequate, or would it be prudent to consider better instrumentation if the fractional COP uncertainty at 95% CL should be less than 10%?
(c) Note that fixed (bias) errors have been omitted from the analysis, and some of the assumptions in predicting part-load chiller performance can be questioned. A similar exercise with slight variations in some of the assumptions, called a sensitivity study, would be prudent at this stage. How would you conduct such an investigation?

Pr. 3.5 Consider the uncertainty in the heat transfer coefficient illustrated in Example 3.7.1. The example was solved analytically using the Taylor's series approach. You are asked to solve the same example using the Monte Carlo method:
(a) using 500 data points
(b) using 1000 data points
Compare the results from this approach with those in the solved example.

Pr. 3.6 You will repeat Example 3.7.6. Instead of computing the standard deviation, plot the distribution of the time variable t in order to evaluate its shape. Numerically determine the uncertainty bands for the 95% CL.
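The summary measures asked for in Pr. 3.1 and Pr. 3.3 can be sketched with the Python standard library alone. Tables 3.2 and 3.6 are not reproduced in this excerpt, so the helpers below are demonstrated on the Table 3.13 data; the graphical parts of the problems are left out.

```python
import statistics as st

# Monthly readings from Table 3.13 (Stations A, B, C of Pr. 3.3)
station_A = [9.867, 14.035, 10.700, 13.853, 7.067, 11.670,
             7.357, 3.358, 4.210, 3.630, 2.953, 2.640]
station_B = [3.723, 8.416, 20.723, 9.168, 4.778, 9.145,
             8.463, 4.086, 4.233, 2.320, 3.843, 3.610]
station_C = [4.410, 11.100, 4.470, 8.010, 34.080, 8.990,
             3.350, 4.500, 6.830, 5.800, 3.480, 3.020]

def trimmed_mean(data, frac=0.10):
    """Mean after dropping frac of the points from each tail."""
    xs = sorted(data)
    k = int(len(xs) * frac)
    return st.mean(xs[k:len(xs) - k] if k else xs)

def iqr_outliers(data, k=1.5):
    """Points beyond the k*IQR fences (k=1.5 'mild', k=3.0 'extreme')."""
    q = st.quantiles(data, n=4)      # q[0] = Q1, q[2] = Q3
    iqr = q[2] - q[0]
    lo, hi = q[0] - k * iqr, q[2] + k * iqr
    return [x for x in data if x < lo or x > hi]

def pearson(x, y):
    """Pearson correlation coefficient from first principles."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

tmean_A = trimmed_mean(station_A)    # 10% trimmed mean of Station A
mild_C = iqr_outliers(station_C)     # the May spike at Station C is flagged
r_AB = pearson(station_A, station_B)
```

Note that the fences depend on the quantile convention used; `statistics.quantiles` defaults to the exclusive method, so other packages may flag a slightly different set of points.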
Pr. 3.7 Determining cooling coil degradation based on effectiveness
The thermal performance of a cooling coil can also be characterized by the concept of effectiveness widely used for thermal modeling of traditional heat exchangers. In such coils, a stream of humid air flows across a coil supplied by chilled water and is cooled and dehumidified as a result. In this case, the effectiveness can be determined as:

ε = (actual heat transfer rate)/(maximum possible heat transfer rate) = (hai − hao)/(hai − hci)   (3.41)

where hai and hao are the enthalpies of the air stream at the inlet and outlet respectively, and hci is the enthalpy of entering chilled water.
The effectiveness is independent of the operating conditions provided the mass flow rates of air and chilled water remain constant. An HVAC engineer would like to determine whether the coil has degraded after it has been in service for a few years. For this purpose he assembles the following coil performance data at identical air and water flow rates corresponding to when originally installed (done during start-up commissioning) and currently (Table 3.14).
Note that the uncertainties in determining the air enthalpies are relatively large due to the uncertainty associated with measuring bulk air stream temperatures and humidities. However, the uncertainty in the enthalpy of the chilled water is only half that of air.
(a) Assess, at 95% CL, whether the cooling coil has degraded or not. Clearly state any assumptions you make during the evaluation.
(b) What are the relative contributions of the uncertainties in the three enthalpy quantities to the uncertainty in the effectiveness value? Do these differ from the installed period to the time when current tests were performed?

Table 3.14 Data table for Problem 3.7
                               Units   When installed  Current  95% Uncertainty
Entering air enthalpy (hai)    Btu/lb  38.7            36.8     5%
Leaving air enthalpy (hao)     Btu/lb  27.2            28.2     5%
Entering water enthalpy (hci)  Btu/lb  23.2            21.5     2.5%

Pr. 3.8⁷ Consider a basic indirect heat exchanger where the heat rates of the heat exchange associated with the cold and hot sides are given by:

Qactual = mc · cpc · (Tc,o − Tc,i)  (cold side heating)
Qactual = mh · cph · (Th,i − Th,o)  (hot side cooling)   (3.42a)

where m, T and c are the mass flow rate, temperature and specific heat respectively, while the subscripts o and i stand for outlet and inlet, and c and h denote cold and hot streams respectively.
The effectiveness of the sensible heat exchanger is given by:

ε = (actual heat transfer rate)/(maximum possible heat transfer rate) = Qactual/[(mcp)min · (Th,i − Tc,i)]   (3.42b)

Assuming the values and uncertainties of the various parameters shown in the table (Table 3.15):
(i) compute the heat exchanger loads and the uncertainty ranges for the hot and cold sides
(ii) compute the uncertainty in the effectiveness determination
(iii) what would you conclude regarding the heat balance checks?

Table 3.15 Parameters and uncertainties to be assumed (Pr. 3.8)
Parameter  Nominal value  95% Uncertainty
cpc        1 Btu/lb°F     ±5%
mc         475,800 lb/h   ±10%
Tc,i       34°F           ±1°F
Tc,o       46°F           ±1°F
cph        0.9 Btu/lb°F   ±5%
mh         450,000 lb/h   ±10%
Th,i       55°F           ±1°F
Th,o       40°F           ±1°F

Pr. 3.9 The following table (Table 3.16) (EIA 1999) indicates the total electricity generated by five different types of primary energy sources as well as the total emissions associated with each. Clearly, coal and oil generate a lot of emissions or pollutants which are harmful not only to the environment but also to public health. France, on the other hand, has a mix of 21% coal and 79% nuclear.

Table 3.16 Data table for Problem 3.9
US power generation mix and associated pollutants
Fuel        Electricity kWh (1999)  % Total  Short tons (= 2000 lb) of SO2 / NOx / CO2
Coal        1.77E+12                55.7     1.13E+07 / 6.55E+06 / 1.90E+09
Oil         8.69E+10                2.7      6.70E+05 / 1.23E+05 / 9.18E+07
Nat. Gas    2.96E+11                9.3      2.00E+03 / 3.76E+05 / 1.99E+08
Nuclear     7.25E+11                22.8     0.00E+00 / 0.00E+00 / 0.00E+00
Hydro/Wind  3.00E+11                9.4      0.00E+00 / 0.00E+00 / 0.00E+00
Totals      3.18E+12                100.0    1.20E+07 / 7.05E+06 / 2.19E+09

⁷ From ASHRAE (2005) © American Society of Heating, Refrigerating and Air-conditioning Engineers, Inc. (www.ashrae.org).
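As a partial check on Pr. 3.7(a), the "when installed" effectiveness and its uncertainty can be propagated analytically from Eq. 3.41 using the first-order Taylor-series method of Sect. 3.7. The sketch below treats the tabulated percentage uncertainties as absolute 95% CL values of the enthalpies; that reading of Table 3.14 is an assumption, not something the problem statement settles.

```python
import math

# Nominal "when installed" enthalpies from Table 3.14 (Btu/lb) and their
# 95% CL uncertainties (5%, 5% and 2.5% of the nominal values)
h_ai, h_ao, h_ci = 38.7, 27.2, 23.2
u_ai, u_ao, u_ci = 0.05 * h_ai, 0.05 * h_ao, 0.025 * h_ci

# Effectiveness, Eq. 3.41
d = h_ai - h_ci
eps = (h_ai - h_ao) / d

# Analytic partial derivatives of Eq. 3.41 with respect to each enthalpy
de_dhai = (h_ao - h_ci) / d**2
de_dhao = -1.0 / d
de_dhci = (h_ai - h_ao) / d**2

# First-order (Taylor-series) propagation: quadrature sum of the terms
u_eps = math.sqrt((de_dhai * u_ai) ** 2 +
                  (de_dhao * u_ao) ** 2 +
                  (de_dhci * u_ci) ** 2)
```

The result (ε about 0.74 with a 95% CL uncertainty near 0.10) also previews part (b): the squared terms in `u_eps` give the relative contribution of each enthalpy directly.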
98 3 Data Collection and Preliminary Data Analysis
with the symbols described in Table 3.17 along with their numerical values.
(i) Determine the absolute and relative uncertainties in Esave under these conditions.
(ii) If this uncertainty had to be reduced, which variable will you target for further refinement?
(iii) What is the minimum value of ηnew under which the lower bound of the 95% CL interval is greater than zero?

Pr. 3.11 Uncertainty in estimating outdoor air fraction in HVAC systems
Ducts in heating, ventilating and air-conditioning (HVAC) systems supply conditioned air (SA) to the various spaces in a building, and also exhaust the air from these spaces, called return air (RA). A sketch of an all-air HVAC system is shown in Fig. 3.39. Occupant comfort requires that a certain amount of outdoor air (OA) be brought into the HVAC systems while an equal amount of return air is exhausted to the outdoors. The OA and the RA mix at a point just before the

Pr. 3.12 Sensor placement in HVAC ducts with consideration of flow non-uniformity
Consider the same situation as in Pr. 3.11. Usually, the air ducts have large cross-sections. The problem with inferring outdoor air flow using temperature measurements is the large thermal non-uniformity usually present in these ducts due to both stream separation and turbulence effects. Moreover, temperature (and, hence, density) differences between the OA and MA streams result in poor mixing. The following table gives the results of a traverse in the mixed air duct with 9 measurements (using an equally spaced grid of 3 × 3 designated by numbers in bold in Table 3.18). The measurements were replicated four times under the same outdoor conditions. The random error of the sensors is 0.2°F at 95% CL with negligible bias error. Determine:
(a) the worst and best grid locations for placing a single sensor (to be determined based on analyzing the recordings at each of the 9 grid locations and for all four time periods)
Table 3.18 Table showing the temperature readings (in °F) at the nine different sections (S#1–S#9) of the mixed air (MA) duct (Pr. 3.12)
S#1: 55.6, 54.6, 55.8, 54.2   S#2: 56.3, 58.5, 57.6, 63.8   S#3: 53.7, 50.2, 59.0, 49.4
S#4: 58.0, 62.4, 62.3, 65.8   S#5: 66.4, 67.8, 68.7, 67.6   S#6: 61.2, 56.3, 64.7, 58.8
S#7: 63.5, 65.0, 63.6, 64.8   S#8: 67.4, 67.4, 66.8, 65.7   S#9: 63.9, 61.4, 62.4, 60.6

(b) the maximum and minimum errors at 95% CL one could expect in the average temperature across the duct cross-section, if the best grid location for the single sensor was adopted.

Pr. 3.13 Uncertainty in estimated proportion of exposed subjects using Monte Carlo method
Dose-response modeling is the process of characterizing the relation between the dose of an administered/exposed agent and the incidence of an adverse health effect. These relationships are subject to large uncertainty because of the paucity of data as well as the fact that they are extrapolated from laboratory animal tests. Haas (2002) suggested the use of an exponential model for mortality rate due to inhalation exposure by humans to anthrax spores (characterized by the number of colony forming units or cfu):

p = 1 − exp(−kd)   (3.44)

where p is the expected proportion of exposed individuals likely to die, d is the average dose (in cfu) and k is the dose response parameter (in units of 1/cfu). A value of k = 0.26 × 10−5 has been suggested. One would like to determine the shape and magnitude of the uncertainty distribution of d at p = 0.5, assuming that the one standard deviation (or uncertainty) of k is 30% of the above value and is normally distributed. Use the Monte Carlo method with 1000 trials to solve this problem. Also, investigate the shape of the error probability distribution, and ascertain the upper and lower 95% CL.

Pr. 3.14 Uncertainty in the estimation of biological dose over time for an individual
Consider an occupant inside a building in which an accidental biological agent has been released. The dose (D) is the cumulative amount of the agent to which the human body is subjected, while the response is the measurable physiological change produced by the agent. The widely accepted approach for quantifying dose is to assume functional forms based on first-order kinetics. For biological and radiological agents where the process of harm being done is cumulative, one can use Haber's law (Heinsohn and Cimbala 2003):

D(t) = k ∫(t1 to t2) C(t) dt   (3.45a)

where C(t) is the indoor concentration at a given time t, k is a constant which includes effects such as the occupant breathing rate, the absorption efficiency of the agent or species, …, and t1 and t2 are the start and end times. This relationship is often used to determine health-related exposure guidelines for toxic substances. For a simple one-zone building, the free response, i.e., the temporal decay, is given in terms of the initial concentration C(t1) by:

C(t) = C(t1) · exp[−a(t − t1)]   (3.45b)

where the model parameter "a" is a function of the volume of the space and the outdoor and supply air flow rates. The above equation is easy to integrate during any time period from t1 to t2, thus providing a convenient means of computing the total occupant inhaled dose when occupants enter or leave the contaminated zones at arbitrary times. Let a = 0.017186 with 11.7% uncertainty while C(t1) = 7000 cfu/m3 (cfu – colony forming units). Assume k = 1.
(a) Determine the total dose to which the individual is exposed at the end of 15 min.
(b) Compute the uncertainty of the total dose at 1 min time intervals over 15 min (similar to the approach in Example 3.7.6)
(c) Plot the 95% CL over 15 min at 1 min intervals

Pr. 3.15 Propagation of optical and tracking errors in solar concentrators
Solar concentrators are optical devices meant to increase the incident solar radiation flux density (power per unit area) on a receiver. Separating the solar collection component (viz., the reflector) and the receiver can allow heat losses per collection area to be reduced. This would result in higher fluid operating temperatures at the receiver. However, there are several sources of errors which lead to optical losses:
(i) Due to non-specular or diffuse reflection from the reflector, which could be due to improper curvature of the reflector surface during manufacture (shown in Fig. 3.40a) or to progressive dust accumulation over the surface over time as the system operates in the field;
(ii) Due to tracking errors arising from improper tracking mechanisms as a result of improper alignment sensors or non-uniformity in drive mechanisms (usually, the tracking is not continuous; a sensor activates a motor every few minutes which re-aligns the reflector to the solar radiation as it moves in the sky). The result is a spread in the reflected radiation as illustrated in Fig. 3.40b;
(iii) Improper reflector and receiver alignment during the initial mounting of the structure or due to small ground/pedestal settling over time.
The above errors are characterized by root mean square (or rms) random errors (bias errors such as that arising from structural mismatch can often be corrected by one-time or regular corrections), and their combined effect can be determined
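The Monte Carlo procedure asked for in Pr. 3.13 can be sketched in a few lines: invert Eq. 3.44 at p = 0.5 to get d = −ln(1 − p)/k, sample k from its assumed normal distribution, and summarize the resulting distribution of d. The seed and the percentile convention below are arbitrary choices made only for reproducibility.

```python
import math
import random
import statistics as st

random.seed(12345)                  # arbitrary seed, for reproducibility only

K0 = 0.26e-5                        # suggested dose-response parameter (1/cfu)
SD_K = 0.30 * K0                    # one-standard-deviation uncertainty of k
P = 0.5                             # mortality proportion of interest

# Invert Eq. 3.44, p = 1 - exp(-k*d), for the dose: d = -ln(1 - p)/k
doses = []
while len(doses) < 1000:            # 1000 trials as the problem specifies
    k = random.gauss(K0, SD_K)
    if k > 0:                       # discard rare non-physical draws (k <= 0)
        doses.append(-math.log(1.0 - P) / k)

median_d = st.median(doses)         # near ln(2)/K0, i.e. about 2.7e5 cfu
cuts = st.quantiles(doses, n=40)    # 39 cut points at 2.5% steps
lo95, hi95 = cuts[0], cuts[-1]      # approximate 2.5th and 97.5th percentiles
```

Plotting a histogram of `doses` (left to the reader) shows the skewed shape the problem asks about: since d varies as 1/k, a symmetric spread in k produces an asymmetric spread in d.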
Fig. 3.40 Different types of optical and tracking errors. a Micro-roughness in solar concentrator surface leads to a spread in the reflected radiation. The roughness is illustrated as a dotted line for the ideal reflector surface and as a solid line for the actual surface. b Tracking errors lead to a spread in incoming solar radiation shown as a normal distribution. Note that a tracker error of σtrack results in a reflection error σreflec = 2·σtrack from Snell's law. The factor of 2 also pertains to other sources, based on the error occurring as light both enters and leaves the optical device (see Eq. 3.46)
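Equation 3.46 itself is not reproduced in this excerpt, but independent rms error sources of the kind listed for Pr. 3.15 are conventionally combined in quadrature; the sketch below assumes that convention, and the milliradian values are purely hypothetical.

```python
import math

def combined_rms(*sigmas):
    """Combine independent rms error sources in quadrature
    (assumed form of Eq. 3.46: sigma_total = sqrt(sum of squares))."""
    return math.sqrt(sum(s * s for s in sigmas))

# Hypothetical mrad values: surface slope error and tracking error are
# doubled on reflection (per the Fig. 3.40 caption); alignment error is not
sigma_slope, sigma_track, sigma_align = 2.5, 1.0, 2.0
sigma_total = combined_rms(2 * sigma_slope, 2 * sigma_track, sigma_align)
```

A useful consequence of the quadrature form is that the largest single source dominates: here the doubled slope error contributes most of the total, so it would be the first target for improvement.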
Glaser, D. and S. Ubbelohde, 2001. "Visualization for time dependent building simulation", 7th IBPSA Conference, pp. 423–429, Rio de Janeiro, Brazil, Aug. 13–15.
Haas, C.N., 2002. "On the risk of mortality to primates exposed to anthrax spores", Risk Analysis, vol. 22(2): pp. 189–193.
Haberl, J.S. and M. Abbas, 1998. "Development of graphical indices for viewing building energy data: Part I and Part II", ASME J. Solar Energy Engg., vol. 120, pp. 156–167.
Heinsohn, R.J. and J.M. Cimbala, 2003. Indoor Air Quality Engineering, Marcel Dekker, New York, NY.
Holman, J.P. and W.J. Gajda, 1984. Experimental Methods for Engineers, 5th Ed., McGraw-Hill, New York.
Kreider, J.K., P.S. Curtiss and A. Rabl, 2009. Heating and Cooling of Buildings, 2nd Ed., CRC Press, Boca Raton, FL.
Reddy, T.A., 1990. "Statistical analyses of electricity use during the hottest and coolest days of summer for groups of residences with and without air-conditioning", Energy, vol. 15(1): pp. 45–61.
Schenck, H., 1969. Theories of Engineering Experimentation, 2nd Edition, McGraw-Hill, New York.
Tufte, E.R., 1990. Envisioning Information, Graphic Press, Cheshire, CN.
Tufte, E.R., 2001. The Visual Display of Quantitative Information, 2nd Edition, Graphic Press, Cheshire, CN.
Tukey, J., 1988. The Collected Works of John W. Tukey, W. Cleveland (Editor), Wadsworth and Brookes/Cole Advanced Books and Software, Pacific Grove, CA.
Wonnacutt, R.J. and T.H. Wonnacutt, 1985. Introductory Statistics, 4th Ed., John Wiley & Sons, New York.