


From Materials Evaluation, Vol. 73, No. 1 (January 2015), pp. 44–54.
Copyright © 2015 The American Society for Nondestructive Testing, Inc.

by Charles Annis, John C. Aldrin, and
Harold A. Sabbagh


What is missing in nondestructive
testing (NDT) capability evaluation is
what is missing in many engineering
evaluations of risk—understanding of
the statistical premises governing their calculation.
Apparently, it is easy to forget that clever reasoning,
however valid, cannot rescue a faulty premise. And if
NDT practitioners do not even know what that premise
is, they are in trouble at the outset. It is the authors’
objective here to begin to remedy that.

What is Missing in Nondestructive Testing Capability Evaluation?
Photo credit: Matt Lieb

The Most Common Mistake Engineers Make in



Two of the authors have each been practicing engineers for
nearly five decades. In their experience, the
most common mistake that engineers make in their
statistical analysis is beginning with a valid mathematical statement that is conditionally true, then
proceeding with a series of valid mathematical operations to arrive at an answer, which may or may not be
true, depending on the long-forgotten (or ignored)
condition.

Consider two questions about the following mathematical statement: 2 + 2 = 5.
• Question 1: Is this a valid mathematical statement?
• Answer 1: Yes. Addition is defined as a binary operation. It requires two addends, a sign indicating the operation, an equal sign, and the sum. Statement 1 meets these criteria; thus, it is a valid mathematical statement.
• Question 2: Is the statement true?
• Answer 2: No. It is false. A valid statement can be false and still be a valid mathematical construct.

Engineers would never work from something as obviously false as this statement, but just because something is not obviously false does not mean it is not false. Consider the commonplace statistical example of computing upper and lower bounds expected to include 95% of the population given the following observations. (Making inferences based on small samples can be dangerous, but the purpose here is not to discuss that trap, only to illustrate that not all data are normal and assuming that they are, without checking, is irresponsible.)

X: 0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0

Answer: Everyone knows ±2 standard deviations from the mean enclose 95% of the sample. The mean is $\bar{X} = \sum X / n = 0.409$, and the estimate of the standard deviation, $\hat{s}$, is 0.314, so the lower bound is $\bar{X} - 2\hat{s}$ and the upper bound is $\bar{X} + 2\hat{s}$, so the bounds are −0.219 and 1.037, respectively. Simple. And in this case, wrong. Why? Because the data do not have a normal distribution. The data are lognormal, which means that $-\infty < \log(X) < \infty$, so the lower bound can never be negative.

It is easy to check the normal assumption: make a quantile-quantile (Q-Q) plot. A Q-Q plot displays the quantiles (percentiles) of the data against the quantiles expected from the probability model, in this case, the normal model. If the data are (approximately) normal they plot as a straight line, which they do not in Figure 1a, but do when plotted as log(X) in Figure 1b.

Figure 1. Normal quantile-quantile (Q-Q) plot, on which normal data plot as a straight line: (a) normal Q-Q plot, and (b) log-normal Q-Q plot. [Axes: X versus fraction < X.]

The lesson here is not that engineers do not check if their data have a normal distribution. Rather, it is that they seldom check the validity of any of their statistical assumptions.
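The 95% bounds example can be checked numerically. The sketch below reproduces the quoted mean ± 2 standard deviation bounds and then applies the same recipe to log(X), back-transforming the result. Treating "±2 standard deviations on the log scale" as the lognormal counterpart is an assumption made here for illustration, not a procedure stated in the article; the function name `two_sigma_bounds` is likewise hypothetical.

```python
import math
import statistics

# Observations quoted in the example.
x = [0.10, 0.16, 0.23, 0.32, 0.43, 0.62, 1.0]


def two_sigma_bounds(values):
    """Return (mean - 2s, mean + 2s) using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # n - 1 in the denominator
    return m - 2 * s, m + 2 * s


# Naive bounds: treat X itself as normally distributed.
naive_lo, naive_hi = two_sigma_bounds(x)

# Assumed lognormal treatment: same recipe on log(X), then back-transform.
log_lo, log_hi = two_sigma_bounds([math.log(v) for v in x])
lognormal_lo, lognormal_hi = math.exp(log_lo), math.exp(log_hi)

print(f"normal assumption:    {naive_lo:.3f} to {naive_hi:.3f}")
print(f"lognormal assumption: {lognormal_lo:.3f} to {lognormal_hi:.3f}")
```

The first line of output matches the −0.219 and 1.037 quoted in the text. The back-transformed lower bound is an exponential and is therefore positive by construction, which is the point of the example.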

For example, ask engineers about their finite element analysis and they will say what software was used. Pressed for details, they might add that the material was assumed to be isotropic and linear elastic and tell about its ultimate strength, yield strength, elastic properties, loading conditions, and temperature profile. However, ask engineers about their statistical analysis and they will say what statistical software was used.

Implicit Statistical Assumptions in Regression Analysis

All analysis relies on assumptions concerning the relationship between reality and the process being modeled. Quoting from an older work, “Simply not understanding the nature of the assumptions being made does not mean that they do not exist” (Frank and Friedman, 1993).

Since most engineers are familiar with ordinary least-squares (OLS) linear regression, the authors will use it to investigate those under-appreciated assumptions in regression analysis. OLS has been a fundamental part of engineering practice for 200 years. The assumptions also hold for the technique of maximum-likelihood estimation (MLE), frequently used in probability of detection (POD) evaluation. MLE will be discussed later in this section.

Perhaps the most obvious is this:
• The model must look like the data (Harrell, 2001; Venables and Ripley, 2010).

While this may be self-evident, checking to see if the assumption holds is less so, given how often it is ignored. There are five other implicit assumptions that must be satisfied for the resulting parameter estimates to be useful:
• The response must be continuous and observable.
• The variance must be homoscedastic (have uniform variance).
• The model must be linear in the parameters.
• The observations must be uncorrelated with respect to time, space, or both (Chatfield, 1989; Cressie, 1993; Cressie and Wikle, 2011).
• The errors must be normal (Sakia, 1992).

If any of these assumed conditions are not met, the resulting analysis will be wrong, even though that fact may be far from obvious. Now the authors consider each requirement more closely.

The Response Must Be Continuous and Observable (Part 1)

Figure 2a shows the OLS regression of NDT response on target size and illustrates the relationship between POD and signal strength. OLS chooses parameter estimates that minimize the summed squared difference between the model and the observations. In Figure 2a, all responses are observed. This is not always the case: sometimes the response is below some noise threshold or above some saturation value. In that case it is censored. Since it is unknown (other than being below some noise or above some saturation), it is obviously not possible to compute the difference (error) between the observation and the model, so finding a summed squared error is not possible.

Figure 2. Regression of signal strength (Y) on target size (X): (a) ordinary least-squares requires all responses to be observable, and (b) replacing censored values with the censoring value skews the result anticonservatively. POD = probability of detection. [Axes: signal strength, â or log(â), versus target size, a or log(a); the decision threshold â = â0, the censoring value âcensor, and the sizes a10, a50, and a90 are marked.]

So, what to do with censored observations? They can either be ignored (which means throwing away useful information about the sought-after model parameters) or replaced with their censoring value. Both choices are bad. Figure 2b illustrates that the OLS parameter estimates based on replacing an observation with its censoring value result in an erroneous, anticonservative POD versus size model. Since neither option is acceptable, what can be done?
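The effect of replacing censored responses with the censoring value can be seen in a small, noise-free sketch. All numbers below (the sizes, the true line y = 1 + 2a, and the noise floor of 8) are hypothetical and chosen only to make the arithmetic easy to follow; `ols_fit` is a hand-written helper, not a library routine.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical, noise-free illustration: the true response is y = 1 + 2 * size.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
true_response = [1 + 2 * a for a in sizes]

# Hypothetical noise floor: any response below it is recorded as the floor.
noise_floor = 8
substituted = [max(y, noise_floor) for y in true_response]

b0_true, b1_true = ols_fit(sizes, true_response)
b0_sub, b1_sub = ols_fit(sizes, substituted)

print(f"all responses observed: intercept={b0_true:.3f}, slope={b1_true:.3f}")
print(f"censored -> floor:      intercept={b0_sub:.3f}, slope={b1_sub:.3f}")
```

In this sketch the substituted fit has a shallower slope and a higher intercept than the true line, so it predicts stronger signals from small targets than they actually produce. That is the direction one would expect to make a POD versus size curve look better than it should.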

Probability and Likelihood

For some unfathomable reason, engineers are not introduced to likelihood, the foundation of modern statistics, in their first statistics course. They are not taught how statistical stuff works but spend tedious hours discussing red and black balls in urns. Rather, they are given a collection of formulas to memorize. That cannot be fixed now, but the authors aim to provide a very brief introduction to likelihood.

Probability and likelihood are two sides of the same coin: probability provides the chances that outcome X will occur, given a model with stated parameters. Likelihood is the chances that these parameters are the best possible for the given probability model, given this collection of observations. Their mathematical formulations are identical. The only difference is what is known. With probability the parameters (for example, mean and standard deviation of normal data) are known, and the probability is desired of the next observation falling in a given range. With likelihood there is a collection of observations, and the most likely mean and standard deviation are desired. Likelihood is defined, then, as the ordinate of the given probability density, for example, the gaussian.

To find the MLE of the mean, for example, an optimization problem must be solved. Rather than minimizing the summed error, the likelihood is maximized. This involves differentiating the log of the product of likelihoods (one likelihood for each observation), setting the derivative equal to zero, and solving. In this example, the maximum likelihood occurs at $\bar{X} = \sum X / n$. Look familiar? More involved problems may require more sophisticated optimization algorithms, but the idea is the same: the best parameter estimates are those that maximize the likelihood of observing what has occurred. When the data are not censored (and the errors are normal), the MLEs of the model coefficients are exactly equal to the OLS estimators, so there is no need to jettison 200 years of OLS experience to use the MLE criterion.

With censored data, OLS is untenable: the errors cannot be computed so they cannot be minimized. OLS is powerless to deal with censoring, but likelihood handles censored data easily. Looking at censored observations, X is unknown, except that it is greater or less than some censoring value. How can the likelihood be maximized if the likelihood is unknown? The ordinate is unknown, since X could be anything in the censored region. Simply, the likelihood of a censored observation is defined as all of them, that is, the integral of the probability density below or above the censoring value. Then the optimization problem is solved. It is not perfect, but it is far superior to its alternatives.

Before leaving the topic, the authors illustrate in Figure 3 how well MLEs perform with the regression data in Figure 2b, showing the correct censored regression fit as compared with the OLS fit of all the data.

Figure 3. Censored regression using maximum-likelihood estimation (blue dashed line) correctly accounts for observations with actual responses obscured by background noise and thus censored. POD = probability of detection. [Axes: signal strength, â or log(â), versus target size, a or log(a); â = â0, âcensor, a10, a50, and a90 are marked.]
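The two likelihood contributions described above (the density ordinate for an observed value, the integral of the density for a censored one) can be written down directly. The following is a minimal sketch for a normal model with left censoring only. The data, the censoring value, the fixed standard deviation of 1, and the crude grid search are all assumptions made for illustration; a real analysis would estimate the standard deviation as well and use a proper optimizer.

```python
import math


def normal_pdf(x, mu, sigma):
    """Ordinate of the normal density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Integral of the normal density below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def log_likelihood(mu, sigma, observed, n_left_censored=0, censor_value=None):
    """Observed points contribute the density ordinate; each left-censored
    point contributes the integral of the density below censor_value."""
    total = sum(math.log(normal_pdf(x, mu, sigma)) for x in observed)
    if n_left_censored:
        total += n_left_censored * math.log(normal_cdf(censor_value, mu, sigma))
    return total


def best_mu(candidates, sigma, observed, n_left_censored=0, censor_value=None):
    """Crude grid search for the mean that maximizes the log-likelihood."""
    return max(
        candidates,
        key=lambda mu: log_likelihood(mu, sigma, observed, n_left_censored, censor_value),
    )


# Hypothetical responses; sigma is held at 1 to keep the search one-dimensional.
observed = [4.0, 5.0, 6.0, 7.0]
grid = [i / 100 for i in range(300, 801)]  # 3.00, 3.01, ..., 8.00

mu_uncensored = best_mu(grid, 1.0, observed)
mu_censored = best_mu(grid, 1.0, observed, n_left_censored=2, censor_value=3.5)

print(mu_uncensored, mu_censored)
```

Without censoring, the grid search lands on the sample mean, as the derivation in the text says it should. Adding two responses known only to lie below 3.5 pulls the estimate downward, without pretending those responses were observed at any particular value.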

The Response Must Be Continuous and Observable (Part 2)

Many NDT techniques provide only a binary, hit/miss outcome. Signal strength is often misleading because a small crack can weep penetrant and appear larger, and a large crack can still be so tight as to prevent penetrant entry. OLS cannot be used with binary data. (More specifically, it can be used, and often is, but wrongly.) Just because software can be coerced to provide an answer does not mean the answer is meaningful.

One (creative) technique to describe binary response is to group the data and analyze the grouped averages. To illustrate how ill advised this idea is, the authors will use it with continuous data since its deficiencies are more obvious there.

In Figure 4a, the data are in groups of five observations each. The six sample means and six standard deviations were calculated. To estimate the underlying Y = f(X) relationship, the sample means were connected. To estimate the lower (and upper) bound, normal distributions were drawn, centered at the sample means and based on the sample standard deviations, assuming that the data had a normal distribution.

All of the mathematical manipulations in technique 1 are valid, but this approach begs several questions:
• Is the true underlying relationship really as crooked as it appears?
• Are the six standard deviations really different, or do they result entirely by chance, and would another random sample of 30 look rather different?
• Is it the best that can be done? It requires estimating 12 parameters (6 sample means and 6 standard deviations) and tacitly assumes that the observed behavior is the actual behavior. Might there be sufficient reason to suspect the true relation is much simpler, like a simple straight line?

A parametric model, as in Figure 4b, produces a more believable description of the underlying reality and does not tempt the unwary into trying to explain group differences that are only random. The parametric model is more efficient, requiring only three parameters, as compared with 12. Thus it provides more (30 – 3 = 27) degrees of freedom for estimating the standard deviation of the underlying variability (“error”).

Figure 4. Inefficient versus efficient model building: (a) technique 1, estimate the mean behavior by connecting the group means, and estimate the bounds by connecting the points at group mean ± 2s(sample); and (b) technique 2, a parametric model can use all the data collectively, not locally, to provide a better overall description of the data. [Axes: dependent variable, Y, versus independent variable, X.]

There are two sets of bounds on the regression plot, Figure 4b. The innermost bounds are the confidence bounds on the mean line. The confidence bounds are expected to contain the true relationship (red line) in 95 of 100 nominally identical experiments. The outer bounds are prediction bounds on the individuals. The next future single observation is expected to fall within the prediction bounds 95% of the time. It is important to note that this does not mean that 95 of the next 100 observations will fall within the prediction bounds. It means that of 100 similar, nominally identical, experiments, the next single observation in 95 of the experiments would likely be contained within that experiment’s prediction bounds.

If the probability that the next single observation will be within the prediction bounds is 0.95, then the probability that the next two observations will both be within the bounds is 0.95 × 0.95 = 0.9025, so the 95% prediction bounds for a single future observation are also the approximate 90% bounds for the next two observations. Confidence bounds describe how well the model captures the true (X,Y) relationship. The prediction bounds describe the anticipated behavior of the next single observation.

Attempts at grouping binary data suffer from all these shortcomings, especially treating random behavior as if it were meaningful. A binary analog of the parametric model technique is needed. This idea can be generalized as a generalized linear model (GLM), where the link is based on the probability of observing Y rather than observing Y directly. A linear model links Y with X directly, that is, f(X) = g(Y) = Y, where g(Y) is the identity function. In a GLM, for example, f(X) = g(Y) = log(p / [1 – p]). A large number of grouped means is no longer needed, and only two model parameters are necessary to estimate, as shown in Equation 1:

$$\mathrm{POD}(a) = \frac{\exp(f[X])}{1 + \exp(f[X])} \qquad (1)$$

where $f(X) = b_0 + b_1 X$. The model parameters are again chosen to maximize the probability of observing what was actually observed.

Consider real data, found as example 3 hm.csv in MIL-HDBK-1823A (2009), which focuses on techniques to produce POD versus size curves based on experimental data (DOD, 2009). The completed analysis is shown in Figure 5 (DOD, 2009). The confidence bounds are constructed using extremes of feasible likelihood. Details are in MIL-HDBK-1823A, Appendix G.

Figure 5. Probability of detection (POD) as a function of target size, supported by hit/miss data. [Axes: POD(a) versus size, a (mm); a50, a90, and a90/95 are marked.]

The Variance Must be Homoscedastic

The variance (data scatter about the line or model) must be approximately constant because the fitting criterion is minimized summed squared differences between the model and each observation. If the variance were, say, small at one extreme and large at the other, the smaller observations would be slighted so as to do a better job with the larger ones where the deviations are proportionally larger. The resulting parameter estimates would not be useful. Sometimes, it is possible to transform the observations (log[y], for example, or a Box-Cox transformation) to stabilize the variance so OLS is viable. Sometimes the transform does stabilize the variance but also destroys the simplicity of an underlying (X,Y) relationship. Sometimes, as with errors that are both additive and multiplicative, no transformation will be effective.

The Model Must be Linear in the Parameters

Statisticians and engineers use the term “linear model” differently. In engineering, a system is linear if the output is a linear function of the input. In statistics, a model is linear if it is linear in the model parameters. So, for example, $y = b_0 + b_1 x^2 + b_2 \sin(x)$ is a linear statistical model, but $y = b_0 + b_1 e^{-b_2 x}$ is not. In fact, Equation 1 is a non-linear model. Not meeting the requirement for being linear in the model parameters means that OLS cannot be used, but as with GLM, that means only that some other technique is required. There is an entire area of applied statistics dedicated to nonlinear regression (Bates and Watts, 1988). As before, the authors’ purpose is not to provide a précis on mathematical statistics, but rather to call the reader’s attention to standard statistical methods that are well suited for solving many NDT engineering problems.
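Equation 1 and its logit link are short enough to write out. The sketch below evaluates the POD curve and inverts the link to find the size at a given POD. The coefficient values are hypothetical, chosen only for illustration and not fitted to the MIL-HDBK-1823A data; `pod` and `size_at_pod` are names invented here.

```python
import math


def pod(a, b0, b1):
    """Equation 1: POD(a) = exp(f) / (1 + exp(f)) with f = b0 + b1 * a."""
    f = b0 + b1 * a
    return 1.0 / (1.0 + math.exp(-f))  # algebraically equal to exp(f) / (1 + exp(f))


def size_at_pod(p, b0, b1):
    """Invert the logit link: log(p / (1 - p)) = b0 + b1 * a, solved for a."""
    return (math.log(p / (1.0 - p)) - b0) / b1


# Hypothetical coefficients, for illustration only (not fitted to any dataset).
b0, b1 = -4.0, 1.0

a50 = size_at_pod(0.50, b0, b1)
a90 = size_at_pod(0.90, b0, b1)

print(f"a50 = {a50:.3f}, POD(a50) = {pod(a50, b0, b1):.3f}")
print(f"a90 = {a90:.3f}, POD(a90) = {pod(a90, b0, b1):.3f}")
```

The model is linear in b0 and b1 on the logit scale, yet POD itself is a non-linear function of them, which is why OLS does not apply and the parameters are found by maximizing likelihood. The sketch gives point values only: a90/95 also requires the confidence bound on the curve, which is not computed here.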

The Observations Must be Uncorrelated

Most observations are uncorrelated but not all. Some are autocorrelated: related with their neighbors in time so more recent observations are likely to be similar. An obvious example is weather data: tomorrow is more likely to look like today than it will look like last month. But what about NDT data collected hourly? Are early morning data somehow different from data collected in late afternoon? How is that known?

Spatial autocorrelation should not be overlooked either. Consider a component’s random surface topography. Deviations from print may be random, but are more likely to be self-similar when they are proximal rather than distal. There are statistical techniques for separating the random component of these deviations from the systemic, techniques that are useful for product improvement. Again, the authors’ purpose is not a discussion of mathematical statistics, only to suggest through references areas worth further engineering study.

The Errors Must be Normal

This requirement means binary data are not suited for OLS analysis because, not only are the errors not normal, the variance is not constant. This is because the error variance depends on the mean: Var(p) = p(1 – p), which means it is greatest when p = 0.5 and diminishes as p approaches either zero or one. Software may be coerced to produce an answer, but it will be wrong. Here again, the purpose is not a discussion of mathematical statistics, only to suggest through references areas worth further engineering study.

The Model Must Look Like the Data

Perhaps the first rule of statistical data analysis should be to plot the data. It would have prevented using a normal distribution erroneously in the earlier example, and it can avoid considerable wasted time and misdirection in most circumstances. The plot is necessary, but not sufficient, because the plot itself requires scrutiny, sometimes very close scrutiny.

Consider the following real example. Look carefully at Figure 6. The model says that POD is nearly 100% for sizes greater than 12.7 mm (0.5 in.), but seven cracks larger than 12.7 mm (0.5 in.) were missed. These seven misses would be exceedingly unlikely if the true POD were greater than 95% as the model indicates. The model says the POD is nearly perfect, but the data say otherwise. It is a fact that if the data disagree with the model, then the model is wrong. Something is clearly wrong.

Figure 6. Preliminary analysis of Nondestructive Testing Information Analysis Center 1997 A9002 3-L dataset showing some problems that arise from data plots. POD = probability of detection. [Axes: POD(a) versus size, a (mm). Annotations on the plot: “Clue that something is wrong”; “These misses would be very improbable if the POD model were correct and the local POD > 0.95”; “Nonexistent a90 and a90/95.”]

What about the model could be wrong? This model, as with many POD models, assumes that POD approaches zero for very small cracks, and approaches one for large cracks. This is a reasonable assumption in nearly every NDT situation, but not all. In some situations the POD may never reach zero because of excessive noise and false positives. In other situations, like this one, the POD never reaches one, perhaps due to inaccessibility of the inspection site, malfunction of the inspection apparatus, or lack of attentiveness by the inspector.
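How unlikely are seven misses if the POD above 12.7 mm really were at least 0.95? A binomial tail gives a rough answer. The article reports the seven misses but not, in this passage, how many large cracks were inspected, so the total of 40 below is purely an assumption; the calculation also assumes that each crack is missed independently of the others.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical number of inspected cracks larger than 12.7 mm (an assumption).
n_large_cracks = 40
# Miss probability implied by a POD of exactly 0.95; a higher POD gives a
# smaller miss probability and an even smaller tail.
miss_probability = 0.05

p_seven_or_more = prob_at_least(7, n_large_cracks, miss_probability)
print(f"P(7 or more misses in {n_large_cracks}) = {p_seven_or_more:.4f}")
```

Under these assumptions the chance of seeing seven or more misses is well under 1%. That is a numerical version of the statement that the data and the model disagree.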

Categorical Data

So far, only the requirements on the response, Y, have been considered, but there are other, underappreciated, ignored, or unknown assumptions being made about the independent observations, X. One requirement of X for OLS regression to be meaningful is that it be continuous and observable. However, many things cannot be described by a position on the number line. Consider, for example, different eddy current probe manufacturers. How could they be placed on a real number line? They cannot, and attempts to do it anyway are part of the ad hoc analysis problem.

Suppose the data involved eddy current probes from different manufacturers and needed to be included in your regression analysis to evaluate manufacturer performance. A parameter could be created called Mfg and assigned values of 1, 2, and 3. The not altogether obvious problem is that manufacturer 3 has been defined to have three times the influence of manufacturer 1. It has been further stipulated that the difference between manufacturer 1 and manufacturer 2 is the same as between manufacturer 2 and manufacturer 3. Therefore, any analysis would be hopelessly wrong because the purpose of the analysis is to determine the relative performances of each manufacturer, and that has been made impossible. Remember, just because the assumptions being made are unknown does not mean they are not being made.

Such data are categorical and cannot be analyzed simply by assigning values. One technique (there are others) is to define two Mfg parameters, Mfg1 and Mfg2, like in Table 1, so that each appears in the model only when response is from that manufacturer’s probe. The regression would determine the coefficients associated with each Mfg, with manufacturer 1 providing the baseline so coefficients would be the difference between manufacturer 1 and the others. With four probe manufacturers this would require three Mfg parameters, and so on. If R is used, this coding will be handled when a parameter is defined as being categorical. Refer to a recent work for a great place to start to learn about analyzing categorical data (Agresti, 2002).

TABLE 1
Example binary categorical data

Manufacturer    Mfg1    Mfg2
1               0       0
2               0       1
3               1       0

The Most Common Mistake Engineers Make with Probability

The most common mistake engineers make with probability is multiplying probabilities. Here is an example: POD improvement through redundant inspection. The “logic” of the purported improvement is something like this:
• If POD(single inspection) = 0.9, then probability of miss (POM) = 1 – POD = 0.1.
• The probability of missing something twice is POM × POM.
• Thus, the POD(double inspection) = 1 – POM² = 1 – 0.01 = 0.99.

As a result, the redundant inspection has 99% POD compared with the original, single inspection, POD of 90%.

To see why looking at the same thing twice does not change much, consider determining the fraction of red apples and green apples in a barrel by inspecting a sample of them. A selected apple is examined and pronounced to be red. The same apple is given to Thomas, who concurs that it is red and passes the apple to Richard, who also agrees that the apple is red. Now there are three opinions on the color of one apple. How much more is known of the fraction of red apples in the barrel than after the first examination? Nothing. Richard gives the apple to Harold, who says that it is green, not red. Now, how much more is known of the fraction of red apples in the barrel than after the first examination? Still nothing. However, more is known about the quality of the inspection process. Though nothing has been learned about the fraction of red and green apples in the barrel, the inspection repeatability leaves something to be desired.

What does this mean? Repeated inspections of the same thing are not very informative, and the information they do provide concerns the inspection itself, not what is being inspected. This does not mean that repeated inspections should not be carried out. It means that repeated inspections will help understand the inspection process better, but they will not improve the probability of detecting what is being looked for.

As another example consider two inspections, A and B. The probability of finding a crack with either inspection A or B is P(A or B) = P(A) + P(B) – P(A and B). To see this, look at Figure 7. In both figures, the area representing “found by inspection A” and “found by inspection B” is counted twice, and thus must be subtracted from the sum of P(A) + P(B). So how then is P(A and B) calculated? Two events, A and B, are independent if the outcome of one has no influence on the outcome of the other. In that case, and only in that case, can one “multiply probabilities” so that P(A and B) = P(A) × P(B).
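The formula P(A or B) = P(A) + P(B) – P(A and B) makes the role of independence easy to see numerically. The sketch below evaluates it for the 90% single-inspection POD used above under two limiting assumptions about P(A and B): full independence, and complete dependence (the second inspection finds exactly what the first one finds). Real inspections would typically fall somewhere in between; the function name `pod_either` is hypothetical.

```python
def pod_either(p_a, p_b, p_both):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both


p_single = 0.9

# Independent inspections: only in this case is P(A and B) = P(A) * P(B).
independent = pod_either(p_single, p_single, p_single * p_single)

# Completely dependent inspections: B finds exactly what A finds,
# so P(A and B) = P(A).
fully_dependent = pod_either(p_single, p_single, p_single)

print(f"independent:     {independent:.2f}")
print(f"fully dependent: {fully_dependent:.2f}")
```

The advertised 99% appears only in the independent case. When the second look repeats the first, the combined POD stays at 90%.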

What is wrong with multiplying probabilities? Nothing, if the events are independent. By chance, events are often independent, so chance protects the ignorant from the consequences of the indiscriminate multiplying of probabilities, but in many situations events are not independent. However, when the events are not independent, finding P(A and B) can be inconvenient, even tedious, and may require counting the fraction of times A and B occur together. Computational convenience is no substitute for veracity.

Figure 7b shows complete correlation between inspections. What was found (or missed) by A was found (or missed) by B. Inspecting twice does not change probability of detection. In general, any non-zero correlation means the inspections are not independent, and the “multiplying probabilities” calculation is wrong.

Figure 7. Venn diagrams showing: (a) independent, and (b) non-independent inspections. In (a), the Venn diagram shows crack found by inspection A, found by inspection B, found by both A and B, and missed by both. Note that area A and B is counted twice. In (b), inspections A and B are not independent.

Common Errors in Evaluating Nondestructive Testing Sizing Capability

There have been some recent efforts to define and demonstrate a complete process for evaluating sizing capability, specifically addressing discontinuities in welds and corrosion in aircraft structures (Ducharme et al., 2012; Forsyth and Lepine, 2002; McCullagh and Nelder, 1989; Nordtest, 1998). However, there are some outstanding issues with the current practice for the quantitative evaluation of sizing capability with respect to NDT technique evaluation. One metric frequently cited is the calculation of the 95% safety limit against undersizing (LUS) bound for quantifying sizing performance for discontinuities in welds (Ducharme et al., 2012; McCullagh and Nelder, 1989; Nordtest, 1998). However, there are some important assumptions, like linearity in the response and constant variance in sizing error with changes in discontinuity size, that should be addressed before using this measure. In addition, the simplistic character of the bound from an OLS fit does not adequately address the true variation of the bound with the varying distribution of discontinuities and limited sample numbers.

A sizing example from eddy current inspection for surface cracks in metallic components is presented in Figure 8a. A scatter plot of the sizing error as a function of crack depth is shown in Figure 8b. Characterization error for depth sizing was evaluated with a linear model using MLE. The characterization error results shown in Figure 8b include a linear model fit (solid blue line) with corresponding 95% prediction bounds (black dash-dot line). The common practice of fitting an OLS with the assumption that sizing performance does not vary with varying crack depth is also included in the plot (red dashed lines). Clearly, this simple OLS fit is not appropriate. However, the linear model fit appears to be adequate for the censored set of depths presented here and addresses the clear dependency as a function of varying depth. Since there is a significant change in the lower bound for error in the crack depth estimates as a function of varying size, the results are dependent on discontinuity size. This demonstrates the need for care when attempting to report a single value that defines the entire lower bound for the safety LUS.

Operators should never mandate generating such numbers for sizing capability evaluation when they are often not appropriate for the data.

Figure 8. Characterization error in depth sizing: (a) eddy current sizing results for crack depth (left censoring was applied for small cracks with weak signals below the detection threshold); (b) characterization error for censored inversion results for crack depth. Plots include a linear model fit (black solid line) with corresponding prediction bounds (blue dash-dot line). Red dashed lines are additional bounds based on an ordinary least-squares fit and assumption that sizing performance does not vary with varying crack depth. [Axes: (a) depth, estimated (mm), versus depth, known (mm); (b) characterization error, depth (mm), versus depth, known (mm).]

Conclusion and Recommended Practice

The purpose of statistical modeling is not simply to produce a description of the data (a French curve can do that), but to produce a model with known statistical properties. That is the reason why so many of the ad hoc “statistical” models are worthless: they may appear to describe the immediate data but they cannot be relied on to predict anything, nor can their purported confidence bounds be believed since their statistical properties are completely unknown. Cloaking a dubious calculation in impressive-sounding statistical raiment changes nothing but the potential that it will impress the gullible and mislead the uninformed.

The current situation of sloppy engineering statistics is a consequence of statistical ignorance, and the only remedy for ignorance is study. The authors do not advocate that every engineering project have a statistician as a team member because, as often as not, the statistician is as ignorant of physics and engineering as the engineer is of statistics, which is not a situation conducive to effective communication and collaboration. What they do recommend is that the engineer learns what the statistician has learned, since only the engineer understands the engineering problem. (The authors are not suggesting the engineer shun the statistician because frequent statistical consultation can greatly facilitate learning.)

Where to begin? If having a statistician as a team member is not the answer, then the engineer must learn statistics, and this is not a quick fix. What they do recommend is study. Study is hard work. Understanding physics required serious study, and no less is required to understand statistics. Please remember that it is impossible to learn physics in a month, and likewise statistics cannot be learned any faster.

The place to begin is with an understanding of mathematical statistics, which is to applied statistics what physics is to engineering, not simply finding a useful-looking equation and blindly using it. Statistics is not mathematics. Mathematics is the language of statistics as it is for engineering, but mathematics is neither engineering nor statistics. An engineer can be rather accomplished as a mathematician and yet be completely ignorant of mathematical statistics, and understanding mathematical statistics is necessary for engineering practice.

The authors recommend two texts, each with a different purpose. Begin with Meeker and Escobar, the single best statistical reference for an engineer practicing in the field (Meeker and Escobar, 1998). Especially useful are the appendices summarizing the salient results from mathematical statistics. For greater understanding of Meeker and Escobar’s summaries, the authors recommend Casella and Berger (Casella and Berger, 2001). Taken together, and with honest effort, they can provide the requisite knowledge.

One final comment: the authors are all engineers. Their purpose here is to call attention to the statistical ignorance that is a pall on the practice of engineering, with the hope of suggesting resources to remedy it.

ACKNOWLEDGMENTS
The authors would like to acknowledge ancillary support from the Air Force Research Lab (AFRL) under an SBIR Phase II Contract FA8650-13-C-5180 with Victor Technologies, LLC. They are especially grateful to Jeremy Knopp and Eric Lindgren, of the AFRL/RXCA Wright-Patterson AFB, for their insightful comments and suggestions. R, an open-source package for statistical computing, was used for the statistical analyses in the paper.

AUTHORS
Charles Annis: Statistical Engineering, Palm Beach Gardens, Florida 33418.
John C. Aldrin: Computational Tools, Gurnee, Illinois 60031.
Harold A. Sabbagh: Victor Technologies, LLC, Bloomington, Indiana 47401.

REFERENCES
Agresti, A., Categorical Data Analysis, 2nd ed., Wiley, Hoboken, New Jersey, 2002.
Annis, C., “Probabilistic Life Prediction Isn’t as Easy as It Looks,” ASTM STP1450, ASTM Symposium on Probabilistic Aspects of Life Prediction, ASTM International, West Conshohocken, Pennsylvania, 2003.
Bates, D.M., and D.G. Watts, Nonlinear Regression Analysis and Its Applications, Wiley, New York, 1988.
Casella, G., and R.L. Berger, Statistical Inference, 2nd ed., Cengage Learning, Independence, Kentucky, 2001.
Chatfield, C., The Analysis of Time Series, 4th ed., Chapman and Hall, London, England, 1989.
Cressie, N., Statistics for Spatial Data, Wiley, New York, 1993.
Cressie, N., and C.K. Wikle, Statistics for Spatio-temporal Data, Wiley, Hoboken, New Jersey, 2011.
Diligent, Ducharme, Feuilly, Jacques, Lepine, Piché, Rigault, and Strijdonk, “Automated Ultrasonic Phased Array Inspection of Fatigue Sensitive Riser Girth Welds with a Weld Overlay Layer of Corrosive Resistant Alloy (CRA),” NDT, September 2012.
DOD, MIL-HDBK-1823A, Department of Defense Handbook, Nondestructive Evaluation System Reliability Assessment, 7 April 2009.
Forsyth, D.S., “Development and Verification of NDI for Corrosion Detection and Quantification in Airframe Structures,” AIP Conference Proceedings, Vol. 615, 2002, pp. 1787–1791.
Frank, I.E., and J.H. Friedman, “A Statistical View of some Chemometrics Regression Tools,” Technometrics, Vol. 35, No. 2, 1993, p. 110.
Harrell, F.E., Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, Springer, New York, 2001.
McCullagh, P., and J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman and Hall, London, England, 1989.
Meeker, W.Q., and L.A. Escobar, Statistical Methods for Reliability Data, Wiley, New York, 1998.
Nordtest, “Guidelines for NDE Reliability Determination and Description,” Nordtest Technical Report 394, 1998.
Sakia, R.M., “The Box-Cox Transformation Technique: A Review,” The Statistician, Vol. 41, No. 2, 1992, pp. 169–178.
Venables, W.N., and B.D. Ripley, Modern Applied Statistics with S (Statistics and Computing), 4th ed., Springer, New York, 2002.