
From Materials Evaluation, Vol. 73, No. 1, pp: 55–61.
Copyright © 2015 The American Society for Nondestructive Testing, Inc.

Developments in
Probability of
Detection Modeling
and Simulation
Studies
by Jeremy S. Knopp, Frank Ciarallo, and
Ramana V. Grandhi

Photo credit: William Meeker,
Iowa State University

The purpose of this paper is to review recent developments in probability of detection (POD) modeling and demonstrate how simulation studies can be used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage.

Review of Recent Advances
The historical development of POD is well documented in previous papers
and will not be discussed in this paper (Berens, 1989; Knopp et al., 2012;
Olin and Meeker, 1996). In the last decade, the major development was the
use of the likelihood ratio technique for confidence bound calculations
pertaining to hit/miss data (Annis and Knopp, 2007; Harding and Hugo,
2003; Spencer, 2007). This was incorporated into the guidance for POD
studies provided in MIL-HDBK-1823A (DOD, 2010).
The other areas where significant advances have been made include the
following:
• Exposition on the distinction between uncertainty and variability in nondestructive testing (NDT) (Li et al., 2012).
• Use of Markov chain Monte Carlo (MCMC) simulation for confidence bounds (Knopp and Zeng, 2013).
• Three- and four-parameter POD models (Knopp and Zeng, 2013; Moore and Spencer, 2015).
• Box-Cox transformations to mitigate violations of homoscedasticity (Knopp et al., 2011).
• Bootstrapping for confidence bound calculations (Knopp et al., 2012).
• Nonparametric techniques for POD modeling (Safizadeh et al., 2004; Spencer, 2014).
• Sample size determination (Annis, 2014).
• Bayesian design of experiments (Koh and Meeker, 1998).


Uncertainty and Variability
It may be argued that the first item on the list is not an advance; however, it represents a very clear exposition and reminder of what a POD curve and its associated confidence bounds actually provide (Li et al., 2012). Figure 1 shows the conventional two-parameter model. The quantities of interest are a50, a90, and a90/95: the mean 50% POD is a50, the mean 90% POD is a90, and the upper 95% confidence bound on a90 is known as a90/95. A common error in interpreting a POD analysis is to think of the confidence bounds as pertaining to variability in the inspection process. The confidence bounds only pertain to the error due to sampling for a single experimental run. Therefore, it is a serious error to assume that 95% of a90 values in future experiments will be within the confidence bounds computed from a single experiment. If one is interested in where 95% of a90 values in future experiments will lie, the appropriate interval is a tolerance interval. The authors of that work point out that the traditional way of performing a POD study determines the mean POD, which averages out variability from the inspection process; the distinction between uncertainty and variability is discussed elsewhere (Li et al., 2012). The technical details also appear in a tutorial on tolerance intervals in the context of linear regression (De Gryze et al., 2007). The concept of probability of coverage discussed in this paper looks at this issue from another perspective.

Figure 1. Conventional two-parameter probability of detection (POD) model, showing POD versus discontinuity size a (mm), with a50, a90, and the upper confidence bound that defines a90/95.

Three- and Four-parameter Models and Markov Chain Monte Carlo
Three- and four-parameter POD models have been proposed to address limitations of the conventional two-parameter models (Knopp and Zeng, 2013; Moore and Spencer, 2015). The two-parameter model forces the POD curve to zero as the discontinuity size approaches zero, and to one as the discontinuity size approaches infinity. There are many data sets that do not support a POD of one for any discontinuity size, and so a modified model that includes lower and upper asymptotes was developed to address this issue (Moore and Spencer, 2014). The conventional model can be modified by adding additional parameters. Consider Equations 1 and 2, which show a four-parameter model for the logit and probit links, respectively:

(1)   p_i = \alpha + (\beta - \alpha)\,\frac{\exp(b_0 + b_1 \log a_i)}{1 + \exp(b_0 + b_1 \log a_i)}   (logit)

(2)   p_i = \alpha + (\beta - \alpha)\,\Phi(b_0 + b_1 \log a_i)   (probit)

The α parameter represents the fact that there is a finite probability of detecting cracks well below the intended size, and it can be thought of as a false call rate. A recent work points out that even though the POD when no discontinuity is present might (and probably does) equal something other than zero, it is not POD per se, but leads to a function that models an independent false call process on detections (Spencer, 2014). There is also a finite probability that very large cracks will not be detected, and an upper asymptote can be estimated for this; the β parameter represents this behavior at large discontinuity sizes. Historically, the logit and probit links have been found to fit POD data well.
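To make Equations 1 and 2 concrete, the following Python sketch evaluates the four-parameter POD curve for either link. The numerical parameter values at the bottom are illustrative assumptions only, not estimates from any of the studies cited above.

```python
import numpy as np
from scipy.stats import norm

def pod_four_parameter(a, b0, b1, alpha, beta, link="logit"):
    """Four-parameter POD model of Equations 1 and 2: alpha is the lower
    asymptote (false call rate), beta the upper asymptote, and b0, b1 the
    intercept and slope on log discontinuity size."""
    eta = b0 + b1 * np.log(a)
    if link == "logit":
        base = 1.0 / (1.0 + np.exp(-eta))   # logistic CDF
    else:
        base = norm.cdf(eta)                # probit link
    return alpha + (beta - alpha) * base

# Illustrative (assumed) parameter values
a = np.linspace(0.1, 16.0, 200)             # discontinuity size, mm
pod = pod_four_parameter(a, b0=-3.0, b1=2.5, alpha=0.02, beta=0.95)
```

Setting alpha = 0 and beta = 1 recovers the conventional two-parameter model of Figure 1.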

Depending on whether α and β are included in the model, there are four candidate models:
• Two-parameter model that does not include α or β.
• Three-parameter model with a lower asymptote, α.
• Three-parameter model with an upper asymptote, β.
• Four-parameter model that includes both α and β.
With either the logit or probit link available for each, there are a total of eight possible models. The question of which model is appropriate for a given data set can be answered by using the Bayes factor approach described in prior work (Kass and Raftery, 1995; Knopp and Zeng, 2013). In Figure 2a, the POD curve approaches one for very large discontinuities but has an α parameter that needs to be estimated as the discontinuity size approaches zero. Figure 2b shows the β parameter that needs to be estimated to represent detection capability at very large discontinuity sizes. Figure 2c shows a four-parameter model that requires both α and β to be estimated.

Figure 2. Probability of detection (POD) model options: (a) three-parameter with lower bound; (b) three-parameter with upper bound; (c) four-parameter with lower and upper bounds.

Recently, an alternative technique of computing confidence intervals for POD models that include lower and upper asymptotes via MCMC was introduced (Knopp and Zeng, 2013). MCMC techniques are very similar to Monte Carlo techniques, with one important distinction: the samples in Monte Carlo are statistically independent, which means an individual sample does not depend on a previous sample, whereas in MCMC the samples are correlated with each other. MCMC has proven to be an effective way to compute the multidimensional integrals that occur in Bayesian calculations. Since MCMC is the computational engine that enables Bayesian analysis, which is necessary for model-assisted POD, computing confidence bounds with non-informative priors is a graceful first step to introducing Bayesian techniques.
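As a minimal illustration of the MCMC approach, the sketch below applies a random-walk Metropolis sampler to a two-parameter hit/miss logit model with flat (non-informative) priors and reads an upper bound on a90 from the posterior samples. The hit/miss data, proposal step size, burn-in, and chain length are all assumptions made for illustration; an analysis of the kind cited above would use the actual inspection data, the asymptote parameters, and convergence diagnostics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hit/miss data: flaw sizes (mm) and detections (1 = hit)
a = np.array([0.5, 0.7, 0.9, 1.1, 1.4, 1.8, 2.3, 2.9, 3.6, 4.5, 5.5, 7.0])
y = np.array([0,   0,   1,   0,   0,   1,   0,   1,   1,   1,   1,   1])

def log_likelihood(theta):
    """Bernoulli log likelihood of the two-parameter logit POD model."""
    b0, b1 = theta
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * np.log(a))))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Random-walk Metropolis; with flat priors the posterior is the likelihood
n_iter, step = 20000, 0.4
samples = np.empty((n_iter, 2))
theta = np.array([0.0, 1.0])
logp = log_likelihood(theta)
for i in range(n_iter):
    proposal = theta + step * rng.standard_normal(2)
    logp_new = log_likelihood(proposal)
    if np.log(rng.random()) < logp_new - logp:
        theta, logp = proposal, logp_new
    samples[i] = theta

post = samples[5000:]                                   # discard burn-in
a90 = np.exp((np.log(9.0) - post[:, 0]) / post[:, 1])   # logit(0.9) = log 9
print("posterior median a90:", np.median(a90))
print("95th percentile of a90 (analogous to a90/95):", np.percentile(a90, 95))
```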
Bootstrapping
Current guidance for POD studies assumes that a linear relationship exists between the explanatory variable, such as discontinuity size, which is commonly designated with an "a" in POD literature, and the measurement response, â. Typically, a logarithmic transform will remedy cases where the linear relationship is not established, but this is not always the case. Models with additional complexity beyond a linear model are sometimes necessary for proper analysis of â data. This was the case in the analysis of data from an inspection of subsurface cracks around fastener sites using eddy current testing (Knopp et al., 2012). The difficulty in these cases is that the procedures for confidence bound calculations are not developed. Recently, a flexible approach called bootstrapping was demonstrated. Bootstrapping is essentially sampling with replacement; it is a very easy technique to implement for more complicated models and is very useful for model-assisted POD.
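The following sketch shows the basic bootstrap recipe for an â-versus-a analysis: resample the (a, â) pairs with replacement, refit the model, recompute a90, and take a percentile of the resulting values as the upper bound. The synthetic data, the 0.6 decision threshold, and the use of a simple linear fit with normally distributed error are assumptions for illustration; the work cited above applies the same resampling idea to more complicated response models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical a-hat versus a data (assumed for illustration)
a = np.linspace(0.5, 8.0, 60)
ahat = 0.05 + 0.3 * a + rng.normal(0.0, 0.08, a.size)
threshold = 0.6                               # assumed decision threshold

def a90_from_fit(a_s, ahat_s):
    """Fit ahat = c0 + c1*a by least squares and solve POD(a) = 0.9."""
    c1, c0 = np.polyfit(a_s, ahat_s, 1)
    resid_sd = np.std(ahat_s - (c0 + c1 * a_s), ddof=2)
    # POD(a) = P(ahat > threshold) = 0.9 when the mean response sits
    # 1.2816 standard deviations above the threshold (z for 90%)
    return (threshold + 1.2816 * resid_sd - c0) / c1

boot_a90 = []
for _ in range(2000):
    idx = rng.integers(0, a.size, a.size)     # resample pairs with replacement
    boot_a90.append(a90_from_fit(a[idx], ahat[idx]))

print("point estimate a90:", a90_from_fit(a, ahat))
print("bootstrap 95% upper bound on a90 (a90/95):", np.percentile(boot_a90, 95))
```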

Another development is the use of the Box-Cox transformation to mitigate violations of homoscedasticity in â data analysis.

Box-Cox Transformation
It is always advantageous to use the measured response data for POD evaluation, since there is more information contained in that form than in hit/miss data; however, real inspection data often violate core assumptions required to use a POD model fit. Homoscedasticity means that the scatter in the observations is constant over the discontinuity size range. For cases where there is a relationship between the mean response and the variance, the Box-Cox transformation is used to stabilize the variance. This technique assumes that the relationship between the error variance, s_i^2, and the mean response, μ_i, can be described with a power transformation on â in the form of Equation 3. The new regression model in Equation 4 includes the additional λ parameter, which also needs to be estimated:

(3)   \hat{a}' = \hat{a}^{\lambda}

(4)   \hat{a}_i^{\lambda} = \beta_0 + \beta_1 a_i + \varepsilon_i

The technique described in an outside work was followed exactly (Kutner et al., 2004). The â observations were first standardized so that the order of magnitude of the error sum of squares did not depend on the value of λ. The standardized observations were:

(5)   g_i = \frac{\hat{a}_i^{\lambda} - 1}{\lambda c^{\lambda - 1}}, \quad \lambda \neq 0

(6)   g_i = c \ln \hat{a}_i, \quad \lambda = 0

where c = (∏ â_i)^{1/n}, which happens to be the geometric mean of the observations, and n is the total number of observations. Once these standardized observations are obtained, they are regressed on a, which in this case is crack length, and the sum of squares error (SSE) is obtained. A numerical search procedure was set up to estimate λ: the optimization problem is formulated such that the objective is to minimize SSE with λ as the single parameter to be adjusted. An example of how the Box-Cox transform is used was presented in the context of an eddy current inspection of cracks around fastener sites in an aircraft structure (Knopp et al., 2011).
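A minimal numerical search for λ, following the standardized observations of Equations 5 and 6, might look like the following. The synthetic heteroscedastic data and the grid of candidate λ values are assumptions for illustration; only the transformation and the SSE criterion come from the procedure described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical a-hat data whose scatter grows with the mean (assumed)
a = np.linspace(0.5, 8.0, 80)
ahat = (0.05 + 0.3 * a) * np.exp(rng.normal(0.0, 0.15, a.size))

def sse_for_lambda(lam, a, ahat):
    """SSE of the regression of the standardized g_i on a (Equations 5 and 6)."""
    c = np.exp(np.mean(np.log(ahat)))                     # geometric mean
    if abs(lam) < 1e-8:
        g = c * np.log(ahat)                              # Equation 6
    else:
        g = (ahat**lam - 1.0) / (lam * c**(lam - 1.0))    # Equation 5
    slope, intercept = np.polyfit(a, g, 1)
    resid = g - (intercept + slope * a)
    return np.sum(resid**2)

lambdas = np.linspace(-2.0, 2.0, 81)
sse = [sse_for_lambda(lam, a, ahat) for lam in lambdas]
best = lambdas[int(np.argmin(sse))]
print("lambda minimizing SSE:", best)   # transform the response as ahat**best
```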
Nonparametric Models
The POD model described in MIL-HDBK-1823A assumes an S-shaped curve described by two parameters (DOD, 2010). The three-parameter and four-parameter models discussed earlier are modified versions of that model. An entirely different idea is to not assume any particular model form, which is referred to in the literature as a nonparametric model. A nonparametric model was proposed for POD with the only assumption being that the POD function is monotonically increasing with respect to discontinuity size (Spencer, 2014). This model is useful for many reasons. First, it can be used as a screening model by comparing the form of the nonparametric model with the selected parametric model. For example, if the nonparametric model closely follows a three-parameter model with an upper asymptote, chances are that the three-parameter model is the best fit. It is generally useful to see what type of model form the data dictate before forcing a parametric model on them.

Sample Size
One of the more common questions asked about POD studies is how many samples will be required. Both the sample size and the distribution of the samples will affect the POD evaluation. In MIL-HDBK-1823A, it is recommended that there be at least 40 samples when â signal response data are used and 60 samples for hit/miss data (DOD, 2010). The question of the range of discontinuity sizes and the distribution of discontinuity sizes is not discussed in MIL-HDBK-1823A, and it has not been examined extensively in the literature except in a few cases (Berens and Hovey, 1985; Safizadeh et al., 2004). Recently, this question was investigated for hit/miss data via simulation, looking at the discontinuity size distribution and the effects of moving the center of the sample distribution relative to the true a50 value (Annis, 2014). The recommendations from this study include using a uniform distribution of discontinuity sizes and a range that spans from 3 to 97% POD. The number of specimens agrees with MIL-HDBK-1823A in that a minimum of 60 specimens should be used.

Bayesian Design of Experiments
A Bayesian approach to planning hit/miss experiments has also been presented (Koh and Meeker, 1998). This allows engineers to use any prior information that may be known about a POD curve to assist in designing the experiment. The conclusion of this work was that optimal test plans developed purely from Bayesian techniques may not be practical, but they can be used to develop a compromise between the optimal test plan and a uniform distribution of discontinuity sizes. Another conclusion was that the recommendation of 60 observations for hit/miss analysis performs well for estimating a50, and may be slightly sub-optimal for estimating a90, but the uniform distribution recommendation for hit/miss experiments still performs quite well.
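As a simple illustration of these planning recommendations, the sketch below starts from an assumed prior guess at a two-parameter logit POD curve, finds the discontinuity sizes at which that curve passes 3% and 97% POD, and spaces 60 specimen sizes uniformly across that range. The prior-curve parameters are illustrative assumptions; the 60-specimen minimum and the 3 to 97% POD range follow the recommendations quoted above.

```python
import numpy as np

# Assumed prior guess at the POD curve (two-parameter logit); the
# parameter values are illustrative, not taken from the cited studies.
b0, b1 = -3.0, 2.5

def a_at_pod(p):
    """Flaw size at which the assumed prior POD curve equals p."""
    return np.exp((np.log(p / (1.0 - p)) - b0) / b1)

a_low, a_high = a_at_pod(0.03), a_at_pod(0.97)
n_specimens = 60                       # MIL-HDBK-1823A minimum for hit/miss
plan = np.linspace(a_low, a_high, n_specimens)
print(f"target sizes from {a_low:.2f} to {a_high:.2f} mm")
print(plan.round(2))
```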

Simulation Studies
Going forward, the authors recommend simulation studies to provide the NDT practitioner with a connection between the intuition gained from inspection experience and the statistical techniques used to quantify the capability of an inspection. In this work, simulation studies are used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage. Probability of coverage is defined as the probability that the interval contains the quantity of interest, and the objective of the simulation studies is to show how often (in terms of percent) a confidence interval contains the true parameter of interest. In this case, covering a90 with a 95% upper confidence bound (that is, a90/95) is of particular interest, so the probability that a90/95 is greater than the true value of a90, as defined in Equation 7, is what is meant by probability of coverage in this paper:

(7)   P\{\, a_{90/95} > \text{true } a_{90} \,\}

A simple case with a model that includes both additive and multiplicative noise is proposed as an approach for how to conduct a simulation study.

Simulation Study with Additive and Multiplicative Noise
This simulation study uses a noise model that includes both additive and multiplicative noise, as represented in Equation 8, where â is the signal response and a is the indication size. The additive noise component is designated by ε_add, and the multiplicative component is designated by ε_mult:

(8)   \hat{a} = \beta_0 + \beta_1 a (1 + \varepsilon_{\mathrm{mult}}) + \varepsilon_{\mathrm{add}}

This additive and multiplicative noise model resembles realistic inspection data and is used only for the purpose of generating a synthetic data set for simulation. The linear model parameters for the data used in prior work were used as the basis for this study (Knopp et al., 2013). This model and its associated parameters were used to create a data set with 100 000 observations to resemble a population from which samples were drawn, as shown in Figure 3. If the 100 000 simulated observations are designated the "population" for this inspection, then the size at which 90% of observations exceed the detection threshold can be considered the true a90 value. The proportion of observations above the detection threshold of 0.195 was determined in intervals of 1000 observations, and the resulting "true" POD curve is plotted in Figure 4. It was determined based on this technique that the "true" a90 is 2.907 mm (0.114 in.). The coverage probability for this value is investigated via simulation.

Figure 3. Simulated data with linear model form that includes additive and multiplicative noise (â versus a, mm).

Figure 4. True probability of detection (POD) curve obtained from the 100 000 simulated observations.
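A sketch of how such a synthetic "population" can be generated and the "true" a90 read off empirically is shown below. The β values, the noise standard deviations, and the size range are illustrative assumptions (the paper takes its parameter values from Knopp et al., 2013); the 0.195 detection threshold matches the text, and the interval-proportion idea mirrors the construction of Figure 4.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "population" following Equation 8; the beta and noise values
# below are illustrative assumptions, not the values used in the paper.
n_pop = 100_000
a = rng.uniform(0.1, 5.0, n_pop)             # indication sizes, mm
eps_mult = rng.normal(0.0, 0.15, n_pop)
eps_add = rng.normal(0.0, 0.04, n_pop)
beta0, beta1 = 0.03, 0.3
ahat = beta0 + beta1 * a * (1.0 + eps_mult) + eps_add

threshold = 0.195                             # detection threshold from the text
hits = ahat > threshold

# Empirical "true" POD: proportion of hits in narrow size intervals,
# then the smallest size at which that proportion reaches 90%
bins = np.linspace(0.1, 5.0, 50)
centers = 0.5 * (bins[:-1] + bins[1:])
pod = np.array([hits[(a >= lo) & (a < hi)].mean()
                for lo, hi in zip(bins[:-1], bins[1:])])
true_a90 = centers[np.argmax(pod >= 0.90)]
print("empirical true a90 (mm):", round(float(true_a90), 3))
```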

The process of the simulation is simply to sample from the "population" of observations, compute a90 and a90/95, and repeat. The percent of simulated intervals that cover the "true" a90 is then calculated and evaluated against 95%. Based on experience, it is predicted that 100 randomly sampled hit/miss observations should provide appropriate coverage probability for POD (Annis, 2014), where appropriate coverage probability is defined as being close to the theoretical confidence value; for a 95% level of confidence, the coverage probability should be approximately 95%. In this case, the process was carried out for sample sizes of 30 and 100. Since the assumption of constant variance is violated, the sampled data sets were converted to hit/miss for analysis and a two-parameter logit model was used. a90 and a90/95 were computed for each simulated data set, and the coverage probability was computed for both sample sizes.

Figure 5 shows box plots of the a90 and a90/95 values for the two-parameter logit model, which illustrate the variation in the a90 and a90/95 values with respect to the true value of a90. As expected, the variance is higher for the estimated a90 and a90/95 values with only 30 observations. Though there does not appear to be much of a difference in the a90 values on average, the variance for the 30-observation case is approximately double that of the 100-observation case, and the variance of the a90/95 values for the 30-observation case is approximately four times that of the 100-observation case. Coverage probability for the 30-observation case is approximately 85%, while coverage probability for the 100-observation case is approximately 94%. Hence, the 30-observation case yields more overly optimistic results about the inspection capability (that is, a90/95 < true a90). Further simulation studies can investigate the effect of the choice of left censoring of the data and of the detection threshold on the coverage of the true a90. These are the types of investigations that can be conducted via simulation studies. In the future, such simulation studies can also be used to quantify the effect of changes in model form or of different techniques for calculating confidence bounds.

Figure 5. Box plots of a90 and a90/95 values for 30 and 100 observations (obs) for the two-parameter hit/miss model, compared with the true a90.

Summary and Conclusion
There have been considerable contributions to POD modeling in the last decade. It is hoped that procedures for three-parameter and four-parameter models, and also the bootstrapping approach for confidence intervals, will be codified for more general use. Simulation studies were introduced as a way to quantify the effect of sample size on confidence bound calculations. It is hoped that they will be used more frequently as different techniques for modeling POD are proposed, and it is recommended that future POD software projects enable simulation studies to facilitate such investigations.

AUTHORS
Jeremy S. Knopp: Air Force Research Laboratory (AFRL/RXCA), Wright-Patterson AFB, Ohio 45433.
Frank Ciarallo: Wright State University, Dayton, Ohio 45435.
Ramana V. Grandhi: Wright State University, Dayton, Ohio 45435.

REFERENCES
Annis, C., "Influence of Sample Characteristics on Probability of Detection Curves," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014.
Annis, C., and J.S. Knopp, "Comparing the Effectiveness of a90/95 Calculations," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007.
Berens, A.P., "NDE Reliability Data Analysis," ASM Handbook, 9th ed., Vol. 17: Nondestructive Evaluation and Quality Control, 1989, pp. 689–701.
Berens, A.P., and P.W. Hovey, "The Sample Size and Flaw Size Effects in NDI Reliability Experiments," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 4, 1985.
De Gryze, S., I. Langhans, and M. Vandebroek, "Using the Correct Intervals for Prediction: A Tutorial on Tolerance Intervals for Ordinary Least-squares Regression," Chemometrics and Intelligent Laboratory Systems, Vol. 87, 2007, pp. 147–154.
DOD, MIL-HDBK-1823A, Nondestructive Evaluation System Reliability Assessment, Department of Defense Handbook, U.S. Department of Defense, August 2010.
Harding, C.A., and G.R. Hugo, "Experimental Determination of the Probability of Detection Hit/Miss Data for Small Data Sets," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 22, 2003, pp. 2039–2046.
Kass, R.E., and A.E. Raftery, "Bayes Factors," Journal of the American Statistical Association, Vol. 90, No. 430, 1995, pp. 773–795.
Knopp, J.S., and L. Zeng, "Statistical Analysis of Hit/Miss Data," Materials Evaluation, Vol. 71, No. 3, 2013, pp. 323–329.
Knopp, J.S., et al., "Statistical Analysis of Eddy Current Data from Fastener Site Inspections," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 30, 2011.
Knopp, J.S., et al., "Considerations for Statistical Analysis of Nondestructive Evaluation Data: Hit/Miss Analysis," E-Journal of Advanced Maintenance, Vol. 4, No. 3, 2012.
Koh, Y., and W.Q. Meeker, "Bayesian Planning of Hit-Miss Inspection Tests," Proceedings of the 1998 SAE Airframe/Engine Maintenance & Repair Conference, Society of Automotive Engineers, 1998.
Kutner, M.H., C.J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, New York, 2004.
Li, M., W.Q. Meeker, and F.W. Spencer, "Quantile Probability of Detection: Distinguishing Between Uncertainty and Variability in Nondestructive Testing," Materials Evaluation, Vol. 73, No. 1, 2015, pp. 89–95.
Li, M., F.W. Spencer, and W.Q. Meeker, "Distinguishing Between Uncertainty and Variability in Nondestructive Evaluation," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 31, 2012.
Moore, D., and F.W. Spencer, "Detection Reliability Study for Interlayer Cracks," Review of Progress in Quantitative Nondestructive Evaluation, 2014.
Moore, D., and F.W. Spencer, "Curve Fitting for Probability of Detection Data: A 4-parameter Generalization," Review of Progress in Quantitative Nondestructive Evaluation, 2015.
Olin, B.D., and W.Q. Meeker, "Applications of Statistical Methods to Nondestructive Evaluation," Technometrics, Vol. 38, No. 2, 1996, pp. 95–112.
Safizadeh, M.S., D.S. Forsyth, and A. Fahr, "The Effect of Flaw Size Distribution on the Estimation of POD," Insight, Vol. 46, No. 6, June 2004.
Spencer, F.W., "The Calculation and Use of Confidence Bounds in POD Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007.
Spencer, F.W., "Nonparametric POD Estimation for Hit/Miss Data: A Goodness of Fit Comparison for Parametric Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014.