
HSE
Health & Safety Executive

Probability of Detection (PoD) curves


Derivation, applications and limitations

Prepared by Jacobi Consulting Limited


for the Health and Safety Executive 2006

RESEARCH REPORT 454



Probability of Detection (PoD) curves


Derivation, applications and limitations

George A Georgiou
Jacobi Consulting Limited
57 Ockendon Road
London N1 3NL

There is a large amount of ‘Probability of Detection’ (PoD) data available (eg National NDT Centre (UK),
NORDTEST (Norway), NIL (Netherlands) and in particular NTIAC (USA)). However, it is believed that PoD
curves produced from PoD data are not very well understood by many who use and apply them. For
example, in producing PoD curves, a certain material and thickness may have been used and yet one can
find the same PoD quoted for a range of thicknesses. In other cases, PoD curves may have been
developed for pipes, but they have been applied to plates or other geometries. Similarly, PoD curves for
one type of weld (eg single sided) have been used for other welds (eg double sided). PoD data are also
highly dependent on the Non-Destructive Testing (NDT) methods used to produce them and these data
can be significantly different, even when applied to the same flaws and flaw specimens. It is often
assumed that the smallest flaw detected is a good measure of PoD, but there is usually a large gap
between the smallest flaw detected and the largest flaw missed. Similarly, it is often assumed that human
reliability is a very important factor in NDT procedures, and yet it is usually found not to be as important as
other operational and physical parameters.

It is important to question the validity of how PoD curves are applied as well as their limitations. This report
aims to answer such questions and in particular their relevance to fitness for service issues involving PoD.

The overall goal of this project is to provide clear, concise, understandable and practical information on
PoD curves, which will be particularly useful for Health and Safety Inspectors when discussing safety
cases involving PoD curves.

This report and the work it describes were funded by the Health and Safety Executive (HSE). Its contents,
including any opinions and/or conclusions expressed, are those of the author alone and do not necessarily
reflect HSE policy.

HSE BOOKS
© Crown copyright 2006

First published 2006

All rights reserved. No part of this publication may be


reproduced, stored in a retrieval system, or transmitted in
any form or by any means (electronic, mechanical,
photocopying, recording or otherwise) without the prior
written permission of the copyright owner.

Applications for reproduction should be made in writing to:


Licensing Division, Her Majesty's Stationery Office,
St Clements House, 2-16 Colegate, Norwich NR3 1BQ
or by e-mail to hmsolicensing@cabinet-office.x.gsi.gov.uk

TABLE OF CONTENTS

TABLE CAPTIONS AND FIGURE CAPTIONS v


EXECUTIVE SUMMARY vi
Background vi
Objectives vi
Work Carried Out vi
Conclusions vii
Recommendations vii
1. INTRODUCTION 1
2. OBJECTIVES 2
3. DERIVATION OF POD CURVES 2
3.1. A HISTORICAL BACKGROUND AND DEVELOPMENT OF NDT RELIABILITY METHODS 2
3.2. EXPERIMENTAL REQUIREMENTS TO PRODUCE POD CURVES 3
3.3. THE AVAILABLE PROBABILITY METHODS TO PRODUCE POD CURVES 4
3.3.1. PoD Curves for Hit/Miss Data 4
3.3.2. PoD Curves for Signal Response Data 5
3.3.3. Sample Sizes 6
3.4. CONFIDENCE LIMITS (OR CONFIDENCE INTERVALS) 7
3.5. PUBLISHED WORK ON THE MODELLING OF POD 8
3.5.1. An Overview 8
3.5.2. The PoD-generator (The Netherlands) 9
3.5.3. Iowa State University (USA) 9
3.5.4. National NDT Centre (UK) 9
4. THE PRACTICAL APPLICATION OF POD CURVES 10
4.1. HOW POD CURVES ARE USED IN INDUSTRY 10
4.2. PUBLISHED WORK ON POD CURVES IN DIFFERENT INDUSTRIES 10
4.2.1. Aerospace (NASA) 10
4.2.2. Aircraft Structures, Inclusions in Titanium Castings 10
4.2.3. NORDTEST Trials 11
4.2.4. Nuclear Components (The PISC Trials) 11
4.2.5. Offshore tubular Joints 11
4.2.6. Dutch Welding Institute (NIL) 11
4.2.7. Railways (National NDT Centre (UK)) 12
4.2.8. LPG Storage Vessels 13
5. THE LIMITATIONS OF APPLYING POD CURVES 13
5.1. COMMENTS ON HIT/MISS DATA AND SIGNAL RESPONSE DATA 13
5.2. IMPORTANT OPERATING AND PHYSICAL PARAMETERS 14
5.2.1. NDT Method 15
5.2.2. Fluorescent Penetrant NDT 15
5.2.3. Material Properties 15
5.2.4. Specimen Weld Geometry 15
5.2.5. Flaw Characteristics 16

5.2.6. Human Reliability 16
6. DISCUSSION 16
6.1. INTRODUCTION 16
6.2. AIMS AND OBJECTIVES 17
6.3. HISTORICAL DEVELOPMENT 17
6.4. FLAW SAMPLE SIZES FOR ‘HIT/MISS’ DATA AND ‘SIGNAL RESPONSE’ DATA 18
6.4.1. Model for Hit/Miss Data 18
6.4.2. Model for Signal Response Data 19
6.4.3. To Compute PoD parameters 20
6.4.4. To Achieve the Desired PoD/Confidence Limit Combination 20
6.5. POD MODELLING 20
6.6. PRACTICAL APPLICATIONS OF POD 21
6.6.1. Aircraft Structures, Inclusions in Titanium Castings 21
6.6.2. NORDTEST Trials 21
6.6.3. Nuclear Components (The PISC Trials) 21
6.6.4. Offshore Tubular Joints 21
6.6.5. Dutch Welding Institute (NIL) 22
6.6.6. Railways 22
6.6.7. LPG Storage Vessels 22
6.7. DEPENDENCE OF POD ON OPERATIONAL AND PHYSICAL PARAMETERS 22
6.7.1. Important Operational and Physical Parameters 22
6.7.2. NDT Method 23
6.7.3. Fluorescent Penetrant NDT 23
6.7.4. Material Properties 23
6.7.5. Specimen Weld Geometry 23
6.7.6. Flaw Characteristics 23
6.7.7. Human Reliability 24
7. INDEPENDENT VERIFICATION 24
8. CONCLUSIONS 24
9. RECOMMENDATIONS 25
10. ACKNOWLEDGEMENTS 25
11. REFERENCES 25
12. VERIFICATION STATEMENT

TABLES 1
FIGURES 1 - 13

APPENDIX A GLOSSARY OF TERMS, STATISTICAL TERMINOLOGY AND OTHER RELEVANT INFORMATION
APPENDIX B AN AUDIT TOOL FOR THE PRODUCTION AND APPLICATION OF POD CURVES
APPENDIX C THE VALIDITY OF THE JCL ‘INDEX OF DETECTION’ MODEL

TABLE CAPTIONS AND FIGURE CAPTIONS
TABLE CAPTIONS
Table 1 Maximum Probability Tables

FIGURE CAPTIONS
Figure 1 Example of detection percentages for a handheld Eddy-Current inspection and a ‘log-
odds’ distribution fit to the data.
Figure 2 Ultrasonic NDT hit/miss data illustrating the relatively large gap between the smallest
flaw detected and the largest flaw missed.
Figure 3 The linear relationship between the log-odds and log flaw size.
Figure 4 Schematic of the PoD for flaws of fixed dimension for ‘hit/miss’ data.
Figure 5 Schematic of the PoD for flaws of fixed dimension for ‘signal response’ data.
Figure 6 A comparison between the log-odds and cumulative log-normal distribution functions
for the same parameters μ = 0 and σ = 1.0.
Figure 7 An example of when the log-odds model was not applicable to the data collected
Figure 8 PoD (a) log-odds model results for different NDT methods applied to the same flaw
specimen.
Figure 9 PoD (a) log-odds model results for fluorescent penetrant: no developer and developer
applied to the same flaw specimen
Figure 10 PoD (a) log-odds model results for manual eddy currents: different materials but
nominally the same flaws
Figure 11 PoD (a) log-odds model results for X-ray radiography: different weld conditions but
nominally the same flaws
Figure 12 PoD (a) log-odds model results for fluorescent penetrant: different flaws but nominally
the same specimens
Figure 13 PoD (a) log-odds model results for Ultrasound (Immersion): different operators but
inspecting the same flaw specimen

EXECUTIVE SUMMARY
Background
There is a large amount of ‘Probability of Detection’ (PoD) data available (e.g. National NDT
Centre (UK), NORDTEST (Norway), NIL (Netherlands) and in particular NTIAC (USA)). However,
it is believed that PoD curves produced from PoD data are not very well understood by many who
use and apply them. For example, in producing PoD curves, a certain material and thickness may
have been used and yet one can find the same PoD quoted for a range of thicknesses. In other cases,
PoD curves may have been developed for pipes, but they have been applied to plates or other
geometries. Similarly, PoD curves for one type of weld (e.g. single sided) have been used for other
welds (e.g. double sided). PoD data are also highly dependent on the Non-Destructive Testing
(NDT) methods used to produce them and these data can be significantly different, even when
applied to the same flaws and flaw specimens. It is often assumed that the smallest flaw detected is
a good measure of PoD, but there is usually a large gap between the smallest flaw detected and the
largest flaw missed. Similarly, it is often assumed that human reliability is a very important factor in
NDT procedures, and yet it is usually found not to be as important as other operational and physical
parameters.

It is important to question the validity of how PoD curves are applied as well as their limitations.
This report aims to answer such questions and in particular their relevance to fitness for service
issues involving PoD.

The overall goal of this project is to provide clear, concise, understandable and practical
information on PoD curves, which will be particularly useful for Health and Safety Inspectors when
discussing safety cases involving PoD curves.

Objectives
♦ To provide a clear and understandable description of how PoD curves are derived.
♦ To provide practical applications of how PoD curves are used and their relevance to fitness
for service issues.
♦ To quantify the limitations of PoD curves.

Work Carried Out


Section 3 provides a historical overview of PoD and describes how the techniques used to produce
PoD curves have evolved during the last three decades, paying special attention to the
fundamental PoD functions for ‘hit/miss’ data and ‘signal response’ data. In this respect, Appendix
A provides additional help for non-statisticians on the basic elements and mathematics of PoD
functions, and Appendix B provides an audit tool for those interested in producing or assessing PoD
curves. Part of section 3 is also devoted to published work on the modelling of PoD over the same
period.

A range of different industrial applications of PoD curves are discussed in section 4, and the
opportunity was taken to update the results of an earlier application of PoD curves to Liquid
Petroleum Gas (LPG) spheres. The details of the new work, which can be regarded as having wider
applications, are included in Appendices C and D (i.e. the ‘Probability of Inclusion’ and the
‘Guidelines’ on inspecting welds respectively).

Section 5 has been devoted to the limitations of applying PoD curves, as well as the main
operational and physical parameters they are dependent on.

In order to illustrate and explain many of the important issues discussed, and which are particularly
relevant to PoD, a number of experimental and theoretical examples are provided throughout the
report.

Conclusions
• The ‘log-odds’ distribution is found to be one of the best fits for hit/miss NDT data.
• The log-normal distribution is found to be one of the best fits for signal response NDT data, and
in particular for flaw length and flaw depth data as determined by ultrasonic NDT.
• In some cases, the ‘log-odds’ and cumulative log-normal distributions are very similar, but
there are many cases where they are significantly different.
• There are NDT data when neither the ‘log-odds’ nor the log-normal distributions are
appropriate and other distributions need to be considered.
• There is often a large gap between the smallest flaw detected and the largest flaw missed.
• Very small or very large flaws do not contribute much to the PoD analysis of hit/miss data.
• To achieve a valid ‘log-odds’ model solution for hit/miss data, a good overlap between the
smallest flaw detected and the largest flaw missed is necessary.
• To achieve a valid log-normal model solution for signal response data, there is less reliance on
flaw size range overlap, but more on the linear relationship between ln(â) and ln(a).
• When the PoD (a) function decreases with increasing flaw size, it is usually an indication that
the NDT procedures are poorly designed.
• When the lower confidence limit decreases with increasing flaw size, notwithstanding an
acceptable PoD (a) function, it is usually associated with extreme or unreasonable values of the
mean and standard deviation.
• The effect on PoD results for particular operational and physical parameters can be significant
for datasets selected from the NTIAC data book of PoD curves.
• The PoD data in the NTIAC data book were collected some 30 years ago and may not
necessarily reflect current capabilities with modern digital instrumentation. However, the
results are still believed to be relevant to best practice NDT.
• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in
question. It would be wrong to draw too many general conclusions about the particular PoD
values (e.g. ultrasound is better than X-ray).
• Figures 7 - 13 serve to illustrate the possible effects that the physical and operational parameters
can have on the PoD and an awareness of these effects is important when quoting PoD results.
• NDT methods, equipment ‘calibration’, fluorescent penetrant developers, material, surface
condition, flaws and human factors are all important operational and physical parameters,
which can have a significant effect on PoD results.
• Whilst human factors are important variables in NDT procedures, they are often found not to be
as important as other operational and physical variables.
• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the
JCL ‘Probability of Inclusion’ model.
• The earlier JCL ‘Probability of Inclusion’ model has been validated against an independently
developed ‘Probability of Inclusion’ model by MBEL.

Recommendations
• Publish a signal response data book of PoD results.
• Publish a more up to date data book from different PoD studies and collate them in a way which
best serves more general industrial and modelling applications.
• Set up a European style project or Joint Industry Project to realise the above recommendations.

1. INTRODUCTION
There is a large amount of ‘Probability of Detection’ (PoD) information available; starting with the
pioneering work in the late 1960’s to early 1970’s for the aerospace industry, to more recent and
more general industrial applications (e.g. National NDT Centre (UK), NORDTEST (Norway), NIL
(Netherlands), PISC (Europe) and NTIAC (USA)). However, it is believed that PoD curves are not
very well understood by many who use and apply them.

PoD curves have been produced for a range of Non-Destructive Testing (NDT) methods (e.g.
ultrasound, radiography, eddy currents, magnetic particle inspection, liquid penetrants, visual and
others). Whilst it is reasonable to assume that each NDT method will produce different PoD curves
(even when applied to the same flaws), it is believed that many who use PoD curves do not fully
appreciate how significant the differences can be. PoD curves are also dependent on a number of
physical parameters (e.g. material, thickness of component, flaw type, geometry etc) and this too is
not always appreciated. In some applications only one thickness may have been used and yet the
same PoD is quoted for a range of thicknesses. In other cases, PoD curves may have been derived for
pipes, but they have been used for plates or other geometries. Similarly, PoD curves for one type of
weld (e.g. single sided or J-prep welds) have been used for other welds (e.g. double sided or double
V-prep welds) without any justification.

It is important to have an understanding of how PoD curves are derived and to question the validity
of how PoD curves are applied, as well as to appreciate their limitations. This report aims to provide
this information as well as considering their relevance to health and safety issues.

Section 3 provides an overview of the historical background to PoD, the experimental requirements
for PoD curves in practice (e.g. minimum sample size and confidence limits), the various approaches
to produce PoD curves and the development of theoretical modelling of PoD curves. Section 4
provides information on how PoD curves are used in industry and there are a number of examples
provided in various industrial applications and publications. There are examples of both practical and
impractical applications of PoD curves. Section 5 discusses the dependence of PoD curves on a range
of operational and physical parameters and the limitations of applying PoD curves.

Section 6 brings together all the salient features of the report and is much more than an
executive summary. This is done (i) so that Health and Safety Inspectors can get an informed
overview of the report without having to read each section in detail and (ii) to provide the basis of an
externally published paper. The conclusions and recommendations are provided in Section 8 and
Section 9 respectively.

Where the statistics are believed to be important to the discussion, some explanation is provided in the
main body of the text, but in most cases more detailed explanations are provided in Appendix A. The
mathematical content has been kept to a minimum and at a level which is suitable for scientists and
engineers, who may not necessarily have a background in statistics. Appendix A also includes a
glossary of terms and symbols, statistical definitions and different terminologies used in this report.

In order to assist organisations interested in producing PoD curves, perhaps for the first time, and in
particular Health and Safety Inspectors dealing with PoD issues in industry, an ‘audit tool’ (or check
list) has been provided in Appendix B.

There is a self-contained report in Appendix C that deals specifically with validating an earlier
probabilistic model developed for the Health and Safety Executive (HSE) (i.e. for the ultrasonic NDT of
LPG storage vessels). The model, which makes use of PoD curves and is hence relevant to this study,
has been updated along with its companion report, the HSE guidelines on how to use the model
(Appendix D). Both the updated model and updated companion guidelines are now considered as
having wider applications than just the ultrasonic NDT of LPG storage vessels.

The whole report has been read by a qualified statistician to verify and check the calculations and to
assess that the conclusions and recommendations are based on sound scientific reasoning. Additional
verifications have been carried out by others and the full details are discussed in section 7 and a
formal verification statement is made in Section 12.

The overall goal of this project is to provide clear, concise and understandable information on PoD
curves, which will be particularly useful for Health and Safety Inspectors in discussing safety cases
involving PoD curves.

2. OBJECTIVES
♦ To provide a clear and understandable description of how PoD curves are derived.
♦ To provide practical applications of how PoD curves are used and their relevance to fitness for
service issues.
♦ To quantify the limitations of PoD curves.

3. DERIVATION OF POD CURVES


3.1. A HISTORICAL BACKGROUND AND DEVELOPMENT OF NDT RELIABILITY METHODS
Non-destructive Testing (NDT) reliability may be defined as 'the probability of detecting a crack in
a given size group under the inspection conditions and procedures specified' (1). There are of
course other similar definitions, but the underlying statistical parameter is the PoD, which has
become the accepted formal measure of quantifying NDT reliability. The PoD is usually expressed as
a function of flaw size (i.e. length or depth), although in reality it is a function of many other
physical and operational parameters, such as, the material, the geometry, the flaw type, the NDT
method, the testing conditions and the NDT personnel (e.g. their certification, education and
experience).

Repeat inspections of the same flaw size or the same flaw type will not necessarily result in
consistent hit or miss indications. Hence there is a spread of detection results for each flaw size and
flaw type and this is precisely why the detection capability is expressed in statistical terms such as
the PoD. An early example of this is illustrated in the paper by Lewis et al (2), who had 60 air force
inspectors use the same surface eddy-current technique to inspect 41 known cracks around
countersunk fastener holes in a 1.5m length of a wing box. The results are illustrated in Figure 1 in
terms of a detection percentage (i.e. the number of times a crack was detected relative to the number
of detection attempts). The chances of detecting the cracks increase with crack size, as one might
expect, but none of the cracks were detected 100% of the time and different cracks with the same
size have quite different detection percentages. Figure 1 also shows that the ‘log-odds’ distribution is
a reasonable fit to this data and illustrates why PoD is considered an appropriate measure of
detection capability.

PoD functions for describing the reliability of an NDT method or technique have been the subject of
many studies and have undergone considerable development since the late 1960s and early 1970s,
when most of the pioneering work was carried out in the aerospace industry (3,4). In order to ensure
the structural integrity of critical components it was becoming more evident that instead of asking the
question ‘…what is the smallest flaw that can be detected by an NDT method?’ it was more
appropriate, from a fracture mechanics point of view, to ask ‘…what is the largest flaw that can be
missed?’ To elaborate on this point here, ultrasonic inspection data has been re-plotted from the
‘Non-destructive Testing Information Analysis Centre’ (NTIAC) capabilities data book (5). Figure 2
illustrates the detection capabilities of an ultrasonic surface wave inspection of two flat aluminium
plates (thicknesses 1.5mm and 5.6mm), containing a total of 311 simulated fatigue cracks with
varying depths. The flaws are recorded as detected (or hit) with PoD=1, or missed with PoD=0.
Figure 2 shows three distinct regions separated by the lines a_smallest (i.e. the smallest flaw
detected) and a_largest (i.e. the largest flaw missed). The region between a_smallest and a_largest
shows that there are flaws of the same size which are sometimes detected and sometimes not
detected. It is also clear that a_largest is significantly larger than a_smallest.
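
To make the three regions concrete, the short Python sketch below (illustrative only: the
hit/miss records are invented and are not the NTIAC data re-plotted in Figure 2) extracts
a_smallest and a_largest from a set of (flaw size, hit) records.

# Hedged illustration: locate a_smallest (smallest flaw detected) and
# a_largest (largest flaw missed) in a set of hit/miss records.
# The (size in mm, hit) pairs below are invented for illustration.
data = [(0.4, 0), (0.6, 0), (0.8, 1), (0.9, 0), (1.2, 1), (1.5, 0), (2.0, 1), (2.4, 1)]

a_smallest = min(size for size, hit in data if hit == 1)  # smallest flaw detected
a_largest = max(size for size, hit in data if hit == 0)   # largest flaw missed

print(a_smallest, a_largest)  # 0.8 and 1.5: flaws between these sizes are
                              # sometimes detected and sometimes missed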

In 1969, a program was initiated by the National Aeronautics and Space Administration (NASA) to
determine the largest flaw that could be missed for the various NDT methods that were to be used in
the design and production of the space shuttle. The methodology by NASA was soon adopted by the
US Air Force as well as the US commercial aircraft industry. In the last two decades many more
industries have adopted similar NDT reliability methods based on PoD. Some of these will be
discussed in more detail in section 4 below.

In the mid-1970s, a constant PoD for all flaw types of a given size was proposed and
Binomial distribution methods were used to estimate this probability, along with an associated error
or ‘lower confidence limit’ as it is often called (1). Whilst good PoD estimates could be obtained for
a single flaw size, very large sample sizes were required to obtain good estimates of the ‘lower
confidence limit’ (see section 3.4 below for more details on the confidence limit). It is clear from
Figure 1, that this early assumption about a constant PoD for flaws of a given size, whilst making the
probability calculations easier, was too simplistic as different detection percentages were being
recorded for the same flaw size.

In cases where there was an absence of large sample sizes, various grouping schemes were
introduced to analyse the data, but in these cases estimates for the lower confidence limit were no
longer valid. In the early to the mid-1980s, the approach was to assume a more general model for the
PoD vs. flaw size ‘a’. Various analyses of data from reliability experiments on NDT methods
indicated that the PoD (a) function could be modelled closely by either the cumulative 'log-normal'
distribution or the 'log-logistic' (or ‘log-odds’) distribution (6). Both of these models will be
discussed in more detail below.

The statistical parameters (e.g. mean, median and standard deviation) associated with the PoD (a)
functions can be estimated using standard statistical methods like 'maximum likelihood methods' (6)
(see also Appendix A, section 2).

3.2. EXPERIMENTAL REQUIREMENTS TO PRODUCE POD CURVES


The ‘Recommended Practice’ (1), which was originally prepared for the aircraft industry, provides
comprehensive information on the experimental sequence of events for generating data to produce
PoD curves and to ‘certify’ (i.e. validate) an NDT method or procedure.

The sequence of events can be broadly summarised as follows (see also (3)):

• Manufacture or procure flaw specimens with the required large number of relevant flaw sizes
and flaw types
• Inspect the flaw specimens with the appropriate NDT method
• Record the results as a function of flaw size
• Plot the PoD curve as a function of flaw size

However, before the manufacture or procurement of flaw specimens, it is necessary to make the
following crucial decisions:
• What flaw parameter size will be used (e.g. flaw length or flaw depth)?
• What overall flaw size range is to be investigated (e.g. 1mm to 9mm)?
• How many intervals are required within the flaw size range to be investigated (e.g. if 6
intervals are selected for a 1mm to 9mm flaw size range, this implies a flaw width interval of
approximately 1.3mm)?

The recommended practice (1) also provides critical information on the necessary flaw sample size
for each flaw width interval in order to demonstrate that the desired PoD, along with an appropriate lower
confidence limit, has been achieved. Usually it is not known beforehand how large a flaw has to be
before the desired PoD is satisfied, and this can present problems in knowing the most appropriate
flaw size range to select. Following the above experimental approach should lead to the largest flaw
that can be detected with the desired PoD and confidence limit.

It is important to appreciate that in selecting the sample size there are two distinct issues that have to
be addressed. First, there is the issue of the sample size being large enough to achieve the desired
PoD and confidence limit combination. Second, the sample size has to be large enough to be able to
compute the statistical parameters associated with the PoD curve that best fits the data. It is believed
that this distinction is not always made clear in the open literature. It may be of course that the
sample size required to achieve the desired PoD/confidence limit combination is always sufficiently
large to compute the statistical parameters for the PoD curve accurately enough (this is considered in
more detail in section 3.3.3).

3.3. THE AVAILABLE PROBABILITY METHODS TO PRODUCE POD CURVES


In NDT reliability methods, there are two related probabilistic methods for analysing reliability data
and producing PoD curves as functions of the flaw size a. Originally, NDT results were only
recorded in terms of whether the flaw was detected or not (c.f. Figure 2). This type of data is called
'hit/miss' data and it is discrete data. This way of recording data is still appropriate for some NDT
methods (e.g. penetrant testing or magnetic particle testing).

However, in many NDT systems there is more information in the NDT response (e.g. peak voltage in
eddy current NDT, the signal amplitude in ultrasonic NDT, the light intensity in fluorescent
penetrant NDT). Since the NDT signal response can be interpreted as the perceived flaw size, the
data is sometimes called â data (i.e. ‘a hat data’) or 'signal response' data and it is continuous data.

Each type of data (i.e. hit/miss or signal response) is usually analysed using a different probabilistic
model to produce the PoD (a) function. The details of the complete theoretical analysis are quite
involved and beyond the scope of this report, but some details will be provided here and more
information can be found in well referenced publications (4, 6).

3.3.1. PoD Curves for Hit/Miss Data


For hit/miss data a number of different statistical distributions were originally considered for the best
fit (7). It was found that the log-logistic distribution was the most acceptable and the PoD (a)
function can be written as:

PoD(a) = \frac{\exp\left[\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right]}{1 + \exp\left[\frac{\pi}{\sqrt{3}}\left(\frac{\ln a - m}{\sigma}\right)\right]}     (1)

where a is the flaw size and m and σ are the median and standard deviation respectively.

Another convenient form of equation (1) can be written as:

PoD(a) = \frac{\exp(\alpha + \beta \ln a)}{1 + \exp(\alpha + \beta \ln a)}     (2)

and it is straightforward (see Appendix A, Section 3.2) to show that the parameters α and β are
related to m and σ by:

m = -\frac{\alpha}{\beta}     (3)

\sigma = \frac{\pi}{\beta \sqrt{3}}     (4)

From equation (2), it is straightforward to show that (see Appendix A, section 3.2):

\ln\left(\frac{PoD(a)}{1 - PoD(a)}\right) = \alpha + \beta \ln a     (5)

The term on the left hand side is called the logarithm of the ‘odds’
(i.e. odds = probability of success/probability of failure) and equation (5) demonstrates that:

\ln(\text{odds}) \propto \ln a     (6)

hence the name ‘the log-odds model’ when applied to the hit/miss data.

In Figure 1, it is evident that the log-odds PoD (a) function fits the particular hit/miss eddy current
data well. Further evidence is given in Figure 3 where the linear relationship shown above in
equation (6) is demonstrated (see also reference 6). The particular parameters α and β in Figure 3
(i.e. α = -2.9 and β = 1.69) were computed using maximum likelihood methods (6). The statistical
parameters m and σ can be calculated from equations (3) and (4).
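
As a purely numerical illustration (not part of the study behind Figure 3), the Python sketch
below evaluates the log-odds PoD(a) of equation (2) for the parameters α = -2.9 and β = 1.69
quoted above, and converts them to m and σ using equations (3) and (4); the flaw sizes in the
loop are arbitrary values chosen only to show the shape of the curve.

import math

def pod_log_odds(a, alpha, beta):
    """PoD(a) = exp(alpha + beta*ln a) / (1 + exp(alpha + beta*ln a)), equation (2)."""
    z = alpha + beta * math.log(a)
    return math.exp(z) / (1.0 + math.exp(z))

alpha, beta = -2.9, 1.69
m = -alpha / beta                          # equation (3)
sigma = math.pi / (beta * math.sqrt(3.0))  # equation (4)

for a in (1.0, 3.0, 5.0, 10.0):            # illustrative flaw sizes
    print(f"a = {a:5.1f}   PoD = {pod_log_odds(a, alpha, beta):.3f}")
print(f"m = {m:.2f}, sigma = {sigma:.2f}")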

Recall the discussion above in section 3.1 regarding the detection probabilities of repeat inspections
of the same flaw, as well as of different flaw types with the same size. The different detection
probabilities result in a distribution of probabilities for some fixed flaw length (or flaw depth). The
standard way of defining the distribution of these probabilities is through a 'probability density
function' (see Appendix A, section 3.4). In the case of 'hit/miss' data the PoD (a) function is the mean
of the probability density function for each flaw length or depth (Figure 4).

3.3.2. PoD Curves for Signal Response Data


For signal response data, much more information is supplied in the signal for analysis than is in the
hit/miss data. In fact, as will be shown below, the PoD (a) function is derived from the correlation of
â vs. a data.

For signal response data it has been observed in a number of studies (6, 8) that an approximate linear
relationship exists between ln(â) and ln(a). The relationship is usually expressed by:
\ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \varepsilon     (7)

where ε is an error term, normally distributed with zero mean and constant standard deviation σ_ε.
The term α₁ + β₁ ln(a) in equation (7) is the mean μ(a) of the probability density function of
ln(â). In signal response data, a flaw is regarded as ‘detected’ if â exceeds some pre-defined
threshold â_th.

Equation (7) is really expressing the fact that ln(â) is normally distributed with mean
μ(a) = α₁ + β₁ ln(a) and constant standard deviation σ_ε (i.e. N(μ(a), σ_ε²)).

The PoD (a) function for signal response data (i.e. ln(â)) can be expressed as:

PoD(a) = \text{Probability}\left(\ln(\hat{a}) > \ln(\hat{a}_{th})\right)     (8)

In other words, it is the area under the probability density function of ln(â) that lies above
the flaw evaluation threshold ln(â_th) (see Figure 5).

Using standard statistical notation (9), equation (8) can be written as:

PoD(a) = 1 - F\left[\frac{\ln(\hat{a}_{th}) - (\alpha_1 + \beta_1 \ln(a))}{\sigma_\varepsilon}\right]     (9)

where F is the continuous cumulative distribution function of the standard Normal distribution
(see Appendix A, Section 3).

It is fairly straightforward to show, using the symmetric properties of the Normal distribution,
that equation (9) can be written as (see Appendix A, Section 3):

PoD(a) = F\left[\frac{\ln(a) - (\ln(\hat{a}_{th}) - \alpha_1)/\beta_1}{\sigma_\varepsilon/\beta_1}\right]     (10)

which is the cumulative log-normal distribution with:

\text{mean} = \mu = \frac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1}     (11)

and

\text{standard deviation} = \sigma = \frac{\sigma_\varepsilon}{\beta_1}     (12)

The estimates for α₁, β₁ and σ_ε are computed from the PoD data using the maximum likelihood
method (6).
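
To show how equations (10)-(12) are used in practice, the sketch below evaluates the signal
response PoD(a) as a cumulative normal in ln(a). The parameter values (α₁, β₁, σ_ε and the
decision threshold â_th) are invented for illustration and are not taken from any study cited
in this report.

import math

def normal_cdf(x):
    """Cumulative distribution function of the standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pod_signal_response(a, alpha1, beta1, sigma_eps, a_th):
    """PoD(a) from equation (10), with mean and standard deviation from (11) and (12)."""
    mu = (math.log(a_th) - alpha1) / beta1   # equation (11)
    sigma = sigma_eps / beta1                # equation (12)
    return normal_cdf((math.log(a) - mu) / sigma)

# Hypothetical parameters: alpha1 and beta1 from a fitted ln(a-hat) vs ln(a) line,
# sigma_eps the scatter about that line, a_th the recording threshold.
for a in (0.5, 1.0, 2.0, 4.0):
    pod = pod_signal_response(a, alpha1=0.5, beta1=1.2, sigma_eps=0.4, a_th=2.0)
    print(f"a = {a:4.1f}   PoD = {pod:.3f}")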

3.3.3. Sample Sizes


(a) To compute PoD parameters

For the hit/miss data, it has been shown in Figure 2 that there is a flaw size range (i.e. from
a_smallest to a_largest) in which there is definite uncertainty whether the inspection system will
detect the flaw or not. On the other hand, if the flaw size a < a_smallest the inspection system
would be expected to miss the flaw. Similarly, if a > a_largest the inspection system would be
expected to detect the flaw. So having a large number of very small or very large flaws will not
provide much information on the PoD (a) function that will fit the data. To maximise the
information required for estimating the PoD (a) function (i.e. the parameters) it is recommended
that the flaw sizes be uniformly distributed between the minimum and maximum flaw size of
interest. A minimum of 60 flaws is recommended for hit/miss data (6).

For signal response data, a direct consequence of the additional information is that the range of
flaw sizes is not as critical. The recommendation is a minimum of 30 flaws in the sample size (6).
However, increasing the sample size will also increase the accuracy of the PoD (a) function estimate.

(b) To achieve the desired PoD/Confidence limit combination


In practice, a PoD and lower confidence limit combination that is often quoted is 90% and 95%
respectively (sometimes written 90-95). For the hit/miss NDT data discussed in the recommended
practice (1), it is necessary to have a minimum sample of 29 flaws in each flaw width interval. This
could be interpreted as 29 flaw specimens with one flaw in each specimen. This means that with 6
flaw width intervals, a minimum of 174 flaw specimens would be necessary (i.e. 174 flaws spread
across the overall flaw range). So when published articles on this theme refer to the
‘considerable’ cost associated with producing PoD curves experimentally, the cost is, if anything, understated. In
addition, it is necessary to have the same number of ‘control’ specimens (specimens having no flaws)
as flaw specimens, which are randomly mixed in with the flaw specimens before all the specimens
are inspected.
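
The figure of 29 flaws per interval is consistent with simple binomial reasoning: if the true
PoD were only 90%, the chance of detecting all 29 flaws would be 0.9^29 ≈ 0.047, i.e. just
below 5%, so a clean sweep of 29 detections demonstrates 90% PoD at the 95% confidence level.
The short sketch below (an illustration, not taken from the recommended practice itself)
reproduces this arithmetic.

# Probability of detecting all n flaws if the true PoD were exactly 0.9.
p_true = 0.90
for n in (25, 28, 29, 30):
    print(f"n = {n:2d}: P(all detected | PoD = 0.9) = {p_true ** n:.4f}")
# n = 29 is the smallest sample for which this probability falls below 0.05.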

With such a large number of flaws, the requirement to compute the PoD (a) function parameters, as
discussed above in (a), is easily satisfied.

In order to achieve the 90% PoD with a 95% lower confidence limit for any flaw width interval, it is
necessary to detect all the 29 flaws in that flaw width interval. For each flaw that is not detected in
any particular flaw width interval, the recommended practice (1) provides tables of how many flaws
in total need to be detected to achieve certification. There are also ‘maximum probability’ tables
which indicate the probability of achieving certification after failing to achieve it at the first attempt
or the second attempt and so on. Following any failure to certify, the decision as to whether it is
economically viable to continue has to be considered very carefully and the maximum probability
tables in the recommended practice are provided as assistance. A selection of maximum probability
values, based on a 90-95 PoD and confidence limit combination are given in Table 1 of this report
(see Appendix B of reference 1 for a more complete set of maximum probability tables).

The experimental procedure for achieving the desired PoD/Confidence limit is equally applicable to
the signal response data.

3.4. CONFIDENCE LIMITS (OR CONFIDENCE INTERVALS)


To obtain a better understanding of confidence limits in statistics, consider first an example using
numerical integration (10).

When we want to calculate the area under a curve for a function that is too complicated (or
impossible) to carry out an exact integration, we need to compute a numerical integration. There are
‘error formulae’ in numerical integration where the maximum possible deviation or error can be
calculated. Hence:
If we have an unknown exact value ‘e’ for the area and a known approximate value ‘A’ for the area,
we will be able to calculate a maximum possible error, or deviation ‘d’, from the error formulae.
Hence we can say that:

A - d \le e \le A + d

That is, ‘e’ lies between A - d and A + d with 100% certainty.

In statistics, however, the analogous problem of estimating the true parameter ‘p’ of a population
(e.g. the PoD) would require us to determine two numerical values ‘p1’ and ‘p2’ that depend on a
particular random sample set and include ‘p’ with 100% certainty. Unfortunately, from a sample set
we cannot draw conclusions about the population with 100% certainty. We need to modify our approach since the
numerical quantities p1 and p2 depend on the sample set and will be different for each random set.
The interval with end points p1 and p2 is called a ‘confidence interval’. The concept of the
confidence interval is usually expressed in the following way:

P(p_1 \le p \le p_2) = C     (13)

where C is called ‘the confidence level’. The point p1 is called ‘the lower confidence limit’ and the
point p2 is called ‘the upper confidence limit’ (9, 10).

For example, if we assign C to be 95%, what is the meaning of a ‘95% confidence interval’ for the
population parameter p? To illustrate the point, let p be the mean of the population. Equation (13) is
often wrongly interpreted as ‘there is a 95% probability that the confidence interval contains the
population mean p’. However, any particular confidence interval will either contain the population
mean or it won’t. The confidence level C does have this probability value associated with it, but it is
not a probability in the normal usage, since p1 and p2 in equation (13) are not unique and are different
for each random sample selected. The correct interpretation of equation (13) is based on repeated
sampling. If samples of the same size are drawn repeatedly from a population and a confidence
interval is calculated from each sample, then we can expect 95% of these different intervals to
contain the true population mean.
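
The repeated-sampling interpretation can be demonstrated with a short simulation (illustrative
only; the population values are invented): draw many samples of size n from a population with
known standard deviation, form the interval x̄ ± 1.96σ/√n for each sample, and count how often
the interval contains the true population mean.

import random
import statistics

random.seed(1)
true_mean, sigma, n, trials = 10.0, 2.0, 30, 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    half_width = 1.96 * sigma / n ** 0.5   # 95% interval, known standard deviation
    if x_bar - half_width <= true_mean <= x_bar + half_width:
        hits += 1
print(f"{hits / trials:.1%} of the {trials} intervals contained the true mean")
# The printed proportion should be close to 95%.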

Formal definitions of terms associated with the confidence interval are provided in Appendix A,
section 2 and an example is provided in Appendix A, section 2 of how the confidence interval is
calculated for the population mean with a known standard deviation.

3.5. PUBLISHED WORK ON THE MODELLING OF POD


During the last two decades, the modelling of NDT capability has increased and improved
substantially. The models are now being used as part of PoD studies to simulate the results of
inspecting components with quite complex geometries.

3.5.1. An Overview
The savings in carrying out modelling of PoD, as opposed to the experimental determination of PoD,
have been a strong motivation in the development of such models.

The historical development of computational NDT and PoD models is discussed in some detail in a
relatively recent NTIAC publication (11), covering the period from 1977-2001. The development of
modelling PoD has focussed on NDT methods such as ultrasound, eddy currents and X-ray radiography,
and numerous publications are cited in reference 11. Whilst the models have been used to produce
PoD results for particular NDT methods and flaws, their other main contribution has been to
optimise and validate the NDT procedures.

During the 1990’s there were major research efforts in modelling NDT reliability and PoD from
Iowa State University (USA) and the National NDT Centre, Harwell (UK). Two notable publications
in the 1990’s were Thompson (12), which contained an updated review of the PoD methodology
developed for the NDT of titanium components and Wall (13), which focussed on the PC-based
models at Harwell and included corrections to PoD models due to human and environmental factors.
Both the above publications are worthy of consideration for anyone wishing to start modelling PoD
or to get a very good overview of the capabilities and usefulness of modelling PoD.

A number of models discussed in reference 11 also consider the probability of false calls (or
probability of false alarms (PFA)) and there is a good description of PFA in references (5) and (11).
PFA will not be reported here as it is outside the scope of the project.

3.5.2. The PoD-generator (The Netherlands)


A recent NDT reliability model that is worth mentioning is the ‘PoD-generator’. This particular
model was developed in the Netherlands as part of a joint industry project and presented at the 16th
world conference on NDT 2004 (14). The model allows the assessment and optimisation of an
inspection program for in-service components. The PoD-generator is really 3 models in one; the
‘degradation model’, which predicts the initiation and growth of flaws, the ‘inspection model’, which
simulates the performance of the NDT method (i.e. currently it can deal with ultrasound or
radiography) and the ‘integrity’ model’, which predicts the probability of failure. The degradation
model passes information about the flaws to the inspection model, which in turn passes information
about the inspection performance to the integrity model. A simple example of ultrasonic pulse-echo
measurements to illustrate the concept of the PoD-generator is provided in reference (14).

3.5.3. Iowa State University (USA)


The main centre of excellence in the USA for PoD studies is almost certainly Iowa State University.
In the field of modelling they have developed physically detailed models for predicting PoD. Some
of their main collaborations in the USA have been, understandably, with the aerospace industry and
the air force research laboratories. In fact, in the September 2005 NTIAC Newsletter, there was an
interesting article on developments at Iowa State regarding PoD. The article reported that the Model
Assisted PoD (MAPOD) Working Group had been established with the joint support of some major
aerospace and air force research laboratories. The MAPOD approach is based on using modelling to
determine PoD results in a way that reduces the need for the empirical approach, which can incur
substantial costs and is usually slow to deliver results. More detailed information on MAPOD can be
found in the September 2005 NTIAC Newsletter or by visiting the NTIAC website at
www.ntiac.com.

3.5.4. National NDT Centre (UK)


The National NDT Centre (NNDTC) in the UK, which is now part of ESR Technology Ltd, holds a
similar position in the UK and Europe on PoD as Iowa State holds in the USA. One of the major
contributions of the NNDTC has been in the development of computer models for predicting PoD.
However, they have also contributed to a number of national and international trials on PoD (e.g.
USA ageing aircraft programme) as well as some high profile industrial applications of PoD (see
Section 4).

On the modelling side, there is a model of the PoD for ultrasonic corrosion mapping (15), which predicts the PoD
theoretically as well as by a simulation approach. Simulated images are brought up on the screen and
the inspector can mark where flaws are seen, like a ‘spot the ball’ approach. The data is then
analysed in terms of PoD and false calls. There are also PoD models originally developed for the
European Space Agency (ESA), which deal with ultrasonic C-scanning and radiography of
composite materials. The ESA work was reported at WCNDT 2000 (16). More recent work on
modelling PoD includes the Magnetic Flux Leakage method in floor scanners and Eddy Currents for
fastener inspection in airframe structures. There are also a number of other model applications,
notably in the offshore industry and more specific information on these can be found on the NNDTC
website at www.nndtc.com.

4. THE PRACTICAL APPLICATION OF POD CURVES


4.1. HOW POD CURVES ARE USED IN INDUSTRY
PoD curves provide a reference to results that have been obtained for particular flaws using specific
NDT procedures. However, it is important to appreciate that, in using particular PoD curves for
different applications, some validation of the NDT procedures is carried out. PoD curves
provide important results for quantifying the performance capability of NDT procedures, as well as
of the operators, and could be used as a basis for:

• Establishing design acceptance requirements


• NDT procedure qualification and acceptance
• Qualification of personnel performance
• Comparing the performance capabilities of NDT procedures
• Selecting an applicable NDT procedure
• Quantifying improvements in NDT procedures
• Developing repeatable NDT data for fracture mechanics

The examples provided below of PoD applications to different industries link in well with the above
uses of PoD curves (3).

4.2. PUBLISHED WORK ON POD CURVES IN DIFFERENT INDUSTRIES


The methodology of PoD reliability studies, developed in the late 1960s and early 1970s for the
aerospace industry, has been adopted by a number of other industries and some of these will be
discussed here.

4.2.1. Aerospace (NASA)


The first general requirements to quantify the capabilities of NDT methods came with the design and
production of the NASA space shuttle system. In the past, the capability and reliability of routinely
applied NDT procedures were assumed, but no one had produced any factual evidence. For example,
knowing the smallest flaw detected by an NDT method was not much use, as there were many flaws
larger than this smallest flaw that were missed. The flaw size which was more relevant was the
largest flaw that could be missed (c.f. Figure 2). NASA initiated a research program in 1969 to
determine the largest flaw that could be missed for the materials and NDT methods that were to be
used in relation to the design and production of the space shuttle (1, 3).

4.2.2. Aircraft Structures, Inclusions in Titanium Castings


Childs et al (17) assessed X-ray radiography for the detection of ceramic inclusions in thick
Titanium (Ti) castings used in aircraft structures. The castings were manufactured using the ‘Hot
Isostatic Pressure’ (HIP) process. During the HIP process, the ceramic face coat can break into
splinters (or ‘spall’) and become embedded in the casting as ceramic inclusions called ‘shells’. The
X-ray radiography results were analysed in terms of PoD as a function of shell diameter for different
face coat formulations from different suppliers. The PoD results were used to improve the face coat
formulations so as to improve detectability.

4.2.3. NORDTEST Trials


The NORDTEST programme (18) set out to compare manual ultrasonic NDT with X-ray
radiography when applied to carbon manganese steel butt welds ≤ 25mm thick. The study was used
to establish ‘acceptance curves’ as opposed to PoD curves. The acceptance curves defined
acceptance probabilities vs. flaw height, where the acceptance probabilities were really 1 – PoD. The
results of the NORDTEST trials demonstrated that there was an approximate relationship between
certain ultrasonic NDT and radiographic NDT acceptance criteria (see also reference (19)).

4.2.4. Nuclear Components (The PISC Trials)


The Programme for the Inspection of Steel Components (PISC), carried out in the mid to late
seventies (20), was concerned with the flaw detection capabilities of ultrasonic NDT on thick walled
nuclear pressure vessel components (i.e. ~ 250mm).

The ultrasonic NDT procedures used in the trials were applied too rigidly and did not allow the
signal responses from large planar flaws to be evaluated properly. Hence, relatively low PoDs were
obtained for quite large flaws. This is a good example of poorly designed NDT procedures leading to
unexpected and low PoD results (see also the discussion below in section 5.1).

In the above PISC-I trials some of the inspectors were allowed to use their own preferred NDT
procedures. This approach proved more effective and the PoD results were much higher for the same
large flaws.

In the PISC-II trials (21), the approach of using more flexible ultrasonic NDT procedures showed
that the flaw characteristics (e.g. flaw shape, flaw geometry, orientation) had a relatively larger
influence on the final PoD results compared to other parameters of the NDT procedures.

4.2.5. Offshore tubular Joints


The underwater PoD trials at University College London in the early 1990’s considered the detection
of fatigue cracks in offshore tubular joints (22). The results of the trials were used to compare the
flaw detection capabilities of Magnetic Particle Inspection (MPI) with a number of eddy current
NDT techniques, as well as ultrasonic NDT techniques using creeping waves. For the techniques
considered, the 90-95 PoD/confidence limit combination was achieved for cracks with typical
lengths ≥ 100 mm.

4.2.6. Dutch Welding Institute (NIL)


The Dutch Welding Institute (Nederlands Instituut voor Lastechniek (NIL)) acts as a moderator of
NDT in the Netherlands, but does not have its own experts in NDT.

From the mid-1980s to the mid-1990s, NIL produced four reports based on four major joint industry
projects (JIP), which were funded and carried out by Dutch industry. One of the JIP projects (23)
involved assessing the reliability of mechanised ultrasonic NDT, in comparison with standard film
radiography and manual ultrasonic NDT, for detecting flaws in thin steel welded plates (i.e. 6mm to
15mm).

There were 244 simulated, but realistic, flaw types such as lack of penetration, lack of fusion, slag
and gas inclusions and cracks, which spanned 21 flat welded test plates. Some of the main
conclusions were:

• Mechanised ultrasonic NDT (i.e. mechanised pulse-echo and time of flight diffraction (TOFD)),
performed better than manual ultrasonic NDT with respect to flaw detection capability.
• Mechanised ultrasonic NDT was better at flaw sizing than manual ultrasonic NDT.
• Double exposure weld bevel radiography performed better than 00 film radiography.
• The detection performance did not depend on the wall thickness in the range 6mm to 12mm.

Some of the PoD values associated with this particular flaw population, and the 6mm to 12mm
plates, are as follows:

NDT Methods                                   PoD Values (%)
Mechanised ultrasonic and TOFD                60-80
Manual ultrasonic NDT                         50
00 film radiography                           65
Double exposure weld bevel radiography        95
False calls                                   10-20

It is always important in these kinds of studies not to draw too many general conclusions but simply
accept the results for this particular set of flaw specimens.

The results of this particular NIL study on PoD, along with the NORDTEST (18), PISC (20, 21) and
the underwater trials at UCL (22), are reviewed in more detail in an HSE report with a focus on
offshore technology (24).

4.2.7. Railways (National NDT Centre (UK))


During the last 5 years the NNDTC has worked with the UK rail industry’s main line and London
Underground to improve and quantify the reliability of inspection. This has included looking at the
reliability of ultrasonic near-end and far-end scan methods used on axles as well as other work on
bogie frames, wheel sets and train structures. There has been work also on the rail infrastructure
including rail inspection and edge-corner cracking issues and electromagnetic modelling.

PoD is commonly used in the rail industry to quantify reliability and to optimise the inspection
periodicity using probabilistic methods. NNDTC has produced improved estimates of PoD for
ultrasonic axle inspection. PoD estimates have also been produced for the improved NDE methods
and for new designs utilising hollow axles.

More recently, the NNDTC has developed a simulation model utilising real A-Scan data and data
from real flaws to produce PoD curves for far-end and near-end axle inspection. This enables
specific PoD curves to be produced for individual axle designs and geometries. The location and
sizes of the cracks can be altered and the effect of geometric features on detectability evaluated.

There has been a lot of interest in the industry in improved methods for NDT measurement of bogie
frames and NNDTC has been heavily involved in this, particularly for inspecting less accessible parts
of the bogie. This work included PoD trials on manual ultrasonic inspection of welds in bogie
frames (25).

4.2.8. LPG Storage Vessels
The extent of non-invasive inspection of Liquid Petroleum Gas (LPG) storage vessels has been
considered previously by Georgiou: a probabilistic model was devised for optimising flaw
detection, and a number of reports and papers were published (26-29). A guidelines document (28)
was written to assist companies and HSE inspectors to assess how much NDT was required in order
to achieve a desired probability of detecting a flaw and was based on a concept called the ‘index of
detection’ (IoD). The IoD was related to ‘Probability of Inclusion’ (PoI) curves and to a particular
PoD (a) curve (i.e. for ultrasonic NDT), which was kindly provided by NNDTC from a particular
PoD study (30).

Since the work by Georgiou, some HSE inspectors have considered the PoI curves as well as the IoD
results in the guidelines document (28). It was considered timely to assess their comments as well as
pull together all the statistical models considered so far, validate them against real data using
appropriate statistical techniques and select the best available model. The additional work on the PoI
curves and the IoD has been completed alongside this PoD work and is provided as two self-contained
reports in Appendices C and D respectively. The updated work is now considered to have
wider applications than just the ultrasonic NDT of LPG storage vessels.

5. THE LIMITATIONS OF APPLYING POD CURVES


5.1. COMMENTS ON HIT/MISS DATA AND SIGNAL RESPONSE DATA
Whilst the approaches to determine the PoD (a) function for hit/miss data and the signal response
data are quite different, the log-odds and cumulative log-normal distribution functions are very
similar for the same statistical parameters. Figure 6 shows a comparison between the log-odds and
cumulative log-normal distributions for μ = 0 and σ = 1 (6).
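
For readers who wish to reproduce a comparison of this kind, the sketch below (illustrative
only; it does not use the data behind Figure 6) evaluates the log-odds and cumulative
log-normal forms for μ = 0 and σ = 1 as functions of ln(a).

import math

def log_odds_cdf(x, mu=0.0, sigma=1.0):
    """Log-odds (logistic) form, written in terms of x = ln(a)."""
    z = (math.pi / math.sqrt(3.0)) * (x - mu) / sigma
    return math.exp(z) / (1.0 + math.exp(z))

def log_normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative log-normal form, written in terms of x = ln(a)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"ln(a) = {x:+.1f}   log-odds = {log_odds_cdf(x):.3f}   "
          f"log-normal = {log_normal_cdf(x):.3f}")
# The two curves agree closely around the middle of the range and differ most in the tails.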

On occasions the behaviour of the PoD data may appear illogical and the PoD (a) function selected
(e.g. log-odds or cumulative log-normal) may not fit the data. It may be of course that other
modelling approaches need to be considered (31, 32). However, it is useful to carry out some quick
checks to see if there is something specific about the data in order to decide what action to take.

In the case of hit/miss data it has been observed that the PoD (a) function can sometimes decrease
with flaw size (i.e. large flaws are missed more than small ones). This is usually because the NDT
experiment was poorly designed and would require a repeat of some of the trials with better-designed
NDT procedures. There was a good example of this in the PISC-I trials discussed above in section
4.2.4. There are also a number of examples of this in the NTIAC data book (5) and a particular one is
provided in Figure 7, simply to illustrate how the PoD curve can behave for such a case.

Regions of flaw hits and flaw misses should not be distinct; there has to be a good overlap (c.f.
Figure 2), otherwise the analysis that fits the log-odds model will not produce a valid solution (6).
This usually means more data is required in the region between a_smallest and a_largest (c.f. Figure 2).

It is also possible to produce what appears to be an acceptable PoD (a) function that fits the data
well, but the confidence limit decreases with increasing flaw size. This is usually evidence that the
log-odds model is not a good fit. This behaviour is usually associated with extreme or unreasonable
values of the estimated mean and standard deviation.

In signal response data, there is less reliance on the overlap of the flaw size range and more emphasis
on the linear relationship between ln(â) and ln(a). When the relationship is not linear, the cumulative
log-normal will not fit the data. This is usually associated with unreasonable values for the estimated
mean and standard deviation, and the lower confidence limit will eventually decrease with increasing
flaw size (similar to that observed with the hit/miss data). When these situations occur, it is worth
checking that the NDT experiment was designed and executed properly. Failing that, it is likely that
a different model needs to be investigated (32).

5.2. IMPORTANT OPERATING AND PHYSICAL PARAMETERS


In the Recommended Practice (1), operating parameters for each of five NDT methods are provided
(i.e. Ultrasound, Eddy Currents, Penetrants, Magnetic Particle and Radiography). Each method has a
very detailed list of both operator controlled parameters, relating to the NDT method, and physical
parameters associated with the specimen and flaws. The parameters for each NDT method are too
numerous to repeat here, but different NDT methods will be considered to assess the differences in
their respective PoD curves. In addition, the effects on the PoD curves from material properties, the
specimen geometry and the flaw characteristics will also be considered.

The NTIAC data book (5) provides information on 423 PoD curves covering eight NDT methods
(i.e. the five mentioned in the Recommended Practice (1) above as well as visual testing and two so
called emerging NDT methods, which are Holographic Interferometry and ‘Edge of Light’
inspection).

In this report, the NTIAC data book was used as the prime source of raw PoD data and these data
have been used to assess how the PoD curves are affected by the various operational and physical
parameters in the sub-headings below. It is important to note that the NTIAC data book contains only
hit/miss data and in each of the 423 PoD curves it is the log-odds model that has been used to fit the
data (using a 95% confidence limit). In all cases the actual flaw dimensions have been verified by
destructive analysis and measurement.

In order to assess the effects of the operational and physical parameters, it was necessary to find PoD
data where only one of the operational or physical parameters changed while the other parameters
remained the same. This was not always completely clear, as there were always some uncertainties,
notwithstanding the data sheets indicating which parameters were nominally the same and which
were different. The examples selected cover a range of NDT methods and help to illustrate the kind
of differences that can exist between PoD results, but without any deliberate attempt to maximise
these differences.

Before observing the effects of certain operational and physical parameters on the PoD results, it is
important to note the following points:

• The PoD data in the NTIAC data book (5) was collected about 30 years ago and may not
necessarily reflect current capabilities with modern digital instrumentation.
• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in
question. It would be wrong to draw general conclusions about PoD values (e.g. ultrasound is
better than X-ray). The figures merely serve to illustrate the possible effects that the physical and
operational parameters can have on the PoD and that we should be aware of these effects when
quoting PoD results.
• Equipment ‘calibration’ is also one of the important variables in the application of an NDT
procedure. It is believed that no attempt was made to resolve calibration issues in collecting the
inspection data.
• The designated operators A, B and C recorded in the datasets are not necessarily the same 3
persons each time.

Notwithstanding the above points, the PoD datasets in the NTIAC data book (5) are considered a
rich, comprehensive and valid set of data, which would almost certainly be prohibitively expensive
to repeat by any one organisation using more modern digital technology. Such data does not appear
to exist elsewhere in such an easily accessible and consistent format in which to illustrate the
comparisons below.

5.2.1. NDT Method


To illustrate the differences in the PoD curves that can occur for different NDT methods, the NTIAC
data book was used to identify the NDT carried out on the same flaw specimen and by the same
designated operator. Whether the designated operator (e.g. operator C) is precisely the same person
in each case is not absolutely clear. However, it is believed that the cases selected offer a reasonable
independent measure of the differences.

Two Titanium flat plates (i.e. thicknesses 1.7mm and 5.7mm) with a total of 135 cracks were
inspected by the same designated operator using manual eddy currents, manual ultrasound (surface
waves) and X-ray radiography. The PoD curves for each method are plotted in Figure 8 and show the
differences in the PoD curves for a particular dataset. In particular, the flaw size corresponding to the
90% PoD varies significantly for each NDT method (i.e. 3.4mm, 14.8mm and 18.5mm for
ultrasound, eddy currents and X-ray radiography respectively).

5.2.2. Fluorescent Penetrant NDT


In the case of fluorescent penetrant NDT, two datasets were considered which quantify the
differences in PoD between the cases of no developer and developer being used to reveal surface
flaws (Figure 9). Best practice cleaning procedures were followed between inspections. Whilst
surface lengths were measured, the depths were predicted from validated crack growth procedures.

The PoD differences in Figure 9 for this particular flaw specimen (i.e. Haynes 188 alloy, AMS 5608A, with a 125 RMS surface finish and dimensions 3.5in x 16in x 0.19in) are quite large, with the 90% PoD not being achieved without the developer.

5.2.3. Material Properties


To assess the differences in PoD for different materials being inspected with the same NDT method,
the same size flaw specimen with the same nominal flaws had to be found. Clearly, this was not
going to be 100% possible with different material specimens. However, by a close examination of
the datasheets accompanying each dataset, it was possible to find flaw specimens that were
produced in the same way with the intention of producing the same flaw types.

The three PoD results illustrated in Figure 10 are for three different materials (i.e. aluminium,
titanium and steel). The datasheets for these three datasets suggest that the only physical difference is
the material, although the width of the steel plate is different from the other two. The thickness of all
three is the same and they are in the same ‘as machined’ state. The flaws were all initiated using the
same mechanism and cover a similar length range, but clearly the flaws in each different material
specimen will not be identical.

It is worth mentioning that these particular steel PoD results improved dramatically once the
specimen went beyond the ‘as machined’ state (e.g. etching and proof loading).

5.2.4. Specimen Weld Geometry


The aim here was to look for PoD data that had different weld conditions. Since the NTIAC data
book has considered mainly flat panel specimens and bolt holes, there was not a V-butt weld to
compare with a J-prep weld, for example. The closest was a particular comparison for the same flaw
specimen, but with a different condition of the weld. The PoD results for aluminium welds with
crowns, was compared to PoD results with the welds ground flush. The NDT method used was X-ray
radiography and the results are illustrated in Figure 11.

The differences in PoD are relatively small and the 90% PoD was not achieved in either case, although for the 'welds ground flush' case the 90% PoD was very nearly reached at 0.75in (~19mm).

5.2.5. Flaw Characteristics


To illustrate the possible effect on the PoD from inspecting different flaws, fluorescent penetrant
NDT was considered for longitudinal cracks and transverse cracks covering the same flaw length
range. The datasets for two specimens were found that were physically the same apart from the
flaws. The PoD results for this comparison are illustrated in Figure 12. The transverse flaws are
associated with much lower PoD values for relatively small crack lengths (i.e. below about 0.15in (~4mm)), but the PoD results are more similar for larger crack lengths (i.e. above about 0.25in (~6mm)), beyond which both PoD results converge to unity.

5.2.6. Human Reliability


This is an area which has been researched extensively in the UK at the NNDTC (c.f. reference 13). It
would be very easy to show some quite startling differences in detection capability based on human
reliability studies. In manual ultrasonic NDT, for example, an often quoted anecdote is ‘you can only
believe a manual ultrasonic NDT result 50% of the time’. Perhaps this originated from typical
differences observed in the past between two operators in various detection trials (c.f. the NIL study
(23) discussed in section 4.2.6). The ‘50% anecdote’ is believed to be too simplistic for many
situations and more information needs to be considered.

The NTIAC data book (5) does contain a great deal of data where the physical and operational
parameters are the same, for a particular NDT method, and the only difference is the operator.
However the data book did not set out to study human reliability. Nevertheless, it does appear that
for nearly every datasheet there are 3 PoD results (i.e. according to operators A, B, C). There are a
number of cases where the differences in the PoD results are significant and others where the PoD
results are very similar.

The cases selected for illustrating operator variability here are illustrated in Figure 13 for ultrasonic
immersion NDT, inspecting titanium plates with low cycle fatigue cracks.

Whilst the results appear similar, and the same order of flaws is missed by each operator, the 90%
PoD varies by at least a factor of 2 (c.f. operator A with operators B and C).

6. DISCUSSION
This section brings together all the salient features of this study. It is much more than an Executive
Summary and is aimed particularly at HSE inspectors who want a reasonably quick understanding of
PoD without having to read the whole report.

6.1. INTRODUCTION
There is a large amount of ‘Probability of Detection’ (PoD) information available; starting with the
pioneering work in the late 1960’s to early 1970’s for the aerospace industry, to more recent and
more general industrial applications.

PoD curves have been produced for a range of Non-Destructive Testing (NDT) methods (e.g.
ultrasound, radiography, eddy currents, fluorescent penetrants). Whilst it is reasonable to assume that
each NDT method will produce different PoD curves (even when applied to the same flaws), it is
believed that many who use PoD curves do not fully appreciate how significant the differences can
be. PoD curves are also dependent on a number of physical and operational parameters.

It is important to understand how PoD curves are derived and to question the validity of their
application, as well as to appreciate their limitations. This discussion aims to provide this
information as well as considering the relevance of PoD to fitness for service issues.

6.2. AIMS AND OBJECTIVES


The overall goal is to provide concise and understandable information on PoD curves, which Health
and Safety Inspectors should find useful when discussing safety cases involving PoD curves. The
specific objectives are:

• To provide a clear and understandable description of how PoD curves are derived.
• To provide practical applications of PoD curves, particularly to fitness for service issues.
• To quantify the limitations of PoD curves.

6.3. HISTORICAL DEVELOPMENT


An early definition of NDT reliability is, 'the probability of detecting a crack in a given size group
under the inspection conditions and procedures specified' (1). The PoD is usually expressed as a
function of flaw size (i.e. length or depth), although it is a function of many physical and operational
parameters.

Repeat inspections of the same flaw size or the same flaw type do not necessarily result in consistent
hit or miss indications. There is a spread of detection results, which is why the detection capability is
expressed in terms of the PoD. An early example is illustrated by Lewis et al (2), where 60 air force
inspectors, using the same surface eddy-current technique, inspected 41 cracks around fastener holes.
The results in Figure 1 show that the chances of detecting the cracks increase with crack size.
However, none of the cracks were detected 100% of the time and different cracks of the same size
have different detection percentages. Figure 1 also shows that the ‘log-odds’ distribution is a good fit
to the data and illustrates why PoD is an appropriate measure of detection capability.

PoD functions have been the subject of many studies since the late 1960’s and early 1970's, where
most of the work was carried out in the aerospace industry (3, 4). It was becoming clear that to
ensure the structural integrity of critical components, the question ‘…what is the smallest flaw that
can be detected by an NDT method?' was less appropriate than the question '…what is the largest
flaw that can be missed?’ To elaborate on this, some real ultrasonic inspection data has been
considered from the ‘Non-destructive Testing Information Analysis Centre’ (NTIAC) capabilities
data book (5). Figure 2 illustrates the detection capabilities of an ultrasonic surface wave inspection
of two flat aluminium plates, containing a total of 311 simulated fatigue cracks with varying depths.
The flaws were recorded as detected (or hit) with PoD=1, or missed with PoD=0. In Figure 2, there are three distinct regions separated by the lines a_smallest and a_largest. The region between a_smallest and a_largest contains flaws of the same size that are both hit and missed, and a_largest is much larger than a_smallest.

In 1969, a program was initiated by the National Aeronautics and Space Administration (NASA) to
determine the largest flaw that could be missed, for various NDT methods to be used in the design
and production of the space shuttle. The methodology by NASA was soon adopted by the US Air

Force as well as the US commercial aircraft industry. In the last two decades many more industries
have adopted similar NDT reliability methods based on PoD. Some of these will be discussed below.

In the mid-1970s, a constant PoD for all flaw types of a given size was proposed and
Binomial distribution methods were used to estimate this probability, along with an associated error
or ‘lower confidence limit’ (1). It is clear from Figure 1, that this early assumption about a constant
PoD for flaws of a given size was too simplistic.

In the early to mid-1980s, the approach was to assume a more general model for the PoD vs. flaw
size ‘a’. Various analyses of data from reliability experiments on NDT methods indicated that the
PoD (a) function could be modelled closely by either the 'log-logistic' (or ‘log-odds’) distribution or
the cumulative 'log-normal' distribution (6).

6.4. FLAW SAMPLE SIZES FOR ‘HIT/MISS’ DATA AND ‘SIGNAL RESPONSE’ DATA
The ‘Recommended Practice’ (1), originally prepared for the aircraft industry, provides
comprehensive information on the experimental sequence of events for generating data to produce
PoD curves and to ‘certify’ (i.e. validate) an NDT method or procedure.

The sequence of events can be broadly summarised as follows (see also (3)):

• Manufacture or procure flaw specimens with a large number of flaw sizes and flaw types
• Inspect the flaw specimens with the appropriate NDT method
• Record the results as a function of flaw size
• Plot the PoD curve as a function of flaw size

However, before the manufacture or procurement of flaw specimens, it is necessary to ask:

• What flaw parameter size will be used (e.g. flaw length or flaw depth)?
• What overall flaw size range is to be investigated (e.g. 1mm to 9mm)?
• How many intervals are required within the flaw size range?

The recommended practice (1) also provides critical information on the flaw sample size, for each
flaw width interval, in order to achieve the desired PoD and the appropriate lower confidence limit.

It is important to appreciate that in selecting the sample size there are two distinct issues to address.
First, the sample size has to be large enough to achieve the desired PoD and confidence limit
combination. Second, the sample size has to be large enough to determine the statistical parameters,
associated with the PoD curve that best fits the data.

Originally, NDT results were always recorded in terms of ‘hit/miss’ data (c.f. Figure 2), which is
discrete data. This way of recording data is still appropriate for some NDT methods (e.g. magnetic
particle testing). However, in many inspections there is more information in the NDT response (e.g.
the light intensity in fluorescent NDT). Since the NDT signal response can be interpreted as the
perceived flaw size, the data is often called â, that is, ‘a hat’ or ‘signal response’ data, which is
continuous data.

6.4.1. Model for Hit/Miss Data


For hit/miss data a number of different statistical distributions have been considered (7). It was found
that the log-logistic distribution was the most acceptable and the PoD (a) function can be written as:

\[
\mathrm{PoD}(a) = \frac{\exp\left[\frac{\pi}{\sigma\sqrt{3}}\left(\ln a - m\right)\right]}{1 + \exp\left[\frac{\pi}{\sigma\sqrt{3}}\left(\ln a - m\right)\right]}
\]

where a is the flaw size and m and σ are the median and standard deviation respectively.

Another convenient form of the above equation can be written as:


\[
\mathrm{PoD}(a) = \frac{e^{(\alpha + \beta \ln a)}}{1 + e^{(\alpha + \beta \ln a)}}
\]

and it is straightforward to show that:

\[
\ln\left[\frac{\mathrm{PoD}(a)}{1 - \mathrm{PoD}(a)}\right] = \alpha + \beta \ln a
\]

where \( m = -\dfrac{\alpha}{\beta} \) and \( \sigma = \dfrac{\pi}{\beta\sqrt{3}} \)

Hence the name ‘log-odds’ (i.e. odds = probability of success/probability of failure, c.f. Figure 1) and

\[ \ln(\mathrm{odds}) \propto \ln a \]
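
As an illustration of the (α, β) form above, the following Python sketch fits the log-odds model to hit/miss data by maximising the Bernoulli likelihood and converts the result to (m, σ). It is only an outline of the maximum likelihood approach described in reference 6, and the flaw sizes and hit/miss outcomes shown are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def fit_log_odds(a, hits):
    """Maximum likelihood fit of PoD(a) = e^(alpha + beta*ln a) / (1 + e^(alpha + beta*ln a)).

    a    : flaw sizes
    hits : 1 for a detection (hit), 0 for a miss
    Returns (alpha, beta) and the equivalent (m, sigma) of the log-odds model.
    """
    x, y = np.log(np.asarray(a, float)), np.asarray(hits, float)

    def neg_log_likelihood(params):
        alpha, beta = params
        z = alpha + beta * x
        # Bernoulli log-likelihood: sum of y*z - ln(1 + e^z)
        return -np.sum(y * z - np.logaddexp(0.0, z))

    alpha, beta = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead").x
    m, sigma = -alpha / beta, np.pi / (beta * np.sqrt(3.0))
    return alpha, beta, m, sigma

# Hypothetical hit/miss data: larger flaws are detected more often
sizes = [0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0]
hits  = [0,   0,   1,   0,   1,   1,   0,   1,   1,   1]
print(fit_log_odds(sizes, hits))
```

Note that a well-conditioned fit relies on the overlap between hits and misses discussed in section 5.1; with completely separated hit and miss regions the likelihood has no finite maximum.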

6.4.2. Model for Signal Response Data


For signal response data it has been observed (6, 8) that an approximate linear relationship exists
between ln(â) and ln(a), where a is the flaw size. The relationship is often expressed by:

\[ \ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \epsilon \]

where ε is an error term, normally distributed with zero mean and constant standard deviation σ_ε. The above relationship expresses the fact that ln(â) is normally distributed with mean μ(a) = α₁ + β₁ ln(a) and constant standard deviation σ_ε (i.e. N(μ(a), σ_ε²)).

The PoD (a) function for signal response data (i.e. ln(â)) can be expressed as:

\[ \mathrm{PoD}(a) = \mathrm{Probability}\left(\ln(\hat{a}) > \ln(\hat{a}_{th})\right) \]

where ln(â_th) is the flaw evaluation threshold.

Using standard statistical notation (9), the PoD for signal response data can be expressed as:

\[ \mathrm{PoD}(a) = 1 - F\left[\frac{\ln(\hat{a}_{th}) - (\alpha_1 + \beta_1 \ln(a))}{\sigma_\epsilon}\right] \]

where F is the continuous cumulative distribution function.

It is straightforward to show, using the symmetric properties of the Normal distribution (9), that:

\[ \mathrm{PoD}(a) = F\left[\frac{\ln(a) - \mu}{\sigma}\right] \]

which is the cumulative log-normal distribution where

the mean \( \mu = \dfrac{\ln(\hat{a}_{th}) - \alpha_1}{\beta_1} \) and the standard deviation \( \sigma = \dfrac{\sigma_\epsilon}{\beta_1} \).

The estimates for α₁, β₁ and σ_ε are computed using ‘maximum likelihood’ methods (6).
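
The signal response model can be sketched in Python in the same way. The parameter values below are hypothetical placeholders; in practice α₁, β₁ and σ_ε would come from the maximum likelihood fit of ln(â) against ln(a).

```python
import numpy as np
from scipy.stats import norm

def pod_signal_response(a, alpha1, beta1, sigma_eps, a_hat_th):
    """Cumulative log-normal PoD(a) for signal response (a-hat) data.

    Assumes ln(a_hat) = alpha1 + beta1*ln(a) + eps with eps ~ N(0, sigma_eps^2),
    and that a flaw is reported when a_hat exceeds the evaluation threshold a_hat_th.
    """
    mu = (np.log(a_hat_th) - alpha1) / beta1   # mean of the PoD curve on the ln(a) scale
    sigma = sigma_eps / beta1                  # standard deviation on the ln(a) scale
    return norm.cdf((np.log(a) - mu) / sigma)  # PoD(a) = F[(ln a - mu) / sigma]

# Hypothetical parameters: threshold of 2 units of signal, modest scatter
for a in (1.0, 2.0, 3.0, 5.0):
    print(a, round(pod_signal_response(a, alpha1=0.2, beta1=1.1, sigma_eps=0.4, a_hat_th=2.0), 3))
```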

6.4.3. To Compute PoD parameters


In order to determine the parameters associated with the PoD (a) function, for hit/miss data, it is
recommended that the flaw sizes be uniformly distributed between the minimum and maximum flaw
size of interest, with a minimum of 60 flaws (6).

For signal response data, a direct consequence of the additional information is that the range of flaw
sizes is not as critical. The recommendation is a minimum of 30 flaws in the sample size (6).

6.4.4. To Achieve the Desired PoD/Confidence Limit Combination


In practice, a PoD and lower confidence limit combination often quoted is 90% and 95%
respectively. For hit/miss and signal response NDT data (1), it is necessary to have a minimum
sample of 29 flaws in each flaw width interval. This could be interpreted as 29 flaw specimens with
one flaw in each specimen. This means that if 6 flaw width intervals were used, a minimum of 174
flaw specimens would be necessary, a considerable cost to produce PoD curves experimentally.

With such a large number of flaws, the requirement to compute the PoD (a) function parameters, as
discussed in section 6.4.3 above, is easily satisfied.
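
The origin of the 29-flaw figure can be seen from a one-line binomial check (a sketch of the standard argument, not a quotation from reference 1): if the true PoD were only 0.90, the chance of detecting all 29 flaws in an interval is just under 5%, so a 29-out-of-29 result demonstrates at least 90% PoD at the 95% confidence level.

```python
# Binomial check behind the 29-flaw rule: probability of 29 hits out of 29
# trials if the true PoD were exactly 0.90.
p_all_hits = 0.90 ** 29
print(round(p_all_hits, 3))   # ~0.047, i.e. below the 5% significance level
```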

6.5. POD MODELLING


During the last two decades, the modelling of NDT capability has increased and improved
substantially. The savings from modelling PoD, as opposed to determining PoD experimentally, have been a strong motivation in developing models.

The historical development of computational NDT and PoD models is discussed in some detail in a
relatively recent NTIAC publication (11), covering the period from 1977-2001. The development of
modelling PoD has focussed on NDT methods such as ultrasound, eddy currents and X-ray radiography, and numerous publications are cited in reference 11.

During the 1990’s there were major research efforts in modelling NDT reliability and PoD from
Iowa State University (USA) and the National NDT Centre, Harwell (UK). Two notable publications
in the 1990’s were Thompson (12), which contained an updated review of the PoD methodology
developed for the NDT of titanium components and Wall (13), which focussed on the PC-based
models at Harwell and included corrections to PoD models due to human and environmental factors.
Both the above publications are worthy of consideration for anyone wishing to start modelling PoD
or to get a very good overview of the capabilities and usefulness of modelling PoD.

In recent years both Iowa State and the National NDT Centre (NNDTC) have continued developing
models to determine PoD results. Iowa State has established the Model Assisted PoD (MAPOD)
working group, with the joint support of some major aerospace and airframe research laboratories.
The NNDTC, which is now part of ESR Technology Ltd, has continued to develop some interesting PoD models for composites (16), magnetic flux leakage in floor scanners and other PoD applications in the offshore industry (see www.nndtc.com).

A recent interesting NDT reliability model is the ‘PoD-generator’ (14). The model allows the
assessment and optimisation of an inspection program for in-service components using ultrasound
and radiography.

6.6. PRACTICAL APPLICATIONS OF POD


The methodology of PoD reliability studies, developed in the 60s and 70s, for the aerospace industry
has been adopted by a number of other industries and some of these are discussed below.

6.6.1. Aircraft Structures, Inclusions in Titanium Castings


Childs et al (17) assessed X-ray radiography for the detection of ceramic inclusions in thick Titanium (Ti) castings for aircraft structures. The castings were manufactured using the ‘Hot Isostatic Pressure’ (HIP) process. During this process, the ceramic face coat can break into splinters
and become embedded in the casting as inclusions (i.e. ‘shells’). The X-ray radiography results were
analysed in terms of PoD of the shell diameter, for different face coat formulations, and the results
were used to improve the face coat formulations and also improve detectability.

6.6.2. NORDTEST Trials


The NORDTEST trials (18) set out to compare manual ultrasonic NDT with X-ray radiography when
applied to carbon manganese steel butt welds ≤ 25mm thick. The trials were used to establish
‘acceptance curves’, which defined acceptance probabilities (i.e. 1-PoD) against flaw height. The
results of the NORDTEST trials demonstrated that there was an approximate relationship between
certain ultrasonic NDT and radiographic NDT acceptance criteria (see also reference (19)).

6.6.3. Nuclear Components (The PISC Trials)


The Programme for the Inspection of Steel Components (PISC), carried out in the mid to late
seventies (20), considered flaw detection capabilities of ultrasonic NDT on thick nuclear pressure
vessel components (i.e. ~ 250mm). The ultrasonic NDT procedures were applied too rigidly and
signal responses from large planar flaws were not evaluated properly, resulting in low PoD values.
However, some of the inspectors were also allowed to use their own preferred NDT procedures,
which proved more effective and the PoD results were much higher for the same large flaws.

In the PISC-II trials (21), the approach of using more flexible ultrasonic NDT procedures showed
that the flaw characteristics (e.g. flaw shape, flaw geometry, orientation) had a relatively larger
influence on the final PoD results compared to other physical parameters.

6.6.4. Offshore Tubular Joints


The underwater PoD trials at University College London in the early 1990’s considered the detection
of fatigue cracks in offshore tubular joints (22). The results of the trials were used to compare the
flaw detection capabilities of Magnetic Particle Inspection (MPI) with a number of eddy current
NDT techniques as well as ultrasonic creeping wave NDT techniques. The 90-95 PoD/Confidence limit
combination was being achieved for cracks with typical lengths ≥ 100 mm.

6.6.5. Dutch Welding Institute (NIL)
During the mid-90s, the Dutch Welding Institute (Nederlands Instituut voor Lastechniek (NIL))
produced a report (23) involving, amongst other NDT methods, the reliability of mechanised
ultrasonic NDT for detecting flaws in thin steel welded plates (i.e. 6mm to 15mm).

Some of the main conclusions for ultrasonic NDT were:


• Mechanised ultrasonic NDT and time of flight diffraction (TOFD), had a higher flaw detection
capability than manual ultrasonic NDT (i.e. PoD of 60%-80% compared to 50% respectively).
• Mechanised ultrasonic NDT was better at flaw sizing than manual ultrasonic NDT.
An HSE report, focussing on offshore technology (24), carried out a detailed review of the
NORDTEST trials, the PISC trials, the underwater trials at UCL and the NIL PoD study.

6.6.6. Railways
During the last 5 years the NNDTC has worked with the UK main line rail industry and London Underground to improve and quantify the reliability of inspection.

PoD methods are commonly used in the rail industry to quantify reliability and to optimise inspection
periodicity. The NNDTC has developed a simulation model utilising real A-Scan data and data from
real flaws to produce PoD curves for far-end and near-end axle inspection. The NNDTC has also been heavily involved in PoD trials on manual ultrasonic inspection of welds in bogie frames (25).

6.6.7. LPG Storage Vessels


The NDT of LPG storage vessels was considered by Georgiou and a probabilistic model for
optimising flaw detection was developed. The reports and papers published (26-29) included a guidelines document (28), written to assist companies and HSE inspectors in assessing the amount of NDT required in order to achieve a desired PoD, which used a concept called the ‘Index of Detection’ (IoD).

In the meantime, some HSE inspectors have considered the IoD work. It was timely to assess their
comments as well as pull together the statistical models considered so far, validate them against
different data using statistical techniques and select the best available model. This additional work
has been completed alongside this PoD study and is now considered to have wider applications than
just the ultrasonic NDT of LPG storage vessels (c.f. Appendices C and D).

6.7. DEPENDENCE OF POD ON OPERATIONAL AND PHYSICAL PARAMETERS


6.7.1. Important Operational and Physical Parameters
The NTIAC data book (5) has been the prime source of raw PoD data (i.e. 423 PoD curves)
for assessing the effects on PoD curves by the operational and physical parameters. The
NTIAC data book contains only hit/miss data and in each of the 423 PoD curves it is the log-
odds model that has been used to fit the data, along with a 95% confidence limit.

To assess the effects of the parameters, PoD data was considered where only one of the parameters
changed while the other parameters remained the same. The examples selected cover a range of NDT
methods and help to illustrate the kind of differences that can exist between PoD results, but without
any deliberate attempt to maximise these differences.

Before observing the effects of certain parameters on the PoD curves, it is important to note the
following points:
• The PoD data in the NTIAC data book was collected about 30 years ago and may not necessarily
reflect current capabilities with modern digital instrumentation.
• The PoD data illustrated in the figures to follow are valid for the particular datasets in question.
It would be wrong to draw general conclusions about PoD values. The figures merely serve to
illustrate the possible effects that the parameters can have on the PoD and that we should be
aware of these effects when quoting PoD results.
• Equipment ‘calibration’ is an important variable in the application of an NDT procedure. It is
believed that no attempt was made to resolve this issue in collecting the inspection data.
• The designated operators A, B and C recorded in the datasets are not necessarily the same 3
people each time.
Notwithstanding the above points, the PoD datasets in the NTIAC data book are a rich and
comprehensive set of data, which would almost certainly be prohibitively expensive to repeat by any
one organisation using more modern digital technology. Such data does not appear to exist elsewhere
in such an easily accessible and consistent format to illustrate the comparisons below.

6.7.2. NDT Method


Two Titanium flat plates (i.e. thicknesses 1.7mm and 5.7mm) with a total of 135 cracks were
inspected by the same designated operator using manual eddy currents, manual ultrasound (surface
waves) and X-ray radiography. The PoD curves for each method are plotted in Figure 8 and show the
differences in the PoD curves for a particular dataset.

6.7.3. Fluorescent Penetrant NDT


Two datasets were considered which quantify the differences in PoD between the cases of no
developer and developer being used to reveal surface flaws (Figure 9). Whilst surface lengths were
measured, the depths were predicted from validated crack growth procedures.

6.7.4. Material Properties


The three PoD results illustrated in Figure 10 are for aluminium, titanium and steel. The datasheets
for these datasets suggest the only physical difference is the material, although the width of the steel
plate is different from the other two. The thickness of all three is the same and they are in the same
‘as machined’ state. The flaw types were all initiated using the same mechanism. However, the flaws
in each different material specimen are clearly not identical. It is worth noting that the particular PoD results for steel improve dramatically once the specimen goes beyond the ‘as machined’ state (e.g.
etching and proof loading).

6.7.5. Specimen Weld Geometry


The NTIAC data book contains data on flat panel specimens and bolt holes, with no V-butt welds to
compare with J-prep welds, for example. However, there are particular PoD results for aluminium
welds with crowns and PoD results with the same aluminium welds ground flush. The NDT method
used was X-ray radiography and the results are illustrated in Figure 11.

6.7.6. Flaw Characteristics


Fluorescent penetrant NDT was considered for inspecting longitudinal cracks and transverse cracks
covering the same flaw length range. The PoD results for this comparison are illustrated in Figure 12.
The transverse flaws are associated with much lower PoD values for relatively smaller crack lengths
(i.e. below about 4mm), but are more similar for larger crack lengths (i.e. above about 6mm).

6.7.7. Human Reliability
It would be very easy to show significant differences in detection capability based on human
reliability studies. In manual ultrasonic NDT an often quoted anecdote is, ‘you can only believe a
manual ultrasonic NDT result 50% of the time’. Perhaps this originated from typical differences
observed in the past (c.f. the NIL study (23)). The ‘50% anecdote’ is believed to be too simplistic for
many situations and more information is required. The NTIAC data book does contain a great deal of
data where the only difference is the operator (i.e. operators A, B, C).

The cases selected for illustrating operator variability here are illustrated in Figure 13 for ultrasonic
immersion NDT, inspecting titanium plates with low cycle fatigue cracks.

7. INDEPENDENT VERIFICATION
The independent verification was carried out by George A Georgiou (GAG), Emilie Beye (EB), and
Melody Drewry (MD). Whilst GAG is the author of this report, there were specific aspects of the
statistical theory and calculations in Appendix C that he did not carry out and particular checks were
carried out in relation to the formal statistical tests and the datasets used in those tests. Similarly, EB
and MD, who are co-authors of Appendix C, were not involved at all in the main study and
numerous checks were carried out by them in the following areas:

• Derivation of the mathematical equations


• The statistical definitions and terminologies in Appendix A
• The figures and the corresponding datasets (i.e. Figure 2 and Figures 7 – 13)
• The comparison of the Probability of Inclusion curves in Appendix C
• The updated Index of Detection model in Appendix D
• The Conclusions and Recommendations

A formal verification statement is made in section 12.

8. CONCLUSIONS
• The ‘log-odds’ distribution is found to be one of the best fits for hit/miss NDT data.
• The log-normal distribution is found to be one of the best fits for signal response NDT data, and
in particular for flaw length and flaw depth data as determined by ultrasonic NDT.
• In some cases, the ‘log-odds’ and cumulative log-normal distributions are very similar, but there
are many cases where they are significantly different.
• There are NDT data for which neither the ‘log-odds’ nor the log-normal distribution is appropriate and other distributions need to be considered.
• There is often a large gap between the smallest flaw detected and the largest flaw missed.
• Very small or very large flaws do not contribute much to the PoD analysis of hit/miss data.
• To achieve a valid ‘log-odds’ model solution for hit/miss data, a good overlap between the
smallest flaw detected and the largest flaw missed is necessary.
• To achieve a valid log-normal model solution for signal response data, there is less reliance on
flaw size range overlap, but more on the linear relationship between ln(â) and ln(a).
• When the PoD (a) function decreases with increasing flaw size, it is usually an indication that
the NDT procedures are poorly designed.
• When the lower confidence limit decreases with increasing flaw size, notwithstanding an
acceptable PoD (a) function, it is usually associated with extreme or unreasonable values of the
mean and standard deviation.
• The effect on PoD results for particular operational and physical parameters can be significant
for datasets selected from the NTIAC data book of PoD curves.

• The PoD data in the NTIAC data book were collected some 30 years ago and may not
necessarily reflect current capabilities with modern digital instrumentation. However, the results
are still believed to be relevant to best practice NDT.
• The PoD data illustrated in each of the figures 7 – 13 are valid for the particular datasets in
question. It would be wrong to draw too many general conclusions about the particular PoD
values (e.g. ultrasound is better than X-ray).
• Figures 7 - 13 serve to illustrate the possible effects that the physical and operational parameters
can have on the PoD and an awareness of these effects is important when quoting PoD results.
• NDT methods, equipment ‘calibration’, fluorescent penetrant developers, material, surface
condition, flaws and human factors are all important operational and physical parameters, which
can have a significant effect on PoD results.
• Whilst human factors are important variables in NDT procedures, they are often found not to be
as important as other operational and physical variables.
• The ‘log-odds’ distribution was found to be the most appropriate distribution to use with the JCL ‘Probability of Inclusion’ model.
• The earlier JCL ‘Probability of Inclusion’ model has been validated against an independently
developed ‘Probability of Inclusion’ model by MBEL.

9. RECOMMENDATIONS
• Publish a signal response data book of PoD results.
• Publish a more up-to-date data book from different PoD studies, collated in a way which best serves more general industrial and modelling applications.
• Set up a European style project or Joint Industry Project to realise the above recommendations.

10. ACKNOWLEDGEMENTS
The author would like to acknowledge the organisations ASM and NTIAC for giving permission to
reprint and re-plot various figures in this study. Special acknowledgement goes to Ward Rummel, the
author of the NTIAC data book, for his advice during discussions on various PoD datasets in the data
book.

The author would also like to acknowledge Martin Wall (ESR Technology Ltd) for highlighting the
PoD research work that has been carried out at the NNDTC over the last two decades.

Lastly, the author would like to thank the HSE for funding this work and in particular to Graeme
Hughes for his guidance and useful discussion throughout this study.

11. REFERENCES
1. Rummel W D: ‘Recommended practice for a demonstration of non-destructive evaluation (NDE)
reliability on aircraft production parts’. Materials Evaluation Vol. 40 August 1982.
2. Lewis W H, Sproat W H, Dodd B D and Hamilton J M: ‘Reliability of non-destructive inspection
– Final Report’. SA-ALC/MME 76-6-38-1, San Antonio Air Logistics Centre, Kelly Air Force
Base, Texas, 1978.
3. Rummel W D: ‘Probability of detection as a quantitative measure of non-destructive testing end-
to-end process capabilities’, 1998. www.asnt.org/publications/materialseval/basics/Jan98.
4. AGARD Lecture Series 190: ‘A recommended methodology to quantify NDE/NDI based on
aircraft engine experience’, April 1993, ISBN 92-835-0707-X
5. NTIAC Non-destructive Evaluation (NDE) capabilities data book, 3rd ed., November 1997,
NTIAC DB-97-02, Non-destructive Testing Information Analysis Centre.

6. Berens A P: ‘NDE reliability data analysis’, non-destructive evaluation and quality control:
qualitative non-destructive evaluation'. ASM Metals Data book, Volume 17, Fifth printing,
December 1997, ISBN 0-87170-007-7 (v.1).
7. Berens A P and Hovey P W: ‘Evaluation of NDE reliability characterisation’, AFWAL-TR-81-
4160, Vol. 1, Air Force Wright-Aeronautical Laboratories, Wright-Patterson Air Force Base,
December 1981.
8. Sturges D J: ‘Approaches to measuring probability of detection for subsurface flaws’, Proc. 3rd
Ann. Res. Symp., ASNT 1994 Spring Conference, New Orleans, 1994, pp229-231.
9. Crawshaw J and Chambers J: ‘A concise course in A-level statistics’, 1984, Stanley Thornes
(Publishers) Ltd. ISBN 0-7487-0455-8
10. Kreyszig E: ‘Advanced engineering mathematics’, John Wiley & Sons, Inc. 1983 (5th Edition,
p947).
11. Matzknin G A and Yolken HT: ‘Probability of detection (PoD) for non-destructive evaluation
(NDE)’, NTIAC-TA-00-01, August 2001.
12. Thompson R Bruce: ‘Overview of the ETC PoD methodology’, Review of Progress in
Quantitative Non-destructive Evaluation, Vol. 18b, Plenum Press, New York, July 19-24, 1998,
pp2295-2304.
13. Wall M: ‘Modelling of NDT reliability and applying corrections for human factors’, European
American Workshop, Determination of Reliability and Validation Methods of NDE, Berlin, June
18-20, 1997, pp87-98.
14. Volker A W F, Dijkstra F H, Terpstra S, Herrings H A M and Lont M A: ‘Modelling of NDE
reliability: Development of a PoD-Generator’, Proceedings of the 16th WCNDT, Montreal,
Canada, August 30-September 3, 2004.
15. Burch S F, Stow B A and Wall M: ‘Computer modelling for the prediction of probability of
detection of ultrasonic corrosion mapping’, Insight, Vol. 47 No 12 Dec 2005. (This can also be
downloaded from the BINDT website www.bindt.org).
16. Wall M and Burch S: ‘Worth of modelling for assessing the intrinsic capability of NDT’, 15th
World Conference on NDT, WCNDT15 Rome, October, 2000. (This can also be downloaded
from the following website www.ndt.net/article/wcndt00/papers/idn735/idm735.htm).
17. Childs F R, Phillips D H, Liese L W and Rummel W D: ‘Quantitative assessment of the
detectability of ceramic Inclusions in structural titanium castings by X-ray radiography’. Review
of Progress in QNDE Vol. 18B, 1999, pp2311-2317, Editors Thompson D O and Chimenti D E.
18. NORDTEST Report: 'Guidelines for NDE reliability determination and description'. NT TECHN
Report 394, 1998.
19. Kenzie B W, Mudge P J and Pisarski H G: ‘A methodology for dealing with uncertainties in
NDE data when using inputs to fracture mechanics analyses’, Proc. 13th International Conference
on NDE in the nuclear and pressure vessel industries, Kyoto, ASM International, 1995.
20. Commission of the European Communities, PISC-I, Report EUR 6371 EN Volumes 1 to VI,
Brussels, Luxembourg (1979).
21. Commission of the European Communities, PISC-II, Report Nos. 1 to 5, Joint Research Centre,
Ispra Establishment, Varese, Italy (1986).
22. Dover W D and Rudlin J R: ‘Results of probability of detection trials’, Proc IOCE 92, Aberdeen,
13-16 October 1992.
23. ‘NDT of thin plates – evaluation of results’, NIL Report, NDP 93-38 Rev. 1, 1995 (In Dutch).
24. Visser W: ‘PoD/PoS curves for non-destructive examination’. HSE Offshore Technology Report
2000/018, ISBN 07176 2297 5, 2002.
25. Warder P, Lilley J and Wall M: ‘Improved integrity management of bogie frame transom welded
joints’, AEAT Engineering Solutions and John Reddyhof HSBC Rail Conference, Engineering
integrity of railway systems, the Arup Campus, Solihull, 21-22 October 2003.
26. Georgiou G A: ‘The Extent of Ultrasonic non-Invasive Inspection of LPG Storage vessels’. HSE
Project, JCL Report No. 2/8/99, (September 1999, Revision 1).
27. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG storage vessels’.
HSE Project, JCL Report No. 3/3/00 (June 2000).
28. Georgiou G A: ‘Proposed Guidelines for estimating the extent of manual ultrasonic NDT for
LPG storage vessels’. HSE Project, JCL Report No. 4/3/00 (July 2000)
29. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG welds’. Proceedings
of BINDT, September 2000.
30. AEA Technology Report, AEAT-4389 HOIS (98) P8 Issue 2 (DRAFT). Data for POD curve
supplied with kind permission by Dr. Martin Wall, AEA Technology.
31. Schneider C R A and Georgiou G A: ‘Radiography of thin section welds, Part 2: Modelling’,
Insight, Vol. 45, No. 2, pp 119-121, February 2003.
32. Schneider C R A and Rudlin J R: ‘Review of statistical methods used in quantifying NDT
reliability’, Insight, Vol. 46, No. 2, pp 77-79, February 2004.

12. VERIFICATION STATEMENT
The following persons were involved in verification work relating to this report as outlined below.

Print name …………GEORGE A GEORGIOU (GAG) ……………


Position ……………Director of Jacobi Consulting Ltd…………….
Qualifications………BSc PhD (C.Eng., FIMA, FInstNDT)………...

Address:
57 Ockendon Road
London N1 3NL

Signature ……………………………………………………………

Print name …………MELODY DREWRY (MD)…………………..


Position ……………Research Scientist (Jacobi Consulting Ltd)……
Qualifications………BSc MSc (GInstNDT)…………………………

Address:
Flat 1, 54 Penywern Road
London
SW5 9SX

Signature …………………………………………………………….

Print name …………EMILIE BEYE (EB)…………………………


Position ……………Model Analyst (Abbey)……………………….
Qualifications………BSc MSc (Statistics)…………………………..

Address:
58 Orchard Way
Bicester
Oxfordshire OX26 2EJ

Signature ………………………………………………….

GAG carried out independent checks on the calculations used to produce Figures 5 to 11 in
Appendix C, as well as a check on the data used in the calculations (c.f. Appendix II of Appendix C).
The calculations were all found to be correct.

MD carried out random independent checks on the datasets used to produce Figure 2 and Figures 7 to
13 (i.e. from the NTIAC data book). MD also carried out checks on the depth data analysis in
Appendix C. All the Figures 9-11 were found to be a correct representation of the nominated
datasets. The depths used in Appendix C were the ones listed in Appendix II of Appendix C.

EB read the whole report and carried out independent checks on all the mathematical equations and
derivations. In addition, EB found the statistical definitions and terminologies used in the report, and
in particular those in Appendix A, to be correct and appropriate for engineers and scientists who may
not have a statistical background. EB found the conclusions and recommendations of the report to be based on sound scientific reasoning and to follow on logically from the evidence presented.
Table 1 Maximum Probability Tables
(Based on 90% PoD and 95% Lower Confidence Limit)

Number of trials per flaw width interval | Number of successes per flaw width interval | Maximum number of trials needed for certification | Maximum probability of achieving certification (%)
29 | 29 | 29 | 100.0
29 | 28 | 46 | 96.6
29 | 27 | 61 | 68.0
29 | 26 | 75 | 25.8
29 | 25 | 89 | 4.9
29 | 24 | 103 | 0.5

(The results are reprinted with permission of NTIAC. All rights reserved. (See references (1) and (5))

Figure 1 Example of detection percentages for a handheld Eddy-Current inspection and a
‘log-odds’ distribution fit to the data.
(Reprinted with permission of ASM International® . All rights reserved. (See references 2 and 6))

[Figure 2 chart: PoD (hit/miss data) vs. flaw depth (mm), showing the regions of flaws detected and flaws missed, separated by a_smallest (the smallest flaw detected) and a_largest (the largest flaw missed). Data Set D1001AD (File D-UT1): Aluminum flat panels, thicknesses 1.5mm and 5.6mm, as machined, inspected with ultrasonic surface waves; 311 flaws in total, 258 detected, 53 missed.]

Figure 2 Ultrasonic NDT hit/miss data illustrating the relatively large gap between the smallest flaw detected and the largest flaw missed
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
Figure 3 The linear relationship between the log-odds and log flaw size
(Reprinted with permission of ASM International® . All rights reserved. (See reference 6))

[Figure 4 schematic: probability of detection PoD(a) vs. flaw dimension a. The PoD(a) function is the mean of the probability density function f(a) of flaws with fixed dimension a.]

Figure 4 Schematic of the PoD for flaws of fixed dimension for ‘hit/miss’ data
[Figure 5 schematic: ln(signal response) vs. ln(flaw dimension), showing the evaluation threshold â_th. The PoD(a) function is the area of the probability density function f(a), for a flaw of fixed dimension a, lying above the evaluation threshold.]

Figure 5 Schematic of the PoD for flaws of fixed dimension for ‘signal response’ data
Figure 6 A comparison between the log-odds and cumulative log-normal distribution
functions for the same parameters μ = 0 and σ = 1.0
(Reprinted with permission of ASM International® . All rights reserved. (See reference (6))

[Figure 7 chart: probability of detection (PoD) in % vs. crack length in inches, showing the log-odds model, the 95% confidence lower limit and the hit/miss data; the log-odds model is not applicable. Data Set D9001(3)L (File D-UT9): 2219 Aluminum GTA welded panels with lack of penetration, as welded and scarfed, ultrasonic shear wave, 3 operators combined; 499 flaws in total, 105 detected, 394 missed; 90% PoD not achieved.]

Figure 7 An example of when the log-odds model was not applicable to the data collected
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 8 chart: probability of detection (PoD) in % vs. crack length in inches for three NDT methods applied to 6AL-4V Titanium flat plates with low cycle fatigue cracks, after etch and proof load, operator C. Eddy current hand scan (Data Set ETAC003L-C, File A-ET3): 135 flaws, 69 detected, 66 missed, 90% PoD = 0.581 in. (14.76 mm). Ultrasonic immersion shear wave (Data Set D3003CL, File D-UT3): 135 flaws, 116 detected, 19 missed, 90% PoD = 0.133 in. (3.38 mm). X-radiography (Data Set F30653CL, File F-XT3): 61 flaws, 41 detected, 20 missed, 90% PoD = 0.729 in. (18.52 mm).]

Figure 8 PoD (a) log-odds model results for different NDT methods applied to the same flaw specimen
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 9 chart: probability of detection (PoD) in % vs. crack depth in inches for water washable fluorescent penetrant on low cycle fatigue cracks in Haynes 188 flat panels, etched, operator C, Facility 1. No developer (Data Set CE031(6)D, File C-PTE): 284 flaws, 93 detected, 191 missed, 90% PoD not achieved. Non-aqueous developer (Data Set CE032(6)D, File C-PTE): 284 flaws, 249 detected, 35 missed, 90% PoD = 0.024 in. (0.598 mm).]

Figure 9 PoD (a) log-odds model results for fluorescent penetrant: no developer and developer applied to the same flaw specimen
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 10 chart: probability of detection (PoD) in % vs. crack length in inches for eddy current hand scan of low cycle fatigue cracks, as machined, operator A. Aluminum flat panel (Data Set ETA1001A, File A-ET1): 311 flaws, 208 detected, 103 missed, 90% PoD = 0.196 in. (4.98 mm). 6AL-4V Titanium flat plate (Data Set ETA3001A, File A-ET3): 134 flaws, 92 detected, 42 missed, 90% PoD = 0.173 in. (4.40 mm). 4340 Steel flat plate (Data Set A7001AL, File A-ET7): 142 flaws, 30 detected, 112 missed, 90% PoD not achieved.]

Figure 10 PoD (a) log-odds model results for manual eddy currents: different materials but nominally the same flaws
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 11 chart: probability of detection (PoD) in % vs. crack length in inches for X-ray radiography of longitudinal cracks in 2219 Aluminum GTA welds, as cracked, etched and proof loaded, 3 operators combined. Welds with crowns (Data Set F6003(3)L, File F-XT6): 162 flaws, 80 detected, 82 missed, 90% PoD not achieved. Flush ground welds (Data Set F8003(3)L, File F-XT8): 324 flaws, 185 detected, 139 missed, 90% PoD not achieved.]

Figure 11 PoD (a) log-odds model results for X-ray radiography: different weld conditions but nominally the same flaws
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 12 chart: probability of detection (PoD) in % vs. crack length in inches for fluorescent penetrant on cracks in 2219 Aluminum GTA flush ground welds, as cracked and etched, 3 operators combined. Longitudinal cracks (Data Set CC002(3)L, File C-PTC): 324 flaws, 307 detected, 17 missed, 90% PoD = 0.048 in. (1.21 mm). Transverse cracks (Data Set CD002(3)L, File C-PTD): 54 flaws, 48 detected, 6 missed, 90% PoD = 0.201 in. (5.11 mm).]

Figure 12 PoD (a) log-odds model results for fluorescent penetrant: different flaws but nominally the same specimens
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
[Figure 13 chart: probability of detection (PoD) in % vs. crack length in inches for ultrasonic immersion shear wave inspection of 6AL-4V Titanium flat plates with low cycle fatigue cracks, after etch and proof load; 135 flaws in each case. Operator A (Data Set D3003AL, File D-UT3): 105 detected, 30 missed, 90% PoD = 0.265 in. (6.73 mm). Operator B (Data Set D3003BL, File D-UT3): 115 detected, 20 missed, 90% PoD = 0.111 in. (2.82 mm). Operator C (Data Set D3003CL, File D-UT3): 116 detected, 19 missed, 90% PoD = 0.133 in. (3.38 mm).]

Figure 13 PoD (a) log-odds model results for Ultrasound (Immersion): different operators but inspecting the same flaw specimen
(The results are re-plotted with permission of NTIAC. All rights reserved. (See reference (5))
APPENDIX A

GLOSSARY OF TERMS, STATISTICAL TERMINOLOGY


AND OTHER RELEVANT INFORMATION
TABLE OF CONTENTS

1. GLOSSARY OF TERMS A1

2. STATISTICS TERMINOLOGY A1
2.1. CONFIDENCE INTERVAL A1
2.2. CONFIDENCE LIMITS A1
2.3. CONFIDENCE LEVEL: A1
2.4. MAXIMUM LIKELIHOOD METHODS: A2
2.5. PROBABILITY OF DETECTION: A4

3. OTHER RELEVANT INFORMATION A4


3.1. CALCULATING A CONFIDENCE INTERVAL A4
3.2. LOG-ODDS MODEL A5
3.3. LOG-NORMAL MODEL A6
3.4. PROBABILITY DENSITY FUNCTION A7
3.4.1. The Discrete Case A7
3.4.2. The Continuous Case A8
1. GLOSSARY OF TERMS

English Symbols    Description

a          Flaw size
â          Signal response
â_th       Signal response threshold
C          Confidence level
F          Continuous cumulative distribution
m          The median of a population
p          A general statistical parameter of a population
p1         The lower confidence limit
p2         The upper confidence limit
PoD (a)    The probability of detection function

Greek Symbols    Description

α₁         A parameter in the log-normal relationship
β₁         A parameter in the log-normal relationship
ε          An error term in the log-normal relationship
μ          The mean of a population
σ          The standard deviation of a population
σ_ε        The standard deviation of the error term

2. STATISTICS TERMINOLOGY
2.1. CONFIDENCE INTERVAL
A confidence interval gives an estimated range of values which is likely to include an
unknown population parameter.

(Note: Each estimated range is calculated from a particular random sample. If random
samples of the same size are taken repeatedly from the same population and a confidence
interval is calculated for each sample, then a certain percentage of the intervals will include
the unknown parameter. This percentage is referred to as the ‘confidence level’ (see below).
The width of the confidence interval provides some idea about the uncertainty of the unknown
parameter. A very wide interval may indicate that more data should be collected before
anything very definitive can be said about the parameter).

2.2. CONFIDENCE LIMITS


Confidence limits are the lower and upper boundaries of a confidence interval.

2.3. CONFIDENCE LEVEL:


The confidence level is the probability value that is associated with the confidence interval.

(Note: It is called a probability value, notwithstanding the fact that it should not be interpreted as a probability. The notion of probability is introduced when calculating the confidence interval using a normal probability distribution curve, where different areas under the curve have corresponding probabilities (see the example in Section 3).)

2.4. MAXIMUM LIKELIHOOD METHODS:


Statement of the Maximum Likelihood Method (MLM):

• We have made n measurements of x {x1, x2, …, xn}.


• We know the probability density function that describes x: f(x, a).
• We want to determine the parameter a, hence we select a to maximise the probability of
getting the measurements of the xi's

The mathematical implementation of the MLM, for calculating the parameters in the log-odds
or log-normal models, is outside the scope of this study. However, particular worked
examples can be found in reference 6. The example below illustrates an optimisation technique
called ‘the method of least squares’, which can be regarded as a special case of the MLM.

Example:

A trolley moves along a track at constant speed. Suppose the following measurements of the
distance vs. time were made. From the data find the best value for the speed (v) of the trolley.

Distance d (mm) 11 19 33 40 49 61
Time t (seconds) 1.0 2.0 3.0 4.0 5.0 6.0

Since the trolley is moving at constant speed, the gradient in a distance vs. time graph must be
constant and it can be assumed that the relationship between d and t must take the form:

d = vt + d0     (1)

The problem is to establish the parameters d0 and v from the measurements taken. Here, it is
going to be done using the ‘method of least squares’ (9). The least squares regression line of d
on t is given by

d - \bar{d} = \frac{s_{td}}{s_{t^2}} \, ( t - \bar{t} )     (2)

where \bar{d} and \bar{t} are the mean values of the distance and time respectively and

\frac{s_{td}}{s_{t^2}} = \frac{ n \sum_{1}^{6} t_i d_i \; - \; \sum_{1}^{6} t_i \sum_{1}^{6} d_i }{ n \sum_{1}^{6} t_i^2 \; - \; \left( \sum_{1}^{6} t_i \right)^{2} }     (3)
On comparing equations (1) and (2) it is straightforward to show that

v = \frac{s_{td}}{s_{t^2}}     (4)

and

d_0 = \bar{d} - \left( \frac{s_{td}}{s_{t^2}} \right) \bar{t}     (5)

Since

\sum_{1}^{6} t_i = 21, \qquad \sum_{1}^{6} d_i = 213, \qquad \sum_{1}^{6} t_i d_i = 919, \qquad \sum_{1}^{6} t_i^2 = 91     (6)

Using the values in equation (6) and substituting the appropriate ones into equations (3), (4)
and (5), it can be shown that

v = 9.9, \quad d_0 = 0.8 \quad \Rightarrow \quad d = 9.9\,t + 0.8     (7)

[Figure: the trolley data plotted as distance d (mm) against time t (seconds), with the fitted trend line d = 9.9t + 0.8 and R² = 1.0.]

• The trend line is the optimum fit to the data


• It minimises the sum of the squares of the deviations between the line and the data
(hence the name ‘method of least squares’)
• This method is also used in Microsoft Excel to establish the trend line. In Excel a
‘correlation coefficient’ (R²) is also calculated and provides a quantitative measure
of fit. The closer R² is to 1 the better the fit. For the data here, R² has been
calculated and to one decimal place the trend line is virtually a perfect fit.

The MLM will also involve deriving formulae for the relevant parameters (i.e. usually the
‘average’ and the standard deviation). However, the formulae are much more complicated
than the least squares formulae presented here.
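As an illustration only, the least-squares calculation in the example above can be reproduced with a few lines of Python (a minimal sketch, assuming NumPy is available; the data are the trolley measurements given above):

import numpy as np

# Trolley data from the worked example above
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # time (seconds)
d = np.array([11, 19, 33, 40, 49, 61], float)   # distance (mm)

n = len(t)
# Equations (3), (4) and (5): least-squares gradient and intercept
v = (n * np.sum(t * d) - np.sum(t) * np.sum(d)) / (n * np.sum(t ** 2) - np.sum(t) ** 2)
d0 = d.mean() - v * t.mean()

print(round(v, 1), round(d0, 1))   # 9.9 and 0.8, as in equation (7)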

2.5. PROBABILITY OF DETECTION:


The PoD (a) function is the proportion of all flaws of size ‘a’ that will be detected in a
particular application of an NDT system.

3. OTHER RELEVANT INFORMATION


3.1. CALCULATING A CONFIDENCE INTERVAL
See also reference (9), p434.

Consider the confidence interval for the mean μ of a population where the population
variance σ² is known.

If X is normally distributed such that X ~ N(μ, σ²), then for any n (i.e. for any random sample
with n pieces of data)

" !2 #
X ~ N $µ, %
& n '

X "µ
Standardising, we have Z = where Z ~ N ( 0 ,1 ) .
!/ n

We know that the central 95% area of N(0, 1) lies between the values ±1.96 (with 2.5% of the area in each tail).

X "µ
# P( "1.96 $ $ 1.96 ) = 0.95
!/ n
! !
# P( "1.96 $ X " µ $ 1.96 ) = 0.95
n n
! !
# P( 1.96 % µ " X % "1.96 ) = 0.95
n n
! !
# P( X + 1.96 % µ % X " 1.96 ) = 0.95
n n

A4
! !
" P( X # 1.96 $ µ $ X + 1.96 ) = 0.95 (8)
n n

Equation (8) has a probability of 0.95 associated with the interval. However, it is stressed that
equation (8) should not be interpreted as ‘the probability that μ lies between X̄ ± 1.96σ/√n is
0.95’, since the value of the population mean μ will either be in the particular interval or not.
The correct interpretation of equation (8) is that if a large number of different confidence
intervals are calculated in the same way (the intervals will be different because X̄ will be
different for each random sample), then we would expect about 95% of them to include
or ‘trap’ μ. That is, an interval has been found with a 95% confidence level that includes μ.
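As a minimal numerical sketch of equation (8) (Python with NumPy assumed; the sample values and the ‘known’ population standard deviation are purely illustrative):

import numpy as np

# Illustrative random sample from a population whose standard deviation is assumed known
sample = np.array([9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2])
sigma = 0.3                          # assumed (known) population standard deviation
n = len(sample)
x_bar = sample.mean()

# Equation (8): a 95% confidence interval for the population mean
half_width = 1.96 * sigma / np.sqrt(n)
print(x_bar - half_width, x_bar + half_width)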
3.2. LOG-ODDS MODEL
Consider the two forms of the log-odds model:

PoD(a) = \frac{ \exp\!\left[ \frac{\pi}{\sqrt{3}} \left( \frac{\ln a - m}{\sigma} \right) \right] }{ 1 + \exp\!\left[ \frac{\pi}{\sqrt{3}} \left( \frac{\ln a - m}{\sigma} \right) \right] }     (9)

and

PoD(a) = \frac{ e^{\,\alpha + \beta \ln a} }{ 1 + e^{\,\alpha + \beta \ln a} }     (10)

On comparing equations (9) and (10)

\frac{\pi}{\sigma\sqrt{3}} ( \ln a - m ) = \alpha + \beta \ln a
\Rightarrow -\frac{\pi m}{\sigma\sqrt{3}} + \frac{\pi}{\sigma\sqrt{3}} \ln a = \alpha + \beta \ln a

and on comparing coefficients

\beta = \frac{\pi}{\sigma\sqrt{3}} \quad \text{and} \quad \alpha = -\beta m

hence

m = -\frac{\alpha}{\beta} \quad \text{and} \quad \sigma = \frac{\pi}{\beta\sqrt{3}}

Let p = PoD(a) in equation (10), hence

p \left( 1 + e^{\alpha + \beta \ln a} \right) = e^{\alpha + \beta \ln a}
\Rightarrow p = e^{\alpha + \beta \ln a} ( 1 - p )
\Rightarrow \frac{p}{1 - p} = e^{\alpha + \beta \ln a}
\Rightarrow \ln\!\left( \frac{p}{1 - p} \right) = \alpha + \beta \ln a
\Rightarrow \ln(\text{odds}) = \alpha + \beta \ln a

that is, the log of the odds is a linear function of ln a (hence the name ‘log-odds’).
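The equivalence of the two forms, equations (9) and (10), can be checked numerically. The sketch below (Python, NumPy assumed; the values of m and σ are purely illustrative) evaluates PoD(a) both ways and uses the parameter conversions derived above:

import numpy as np

def pod_m_sigma(a, m, sigma):
    # Equation (9): (m, sigma) form, where m and sigma refer to ln a
    z = (np.pi / np.sqrt(3.0)) * (np.log(a) - m) / sigma
    return np.exp(z) / (1.0 + np.exp(z))

def pod_alpha_beta(a, alpha, beta):
    # Equation (10): (alpha, beta) form
    z = alpha + beta * np.log(a)
    return np.exp(z) / (1.0 + np.exp(z))

# Illustrative parameters (not taken from the report)
m, sigma = np.log(2.0), 0.5            # median flaw size 2 mm, spread 0.5 on the ln a scale
beta = np.pi / (sigma * np.sqrt(3.0))  # beta = pi / (sigma * sqrt(3))
alpha = -beta * m                      # alpha = -beta * m

a = np.array([0.5, 1.0, 2.0, 4.0, 8.0])         # flaw sizes (mm)
print(pod_m_sigma(a, m, sigma))
print(pod_alpha_beta(a, alpha, beta))           # identical values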

3.3. LOG-NORMAL MODEL


Since

\ln(\hat{a}) = \alpha_1 + \beta_1 \ln(a) + \epsilon

then we say ln(â) is normally distributed with mean \mu(a) = \alpha_1 + \beta_1 \ln(a) and constant
standard deviation \sigma_\epsilon (i.e. N(\mu(a), \sigma_\epsilon^2)). The standard parameter Z ~ N(0, 1) is given by

Z = \frac{ \ln \hat{a} - ( \alpha_1 + \beta_1 \ln a ) }{ \sigma_\epsilon }     (11)

and the area we require is the right-hand tail of the standard Normal curve N(0, 1), i.e. the area above Z_{th}.

Since

P( Z \le Z_{th} ) = F( Z_{th} ) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z_{th}} e^{-t^2/2} \, dt

(see section 3.4.2 below)

\Rightarrow P( Z > Z_{th} ) = 1 - F( Z_{th} )

and hence from equation (11)

PoD(a) = 1 - F\!\left( \frac{ \ln(\hat{a}_{th}) - ( \alpha_1 + \beta_1 \ln a ) }{ \sigma_\epsilon } \right)

but PoD(a) = F( -Z_{th} ) (by the symmetry of the Normal distribution)

\Rightarrow PoD(a) = F\!\left( \frac{ ( \alpha_1 + \beta_1 \ln a ) - \ln(\hat{a}_{th}) }{ \sigma_\epsilon } \right)

\Rightarrow PoD(a) = F\!\left( \frac{ \ln a - \left[ ( \ln(\hat{a}_{th}) - \alpha_1 ) / \beta_1 \right] }{ \sigma_\epsilon / \beta_1 } \right)

which is the cumulative log-normal distribution with mean \mu and standard deviation \sigma given
by:

\mu = \frac{ \ln(\hat{a}_{th}) - \alpha_1 }{ \beta_1 }, \qquad \sigma = \frac{ \sigma_\epsilon }{ \beta_1 }
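A short numerical sketch of this result (Python with SciPy assumed; the regression parameters α1, β1, σε and the threshold â_th are purely illustrative, not values from the report):

import numpy as np
from scipy.stats import norm

# Illustrative 'ln(a-hat) versus ln(a)' parameters (not taken from the report)
alpha1, beta1 = 0.5, 1.2        # intercept and slope
sigma_eps = 0.4                 # standard deviation of the error term
a_hat_th = 2.0                  # decision threshold on the signal response

# Equivalent cumulative log-normal parameters derived above
mu = (np.log(a_hat_th) - alpha1) / beta1
sigma = sigma_eps / beta1

a = np.array([0.5, 1.0, 1.5, 2.0, 4.0])     # flaw sizes
pod = norm.cdf((np.log(a) - mu) / sigma)    # PoD(a) = F((ln a - mu) / sigma)
print(np.round(pod, 3))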

3.4. PROBABILITY DENSITY FUNCTION


3.4.1. The Discrete Case
A discrete random variable X is defined here in terms of its properties:

(a) X is an event which is associated with the discrete values (x1, x2,……., xk)
(b) The probabilities associated with each of the values (x1, x2,……., xk) are
(p1, p2,……, pk) respectively (i.e. P(X=xi) = pi, where 1 ≤ i ≤ k)

For example, in the rolling of an unbiased die, X could be the event 'the score of the rolled
die'. The set of associated discrete values are (1, 2, 3, 4, 5, 6), each with probability p = 1/6.

The set of all the possible values of the event X and associated probabilities describe the
'probability distribution' of X. The above notation can be simplified by introducing the
discrete probability density function which is commonly denoted by:

p(x) or P(X = x), where p(x) ≥ 0     (12)

where x is a general element of the range of possible values of X (defined as the ‘quantile’).
The ordinates of p(x) represent the probability that X assumes a particular value x (the random
variable is normally denoted by a capital letter and the particular value it takes by a small
letter).

From probability theory it follows that:

\sum_{\text{all } x} p( x ) = 1     (13)

that is, the probability that X assumes any one of all the possible values is a certainty.

The cumulative distribution function F(x) is defined as:

F( x ) = \sum_{t \le x} p( t ) = P( X \le x )     (14)

that is, the probability that X assumes any one of the values up to and including the value x.

A7
3.4.2. The Continuous Case
A continuous random variable X can assume any value in a particular interval rather than
any value from a set of discrete values. It is necessary to define a continuous function to
describe the probability distribution of X. This function is called the continuous probability
density function f(x) and it is usual to define it over the range -∞ <x <∞. The random
variable X is defined in terms of f(x) and has the following properties:

P( -\infty < X < \infty ) = \int_{-\infty}^{\infty} f( x ) \, dx = 1, \qquad f( x ) \ge 0     (15)

that is, the probability that X lies in the complete range of possible values is a certainty and
corresponds to the whole area under f(x). Equation (15) is analogous to equation (13) and it
follows that

P( a < X < b ) = \int_{a}^{b} f( x ) \, dx     (16)

that is, the probability that X lies in the interval (a, b) is the corresponding area under f(x).

From equation (16), the probability that the continuous random variable X lies between x and
x+δx (where δx is a small finite interval) is f(x) δx (i.e. the corresponding area under f(x)).
With an appropriate selection of δx, this area under f(x) can be used to approximate the
probability that X assumes a particular value in a discrete distribution, that is:

f( x ) \, \delta x = P( X = x )     (17)

The ordinates of f(x) represent the probability per unit length and hence the terminology
'probability density function' for f(x) is highly appropriate.

The cumulative distribution function F(x) is given by:

F( x ) = \int_{-\infty}^{x} f( t ) \, dt = P( X \le x ), \qquad ( -\infty < x < \infty )     (18)

Equation (18) is analogous to equation (14) and represents the area under the curve f(x) up to
and including x.
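A brief numerical sketch of equations (13) to (18) (Python, with NumPy and SciPy assumed), using the unbiased die for the discrete case and the standard Normal distribution N(0, 1) for the continuous case:

import numpy as np
from scipy.stats import norm

# Discrete case: an unbiased die, p(x) = 1/6 for x = 1, ..., 6
p = np.full(6, 1 / 6)
print(p.sum())                          # equation (13): the probabilities sum to 1
F = np.cumsum(p)                        # equation (14): F(x) = P(X <= x)
print(F[2])                             # P(X <= 3) = 0.5

# Continuous case: the standard Normal N(0, 1)
print(norm.cdf(np.inf))                 # equation (15): total area under f(x) is 1
print(norm.cdf(1.0) - norm.cdf(-1.0))   # equation (16): P(-1 < X < 1), about 0.683
print(norm.pdf(0.0) * 1e-4)             # equation (17): f(x).dx for a small interval dx
print(norm.cdf(1.96))                   # equation (18): F(1.96), about 0.975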

A8
APPENDIX B

AN AUDIT TOOL FOR THE PRODUCTION


AND APPLICATION OF POD CURVES
TABLE OF CONTENTS

1. INTRODUCTION B1

2. OPERATIONAL PARAMETERS B1

2.1. NDT METHODS B1


3. PHYSICAL PARAMETERS B2

3.1. THE SPECIMEN B2


3.2. FLAW CHARACTERISTICS B2
4. MODELLING OF POD(A) B2
1. INTRODUCTION
The ‘audit tool’ (or check list) is based on issues that are covered in the main text. Its purpose is to
check that the PoD data have been gathered correctly and that there are enough data to compute the
PoD/confidence limit combination and the parameters associated with the PoD(a) function.

The following suggested audit tool for HSE inspectors is aimed at assisting them in their
involvement with general PoD studies and in particular safety cases involving PoD.

For each item in the audit tool there is a reference to a particular section in the main PoD
report for further information and clarification. The order of the questions is currently as they
are found in the PoD report and not by any order of priority or importance.

The headings below cover some important operational and physical parameters that are
known to affect PoD results. The tables contain the kind of questions that could be asked by
HSE inspectors and organisations producing and/or applying PoD curves.

It is worthwhile looking at the ‘Guidelines’ document (Appendix D), which provides some
examples of how one might apply an existing PoD curve to the inspection of welds or
components with flaws.

This document should be regarded as a working document, which could be updated


periodically (e.g. every 3-4 years), based on developments in PoD research and experiences
of HSE inspectors.

2. OPERATIONAL PARAMETERS

2.1. NDT METHODS

Audit                                                                        Section reference in PoD report

What was recorded for the NDT parameters?                                    5.2
How often was the equipment calibrated?                                      5.2
How often was the NDT procedure assessed to see that it was producing the correct result?    5.2
What NDT method was used?                                                    5.2.1 (Figure 8)
Was a developer used in fluorescent penetrant testing?                       5.2.2 (Figure 9)
Were the correct procedures followed before each NDT method was applied (e.g. thorough cleaning after a fluorescent penetrant test)?    5.2.2
What sort of operational variability was there in the PoD results of different operators?    5.2.6 (Figure 13)
How many inspectors in the PoD study?                                        5.2.6 (Figure 13)

B1
3. PHYSICAL PARAMETERS

3.1. THE SPECIMEN

Audit                                        Section reference in PoD report

What is the material?                        5.2.3 (Figure 10)
What are the physical dimensions?            5.2.3
What is the surface condition?               5.2.3 (Figure 10)
What is the state of machining?              5.2.3 (Figure 10)
What is the weld geometry?                   5.2.4
What is the weld condition?                  5.2.4 (Figure 11)

3.2. FLAW CHARACTERISTICS

Audit                                              Section reference in PoD report

What is the largest flaw that can be missed?       3.1 (Figure 2)
Were flaws in the PoD study simulated or real?     3.2
How many flaws in the PoD study?                   3.3.3
What was recorded for the flaw characteristics?    5.2.5 (Figure 12)

4. MODELLING OF POD(A)

Audit                                                                        Section reference in PoD report

Is the PoD a good enough fit, and does the PoD increase with flaw size?      5.1 (Figure 7)
Does the confidence limit increase with flaw size?                           5.1 (Figure 7)
Is there enough data in the part of the PoD that is increasing?              5.1 (Figure 2)
Has ln(â) been plotted against ln(a)?                                        5.1 (Figure 3)
Is ln(â) an increasing function of ln(a)?                                    5.1 (Figure 3)
Are the values of the fitted model parameters reasonable (i.e. not too large or too small)?   5.1

B2
APPENDIX C

THE VALIDITY OF THE JCL


‘PROBABILITY OF INCLUSION’ MODEL
The Validity of the JCL
‘Probability of Inclusion’ Model

By

George A Georgiou
Melody Drewry
Emilie Beye

Jacobi Consulting Ltd


TABLE OF CONTENTS

TABLE AND FIGURE CAPTIONS Cii


EXECUTIVE SUMMARY Ciii
Background Ciii
Objectives Ciii
Work Carried Out Ciii
Conclusions Ciii
Recommendation Ciii

1. INTRODUCTION C1

2. OBJECTIVES C1

3. ASSESS PREVIOUS MODELLING WORK C1


3.1. A CRITICAL REVIEW C2
3.2. DISCUSSIONS WITH HSE INSPECTORS C2

4. THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL C2


4.1. ASSESSMENT OF MBEL’S STATISTICAL DEFINITIONS C2
4.2. SUMMARY OF MBEL'S MODELLING APPROACH C3
4.2.1. The Binomial Model (With Replacement) C5
4.2.2. The Hypergeometric Model (Without Replacement) C5
4.3. A SUMMARY OF JCL’S MODELLING APPROACH C6
4.4. A COMPARISON OF THE MBEL AND THE JCL APPROACHES C7
4.5. COMMENTS ON THE HYPERGEOMETRIC MODEL C8

5. FURTHER ANALYSIS OF REAL DATA FROM HSE AND OTHER SOURCES C8


5.1. DATASETS INVOLVING FLAW LENGTHS C9
5.2. DATASETS INVOLVING FLAW DEPTHS C11
5.3. REASSESS AND VALIDATE THE MODEL C12
5.4. SELECT THE BEST AVAILABLE MODEL C14

6. CONCLUSIONS C14

7. RECOMMENDATIONS C14

8. ACKNOWLEDGEMENTS C14

9. REFERENCES C14

TABLES 1 - 6
FIGURES 1 - 13

APPENDIX CI PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL


‘PROBABILITY OF INCLUSION’ MODELS
PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL
APPENDIX CII THE DATASETS USED IN THE STATISTICAL ANALYSES
Ci
TABLE AND FIGURE CAPTIONS
TABLE CAPTIONS
Table 1 The mean and standard deviation for the four flaw length data sets
Table 2 The ‘p-value’ calculated using the Kolmogrov-Smirnoff test for each data set against each distribution
Table 3 The ‘p-value’ calculated using the Shapiro-Wilks test for each data set against the Normal distribution
Table 4 The mean and standard deviation for the three flaw depth data sets
Table 5 The ‘p-value’ calculated using the Kolmogrov-Smirnoff test for each flaw depth data set against each
distribution
Table 6 The ‘p-value’ calculated using the Shapiro-Wilks test for each flaw depth data set against the Normal
distribution

FIGURE CAPTIONS
Figure 1 An illustration of a Liquid Petroleum Gas (LPG) sphere
Figure 2 An illustration of a collapsed LPG Sphere
Figure 3 Probability of including a defective part of at least 1% given a certain % level of inspection (Hyper
geometric model)
Figure 4 Probability of including a defective part of at least 1% given a certain % level of inspection (Comparison of
JCL and MBEL Models)
Figure 5a A typical analysis and presentation of Case 1 (Pi data) using S-Plus
Figure 5b Quantile-Quantile plots of Case 1 (Pi data) against Normal, (b) Exponential, (c) Logistic and (d)
Lognormal distributions
Figure 5c Cumulative density function plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and
(d) Lognormal distributions
Figure 6a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus
Figure 6b Quantile-Quantile plots of Case 2 (full sectioning data) against Normal, (b) Exponential, (c) Logistic and
(d) Lognormal distributions
Figure 6c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c)
Logistic and (d) Lognormal distributions
Figure 7a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus
Figure 7b Quantile-Quantile plots of Case 3 (ultrasonic data) against Normal, (b) Exponential, (c) Logistic and (d)
Lognormal distributions
Figure 7c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c)
Logistic and (d) Lognormal distributions
Figure 8a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus
Figure 8b Quantile-Quantile plots of Case 4 (reduced ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic
and (d) Lognormal distributions
Figure 8c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential,
(c) Logistic and (d) Lognormal distributions
Figure 9a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus
Figure 9b Quantile-Quantile plots of Case 2 (full sectioning data) against Normal, (b) Exponential, (c) Logistic and
(d) Lognormal distributions
Figure 9c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c)
Logistic and (d) Lognormal distributions
Figure 10a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus
Figure 10b Quantile-Quantile plots of Case 3 (ultrasonic data) against Normal, (b) Exponential, (c) Logistic and (d)
Lognormal distributions
Figure 10c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c)
Logistic and (d) Lognormal distributions
Figure 11a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus
Figure 11b Quantile-Quantile plots of Case 4 (reduced ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic
and (d) Lognormal distributions
Figure 11c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential,
(c) Logistic and (d) Lognormal distributions
Figure 12 Estimating the probability of including a defective part (P(X>0) = 1-P(X=0)) using the Lognormal
distribution given a certain % level of inspection for the cases (a) X=0.02, (b) X=0.004 and (c) X=0.0008
Figure 13 The probability of including a defective part (P(X>0) = 1-P(X=0)) using (a) the Normal distribution and (b)
the Log-odds distribution

Cii
EXECUTIVE SUMMARY
Background
The extent of non-invasive inspection of LPG storage vessels was considered previously by Jacobi
Consulting Ltd (JCL) and a probabilistic model (i.e. the ‘Probability of Inclusion’) based on the
Normal distribution, was developed for optimising the probability of flaw detection. A companion
guidelines document was written to assist companies and HSE inspectors to assess how much NDT
was required in order to achieve a desired probability of detecting a flaw (i.e. the ‘Index of
Detection’). The earlier work was recently assessed and reviewed by an independent statistician in
August 2004, as part of a training programme arrangement with Toulouse University, France. In the
meantime, some HSE inspectors have considered the earlier JCL work along with the guidelines
document and it was timely to assess their views and comments.

It was also timely, as part of the review, to carry out some formal statistical tests on the various
distributions used in developing the ‘Probability of Inclusion’ model, with a view to selecting the
most appropriate distribution. The opportunity was taken to update the companion guidelines
document to Appendix C, which has now become a separate document, Appendix D.

Objectives
• To provide a critical review of the ‘Probability of Inclusion’ model
• To carry out formal statistical tests on the distributions used in the modelling
• To select the most appropriate distribution for the ‘Probability of Inclusion’ model

Work Carried Out


The work has focussed on three important issues. The first was to review the existing ‘Probability
of Inclusion’ model with a view to validating it against an independently developed model by
Mitsui Babcock Engineering Ltd (MBEL). The second issue was to establish which of the statistical
distributions best fit real flaw length and flaw depth data, some of which came from an ultrasonic
inspection of an LPG storage vessel. The third issue was to select the most appropriate statistical
distribution for the ‘Probability of Inclusion’ model, based on the investigation of this study.

Conclusions
• The earlier JCL ‘Probability of Inclusion’ model has been validated against an independently
developed ‘Probability of Inclusion’ model by MBEL.
• The Lognormal distribution and the ‘Log-odds’ distribution were found to be the best fits for
the flaw length and flaw depth data in this study, with the Lognormal distribution being the
optimum fit, according to the formal statistical tests carried out.
• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the
‘Probability of Inclusion’ model.

Recommendation
• The ‘Probability of Inclusion’ model should be applied more widely than just to LPG storage
vessels.

Ciii
Civ
1. INTRODUCTION
The extent of non-invasive inspection of Liquid Petroleum Gas (LPG) storage vessels has been
considered previously by Jacobi Consulting Ltd (JCL) and a probabilistic model (i.e. ‘Probability of
Inclusion’) was devised for optimising defect detection. A series of reports and papers were
published (1-5) and included a confidential draft document (5), which considered a brief
comparison between a modelling approach by Mitsui Babcock Energy Ltd (MBEL) and the
modelling approach developed by JCL. A guidelines document (3) was written to assist companies
and HSE inspectors to assess how much NDT was required in order to achieve a desired probability
of detecting a flaw and was based on a concept called the ‘Index of Detection’.

In the meantime, some HSE inspectors have used the JCL work and the guidelines document. In
addition, an independent statistician from Toulouse University (France) was employed by JCL as
part of a student industrial training programme. A number of tasks and objectives were set for the
student, which included a critical review of the earlier JCL model.

It was considered timely to assess the experiences of the HSE inspectors, as well as review the JCL
‘Probability of Inclusion’ model and validate it by carrying out a more detailed comparison with the
MBEL model. In addition, the various statistical distributions that can be used in the ‘Probability of
Inclusion’ model were considered in relation to real flaw data using formal statistical tests, with a
view to establishing the distribution that best fits this data and to select the most appropriate
distribution to use with the ‘Probability of Inclusion’ model.

In this project a critical review is carried out in section 3 along with a report on the experiences of
HSE inspectors, who have used the earlier JCL model. In section 4, the confidential draft document
(5) is considered in more detail and further explanations are provided about the advantages and
disadvantages of the MBEL and JCL approaches. Section 5 deals specifically with the detailed
analysis of real data (i.e. flaw length and flaw depth data) and the selection of the most suitable
distribution to use in the ‘Probability of Inclusion’ model and ultimately in the ‘Index of Detection’
model (see Appendix D).

In the context of LPG storage vessels (Figure 1), the importance of carrying out sufficient non-
invasive inspection cannot be overstated, given the hazardous and flammable substances they
contain. Disasters can happen and the consequences can be extreme (Figure 2).

2. OBJECTIVES
• To provide a critical review of the ‘Probability of Inclusion’ model
• To carry out formal statistical tests on the distributions used in the modelling
• To select the most appropriate distribution for the ‘Probability of Inclusion’ model

3. ASSESS PREVIOUS MODELLING WORK


At the start of this study it had been some five years since the earlier ‘Probability of Inclusion’ (PoI)
model was completed. Since then, HSE inspectors have been considering the PoI model and so their
experiences would be helpful to assess the industrial usefulness of the model. In addition, the
opportunity was taken during this time to have an independent assessment of the PoI model. An
arrangement was made with Toulouse University, France as part of their undergraduate training
programme, to employ one of their final year statisticians to carry out an independent review of the
earlier PoI model. The student was given the following agreed tasks and objectives:

• Understand the PoI modelling approach by JCL


C1
• Offer a critical review of the JCL modelling approach
• Research other possible approaches
• Research real data from HSE and other sources
• Compare results between the various approaches
• Re-assess the JCL PoI model in the light of real data

3.1. A CRITICAL REVIEW


During the review of the JCL PoI model, a number of checks were made. Initially, these were
numerical in nature to verify that the earlier calculations were correct and to also assess any glaring
errors in the approach. The checks confirmed, as did a verification statement in an earlier report (2),
that all the repeated calculations were correct. In addition, the theoretical approach was also found
to be correct, given the very limited amount of real LPG data to test the JCL model against.

The review pointed out, quite correctly, that apart from some very basic statistical tests, no rigorous
formal statistical tests were carried out to assess how well the distributions used were able to fit the
data. However, in defence of the earlier work, there was very little real LPG data to carry out formal
tests on. Another drawback that was observed in the earlier PoI model was that the probability of
including the flaw did not reach 100%, even when 100% inspection was carried out. Whilst there
was a good theoretical reason for this, which was fully explained, it was still worth investigating
ways of improving the situation.

3.2. DISCUSSIONS WITH HSE INSPECTORS


HSE inspectors felt that the probabilistic models developed by JCL (2, 3), were critically needed at
the time because of particular inspection issues the HSE was having with certain sectors of the LPG
industry. However, since then there have not been any similar inspection issues in relation to LPG
spheres and moreover no one has really challenged the JCL models. Nevertheless, the general
feeling within HSE is that this work continues to be of value, as it is offering practical information
to industry.

4. THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL


At the time when the PoI model for LPG spheres was being developed by JCL (2, 3), MBEL was
also developing a similar model. MBEL have kindly provided the relevant section of this work and
it is included here in Appendix CI (Part B). Whilst the JCL and MBEL approaches are quite
different on the surface, it is interesting that they produce very similar results. This section
considers these two approaches and discusses their advantages and disadvantages.

A brief summary of the comparison between the MBEL and JCL model approaches is also provided
in Appendix CI (Part A), and the reader can omit the full details of the comparison below, on the
first read, and go straight to Appendix CI (Part A).

4.1. ASSESSMENT OF MBEL’S STATISTICAL DEFINITIONS


In the first paragraph of MBEL's document (Appendix CI, Part B), the phrase:
'…a uniform random distribution of defects…' is used.

It is felt that the word ‘uniform’ should be avoided here as there are statistical distributions, both
discrete and continuous, which are called 'uniform' and this is not what is being considered by the
MBEL approach. The complete phrase in the first paragraph has been interpreted as ‘defects’ occur
randomly and that all sampled lengths of equal size have the same probability of including a
'defect'.
C2
Page 1 of the MBEL document also refers to '…a unit containing a defect…'
Since the random distribution of defects is expressed as a %, technically one should not be talking
about the '…probability that a unit contains a defect…' (i.e. just one defect). The weld volume has
been divided into 100 discrete units and the term Dd is the defect distribution expressed as a %. This
is equivalent to saying Dd units are defective. It is believed that the term Pd should be expressed as the
'Probability that any one unit selected at random is defective' or the 'Probability that any 1% of the
weld selected at random is defective' and not the 'Probability that a unit contains a defect', since a
defective unit may contain more than one defect.

The MBEL document uses the notation for the percentage coverage of the weld volume to be 'Cov'.
It is felt this notation should also be avoided as this could be initially misunderstood as representing
the 'Covariance', another statistical parameter, which is not what is intended.

4.2. SUMMARY OF MBEL'S MODELLING APPROACH


MBEL's approach is based on the following idea and this is illustrated below.

Dd defective units (or Dd % of the weld is defective and randomly distributed)

100 units (or 100% of the weld)

This is equivalent to saying we have one hundred components, Dd of which are defective. The
probability that any one unit (or any 1%) selected at random is defective is

P_d = \frac{ D_d }{ 100 }     (1)

In the following analysis, the notation used by MBEL has been modified slightly for subsequent
comparisons with the JCL notation.

MBEL have used the above approach to consider the problem of selecting k units (or k%) of the
weld and calculating the probability that s (or s%) of these k (or k%) units are defective. MBEL
have considered this problem in the context of what in probability is usually termed:

(i) 'With replacement' (i.e. the Binomial distribution)


(ii) 'Without replacement' (i.e. the Hypergeometric distribution)

It is important to note that the random variable X, in the problem considered by MBEL, can only
take on integer values 0,1,2,3,……,100 and these integer values represent the % of defective weld
and not the number of individual defects. Thus the random variable X is really ‘the % of defective
weld’.

Consider the following two examples before developing general formulae for the problem
considered by MBEL.

C3
Ex.1. A box contains 10 bolts, 3 of which are defective. Two bolts are drawn at random. Find the
probability that the two bolts are both defective.
Let the random variable A represent the event ‘the 1st bolt drawn is defective’
Let the random variable B represent the event ‘the 2nd bolt drawn is defective’

P(A) = 3/10 since 3 out of the 10 bolts are defective.

If we sample with replacement, the situation before the 2nd drawing is the same as at the beginning
and

P(B) = 3/10. Assuming the events are independent, then from the multiplication law of independent
events (6)

P(A and B) = P(A).P(B) = (0.3)² = 0.09

If we sample without replacement, the situation for A is the same P(A) = 3/10. However the
situation for B has changed since we have one less defective bolt after the first selection and P(B) =
2/9 (i.e. 9 bolts left but only 2 are defective)

P(A and B) = P(A).P(B) = (3/10)(2/9) ≈ 0.07

The above illustrates the Binomial and Hypergeometric approaches respectively.

Consider now a more general example.

Ex.2. A box contains N bolts, M of which are defective. If a sample of k bolts is drawn at random
find the probability that s bolts are defective.
The probability of selecting one defective bolt at random is given by

p = \frac{M}{N}     (2)

Let the random variable Y represent the event ‘the number of defective bolts’.

In drawing a sample of k bolts, with replacement, the probability that precisely s bolts are defective
is given by the Binomial distribution (6)
P( Y = s ) = \binom{k}{s} \left( \frac{M}{N} \right)^{s} \left( 1 - \frac{M}{N} \right)^{k-s}, \qquad s = 0, 1, 2, \dots, k     (3)

Equation (2) can be thought of as a special case of equation (3), sampling one at a time (i.e. k = 1 and with s = 1; note also that \binom{k}{s} = \frac{k!}{(k-s)! \, s!}).

Similarly, in drawing a sample of k bolts, without replacement, the probability that precisely s bolts
are defective is given by the Hypergeometric distribution (6)

P( Y = s ) = \frac{ \binom{M}{s} \binom{N-M}{k-s} }{ \binom{N}{k} }, \qquad s = 0, 1, 2, \dots, k     (4)
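The two sampling schemes can be checked with the standard distributions in SciPy (a sketch, assuming SciPy is available), using the numbers of Ex.1 with s = 2 defective bolts in a sample of k = 2:

from scipy.stats import binom, hypergeom

N, M, k = 10, 3, 2       # Ex.1: 10 bolts, 3 defective, 2 drawn at random

# Equation (3): sampling with replacement (Binomial)
print(binom.pmf(2, k, M / N))        # P(both defective) = 0.09

# Equation (4): sampling without replacement (Hypergeometric)
# scipy's argument order is (s, population size, number defective, sample size)
print(hypergeom.pmf(2, N, M, k))     # P(both defective) = (3/10)(2/9), about 0.067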

Returning to the original problem, the values of N, M, k and s in the latter example can now be thought
of as percentages. That is, given k% of the weld is selected find the probability that it contains a
defective part (i.e. X > 0). It is felt that this is a more accurate description of the problem than the
way it is expressed on page 1 of the MBEL document (i.e.'…the problem is to determine the
probability of detecting at least 1 defect when only inspecting a percentage of the weld…')

P( X > 0 ) = P( X \ge 1 ) = 1 - P( X = 0 )     (5)

It is assumed that X < 0 does not have any meaning in the context of this problem. Let the
probability of Including a Defective part (PoI) = P(X > 0) and the probability of Not Including a
Defective part (PoN) = P(X = 0) and therefore equation (5) can be re-expressed as

PoI = 1 - PoN (6)

4.2.1. The Binomial Model (With Replacement)


Using equation (3) with s=0 and the probability in equation (2) given by Pd = Dd/100, it can be
shown that

PoI = 1 - \binom{k}{0} \left( \frac{D_d}{100} \right)^{0} \left( 1 - \frac{D_d}{100} \right)^{k-0}
\Rightarrow PoI = 1 - \left( 1 - \frac{D_d}{100} \right)^{k}, \qquad 0 \le k \le 100     (7)

Equation (7) is the equivalent formula to that given on page 2 of the MBEL document (Appendix
CI, Part B). The MBEL approach is equivalent to having 100 bolts of which Dd are defective (i.e.
100 - Dd are not defective), and if we sample k bolts with replacement, the probability that at least
one defective bolt is selected is given by equation (7).

Note that the mean value is given by:

\mu = Np = k P_d = \frac{ k D_d }{ 100 }     (8)

4.2.2. The Hypergeometric Model (Without Replacement)


Using equation (4), with s=0, N=100 and M=Dd it can be shown that

C5
" k # " 100 ! Dd #
$ %$ % ( 100 ! Dd )!
& 0 '& k ' ( 100 ! Dd ! k )!
PoI = 1 ! = 100 ! ( k ( 100 ! Dd )
" 100 # ( 100 ! k )!
$ %
& k '
(9)
PoI = 1 ( k > 100 ! Dd )

This is the equivalent formula to the one used in the MBEL document (page 2) to produce the
MBEL graph. Note, that in equation (9) k is used instead of the parameter Cov and Dd is used
instead of Pd. The PoI curves for different values of Dd and based on equation (9) are illustrated
here in Figure 3.
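As a sketch (Python with SciPy assumed), equation (9) can be evaluated with scipy.stats.hypergeom, for example with an illustrative defect distribution of Dd = 5%:

import numpy as np
from scipy.stats import hypergeom

Dd = 5                               # % of the 100 weld units that are defective (illustrative)
k = np.arange(0, 101)                # % of the weld inspected

# Equation (9): PoI = 1 - P(X = 0), where hypergeom.pmf(0, 100, Dd, k) is the
# probability that a sample of k of the 100 units contains no defective unit
poi = 1.0 - hypergeom.pmf(0, 100, Dd, k)

print(np.round(poi[[10, 50, 95, 96]], 4))
# PoI becomes exactly 1 once k > 100 - Dd (here k >= 96), as equation (9) states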

Note that the mean for the Hypergeometric distribution is the same as the Binomial distribution, but
their variances are different. This will be considered below.

4.3. A SUMMARY OF JCL’S MODELLING APPROACH


JCL did not consider approaching the problem like MBEL, because it would have meant saying that
the defective parts come as 1% units. In practice 1% could be as long as 10m and in general this
would be unrealistic. This is not offered as a criticism of the MBEL model but just a statement to
say why JCL did not consider modelling the problem this way. It will be shown below that the
MBEL and JCL approach derive quite different formulae for the Binomial case but the results are
almost identical.

In the JCL approach the weld is not divided into 100 discrete units. The weld is considered to have
a finite number of defects (L) with different lengths and which are randomly distributed along the
weld. This physical approach was adopted because of the way the problem was originally posed to
JCL by HSE.

The mean amount of defective weld, D%, is expressed in the following way. Consider a weld of total length x containing L randomly distributed defects with lengths x1, x2, x3, ..., xL. Then

D = \left( \frac{ x_1 + x_2 + x_3 + \dots + x_L }{ x } \right) \times 100     (10)

Hence the

Probability (of including a defective part if the whole weld is considered) is equivalent to

Probability (of including x1 or x2 or x3 or…..xL), which is given by

p = \frac{x_1}{x} + \frac{x_2}{x} + \frac{x_3}{x} + \dots + \frac{x_L}{x}, \quad \text{which from (10)} = \frac{D}{100}     (11)

This is developed more generally by first considering I% of the weld (instead of the whole weld)
then rI% of the weld (where r is a counter) and then the probability of including a defective part for

each rI% selected is established. While this is not difficult to show it is a little long winded and has
already been shown in the JCL report (2). Suffice to say that for any particular rI% of the weld, the
probability of including a defective part, prI is given by

p_{rI} = \frac{ rD }{ n \cdot 100 }, \qquad \text{where } nI = 100 \text{ and } 0 \le r \le n     (12)

In the JCL approach a probability is established for including a defective part for each rI% selected.
This probability is proportional to the size of selection, unlike the MBEL approach where the units
are of equal size (c.f. equation (1)). The value of prI is used in the Binomial expansion to establish
the PoI as before (c.f. equations (3), (5) and (6)).
PoI = 1 - \left( 1 - \frac{ rD }{ n \cdot 100 } \right)^{100}, \qquad 0 \le r \le n     (13)

Note that the mean is given by

\mu = Np = 100 \, p_{rI} = \frac{ rD }{ n }     (14)

4.4. A COMPARISON OF THE MBEL AND THE JCL APPROACHES


On comparing equations (8) and (14), and assuming Dd = D,

\frac{k}{100} = \frac{r}{n}     (15)

and the MBEL and JCL expressions for PoI based on the Binomial distribution can be compared directly:

PoI_{MBEL} = 1 - \left( 1 - \frac{D}{100} \right)^{k} \quad \text{and} \quad PoI_{JCL} = 1 - \left( 1 - \frac{ kD }{ 100^2 } \right)^{100}, \qquad 0 \le k \le 100     (16)

A graphical comparison between the MBEL and JCL formulae for the PoI is provided in Figure 4.
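The closeness of the two expressions in equation (16) can also be confirmed with a few lines of Python (a sketch, NumPy assumed), in the spirit of the comparison shown in Figure 4:

import numpy as np

D = 5.0                                             # illustrative mean % of defective weld
k = np.arange(0, 101)                               # % of the weld inspected

poi_mbel = 1.0 - (1.0 - D / 100.0) ** k             # equation (16), MBEL form
poi_jcl = 1.0 - (1.0 - k * D / 100.0 ** 2) ** 100   # equation (16), JCL form

# For this illustrative value of D the two curves differ by less than about 0.01
print(np.max(np.abs(poi_mbel - poi_jcl)))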

It may or may not seem surprising that these two quite different approaches give such close results.
Initial considerations would suggest that there is something too simplistic about the MBEL model
because it appears to be assuming that the defective parts come in 1% chunks. However, as the
model results show it agrees almost identically with the more direct approach of the physical
problem as outlined above for the JCL model. A plausible reason why the two approaches agree
is provided here.

The physical distribution of defects in the MBEL approach appears too simplistic at first glance.
However, it is as if this is irrelevant because from a modelling point of view the D% defective parts
and the (100-D)% non-defective parts, can be grouped together and re-distributed in 1% units along
the weld no matter how they appear in practice. The formula for PoI can then be developed as
shown above.
C7
4.5. COMMENTS ON THE HYPERGEOMETRIC MODEL
The Hypergeometric model does overcome the 'irritating' problem of inaccuracy at high values of
the % weld selection (i.e. large k) for low values of mean defect distribution (i.e. D < 4). The JCL
approach of modelling the problem more directly from a physical point of view would not produce
any different results by applying the Hypergeometric distribution. This is because for each rI% of
the weld selected, the whole rI% is sampled in developing the formula so the issue of with/without
replacement does not arise in the JCL approach.

If the Hypergeometric approach is adopted, it has to be appreciated that the PoI is the 'probability of
including a defective part of at least 1%' and not the 'probability of including at least 1 defect' as
stated in the MBEL document. This may or may not be a desirable way of expressing the
probability of including a defective part.

On the other hand, this does not arise with the Normal distribution as discussed in the JCL report,
notwithstanding the difficulties for D < 4. The comparison in this exercise has explained where this
difficulty comes from more clearly and provides further independent evidence validating the JCL
results, and for that matter validating the MBEL approach as well.

The issue of with/without replacement becomes increasingly less important for larger values of the
defect distribution D, or smaller values of the weld selection k. The mean for the Binomial and
Hypergeometric distributions is the same. However, the variance, σ2, for each distribution is not and
is given by
\sigma^2_{Binomial} = Np( 1 - p )

\sigma^2_{Hypergeometric} = Np( 1 - p ) \, \frac{ ( N - k ) }{ ( N - 1 ) }     (17)

It can be seen from equation (17) that the Hypergeometric parameters approach the Binomial ones
for N >> k (i.e. N much larger than k) and for such values the Hypergeometric distribution can be
approximated by the Binomial distribution.
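A quick numerical illustration of equation (17) and of the N >> k limit (Python; the values are illustrative, and the standard sampling parametrisation is used, i.e. N is the population size, k the sample size and p the proportion defective):

# Equation (17): variances of the with- and without-replacement sampling models
N, Dd = 100, 5
p = Dd / N

for k in (5, 20, 80):
    var_binomial = k * p * (1 - p)                             # with replacement
    var_hypergeometric = k * p * (1 - p) * (N - k) / (N - 1)   # without replacement
    print(k, round(var_binomial, 3), round(var_hypergeometric, 3))

# The finite-population factor (N - k)/(N - 1) shrinks the Hypergeometric variance;
# when N >> k the factor is close to 1 and the two models agree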

5. FURTHER ANALYSIS OF REAL DATA FROM HSE AND OTHER SOURCES


The lack of available inspection data from LPG spheres was part of the difficulty in carrying out
rigorous statistical tests in the earlier work (1, 2). However, in order to carry out such tests it is
necessary to provide appropriate data. It was decided to re-use the LPG inspection data originally
provided by HSE (7), in the absence of other LPG inspection data, which was collected using
ultrasonic NDT where both the ultrasonic signal response and the ultrasonically measured flaw
lengths are provided.

Other data was also researched and it was decided to use data from an earlier HSE funded project
(8), which is also ultrasonic NDT data and has accompanying sectioning data. The sectioning data
is used here in two forms. The first form of the sectioning dataset is called the ‘full dataset’, where
some flaw lengths had to be rounded up or down because the flaw length was given only by an
inequality (e.g. sectioning flaw length > 60mm, would have been taken to be 61mm). The second
form of the sectioning dataset is called the ‘reduced dataset’, where only the precise flaw length
given is included. Thus the reduced dataset is a subset of the full dataset. It was felt useful to
include both datasets in order to assess the effects of rounding the sectioning length up or down and
the effects of using a relatively smaller dataset.
C8
5.1. DATASETS INVOLVING FLAW LENGTHS
There are four datasets involving flaw lengths and these are given in Appendix CII and are
identified as:

i. Case 1: Pi data
ii. Case 2: Sectioning data (full)
iii. Case 3: Ultrasonic flaw lengths
iv. Case 4: Sectioning data (reduced)

Each dataset was analysed in the same way using the statistical software package S-Plus.

On entering a dataset in S-Plus, one typical output format is a set of four presentations:

• A histogram
• A box plot
• A probability density function
• A quantile-quantile plot

The four presentations for Case 1 (Pi data) are illustrated in Figure 5a.

A histogram is the graphical version of a table which shows what proportion of cases fall into each
of several specified categories. The categories are usually specified as non-overlapping intervals of
some variable (e.g. flaw length intervals (L) 0 ≤ L <10mm, 10 ≤ L < 20mm etc).

A box plot (also known as a box-and-whisker diagram) is a convenient way of illustrating


graphically a summary of five statistical numbers, which consists of the smallest observation, the
lower quartile, the median, the upper quartile and the largest observation.

A probability density function is the continuous equivalent of the histogram representation (for
more details see Appendix A).

A quantile-quantile (q-q) plot is based on a graphical technique for determining if two data sets
come from a population with a common distribution. The q-q plot in Figure 5a shows the Pi flaw
lengths plotted against the predicted standardised length based on a Normal distribution, with the
mean and standard deviation of the Pi flaw lengths.

In order to establish the most appropriate statistical distributions that fit the four data sets in
Appendix CII, q-q plots as well as cumulative frequency distribution (cfd) plots (see Appendix
A) were initially considered.

From the earlier study (1, 2) and more recent research, a number of distributions were analysed.
Ultimately four statistical distributions were believed to be the most appropriate to carry out formal
statistical tests on. These are:

• The Normal
• The Exponential
• The Logistic (Log-odds)
• The Lognormal

C9
Each of the above distributions was used in the q-q plots and the cfd plots, as a way of providing a
visual comparison between the respective distributions and each of the data sets.

The q-q plots and cfd plots for Case 1 (Pi data) are illustrated in Figures 5b and 5c respectively. The
figures show a degree of agreement with each of the distributions considered, but the cfd plots
demonstrate that the Lognormal is perhaps the best fit.

Figures 6a, 6b and 6c illustrate Case 2 (full sectioning data) analyses using S-Plus. The q-q plots
and the cfd plots of Figures 6b and 6c all show some agreement with the distributions. Here,
perhaps it is not absolutely clear which is the best fit, except to say that the cfd Lognormal fit is as
good as any of the others.

The S-Plus results for Case 3 (Ultrasonic length data) are illustrated in Figures 7a, 7b and 7c and the
results for Case 4 (reduced section data) are illustrated in Figures 8a, 8b and 8c.

For Case 3 the Lognormal is perhaps the overall best fit. Case 4 should be similar to Case 2, as it is
a subset of Case 2. A visual comparison of the results in Figures 6 and 8, shows that the results are
indeed very similar. This suggests that the rounding up or rounding down carried out to include
more flaw lengths for Case 2 was reasonable, and the smaller data set of Case 4 did not make much
difference to the overall results.

The plots in Figures 5 to 8 are useful in that they provide clues as to the distribution that best fits the
data considered. However, there are formal statistical tests that provide quantitative information for
assessing best fit.

To test if a distribution is appropriate for a particular data set, parametric or non-parametric tests are
used according to the type of the data studied. Parametric tests are based on the fact that the
distribution is known a priori. Non-parametric tests are based on the fact that the distribution is not
known. For example, if the shape indicates a nearly Normal distribution without outliers, the
Student's t tests can be used. If the data contain outliers or are far from Normal, a non-parametric
method is used such as the Wilcoxon rank test or the Kolmogorov-Smirnoff test (9). The
significance test (10) to be carried out on the four data sets is non-parametric since the distribution
for best fit is not known.

Here, the null hypothesis H0 is:

H0: "The studied distribution is appropriate for the data"

and so the alternative hypothesis H1 is:

H1: "The studied distribution is not appropriate for the data"

The mean and standard deviation for the four flaw length data sets are given in Table 1 and the
Kolmogrov-Smirnoff test was applied.

A 95% (or 0.95) confidence level was selected. The test computes a statistical parameter ‘p’ and if
the p-value is smaller than 0.05, then the null hypothesis H0 cannot be accepted and alternative
hypothesis H1 is accepted. If the p-value is higher than 0.05, then the null hypothesis H0 is accepted
and the alternative hypothesis H1 is rejected with an error of 0.05.

The results of the Kolmogrov-Smirnoff test on the four data sets are given in Table 2.

C10
From Table 2 the only p-values larger than 0.05, for each of the datasets, are for the Log-odds and
Lognormal distributions. Hence, the null hypothesis H0 is accepted for the Logistic and Lognormal
distributions, but H0 is rejected for the Normal and Exponential distributions. Moreover, the p-
values calculated using the Lognormal distribution are larger than the p-values from the Log-odds
distribution. These results serve as quantitative evidence for the distribution of best fit. This is not to
say of course that using the other distributions is useless, it just means that the Lognormal is the
best fit.

Some non-parametric tests are much more appropriate for testing particular distributions. The
Shapiro-Wilk test (9) has been developed for the Normal distribution and the results for the four
data sets are given in Table 3. The results of Table 3 offer additional confirmation that the Normal
distribution, which was used in the earlier JCL model (2), is not the most appropriate distribution
for describing flaw length distributions as all the p-values are significantly less than 0.05.
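The kind of tests reported in Tables 2 and 3 can be reproduced with SciPy (a sketch; SciPy is assumed, and the flaw lengths below are randomly generated for illustration, not the Appendix CII datasets). Each candidate distribution is first fitted to the data and then compared with it using the one-sample Kolmogorov-Smirnov test; the Shapiro-Wilk test is applied directly as a test of Normality:

import numpy as np
from scipy import stats

# Illustrative flaw-length data (mm); not the Appendix CII datasets
rng = np.random.default_rng(1)
lengths = rng.lognormal(mean=3.0, sigma=0.6, size=60)

# One-sample Kolmogorov-Smirnov tests against the fitted distributions
candidates = [("Normal", stats.norm), ("Exponential", stats.expon),
              ("Logistic", stats.logistic), ("Lognormal", stats.lognorm)]
for name, dist in candidates:
    params = dist.fit(lengths)                       # fit the distribution to the data
    statistic, p_value = stats.kstest(lengths, dist.cdf, args=params)
    print(name, round(p_value, 4))

# Shapiro-Wilk test, developed specifically for the Normal distribution
statistic, p_value = stats.shapiro(lengths)
print("Shapiro-Wilk", p_value)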

5.2. DATASETS INVOLVING FLAW DEPTHS


The statistical study that was carried out over the lengths of the defects has also been carried out on
the flaw depths. Here, only Case 2, Case 3 and Case 4 have flaw depth data.

The four S-Plus presentations for Case 2 (Full Sectioning Data) are illustrated in Figure 9a.
The q-q plots and cfd plots for Case 2 are illustrated in Figures 9b and 9c respectively. Figure 9b
does not show that any one distribution is better than another. Figure 9c shows a degree of
agreement with each distribution, but the Exponential and the Lognormal are the best fit.

Figures 10a, 10b and 10c illustrate Case 3 (Ultrasonic Flaw Depth Data) analyses using S-Plus. The
q-q plots and cfd plots of Figures 10b and 10c all show some agreement with each of the
distributions considered; except that the cfd plot of the Exponential distribution is the worst fit.

The S-Plus results for Case 4 (Reduced Section Data) are illustrated in Figures 11a, 11b and 11c.
Figure 11b shows that the Exponential distribution is probably the best fit. For the other
distributions, most of the data follows the trend of the line but with a lot of points not on the line.
Figure 11c shows that each distribution exhibits some fit with the data, but the Exponential and the
Lognormal are the best fit.

Here, the null hypothesis H0 is:

H0: "The studied distribution is appropriate for the data"

and so the alternative hypothesis H1 is:

H1: "The studied distribution is not appropriate for the data"

The mean and standard deviation for the three flaw depth data sets are given in Table 4 and the
Kolmogrov-Smirnoff test was applied.

As with the flaw length data a 95% confidence level was selected. As before, the test computes a
statistical parameter ‘p’ and if the p-value is smaller than 0.05, then the null hypothesis H0 cannot
be accepted and alternative hypothesis H1 is accepted. If the p-value is higher than 0.05, then the
null hypothesis H0 is accepted and the alternative hypothesis H1 is rejected with an error of 0.05.

The results of the Kolmogrov-Smirnoff test on the three data sets are given in Table 5.

C11
From Table 5, the only distributions with p-values larger than 0.05 for all three datasets are the
Log-odds and Lognormal distributions (the Exponential for Cases 2 and 4, and the Normal for Case 3,
also give p-values above 0.05 for those individual cases). Hence, the null hypothesis H0 can be
accepted in all cases for the Log-odds and Lognormal distributions. The p-values calculated using the
Lognormal distribution are larger than the corresponding p-values using the Log-odds distribution,
and so overall, the Lognormal is accepted as the best fit.

On applying the Shapiro-Wilk test (9) to each case, it was found, as with the length data, that the
Normal distribution is not the most appropriate distribution for describing flaw depth data.

5.3. REASSESS AND VALIDATE THE MODEL


Despite the fact that the JCL approach for establishing the PoI curves has been discussed in section
4.3 above, the problem as originally stated is given here for completeness and for the analysis that
follows.

Consider a weld which is D% defective and where the defects are distributed randomly. The
problem is: What is the probability of including a defective part, given that I% of the weld is
selected, where I% ≤ 100.

Consider the random variable X and let it be the event: ‘the % of defective weld included’. The
above question can be expressed mathematically in the following way:

What is the P(X>0) when I% of the weld is selected?

Clearly, if 100% of the weld is selected then it should follow P(X>0) = 1 (i.e. a certainty), since the
weld is known to be D% defective.

In order to compute P(X>0), probability theory tells us that

P(X>0) = 1 - P(X ≤ 0)

In this case it does not make sense physically to consider X < 0 (i.e. negative). So the real problem
is to calculate P(X = 0) for the distribution in question.

In the work completed so far (2), a given D% is assumed and then for every value of I% (i.e. the
amount of weld selected) a unique mean value μI is derived, which corresponds to the amount of
defective weld contained in the I% of the weld selected. It follows that if I% = 100, then μ = D.

The value of μ corresponding to each I% selected (i.e. μI) is substituted in each of the statistical
distributions used to calculate P(X = 0) and hence P(X>0).

With the previous and current study PoI curves have been calculated using 6 different distributions:

1. Poisson
2. Binomial
3. Hypergeometric
4. Normal
5. Log-odds
6. Lognormal

C12
For the problem as stated above, the first 5 distributions agree very closely and some of the results
have been compared in earlier reports (2). The Binomial and the Hypergeometric have been
compared in this Appendix, which is important given that the two approaches were quite different
and developed independently by two different organisations. The Poisson, Binomial and the
Hypergeometric are all discrete distributions and so the probability

P(X>0) = 1 - P(X = 0) should really be re-expressed as


P(X≥1) = 1 - P(X = 0)

since X can only take on whole number values 0, 1, 2, 3, 4 etc.

The Normal, Log-odds and Lognormal are all continuous distributions and do not ‘suffer’ from the
need to re-express P(X>0). However, in each case it is still necessary to compute P(X = 0).

In the case of the Log-odds distribution, the computation of P(X = 0) can be done analytically using
a formula for the cumulative distribution function F(x), which provides a clear idea of the behaviour
at all values of x, and in particular at X = 0, for different values of D and μI, where μI and D are
related by equation (14). F(x) for the Log-odds model is usually expressed as

F( x ) = \frac{ 1 }{ 1 + \exp\!\left[ \frac{ -( x - \mu_I ) }{ b } \right] }

where b is a shape parameter related to the standard deviation (σ) by:

\sigma = \frac{ \pi b }{ \sqrt{3} }

In the case of the Normal and the Lognormal distributions, and for the sake of efficiency, it is
necessary to use statistical packages to compute the results. For example, Microsoft Excel allows the
input of X=0 in the statistical function NORMDIST(X, μ, σ, TRUE) (i.e. the Normal cumulative
distribution function), where μ and σ are the mean and standard deviation of X.

However, the Excel statistical function LOGNORMDIST(X, μl, σl) (i.e. the Lognormal cumulative
distribution function), where μl and σl are the mean and standard deviation of lnX respectively,
does not allow the input of X=0. In order to analyse the behaviour of the Lognormal distribution at
X=0 (i.e. to compute P(X>0)), it is necessary to do it numerically with values that gradually get
closer to zero. The PoI curves for the cases X=0.02, 0.004 and 0.0008, for example, are illustrated in
Figures 12a, 12b and 12c respectively. In truth, Figures 12a, 12b and 12c should be interpreted as
P(X>0.02), P(X>0.004) and P(X>0.0008) respectively. From Figure 12, the behaviour of the
Lognormal distribution suggests that as X → 0, P(X>0) tends to a certainty, no matter how much of the
weld is selected and no matter how much of it is defective. Whilst this seems logical at one level, it
is in contrast with the results using the other distributions.
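The same calculations can be sketched outside Excel, for example with SciPy (illustrative values of μI and σ only); the Normal and Log-odds cumulative distribution functions can be evaluated at X = 0 directly, whereas the Lognormal has to be examined at values of X approaching zero:

import numpy as np
from scipy.stats import norm, logistic, lognorm

mu_I, sigma = 2.0, 1.0                 # illustrative mean and standard deviation of X

# PoI = P(X > 0) = 1 - P(X <= 0), via the cumulative distribution functions
b = sigma * np.sqrt(3.0) / np.pi       # Log-odds shape parameter, from sigma = pi*b/sqrt(3)
print(1 - norm.cdf(0.0, loc=mu_I, scale=sigma))      # Normal
print(1 - logistic.cdf(0.0, loc=mu_I, scale=b))      # Log-odds

# The Lognormal is only defined for X > 0, so examine values approaching zero
mu_l, sigma_l = np.log(mu_I), 0.5      # illustrative mean and standard deviation of ln X
for x in (0.02, 0.004, 0.0008):
    print(x, 1 - lognorm.cdf(x, s=sigma_l, scale=np.exp(mu_l)))
# P(X > x) tends to 1 as x tends to 0, whatever the parameters, which is the
# behaviour discussed above in relation to Figure 12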

In sections 5.1 and 5.2 it has been shown, through the various statistical tests, that the Lognormal
distribution was the ‘best’ fit for the distribution of flaw length or flaw depth data, which have come
from a variety of data sources (e.g. determined by ultrasonic NDT or determined by sectioning).
The Log-odds distribution was the next ‘best’ fit. This of course does not imply that the Lognormal
will also be the most appropriate distribution to use in the PoI model, which is really a different
problem to fitting a statistical distribution to a set of flaw data. Moreover, the behaviour of the
Lognormal at X=0 suggests that it is not the most appropriate distribution to use for calculating the
PoI curves, particularly as the PoI curves of concern here are defined by P(X>0) = 1 – P(X=0).
However, at larger values of X, the behaviour of the Lognormal does seem appropriate
(e.g. P(X>1) = 1 - P(X≤1)).

The next ‘best’ fit to the flaw data was the Log-odds distribution and this has a number of
advantages in the context of the PoI curves. The Log-odds distribution has an analytic function to
describe its behaviour at all values of X. It is continuous, so the values of X are not limited to
discrete jumps. The values of the mean and standard deviation are not limited by some relationship,
as in the case of the Poisson and Binomial, and hence allow for more general cases. The PoI curves
using the Log-odds distribution agree very well with the PoI curves using the Poisson, Binomial,
Hypergeometric and the Normal. A comparison of the PoI curves for P(X>0), using the Normal and
Log-odds distributions are illustrated in Figures 13a and 13b respectively.

5.4. SELECT THE BEST AVAILABLE MODEL


From Figure 13, the difference between the PoI curves calculated using the Log-odds and the
Normal distributions is negligible. However, it is recommended that the Log-odds distribution is
used to compute the PoI curves, mainly for the reasons stated above and also because the Log-odds
distribution has been shown to be a better statistical fit to the distribution of flaw lengths and flaw
depths than the Normal distribution.

6. CONCLUSIONS
• The earlier JCL ‘Probability of Inclusion’ model has been validated against an independently
developed ‘Probability of Inclusion’ model by MBEL.
• The Lognormal distribution and the ‘Log-odds’ distribution were found to be the best fits for
the flaw length and flaw depth data in this study, with the Lognormal distribution being the
optimum fit, according to the formal statistical tests carried out.
• The ‘Log-odds’ distribution was found to be the most appropriate distribution to use with the
‘Probability of Inclusion’ model.

7. RECOMMENDATIONS
• The ‘Probability of Inclusion’ model should be applied more widely than just to LPG storage
vessels.

8. ACKNOWLEDGEMENTS
The authors would like to thank the HSE for funding this study, and in particular Graeme Hughes
for his useful comments and advice throughout.

9. REFERENCES
1. Georgiou G A: ‘The Extent of Ultrasonic Non-Invasive Inspection of LPG Storage Vessels’.
HSE Project, JCL Report No. 2/8/99, (September 1999, Revision 1).
2. Georgiou G A: ‘Probabilistic Models for Optimising Defect Detection in LPG Storage Vessels’.
HSE Project, JCL Report No. 3/3/00 (June 2000)
3. Georgiou G A: ‘Proposed Guidelines for Estimating the Extent of Manual Ultrasonic NDT for
LPG Storage Vessels’. HSE Project, JCL Report No. 4/3/00 (July 2000)
4. Georgiou G A: ‘Probabilistic models for optimising defect detection in LPG welds’.
Proceedings of BINDT, September 2000
5. Georgiou G A: ‘Probabilistic Methods for Optimising defects Detection’ (A comparison of
Mitsui Babcock Energy Ltd (MBEL) approach and Jacobi Consulting Ltd approach),
Confidential draft document for HSE (June 2000).
6. Kreyszig, E: ‘Advanced engineering mathematics’, John Wiley & Sons, Inc. 1983 (5th Edition,
pp 922-927).
7. Pi report: ‘Sample inspections of welds on LPG spheres’ (Report Ref. IC 0041/1/98 December
1998).
8. Georgiou G A: ‘Adopting European ultrasonic standards for high quality fabrications:
Implications for manufacturers and end users’. A TWI report (No. 5657/10/95) December 1995.
9. The Kolmogorov-Smirnov Test: http://en.wikipedia.org/wiki/Kolmogorov_Smirnoff_Test
10. Crawshaw J and Chambers J: ‘A concise course in A level Statistics’, Second Edition, Stanley
Thornes (Publishers) Ltd, 1992.

Table 1 The mean and standard deviation for the four flaw length data sets

Case 1 Case 2 Case 3 Case 4


Mean 22.18966 27.09 33.26 26.75
Standard Deviation 13.37772 20.02949 24.47578 16.91343

Table 2 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each data set against
each distribution

Sample Normal Exponential Logistic Lognormal


Case 1 0.0001 0 0.0526 0.4391
Case 2 0.0179 0.0217 0.162 0.3706
Case 3 0.0005 0.0133 0.1063 0.9122
Case 4 0.0126 0.0034 0.142 0.3223

Table 3 The ‘p-value’ calculated using the Shapiro-Wilk test for each data set against the
Normal distribution

Sample Normal
Case 1 2.220446e-016
Case 2 7.333356e-011
Case 3 0
Case 4 0

Table 4 The mean and standard deviation for the three flaw depth data sets

Case 2 Case 3 Case 4


Mean 4.44 8.2 4.207
Standard Deviation 3.663109 4.956123 3.629328

Table 5 The ‘p-value’ calculated using the Kolmogorov-Smirnov test for each flaw depth data
set against each distribution

Sample Normal Exponential Logistic Lognormal


Case 2 0.0093 0.3628 0.191 0.6419
Case 3 0.5 0.0052 0.5661 0.9381
Case 4 0.0113 0.5 0.1342 0.8328

Table 6 The ‘p-value’ calculated using the Shapiro-Wilk test for each flaw depth data set
against the Normal distribution

Sample Normal
Case 2 1.291189e-013
Case 3 0
Case 4 0

Figure 1 An illustration of a Liquid Petroleum Gas (LPG) sphere

Figure 2 An illustration of a collapsed LPG sphere
[Figure: PoI curves for the Hypergeometric model, for defective weld fractions Dd = 1%, 2%, 3%, 4%, 10% and 40%, plotted against the % inspection (k). The model shown in the figure is
    POI = 1 − C(100 − Dd, k) / C(100, k)   for k ≤ 100 − Dd,   and   POI = 1   for k > 100 − Dd,
where C(n, r) denotes the binomial coefficient.]

Figure 3 Probability of including a defective part of at least 1% given a certain % level of inspection (Hypergeometric model)

[Figure: comparison of the JCL and MBEL Binomial PoI curves for D = 1%, 4%, 10% and 40% defective weld, plotted against the % inspection (k), using
    POI_MBEL = 1 − (1 − D/100)^k   and   POI_JCL = 1 − (1 − kD/100²)^100,   for 0 ≤ k ≤ 100.]

Figure 4 Probability of including a defective part of at least 1% given a certain % level of inspection
(Comparison of JCL and MBEL Models)

Figure 5a A typical analysis and presentation of Case 1 (Pi data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (lengths of defects)

Figure 5b Quantile-Quantile plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 5c Cumulative density function plots of Case 1 (Pi data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 6a A typical analysis and presentation of Case 2 (full sectioning data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (lengths of defects)

Figure 6b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 6c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 7a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (lengths of defects)

Figure 7b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 7c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions
Figure 8a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (lengths of defects)

Figure 8b Quantile-Quantile plots of Case 4 (reduced ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 8c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 9a A typical analysis and presentation of Case 2 (Full sectioning data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (depths of defects)

Figure 9b Quantile-Quantile plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 9c Cumulative density function plots of Case 2 (full sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 10a A typical analysis and presentation of Case 3 (ultrasonic data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (depths of defects)

Figure 10b Quantile-Quantile plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 10c Cumulative density function plots of Case 3 (ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 11a A typical analysis and presentation of Case 4 (reduced sectioning data) using S-Plus: (a) data presented as a histogram, (b) boxplot, (c) data density, (d) Quantile-Quantile plot (depths of defects)

Figure 11b Quantile-Quantile plots of Case 4 (reduced ultrasonic data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

Figure 11c Cumulative density function plots of Case 4 (reduced sectioning data) against (a) Normal, (b) Exponential, (c) Logistic and (d) Lognormal distributions

[Figure: three panels of PoI curves calculated with the Lognormal distribution for 1%, 2%, 3%, 4%, 5%, 6%, 8% and 10% defective weld, plotted against the % inspection (I%), with P(X>0) approximated by 1 − P(X ≤ x) for the small values of x given below.]

Figure 12 Estimating the probability of including a defective part (P(X>0) = 1 − P(X=0)) using
the Lognormal distribution given a certain % level of inspection for the cases
(a) X=0.02, (b) X=0.004 and (c) X=0.0008
[Figure: PoI curves for 1%, 2%, 3%, 4%, 5%, 6%, 8% and 10% defective weld plotted against the % inspection (I%), using (a) a Normal distribution and (b) a Log-odds (labelled 'Logistic') distribution, each with variance = mean for comparison with the Poisson.]

Figure 13 The probability of including a defective part (P(X>0) = 1 − P(X=0)) using
(a) the Normal distribution and (b) the Log-odds distribution
APPENDIX CI

PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL


‘PROBABILITY OF INCLUSION’ MODELS

PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL


(Kindly provided by B Shepherd, MBEL November 2005)

PART A: A SUMMARY COMPARISON OF THE MBEL AND JCL PROBABILITY OF
INCLUSION MODELS
1. GENERAL POINTS ABOUT MBEL DEFINITIONS
(a) The word 'uniform' on page 1 should be avoided as it stands for a specific distribution in statistics.
(b) Pd is not the 'Probability that a unit contains a defect' but the 'Probability that any one unit selected
at random is defective' or the 'Probability that any 1% of the weld selected at random is defective'.
(c) The notation for the amount of weld selected 'Cov' should be avoided as this could be initially
misunderstood as representing the 'Covariance', another statistical parameter.

2. SUMMARY OF MBEL'S AND JCL’S MODELLING APPROACHES


In essence the MBEL approach divides the weld into one hundred discrete 1% units, Dd% of which is
defective and randomly distributed. The problem considered is, if k% of the weld is selected find the
probability it includes a defective part (a) with replacement (Binomial), (b) without replacement
(Hypergeometric).

In the JCL approach the weld is not divided into one hundred discrete units. The weld is considered to
have a finite number of defects (L) with different lengths, which are randomly distributed along the
weld, resulting in a mean amount of defective weld D%. A probability is established for including a
defective part for each I% selected. This probability is proportional to the size of the selection, unlike the
MBEL approach where the units are of equal size. This result is then fed into the different distributions
considered, for all possible I% selections.

3. A COMPARISON OF THE MBEL APPROACH AND THE JCL APPROACH FOR


THE BINOMIAL
The two different formulae derived using two different approaches are:

    POI_MBEL = 1 − (1 − D/100)^k   and   POI_JCL = 1 − (1 − kD/100²)^100,   for 0 ≤ k ≤ 100        (1)

The comparison using equation 1 (c.f. Figure 4, Appendix C) shows the results are almost identical.
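
As a quick cross-check of this claim, the two formulae in equation 1 can be evaluated directly, for example with the short Python sketch below (plain Python, no libraries; the D and k values are illustrative).

    # Sketch comparing the MBEL and JCL Binomial-based formulae in equation 1.
    def poi_mbel(k, D):
        # MBEL form: POI = 1 - (1 - D/100)^k, with k the % of weld selected
        return 1.0 - (1.0 - D / 100.0) ** k

    def poi_jcl(k, D):
        # JCL form: POI = 1 - (1 - k*D/100^2)^100, valid for 0 <= k <= 100
        return 1.0 - (1.0 - k * D / 100.0 ** 2) ** 100

    for D in (1, 4, 10):
        for k in (10, 50, 90):
            print(f"D = {D:2d}%, k = {k:2d}%:  MBEL = {poi_mbel(k, D):.4f}  JCL = {poi_jcl(k, D):.4f}")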

4. COMMENTS ON THE HYPERGEOMETRIC MODEL


The Hypergeometric model does overcome the 'irritating' problem of inaccuracy at high values of the %
weld selection (i.e. large k) and for low values of mean defect distribution (i.e. D < 4).

If the Hypergeometric approach is adopted, it has to be appreciated that the PoI is the 'probability of
including a defective part of at least 1%' and not the 'probability of including at least 1 defect' as stated
in the MBEL document. The 'probability of including a defective part of at least 1%' may not be the
most desirable way of expressing the probability of including a defective part.

On the other hand, this is not an issue with the Normal or Log-odds distributions as discussed in
Appendix C, notwithstanding the difficulties for D < 4. The comparison in this exercise has explained
more clearly where this difficulty comes from, and provides further independent evidence validating the
JCL results and, for that matter, the MBEL results.

PART B: THE MBEL ‘PROBABILITY OF INCLUSION’ MODEL

PROBABILITY OF DETECTING AT LEAST ONE DEFECT AS A FUNCTION OF WELD


DEFECTIVENESS AND SAMPLE SIZE.

Description of Problem:
Assume a weld has a uniform random distribution of defects within its length. This random
distribution can be represented as a percentage of defective weld structure. If a defect will be
detected when the region it is in is inspected (assumes 100% detection capability), the problem is
to determine the probability of detecting at least 1 defect when only inspecting a percentage of the
weld volume (and therefore e.g. being alerted to the fact that a particular degradation mechanism
is active).
POI : Probability of Inclusion of a Defect
Cov : Percentage Coverage of the Weld Volume.
Dd : Defect Distribution as a Percentage

Assumptions:
To assist in calculating the POI the following assumptions are made:
The weld volume can be divided into 100 discrete units.
Probability that a unit contains a defect:
Pd = Dd / 100%.
Probability that a unit contains no defect:
P0 = 1 - Pd
Probability of Inclusion:
POI = 1-PON
Where
PON : Probability of Detecting No Defects

Methods:
Using the above definitions and assumptions a number of models could be used to determine the
POI:

Simple Evaluation:
Using the following data evaluation, a formula can be determined and extrapolated to 100%
coverage.

    Pd %:   1                             2                             3                             4
    P0 %:   99                            98                            97                            96

    Cov %   PON                           PON                           PON                           PON
    0       1                             1                             1                             1
    1       99/100                        98/100                        97/100                        96/100
    2       (99·98)/(100·99)              (98·97)/(100·99)              (97·96)/(100·99)              (96·95)/(100·99)
    3       (99·98·97)/(100·99·98)        (98·97·96)/(100·99·98)        (97·96·95)/(100·99·98)        (96·95·94)/(100·99·98)
    4       (99·98·97·96)/(100·99·98·97)  (98·97·96·95)/(100·99·98·97)  (97·96·95·94)/(100·99·98·97)  (96·95·94·93)/(100·99·98·97)

General Formula

    PON = [(100 − Pd)! / (100 − Pd − Cov)!] / [100! / (100 − Cov)!]
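
The general formula above is simply a hypergeometric calculation (sampling the 100 units without replacement). A hedged sketch in Python, assuming the same 100-unit division, is given below; math.comb is used to avoid evaluating the large factorials directly.

    # Sketch of the general formula: PON = C(100-Pd, Cov) / C(100, Cov), POI = 1 - PON,
    # where Pd is the number of defective 1% units and Cov the number of units inspected.
    from math import comb

    def pon_general(pd_units, cov_units):
        if cov_units > 100 - pd_units:
            return 0.0                      # any selection must include a defective unit
        return comb(100 - pd_units, cov_units) / comb(100, cov_units)

    def poi_general(pd_units, cov_units):
        return 1.0 - pon_general(pd_units, cov_units)

    print(poi_general(4, 50))               # e.g. 4 defective units, 50% coverage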
Binomial Distribution:

    POI = 1 − PON = 1 − [n! / ((n − r)! r!)] × P0^(n−r) × Pd^r

Where:
n = Cov
r = 0 defects to detect
By altering n from 0 to 100% coverage the POI rises from 0 to 1 in a curve.

Poisson Distribution:

    POI = 1 − PON = 1 − (e^(−µ) µ^r) / r!

where
µ = nPd
n = Cov
r = 0 defects to detect

Graphs for all three analysis methods are presented.


Generally, all three graphs exhibit similar trends.
The Binomial and Poisson relationships do not provide reliable results for high coverage of welds
with a low defect distribution, since the graphs do not predict a probability of 1 at 100% coverage of
a weld with a 4% uniform defect distribution.
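
This behaviour is easy to verify numerically; the sketch below (plain Python, illustrative only) evaluates the r = 0 Binomial and Poisson expressions at 100% coverage and shows that neither reaches a probability of 1 for low defect distributions.

    # Sketch of the r = 0 Binomial and Poisson POI expressions at 100% coverage.
    from math import exp

    def poi_binomial(pd, cov):
        # POI = 1 - P0^n, with P0 = 1 - pd/100 and n = cov
        return 1.0 - (1.0 - pd / 100.0) ** cov

    def poi_poisson(pd, cov):
        # POI = 1 - e^(-mu), with mu = n * pd/100
        return 1.0 - exp(-cov * pd / 100.0)

    for pd in (4, 10, 40):
        print(f"Pd = {pd:2d}%:  Binomial POI(100%) = {poi_binomial(pd, 100):.4f}  "
              f"Poisson POI(100%) = {poi_poisson(pd, 100):.4f}")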
A more rigorous treatment, which also addresses the probability of detection (POD) of the
inspection method, is provided in the reference below.
Reference
Probabilistic Models for Optimising Defect Detection in LPG Welds
G. A. Georgiou, Proceedings of the British Institute of NDT Conference 2000, pp 168 - 173

[Graph: Probability of Detecting 1 Defect Given a Percentage Coverage of a Defective Weld, Simple Distribution model, for 4%, 10% and 40% defective welds (uniform distribution); probability of detecting a defect vs percentage coverage.]
[Graph: Probability of Detecting 1 Defect Given a Percentage Coverage of a Defective Weld, Binomial distribution model, for 4%, 10% and 40% defective welds; probability of detecting a defect vs percentage coverage.]
[Graph: Probability of Detecting 1 Defect Given a Percentage Coverage of a Defective Weld, Poisson distribution model, for 4%, 10% and 40% defective welds; probability of detecting a defect vs percentage coverage.]
APPENDIX CII

THE DATASETS USED IN THE STATISTICAL


ANALYSES IN APPENDIX C

Case 1 (Pi data, 58 UT data) Case 2 (TWI, 50 sectioning data) Case 3 (TWI, 50 UT data) Case 4 (TWI, 42 sectioning data)
See Pi report IC 0041/1//98 See GSP 5657/10/95 See GSP 5657/10/95 See GSP 5657/10/95
Appendix C Tables B2 & B3 corresponding to 50 UT lengths and depths Tables B2 & B3, 50 UT lengths and depths Tables B2 & B3 corresponding to 42 UT lengths and 54 UT depths

Original Length Section Flaw Section Depth Flaw Length Depth Flaw Length Depth
Reference (mm) Ident. Identification Length (mm) (mm) type mm mm type mm mm
B1-1-IP 35 IP 10 CP 10 25 5 CP 6 23 NA CP2 2.6
B1-2-IP 27 IP 8 Crk 1 38 4.1 CP10 5.5 CP 6 13 0
B1-3-IP 40 IP 9 Crk 2 38 5.7 Crk 1 30 NA CP10 5
B1-4-IP 25 IL 6 Crk 4 39 2.6 Crk 2 33 NA Crk 1 28 4.1
B1-5-IP 20 IP 6 Crk 5 48 4 Crk 4 40 NA Crk 2 38 5.7
B1-6-IP 35 LRF 1 Crk 6 43 1 Crk 5 60 3.5 Crk 4 39 2.6
B2-1-IP 35 W 2 Crk 7 38 5.4 Crk 6 57 <3 Crk 5 48 4
B2-3-IP 70 W 3 IL 1 47 3.4 Crk 7 35 <3 Crk 6 43 1
B2-5-IP 40 W 4 IL 2 38 3.5 IL 1 52 9 Crk 7 38 5.4
B2-6-IP 20 W 1 IL 6 4 1.9 IL 2 41 3 IL 1 47 3.4
B2-7-IP 80 CP 2 IL 7 16 13.5 IL 6 13 <3 IL 2 38 3.5
B3-1-IP 40 CP 8 IP 1 10 15 IL 7 31.5 4 IL 6 4 1.9
B3-2-IP 27 IP 5 IP 2 61 1 IP 1 23 <3 IL 7 16 13.5
B3-3-IP 20 IP 1 IP 3 14 8.4 IP 2 41 3.5 IL 10 12 0.7
B3-4-IP 25 IP 4 IP 4 10 9.8 IP 3 8 NA IP 1 10 15
B3-5-IP 20 PL 3 IP 5 9 0.6 IP 4 23 NA IP 2 1
B3-6-IP 30 CP 7 IP 7 31 1.3 IP 5 14 NA IP 3 14 8.4
B3-7-IP 32 IL 3 LSF 1 67 3.5 IP 7 18 <3 IP 4 10 9.8
B3-9-IP 28 LSF 7 LSF 2 56 5.5 LSF 1 87 10 IP 5 0.6
B3-10-IP 23 IL 10 LSF 3 44 8.5 LSF 2 65 13 IP 7 1.3
B4-1-IP 15 PL 2 LSF 4 46 8 LSF 3 71 7 LSF 1 63 3.5
B4-2-IP 15 CP 3 LSF 5 17 4.8 LSF 4 93 25 LSF 2 56 5.5
B4-3-IP 20 CP 6 LSF 6 16 10.5 LSF 5 27 <3 LSF 3 44 8.5
B4-4-IP 25 LSF 8 LSF 7 11 5 LSF 6 27.5 11.5 LSF 4 46 8
B4-5-IP 10 IP 3 LSF 8 13 4.1 LSF 7 107 12 LSF 5 17 4.8
B5-1-MK 15 CP 1 Pl 1 13 8.1 LSF 8 15.5 NA LSF 6 16 10.5
B5-3-MK 12 CP 5 Pl 3 36 2 Pl 1 31.5 10.5 LSF 7 11 5
B5-4-MK 10 CP 4 Pl 4 17 1.7 Pl 3 25 6 LSF 8 13 4.1
B5-5-MK 10 IL 7 Pl 5 36 1.7 Pl 4 24 4 PL 1 1.9
B5-6-MK 10 LSF 6 Pl 7 25 3 Pl 5 20 8 Pl 1 13 8.1
B5-8-MK 19 LSF 5 Pl 8 19 5.5 Pl 7 39 7 Pl 3 36 2
B5-9-MK 25 IL 5 Pl 9 16 13.2 Pl 8 27.5 7.5 Pl 4 17 1.7
B5-10-MK 12 IL 9 Ps 2 6 1.8 Pl 9 21.5 6 Pl 5 36 1.7
B5-11-MK 25 LSF 10 Ps 3 9 0.9 Ps 2 16.5 NA Pl 7 25 3
B6-1-MK 9 IL 4 Ps 5 11 0.7 Ps 3 13 NA Pl 8 19 5.5
B6-2-MK 22 LSF 9 Ps 6 23 1.1 Ps 5 8 NA Pl 9 16 13.2
B6-3-MK 5 CP 10 Ps 7 74 5.5 Ps 6 8 NA Ps 2 6 1.8
B6-4-MK 25 CP 9 Ps 9 20 1.9 Ps 7 88 8 Ps 3 0.9
B6-7-MK 20 Crk 1 Ps 10 27 5.3 Ps 9 11.5 NA Ps 5 0.7
B6-8-MK 18 IP 7 Pt 2 2 0.4 Ps 10 25 NA Ps 6 23 1.1
B7-1-MK 20 IL 8 Pt 3 2 0.7 Pt 2 10 NA Ps 7 74 5.5
B7-2-MK 20 Crk 2 Pt 4 2 0.4 Pt 3 8.5 NA Ps 9 20 1.9
B7-4-MK 15 Crk 7 Th 2 38 3.2 Pt 4 8.5 NA Ps 10 27 5.3
B7-6-MK 10 IL 2 Th 3 28 1.9 Th 2 38 <3 Pt 2 0.4
B7-7-MK 10 Crk 4 Th 4 9.5 0.7 Th 3 35 <3 Pt 3 0.7
B7-8-MK 12 Crk 6 Th 5 39 3.3 Th 4 10 NA Pt 4 0.4
B7-9-MK 12 LSF 3 Th 6 20 4.6 Th 5 31 <3 Th 2 38 3.2
B7-10-MK 12 LSF 4 Th 7 10 10.8 Th 6 32 <3 Th 3 28 1.9
E1-1-IP 35 IL 1 Th 9 3 1.6 Th 7 18 NA Th 4 9.5 0.7
E1-2-IP 15 Crk 5 Th 10 90 5.9 Th 9 6 NA Th 5 39 3.3
E1-3-IP 25 Th 10 72 <3 Th 6 20 4.6
E3-2-IP 20 Th 7 10 10.8
E3-4-IP 10 Th 9 3 1.6
E4-1-MK 12 Th 10 5.9
E4-2-MK 10
E4-3-MK 15
E4-6-MK 20
T3-1-MK 20

Total = 1287.00 Total = 1354.50 222.00 Total = 1663.00 164.00 Total = 1123.50 227.20
Mean = 22.19 Mean = 27.09 4.44 Mean = 33.26 8.20 Mean = 26.75 4.21
SD = 13.38 SD = 20.03 3.66 SD = 24.48 4.96 SD = 16.91 3.63

APPENDIX D

PROPOSED GUIDELINES FOR ESTIMATING THE


EXTENT OF NDT FOR WELDS
Proposed Guidelines for Estimating the
Extent of NDT for Welds
TABLE OF CONTENTS

1 INTRODUCTION AND BACKGROUND D1


2 PROCEDURE D1
3 DECISION TREES D2
3.1 PROBABILITY OF INCLUSION CURVES D2
3.2 PROBABILITY OF DETECTION CURVES D3
3.3 INDEX OF DETECTION D3
4 REFERENCES D3

Figures 1 – 4

APPENDIX DI EXAMPLES FOR USING THE DECISION TREES (FIGURES


1 AND 2) FOR A PARTICULAR NDT METHOD
1 INTRODUCTION AND BACKGROUND
Just over 5 years ago HSE had reason to investigate the occurrence of cracking in Liquid
Petroleum Gas (LPG) storage vessels. As a result of these investigations and obvious
concerns, there was a requirement to carry out manual ultrasonic NDT to ensure that the
structural integrity of the vessels was not compromised. An important question for the
operators of LPG vessels was how much manual ultrasonic NDT was necessary, considering
the significant outlay costs involved as well as the loss of production. Also, in the absence of
useful and comprehensive ultrasonic inspection data, the problem of estimating how much
NDT was required, and moreover where to inspect, is not straightforward.

In order to assist the LPG industry at the time, HSE funded research work to develop
‘Probability of Inclusion’ (PoI) models, based on well established statistical theoretical
approaches. The PoI models could be used to estimate the level of manual ultrasonic NDT
required to:

(i) include a defective part of the weld


(ii) to detect the defective part.

In that early work, the PoI models were developed specifically for LPG storage vessels in the
context of manual ultrasonic NDT. However, this early work has been updated and in this
appendix, which is a complement document to Appendix C, the application of the PoI models
is considered to be wider than just the ultrasonic NDT of LPG storage vessels. The PoI
models have been adapted for welds in general and the user may need to substitute their own
PoI curves and their own Probability of Detection (PoD) curves appropriate to the particular
NDT method of interest.

The PoI modelling work assumes at the outset that D% of the total length of weld is
defective, (i.e. D > 0) and randomly distributed. The method of establishing the value of D
will be up to the user, but could be based on some sample inspection. The final PoI model
results presented here have been compared with a number of other statistical approaches and
shown to be valid for D ≥ 4%, which is believed to be a reasonable lower bound for industrial
applications (and in the light of the investigations highlighted above).

The conclusions and results of the PoI modelling work in Appendix C form the basis of these
proposed guidelines.

2 PROCEDURE
In developing a procedure for industry to estimate the % level NDT required, a number of
key issues have been considered and these are highlighted below.

• Has any NDT been carried out already?


• Have the inspected areas been targeted or have they been selected at random?
• What are the critical flaw sizes of concern for the particular weld?
• Have any significant flaws been detected?

Decision trees and graphs are provided to help estimate the % level of NDT necessary to
include as well as to detect a defective part of the total weld length, to a required probability.
This is considered for whatever % level (if any) of NDT was carried out previously.

It is believed that flaws are more likely to occur in certain areas of welds than others, as there
is evidence of this in industrial data (e.g. horizontal seam welds in LPG vessels). In
recognising this, if the NDT carried out on selected parts of the weld include such critical or
'targeted' areas, a weighting factor (w) is applied to the level of NDT already carried out. This
increases the equivalent level of NDT and, in turn, increases the probability of including and
detecting a defective part of the weld. The weighting factor (w) is set at 1.5.
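
The effect of the weighting factor is to convert an actual % level of NDT into an 'equivalent' level IE, as used in the decision trees (Figures 1 and 2) and the worked examples in Appendix DI. A minimal sketch of this arithmetic, for NDT that has already been carried out, is given below (plain Python; the example splits are those of Appendix DI, Case 2).

    # Sketch of the equivalent % level of NDT for inspection already carried out,
    # crediting targeted areas by the weighting factor w = 1.5 (c.f. Figure 2):
    # IE = IN + IT * w.
    W = 1.5

    def equivalent_level(i_non_targeted, i_targeted, w=W):
        return i_non_targeted + i_targeted * w

    print(equivalent_level(30, 0))    # all non-targeted: IE = 30%
    print(equivalent_level(0, 30))    # all targeted:     IE = 45%
    print(equivalent_level(10, 20))   # IN = 10%, IT = 20%: IE = 40%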

It is assumed that the operator is able to calculate the critical flaw size (i.e. height) using
fracture mechanics methods.

A general principle is adopted that if at any point in the inspection a significant flaw is
detected, then 100% NDT is carried out followed by procedures for remedial repairs and
checks to remove the flaws.

3 DECISION TREES
The % level of NDT previously carried out (I%), falls into three categories:

(i) NO NDT (I = 0%), in which case some guidance is provided to estimate a sufficient
level of inspection (decision tree, Figure 1).
(ii) SOME NDT (0% < I < 100%), in which case some guidance is provided to assess
whether the level of NDT is sufficient or whether an additional amount is necessary
(decision tree, Figure 2).
(iii) MAXIMUM NDT (I = 100%), in which case if any significant flaws are detected, it
is assumed that procedures apply for remedial repairs and checks to remove the
flaws. No decision tree is provided for this special case.

The decision trees in Figures 1 and 2 are linked to two additional Figures and a brief
description is given of these as well as some simple examples of how they are used.

3.1 PROBABILITY OF INCLUSION CURVES


In Figure 3, the PoI curves illustrated are for the 4%, 6%, 10%, 20% and 40% degrees of
defective weld and are based on the Log-odds model (see Appendix C). The height of each
curve represents the probability of including a defective part of the weld, assuming a certain
level of NDT I% (i.e. % area selected for inspection). As expected, as I% approaches
100% the PoI approaches unity. It may be necessary to recalculate the PoI curves in
Figure 3, depending on the mean and standard deviation of the data. For illustrative purposes,
the mean = variance in Figure 3 (c.f. Poisson distribution).

In practice it is reasonable to assume that 4% of the weld could be defective. The 4% curve in
Figure 3 is recommended for use in estimating the % level of NDT corresponding to a given
PoI, or vice versa (i.e. estimating the PoI corresponding to a given % level of NDT).

For example, to achieve a high probability of including a defective part, consider a PoI value
of 0.9. Figure 3 shows that for a weld that is 4% defective, nearly 50% of the weld would
need to be inspected, assuming the weld area was all non-targeted. If all the weld area was
targeted, then the equivalent amount of weld to be inspected would be reduced to about 33%
(i.e. 50/w), where the weighting factor w = 1.5.

3.2 PROBABILITY OF DETECTION CURVES
A number of 'defect detection trials' have been carried out previously to estimate the
probability of detection (PoD) using conventional manual ultrasonic inspection (e.g. National
NDT Centre (UK), NORDTEST (DNV, Norway) and NIL (Netherlands)). An AEA
Technology report (1) has brought these results together and compared the various PoD
curves to compute a lower bound PoD curve for a range of defect heights (1). This lower
bound curve is reproduced here in Figure 4. If the NDT method was not ultrasonics, then the
appropriate PoD curve for the NDT method of interest could be substituted for Figure 4
(c.f. Figures 8-13 in the main PoD report).

For example, if the critical flaw height for a particular weld was 3mm, then the PoD from
Figure 4 would be just over 0.5. For a 6mm height the PoD would be just under 0.7.

3.3 INDEX OF DETECTION


For the PoD values in Figure 4, it is assumed that the flaw has been included in the selected
part of the weld. If it were not known that the flaw is included then from the multiplication
law of probability for independent events (2), the PoD would need to be multiplied by the
PoI. In fact Figure 4 is used in conjunction with Figure 3 in the decision trees to compute a
parameter defined as the 'Index of Detection' (PID) where

PID = PoI × PoD (1)

which is the probability of including and detecting a flaw (PID) with a certain flaw depth.

For example, using the case highlighted in section 3.1 above with PoI = 0.9, then the
probability of including and detecting a 3mm high flaw would be about (0.9)(0.5) = 0.45, a
45% chance. For a 6mm high flaw this would be about (0.9)(0.7) = 0.63, a 63% chance.
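
The arithmetic of the 'Index of Detection' is trivial but is sketched below for completeness (plain Python; the PoI and PoD values are simply those read off Figures 3 and 4 in the example above).

    # Sketch of the Index of Detection, PID = PoI x PoD (equation (1)).
    def index_of_detection(poi, pod):
        return poi * pod

    print(index_of_detection(0.9, 0.5))   # 3 mm high flaw: about 0.45
    print(index_of_detection(0.9, 0.7))   # 6 mm high flaw: about 0.63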

There are two more detailed examples provided in Appendix DI, of how to use the PoI and
PoD curves in conjunction with the decision trees for the cases when:

i. No NDT was previously carried out


ii. Some NDT was previously carried out

If the final PID value is considered high enough then the NDT carried is regarded as
sufficient. If it is not then NDT is necessary. In practice, an acceptable PID value shall be set
by a competent person.

4 REFERENCES
1. AEA Technology Report, AEAT-4389 HOIS (98) P8 Issue 2 (DRAFT). Data for PoD
curve supplied with kind permission by Dr. Martin Wall, AEA Technology.
2. Crawshaw J and Chambers J: 'A Concise Course In A-Level Statistics With Worked
Examples'. Stanley Thornes (Publishers) Ltd, 2nd Edition, 1994, ISBN 0-7487-0455-8.

APPENDIX DI

EXAMPLES FOR USING THE DECISION TREES (FIGURES 1 AND 2)


FOR A PARTICULAR NDT METHOD
In both Case 1 and Case 2 below the following is assumed:

• The D = 4% curve is used


• The critical flaw height is taken to be 6mm.
• The agreed acceptable value for the 'Index of Detection' is PID = 0.6
• The weighting factor w = 1.5
• If at any time during the inspection a significant flaw is detected, then 100% NDT will
apply followed by procedures for remedial repairs and checks to remove any significant
flaws

In following through both cases below it is important to have the relevant decision trees (i.e. Figures
1 and 2), the PoI curves and PoD curve (i.e. Figures 3 and 4).

CASE 1: Using the Decision Tree for 'No previous NDT' (Figure 1)

• The PoD value corresponding to 6mm (i.e. Figure 4 or equivalent) is 0.69.


• Select, for example, I = 30%, which gives PoI = 0.82.
• The corresponding 'Index of Detection' PID = (0.82)(0.69) = 0.57
• PID is unacceptable, since 0.57 < 0.6
• Increase I to 40%, which gives PoI = 0.88
• The corresponding 'Index of Detection' PID = (0.88)(0.69) = 0.61
• PID is acceptable since 0.61 > 0.6
• Take I = 40% and consider the following possibilities before carrying out the inspection

• If I = 40% is all non-targeted, IN = 40%

• If I = 40% is all targeted, IE ≈ 27% (IE = 40/1.5)
• If, for example, I = 40% is split IN = 10% and IT = 30%, IE = 30% (IE = 10 + (30/1.5))
• If, for example, I = 40% is split IN = 30% and IT = 10%, IE ≈ 37% (IE = 30 + (10/1.5))

• Carry out NDT according to IN or IE above and assess for any significant flaws.

CASE 2: Using the Decision Tree for 'Previous manual ultrasonic NDT=I%' (Figure 2)

Note that the PoD value corresponding to 6mm is 0.69 in all the different examples considered
below.

• Assume I = 30% and assess the split of non-targeted and targeted inspection (IN and IT)

• If I = 30% was all non-targeted then IN = 30% and PoI = 0.82


• The corresponding 'Index of Detection' PID = (0.82)(0.69) = 0.57
• PID is unacceptable, since 0.57 < 0.6
• Assess the additional amount of manual ultrasonic NDT to ensure PID is acceptable. For
example, if an additional 10% was considered and was all non-targeted, then IN =40% and
PoI = 0.88. The new PID = (0.88)(0.69) = 0.61, which is acceptable. If on the other hand the
additional 10% was all targeted then it is equivalent to 10 x 1.5 = 15%, that is, a total
IE = 45% and PoI = 0.9. The new PID = (0.9)(0.69) = 0.62, which is acceptable. If this
additional 10% was split as IN + IT, then 0.61 ≤ PID ≤ 0.62 and an even split of IN = 5% and IT
= 5% would suffice.
• Carry out the additional NDT and assess for any significant flaws.

• If I = 30% was all targeted, then IE = (30)(1.5) = 45% and PoI = 0.9
• The corresponding 'Index of Detection' PID = (0.9)(0.69) = 0.62
• PID is acceptable, since 0.62 > 0.6
• Stop, since no significant flaws have been detected along this branch of the tree.

• If, for example, I = 30% was split IN = 10% and IT = 20%, then IE = 10+(20)(1.5) = 40 %
and PoI = 0.88
• The corresponding 'Index of Detection' PID = (0.88)(0.69) = 0.61
• PID is acceptable, since 0.61 > 0.6
• Stop, since no significant flaws have been detected along this branch of the tree.

• If, for example, I = 30% was split IN = 20% and IT = 10%, then IE = 20+(10)(1.5) = 35%
and PoI = 0.85.
• The corresponding 'Index of Detection' PID = (0.85)(0.69) = 0.59
• PID is unacceptable, since 0.59 < 0.6
• Assess the additional amount of NDT to ensure PID is acceptable in the manner demonstrated
in the first example above (e.g. in this case an additional IN = 5%, or IT = 4% would suffice).
• Carry out the additional NDT and assess for any significant flaws.

[Decision tree flowchart. Key: I = actual % level of NDT; IE = equivalent % level of NDT; IN = non-targeted % level of NDT; IT = targeted % level of NDT; w = weighting factor (set at 1.5); PoI = Probability of Inclusion; PoD = Probability of Detection; PID = Index of Detection. The tree selects a level of NDT I%, obtains the PoD for the critical flaw size from Figure 4 and the corresponding PoI from Figure 3, computes PID = PoI x PoD, and increases I% until PID is acceptable. The NDT is then carried out as IN = I% (all non-targeted), IE = (I / w)% (all targeted), or IE = (IN + (IT / w))% (split), with 100% NDT followed by procedures for remedial repairs and checks if any significant flaws are detected.]

Figure 1 No previous NDT: a decision tree to estimate a sufficient % level of inspection
[Graph: Probability of Inclusion vs % level of inspection (I%) using the Log-odds model with variance = mean, for 4%, 6%, 10%, 20% and 40% defective weld.]

Figure 3 The Probability of including a defective part of the weld given a certain % level of inspection, using the Log-odds model
[Graph: lower bound Probability of Detection (PoD) vs flaw depth (mm), 0–10 mm, based on manual ultrasonic NDT.]

Figure 4 The lower bound probability of detection vs. flaw depth (reproduced with kind permission from reference (1))
[Decision tree flowchart. Key as for Figure 1. If any significant flaws have already been detected, 100% NDT is carried out followed by procedures for remedial repairs and checks. Otherwise, the PoI corresponding to the inspection already carried out is obtained from Figure 3, using IN = I% (all non-targeted), IE = (I x w)% (all targeted) or IE = (IN + (IT x w))% (split); the PoD for the critical flaw size is obtained from Figure 4 and PID = PoI x PoD is computed. If PID is acceptable, stop; if not, the additional NDT required is assessed using Figure 3 and the above formulae, the additional inspection is carried out, and 100% NDT with remedial repairs follows if any significant flaws are detected.]

Figure 2 Previous NDT = I%: a decision tree to assess whether I% is sufficient
Published by the Health and Safety Executive
06/06
RR 454
