

INTERNATIONAL COMMITTEES
Accred Qual Assur (2000) 5 : 346–348

Stephen L.R. Ellison

Uncertainties in qualitative testing and analysis

This paper sets out some of the main issues arising for analysts in testing laboratories and accreditation bodies interested in the assessment of uncertainty in qualitative testing. While it does not provide detailed statistical methods for the characterisation of uncertainties in qualitative testing, it does provide general guidance on the main issues.

The following document has been developed by the EURACHEM Measurement Uncertainty Working Group¹. It is presented with a view to developing policy and promoting work on the topic. Comments on the content and the issues raised are invited, and should be addressed to the working group secretary (see the author address at the end of this paper).

¹ Members of the working group at the time of publication are as follows: A Williams (Chairman), S Ellison (Secretary), M Berglund, W Haesselbarth, K Hedegaard, R Kaarls, M Mansson, M Rosslein, R Stephany, A van der Veen, W Wegscheider, H van de Wiel, R Wood. The group includes representatives from other bodies as follows: CITAC: Pan Xiu Rong, M Salit, A Squirrell, K Yasuda; AOAC International: R Johnson, Jung-Keun Lee, D Mowrey; IAEA: P De Regge, A Fajgelj; EA: D Galsworthy.

1 Introduction

Uncertainties associated with quantitative measurement results have been the subject of considerable activity since the publication of the ISO Guide on the topic [1]. By comparison, the issue of uncertainties in qualitative testing and analysis (referred to elsewhere as “identification certainty” [2]) has received less attention. With the publication of ISO 17025 : 1999, however, interest in uncertainties in testing operations has increased. The problems of establishing uncertainty associated with qualitative tests, such as ‘pass/fail’, identity and comparative identity tests, have accordingly become more important.

2 Importance of uncertainty in qualitative testing

Broadly, qualitative testing provides a simple statement or categorisation of a test item or material. Decisions are invariably taken as a result; for example, whether or not to issue a batch of fertiliser, whether water is fit to drink, whether a person is in possession of a controlled substance or not, or whether a newly synthesised material has the desired structure. Clearly, incorrect classifications – such as ‘passing’ a product when in fact it is unfit for use – carry risks to all parties. To control those risks, professionals involved in testing take pains to ensure that their methods lead to acceptably low risks of incorrect classification.

It follows that, at some point in the development of any such test method, an evaluation must be made as to the risk of incorrect classification. For most such methods, therefore, it is reasonable to expect a laboratory to establish, or have access to, information on the risks of incorrect results.

An important exception is the use of standard test methods, established by groups outside the particular laboratory as fit for the purpose in question. The laboratory may well have limited, or even no, access to the risk information leading to that decision. However, such methods invariably specify a test procedure in some detail, and the laboratory will generally be expected to show that those factors which are within its control do indeed meet the requirements of the test method. That, in turn, may involve demonstrating that the uncertainty of reference values, calibration operations or intermediate measurements leading to a decision is sufficiently small.

3 Forms of uncertainty information in qualitative testing

Qualitative testing generally relates to categorical statements, such as ‘present/absent’, ‘pass/fail’, chemical species, or perhaps membership of a class of compounds. Such classification statements are not usually associated with a range of expression; one does not, in general reporting, speak of an artefact or material being a 90% pass, or 99% present². As a result, uncertainty information in qualitative testing is typically probabilistic in nature. That is, one gives an indication of the probability of a given classification being correct.

The most familiar and widely used form of such information is, at present, the use of false response rates, particularly “false positive rates” and “false negative rates”.

Probably the most important alternative to simple statements of false response rates is the use of values derived from Bayes’ theorem (a summary of Bayes’ theorem is given in reference [2]). Examples include the likelihood ratio (an indication of the additional information provided by a test result) and the posterior probability (an indication of the probability of an object fitting a given category, given a test result). Bayesian estimates are particularly widely used in evaluating forensic evidence, for example DNA matching or blood group matching. Further details can be found elsewhere [2 and references cited therein]. Bayesian estimates can be calculated by appropriate combination of false positive and false negative rates; the posterior probability additionally requires a prior probability for the category in question.

² Partial class membership is used extensively in “fuzzy logic” systems, but the relevant terminology and treatment are very rare in ordinary testing activities.

4 Nomenclature relating to qualitative testing uncertainties

The nomenclature for qualitative testing is not fully developed. An example will illustrate a current problem. The term ‘false negative rate’ can, in principle, have two quite different interpretations:
i) The chance, or frequency, of negative responses given that the response should be positive. Broadly, this is the fraction of ‘true positive’ test items that return negative responses.

ii) The frequency of incorrect negative responses in a series of tests, that is, the fraction of the testing population which returns false negatives.

The difference appears subtle, but is important. In case i), the fraction is not expected to change with the number of ‘true positives’ in the population. This fraction could be established by appropriate method performance studies with known ‘true positive’ samples. But in the second case, a very small fraction of ‘true positives’ in the population leads to a very low fraction of ‘false negatives’ irrespective of the performance of the method. Current nomenclature in analytical chemistry does not distinguish these terms. It follows that there is a strong risk of confusion in using even familiar terminology at present.

5 The reliability of probabilistic information used to characterise uncertainties in qualitative testing

Because false response rates are, in general, low for effective methods, it often takes an extremely large number of experiments to obtain even indicative values. Further, observed false response rates are influenced very considerably by the characteristics of the test population. For example, false response rates are much higher when the typical level of a material falls close to the response threshold of a simple spot test. Thus, it is unrealistic to expect great reliability in false response rates obtained within a laboratory; it is often difficult to obtain false response rate figures accurate to within an order of magnitude. Quantitative expression and reporting of qualitative testing uncertainties are accordingly likely to give indicative, but not very accurate, information.

6 Reporting uncertainties in qualitative testing

Three main factors bear on the reporting of uncertainty information in qualitative testing. First, probabilistic statements are frequently misinterpreted by non-statisticians. Second, reliable figures are difficult to obtain by observation. Third, some indicators in common use are subject to misinterpretation even by professionals. For these reasons, quantitative reporting of qualitative uncertainty information is very much the exception rather than the rule. To underline the point: in one recent UK court case, the judge ruled that the quantitative probability evidence presented by a leading forensic statistician, and the accompanying information on its interpretation in relation to other evidence, had so confused the jury that the case was dismissed.

Where an indication of the test result’s reliability is required, it may be most useful to adopt a semiquantitative reporting system. For example, forensic scientists in the UK have recommended a ‘weight of evidence’ scale along the lines of “indication”/“strong indication”/“very strong indication”, with each expression related to (overlapping) ranges of a Bayesian likelihood ratio.

7 Methods of evaluating uncertainties in qualitative testing

Broadly, there are two general methods of evaluating false response probabilities. The first relies on observation of false responses in a series of controlled tests. The second relies on prediction from known population characteristics, including statistics of quantitative measurements and known distributions of test sample characteristics in a population. The latter might include, for example, the observed peak incidence rate at different positions in an IR spectrum.

7.1 False response rates from experiment

False response rates are hard to observe in a realistic number of experiments unless the rate is high (near 50%). The most practical experiments thus concentrate on regions where false responses are likely. Typical approaches include:
i) False positive rates in the presence of high levels of known cross-reacting interferents. In these experiments, the choice of interferent is critical; the experimenter must, at a minimum, observe false response rates in the presence of the worst-case interferent at levels significantly above those found for the interferent in the normal test population.
ii) False negative rates at very low levels of analyte.

A related experiment involves chance mismatch studies in reference databases. In some cases, this allows the equivalent of many thousands of experiments. However, though such studies are informative and powerful, a current limitation is that the databases are often quite unrepresentative of the testing population; for example, while the prevalence of different materials in general use varies widely, a typical reference database will contain only one of each. This may lead to significantly biased probability estimates; again, the values obtained are unlikely to be better than order-of-magnitude estimates.

7.2 Predicted false response rates

Examples include:
i) Prediction of chance spectroscopic peak matching from uncertainties in peak position, for example using binomial or hypergeometric statistics.
Note: In estimating the probabilities of multiple events (e.g. a six-peak match in a spectrum), predicted probabilities are often extremely sensitive to the choice of the probabilities assigned to individual events; predictions are therefore unlikely to be very accurate.
ii) Prediction of chance threshold exceedance from the known or estimated dispersion of measurement results, or from the measurement uncertainty of the results.
Note: If normal distributions are assumed, probabilities fall off very sharply with increasing distance between threshold levels and typical levels of response. However, very considerable caution is advisable in extrapolating much beyond 95% confidence bounds. Due to such factors as human error, it is generally observed in routine measurements that the probabilities of very extreme results, though still quite low, are nonetheless many orders of magnitude higher than would be expected on the basis of the normal distribution.

8 The relevance of measurement uncertainty

Measurement uncertainty as described in the GUM [1] impacts qualitative measurements in two ways.
i) Control of uncertainties in test parameters, such as times, temperatures, lengths etc., is vital for reliable qualitative testing. Typically, a laboratory is expected to control factors affecting the test result to within well-established tolerances, or to show that the uncertainty is sufficiently small to have no significant influence on the outcome of the test.
Note: Good data on the sensitivity of the test result to variation in input factors are subject to the same practical limitations as the determination of false response rates, which are hard to measure. Typical experiments would accordingly aim to demonstrate that a substantial change – say, larger than 3–5 times the uncertainty – in a particular parameter had limited effect. It is unrealistic to expect quantitative sensitivity analysis in routine qualitative testing.
ii) Measurement uncertainty related to intermediate measurements may inform predictions of false response rates (see Section 7.2).

In either case, the laboratory will typically be expected to provide uncertainty estimates based on established principles (i.e. the GUM [1]). Of course, where equipment is calibrated by a third party, the relevant uncertainty values will usually be provided to the laboratory on calibration certificates etc.

9 The relevance of traceability

Traceability of measurement results, reference values and calibration values is essential in qualitative testing. It is particularly critical where the qualitative test relies on comparison with reference values (such as in comparing wavelength data in an IR spectrum with a reference database, or comparing melting point data with literature values). This follows from the observation that reference data are often collated at times, and by organisations, remote from the testing laboratory. Realistic comparison is only feasible if both the reference data supplier and the test laboratory are using measurements traceable to common references with acceptably small uncertainties.

A further important property of reference data, where used, is that their origin should be well established and the conditions under which they were obtained well documented.

10 The current state of the art

– Most competent laboratories currently evaluate one (the most critical for the application) of the false response rates, typically by experiments during method validation. Best practice involves stressing the limits of the method, for example by locating ‘worst case’ scenarios well outside normal usage or by progressive departure from normal operating conditions until false responses become significant.
– Few explore the less critical response rate, either because it is unimportant to the customer or because it is impractical.
– A few sectors have started to use Bayesian probability estimates in assessing the performance of qualitative tests; the forensic sector is probably the most advanced. Even here, direct reporting of probability information is rare because of uncertainties in the various terms needed for the estimate.
– Though there are publications on qualitative test failure probabilities and risks in the specialist literature, few laboratories can be expected to have access to the wide range of journals involved. Further, such papers tend to be written for specialists, and are accordingly not easy to implement in a routine testing laboratory. There is thus little detailed and accessible guidance available to the general laboratory population.
– There is often sectoral or more general guidance on good practice in qualitative testing, and while this may not address uncertainty directly, it typically addresses other issues associated with quality control and assurance for the type of tests involved.

11 Future developments

There is a need to standardise the nomenclature relating to false response rates. There is an additional need to provide accessible and consistent guidance on the study of qualitative test performance. EURACHEM is pursuing both these ends through the Measurement Uncertainty Working Group, and hopes to obtain wider input from other international organisations.

12 Implications

1. It is realistic to expect that testing laboratories have qualitative test method parameters (conditions of testing) under adequate control. Evidence of that will typically involve
– clear evidence of traceability for the values of important control parameters prescribed by the method
– evidence that uncertainties in these parameters are sufficiently small for the purpose.
2. It is important for laboratories to check at least the most critical false response rate for a qualitative test.
3. It is reasonable to expect laboratories to be following published codes of best practice in qualitative testing where they are available.
4. Quantitative (i.e. numerical) reports of uncertainties in qualitative test results should not generally be expected.

Acknowledgements Preparation of this paper was supported under contract to the UK Department of Trade and Industry as part of the National Measurement System Valid Analytical Measurement (VAM) Programme [3].

References
1. Guide to the Expression of Uncertainty in Measurement, ISO, Geneva, Switzerland (1993) ISBN 92-67-10188-9
2. Ellison SLR, Gregory S, Hardcastle WA (1998) Analyst 123 : 1155–1161
3. Fleming J (1995) Anal Proc 32 : 31–32

S.L.R. Ellison (✉)
LGC, Queens Road, Teddington, TW11 0LY, England
e-mail: slre@lgc.co.uk
Tel.: +44-181-943-7000
Fax: +44-181-943-2767
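
Section 3 states that Bayesian estimates such as the likelihood ratio and the posterior probability can be calculated by combining false positive and false negative rates. The following is a minimal Python sketch of that combination, offered as an illustration rather than as a method from this paper. It assumes a binary test, rates expressed as conditional probabilities (interpretation i) in Section 4), and a known prior probability that the item is truly positive. The function names and the rates used in the example are hypothetical.

```python
def likelihood_ratio_positive(fpr: float, fnr: float) -> float:
    """Likelihood ratio for a positive result (assumes fpr > 0).

    fpr -- false positive rate, P(positive result | truly negative item)
    fnr -- false negative rate, P(negative result | truly positive item)
    """
    # P(positive | truly positive) = 1 - fnr, so LR+ = (1 - fnr) / fpr.
    return (1.0 - fnr) / fpr


def posterior_positive(prior: float, fpr: float, fnr: float) -> float:
    """P(truly positive | positive result); assumes 0 < prior < 1."""
    prior_odds = prior / (1.0 - prior)
    # Bayes' theorem in odds form: posterior odds = prior odds x likelihood ratio.
    posterior_odds = prior_odds * likelihood_ratio_positive(fpr, fnr)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    fpr, fnr = 0.01, 0.05  # hypothetical rates, for illustration only
    print(f"LR+ = {likelihood_ratio_positive(fpr, fnr):.1f}")
    for prior in (0.5, 0.01):
        print(f"prior {prior}: posterior {posterior_positive(prior, fpr, fnr):.3f}")
```

With these hypothetical rates the likelihood ratio is 95. A positive result then gives a posterior of about 0.99 when the prior is 0.5, but only about 0.49 when just 1% of items are truly positive. This illustrates why a posterior probability cannot be quoted without a prior.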

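The two meanings of ‘false negative rate’ distinguished in Section 4 can be made concrete with a few lines of arithmetic. The sketch below is illustrative only. It assumes a hypothetical method whose conditional false negative rate (interpretation i) is fixed at 5%. It then shows that the fraction of all tests that are false negatives (interpretation ii) scales with the proportion of true positives in the test population. The function name is hypothetical.

```python
def population_false_negative_fraction(conditional_fnr: float, prevalence: float) -> float:
    """Interpretation ii): fraction of all tests in a population that are false negatives.

    conditional_fnr -- interpretation i): P(negative result | truly positive item)
    prevalence      -- fraction of test items that are truly positive
    """
    return conditional_fnr * prevalence


if __name__ == "__main__":
    fnr = 0.05  # hypothetical method performance, the same in every population
    for prevalence in (0.5, 0.1, 0.001):
        frac = population_false_negative_fraction(fnr, prevalence)
        print(f"prevalence {prevalence}: conditional FNR {fnr}, population fraction {frac:.5f}")
```

The conditional rate stays at 0.05 throughout. The population fraction, by contrast, falls from 0.025 to 0.00005 as true positives become rare, even though the method itself has not changed.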
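The note to item i) of Section 7.2 warns that predicted probabilities for multiple events are very sensitive to the probabilities assigned to individual events. A minimal sketch of that effect follows. It rests on a simplifying assumption of our own: each of n reference peaks is matched by chance independently, with the same probability p. Under that assumption the number of chance matches follows a binomial distribution. The per-peak probabilities and the function name are hypothetical.

```python
from math import comb


def prob_at_least_k_matches(n: int, k: int, p: float) -> float:
    """P(at least k of n peaks match by chance), assuming independent matches with probability p."""
    # Sum the binomial probabilities for k, k+1, ..., n chance matches.
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    # Six-peak match (all six of six peaks) for a few assumed per-peak chance probabilities.
    for p in (0.05, 0.1, 0.2):
        print(f"p = {p}: P(six-peak chance match) = {prob_at_least_k_matches(6, 6, p):.2e}")
```

Raising the assumed per-peak probability from 0.05 to 0.2, a factor of four, changes the predicted six-peak chance match probability by a factor of 4^6 = 4096 (from about 1.6e-8 to 6.4e-5). This is why such predictions are unlikely to be very accurate.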