
Accred Qual Assur (2002) 7:488–493

DOI 10.1007/s00769-002-0523-6 PRACTITIONER’S REPORT


© Springer-Verlag 2002

Per Hyltoft Petersen

Importance of the choice of assumptions and models in the estimation of analytical quality specifications

Received: 1 June 2002
Accepted: 17 July 2002

Presented at the European Conference on Quality in the Spotlight in Medical Laboratories, 7–9 October 2001, Antwerp, Belgium

P.H. Petersen
Department of Clinical Biochemistry, Odense University Hospital, 5000 Odense-C, Denmark
e-mail: per.hyltoft.Petersen@ouh.fyns-amt.dk

NOKLUS, Norwegian centre for external quality assurance of primary care laboratories, Division of General Practice, University of Bergen, Norway

Abstract The validity of any model depends on its ability to represent the situation or problem to which it is applied. Further, the assumptions made in relation to the model determine the actual outcome. Within the field of clinical biochemistry a large number of models for analytical quality specifications, based on a variety of concepts and ‘clinical settings’, have been proposed. A hierarchical structure for the application of these approaches and models was agreed on several occasions in 1999. In this hierarchy, the highest rank is given to evaluation of analytical quality specifications based on ‘clinical settings’/‘clinical outcome’ models, followed by specifications based on biological variation and on ‘clinicians’ opinions’. This contribution deals with the problems of combining random and systematic errors and the implications of applying different models to a variety of clinical settings.

Keywords Analytical bias · Bi-modal distributions · Ordinal scale · Point-of-care-testing (POCT) · Random error · Ratio scale · Systematic errors · Uni-modal distributions

Introduction

Since Tonks presented his analytical quality specifications in 1963 [1] a great number of proposals for such goals has been presented. The principles and applications of all these suggestions were mainly linked to certain purposes relevant for the proposer’s position, e.g. statisticians and some clinical biochemists produced analytical quality specifications based on biology, other clinical biochemists closer related to the use of data based their specifications on clinical strategies/clinical settings/clinical outcome, whereas external quality assessment organizers based their specifications on “the state of the art” (which in this context means something like the current quality as estimated from surveys).

There was no real co-ordination of all these approaches and proposals for analytical quality specifications until 1999, where four different events resulted in an agreement between scientists within this area on a hierarchy of models and approaches to set analytical quality specifications. The basic idea was presented in an editorial in Clinical Chemistry [2] and accepted by ISO (ISO/TC 212 WG 3: 15196) and again at a conference in Stockholm (Strategies to Set Global Analytical Quality Specifications in Laboratory Medicine) from which the proceedings were published in Scand J Clin Lab Invest [3] the same year.

The consensus document from the Stockholm conference [4] is close to the proposal from the editorial in Clin Chem [2], so the approaches and models for clinical use of biochemical and haematological data were placed on the top of the hierarchy. The next level was “evaluation of the effect of analytical performance on clinical decisions in general: data based on components of biological variations and data based on analysis of clinicians opinion”, followed by other approaches. In this paper I will only look at the models related to the clinical settings and start with a more general discussion of the models for combining random and systematic errors in relation to biochemical measurements.

Models for combining random and systematic errors

A fundamental problem in the estimation of analytical quality specifications is the combination of random and systematic errors.

The nature of random errors is different from that of systematic errors: they are described by standard deviations (or coefficients of variation) and by differences (from defined target values), respectively. It would be logical to keep these two incommensurable concepts separate, but there is a considerable desire and pressure for the creation of models that combine them. Some of these models have been compared in a recent publication [5], where the three main concepts were linear models, squared models and combinations of the two.

1. The linear model has the form ‘total error’ $TE = \text{bias} + z \times s$, where bias is the systematic error defined as a constant deviation from a target value (the truth), $z$ is the standard deviate for a certain probability and $s$ is the random error component, expressed as a standard deviation. The equation describes a straight line in a TE-s plot with intercept = bias and slope = $z$. The line will vary with the bias and the chosen z-value. This model is mostly used in relation to internal control.
2. There are two main models for combination of random and systematic error based on variances:
   a) The classical variance model $\sqrt{s^2 + \text{bias}^2}$, which turns out to be a circle in a simple plot. This model has mainly been used in relation to description of within- and between-variation, whether for biological or analytical variation.
   b) The GUM-model (Guide to the expression of Uncertainty in Measurement), which transforms all linear deviations into variances, as for example a rectangular distribution with the length $2 \times a$, which is transformed into a standard deviation of $a/3^{1/2}$. This model is used in accreditation.
3. The combined models are created in relation to the estimation of analytical quality specifications and usually include biological variation. One of the simplest models, based on the strategy of establishing common reference intervals, is $1.96 \times (s_{\text{Biological}}^2 + s_{\text{Analytical}}^2)^{1/2} + \text{bias}$, where $s_{\text{Biological}}$ and $s_{\text{Analytical}}$ are the biological and analytical standard deviations, respectively. More models will be described in the following.

These examples demonstrate that random and systematic errors can be combined according to different models, and that the result will be different, not only according to the model but also according to the assumptions applied, e.g. dependent on the probability chosen. Consequently, the models should only be used for the purpose for which they were devised, and when models are used they should be validated for the purpose. In this context it should be remembered what Box and Luceño write [6]: “All models are wrong – but some models are useful”. In the following, models for estimation of analytical quality specifications for the highest level in the hierarchy, ‘clinical strategies’/‘clinical settings’/‘clinical outcome’, will be discussed.

The scale of measurement

Most measurements in clinical biochemistry are performed on a ratio scale, where proportions are constant and there is a true zero point. Using this scale, calculations of means, standard deviations and coefficients of variation are allowed. The difference scale is much like the ratio scale, but it has no true zero (as the temperature measured in degrees C). Means and standard deviations can be calculated, but not coefficients of variation. The ordinal scale has ranks, but there is no guarantee of any constancy of the steps. An example is joy, which is difficult to quantify but obviously has several ranks. In clinical biochemistry the ordinal scale is mostly used as a substitute for a ratio scale, where an exact number is not important, or where speed or costs make a simple solution acceptable (i.e. there is an underlying ratio scale). The nominal scale is independent of magnitude and refers to clear qualities without rank, as for example colours (red and green) or classification of cells, where it is possible to count the numbers in the different groups. Most analytical quality specifications are derived for the ratio scale, but examples from the ordinal scale will be discussed.

Analytical quality specifications

The creation of analytical quality specifications in relation to clinical settings has been concentrated on two clinical decision situations: classification and monitoring of patients.

Classification

Classification is usually based on measurements of the concentration of a component in a single sample from the patient, but the interpretation of the result may be performed according to a uni-modal or a bi-modal model as illustrated in Fig. 1.

Uni-modal distributions

The uni-modal distribution presents the classification according to risk as discussed by Klee et al. for plasma-cholesterol [7]. The decision limit is set according to
some clinical decision (often a recommendation) which divides the population into people at risk (high risk) or not (low risk). It is important to realize that the decision limit determines the prevalence, as clearly demonstrated for plasma-glucose, where the new ADA- and WHO-criteria increase the number (and fraction) of diabetics by changing the decision point [8]. The implications of analytical bias (systematic error) on the classification of individuals from the population according to plasma-cholesterol, and of combinations of random error and bias for plasma-glucose, result in considerable misclassifications of individuals for relatively small errors. The analytical quality specifications are defined for the combination of random and systematic error for which the outcome is still acceptable according to the fraction of misclassifications. In general the analytical quality specifications derived from uni-modal distributions are demanding.

Fig. 1 Principle models in classification. Upper: uni-modal distribution. An increasing risk according to measured concentration is assumed and the chosen decision limit separates the otherwise homogeneous distribution into two groups with high and low risk, respectively. There is no underlying dichotomy, so the apparent prevalence is not real, but determined by the decision point. Lower: bi-modal distribution with a well defined (but often unknown) prevalence for the disease under consideration. Characteristic estimates are fractions of true positives, TP, true negatives, TN, false positives, FP, and false negatives, FN

Bi-modal distributions

The bi-modal concept (Fig. 1) is based on the assumption of a prevalence which is independent of the measurement. The test is performed with the purpose of classifying the patient correctly in the diseased or non-diseased group. A change of cutoff point will not result in another prevalence, but in a change in the fractions of true positive, TP, true negative, TN, false positive, FP, and false negative, FN, and thus in the outcome of the result.

As for the uni-modal distributions, the evaluation of analytical quality specifications is performed by simulation of combinations of increasing random and systematic errors and calculation of the fractions FP and FN, until a level where these values are no longer acceptable. When the fractions of acceptable FP and FN are decided from a clinical point of view, then the analytical quality specifications can be read off from curves where the simulated errors equal these acceptable values.

Fig. 2 Combined FP and FN from a bi-modal distribution. The ‘outcome’ as function of analytical bias is calculated for two prevalences and two sets of weighting factors and for three different assumed values of analytical imprecision. Upper: prevalence=0.2, weight of FP=3 and FN=1. Lower: prevalence=0.8, weight of FP=1 and FN=3

It is also possible to combine FP and FN (and even TP and TN as well), resulting in a sum of misclassifications [9] as shown in Fig. 2. Here are two examples illustrating the effect of prevalence as well as the influence of weighting factors for FP and FN on an outcome (in arbitrary units) as function of analytical bias for three different values of imprecision. In the case of a prevalence of 0.2 and weights of FP=3 and FN=1, the effect of negative bias is small and the influence of imprecision negligible, whereas positive bias increases the complex sum of misclassifications considerably. Here the outcome is also heavily dependent on the imprecision. This is in contrast to a prevalence of 0.8 and weights of FP=1 and FN=3, where a negative bias has considerable effect on the outcome, but not a positive bias, and further, the influence of imprecision is nearly constant and of less importance.
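
The kind of calculation that lies behind a weighted sum of FP and FN can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the model used in [9]: it assumes Gaussian distributions for the non-diseased and diseased groups, a fixed cutoff, and an analytical imprecision that is independent of the biological spread, so that the variances add. The function name `weighted_misclassification`, the parameter names and all numerical values (means, standard deviations, cutoff) are hypothetical and chosen only for illustration.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def weighted_misclassification(bias, sd_analytical, prevalence, w_fp, w_fn,
                               mu_healthy=0.0, mu_diseased=3.0,
                               sd_biological=1.0, cutoff=1.5):
    """Return (weighted sum, FP fraction, FN fraction) for a bi-modal model.

    All default values are hypothetical illustration values.
    A measured value is modelled as true value + bias + analytical noise.
    """
    # Assumption: biological and analytical variation are independent,
    # so their standard deviations combine in quadrature.
    sd_total = math.hypot(sd_biological, sd_analytical)
    # Healthy individuals measured above the cutoff are false positives.
    fp = (1.0 - prevalence) * (1.0 - phi((cutoff - bias - mu_healthy) / sd_total))
    # Diseased individuals measured below the cutoff are false negatives.
    fn = prevalence * phi((cutoff - bias - mu_diseased) / sd_total)
    return w_fp * fp + w_fn * fn, fp, fn


# Outcome as a function of bias for three assumed values of imprecision.
for sd_a in (0.0, 0.5, 1.0):
    for bias in (-0.5, 0.0, 0.5):
        outcome, fp, fn = weighted_misclassification(
            bias, sd_a, prevalence=0.2, w_fp=3, w_fn=1)
        print(f"sd_a={sd_a:.1f} bias={bias:+.1f} "
              f"FP={fp:.3f} FN={fn:.3f} outcome={outcome:.3f}")
```

Under these assumed values a positive bias moves healthy results above the cutoff (more FP), while a negative bias moves diseased results below it (more FN). How much each direction matters for the weighted sum depends on the prevalence and the weights, which is the dependence the two examples in Fig. 2 illustrate.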

Ordinal scale

A special form of classification is related to the ordinal scale where there is an underlying ratio scale for bi-modal distributions. This is mostly the case for urine analyses and pregnancy tests, where the simple dichotomy classifies the patients according to a single measurement with the possible results 0 and 1 (minus and plus). The reason for choosing the ordinal scale measurement is often economy or the need for a quick result, by use of a dip-stick or a simple device to be used by non-professional persons.

The main problem with this type of measurement is that within a wide concentration range, the outcome ‘0’ or ‘1’ is a statistical event, i.e. there is a certain probability (for example 10%) of getting the result ‘1’. Consequently, it is not wrong to get the result ‘0’, nor is it wrong to get the result ‘1’. In order to estimate such percentages a great number of independent measurements is needed, which can only be obtained under certain conditions as, for example, external surveys where the same control samples of concentrations within this range are sent to a great number of laboratories. The combined results will then disclose increasing percentages of ‘1’ for increasing concentrations which, when plotted on a probit scale with logarithmic abscissa, will fit to a straight line as shown for some components in urine and for samples of streptococcus [10]. Consequently, control samples with concentrations within this interval cannot be used to validate the single laboratory, but to characterise and validate the different kits/methods when split into peer-groups. To control the single laboratories, control samples with concentrations just outside the interval (100% ‘0’ and 100% ‘1’) should be applied.

The analytical quality specifications for measurements on an ordinal scale have only been set arbitrarily until now [10], but in the following, a new approach to this is proposed based on biology.

Analytical quality specifications for measurements on ordinal scale based on biology

Assume a log-Gaussian distribution of reference values for a component which could have been measured on the ratio scale, e.g. urine-Bilirubin or urine glucose, and a dip-stick which gives the result ‘0’ or ‘1’, characterized by the percentages of measured ‘1’ for known concentrations, which fit to a straight line in a probit plot with logarithmic abscissa. A principle example is shown in Fig. 3, where the distribution of healthy individuals is shown as Gaussian according to the logarithmic abscissa and the probability of measuring ‘1’ is illustrated by the straight line in the probit plot. For each small concentration interval the frequency distribution of reference values from the healthy population denotes a certain fraction of the total population. For this interval there is a certain probability of measuring ‘1’, so by multiplying these two values the fraction of healthy individuals to get the result ‘1’ is calculated. When this process is performed for all concentration intervals a curve describing the distribution of measured ‘1’ is formed. By integrating this distribution, the fraction of healthy reference individuals to be measured as ‘1’ (tested positive) is determined. The desired percentage may be 2.5%, as for traditional estimation of reference intervals. Thus the analytical quality specifications for the tests are to obtain an analytical quality which will result in 2.5% of the healthy population being measured as ‘1’. This may be obtainable for different methods with varying position and slope of the probability line.

Fig. 3 Estimation of analytical quality specifications for ordinal scale based on biological principles. The frequency distribution of a log-Gaussian distribution of reference values from a healthy population is illustrated on a log-scale abscissa (Doh). Further, the line of probability of obtaining the result ‘1’ by measurement is shown on a probit scale with the same abscissa (Po1). For each small interval the fraction of individuals is multiplied with the probability of getting the result ‘1’, and the distribution of measured ‘1’ is drawn (Doh×Po1=Dom1)

Monitoring

In monitoring of patients using measurements of a single component, only a few attempts have been made to derive analytical quality specifications for the monitoring against a defined concentration, e.g. [11], whereas the monitoring using just the change in concentration between two samplings has been investigated intensely, e.g. [12], using the formula

$CD = z \times 2^{1/2} \times (CV_{W\text{-}S}^2 + CV_A^2)^{1/2} + \Delta_{\text{analytical bias}}$

where $CD$ is the critical difference, $z$ is the standard deviate for a defined probability, e.g. $z = 1.65$ for a one-tail probability of 95%, $2^{1/2}$ is a factor due to the two measurements, $CV_{W\text{-}S}$ is the biological within-subject variation, $CV_A$ is the analytical imprecision and $\Delta_{\text{analytical bias}}$ (∆ in analytical bias) denotes the possible change in bias between the two measurements. Assuming z=1.96 (two tailed probability of
95%), ∆ in analytical bias=0 and $CV_A = 0.5 \times CV_{W\text{-}S}$, CD becomes equal to $3.1 \times CV_{W\text{-}S}$, since $1.96 \times 2^{1/2} \times (1 + 0.5^2)^{1/2} \approx 3.1$. This is often called the reference change, RC, and is used as the basis for interpretation of differences; on this basis RC can be used for the calculation of analytical quality specifications.

POCT

For Point-Of-Care-Testing, POCT, measurements are performed ‘on location’, saving the time usually spent on transport to the central laboratory and on the analytical procedure (the turnaround time). This aspect is evident, but it has been difficult to combine this time-saving with the analytical quality specifications, as these specifications should be the same as the general ones for laboratories in order to get the same data, but at an earlier time.

A direct relationship between time and analytical quality specifications can, however, be established by changing the sampling interval. Fig. 4 shows the concentration of a component as function of time. At a certain time (here 2 h), a change in the patient initiates an increase in the component. In order to make it simple we assume that the increase is linear for the period of time under consideration and that the turnaround time is 1 h. Further, we assume that the sample interval is 1 h and that the change during this hour is equivalent to the reference change, RC, discussed above, i.e. $CV_A = 0.5 \times CV_{W\text{-}S}$, but here $CV_{W\text{-}S}$ describes the short-term within-subject variation under the specific conditions (and not the long-term biological variation tabled in many articles). Three simplified situations are illustrated in the lower part of Fig. 4:

1. Measurement in laboratory: samples are drawn at 2 and 3 h and the samples are analysed in the laboratory, which makes the result available at the time 4 h.
2. POCT measurements at location: samples are drawn at 2 and 3 h and analysed at location, which makes the result available at the time 3 h.
3. POCT measurements at location: samples are drawn at 2 and 4 h and analysed at location, which makes the result available at the time 4 h.

Examples 1 and 2 illustrate the usual comparison between POCT and laboratory. The benefit is the early result, but the analytical quality specifications are the same for POCT and laboratory. In example 3, however, the result from POCT is not earlier than from the laboratory, but as the change is larger, the analytical quality specifications must be wider as well. Using any sampling interval between 1 and 2 h for POCT will allow both for an earlier result and for wider analytical quality specifications. Thus, it may be possible to find a relationship between analytical quality specifications for POCT, sampling interval, and turnaround time, when $CV_{W\text{-}S}$ and the slope of the expected increase are known.

Fig. 4 Comparison of measurements performed in laboratory and POCT. Upper: the simple model for a linear increase in the measurand at time 2 h. Lower: there are three steps: A) measurements in laboratory of samplings at time 2 and 3 h plus 1 h turnaround time give the result at 4 h; B) POCT measurement under the same conditions as A), but without turnaround time: result at 3 h, same analytical quality specifications as A); C) POCT measurements at 2 and 4 h: result at 4 h (as in A)), but wider analytical quality specifications

Discussion

In the process of establishing analytical quality specifications, a variety of different models and assumptions must be used and applied. As the relatively simple combinations of random and systematic errors demonstrate [8], the task is difficult and the outcome may differ considerably, so it is necessary to be cautious when it comes to estimation of analytical quality specifications according to different clinical settings. We can say with Box and Luceño’s words [6] that any model applied is wrong, so the task is to make it probable that the model is useful for the purpose.

One of the first considerations may be to define correctly the scale of measurements, and next to describe the clinical setting, whether classification or monitoring, or any other. For classification, it must further be decided whether the situation is uni- or bi-modal (Fig. 1). In the case of uni-modal distributions (usually risk as for cholesterol [7], but also diagnostic in diabetes [8]) the prevalence is determined by the decision limit (and the population sample). In the
uni-modal concept the analytical quality specifications will mostly be very demanding. If a uni-modal concept is treated as bi-modal with FP and FN results, absurd conclusions may be drawn.

For bi-modal distributions the primary outcome is a combination of FP, FN, TP and TN, which can be treated in many ways, as known for calculations of sensitivity and specificity. Most important for the interpretations, and for further calculations, is the prevalence [9]. A variety of possibilities are, however, open for further calculations in estimation of analytical quality specifications, e.g. predictive values and costs, and it is also possible to vary weighting factors for the FP and FN groups together with investigation of the influence of prevalence on the outcome (Fig. 2 and [9]).

A special type of measurement, which needs a totally different interpretation, is measurement on the ordinal scale, where the analytical quality specifications must be defined via probabilities for getting the result ‘0’ or ‘1’ [10]. Here there are still lots of unanswered questions, and it will probably take some time to create the needed relevant models and to understand the complex nature of this type of result. Measurements on the ordinal scale are interesting from a theoretical point of view, but measurements on the ratio scale should be preferred whenever possible, as the quality, and thereby the information value, of ordinal measurements is considerably lower.

In this paper an attempt to create analytical quality specifications for measurements on the ordinal scale has been proposed (Fig. 3), parallel to the biological approach (for sharing common reference intervals) based on a well known model [8]. This model can only give an interval for the analytical quality to be obtained in order to produce a certain percentage of values from a healthy distribution.

In monitoring, a distinction must be made between monitoring against a certain concentration value and estimation of the difference between two measurements. In this contribution only the difference between two measurements is dealt with. Here the well known formulas can be applied without any problems [12], but in order to investigate specific analytical quality specifications for POCT, time must be involved in the assumptions and calculations. Further, a change in sampling strategy with prolonged sampling intervals must be introduced in order to obtain analytical quality specifications which are different from the traditional ones for analytical work. In the example (Fig. 4), simple assumptions are made, such as linear increase of the measurand and round hours of sampling intervals and turnaround time. These assumptions and the applied model are much too simple to draw definite conclusions.

Conclusion

The choice of model and the assumptions applied are crucial for the reliability of analytical quality specifications, so these presumptions must be validated painstakingly in each specific clinical setting.

References
1. Tonks DB (1963) A study of the accuracy and precision of clinical chemistry determinations in 170 Canadian laboratories. Clin Chem 9:217–233
2. Fraser CG, Hyltoft Petersen P (1999) Analytical performance characteristics should be judged against objective quality specifications. Clin Chem 45:321–323
3. Hyltoft Petersen P, Fraser CG, Kallner A, Kenny D (eds) (1999) Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 59:475–585
4. Kenny D, Fraser CG, Hyltoft Petersen P, Kallner A (1999) Consensus agreement. In: Hyltoft Petersen P, Fraser CG, Kallner A, Kenny D (eds) Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 59:585
5. Hyltoft Petersen P, Stöckl D, Westgard JO, Sandberg S, Linnet K, Thienpont L (2001) Models for combining random and systematic errors. Assumptions and consequences for different models. Clin Chem Lab Med 39:589–595
6. Box G, Luceño A (1997) Statistical control by monitoring and feedback adjustment. A Wiley-Interscience publication. Wiley, New York Chichester Weinheim Brisbane Singapore Toronto
7. Klee GG, Schryver PG, Kisabeth RM (1999) Analytical bias specifications based on the analysis of effects on performance of medical guidelines. Scand J Clin Lab Invest 59:509–512
8. Hyltoft Petersen P, Brandslund I, Jørgensen LGM, Stahl M, de Fine Olivarius N, Borch-Johnsen K (2001) Evaluation of systematic and random factors in measurements of fasting plasma glucose as the basis for analytical quality specifications in the diagnosis of diabetes. 3. Impact of the new WHO and ADA recommendations on diagnosis of diabetes mellitus. Scand J Clin Lab Invest 61:191–204
9. Hyltoft Petersen P (1999) Quality specifications based on analysis of effects of performance on clinical decision-making. Scand J Clin Lab Invest 59:517–521
10. Hyltoft Petersen P, Sandberg S, Fraser CG, Goldschmidt H (2000) A model for setting analytical quality specifications and design of control for measurements on the ordinal scale. Clin Chem Lab Med 38:545–551
11. Skeie S, Thue G, Sandberg S (2001) Interpretation of HemoglobinA1C (HbA1C) values among diabetic patients. Implications for quality specifications for HbA1C. Clin Chem 47:1212–1217
12. Hyltoft Petersen P (2001) Elements of analytical quality and strategies for choosing analytical quality specifications. Blood Gas News 10:31–39
