
INSTITUTE OF PHYSICS PUBLISHING METROLOGIA
Metrologia 41 (2004) 122–131 PII: S0026-1394(04)73992-4

On the analysis of measurement comparisons
D R White
Measurement Standards Laboratory, IRL, PO Box 31310, Lower Hutt, New Zealand

Received 30 September 2003


Published 20 February 2004
Online at stacks.iop.org/Met/41/122 (DOI: 10.1088/0026-1394/41/3/003)

Abstract
The method of constrained least squares is applied to the determination of
laboratory bias and degrees of equivalence in the analysis of measurement
comparisons. The analysis assumes a measurement model with unknown
values for all travelling artefacts and laboratory biases. Least-squares fitting
applied to this model yields an infinite set of solutions, all with the same
inter-laboratory and inter-artefact differences. The application of a
constraint, which can be interpreted as the definition of the key comparison
reference value, then yields a single solution. Different types of constraint
may be applied for known-value comparisons, for key comparisons without
known values, and when linking regional metrology organization or
supplementary comparisons to key comparisons. Once the laboratory biases
are determined, the degrees of equivalence can be determined using the
generic equations provided. The constrained-least-squares approach
provides a single mathematical framework for the analysis of a wide range
of comparisons including those with multiple artefacts of varying attributes,
circulated amongst multiple overlapping comparison loops, laboratories that
provide an arbitrary number of measurements of one or more of the
artefacts, and spectral dependence of laboratory bias.

1. Introduction

Four years have passed since the mutual recognition arrangement (MRA) [1] was signed and its implementation commenced. The purpose of the MRA is to establish the degree of equivalence of national measurement institutes (NMIs), to enable the mutual recognition of certificates issued by the NMIs, and to provide international users of the SI with a secure technical foundation for agreements and contracts. The effectiveness of the MRA depends critically on key comparisons: measurement comparisons conducted amongst the NMIs that test their key skills and provide an experimental confirmation of their capability.

Superficially, key comparisons have much in common with proficiency tests [2] and collaborative trials [3]. However, from a statistical perspective, there are significant differences. The most significant difference arises because there is usually no ‘high-echelon laboratory’ [2] capable of providing reference values with negligible uncertainty. Thus, the Technical Supplement to the MRA encourages the calculation of a key comparison reference value (KCRV), with the expectation that the KCRV should be ‘. . . a close approximation to the corresponding SI value’.

The other differences between key comparisons and the comparisons employed in proficiency tests and collaborative trials relate to the lack of simplifying assumptions in the statistical model. For example, in contrast with collaborative trials, the measurements reported by NMIs in key comparisons are often based on different procedures and equipment; hence the results are drawn from different distributions with different biases and uncertainties. Also, in contrast with some proficiency tests and collaborative trials, the artefacts do not all have the same values (i.e. in key comparisons the range of artefact values is greater than the uncertainty of measurement, see [3, Pt. 2, Cl. 5.3]), and the uncertainties associated with the drift and variations in the attributes of the artefacts are rarely negligible. These statistical differences and the need for a mathematical definition of the KCRV have resulted in a flurry of papers discussing a wide range of technical issues relating to the implementation of the MRA and the analysis of comparisons.


Most recently, Cox, representing a group of experts assembled by the director of the BIPM, published a description of a comparison analysis for one of the simplest cases [4, 5]: a comparison involving the circulation of a single stable artefact.

The purpose of this paper is to offer a method of analysis that is applicable generally to comparisons involving multiple artefacts of different and possibly varying attributes, circulated amongst multiple possibly overlapping comparison loops, with laboratories that provide an arbitrary number of measurements of one or more of the artefacts. It is based on the least-squares approach suggested previously by the author [6–8] and discussed by Cox [9]. The analysis is consistent with that recommended by the BIPM Director's advisory group for the simple case, with the ISO Guide to the Expression of Uncertainty in Measurement [10], and with the majority of papers published to date. This paper is divided into seven main sections as follows.

Section 2: Definition of the problem. This section presents a mathematical model of comparisons based on the model and terminology employed in ISO/IEC Guide 43 Proficiency Testing by Interlaboratory Comparisons [2] and ISO 5725 Accuracy (Trueness and Precision) of Measurement Methods and Results [3], with the added features of multiple, unstable and parametrized artefacts. The section emphasizes the assumptions made and defines the parameters in the model.

Section 3: The least-squares solution. It follows directly from the problem definition that a least-squares approach provides the maximum-likelihood solution to the comparison analysis. It is found, however, that there is no unique solution; instead the solution is in the form of two sequences comprising the inter-laboratory and inter-artefact differences.

Section 4: Determination of the KCRV. It is shown that absolute values of the laboratory biases and artefacts can be estimated by applying a constraint to the least-squares problem. This constraint, which applies a priori information about artefacts or laboratory biases, is equivalent to the definition of a KCRV and may be chosen to suit the comparison design.

Section 5: Calculation of the degrees of equivalence. Once the constraint is applied, both the degree of equivalence and the pairwise degree of equivalence can be calculated. This section discusses the calculation of the uncertainty term of the degree of equivalence and includes the effect of the uncertainties associated with the linking of comparisons, the effect of transport uncertainties associated with the artefact, and the effect of participants submitting more than one result.

Section 6: Linking of comparisons. It has been shown that comparison reference values cannot be used to link comparisons without known values in a statistically unbiased manner [6]. Unbiased linking of two (or more) such comparisons can only be accomplished by using results from NMIs that have participated in both comparisons. This section describes two methods for linking comparisons based on the constrained-least-squares approach.

Section 7: Example. This section gives a simple algebraic example of a comparison analysis to illustrate the key properties of the constrained-least-squares solution, clarify the calculations of the uncertainty terms of the degree of equivalence, and illustrate the relationship to other approaches.

Section 8: Discussion. This section discusses some issues relating to the comparison model, its associated assumptions, and the impact of the constrained-least-squares approach on comparison protocol.

Finally, the conclusions are summarized.

2. Definition of the problem

2.1. Laboratory capability

It is assumed that, if any comparison participant was to measure repeatedly a particular attribute of a single stable artefact, the results would be distributed about a mean value biased away from the SI value by an amount d with a standard deviation s. The distinction between the laboratory bias, d, and the laboratory standard deviation, s, in the model of capability is the same as the distinction between systematic error and the standard uncertainty characterizing the random error [10]. Prior to the comparison, the participant is unaware of the sign and magnitude of d. However, it will have estimated a standard uncertainty, u(d), that ‘characterises the dispersion of values that could reasonably be attributed’ [10] to d. The expanded uncertainty reported by the ith participant in its statement of capability is therefore

U_i^2 = k^2 u_i^2 = k^2 [s_i^2 + u_i^2(d_i)],    (1)

where k is an appropriate coverage factor. For key comparisons, the MRA requires the expanded uncertainty to be expressed as a 95% confidence interval.

The standard deviation, s, as defined above is closely related to, and in some cases is the same as, the laboratory repeatability [11]. The contributing variations may or may not include differences in the staff carrying out the measurements, differences in procedures, and differences in equipment, but the participant must be clear about what is included and what is not when preparing the uncertainty analysis. For comparison pilots and participants measuring more than one artefact, a low value for s is highly desirable in the linking of different comparison loops, so variations should be minimized. In principle, the value of s can be determined as a Type A uncertainty by repeated measurements of any suitable artefact.

2.2. The comparison

It is assumed that the purpose of the comparison is to estimate values for the laboratory biases, d_i, the set of inter-laboratory differences, d_i − d_l, and the uncertainties in these values. These are used to calculate the degree of equivalence for any NMI and the pairwise degrees of equivalence for any pair of NMIs.

The comparison is designed so that each participant is able to provide one or more measurements for one or more artefacts. Following the symbolism suggested by schematic representations of comparisons, such as shown in figure 1, the term ‘loop’ will be used to describe the set of laboratories that provide measurements of one artefact.


Figure 1. A comparison comprises one or more artefacts circulating amongst different participating laboratories. The different loops of the comparison are connected by linking laboratories (•) that measure more than one of the artefacts.

As shown in figure 1, it is assumed that all the loops of a comparison are connected; that is, any one participant's measurements can be related to those of any other by an unbroken chain of measurements and artefacts. It is also assumed that each artefact is measured by at least two laboratories.

Each measurement submitted to a comparison is modelled as

X_{i,j,k} = V_j + \delta_{i,j,k} + d_i + \varepsilon_{i,j,k},    (2)

where i, j, and k are indices identifying each participant, each artefact, and each repeat of a participant's measurement, respectively; V_j is the value of the jth artefact; \delta_{i,j,k} is the random error associated with the departure of the artefact value from the artefact model (e.g. variations due to transport) and is distributed about zero with a variance q_j^2; d_i is the bias in each participant's measurement of the artefact value; \varepsilon_{i,j,k} is the random error associated with each participant's measurement and is distributed about zero with a variance s_i^2.

Thus, the variance in a participant's single measurement of the artefact value, as observed by the other participants, is

var(X_{i,j,k}) = q_j^2 + s_i^2.    (3)

It is assumed that the uncertainties characterizing variations in artefact values and variations in the laboratory measurements are uncorrelated. The variations giving rise to these effects are also assumed to be distributed normally; this assumption is implicit in the derivation of the least-squares method used in section 3.

The model expressed by (2) is similar to that employed in collaborative trials [3] and in many comparisons, although often not stated explicitly. While it is not shown in (2), any or all of the various parameters may be parametrized further:

• The artefact value V_j may include a parametric dependence on other physical quantities. The most obvious example is time dependence, V_j(t). Other examples include ambient temperature, the number of measurements made, or dc current as in the case of the radiometric lamps used in CCT-K5 [12]. Note that the term ‘artefact value’ is used as a short description for the value of the particular quantity associated with the artefact that is the subject of the comparison and defined by the comparison protocol.
• The laboratory bias, d_i, might be parametrized, for example, with spectral measurements where d depends on wavelength or frequency [9, 13]. Another example is a temperature-dependent laboratory bias, such as in CCT-K5.
• Both the artefact variances, q_j^2, and the laboratory variances, s_i^2, may be parametrized. This would be appropriate when, for example, the random errors characterized by the variances exhibit correlations over time [14]. Least-squares fits are readily adapted to include autoregressive and moving average models of noise [15].

3. The least-squares solution

Given the measurement model of the form of (2), the maximum-likelihood solution for the parameter values is conventionally found by least-squares fitting [16]. The general solution is found by minimizing the chi-square function of differences between the measurements and the model:

\chi^2 = \sum_i \sum_j \sum_k \frac{(X_{i,j,k} - V_j - d_i)^2}{s_i^2 + q_j^2}.    (4)

Note that the weights in this fit are the total variance in each measurement as given by (3). Unfortunately, the definition of the comparison model, (2), is such that no unique solution exists. This can be demonstrated by considering a case where the laboratory biases, d_i, and the artefact values, V_j, are simple constants. In this case, we might expect to determine values for V_j and d_i by differentiating (4) with respect to each of the fitted parameters and setting the resulting equations to zero. This leads to two sets of normal equations, one associated with the derivatives with respect to each of the artefact values,

\sum_i \sum_k \frac{V_j + d_i}{s_i^2 + q_j^2} = \sum_i \sum_k \frac{X_{i,j,k}}{s_i^2 + q_j^2}, for each j,    (5)

and the other set associated with derivatives with respect to each of the laboratory biases,

\sum_j \sum_k \frac{V_j + d_i}{s_i^2 + q_j^2} = \sum_j \sum_k \frac{X_{i,j,k}}{s_i^2 + q_j^2}, for each i.    (6)

In this example, the sums of the two sets of equations are the same; hence the corresponding normal-equation matrix is singular. This is a consequence of the assumption that all the measurements are biased by some unknown amount.

The nature of the set of solutions is also evident; given any set of V_j and d_i that minimizes (4), an infinite number of other sets can be obtained by adding any constant to all the V_j values and subtracting the same constant from all the d_i values.


Thus, (5) and (6) yield an infinite set of solutions, all with the same inter-laboratory differences and inter-artefact differences. In simple cases, an algebraic solution in terms of the differences is practical.

4. Determination of the KCRV

One way of restricting the infinite set of solutions to a single solution is to impose a constraint. This may be implemented by using the method of Lagrange multipliers [17], in which a constraint is added to (4):

\chi^2 = \sum_i \sum_j \sum_k \frac{(X_{i,j,k} - V_j - d_i)^2}{s_i^2 + q_j^2} + \lambda [g(V_j, d_i)],    (7)

where g(V_j, d_i) = 0 is the equation of constraint, and \lambda is the Lagrange multiplier. Multiple constraints can be included by adding multiple terms with separate Lagrange multipliers. The form of the constraints varies according to the comparison design, as discussed later. The normal equations of the least-squares method are then generated by differentiating (7) with respect to V_j, d_i, and \lambda. An example will be given in section 7.

4.1. Known-value comparisons

In some comparisons, the value of the artefact is known in advance. Examples include chemical comparisons where the uncertainty in the artefact preparation process (e.g. gravimetric) is less than the uncertainty in any measurement of the prepared artefact, or where the artefact value is determined in advance by a ‘high-echelon laboratory’. In most cases, the known values can be substituted directly into (5) and (6), and the Lagrange-multiplier formalism is not required. In the remaining cases, involving time-dependent models with parameters determined from the comparison data, the constraint takes the form

g(V_j) = V_j(t)|_{t=t_0} - V_{j,0} = 0,    (8)

where V_j(t) is the parametric model of the artefact and V_{j,0} is the known value of the artefact at the time t_0.

4.2. Comparisons without known values

For key comparisons where the artefact values are not known, the analysis applied to collaborative trials perhaps forms the best precedent [3]. These are tests where the artefact value is defined by a prescribed test method, and several laboratories collaborate to ascertain the repeatability and reproducibility of the test method. In these cases the assigned value for the artefact is determined by consensus as an average of selected results. It is this feature of collaborative trials that is shared with many key comparisons. In this case there are a large number of possible constraints, including

d_{pilot} = 0,    (9)

median(d_i) = 0,    (10)

\sum_i d_i = 0,    (11)
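As a minimal numerical sketch of the bordered system that the Lagrange formulation (7) produces, the following code (invented measurements, unit variances, and the equal-weight form of the weighted-mean constraint introduced next as (12)) appends the constraint row and column to the otherwise singular normal matrix and solves for a unique set of parameters.

```python
import numpy as np

# Same minimal layout as before; unknowns [V_N, V_K, d1, d2, d3, lambda]
A = np.array([[1, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 1, 0, 1, 0],
              [0, 1, 0, 0, 1]], dtype=float)
X = np.array([10.02, 10.05, 4.98, 4.95])   # invented measurements

# Equal-weight constraint: (d1 + d2 + d3)/3 = 0
w = np.array([0.0, 0.0, 1/3, 1/3, 1/3])

# Bordered (Lagrange) system: the constraint removes the null space
M = np.zeros((6, 6))
M[:5, :5] = A.T @ A
M[5, :5] = w
M[:5, 5] = w
b = np.concatenate([A.T @ X, [0.0]])

V_N, V_K, d1, d2, d3, lam = np.linalg.solve(M, b)
print(d1, d2, d3)        # approx -0.01, 0.02, -0.01: biases average to zero
```

Because the constraint merely selects one member of the degenerate family of minima of (4), the multiplier is zero at the solution and the inter-laboratory differences are the same whichever constraint is chosen.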


or more generally

\sum_i w_i d_i = 0, where \sum_i w_i = 1.    (12)

Cox [9, 18, 19] describes the various statistical assumptions inherent in the application of (10)–(12). Equations (11) and (12) correspond to the application of the hidden-error model described by Willink [20]. Note, too, that the summations in (11) and (12) may be limited to selected participants if, for example, outliers are rejected. In the CCEM-K4 10 pF key comparison, (12) was used with the summation limited to those laboratories that derived their capacitance measurements from an independent realization of the calculable capacitor [21]. The summation may also be limited to the CIPM or BIPM comparison participants in cases where data from comparisons organized by regional metrology organizations (RMOs) are included in the analysis. Elster and Link [22] describe refinements in the selection of the weights of (12), based on observations of clustering effects and resampling.

4.3. The meaning of the KCRV

For proficiency tests employing known-value comparisons, the assigned artefact value is called the reference value [3]. Therefore, for known-value key comparisons employing a single stable artefact, it seems clear that the assigned SI artefact value should be the KCRV, and this is so in the simple case described by Cox [5]. However, for classes of comparisons involving multiple or unstable artefacts, the definition is not so clear. By implication, and often explicitly, the KCRV is defined to be the estimated or modelled value of a selected artefact at a selected time. The participants' results are then adjusted to correspond to a hypothetical simultaneous measurement of the artefact at that time. CCEM-K4 went one step further by assigning the KCRV a value of 10 pF exactly [21], that being the nominal value of the capacitors circulated and the value of the hypothetical stable travelling capacitor that would have yielded the same results. When these interpretations of the KCRV are adopted, a complication arises because of the arbitrary nature of the selected value of the KCRV. Can such an arbitrarily chosen value have an uncertainty?

The least-squares approach offers a change in perspective and clarification. The application of the constraint according to participant consensus fixes values for both the laboratory biases and all the artefact values at all times throughout the comparison. In particular, the laboratory biases are constants independent of the behaviour of the artefacts, which of the artefacts is measured, and when the artefacts are measured. If the laboratory biases are, as defined by the MRA, the deviation between the participants' measurements and the KCRV, then the KCRV is a multi-valued quantity: ‘the consensus values for all the artefacts at all times throughout the comparison’. Since the least-squares fit provides a parametric description of the artefacts, the uncertainty in the KCRV is the propagated uncertainty in the artefact values. However, as will be shown below, with the exception of known-value comparisons, the values of the artefacts and their uncertainties do not enter into subsequent calculations.

The MRA states that the ‘. . . KCRV should be a close approximation to the corresponding SI value’. For comparisons without known values, the verbal definition of the KCRV given above emphasizes it as a consensus value rather than the SI value. The distinction is highlighted through consideration of the constraints (10)–(12). Since the metrological equivalence is intended primarily to indicate the degree of interoperability of measurement standards, it is reasonable to choose an average (the median, mean, or weighted mean) as a ‘measure of centrality’ [5] for the distribution of laboratory biases. For this reason, the assumption implicit in (10)–(12), that the average bias is zero, will be the basis of the analysis that follows. However, the assertion that the laboratories' biases are indeed zero on average is usually untestable and, in some cases, known to be incorrect [23, 24]. The validity of the assumption, and hence the relation between the KCRV and the corresponding SI value, is a secondary issue best left to the respective CIPM consultative committee to resolve.

5. Calculation of the degrees of equivalence

5.1. The degree of equivalence

The MRA states that the degree of equivalence for ‘. . . each national measurement standard is expressed quantitatively by two terms: the deviation from the KCRV and the uncertainty in this deviation (at a 95% level of confidence)’. The definition of the degree of equivalence is based on the E_n numbers used in proficiency tests [3]. The E_n number is the ratio of the measured deviation to the laboratory's expanded uncertainty (an estimate of the range of possible deviations) and is a measure of the quality of the laboratory's uncertainty analysis; E_n numbers much greater than 1 are suggestive of an incomplete analysis. By requiring the deviation and uncertainty to be reported separately, the MRA also provides a basis for determining the interoperability of different measurement standards [25].

For a comparison employing a single stable artefact, and where each laboratory submits a single result, the deviation term of the degree of equivalence is the measured laboratory bias and is estimated from (2) [5] as

d_{i,meas} = X_i - KCRV,    (13)

where KCRV is the consensus artefact value (note that the j and k indices on X_{i,j,k} are redundant in this case). Since the measured value of d_i is subject to uncertainties arising from the estimation of the KCRV, in all comparisons to date, the practical interpretation of the uncertainty term of the degree of equivalence has been the expanded total uncertainty, calculated from the combination of the uncertainty reported by the laboratory and the uncertainty in the KCRV, including the effects of correlation. This follows from the application of the propagation of uncertainty equation [10] to (13).

For known-value comparisons, the uncertainty in the KCRV is simply the uncertainty with which the artefact value can be prepared, or measured by the higher-echelon laboratory. For comparisons without known values, the uncertainty in the KCRV has two parts: one due to the artefact and laboratory variances in the measurements, and one due to the uncertainty associated with the definition of the KCRV, i.e. the degree to which the constraint is satisfied in practice. If the weights in the least-squares problem are the artefact and laboratory variances, as in (7), then the least-squares solution includes the propagated artefact and laboratory variances, as will be shown in section 7. The uncertainty associated with the constraint arises because of the finite number of participants. With a small number of participants, it is unlikely that the mean laboratory bias is identically zero in practice, although it is assumed to be zero in the implementation of the constraint.

Consider the case when the constraint is based on the weighted mean of the laboratory biases (12). The statistical contribution of the biases can be determined by assuming that the uncertainties s_i and q_j are zero in equation (13):

d_{i,meas} = d_i - \sum_{j=1}^{N} w_j d_j,    (14)

where N is the number of participating laboratories, and hence

d_{i,meas} = \sum_{j=1}^{N} w_j (d_i - d_j).    (15)

This equation distinguishes the measured value of laboratory bias from the actual (but unknown) values of laboratory bias. Note that the ith term of the summation of (15) is identically zero, and the weights can be chosen according to any one of the constraints (10)–(12). Application of the propagation of uncertainty equation to (15) yields the expected variance in the measured value of d_i arising from the distribution of the actual d_j values of all participants,

(1 - w_i)^2 u_i^2(d_i) + \sum_{j=1, j \neq i}^{N} w_j^2 u_j^2(d_j),    (16)

where the u_j(d_j) are the standard uncertainties provided by the participants, which characterize the dispersion of values that could reasonably be attributed to d_j, as defined in section 2. Addition of the uncertainty propagated through the least-squares fit, u_{fit}^2(d_i), and multiplication by the appropriate coverage factor, k, then yields the uncertainty term required for the degrees of equivalence:

U_{doe}^2(d_i) = k^2 \left[ u_{fit}^2(d_i) + (1 - w_i)^2 u_i^2(d_i) + \sum_{j=1, j \neq i}^{N} w_j^2 u_j^2(d_j) \right].    (17)

It will be shown in section 7 that the u_{fit}(d_i) term of this equation has a form very similar to that of the u_i(d_i) terms in (16). Indeed, when there is a single stable artefact (no time dependence and q = 0), measured only once by each participant, u_{fit}(d_i) has exactly the same form. Under these conditions (17) becomes

U_{doe}^2(d_i) = k^2 \left[ (1 - w_i)^2 u_i^2 + \sum_{j=1, j \neq i}^{N} w_j^2 u_j^2 \right],    (18)

where the u_i are the standard total uncertainties reported by the laboratories, (1). If the calculations are further simplified by assuming that the number of degrees of freedom associated with the various uncertainties is large, then (18) becomes

U_{doe}^2(d_i) \approx (1 - w_i)^2 U_i^2 + \sum_{j=1, j \neq i}^{N} w_j^2 U_j^2,    (19)

where the U_i are the expanded total uncertainties reported by each participant.
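The uncertainty term (17) is simple to evaluate once the weights and reported uncertainties are tabulated. A short sketch with invented numbers (u_fit would normally come from the covariance matrix of the fit, as shown in section 7):

```python
import numpy as np

def U_doe(i, u_fit, u_d, w, k=2.0):
    """Uncertainty term of the degree of equivalence, equation (17).

    u_fit -- standard uncertainty of d_i propagated through the fit
    u_d   -- participants' standard uncertainties u_j(d_j)
    w     -- constraint weights w_j of equation (12), summing to 1
    k     -- coverage factor
    """
    others = sum(w[j]**2 * u_d[j]**2 for j in range(len(u_d)) if j != i)
    return k * np.sqrt(u_fit**2 + (1 - w[i])**2 * u_d[i]**2 + others)

u_d = np.array([0.10, 0.15, 0.25])   # invented u_j(d_j) values
w = np.full(3, 1/3)                  # equal weights in constraint (12)
for i in range(3):
    print(i, U_doe(i, u_fit=0.05, u_d=u_d, w=w))
```

Note that the participant's own uncertainty enters with the factor (1 − w_i)^2 rather than (1 + w_i^2), which is how (17)–(19) remove the correlation between a laboratory's result and the KCRV.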


Equations (17)–(19) explicitly remove the correlation between the laboratory uncertainties and the uncertainty in the KCRV, a major point of concern noted in other publications (e.g. [9, 26–28]). When the participants have similar weightings, the U_i term typically dominates U_doe, and U_doe approaches U_i as the number of participants approaches infinity. Equation (19) was derived by Cox [5, appendix C] for the simple one-stable-artefact comparison. However, as the derivation above shows, the result is limited to cases where participants submit only one result and the artefact variances, q^2, are negligible. Cox also notes that if the weights are proportional to 1/U_j^2, then (19) further simplifies to

U_{doe}^2(d_i) = U_i^2 - \left[ \sum_{j=1}^{N} \frac{1}{U_j^2} \right]^{-1}.    (20)

Since the second term, with the summation, is the standard result for the variance in the weighted mean, that term can be interpreted as the uncertainty in the KCRV. The minus sign in (20) accounts for the correlation between the KCRV and the participant's results. The same result was obtained by Nielsen for a single-artefact-comparison analysis where the laboratory biases are all assumed to be zero [29].

In many comparisons, (19) or (20) provides a satisfactory approximation for the degree-of-equivalence uncertainty. However, (17) is a better choice in cases where the artefacts are unstable (q_j > 0), where there are many repeats of measurements, or where there are a large number of comparison loops. The uncertainty given by (19) may be smaller or larger than that given by (17); connection via linking laboratories and unstable artefacts (q_j > 0) will increase the uncertainty propagated through the fit, while the averaging associated with multiple measurements will reduce the uncertainty. Although none of (17)–(19) accounts for correlations in the u_i(d_i) uncertainties from different laboratories, the extension is obvious.

5.2. The pairwise degrees of equivalence

As was noted in section 3, the difference d_i − d_l between any pair of laboratory biases is insensitive to the definition of the KCRV. The uncertainty in the difference is also independent of the uncertainty in the KCRV; hence

U_{doe}^2(d_i - d_l) = k^2 [u_i^2(d_i) + u_l^2(d_l) + u_{fit}^2(d_i - d_l)],    (21)

where u_{fit}^2(d_i − d_l) is the uncertainty due to the artefact and laboratory variances propagated through the least-squares fit. In section 7 it is shown that the u_fit term has the same form as the u_i(d_i) terms under the conditions of a single stable artefact, measured only once by each participant, and a large number of degrees of freedom. Then (21) simplifies to

U_{doe}^2(d_i - d_l) = U_i^2 + U_l^2.    (22)

If the weights w_i used to constrain the fit are inversely proportional to the total uncertainty, this uncertainty is always larger than the single-laboratory uncertainty, U_doe(d_i), for either of the two laboratories (19).

6. Linking of RMO and supplementary comparisons

With known-value comparisons, laboratory biases are calculated by comparing results directly with the KCRV. This suggests that the relative performance of participants A and B in two different comparisons can be compared as

d_A - d_B = (X_A - KCRV_1) - (X_B - KCRV_2).    (23)

While this may be appropriate for a pair of known-value comparisons, it can be shown that with other comparisons it leads to statistically biased estimates for inter-laboratory differences. That is, the value of d_A − d_B will depend on the other laboratory biases [6], some of which may be outliers. Unbiased linking can be accomplished by ensuring that some laboratories participate in both comparisons to provide measurements of the differences in the values of the two artefacts. For this reason the Technical Supplement of the MRA notes that ‘The results of Supplementary and RMO comparisons are linked . . . by the common participation of some institutes in both CIPM and RMO comparisons’. The least-squares analysis is one way of providing unbiased linking since the linking of two or more comparisons is fundamentally the same as a single comparison with multiple artefacts. There are two basic approaches to the linking of RMO (or Supplementary) comparisons, each with advantages and disadvantages.

The simplest way to link an RMO comparison to a key comparison is to add the RMO data to the key comparison data and repeat the least-squares fit. This approach simplifies the calculation of degrees of equivalence and propagation of uncertainties since the same formulae are used for the RMO comparison participants as for the key comparison participants. However, as each set of RMO results is added, the additional information gained about the linking laboratories may yield updated parameter values differing from those obtained with the key comparison data only. If the uncertainties in the RMO measurements are large compared with those in the key comparison measurements, then the impact on the key comparison results is negligible. In that case, it is reasonable to ignore the updated results for key comparison participants. Limiting the constraint summations, (11) and (12), to key comparison participants can prevent the new information from propagating to the key comparison parameters, but only when there is one laboratory linking the comparisons via a single RMO loop with a single artefact.

An alternative approach to linking an RMO or Supplementary comparison to a key comparison is to constrain the values of laboratory biases for the linking laboratories and treat the RMO comparison as a separate analysis; that is, for each linking laboratory apply a constraint of the form

g_i(d_i) = d_i - d_{i,KC} = 0,    (24)

where d_{i,KC} is the value of laboratory bias obtained in the key comparison. This is equivalent to defining the artefact values in terms of a weighted mean of results of the linking laboratories, with the weights determined by the number of measurements, the artefact variances, and the laboratory variances as determined through the least-squares fit. This approach leaves unchanged the laboratory biases, KCRV, and degrees of equivalence determined during the key comparison.
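For the simplest configuration (one linking laboratory, one RMO artefact, single measurements) constraint (24) amounts to the direct substitution discussed in the next paragraph. A hypothetical sketch, with all values invented:

```python
# Hypothetical RMO loop: linking laboratory L and laboratories A and B each
# measure one travelling artefact once. Constraint (24) fixes d_L at the
# value obtained in the key comparison.
d_L_KC = 0.02                            # linking-lab bias from the key comparison
X = {'L': 7.46, 'A': 7.51, 'B': 7.40}    # invented RMO measurements

# With d_L fixed, the artefact value follows from the linking lab's result,
# and each remaining bias is the deviation from that artefact value.
V_rmo = X['L'] - d_L_KC                  # 7.44
d_A = X['A'] - V_rmo                     # 0.07
d_B = X['B'] - V_rmo                     # -0.04
print(V_rmo, d_A, d_B)
```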


As with the known-value comparisons, the constraints are simple enough to allow direct substitution into (5) and (6) to determine the remaining laboratory biases.

A variation on this approach that allows the weights to be chosen freely is to construct a single constraint as a weighted mean of the linking laboratories' biases, similar to (12), but with a non-zero result determined by the numerical values of laboratory bias determined from the key comparison:

g = \sum_{KC \cap RMO} r_i d_i - \sum_{KC \cap RMO} r_i d_{i,KC} = 0, with \sum r_i = 1,    (25)

where the summation is over the linking laboratories only. This approach has the advantage of simplifying the uncertainty analysis because the weights are known a priori. The uncertainty terms for the degrees of equivalence for the RMO comparison are given by

U_{doe}^2(d_i) = k^2 \left[ u_{fit}^2(d_i) + u_i^2(d_i) + \sum_{KC \cap RMO} r_j^2 u_{fit,RMO}^2(d_j) + \sum_{KC \cap RMO} r_j^2 u_{fit,KC}^2(d_j) + \sum_{KC} w_j^2 u_j^2(d_j) \right],    (26)

where the first term is the uncertainty due to artefact and laboratory variances in the RMO comparison propagated through the least-squares fit, the second term is the participant's estimate of the range of values for d_i, the third and fourth terms are the uncertainty in the link between the two comparisons due to the artefact and laboratory variances, and the last term is the uncertainty in the constraint applied in the key comparison.

7. A simple example

This section presents a simple algebraic example to demonstrate some of the properties of the constrained-least-squares approach, clarify some of the uncertainty calculations, and illustrate the relationship to other approaches that have been adopted to date.

Consider a comparison involving two time-independent artefacts and three participants, as indicated in figure 2. In this example, the KCRV is defined by the weighted mean of the laboratory biases of the three participants. Hence the least-squares problem is defined by

\chi^2 = \sum_i \sum_j \sum_k \frac{(X_{i,j,k} - V_j - d_i)^2}{s_i^2 + q_j^2} + \lambda \sum_i w_i d_i.    (27)

Differentiation of (27) with respect to each of the artefact values, laboratory biases, and the Lagrange multiplier leads to a set of normal equations, which in matrix form are

\begin{pmatrix}
\hat{N}_1 + \hat{N}_2 & 0 & \hat{N}_1 & \hat{N}_2 & 0 & 0 \\
0 & \hat{K}_2 + \hat{K}_3 & 0 & \hat{K}_2 & \hat{K}_3 & 0 \\
\hat{N}_1 & 0 & \hat{N}_1 & 0 & 0 & w_1 \\
\hat{N}_2 & \hat{K}_2 & 0 & \hat{N}_2 + \hat{K}_2 & 0 & w_2 \\
0 & \hat{K}_3 & 0 & 0 & \hat{K}_3 & w_3 \\
0 & 0 & w_1 & w_2 & w_3 & 0
\end{pmatrix}
\begin{pmatrix} V_N \\ V_K \\ d_1 \\ d_2 \\ d_3 \\ \lambda \end{pmatrix}
=
\begin{pmatrix} S_{1N} + S_{2N} \\ S_{2K} + S_{3K} \\ S_{1N} \\ S_{2N} + S_{2K} \\ S_{3K} \\ 0 \end{pmatrix},    (28)

where

\hat{N}_i = \frac{N_i}{s_i^2 + q_N^2} and S_{iN} = \sum_{k=1}^{N_i} \frac{X_{i,N,k}}{s_i^2 + q_N^2} for i = 1, 2,

and

\hat{K}_i = \frac{K_i}{s_i^2 + q_K^2} and S_{iK} = \sum_{k=1}^{K_i} \frac{X_{i,K,k}}{s_i^2 + q_K^2} for i = 2, 3.

As is expected for least-squares equations, the leading 6 × 6 matrix (the curvature matrix) is symmetric. Note, too, that without the constraint applied, the resulting 5 × 5 matrix would be singular (compare the sum of rows 1 and 2 with the sum of rows 3, 4, and 5).

It is instructive to look at some of the values determined for the parameters. First, consider the artefact differences and the laboratory differences, since these are independent of the applied constraint (the definition of the KCRV). The difference between the two artefact values is

V_N - V_K = \frac{1}{N_2} \sum_{k=1}^{N_2} X_{2,N,k} - \frac{1}{K_2} \sum_{k=1}^{K_2} X_{2,K,k}.    (29)

This result can also be derived directly from the normal equations (5) and (6). In this example, Laboratory 2 was the only participant to measure both artefacts, and the difference (29) is formulated so that the value of d_2 has no effect on the estimate of the artefact difference. The least-squares solution always yields values for artefact differences that are independent of all laboratory biases.

Figure 2. A representation of a comparison involving the circulation of two artefacts N and K among three laboratories. The symbols V_N and V_K represent the values of the artefacts circulated in the two loops; N_1, N_2, K_2, and K_3 designate the numbers of measurements of the artefacts performed by Laboratories 1, 2, and 3, respectively.
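To make (28) concrete, the following sketch (all counts, variances, weights, and data are invented) simulates measurements according to model (2), assembles the bordered matrix, solves it, and checks the artefact-difference property (29).

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, K2, K3 = 3, 2, 2, 4               # invented measurement counts
s = np.array([0.05, 0.03, 0.08])          # laboratory standard deviations s_i
qN, qK = 0.02, 0.04                       # artefact standard deviations
w = np.array([0.2, 0.5, 0.3])             # constraint weights, summing to 1

# Invented 'true' values, used only to simulate data according to model (2)
VN0, VK0, d0 = 10.0, 5.0, np.array([0.04, -0.01, -0.03])

def sim(i, V, q, n):
    """Simulate n measurements of an artefact of value V by laboratory i."""
    return V + d0[i] + rng.normal(0.0, np.hypot(s[i], q), n)

X1N, X2N = sim(0, VN0, qN, N1), sim(1, VN0, qN, N2)
X2K, X3K = sim(1, VK0, qK, K2), sim(2, VK0, qK, K3)

# Weighted counts and sums as defined below equation (28)
Nh1, Nh2 = N1/(s[0]**2 + qN**2), N2/(s[1]**2 + qN**2)
Kh2, Kh3 = K2/(s[1]**2 + qK**2), K3/(s[2]**2 + qK**2)
S1N, S2N = X1N.sum()/(s[0]**2 + qN**2), X2N.sum()/(s[1]**2 + qN**2)
S2K, S3K = X2K.sum()/(s[1]**2 + qK**2), X3K.sum()/(s[2]**2 + qK**2)

# Bordered matrix of (28); unknowns [V_N, V_K, d1, d2, d3, lambda]
M = np.array([
    [Nh1 + Nh2, 0.0,       Nh1,  Nh2,       0.0,  0.0 ],
    [0.0,       Kh2 + Kh3, 0.0,  Kh2,       Kh3,  0.0 ],
    [Nh1,       0.0,       Nh1,  0.0,       0.0,  w[0]],
    [Nh2,       Kh2,       0.0,  Nh2 + Kh2, 0.0,  w[1]],
    [0.0,       Kh3,       0.0,  0.0,       Kh3,  w[2]],
    [0.0,       0.0,       w[0], w[1],      w[2], 0.0 ],
])
b = np.array([S1N + S2N, S2K + S3K, S1N, S2N + S2K, S3K, 0.0])
VN, VK, d1, d2, d3, lam = np.linalg.solve(M, b)

# Equation (29): the artefact difference depends on lab 2's means alone
print(np.isclose(VN - VK, X2N.mean() - X2K.mean()))   # True
```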


The difference between the biases of Laboratory 1 and Laboratory 2 is

d_1 - d_2 = \frac{1}{N_1} \sum_{k=1}^{N_1} X_{1,N,k} - \frac{1}{N_2} \sum_{k=1}^{N_2} X_{2,N,k},    (30)

and the difference between the biases of Laboratory 1 and Laboratory 3 is

d_1 - d_3 = \frac{1}{N_1} \sum_{k=1}^{N_1} X_{1,N,k} - \frac{1}{K_3} \sum_{k=1}^{K_3} X_{3,K,k} - \left( \frac{1}{N_2} \sum_{k=1}^{N_2} X_{2,N,k} - \frac{1}{K_2} \sum_{k=1}^{K_2} X_{2,K,k} \right).    (31)

The difference between Laboratories 1 and 2 is simply the difference between the means of measurements of artefact N only. However, for the difference between Laboratories 1 and 3, which are in different loops, the result is the difference between the means of the two laboratories' results minus the artefact correction of (29). This result demonstrates again that it is the laboratories that participate in both loops of a comparison that provide the critical information for linking the loops.

As noted in section 4, the absolute values of the artefacts and laboratory biases do depend on the constraint; for example,

d_2 = w_1 \left( \frac{1}{N_2} \sum_{k=1}^{N_2} X_{2,N,k} - \frac{1}{N_1} \sum_{k=1}^{N_1} X_{1,N,k} \right) + w_3 \left( \frac{1}{K_2} \sum_{k=1}^{K_2} X_{2,K,k} - \frac{1}{K_3} \sum_{k=1}^{K_3} X_{3,K,k} \right).    (32)

This is the laboratory-deviation term of the degree of equivalence and is a simple linear combination of the various means. Note that the distribution of the weights in (32) is similar to that in (15).

The inverse of the 6 × 6 matrix in (28) is the covariance matrix [16]. Thus, so long as the various values for the laboratory and artefact variances are realistic, the elements of the covariance matrix give the uncertainties in the estimates of the parameters. The uncertainty in the fitted value of the bias for Laboratory 2, propagated from (32), is

u_{fit}^2(d_2) = cov(4, 4) = w_1^2 \left( \frac{s_1^2 + q_N^2}{N_1} + \frac{s_2^2 + q_N^2}{N_2} \right) + w_3^2 \left( \frac{s_2^2 + q_K^2}{K_2} + \frac{s_3^2 + q_K^2}{K_3} \right).    (33)

Comparison with (32) shows that the uncertainty propagates as the uncertainty in the various means. In the case when there is a single artefact (no additional uncertainty due to linkages), the artefact is stable (q = 0), and the participants submit only one measurement each (N_i = 1), the uncertainty in the fitted values of d_i is

u_{fit}^2(d_i) = (1 - w_i)^2 s_i^2 + \sum_{j=1, j \neq i}^{N} w_j^2 s_j^2.    (34)

This has the same form as (16) for the dispersion of the d_i values and leads to (18) for the uncertainty term of the degree of equivalence.

The uncertainty in laboratory differences requires the use of the off-diagonal terms of the covariance matrix. For d_1 − d_2,

u_{fit}^2(d_1 - d_2) = cov(3, 3) - 2 cov(3, 4) + cov(4, 4) = \frac{s_1^2 + q_N^2}{N_1} + \frac{s_2^2 + q_N^2}{N_2},    (35)

as expected, and the uncertainty in d_1 − d_3 includes additional terms due to the uncertainty in the artefact correction:

u_{fit}^2(d_1 - d_3) = \frac{s_1^2 + q_N^2}{N_1} + \frac{s_3^2 + q_K^2}{K_3} + \frac{s_2^2 + q_N^2}{N_2} + \frac{s_2^2 + q_K^2}{K_2}.    (36)

Equations (33)–(36) show that the uncertainties in the fitted parameters increase with the laboratory and artefact variances, increase with the distance between loops owing to the additional uncertainties in the artefact correction terms, and decrease with the number of measurements submitted to the comparison.
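Continuing the numerical sketch given after figure 2, the covariance matrix and the checks of (33) and (35) take only a few lines; note that cov(3, 3), cov(3, 4), and cov(4, 4) correspond to the zero-based indices (2, 2), (2, 3), and (3, 3).

```python
# Continuation of the earlier sketch: the inverse of the bordered matrix
# in (28) is the covariance matrix of the fitted parameters.
cov = np.linalg.inv(M)

# Equation (33): uncertainty in the fitted bias of Laboratory 2
u2_d2 = w[0]**2 * ((s[0]**2 + qN**2)/N1 + (s[1]**2 + qN**2)/N2) \
      + w[2]**2 * ((s[1]**2 + qK**2)/K2 + (s[2]**2 + qK**2)/K3)
print(np.isclose(cov[3, 3], u2_d2))                               # True

# Equation (35): pairwise fit uncertainty for d1 - d2
u2_d1d2 = (s[0]**2 + qN**2)/N1 + (s[1]**2 + qN**2)/N2
print(np.isclose(cov[2, 2] - 2*cov[2, 3] + cov[3, 3], u2_d1d2))   # True
```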
For the simplest comparison designs, including the simple case described by Cox [5], the results can be obtained intuitively since the solutions are simple combinations of the means of results, as given above. However, most comparisons are more complex. In this respect the algebraic examples presented here and in [5] are atypical; small increases in complexity, such as an additional artefact or linking laboratory, make an algebraic analysis very unwieldy. A more typical comparison would involve at least a dozen free parameters and is practically impossible to analyse algebraically. Indeed, the complexity of analysis in most comparisons is such that a sequential derivation of numerical equations from first principles, which has been the most common approach to date, is onerous and unlikely to yield a maximum-likelihood solution. In CCT-K3 [30], which involved 22 free parameters (15 laboratories and seven artefacts circulating in six comparison loops), the comparison topology allowed a ‘direct’ and an ‘inclusive’ route for computing some of the inter-laboratory differences, and both solutions were reported. A pure least-squares approach would use all the information at hand to compute a single set of inter-laboratory differences.

8. Discussion

8.1. The model and assumptions

An important prerequisite of least-squares analysis is that the model describes accurately the real situation: an incorrect model leads to statistically biased values for the fitted parameters. In particular, explicitly including laboratory biases in the model eliminates the biasing effect of other, possibly outlying, values of laboratory bias on all estimates of laboratory bias [8]. This is a key distinguishing feature of the analysis presented here and some earlier comparison analyses [18, 29] where zero bias was assumed. For the same reasons, it is important that the comparison analyst recognize and include physical effects (e.g. time-dependent artefact values) that might impact on the measurements.

One advantage of the assumption of laboratory bias is that outlier results do not need to be completely rejected from the analysis; it is sufficient to exclude their results from any KCRV definition, (10)–(13). Thus, the degrees of equivalence of all participants can be determined.


However, the outlier problem still exists in a lesser form since participants with inaccurate estimates of their repeatability will cause the weighting of the terms of the least-squares equation to be non-optimal. As discussed in section 8.2, this is not a major obstacle.

A second benefit of the assumption of laboratory bias is that it greatly simplifies analytical problems with possible correlations between different uncertainty terms. For laboratory and artefact variances, the assumption of statistical independence is reasonable, and hence the least-squares analysis is free of correlation effects. Correlation effects may occur in the evaluation of the degrees of equivalence, but in this case the calculations (17) and (21) for the degrees of equivalence can be easily extended to include the effects of correlations between the u_i(d_i). The least-squares solution presented here has similarities with the approach often used in mass comparisons [31, 32], except that the contributing uncertainties due to the laboratory non-repeatability and artefact transport are itemized separately, obviating the need for correlation coefficients. The approach described here also clarifies the calculation of the degree of equivalence and allows non-linear parametrization of the comparison model.

An alternative approach to least squares is to sequentially minimize the variance in each estimate of the model parameters [7, 8, 14]. For cases where all the artefacts are time independent and the artefact variances are all the same, the minimum-variance approach and the maximum-likelihood approach appear to yield the same solution [8]. When not all artefact variances are the same, the least-squares approach yields a solution close to, but not identical to, that for the minimum-variance approach. One of the advantages of the least-squares approach is that it yields a single ordered set of laboratory biases. If the minimum-variance criterion is applied separately to each parameter in the model, then the approach may not yield an ordered sequence of laboratory biases; that is, for any three measurements of laboratory difference d_1 − d_2, d_2 − d_3, and d_1 − d_3, the relation

(d_1 - d_2) + (d_2 - d_3) = (d_1 - d_3)    (37)

may not apply. This transitive property ensures that the laboratory biases can be placed in an ordered sequence and that conclusions drawn from tables of laboratory difference are not dependent on which columns or rows are considered. It also ensures that the laboratory differences can be expressed in a compact manner, such as a single graph rather than one graph for each participating laboratory, or a one-dimensional table rather than a two-dimensional table. Minimum-variance approaches, when applied successively to each participant's results, do not always have this property.

8.2. Consequences for comparison protocol

The analysis provided here requires that the uncertainties reported by the laboratories be separated into two parts: that characterizing the laboratory repeatability and that characterizing the range of values that may reasonably be attributed to the laboratory bias (section 2.1). This separation of the uncertainties has not always been included in comparison protocols to date. Often, although not always, the separation of laboratories' total uncertainty into terms for repeatability and possible bias may be equivalent to separation of the terms into Type A and Type B estimates.

It is possible to make estimates for the laboratory and artefact variances from the residual errors in the least-squares fit. Plots of residual error versus participant provide indicative information on the laboratory variances, and plots versus artefact provide indicative information on the artefact variances. Since the weights in the fit are the sum of the two variances, for good estimates of the fit uncertainties it is sufficient to have the total of the two variances correct. For those laboratories that submit only one result, the residual is always zero and their reported laboratory variance affects only their result. An overall assessment of the quality of the variance estimates can be made by carrying out a chi-square test on the results of the least-squares sum (7).
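One possible form of that chi-square test, sketched here with invented numbers (the degrees of freedom are the number of measurements minus the number of fitted parameters, plus one per constraint):

```python
from scipy import stats   # assumes SciPy is available

chi2_min = 11.2                   # invented value of the minimized sum (7)
n_meas, n_params, n_constraints = 15, 8, 1
dof = n_meas - n_params + n_constraints

# Survival function of the chi-square distribution: a very small value
# suggests the reported variances are collectively too small (or the model
# is wrong); a value near 1 suggests they are too large.
p = stats.chi2.sf(chi2_min, dof)
print(dof, p)
```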
A key feature of the laboratories used to link RMO and key comparisons is that they are expected to exhibit the same laboratory bias for both comparisons. Therefore, laboratories that knowingly change their procedures (and therefore possibly their bias) between comparisons should alert the pilot laboratory to the changes.

9. Conclusions

Constrained-least-squares fitting offers a complete solution for analysing comparison data. The approach is applicable to comparisons involving multiple artefacts, of varying attributes, circulated amongst multiple overlapping comparison loops, and to laboratories that provide an arbitrary number of measurements of one or more of the artefacts. It enables the inclusion of artefact (transport) variances, parametrization of laboratory biases, correlation between laboratory uncertainties, and possibly autoregressive or moving average behaviour in artefact and laboratory variances. Equations (17) and (21) give the uncertainty terms for the degree of equivalence and the pairwise degree of equivalence, respectively.

The least-squares approach is also applicable to different types of comparisons. The analysis differs only in the way a constraint is applied. For known-value comparisons, the constraint allows the inclusion of predetermined artefact values. For key comparisons without known values, values for the artefacts are inferred by assuming that the average (median, mean, or weighted mean, as required) of laboratory biases is zero. For RMO and Supplementary comparisons, the constraint allows the inclusion of laboratory biases predetermined from the key comparisons. In each case the constraint is interpreted as a means of defining the KCRV.

The biggest advantage of the least-squares approach is that it offers a single algorithm that may be applied to all comparisons independent of their complexity.

Acknowledgments

The author gratefully acknowledges valuable comments from the referee and useful discussions with colleagues at MSL: P Saunders, K Jones, L Christian, and C Sutton.


References

[1] 1999 Mutual Recognition of National Measurement Standards and of Calibration and Measurement Certificates Issued by National Measurement Institutes (Paris: Comité International des Poids et Mesures)
[2] ISO/IEC Guide 43 1997 Proficiency Testing by Interlaboratory Comparisons, Parts 1 and 2 (Geneva: International Organisation for Standardisation)
[3] ISO 5725 1994 Accuracy (Trueness and Precision) of Measurement Methods and Results, Parts 1 to 6 (Geneva: International Organisation for Standardisation)
[4] Cox M G 2002 Metrologia 39 587–8
[5] Cox M G 2002 Metrologia 39 589–95
[6] White D R 2000 CPEM Digest 325–6
[7] White D R 2000 Consultative Committee for Thermometry, working document CCT/2000-2
[8] White D R, Christian L A, Jones K and Saunders P 2000 Consultative Committee on Electricity and Magnetism, working document CCEM/WGKC 00/14
[9] Cox M G and Harris P M 2001 CIE Expert Symp. on Uncertainty Evaluation (Vienna, Austria) pp 22–4
[10] 1993 Guide to the Expression of Uncertainty in Measurement (Geneva: International Organisation for Standardisation)
[11] 1993 International Vocabulary of Basic and General Terms in Metrology (Geneva: International Organisation for Standardisation)
[12] de Groot M 2003 Consultative Committee for Thermometry, working document CCT/03-31
[13] Elster C, Link A and von Martens H-J 2001 Meas. Sci. Technol. 12 1672–7
[14] Helistö P and Seppä H 2003 IEEE Trans. Instrum. Meas. IM-52 495–9
[15] Chatfield C 1980 The Analysis of Time Series: An Introduction (London: Chapman and Hall)
[16] Press W H, Flannery B P, Teukolsky S A and Vetterling W T 1986 Numerical Recipes (Cambridge: Cambridge University Press)
[17] Spiegel M R 1963 Schaum's Outline of Theory and Problems of Advanced Calculus (New York: McGraw-Hill)
[18] Cox M G 1999 NPL Report CISE 42/99 (Teddington: National Physical Laboratory)
[19] Cox M G 2000 Advanced Mathematical and Computational Tools ed P Carlini et al (Singapore: World Scientific) pp 45–65
[20] Willink R 2002 Metrologia 39 343–54
[21] Jeffery A-M 2002 Metrologia 39 Tech. Suppl. 01003
[22] Elster C and Link A 2001 Meas. Sci. Technol. 12 1431–8
[23] Ballico M 2001 Metrologia 38 155–9
[24] Cox M (ed) 1999 NPL Workshop on Statistical Analysis of Inter-Laboratory Comparisons (Teddington: National Physical Laboratory)
[25] Willink R 2003 Metrologia 40 9–17
[26] Steele A G, Wood B M and Douglas R J 2001 Metrologia 38 483–8
[27] Milton M J T and Cox M G 2003 Metrologia 40 L1
[28] Beissner K 2002 Metrologia 39 59–63
[29] Nielsen L 2000 Consultative Committee on Electricity and Magnetism, working document CCEM/WGKC 00-13
[30] Mangum B W et al 2002 Metrologia 39 179–205
[31] Bich W 1990 Metrologia 27 111–16
[32] Sutton C M 2004 Analysis and linking of international measurement comparisons Metrologia submitted
