

The Statistician 32 (1983) 307-317


© 1983 Institute of Statisticians

Measurement in Medicine: the Analysis of Method
Comparison Studies

D. G. ALTMAN and J. M. BLAND

Division of Computing and Statistics, MRC Clinical
Research Centre, Watford Road, Harrow HA1 3UJ, and
Department of Clinical Epidemiology and Social Medicine,
St George's Hospital Medical School, Cranmer Terrace, London SW17


Summary: Methods of analysis used in the comparison of two methods of
measurement are reviewed. The use of correlation, regression and the difference
between means is criticized. A simple parametric approach is proposed based on
analysis of variance and simple graphical methods.



1 The problem

In medicine we often want to compare two different methods of measuring some quantity,
such as blood pressure, gestational age, or cardiac stroke volume. Sometimes we compare an
approximate or simple method with a very precise one. This is a calibration problem, and we
shall not discuss it further here. Frequently, however, we cannot regard either method as
giving the true value of the quantity being measured. In this case we want to know whether the
methods give answers which are, in some sense, comparable. For example, we may wish to see
whether a new, cheap and quick method produces answers that agree with those from an
established method sufficiently well for clinical purposes. Many such studies, using a variety
of statistical techniques, have been reported. Yet few really answer the question 'Do the two
methods of measurement agree sufficiently closely?'
In this paper we shall describe what is usually done, show why this is inappropriate,
suggest a better approach, and ask why such studies are done so badly. We will restrict our
consideration to the comparison of two methods of measuring a continuous variable, although
similar problems can arise with categorical variables.

2 Incorrect methods of analysis

We shall first describe some examples of method comparison studies, where the statistical
methods used were not appropriate to answer the question.

Comparison of means
Cater (1979) compared two methods of estimating the gestational age of human babies.



†Paper presented at the Institute of Statisticians conference, July 1981.
Gestational age was calculated from the last menstrual period (LMP) and also by the total maturity
score based on external physical characteristics (TMS). He divided the babies into three groups:
normal birthweight babies, low birthweight pre-term (≤ 36 weeks gestation) babies, and low
birthweight term babies. For each group he compared the mean by each method (using an
unspecified test of significance), finding the mean gestational age to be significantly different for
pre-term babies but not for the other groups. It was concluded that 'the TMS is a convenient and
accurate method of assessing gestational age in term babies'.
His criterion of agreement was that the two methods gave the same mean measurement; 'the
same' appears to stand for 'not significantly different'. Clearly, this approach tells us very little
about the accuracy of the methods. By his criterion, the greater the measurement error, and hence
the less chance of a significant difference, the better.

Correlation
The favourite approach is to calculate the product-moment correlation coefficient, r, between the
two methods of measurement. Is this a valid measure of agreement? The correlation coefficient in
this case depends on both the variation between individuals (i.e. between the true values) and the
variation within individuals (measurement error). In some applications the 'true' value will be the
subject's average value over time, and short-term within-subject variation will be part of the
measurement error. In others, where we wish to identify changes within subjects, the true value is
not assumed constant.
The correlation coefficient will therefore partly depend on the choice of subjects. For if the
variation between individuals is high compared to the measurement error the correlation will be
high, whereas if the variation between individuals is low the correlation will be low. This can be
seen if we regard each measurement as the sum of the true value of the measured quantity and the
error due to measurement. We have:
    variance of true values = σT²
    variance of measurement error, method A = σA²
    variance of measurement error, method B = σB²

In the simplest model errors have expectation zero and are independent of one another and of the
true value, so that

    variance of method A = σT² + σA²
    variance of method B = σT² + σB²
    covariance = σT² (see appendix)

Hence the expected value of the sample correlation coefficient r is

    ρ = σT² / √[(σT² + σA²)(σT² + σB²)]

Clearly ρ is less than one, and it depends only on the relative sizes of σT², σA² and σB². If σA²
and σB² are not small compared to σT², the correlation will be small no matter how good the
agreement between the two methods.
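The dependence of the expected correlation on the spread of the true values can be illustrated with a small simulation. This is a sketch under assumed values (a measurement error SD of 5 for both methods, and three choices of between-subject SD), not data from any of the studies discussed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
sd_err = 5.0  # assumed measurement error SD, the same for both methods

for sd_true in (2.0, 5.0, 20.0):  # assumed between-subject SD of the true values
    t = rng.normal(100.0, sd_true, n)    # true values
    a = t + rng.normal(0.0, sd_err, n)   # method A = truth + error
    b = t + rng.normal(0.0, sd_err, n)   # method B = truth + error
    # Expected correlation from the formula above:
    expected = sd_true**2 / np.sqrt((sd_true**2 + sd_err**2) * (sd_true**2 + sd_err**2))
    observed = np.corrcoef(a, b)[0, 1]
    print(f"sd_true = {sd_true:5.1f}:  expected rho = {expected:.3f}, observed r = {observed:.3f}")
```

With identical measurement error throughout, the correlation moves from roughly 0.14 to roughly 0.94 purely because the subjects differ more or less between themselves.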
In the extreme case, when we have several pairs of measurements on the same individual,
σT² = 0 (assuming that there are no temporal changes), and so ρ = 0 no matter how close the
agreement is. Keim et al. (1976) compared dye-dilution and impedance cardiography by finding
the correlation between repeated pairs of measurements by the two methods on each of 20 patients.
The 20 correlation coefficients ranged from −0.77 to +0.80, with one correlation being significant at
the 5 per cent level. They concluded that the two methods did not agree because low correlations
were found when the range of cardiac output was small, even though other studies covering a wide
range of cardiac output had shown high correlations. In fact the result of their analysis may be
explained on the statistical grounds discussed above, the expected value of the correlation
coefficient being zero. Their conclusion that the methods did not agree was thus wrong - their
approach tells us nothing about dye-dilution and impedance cardiography.
As already noted, another implication of the expected value of r is that the observed
correlation will increase if the between-subject variability increases. A good example of this is
given by the measurement of blood pressure. Diastolic blood pressure varies less between
individuals than does systolic pressure, so that we would expect to observe a worse correlation
for diastolic pressures when methods are compared in this way. In two papers (Laughlin et al.,
1980; Hunyor et al., 1978) presenting between them 11 pairs of correlations, this phenomenon
was observed every time (Table 1). It is not an indication that the methods agree less well for
diastolic than for systolic measurements. This table provides another illustration of the effect on
the correlation coefficient of variation between individuals. The sample of patients in the study
of Hunyor et al. had much greater standard deviations than the sample of Laughlin et al. and the
correlations were correspondingly greater.

Table 1. Correlation coefficients between methods of measurement of blood pressure for
systolic and diastolic pressures

                          Systolic pressure        Diastolic pressure
                          sA      sB      r        sA      sB      r
Laughlin et al. (1980)
  1                       13.4*   15.3*   0.69     6.1*    6.3*    0.63
  2                                       0.83                     0.55
  3                                       0.68                     0.48
  4                                       0.66                     0.37
Hunyor et al. (1978)
  1                       40.0    40.3    0.997    15.9    13.2    0.938
  2                       41.5    36.7    0.994    15.5    14.0    0.863
  3                       40.1    41.8    0.970    16.2    17.8    0.927
  4                       41.6    38.8    0.984    14.7    15.0    0.736
  5                       40.6    37.9    0.985    15.9    19.0    0.685
  6                       43.3    37.0    0.987    16.7    15.5    0.789
  7                       45.5    38.7    0.967    23.9    26.9    0.941

* Standard deviations for four sets of data combined.

A further point of interest is that even what appears (visually) to be fairly poor
agreement can produce fairly high values of the correlation coefficient. For example, Serfontein
and Jaroszewicz (1978) found a correlation of 0.85 when they compared two more methods of
assessing gestational age, the Robinson and the Dubowitz. They concluded that because the
correlation was high and significantly different from zero, agreement was good. However, from
their data a baby with a gestational age of 35 weeks by the Robinson method could have been
anything between 34 and 39.5 weeks by the Dubowitz method. For two methods which purport
to measure the same thing the agreement between them is not close, because what may be a high
correlation in other contexts is not high when comparing things that should be highly related
anyway. The test of significance of the null hypothesis ρ = 0 is beside the point. It is unlikely
that we would consider totally unrelated quantities as candidates for a method comparison study.
The correlation coefficient is not a measure of agreement; it is a measure of association.
Thus it is quite wrong, for example, to infer from a high correlation that 'the methods ... may
be used interchangeably' (Hallman and Teramo, 1981).
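The distinction between association and agreement is easy to demonstrate numerically. In the hypothetical sketch below, method B reads exactly 10 units higher than method A on every subject, yet the correlation is perfect:

```python
import numpy as np

# Hypothetical paired readings: B is always 10 units above A.
a = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
b = a + 10.0

r = np.corrcoef(a, b)[0, 1]   # correlation is unaffected by a constant shift
bias = np.mean(b - a)         # the two methods never agree
print(f"r = {r:.3f}, mean difference = {bias:.1f}")
```

The association is perfect (r = 1) even though every pair of readings disagrees by 10 units, which is exactly why a high r says nothing about interchangeability.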
At the extreme, when measurement error is very small and correlations correspondingly
high, it becomes difficult to interpret differences. Oldham et al. (1979) state that: 'Connecting
[two types of peak flow meter] in series produces a correlation coefficient of 0.996, which is a
material improvement on the figure of 0.992 obtained when they are used separately.' It is
difficult to imagine another context in which it were thought possible to improve materially on
a correlation of 0.992. As Westgard and Hunt (1973) have said: 'The correlation coefficient ...
is of no practical use in the statistical analysis of comparison data.'

Regression
Linear regression is another misused technique in method comparison studies. Often the slope
of the least squares regression line is tested against zero. This is equivalent to testing the
correlation coefficient against zero, and the above remarks apply. A more subtle problem is
illustrated by the work of Carr et al. (1979), who compared two methods of measuring the
heart's left ventricular ejection fraction. These authors gave not only correlation coefficients
but the regression line of one method, Teichholz, on the other, angiography.
They noted that the slope of the regression line differed significantly from the line of
identity. Their implied argument was that if the methods were equivalent the slope of the
regression line would be 1. However, this ignores the fact that both dependent and
independent variables are measured with error. In our previous notation the expected slope is
σT²/(σT² + σA²) and is therefore less than 1. How much less than 1 depends on the amount of
measurement error of the method chosen as independent. Similarly, the expected value of the
intercept will be greater than zero (by an amount that is the product of the mean of the true
values and the bias in the slope) so that the conclusion of Ross et al. (1982) that 'with a slope
not differing significantly from unity but a statistically highly significant y-intercept, the
presence of a systematic difference ... is demonstrated' is unjustified.
We do not reject regression totally as a suitable method of analysis, and will discuss it
further below.
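The attenuation of the least squares slope can be checked by simulation. This is a sketch under assumed values (between-subject SD of 10, error SD of 5 for each method), for which the expected slope is 100/(100 + 25) = 0.8:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
sd_true, sd_err = 10.0, 5.0  # assumed between-subject and error SDs

t = rng.normal(50.0, sd_true, n)
a = t + rng.normal(0.0, sd_err, n)  # 'independent' variable, measured with error
b = t + rng.normal(0.0, sd_err, n)  # 'dependent' variable

slope = np.polyfit(a, b, 1)[0]      # least squares slope of B on A
expected = sd_true**2 / (sd_true**2 + sd_err**2)
print(f"fitted slope = {slope:.3f}, expected slope = {expected:.3f}")
```

Although the two methods are interchangeable by construction, the fitted slope sits near 0.8, not 1, so a test of the slope against unity is bound to mislead.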

Asking the right question
None of the previously discussed approaches tells us whether the methods can be considered
equivalent. We think that this is because the authors have not thought about what question
they are trying to answer. The questions to be asked in method comparison studies fall into
two categories:
(a) Properties of each method:
How repeatable are the measurements?
(b) Comparison of methods:
Do the methods measure the same thing on average? That is, is there any relative
bias? What additional variability is there? This may include both errors due to
repeatability and errors due to patient/method interactions. We summarize all this as
'error'.

Under properties of each method we could also include questions about variability
between observers, between times, between places, between position of subject, etc. Most
studies standardize these, but do not consider their effects, although when they are considered,
confusion may result. Altman's (1979) criticism of the design of the study by Serfontein and
Jaroszewicz (1978) provoked the response that: 'For the actual study it was felt that the fact
assessments were made by two different observers (one doing only the Robinson technique
and the other only the Dubowitz method) would result in greater objectivity' (Serfontein and
Jaroszewicz, 1979). The effects of method and observer are, of course, totally confounded.
We emphasize that this is a question of estimation, both of error and bias. What we need is
a design and analysis which provide estimates of both error and bias. No single statistic can
estimate both.

3 Proposed method of analysis

Just as there are several invalid approaches to this problem, there are also various possible
types of analysis which are valid, but none of these is without difficulties. We feel that a
relatively simple pragmatic approach is preferable to more complex analyses, especially when
the results must be explained to non-statisticians.
It is difficult to produce a method that will be appropriate for all circumstances. What
follows is a brief description of the basic strategy that we favour; clearly the various possible
complexities which could arise might require a modified approach, involving additional or
even alternative analyses.

Properties of each method: repeatability
The assessment of repeatability is an important aspect of studying alternative methods of
measurement. Replicated measurements are, of course, essential for an assessment of
repeatability, but to judge from the medical literature the collection of replicated data is rare.
One possible reason for this will be suggested later.
Repeatability is assessed for each measurement method separately from replicated
measurements on a sample of subjects. We obtain a measure of repeatability from the within-
subject standard deviation of the replicates. The British Standards Institution (1979) define a
coefficient of repeatability as 'the value below which the difference between two single test
results ... may be expected to lie with a specified probability'; in the absence of other
indications, the probability is 95 per cent. Provided that the differences can be assumed to
follow a Normal distribution this coefficient is 2.83 σw, where σw is the within-subject standard
deviation. σw must be estimated from a suitable experiment. For the purposes of the present
analysis the standard deviation alone can be used as the measure of repeatability.
It is important to ensure that the within-subject repeatability is not associated with the
size of the measurements, in which case the results of subsequent analyses might be
misleading. The best way to look for an association between these two quantities is to plot the
standard deviation against the mean. If there are two replicates x1 and x2 then this reduces to a
plot of |x1 − x2| against (x1 + x2)/2. From this plot it is easy to see if there is any tendency for
the amount of variation to change with the magnitude of the measurements. The correlation
coefficient could be tested against the null hypothesis of r = 0 for a formal test of
independence.
If the within-subject repeatability is found to be independent of the size of the
measurements, then a one-way analysis of variance can be performed. The residual standard
deviation is an overall measure of repeatability, pooled across subjects.
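For two replicates per subject the whole procedure is short. The sketch below uses simulated data (an assumed within-subject SD of 4 and 20 subjects, not values from any study cited here): it first checks the |x1 − x2| versus mean association informally, then pools the within-subject variance as a one-way analysis of variance would:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_reps = 20, 2
true = rng.normal(100.0, 15.0, n_subjects)                       # simulated true values
x = true[:, None] + rng.normal(0.0, 4.0, (n_subjects, n_reps))   # replicate readings

# Association of within-subject variation with the size of the measurement:
abs_diff = np.abs(x[:, 0] - x[:, 1])   # |x1 - x2|
pair_mean = x.mean(axis=1)             # (x1 + x2)/2
r_check = np.corrcoef(abs_diff, pair_mean)[0, 1]

# Residual (within-subject) standard deviation, pooled across subjects:
s_w = np.sqrt(np.mean(np.var(x, axis=1, ddof=1)))
repeatability = 2.83 * s_w             # 95% limit for the difference of two single results
print(f"r(|diff|, mean) = {r_check:.2f}, s_w = {s_w:.2f}, repeatability = {repeatability:.2f}")
```

With two replicates each subject contributes one degree of freedom, so the mean of the per-subject variances is exactly the pooled one-way analysis of variance residual variance.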
If, however, an association is observed, the results of an analysis of variance could be
misleading. Several approaches are possible, the most appealing of which is the transformation
of the data to remove the relationship. In practice the logarithmic transformation will often be
suitable. If the relationship can be removed, a one-way analysis of variance can be carried out.
Repeatability can be described by calculating a 95 per cent range for the difference between
two replicates. Back-transformation provides a measure of repeatability in the original units.
In the case of log transformation the repeatability is a percentage of the magnitude of the
measurement rather than an absolute value. It would be preferable to carry out the same
transformation for measurement by each method, but this is not essential, and may be totally
inappropriate.
If transformation is unsuccessful, then it may be necessary to analyse data from a
restricted range of measurements only, or to subdivide the scale into regions to be analysed
separately. Neither of these approaches is likely to be particularly satisfactory. Alternatively,
the repeatability can be defined as a function of the size of the measurement.

Properties of each method: other considerations
Many factors may affect a measurement, such as observer, time of day, position of subject,
particular instrument used, laboratory, etc. The British Standards Institution (1979) distinguish
between repeatability, described above, and reproducibility, 'the value below which two
single test results ... obtained under different conditions ... may be expected to lie with a
specified probability'. There may be difficulties in carrying out studies of reproducibility in
many areas of medical interest. For example, the gestational age of a newborn baby could not
be determined at different times of year or in different places. However, when it is possible to
vary conditions, observers, instruments, etc., the methods described above will be appropriate
provided the effects are random. When effects are fixed, for example when comparing an
inexperienced observer and an experienced observer, the approach used to compare different
methods, described below, should be used.

Comparison of methods
The main emphasis in method comparison studies clearly rests on a direct comparison of the
results obtained by the alternative methods. The question to be answered is whether the
methods are comparable to the extent that one might replace the other with sufficient accuracy
for the intended purpose of measurement.

Fig. 1. Comparison of two methods of measuring systolic blood pressure.

The obvious first step, one which should be mandatory, is to plot the data. We first
consider the unreplicated case, comparing methods A and B. Plots of this type are very
common and often have a regression line drawn through the data. The appropriateness of
regression will be considered in more detail later, but whatever the merits of this approach, the
data will always cluster around a regression line by definition, whatever the agreement. For
the purposes of comparing the methods the line of identity (A = B) is much more informative,
and is essential to get a correct visual assessment of the relationship. An example of such a
plot is given in Figure 1, where data comparing two methods of measuring systolic blood
pressure are shown.
Although this type of plot is very familiar and in frequent use, it is not the best way of looking
at this type of data, mainly because much of the plot will often be empty space. Also, the
greater the range of measurements the better the agreement will appear to be. It is preferable
to plot the difference between the methods (A − B) against their average, (A + B)/2. Figure 2
shows the data from Figure 1 replotted in this way. From this type of plot it is much easier to
assess the magnitude of disagreement (both error and bias), spot outliers, and see whether
there is any trend, for example an increase in A − B for high values. This way of plotting the
data is a very powerful way of displaying the results of a method comparison study. It is
closely related to the usual plot of residuals after model fitting, and the patterns observed may
be similarly varied. In the example shown (Figure 2) there was a significant relationship
between the method difference and the size of measurement (r = 0.45, n = 25, P = 0.02). This
test is equivalent to a test of equality of the total variances of measurements obtained by the
two methods (Pitman, 1939; see Snedecor and Cochran, 1967, pp. 195-7).
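The equivalence Pitman noted rests on a sample identity: the covariance of the difference with the average is exactly half the difference of the two variances, so testing the correlation of A − B with (A + B)/2 against zero tests the equality of the total variances. A minimal sketch with simulated readings (assumed values, not the Figure 2 data):

```python
import numpy as np

rng = np.random.default_rng(7)
t = rng.normal(120.0, 15.0, 25)    # simulated true systolic pressures
a = t + rng.normal(0.0, 4.0, 25)   # method A readings
b = t + rng.normal(0.0, 4.0, 25)   # method B readings

diff, avg = a - b, (a + b) / 2.0
r_da = np.corrcoef(diff, avg)[0, 1]  # the correlation used in Pitman's test

# Identity behind the test: cov(A - B, (A + B)/2) = (var A - var B) / 2
lhs = np.cov(diff, avg)[0, 1]
rhs = (np.var(a, ddof=1) - np.var(b, ddof=1)) / 2.0
print(f"r = {r_da:.3f}, cov = {lhs:.3f}, (varA - varB)/2 = {rhs:.3f}")
```

The identity holds exactly in any sample, which is why a zero correlation on this plot corresponds to equal total variances for the two methods.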


Fig. 2. Data from Figure 1 replotted to show the difference between the two methods
against the average measurement.


As in the investigation of repeatability, we are looking here for the independence of the
between-method differences and the size of the measurements. With independence the
methods may be compared very simply by analysing the individual A − B differences. The
mean of these differences will be the relative bias, and their standard deviation is the estimate
of error. The hypothesis of zero bias can be formally examined by a paired t-test.
For the data of Carr et al. (1979) already discussed, the correlation of the individual
differences with the average value was 0.36 (P > 0.1), so that an assumption of independence
is not contradicted by the data. Figure 3 shows these data plotted in the suggested manner.
Also shown is a histogram of the individual between-method differences, and superimposed
on the data are lines showing the mean difference and a 95 per cent range calculated from the
standard deviation. A composite plot like this is much more informative than the usual plot
(such as Figure 1).
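With independence established, the whole comparison reduces to a few lines. This sketch uses simulated data (an assumed systematic offset of 1 unit between methods), not the Carr et al. values, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
t = rng.normal(55.0, 10.0, n)       # simulated true ejection fractions
a = t + rng.normal(0.0, 3.0, n)     # method A
b = t + rng.normal(1.0, 3.0, n)     # method B, with a small systematic offset

d = a - b
bias = d.mean()                     # relative bias
error = d.std(ddof=1)               # estimate of error
low, high = bias - 1.96 * error, bias + 1.96 * error  # 95% range for a difference
t_stat = bias / (error / np.sqrt(n))  # paired t statistic (n - 1 df) for zero bias
print(f"bias = {bias:.2f}, error = {error:.2f}, "
      f"95% range = ({low:.2f}, {high:.2f}), t = {t_stat:.2f}")
```

The mean and standard deviation of the differences answer both questions at once: the bias, and the range within which most between-method differences are expected to lie.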
If there is an association between the differences and the size of the measurements, then
as before, a transformation (of the raw data) may be successfully employed. In this case the 95
per cent limits will be asymmetric and the bias will not be constant. Additional insight into the
appropriateness of a transformation may be gained from a plot of |A − B| against (A + B)/2, if
the individual differences vary either side of zero. In the absence of a suitable transformation
it may be reasonable to describe the differences between the methods by regressing A − B on
(A + B)/2.
For replicated data, we can carry out these procedures using the means of the replicates.
The estimate of bias will be unaffected, but the error will be reduced. We can estimate the
standard deviation of the difference between individual measurements from the standard
deviation of the difference between means by

    var(A − B) = n var(Ā − B̄)

where n is the number of replicates.
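The relation can be verified numerically. In this sketch (assumed error SD of 2 for each method and three replicates per subject), the variance of single-measurement differences is close to n times the variance of the difference of replicate means:

```python
import numpy as np

rng = np.random.default_rng(5)
n_subjects, n_reps = 2000, 3
t = rng.normal(0.0, 10.0, n_subjects)
a = t[:, None] + rng.normal(0.0, 2.0, (n_subjects, n_reps))  # replicates by method A
b = t[:, None] + rng.normal(0.0, 2.0, (n_subjects, n_reps))  # replicates by method B

var_single = np.var(a[:, 0] - b[:, 0], ddof=1)               # single measurements
var_means = np.var(a.mean(axis=1) - b.mean(axis=1), ddof=1)  # means of replicates
print(f"var(A - B) = {var_single:.2f}, n * var(mean diff) = {n_reps * var_means:.2f}")
```

Both quantities estimate the same error variance (here 4 + 4 = 8 by construction); averaging n replicates shrinks it by a factor of n, which the formula undoes.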
With replicated data it may be felt desirable to carry out a two-way analysis of variance,
with main effects of individuals and methods, in order to get better estimates. Such an analysis
would need to be supported by the analysis of repeatability, and in the event of the two
methods not being equally repeatable the analysis would have to be weighted appropriately.
The simpler analysis of method differences (Figure 2) will also need to be carried out to
ascertain that the differences are independent of the size of the measurements, as otherwise the
answers might be misleading.

Alternative analyses
One alternative approach is least squares regression. We can use regression to predict the
measurement obtained by one method from the measurement obtained by the other, and
calculate a standard error for this prediction. This is, in effect, a calibration approach and does
not directly answer the question of comparability. There are several problems that can arise,
some of which have already been referred to. Regression does not yield a single value for
relative precision (error), as this depends upon the distance from the mean. If we do try to use
regression methods to assess comparability, difficulties arise because there is no obvious estimate
of bias, and the parameters are difficult to interpret. Unlike the analysis of variance model, the
parameters are affected by the range of the observations, and for the results to apply generally
the methods ought to have been compared on a random sample of subjects - a condition that
will very often not be met. The problem of the underestimation (attenuation) of the slope of
the regression line has been considered by Yates (Healy, 1958), but the other problems
remain.



Fig. 3. Comparison of two methods of measuring left ventricular ejection fraction (Carr et
al., 1979) replotted to show error and bias.

Other methods which have been proposed include principal component analysis (or
orthogonal regression) and regression models with errors in both variables (structural
relationship models) (see for example Carey et al., 1975; Lawton et al., 1979; Cornbleet and
Gochman, 1979; Feldmann et al., 1981). The considerable extra complexity of such analysis
will not be justified if a simple comparison is all that is required. This is especially true when
the results must be conveyed to and used by non-experts, e.g. clinicians. Such methods will be
necessary, however, if it is required to predict one measurement from the other - this is nearer
to calibration and is not the problem we have been addressing in this paper.

Why does the comparison of methods cause so much difficulty?

The majority of medical method comparison studies seem to be carried out without the benefit
of professional statistical expertise. Because virtually all introductory courses and textbooks in
statistics are method-based rather than problem-based, the non-statistician will search in vain
for a description of how to proceed with studies of this nature. It may be that, as a
consequence, textbooks are scanned for the most similar-looking problem, which is
undoubtedly correlation. Correlation is the most commonly used method, which may be one
reason for so few studies involving replication, since simple correlation cannot cope with
replicated data. A further reason for poor methodology is the tendency for researchers to
imitate what they see in other published papers. So many papers are published in which the
same incorrect methods are used that researchers can perhaps be forgiven for assuming that
they are doing the right thing. It is to be hoped that journals will become enlightened and
return papers using inappropriate techniques for reanalysis.
Another factor is that some statisticians are not as aware of this problem as they might be. As
an illustration of this, the blood pressure data shown in Figures 1 and 2 were taken from the
book Biostatistics by Daniel (1978), where they were used as the example of the calculation of
the correlation coefficient. A counter-example is the whole chapter devoted to method
comparison (by regression) by Strike (1981). More statisticians should be aware of this
problem, and should use their influence to similarly increase the awareness of their non-
statistical colleagues of the fallacies behind many common methods.

Conclusions

1. Most common approaches, notably correlation, do not measure agreement.
2. A simple approach to the analysis may be the most revealing way of looking at the
data.
3. There needs to be a greater understanding of the nature of this problem, by
statisticians, non-statisticians and journal referees.

Acknowledgements

We would like to thank Dr David Robson for helpful discussions during the preparation of this
paper, and Professor D. R. Cox, Professor M. J. R. Healy and Mr A. V. Swan for comments on
an earlier draft.

Appendix

Covariance of two methods of measurement in the presence of measurement errors
We have two methods A and B of measuring a true quantity T. They are related to T by
A = T + εA and B = T + εB, where εA and εB are experimental errors. We assume that the errors
have mean zero and are independent of each other and of T, and define the following variances:

    var(T) = σT², var(εA) = σA², and var(εB) = σB²

Now the covariance of A and B is given by

    cov(A, B) = E(AB) − E(A)E(B)
              = E{(T + εA)(T + εB)} − E(T + εA)E(T + εB)
              = E(T² + εA T + εB T + εA εB) − {E(T) + E(εA)}{E(T) + E(εB)}

But E(εA) = E(εB) = 0, and the errors and T are independent, so

    E(εA)E(T) = E(εB)E(T) = 0  and  E(εA εB) = E(εA)E(εB) = 0

Hence cov(A, B) = E(T²) − {E(T)}² = σT²

References
Altman, D. G. (1979). Estimation of gestational age at birth - comparison of two methods. Archives of
Disease in Childhood 54, 242-3.
British Standards Institution (1979). Precision of test methods, part 1: guide for the determination of
repeatability and reproducibility for a standard test method. BS 5497, Part 1. London.
Carey, R. N., Wold, S. and Westgard, J. O. (1975). Principal component analysis: an alternative to 'referee'
methods in method comparison studies. Analytical Chemistry 47, 1824-9.
Carr, K. W., Engler, R. L., Forsythe, J. R., Johnson, A. D. and Gosink, B. (1979). Measurement of left
ventricular ejection fraction by mechanical cross-sectional echocardiography. Circulation 59, 1196-1206.
Cater, J. I. (1979). Confirmation of gestational age by external physical characteristics (total maturity score).
Archives of Disease in Childhood 54, 794-5.
Cornbleet, P. J. and Gochman, N. (1979). Incorrect least-squares regression coefficients in method-
comparison analysis. Clinical Chemistry 25, 432-8.
Daniel, W. W. (1978). Biostatistics: a Foundation for Analysis in the Health Sciences, 2nd edn. Wiley,
New York.
Feldmann, U., Schneider, B., Klinkers, H. and Haeckel, R. (1981). A multivariate approach for the biometric
comparison of analytical methods in clinical chemistry. Journal of Clinical Chemistry and Clinical
Biochemistry 19, 121-37.
Hallman, M. and Teramo, K. (1981). Measurement of the lecithin/sphingomyelin ratio and
phosphatidylglycerol in amniotic fluid: an accurate method for the assessment of fetal lung maturity. British
Journal of Obstetrics and Gynaecology 88, 806-13.
Healy, M. J. R. (1958). Variations within individuals in human biology. Human Biology 30, 210-8.
Hunyor, S. M., Flynn, J. M. and Cochineas, C. (1978). Comparison of performance of various
sphygmomanometers with intra-arterial blood-pressure readings. British Medical Journal 2, 159-62.
Keim, H. J., Wallace, J. M., Thurston, H., Case, D. B., Drayer, J. I. M. and Laragh, J. H. (1976). Impedance
cardiography for determination of stroke index. Journal of Applied Physiology 41, 797-9.
Laughlin, K. D., Sherrard, D. J. and Fisher, L. (1980). Comparison of clinic and home blood-pressure levels
in essential hypertension and variables associated with clinic-home differences. Journal of Chronic
Diseases 33, 197-206.
Lawton, W. H., Sylvestre, E. A. and Young-Ferraro, B. J. (1979). Statistical comparison of multiple analytic
procedures: application to clinical chemistry. Technometrics 21, 397-416.
Oldham, H. G., Bevan, M. M. and McDermott, M. (1979). Comparison of the new miniature Wright peak
flow meter with the standard Wright peak flow meter. Thorax 34, 807-8.
Pitman, E. J. G. (1939). A note on normal correlation. Biometrika 31, 9-12.
Ross, H. A., Visser, J. W. E., der Kinderen, P. J., Tertoolen, J. F. W. and Thijssen, J. H. H. (1982). A
comparative study of free thyroxine estimations. Annals of Clinical Biochemistry 19, 108-13.
Serfontein, G. L. and Jaroszewicz, A. M. (1978). Estimation of gestational age at birth - comparison of two
methods. Archives of Disease in Childhood 53, 509-11.
Serfontein, G. L. and Jaroszewicz, A. M. (1979). Estimation of gestational age at birth - comparison of two
methods. Archives of Disease in Childhood 54, 243.
Snedecor, G. W. and Cochran, W. G. (1967). Statistical Methods, 6th edn. Iowa State University Press, Ames.
Strike, P. W. (1981). Medical Laboratory Statistics. P. S. G. Wright, Bristol.
Westgard, J. O. and Hunt, M. R. (1973). Use and interpretation of common statistical tests in method-
comparison studies. Clinical Chemistry 19, 49-57.
