
SERIES ON STATISTICS

Correlation, Agreement, and Bland–Altman Analysis: Statistical Analysis of Method Comparison Studies

CATEY BUNCE

Technology seems to evolve at a rapid pace these days, and new methods for measuring ocular characteristics seem to be emerging constantly. Although once there might have been a single method to assess intraocular pressure (IOP), ophthalmic researchers today are presented with a variety of tools: dynamic contour tonometry, Goldmann applanation tonometry, hand-held tonometers such as the Tono-Pen XL, Perkins tonometer, Draeger tonometer, etc. The same is true for visual field assessment (Humphrey Field Analyzer [Humphrey Instruments, Dublin, California, USA], Octopus perimeter [Interzeag, Schlieren, Switzerland] with their various threshold strategies), optic disc evaluation (confocal laser ophthalmoscope, scanning laser polarimeter), etc. It would be unwise simply to assume that measures made on the same person using different methods of measurement will agree, and so studies are designed to address the question. Such studies may compare measurement with a new piece of equipment (perhaps cheaper, faster, or smaller) with the so-called true measurement, but more often they compare two different measuring devices where neither can be said to offer the truth.

FIGURE 1. Bland–Altman plot showing the difference against the average of test A and standard measurements with limits of agreement (LoA) (broken lines); simulated data. This plot shows evidence of increasing discrepancy between standard and test A with increasing intraocular pressure (IOP), so that the LoA are far wider than would be the case were the relationship to be removed.
In 1983, Altman and Bland set out their views regarding the correct analysis of the data gathered in studies of this type and drew attention to a common misconception: that computing the Pearson correlation coefficient between the two measurements is appropriate.1,2 They explained that the Pearson correlation coefficient measures linear association rather than agreement and pointed out that methods can correlate well yet disagree greatly, as would occur if one method read consistently higher than the other. Bland and Altman commented that correlation typically depends on the range of measures being assessed: wider ranges often result in higher correlations, but not because the methods agree any better. They concluded that correlation coefficients can be misleading in method agreement studies and put forward their alternative method, the limits of agreement (LoA) technique.

Accepted for publication Sep 24, 2008.
From Moorfields Eye Hospital; the London School of Hygiene & Tropical Medicine; and the University College London Institute of Ophthalmology, London, United Kingdom.
Inquiries to Catey Bunce, Moorfields Eye Hospital, City Road, London EC1V 2PD, United Kingdom; e-mail: c.bunce@ucl.ac.uk

© 2009 BY ELSEVIER INC. ALL RIGHTS RESERVED. 0002-9394/09/$36.00
doi:10.1016/j.ajo.2008.09.032

Bland and Altman stress the need to assess two aspects of agreement: how well the methods agree on average and how well the measurements agree for individuals. If one method reads lower than the other for half of the subjects but higher for the other half, then the average discrepancy (the difference between measures on the same subject) may be close to 0, even though the discrepancies for individuals are large.

Average agreement, or bias, can be estimated by the mean of the differences for individuals, and commonly a t test is conducted against the null hypothesis of no bias. Estimates of bias then can be reported with 95% confidence intervals (CIs) computed as the mean difference ± 1.96 × standard error of the mean difference (the SD of the differences divided by the square root of the sample size).

Agreement for individuals is summarized in terms of LoA, which involves an examination of the variability of the differences. If the distribution of the differences is reasonably normal, that is, symmetric and without long tails (assessed by a histogram), and provided that the level of discrepancy does not depend on the level of the characteristic being measured, then 95% LoA can be
computed as the mean of the differences ± 1.96 × standard deviation (SD) of the differences.

The second assumption is assessed by examination of the Bland–Altman plot, a scatterplot of the difference between measurements against their average. The plot should be looked at to see whether there seems to be any relationship between discrepancy and the level of measurement (eg, increasing discrepancy between standard and test A with increasing IOP [Figure 1] or increasing variability of differences between instruments with increasing IOP [Figure 2]). Figure 3 shows the situation where there is no relationship between discrepancy and level of measurement, in which case 95% LoA would be appropriate.

Where relationships are observed, Bland and Altman make recommendations as to how to remove these by transformation or regression.3

FIGURE 2. Bland–Altman plot showing the difference against the average of test B and standard measurements with LoA (broken lines); simulated data. This plot shows evidence of increasing variability of differences between instruments with increasing IOP; here the LoA clearly are too wide at lower levels of IOP.

FIGURE 3. Bland–Altman plot showing the difference against the average of test C and standard measurements with LoA (broken lines); simulated data. This plot shows no relationship between discrepancy and the level of measurement, so that the LoA are valid.

VOL. 148, NO. 1 EDITORIAL

Ninety-five percent LoA quantify the range of values that can be expected to cover agreement for most of the subjects, thereby guiding the clinician as to whether methods agree sufficiently for use in clinical assessment. For example, 95% LoA between two methods of (−1 mm Hg, 6 mm Hg) would mean that for 95% of individuals, a measurement made by one method would be between 1 mm Hg less and 6 mm Hg more than a measurement made by the other method. It should be understood that "how small LoA should be to conclude that methods agree sufficiently" is a clinical, not a statistical, decision, and it is a decision that ideally is made in advance of the analysis.4 It is not possible to provide a formulaic approach that automatically classifies agreement as good or poor, or to provide guidance on which method to use when disagreement is considerable, because this will depend on the particular purpose for which measurements are being made. The question that needs consideration is whether the largest likely differences are small enough for the particular purpose for which measurements are wanted.

Because a single method comparison study provides only an estimate of the LoA, ideally the limits should be reported with their 95% CIs, computed as the lower or upper limit ± 1.96 × SE(limit), where SE(limit), the standard error of each limit, is approximately √(3s²/n), s being the SD of the differences between measurements by the two methods and n being the sample size.

It is important that studies comparing methods of measurement are adequately sized: if the number of subjects is small, then even large discrepancies between methods may not be detected. Such studies typically require 100 to 200 subjects. Without large numbers, there is a very real potential for incorrectly finding a new method acceptable and for such methods to be recommended for widespread use without justification.

Although there does seem to be evidence of increasing awareness of the methodology put forward by Altman and Bland,5 there also seems to be evidence of some common misunderstandings.6 Many authors use correlation in addition to limits of agreement, suggesting that they view these as complementary rather than as alternatives.7,8 Other authors seem to believe that the Bland–Altman plot is the analysis, rather than a check on the assumptions necessary for the LoA to be valid.

The methods put forward by Bland and Altman seem simple, and yet their message that the Pearson correlation coefficient is not the correct tool for assessing method agreement does not seem to have been fully acknowledged. The Bland–Altman plot is commonly included, yet its raison d'être (to check that the assumptions necessary for valid use of LoA are met) does not seem to be understood. Ophthalmic researchers are encouraged not to use Pearson correlation coefficients when analyzing data from method agreement studies and to report LoA and bias with their respective CIs.
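The recommendation to report bias and LoA, each with a CI, rather than a correlation coefficient can be made concrete with a short calculation. The Python sketch below is a minimal illustration under stated assumptions, not code from the editorial or from Bland and Altman: the function names `bland_altman` and `pearson_r` and the IOP readings are hypothetical. It uses only the standard library and applies the formulas described in the text: bias CI = mean difference ± 1.96 × SD/√n, LoA = mean difference ± 1.96 × SD, and CI for each limit = limit ± 1.96 × √(3s²/n). It keeps the 1.96 multiplier throughout, as the text does (a t quantile is sometimes preferred for small samples), and it does not check the normality or no-trend assumptions; those still need the histogram and the Bland–Altman plot.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation: measures linear association, not agreement."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(method_a, method_b, z=1.96):
    """Bias and 95% limits of agreement (LoA), each with an approximate 95% CI.

    Assumes paired readings on the same subjects, roughly normal differences,
    and no relationship between the difference and the level of measurement.
    These assumptions are NOT checked here: inspect a histogram of the
    differences and a Bland-Altman plot ("averages" vs "diffs") first.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length lists of at least 2 paired readings")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = mean(diffs)                    # average agreement (mean difference)
    s = stdev(diffs)                      # SD of the differences (n - 1 divisor)
    se_bias = s / math.sqrt(n)            # standard error of the mean difference
    lower, upper = bias - z * s, bias + z * s
    se_limit = math.sqrt(3 * s ** 2 / n)  # approximate SE of each limit
    return {
        "bias": bias,
        "bias_ci": (bias - z * se_bias, bias + z * se_bias),
        "loa": (lower, upper),
        "lower_ci": (lower - z * se_limit, lower + z * se_limit),
        "upper_ci": (upper - z * se_limit, upper + z * se_limit),
        # x and y coordinates for a Bland-Altman plot
        "averages": [(a + b) / 2 for a, b in zip(method_a, method_b)],
        "diffs": diffs,
    }


if __name__ == "__main__":
    # Hypothetical IOP readings (mm Hg) invented for illustration: the "new"
    # device reads about 3 mm Hg higher than the "standard" one.
    standard = [12, 14, 15, 17, 18, 20, 21, 23, 25, 28]
    noise = [0.5, -0.5, 1.0, -1.0, 0.0, 0.5, -0.5, 1.0, -1.0, 0.0]
    new = [x + 3 + e for x, e in zip(standard, noise)]

    res = bland_altman(new, standard)
    lo, hi = res["loa"]
    print(f"Pearson r = {pearson_r(new, standard):.2f}")
    print(f"bias = {res['bias']:.2f} mm Hg, 95% CI {res['bias_ci'][0]:.2f} to {res['bias_ci'][1]:.2f}")
    print(f"95% LoA = {lo:.2f} to {hi:.2f} mm Hg")
    print(f"CI for lower limit: {res['lower_ci'][0]:.2f} to {res['lower_ci'][1]:.2f}")
    print(f"CI for upper limit: {res['upper_ci'][0]:.2f} to {res['upper_ci'][1]:.2f}")
```

With these hypothetical readings the Pearson r is about 0.99 while the bias is 3 mm Hg: the two devices track each other closely yet consistently disagree, which is exactly the situation Altman and Bland warned about. The CIs around the limits also show why sample size matters: because SE(limit) ≈ √(3s²/n), each limit is uncertain by roughly ±1.96 × √3 × s/√n, about ±1.07s with the 10 subjects used here, compared with about ±0.34s with 100 subjects and ±0.24s with 200.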

This study was supported by the National Institute for Healthcare Research (NIHR), London, United Kingdom and Development funding. The author indicates no financial conflict of interest. The author was involved in the design of the study; collection, management, analysis, and interpretation of data; and preparation and review of the manuscript.

REFERENCES

1. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983;32:307–317.
2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–310.
3. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–160.
4. Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol 2003;22:85–93.
5. Patton N, Aslam T, Murray G. Statistical strategies to assess reliability in ophthalmology. Eye 2006;20:749–754.
6. Dewitte K, Fierens C, Stöckl D, Thienpont LM. Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002;48:799–801.
7. King AJ, Taguri A, Wadood AC, Azuara-Blanco A. Comparison of two fast strategies, SITA Fast and TOP, for the assessment of visual fields in glaucoma patients. Graefes Arch Clin Exp Ophthalmol 2002;240:481–487.
8. Allen RJ, Dev Borman A, Saleh GM. Applanation tonometry in silicone hydrogel contact wearers. Cont Lens Anterior Eye 2007;30:267–269.

AMERICAN JOURNAL OF OPHTHALMOLOGY JULY 2009
