
Ophthalmic and Physiological Optics ISSN 0275-5408

INVITED REVIEW

Should Pearson’s correlation coefficient be avoided?


Richard A Armstrong
School of Life and Health Sciences: Ophthalmic Research Group, School of Optometry, Aston University, Birmingham, UK

Citation information: Armstrong RA. Should Pearson’s correlation coefficient be avoided? Ophthalmic Physiol Opt 2019. https://doi.org/10.1111/opo.12636

Keywords: bivariate normal distribution, correlation, curvilinear regression, partial correlation, Pearson’s correlation coefficient (r), range restriction

Correspondence: Richard A Armstrong
E-mail address: r.a.armstrong@aston.ac.uk

Received: 24 May 2019; Accepted: 18 July 2019

Abstract

Purpose: To survey the use of Pearson’s correlation coefficient (r) and related statistical methods in the ophthalmic literature, to consider the limitations of r, and to suggest suitable alternative methods of analysis.

Recent findings: Searching Ophthalmic and Physiological Optics (OPO), Optometry and Vision Science (OVS), and Clinical and Experimental Optometry (CXO) online archives using correlation and Pearson’s r as search terms resulted in 4057 and 281 hits respectively. Coefficient of determination, r square, or r squared received fewer hits (65, 8, and 22 hits respectively). The assumption that the data follow a bivariate normal distribution was rarely addressed (3 hits) although several studies applied Spearman’s rank correlation (70 hits). The intra-class correlation coefficient (ICC) was widely used (178 hits), but fewer hits were recorded for partial correlation (43 hits) and multiple correlation (13 hits). There was little evidence that the problem of sample size was addressed in correlation studies.

Summary: Investigators should be alert to whether: (1) the relationship between two variables could be non-linear, (2) the data are bivariate normal, (3) r accounts for a significant proportion of the variance in Y, (4) outliers are present, the data are clustered, or have a restricted range, (5) the sample size is appropriate, and (6) a significant correlation indicates causality. In addition, the number of significant digits used to express r and the problems of multiple testing should be addressed. The problems and limitations of r suggest a more cautious approach regarding its use and the application of alternative methods where appropriate.

Introduction

Testing the degree of correlation between two or more variables is one of the most widely used of all statistical procedures1; the product moment correlation coefficient, also known as Pearson’s correlation coefficient (r), being one of the most frequently used statistics.2 In ophthalmic research, correlation methods have been applied in a wide variety of circumstances. First, to determine whether there is a statistically significant positive or negative relationship between two or more variables. Second, to measure the degree of statistical significance that can be attached to a correlation. Third, to determine what proportion of the variability in the dependent (Y) variable can be accounted for or ‘explained’ by the independent (X) variable, e.g., how much of the variation in standard visual acuity can be explained by other spatial vision measures.3 Fourth, to test the goodness of fit of a linear regression.1 In addition, several related statistical methods have been commonly applied including partial and multiple correlation coefficients and the family of intra-class correlation coefficients (ICC).

Despite its widespread application, Pearson’s r has many limitations, raising the question of whether its use should be restricted or even avoided. First, r tests only whether there is a linear relationship between two variables and a significant curvilinear relationship can result in a non-significant r. Second, the use of r assumes that the pairs of observations (x,y) are members of the bivariate normal distribution4 and failure of this assumption may require the use of a non-parametric correlation coefficient. Third, the square of the correlation coefficient (r²), also known as the ‘coefficient of determination’, measures the proportion of the variance associated with the Y variable that can be accounted for or ‘explained’ by the X variable. When more

© 2019 The Authors Ophthalmic & Physiological Optics © 2019 The College of Optometrists 1
Correlation coefficient R A Armstrong

than 50 pairs of observations are present, values of r as low as 0.3 are statistically significant at the 5% level of probability. This property of r can cause confusion in the interpretation of the results of a study, as a statistically significant r may be so low that the X variable accounts for a biologically meaningless proportion of the variance in Y. Fourth, r is sensitive to the distribution of the X and Y variables and may be dependent on their range and variance,3 the presence of non-homogeneous groups, or outlying values. Fifth, in many correlation studies, a significant value of r does not imply causality or, indeed, that there is any direct relationship between the two variables at all. Hence, two variables can be correlated not because they are causally related but because both have a significant degree of correlation with a third variable. A useful statistic when multiple variables are present is the ‘partial correlation coefficient’ (rp), i.e., the degree of correlation present between two variables when the effect of a third confounding variable is taken into account.4 Finally, there are the problems of how many decimal digits to quote for r and of multiple testing if many correlation tests are made in a study.

The purpose of this article is to review the use of r and related methods in the ophthalmic literature and to consider the question of whether its use should be restricted or even avoided. First, the frequency of use of the various correlation methods is assessed with reference to articles archived online by three optometric journals, viz. Ophthalmic and Physiological Optics (OPO), Optometry and Vision Science (OVS), and Clinical and Experimental Optometry (CXO). Second, various concerns are discussed with specific reference to: (1) the presence of non-linearity, (2) the bivariate normal distribution and non-parametric correlation coefficients, (3) r² and the issue of sample size, (4) the distribution of the X variable, (5) causality, (6) the number of decimal digits quoted for r, and (7) the problem of multiple testing. Third, where appropriate, alternative methods of analysis are suggested.

Methods

Journals

All articles published in the online archives of OPO, OVS, and CXO up until the end of March 2019 were accessed using various search terms relevant to correlation studies. There are a number of problems in these types of search. Hence, no single search term can capture all of the relevant articles in a large archive and so a variety of search terms were used for each topic. Search terms were divided into several groups, i.e., those relevant to: (1) correlation generally (correlation, correlation coefficient), (2) correlation more specifically (Pearson’s correlation coefficient, Pearson r, product-moment correlation coefficient), (3) the interpretation of r (coefficient of determination, r squared, r square), (4) non-parametric data (bivariate normal distribution, non-parametric correlation, Spearman rank correlation, Kendall rank correlation, Gamma), (5) sample size calculation (sample size calculation for r, correlation sample size), and (6) related concepts (non-linear correlation, intra-class correlation coefficient (ICC), partial correlation coefficient, multiple correlation coefficient). Nevertheless, many articles refer only to r or r² without a full definition and any search for ‘r’ inevitably attracts hits from text and reference lists which include ‘r’ or ‘R’ as an initial. The number of hits obtained from each archive for each journal using the above search terms was recorded.

Results

The number of hits (Table 1) using a general search term testifies to the frequency of use of correlation methods in the ophthalmic literature (4057 hits totaled over all three journals). The number of hits using a specific search term for the correlation coefficient was considerably lower, the most frequent being ‘Pearson r’ (281 hits) and ‘Pearson’s correlation coefficient’ (168 hits), the term ‘product-moment correlation coefficient’ receiving fewer hits (37 hits). The number of hits using the terms ‘coefficient of determination’, ‘r square’, or ‘r squared’ was lower still (65, 8, and 22 hits respectively), suggesting many studies did not assess the ‘significance’ of r in terms of the proportion of the variance in the Y variable attributed to X. However, many articles use the terms r and r² without further definition, which would not have been captured by the searches. Although the assumption that the data should fit a ‘bivariate normal distribution’ was rarely addressed directly (3 hits), a number of studies used a non-parametric correlation test, viz. Spearman’s rank correlation (70 hits), which was more popular than Kendall’s rank correlation (2 hits). Of related concepts, the ICC received most hits (178 hits in total), but there were considerably fewer hits for partial correlation (43 hits) and multiple correlation (13 hits), the terms curvilinear and polynomial regression receiving 7 and 19 hits respectively, the term non-linear correlation receiving no hits. Search terms pertinent to sample size calculation for r also received no hits, suggesting that sample size was rarely discussed with reference to correlation studies.

Limitations of r

Given the results of the survey, the following sections discuss the various limitations of r and related issues and suggest advice or alternative methods of analysis where appropriate.

The relationship may be non-linear

Pearson’s r tests the degree of linear correlation between two variables and is often applied without examining


whether a degree of curvature may be present. Some curvilinear relationships will also result in a significant r while others, such as a normal or inverted parabola, may not. Hence, examination of a graphical plot of the data is an essential first step in any correlation study to determine whether a degree of curvature may be present. In some circumstances, the curvature will be evident, and investigators can fit a specific type of curve, e.g., a logarithmic, exponential, or asymptotic function if there is a specific hypothesis regarding the shape of the relationship. Alternatively, a more general polynomial curve can be fitted without a specific hypothesis.4,5 In yet other circumstances, especially if the data exhibit a degree of scatter, a curved relationship may be less obvious and it may be important to test whether the data depart significantly from linearity.

Table 1. Frequency of ‘hits’ using various search terms pertinent to studies of correlation in all content and all archived online issues of three optometry journals

                                                    Journal
Exact search term                                   OPO       CXO       OVS       Total
Correlation                                         1287      1318      1452      4057
Correlation coefficient                             275       126       358       759
Pearson’s correlation coefficient                   62        24        82        168
Pearson r                                           214       27        40        281
Product moment correlation coefficient              13        5         19        37
Bivariate normal distribution                       1         1         1         3
Non-linear correlation                              0         0         0         0
Curvilinear regression                              4         1         2         7
Polynomial regression                               10        2         7         19
Hierarchical linear modelling                       0         0         1         1
Coefficient of determination                        23        13        29        65
r square, r squared                                 5, 9      0, 7      3, 6      8, 22
Non-parametric correlation                          5         1         1         7
Spearman rank correlation                           23        19        28        70
Kendall rank correlation                            2         0         0         2
Gamma                                               0         0         0         0
Intra-class correlation coefficient, ICC            17, 40    12, 65    3, 73     32, 178
Partial correlation                                 10        12        21        43
Multiple correlation                                3         4         6         13
Sample size calculation for ‘r’ (and variants)      0         0         0         0

CXO, Clinical and Experimental Optometry; OPO, Ophthalmic and Physiological Optics; OVS, Optometry and Vision Science.

If simple curvature is present, a second-order (quadratic) polynomial is often an adequate fit to the data.6–8 For example, the two variables in Figure 1 appear positively correlated and a straight line of positive slope appears to provide a good fit to the data. In this example, however, a degree of curvature could be present and the question may arise as to whether a curvilinear regression would fit the data any better than a straight line. This can be tested using analysis of variance (ANOVA) and the method is illustrated in Table 2. Essentially, a straight line is fitted to the data and an ANOVA carried out to obtain the sums of squares (SS) of the deviations from a linear regression.4 A second-order (quadratic) polynomial curve is then fitted to the data, i.e., an equation of the form:

$$Y = a + bx + cx^{2} \qquad (1)$$

A second ANOVA is carried out to obtain the SS of deviations from the curved regression. The difference between the linear and curvilinear SS measures the reduction in SS of the Y values achieved by fitting the curvilinear rather than the linear regression. This difference is then tested against the deviation from the curved regression using an F test. If the F ratio of the mean square (MS) reduction in SS to the MS of the deviation from a curvilinear regression is significant, then the curved relationship is a significantly better fit to the data than the straight line. In this example F = 0.75 (DF 1,16; p = 0.60) and it would be concluded that the quadratic regression does not fit the data any better than a straight line. Parabolic regressions often work well for estimation and interpolation within the range of the data even if the actual relationship between Y and X is not strictly parabolic. Extrapolation beyond the data for estimation is, however, extremely risky. Where curvature is present, Pearson’s r should not be used to assess the relationship between Y and X and instead, there are various non-linear correlation coefficients that can be applied.9

If a curve has a more complex shape, a series of polynomials of higher order may be necessary to provide the best fit to the data.4,5 Hence, polynomials of order 1, 2, 3 . . . n are fitted successively to the data and with the addition of each extra term, a further ‘bend’ is added to the curve. Hence, third-order (cubic) curves are ‘S’ shaped and fourth-order (quartic) curves have three ‘bends’ and may appear to be ‘double peaked’. An investigator may then have to decide which of the curves provides the ‘best’ fit to the data. With each fitted polynomial, the regression coefficients, standard errors (S.E.), values of t, and the residual mean square are obtained. From these statistics, a judgment can be made as to whether a polynomial of sufficiently high degree has been fitted to the data. At each stage, the reduction in the SS is tested for significance as each term is added. The analysis is continued by fitting successively higher-order polynomials until a non-significant value of F is obtained. As a precaution, it is usually good practice to check the next order polynomial after a non-significant term has been fitted.

The data may not come from a bivariate normal distribution

Pearson’s r is a parametric statistic, and its application depends on the assumption that the pairs of values (x,y) are


Figure 1. Change in intraocular pressure (IOP) with age showing both the linear (first-order) and quadratic (second-order) fits to the data. Test of
deviation from a linear regression (F = 0.75, 1,16 DF, p = 0.60).
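
The arithmetic behind the tests applied to the Figure 1 data can be sketched in a few lines of Python. This is a minimal illustration rather than the author's own analysis: the raw data are not given, so it starts from the sums of squares printed in Tables 2 and 3, it assumes n = 19 pairs (inferred from the 17 residual degrees of freedom of the straight-line fit), and the function names are hypothetical. The first function forms the F ratio for the reduction in SS obtained by adding a quadratic term; the second recovers r squared, F, and t for the straight line.

```python
import math


def curvature_f_ratio(ss_dev_linear, ss_dev_quadratic, n_pairs):
    """F ratio testing whether a quadratic term improves on a straight line.

    ss_dev_linear    : SS of deviations from the fitted straight line
    ss_dev_quadratic : SS of deviations from the fitted quadratic curve
    n_pairs          : number of (x, y) pairs
    """
    df_residual = n_pairs - 3                      # a, b and c are estimated
    reduction = ss_dev_linear - ss_dev_quadratic   # carries 1 DF
    ms_residual = ss_dev_quadratic / df_residual
    return reduction / ms_residual, df_residual


def linear_regression_summary(ss_regression, ss_deviations, n_pairs):
    """Return r squared, the ANOVA F ratio and |t| for the slope of a straight line."""
    df_residual = n_pairs - 2                      # a and b are estimated
    r_squared = ss_regression / (ss_regression + ss_deviations)
    f_ratio = ss_regression / (ss_deviations / df_residual)
    t_slope = math.sqrt(f_ratio)                   # with 1 DF in the numerator, t squared equals F
    return r_squared, f_ratio, t_slope


# Sums of squares as printed in Tables 2 and 3; n = 19 is inferred, not stated.
f_curve, df_curve = curvature_f_ratio(30.10, 28.76, 19)
r2, f_line, t_line = linear_regression_summary(57.94, 30.10, 19)

print(f"Quadratic term: F = {f_curve:.2f} on 1 and {df_curve} DF")
print(f"Straight line: r squared = {r2:.2f}, F = {f_line:.2f}, t = {t_line:.2f}")
```

With these inputs the quadratic F ratio is about 0.75, well below the tabulated 5% point of F for 1 and 16 DF (approximately 4.49), while the straight-line fit gives an r squared of about 0.66, F of about 32.7, and t of about 5.72, in line with the values quoted in the tables. The sketch also shows why the ANOVA and the t test of the slope necessarily agree for a straight line: t is simply the square root of F.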

members of the bivariate normal distribution.4 The bivariate normal distribution is a natural extension of the normal distribution from one to two variables and has the following properties: (1) for each individual value of x, the corresponding values of Y are normally distributed, the means of these normal distributions lying on a straight line and the variance being constant for each X, (2) for each y, the corresponding X values are normally distributed, and (3) the marginal distributions of X and Y are also normally distributed. This distribution has five parameters, viz. the means and standard deviations (S.D.) of the two variables and the population correlation coefficient ρ, of which r is an estimate. The data are likely to approximate to a bivariate normal distribution if X and Y are both continuous variables and are themselves normally distributed. If the data are small whole numbers, scores based on a limited scale, or percentages, then this assumption may not hold. The lack of hits to the specific search term bivariate normal distribution suggests that this assumption was rarely discussed specifically in the ophthalmic literature but the frequent use of non-parametric rank correlation methods does indicate awareness of the problem. Snedecor and Cochran4 suggest that r can be used if at least one of the variables is normally distributed. If there is doubt regarding the distribution of both variables, a non-parametric method is appropriate, Spearman’s rank correlation rs being the most widely used non-parametric method of testing correlation in the journals surveyed.10–12 Kendall’s rank correlation (τ), which like Spearman’s rs can be used as a measure of an ability to appraise or detect a property by scoring, has been more rarely applied.13 Similar to rs, τ varies from +1 (complete concordance) to −1 (complete disagreement) but it is calculated differently.4 The two rank correlation methods are closely related and it probably matters little which method is actually used. One advantage of τ is that it can be extended to study partial correlation discussed in a later section. A further non-parametric correlation coefficient occasionally given by statistical software is ‘gamma’14 which received no hits in the survey. Gamma is closer to τ than rs but is regarded as a more sensitive test if the data contain many tied values.

A significant ‘r’ does not always indicate a meaningful relationship

The square of the correlation coefficient r², also known as the ‘coefficient of determination’, measures the proportion of the variance associated with the Y variable that can be accounted for or ‘explained’ by its linear regression on X. When large numbers of pairs of observations are present, e.g., >50 pairs, examination of the statistical table for r reveals that values as low as 0.3 are significant. In this case, however, r² would suggest that only 0.09 or 9% of the variation in the Y values would be accounted for by the independent variable X.

Some studies quote r and a p value but without considering r². Hence, Mainstone et al.15 studied the relationship between the radius of curvature, hyperopic refractive error, and axial length quoting r values of 0.37 and 0.75 for correlations between axial length and corneal radius of


curvature and hyperopic refractive error and which account for 14% and 56% of the variance respectively. Jonuscheit et al.16 found that body height was weakly associated with central corneal thickness and peripheral corneal thickness (r ≥ 0.18) and moderately correlated with corneal radius (r = 0.35), r² values in both cases being quite low. More respectable r² values of 68% and 77% were obtained by Plakitsi et al.17 in their study of the correlation of refractive error among right and left eyes and between corneal hysteresis and corneal resistance factor respectively and by Grosvenor and Rolene18 in their study of axial length/corneal radius as a function of spherical equivalent refraction (r² = 84%). Other studies, however, report a significant value of r but very low values of r². Hence, Applegate et al.19 investigated whether the metrics of retinal image quality predicted visual performance, the ability of metrics to predict logMAR acuity improving as luminance and/or contrast was lowered. The best image quality metric accounted for 2.6%, 15.1%, 27.6% and 40% of the variance taking into account various types of logMAR acuity. Koenig et al.20 studied the effect of various metrics on changes in logMAR and reported r² values in the range 3–21%. One of the more extreme examples was reported by Nomura et al.,21 in which intraocular pressure (IOP) in a large Japanese population was inversely correlated with age in men (r = −0.14, p < 0.001). Although the value of r was highly significant, only approximately 2% of the variance in IOP could be attributable to age, 98% of the variance in IOP being attributable to other factors. Low values of r are common in observational studies where large numbers of X variables are likely to be present and in which the objective may be to ‘explain’ the source of variation in Y. A number of X variables may be correlated with Y but each may account for only a small proportion of the total variance. In this context, significant correlations of small magnitude may not be of practical value because they account for little of the total variability.

These problems raise the question of whether an appropriate sample size can be calculated for a correlation test, i.e., if a theory predicts a certain correlation between two variables then what is the appropriate sample size to effectively test this hypothesis, a question rarely addressed in the three journals? The question is also more complex than usual as both too small and too large a sample size can cause problems in testing the hypothesis and in interpretation.22 A useful method of approach is described by Norman and Streiner.23 First, decide on an appropriate α (from 0.05 to 0.01) and β (from 0.05 to 0.20) level. Second, decide how big the correlation should be before it is declared as significant. The interpretation of r, however, is arbitrary depending on purpose and context. Hence, a value of r of 0.8 might be regarded as low in the physical sciences in which a physical law requires more rigorous verification but high in social and medical sciences in which a large number of individual X variables may be present.24,25 Note also that an r of 0.9 is still associated with some scatter about the fitted regression line and an r of 0.3 would hardly be worthy of consideration.23 These issues suggest that an r² of at least 50% should be set as the criterion, but this would require an r value of at least 0.71, considerably greater than quoted in many studies. Once decisions have been made regarding these variables, then an appropriate sample size can be obtained from curves published in Norman and Streiner.23

A further use of r and r² is in testing the goodness of fit of a regression line. Hence, r² provides an estimate of the strength of the relationship between Y and X. Nevertheless, there is no established ‘cut-off’ in r² below which the line would be regarded as a poor fit. In general, however, a line accounting for less than 50% of the variance should not be regarded as a good fit to the data. There are two further methods of testing the significance of a regression. First, goodness of fit of the line can be tested using ANOVA.4 ANOVA determines the statistical significance of the line rather than the strength of the relationship between the two variables. The total variation of the Y values is divided into a linear effect, i.e., that portion of the variance accounted for by the line, and the error variance associated with deviations from the line, the two sources of variation being compared using a variance ratio (F) test. Second, the significance of a regression line can be tested by whether the slope of the line (b) is significantly different from zero. The ratio of b to its S.E. (sb) converts b so that it is a member of the t distribution. Which of the three methods of testing a regression line is appropriate depends on the precise hypothesis posed. Hence, r² estimates the strength of the relationship between Y and X, ANOVA whether the regression line is statistically significant, and the t test whether the slope of the line is significantly different from zero. Application of the latter two methods to the data in Figure 1 is shown in Table 3. Hence, the regression effect (F = 32.73, p = 0.000025) is highly significant. Moreover, the t test of the slope of the line b gives a value of t = 5.72 (p = 0.00001), again very highly significant. Both of these tests are more useful than r² in judging whether a significant regression line has been fitted to the data.

r is sensitive to the distribution of the X values

First, r is particularly sensitive to the presence of outliers, especially if the sample size is small,26,27 and a single atypical point can have a considerable effect on its value.28–30 Hence, in the data illustrated in Figure 1, changing the final point (63, 21.2) to (63, 10) would have a dramatic effect on r, reducing it from r = 0.81 to r = 0.23. A graphical plot of the data is therefore essential to identify possible outliers and if present, and with no grounds for regarding them as


errors, the data should be analysed with and without the outliers to assess their effect on the correlation, especially if sample size is small.

Table 2. Analysis of variance (ANOVA) to test the departure of a set of data from a linear regression using the data illustrated in Figure 1

Source                           SS       DF     MS      F
Linear regression                53.76    1
Quadratic term                   1.35     1      1.35    0.75 (p = 0.60)
Deviation from quadratic term    28.76    16     1.79

DF, degrees of freedom; F, Variance ratio; MS, Mean square; p, probability; SS, Sums of squares.

Second, care is needed to ensure that non-homogeneous groups are not included in the correlation. For example, if measurements of X and Y are made from two samples of subjects differing substantially in age (say a ‘young’ and an ‘older’ group), then regressions calculated on each group separately may not be significant. It would be inappropriate to combine the groups and calculate r on the pooled data because even if r was significant it would not reflect a true linear relationship between X and Y but only the fact that there was a significant mean difference between the groups.31 A more realistic example of this problem is if several y values are measured over a small number of selected x values (e.g. 3–5 levels of X). In this case, a one-way ANOVA would be more appropriate with partitioning of the effect SS into linear, quadratic, and cubic components, the degree of polynomial being determined by the number of x values.32,33 These problems are clearly visible from a graphical plot of the data, which reveals either a ‘dumbbell’ type distribution with two groups or two or more distinct clusters of data points arranged along the X axis.

Third, r is sensitive to how X and Y are sampled. When the variable Y is predicted from the regression of Y on X, r is proportional to the S.D. of the X values, and selecting or including a narrow range of X values will result in a lower r value than if a wider range of values had been included. This property of r is well illustrated in Figure 2, which is re-plotted from data reported in Haegerstrom-Portnoy et al.,3 and shows a clear linear relationship between r and the range of acuity values (X) compiled from several studies. This relationship between r and range is essentially determined by how r is calculated, i.e., r is the ratio of the sum of cross-products of the X and Y deviations from their means to the square root of the product of the sums of squares of the X and Y deviations. Hence, r is influenced by the variation in both the x and y values but in practice the range on the X axis has the biggest effect. This aspect of correlation was studied in detail by Haegerstrom-Portnoy et al.,3 the objective being to determine whether spatial vision could be predicted from visual acuity measures alone. Hence, reported values of r calculated between visual acuity and contrast sensitivity (CS) are highly variable.34–37 When r is high it was concluded that other measures of spatial vision added no further information38 whereas when r is low, several different dimensions of vision were being assessed.39 A major factor determining these differences was the range of the X variable. If r was low and a restricted range of X was present then no strong conclusion could be drawn from the data and even if r was large with a substantial range of the X variable, the accuracy of prediction may still remain poor.40–44 Formulae for adjusting r values for ‘range restriction’ are available and if the data come from a bivariate normal distribution the methods described by Guilford,45 Thorndike,46 and Wiberg and Sondstrom,47 the latter being based on a ‘missing value’ approach, have been used. Nevertheless, investigators should be wary of these procedures as they are essentially extrapolating beyond the range of the data to adjust the value of an already ‘weak’ statistic. In addition, the actual circumstances where this type of adjustment may be made are unlikely to occur commonly in ophthalmic research. An example often quoted is the use of the ‘Thorndike case 2 adjustment’ in the circumstance where a series of x,y values are obtained, x being available for all measurements but y values being restricted. In this case, r can only be calculated on the restricted sample (r_res) and the question is what would be the value of r if it had been calculated on the unrestricted sample (r_unres). The formula is given in Equation 2 and a sample calculation using the data analysed in Table 2 is shown in Table 4.

$$r_{unres} = \frac{\mathrm{S.D.}_{Xunres}\, r_{res}}{\left[\mathrm{S.D.}_{Xunres}^{2}\, r_{res}^{2} + \mathrm{S.D.}_{Xres}^{2} - \mathrm{S.D.}_{Xres}^{2}\, r_{res}^{2}\right]^{1/2}} \qquad (2)$$

where:
r_unres is the estimated value of r in the unrestricted sample
r_res is the value of r calculated from the restricted sample
S.D.Xunres is the standard deviation of the X values from the unrestricted sample
S.D.Xres is the standard deviation of the X values from the restricted sample

The best method of avoiding this problem is, where possible, to ensure from the outset that measurements of x and y are made over a sufficient range of X values. Note that the problem of range restriction may also occur in meta-analysis where different contributing studies may have calculated r from data sets with very different ranges of the X variable.

Another type of correlation coefficient, the ICC, was one of the earliest correlation methods to be applied to paired data, e.g., to compare the degree to which siblings may resemble each other in a quantitative feature, the correlation between the left and right eyes of patients,48 or the degree of reproducibility of a measurement.49–51 There are a ‘family’ of ICCs, their use depending both on the


Figure 2. The relationship between the correlation coefficient (r) achieved and the range of acuity values (Spearman’s rank correlation rs = 0.91)
derived from several studies in the literature (Replotted from data compiled in Haegerstrom-Portnoy et al. OVS 2000; 77:653–662 with permission
from the author).
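
The Thorndike case 2 adjustment for range restriction discussed in the text reduces to a single calculation: the restricted r is scaled by the unrestricted S.D. of X and divided by the square root of a term that combines the two S.D. values. The sketch below is illustrative only. The function name is hypothetical, and the inputs are the values printed in Table 4 (a restricted r of 0.48, with an S.D. of X of 4.88 in the restricted sample and 10.90 in the unrestricted sample).

```python
import math


def thorndike_case2(r_restricted, sd_x_restricted, sd_x_unrestricted):
    """Estimate the unrestricted r from an r computed on a restricted range of X."""
    numerator = sd_x_unrestricted * r_restricted
    denominator = math.sqrt(
        sd_x_unrestricted ** 2 * r_restricted ** 2
        + sd_x_restricted ** 2
        - sd_x_restricted ** 2 * r_restricted ** 2
    )
    return numerator / denominator


# Values as printed in Table 4 (first 10 data points versus the full data set).
estimate = thorndike_case2(r_restricted=0.48, sd_x_restricted=4.88, sd_x_unrestricted=10.90)
print(f"Estimated unrestricted r = {estimate:.2f}")
```

The estimate of about 0.77 agrees with Table 4 and falls a little short of the actual unrestricted value of 0.81. When the two S.D. values are equal the function returns the restricted r unchanged, which is a convenient sanity check on the formula.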

assumptions being made and the type of data.52,53 The main difference between the ICC and conventional r is that all data are pooled to estimate the mean and variance of the ICC and, unlike r, the ICC is not sensitive to range restriction. In addition, in some paired studies, the pairs of values are considered to be ‘unordered’, e.g., in twin or sibling studies, there is no meaningful way to determine within a twin pair which individual should come first. Reversing the order of any pair would give a slightly different correlation if Pearson’s r was used. The ICC avoids this problem because it estimates the average correlation among all possible orderings of the pairs of observations.

In addition to calculating the degree of correlation between paired entities, a major use of the ICC has been in repeatability studies or test/retest reliability. Lam et al.54 examined the degree of intra-observer and inter-observer repeatability of the anterior eye segment analysis system (EAS-1000) using the ICC and found good levels of repeatability. In addition, the ICC has been used to study repeatability in studies of the Corvis ST ‘air puff’ intraocular pressure measurement,55 between various measures of subjective refraction,56 and dynamic microscopy and subjective measurements of amplitude of accommodation.57 Nevertheless, some studies have used r as a measure of correlation in test/retest studies58 or have used it in combination with the ICC59 but this use of r should be avoided.60 Hence, the ICC is a useful adjunct to a Bland and Altman plot as a relative measure of the degree of consistency of measurements made by two methods or by two observers.

Table 3. Analysis of variance (ANOVA) of a linear regression and a t test of the significance of the slope b of the data illustrated in Figure 1

ANOVA
Source                       SS      DF   MS      F
Regression                   57.94   1    57.94   32.73 (p = 0.000025)
Deviations from regression   30.10   17   1.77

t test of slope of line
b      S.E.   t      p
0.16   0.03   5.72   0.00001

DF, degrees of freedom; F, Variance ratio; MS, Mean square; p, probability; S.E., Standard error.

Table 4. Estimation of the unrestricted correlation coefficient (run) from restricted data using the Thorndike case 2 method using the data illustrated in Figure 1

Restricted r (rres)   S.D.Xres   S.D.Xun   Estimated run   Actual run
0.48                  4.88       10.90     0.77            0.81

The restricted data employ the x, y values from the first 10 data points. S.D.Xres, Standard deviation of the X values from the restricted sample; S.D.Xun, Standard deviation of the X values from the unrestricted sample.
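The entries in Table 3 are linked by simple arithmetic, and a short check may make those links easier to follow. The sketch below assumes only the tabulated sums of squares and degrees of freedom; the variable names are illustrative. It recomputes the mean squares and the variance ratio F, uses the fact that with a single regression degree of freedom the t value for the slope is the square root of F, and derives r from the proportion of the total sum of squares attributable to the regression. Because the inputs are rounded to two decimal places, the results should be expected to agree with the table only to within rounding.

```python
import math

# Sums of squares and degrees of freedom as printed in Table 3
ss_regression, df_regression = 57.94, 1
ss_deviations, df_deviations = 30.10, 17

# Mean squares and the variance ratio F
ms_regression = ss_regression / df_regression
ms_deviations = ss_deviations / df_deviations
f_ratio = ms_regression / ms_deviations

# With one regression DF, t for the slope equals the square root of F
t_slope = math.sqrt(f_ratio)

# r squared is the fraction of the total SS explained by the regression;
# the positive root is taken because the tabulated slope b is positive
r_squared = ss_regression / (ss_regression + ss_deviations)
r = math.sqrt(r_squared)

print(f"MS deviations = {ms_deviations:.2f}")
print(f"F = {f_ratio:.2f}, t = {t_slope:.2f}")
print(f"r squared = {r_squared:.3f}, r = {r:.2f}")
```

The value of r recovered in this way can be compared with the ‘Actual run’ entry in Table 4, which is based on the same Figure 1 data.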

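The correction summarised in Table 4 can be reproduced with a few lines of code. The sketch below assumes the usual textbook form of the Thorndike case 2 formula, r_un = r_res k / sqrt(1 - r_res^2 + r_res^2 k^2), where k is the ratio of the unrestricted to the restricted standard deviation of X. The article itself does not print the formula, so this form and the function name are assumptions made for illustration.

```python
import math

def thorndike_case2(r_res, sd_x_res, sd_x_un):
    """Estimate the unrestricted r from a range-restricted r (Thorndike case 2).

    r_res     correlation observed in the restricted sample
    sd_x_res  standard deviation of X in the restricted sample
    sd_x_un   standard deviation of X in the unrestricted sample
    """
    k = sd_x_un / sd_x_res  # how much wider the unrestricted range of X is
    return (r_res * k) / math.sqrt(1.0 - r_res**2 + (r_res**2) * (k**2))

# Values taken from Table 4
estimated_r_un = thorndike_case2(r_res=0.48, sd_x_res=4.88, sd_x_un=10.90)
print(round(estimated_r_un, 2))  # 0.77
```

With the tabulated inputs the estimate is 0.77 against an actual value of 0.81, which illustrates why such corrections should be treated as approximate. When the two standard deviations are equal, the function returns the restricted r unchanged.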
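The point about ‘unordered’ pairs can be illustrated numerically. The sketch below uses a one-way random-effects ICC of the kind described by Shrout and Fleiss52; the article does not say which member of the ICC ‘family’ is intended, so this choice is an assumption. The paired values are hypothetical and are not taken from any study cited here. Reversing the order within some pairs changes Pearson’s r slightly but leaves this ICC unchanged, because the ICC depends only on the pair means and the within-pair deviations.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson's r from the cross-products and sums of squares of deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_oneway(pairs):
    """One-way random-effects ICC for n subjects with k measurements each."""
    n, k = len(pairs), len(pairs[0])
    grand_mean = mean(v for p in pairs for v in p)
    subject_means = [mean(p) for p in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for p, m in zip(pairs, subject_means) for v in p
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical paired measurements (illustrative only)
pairs = [(10.0, 11.0), (12.0, 14.0), (15.0, 15.5),
         (18.0, 16.0), (20.0, 21.0), (22.0, 25.0)]

# Reverse the order within three of the six pairs
flip = {1, 3, 4}
swapped = [(b, a) if i in flip else (a, b) for i, (a, b) in enumerate(pairs)]

r_original = pearson_r(*zip(*pairs))
r_swapped = pearson_r(*zip(*swapped))
icc_original = icc_oneway(pairs)
icc_swapped = icc_oneway(swapped)

print(r_original, r_swapped)      # the two values differ slightly
print(icc_original, icc_swapped)  # the two values agree
```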

‘r’ does not indicate causality

A significant value of r does not imply that there is a ‘causal’ relationship between two variables or indeed, that there is any relationship between them at all if there is a third underlying variable which is unmeasured and which can account for the correlation. This problem is especially acute when many potential X variables are present; two of them (X1 and X2) can be significantly correlated not because they are directly related but because both have mutual correlations with other variables. This circumstance commonly arises in non-experimental studies where there is a high probability that other variables are involved. For example, in the study of Rochtchina et al.61 the data suggested a positive correlation between IOP and age. Systolic blood pressure, however, was strongly positively correlated with IOP and, after adjusting for this variable, a negative correlation between IOP and age was observed. Furthermore, after adjusting for other confounding variables (e.g., the presence of diabetes, glaucoma, or myopia), IOP was no longer correlated with age. By contrast, in the study of Lee et al.62 IOP in the Korean population was still negatively correlated with age even after adjustment for confounding variables.

A useful statistic when there are multiple variables is the partial correlation coefficient (rp), i.e., the degree of correlation between two variables when the effect of a third confounding variable is removed. Pearson’s r is closely related to the bivariate normal distribution but if more than two variables are present, then the multivariate normal distribution is a more appropriate model. In a multivariate normal distribution, any variable has a linear regression on any other variable or on any subset of the variables with deviations that are normally distributed.4 If there are three variables, then there are three simple population (ρ) correlations among them, viz., the correlation between variables 1 and 2 (ρ12), between variables 2 and 3 (ρ23), and between variables 1 and 3 (ρ13). Hence, the partial correlation ρ12.3 is defined as the correlation between variables 1 and 2 in a cross-section of individuals all having the same value of variable 3. Hence, the third variable is held ‘constant’ so that only variables 1 and 2 are involved in the correlation, ρ12.3 being the same for every value of variable 3. The basic principle of the analysis is to calculate that part of the correlation between variables 1 and 2 which is not simply a reflection of their mutual relationship with variable 3. The sample estimate r12.3 of the population value ρ12.3 is obtained by calculating first, the deviation of variable 1 from its sample regression on variable 3 (d1.3), second, the deviation of variable 2 from its regression on variable 3 (d2.3), and third, the simple correlation coefficient between d1.3 and d2.3 (r12.3) and is given by:

\[ r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}} \qquad (3) \]

The partial correlation, which has N − 3 DF, where N is the number of experimental units sampled, is referred to the table of Pearson’s r to judge statistical significance. Hence, partial correlation has been used to evaluate the relationship between reading speed and contrast sensitivity in dry eye disease controlling for differences in age among groups,63 the perceived bulbar redness of clinical grading scales,64 and to investigate the relationship between refractive error (spherical equivalent) and visual acuity accounting for the DM/DD ratio and the magnitude of anisometropia.65

The multiple correlation coefficient (R) has been less frequently used in ophthalmic research. R is defined as the simple correlation between Y and its linear regression on all of the X variables included in the study. Hence, as in simple correlation, R² is the fraction of the SS of deviations from the mean of the Y values attributable to the regression as a whole while 1 − R² is the proportion of the SS not associated with the regression. A multiple regression should account for at least half the variance of the data, i.e., R should be at least 0.7 (R² = 0.49). If nearly all of the variance is associated with a single X variable, a multiple regression analysis adds little to that of a simple linear regression.

The number of decimal digits for r

It is commonplace in studies to quote at least two decimal places for r but there is the question of how many decimal places it would be appropriate to use.66,67 Hence, Bedeian et al.66 carried out simulations to investigate the precision of simple correlations. It was concluded that if the true value of r is low, which is common in many studies, and the sample size is <500, even two decimal places cannot be justified and a sample size of >100 000 would be required to warrant a third decimal place. It is suggested that two decimal places should be retained in most studies but authors should not place too much confidence in the second digit especially with a small sample size or a low value of r.

The problem of multiple testing

Studies of correlation will often include multiple testing among many variables, the results being expressed usually in the form of a matrix of r, which raises the question of whether the p value used to judge significance should be adjusted.67 If there are many variables present, then the relationships between them should be studied using multiple correlation methods, as suggested in a previous


section, rather than examining the simple correlations among them. Nevertheless, a matrix of the simple correlations is useful in the context of multiple regression as a method of assessing the degree of inter-correlation between the X variables, as the analysis assumes little correlation among them. If it is essential to examine the simple correlations among many variables then the advice remains the same as discussed previously.68 Hence, correction may be relevant in situations where an investigator is searching for significant correlations but without a pre-established hypothesis. However, this use depends on the ‘intention’ of the investigator, which should always be stated in a study. In an exploratory context, an investigator would not wish to miss a possible correlation worthy of further study and therefore, a correction would be inappropriate. However, if the objective was to test all correlations among variables in the hope that some would appear significant and the results were not considered to be hypotheses for further study, then a Bonferroni or related correction should be applied.

Concluding remarks and advice

The problems and limitations of r suggest that a more cautious approach should be taken to its use and that alternative methods should be applied where appropriate, especially in the following circumstances:

1. When the data do not fit a bivariate normal distribution, viz. if both variables are not normally distributed. If the normality of both X and Y is doubtful then Spearman’s or Kendall’s rank correlation methods should be used instead and the latter can also be used in a partial correlation analysis.
2. If the data exhibit a degree of curvature, which is often evident on a graphical plot, then the presence of significant curvature can be tested using ANOVA.
3. If large numbers of observations are present, calculation of r² is essential as very low values of r can be statistically significant. If the objective is to test a specific hypothesis, then a sample size calculation for r should be considered.22
4. When testing the significance of a regression line, ANOVA goodness of fit and testing the slope of the regression line are useful additional tests.
5. If the data are clustered, have a restricted range of X, or if outliers are present. Authors should pay particular attention to the problem of limited range but statistical corrections for ‘range restriction’ should be used with caution.
6. If there are a significant number of potential X variables influencing Y, it may be better to employ rp or multiple regression methods to examine the complex interrelationships among them rather than examine the simple correlations.
7. No more than two decimal places should be quoted for r, even with relatively large sample sizes and if r is relatively small, little confidence should be given to the second digit.

Acknowledgements

Grateful thanks to Professor G. Haegerstrom-Portnoy for allowing me to reproduce data regarding the correlation between range of visual acuity present and the correlation coefficient.

Disclosure statement

Dr R. A. Armstrong has nothing to disclose.

References

1. Armstrong RA, Eperjesi F & Gilmartin B. The use of correlation and regression methods in optometry. Clin Exp Optom 2005; 88: 81–88.
2. Pearson K & Lee A. On the laws of inheritance in man. I. Inheritance of physical characteristics. Biometrika 1902; 2: 357.
3. Haegerstrom-Portnoy G, Schneck ME, Lott LA & Brabyn JA. The relation between visual acuity and other spatial vision measures. Optom Vis Sci 2000; 77: 653–662.
4. Snedecor GW & Cochran WG. Statistical Methods, 7th edn. Iowa State University Press: Ames, IA, 1980.
5. Armstrong RA & Hilton A. Statistical Analysis in Microbiology: Statnotes. Wiley-Blackwell: Hoboken, NJ, 2011.
6. Douthwaite WA & Jenkins TCA. Visually evoked responses to checkerboard patterns: check and field size interactions. Optom Vis Sci 1982; 59: 894–901.
7. Watanabe S, Yamestuba T & Ohba N. A longitudinal study of cyclopegic refraction in a cohort of 350 Japanese schoolchildren. Cyclopegic refraction. Ophthalmic Physiol Opt 2002; 19: 22–29.
8. Lee ES, Kang SY, Choi EH et al. Comparisons of nerve fibre layer thickness measurements between stratus, cirrus and RTVue OCTs in healthy and glaucomatous eyes. Optom Vis Sci 2011; 88: 751–758.
9. Dietrich CF. Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement, 2nd edn. Taylor & Francis Ltd: London, UK, 1991.
10. Czepita D, Zejnio M & Mojsa A. Prevalence of myopia and hyperopia in a population of Polish schoolchildren. Ophthalmic Physiol Opt 2007; 27: 60–65.
11. Jorge J, Gonzalez-Meijonie JM, Quer os A, Pernandes P & Diaz-Rey JA. A comparison of the NCT Reichert R7 with Goldmann applanation tonometry and the Reichert ocular response analyzer. Ophthalmic Physiol Opt 2011; 31: 174–179.


12. Flores-Rodriguez P, Gili P, Martin-Rios MD & Grifol-Clar E. Comparison of optic area measurements using fundus photography and optical coherence tomography between optic nerve head drusen and control subjects. Ophthalmic Physiol Opt 2013; 33: 164–171.
13. Forster JE, Abadi RV, Muldoon M & Lloyd IC. Grading infantile cataracts. Ophthalmic Physiol Opt 2006; 26: 372–379.
14. Goodman LA & Kruskal WH. Measures of association for cross classification. J Am Stat Assoc 1954; 49: 732–764.
15. Mainstone JC, Carney LG, Anderson CR, Clem PM, Stephensen AL & Wilson MD. Corneal shape in hyperopia. Clin Exp Optom 2010; 81: 131–137.
16. Jonuscheit S, Doughty MJ, Martin R et al. Relationship between corneal thickness and radius to body height. Optom Vis Sci 2017; 94: 380–386.
17. Plakitsi A, O’Donnell C, Miranda MA, Charman WN & Radhakrishnan H. Corneal biomechanical properties measured with ocular response analyser in a myopic population. Ophthalmic Physiol Opt 2011; 31: 404–412.
18. Grosvenor T & Rolene S. Role of axial length/corneal radius ratio in determining the relative state of the eye. Optom Vis Sci 1994; 71: 573–579.
19. Applegate RA, Marsack JD & Thibos LN. Metrics of retinal image quality predict visual performance in eyes with 20/17 or better visual acuity. Optom Vis Sci 2006; 83: 635–640.
20. Koenig DE, Nguyen LC, Parker KE & Applegate RA. Factors accounting for the 4-year change in acuity in patients between 50 and 80 years. Optom Vis Sci 2013; 90: 620–627.
21. Nomura H, Ando F, Niino N, Shimokata H & Miyake Y. The relationship between age and intraocular pressure in a Japanese population: the influence of central corneal thickness. Curr Eye Res 2002; 24: 81–85.
22. Armstrong RA. Editorial: is there a large sample size problem? Ophthalmic Physiol Opt 2019; 39: 129–130.
23. Norman GR & Streiner DL. Biostatistics: The Bare Essentials. Mosby: St Louis, Baltimore, Boston, Chicago, Madrid, Philadelphia, Sydney & Toronto, 1994.
24. Cohen J. Statistical Power Analysis for the Behavioural Sciences, 2nd edn. Lawrence Erlbaum Assoc: Toronto, 1988.
25. Buda A & Jarinowski A. Life time of correlations and its applications. Wydawnictwo Niezalezne 2010; 5–21.
26. Devlin SJ, Gnanadesikan R & Kettering JR. Robust estimation and outlier detection with correlation coefficients. Biometrika 1975; 62: 531–545.
27. Huber PJ & Ronchetti EM. Robust Statistics, 2nd edn. John Wiley: Hoboken, NJ, 2004.
28. Lam AKC, Chan ST, Chan B & Chan H. The effect of axial length on ocular blood flow assessment in anisometropes. Ophthalmic Physiol Opt 2003; 23: 315–320.
29. Vincent SJ, Collin MJ, Read SA, Carney LG & Yap MKH. Interocular symmetry in myopic anisometropia. Optom Vis Sci 2011; 88: 1454–1462.
30. Lei F, Burns SA, Shao L & Yang Y. Retinal measurements using time domain OCT imaging before and after myopic Lasik. Ophthalmic Physiol Opt 2012; 32: 222–227.
31. Stakheev AA. Intraocular lens calculation for cataract after previous radial keratotomy. Ophthalmic Physiol Opt 2002; 22: 289–295.
32. Armstrong RA, Slade SV & Eperjesi F. An introduction to analysis of variance (ANOVA) with special reference to data from clinical experiments in optometry. Ophthalmic Physiol Opt 2000; 20: 235–241.
33. Armstrong RA, Davies L, Dunne MCM & Gilmartin B. Statistical guidelines for clinical studies of human vision. Ophthalmic Physiol Opt 2011; 31: 123–126.
34. Greene HA & Madden DJ. Adult age differences in visual acuity, stereopsis, and contrast sensitivity. Am J Optom Physiol Opt 1987; 64: 749–753.
35. Gosnell R, Golden A & Hinder-Zimmerman A. Clinical assessment of contrast sensitivity function: a comparison of charts with respect to low vision. J Vis Rehab 1989; 3: 11–32.
36. Brown B & Lovie-Kitchin JE. High and low contrast acuity and clinical contrasts sensitivity tested in a normal population. Optom Vis Sci 1989; 66: 467–473.
37. Elliot DB, Sanderson K & Conkey A. The reliability of the Pelli-Robson contrast sensitivity chart. Ophthalmic Physiol Opt 1990; 10: 21–24.
38. Hirvela H, Koskela P & Laatikainen L. Visual acuity and contrast sensitivity in the elderly. Acta Ophthalmol Scand 1995; 73: 111–115.
39. Rubin GS, West SK, Munoz B et al. A comprehensive assessment of visual impairment in a population of older Americans. The SEE Study. Salisbury Eye Evaluation Project. Invest Ophthalmol Vis Sci 1997; 38: 557–568.
40. Adams AJ, Haegerstrom-Portnoy G, Brown B & Jampolsky A. Predicting visual resolution from detection thresholds. Am J Optom Physiol Opt 1984; 61: 371–376.
41. Elliot DB & Hurst MA. Simple clinical techniques to evaluate visual function in patients with early cataract. Optom Vis Sci 1990; 67: 822–825.
42. Elliot DB & Whitaker D. Clinical contrast sensitivity chart evaluation. Ophthalmic Physiol Opt 1992; 12: 275–280.
43. Rubin GS. Reliability and sensitivity of clinical contrast sensitivity tests. Clin Vis Sci 1988; 2: 169–177.
44. Rubin GS, Roche KB, Prasad-Rao P & Fried LP. Visual impairment and disability in older adults. Optom Vis Sci 1994; 71: 750–760.
45. Guilford JP. Fundamental Statistics in Psychology and Education. McGraw-Hill: New York, NY, 1965.
46. Thorndike RL. Research Problems and Techniques (Report No. 3). US Govt: Washington, DC.
47. Wiberg M & Sundstrom A. A comparison of two approaches to correction of restriction of range in correlation analysis. Prac Assess Res Eval 2009; 14: 1–9.
48. Armstrong RA. Statistical guidelines for the analysis of data obtained from one or both eyes. Ophthalmic Physiol Opt 2013; 33: 7–14.


49. Altman DG & Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983; 32: 307–317.
50. Bland JM & Altman DG. Statistical method for assessing agreement between two methods of clinical measurement. Lancet 1986; I: 307–310.
51. Bland JM & Altman DG. Measurement error and correlation coefficients. BMJ 1996; 313: 41–42.
52. Shrout PE & Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86: 420–428.
53. McGraw KO & Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996; 1: 30–46.
54. Lam AKC, Chan R, Woo GC, Pang PCK & Chin R. Intra-observer and inter-observer repeatability of anterior eye segment analysis system (EAS-1000) in anterior chamber configuration. Ophthalmic Physiol Opt 2002; 22: 552–559.
55. Boszczyk A, Kasprzak H & Jozwik A. Eye retraction and rotation during Corvis ST ‘air puff’ intraocular pressure measurement and its quantitative analysis. Ophthalmic Physiol Opt 2017; 37: 253–262.
56. Revert AM, Conversa MA, Diego CA & Mico V. An alternative clinical routine for subjective refraction based on power vectors with trial frames. Ophthalmic Physiol Opt 2016; 37: 24–32.
57. Leon AA, Medrano SM & Rosenfield M. A comparison of the reliability of dynamic microscopy and subjective measurements of amplitude of accommodation. Ophthalmic Physiol Opt 2012; 32: 133–141.
58. Mackeben M, Nair UKW, Walker LL & Fletcher DC. Random word recognition chart helps scotoma assessment in low vision. Optom Vis Sci 2015; 92: 421–428.
59. Hardgrave N, Hatley J & Lewerenz D. Comparing LEA numbers low vision book and Feinbloom visual acuity charts. Optom Vis Sci 2012; 89: 1611–1618.
60. McAlinden C, Khadka J & Pesudovs K. Statistical methods for conducting agreement (comparison of clinical tests) and precision (repeatability or reproducibility) studies in optometry and ophthalmology. Ophthalmic Physiol Opt 2011; 31: 330–338.
61. Rochtchina E, Mitchell P & Wang JJ. Relationship between age and intraocular pressure: the Blue Mountains eye study. Clin Exp Ophthalmol 2002; 30: 173–175.
62. Lee JS, Lee OH, Oum BS, Chung JS, Cho BM & Hong JW. Relationship between intraocular pressure and systemic health parameters in a Korean population. Clin Exp Ophthalmol 2002; 30: 237–241.
63. Ridder WH III, Zhang Y & Huang JF. Evaluation of reading speed and contrast sensitivity in dry eye disease. Optom Vis Sci 2013; 90: 37–44.
64. Schulze MM, Hutchings N & Trefford S. The perceived bulbar redness of clinical grading scales. Optom Vis Sci 2009; 86: E1250–E1258.
65. Pang Y, Franz KA & Roberts DK. Association of refractive error with optic nerve hypoplasia. Ophthalmic Physiol Opt 2015; 35: 570–576.
66. Bedeian AG, Sturman MC & Streiner DL. Decimal dust, significant digits, and the search for stars. Organ Res Methods 2009; 12: 687–694.
67. Streiner DL. A plague of decimals: why too much precision can be misleading. J Clin Psychopharm 2017; 37: 646–647.
68. Armstrong RA. When to use the Bonferroni correction. Ophthalmic Physiol Opt 2014; 34: 502–508.


R.A. Armstrong was educated at King’s College London and St. Catherine’s College, Oxford. His early research involved the application of statistical methods to problems in Ecology and Botany. He taught Ecology for many years at the University of Aston before retraining in Neurosciences at the Institute of Psychiatry, London and at the University of Washington, Seattle. Subsequently he has taught biomedical subjects to students of optometry at the Vision Sciences department of Aston University. He has been a statistical advisor for Ophthalmic and Physiological Optics for many years. Currently he is an honorary professor at Aston University where he is continuing his research into the use of quantitative methods in the study of the pathology of neurodegenerative disease.
