Spearman Rank Correlation

In a study of the relationship between two variables, the use of measures of correlation assumes that neither is functionally dependent upon the other. So, for example, we might ask whether body weight is related to height in 35-year-old men; or whether examination scores in music theory are related to examination scores in mathematics for first-year college students; or whether the blood levels of two steroid hormones are related in 20-year-old women. A quantitative measure of the strength of the correlation is a correlation coefficient, which expresses how closely a change in the magnitude of one of the variables is accompanied by a change in the magnitude of the other variable. This is also referred to as a measure of association or of correspondence (see Association, Measures of).

If the distributions underlying the two variables are far from bivariate normal, or if the data are ordinal (e.g. we know relative magnitudes, such as man A is taller than man C but shorter than man D, but we do not know their actual heights) (see Ordered Categorical Data), then nonparametric correlation techniques should be employed to test hypotheses about the relationship between variables or to set confidence limits around the correlation coefficients. Nonparametric correlation also is less sensitive to outliers than is its parametric analog. The underlying assumptions for nonparametric correlation are that the n pairs of ratio, interval, or ordinal data (see Measurement Scale) constitute a random sample and that the two members of each of the n pairs of data are measurements taken on the same subject.

Among the correlation coefficients proposed by Charles Spearman [19, 20] is a commonly used nonparametric correlation measure that Maurice Kendall formally associated with Spearman's name a quarter of a century later [14], and that is one of the oldest statistics based on ranks. The Spearman rank coefficient computed for a sample of data is typically designated as rS.

If each of the n measurements of one of the variables is denoted as Xi (i.e. X1, X2, ..., Xn), then R(Xi) may represent the rank of Xi, where each rank is an integer, from 1 through n, indicating relative magnitude. The measurements may be ranked from high to low (e.g. rank 1 indicates the tallest person, rank 2 the next tallest, and so on, with rank n the shortest) or from low to high (rank 1 denotes the shortest and rank n the tallest). Similarly, each of the n measurements of the second variable may be denoted as Yi (i.e. Y1, Y2, ..., Yn), and R(Yi) would denote the rank of Yi, where the sequence of ranking (either high to low or low to high) is the same as for R(Xi). This is shown in Table 1.

Table 1
Person, i                 1      2      3      4      5
Height (m), Xi          1.59   1.66   1.82   1.73   1.91
Weight (kg), Yi         75.8   77.2   89.3   72.2   81.5
Rank of height, R(Xi)     1      2      4      3      5
Rank of weight, R(Yi)     2      3      5      1      4
di = R(Xi) − R(Yi)       −1     −1     −1      2      1
di²                       1      1      1      4      1

An rS = 0 ("no correlation") indicates that the magnitudes of the ranks of one variable are independent of the magnitudes of the ranks of the second variable. A positive value of rS ("positive correlation") indicates that the R(Xi)s tend to increase as the R(Yi)s increase; a negative rS ("negative correlation") indicates that the R(Xi)s tend to decrease as the R(Yi)s increase.

If the sequence of ranks were identical for the two variables, we would say that there was a perfect positive correlation, and rS = 1.0. This would occur, for example, if five pairs of data had these ranks:

R(Xi): 1 2 3 4 5
R(Yi): 1 2 3 4 5

or these:

R(Xi): 1 2 4 3 5
R(Yi): 1 2 4 3 5

A perfect negative correlation (where rS = −1.0) would be one in which the magnitudes of the ranks for one variable vary inversely with the sizes of the ranks of the second; for example,

R(Xi): 1 2 3 4 5
R(Yi): 5 4 3 2 1

Computing the Coefficient

The widely used parametric correlation coefficient, known as the Pearson product–moment correlation
coefficient (see Correlation), is defined as

r = Σ(Xi − X̄)(Yi − Ȳ) / [Σ(Xi − X̄)² Σ(Yi − Ȳ)²]^(1/2),   (1)

and commonly computed as

r = [ΣXY − (ΣX)(ΣY)/n] / {[ΣX² − (ΣX)²/n][ΣY² − (ΣY)²/n]}^(1/2),   (2)

where X̄ and Ȳ are the means of the Xi s and the Yi s, respectively, and where the summations (Σ) are each over all n data.

The Spearman rank correlation coefficient, rS, may be obtained by subjecting the ranks, instead of the raw measurements, to the above calculations. For the example above, and substituting R(Xi) for Xi and R(Yi) for Yi:

ΣXi = 1 + 2 + 4 + 3 + 5 = 15,
ΣXi² = 1² + 2² + 4² + 3² + 5² = 55,
ΣYi = 2 + 3 + 5 + 1 + 4 = 15,
ΣYi² = 2² + 3² + 5² + 1² + 4² = 55,

and

ΣXiYi = (1)(2) + (2)(3) + (4)(5) + (3)(1) + (5)(4) = 51.

Then

X̄ = 15/5 = 3 and Ȳ = 15/5 = 3,

and

Σ(Xi − X̄)² = (1 − 3)² + (2 − 3)² + (4 − 3)² + (3 − 3)² + (5 − 3)² = 10,

Σ(Yi − Ȳ)² = (2 − 3)² + (3 − 3)² + (5 − 3)² + (1 − 3)² + (4 − 3)² = 10,

Σ(Xi − X̄)(Yi − Ȳ) = (1 − 3)(2 − 3) + (2 − 3)(3 − 3) + (4 − 3)(5 − 3) + (3 − 3)(1 − 3) + (5 − 3)(4 − 3) = 6.

Eq. (1) yields

rS = 6 / [(10)(10)]^(1/2) = 6/10 = 0.60,

while (2) yields

rS = [51 − (15 × 15)/5] / {[55 − (15)²/5][55 − (15)²/5]}^(1/2) = 6 / [(10)(10)]^(1/2) = 0.60.

As the sum of the integers 1 through n (i.e. the sum of all n ranks) is n(n + 1)/2, (2) employed for Spearman rank correlation may be written as

rS = [ΣR(Xi)R(Yi) − n(n + 1)²/4] / {[ΣR(Xi)² − n(n + 1)²/4][ΣR(Yi)² − n(n + 1)²/4]}^(1/2).   (3)

Also, as the sum of the squares of all n ranks is n(n + 1)(2n + 1)/6, (2) using ranks can be reduced to

rS = 12[ΣR(Xi)R(Yi) − n(n + 1)²/4] / (n³ − n)   (4)

or

rS = 12 ΣR(Xi)R(Yi) / (n³ − n) − 3(n + 1)/(n − 1).   (5)

Alternatively, the difference, di, for each pair of ranks may be obtained, and the following equation used:

rS = 1 − 6 Σdi² / (n³ − n),   (6)

which, for the above example, is

rS = 1 − 6(1 + 1 + 1 + 4 + 1)/(5³ − 5) = 1 − 48/120 = 1 − 0.40 = 0.60.
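The worked example above is easy to verify computationally. The following is a minimal Python sketch (the function and variable names are our own, and the simple ranking helper assumes no tied values); it computes rS from the Table 1 data both by applying the Pearson formula (1) to the ranks and, equivalently, by (6):

```python
from statistics import mean

def ranks(data):
    # Rank from low to high: rank 1 for the smallest value.
    # Note: this simple helper assumes no tied values.
    order = sorted(range(len(data)), key=lambda i: data[i])
    r = [0] * len(data)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

height = [1.59, 1.66, 1.82, 1.73, 1.91]   # Xi from Table 1
weight = [75.8, 77.2, 89.3, 72.2, 81.5]   # Yi from Table 1

rx, ry = ranks(height), ranks(weight)     # [1, 2, 4, 3, 5] and [2, 3, 5, 1, 4]
n = len(rx)

# Pearson formula (1) applied to the ranks
mx, my = mean(rx), mean(ry)
num = sum((x - mx) * (y - my) for x, y in zip(rx, ry))
den = (sum((x - mx) ** 2 for x in rx) * sum((y - my) ** 2 for y in ry)) ** 0.5
rs_pearson = num / den

# Equation (6), using the rank differences di
d2 = sum((x - y) ** 2 for x, y in zip(rx, ry))   # sum of di² = 8
rs_diff = 1 - 6 * d2 / (n ** 3 - n)              # 1 − 48/120 ≈ 0.60

print(rs_pearson, rs_diff)                       # both ≈ 0.60
```

The two routes agree, as the text states they must in the absence of ties.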
Eq. (6) is most commonly encountered in textbooks, but (1) is very convenient on a computer.

Instead of the differences between pairs of ranks, one may use the sums of the ranks for each pair [15, p. 227; 20]:

rS = 6 ΣSi² / (n³ − n) − (7n + 5)/(n − 1),   (7)

where

Si = R(Yi) + R(Xi).   (8)

It can be shown [13] that in bivariate normal populations the Pearson correlation coefficient, ρ, is

ρ = 2 sin(πρS/6).   (9)

Tied Ranks

If two or more data have the same value, then they are said to be "tied", and each of their ranks may be set equal to the mean of the ranks of the positions they occupy in the ordered data set. For example, in the data set 70, 74, 74, 78, and 79 kg, data 2 and 3 are tied; the mean of 2 and 3 is 2.5, so the ranks of the five data are 1, 2.5, 2.5, 4, and 5. In the data set 1.6, 1.7, 1.9, 1.9, and 1.9 m, data 3, 4, and 5 are tied; the mean of 3, 4, and 5 is 4, so the ranks of the five data are 1, 2, 4, 4, 4.

Then these ranks would be subjected to (1), or, equivalently, the following calculation [9, p. 366] would be used as an alternative to (4):

rS = 12[ΣR(Xi)R(Yi) − n(n + 1)²/4] / {[(n³ − n) − 12 tX][(n³ − n) − 12 tY]}^(1/2)   (10)

and the following [12, p. 38; 20] is an alternative to (6):

rS = [(n³ − n)/6 − Σdi² − tX − tY] / {[(n³ − n)/6 − 2 tX][(n³ − n)/6 − 2 tY]}^(1/2),   (11)

where

tX = Σ(ti³ − ti)/12,   (12)

where ti is the number of tied values of X in a group of ties (two in the paragraph above) and the summation is over all groups of tied Xs, and

tY = Σ(ti³ − ti)/12,   (13)

where ti is the number of tied values of Y in a group of ties (three in the paragraph above) and the summation is over all groups of tied Y s. Similarly, the following [21] is an alternative to (7):

rS = {ΣSi² − [(n³ − n)/6][(7n + 5)/(n − 1)] − tX − tY} / {[(n³ − n)/6 − 2 tX][(n³ − n)/6 − 2 tY]}^(1/2).   (14)

If tX and tY are both zero, then (10) is equivalent to (4), (11) equals (6), and (14) is equivalent to (7). The results from these equations for tied and nontied data are noticeably different only if there are many ties.

Testing Hypotheses

The rS calculated from a sample of data is an estimate of ρS, the Spearman rank correlation coefficient that would be obtained from the entire population of data from which that sample came; ρS is sometimes called "Spearman's rho".

A common desire in rank correlation analysis is to test the null hypothesis that there is no correlation in the population between the paired ranks, i.e. we wish to test the two-tailed hypotheses H0: ρS = 0 vs. Ha: ρS ≠ 0 (see Hypothesis Testing). There are many tables of critical values of rS, and if rS is greater than the relevant critical value, then H0 is rejected.

The use of Σdi², instead of rS, as the test statistic for rank-correlation testing is sometimes called the "Hotelling–Pabst test" [10]. Σdi² is small when rS is large, and H0 is rejected if Σdi² is less than the critical value. Published tables offer critical values for various sample sizes, n, and levels of significance,
α. The most extensive of such tables for rS are those of Zar [22, Appendix, pp. 115–116] and, with slight improvements, those of Ramsey [17]. If there are tied data, critical values are only approximate. It should be noted that computer software packages may use approximations that are not as accurate as published tables.

One-tailed hypotheses may also be considered. For H0: ρS ≤ 0 vs. Ha: ρS > 0, H0 is rejected if rS is positive and greater than the critical value for α/2. For H0: ρS ≥ 0 vs. Ha: ρS < 0, H0 is rejected if rS is negative and its absolute value is greater than the critical value for α/2. If n is larger than that in these large tables, then one may compute

t = rS / s,   (15)

where s, the standard error of rS, is

s = [(1 − rS²)/(n − 2)]^(1/2),   (16)

for which two- and one-tailed critical values of t (Student's t distribution), for df = n − 2, are readily found. Equivalently, one may employ

F = (1 + |rS|)/(1 − |rS|)   (17)

[2], referring to two- or one-tailed critical values of the F distribution for numerator and denominator df = n − 2. Using t or F is valid even with tied data, and is preferable in any case to employing the normal approximation,

Z = rS(n − 1)^(1/2).   (18)

The Fisher Transformation

If n is at least moderately large, the Spearman correlation coefficient may be subjected to the Fisher z transformation by

z = 0.5 ln[(1 + rS)/(1 − rS)],   (19)

and there are tables, e.g. [22, Appendix, pp. 110–111], available to obviate the need to perform this computation. With this transformed value, one may test null hypotheses that ρS equals some value other than zero, i.e. H0: ρS = ρ0 vs. Ha: ρS ≠ ρ0, where ρ0 ≠ 0. This is done via

Z = (z − ζ0)/σz,   (20)

where z is the transform of rS; ζ0 is the transform of the hypothesized coefficient, ρ0; the standard error of z is approximated by

σz = [1.060/(n − 3)]^(1/2)   (21)

[7, 8], and Z is a normal deviate. In this fashion both two-tailed and one-tailed hypotheses may be tested.

Confidence Limits

The z transformation also allows the setting of approximate 1 − α confidence limits for ρS. The confidence limits for the z transformation are

z ± Zα σz,   (22)

where Zα = tα(∞). Then the lower confidence limit of the transformation, L1 = z − Zα σz, is converted to the lower confidence limit of ρS by

[exp(2L1) − 1]/[exp(2L1) + 1],   (23)

and the upper confidence limit of the transformation, L2 = z + Zα σz, is converted to the upper confidence limit of ρS by substituting L2 for L1 in (23) above. Published tables, e.g. [22, Appendix, pp. 112–114], execute these conversions.

Power of Testing

For data that meet the normality assumptions of parametric correlation analysis, use of the Spearman method has a relative efficiency of 9/π² = 0.912 compared with the parametric procedure for testing hypotheses about the population correlation coefficient [10]. For other data distributions, the Spearman procedure may perform even better. The power of hypothesis tests for ρS, and the determination of the minimum sample size needed to achieve a desired power, may be approximated by an adaptation of the procedures of Cohen [4, p. 546], as shown by Zar [22, pp. 379–380, 392].
Other Rank Correlation Measures

The Kendall rank correlation coefficient [11, 12] is the other commonly encountered rank correlation measure (see Rank Correlation). It is often referred to as Kendall's tau, with the population parameter designated as τ and the sample estimate of τ denoted as τ̂, t, T, or (unfortunately) τ. Whereas τ̂ is an unbiased estimate of τ, rS is a biased estimate of ρS, with E(rS) = [3τ + (n − 2)ρS]/(n + 1) [6], but this bias disappears rapidly as n increases.

The two rank-correlation procedures have different underlying premises and influences (e.g. rS is more affected by larger di s), so they do not necessarily yield identical coefficients, τ̂ and rS; indeed, data sets may have the same τ̂ s yet different rS s. However, there is a very strong correlation between the two coefficients, and each may range between −1.0 and 1.0. Daniels [5] found the relationship

−(n − 2) ≤ 3nτ̂ − 2(n + 1)rS ≤ (n − 2),

which, for large n, is

−1 ≤ 3τ̂ − 2rS ≤ 1.

A better relationship was proved by Durbin & Stuart [6] to be

[3nτ̂ − (n − 2)] / [2(n + 1)] ≤ rS ≤ 1 − [(1 − τ̂)/(2(n + 1))] × [(n − 1)(1 − τ̂) + 4].

Whether rS or τ̂ is preferable depends upon the criteria employed to make the judgment; Chow et al. [3] judged rS to be the preferable estimator.

Spearman's [19, 20] introduction of correlation between ranks was accompanied by a correlation measure of which he was fond, the "Spearman footrule", based upon Σ|R(Xi) − R(Yi)| instead of Σ[R(Xi) − R(Yi)]². This measure is less useful than rS in statistical analysis and is no longer encountered.

If one's interest is predominantly in the correlation among the largest (or smallest) members in the two populations, then the weighted rank correlation concept [18, 16; see also 22, pp. 392–395] might usefully be employed.

The Spearman rank correlation coefficient, rS, is related to the Kendall coefficient of concordance, W, when there are two sets of ranks, as

W = (rS + 1)/2.   (24)

Basler [1] discusses a relationship between Σdi² and the chi-square test statistic in a fourfold contingency table with ordinal marginal categories (see Two-by-Two Table).

References

[1] Basler, H. (1988). Equivalence between tie-corrected Spearman test and a chi-square test in a fourfold contingency table, Metrika 35, 203–209.
[2] Cacoullos, T. (1965). A relation between the t and F distributions, Journal of the American Statistical Association 60, 528–531.
[3] Chow, B., Miller, J.E. & Dickinson, P.E. (1974). Extensions of Monte Carlo comparison of some properties of two rank correlation coefficients in a small sample, Journal of Statistical Computation and Simulation 3, 189–195.
[4] Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd Ed. Lawrence Erlbaum, Hillsdale.
[5] Daniels, H.E. (1950). Rank correlation and population models, Journal of the Royal Statistical Society, Series B 12, 171–181.
[6] Durbin, J. & Stuart, A. (1951). Inversions and rank correlation coefficients, Journal of the Royal Statistical Society, Series B 13, 303–309.
[7] Fieller, E.C., Hartley, H.O. & Pearson, E.S. (1957). Tests for rank correlation coefficients, Biometrika 44, 470–481.
[8] Fieller, E.C., Hartley, H.O. & Pearson, E.S. (1961). Tests for rank correlation coefficients. II, Biometrika 48, 29–40.
[9] Gibbons, J.D. & Chakraborti, S. (1992). Nonparametric Statistical Inference, 3rd Ed. Marcel Dekker, New York.
[10] Hotelling, H. & Pabst, M.R. (1936). Rank correlation and tests of significance involving no assumption of normality, Annals of Mathematical Statistics 7, 29–43.
[11] Kendall, M.G. (1938). A new measure of rank correlation, Biometrika 30, 81–93.
[12] Kendall, M.G. (1962). Rank Correlation Methods, 3rd Ed. Charles Griffin, London.
[13] Kruskal, W.H. (1958). Ordinal measures of association, Journal of the American Statistical Association 53, 814–861.
[14] Lovie, A.D. (1995). Who discovered Spearman's rank correlation?, British Journal of Mathematical and Statistical Psychology 48, 255–269.
[15] Meddis, R. (1984). Statistics Using Ranks: A Unified Approach. Basil Blackwell, Oxford.
[16] Quade, D. & Salama, I. (1992). A survey of weighted rank correlation, in Order Statistics and Nonparametrics: Theory and Applications, P.K. Sen & I. Salama, eds. Elsevier, New York, pp. 213–224.
[17] Ramsey, P.H. (1988). Critical values for Spearman's rank order correlation, Journal of Educational Statistics 14, 245–253.
[18] Salama, I. & Quade, D. (1982). A nonparametric comparison of two multiple regressions by means of a weighted measure of correlation, Communications in Statistics – Theory and Methods 11, 1185–1195.
[19] Spearman, C. (1904). The proof and measurement of correlation between two things, American Journal of Psychology 15, 72–101.
[20] Spearman, C. (1906). "Footrule" for measuring correlation, British Journal of Psychology 2, 89–108.
[21] Thomas, G.E. (1989). A note on correcting for ties with Spearman's ρ, Journal of Statistical Computation and Simulation 31, 37–40.
[22] Zar, J.H. (1996). Biostatistical Analysis, 3rd Ed. Prentice-Hall, Upper Saddle River.

JERROLD H. ZAR