Algorithms for Computing the Sample Variance: Analysis and Recommendations
Author(s): Tony F. Chan, Gene H. Golub, Randall J. LeVeque
Source: The American Statistician, Vol. 37, No. 3 (Aug., 1983), pp. 242-247
Published by: American Statistical Association
The problem of computing the variance of a sample of N data points $\{x_i\}$ may be difficult for certain data sets, particularly when N is large and the variance is small. We present a survey of possible algorithms and their round-off error bounds, including some new analysis for computations with shifted data. Experimental results confirm these bounds and illustrate the dangers of some algorithms. Specific recommendations are made as to which algorithm should be used in various contexts.

KEY WORDS: Variance; Standard deviation; Shifted data; Round-off errors; Computer algorithms.

*Tony F. Chan is Assistant Professor, Department of Computer Science, Yale University, New Haven, CT 06520. Gene H. Golub is Professor and Chairman, Department of Computer Science, Stanford University, Stanford, CA 94305. Randall J. LeVeque is Research Fellow, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012. This work was supported in part by DOE Contract DE-AC02-81ER10996, Army Contract DAAG29-78-G-0179, and by National Science Foundation and Hertz Foundation graduate fellowships. The article was produced using TEX, a computer typesetting system created by Donald Knuth at Stanford.

1. INTRODUCTION

The problem of computing the variance of a sample of N data points $\{x_i\}$ is one that seems, at first glance, to be almost trivial but can in fact be quite difficult, particularly when N is large and the variance is small. The fundamental calculation consists of computing the sum of squares of the deviations from the mean,

$$S = \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \text{where} \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{1.1}$$

The sample variance is then $S/N$ or $S/(N-1)$ depending on the application. The formulas (1.1) define a straightforward algorithm for computing S. This will be called the standard two-pass algorithm, since it requires passing through the data twice: once to compute $\bar{x}$ and then again to compute S. This may be undesirable in many applications, for example when the data sample is too large to be stored in main memory or when the variance is to be calculated dynamically as the data is collected.

To avoid the two-pass nature of (1.1), it is standard practice to manipulate the definition of S into the form

$$S = \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \left( \sum_{i=1}^{N} x_i \right)^2. \tag{1.2}$$

This form is frequently suggested in statistical textbooks and will be called the textbook one-pass algorithm. Unfortunately, although (1.2) is mathematically equivalent to (1.1), numerically it can be disastrous. The quantities $\sum x_i^2$ and $(1/N)(\sum x_i)^2$ may be very large in practice and will generally be computed with some rounding error. If the variance is small, these numbers should cancel out almost completely in the subtraction of (1.2). Many (or all) of the correctly computed digits will cancel, leaving a computed S with a possibly unacceptable relative error. The computed S can even be negative, a blessing in disguise since this at least alerts the programmer that disastrous cancellation has occurred.

To avoid these difficulties, several alternative one-pass algorithms have been introduced. These include the updating algorithms of Youngs and Cramer (1971), Welford (1962), West (1979), Hanson (1975), and Cotton (1975), and the pairwise algorithm of the present authors (Chan, Golub, and LeVeque 1979). In describing these algorithms we use the notation $T_{ij}$ and $M_{ij}$ to denote the sum and the mean of the data points $x_i$ through $x_j$, respectively,

$$T_{ij} = \sum_{k=i}^{j} x_k, \qquad M_{ij} = \frac{1}{j-i+1} T_{ij},$$

and $S_{ij}$ to denote the sum of squares

$$S_{ij} = \sum_{k=i}^{j} (x_k - M_{ij})^2.$$
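The cancellation problem in the textbook formula (1.2) can be seen in a small experiment. The following is a minimal sketch in double precision, not taken from the paper: the function names and the four data values (a mean near $10^9$ with exact sum of squares S = 90) are assumptions chosen for illustration.

```python
def two_pass(xs):
    """Standard two-pass algorithm (1.1): returns S, the sum of squared deviations."""
    n = len(xs)
    mean = sum(xs) / n                      # first pass: the mean
    return sum((x - mean) ** 2 for x in xs)  # second pass: squared deviations


def textbook(xs):
    """Textbook one-pass algorithm (1.2): sum(x^2) - (sum(x))^2 / N."""
    n = len(xs)
    sum_x = 0.0
    sum_sq = 0.0
    for x in xs:
        sum_x += x
        sum_sq += x * x                      # each square is about 1e18 here
    return sum_sq - sum_x * sum_x / n        # subtraction of two huge, nearly equal numbers


# Illustrative data (an assumption): large mean, small spread, exact S = 90.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("two-pass S:", two_pass(data))
print("textbook S:", textbook(data))
```

For this data the two-pass result is exact, while the textbook result is badly wrong because the squares near $10^{18}$ cannot be represented to the nearest unit in double precision.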

For computing an unweighted sum of squares, as we consider here, the algorithms of Welford, West, and Hanson are virtually identical and are based on the updating formulas

$$M_{1,j} = M_{1,j-1} + \frac{1}{j} (x_j - M_{1,j-1}) \tag{1.3a}$$

$$S_{1,j} = S_{1,j-1} + (j-1)(x_j - M_{1,j-1}) \left( \frac{x_j - M_{1,j-1}}{j} \right) \tag{1.3b}$$

with $M_{1,1} = x_1$ and $S_{1,1} = 0$. The desired value of S is ultimately obtained as $S_{1,N}$. The updating formulas of Youngs and Cramer are similar:

$$T_{1,j} = T_{1,j-1} + x_j \tag{1.4a}$$

$$S_{1,j} = S_{1,j-1} + \frac{1}{j(j-1)} (j x_j - T_{1,j})^2 \tag{1.4b}$$

with $T_{1,1} = x_1$ and $S_{1,1} = 0$.
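The updating formulas (1.3) and (1.4) translate almost line for line into code. The sketch below is one possible rendering, assuming the data arrive as any Python iterable of numbers; the function names are illustrative and not from the paper.

```python
def welford_update(xs):
    """Updating formulas (1.3): returns (M_{1,N}, S_{1,N})."""
    it = iter(xs)
    mean = float(next(it))   # M_{1,1} = x_1
    s = 0.0                  # S_{1,1} = 0
    for j, x in enumerate(it, start=2):
        delta = x - mean                     # x_j - M_{1,j-1}
        s += (j - 1) * delta * (delta / j)   # (1.3b), always a nonnegative increment
        mean += delta / j                    # (1.3a)
    return mean, s


def youngs_cramer_update(xs):
    """Updating formulas (1.4): returns (T_{1,N}, S_{1,N})."""
    it = iter(xs)
    t = float(next(it))      # T_{1,1} = x_1
    s = 0.0                  # S_{1,1} = 0
    for j, x in enumerate(it, start=2):
        t += x                                  # (1.4a)
        s += (j * x - t) ** 2 / (j * (j - 1))   # (1.4b), always a nonnegative increment
    return t, s


sample = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]  # illustrative data, exact S = 90
print(welford_update(sample))
print(youngs_cramer_update(sample))
```

Both functions consume the data once and keep only two running quantities, so they can be used when the data are generated on the fly.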

These two algorithms have similar numerical behavior and are more stable than the textbook algorithm. Note, in particular, that with both of these algorithms $S = S_{1,N}$ is computed as the sum of nonnegative quantities. Cotton's update is no more stable than the textbook algorithm and should not be used (see Chan and Lewis 1979).

The updating formulas (1.4) can be generalized to allow us to combine two samples of arbitrary size. Suppose we have two samples $\{x_i\}_{i=1}^{m}$, $\{x_i\}_{i=m+1}^{m+n}$ and we know

$$T_{1,m} = \sum_{i=1}^{m} x_i, \qquad T_{m+1,m+n} = \sum_{i=m+1}^{m+n} x_i,$$

$$S_{1,m} = \sum_{i=1}^{m} \left( x_i - \frac{1}{m} T_{1,m} \right)^2, \qquad S_{m+1,m+n} = \sum_{i=m+1}^{m+n} \left( x_i - \frac{1}{n} T_{m+1,m+n} \right)^2.$$

Then, if we combine all of the data into a sample of size m + n, we have

$$T_{1,m+n} = T_{1,m} + T_{m+1,m+n} \tag{1.5a}$$

$$S_{1,m+n} = S_{1,m} + S_{m+1,m+n} + \frac{m}{n(m+n)} \left( \frac{n}{m} T_{1,m} - T_{m+1,m+n} \right)^2. \tag{1.5b}$$

When m = n this reduces to

$$S_{1,2m} = S_{1,m} + S_{m+1,2m} + \frac{1}{2m} \left( T_{1,m} - T_{m+1,2m} \right)^2. \tag{1.6}$$

This formula forms the basis of the pairwise algorithm.

The pairwise algorithm for computing the sum of N numbers is well known and can be described recursively by stating that $T_{1,2m}$ shall be computed as

$$T_{1,2m} = T_{1,m} + T_{m+1,2m}$$

with each of the sums on the right side computed in a similar manner. Formula (1.6) defines the analogous pairwise algorithm for computing the variance. This can be implemented in a one-pass manner using only $O(\log N)$ internal storage locations as discussed in Chan, Golub, and LeVeque (1979) and also by Nash (1981). All logarithms in this article are base 2. It can easily be shown that the use of the pairwise summation algorithm reduces relative errors in $T_{1,N}$ from $O(N)$ to $O(\log N)$ as $N \to \infty$. The pairwise variance algorithm can be expected to have the same advantage, as is confirmed numerically. Incidentally, pairwise summation can be used in implementing (1.1) (both in computing $\bar{x}$ and in forming S) or (1.2) with similar benefits.

Other devices can also be used to increase the accuracy of the computed S. For data with a large mean value $\bar{x}$, experience has shown that substantial gains in accuracy can be achieved by shifting all of the data by some approximation to $\bar{x}$ before attempting to compute S. Even a crude estimate of $\bar{x}$ can yield dramatic improvements in accuracy, so we need not resort to a two-pass algorithm in order to first estimate $\bar{x}$. This is discussed in detail in Section 3. However, when the shift is the computed mean and the textbook algorithm (1.2) is then applied to the shifted data, one obtains the corrected two-pass algorithm

$$S = \sum_{i=1}^{N} (x_i - \bar{x})^2 - \frac{1}{N} \left( \sum_{i=1}^{N} (x_i - \bar{x}) \right)^2. \tag{1.7}$$

Here the first term is simply the two-pass algorithm (1.1). The second term would be zero in exact computation but in practice is a good approximation to the error in the first term. Note that in this case use of the textbook algorithm does not lead to catastrophic cancellation, since the correction is generally much smaller than the first term. This algorithm was first pointed out to the authors by Professor A. Björck (1978), who suggested this correction term based solely on the error analysis of the two-pass algorithm (Chan, Golub, and LeVeque 1979). An alternative (and improved) error analysis is given in Section 3.

Initially algorithms for computing the variance were judged solely on the basis of empirical studies (Hanson 1975, West 1979, and Youngs and Cramer 1971). More recently rigorous error bounds have been obtained for many algorithms (Chan, Golub, and LeVeque 1979; Chan and Lewis 1978, 1979). Our aim here is to present a unified survey of error analyses for the previously mentioned algorithms and techniques. Some of this material is believed to be new, particularly the investigation into the effects of shifting the data. Based on this survey, specific recommendations will be made as to which algorithm should be used in various contexts.

2. CONDITION NUMBERS AND ERROR ANALYSIS

Chan and Lewis (1978) first derived the condition number, $\kappa$, of a sample $\{x_i\}$ (with respect to computing the variance). This condition number measures the sensitivity of S for the given data set. If relative errors of size $\gamma$ are introduced into the $x_i$, then the relative change in S is bounded by $\kappa\gamma$. Chan and Lewis showed this to be true up to $O(\gamma^2)$. In fact it is strictly true, as noted by van Ness (1979). Physical data almost always has some uncertainty in it, and this uncertainty will be magnified by the factor $\kappa$ in S. If nothing else, errors are introduced in representing the data on the computer, and so a value of S computed on a computer with machine accuracy u may have relative errors as large as $\kappa u$ regardless of what algorithm is used. This value $\kappa u$ can be used as a yardstick by which to judge the accuracy of the various algorithms, especially since error bounds that are functions solely of $\kappa$, u, and N can often be derived.

If we define the 2-norm of the data by

$$\|x\|_2^2 = \sum_{i=1}^{N} x_i^2,$$

then the condition number for this problem is given by

$$\kappa = \frac{\|x\|_2}{\sqrt{S}} = \sqrt{1 + \bar{x}^2 N / S}. \tag{2.1}$$
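As a small numerical check of (2.1), the sketch below computes $\kappa$ both ways and evaluates the yardstick $\kappa u$. It assumes S > 0, obtains S from the two-pass formula (1.1), and takes u to be the double-precision unit roundoff; the function names and that choice of u are illustrative assumptions, not part of the paper.

```python
import math
import sys


def sum_sq_dev(xs):
    """S from the two-pass formula (1.1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def condition_number(xs):
    """kappa = ||x||_2 / sqrt(S), the first form of (2.1). Assumes S > 0."""
    norm_sq = sum(x * x for x in xs)
    return math.sqrt(norm_sq / sum_sq_dev(xs))


def condition_number_from_mean(xs):
    """kappa = sqrt(1 + mean^2 * N / S), the second form of (2.1). Assumes S > 0."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(1.0 + mean * mean * n / sum_sq_dev(xs))


u = sys.float_info.epsilon / 2   # assumed machine accuracy: double-precision unit roundoff

xs = [1.0, 2.0, 3.0, 4.0]        # S = 5, ||x||^2 = 30, so kappa = sqrt(6)
kappa = condition_number(xs)
print(kappa, condition_number_from_mean(xs), kappa * u)
```

The two forms agree in exact arithmetic because $\sum x_i^2 = S + N\bar{x}^2$.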


When S is small and $\bar{x}$ is not close to zero we obtain the useful approximation

$$\kappa \approx \bar{x} \sqrt{N/S} \qquad (\text{for } S \text{ small, } \bar{x} \text{ nonzero}), \tag{2.2}$$

which is the mean divided by the standard deviation. We always have $\kappa \ge 1$, and in many situations $\kappa$ is very large.

Table 1 shows the asymptotic error bounds for the algorithms discussed. These are bounds on the relative error $|(S - \hat{S})/S|$ in the computed value $\hat{S}$. Small constant multipliers have been dropped, for clarity. Higher-order terms have also been dropped, but the terms shown dominate the error bounds whenever the relative error is less than 1. The bounds for the textbook algorithm and West's updating are derived by Chan and Lewis (1978). The two-pass error bound including the $N^2\kappa^2u^2$ term (which can dominate in practice) is derived in Chan, Golub, and LeVeque (1979). Bounds for these algorithms using pairwise summation can be found similarly. The pairwise variance algorithm bound is a conjecture based on the form of the error bound for Youngs and Cramer updating and experimental results. The error analysis for the corrected two-pass algorithm is given in Section 3.

Graphs of these bounds are shown in Figures 1 through 8 along with some experimental results. Each plot has $\kappa$ on the abscissa and the relative error in S on the ordinate. The lower curve in each figure shows the error bound for N = 64, the upper curve for N = 4096.

Table 1. Error Bounds for the Relative Error $|(S - \hat{S})/S|$ in the Computed Value $\hat{S}$. Only the Dominant Terms are Shown, and Small Constant Factors Have Been Suppressed for Clarity

Algorithm                                            Error Bound
1. textbook                                          $N\kappa^2 u$
2. textbook with pairwise summation                  $\kappa^2 u \log N$
3. two-pass                                          $Nu + N^2\kappa^2u^2$
4. two-pass with pairwise summation                  $u \log N + (\kappa u \log N)^2$
5. corrected two-pass                                $Nu + N^3\kappa^2u^3$
6. corrected two-pass with pairwise summation        $u \log N + \kappa^2 u^3 \log^3 N$
7. updating                                          $N\kappa u$
8. pairwise                                          $\kappa u \log N$ (conjectured)

[Figures 1 through 8 (plots not reproduced): relative error in S versus $\kappa$ on logarithmic axes, showing the error-bound curves for N = 64 and N = 4096 together with the experimental points + and ×. Captions: Figure 1. Textbook Algorithm; Figure 2. Textbook Algorithm With Pairwise Summation; Figure 3. Two-Pass Algorithm; Figure 4. Two-Pass Algorithm With Pairwise Summation; Figure 5. Corrected Two-Pass Algorithm; Figure 6. Corrected Two-Pass Algorithm With Pairwise Summation; Figure 8. Pairwise Algorithm.]

The numerical experiments were performed on an IBM 3081 computer at the Stanford Linear Accelerator Center. The data used were provided by a normal random number generator with mean 1 and a variety of different variances $1 \ge \sigma^2 \ge 10^{-13}$. For this choice of the mean, $\kappa \approx 1/\sigma$ (see (2.2)). In each case the results have been averaged over 20 runs. Single precision was used in all of the tests, with machine accuracy $u \approx 5 \times 10^{-7}$. The "correct" answer for use in computing the error was calculated in double precision. The resulting errors are denoted in the figures by the symbols + (for N = 64) and × (for N = 4096).

The experimental results confirm the general form of the error bounds given in Table 1. In particular the graphs for the two-pass algorithms show how the higher-order terms (such as $N^2\kappa^2u^2$) begin to dominate the error at fairly modest values of $\kappa$.

3. COMPUTATIONS WITH SHIFTED DATA

If we replace the original data $\{x_i\}$ by shifted data $\{\tilde{x}_i\}$ defined by

$$\tilde{x}_i = x_i - d \tag{3.1}$$

for some fixed shift d, then the new data has mean $\bar{x} - d$ and S remains unchanged (assuming the $\tilde{x}_i$ are computed exactly). In practice, data with a nonzero mean is frequently shifted by some a priori estimate of the mean before attempting to compute S. This will generally increase the accuracy of the computed S. We analyze this improvement by investigating the dependence of the condition number on the shift. Bounds on $\tilde{\kappa}$, the condition number of the shifted data, are derived for various choices of the shift d. These can then be inserted in place of $\kappa$ in the bounds of Table 1 to obtain error bounds for each of the algorithms with shifted data.

From the definition of the condition number we have

$$\tilde{\kappa}^2 = 1 + \frac{N}{S} (\bar{x} - d)^2. \tag{3.2}$$

Comparing this with (2.1) we see that $\tilde{\kappa} < \kappa$ whenever $|d - \bar{x}| < |\bar{x}|$, that is, whenever d lies between 0 and $2\bar{x}$. Taking $d = \bar{x}$ gives perfectly conditioned data, $\tilde{\kappa} = 1$. In practice we cannot compute $\bar{x}$ exactly and usually will not even attempt to compute it (except when using a two-pass algorithm). Instead, we use some rough estimate that is easily computed without a separate pass through all of the data.

Frequently a shift d is obtained by simply "eyeballing" the data. Such a technique might be expected to yield an approximation d that is within a few standard deviations of the mean. This is sufficient to give completely satisfactory bounds on $\tilde{\kappa}$. Recall that the standard deviation is $(S/N)^{1/2}$ and suppose that $|\bar{x} - d| \le p (S/N)^{1/2}$ for some small p. Then (3.2) gives

$$\tilde{\kappa}^2 \le 1 + p^2. \tag{3.3}$$

For example, if d is within one standard deviation of the mean then $\tilde{\kappa} \le \sqrt{2}$. This result is completely independent of S and N.

It is not always possible to obtain an approximation in this manner, nor is it always valid to make such an assumption on its accuracy. Another bound on $\tilde{\kappa}$ can be easily obtained by assuming only that $\min_i x_i \le d \le \max_i x_i$.

This is easily guaranteed, for example by choosing one of the data points as the shift. When $\min_i x_i \le d \le \max_i x_i$, we have $(\bar{x} - d)^2 \le \sum_i (\bar{x} - x_i)^2 = S$ and so from (3.2),

$$\tilde{\kappa}^2 \le 1 + N. \tag{3.4}$$

This bound is not as satisfactory as (3.3), but for moderate values of N it may be sufficient to guarantee acceptable errors in S.

For the case in which we shift by a single data point, $d = x_j$ for some j, we can obtain some interesting probabilistic refinements of (3.4). Equality in (3.4) is unattainable and approximate equality holds only when

$$(\bar{x} - x_j)^2 \approx \sum_{i} (\bar{x} - x_i)^2,$$

that is, only when $x_j$ lies considerably farther from $\bar{x}$ than do any of the other $x_i$. If $x_j$ is picked at random from the sample $\{x_i\}$, then the expected value of $\tilde{\kappa}^2$ will be much smaller than 1 + N. In fact, since $E[(\bar{x} - x_j)^2] = S/N$ (the definition of the sample variance), we have from (3.2) that

$$E[\tilde{\kappa}^2] = 2 \tag{3.5}$$

independent of N and S. Note that this is also independent of the underlying distribution of the $\{x_i\}$. We assumed only that $x_j$ was chosen from $\{x_i\}$ with a uniform distribution. Alternatively we could choose the data value with a fixed index, say $x_1$, and assume that the data is ordered randomly. This may not be a valid assumption if, for example, initial transients are present in the data.

Improved upper bounds of the form (3.4) that hold with probability close to 1 can also be obtained probabilistically. For fixed k, $1 \le k \le N$, the inequality

$$(\bar{x} - x_i)^2 \ge kS/N$$

can hold for at most N/k values of i. Otherwise we would have $\sum_i (\bar{x} - x_i)^2 > (N/k)(kS/N) = S$. Thus if $x_j$ is chosen at random, there is a probability of at least $(N - N/k)/N = 1 - 1/k$ that $(\bar{x} - x_j)^2 < kS/N$. It follows that

$$\tilde{\kappa}^2 < 1 + k \quad \text{with probability at least } 1 - 1/k \text{ for } 1 \le k \le N. \tag{3.6}$$

If $N \ge 100$ we have, for example, $\tilde{\kappa}^2 < 101$ with probability .99. This is again independent of N and S when the shift $x_j$ is chosen at random from the sample.

We can generalize this choice of d by using the average of some p data points, $p \ll N$. This average will be denoted by $\bar{x}_p = \sum x_j / p$, the sum being over the chosen p data points. We assume that p is sufficiently small that rounding errors in computing $\bar{x}_p$ can be ignored. Specifically this requires $\kappa p u \ll 1$. The condition number corresponding to this shift is bounded by using Cauchy's inequality,

$$\tilde{\kappa}^2 \le 1 + \frac{N}{p}. \tag{3.7}$$

For p = 1 this reduces to (3.4). We note that the resulting algorithm can be very easily implemented on a pocket scientific calculator, with great potential for accuracy improvement.

We now consider the case in which the computed mean is used as the shift. In general we cannot ignore rounding errors in computing $\bar{x}$. Instead we compute some approximate floating point value $fl(\bar{x})$, given by

$$fl(\bar{x}) = \frac{1}{N} \sum_{i=1}^{N} x_i (1 + \xi_i), \tag{3.8}$$

where the $\xi_i$ are bounded by

$$|\xi_i| \le Nu \tag{3.9}$$

when the usual (forward) summation is used. If pairwise summation is used, the N can be replaced by log N. Now we can bound $\tilde{\kappa}^2$ by

$$\tilde{\kappa}^2 = 1 + \frac{N}{S} (\bar{x} - fl(\bar{x}))^2 = 1 + \frac{1}{NS} \left( \sum_{i=1}^{N} x_i \xi_i \right)^2 \le 1 + \frac{1}{NS} \|x\|_2^2 \|\xi\|_2^2 \le 1 + \kappa^2 \|\xi\|_\infty^2. \tag{3.10}$$

Here we have used (2.1) and the general inequality $\|\xi\|_2^2 \le N \|\xi\|_\infty^2$, where $\|\xi\|_\infty = \max_i |\xi_i|$. Using (3.9) we can rewrite (3.10) as

$$\tilde{\kappa}^2 \le 1 + N^2\kappa^2u^2. \tag{3.11}$$

Note that owing to the dependence on $\kappa$, the bound (3.11) may be worse than the bounds obtained for more primitive estimates of d. This reflects situations that can actually occur in practice. One can easily construct examples where the computed mean does not even lie between $\min x_i$ and $\max x_i$ and hence $(\bar{x} - fl(\bar{x}))^2$ is larger than $\max_i (\bar{x} - x_i)^2$. In this case one is better off shifting by any single data point than by the computed mean.

Of course shifting by the computed mean may also be an undesirable choice from the standpoint of efficiency, since it requires a separate pass through the data to compute $fl(\bar{x})$. Nonetheless, when a two-pass algorithm is acceptable and $N^2\kappa^2u^2$ is small (< 1, say), this shift followed by a one-pass algorithm provides a very dependable method for computing S. The corrected two-pass algorithm (1.7) is of this form; it consists of the textbook algorithm on data shifted by $fl(\bar{x})$. Its error bound $Nu(1 + N^2\kappa^2u^2)$ is easily derived from (3.11) and the textbook algorithm bounds of Table 1.

Other one-pass algorithms could also be used in conjunction with a shift by the computed mean. However, if a good shift has been chosen so that $\tilde{\kappa} \approx 1$, all one-pass algorithms are essentially equivalent with a bound Nu (or u log N for algorithms using pairwise summations). Since the textbook algorithm is the most efficient one-pass algorithm (requiring only N multiplications and 2N additions as opposed to 4N multiplications and 3N additions for the updating algorithms, for example), it is the method of choice except in rare instances.

4. RECOMMENDATIONS

The results of the previous sections provide a basis for making an intelligent choice of algorithm for accurately computing the sample variance. First we note that if a parallel processor is available, the data can be split up into smaller samples and the sum of squares computed for each sample individually. These can then be combined, and the global sum of squares computed, by using the updating formulas (1.5). In that case the considerations that follow apply for each processor.

There is one situation in which the textbook algorithm (1.2) can be recommended as it stands. If the data consist only of integers, small enough that no overflows occur, then (1.2) should be used with the sums computed in integer arithmetic. In this case no roundoff errors occur until the final step of combining the two sums, in which a division by N occurs.

For nonintegral data we must first decide whether to use a one-pass or a two-pass algorithm. If all of the data fit in high-speed memory and we are not interested in dynamically updating the variance as new data are collected, then a two-pass algorithm is probably acceptable and the corrected two-pass algorithm (1.7) is recommended. If N is large and high accuracy is needed, it may be worthwhile to use pairwise summation in implementing this algorithm.

If a one-pass algorithm is to be used, the first step is to shift the data as well as possible, perhaps by some $\bar{x}_p$ as discussed in Section 3.
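The two options just described, the corrected two-pass algorithm (1.7) and the textbook algorithm (1.2) applied to data shifted by some $\bar{x}_p$, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the shift is taken as the mean of the first p points, which presumes the data are in random order.

```python
def corrected_two_pass(xs):
    """Corrected two-pass algorithm (1.7): textbook formula on data shifted by the computed mean."""
    n = len(xs)
    mean = sum(xs) / n            # first pass: fl(mean)
    sum_dev = 0.0
    sum_dev_sq = 0.0
    for x in xs:                  # second pass
        d = x - mean
        sum_dev += d              # zero in exact arithmetic
        sum_dev_sq += d * d       # plain two-pass term (1.1)
    return sum_dev_sq - sum_dev * sum_dev / n


def shifted_textbook(xs, p=1):
    """Textbook formula (1.2) on data shifted by the mean of the first p points.

    Assumption: the first p points are a fair sample of the data.
    """
    n = len(xs)
    if not 1 <= p <= n:
        raise ValueError("p must satisfy 1 <= p <= len(xs)")
    shift = sum(xs[:p]) / p       # the shift d
    sum_x = 0.0
    sum_sq = 0.0
    for x in xs:
        d = x - shift
        sum_x += d
        sum_sq += d * d
    return sum_sq - sum_x * sum_x / n


values = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]  # illustrative data, exact S = 90
print(corrected_two_pass(values), shifted_textbook(values, p=1), shifted_textbook(values, p=2))
```

Even the crude shift p = 1 removes the large mean, so the squares that are accumulated stay small and the final subtraction no longer cancels catastrophically.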
(The probabilistic estimates may be subsequently verified using much tighter a posteriori bounds provided as a by-product of the computation.) Now an appropriate one-pass algorithm must be chosen. We should first estimate $\tilde{\kappa}$, the condition number of the shifted data, perhaps by one of the bounds of Section 3. If $N\tilde{\kappa}^2u$, the error bound for the textbook algorithm, is at least as small as the desired relative accuracy, then the textbook algorithm can be used on the shifted data. If this bound is too large, we should resort to a less efficient algorithm for safety. The dependence on N can be reduced by the use of pairwise summation. The dependence on $\tilde{\kappa}$ can be reduced by using an updating algorithm. The use of the pairwise algorithm should reduce both of these factors. When N is a power of 2 the pairwise algorithm is fairly easy to implement and requires only 2N multiplications and 4N additions, which is better than the updating algorithms. For general N slightly more work (particularly human work) is required, making it less attractive.
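A recursive version of the pairwise algorithm can be built directly on the combining formulas (1.5), which reduce to (1.6) when the two halves have equal size. The sketch below is a simplification: it assumes the whole sample is held in a list and splits it in half recursively, so it is not the one-pass $O(\log N)$-storage implementation cited in the text. The names `combine` and `pairwise_stats` are illustrative.

```python
def combine(m, t1, s1, n, t2, s2):
    """Merge (count, sum T, sum of squares S) of two samples using (1.5a) and (1.5b)."""
    t = t1 + t2
    s = s1 + s2 + (m / (n * (m + n))) * ((n / m) * t1 - t2) ** 2
    return m + n, t, s


def pairwise_stats(xs, lo=0, hi=None):
    """Return (N, T_{1,N}, S_{1,N}) for xs[lo:hi] by recursive halving."""
    if hi is None:
        hi = len(xs)
    if hi - lo < 1:
        raise ValueError("need at least one data point")
    if hi - lo == 1:
        return 1, float(xs[lo]), 0.0          # a single point has S = 0
    mid = (lo + hi) // 2                      # both halves are nonempty
    left = pairwise_stats(xs, lo, mid)
    right = pairwise_stats(xs, mid, hi)
    return combine(*left, *right)


count, total, s = pairwise_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(count, total, s)
```

The same `combine` step can merge partial results computed independently on separate chunks of data, which is the parallel use of (1.5) mentioned at the start of this section.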

[Figure 9 (flowchart not reproduced). Decision nodes: integral data?; two-pass acceptable?; $N\tilde{\kappa}^2u$, $\tilde{\kappa}^2u \log N$, and $N\tilde{\kappa}u$ sufficiently small? Outcomes: textbook (unshifted); two-pass (perhaps with pairwise summation); shift as well as possible and estimate $\tilde{\kappa}$; textbook; textbook with pairwise summation; updating algorithm; pairwise algorithm.]

Figure 9. Decision Procedure for Choosing an Algorithm to Compute the Variance. For Details see the Recommendations section

The decision procedure just described is shown graphically in Figure 9.
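The decision procedure can also be written out as a short function. The sketch below is one reading of the recommendations and of the error bounds in Table 1, not a transcription of the figure: the function name, its parameters, the tolerance `tol` (the desired relative accuracy), and the order in which the one-pass bounds are tested are all assumptions made for illustration.

```python
import math


def choose_algorithm(integral, two_pass_ok, n, kappa_shifted, u, tol):
    """Suggest an algorithm for computing S.

    integral      -- True if the data are integers small enough to avoid overflow
    two_pass_ok   -- True if a second pass over the data is acceptable
    n             -- sample size N
    kappa_shifted -- estimated condition number of the shifted data
    u             -- machine accuracy
    tol           -- desired relative accuracy (hypothetical parameter)
    """
    if integral:
        return "textbook (unshifted, integer arithmetic)"
    if two_pass_ok:
        return "corrected two-pass (perhaps with pairwise summation)"
    k = kappa_shifted
    if n * k * k * u <= tol:                  # textbook bound
        return "textbook on shifted data"
    if k * k * u * math.log2(n) <= tol:       # textbook with pairwise summation bound
        return "textbook with pairwise summation on shifted data"
    if n * k * u <= tol:                      # updating bound
        return "updating algorithm"
    return "pairwise algorithm"


print(choose_algorithm(False, False, 4096, 1.0, 2.0 ** -24, 1e-3))
```

With a well-chosen shift ($\tilde{\kappa}$ near 1) the first one-pass test already succeeds for moderate N, which matches the remark that the textbook algorithm on shifted data is the method of choice except in rare instances.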
[Received April 1982. Revised June 1982.]

BJÖRCK, A. (1979), personal communication.
CHAN, T.F., GOLUB, G.H., and LeVEQUE, R.J. (1979), "Updating Formulae and a Pairwise Algorithm for Computing Sample Variances," Compstat 1982, Proceedings of the 5th Symposium held at Toulouse, eds. H. Caussinus et al., 30-41.
CHAN, T.F.C., and LEWIS, J.G. (1978), "Rounding Error Analysis of Algorithms for Computing Means and Standard Deviations," Technical Report No. 284, The Johns Hopkins University, Department of Mathematical Sciences.
CHAN, T.F.C., and LEWIS, J.G. (1979), "Computing Standard Deviations: Accuracy," Communications of the Association for Computing Machinery, 22, 526-531.
COTTON, E.W. (1975), Remark on "Stably Updating Mean and Standard Deviation of Data," Communications of the Association for Computing Machinery, 18, 458.
HANSON, R.J. (1975), "Stably Updating Mean and Standard Deviation of Data," Communications of the Association for Computing Machinery, 18, 57-58.
NASH, J.C. (1981), "Fundamental Statistical Calculations," Interface Age, September, 40-42.
VAN NESS, F. (1979), personal communication.
WELFORD, B.P. (1962), "Note on a Method for Calculating Corrected Sums of Squares and Products," Technometrics, 4, 419-420.
WEST, D.H.D. (1979), "Updating Mean and Variance Estimates: An Improved Method," Communications of the Association for Computing Machinery, 22, 532-535.
YOUNGS, E.A., and CRAMER, E.M. (1971), "Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms," Technometrics, 13, 657-665.
