Standard deviation
Standard deviation is a measure of the spread of data about a mean value. It
describes the dispersion of data on either side of the mean. A low standard
deviation indicates that the data set is clustered around the mean value,
whereas a high standard deviation indicates that the data is widely spread,
with values significantly higher or lower than the mean. The mean is the
arithmetic average.
Formulated by Francis Galton in the late 1860s, the standard deviation
remains the most common measure of statistical dispersion, measuring how
widely spread the values in a data set are. If many data points are close to
the mean, then the standard deviation is small; if many data points are far
from the mean, then the standard deviation is large. If all data values are
equal, then the standard deviation is zero. A useful property of standard
deviation is that, unlike variance, it is expressed in the same units as the
data.

When only a sample of data from a population is available, the population
standard deviation can be estimated by a modified standard deviation of the
sample, explained below.

Definition and calculation

Probability distribution or random variable

The standard deviation of a (univariate) probability distribution is the same
as that of a random variable having that distribution.

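The standard deviation σ of a real-valued random variable X is defined as:

\[ \sigma = \sqrt{\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]} = \sqrt{\operatorname{E}(X^2) - (\operatorname{E}(X))^2} \]

where E(X) is the expected value of X. E(X) is another name for the mean,
and it is often indicated with the Greek letter μ.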

Not all random variables have a standard deviation, since these expected
values need not exist. For example, the standard deviation of a random
variable which follows a Cauchy distribution is undefined because its E(X) is
undefined.

Continuous random variable

Continuous distributions usually give a formula for calculating the standard
deviation as a function of the parameters of the distribution. In general, the
standard deviation of a continuous real-valued random variable X with
probability density function p(x) is

\[ \sigma = \sqrt{\int (x - \mu)^2 \, p(x) \, dx} \]

where

\[ \mu = \int x \, p(x) \, dx \]

and where the integrals are definite integrals taken for x ranging over the
range of X.
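
As a rough numerical check of the integral definition, the sketch below approximates both integrals with a midpoint rule. The function name `std_from_density`, the step count, and the choice of a uniform density on [0, 12] are illustrative assumptions; for that density the closed-form standard deviation is 12/√12.

```python
import math

def std_from_density(pdf, lo, hi, steps=10_000):
    """Approximate the standard deviation of a density on [lo, hi] (midpoint rule)."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    # mu = integral of x * p(x) dx
    mean = sum(x * pdf(x) for x in xs) * h
    # sigma^2 = integral of (x - mu)^2 * p(x) dx
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * h
    return math.sqrt(var)

def uniform_pdf(x):
    """Density of a uniform distribution on [0, 12] (hypothetical example)."""
    return 1.0 / 12.0

approx = std_from_density(uniform_pdf, 0.0, 12.0)
exact = 12.0 / math.sqrt(12.0)
print(approx, exact)
```

The midpoint rule is only one possible choice; any quadrature scheme that covers the range of X would serve the same purpose.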

Discrete random variable or data set

The standard deviation of a discrete random variable is the root-mean-square
(RMS) deviation of its values from the mean.

If the random variable X takes on N values $x_1, \dots, x_N$ (which are real
numbers) with equal probability, then its standard deviation σ can be
calculated as follows:

1. Find the mean, $\bar{x}$, of the values.
2. For each value $x_i$ calculate its deviation ($x_i - \bar{x}$) from the mean.
3. Calculate the squares of these deviations.
4. Find the mean of the squared deviations. This quantity is the variance $\sigma^2$.
5. Take the square root of the variance.

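This calculation is described by the following formula:

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]

where $\bar{x}$ is the arithmetic mean of the values $x_i$, defined as:

\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \]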

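If not all values have equal probability, but the probability of value $x_i$
equals $p_i$, the standard deviation can be computed by:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} p_i (x_i - \bar{x})^2}{\sum_{i=1}^{N} p_i}}, \qquad s = \sqrt{\frac{N' \sum_{i=1}^{N} p_i (x_i - \bar{x})^2}{(N' - 1) \sum_{i=1}^{N} p_i}} \]

where

\[ \bar{x} = \frac{\sum_{i=1}^{N} p_i x_i}{\sum_{i=1}^{N} p_i} \]

and N' is the number of non-zero weight elements.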
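
The probability-weighted case can be sketched as follows. The function name `weighted_std` and the sample values are assumptions chosen for illustration; the weights are normalized by their sum, so they need not add up to 1. Only the population form is shown.

```python
import math

def weighted_std(values, probs):
    """Population standard deviation when value x_i has probability (weight) p_i."""
    total = sum(probs)
    mean = sum(p * x for x, p in zip(values, probs)) / total
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs)) / total
    return math.sqrt(var)

# Hypothetical distribution: 3 and 19 each with probability 1/4, 7 with probability 1/2.
print(weighted_std([3, 7, 19], [0.25, 0.5, 0.25]))  # 6.0
```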

The standard deviation of a data set is the same as that of a discrete random
variable that can assume precisely the values from the data set, where the
point mass for each value is proportional to its multiplicity in the data set.


Suppose we wished to find the standard deviation of the data set consisting
of the values 3, 7, 7, and 19.

Step 1: find the arithmetic mean (average) of 3, 7, 7, and 19,

\[ \frac{3 + 7 + 7 + 19}{4} = 9. \]

Step 2: find the deviation of each number from the mean,

\[ 3 - 9 = -6, \quad 7 - 9 = -2, \quad 7 - 9 = -2, \quad 19 - 9 = 10. \]

Step 3: square each of the deviations, which amplifies large deviations and
makes negative values positive,

\[ (-6)^2 = 36, \quad (-2)^2 = 4, \quad (-2)^2 = 4, \quad 10^2 = 100. \]

Step 4: find the mean of those squared deviations,

\[ \frac{36 + 4 + 4 + 100}{4} = 36. \]

Step 5: take the non-negative square root of the quotient (converting
squared units back to regular units),

\[ \sqrt{36} = 6. \]
So, the standard deviation of the set is 6. This example also shows that, in
general, the standard deviation is different from the mean absolute deviation
(which is 5 in this example).

Note that if the above data set represented only a sample from a larger
population, a modified standard deviation would be calculated (explained
below) to estimate the population standard deviation, which would give
approximately 6.93 for this example.
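
The five steps of the worked example can be sketched in code as follows. The function names are illustrative assumptions; the data set 3, 7, 7, 19 is the one used above, and the sample version divides by N − 1 instead of N.

```python
import math

def population_std(values):
    """Population standard deviation, following the five steps of the example."""
    n = len(values)
    mean = sum(values) / n                     # step 1: arithmetic mean
    deviations = [x - mean for x in values]    # step 2: deviations from the mean
    squared = [d * d for d in deviations]      # step 3: squared deviations
    variance = sum(squared) / n                # step 4: mean of the squares
    return math.sqrt(variance)                 # step 5: non-negative square root

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by N - 1 rather than N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

data = [3, 7, 7, 19]
print(population_std(data))           # 6.0
print(mean_absolute_deviation(data))  # 5.0
print(round(sample_std(data), 2))     # 6.93
```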

Simplification of the formula

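The calculation of the sum of squared deviations can be simplified as follows:

\[ \sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1}^{N} x_i^2 - 2\bar{x} \sum_{i=1}^{N} x_i + N\bar{x}^2 = \sum_{i=1}^{N} x_i^2 - N\bar{x}^2 \]

Applying this to the original formula for standard deviation gives:

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2 - \bar{x}^2} \]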

Estimating population standard deviation from sample standard deviation

In the real world, finding the standard deviation of an entire population is
unrealistic except in certain cases, such as standardized testing, where every
member of a population is sampled. In most cases, the standard deviation is
estimated by examining a random sample taken from the population. Using
the definition given above for a data set and applying it to a small or
moderately sized sample results in an estimate that tends to be too low. The
most common measure used is an adjusted version, the sample standard
deviation, which is defined by

\[ s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]

where $\{x_1, \dots, x_N\}$ is the sample and $\bar{x}$ is the mean of the
sample. The denominator N − 1 is the number of degrees of freedom in the
vector $(x_1 - \bar{x}, \dots, x_N - \bar{x})$.

The reason for this definition is that $s^2$ is an unbiased estimator for the
variance $\sigma^2$ of the underlying population, if that variance exists and
the sample values are drawn independently with replacement. However, s is not
an unbiased estimator for the standard deviation σ; it tends to
underestimate the population standard deviation. Although an unbiased
estimator for σ is known when the random variable is normally distributed,
the formula is complicated and amounts to a minor correction: see Unbiased
estimation of standard deviation. Moreover, unbiasedness, in this sense of
the word, is not always desirable; see bias of an estimator.

Another estimator sometimes used is the similar expression

\[ s_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]

This form has a uniformly smaller mean squared error than does the
unbiased estimator, and is the maximum-likelihood estimate when the
population is normally distributed.
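
A small simulation can illustrate the difference between dividing by N − 1 and dividing by N. This is a sketch under stated assumptions: samples of size 5 are drawn from a standard normal population (true variance 1), and the seed, trial count, and function name are arbitrary illustrative choices. With these settings the N − 1 variance estimate should average close to 1, while the N version should average close to (N − 1)/N = 0.8.

```python
import random

def variance_estimates(sample):
    """Return (unbiased, maximum-likelihood) variance estimates of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return ss / (n - 1), ss / n

rng = random.Random(0)   # fixed seed so the run is repeatable
trials, n = 20_000, 5
sum_unbiased = 0.0
sum_ml = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    unbiased, ml = variance_estimates(sample)
    sum_unbiased += unbiased
    sum_ml += ml

avg_unbiased = sum_unbiased / trials
avg_ml = sum_ml / trials
print(avg_unbiased, avg_ml)
```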

Properties of standard deviation

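For constant c and random variables X and Y:

\[ \sigma(X + c) = \sigma(X) \]
\[ \sigma(cX) = |c| \, \sigma(X) \]
\[ \sigma(X + Y) = \sqrt{\operatorname{var}(X) + \operatorname{var}(Y) + 2 \operatorname{cov}(X, Y)} \]

where var and cov stand for variance and covariance, respectively.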
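
These properties (adding a constant leaves the standard deviation unchanged, scaling by c multiplies it by |c|, and the variance of a sum is the sum of the variances plus twice the covariance) can be checked numerically on paired data. The helper names and the two small data sets below are hypothetical, chosen only for illustration.

```python
import math

def mean(v):
    return sum(v) / len(v)

def var(v):
    """Population variance."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cov(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def std(v):
    return math.sqrt(var(v))

x = [3.0, 7.0, 7.0, 19.0]
y = [1.0, 4.0, 2.0, 8.0]
c = -2.5

shifted = std([xi + c for xi in x])            # should equal std(x)
scaled = std([c * xi for xi in x])             # should equal |c| * std(x)
summed = std([xi + yi for xi, yi in zip(x, y)])
expected_sum = math.sqrt(var(x) + var(y) + 2 * cov(x, y))
print(shifted, scaled, summed, expected_sum)
```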

Interpretation and application

A large standard deviation indicates that the data points are far from the
mean and a small standard deviation indicates that they are clustered closely
around the mean.

For example, each of the three data sets {0, 0, 14, 14}, {0, 6, 8, 14} and
{6, 6, 8, 8} has a mean of 7. Their standard deviations are 7, 5, and 1,
respectively. The third set has a much smaller standard deviation than the
other two because its values are all close to 7. In a loose sense, the standard
deviation tells us how far from the mean the data points tend to be. It will
have the same units as the data points themselves. If, for instance, the data
set {0, 6, 8, 14} represents the ages of four siblings in years, the standard
deviation is 5 years.

As another example, the data set {1000, 1006, 1008, 1014} may represent
the distances traveled by four athletes, measured in meters. It has a mean of
1007 meters, and a standard deviation of 5 meters.

Standard deviation may serve as a measure of uncertainty. In physical science,
for example, the reported standard deviation of a group of repeated
measurements should give the precision of those measurements. When
deciding whether measurements agree with a theoretical prediction, the
standard deviation of those measurements is of crucial importance: if the
mean of the measurements is too far away from the prediction (with the
distance measured in standard deviations), then the theory being tested
probably needs to be revised. This makes sense since the measurements fall
outside the range of values that could reasonably be expected to occur if the
prediction were correct and the standard deviation appropriately quantified.
See prediction interval.

Application examples

The practical value of understanding the standard deviation of a set of values
is in appreciating how much variation there is from the "average" (mean).


As a simple example, consider average temperatures for cities. While two
cities may each have an average temperature of 15 °C, it is helpful to
understand that the range for cities near the coast is smaller than for cities
inland, which clarifies that, while the average is similar, the chance for
variation is greater inland than near the coast.

So, an average of 15 °C occurs for one city with highs of 25 °C and lows of
5 °C, and also occurs for another city with highs of 18 °C and lows of 12 °C.
The standard deviation allows us to recognize that the average for the city
with the wider variation, and thus a higher standard deviation, will not offer
as reliable a prediction of temperature as the city with the smaller variation
and lower standard deviation.


Another way of seeing it is to consider sports teams. In any set of categories,
there will be teams that rate highly at some things and poorly at others.
Chances are, the teams that lead in the standings will not show such
disparity, but will perform well in most categories. The lower the standard
deviation of their ratings in each category, the more balanced and consistent
they will tend to be, whereas teams with a higher standard deviation will be
more unpredictable. For example, a team that is consistently bad in most
categories will have a low standard deviation. A team that is consistently good
in most categories will also have a low standard deviation. However, a team
with a high standard deviation might be the type of team that scores a lot
(strong offense) but also concedes a lot (weak defense), or, vice versa, that
might have a poor offense but compensates by being difficult to score on.

Trying to predict which teams will win on any given day may include looking
at the standard deviations of the various team "stats" ratings, in which
anomalies can match strengths against weaknesses, in an attempt to understand
which factors may prevail as stronger indicators of eventual scoring
outcomes.

In racing, a driver is timed on successive laps. A driver with a low standard
deviation of lap times is more consistent than a driver with a higher standard
deviation. This information can be used to help understand where
opportunities might be found to reduce lap times.


In finance, standard deviation is a representation of the risk associated with a
given security (stocks, bonds, property, etc.), or the risk of a portfolio of
securities (actively managed mutual funds, index mutual funds, or ETFs).
Risk is an important factor in determining how to efficiently manage a
portfolio of investments because it determines the variation in returns on
the asset and/or portfolio and gives investors a mathematical basis for
investment decisions (known as mean-variance optimization). The overall
concept of risk is that as it increases, the expected return on the asset will
increase as a result of the risk premium earned. In other words, investors
should expect a higher return on an investment when said investment carries
a higher level of risk, or uncertainty of that return. When evaluating
investments, investors should estimate both the expected return and the
uncertainty of future returns. Standard deviation provides a quantified
estimate of the uncertainty of future returns.

For example, suppose an investor had to choose between two stocks.
Stock A over the last 20 years had an average return of 10%, with a standard
deviation of 20%, and Stock B, over the same period, had an average return of
12%, but a higher standard deviation of 30%. On the basis of risk and return,
an investor may decide that Stock A is the safer choice, because Stock B's
additional 2 percentage points of return are not worth the additional 10
percentage points of standard deviation (greater risk or uncertainty of the
expected return). Stock B is likely to fall short of the initial investment
(but also to exceed the initial investment) more often than Stock A under the
same circumstances, and is estimated to return only 2 percentage points more
on average. In this example, if returns are approximately normally
distributed, Stock A is expected to earn about 10%, plus or minus 20% (a range
of 30% to -10%), in about two-thirds of future years. When considering more
extreme possible returns or outcomes in future, an investor should expect
results of up to 10% plus or minus 60%, or a range from 70% to -50%, which
includes outcomes within three standard deviations of the average return
(about 99.7% of probable returns).

Calculating the average return (or arithmetic mean) of a security over a given
number of periods will generate an expected return on the asset. For each
period, subtracting the expected return from the actual return results in
the deviation from the mean. Square the deviation in each period to find the
effect of the result on the overall risk of the asset. The larger the deviation
in a period, the greater risk the security carries. Taking the average of the
squared deviations results in the variance, a measurement of the overall units
of risk associated with the asset. Finding the square root of this variance
will result in the standard deviation of the investment tool in question.
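
The calculation described in this paragraph can be sketched as follows. The four period returns are made-up illustrative figures, not data from any real security, and the function name is an assumption.

```python
import math

def return_std(returns):
    """Standard deviation of per-period returns (population form, divides by N)."""
    expected = sum(returns) / len(returns)           # average (expected) return
    deviations = [r - expected for r in returns]     # actual minus expected, per period
    variance = sum(d * d for d in deviations) / len(returns)
    return math.sqrt(variance)

# Hypothetical returns for four periods: 10%, -5%, 20%, 15%.
returns = [0.10, -0.05, 0.20, 0.15]
risk = return_std(returns)
print(f"{risk:.4f}")  # 0.0935
```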

Geometric interpretation

To gain some geometric insights, we will start with a population of three
values, $x_1, x_2, x_3$. This defines a point $P = (x_1, x_2, x_3)$ in
$\mathbb{R}^3$. Consider the line $L = \{(r, r, r) : r \in \mathbb{R}\}$. This
is the "main diagonal" going through the origin. If our three given values
were all equal, then the standard deviation would be zero and P would lie on
L. So it is not unreasonable to assume that the standard deviation is related
to the distance of P to L. And that is indeed the case. Moving orthogonally
from P to the line L, one hits the point:

\[ R = (\bar{x}, \bar{x}, \bar{x}) \]

whose coordinates are the mean of the values we started out with. A little
algebra shows that the distance between P and R (which is the same as the
distance between P and the line L) is given by σ√3. An analogous formula
(with 3 replaced by N) is also valid for a population of N values; we then have
to work in $\mathbb{R}^N$.
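
The geometric claim can be checked numerically: the distance from P to its orthogonal projection on the main diagonal should equal σ√N. The point (0, 6, 12) and the function names are arbitrary illustrative choices.

```python
import math

def distance_to_diagonal(point):
    """Euclidean distance from a point to the line {(r, ..., r)}."""
    mean = sum(point) / len(point)   # the projection is (mean, ..., mean)
    return math.sqrt(sum((x - mean) ** 2 for x in point))

def population_std(point):
    mean = sum(point) / len(point)
    return math.sqrt(sum((x - mean) ** 2 for x in point) / len(point))

p = (0.0, 6.0, 12.0)
lhs = distance_to_diagonal(p)
rhs = population_std(p) * math.sqrt(len(p))
print(lhs, rhs)
```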

Chebyshev's inequality

An observation is rarely more than a few standard deviations away from the
mean. Chebyshev's inequality entails the following bounds for all distributions
for which the standard deviation is defined.

At least 50% of the values are within √2 standard deviations from the mean.
At least 75% of the values are within 2 standard deviations from the mean.
At least 89% of the values are within 3 standard deviations from the mean.
At least 94% of the values are within 4 standard deviations from the mean.
At least 96% of the values are within 5 standard deviations from the mean.
At least 97% of the values are within 6 standard deviations from the mean.
At least 98% of the values are within 7 standard deviations from the mean.

And in general:
At least (1 − 1/k²) × 100% of the values are within k standard deviations
from the mean.
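
The general bound 1 − 1/k² reproduces the figures listed above. The function name below is an assumption; for k ≤ 1 the inequality gives no useful information, so the sketch returns 0.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # the bound is vacuous for k <= 1
    return 1.0 - 1.0 / (k * k)

for k in (math.sqrt(2), 2, 3, 4, 5, 6, 7):
    print(f"k = {k:.3f}: at least {100 * chebyshev_lower_bound(k):.1f}%")
```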

Rules for normally distributed data

The central limit theorem says that the distribution of a sum of many
independent, identically distributed random variables tends towards the normal
distribution. If a data distribution is approximately normal then about 68% of
the values are within 1 standard deviation of the mean (mathematically, μ ± σ,
where μ is the arithmetic mean), about 95% of the values are within two
standard deviations (μ ± 2σ), and about 99.7% lie within 3 standard deviations
(μ ± 3σ). This is known as the 68-95-99.7 rule, or the empirical rule.

[Figure: normal distribution curve. Dark blue is less than one standard
deviation from the mean. For the normal distribution, this accounts for 68.27%
of the set; while two standard deviations from the mean (medium and dark blue)
account for 95.45%; three standard deviations (light, medium, and dark blue)
account for 99.73%; and four standard deviations account for 99.994%. The two
points of the curve which are one standard deviation from the mean are also
the inflection points.]

For various values of z, the percentage of values expected to lie in the
symmetric confidence interval (−zσ, zσ) about the mean is as follows:

zσ         percentage
1σ         68.27%
1.645σ     90%
1.960σ     95%
2σ         95.450%
2.576σ     99%
3σ         99.7300%
3.2906σ    99.9%
4σ         99.993666%
5σ         99.99994267%
6σ         99.9999998027%
7σ         99.9999999997440%
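
The first rows of this table can be reproduced with the error function, since for a normal distribution the fraction within z standard deviations of the mean is erf(z/√2). The function name is an illustrative assumption; `math.erf` is part of the Python standard library. Double-precision arithmetic limits the accuracy for large z, so the sketch stops at 3σ.

```python
import math

def normal_coverage(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2.0))

for z in (1, 1.645, 1.960, 2, 2.576, 3):
    print(f"{z}σ  {100 * normal_coverage(z):.4f}%")
```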

Relationship between standard deviation and mean

The mean and the standard deviation of a set of data are usually reported
together. In a certain sense, the standard deviation is a "natural" measure of
statistical dispersion if the center of the data is measured about the mean.
This is because the standard deviation from the mean is smaller than from
any other point. The precise statement is the following: suppose
$x_1, \dots, x_n$ are real numbers and define the function:

\[ \sigma(r) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - r)^2} \]

Using calculus, or simply by completing the square, it is possible to show
that σ(r) has a unique minimum at the mean:

\[ r = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
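
Completing the square makes the minimum explicit. The following is a short sketch of that step, writing $\bar{x}$ for the mean of $x_1, \dots, x_n$ (notation assumed here) and using the fact that the deviations from the mean sum to zero, so the cross term vanishes:

```latex
\begin{aligned}
\sigma(r)^2
  &= \frac{1}{n} \sum_{i=1}^{n} (x_i - r)^2
   = \frac{1}{n} \sum_{i=1}^{n} \bigl( (x_i - \bar{x}) + (\bar{x} - r) \bigr)^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
     + \frac{2(\bar{x} - r)}{n} \sum_{i=1}^{n} (x_i - \bar{x})
     + (\bar{x} - r)^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 + (\bar{x} - r)^2 .
\end{aligned}
```

The first term does not depend on r, and the second is non-negative and zero only when r equals the mean, which gives the unique minimum.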

The coefficient of variation of a sample is the ratio of the standard deviation
to the mean. It is a dimensionless number that can be used to compare the
amount of variance between populations with different means.

If we want to obtain the mean by sampling the distribution, then the standard
deviation of the mean is related to the standard deviation of the distribution
by:

\[ \sigma_{\text{mean}} = \frac{\sigma}{\sqrt{N}} \]

where N is the number of samples used to estimate the mean.

Rapid calculation methods

See also: Algorithms for calculating variance

A slightly faster (significantly so for a running standard deviation) way to
compute the population standard deviation is given by the following formula
(though allowance must be made for round-off error, arithmetic overflow, and
arithmetic underflow conditions):

\[ \sigma = \sqrt{\frac{1}{N} \left( \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \Bigl( \sum_{i=1}^{N} x_i \Bigr)^2 \right)} \]

The following two formulas are a useful representation of running
(continuous) standard deviation. A set of three power sums $s_0$, $s_1$, $s_2$
is computed over a set of N values of x, denoted $x_k$. Given the results of
these three running summations, one can use σ at any time to compute the
current value of the running standard deviation. This definition of $s_j$
allows us to easily represent the two different phases (summation
computation $s_j$, and σ calculation). Note that $s_0$ raises x to the zero
power, and since $x^0$ is always 1, $s_0$ evaluates to N.

\[ \sigma = \frac{\sqrt{s_0 s_2 - s_1^2}}{s_0} \]

where the power sums $s_0$, $s_1$, $s_2$ are defined by

\[ s_j = \sum_{k=1}^{N} x_k^j \]
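
Taking $s_j$ to be the sum of the j-th powers of the values seen so far, a running computation can be sketched as below. The class name `PowerSumStd` is an assumption, and the clamp to zero is an added safeguard against small negative values caused by rounding; it is not part of the formula itself.

```python
import math

class PowerSumStd:
    """Running population standard deviation from the power sums s0, s1, s2."""

    def __init__(self):
        self.s0 = 0     # sum of x**0, i.e. the count N
        self.s1 = 0.0   # sum of x
        self.s2 = 0.0   # sum of x**2

    def add(self, x):
        self.s0 += 1
        self.s1 += x
        self.s2 += x * x

    def std(self):
        if self.s0 == 0:
            raise ValueError("no values added yet")
        # sigma = sqrt(s0*s2 - s1^2) / s0; rounding can make the radicand slightly negative
        radicand = max(self.s0 * self.s2 - self.s1 ** 2, 0.0)
        return math.sqrt(radicand) / self.s0

running = PowerSumStd()
for value in [3, 7, 7, 19]:
    running.add(value)
print(running.std())  # 6.0
```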

In a computer implementation, as the three $s_j$ sums become large, we need
to consider round-off error, arithmetic overflow, and arithmetic underflow.
To avoid these problems, we will periodically reduce their absolute values in
a process reminiscent of normalizing a unit vector. Since $s_1$ is the sum of
values and $s_2$ is the sum of squares, we can estimate these values for a
smaller value of N simply by dividing by our current N, and multiplying by a
well-selected smaller new-N. Our comparison with a unit vector encourages us
to consider selecting 1 as the value of new-N. However, this is a particularly
poor choice, as the accuracy of our continuous approximation was established
only for large N, and this would cause our next value to have as much weight
in the calculation as all previous values. A more appropriate value of new-N
is the maximum value we can afford, such that we are sure we can renormalize
back to new-N again before N again becomes large enough to introduce error
(or catastrophe) as we add more values.

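Similarly for sample standard deviation:

\[ s = \sqrt{\frac{N \sum_{i=1}^{N} x_i^2 - \bigl( \sum_{i=1}^{N} x_i \bigr)^2}{N (N - 1)}} \]

Or from running sums:

\[ s = \sqrt{\frac{s_0 s_2 - s_1^2}{s_0 (s_0 - 1)}} \]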

The above method can be very susceptible to rounding, underflow, and
overflow errors, especially when the sample values are very close to the
mean. It can even produce a negative value under the square root, which should
be impossible given the definition. Although this method appears in many
textbooks, it should not be used. Below is a better running-sums method with
reduced rounding errors:

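\[ A_1 = x_1, \qquad A_i = A_{i-1} + \frac{x_i - A_{i-1}}{i} \]

where A is the mean value.

\[ Q_1 = 0, \qquad Q_i = Q_{i-1} + \frac{i - 1}{i} (x_i - A_{i-1})^2 = Q_{i-1} + (x_i - A_{i-1})(x_i - A_i) \]

sample variance:

\[ s_n^2 = \frac{Q_n}{n - 1} \]

population variance:

\[ \sigma_n^2 = \frac{Q_n}{n} \]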
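
A one-pass implementation of this kind of recurrence can be sketched as follows, with A as the running mean and Q as the running sum of squared deviations. The update used here, Q += (x − A_old)(x − A_new), is the form commonly attributed to Welford; the function name and the test data with a large offset are illustrative assumptions.

```python
def running_variance(values):
    """Return (mean, population variance, sample variance) computed in one pass."""
    a = 0.0  # running mean A_i
    q = 0.0  # running sum of squared deviations Q_i
    n = 0
    for x in values:
        n += 1
        delta = x - a          # x_i - A_{i-1}
        a += delta / n         # A_i = A_{i-1} + (x_i - A_{i-1}) / i
        q += delta * (x - a)   # Q_i = Q_{i-1} + (x_i - A_{i-1}) * (x_i - A_i)
    if n == 0:
        raise ValueError("at least one value is required")
    sample_var = q / (n - 1) if n > 1 else float("nan")
    return a, q / n, sample_var

mean, pop_var, sample_var = running_variance([3, 7, 7, 19])
print(mean, pop_var, sample_var)

# Values close together but far from zero, where the sum-of-squares shortcut loses precision.
_, _, shifted_sample_var = running_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16])
print(shifted_sample_var)
```

For the first value the update gives A = x and Q = 0, matching the starting values of the recurrence.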

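For a weighted distribution it is somewhat more complicated. The mean is
given by:

\[ A_1 = x_1, \qquad A_i = A_{i-1} + \frac{w_i}{\sum_{j=1}^{i} w_j} (x_i - A_{i-1}) \]

where $w_j$ are the weights.

\[ Q_1 = 0, \qquad Q_i = Q_{i-1} + \frac{w_i \sum_{j=1}^{i-1} w_j}{\sum_{j=1}^{i} w_j} (x_i - A_{i-1})^2 = Q_{i-1} + w_i (x_i - A_{i-1})(x_i - A_i) \]

\[ \sigma_n^2 = \frac{Q_n}{\sum_{i=1}^{n} w_i}, \qquad s_n^2 = \frac{n'}{n' - 1} \, \sigma_n^2 \]

where n is the total number of elements, and n' is the number of elements
with non-zero weights. The above formulas become equal to the simpler
formulas given above if we take all weights equal to 1.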
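
A weighted one-pass computation can be sketched along the same lines. This is a minimal sketch under stated assumptions: the weights are non-negative, zero-weight elements are skipped, and the sample variance applies the n'/(n' − 1) factor with n' the number of elements with non-zero weights, as described above. The function name is hypothetical.

```python
def weighted_running_variance(values, weights):
    """Return (mean, population variance, sample variance) for weighted data."""
    w_sum = 0.0   # running sum of weights
    a = 0.0       # running weighted mean A_i
    q = 0.0       # running weighted sum of squared deviations Q_i
    nonzero = 0   # n': number of elements with non-zero weight
    for x, w in zip(values, weights):
        if w == 0:
            continue
        nonzero += 1
        w_sum += w
        delta = x - a               # x_i - A_{i-1}
        a += (w / w_sum) * delta    # A_i
        q += w * delta * (x - a)    # Q_i = Q_{i-1} + w_i (x_i - A_{i-1})(x_i - A_i)
    if nonzero == 0:
        raise ValueError("at least one non-zero weight is required")
    pop_var = q / w_sum
    sample_var = pop_var * nonzero / (nonzero - 1) if nonzero > 1 else float("nan")
    return a, pop_var, sample_var

# With all weights equal to 1 this reduces to the unweighted case.
unit = weighted_running_variance([3, 7, 7, 19], [1, 1, 1, 1])
# A weight of 2 on the value 7 gives the same mean and population variance.
merged = weighted_running_variance([3, 7, 19], [1, 2, 1])
print(unit, merged)
```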

