You are on page 1of 3

Standard Deviation and Variance (raw data)

In statistics it is convenient to summarise a set of data by highlighting some key features. It is common to summarise data using an average (such as the mean or median) but it is also helpful to have a measure of the spread of the data. Two simple measures of spread are the range (i.e. the difference between the largest and smallest values in the data) and the inter-quartile range (i.e. the difference between the lower and upper quartiles). Standard deviation is another measure of spread which is widely used in statistics. The standard deviation gives a measure of how far the data tends to be from the mean value. One formula for the standard deviation is: s.d. = !ote that:

( x x)
n

") x is the notation used for the mean of a set of data ) The symbol # is the $reek letter sigma % it is used in maths to mean &add up'.

The variance is also sometimes used. The variance is the square of the standard deviation and so is given by the formula: ( x x) variance ( n The e)ample below shows how these formulae are used. Introduction: *now +hite timed each of the seven dwarfs running a race. Their times (in seconds) were as follows: ,opey: -. seconds $rumpy: /" seconds ,oc: -0 seconds 1appy: /0 seconds 2ashful: /- seconds *nee3y: /4 seconds *leepy: /5 seconds The mean of these 5 times is: x =

x = -. + ... + /5 =
n 5 xx ( x 7 / 75 7" 75 " 7 . ( x x) = 4

0/ = / seconds. 5

To find the standard deviation6 we can draw up a table: ,ata6 x -. /" -0 /0 //4 /5 x = 0/ ( x x) /0 " 0 /0 " / . ( x x) ( "-8

The variance of the dwarfs9 times is therefore: variance ( *o the standard deviation is: s.d. (

( x x)
n

"-8 = "0.5"/ 5

variance = "0.5"/ = /.//sec .

!ote: *tandard deviation is measured in the same units as the original data whereas variance is measured in squared units.

A more useful formula


There are alternative formulae which are usually simpler to use in order to find the variance or the standard deviation. These are: x x variance ( n and s.d. (

x
n

The steps involved to find the standard deviation therefore are as follows: *tep ": *quare each piece of data *tep : :dd up these squares (to get

*tep -: ,ivide by the number of values (to get

) n *tep /: *ubtract the square of the mean (to get the variance) *tep .: *quare root (to get the standard deviation) If we apply these steps to the dwarfs9 race times (from page ") we get: *tep ": -.; ( " . /"; ( "<8" -0; ( ". " ( "8/0 /4; ( "<44 /5; ( 40 *tep : *o6

/0; ( /4"

/-;

( " /8<

*tep -: Therefore6

x
n

" /8< = "58-.5"/... 5 x = "58-.5"/... / = "0.5"/...

*tep /: *o variance (

x
n

*tep .: =onsequently6 standard deviation ( "0.5"/... = /.// secs (as before)

>sually we show less working as the following e)ample demonstrates: Worked example : class sat tests in *tatistics and in ?ure @athematics. Their results (e)pressed as percentages) were as follows: Statistics mark, x: Pure mark, y: /. /0 5 8. <</ .0 /" 58 5</ .." <5 ..

a) =alculate the mean and standard deviation for each test. b) =ompare the results obtained in *tatistics and ?ure @aths. Solution: a) Aor *tatistics: /. + 5 + ... + <5 /00 x= = = < .-5. 8 8 To find the standard deviation6 the key value is *o the standard deviation is given by: s.d . = Aor ?ure:

= /. + 5

+ ... + <5 = -"0 0

x
n

x =

-"0 0 < .-5. = "44./8... = "4.4 8

/0 + 8. + ... + .. /. = = .<.. 8 8 The sum of the squares is y= *o the standard deviation is: s.d . =

= /0 + ... + .. = 5.04

y
n

y =

5.04 .<.. = 8

.<.. = "<.4

b) +hen comparing two sets of data6 it is important to compare the values of both the mean and the standard deviation using the conte)t of the question. In this case6 we can conclude that: i) students generally achieved higher marks in *tatistics (as shown by the higher mean)B ii) the standard deviation was higher for the ?ure marks indicating that there was greater variation in the students9 performances in the ?ure test than in the *tatistics test.