http://www.comp.leeds.ac.uk/hannah/mathsclub
Probability 1 (for dummies:-)
Stats 1 (averages and deviations)
Probability 2 (Trials and distributions)
Stats 2 (significance)
Stats 3 (errors)
Preliminaries
So what is statistics?
Applied branch of mathematics
Concerning data and its representation
Descriptive Statistics (today) are concerned with representing and summarising data
Analytical Statistics (in a few weeks) are concerned with drawing conclusions from data
"... probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal" (Skiena, 2001).
Max, Min, Mean(s), Median, Mode, Variance, Standard Deviation, Interquartile range, ... All ways of presenting numerical data in such a way that we learn something of its spread and tendency and deviation.
What is an average?
Average originally meant "financial loss incurred through damage to goods in transit", from the Italian avaria, a word from 12c. Mediterranean maritime trade. It is sometimes traced to Arabic arwariya, "damaged merchandise", but this is less certain. Later, the meaning of the word shifted to "equal sharing of such loss by the interested parties".
Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
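As a quick illustration of the summary measures listed earlier, here is a minimal sketch using Python's standard `statistics` module. The dataset is hypothetical and invented purely for illustration.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2, 3, 3, 5, 7, 10]

smallest = min(data)
largest = max(data)
mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value; with an even count, the average of the two middle values
mode = statistics.mode(data)      # the most frequently occurring value

print(smallest, largest, mean, median, mode)
```

Note that the three "averages" disagree even on this tiny dataset, which is why it matters which one is being quoted.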
Symmetricity/Skewness
I am just going to mention this in passing today, but...
Figure: A fictitious but nastily skewed dataset (horizontal axis: Number).
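One practical consequence of skew is that the mean and the median drift apart. The sketch below uses a small right-skewed dataset that is hypothetical (it is not the data behind the figure) to show a few large values pulling the mean well above the median.

```python
import statistics

# Hypothetical right-skewed sample: mostly small values plus a couple of large ones.
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 20, 62]

mean = statistics.mean(skewed)      # pulled upwards by the two large values
median = statistics.median(skewed)  # barely affected by them

print(f"mean = {mean}, median = {median}")
```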
Deviation
As well as knowing some kind of average of a particular sample, you might want to know something of its spread.
Figure 2: Three datasets with the same mean but different spreads (horizontal axis: Number; vertical axis: Count).
Deviation
The deviation of a sample is measured with reference to some measure of central tendency: you want to know how much the sample deviates from something. With average deviation, variance, and standard deviation, this is the mean $\mu$ or the sample mean $\bar{x}$.
Measures of deviation
Average deviation = $\frac{\sum |x - \mu|}{N}$
Variance = $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Standard deviation = $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
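For reasons you will now be familiar with, when considering samples, $\sigma$ becomes $s$, and $\mu$ becomes $\bar{x}$. To account for bias, the sum of squared deviations in the sample variance (and hence in the sample standard deviation) is divided by $n - 1$ rather than $n$.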
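The difference between dividing by $n$ and by $n - 1$ is easy to see numerically. This is a minimal sketch on a hypothetical dataset (invented for illustration); it computes both forms by hand and checks them against `statistics.pstdev` (population form) and `statistics.stdev` (sample form) from the Python standard library.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
xbar = sum(sample) / n
sum_sq = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations from the mean

sigma = math.sqrt(sum_sq / n)        # population form: divide by n
s = math.sqrt(sum_sq / (n - 1))      # sample form: divide by n - 1

# The standard library implements the same two formulas.
assert math.isclose(sigma, statistics.pstdev(sample))
assert math.isclose(s, statistics.stdev(sample))

print(sigma, s)
```

The sample form is always slightly larger, and the gap shrinks as $n$ grows.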
Worked example
This example [a] involves the rainfall in Liberia [b] (in inches).

Month:     J   F   M   A   M   J   J   A   S   O   N   D
Rainfall:  1   2   4   6  18  37  31  16  28  24   9   4

The mean of this data is
$\frac{1 + 2 + 4 + 6 + 18 + 37 + 31 + 16 + 28 + 24 + 9 + 4}{12} = \frac{180}{12} = 15$

[a] taken from Sternstein's Statistics
[b] No, I've never been there either
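The arithmetic for the mean can be checked in a couple of lines. The variable names here are just illustrative choices; the numbers are the twelve monthly rainfall values from the worked example.

```python
# Monthly rainfall in inches, January to December, from the worked example.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

total = sum(rainfall)
mean = total / len(rainfall)  # divide the total by the number of months

print(total, mean)
```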
Average deviation
The average deviation is
$\frac{|1 - 15| + |2 - 15| + |4 - 15| + |6 - 15| + |18 - 15| + \ldots}{12} = \frac{14 + 13 + 11 + 9 + 3 + 22 + 16 + 1 + 13 + 9 + 6 + 11}{12} = \frac{128}{12} \approx 10.7$ (inches)
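The same average deviation can be reproduced as a short sketch: take the absolute distance of each month from the mean, then average those distances. Variable names are illustrative.

```python
# Monthly rainfall in inches, January to December, from the worked example.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mean = sum(rainfall) / len(rainfall)

# Absolute deviation of each value from the mean (the sign is discarded).
abs_devs = [abs(x - mean) for x in rainfall]

avg_dev = sum(abs_devs) / len(rainfall)

print(abs_devs)
print(round(avg_dev, 1))
```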
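The variance is
$\sigma^2 = \frac{(1 - 15)^2 + (2 - 15)^2 + \ldots + (4 - 15)^2}{12} = \frac{1724}{12} \approx 143.7$ (inches squared)
and the standard deviation is the square root of the variance, so...
$\sigma = \sqrt{143.7} \approx 12.0$
and the units of the standard deviation are... the same as the units of measurement.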
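To check these figures, the sketch below computes the variance and standard deviation of the rainfall data by hand, dividing by the full count of 12 (the population form), and compares the result with `statistics.pvariance` and `statistics.pstdev` from the Python standard library.

```python
import math
import statistics

# Monthly rainfall in inches, January to December, from the worked example.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mean = sum(rainfall) / len(rainfall)
sum_sq = sum((x - mean) ** 2 for x in rainfall)  # sum of squared deviations

variance = sum_sq / len(rainfall)  # population form: divide by N, units are inches squared
std_dev = math.sqrt(variance)      # back in the original units (inches)

# The standard library's population functions should agree with the hand calculation.
assert math.isclose(variance, statistics.pvariance(rainfall))
assert math.isclose(std_dev, statistics.pstdev(rainfall))

print(round(variance, 1), round(std_dev, 1))
```

If the twelve months were instead treated as a sample from a longer record, `statistics.variance` and `statistics.stdev` would divide by 11 rather than 12 and give slightly larger values.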
Interquartile range
One final measure of deviation is the interquartile range. This is related to the median, and the first thing you do is place your data in order.
Discard the lowest and the highest $\frac{1}{4}$ of your data, and use the range of what remains. This is much more robust to outliers.
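The recipe above can be sketched literally: sort, drop a quarter of the values from each end, and take the range of the rest. This is a minimal sketch that assumes the number of values divides neatly by four (it uses integer division otherwise); the helper name `trimmed_iqr` is hypothetical. Library routines such as `statistics.quantiles` interpolate between data points and can give slightly different answers.

```python
def trimmed_iqr(values):
    """Range of the data after discarding the lowest and highest quarter.

    Hypothetical helper following the slide's recipe literally.
    """
    ordered = sorted(values)
    k = len(ordered) // 4                  # how many values make up one quarter
    middle = ordered[k:len(ordered) - k]   # keep the central half of the data
    return middle[-1] - middle[0]


# Monthly rainfall in inches, January to December, from the worked example.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

full_range = max(rainfall) - min(rainfall)
iqr = trimmed_iqr(rainfall)

print(full_range, iqr)
```

Replacing the largest value with something far bigger changes the full range but leaves this interquartile range untouched, which is the sense in which it is robust to outliers.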
And to finish
If your data is normally distributed (of which more next week), knowing the standard deviation tells you all sorts of useful stuff.