
INTRODUCTION TO STATISTICS

Mean

We'll start by calculating the 'average' or 'mean' of a set of numbers.

You can use the terms 'mean' and 'average' interchangeably. Strictly speaking,
'mean' is the correct term. 'Average' has a more general meaning: it can
be used to refer to any value that typifies a set of numbers.

You may already know how to calculate the mean, but I want to use the explanation to make a
few points. Suppose you want to calculate the mean of five values, say:

1 8 3 6 2

You add them up and divide by the number of values:

$$\bar{x} = \frac{1 + 8 + 3 + 6 + 2}{5} = \frac{20}{5} = 4$$

I've introduced a bit of notation: I've represented the mean by the symbol $\bar{x}$. The bar over the
'x' is a standard way of representing a mean.

We say 'x-bar', and that's the way I'll write it in the text from now on.

Now let's introduce some more notation. We'll represent the number of values (five in this case)
by the variable n. Although we could use other letters, n is the usual first choice when we
want to represent the number of values in a sample.

We'll also replace the individual values with $x_i$, where the subscript i takes a number to
identify each individual value (i stands for index):

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

This formula can be used to find the mean of any number of values. Just replace the symbols
$x_1$, $x_2$ and so on up to $x_n$ with the actual data values.

We can introduce yet more mathematical notation to make it more concise:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Let's look more closely at the bit on top (the numerator):

$$\sum_{i=1}^{n} x_i$$

The symbol $\Sigma$ is the Greek capital letter sigma, and means 'summation' in 'mathspeak'.

In full, the notation means 'add all the values of $x_i$ from i equals 1 to i
equals n'.

Note that the i = 1 and the n are sometimes omitted if they are self-evident.
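
As a minimal sketch, the summation can be written out in Python using the five values from the start of this section. The variable names (`values`, `n`, `total`, `x_bar`) are chosen here for illustration and simply mirror the notation above.

```python
# The five example values x_1 ... x_n
values = [1, 8, 3, 6, 2]
n = len(values)  # number of values

# Sigma: add all the values of x_i from i = 1 to i = n
total = 0
for i in range(1, n + 1):
    total += values[i - 1]  # Python lists start at index 0, so x_i is values[i - 1]

x_bar = total / n  # the mean, x-bar
print(x_bar)  # 4.0
```

In everyday code the loop is usually replaced by the built-in `sum(values)`, which does the same job as the sigma notation.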

The times taken to repair breakdowns of critical equipment, in hours, over the
past week were as follows:

5 4 3 2 5 5

What is the mean repair time?


Understanding the Mean

Now let's consider what this formula means in practice:

One way of looking at it is to imagine that weights are placed along a plank, with the distance
along the plank representing the data value. The mean is at the point of balance:
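
The balance-point picture can be checked numerically. A standard property of the mean is that the distances of the values from it (the deviations) cancel out, so the values below the mean exactly balance the values above it. The sketch below uses the five example values from the previous section; the variable names are illustrative.

```python
values = [1, 8, 3, 6, 2]
x_bar = sum(values) / len(values)  # 4.0

# Signed distance of each "weight" from the point of balance
deviations = [x - x_bar for x in values]
print(deviations)       # [-3.0, 4.0, -1.0, 2.0, -2.0]

# The negative and positive deviations cancel exactly
print(sum(deviations))  # 0.0
```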

The mean is also the position where:


Median

The mean is a 'measure of central tendency'. It is a single value that attempts to tell you the
position of the 'center' of the data set. There are several other measures of central tendency that
try to capture the same idea. The two most important are the 'median' and the 'mode'.

Consider the following example. Valuations gathered from recent house sales in a particular
neighborhood were as follows:

$185,000 $190,000 $145,000 $220,000 $1,060,000 $200,000 $170,000

The mean of these values is $310,000.


A statistic only serves a purpose if it helps to inform a decision. You might want to know the
average valuation because you are considering moving to the neighborhood. You would find the
mean value misleading: the typical valuation is around $200,000, but one very expensive
property inflates the mean.

You might find the 'median' value more useful. The median is the central value in the ordered
data set, so there are as many values above the median as below. To calculate it, sort the values
into order and take the central value:

$145,000 $170,000 $185,000 $190,000 $200,000 $220,000 $1,060,000

The median is $190,000.
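
To see how far apart the two measures are for these valuations, here is a short sketch using Python's standard `statistics` module. The variable names are illustrative; the figures are the seven valuations listed above.

```python
from statistics import mean, median

valuations = [185_000, 190_000, 145_000, 220_000, 1_060_000, 200_000, 170_000]

mean_value = mean(valuations)      # pulled upwards by the $1,060,000 property
median_value = median(valuations)  # central value of the sorted list

print(f"Mean:   ${mean_value:,.0f}")    # Mean:   $310,000
print(f"Median: ${median_value:,.0f}")  # Median: $190,000
```

The single expensive property moves the mean by more than $100,000 but has no effect on which value sits in the middle of the sorted list.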

The ages of people presenting with a particular medical condition were:

20 24 22 65 20 20

In this case there is an even number of values, and so no middle value. The median is the mean
of the two middle values, after the data are sorted into order of magnitude:

20 20 20 22 24 65

The median is 21.
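
The rule for odd and even numbers of values can be sketched as a small Python function. This is an illustrative implementation that assumes a non-empty list of numbers; the function name `median` is chosen here, and Python's standard library provides an equivalent as `statistics.median`.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # sort into order of magnitude
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single central value
        return ordered[middle]
    # Even number of values: take the mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [20, 24, 22, 65, 20, 20]
print(median(ages))            # 21.0
print(sum(ages) / len(ages))   # 28.5, the mean, for comparison
```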


Again, the mean age of 28.5 is not representative. Most people presenting are young, but there is
one 65-year-old.

The American Medical Association (AMA) and the American Bar Association
(ABA) had a dispute about the rising cost of malpractice insurance for
doctors. The doctors used the 'mean' to show a sharp rise in cost over the
period concerned. The lawyers used the 'median' to show there had been no
increase in cost.

The 'Average Net Worth' or the 'Median Net Worth' is often used to measure
the prosperity of the community.

Which do you think is the better measure, and do you think there would be
much difference?

Mode

The previous example considered the ages of people presenting with a particular medical
condition. An alternative measure of central tendency that could be used for this example is the
'mode'. The mode is the most common value in the data set:

20 24 22 65 20 20

The mode of the ages is 20.
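
Finding the mode amounts to counting how often each value occurs and picking the value with the highest count. A minimal sketch using `collections.Counter` from Python's standard library (variable names are illustrative):

```python
from collections import Counter

ages = [20, 24, 22, 65, 20, 20]

counts = Counter(ages)  # how many times each age occurs
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 20 3
```

If two values tie for the highest count, `most_common(1)` reports only one of them, so a data set with several modes needs a closer look at `counts`.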

The 'mode' is particularly useful with 'categorical' data, that is, data that can be placed into
discrete categories rather than measured on a continuous scale. Each category can be labeled with
either a number or a name. For example, one hundred patients were asked to rate the quality of
nursing care in a clinic. The results were:

Rating          Frequency of Response

Very Poor         0
Poor              4
Average          24
Good             47
Excellent        22
Not Answered      3
Total           100

The mode is 'Good'.
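
When the data already arrive as a frequency table, the mode is simply the category with the largest frequency. A small sketch, assuming the table is held in a Python dictionary (the name `responses` is illustrative):

```python
# Frequency of each rating, taken from the table above
responses = {
    "Very Poor": 0,
    "Poor": 4,
    "Average": 24,
    "Good": 47,
    "Excellent": 22,
    "Not Answered": 3,
}

# The mode is the category whose frequency is largest
mode_rating = max(responses, key=responses.get)
total = sum(responses.values())

print(mode_rating)  # Good
print(total)        # 100
```

Note that a mean or median of these ratings would not make sense without first assigning numbers to the categories, which is why the mode is the natural choice here.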

Salary levels of 6 employees in a company are:

$25,200 $34,000 $243,000 $43,200 $43,200 $31,000

Specify the mean, median and mode salary levels:

Mean:
Median:
Mode:

Which measure (mean, median or mode) is likely to be the most useful in this example?
