
Dr Hjh Madihah Khalid

Summary Statistics
The purpose of summary statistics is to replace a mass of numbers (the data) with just one or two numbers that together convey most of the essential information. When analysing a data distribution, it is important to look at both a measure of central tendency (an average) and a measure of spread. A measure of central tendency shows where the data are concentrated; the three common ones are the mean, the median and the mode.

Mean - the most common type of average


Arithmetic mean = (sum of all the values) / (number of values)

The mean is very sensitive to extreme values. An extreme value can distort the mean and make it less representative of the set of values and less useful as an average.

E.g. 25, 35, 45, 55, 65, 75, 890. The mean is 1190 / 7 = 170. The mean is therefore not a good representative of this set of data, because it is affected by the single large number (890) in the set.
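The effect of the extreme value can be checked with a short Python sketch. It uses the standard `statistics` module; the variable names are illustrative only and are not part of the slides.

```python
from statistics import mean, median

values = [25, 35, 45, 55, 65, 75, 890]
# Same data with the single extreme value (890) left out
without_extreme = [v for v in values if v != 890]

mean_all = mean(values)               # 1190 / 7 = 170
mean_trimmed = mean(without_extreme)  # 300 / 6 = 50
median_all = median(values)           # middle of the sorted values = 55

print(mean_all, mean_trimmed, median_all)
```

Dropping one value moves the mean from 170 to 50, while the median of the full set (55) stays close to where most of the values lie.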

Mean is found by evening out the numbers

2, 5, 2, 1, 5

mean = (2 + 5 + 2 + 1 + 5) / 5 = 3

Median - the middle value when all the data are arranged in order of size
To find the median, arrange the data in ascending order. Will we get the same value for the median if we arrange the data in descending order? When a set of n values is arranged in order, the median is the middle value, that is, the value in the (n + 1)/2 th position. The median is not affected by extreme values in a set of data.

How to Find the Median in a Group of Numbers
Arrange the numbers in order from least to greatest. The median is the number in the middle.

Example:
21, 18, 24, 19, 28
18, 19, 21, 24, 28
The median is 21.

What if you have another number 25?
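One way to explore this question is the minimal Python sketch below. The function name `median_by_position` is hypothetical. The sketch assumes the usual convention that, when the count is even, the median is the average of the two middle values; the slides do not state the even case.

```python
def median_by_position(data):
    """Return the median by locating the middle position of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: position (n + 1)/2, counting from 1, is index `middle`
        return ordered[middle]
    # Even count (assumed convention): average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

five = [21, 18, 24, 19, 28]
six = five + [25]

print(median_by_position(five))  # 21
print(median_by_position(six))   # 22.5
```

With six numbers there is no single middle value, so the sketch averages 21 and 24.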

Mode - the value that occurs most often


E.g. If someone says that his average travelling time is 30 minutes, it is probable that he means his journey takes 30 minutes on most days, not that it takes 15 minutes on Monday to Thursday and an hour and a half on Friday. A set of data may have more than one mode, because two or more values may occur an equal number of times. A mode does not exist in a set of data in which every value has the same frequency. When a shopkeeper is checking which shoe size is in greatest demand, he is actually finding the mode. The mode is not affected by a few unusually high or low values.

How to Find the Mode in a Group of Numbers

Step 1 Arrange the numbers in order from least to greatest.

Step 2 Find the number that is repeated the most.

21, 18, 23, 19, 18
18, 18, 19, 21, 23
The mode is 18.
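Counting how often each value occurs can be sketched in Python as follows. The function name `modes` is hypothetical. Returning an empty list when every value occurs equally often is a choice made here to match the statement that such data have no mode.

```python
from collections import Counter

def modes(data):
    """Return every value that shares the highest frequency.

    Returns [] when all distinct values occur equally often (no mode).
    """
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    # More than one distinct value, all with the same frequency: no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([21, 18, 23, 19, 18]))  # [18]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  (two modes)
print(modes([1, 2, 3]))             # []      (no mode)
```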

For classed data
Find the mean, median and mode for the following distributions.

A)
# of mistakes   0    1    2    3    4
# of pages      61   109  53   23   4

B)
Height: 150-154, 155-159, 160-164, 165-169, 170-174, 175-179
# of students: 10, 20
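For distribution A (mistakes per page), the three measures can be worked out directly from the frequency table. The Python sketch below is one possible approach; the helper names (`grouped_mean`, `value_at`, `grouped_median`, `grouped_mode`) are hypothetical, and the median assumes the convention of averaging the two middle observations when the total count is even.

```python
# Distribution A: number of mistakes -> number of pages
mistakes_to_pages = {0: 61, 1: 109, 2: 53, 3: 23, 4: 4}

def grouped_mean(freq):
    """Mean = sum(value * frequency) / total frequency."""
    total = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / total

def value_at(freq, position):
    """Return the value at a 1-based position, using cumulative frequencies."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if position <= cumulative:
            return value
    raise ValueError("position is beyond the number of observations")

def grouped_median(freq):
    """Average of the middle observation(s) (assumed convention for even totals)."""
    total = sum(freq.values())
    lower_pos, upper_pos = (total + 1) // 2, (total + 2) // 2
    return (value_at(freq, lower_pos) + value_at(freq, upper_pos)) / 2

def grouped_mode(freq):
    """Value with the highest frequency."""
    return max(freq, key=freq.get)

print(grouped_mean(mistakes_to_pages))    # 300 / 250 = 1.2
print(grouped_median(mistakes_to_pages))  # 125th and 126th pages both have 1 mistake
print(grouped_mode(mistakes_to_pages))    # 1 (109 pages)
```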

Activity
The typical price (average) for a packet of potato crisps is $1.45. What are the possible prices in 6 different shops for this particular packet of potato crisps?

Which measure to use?


Depending on the type of data
For nominal data (such as sex or race), the mode is the only valid measure. Example:

For ordinal data (such as salary categories), the median and the mode can be used. Example:

Looking at distribution
When the mean, median and mode are equal, the scores are typically symmetric, as in a normal or bell-shaped distribution.
Example: Scores: 7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 13
Mean: 10 Median: 10 Mode: 10
We should note at this point that the normal distribution (bell curve) is an important concept for statisticians because it gives them a "theoretical standard" against which to compare data that may not form a perfect bell curve.

If you have data where the mean, median and mode are quite different, the scores are said to be skewed.
Example: Scores: 7, 8, 9, 10, 11, 11, 12, 12, 12, 13, 13
Mean: 10.7 Median: 11 Mode: 12
Scores that are concentrated at the right or high end of the scale are said to have a negative skew.
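The two sets of scores can be compared with a short Python sketch built on the standard `statistics` module. The helper name `summarize` is hypothetical, and the mean is rounded to one decimal place to match the figures quoted on the slides.

```python
from statistics import mean, median, mode

symmetric = [7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 13]
skewed = [7, 8, 9, 10, 11, 11, 12, 12, 12, 13, 13]

def summarize(scores):
    """Return (mean rounded to 1 d.p., median, mode) for a list of scores."""
    return round(mean(scores), 1), median(scores), mode(scores)

print(summarize(symmetric))  # (10, 10, 10)
print(summarize(skewed))     # (10.7, 11, 12)

# In the negatively skewed set the three measures come out in the order
# mean < median < mode.
skew_mean, skew_median, skew_mode = summarize(skewed)
print(skew_mean < skew_median < skew_mode)  # True
```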

In a positive skew, scores are concentrated near the left or low end of the scale. Suggest some numbers that will produce a positively skewed distribution.

For a positively skewed distribution, the mean will usually be the highest estimate of central tendency and the mode will usually be the lowest (assuming that the distribution has only one mode). For negatively skewed distributions, the mean will usually be the lowest estimate of central tendency and the mode will be the highest. In a skewed distribution with a single mode (positive or negative), the median will usually fall between the mean and the mode. When dealing with skewed distributions, researchers typically choose between the mean and the median as the best estimate of central tendency. As distributions go from symmetrical to more skewed, the researcher is more likely to choose the median over the mean.

Example
For example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries. Therefore, in this situation we would like to have a better measure of central tendency. The median ($15.5k) would be a better measure of central tendency in this situation.
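The salary figures can be checked with a few lines of Python. The list name `salaries_k` is illustrative, and the salaries are entered in thousands of dollars.

```python
from statistics import mean, median

# Salaries of the ten staff, in thousands of dollars
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)      # 307 / 10 = 30.7
median_salary = median(salaries_k)  # average of the 5th and 6th sorted values = 15.5

# How many staff earn less than the mean?
below_mean = sum(1 for s in salaries_k if s < mean_salary)

print(mean_salary, median_salary, below_mean)
```

Eight of the ten staff earn less than the mean, which is why the median describes the typical salary better here.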

Summary of when to use the mean, median and mode


Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
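The summary can also be written as a small lookup, sketched below in Python. The function name `best_measure` and its parameters are hypothetical. It simply encodes the four rows of the summary and is a rule of thumb, not a substitute for inspecting the data.

```python
def best_measure(variable_type, skewed=False):
    """Return the suggested measure of central tendency for a variable type.

    `variable_type` is one of "nominal", "ordinal", "interval" or "ratio";
    `skewed` only matters for interval and ratio data.
    """
    kind = variable_type.strip().lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown variable type: {variable_type!r}")

print(best_measure("nominal"))             # mode
print(best_measure("ratio"))               # mean
print(best_measure("ratio", skewed=True))  # median
```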

What is the most appropriate measure of central tendency?

Common mistakes
Many people think that 'mean' means the same thing as 'average'. It doesn't; 'mean' is a mathematical term. 'Average' is often used as a description for a person or data item, but in mathematics it means 'a number that typifies a set of numbers of which it is a function'. In other words, 'average' can refer to the mean, the median or the mode. The median is the middle value in a distribution, above and below which lie an equal number of values. The mean is a number that typifies a set of numbers, such as a geometric mean or an arithmetic mean. The mode is the value or item occurring most frequently in a series of observations or statistical data.

Example data 1: 2 5 5 6 9 12 15
Analysing the data, we get mean: 7.71, median: 6, mode: 5

Example data 2: 4 5 5 5 8 12 86
Analysing this data, we get mean: 17.857, median: 5, mode: 5
