Lec17 - Summarising Data


120MP SUMMARISING DATA

In a summary of some data, it is usually recommended to include the following 5 elements:

Average
Spread
Distribution
Outliers
Chart

This can be remembered using the acronym ASDOC.

Average
This is a ‘typical’ data value that is representative of the whole data set.

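Mean: the sum of all the data values divided by the number of values.

Median: the middle value when the data are arranged in order (or the mean of the two middle values when the number of values is even).

Mode: the most frequently occurring value.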
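As a rough illustration of the three averages, the sketch below uses Python's standard `statistics` module. The `times` list is made-up travel-time data, not taken from the lecture.

```python
import statistics

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

mean = statistics.mean(times)      # sum of the values / number of values
median = statistics.median(times)  # middle value of the sorted data
mode = statistics.mode(times)      # most frequently occurring value

print(f"mean={mean:.1f}, median={median}, mode={mode}")
```

In this made-up data set the single large value pulls the mean above the median, while the median and mode are unaffected by it.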

Spread
This gives a measure of how much the values vary:

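Range: the largest value minus the smallest value.

Interquartile Range: the upper quartile minus the lower quartile (Q3 - Q1), i.e. the spread of the middle 50% of the data.

Standard Deviation: a measure of the typical distance of the data values from the mean (the square root of the variance).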
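The three measures of spread can be sketched with the standard `statistics` module (Python 3.8 or later for `quantiles`). The data are made up, and the quartile convention is an assumption: `statistics.quantiles` defaults to the "exclusive" method, and other software may give slightly different quartiles.

```python
import statistics

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

# Range: largest value minus smallest value.
data_range = max(times) - min(times)

# Quartiles using the default "exclusive" method (one of several conventions).
q1, q2, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1

# Sample standard deviation (divisor n - 1).
sd = statistics.stdev(times)

print(f"range={data_range}, IQR={iqr}, sd={sd:.2f}")
```

The range and standard deviation are both inflated by the one extreme value, whereas the interquartile range only looks at the middle half of the data.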

Distribution
This is characterised by the shape and ‘skewness’ of the data.

If the mean and median are approximately equal then the data distribution is usually symmetrical. Symmetrical distributions could
be uniform (or rectangular), bell-shaped (or normal), or U-shaped.

If there are a few particularly high values then the frequency curve is stretched in the positive
direction, and the data are said to be positively skewed.
This usually results in the mean > median.

If there are a few particularly low values then the frequency curve is stretched in the negative
direction, and the data are said to be negatively skewed.
This usually results in the mean < median.

There are a few different formulae for measuring skewness. One is:
$$\text{skewness} = \frac{n \sum (x - \bar{x})^3}{(n-1)(n-2)\,s^3}$$
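A minimal sketch of this skewness measure follows, assuming that s denotes the sample standard deviation (divisor n - 1). The function name `sample_skewness` and the data are illustrative only.

```python
import statistics

def sample_skewness(values):
    """Skewness as n * sum((x - mean)^3) / ((n - 1)(n - 2) s^3).

    Assumes s is the sample standard deviation (divisor n - 1).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 values are needed")
    xbar = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed_deviations = sum((x - xbar) ** 3 for x in values)
    return n * cubed_deviations / ((n - 1) * (n - 2) * s ** 3)

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

print(sample_skewness(times))            # positive: a few high values
print(sample_skewness([1, 2, 3, 4, 5]))  # symmetrical data
```

A positive result corresponds to positive skew, a negative result to negative skew, and a value near zero to a roughly symmetrical distribution.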

Outliers
These are extreme or unusual values.

There are several formal calculations for determining whether a value is an outlier. Two are
described below:
A data value is treated as an outlier if it is:
1) more than 3 standard deviations away from the mean, or
2) more than 1.5 × IQR above Q3 or below Q1.
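The two rules can be sketched as follows. The function names and data are hypothetical, the standard deviation is taken to be the sample standard deviation, and the quartiles use the default convention of `statistics.quantiles`, so other software may flag slightly different values.

```python
import statistics

def outliers_by_sd(values, k=3.0):
    """Values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > k * sd]

def outliers_by_iqr(values, k=1.5):
    """Values more than k * IQR above Q3 or below Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

sd_outliers = outliers_by_sd(times)
iqr_outliers = outliers_by_iqr(times)
print(sd_outliers, iqr_outliers)
```

The two rules need not agree. In a small sample, an extreme value inflates the standard deviation itself, so the 3 sd rule can miss a value that the IQR rule flags.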

Chart
Useful for highlighting the overall picture and relevant results.

Bar Chart - Can be used to show percentages, or means and standard deviations.

Pie Chart - Can be used to show percentages (which should always add to 100).

Histogram - Can be used to present percentages for continuous data, and it also shows the shape of the
distribution.

Box plot (or box and whisker diagram) - Displays the minimum, lower quartile, median, upper
quartile, maximum, and any outliers, and hence the overall shape of the distribution.
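The five values that a box plot displays can be computed as in the sketch below. The helper name `five_number_summary` and the data are illustrative, and the quartiles again follow the default convention of `statistics.quantiles`, which may differ from the one used by a given charting package.

```python
import statistics

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

summary = five_number_summary(times)
print(summary)
```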

Ogive (or cumulative percentage frequency plot) - Used to show percentiles (e.g. quartiles).
[Figure: Empirical CDF of Time. Cumulative percent (0 to 100) plotted against travel time in minutes (0 to 180).]
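The points of a cumulative percentage plot can be computed without any plotting library, as in this sketch. The function names and data are hypothetical and are not the data behind the lecture's figure.

```python
def cumulative_percent(values, x):
    """Percentage of the data values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

def ogive_points(values):
    """(value, cumulative percent) pairs, one per distinct data value."""
    return [(x, cumulative_percent(values, x)) for x in sorted(set(values))]

# Hypothetical travel times in minutes (illustrative data only).
times = [12, 15, 15, 18, 20, 22, 25, 30, 95]

for x, pct in ogive_points(times):
    print(f"{x:>3} min: {pct:5.1f}%")
```

Reading across from a given percentage on the vertical axis to the curve gives the corresponding percentile, e.g. 50% gives the median.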
