DESCRIPTIVE STATISTICS (Chapter 17)

CASE STUDY

While Norm Gregory was here for the golf championship, I measured how far he hit 30 drives on the practice fairway. The results are given below in metres:

DRIVING A GOLF BALL

244.6   251.1   255.9   263.1   265.5   270.5
245.1   251.2   257.0   263.2   265.6   270.7
248.0   253.9   260.6   264.3   266.5   272.9
248.8   254.5   262.8   264.4   267.4   275.6
250.0   254.6   262.9   265.0   269.7   277.5

This type of data must be grouped before a histogram can be drawn. In forming groups, find the lowest and highest values, and then choose a group width so that there are about 6 to 12 groups. In this case the lowest value is 244.6 m and the highest is 277.5 m. This gives a range of 32.9 m, or roughly 35 m, so a group width of 5 m, with groups running from 240 m to 280 m, gives eight groups of equal width.
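The choice of groups described above can be checked with a short calculation. The following Python sketch is only an illustration: the variable names are our own, and the group width of 5 m is the value chosen in the text rather than something the code derives.

```python
import math

# The 30 drive distances (in metres) from the case study.
drives = [
    244.6, 251.1, 255.9, 263.1, 265.5, 270.5,
    245.1, 251.2, 257.0, 263.2, 265.6, 270.7,
    248.0, 253.9, 260.6, 264.3, 266.5, 272.9,
    248.8, 254.5, 262.8, 264.4, 267.4, 275.6,
    250.0, 254.6, 262.9, 265.0, 269.7, 277.5,
]

lowest, highest = min(drives), max(drives)
data_range = highest - lowest  # about 32.9 m

width = 5  # group width chosen in the text
# Lower edge of the first group: the multiple of `width` at or below the lowest value.
start = math.floor(lowest / width) * width
# Upper edge of the last group: the next multiple of `width` above the highest value.
end = math.floor(highest / width) * width + width
num_groups = (end - start) // width

print(f"range = {data_range:.1f} m, groups run from {start} to {end}, "
      f"number of groups = {num_groups}")
```

With these data the groups run from 240 m to 280 m, which gives the eight groups used in the table.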

We will use the following method of grouping. The first group ‘240 - < 245’ includes any data value which is at least 240 m but less than 245 m. Similarly the group ‘260 - < 265’ includes data which is at least 260 m but less than 265 m. This technique creates a group for every distance of at least 240 m but less than 280 m. A tally is used to count the data that falls in each group. Do not try to determine the number of data values in the ‘240 - < 245’ group first off. Simply place a vertical stroke in the tally column to register an entry as you work through the data from start to finish. Every fifth entry in a group is marked with a diagonal line through the previous four so groups of five can be counted quickly.

A frequency column summarises the number of data values in each group. The relative frequency column measures the percentage of the total number of data values that are in each group.

Norm Gregory’s 30 drives

Distance (m)    Tally       Frequency (f)    % Relative Frequency
240 - < 245     |                 1                  3.3
245 - < 250     |||               3                 10.0
250 - < 255     ||||/ |           6                 20.0
255 - < 260     ||                2                  6.7
260 - < 265     ||||/ ||          7                 23.3
265 - < 270     ||||/ |           6                 20.0
270 - < 275     |||               3                 10.0
275 - < 280     ||                2                  6.7
Totals                           30                100.0

From this table two histograms can be drawn: a frequency histogram and a relative frequency histogram. They look as follows:
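As a cross-check on the tally, the frequency and relative frequency columns can be reproduced programmatically. This is a minimal sketch, assuming the grouping rule stated earlier (at least the lower bound, less than the upper bound); the helper `group_start` and the `table` list are names introduced here for illustration only.

```python
import math
from collections import Counter

# The 30 drive distances (in metres) from the case study.
drives = [
    244.6, 251.1, 255.9, 263.1, 265.5, 270.5,
    245.1, 251.2, 257.0, 263.2, 265.6, 270.7,
    248.0, 253.9, 260.6, 264.3, 266.5, 272.9,
    248.8, 254.5, 262.8, 264.4, 267.4, 275.6,
    250.0, 254.6, 262.9, 265.0, 269.7, 277.5,
]

WIDTH = 5  # group width in metres


def group_start(distance, width=WIDTH):
    """Return the lower bound of the group that contains `distance`."""
    return math.floor(distance / width) * width


# Counting group lower bounds plays the role of the tally column.
counts = Counter(group_start(d) for d in drives)

table = []
for lower in range(240, 280, WIDTH):
    frequency = counts[lower]
    relative = round(100 * frequency / len(drives), 1)
    table.append((lower, lower + WIDTH, frequency, relative))
    print(f"{lower} - < {lower + WIDTH}   {frequency:>2}   {relative:>5.1f}")

print(f"Totals        {sum(row[2] for row in table):>2}")
```

A value that lies exactly on a boundary, such as 250.0, falls into the group that starts at that boundary, which matches the ‘at least ... but less than’ rule.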


[Figure: A frequency histogram displaying the distribution of 30 of Norm Gregory’s drives. Vertical axis: frequency (0 to 7). Horizontal axis: distance (m), 240 to 280 in steps of 5.]

[Figure: A relative frequency histogram displaying the distribution of 30 of Norm Gregory’s drives. Vertical axis: relative frequency (%), 0 to 30. Horizontal axis: distance (m), 240 to 280 in steps of 5.]

The advantage of the relative frequency histogram is that we can easily compare it with other distributions with different numbers of data values. Using percentages allows for a fair comparison. Notice how the axes are both labelled and the graphs have titles. The left edge of each bar is the first possible entry for that group.
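The shape of both histograms can be roughed out in plain text from the group frequencies in the table. The sketch below is an illustration only, not a substitute for a properly drawn graph: it prints one `#` per drive and the corresponding percentage, so the same bars can be read on either scale. The layout and variable names are our own.

```python
# Group lower bounds (m) and frequencies taken from the frequency table.
groups = [(240, 1), (245, 3), (250, 6), (255, 2),
          (260, 7), (265, 6), (270, 3), (275, 2)]
WIDTH = 5

total = sum(frequency for _, frequency in groups)

rows = []
for lower, frequency in groups:
    percent = 100 * frequency / total
    bar = "#" * frequency  # one symbol per drive
    rows.append(f"{lower} - < {lower + WIDTH}  {bar:<7}  {percent:4.1f}%")

print("Distribution of 30 of Norm Gregory's drives")
print("\n".join(rows))
```

Because the percentages do not depend on the total number of data values, a second data set of a different size could be printed in the same way and compared row by row.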

Example 1

The weights of parcels sent on a given day from a post office were, in kilograms:

2.1, 3.0, 0.6, 1.5, 1.9, 2.4, 3.2, 4.2, 2.6, 3.1, 1.8, 1.7, 3.9, 2.4, 0.3, 1.5, 1.2

Organise the data using a frequency table and graph the data.

The data is continuous since the weight could be any value from 0.1 kg up to 5 kg. The lowest weight was 0.3 kg and the heaviest was 4.2 kg, so we will use class intervals of 1 kg. The class interval ‘1 - < 2’ includes all weights from 1 kg up to, but not including, 2 kg.

Weight (kg)    0 - < 1    1 - < 2    2 - < 3    3 - < 4    4 - < 5
Frequency         2          6          4          4          1
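The class frequencies can be verified with a few lines of Python. This is a minimal sketch, assuming class intervals of 1 kg that start at 0 kg, so the class of a weight is simply its whole-number part; the variable names are illustrative.

```python
import math
from collections import Counter

# Parcel weights (in kilograms) from Example 1.
weights = [2.1, 3.0, 0.6, 1.5, 1.9, 2.4, 3.2, 4.2, 2.6,
           3.1, 1.8, 1.7, 3.9, 2.4, 0.3, 1.5, 1.2]

# With 1 kg classes, floor(weight) is the lower bound of the class.
counts = Counter(math.floor(w) for w in weights)

frequencies = [counts[lower] for lower in range(5)]
for lower, frequency in enumerate(frequencies):
    print(f"{lower} - < {lower + 1}   {frequency}")
```

The frequencies add up to 17, the number of parcels listed.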

A histogram is used to graph this continuous data.

[Figure: Histogram titled ‘Weights of parcels’. Vertical axis: frequency (0 to 6). Horizontal axis: weight (kg), 0 to 5.]

A stemplot could also be used to organise the data:

Stem | Leaf
  0  | 36
  1  | 255789
  2  | 1446
  3  | 0129
  4  | 2

Scale: 1 | 2 means 1.2 kg

Note: The modal class is (1 - < 2) kg as this occurred most frequently.
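The stemplot and the modal class can also be built from the raw weights. The sketch below is one possible approach, assuming every weight is given to one decimal place so that the stem is the whole number of kilograms and the leaf is the tenths digit; the names used are our own.

```python
from collections import defaultdict

# Parcel weights (in kilograms) from Example 1.
weights = [2.1, 3.0, 0.6, 1.5, 1.9, 2.4, 3.2, 4.2, 2.6,
           3.1, 1.8, 1.7, 3.9, 2.4, 0.3, 1.5, 1.2]

stems = defaultdict(list)
for w in sorted(weights):
    # Work in tenths of a kilogram to avoid floating-point surprises.
    tenths = round(w * 10)
    stem, leaf = divmod(tenths, 10)
    stems[stem].append(leaf)

lines = [
    f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}"
    for stem in range(min(stems), max(stems) + 1)
]
print("\n".join(lines))

# The stem with the most leaves corresponds to the modal class.
modal_stem = max(stems, key=lambda s: len(stems[s]))
print(f"Modal class: {modal_stem} - < {modal_stem + 1} kg")
```

Sorting the weights first means the leaves in each row appear in increasing order, as in the stemplot of the example.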

