Statistics S1-Chap1 For Ocr
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
A catalogue record for this book is available from the British Library
ISBN 0 521 54893 4 paperback
Cover image © Digital Vision
Contents
Introduction
Representation of data
Measures of location
Measures of spread
Probability
Probability distributions
Correlation
Regression
Revision exercise
Answers
Index
1 Representation of data
This chapter looks at ways of displaying numerical data using diagrams. When you have
completed it you should
1.1 Introduction
The collection, organisation and analysis of numerical information are all part of the subject
called statistics. Pieces of numerical and other information are called data. A more helpful
definition of data is 'a series of facts from which conclusions may be drawn'.
In order to collect data you need to observe or to measure some property. This property is
called a variable. The data which follow were taken from the internet, which has many sites
containing data sources. In this example a variety of measurements was taken on packets of
breakfast cereal for sale in a supermarket in the USA. Each column represents a variable. So, for
example, mfr, sodium and shelf are all variables.
Datafile Name: Cereals
Description: Data which refer to various brands of breakfast cereal in a particular store. A
value of −1 for nutrients indicates a missing observation.
Number of cases: 77
Variable Names
1 name: name of cereal
2 mfr: manufacturer of cereal, where A = American Home Food Products; G = General Mills;
K = Kelloggs; N = Nabisco; P = Post; Q = Quaker Oats; R = Ralston Purina
3 type: cold (C) or hot (H)
4 cals: calories per serving
5 fat: grams of fat
6 sodium: milligrams of sodium
Statistics 1
name
100% Bran
100% Natural Bran
All-Bran
All-Bran with Extra Fiber
Almond Delight
Apple Cinnamon Cheerios
Apple Jacks
Basic 4
Bran Chex
Bran Flakes
CapnCrunch
Cheerios
Cinnamon Toast Crunch
Clusters
Cocoa Puffs
Corn Chex
Corn Flakes
Corn Pops
Count Chocula
Cracklin Oat Bran
Cream of Wheat (Quick)
Crispix
Crispy Wheat & Raisins
Double Chex
Froot Loops
Frosted Flakes
Frosted Mini-Wheats
Fruit & Fibre Dates, Walnuts,
and Oats
Fruitful Bran
Fruity Pebbles
Golden Crisp
Golden Grahams
Grape Nuts Flakes
Grape-Nuts
Great Grains Pecan
Honey Graham Ohs
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
H
C
C
C
C
C
C
C
70
120
70
50
110
110
110
130
90
90
120
110
120
110
110
110
100
110
110
110
100
110
100
100
110
110
100
120
1
5
1
0
2
2
0
2
1
0
2
2
3
2
1
0
0
0
1
3
0
0
1
0
1
0
0
2
130
15
260
140
200
180
125
210
200
210
220
290
210
140
180
280
290
90
180
140
80
220
140
190
125
200
0
160
5
8
7
8
14
10.5
11
18
15
13
12
17
13
13
12
22
21
13
12
10
21
21
11
18
11
14
14
12
3
3
3
3
3
1
2
3
1
3
2
1
2
3
2
1
1
2
2
3
2
3
3
3
2
1
2
3
1
1
1
1
1
1
1
1.3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1.3
68
34
59
94
34
30
33
37
49
53
18
51
20
40
23
41
46
36
22
40
65
47
36
44
32
31
58
41
K
P
P
G
P
P
P
Q
C
C
C
C
C
C
C
C
120
110
100
110
100
110
120
120
0
1
0
1
1
0
3
2
240
135
45
280
140
170
75
220
14
13
11
15
15
17
13
12
3
2
1
2
3
3
3
2
1.3
1
1
1
1
1
1
1
41
28
35
24
52
53
46
22
name
Honey Nut Cheerios
Honey-comb
Just Right Crunchy Nuggets
Just Right Fruit & Nut
Kix
Life
Lucky Charms
Maypo
Muesli Raisins, Dates, & Almonds
Muesli Raisins, Peaches, & Pecans
Mueslix Crispy Blend
Multi-Grain Cheerios
Nut & Honey Crunch
Nutri-Grain Almond-Raisin
Nutri-grain Wheat
Oatmeal Raisin Crisp
Post Nat. Raisin Bran
Product 19
Puffed Rice
Puffed Wheat
Quaker Oat Squares
Quaker Oatmeal
Raisin Bran
Raisin Nut Bran
Raisin Squares
Rice Chex
Rice Krispies
Shredded Wheat
Shredded WheatnBran
Shredded Wheat spoon size
Smacks
Special K
Strawberry Fruit Wheats
Total Corn Flakes
Total Raisin Bran
Total Whole Grain
Triples
Trix
Wheat Chex
Wheaties
Wheaties Honey Gold
Table 1.1. Datafile 'Cereals'.
C
C
C
C
C
C
C
H
C
C
C
C
C
C
C
C
C
C
C
C
C
H
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
110
110
110
140
110
100
110
100
150
150
160
100
120
140
90
130
120
100
50
50
100
100
120
100
90
110
110
80
90
90
110
110
90
110
140
100
110
110
100
100
110
1
0
1
1
1
2
1
1
3
3
2
1
1
2
0
2
1
0
0
0
1
2
1
2
0
0
0
0
0
0
1
0
0
1
1
1
1
1
1
1
1
250
180
170
170
260
150
180
0
95
150
150
220
190
220
170
170
200
320
0
0
135
0
210
140
0
240
290
0
0
0
70
230
15
200
190
200
250
140
230
200
200
11.5
14
17
20
21
12
12
16
16
16
17
15
15
21
18
13.5
11
20
13
10
14
1
14
10.5
15
23
22
16
19
20
9
16
15
21
15
16
21
13
17
17
16
1
1
3
3
2
2
2
2
3
3
3
1
2
3
3
3
3
3
3
3
3
1
2
3
3
1
1
1
1
1
2
1
2
3
3
3
3
2
1
1
1
1
1
1
1.3
1
1
1
1
1
1
1.5
1
1
1.3
1
1.3
1.3
1
0.5
0.5
1
1
1.3
1
1
1
1
0.8
1
1
1
1
1
1
1.5
1
1
1
1
1
1
31
29
37
36
39
45
27
55
37
34
30
40
30
41
60
30
38
42
61
63
50
51
39
40
55
42
41
68
74
73
31
53
59
39
29
47
39
28
50
52
36
The variable mfr (manufacturer) has several different letter codes, for example N, G, K, Q
and P.
The variable sodium takes values such as 130, 15, 260 and 140.
The variable shelf takes values 1, 2 or 3.
You can see that there are different types of variable. The variable mfr is non-numerical: such
variables are usually called qualitative. The other two variables are called quantitative,
because the values they take are numerical.
Example 1.1.1
Which of the variables (a) type, (b) carbo, (c) mass are quantitative, and which are
qualitative?
(a) type is a qualitative variable, since the two different types of breakfast cereal,
cold and hot, are recorded as words (letters) rather than as numbers.
(b) carbo is the number of grams of complex carbohydrates. This is quantitative,
since the possible values are numerical.
(c) mass stands for mass in ounces. This is also quantitative, since the different
possible masses per serving would be given in the form of numbers.
Quantitative numerical data can be subdivided into two categories. For example, sodium,
the mass of sodium in milligrams, which can take any value in a particular range, is called a
continuous variable. Display shelf, on the other hand, is a discrete variable: it can only
take the integer values 1, 2 or 3, and there is a clear step between each possible value. It would
not be sensible, for example, to refer to display shelf number 2.43.
In summary:
A variable is qualitative if it is not possible for it to take a numerical
value.
A variable is quantitative if it can take a numerical value.
A quantitative variable which can take any value in a given range is
continuous.
A quantitative variable which has clear steps between its possible values is
discrete.
1.2 Stem-and-leaf diagrams
The datafile on cereals has one column which gives a rating of the cereals on a scale of 0–100.
The ratings are given below.
68 18 65 35 39 60 55 29 34 51 47
24 45 30 42 47 59 20 36 52 27 38
41 39 94 40 44 53 55 42 68 28 34
23 32 46 37 61 74 50 30 41 31 22
34 63 73 52 33 46 58 31 30 50 31
36 37 36 41 29 40 51 53 49 22 41
37 30 39 59 53 40 28 36 41 40 39
These values are what statisticians call raw data. Raw data are the values collected in a survey
or experiment before they are categorised or arranged in any way. Usually raw data appear in
the form of a list. It is very difficult to draw any conclusions from these raw data just by
looking at the numbers. One way of arranging the values that gives some information about
the patterns within the data is a stem-and-leaf diagram.
In this case the stems are the tens digits and the leaves are the units digits. You write the stems
to the left of a vertical line and the leaves to the right of the line. So, for example, you would
write the first value, 68, as 6|8.
The leaves belonging to one stem are then written in the same row. The stem-and-leaf diagram
for these data is shown in Fig. 1.2. The key shows what the stems and leaves mean.
0 |  (0)
1 | 8  (1)
2 | 0 3 2 8 4 2 9 7 9 8  (10)
3 | 4 4 0 3 7 6 6 2 1 5 1 7 6 9 7 4 0 0 0 8 9 1 9 9 6  (25)
4 | 9 0 1 6 0 7 4 1 1 6 5 0 1 2 0 2 1 7  (18)
5 | 9 3 1 8 2 3 5 0 1 5 3 9 0 2  (14)
6 | 8 5 0 1 3 8  (6)
7 | 4 3  (2)
8 |  (0)
9 | 4  (1)
Key: 6|8 means 68
Fig. 1.2. Stem-and-leaf diagram of cereal ratings.
The numbers in the brackets tell you how many leaves belong to each stem (and may be
omitted). The digits in each stem form a horizontal block, similar to a bar on a bar chart,
which gives a visual impression of the distribution. In fact, if you rotate a stem-and-leaf
diagram anticlockwise through 90° it looks like a bar chart. It is also common to rewrite the
leaves in numerical order; the stem-and-leaf diagram formed in this way is called an ordered
stem-and-leaf diagram. The ordered stem-and-leaf diagram for the cereal ratings is shown
in Fig. 1.3.
0 |  (0)
1 | 8  (1)
2 | 0 2 2 3 4 7 8 8 9 9  (10)
3 | 0 0 0 0 1 1 1 2 3 4 4 4 5 6 6 6 6 7 7 7 8 9 9 9 9  (25)
4 | 0 0 0 0 1 1 1 1 1 2 2 4 5 6 6 7 7 9  (18)
5 | 0 0 1 1 2 2 3 3 3 5 5 8 9 9  (14)
6 | 0 1 3 5 8 8  (6)
7 | 3 4  (2)
8 |  (0)
9 | 4  (1)
Key: 6|8 means 68
Fig. 1.3. Ordered stem-and-leaf diagram of cereal ratings.
When you are asked to draw a stem-and-leaf diagram you should assume that an ordered
stem-and-leaf diagram is required.
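The construction described above can be sketched in a few lines of Python. This is only an illustration: the function names `stem_and_leaf` and `show` and the `leaf_unit` parameter are hypothetical choices, not part of the text. The function splits each value into a stem and a final-digit leaf, sorts the leaves, and keeps empty stems so that gaps in the data stay visible.

```python
from collections import defaultdict

# Cereal ratings, in the order they are listed in the text.
ratings = [
    68, 18, 65, 35, 39, 60, 55, 29, 34, 51, 47,
    24, 45, 30, 42, 47, 59, 20, 36, 52, 27, 38,
    41, 39, 94, 40, 44, 53, 55, 42, 68, 28, 34,
    23, 32, 46, 37, 61, 74, 50, 30, 41, 31, 22,
    34, 63, 73, 52, 33, 46, 58, 31, 30, 50, 31,
    36, 37, 36, 41, 29, 40, 51, 53, 49, 22, 41,
    37, 30, 39, 59, 53, 40, 28, 36, 41, 40, 39,
]

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: ordered leaves}; empty stems between min and max are kept."""
    rows = defaultdict(list)
    for v in values:
        # Express the value in leaf units, then split off the last digit.
        stem, leaf = divmod(round(v / leaf_unit), 10)
        rows[stem].append(leaf)
    first, last = min(rows), max(rows)
    return {stem: sorted(rows[stem]) for stem in range(first, last + 1)}

def show(diagram):
    """Print one row per stem, with the leaf count in brackets."""
    for stem, leaves in diagram.items():
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}  ({len(leaves)})")

show(stem_and_leaf(ratings))
```

Calling `stem_and_leaf(data, leaf_unit=0.1)` treats the first decimal place as the leaf, which is one way to handle values that are not whole numbers.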
So far the stem-and-leaf diagrams discussed have consisted
of data values which are integers between 0 and 100. With
suitable adjustments, you can use stem-and-leaf diagrams
for other data values.
For example, the data 6.2, 3.1, 4.8, 9.1, 8.3, 6.2, 1.4, 9.6, 0.3,
0.3, 8.4, 6.1, 8.2, 4.3 could be illustrated in the
stem-and-leaf diagram in Fig. 1.4.
0 | 3 3  (2)
1 | 4  (1)
2 |  (0)
3 | 1  (1)
4 | 3 8  (2)
5 |  (0)
6 | 1 2 2  (3)
7 |  (0)
8 | 2 3 4  (3)
9 | 1 6  (2)
Key: 0|3 means 0.3
Fig. 1.4. Stem-and-leaf diagram of the data above.

Table 1.5 is a datafile about brain sizes which will be used in several examples.
Datafile Name: Brain size (Data reprinted from Intelligence, Vol. 15, Willerman et al,
In vivo brain size . . ., 1991, with permission from Elsevier Science)
Number of cases: 40
Variable Names
1 gender: male or female
2 FSIQ: full scale IQ scores based on the four Wechsler (1981) subtests
3 VIQ: verbal IQ scores based on the four Wechsler (1981) subtests
4 PIQ: performance IQ scores based on the four Wechsler (1981) subtests
5 mass: body mass in pounds (1 pound = 0.45 kg)
gender
FSIQ
VIQ
PIQ
mass
Female
Male
Male
Male
Female
Female
Female
Female
Male
Male
Female
Male
Male
Female
Female
Female
Female
Male
Female
Male
Male
Male
Female
Male
Female
Male
Female
Male
Female
Female
Female
Male
Male
Male
Female
Female
Male
Female
Male
Male
133
140
139
133
137
99
138
92
89
133
132
141
135
140
96
83
132
100
101
80
83
97
135
139
91
141
85
103
77
130
133
144
103
90
83
133
140
88
81
89
132
150
123
129
132
90
136
90
93
114
129
150
129
120
100
71
132
96
112
77
83
107
129
145
86
145
90
96
83
126
126
145
96
96
90
129
150
86
90
91
124
124
150
128
134
110
131
98
84
147
124
128
124
147
90
96
120
102
84
86
86
84
134
128
102
131
84
110
72
124
132
137
110
86
81
128
124
94
74
89
118
143
172
147
146
138
175
134
172
118
151
155
155
146
135
127
178
136
180
186
122
132
114
171
140
187
106
159
127
191
192
181
143
153
144
139
148
179
64.5
72.5
73.3
68.8
65.0
69.0
64.5
66.0
66.3
68.8
64.5
70.0
69.0
70.5
66.0
68.0
68.5
73.5
66.3
70.0
76.5
62.0
68.0
63.0
72.0
68.0
77.0
63.0
66.5
62.5
67.0
75.5
69.0
66.5
66.5
70.5
64.5
74.0
75.5
816 932
1 001 121
1 038 437
965 353
951 545
928 799
991 305
854 258
904 858
955 466
833 868
1 079 549
924 059
856 472
878 897
865 363
952 244
945 088
808 020
889 083
892 420
905 940
790 619
955 003
831 772
935 494
798 612
1 062 462
793 549
866 662
857 782
949 589
997 925
879 987
834 344
948 066
949 395
893 983
930 016
935 863
147 122 146 114 138 140 175 106 118
159 155 127 146 143 135 153 127 139
You can represent these data with the stem-and-leaf diagram shown in Fig. 1.6, which uses
stems from 10 to 17.
10 | 6  (1)
11 | 4 8 8  (3)
12 | 2 7 7  (3)
13 | 5 6 8 9  (4)
14 | 0 3 6 6 7  (5)
15 | 3 5 9  (3)
16 |  (0)
17 | 5  (1)
Key: 10|6 means 106 pounds
Fig. 1.6. Stem-and-leaf diagram of the female masses.
Two sets of data can be compared in a back-to-back stem-and-leaf diagram. Consider the
following data, which are the masses of 18 males in pounds (correct to the nearest pound),
taken from the datafile Brain size.
143 132 172 171 134 187 172 191 151
192 155 181 178 144 180 148 186 179
Fig. 1.7 shows these data added to Fig. 1.6 as leaves to the left of the stem. Note that the stem
has been extended to 19 to accommodate the highest male mass of 192 pounds.
            males        females
(0)                | 10 | 6            (1)
(0)                | 11 | 4 8 8        (3)
(0)                | 12 | 2 7 7        (3)
(2)            4 2 | 13 | 5 6 8 9      (4)
(3)          8 4 3 | 14 | 0 3 6 6 7    (5)
(2)            5 1 | 15 | 3 5 9        (3)
(0)                | 16 |              (0)
(5)      9 8 2 2 1 | 17 | 5            (1)
(4)        7 6 1 0 | 18 |              (0)
(2)            2 1 | 19 |              (0)
Key: 10|6 means 106 pounds
Fig. 1.7. Back-to-back stem-and-leaf diagram of male and female masses.
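A back-to-back diagram can be built in the same way, as the sketch below suggests. The function name `back_to_back` is a hypothetical choice; the male masses are the 18 values listed in the text, and the female masses are read off the right-hand leaves of Fig. 1.7. Leaves on the left are stored in descending order so that, as in the figure, the smallest leaf sits nearest the stem.

```python
males = [143, 132, 172, 171, 134, 187, 172, 191, 151,
         192, 155, 181, 178, 144, 180, 148, 186, 179]

# Female masses read off the right-hand leaves of Fig. 1.7.
females = [106, 114, 118, 118, 122, 127, 127, 135, 136, 138,
           139, 140, 143, 146, 146, 147, 153, 155, 159, 175]

def back_to_back(left, right):
    """Return one (left leaves, stem, right leaves) tuple per stem."""
    combined = left + right
    rows = []
    for stem in range(min(combined) // 10, max(combined) // 10 + 1):
        # Left leaves run outwards from the stem, so they are sorted in reverse.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((left_leaves, stem, right_leaves))
    return rows

for left_leaves, stem, right_leaves in back_to_back(males, females):
    left_text = " ".join(map(str, left_leaves))
    right_text = " ".join(map(str, right_leaves))
    print(f"({len(left_leaves)}) {left_text:>12} | {stem} | {right_text:<12} ({len(right_leaves)})")
```

Printing both counts on each row makes it easy to check the output against the bracketed totals in the figure.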
Exercise 1A
1 The following stem-and-leaf diagram illustrates the lengths, in cm, of a sample of 15 leaves
fallen from a tree. The values are given correct to 1 decimal place.
4
5
6
7
8
9
3
4
3
6
0
1
1
7
2
6
3
4
(1)
(5)
(4)
(3)
(0)
(2)
160 142 186 182 141 169 168 173 153 160 169 189 161 171
171 181 159 188 168 175 163 151 157 165 186 162 166 177
179 164 181 184 167 172 171 161 154 181 183 170 145 152
176 167 156 178 155 180 159 151 161 137 171 177 182 143
Construct a stem-and-leaf diagram using two rows for each stem so that, for example, with
a stem of 15 the first leaf may have digits 0 to 4 and the second leaf may have digits 5 to 9.
5 A selection of 25 of A. A. Michelson's measurements of the speed of light, carried out in
1882, is given below. The figures are in thousands of kilometres per second and are given
correct to 5 significant figures.
299.84 299.94 299.72 299.96 299.81
299.90 299.87 299.76 299.83 300.00
300.07 299.62 299.93 299.79 299.88
299.65 299.93 299.97 299.88 299.80
299.85 299.98 299.75 299.74 299.91

1.351 1.336 1.348 1.344 1.345 1.342 1.362 1.350 1.353
1.338 1.353 1.341 1.341 1.347 1.322 1.342 1.342 1.354
1.356 1.353 1.347 1.339 1.329 1.349 1.351 1.346 1.370
1.3
Sodium (mg)   Tally                      Frequency
0–49          |||| |||| ||               12
50–99         ||||                       5
100–149       |||| |||| ||               12
150–199       |||| |||| |||| ||          17
200–249       |||| |||| |||| |||| |      21
250–299       |||| ||||                  9
300–349       |                          1
Table 1.8 is called a grouped frequency distribution. This table shows how many values
of the variable lie in each class. The pattern of a grouped frequency distribution is decided to
some extent by the choice of classes. There would have been a different appearance to the
distribution if the classes 0–99, 100–199, 200–299 and 300–399 had been chosen. There is no
clear rule about how many classes should be chosen or what size they should be, but it is usual
to have from 5 to 10 classes.
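The effect of the choice of classes can be checked with a short Python sketch. The sodium values below are copied from the sodium column of Table 1.1; the function name `grouped_frequencies` and its parameters are hypothetical, and the sketch assumes equal-width classes starting at 0.

```python
# Sodium content (mg) of the 77 cereals, copied from Table 1.1.
sodium = [
    130, 15, 260, 140, 200, 180, 125, 210, 200, 210, 220, 290, 210, 140,
    180, 280, 290, 90, 180, 140, 80, 220, 140, 190, 125, 200, 0, 160,
    240, 135, 45, 280, 140, 170, 75, 220,
    250, 180, 170, 170, 260, 150, 180, 0, 95, 150, 150, 220, 190, 220,
    170, 170, 200, 320, 0, 0, 135, 0, 210, 140, 0, 240, 290, 0, 0, 0,
    70, 230, 15, 200, 190, 200, 250, 140, 230, 200, 200,
]

def grouped_frequencies(values, width, n_classes, start=0):
    """Count integer values in n_classes equal-width classes beginning at start."""
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    labels = [f"{start + i * width}-{start + (i + 1) * width - 1}" for i in range(n_classes)]
    return list(zip(labels, counts))

for label, count in grouped_frequencies(sodium, width=50, n_classes=7):
    print(f"{label:>8}  {count}")
```

Re-running with `width=100, n_classes=4` gives the coarser grouping mentioned in the text, and shows how much detail a wider class hides.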
Grouping the data into classes inevitably means losing some information. Someone looking at
the table would not know the exact values of the observations in the 0–49 category. All he or
she would know for certain is that there were 12 such observations.
Consider the class 50–99. This refers to cereal packets which contained from 50 to 99 mg of
sodium. The amount of sodium is a continuous variable and the amounts appear to have been
rounded to the nearest mg. If this is true then the class labelled as 50–99 would actually
contain values from 49.5 up to (but not including) 99.5. These real endpoints, 49.5 and 99.5,
are referred to as the class boundaries. The class boundaries are generally used in most
numerical and graphical summaries of data. In this example you were not actually told that
the data were recorded to the nearest mg: you merely assumed that this was the case. In most
examples you will know how the data were recorded.
Example 1.3.1
For each case below give the class boundaries of the first class.
(a) The heights of 100 students were recorded to the nearest centimetre.
Height, h (cm)   160–164   165–169   170–174   ...
Frequency        7         9         13        ...
(b) The masses in kilograms of 40 patients entering a doctor's surgery on one day were
recorded to the nearest kilogram.
Mass, m (kg)     55–   60–   65–   ...
Frequency        9     15    12    ...
(c) A group of 40 motorists was asked to state the ages at which they passed their driving
tests.
Age, a (years)   17–   20–   23–   ...
Frequency        6     11    7     ...
(a) The minimum and maximum heights for someone in the first class are 159.5 cm
and 164.5 cm. The class boundaries are given by 159.5 ≤ h < 164.5.
(b) The first class appears to go from 55 kg up to but not including 60 kg, but as
the measurement has been made to the nearest kg the lower and upper class
boundaries are 54.5 kg and 59.5 kg. The class boundaries are given by
54.5 ≤ m < 59.5.
(c) Age is recorded to the number of completed years, so 17– contains those who
passed their tests from the day of their 17th birthday up to, but not including,
the day of their 20th birthday. The class boundaries are given by 17 ≤ a < 20.
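The two recording conventions in this example can be sketched as two small Python functions. The names `boundaries_rounded` and `boundaries_completed` are hypothetical; the first assumes the data were rounded to the nearest unit, the second assumes they were recorded in completed units, as ages usually are.

```python
def boundaries_rounded(lowest, highest, unit=1):
    """Class recorded to the nearest `unit`: extend each limit by half a unit."""
    half = unit / 2
    return (lowest - half, highest + half)

def boundaries_completed(lowest, next_lowest):
    """Class recorded in completed units: the limits are the boundaries."""
    return (lowest, next_lowest)

print(boundaries_rounded(160, 164))   # (a) heights to the nearest cm
print(boundaries_rounded(55, 59))     # (b) masses to the nearest kg
print(boundaries_completed(17, 20))   # (c) ages in completed years
```

Keeping the two cases as separate functions makes the assumption explicit each time a boundary is computed.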
Sometimes discrete data are grouped into classes. For example, the test scores of 40 students
might appear as in Table 1.12.
Score       0–9   10–19   20–29   30–39   40–59
Frequency   14    9       9       3       5
What are the class boundaries? There is no universally accepted answer, but a common
convention is to use 9.5 and 19.5 as the class boundaries for the class 10–19. Although it may
appear strange, the class boundaries for the first class would be −0.5 and 9.5.
When a grouped frequency distribution contains continuous data, one of the most common
forms of graphical display is the histogram. A histogram looks similar to a bar chart, but
there are two important differences.
A bar chart which represents continuous data is a histogram if
• the bars have no spaces between them (though there may be bars of
height zero, which look like spaces), and
• the area of each bar is proportional to the frequency.
Height, h (cm)   Frequency
0 ≤ h < 5        3
5 ≤ h < 10       5
10 ≤ h < 15      11
15 ≤ h < 20      6
20 ≤ h < 25      3
25 ≤ h < 30      2
The histogram to represent the set of data in Table 1.13 is shown in Fig. 1.14.
Another person recorded the same results by combining the last two rows as in Table 1.15, and
then drew the diagram shown in Fig. 1.16.
Height, h (cm)   Frequency
0 ≤ h < 5        3
5 ≤ h < 10       5
10 ≤ h < 15      11
15 ≤ h < 20      6
20 ≤ h < 30      5
frequency density = frequency / width of class.
Example 1.3.2
The grouped frequency distribution in Table 1.18 represents the heights in inches of a sample
of 39 of the people from the datafile Brain size (see the previous section). Represent these
data in a histogram.
Height (inches)   62–63   64–65   66–67   68–71   72–75   76–79
Frequency         4       5       8       13      5       4
Find the frequency densities by dividing the frequency of each class by the width of
the class, as shown in Table 1.19.
Height, h (inches)   Class boundaries    Class width   Frequency   Frequency density
62–63                61.5 ≤ h < 63.5     2             4           2
64–65                63.5 ≤ h < 65.5     2             5           2.5
66–67                65.5 ≤ h < 67.5     2             8           4
68–71                67.5 ≤ h < 71.5     4             13          3.25
72–75                71.5 ≤ h < 75.5     4             5           1.25
76–79                75.5 ≤ h < 79.5     4             4           1
Table 1.19. Calculation of frequency density for the data in Table 1.18.
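The frequency-density calculation in Table 1.19 can be reproduced with a minimal Python sketch. The class boundaries used here assume that heights were recorded to the nearest inch, so that the class 62–63 runs from 61.5 to 63.5; the function name `frequency_densities` is a hypothetical choice.

```python
# (lower boundary, upper boundary, frequency) for each class in Table 1.18,
# assuming heights were recorded to the nearest inch.
classes = [
    (61.5, 63.5, 4),
    (63.5, 65.5, 5),
    (65.5, 67.5, 8),
    (67.5, 71.5, 13),
    (71.5, 75.5, 5),
    (75.5, 79.5, 4),
]

def frequency_densities(classes):
    """Frequency density = frequency / class width, one value per class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(classes)
for (lower, upper, freq), density in zip(classes, densities):
    # The bar's area (width x height) recovers the class frequency.
    print(f"{lower}-{upper}: width {upper - lower}, density {density}, area {(upper - lower) * density}")
```

Because each bar's area equals its frequency, the total area of the histogram equals the sample size of 39.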
Example 1.3.3
The grouped frequency distribution in Table 1.21 summarises the masses in grams (g),
measured to the nearest gram, of a sample of 20 pebbles. Represent the data in a histogram.
Mass (g)    101–110   111–120   121–130   131–140   141–150   over 150
Frequency   1         4         2         7         2         4
The problem with this frequency distribution is that the last class is open-ended, so you
cannot deduce the correct class boundaries unless you know the individual data values. In
this case the individual values are not given. A reasonable procedure for this type of
situation is to take the width of the last interval to be twice that of the previous one.
Table 1.22 and the histogram in Fig. 1.23 are constructed using this assumption.
Mass, m (g)   Class boundaries      Class width   Frequency   Frequency density
101–110       100.5 ≤ m < 110.5     10            1           0.1
111–120       110.5 ≤ m < 120.5     10            4           0.4
121–130       120.5 ≤ m < 130.5     10            2           0.2
131–140       130.5 ≤ m < 140.5     10            7           0.7
141–150       140.5 ≤ m < 150.5     10            2           0.2
over 150      150.5 ≤ m < 170.5     20            4           0.2
Table 1.22. Calculation of frequency density for the data in Table 1.21.
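The treatment of the open-ended class can be written as a one-line rule, sketched below in Python. The function name `close_open_class` and the `factor` parameter are hypothetical; the default of 2 simply encodes the 'twice the previous width' assumption used in this example, which is a convention rather than a fact about the data.

```python
def close_open_class(lower_boundary, previous_width, factor=2):
    """Give an open-ended class an assumed width of factor x the previous width."""
    upper_boundary = lower_boundary + factor * previous_width
    return lower_boundary, upper_boundary

# 'over 150' for masses recorded to the nearest gram starts at 150.5.
lower, upper = close_open_class(150.5, previous_width=10)
density = 4 / (upper - lower)   # 4 pebbles fall in the last class
print(lower, upper, density)
```

A different `factor` would change only the height and width of the last bar; its area would still represent 4 pebbles.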
All the previous examples of histograms have involved continuous data. You can also
represent grouped discrete data in a histogram. Table 1.24 gives the class boundaries for the
data in Table 1.12, which were the test scores of 40 students. Recall that the convention used is
to take the class with limits 10 and 19 as having class boundaries 9.5 and 19.5. Using this
convention you can find the frequency densities.
Score    Class boundaries   Class width   Frequency   Frequency density
0–9      −0.5–9.5           10            14          1.4
10–19    9.5–19.5           10            9           0.9
20–29    19.5–29.5          10            9           0.9
30–39    29.5–39.5          10            3           0.3
40–59    39.5–59.5          20            5           0.25
[Figure: horizontal axis Height, h (cm); vertical axis Frequency.]
Usually you would not display both the histogram and the frequency polygon on the same
diagram. A frequency polygon can be drawn, without drawing the histogram rst, by plotting
the frequency density for each class at the mid-point of that class.
Example 1.3.4
Plot a frequency polygon for the data in Table 1.18.
The table below reproduces Table 1.19 with an extra column added.
Height, h (inches)   Class boundaries    Class width   Frequency   Frequency density   Mid-point
62–63                61.5 ≤ h < 63.5     2             4           2                   62.5
64–65                63.5 ≤ h < 65.5     2             5           2.5                 64.5
66–67                65.5 ≤ h < 67.5     2             8           4                   66.5
68–71                67.5 ≤ h < 71.5     4             13          3.25                69.5
72–75                71.5 ≤ h < 75.5     4             5           1.25                73.5
76–79                75.5 ≤ h < 79.5     4             4           1                   77.5
Table 1.27. Calculation of frequency density for the data in Table 1.18.
[Figure: frequency polygon for the data in Table 1.18; horizontal axis Height, h (inches).]
Frequency polygons are useful when comparing two or more data sets because they can be
superimposed in the same diagram.
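The points of a frequency polygon can be computed directly from the class boundaries, as in the Python sketch below. It assumes the same boundaries as Table 1.19 (heights to the nearest inch); the function name `polygon_points` is hypothetical, and the plotting itself is left to whichever tool is to hand.

```python
# (lower boundary, upper boundary, frequency) for each class in Table 1.18.
classes = [
    (61.5, 63.5, 4),
    (63.5, 65.5, 5),
    (65.5, 67.5, 8),
    (67.5, 71.5, 13),
    (71.5, 75.5, 5),
    (75.5, 79.5, 4),
]

def polygon_points(classes):
    """Return (mid-point, frequency density) for each class."""
    return [((lower + upper) / 2, freq / (upper - lower))
            for lower, upper, freq in classes]

for mid_point, density in polygon_points(classes):
    print(mid_point, density)
```

Joining consecutive points with straight lines gives the polygon; two data sets can be compared by computing a list of points for each and drawing both on the same axes.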
Exercise 1B
In Question 1 on the next page, the upper class boundary of one class is identical to the lower
class boundary of the next class. If you were measuring speeds to the nearest m.p.h., then you
might record a result of 40 m.p.h., and you would not know which class to put it in. This is
not a problem in Question 1. You may come across data like these in examination questions,
but it will be made clear what to do with them.
1 The speeds, in miles per hour, of 200 vehicles travelling on a motorway were measured by a
radar device. The results are summarised in the following table.
Speed       30–40   40–50   50–60   60–70   70–80   over 80
Frequency   12      32      56      72      20      8
5–9   10–14   15–19   20–24   25–29   30–34   35–44
2     5       8       14      17      11      3

0–2   3–5   6–8   9–11   12–15
17    6     4     2      1
0–9   10–19   20–29   30–34   35–39   40–49   50–59
6     21      51      36      48      82      31
Taking the class boundaries for 0–9 as −0.5 and 9.5, represent the data in a histogram.
6 The haemoglobin levels in the blood of 45 hospital patients were measured. The results,
correct to 1 decimal place, and ordered for convenience, are as follows.
9.1 10.1 10.7 10.7 10.9 11.3 11.3 11.4 11.4 11.4 11.6 11.8 12.0 12.1 12.3
12.4 12.7 12.9 13.1 13.2 13.4 13.5 13.5 13.6 13.7 13.8 13.8 14.0 14.2 14.2
14.2 14.6 14.6 14.8 14.8 15.0 15.0 15.0 15.1 15.4 15.6 15.7 16.2 16.3 16.9
(a) Form a grouped frequency table with 8 classes.
(b) Draw a frequency polygon of the data.
7 Each of the 34 children in a Year 3 class was given a task to perform. The times taken in
minutes, correct to the nearest quarter of a minute, were as follows.
4 3¾ 6¼ 5¼ 7½ 8¾ 9½ 4½ 6½ 4¼ 7¼ 6¾ 5¾
4¾ 8¼ 3½ 5½ 7¾ 8½ 6½ 7¼ 6¾ 7¾ 5¾ 7¾ 6½
(a) Form a grouped frequency table with 6 equal classes beginning with 3–3¾.
(b) What are the boundaries of the first class?
(c) Draw a frequency polygon for the data.
8 The table shows the age distribution of the 200 members of a golf club.
Age                 16–19   20–29   30–39   40–49   50–59   over 59
Number of members   12      40      44      47      32      25
(a) Form a table showing the class boundaries and frequency densities.
(b) Draw a histogram of the data.
9* A histogram is drawn to represent a set of data.
(a) The first two classes have boundaries 2.0 and 2.2, and 2.2 and 2.5, with frequencies
5 and 12. The height of the first bar drawn is 2.5 cm. What is the height of the second
bar?
(b) The class boundaries of the third bar are 2.5 and 2.7. What is the corresponding
frequency if the bar drawn has height 3.5 cm?
(c) The fourth bar has a height of 3 cm and the corresponding frequency is 9. The lower
class boundary for this bar is 2.7 cm. Find the upper class boundary.
1.4
Height (inches)   62–63   64–65   66–67   68–71   72–75   76–79
Frequency         4       5       8       13      5       4
The points (61.5, 0), (63.5, 4), . . . , (79.5, 39) are then plotted. The points are joined with
straight lines, as shown in Fig. 1.31.
You may also see cumulative frequency diagrams in which the points have been joined with a
smooth curve. If you join the points with a straight line then you are making the assumption
that the observations in each class are evenly spread throughout the range of values in that
class. This is usually the most sensible procedure unless you know something extra about the
distribution of the data which would suggest that a curve was more appropriate. You should
not be surprised, however, if you encounter cumulative frequency graphs in which the points
are joined by a curve.
You can also use the graph to read off other information. For example, you can estimate the
proportion of the sample whose heights are under 69 inches. Read off the cumulative
frequency corresponding to a height of 69 in Fig. 1.31. This is approximately 21.9. Therefore
an estimate of the proportion of the sample whose heights were under 69 inches would be
21.9/39 ≈ 0.56, or 56%.
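Reading a value off a cumulative frequency diagram whose points are joined by straight lines is the same as linear interpolation, which the Python sketch below makes explicit. It uses the class boundaries and frequencies of the height data (boundaries 61.5 to 79.5, total 39); the function name `cumulative_at` is a hypothetical choice.

```python
from bisect import bisect_right
from itertools import accumulate

boundaries = [61.5, 63.5, 65.5, 67.5, 71.5, 75.5, 79.5]
frequencies = [4, 5, 8, 13, 5, 4]

# Cumulative frequency at each boundary, starting from 0 at the lowest one.
cumulative = [0] + list(accumulate(frequencies))

def cumulative_at(x):
    """Estimate the cumulative frequency at x by straight-line interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    i = bisect_right(boundaries, x) - 1          # class containing x
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = cumulative[i], cumulative[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

under_69 = cumulative_at(69)
print(under_69, under_69 / cumulative[-1])
```

The interpolation assumes that observations are spread evenly within each class, which is exactly the assumption behind joining the plotted points with straight lines.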
Exercise 1C
1 The cumulative frequency diagram below illustrates data relating to the height (in cm) of
400 children at a certain school.
[Figure: cumulative frequency diagram; vertical axis cumulative frequency, 0 to 400; horizontal axis height in cm, 100 to 170.]
under 16   16–39   40–64   65–79   80 and over
14.3       33.1    35.3    11.9    5.4
5 The records of the sales in a small grocery store for the 360 days that it opened during the
year 2003 are summarised in the following cumulative frequency table.
Sales, x (in £100s)   x<2   x<3   x<4   x<5   x<6   x<7   x<8   x<9
Number of days        15    42    106   178   264   334   350   360
(a) Days for which sales fall below £325 are classified as poor and those for which the sales
exceed £775 are classified as good. With the help of a cumulative frequency diagram
estimate the number of poor days and the number of good days in 2003.
(b) Construct the corresponding frequency table.
6 A company has 132 employees who work in its city branch. The distances, x miles, that
employees travel to work are summarised in the following grouped frequency table.
x           <5   5–9   10–14   15–19   20–24   >24
Frequency   12   29    63      13      12      3
Draw a cumulative frequency diagram and use it to find the number of miles below which
(a) one-quarter
(b) three-quarters
<7.00   7.00–7.05   7.05–7.10   7.10–7.15   7.15–7.20   >7.20
10      63          77          65          30          5
Given that 10% of the components are scrapped because they are too short and 8% are
scrapped because they are too long, use a cumulative frequency diagram to estimate limits
for the length of an acceptable component.
8 As part of a health study the blood glucose levels of 150 students were measured. The
results, in mmol l⁻¹ correct to 1 decimal place, are summarised in the following table.
Glucose level   <3.0   3.0–3.9   4.0–4.9   5.0–5.9   6.0–6.9   ≥7.0
Frequency       7      55        72        10        4         2
Draw a cumulative frequency diagram and use it to find the percentage of students with
blood glucose level greater than 5.2.
The number of students with blood glucose level greater than 5.2 is equal to the number
with blood glucose level less than a. Find a.
1.5 Practical activities
1 One-sidedness Investigate whether reaction times are different when you use only
information from one side of your body.
(a) Choose a subject and instruct them to close their left eye. Against a wall hold a ruler
pointing vertically downwards with the 0 cm mark at the bottom and ask the subject to
place the index finger of their right hand aligned with this 0 cm mark. Explain that you
will let go of the ruler without warning, and that the subject must try to pin it against
the wall using the index finger of their right hand. Measure the distance dropped.
(b) Repeat this for, say, 30 subjects.
(c) Take a further 30 subjects and carry out the experiment again for each of these subjects,
but for this second set of 30 make them close their right eye and use their left hand.
(d) Draw a stem-and-leaf diagram for both sets of data and compare the distributions.
(e) Draw two histograms and use these to compare the distributions.
(f) Do subjects seem to react more quickly using their right side than they do using
their left side? Are subjects more erratic when using their left side? How does the fact
that some people are naturally left-handed affect the results? Would it be more
appropriate to investigate dominant side versus non-dominant side rather than left
versus right?
2 High jump
(a) Pick a subject and ask them to stand against a wall and stretch their arm as far up the
wall as possible. Make a mark at this point. Then ask the subject to jump as high as they
can and make a second mark at this highest point. Measure the distance between the
two marks. This is a measure of how high they jumped.
(b) Take two samples, one of Year 11 students and another of Year 7 students, and plot a
histogram of the results for each group.
(c) Do Year 11 students jump higher than Year 7 students?
3 Darts
(a) Throw four darts at a dart-board, aiming for the treble twenty, and record the total
score. Get a sample of students to repeat this. Plot the results on a stem-and-leaf
diagram.
(b) Take a second sample of students and ask each of them to throw four darts at the
dart-board, but this time tell each student to aim for the bull. Plot these results on a
second stem-and-leaf diagram which is back-to-back with the first, for easy comparison.
Is the strategy of aiming for the treble twenty more successful than that of aiming for
the bull? Does one of the strategies result in a more variable total score?
Miscellaneous exercise 1
1 The following gives the scores of a cricketer in 40 consecutive innings.
6 72 14 21 18 85 75 11 27 25
86 36 19 84 37 11 57 43 20 29
12 31 42 34 28 63 8 55 38 0
42 62 45 26 0 16 66 17 33 82
Illustrate the data on a stem-and-leaf diagram. State an advantage that the diagram has over
the data. What information is given by the data that does not appear in the diagram?
2 The service time, t seconds, was recorded for 120 customers at a supermarket till. The results
are summarised in the following grouped frequency table.
t           <30   30–60   60–120   120–180   180–240   240–300   300–360   >360
Frequency   2     3       8        16        42        25        18        6
Draw a histogram of the data. Estimate the greatest service time that is exceeded by 30
customers.
3 At the start of a new school year, the heights of the 100 new pupils entering the school are
measured. The results are summarised in the following table. The 10 pupils in the class 110–
have heights not less than 110 cm but less than 120 cm.
Height (cm)        100–   110–   120–   130–   140–   150–   160–
Number of pupils   2      10     22     29     22     12     3
Use a cumulative frequency diagram to estimate the height of the tallest pupil of the 18
shortest pupils.
4 The following ordered set of numbers represents the salinity of 30 specimens of water taken
from a stretch of the Irish Sea, near the mouth of a river.
4.2 4.5 5.8 6.3 7.2 7.9 8.2 8.5 9.3 9.7
10.2 10.3 10.4 10.7 11.1 11.6 11.6 11.7 11.8 11.8
11.9 12.4 12.4 12.5 12.6 12.9 12.9 13.1 13.5 14.3
(a) Form a grouped frequency table for the data using 6 equal classes.
(b) Draw a cumulative frequency diagram. Estimate the 12th highest salinity level. Calculate
the percentage error in this estimate.
5 The following are ignition times in seconds, correct to the nearest 0.1 s, of samples of
80 flammable materials. They are arranged in numerical order by rows.
1.2 1.4 1.4 1.5 1.5 1.6 1.7 1.8 1.8 1.9 2.1 2.2
2.3 2.5 2.5 2.5 2.5 2.6 2.7 2.8 3.1 3.2 3.5 3.6
3.7 3.8 3.8 3.9 3.9 4.0 4.1 4.2 4.3 4.5 4.5 4.6
4.7 4.7 4.8 4.9 5.1 5.1 5.1 5.2 5.2 5.3 5.4 5.5
5.6 5.8 5.9 5.9 6.0 6.3 6.4 6.4 6.4 6.4 6.7 6.8
6.8 6.9 7.3 7.4 7.4 7.6 7.9 8.0 8.6 8.8 8.8 9.2
9.4 9.6 9.7 9.8 10.6 11.2 11.8 12.8
Group the data into 8 equal classes, starting with 1.0–2.4 and 2.5–3.9, and form a grouped
frequency table. Draw a histogram. State what it indicates about the ignition times.
6 A company employs 2410 people whose annual salaries are summarised as follows.
Salary (in £1000s)   <5   5–10   10–15   15–20   20–25   25–30   30–40   40–50   >50
Number of staff      16   31     502     642     875     283     45      12      4
19 69 37 23 21 32 31 9 4 1 95 18 17 25
10 39 10 51 24 44 0 22 7 10 61 28 89 33
3 29 1 9 31 73 2 11 15 14 50 51 39 23
33 8 16 9 22 36 27 2 0 44 48 0 75 10
(a) Put the data into a grouped frequency table with classes 0–9, 10–19, . . . , 70–99.
(b) Draw a histogram of the data.
(c) Draw a cumulative frequency diagram and use it to estimate the number of leaves with
fewer than 34 galls.
(d) State an assumption required for your estimate in part (c), and briefly discuss its
justification in this case.
8 The traffic noise levels on two city streets were measured one weekday, between 5.30 a.m.
and 8.30 p.m. There were 92 measurements on each street, made at equal time intervals,
and the results are summarised in the following grouped frequency table.
Noise level (dB)     <65   65–67   67–69   69–71   71–73   73–75   75–77   77–79   >79
Street 1 frequency   4     11      18      23      16      9       5       4       2
Street 2 frequency   2     3       7       12      27      16      10      8       7
(a) On the same axes, draw cumulative frequency diagrams for the two streets.
(b) Use them to estimate the highest noise levels exceeded on 50 occasions in each street.
(c) Write a brief comparison of the noise levels in the two streets.
9 The time intervals (in seconds) between which telephone calls are received at a solicitor's
office were monitored on a particular day. The first 51 calls after 9.00 a.m. gave the
following 50 intervals.
34 118 23 24 56 25 2 104 10 2
119 22 35 48 24 16 40 118 24 44
12 25 85 76 12 72 39 67 6 20
5 19 14 3 8 41 67 16 61 11
12 4 50 5 29 66 13 16 58 82
Illustrate the data with a stem-and-leaf diagram, and with a histogram, using 6 equal
classes.
10 Construct a grouped frequency table for the following data.
19.12 20.02 21.04 21.13 21.43
12.76 17.71 20.34 20.57 20.40
20.69 20.52 16.97 21.38 15.61
17.30 14.82 20.27 19.41 19.61
20.21 21.25 19.35 16.53 19.72