9 Dealing with data
Syllabus strand
Chance and data CD
Ex 9A Collecting data: CD 6.2
Ex 9B Presenting categorical and discrete data: CD 5.2
Ex 9C Representing data grouped into class intervals: CD 5.2
Ex 9D Measures of central tendency: CD 5.2
Ex 9E Measures of spread: CD 6.2, CD DB 6.2
Ex 9F Bivariate data: CD 6.2
Ex 9G Lines of best fit: CD 6.2

Age (years) 8 9 10 11 12 13 14 15 16 17 18
Best jump (metres) 4.31 4.85 5.29 5.74 6.05 6.21 - 6.88 7.24 7.35 7.57
Laura is training for the long jump event and has hopes of
making the Australian team at the next Olympic Games.
Laura has been competing since she was 8 years old, and
each year she has kept records of her best jump. When
Laura was 14, she did not compete and missed the
season’s competition.
The qualifying mark for the Australian Olympic team is
8.1 metres and the next Olympic Games will be held when
Laura is 20. Can you predict whether or not Laura will
qualify for the team?
MQ QLD 3 - Chapter 09 Page 376 Wednesday, May 26, 2004 11:17 AM
Investigation: Problems collating data
Suppose that you want to conduct a survey on Internet usage. Survey a group of people, giving them the questions set out below. After questioning the participants, collate your results.
1 Have you ever used the Internet?
2 Where do you access the Internet?
3 How often do you use the Internet?
4 What type of computer do you have?
5 For what purpose do you use the Internet?
6 What would you rather do: go to see a movie, or go on the Net to chat (that is,
use a chat room)?
7 What time of day do you use the Internet?
8 What do you like about the Internet?
Having completed your survey:
a Discuss as a class the problems that you may have had in collating the answers
to some questions.
b Identify which questions you consider suitable for a questionnaire.
c Redesign those questions that were difficult to collate and make them more
suitable.
Sampling
If the collection of data is to be done through questioning, the most important step after
preparing a questionnaire is to decide who to ask. For the most accurate results a
census is required; that is, the entire population must be questioned. However, this is
usually practical for only small populations. For large populations statisticians usually
opt for a sample; that is, a group of people whose opinions will, hopefully, reflect the
opinions of the whole population.
It is important to decide how many people to include in a sample. As a general guide, if the size of the population is N, a sample size should be about √N. For example, if the population is 100 people, a good sample size would be √100 = 10 people.
To estimate sample size, n, use the rule n ≈ √N where N is the size of the population.
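The square-root rule above can be checked with a short script. This is only a sketch: the function name estimate_sample_size is hypothetical, and rounding to the nearest whole number is assumed because a sample must contain a whole number of people.

```python
import math


def estimate_sample_size(population_size):
    """Estimate a sample size as the square root of the population size,
    rounded to the nearest whole number."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return round(math.sqrt(population_size))


print(estimate_sample_size(100))  # 10
print(estimate_sample_size(250))  # 16
```

The rule is a general guide, so the result is best read as a starting point rather than a strict requirement.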
There are numerous sampling techniques, but in this section we will discuss simple
random sampling and stratified random sampling. These particular techniques involve
using some form of random device for selecting people from the target population.
Such devices include numbered pieces of paper mixed in a hat, numbered balls in an
urn, tables of random digits and random number generators on calculators and computers.
WORKED Example 1
A city council representative wishes to survey the parents of children attending any one of
the council’s 5 kindergartens. According to council’s records, the total enrolment in the
kindergartens is 250.
a Determine how many parents will be surveyed.
b Describe a procedure for obtaining a simple random sample.
a 1 THINK: Write down the rule to estimate a sample size.
    WRITE: n = √N, where N = 250
  2 THINK: Substitute known values into the rule and calculate.
    WRITE: n = √250 = 15.811 388 3
  3 THINK: Round your answer correct to the nearest whole number.
    WRITE: n ≈ 16
  4 THINK: Answer the question.
    WRITE: Sixteen parents will need to be surveyed.
b THINK: Describe a sampling procedure where the participants are chosen randomly.
  WRITE: Assign each child in the kindergarten a number from 1 to 250. Number 250 pieces of paper and place them into a container. Select 16 pieces of paper from the container and match these to the children. The parents of these children will participate in the survey.
You can generate a set of random numbers on a scientific calculator, graphics calculator or a spreadsheet. Consider the case in the previous worked example.
Scientific calculator
By pressing the RANDOM (or Ran#) key on a scientific calculator, a random decimal
number between 0 and 1 will be generated. This number must then be multiplied by the
value representing the size of the population, N. The result is then rounded up to the
nearest whole number.
Note: By always rounding up we ensure that 0 is never the result and this method
also ensures that the last number assigned to the population has an equal chance of
being generated.
Suppose that the random number generated is 0.217. Multiply this by 250 to obtain
the result 54.25, which is then rounded up to 55. This procedure will need to be
repeated a further 15 times.
Note: Repeated values are discarded, as parents cannot be surveyed twice. If this
occurs, an extra random number must be generated.
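The calculator procedure above (multiply the random decimal by N, round up, discard repeats) can be imitated in Python. This is a minimal sketch, not part of the original text: the function names are hypothetical, and Python's random module stands in for the calculator's RANDOM key.

```python
import math
import random


def to_participant_number(random_decimal, population_size):
    """Scale a decimal in [0, 1) up to the population size and round up."""
    return math.ceil(random_decimal * population_size)


def draw_simple_random_sample(population_size, sample_size, rng=None):
    """Return sample_size distinct participant numbers from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        number = to_participant_number(rng.random(), population_size)
        # A decimal of exactly 0 would give 0, which is not a valid participant.
        # Repeated values are discarded and another number is generated.
        if number == 0 or number in chosen:
            continue
        chosen.append(number)
    return chosen


print(to_participant_number(0.217, 250))   # 55, as in the text
print(draw_simple_random_sample(250, 16))  # 16 distinct numbers from 1 to 250
```

Passing a seeded generator, for example random.Random(1), makes the selection repeatable, which can be handy when checking work.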
Some calculators will generate random integers between set limits. If your calculator
has a RANDI function this can be done. Enter a lower limit of 1 and an upper limit of
250, then close the brackets and press ENTER . This will need to be repeated a further
15 times.
Graphics calculator
A graphics calculator can be used to produce a list of 16 random integers between 1
and 250 by following these steps.
Casio:
1. Enter the RUN mode from the MAIN MENU.
2. Press OPTN F6 ( ) F4 (NUM) F2 (Int) (
then EXIT .
3. Press F3 (PROB) F4 (Ran #).
4. Complete the expression by pressing the keys
× 2 5 0 + 1 ) EXE .
5. Each time the EXE key is pressed, a random
integer in the range 1 to 250 is produced. Pressing
this key a total of 16 times will produce 16 random
integers.
TI-83:
1. Press MATH .
2. Use the arrow keys to select PRB.
3. Select option 5: randInt(.
4. Enter the lower limit (1), the upper limit (250),
the number of random integers required (16),
then close the brackets and press ENTER .
5. The list will now be displayed across the screen.
You will need to use the right arrow key to see
the full list.
Spreadsheet
An Excel spreadsheet uses a similar method to a scientific calculator.
1. In cell A1 enter the formula =INT(RAND()*250+1). This formula generates a random
decimal number between 0 and 1, multiplies the result by 250 and adds 1. Only the
whole number part is then considered.
Note: Adding 1 and then taking only the whole number part (that is, rounding down) gives the same result as rounding the product up, as on the scientific calculator (unless the product is already a whole number).
2. Use the Fill Down function to copy this formula down to cell A16.
Stratified random sampling
The method of stratified random sampling consists of splitting the target population
into certain categories, called strata. People in each stratum (category) may be expected
to have opinions similar to each other, but different from those expressed by people in
other strata. Suppose that you wanted to obtain opinions of secondary school students.
You may divide the school into groups according to gender or individual year levels.
The size of the sample selected from each stratum is proportional to the size of this
stratum, as compared to the whole population. The sample from each stratum is
selected randomly (as discussed previously).
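The proportional allocation described above can be sketched as follows. The function name is hypothetical, and the figures are borrowed from the kindergarten example in this section (100 three-year-olds and 150 four-year-olds, with a total sample of 16). Rounded stratum sizes do not always add up to the intended total, so the result should be checked by hand.

```python
def stratum_sample_sizes(strata_sizes, total_sample):
    """Share total_sample among strata in proportion to each stratum's size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * total_sample / population)
        for name, size in strata_sizes.items()
    }


enrolments = {"3-year-olds": 100, "4-year-olds": 150}
print(stratum_sample_sizes(enrolments, 16))
# {'3-year-olds': 6, '4-year-olds': 10}
```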
WORKED Example 2
The city council representative from worked example 1 decides that parents of 3-year-old
children might have different opinions on the quality of care from parents of 4-year-olds.
The total enrolment of 3-year-olds is 100 and the total of 4-year-olds is 150. Determine
how many parents will need to be surveyed from each category.
THINK WRITE
Other random and non-random sampling methods are often used. Non-random methods
include convenience, volunteer, quota and judgement sampling. These methods, however, are statistically not as accurate as random sampling techniques, because it is easy
to introduce some kind of bias and also because they depend on the surveyor’s ability
to select an appropriate sample. Therefore, more confidence is placed in conclusions
drawn from samples obtained randomly.
remember
1. If the collection of the data does not involve responses from people, it can be
obtained by observation. It is always a good idea to prepare a table where
observations will be recorded (tallied) prior to collecting the data.
2. To collect data that require responses from people, questioning is used.
Preparation in this case involves designing a questionnaire. When preparing a
questionnaire, questions must be clear and to the point. It is always a good idea
to include the category ‘other’ to cover any responses that are not listed.
Questions requiring written explanations should be avoided.
3. An estimation of the sample size, n, is given by the square root of the population size, N: n ≈ √N
4. Random sampling uses a random device to select people or objects from the
target population.
5. Simple random sampling ensures that every person or object in the population
is equally likely to be chosen.
6. Stratified random sampling splits the population into categories, called strata.
Opinions expressed by people from the same stratum may be similar to each
other, but may differ from those expressed by people from other strata. The
sample size from each stratum is proportional to that stratum size as compared
to the population size.
9A Collecting data
1 State whether the following data can be obtained by observation or by questioning.
a The number of students attending school each day
b The shoe size and the clothes size of the students in Year 10
c The usual means of arriving to school for Year 8 and 9 students
d The number of M&Ms of each colour in a pack
e The amount of M&Ms and Minties consumed weekly by students in grade 5
f The daily total number of visits to the local medical centre
g The average number of visits to the doctor per year for people in different age
groups
h The ranking of a new movie (on a scale from ‘don’t bother’ to ‘can’t miss’)
i The number of people attending a new movie in the first week of showing
2 Design a questionnaire on a movie theme (include at least 10 questions). Test the questionnaire by asking someone from your class to fill it in. Did the test reveal any areas
5 multiple choice
To conduct a statistical investigation, Nathan
needs to obtain a simple random sample
from 400 students enrolled at his school.
a The appropriate sample size Nathan
should obtain is:
A 10 B 20 C 30 D 40
patients in the trauma and cardiology wards. At the time of the survey there are 30 cardiology patients and 50 trauma patients. If 9 patients are to be surveyed, determine how many patients should be surveyed from each category.
8 multiple choice
A stratified random sample is being selected from a population of 100 individuals, who
have been divided into three strata. If the number of people in these strata is 30, 20 and
50, then the corresponding number of people selected from each stratum would be:
A 2, 3 and 5 respectively B 6, 4 and 10 respectively
C 3, 2 and 10 respectively D 3, 2 and 5 respectively.
Investigation: Non-random sampling
Research and explain each of the following non-random sampling methods.
1 Convenience sampling
2 Volunteer sampling
3 Quota sampling
4 Judgemental sampling
Types of data
All data can be divided into two major groups: categorical (or qualitative) and
numerical (or quantitative).
Categorical data are data that cannot be measured or counted, but can be categorised. Examples of categorical data include eye colour or pizza sizes available at the
local takeaway. Categorical data may be divided into two groups — nominal and
ordinal. Nominal data divide a particular piece of information into subgroups, for
example eye colour (hazel, blue, green and so on). Ordinal data deal with a ranking
system, for example pizza sizes (family, large, medium, small).
Numerical data are data that can be measured or counted. Examples of numerical
data include students’ heights and the number of defective items in a batch of identical
items.
Numerical data in turn can be subdivided into two groups — discrete and continuous. Discrete data can assume only specific values and are usually associated with
counting, such as the number of defective items in a batch. Continuous data can take
any value within a certain range and are usually associated with measuring, such as the
height of students.
In this section we will consider the representation of categorical and numerical
discrete types of data.
Coco Pops 6
Weet-Bix 3
Rice Bubbles 5
Total 30
THINK
1 Rule up a set of axes on graph paper. Title the graph and label the horizontal axis ‘Type of cereal’ and the vertical axis ‘Frequency’.
2 Scale the horizontal axis and vertical axis.
3 Draw the first column (rectangle) so that it reaches a vertical height of 7 units. Label the section of the axis below the column as Kellogg’s Special K.
4 Leave a gap between the first and second columns (rectangles).
5 Repeat steps 3 and 4 for each of the remaining cereals.
DRAW
[Column graph: ‘Cereal preferences for breakfast’. Horizontal axis: Type of cereal (Kellogg’s Special K, Cornflakes, Coco Pops, Weet-Bix, Rice Bubbles, Kellogg’s Just Right). Vertical axis: Frequency, scaled from 0 to 7.]
Graphical representation of the data allows us to see the ‘whole picture at a glance’.
Many questions about the data can be answered by simply looking at the graph. You
will have done much of this in earlier years.
[Graphs: sales ($1000) of ‘His’ fashion and ‘Her’ fashion at the beginning of each quarter (Jan, Apr, July, Oct).]
Pie graphs
Pie graphs, also known as sector graphs, pie charts and circular graphs, are mostly used
to represent categorical data. The size of the sector is proportional to the size of that
category, as compared to the total.
1 THINK: Express each percentage as a common fraction over 100 and convert the fraction to an angle by multiplying by 360°.
  WRITE:
  Mortgage: 30/100 × 360° = 108°
  Food: 20/100 × 360° = 72°
  Childcare: 20/100 × 360° = 72°
  Bills: 10/100 × 360° = 36°
  Transport: 10/100 × 360° = 36°
  Entertainment: 5/100 × 360° = 18°
  Other: 5/100 × 360° = 18°
2 THINK: Check that the total of all angles is 360°.
  WRITE: Total of all angles is 360°.
3 THINK: Use a pair of compasses to draw a circle and mark the centre.
4 THINK: Measure sectors corresponding to each angle to complete the pie graph.
  DRAW: [Pie graph: ‘Monthly expenses’ with sectors Mortgage 30%, Food 20%, Childcare 20%, Bills 10%, Transport 10%, Entertainment 5%, Other 5%.]
Note: Sometimes the total sum of the sectors won’t add up to 360° exactly, but would
produce a total somewhere around 360° (say between 359° and 361°). This is due to
rounding. When constructing a pie graph by hand, we can ignore this. All we have to
do is measure out all but the last angle and let the last angle ‘take in’ any minor error
that occurred due to rounding.
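The conversion from percentages to sector angles, and the idea of letting the last sector take in any rounding error, can be sketched in Python. The function names are hypothetical, and the percentages are the monthly expenses from the pie graph example in this section.

```python
def sector_angles(percentages):
    """Exact sector angle in degrees for each category given as a percentage."""
    return {label: pct * 360 / 100 for label, pct in percentages.items()}


def hand_drawn_angles(percentages):
    """Round each angle to a whole degree and let the last sector
    take in whatever is left of the 360 degrees."""
    labels = list(percentages)
    angles = {label: round(percentages[label] * 360 / 100) for label in labels[:-1]}
    angles[labels[-1]] = 360 - sum(angles.values())
    return angles


expenses = {
    "Mortgage": 30, "Food": 20, "Childcare": 20, "Bills": 10,
    "Transport": 10, "Entertainment": 5, "Other": 5,
}
print(sector_angles(expenses)["Mortgage"])   # 108.0
print(sum(sector_angles(expenses).values())) # 360.0
print(hand_drawn_angles(expenses)["Other"])  # 18
```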
Pictographs
Pictographs, also referred to as picture graphs or pictograms, allow data to be displayed
in a novel way using illustrations or symbols. A key or legend is always used to show
the number of items each symbol represents. Although pictographs can be interesting
and do make an impression, the rounding of data to suit key pictures causes loss of
detail and accuracy. Half pictures (or scaled down versions) can be used to represent
half of the data. However, smaller fractions such as one-third or one-quarter may be
quite difficult to illustrate using pictographs.
WORKED Example 5
The table below shows the number of people who attended a local primary school (during the first 6 hours of voting time) to cast their vote in the last state election. Represent the given data as a pictograph.
Time interval Number of voters
8 am–9 am 60
9 am–10 am 85
10 am–11 am 100
11 am–12 noon 125
12 noon–1 pm 115
1 pm–2 pm 95
THINK
1 Rule and label a vertical axis. Title the graph.
2 Scale the vertical axis.
3 Include a key showing a symbol to represent the number of voters.
4 Place the appropriate number of symbols in their respective row next to the relevant time interval. For example, 8 am – 9 am → 6 symbols (6 × 10 = 60 voters).
DRAW
[Pictograph: ‘Number of voters who attended during the first six hours of voting’. Vertical axis: Time, with one row for each interval from 8 am – 9 am to 1 pm – 2 pm. Key: one symbol = 10 voters.]
Dot-plots
Dot-plots are similar to pictographs. Each observation is represented by a single dot. A good feature of the dot-plot is that it can be constructed while in the process of collecting the data. A horizontal axis is prepared by writing in possible values of observations or categories, and then the collection of data begins. Each time a certain value is observed, a dot is placed in the corresponding column. Provided that the dots are placed neatly in columns and are evenly spaced, by the end of the experiment the data collected are also represented (displayed) graphically; that is, two steps are accomplished in one action.
WORKED Example 6
While waiting for her mum to pick her up from school, Anna watched the cars that were
passing by. Within 7 minutes Anna observed 4 sedans, 6 station wagons, 5 four-wheel
drives, 3 hatchbacks and 1 sports car. Represent these data using a dot-plot.
THINK
1 Draw an evenly scaled horizontal axis and label it.
2 Write down the different types of cars observed underneath the horizontal axis.
3 Systematically work through the given data and place a dot above the appropriate type of vehicle for each value recorded.
4 Title the dot-plot.
DRAW
[Dot-plot: ‘Cars observed by Anna’. Horizontal axis: Cars (Sedan, Station wagon, 4-wheel drive, Hatchback, Sports car).]
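A rough text version of Anna's dot-plot can be produced with a few lines of Python. This is only a sketch: the function name is hypothetical, and the dots are printed in rows beside each label rather than in columns above a horizontal axis, because rows are simpler to print.

```python
def text_dot_plot(counts, dot="*"):
    """Return one row per category with one dot for each observation."""
    width = max(len(label) for label in counts)
    rows = []
    for label, count in counts.items():
        rows.append(f"{label:<{width}} | {' '.join(dot * count)}")
    return "\n".join(rows)


cars = {
    "Sedan": 4,
    "Station wagon": 6,
    "4-wheel drive": 5,
    "Hatchback": 3,
    "Sports car": 1,
}
print("Cars observed by Anna")
print(text_dot_plot(cars))
```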
Stem-and-leaf plots
A stem-and-leaf plot, or stem plot, can be used if the data are initially recorded as a
string (or list) of numbers. Although stem-and-leaf plots are usually used to represent
discrete numerical data, they can also be used to represent continuous data if the data
are rounded off first. For example, if the distances between cities are rounded off to,
say, the nearest kilometre, they can then be displayed on a stem-and-leaf plot.
Data in stem-and-leaf plots are made up of two components: a stem and a leaf. The final digit of a particular number is the leaf; the previous digit(s) form the stem.
Stem-and-leaf plots have leaves arranged in order of size, increasing away from
the stem. The final digit of a particular number is the leaf while the previous
digit(s) form the stem.
WORKED Example 7
The heights of 30 students (to the nearest cm) were measured and recorded as follows:
125, 143, 119, 136, 127, 131, 139, 122, 140, 118,
120, 123, 132, 134, 127, 129, 124, 131, 138, 133,
122, 128, 130, 135, 141, 139, 121, 138, 131, 126
Represent the data on a stem-and-leaf plot.
1 THINK: Rule up two columns headed ‘stem’ and ‘leaf’.
2 THINK: Make note of the smallest and largest values of the data (118 and 143). List the stems in ascending order in the first column.
  Note: The hundreds and tens component of the number represents the stem.
3 THINK: Systematically work through the given data and enter the leaf (unit component) of each value in a row beside the appropriate stem.
4 THINK: Include a key next to the plot which informs the reader of the significance of each entry.
  WRITE:
  Key 11 | 8 = 118 cm
  Stem | Leaf
  11 | 9 8
  12 | 5 7 2 0 3 7 9 4 2 8 1 6
  13 | 6 1 9 2 4 1 8 3 0 5 9 8 1
  14 | 3 0 1
5 THINK: Redraw the stem-and-leaf plot so that the numbers in each row of the leaf column are in ascending order. This is called an ordered stem-and-leaf plot.
  WRITE:
  Key 11 | 8 = 118 cm
  Stem | Leaf
  11 | 8 9
  12 | 0 1 2 2 3 4 5 6 7 7 8 9
  13 | 0 1 1 1 2 3 4 5 6 8 8 9 9
  14 | 0 1 3
Note: In worked example 7, the middle rows of leaves were too long. This can be overcome by breaking stems into smaller intervals, say intervals of 5. The stem of 12 would include all numbers from 120 to 124 inclusive and the stem of 12* would include all numbers from 125 to 129 inclusive. In comparison to the stem-and-leaf plot in worked example 7, the new plot would have extra rows and not look so bunched up.
Plot from worked example 7:
Key 11 | 8 = 118 cm
Stem | Leaf
11 | 8 9
12 | 0 1 2 2 3 4 5 6 7 7 8 9
13 | 0 1 1 1 2 3 4 5 6 8 8 9 9
14 | 0 1 3
Plot with stems broken into intervals of 5:
Key 11* | 8 = 118 cm
Stem | Leaf
11* | 8 9
12 | 0 1 2 2 3 4
12* | 5 6 7 7 8 9
13 | 0 1 1 1 2 3 4
13* | 5 6 8 8 9 9
14 | 0 1 3
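Both versions of the plot can be generated from the raw heights in worked example 7. The sketch below uses hypothetical function names and assumes whole-number data, so that the last digit is the leaf and the remaining digits form the stem.

```python
def stem_and_leaf(values):
    """Ordered stem-and-leaf plot: stem -> list of leaves in ascending order."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


def split_stem_and_leaf(values):
    """Ordered plot with each stem broken into intervals of 5.
    Leaves 0 to 4 stay on the plain stem; leaves 5 to 9 go on the starred stem."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        key = f"{stem}*" if leaf >= 5 else str(stem)
        plot.setdefault(key, []).append(leaf)
    return plot


heights = [
    125, 143, 119, 136, 127, 131, 139, 122, 140, 118,
    120, 123, 132, 134, 127, 129, 124, 131, 138, 133,
    122, 128, 130, 135, 141, 139, 121, 138, 131, 126,
]

for stem, leaves in split_stem_and_leaf(heights).items():
    print(f"{stem:<3} | {' '.join(str(leaf) for leaf in leaves)}")
```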
remember
1. Data may be classified under the following headings:
Categorical: data are placed in categories (non-numerical form).
  Nominal: data are placed in subgroups.
  Ordinal: categories are in a ranked order.
3 For each categorical piece of data in question 1 state whether it is nominal or ordinal.
4 (WORKED Example 3) Thirty-five people were asked to name their favourite movie of all time. The results were
recorded in the table below. Construct a column graph to represent this information.
Movie Frequency
Pearl Harbor 2
Titanic 4
Crocodile Dundee 12
Batman 10
The Mask 7
Total 35
in a certain city.
[Bar graph: vertical axis Months, horizontal axis Temperature °C, scaled from 2 to 30.]
a What was the lowest average temperature recorded?
b In what month did the average temperature reach its maximum?
c In which months was the average temperature the same?
d What was the difference in average temperatures between December and June?
e In which country do you think the temperatures shown on the graph could have been recorded?
6 multiple choice
The graph shows the enrolment of students in a particular school in a number of Senior subjects.
[Graph: Subject (Economics, Chemistry, Maths A, Maths B, Physics) against Number of students enrolled, with Boys and Girls shown separately.]
a The type of graph being displayed is a:
A bar graph
B column graph
C compound bar graph
D multiple bar graph
b The subject that has the largest enrolment of boys is:
A Economics B Chemistry C Maths A D Maths B
c The subject that has the largest enrolment of students is:
A Chemistry B Maths A C Maths B D Physics
d The subject that has the least enrolment of girls is:
A Economics B Maths A C Maths B D Physics
e Which of the following statements is not true?
A The number of girls studying Economics is about the same as those studying
Chemistry.
B Physics has the smallest number of students enrolled.
C The number of boys and girls studying Maths B is about the same.
D There are as many boys studying Physics as there are girls studying Chemistry.
7 The table below shows the production of three different models of bicycles (in
thousands) in four consecutive years. (Note: The production of model A was stopped
in 1998.) Construct a multiple column graph to represent these data.
Year Model A Model B Model C
1996 12 16 20
1997 8 20 23
1998 — 26 26
1999 — 28 30
8 (WORKED Example 4) The data in the table below show the distribution of the budget allocated to promotional advertising in a big cosmetic company (all numbers are expressed as a
TV commercials 45
Radio commercials 10
Total 100
9 During the day Maya recorded the amount of time spent on various activities, as shown below.
Activity Time spent (h)
Sleep 7.5
Work 8
Travel 1.5
Cooking 1
Eating 1
Housework 1.5
Watching TV 2
Shopping 1.5
a Represent these data on a pie graph.
b What percentage of Maya’s day is spent doing various types of work (that is, at her workplace, cooking, housework and shopping)?
c What percentage of the day did Maya spend watching TV?
10 (WORKED Example 5) The number of ‘Happy meal deals’ sold at the local McDonald’s restaurant in one particular week is shown in the table below. Represent these data as a pictograph.
Monday 30
Tuesday 25
Wednesday 35
Thursday 40
Friday 65
Saturday 135
Sunday 110
(WORKED Example 6) as follows: 3 A+s, 7 As, 8 B+s, 6 Bs and 1 C+. Represent these data using a dot-plot.
13 (WORKED Example 7) The ages of the people arriving at a hospital emergency room during one morning were recorded as follows: 12, 29, 48, 62, 67, 23, 69, 21, 19, 73, 82, 17, 46, 20, 51, 64, 24, 66, 34, 35, 80, 28, 27, 61, 75, 45, 18, 26, 32, 59. Represent the data on a stem-and-leaf plot.
14 The time (to the nearest second) taken by each student in the class to run a certain
distance was recorded in the table below.
Boys 42 48 46 39 43 38 45 47 51 42 50 50 49
Girls 51 50 46 47 42 40 58 59 52 49 48 44 56
a Represent the data on a back-to-back stem-and-leaf plot with the stems 3, 4 and 5.
b Redraw your stem-and-leaf plot so that the stems are now 3*, 4, 4*, 5 and 5*.
c Which graph in your opinion gives a better ‘feel’ of the data? Explain your answer.
Maths Quest challenge
Which of the five pieces shown below cannot be found in the jigsaw
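What is holography?
[Puzzle figures: a column graph ‘Hockey games won by a club’ (vertical scale 0 to 18, horizontal axis ‘Week number’, weeks 1 to 8, with code letters 1 G, 2 H, 3 I, 4 J, 5 L, 6 M, 7 N, 8 O); a pictograph ‘Number of music CDs’ by name (Sue, Bill, Dave, Jill, Ben, Ann; key: one symbol = 2 music CDs); a frequency table for the same names with code letters P, Q, R, S, T, U.]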
11 9 8 11 8 2 9 21 12 17 16 8 1 10
14 0 1 5 16 2 12 21 15 7 11 9 0 8 8
5 12 6 8 21 13 12 1 21 7 18 14 9 1 11 1
1 10 7 21 1 4 3 8 2 11 16 13 12 21 15
18 7 13 8 0 4 8 7 6 13
Frequency tables
A frequency table shows the number of scores (frequencies) that belong to each group
or class interval.
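Tallying scores into class intervals can be sketched in Python. The function name is hypothetical; the data are the lolly counts listed in worked example 8 of this section, and starting the first class at 45 is an assumption made here so that the smallest score (46) falls in the first interval.

```python
from collections import Counter


def frequency_table(scores, width, start):
    """Group whole-number scores into classes of the given width.
    Returns a list of (lower, upper, frequency) rows, lowest class first.
    Assumes every score is at least `start`."""
    counts = Counter((score - start) // width for score in scores)
    rows = []
    for i in range(max(counts) + 1):
        lower = start + i * width
        rows.append((lower, lower + width - 1, counts.get(i, 0)))
    return rows


lollies = [
    59, 62, 51, 55, 46, 60, 58, 49, 64, 57, 53, 50, 56, 61, 54,
    53, 55, 55, 61, 58, 54, 52, 57, 58, 59, 51, 48, 60, 60, 56,
]

for lower, upper, frequency in frequency_table(lollies, width=5, start=45):
    print(f"{lower}-{upper}: {frequency}")
```

The frequencies should always add up to the number of scores (30 here), which is a quick check that no score was missed.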
WORKED Example 8
The following set of data shows the number of lollies in each of thirty 500 g bags. Place the
data into a frequency table, using class intervals of 5.
59, 62, 51, 55, 46, 60, 58, 49, 64, 57, 53, 50, 56, 61, 54,
53, 55, 55, 61, 58, 54, 52, 57, 58, 59, 51, 48, 60, 60, 56
THINK WRITE
WORKED Example 9
The data in the frequency table below show the mass (in kg) of 40 people upon joining a weight loss program. Represent the given data using a:
a histogram
b frequency polygon.
Class interval Frequency
60–<70 2
70–<80 5
80–<90 9
90–<100 12
100–<110 7
110–<120 3
120–<130 2
Total 40
THINK
a 1 Rule up a set of axes on graph paper. Title the graph. Label the horizontal axis Mass (kg) and the vertical axis Frequency.
  2 Scale the horizontal and vertical axes. Note: Leave half an interval at the beginning and end of the graph.
  3 Draw a column which represents the first class interval and reaches a vertical height of 2 people.
  4 Repeat step 3 for each of the other class intervals.
DRAW
a [Histogram: ‘Mass of people joining a weight loss program’. Horizontal axis: Mass (kg), scaled from 60 to 130. Vertical axis: Frequency, scaled from 0 to 12.]
THINK
b 1 Mark the midpoints of the tops of the rectangles obtained in the histogram from part a.
  2 Join the midpoints by straight line intervals.
  3 Close the polygon by drawing lines which meet the horizontal axis a half-column width before the first column and a half-column width after the last column.
DRAW
b [Frequency polygon drawn on the histogram: ‘Mass of people joining a weight loss program’. Horizontal axis: Mass (kg), scaled from 60 to 130. Vertical axis: Frequency, scaled from 0 to 12.]
A graphics calculator can be used for constructing histograms. However, when the data
are grouped, the midpoint of each class interval must be entered for x-values, rather
than the extreme values (that is, the beginning and end point).
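The midpoints to be entered as x-values can be worked out as the average of the two ends of each class interval. The short sketch below uses the class intervals and frequencies from worked example 9; the function name is hypothetical.

```python
def class_midpoints(intervals):
    """Midpoint of each (lower, upper) class interval."""
    return [(lower + upper) / 2 for lower, upper in intervals]


intervals = [(60, 70), (70, 80), (80, 90), (90, 100),
             (100, 110), (110, 120), (120, 130)]
frequencies = [2, 5, 9, 12, 7, 3, 2]

for midpoint, frequency in zip(class_midpoints(intervals), frequencies):
    print(midpoint, frequency)
```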
You can store the data from a frequency distribution table on your graphics calculator.
Consider the data from worked example 9.
Casio:
1. Enter the STAT mode from the MAIN MENU.
2. In List1 enter the midpoint of each class and in List2
enter the frequencies.
4. Press F4 (SEL) and set StatGraph1 as On. Press F6 (DRAW). Set the interval to
start at 65 with a pitch of 10. Press F6 (DRAW) to display the histogram.
TI-83:
1. Press STAT then choose option 1: Edit and enter the
midpoint of each class in the L1 column and the
frequencies in the L2 column.
remember
1. Class intervals are used when:
(a) data are spread over a wide range
(b) there is a large amount of data
(c) the data are continuous.
The size of a class interval should lead to the formation of 5 to 10 groups.
2. A frequency table shows the number of scores (frequencies) that belong to each
group or class interval.
3. A histogram resembles a column graph without gaps between each column. A frequency polygon is a line graph that connects the midpoints of the adjacent columns of a histogram.
9C Representing data grouped into class intervals
1 (WORKED Example 8) The following set of data shows the pulse rate of 30 gym members, 10 minutes after they have started exercising on a treadmill. Place the data into a frequency table, using
The data in the frequency table below show the number of houses of different sizes (measured in square metres) in a small block.
100–<150 13
150–<200 18
200–<250 19
250–<300 17
300–<350 14
350–<400 11
Total 92
Represent the given data using a:
a histogram
b frequency polygon.
[Histogram: horizontal axis ‘Data’, scaled from 110 to 190.]
4 The data below show the number of books borrowed from the school library by 30
Year 10 students over the last year.
20, 39, 15, 0, 2, 41, 25, 7, 19, 30, 5, 12, 26, 4, 32,
40, 10, 15, 22, 19, 33, 7, 16, 9, 34, 28, 24, 11, 8, 12
a Group the data into a frequency table in class intervals of size 10.
b Represent the grouped data on a histogram.
c Add a frequency polygon to the data.
5 For the data in question 4:
a Complete a frequency table using a class interval size of 5.
Maths Quest challenge
Here is a challenging visual puzzle.
Investigation
Collect several graphs from newspapers or magazines.
For each graph:
1 What point is the graph trying to make?
2 Has the graph been presented fairly? If not, how has the display been altered to
make a greater impression?
3 Suppose you are trying to present the opposite point of view. Draw a graph
using the same data to present the opposite point of view.
1
1 Sonya wants to find out which is the most popular video hired from the local video
rental outlet. Would this be best achieved by observation or questioning?
2 A small country town has a population of 4900. If the local council wants to conduct
a survey on the population, what would be an appropriate sample size?
3 In the survey in question 2, the participants are chosen using a random number
generator. A scientific calculator is used and the decimal number 0.516 is generated.
Assuming that the population of the town is numbered 1 to 4900 on the electoral roll,
what is the number of the chosen participant?
4 This town is a mining town, with a population of 3100 males and only 1800 females.
How many males should participate in the survey if a stratified sample is chosen?
5 Various brands of paint are tested to see the area of wall that can be painted with 1 litre
of each brand. Are the data being examined categorical or numerical? If they are
categorical, are they nominal or ordinal? If they are numerical, are they discrete or
continuous?
6 Fifty students were surveyed on their favourite subject. Twenty students selected
Mathematics, 13 selected English, 7 selected History and 10 selected Geography.
Construct a column graph to represent this information.
7 Represent the data from question 6 in a pie graph.
8 Represent the following data on students’ arm span (to the nearest centimetre) on a
back-to-back stem-and-leaf plot.
Girls 152, 148, 139, 169, 151, 143, 142, 148, 152
Boys 161, 169, 181, 191, 162, 153, 185, 161, 152
9 A group of students were tested to see how many sit-ups they could do in 60 seconds.
Construct a histogram and frequency polygon for the tabled data.
Number 1–15 16–30 31–45 46–60 61–75
Frequency 1 10 21 32 2
10 The time that students take to travel to school is summarised in the table below.
Time (mins) 0–<5 5–<10 10–<15 15–<20 20–<25 25–<30 30–<35
Frequency 2 5 10 8 6 4 1
Represent the data as a pictograph.
Ungrouped data
Mean
To obtain the mean of a set of ungrouped data, all numbers (scores) in the set are added
together and then the total is divided by the number of scores in that set.
$$\text{mean} = \frac{\text{sum of all scores}}{\text{number of scores}}$$
Symbolically this is written $\bar{x} = \frac{\Sigma x}{n}$.
Median
The median is the middle value of any set of data arranged in numerical order. In the
set of n numbers, the median is located at the $\frac{n+1}{2}$th score. The median is:
• the middle score for an odd number of scores arranged in numerical order
• the average of the two middle scores for an even number of scores arranged in
numerical order.
Mode
The mode is the score that occurs most often in a set of data. Sets of data may contain:
1. no mode; that is, each score occurs once only
2. one mode
3. more than one mode.
WORKED Example 10
For the data set 6, 2, 4, 3, 4, 5, 4, 5 find the:
a mean b median c mode.
THINK | WRITE
a 1 Calculate the sum of the scores; that is, Σx. | a Σx = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5 = 33
2 Count the number of scores; that is, n. | n = 8
3 Write the rule for the mean. | $\bar{x} = \frac{\Sigma x}{n}$
4 Substitute the known values into the rule. | $= \frac{33}{8}$
5 Evaluate. | = 4.125
6 Answer the question. | The mean is 4.125.
c 1 Systematically work through the set and make note of any repeated values (scores). | c 2 3 4 4 4 5 5 6
2 Answer the question. | The mode is 4.
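The three measures in worked example 10 can be cross-checked with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only and are not part of the text.

```python
from statistics import mean, median, multimode

# Data set from worked example 10
scores = [6, 2, 4, 3, 4, 5, 4, 5]

# Mean: sum of all scores divided by the number of scores
mean_score = mean(scores)

# Median: middle value of the ordered scores; with an even number of
# scores this is the average of the two middle scores
median_score = median(scores)

# Mode(s): multimode returns every score that occurs most often,
# so a set with more than one mode gives a longer list
modes = multimode(scores)

print(mean_score, median_score, modes)
```

Using `multimode` rather than `mode` keeps the sketch valid for sets of data with more than one mode.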
3. Press F1 (1 VAR). The arrow keys can be used to scroll down the list. The mean, $\bar{x}$,
is shown as 4.5 and the median, Med, is shown as 4.
TI-83:
1. Press STAT then select 1: Edit and enter the scores
in the L1 column.
2. Again press STAT , arrow across to choose the
CALC option and select 1: 1-Var Stats.
You will need to use the arrow keys to scroll the screen
in order to see the median.
The mean is given as $\bar{x}$ = 4.5 and the median is given as
Med = 4.
WORKED Example 11
For the table below find the:
a mean b median c mode.

Score (x) | Frequency (f)
4 | 1
5 | 2
6 | 5
7 | 4
8 | 3
Total | 15

THINK | WRITE
1 Rule up a table with four columns titled Score (x), Frequency (f), Frequency × score (f × x) and Cumulative frequency (cf). |
2 Enter the data and complete both the f × x and cumulative frequency columns. |

Score (x) | Frequency (f) | Frequency × score (f × x) | Cumulative frequency (cf)
4 | 1 | 4 | 1
5 | 2 | 10 | 3
6 | 5 | 30 | 8
7 | 4 | 28 | 12
8 | 3 | 24 | 15
n = 15, Σ(f × x) = 96
THINK | WRITE
a 1 Write the rule for the mean. | a $\bar{x} = \frac{\Sigma(f \times x)}{n}$
2 Substitute the known values into the rule and evaluate. | $\bar{x} = \frac{96}{15} = 6.4$
3 Answer the question. | The mean of the data set is 6.4.
b 1 Locate the position of the median using the rule $\frac{n+1}{2}$ where n = 15. This places the median as the 8th score. | b The median is the $\frac{15+1}{2}$th or 8th score.
2 Use the cumulative frequency column to find the 8th score and answer the question. | The median of the data set is 6.
c 1 The mode is the score with the highest frequency. | c The score with the highest frequency is 6.
2 Answer the question. | The mode of the data set is 6.
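The frequency-table method of worked example 11 can also be sketched in Python. This is a minimal illustration, assuming the table is stored as a dictionary of score to frequency; the function names are hypothetical and chosen only for this example.

```python
# Frequency table from worked example 11: score -> frequency
table = {4: 1, 5: 2, 6: 5, 7: 4, 8: 3}


def table_mean(table):
    """Mean = sum of (f * x) divided by n, where n is the sum of the frequencies."""
    n = sum(table.values())
    return sum(f * x for x, f in table.items()) / n


def score_at(table, position):
    """Return the score at a 1-based position, located via cumulative frequency."""
    cumulative = 0
    for x in sorted(table):
        cumulative += table[x]
        if position <= cumulative:
            return x
    raise ValueError("position is beyond the last score")


def table_median(table):
    """Median: the (n + 1) / 2 th score, or the average of the two middle scores."""
    n = sum(table.values())
    if n % 2 == 1:
        return score_at(table, (n + 1) // 2)
    return (score_at(table, n // 2) + score_at(table, n // 2 + 1)) / 2


def table_modes(table):
    """Every score that has the highest frequency."""
    highest = max(table.values())
    return [x for x in sorted(table) if table[x] == highest]


print(table_mean(table), table_median(table), table_modes(table))
```

The `score_at` helper mirrors the way the cumulative frequency column is read by hand: the running total is increased row by row until it reaches the required position.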
2. Press F2 (CALC) then F6 (SET). Set the 1Var Xlist as List1 and the 1Var Freq as
List2. Press EXE .
3. Press F1 (1VAR) to display the key statistics. The mean is the value given for $\bar{x}$.
TI-83:
1. Press STAT then select 1: Edit and enter the scores in the L1 column and the
frequencies in the L2 column.
2. Again press STAT , choose the CALC menu and select 1:1-Var Stats. Press 2nd
[L1] followed by a comma, then 2nd [L2] .
3. Press ENTER to display the key statistics. The mean is the value given for $\bar{x}$.
Grouped data
Mean
When the data are grouped into class intervals, the actual values (or data) are lost. In
such cases we have to approximate the real values with the midpoints of the intervals
into which these values fall. For example, when measuring heights of students in a
class, if we found that 4 students had a height between 180 and 185 cm, we have to
assume that each of those 4 students is 182.5 cm tall. The formula used for calculating
the mean is the same as for data presented in a frequency table:
$$\bar{x} = \frac{\Sigma(f \times x)}{n}$$
Here x represents the midpoint (or class centre) of each class interval, f is the
corresponding frequency and n is the total number of observations in a set.
Median
The median cannot be found exactly when the data have been grouped. Instead we can
locate the median class from the cumulative frequency.
Modal class
We do not find a mode because exact scores are lost. We can, however, find a modal
class. This is the class interval that has the highest frequency.
WORKED Example 12
For the given data:
a estimate the mean
b find the median class
c find the modal class.

Class interval | Frequency
60–<70 | 5
70–<80 | 7
80–<90 | 10
90–<100 | 12
100–<110 | 8
110–<120 | 3
Total | 45

THINK | WRITE
1 Draw up a table with 5 columns headed Class interval, Midpoint (x), Frequency (f), Frequency × midpoint (f × x) and Cumulative frequency (cf). |
2 Complete the x, f × x and cf columns. |

Class interval | Midpoint (x) | Frequency (f) | Frequency × midpoint (f × x) | Cumulative frequency (cf)
60–<70 | 65 | 5 | 325 | 5
70–<80 | 75 | 7 | 525 | 12
80–<90 | 85 | 10 | 850 | 22
90–<100 | 95 | 12 | 1140 | 34
100–<110 | 105 | 8 | 840 | 42
110–<120 | 115 | 3 | 345 | 45
Σf = 45, Σ(f × x) = 4025
THINK | WRITE
b 1 Median = $\frac{n+1}{2}$th score. Use this to locate the median score. | b Median = $\frac{n+1}{2}$th score = $\frac{45+1}{2}$th score = 23rd score
2 Locate the class interval that contains this median score. | The 23rd score lies in the 90–<100 class.
3 Answer the question. | The median class is the 90–<100 class interval.
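The grouped-data measures for worked example 12 can be sketched in Python as follows. The representation of each class as a (lower bound, upper bound, frequency) tuple and the function names are assumptions made for this illustration only.

```python
# Class intervals from worked example 12 as (lower, upper, frequency).
# Each class includes its lower bound and excludes its upper bound.
classes = [
    (60, 70, 5), (70, 80, 7), (80, 90, 10),
    (90, 100, 12), (100, 110, 8), (110, 120, 3),
]


def estimated_mean(classes):
    """Estimate the mean by treating every score as its class midpoint."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


def median_class(classes):
    """Return the class that contains the (n + 1) / 2 th score."""
    n = sum(f for _, _, f in classes)
    position = (n + 1) / 2
    cumulative = 0
    for lower, upper, f in classes:
        cumulative += f
        if position <= cumulative:
            return (lower, upper)


def modal_class(classes):
    """Return the class with the highest frequency (the first one if tied)."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower, upper)


print(estimated_mean(classes), median_class(classes), modal_class(classes))
```

The mean here is only an estimate, because the midpoints stand in for the lost individual scores. When n is even and the two middle scores fall in different classes, this sketch reports the upper of the two classes; that is a simplification rather than a rule from the text.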
remember
For ungrouped data the following measures of central tendency are used.
1. The mean is the sum of scores in a given set of data divided by the number of
scores in the set.
$\bar{x} = \frac{\Sigma x}{n}$ is used when a list of scores is given.
$\bar{x} = \frac{\Sigma(f \times x)}{n}$ is used when a frequency distribution table is given.
2. The median is:
(a) the middle score for an odd number of scores arranged in numerical order
(b) the average of the two middle scores for an even number of scores arranged
in numerical order.
Its location is determined by finding the score in the $\frac{n+1}{2}$th position.
3. The mode is the score that occurs most often in a set of data.
For grouped data the following measures of central tendency are used.
4. The mean is $\bar{x} = \frac{\Sigma(f \times x)}{n}$, where x represents the midpoint of a class interval.
5. The median class can be determined from the cumulative frequency.
6. The modal class is given by the class interval with the highest frequency.
a 3, 5, 6, 8, 8, 9, 10 b 4, 6, 7, 4, 8, 9, 7, 10
c 17, 15, 48, 23, 41, 56, 61, 52 d 4.5, 4.7, 4.8, 4.8, 4.9, 5.0, 5.3
e $7\frac{1}{2}$, $10\frac{1}{4}$, 12, $12\frac{1}{4}$, 13, $13\frac{1}{2}$, $13\frac{1}{2}$, 14
2 The following back-to-back stem-and-leaf plot shows the test results of 25 Year 10
students in Mathematics and Science. Find the mean, median and mode for each of
the two subjects.
Key: 3|2 = 32
Leaf | Stem | Leaf
97432 | 6 | 2679
8510 | 7 | 3678
73 | 8 | 044689
 | 9 | 258
a
Score (x) | Frequency (f)
4 | 3
5 | 6
6 | 9
7 | 4
8 | 2
Total | 24

b
Score (x) | Frequency (f)
12 | 4
13 | 5
14 | 10
15 | 12
16 | 9
Total | 40

4 The following data show the number of bedrooms in each of the 10 houses in a
particular neighbourhood: 2, 1, 3, 4, 2, 3, 2, 2, 3, 3.
40–<50 2
50–<60 4
60–<70 6
70–<80 9
80–<90 5
90–<100 4
Total 30
6 Calculate the mean of the grouped data shown in the table below.
100–<109 3
110–<119 7
120–<129 10
130–<139 6
140–<149 4
Total 30
7 Find the modal class of the data shown in the table below.
51–<55 1
56–<60 3
61–<65 4
66–<70 5
71–<75 3
76–<80 2
Total 18
220–<229 2
230–<239 2
240–<249 3
250–<259 5
260–<269 4
270–<279 4
Total 20
10 a Add one more number to the set of data 3, 4, 4, 6, so that the mean of a new set is
equal to its median.
b Design a set of five numbers so that mean = median = mode = 5.
c In the set of numbers 2, 5, 8, 10, 15 change one number, so that the median
remains unchanged, while the mean increases by 1.
Career profile
GRAHAM DE HOEDT — Meteorologist
Measures of spread
Range
The most basic measure of spread is the range. It is defined as the difference between
the highest and the lowest values in the set of data.
range = highest score − lowest score or
range = Xmax − Xmin
WORKED Example 13
Find the range of the given data set: 2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2.
THINK WRITE
1 Identify the lowest score of the data set. Lowest score = 2.1
2 Identify the highest score of the data set. Highest score = 5.2
3 Write the rule for the range. Range = highest score − lowest score
4 Substitute the known values into the rule. = 5.2 − 2.1
5 Evaluate. = 3.1
Interquartile range
Another way of measuring spread is to divide the data set into quarters.
The number that marks the end of the first quarter of an ordered data set is called
the lower quartile and is denoted by QL (or the 25th percentile). The number that marks
the end of the third quarter is called the upper quartile and is denoted by QU (or the
75th percentile).
The difference between the upper and lower quartiles is called the interquartile range
(IQR). It considers the middle 50% of the data.
IQR = QU − QL
The lower quartile, upper quartile and the interquartile range of a set of data may be
calculated using the following steps.
1. Order the set of data.
2. Locate the median that divides the set of data into two halves.
(a) For an odd number of scores, the median will be one of the original scores. It
should not be included in either the lower or upper half of the scores.
(b) For an even number of scores the median will lie halfway between two scores. It
will divide the data into two equal sets.
3. Locate and calculate QL, the median of the lower half of the data.
4. Locate and calculate QU, the median of the upper half of the data.
5. Obtain the interquartile range by calculating the difference between the upper and
lower quartiles; that is, IQR = QU − QL.
WORKED Example 14
Calculate the interquartile range (IQR) of the following set of data: 3, 2, 8, 6, 1, 5, 3, 7, 6.
THINK | WRITE
1 Arrange the scores in order. | 1 2 3 3 5 6 6 7 8
2 Locate the median and use it to divide the data into two halves. Note: The median is the 5th score in this data set and should not be included in either half of the data. | 1 2 3 3 [5] 6 6 7 8
3 Find QL, the median of the lower half of the data. | QL = $\frac{2+3}{2} = \frac{5}{2}$ = 2.5
4 Find QU, the median of the upper half of the data. | QU = $\frac{6+7}{2} = \frac{13}{2}$ = 6.5
5 Calculate the interquartile range. | IQR = QU − QL = 6.5 − 2.5 = 4
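The steps for finding the quartiles and the interquartile range can be sketched in Python. The function names below are hypothetical; the sketch follows the halving method described in this section, in which the median is left out of both halves when the number of scores is odd.

```python
from statistics import median


def quartiles(data):
    """Return (QL, QU) as the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of scores the median itself is skipped
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)


def interquartile_range(data):
    """IQR = QU - QL."""
    q_lower, q_upper = quartiles(data)
    return q_upper - q_lower


# Data set from worked example 14
print(quartiles([3, 2, 8, 6, 1, 5, 3, 7, 6]))
print(interquartile_range([3, 2, 8, 6, 1, 5, 3, 7, 6]))
```

Calculators and software packages may use other conventions for locating quartiles, so their results can differ slightly from this method on some data sets.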
Boxplots
Boxplots (or box-and-whisker plots) are constructed using a five-number summary
which includes the lowest value of the set, the lower quartile, the median, the upper
quartile and the highest value of the set; that is, Xmin, QL, Median, QU, Xmax. The
vertical ends of a box extend from the lower to the upper quartile and contain a vertical
line, indicating the location of the median. Whiskers extend to the smallest and to the
largest values on either side of the box.
WORKED Example 15
a Represent the following set of data using a boxplot: 4, 5, 5, 6, 9, 10, 12, 14, 15.
b State the: i range and ii IQR of the data.
THINK | WRITE
a 1 Check that the scores are in ascending order. | a 4 5 5 6 9 10 12 14 15
2 State the smallest number in the set. | Lowest value = 4
3 State the largest number in the set. | Highest value = 15
4 Find the median (5th score). | Median = 9
5 Find the value of QL and QU. | QL = $\frac{5+5}{2} = \frac{10}{2}$ = 5, QU = $\frac{12+14}{2} = \frac{26}{2}$ = 13
6 Draw a horizontal axis which is evenly scaled and incorporates the given values. |
7 Draw a box representing the interquartile range which begins at 5 (QL) and ends at 13 units (QU). |
8 Draw a vertical line within the box at 9 units (the median). |
9 Draw two horizontal lines, one extending from the smallest value to the lower quartile end of the box, the other extending from the upper quartile end of the box to the highest value. |
[Figure: boxplot drawn on a horizontal axis scaled from 2 to 16]
b i 1 Write the rule for the range. | b i Range = highest value − lowest value
2 Substitute the values into the rule. | = 15 − 4
3 Evaluate. | = 11
ii Calculate the interquartile range from the quartiles found in part a. | ii IQR = QU − QL = 13 − 5 = 8
TI-83:
1. Press STAT , choose EDIT and enter the data as L1.
2. Press 2nd [STAT PLOT] and press 1 and select
ON. Then use the arrow keys to choose the box-and-
whisker plot.
3. Xlist will need to be L1 while Freq must become 1.
Outliers
An outlier is a piece of data which is considerably different from the rest of the values
in a set of data. The presence of an outlier may be an indication that an error has been
made in recording the data. Outliers may alter the representative nature of any statistics
calculated, as illustrated below.
For the data set 3, 3, 2, 1, 3, 4, 2, 3, 2, 2 the measures of central tendency are:
mean = 2.5 median = 2.5 mode = 2 and 3.
When an outlier, say 48, is added to the original data set, the measures of central
tendency for the new list, 3, 3, 2, 1, 3, 4, 2, 3, 2, 2, 48 are: mean = 6.64, median = 3,
mode = 2 and 3. It can be seen that when an outlier is added to a set of data, the mean
may not be truly representative of values in the data set. In this case the median (or
mode) would be a better measure of central tendency than the mean.
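The effect of the outlier on each measure can be checked with a short Python sketch. It assumes the standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Data set from the text, before and after the outlier 48 is added
original = [3, 3, 2, 1, 3, 4, 2, 3, 2, 2]
with_outlier = original + [48]

for label, data in (("original", original), ("with outlier", with_outlier)):
    # The mean is rounded to 2 decimal places, as in the text
    print(label, round(mean(data), 2), median(data), sorted(multimode(data)))
```

The mean moves a long way when the single value 48 is added, while the median changes only slightly and the modes do not change at all.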
An outlier can be defined as any value which is more than 1.5 × interquartile range
above the upper quartile value or more than 1.5 × interquartile range below the lower
quartile value.
When drawing a boxplot the whiskers do not extend as far as any outliers. The
whiskers stop at the last score that is not an outlier, with crosses placed at the value of any
outliers.
WORKED Example 16
As newly appointed coach of Omizzolo’s Shooting
Stars basketball team, Maria decided to record
each player’s statistics for the previous season. The
number of goals scored by the leading goal shooter
was:
3, 18, 30, 29, 25, 25, 36, 27, 28, 28, 28, 23,
1, 22, 23, 19, 19, 20, 2, 26, 29, 30, 30, 25.
a Prepare a boxplot for the data, showing the
position of any outliers.
b Suggest reasons for any outliers obtained.
THINK | WRITE
a 1 Check that the scores are in ascending order. | a 1, 2, 3, 18, 19, 19, 20, 22, 23, 23, 25, 25, 25, 26, 27, 28, 28, 28, 29, 29, 30, 30, 30, 36
2 State the smallest number in the set. | Lowest value = 1
3 State the largest number in the set. | Highest value = 36
4 Find the median (12.5th score). | Median = 25
5 Find the value of QL and QU. | QL = 19.5, QU = 28.5
6 Find the IQR. | IQR = 28.5 − 19.5 = 9
7 Check for outliers. Calculate 1.5 × IQR. Subtract this result from QL to find the lower limit. Add the result to QU to find the upper limit. | 1.5 × IQR = 1.5 × 9 = 13.5; QL − 13.5 = 19.5 − 13.5 = 6; QU + 13.5 = 28.5 + 13.5 = 42. Outliers are values lower than 6 or higher than 42.
8 Write the values of any outliers. | There are 3 outliers; they are: 1, 2 and 3.
9 Draw a horizontal axis that is evenly scaled and incorporates the given values. |
10 Draw a box representing the interquartile range that begins at 19.5 (QL) and ends at 28.5 units (QU). |
11 Draw a vertical line within the box at 25 units (the median). |
[Figure: boxplot with outliers; horizontal axis: Number of goals, scaled from 0 to 40]
THINK | WRITE
b Comment on any outliers obtained and suggest reasons for their presence. | b The low number of goals scored (outliers) could be due to a number of reasons such as the goal shooter playing poorly, the team’s inability to get the ball to the goal shooter, injuries to the team and goal shooter, or the superior playing ability of the opposition.
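The outlier check in worked example 16 can be sketched in Python. The function names are hypothetical; the sketch assumes the quartiles are found by the halving method used in this section and applies the 1.5 × IQR rule to set the lower and upper limits.

```python
from statistics import median


def five_number_summary(data):
    """Return (Xmin, QL, median, QU, Xmax) using the halving method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of scores the median is left out of both halves
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def find_outliers(data):
    """Return (lower limit, upper limit, sorted list of outliers)."""
    _, q_lower, _, q_upper, _ = five_number_summary(data)
    step = 1.5 * (q_upper - q_lower)
    low_limit, high_limit = q_lower - step, q_upper + step
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return low_limit, high_limit, outliers


# Goals scored by the leading goal shooter in worked example 16
goals = [3, 18, 30, 29, 25, 25, 36, 27, 28, 28, 28, 23,
         1, 22, 23, 19, 19, 20, 2, 26, 29, 30, 30, 25]

print(five_number_summary(goals))
print(find_outliers(goals))
```

Values equal to a limit are not treated as outliers here, which matches the wording "more than 1.5 × interquartile range" in the definition.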
TI-83:
To draw a boxplot with outliers such as in the example
above, we need to choose the modified boxplot option.
The steps are the same as previously shown ( 2nd
[STAT PLOT]). However, when selecting the type of
graph, we use the modified boxplot option as shown at
right.
Use your graphics calculator to draw the boxplot for worked example 16.
remember
1. Range = highest score − lowest score or range = Xmax − Xmin
2. The difference between the upper and lower quartiles is called the interquartile
range, IQR. IQR = QU − QL. The IQR accounts for the middle 50% of the data.
3. A boxplot is a graphical representation of the five-number summary; that is, the
lowest score, lower quartile, median, upper quartile, highest score, for a
particular set of data. It consists of a partitioned box and a whisker at either end
that extends to the extreme scores.
4. Any piece of data that is considerably different from the rest of the values in a
set of data is called an outlier. When a set of data includes an outlier, the
median (or mode) rather than the mean is a better measure of central tendency.
9E Measures of spread
WORKED Example 13
1 Find the range for each of the following sets of data.
a 4, 3, 9, 12, 8, 17, 2, 16
b 49.5, 13.7, 12.3, 36.5, 89.4, 27.8, 53.4, 66.8
c $7\frac{1}{2}$, $12\frac{3}{4}$, $5\frac{1}{4}$, $8\frac{2}{3}$, $9\frac{1}{6}$, $3\frac{3}{4}$
WORKED Example 14
2 Calculate the interquartile range, IQR, for the following sets of data.
a 3, 5, 8, 9, 12, 14 b 7, 10, 11, 14, 17, 23
c 66, 68, 68, 70, 71, 74, 79, 80 d 19, 25, 72, 44, 68, 24, 51, 59, 36
3 The following stem-and-leaf plot shows the mass of newborn babies (rounded to the
State the:
WORKED Example 15
a 6, 9, 12, 13, 20, 22, 26, 29
b 7, 15, 2, 26, 47, 19, 9, 33, 38
c 120, 99, 101, 136, 119, 87, 123, 115, 107, 100
5 The following set of data shows the ages of 30 people who attended a concert.
18, 26, 10, 12, 20, 18, 19, 10, 19, 17, 17, 9, 11, 13, 16
14, 14, 13, 12, 13, 24, 10, 12, 15, 14, 12, 16, 18, 11, 13
a Draw a boxplot of these data using a graphics calculator.
b State the range and the interquartile range of the data.
6 multiple choice
The diagram below shows the heights
of a group of students.
[Figure: boxplot with values marked at 138, 148, 156, 160 and 172 cm]
a The interquartile range of the data is:
A 34 B 18 C 12 D 22
b Which of the following is not true?
A 50% of the students are shorter than 1.56 m.
B The number of students shorter than 1.48 m is less than the number of those taller
than 1.60 m.
C The range of the heights of the students in the group is 34 cm.
D 75% of students have a height of 1.6 m or under.
WORKED Example 16
7 As newly appointed coach of Terrorolo’s Meteors netball team, Kate decided to record
each player’s statistics for the previous season. The number of goals scored by the
leading goal shooter was:
1, 3, 8, 18, 19, 23, 25, 25, 25, 26, 27, 28,
28, 28, 28, 29, 29, 30, 30, 33, 35, 36, 37, 40.
a Prepare a boxplot for the data, showing the position of any outliers.
b Suggest reasons for any outliers obtained.
8 The following back-to-back stem-and-leaf plot shows the ages of 30 pairs of men and
women when entering their first marriage.
Key: 1 | 6 = 16 years old
Leaf (Men) | Stem | Leaf (Women)
998 | 1 | 67789
99887644320 | 2 | 001234567789
9888655432 | 3 | 01223479
6300 | 4 | 1248
60 | 5 | 2
a Use a graphics calculator to construct a pair of parallel boxplots to represent the two
sets of data. (Parallel boxplots are those that share a common scale and are placed
one above the other.)
b Find the mean, median, range and interquartile range of each set.
c Find any outliers, if they exist, for each set.
d Write a short paragraph comparing the two distributions. (Use mathematical
evidence, particularly the answers to part b.)
Standard deviation
Statisticians use range and interquartile range as measures of the spread of data.
They are not the only measures used, however. The most frequently used measure
is the standard deviation.
Consider the data set below:
5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
1 Find the range of the data.
2 Find the interquartile range of the data.
Now consider what happens if we change the first and last figures in the data set.
Consider the new data set:
9, 10, 10, 13, 15, 16, 17, 20, 21, 22.
3 Find the range of the data.
4 Find the interquartile range of the data.
5 After changing only two figures in the data set, describe the effect on the range
and the interquartile range.
Now consider another slight change to the data set:
9, 10, 12, 13, 15, 16, 17, 17, 21, 22.
6 Find the range of the data.
7 Find the interquartile range of the data.
8 We have now changed only 4 figures in the data set. Describe the effect that
this has had on the range and the interquartile range.
Again consider the original data set: 5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
Change the data set to 5, 6, 10, 16, 17, 18, 19, 20, 24, 25.
9 Find the range of the data.
10 Find the interquartile range of the data.
11 Compare the range and interquartile range of this data set with those of the
original data set.
The standard deviation is a measure of spread that considers every score in the data
set. Again, consider the original data set:
5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
We want to find a measure of the typical distance that each score lies from the
mean.
12 Find the mean.
We can find the distance of each score from the mean by subtracting the mean from
the score. This is called the deviation and is written as $x - \bar{x}$. Because some of these
deviations will be negative, we square each to obtain a positive answer. These are called
the squared deviations and are written as $(x - \bar{x})^2$.
Score (x) | $x - \bar{x}$ | $(x - \bar{x})^2$
5
10
10
13
15
16
17
20
21
25
14 Next find the sum of the squared deviations, $\Sigma(x - \bar{x})^2$, and divide this result
by the number of scores, n. The result of this calculation is called the variance.
15 Finally, because the deviations were squared, we compensate for this by
taking the square root of the variance. This is the standard deviation, denoted
σn. Written as a formula, $\sigma_n = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}}$.
16 In practice we do not go through this long procedure in each case. The
standard deviation can be found on a calculator by using the σn function. Enter
the data as a list on your graphics calculator.
17 The CALC function of 1 Variable Stats can be used to see the standard
deviation listed among other key statistics of the data.
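The steps of this investigation can be sketched in Python as a check on the hand calculation. The function name `population_std_dev` is hypothetical; `pstdev` from the standard `statistics` module is used only as an independent comparison, and it corresponds to the σn function because it divides by n.

```python
from math import sqrt
from statistics import pstdev

# Original data set from the investigation
data = [5, 10, 10, 13, 15, 16, 17, 20, 21, 25]


def population_std_dev(scores):
    """Standard deviation: square root of the mean of the squared deviations."""
    n = len(scores)
    mean = sum(scores) / n
    # Squared deviation of each score from the mean
    squared_deviations = [(x - mean) ** 2 for x in scores]
    # Variance: sum of the squared deviations divided by n
    variance = sum(squared_deviations) / n
    return sqrt(variance)


print(population_std_dev(data))
print(pstdev(data))
```

Both lines of output should agree, apart from possible rounding in the last decimal places, because both follow the same formula.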
Conducting a statistical inquiry
Your knowledge of sampling procedures, methods of graphical representation of
the data, and ways of calculating measures of central tendencies and measures of
spread of the data can now be applied in a single task — conducting a statistical
investigation.
You are to conduct an investigation on the theme of computers, with the target
population being all the students in your school.
1 Decide on the strata for your sample (it could be, for instance, gender, or
students in Year 8–10, Year 11–12).
2 Develop a method of obtaining a stratified random sample.
Who owns the gold coins?
A group of divers has just located a sack of gold coins on the seabed of the Pacific
Ocean. In an effort to trace the country of origin, Britain, Mexico and the USA
have all claimed ownership. Since the coins have been in the ocean for a
considerable length of time, any identifying marks have been eroded away. The
three countries were each asked to supply a random sample of 20 of their gold
coins which they suspect of being the type in the raised treasure. Measurements on
these coins are to be compared with measurements of a sample of 20 from the
discovered sack. The following table displays the results of these measurements.
2 Report the results of your findings. Which country is the rightful owner of the
coins? Provide evidence for your decision.
Bivariate data
So far in this chapter we have dealt with univariate data; that is, only one variable was
considered for each piece of data. In this section we will look at sets of data where each
piece is represented by two variables. Such data are called bivariate.
Consider the following example. A researcher for a particular electricity company
wishes to analyse electricity consumption patterns. She selects a sample of households
and asks a representative of each household to name the number of electrical appliances
in their home and the amount of electricity consumed in the first quarter. This is
an example of bivariate data, since each household is being represented by 2 variables:
the number of electrical appliances and the electricity consumption. Furthermore, since
the amount of energy consumed could depend on the number of electrical appliances
being used, the number of electrical appliances can be thought of as an independent
variable and the electricity consumption as a dependent variable.
Scatterplots
Bivariate data are best represented using a scatterplot. Each piece of data on a scatterplot
is shown by a point. The x-coordinate of this point is the value of the independent variable
and the y-coordinate is the corresponding value of the dependent variable. In the above
example each household would be represented by the point whose x-coordinate is the
number of electrical appliances and whose y-coordinate is the amount of electricity
consumed by that household.
WORKED Example 17
The following table shows the total revenue from selling tickets for a number of different
chamber music concerts. Represent these data on a scatterplot.
Number of tickets sold 400 200 450 350 250 300 500 400 350 250
Total revenue ($) 8000 3600 8500 7700 5800 6000 11 000 7500 6600 5600
THINK | WRITE
1 Determine the nature of the variables with reasoning. | The total revenue depends on the number of tickets being sold, so the number of tickets is the independent variable and the total revenue is the dependent variable.
2 Rule up a set of axes on graph paper. Title the graph. Label the horizontal axis ‘Number of tickets’ and the vertical axis ‘Total revenue ($)’. |
3 Scale the horizontal and vertical axes. |
4 Plot the points on the scatterplot. In each pair of values, treat the number of tickets as the horizontal coordinate and the corresponding total revenue as the vertical coordinate. For example, the first pair of values in the table is represented by the point with coordinates (400, 8000). |
[Figure: scatterplot titled ‘Revenue obtained from selling music concert tickets’; horizontal axis: Number of tickets (200 to 500); vertical axis: Total revenue ($) (3000 to 11 000)]
TI-83:
1. Enter the number of tickets sold as L1 and the total
revenue as L2. To do this press STAT , choose EDIT
and enter the information in the two lists.
2. Adjust the window settings. Press WINDOW and
enter the settings as shown on the screen at right.
3. Press 2nd [STAT PLOT] and select Plot 1.
4. Use the arrow keys to select the scatterplot icon and
set Xlist to L1, Ylist to L2 and select the type of
mark. The selections are shown at right.
Correlation
When analysing bivariate data we are often interested to see whether any relationship
exists between the two variables and, if it does, what type of relationship it is.
The relationship between the two variables is called correlation. If correlation exists,
it can be classified according to its:
1. form — whether it is linear or non-linear
2. direction — whether it is positive or negative
3. strength — whether it is strong, moderate or weak.
The scatterplot is an excellent tool that assists in classifying the relationship between
the two variables.
[Figure: scatterplots illustrating linear relationships]
[Figure: scatterplots illustrating non-linear relationships]
[Figure: scatterplot illustrating negative correlation]
[Figure: scatterplots illustrating strong correlation, moderate correlation and weak correlation]
Sometimes the points on the scatterplot form a straight line. In such cases we say
that the relationship between the variables is perfectly linear.
[Figure: scatterplots illustrating perfectly linear relationships]
Sometimes the points on the scatterplot appear to be in no particular order (that is,
they are randomly spread over the set of axes). In such cases we say that there is no
correlation between the two variables.
[Figure: scatterplot illustrating no correlation]
The classification of the correlation between two variables discussed above is
qualitative rather than quantitative. There are a number of methods that allow us to measure
and classify the correlation numerically, but these are beyond the scope of this course.
WORKED Example 18
State the type of the relationship between the variables x and y,
suggested by the scatterplot.
[Figure: scatterplot of y against x]
THINK | WRITE
Carefully analyse the scatterplot and comment on its form, direction and strength. | The points on the scatterplot form a narrow path that resembles a straight ‘corridor’ (that is, it would be reasonable to fit a straight line to it). Therefore the relationship is linear.
 | The path is directed from the bottom left corner to the top right corner and the value of y increases as x increases. Therefore the correlation is positive.
 | Furthermore the points are quite tight; that is, they form a thin corridor. So the correlation can be classified as being strong.
 | There is a strong, positive, linear relationship between x and y.
WORKED Example 19
Mary sells business shirts in a department store. She always records the number of
different styles of shirt sold during the day. The table below shows her sales over one
week.
Price ($) 14 18 20 21 24 25 28 30 32 35
THINK: 2 Draw a conclusion corresponding to the analysis of the scatterplot.
WRITE: The price of the shirt appears to affect the number sold; that is, the more
expensive the shirt the fewer sold.
remember
1. Bivariate data involve two sets of related variables for each piece of data.
2. Bivariate data are best represented on a scatterplot. On a scatterplot each piece
of data is shown by a single point whose x-coordinate is the value of the
independent variable, and whose y-coordinate is the value of the dependent
variable.
3. The relationship between two variables is called correlation. Correlation can be
classified as linear, non-linear, positive, negative, weak, moderate or strong.
4. If the points appear to be scattered about the scatterplot in no particular order,
then no correlation between the two variables exists. If the points form a
straight line, then the relationship between the variables is perfectly linear.
5. When drawing conclusions based on the scatterplot, it is important to
distinguish between the correlation and the cause. Strong correlation between
the variables does not necessarily mean that an increase in one variable causes
an increase or decrease in the other.
9F Bivariate data
1 For each of the following pairs, decide which of the variables is independent and
which is dependent.
a Number of hours spent studying for a Mathematics test and the score on that test.
b Daily amount of rainfall (in mm) and daily attendance at the Botanical Gardens.
c Number of hours per week spent in a gym and the annual number of visits to the
doctor.
d Amount of computer memory taken by an essay and the length of the essay (in words).
e The cost of care in a childcare centre and attendance at the childcare centre.
f The cost of the property (real estate) and the age of the property.
g The cut-off OP score for a certain tertiary course and the number of applications
for that course.
h The heart rate of a runner and the running speed.
2 (WORKED Example 17) The following table shows the cost of a wedding reception at 10 different venues.
Represent the data on a scatterplot.
Total cost (× $1000) 1.5 1.8 2.4 2.3 2.9 4 4.3 4.5 4.6 4.6
3 (WORKED Example 18) State the type of relationship between x and y for each of the following scatterplots.
[Figure: scatterplots a to o, each showing y against x]
4 (WORKED Example 19) Eugene is selling leather bags at the local market. During the day he keeps records of
his sales. The table below shows the number of bags sold over one weekend and their
6 The table below shows the number of questions solved by each student on a test, and
the corresponding total score on that test.
Number of questions 2 4 7 10 5 2 6 3 9 4 8 3 6
7 A sample of 25 drivers who had obtained a full licence within the last month was
asked to recall the approximate number of driving lessons they had taken (to the
nearest 5), and the number of accidents they had had while being on P plates. The
results are summarised in the table which follows.
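Number of lessons | Number of accidents | Number of lessons | Number of accidents
5 | 6 | 5 | 5
20 | 2 | 20 | 3
15 | 3 | 40 | 0
25 | 3 | 25 | 4
10 | 4 | 30 | 1
35 | 0 | 15 | 4
5 | 5 | 35 | 1
15 | 1 | 5 | 4
10 | 3 | 30 | 0
20 | 1 | 15 | 2
40 | 2 | 20 | 3
25 | 2 | 10 | 4
10 | 5 | |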
8 Each point on the scatterplot below shows the time (in weeks) spent by a person on a
healthy diet and the corresponding mass lost (in kg).
[Scatterplot of loss in mass against number of weeks]
Study the scatterplot and state whether each of the following statements is true or
false.
a The number of weeks that the person stays on a diet is the independent variable.
b The y-coordinates of the points represent the time spent by a person on a diet.
c There is evidence to suggest that the longer the person stays on a diet, the greater
the loss in mass.
d The time spent on a diet is the only factor that contributes to the loss in mass.
e The correlation between the number of weeks on a diet and the number of
kilograms lost is positive.
[Scatterplot: axes labelled Temperature (°C) and Water usage (L)]
10 multiple choice
The scatterplot below shows the number of sides and the sum of interior angles for a
number of polygons.
[Scatterplot: Sum of angles (°), marked from 200 to 1300, against Number of sides, marked from 3 to 10]
Which of the following statements is not true?
A The correlation between the number of sides and the angle sum of the polygon is
perfectly linear.
B The increase in the number of sides causes the increase in the size of the angle
sum.
C The number of sides depends on the sum of the angles.
D The correlation between the two variables is positive.
11 multiple choice
After studying a scatterplot, it was concluded that there was evidence that the greater
the level of one variable, the smaller the level of the other variable. The scatterplot
must have shown a:
A strong, positive correlation B strong, negative correlation
C moderate, positive correlation D moderate, negative correlation
2
1 Find the mean, median and mode of the following set of data: 34, 18, 42, 18, 55, 18,
25, 42, 33, 18.
2 The results of a mathematics project are represented by this stem-and-leaf plot.
Key: 2 | 5 = 25
Stem Leaf
0 00
1
2 5579
3 3888
4 06666
Find the mean, median and mode.
3 Find the mean number of pets per household surveyed and presented in this frequency
table (correct to 1 decimal place).
4 Using the midpoint of class intervals, calculate the mean amount (dollars) spent at a
school canteen, represented in the following frequency table.
5 Add a cumulative frequency column to the table in the previous question and find the
median class.
6 Find the range for the set of data: 33, 46, 57, 42, 51, 66, 37, 27, 76, 74, 53, 77.
7 Find the interquartile range for the set of data: 2, 4, 3, 1, 4, 2, 3, 5, 2, 4, 3, 3, 5, 3.
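Answers to questions 1, 6 and 7 can be checked with a short script once they have been worked by hand. The following is a minimal sketch in Python. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data; the helper name `interquartile_range` is an illustrative assumption and does not come from the text.

```python
from statistics import mean, median, mode


def interquartile_range(data):
    """IQR, assuming quartiles are the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(upper) - median(lower)


q1_data = [34, 18, 42, 18, 55, 18, 25, 42, 33, 18]
q6_data = [33, 46, 57, 42, 51, 66, 37, 27, 76, 74, 53, 77]
q7_data = [2, 4, 3, 1, 4, 2, 3, 5, 2, 4, 3, 3, 5, 3]

q1_mean, q1_median, q1_mode = mean(q1_data), median(q1_data), mode(q1_data)
q6_range = max(q6_data) - min(q6_data)   # range = highest score - lowest score
q7_iqr = interquartile_range(q7_data)

print("Q1 mean, median, mode:", q1_mean, q1_median, q1_mode)
print("Q6 range:", q6_range)
print("Q7 interquartile range:", q7_iqr)
```

If a different rule for locating the quartiles is used, the interquartile range may differ slightly for some data sets.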
WORKED Example 20
The table below shows the number of boxes of tissues purchased by hay fever sufferers
during the blooming season in spring.
THINK: a 1 Draw the scatterplot showing ‘Number of days affected by hay
fever’ (independent variable d) on the horizontal axis and ‘Total
number of boxes of tissues purchased’ (dependent variable T)
on the vertical axis.
WRITE/DRAW: a [Scatterplot: T (vertical axis) against d (horizontal axis, marked from 3 to 14)]
[Graph: T against d, same axes]
THINK: b Look at the slope of the line. It is sloping upwards from left to right, so the
slope is positive.
WRITE/DRAW: b The line of best fit has a positive slope. This means that, as the number of
days affected by hay fever increases, more tissue boxes will need to be purchased.
[Graph: T against d, same axes]
THINK: 2 Answer the question.
WRITE/DRAW: In 11 days the hay fever sufferer will need about 4 boxes of tissues.
THINK: ii 1 Locate 2 on the vertical axis. Draw a horizontal line until it meets the line
of best fit. From that point draw a vertical line to the horizontal axis.
Read off the horizontal value indicated on the horizontal axis by
the vertical line.
WRITE/DRAW: ii [Graph: T against d, same axes]
THINK: 2 Answer the question.
WRITE/DRAW: 2 boxes of tissues would be likely to last about 6 days.
[Figure: graph of T against d showing interpolation (inside the given range) and extrapolation (outside the given range)]
Reliability of predictions
When predictions of any sort are made it is always good to know whether they are
reliable or not. Predictions made using the line of best fit can be thought of as reliable
if each of the following is observed:
1. the number of observations (that is, points constituting the scatterplot) is reasonably
large
2. the scatterplot indicates reasonably strong correlation between the variables
3. the predictions were made using interpolation.
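The third condition can be checked mechanically. The sketch below, in Python, labels a prediction as interpolation or extrapolation by comparing the requested value with the smallest and largest observed values of the independent variable. The function name `prediction_type` is an assumption made for illustration; the household sizes are those in the food expenditure table of question 5 in the exercise that follows.

```python
def prediction_type(observed_x, x_new):
    """Return 'interpolation' if x_new lies within the observed range, else 'extrapolation'."""
    if min(observed_x) <= x_new <= max(observed_x):
        return "interpolation"
    return "extrapolation"


# Household sizes recorded in the table (independent variable).
household_sizes = [1, 2, 4, 7, 5, 4, 3, 5, 2, 4, 6, 5, 3, 1, 4]

for size in (3, 8, 9, 10):
    print(size, prediction_type(household_sizes, size))
```

A value inside the range satisfies only the third condition. The number of observations and the strength of the correlation still have to be judged from the scatterplot.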
remember
1. If the scatterplot indicates a linear relationship between two variables, the
linear model of the relationship can be established by drawing a line of best fit
into the scatterplot. Position the line so that there is approximately an equal
number of points on either side of the line.
2. The line of best fit can be used for predicting the value of one variable when
given the value of the other. This can be done graphically.
3. When the value that is being predicted using the line of best fit is within the
given range, the process is called interpolation. When the value that is being
predicted using the line of best fit is outside the given range, the process is
called extrapolation.
4. Only predictions made using interpolation can be considered reliable.
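The positioning rule in point 1 suggests a simple check on a line drawn by eye: count the points above and below it. The following Python sketch does this for the hours worked and weekly earnings table given later in this exercise. The candidate line (earnings = 6.8 × hours − 6) and the function name `count_sides` are assumptions chosen for illustration; they are not a line or method given in the text.

```python
def count_sides(xs, ys, gradient, intercept):
    """Count the points above, below and on the line y = gradient * x + intercept."""
    above = below = on_line = 0
    for x, y in zip(xs, ys):
        predicted = gradient * x + intercept
        if y > predicted:
            above += 1
        elif y < predicted:
            below += 1
        else:
            on_line += 1
    return above, below, on_line


hours = [4, 8, 15, 18, 10, 5, 12, 16, 14, 6]
earnings = [23, 47, 93, 122, 56, 33, 74, 110, 78, 35]

# Hypothetical line drawn by eye: earnings = 6.8 * hours - 6
above, below, on_line = count_sides(hours, earnings, 6.8, -6)
print("above:", above, "below:", below, "on the line:", on_line)
```

A roughly even split suggests the line is reasonably placed. An uneven split suggests sliding or tilting the line and counting again.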
1 (WORKED Example 20) The data in the table below show the distances travelled by 10 cars and the amount of
petrol used for their journeys (to the nearest litre).
Hours worked 4 8 15 18 10 5 12 16 14 6
Weekly earnings ($) 23 47 93 122 56 33 74 110 78 35
5 The table below shows the average weekly expenditure on food for households of
various sizes.
Number of people in a household 1 2 4 7 5 4 3 5
Cost of food ($ per week) 70 100 150 165 150 140 120 155
Number of people in a household 2 4 6 5 3 1 4
a Construct a scatterplot of the data and draw in the line of best fit.
b Interpret the meaning of the gradient.
c Use your graph to predict the weekly food expenditure for a family of:
i 8 ii 9 iii 10.
6 The following table shows the gestation time and the birth mass of 10 babies.
Birth mass (kg) 1.1 1.5 1.8 2.1 2.2 2.5 2.8 3.1 3.2 3.4
a Construct a scatterplot of the data. What type of correlation does the scatterplot
suggest?
b Draw in the line of best fit.
c What does the gradient indicate?
d Although full term of gestation is considered to be 40 weeks, some pregnancies last
longer. Use your graph to predict the birth mass of babies born after 41 and
42 weeks of gestation.
e Many babies are born prematurely. Predict the birth mass of a baby whose gestation
time was 30 weeks.
f If the birth mass of the baby was 2.4 kg, what was his or her gestation time (to the
nearest week)?
7 multiple choice
Consider the figure at right.
The line of best fit on the scatterplot at right is
used to predict the values of y when x = 15, x = 40
and x = 60.
[Figure: scatterplot of y against x with a line of best fit; the x-axis is marked from 10 to 70]
a Interpolation would be used to predict the value
of y when the value of x is:
A 15 and 40 B 15 and 60
C 15 only D 40 only
8 multiple choice
1 Choose an object or subject that is of interest to you and which can
be observed and measured during one day. For example, you might
decide to measure your own pulse rate.
2 Prepare a table where you will record your results every hour within
the school day. For example, for the pulse rate the table might look
like this.
Time 9 am 10 am 11 am 12 noon 1 pm 2 pm 3 pm 4 pm
Pulse rate
Long jump to the top
At the beginning of the chapter we met Laura, who was training for the long jump
and hoping to make the Australian Olympic team. Her best jump each year is
shown in the table below.
Age (years) 8 9 10 11 12 13 14 15 16 17 18
Best jump (metres) 4.31 4.85 5.29 5.74 6.05 6.21 — 6.88 7.24 7.35 7.57
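One way to explore the question is to fit a line to Laura's results numerically. The sketch below is in Python and uses the least-squares method, which is a substitute for the by-eye line of best fit used in this chapter. The function name `least_squares_line` is an assumption for illustration. Age 14 is left out because Laura did not compete that year; the qualifying mark of 8.1 metres and the target age of 20 come from the chapter opening.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Laura's records; age 14 is missing because she did not compete that season.
ages = [8, 9, 10, 11, 12, 13, 15, 16, 17, 18]
jumps = [4.31, 4.85, 5.29, 5.74, 6.05, 6.21, 6.88, 7.24, 7.35, 7.57]

gradient, intercept = least_squares_line(ages, jumps)
predicted_at_20 = gradient * 20 + intercept   # age 20 is outside 8 to 18: extrapolation
qualifying_mark = 8.1

print(f"gradient: {gradient:.3f} m per year")
print(f"predicted best jump at age 20: {predicted_at_20:.2f} m")
print("meets qualifying mark:", predicted_at_20 >= qualifying_mark)
```

Because age 20 lies outside the recorded ages, this prediction is an extrapolation and should be treated with caution. The yearly improvements in the table also tend to get smaller as Laura gets older, which a straight line does not capture.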
summary
Copy the sentences below. Fill in the gaps by choosing the correct word or
expression from the word list that follows.
1 Data can be obtained either by observation or by ______.
2 Random sampling uses a ______ device to select people or objects
from the population.
3 In simple random sampling every person or object has an equal
______ of being selected.
4 A stratified random sample splits the population into ______. A
sample from each stratum is selected ______; the sample size for
each stratum is ______ to the stratum size as compared to the
______ size.
5 An estimate of the sample size is given by the ______ of the
population size.
6 All data can be divided into 2 types: ______ and numerical.
Numerical data can be ______ or ______ while categorical data are
______.
7 Numerical data can be ______ or continuous.
8 To represent categorical or discrete numerical data we can use bar and
column graphs, sector graphs, picture graphs, dotplots and
______ plots.
9 In bar and column graphs the respective length and height of each bar
and column directly correspond to the ______ of the observation, or
category it represents.
10 In pie graphs each category is shown by a ______ whose size is
proportional to the category’s size (as compared to the population).
11 Picture graphs use ______ to represent a specific number of items.
12 Dot-plots use a single dot to represent an ______.
13 Stem-and-leaf plots have leaves arranged in ______ order of size,
outwards; that is, away from the stem.
14 Discrete data with a large number of different values and ______
data can be grouped into ______.
15 Grouped data can be represented using histograms, frequency
______, cumulative frequency polygons and percentage cumulative
frequency polygons.
16 A histogram does not have any ______ between columns.
17 A frequency polygon is a line graph, joining the ______ of the top
parts of the columns that constitute a histogram.
18 Mean, median and mode are called measures of ______.
19 The mean is the ______ of a set of data.
20 For ungrouped data the mean is calculated using the formula
______; for grouped data the formula is ______, where x is the
actual observation for the discrete data, or a midpoint of a class interval
for the grouped data.
WORD LIST
average, dependent, increasing, discrete,
target, symbols, highest, modal,
class intervals, strata, extrapolation, categorical,
spread, central tendency, $\frac{n+1}{2}$, upper,
questioning, bivariate, $Q_U$, categorised,
counted, frequency, interpolation, stem-and-leaf,
scatterplot, random, lowest, randomly,
gaps, proportional, correlation, $Q_L$,
linear, five-number, $\bar{x} = \frac{\sum x}{n}$, polygons,
sector, population, observation, midpoints,
chance, reliable, measured, $\bar{x} = \frac{\sum fx}{n}$,
lower, square root, independent, continuous,
positive, best fit
CHAPTER review
1 (Ex 9A) Lena is planning to conduct a survey of students who have completed the Senior Certificate.
She thought of some questions which are listed below.
i Determine the suitability of each of the listed questions, justifying your answer.
ii Tabulate possible responses for the questions deemed suitable in part i.
a Did you like studying for the Senior Certificate?
b Did you find studying for the Senior Certificate difficult?
c How many hours per week did you study?
d Which subjects did you do in your senior years?
e What were the hardest and easiest aspects of studying for your Senior Certificate?
f Did you have good teachers?
g Do you think tasks should be internally or externally assessed?
h What OP score did you obtain?
2 (Ex 9A) A researcher wishes to conduct a survey for the manager of a computer company. If the
company employs 200 technicians and 90 computer programmers, describe the procedure of
obtaining:
a a simple random sample b a stratified random sample.
3 (Ex 9B) i For each of the following, classify the data as being categorical or numerical.
ii For the numerical data, decide whether they are discrete or continuous.
a Numbers on the T-shirts of football players
b The mass of individual tea bags in a pack of 50
c The finishing places in the gymnastics competition
d Weekly sales of computers in a large department store
e Arm span of 20 students
f The country of origin of the people applying for Australian citizenship certificates
4 (Ex 9B) The owner of a local restaurant wishes to know what desserts are popular. Over one night he
observes that 2 people order pavlova, 12 people order chocolate mousse, 7 people order
lemon tang cake, 14 people order chocolate mud cake and 9 people order sacher torte.
Represent these data using:
a a bar graph b a sector graph c a dot-plot.
5 (Ex 9B) The data below show the daily sales of calculators in a large electronics store over the last
three weeks of January.
GC 2 3 6 9 12 10 24 17 15 19 20 26 24 18 29 33 30 36
SC 7 6 10 8 15 11 20 18 23 28 30 26 32 38 39 35 43 41
c Score x Frequency f
70 2
71 6
72 9
73 7
74 4
8 (Ex 9D) A sample of 30 people was selected at random from those attending a local swimming pool.
Their ages (in years) were recorded as follows: 19, 7, 58, 41, 17, 23, 62, 55, 40, 37, 32, 29,
21, 18, 16, 10, 40, 36, 33, 59, 65, 68, 15, 9, 20, 29, 38, 24, 10, 30.
a Find the mean and the median age of the people in this sample.
b Group the data into class intervals of 10 and complete the frequency distribution table.
c Use the frequency distribution table to estimate the mean age.
d Calculate the cumulative frequency.
e Find the median class.
f Compare the mean and median of the original data in part a with the estimates of the
mean and the median class obtained for the grouped data in parts c and e.
9 (Ex 9E) The following back-to-back stem-and-leaf plot shows the typing speed in words per minute
(wpm), of 30 Year 8 and Year 10 students.
Key: 2 | 6 = 26 wpm
Leaf (Year 8) | Stem | Leaf (Year 10)
           99 |  0   |
      9865420 |  1   | 79
    988642100 |  2   | 23689
      9776410 |  3   | 02455788
        86520 |  4   | 1258899
              |  5   | 03578
              |  6   | 003
Number of questions 9 12 37 60 55 40 10 25 50 48 60
Test result 18 21 52 95 100 67 15 50 97 85 89
Number of questions 50 48 35 29 19 44 49 20 16 58 52
Test result 97 85 62 54 30 70 82 37 28 99 80
i 10 ii 35.
b Use the line of best fit to predict the value of x,
when the value of y is:
i 15 ii 30.
12 (Ex 9G) For his birthday, Ari was given a small white rabbit.
To monitor the rabbit’s development, Ari decided to
Week number 1 2 3 4 6 8 10 13 14 17 20
Length (cm) 20 21 23 24 25 30 32 35 36 37 39