MQ q3 ch09

MQ QLD 3 - Chapter 09 Page 375 Wednesday, May 26, 2004 11:17 AM
9 Dealing with data
Syllabus strand
Chance and data CD
Ex 9A Collecting data: CD 6.2
Ex 9B Presenting categorical and discrete data: CD 5.2
Ex 9C Representing data grouped into class intervals: CD 5.2
Ex 9D Measures of central tendency: CD 5.2
Ex 9E Measures of spread: CD 6.2, CD DB 6.2
Ex 9F Bivariate data: CD 6.2
Ex 9G Lines of best fit: CD 6.2

Age (years)          8     9     10    11    12    13    14    15    16    17    18
Best jump (metres)   4.31  4.85  5.29  5.74  6.05  6.21  —     6.88  7.24  7.35  7.57

Laura is training for the long jump event and has hopes of
making the Australian team at the next Olympic Games.
Laura has been competing since she was 8 years old, and
each year she has kept records of her best jump. When
Laura was 14, she did not compete and missed the
season’s competition.
The qualifying mark for the Australian Olympic team is
8.1 metres and the next Olympic Games will be held when
Laura is 20. Can you predict whether or not Laura will
qualify for the team?
Are you ready?

Try the questions below. If you have difficulty with any of them, extra help can be
obtained by completing the matching SkillSHEET. Either click on the SkillSHEET icon
next to the question on the Maths Quest 3 CD-ROM or ask your teacher for a copy.
9.1 Determining suitability of questions for a survey
1 Lauren was preparing a questionnaire for a survey on graphics calculator usage in the classroom.
Would the following be suitable questions?
a Do you own (or have access to) a graphics calculator?
b Do you agree that graphics calculators are too expensive?
c How frequently (on average) would you use a graphics calculator in a maths lesson?
9.2 Finding proportions

2 A school has 430 students in the junior school, 260 in the middle school and 170 in the senior
school. Determine the proportion of students in each of the three sections.
9.3 Distinguishing between types of data

3 Decide whether the following data are categorical or numerical. For categorical data, state whether
they are ordinal or nominal. For numerical data, state whether they are continuous or discrete.
a Height of students in Year 10
b Pets owned by students
c Position in the under-15 cross-country race
9.4 Reading bar graphs
4 The graph at right represents the favourite television shows of 500 teenagers.
a What are the most popular and least popular television shows?
b How many teenagers prefer watching comedy television shows?
c How many more students prefer soaps to thriller television shows?
[Bar graph: Favourite television shows. Vertical axis: Television shows (Comedy, Soaps, Police Drama, News, Documentaries, Cartoons, Science Fiction, Lifestyle, Thriller); horizontal axis: Percentage favouring, scaled 0 to 25%.]
9.5 Calculating the angle in a pie graph
5 The data in question 2 are used to construct a pie graph. What size sector angle would be used for
each category (to the nearest degree)?
9.6 Expressing one quantity as a percentage of another

6 For each of the following pairs, express the first quantity as a percentage of the second quantity (to
1 decimal place).
a 12 minutes, 1 hour b 75c, $1.75 c 4 days, 2 weeks
9.8 Determining independent and dependent variables

7 For each of the following, state the independent and the dependent variables.
a The number of kilograms of potatoes purchased and the total cost.
b The number of swimmers in a public swimming pool and the day temperature.
c The height and age of a student.

Collecting data
Generating data for statistical investigation
The basis of any statistical investigation is data. Such data may be available from a
source like the Australian Bureau of Statistics, but if this is not the case, you may have
to conduct your own research. You can obtain your own data for an investigation using
either observation or questioning.
If the collection of data does not involve responses from people, it can be obtained
by observation. An example of collecting data through observation would be counting
the number of defective items in each batch of items, as a quality control check.
When preparing to obtain data by observation, it is a good idea to organise a table in
which to record (or tally) the observations.
Once the table has been prepared, the data may be collected. Tallies
are best placed in groups of five and denoted as ‘gateposts’; that is, four
vertical strokes crossed by one diagonal stroke.
This makes the count of the total very convenient. Not only are the data recorded
neatly, you also have a ready-made frequency table for graphical representation and
data analysis.
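The tallying idea above can be sketched in Python. This is only an illustration: the list of observed defect types and the `tally_marks` helper are hypothetical, and the text string `||||/` merely stands in for a hand-drawn gatepost.

```python
from collections import Counter


def tally_marks(count):
    """Return a text tally: one '||||/' gatepost per five, then single strokes."""
    gateposts, strokes = divmod(count, 5)
    parts = ["||||/"] * gateposts
    if strokes:
        parts.append("|" * strokes)
    return " ".join(parts)


# Hypothetical observations recorded during a quality control check.
observations = ["scratch", "dent", "scratch", "crack", "scratch", "dent",
                "scratch", "scratch", "dent", "scratch", "scratch"]

# Counter builds the frequency table directly from the raw observations.
frequency_table = Counter(observations)
for category, frequency in frequency_table.most_common():
    print(f"{category:<8} {tally_marks(frequency):<12} {frequency}")
```

Grouping the strokes in fives means the total for each row can be read off by counting gateposts rather than individual strokes.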
To collect data that require responses from people, questioning is used. Once again,
it is best to be prepared before collecting the data, which in this case involves designing
a questionnaire. When preparing a questionnaire, it is imperative that questions are clear
and to the point. Questions entailing a number of responses must include categorised
(that is, tabulated) possible answers. It is always a good idea to include the category
‘other’ to cover any responses that are not listed. Questions requiring written responses
should be avoided as these are time-consuming to collect and difficult to analyse.
Once the questionnaire is ready, the data may be collected in a number of ways; that
is, by interviewing people or by letting people fill them in themselves.
Investigation: Problems collating data
Suppose that you want to conduct a survey on Internet usage. Survey a group of
people, giving them the questions set out below. After questioning the participants,
collate your results.
1 Have you ever used the Internet?
2 Where do you access the Internet?
3 How often do you use the Internet?
4 What type of computer do you have?
5 For what purpose do you use the Internet?
6 What would you rather do: go to see a movie, or go on the Net to chat (that is,
use a chat room)?
7 What time of day do you use the Internet?
8 What do you like about the Internet?
Having completed your survey:
a Discuss as a class the problems that you may have had in collating the answers
to some questions.
b Identify which questions you consider suitable for a questionnaire.
c Redesign those questions that were difficult to collate and make them more
suitable.
Sampling
If the collection of data is to be done through questioning, the most important step after
preparing a questionnaire is to decide who to ask. For the most accurate results a
census is required; that is, the entire population must be questioned. However, this is
usually practical for only small populations. For large populations statisticians usually
opt for a sample; that is, a group of people whose opinions will, hopefully, reflect the
opinions of the whole population.
It is important to decide how many people to include in a sample. As a general guide,
if the size of the population is N, the sample size should be about $\sqrt{N}$. For example, if
the population is 100 people, a good sample size would be $\sqrt{100} = 10$ people.
To estimate sample size, n, use the rule $n \approx \sqrt{N}$, where N is the size of the
population.
There are numerous sampling techniques, but in this section we will discuss simple
random sampling and stratified random sampling. These particular techniques involve
using some form of random device for selecting people from the target population.
Such devices include numbered pieces of paper mixed in a hat, numbered balls in an
urn, tables of random digits and random number generators on calculators and com-
puters.
Simple random sampling

Simple random sampling consists of obtaining a sample from the target population, so
that the selection of any person (or object) in the population is equally likely.
WORKED Example 1
A city council representative wishes to survey the parents of children attending any one of
the council’s 5 kindergartens. According to council’s records, the total enrolment in the
kindergartens is 250.
a Determine how many parents will be surveyed.
b Describe a procedure for obtaining a simple random sample.
a 1 THINK: Write down the rule to estimate a sample size.
    WRITE: $n = \sqrt{N}$, where $N = 250$
  2 THINK: Substitute known values into the rule and calculate.
    WRITE: $n = \sqrt{250} = 15.811\,388\,3$
  3 THINK: Round your answer correct to the nearest whole number.
    WRITE: $n \approx 16$
  4 THINK: Answer the question.
    WRITE: Sixteen parents will need to be surveyed.
b   THINK: Describe a sampling procedure where the participants are chosen randomly.
    WRITE: Assign each child in the kindergarten a number from 1 to 250. Number 250 pieces
    of paper and place them into a container. Select 16 pieces of paper from the container
    and match these to the children. The parents of these children will participate in the
    survey.
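Worked example 1 can also be sketched in Python. This is a minimal illustration rather than a prescribed method: the function names are hypothetical, and `random.sample` plays the role of drawing numbered pieces of paper from a container without putting them back.

```python
import math
import random


def sample_size(population_size):
    """Rule of thumb from the text: n is about the square root of N."""
    return round(math.sqrt(population_size))


def simple_random_sample(population_size, seed=None):
    """Pick member numbers from 1..N so that every member is equally likely."""
    rng = random.Random(seed)
    n = sample_size(population_size)
    # random.sample draws without replacement, so no number can repeat.
    return sorted(rng.sample(range(1, population_size + 1), n))


n = sample_size(250)
chosen = simple_random_sample(250, seed=1)
print(n)        # 16
print(chosen)   # 16 distinct numbers between 1 and 250
```

The `seed` argument is only there so that a run can be repeated; leaving it as `None` gives a different sample each time.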

Generating random numbers using a calculator or computer
You can generate a set of random numbers on a scientific calculator, a graphics calculator
or a spreadsheet. Consider the case in the previous worked example.
Scientific calculator
Pressing the RANDOM (or Ran#) key on a scientific calculator generates a random decimal
number between 0 and 1. This number must then be multiplied by the
value representing the size of the population, N. The result is then rounded up to the
nearest whole number.
Note: By always rounding up we ensure that 0 is never the result and this method
also ensures that the last number assigned to the population has an equal chance of
being generated.
Suppose that the random number generated is 0.217. Multiply this by 250 to obtain
the result 54.25, which is then rounded up to 55. This procedure will need to be
repeated a further 15 times.
Note: Repeated values are discarded, as parents cannot be surveyed twice. If this
occurs, an extra random number must be generated.
Some calculators will generate random integers between set limits. If your calculator
has a RANDI function this can be done. Enter a lower limit of 1 and an upper limit of
250, then close the brackets and press ENTER . This will need to be repeated a further
15 times.
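The scientific calculator procedure can be imitated in Python. The sketch below is an illustration only: `member_number` and `draw_members` are hypothetical names, and `rng.random()` stands in for the RANDOM (or Ran#) key.

```python
import math
import random


def member_number(random_decimal, population_size):
    """Multiply the random decimal by N and round up, as described in the text."""
    return math.ceil(random_decimal * population_size)


def draw_members(population_size, how_many, seed=None):
    """Repeat the procedure until enough different numbers have been generated."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < how_many:
        number = member_number(rng.random(), population_size)
        # rng.random() may return exactly 0.0, which rounds up to 0; skip it.
        # Repeated values are also discarded and another number is generated.
        if number == 0 or number in chosen:
            continue
        chosen.append(number)
    return chosen


print(member_number(0.217, 250))        # 55, as in the text
print(draw_members(250, 16, seed=3))    # 16 different numbers from 1 to 250
```

Python's `random.randint(1, 250)` returns a random integer between the two limits inclusive, which corresponds to the RANDI approach.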
Graphics calculator
A graphics calculator can be used to produce a list of 16 random integers between 1
and 250 by following these steps.
Casio:
1. Enter the RUN mode from the MAIN MENU.
2. Press OPTN F6 ( ) F4 (NUM) F2 (Int) (
then EXIT .
3. Press F3 (PROB) F4 (Ran #).
4. Complete the expression by pressing the keys
× 2 5 0 + 1 ) EXE .
5. Each time the EXE key is pressed, a random
integer in the range 1 to 250 is produced. Pressing
this key a total of 16 times will produce 16 random
integers.
TI-83:
1. Press MATH .
2. Use the arrow keys to select PRB.
3. Select option 5: randInt(.
4. Enter the lower limit (1), the upper limit (250),
the number of random integers required (16),
then close the brackets and press ENTER .
5. The list will now be displayed across the screen.
You will need to use the right arrow key to see
the full list.
Spreadsheet
An Excel spreadsheet uses a similar method to a scientific calculator.
1. In cell A1 enter the formula =INT(RAND()*250+1). This formula generates a random
decimal number between 0 and 1, multiplies the result by 250 and adds 1. Only the
whole number part is then considered.
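Note: Adding 1 and then taking only the whole number part (that is, rounding down)
gives the same result as rounding the product RAND()*250 up, which is the scientific
calculator method (unless the product is already a whole number).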
2. Use the Fill Down function to copy this formula down to cell A16.
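The relationship between the spreadsheet formula and the calculator method can be checked with a short Python sketch. The two function names are hypothetical; `int()` keeps only the whole number part, like INT in the spreadsheet formula for positive values.

```python
import math


def spreadsheet_style(random_decimal, population_size):
    """Mimic =INT(RAND()*N+1): multiply, add 1, keep the whole number part."""
    return int(random_decimal * population_size + 1)


def round_up_style(random_decimal, population_size):
    """Mimic the scientific calculator method: multiply, then round up."""
    return math.ceil(random_decimal * population_size)


# The two methods agree whenever the product is not a whole number.
print(spreadsheet_style(0.217, 250), round_up_style(0.217, 250))   # 55 55

# They differ by 1 when the product is exactly a whole number.
print(spreadsheet_style(0.5, 250), round_up_style(0.5, 250))       # 126 125
```

Either way, a random decimal from 0 up to (but not including) 1 always produces a whole number from 1 to 250 with the spreadsheet formula.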
By clicking on the link below you can see further instructions on how to generate
random numbers by spreadsheet.
extension — Generating random numbers by spreadsheet
Stratified random sampling
The method of stratified random sampling consists of splitting the target population
into certain categories, called strata. People in each stratum (category) may be expected
to have opinions similar to each other, but different from those expressed by people in
other strata. Suppose that you wanted to obtain opinions of secondary school students.
You may divide the school into groups according to gender or individual year levels.
The size of the sample selected from each stratum is proportional to the size of this
stratum, as compared to the whole population. The sample from each stratum is
selected randomly (as discussed previously).
WORKED Example 2
The city council representative from worked example 1 decides that parents of 3-year-old
children might have different opinions on the quality of care from parents of 4-year-olds.
The total enrolment of 3-year-olds is 100 and the total of 4-year-olds is 150. Determine
how many parents will need to be surveyed from each category.
1 THINK: We have previously determined that 16 parents need to be surveyed.
  WRITE: n = 16
2 THINK: Calculate the number of parents to be selected from the 3-year-old stratum.
    i   Express the year level number as a fraction of the total.
    ii  Multiply this fraction by the number of parents to be surveyed.
    iii Round your answer to the nearest whole number where necessary.
  WRITE: 3-year-old proportion $= \frac{100}{250}$
         Number to be surveyed $= \frac{100}{250} \times 16 = 6.4 \approx 6$
3 THINK: Repeat step 2 for the 4-year-old stratum.
  WRITE: 4-year-old proportion $= \frac{150}{250}$
         Number to be surveyed $= \frac{150}{250} \times 16 = 9.6 \approx 10$
4 THINK: Write a short statement regarding the results obtained.
  WRITE: A total of 16 parents will need to be surveyed from the total group. The
  council representative will need to randomly select 6 parents from the
  3-year-old children’s group and 10 parents from the 4-year-old children’s
  group.
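The proportional allocation in worked example 2 can be written as a short Python sketch. The function name `stratified_allocation` and the dictionary layout are assumptions made for illustration.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size / total * sample_size)
        for name, size in strata_sizes.items()
    }


allocation = stratified_allocation({"3-year-olds": 100, "4-year-olds": 150}, 16)
print(allocation)   # {'3-year-olds': 6, '4-year-olds': 10}
```

Because each stratum is rounded separately, the rounded numbers will not always add up to the intended sample size, so it is worth checking the total before selecting people.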
Other random and non-random sampling methods are often used. Non-random methods
include convenience, volunteer, quota and judgement sampling. These methods, how-
ever, are statistically not as accurate as random sampling techniques, because it is easy
to introduce some kind of bias and also because they depend on the surveyor’s ability
to select an appropriate sample. Therefore, more confidence is placed in conclusions
drawn from samples obtained randomly.
remember
1. If the collection of the data does not involve responses from people, it can be
obtained by observation. It is always a good idea to prepare a table where
observations will be recorded (tallied) prior to collecting the data.
2. To collect data that require responses from people, questioning is used.
Preparation in this case involves designing a questionnaire. When preparing a
questionnaire, questions must be clear and to the point. It is always a good idea
to include the category ‘other’ to cover any responses that are not listed.
Questions requiring written explanations should be avoided.
3. An estimation of the sample size, n, is given by the square root of the
population size, N: $n \approx \sqrt{N}$.
4. Random sampling uses a random device to select people or objects from the
target population.
5. Simple random sampling ensures that every person or object in the population
is equally likely to be chosen.
6. Stratified random sampling splits the population into categories, called strata.
Opinions expressed by people from the same stratum may be similar to each
other, but may differ from those expressed by people from other strata. The
sample size from each stratum is proportional to that stratum size as compared
to the population size.
9A Collecting data
1 State whether the following data can be obtained by observation or by questioning.
a The number of students attending school each day
b The shoe size and the clothes size of the students in Year 10
c The usual means of arriving to school for Year 8 and 9 students
d The number of M&Ms of each colour in a pack
e The amount of M&Ms and Minties consumed weekly by students in grade 5
f The daily total number of visits to the local medical centre
g The average number of visits to the doctor per year for people in different age
groups
h The ranking of a new movie (on a scale from ‘don’t bother’ to ‘can’t miss’)
i The number of people attending a new movie in the first week of showing
2 Design a questionnaire on a movie theme (include at least 10 questions). Test the
questionnaire by asking someone from your class to fill it in. Did the test reveal any areas
that need to be improved? If so, refine your questionnaire.
(SkillSHEET 9.1: Determining suitability of questions for a survey)
3 Josef wishes to research people’s cooking habits and has organised a list of possible
questions that could be included in his survey. Determine the suitability of each of the
listed questions, justifying your answer.
a Do you ever cook?
b Do you enjoy cooking?
c Which part of the meal do you prefer to cook?
d How often do you cook?
e What is your favourite recipe?
f Are you offended if someone does not
like the food you have prepared?
g Classify your cooking ability.
h Do you use a dishwasher?
4 (Worked Example 1) The owners of a local gymnasium wish to conduct a survey of their
Gold Pass cardholders. According to their records, there are 200 such holders.
a Determine how many Gold Pass cardholders should be surveyed.
b Describe the procedure of obtaining a simple random sample.
5 multiple choice
To conduct a statistical investigation, Nathan
needs to obtain a simple random sample
from 400 students enrolled at his school.
a The appropriate sample size Nathan
should obtain is:
A 10 B 20 C 30 D 40

b Using the random number generator function on his calculator, Nathan produced a
string of decimal numbers. The first two decimal numbers were 0.221 and 0.043. If
each student is assigned a number from 1 to 400 inclusive, the first two random
numbers will correspond to which students on the roll?
A 5 and 1 respectively B 88 and 17 respectively
C 89 and 18 respectively D 8 and 1 respectively
6 Suggest possible strata for the stratified random samples from each of the following
populations.
a Students enrolled in the Bachelor of Science course at the Queensland University of
Technology.
b All students who have finished the first year of the Law and Commerce course at
James Cook University.
c All students who have graduated from the University of Queensland this year.
d Last year’s Computer Science graduates.
7 (Worked Example 2) A researcher for a hospital wishes to conduct a survey of the
patients in the trauma and cardiology wards. At the time of the survey there are
30 cardiology patients and 50 trauma patients. If 9 patients are to be surveyed,
determine how many patients should be surveyed from each category.
(SkillSHEET 9.2: Finding proportions)
8 multiple choice
A stratified random sample is being selected from a population of 100 individuals, who
have been divided into three strata. If the number of people in these strata is 30, 20 and
50, then the corresponding number of people selected from each stratum would be:
A 2, 3 and 5 respectively B 6, 4 and 10 respectively
C 3, 2 and 10 respectively D 3, 2 and 5 respectively.
Investigation: Non-random sampling
Research and explain each of the following non-random sampling methods.
1 Convenience sampling
2 Volunteer sampling
3 Quota sampling
4 Judgemental sampling
Presenting categorical and discrete data
Once data have been collected, the next step is to present them correctly and clearly. In
this section we will revise the methods of presenting data using column and bar graphs,
sector/pie graphs, pictographs, dot-plots and stem-and-leaf plots.
Types of data
All data can be divided into two major groups: categorical (or qualitative) and
numerical (or quantitative).
Categorical data are data that cannot be measured or counted, but can be categorised.
Examples of categorical data include eye colour or pizza sizes available at the
local takeaway. Categorical data may be divided into two groups — nominal and
ordinal. Nominal data divide a particular piece of information into subgroups, for
example eye colour (hazel, blue, green and so on). Ordinal data deal with a ranking
system, for example pizza sizes (family, large, medium, small).
Numerical data are data that can be measured or counted. Examples of numerical
data include students’ heights and the number of defective items in a batch of identical
items.
Numerical data in turn can be subdivided into two groups — discrete and con-
tinuous. Discrete data can assume only specific values and are usually associated with
counting, such as the number of defective items in a batch. Continuous data can take
any value within a certain range and are usually associated with measuring, such as the
height of students.
In this section we will consider the representation of categorical and numerical
discrete types of data.
Data may be classified under the following headings:

1. Categorical: data are placed in categories. The categories can then be described
as:
• nominal — the information is divided into subgroups.
• ordinal — the categories are in some type of ranked order.
2. Numerical: data are in numerical form. Numerical data can then take two
forms:
• discrete — the data can take only certain exact values, usually whole
numbers, and are associated with counting.
• continuous — the data can take any value within a certain range, and are
associated with measuring.
Bar and column graphs

Bar and column graphs are often used to represent categorical data (although numerical
data can also be represented with these graphs). Each bar (or column) represents a
single category or an observation and its length (or height) shows the frequency of each
category. The bar graph has frequencies on the horizontal axis, while the column graph
has frequencies on the vertical axis.

WORKED Example 3
Thirty people were asked which cereal they preferred for breakfast. The results were
recorded in the table below. Construct a column graph to represent this information.

Cereal                  Frequency
Kellogg’s Special K     7
Cornflakes              4
Coco Pops               6
Weet-Bix                3
Rice Bubbles            5
Kellogg’s Just Right    5
Total                   30

THINK
1 Rule up a set of axes on graph paper. Title the graph and label the horizontal
axis ‘Type of cereal’ and the vertical axis ‘Frequency’.
2 Scale the horizontal axis and vertical axis.
3 Draw the first column (rectangle) so that it reaches a vertical height of 7 units.
Label the section of the axis below the column as Kellogg’s Special K.
4 Leave a gap between the first and second columns (rectangles).
5 Repeat steps 3 and 4 for each of the remaining cereals.
DRAW
[Column graph: Cereal preferences for breakfast. Horizontal axis: Type of cereal (Kellogg’s Special K, Cornflakes, Coco Pops, Weet-Bix, Rice Bubbles, Kellogg’s Just Right); vertical axis: Frequency, scaled 0 to 7.]
Graphical representation of the data allows us to see the ‘whole picture at a glance’.
Many questions about the data can be answered by simply looking at the graph. You
will have done much of this in earlier years.
Compound and multiple graphs

When comparing two or more sets of data relating to the same topic at the
same time, the following are used:
(i) compound column graphs and bar graphs
(ii) multiple column graphs and bar graphs.
The table below compares the number of sales made per quarter (every 3 months)
of ‘His’ and ‘Her’ fashion house. The two graphs below represent this information
pictorially.

‘His’ and ‘Her’ fashion house: Sales per quarter
First month of the quarter   ‘His’ fashions   ‘Her’ fashions
January                      12 000           15 000
April                        10 000           12 000
July                         9 000            10 000
October                      14 000           14 000

• Compound column or bar graphs are drawn with one bar representing combined sets
of data. Individual bars are multicoloured, and each colour corresponds to a par-
ticular piece of data.
• Multiple column and bar graphs are drawn with bars representing the same sets of
data next to each other.
Which graph is easier to read? Why?
[Compound bar graph. Vertical axis: Beginning of quarter (Jan, Apr, July, Oct); horizontal axis: Sales ($1000), scaled 2 to 30. Legend: ‘His’ fashion, ‘Her’ fashion.]
[Multiple column graph. Horizontal axis: Jan, Apr, July, Oct; vertical axis: Sales ($1000), scaled 0 to 16. Legend: ‘His’ fashion, ‘Her’ fashion.]
Pie graphs
Pie graphs, also known as sector graphs, pie charts and circular graphs, are mostly used
to represent categorical data. The size of the sector is proportional to the size of that
category, as compared to the total.

WORKED Example 4
The data in the table below show certain family monthly expenses, expressed as a
percentage of the total monthly income. Represent the data on a pie graph.

Expenditure      Percentage of monthly income
Mortgage         30
Food             20
Childcare        20
Bills            10
Transport        10
Entertainment    5
Other            5
Total            100

1 THINK: Express each percentage as a common fraction over 100 and convert the
  fraction to an angle by multiplying by 360°.
  WRITE:
  Mortgage: $\frac{30}{100} \times 360^\circ = 108^\circ$
  Food: $\frac{20}{100} \times 360^\circ = 72^\circ$
  Childcare: $\frac{20}{100} \times 360^\circ = 72^\circ$
  Bills: $\frac{10}{100} \times 360^\circ = 36^\circ$
  Transport: $\frac{10}{100} \times 360^\circ = 36^\circ$
  Entertainment: $\frac{5}{100} \times 360^\circ = 18^\circ$
  Other: $\frac{5}{100} \times 360^\circ = 18^\circ$
2 THINK: Check that the total of all angles is 360°.
  WRITE: Total of all angles is 360°.
3 THINK: Use a pair of compasses to draw a circle and mark the centre.
4 THINK: Measure sectors corresponding to each angle to complete the pie graph.
[Pie graph: Monthly expenses. Sectors: Mortgage 30%, Food 20%, Childcare 20%, Transport 10%, Bills 10%, Entertainment 5%, Other 5%.]
Note: Sometimes the total sum of the sectors won’t add up to 360° exactly, but would
produce a total somewhere around 360° (say between 359° and 361°). This is due to
rounding. When constructing a pie graph by hand, we can ignore this. All we have to
do is measure out all but the last angle and let the last angle ‘take in’ any minor error
that occurred due to rounding.
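The angle calculation, together with the idea of letting the last sector take in any rounding error, can be sketched in Python. The function name `sector_angles` is hypothetical, and the data are the percentages from worked example 4.

```python
def sector_angles(values):
    """Convert category sizes to whole-degree sector angles that total 360."""
    total = sum(values.values())
    names = list(values)
    # Round every angle except the last to the nearest degree.
    angles = {name: round(values[name] / total * 360) for name in names[:-1]}
    # The last sector 'takes in' whatever is left, absorbing rounding error.
    angles[names[-1]] = 360 - sum(angles.values())
    return angles


expenses = {"Mortgage": 30, "Food": 20, "Childcare": 20, "Bills": 10,
            "Transport": 10, "Entertainment": 5, "Other": 5}
angles = sector_angles(expenses)
print(angles)
print(sum(angles.values()))   # 360
```

Dividing by the total rather than by 100 means the same sketch works for raw counts as well as percentages.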
Pictographs
Pictographs, also referred to as picture graphs or pictograms, allow data to be displayed
in a novel way using illustrations or symbols. A key or legend is always used to show
the number of items each symbol represents. Although pictographs can be interesting
and do make an impression, the rounding of data to suit key pictures causes loss of
detail and accuracy. Half pictures (or scaled down versions) can be used to represent
half of the data. However, smaller fractions such as one-third or one-quarter may be
quite difficult to illustrate using pictographs.
WORKED Example 5
The table below shows the number of people who attended a local primary school
(during the first 6 hours of voting time) to cast their vote in the last state election.
Represent the given data as a pictograph.

Time interval      Number of voters
8 am–9 am          60
9 am–10 am         85
10 am–11 am        100
11 am–12 noon      125
12 noon–1 pm       115
1 pm–2 pm          95

THINK
1 Rule and label a vertical axis. Title the graph.
2 Scale the vertical axis.
3 Include a key showing a symbol to represent the number of voters.
4 Place the appropriate number of symbols in their respective row next to
the relevant time interval. For example, 8 am – 9 am → 6 symbols
(6 × 10 = 60 voters).
DRAW
[Pictograph: Number of voters who attended during the first six hours of voting. Vertical axis: Time (8 am – 9 am, 9 am – 10 am, 10 am – 11 am, 11 am – 12 noon, 12 noon – 1 pm, 1 pm – 2 pm). Key: one symbol = 10 voters.]

Note: In worked example 5 the symbols were placed in rows. Alternatively, they could
have been placed in columns. Pictographs resemble bar graphs (or column graphs),
except that they lack the frequency axis.
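As a rough illustration of how a key turns frequencies into symbols, the sketch below builds one text row of a pictograph. The function name and the characters `#` (a full symbol, 10 voters) and `+` (a half symbol, 5 voters) are assumptions chosen for illustration.

```python
def pictograph_row(count, per_symbol=10, symbol="#", half_symbol="+"):
    """Return full symbols plus at most one half symbol for a frequency."""
    # Work in half symbols so that values such as 85 can be shown as 8.5 symbols.
    halves = round(count / (per_symbol / 2))
    full, half = divmod(halves, 2)
    return symbol * full + half_symbol * half


voters = {"8 am-9 am": 60, "9 am-10 am": 85, "10 am-11 am": 100,
          "11 am-12 noon": 125, "12 noon-1 pm": 115, "1 pm-2 pm": 95}
for interval, count in voters.items():
    print(f"{interval:<14} {pictograph_row(count)}")
```

Any frequency that is not a multiple of 5 would be rounded to the nearest half symbol, which is the loss of detail mentioned earlier in this section.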
Dot-plots
Dot-plots are similar to pictographs. Each observation is represented by a single dot. A
good feature of the dot-plot is that it can be constructed while in the process of col-
lecting the data. A horizontal axis is prepared by writing in possible values of observ-
ations or categories, and then the collection of data begins. Each time a certain value is
observed, a dot is placed in the corresponding column. Provided that the dots are
placed neatly in columns and are evenly spaced, by the end of the experiment the data
collected are also represented (displayed) graphically; that is, two steps are accom-
plished in one action.
WORKED Example 6
While waiting for her mum to pick her up from school, Anna watched the cars that were
passing by. Within 7 minutes Anna observed 4 sedans, 6 station wagons, 5 four-wheel
drives, 3 hatchbacks and 1 sports car. Represent these data using a dot-plot.
THINK DRAW
1 Draw an evenly scaled horizontal axis
and label it.
2 Write down the different types of cars
observed underneath the horizontal
axis.
3 Systematically work through the given
data and place a dot above the
appropriate type of vehicle for each
value recorded.
4 Title the dot-plot.
[Dot-plot: Cars observed by Anna. Horizontal axis: Cars (Sedan, Station wagon, 4-wheel drive, Hatchback, Sports car), with 4, 6, 5, 3 and 1 dots respectively.]
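A text version of the dot-plot from worked example 6 can be produced with a few lines of Python. This sketch places the dots in rows rather than columns for simplicity; the function name `dot_plot` and the use of `*` as the dot are assumptions.

```python
def dot_plot(counts, dot="*"):
    """Return one text row per category, with one dot per observation."""
    width = max(len(name) for name in counts)
    return [
        f"{name.ljust(width)} | {' '.join(dot * n)}"
        for name, n in counts.items()
    ]


cars = {"Sedan": 4, "Station wagon": 6, "4-wheel drive": 5,
        "Hatchback": 3, "Sports car": 1}
lines = dot_plot(cars)
print("Cars observed by Anna")
for line in lines:
    print(line)
```

Keeping the dots evenly spaced is what lets the lengths of the rows be compared at a glance.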
Stem-and-leaf plots
A stem-and-leaf plot, or stem plot, can be used if the data are initially recorded as a
string (or list) of numbers. Although stem-and-leaf plots are usually used to represent
discrete numerical data, they can also be used to represent continuous data if the data
are rounded off first. For example, if the distances between cities are rounded off to,
say, the nearest kilometre, they can then be displayed on a stem-and-leaf plot.
Data in stem-and-leaf plots are made up of two components: a stem and a leaf. The
final digit of a particular number is the leaf; the previous digit(s) form the stem.
Stem-and-leaf plots have leaves arranged in order of size, increasing away from
the stem. The final digit of a particular number is the leaf while the previous
digit(s) form the stem.
WORKED Example 7
The heights of 30 students (to the nearest cm) were measured and recorded as follows:
125, 143, 119, 136, 127, 131, 139, 122, 140, 118,
120, 123, 132, 134, 127, 129, 124, 131, 138, 133,
122, 128, 130, 135, 141, 139, 121, 138, 131, 126
Represent the data on a stem-and-leaf plot.
1 THINK: Rule up two columns headed ‘stem’ and ‘leaf’.
2 THINK: Make note of the smallest and largest values of the data (118 and 143). List the
  stems in ascending order in the first column.
  Note: The hundreds and tens component of the number represents the stem.
3 THINK: Systematically work through the given data and enter the leaf (unit component) of
  each value in a row beside the appropriate stem.
4 THINK: Include a key next to the plot which informs the reader of the significance of
  each entry.
  WRITE:
  Key 11 | 8 = 118 cm
  Stem   Leaf
  11     9 8
  12     5 7 2 0 3 7 9 4 2 8 1 6
  13     6 1 9 2 4 1 8 3 0 5 9 8 1
  14     3 0 1
5 THINK: Redraw the stem-and-leaf plot so that the numbers in each row of the leaf column
  are in ascending order. This is called an ordered stem-and-leaf plot.
  WRITE:
  Key 11 | 8 = 118 cm
  Stem   Leaf
  11     8 9
  12     0 1 2 2 3 4 5 6 7 7 8 9
  13     0 1 1 1 2 3 4 5 6 8 8 9 9
  14     0 1 3
Note: In worked example 7, the middle rows of leaves were too long. This can be over-
come by breaking stems into smaller intervals, say intervals of 5. The stem of 12 would
include all numbers from 120 to 124 inclusive and the stem of 12* would include all
numbers from 125 to 129 inclusive. In comparison to the stem-and-leaf plot in worked
example 7, the new plot would have extra rows and not look so bunched up.
Key 11 | 8 = 118 cm
Stem   Leaf
11     8 9
12     0 1 2 2 3 4 5 6 7 7 8 9
13     0 1 1 1 2 3 4 5 6 8 8 9 9
14     0 1 3

Key 11* | 8 = 118 cm
Stem   Leaf
11*    8 9
12     0 1 2 2 3 4
12*    5 6 7 7 8 9
13     0 1 1 1 2 3 4
13*    5 6 8 8 9 9
14     0 1 3
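Both the ordered plot and the version with stems broken into intervals of 5 can be generated from the heights in worked example 7. The Python sketch below is an illustration only; the function names are hypothetical, and the starred labels follow the convention described in the note above.

```python
def stem_and_leaf(values):
    """Ordered stem-and-leaf plot: the last digit is the leaf, the rest is the stem."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


def split_stem_and_leaf(values):
    """Split each stem in two: leaves 0 to 4 on 'stem', leaves 5 to 9 on 'stem*'."""
    plot = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        label = f"{stem}*" if leaf >= 5 else str(stem)
        plot.setdefault(label, []).append(leaf)
    return plot


heights = [125, 143, 119, 136, 127, 131, 139, 122, 140, 118,
           120, 123, 132, 134, 127, 129, 124, 131, 138, 133,
           122, 128, 130, 135, 141, 139, 121, 138, 131, 126]

print("Key 11 | 8 = 118 cm")
for stem, leaves in stem_and_leaf(heights).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

print("Key 11* | 8 = 118 cm")
for label, leaves in split_stem_and_leaf(heights).items():
    print(f"{label:<3} |", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first means every row of leaves comes out in ascending order, so the result is already an ordered plot.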

Stem-and-leaf plots can also be used to display two sets of data simultaneously. For
example, if we want to show the heights of boys and girls in the class on the same plot,
we can place leaves on either side of the stem; say, girls on the right and boys on the
left. Such a display is called a back-to-back stem-and-leaf plot.
Below is an example of a back-to-back stem-and-leaf plot that displays the heights of
boys and girls in a particular group.
Key 11 | 8 = 118 cm
  Leaf (Boys) | Stem | Leaf (Girls)
              |  11  | 8 9
      9 6 4 1 |  12  | 0 2 2 3 5 7 7 8
9 9 8 6 4 1 1 |  13  | 0 1 2 3 5 8
          3 1 |  14  | 0
Note: On the left side of the plot (boys’ heights) the leaves increase away from the stem.
remember
1. Data may be classified under the following headings:
Categorical: data are placed in categories (non-numerical form).
• Nominal: data are placed in subgroups.
• Ordinal: categories are in a ranked order.
Numerical: data are in numerical form.
• Discrete: counted in exact values.
• Continuous: take any value in a range.
2. In a bar graph each bar represents a single category or observation, while the
length corresponds to that observation’s frequency. In a column graph each
column represents a single category or observation while the height
corresponds to that observation’s frequency.
3. Compound and multiple column graphs and bar graphs are used to display two
or more sets of data simultaneously.
4. In sector graphs (pie graphs) each category is represented by a sector whose
size is proportional to the size of that category, as compared to the total.
5. Pictographs use symbols to represent a specific number of items. There is no
frequency axis on these graphs. The frequency is determined by multiplying
the number of symbols for a given category by the number that each symbol
represents (as shown on the legend or key).
6. Dot-plots use one dot to represent a single observation. Dots are placed in
columns (or rows), so that each column (or row) corresponds to a single category
or observation. Dot-plots can be constructed while the data are being collected.
7. Stem-and-leaf plots have leaves arranged in order of size, increasing away from
the stem. The final digit of a particular number is the leaf, the previous digit(s)
form the stem. Stem-and-leaf plots can also be used to display two sets of data
simultaneously.
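Point 4 above says that each sector is proportional to its category's share of the total. As a small illustration, the Python sketch below converts frequencies to sector angles. The function name `sector_angles` and the travel-to-school figures are hypothetical and are not taken from this chapter.

```python
def sector_angles(frequencies):
    """Return the angle (in degrees) of each sector: frequency / total x 360."""
    total = sum(frequencies.values())
    return {category: count * 360 / total
            for category, count in frequencies.items()}

# Hypothetical data: how 30 students travel to school.
travel = {"Walk": 10, "Bus": 15, "Car": 5}
print(sector_angles(travel))
```

The angles always add to 360°, which is a quick check before drawing the graph.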
9B Presenting categorical and discrete data
1 Decide whether the following data are categorical or numerical.
a Options for voting (Liberal, Labor, Democrats and so on)
b Students’ weights
c Heights of trees in a park
d Religious denomination (Catholic, Anglican, Muslim and so on)
e Head circumference of newborn babies
f Gender of newborn babies
g Student enrolment in every state school in Queensland
h Make of cars (Holden, Toyota, Nissan and so on)
i Distances between the Australian capitals
j Number of biscuits in a batch of 250 g packets
k Selection of Junior school subjects offered by McKinnon Secondary College
l The capacity of the public swimming pool in Townsville.
2 For each numerical piece of data in question 1 state whether it is discrete or continuous.
3 For each categorical piece of data in question 1 state whether it is nominal or ordinal.
4 (Worked example 3) Thirty-five people were asked to name their favourite movie of all
time. The results were recorded in the table below. Construct a column graph to
represent this information.
Movie Frequency
Pearl Harbor 2
Titanic 4
Crocodile Dundee 12
Batman 10
The Mask 7
Total 35

5 The graph at right shows the average monthly temperatures during one year
in a certain city.
[Bar graph: Months (Jan to Dec) on the vertical axis; Temperature °C (2 to 30) on the horizontal axis]
a What was the lowest average temperature recorded?
b In what month did the average temperature reach its maximum?
c In which months was the average temperature the same?
d What was the difference in average temperatures between December and June?
e In which country do you think the temperatures shown on the graph could
have been recorded?
6 multiple choice
The graph at right shows the enrolment of students in a particular
school in a number of Senior subjects.
[Graph: Subject (Economics, Chemistry, Maths A, Maths B, Physics) against Number of students enrolled; legend: Boys, Girls]
a The type of graph being displayed is a:
A bar graph
B column graph
C compound bar graph
D multiple bar graph
b The subject that has the largest enrolment of boys is:
A Economics B Chemistry C Maths A D Maths B
c The subject that has the largest enrolment of students is:
A Chemistry B Maths A C Maths B D Physics
d The subject that has the least enrolment of girls is:
A Economics B Maths A C Maths B D Physics
e Which of the following statements is not true?
A The number of girls studying Economics is about the same as those studying
Chemistry.
B Physics has the smallest number of students enrolled.
C The number of boys and girls studying Maths B is about the same.
D There are as many boys studying Physics as there are girls studying Chemistry.
7 The table below shows the production of three different models of bicycles (in
thousands) in four consecutive years. (Note: The production of model A was stopped
in 1998.) Construct a multiple column graph to represent these data.
Year Model A Model B Model C
1996 12 16 20
1997 8 20 23
1998 — 26 26
1999 — 28 30
8 (Worked example 4) The data in the table below show the distribution of the budget
allocated to promotional advertising in a big cosmetic company (all numbers are
expressed as a percentage of the total budget). Represent these data on a pie graph.
Type of promotion Percentage allocated
TV commercials 45
Radio commercials 10
Advertisements in major newspapers 5
Advertisements in women’s magazines 25
Promotions in major shops 12
Promotions in beauty salons 3
Total 100
9 During the day Maya recorded the amount of time spent on various activities, as
shown below.
Activity | Sleep | Work | Travel | Cooking | Eating | Housework | Watching TV | Shopping
Time spent (h) | 7.5 | 8 | 1.5 | 1 | 1 | 1.5 | 2 | 1.5
a Represent these data on a pie graph.
b What percentage of Maya’s day is spent doing various types of work (that is, at her
workplace, cooking, housework and shopping)?
c What percentage of the day did Maya spend watching TV?
10 (Worked example 5) The number of ‘Happy meal deals’ sold at the local McDonald’s
restaurant in one particular week is shown in the table below. Represent these data
as a pictograph.
Day of the week Number of ‘Happy meal deals’ sold
Monday 30
Tuesday 25
Wednesday 35
Thursday 40
Friday 65
Saturday 135
Sunday 110

11 The pictograph at right shows the number of cars sold by a small car yard
over the first six weeks of the new year.
[Pictograph: one row of symbols for each of Week 1 to Week 6]
a If 7 cars are sold during the 4th week, how many cars are represented
by each symbol?
b How many cars does half a symbol represent?
c How many more cars were sold during the 6th week than during the
2nd week?
12 (Worked example 6) In a class of 25 students, the marks obtained on the Mathematics
test were distributed as follows: 3 A+s, 7 As, 8 B+s, 6 Bs and 1 C+. Represent these
data using a dot-plot.
13 (Worked example 7) The ages of the people arriving at a hospital emergency room during
one morning were recorded as follows: 12, 29, 48, 62, 67, 23, 69, 21, 19, 73, 82, 17,
46, 20, 51, 64, 24, 66, 34, 35, 80, 28, 27, 61, 75, 45, 18, 26, 32, 59. Represent the
data on a stem-and-leaf plot.
14 The time (to the nearest second) taken by each student in the class to run a certain
distance was recorded in the table below.
Boys 42 48 46 39 43 38 45 47 51 42 50 50 49
Girls 51 50 46 47 42 40 58 59 52 49 48 44 56
a Represent the data on a back-to-back stem-and-leaf plot with the stems 3, 4 and 5.
b Redraw your stem-and-leaf plot so that the stems are now 3*, 4, 4*, 5 and 5*.
c Which graph in your opinion gives a better ‘feel’ of the data? Explain your answer.
Maths Quest challenge
Which of the five pieces shown below cannot be found in the jigsaw
puzzle? (Do not flip any piece over.)
[Figure: jigsaw puzzle and five pieces labelled A to E]
What is holography?
[Graph: Hockey games won by a club, by year (1997 to 2002)]
Year | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | Total
Frequency | A | B | C | D | E | F | 36
[Column graph: Student absences; Week number (1 to 8) against Number of students absent (0 to 21)]
Week number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency | G | H | I | J | L | M | N | O
[Pictograph: Number of music CDs by name; one symbol = 2 music CDs]
Name | Sue | Bill | Dave | Jill | Ben | Ann
Frequency | P | Q | R | S | T | U
11 9 8 11 8 2 9 21 12 17 16 8 1 10
14 0 1 5 16 2 12 21 15 7 11 9 0 8 8
5 12 6 8 21 13 12 1 21 7 18 14 9 1 11 1
1 10 7 21 1 4 3 8 2 11 16 13 12 21 15
18 7 13 8 0 4 8 7 6 13

Representing data grouped into class intervals
When the set of discrete data consists of a large number of different values, or the data
are continuous, it is convenient to group the data into class intervals. If the size of the
class interval is not specified, as a general guide try to group the data so that 5 to 10
class intervals are formed. Grouped data can then be represented (or displayed) using
frequency tables, histograms and frequency polygons. In this section we will consider
these methods of presenting grouped data.
Frequency tables
A frequency table shows the number of scores (frequencies) that belong to each group
or class interval.
WORKED Example 8
The following set of data shows the number of lollies in each of thirty 500 g bags. Place the
data into a frequency table, using class intervals of 5.
59, 62, 51, 55, 46, 60, 58, 49, 64, 57, 53, 50, 56, 61, 54,
53, 55, 55, 61, 58, 54, 52, 57, 58, 59, 51, 48, 60, 60, 56
THINK WRITE
1 Prepare a table with three columns

headed Class interval, Tally and
Frequency.
2 Make note of the smallest and largest
numbers in the set and determine the first
and last class intervals required.
Note: The smallest value is 46 and the
largest value is 64; therefore, the class
intervals will range from 46–50 to 61–65
inclusive.
3 Go through the list and complete the
tally column.
4 Write down the total tally strokes for
each class interval in the frequency
column.
5 Calculate the total of the frequency
column. Note: A total of 30 should be
obtained.
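The tallying in steps 2 to 5 can be checked with a short program. The sketch below is one possible way to do it in Python; the function name `frequency_table` is an assumption of this illustration, and it assumes whole-number data with inclusive class intervals such as 46–50.

```python
from collections import Counter

# Number of lollies in each of the thirty bags (worked example 8).
lollies = [59, 62, 51, 55, 46, 60, 58, 49, 64, 57, 53, 50, 56, 61, 54,
           53, 55, 55, 61, 58, 54, 52, 57, 58, 59, 51, 48, 60, 60, 56]

def frequency_table(data, start, width):
    """Count the values in each inclusive class interval of the given width.

    Assumes whole-number data and that start is no larger than the smallest value.
    """
    counts = Counter((value - start) // width for value in data)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        table.append((f"{low}-{low + width - 1}", counts.get(k, 0)))
    return table

for interval, frequency in frequency_table(lollies, start=46, width=5):
    print(interval, frequency)
print("Total", len(lollies))
```

The frequencies should add to 30, matching the check in step 5.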
Histograms and frequency polygons

Once the data are grouped into a frequency table, they can be presented as a histogram
or a frequency polygon.
A histogram is similar to a column graph, but it does not have any gaps between the
columns. The frequency is always placed on the vertical axis and the data on the
horizontal axis.
A frequency polygon is a line graph, joining the midpoints of the tops of the adjacent
columns of a histogram.
WORKED Example 9
The data in the frequency table below show the mass (in kg) of 40 people upon joining a
weight loss program. Represent the given data using a:
a histogram
b frequency polygon.
Class interval Frequency
60–<70 2
70–<80 5
80–<90 9
90–<100 12
100–<110 7
110–<120 3
120–<130 2
Total 40
THINK DRAW
a 1 Rule up a set of axes on graph paper. Title the graph. Label the horizontal
axis Mass (kg) and the vertical axis Frequency.
2 Scale the horizontal and vertical axes. Note: Leave half an interval at
the beginning and end of the graph.
3 Draw a column which represents the first class interval and reaches a
vertical height of 2 people.
4 Repeat step 3 for each of the other class intervals.
[Histogram: Mass of people joining a weight loss program; Mass (kg) from 60 to 130 on the horizontal axis, Frequency from 0 to 12 on the vertical axis]
b 1 Mark the midpoints of the tops of the rectangles obtained in the
histogram from part a.
2 Join the midpoints by straight line intervals.
3 Close the polygon by drawing lines which meet the horizontal axis a
half-column width before the first column and a half-column width
after the last column.
[Frequency polygon: Mass of people joining a weight loss program; same axes as the histogram in part a]
A graphics calculator can be used for constructing histograms. However, when the data
are grouped, the midpoint of each class interval must be entered for x-values, rather
than the extreme values (that is, the beginning and end point).
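The midpoints needed for the calculator, and the points of the frequency polygon from worked example 9, can be worked out as follows. This Python sketch is only an illustration; the variable names are chosen for this example.

```python
# Class intervals (lower end, upper end) and frequencies from worked example 9.
intervals = [(60, 70), (70, 80), (80, 90), (90, 100),
             (100, 110), (110, 120), (120, 130)]
frequencies = [2, 5, 9, 12, 7, 3, 2]

# The midpoint of each class interval is the average of its two end points.
midpoints = [(low + high) / 2 for low, high in intervals]

# The polygon joins (midpoint, frequency) points and is closed on the
# horizontal axis half a column width before the first column and half a
# column width after the last column.
width = intervals[0][1] - intervals[0][0]
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, frequencies))
           + [(midpoints[-1] + width, 0)])

print(midpoints)
print(polygon)
```

Here the closing points are 55 and 135 on the mass axis, each half a column (5 kg) beyond the ends of the histogram.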
Graphics Calculator tip! Entering data and drawing histograms
You can store the data from a frequency distribution table on your graphics calculator.
Consider the data from worked example 9.
Casio:
1. Enter the STAT mode from the MAIN MENU.
2. In List1 enter the midpoint of each class and in List2
enter the frequencies.
3. Press F1 (GRPH). Press F6 (SET) and use the arrow keys to select the Graph Type
as Hist by pressing F6 (▷) then F1 (Hist). Ensure that X List is List1 and
Frequency is List2. Press EXE.
4. Press F4 (SEL) and set StatGraph1 as On. Press F6 (DRAW). Set the interval to
start at 65 with a pitch of 10. Press F6 (DRAW) to display the histogram.
TI-83:
1. Press STAT then choose option 1: Edit and enter the midpoint of each class in
the L1 column and the frequencies in the L2 column. You are now ready to use the
calculator to draw the histogram.
2. Press 2nd [STATPLOT] and choose option 1: Plot 1. Use the arrow keys to select
ON, then the histogram icon. Then set the Xlist to L1 and the Freq to L2.
3. Press ZOOM and select 9: ZoomStat. The histogram should now be on screen.
remember
1. Class intervals are used when:
(a) data are spread over a wide range
(b) there is a large amount of data
(c) the data are continuous.
The size of a class interval should lead to the formation of 5 to 10 groups.
2. A frequency table shows the number of scores (frequencies) that belong to each
group or class interval.
3. A histogram resembles a column graph without gaps between each column. A
frequency polygon is a line graph that connects the midpoints of the tops of the
adjacent columns of a histogram.
9C Representing data grouped into class intervals
1 (Worked example 8) The following set of data shows the pulse rate of 30 gym members,
10 minutes after they have started exercising on a treadmill. Place the data into a
frequency table, using class intervals of 10.
164, 136, 171, 144, 128, 130, 165, 170, 120, 124, 143, 157, 136, 172, 168,
165, 121, 137, 162, 146, 166, 130, 152, 169, 148, 132, 125, 142, 150, 129
2 (Worked example 9) The data in the frequency table below show the number of houses
of different sizes (measured in square metres) in a small block.
Represent the given data using a:
a histogram
b frequency polygon.
Size of house (m²) Frequency
100–<150 13
150–<200 18
200–<250 19
250–<300 17
300–<350 14
350–<400 11
Total 92
3 a The graph at right shows a histogram for a certain set of data.
Add the frequency polygon to the graph.
[Histogram: Data (10 to 90) on the horizontal axis, Frequency (0 to 300) on the vertical axis]
b The graph at right shows a frequency polygon for a certain set of data.
Add a histogram to the graph.
[Frequency polygon: Data (110 to 190) on the horizontal axis, Frequency (0 to 20) on the vertical axis]
4 The data below show the number of books borrowed from the school library by 30
Year 10 students over the last year.
20, 39, 15, 0, 2, 41, 25, 7, 19, 30, 5, 12, 26, 4, 32,
40, 10, 15, 22, 19, 33, 7, 16, 9, 34, 28, 24, 11, 8, 12
a Group the data into a frequency table in class intervals of size 10.
b Represent the grouped data on a histogram.
c Add a frequency polygon to the data.
5 For the data in question 4:
a Complete a frequency table using a class interval size of 5.
b Represent the data on a histogram.
c Compare this histogram with the one drawn in question 4.
Maths Quest challenge
Here is a challenging visual puzzle. In what order should you pick up
the sticks so that you are always removing the top stick?
[Figure: pile of overlapping sticks, each labelled with a letter]
Investigation: Misuse of graphs
es Collect several graphs from newspapers or magazines.
For each graph:
1 What point is the graph trying to make?
2 Has the graph been presented fairly? If not, how has the display been altered to
make a greater impression?
3 Suppose you are trying to present the opposite point of view. Draw a graph
using the same data to present the opposite point of view.
1
1 Sonya wants to find out which is the most popular video hired from the local video
rental outlet. Would this be best achieved by observation or questioning?
2 A small country town has a population of 4900. If the local council wants to conduct
a survey on the population, what would be an appropriate sample size?
3 In the survey in question 2, the participants are chosen using a random number
generator. A scientific calculator is used and the decimal number 0.516 is generated.
Assuming that the population of the town is numbered 1 to 4900 on the electoral roll,
what is the number of the chosen participant?
4 This town is a mining town, with a population of 3100 males and only 1800 females.
How many males should participate in the survey if a stratified sample is chosen?
5 Various brands of paint are tested to see the area of wall that can be painted with 1 litre
of each brand. Are the data being examined categorical or numerical? If they are
categorical, are they nominal or ordinal? If they are numerical, are they discrete or
continuous?
6 Fifty students were surveyed on their favourite subject. Twenty students selected
Mathematics, 13 selected English, 7 selected History and 10 selected Geography.
Construct a column graph to represent this information.
7 Represent the data from question 6 in a pie graph.
8 Represent the following data on students’ arm span (to the nearest centimetre) on a
back-to-back stem-and-leaf plot.
Girls 152, 148, 139, 169, 151, 143, 142, 148, 152
Boys 161, 169, 181, 191, 162, 153, 185, 161, 152
9 A group of students were tested to see how many sit-ups they could do in 60 seconds.
Construct a histogram and frequency polygon for the tabled data.
Number 1–15 16–30 31–45 46–60 61–75
Frequency 1 10 21 32 2
10 The time that students take to travel to school is summarised in the table below.
Time (mins) 0–<5 5–<10 10–<15 15–<20 20–<25 25–<30 30–<35
Frequency 2 5 10 8 6 4 1
Represent the data as a pictograph.
Measures of central tendency

Measures of central tendency are summary statistics which measure the middle (or
centre) of the data. These are known as the mean, median and mode.
• The mean is the average of all observations in a set of data.
• The median is the middle observation in an ordered set of data.
• The mode is the most frequent observation in a data set.
Ungrouped data
Mean
To obtain the mean of a set of ungrouped data, all numbers (scores) in the set are added
together and then the total is divided by the number of scores in that set.
$\text{mean} = \frac{\text{sum of all scores}}{\text{number of scores}}$
Symbolically this is written $\bar{x} = \frac{\sum x}{n}$.
Median
The median is the middle value of any set of data arranged in numerical order. In the
set of n numbers, the median is located at the $\frac{n+1}{2}$th score. The median is:
• the middle score for an odd number of scores arranged in numerical order
• the average of the two middle scores for an even number of scores arranged in
numerical order.
Mode
The mode is the score that occurs most often in a set of data. Sets of data may contain:
1. no mode; that is, each score occurs once only
2. one mode
3. more than one mode.
WORKED Example 10
For the data set 6, 2, 4, 3, 4, 5, 4, 5 find the:
a mean b median c mode.
THINK WRITE
a 1 Calculate the sum of the scores; that is, Σx.
    Σx = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5 = 33
2 Count the number of scores; that is, n.
    n = 8
3 Write the rule for the mean.
    $\bar{x} = \frac{\sum x}{n}$
4 Substitute the known values into the rule.
    $\bar{x} = \frac{33}{8}$
5 Evaluate.
    $\bar{x} = 4.125$
6 Answer the question.
    The mean is 4.125.
THINK WRITE
b 1 Check that scores are arranged in numerical order.
    2 3 4 4 4 5 5 6
2 Locate the position of the median using the rule $\frac{n+1}{2}$ where n = 8. This
places the median as the 4.5th score; that is, between the 4th and 5th score.
    2 3 4 4 4 5 5 6 (the 4th and 5th scores are both 4)
3 Obtain the average of the two middle scores.
    Median = $\frac{4+4}{2} = \frac{8}{2} = 4$
4 Answer the question.
    The median is 4.
c 1 Systematically work through the set and make note of any repeated values (scores).
    2 3 4 4 4 5 5 6 (4 occurs three times; 5 occurs twice)
2 Answer the question.
    The mode is 4.
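The three answers in worked example 10 can be checked by computer as well as by calculator. The sketch below uses Python's standard `statistics` module; it is offered only as a check and is not part of the worked example.

```python
import statistics

scores = [6, 2, 4, 3, 4, 5, 4, 5]

mean = statistics.mean(scores)        # sum of the scores divided by n
median = statistics.median(scores)    # average of the two middle scores when n is even
modes = statistics.multimode(scores)  # list of every score with the highest count

print(mean, median, modes)
```

`multimode` returns a list because a data set may have more than one mode; here the list contains only 4.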
Graphics Calculator tip! Finding the mean and median of ungrouped data
To calculate the mean and median for the given set of observations 1, 2, 2, 4, 4, 4, 5, 6,
8, 9, using a graphics calculator, the following steps must be employed.
Casio:
1. Enter the STAT mode from the MAIN MENU.
2. Enter the scores as List1. Press F2 (CALC) then
F6 (SET). Ensure that 1Var Xlist is set at List1 and
1Var Freq is set at 1. Press EXE .
3. Press F1 (1 VAR). The arrow keys can be used to scroll down the list. The mean x̄
is shown as 4.5 and the median, Med, is shown as 4.
TI-83:
1. Press STAT then select 1: Edit and enter the scores
in the L1 column.
2. Again press STAT , arrow across to choose the
CALC option and select 1: 1-Var Stats.
3. Press ENTER to display a list of key statistics. The mean and median can be found
among this list. You will need to use the arrow keys to scroll the screen in order
to see the median. The mean is given as x̄ = 4.5 and the median is given as Med = 4.
Calculating mean, median and mode from a frequency distribution table

If data are presented in a frequency distribution table, the formula used to calculate the
mean is $\bar{x} = \frac{\sum (f \times x)}{n}$. Here, each value (score) in the table is
multiplied by its corresponding frequency; then all the (f × x) products are added together
and the total sum is divided by the number of observations in the set.
To find the median we find the position of each score from the cumulative frequency
column. The mode is the score with the highest frequency.
WORKED Example 11
For the table below find the:
a mean b median c mode.
Score (x) | Frequency (f)
4 | 1
5 | 2
6 | 5
7 | 4
8 | 3
Total | 15
THINK WRITE
1 Rule up a table with four columns titled Score (x), Frequency (f),
Frequency × score (f × x) and Cumulative frequency (cf).
2 Enter the data and complete both the f × x and cumulative frequency columns.
Score (x) | Frequency (f) | Frequency × score (f × x) | Cumulative frequency (cf)
4 | 1 | 4 | 1
5 | 2 | 10 | 3
6 | 5 | 30 | 8
7 | 4 | 28 | 12
8 | 3 | 24 | 15
 | n = 15 | Σ(f × x) = 96 |
a 1 Write the rule for the mean.
    $\bar{x} = \frac{\sum (f \times x)}{n}$
2 Substitute the known values into the rule and evaluate.
    $\bar{x} = \frac{96}{15} = 6.4$
3 Answer the question.
    The mean of the data set is 6.4.
b 1 Locate the position of the median using the rule $\frac{n+1}{2}$ where n = 15.
This places the median as the 8th score.
    The median is the $\frac{15+1}{2}$th or 8th score.
2 Use the cumulative frequency column to find the 8th score and answer the question.
    The median of the data set is 6.
c 1 The mode is the score with the highest frequency.
    The score with the highest frequency is 6.
2 Answer the question.
    The mode of the data set is 6.
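The same working can be sketched in Python. The variable names below are chosen for this illustration, and the median step assumes an odd number of scores (as in worked example 11), so that $\frac{n+1}{2}$ is a whole-number position.

```python
from itertools import accumulate

# Frequency distribution table from worked example 11.
scores = [4, 5, 6, 7, 8]
freqs = [1, 2, 5, 4, 3]

n = sum(freqs)                                          # 15 scores in total
mean = sum(f * x for x, f in zip(scores, freqs)) / n    # sum of (f x x) divided by n

# The cumulative frequency column locates the median score.
cumulative = list(accumulate(freqs))                    # [1, 3, 8, 12, 15]
position = (n + 1) / 2                                  # the 8th score
median = next(x for x, cf in zip(scores, cumulative) if cf >= position)

# The mode is the score with the highest frequency (first one found if tied).
mode = scores[freqs.index(max(freqs))]

print(mean, median, mode)
```

For an even number of scores the median position falls between two scores, and the two scores on either side of it would need to be averaged.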
Graphics Calculator tip! Finding the mean (and other key statistics) of data in a
frequency distribution table
To find the mean of data presented in a frequency distribution table you need to store
the data in two lists.
Casio:
1. Enter the STAT mode from the MAIN MENU. Enter the scores in List1 and the
frequencies in List2.
2. Press F2 (CALC) then F6 (SET). Set the 1Var Xlist as List1 and the 1Var Freq as
List2. Press EXE .
3. Press F1 (1VAR) to display the key statistics. The mean is the value given for x̄.
TI-83:
1. Press STAT then select 1: Edit and enter the scores in the L1 column and the
frequencies in the L2 column.
2. Again press STAT , choose the CALC menu and select 1:1-Var Stats. Press 2nd
[L1] followed by a comma, then 2nd [L2] .
3. Press ENTER to display the key statistics. The mean is the value given for x̄.
Grouped data
Mean
When the data are grouped into class intervals, the actual values (or data) are lost. In
such cases we have to approximate the real values with the midpoints of the intervals
into which these values fall. For example, when measuring heights of students in a
class, if we found that 4 students had a height between 180 and 185 cm, we have to
assume that each of those 4 students is 182.5 cm tall. The formula used for calculating
the mean is the same as for data presented in a frequency table:
$\bar{x} = \frac{\sum (f \times x)}{n}$
Here x represents the midpoint (or class centre) of each class interval, f is the
corresponding frequency and n is the total number of observations in a set.
Median
The median cannot be found exactly when the data have been grouped. Instead we can
locate the median class from the cumulative frequency.
Modal class
We do not find a mode because exact scores are lost. We can, however, find a modal
class. This is the class interval that has the highest frequency.
WORKED Example 12
For the given data:
a estimate the mean
b find the median class
c find the modal class.
Class interval | Frequency
60–<70 | 5
70–<80 | 7
80–<90 | 10
90–<100 | 12
100–<110 | 8
110–<120 | 3
Total | 45
THINK WRITE
1 Draw up a table with 5 columns headed Class interval, Midpoint (x), Frequency (f),
Frequency × midpoint (f × x) and Cumulative frequency (cf).
2 Complete the x, f × x and cf columns.
Class interval | Midpoint (x) | Frequency (f) | Frequency × midpoint (f × x) | Cumulative frequency (cf)
60–<70 | 65 | 5 | 325 | 5
70–<80 | 75 | 7 | 525 | 12
80–<90 | 85 | 10 | 850 | 22
90–<100 | 95 | 12 | 1140 | 34
100–<110 | 105 | 8 | 840 | 42
110–<120 | 115 | 3 | 345 | 45
 | | Σf = 45 | Σ(f × x) = 4025 |
a 1 Write the rule for the mean.
    $\bar{x} = \frac{\sum (f \times x)}{n}$
2 Substitute the known values into the rule and evaluate.
    $\bar{x} = \frac{4025}{45} \approx 89.4$
3 Answer the question.
    The mean for the given data is approximately 89.4.
b 1 The median is the $\frac{n+1}{2}$th score. Use this to locate the median score.
    Median = $\frac{n+1}{2}$th score = $\frac{45+1}{2}$th score = 23rd score
2 Locate the class interval that contains this median score.
    The 23rd score lies in the 90–<100 class.
3 Answer the question.
    The median class is the 90–<100 class interval.
c 1 The modal class is the class interval with the highest frequency.
2 Answer the question.
    The modal class is the 90–<100 class interval.
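A short program can reproduce the estimates in worked example 12. The Python below is a sketch under the same assumption the text makes, namely that every score in a class is treated as sitting at the class midpoint; the variable names are chosen for this illustration.

```python
from itertools import accumulate

# Class intervals (lower end, upper end) and frequencies from worked example 12.
intervals = [(60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120)]
freqs = [5, 7, 10, 12, 8, 3]

midpoints = [(low + high) / 2 for low, high in intervals]
n = sum(freqs)

# Estimated mean: sum of (frequency x midpoint) divided by n.
estimated_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Median class: the first class whose cumulative frequency reaches (n + 1) / 2.
cumulative = list(accumulate(freqs))
median_position = (n + 1) / 2
median_class = next(c for c, cf in zip(intervals, cumulative)
                    if cf >= median_position)

# Modal class: the class interval with the highest frequency.
modal_class = intervals[freqs.index(max(freqs))]

print(round(estimated_mean, 1), median_class, modal_class)
```

Each class is printed as a pair of end points, so `(90, 100)` stands for the 90–<100 class interval.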
Graphics Calculator tip! Finding the mean (or other key statistics) for grouped data
Casio:
The method used here is the same as that shown on page 399 for entering data in a
frequency distribution table. The scores to be entered in the List1 column are the
midpoints of the class intervals.
To find the mean or other key statistics, press F2 (CALC) then F6 (SET). Ensure
that the 1Var Xlist is set as List1 and the 1Var Freq is set as List2 and press EXE .
Pressing F1 (1VAR) displays the key statistics, the mean being shown as the value
of x̄.
TI-83:
The method used here is the same as that shown on page 400 for entering data in a
frequency distribution table. The scores to be entered in the L1 column are the
midpoints of the class intervals.
To find the mean or other key statistics, press STAT , choose the CALC menu, select
1:1-Var Stats, and press ENTER .
remember
For ungrouped data the following measures of central tendency are used.
1. The mean is the sum of scores in a given set of data divided by the number of
scores in the set.
$\bar{x} = \frac{\sum x}{n}$ is used when a list of scores is given.
$\bar{x} = \frac{\sum (f \times x)}{n}$ is used when a frequency distribution table is given.
2. The median is:
(a) the middle score for an odd number of scores arranged in numerical order
(b) the average of the two middle scores for an even number of scores arranged
in numerical order.
Its location is determined by finding the score in the $\frac{n+1}{2}$th position.
3. The mode is the score that occurs most often in a set of data.
For grouped data the following measures of central tendency are used.
4. The mean is $\bar{x} = \frac{\sum (f \times x)}{n}$, where x represents the midpoint of a
class interval.
5. The median class can be determined from the cumulative frequency.
6. The modal class is given by the class interval with the highest frequency.
9D Measures of central tendency
1 (Worked example 10) For each of the following sets of data find the:
i mean ii median iii mode.
a 3, 5, 6, 8, 8, 9, 10
b 4, 6, 7, 4, 8, 9, 7, 10
c 17, 15, 48, 23, 41, 56, 61, 52
d 4.5, 4.7, 4.8, 4.8, 4.9, 5.0, 5.3
e 7½, 10¼, 12, 12¼, 13, 13½, 13½, 14
2 The following back-to-back stem-and-leaf plot shows the test results of 25 Year 10
students in Mathematics and Science. Find the mean, median and mode for each of
the two subjects.
Key: 3 | 2 = 32
Leaf (Science) | Stem | Leaf (Mathematics)
8 7 3 | 3 | 2 9
9 6 2 2 1 | 4 | 0 6 8
8 7 6 1 1 0 | 5 | 1 3 5
9 7 4 3 2 | 6 | 2 6 7 9
8 5 1 0 | 7 | 3 6 7 8
7 3 | 8 | 0 4 4 6 8 9
 | 9 | 2 5 8
3 (Worked example 11) For the data shown in each of the following frequency
distribution tables, find the:
i mean ii median iii mode.
a
Score (x) | Frequency (f)
4 | 3
5 | 6
6 | 9
7 | 4
8 | 2
Total | 24
b
Score (x) | Frequency (f)
12 | 4
13 | 5
14 | 10
15 | 12
16 | 9
Total | 40
4 The following data show the number of bedrooms in each of the 10 houses in a
particular neighbourhood: 2, 1, 3, 4, 2, 3, 2, 2, 3, 3.
a Calculate the mean and median number of bedrooms.
b A local motel contains 20 rooms. Add this observation to the set of data and
recalculate the values of the mean and median.
c Compare the answers obtained in parts a and b and complete the following
statement: When the set of data contains an unusually large value(s), called an outlier,
the (mean/median) is the better measure of central tendency, as it
is less affected by this extreme value.
5 (Worked example 12) For the given data:
a estimate the mean b find the median class c find the modal class.
Class interval Frequency
40–<50 2
50–<60 4
60–<70 6
70–<80 9
80–<90 5
90–<100 4
Total 30
6 Calculate the mean of the grouped data shown in the table below.
100–<109 3
110–<119 7
120–<129 10
130–<139 6
140–<149 4
Total 30
7 Find the modal class of the data shown in the table below.
51–<55 1
56–<60 3
61–<65 4
66–<70 5
71–<75 3
76–<80 2
Total 18

8 multiple choice
The number of textbooks sold by various bookshops during the second week of
December was recorded. The results are summarised in the table below.
Number of books sold Frequency
220–<229 2
230–<239 2
240–<249 3
250–<259 5
260–<269 4
270–<279 4
Total 20
a The modal class of the data is given by the class interval(s):

A 220–<229 and 230–<239 B 250–<259
C 260–<269 and 270–<279 D of both A and C
b The midpoint of the first class interval is:
A 224 B 224.5 C 225 D 225.5
c The median of the data is in the interval:
A 230–<239 B 240–<249 C 250–<259 D 260–<269
d The mean of the data is:
A 251 B 252 C 253 D 254
9 A random sample was taken, composed of 30 people shopping at a Coles supermarket

on a Tuesday night. The amount of money (to the nearest dollar) spent by each person
was recorded as follows:
6, 32, 66, 17, 45, 1, 19, 52, 36, 23, 28, 20, 7, 47, 39
6, 68, 28, 54, 9, 10, 58, 40, 12, 25, 49, 74, 63, 41, 13.
a Find the mean and median amount of money spent at the checkout by the people in
this sample.
b Group the data into class intervals of 10 and complete the frequency distribution
table. Use this table to estimate the mean amount of money spent.
c Add the cumulative frequency column to your table and fill it in. Find the median
class.
d Compare the mean and the median of the original data from part a, with the mean
and the median class obtained for grouped data in parts b and c. Were the estimates
good enough? Explain your answer.
10 a Add one more number to the set of data 3, 4, 4, 6, so that the mean of a new set is
equal to its median.
b Design a set of five numbers so that mean = median = mode = 5.
c In the set of numbers 2, 5, 8, 10, 15 change one number, so that the median
remains unchanged, while the mean increases by 1.
Career profile
GRAHAM DE HOEDT — Meteorologist
Qualifications:
Bachelor of Applied Science
Diploma of Meteorology
Master of Applied Science
Employer:
Bureau of Meteorology
Company website:
www.bom.gov.au
I was interested in science when I was younger. I enjoyed mathematics at school and
used to read about the great scientists and inventors such as Sir Isaac Newton, Johannes
Kepler, Nicolaus Copernicus and Albert Einstein. The fields of climatology and
meteorology use many of the principles of mathematics and physics which I’ve found so
interesting.
Climate work, such as climate forecasting or long-term rainfall prediction, involves the
use of statistics and statistical methods. As well as using statistics to look at what has
happened in the past (investigating the historical data) and what is happening now,
we use statistical and dynamical climate models to try and predict what will happen in
the future.
For example we can get an idea of the average rainfall conditions for a particular
place by looking at the mean rainfall. We can also get an idea of expected extreme rainfall
conditions, by looking at percentile or decile rainfall information.
In the work I do it is necessary to know: the difference between the mean and the
median, what the term 25th, 50th or 75th percentile means and what the term 10th,
50th or 90th decile means.
For example, the formula for rainfall variability is:
$\text{Rainfall variability} = \frac{90p - 10p}{50p}$
Where 90p, 50p and 10p = 90th, 50th and 10th percentiles respectively.
A few weeks ago, a scientist from the CSIRO needed a map showing the 80th
percentile rainfall over Australia. To do this I used 100 years of historical rainfall data for
all the available rainfall measuring stations across Australia. I then used the historical

data to calculate the 80 percentile rainfall scientific fields, where mathematics is
value for each station, and used this to absolutely essential in day to day work. In
generate the map. many scientific areas, such as meteorology
After this was completed, I continued and climatology, mathematics is used as a
working on the average temperature atlas for tool to model many of the natural processes
Australia. The temperature atlas will have such as cloud formation and the movement of
maps showing mean monthly and annual weather systems. I believe it is essential to
maximum and minimum temperature across have an understanding of mathematics and be
Australia. These maps can be used to get an able to use it as part of the problem solving
idea of average temperature conditions for process.
anywhere in the country. They are useful if Questions
you are planning a trip or if you want to know 1. Give two examples of the type of data
whether a particular crop will grow in a that Graham calculates the mean of.
particular area. 2. Explain what a percentile is.
I think that an understanding of 3. Find out what a decile is.
mathematics is useful in just about any area 4. What is the difference between
of work. This is particularly true in the climatology and meteorology?
Measures of spread
Range
The most basic measure of spread is the range. It is defined as the difference between
the highest and the lowest values in the set of data.
range = highest score − lowest score or
range = Xmax − Xmin
WORKED Example 13
Find the range of the given data set: 2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2.
THINK WRITE
1 Identify the lowest score of the data set. Lowest score = 2.1
2 Identify the highest score of the data set. Highest score = 5.2
3 Write the rule for the range. Range = highest score − lowest score
4 Substitute the known values into the rule. = 5.2 − 2.1
5 Evaluate. = 3.1
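The rule in worked example 13 can also be checked with a few lines of Python. This is only a sketch: the function name data_range is made up for this illustration and is not part of the course.

```python
def data_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)


# Data set from worked example 13
scores = [2.1, 3.5, 3.9, 4.0, 4.7, 4.8, 5.2]

# Rounded to one decimal place to hide floating-point noise
print(round(data_range(scores), 1))
```

The scores do not need to be in order, since max and min search the whole list.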
Interquartile range
Another way of measuring the difference in spread is by dividing the data set into
quarters. The number that marks the end of the first quarter of an ordered data set is called
the lower quartile and is denoted by QL (or the 25th percentile). The number that marks
the end of the third quarter is called the upper quartile and is denoted by QU (or the
75th percentile).
The difference between the upper and lower quartiles is called the interquartile range
(IQR). It considers the middle 50% of the data.
IQR = QU − QL
The lower quartile, upper quartile and the interquartile range of a set of data may be
calculated using the following steps.
1. Order the set of data.
2. Locate the median that divides the set of data into two halves.
(a) For an odd number of scores, the median will be one of the original scores. It
should not be included in either the lower or upper half of the scores.
(b) For an even number of scores the median will lie halfway between two scores. It
will divide the data into two equal sets.
3. Locate and calculate QL, the median of the lower half of the data.
4. Locate and calculate QU, the median of the upper half of the data.
5. Obtain the interquartile range by calculating the difference between the upper and
lower quartiles; that is, IQR = QU − QL.
WORKED Example 14
Calculate the interquartile range (IQR) of the following set of data: 3, 2, 8, 6, 1, 5, 3, 7, 6.
THINK WRITE
1 Arrange the scores in order. 1 2 3 3 5 6 6 7 8
2 Locate the median and use it to divide the data into two halves. Note: The median is the
5th score in this data set and should not be included in either half of the data. 1 2 3 3 | 5 | 6 6 7 8
3 Find QL, the median of the lower half of the data. $Q_L = \frac{2 + 3}{2} = \frac{5}{2} = 2.5$
4 Find QU, the median of the upper half of the data. $Q_U = \frac{6 + 7}{2} = \frac{13}{2} = 6.5$
5 Calculate the interquartile range. IQR = QU − QL
= 6.5 − 2.5
= 4
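The five steps for finding the quartiles can be sketched in Python as follows. The function names median and quartiles are chosen for this illustration only. The sketch follows the median-of-halves method described above, which can give slightly different answers from the quartile methods built into some software.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: the middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: halfway between two scores


def quartiles(scores):
    """Return (QL, QU) using the median-of-halves method (needs 2 or more scores)."""
    ordered = sorted(scores)                # step 1: order the data
    half = len(ordered) // 2                # size of each half
    lower = ordered[:half]                  # step 2: for an odd number of scores
    upper = ordered[len(ordered) - half:]   # the median is left out of both halves
    return median(lower), median(upper)     # steps 3 and 4


# Data set from worked example 14
q_lower, q_upper = quartiles([3, 2, 8, 6, 1, 5, 3, 7, 6])
print(q_lower, q_upper, q_upper - q_lower)  # step 5: IQR = QU - QL
```

For this data set the sketch gives QL = 2.5, QU = 6.5 and IQR = 4, agreeing with the working above.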
Graphics Calculator tip! Finding the range and interquartile range
To find the range and interquartile range of the data in worked example 14, follow these
steps:
Casio:
1. Enter the STAT mode from the MAIN MENU. Enter the data as List1.
2. Press F2 (CALC) then F6 (SET). Ensure that 1Var Xlist is set as List1 and 1Var Freq
is set as 1 and press EXE . Press F1 (1VAR) to display the key statistics.
3. Scroll down the list of key statistics to find the ones of interest.
TI-83:
1. Press STAT , choose option 1: Edit and enter the data as L1.
2. Again press STAT , arrow across to choose the CALC menu and select 1: 1-Var
Stats. Press ENTER to display the key statistics.
3. Scroll down through the list of key statistics to find the ones of interest.
Use minX and maxX to find the range, and use Q1 and Q3 (for QL and QU) to find the
interquartile range.
Boxplots
Boxplots (or box-and-whisker plots) are constructed using a five-number summary
which includes the lowest value of the set, the lower quartile, the median, the upper
quartile and the highest value of the set; that is, Xmin, QL, Median, QU, Xmax. The
vertical ends of a box extend from the lower to the upper quartile and contain a vertical
line, indicating the location of the median. Whiskers extend to the smallest and to the
largest values on either side of the box.
[Figure: boxplot labelled, from left to right, Xmin (lowest value), QL (lower quartile), Median, QU (upper quartile) and Xmax (highest value)]
From the diagram it can be seen that:
• the interquartile range is represented by the partitioned box
• the median is a vertical line within the box
• the whiskers represent the range of scores.
WORKED Example 15
a Represent the following set of data using a boxplot: 4, 5, 5, 6, 9, 10, 12, 14, 15.
b State the: i range and ii IQR of the data.
THINK WRITE
a 1 Check that the scores are in ascending order. 4 5 5 6 9 10 12 14 15
2 State the smallest number in the set. Lowest value = 4
3 State the largest number in the set. Highest value = 15
4 Find the median (5th score). Median = 9
5 Find the value of QL and QU. $Q_L = \frac{5 + 5}{2} = \frac{10}{2} = 5$ and $Q_U = \frac{12 + 14}{2} = \frac{26}{2} = 13$
6 Draw a horizontal axis which is evenly scaled and incorporates the given values.
7 Draw a box representing the interquartile range which begins at 5 (QL) and ends at 13 units (QU).
8 Draw a vertical line within the box at 9 units (the median).
9 Draw two horizontal lines, one extending from the smallest value to the lower quartile end of the box, the
other extending from the upper quartile end of the box to the highest value.
[Figure: boxplot of the data on an axis scaled from 2 to 16]
b i 1 Write the rule for the range. Range = highest value − lowest value
2 Substitute the values into the rule. = 15 − 4
3 Evaluate. = 11
b ii 1 Write the rule for the interquartile range. IQR = QU − QL
2 Substitute the values into the rule. = 13 − 5
3 Evaluate. = 8
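The five-number summary used to draw the boxplot in worked example 15 can also be produced with a short Python sketch. The helper names median and five_number_summary are invented for this illustration, and the quartiles are found by the median-of-halves method used in this chapter.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(scores):
    """Return (Xmin, QL, Median, QU, Xmax) for 2 or more scores."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]                  # scores below the median
    upper = ordered[len(ordered) - half:]   # scores above the median
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


# Data set from worked example 15
x_min, q_lower, med, q_upper, x_max = five_number_summary([4, 5, 5, 6, 9, 10, 12, 14, 15])
print(x_min, q_lower, med, q_upper, x_max)
print("range =", x_max - x_min, "IQR =", q_upper - q_lower)
```

The five values give the positions of the left whisker end, the left edge of the box, the median line, the right edge of the box and the right whisker end.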
Graphics Calculator tip! Drawing a boxplot and finding the five-number summary
Both the boxplot and the five-number summary can be obtained easily using a graphics
calculator.
Use the instructions given below to construct a boxplot and obtain the five-number
summary for the following data: 4, 5, 12, 17, 18, 20, 26, 30, 39, 42, 45.
Casio:
1. In the STAT mode, enter the data as List1.
2. Press F1 (GRPH) then F6 (SET). Set Graph Type as
MedBox by pressing F6 () then F2 (Box) with
XList as List1 and Frequency as 1. Press EXE .
3. Press F4 (SEL) and ensure that StatGraph1 is On.
Press F6 (DRAW) to display the boxplot.
4. Press SHIFT F1 (Trace). Use the arrow keys to
explore the boxplot and locate the five-number
summary.
TI-83:
1. Press STAT , choose EDIT and enter the data as L1.
2. Press 2nd [STAT PLOT] and press 1 and select
ON. Then use the arrow keys to choose the box-and-
whisker plot.
3. Xlist will need to be L1 while Freq must become 1.
4. Press ZOOM then choose option 9: ZoomStat to
display your graph.
5. Press TRACE to explore the boxplot and locate the
five-number summary.
Outliers
An outlier is a piece of data which is considerably different from the rest of the values
in a set of data. The presence of an outlier may be an indication that an error has been
made in recording the data. Outliers may alter the representative nature of any statistics
calculated, as illustrated below.
For the data set 3, 3, 2, 1, 3, 4, 2, 3, 2, 2 the measures of central tendency are:
mean = 2.5 median = 2.5 mode = 2 and 3.
When an outlier, say 48, is added to the original data set, the measures of central
tendency for the new list, 3, 3, 2, 1, 3, 4, 2, 3, 2, 2, 48 are: mean = 6.64, median = 3,
mode = 2 and 3. It can be seen that when an outlier is added to a set of data, the mean
may not be truly representative of values in the data set. In this case the median (or
mode) would be a better measure of central tendency than the mean.
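The effect of the outlier on each measure can be checked with Python's built-in statistics module. This is a sketch only; the variable names are chosen for this illustration, and multimode needs Python 3.8 or later.

```python
from statistics import mean, median, multimode

original = [3, 3, 2, 1, 3, 4, 2, 3, 2, 2]
with_outlier = original + [48]   # add the outlier 48

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label,
          "mean =", round(mean(data), 2),
          "median =", median(data),
          "modes =", sorted(multimode(data)))
```

The mean moves from 2.5 to about 6.64, while the median only moves from 2.5 to 3 and the modes do not change.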
An outlier can be defined as any value which is more than 1.5 × interquartile range
above the upper quartile value or more than 1.5 × interquartile range below the lower
quartile value.
When drawing a boxplot the whiskers do not extend as far as any outliers. The
whiskers stop at the last score that is not an outlier, with crosses placed at the value of any
outliers.
WORKED Example 16
As newly appointed coach of Omizzolo’s Shooting
Stars basketball team, Maria decided to record
each player’s statistics for the previous season. The
number of goals scored by the leading goal shooter
were:
3, 18, 30, 29, 25, 25, 36, 27, 28, 28, 28, 23,
1, 22, 23, 19, 19, 20, 2, 26, 29, 30, 30, 25.
a Prepare a boxplot for the data, showing the
position of any outliers.
b Suggest reasons for any outliers obtained.
THINK WRITE
a 1 Arrange the scores in ascending order. 1, 2, 3, 18, 19, 19, 20, 22, 23, 23, 25, 25,
25, 26, 27, 28, 28, 28, 29, 29, 30, 30, 30, 36
2 State the smallest number in the set. Lowest value = 1
3 State the largest number in the set. Highest value = 36
4 Find the median (12.5th score). Median = 25
5 Find the value of QL and QU. QL = 19.5, QU = 28.5
6 Find the IQR. IQR = 28.5 − 19.5
= 9
7 Check for outliers. Calculate 1.5 × IQR. 1.5 × IQR = 1.5 × 9
= 13.5
Subtract this result from QL to find the lower limit. QL − 13.5 = 19.5 − 13.5 = 6
Add the result to QU to find the upper limit. QU + 13.5 = 28.5 + 13.5 = 42
Outliers are values lower than 6 and higher than 42.
8 Write the values of any outliers. There are 3 outliers; they are: 1, 2 and 3.
9 Draw a horizontal axis that is evenly scaled and incorporates the given values.
10 Draw a box representing the interquartile range that begins at 19.5 (QL) and ends at 28.5 units (QU).
11 Draw a vertical line within the box at 25 units (the median).
12 Draw the whiskers. The upper whisker is drawn from the top end of the box to the highest score (36). The
lower whisker is drawn from the bottom end of the box to the score of 18 which is the lowest score that is not
an outlier.
13 Place crosses on the outliers.
[Figure: boxplot on a ‘Number of goals’ axis scaled from 0 to 40, with the box from 19.5 to 28.5, the median at 25, whiskers ending at 18 and 36, and crosses at 1, 2 and 3]
b Comment on any outliers obtained and suggest reasons for their presence. The low number of goals scored
(outliers) could be due to a number of reasons such as the goal shooter playing poorly, the team’s
inability to get the ball to the goal shooter, injuries to the team and goal shooter, or the
superior playing ability of the opposition.
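The 1.5 × IQR check from step 7 of worked example 16 can be sketched in Python. The function names median and find_outliers are invented for this illustration, and the quartiles are found by the median-of-halves method used in this chapter.

```python
def median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def find_outliers(scores):
    """Return (lower limit, upper limit, outliers, whisker ends)."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    q_lower = median(ordered[:half])
    q_upper = median(ordered[len(ordered) - half:])
    step = 1.5 * (q_upper - q_lower)         # 1.5 x IQR
    low_limit = q_lower - step
    high_limit = q_upper + step
    outliers = [x for x in ordered if x < low_limit or x > high_limit]
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    # The whiskers stop at the last scores that are not outliers
    return low_limit, high_limit, outliers, (inside[0], inside[-1])


goals = [3, 18, 30, 29, 25, 25, 36, 27, 28, 28, 28, 23,
         1, 22, 23, 19, 19, 20, 2, 26, 29, 30, 30, 25]
low_limit, high_limit, outliers, whiskers = find_outliers(goals)
print(low_limit, high_limit, outliers, whiskers)
```

For the goal-shooter data the limits are 6 and 42, the outliers are 1, 2 and 3, and the whiskers end at 18 and 36, matching the boxplot in the example.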
Graphics Calculator tip! Drawing a boxplot with outliers
Casio:
To draw a boxplot with outliers such as in the previous
example we need to set the StatGraph with the Outliers
option On.
TI-83:
To draw a boxplot with outliers such as in the example
above, we need to choose the modified boxplot option.
The steps are the same as previously shown ( 2nd
[STAT PLOT]). However, when selecting the type of
graph, we use the modified boxplot option as shown at
right.
Use your graphics calculator to draw the boxplot for worked example 16.
remember
1. Range = highest score − lowest score or range = Xmax − Xmin
2. The difference between the upper and lower quartiles is called the interquartile
range, IQR. IQR = QU − QL. The IQR accounts for the middle 50% of the data.
3. A boxplot is a graphical representation of the five-number summary; that is, the
lowest score, lower quartile, median, upper quartile, highest score, for a
particular set of data. It consists of a partitioned box and a whisker at either end
that extends to the extreme scores.
[Figure: boxplot labelled Xmin (lowest value), QL (lower quartile), Median, QU (upper quartile) and Xmax (highest value)]
4. Any piece of data that is considerably different from the rest of the values in a
set of data is called an outlier. When a set of data includes an outlier, the
median (or mode) rather than the mean is a better measure of central tendency.
9E Measures of spread
WORKED Example 13
1 Find the range for each of the following sets of data.
a 4, 3, 9, 12, 8, 17, 2, 16
b 49.5, 13.7, 12.3, 36.5, 89.4, 27.8, 53.4, 66.8
c $7\frac{1}{2}$, $12\frac{3}{4}$, $5\frac{1}{4}$, $8\frac{2}{3}$, $9\frac{1}{6}$, $3\frac{3}{4}$
WORKED Example 14
2 Calculate the interquartile range, IQR, for the following sets of data.
a 3, 5, 8, 9, 12, 14 b 7, 10, 11, 14, 17, 23
c 66, 68, 68, 70, 71, 74, 79, 80 d 19, 25, 72, 44, 68, 24, 51, 59, 36
3 The following stem-and-leaf plot shows the mass of newborn babies (rounded to the
nearest 100 g). Find the:
a range of the data b IQR of the data.
Key 1*| 9 = 1.9 kg
Stem Leaf
1* 9
2* 24
2* 6789
3* 001234
3* 55678889
4* 01344
4* 56689
5* 0122
WORKED Example 15
4 Represent each of the following sets of data using a boxplot. State the:
i range and ii IQR of each set.
a 6, 9, 12, 13, 20, 22, 26, 29
b 7, 15, 2, 26, 47, 19, 9, 33, 38
c 120, 99, 101, 136, 119, 87, 123, 115, 107, 100
5 The following set of data shows the ages of 30 people who attended a concert.
18, 26, 10, 12, 20, 18, 19, 10, 19, 17, 17, 9, 11, 13, 16
14, 14, 13, 12, 13, 24, 10, 12, 15, 14, 12, 16, 18, 11, 13
a Draw a boxplot of these data using a graphics calculator.
b State the range and the interquartile range of the data.
6 multiple choice
The diagram at right shows the heights of a group of students.
[Figure: boxplot of heights with five-number summary 138, 148, 156, 160 and 172 cm]
a The interquartile range of the data is:
A 34 B 18 C 12 D 22
b Which of the following is not true?
A 50% of the students are shorter than 1.56 m.
B The number of students shorter than 1.48 m is less than the number of those taller
than 1.60 m.
C The range of the heights of the students in the group is 34 cm.
D 75% of students have a height of 1.6 m or under.
WORKED Example 16
7 As newly appointed coach of Terrorolo’s Meteors netball team, Kate decided to record
each player’s statistics for the previous season. The number of goals scored by the
leading goal shooter was:
1, 3, 8, 18, 19, 23, 25, 25, 25, 26, 27, 28,
28, 28, 28, 29, 29, 30, 30, 33, 35, 36, 37, 40.
a Prepare a boxplot for the data, showing the position of any outliers.
b Suggest reasons for any outliers obtained.
8 The following back-to-back stem-and-leaf plot shows the ages of 30 pairs of men and
women when entering their first marriage.
Key: 1 | 6 = 16 years old
Leaf (Men) Stem Leaf (Women)
998 1 67789
99887644320 2 001234567789
9888655432 3 01223479
6300 4 1248
60 5 2
a Use a graphics calculator to construct a pair of parallel boxplots to represent the two
sets of data. (Parallel boxplots are those that share a common scale and are placed
one above the other.)
b Find the mean, median, range and interquartile range of each set.
c Find any outliers, if they exist, for each set.
d Write a short paragraph comparing the two distributions. (Use mathematical
evidence, particularly the answers to part b.)
Investigation: Standard deviation
Statisticians use range and interquartile range as measures of the spread of data.
They are not the only measures used, however. The most frequently used measure
is the standard deviation.
Consider the data set below:
5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
1 Find the range of the data.
2 Find the interquartile range of the data.
Now consider what happens if we change the first and last figures in the data set.
Consider the new data set:
9, 10, 10, 13, 15, 16, 17, 20, 21, 22.
5 After changing only two figures in the data set, describe the effect on the range
and the interquartile range.
Now consider another slight change to the data set:
9, 10, 12, 13, 15, 16, 17, 17, 21, 22.
8 We have now changed only 4 figures in the data set. Describe the effect that
this has had on the range and the interquartile range.
Again consider the original data set: 5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
Change the data set to 5, 6, 10, 16, 17, 18, 19, 20, 24, 25.
11 Compare the range and interquartile range of this data set with those of the
original data set.
The standard deviation is a measure of spread that considers every score in the data
set. Again, consider the original data set:
5, 10, 10, 13, 15, 16, 17, 20, 21, 25.
We want to find a measure of the typical distance that each score lies from the
mean.
12 Find the mean.
We can find the distance of each score from the mean by subtracting the mean from
the score. This is called the deviation and is written as $x - \bar{x}$. Because some of these
deviations will be negative, we square each one to obtain a positive answer. These are called
the squared deviations and are written as $(x - \bar{x})^2$.
13 Complete the table below.
Score $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$
5
10
10
13
15
16
17
20
21
25
14 Next find the sum of the squared deviations, $\sum (x - \bar{x})^2$, and divide this result
by the number of scores, $n$. The result of this calculation is called the variance.
15 Finally, because the deviations were squared, we compensate for this by
taking the square root of the variance. This is the standard deviation, denoted
$\sigma_n$. Written as a formula:
$$\sigma_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
16 In practice we do not go through this long procedure in each case. The
standard deviation can be found on a calculator by using the σn function. Enter
the data as a list on your graphics calculator.
17 The CALC function of 1 Variable Stats can be used to see the standard
deviation listed among other key statistics of the data.
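The procedure in steps 12 to 15 can also be written as a short Python sketch, which may be useful for checking a completed table. The function name population_std_dev is made up for this illustration. It divides by n, as in the formula for σn above, and not by n − 1.

```python
import math


def population_std_dev(scores):
    """Standard deviation found by dividing the squared deviations by n."""
    n = len(scores)
    mean = sum(scores) / n                                  # step 12
    squared_deviations = [(x - mean) ** 2 for x in scores]  # step 13
    variance = sum(squared_deviations) / n                  # step 14
    return math.sqrt(variance)                              # step 15


# Original data set from the investigation
scores = [5, 10, 10, 13, 15, 16, 17, 20, 21, 25]
print(round(population_std_dev(scores), 2))
```

Because every score contributes a squared deviation, changing any single score changes the result, unlike the range or the interquartile range.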
Investigation: Conducting a statistical inquiry
Your knowledge of sampling procedures, methods of graphical representation of
the data, and ways of calculating measures of central tendencies and measures of
spread of the data can now be applied in a single task — conducting a statistical
investigation.
You are to conduct an investigation on the theme of computers, with the target
population being all the students in your school.
1 Decide on the strata for your sample (it could be, for instance, gender, or
students in Year 8–10, Year 11–12).
2 Develop a method of obtaining a stratified random sample.
3 Design a questionnaire. Include at least 10 questions and ensure that some of
the sets of answers that you will obtain will represent numerical data and some
categorical data (include at least 3 questions that will entail a numerical
answer). Give your questionnaire a ‘test run’; that is, ask somebody to answer
the questions. Refine the questionnaire if necessary.
4 Once your questionnaire has been refined, prepare a response form for each
participant. The response forms, when completed, form a database which you
can then record on computer. Each completed form will be a record on the
database and each question forms a field. This will make it easy to recall
respondents who had common answers.
5 For each stratum, represent the data obtained for each question using any
graphical means discussed in the chapter. When representing categorical data,
try to use each of these: bar/column graphs, pie graphs and pictographs. When
representing numerical data, try to use each of these: histograms, frequency
polygons, stem-and-leaf plots and boxplots. Use frequency tables where
necessary.
6 Analyse the data obtained for each stratum separately by calculating measures
of central tendency and measures of spread. Recall that outliers may indicate
errors in recording the data.
7 Compare the data obtained from each stratum, starting with graphical
representation. For categorical data use multiple and compound bar/column
graphs. For numerical data draw back-to-back stem-and-leaf plots if you had
only 2 strata, or parallel boxplots if there were more than 2 strata. Compare the
measures of central tendency and the spread of the distributions. Do the
opinions of people from different strata differ a lot?
8 Now combine the data from your strata. Analyse the combined data (that is,
your full sample). This again should include graphs and calculations of
measures of central tendency and spread.
9 What conclusions can be made regarding the target population?
10 Write a report. It must include a full account of your investigation.
Enrichment: Who owns the gold coins?
A group of divers has just located a sack of gold coins on the seabed of the Pacific
Ocean. In an effort to trace the country of origin, Britain, Mexico and the USA
have all claimed ownership. Since the coins have been in the ocean for a
considerable length of time, any identifying marks have been eroded away. The
three countries were each asked to supply a random sample of 20 of their gold
coins which they suspect of being the type in the raised treasure. Measurements on
these coins are to be compared with measurements of a sample of 20 from the
discovered sack. The following table displays the results of these measurements.
Britain Mexico USA Gold coins
Diameter (mm) Thickness (mm) Diameter (mm) Thickness (mm) Diameter (mm) Thickness (mm) Diameter (mm) Thickness (mm)
33 2.3 29 2.4 30 2.2 32 2.2
32 2.5 30 2.6 31 2.4 33 2.6
29 2.7 28 2.7 29 2.5 29 2.7
35 2.8 30 3.0 31 2.6 35 2.8
31 2.5 33 2.7 34 2.5 35 2.7
36 2.9 34 2.7 28 2.9 30 2.7
33 3.0 27 3.0 35 2.8 27 3.0
30 2.8 29 2.9 30 2.6 34 2.6
32 2.6 29 2.7 30 2.6 34 2.8
34 2.4 30 2.6 32 2.4 32 2.7
33 2.7 31 2.7 31 2.5 31 2.4
32 2.9 32 2.9 34 2.7 36 3.0
29 2.4 29 2.6 29 2.4 31 2.3
33 2.7 30 3.0 31 2.7 33 2.9
34 3.0 30 3.0 31 2.8 35 2.8
36 3.0 31 3.0 31 2.9 31 3.0
33 2.8 27 2.8 28 2.8 30 3.0
30 3.0 28 3.0 29 2.8 31 3.0
35 2.5 31 2.6 33 2.5 35 2.5
34 2.4 29 2.7 29 2.6 31 2.7
1 Analyse these measurements both mathematically and graphically.
2 Report the results of your findings. Which country is the rightful owner of the
coins? Provide evidence for your decision.
Bivariate data
So far in this chapter we have dealt with univariate data; that is, only one variable was
considered for each piece of data. In this section we will look at sets of data where each
piece is represented by two variables. Such data are called bivariate.
Consider the following example. A researcher for a particular electricity company
wishes to analyse electricity consumption patterns. She selects a sample of households
and asks a representative of each household to name the number of electrical
appliances in their home and the amount of electricity consumed in the first quarter. This is
an example of bivariate data, since each household is being represented by 2 variables:
the number of electrical appliances and the electricity consumption. Furthermore, since
the amount of energy consumed could depend on the number of electrical appliances
being used, the number of electrical appliances can be thought of as an independent
variable and the electricity consumption as a dependent variable.
Scatterplots
Bivariate data are best represented using a scatterplot. Each piece of data on a scatterplot
is shown by a point. The x-coordinate of this point is the value of the independent variable
and the y-coordinate is the corresponding value of the dependent variable. In the above
example each household would be represented by the point whose x-coordinate is the
number of electrical appliances and whose y-coordinate is the amount of electricity
consumed by that household.
WORKED Example 17
The following table shows the total revenue from selling tickets for a number of different
chamber music concerts. Represent these data on a scatterplot.
Number of tickets sold 400 200 450 350 250 300 500 400 350 250
Total revenue ($) 8000 3600 8500 7700 5800 6000 11 000 7500 6600 5600
THINK WRITE
1 Determine the nature of the variables with reasoning. The total revenue depends on the number of
tickets being sold, so the number of tickets is the independent variable and the total revenue
is the dependent variable.
2 Rule up a set of axes on graph paper. Title the graph. Label the horizontal axis ‘Number of tickets’ and the
vertical axis ‘Total revenue ($)’.
3 Scale the horizontal and vertical axes.
4 Plot the points on the scatterplot. In each pair of values, treat the number of tickets as the horizontal
coordinate and the corresponding total revenue as the vertical coordinate. For example, the first
pair of values in the table is represented by the point with coordinates (400, 8000).
[Figure: scatterplot titled ‘Revenue obtained from selling music concert tickets’, with ‘Number of tickets’ (200 to 500) on the horizontal axis and ‘Total revenue ($)’ (3000 to 11 000) on the vertical axis]
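Step 4 of worked example 17 pairs each value of the independent variable with the matching value of the dependent variable. As a sketch only, the same pairing can be done in Python before the points are plotted by hand or passed to a graphing tool. The variable names are chosen for this illustration.

```python
# Data from worked example 17
tickets = [400, 200, 450, 350, 250, 300, 500, 400, 350, 250]            # independent (x)
revenue = [8000, 3600, 8500, 7700, 5800, 6000, 11000, 7500, 6600, 5600]  # dependent (y)

# Each household, concert or item becomes one (x, y) point
points = list(zip(tickets, revenue))

# The smallest and largest values help in choosing a scale for each axis
x_extent = (min(tickets), max(tickets))
y_extent = (min(revenue), max(revenue))

print(points[0])           # first point on the scatterplot
print(x_extent, y_extent)
```

The extents show that the horizontal axis must cover at least 200 to 500 tickets and the vertical axis at least $3600 to $11 000.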
Graphics Calculator tip! Drawing scatterplots
A graphics calculator can be used to construct scatterplots. To obtain the scatterplot for
the data in worked example 17 we need to proceed as follows.
Casio:
1. In the STAT mode, enter the number of tickets sold as
List1 and the total revenue as List2.
2. Press SHIFT F3 (V-Window), enter the settings as
shown on the screen at right and press EXE .
3. Press F1 (GRPH) then F6 (SET). Set StatGraph1 as
shown at right. Press EXE .
4. Press F4 (SEL). Ensure that StatGraph1 is On then
press F6 (DRAW). The scatterplot will be displayed
on the screen.
TI-83:
1. Enter the number of tickets sold as L1 and the total
revenue as L2. To do this press STAT , choose EDIT
and enter the information in the two lists.
2. Adjust the window settings. Press WINDOW and
enter the settings as shown on the screen at right.
3. Press 2nd [STAT PLOT] and select Plot 1.
4. Use the arrow keys to select the scatterplot icon and
set Xlist to L1, Ylist to L2 and select the type of
mark. The selections are shown at right.
5. Press ZOOM and select 9: ZoomStat. The scatterplot
should then be shown.
Note that the calculator may adjust the window settings
and these may need to be restored manually to see the
coordinate axes.
Correlation
When analysing bivariate data we are often interested to see whether any relationship
exists between the two variables and, if it does, what type of relationship it is.
The relationship between the two variables is called correlation. If correlation exists,
it can be classified according to its:
1. form — whether it is linear or non-linear
2. direction — whether it is positive or negative
3. strength — whether it is strong, moderate or weak.
The scatterplot is an excellent tool that assists in classifying the relationship between
the two variables.
Linear and non-linear relationships
If a scatterplot is in the shape of a ‘corridor’ and fitting a straight line to it seems
reasonable, then the relationship between the two variables can be called linear.
Otherwise the relationship is non-linear.
[Figure: two scatterplots showing linear relationships]
[Figure: two scatterplots showing non-linear relationships]
Non-linear relations can be classified further as being quadratic, exponential and so
on, but further classification is beyond the scope of this course.
Positive and negative correlation
If one variable tends to increase as the other variable increases,
the correlation between the two variables is said to be positive.
The data points on a scatterplot appear to form a path, directed
from the bottom left to the top right corner.
[Figure: scatterplot showing positive correlation]
If one variable tends to decrease with the increase of the other,
the correlation is said to be negative. The points on the scatterplot
form a path directed from the top left to the bottom right corner.
[Figure: scatterplot showing negative correlation]
The strength of the correlation
The narrower the path, the stronger the correlation between the two variables. The
diagram below shows examples of correlation of various strengths.
[Figure: three scatterplots showing strong correlation, moderate correlation and weak correlation]
Sometimes the points on the scatterplot form a straight line. In such cases we say
that the relationship between the variables is perfectly linear.
[Figure: two scatterplots showing perfectly linear relationships]
Sometimes the points on the scatterplot appear to be in no
particular order (that is, they are randomly spread over the set of
axes). In such cases we say that there is no correlation between the
two variables.
[Figure: scatterplot showing no correlation]
The classification of the correlation between two variables discussed above is
qualitative rather than quantitative. There are a number of methods that allow us to measure
and classify the correlation numerically, but these are beyond the scope of this course.
WORKED Example 18
State the type of the relationship between the variables x and y,
suggested by the scatterplot at right.
[Figure: scatterplot of y against x]
THINK WRITE
Carefully analyse the scatterplot and comment on its form, direction and strength. The points on the
scatterplot form a narrow path that resembles a straight ‘corridor’ (that is, it would be reasonable to fit a
straight line to it). Therefore the relationship is linear.
The path is directed from the bottom left corner to the top
right corner and the value of y increases as x increases. Therefore
the correlation is positive.
Furthermore the points are quite tight; that is, they form a thin
corridor. So the correlation can be classified as being strong.
There is a strong, positive, linear relationship between x and y.
Correlation and causation

The correlation between the two variables, even when strong, does not necessarily
mean that the increase or decrease in the level of one variable causes an increase or
decrease in the level of the other. When describing the relationship between the two
variables it is best to avoid sentences like these:
‘An increase in rainfall causes an increase in the wheat growth.’
‘The increase of the cost of childcare causes the decrease in enrolment of children in
that childcare centre.’
To draw a conclusion about the relationship between the two variables based on the
scatterplot, the following guidelines must be closely followed.
If the correlation between x and y is weak, we can conclude that there is little
evidence to show that the larger x is, the larger (positive correlation) or smaller (negative
correlation) y is.
If the correlation between x and y is moderate, we can conclude that there is
evidence to show that the larger x is, the larger (positive correlation) or smaller (negative
correlation) y is.
If the correlation between x and y is strong, we can conclude that the larger x is, the
larger (positive correlation) or smaller (negative correlation) y is.
WORKED Example 19
Mary sells business shirts in a department store. She always records the number of
different styles of shirt sold during the day. The table below shows her sales over one
week.
Price ($) 14 18 20 21 24 25 28 30 32 35
Number of shirts sold 21 22 18 19 17 17 15 16 14 11
a Construct a scatterplot of the data.

b State the type of correlation between the two variables and, hence, draw a
corresponding conclusion.
THINK WRITE
a THINK: Draw the scatterplot showing ‘Price ($)’ (independent variable) on the
horizontal axis and ‘Number of shirts sold’ (dependent variable) on the vertical axis.
WRITE: [Scatterplot: Number of shirts sold (axis from 10 to 28) against Price ($) (axis from 5 to 35)]
b 1 THINK: Carefully analyse the scatterplot and comment on its form, direction and
strength.
WRITE: The points on the plot form a path that resembles a straight ‘corridor’,
directed from the top left corner to the bottom right corner. The points are close to
forming a straight line. There is a strong, negative, linear correlation between the
two variables.
2 THINK: Draw a conclusion corresponding to the analysis of the scatterplot.
WRITE: The price of the shirt appears to affect the number sold; that is, the more
expensive the shirt the fewer sold.
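The classification in this worked example is made by eye. Purely as a cross-check, the sketch below computes Pearson's correlation coefficient r for Mary's data. This numerical measure is beyond what the course requires, and the function name `pearson_r` is an illustrative assumption. A value of r close to -1 agrees with the description ‘strong, negative, linear’.

```python
from math import sqrt

# Data from the table in Worked Example 19.
prices = [14, 18, 20, 21, 24, 25, 28, 30, 32, 35]
shirts_sold = [21, 22, 18, 19, 17, 17, 15, 16, 14, 11]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(prices, shirts_sold)
print(f"r = {r:.2f}")
```

For these data r comes out at about -0.95, which is consistent with the points lying close to a downward-sloping straight line.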
remember
1. Bivariate data involve two sets of related variables for each piece of data.
2. Bivariate data are best represented on a scatterplot. On a scatterplot each piece
of data is shown by a single point whose x-coordinate is the value of the
independent variable, and whose y-coordinate is the value of the dependent
variable.
3. The relationship between two variables is called correlation. Correlation can be
classified as linear, non-linear, positive, negative, weak, moderate or strong.
4. If the points appear to be scattered about the scatterplot in no particular order,
then no correlation between the two variables exists. If the points form a
straight line, then the relationship between the variables is perfectly linear.
5. When drawing conclusions based on the scatterplot, it is important to
distinguish between the correlation and the cause. Strong correlation between
the variables does not necessarily mean that an increase in one variable causes
an increase or decrease in the other.
9F Bivariate data
SkillSHEET 9.8: Determining independent and dependent variables
Mathcad: Scatterplots
EXCEL Spreadsheet: Scatterplots (DIY)
1 For each of the following pairs, decide which of the variables is independent and
which is dependent.
a Number of hours spent studying for a Mathematics test and the score on that test.
b Daily amount of rainfall (in mm) and daily attendance at the Botanical Gardens.
c Number of hours per week spent in a gym and the annual number of visits to the
doctor.
d Amount of computer memory taken by an essay and the length of the essay (in words).
e The cost of care in a childcare centre and attendance at the childcare centre.
f The cost of the property (real estate) and the age of the property.
g The cut-off OP score for a certain tertiary course and the number of applications
for that course.
h The heart rate of a runner and the running speed.
2 (WORKED Example 17) The following table shows the cost of a wedding reception at
10 different venues. Represent the data on a scatterplot.
No. of guests 30 40 50 60 70 80 90 100 110 120
Total cost (× $1000) 1.5 1.8 2.4 2.3 2.9 4 4.3 4.5 4.6 4.6
3 (WORKED Example 18) State the type of relationship between x and y for each of the
following scatterplots.
[Fifteen scatterplots, a to o, each showing y against x]
4 (WORKED Example 19) Eugene is selling leather bags at the local market. During the
day he keeps records of his sales. The table below shows the number of bags sold over
one weekend and their corresponding prices (to the nearest dollar).
SkillSHEET 9.9: Determining the type of correlation
Price ($) of a bag 30 35 40 45 50 55 60 65 70 75 80
Number of bags sold 10 12 8 6 4 3 4 2 2 1 1
b State the type of correlation between the two variables and, hence, draw a
corresponding conclusion.

5 The table below shows the number of bedrooms and the price of each of 30 houses.
Number of bedrooms  Price (× $1000)  Number of bedrooms  Price (× $1000)  Number of bedrooms  Price (× $1000)
2 180 3 279 3 243
2 160 2 195 3 198
3 240 6 408 3 237
2 200 4 362 2 226
2 155 2 205 4 359
4 306 7 420 4 316
3 297 5 369 2 200
5 383 1 195 2 158
2 212 3 265 1 149
4 349 2 174 3 286

b State the type of correlation between the number of bedrooms and the price of the
house and, hence, draw a corresponding conclusion.
c Suggest other factors that could contribute to the price of the house.
6 The table below shows the number of questions solved by each student on a test, and
the corresponding total score on that test.
Number of questions 2 4 7 10 5 2 6 3 9 4 8 3 6
Total score (%) 22 39 69 100 56 18 60 36 87 45 84 32 63

b What type of correlation does the scatterplot suggest?
c Give a possible explanation as to why the scatterplot is not perfectly linear.
7 A sample of 25 drivers who had obtained a full licence within the last month was
asked to recall the approximate number of driving lessons they had taken (to the
nearest 5), and the number of accidents they had had while being on P plates. The
results are summarised in the table which follows.
Number of lessons  Number of accidents  Number of lessons  Number of accidents
5 6 5 5
20 2 20 3
15 3 40 0
25 3 25 4
10 4 30 1
35 0 15 4
5 5 35 1
15 1 5 4
10 3 30 0
20 1 15 2
40 2 20 3
25 2 10 4
10 5
a Represent these data on a scatterplot.

b Specify the relationship suggested by the scatterplot.
c Suggest some reasons why this scatterplot is not perfectly linear.
8 Each point on the scatterplot below shows the time (in weeks) spent by a person on a
healthy diet and the corresponding mass lost (in kg).
[Scatterplot: Loss in mass against Number of weeks]
Study the scatterplot and state whether each of the following statements is true or
false.
a The number of weeks that the person stays on a diet is the independent variable.
b The y-coordinates of the points represent the time spent by a person on a diet.
c There is evidence to suggest that the longer the person stays on a diet, the greater
the loss in mass.
d The time spent on a diet is the only factor that contributes to the loss in mass.
e The correlation between the number of weeks on a diet and the number of
kilograms lost is positive.

9 multiple choice
The scatterplot that best represents the relationship between the amount of water
consumed daily by a certain household for a number of days in summer and the daily
temperature is:
[Four scatterplots, A to D, each relating Water usage (L) and Temperature (°C)]
10 multiple choice
The scatterplot below shows the number of sides and the sum of interior angles for a
number of polygons.
[Scatterplot: Sum of angles (°), axis from 200 to 1300, against Number of sides, axis from 3 to 10]
Which of the following statements is not true?
A The correlation between the number of sides and the angle sum of the polygon is
perfectly linear.
B The increase in the number of sides causes the increase in the size of the angle
sum.
C The number of sides depends on the sum of the angles.
D The correlation between the two variables is positive.
11 multiple choice
After studying a scatterplot, it was concluded that there was evidence that the greater
the level of one variable, the smaller the level of the other variable. The scatterplot
must have shown a:
A strong, positive correlation B strong, negative correlation
C moderate, positive correlation D moderate, negative correlation
2
1 Find the mean, median and mode of the following set of data: 34, 18, 42, 18, 55, 18,
25, 42, 33, 18.
2 The results of a mathematics project are represented by this stem-and-leaf plot.
Key 2 |5 = 25
Stem Leaf
0 00
1
2 5579
3 3888
4 06666
Find the mean, median and mode.
3 Find the mean number of pets per household surveyed and presented in this frequency
table (correct to 1 decimal place).
Number of pets Frequency

0 11
1 34
2 11
3 1
4 0
5 0
6 0
7 1
4 Using the midpoint of class intervals, calculate the mean amount (dollars) spent at a
school canteen, represented in the following frequency table.

0–<$2.50 22
$2.50–<$5.00 12
$5.00–<$7.50 3
$7.50–<$10.00 1
5 Add a cumulative frequency column to the table in the previous question and find the
median class.
6 Find the range for the set of data: 33, 46, 57, 42, 51, 66, 37, 27, 76, 74, 53, 77.
7 Find the interquartile range for the set of data: 2, 4, 3, 1, 4, 2, 3, 5, 2, 4, 3, 3, 5, 3.

8 The stem-and-leaf plot below represents the number of goals scored by a netball
team throughout the course of a season. Find the range of the data.
Key 1 | 4 = 14
Stem Leaf
0 9
1 48
2 02249
3 0116
4 18
5 0007
9 Find the interquartile range of the data in question 8.

10 Draw a boxplot to represent the data in question 8.
Lines of best fit

When analysing bivariate data we first want to know whether there is any correlation
between the two variables. Once the existence of the correlation has been established,
we need to describe the relationship between the two variables.
The shape of the scatterplot may indicate different types of relationships between
variables: linear, quadratic, exponential and others. In this section we will concern
ourselves only with linear modelling.
If the scatterplot indicates that the relationship between the two variables is linear,
we can establish a linear model of the relationship as follows.
First a straight line is fitted to the scatterplot. It is positioned so that there is
approximately an equal number of data points on either side of the line, and so that all
the points are as close to the line as possible. Such a line is called a line of best fit.
We can then use this line to graphically predict the value of one variable from
another.
WORKED Example 20
The table below shows the number of boxes of tissues purchased by hay fever sufferers
during the blooming season in spring.
Number of days affected by hay fever (d) 3 12 14 7 9 5 6 4 10 8
Total number of boxes of tissues purchased (T) 1 4 5 2 3 2 2 2 3 3
a Construct a scatterplot of the data and draw a line of best fit.

b Interpret the meaning of the gradient of this line.
c Use the line of best fit to predict:
i the number of boxes of tissues purchased by people suffering from hay fever over a
period of 11 days
ii how many days 2 boxes of tissues would be likely to last.
THINK WRITE/DRAW
a 1 THINK: Draw the scatterplot showing ‘Number of days affected by hay fever’
(independent variable d) on the horizontal axis and ‘Total number of boxes of tissues
purchased’ (dependent variable T) on the vertical axis.
DRAW: [Scatterplot: T (axis from 0 to 5) against d (axis from 3 to 14)]
2 THINK: Position the line of best fit on the scatterplot so there is approximately
an equal number of data points on either side of the line.
DRAW: [The same scatterplot with the line of best fit drawn in]
b THINK: Look at the slope of the line. It is sloping upwards from left to right, so the
slope is positive.
WRITE: The line of best fit has a positive slope. This means that, as the number of
days affected by hay fever increases, more tissue boxes will need to be purchased.
c i 1 THINK: Locate 11 on the horizontal axis. Draw a vertical line until it meets the
line of best fit. From that point draw a horizontal line to the vertical axis. Read off
the value on the vertical axis, indicated by the horizontal line.
DRAW: [Scatterplot with line of best fit, showing the reading for d = 11]
2 THINK: Answer the question.
WRITE: In 11 days the hay fever sufferer will need about 4 boxes of tissues.
ii 1 THINK: Locate 2 on the vertical axis. Draw a horizontal line until it meets the
line of best fit. From that point draw a vertical line to the horizontal axis. Read off
the value indicated on the horizontal axis by the vertical line.
DRAW: [Scatterplot with line of best fit, showing the reading for T = 2]
2 THINK: Answer the question.
WRITE: 2 boxes of tissues would be likely to last about 6 days.
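In this worked example the line of best fit is placed by eye and the predictions are read from the graph. A common numerical stand-in, which is not the method used in this text, is the least-squares line. The sketch below fits one to the hay fever data and repeats the two predictions; the helper names `fit_line`, `predict_T` and `predict_d` are illustrative assumptions.

```python
# Data from the table in Worked Example 20.
days = [3, 12, 14, 7, 9, 5, 6, 4, 10, 8]      # d, independent variable
boxes = [1, 4, 5, 2, 3, 2, 2, 2, 3, 3]        # T, dependent variable


def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = fit_line(days, boxes)


def predict_T(d):
    """Boxes of tissues predicted for d days (reading up from the horizontal axis)."""
    return gradient * d + intercept


def predict_d(T):
    """Days that T boxes are predicted to last (reading across from the vertical axis)."""
    return (T - intercept) / gradient


print(f"gradient = {gradient:.2f} boxes per day")
print(f"11 days: about {round(predict_T(11))} boxes")
print(f"2 boxes: about {round(predict_d(2))} days")
```

With these data the fitted gradient is positive (about 0.32 boxes per day), and the rounded predictions are about 4 boxes for 11 days and about 6 days for 2 boxes, in line with the graphical readings. A line drawn by eye will generally differ slightly from the least-squares line.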
Interpolation and extrapolation

When the line of best fit is used to predict the value of the variable from within the
given range, the process is called interpolation; and when the value of the variable
being predicted is outside the given range, the process is called extrapolation. In
worked example 20, the values of the data set were for the duration of the sickness
between 3 and 14 days. Therefore, the predictions were both examples of interpolation.
On the other hand, predicting the amount used over a period of 15 days would be an
example of extrapolation, as 15 is beyond the given range.
[Figure: line of best fit for T against d. The part of the line inside the given range (3 to 14 days) is labelled Interpolation; the part outside the given range is labelled Extrapolation.]
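The distinction between interpolation and extrapolation depends only on whether the value used for the prediction lies within the range of the recorded data. A minimal sketch of that check follows; the function name `prediction_type` is hypothetical, and treating the end points of the range as ‘inside’ is an assumption.

```python
def prediction_type(recorded_values, new_value):
    """Classify a prediction made from new_value as interpolation or extrapolation.

    recorded_values -- the data values of the variable we are predicting from
    new_value       -- the value at which the line of best fit is read
    Assumption: the smallest and largest recorded values count as inside the range.
    """
    lowest, highest = min(recorded_values), max(recorded_values)
    if lowest <= new_value <= highest:
        return "interpolation"
    return "extrapolation"


# Number of days affected by hay fever, from Worked Example 20.
days = [3, 12, 14, 7, 9, 5, 6, 4, 10, 8]

print(prediction_type(days, 11))  # 11 lies between 3 and 14
print(prediction_type(days, 15))  # 15 is beyond 14
```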
Reliability of predictions
When predictions of any sort are made it is always good to know whether they are
reliable or not. Predictions made using the line of best fit can be thought of as reliable
if each of the following are observed:
1. the number of observations (that is, points constituting the scatterplot) is reasonably
large
2. the scatterplot indicates reasonably strong correlation between the variables
3. the predictions were made using interpolation.
remember
1. If the scatterplot indicates a linear relationship between two variables, the
linear model of the relationship can be established by drawing a line of best fit
into the scatterplot. Position the line so that there is approximately an equal
number of points on either side of the line.
2. The line of best fit can be used for predicting the value of one variable when
given the value of the other. This can be done graphically.
3. When the value that is being predicted using the line of best fit is within the
given range, the process is called interpolation. When the value that is being
predicted using the line of best fit is outside the given range, the process is
called extrapolation.
4. Only predictions made using interpolation can be considered reliable.
9G Lines of best fit
Mathcad: Lines of best fit
1 (WORKED Example 20) The data in the table below show the distances travelled by
10 cars and the amount of petrol used for their journeys (to the nearest litre).
Distance travelled (km) 52 36 83 12 44 67 74 23 56 95
Petrol used (L) 7 5 9 2 7 9 12 3 8 14
b Draw in the line of best fit.
c Use your graph to predict the distance travelled on 10 L of petrol.

2 A random sample of 10 Year 12 students who have part-time jobs was selected. Each
student was asked to state the average number of hours they work per week and their
average weekly earnings (to the nearest dollar). The results are summarised in the table
below.
Hours worked 4 8 15 18 10 5 12 16 14 6
Weekly earnings ($) 23 47 93 122 56 33 74 110 78 35

c Interpret the meaning of the gradient.
3 Use the given scatterplot and line of best fit to predict:
a the value of y when x = 45
b the value of x when y = 15.
[Scatterplot with line of best fit: y (axis from 0 to 70) against x (axis from 0 to 80)]
4 Analyse the graph below and use the line of best fit to predict:
a the value of y when the value of x is:
i 7 ii 22 iii 36
b the value of x when the value of y is:
i 120 ii 260 iii 480.
[Scatterplot with line of best fit: y (axis from 0 to 600) against x (axis from 0 to 45)]
5 The table below shows the average weekly expenditure on food for households of
various sizes.
Number of people in a household 1 2 4 7 5 4 3 5 2 4 6 5 3 1 4
Cost of food ($ per week) 70 100 150 165 150 140 120 155 90 160 160 160 125 75 135
a Construct a scatterplot of the data and draw in the line of best fit.
b Interpret the meaning of the gradient.
c Use your graph to predict the weekly food expenditure for a family of:
i 8 ii 9 iii 10.
6 The following table shows the gestation time and the birth mass of 10 babies.
Gestation time (weeks) 31 32 33 34 35 36 37 38 39 40
Birth mass (kg) 1.1 1.5 1.8 2.1 2.2 2.5 2.8 3.1 3.2 3.4
a Construct a scatterplot of the data. What type of correlation does the scatterplot
suggest?
c What does the gradient indicate?
d Although full term of gestation is considered to be 40 weeks, some pregnancies last
longer. Use your graph to predict the birth mass of babies born after 41 and
42 weeks of gestation.
e Many babies are born prematurely. Predict the birth mass of a baby whose gestation
time was 30 weeks.
f If the birth mass of the baby was 2.4 kg, what was his or her gestation time (to the
nearest week)?
7 multiple choice
The line of best fit on the scatterplot below is used to predict the values of y when
x = 15, x = 40 and x = 60.
[Scatterplot with line of best fit: y against x, x-axis from 10 to 70]
a Interpolation would be used to predict the value of y when the value of x is:
A 15 and 40 B 15 and 60
C 15 only D 40 only
b The prediction of the y-value(s) can be considered reliable when:
A x = 15 and x = 40 B x = 15, x = 40 and x = 60
C x = 40 D x = 40 and x = 60
8 multiple choice
The scatterplot below is used to predict the value of y when x = 300.
[Scatterplot: y (axis from 0 to 500) against x (axis from 0 to 700)]
This prediction is:
A reliable, because it is obtained using interpolation
B not reliable, because it is obtained using extrapolation
C reliable because the scatterplot contains a large number of points
D not reliable, because there is no correlation between x and y
WorkSHEET 9.3

Investigation: A pulsating problem
1 Choose an object or subject that is of interest to you and which can
be observed and measured during one day. For example, you might
decide to measure your own pulse rate.
2 Prepare a table where you will record your results every hour within
the school day. For example, for the pulse rate the table might look
like this.
Time 9 am 10 am 11 am 12 noon 1 pm 2 pm 3 pm 4 pm
Pulse rate
3 Take your measurements at the regular time intervals you have

decided on and record them in the table.
4 Draw a scatterplot of the results of your experiment.
5 Describe the graph and comment on the trend.
6 If appropriate, draw in a line of best fit and predict the values (that is,
your pulse rate) for the next 2–3 hours.
7 Take the actual measurements during the hours you have made
predictions for. Compare your predictions with the actual
measurements. Were your predictions any good?
Investigation: Long jump to the top
At the beginning of the chapter we met Laura, who was training for the long jump
and hoping to make the Australian Olympic team. Her best jump each year is
shown in the table below.
Age 8 9 10 11 12 13 14 15 16 17 18
Best jump 4.31 4.85 5.29 5.74 6.05 6.21 — 6.88 7.24 7.35 7.57
(metres)
1 Plot the points given in the table on a scatterplot.

2 Draw a line of best fit.
3 The next Olympic Games will occur when Laura is 20 years old. Use your line
of best fit to estimate Laura’s best jump that year and whether it will pass the
qualifying mark of 8.1 metres.
4 Is a line of best fit a good way to predict future improvement in this situation?
What problems are there with using a line of best fit?
5 There will also be Olympic Games held when Laura is 24 years old and 28
years old. What length jump would you predict Laura could achieve at these two
ages? Is this realistic?
6 When Laura was 14 years old she twisted a knee in training and did not compete
for the whole season. In that year a national junior championship was held. The
winner of that championship jumped 6.5 metres. Use your line of best fit to
predict whether Laura would have won that championship.
summary
Copy the sentences below. Fill in the gaps by choosing the correct word or
expression from the word list that follows.
1 Data can be obtained either by observation or by ______.
2 Random sampling uses a ______ device to select people or objects from the population.
3 In simple random sampling every person or object has an equal ______ of being selected.
4 A stratified random sample splits the population into ______. A sample from each stratum is selected ______; the sample size for each stratum is ______ to the stratum size as compared to the ______ size.
5 An estimate of the sample size is given by the ______ of the population size.
6 All data can be divided into 2 types: ______ and numerical. Numerical data can be ______ or ______ while categorical data are ______.
7 Numerical data can be ______ or continuous.
8 To represent categorical or discrete numerical data we can use bar and column graphs, sector graphs, picture graphs, dotplots and ______ plots.
9 In bar and column graphs the respective length and height of each bar and column directly correspond to the ______ of the observation, or category it represents.
10 In pie graphs each category is shown by a ______ whose size is proportional to the category’s size (as compared to the population).
11 Picture graphs use ______ to represent a specific number of items.
12 Dot-plots use a single dot to represent an ______.
13 Stem-and-leaf plots have leaves arranged in ______ order of size, outwards; that is, away from the stem.
14 Discrete data with a large number of different values and ______ data can be grouped into ______.
15 Grouped data can be represented using histograms, frequency ______, cumulative frequency polygons and percentage cumulative frequency polygons.
16 A histogram does not have any ______ between columns.
17 A frequency polygon is a line graph, joining the ______ of the top parts of the columns that constitute a histogram.
18 Mean, median and mode are called measures of ______.
19 The mean is the ______ of a set of data.
20 For ungrouped data the mean is calculated using the formula ______; for grouped data the formula is ______, where x is the actual observation for the discrete data, or a midpoint of a class interval for the grouped data.
21 The median is the middle observation of the ordered data set; it is located at the ______th place.
22 The mode is the observation with the ______ frequency.
23 For grouped data the class interval with the highest frequency is called the ______ class.
24 Range, interquartile range and standard deviation are called measures of ______.
25 The range is the difference between the highest and ______ numbers in the set.
26 The interquartile range is the difference between the ______ and ______ quartile values.
27 The median divides the set of data into two halves; ______ is the median of the lower half and ______ is the median of the upper half.
28 A boxplot is a graph which represents the data using a ______ summary: Xmin, QL, Median, QU and Xmax.
29 Data where each piece is represented by two variables are called ______ data.
30 Bivariate data are best represented on a ______.
31 On a scatterplot, each piece of data is represented by a single point, whose x-coordinate is the ______ variable and y-coordinate is the ______ variable.
32 The relation between two variables is called ______. It can be classified as being linear or non-linear; ______ or negative; and strong, moderate or weak.
33 If the scatterplot indicates a ______ relationship between two variables, the line of ______ can be fitted to it.
34 Predicting a value from within the given range is called ______; predicting the value outside the given range is called ______.
35 Predictions made with extrapolation are not considered to be ______.
WORD LIST
average, dependent, increasing, discrete
target, symbols, highest, modal
class intervals, strata, extrapolation, categorical
spread, central tendency, $\frac{n+1}{2}$, upper
questioning, bivariate, QU
categorised, counted, frequency, interpolation
stem-and-leaf, scatterplot, random, lowest
randomly, gaps, proportional, correlation
QL, linear, five-number, $\bar{x} = \frac{\sum x}{n}$
polygons, sector, population
observation, midpoints, chance, reliable
measured, $\bar{x} = \frac{\sum fx}{n}$, lower, square root
independent, continuous, positive
best fit
CHAPTER review
1 (9A) Lena is planning to conduct a survey of students who have completed the Senior
Certificate. She thought of some questions which are listed below.
i Determine the suitability of each of the listed questions, justifying your answer.
ii Tabulate possible responses for the questions deemed suitable in part i.
a Did you like studying for the Senior Certificate?
b Did you find studying for the Senior Certificate difficult?
c How many hours per week did you study?
d Which subjects did you do in your senior years?
e What were the hardest and easiest aspects of studying for your Senior Certificate?
f Did you have good teachers?
g Do you think tasks should be internally or externally assessed?
h What OP score did you obtain?
2 (9A) A researcher wishes to conduct a survey for the manager of a computer company.
If the company employs 200 technicians and 90 computer programmers, describe the
procedure of obtaining:
a a simple random sample b a stratified random sample.
3 (9B) i For each of the following, classify the data as being categorical or numerical.
ii For the numerical data, decide whether they are discrete or continuous.
a Numbers on the T-shirts of football players
b The mass of individual tea bags in a pack of 50
c The finishing places in the gymnastics competition
d Weekly sales of computers in a large department store
e Arm span of 20 students
f The country of origin of the people applying for Australian citizenship certificates
4 (9B) The owner of a local restaurant wishes to know what desserts are popular. Over
one night he observes that 2 people order pavlova, 12 people order chocolate mousse,
7 people order lemon tang cake, 14 people order chocolate mud cake and 9 people order
sacher torte.
Represent these data using:
a a bar graph b a sector graph c a dot-plot.
5 (9B) The data below show the daily sales of calculators in a large electronics store
over the last three weeks of January.
GC 2 3 6 9 12 10 24 17 15 19 20 26 24 18 29 33 30 36
SC 7 6 10 8 15 11 20 18 23 28 30 26 32 38 39 35 43 41
(GC = graphics calculator; SC = scientific calculator)

Represent the data as a back-to-back stem-and-leaf plot.

6 (9C) The data below show the number of students attending weekly lectures in
Cognitive Psychology over the last year.
76, 43, 29, 58, 82, 63, 36, 45, 70, 68, 34, 89, 95, 66, 41,
37, 86, 53, 72, 92, 91, 87, 61, 37, 48, 64, 81, 42, 59, 77
a Group the data into a frequency table with the class intervals of 10.
b Represent the grouped data as a histogram.
c Add a frequency polygon to the histogram.
7 (9D) Find the mean, median and mode for each of the following sets of data:
a 7, 15, 8, 8, 20, 14, 8, 10, 12, 6, 19
b Stem Leaf
1 26
2 178
3 033468
4 01159
5 136
c Score x Frequency f
70 2
71 6
72 9
73 7
74 4
8 (9D) A sample of 30 people was selected at random from those attending a local
swimming pool. Their ages (in years) were recorded as follows: 19, 7, 58, 41, 17, 23,
62, 55, 40, 37, 32, 29, 21, 18, 16, 10, 40, 36, 33, 59, 65, 68, 15, 9, 20, 29, 38, 24,
10, 30.
a Find the mean and the median age of the people in this sample.
b Group the data into class intervals of 10 and complete the frequency distribution table.
c Use the frequency distribution table to estimate the mean age.
d Calculate the cumulative frequency.
e Find the median class.
f Compare the mean and median of the original data in part a with the estimates of the
mean and the median class obtained for the grouped data in parts c and e.
9 (9E) The following back-to-back stem-and-leaf plot shows the typing speed in words
per minute (wpm), of 30 Year 8 and Year 10 students.
Key: 2 | 6 = 26 wpm
Leaf Stem Leaf
Year 8 Year 10
99 0
9865420 1 79
988642100 2 23689
9776410 3 02455788
86520 4 1258899
5 03578
6 003
a Using a graphics calculator or otherwise, construct a pair of parallel boxplots to

represent the 2 sets of data.
b Find the mean, median, range and interquartile range for each set.
c Compare the two distributions, using your answers to parts a and b.
10 (9F) As preparation for a Mathematics test, a group of 20 students was given a
revision sheet containing 60 questions. The table below shows the number of questions
from the revision sheet successfully completed by each student and the mark, out of
100, obtained by the student.
Number of questions 9 12 37 60 55 40 10 25 50 48 60
Test result 18 21 52 95 100 67 15 50 97 85 89
Number of questions 50 48 35 29 19 44 49 20 16 58 52
Test result 97 85 62 54 30 70 82 37 28 99 80
a State which of the variables is dependent and which is independent.

b Construct a scatterplot of the data.
c State the type of correlation between the two variables suggested by the scatterplot and
draw a corresponding conclusion.
d Suggest why the relationship is not perfectly linear.
11 (9G) a Use the line of best fit to predict the value of y, when the value of x is:
i 10 ii 35.
b Use the line of best fit to predict the value of x, when the value of y is:
i 15 ii 30.
[Scatterplot with line of best fit: y (axis from 5 to 50) against x (axis from 5 to 40)]
12 (9G) For his birthday, Ari was given a small white rabbit. To monitor the rabbit’s
development, Ari decided to measure it once a week. The table below shows the
length of the rabbit for various weeks.
Week number 1 2 3 4 6 8 10 13 14 17 20
Length (cm) 20 21 23 24 25 30 32 35 36 37 39

b Draw a line of best fit.
c As can be seen from the table, Ari did not measure his rabbit on weeks 5, 7, 9, 11, 12, 15,
16, 18 and 19. Use the line of best fit to predict the length of the rabbit for those weeks.
d Were the predictions made in part c an example of interpolation or extrapolation?
Explain.
e Predict the length of the rabbit in the next three weeks (that is, weeks 21–23), using the
line of best fit from part c.
f Are the predictions that have been made in part e reliable? Explain.
