
Representation and Summary of Data

- Location
This chapter is about calculating averages, also known as ‘measures of location’.

You will have seen many of these topics at GCSE, but we will begin to use proper mathematical notation when solving problems.

By the end of the chapter you will have seen:

• Key words to remember
• Mean, median and mode (including from a table)
• The difference between a discrete frequency table and a continuous frequency table, and its effect on calculations
• How to use coding

Key Terms

• Quantitative Variable
– Data which is numerical
– e.g. height, profits, number of beads in a bag

• Qualitative Variable
– Data which is not numerical
– e.g. car colour, brand name of clothes

• Discrete Data
– Numerical data that can only take certain values
– e.g. shoe size, goals scored

• Continuous Data
– Numerical data that can take any value within a range
– e.g. height, weight, time taken

Data is either quantitative or qualitative; quantitative data is either continuous or discrete.

2A

Data in a table

Rebecca records the shoe size, x, of the female students in her year. The table shows her results.

x     Number of students, f
35    3
36    17
37    29
38    34
39    12

x → the data you are looking at
f → frequency

Find:
a) The number of students who take size 37
→ 29
b) The shoe size taken by the smallest number of female students
→ 35
c) The shoe size taken by the largest number of female students
→ 38
d) The total number of students in the year
→ Add up the frequencies: 3 + 17 + 29 + 34 + 12 = 95

Data in a table

• Add a Cumulative Frequency column to the table
→ Keep a running total, adding on the frequency of each additional group

x     Number of students, f    Cumulative Frequency
35    3                        3
36    17                       20
37    29                       49
38    34                       83
39    12                       95

x → the data you are looking at
f → frequency
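
The running total above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the variable names are made up here, and `itertools.accumulate` from the standard library does the adding up.

```python
from itertools import accumulate

# Shoe-size table from the slide (variable names are illustrative)
shoe_sizes = [35, 36, 37, 38, 39]
frequencies = [3, 17, 29, 34, 12]

# Each entry is the total of all frequencies up to and including that row
cumulative = list(accumulate(frequencies))

for x, f, cf in zip(shoe_sizes, frequencies, cumulative):
    print(x, f, cf)
```

The last cumulative frequency should always equal the total number of students, which is a quick way to check the column.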

Data in a Grouped Frequency Table

You need to know the following when working with grouped data:

• Groups are known as classes
• You need to know how to find class boundaries
• You need to be able to work out the mid-point of a class
• You need to be able to find the class width

Length of Wing (mm)    Number of Butterflies, f
30-31                  2
32-33                  25
34-36                  30
37-39                  13

Data in a Grouped Frequency Table

• Write down the class boundaries, mid-point and class width for the group 34-36

Length of Wing (mm)    Number of Butterflies, f
30-31                  2
32-33                  25
34-36                  30
37-39                  13

a) Class boundaries
→ As there are gaps between the groups, each group is said to begin and end halfway between itself and its neighbours
→ 33.5 to 36.5
b) Midpoint
→ Add up the boundaries and divide by 2
→ (33.5 + 36.5) ÷ 2 = 35 mm
c) Class width
→ The upper boundary minus the lower boundary
→ 36.5 - 33.5 = 3 mm

Data in a Grouped Frequency Table

• Write down the class boundaries, mid-point and class width for the group 70 < s ≤ 75

Time taken (s)    Number of females, f
55 < s ≤ 65       2
65 < s ≤ 70       25
70 < s ≤ 75       30
75 < s ≤ 90       13

a) Class boundaries
→ No gaps, so the same as in the table
→ 70 to 75
b) Midpoint
→ Add up the boundaries and divide by 2
→ (70 + 75) ÷ 2 = 72.5 s
c) Class width
→ The upper boundary minus the lower boundary
→ 75 - 70 = 5 s

Measures of Location (averages)

• Mode
– The most common value in a set of data.

• Median
– The middle value when the data is put into ascending order
– For n observations, divide n by 2.
– If n ÷ 2 is whole, the median is the midpoint of that term and the term above. If it is not whole, round up and take that term.

• E.g. for the median of 14 values
– 14 ÷ 2 = 7
– 7 is whole, so the median will be the midpoint of the 7th and 8th terms.

• E.g. for the median of 29 values
– 29 ÷ 2 = 14.5
– Round up to 15 and find the 15th value
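
The rule for finding the position of the median could be written in Python roughly as follows. The function name `median_by_position` is just a label chosen for this sketch.

```python
def median_by_position(values):
    """Median using the rule above: divide n by 2, then either average that
    term with the term above (whole) or round up to a single term (not whole)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        k = n // 2                      # e.g. n = 14 gives the 7th and 8th terms
        return (ordered[k - 1] + ordered[k]) / 2
    k = (n + 1) // 2                    # e.g. n = 29: 14.5 rounds up to the 15th term
    return ordered[k - 1]


print(median_by_position(range(1, 15)))  # 14 values: midpoint of 7 and 8
print(median_by_position(range(1, 30)))  # 29 values: the 15th value
```

Python lists count from 0, which is why the k-th term is `ordered[k - 1]`.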

2B

Measures of Location (averages)

• Mean
– The sum of the observations divided by the total number of observations
– This is written as:

$\dfrac{\sum x}{n}$

– The symbol $\sum$ means ‘the sum of’
– The x represents the observations
– The n stands for the number of observations
– Often, the mean is denoted by $\bar{x}$ (x-bar)

Measures of Location (averages)

• Calculate the mean, median and mode of the set of data below:

2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3

a) Mode → 5

b) Median
In ascending order: 1, 2, 3, 5, 5, 5, 6, 6, 16, 17, 18, 21
12 ÷ 2 = 6
So find the mid-point of the 6th and 7th terms
→ (5 + 6) ÷ 2 = 5.5

c) Mean

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{105}{12} = 8.75$

You must get into the habit of showing workings like this!
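
If you want to check answers like these on a computer, Python's standard `statistics` module has functions for all three averages. This is only a way of checking; the variable names below are illustrative, and you are still expected to show the written working.

```python
from statistics import mean, median, mode

data = [2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3]

data_mode = mode(data)      # most common value
data_median = median(data)  # midpoint of the 6th and 7th ordered terms
data_mean = mean(data)      # sum of the observations divided by n

print(data_mode, data_median, data_mean)  # 5 5.5 8.75
```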
Measures of Location (averages)

• Ben collects 8 pieces of data and calculates that $\sum x$ is 13.5

• Calculate the mean:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{13.5}{8} = 1.69$ (2dp)

Measures of Location (averages)

• You need to be able to calculate a combined mean.

1) If the mean pay of 20 workers is £5 per hour, and the mean pay of a different 20 workers is £6 per hour, what is the overall mean?

The two groups are the same size, so the overall mean is the midpoint of £5 and £6 = £5.50

2) If the mean pay of 5 workers is £8 per hour and the mean pay of a different 12 workers is £6 per hour, what is the overall mean?

This is not as simple. You need to work out the total pay, and the total number of people.

Total Pay → (5 × £8) + (12 × £6) = £112
Total People → 5 + 12 = 17

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{112}{17} = 6.59$ (2dp), so the overall mean is £6.59 per hour

Measures of Location (averages)

• You need to be able to calculate a combined mean.

In general, you can use a formula:

If data set 1 has $n_1$ observations and mean $\bar{x}_1$, and data set 2 has $n_2$ observations and mean $\bar{x}_2$, then:

$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

• $\bar{x}$ is the overall mean
• $n_1 \bar{x}_1$ is the mean of set 1 multiplied by the number of observations in set 1
• $n_2 \bar{x}_2$ is the mean of set 2 multiplied by the number of observations in set 2
• $n_1 + n_2$ is the total number of observations
Measures of Location (averages)

Using the formula

$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

A sample of 25 observations has a mean of 6.4. The mean of a second sample is 7.2, with 30 observations. Calculate the overall mean.

$\bar{x} = \dfrac{(25 \times 6.4) + (30 \times 7.2)}{25 + 30}$

$\bar{x} = 6.84$ (2dp)

You must get into the habit of showing workings like this!
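
The combined mean formula translates directly into a short Python function. The name `combined_mean` and its argument names are chosen for this sketch only.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Overall mean of two data sets, given each set's size and mean."""
    total = n1 * mean1 + n2 * mean2   # sum of all the observations
    return total / (n1 + n2)          # divided by the total number of observations


print(round(combined_mean(25, 6.4, 30, 7.2), 2))  # 6.84
print(round(combined_mean(5, 8, 12, 6), 2))       # 6.59
print(combined_mean(20, 5, 20, 6))                # 5.5
```

The last line shows why the midpoint shortcut only works when both groups are the same size: the formula then reduces to the ordinary average of the two means.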

Measures of location (averages)

• You should realise that the 3 measures of location have different advantages and disadvantages.

Mode
Can be used with any data, qualitative or quantitative. No use when
there isn’t a common value.

Median
Used with quantitative data and is unaffected by extreme values. Only
uses the middle value(s) though.

Mean
Uses all the data but can be affected by extreme values.

Measures of location (averages) from tables

Rebecca records the shirt collar size, x, of male students in her year group. Her results are in the table.

x       Number of students, f
15      3
15.5    17
16      29
16.5    34
17      12

Find the modal collar size.
→ 16.5, as this is the collar size which occurred most often (34 times)

2C

Measures of location (averages) from tables

Find the median collar size.

x       Number of students, f    Cumulative Frequency
15      3                        3
15.5    17                       20
16      29                       49
16.5    34                       83
17      12                       95

→ Fill in the Cumulative Frequency column
→ Total ÷ 2: 95 ÷ 2 = 47.5
→ 47.5 is not whole, so round up: the median will be the 48th value
→ Find which group the 48th value will be in, using the Cumulative Frequency column. The 21st to 49th values are all 16.
→ The median is 16
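
A possible Python sketch of the same steps is shown below. The function `table_median` is a made-up helper for discrete data in a frequency table; it assumes the values are listed in ascending order.

```python
from itertools import accumulate
from math import ceil


def table_median(values, frequencies):
    """Median of discrete data in a frequency table (values in ascending order)."""
    total = sum(frequencies)
    cumulative = list(accumulate(frequencies))

    def term(k):
        # The k-th ordered value lies in the first row whose
        # cumulative frequency reaches k
        return next(x for x, cf in zip(values, cumulative) if cf >= k)

    if total % 2 == 0:
        k = total // 2
        return (term(k) + term(k + 1)) / 2   # midpoint of two terms
    return term(ceil(total / 2))             # round up to a single term


collar_sizes = [15, 15.5, 16, 16.5, 17]
students = [3, 17, 29, 34, 12]
print(table_median(collar_sizes, students))  # 16
```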

Measures of location (averages) from tables

Find the mean collar size.

x        Number of students, f    fx
15       3                        45
15.5     17                       263.5
16       29                       464
16.5     34                       561
17       12                       204
Total    95                       1537.5

Sum of collar sizes ÷ Total students
→ 1537.5 ÷ 95
→ 16.18 (2dp)

This is the formula you are actually using:

$\bar{x} = \dfrac{\sum fx}{\sum f}$

where $\sum fx$ is the sum of ‘f times x’ and $\sum f$ is the sum of ‘f’.

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1537.5}{95} = 16.18$ (2dp)
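
The fx column and the two totals can be reproduced with a short Python sketch. The names `table_mean`, `collar_sizes` and `students` are illustrative only.

```python
def table_mean(values, frequencies):
    """Mean from a frequency table: sum of f times x, divided by sum of f."""
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_f = sum(frequencies)
    return sum_fx / sum_f


collar_sizes = [15, 15.5, 16, 16.5, 17]
students = [3, 17, 29, 34, 12]

fx_column = [f * x for x, f in zip(collar_sizes, students)]
print(fx_column, sum(fx_column))                      # the fx column and its total
print(round(table_mean(collar_sizes, students), 2))   # 16.18
```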

Measures of location (averages) from grouped tables

• All grouped data is treated as continuous data, and you need to be able to calculate all 3 averages from this kind of table.

• The mode is found in essentially the same way: it is the group with the highest frequency

• We will be focusing on the median and mean. It is important to know that when data is grouped, you do not know the actual values. Therefore, the median and mean from a grouped table are only estimates and not necessarily accurate.

2D

Mean from a grouped table

• To calculate the mean from a grouped table, we use the same formula as for an ungrouped table.

$\bar{x} = \dfrac{\sum fx}{\sum f}$

• The difference is that x is now the midpoint of each class, rather than actual values

• Fill in 2 columns on the table (sometimes you will have to remember which columns you need)

Length of Pine Cone (mm)    Number of Cones, f    Midpoint (x)    fx
30-31                       2                     30.5            61
32-33                       25                    32.5            812.5
34-36                       30                    35              1050
37-39                       13                    38              494
Total                       70                                    2417.5

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2417.5}{70}$

$\bar{x} = 34.54$ (2dp)
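
As a sketch, the midpoint and fx columns for the pine cone table can be built in Python like this. The names are invented for the example, and each class is entered using the limits printed in the table; taking the midpoint of the printed limits gives the same value as taking the midpoint of the class boundaries.

```python
# (lower limit, upper limit, frequency) for each pine cone class
pine_cone_classes = [(30, 31, 2), (32, 33, 25), (34, 36, 30), (37, 39, 13)]


def grouped_mean(classes):
    """Estimate of the mean from a grouped table, using class midpoints as x."""
    sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)
    sum_f = sum(f for _, _, f in classes)
    return sum_fx / sum_f


midpoints = [(low + high) / 2 for low, high, _ in pine_cone_classes]
print(midpoints)                                   # [30.5, 32.5, 35.0, 38.0]
print(round(grouped_mean(pine_cone_classes), 2))   # 34.54
```

The result is only an estimate, because every cone in a class is treated as if its length were the class midpoint.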

Median from a grouped table

• We will be using a formula to estimate the median, but first we will try to understand the process.

Length of Pine Cone (mm)    Number of Cones, f    Cumulative Frequency
30-31                       2                     2
32-33                       25                    27
34-36                       30                    57
37-39                       13                    70

• First, find which group it is in:
→ Complete the Cumulative Frequency column
→ 70 ÷ 2 = 35 (for continuous data you just divide by 2)
→ The 35th value will be in the 34-36 group

• Our next step is to consider ‘how far’ the median will be into the group

Median from a grouped table

Length of Pine Cone (mm)    Number of Cones, f    Cumulative Frequency
30-31                       2                     2
32-33                       25                    27
34-36                       30                    57
37-39                       13                    70

The median is the 35th value, in the 34-36 group. This group has class boundaries 33.5 and 36.5.

• We have had 27 observations so far, so there are 35 - 27 = 8 values to go
• There are 30 values in the group
• The median will be 8/30ths into a group with a class width of 3
• 8/30 of 3 = 0.8, so the median is 0.8 into the group
• The lower boundary of the group is 33.5
• 33.5 + 0.8 = 34.3
• So our estimate of the median is 34.3. This process is known as interpolation.

• The formula (most important bit!)

$\text{Median} = \text{Lower Boundary} + \left( \dfrac{\text{Places into Group}}{\text{Group Frequency}} \times \text{Class width} \right)$

$33.5 + \left( \dfrac{8}{30} \times 3 \right) = 34.3$

You must get into the habit of showing workings like this!
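
The interpolation formula can be sketched in Python as below. The function `grouped_median` is a hypothetical helper; it assumes each class is given by its class boundaries (not the printed limits), listed in ascending order. The boundaries 29.5, 31.5 and 39.5 are worked out with the same halfway rule that gives 33.5 and 36.5.

```python
def grouped_median(classes):
    """Estimate the median from (lower boundary, upper boundary, frequency) rows."""
    total = sum(f for _, _, f in classes)
    target = total / 2            # for grouped data, just divide by 2
    seen = 0                      # cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and seen + f >= target:
            places_into_group = target - seen
            class_width = upper - lower
            return lower + (places_into_group / f) * class_width
        seen += f
    raise ValueError("the table contains no observations")


pine_cones = [(29.5, 31.5, 2), (31.5, 33.5, 25), (33.5, 36.5, 30), (36.5, 39.5, 13)]
print(round(grouped_median(pine_cones), 2))  # 34.3
```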
Coding

• You need to understand why data is coded, how to code it and how to un-code it.

• Coding is done before any average is calculated, and is usually used with large data values in order to simplify calculations

• Once data has been coded, averages are calculated

• Then, after the average is worked out, the code is reversed in order to give the actual average

Coding

• Use the following coding to calculate the mean of the data below

110, 120, 130, 140, 150

Coding → $y = x - 100$

x represents the original value
y is the coded value

So this code is telling us to subtract 100 from all the numbers before calculating the mean

10, 20, 30, 40, 50

The mean of these numbers is 30

However, as 100 was subtracted, you must now undo this by adding 100 to get the correct mean
→ So the mean of the original set of data is 30 + 100 = 130

Coding

• Use the following coding to calculate the mean of the data below

110, 120, 130, 140, 150

Coding → $y = \dfrac{x - 100}{10}$

x represents the original value
y is the coded value

So this code is telling us to subtract 100 from all the numbers, and then divide by 10, before calculating the mean

1, 2, 3, 4, 5

The mean of these numbers is 3

We subtracted 100 then divided by 10.
So to undo this we must multiply by 10 then add 100:
→ So the mean of the original set of data is (3 × 10) + 100 = 130
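
Coding and un-coding can be sketched in Python as follows. The function name `coded_mean` and the parameter names `shift` and `scale` are made up for this illustration; they stand for the number subtracted and the number divided by in a code of the form y = (x - shift) ÷ scale.

```python
def coded_mean(data, shift, scale=1):
    """Return (mean of the coded values, mean of the original values)."""
    coded = [(x - shift) / scale for x in data]   # apply the code to every value
    y_bar = sum(coded) / len(coded)               # mean of the coded data
    x_bar = y_bar * scale + shift                 # reverse the code, in reverse order
    return y_bar, x_bar


data = [110, 120, 130, 140, 150]
print(coded_mean(data, shift=100))             # (30.0, 130.0)
print(coded_mean(data, shift=100, scale=10))   # (3.0, 130.0)
```

Both codes give the same mean for the original data, as they should: the code only changes how easy the arithmetic is.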

Coding

Use the following code to estimate the mean of this set of grouped data on the lengths of phone calls.

$y = \dfrac{x - 7.5}{5}$

First the midpoints (x) must be turned into new values (y) using the code.

Time (mins)    Calls, f    Midpoint, x    y       fy
0-5            4           2.5            -1      -4
5-10           15          7.5            0       0
10-15          5           12.5           1       5
15-20          2           17.5           2       4
20-60          0           40             6.5     0
60-70          1           65             11.5    11.5
Total          27                                 16.5

We are now working out the mean, so use the formula for this.

$\bar{y} = \dfrac{\sum fy}{\sum f} = \dfrac{16.5}{27}$

$\bar{y} = 0.61111$

Coding

We calculated a mean of 0.61111 using the code $y = \dfrac{x - 7.5}{5}$

→ So we subtracted 7.5 and then divided by 5

→ We therefore need to multiply by 5 and then add 7.5

→ (0.61111 × 5) + 7.5

→ The mean for the original data ($\bar{x}$) is 10.5555 (10.56 to 2dp)
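
One way to convince yourself that un-coding gives the right answer is to compare it with the mean estimated directly from the midpoints. The Python sketch below does this for the phone call table; all variable names are illustrative.

```python
midpoints = [2.5, 7.5, 12.5, 17.5, 40, 65]   # x, the class midpoints
calls = [4, 15, 5, 2, 0, 1]                  # f, the frequencies

# Code each midpoint: subtract 7.5, then divide by 5
coded = [(x - 7.5) / 5 for x in midpoints]

sum_fy = sum(f * y for f, y in zip(calls, coded))
y_bar = sum_fy / sum(calls)        # mean of the coded data
x_bar = y_bar * 5 + 7.5            # reverse the code: multiply by 5, then add 7.5

# Mean estimated directly from the midpoints, for comparison
direct = sum(f * x for f, x in zip(calls, midpoints)) / sum(calls)

print(sum_fy, round(y_bar, 5))             # 16.5 0.61111
print(round(x_bar, 2), round(direct, 2))   # 10.56 10.56
```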


Summary
• We have now covered all of chapter 2

• We have seen the 3 measures of location (averages)

• We have seen how to calculate them in tables, using midpoints and interpolation where appropriate

• We have looked at combined means

• We have also used coding in answering questions
