
# GCSE MATHEMATICS Handling data

## Steve Bishop First edition December 2012

Contents

- Handling data checklist (page 3)
- Handling data 1 (page 4)
  - The handling data cycle (page 4)
  - Types of data (page 4)
  - Collecting data (page 5)
  - Displaying the data (page 6)
    - Pictogram (page 6)
    - Bar chart (page 7)
  - Why are the following diagrams misleading? (page 9)
  - How to draw pie charts (page 10)
  - Stem and leaf diagrams (page 11)
- Handling data 2 (page 14)
  - Mean, median and mode (page 14)
  - Using frequency tables (page 16)
  - Grouped frequency tables (page 18)
  - Box and whisker diagrams (page 20)
- Handling data 3 (page 23)
  - Scatterplots (page 23)
  - Correlation (page 24)
  - Lines of best fit (page 26)
  - Frequency polygons (page 29)
  - Two-way tables (page 32)
- Handling data 1 practice questions (page 35)
- Handling data 2 practice questions (page 40)
  - Mean, median, mode and range (page 40)
  - Mean from tables (page 42)
  - Mean from grouped data (page 43)
  - Box and whisker plots (page 44)


## Handling data checklist

Tick the appropriate box for each skill.

| Grade | Skill | Can do | Help! | Can't do |
|---|---|---|---|---|
| F | Calculate the mode from a list of data | | | |
| F | Calculate the range from a list of data | | | |
| F | Collect and sort discrete data into a frequency table | | | |
| F | Draw a simple line graph (such as temperature) | | | |
| E | Calculate the mean from a list of data | | | |
| E | Calculate the median from a list of data | | | |
| E | Choose the best average to use and explain your decision | | | |
| E | Interpret a line graph and bar chart | | | |
| D | Calculate the averages from a frequency table | | | |
| D | Distinguish between qualitative, quantitative, discrete and continuous data | | | |
| D | Collect and sort continuous data into groups within a frequency table | | | |
| D | Draw a stem and leaf diagram | | | |
| D | Draw a pie chart (by calculating the angles) | | | |
| D | Draw a frequency polygon (a line graph for continuous data) | | | |
| D | Plot a scatter graph and the line of best fit | | | |
| D | Draw a histogram (a bar chart for grouped data) | | | |
| D | Interpret a stem and leaf diagram | | | |
| D | Interpret a pie chart (by using the angles) | | | |
| D | Interpret a frequency polygon (a line graph for continuous data) | | | |
| D | Interpret a histogram (a bar chart for grouped data) | | | |
| C | Calculate the averages from a grouped frequency table | | | |
| C | Design a questionnaire and correct any deficiencies | | | |
| C | Write a hypothesis and design a way of testing it | | | |
| C | Compare data using all the averages | | | |
| C | Describe correlation and use the line of best fit | | | |


## Handling data 1

## The handling data cycle

Key words: Discrete, Continuous, Stem and leaf diagram, Stemplot, Data, Pictogram, Pie chart, Bar chart, Tally, Frequency

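## Types of data

Statistics is a branch of mathematics that is concerned with the collection, representation and interpretation of data. There are different types of data:

- Qualitative data: non-numerical data, described in words.
- Quantitative data: numerical data. Quantitative data can be:
  - discrete: data that is counted;
  - continuous: data that is measured.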

Now try these 1

State whether each of the following is qualitative or quantitative data. If quantitative, state whether it is discrete or continuous.

- (a) The number of pupils in a class.
- (b) The colour of cars in a car park.
- (c) The time spent by a motorist waiting at a red traffic light.
- (d) The styles of women's dresses available in a chain store.
- (e) The number of votes received by the candidates in an election.
- (f) The club of each of the members of the England football team.
- (g) The number of players from a club who play football for England.
- (h) The mass of a newborn baby.
- (i) The number of words on a page of a book.
- (j) The duration of a hockey match.


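## Collecting data

| | Name | Transport | Time (mins) |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
| 10 | | | |
| 11 | | | |
| 12 | | | |
| 13 | | | |
| 14 | | | |
| 15 | | | |
| 16 | | | |
| 17 | | | |
| 18 | | | |
| 19 | | | |
| 20 | | | |

| Transport | Tally | Frequency |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
| | | |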
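The tally and frequency columns do a simple counting job. The Python sketch below shows that job applied to a list of survey answers.

- The `answers` list is made up for illustration and is not data from the worksheet.
- The names `frequency_table` and `tally` are hypothetical helpers.
- A tally "gate" of five is drawn as `||||/`, which is only a plain-text stand-in for the crossed gate you would draw by hand.

```python
from collections import Counter

# Hypothetical survey answers (not from the worksheet): how each student travelled.
answers = ["Bus", "Car", "Walk", "Bus", "Train", "Walk", "Bus", "Car", "Walk", "Bus"]

def frequency_table(values):
    """Count how often each value occurs, most frequent first (ties keep first-seen order)."""
    return Counter(values).most_common()

def tally(count):
    """Draw a tally: each group of five as a gate '||||/', then the leftover single strokes."""
    gates, strokes = divmod(count, 5)
    return " ".join(["||||/"] * gates + (["|" * strokes] if strokes else []))

for transport, freq in frequency_table(answers):
    print(f"{transport:<6} {tally(freq):<12} {freq}")
```

The frequencies should add up to the number of people asked, which is a quick check that nobody was missed.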


## Displaying the data

## Pictogram

A pictogram is a very simple-to-read way of presenting data. It is cheerful and it makes a powerful visual impact.

Example: Population of Great Britain, excluding Ireland (figures in millions)

| Year | Population (millions) |
|---|---|
| 1801 | 10.5 |
| 1851 | 21.0 |
| 1901 | 37.0 |

Each face represents 1 million people.

But it is not very accurate (how would you draw 0.1 of a face?), and it can be rather laborious if you are drawing by hand.

Now try this 2: Draw a pictogram to represent the way in which students in your class have travelled to college.


## Bar chart

The bar chart is easier to draw than a pictogram and allows for greater accuracy. It is best drawn on graph paper. Let's say that we have recorded the colours of the shirts of 30 students in a class, with the following results (this is qualitative data):

| Shirt colour | Frequency |
|---|---|
| Red | 15 |
| Green | 5 |
| Blue | 5 |
| Black | 3 |
| White | 2 |

This can be presented in a frequency diagram with the bars either VERTICAL or HORIZONTAL.
[Two bar charts of shirt colour: one with vertical bars (frequency from 0 to 16 on the vertical axis) and one with horizontal bars (frequency from 0 to 16 on the horizontal axis).]

Note:

1. If the data are continuous, the bars should be next to each other. If the data are discrete or qualitative, the bars should be kept separate.
2. Each bar must be of exactly the same width.
3. The frequency scale must go up evenly and must start at 0.
4. Everything should be clearly labelled.
5. The bars can be plain, shaded or coloured.

Now try this 3: Draw a bar chart to represent the way in which students in your class have travelled to college.


Points 2 and 3 above are very important: you are not allowed to mislead the viewer by tampering with the scales and the bar widths. For example, say there was a by-election and the result was:

| Party | Votes |
|---|---|
| Labour | 19,800 |
| Conservative | 14,500 |
| Lib. Dem. | 13,000 |
| Green | 11,000 |

Now look at this chart presenting the result:

[Bar chart of the by-election result, drawn with a wider Labour bar and a vote scale starting at 10,000.]

A first glance suggests that Labour has 4 or 5 times as many votes as Conservative, and that Conservative has far more than Lib. Dem. This is because we have made the Labour bar look bigger by making it wider. And, because we have started our scale at 10,000 votes, the Conservatives' 14,500 is drawn as a bar only 4,500 votes tall and the Lib. Dems' 13,000 as a bar only 3,000 votes tall, so the Conservative bar looks one and a half times as big. THIS IS NOT ALLOWED!


## Why are the following diagrams misleading?

[Six diagrams:
- (a) Sales
- (b) Calcium, comparing milk with other drinks
- (c) Profits
- (d) Calories, comparing Coke, a typical other drink and milk
- (e) Sales
- (f) Cost of a car

The diagrams carry slogans such as "Drink milk, it's better 4U", "Price increases are slowing down!" and "Sales take off". Some of them show years between 1985 and 1991.]

All diagrams should:

- be clearly labelled and titled
- have the scales clearly identified
- have the frequency begin at zero
- have the units given
- have the scales going up in equal amounts.


## How to draw pie charts

Step 1: Put the information in a table.

| Eye colour | Blue | Green | Hazel | Grey | Total |
|---|---|---|---|---|---|
| No. of people | 5 | 2 | 2 | 1 | 10 |

Step 2: Work out how many degrees represent one item. 360° represents 10 people, so 360 ÷ 10 = 36, and 36° represents 1 person.

Step 3: Write down in the table how many degrees there are for each item.

| Eye colour | Blue | Green | Hazel | Grey | Total |
|---|---|---|---|---|---|
| No. of people | 5 | 2 | 2 | 1 | 10 |
| No. of degrees | 5 × 36 = 180 | 2 × 36 = 72 | 2 × 36 = 72 | 1 × 36 = 36 | 360 |

If they are correct, the angles should add up to 360°.

Step 4: Draw a circle and draw on the correct angles.

Step 5: Label the segments.

[Pie chart with segments labelled Blue eyes, Green eyes, Hazel eyes and Grey eyes.]
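The arithmetic in Steps 2 and 3 can be sketched in Python as below. The sketch assumes the frequencies are stored in a dictionary; `pie_angles` is an illustrative name, not something from the worksheet.

```python
def pie_angles(counts):
    """Return the pie chart angle, in degrees, for each category."""
    total = sum(counts.values())
    degrees_per_item = 360 / total                 # Step 2: degrees for one item
    # Step 3: multiply each frequency by the degrees for one item
    return {name: n * degrees_per_item for name, n in counts.items()}

eye_colours = {"Blue": 5, "Green": 2, "Hazel": 2, "Grey": 1}
angles = pie_angles(eye_colours)
print(angles)  # {'Blue': 180.0, 'Green': 72.0, 'Hazel': 72.0, 'Grey': 36.0}
print("Check, angles add up to:", sum(angles.values()))
```

When the total does not divide 360 exactly, the angles come out as decimals. You would round them to the nearest degree before drawing.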

Now try this 4: Draw a pie chart to represent the way in which students in your class have travelled to college.


## Stem and leaf diagrams

Stem and leaf diagrams are also known as stemplots. They are a useful way of displaying information.

Example: These are the results for a module test for 10 students:

12 23 34 35 37 55 56 57 68 68

We can display this on a stem and leaf diagram:

| Stem | Leaves |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 5 7 |
| 4 | |
| 5 | 5 6 7 |
| 6 | 8 8 |

L = 12, H = 68, n = 10. Key: 3 | 4 represents 34 marks.

There are a number of basic elements to a stemplot: stem, levels and leaves.

- The left-hand column (1 to 6) is the stem. Each row is a level: for example, the row with stem 2 is the 20 level.
- The digits on the right are the leaves. For example, the 7 on the 50 level indicates the data value 57.
- n indicates the total number of data items.
- The key enables you to translate the level and leaf into a data value.

L indicates the lowest value and H the highest value. (The difference between H and L is the range.) Note that at each level the leaves are ordered, increasing as they move away from the stem. This makes it easier to find the middle (median) value. Note also that repeated data values (here 68) are recorded separately.
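Building a stem and leaf diagram can be sketched as follows. This is a minimal sketch that assumes whole-number marks from 0 to 99, so the tens digit is the stem and the units digit is the leaf. `stem_and_leaf` is a hypothetical helper name.

```python
def stem_and_leaf(data):
    """Map each stem (tens digit) to its ordered leaves (units digits)."""
    values = sorted(data)                       # ordering the data orders the leaves
    stems = {}
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        stems[stem] = []                        # keep empty levels, such as the 40 level
    for v in values:
        stems[v // 10].append(v % 10)
    return stems

marks = [12, 23, 34, 35, 37, 55, 56, 57, 68, 68]
diagram = stem_and_leaf(marks)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print(f"L = {min(marks)}, H = {max(marks)}, n = {len(marks)}   Key: 3 | 4 represents 34 marks")
```

Sorting first is what keeps the leaves increasing away from the stem. It is also why the repeated 68 appears twice on the 60 level.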


Now try this 5: Complete the stem and leaf plot for the following data:

10 11 12 14 21 22 24 45 45 47 48 55 56

| Stem | Leaves |
|---|---|
| 1 | |
| 2 | |
| 3 | |
| 4 | |
| 5 | |

L = ____, H = ____, n = ____. Key: 1 | 1 represents 11.

Using stem and leaf plots to compare data (higher tier)

A stem and leaf plot helps us to compare two different but related data sets visually. To do this we need to construct a back-to-back stem and leaf plot.

Example: In a module 2 test, the same 10 students as in the previous example scored the following results:

25 33 40 42 43 45 56 57 57 69

Add this data to the first stem and leaf diagram (from the example above) to form a back-to-back stem plot:

| Module 2 leaves | Stem | Module 1 leaves |
|---:|:---:|:---|
| | 1 | 2 |
| 5 | 2 | 3 |
| 3 | 3 | 4 5 7 |
| 5 3 2 0 | 4 | |
| 7 7 6 | 5 | 5 6 7 |
| 9 | 6 | 8 8 |

n = 10 on each side. Key: 2 | 3 represents 23 marks scored (module 1); 5 | 2 represents 25 marks scored (module 2).

The back-to-back stem and leaf plot uses one stem but has two sets of leaves, one to the right and one to the left. Remember the leaves are ordered so that larger leaves are further away from the stem.


By looking at the back-to-back stem and leaf plot, we can see that the module 2 test was probably easier (or that the students were better prepared), as students generally scored better marks. The lowest mark rose from 12 to 25, and four students scored in the 40s.

Now try this 6 (higher tier): The table below gives the annual rates of inflation for 10 countries in 1992 and 1991. Complete the back-to-back stem and leaf plot and comment on your results.

| Country | 1992 (% change) | 1991 (% change) |
|---|---|---|
| UK | 4.1 | 8.9 |
| Australia | 1.5 | 6.9 |
| Canada | 1.6 | 3.9 |
| France | 2.9 | 3.4 |
| Germany | 4.0 | 2.8 |
| Italy | 6.1 | 6.5 |
| Japan | 2.2 | 3.3 |
| Netherlands | 4.1 | 2.8 |
| Spain | 5.5 | 6.6 |
| USA | 2.6 | 5.7 |

Finished early? Look at the Handling data 1 practice questions on page 35.


## Handling data 2

## Mean, median and mode
Key words: Mean, Median, Mode, Range, Frequency

There are three types of average: mean, mode and median.

Mode: the value that occurs most often.

Median: the value in the middle, when all the numbers have been put into numerical order.

Mean: the one that we normally think of when we are asked to find the average.

Mean = total of the scores ÷ number of scores

This can be expressed mathematically as

$$\bar{x} = \frac{\sum x}{n}$$

where $\bar{x}$ represents the mean of $x$, $\sum x$ represents the sum of all the $x$s and $n$ is the number of values.

Range: The range is also an important piece of information. It tells us how spread out the data are.

Range = largest value − smallest value

Example: Here are the scores that 5 people get for a test: 6 7 5 6 6

Find (a) the mean, (b) the mode, (c) the median score and (d) the range.

(a) Mean = $\frac{6 + 7 + 5 + 6 + 6}{5} = \frac{30}{5} = 6$

(b) Mode: 6 occurs the most (3 times), so the mode is 6.

(c) Median: Step 1: put the scores in numerical order: 5, 6, 6, 6, 7. Step 2: identify the middle (3rd) number: 6. So the median is 6.

(d) The range is 7 − 5 = 2.
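The four calculations in the example can be written out as small Python functions. This is a sketch with illustrative names.

- `value_range` is used instead of `range` so as not to hide Python's built-in `range`.
- The even-length median rule (halfway between the two middle numbers) goes beyond the example, which has an odd number of scores.

```python
from collections import Counter

def mean(xs):
    return sum(xs) / len(xs)                     # total of the scores / number of scores

def median(xs):
    s = sorted(xs)                               # Step 1: numerical order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                            # Step 2: the middle number
    return (s[mid - 1] + s[mid]) / 2             # even count: halfway between the two middle numbers

def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]   # a data set can have more than one mode

def value_range(xs):
    return max(xs) - min(xs)                     # largest - smallest

scores = [6, 7, 5, 6, 6]
print(mean(scores), modes(scores), median(scores), value_range(scores))  # 6.0 [6] 6 2
```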

Now try these 7

1. Five other people take the same test and their scores are: 5 3 5 5 7

   Find (a) the mean, (b) the mode, (c) the median score and (d) the range.

2. The following are the midday temperatures over a week: 23 °C, 24 °C, 25 °C, 26 °C, 20 °C, 23 °C, 27 °C

   Find (a) the mean, (b) the mode, (c) the median temperature and (d) the range.

3. Find the mean of these numbers: 200 400 200 100 100

   - Add five to each of the numbers and find the new mean: 205 405 205 105 105
   - Subtract 10 from each of the original numbers and find the mean: 190 390 190 90 90
   - Add 23 to each of the original numbers. Can you find the mean without doing a calculation?
   - Multiply the original numbers by 2 and find the mean: 400 800 400 200 200


## Using frequency tables

When we have a large number of values, it is easier to put the data into a frequency table. For example, if we surveyed 120 houses and asked how many people under the age of 16 lived in each house, we might get the following results:

| Number under 16 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Frequency | 37 | 23 | 34 | 18 | 5 | 2 | 1 |

To find the range from this data, we can see that the number of children ranged from 0 to 6, so the range is 6 − 0 = 6.

To find the mode, we look for the highest frequency; here it is 37. So more houses have 0 children than any other number, and the mode is 0.

Finding the mean is more complicated. We need to find the total number of children and the total number of houses. To find the total number of houses, we add up all the frequencies: Σf = 120. The best way to find the total number of children is to redraw the table vertically and add another column:

| Number of children, x | Frequency (houses), f | f × x (children) |
|---|---|---|
| 0 | 37 | 37 × 0 = 0 |
| 1 | 23 | 23 × 1 = 23 |
| 2 | 34 | 34 × 2 = 68 |
| 3 | 18 | 18 × 3 = 54 |
| 4 | 5 | 5 × 4 = 20 |
| 5 | 2 | 2 × 5 = 10 |
| 6 | 1 | 1 × 6 = 6 |
| Totals | Σf = 120 | Σfx = 181 |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{181}{120} \approx 1.51 \text{ children per house}$$
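The same three results can be read from a frequency table in code. In this sketch the table is assumed to be a dictionary mapping each value x to its frequency f. `frequency_table_stats` is an illustrative name. If two values tie for the highest frequency, `max` simply returns the first one.

```python
def frequency_table_stats(table):
    """Range, mode and mean from a frequency table that maps each value x to its frequency f."""
    present = [x for x, f in table.items() if f > 0]
    value_range = max(present) - min(present)
    mode = max(table, key=table.get)                  # value with the highest frequency
    total_f = sum(table.values())                     # Σf: total number of houses
    total_fx = sum(x * f for x, f in table.items())   # Σfx: total number of children
    return value_range, mode, total_fx / total_f, total_f, total_fx

under_16 = {0: 37, 1: 23, 2: 34, 3: 18, 4: 5, 5: 2, 6: 1}
value_range, mode, mean, total_f, total_fx = frequency_table_stats(under_16)
print(f"range = {value_range}, mode = {mode}, Σf = {total_f}, Σfx = {total_fx}, mean = {mean:.2f}")
# range = 6, mode = 0, Σf = 120, Σfx = 181, mean = 1.51
```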

Now try these 8

1. The numbers of children per family on a housing estate were recorded as follows:

| Number of children, x | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Number of families, f | 12 | 15 | 5 | 2 | 1 |

Find (a) the range, (b) the mode and (c) the mean number of children per family.

2. An agricultural researcher counted the number of peas in a pod in a certain strain as follows:

| Number of peas | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|
| Number of pods | 5 | 5 | 20 | 35 | 25 | 10 |

Find (a) the range, (b) the mode and (c) the mean number of peas in a pod.


## Grouped frequency tables

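Grouped frequency tables are used when a lot of data has to be recorded.

Example: The lifetimes, in hours, of 20 bulbs were recorded:

| Hours | No. of bulbs |
|---|---|
| 0-400 | 2 |
| 400-800 | 5 |
| 800-1200 | 7 |
| 1200-1600 | 5 |
| 1600-2000 | 1 |

The problem that we have here is that we cannot multiply 2 × "0-400", because we don't know the exact life in hours of each bulb. So we have to estimate each lifespan by taking the midpoint of the group and multiplying it by the number of bulbs. The best way to do this is to redraw the table vertically with some extra columns:

| Hours | Midpoint, x | No. of bulbs, f | f × x |
|---|---|---|---|
| 0-400 | 200 | 2 | 400 |
| 400-800 | 600 | 5 | 3000 |
| 800-1200 | 1000 | 7 | 7000 |
| 1200-1600 | 1400 | 5 | 7000 |
| 1600-2000 | 1800 | 1 | 1800 |
| Totals | | Σf = 20 | Σfx = 19200 |

To find the midpoint, add the first and last values of the group and divide by 2; for example, $\frac{0 + 400}{2} = 200$.

$$\bar{x} \approx \frac{\sum fx}{\sum f} = \frac{19200}{20} = 960 \text{ hours}$$

This is an estimate of the mean, because the midpoints stand in for the exact lifetimes.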
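The midpoint method can be sketched in Python. Each group is assumed to be stored as a `(lower, upper, frequency)` tuple, and `estimated_mean` is an illustrative name. As in the text, the result is only an estimate, because every bulb in a group is treated as lasting exactly the midpoint time.

```python
def estimated_mean(groups):
    """Estimate the mean of grouped data, using each group's midpoint for its unknown values."""
    total_f = 0
    total_fx = 0
    for lower, upper, f in groups:
        midpoint = (lower + upper) / 2        # add the first and last value, divide by 2
        total_f += f
        total_fx += midpoint * f
    return total_fx / total_f

bulbs = [(0, 400, 2), (400, 800, 5), (800, 1200, 7), (1200, 1600, 5), (1600, 2000, 1)]
print(estimated_mean(bulbs))  # 960.0
```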


Now try these 9

1. Andrew did a survey at the seaside for his science coursework. He measured the lengths of 55 pieces of seaweed. The results of the survey are shown in the table.

| Length of seaweed (L cm) | Frequency |
|---|---|
| 0 < L ≤ 20 | 2 |
| 20 < L ≤ 40 | 22 |
| 40 < L ≤ 60 | 13 |
| 60 < L ≤ 80 | 10 |
| 80 < L ≤ 100 | 5 |
| 100 < L ≤ 120 | 2 |
| 120 < L ≤ 140 | 1 |

Andrew needs to calculate an estimate for the mean length of the pieces of seaweed. Work out an estimate for the mean length of the pieces of seaweed.

2. (a) 10, 6, 17, 16, 3.1, 12.8, 18.1, 10.8, 8.3, 15.7, 14, 3.7, 11.5, 9.4, 21.7, 8

| Time (t seconds) | Tally | Frequency |
|---|---|---|
| 0 ≤ t < 5 | | |

(b) Write down the modal class interval.

(c) Calculate the mean time.


## Box and whisker diagrams

A box and whisker diagram (sometimes called a boxplot) is another way of displaying the same data that we find on a cumulative frequency curve. To draw a box and whisker diagram you need the following information:

- lowest value
- lower quartile
- median
- upper quartile
- highest value

This information is then drawn on a diagram as follows:

[Diagram: above a scale, a box is drawn from the lower quartile to the upper quartile with a line across it at the median, and whiskers are drawn out to the lowest value and the highest value.]

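Example: The midday temperatures (in °C) in 11 cities around the world are:

13 12 5 34 8 10 11 4 25 23 36

Draw a boxplot to illustrate this data.

First, put the data into order and then locate the median, lower quartile (LQ) and upper quartile (UQ). With 11 values, the median is the 6th value, the LQ is the 3rd value and the UQ is the 9th value:

4 5 8(LQ) 10 11 12(median) 13 23 25(UQ) 34 36

So LQ = 8, median = 12 and UQ = 25. The lowest value is 4 and the highest is 36.

These values are then used to draw the boxplot:

[Boxplot of the midday temperatures on a temperature scale up to 40 °C: whiskers from 4 to 36, box from 8 to 25, median line at 12.]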
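The five values needed for a boxplot can be found in code. The worksheet locates the quartiles by counting along the ordered list. One common way to automate that is assumed here:

- use positions (n + 1)/4 for the LQ, (n + 1)/2 for the median and 3(n + 1)/4 for the UQ;
- interpolate when a position falls between two values.

Other textbooks and calculators use slightly different quartile conventions, so answers can differ a little for some data sets. The function names are illustrative.

```python
def value_at(sorted_vals, position):
    """Value at a 1-based position; a fractional position interpolates between neighbours."""
    whole = int(position)
    frac = position - whole
    if frac == 0:
        return sorted_vals[whole - 1]
    lo, hi = sorted_vals[whole - 1], sorted_vals[whole]
    return lo + frac * (hi - lo)

def five_number_summary(data):
    """(lowest, LQ, median, UQ, highest) using the (n + 1) position convention."""
    s = sorted(data)
    n = len(s)
    if n < 3:
        raise ValueError("need at least 3 values")
    return (s[0], value_at(s, (n + 1) / 4), value_at(s, (n + 1) / 2),
            value_at(s, 3 * (n + 1) / 4), s[-1])

temps = [13, 12, 5, 34, 8, 10, 11, 4, 25, 23, 36]
print(five_number_summary(temps))  # (4, 8, 12, 25, 36)
```

Python's standard `statistics.quantiles(data, n=4)`, with its default `method='exclusive'`, is also based on (n + 1) positions. It should therefore give the same quartiles as this sketch.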


Now try these 10

1. The following boxplot shows the class scores on a GCSE Maths mock paper.

[Boxplot of class scores on a scale from 10 to 60.]

Find:

- the median mark
- the lower quartile
- the upper quartile
- the inter-quartile range
- the highest mark
- the lowest mark
- the range of the marks

2. (a) The following are the shoe sizes of eleven children: 1 6 2 5 5 6 4

   Draw a boxplot to illustrate the data using the scale below:

   (b) The following are the times in minutes taken to evacuate a building over 15 different fire tests: 5 8 6 8 6 4 5 6 6 7 4 7 7 5 8

   Draw a boxplot to illustrate the data. (Draw your own scale.)


Now try these 11

The stem and leaf diagram shows the ages of students in a maths group. It has the levels 0, 10, 20, 30, 40, 50 and 60 and the leaves 6 7 7 8 8 9 9 9 9 1 1 4 5 2 6, with L = ____, H = ____ and n = ____ to be filled in.

1. How many students are there in the class?
2. How old is the oldest student?
3. How old is the youngest student?
4. What is the range?
5. What is the modal age?
6. What is the median age?
7. Find the mean age.
8. Draw a box plot to illustrate the data.

Finished early? Can you find 4 numbers that have a mode of 1, a median of 2 and a mean of 3? Now have a look at the Handling data 2 practice questions on page 40.


## Handling data 3

## Scatterplots
Scatterplots (sometimes called scattergraphs, scattergrams or scatter diagrams) are ways of displaying two variables. They can be used to see if there is some link between the two sets of data.

Key words: Scatterplot, Scatter diagram, Scattergraph, Variable, Correlation

Match each person to the correct number on the scattergraph.

Example: A survey was carried out by a group of students in which the height and weight of each student was measured. The results were recorded in pairs (e.g. the student with height 164 cm weighed 58.2 kg).

| Height (cm) | Mass (kg) |
|---|---|
| 164 | 58.2 |
| 152 | 50.8 |
| 173 | 60.3 |
| 158 | 56.0 |
| 177 | 76.2 |
| 173 | 64.2 |
| 179 | 68.8 |
| 168 | 60.5 |

In order to display this data on a scatter graph, two axes are drawn, one for the heights and one for the weights. (It does not really matter which is which but, as a general rule, the first set of data is recorded along the horizontal axis and the second along the vertical axis.) Each point is plotted using the paired data as the coordinates. For the student with height 164 cm and weight 58.2 kg, the coordinates are (164, 58.2).

[Scatter graph showing height/mass of students: height/cm from 140 to 180 on the horizontal axis, mass/kg from 40 to 80 on the vertical axis.]

## Correlation

Looking at the scatterplot above, we can see that there is a link between a person's height and their weight: in general, the taller someone is, the heavier they are. If there is a link, we say that there is a correlation. There are three types of correlation: positive, zero and negative.

[Diagrams: scatter graphs showing positive, zero and negative correlation.]

Positive: When there is a positive correlation, as the x value increases so too does the y value. Examples of positive correlation might be:
- the number of police cameras and the number of speeding convictions;
- height and foot size;
- the amount of time spent revising and the marks in a maths exam.

Zero: Zero correlation suggests that there may well be no link or relationship between the two variables. Examples are IQ and height, or the amount of food eaten and the marks in a maths exam.

Negative: If the y value decreases as the x value increases, then there is a negative correlation. Examples of negative correlation might be the age of a computer and its value, or the amount of time spent watching TV and the marks in maths coursework.

Strong, moderate and weak correlation: The correlation can also be strong, moderate or weak. The diagrams below give examples of each for a negative correlation:

[Diagrams: scatter graphs showing strong, moderate and weak negative correlation.]


Now try these 12

1. The diagram shows three different types of scatter graphs.

[Three scatter graphs.]

Describe each of the different kinds of correlation. The diagrams represent these three situations:

- (a) the age of cars plotted against their value
- (b) the number of rooms in a house plotted against the value of the house
- (c) the age of adults plotted against their weight

Which diagrams represent each of the situations?

2. For each of the following, decide if there is a correlation and what sort of correlation it might be:

- (a) Number of people in a lift and the weight of the lift
- (b) Shoe size of students and the number of brothers and sisters they have
- (c) Marks in a maths test and marks in a science test
- (d) Speed of a car and the time it takes to stop
- (e) Speed of a car and the time taken to travel 10 km
- (f) Temperature of the room and the time taken for an ice cube to melt
- (g) The height of a student and the time taken to do maths homework
- (h) The time taken to revise for a test and the test results


Now try these 13: Draw scatter graphs for the following data. State the type of correlation (none, positive, negative) and give some indication of the degree of correlation (strong, moderate, weak).

(a)

| Mark on module 1 | Mark on module 2 |
|---|---|
| 15 | 22 |
| 20 | 34 |
| 14 | 50 |
| 5 | 20 |
| 24 | 66 |
| 10 | 32 |

(b)

| Age (years) | Price of car (£) |
|---|---|
| 2 | 3250 |
| 5 | 1500 |
| 10 | 220 |
| 4 | 2400 |
| 8 | 1200 |
| 9 | 900 |

(c)

| Shoe size | Handspan |
|---|---|
| 5 | 17 |
| 9 | 21 |
| 7 | 20 |
| 6 | 20 |
| 5 | 18 |
| 10 | 22 |

## Lines of best fit

If the relationship between the two variables is linear (i.e. the points lie roughly along a straight line), then a line of best fit can be drawn. This is done by drawing a straight line that follows the trend of the points, passing as close to as many points as possible, with roughly the same number of points above the line as below it.

Example: A class sat two maths tests, one on algebra and another on handling data. The results are shown below:

| Algebra | 67 | 76 | 78 | 93 | 65 | 61 | 56 | 38 | 54 | 72 | 84 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Handling data | 52 | 56 | 72 | 84 | 43 | 78 | 67 | 34 | 62 | 77 | 84 |

To draw the line of best fit we must first plot the data on a scattergraph and then draw the line of best fit by eye.
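The worksheet draws the line of best fit by eye. As a numerical cross-check only, the sketch below swaps in a different technique, a least-squares line. That is the straight line that minimises the sum of the squared vertical distances from the points. It is fitted here to the Algebra and Handling data marks above.

Least squares is not the worksheet's method, and `least_squares_line` is an illustrative name. One property worth noticing is that this line always passes through the mean point (x̄, ȳ).

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = y_bar - gradient * x_bar       # makes the line pass through (x_bar, y_bar)
    return gradient, intercept

algebra       = [67, 76, 78, 93, 65, 61, 56, 38, 54, 72, 84]
handling_data = [52, 56, 72, 84, 43, 78, 67, 34, 62, 77, 84]
m, c = least_squares_line(algebra, handling_data)
print(f"gradient = {m:.2f}, intercept = {c:.1f}")
print(f"predicted Handling data mark for 80 in Algebra: {m * 80 + c:.0f}")
```

With these marks, the least-squares line predicts roughly 74 Handling data marks for an Algebra score of 80. That is close to the 75 marks read from the line drawn by eye in the worksheet's example; lines drawn by eye will always vary a little from person to person.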


We can then use the line of best fit to predict someone's score in one paper if we know their score in the other. For example, if a student scored 80 in the Algebra test, what would their score be for Handling data? Go to 80 on the Algebra axis, go up until you hit the line and read off the corresponding Handling data value (75 marks).

Now try these 14

1. The scatter graph shows the height and mass of students. Draw a line of best fit on the graph. Use the line of best fit to estimate the height of someone who has a mass of 60 kg.


[Scatter graph relating lemon imports to road fatalities.]

What's happening here? What type of correlation is it? Does it mean that the more lemons that are imported, the fewer road fatalities there are?


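## Frequency polygons

Ungrouped data

For ungrouped data, each frequency is plotted against its value and the points are joined with straight lines.

Example: The results of a survey of 100 households are given in the table.

| Number of people in household | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Frequency | 11 | 28 | 21 | 25 | 10 | 5 |

Draw a frequency polygon to represent these data.

[Frequency polygon to show the number of people in a household: frequency (0 to 30) on the vertical axis against number of people in household (0 to 7) on the horizontal axis.]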

Now try this 15: The frequency distribution of the heights of some students is shown below. Draw a frequency polygon to illustrate the data.

| Height (cm) | 130- | 140- | 150- | 160- | 170- |
|---|---|---|---|---|---|
| Frequency | 1 | 6 | 13 | 10 | 2 |

NB 140- means 140 or more but less than 150.


Grouped data

Similarly, for grouped data, the frequencies are plotted against the mid-points of the class intervals and the points are joined with straight lines.

Example: The following table shows the heights of 80 people grouped into class intervals.

| Height, h (cm) | 150 < h ≤ 160 | 160 < h ≤ 170 | 170 < h ≤ 180 | 180 < h ≤ 190 | 190 < h ≤ 200 | 200 < h ≤ 210 |
|---|---|---|---|---|---|---|
| Frequency (no. of people) | 4 | 7 | 15 | 47 | 6 | 1 |

In order to draw a frequency polygon, we first need to find the mid-points of these intervals. For example, to find the mid-point of 150 < h ≤ 160, add the two values together and divide by 2:

$$\frac{1}{2}(150 + 160) = \frac{310}{2} = 155 \text{ cm}$$

Draw a new table of results, using the mid-points:

| Height mid-point (cm) | 155 | 165 | 175 | 185 | 195 | 205 |
|---|---|---|---|---|---|---|
| Frequency | 4 | 7 | 15 | 47 | 6 | 1 |

We can then draw the frequency polygon, plotting the frequencies against the height mid-points, as shown below.

[Frequency polygon: frequency (0 to 50) against height (cm, from 140 to 220).]
This frequency polygon has been completed by joining the first point to (145, 0) and the last point to (215, 0).
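A grouped frequency table can be turned into the list of points for a frequency polygon. The sketch below places each frequency at its class mid-point. As in the example, it closes the polygon at zero at the mid-points of the empty classes on either side. It assumes classes of equal width; `frequency_polygon` is a hypothetical helper.

```python
def frequency_polygon(groups):
    """Points to plot for (lower, upper, frequency) classes of equal width.

    Each frequency is placed at its class mid-point, and the polygon is closed
    at zero at the mid-point of the empty class before the first and after the last.
    """
    width = groups[0][1] - groups[0][0]
    points = [((lo + hi) / 2, f) for lo, hi, f in groups]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

heights = [(150, 160, 4), (160, 170, 7), (170, 180, 15),
           (180, 190, 47), (190, 200, 6), (200, 210, 1)]
print(frequency_polygon(heights))
```

For the heights data, the first point is (145.0, 0) and the last is (215.0, 0), matching the closing points described above.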


Example: The mock examination results in Mathematics for two GCSE groups in two successive years are recorded in the table below.

| Mark | Group 1 (% frequency) | Group 2 (% frequency) |
|---|---|---|
| 1-20 | 5 | 7 |
| 21-40 | 12 | 26 |
| 41-60 | 35 | 48 |
| 61-80 | 28 | 9 |
| 81-100 | 20 | 10 |

(i) Draw the frequency polygon for each group.

(ii) Comment on the mock examination papers, assuming that the ability of the pupils was the same in each group.

Solution

(i) In this example, to draw the frequency polygons, the percentage frequencies could be plotted against the class mid-points, which are 10.5, 30.5, 50.5, 70.5 and 90.5. It is not really necessary to plot the points to such a high degree of accuracy, so the values 10, 30, 50, 70 and 90 can be used.

(ii) Group 2 appears to have been given a more difficult examination paper than Group 1, because a smaller percentage of the pupils in Group 2 obtained high marks in the examination. [The average mark in Group 2 was lower than in Group 1.]
[Frequency polygons for Group 1 and Group 2: percentage frequency (0 to 60) against mark (0 to 100).]
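The bracketed remark in (ii) can be backed up with numbers. The estimated mean mark of each group comes from the class mid-points, worked out in the same way as for grouped frequency tables. Because these frequencies are percentages adding up to 100, dividing by their sum gives the estimated mean. The variable names are illustrative.

```python
# Mid-points of the mark classes 1-20, 21-40, 41-60, 61-80 and 81-100
midpoints = [10.5, 30.5, 50.5, 70.5, 90.5]
group_1 = [5, 12, 35, 28, 20]   # % frequencies
group_2 = [7, 26, 48, 9, 10]

def estimated_mean_mark(percentages):
    """Estimated mean: sum of (mid-point x frequency) divided by the total frequency."""
    return sum(m * p for m, p in zip(midpoints, percentages)) / sum(percentages)

print(round(estimated_mean_mark(group_1), 1), round(estimated_mean_mark(group_2), 1))  # 59.7 48.3
```

Group 2's estimated mean is about 11 marks lower, which is consistent with the comment that its paper was harder.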

Now try these 16

2. A teacher noted the absence rates of her maths class on Mondays and Fridays. The results are given in the table below.

| Number absent from class | Monday frequency | Friday frequency |
|---|---|---|
| 0 | 3 | 0 |
| 1 | 6 | 2 |
| 2 | 6 | 2 |
| 3 | 7 | 3 |
| 4 | 4 | 4 |
| 5 | 4 | 6 |
| 6 | 3 | 3 |
| 7 | 0 | 0 |
| 8 | 0 | 7 |
| 9 | 0 | 3 |
| 10 | 0 | 2 |

(i) Draw the frequency polygon for each day, using the same axes.

(ii) Comment on the absence rates of the two days.

## Two-way tables

Completing two-way tables

This is a typical (incomplete) two-way table:

| | Tea bags | Packet tea | Instant tea | Total |
|---|---|---|---|---|
| 50 g | 2 | 0 | 5 | |
| 100 g | 35 | 20 | | 60 |
| 200 g | 15 | | | |
| Total | | 25 | | 100 |

How many people buy 100 g tea bags? How many buy 100 g packet tea?

The table is part of a GCSE question. It reads: "Bob carried out a survey of 100 people who buy tea. He asked them about the tea they buy most. The two-way table gives some information about his results. Complete the two-way table."

To complete the table, we have to look at rows and columns that have only one missing figure.

Look at the first column. How many people in total have tea bags? 2 + 35 + 15 = 52.

Look at the second column. How many have 200 g packet tea? 25 (in total) − (20 + 0) = 5.

Looking at the first row, we can now find out how many in total use 50 g tea: 2 + 0 + 5 = 7.

Looking at the second row, we can find out how many have 100 g instant tea: 60 − (35 + 20) = 60 − 55 = 5.

We can now find the missing total in the end column: 100 − (60 + 7) = 33. And the missing total in the bottom row: 100 − (52 + 25) = 23.

This only leaves the 200 g instant tea figure to find. We can do this in two ways, which is a good way to check the figures:

- From the column: 23 − (5 + 5) = 13
- From the row: 33 − (15 + 5) = 13

The answers agree. The completed table will then look like this:

| | Tea bags | Packet tea | Instant tea | Total |
|---|---|---|---|---|
| 50 g | 2 | 0 | 5 | 7 |
| 100 g | 35 | 20 | 5 | 60 |
| 200 g | 15 | 5 | 13 | 33 |
| Total | 52 | 25 | 23 | 100 |
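The rule "look for a row or column with only one missing figure" can be repeated until the table is full. The sketch below does exactly that, with these assumptions:

- missing cells are `None`;
- the last row and last column hold the totals;
- the given figures are consistent.

`complete_two_way` and `totals_agree` are hypothetical names. The final check mirrors the worksheet's "the answers agree" step.

```python
def complete_two_way(table):
    """Fill the gaps (None) in a two-way table whose last row and last column hold totals.

    Repeatedly finds a row or column with exactly one gap and works that gap out.
    """
    n_rows, n_cols = len(table), len(table[0])
    # Each "line" is a row or a column; its last cell is that line's total.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    progress = True
    while progress:
        progress = False
        for line in lines:
            values = [table[r][c] for r, c in line]
            gaps = [i for i, v in enumerate(values) if v is None]
            if len(gaps) != 1:
                continue                                   # need exactly one missing figure
            i = gaps[0]
            known_parts = sum(v for v in values[:-1] if v is not None)
            r, c = line[i]
            if i == len(values) - 1:
                table[r][c] = known_parts                  # missing total: add up the parts
            else:
                table[r][c] = values[-1] - known_parts     # missing part: total minus the rest
            progress = True
    return table


def totals_agree(table):
    """Check every row and column adds up to its total (assumes no gaps are left)."""
    rows_ok = all(sum(row[:-1]) == row[-1] for row in table)
    cols_ok = all(sum(col[:-1]) == col[-1] for col in zip(*table))
    return rows_ok and cols_ok


# Rows: 50 g, 100 g, 200 g, Total.  Columns: tea bags, packet tea, instant tea, Total.
tea = [
    [2,    0,    5,    None],
    [35,   20,   None, 60],
    [15,   None, None, None],
    [None, 25,   None, 100],
]
result = complete_two_way(tea)
for row in result:
    print(row)
print("Totals agree:", totals_agree(result))
```

Unlike a careful student, this sketch does not notice if the given figures contradict each other. That is one reason for running the `totals_agree` check at the end.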


Now try these

1. How many males went to France? How many males went to Spain? How many males were there in total? Complete the two-way table.

2.

## Handling data 1 practice questions

1.

2.


3.


4.

5.


6.


7.

8.


## Handling data 2 practice questions

Mean, median, mode and range


1.

2.

3.
