
Representation and Summary of Data - Dispersion
• The last chapter was based on calculating averages from sets of data
• The average alone does not give the full picture, though
• This chapter looks at measures of dispersion: how spread out the data is
Range and Quartiles
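• The Quartiles, Q1, Q2 and Q3, split the data into 4 parts, with 25% of the data in each:

Lowest Value | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Highest Value

• For discrete data, find the position of each quartile:
  Q1: n/4
  Q2: n/2
  Q3: 3n/4
  Remember, if the result is whole, you need the midpoint of that term and the term above. If not, round up and find the corresponding term.

• For continuous data, use interpolation (as with the median from chapter 2):

$$\text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW}$$

  In the worked examples that follow, LB is the lower boundary of the class containing the required term, PL is how many places into that class the term lies, GF is the frequency of that class and CW is its width.

• Inter-quartile range = Upper Quartile – Lower Quartile = Q3 – Q1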
3A
Calculate the Range and Inter-quartile range of the following data.
7, 9, 4, 6, 3, 2, 8, 1, 10, 15, 11

Putting the data in order:
1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 15

Range = 15 – 1 = 14

Lower Quartile: n/4 = 11/4 = 2.75, so take the 3rd term: Q1 = 3

Upper Quartile: 3n/4 = 33/4 = 8.25, so take the 9th term: Q3 = 10

Inter-quartile Range = 10 – 3 = 7
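The rule for discrete data can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `quartile_discrete` and its parameter `k` (1, 2 or 3 for Q1, Q2, Q3) are assumptions made for this example.

```python
def quartile_discrete(data, k):
    """Return Q1, Q2 or Q3 (k = 1, 2, 3) using the slide's rule for discrete data."""
    values = sorted(data)
    n = len(values)
    # Split k*n/4 into its whole part and remainder to test whether it is whole.
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # Whole position: midpoint of that term and the term above (terms are 1-based).
        return (values[whole - 1] + values[whole]) / 2
    # Not whole: round up to term number whole + 1, which is index `whole`.
    return values[whole]


data = [7, 9, 4, 6, 3, 2, 8, 1, 10, 15, 11]
data_range = max(data) - min(data)
q1 = quartile_discrete(data, 1)
q3 = quartile_discrete(data, 3)
iqr = q3 - q1
print(data_range, q1, q3, iqr)
```

Working with `divmod(k * n, 4)` keeps the "is the position whole?" test in integer arithmetic, which avoids any floating-point comparison.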
Rebecca records the number of CDs in the collections of students in her year. The results are shown in the table below. Calculate the Inter-quartile range (IQR).

| No. of CDs, x | No. of students, f | Cumulative frequency |
|---|---|---|
| 35 | 3 | 3 |
| 36 | 17 | 20 |
| 37 | 29 | 49 |
| 38 | 34 | 83 |
| 39 | 12 | 95 |

The data is discrete, so:

Q1: n/4 = 95/4 = 23.75, so take the 24th term: Q1 = 37

Q3: 3n/4 = 285/4 = 71.25, so take the 72nd term: Q3 = 38

IQR = Q3 – Q1 = 38 – 37 = 1
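Reading a term off the cumulative frequency column can be sketched as follows. The helper names `term_at` and `quartile_from_table` are hypothetical, introduced only for this illustration, and the same discrete-data rule (midpoint if the position is whole, otherwise round up) is assumed.

```python
def term_at(values, freqs, position):
    """Value of the term at a 1-based position, found via running cumulative frequency."""
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if position <= cumulative:
            return x
    raise ValueError("position is beyond the end of the data")


def quartile_from_table(values, freqs, k):
    """Q1, Q2 or Q3 (k = 1, 2, 3) of discrete data given as a frequency table."""
    n = sum(freqs)
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # Whole position: midpoint of that term and the term above.
        return (term_at(values, freqs, whole) + term_at(values, freqs, whole + 1)) / 2
    # Otherwise round up to the next term.
    return term_at(values, freqs, whole + 1)


cds = [35, 36, 37, 38, 39]
students = [3, 17, 29, 34, 12]
q1 = quartile_from_table(cds, students, 1)
q3 = quartile_from_table(cds, students, 3)
print(q1, q3, q3 - q1)
```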
The length of time spent on the internet each evening by a group of students is shown in the table below. Calculate the Inter-quartile range.

| Time (mins) | No. of students | Cumulative frequency |
|---|---|---|
| 30-31 | 2 | 2 |
| 32-33 | 25 | 27 |
| 34-36 | 30 | 57 |
| 37-39 | 13 | 70 |

The data is continuous, so use interpolation. The class boundaries are 31.5, 33.5 and 36.5.

Q1: n/4 = 70/4 = 17.5th term, which lies in the 32-33 class (LB = 31.5, PL = 17.5 – 2 = 15.5, GF = 25, CW = 2)

$$Q_1 = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 31.5 + \frac{15.5}{25} \times 2 = 32.74$$

Q3: 3n/4 = 210/4 = 52.5th term, which lies in the 34-36 class (LB = 33.5, PL = 52.5 – 27 = 25.5, GF = 30, CW = 3)

$$Q_3 = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 33.5 + \frac{25.5}{30} \times 3 = 36.05$$

IQR = Q3 – Q1 = 36.05 – 32.74 = 3.31
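The interpolation step can be written as a one-line function. This is a sketch under the assumption that the class containing the required term has already been identified by hand; the function name `interpolate` and its parameter names are illustrative, not from the slides.

```python
def interpolate(position, lower_bound, cum_freq_before, group_freq, class_width):
    """LB + (PL / GF) x CW, where PL = position - cumulative frequency before the class."""
    places_in = position - cum_freq_before
    return lower_bound + places_in / group_freq * class_width


n = 70
# Q1 is the 17.5th term, in the 32-33 class (boundaries 31.5 to 33.5).
q1 = interpolate(n / 4, lower_bound=31.5, cum_freq_before=2, group_freq=25, class_width=2)
# Q3 is the 52.5th term, in the 34-36 class (boundaries 33.5 to 36.5).
q3 = interpolate(3 * n / 4, lower_bound=33.5, cum_freq_before=27, group_freq=30, class_width=3)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```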
Percentiles
A Percentile is similar to a quartile. The 70th percentile of a set of data is the value that has 70% of the data before it. It is normally written $P_{70}$.

The 62nd percentile is the value that has 62% of the data before it, $P_{62}$.

To calculate $P_x$, you find the value of the $\frac{xn}{100}$th term:

For the 31st percentile: $\frac{31n}{100}$

For the 90th percentile: $\frac{90n}{100}$

You can calculate the n% to m% Inter-percentile range = $P_m - P_n$

The Quartiles are effectively percentiles: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$
3B
The height, in cm, of 70 eighteen-year-old boys was measured and the data put into the table below. Calculate the 90th percentile, the 10th percentile and the 10% to 90% Inter-percentile range.

| Height (cm) | Students | Cumulative frequency |
|---|---|---|
| 150-160 | 4 | 4 |
| 160-170 | 21 | 25 |
| 170-180 | 32 | 57 |
| 180-190 | 9 | 66 |
| 190-200 | 4 | 70 |

P90: 90n/100 = 6300/100 = 63rd term, which lies in the 180-190 class (LB = 180, PL = 63 – 57 = 6, GF = 9, CW = 10)

$$P_{90} = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 180 + \frac{6}{9} \times 10 = 186.67 \text{ (2dp)}$$

P10: 10n/100 = 700/100 = 7th term, which lies in the 160-170 class (LB = 160, PL = 7 – 4 = 3, GF = 21, CW = 10)

$$P_{10} = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 160 + \frac{3}{21} \times 10 = 161.43 \text{ (2dp)}$$

The 10% to 90% Inter-percentile range = P90 – P10 = 186.67 – 161.43 = 25.24cm
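The percentile calculation above can be automated so that the class containing the required term is located from the running cumulative frequency. This is a minimal sketch; the function name `percentile_grouped` and the representation of classes as (lower, upper) boundary pairs are assumptions for this example.

```python
def percentile_grouped(classes, freqs, pct):
    """Estimate the pct-th percentile of grouped data by linear interpolation.

    classes: list of (lower boundary, upper boundary) pairs, in order.
    freqs:   frequency of each class.
    """
    n = sum(freqs)
    position = pct * n / 100
    cumulative = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and position <= cumulative + f:
            places_in = position - cumulative
            return lower + places_in / f * (upper - lower)
        cumulative += f
    raise ValueError("pct must be between 0 and 100")


heights = [(150, 160), (160, 170), (170, 180), (180, 190), (190, 200)]
students = [4, 21, 32, 9, 4]
p90 = percentile_grouped(heights, students, 90)
p10 = percentile_grouped(heights, students, 10)
print(round(p90, 2), round(p10, 2), round(p90 - p10, 2))
```

Because the quartiles are the 25th, 50th and 75th percentiles, the same function also gives interpolated quartiles.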
Variance and Standard Deviation
Variance and Standard Deviation are measures of how far the data is spread from the mean. If the mean is $\bar{x}$ and an observation is $x$, then the observation's dispersion from the mean is $x - \bar{x}$.

The variance will therefore be given by:

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$

The numerator is the sum of the squared dispersions from the mean (squaring removes any negative values) and $n$ is the number of observations.

However, a formula which is more commonly used, especially with larger sets of data, is:

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

The first term is the mean of the squares; the second is the square of the mean.

The Standard Deviation, $\sigma$, is given by $\sqrt{\text{Variance}}$.
3C
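The two variance formulas give the same value. The short derivation below is not part of the original slides; it is added as a check, writing $\bar{x} = \frac{\sum x}{n}$ and expanding the square:

```latex
\begin{align*}
\frac{\sum (x - \bar{x})^2}{n}
  &= \frac{\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2}{n} \\
  &= \frac{\sum x^2}{n} - 2\bar{x}\cdot\frac{\sum x}{n} + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2
\end{align*}
```

In words: the variance is the mean of the squares minus the square of the mean.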
Important point:
The Standard Deviation tells you the range either side of the mean which contains around 68% of the data (if the data is normally distributed).

For example, if 100 students have a mean height of 150cm and a standard deviation of 10cm:
• around 68 of the students are within one Standard Deviation of the mean (140cm to 160cm)
• around 95 of the students are within two Standard Deviations of the mean (130cm to 170cm)
Given that for x: $\sum x = 42$, $\sum x^2 = 720$ and $n = 5$, calculate the Variance and Standard Deviation of x.

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{720}{5} - \left(\frac{42}{5}\right)^2 = 144 - 70.56$$

Which part is the mean?

Variance: $\sigma^2 = 73.44$

Standard Deviation: $\sigma = 8.57$ (2dp)
Use the formula to calculate the variance and standard deviation of the following numbers:
3, 4, 6, 2, 8, 8, 5

| x | x² |
|---|---|
| 3 | 9 |
| 4 | 16 |
| 6 | 36 |
| 2 | 4 |
| 8 | 64 |
| 8 | 64 |
| 5 | 25 |
| Total: 36 | 218 |

$\sum x = 36$, $\sum x^2 = 218$, $n = 7$

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{218}{7} - \left(\frac{36}{7}\right)^2 = 31.14 - 26.45$$

Variance: $\sigma^2 = 4.69$

Standard Deviation: $\sigma = 2.17$ (2dp)
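As a check on this working, the sketch below computes the variance both from the definition and from the "mean of the squares minus the square of the mean" formula, and compares the result with `statistics.pvariance` from the Python standard library (the population variance, which divides by n as the slides do). The variable names are illustrative.

```python
import math
import statistics

data = [3, 4, 6, 2, 8, 8, 5]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n.
variance_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
variance_shortcut = sum(x * x for x in data) / n - (sum(data) / n) ** 2

std_dev = math.sqrt(variance_shortcut)
print(round(variance_definition, 2), round(variance_shortcut, 2), round(std_dev, 2))
print(round(statistics.pvariance(data), 2))
```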
Variance and Standard Deviation from a Table
As with the averages from Chapter 2, you need to be able to calculate the Variance and Standard Deviation from a frequency table, grouped or ungrouped.

This was the formula from before:

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

The formula for tabled data is similar:

$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$

Here $\sum fx^2$ is the sum of frequency times $x^2$, $\sum fx$ is the sum of frequency times $x$, and $\sum f$ is the sum of the frequencies.

The difference reflects the fact that each value of x will appear many times, rather than just once or a few times.
3D
Calculate the Variance and Standard Deviation of a set of data with the following values already calculated:

$\sum fx = 224$, $\sum fx^2 = 8731$, $\sum f = 25$

$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{8731}{25} - \left(\frac{224}{25}\right)^2$$

Variance: $\sigma^2 = 268.9584$

Standard Deviation: $\sigma = 16.40$ (2dp)
Sue records the time spent in town at lunchtime (mins) of students in her year. The results are in the table. Calculate the Standard Deviation of the time spent out of school.

| Time (mins), x | No. of students, f | fx | fx² |
|---|---|---|---|
| 35 | 3 | 105 (3 × 35) | 3675 (3 × 35²) |
| 36 | 17 | 612 (17 × 36) | 22032 (17 × 36²) |
| 37 | 29 | 1073 | 39701 |
| 38 | 34 | 1292 | 49096 |
| 39 | 26 | 1014 | 39546 |
| Total | 109 | 4096 | 154050 |

$\sum fx = 4096$, $\sum fx^2 = 154050$, $\sum f = 109$

$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{154050}{109} - \left(\frac{4096}{109}\right)^2 = 1.19805...$$

$\sigma = \sqrt{1.19805...} = 1.09$ (2dp)

Note that 1.20 is the variance to 2dp, not the standard deviation; remember to take the square root.
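The frequency-table formula can be sketched in Python using the values from Sue's table. The function name `table_variance` is an assumption for this illustration. Taking the square root of the variance 1.19805... gives a standard deviation of about 1.09.

```python
import math


def table_variance(values, freqs):
    """Variance from a frequency table: sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2."""
    sum_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sum_fx2 / sum_f - (sum_fx / sum_f) ** 2


times = [35, 36, 37, 38, 39]        # x, minutes
students = [3, 17, 29, 34, 26]      # f
variance = table_variance(times, students)
std_dev = math.sqrt(variance)
print(round(variance, 5), round(std_dev, 2))
```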
Andy recorded the lengths of telephone calls he made over the course of a month. Calculate an estimate of the Standard Deviation of his calls.

| Length | Calls, f | Midpoint, x | fx | fx² |
|---|---|---|---|---|
| 0-5 | 4 | 2.5 | 10 (4 × 2.5) | 25 (4 × 2.5²) |
| 5-10 | 15 | 7.5 | 112.5 (15 × 7.5) | 843.75 (15 × 7.5²) |
| 10-15 | 5 | 12.5 | 62.5 | 781.25 |
| 15-20 | 2 | 17.5 | 35 | 612.5 |
| 20-25 | 0 | 22.5 | 0 | 0 |
| 25-30 | 1 | 27.5 | 27.5 | 756.25 |
| Total | 27 | | 247.5 | 3018.75 |

$\sum fx = 247.5$, $\sum fx^2 = 3018.75$, $\sum f = 27$

$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{3018.75}{27} - \left(\frac{247.5}{27}\right)^2 = 27.78$$

$\sigma = 5.27$ (2dp)
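For grouped data the result is only an estimate, because each class is represented by its midpoint. A minimal sketch of that step, with variable names chosen for this example:

```python
import math

classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
calls = [4, 15, 5, 2, 0, 1]  # frequency of each class

# Represent every call in a class by the class midpoint.
midpoints = [(low + high) / 2 for low, high in classes]

sum_f = sum(calls)
sum_fx = sum(f * x for f, x in zip(calls, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(calls, midpoints))

variance = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2
std_dev = math.sqrt(variance)
print(sum_f, sum_fx, sum_fx2, round(variance, 2), round(std_dev, 2))
```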
Coding
As with averages, coding can be used to make data easier to work with. However, there is something extra to remember.

If you have a set of data with a range of 15, and reduce every number by 2, what will happen to the range?
• Nothing!
• Range measures the spread of data, and if all the numbers are 2 less, the spread will not have changed

It is exactly the same for Standard Deviation. Because it measures the spread of data, any addition/subtraction in the coding will not need to be undone.

• Any division or multiplication will have to be 'uncoded' as normal
3E
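A quick numerical check of these two rules, using `statistics.pstdev` (the population standard deviation in the Python standard library). The data set 150, 160, 170, 180, 190 is the one used in the coding examples; the variable names are illustrative.

```python
import statistics

data = [150, 160, 170, 180, 190]

sd_original = statistics.pstdev(data)
# Subtracting 100 from every value shifts the data but does not change the spread.
sd_shifted = statistics.pstdev([x - 100 for x in data])
# Dividing every value by 10 shrinks the spread by a factor of 10.
sd_scaled = statistics.pstdev([x / 10 for x in data])

print(round(sd_original, 2), round(sd_shifted, 2), round(sd_scaled, 2))
print(round(10 * sd_scaled, 2))  # undo the division to recover the original
```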
Use the following code to calculate the Standard Deviation of this set of data:
150, 160, 170, 180, 190

Code: $y = \frac{x}{10}$, giving 15, 16, 17, 18, 19

| y | y² |
|---|---|
| 15 | 225 |
| 16 | 256 |
| 17 | 289 |
| 18 | 324 |
| 19 | 361 |
| Total: 85 | 1455 |

$\sum y = 85$, $\sum y^2 = 1455$, $n = 5$

$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{1455}{5} - \left(\frac{85}{5}\right)^2 = 2$$

$\sigma_y = \sqrt{2} = 1.41$ (2dp)

But we had divided by 10, so we must undo this:

$\sigma_x = 10 \times \sqrt{2} = 14.14$ (2dp)
Use the following code to calculate the Standard Deviation of this set of data:
150, 160, 170, 180, 190

Code: $y = x - 100$, giving 50, 60, 70, 80, 90

| y | y² |
|---|---|
| 50 | 2500 |
| 60 | 3600 |
| 70 | 4900 |
| 80 | 6400 |
| 90 | 8100 |
| Total: 350 | 25500 |

$\sum y = 350$, $\sum y^2 = 25500$, $n = 5$

$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{25500}{5} - \left(\frac{350}{5}\right)^2 = 200$$

$\sigma = 14.14$ (2dp)

We do not need to undo anything, as we only subtracted!
Use the following code to calculate the Standard Deviation of this set of data:
150, 160, 170, 180, 190

Code: $y = \frac{x - 100}{10}$, giving 5, 6, 7, 8, 9

| y | y² |
|---|---|
| 5 | 25 |
| 6 | 36 |
| 7 | 49 |
| 8 | 64 |
| 9 | 81 |
| Total: 35 | 255 |

$\sum y = 35$, $\sum y^2 = 255$, $n = 5$

$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{255}{5} - \left(\frac{35}{5}\right)^2 = 2$$

$\sigma_y = \sqrt{2} = 1.41$ (2dp)

We only need to undo the divide by 10:

$\sigma_x = 10 \times \sqrt{2} = 14.14$ (2dp)
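Use the code below to calculate the Standard Deviation of this table of data.

Code: $y = \frac{x - 7.5}{5}$

| Call length | Calls, f | Midpoint, x | y | fy | fy² |
|---|---|---|---|---|---|
| 0-5 | 4 | 2.5 | -1 | -4 | 4 |
| 5-10 | 12 | 7.5 | 0 | 0 | 0 |
| 10-15 | 6 | 12.5 | 1 | 6 | 6 |
| 15-20 | 3 | 17.5 | 2 | 6 | 12 |
| 20-30 | 1 | 25 | 3.5 | 3.5 | 12.25 |
| Total | 26 (Σf) | | | 11.5 (Σfy) | 34.25 (Σfy²) |

$$\sigma_y^2 = \frac{\sum fy^2}{\sum f} - \left(\frac{\sum fy}{\sum f}\right)^2 = \frac{34.25}{26} - \left(\frac{11.5}{26}\right)^2 = 1.1217 \text{ (4dp)}$$

$\sigma_y = 1.0591$ (4dp), or 1.06 to 2dp

Undo the divide by 5 only:

$\sigma_x = 5 \times 1.0591 = 5.30$ (2dp)

Carrying the unrounded values through matters here: working from the variance rounded to 1.12 gives 5.29 instead.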
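The coded-table calculation can be checked with a short script. This is a sketch using the table's midpoints and frequencies; the variable names are assumptions made for the example. Keeping full precision until the final step gives a standard deviation of about 5.30 for the original data.

```python
import math

midpoints = [2.5, 7.5, 12.5, 17.5, 25]
calls = [4, 12, 6, 3, 1]

# Code each midpoint: y = (x - 7.5) / 5
coded = [(x - 7.5) / 5 for x in midpoints]

sum_f = sum(calls)
sum_fy = sum(f * y for f, y in zip(calls, coded))
sum_fy2 = sum(f * y ** 2 for f, y in zip(calls, coded))

variance_coded = sum_fy2 / sum_f - (sum_fy / sum_f) ** 2
sd_coded = math.sqrt(variance_coded)

# Only the division by 5 affects the spread, so only that step is undone.
sd_original = 5 * sd_coded
print(round(variance_coded, 4), round(sd_coded, 2), round(sd_original, 2))
```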
Summary
• We have now finished chapter 3
• We have seen how to calculate range and Inter-quartile range, including using Interpolation from a table
• We have learnt how to calculate Percentiles and Inter-Percentile range
• We have calculated Variance and Standard Deviation from a table
• We have also used coding to simplify calculations
