3) S1 Representation and Summary of Data - Dispersion

Representation and Summary of Data - Dispersion
• The last chapter was based on calculating averages from sets of data
• The average alone does not give the full picture, though
• This chapter looks at measures of dispersion: how spread out the data is
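
Range and Quartiles
• The quartiles $Q_1$, $Q_2$ and $Q_3$ split the data into 4 parts, with 25% of the data in each:

Lowest value | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Highest value

• For discrete data, find the position of each quartile:
  $Q_1$ is the $\frac{n}{4}$th term
  $Q_2$ is the $\frac{n}{2}$th term
  $Q_3$ is the $\frac{3n}{4}$th term
  Remember, if the result is whole, you need the midpoint of that term and the term above. If not, round up and find the corresponding term.
• For continuous data, use interpolation (like with the median from chapter 2):
  $$\text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW}$$
  where LB is the lower boundary of the class containing the term, PL is the number of places into that class, GF is the group (class) frequency and CW is the class width.
• Inter-quartile range = Upper Quartile – Lower Quartile = $Q_3 - Q_1$
3A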

Calculate the range and inter-quartile range of the following data.
7, 9, 4, 6, 3, 2, 8, 1, 10, 15, 11

Putting the data in order:
1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 15

Range = 15 – 1 = 14
Lower quartile: $\frac{n}{4} = \frac{11}{4} = 2.75$, so take the 3rd term, which is 3
Upper quartile: $\frac{3n}{4} = \frac{33}{4} = 8.25$, so take the 9th term, which is 10
Inter-quartile range = 10 – 3 = 7
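
The quartile rule for discrete data can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slides (whole position: midpoint of that term and the next; otherwise round up); the function name `quartile` is hypothetical.

```python
import math

def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2 or 3) of sorted discrete data, using the k*n/4 rule."""
    n = len(sorted_data)
    if (k * n) % 4 == 0:
        # Whole position: midpoint of that term and the term above
        pos = k * n // 4
        return (sorted_data[pos - 1] + sorted_data[pos]) / 2
    # Otherwise round the position up and take that term (1-based)
    pos = math.ceil(k * n / 4)
    return sorted_data[pos - 1]

data = sorted([7, 9, 4, 6, 3, 2, 8, 1, 10, 15, 11])
data_range = data[-1] - data[0]
q1 = quartile(data, 1)
q3 = quartile(data, 3)
iqr = q3 - q1
print(data_range, q1, q3, iqr)  # 14 3 10 7
```

Note that statistics libraries often use different quartile conventions, so their answers may not match this rule exactly.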

Rebecca records the number of CDs in the collections of students in her year. The results are shown in the table below. Calculate the inter-quartile range (IQR).

No. of CDs, x   Students, f   Cumulative frequency
35              3             3
36              17            20
37              29            49
38              34            83
39              12            95

$Q_1$: $\frac{n}{4} = \frac{95}{4} = 23.75$, so take the 24th term, which is 37
$Q_3$: $\frac{3n}{4} = \frac{285}{4} = 71.25$, so take the 72nd term, which is 38
IQR = $Q_3 - Q_1$ = 38 – 37 = 1
(Discrete data)
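
The same rule can be applied to an ungrouped frequency table by walking along the cumulative frequencies until the required term is reached. The sketch below assumes the CD table from this example; the helper names `term_from_table` and `table_quartile` are illustrative only.

```python
import math
from itertools import accumulate

def term_from_table(values, freqs, position):
    """Value of the position-th term (1-based) in an ungrouped frequency table."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if position <= cumulative:
            return value
    raise ValueError("position is beyond the end of the data")

def table_quartile(values, freqs, k):
    """k-th quartile (k = 1, 2 or 3) using the k*n/4 rule for discrete data."""
    n = sum(freqs)
    if (k * n) % 4 == 0:
        pos = k * n // 4
        lower = term_from_table(values, freqs, pos)
        upper = term_from_table(values, freqs, pos + 1)
        return (lower + upper) / 2
    return term_from_table(values, freqs, math.ceil(k * n / 4))

cds = [35, 36, 37, 38, 39]
students = [3, 17, 29, 34, 12]
q1 = table_quartile(cds, students, 1)
q3 = table_quartile(cds, students, 3)
print(q1, q3, q3 - q1)  # 37 38 1
```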

The length of time spent on the internet each evening by a group of students is shown in the table below. Calculate the inter-quartile range.

Time (mins)   No. of students   Cumulative frequency
30-31         2                 2
32-33         25                27
34-36         30                57
37-39         13                70

$Q_1$: $\frac{n}{4} = \frac{70}{4} = 17.5$, so we need the 17.5th term. This lies in the 32-33 class (class boundaries 31.5 and 33.5), 17.5 – 2 = 15.5 places into it.
$$Q_1 = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 31.5 + \frac{15.5}{25} \times 2 = 32.74$$
(Continuous data)

$Q_3$: $\frac{3n}{4} = \frac{210}{4} = 52.5$, so we need the 52.5th term. This lies in the 34-36 class (class boundaries 33.5 and 36.5), 52.5 – 27 = 25.5 places into it.
$$Q_3 = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 33.5 + \frac{25.5}{30} \times 3 = 36.05$$

With $Q_1 = 32.74$ and $Q_3 = 36.05$:
IQR = $Q_3 - Q_1$ = 36.05 – 32.74 = 3.31
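
The interpolation steps in this example can be sketched as follows. The class boundaries are written out by hand, and the lower boundary 29.5 for the first class is an assumption based on the same half-unit gaps used for the other classes; `interpolate_term` is a hypothetical helper name.

```python
def interpolate_term(boundaries, freqs, position):
    """Estimate the value at a (possibly fractional) term position in grouped data.

    boundaries: list of (lower, upper) class boundaries, in order
    freqs:      class frequencies, in the same order
    """
    cum_before = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and position <= cum_before + f:
            places_in = position - cum_before        # PL
            class_width = upper - lower              # CW
            return lower + places_in / f * class_width  # LB + PL/GF x CW
        cum_before += f
    raise ValueError("position is beyond the end of the data")

classes = [(29.5, 31.5), (31.5, 33.5), (33.5, 36.5), (36.5, 39.5)]
freqs = [2, 25, 30, 13]
n = sum(freqs)
q1 = interpolate_term(classes, freqs, n / 4)
q3 = interpolate_term(classes, freqs, 3 * n / 4)
print(f"{q1:.2f} {q3:.2f} {q3 - q1:.2f}")  # 32.74 36.05 3.31
```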

Percentiles
A percentile is similar to a quartile. The 70th percentile of a set of data is the value that has 70% of the data before it. It is normally written $P_{70}$. The 62nd percentile is the value that has 62% of the data before it, $P_{62}$.
To calculate $P_x$, you find the value of the $\frac{xn}{100}$th term.
For the 31st percentile, find the $\frac{31n}{100}$th term
For the 90th percentile, find the $\frac{90n}{100}$th term
You can calculate the n% to m% inter-percentile range as $P_m - P_n$
The quartiles are effectively percentiles: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$
3B

The height, in cm, of 70 eighteen-year-old boys was measured and the data put into the table below. Calculate the 90th percentile, the 10th percentile and the 10% to 90% inter-percentile range.

Height (cm)   Students   Cumulative frequency
150-160       4          4
160-170       21         25
170-180       32         57
180-190       9          66
190-200       4          70

$P_{90}$: $\frac{90n}{100} = \frac{6300}{100} = 63$, so we need the 63rd term. This lies in the 180-190 class, 63 – 57 = 6 places into it.
$$P_{90} = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 180 + \frac{6}{9} \times 10 = 186.67 \text{ (2dp)}$$

$P_{10}$: $\frac{10n}{100} = \frac{700}{100} = 7$, so we need the 7th term. This lies in the 160-170 class, 7 – 4 = 3 places into it.
$$P_{10} = \text{LB} + \frac{\text{PL}}{\text{GF}} \times \text{CW} = 160 + \frac{3}{21} \times 10 = 161.43 \text{ (2dp)}$$

With $P_{90} = 186.67$ and $P_{10} = 161.43$ (both to 2dp):
The 10% to 90% inter-percentile range = $P_{90} - P_{10}$ = 186.67 – 161.43 = 25.24cm
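
As a rough check on this example, the percentile calculation can be written as one function that finds the $\frac{xn}{100}$th term and interpolates within its class. The function name `percentile` and the way the classes are stored are assumptions for illustration.

```python
def percentile(boundaries, freqs, p):
    """Estimate the p-th percentile of grouped data by linear interpolation."""
    n = sum(freqs)
    position = p * n / 100           # the (pn/100)th term
    cum_before = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and position <= cum_before + f:
            # LB + (places into class / class frequency) x class width
            return lower + (position - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("p must be between 0 and 100")

heights = [(150, 160), (160, 170), (170, 180), (180, 190), (190, 200)]
boys = [4, 21, 32, 9, 4]
p90 = percentile(heights, boys, 90)
p10 = percentile(heights, boys, 10)
print(f"{p90:.2f} {p10:.2f} {p90 - p10:.2f}")  # 186.67 161.43 25.24
```

Calling `percentile(heights, boys, 25)` and `percentile(heights, boys, 75)` gives the quartiles in the same way, since $Q_1 = P_{25}$ and $Q_3 = P_{75}$.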

Variance and Standard Deviation
Variance and standard deviation are measures of how far the data is spread from the mean. If the mean is $\bar{x}$ and an observation is $x$, then the observation's dispersion from the mean is $x - \bar{x}$.
The variance is therefore given by:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
where $\sum (x - \bar{x})^2$ is the sum of the squared dispersions from the mean (squaring removes any negative values) and $n$ is the number of observations.
However, a formula which is more commonly used, especially with larger sets of data, is:
$$\text{Variance} = \sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
that is, the mean of the squares minus the square of the mean.
The standard deviation, $\sigma$, is given by $\sqrt{\text{Variance}}$.
3C
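
The two variance formulas give the same value. A short derivation, added here as a supplementary sketch and using only $\bar{x} = \frac{\sum x}{n}$, shows why:

```latex
\begin{align*}
\frac{\sum (x - \bar{x})^2}{n}
  &= \frac{\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2}{n} \\
  &= \frac{\sum x^2}{n} - 2\bar{x}\cdot\frac{\sum x}{n} + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2
\end{align*}
```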

Important point:
• The standard deviation tells you the distance either side of the mean which contains around 68% of the data (if the data is normally distributed)
For example, suppose 100 students have a mean height of 150cm and a standard deviation of 10cm.
• Around 68 of the students are within one standard deviation of the mean (140cm to 160cm)
• Around 95 of the students are within two standard deviations of the mean (130cm to 170cm)

Given that for x: $\sum x = 42$, $\sum x^2 = 720$ and $n = 5$. Calculate the variance and standard deviation of x.
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{720}{5} - \left(\frac{42}{5}\right)^2 = 144 - 70.56$$
Which part is the mean? (It is $\frac{\sum x}{n} = \frac{42}{5} = 8.4$.)
Variance $\sigma^2 = 73.44$
Standard deviation $\sigma = 8.57$ (2dp)

Use the formula to calculate the variance and standard deviation of the following numbers:
3, 4, 6, 2, 8, 8, 5

x       x²
3       9
4       16
6       36
2       4
8       64
8       64
5       25
Total   36 (x), 218 (x²)

$\sum x = 36$, $\sum x^2 = 218$, $n = 7$
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{218}{7} - \left(\frac{36}{7}\right)^2 = 31.14 - 26.45$$
Variance $\sigma^2 = 4.69$ (2dp)
Standard deviation $\sigma = \sqrt{4.69...} = 2.17$ (2dp)
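
To see both variance formulas side by side, here is a minimal Python sketch using the seven numbers from this example. The function names are illustrative; both functions divide by n, matching the formulas on the slides.

```python
import math

def variance_definition(data):
    """Mean of the squared dispersions from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

data = [3, 4, 6, 2, 8, 8, 5]
var_a = variance_definition(data)
var_b = variance_shortcut(data)
sd = math.sqrt(var_b)
print(f"{var_a:.2f} {var_b:.2f} {sd:.2f}")  # 4.69 4.69 2.17
```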

Variance and Standard Deviation from a Table
As with the averages from Chapter 2, you need to be able to calculate the variance and standard deviation from a frequency table, grouped or ungrouped.
This was the formula from before:
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
The formula for tabled data is similar:
$$\text{Variance} = \sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
where $\sum fx^2$ is the sum of frequency times $x^2$, $\sum fx$ is the sum of frequency times $x$, and $\sum f$ is the sum of the frequencies.
The difference reflects the fact that each value of x will appear many times, rather than just once or a few times.
3D

Calculate the variance and standard deviation of a set of data with the following values already calculated.
$\sum fx = 224$, $\sum fx^2 = 8731$, $\sum f = 25$
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{8731}{25} - \left(\frac{224}{25}\right)^2$$
Variance $\sigma^2 = 268.9584$
Standard deviation $\sigma = 16.40$ (2dp)

Sue records the time spent in town at lunchtime (mins) of students in her year. The results are in the table. Calculate the standard deviation of the time spent out of school.

Time, x (mins)   No. of students, f   fx                fx²
35               3                    105 (3 × 35)      3675 (3 × 35²)
36               17                   612 (17 × 36)     22032 (17 × 36²)
37               29                   1073              39701
38               34                   1292              49096
39               26                   1014              39546
Total            109                  4096              154050

$\sum fx = 4096$, $\sum fx^2 = 154050$, $\sum f = 109$
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{154050}{109} - \left(\frac{4096}{109}\right)^2 = 1.19805...$$
$$\sigma = \sqrt{1.19805...} = 1.09 \text{ (2dp)}$$
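
A quick way to check a frequency-table calculation like this one is to compute $\sum f$, $\sum fx$ and $\sum fx^2$ directly. The sketch below uses Sue's table; `table_variance` is a hypothetical helper name. Note that 1.198... is the variance, and the standard deviation is its square root.

```python
import math

def table_variance(values, freqs):
    """Variance from an ungrouped frequency table: sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2."""
    total_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

times = [35, 36, 37, 38, 39]
students = [3, 17, 29, 34, 26]
variance = table_variance(times, students)
sd = math.sqrt(variance)
print(f"{variance:.2f} {sd:.2f}")  # 1.20 1.09
```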

Andy recorded the lengths of telephone calls he made over the course of a month. Calculate an estimate of the standard deviation of his calls.

Length   Calls, f   Midpoint, x   fx                  fx²
0-5      4          2.5           10 (4 × 2.5)        25 (4 × 2.5²)
5-10     15         7.5           112.5 (15 × 7.5)    843.75 (15 × 7.5²)
10-15    5          12.5          62.5                781.25
15-20    2          17.5          35                  612.5
20-25    0          22.5          0                   0
25-30    1          27.5          27.5                756.25
Total    27                       247.5               3018.75

$\sum fx = 247.5$, $\sum fx^2 = 3018.75$, $\sum f = 27$
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{3018.75}{27} - \left(\frac{247.5}{27}\right)^2 = 27.78 \text{ (2dp)}$$
$$\sigma = 5.27 \text{ (2dp)}$$
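
For grouped data the only extra step is replacing each class by its midpoint, which is why the result is an estimate. A minimal sketch, assuming the classes are stored as (lower, upper) pairs and using the hypothetical name `grouped_variance`:

```python
import math

def grouped_variance(classes, freqs):
    """Estimate the variance of grouped data, using class midpoints as x."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

call_lengths = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
calls = [4, 15, 5, 2, 0, 1]
variance = grouped_variance(call_lengths, calls)
sd = math.sqrt(variance)
print(f"{variance:.2f} {sd:.2f}")  # 27.78 5.27
```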

Coding
As with averages, coding can be used to make data easier to work with. However, there is something extra to remember…
If you have a set of data with a range of 15, and reduce every number by 2, what will happen to the range?
• Nothing!
• Range measures the spread of data, and if all the numbers are 2 less, the spread will not have changed
It is exactly the same for standard deviation. Because it measures the spread of data, any addition/subtraction in the coding will not need to be undone.
• Any division or multiplication will have to be 'uncoded' as normal
3E
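
The rule can be justified with a short derivation. As a sketch, suppose the code is $y = \frac{x - a}{b}$ with $b > 0$; the symbols $a$ and $b$ are illustrative and do not appear on the slides.

```latex
\begin{align*}
\bar{y} &= \frac{\bar{x} - a}{b} \\
y - \bar{y} &= \frac{x - \bar{x}}{b} \\
\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n}
  &= \frac{1}{b^2}\cdot\frac{\sum (x - \bar{x})^2}{n} = \frac{\sigma_x^2}{b^2} \\
\sigma_x &= b\,\sigma_y
\end{align*}
```

The subtracted constant $a$ cancels in the second line, so only the division by $b$ has to be undone.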

Use the following code to calculate the standard deviation of this set of data:
150, 160, 170, 180, 190
Code: $y = \frac{x}{10}$
Coded data: 15, 16, 17, 18, 19

y       y²
15      225
16      256
17      289
18      324
19      361
Total   85 (y), 1455 (y²)

$\sum y = 85$, $\sum y^2 = 1455$, $n = 5$
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{1455}{5} - \left(\frac{85}{5}\right)^2 = 2$$
$\sigma_y = 1.41$ (2dp)
But we had divided by 10, so we must undo this (× 10):
$\sigma_x = 14.14$ (2dp)

150, 160, 170, 180, 190
Code: $y = x - 100$
Coded data: 50, 60, 70, 80, 90

y       y²
50      2500
60      3600
70      4900
80      6400
90      8100
Total   350 (y), 25500 (y²)

$\sum y = 350$, $\sum y^2 = 25500$, $n = 5$
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{25500}{5} - \left(\frac{350}{5}\right)^2 = 200$$
$\sigma_y = 14.14$ (2dp)
We do not need to undo anything as we only subtracted, so $\sigma_x = 14.14$ (2dp)

150, 160, 170, 180, 190
Code: $y = \frac{x - 100}{10}$
Coded data: 5, 6, 7, 8, 9

y       y²
5       25
6       36
7       49
8       64
9       81
Total   35 (y), 255 (y²)

$\sum y = 35$, $\sum y^2 = 255$, $n = 5$
$$\sigma_y^2 = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{255}{5} - \left(\frac{35}{5}\right)^2 = 2$$
$\sigma_y = 1.41$ (2dp)
We only need to undo the divide by 10 (× 10):
$\sigma_x = 14.14$ (2dp)
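
The three codings used on 150, 160, 170, 180, 190 can be compared directly with a small Python sketch. The helper name `sd` is illustrative; it uses the same divide-by-n formula as the slides.

```python
import math

def sd(data):
    """Standard deviation: sqrt(mean of the squares - square of the mean)."""
    n = len(data)
    return math.sqrt(sum(v * v for v in data) / n - (sum(data) / n) ** 2)

x = [150, 160, 170, 180, 190]
sd_x = sd(x)                                # uncoded data
sd_div = sd([v / 10 for v in x])            # y = x / 10
sd_sub = sd([v - 100 for v in x])           # y = x - 100
sd_both = sd([(v - 100) / 10 for v in x])   # y = (x - 100) / 10

print(f"{sd_x:.2f} {sd_div:.2f} {sd_sub:.2f} {sd_both:.2f}")  # 14.14 1.41 14.14 1.41
```

Only the codings that divide by 10 change the standard deviation, and multiplying their results by 10 recovers 14.14.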

Use the code below to calculate the standard deviation of this table of data.
Code: $y = \frac{x - 7.5}{5}$

Call length   Calls, f   Midpoint, x   y      fy     fy²
0-5           4          2.5           -1     -4     4
5-10          12         7.5           0      0      0
10-15         6          12.5          1      6      6
15-20         3          17.5          2      6      12
20-30         1          25            3.5    3.5    12.25
Total         26 (Σf)                         11.5 (Σfy)   34.25 (Σfy²)

$$\sigma_y^2 = \frac{\sum fy^2}{\sum f} - \left(\frac{\sum fy}{\sum f}\right)^2 = \frac{34.25}{26} - \left(\frac{11.5}{26}\right)^2 = 1.12 \text{ (2dp)}$$
$\sigma_y = 1.06$ (2dp)
Undo the divide by 5 only (× 5), using the unrounded value of $\sigma_y$:
$\sigma_x = 5 \times 1.0591... = 5.30$ (2dp)
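
The coded table can be checked numerically. The sketch below codes the midpoints with $y = \frac{x - 7.5}{5}$, finds the standard deviation of y, and multiplies by 5 to undo the division; variable names are illustrative. Keeping full precision until the final step avoids rounding drift in the last decimal place.

```python
import math

midpoints = [2.5, 7.5, 12.5, 17.5, 25]
calls = [4, 12, 6, 3, 1]

# Code each midpoint: y = (x - 7.5) / 5  ->  -1, 0, 1, 2, 3.5
coded = [(x - 7.5) / 5 for x in midpoints]

total_f = sum(calls)                                   # 26
sum_fy = sum(f * y for y, f in zip(coded, calls))      # 11.5
sum_fy2 = sum(f * y * y for y, f in zip(coded, calls)) # 34.25

var_y = sum_fy2 / total_f - (sum_fy / total_f) ** 2
sd_y = math.sqrt(var_y)
sd_x = 5 * sd_y   # undo the divide by 5; the subtraction needs no undoing

print(f"{sd_y:.2f} {sd_x:.2f}")  # 1.06 5.30
```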
Summary
• We have now finished chapter 3
• We have seen how to calculate range and inter-quartile range, including using interpolation from a table
• We have learnt how to calculate percentiles and inter-percentile range
• We have calculated variance and standard deviation from a table
• We have also used coding to simplify calculations
