
LESSON 37: Analyzing Data III

“Mathematics is not about numbers, equations, computations or algorithms; it is about
understanding.”
– William Paul Thurston

O.M. “Another method for structuring numerical data is to organize the values into
groups. This gives an efficient and compact representation of the data. However, it
does complicate the analytical process somewhat. Seeing is believing, so stay
tuned to witness the methods that are applied when analyzing grouped
data. Also look out for tips on how to discuss the data that has been
presented.”

37.1 ANALYSIS OF GROUPED DATA


Grouped data refers to data organized into fixed class intervals of an attribute along
with their respective frequencies. Below is an example of a class interval:

Class interval: 40 – 49
Lower class limit = 40          Upper class limit = 49
Lower class boundary = 39.5     Upper class boundary = 49.5

For class limits recorded to the nearest whole number, as in this lesson, the class
boundaries are calculated as follows:


The upper class boundary = upper class limit + 0.5
The lower class boundary = lower class limit – 0.5
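
The boundary rule above can be sketched in a few lines of Python. The function name `class_boundaries` and the `gap` parameter are illustrative assumptions, not part of the lesson; `gap` is the distance between one class's upper limit and the next class's lower limit, which is 1 for the whole-number classes used here.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class interval.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (1 for classes such as 40-49, 50-59).
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


lower_boundary, upper_boundary = class_boundaries(40, 49)
print(lower_boundary, upper_boundary)  # 39.5 49.5
```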

Example 1 (a) (Calculating measures of central tendency from grouped data.)


The points obtained by 80 students playing computer games are recorded in the table
below.

Points      Number of students
40 – 49     8
50 – 59     16
60 – 69     20
70 – 79     18
80 – 89     14
90 – 99     4

Use the grouped frequency distribution table above to determine:


(a) the mode (b) the median (c) the mean
Solution:
(a) First locate the modal class – the class interval with the highest frequency.

Points      Number of students
40 – 49     8
50 – 59     16
60 – 69     20     (Modal class)
70 – 79     18
80 – 89     14
90 – 99     4
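Next, we construct a histogram and draw two lines inside the modal class bar: one
from its top-left corner to the top-left corner of the next bar, and one from its
top-right corner to the top-right corner of the previous bar.

[Histogram: constructed lines of the modal class bar]

The modal class is 60 – 69, which has an upper class boundary of 69.5 and
a lower class boundary of 59.5. The frequency of that class is 20. The intersection
of the two constructed lines in the modal class bar determines the mode.
Hence, the mode is 66.1 points (read from the histogram above).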
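
As a check on the graph reading, the crossing point of the two constructed lines can also be worked out by calculation. The sketch below assumes the usual construction (each line joins a top corner of the modal bar to the matching top corner of the neighbouring bar), which gives mode = lower boundary + d1 ÷ (d1 + d2) × class width, where d1 and d2 are the differences between the modal frequency and the frequencies of the classes before and after it. The function name `grouped_mode` is illustrative, and a single modal class is assumed.

```python
# Grouped data from Example 1(a): (lower limit, upper limit, frequency)
classes = [(40, 49, 8), (50, 59, 16), (60, 69, 20),
           (70, 79, 18), (80, 89, 14), (90, 99, 4)]


def grouped_mode(classes):
    """Estimate the mode from the modal class and its two neighbours."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))              # position of the modal class
    lower_limit, upper_limit, f_modal = classes[i]
    f_before = freqs[i - 1] if i > 0 else 0
    f_after = freqs[i + 1] if i < len(freqs) - 1 else 0
    lower_boundary = lower_limit - 0.5
    width = (upper_limit + 0.5) - lower_boundary
    d1 = f_modal - f_before                  # rise from the previous class
    d2 = f_modal - f_after                   # drop to the next class
    return lower_boundary + d1 / (d1 + d2) * width


mode = grouped_mode(classes)
print(round(mode, 1))  # 66.2
```

The calculated value (about 66.2) agrees with the histogram reading of 66.1 to within the accuracy of a hand-drawn graph.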

(b) To determine the median of a grouped frequency distribution, we need
to construct a cumulative frequency curve. To do that, we first add a
cumulative frequency column and an upper class boundary column.

Points      Number of students     Upper Class Boundary     Cumulative Frequency
30 – 39     0                      39.5                     0
40 – 49     8                      49.5                     8
50 – 59     16                     59.5                     24
60 – 69     20                     69.5                     44
70 – 79     18                     79.5                     62
80 – 89     14                     89.5                     76
90 – 99     4                      99.5                     80

Next, we use the upper class boundaries (x-axis) and cumulative frequencies (y-axis)
to construct the curve.
The position of the median = (½ n)th frequency = ½ (80)th = 40th frequency
Construction sequence for median:
1. Draw a horizontal line from the 40th frequency (on the y-axis) to
intersect with the curve.
2. From the point of intersection, draw a vertical line to meet the x-axis.
3. Read off the value on the x-axis (in this case it is 67.5).
Hence, median = 67.5 points
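
The three construction steps can be imitated numerically. The sketch below is an approximation, not the lesson's graphical method: it assumes the plotted points are joined by straight lines, whereas a hand-drawn smooth curve may give a slightly different reading. The function name `read_from_ogive` is illustrative.

```python
# Upper class boundaries and cumulative frequencies from the table above
upper_boundaries = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
cumulative_freq = [0, 8, 24, 44, 62, 76, 80]


def read_from_ogive(target, boundaries, cum_freq):
    """x-value at which the cumulative frequency reaches `target`,
    using straight-line interpolation between plotted points."""
    for k in range(1, len(cum_freq)):
        if cum_freq[k] >= target:
            x0, x1 = boundaries[k - 1], boundaries[k]
            y0, y1 = cum_freq[k - 1], cum_freq[k]
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target exceeds the total frequency")


n = cumulative_freq[-1]                      # 80 students
median = read_from_ogive(n / 2, upper_boundaries, cumulative_freq)
print(round(median, 1))  # 67.5
```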

(c) To calculate the mean, we also need additional columns and calculations:

Points      Frequency (f)     Class midpoint (x)     f × x = fx
40 – 49     8                 44.5                   356
50 – 59     16                54.5                   872
60 – 69     20                64.5                   1290
70 – 79     18                74.5                   1341
80 – 89     14                84.5                   1183
90 – 99     4                 94.5                   378
            n = ∑f = 80                              ∑fx = 5 420

Note: Class midpoint = (class upper limit + class lower limit) ÷ 2


Hence, the mean is

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{5\,420}{80} = 67.75 \text{ points}$$
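
The mean calculation can be reproduced in a short Python sketch using the class limits and frequencies from the table. The variable names are illustrative.

```python
# Grouped data from Example 1(a): (lower limit, upper limit, frequency)
classes = [(40, 49, 8), (50, 59, 16), (60, 69, 20),
           (70, 79, 18), (80, 89, 14), (90, 99, 4)]

# Class midpoint = (class upper limit + class lower limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                          # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of fx
mean = sum_fx / n
print(n, sum_fx, mean)  # 80 5420.0 67.75
```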
Example 1 (b) (Calculating measures of dispersion from grouped data.)
The points obtained by 80 students playing computer games are recorded in the table
below.

Points      Number of students
40 – 49     8
50 – 59     16
60 – 69     20
70 – 79     18
80 – 89     14
90 – 99     4

Use the grouped frequency distribution table above to determine:

(a) the range
(b) (i) the interquartile range (ii) the semi-interquartile range
(c) (i) the variance (ii) the standard deviation

Solution:
(a)
Points      Number of students
40 – 49     8      (lower class boundary of first class = 39.5)
50 – 59     16
60 – 69     20
70 – 79     18
80 – 89     14
90 – 99     4      (upper class boundary of last class = 99.5)

The range = upper boundary of last class – lower boundary of first class
= 99.5 – 39.5 = 60 points.

(b) (i) The calculation of the interquartile range requires the use of a
cumulative frequency curve.
Position of upper quartile (Q3) = ¾ (80)th frequency = 60th frequency
Position of lower quartile (Q1) = ¼ (80)th frequency = 20th frequency
Reading the values at these positions from the cumulative frequency curve gives:
Q1 = 57
Q3 = 78.4

Therefore, interquartile range = upper quartile (Q3) – lower quartile (Q1)
= 78.4 – 57
= 21.4 points

(ii) The semi-interquartile range = interquartile range ÷ 2
= 21.4 ÷ 2
= 10.7 points
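
The quartile readings can be checked with the same kind of straight-line interpolation between the plotted cumulative frequency points. This is a sketch under that assumption (a smooth hand-drawn curve may differ slightly), and the function name `read_from_ogive` is illustrative.

```python
# Upper class boundaries and cumulative frequencies from Example 1(a)
upper_boundaries = [39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
cumulative_freq = [0, 8, 24, 44, 62, 76, 80]


def read_from_ogive(target, boundaries, cum_freq):
    """x-value at which the cumulative frequency reaches `target`,
    using straight-line interpolation between plotted points."""
    for k in range(1, len(cum_freq)):
        if cum_freq[k] >= target:
            x0, x1 = boundaries[k - 1], boundaries[k]
            y0, y1 = cum_freq[k - 1], cum_freq[k]
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target exceeds the total frequency")


n = cumulative_freq[-1]
q1 = read_from_ogive(n / 4, upper_boundaries, cumulative_freq)      # 20th
q3 = read_from_ogive(3 * n / 4, upper_boundaries, cumulative_freq)  # 60th
iqr = q3 - q1
semi_iqr = iqr / 2
print(q1, round(q3, 1))                   # 57.0 78.4
print(round(iqr, 1), round(semi_iqr, 1))  # 21.4 10.7
```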
(c) First, we construct a calculation table as follows:

Class midpoint (x)     Frequency (f)     f × x = fx     fx²
44.5                   8                 356            15 842
54.5                   16                872            47 524
64.5                   20                1290           83 205
74.5                   18                1341           99 904.5
84.5                   14                1183           99 963.5
94.5                   4                 378            35 721
                       n = ∑f = 80       ∑fx = 5 420    ∑fx² = 382 160

(i) The variance is

$$s^2 = \frac{\sum fx^2}{n} - (\bar{x})^2$$

// $\bar{x}$ is the mean of the distribution. See Example 1(a) part (c). //

$$s^2 = \frac{382\,160}{80} - (67.75)^2 = 4\,777 - 4\,590.0625 = 186.9375 \approx 186.9 \text{ points}^2 \text{ (1 d.p.)}$$

(ii) The standard deviation is

$$s = \sqrt{\frac{\sum fx^2}{n} - (\bar{x})^2} = \sqrt{186.9375} \approx 13.7 \text{ points (3 s.f.)}$$
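
The calculation table and both formulas can be verified with a short Python sketch. The variable names are illustrative. Note that the variance is in squared units (points²), while the standard deviation is in points.

```python
import math

# Class midpoints (x) and frequencies (f) from the calculation table
midpoints = [44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [8, 16, 20, 18, 14, 4]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2      # s^2 = (sum of fx^2)/n - mean^2
std_dev = math.sqrt(variance)           # s = square root of the variance
print(n, sum_fx2, variance, round(std_dev, 1))  # 80 382160.0 186.9375 13.7
```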

37.2 INTERPRETING BOX-AND-WHISKER PLOTS


The same data can often be presented using different statistical diagrams.
For example, both the stemplot and the boxplot (box-and-whisker plot) can be
used to represent the same data. Depending on the goals of the analyst, the
boxplot may be more useful for extracting information and drawing
meaningful conclusions. Hence, it is important that we can construct a boxplot
and comment on the shape of the plot.
Example 2: (Performing calculations for analysis and commenting on a box plot)

(a) The number of runs scored by a cricketer for 18 consecutive innings is
illustrated in the following stem-and-leaf diagram.

(i) Determine the median score.
(ii) Calculate the interquartile range of the scores.
(iii) In the space below, construct a box-and-whisker plot to illustrate
the data and comment on the shape of the distribution.
Solution:
(i) Middle values are 19 and 24. // 9th and 10th values respectively. //
∴ Median score = (19 + 24) / 2 = 21.5 runs

(ii) Upper quartile = 31 // Position = ¾ (18 + 1) = 14.25, so take the 14th value. //
Lower quartile = 10 // Position = ¼ (18 + 1) = 4.75, so take the 5th value. //
∴ Interquartile range = 31 – 10 = 21 runs.
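
The positions used in parts (i) and (ii) can be worked out as below. This sketch only reproduces the position arithmetic and the values quoted in the solution. Rounding each quartile position to the nearest whole value is an assumption that matches the working shown here; other textbooks interpolate between neighbouring values instead.

```python
n = 18  # number of innings

median_position = (n + 1) / 2              # 9.5: average the 9th and 10th values
lower_quartile_position = (n + 1) / 4      # 4.75: nearest whole position is 5
upper_quartile_position = 3 * (n + 1) / 4  # 14.25: nearest whole position is 14

print(median_position)                      # 9.5
print(round(lower_quartile_position),
      round(upper_quartile_position))       # 5 14

# Values at those positions, as quoted in the solution
median = (19 + 24) / 2
interquartile_range = 31 - 10
print(median, interquartile_range)          # 21.5 21
```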

(iii) The first step in constructing a boxplot is to calculate the 5-point data
summary:

Minimum value = 0|2 = 2 // See the stem-and-leaf diagram above. //
Maximum value = 4|7 = 47 // See the stem-and-leaf diagram above. //
Median = 21.5 // See working in part (i). //
Lower quartile = 10 ; Upper quartile = 31 // See working in part (ii). //

After this, we select an appropriate scale, construct the box and attach
the whiskers. Done carefully, the result is the diagram below.
Commentary:
The median line is close to the middle of the box, which means the middle half
of the scores is roughly symmetrical about the median. The length of the box is
21 runs, which indicates a fairly wide spread of the data. The left whisker
(8 runs) is shorter than the right whisker (16 runs), which means the highest
quarter of the scores is more spread out than the lowest quarter.

Pro Tip #1: Commentary on the shape of distributions, whether presented as a boxplot or
curve, involves two things: observation and interpretation. Therefore, when asked to
do this type of commentary, always start with what you observe and then explain what
that observation means (your interpretation).

Pro Tip #2: When commenting on boxplots, always speak to the median line, the length
of the box (interquartile range) and the lengths of the two whiskers.
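
The three observations can be taken straight from a 5-point data summary. The sketch below uses the summary from Example 2; the function name `boxplot_observations` and the dictionary keys are illustrative. It supplies only the observations, so the interpretation is still up to you.

```python
def boxplot_observations(minimum, q1, median, q3, maximum):
    """Measurements to quote when commenting on a boxplot."""
    return {
        "box_length": q3 - q1,             # interquartile range
        "left_whisker": q1 - minimum,      # spread of the lowest quarter
        "right_whisker": maximum - q3,     # spread of the highest quarter
        "median_to_q1": median - q1,       # lower part of the box
        "q3_to_median": q3 - median,       # upper part of the box
    }


# 5-point data summary from Example 2
shape = boxplot_observations(2, 10, 21.5, 31, 47)
print(shape["box_length"], shape["left_whisker"], shape["right_whisker"])
# 21 8 16
print(shape["median_to_q1"], shape["q3_to_median"])
# 11.5 9.5
```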

Example 3: (Using a box-plot to determine measures of dispersion; commentary)


(c) The scores of a class of 30 students on a Mathematics test were used to
draw the box plot below. (The total score possible is 20 marks.)

Using the box plot, determine the following:

(i) the median score
(ii) the range of the scores
(iii) the semi-interquartile range of the scores
(iv) Comment on the shape of the distribution of the scores.
Solution:
// All values can be read off from the boxplot. //
(i) median score = 11 marks
(ii) range of the scores = 20 – 4 = 16 marks
(iii) the semi-interquartile range of the scores
= (17 – 7)/2 = 5 marks
(iv) The median line (11 marks) is just below the middle of the box (12 marks),
which means the middle half of the scores is slightly skewed towards the higher
values (the scores above the median are more spread out). The box is 10 marks
long, more than half of the 16-mark range, which means the middle half of the
scores is fairly widely spread. The top and bottom whiskers are equal in length
(3 marks each), which indicates that the lowest and highest quarters of the
scores are equally spread out.
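
The readings in parts (ii) to (iv) can be checked with simple arithmetic on the values taken from the box plot (minimum 4, lower quartile 7, median 11, upper quartile 17, maximum 20). The variable names are illustrative.

```python
# Values read off the box plot in Example 3 (marks out of 20)
minimum, q1, median, q3, maximum = 4, 7, 11, 17, 20

score_range = maximum - minimum          # part (ii)
semi_iqr = (q3 - q1) / 2                 # part (iii)
box_middle = (q1 + q3) / 2               # compare with the median line
bottom_whisker = q1 - minimum
top_whisker = maximum - q3

print(score_range, semi_iqr)             # 16 5.0
print(box_middle, median)                # 12.0 11
print(bottom_whisker, top_whisker)       # 3 3
```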

TAKE-AWAYS
• Grouped data refers to data organized into fixed class intervals of an
attribute along with their respective frequencies.
• Calculating measures of central tendency for grouped data involves the
use of the histogram, cumulative frequency curve and additional
columns in the grouped frequency tables for calculation purposes. (See
Example 1(a))
• Calculating measures of dispersion for grouped data involves the
calculation of boundaries for each class interval, the cumulative
frequency curve and additional columns in the grouped frequency tables
for calculation purposes. (See Example 1(b))
• Commit Pro Tips #1 and #2 to memory so that you can apply them when
asked to do commentary.
