AddMathLesson (5th Form Term 2, Lesson 37 - Analyzing Data II)

LESSON 37: Analyzing Data II
“Mathematics is not about numbers, equations, computations or algorithms; it is about understanding.”

– William Paul Thurston
O.M. “Although raw quantitative data can be analyzed directly (as we saw in the previous
lesson), this is not always practical for large data sets, so at times it is
necessary to structure the data before carrying out the calculations for the
analysis. We can structure quantitative data in two ways: ungrouped and
grouped. In this lesson, we will look specifically at how to generate
descriptive statistics from ungrouped data. We will also look at how to interpret
the results generated from analyzing data structured using a stemplot.”
37.1 ANALYSIS OF UNGROUPED DATA

Ungrouped data refers to data organized into single values of an attribute along with
their respective frequencies.
Example 1(a): (Calculating measures of central tendency from ungrouped data)
The table below shows how many of the 10 swimming sessions offered by a school were
attended by each of the 40 students in one form.
Number of sessions Frequency
0 7
1 1
2 1
3 3
4 5
5 8
6 5
7 4
8 3
9 2
10 1
Use the ungrouped frequency distribution to calculate

(i) the mode (ii) the median (iii) the mean
Solution:
(i) The mode is the data item with the highest frequency.
Number of sessions    Frequency
0                     7
1                     1
2                     1
3                     3
4                     5
5                     8    ← highest frequency (data item 5)
6                     5
7                     4
8                     3
9                     2
10                    1
Hence, the mode is 5 (i.e. 5 swimming sessions).
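As a quick check, the mode can also be read off programmatically. The sketch below is illustrative only: it stores the table from Example 1(a) in a Python dictionary called `freq` (a name chosen here, not part of the lesson) and picks the data item with the largest frequency.

```python
# Frequency table from Example 1(a): number of sessions -> number of students.
# The name `freq` is an assumption made for this sketch.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

# The mode is the data item whose frequency is largest.
mode = max(freq, key=freq.get)
print(mode)  # 5
```

If two data items shared the highest frequency, `max` would report only the first one it meets, so a distribution with more than one mode would need extra handling.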

(ii) Position of median = 1/2 × (n + 1)th rank
= 1/2 × (40 + 1)th rank = 20.5th rank
⇒ there are two middle items, in the 20th and 21st ranks
To see this, we attach an additional column called the cumulative
frequency.
Number of sessions    Frequency    Cumulative frequency
0                     7            7
1                     1            7 + 1 = 8
2                     1            8 + 1 = 9
3                     3            9 + 3 = 12
4                     5            12 + 5 = 17
5                     8            17 + 8 = 25    ← 20th and 21st ranks (ranks 18 to 25)
6                     5            25 + 5 = 30
7                     4            30 + 4 = 34
8                     3            34 + 3 = 37
9                     2            37 + 2 = 39
10                    1            39 + 1 = 40
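The cumulative frequency is the sum of frequencies up to a particular item in the
data set. So up to the data item 5 (i.e. 5 swimming sessions), we have 25 students
accounted for. Since 17 students are accounted for up to the data item 4, the 18th to
25th ranks all hold the value 5.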
⇒ the values in the 20th and 21st ranks are 5 and 5.
Hence, the median = (5 + 5) ÷ 2 = 5
Ans: median is 5
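The cumulative-frequency method above can be sketched in code. This is a minimal illustration, not a general-purpose routine: the helper name `value_at_rank` is made up for this sketch, and it assumes whole-number ranks counted from 1.

```python
import math
from itertools import accumulate

# Frequency table from Example 1(a): number of sessions -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

items = sorted(freq)
# Running total of the frequencies, one entry per data item.
cum_freq = list(accumulate(freq[x] for x in items))

def value_at_rank(rank):
    """Return the data item in the given whole-number rank (1 = smallest).

    Hypothetical helper: the first item whose cumulative frequency
    reaches the rank is the value held in that rank.
    """
    for x, cf in zip(items, cum_freq):
        if rank <= cf:
            return x
    raise ValueError("rank exceeds the number of observations")

n = cum_freq[-1]            # total number of students
position = (n + 1) / 2      # 20.5, so the 20th and 21st ranks are needed
median = (value_at_rank(math.floor(position)) +
          value_at_rank(math.ceil(position))) / 2
print(cum_freq)
print(position, median)
```

When n is odd, the position is a whole number, the two ranks coincide, and the average is simply the single middle value.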
(iii) Mean = sum of (frequencies × data items) ÷ sum of frequencies
= ∑fx ÷ ∑f
The following table clarifies the calculations involved in determining the mean.
Number of sessions, x    Frequency, f    x × f = fx
0                        7               0
1                        1               1
2                        1               2
3                        3               9
4                        5               20
5                        8               40
6                        5               30
7                        4               28
8                        3               24
9                        2               18
10                       1               10
                         ∑f = 40         ∑fx = 182
Mean = ∑fx ÷ ∑f = 182 ÷ 40 = 4.55
Hence, the mean is 4.55 swimming sessions.
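The same calculation can be written as a short script. This is only a sketch; the variable names (`freq`, `total_f`, `sum_fx`) are chosen here to mirror ∑f and ∑fx and are not part of the lesson.

```python
# Frequency table from Example 1(a): number of sessions x -> frequency f.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

total_f = sum(freq.values())                  # sum of f
sum_fx = sum(x * f for x, f in freq.items())  # sum of fx
mean = sum_fx / total_f

print(total_f, sum_fx, mean)
```

Multiplying each data item by its frequency gives the same total as adding up all 40 individual values, which is why the table's fx column is enough.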

Example 1(b): (Calculating measures of dispersion from ungrouped data.)
The table below shows how many of the 10 swimming sessions offered by a school were
attended by each of the 40 students in one form.
Number of sessions    Frequency
0 7
1 1
2 1
3 3
4 5
5 8
6 5
7 4
8 3
9 2
10 1
Use the ungrouped frequency distribution to calculate

(i) the range
(ii) a) the interquartile range b) the semi-interquartile range
(iii) a) the variance b) the standard deviation
Solution:
(i) Range of distribution = highest observation – lowest observation
= 10 – 0 = 10
(ii) a) Interquartile range = upper quartile – lower quartile
Position of upper quartile = 3/4 × (n + 1)th rank
= 3/4 × (40 + 1)th rank = 30¾th rank
⇒ the upper quartile lies between the 30th and 31st ranks

To see this, we attach an additional column called the cumulative
frequency.
Number of sessions    Frequency    Cumulative frequency
0                     7            7
1                     1            7 + 1 = 8
2                     1            8 + 1 = 9
3                     3            9 + 3 = 12     ← value in 10¼th rank
4                     5            12 + 5 = 17
5                     8            17 + 8 = 25
6                     5            25 + 5 = 30    ← value in 30th rank
7                     4            30 + 4 = 34    ← value in 31st rank
8                     3            34 + 3 = 37
9                     2            37 + 2 = 39
10                    1            39 + 1 = 40
Thus, the 30th value is a 6 and the 31st value is a 7. Taking the midpoint of these two
values, the upper quartile is (6 + 7) ÷ 2 = 6.5
Position of the lower quartile = 1/4 × (n + 1)th rank
= 1/4 × (40 + 1)th rank = 10¼th rank
The 10th and 11th values are both 3, so the lower quartile = 3 (see the table above)

Hence, interquartile range = 6.5 – 3 = 3.5 (i.e. 3.5 swimming sessions)
b) Semi-interquartile range = interquartile range ÷ 2
= 3.5 ÷ 2
= 1.75
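Parts (i) and (ii) can be checked with the sketch below. It assumes the convention used in this example: when a quartile position falls between two whole ranks, the quartile is taken as the midpoint of the two neighbouring values. Other conventions (such as linear interpolation) exist and can give slightly different answers. The helper names are made up for this illustration.

```python
import math
from itertools import accumulate

# Frequency table from Example 1(b): number of sessions -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

items = sorted(freq)
cum_freq = list(accumulate(freq[x] for x in items))
n = cum_freq[-1]

def value_at_rank(rank):
    """Data item held in a whole-number rank (1 = smallest); hypothetical helper."""
    for x, cf in zip(items, cum_freq):
        if rank <= cf:
            return x
    raise ValueError("rank exceeds the number of observations")

def value_at_position(position):
    """Midpoint of the values in the ranks either side of a fractional position.

    Assumption: this follows the midpoint convention of the worked example.
    """
    return (value_at_rank(math.floor(position)) +
            value_at_rank(math.ceil(position))) / 2

data_range = max(items) - min(items)           # highest - lowest observation
upper_q = value_at_position(3 * (n + 1) / 4)   # position 30.75 -> ranks 30, 31
lower_q = value_at_position((n + 1) / 4)       # position 10.25 -> ranks 10, 11
iqr = upper_q - lower_q
semi_iqr = iqr / 2

print(data_range, lower_q, upper_q, iqr, semi_iqr)
```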
(iii) a) Calculating variance
Let x represent the number of swimming sessions.
Let f represent the frequency of that number of sessions.
Mean of the ungrouped distribution = x̄ = 4.55 (see Example 1(a))
x     f     fx    x – x̄     (x – x̄)²    f(x – x̄)²
0     7     0     –4.55     20.7025     144.9175
1     1     1     –3.55     12.6025     12.6025
2     1     2     –2.55     6.5025      6.5025
3     3     9     –1.55     2.4025      7.2075
4     5     20    –0.55     0.3025      1.5125
5     8     40    0.45      0.2025      1.62
6     5     30    1.45      2.1025      10.5125
7     4     28    2.45      6.0025      24.01
8     3     24    3.45      11.9025     35.7075
9     2     18    4.45      19.8025     39.605
10    1     10    5.45      29.7025     29.7025
      ∑f = 40                           ∑f(x – x̄)² = 313.9
Using the results in the table above, we get variance = ∑f(x – x̄)² ÷ ∑f = 313.9 ÷ 40
≈ 7.85 (3 s.f.)
Hence, the variance of the number of swimming sessions is 7.85.
(The value 7.85 indicates fairly high variability (inconsistency) within the data.)
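b) Standard deviation = √(∑f(x – x̄)² ÷ ∑f) = √(313.9 ÷ 40)
= √7.85
≈ 2.80 (3 s.f.)
Hence, the standard deviation of the number of swimming sessions is 2.80.
(The standard deviation is the square root of the variance, so it describes the same
variability in the original units: on a scale of 0 to 10 sessions, a value of about 2.8
sessions indicates a fairly wide spread about the mean.)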
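The variance table can be reproduced with a few lines of code. This sketch divides by ∑f, as the worked example does (i.e. it treats the 40 students as the whole data set); the variable names are chosen for illustration.

```python
import math

# Frequency table from Example 1(b): number of sessions x -> frequency f.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

n = sum(freq.values())                               # sum of f
mean = sum(x * f for x, f in freq.items()) / n       # x-bar = 4.55

# Last column of the table: f * (x - x-bar)^2, summed over all rows.
sum_f_sq_dev = sum(f * (x - mean) ** 2 for x, f in freq.items())

variance = sum_f_sq_dev / n          # divides by the sum of f, as in the example
std_dev = math.sqrt(variance)

print(round(sum_f_sq_dev, 4), round(variance, 2), round(std_dev, 2))
```

Taking the square root of the unrounded variance (7.8475) or of the rounded value 7.85 gives the same result to 3 significant figures here.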
37.2 INTERPRETING STEMPLOTS

A very important phase of the statistical analysis process is interpreting the
results to derive meaning and establish patterns. One technique that supports efficient and
accurate interpretation is to use the results from a stemplot to sketch a distribution curve.
Some of the possible patterns for stemplots are described below.
[Diagram: symmetrical curve with Mean, Median and Mode marked on the same vertical line]
When the mean, mode and median (measures of central tendency) have the same value, we
say that the distribution is symmetrical about the vertical line shown.
[Diagram: curve labelled Mode, Median, Mean]
Here, the mean is greater than the median, so the distribution
is skewed to the right (positively skewed).
[Diagram: curve labelled Mean, Median, Mode]
Here, the mean is less than the median, so the distribution is
skewed to the left (negatively skewed).
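The comparison rule above can be expressed as a small function. This is a simplified sketch: the function name `describe_skew` is made up here, and it compares only the mean and the median, so it does not check that the mode also coincides in the symmetrical case.

```python
def describe_skew(mean, median, tol=1e-9):
    """Classify a distribution by comparing its mean with its median.

    Hypothetical helper following the lesson's rule of thumb; `tol` is an
    assumed tolerance for treating the two values as equal.
    """
    if abs(mean - median) <= tol:
        # The lesson's symmetrical case also requires the mode to match;
        # that is not checked in this sketch.
        return "symmetrical"
    if mean > median:
        return "skewed to the right (positively skewed)"
    return "skewed to the left (negatively skewed)"

# Swimming data from Example 1(a): mean 4.55, median 5.
print(describe_skew(4.55, 5))
```

For the swimming data the mean (4.55) is below the median (5), so the rule reports a distribution skewed to the left.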
Example 2: The marks of two students as stated in their report books are:
Oliver: 78 82 65 92 83 85 64 75 79
Olivia: 86 75 83 77 78 87 67 95 79
The back-to-back stemplot for this data is given below:
(Oliver) Leaf | Stem | Leaf (Olivia)
          5 4 |  6   | 7
        9 8 5 |  7   | 5 7 8 9
        5 3 2 |  8   | 3 6 7
            2 |  9   | 5
Key: 5 | 6 | 7 means 65 for Oliver and 67 for Olivia.
(Note: A back-to-back stemplot is a variation which enables easy comparison between two sets of data.)
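A back-to-back stemplot like the one above can be generated from the raw marks. The sketch below is one possible way to do it, assuming two-digit marks so that the tens digit is the stem and the units digit is the leaf; the helper name `leaves_by_stem` is made up for this illustration.

```python
from collections import defaultdict

oliver = [78, 82, 65, 92, 83, 85, 64, 75, 79]
olivia = [86, 75, 83, 77, 78, 87, 67, 95, 79]

def leaves_by_stem(marks):
    """Group the units digits (leaves) of sorted marks under their tens digit (stem)."""
    table = defaultdict(list)
    for mark in sorted(marks):
        table[mark // 10].append(mark % 10)
    return table

left, right = leaves_by_stem(oliver), leaves_by_stem(olivia)

rows = []
for stem in sorted(set(left) | set(right)):
    # Left-hand leaves are written outwards from the stem, so reverse them.
    left_leaves = " ".join(str(d) for d in reversed(left[stem]))
    right_leaves = " ".join(str(d) for d in right[stem])
    rows.append(f"{left_leaves:>7} | {stem} | {right_leaves}")

print("\n".join(rows))
```

Reading each side outwards from the stem gives that student's marks in ascending order, which is what makes the median and quartiles easy to locate.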
Use the back-to-back stemplot to determine, for Olivia and Oliver:
(a) (i) the median
(ii) the interquartile range
(iii) the mean
(b) Interpret the results in parts (a)(i) and (a)(iii).
Solution:
(a) (i) Oliver’s scores, in ascending order, are:
64, 65, 75, 78, 79, 82, 83, 85, 92
Hence, Oliver’s median = 79
Olivia’s scores, in ascending order, are:
67, 75, 77, 78, 79, 83, 86, 87, 95
Olivia’s median = 79
(ii) With n = 9 scores each, the lower quartile is in the 1/4 × (9 + 1) = 2.5th rank and
the upper quartile is in the 3/4 × (9 + 1) = 7.5th rank.
Upper quartile for Oliver’s scores = (83 + 85)/2 = 84
Lower quartile for Oliver’s scores = (65 + 75)/2 = 70
∴ interquartile range for Oliver’s scores = 84 – 70 = 14
Upper quartile for Olivia’s scores = (86 + 87)/2 = 86.5
Lower quartile for Olivia’s scores = (75 + 77)/2 = 76
∴ interquartile range for Olivia’s scores = 86.5 – 76 = 10.5
(iii) Mean for Oliver’s scores = (64 + 65 + 75 + 78 + 79 + 82 + 83 + 85 + 92) ÷ 9
= 703 ÷ 9 ≈ 78.1
Mean for Olivia’s scores = (67 + 75 + 77 + 78 + 79 + 83 + 86 + 87 + 95) ÷ 9
= 727 ÷ 9 ≈ 80.8
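The results in part (a) can be verified with a short script. This is a sketch under the same assumption as the worked solution: positions are found with the (n + 1) rule, and a position that falls between two ranks is resolved by taking the midpoint of the two neighbouring scores. The function names are chosen for illustration.

```python
import math

oliver = [78, 82, 65, 92, 83, 85, 64, 75, 79]
olivia = [86, 75, 83, 77, 78, 87, 67, 95, 79]

def value_at_position(sorted_marks, position):
    """Midpoint of the marks in the ranks either side of a 1-based position."""
    low = sorted_marks[math.floor(position) - 1]
    high = sorted_marks[math.ceil(position) - 1]
    return (low + high) / 2

def summarise(marks):
    """Median, quartiles, interquartile range and mean (hypothetical helper)."""
    ordered = sorted(marks)
    n = len(ordered)
    lower_q = value_at_position(ordered, (n + 1) / 4)
    upper_q = value_at_position(ordered, 3 * (n + 1) / 4)
    return {
        "median": value_at_position(ordered, (n + 1) / 2),
        "lower_q": lower_q,
        "upper_q": upper_q,
        "iqr": upper_q - lower_q,
        "mean": sum(ordered) / n,
    }

oliver_summary = summarise(oliver)
olivia_summary = summarise(olivia)
print(oliver_summary)
print(olivia_summary)
```

Both students share the same median, so the interquartile range and the mean are what separate the two sets of marks.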
(b) Distribution curve for Oliver’s scores:
[Diagram: curve with Mean = 78.1 marked to the left of Median = 79]
Since the mean is less than the median, the distribution is skewed to the left
(negatively skewed), which means Oliver’s performance tends towards scores less
than 79.
Distribution curve for Olivia’s scores:
[Diagram: curve with Median = 79 marked to the left of Mean = 80.8]
Since the mean is more than the median, the distribution is skewed to the
right (positively skewed), which means Olivia’s performance tends towards scores
more than 79.
TAKE-AWAYS
• Ungrouped data refers to data organized into single values of an attribute along with
their respective frequencies.
• The cumulative frequency is the sum of frequencies up to a particular item in the data
set.
• Variance is a measure of the degree of variability (inconsistency) within the data set
• Standard deviation is a measure of the degree of spread i.e. how far the data set in
general deviates from a central value (such as mean)
• If mean, mode and median are equal, the distribution is described as symmetrical
• If mean is greater than the median, the distribution is skewed to the right (positively
skewed)
• If mean is less than the median, the distribution is skewed to the left (negatively
skewed)
