
LESSON 37: Analyzing Data II

“Mathematics is not about numbers, equations, computations or algorithms; it is about understanding.”
– William Paul Thurston

O.M. “Although raw quantitative data can be analysed directly (as we saw in the previous
lesson), this is not always practical for large data sets, so at times it is
necessary to structure the data before carrying out the calculations for the
analysis. We can structure quantitative data in two ways: ungrouped and
grouped. In this lesson, we will look specifically at how to generate
descriptive statistics from ungrouped data. We will also look at how to interpret
the results generated from analyzing data structured using a stemplot.”

37.1 ANALYSIS OF UNGROUPED DATA


Ungrouped data refers to data organized into single values of an attribute along with
their respective frequencies.
Example 1(a): (Calculating measures of central tendency from ungrouped data)
The table below shows the number of swimming sessions, out of a possible 10, attended
by each of the 40 students in one form of a school. The classes were offered by the school.
Number of sessions Frequency
0 7
1 1
2 1
3 3
4 5
5 8
6 5
7 4
8 3
9 2
10 1

Use the ungrouped frequency distribution to calculate


(i) the mode (ii) the median (iii) the mean
Solution:
(i) The mode is the data item with the highest frequency.
Number of sessions    Frequency
        0                 7
        1                 1
        2                 1
        3                 3
        4                 5
        5                 8    ← highest frequency
        6                 5
        7                 4
        8                 3
        9                 2
       10                 1

Hence, the mode is 5 (i.e. 5 swimming sessions).
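
The mode can also be read off programmatically. The sketch below is only an illustration: the dictionary freq and the helper mode_from_table are names introduced here, not part of the lesson. It returns a list because a table may have more than one data item with the highest frequency.

```python
# Frequency table from Example 1(a): sessions attended -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}


def mode_from_table(table):
    """Return every data item that shares the highest frequency."""
    highest = max(table.values())
    return [x for x, f in table.items() if f == highest]


modes = mode_from_table(freq)
print(modes)  # [5]
```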


(ii) Position of median = ½(n + 1)th rank
                        = ½(40 + 1)th rank = 20.5th rank

⇒ there are two middle items, in the 20th and 21st ranks.
To see this, we attach an additional column called the cumulative
frequency.
Number of sessions    Frequency    Cumulative frequency
        0                 7         7
        1                 1         7 + 1 = 8
        2                 1         8 + 1 = 9
        3                 3         9 + 3 = 12
        4                 5         12 + 5 = 17
        5                 8         17 + 8 = 25    ← ranks 18 to 25 (includes the 20th and 21st)
        6                 5         25 + 5 = 30
        7                 4         30 + 4 = 34
        8                 3         34 + 3 = 37
        9                 2         37 + 2 = 39
       10                 1         39 + 1 = 40

The cumulative frequency is the sum of frequencies up to a particular item in the
data set. So up to the data item 5 (i.e. 5 swimming sessions), we have 25 students
accounted for. Since only 17 students are accounted for up to the data item 4,
ranks 18 to 25 all hold the value 5.
⇒ the values in the 20th and 21st ranks are 5 and 5.
Hence, the median = (5 + 5) ÷ 2 = 5

Ans: median is 5
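
The cumulative-frequency lookup can be sketched in code as follows. This is a minimal illustration, assuming the table is stored as a dictionary; the helper names value_at_rank and median_from_table are hypothetical.

```python
from itertools import accumulate

# Frequency table from Example 1(a): sessions attended -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}


def value_at_rank(table, rank):
    """Return the data item occupying the given 1-based rank."""
    items = sorted(table)
    running_totals = accumulate(table[x] for x in items)
    for x, total in zip(items, running_totals):
        # The first item whose cumulative frequency reaches the rank holds it.
        if rank <= total:
            return x
    raise ValueError("rank exceeds the number of observations")


def median_from_table(table):
    """Median via the (n + 1) / 2 position rule used in the lesson."""
    n = sum(table.values())
    if n % 2 == 1:
        return value_at_rank(table, (n + 1) // 2)
    # Even n: average the two middle ranks.
    return (value_at_rank(table, n // 2) + value_at_rank(table, n // 2 + 1)) / 2


cumulative = list(accumulate(freq[x] for x in sorted(freq)))
median = median_from_table(freq)
print(cumulative)  # [7, 8, 9, 12, 17, 25, 30, 34, 37, 39, 40]
print(median)      # 5.0
```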
(iii) Mean = sum of (frequencies × data items) ÷ sum of frequencies
           = ∑fx ÷ ∑f

The following table clarifies the calculations involved in determining the mean.
Number of sessions, x    Frequency, f    x × f = fx
         0                    7               0
         1                    1               1
         2                    1               2
         3                    3               9
         4                    5              20
         5                    8              40
         6                    5              30
         7                    4              28
         8                    3              24
         9                    2              18
        10                    1              10
                          ∑f = 40        ∑fx = 182

Hence, mean = ∑fx ÷ ∑f = 182 ÷ 40 = 4.55

Hence, the mean is 4.55 swimming sessions.
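
As a quick check of the table, the two column totals and the mean can be computed directly. This is a small sketch; the variable names are chosen here for illustration.

```python
# Frequency table from Example 1(a): sessions attended -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

sum_f = sum(freq.values())                     # total number of students
sum_fx = sum(x * f for x, f in freq.items())   # total sessions attended
mean = sum_fx / sum_f

print(sum_f, sum_fx, mean)  # 40 182 4.55
```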


Example 1(b): (Calculating measures of dispersion from ungrouped data.)
The table below shows the number of swimming sessions, out of a possible 10, attended
by each of the 40 students in one form of a school. The classes were offered by the school.
Number of sessions Frequency
0 7
1 1
2 1
3 3
4 5
5 8
6 5
7 4
8 3
9 2
10 1

Use the ungrouped frequency distribution to calculate


(i) the range
(ii) a) the interquartile range b) the semi-interquartile range
(iii) a) the variance b) the standard deviation
Solution:
(i) Range of distribution = highest observation – lowest observation
= 10 – 0 = 10
(ii) a) Interquartile range = upper quartile – lower quartile
Position of upper quartile = ¾(n + 1)th rank
                           = ¾(40 + 1)th rank = 30¾th rank

⇒ the upper quartile is between the 30th and 31st ranks.


To see this, we attach an additional column called the cumulative
frequency.
Number of sessions    Frequency    Cumulative frequency
        0                 7         7
        1                 1         7 + 1 = 8
        2                 1         8 + 1 = 9
        3                 3         9 + 3 = 12     ← value in 10¼th rank
        4                 5         12 + 5 = 17
        5                 8         17 + 8 = 25
        6                 5         30             ← value in 30th rank
        7                 4         34             ← value in 31st rank
        8                 3         37
        9                 2         39
       10                 1         40

Thus, the 30th value is a 6 and the 31st value is a 7. So the upper quartile is
(6 + 7) ÷ 2 = 6.5
Position of the lower quartile = ¼(n + 1)th rank
                               = ¼(40 + 1)th rank = 10¼th rank

The 10th and 11th values are both 3, so the lower quartile = 3 (see the table above).


Hence, interquartile range = 6.5 – 3 = 3.5 (i.e. 3.5 swimming sessions)

b) semi-interquartile range = interquartile range ÷ 2


= 3.5 ÷ 2
= 1.75
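
The range and quartile calculations above can be sketched as follows. The sketch follows the convention used in this example, where a fractional rank takes the midpoint of the two neighbouring values; statistical libraries often interpolate differently and may give slightly different quartiles. The helper names expand and value_at_position are introduced here for illustration.

```python
import math

# Frequency table from Example 1(b): sessions attended -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}


def expand(table):
    """List every observation in ascending order."""
    return [x for x in sorted(table) for _ in range(table[x])]


def value_at_position(data, position):
    """Value at a 1-based rank in sorted data.

    A fractional rank takes the midpoint of its two neighbours,
    which is the convention used in the worked example.
    """
    lower = data[math.floor(position) - 1]
    upper = data[math.ceil(position) - 1]
    return (lower + upper) / 2


data = expand(freq)
n = len(data)

data_range = data[-1] - data[0]
lower_quartile = value_at_position(data, (n + 1) / 4)      # 10.25th rank
upper_quartile = value_at_position(data, 3 * (n + 1) / 4)  # 30.75th rank
iqr = upper_quartile - lower_quartile
semi_iqr = iqr / 2

print(data_range, lower_quartile, upper_quartile, iqr, semi_iqr)
# 10 3.0 6.5 3.5 1.75
```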
(iii) a) Calculating variance
Let x represent the number of swimming sessions attended.
Let f represent the frequency of each value of x.
Mean of the ungrouped distribution = x̄ = 4.55 (See Example 1(a))

 x     f    fx    x – x̄    (x – x̄)²    f(x – x̄)²
 0     7     0    –4.55    20.7025     144.9175
 1     1     1    –3.55    12.6025      12.6025
 2     1     2    –2.55     6.5025       6.5025
 3     3     9    –1.55     2.4025       7.2075
 4     5    20    –0.55     0.3025       1.5125
 5     8    40     0.45     0.2025       1.62
 6     5    30     1.45     2.1025      10.5125
 7     4    28     2.45     6.0025      24.01
 8     3    24     3.45    11.9025      35.7075
 9     2    18     4.45    19.8025      39.605
10     1    10     5.45    29.7025      29.7025
    ∑f = 40                         ∑f(x – x̄)² = 313.9

Using the results in the table above, we get
variance = ∑f(x – x̄)² ÷ ∑f = 313.9 ÷ 40
         ≈ 7.85 (3 s.f.)
Hence, the variance of the swimming sessions is 7.85
(The variance is measured in squared units, here sessions², so the variability within the data is easier to judge from the standard deviation.)
b) Standard deviation = √(∑f(x – x̄)² ÷ ∑f) = √(313.9 ÷ 40)
                      = √7.85
                      ≈ 2.80 (3 s.f.)
Hence, the standard deviation of swimming sessions is 2.80
(The value 2.80 indicates that attendance typically differs from the mean of 4.55 by about 2.8 sessions, a moderate spread for data that ranges from 0 to 10.)
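
The variance table can be reproduced with a short script. This sketch divides by ∑f, as the example does (a population variance), and cross-checks the result against statistics.pvariance from the Python standard library. The variable names are chosen here for illustration.

```python
import math
import statistics

# Frequency table from Example 1(b): sessions attended -> number of students.
freq = {0: 7, 1: 1, 2: 1, 3: 3, 4: 5, 5: 8, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n

# Last column of the table: f × (x − mean)², summed over all data items.
sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())

variance = sum_sq / n            # divide by the sum of frequencies, as in the lesson
std_dev = math.sqrt(variance)

# Cross-check using the raw list of 40 observations.
data = [x for x in sorted(freq) for _ in range(freq[x])]
check = statistics.pvariance(data)

print(round(sum_sq, 4), round(variance, 2), round(std_dev, 2))
```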

37.2 INTERPRETING STEMPLOTS


A very important phase of the statistical analysis process is interpreting the
results to derive meaning and establish patterns. One technique that makes interpretation
efficient and accurate is to use the results from a stemplot to create distribution curves.
The diagrams below illustrate some of the possible patterns for stemplots:
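In the first diagram, the mean, mode and median (measures of central tendency)
have the same value. So we say here that the distribution is symmetrical about
the vertical line shown.
[Diagram: symmetrical curve with the mean, median and mode on the same vertical line]

In the second diagram, the mean is greater than the median, so the distribution
is skewed to the right (positively skewed).
[Diagram: positively skewed curve, labelled from left to right: mode, median, mean]

In the third diagram, the mean is less than the median, so the distribution is
skewed to the left (negatively skewed).
[Diagram: negatively skewed curve, labelled from left to right: mean, median, mode]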
Example 2: The marks of two students as stated in their report books are:

Oliver: 78 82 65 92 83 85 64 75 79
Olivia: 86 75 83 77 78 87 67 95 79
The back-to-back stemplot for this data is given below:

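(Oliver) Leaf | Stem | Leaf (Olivia)
          5 4 |  6   | 7
        9 8 5 |  7   | 5 7 8 9
        5 3 2 |  8   | 3 6 7
            2 |  9   | 5

(Note: A back-to-back stemplot is a variation which enables easy comparison between two sets of data.
Leaves are read outwards from the stem, so the first row shows 64 and 65 for Oliver and 67 for Olivia.)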
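
One way to build such a back-to-back stemplot from the raw marks is sketched below. It assumes two-digit marks, with the tens digit as the stem and the units digit as the leaf; the function names are hypothetical.

```python
from collections import defaultdict

oliver = [78, 82, 65, 92, 83, 85, 64, 75, 79]
olivia = [86, 75, 83, 77, 78, 87, 67, 95, 79]


def leaves_by_stem(marks):
    """Group the units digits (leaves) under their tens digits (stems)."""
    groups = defaultdict(list)
    for mark in sorted(marks):
        stem, leaf = divmod(mark, 10)
        groups[stem].append(leaf)
    return groups


def back_to_back(left, right):
    """Return one text row per stem, left leaves reading outwards from the stem."""
    left_groups = leaves_by_stem(left)
    right_groups = leaves_by_stem(right)
    lowest = min(min(left_groups), min(right_groups))
    highest = max(max(left_groups), max(right_groups))
    rows = []
    for stem in range(lowest, highest + 1):
        # Reverse the left-hand leaves so the smallest leaf sits next to the stem.
        left_leaves = " ".join(str(d) for d in reversed(left_groups.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right_groups.get(stem, []))
        rows.append(f"{left_leaves:>9} | {stem} | {right_leaves}")
    return rows


rows = back_to_back(oliver, olivia)
print("\n".join(rows))
```

The printed rows match the stemplot shown above, with Oliver's leaves on the left and Olivia's on the right.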

Use the back-to-back stemplot to determine, for both Oliver and Olivia:

(a) (i) the median
(ii) the interquartile range
(iii) the mean
(b) Interpret the results in parts (a)(i) and (a)(iii)
Solution:
(a) (i) Oliver’s scores, in ascending order, are:
64, 65, 75, 78, 79, 82, 83, 85, 92
Hence, Oliver’s median = 79
Olivia’s scores, in ascending order, are:
67, 75, 77, 78, 79, 83, 86, 87, 95
Olivia’s median = 79

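(ii) With n = 9 scores, the lower quartile is at the ¼(9 + 1) = 2.5th rank and the
upper quartile is at the ¾(9 + 1) = 7.5th rank, i.e. midway between the 2nd and 3rd
scores and midway between the 7th and 8th scores.

Upper quartile for Oliver’s scores = (83 + 85)/2 = 84
Lower quartile for Oliver’s scores = (65 + 75)/2 = 70
∴ interquartile range for Oliver’s scores = 84 – 70 = 14

Upper quartile for Olivia’s scores = (86 + 87)/2 = 86.5
Lower quartile for Olivia’s scores = (75 + 77)/2 = 76
∴ interquartile range for Olivia’s scores = 86.5 – 76 = 10.5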

(iii) Mean for Oliver’s scores = (64 + 65 + 75 + 78 + 79 + 82 + 83 + 85 + 92) ÷ 9
                               = 703 ÷ 9 ≈ 78.1
Mean for Olivia’s scores = (67 + 75 + 77 + 78 + 79 + 83 + 86 + 87 + 95) ÷ 9
                         = 727 ÷ 9 ≈ 80.8
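
The three results in part (a) can be checked with the sketch below. The quartile helper follows the rank rule used in this lesson (rank = fraction × (n + 1), with a fractional rank taking the midpoint of its neighbours); it is an illustration written for this example, not a library function, and library quantile routines may use other conventions.

```python
import math
from statistics import mean, median

oliver = [78, 82, 65, 92, 83, 85, 64, 75, 79]
olivia = [86, 75, 83, 77, 78, 87, 67, 95, 79]


def quartile(scores, fraction):
    """Quartile by the lesson's rule: rank = fraction × (n + 1)."""
    ordered = sorted(scores)
    position = fraction * (len(ordered) + 1)
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2  # midpoint when the rank is fractional


def summarise(scores):
    """Median, interquartile range and mean (1 d.p.) for a list of scores."""
    q1 = quartile(scores, 0.25)
    q3 = quartile(scores, 0.75)
    return {"median": median(scores), "iqr": q3 - q1, "mean": round(mean(scores), 1)}


oliver_summary = summarise(oliver)
olivia_summary = summarise(olivia)
print(oliver_summary)  # {'median': 79, 'iqr': 14.0, 'mean': 78.1}
print(olivia_summary)  # {'median': 79, 'iqr': 10.5, 'mean': 80.8}
```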
(b) Distribution curve for Oliver’s scores:
[Diagram: distribution curve with Mean = 78.1 to the left of Median = 79]

Since the mean is less than the median, the distribution is skewed to the left
(negatively skewed), which means Oliver’s performance tends towards scores less
than 79.

Distribution curve for Olivia’s scores:
[Diagram: distribution curve with Median = 79 to the left of Mean = 80.8]

Since the mean is more than the median, the distribution is skewed to the
right (positively skewed), which means Olivia’s performance tends towards scores
more than 79.
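
The comparison of mean and median used in part (b) can be written as a small rule. This is a sketch of the lesson's rule of thumb only; the function name describe_skew is hypothetical, and with as few as nine scores the result should be read as a rough indication.

```python
from statistics import mean, median

oliver = [78, 82, 65, 92, 83, 85, 64, 75, 79]
olivia = [86, 75, 83, 77, 78, 87, 67, 95, 79]


def describe_skew(scores):
    """Classify a distribution by comparing its mean with its median."""
    m, med = mean(scores), median(scores)
    if m > med:
        return "skewed to the right (positively skewed)"
    if m < med:
        return "skewed to the left (negatively skewed)"
    return "mean equals median (consistent with a symmetrical distribution)"


print("Oliver:", describe_skew(oliver))  # skewed to the left (negatively skewed)
print("Olivia:", describe_skew(olivia))  # skewed to the right (positively skewed)
```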

TAKE-AWAYS
• Ungrouped data refers to data organized into single values of an attribute along with
their respective frequencies.
• The cumulative frequency is the sum of frequencies up to a particular item in the data
set.
• Variance is a measure of the degree of variability (inconsistency) within the data set.
• Standard deviation is a measure of the degree of spread, i.e. how far the data set in
general deviates from a central value (such as the mean).
• If the mean, mode and median are equal, the distribution is described as symmetrical.
• If the mean is greater than the median, the distribution is skewed to the right (positively
skewed).
• If the mean is less than the median, the distribution is skewed to the left (negatively
skewed).
