AddMathLesson (5th Form Term 2, Lesson 36 - Analyzing Data)

LESSON 36: Analyzing Data I
“Without Mathematics, there is nothing you can do. Everything around you is mathematics.
Everything around you is numbers.”
– Shakuntala Devi
O.M. “Data in its raw form is limited in its usefulness. That is why it is often necessary to
give the data some structure, as a form of pre-surgery, before subjecting it to analysis
and processing. We analyze data to find meaning, to discover things previously hidden,
and to understand how it relates to or participates in whatever context of study we are
engaged in. So in this lesson, we will focus on how data is analyzed mainly in its raw
form.”
36.1 QUARTILES AND PERCENTILES

When numerically ordered data is divided into 100 equal parts, each value that marks a
division is called a percentile.
A quartile is a percentile that divides the data into quarters. This means that there are
essentially three quartiles for a set of data structured into percentiles, as the diagram below
illustrates:
[Diagram: a set of data organized into 100 percentiles, with the 25th percentile marked as the
Lower Quartile (Q1), the 50th as the Middle Quartile (Q2, Median) and the 75th as the Upper
Quartile (Q3).]
The 25th percentile is called the Lower Quartile (Q1).
The 50th percentile is called the Middle Quartile or Median (Q2).
The 75th percentile is called the Upper Quartile (Q3).
Organising data into percentiles and therefore into quartiles is one method of structuring the
data to enable various calculations to be performed. All this is part of the process of analyzing
the data.
36.2 MEASURES OF CENTRAL TENDENCY

The process of analyzing data involves the use of descriptive statistics, which applies
calculations to extract useful summaries of the data set. There are two main categories of
descriptive statistics: measures of central tendency and measures of dispersion.
A measure of central tendency is a value around which the values in the data set tend to
cluster, i.e. it indicates the centre of the data. It therefore serves as a numerical summary
of the data. The measures of central tendency are:
Mean (average): the ratio of the sum of the data values to the number of data values.
Mode: the data item that appears the most (has the highest frequency).
Median: the middle value of ordered data.
Example 1 : (Calculating measures of central tendency from raw data)

The numbers of litres of gasoline served by an attendant at a gas station are:
32, 28, 29, 35, 32, 37, 33
Calculate the following for this set of values:
(a) the mode
(b) the median
(c) the mean.
Solution:
(a) the mode is the data item that appears the most.
32, 28, 29, 35, 32, 37, 33
Therefore, the mode is 32 litres.
(It is possible for a data set to not have a mode.)
(b) To calculate the median, we first order the data:
28, 29, 32, 32, 33, 35, 37 (ascending order)
The middle value (the 4th of the 7 values) is 32 litres, so this is the median.

(c) Mean = (sum of data values) ÷ (no. of data values)
= (32 + 28 + 29 + 35 + 32 + 37 + 33) ÷ 7
= 226 ÷ 7
≈ 32.3 litres
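
The answers to Example 1 can be checked with a short Python sketch. It is only an illustration, not part of the lesson's method: it assumes Python 3.8 or later and uses the standard library's statistics module. The variable names (litres, modes, middle, average) are chosen here for the example.

```python
from statistics import mean, median, multimode

# Data from Example 1: litres of gasoline served.
litres = [32, 28, 29, 35, 32, 37, 33]

# multimode returns a list of the most frequent value(s).
# Note: if every value appears once, it returns all of them,
# which this lesson would describe as "no mode".
modes = multimode(litres)

# median orders the data internally and picks the middle value.
middle = median(litres)

# mean is the sum of the values divided by how many there are.
average = mean(litres)

print(modes, middle, round(average, 1))  # prints: [32] 32 32.3
```

The printed values agree with parts (a), (b) and (c) of the solution.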
36.3 MEASURES OF DISPERSION
A measure of dispersion quantifies the spread of the data, i.e. how far the values in the
data set are from a central value. The measures of dispersion are:
Range: the difference between the highest data value and the lowest data value.
Interquartile range: difference between upper quartile and lower quartile.
Semi-interquartile range: half the interquartile range.
Variance: average (mean) of squared deviations.
Standard deviation: the square root of the variance.
Example 2 : (Calculating measures of dispersion from raw data)

The numbers of litres of gasoline served by an attendant at a gas station are:
32, 28, 29, 35, 32, 37, 33
(a) Calculate the range for this data.
(b) a) Determine:
(i) The upper quartile (ii) the lower quartile
b) Hence, calculate:
(i) the interquartile range (ii) semi-interquartile range
(c) Find for this data:
(i) the variance (ii) standard deviation
Solution:
(a) Raw data is: 32, 28, 29, 35, 32, 37, 33
The smallest observation is 28 and the largest observation is 37.
Hence, range = 37 – 28 = 9 litres.
(b) a) (i) Raw data arranged in ascending order is
28, 29, 32, 32, 33, 35, 37
Upper quartile = (3/4)(n + 1)th position, where n is the number of data items
= (3/4)(7 + 1)th position = 6th position
Data item in 6th position of 28, 29, 32, 32, 33, 35, 37 is 35.
Ans: 35 litres
(ii) Lower quartile = (1/4)(n + 1)th position, where n is the number of data items
= (1/4)(7 + 1)th position = 2nd position
Data item in 2nd position of 28, 29, 32, 32, 33, 35, 37 is 29.
Ans: 29 litres
b) (i) Interquartile range = upper quartile – lower quartile
= 35 – 29 = 6 litres
(ii) Semi-interquartile range = interquartile range ÷ 2
= 6 ÷ 2 = 3 litres.
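
The position rule used in parts (a) and (b) can be sketched in Python as below. This is an illustrative sketch only. The helper item_at is a hypothetical name introduced here, and its handling of a fractional position (linear interpolation between neighbouring items) is an assumption that the lesson does not cover; for this data set both positions are whole numbers, so that branch is not used.

```python
import math

def item_at(ordered, position):
    """Return the data item at a 1-based position in ordered data.

    Assumption (not from the lesson): a fractional position is resolved
    by interpolating linearly between the two neighbouring items.
    """
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

litres = [32, 28, 29, 35, 32, 37, 33]
ordered = sorted(litres)  # 28, 29, 32, 32, 33, 35, 37
n = len(ordered)

data_range = max(litres) - min(litres)       # largest minus smallest
lower_q = item_at(ordered, (n + 1) / 4)      # (1/4)(n + 1)th position
upper_q = item_at(ordered, 3 * (n + 1) / 4)  # (3/4)(n + 1)th position
iqr = upper_q - lower_q                      # interquartile range
semi_iqr = iqr / 2                           # semi-interquartile range

print(data_range, lower_q, upper_q, iqr, semi_iqr)  # prints: 9 29 35 6 3.0
```

The output matches the worked answers: range 9, lower quartile 29, upper quartile 35, interquartile range 6 and semi-interquartile range 3 litres.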
(c) (i) Variance uses the mean of the data as a point of reference, so we calculate
the mean first.
Mean = (32 + 28 + 29 + 35 + 32 + 37 + 33) ÷ 7 ≈ 32.3 litres
Deviation = data value – mean

Thus, we can construct the table of calculations:

Data values    Deviations        Square of deviations
(value)        (value – mean)    (value – mean)²
32             –0.3              (–0.3)² = 0.09
28             –4.3              (–4.3)² = 18.49
29             –3.3              (–3.3)² = 10.89
35             2.7               (2.7)² = 7.29
32             –0.3              (–0.3)² = 0.09
37             4.7               (4.7)² = 22.09
33             0.7               (0.7)² = 0.49
∑(value) = 226                   ∑(value – mean)² = 59.43

Hence, variance = ∑(value – mean)² ÷ 7 = 59.43 ÷ 7 ≈ 8.49 litres²
(ii) Standard deviation = √8.49 ≈ 2.91 litres (3 s.f.)
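
The variance and standard deviation in part (c) can be reproduced with the following Python sketch. It is a check on the arithmetic rather than a prescribed method. It divides by n, as the worked solution does, and it keeps the unrounded mean (226 ÷ 7) instead of 32.3, which gives the same answers to the precision shown. The variable names are chosen here for illustration.

```python
from math import sqrt

litres = [32, 28, 29, 35, 32, 37, 33]
n = len(litres)

# Mean: sum of the data values divided by the number of values.
mean = sum(litres) / n

# Deviation = data value - mean; square each deviation.
squared_deviations = [(value - mean) ** 2 for value in litres]

# Variance: mean of the squared deviations (dividing by n).
variance = sum(squared_deviations) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(variance, 2), round(std_dev, 2))  # prints: 8.49 2.91
```

In Python's statistics module, pvariance and pstdev also divide by n and would give the same results, whereas variance and stdev divide by n – 1 and would not match this solution.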

TAKE-AWAYS
• Quartiles and percentiles are means of structuring the data for subsequent
analysis.
• A measure of central tendency is a calculated value around which the values in the
data set tend to cluster. It serves as a summary of the data.
• The measures of central tendency are:
- Mean (average): the ratio of the sum of the data items to the number of data
items.
- Mode: the data item that appears the most (has the highest frequency)
- Median: the middle value of ordered data.
• A measure of dispersion quantifies the spread of the data.
• The measures of dispersion are:
- Range: the difference between the highest data value and the lowest data value.
- Interquartile range: difference between upper quartile and lower quartile.
- Semi-interquartile range: half the interquartile range.
- Variance: average (mean) of squared deviations.
- Standard deviation: the square root of the variance.
