
LESSON 36: Analyzing Data I

“Without Mathematics, there is nothing you can do. Everything around you is
mathematics. Everything around you is numbers.”
– Shakuntala Devi

O.M. “Data in its raw form is limited in its usefulness. That is why it is often necessary to
give the data some structure, as a form of pre-surgery, before subjecting it to analysis
and processing. We analyze data to find meaning, to discover things previously hidden,
and to understand how it relates to or participates in whatever context of study we are
engaged in. So in this lesson, we will focus on how data is analyzed mainly in its raw
form.”

36.1 QUARTILES AND PERCENTILES


When numerically ordered data is divided into 100 equal parts, each division is called a
percentile.
A quartile is a percentile that divides the data into quarters. This means that there are
essentially three quartiles for a set of data structured into percentiles, as the diagram below
illustrates:

Set of Data (organized into 100 percentiles):

    100
     75    Upper Quartile (Q3)
     50    Middle Quartile (Q2, Median)
     25    Lower Quartile (Q1)

The 25th percentile is called the Lower Quartile (Q1).
The 50th percentile is called the Middle Quartile or Median (Q2).
The 75th percentile is called the Upper Quartile (Q3).
Organising data into percentiles and therefore into quartiles is one method of structuring the
data to enable various calculations to be performed. All this is part of the process of analyzing
the data.

36.2 MEASURES OF CENTRAL TENDENCY


The process of analyzing data involves the use of descriptive statistics, which applies
calculations to extract useful summaries of the data set. There are two main categories of
descriptive statistics: measures of central tendency and measures of dispersion.
A measure of central tendency is a value around which the values in the data set tend to
cluster, i.e. the centre of the data. It therefore serves as a numerical summary of the data.
The measures of central tendency are:
• Mean (average): the sum of the data values divided by the number of data values.
• Mode: the data item that appears the most (has the highest frequency).
• Median: the middle value of ordered data.

Example 1 : (Calculating measures of central tendency from raw data)


The numbers of litres of gasoline served by an attendant at a gas station are:
32, 28, 29, 35, 32, 37, 33
Calculate the following for this set of values:
(a) the mode
(b) the median
(c) the mean.

Solution:
(a) The mode is the data item that appears the most.
32, 28, 29, 35, 32, 37, 33
The value 32 appears twice and every other value appears once.
Therefore, the mode is 32 litres.
(It is possible for a data set to not have a mode.)
(b) To calculate the median, we first order the data:
28, 29, 32, 32, 33, 35, 37 (ascending order)
With 7 values, the middle value is the one in the 4th position.
The middle value is 32 litres, so this is the median.


(c) $\text{Mean} = \dfrac{\text{sum of data values}}{\text{no. of data values}} = \dfrac{32 + 28 + 29 + 35 + 32 + 37 + 33}{7} = \dfrac{226}{7} \approx 32.3$ litres
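
The three results of Example 1 can be checked with a short Python sketch. It relies only on the standard `statistics` module; the variable name `litres` is chosen here for illustration and is not part of the lesson.

```python
import statistics

# Data from Example 1 (litres of gasoline served)
litres = [32, 28, 29, 35, 32, 37, 33]

mode = statistics.mode(litres)      # most frequent value
median = statistics.median(litres)  # middle value of the sorted data
mean = statistics.mean(litres)      # sum of values / number of values

print(mode, median, round(mean, 1))  # 32 32 32.3
```

The mean is rounded to one decimal place only for display, to match the worked answer.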

36.3 MEASURES OF DISPERSION

A measure of dispersion quantifies the spread of the data, i.e. how far the values in the
data set are from a central value. The measures of dispersion are:
• Range: the difference between the highest data value and the lowest data value.
• Interquartile range: the difference between the upper quartile and the lower quartile.
• Semi-interquartile range: half the interquartile range.
• Variance: the average (mean) of the squared deviations from the mean.
• Standard deviation: the square root of the variance.

Example 2 : (Calculating measures of dispersion from raw data)


The numbers of litres of gasoline served by an attendant at a gas station are:
32, 28, 29, 35, 32, 37, 33
(a) Calculate the range for this data.
(b) a) Determine:
       (i) the upper quartile (ii) the lower quartile
    b) Hence, calculate:
       (i) the interquartile range (ii) the semi-interquartile range
(c) Find for this data:
    (i) the variance (ii) the standard deviation
Solution:
(a) Raw data is: 32, 28, 29, 35, 32, 37, 33
The smallest observation is 28 and the largest observation is 37.
Hence, range = 37 – 28 = 9 litres.
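
As a quick check of this subtraction, a minimal Python sketch using the built-in `max` and `min` functions (the names `litres` and `data_range` are illustrative):

```python
# Data from Example 2 (litres of gasoline served)
litres = [32, 28, 29, 35, 32, 37, 33]

# Range = largest observation - smallest observation
data_range = max(litres) - min(litres)

print(data_range)  # 9
```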

(b) a) (i) Raw data arranged in ascending order is
28, 29, 32, 32, 33, 35, 37
Upper quartile = $\frac{3}{4}(n + 1)$th position, where n represents the number of data items.
= $\frac{3}{4}(7 + 1)$th position = 6th position
Data item in 6th position of 28, 29, 32, 32, 33, 35, 37 is 35.
Ans: 35
(ii) Lower quartile = $\frac{1}{4}(n + 1)$th position, where n represents the number of data items.
= $\frac{1}{4}(7 + 1)$th position = 2nd position
Data item in 2nd position of 28, 29, 32, 32, 33, 35, 37 is 29.
Ans: 29
b) (i) Interquartile range = upper quartile – lower quartile
= 35 – 29 = 6 litres
(ii) Semi-interquartile range = interquartile range ÷ 2
= 6 ÷ 2 = 3 litres.
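
The position rule used in part (b) can be sketched in Python as follows. The helper `value_at_position` is a hypothetical name introduced here. Its handling of fractional positions (linear interpolation between neighbouring items) is an assumption, since the lesson only covers positions that come out as whole numbers, and other quartile conventions exist.

```python
def value_at_position(ordered, position):
    """Return the item at a 1-based position in sorted data.

    Fractional positions are interpolated linearly between the two
    neighbouring items (an assumption, not covered in the lesson).
    Positions below 1 or above len(ordered) are not handled.
    """
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part, 0 for whole positions
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


litres = sorted([32, 28, 29, 35, 32, 37, 33])
n = len(litres)

upper_quartile = value_at_position(litres, 3 / 4 * (n + 1))  # 6th position
lower_quartile = value_at_position(litres, 1 / 4 * (n + 1))  # 2nd position

iqr = upper_quartile - lower_quartile   # interquartile range
semi_iqr = iqr / 2                      # semi-interquartile range

print(upper_quartile, lower_quartile, iqr, semi_iqr)  # 35 29 6 3.0
```

For this data set, `statistics.quantiles(litres, n=4)` from the standard library returns the same quartiles, because its default `'exclusive'` method also places them at positions based on n + 1.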

(c) (i) Variance uses the mean of the data as a point of reference, so we calculate
the mean first.
$\text{Mean} = \dfrac{32 + 28 + 29 + 35 + 32 + 37 + 33}{7} \approx 32.3$ litres

Deviation = data value – mean


Thus, we can construct the table of calculations:

Data values     Deviations        Square of deviations
(value)         (value – mean)    (value – mean)²
32              –0.3              (–0.3)² = 0.09
28              –4.3              (–4.3)² = 18.49
29              –3.3              (–3.3)² = 10.89
35               2.7              (2.7)² = 7.29
32              –0.3              (–0.3)² = 0.09
37               4.7              (4.7)² = 22.09
33               0.7              (0.7)² = 0.49
∑(value) = 226                    ∑(value – mean)² = 59.43

Hence, $\text{variance} = \dfrac{\sum(\text{value} - \text{mean})^2}{n} = \dfrac{59.43}{7} \approx 8.49$ litres²

(ii) Standard deviation = √8.49 ≈ 2.91 litres (3 s.f.)
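
The table-based calculation in part (c) can be reproduced with a short Python sketch. It divides by the number of data values, as the worked solution does. The variable names are illustrative, and the unrounded mean is used internally, so intermediate figures may differ slightly from the table before rounding.

```python
import math

# Data from Example 2 (litres of gasoline served)
litres = [32, 28, 29, 35, 32, 37, 33]
n = len(litres)

mean = sum(litres) / n

# Squared deviation of each value from the mean
squared_deviations = [(value - mean) ** 2 for value in litres]

variance = sum(squared_deviations) / n   # mean of the squared deviations
std_dev = math.sqrt(variance)            # square root of the variance

print(round(variance, 2), round(std_dev, 2))  # 8.49 2.91
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same two quantities directly.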


TAKE-AWAYS
• Quartiles and percentiles are means of structuring the data for subsequent
analysis.
• A measure of central tendency is a calculated value around which the values in the
data set tend to cluster. It serves as a summary of the data.
• The measures of central tendency are:
- Mean (average): the sum of the data items divided by the number of data
items.
- Mode: the data item that appears the most (has the highest frequency).
- Median: the middle value of ordered data.
• A measure of dispersion quantifies the spread of the data.
• The measures of dispersion are:
- Range: the difference between the highest data value and the lowest data value.
- Interquartile range: difference between upper quartile and lower quartile.
- Semi-interquartile range: half the interquartile range.
- Variance: average (mean) of squared deviations.
- Standard deviation: the square root of the variance.
