Adv Stat Central Tendency Variations
Central Tendency and Variability
Prepared by:
DR LEONORA T. DELA CRUZ
Central Tendency
Central tendency (sometimes called “measures of location,”
“central location,” or just “center”) is a way to describe
what’s typical for a set of data. Central tendency doesn’t tell
you specifics about the individual pieces of data, but it does
give you an overall picture of what is going on in the entire
data set.
A measure of central tendency is a single value that attempts to describe a
set of data by identifying the central position within that set of data.
Analogy: a balance scale, where the center is the point at which the
distribution is in balance.
Measures of Central Tendency
1. Mean
The sum of the values of all observations in a dataset divided by the number of observations.
This is also known as the arithmetic average.
The mean can be used for both continuous and discrete numeric data.
The mean cannot be calculated for categorical data, as the values cannot be summed.
Because the mean includes every value in the distribution, it is influenced by outliers and
skewed distributions.
2. Median
The median is the middle value in a distribution when the values are arranged in ascending or
descending order. It divides the distribution in half (there are 50% of observations on either side
of the median value). In a distribution with an odd number of observations, the median value is
the middle value. With an even number of observations, it is the mean of the two middle values.
The median is less affected by outliers and skewed data than the mean, and is usually the
preferred measure of central tendency when the distribution is not symmetrical.
The median cannot be identified for categorical nominal data, as it cannot be logically ordered.
3. Mode
The value that occurs most often in the data.
Can be found for both numerical and categorical (non-numerical) data.
In some distributions, the mode may not reflect the center of the distribution very well.
It is also possible for there to be more than one mode for the same distribution of data
(bi-modal or multi-modal).
In some cases, particularly where the data are continuous, the distribution may have no mode
at all (i.e. if all values are different).
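The three measures can be tried out with Python's standard `statistics` module. The sketch below uses hypothetical scores and colours chosen only for illustration (they do not come from the slides), and shows how a single extreme value shifts the mean much more than the median.

```python
from statistics import mean, median, multimode

# Hypothetical scores, chosen only to illustrate the three measures.
scores = [1, 3, 3, 4, 5, 5, 5, 6, 4]

score_mean = mean(scores)        # sum of the values divided by their count
score_median = median(scores)    # middle value of the sorted data
score_modes = multimode(scores)  # every most-frequent value, as a list
print(score_mean, score_median, score_modes)

# One extreme value moves the mean far more than the median.
with_outlier = scores + [64]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)
print(outlier_mean, outlier_median)

# The mode also works for categorical data, where a mean is undefined.
colours = ["red", "blue", "red", "green"]
colour_modes = multimode(colours)
print(colour_modes)
```

`multimode` is used instead of `mode` so that bi-modal or multi-modal data return all of their modes.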
How does the shape of a distribution influence the
Measures of Central Tendency?
Symmetrical distributions:
When a distribution is symmetrical, the mode, median and
mean are all in the middle of the distribution.
Positively skewed distributions:
The mean tends to be ‘pulled’ toward the right tail of the distribution.
Although there are exceptions to this rule, generally, most of the
values, including the median value, tend to be less than the
mean value.
Negatively skewed distributions:
The mean tends to be ‘pulled’ toward the left tail of the distribution.
Although there are exceptions to this rule, generally, most of the
values, including the median value, tend to be greater than the
mean value.
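A small numerical check can make the effect of skew concrete. The data below are hypothetical: a positively skewed list, and its mirror image as a negatively skewed one. This is only a sketch of the general tendency described above, not a proof.

```python
from statistics import mean, median

# Hypothetical positively skewed data: mostly small values, a few large ones.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10, 18]
# Mirroring the values gives a negatively skewed version of the same shape.
left_skewed = [20 - v for v in right_skewed]

right_mean, right_median = mean(right_skewed), median(right_skewed)
left_mean, left_median = mean(left_skewed), median(left_skewed)

print("positive skew: mean =", right_mean, "median =", right_median)
print("negative skew: mean =", left_mean, "median =", left_median)
```

In the first list the mean lies above the median; in the mirrored list it lies below.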
Finding the mean and median of ungrouped data
fx: 162, 55, 56, 114, 116, 120
Σfx = 623
n = 11
Mean: x̄ = Σfx / n = 623 / 11 = 56.64
Median: the 6th score, Mdn = 57
Mode: the value that occurs most, Mo = 54
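The same calculation can be sketched in Python. The score and frequency columns did not survive in the table, so the (score, frequency) pairs below are an assumption inferred from the fx products, n = 11, the median and the mode; treat them as illustrative only.

```python
# Assumed reconstruction: these (score, frequency) pairs are inferred from the
# fx products 162, 55, 56, 114, 116, 120; they are not listed in the source.
table = [(54, 3), (55, 1), (56, 1), (57, 2), (58, 2), (60, 2)]

n = sum(f for _, f in table)
sum_fx = sum(x * f for x, f in table)
mean = sum_fx / n

# Expand the table into the ordered list of scores to locate the median.
scores = sorted(x for x, f in table for _ in range(f))
median = scores[n // 2]  # n is odd here, so this is the ((n + 1) / 2)th score

# The mode is the score with the largest frequency.
mode = max(table, key=lambda pair: pair[1])[0]

print(n, sum_fx, round(mean, 2), median, mode)
```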
Mean, Median and Mode for Grouped Data
• Mean
• Median
• Mode
Exercise 8 (Grouped Data): A group of university students took part in a sponsored
race. The number of laps completed is given in the table below. Use the information to (a)
calculate an estimate for the mean number of laps; (b) determine the mode; and (c)
solve for the median.
Number of Laps   Frequency (f)   Midpoint (x)   fx      Cumulative Frequency (F)
1 – 5            2               3              6       2
6 – 10           9               8              72      11
11 – 15          13              13             169     24
16 – 20          22              18             396     46
21 – 25          17              23             391     63
26 – 30          25              28             700     88
31 – 35          2               33             66      90
36 – 40          1               38             38      91
Σf or n = 91     Σfx = 1,838
Positions of the ranked observations within each class:
Number of Laps   Frequency (f)   Cumulative Frequency (F)   Positions
1 – 5            2               2                          1st to 2nd
6 – 10           9               11                         3rd to 11th
11 – 15          13              24                         12th to 24th
16 – 20          22              46                         25th to 46th
21 – 25          17              63                         47th to 63rd
26 – 30          25              88                         64th to 88th
31 – 35          2               90                         89th to 90th
36 – 40          1               91                         91st
Mean
x̄ = Σfx / n = 1,838 / 91
x̄ = 20.20
Median
Median class: 16 – 20, the first class whose cumulative frequency (46) reaches n / 2 = 91 / 2 = 45.5
Lower boundary of the median class: 15.5
Cumulative frequency below the median class: 24; frequency of the median class: 22; class width: 5
Mdn = 15.5 + ((91 / 2 – 24) / 22) × 5
Mdn = 15.5 + 4.89
Mdn = 20.39
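The grouped-data estimates for the laps exercise can be checked with a short Python sketch. The mean and median follow the working shown above. The mode formula, L + d1 / (d1 + d2) × width, is the common textbook interpolation and is an assumption here, since the slide's own mode formula is not available; the function name `interpolate` is likewise just illustrative.

```python
# (lower limit, upper limit, frequency) for each class of the laps table.
classes = [(1, 5, 2), (6, 10, 9), (11, 15, 13), (16, 20, 22),
           (21, 25, 17), (26, 30, 25), (31, 35, 2), (36, 40, 1)]
width = 5
n = sum(f for _, _, f in classes)

# Mean: weight each class midpoint by its frequency.
mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

def interpolate(target):
    """Value at cumulative position `target`, by linear interpolation in its class."""
    below = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if below + f >= target:
            return (lo - 0.5) + (target - below) / f * width
        below += f

median = interpolate(n / 2)

# Mode (assumed textbook formula): d1 and d2 are the differences between the
# modal frequency and the frequencies of the classes before and after it.
freqs = [f for _, _, f in classes]
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)
mode = (classes[m][0] - 0.5) + d1 / (d1 + d2) * width

print(round(mean, 2), round(median, 2), round(mode, 2))
```

All three results are estimates, because the individual lap counts inside each class are unknown.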
Quartiles
Quartiles are the three values that split a data set into
four equal parts. Note that the 'middle' quartile is the
median.
Quartiles for Ungrouped Data:
Exercise 9: The following data are marks obtained by 20 students in a test of
statistics. Determine Q1, Q2, and Q3.
53 74 82 42 39 20 81 68 58 28
67 54 93 70 30 55 36 38 29 61
Array: 20 28 29 30 36 38 39 42 53 54 55 58 61 67 68 70 74 81 82 93
Order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Q1 = value of the (20 + 1)/4 th item = value of the 5.25th item
Q1 = 36 + 0.25 (38 – 36) = 36 + 0.25 (2) = 36 + 0.5 = 36.5
Q2 = value of the (20 + 1)/2 th item = value of the 10.5th item
Q2 = 54 + 0.5 (55 – 54) = 54 + 0.5 (1) = 54 + 0.5 = 54.5
Q3 = value of the 3(20 + 1)/4 th item = value of the 15.75th item
Q3 = 68 + 0.75 (70 – 68) = 68 + 0.75 (2) = 68 + 1.5 = 69.5
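The (n + 1)-based positions used in this exercise can be sketched in Python as follows. The helper name `quartile` is hypothetical; the standard library's `statistics.quantiles` with `method="exclusive"` is shown alongside because it locates quartiles at the same positions.

```python
from statistics import quantiles

marks = [53, 74, 82, 42, 39, 20, 81, 68, 58, 28,
         67, 54, 93, 70, 30, 55, 36, 38, 29, 61]

def quartile(data, k):
    """k-th quartile (k = 1, 2, 3), located at the k(n + 1)/4 th ordered item."""
    ordered = sorted(data)
    n = len(ordered)
    # Keep the position inside the data for very small samples.
    position = min(max(k * (n + 1) / 4, 1), n)
    whole = int(position)        # rank of the item at or just below the position
    fraction = position - whole  # how far to move toward the next item
    lower = ordered[whole - 1]
    upper = ordered[min(whole, n - 1)]
    return lower + fraction * (upper - lower)

q1, q2, q3 = (quartile(marks, k) for k in (1, 2, 3))
print(q1, q2, q3)

# The "exclusive" method of statistics.quantiles uses the same positions.
library_quartiles = quantiles(marks, n=4, method="exclusive")
print(library_quartiles)
```

Other software may use different quartile conventions, so results can differ slightly from the hand calculation.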
Quartiles for Grouped Data
To find the location of the quartiles use:
q1 = (¼) n for Q1;
q2 = (½) n for Q2; and
q3 = (¾) n for Q3
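Once a quartile's location is known, its value is usually estimated by interpolating inside the class that contains that location, in the same way as the grouped median. The sketch below applies this to the laps table from Exercise 8; the interpolation step and the function name `grouped_quartile` are assumptions, since only the location formulas are given here.

```python
# (lower limit, upper limit, frequency) for each class of the laps table.
classes = [(1, 5, 2), (6, 10, 9), (11, 15, 13), (16, 20, 22),
           (21, 25, 17), (26, 30, 25), (31, 35, 2), (36, 40, 1)]
width = 5
n = sum(f for _, _, f in classes)

def grouped_quartile(k):
    """Estimate Qk: find the class holding location k*n/4, then interpolate."""
    location = k * n / 4
    below = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if below + f >= location:
            lower_boundary = lo - 0.5
            return lower_boundary + (location - below) / f * width
        below += f

q1, q2, q3 = (grouped_quartile(k) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

With k = 2 the function reproduces the grouped median.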
Groups in semester 2 show more dispersion (or variability in size) than those in semester 1.
Four Frequently used Measures of Variability
1. Range is simply the highest score minus the lowest score. It is easy to
calculate but very much affected by extreme values (the range is not a
resistant measure of variability).
For grouped data: R = upper boundary of the highest interval – lower boundary of the lowest interval
2. The interquartile range (IQR) is the difference between the upper and lower quartiles:
IQR = Q3 – Q1. In some texts it is described as the difference between the largest and
smallest values in the middle 50% of a set of data. The IQR is not affected by extreme
values. It is thus a resistant measure of variability.
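4. The variance is defined as the average of the squared differences from the mean.
The standard deviation is the square root of the variance.
Formula for the standard deviation and variance (sample form, dividing by n – 1):
s² = Σ(x – x̅)² / (n – 1)
s = √s²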
Exercise 11 (Ungrouped Data): Shown below are the examination marks for 20 students
following a particular module. Determine the values of (a) Range; (b) IQR; (c) Standard
Deviation; and (d) Variance using the table below.
60 74 76 78 66 68 50 56 58 43
48 50 59 62 70 71 80 65 52 52
n = 20
Mark (x)   x – x̅    (x – x̅)²
43         -18.9    357.21
48         -13.9    193.21
50         -11.9    141.61
50         -11.9    141.61
52         -9.9     98.01
52         -9.9     98.01
56         -5.9     34.81
58         -3.9     15.21
59         -2.9     8.41
60         -1.9     3.61
62         0.1      0.01
65         3.1      9.61
66         4.1      16.81
68         6.1      37.21
70         8.1      65.61
71         9.1      82.81
74         12.1     146.41
76         14.1     198.81
78         16.1     259.21
80         18.1     327.61
Σx = 1,238   Σ(x – x̅)² = 2,235.80
Mean: x̄ = Σx / n = 1,238 / 20 = 61.9
Range: R = 80 – 43 = 37
Interquartile range (Exercise 11):
Q1 = value of the (20 + 1)/4 th item = value of the 5.25th item
Q1 = 52 + 0.25 (52 – 52) = 52 + 0.25 (0) = 52 + 0 = 52.0
Q3 = value of the 3(20 + 1)/4 th item = value of the 15.75th item
Q3 = 70 + 0.75 (71 – 70) = 70 + 0.75 (1) = 70 + 0.75 = 70.75
IQR = Q3 – Q1 = 70.75 – 52.0 = 18.75
Standard deviation and variance (Exercise 11):
Σx = 1,238   Σ(x – x̅)² = 2,235.80
Standard Deviation: s = √(Σ(x – x̅)² / (n – 1)) = √(2,235.80 / (20 – 1)) = 10.85
Variance: s² = Σ(x – x̅)² / (n – 1) = 2,235.80 / (20 – 1) = 117.67
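The four measures for the Exercise 11 marks can be cross-checked in Python. This is a minimal sketch: the quartiles rely on `statistics.quantiles` with `method="exclusive"`, which uses the same (n + 1)-based positions as the hand calculation, and the variance is the sample form that divides by n – 1.

```python
from statistics import mean, quantiles, stdev, variance

marks = [60, 74, 76, 78, 66, 68, 50, 56, 58, 43,
         48, 50, 59, 62, 70, 71, 80, 65, 52, 52]

data_range = max(marks) - min(marks)

# "exclusive" places Q1 and Q3 at the (n + 1)/4 th and 3(n + 1)/4 th items.
q1, _, q3 = quantiles(marks, n=4, method="exclusive")
iqr = q3 - q1

x_bar = mean(marks)
sum_sq = sum((x - x_bar) ** 2 for x in marks)  # sum of squared deviations
s2 = sum_sq / (len(marks) - 1)                 # sample variance
s = s2 ** 0.5                                  # sample standard deviation

print(data_range, q1, q3, iqr)
print(round(x_bar, 2), round(sum_sq, 2), round(s2, 2), round(s, 2))

# The library functions use the same n - 1 divisor.
print(round(variance(marks), 2), round(stdev(marks), 2))
```

Computing the variance directly from the sum of squared deviations avoids the small rounding error introduced by squaring an already rounded standard deviation.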
Exercise 12 (Ungrouped Data in Frequency Distribution): 15 students were asked how
many hours (x) they worked per day. Their responses in hours are listed below. Determine
the values of (a) Range; (b) IQR; (c) Standard Deviation; and (d) Variance
Hours of Work (x)   Frequency (f)   fx    x – x̅    (x – x̅)²   (x – x̅)² f
2                   2               4     -2.93    8.5849     17.1698
4                   4               16    -0.93    0.8649     3.4596
5                   5               25    0.07     0.0049     0.0245
7                   3               21    2.07     4.2849     12.8547
8                   1               8     3.07     9.4249     9.4249
n = 15   Σfx = 74   Σ(x – x̅)² f = 42.93
Mean: x̄ = 74 / 15 = 4.93
Range: R = 8 – 2 = 6
Q1 = value of the (15 + 1)/4 th item = value of the 4th item = 4.00
Q3 = value of the 3(15 + 1)/4 th item = value of the 12th item = 7.00
IQR = Q3 – Q1 = 7.00 – 4.00 = 3.00
Standard deviation and variance (Exercise 12):
n = 15   Σfx = 74   Σ(x – x̅)² f = 42.93
Standard Deviation: s = √(Σ(x – x̅)² f / (n – 1)) = √(42.93 / (15 – 1)) = 1.75
Variance: s² = Σ(x – x̅)² f / (n – 1) = 42.93 / (15 – 1) = 3.07
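For data given as a frequency distribution, each squared deviation is weighted by its frequency. The sketch below works through the Exercise 12 hours with the unrounded mean, so its totals can differ in the second decimal place from a table built with rounded intermediate values.

```python
from statistics import quantiles

# (hours worked x, frequency f) from the Exercise 12 table.
hours = [(2, 2), (4, 4), (5, 5), (7, 3), (8, 1)]

n = sum(f for _, f in hours)
sum_fx = sum(x * f for x, f in hours)
x_bar = sum_fx / n

# Weighted sum of squared deviations, then the sample variance (n - 1 divisor).
sum_sq = sum(f * (x - x_bar) ** 2 for x, f in hours)
s2 = sum_sq / (n - 1)
s = s2 ** 0.5

# Expand the table to individual responses for the range and quartiles.
responses = sorted(x for x, f in hours for _ in range(f))
data_range = responses[-1] - responses[0]
q1, _, q3 = quantiles(responses, n=4, method="exclusive")

print(n, sum_fx, round(x_bar, 2))
print(data_range, q1, q3, q3 - q1)
print(round(sum_sq, 2), round(s2, 2), round(s, 2))
```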
Exercises
Exercise 13 (Grouped Data): 220 students were asked the number of hours per week
they spent watching television. With this information, calculate the mean and standard
deviation of hours spent watching television by the 220 students. Determine the values
of (a) Range; (b) IQR; (c) Standard Deviation; and (d) Variance