
Lesson 4.

MEASURES OF CENTRAL TENDENCY AND VARIABILITY

▪ Frequency distributions by themselves do not allow quantitative statements that
characterize the distribution as a whole, nor do they allow quantitative
comparisons between two or more distributions.
▪ It is also important to quantify the extent to which scores in distributions are
dispersed, or spread out.
▪ These requirements are satisfied using measures of central tendency and
measures of variability.

4.1 MEASURES OF CENTRAL TENDENCY

The three most used measures of central tendency are the arithmetic mean, the
median, and the mode.

4.1.1 The MEAN

▪ The arithmetic mean is defined as the sum of the scores divided by the
number of scores.

▪ Properties of the Mean

1. Changing a Score: Changing the value of any score in a distribution will
change the mean.

2. Introducing a New Score or Removing a Score: If you add a new score or
take away a score, both ΣX and N will change. This does not guarantee,
however, that the mean will change. For example, adding a score equal to
the current mean leaves the mean unchanged.

3. Adding or Subtracting a Constant: Consider this example.

Amount of food      Drug to reduce
eaten by rats       appetite (-2)
      6                   4
      3                   1
      5                   3
      3                   1
      4                   2
      5                   3
ΣX = 26             ΣX = 14
Mean = 4.333        Mean = 2.333

When a constant is added to or subtracted from every score, the mean changes by
exactly that constant.

4. Multiplying or Dividing a Score by a Constant: Again, consider this example.

Given               Given x 3
  10                   30
   9                   27
   8                   24
  10                   30
  11                   33
   5                   15
ΣX = 53             ΣX = 159
Mean = 8.833        Mean = 26.5

When every score is multiplied or divided by a constant, the mean is multiplied
or divided by that same constant.
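
The properties of the mean listed above can be checked numerically. The following Python sketch uses the two small data sets from the examples; the function name `mean` and the variable names are illustrative choices, and the three scores used for property 2 are hypothetical.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

food_eaten = [6, 3, 5, 3, 4, 5]     # rats example from property 3
given = [10, 9, 8, 10, 11, 5]       # example from property 4

# Property 3: subtracting a constant (2) from every score lowers the mean by 2.
after_drug = [x - 2 for x in food_eaten]
print(round(mean(food_eaten), 3), round(mean(after_drug), 3))

# Property 4: multiplying every score by a constant (3) multiplies the mean by 3.
tripled = [x * 3 for x in given]
print(round(mean(given), 3), round(mean(tripled), 3))

# Property 2: adding a score equal to the current mean leaves the mean unchanged.
scores = [2, 4, 6]                  # hypothetical scores
with_extra = scores + [mean(scores)]
print(mean(scores), mean(with_extra))
```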

▪ The Overall Mean

Suppose three distributions have the same number of scores (i.e., n1 = 20,
n2 = 20, and n3 = 20), with the following means: Mean1 = 60, Mean2 = 50, and
Mean3 = 40.

Because the number of scores per distribution is the same, the overall mean is
simply the mean of the three means:

(60 + 50 + 40)/3 = 150/3 = 50

However, with unequal numbers of scores per distribution (i.e., n1 = 10, n2 = 60,
and n3 = 30), each mean must be weighted by its number of scores:

60 x 10 = 600
50 x 60 = 3000
40 x 30 = 1200
Total   = 4800
Overall mean = 4800/100 = 48
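
The weighting procedure above can be sketched in a few lines of Python. The function name `overall_mean` and its parameters are illustrative names chosen for this sketch; the numbers are the ones from the lesson.

```python
def overall_mean(means, ns):
    """Overall mean: sum of (group mean x group n) divided by the total n."""
    total_sum = sum(m * n for m, n in zip(means, ns))
    return total_sum / sum(ns)

group_means = [60, 50, 40]

# Equal group sizes: the result matches the simple average of the means.
equal_n = overall_mean(group_means, [20, 20, 20])

# Unequal group sizes: each mean is weighted by its n.
unequal_n = overall_mean(group_means, [10, 60, 30])

print(equal_n, unequal_n)
```

With equal group sizes the weights cancel out, which is why the shortcut of averaging the three means works only in that case.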

Exercise: A researcher conducted an experiment involving three groups of subjects. The
mean of the first group is 75, and there were 50 subjects in the group. The mean of the
second group is 80, and there were 40 subjects. The third group has a mean of 70 and
25 subjects. Calculate the overall mean of the three groups combined.

Solution:

Mean      n       Mean x n
75        50      3750
80        40      3200
70        25      1750
Total     115     8700

Overall Mean = 8700/115 = 75.652

4.1.2 The MEDIAN

Definition: The median (symbol Mdn) is defined as the scale value below which 50% of
the scores fall. It is therefore the same thing as P50.

Practice Problem:

Calculate the median of the grouped scores in the following table.

Class Interval     f     Cum f     Cum %
3.6 – 4.0          4      52      100.00
3.1 – 3.5          6      48       92.31
2.6 – 3.0          8      42       80.77
2.1 – 2.5         10      34       65.38
1.6 – 2.0          9      24       46.15
1.1 – 1.5          7      15       28.85
0.6 – 1.0          5       8       15.38
0.1 – 0.5          3       3        5.77
                 N = 52

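Mdn = P50
    = XL + (i/fi)(cum fP – cum fL)

XL     = 2.05  (real lower limit of the interval containing the median, 2.1 – 2.5)
i      = 0.5   (interval width)
fi     = 10    (frequency of that interval)
cum fP = 26    (number of scores below the median: 50% of 52)
cum fL = 24    (cumulative frequency below the interval's real lower limit)

Mdn = 2.05 + (0.5/10)(26 – 24)
    = 2.05 + 0.10
    = 2.15
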
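
As a check on the arithmetic above, the grouped-score formula can be written as a small Python function. The function and parameter names are illustrative, not from the lesson; the input values are read from the table and the worked solution.

```python
def percentile_point(x_lower, width, f_interval, cum_f_p, cum_f_below):
    """XL + (i / fi) * (cum fP - cum fL) for grouped scores."""
    return x_lower + (width / f_interval) * (cum_f_p - cum_f_below)

n = 52
cum_f_p = 0.50 * n          # 50% of the scores lie below the median

mdn = percentile_point(
    x_lower=2.05,           # real lower limit of the 2.1 - 2.5 interval
    width=0.5,              # interval width
    f_interval=10,          # frequency of that interval
    cum_f_p=cum_f_p,
    cum_f_below=24,         # cumulative frequency below the interval
)
print(round(mdn, 2))
```

Changing the 0.50 to another proportion gives other percentile points, provided the interval values are updated to the interval that contains that percentile.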
When dealing with raw (ungrouped) scores, first arrange the scores in rank order. The
median is the centermost score if the number of scores is odd. If the number is even,
the median is taken as the average of the two centermost scores.
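
The rule for raw scores can be sketched in Python as follows. The function name `median_raw` and the two small score sets are hypothetical and chosen only to show the odd and even cases.

```python
def median_raw(scores):
    """Median of ungrouped scores: rank-order them, then take the middle."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the centermost score.
        return ordered[mid]
    # Even number of scores: average of the two centermost scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_raw([5, 2, 7, 4, 9])    # ordered: 2, 4, 5, 7, 9
even_case = median_raw([2, 8, 6, 3])      # ordered: 2, 3, 6, 8
print(odd_case, even_case)
```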

4.1.3 The MODE

The third and last measure of central tendency that we shall discuss is the mode.

■ The mode is defined as the most frequent score in the distribution.

Clearly, this is the easiest of the three measures to determine. The mode is found by
inspection of the scores; there isn’t any calculation necessary. For instance, to find the
mode of the data in Table 3.2 (see below), all we need to do is search the frequency
column. The mode for these data is 76. With grouped scores, the mode is designated as
the midpoint of the interval with the highest frequency. The mode of the grouped scores
in Table 3.4 (see next page) is 77.
*When all the scores in the distribution have the same frequency, it is customary to say
that the distribution has no mode.

Usually, distributions are unimodal; that is, they have only one mode. However, it is
possible for a distribution to have more than one mode. When a distribution has two modes, as
is the case with the scores 1, 2, 3, 3, 3, 3, 4, 5, 7, 7, 7, 7, 8, 9, the distribution is called
bimodal. Histograms of a unimodal and bimodal distribution are shown in Figure 4.2.
Although the mode is the easiest measure of central tendency to determine, it is not
used very much in the behavioral sciences because it is not very stable from sample to
sample and often there is more than one mode for a given set of scores.
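
Finding the mode by inspection amounts to counting how often each score occurs. A minimal Python sketch is given below; the function name `modes` is illustrative, and returning an empty list for the "no mode" case is a design choice made for this sketch.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s); an empty list means 'no mode'."""
    counts = Counter(scores)
    highest = max(counts.values())
    # By convention, if every distinct score has the same frequency
    # the distribution is said to have no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

bimodal = modes([1, 2, 3, 3, 3, 3, 4, 5, 7, 7, 7, 7, 8, 9])
no_mode = modes([1, 2, 3])
print(bimodal, no_mode)
```

The first call uses the bimodal scores from the text and returns both modes, 3 and 7.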

Measures of Central Tendency and Symmetry

If the distribution is unimodal and symmetrical, the mean, median, and mode will all be
equal. An example of this is the bell-shaped curve shown in Figure 4.3. When the
distribution is skewed, the mean and median will not be equal. Since the mean is most
affected by extreme scores, it will have a value closer to the extreme scores than will the
median. Thus, with a negatively skewed distribution, the mean will be lower than the
median. With a positively skewed curve, the mean will be larger than the median. Figure
4.3 shows these relationships.
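
These relationships can be illustrated with three small sets of scores. The data below are hypothetical, made up for this sketch; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores chosen to show each shape.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]        # one extreme high score
negative_skew = [1, 17, 18, 18, 18, 19, 19, 20]  # one extreme low score

for name, scores in [
    ("symmetric", symmetric),
    ("positive skew", positive_skew),
    ("negative skew", negative_skew),
]:
    print(name, statistics.mean(scores), statistics.median(scores))
```

In each skewed set, the single extreme score pulls the mean toward it while the median stays with the bulk of the scores.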

4.2 MEASURES OF VARIABILITY

Previously, we pointed out that variability specifies how far apart the scores are spread.
Whereas measures of central tendency are a quantification of the average value of the
distribution, measures of variability quantify the extent of dispersion. Three measures of
variability are commonly used in the behavioral sciences: the range, the standard
deviation, and the variance.

4.2.1 The Range

We have already used the range when we were constructing frequency distributions of
grouped scores.
■ The range is defined as the difference between the highest and lowest scores in the
distribution. Note that the formula used in this lesson adds 1 to that difference. In
equation form,

Range = Highest score - Lowest score + 1

The range is easy to calculate but gives us only a relatively crude measure of dispersion,
because the range really measures the spread of only the extreme scores and not the
spread of any of the scores in between. Although the range is easy to calculate, we’ve
included some problems for you to practice on. Better to be sure than sorry.
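
A minimal Python sketch of the range as this lesson writes it (highest score minus lowest score, plus 1) is shown below. The function name `lesson_range` is illustrative; the scores are those of the first practice problem.

```python
def lesson_range(scores):
    """Range as given in this lesson: highest score - lowest score + 1."""
    return max(scores) - min(scores) + 1

first_problem = lesson_range([2, 3, 5, 8, 10])
print(first_problem)
```

Because only `max` and `min` enter the calculation, the scores in between have no effect on the result, which is why the range is a crude measure of dispersion.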

Practice Problems:

Calculate the range for the following distributions:

1. 2, 3, 5, 8, 10 Answer: 10 – 2 + 1 = 9

2. 18, 12, 28, 15, 20 Answer: _____

3. 115, 107, 105, 109, 101 Answer: _____

4. 1.2, 1.3, 1.5, 1.8, 2.3 Answer: _____

4.2.2 The Standard Deviation

Before discussing the standard deviation, it is necessary to introduce the concept of a
deviation score.

Deviation scores. So far, we’ve been dealing mainly with raw scores. You will recall that
a raw score is the score as originally measured. For example, if we are interested in IQ
and we measure an IQ of 126, then 126 is a raw score. A deviation score tells how far a
raw score lies from the mean of its distribution; it is the raw score minus the mean.

To understand standard deviations better, we will present how they are determined side
by side with deviation scores. Look at the following table:
The deviation-score column presents the deviation scores, which, as you will note, sum to
zero (0). Something to remember: the sum of all deviation scores will ALWAYS be equal
to zero. This is a “general property of the mean.” But this doesn’t imply that there is NO
deviation. To determine how dispersed the scores really are (i.e., how “far” they are from
the mean), each deviation score is squared (see the column next to the deviation scores).
For this distribution, the sum of squares of all deviation scores (SS) is 40.
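
The deviation method can be traced step by step in Python. The five scores below are hypothetical, chosen for this sketch because they yield a sum of squares of 40; they are not taken from the lesson's table.

```python
scores = [2, 4, 6, 8, 10]           # hypothetical scores

mean = sum(scores) / len(scores)

# Deviation score: raw score minus the mean.
deviations = [x - mean for x in scores]

# Squaring removes the signs so the deviations no longer cancel out.
squared = [d ** 2 for d in deviations]

sum_of_deviations = sum(deviations)   # always zero
ss = sum(squared)                     # sum of squares (SS)

print(deviations, sum_of_deviations, ss)
```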

Calculating the standard deviation would then be easy. Just follow the formula and
procedure you see in the screenshot above. I hope that at this point, you have
scientific calculators that you can use to verify how the values are obtained.

Looking at the table title for Table 4.7 (i.e., Calculation of the standard deviation of sample
scores using the deviation method) shows us that the scores in the distribution came from
a sample (not a population). The following table shows how it is done when the scores
are obtained from a population.
Note: Be mindful of symbols representing the standard deviation pertaining to samples
and populations. As a general rule, sample symbols use letters in the English alphabet,
while population symbols use Greek letters.

Comparing the formulas used for computing standard deviations for samples versus
populations, do you notice a slight difference? (Refer to the screenshots above)

By now you should have observed that the denominators for the two formulas are slightly
different (i.e., N – 1 for sample SDs, and simply N for population SDs). Why is there a
difference?

Technically, the same equation could be used to calculate the standard deviation of sample
scores. However, when we calculate the standard deviation of sample data, we usually
want to use our calculation to estimate the population standard deviation. It can be
shown algebraically that the equation with N in the denominator gives an estimate that on
the average is too small. Dividing by N - 1, instead of N, gives a more accurate estimate
of σ, the population standard deviation.
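
The claim that dividing by N gives an estimate that is too small on the average can be illustrated without algebra. The Python sketch below rests on stated assumptions: a hypothetical five-score population, every possible sample of size 2 drawn with replacement, and a comparison of squared standard deviations (variances), for which the averaging works out exactly.

```python
from itertools import product

population = [2, 4, 6, 8, 10]       # hypothetical population

def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean of the scores."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

# Exact population variance (denominator N, all scores used).
pop_variance = sum_of_squares(population) / len(population)

# All 25 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average sample variance using N versus N - 1 in the denominator.
avg_with_n = sum(sum_of_squares(s) / 2 for s in samples) / len(samples)
avg_with_n_minus_1 = sum(sum_of_squares(s) / 1 for s in samples) / len(samples)

print(pop_variance, avg_with_n, avg_with_n_minus_1)
```

Across all samples, the N version averages to half the population value here, while the N - 1 version averages to the population value itself.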

Just think of it this way. When we deal with population data, ALL scores are considered.
Therefore, the computed standard deviation is the EXACT standard deviation. But when
we deal with sample data, NOT ALL SCORES are used. Therefore, the computed
standard deviation is not expected to be exact. That is why we call it an ESTIMATE. The
sample standard deviation is an estimate of the population standard deviation. And like
any estimate, we want it to be as close to reality as possible. We want it to be reliable.
By using N - 1 as the denominator instead of just N, we arrive at a slightly larger SD
value, which corrects the tendency of the N formula to underestimate the population SD.
This is why the sample SD computed with N - 1 is described as an UNBIASED ESTIMATE of
the population SD.

Okay, now let us get ourselves acquainted with new terms: The sample mean is an
unbiased estimate of the population mean. The sample standard deviation is an unbiased
estimate of the population standard deviation.

So, if the sample mean is an unbiased estimate, what do you call the population mean?
(Answer: TRUE MEAN). And if the sample standard deviation is an unbiased estimate,
what do you call the population SD?

In practice, however, you compute standard deviations directly using calculators. Hence, we
use a different formula (called the COMPUTATIONAL FORMULA), and the process
is called the COMPUTATIONAL PROCEDURE. Be guided by the next screenshot:
Note: In this image labeled “Table 4.9”, the method is called the “raw score method.”
That is the same as the computational procedure. Please compute the values yourselves
using your scientific calculators; do not simply read the materials being given to
you. Learning is best achieved by actively practicing how the process is done.
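
For readers without the screenshot, here is a hedged Python sketch of a commonly taught raw-score formula, SS = ΣX² - (ΣX)²/N. It is an assumption that this is the formula shown in Table 4.9. The scores are hypothetical, and the sketch checks that the raw-score method and the deviation method give the same SS.

```python
import math

scores = [2, 4, 6, 8, 10]           # hypothetical sample scores
n = len(scores)

# Raw-score (computational) method: only sums of X and X squared are needed.
sum_x = sum(scores)
sum_x_squared = sum(x * x for x in scores)
ss_raw = sum_x_squared - sum_x ** 2 / n

# Deviation method, for comparison.
mean = sum_x / n
ss_deviation = sum((x - mean) ** 2 for x in scores)

# Sample standard deviation: N - 1 in the denominator.
sample_sd = math.sqrt(ss_raw / (n - 1))

print(ss_raw, ss_deviation, round(sample_sd, 2))
```

The raw-score form suits calculators because ΣX and ΣX² can be accumulated in a single pass, without computing the mean first.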

Now, do the next Practice Problems.


Make sure you go through the steps yourselves. This will allow you to check if
you can perform the procedure correctly.

The next practice set makes use of scores that have decimal values. Again,
practice using your calculators.
Note: For our purposes, answers should be rounded off to the nearest hundredth.

4.2.3 The Variance

The variance of a set of scores is just the square of the standard deviation. For sample
scores, the variance equals

s² = SS / (N - 1)

For population scores, the variance equals

σ² = SS / N

The variance is not used much in descriptive statistics because it gives us squared units
of measurement. However, it is used quite frequently in inferential statistics.
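
The relationship between the variance and the standard deviation can be confirmed with Python's standard `statistics` module, in which `stdev` and `variance` use N - 1 (sample) while `pstdev` and `pvariance` use N (population). The scores are hypothetical.

```python
import statistics

scores = [2, 4, 6, 8, 10]           # hypothetical scores

# Treating the scores as a sample (denominator N - 1).
sample_sd = statistics.stdev(scores)
sample_variance = statistics.variance(scores)

# Treating the scores as a population (denominator N).
population_sd = statistics.pstdev(scores)
population_variance = statistics.pvariance(scores)

# In both cases the variance is the square of the standard deviation.
print(round(sample_sd ** 2, 2), round(population_sd ** 2, 2))
```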
