
Whenever you collect data, you end up with a group of scores on one or more
variables. If you
take the scores on one variable and arrange them in order from lowest to highest,
what you get
is a distribution of scores. Researchers often want to know about the
characteristics of these
distributions of scores, such as the shape of the distribution, how spread out the
scores are, what
the most common score is, and so on. One set of distribution characteristics that
researchers are
usually interested in is central tendency. This set consists of the mean, median,
and mode.
The mean is probably the most commonly used statistic in all social science
research.
The mean is simply the arithmetic average of a distribution of scores, and
researchers like it
because it provides a single, simple number that gives a rough summary of the
distribution.
It is important to remember that although the mean provides a useful piece of
information,
it does not tell you anything about how spread out the scores are (i.e., variance)
or how many
scores in the distribution are close to the mean. It is possible for a distribution
to have very
few scores at or near the mean.
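
To make this point concrete, here is a minimal sketch using two hypothetical
sets of scores (invented for illustration, not taken from any study). Both have
a mean of 5, but they differ sharply in how many scores sit near that mean. The
function name share_near_mean and the one-point tolerance are my own choices.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only.
clustered = [4, 5, 5, 5, 5, 6]   # most scores at or near the mean
polarized = [1, 1, 1, 9, 9, 9]   # no scores near the mean

def share_near_mean(scores, tolerance=1):
    """Proportion of scores within `tolerance` points of the mean."""
    m = mean(scores)
    return sum(abs(s - m) <= tolerance for s in scores) / len(scores)

# Both distributions have the same mean...
print(mean(clustered), mean(polarized))
# ...but very different proportions of scores close to it.
print(share_near_mean(clustered), share_near_mean(polarized))
```

The mean alone cannot distinguish these two distributions; a measure of spread
is needed for that.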
The median is the score in the distribution that marks the 50th percentile. That
is, 50% of
the scores in the distribution fall above the median and 50% fall below it.
Researchers often use
the median when they want to divide the scores in their distribution into two equal groups
(called a
median split). The median is also a useful statistic to examine when the scores in
a distribution
are skewed or when there are a few extreme scores at the high end or the low end of
the distribution.
This is discussed in more detail in the following pages.
The mode is the least used of the measures of central tendency because it provides
the least
amount of information. The mode simply indicates which score in the distribution
occurs most
often, or has the highest frequency.
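
The three definitions above can be sketched directly in code. This is a minimal
illustration, not a replacement for a statistics package. The scores are
hypothetical, the function names are my own, and averaging the two middle
scores when the number of scores is even is a common convention that I am
assuming here.

```python
from collections import Counter

def mean(scores):
    """Arithmetic average: the sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score of the ordered distribution (the 50th percentile)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of scores: average the two middle scores (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(scores):
    """Score or scores with the highest frequency (ties are all returned)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]  # hypothetical scores
print(mean(scores), median(scores), modes(scores))
```

Returning a list from modes reflects the fact that a distribution can have more
than one most frequent score.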

As you will see in Chapter 4, when scores in a distribution are normally
distributed, the mean,
median, and mode are all at the same point: the center of the distribution. In the
messy world
of social science, however, the scores from a sample on a given variable are often
not normally
distributed. When the scores in a distribution tend to bunch up at one end of the
distribution
and there are a few scores at the other end, the distribution is said to be skewed.
When working
with a skewed distribution, the mean, median, and mode are usually all at different
points.
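
A rough sketch of this contrast, using Python's standard statistics module. Both
sets of scores are hypothetical and invented for illustration. The first is
merely symmetric, standing in for a normal distribution, and the second bunches
up at the high end of a 5-point scale with a few low scores.

```python
from statistics import mean, median, mode

# Hypothetical scores on a 5-point scale, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]

for name, scores in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, round(mean(scores), 2), median(scores), mode(scores))
```

In the symmetric set all three measures land on 3. In the skewed set the few low
scores pull the mean (about 3.83) below the median (4), which in turn is below
the mode (5).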
It is important to note that the procedures used to calculate a mean, median, and
mode are
the same whether you are dealing with a skewed or a normal distribution. All that
changes
is where these three measures of central tendency are in relation to each other.
To illustrate,
I created a fictional distribution of scores based on a sample size of 30. Suppose
that I were to
ask a sample of 30 randomly selected fifth graders whether they think it is
important to do well
in school. Suppose further that I ask them to rate how important they think it is
to do well in
school using a 5-point scale, with 1 = "not at all important" and 5 = "very
important." Because
most fifth graders tend to believe it is very important to do well in school, most
of the scores in
this distribution are at the high end of the scale, with a few scores at the low
end. I have arranged
my fictitious scores in order from smallest to largest and get the following
distribution:

countries are arranged from the longest life expectancy (Japan) to the shortest
(Uganda). As
you can see, there is a gradual decline in life expectancy from Japan through
Turkey, but then
there is a dramatic drop off in life expectancy in Uganda. In this distribution of
nations, Uganda
is an outlier. The average life expectancy for all of the countries except Uganda
is 78.17 years,
whereas the average life expectancy for all 13 countries in Figure 2.2, including
Uganda, drops to
76.21 years. The addition of a single country, Uganda, drops the average life
expectancy for all of
the 13 countries combined by almost 2 full years. Two years may not sound like a
lot, but when
you consider that this is about the same amount that separates the top 5 countries
in Figure 2.2
from each other, you can see that 2 years can make a lot of difference in the
ranking of countries
by the life expectancies of their populations.
The effects of outliers on the mean are more dramatic with smaller samples because
the
mean is a statistic produced by combining all of the members of the distribution
together. With
larger samples, one outlier does not produce a very dramatic effect. But with a
small sample,
one outlier can produce a large change in the mean. To illustrate such an effect, I
examined the
effect of Uganda's life expectancy on the mean for a smaller subset of nations than
appeared in
Figure 2.2. This new analysis is presented in Figure 2.3. Again, we see that the
life expectancy
in Uganda (about 52 years) was much lower than the life expectancy in Japan, the
United
States, and the United Kingdom (all near 80 years). The average life expectancy
across the
three nations besides Uganda was 79.75 years, but this mean fell to 72.99 years
when Uganda
was included. The addition of a single outlier pulled the mean down by nearly 7
years. In this
small dataset, the median would be between the United Kingdom and the United
States, right
around 78.5 years. This example illustrates how an outlier pulls the mean in its
direction. In
this case, the mean was well below the median.
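
The arithmetic behind these two comparisons can be sketched as follows. The
means and sample sizes come from the text. The value of 52.7 years for Uganda is
an assumption: it is the figure implied by the quoted means (for example,
13 x 76.21 - 12 x 78.17 is roughly 52.7), consistent with the "about 52 years"
mentioned above. The function name is my own.

```python
def mean_after_adding(old_mean, n, new_score):
    """Mean of n + 1 scores, given the mean of the first n and one added score."""
    return (old_mean * n + new_score) / (n + 1)

# Assumption: back-calculated from the means quoted in the text, not a reported value.
UGANDA = 52.7

larger_sample = mean_after_adding(78.17, 12, UGANDA)  # 12 countries plus Uganda
smaller_sample = mean_after_adding(79.75, 3, UGANDA)  # 3 countries plus Uganda

print("13 countries:", round(larger_sample, 2), "drop:", round(78.17 - larger_sample, 2))
print(" 4 countries:", round(smaller_sample, 2), "drop:", round(79.75 - smaller_sample, 2))
```

The same outlier shifts the mean by its distance from the old mean divided by
the new sample size, which is why the drop is close to 2 years with 13 countries
but close to 7 years with only 4.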

The range is simply the difference between the largest score (the maximum value)
and the
smallest score (the minimum value) of a distribution. This statistic gives
researchers a quick
sense of how spread out the scores of a distribution are, but it is not a
particularly useful statistic
because it can be quite misleading. For example, in our depression survey described
earlier, we
may have 1 student score a 1 and another score a 20, but the other 98 may all score
10. In this
example, the range will be 19 (20 - 1 = 19), but the scores really are not as
spread out as the range
might suggest. Researchers often take a quick look at the range to see whether all
or most of the
points on a scale, such as a survey, were covered in the sample.
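
The depression survey example can be checked with a few lines of code. The
scores are the ones described above (one score of 1, one score of 20, and 98
scores of 10); the variable names are my own.

```python
# Scores as described in the text: one 1, ninety-eight 10s, and one 20.
scores = [1] + [10] * 98 + [20]

score_range = max(scores) - min(scores)
share_at_10 = scores.count(10) / len(scores)

print("range:", score_range)
print("proportion of scores equal to 10:", share_at_10)
```

The range of 19 is produced entirely by two students, while 98% of the scores
are identical.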
Another common measure
of the spread of scores in a distribution is the interquartile range
(IQR). Unlike the range, which is the difference between the largest and smallest
score in the
distribution, the IQR is the difference between the score that marks the 75th
percentile (the
third quartile) and the score that marks the 25th percentile (the first quartile).
If the scores in
a distribution were arranged in order from largest to smallest and then divided
into four groups of
equal size, the IQR would contain the scores in the two middle quartiles (see
Figure 3.1).
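
A minimal sketch of the IQR calculation, using the quantiles function from
Python's standard statistics module and the depression survey scores described
earlier (one 1, ninety-eight 10s, and one 20). Conventions for locating
quartiles differ slightly across textbooks and software; this sketch assumes the
module's default ("exclusive") method, and the function name iqr is my own.

```python
from statistics import quantiles

def iqr(scores):
    """Interquartile range: 75th percentile minus 25th percentile."""
    # quantiles(..., n=4) returns the three cut points that split the
    # ordered scores into four groups of equal size.
    q1, _q2, q3 = quantiles(scores, n=4)
    return q3 - q1

survey = [1] + [10] * 98 + [20]

print("range:", max(survey) - min(survey))
print("IQR:", iqr(survey))
```

Because the IQR ignores the top and bottom quarters of the distribution, the two
extreme scores that produce a range of 19 have no effect on it here.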
