
Chapter 4: Measures of Variability
• The Importance of Measuring Variability
• IQV (Index of Qualitative Variation)
• The Range
• IQR (Inter-Quartile Range)
• Variance
• Standard Deviation
• Considerations for choosing a measure of
variation
The Importance of
Measuring Variability
• Central tendency - Numbers that describe
what is typical or average (central) in a
distribution
• Measures of Variability - Numbers that
describe diversity or variability in the
distribution.
These two types of measures together help us to sum up a
distribution of scores without looking at each and every
score. Measures of central tendency tell you about typical (or
central) scores. Measures of variation reveal how far from
the typical or central score the distribution tends to vary.
Measures of central tendency use a single number to describe
what is average for or typical of a given distribution.
Measures of central tendency can be very helpful but they tell
only part of the story. When used alone, they may even
mislead rather than inform.
Another way of summarizing a distribution of data is by
selecting a single number that describes how much variation
and diversity there is in the distribution.
Numbers that describe diversity or variation are called
measures of variability.
Researchers often use measures of central tendency along
with measures of variability to describe their data.
Asian-American women
One form of stereotyping is treating a group as if it were totally
characterized by its central value, ignoring the diversity within the group.

The complete story of a particular group, like Asian American women,
can best be told by examining their commonalities and their differences.
Measures of central tendency can be used to document what is common
or average for a group of individuals, and measures of variation are used
to understand the diversity of experiences.

The concept of variability has implications not only for describing the
diversity of social groups such as Asian American women but also for
issues that are important in everyday life.
Grading policies in two statistics courses offered in different
departments are compared:
1. Offered in the sociology department, by Professor Brown.
2. Offered through the school of social work, by Professor
Yamato.

The average grade for Professor Brown’s class has been C+.
The average grade in Professor Yamato’s class is also C+.

Is the grading policy of both instructors about the same?

How are the grades distributed in each of the classes?


Notice that both distributions have the same mean,
yet they are shaped differently.
The grades in Professor Yamato’s class are more spread
out, ranging from A to F.

The grades for Professor Brown’s class are clustered
around the mean and range only from B to C.

Although the means for both distributions are
identical, the grades in Professor Yamato’s class vary
considerably more than the grades given by Professor
Brown.
Index of Qualitative Variation
• IQV – A measure of variability for nominal variables.
It is based on the ratio of the total number of
differences in the distribution to the maximum number
of possible differences within the same distribution.
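$$\mathrm{IQV} = \frac{K\,(100^2 - \sum \mathrm{pct}^2)}{100^2\,(K - 1)}$$
Where
K = the number of categories
Σpct² = the sum of all squared percentages in the distribution
(When raw frequencies are used instead of percentages, replace 100² with
N², the squared total number of cases, and Σpct² with Σf², the sum of all
squared frequencies.)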
* Use the IQV to compare the racial/ethnic diversity of different cities,
regions, or states.
* Use it to determine whether a group has become more racially and
ethnically diverse over time.
The index of qualitative variation (IQV) is a measure of
variability for nominal variables such as race and ethnicity.
The index can vary from 0.00 to 1.00.
When all the cases in the distribution are in one category, there is
no variation (or diversity) and the IQV is 0.00.
In contrast, when the cases in the distribution are distributed
evenly across the categories, there is maximum variation (or
diversity) and the IQV is 1.00.
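The two boundary cases described above can be checked with a minimal sketch of the percentage form of the formula. The function name `iqv` and both example distributions are illustrative assumptions, not data from the chapter's tables.

```python
def iqv(percentages):
    """Index of qualitative variation from category percentages (summing to 100)."""
    k = len(percentages)
    if k < 2:
        raise ValueError("IQV needs at least two categories")
    sum_sq = sum(p ** 2 for p in percentages)
    return k * (100 ** 2 - sum_sq) / (100 ** 2 * (k - 1))

# Hypothetical distributions, not the Maine or Hawaii data.
all_in_one = [100, 0, 0, 0]        # every case in a single category
evenly_spread = [25, 25, 25, 25]   # cases spread evenly over four categories

print(iqv(all_in_one))     # 0.0
print(iqv(evenly_spread))  # 1.0
```

Any other split of the cases across the four categories gives a value between these two extremes.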
Understanding the Index of
Qualitative Variation
• The IQV is a single number that expresses the
diversity of a distribution.
• The IQV ranges from 0 to 1
• An IQV of 0 would indicate that the distribution
has NO diversity at all.
• An IQV of 1 would indicate that the distribution is
maximally diverse.
• Racial diversity in Maine and Hawaii
• Table 4.1
• Table 4.2

In Hawaii, where the IQV is 0.84, there is
considerably more racial/ethnic variation than in
Maine, where the IQV is 0.07.
IQV in Real Life: Diversity in the U.S.
State           IQV    State           IQV    State           IQV
Hawaii          0.82   North Carolina  0.56   Utah            0.30
California      0.72   Alabama         0.54   Indiana         0.30
New Mexico      0.68   Delaware        0.53   Wisconsin       0.28
New York        0.67   Oklahoma        0.48   Nebraska        0.28
Texas           0.66   Colorado        0.46   South Dakota    0.26
Maryland        0.65   Connecticut     0.46   Minnesota       0.26
Georgia         0.63   Arkansas        0.43   Idaho           0.25
Mississippi     0.62   Michigan        0.43   Kentucky        0.23
Louisiana       0.62   Tennessee       0.42   Wyoming         0.23
New Jersey      0.62   Washington      0.41   Montana         0.22
Florida         0.60   Massachusetts   0.38   North Dakota    0.18
South Carolina  0.59   Rhode Island    0.38   Iowa            0.17
Illinois        0.58   Kansas          0.35   West Virginia   0.12
Nevada          0.58   Pennsylvania    0.35   New Hampshire   0.11
Arizona         0.58   Missouri        0.34   Vermont         0.07
Alaska          0.58   Ohio            0.34   Maine           0.07
Virginia        0.56   Oregon          0.33
• IQV can be expressed as a percentage

– Multiply IQV by 100


Expressed as a percentage, the IQV would reflect the
percentage of racial/ethnic differences relative to the
maximum possible differences in each distribution.

• Thus, an IQV of 0.07 indicates that the number of
racial/ethnic differences in Maine is 7% (0.07 × 100) of
the maximum possible differences.
• For Hawaii, an IQV of 0.84 means that the number of
racial/ethnic differences is 84% (0.84 × 100) of the
maximum possible differences.
The Range

Range = highest score - lowest score

• Range – A measure of variation in interval-ratio
variables. It is the difference between the
highest (maximum) and the lowest (minimum)
scores in the distribution.
• Table 4.4
• Percentage change in the elderly population
Range: Alaska has the highest percentage
change, with 50%, and Washington, D.C.,
has the lowest change, with −14.1%.

The range is 64.1 percentage points:
50% − (−14.1%) = 64.1.
• Simple and quick to calculate, yet crude
because it is based on only the lowest and
highest scores.
• These two scores might be extreme and
rather atypical, making range a misleading
indicator of variation in the distribution.
Among the 50 states and Washington, D.C., no
state has a percentage decrease like that of
Washington, D.C., and only Nevada has a
percentage increase nearly as high as Alaska’s.
The range of 64.1 percentage points does not
give us information about the variation among
the states that fall between Washington, D.C.,
and Alaska.
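A short sketch of why the range can mislead: two lists that share the same extremes have the same range, whatever happens in between. Only the extremes (50 and −14.1) come from the text; the function name `value_range` and all middle values are invented for illustration.

```python
def value_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

# Only the extremes (50 and -14.1) come from the text; the middle values
# in both lists are made up for illustration.
clustered = [-14.1, 9.0, 10.0, 10.5, 11.0, 50.0]
spread_out = [-14.1, 0.0, 12.0, 25.0, 38.0, 50.0]

print(round(value_range(clustered), 1))   # 64.1
print(round(value_range(spread_out), 1))  # 64.1
```

Both lists report 64.1 percentage points even though the scores between the extremes behave very differently.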
Inter-Quartile Range
• Inter-Quartile Range (IQR) – A measure of
variation for interval-ratio data. It indicates the
width of the middle 50 percent of the distribution
and is defined as the difference between the
lower and upper quartiles (Q1 and Q3).

• IQR = Q3 – Q1
• The IQR defines variation for the middle
50% of the cases.
• Like the range, it is based on only two scores.
• Because it is based on intermediate scores
rather than extreme scores, it avoids some of
the instability associated with the range.

• Table 4.5
• Compare range and IQR
The difference between Range and IQR

[Figure: two distributions with equal ranges. In one, the values fall
closely together; the other shows greater variability. The figure
illustrates the importance of the IQR, since the ranges are equal.]
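The comparison between the range and the IQR can be sketched with two hypothetical lists of scores that share a minimum and a maximum. The helper name `iqr` is an assumption of this sketch, and it uses the "inclusive" quartile convention of Python's `statistics.quantiles`; quartiles computed by hand with a different convention may differ slightly.

```python
import statistics

def iqr(scores):
    """IQR = Q3 - Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1

# Hypothetical scores: same minimum (10) and maximum (90) in both lists.
clustered = [10, 48, 49, 50, 50, 50, 51, 52, 90]
spread_out = [10, 20, 30, 40, 50, 60, 70, 80, 90]

for name, scores in [("clustered", clustered), ("spread out", spread_out)]:
    print(name, "range:", max(scores) - min(scores), "IQR:", iqr(scores))
```

Both lists have a range of 80, but the IQR is 2 for the clustered list and 40 for the spread-out one, because the IQR ignores the two extreme scores.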
The Box Plot
• The Box Plot is a graphic device that visually presents the following elements: the range, the
IQR, the median, the quartiles, the minimum (lowest value), and the maximum (highest value).

[Box plot diagram, labeled from top to bottom: Maximum, Q3, Median, Q1,
Minimum. The range spans the minimum to the maximum; the IQR spans Q1 to Q3.]
• The box plot provides us with a way to visually
examine the center, the variation, and the
shape of distributions of interval-ratio
variables.

• Box plots are useful for comparing
distributions.
• Figure 4.4
• Box plot of the distribution of the 2008–2015
projected percentage increase in the
elderly population
1. The center of the distribution is easily identified
by the solid line inside the box.
2. Since the box is drawn between the lower and
upper quartiles, the IQR is reflected in the height of
the box.
3. Similarly, the length of the vertical lines drawn
outside the box (on both ends) represents the range
of the distribution.
4. Both the IQR and the range give us a visual
impression of the spread in the distribution.
5. The relative position of the box and the position of
the median within the box tell us whether the
distribution is symmetrical or skewed.
– A perfectly symmetrical distribution would have the box
at the center of the range as well as the median in the
center of the box.
– When the distribution departs from symmetry, the box
and/or the median will not be centered.
– The median will be closer to the lower quartile when there
are more cases with lower scores, or to the upper quartile
when there are more cases with higher scores.
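The numbers a box plot draws can be computed directly. The sketch below returns the minimum, Q1, median, Q3, and maximum, and reports where the median sits inside the box. The function names and both lists of scores are illustrative assumptions, and the quartiles follow the "inclusive" convention of `statistics.quantiles`.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)

def median_position(scores):
    """Say where the median sits inside the box (between Q1 and Q3)."""
    _lo, q1, median, q3, _hi = five_number_summary(scores)
    lower_part, upper_part = median - q1, q3 - median
    if lower_part == upper_part:
        return "centered"
    return "closer to Q1" if lower_part < upper_part else "closer to Q3"

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # hypothetical scores
low_heavy = [1, 1, 2, 2, 3, 5, 8, 13, 21]  # many low scores, a few high ones

print(five_number_summary(symmetric), median_position(symmetric))
print(five_number_summary(low_heavy), median_position(low_heavy))
```

For the symmetric list the median is centered in the box; for the list dominated by low scores it sits closer to Q1.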
• Figure 4.5
• Compare two box plots
Variance
• Variance – A measure of variation for
interval-ratio variables; it is the average of
the squared deviations from the mean
$$s_Y^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$$
• Table 4.6

• Regional variation in the elderly population.


• Do the figures cluster around the mean, or are
they dispersed?
• Are they close to the average, or do they deviate
from it?
• How much, on average, does each score in the
distribution deviate from a central point (the mean)?

• The mean is the reference point because it is based on all
the scores in the distribution.

• The variance and the standard deviation are two
closely related measures of variation that increase or
decrease based on how closely the scores cluster
around the mean.
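The variance calculation can be sketched step by step: find the mean, square each deviation from it, sum the squares, and divide by N − 1. The function name `sample_variance` and the scores are illustrative assumptions, not figures from Table 4.6.

```python
import statistics

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(y - mean) ** 2 for y in scores]
    return sum(squared_deviations) / (n - 1)

scores = [1, 2, 3, 4, 5]  # hypothetical scores with a mean of 3
print(sample_variance(scores))      # 2.5
print(statistics.variance(scores))  # 2.5, standard-library cross-check
```

Here the squared deviations are 4, 1, 0, 1, and 4, which sum to 10; dividing by N − 1 = 4 gives 2.5.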
• Figure 4.6 Deviations from the mean

• Table 4.7

• Table 4.8
Standard Deviation
• Standard Deviation – A measure of variation for
interval-ratio variables; it is equal to the square
root of the variance.

$$s_Y = \sqrt{s_Y^2} = \sqrt{\frac{\sum (Y - \bar{Y})^2}{N - 1}}$$
One problem with the variance is that it is based on
squared deviations and therefore is no longer expressed in
the original units of measurement. Thus, we often take
the square root of the variance (standard deviation) and
interpret it instead.

The advantage of the standard deviation is that unlike the
variance, it is measured in the same units as the original
data.

The actual number tells us very little by itself, but it
allows us to evaluate the dispersion of the scores around
the mean.
In a distribution where all the scores are
identical, the standard deviation is zero (0).

Zero is the lowest possible value for the
standard deviation; in a distribution of identical
scores, every point is the same and equals the
mean, mode, and median.

There is no variation or dispersion in the
scores.
The more the standard deviation departs from
zero, the more variation there is in the
distribution.

There is no upper limit to the value of the
standard deviation.

The standard deviation can be considered a
standard against which we can evaluate the
positioning of scores relative to the mean and to
other scores in the distribution.
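These properties of the standard deviation can be illustrated with three hypothetical lists of ages that share a mean of 40 years. The function name `sample_sd` and the data are assumptions of this sketch.

```python
import math
import statistics

def sample_sd(scores):
    """Square root of the sample variance (N - 1 in the denominator)."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((y - mean) ** 2 for y in scores) / (n - 1)
    return math.sqrt(variance)

# Hypothetical ages in years; all three lists have a mean of 40.
identical = [40, 40, 40, 40, 40]
clustered = [38, 39, 40, 41, 42]
spread_out = [20, 30, 40, 50, 60]

for scores in (identical, clustered, spread_out):
    print(round(sample_sd(scores), 2), "years")
```

The identical scores give a standard deviation of 0, and the value grows as the scores move farther from the mean. Because the square root undoes the squaring, the result is in years, the unit of the original data.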
In many distributions, particularly those that are
roughly bell-shaped (normal) rather than highly
skewed, about 34% of all scores fall between
the mean and 1 standard deviation above the
mean. Another 34% of scores fall between the
mean and 1 standard deviation below it.

Thus, we would expect the majority of scores
(68%) to fall within 1 standard deviation of
the mean.
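The 68% figure can be checked by simulation. The sketch below draws simulated scores from a bell-shaped (normal) distribution and, for contrast, from a strongly right-skewed one, then counts the share of scores within 1 standard deviation of the mean. The function name, the seed, the sample size, and the distribution parameters are all arbitrary choices for illustration.

```python
import random
import statistics

def share_within_one_sd(scores):
    """Proportion of scores within 1 standard deviation of the mean."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    inside = [y for y in scores if mean - sd <= y <= mean + sd]
    return len(inside) / len(scores)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated, roughly bell-shaped scores (mean 100 and SD 15 are arbitrary).
bell_shaped = [rng.gauss(100, 15) for _ in range(20_000)]
# Simulated, strongly right-skewed scores for contrast.
skewed = [rng.expovariate(1.0) for _ in range(20_000)]

print(round(share_within_one_sd(bell_shaped), 2))
print(round(share_within_one_sd(skewed), 2))
```

The bell-shaped sample should come out close to 0.68, while the skewed sample gives a noticeably larger share, which is why the rule is stated with a caveat about skew.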
• Comparing two distributions. Age
characteristics of female clerical and
technical employees.
• Table 4.10
• Figure 4.7
Find the Mean and the
Standard Deviation
Considerations for Choosing a
Measure of Variability
• For nominal variables, you can only use the IQV (Index of
Qualitative Variation).
• For ordinal variables, you can calculate the IQV or the
IQR (Inter-Quartile Range), though the IQR provides
more information about the variable. In most instances,
social science researchers treat ordinal variables as
interval-ratio measures, preferring to calculate the variance
and standard deviation.
• For interval-ratio variables, you can use the IQV, IQR, or
variance/standard deviation. The standard deviation (like the
variance) provides the most information, since it uses all
of the values in the distribution in its calculation.
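The guidelines above amount to a lookup from level of measurement to the measures that can be used. A minimal sketch, in which the dictionary, the key spellings, and the function name are this example's own choices:

```python
# Lookup table restating the guidelines above; the key names are this sketch's own.
MEASURES_BY_LEVEL = {
    "nominal": ["IQV"],
    "ordinal": ["IQV", "IQR"],
    "interval-ratio": ["IQV", "IQR", "variance / standard deviation"],
}

def measures_for(level):
    """Return the measures of variability usable at a level of measurement."""
    try:
        return MEASURES_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"unknown level of measurement: {level!r}") from None

print(measures_for("ordinal"))  # ['IQV', 'IQR']
```

Within each list, later entries use more of the information in the data, which matches the advice to prefer the standard deviation for interval-ratio variables.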
How to Choose a Measure of Variation
