
In statistics and mathematics, the range is the difference between the maximum
and minimum values of a data set. It measures the spread, one of the two
important features of a data set. The formula for the range is the maximum value
minus the minimum value in the data set, which gives statisticians a quick sense
of how varied the data set is.

Two important features of a data set are the center of the data and the spread
of the data. The center can be measured in a number of ways, the most popular
of which are the mean, median, mode, and midrange. In a similar fashion, there
are different ways to calculate how spread out the data set is, and the easiest
and crudest measure of spread is the range.

The calculation of the range is very straightforward. All we need to do is find the
difference between the largest data value in our set and the smallest data value.
Stated succinctly, we have the following formula: Range = Maximum Value -
Minimum Value. For example, the data set 4, 6, 10, 15, 18 has a maximum of 18, a
minimum of 4, and a range of 18 - 4 = 14.
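The formula can be sketched in a few lines of Python. The function name `data_range` is a hypothetical choice for this illustration, not something defined in the text.

```python
def data_range(values):
    """Return the range: maximum value minus minimum value."""
    values = list(values)
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


data = [4, 6, 10, 15, 18]
print(data_range(data))  # 18 - 4 = 14
```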
Limitations of Range
The range is a very crude measurement of the spread of data because it is
extremely sensitive to outliers. A single data value can greatly affect the value
of the range, which limits how useful the range of a data set is to statisticians.

For example, consider the set of data 1, 2, 3, 4, 6, 7, 7, 8. The maximum value is 8,
the minimum is 1, and the range is 7. Then consider the same set of data, only
with the value 100 included. The range now becomes 100 - 1 = 99: the addition of
a single extra data point greatly affected the value of the range. The standard
deviation is another measure of spread that is less susceptible to outliers than
the range, although it is still affected by them, and its drawback is that its
calculation is much more complicated.
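A short Python sketch of this comparison, using the data set from the example. The helper name `data_range` is an assumption made for the illustration.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


original = [1, 2, 3, 4, 6, 7, 7, 8]
with_outlier = original + [100]  # the same data plus one extreme value

print(data_range(original))      # 8 - 1 = 7
print(data_range(with_outlier))  # 100 - 1 = 99
```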

The range also tells us nothing about the internal features of our data set. For
example, consider the data set 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10, where the range
is 10 - 1 = 9. Compare this to the data set 1, 1, 1, 2, 9, 9, 9, 10. Here the range
is, yet again, nine. However, in this second set, unlike the first, the data is
clustered around the minimum and maximum. Other statistics, such as the first
and third quartiles, would need to be used to detect some of this internal
structure.

Applications of Range
The range is a good way to get a very basic understanding of how spread out the
numbers in a data set are, because it is easy to calculate: it requires only a
single subtraction. There are also a few other applications of the range of a
data set in statistics.

The range can also be used to estimate another measure of spread, the standard
deviation. Rather than go through a fairly complicated formula to find the
standard deviation, we can instead use what is called the range rule. The range is
fundamental in this calculation.
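The text does not spell out the range rule. The sketch below assumes its commonly quoted form, in which the standard deviation is estimated as roughly one quarter of the range. The function name `range_rule_estimate` and the `divisor` parameter are hypothetical, and the result should be read as a rough estimate only.

```python
import statistics


def range_rule_estimate(values, divisor=4):
    """Rough standard deviation estimate: range / divisor (assumed form of the rule)."""
    return (max(values) - min(values)) / divisor


data = [4, 6, 10, 15, 18]
print(range_rule_estimate(data))  # 14 / 4 = 3.5
print(statistics.stdev(data))     # sample standard deviation, for comparison
```

For a data set this small the estimate and the computed standard deviation can differ noticeably, which is why the rule is best treated as a quick check rather than a replacement for the full calculation.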

The range also occurs in a boxplot, or box-and-whiskers plot. In a basic boxplot
that does not mark outliers separately, the maximum and minimum values are
graphed at the ends of the whiskers, and the total length of the whiskers and
box is equal to the range.

The interquartile range (IQR) is the difference between the first quartile and
third quartile. The formula for this is:

IQR = Q3 - Q1

There are many measurements of the variability of a set of data. Both
the range and standard deviation tell us how spread out our data is. The problem
with these descriptive statistics is that they are quite sensitive to outliers. A
measurement of the spread of a dataset that is more resistant to the presence of
outliers is the interquartile range.

Definition of Interquartile Range


As seen above, the interquartile range is built upon the calculation of other
statistics. Before determining the interquartile range, we first need to know the
values of the first quartile and third quartile. (Of course, the first and third
quartiles depend upon the value of the median).

Once we have determined the values of the first and third quartiles, the
interquartile range is very easy to calculate. All that we have to do is to subtract
the first quartile from the third quartile. This explains the use of the term
interquartile range for this statistic.

Example
To see an example of the calculation of an interquartile range, we will consider
the set of data: 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9. The five number summary for this
set of data is:

• Minimum of 2
• First quartile of 3.5
• Median of 6
• Third quartile of 8
• Maximum of 9

Thus we see that the interquartile range is 8 – 3.5 = 4.5.
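The five-number summary above can be reproduced with a short Python sketch. It assumes one particular quartile convention: the first and third quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when the number of values is odd. Other conventions exist and can give slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; with an odd number of values the overall
    median is excluded from both halves.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


data = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(data)
print(minimum, q1, median, q3, maximum)
print("IQR =", q3 - q1)  # 8 - 3.5 = 4.5
```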

The Significance of the Interquartile Range


The range gives us a measurement of how spread out the entirety of our data set
is. The interquartile range, which tells us how far apart the first and third
quartile are, indicates how spread out the middle 50% of our set of data is.

Resistance to Outliers
The primary advantage of using the interquartile range rather than the range for
the measurement of the spread of a data set is that the interquartile range is not
sensitive to outliers. To see this, we will look at an example.

From the set of data above we have an interquartile range of 4.5, a range of 9 - 2
= 7, and a sample standard deviation of 2.34. If we replace the highest value of 9
with an extreme outlier of 100, then the standard deviation becomes 27.37 and the
range is 98. Even though we have quite drastic shifts of these values, the first and
third quartiles are unaffected and thus the interquartile range does not change.
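The comparison can be checked with a small Python sketch. The helper `iqr` is a hypothetical name, and it assumes that quartiles are taken as the medians of the lower and upper halves of the sorted data. The standard deviation here is the sample standard deviation from the standard library's `statistics.stdev`.

```python
import statistics


def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[len(data) - half:])
    return q3 - q1


original = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9]
with_outlier = original[:-1] + [100]  # replace the maximum, 9, with 100

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label,
          "| range:", max(data) - min(data),
          "| stdev:", round(statistics.stdev(data), 2),
          "| IQR:", iqr(data))
```

Only the range and the standard deviation move when the maximum is replaced; the interquartile range is computed from values near the middle of the sorted list, so it stays the same.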

Use of the Interquartile Range


Besides being a less sensitive measure of the spread of a data set, the interquartile
range has another important use. Due to its resistance to outliers, the
interquartile range is useful in identifying when a value is an outlier.

The interquartile range rule is what informs us whether we have a mild or strong
outlier. To look for an outlier, we must look below the first quartile or above the
third quartile. How far we should go depends upon the value of the interquartile
range.

The interquartile range rule is useful in detecting the presence of
outliers. Outliers are individual values that fall outside of the overall pattern of a
data set. This definition is somewhat vague and subjective, so it is helpful to have
a rule to apply when determining whether a data point is truly an outlier. This is
where the interquartile range rule comes in.

What Is the Interquartile Range?


Any set of data can be described by its five-number summary. These five
numbers, which give you the information you need to find patterns and outliers,
consist of (in ascending order):

• The minimum or lowest value of the data set
• The first quartile Q1, which represents a quarter of the way through the list
  of all data
• The median of the data set, which represents the midpoint of the whole list
  of data
• The third quartile Q3, which represents three-quarters of the way through
  the list of all data
• The maximum or highest value of the data set

These five numbers tell a person more about their data than looking at the
numbers all at once could, or at least make this much easier. For example,
the range, which is the minimum subtracted from the maximum, is one indicator
of how spread out the data is in a set. (Note that the range is highly sensitive to
outliers: if an outlier is also the minimum or maximum, the range will not be an
accurate representation of the breadth of a data set.)

Without the five-number summary, the range would be harder to pick out. Similar to the range but less
sensitive to outliers is the interquartile range. The interquartile range is
calculated in much the same way as the range. All you do to find it is subtract the
first quartile from the third quartile:

IQR = Q3 – Q1.

The interquartile range shows how the data is spread about the median. It is less
susceptible than the range to outliers and can, therefore, be more helpful.

Using the Interquartile Rule to Find Outliers


Though it's not often affected much by them, the interquartile range can be used
to detect outliers. This is done using these steps:

1. Calculate the interquartile range for the data.
2. Multiply the interquartile range (IQR) by 1.5 (a constant used to discern
   outliers).
3. Add 1.5 x (IQR) to the third quartile. Any number greater than this is a
   suspected outlier.
4. Subtract 1.5 x (IQR) from the first quartile. Any number less than this is a
   suspected outlier.

Remember that the interquartile rule is only a rule of thumb that generally holds
but does not apply to every case. In general, you should always follow up your
outlier analysis by studying the resulting outliers to see if they make sense. Any
potential outlier obtained by the interquartile method should be examined in the
context of the entire set of data.
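The four steps can be sketched in Python as follows. The function names `iqr_fences` and `suspected_outliers` and the parameter `k` are hypothetical, and the quartiles are passed in rather than computed, so the sketch does not depend on any particular quartile convention. The illustrative data is the earlier twelve-value set with its maximum replaced by 100, for which the text gives quartiles of 3.5 and 8.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    step = k * (q3 - q1)         # steps 1 and 2: IQR, then 1.5 x IQR
    return q1 - step, q3 + step  # step 4 (lower cutoff) and step 3 (upper cutoff)


def suspected_outliers(values, q1, q3, k=1.5):
    """Values strictly below the lower cutoff or strictly above the upper one."""
    lower, upper = iqr_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


data = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 100]
print(iqr_fences(3.5, 8))                # (-3.25, 14.75)
print(suspected_outliers(data, 3.5, 8))  # [100]
```

The function only flags candidates. Whether a flagged value is really an error or an unusual but valid observation still has to be judged against the rest of the data.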

Interquartile Rule Example Problem


See the interquartile range rule at work with an example. Suppose you have the
following set of data: 1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17. The five-number summary for
this data set is minimum = 1, first quartile = 4, median = 7, third quartile = 10
and maximum = 17. You may look at the data and automatically say that 17 is an
outlier, but what does the interquartile range rule say?

If you were to calculate the interquartile range for this data, you would find it to
be:

Q3 – Q1 = 10 – 4 = 6

Now multiply your answer by 1.5 to get 1.5 x 6 = 9. Nine less than the first
quartile is 4 - 9 = -5. No data value is less than this. Nine more than the third
quartile is 10 + 9 = 19. No data value is greater than this. Despite the maximum
value being five more than the nearest data point, the interquartile range rule
shows that it should probably not be considered an outlier for this data set.
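This worked example can be checked in Python. The sketch uses `statistics.quantiles` from the standard library (Python 3.8 or later); with its default "exclusive" method it gives the same quartiles as the text for this data set, though for other data sets it may differ from a hand calculation that uses another quartile convention.

```python
import statistics

data = [1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17]

# Cut points that split the sorted data into four equal-probability groups.
q1, median, q3 = statistics.quantiles(data, n=4)

step = 1.5 * (q3 - q1)               # 1.5 x IQR
lower, upper = q1 - step, q3 + step  # cutoffs below Q1 and above Q3

flagged = [x for x in data if x < lower or x > upper]
print(q1, median, q3)  # 4.0 7.0 10.0
print(lower, upper)    # -5.0 19.0
print(flagged)         # []
```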
