The range is the difference between the maximum and minimum values of a data
set, and it measures spread, one of two important features of a data set. The
formula for the range is the maximum value minus the minimum value in the data
set, which gives statisticians a better understanding of how varied the data
set is.
Two important features of a data set are the center of the data and the spread
of the data. The center can be measured in a number of ways, the most popular
of which are the mean, median, mode, and midrange. In a similar fashion, there
are different ways to calculate how spread out the data set is, and the easiest
and crudest measure of spread is called the range.
The calculation of the range is very straightforward. All we need to do is find the
difference between the largest data value in our set and the smallest data value.
Stated succinctly, we have the following formula: Range = Maximum Value -
Minimum Value. For example, the data set 4, 6, 10, 15, 18 has a maximum of 18,
a minimum of 4, and a range of 18 - 4 = 14.
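The formula translates directly into code. The following is a minimal Python
sketch using the data set from the example above; the variable names are
illustrative only.

```python
# Data set from the worked example in the text.
data = [4, 6, 10, 15, 18]

# Range = maximum value - minimum value
data_range = max(data) - min(data)

print(data_range)  # 14
```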
Limitations of Range
The range is a very crude measurement of the spread of data because it is
extremely sensitive to outliers. A single data value can greatly affect the
value of the range, which limits how useful the range of a data set is to
statisticians.
The range also tells us nothing about the internal features of our data set. For
example, consider the data set 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10, where the
range is 10 - 1 = 9. Compare this to the data set 1, 1, 1, 2, 9, 9, 9, 10. Here
the range is, yet again, nine. However, in this second set, unlike the first,
the data is clustered around the minimum and maximum. Other statistics, such as
the first and third quartiles, would need to be used to detect some of this
internal structure.
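To make the comparison concrete, here is a rough Python sketch that computes
the range and the quartiles for both data sets. The helper `quartiles` is a
hypothetical name, and it assumes one particular convention: the first and
third quartiles are the medians of the lower and upper halves of the sorted
data. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention (an assumption)."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, leave the overall median out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

spread_out = [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10]
clustered = [1, 1, 1, 2, 9, 9, 9, 10]

for name, data in (("spread_out", spread_out), ("clustered", clustered)):
    q1, _, q3 = quartiles(data)
    print(name, "range:", max(data) - min(data), "Q1:", q1, "Q3:", q3)
```

Both sets have a range of 9, but under this convention the quartiles of the
first set are 2.5 and 7.5, while those of the second sit at 1 and 9, right next
to the extremes.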
Applications of Range
The range is a good way to get a very basic understanding of how spread out the
numbers in a data set really are, because it is easy to calculate: it requires
only a single subtraction. There are also a few other applications of the range
of a data set in statistics.
The range can also be used to estimate another measure of spread, the standard
deviation. Rather than go through a fairly complicated formula to find the
standard deviation, we can instead use what is called the range rule. The range is
fundamental in this calculation.
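As a rough illustration, the sketch below assumes the commonly taught form of
the range rule, in which the standard deviation is estimated as about one
quarter of the range. The divisor of 4 and the variable names are assumptions
made for this example, and the estimate is only a ballpark figure, especially
for small data sets.

```python
from statistics import stdev

# Data set from the earlier range example.
data = [4, 6, 10, 15, 18]

# Range rule (assumed form): standard deviation is roughly range / 4.
estimate = (max(data) - min(data)) / 4

# Sample standard deviation, for comparison.
actual = stdev(data)

print("range rule estimate:", estimate)
print("sample standard deviation:", round(actual, 2))
```

For a set this small the two values differ noticeably, which is a reminder that
the rule is a quick check rather than a substitute for the full calculation.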
The range also occurs in a boxplot, or box and whiskers plot. The maximum and
minimum values are both graphed at the ends of the whiskers of the graph, and
the total length of the whiskers and box is equal to the range.
The interquartile range (IQR) is the difference between the first quartile and
third quartile. The formula for this is:
IQR = Q3 - Q1
Once we have determined the values of the first and third quartiles, the
interquartile range is very easy to calculate. All that we have to do is to subtract
the first quartile from the third quartile. This explains the use of the term
interquartile range for this statistic.
Example
To see an example of the calculation of an interquartile range, we will consider
the set of data: 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9. The five number summary for this
set of data is:
Minimum of 2
First quartile of 3.5
Median of 6
Third quartile of 8
Maximum of 9
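The five-number summary above can be reproduced with a short Python sketch. The
function name `five_number_summary` is hypothetical, and the quartiles are
assumed to be the medians of the lower and upper halves of the sorted data,
which is the convention that matches the values listed here.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, leave the overall median out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9]
minimum, q1, med, q3, maximum = five_number_summary(data)

# Interquartile range: IQR = Q3 - Q1
iqr = q3 - q1

print(minimum, q1, med, q3, maximum)
print("IQR:", iqr)
```

With these quartiles, the interquartile range works out to 8 - 3.5 = 4.5.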
Resistance to Outliers
The primary advantage of using the interquartile range rather than the range for
the measurement of the spread of a data set is that the interquartile range is not
sensitive to outliers. To see this, we will look at an example.
For the set of data above, the interquartile range is 8 - 3.5 = 4.5, the range
is 9 - 2 = 7, and the sample standard deviation is about 2.34. If we replace the
highest value of 9 with an extreme outlier of 100, then the standard deviation
becomes 27.37 and the range is 98. Even though these values shift drastically,
the first and third quartiles are unaffected, and thus the interquartile range
does not change.
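The effect of the outlier can be checked numerically. This is a minimal Python
sketch; `iqr` and `spread_summary` are hypothetical helper names, the quartiles
are assumed to be medians of the lower and upper halves, and the standard
deviation is the sample standard deviation.

```python
from statistics import median, stdev

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the two halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, leave the overall median out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(upper) - median(lower)

def spread_summary(values):
    """Collect three measures of spread for side-by-side comparison."""
    return {
        "range": max(values) - min(values),
        "stdev": round(stdev(values), 2),  # sample standard deviation
        "iqr": iqr(values),
    }

original = [2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 8, 9]
with_outlier = original[:-1] + [100]  # replace the maximum, 9, with 100

print(spread_summary(original))
print(spread_summary(with_outlier))
```

The range goes from 7 to 98 and the standard deviation from about 2.34 to about
27.37, while the interquartile range stays at 4.5 in both cases.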
These five numbers tell a person more about their data than looking at the
numbers all at once could, or at least make this much easier. For example,
the range, which is the minimum subtracted from the maximum, is one indicator
of how spread out the data in a set is. (Note: the range is highly sensitive to
outliers. If an outlier is also the minimum or maximum, the range will not be
an accurate representation of the breadth of a data set.)
The range would be difficult to determine otherwise. Similar to the range but less
sensitive to outliers is the interquartile range. The interquartile range is
calculated in much the same way as the range. All you do to find it is subtract the
first quartile from the third quartile:
IQR = Q3 – Q1.
The interquartile range shows how the data is spread about the median. It is less
susceptible than the range to outliers and can, therefore, be more helpful.
If you were to calculate the interquartile range for this data, you would find it to
be:
Q3 – Q1 = 10 – 4 = 6
Now multiply your answer by 1.5 to get 1.5 x 6 = 9. Nine less than the first
quartile is 4 – 9 = -5. No data is less than this. Nine more than the third quartile
is 10 + 9 = 19. No data is greater than this. Despite the maximum value being five
more than the nearest data point, the interquartile range rule shows that it
should probably not be considered an outlier for this data set.
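The arithmetic of the interquartile range rule can be sketched in Python as
follows. The function name `iqr_fences` and its parameter `k` are hypothetical;
the quartile values 4 and 10 are the ones quoted in the calculation above.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the interquartile range rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the text: Q1 = 4, Q3 = 10.
lower, upper = iqr_fences(4, 10)

print(lower, upper)  # -5.0 19.0
```

Any data value below the lower cutoff or above the upper cutoff would be
flagged as a possible outlier. Here no value falls outside -5 and 19, so
nothing is flagged.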