On
Submitted To
Professor Dr Rafiquzzaman
Army Institute of Business Administration,Savar,Dhaka.
Submitted By
Murshid Iqbal M9200B012
1. Definition
A box and whisker plot, also called a box plot, displays the five-number
summary of a set of data. The five-number summary consists of the minimum (lower
extreme), the first (lower) quartile, the median, the third (upper) quartile and the
maximum (upper extreme). In a box plot, a box is drawn from the first quartile to
the third quartile, so the box spans the interquartile range. The median is marked
by a line across the box. The whiskers are the two lines outside the box that extend
to the lowest and highest observations.
a. A box and whisker plot is a way of summarizing a set of data measured on an interval
scale. It is often used in exploratory data analysis.
b. It is a graph that presents information from a five-number summary.
c. It does not show a distribution in as much detail as a stem and leaf plot or histogram
does.
d. It is also very useful when large numbers of observations are involved and when two
or more data sets are being compared.
e. It is ideal for comparing distributions because the centre, spread and overall range are
immediately apparent.
f. This type of graph is used to show the shape of the distribution, its central value and
its variability.
g. It can be drawn either vertically or horizontally.
h. A box plot provides high-level information at a glance about the symmetry, skew,
variance and outliers of a data set.
i. With a box plot, we lose the ability to observe the detailed shape of the
distribution, such as oddities in its modality (the number of ‘humps’ or peaks)
and skew.
j. Box plots summarize data compactly, and groups are easy to compare through the
positions of the box and whisker markings.
k. The horizontal orientation can be a useful format when there are a lot of groups to plot
or if those group names are long. A vertical orientation can be a more natural format
when the grouping variable is based on units of time.
l. There are several ways, both standard and alternative, of defining the maximum
length of the whiskers that extend from the ends of the box.
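One widely used convention for whisker length, which the text above does not name and is assumed here, limits each whisker to 1.5 times the interquartile range beyond the box and plots anything further out as an outlier. The sketch below illustrates that convention. The function name `whisker_limits`, the multiplier `k` and the sample values are hypothetical and chosen only for illustration.

```python
def whisker_limits(sorted_data, q1, q3, k=1.5):
    """Return (lower whisker, upper whisker, outliers) under the k * IQR convention.

    Assumes sorted_data is in ascending order and that q1 and q3 were
    computed beforehand by whatever quartile method is in use.
    """
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [x for x in sorted_data if low_fence <= x <= high_fence]
    outliers = [x for x in sorted_data if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers


# Made-up values: Q1 = 4 and Q3 = 8, so the fences are -2 and 14.
example = [2, 4, 5, 6, 7, 8, 30]
print(whisker_limits(example, q1=4, q3=8))  # (2, 8, [30])
```

With `k` set large enough that no value falls outside the fences, the whiskers reach the minimum and maximum, which matches the definition given at the start of this section.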
“Data Presentation using Whisker and Box Plot”
Angela and Carl work at a computer store. They recorded the number of sales they
made each month. In the past 12 months, they sold the following numbers of
computers:
51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13.
4. Solution
First, put the data in ascending order, then find the median.
6, 7, 13, 17, 20, 25, 39, 41, 43, 49, 51, 62.
a. With 12 values, the median is the mean of the sixth and seventh observations:
(25 + 39) ÷ 2 = 32.
b. There are six numbers below the median, namely: 6, 7, 13, 17, 20, 25.
Q1 = the median of these six items
= the (6 + 1) ÷ 2 = 3.5th value
= (third + fourth observations) ÷ 2
= (13 + 17) ÷ 2
= 15
c. There are six numbers above the median, namely: 39, 41, 43, 49, 51, 62.
Q3 = the median of these six items
= the (6 + 1) ÷ 2 = 3.5th value
= (third + fourth observations) ÷ 2
= (43 + 49) ÷ 2
= 46
d. The five-number summary for Carl's sales is 6, 15, 32, 46, 62.
e. Using the same calculations, we can determine that the five-number summary for
Angela is 1, 17, 26, 42, 57.
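The steps of the solution can be sketched in Python as follows. This is a minimal illustration of the median-of-halves method used in the report, not the only way to define quartiles; statistical packages often use other interpolation rules and may give slightly different values. The function names are hypothetical, and the treatment of an odd number of values (the middle value is left out of both halves) is an assumption, since the report only covers an even count.

```python
def median(values):
    """Median of a list that is already sorted in ascending order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]        # observations below the median
    upper = ordered[(n + 1) // 2:]   # observations above the median
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


# Monthly sales figures from the problem statement.
sales = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]
summary = five_number_summary(sales)
print(summary)  # (6, 15.0, 32.0, 46.0, 62)
```

The printed values agree with the five-number summary 6, 15, 32, 46, 62 obtained by hand.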
a. Carl's highest and lowest sales are both higher than Angela's corresponding
sales, and Carl's median sales figure is higher than Angela's. Also, Carl's
interquartile range is larger than Angela's.
b. These results suggest that Carl generally sells more computers than
Angela does, although his larger interquartile range shows that his monthly
sales are more variable.
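The comparison can be checked numerically from the two five-number summaries. The short sketch below computes the range and the interquartile range for each person. The helper name `spread` is hypothetical.

```python
# Five-number summaries taken from the solution: (min, Q1, median, Q3, max).
carl = (6, 15, 32, 46, 62)
angela = (1, 17, 26, 42, 57)


def spread(summary):
    """Range and interquartile range from a five-number summary."""
    minimum, q1, _median, q3, maximum = summary
    return {"range": maximum - minimum, "iqr": q3 - q1}


carl_spread = spread(carl)
angela_spread = spread(angela)
print(carl_spread)    # {'range': 56, 'iqr': 31}
print(angela_spread)  # {'range': 56, 'iqr': 25}
```

Both ranges come out at 56, so the difference in spread between the two shows up only in the interquartile range, that is, in the length of the box rather than the overall extent of the whiskers.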
6. Conclusion
There are several ways to describe the centre and spread of a distribution. One way to
present this information is with a five-number summary. It uses the median as its
centre value and gives a brief picture of the other important distribution values.
Another approach uses the mean and standard deviation to describe the centre and
spread of the data. This technique, however, is best used with symmetrical
distributions with no outliers.
Despite this restriction, the mean and standard deviation measures are used more
commonly than the five-number summary. The reason for this is that many natural
phenomena can be approximately described by a normal distribution. And for normal
distributions, the mean and standard deviation are the best measures of centre and
spread respectively. Standard deviation takes every value into account, has extremely
useful properties when used with a normal distribution, and is mathematically
manageable. But the standard deviation is not a good measure of spread in highly
skewed distributions and, in these instances, should be supplemented by other
measures such as the semi-quartile range. The semi-quartile range is rarely used as a
measure of spread, partly because it is not as mathematically manageable as others.
Still, it is a useful statistic because it is less influenced by extreme values than
the standard deviation, is less subject to sampling fluctuations in highly skewed
distributions and depends on only two values, Q1 and Q3. However, it cannot stand
alone as a measure of spread.
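The claim about extreme values can be illustrated with a small sketch. It takes the semi-quartile range to be (Q3 − Q1) ÷ 2, also called the semi-interquartile range, and computes the quartiles by the median-of-halves method from the solution. The second data set replaces the largest sales figure with a hypothetical extreme value of 620, which is not from the report and is introduced only to show the effect.

```python
import statistics


def semi_quartile_range(data):
    """(Q3 - Q1) / 2, with quartiles found by the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])
    q3 = statistics.median(ordered[(n + 1) // 2:])
    return (q3 - q1) / 2


sales = [6, 7, 13, 17, 20, 25, 39, 41, 43, 49, 51, 62]
with_extreme = sales[:-1] + [620]  # hypothetical extreme value, for illustration only

# The semi-quartile range is unchanged by the extreme value.
print(semi_quartile_range(sales), semi_quartile_range(with_extreme))  # 15.5 15.5

# The sample standard deviation grows sharply.
print(statistics.stdev(sales), statistics.stdev(with_extreme))
```

Because Q1 and Q3 do not move when only the largest observation changes, the semi-quartile range stays at 15.5, while the standard deviation, which takes every value into account, increases.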