
NOTCHED AND VARIABLE WIDTH BOX-PLOT

Applied Statistics and Computing Lab Indian School of Business


Learning goals
What is a notched box-plot? How does one construct such a plot?
What is a variable width box-plot? How does one construct such a plot?
How are they useful?
Can one combine these two features of a box-plot?
How does one construct a box-plot for data with factors? What is its use?


Dataset
For this study of notched and variable width box-plots, we consider a slightly modified version of the scores dataset.
Suppose the score record is blank for some students, during some or all of the exams.
The student could have been absent for the exam.
There could be a data entry error.
In either case, we do not have 50 scores for each of the exams.

Variable name             # of observations available
First minor               48
Second minor              45
Third minor               48
First semester GPA        47
Second semester scores    47


Notched Box-plots
According to the Oxford Advanced Learner's Dictionary, one of the meanings of "notch" is a V-shaped cut in an edge or a surface.
In a notched box-plot, a notch appears on either side of the median.
The interval corresponding to the notch is an approximate confidence interval for the population median.
Notched box-plots are used to test whether two or more population medians are equal at the 5% level.
If the notches of the box-plots of variables in the same frame do not overlap, then we conclude that the population medians are different (using a test at the 5% level of significance).
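
As a rough illustration of where the notch comes from: base R's boxplot.stats() returns the notch limits in its conf component, computed as median ± 1.58 × (distance between the hinges) / sqrt(n). The sketch below reproduces that interval by hand. The vector x is made-up data with two blank records, not the scores dataset.

```r
# Hypothetical scores with two missing records (not the actual scores dataset)
x <- c(12, 15, 9, NA, 18, 14, 11, 16, 13, NA, 17, 10)

obs <- x[!is.na(x)]          # observations actually available
n <- length(obs)
five <- fivenum(obs)         # min, lower hinge, median, upper hinge, max
spread <- five[4] - five[2]  # distance between the hinges

# Notch limits computed by hand, following the rule used by boxplot.stats()
notch_manual <- five[3] + c(-1.58, 1.58) * spread / sqrt(n)

# Notch limits as returned by R (missing values are dropped internally)
notch_r <- boxplot.stats(x)$conf

print(notch_manual)
print(notch_r)
```

The two printed intervals should agree. Note that the notch narrows as n grows, so batches with more available observations give tighter notches.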


Notched box-plots (contd.)


Width of the box


If there is only one batch (variable), the width can be arbitrary.
If there are several batches, each having the same number of observations, then again the width can be the same for all the variables.
If there are several batches with varying numbers of observations, it is desirable that the box-plots produced in the same frame exhibit this information.
This can be done using the varwidth option in R.
When this option is used, the width of each box is proportional to the square root of the number of observations.
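
A small sketch of the variable width idea, using simulated batches with different numbers of blank records. The data frame scores_sim and its values are placeholders, not the real scores dataset. Calling boxplot() with plot = FALSE returns the number of non-missing observations per batch in n, from which the relative widths can be worked out.

```r
set.seed(1)  # simulated data, for illustration only
scores_sim <- data.frame(
  First.minor  = rnorm(50, mean = 12, sd = 3),
  Second.minor = rnorm(50, mean = 14, sd = 3),
  Third.minor  = rnorm(50, mean = 15, sd = 3)
)
# Blank out a few records to mimic absent students or data entry errors
scores_sim$First.minor[1:2]  <- NA
scores_sim$Second.minor[1:5] <- NA
scores_sim$Third.minor[1:2]  <- NA

# plot = FALSE returns the summary statistics without drawing anything
b <- boxplot(scores_sim, plot = FALSE)
print(b$n)  # observations available per batch

# Relative box widths under varwidth = TRUE: proportional to sqrt(n)
rel_width <- sqrt(b$n) / max(sqrt(b$n))
print(round(rel_width, 3))

# Draw the variable width box-plot
boxplot(scores_sim, varwidth = TRUE)
```

Because the widths scale with the square root of n, a batch with 45 observations is drawn only slightly narrower than one with 48.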


Variable width box-plot


What if we combine the features of notches and variable width, to make a variable width notched box-plot?


Variable width Notched Box-plot


Comments on the Box-plot


Earlier we remarked that the medians of the three minors appear to be close.
From the preceding plot, it is clear that the notch of First.minor does not overlap with those of the other two. Thus the earlier belief is refuted.
The upper end of the notch of the box-plot of Second.minor barely touches the lower end of the notch of Third.minor.
Thus it cannot be said that the medians of the minors at the population level are the same.
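
The visual comparison of notches can also be done numerically. The sketch below defines a hypothetical helper, notches_overlap(), that takes the conf matrix returned by boxplot(..., plot = FALSE) and reports, for each pair of batches, whether the notch intervals overlap. The batches A, B and C are simulated, so the output will not reproduce the plot discussed above.

```r
# Hypothetical helper: pairwise overlap of notch intervals.
# conf is a 2 x k matrix (row 1 = lower notch end, row 2 = upper notch end).
notches_overlap <- function(conf, names) {
  k <- ncol(conf)
  out <- matrix(TRUE, k, k, dimnames = list(names, names))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      # Two intervals overlap when each one starts before the other ends
      out[i, j] <- conf[1, i] <= conf[2, j] && conf[1, j] <= conf[2, i]
    }
  }
  out
}

set.seed(2)  # simulated data, for illustration only
sim <- list(
  A = rnorm(48, mean = 10,   sd = 2),
  B = rnorm(45, mean = 14,   sd = 2),
  C = rnorm(48, mean = 14.5, sd = 2)
)
b <- boxplot(sim, notch = TRUE, varwidth = TRUE, plot = FALSE)

ov <- notches_overlap(b$conf, b$names)
print(ov)
```

A FALSE entry corresponds to a pair of box-plots whose notches do not overlap, which is the situation in which the medians are judged to differ.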


Box-plot for data with factors


Sometimes we have data on a batch with factors.
Research has shown that in the fast-paced world of electronics, the key factor that separates the winners from the losers is actually how slow a firm is in making decisions: the most successful firms take longer to arrive at strategic decisions on product development, adopting new technologies, or developing new products.
The following values are the number of months taken to arrive at a decision, for firms ranked high, medium and low in terms of performance:
High Medium Low 3.5 4.8 3 1 5.5 2.5 3 6 2 6.5 7.5 4 4 8 4.5 6 2 6 6 2 5.5 6.5 9 4.5 2 7 5 3.5 9 5 10 6

2.5 7 1 2

1.5 1.5

3.8 4.5 0.5


Box-plot for data with factors (contd.)


Notice that in such cases, one typically does an analysis of variance to test the equality of means.
Here, the batch is the data on the number of months taken to arrive at a decision and the factor is the performance: high, medium and low.
In such cases one can use a variable width notched box-plot to examine the equality of medians.
This can be used independently or in conjunction with the analysis of variance in arriving at meaningful conclusions on the location behavior of the different levels of the factor.
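
A minimal sketch of using the two tools together. The data frame decisions is simulated placeholder data with arbitrary unequal group sizes, not the decision times listed earlier; months and performance are assumed column names.

```r
set.seed(3)  # simulated placeholder data, not the values from the slides
decisions <- data.frame(
  months = c(rnorm(14, mean = 6,   sd = 2),
             rnorm(12, mean = 4.5, sd = 2),
             rnorm(10, mean = 3,   sd = 2)),
  performance = factor(rep(c("High", "Medium", "Low"), times = c(14, 12, 10)),
                       levels = c("High", "Medium", "Low"))
)

# Variable width notched box-plot: one box per level of the factor.
# With small groups R may warn that a notch went outside the hinges.
b <- boxplot(months ~ performance, data = decisions,
             varwidth = TRUE, notch = TRUE,
             xlab = "Performance", ylab = "Months to decision")

# One-way analysis of variance on the same data, for equality of means
fit <- aov(months ~ performance, data = decisions)
print(summary(fit))
```

The box-plot addresses medians and the analysis of variance addresses means, so the two can be read side by side when judging whether the groups differ in location.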


Box-plot for data with factors (contd.)


Comments on the Box-plot


From the plot it is clear that the medians in the population are most unlikely to be equal.
For the box-plot for high performance, the notch lies between the first and third quartiles. However, for the plots corresponding to low and medium performance, the lower end of the notch is below the first quartile. Thus the population median could fall below the observed first quartile in these two cases.
It is also worth noting that the sampling variability of the median (as reflected in the length of the notch) is about the same for the three levels of the factor (performance groups).
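
Whether a notch extends beyond the box can be read off from the values boxplot() returns: stats holds the five-number summary per group (rows 2 and 4 are the hinges) and conf holds the notch ends. The sketch below uses small made-up groups, so its output says nothing about the firms data.

```r
# Made-up groups of different sizes, for illustration only
groups <- list(
  High   = c(3.1, 4.0, 4.4, 5.2, 5.9, 6.3, 6.8, 7.5, 8.1, 9.0, 9.6, 10.2),
  Medium = c(1.2, 2.0, 4.9, 5.1, 5.3, 8.8),
  Low    = c(0.6, 1.4, 1.9, 2.3, 2.7, 3.6, 4.4, 5.0)
)
b <- boxplot(groups, plot = FALSE)

lower_hinge <- b$stats[2, ]
upper_hinge <- b$stats[4, ]

# TRUE where the notch sticks out of the box on that side
below_box <- b$conf[1, ] < lower_hinge
above_box <- b$conf[2, ] > upper_hinge

# Length of the notch, reflecting the sampling variability of the median
notch_length <- b$conf[2, ] - b$conf[1, ]

print(data.frame(group = b$names, n = b$n, below_box, above_box,
                 notch_length = round(notch_length, 2)))
```

Comparing notch_length across groups gives a numerical counterpart to judging the notch lengths by eye.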


R-codes
Plot                               R-code
Notched box-plot                   boxplot(data_name, notch=TRUE)
Variable width box-plot            install.packages("aplpack")
                                   library(aplpack)
                                   boxplot(data_name, varwidth=TRUE)
Variable width notched box-plot    boxplot(data_name, varwidth=TRUE, notch=TRUE)
Box-plot for data with factors     boxplot(numeric_variable ~ factor_variable, varwidth=TRUE, notch=TRUE)


Thank you
