NOTCHED AND VARIABLE WIDTH BOX-PLOT

Applied Statistics and Computing Lab Indian School of Business

Learning goals
• What is a notched box-plot?
• How does one construct such a plot?
• What is a variable width box-plot?
• How does one construct such a plot?
• How are they useful?
• Can one combine these two features of a box-plot?
• How does one construct a box-plot for data with factors?
• What is its use?

Dataset
• For this study of Notched and Variable Width box-plots, we consider a slightly modified version of the scores dataset
• Suppose the score record is blank for some students, during some or all the exams
• The student could have been absent for the exam
• There could be a data entry error
• In either case, we do not have 50 scores for each of the exams
Variable name First minor Second minor Third minor First semester GPA Second semester scores 47
# of observations available

48

45

48

47
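
The number of observations available for each exam can be obtained by counting the non-missing entries. The following is a minimal sketch, assuming a hypothetical data frame `scores` in which a blank record is stored as `NA`; the values are made up for illustration and are not the scores dataset itself.

```r
# Hypothetical miniature of the scores data: NA marks a blank score record.
scores <- data.frame(
  First.minor  = c(12, NA, 15, 9, 14),
  Second.minor = c(11, 13, NA, NA, 10),
  Third.minor  = c(14, 12, 13, 8, NA)
)

# Number of observations available for each exam (non-missing entries).
n_available <- colSums(!is.na(scores))
print(n_available)
```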

Notched Box-plots
• As per the Oxford Advanced Learner's Dictionary, one of the meanings of notch is 'a V-shaped cut in an edge or a surface.'
• In a notched box-plot, a notch appears on either side of the median. The interval corresponding to the notch is an approximate confidence interval for the population median
• The notches are used to test whether two or more population medians are equal, roughly at the 5% level
• If the notches of the box-plots of variables in the same frame do not overlap, then we conclude that the population medians are different (an approximate test at the 5% level of significance)
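
The ends of a notch can be worked out by hand. The sketch below assumes the rule documented for R's `boxplot.stats()`, in which the notch extends to the median plus or minus 1.58 times the hinge spread divided by the square root of n; the vector `x` is hypothetical data, not one of the exams above.

```r
# Hypothetical scores for one exam; NA marks a missing record.
x <- c(12, 15, 9, 14, NA, 18, 11, 13, 16, 10, NA, 17)

x_obs <- x[!is.na(x)]        # drop the missing scores
n     <- length(x_obs)       # number of available observations
five  <- fivenum(x_obs)      # min, lower hinge, median, upper hinge, max
hinge_spread <- five[4] - five[2]

# Notch ends: median -/+ 1.58 * (hinge spread) / sqrt(n)
notch <- five[3] + c(-1.58, 1.58) * hinge_spread / sqrt(n)

# The same interval as returned by R itself.
notch_r <- boxplot.stats(x)$conf
print(rbind(manual = notch, R = notch_r))
```

Because the interval shrinks with the square root of n, batches with more observations get shorter notches for the same spread.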

Notched box-plots (contd.)

Width of the box
• If there is only one batch (variable), the width can be arbitrary.
• If there are several batches, each having the same number of observations, then again the width can be the same for all the variables.
• If there are several batches with varying numbers of observations, it is desirable that the box-plots produced in the same frame exhibit this information.
  – This can be done using the varwidth option in R
  – When this option is used, the width of each box is proportional to the square root of the number of observations
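
To see what the square-root rule implies, the sketch below computes relative box widths for three hypothetical batches of sizes 10, 40 and 90 (assumed values, chosen only so that the ratios are easy to read) and then draws the corresponding variable width box-plot from simulated data.

```r
# Hypothetical batch sizes, chosen so that the ratios are easy to read.
n_obs <- c(A = 10, B = 40, C = 90)

# Widths proportional to sqrt(n), scaled so the largest batch has width 1.
rel_width <- sqrt(n_obs) / max(sqrt(n_obs))
print(round(rel_width, 3))

# Simulated batches of those sizes, drawn with variable width boxes.
set.seed(1)
batches <- lapply(n_obs, function(n) rnorm(n))
bp <- boxplot(batches, varwidth = TRUE)
print(bp$n)   # number of observations behind each box
```

A batch with nine times as many observations is therefore drawn only three times as wide.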

Variable width box-plot

What if we combine the features of notches and variable width, to make a variable width notched box-plot?

Variable width Notched Box-plot

Comments on the Box-plot
• Earlier we remarked that the medians of the three minors appear to be close. From the preceding plot, it is clear that the notch of First.minor does not overlap with those of the other two
• Thus the earlier belief is refuted
• The upper end of the notch of the box-plot of Second.minor barely touches the lower end of the notch of Third.minor
• Thus it cannot be said that the population medians of the three minors are the same
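
The visual comparison of notches can also be done numerically from the value returned by `boxplot()`. This is a sketch on simulated data, assuming three hypothetical batches named after the minors; the helper `notches_overlap` is introduced here for illustration and is not part of R.

```r
set.seed(1)
# Hypothetical scores for three exams with unequal numbers of observations.
exams <- list(
  First.minor  = rnorm(47, mean = 10,   sd = 2),
  Second.minor = rnorm(48, mean = 13,   sd = 2),
  Third.minor  = rnorm(45, mean = 13.5, sd = 2)
)
bp <- boxplot(exams, notch = TRUE, varwidth = TRUE)

# bp$conf holds the lower (row 1) and upper (row 2) notch end of each box.
# Illustrative helper: TRUE when the notches of boxes i and j overlap.
notches_overlap <- function(conf, i, j) {
  conf[1, i] <= conf[2, j] && conf[1, j] <= conf[2, i]
}

# Check every pair of boxes.
pairs   <- combn(length(exams), 2)
overlap <- apply(pairs, 2, function(p) notches_overlap(bp$conf, p[1], p[2]))
names(overlap) <- apply(pairs, 2,
                        function(p) paste(bp$names[p], collapse = " vs "))
print(overlap)
```

A `FALSE` entry corresponds to a pair of boxes whose notches do not overlap, which is the situation described above for First.minor.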

Box-plot for data with factors
• Sometimes we have data on a batch with factors
• Research has shown that in the fast-paced world of electronics, the key factor that separates the winners from the losers is actually how slow a firm is in making decisions: The most successful firms take longer to arrive at strategic decisions on product development, adopting new technologies, or developing new products
• The following values are the number of months taken to arrive at a decision, for firms ranked high, medium and low in terms of Performance:
High Medium Low 3.5 4.8 3 1 5.5 2.5 3 6 2 6.5 7.5 4 4 8 4.5 6 2 6 6 2 5.5 6.5 9 4.5 2 7 5 3.5 9 5 10 6

2.5 7 1 2
1.5 1.5

3.8 4.5 0.5

Box-plot for data with factors (contd.)
• In such cases, one typically does an 'analysis of variance' to test the equality of means. Here, the batch is the data on the number of months taken to arrive at a decision and the factor is the performance, with levels high, medium and low
• One can also use a variable width notched box-plot to examine the equality of medians. This can be used independently or in conjunction with the analysis of variance in arriving at meaningful conclusions on the location behavior of the different factor levels
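
A batch with a factor is usually stored as one numeric column and one factor column, and both analyses can then be run from the same formula. The sketch below assumes a hypothetical data frame `decisions` with made-up values; it does not reproduce the firm data given earlier.

```r
# Hypothetical decision times (months) by performance group.
decisions <- data.frame(
  months = c(3, 4, 5, 6, 7, 8, 9,
             2, 3, 4, 5, 6,
             1, 2, 2, 3, 4, 5, 6, 7, 8),
  performance = factor(
    rep(c("High", "Medium", "Low"), times = c(7, 5, 9)),
    levels = c("High", "Medium", "Low")
  )
)

# Variable width notched box-plot of the batch, split by the factor.
bp <- boxplot(months ~ performance, data = decisions,
              varwidth = TRUE, notch = TRUE,
              xlab = "Performance", ylab = "Months to decision")

# Analysis of variance on the same data, to compare the group means.
fit <- aov(months ~ performance, data = decisions)
print(summary(fit))
```

With groups this small, R may warn that some notches went outside the hinges; the plot is still drawn.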

Box-plot for data with factors (contd.)

Comments on the Box-plot
• From the plot it is clear that the population medians are very unlikely to be equal
• For the box-plot for high performance, the notch lies within the first and third quartiles. However, for the plots corresponding to low and medium performance, the lower end of the notch is below the first quartile. Thus the population median could fall below the observed first quartile in these two cases
• It is also worth noting that the sampling variability of the median (as indicated by the length of the notch) is about the same for the three performance groups (factor levels).
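
Whether a notch extends beyond the box, and how long each notch is, can be read off the statistics that `boxplot()` returns. The following sketch uses hypothetical groups (made-up values, not the firm data); note that R's box edges are hinges, which may differ slightly from other definitions of the quartiles.

```r
# Hypothetical decision times (months) for three performance groups.
groups <- list(
  High   = c(3, 4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 10),
  Medium = c(1, 2, 2, 3, 6, 7, 8, 9),
  Low    = c(0.5, 1, 1.5, 2, 4, 5, 6)
)

# plot = FALSE returns the box-plot statistics without drawing anything.
bp <- boxplot(groups, plot = FALSE)

# bp$stats rows: lower whisker, lower hinge, median, upper hinge, upper whisker.
# bp$conf rows:  lower and upper end of the notch.
below_lower_hinge <- bp$conf[1, ] < bp$stats[2, ]
above_upper_hinge <- bp$conf[2, ] > bp$stats[4, ]
notch_length      <- bp$conf[2, ] - bp$conf[1, ]

print(data.frame(group = bp$names, n = bp$n,
                 below_lower_hinge, above_upper_hinge, notch_length))
```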

R-codes
Plot                              R-code
Notched box-plot                  boxplot(data_name, notch = TRUE)
Variable width box-plot           install.packages("aplpack")
                                  library(aplpack)
                                  boxplot(data_name, varwidth = TRUE)
Variable width notched box-plot   boxplot(data_name, varwidth = TRUE, notch = TRUE)
Box-plot for data with factors    boxplot(numeric_variable ~ factor_variable, varwidth = TRUE, notch = TRUE)

Thank you
