
Histogram.

A histogram is a bar graph of raw data that creates a picture of the data distribution. The bars represent the frequency of occurrence by classes of data. A histogram shows basic information about the data set, such as central location, width of spread, and shape. Use histograms to assess the system's current situation and to study the results of improvement actions. The histogram's shape and statistical information help you decide how to improve the system. If the system is stable, you can make predictions about its future performance. After an improvement action has been carried out, continue collecting data and making histograms to see whether the theory has worked.

The vertical axis of the histogram shows the frequency of occurrence. The horizontal axis shows the cell (class) values. Histograms illustrate the process distribution and are used to make predictions about a stable process; if the system is unstable, the histogram has little predictive value. Histograms provide three very important pieces of information about a distribution of data values: shape, central location (the middle), and spread (how different the values are from each other and from the middle). Getting the most from this tool means being able to apply these statistical concepts.

Histograms show how data pile up: in any distribution of values, some values occur more frequently than others. The peaks of the histogram show where the data are similar. This is the central location, which is measured by the mean, median, and mode. While these statistics provide valuable information about the process, central location alone does not give a complete picture of it. The spread of the data shows its extremes, and the shape of the histogram shows whether the system leans toward one extreme or the other, or whether there are multiple peaks. When you use a histogram for prediction, the system must be stable. If it is not, the central location, spread, and shape may vary dramatically between histograms built from data taken at different times, and none of them will accurately reflect the process. If you are not using histograms to make predictions, stability is not required.
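The three measures of central location named above, together with two simple measures of spread, can be computed with Python's standard `statistics` module. This is a minimal sketch: it uses the finishing times from the charity-run example in this section, and the choice of range and sample standard deviation as the spread measures is an assumption for illustration. `statistics.multimode` is used instead of `statistics.mode` because it returns every most frequent value, which is how multiple peaks show up numerically.

```python
import statistics

# Finishing times (minutes) from the charity-run example in this section.
times = [11, 13, 12, 16, 15, 16, 17, 18, 20, 21, 24, 25, 20, 23,
         17, 28, 29, 17, 22, 23, 29, 25, 15, 19, 20, 21, 23, 27]


def describe(values):
    """Summarise the central location and spread of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most frequent value, so several peaks all show up
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
        # sample standard deviation (divides by n - 1)
        "stdev": statistics.stdev(values),
    }


summary = describe(times)
print(summary["median"], summary["modes"], summary["range"])  # 20.0 [17, 20, 23] 18
```

Here three values (17, 20 and 23 minutes) each occur three times, so the data have no single mode; the mean (about 20.2) and the median (20) sit close together.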

Box-and-whisker plot. Given some data, we can draw a box-and-whisker diagram (or box plot) to show the spread of the data. The diagram shows the quartiles of the data and uses them as an indication of the spread. The diagram is made up of a "box", which lies between the lower and upper quartiles. The median can also be indicated by a line dividing the box into two. The "whiskers" are straight lines extending from the ends of the box to the minimum and maximum values.
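The five values a box plot is drawn from (minimum, lower quartile, median, upper quartile, maximum) can be computed as in the sketch below. It assumes one common quartile convention, taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data; statistical packages do not all use this convention, so their quartiles can differ slightly. The function name `five_number_summary` and the sample data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the middle value belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]  # skip the middle value when n is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


# Hypothetical data set of seven values.
print(five_number_summary([2, 4, 4, 5, 7, 9, 10]))  # (2, 4, 5, 9, 10)
```

The box spans the second and fourth numbers (4 to 9 here), the dividing line sits at the median (5), and the whiskers reach out to 2 and 10.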

When collecting data, you will often record a result that seems "wrong": it is much higher or much lower than all of the other values. Such points are known as "outliers". On a box-and-whisker diagram, outliers should be excluded from the whiskers; instead, plot them individually and label them as outliers. If the whisker to the right of the box is longer than the one to the left, there are more extreme values towards the positive end, so the distribution is positively skewed. Similarly, if the whisker to the left is longer, the distribution is negatively skewed.
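The whisker-length rule above can be written as a small comparison. This sketch assumes the whisker ends passed in are the smallest and largest values left after outliers have been removed; the function name `whisker_skew` is hypothetical, and treating exactly equal whiskers as "roughly symmetric" is a crude simplification rather than a formal measure of skewness.

```python
def whisker_skew(whisker_low, q1, q3, whisker_high):
    """Classify skew by comparing the lengths of the two whiskers."""
    left = q1 - whisker_low    # length of the left (lower) whisker
    right = whisker_high - q3  # length of the right (upper) whisker
    if right > left:
        return "positively skewed"
    if left > right:
        return "negatively skewed"
    return "roughly symmetric"


# Hypothetical summary with a long right whisker.
print(whisker_skew(2, 4, 6, 15))  # positively skewed
```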

Now let us look at an example.

Example. The times taken by the participants in the National Aids Foundation Charity Run are shown in the table below. Represent the data using a histogram and a box-and-whisker plot.

Participant        1   2   3   4   5   6   7   8   9  10  11  12  13  14
Time (minutes)    11  13  12  16  15  16  17  18  20  21  24  25  20  23

Participant       15  16  17  18  19  20  21  22  23  24  25  26  27  28
Time (minutes)    17  28  29  17  22  23  29  25  15  19  20  21  23  27

Solution. Histogram.

[Figure: Histogram of the participants' times. Vertical axis: Frequency; horizontal axis: TIME (minutes).]

The first bar in this histogram represents the number of participants (frequency) with times ≥ 10 and < 15 minutes. The second bar represents the number of participants with times ≥ 15 and < 20 minutes, and so on. When the Show Normal distribution option is selected, a Normal distribution curve (with the mean and standard deviation of the data represented in the histogram) is superimposed over the histogram.
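The bar heights come from counting how many times fall into each class of width 5 minutes, starting at 10 (≥ 10 and < 15, ≥ 15 and < 20, and so on). A minimal sketch of that counting step follows; the function name `class_frequencies` is hypothetical, and it assumes every value is at least the starting point of the first class.

```python
from collections import Counter

times = [11, 13, 12, 16, 15, 16, 17, 18, 20, 21, 24, 25, 20, 23,
         17, 28, 29, 17, 22, 23, 29, 25, 15, 19, 20, 21, 23, 27]


def class_frequencies(values, start, width):
    """Count values in half-open classes [start + k*width, start + (k+1)*width)."""
    if min(values) < start:
        raise ValueError("all values must be >= start")
    counts = Counter((v - start) // width for v in values)  # k = class index
    return {(start + k * width, start + (k + 1) * width): counts[k]
            for k in range(max(counts) + 1)}


# Text version of the histogram: one '#' per participant.
for (low, high), freq in class_frequencies(times, start=10, width=5).items():
    print(f"{low} <= time < {high}: {'#' * freq} ({freq})")
```

For these data the four classes hold 3, 9, 10 and 6 participants, which accounts for all 28 runners.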

Using the histogram, you can evaluate visually whether the data are distributed symmetrically and approximately Normally (Gaussian), or whether the distribution is asymmetrical or skewed. When the distribution is not Normal, it cannot be described accurately by the mean and standard deviation; the median, mode, quartiles, and percentiles should be used instead. This histogram has a single peak in the middle classes and is roughly symmetric, so the data appear to be approximately normally distributed.
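The superimposed Normal curve can be turned into numbers by comparing the observed count in each class with the count a Normal distribution having the same mean and standard deviation would predict. This sketch uses `statistics.NormalDist` (Python 3.8 or later); the function name `observed_vs_normal` is hypothetical. It is a rough visual-style check only, not a formal test of normality.

```python
import statistics
from statistics import NormalDist

times = [11, 13, 12, 16, 15, 16, 17, 18, 20, 21, 24, 25, 20, 23,
         17, 28, 29, 17, 22, 23, 29, 25, 15, 19, 20, 21, 23, 27]


def observed_vs_normal(values, classes):
    """Return (low, high, observed, expected) for each half-open class."""
    normal = NormalDist(statistics.mean(values), statistics.stdev(values))
    rows = []
    for low, high in classes:
        observed = sum(low <= v < high for v in values)
        # Expected count under the fitted Normal: n * P(low <= X < high)
        expected = len(values) * (normal.cdf(high) - normal.cdf(low))
        rows.append((low, high, observed, expected))
    return rows


rows = observed_vs_normal(times, [(10, 15), (15, 20), (20, 25), (25, 30)])
for low, high, observed, expected in rows:
    print(f"[{low}, {high}): observed {observed:2d}, expected {expected:5.2f}")
```

The expected counts add up to slightly less than 28, because a Normal curve also puts a little probability below 10 and above 30 minutes.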

Box-and-whisker plot.

[Figure: Box-and-whisker plot of the participants' times; horizontal axis: TIME (minutes).]

The diagram shows the box plot for the data. In a box-and-whisker plot, the central box represents the values from the lower to the upper quartile (25th to 75th percentile). The middle line represents the median. The horizontal line extends from the minimum to the maximum value, excluding outside and far out values, which are displayed as separate points.
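Where exactly the box edges fall depends on the quartile convention the software uses. The sketch below computes the quartiles of the run times three ways: as medians of the lower and upper halves (a hypothetical helper, `median_of_halves`), and with the standard library's `statistics.quantiles` using its `"exclusive"` and `"inclusive"` methods. The median is the same in every case; only the box edges move by a fraction of a minute.

```python
import statistics

times = [11, 13, 12, 16, 15, 16, 17, 18, 20, 21, 24, 25, 20, 23,
         17, 28, 29, 17, 22, 23, 29, 25, 15, 19, 20, 21, 23, 27]


def median_of_halves(values):
    """Return [Q1, median, Q3] using medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[half + len(data) % 2:]
    return [statistics.median(lower), statistics.median(data),
            statistics.median(upper)]


print("median of halves:", median_of_halves(times))                        # [16.5, 20.0, 23.5]
print("exclusive:", statistics.quantiles(times, n=4, method="exclusive"))  # [16.25, 20.0, 23.75]
print("inclusive:", statistics.quantiles(times, n=4, method="inclusive"))  # [16.75, 20.0, 23.25]
```

With the median-of-halves quartiles, the whiskers run from 11 to 29 minutes, and both whiskers are 5.5 minutes long, which is consistent with the roughly symmetric shape seen in the histogram.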

An outside value is defined as a value that is smaller than the lower quartile minus 1.5 times the interquartile range, or larger than the upper quartile plus 1.5 times the interquartile range (inner fences).

A far out value is defined as a value that is smaller than the lower quartile minus 3 times the interquartile range, or larger than the upper quartile plus 3 times the interquartile range (outer fences). These values are plotted using a different marker, in the warning color.
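The inner and outer fence rules can be applied to each value as follows. This is a sketch under one stated assumption: the quartiles of the run times are taken as Q1 = 16.5 and Q3 = 23.5, the medians of the lower and upper halves of the sorted data. The function name `classify` and the test values 40 and 50 are hypothetical.

```python
def classify(value, q1, q3):
    """Label a value as 'inside', 'outside' or 'far out' using the fences."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far out"   # beyond the outer fences
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outside"   # beyond the inner fences but within the outer fences
    return "inside"


times = [11, 13, 12, 16, 15, 16, 17, 18, 20, 21, 24, 25, 20, 23,
         17, 28, 29, 17, 22, 23, 29, 25, 15, 19, 20, 21, 23, 27]

# Assumed quartiles (median-of-halves convention): IQR = 7,
# inner fences 6 and 34, outer fences -4.5 and 44.5.
labels = {t: classify(t, 16.5, 23.5) for t in times}
print(set(labels.values()))        # {'inside'}
print(classify(40, 16.5, 23.5))    # outside
print(classify(50, 16.5, 23.5))    # far out
```

Every run time lies inside the inner fences, so under this convention the box plot for the example has no separately plotted points.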

As an option, you may choose to plot all individual data points as well. This gives a statistical summary of the data without the disadvantage of concealing the real data.

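[Figure: Box-and-whisker plot of the participants' times with all individual data points plotted; horizontal axis: TIME (minutes).]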