2.1 Introduction
2.1.1 This chapter reviews some basic methods of presenting raw data in a
more digestible form, together with some less well-known but very useful
techniques. None of these tools should be considered in isolation: in a real
application they will almost always be used in a progression or combination.
It is worth reviewing some of the essential principles of data collection and
recording. Often data is collected as a habit: perhaps someone started it long
ago for a purpose now obscured, and the routine has continued ever since. No
one dares to question its relevance or suitability in changing conditions. Some
important points include:
Objective: Is the data required for legal purposes (accounts, VAT, customer
documentation), control, solving a specific problem or management
information?
Type: Is the data obtained by counting or measurement? Samples or 100%
checks?
Frequency: Is it required once only, or for regular checks (once per hour,
every batch or shift, etc.)?
Recording format: Will this be automatic data capture, record sheets, tallies,
control charts, questionnaires? Document design should be considered in
terms of relevance, simplicity, ease of transfer of information to other media
(e.g. computer), possible multiple use of the same data for various purposes,
and facilitating calculation of summaries.
Communication and training: Ensure that all concerned know the relevance and
importance of the data, and are familiar with the correct procedures, the
methods of calculation and the ways of avoiding errors.
Integrity: Data is used for control, making decisions, diagnosing problems
and other important functions. It needs to be objective, honest, legible and
(especially where sampling is concerned) properly representative of the system
or process.
Data collection and graphical summaries
2.2.1 In its raw form, data is rarely useful. Care is needed both in presenting
it and in extracting useful summary statistics. Consider the following set of data:
238.9, 238.3, 240.4, 241.0, 239.0, 239.3, 239.6, 236.9, 239.3, 240.1, 238.8, 240.7,
241.0, 240.1, 239.3, 239.1, 239.5, 239.9, 238.9, 237.4, 238.2, 238.4, 239.2, 239.7,
239.4, 240.1, 240.3, 239.5, 239.0, 238.6, 239.8, 240.3, 239.7, 238.7, 240.7, 240.0,
240.1, 241.6, 239.8, 240.6, 239.7, 239.7, 240.4, 239.5, 238.6, 239.2, 237.6, 238.9,
239.2, 240.3, 239.4, 240.8.
In this form, the figures convey little except that they are all fairly close to 240.
What are they? And have we any background information?
The data make rather more sense when we learn that they are weights (in
grams) of valve liners moulded in a corrosion resistant plastic material. They
are weighed as a quality check (if they weigh too little, they may contain voids
or be undersize; if too much, they may be of poor shape due to improper mould
closure). There is a design specification of 240 ± 5 g.
While one can now examine whether the weights (about equal to a half-pound
of butter!) satisfy the specification, it is impossible to discern any pattern. Some
organization is needed.
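The check against the specification can be sketched in a few lines of Python. This is only an illustration: the names `TARGET`, `TOLERANCE` and `out_of_spec` are chosen here and do not come from the text, and the limits are assumed to be inclusive.

```python
# Valve liner weights in grams, as listed above.
weights = [
    238.9, 238.3, 240.4, 241.0, 239.0, 239.3, 239.6, 236.9, 239.3, 240.1, 238.8, 240.7,
    241.0, 240.1, 239.3, 239.1, 239.5, 239.9, 238.9, 237.4, 238.2, 238.4, 239.2, 239.7,
    239.4, 240.1, 240.3, 239.5, 239.0, 238.6, 239.8, 240.3, 239.7, 238.7, 240.7, 240.0,
    240.1, 241.6, 239.8, 240.6, 239.7, 239.7, 240.4, 239.5, 238.6, 239.2, 237.6, 238.9,
    239.2, 240.3, 239.4, 240.8,
]

# Design specification 240 +/- 5 g (limits assumed inclusive).
TARGET = 240.0
TOLERANCE = 5.0
lower_limit = TARGET - TOLERANCE
upper_limit = TARGET + TOLERANCE

# Collect any weights that fall outside the specification limits.
out_of_spec = [w for w in weights if not lower_limit <= w <= upper_limit]

print("number of values:", len(weights))
print("smallest:", min(weights), "largest:", max(weights))
print("out of specification:", out_of_spec)
```

A check like this answers the yes/no question about the specification, but it says nothing about the shape of the data, which is why some organization is still needed.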
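Table 2.1 Stem and leaf display of the valve liner weights (leaves in order of occurrence)

Stem   Leaves
242
241    0 0 6
240    4 1 7 1 1 3 3 7 0 1 6 4 3 8
239    0 3 6 3 3 1 5 9 2 7 4 5 0 8 7 8 7 7 5 2 2 4
238    9 3 8 9 2 4 6 7 6 9
237    4 6
236    9
235

Table 2.2 Stem and leaf display of the valve liner weights (leaves in ascending order)

Stem   Leaves
241    0 0 6
240    0 1 1 1 1 3 3 3 4 4 6 7 7 8
239    0 0 1 2 2 2 3 3 3 4 4 5 5 5 6 7 7 7 7 8 8 9
238    2 3 4 6 6 7 8 9 9 9
237    4 6
236    9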
In the example, the first value is 238.9, so the leaf .9 is allocated to the stem
238, then .3 also to stem 238, .4 to 240, etc., as in Table 2.1. Immediately a
pattern emerges - the values cluster around a centre (stem 239), tapering off in
each direction. Even at this stage, it is apparent that although all the values lie
within the specification, there are more values in the lower half (239.9 and
below) than in the upper half (240.0 and above). Of course, this may be a
deliberate saving on materials!
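The allocation of leaves to stems described above can be sketched as follows. This is a minimal illustration, not a standard routine: the function name `stem_and_leaf` is invented here, and it assumes the values are recorded to one decimal place, so that the whole-gram part is the stem and the tenths digit is the leaf.

```python
from collections import defaultdict

# Valve liner weights in grams, in the order recorded.
weights = [
    238.9, 238.3, 240.4, 241.0, 239.0, 239.3, 239.6, 236.9, 239.3, 240.1, 238.8, 240.7,
    241.0, 240.1, 239.3, 239.1, 239.5, 239.9, 238.9, 237.4, 238.2, 238.4, 239.2, 239.7,
    239.4, 240.1, 240.3, 239.5, 239.0, 238.6, 239.8, 240.3, 239.7, 238.7, 240.7, 240.0,
    240.1, 241.6, 239.8, 240.6, 239.7, 239.7, 240.4, 239.5, 238.6, 239.2, 237.6, 238.9,
    239.2, 240.3, 239.4, 240.8,
]


def stem_and_leaf(values):
    """Group one-decimal values by whole-number stem, keeping leaves in input order."""
    stems = defaultdict(list)
    for v in values:
        # Work in whole tenths to avoid floating-point surprises:
        # 238.9 -> 2389 -> stem 238, leaf 9.
        stem, leaf = divmod(round(v * 10), 10)
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(weights)

# Print with the largest stem at the top, as in the display above.
for stem in sorted(display, reverse=True):
    print(stem, " ".join(str(leaf) for leaf in display[stem]))

# Count the values in the lower half (239.9 and below) and upper half (240.0 and above).
lower_half = sum(len(leaves) for stem, leaves in display.items() if stem <= 239)
upper_half = sum(len(leaves) for stem, leaves in display.items() if stem >= 240)
print("lower half:", lower_half, "upper half:", upper_half)
```

Unlike the hand-drawn display, this sketch prints only the stems that actually occur, so empty stems such as 242 and 235 do not appear.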
If required, the leaves within each stem can now be rearranged in ascending
order to give a complete ranking order which is useful for identifying quantiles
such as the median and first and third quartiles, as described in section 2.3.3.
This yields (deleting the unused stems 242 and 235) the presentation in Table 2.2.
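Once the values are ranked, the median can be read off by position. The sketch below assumes the usual convention for an even number of values (the mean of the two middle values). The quartiles are obtained from Python's `statistics.quantiles`, whose default interpolation rule may differ from the convention used in section 2.3.3, so those two figures should be treated as indicative only.

```python
import statistics

# Valve liner weights in grams.
weights = [
    238.9, 238.3, 240.4, 241.0, 239.0, 239.3, 239.6, 236.9, 239.3, 240.1, 238.8, 240.7,
    241.0, 240.1, 239.3, 239.1, 239.5, 239.9, 238.9, 237.4, 238.2, 238.4, 239.2, 239.7,
    239.4, 240.1, 240.3, 239.5, 239.0, 238.6, 239.8, 240.3, 239.7, 238.7, 240.7, 240.0,
    240.1, 241.6, 239.8, 240.6, 239.7, 239.7, 240.4, 239.5, 238.6, 239.2, 237.6, 238.9,
    239.2, 240.3, 239.4, 240.8,
]

# Complete ranking order, equivalent to reading the ordered display stem by stem.
ranked = sorted(weights)
n = len(ranked)

# With an even number of values, take the mean of the two middle values
# (the 26th and 27th of 52, i.e. indices 25 and 26).
median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2
print("n =", n, "median =", median)

# Quartile cut points using the library's default method; other conventions
# can give slightly different first and third quartiles.
q1, q2, q3 = statistics.quantiles(weights, n=4)
print("quartiles (library default method):", q1, q2, q3)
```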
52
2.3.1 Histogram
While the tally chart is often adequate for displaying the pattern, the presentation
may be improved by drawing a scaled bar chart or histogram. It is here that
the advantage of using equal class widths becomes apparent, as the histogram
Fig. 2.1 Histogram of valve liner data.

Class boundaries (g)    Frequency
236.75 to 237.25         1
237.25 to 237.75         2
237.75 to 238.25         1
238.25 to 238.75         5
238.75 to 239.25        10
239.25 to 239.75        13
239.75 to 240.25         8
240.25 to 240.75         8
240.75 to 241.25         3
241.25 to 241.75         1
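The class frequencies behind a histogram such as Fig. 2.1 can be counted as in the sketch below. It assumes equal classes of width 0.5 g with boundaries running from 236.75 g to 241.75 g, the values shown on the horizontal axis of the figure. The function name `class_frequencies` is chosen here for illustration, and the text bars are only a rough stand-in for a drawn histogram.

```python
from bisect import bisect_right

# Valve liner weights in grams.
weights = [
    238.9, 238.3, 240.4, 241.0, 239.0, 239.3, 239.6, 236.9, 239.3, 240.1, 238.8, 240.7,
    241.0, 240.1, 239.3, 239.1, 239.5, 239.9, 238.9, 237.4, 238.2, 238.4, 239.2, 239.7,
    239.4, 240.1, 240.3, 239.5, 239.0, 238.6, 239.8, 240.3, 239.7, 238.7, 240.7, 240.0,
    240.1, 241.6, 239.8, 240.6, 239.7, 239.7, 240.4, 239.5, 238.6, 239.2, 237.6, 238.9,
    239.2, 240.3, 239.4, 240.8,
]

# Eleven class boundaries: 236.75, 237.25, ..., 241.75 (ten classes of width 0.5 g).
# The boundaries fall between recordable values, so no weight lies on a boundary.
edges = [236.75 + 0.5 * i for i in range(11)]


def class_frequencies(values, edges):
    """Count how many values fall in each class [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] <= v < edges[-1]:
            raise ValueError(f"value {v} lies outside the class boundaries")
        # bisect_right gives the number of boundaries <= v; subtract 1 for the class index.
        counts[bisect_right(edges, v) - 1] += 1
    return counts


counts = class_frequencies(weights, edges)

# Crude text histogram: one '#' per observation in the class.
for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low:6.2f} to {high:6.2f}  {'#' * count} {count}")
```

Because the classes are of equal width, the bar lengths are directly proportional to the frequencies.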