
The Effect of the Design of the IBM Proposed UPC Symbol and Code on Scanner Decoding Reliability

David Savir


System Development Division, International Business Machines Corporation, Research Triangle Park, N.C. 27709, October 1972

International Business Machines Corporation, 1972

Contents

Foreword
Introduction and Summary
The Concept of Performance
Reasons for Error
Print Errors
Choice of Symbol to Neutralize Systematic Errors
Delta-Distance Decoding
Performance as a Function of the Random Error
The IBM Proposed Code
Detection of a Character
Error in Measurements
The Consequences of a $T_1$ or $T_2$ Error
Error in $T_4$ Measurement
Probabilities of Character Error
Error Detection and Correction
Distribution of Errors in a Block
The Derivation of Performance
Wand Performance
Scanning Performance
Allowable Edge Dislocation
Alignment of $\sigma_p$
Appendix A. Optimality of Decoding Decision Rules
Appendix B. Probabilities of $T_i$ Errors
Appendix C. Probabilities of Character Error
Appendix D. Probabilities of Errors in a Block
Appendix E. Wanding Performance Probabilities
Appendix F. Distribution of the Number of Scans
Appendix G. Probabilities of Success, Error, and Rescan for Values of m, n
Appendix H. Systematic Variation Effects of Print Quality
References

Illustrations

Figure 1. Effect of Smear on Marks
Figure 2. Effect of Inkspread on Mark Sequences
Figure 3. The Proposed Symbol
Figure 4. Section Through Symbol, Ignoring the Effects of Systematic Errors
Figure 5. Section Through Symbol, Taking into Account the Effects of Systematic Errors
Figure 6. The Character Set and its Symbolic Representation
Figure 7. $T_i$ Measurements
Figure 8. Edge Dislocations

Foreword

This technical paper has been prepared by IBM as part of the submission of the IBM Proposed UPC Symbol and Code to the Symbol Standardization Subcommittee. It is hereby offered to the Subcommittee for evaluation. The specific intent of this part of the submission is to provide the Subcommittee with the basic theoretical foundation on which the IBM Proposed UPC Symbol and Code was developed. In this connection, it should be noted that the paper deals primarily with questions of basic theoretical importance and thus does not necessarily address all possible parameters that may influence the process of scanner decoding. Similarly, while the work is based on valid, well-established techniques of mathematical modeling, the conclusions stated result from the application of several assumptions describing the scanning environment. Accordingly, the results presented here are not intended to constitute any express or implied warranty. It is our expectation that the assumptions, methodologies, and conclusions presented here will be reviewed and validated by the technical community concerned with the adoption of the UPC Symbol and Code.


Introduction and Summary

The degree of accuracy obtained in transforming the UPC symbol found on an item into its intended coded numeric value depends on (a) the structure and properties of the symbol and code and the nature of the decoding and error correction algorithms implemented in the scanning subsystem, and (b) the introduction of error, as may be caused by such factors as random noise, distortion resulting from the printing process used to prepare the label, excessive dust or moisture found on the label during the scanning process, and optical aberrations in the scanner. Within this monograph, the concept of performance is developed to evaluate the degree of accuracy theoretically obtainable in the scanning transformation process. When an item is scanned, all elements affecting the scanning transformation process may be within their tolerance limits, thereby facilitating a valid scan. Conversely, if one or more elements are beyond their limits, a measurement error will occur. Given sufficient redundancy in the symbol and code and a suitably defined scanning decoding algorithm, measurement errors on individual scans may be detected and corrected without the requirement for rescan, yielding an accurate transformation. Performance is evaluated in probabilistic terms related to three proportions of the items scanned: those which produce a successful decode (including errors which have been detected and corrected); those which are rejected and require rescan or manual (key) entry (errors which are detected but cannot be uniquely corrected); and those which produce an erroneous decode, in which one or more characters of the code are replaced by incorrect characters (errors which are either not detected within the scanning subsystem or which result from erroneous correction).

It should be noted that the concept of performance as developed in this paper does not relate to specific hardware designs. Rather, it relates to the theoretical impact of certain types of errors upon the overall transformation process. Certain characteristics of individual printing processes, scanning subsystem implementation limitations, types of errors not considered in this analysis, or other factors that are inconsistent with the assumptions used in the analysis can be expected to produce results other than those evaluated herein. As such, the probabilities of success and failure in scanning a label derived from the discussion herein should be interpreted with these assumptions taken into account. Only if these assumptions represent the real scanning process completely and accurately would the results obtained represent a measurable performance in the conventional sense. The appropriate use of the results of this analysis lies in the comparison of the IBM proposed code and symbol with any other code and symbol under consistent assumptions. Any choice of code and symbol is associated with a performance that can be evaluated either by the techniques to be described or by others applying more exactly to the methodology of scanning and decoding represented by such a choice.

Basic to the evaluation of performance is an understanding of the chosen process of scanning and decoding and, very particularly, the causes of decoding error entailed in that process. Upon that understanding a mathematical model of the scanning and decoding process can be structured to yield the performance of the code and symbol. Causes of potential decoding error can be both systematic and random. The code and symbol proposed by IBM were chosen to isolate and eliminate the effects of the systematic causes. This was achieved through the use of a self-clocking code, a symbol comprising rectangular bars and spaces, and a delta-distance decoding methodology. In our approach, the random effects are aggregated into edge dislocations as interpreted by the scanner. Because the probability of successful decoding depends on the magnitude of these edge dislocations, performance is derived as a function of the variance of the distribution of edge dislocation. For the purposes of this analysis, this variance can be assigned to a print tolerance and to a tolerance on scanner design. The proposed assignment is a standard deviation of 1.00 mil for print tolerance and 1.94 mil for tolerance on scanner design. Under these assumptions, these standard deviations yield a success rate of 0.9949, a reject rate of 0.005048, and a substitution rate of 0.000076. Similar results are obtained for wanding. A character is decoded by the correct interpretation of two or three ratios of delta-distance measurements. An error in one or more of these interpretations results either in an invalid character or in an undetected character error. Each character is encoded in odd parity, which makes the former outcome more likely than the latter. The probabilities of these outcomes depend on the standard deviation of edge dislocation and are tabulated in later sections.
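The print and scanner tolerances are treated as independent components of edge dislocation, so their variances add. The short Python sketch below applies this to the proposed 1.00 mil and 1.94 mil allocations. It then expresses the combined value in nominal bit lengths of 0.0136 in (13.6 mil), the unit used in the probability tables later in the paper. The function name is illustrative and does not come from the paper.

```python
import math

NOMINAL_BIT_LENGTH_MIL = 13.6  # nominal minimum bar width, 0.0136 in


def combined_sigma(sigma_print_mil, sigma_scanner_mil):
    """Total edge-dislocation standard deviation, assuming independent components.

    Returns (sigma in mils, sigma in nominal bit lengths).
    """
    sigma_mil = math.hypot(sigma_print_mil, sigma_scanner_mil)  # sqrt(sp^2 + ss^2)
    return sigma_mil, sigma_mil / NOMINAL_BIT_LENGTH_MIL


sigma_mil, sigma_bits = combined_sigma(1.00, 1.94)
print(f"sigma = {sigma_mil:.2f} mil = {sigma_bits:.3f} bit lengths")
# prints: sigma = 2.18 mil = 0.160 bit lengths
```

The combined value, about 0.16 bit lengths, lies inside the range of σ (0.04 to 0.20 of a mean unit) studied in the tables.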
The probabilities of encountering various error combinations in a symbol block are then deduced. These probabilities are needed to measure the success of the error detection and correction techniques employed. An arbitrarily good performance can theoretically be attained by holding the standard deviation of edge dislocation to a sufficiently low level. However, since the standard deviation cannot practicably be held very low, we use three techniques for improving performance beyond that which might be attained by straightforward decoding: parity on each character, modulus checking, and multiple scanning. Multiple scanning yields several candidates for decoding, modulus checking detects most errors, and parity, in conjunction with modulus checking, permits error correction. Rules for acceptance and rejection of decoded input are given, and the probabilities of making the correct decision are calculated, from which the performance capabilities are derived.

The Concept of Performance

In this analysis, we first develop the concept of the performance of a code and symbol and then use it to measure the probabilities of successful and unsuccessful decoding of encoded information under scanning. Because we are concerned with the robustness of the code and symbol, rather than the efficacy of the scanning hardware, we will examine the performance of the code and symbol without regard to the scanner. Clearly, if the symbol is well designed, and if it always appears as designed when it is scanned, then it will always be decoded correctly, assuming a sufficiently good scanner. Furthermore, under such assumptions, we would be indifferent to any of a large selection of possible codes and symbols. However, for reasons to be discussed in the next section, the scanned symbol does not appear as designed. The performance of the code and symbol then becomes a measure of the ability of the symbol to survive changes successfully. We conceptually separate the notions of code and symbol because only the symbol, the visible representation, is subject to changes, whereas both the symbol and the code, the binary representation of information, contain the power to resist the effect of the changes.

Reasons for Error

The symbol we consider is a set of marks and spaces printed on a label. In the scanning process a spot traverses the symbol, interprets the location of each edge of each mark, and decodes information accordingly. A decoding error occurs if and only if the location of one or more edges is interpreted incorrectly. One combination of code and symbol has a higher performance than another if the former can tolerate a greater edge dislocation than the latter while still decoding the information correctly. The edge of a mark is designed to be at a specific location. As the artwork is drawn, an error is introduced. Further errors are introduced in the processes of reduction and platemaking, and additional errors occur in the process of printing. The symbol as seen by the eye contains at each edge of every mark the sum of all these errors. The scanner cannot perceive the edge to be exactly where it is printed; errors are introduced by optical effects and by the effects of digital timing and discrete sampling. In addition, there are errors caused by the environmental degradation of the symbol, of which dirt, wrinkles, abrasion, and moisture are a few. The total dislocation of the edge of the mark as perceived by the scanner is the sum of all these errors. If this sum of errors on any edge exceeds some value, a decoding error will be made.

In order for the performance of the code and symbol to be high, the decoding logic must be given the ability either (1) to accept and decode edges which are greatly dislocated or (2) to isolate and eliminate the effects of key components of error to such a degree that the residual error is diminished. For any decoding logic the latter course is preferred, even if the former is feasible. We partition the sources of edge dislocation into print error, $e_p$, and system error, $e_s$, such that the total dislocation on an edge is $e = e_p + e_s$. Errors in artwork, platemaking, and printing are consolidated into $e_p$; errors due to optical effects, to the effects of digital timing and discrete sampling, and to environmental degradation are consolidated into $e_s$.

Print Errors

The errors contained in $e_p$ that affect the location of the edge of the mark are due to: (1) artwork, (2) platemaking, (3) inkspread, (4) smear, (5) edge roughness, (6) extraneous ink, (7) voids, and (8) expansion and contraction of the substrate. Errors in artwork consist of the random error in the line drawing and a systematic error in photoreduction. We distinguish between random and systematic errors. A random error affects an edge or a portion of an edge independently of the rest of the symbol. A systematic error affects all the edges in a similar manner. Errors in platemaking contain a systematic component, increasing or decreasing the size of all the marks, and a random component. The error of inkspread is due to overinking (or, conversely, underinking), which systematically increases (or decreases) the size of all the marks. The error of smear is due to a systematic ink deposit in the direction of motion of the paper. Edge roughness is a random effect whose intensity depends on the printing process and the paper. Extraneous ink and voids affect edges only when they are large enough to be falsely identified as a mark or when they intrude into the edge of a true mark. The effect of smear on marks of differing shape is shown in Figure 1. Arrows indicate optimal directions of traversal of the scanner spot with respect to the marks, ignoring the effect of smear. The presence of smear on each of the marks affects the optimal trajectory by differing amounts. We observe that mark (iv) of Figure 1 succeeds in isolating and eliminating the effect of smear: the spot does not traverse any edge affected by smear over the effective range of the mark.

Figure 1. Effect of Smear on Marks (panels (i) to (vi))

The effect of inkspread on sequences of marks of differing shape is shown in Figure 2. If the spot of the scanner follows a straight trajectory across the marks, the effect of inkspread on example (i) of Figure 2 is to add a constant increment to the width of each mark. In example (ii), a constant increment is added only when the spot passes through point A; otherwise, the differing radii of the edges of the marks cause the increment of inkspread to increase towards the center.

Other systematic effects result in a change of scale of one dimension with respect to the orthogonal dimension, i.e., an apparent expansion or contraction of the length or width of the symbol. (An estimation of the magnitude of systematic effects is made in Appendix H.)

Figure 2. Effect of Inkspread on Mark Sequences (panels (i) and (ii))

Choice of Symbol to Neutralize Systematic Errors

The effects of systematic errors are isolated and controlled by using a bar-coded symbol and an appropriate decoding scheme. The symbol is a sequence of long rectangular bars of several widths separated by spaces of several widths (Figure 3). By printing the symbol so that the bars are aligned with the motion of the paper through the printing press, the adverse effects of smear are controlled. The other systematic effects are controlled by delta-distance decoding [2].

Figure 3. The Proposed Symbol

Delta-Distance Decoding

We illustrate the power of delta-distance decoding by examining the phenomenon of inkspread. Let us separate the effect of the systematic error of inkspread from the other errors that contribute to edge dislocation. Schematically, the section across a sequence of bars and spaces through which the scanning spot passes is as shown in Figure 4. Here $a_0$ through $a_5$ represent the locations of the edges of the bars, with $a_0$, $a_2$, and $a_4$ denoting space-to-bar transitions and $a_1$, $a_3$, and $a_5$ denoting bar-to-space transitions. We assume that each of the edges is dislocated due to various errors, but that dislocation due to inkspread is not included. Suppose, in addition, that we have means of decoding the symbol based on the data $a_i$.

Figure 4. Section Through Symbol, Ignoring the Effects of Systematic Errors

Now let us apply an error $\varepsilon$ due to inkspread, where $\varepsilon$ is unknown (in fact, it will vary from print run to print run) but is consistent throughout any single print run. After the error $\varepsilon$ is applied, the sectioned symbol is as shown in Figure 5, where $b_i$ represent the locations of the edges of the bars corresponding to the locations denoted by $a_i$ under the transformation

$$b_i = a_i - (-1)^i \varepsilon .$$

The even-indexed locations are shifted to the left and the odd-indexed locations are shifted to the right. This increases the width of each bar by $2\varepsilon$, independent of the original width of the bar. The locations $b_i$ are the only bar edges perceivable to the scanner or to the eye. Now, for $i \ge 2$:

$$b_i - b_{i-2} = a_i - (-1)^i \varepsilon - a_{i-2} + (-1)^{i-2} \varepsilon = a_i - a_{i-2} + (-1)^{i-2} \varepsilon \bigl(1 - (-1)^2\bigr) = a_i - a_{i-2},$$

which directly yields sufficient information for the decoding of the symbol. Hence the process of decoding by measuring the distances between the leading edges of adjacent bars and between the trailing edges of adjacent bars isolates and controls the effect of the error $\varepsilon$ due to inkspread. This process is known as delta-distance decoding. By the same argument, any systematic error which widens or narrows each bar by the same amount is circumvented by delta-distance decoding. The systematic errors due to change of scale and distortion arise in platemaking and in the expansion and contraction of the paper. Their effect is controlled by delta-distance decoding used in conjunction with a reference measurement that is subject to the same change of scale or distortion. The properties which permit this to be done belong to the code chosen, rather than to the choice of symbol which represents it. Having selected a symbol, code, and method of decoding which control systematic errors, we can consider the residual errors to be independent, identically distributed random variables. This holds when the probability space is taken to be defined over events that can occur within the traversal of the spot over the symbol in one scan. In other words, if an error persists over some significant portion of a bar, on any one scan it will be encountered as if it were localized at the point at which the spot crossed the edge.
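The cancellation can be checked numerically with the Python sketch below. The edge locations $a_0$ to $a_5$ are hypothetical: in bit lengths, they describe the character 1222 followed by a one-bit bar. The inkspread $\varepsilon = 0.3$ is an arbitrary value, and the function names are illustrative. The bar widths grow by $2\varepsilon$, but the leading-to-leading and trailing-to-trailing distances are unchanged.

```python
def apply_inkspread(a, eps):
    """b_i = a_i - (-1)**i * eps: leading (even) edges move left, trailing (odd) edges right."""
    return [x - (-1) ** i * eps for i, x in enumerate(a)]


def delta_distances(edges):
    """b_i - b_(i-2) for i >= 2 (leading-to-leading and trailing-to-trailing)."""
    return [edges[i] - edges[i - 2] for i in range(2, len(edges))]


def bar_widths(edges):
    """Trailing edge minus leading edge for each bar."""
    return [edges[i + 1] - edges[i] for i in range(0, len(edges) - 1, 2)]


a = [0.0, 1.0, 3.0, 5.0, 7.0, 8.0]   # hypothetical a_0..a_5, in bit lengths
b = apply_inkspread(a, eps=0.3)

print(delta_distances(a))            # [3.0, 4.0, 4.0, 3.0]
print(delta_distances(b))            # same values, up to floating-point rounding
print(bar_widths(a), bar_widths(b))  # each bar is 0.6 (= 2 * eps) wider in b
```

The four delta distances are exactly the quantities later called $T_1$ to $T_4$ (3, 4, 4, and 3 for this character). This is why the decoder can ignore any uniform widening of the bars.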

Figure 5. Section Through Symbol, Taking into Account the Effects of Systematic Errors

Performance as a Function of the Random Error

We have partitioned the print error $e_p$ into two components: a systematic component, whose effects we can ignore, and a component of residual error, $\tilde e_p$, which is random. We can similarly treat the system error, $e_s$, to yield a systematic component and a random component, $\tilde e_s$. Such treatment is outside the scope of this report, however, and belongs rather to the domain of scanner design. We define the total random error of edge dislocation as

$$\tilde e = \tilde e_p + \tilde e_s .$$

The random error in bar edge dislocation is taken as normally distributed with mean zero and variance $\sigma^2$. The assumption of the normal distribution is plausible because the total random error is the sum of many independent random errors, each of which can take any real value, positive or negative. The performance of the code and symbol can now be calculated as a function of $\sigma$, and the largest value of $\sigma$ which will yield a specified performance can be found. Thus, since $\tilde e_p$ and $\tilde e_s$ are independent,

$$\sigma^2 = \sigma_p^2 + \sigma_s^2 ,$$

and the analysis can be used to derive a print specification and a functional specification for the scanner, although the specific allocation of variance between $\tilde e_p$ and $\tilde e_s$ cannot be derived from this analysis.

The IBM Proposed Code

Figure 6. The Character Set and its Symbolic Representation (code numbers 1123, 1222, 1321, 1141, 2113, 2212, 2311, 2131, 3121, 4111)

The code is a binary representation of information, containing one-bits and zero-bits. Each character comprises four runs* of bits beginning with a one-bit. The total number of bits in all four runs is seven, and the total number of one-bits (those in the first and third runs) is odd. Runs of one-bits are represented in the symbol by bars, runs of zero-bits by spaces. Thus each character contains two bars and two spaces. A block of code contains six characters: five characters containing UPC information and one check character. The symbol contains two blocks. The character set is shown in Figure 6. The code numbers to the right of the character representations show the number of bits in each of the four runs. The decimal digit that each character represents can be assigned arbitrarily and is irrelevant to the analysis. We will call the code numbers $c = (c_1, c_2, c_3, c_4)$, where $c_i$ indicates the length of the i-th run. Odd values of i are associated with bars and even values with spaces.

*A run of bits is a sequence of like bits, all one-bits or all zero-bits.

Detection of a Character

Four delta-distance measurements ($b_i - b_{i-2}$) are taken over each character. These are named $T_1$, $T_2$, $T_3$, and $T_4$ (Figure 7). The measurements are normalized to a unit such that, if no edge dislocation occurred, $T_1$ would equal $c_1 + c_2$, $T_2$ would equal $c_2 + c_3$, $T_3$ would equal $c_3 + c_4$, and $T_4$ would equal $c_4 + c_1$(next character). However, the left-hand sides of these expressions are real numbers derived from measurements and the right-hand sides are integers, so the two sides will almost surely never be exactly equal. For simplicity, let

$$t_1 = c_1 + c_2, \qquad t_2 = c_2 + c_3, \qquad t_3 = c_3 + c_4, \qquad t_4 = c_4 + c_1(\text{next character}).$$

Thus the $t_i$ are the desired values for the measured $T_i$. A correct decode occurs if and only if each necessary $T_i$ measurement is interpreted as the associated number $t_i$ of the character being scanned. The values of $t_i$ are shown in Table 1.

Figure 7. $T_i$ Measurements

Table 1. Values of $t_i$

Character   t1   t2   t3   t4
1123        2    3    5    3 + c1(next)
1222        3    4    4    2 + c1(next)
1321        4    5    3    1 + c1(next)
1141        2    5    5    1 + c1(next)
2113        3    2    4    3 + c1(next)
2212        4    3    3    2 + c1(next)
2311        5    4    2    1 + c1(next)
2131        3    4    4    1 + c1(next)
3121        4    3    3    1 + c1(next)
4111        5    2    2    1 + c1(next)

We note that $t_4$ depends on the width of the first bar of the next character. We note also that $t_1 + t_2$ is an odd number. This follows from $t_1 + t_2 = c_1 + 2c_2 + c_3$, where $c_1 + c_3$ is an odd number. Each character has a unique value of the pair $(t_1, t_2)$ except for two cases: 1222 and 2131 share (3, 4), and 2212 and 3121 share (4, 3). For each such pair of characters, unique decoding requires the value of $t_4$ in conjunction with $c_1$ of the next character. The value of $c_1$ is obtained after the next character is decoded, which, in turn, may require prior decoding of the subsequent character. Following the sixth character of the block there is always a single bar ($c_1 = 1$) of a framing pattern. This bar guarantees the decodability of the sixth character without further information, which in turn guarantees the decodability of the prior characters taken in succession.

We denote the reference measurement as $T = T_1 + T_3$, which we normalize to 7 units to correspond with the number of bits in the character. Each character requires one $T_1$ measurement, one $T_2$ measurement, and, with probability 0.4, one $T_4$ measurement, in addition to the reference measurement $T$. We are concerned with two distinct measures. One is the unit, a random variable, equal in length to $T/7$. The other is the bit length, equal to the mean of the unit multiplied by a constant. There are two orthogonal scanning paths which may cross a character. If one is normal to the bars of the character, the bit length on that path is equal to the mean of the unit. The other path will then be parallel to the bars and will not cross the character. If one path is inclined to the normal of the bars at an angle $\alpha$, the other will be inclined at an angle $\pi/2 - \alpha$. The mean of the unit on the first path will then be the bit length multiplied by $\sec\alpha$, and on the second path the bit length multiplied by $\csc\alpha$. There will always be one path whose mean unit does not exceed the bit length multiplied by $\sqrt{2}$, and the spot will always decode on that path. It may also decode on the other path if the mean unit is not too large. Clearly, the angle $\alpha$ is random, determined by the random orientation of the item label with respect to the scanner. The nominal bit length in the symbol is 0.0136 inches or, equivalently, the duration of time over which the scanning spot traverses a bar of width 0.0136 inches. However, the systematic effects of change of scale may yield a bit length different from the nominal.

Error in Measurements

The error, e, of edge dislocation affects every edge randomly. In Figure 8 we illustrate a realization of this error on the three edges necessary for a $T_1$ measurement; $e_i$ is the dislocation of the i-th edge. The length of $T$ is $7 + e_1 + e_3$ mean units. The length of $T_1$ is $t_1 + e_1 + e_2$ mean units, where $t_1$ can be 2, 3, 4, or 5.

Figure 8. Edge Dislocations

Each $e_i$ is normally distributed with variance $\sigma^2$. Treating $\sigma^2$ as a variable, we will study values of $\sigma$ (standard deviation) ranging from 0.04 to 0.2 of a mean unit. The linearity of the symbol will enable us to express results in terms of bit lengths with errors measured normal to the bars. We will take a bit length to be the nominal minimum bar width, 0.0136 in. Since $e_1$, $e_2$, and $e_3$ are independent, $T$ is normally distributed with mean 7 and variance $2\sigma^2$, and $T_1$ is normally distributed with mean $t_1$ and variance $2\sigma^2$. We establish the following decision rules for interpreting the value of $T_1$, expressed in units of $T/7$:

$$
\hat t_1 =
\begin{cases}
5 & \text{if } 4\tfrac12 \le 7T_1/T,\\
4 & \text{if } 3\tfrac12 \le 7T_1/T < 4\tfrac12,\\
3 & \text{if } 2\tfrac12 \le 7T_1/T < 3\tfrac12,\\
2 & \text{if } 7T_1/T < 2\tfrac12,
\end{cases}
$$

where $\hat t_1$ is the value assigned to $t_1$ as a consequence of the decision. The decision is correct when $\hat t_1 = t_1$. The optimality of the decision rules is derived in Appendix A. Since $7T_1/T$ always satisfies exactly one of the four conditions, a $T_1$ error occurs when the true $t_1$ differs from the value that the rule assigns to $\hat t_1$. The probability, u, of a $T_1$ error is derived in Appendix B and shown below in Table 2.

Table 2. Probability of a $T_1$ Error

σ    0.04     0.06     0.08     0.10     0.12     0.14     0.16     0.18     0.20
u    <0.0001  <0.0001  <0.0001  <0.0001  0.00058  0.00302  0.00914  0.01946  0.03462

The occurrence of a $T_2$ error is analogous to that of a $T_1$ error, using the same decision rules with the index 2 substituted for 1. The probability, v, of a $T_2$ error is derived in Appendix B and shown below in Table 3.

Table 3. Probability of a $T_2$ Error

σ    0.04     0.06     0.08     0.10     0.12     0.14     0.16     0.18     0.20
v    <0.0001  <0.0001  <0.0001  0.00143  0.00716  0.01999  0.03944  0.06437  0.09256

It may be surprising at first glance that $T_2$ errors are so much likelier than $T_1$ errors when the two measurements are analogous. The difference between $T_1$ and $T_2$ that explains the large discrepancy between u and v lies in the positive correlation of $T_1$ and $T$ on the one hand, and the statistical independence of $T_2$ and $T$ on the other.* Because $T_1$ and $T$ share the edge dislocation $e_1$, part of its effect cancels in the ratio $T_1/T$; $T_2$ shares no edge with $T$, so no such cancellation occurs. We have tacitly assumed that if a $T_1$ or $T_2$ error occurs, the value of $\hat t_1$ or $\hat t_2$ will differ from $t_1$ or $t_2$, respectively, by 1. For example, if $t_1 = 3$ and $7T_1/T$ exceeds $3\tfrac12$ but is less than $4\tfrac12$, the result is an erroneous decision that $\hat t_1 = 4$. The probability that t and $\hat t$ differ by more than 1 is extremely small (less than 0.0001) for all values of $\sigma$ considered, and will be ignored. In our example, if $t_1 = 3$, then it is extremely unlikely that $7T_1/T$ will exceed $4\tfrac12$.

The Consequences of a $T_1$ or $T_2$ Error

If either a $T_1$ or a $T_2$ error occurs in a character, but not both, then the sum $\hat t_1 + \hat t_2$ becomes even, indicating that the character has been decoded as invalid. If both a $T_1$ and a $T_2$ error occur in a character, then the sum $\hat t_1 + \hat t_2$ remains odd and an undetected character error occurs. Since $T_1$ and $T_2$ are independent,** we derive the probability of an undetected character error due to $T_1$ and $T_2$ alone,

$$a = uv,$$

and the probability of an invalid character,

$$b = u + v - 2uv.$$

Table 4. Probabilities of Undetected Character Error Due to $T_1$ and $T_2$ Alone (a) and of an Invalid Character (b)

σ    0.04      0.06      0.08      0.10      0.12      0.14     0.16     0.18     0.20
a    <0.00001  <0.00001  <0.00001  <0.00001  <0.00001  0.00006  0.00036  0.00126  0.00320
b    <0.0002   <0.0002   <0.0002   0.00143   0.00774   0.02289  0.04786  0.08131  0.12078

*Independence of $T_2$ and $T$ implies the possibility that $T_2$ may exceed $T$, which is absurd. However, the probability of such an occurrence is infinitesimal, since $\sigma$ is a small number. Indeed, the smallness of $\sigma$ compared to $T_i$ is a necessary condition for assuming normality of $e_i$.

**The independence of the $T_1$ and $T_2$ measurements does not imply the independence of $T_1$ and $T_2$ errors. However, the correlation is small, and its sign varies with the values of $t_1$ and $t_2$. The net effect of the correlation weighted over all values of $t_1$ and $t_2$ is negligible.

Table 5. Probability of a $T_4$ Error

σ    0.04     0.06     0.08     0.10     0.12     0.14     0.16     0.18     0.20
w    <0.0001  <0.0001  <0.0001  0.00037  0.00230  0.00766  0.01684  0.02941  0.04436
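The Python sketch below reproduces the magnitudes in Tables 2 to 4 and the gap between u and v. It is not the Appendix B derivation, which is not reproduced here, and it rests on these stated assumptions:

- the ten characters are equally likely;
- the ratio $7T_i/T$ is linearized to first order in the edge dislocations;
- an error occurs when the linearized value crosses a half-integer decision boundary, of which there is only one for t = 2 or t = 5.

The labels e4 and e5 for the two trailing edges used by $T_2$ are introduced here for illustration. Under these assumptions the sketch gives values close to the tabulated ones, roughly u ≈ 0.035 and v ≈ 0.093 at σ = 0.20.

```python
import math

# Code numbers (c1, c2, c3, c4) of the ten characters (Figure 6).
CODES = ["1123", "1222", "1321", "1141", "2113",
         "2212", "2311", "2131", "3121", "4111"]


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def misread(t, sd):
    """P(the normalized measurement is assigned a value other than t).

    Bands: below 2.5 -> 2, [2.5, 3.5) -> 3, [3.5, 4.5) -> 4, from 4.5 up -> 5,
    so t = 2 and t = 5 can be misread on one side only.
    """
    tail = upper_tail(0.5 / sd)
    return tail if t in (2, 5) else 2.0 * tail


def character_error_probs(sigma):
    """Return (u, v, a, b) for edge-dislocation std. dev. sigma (in mean units).

    Assumptions: equally likely characters; first-order expansion of 7*Ti/T
    with T = 7 + e1 + e3 and T1 = t1 + e1 + e2 sharing edge e1, while
    T2 = t2 + e4 + e5 (hypothetical labels) shares no edge with T.
    """
    u = v = 0.0
    for code in CODES:
        c1, c2, c3, _ = (int(ch) for ch in code)
        t1, t2 = c1 + c2, c2 + c3
        # 7*T1/T - t1 ~ (1 - t1/7) e1 + e2 - (t1/7) e3: the shared edge partly cancels
        sd1 = sigma * math.sqrt((1 - t1 / 7) ** 2 + 1 + (t1 / 7) ** 2)
        # 7*T2/T - t2 ~ e4 + e5 - (t2/7)(e1 + e3): no cancellation
        sd2 = sigma * math.sqrt(2 + 2 * (t2 / 7) ** 2)
        u += misread(t1, sd1) / len(CODES)
        v += misread(t2, sd2) / len(CODES)
    a = u * v               # both T1 and T2 in error: undetected character error
    b = u + v - 2 * u * v   # exactly one in error: invalid character
    return u, v, a, b


for sigma in (0.12, 0.16, 0.20):
    u, v, a, b = character_error_probs(sigma)
    print(f"sigma={sigma:.2f}  u={u:.5f}  v={v:.5f}  a={a:.5f}  b={b:.5f}")
```

The two standard deviations in the loop make the paper's explanation concrete. The edge shared by $T_1$ and $T$ enters with coefficient $1 - t_1/7$ rather than 1, so its effect is damped. $T_2$, by contrast, picks up the dislocations of both of its own edges plus a share of both edges of $T$.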

Error in $T_4$ Measurement

If a character is found to be valid from the $T_1$ and $T_2$ measurements, then a $T_4$ measurement may be required to distinguish between two characters whose values of $t_1$ and $t_2$ are identical. The value of $C_4$, the width of the second space, corresponding to the code number $c_4$, is needed. This value cannot be measured directly, for if it were, the systematic error avoided by taking only leading-to-leading and trailing-to-trailing edge measurements would be introduced. Suppose that $\sigma^2 = 0$ (i.e., no random error exists) but that a systematic error $\varepsilon$ is present, extending every edge by $\varepsilon$ so that every bar is widened by $2\varepsilon$. There is no error in the measurements $T_1$, $T_2$, and $T_4$, since the systematic error $\varepsilon$ is compensated in each measurement. There is, however, an error of $2\varepsilon$ in the measurement of the first bar. Hence, if $C_4$ is obtained by subtracting the width of the first bar of the next character from $T_4$, $C_4$ will be in error by $-2\varepsilon$. We wish to obtain a measure of $C_4$ that is free from systematic error. After the next character is decoded (or the last character of the chain is decoded), the number of bits, K, in the first bar is known, given no error in that character. The width of this bar after systematic effects have been excluded will be $KT/7$ units (where T is the length of this next character). This corrected width contains no systematic error; hence the derived measurement

$$C_4 = T_4 - \frac{KT}{7}$$

is free from systematic error. The value $c_4$ can then be deduced from $C_4$, expressed in units of $T/7$, by the decision rules employed for $T_1$ and $T_2$. The probability of a $T_4$ error, w, is derived in Appendix B and shown in Table 5.

Probabilities of Character Error

The sources of character error due to edge dislocation lie in the measurement of $T_1$, $T_2$, and $T_4$. Consequently, the probability of a character error is a function of the probabilities a, b, and w, which in turn depend on the value of $\sigma^2$. The probability of an error in a character is independent of the probability of an error in any other character as long as only $T_1$ and $T_2$ measurements are required. This independence is assured by the self-clocking nature of the code, in which the only measurements used to decode a character are found in the character itself. However, this independence breaks down when a $T_4$ measurement is needed. The methodology that preserves insensitivity to systematic edge dislocations relies on the correct interpretation of $c_1$ of the next character to interpret $c_4$ of the character being decoded. Hence the probability of an error in the k-th character depends on how well the (k+1)-th character is decoded, which depends in turn on the probability of error in the (k+1)-th character. The sixth (and last) character of a block depends on no other character, because the subsequent value of $c_1$ is fixed at one bit length: this $c_1$ value is part of a framing pattern rather than of a character. Our approach, then, will be to evaluate the error probabilities of the last character of the block, and then to evaluate the error probabilities of a prior character conditioned on the state of its successor. All desired results will be obtained from this set of conditional probabilities. A character can be in one of three states: decoded correctly, decoded validly but incorrectly (undetected character error), and decoded invalidly. The conditional probabilities of character error are derived in Appendix C.

Error Detection and Correction

Symbol performance does not depend solely on the probability of making a decoding error. The effects of decoding errors can be greatly reduced by taking appropriate action when an error occurs. To this end we require the ability not merely to evaluate the probability of error, but to detect an actual error and, if possible, to correct it. The code contains one check digit modulo 10.* If no error exists in the symbol, the syndrome (the weighted sum modulo 10 of the characters plus check digit) will equal zero. If exactly one character is in error, the syndrome will be nonzero.** If all the characters are valid, the location of the error is unknown. If, however, exactly one of the characters is invalid, and therefore does not contribute to the syndrome calculation, the location of the error is known, and the value of the character, multiplied by its weight in the syndrome calculation, is the complement of the syndrome.*** If more than one character is in error, detection and correction are not as secure. If there are two or more invalid characters, the error is always detected, but correction is impossible, since there are many ways of reconstituting the invalid characters that satisfy the modulus check. If there is exactly one invalid character and at least one erroneous but valid character, the correction process will certainly produce an error, since the invalid character will be changed to an erroneous character. Hence error correction should always be used with caution, since the logic has no way of determining the exact number of errors among valid characters. The decoding logic used, as will be shown, is cognizant of this fact, and the risk of "correcting" erroneously has been held to a negligible level for the scanner. If there are no invalid characters and more than one character is in error, 1/9 of such conditions will generate a zero syndrome. Hence we require knowledge of the probability of each type of error condition and of the likelihood of correctly interpreting each condition and taking the correct action.

*There are two check characters in the symbol, one in each block. These two characters encode one check digit and also additional information. See reference [1].
**Unless the character should be zero and is decoded invalidly.
***This does not hold when a check character is invalid, for then the syndrome is lost.

Distribution of Errors in a Block

Having computed the conditional probabilities of character error, we can derive the distributions of error in a block of six characters. We are concerned with specific points of the distribution in our evaluation of symbol performance. We employ the following:

p0, the probability of no errors in a block.

p0*, the probability of no invalid characters in a block. Such a block is not correctable: if any errors exist, their location is unknown. If only one error exists in such a block and it is paired with an error-free block, then the modulus check will always detect the error.**** Hence we will require

pu, the probability of no invalid characters and exactly one undetected error in a block.

It follows that p0* − p0 − pu is the probability of no invalid characters and at least two undetected errors in a block.

pd, the probability of exactly one invalid character and no undetected errors in a block. If such a block is paired with an error-free block, then the modulus check will permit correction.

pd*, the probability of exactly one invalid character in a block. Such a block might be correctable, but successful correction can occur only if no undetected errors are present, of which the conditional probability is pd/pd*. Other conditions are also necessary for successful correction of errors.

The following conditional probabilities will also be necessary:

q0 = p0/p0*, the probability that a block that appears error-free really is free from error;
q1 = pu/p0*, the probability that a block that appears error-free contains exactly one error;
q2 = 1 − q0 − q1, the probability that a block that appears error-free contains at least two errors.

Expressions for these probabilities are derived, as functions of a, b, and w, in Appendix D, and are tabulated in Tables 6-11.

****Unless the error is in a check character. The consequences of invalidity in a check character have not been fully investigated as yet. The most important effect will be to yield valid but unchecked information, the precise consequence of the correction of an invalid character. For this reason we treat check characters in the same way as information characters.

Table 6. Probability of No Errors in a Block

σ (bit-lengths)  0.04     0.06     0.08     0.10     0.12     0.14     0.16     0.18     0.20
p0               >0.9985  >0.9985  >0.9985  0.99057  0.94919  0.85409  0.71385  0.55540  0.40591

Table 7. Probability of No Invalid Characters in a Block

σ (bit-lengths)  0.08     0.10     0.12     0.14     0.16     0.18     0.20
p0*              >0.9988  0.99145  0.95445  0.87028  0.74508  0.60119  0.46194

Table 8. Probability of Exactly One Undetected Error and No Invalid Characters in a Block

σ (bit-lengths)  0.06      0.08     0.10     0.12     0.14     0.16     0.18     0.20
pu               <0.00025  0.00025  0.00073  0.00437  0.01342  0.02570  0.03727  0.04485

Table 11. Conditional Probabilities of No Undetected Errors (q0), Exactly One Undetected Error (q1), and at Least Two Undetected Errors (q2) in a Block, Given No Invalid Characters in a Block

σ (bit-lengths)  0.08     0.10     0.12     0.14     0.16     0.18     0.20
q0               0.9997   0.99911  0.99449  0.98140  0.95808  0.92382  0.87871
q1               0.00025  0.00074  0.00458  0.01542  0.03449  0.06199  0.09709
q2               0.00005  0.00015  0.00093  0.00318  0.00743  0.01419  0.02420
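As a quick consistency check on the definitions q0 = p0/p0*, q1 = pu/p0*, and q2 = 1 − q0 − q1, the sketch below recomputes the conditional probabilities from the p0, p0*, and pu entries tabulated for σ = 0.10 to 0.20 and reports the largest discrepancy from the tabulated q values. The numbers are transcribed by hand from Tables 6, 7, 8, and 11; the dictionary layout and the function names are illustrative only.

```python
# Values transcribed from Tables 6, 7, 8 and 11; sigma in bit-lengths.
TABULATED = {
    # sigma: (p0, p0_star, pu, q0, q1, q2)
    0.10: (0.99057, 0.99145, 0.00073, 0.99911, 0.00074, 0.00015),
    0.12: (0.94919, 0.95445, 0.00437, 0.99449, 0.00458, 0.00093),
    0.14: (0.85409, 0.87028, 0.01342, 0.98140, 0.01542, 0.00318),
    0.16: (0.71385, 0.74508, 0.02570, 0.95808, 0.03449, 0.00743),
    0.18: (0.55540, 0.60119, 0.03727, 0.92382, 0.06199, 0.01419),
    0.20: (0.40591, 0.46194, 0.04485, 0.87871, 0.09709, 0.02420),
}


def conditional_qs(p0, p0_star, pu):
    """Return (q0, q1, q2) for a block that shows no invalid characters."""
    q0 = p0 / p0_star              # really error-free
    q1 = pu / p0_star              # exactly one undetected error
    return q0, q1, 1.0 - q0 - q1   # two or more undetected errors


def max_discrepancy(row):
    """Largest |computed - tabulated| over q0, q1, q2 for one table column."""
    p0, p0_star, pu, *tabulated_q = row
    computed = conditional_qs(p0, p0_star, pu)
    return max(abs(c - t) for c, t in zip(computed, tabulated_q))


for sigma, row in TABULATED.items():
    print(f"sigma = {sigma:.2f}: max |computed - tabulated| = {max_discrepancy(row):.1e}")
```

Because the tabulated inputs are rounded to five decimals, agreement can only be expected to within a few units in the fifth decimal place.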

The Derivation of Performance

The code and symbol were designed to be used either with a fixed-head scanner or with a hand-held wand. Each of these tools constitutes a separate environment of operation, and the required performance specification of the code and symbol in each environment is different. We shall show that the performance, for a given value of σ, will also be different for each environment. The two symbol blocks of six characters are scanned individually, once each when wanded and several times each when passed over the fixed-head scanner. A successful scan occurs when a chosen scan is correct and the decoding logic affirms that it is correct. A substitution error occurs when a chosen scan is incorrect but the decoding logic affirms that it is correct. A reject occurs when the decoding logic is unable to choose or affirm a scan. We note that almost all substitution errors will ultimately be detected at the system controller, since the allocation of codes is sparse over the domain of possible allocations. However, such late detection is deleterious to system performance, and our objective is to maintain a high rate of performance measured at the point of scan. A reject occurs if no scan contains completely valid or correctable information (i.e., in every scan there are at least two invalid characters), if the chosen scan is not affirmed as correct, or if no scan can be uniquely chosen. The probabilities of the occurrence of error have been determined. We now proceed with the logic of selecting scans and affirming their correctness. This logic differs between the wand and the fixed-head scanner because of the multiplicity of fixed-head scans. We discuss the wand first, since the analysis is simpler.

Table 9. Probability of Exactly One Invalid Character and No Undetected Errors in a Block

σ (bit-lengths)  0.06     0.08     0.10     0.12     0.14     0.16     0.18     0.20
pd               <0.0008  <0.0008  0.00568  0.02966  0.08047  0.14529  0.20102  0.23099

Table 10. Probability of Exactly One Invalid Character in a Block

σ (bit-lengths)  0.08     0.10     0.12     0.14     0.16     0.18     0.20
pd*              <0.0008  0.00568  0.02978  0.08155  0.14981  0.21284  0.25383

Wand Performance

The wand is passed once over each block, yielding a single scan of each block. Each block will appear to be correct (containing no invalid characters) with probability p0*. Each block will be correct with probability p0. If each block appears to be correct and is correct, the modulus check will affirm correctness. If each block appears to be correct but exactly one error is present in one of the blocks, then the modulus check will detect the error and reject the scan. If, however, two or more undetected errors are present, then the error will be detected and rejected by the modulus check with probability 8/9 (where the modulus is 10), and the incorrect scan will be affirmed to be correct with probability 1/9. If there is exactly one invalid character in the two blocks, then error correction can be attempted. If there are no other errors present, the correction will be successful. If there is at least one other (undetected) error present, then the attempted correction will yield an erroneous scan, which will certainly satisfy the modulus check and be affirmed correct. If there are two or more invalid characters, the scan is rejected. Performance probabilities are derived in Appendix E, yielding, for σ = 0.12 bit-lengths,

Pr[success] = 0.95727
Pr[reject] = 0.04200
Pr[substitution error] = 0.00073.

Figure 9. Span of the Symbol

We consider wanding to be adequate if the success rate exceeds 0.9 and the substitution error rate is less than 0.001. Hence a value of σ = 0.12 bit-lengths meets the performance specification for the wand. A value of σ = 0.12 bit-lengths corresponds to 1.63 mils for a bit-length of 13.6 mils. Hence a value of σ = 1.63 mils for total error will allow adequate wanding performance.
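The wanding figures can be reproduced from the closed forms of Appendix E. The sketch below is a minimal illustration, assuming the rounded values tabulated for σ = 0.12 bit-lengths (p0, p0*, pu, pd, pd* from Tables 6 to 10); `wand_performance` is a hypothetical helper name, and reject is taken as the complement of the other two outcomes, which is legitimate because the three Appendix E expressions sum to one.

```python
def wand_performance(p0, p0_star, pu, pd, pd_star):
    """Pr[success], Pr[reject], Pr[substitution error] for one wand pass over
    two blocks, using the Appendix E expressions."""
    # No invalid characters in either block, but two or more undetected errors.
    two_or_more = p0_star**2 - p0**2 - 2 * p0 * pu
    # Both blocks clean, or exactly one invalid character and a correct repair.
    success = p0**2 + 2 * p0 * pd
    # 1/9 of multi-error reads pass the check; a repair with a hidden error is wrong.
    substitution = two_or_more / 9 + 2 * (p0_star * pd_star - p0 * pd)
    # The three outcomes are exhaustive, so reject is the remainder.
    reject = 1.0 - success - substitution
    return success, reject, substitution


# Rounded table values for sigma = 0.12 bit-lengths.
success, reject, substitution = wand_performance(
    p0=0.94919, p0_star=0.95445, pu=0.00437, pd=0.02966, pd_star=0.02978)
print(f"success={success:.5f} reject={reject:.5f} substitution={substitution:.5f}")
print("meets wand specification:", success > 0.9 and substitution < 0.001)
```

With these inputs the sketch gives success ≈ 0.95727, reject ≈ 0.04200, and substitution ≈ 0.00073, matching the quoted values to rounding.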

Scanning Performance

The performance of the code and symbol when used with a fixed-head scanner is superior to the performance when wanded, because there is added the redundancy of multiple scans from which to select the information for transmission. The number of scans obtained on one pass of the item depends upon the following: the span, that length of bar over which the spot can complete the scan when started from that bar edge (Figure 9); the item movement velocity, which determines the duration of the access of the span; and the period of the spot, which determines the number of scans during the duration of access. Further, the span depends on the angle of attack, the inclination of the bars of the symbol to a reference direction. The scanning spot traverses an X-shaped trajectory such that the symbol block will always cross two legs of the X. When the angle of attack is zero, the symbol is symmetric to both legs of the X and the span with respect to each leg is the same, 0.300 inch. As the angle of attack changes, the span with respect to one of the legs increases while the span with respect to the other leg decreases. There are ranges of angle of attack over which a scan on only one of the legs will encompass all the bars of the block. From these considerations, and since both the item movement velocity and the angle of attack are random variables, the number of scans obtained on an item pass is also a random variable. The distribution of this variable is derived in Appendix F and tabulated in Table 12.

Suppose that K scans are made on a pass. We assume, because of the proximity of the blocks and their parallel alignment, that each block will be scanned K times. Suppose that i scans from the first block and j scans from the second block are free from invalid characters, i ≤ K, j ≤ K, and suppose that m scans from the first block and n scans from the second block are error-free, m ≤ i, n ≤ j. Although we can calculate the probabilities of obtaining m and n good scans respectively, this does not help us in identifying them once we have them.

Table 12. Probability of Obtaining K Scans

Number of scans K    Probability of obtaining K scans, N(K)
0                    0.0000564
1                    0.00352
2                    0.0199
3                    0.0433
4                    0.0653
5                    0.0781
6                    0.0858
7                    0.0715
8                    0.1025
9                    0.0718
10 or more           0.4582

We make the following assumption: if two scans are identical, then both are correct, even if obtained after the correction of an invalid character. The rationale for this assumption is that the symbol block is scanned in different places and that it is extremely unlikely for random errors to be identical.* We make the following rules for accepting and rejecting scans:
(1) If the assumption is satisfied, accept a scan satisfying the assumption.
(2) If the assumption is not satisfied, accept a pair of scans, one from each block, satisfying the modulus check, even if obtained after the correction of an invalid character, but only if no other pair of scans (including those obtained after the correction of an invalid character) satisfies the modulus check. Otherwise reject.

The following conditional probabilities of success, reject, and substitution error are derived assuming the validity of the assumption. If both m and n are greater than one, the good scans can always be recognized; hence, for m > 1, n > 1,

Pr[success] = 1, Pr[reject] = 0, Pr[substitution error] = 0.

Probabilities of success, substitution error, and reject for m = 1, n > 1; m = 0, n > 1; m = 1, n = 1; m = 0, n = 1; and m = 0, n = 0 are derived in Appendix G. By symmetry we obtain the probabilities for n = 1, m > 1; n = 0, m > 1; and n = 0, m = 1. These probabilities are, of course, conditioned on the values of m and n. The probability of obtaining m error-free scans from i scans free from invalid characters is

$$ \binom{i}{m}\, q_0^{\,m}\,(1-q_0)^{\,i-m}, $$

the probability of obtaining n error-free scans from j scans free from invalid characters is

$$ \binom{j}{n}\, q_0^{\,n}\,(1-q_0)^{\,j-n}, $$

the probability of obtaining i scans free from invalid characters from a total of K scans is

$$ \binom{K}{i}\, (p_0^*)^{\,i}\,(1-p_0^*)^{\,K-i}, $$

and the probability of obtaining j scans free from invalid characters from a total of K scans is

$$ \binom{K}{j}\, (p_0^*)^{\,j}\,(1-p_0^*)^{\,K-j}. $$

Finally, the probability of obtaining K scans is N(K). By summing the products of all these conditional probabilities we obtain the measures of performance. For σ = 0.16 bit-lengths we obtain

Pr[success] = 0.9949
Pr[reject] = 0.005048
Pr[substitution error] = 0.000076.

This performance meets the requirement of a success rate exceeding 0.99 and a substitution error rate of less than 0.0001.*
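Only the m > 1, n > 1 branch of the scanner calculation is fully specified above (there, success is certain), so the sketch below computes that branch alone, which gives a lower bound on Pr[success]. It assumes σ = 0.16 bit-lengths with p0* = 0.74508 and q0 = 0.95808 from Tables 7 and 11, treats the two blocks as independent given K, and, as our own simplification, lets the "10 or more" entry of Table 12 stand for exactly 10 scans, which can only understate the result. It also checks numerically that chaining the two binomial steps (i from K, then m from i) equals a single binomial in p0*·q0.

```python
from math import comb

# Table 12: N(K) for K = 0..9, then the "10 or more" entry.
N_K = [0.0000564, 0.00352, 0.0199, 0.0433, 0.0653, 0.0781,
       0.0858, 0.0715, 0.1025, 0.0718, 0.4582]


def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pr_good_scans(K, m, p0_star, q0):
    """Pr[m error-free scans of a block out of K]: i scans free from invalid
    characters (binomial in p0*), of which m are error-free (binomial in q0)."""
    return sum(binom_pmf(K, i, p0_star) * binom_pmf(i, m, q0) for i in range(m, K + 1))


def success_lower_bound(p0_star, q0):
    """Sum over K of N(K) * Pr[m > 1] * Pr[n > 1]; index 10 stands in for '10 or more'."""
    total = 0.0
    for K, weight in enumerate(N_K):
        at_least_two = (1.0 - pr_good_scans(K, 0, p0_star, q0)
                        - pr_good_scans(K, 1, p0_star, q0))
        total += weight * at_least_two**2   # blocks treated as independent given K
    return total


bound = success_lower_bound(p0_star=0.74508, q0=0.95808)  # sigma = 0.16
print(f"Pr[m > 1 and n > 1] = {bound:.4f}")
```

The bound comes out near 0.95; the remaining probability of success up to the quoted 0.9949 comes from the cases with m or n at most one, which are treated in Appendix G.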

*The assumption will on occasion be false due to a persistent error in print, curvature, defacement, etc. The assumption may also be false if the item is moved very slowly over the scanning window, permitting adjacent scans to be close. The effects are difficult to quantify without knowing how frequently the assumption will be false. At all events, the parity check and the modulus check will dominate in the decision-making process. To the extent that following the assumption results in an erroneous decision, the error will be detected in almost all cases, causing rejection of the scan.
*In those few instances in which the assumption used to establish the rules of acceptance and rejection of scans does not hold, the performance will be lower. The overall performance, taking these cases into consideration, will not be significantly affected.

Allowable Edge Dislocation

For the wand a value of σ = 0.12 bit-lengths, and for the scanner a value of σ = 0.16 bit-lengths, satisfy the performance specification. Since one bit-length is nominally 0.0136 inches, these measures are 0.00163 inches and 0.00218 inches, respectively. It has already been shown that this edge dislocation is due to a random error ep and a random error es, each independent of the other. Clearly ep must be common to both wand and scanner, since both systems must read the same symbol. The following dichotomy exists: given an upper bound for σp of 0.00163 inches, the allocation of σ between σp and σs satisfying the equation σ = √(σp² + σs²) is arbitrary, implying that the less precise the printing is, the more precise the scanning system must be. However, σp must be that value from which a print quality specification is to be derived, binding the printing industry, while σs is that value to which any system manufacturer must specify his equipment. Clearly, as the values of σp and σs, respectively, are restricted to smaller numbers, the costs to the printer and to the system manufacturer, respectively, increase. These costs, of course, are ultimately borne by the user. Hence an optimal allocation of σ between σp and σs should be related to the differential costs of production as a function of those values. The necessary cost information does not exist, nor is it a well-defined concept, since it must vary between processes and between manufacturers.

Assignment of σp

A value of σp = 0.001 inches was selected for the following reasons.
1. The principal contributor to ep is edge roughness, since the systematic effects of print imperfection are of no consequence to the bar code. A value of σp = 0.001 inches corresponds to a standard deviation of bar width of 0.0014 inches, which is consistent with the results of a study on print quality [3].
2. The selected value will hold the minimum bar width to a value greater than 0.008 inches, which is desirable in order to hold the effects of extraneous ink spots and voids to a negligible level.

σp has therefore been selected such that the derived print specification is consistent with the current standards of commercial printing.
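The allocation argument reduces to the relation σ = √(σp² + σs²). The sketch below, using that relation and the nominal 0.0136-inch bit-length, computes how much of each total allowance remains for the scanning system once σp = 0.001 inch is assigned. It also checks the quoted bar-width figure on the assumption (ours) that a bar's width is the difference of two independent edge positions, each with standard deviation σp, so that its standard deviation is √2·σp. `scanner_budget` is an illustrative name.

```python
import math

BIT_LENGTH_IN = 0.0136  # nominal bit-length, inches


def scanner_budget(sigma_total, sigma_print):
    """sigma_s left for the scanning system, from sigma = sqrt(sigma_p**2 + sigma_s**2)."""
    if sigma_print > sigma_total:
        raise ValueError("print error alone exceeds the total allowance")
    return math.sqrt(sigma_total**2 - sigma_print**2)


SIGMA_P = 0.001                        # inches, the assigned print value
BAR_WIDTH_SD = math.sqrt(2) * SIGMA_P  # two independent edges per bar (assumption)

for label, sigma_bits in (("wand", 0.12), ("scanner", 0.16)):
    sigma = sigma_bits * BIT_LENGTH_IN
    print(f"{label}: sigma = {sigma:.5f} in, sigma_s <= {scanner_budget(sigma, SIGMA_P):.5f} in")
print(f"bar-width standard deviation: {BAR_WIDTH_SD:.4f} in")
```

Because the same σp is shared by both systems, the wand, with the tighter total allowance, leaves the smaller budget for scanning error.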

Appendix A. Optimality of Decoding Decision Rules

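Suppose that T1 is measured to be ℓ units (one unit being T/7), and let [ℓ] denote the integer part of ℓ. We wish to select a value of λ in the open unit interval such that the rule

$$ t_1 = \begin{cases} [\ell], & \dfrac{T_1}{T} \le \dfrac{[\ell] + \lambda}{7},\\[6pt] [\ell] + 1, & \dfrac{T_1}{T} > \dfrac{[\ell] + \lambda}{7}, \end{cases} \qquad 2 \le [\ell] < 5, $$

minimizes the probability of erroneous decision. Let τ1 denote the true number of bits spanned by T1. An error occurs if τ1 ≠ t1, that is, if either

$$ \tau_1 = [\ell] \ \text{and}\ \frac{T_1}{T} > \frac{[\ell]+\lambda}{7} \qquad\text{or}\qquad \tau_1 = [\ell]+1 \ \text{and}\ \frac{T_1}{T} \le \frac{[\ell]+\lambda}{7}. $$

Hence the probability of error is

$$ \Pr\!\left[\frac{T_1}{T} > \frac{[\ell]+\lambda}{7} \,\middle|\, \tau_1 = [\ell]\right] + \Pr\!\left[\frac{T_1}{T} \le \frac{[\ell]+\lambda}{7} \,\middle|\, \tau_1 = [\ell]+1\right]. $$

For the first term,

$$ \Pr\!\left[\frac{T_1}{T} > \frac{[\ell]+\lambda}{7}\right] = \Pr\big[7T_1 - ([\ell]+\lambda)T > 0\big], $$

where 7T1 − ([ℓ]+λ)T is normally distributed with mean 7[ℓ] − ([ℓ]+λ)(7) = −7λ and variance 2σ²(49 + ([ℓ]+λ)² − 7([ℓ]+λ)). For the second term, −7T1 + ([ℓ]+λ)T is normally distributed with mean −7([ℓ]+1) + ([ℓ]+λ)(7) = −7(1 − λ) and the same variance. (The derivations of the variance are given in Appendix B.) The probability of error is therefore

$$ \Phi\!\left(-\frac{7\lambda}{S}\right) + \Phi\!\left(-\frac{7(1-\lambda)}{S}\right), $$

where S is the square root of the variance given above, and Φ is the cumulative standard normal distribution. This expression is minimized when

$$ \Phi\!\left(\frac{7\lambda}{S}\right) + \Phi\!\left(\frac{7(1-\lambda)}{S}\right) $$

is maximized. Since Φ is a strictly concave function on the positive half line,

$$ \tfrac12\,\Phi\!\left(\frac{7\lambda}{S}\right) + \tfrac12\,\Phi\!\left(\frac{7(1-\lambda)}{S}\right) < \Phi\!\left(\frac{7}{2S}\right) $$

except for λ = 1/2, where equality holds. Hence the choice λ = 1/2 is the optimal decision rule.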
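The optimality argument can be checked numerically. The sketch below holds S fixed, as the concavity argument does, evaluates the error probability Φ(−7λ/S) + Φ(−7(1 − λ)/S) on a grid of λ values, and confirms that the minimum falls at λ = 1/2. The value S = 1.0 is a hypothetical choice for illustration; with S held fixed, any positive S gives the same minimizer.

```python
import math


def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def decision_error(lam, S):
    """Phi(-7*lam/S) + Phi(-7*(1 - lam)/S) for a fixed S."""
    return phi(-7.0 * lam / S) + phi(-7.0 * (1.0 - lam) / S)


S = 1.0  # hypothetical standard deviation, held fixed as in the argument
grid = [k / 1000 for k in range(1, 1000)]
best = min(grid, key=lambda lam: decision_error(lam, S))
print(f"grid minimizer: lambda = {best}")
```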

Appendix B. Probabilities of Ti Errors

A T1 error occurs, when T1 truly spans m bits, if

$$ \frac{T_1}{T} > \frac{m + 1/2}{7} \qquad\text{or}\qquad \frac{T_1}{T} \le \frac{m - 1/2}{7}. $$

Now

$$ \Pr\!\left[\frac{T_1}{T} > \frac{m+1/2}{7}\right] = \Pr\big[7T_1 - (m+\tfrac12)T > 0\big]. $$

The variable 7T1 − (m + 1/2)T is normally distributed, with

$$ E\big[7T_1 - (m+\tfrac12)T\big] = 7\,ET_1 - (m+\tfrac12)\,ET = 7m - (m+\tfrac12)(7) = -\tfrac72, $$

$$ \operatorname{Var}\big[7T_1 - (m+\tfrac12)T\big] = 49\operatorname{Var}T_1 + (m+\tfrac12)^2\operatorname{Var}T - 14(m+\tfrac12)\operatorname{Cov}(T,T_1). $$

Here Var T = Var T1 = 2σ², and

$$ 2\sigma^2 = \operatorname{Var}(T - T_1) = \operatorname{Var}T + \operatorname{Var}T_1 - 2\operatorname{Cov}(T,T_1) = 4\sigma^2 - 2\operatorname{Cov}(T,T_1). $$

Therefore Cov(T, T1) = σ², and

$$ \operatorname{Var}\big[7T_1 - (m+\tfrac12)T\big] = 49(2\sigma^2) + (m+\tfrac12)^2(2\sigma^2) - 14(m+\tfrac12)\sigma^2 = 2\sigma^2\Big(49 + (m+\tfrac12)^2 - 7(m+\tfrac12)\Big). $$

Hence

$$ \Pr\big[7T_1 - (m+\tfrac12)T > 0\big] = \Pr\!\left[\frac{7T_1 - (m+\tfrac12)T + \tfrac72}{\sigma\sqrt{98 + 2(m+\tfrac12)^2 - 14(m+\tfrac12)}} > \frac{7}{2\sigma\sqrt{98 + 2(m+\tfrac12)^2 - 14(m+\tfrac12)}}\right] = \Phi\!\left(\frac{-7}{2\sigma\sqrt{98 + 2(m+\tfrac12)^2 - 14(m+\tfrac12)}}\right) = u(m,\sigma), $$

which is tabulated in the Standard Normal Tables. Similarly,

$$ \Pr\!\left[\frac{T_1}{T} \le \frac{m-1/2}{7}\right] = \Pr\big[(m-\tfrac12)T - 7T_1 \ge 0\big]. $$

The mean of (m − 1/2)T − 7T1 is again −7/2, and the variance can be similarly shown to be 2σ²(49 + (m − 1/2)² − 7(m − 1/2)). Hence Pr[(m − 1/2)T − 7T1 ≥ 0] = u(m − 1, σ). For m = 2, the conditional probability of T1 error is u(2, σ); for m = 3, u(2, σ) + u(3, σ); for m = 4, u(3, σ) + u(4, σ); for m = 5, u(4, σ). Of the ten decimal characters, two have m = 2, three have m = 3, three have m = 4, and two have m = 5. Hence the unconditional probability of T1 error, u, is

$$ u = 0.2\,u(2,\sigma) + 0.3\,[u(2,\sigma) + u(3,\sigma)] + 0.3\,[u(3,\sigma) + u(4,\sigma)] + 0.2\,u(4,\sigma) = 0.5\,u(2,\sigma) + 0.6\,u(3,\sigma) + 0.5\,u(4,\sigma). $$
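The unconditional T1 error probability u can be evaluated directly from the expression for u(m, σ) and the character weights 0.5, 0.6, 0.5. The sketch below is a minimal implementation assuming σ is expressed in bit-lengths (so that ET = 7); the standard normal CDF Φ is built from `math.erfc` rather than read from tables, and the function names are illustrative.

```python
import math


def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def u_m(m, sigma):
    """u(m, sigma) = Pr[7*T1 - (m + 1/2)*T > 0] when T1 truly spans m bits."""
    c = m + 0.5
    return phi(-7.0 / (2.0 * sigma * math.sqrt(98 + 2 * c * c - 14 * c)))


def u_total(sigma):
    """Unconditional T1 error probability: 0.5 u(2) + 0.6 u(3) + 0.5 u(4)."""
    return 0.5 * u_m(2, sigma) + 0.6 * u_m(3, sigma) + 0.5 * u_m(4, sigma)


for sigma in (0.10, 0.12, 0.16):
    print(f"sigma = {sigma:.2f} bit-lengths: u = {u_total(sigma):.2e}")
```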

T2 measurement errors are computed as T1 measurement errors, with the important distinction that the length of T2 is independent of the length of T. Hence the term 7(m ± 1/2) in the variance of 7T1 − (m ± 1/2)T drops out when T1 is replaced by T2. The unconditional probability of T2 error, v, is derived otherwise analogously to u. A T3 error cannot occur, since the T3 measurement is used only to establish the reference measurement; T3 measurements are never used for character decoding.

We now evaluate the probability of a T4 error, given that the measurement is required for decoding. (If the T4 measurement is redundant to character decoding, its accuracy is of no importance.) The value of c4, when it must be established to distinguish between two characters, is either 1 or 2. The decision rule for the selection of c4 is

$$ c_4 = \begin{cases} 1, & \dfrac{T_4 - KT'/7}{T} \le \dfrac{3}{14},\\[6pt] 2, & \dfrac{T_4 - KT'/7}{T} > \dfrac{3}{14}, \end{cases} $$

where T′ is the length of the next character.

Suppose that K has not been correctly determined, either because the next character is invalid or because of an undetected error in the next character. We take the probability of a T4 error to be 1, except that if an undetected error occurs in the next character causing K to be incorrectly determined, then only with probability 0.5 will the error cause an incorrect selection of c4, because of the binary nature of the decision. We further assume, conservatively, that if the (k+1)-th character is in error then K has not been correctly determined. Now suppose that K has been correctly determined. Conditioned on K:

$$ \Pr[T_4 \text{ error} \mid c_4 = 1] = \Pr\!\left[\frac{T_4 - KT'/7}{T} > \frac{3}{14}\right], \qquad \Pr[T_4 \text{ error} \mid c_4 = 2] = \Pr\!\left[\frac{T_4 - KT'/7}{T} < \frac{3}{14}\right]. $$

Suppose c4 = 1. The variable 14T4 − 2KT′ − 3T is normally distributed. Since c4 = 1, ET4 = K + 1; also ET = ET′ = 7. Hence

$$ E(14T_4 - 2KT' - 3T) = 14(K+1) - 14K - 21 = -7, $$

$$ \operatorname{Var}(14T_4 - 2KT' - 3T) = 196\operatorname{Var}T_4 + 4K^2\operatorname{Var}T' + 9\operatorname{Var}T + 12K\operatorname{Cov}(T,T'), $$

T4 being uncorrelated with T and T′. Now Var T4 = Var T = Var T′ = 2σ², and

$$ 2\sigma^2 = \operatorname{Var}(T + T') = \operatorname{Var}T + \operatorname{Var}T' + 2\operatorname{Cov}(T,T') = 4\sigma^2 + 2\operatorname{Cov}(T,T'). $$

Therefore Cov(T, T′) = −σ², and

$$ \operatorname{Var}(14T_4 - 2KT' - 3T) = 196(2\sigma^2) + 4K^2(2\sigma^2) + 9(2\sigma^2) - 12K\sigma^2 = \sigma^2\,(410 + 8K^2 - 12K). $$

Then

$$ \Pr[14T_4 - 2KT' - 3T > 0] = \Pr\!\left[\frac{14T_4 - 2KT' - 3T + 7}{\sigma\sqrt{410 + 8K^2 - 12K}} > \frac{7}{\sigma\sqrt{410 + 8K^2 - 12K}}\right] = \Phi\!\left(\frac{-7}{\sigma\sqrt{410 + 8K^2 - 12K}}\right), $$

which is tabulated in the Standard Normal Tables. K is 1 with probability 0.4, 2 with probability 0.4, 3 with probability 0.1, and 4 with probability 0.1. Averaging over these values of K we obtain w.

Suppose c4 = 2. Now ET4 = K + 2, so

$$ E(-14T_4 + 2KT' + 3T) = -14(K+2) + 14K + 21 = -7, \qquad \operatorname{Var}(-14T_4 + 2KT' + 3T) = \sigma^2\,(410 + 8K^2 - 12K), $$

the same as for c4 = 1. Hence the probabilities of T4 error for c4 = 2 are the same as for c4 = 1, and the probability of T4 error given that K is correctly determined is w.
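Similarly, w follows from averaging Φ(−7/(σ√(410 + 8K² − 12K))) over the distribution of K given above. The sketch below performs that average, again with σ in bit-lengths and Φ computed from `math.erfc`; the names `w_given_k` and `w` are illustrative, and the sketch covers only the case in which K is correctly determined.

```python
import math


def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


# Distribution of K, the number of bits in the first bar of the next character.
K_PROBS = {1: 0.4, 2: 0.4, 3: 0.1, 4: 0.1}


def w_given_k(K, sigma):
    """Pr[T4 error | K correctly determined] = Phi(-7 / (sigma*sqrt(410 + 8K^2 - 12K)));
    the same value applies for c4 = 1 and c4 = 2."""
    return phi(-7.0 / (sigma * math.sqrt(410 + 8 * K * K - 12 * K)))


def w(sigma):
    """Average of w_given_k over the distribution of K."""
    return sum(p * w_given_k(K, sigma) for K, p in K_PROBS.items())


for sigma in (0.10, 0.12, 0.16):
    print(f"sigma = {sigma:.2f} bit-lengths: w = {w(sigma):.2e}")
```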

Appendix C. Probabilities of Character Error

Suppose that the last character of a block does not require a T4 measurement. The probability that an error is detected is the probability that the character is decoded as invalid. Hence

$$ \Pr[\text{invalid character} \mid T_4 \text{ not required}] = b. $$

Similarly,

$$ \Pr[\text{undetected error} \mid T_4 \text{ not required}] = a, \qquad \Pr[\text{no error} \mid T_4 \text{ not required}] = 1 - a - b. $$

(These probabilities hold also for each character, since the dependence of character error lies in the requirement for the T4 measurement.) Now suppose that a T4 measurement is required. c1 of the framing pattern is known, so the probability of an error due to a T4 measurement is w. If a detected error occurs, it is detected prior to the use of the T4 measurement. Therefore

$$ \Pr[\text{invalid character} \mid T_4 \text{ required}] = b. $$

If an undetected error occurs, it can occur through T1 and T2 errors, in which case the goodness of the T4 measurement is irrelevant, or it can occur by the incorrect decision made due to T4 error. Hence

$$\begin{aligned}
\Pr[\text{undetected error} \mid T_4 \text{ required}] &= a + w(1-a-b),\\
\Pr[\text{no error} \mid T_4 \text{ required}] &= 1 - a - b - w(1-a-b) = (1-w)(1-a-b).
\end{aligned}$$

The probability that a T4 measurement is required is 0.4, since four of the ten characters need this measurement to be decoded. Unconditioning on the T4 requirement, we obtain

$$\begin{aligned}
\Pr[\text{invalid character}] &= b,\\
\Pr[\text{undetected error}] &= a + 0.4w(1-a-b),\\
\Pr[\text{no error}] &= [0.6 + 0.4(1-w)](1-a-b) = (1-0.4w)(1-a-b).
\end{aligned}$$

We now consider the probabilities of error in the k-th character conditioned on the state of the (k+1)-th character. Suppose the (k+1)-th character contains no error. Then the interpretation of c1 is correct, which is precisely the condition that we have just discussed. Hence

$$\begin{aligned}
\Pr[\text{invalid character in } k \mid \text{no error in } k+1] &= b,\\
\Pr[\text{undetected error in } k \mid \text{no error in } k+1] &= a + 0.4w(1-a-b),\\
\Pr[\text{no error in } k \mid \text{no error in } k+1] &= (1-0.4w)(1-a-b).
\end{aligned}$$

Suppose the (k+1)-th character is invalid. Then c1 is undetermined, and a T4 measurement, good or bad, yields no information, so if the k-th character requires a T4 measurement it cannot be decoded. The consequence is an undetermined character, which we state to be invalid by default. Even if T1 and T2 errors occurred in the k-th character, the default would indicate an invalid character:

$$\begin{aligned}
\Pr[\text{invalid character in } k \mid \text{invalid character in } k+1,\ T_4 \text{ required}] &= 1,\\
\Pr[\text{undetected error in } k \mid \text{invalid character in } k+1,\ T_4 \text{ required}] &= 0,\\
\Pr[\text{no error in } k \mid \text{invalid character in } k+1,\ T_4 \text{ required}] &= 0.
\end{aligned}$$

Unconditioning on the T4 requirement, we have

$$\begin{aligned}
\Pr[\text{invalid character in } k \mid \text{invalid character in } k+1] &= 0.4 + 0.6b,\\
\Pr[\text{undetected error in } k \mid \text{invalid character in } k+1] &= 0.6a,\\
\Pr[\text{no error in } k \mid \text{invalid character in } k+1] &= 0.6(1-a-b).
\end{aligned}$$

Now suppose the (k+1)-th character contains an undetected error. Then c1 is determined and (assumed) wrong. The probability of an invalid character is unaffected by c1:

$$ \Pr[\text{invalid character in } k \mid \text{undetected error in } k+1,\ T_4 \text{ required}] = b. $$

We obtain an undetected error if (i) an undetected error results from T1 and T2 measurements, with probability a, or (ii) T1 and T2 measurements are correct and the misinterpretation of c1 caused a wrong decision with respect to c4, with probability 0.5 (since one direction is favorable, and we assume equal likelihood of direction of c1 error). Hence

$$\begin{aligned}
\Pr[\text{undetected error in } k \mid \text{undetected error in } k+1,\ T_4 \text{ required}] &= a + 0.5(1-a-b),\\
\Pr[\text{no error in } k \mid \text{undetected error in } k+1,\ T_4 \text{ required}] &= 0.5(1-a-b).
\end{aligned}$$

Unconditioning on the T4 requirement:

$$\begin{aligned}
\Pr[\text{invalid character in } k \mid \text{undetected error in } k+1] &= b,\\
\Pr[\text{undetected error in } k \mid \text{undetected error in } k+1] &= a + 0.2(1-a-b),\\
\Pr[\text{no error in } k \mid \text{undetected error in } k+1] &= 0.8(1-a-b).
\end{aligned}$$
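The unconditioning steps above amount to mixing, for each state of character k+1, the "T4 not required" row (weight 0.6) with the "T4 required" row (weight 0.4). The sketch below builds the three conditional rows that way, so they can be compared term by term with the expressions just derived. The values a = 0.01, b = 0.02, w = 0.05 in the usage line are hypothetical, chosen only to exercise the code.

```python
def conditional_rows(a, b, w, p_t4=0.4):
    """Rows Pr[state of character k | state of character k+1], with states ordered
    (invalid, undetected error, no error), built by mixing the 'T4 not required'
    and 'T4 required' cases with weights 1 - p_t4 and p_t4."""
    r = 1 - a - b
    not_required = (b, a, r)                # without T4 the successor is irrelevant
    required = {
        "no error": (b, a + w * r, (1 - w) * r),
        "invalid": (1.0, 0.0, 0.0),         # undecodable, counted as invalid
        "undetected": (b, a + 0.5 * r, 0.5 * r),
    }
    return {
        succ: tuple((1 - p_t4) * x + p_t4 * y for x, y in zip(not_required, row))
        for succ, row in required.items()
    }


rows = conditional_rows(a=0.01, b=0.02, w=0.05)  # hypothetical a, b, w
for succ, row in rows.items():
    print(succ, [round(p, 6) for p in row], "sum =", round(sum(row), 12))
```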

Appendix D. Probabilities of Errors in a Block

$$ p_0 = \Pr[\text{no errors in block}] = \Pr[\text{no error in 1} \mid \text{no error in 2}]\,\Pr[\text{no error in 2} \mid \text{no error in 3}] \times \cdots \times \Pr[\text{no error in 6}] = \big[(1-0.4w)(1-a-b)\big]^6. $$

$$ p_0^* = \Pr[\text{no invalid character in block}] = \Pr[\text{no invalid character in 1} \mid \text{no invalid character in 2}] \times \cdots \times \Pr[\text{no invalid character in 6}] = (1-b)^6. $$

$$\begin{aligned}
p_u &= \Pr[\text{exactly one undetected error and no invalid characters}]\\
&= \Pr[\text{undetected error in 6}]\,\Pr[\text{no error in 5} \mid \text{undetected error in 6}] \times \cdots \times \Pr[\text{no error in 1} \mid \text{no error in 2}]\\
&\quad + \Pr[\text{no error in 6}]\,\Pr[\text{undetected error in 5} \mid \text{no error in 6}]\,\Pr[\text{no error in 4} \mid \text{undetected error in 5}] \times \cdots \times \Pr[\text{no error in 1} \mid \text{no error in 2}]\\
&\quad + \cdots + \Pr[\text{no error in 6}] \times \cdots \times \Pr[\text{undetected error in 1} \mid \text{no error in 2}]\\
&= 5\,[a + 0.4w(1-a-b)]\,[0.8(1-a-b)]\,[(1-0.4w)(1-a-b)]^4 + [(1-0.4w)(1-a-b)]^5\,[a + 0.4w(1-a-b)]\\
&= [a + 0.4w(1-a-b)]\,[(1-0.4w)(1-a-b)]^4\,[4(1-a-b) + (1-0.4w)(1-a-b)]\\
&= [a + 0.4w(1-a-b)]\,(1-0.4w)^4\,(1-a-b)^5\,(5-0.4w).
\end{aligned}$$

pd, the probability of exactly one invalid character and no undetected errors, is similarly derived to be

$$\begin{aligned}
p_d &= 5b\,[0.6(1-a-b)]\,[(1-0.4w)(1-a-b)]^4 + [(1-0.4w)(1-a-b)]^5\,b\\
&= b\,[(1-0.4w)(1-a-b)]^4\,[3(1-a-b) + (1-0.4w)(1-a-b)]\\
&= b\,(1-0.4w)^4\,(1-a-b)^5\,(4-0.4w).
\end{aligned}$$

$$ p_d^* = \Pr[\text{exactly one invalid character}] = 5b\,(0.6 - 0.6b)(1-b)^4 + (1-b)^5\,b = 4b\,(1-b)^5. $$
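The closed forms of this appendix can be cross-checked by brute force. The sketch below treats the block as a chain decoded from character 6 back to character 1, using the conditional rows of Appendix C (character 6 uses the "no error in k+1" row, since the following c1 belongs to the framing pattern), enumerates all 3^6 state sequences, and accumulates p0, p0*, pu, pd, and pd*. The character-level values a, b, w are hypothetical inputs chosen only for illustration.

```python
from itertools import product


def transition(a, b, w):
    """P[state of k+1][state of k] from Appendix C; states: N (no error),
    U (undetected error), I (invalid)."""
    r = 1 - a - b
    return {
        "N": {"I": b, "U": a + 0.4 * w * r, "N": (1 - 0.4 * w) * r},
        "I": {"I": 0.4 + 0.6 * b, "U": 0.6 * a, "N": 0.6 * r},
        "U": {"I": b, "U": a + 0.2 * r, "N": 0.8 * r},
    }


def block_probabilities(a, b, w, length=6):
    """Enumerate every state sequence, decoding from the last character back."""
    P = transition(a, b, w)
    out = {"p0": 0.0, "p0*": 0.0, "pu": 0.0, "pd": 0.0, "pd*": 0.0, "total": 0.0}
    for seq in product("NUI", repeat=length):  # seq[0] is character 6, seq[-1] character 1
        prob = P["N"][seq[0]]                  # framing pattern: c1 after character 6 is known
        for succ, cur in zip(seq, seq[1:]):
            prob *= P[succ][cur]
        n_u, n_i = seq.count("U"), seq.count("I")
        out["total"] += prob
        if n_i == 0:
            out["p0*"] += prob
            if n_u == 0:
                out["p0"] += prob
            elif n_u == 1:
                out["pu"] += prob
        elif n_i == 1:
            out["pd*"] += prob
            if n_u == 0:
                out["pd"] += prob
    return out


def closed_forms(a, b, w):
    """The expressions derived above."""
    r = 1 - a - b
    A = a + 0.4 * w * r
    return {
        "p0": ((1 - 0.4 * w) * r) ** 6,
        "p0*": (1 - b) ** 6,
        "pu": A * (1 - 0.4 * w) ** 4 * r**5 * (5 - 0.4 * w),
        "pd": b * (1 - 0.4 * w) ** 4 * r**5 * (4 - 0.4 * w),
        "pd*": 4 * b * (1 - b) ** 5,
    }


a, b, w = 0.01, 0.02, 0.05  # hypothetical character-level probabilities
enumerated, closed = block_probabilities(a, b, w), closed_forms(a, b, w)
for key in closed:
    print(f"{key:4s} enumerated={enumerated[key]:.10f} closed form={closed[key]:.10f}")
```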

Appendix E. Wanding Performance Probabilities

$$\begin{aligned}
\Pr[\text{one block contains no error}] &= p_0\\
\Pr[\text{both blocks contain no error}] &= p_0^2\\
\Pr[\text{success} \mid \text{both blocks contain no error}] &= 1\\
\Pr[\text{one block contains exactly one undetected error and no invalid character}] &= p_u\\
\Pr[\text{the blocks together contain exactly one undetected error and no invalid character}] &= 2p_0p_u\\
\Pr[\text{reject} \mid \text{the blocks together contain exactly one undetected error and no invalid character}] &= 1\\
\Pr[\text{one block contains no invalid character}] &= p_0^*\\
\Pr[\text{both blocks contain no invalid character}] &= p_0^{*2}\\
\Pr[\text{both blocks contain no invalid character and two or more undetected errors}] &= p_0^{*2} - p_0^2 - 2p_0p_u\\
\Pr[\text{reject} \mid \text{two or more undetected errors, no invalid character}] &= 8/9\\
\Pr[\text{substitution error} \mid \text{two or more undetected errors, no invalid character}] &= 1/9\\
\Pr[\text{one block contains exactly one invalid character}] &= p_d^*\\
\Pr[\text{the blocks together contain exactly one invalid character}] &= 2p_0^*p_d^*\\
\Pr[\text{invalid characters and no correction attempted}] &= 1 - p_0^{*2} - 2p_0^*p_d^*\\
\Pr[\text{reject} \mid \text{invalid characters and no correction attempted}] &= 1\\
\Pr[\text{correction is attempted}] &= 2p_0^*p_d^*\\
\Pr[\text{correction is successful} \mid \text{attempted}] &= \frac{p_0p_d}{p_0^*p_d^*}\\
\Pr[\text{substitution error} \mid \text{correction attempted}] &= 1 - \frac{p_0p_d}{p_0^*p_d^*}
\end{aligned}$$

Therefore

$$\begin{aligned}
\Pr[\text{success}] &= \Pr[\text{success} \mid \text{both blocks contain no error}]\,\Pr[\text{both blocks contain no error}]\\
&\quad + \Pr[\text{successful correction} \mid \text{attempted}]\,\Pr[\text{correction attempted}]\\
&= p_0^2 + 2p_0p_d,
\end{aligned}$$

$$\begin{aligned}
\Pr[\text{reject}] &= \Pr[\text{reject} \mid \text{one undetected error, no invalid character}]\,\Pr[\text{one undetected error, no invalid character}]\\
&\quad + \Pr[\text{reject} \mid \text{two or more undetected errors, no invalid character}]\,\Pr[\text{two or more undetected errors, no invalid character}]\\
&\quad + \Pr[\text{reject} \mid \text{invalid characters and no correction attempted}]\,\Pr[\text{invalid characters and no correction attempted}]\\
&= 2p_0p_u + \tfrac89\,(p_0^{*2} - p_0^2 - 2p_0p_u) + 1 - p_0^{*2} - 2p_0^*p_d^*\\
&= 1 - 2p_0^*p_d^* + \tfrac19\,(2p_0p_u - 8p_0^2 - p_0^{*2}),
\end{aligned}$$

$$\begin{aligned}
\Pr[\text{substitution error}] &= \Pr[\text{substitution error} \mid \text{two or more undetected errors, no invalid character}]\,\Pr[\text{two or more undetected errors, no invalid character}]\\
&\quad + \Pr[\text{substitution error} \mid \text{correction attempted}]\,\Pr[\text{correction attempted}]\\
&= \tfrac19\,(p_0^{*2} - p_0^2 - 2p_0p_u) + 2\,(p_0^*p_d^* - p_0p_d).
\end{aligned}$$
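As a check on the algebra, the sketch below evaluates success, reject, and substitution directly as sums of conditional probability times case probability, as listed above, and confirms that the three outcomes sum to one and that the reject total agrees with its simplified form. The inputs are the tabulated values for σ = 0.16 bit-lengths, used here only as a convenient test point; the function names are illustrative.

```python
def wand_by_cases(p0, p0s, pu, pd, pds):
    """Success, reject and substitution built case by case (Appendix E);
    p0s and pds stand for p0* and pd*."""
    clean = p0**2                          # both blocks error-free
    one_undetected = 2 * p0 * pu           # exactly one undetected error, no invalid
    two_or_more = p0s**2 - clean - one_undetected
    attempted = 2 * p0s * pds              # exactly one invalid character overall
    repair_ok = (p0 * pd) / (p0s * pds)    # Pr[correction successful | attempted]
    no_attempt = 1 - p0s**2 - attempted    # two or more invalid characters
    success = clean + repair_ok * attempted
    reject = one_undetected + (8 / 9) * two_or_more + no_attempt
    substitution = (1 / 9) * two_or_more + (1 - repair_ok) * attempted
    return success, reject, substitution


def reject_simplified(p0, p0s, pu, pds):
    """1 - 2 p0* pd* + (1/9)(2 p0 pu - 8 p0^2 - p0*^2)."""
    return 1 - 2 * p0s * pds + (2 * p0 * pu - 8 * p0**2 - p0s**2) / 9


# Tabulated values at sigma = 0.16 bit-lengths, used only as a test point.
vals = dict(p0=0.71385, p0s=0.74508, pu=0.02570, pd=0.14529, pds=0.14981)
s, r, e = wand_by_cases(**vals)
print(f"success={s:.5f} reject={r:.5f} substitution={e:.5f}")
```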

Appendix F. Distribution of the Number of Scans

Suppose that an item is moved at 100 inches/see (maximum design velocity) at angle of attack=O. The duration of the span access will be 0.300/ I 00 = 0.003 seconds. A requirement of at least one scan from each leg of the X in the duration of span access establishes the maximum period of the spot at 0.003 seconds. Let the item movement velocity be u inches/sec. Then the distance d, Figure F-I, between the successive scans in the direction normal to the scan window is the product of the period and the item movement velocity: d = O.003u (inches). The spans with respect to each leg are

d!
d2

d/v'2 sin (45-8) inches, and d/v'2 cos (45-8) inches

whenever di are positive, where 8 is the angle of attack. The lead, Ii, is the distance in the direction of the bars traversed by the spot when it leaves the symbol block or the projection of its further edge. If a is the width of the symbol block and b the length of the block, in inches,
II
12

= =

tan

(45+8) I inches (45-8) I inches

a I tan

Figure F-l. Number of Spans in a Pass

27

We ignore the effect of The expected number

in this determination

since u/spot

velocity is

is very small.

of scans from each side respectively

as long as the numerator

is positive.

If the numerator

is negative no scan is possible.

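The exact number of scans from each side is either $[n_i]$ or $[n_i] + 1$, where $[n_i]$ denotes the integer part of $n_i$. The probability of achieving $[n_i]$ is $[n_i] + 1 - n_i$, and the probability of achieving $[n_i] + 1$ is $n_i - [n_i]$, so that the expected number of scans is $n_i$. We assume the scans to occur independently (in a probabilistic sense) from each side, yielding

$$\begin{aligned}
N([n_1] + [n_2] + 2 \mid \theta, u) &= (n_1 - [n_1])(n_2 - [n_2]),\\
N([n_1] + [n_2] + 1 \mid \theta, u) &= ([n_1] + 1 - n_1)(n_2 - [n_2]) + (n_1 - [n_1])([n_2] + 1 - n_2),\\
N([n_1] + [n_2] \mid \theta, u) &= ([n_1] + 1 - n_1)([n_2] + 1 - n_2).
\end{aligned}$$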
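The three expressions for $N(K \mid \theta, u)$ can be written as a short Python sketch (the function name is hypothetical). It takes the expected counts $n_1, n_2$ and returns the distribution of the total number of scans K; its mean reproduces $n_1 + n_2$, which is a quick check that the two-point distribution per leg is set up the right way round.

```python
import math


def scan_count_distribution(n1: float, n2: float) -> dict:
    """N(K | theta, u): distribution of the total number of scans K,
    given the expected counts n1, n2 of the two legs (assumed independent)."""
    f1, f2 = math.floor(n1), math.floor(n2)
    up1, up2 = n1 - f1, n2 - f2        # P(leg i delivers [n_i] + 1 scans)
    lo1, lo2 = 1.0 - up1, 1.0 - up2    # P(leg i delivers [n_i] scans)
    base = f1 + f2
    dist = {
        base: lo1 * lo2,
        base + 1: lo1 * up2 + up1 * lo2,
        base + 2: up1 * up2,
    }
    return {k: p for k, p in dist.items() if p > 0.0}  # drop impossible counts


dist = scan_count_distribution(3.3, 1.7)
print(dist)                                  # K = 4, 5, 6
print(sum(k * p for k, p in dist.items()))   # mean is n1 + n2 (5.0, up to rounding)
```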

N(K) can be found from

$$N(K) = \iint N(K \mid \theta, u)\, dF_\theta\, dF_u .$$

Clearly, $\theta$ is uniformly distributed on $[0, 2\pi]$. The distribution of u was obtained empirically from laboratory data and estimated to follow the probability law

$$g(u) = \frac{141^{4.55}}{\Gamma(4.55)}\, u^{-5.55} \exp(-141/u).$$

The distribution N(K) was integrated numerically.
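The paper does not say which numerical method was used, so the following Python sketch substitutes a plain Monte Carlo estimate of the double integral. It relies on the standard fact that $g(u)$ is an inverse-gamma density with shape 4.55 and scale 141, so $u$ can be drawn as the reciprocal of a gamma variate with shape 4.55 and scale 1/141. The geometry repeats our reading of $d_i$, $l_i$ and $n_i$ above, including the treatment of a non-positive $d_i$ as no scans; the block dimensions and all function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHA, BETA = 4.55, 141.0   # parameters of g(u) quoted in the text
SPOT_PERIOD_S = 0.003       # maximum spot period from the text, in seconds


def g(u: float) -> float:
    """Velocity density g(u) = 141**4.55 / Gamma(4.55) * u**-5.55 * exp(-141/u)."""
    if u <= 0.0:
        return 0.0
    log_g = (ALPHA * math.log(BETA) - math.lgamma(ALPHA)
             - (ALPHA + 1.0) * math.log(u) - BETA / u)
    return math.exp(log_g)


def sample_u(rng: random.Random) -> float:
    # If X ~ Gamma(shape ALPHA, scale 1/BETA), then 1/X has density g(u).
    return 1.0 / rng.gammavariate(ALPHA, 1.0 / BETA)


def leg_scans(d, trig, lead, b):
    denom = math.sqrt(2.0) * trig
    if denom <= 0.0:                          # non-positive d_i: no scans (assumption)
        return 0.0
    return max(0.0, (b - lead) * denom / d)   # n_i = (b - l_i) / d_i


def conditional(n1, n2):
    """N(K | theta, u) from the two expected counts."""
    f1, f2 = math.floor(n1), math.floor(n2)
    up1, up2 = n1 - f1, n2 - f2
    base = f1 + f2
    return {base: (1 - up1) * (1 - up2),
            base + 1: (1 - up1) * up2 + up1 * (1 - up2),
            base + 2: up1 * up2}


def estimate_N(a: float, b: float, trials: int = 20000, seed: int = 1) -> dict:
    """Monte Carlo estimate of N(K), averaging N(K | theta, u) over theta and u."""
    rng = random.Random(seed)
    mass = Counter()
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)   # uniform on [0, 2*pi]
        d = SPOT_PERIOD_S * sample_u(rng)         # distance between scans
        phi = math.pi / 4.0 - theta
        n1 = leg_scans(d, math.sin(phi), a * abs(math.tan(math.pi / 4.0 + theta)), b)
        n2 = leg_scans(d, math.cos(phi), a * abs(math.tan(phi)), b)
        for k, p in conditional(n1, n2).items():
            mass[k] += p / trials
    return dict(sorted(mass.items()))


# Hypothetical block dimensions: a = 0.4 in, b = 1.0 in.
print(estimate_N(0.4, 1.0))
```

A deterministic quadrature over a grid of $(\theta, u)$ would serve equally well; Monte Carlo is used here only because it keeps the sketch short.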


Appendix G. Probabilities of Success, Error, and Rescan for Values of m, n

Suppose m = 1, n > 1.

The good scan from the second block can be identified by our assumption. In the first block are one good scan, i − 1 scans containing undetected errors, and K − i scans containing invalid characters. Given that a block contains at least one invalid character, which occurs with probability $1 - p_0^*$, the conditional probability that it contains exactly one error, i.e., that it is correctable, is $p_d/(1 - p_0^*)$. The probability that it contains exactly one invalid character, i.e., that it appears to be correctable, is $p_d^*/(1 - p_0^*)$. The probability that none of the K − i blocks is correctable is $(1 - p_d/(1 - p_0^*))^{K-i}$. Hence the probability that at least one block will match the good block after correction is $1 - (1 - p_d/(1 - p_0^*))^{K-i}$. The corresponding event leads to success.

Success is also obtained if none of the blocks appears to be correctable and only the good block satisfies the modulus check. We have i − 1 scans containing only undetected errors. Out of these, r scans contain more than one error, with probability

$$\binom{i-1}{r}(1-q_0)^r q_0^{\,i-1-r}.$$

The probability that each of these r scans will satisfy the modulus check when associated with a good scan from the second block is 1/9, where 10 is the modulus. The probability that none of the r scans will satisfy the modulus check is $(8/9)^r$. Hence the probability that only the good scan will satisfy the modulus check is

$$\sum_{r=0}^{i-1}\binom{i-1}{r}(1-q_0)^r q_0^{\,i-1-r}\left(\frac{8}{9}\right)^r = \left(1 - \frac{1-q_0}{9}\right)^{i-1}.$$

The probability that none of the blocks appears to be correctable and that only the good scan satisfies the modulus check is

$$\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i}\left(1 - \frac{1-q_0}{9}\right)^{i-1}.$$

All other events lead to a rejection. Hence

$$\begin{aligned}
\Pr[\text{success}] &= 1 - \left(1 - \frac{p_d}{1-p_0^*}\right)^{K-i} + \left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i}\left(1 - \frac{1-q_0}{9}\right)^{i-1},\\
\Pr[\text{substitution error}] &= 0,\\
\Pr[\text{reject}] &= 1 - \Pr[\text{success}].
\end{aligned}$$

Suppose m = 0, n > 1. The first block contains no good scans: i scans contain undetected errors, and K − i contain invalid characters. The good scan of the second block is identifiable. We are successful if at least two scans are correctable. The probability that none is correctable is $(1 - p_d/(1 - p_0^*))^{K-i}$. The probability that exactly one is correctable is $(K-i)\,[p_d/(1 - p_0^*)]\,(1 - p_d/(1 - p_0^*))^{K-i-1}$. Hence the probability that at least two are correctable is

$$1 - \left(1 - \frac{p_d}{1-p_0^*}\right)^{K-i} - (K-i)\frac{p_d}{1-p_0^*}\left(1 - \frac{p_d}{1-p_0^*}\right)^{K-i-1}.$$

We are also successful if exactly one scan appears correctable, is correctable, and none of the i scans satisfies the modulus check. This occurs with probability

$$(K-i)\frac{p_d}{1-p_0^*}\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i-1}\left(1 - \frac{1-q_0}{9}\right)^{i}.$$

A substitution error occurs if none of the K − i scans appears correctable and exactly one of the i scans, from those with more than one error, satisfies the modulus check. The probability of the latter event is

$$\sum_{r=1}^{i}\binom{i}{r}(1-q_0)^r q_0^{\,i-r}\, r\,\frac{1}{9}\left(\frac{8}{9}\right)^{r-1} = \frac{i(1-q_0)}{9}\left(1 - \frac{1-q_0}{9}\right)^{i-1}.$$

Hence the probability of the simultaneous occurrence of both events is

$$\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i}\frac{i(1-q_0)}{9}\left(1 - \frac{1-q_0}{9}\right)^{i-1}.$$

A substitution error also occurs if exactly one of the K − i scans appears correctable, but is not, and none of the i scans satisfies the modulus check. This occurs with probability

$$(K-i)\frac{p_d^* - p_d}{1-p_0^*}\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i-1}\left(1 - \frac{1-q_0}{9}\right)^{i}.$$

Hence

$$\begin{aligned}
\Pr[\text{success}] ={}& 1 - \left(1 - \frac{p_d}{1-p_0^*}\right)^{K-i} - (K-i)\frac{p_d}{1-p_0^*}\left(1 - \frac{p_d}{1-p_0^*}\right)^{K-i-1} \\
&+ (K-i)\frac{p_d}{1-p_0^*}\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i-1}\left(1 - \frac{1-q_0}{9}\right)^{i},\\
\Pr[\text{substitution error}] ={}& \left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i}\frac{i(1-q_0)}{9}\left(1 - \frac{1-q_0}{9}\right)^{i-1} \\
&+ (K-i)\frac{p_d^* - p_d}{1-p_0^*}\left(1 - \frac{p_d^*}{1-p_0^*}\right)^{K-i-1}\left(1 - \frac{1-q_0}{9}\right)^{i},\\
\Pr[\text{reject}] ={}& 1 - \Pr[\text{success}] - \Pr[\text{substitution error}].
\end{aligned}$$
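The two cases m = 1, n > 1 and m = 0, n > 1 can be evaluated with a small Python sketch of the formulas above. The parameter names `p0s`, `pd`, `pds` and `q0` stand for $p_0^*$, $p_d$, $p_d^*$ and $q_0$; the function names and the numerical values in the example are hypothetical, chosen only so that $p_d \le p_d^* \le 1 - p_0^*$. The helper `binomial_check` evaluates the binomial sums directly, so the closed forms used in the text can be compared against them.

```python
from math import comb


def _rates(p0s, pd, pds):
    """Conditional probabilities for a scan that contains invalid characters."""
    corr = pd / (1 - p0s)    # it is correctable (exactly one error)
    looks = pds / (1 - p0s)  # it appears correctable (exactly one invalid character)
    return corr, looks


def case_m1_n_gt1(K, i, p0s, pd, pds, q0):
    corr, looks = _rates(p0s, pd, pds)
    fail = 1 - (1 - q0) / 9  # an undetected-error scan fails the modulus check
    success = 1 - (1 - corr) ** (K - i) + (1 - looks) ** (K - i) * fail ** (i - 1)
    return {"success": success, "substitution": 0.0, "reject": 1 - success}


def case_m0_n_gt1(K, i, p0s, pd, pds, q0):
    corr, looks = _rates(p0s, pd, pds)
    fail = 1 - (1 - q0) / 9
    n = K - i  # scans containing invalid characters
    one_corr = n * corr * (1 - corr) ** (n - 1) if n > 0 else 0.0
    one_good_fix = n * corr * (1 - looks) ** (n - 1) * fail ** i if n > 0 else 0.0
    bad_fix = n * (looks - corr) * (1 - looks) ** (n - 1) * fail ** i if n > 0 else 0.0
    success = 1 - (1 - corr) ** n - one_corr + one_good_fix
    exactly_one_passes = i * (1 - q0) / 9 * fail ** (i - 1) if i > 0 else 0.0
    substitution = (1 - looks) ** n * exactly_one_passes + bad_fix
    return {"success": success, "substitution": substitution,
            "reject": 1 - success - substitution}


def binomial_check(i, q0):
    """Direct sums behind the two closed forms used above."""
    none_pass = sum(comb(i - 1, r) * (1 - q0) ** r * q0 ** (i - 1 - r) * (8 / 9) ** r
                    for r in range(i))
    one_pass = sum(comb(i, r) * (1 - q0) ** r * q0 ** (i - r) * r / 9 * (8 / 9) ** (r - 1)
                   for r in range(1, i + 1))
    return none_pass, one_pass


# Hypothetical values, chosen only so that p_d <= p_d* <= 1 - p_0*.
print(case_m1_n_gt1(K=6, i=2, p0s=0.90, pd=0.06, pds=0.08, q0=0.7))
print(case_m0_n_gt1(K=6, i=2, p0s=0.90, pd=0.06, pds=0.08, q0=0.7))
```

The guards on `n > 0` and `i > 0` only avoid evaluating terms whose leading factor is zero; they do not change any of the formulas.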

We do not attempt to correct if n < 2, for then we cannot identify any block as good, so we run the risk that by erroneous correction we satisfy our assumption and, of course, the modulus check. Hence the analysis of performance for n < 2 ignores the correctability of the m-i and n-j blocks containing invalid characters.

Suppose m = 1, n = 1. Neither of the good scans can be identified. Success occurs when exactly one pair of scans satisfies the modulus check; otherwise ambiguity forces a rejection.


Pr [success]

q 1 sq2i-s-1

(1 - qa)i-1

$$\Pr[\text{substitution error}] = 0, \qquad \Pr[\text{reject}] = 1 - \Pr[\text{success}].$$

Suppose m = 0, n = 1.

Pr[substitution

error]

9 Pr [success] Pr[reject] Suppose m=O, n=O. Pr [success] Pr [substitution error]

(%ti_1

o
Pr[substitution error]

$$\Pr[\text{reject}] = 1 - \Pr[\text{substitution error}].$$

By interchanging i and j, and m and n, we obtain the symmetric results for n = 1, m > 1; n = 0, m > 1; and n = 0, m = 1.


Appendix H. Systematic Effects of Print Quality Variation

The results of the PIDAS Study [3] contained detailed measurements and statistics of random effects in several printing processes. In this short analysis, we quantify some of the systematic effects that occur. Analysis of a small sample indicates that the magnitude of systematic effects exceeds that of random effects. The data from which these results are obtained were taken from Appendix E of the cited report.

Print Quality Variation

The variation in print quality of the letter "T" was studied. The PIDAS report dealt with the printed letter without consideration of its designed dimensions and contrast, and obtained measurements and statistics of variation in line and contrast within the character and its background. In this analysis, we estimate the variation of the mean dimensions of the printed letter about the designed dimensions. Those effects which cause the mean dimensions to differ from the designed dimensions are called systematic effects. Our assumption is that four systematic effects occur in the printing process: distortion parallel to the horizontal axis, distortion parallel to the vertical axis, expansion or contraction of stroke width (spread), and additional deposit of ink (smear) adjacent to the edges of the letter which are parallel to either the horizontal axis or the vertical axis but not both. These effects are shown separately in Figure H-1, but any realization of the printed letter will contain the superposition of all four effects. Figure H-1 also shows the assignments of the variables h, v, p, and m and the influence of their numeric values on these effects.

The Separation of Systematic Effects

The designed letter "T" contains four measurements, as shown in Figure H-2: character width, x; character height, y; vertical stroke thickness, u; horizontal stroke thickness, w. The printed letter "T" contains corresponding measurements x', y', u', and w', respectively. The latter measurements are transformed from the former by one of the systems of equations:

$$x' = hx + p + m, \qquad y' = vy + p, \qquad u' = hu + p + m, \qquad w' = vw + p \qquad (1)$$

or

$$x' = hx + p, \qquad y' = vy + p + m, \qquad u' = hu + p, \qquad w' = vw + p + m \qquad (2)$$

The nature of the phenomena investigated requires that any value for the smear effect, m, be non-negative. We show that we can find a solution h, v, p, m satisfying this condition.

Proposition: If x > u and y > w, then systems of equations (1) and (2) each have exactly one solution. m is positive in one system if and only if m is negative in the other system.


Figure H-1. Systematic Effects. Panels: (a) horizontal distortion, (b) vertical distortion, (c) spread, (d) smear, vertical and horizontal (m ≥ 0).

Figure H-2. Letter "T" (measurements x, y, u, w).

Proof: Each system contains four independent equations. If the equations of system (1) were not independent, there would exist a non-zero vector a such that

$$a_1 x + a_3 u = 0, \qquad a_2 y + a_4 w = 0, \qquad a_1 + a_2 + a_3 + a_4 = 0, \qquad a_1 + a_3 = 0.$$

The last equation implies $a_1 = -a_3$; the first then gives $a_1(x - u) = 0$, implying $a_1 = a_3 = 0$ since x > u. The third equation then implies $a_2 = -a_4$, and the second gives $a_2(y - w) = 0$, implying $a_2 = a_4 = 0$ since y > w, contradicting the condition that a be non-zero. Similarly, the equations of system (2) can be shown to be independent. Consequently, each system has a non-singular coefficient matrix in the unknowns h, v, p, m, and the solutions to (1) and (2) are unique.

Define new variables p' and m' such that

$$p' = p + m, \qquad m' = p - p'.$$

This transformation of p, m into p', m' also transforms one system of equations into the other, such that if h, v, p, m is the solution to one system, then h, v, p', m' is the solution to the other. Now $m = p' - p$ and $m' = p - p'$, so $m = -m'$. The proposition is proved.

Now if m = 0, then p = p', so the solutions to (1) and (2) are identical. It is physically impossible for smear to be negative. We can therefore solve for h, v, p, m by taking that system of equations, (1) or (2), which yields a non-negative value of m.
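The solution procedure can be sketched in a few lines of Python. Because p and m cancel in $x' - u'$ and in $y' - w'$, h and v are the same in both systems; the sketch then solves (1) and (2) for p and m and keeps the system with non-negative smear, as the text prescribes. The function name and the test dimensions in the example (in mils) are hypothetical and were generated from chosen values of h, v, p, m rather than taken from the PIDAS data.

```python
def solve_systematic(designed, printed):
    """Solve systems (1) and (2) for h, v, p, m and keep the one with m >= 0.

    designed = (x, y, u, w) and printed = (x', y', u', w'), in the same units.
    """
    x, y, u, w = designed
    xp, yp, up, wp = printed
    if not (x > u and y > w):
        raise ValueError("the proposition requires x > u and y > w")
    h = (xp - up) / (x - u)        # p and m cancel in x' - u'
    v = (yp - wp) / (y - w)        # p and m cancel in y' - w'
    # System (1): smear term in the x' and u' equations.
    p1 = yp - v * y
    m1 = xp - h * x - p1
    # System (2): smear term in the y' and w' equations; m2 equals -m1.
    p2 = xp - h * x
    m2 = yp - v * y - p2
    if m1 >= 0.0:
        return {"system": 1, "h": h, "v": v, "p": p1, "m": m1}
    return {"system": 2, "h": h, "v": v, "p": p2, "m": m2}


# Hypothetical letter: designed (100, 120, 20, 25) mils, printed with
# h = 1.02, v = 0.98, p = 1.5, m = 2.0 under system (1).
print(solve_systematic((100, 120, 20, 25), (105.5, 119.1, 23.9, 26.0)))
```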

Values of Spread and Smear

For 37 items in the cited study, values of x, y, u, w and x', y', u', w' are tabulated in Appendix E of the PIDAS study. A further 24 items are also tabulated, but for these items measurements were taken from photographs of labels rather than from the labels themselves. We restricted our analysis to those items whose measurements were taken directly from the labels. Values of p, m and p + m are tabulated in Table H-1. p is taken as the ink spread on the entire stroke and is assumed to be allocated equally, half to each edge. p ranges from -4.72 to 4.59 mils, with a mean of -0.18 mils and a standard deviation of 2.10 mils. The mean value of p for each edge is therefore -0.09 mils and the standard deviation is 1.05 mils. The effect of m is on one edge only. m ranges from 0 to 7.52 mils, with a mean of 2.17 mils and a standard deviation of 1.99 mils. The sum of the variances of p and m is virtually equal to the variance of the sum of p and m (8.37 mil² and 8.47 mil², respectively), indicating that p and m are nearly uncorrelated and that an assumption that the two effects occur independently is plausible.
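The variance comparison can be made explicit. Since $\mathrm{Var}(p+m) = \mathrm{Var}\,p + \mathrm{Var}\,m + 2\,\mathrm{Cov}(p, m)$, the two quoted variances imply a covariance and a correlation coefficient. The Python sketch below uses only the summary statistics quoted in the text; because those standard deviations are rounded to two decimals, the implied covariance is only a rough estimate, and a near-zero correlation is consistent with, though weaker than, independence.

```python
# Summary statistics quoted in the text (mils), sample of 37 items.
sd_p, sd_m = 2.10, 1.99      # standard deviations of spread p and smear m
var_sum = 8.47               # reported variance of p + m

var_p_plus_var_m = sd_p ** 2 + sd_m ** 2      # 8.3701, quoted as 8.37
cov_pm = (var_sum - var_p_plus_var_m) / 2     # from Var(p+m) = Var p + Var m + 2 Cov
corr_pm = cov_pm / (sd_p * sd_m)              # implied correlation coefficient

# Per-edge spread: p is split equally between the two edges of a stroke.
mean_edge, sd_edge = -0.18 / 2, sd_p / 2      # -0.09 and 1.05 mils

print(round(var_p_plus_var_m, 2), round(cov_pm, 3), round(corr_pm, 3))
```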

Conclusion

The sample size of 37 is too small to yield conclusive results on the magnitude of systematic effects relative to random effects. However, the standard deviations of the systematic effects measured greatly exceed the standard deviations of edge roughness as measured by variation in bar width in Appendix B of the cited report. We may suspect that the degradation of print quality contains a larger contribution from systematic effects than from random effects.


Table H-1. Calculated Values of Spread, Smear and Their Sum for a Sample of 37 Items (summary statistics, mils)

                        Spread, p    Smear, m    Sum, p + m
    Minimum               -4.72        0
    Maximum                4.59        7.52
    Mean                  -0.18        2.17
    Standard deviation     2.10        1.99        2.91

References

1. Proposed U.P.C. Symbol, Revision No. 1, October 1972, IBM Corporation, Research Triangle Park, N.C.
2. McEnroe, P.V. and J.E. Jones, "Identification Technology for the Retail Industry", October 1971, IBM Corporation, Research Triangle Park, N.C.
3. PIDAS Study of Print Quality Variation, Grocery Industry Committee, July 14, 1972.


Systems Development Division Research Triangle Park, North Carolina
