
Information Processing Letters 19 (1984) 17-19 26 July 1984

North-Holland

ON CONVERTING CHARACTER STRINGS TO INTEGERS

J.R. PARKER
Department of Computer Science, University of Calgary, Calgary, Alberta T2N 1N4, Canada

Communicated by Kenneth C. Sevcik


Received 26 October 1983
Revised 6 February 1984

Most programs which involve character to integer conversion fail to detect integer overflow correctly. This paper describes
some of the pitfalls of not recognizing overflow, and presents a correct method for doing so in an arbitrary base for any
precision.

Keywords: Programming techniques, data representations

The problem of converting a string of characters, which represent the digits of an integer, into the corresponding internal integer form is superficially an easy one to solve. Most often, the actual numbers involved are given in base ten by the ASCII digits '0' through '9'. These digits are weighted by powers of ten based on their position in the string, and accumulated one digit at a time until none remain. Almost all text-driven programs, especially compilers, use this method. However, almost all of these programs are incomplete or incorrect.

A more general problem uses an arbitrary base B instead of ten. While it is not absolutely necessary to do so, the value of B will be restricted to the positive integers. For positional number systems in base B, the sequence of digits

    $$d_1 d_2 d_3 \ldots d_n$$

denotes a number $I_n$, which can be computed using the recurrence relation

    $$I_0 = 0, \qquad I_i = I_{i-1} B + d_i \quad (1 \le i \le n).$$

This well-known relation forms the basis for most character to integer conversion schemes. For negative integers it is possible to allow $I_i < 0$ when necessary.

The problem with most character to integer procedures is that almost all existing processors impose an upper limit on the size (magnitude) of an integer, which the conversion programs ignore. Let us call the largest representable number on a particular system M. The penalty for attempting to exceed M varies from system to system, but the usual result is that the number concerned appears to change sign. An attempt to produce a number exceeding M is usually called an overflow. Many examined procedures failed to check for overflow.

While this problem seems obvious and easily dealt with, a sample of over 25 programs examined during the writing of this paper yielded only one program which correctly checked for overflow. Included in this collection were procedures taken from a number of actual production compilers, which incorrectly process integer constants or improperly read integers from text files [3,4,7]. After seeing the large number of erroneous solutions, the problem was given as an assignment in a fourth year (senior) computer science course, with the instructions that the solution should be correct, complete and robust. None of the students succeeded.

During the examination of sample programs, the two problems most frequently encountered were the following:

0020-0190/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)


Volume 19, Number 1        INFORMATION PROCESSING LETTERS        26 July 1984

(1) Overflow is ignored completely. If an integer greater than M is encountered, this fact is not detected. The outcome is an erroneous, machine-dependent value, which may have unpleasant side-effects and which is difficult to debug.

(2) Overflow is incorrectly (prematurely) indicated, with the result that some legal and perhaps important integers (such as M itself) cannot be constructed.

Problem (1) is the more serious of the two, as the following example [6] is intended to illustrate. Consider a 16-bit processor, on which a C array declaration is entered as

    char A[66000];

The number 66000 overflows twice while being processed by the compiler, giving the equivalent declaration

    char A[464];

where 464 = 66000 (mod 32768). The result of all of this is a much smaller array than requested, and no indication that a reduction in storage has occurred. Hence, the program will produce a confusing array bounds error at runtime.

A frustrating, but less serious, problem involves premature overflow detection on another 16-bit processor. Within a PASCAL compiler under construction, the statement

    const mxint = 32767;

appeared. It failed to compile, because the constant 32767 was incorrectly flagged as being too large to represent.

A final example was discovered during the production of this paper. A PASCAL compiler used to compile some of the test programs included here correctly detected 32767 as the maximum integer. It did not have a builtin constant for MININT, though, and it would not accept the constant -32768. However, the expression -32767 - 1 was accepted, and was correct for the two's complement machine. These problems, and a number of others, would be avoided by correctly checking for a potential overflow when constructing an integer.

What is needed is a condition which defines precisely when the relation $I_i = I_{i-1}B + d_i$ results in $I_i$ being representable (i.e., $I_i \le M$) [1]. This condition must not itself cause an overflow to occur, because the result of an overflow is unpredictable. For example, some systems trap overflows as a runtime error. Hence, the condition will be expressed in terms of M, B, $d_i$ and $I_{i-1}$, and must never result in a number which exceeds M. This condition will be true when $I_i$ is representable, and false otherwise.

Proposition. Let $I_i$, M, B and $d_i$ be defined as before. Then, using integer arithmetic, the inequality

    $$I_{i-1} \le \frac{M - d_i}{B}$$

holds if and only if $I_i$ is representable.

Proof. The proof of this proposition is trivial, and is included here for completeness. The demonstration is in two parts. First, we show that if $I_{i-1} \le (M - d_i)/B$, then $I_i \le M$:

    $$I_{i-1} \le \frac{M - d_i}{B} \;\Rightarrow\; I_{i-1}B \le M - d_i \;\Rightarrow\; I_{i-1}B + d_i \le M,$$

which implies that $I_i \le M$ since $I_i = I_{i-1}B + d_i$. Now we show that if $I_i \le M$, then $I_{i-1} \le (M - d_i)/B$:

    $$I_i \le M \;\Rightarrow\; I_{i-1}B + d_i \le M \;\Rightarrow\; I_{i-1}B \le M - d_i \;\Rightarrow\; I_{i-1} \le \frac{M - d_i}{B}.$$

Both parts also hold for the truncated quotient of integer arithmetic: since $M \ge d_i$, the integer quotient $q$ of $M - d_i$ by $B$ satisfies $qB \le M - d_i < (q + 1)B$, so for an integer $I_{i-1}$ the conditions $I_{i-1} \le q$ and $I_{i-1}B \le M - d_i$ are equivalent.

The expression above can easily be converted into working code in any programming language. For example, PASCAL code which would implement this for base 10 integers would be

    INT := 0;
    OVERFLOW := false;
    while (CH >= '0') and (CH <= '9') do
      begin
        DIG := ord(CH) - ord('0');
        { Proposition with M = maxint and B = 10 }
        if INT <= (maxint - DIG) div 10 then
          INT := INT * 10 + DIG
        else
          OVERFLOW := true;
        GETCH(CH)
      end;
    if OVERFLOW then
      begin
        INT := 0;
        ERROR
      end;

This expression is only valid for positive integers. The negative integer case is also straightforward, but is not necessarily symmetrical [5]. For example, if the underlying hardware uses two's complement, then the legal range of integers is

    $$-2^{w-1} \le I \le 2^{w-1} - 1,$$

where w is the word size, and there is clearly one more negative than positive integer. This should be taken into account.

Let N be the minimum integer which is representable ($N \le 0$). PASCAL, for example, often has a builtin variable MININT which provides this value. The expression

    $$I_{i-1} \ge \frac{N + d_i}{B}$$

is true if and only if $I_i$ is representable. A conversion routine would test for sign, and then use the appropriate expression to check for overflow. For negative integers, the individual digits would be negated before being added to the accumulated sum, so that $I_i = I_{i-1}B - d_i$. Here the integer division must truncate toward zero, as PASCAL's div does: provided $N \le -(B - 1)$, the numerator $N + d_i$ is never positive, so the truncated quotient is the true quotient rounded up, which is exactly the bound an integer $I_{i-1}$ must meet.

The expressions above will permit the correct scanning of both positive and negative integers in any integer base.

References

[1] G. Birtwistle, Personal communication, 1982.
[2] J.R. Parker, A general character to integer conversion method, Res. Rept. 82/91/10, University of Calgary, Department of Computer Science, 1982.
[3] B.W. Kernighan and P.J. Plauger, Software Tools (Addison-Wesley, Reading, MA, 1976) p. 63.
[4] B.W. Kernighan and D.M. Ritchie, The C Programming Language (Prentice-Hall, Englewood Cliffs, NJ, 1978) p. 39.
[5] J.R. Low, A short note on scanning signed integers, SIGPLAN Notices 14 (1) (1979).
[6] R.M. Neal, Personal communication, 1982.
[7] N. Wirth, Systematic Programming: An Introduction (Prentice-Hall, Englewood Cliffs, NJ, 1973) p. 105.
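A minimal C sketch of the base-10 case, contrasting the unchecked recurrence $I_i = I_{i-1}B + d_i$ with the Proposition's test as used in the PASCAL fragment. It assumes ASCII digits and uses `uint16_t` only to imitate the wrap-around of a 16-bit register. The names `naive16` and `checked10` are hypothetical and do not come from the paper. M is passed in as a parameter and is assumed to be at least 9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: naive base-10 conversion with no overflow test.
   The uint16_t cast reduces every partial result modulo 2^16, imitating
   a 16-bit register that silently wraps. */
uint16_t naive16(const char *s)
{
    uint16_t acc = 0;
    for (; *s >= '0' && *s <= '9'; s++)
        acc = (uint16_t)(acc * 10u + (unsigned)(*s - '0'));
    return acc;
}

/* Hypothetical helper: checked base-10 conversion. Before forming
   I_i = I_{i-1}*10 + d_i it requires I_{i-1} <= (M - d_i) / 10, so no
   intermediate value ever exceeds M and the test itself cannot overflow.
   Returns false if the value would exceed M. Assumes M >= 9. */
bool checked10(const char *s, int M, int *out)
{
    int acc = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int d = *s - '0';
        if (acc > (M - d) / 10)
            return false;          /* the next step would exceed M */
        acc = acc * 10 + d;
    }
    *out = acc;
    return true;
}
```

With M = 32767, `naive16("66000")` yields 464, reproducing the silent truncation of the array example, whereas `checked10("66000", 32767, &v)` returns false. `checked10("32767", 32767, &v)` succeeds, so M itself is not rejected prematurely.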

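The paper describes the signed conversion routine only in words: test the sign, negate the digits of a negative number, and apply the matching bound. The following C function is one possible reading of that description, not the author's code. `str_to_int` and `digit_value` are hypothetical names. Digits beyond 9 are assumed to be ASCII letters, which allows bases 2 to 36. The caller supplies N and M with N ≤ -(B - 1) and M ≥ B - 1. The negative branch relies on C99 integer division truncating toward zero, which rounds the non-positive quotient (N + d)/B upward as the test requires.

```c
#include <stdbool.h>

/* Hypothetical helper: value of an ASCII digit or letter, or -1. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/* Converts an optionally signed string of base-B digits to an int in
   [N, M]. Returns false on an empty digit string, a character that is not
   a digit in this base, or overflow. Assumes 2 <= base <= 36,
   N <= -(base - 1) and M >= base - 1. */
bool str_to_int(const char *s, int base, int N, int M, int *out)
{
    bool negative = false;
    int acc = 0;       /* I_{i-1}; stays within [N, M] throughout */
    int ndigits = 0;

    if (base < 2 || base > 36)
        return false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }
    for (; *s != '\0'; s++, ndigits++) {
        int d = digit_value(*s);
        if (d < 0 || d >= base)
            return false;
        if (!negative) {
            /* I_i = I_{i-1}*B + d_i is representable iff
               I_{i-1} <= (M - d_i) / B. */
            if (acc > (M - d) / base)
                return false;
            acc = acc * base + d;
        } else {
            /* Digits are negated: I_i = I_{i-1}*B - d_i is representable
               iff I_{i-1} >= (N + d_i) / B. C division truncates toward
               zero, i.e. rounds this non-positive quotient up. */
            if (acc < (N + d) / base)
                return false;
            acc = acc * base - d;
        }
    }
    if (ndigits == 0)
        return false;
    *out = acc;
    return true;
}
```

For a simulated 16-bit two's complement machine, `str_to_int("-32768", 10, -32768, 32767, &v)` succeeds with v = -32768, and `str_to_int("-8000", 16, -32768, 32767, &v)` gives the same value, while "32768" and "-32769" are rejected. Accumulating the negated digits is what lets N be produced without first forming -N, which does not fit.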

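As a sanity check on the Proposition and its negative counterpart, and not as a replacement for the proof, the following C sketch enumerates every case for a small word size. It covers bases 2 to 16, every digit, and every accumulated value, and compares the integer-division tests with the exact products computed in `long long`. The function name `bounds_tests_agree` is hypothetical. The enumeration assumes N ≤ -15 and M ≥ 15, so that N + d ≤ 0 and M - d ≥ 0 for every digit. It is not meant for the host's own INT_MIN and INT_MAX, where the loop counter would overflow.

```c
#include <stdbool.h>

/* Returns true if, for every base B in [2, 16], digit d in [0, B) and
   accumulated value I, the integer-division tests agree with the exact
   representability conditions. Assumes N <= -15 and M >= 15. */
bool bounds_tests_agree(int N, int M)
{
    for (int B = 2; B <= 16; B++) {
        for (int d = 0; d < B; d++) {
            /* Positive case: I <= (M - d) / B  versus  I*B + d <= M. */
            for (int I = 0; I <= M; I++) {
                bool test = I <= (M - d) / B;
                bool exact = (long long)I * B + d <= M;
                if (test != exact)
                    return false;
            }
            /* Negative case (digits negated): I >= (N + d) / B  versus
               I*B - d >= N. C division truncates toward zero, which for
               the non-positive N + d is the ceiling the test needs. */
            for (int I = N; I <= 0; I++) {
                bool test = I >= (N + d) / B;
                bool exact = (long long)I * B - d >= N;
                if (test != exact)
                    return false;
            }
        }
    }
    return true;
}
```

Both `bounds_tests_agree(-128, 127)` and `bounds_tests_agree(-32768, 32767)` should return true. Replacing the truncating division with a floor division would break the negative case whenever N + d is not a multiple of B.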