
THE INTEL® 8087 NUMERIC DATA PROCESSOR

John Palmer

Intel Corporation

This paper describes a new device, the Intel® 8087 Numeric Data Processor, with unprecedented speed, accuracy and capability. Its modified stack architecture and instruction set are explained and illustrative examples are included. The 8087, which conforms to the proposed IEEE Floating-Point Standard, is a coprocessor in the Intel® 8086 family. It supports seven data types: three REAL, three INTEGER and one packed BCD format, and performs all necessary numeric operations from addition to logarithmic and trigonometric functions.

1.0 INTRODUCTION

The Intel® 8087 is a high performance general purpose numeric data processor. It is a part of the Intel® 8086 family and can be used with either the 8086 or the 8088 to extend their instruction sets by over 120 numeric data manipulation operations. The 8087 is not a peripheral but a coprocessor; it monitors the instruction stream and when an 8086/8088 ESCAPE instruction is read, the 8087 takes over the bus and interprets and executes the ESCAPE instruction as one of its own instructions. This tightly coupled coprocessing interface permits the 8087 to execute numeric instructions while the 8086 executes any others. The concurrent instruction execution increases the throughput of the system. Furthermore, the 8087 is the only chip that must be added to an 8086 (8088) system to provide numeric capability that exceeds software in speed by more than a factor of 100.

The 8087 is intended to be general purpose and satisfy a very wide range of needs for mathematical computation. It is fast enough for a great many scientific and statistical calculations; it is accurate enough for business and commercial computation; and it is precise enough for entirely new applications, most notably interval arithmetic [1]. The 8087 provides an unprecedented level of capability, safety and reliability with high performance and low cost and is a prime example of the almost incredible possibilities in combining software and architectural expertise with VLSI processing capability.

2.0 8087 OVERVIEW

The 8087 consists of a stack of registers for holding operands and results, a set of registers constituting its environment and a set of instructions.

The stack is a set of 8 registers, each 80 bits wide. Associated with the stack is a three bit stack pointer, TOP, and with each stack element a two bit tag. (Both the tags and TOP physically belong to the ENVIRONMENT but will be shown with the stack.) The stack elements are numbered relative to TOP (ST(i) means the ith stack element from the top of stack) as shown below.

[Figure: TAGS and STACK. Eight stack elements, ST(0) through ST(7), each with SIGN, EXPONENT and SIGNIFICAND fields and an associated tag.]

The tag field is used to detect uninitialized stack elements and to designate special values (e.g. zero) for microcode optimization.

The value represented in a register has 64 bits of precision and a range of about 10^±4900 (15 bit exponent). A more complete description of the register values will be given in Section 3.

CH1494-4/80/0000-0174 $00.75 © 1980 IEEE


The 8087 environment consists of seven words as illustrated below.

[Figure: the environment. Labels: STATUS (fields B, Z, TOP, C, A, S, N, P, U, O, Q, D, I), CONTROL WORD, TAG WORD, INSTRUCTION ADDRESS, DATA.]

The STATUS word consists of the EXCEPTION flags (0-7) and the STATUS bits (8-15) where the meanings are (* indicates a field reserved for future use):

EXCEPTION FLAGS

    I : invalid operation
    D : denormalized operand
    Q : division of nonzero by zero
    O : overflow
    U : underflow
    P : inexact (precision)
    N : indicates a pending interrupt

STATUS BITS

    Z,C,A,S : condition code bits for various instructions (e.g. COMPARE)
    TOP     : stack pointer
    B       : indicates whether the 8087 is BUSY (used for synchronization)

The CONTROL WORD consists of EXCEPTION MASKS and CONTROL BITS. For each exception there is a mask which if reset allows an interrupt to be generated (if M = 0) but if set the interrupt is suppressed and the 8087 executes a default exception handling procedure (on chip) and continues (the procedure will be explained in Section 3). The M mask is the 8087 interrupt enable/disable bit. The CONTROL BITS have the following meaning:

    PC : precision control - results are rounded to one of three precisions: Temporary Real (64 bits), Long Real (53 bits), Real (24 bits).

    RC : rounding control - results are rounded in one of four directions: unbiased round to nearest, round towards +∞, round towards -∞, round towards zero.

    IC : infinity control - there are two types of infinity arithmetic provided: affine and projective.

The TAG word contains tags describing the contents of the corresponding stack elements. The instruction and data pointers are the addresses of an instruction (and its referenced data if any) if it causes an exception that generates an interrupt.

There are four types of 8087 instructions: the CORE set, the EXTENDED set, the SPECIAL FUNCTION set and the ADMINISTRATIVE set. The core set includes load and store of the stack values and arithmetic operators: add, subtract, multiply, divide and compare. The extended set is for loading and storing three special formats (see Section 3). The special function set includes square root and transcendental function support. The administrative instructions are used for context switching and processor control. Most of the instructions will be described in more detail as the 8087 design goals are explained.

3.0 DESIGN GOALS

The 8087 is designed to achieve several major goals. First, the 8087 conforms to an improved and expanded version of Intel's standard for floating-point arithmetic [2]. Second, the 8087 provides significantly more capability than mainframe and minicomputer floating-point processors and consequently has applications beyond scientific computation. Third, the 8087 is convenient to use in assembly language and easy to generate code for in high level language. And finally the capabilities of VLSI are used to provide all this functionality with high performance and efficiency in a single device.

3.1 Floating-Point Standard

The Intel floating-point standard, called the REALMATH standard, was originally specified in 1977 [2] and implemented in several products (FPAL, SBC-310, FORTRAN-80, BASIC-80). At about that time an IEEE committee was formed to propose a floating-point standard for microprocessors. Intel was invited to participate and offered its standard for consideration.

At the time this paper was written it had become apparent that the majority of the committee had agreed on a revised and expanded version of Intel's standard [3]. The standard specifies data formats, rounding algorithms and exception handling.

The standard specifies and the 8087 supports three floating-point data types: Real (single precision), Long Real (double precision) and Temporary Real (extended precision). All formats are binary and each has a biased exponent. The values represented by the three formats are shown below.

[Figure: bit layouts of the three floating-point formats.]
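
To make the encoding concrete, here is a small Python sketch that decodes a 32-bit Real operand field by field. It assumes the usual reading of a biased-exponent binary format: a sign bit, an 8-bit exponent with bias 127, and a 23-bit fraction with an implicit leading one for normalized values. The function name `decode_real32` and the `(kind, value)` return convention are inventions for this illustration. Long Real follows the same pattern with wider fields, while Temporary Real carries its leading bit explicitly.

```python
import struct
from fractions import Fraction

def decode_real32(word):
    """Split a 32-bit pattern into its fields and return (kind, exact value)."""
    sign = -1 if (word >> 31) & 1 else 1
    e = (word >> 23) & 0xFF          # 8-bit biased exponent
    f = word & 0x7FFFFF              # 23-bit fraction
    if e == 0xFF:                    # exponent all ones: infinity or NAN
        return ("infinity", sign) if f == 0 else ("nan", None)
    if e == 0:                       # zero or denormalized: no implicit leading one
        kind = "zero" if f == 0 else "denormal"
        # A Fraction cannot carry the sign of zero, so -0 decodes as 0 here.
        return (kind, sign * Fraction(f, 2**23) * Fraction(2) ** -126)
    # Normalized: implicit leading one, exponent bias 127.
    return ("normal", sign * (1 + Fraction(f, 2**23)) * Fraction(2) ** (e - 127))

# Cross-check against the host's own single-precision decoding.
word = 0x40490FDB                    # a bit pattern close to pi
host = struct.unpack("<f", struct.pack("<I", word))[0]
kind, value = decode_real32(word)
print(kind, value == Fraction(host))
```

Working in exact rationals keeps the decoding free of any rounding of its own, so the comparison with the host value is an exact equality test.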

                  REAL             LONG REAL        TEMP REAL
TOTAL LENGTH      32 bits          64 bits          80 bits
EXPONENT LENGTH   8 bits           11 bits          15 bits
EXPONENT BIAS     127              1023             16383
VALUE             [not legible]
INFINITY          e=11...1, f=0    e=11...1, f=0    e=11...1, i=1, f=0
NOT-A-NUMBER      e=11...1, f≠0    e=11...1, f≠0    e=11...1, i=1, f≠0
(NAN)

The Temporary Real format (identical to the 8087 register format) is intended to hold intermediates and to support accurate Long Real calculations. It has an explicit leading bit (i) in the significand thus allowing unnormalized arithmetic. However, the algorithms are designed so that normalized operands will always yield normalized results.

The algorithms specified by the standard require that the completely precise result of an operation be rounded to the nearest representable number, breaking ties by rounding to the nearest even number. This default mode of rounding is called "unbiased round to nearest". There are optional "directed rounding" modes that are specified to yield

1. the nearest neighbor less than or equal to the true result.

2. the nearest neighbor greater than or equal to the true result.

The 8087 provides these rounding modes as controlled by a field (RC) in the CONTROL WORD.

The 8087, which does all calculations in Temporary Real format, has another field in the CONTROL word for specifying the precision to which a result is rounded (PC). Thus, the precision of results is independent of the precision of operands and, though held in Temporary Real format and benefitting from extended range, may be forced to Real, Long Real or Temporary Real. This control is provided for languages that do not allow extended precision intermediates and to allow the same code to be run under different precision settings as an aid to error estimation.

The standard also specifies that all exceptions must be detected and that an implementation should permit exception handling. The 8087 supports this by detecting six types of exceptions and by generating an interrupt if the exception is not masked. If an interrupt is generated, the interrupt procedure (exception handler) has available the exception flags, a pointer to the instruction causing the interrupt and a pointer to the datum if memory was addressed. The six exceptions, each of which has an associated "sticky" flag (once set it remains set until reset by software), are listed below.

1. I : invalid operation
   this exception is signaled by stack overflow or underflow, the use of a NAN as an operand and several other cases as listed in [3]

2. D : denormalized operand
   at least one operand is denormalized

3. Q : zero divisor
   the dividend is finite and nonzero while the divisor is zero

4. O : overflow
   the exponent of the result is too large for the destination's format

5. U : underflow
   the exponent of the result is too small for the destination's format

6. P : inexact result
   the delivered result is not equal to the completely precise result but has been rounded

Since the default response to overflow and zero divisor is to set the result to ∞, the 8087 supports two modes of infinity arithmetic:

1. affine - there are two infinities, one (-∞) less than all other numbers and one (+∞) greater

2. projective - there is only one infinity (the sign on ∞ is ignored) which closes the number system analogous to the point at ∞ on the Riemann sphere.

These two modes require the representation of two zeros (±0) which are "equal" in comparison and all other operations except division where +1/+0 = +∞ and +1/-0 = -∞. The mode of infinity arithmetic is determined by a field (IC) in the CONTROL word.

There are instructions that support the standard by controlling rounding, precision and infinity arithmetic and by permitting complete exception handling. These instructions load and store either the control word or the entire environment and store the exception flags.

The features and instructions discussed above support the Intel floating-point (REALMATH) standard but additional capability is also desired.

3.2 Capability Extension

The 8087, by supporting the required and optional aspects of the standard and by supporting several features not mentioned by the standard, significantly extends the capabilities of the 8086 family beyond that expected from a typical floating-point processor. These extensions include additional data types, provision of exact arithmetic, support for interval arithmetic and special functions.
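
The interaction of rounding control (RC) and precision control (PC) described in Section 3.1 can be sketched with exact rational arithmetic. The Python function below rounds an exact value to a chosen number of significand bits (64, 53 or 24 for Temporary Real, Long Real and Real) in one of the four directions. It is only a model under stated assumptions: the name `round_to_precision` and the mode strings are hypothetical, the exponent range is treated as unbounded (so overflow and underflow never occur), and nothing here reflects how the 8087 performs rounding internally.

```python
import math
from fractions import Fraction

def round_to_precision(x, bits, mode="nearest"):
    """Round the exact value x to `bits` significant binary digits."""
    x = Fraction(x)
    if x == 0:
        return x
    mag = abs(x)
    # Find e with 2**e <= |x| < 2**(e+1), using integer arithmetic only.
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)   # spacing of representable neighbors
    q = x / ulp                           # exact; 2**(bits-1) <= |q| < 2**bits
    if mode == "nearest":
        m = round(q)                      # Fraction rounds ties to even
    elif mode == "down":
        m = math.floor(q)                 # toward minus infinity
    elif mode == "up":
        m = math.ceil(q)                  # toward plus infinity
    elif mode == "zero":
        m = math.trunc(q)                 # toward zero
    else:
        raise ValueError(mode)
    return m * ulp

third = Fraction(1, 3)
lo = round_to_precision(third, 24, "down")
hi = round_to_precision(third, 24, "up")
print(lo < third < hi, hi - lo == Fraction(1, 2**25))
```

Rounding a lower endpoint down and an upper endpoint up, as done for `lo` and `hi`, brackets the true value within one unit in the last place. That is the mechanism by which the directed modes support interval arithmetic.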

The 8087 addresses seven different data types using all of the 8086 addressing modes. These data types are:

1. Real (32 bits)

2. Long Real (64 bits)

3. Temporary Real (80 bits)

4. Integer Word (16 bits 2's complement)

5. Integer (32 bit 2's complement)

6. Long Integer (64 bit 2's complement)

7. Packed BCD Integer (80 bits, 18 digits and sign)

All of the data types, when used as operands, are first converted (without rounding error) to Temporary Real and the result of the operation is also returned as Temporary Real. Thus the 8087 arithmetic unit only has to work with one kind of data. When results are desired in one of the other formats, they are automatically converted to that type before they are stored in memory.

The provision of exact arithmetic is accomplished by including the inexact exception (P) along with its mask. If a rounding error is committed, the correctly rounded result is delivered and the P flag is set. If the mask (PM) is zero an interrupt is generated, otherwise execution simply continues. This permits financial accounting functions to be performed without fear of roundoff error. Exact arithmetic is also useful in doing coefficient "preconditioning" [see 4].

The support of interval arithmetic is considered one of the most important features of the 8087. As stated by W. Kahan [5]:

    "No other feature would enhance safe numerical computation more than the provision of INTERVAL as a data type in FORTRAN as readily accessible as INTEGER or REAL."

This new INTERVAL data type, which the 8087 supports through the rounding modes (RC) and the signed zeros and infinities, can be represented as an ordered pair: INTERVAL, I = [a,b]. If a ≤ b then I includes all numbers between a and b; but if a > b then I includes all numbers x where x ≥ a or x ≤ b. An illustration may help clarify the concept. Consider the set of numbers as a circle with the two cases described above pictured as

[Figure: the number circle, drawn once for each of the two cases.]

Start at a and proceed clockwise until b is reached; all numbers covered belong to I. The signs on zero and infinity permit us to have open or closed intervals when zero or infinity is an end point with the sign denoting which case pertains. If an endpoint is neither zero nor infinity then the interval is always closed. A complete definition of interval arithmetic cannot be given here; however, we can list some of its uses. In addition to its obvious ability to bound rounding errors, interval arithmetic can be used to estimate the effect of noise in data, to compute confidence intervals and to do worst-case analysis.

In addition to exact and interval arithmetic, the 8087 provides several special instructions for efficient evaluation of many important mathematical functions with unprecedented accuracy. One of these instructions is square root. It overwrites the contents of the top of stack with its correctly rounded (according to RC and PC) square root. Besides being correctly rounded the square root operation is as fast as the divide instruction. Thus algorithms need not be contorted to remove square roots.

There are two instructions to aid in argument reduction for transcendental function evaluation: DECOMPOSE and REMAINDER. The decompose instruction overwrites the contents of the top of stack with the integral value of its exponent in Temporary Real format, decrements the stack pointer and loads into the new top of stack the value of the significand of the original stack top scaled between 1 and 2 (or -1 and -2 if negative). The operation is illustrated below.

[Figure: DECOMPOSE. The original TOP is replaced by its exponent and the scaled significand becomes the new TOP.]

(If the original top of stack is zero then both results are zero.)

The remainder instruction is for reducing arguments of periodic functions to a primary range. It calculates the exact remainder (no roundoff error) of the top two stack elements:

    REM = (TOP) modulo (next-of-TOP)

The remainder is returned to the stack top and the next-of-TOP ("divisor") is not changed. Since the execution of a floating-point remainder could be very lengthy, the remainder instruction is actually a primitive: the result is either the remainder or the partial remainder after a fixed number of steps. Thus to compute a remainder requires a software loop that terminates when |(TOP)| is less than |(TOP+1)|. Even by using remainder we will not have trigonometric functions with period 2π since π cannot be exactly represented in the 8087. However, the functions will be exactly periodic with
period 2π* (where π* is the machine approximation to π) and thus will obey the identities that do not explicitly involve π.

The other instructions provided for special functions are TANGENT, ARCTANGENT, EXPONENTIAL and LOGARITHM.

The tangent assumes the top of stack, X, is between zero and π/4 and returns two results as shown:

[Figure: TAN. The argument X at TOP is replaced by two results on the stack.]

The arctangent works in reverse by using two arguments and returning one:

[Figure: ATAN. The arguments are Y and X (TOP = X); the result left at TOP is arctan(Y/X).]

The exponential instruction, which calculates 2^X - 1, assumes that 0 ≤ X ≤ 1/2 and overwrites the argument on the top of the stack with the result. The logarithm function, which computes Y * log2(X), uses two arguments and returns a single result as shown

[Figure: LOG. The arguments are Y and X (TOP = X, X > 0); the result left at TOP is Y * log2(X).]

The error bound for all these functions is about 2 units in the last place thus allowing for Long Real arguments to be computed to Long Real accuracy. The provision of the described special functions supports the goal of increased capability.

3.3 Ease of Use

As stated above, ease of use, along with support of the standard and extended capability, is a major 8087 goal. We have made the 8087 easy and convenient for programmers and automatic code generators by providing software emulation, a deep (8 levels) internal stack of very wide precision (64 bits) and large range (10^±4900), optimized symmetric mixed mode arithmetic and on chip default exception handling.

The interface between the 8086 (8088) and 8087 allows for software emulation of the 8087 permitting software for the 8087 to be developed, debugged and executed on a system containing only an 8086 (8088). In order to run the developed software on an 8087 it is not necessary to recompile but only relink. To understand how one can delay the resolution of either 8087 or emulator until the link stage, it is necessary to explain the 8086-8087 interface.

The 8086 (8088) has a set of ESCAPE instructions that, in memory addressing mode, cause the 8086 to calculate the address and read the contents of that address. The 8086 ignores the word it reads and then proceeds to execute subsequent instructions. The 8087 is monitoring the same instruction stream and when it detects an ESCAPE it knows that it is being instructed to do something. It latches the opcode and if there was an address calculated the 8087 captures both the address and the datum read by the 8086. By decoding the instruction the 8087 knows how many more words it needs from memory and it increments the address and fetches data until all required data is read. The 8087 then releases the bus and begins calculating while the 8086 continues executing the instruction stream. Because of the overlapped coprocessing of the 8086-8087 it is necessary to precede 8087 instructions (ESCAPE) with a WAIT instruction in order to synchronize the two processors. In place of the WAIT, when the software emulator is to be invoked, an INTERRUPT instruction is inserted. There are some other differences between the hardware and software interfaces but they are the same length and use the same addressing mechanism. This permits a compiler to output an external reference instead of the WAIT-ESCAPE and let the LINKER fill in with either WAIT-ESCAPE or INTERRUPT depending on whether the user has an 8087 or desires to use the emulator.

In addition to software emulation to aid software development, the 8087 has an eight level stack of registers that supports the Temporary Real (80 bit) format and makes the 8087 far easier to use than other floating-point processors. All calculations are done in this extended format and as long as intermediates are kept in the stack or its equivalent memory format (if eight is not enough) then the threat of roundoff damage and risk of overflow or underflow is greatly reduced. Roundoff error is reduced because Temporary Real intermediates are more precise than Long Real data or final results by eleven guard bits. Most overflows and underflows occur on intermediate calculations and the extended range of Temporary over Long Real (10^±4900 vs. 10^±308) ensures that on intermediates these exceptions need seldom, if ever, occur.

The symmetric mixed mode instruction set also contributes to ease of use. The CORE instructions, which include LOAD, STORE & POP, STORE, ADD, SUBTRACT, SUBTRACT REVERSE, MULTIPLY, DIVIDE, DIVIDE REVERSE, COMPARE, and COMPARE & POP, take one operand from the top of stack and a second operand from either memory or a stack element. There are thus two forms of CORE instructions: memory addressed and stack addressed. The memory addressed form supports four memory formats in all 8086 addressing modes:

    Integer Word (16 bit 2's complement)
    Integer (32 bit 2's complement)
    Real (32 bit)
    Long Real (64 bit)
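
The four memory formats accepted by the memory addressed form can be illustrated with Python's `struct` module. The sketch below reads an operand from raw bytes and converts it to an exact rational number, mirroring the point that memory operands are widened to Temporary Real without rounding error. The helper name `load_operand`, the format labels and the use of `Fraction` as a stand-in for Temporary Real are assumptions of this example; little-endian byte order matches the 8086.

```python
import struct
from fractions import Fraction

# struct codes for the four memory formats (little-endian)
FORMATS = {
    "integer_word": "<h",   # 16 bit 2's complement
    "integer":      "<i",   # 32 bit 2's complement
    "real":         "<f",   # 32 bit
    "long_real":    "<d",   # 64 bit
}

def load_operand(fmt, raw):
    """Convert a finite memory operand to an exact value (no rounding)."""
    (value,) = struct.unpack(FORMATS[fmt], raw)
    return Fraction(value)   # int and finite float both convert exactly

# Mixed-mode add: a Long Real accumulator plus a 16-bit integer operand.
acc = load_operand("long_real", struct.pack("<d", 0.1))
acc += load_operand("integer_word", struct.pack("<h", -3))
print(acc == Fraction(0.1) - 3)
```

The sum here is kept exact, whereas the 8087 would round it according to RC and PC. The point of the sketch is only that the conversion step itself introduces no error, whichever of the four formats the operand arrives in.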

The LOAD Integer instruction converts an integer to Temporary Real format and pushes it on the stack; the ADD Long Real instruction converts a Long Real operand to Temporary Real and adds it to the top of the stack; and the STORE Integer Word instruction converts the top of stack to a 16 bit integer and stores it in memory (without altering the contents of the stack).

The stack addressed form of the CORE instructions obtains the second operand from one of the stack elements instead of memory. The reference is always relative to the top of stack; thus stack element i, where i = 0, ..., 7, refers to the ith element of the stack under the top of stack. The stack addressed form has two options for the destination of the result. The result can either overwrite the top of stack or replace the contents of the ith stack element depending on the setting of the "direction" (D) bit in the instruction. If the destination is the ith stack element then depending on the setting of another bit (the "pop" (P) bit) the stack is popped or left unaltered.

The EXTENDED instruction set consists of two memory addressed types of instructions, LOAD and STORE & POP, that support three additional memory formats:

    Long Integer (64 bit 2's complement)
    Temporary Real (80 bit)
    Packed BCD (80 bit)

The Temporary Real format is supported for extending the 8087 stack to memory when necessary; the Packed BCD format, which is a signed 18 digit integer as shown, is used to aid binary-decimal conversion and COBOL type calculations; and the Long Integer format is supported for applications requiring very wide precision exact computation. Again it is important to note that conversion of these formats to Temporary Real is done with no rounding error.

[Figure: Packed BCD format, a sign and 18 decimal digits.]

Another instruction, included to make the 8087 easy to use, is in neither the CORE nor the EXTENDED set but its value is obvious. That instruction is EXCHANGE top of stack with the ith stack element. This instruction has no memory form and ignores the D and P bits.

A further user convenience in the 8087 is its on-chip default exception handling. Though it is possible to handle exceptions with software, it is often an onerous task to write, debug and maintain exception handlers. The default 8087 response to an exception is invoked by masking in the CONTROL WORD that exception. The 8087's response to masked exceptions balances safety with the utility of continued calculation. Listed below are the default responses to masked exceptions:

1. Invalid Operation - if either operand is NAN, the 8087 propagates the lexicographically larger (ignoring the sign) otherwise it generates a special NAN called INDEFINITE as the result.

2. Denormalized Operand - the operand is converted to an equivalent unnormalized representation preserving the same number of leading zeros.

3. Zero Divisor - since the dividend is nonzero the result is ±∞ with the sign set in the usual way (XOR of the signs of the operands).

4. Overflow - the result is ∞ with the sign of the overflowed result.

5. Underflow - the result is denormalized to fit the destination's format ("gradual underflow" [4]).

6. Inexact Result - the correctly rounded result is returned.

All of the features discussed above: software emulation, deep Temporary Real stack, symmetric and powerful instruction set and default exception handling, make the 8087 easy and convenient to use; but to be useful it must also be efficient.

3.4 Efficiency

Efficiency was a major goal in the design of the 8087. An extensive treatment of the internal hardware and algorithms will be given elsewhere, but a brief description will illustrate our concern for performance. The 8087's main ALU is more than 64 bits wide. This is to handle efficiently 64 bit operands with guard, round and sticky bits [6] and at least one overflow bit. Its shifter can shift right or left from 0 to 63 places in one clock cycle. This is useful for formatting, normalizing and denormalizing and for the transcendental functions. For normalizing there is hardware for detecting the position of the most significant one. Finally, there is special hardware to permit multiply, divide, remainder and square root to be calculated rapidly. Approximate speeds of the basic operations for stack operands are summarized below:

                            5 MHz
                            Microseconds
    COMPARE                 5
    ADD (MAGNITUDE)         10
    SUBTRACT (MAGNITUDE)    16
    MULTIPLY                16, 24*
    DIVIDE                  38
    SQUARE ROOT             38

    * shorter time if either operand was originally Real (32 bit)

The above timings apply for Real, Long Real or Temporary Real operands and results. The previously described overlapped instruction execution by the 8086 and 8087 also increases throughput. However, more important than absolute execution speeds is the stack with its internal addressing
that minimizes memory referencing. There is an instruction for scaling that is much faster than multiply. For rapid context switching, the 8087 has SAVE and RESTORE instructions. The instruction set and the hardware to execute it rapidly give the 8087 very high performance without sacrificing quality.

4.0 CONCLUSION

To illustrate the capabilities of the 8087 an extensive set of programs would be very useful. We will here give two examples that should reinforce many of the points made earlier. The first example is to calculate the length of a vector. The task is conceptually simple but a reliable, robust program for the typical floating-point system is hard to produce. With the 8087 it is easy, almost automatic, to produce such a program.

    Temporary Real : SUM
    Long Real : X(I), L
    SUM := 0
    For I = 1 to N Do
        SUM := SUM + X(I)**2
    L := SQRT(SUM)

This program is free from intermediate overflow or underflow problems and unless N is very large its only significant rounding error is in the last instruction - where it is unavoidable but easy to analyze.

The second example demonstrates how several accumulations can be calculated efficiently in the 8087. If we have two sets of data, Xi and Yi, that we want to analyze, we very likely will want means, standard deviations and correlation coefficients. We thus want to calculate:

    Mx = Σ xi    My = Σ yi    Sx = Σ xi^2    Sy = Σ yi^2    Cxy = Σ xi yi

In an ordinary stack machine, the five values listed above would probably be calculated in five separate passes through the data requiring that each datum be read three times.

On the 8087 the five values can be calculated in one pass through the data, requiring that each datum be read only once. The architectural feature that permits this increase in efficiency is the ability to do arithmetic with operands from any stack element. The algorithm is described below.

    STEP  ACTION

    0.    Clear five stack elements (push zero five times): Mx, My, Sx, Sy, Cxy
    1.    LOAD Xi
    2.    Add TOP (Xi) to Mx
    3.    Duplicate TOP of stack
    4.    Square TOP
    5.    Add TOP (Xi^2) to Sx and POP
    6.    LOAD Yi
    7.    Add TOP (Yi) to My
    8.    Multiply TOP (Yi) to Xi
    9.    Square TOP
    10.   Add TOP (Yi^2) to Sy and POP
    11.   Add TOP (XiYi) to Cxy and POP
    12.   Loop to Step 1

The inner loop of this program has only eleven 8087 instructions and has the same properties of reliability and robustness as the first example. It is also efficient since the minimum computation and memory addressing is done.

The Intel 8087 Numeric Data Processor, along with its design goals of meeting Intel's REALMATH standard, and providing increased capability, ease of use and performance, has been described. We have attempted to balance safety and utility and have provided an unprecedented level of capability, accuracy and reliability in a math processor.

5.0 ACKNOWLEDGEMENTS

There are a great number of people who deserve recognition for their contribution to the 8087. The initial architectural design was the joint work of the author and Bruce Ravenel, relying heavily on the advice of Professor W. Kahan. Robert Koehler made significant contributions to the systems aspects of the 8087 and Janis Baron was responsible for designing the assembly language and implementing the emulator. A great deal of credit must go to Rafi Nave and his team in Intel Israel for implementing the 8087 and to Dai-Sun Tsien for carefully reviewing and checking the implementation. Perhaps most significant of all, we acknowledge the management of Intel for being willing to commit significant resources to both implementation and promotion of a standard for reliable numeric data processing.

BIBLIOGRAPHY

1. Moore, R.E. (1979), "Methods and Applications of Interval Analysis," SIAM Studies in Applied Mathematics, SIAM, Philadelphia.

2. Palmer, J. (1977), "The Intel Standard for Floating-Point Arithmetic," Proc. COMPSAC, 107-112.

3. Coonan, J., W. Kahan, J. Palmer, T. Pittman and D. Stevenson (1979), "A Proposed Standard for Floating-Point Arithmetic," SIGNUM Newsletter, October, 1979.

4. Kahan, W., J. Palmer (1979), "On a Proposed Floating-Point Standard," SIGNUM Newsletter, October, 1979.

5. Kahan, W. (1972), "A Survey of Error Analysis," Information Processing 71, North Holland Publishing Company, 1214-1239.

6. Yohe, J. (1973), "Roundings in Floating-Point Arithmetic," IEEE Trans. Computers, Vol. C-22, No. 6, 577-586.
