PROGRAMMABLE DIGITAL MICROCIRCUITS -
A SURVEY WITH EXAMPLES OF USE

C. Verkerk
CERN, Geneva, Switzerland

1. Introduction

For most readers the title of these lecture notes will evoke microprocessors. The
fixed instruction set microprocessors are however not the only programmable digital
microcircuits and, although a number of pages will be dedicated to them, the aim of these
notes is also to draw attention to other useful microcircuits. A complete survey of
programmable circuits would fill several books and a selection had therefore to be
made. The choice has been to treat a variety of devices rather than to give an in-depth
treatment of a particular circuit. The selected devices have all found useful
applications in high-energy physics, or hold promise for future use.

The microprocessor is very young: just over eleven years. An advertisement announcing
a new era of integrated electronics, which appeared in the November 15, 1971 issue of
Electronics News, is generally considered its birth certificate. The advertisement was
for the Intel 4004 and its three support chips. The history leading to this announcement
deserves to be recalled. Intel, then a very young company, was working on the design of a
chip-set for a high-performance calculator, for and in collaboration with a Japanese
firm, Busicom. One of the Intel engineers found the Busicom design of 9 different chips
too complicated and tried to find a more general and programmable solution. His design,
the 4004 microprocessor, was finally adopted by Busicom, and after further negotiation
Intel acquired marketing rights for its new invention. The firm's marketing department,
however, was not interested in the product, arguing that the new chip would conquer 10%
at most of the minicomputer market, which was then 40 000 yearly. History does not
record what happened to the marketing manager (ref. 1). What happened to the
microprocessor is well known: the 4-bit 4004 was superseded after one year by the first
8-bit processor, the 8008, followed in 1974 by the 8080 and the 6800 and by PACE, the
first 16-bit microprocessor. The years 1978/79 saw the birth of the well known 16-bit
devices: Intel 8086, Zilog Z8000 and Motorola 68000, and also the highly capable 8-bit
6809. Sales also rose with astounding speed: in 1979, 75 million microprocessors were
sold and at the end of the year the total number sold was 110 million. This rate of
growth has been maintained since.

What happened to all these processors? Very few were in fact used as direct
replacements of minicomputers, at least in the beginning. Many were used in a
large variety of control applications, replacing random logic. The overwhelming majority
however ended in new applications which even the most imaginative marketing manager
could not foresee: arcade games, car ignition systems, home appliances, hobby
computers, office automation systems, etc.

The microprocessor was invented to replace random logic and the first to apply it
were electronics engineers, accustomed to logic design, but not well versed in
programming. The programs these engineers developed were in general short and dedicated.
As soon as they were considered to be working, these programs were "petrified" in
a Read-Only Memory (ROM). In the early days most programming was done in
"hexadecimal", i.e. directly in the machine language. The majority of the users were
unaware of the possibilities better programming tools could provide. Even assembly
language programming was believed to be beyond reach both for the engineers and for the
micro-computer systems. (There is a book, published in 1976/77, which dedicated exactly
two of its 300 or more pages to assembly language.) It should therefore be no surprise
that the early generations of microprocessors had rather primitive architectures.
Technological limitations obviously were also an important obstacle to making
supermicros, but they were not the only one, contrary to popular belief.

In more recent applications, such as office automation systems and personal
computers, a better architecture of the microcomputer is required, in order to
support higher-level language programming. These systems are in fact used in much the
same way as mainframes and larger minicomputers are. They therefore run operating
systems, have real peripherals attached to them (hard or flexible disc, alphanumeric or
graphic display, keyboard and high-quality printer) and the user expects to find
high-level language compilers in addition to special application packages. Part of these
notes will attempt to describe how far the modern 16-bit microprocessors satisfy the
requirements of efficient implementation of high-level languages and operating systems.

Besides making it easier to program microprocessors, semi-conductor manufacturers
have also constantly striven to improve their performance. Working with longer
words, as the 16-bit processors do, in itself improves the performance considerably.
Progress in MOS device technology has resulted in speeding up the internal operations
by more than an order of magnitude. Performance improvements were also sought in two
other directions: use of bipolar device technology to overcome the speed limitations of
earlier MOS devices and the use of arithmetic attachments to increase the numeric
processing power. The first road led to the bit-slice microprocessor, whereas the second
ended in the design of a few interesting floating-point arithmetic devices. Both will
receive some attention in these lectures.

The modern microprocessors, where the designer succeeded in putting of the order
of 10^5 transistors on a little chip of silicon, usually 6 x 6 mm^2, are perfect examples
of the so-called "Very Large Scale Integration" (VLSI) technique. That this has been
at all possible is due, in addition to the progress in device technology mentioned
before, to the availability of Computer-Aided Design systems and techniques. The latter,
and the need to train design engineers for the semi-conductor industry, have spurred
the work on VLSI which is now undertaken at a number of universities and research
laboratories. Some interesting results have been obtained, and the last part of these
notes will briefly review a few. The research is mainly oriented towards investigating
novel computer architectures, but some devices for practical use have also been
designed in universities.

A microprocessor chip is not a viable object, generally speaking, without its
complement of support chips: memory in its different appearances, interface adapters,
error detection circuits, analogue-to-digital converters, etc. Many of these support
circuits have wide possibilities for control by the microprocessor and should therefore be
included in the class of programmable digital microcircuits. A few will be mentioned,
mainly because they have found uses in high-energy physics experiments.

Table I

Classification of the processor chips

PROCESSORS
      CALCULATOR CHIPS
    * FIXED INSTRUCTION SET MICROPROCESSORS
            1 bit
            4 bit
          * 8 bit
          * 16 bit
            32 bit
      SINGLE CHIP MICROPROCESSORS
    * BIT-SLICES
          * (R)ALU
            SEQUENCER
            SUPPORT (MAU, DMA)
            SPECIAL (FFT)
      VLSI
            SPECIAL PURPOSE (CIPHER, * GRAPHICS, * SYSTOLIC, MOUSE)
            TREE MACHINES
          * SIMPLE INSTRUCTION SET (LISP, RISC)
            NON-VON NEUMANN

Table II

Examples of support chips

PROCESSOR SUPPORT
    * MEMORY (FIRMWARE CHIPS)
          * RAM
            ROM
            EPROM
      TIMERS/EVENT COUNTERS
    * ARITHMETIC
            MULTIPLIERS
          * FLOATING POINT
    * MEMORY MANAGEMENT
      PLAs, PALs

I/O SUPPORT
      INTERRUPT CONTROLLER
      DMA CONTROLLER
      GENERAL INTERFACE
            PARALLEL
            SERIAL
      KEYBOARD, LED DISPLAY
      DISC, CASSETTE INTERFACE
      CRT CONTROLLER, CHARACTER GENERATOR
      DOT-MATRIX PRINTER CONTROL
      COMMUNICATION (HDLC, ETHERNET, ETC.)
      CYCLIC REDUNDANCY CHECK
      ENCRYPTION, DATA SECURITY
      INTER-BUS CONNECTION, TRANSCEIVERS
      IEEE-488 TRANSCEIVERS
      BURST-ERROR DETECTION
      STEPPER MOTOR
      DIGITAL --> ANALOG
      ANALOG --> DIGITAL
      SPEECH
            SYNTHESIS
            ANALYSIS

Finally, if we were to make some classification scheme of the whole area of
programmable digital microcircuits, we could distinguish between the processors proper and
the support chips. The processors can be further subdivided into user-programmable
and fixed program processors (e.g. the marvels you find inside your electronic watch).
Still finer sub-divisions of the user-programmable processors and the support chips are
shown in Tables I and II. The devices which will receive some attention in the following
pages are marked with an asterisk.

2. Bit-Slice Microprocessors.

The Field Effect Transistor, implemented in MOS technology, made it possible to put
a complete processing unit on a single chip: first (1970) 4 bits wide, soon 8 bits
(about 1973) and later (1978) 16 bits. Constant improvements in packing density and power
dissipation of the transistors made this progress possible. The earlier microprocessors
were however dramatically lacking in power when compared to even the lowest range of the
computers then available. The processing speed could only be improved by using bipolar
technologies (signal propagation times through a gate of 1-3 ns, compared to about 10 ns
for MOS technology around 1974), but these did not allow the packing density and the
power dissipation needed. Instead of a complete 8-bit processor, including instruction
decoding and control, interrupt handling etc., which was possible in MOS technology,
either TTL or ECL technologies allowed at best the implementation of a 4-bit
wide Arithmetic and Logic Unit with its control and a few registers. To make use of
such an ALU in a processor, external control circuitry is needed to ensure proper
execution of the instructions in a program. On the other hand, longer data-words could
be easily handled by concatenating several chips, as many as the number of bits in the
data word required: eight 4-bit chips put together produce a 32-bit processor. The
circuit represents therefore a 4-bit wide vertical "slice" of the 32-bit processor, including
its internal registers used in arithmetic operations (such as accumulators or general
registers and condition code registers).

One should note that building a 16- or 32-bit wide processor from 4-bit wide slices
is different from the way processors used to be built. The ALU used to be a pure
combinatorial device, without internal storage capability. The accumulators or general
purpose register files used to be built up from chips containing flip-flops. The parts
indicated in Figure 1 in the block marked Arithmetic Processor Unit and consisting of
ALU, register file and shifter were separate items. The control unit saw to it that the
operands were picked up from the register file, transported to the inputs of the ALU
and that the result was correctly routed back to its destination in the register file. (For
simplicity we consider only register-to-register operations for the moment.) This
necessitated not only control signals to be sent to the ALU, to indicate what operation (add,
subtract, AND, OR, etc.) to perform, but also control of the register file, of the data
paths and timing signals.

The bit-slice RALU simplifies this control in the following way: the external control
signals specify the operation and its operands; the setting-up of the data paths, of the
ALU control and the necessary clocking is done internally to the slice. All slices in the
processor therefore receive the same control signals (see figure 2). The slices cannot
operate completely independently of each other, however. Carries generated in
arithmetic operations must be passed from one slice to another and the same must happen to
bits shifted out of a slice in shift operations. In short, one can thus say that slices
combine ALU, data paths and registers on a single chip, split the overall data busses, share
control lines and propagate status.
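
The cooperation between slices can be illustrated with a small software model. The
following Python sketch is not a description of any particular chip: the function names
and the three operations are assumptions chosen for illustration. Every slice receives
the same operation code, works on its own 4 bits of the operands, and hands its carry to
the next more significant slice.

```python
SLICE_BITS = 4
SLICE_MASK = (1 << SLICE_BITS) - 1


def slice_alu(op, a, b, carry_in):
    """Model one 4-bit slice; returns (4-bit result, carry out)."""
    if op == "ADD":
        total = a + b + carry_in
        return total & SLICE_MASK, total >> SLICE_BITS
    if op == "AND":
        return a & b, 0
    if op == "OR":
        return a | b, 0
    raise ValueError(f"unknown operation {op!r}")


def sliced_alu(op, a, b, n_slices=4):
    """Combine n_slices 4-bit slices into one wide ALU.

    All slices get the same operation code (shared control lines); the
    carry ripples from the least significant slice upwards.
    """
    result = 0
    carry = 0
    for i in range(n_slices):  # slice 0 holds the least significant bits
        shift = i * SLICE_BITS
        a_part = (a >> shift) & SLICE_MASK
        b_part = (b >> shift) & SLICE_MASK
        part, carry = slice_alu(op, a_part, b_part, carry)
        result |= part << shift
    return result, carry
```

Calling sliced_alu("ADD", a, b) models the 16-bit arrangement of four slices; passing
n_slices=8 gives the 32-bit case mentioned earlier. A real design would normally use
carry look-ahead rather than the simple ripple carry assumed here.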

Fig. 1 Schematic diagram of a computer, consisting of a Central Processing
Unit, Memory and Input-Output. (Blocks shown: Central Processing Unit (CPU), Computer
Control Unit (CCU), Register File, Arithmetic Processor Unit (APU), Program Control
Unit (PCU), Interrupt Controller, DMA Controller, External World.)

Fig. 2 Diagram showing how a 16-bit ALU can be built from four 4-bit slices.
(Signals shown: operand inputs A and B, operation select OP, carry Cin and Cout.)

So far we only spoke about the RALU slices, leaving the overall control aside. The
task of the control unit is to provide a sequence of control signals, corresponding to
the sequential execution of program instructions. The control unit should thus fetch
the next instruction and generate the necessary control signals for its execution. It
should however also be able to deviate from the sequential execution by making
conditional or unconditional branches, by making subroutine calls and returns, and possibly
by handling external and internal interrupts. The control unit will thus become a
complicated device and the number of bits it can manipulate will again be limited by
technological constraints. Control units - usually called sequencers - can however be
concatenated to obtain wider words. The words handled by the sequencer are instruction
addresses and longer words in this case mean that larger programs can be
accommodated.

Fig. 3 Diagram of a processor built from bit-slices. Both the arithmetic
(RALU) and the control (sequencer) parts are made up from slices. (Blocks shown: I/O
unit, peripheral, control section, processing section, status flags.)

Taking a number of RALU and sequencer slices we can build a processor (see
figure 3) which is not only faster than the microprocessors of the 70's, but which in
addition can have nearly unlimited wordlength and program size. Apart from the number
of chips needed, there is a more fundamental difference with the normal
microprocessor: the instruction set the processor will recognize is not uniquely
defined by the semiconductor manufacturer. The designer has considerable liberty in
putting the parts together and in deciding which bits of an instruction control what.
Moreover, the pieces of instructions the manufacturer has defined (e.g. the control
codes for the RALU and the control of flow in the sequencer) are at a more primitive
level than most conventional machine instructions to which the assembly language
programmer is accustomed. For this reason code at this level is said to be composed
of micro-instructions. Another important fact to realize is thus that bit-slices require
microprogramming, a level of programming well below normal assembly languages. (This
does not exclude that other levels of programming can be put on top of the microcode,
as we will see later.)

To make the distinction between an assembly language program and microcode clear,
let us consider the following example: assume we have a machine with a number of
general purpose registers, say R0-R7, and we want to add the contents of R0 to the
contents of a memory location. The address L of the memory location is held in register
R2: (R2)=L, where (R2) means "contents of R2". In assembly language this could be
written as

    ADD @R2, R0.

Translated into, for instance, PDP-11 machine code, this would result in a 16-bit
instruction, which would do exactly what was described above. In microcode, this simple
operation would have to be broken down into several small steps, the same steps indeed
that occur internally in the PDP-11. Assuming that the same 16-bit instruction code is fed
into a bit-slice machine, then the steps to be executed by the microprogram could be:

Fetch and Decode sequence common to all instructions

1. (PC) --> MAR                      ; Transfer contents of Program Counter to
   initiate memory read                Memory Address Register to fetch new
                                       instruction.
2. (MDR) --> IR                      ; Transfer contents of Memory Data Register
                                       to Instruction Register.
3. Decode contents of IR             ; Decode operation code and addressing
   (PC) + 1 --> PC                     modes. Result is an address M in
                                       microstore. Update the program counter.
4. Transfer microprogram control to the sequence of micro-instructions which will
   execute the instruction held in IR (i.e. transfer control to microcode address
   M found in step 3).

Now follow the steps for the ADD sequence for the particular addressing mode.

5. (R2) --> MAR                      ; Contents of R2 to MAR.
   initiate memory read
6. (MDR) --> A input of ALU          ; Contents of memory data register to one
   (R0) --> B input of ALU             and of R0 to the other input of the ALU.
   set up ALU control for "ADD"
7. output of ALU --> MDR             ; Output goes back to memory.
   initiate memory write
8. go back to step 1                 ; Go back to fetch next instruction.

Many steps are needed for such a simple operation. Note that not all steps take an
equal amount of time. Depending on the speed of the memory there may be considerable
delays between steps 1 and 2 and between steps 5 and 6. Also note that it is assumed that
there are sufficient data paths available so that the actions in step 6 may be done
simultaneously.
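
The eight steps can be followed in a small software model. The Python sketch below is
only an illustration under stated assumptions: the class Machine, the routine names and
the encoding of an instruction as a tuple (operation, address register, data register)
are hypothetical, and memory delays are ignored. The step numbers in the comments refer
to the list above.

```python
WORD_MASK = 0xFFFF  # 16-bit data words, as in the example


class Machine:
    """Hypothetical register-level model: PC, MAR, MDR, IR and R0-R7."""

    def __init__(self, memory):
        self.mem = dict(memory)
        self.reg = [0] * 8
        self.pc = 0
        self.mar = 0
        self.mdr = 0
        self.ir = None

    def memory_read(self):
        self.mdr = self.mem[self.mar]

    def memory_write(self):
        self.mem[self.mar] = self.mdr


def add_to_memory(m, addr_reg, data_reg):
    """Microroutine for ADD with the address held in a register."""
    m.mar = m.reg[addr_reg]            # step 5: (R2) --> MAR
    m.memory_read()                    #         initiate memory read
    a_input = m.mdr                    # step 6: (MDR) --> A input of ALU
    b_input = m.reg[data_reg]          #         (R0) --> B input of ALU
    alu_output = (a_input + b_input) & WORD_MASK   # ALU control "ADD"
    m.mdr = alu_output                 # step 7: output of ALU --> MDR
    m.memory_write()                   #         initiate memory write
    # step 8: returning hands control back to the fetch sequence


# The "microstore": maps a decoded operation to its microroutine (address M).
MICROSTORE = {"ADD": add_to_memory}


def execute_next(m):
    """Fetch and decode sequence common to all instructions."""
    m.mar = m.pc                       # step 1: (PC) --> MAR
    m.memory_read()                    #         initiate memory read
    m.ir = m.mdr                       # step 2: (MDR) --> IR
    opcode, addr_reg, data_reg = m.ir  # step 3: decode contents of IR
    routine = MICROSTORE[opcode]       #         result: entry point M
    m.pc += 1                          #         (PC) + 1 --> PC
    routine(m, addr_reg, data_reg)     # step 4: transfer control to M
```

With the instruction ("ADD", 2, 0) stored at address 0, R2 holding the address 100 and
R0 holding 2, one call of execute_next adds 2 to the contents of location 100.
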

What do we really have to do to execute these atomic actions? Look at steps 1 and 5.
We see that the source for loading the MAR can be the PC or one of the general
registers. This means that the correct register must be selected, and its output gated to
the inputs of the MAR. This implies enabling gates and setting up multiplexers, in order
to route the data to its destination. An important task for a micro-instruction is thus
to set up the route for the data, in much the same way as a railway signalman sets
the points to direct an incoming train to the correct platform. Another task is to enable
the MAR inputs, so that the next clock pulse will actually strobe the data into the
register. All this control information is provided by the micro-instruction's fields. The
micro-instruction itself can also contain the address of the next micro-instruction to be
executed and thus define the flow of control of the microprogram. What has been
described so far is exactly the micro-programming model proposed in 1951 by Wilkes and
shown in figure 4. Each clock pulse causes a new microcode word to be strobed into the
decoder and thus new control information to become available together with the next
address.

Fig. 4 Wilkes microprogramming model. Each microinstruction contains
control information together with the address of the next microinstruction.
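
The Wilkes scheme can be sketched in a few lines of Python. The control signal names,
the small count-down microprogram and the single status flag are assumptions made for
illustration only; what the sketch is meant to show is that every micro-instruction
carries both its control information and the address of its successor, with an
alternative address that is taken when a status condition holds.

```python
from typing import NamedTuple, Optional


class MicroInstruction(NamedTuple):
    signals: tuple                     # control lines asserted in this clock period
    next_addr: int                     # address of the next micro-instruction
    branch_addr: Optional[int] = None  # used instead when the status flag is set


# Hypothetical microprogram: load a counter and decrement it until it is zero.
CONTROL_STORE = {
    0: MicroInstruction(("LOAD_COUNTER",), 1),
    1: MicroInstruction(("DECREMENT",), 2),
    2: MicroInstruction(("TEST_ZERO",), 1, branch_addr=3),
    3: MicroInstruction(("HALT",), 3),
}


def run(control_store, counter_start, max_clocks=100):
    """Clock the control store; returns the list of addresses visited."""
    addr = 0
    counter = 0
    zero_flag = False
    trace = []
    for _ in range(max_clocks):        # one iteration per clock pulse
        micro = control_store[addr]
        trace.append(addr)
        for signal in micro.signals:   # the control part
            if signal == "LOAD_COUNTER":
                counter = counter_start
            elif signal == "DECREMENT":
                counter -= 1
            elif signal == "TEST_ZERO":
                zero_flag = counter == 0
            elif signal == "HALT":
                return trace
        # the sequencing part: the status flag selects between two addresses
        if micro.branch_addr is not None and zero_flag:
            addr = micro.branch_addr
        else:
            addr = micro.next_addr
    raise RuntimeError("microprogram did not halt")
```

Starting the counter at 2 makes the microprogram go twice round the loop formed by
addresses 1 and 2 before it branches to address 3.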

We have only touched the surface of microprogramming so far, but it should already
be clear to everyone that writing microcode is not a simple task and that it requires
very intimate knowledge of the hardware of the machine. We can learn still more from
the example: the first four steps are necessary to fetch (from main memory) and
decode instructions of a higher level than the micro-instructions themselves. In the
example a PDP-11 machine instruction was fetched and then executed. Writing short
microcode sequences, one for each conventional machine instruction, is therefore equivalent
to defining a set of such machine instructions. In other words, we have created the
possibility to write programs in assembly language, from where we can build up to
higher level languages. By writing other, longer sequences of micro-instructions, we
could define a set of commands of a higher level; for instance a set particularly well
suited for interpretation of Pascal P-code. The task of writing microcode for the
execution of all defined instructions has only to be performed once (by an expert!). Once
it is done the machine presents itself to a programmer just as any other machine. The
microcode can be stored in a Read-Only Memory, in which case we have obtained a fixed
instruction set processor. When the microcode is stored in a Read-Write memory, the
possibility exists to add new instructions or sequences of instructions: the machine is
user-microprogrammable.

The designer of a bit-slice machine is free to define the instruction set his
processor will execute. He can define none. In that case the processor can only be
programmed by writing microcode. This can be done once and for all so that a fixed program
machine will be the result. The solution adopted generally is to leave it to the user to
program the processor. Several examples of such processors-to-be-microprogrammed
exist and are used in high-energy physics experiments: ESOP (ref. 2), CAB (ref. 3),
MONICA (ref. 4).

The designer can also choose an instruction set of another, hopefully well-known
and widely used, computer, e.g. PDP-11 or IBM 370/168. In that case he creates an
"emulator". Emulators are also used in high-energy physics experiments; examples
are MICE (refs. 5, 6) and 370/E (ref. 7).

The last choice the designer has is to define an instruction set of his own liking,
which has as a consequence that all software must be developed from scratch. There
are many examples of commercial machines (ref. 8) in this category: Nanodata QM-1,
Control Data 5600, Burroughs B1700, etc. GESPRO (ref. 9) is an example from high-energy
physics.

Of all these possibilities, emulators have the invaluable advantage that software
running on the emulated machine will also run on the emulator (if it does not, then
there is something wrong with the emulator). This means in practice that high-level
languages are available, and that programs can be compiled and debugged on a large
machine with excellent facilities. The final program can however run in a small, cheap
machine (the emulator), without fancy peripherals but dedicated to its task. Emulators
tend however to be slower than the directly microprogrammed machines. A user
microprogrammable emulator makes the best of both worlds.

For completeness we must mention that there is another way to make an "emulator".
The processor can be made in such a way that it can only run microcode, which is
obtained by translating machine code for the emulated computer into microcode for the
emulator, instruction by instruction. With a proper design of the bit-slice processor
this translation can be simple and a program can be written to do it automatically.
There are a few problems, which can be solved in practice. Normal programs mix
instructions and data. The emulator must keep the microcode separate from the data, so
the translator program must do some more work. The other problem is that the
microprograms generated are long, generally longer than the original machine code, so the
microstore must be large to contain these programs. And as the microstore must also be
made of fast memory chips (the speed of operation depends directly on the access time
of the microstore), this type of emulator is generally expensive. The well-known example
(and actually the only one known) of this type of emulator is the 168/E (refs. 10-13),
of which between 20 and 30 are in operation in high-energy physics.

We noted already that writing microcode is difficult and tedious and that it requires
expertise. It is therefore important to use good tools when writing microcode. Several
good microassemblers do now exist. In fact they are meta-assemblers (ref. 14), which means
that the code to be generated is not pre-defined inside the assembler, but the user has
to define it. A meta-assembler works in two (or three) phases:
- the definition phase. In this phase the format of the micro-instruction is defined,
  symbolic names are given to fields and default values attributed. Macro definitions
  are also made in this phase.
- the assembly phase. During this phase the symbolic micro-instructions (which use
  the field names and the macro definitions) are assembled into binary code.
- a post-processing phase, in which the binary code may be re-formatted for
  programming of PROM chips or for use by a loader.
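
The three phases can be made concrete with a toy example. The Python sketch below is
not modelled on any existing meta-assembler: the 12-bit micro-instruction format, the
field names, the symbolic values and the hexadecimal output format are all assumptions
chosen for illustration, and macros are left out.

```python
# Definition phase: the user describes the micro-instruction format.
# Each field is (name, width in bits, default value), most significant first.
FIELDS = [
    ("alu_op", 3, 0),
    ("src", 2, 0),
    ("dst", 2, 0),
    ("next", 5, 0),
]
# Symbolic names for the values of a field (hypothetical encodings).
SYMBOLS = {
    "alu_op": {"ADD": 0, "SUB": 1, "OR": 3, "AND": 4},
}
WORD_BITS = sum(width for _, width, _ in FIELDS)


def assemble(micro_instruction):
    """Assembly phase: pack one symbolic micro-instruction into a binary word."""
    unknown = set(micro_instruction) - {name for name, _, _ in FIELDS}
    if unknown:
        raise ValueError(f"undefined field(s): {sorted(unknown)}")
    word = 0
    for name, width, default in FIELDS:
        value = micro_instruction.get(name, default)
        if isinstance(value, str):          # symbolic value, look it up
            value = SYMBOLS[name][value]
        if not 0 <= value < (1 << width):   # error detection
            raise ValueError(f"value {value} does not fit in field {name!r}")
        word = (word << width) | value
    return word


def to_prom_hex(words):
    """Post-processing phase: format the words as hexadecimal strings."""
    digits = (WORD_BITS + 3) // 4
    return [f"{word:0{digits}X}" for word in words]
```

For instance, assemble({"alu_op": "AND", "dst": 1, "next": 7}) leaves the src field at
its default value and packs the four fields into one 12-bit word.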

A good microassembler will allow for almost any width of micro-instructions so that
horizontal and vertical microcode may be assembled. It will also have macro facilities
and will allow nesting of macros. Error detection is another important feature.

Another software tool which is very important for debugging microcode is a
simulator. Instead of writing an ad-hoc simulator for every processor built from bit-slices, it
is preferable to use a general tool. The ISPS system (Instruction Set Processor System)
is an example (refs. 14, 15). The user writes in a hardware description language a definition
of the machine (which as a matter of fact constitutes an excellent piece of
documentation). ISPS compiles the description and the user can then interactively simulate the
behaviour of the hardware. He can set breakpoints, inspect the contents of registers
or memory, etc. All micro-instructions and also sequences can thus be tested and the
hardware verified before building it. The interested reader is referred to ref. 14 for
more details.

A number of bit-slice families exist, but the Amd 2900 family is by far the most
popular. The RALU slice is 4 bits wide and contains a file of 16 registers of which two
can be accessed (for reading) simultaneously. There is an extra accumulator register,
connected to the output of the ALU, which can be shifted, and there is an additional
shift register. The chip is controlled by 9 bits, divided into three encoded fields. A
few sequencer chips belong to the family. The Amd 2909 is one of them. It contains a
microprogram counter register (the equivalent of a normal program counter) with its
increment logic. External addresses can be strobed into the microprogram counter, so that
jumps and branches can be made, and a small stack, 4 deep, is used for storing subroutine
return addresses. The sequencer slice is also 4 bits wide. A bit-slice family which is
used for its high speed is the Motorola 10800, implemented in ECL. The essential
difference with the Amd 2900 is that the ALU slice does not contain a register file. It is
bus-oriented and a separate register file slice can be easily combined with the ALU
slice. The sequencer is much more involved than the Amd 2909. The family also
comprises a Memory Control slice, which contains a 4-bit ALU. Address calculations may
thus be made elsewhere than in the main ALU of the processor. Figure 5 shows a
block diagram of a processor constructed with the 10800 family. This processor emulates
the PDP-11 instruction set and has been described elsewhere (ref. 16).

Fig. 5 Simplified block diagram of a processor built from the Motorola 10800
bit-slice family. (Blocks shown include a micro store of 1K x 112 bits, the
microinstruction register, interrupt logic and target instruction decoding.)
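
The behaviour of a sequencer of the kind described above (a microprogram counter with
increment logic, externally supplied jump addresses and a 4-deep return stack) can be
sketched as follows. This Python model is a simplification under stated assumptions: the
four operation names are hypothetical and do not reproduce the control encoding of the
Amd 2909, and raising an error on stack overflow is an illustrative choice, not a
statement about the hardware.

```python
class Sequencer:
    """Simplified model of a sequencer built from 4-bit slices."""

    STACK_DEPTH = 4

    def __init__(self, n_slices=3):
        # Concatenating n slices gives addresses of 4 * n bits.
        self.mask = (1 << (4 * n_slices)) - 1
        self.upc = 0      # microprogram counter: next sequential address
        self.stack = []   # subroutine return addresses

    def next_address(self, op, target=0):
        """Select the address of the next micro-instruction."""
        if op == "CONTINUE":
            addr = self.upc
        elif op == "JUMP":
            addr = target & self.mask
        elif op == "CALL":
            if len(self.stack) == self.STACK_DEPTH:
                raise OverflowError("return stack is only 4 deep")
            self.stack.append(self.upc)   # where to resume after the return
            addr = target & self.mask
        elif op == "RETURN":
            addr = self.stack.pop()
        else:
            raise ValueError(f"unknown sequencer operation {op!r}")
        self.upc = (addr + 1) & self.mask  # increment logic
        return addr
```

With three slices the model addresses 4096 micro-instructions; adding a fourth slice
extends this to 65536, which is the sense in which wider sequencer words allow larger
microprograms.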

A large body of literature exists on micro-programming (refs. 15, 17) and
bit-slices (refs. 18, 19), which shows that there is much more to be said on the topic
than can be set out in these few pages. For detailed descriptions of the various chips,
the reader is referred to the manufacturer's literature.

Bit-slices are components for building powerful processors. They require an
integrated hardware/firmware/software design. Designing with bit-slice chips requires
specialists and good tools, but the result can be an excellent machine, as some of the
examples from high-energy physics have amply shown.

3. Fixed Instruction Set microprocessors.

The class of fixed instruction set microprocessors comprises all the popular devices
which made the microprocessor revolution: Intel 8080, 8086, Motorola 6800, 6809,
68000, Zilog Z80, Z8000, MOS Technology 6502 and many, many others. As the name
indicates, the engineer and programmer have no control over the instruction set, which
has been fixed once and for all by the manufacturer. For a critical application the
designer will have to make a careful selection of the microprocessor chip which has the
best chance of providing a reasonable solution to the problem. Really critical
applications are however extremely rare (and are then in many cases solved by using a
bit-sliced processor) and the designer's choice is therefore mostly guided by other
criteria: familiarity and experience with a particular type, or with a member of the same
family, cost, availability of monoboard microcomputers, of suitable support chips, etc.

For industrial applications and for consumer products, where cost is the overriding
factor, the simple 4-bit microprocessors have not been abandoned. The single-chip
microprocessors, combining a CPU and memory on the same piece of silicon, are also very
popular in products which are sold in large quantities. In these applications, the
processor runs a simple program, which need not be changed after suitable debugging.
The development cost, including the software development, is quickly amortized and
there is no need for "sophisticated" tools. High-level languages are therefore not used
at all and the processors also do not need to have features which make programming
easier, or execution faster, or even code more compact. Pushing this argument to the
extreme, a 1-bit microprocessor has been manufactured for industrial control
applications. Intended as a cheap replacement of relay logic, this processor, the Motorola
MC14500B, has found an interesting application in an experimental vector processor, to
which we will come back in one of the following chapters.

There are however many applications where the expected volume of sales does not
reach millions, but where adaptability of the product is of great importance. For a long
time this has been the realm of the 8-bit microprocessors, until progress in device
technology made the 16-bit microprocessor possible. With the 16-bit processor came also
a breakthrough in the architecture of the machines, turning them into real computers,
which have nothing to envy from the typical minicomputer of the mid-seventies. This
does however not mean that industry has abandoned the 8-bit micro. Its lower cost and
the fact that it is perfectly adequate for a large percentage of all applications (80%?)
account for its lasting popularity. Enormous progress has also been made to overcome
the initial limitations of the 8-bit micros and the 6809 is a perfect example of how added
features can enhance a processor and make it much more suitable for programming in a
high-level language, without completely losing compatibility with the other members
of the family. Table III summarizes the fields of application of the different classes of
microprocessors. In the rest of this chapter we will review some of the features which
allow the 16-bit processors to be considered mature devices, able to support
an operating system and an acceptable programming environment.

Table III

Typical examples of the uses to which the various fixed instruction set
microprocessors are put

DOMAINS OF USE:

- 1-bit  (MC14500B)                    : Industrial control and ... vector processor

- 4-bit  (Intel 4004,                  : Consumer market
          Texas TMS 1000)

- 8-bit  (Intel 8080A,                 : Most widely used in
          Motorola 6800, 6802, 6809,       - control
          Zilog Z80,                       - character handling
          MOS Techn. 6502,                 - home computers
          etc.)                            - terminals

- 16-bit (Intel 8086,                  : Professional applications where
          Motorola 68000,                (re)programming and throughput
          Zilog Z8000,                   are important
          National 16032,
          Texas TMS 9900)

- 32-bit (Intel iAPX432)               : "Micro mainframe"

The limitations of the 8-bit processors are largely due to the word size, to the
restricted addressing modes and to the need to keep the CPU simple. These limitations can
be summarized as follows:
i)   arithmetic operations:
     - limited precision; operations on reasonably sized operands are slow.
     - hardware multiply/divide does not exist.
ii)  the number of internal registers is limited, resulting in:
     - slow operation, as intermediate results must be stored in memory.
     - restricted indexing operations, which result in explicit address calculations,
       slowing down the overall operation.
iii) the majority of instructions require more than one byte, again slowing down
     execution.
iv)  total address space is limited to 64 Kbytes, precluding the running of large
     programs and also to a large extent the use of high-level languages.
v)   limitations in the implemented addressing modes impede elegant solutions for
     parameter passing in high-level language procedure calls.
vi)  the use of absolute addresses is practically unavoidable (ROMs and I/O devices at
     fixed addresses, forcing programs to occupy the holes left). In the absence of
     universally accepted conventions, software is therefore not easy to transport
     between systems.
vii) advanced features, such as multilevel interrupt or protection mechanisms, are
     absent or very primitive.
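
Point i) can be made concrete with a small Python sketch of what an 8-bit ALU has to do
to add two 32-bit operands: four byte-wide additions, each propagating a carry to the
next. The function names are hypothetical and the sketch counts only ALU steps; on a real
8-bit machine each step would in addition need loads and stores, since the operands do
not fit in the few registers available (point ii).

```python
def add8(a, b, carry_in=0):
    """One 8-bit ALU addition: returns (sum byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add32_on_8bit_alu(x, y):
    """Add two 32-bit operands byte by byte, least significant byte first."""
    result = 0
    carry = 0
    steps = 0
    for i in range(4):
        a = (x >> (8 * i)) & 0xFF   # i-th byte of the first operand
        b = (y >> (8 * i)) & 0xFF   # i-th byte of the second operand
        s, carry = add8(a, b, carry)
        result |= s << (8 * i)
        steps += 1
    return result, carry, steps


total, carry_out, alu_steps = add32_on_8bit_alu(0x0000FFFF, 0x00000001)
```

A 16-bit ALU would need two such steps for the same operands, a 32-bit ALU only one.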

The consequences of these restrictions are that software tools on 8-bit microprocessors
are primitive. Programs written in a high-level language and built up from
separately compiled and relocatable modules, linked together and to library routines,
are the exception and not the rule.

It took industry a while to realize the impact of these restrictions and their
consequences. With advancing technology, emphasis was first put on microcontrollers, i.e.
applications where the total number of chips must be reduced. It is typical for this
trend that the 6802, which is a 6800 with 128 bytes of read/write memory on chip and
thus particularly suited for controller applications, was developed a few years before
the 6809. Both processors have approximately the same number of transistors; the
6809, although still an 8-bit machine, has overcome many of the limitations listed above,
whereas the 6802 has not.

The 6800 (and the 6802) are in fact poor in registers, as shown in fig. 6. Such a
register set is typical for the 8-bit processors, which are all accumulator-based
machines. When we compare with the register set of, for instance, the Z8000 (figure 7),
we are struck by two things: there are many more registers and, with a few exceptions,
no special roles are attributed to each register. This symmetric use of general
purpose registers is observed for most 16-bit processors, as we will see. The larger
number of registers opens a number of possibilities, enhancing the capabilities of the
machine. Most of the 16-bit processors that we will see have improvements over the
8-bit predecessors in all of the following aspects:
i)   a larger address space is available. Although the basic length of an address is
     still 16 bits, as before, all processors (except one) have built-in or external
     facilities to increase the address space from 64K to a maximum of 16 Mbytes. How
     this is done we will see in more detail later.
ii)  more resources are put at the programmer's disposal:
     - more registers, which are often not limited to a special function, but can be
       used to hold addresses, address-pointers or data.
     - better arithmetic capabilities. The word size in itself is a great improvement,
       but most processors also possess hardware multiply and/or divide capability.
     - operations are defined on data types of different lengths: bytes (characters),
       words, double-words, etc.
     - powerful instructions have been added, as for instance block move.
iii) all processors have enhanced possibilities for multiprogramming and multitasking:
     - task switching and context saving and restoring are eased and a much greater
       range of interrupts and traps is available.
     - memory protection schemes are sometimes implemented.
     - memory segmentation or paging is implemented, either on chip (this is the case
       for the 8086) or off-chip (68000) or at the user's choice (Z8000).
     - all processors have a privileged way of running, reserved for the operating
       system. These user/supervisor states go hand in hand with the protection
       mechanisms: a user cannot run in supervisor state and thus cannot corrupt the
       operating system.
iv)  judicious selections of addressing modes provide elegant ways for parameter
     passing in procedure calls, which largely satisfy the requirements of
     block-structured high-level languages.

Fig. 6 Programming model of the 8-bit 6800 microprocessor, showing the registers
accessible to the program: Accumulator A (ACCA), Accumulator B (ACCB), Index Register
(IX), Program Counter (PC), Stack Pointer (SP) and Condition Code Register.

Fig. 7 Register structure of the 16-bit Z8000 microprocessor, showing the registers
accessible to a program.

How is this achieved? The methods vary from one processor to another, so we will
examine a few examples.

Intel 8086.

The 8086, one of the earlier 16-bit processors, bears great resemblance to the
8080, as was intended by its designers (ref. 20). Most of the 8080 instructions are in
fact compatible with the 8086. A number of improvements have been made, in particular
the longer data word and enhanced arithmetic capabilities: signed and unsigned hardware
multiply and divide instructions are implemented, accepting words and bytes as operands.
The processor consists of two rather separate parts: the Execution Unit (EU) and the Bus
Interface Unit (BIU) (see fig. 8). The purpose of the EU is obvious; the function of
the BIU is to generate addresses. Both units have their own set of registers, which
are shown in fig. 9. The program counter resides in the BIU and is called the instruction
pointer (IP). The registers in the EU are mostly special purpose and their roles cannot
be interchanged. Thus AX is the main accumulator, also used in multiplication, BX is a
base register, CX is used for counts and loop control, etc.

Fig. 8 The structure of the 8086 microprocessor is broken up into two parts: the
Execution Unit and the Bus Interface Unit.

Fig. 9 The register structure of the Intel 8086. Execution Unit: AX (AH, AL), BX (BH,
BL), CX (CH, CL), DX (DH, DL), stack pointer (SP), base pointer (BP), source index (SI),
destination index (DI) and flags (PSW). Bus Interface Unit: code segment (CS), stack
segment (SS), data segment (DS), extra segment (ES) and instruction pointer (IP).

Fig. 10 Logical to physical address translation in the Intel 8086. This translation is
done on-chip.

The segment registers are used to extend the address space beyond the inherent 64
Kbytes. The way this is accomplished is shown in fig. 10. From the 16-bit address
defined in the instruction, a 20-bit physical address is formed by adding a segment base
address, which is shifted four places to the left before the addition. In this way a
1 Mbyte address range is obtained. The segment base address register is usually selected
by the processor: instructions are always fetched using the contents of CS and IP to
calculate the physical address; stack operations use SS and data is obtained using DS. ES
is used in operations on character strings and determines the destination of the string.
This selection of segment base registers can be overridden by attaching a prefix byte to
the relevant instruction, but this is done in an asymmetric way: the programmer is
not free in his choice. The existence of the segment registers implies that the
programmer has at any moment four blocks of 64 Kbytes of memory at his disposal. These
blocks may be entirely disjoint or they may overlap. The contents of the segment
registers can be changed under program control, of course. It can thus be said that the
8086 has a Memory Management Unit incorporated in the processor, although it does not
have all the facilities one would want for a multiprogramming system. For instance, if
two small programs, owned by programmers A and B, are loaded in the same block of
64K, then the program (say A's) that uses the lower value of CS can have undisturbed
access to the other's (B's) instructions.
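
The address calculation just described can be written out in a few lines of Python. This
is a sketch of the arithmetic only; the function name and the example segment values are
hypothetical. It also shows the overlap and the lack of protection mentioned above: two
different segment/offset pairs can designate the same physical location.

```python
def physical_address(segment, offset):
    """Form a 20-bit physical address from a 16-bit segment base and a 16-bit offset."""
    # The segment base is shifted four places to the left before the addition;
    # the result is truncated to the 20 address lines of the processor.
    return ((segment << 4) + offset) & 0xFFFFF


# Instruction fetch: CS and IP are combined.
fetch = physical_address(0x1234, 0x0010)

# Two programs in the same 64K block (hypothetical values):
# A runs with CS = 0x2000, B with CS = 0x2100.
a_reaches_b = physical_address(0x2000, 0x1000)   # an offset within A's segment ...
b_first_byte = physical_address(0x2100, 0x0000)  # ... lands on B's first instruction
```

Relocating a program by 16 bytes or any multiple of it only requires a new segment
value; the offsets inside the program remain unchanged.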

On the other hand, the use of segment registers makes it possible to bind a program to a
physical place in memory only at the last moment. A program may also be dynamically
relocated, if this move is accompanied by the appropriate change to the contents
of the segment registers, and if the program or its data do not exceed 64K.

The addressing modes implemented on the 8086 are sufficiently extensive that
position-independent or re-entrant code can be written without difficulty. The use of the
stack pointer and base register allows parameter passing in procedure calls. Another
feature of the 8086 is its string handling capabilities, which are fully interruptible;
this is of course very important for very long strings. The 8086 has however no
privileged mode of operation. The reader who wants to know more details of this processor
is referred to Intel's literature, or to the article by Morse et al. (ref. 20).
Consultation of Wakerly's excellent book (ref. 21) is recommended, as it provides
complete descriptions of a number of modern microprocessors, in a unified format, making
comparisons very easy.

Zilog Z8000.

The designers of this processor had a number of objectives in mind which they
summarized themselves (ref. 22): increase capabilities, provide architectural
compatibility over a range of capabilities, and increase clarity. The first objective is
obvious but the other two merit some explanation. Clarity means that all registers should
play the same role and that operations are not implicitly linked to the use of special
purpose registers. A general register machine, with 16 general purpose registers, each of
16 bits, is the result (see fig. 7). The registers can hold addresses or operands.
Architectural compatibility means in fact that there are two compatible models of the
Z8000: the Z8002 or unsegmented version and the Z8001 or segmented version. The models
have exactly the same architecture, but their internal structure is different. The
unsegmented version has an address space of 64K; the Z8001 can address 8 Mbytes of
memory, if an external memory management unit is used. The MMU calculates the physical
address from a 16-bit offset value and a 7-bit segment number. The way this is
done is shown in fig. 11. Note that the segment number is converted into a segment
base address by means of a look-up table. Fig. 12 shows an example of this mapping
operation. Note again that logical segments may well overlap physically.
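
The table-driven translation can be sketched in Python as follows. The 7-bit segment
number and the 16-bit offset are taken from the text; the assumption that the table holds
a 16-bit base field which is extended with eight low-order zeros to form a 24-bit physical
address is read from figure 11 and should be checked against the manufacturer's
literature. The function name and the table contents are hypothetical.

```python
def translate(segment_table, segment_number, offset):
    """Map a (segment number, offset) pair onto a 24-bit physical address."""
    if not 0 <= segment_number < 128:      # 7-bit segment number
        raise ValueError("segment number must fit in 7 bits")
    if not 0 <= offset <= 0xFFFF:          # 16-bit offset
        raise ValueError("offset must fit in 16 bits")
    base = segment_table[segment_number]   # look-up: segment number -> base field
    # Assumption from fig. 11: the base field supplies the upper 16 bits.
    return ((base << 8) + offset) & 0xFFFFFF


# Hypothetical table: logical segments 0 and 1 overlap physically,
# and segment 2 is stored below both of them.
table = {0: 0x0100, 1: 0x0180, 2: 0x0040}

end_of_overlap = translate(table, 0, 0x8000)
start_of_seg1 = translate(table, 1, 0x0000)
seg2_start = translate(table, 2, 0x0000)
```

Because the base comes from a table rather than from the segment number itself, the order
of the logical segments in physical memory is arbitrary.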

Fig. 11 Logical to physical address translation for the Zilog Z8000. A Memory Management
Unit, external to the processor chip, must be used.

Fig. 12 An example of logical to physical address translation (Z8000). Note that
segments may overlap and that the order in which logical segments are stored in physical
memory is arbitrary.

The Z8000 has two stack pointer registers, which are part of the general purpose
register file; they constitute a deviation from the clarity principle. One contains the
user stack pointer, the other the system stack pointer, which is only accessible when
the machine operates in the privileged supervisor state. Certain instructions, including
I/O commands, can only be executed when in supervisor state. The possible I/O
transactions include block transfers between a device and memory. The addressing modes of
the Z8000 form a rather complete set, making parameter passing, re-entrancy,
relocation, etc. possible. In addition, the Z8000 has a three-level vectored priority
interrupt and trap system. In short, it provides all the facilities an operating system
needs. The Z8000 was in fact the first microprocessor which could be put on the same
level as a larger minicomputer. Interested readers who want to know more details should
consult the literature, particularly references 21 and 22.

Motorola 68000.

The 68000, more recent than the two preceding processors, has become very popular
in a short time. It is not only widely used in high-energy physics, but also in
many different Personal Work Stations, including IBM's! This is not astonishing,
considering that the 68000 provides all the facilities required for a modern computing
system (refs. 21, 23). Internally, it is a 32-bit machine; it interfaces to the external
world over 16 data lines. It has two sets of eight 32-bit registers; one set consists of
data registers, the other of address registers. The CPU contains two additional special
purpose registers: the program counter and the 16-bit status register (see fig. 13). In
the first version of the 68000 the PC has 24 bits, but this can be extended in later
models. In fact, the 68000 should be considered as the first member of an architectural
family, which, as technology advances, can grow with new members having increased
capabilities. The development of such a family is greatly helped by the fact that the
68000 is a microprogrammed machine. It even has two levels of control: micro- and
nano-control (see figure 14). By separating the control of program flow from the control
of the combinatorial circuits in the CPU, a saving in the total size of the control store
has been obtained (ref. 21). The 68000 can operate in one of two states: user or
supervisor. The supervisor state uses a separate stack pointer which cannot be corrupted
by a user. The instruction set, modeled after the PDP-11, is consistent, in the sense
that any instruction which specifies an operand in memory may use any of the addressing
modes.

Fig. 13 The register structure of the Motorola 68000. Note that the general registers
are 32 bits wide.

Fig. 14 The microprogrammed control of the 68000 has two levels. In addition to the
microstore, a so-called nanostore is present. This structure minimizes the number of bits
needed for the microprogram.

The 68000 stands out for its large number of addressing modes: 12. Needless to say
then that parameter passing is not a problem in a 68000 program. The processor has
even gone a step further by introducing two instructions which make parameter passing
particularly easy. Besides the stack pointer, another important aid, the stack frame
pointer, is introduced explicitly. A stack frame pointer facilitates access to parameters
on the stack, by referring to the fixed position of the frame pointer FP, instead of to
the often varying position of the stack pointer itself. An FP could be defined by the
programmer in the processors mentioned so far (including the 6809), but instructions to
manipulate the FP and thus facilitate parameter passing had not been implemented
before. The 68000 LINK and UNLK (unlink) instructions do precisely what is needed. Any
address register may be used by LINK and UNLK, together with an offset. The LINK
An, #displacement instruction does the following:
i)   the present value of An (= address of old FP) is pushed onto the stack
ii)  the new value of the stack pointer is put into An
iii) displacement is added to SP, reserving space on the stack for local variables.
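
The three steps of LINK, and the inverse operation performed by UNLK, can be followed in
a small Python model of the stack. This is a sketch under stated assumptions, not a
68000 simulator: the class and its names are hypothetical, memory is a dictionary of
32-bit long words, the stack grows towards low memory (so the displacement that reserves
space is negative), and UNLK is modelled as "SP takes the value of An, then An is popped
from the stack".

```python
class FrameStack:
    """Minimal stack model for following LINK and UNLK (hypothetical names)."""

    def __init__(self, sp):
        self.mem = {}        # address -> 32-bit value
        self.sp = sp         # stack pointer
        self.a = {}          # address registers, indexed by number

    def push(self, value):
        self.sp -= 4                 # the stack grows towards low memory
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]
        self.sp += 4
        return value

    def link(self, n, displacement):
        self.push(self.a[n])         # i)   old FP is pushed onto the stack
        self.a[n] = self.sp          # ii)  An becomes the new FP
        self.sp += displacement      # iii) reserve space for local variables

    def unlk(self, n):
        self.sp = self.a[n]          # discard the local variables
        self.a[n] = self.pop()       # restore the old FP


cpu = FrameStack(sp=0x1000)
cpu.a[6] = 0xAAAA                    # frame pointer of the calling procedure
cpu.link(6, -4)                      # new frame with 4 bytes of local storage
frame_pointer, stack_pointer = cpu.a[6], cpu.sp
cpu.unlk(6)
```

Since each LINK stores the previous frame pointer at the location the new one points to,
successive calls build a chain of frame pointers on the stack.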

Figures 15 and 16 (from reference 23) show how parameter passing can be accomplished,
whereas at the same time re-entrant code will be produced (space for local
variables is reserved on the stack). The translation from the PASCAL program into
assembly language is straightforward (figure 15). From figure 16 it is seen that the
LINK instruction forms a linked list of frame pointers FP, thereby also creating the
possibility of accessing data local to the calling program. Note that the procedure call
in fig. 15 defines one parameter by its value, the other by its address.

SAMPLE PROGRAM:
    PROGRAM EXAMPLE;
    VAR PARAM1, PARAM2 : INTEGER;
    PROCEDURE PROC (X : INTEGER; VAR Y : INTEGER);
        VAR A, B : INTEGER;
    BEGIN
        <procedure body>
    END
    BEGIN
        PROC (PARAM1, PARAM2)
    END.

PROGRAM BODY:
    MOVE   PARAM1 TO -SP@             "push first parameter"
    PEA    PARAM2                     "push address of 2nd parameter"
    JSR    PROC                       "call the procedure"
    ADD    #6 TO SP                   "pop parameters from the stack"

PROCEDURE BODY:
    LINK   FP, 4                      "link and allocate three local variables"
    MOVEM  <registerlist> TO -SP@     "push some register contents"
    <procedure body>
    MOVEM  <registerlist> FROM SP@    "restore registers"
    UNLK   FP                         "restore stack"
    RETURN                            "return to calling procedure"

Fig. 15 An example of a Pascal program, calling a procedure. The figure shows how the
program and the procedure are translated into 68000 assembly language.

Fig. 16 Use of the Stack Frame Pointer (FP) in procedure calls to facilitate access to
the parameters passed to the procedure. Results can be passed back to the calling program
by the same process. (The diagram shows the stack, from low to high memory, before the
call and after pushing parameters, call, LINK, and saving registers.)

At present, the address space of the 68000 is $2^{24}$ = 16 Mbytes; future expansion
to $2^{32}$ = 4 Gbytes is foreseen. Addressable units are 1, 8, 16, or 32 bits wide. An
external memory management unit can be used. The 68000 has memory-mapped I/O, so no
isolated I/O instructions exist. A bus request/grant protocol has been implemented on
the chip, thus facilitating DMA transfers. The machine has 8 levels of vectored
interrupts. Apart from the procedure calls explained above, a few other features
facilitating high-level language support have been implemented. These include bounds
checking in accesses to array variables, traps on special conditions in arithmetic
operations and some loop constructs which closely match PASCAL FOR loops.

The reader may consult the manufacturer's literature or ref. 21 for more details on
the 68000.

Texas 9900/99000.

The 9900 is a rather old processor, which can be replaced by the faster and fully
backward compatible 99000. The 99000 is one of the fastest microprocessors on the
market. The 9900 is singular in the sense that it has no internal registers to perform
operations on. There is only a program counter, a status register and the workspace
pointer WP. WP points to a block of 16 locations in main memory which act as the 16
general purpose registers of the machine. Generally this slows down the operation of
the machine, except when a context switch must be performed following an interrupt.
To change the context of the machine it is sufficient to load a new value into
WP, after the old value has been preserved.
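
The workspace principle can be illustrated with a few lines of Python. The sketch is
deliberately simplified and its names are hypothetical: memory is modelled as a list of
16-bit words addressed by word index, and the old value of WP is simply returned to the
caller, whereas the text only states that it must be preserved somewhere.

```python
class WorkspaceMachine:
    """Registers live in main memory; WP selects the current block of 16 words."""

    def __init__(self, memory_words=64):
        self.memory = [0] * memory_words
        self.wp = 0                              # workspace pointer

    def reg_read(self, n):
        return self.memory[self.wp + n]          # register n is a memory word

    def reg_write(self, n, value):
        self.memory[self.wp + n] = value & 0xFFFF

    def switch_context(self, new_wp):
        """Load a new WP; no register contents are copied."""
        old_wp = self.wp
        self.wp = new_wp
        return old_wp                            # caller preserves the old value


machine = WorkspaceMachine()
machine.reg_write(3, 111)                # R3 of the interrupted program
saved = machine.switch_context(16)       # interrupt: new workspace at word 16
machine.reg_write(3, 222)                # R3 of the interrupt routine
machine.switch_context(saved)            # return to the old context
restored_r3 = machine.reg_read(3)
```

Every register access in this model is a memory access, which is the price paid for the
cheap context switch.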

The machine has a few serious shortcomings. For instance, subroutine return
addresses are stored in a dedicated register (inside the workspace of course).
Subroutines can therefore not be nested, unless the return address is moved to another
place before the new call is made. The memory space is limited to 64 Kbytes, with no
possibility of extension. The designers of the 99000 even refused to contemplate the
introduction of some memory management scheme in their design (ref. 25), giving absolute
priority to the principle of backward compatibility.

National 16008/16016 and 16032.

National announced rather recently a new family of processors, all with the same
architecture, but distinguished by the width of the data path (ref. 26). These processors
appear to be very fast. They are further distinguished by their arithmetic capabilities
(built-in floating point) and by the variety of data structures they can handle.

The characteristics of the preceding 16-bit microprocessors are summarized in
Table IV, which is adapted from ref. 27. Table V, from the same source, lists execution
times for a few typical instructions. The reader should be cautious in using these
tables to select the "best" microprocessor for a given application. Ticking off the
desired characteristics and comparing the scores may well lead to a disastrous result, if
a more profound study of a few processors is not made before the selection is
attempted. Prevailing standards in the working environment and availability of
subsystems and, most important, of software should have a much greater weight than the
extra addressing mode, the additional instruction or the slightly faster addition. What
the author hopes to have achieved with this rather long incursion into the field of
processor structures is that the reader has become aware of the potential of these
modern machines. In particular their suitability for solving real-life problems using
high-level languages and modern programming practice should be noted.

Table IV

Characteristics of a few 16-bit microprocessors

                                  9900/95  8086  Z8000  68000  16016/32

YEAR AVAILABLE                    1976/81  1978  1979  1980  1981
IMPLEMENTATION                    µPROG  RANDOM  µPROG
CLOCK FREQUENCY (MHz)             3  5(4-8)  2.5-3.9  4-8  10
No. OF BASIC INSTRUCTIONS         69/73  95  110  61  100
No. OF GEN. PURPOSE + OTHER REGs. (16)+3  4+10  16+8  8+8+3  8+8
PIN COUNT                         64/40  40  40/48  64  40/48
DIRECT ADDRESS RANGE (BYTES)      64K  1M  48M  16M/64M  16M
No. OF ADDRESSING MODES           8  24  8  14  9
I/O SPACE (BYTES)                 0.5/4K  64K  2x64K  MEM.
                                  SEP  SEP  SEP  MAP.
DATA TYPES: BITS                  +  +  +  +
            INTEGER BYTE/WORD     +  +  +  +  +
            CHARACTER STRINGS     +  +  +  +
            BCD BYTE              +  +  +  +
            FLOATING POINT        +
DATA STRUCTURES: STACKS           +  +  +  +
                 ARRAYS           +
                 RECORDS          +  +  +  +
                 STRINGS          +  +  +
CONTROL STRUCTURE: TRAPS/INTs.    +  +  +  +  +
                   SUPERVISOR CALL  +  +  +

Table V

Execution speeds for a few typical instructions on different 16-bit microprocessors

EXECUTION SPEEDS (µs)           9900     8086    Z8000    68000    16016

REGISTER -> REGISTER MOVE       4.60     0.40     0.75     0.50     0.30
                                9.80     0.80     1.25     0.50     0.30

MEMORY -> MEMORY MOVE           9.90     7.00     7.00     2.50     1.60
                               19.80    14.00     8.50     3.75     2.40

ADD MEMORY TO REGISTER          7.32     3.60     3.75     1.50     1.10
                               21.30     7.20     5.25     2.25     1.50

MULTIPLY (MEM -> MEM)          21.90    23.00    16.00     8.75     4.60
                              180.64   115.20    85.75    43.00     7.60

CONDITIONAL BRANCH              3.60     1.60     1.50     1.25     1.40
                                2.90     0.80     1.50     1.00     0.70

MODIFY INDEX, BRANCH IF=0       7.60     2.20     2.75     1.25     1.30

BRANCH TO SUBROUTINE            7.90     3.80     3.75     2.25     2.50

4. Support chips

The variety of microprocessor support chips is astounding (see Table II for what is
probably a "short list"). We will limit ourselves to a few devices which have found
interesting applications in high-energy physics.

4.1 Memory

Memory devices are undoubtedly the most widely used support chips. Here we will
only mention a few unusual applications of memory.

4.1.1 Look-up tables.

Look-up tables are used more and more, as they make very rapid evaluation of complex
expressions possible. They can be used to evaluate Boolean expressions, such as

F = ABC + ACD + BCD

or arithmetic expressions, such as

$R = \sqrt{X^2 + Y^2}$ and $\varphi = \arctan(Y/X)$

The principle is simple: the input values are concatenated to form an address. At that
address in memory the output value is stored. The answer is obtained in one single
memory access time.
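
The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Boolean function is the F quoted above, taken literally, and the arithmetic table stores R for two 4-bit inputs. The helper names, the 4-bit field widths and the rounding of R to an integer are choices made for this sketch, not part of the original text.

```python
import math

def split_address(address, widths):
    """Recover the individual input fields from a concatenated address."""
    fields = []
    shift = sum(widths)
    for w in widths:
        shift -= w
        fields.append((address >> shift) & ((1 << w) - 1))
    return fields

def make_address(values, widths):
    """Concatenate the input fields, first input in the most significant bits."""
    address = 0
    for v, w in zip(values, widths):
        address = (address << w) | (v & ((1 << w) - 1))
    return address

def build_table(func, widths):
    """Precompute func for every possible combination of inputs."""
    return [func(*split_address(a, widths)) for a in range(1 << sum(widths))]

def lookup(table, widths, *values):
    """One 'memory access': form the address and read the stored answer."""
    return table[make_address(values, widths)]

# Boolean example: F = ABC + ACD + BCD with four 1-bit inputs A, B, C, D.
F_WIDTHS = (1, 1, 1, 1)
F_TABLE = build_table(
    lambda a, b, c, d: (a & b & c) | (a & c & d) | (b & c & d), F_WIDTHS)

# Arithmetic example: R = sqrt(X^2 + Y^2) for two 4-bit inputs (rounded, an assumption).
R_WIDTHS = (4, 4)
R_TABLE = build_table(lambda x, y: round(math.hypot(x, y)), R_WIDTHS)
```

With n input bits the table has 2^n entries (16 for F, 256 for R here), which is the growth that limits the method for many inputs.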

For Boolean expressions a memory of size $2^n$ is needed for n input variables. One
therefore quickly runs out of memory chips if, for instance, one were to use directly the
outputs of a large number of scintillation counters to evaluate whether a valid trigger
condition occurred. In many cases an FPLA (Field Programmable Logic Array) can be used
instead, but the possibility of dynamically reconfiguring the expression is then lost. A
memory can be quickly reloaded at any time.

In the early seventies Synertek produced a special memory chip for physics applications.
Instead of decoding on chip 5 bits to provide a column address and 5 bits to produce a row
address, these decoders were simply bypassed. The 32 row and 32 column lines were brought
out directly (see figure 17 for the principle of a 1K memory). A 32 x 32 coincidence matrix
is the result, which can still be used if multiple counters are hit. This would go unnoticed
if a normal encoding scheme of the 32 counters into a 5-bit pattern had been used.

Fig. 17 Principle of a 1K x 1 bit memory, showing that 32 rows and 32 columns of storage
elements are used.

An added advantage of using memory modules in a trigger or event selection set-up is that
logical and arithmetic operations may be freely mixed. For arithmetic operations the operand
sizes are the main problem. Three 16-bit input operands would need the whole IBM Mass Store
to store the result for every possible combination of inputs. Here too a solution can
sometimes be found, as discussed in another lecture course 28). Separation of variables and -
in case of linear relations - splitting of long variables into subfields can provide
solutions. For example $e^X \cos Y$ needs a table with 64 K entries if X and Y have 8 bits
each and are simply concatenated. If $e^X$ is evaluated separately from $\cos Y$, two tables
with 256 entries are all that is needed. The penalty is one multiplication to be performed
afterwards. The multiplication process itself provides another example of possible reduction
of the table size: for two 16-bit operands a table with $2^{32} \approx 4 \times 10^9$
entries is required. Splitting the operands $\alpha$ and $\beta$ into 2 fields of 8 bits
each yields:

$\alpha\beta = (A \cdot 2^8 + a)(B \cdot 2^8 + b) = AB \cdot 2^{16} + (Ab + aB) \cdot 2^8 + ab$

Four identical tables with 256 entries are now all that is needed. All linear functions
can be treated this way. Counting the number of bits set in a word is a third example
where a look-up table is faster than any other method. Again the table size can be
reduced at the cost of a few additions.
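
The splitting of operands into 8-bit fields can be illustrated with a short Python sketch. It is a minimal model under assumptions of its own: each partial product is read from a table addressed by the two concatenated 8-bit fields (which gives 2^16 entries per table in this sketch, a sizing chosen here and not taken from the text), and the function and table names are hypothetical. The last function shows the bit-counting example, where one 256-entry table and a single addition replace a table addressed by the full 16-bit word.

```python
# Partial-product table, addressed by two concatenated 8-bit fields (assumed layout).
PRODUCT = [hi * lo for hi in range(256) for lo in range(256)]

def lut_mul8(x, y):
    """Product of two 8-bit fields, obtained by one table access."""
    return PRODUCT[(x << 8) | y]

def lut_mul16(alpha, beta):
    """16-bit x 16-bit product from four table accesses, shifts and additions."""
    A, a = alpha >> 8, alpha & 0xFF      # alpha = A * 2^8 + a
    B, b = beta >> 8, beta & 0xFF        # beta  = B * 2^8 + b
    return ((lut_mul8(A, B) << 16)
            + ((lut_mul8(A, b) + lut_mul8(a, B)) << 8)
            + lut_mul8(a, b))

# Number of bits set in every possible byte.
BITS_SET = [bin(value).count("1") for value in range(256)]

def lut_popcount16(word):
    """Bits set in a 16-bit word: two accesses to a 256-entry table plus one addition."""
    return BITS_SET[word >> 8] + BITS_SET[word & 0xFF]
```

In hardware the four partial products could be read from four copies of the table in parallel, leaving only the shifts and additions to be done afterwards.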

It is even possible to construct an arithmetic processor using memories only. The
principle is the use of Residue Arithmetic 29,30). The principle can be explained very
simply: assume we take 4 numbers (we could take more) $B_1$, $B_2$, $B_3$ and $B_4$ which
are mutually prime (they do not contain common factors). 7, 11, 13 and 15 is a reasonable
choice for the example. A number N can then be represented by the four remainders obtained
by dividing N by $B_k$, k = 1, ... 4:

$R_k = N \bmod B_k$

It can be proved (Chinese remainder theorem) 29) that this representation is unique for
$N < B_1 B_2 B_3 B_4$. We have thus obtained a 15-bit representation of the numbers
0 - 15014. Why 15-bit? Three of the remainders in our example are less than 16 and one
is less than 8, so they fit in 4-bit and 3-bit fields respectively.

The three basic arithmetic operations can be performed with ease on this representation:
i) addition. $N + N' = M$
The representation of M is found by simply adding $R_k + R'_k$ for k = 1, ... 4 and
taking each result modulo $B_k$.
ii) subtraction. As above: $R_k - R'_k$, taken modulo $B_k$, gives the result.
iii) multiplication. $P = N \cdot N'$

$(a_k B_k + R_k)(a'_k B_k + R'_k) = \{\ldots\} B_k + R_k R'_k$

So, $R^{P}_k = R_k R'_k \bmod B_k$

The result is obtained by multiplying the four residues separately, and taking the
remainder.

These three basic arithmetic operations can be performed by using small look-up tables.
In our example each table is 256 x 4 bits and 12 of them are needed (addition, subtraction
and multiplication for 4 different $B_k$). It can also be shown that conversion between
ASCII code and residue number representation can again be performed with look-up tables.
Negative numbers and fractions can also be represented. A processor can thus be built
entirely from memory 31). Such a processor would be very fast, but it would have one very
serious shortcoming: it can by no means perform a meaningful division by a variable
(division by a constant can of course be replaced by a multiplication).
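
The residue scheme with the moduli 7, 11, 13 and 15 can be tried out with the following Python sketch. It is an illustration only: the twelve 256-entry tables are addressed by two concatenated 4-bit residues as described above, but the function names and the conversion back to an ordinary integer (done here with the standard Chinese-remainder formula rather than with tables) are assumptions of this sketch. It needs Python 3.8 or later for `math.prod` and the modular inverse via `pow`.

```python
from math import prod

MODULI = (7, 11, 13, 15)     # mutually prime, as in the example
RANGE = prod(MODULI)         # 15015 distinct values can be represented

def to_residues(n):
    """Represent n by its four remainders R_k = n mod B_k."""
    return tuple(n % b for b in MODULI)

def from_residues(residues):
    """Chinese-remainder reconstruction of the integer in 0 .. RANGE-1."""
    n = 0
    for r, b in zip(residues, MODULI):
        m = RANGE // b
        n += r * m * pow(m, -1, b)   # pow(m, -1, b) is the inverse of m modulo b
    return n % RANGE

def make_table(op, b):
    """256-entry table addressed by two concatenated 4-bit residues."""
    return [op(r, s) % b for r in range(16) for s in range(16)]

OPERATIONS = {
    "add": lambda r, s: r + s,
    "sub": lambda r, s: r - s,
    "mul": lambda r, s: r * s,
}

# 3 operations x 4 moduli = 12 tables, each entry fitting in 4 bits.
TABLES = {name: [make_table(op, b) for b in MODULI]
          for name, op in OPERATIONS.items()}

def combine(name, x, y):
    """Apply an operation residue by residue, one table access per modulus."""
    return tuple(table[(r << 4) | s]
                 for table, r, s in zip(TABLES[name], x, y))
```

For instance, `from_residues(combine("mul", to_residues(100), to_residues(123)))` gives back the ordinary product as long as it stays below 15015; no carries propagate between the four residue channels.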

4.2 Content Addressable Memory and Associative Processors.

A Content Addressable Memory (CAM) is a storage device which returns a signal on one or
more address lines whenever the word presented on its data lines is found amongst the words
stored in the memory. We present data to the device and obtain an answer indicating whether
the data are present or not and, if present, the address where the data are stored. In
general a mask can be applied to the data and the search for a match made on a selected set
of bits. A CAM must also be able to operate as a normal read/write memory, as data must be
written into it in the first place. Content Addressable Memories are ideal devices when
arrays must be searched for the presence of a particular datum. A CAM avoids the need for a
sequential search and gives the desired answer in a single memory cycle. CAMs would greatly
improve the speed of many processes which involve searching long lists or tables. Track
following is an example of a procedure that would benefit from it; CAMs have in fact been
used in a processor specialized for this task 4).
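
The behaviour of a CAM, including the mask and the one-line-per-word match output, can be modelled in software as follows. This is a behavioural sketch only: the class and method names are hypothetical, the "valid" flag per word is an assumption added so that unwritten locations never match, and the loop stands in for comparisons that the hardware performs on all words at once.

```python
class ContentAddressableMemory:
    """Behavioural model of a small CAM (not a description of a real chip)."""

    def __init__(self, n_words, width):
        self.width = width
        self.words = [0] * n_words
        self.valid = [False] * n_words   # assumption: unwritten words never match

    def write(self, address, value):
        """Normal write access, needed to load the memory in the first place."""
        self.words[address] = value & ((1 << self.width) - 1)
        self.valid[address] = True

    def read(self, address):
        """Normal read access."""
        return self.words[address]

    def search(self, data, mask=None):
        """Return one match flag per word; only bits set in mask are compared."""
        if mask is None:
            mask = (1 << self.width) - 1
        return [is_valid and (word & mask) == (data & mask)
                for word, is_valid in zip(self.words, self.valid)]

def matching_addresses(flags):
    """Addresses whose match line is active (several may be active at once)."""
    return [address for address, hit in enumerate(flags) if hit]
```

Keeping one flag per word, instead of a single encoded address, is what allows multiple matches to be reported in one cycle.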

One would therefore expect to find large CAM chips on the market. This is however not the
case: a CAM chip is heavily pin-limited. Whereas a normal RAM of $2^k$ words x N bits needs
only k + N pins (plus a few more for power, R/W control and chip enable), a CAM of the same
size needs in addition N pins for a mask and $2^k$ pins to signal the matches found. These
$2^k$ lines cannot be encoded, as the possibility of finding multiple matches in a single
cycle would be lost. A large CAM can of course be built from many small-capacity chips, but
a more elegant solution is offered by associative processors. An associative processor
searches all elements of an array simultaneously for a match. Generally this is done
bit-by-bit. The array to be searched may typically have 1000 entries. As long as the ratio
of the word length to the number of entries to be searched is small, a considerable gain in
speed is obtained. In an associative processor each memory word has a simple processing
element attached to it. A memory word contains in general many bits, so that several
attributes can be stored along with the search field. For instance, if the memory contained
personal data, a search could be made for all the Smiths who are between 45 and 55 years
old. Once located, their address and telephone number could then be read from those memory
locations where a match was found.

The memory for an associative processor can be built from normal RAM chips, for instance
as indicated in figure 18. The normal address lines are used to select one bit, for all
1024 elements of the vector. These 1024 bits are treated in the 1024 processing elements.
The processing elements (PEs) must be cheap to make an associative processor a viable
structure. Since one bit is treated at a time, a single-bit microprocessor would be
indicated. An associative processor using the single-bit Motorola MC14500B has been built
at the University of Toronto 32). In addition to the PEs and the working store there is
also a backing store. The different elements communicate as shown in figure 19 for a single
horizontal slice through the machine. The shift register is used for communication between
the different elements of the vector. This communication is needed for the execution of
operations which are more complex than the simple searches. A block diagram of the
microprocessor is shown in figure 20. The MC14500B has a set of 7 Boolean and 9 other
instructions. Three of the instructions automatically enable the write line. Reference 32
gives more details on this cheap associative processor, in particular examples of search
and other operations.
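
The bit-by-bit search performed by an associative processor can be sketched as below. The memory is held as bit slices, so that one "address" selects the same bit of every element, and each element has a one-bit match flag standing in for its processing element. The function names and the record layout used in the usage note are hypothetical; the list comprehension models what all PEs would do simultaneously in hardware.

```python
def to_bit_slices(words, width):
    """bit_slices[b][e] is bit b of element e: one address selects one bit of all elements."""
    return [[(word >> b) & 1 for word in words] for b in range(width)]

def associative_search(bit_slices, key, field_bits):
    """Return the indices of all elements whose bits in field_bits equal those of key.

    The outer loop runs once per bit of the search field, so the search time
    depends on the field length and not on the number of elements.
    """
    n_elements = len(bit_slices[0])
    flags = [1] * n_elements                 # one match flag per processing element
    for b in field_bits:
        key_bit = (key >> b) & 1             # broadcast to all processing elements
        # Each PE clears its flag if its own bit differs from the broadcast bit.
        flags = [flag & (1 ^ (bit ^ key_bit))
                 for flag, bit in zip(flags, bit_slices[b])]
    return [element for element, flag in enumerate(flags) if flag]
```

As a usage example, assume 16-bit records with a name code in bits 8-15 and an age in bits 0-7 (an invented layout): `associative_search(slices, 0x0A00, range(8, 16))` then finds every record with name code 0x0A in eight steps, whatever the number of records.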

Fig. 18 A 1K x 256 bit memory for an associative processor can be built from 128 256 x 8
bit memory chips. Broadcasting the same address to all 128 chips results in selecting a
single bit for all 1024 elements of the vector.

Fig. 19 Block diagram of an associative processor. The diagram shows the structure
implemented for each element of the array.

Fig. 20 Diagram of the MC14500B, the single-bit microprocessor used in the associative
processor of the preceding figure.

4.3 Arithmetic attachments.

We have seen that the arithmetic capabilities of microprocessors have been greatly
improved, but processors with floating-point arithmetic on chip practically do not exist
yet. A number of floating-point chips which can be interfaced to a microprocessor are
available on the market to augment the power of the processor, if needed. The oldest, the
AMD 9511, was terribly slow, taking 57 µs for a 32-bit floating-point multiplication or
division. The Intel 8087, a co-processor for the 8086, improves greatly on this figure,
bringing it down to 16 µs (or 24 µs for multiplication of two 64-bit real numbers). The
question arises whether it is feasible to replace large number-crunching computers by a
reasonable number of 16-bit microprocessors, each with an attached floating-point
processor. Can a reasonable performance be obtained at an affordable cost? At the
Brookhaven National Laboratory this problem was investigated 33). An experimental processor
was built from a 68000 and an AMD 9511. A Fortran compiler was developed for the 68000 and
a piece of pattern recognition code for events in the multiparticle spectrometer was run.
Performance measurements were made using this code. The results were then extrapolated to
an 8 MHz 68000 (instead of the 4 MHz version used in the test) and to an NS 16081 (instead
of the 9511). It was found that the combination of a 68000 and an NS 16081 would have 1/30
of the power of a CDC 7600, or approximately the power of a DEC-10 (with a KA10 processor).
One can conclude from this result that single microprocessors do not yet provide a solution
for applications where number-crunching is essential.

5. VLSI

VLSI (Very Large Scale Integration) has become a tool for research in novel computer
architectures and special-purpose machines. The availability of Computer Aided Design
tools, the existence of small semiconductor firms specializing in the fabrication of
custom-designed chips ("silicon foundries") and the need to educate specialists have led
to a flourishing research activity in applications of VLSI 34). Examples of successful
projects are:
- the geometry engine developed at Stanford 35) for clipping, scaling and coordinate
transformation of objects to be displayed on a graphics screen;
- Scheme-79, designed at MIT 36), which is a chip for direct execution of the programming
language LISP;
- RISC, the Reduced Instruction Set Computer, developed at Berkeley, which we will briefly
describe below.

Besides these projects, data flow machines, tree machines, systolic arrays and a number of
other non-von Neumann structures are the object of study. Industry tends to remain on the
safe side and its VLSI developments are essentially limited to fabricating larger memories
and more complex (micro)processors (e.g. the iAPX 432 of Intel 37,38)).

5.1 Systolic Arrays

The concept of systolic computing was introduced by Kung 39,40). He observed that for
certain calculations the throughput could be improved by connecting several processing
elements in a pipeline fashion and "pumping" the data through this pipeline, instead of
cycling the data through a central memory (see figures 21a and b).

Fig. 21 Principle of systolic arrays. a) The throughput in a normal processor is limited,
because intermediate results must be stored back in memory. b) When intermediate results
are passed to the next processing element in the array, the throughput is improved.
(Labels in the figure: a) memory, 100 ns, 5 MOPS maximum; b) memory, 100 ns, six PEs,
30 MOPS.)

The analogy with the heart gave the name to this type of computing structure. A structure
as depicted in figure 21b is of course not suited for general computing; in most instances
we would not know what to do with the array of PEs. Systolic arrays are however very well
suited for signal processing and pattern matching. We will show two examples of how a
convolution integral may be evaluated using two differently structured systolic arrays.
Kung himself gives four more structures to compute the convolution integral 40).

We will approximate the convolution integral

$Y(t) = \int_0^\infty W(t-\tau)\,X(\tau)\,d\tau$

This can be done with a structure as shown in figure 22, where each square box is a
processing element which performs only one specific operation, as shown in the figure. In
this example the weights are attached to PEs and do not change or move through the array.
The input sequence is broadcast to all PEs, one element at a time. At the next beat of the
clock, a new element of the input sequence is presented. The Y's move systolically through
the array from left to right. At every beat the output of a PE becomes the input to its
right-hand neighbour. It is easily verified that after an initial period in which the
pipeline becomes filled, the output of the array is exactly the sequence
$\{Y_1, Y_2, \ldots\}$ and that one new element of the sequence is produced every beat.
The same result can be obtained with a different structure and a different PE. This is
shown in figure 23. The W's are again fixed in each PE, but this time the X and Y move
systolically, in opposite directions. The PE transmits the X unchanged, but with a delay
of one beat. It is again easy to verify that the correct result is obtained when a new X
is presented every second beat, as indicated in the figure.

Fig. 22 Example of a systolic array to calculate the convolution integral. The specialized
function of a PE is shown in the inset. For each beat of the clock, new input data is
presented and intermediate results move on. (Labels in the figure: X broadcast; W's stay;
Y's move systolically; PE function $Y_{out} = Y_{in} + W \cdot X_{in}$.)

Fig. 23 Another configuration of a systolic array for the convolution integral. A new
value of the input sequence is now presented every second beat of the clock. (Labels in
the figure: W's stay; X's and Y's move in opposite directions.)
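
The first of the two structures (weights stay, X broadcast, Y's move from left to right) can be simulated beat by beat. The Python sketch below is a discrete model written for illustration: the function names are hypothetical, the integral is replaced by the finite sum Y[t] = sum over m of W[m].X[t-m], and the placement of the weights in reversed order along the array is a choice made here so that the output equals that sum.

```python
def systolic_convolve(weights, xs):
    """Simulate an array of PEs, each computing Y_out = Y_in + W * X_in.

    Each X is broadcast to all PEs; the partial sums move one PE to the
    right per beat. Trailing zeros are fed in to flush the pipeline.
    """
    k = len(weights)
    held = list(reversed(weights))      # weight stored in PE 0, PE 1, ..., PE k-1
    y_regs = [0] * k                    # output register of each PE
    outputs = []
    for x in list(xs) + [0] * (k - 1):  # one loop iteration per clock beat
        y_in = [0] + y_regs[:-1]        # each PE takes its left neighbour's last output
        y_regs = [y + w * x for y, w in zip(y_in, held)]
        outputs.append(y_regs[-1])      # result leaving the right-hand end
    return outputs

def direct_convolve(weights, xs):
    """Reference: Y[t] = sum over m of W[m] * X[t - m]."""
    n, k = len(xs), len(weights)
    return [sum(weights[m] * xs[t - m] for m in range(k) if 0 <= t - m < n)
            for t in range(n + k - 1)]
```

One result leaves the array on every beat, and each PE performs a single multiply-add per beat without any access to a central memory for the intermediate sums.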

A concrete example of a systolic array is the pattern matching chip developed at
Carnegie-Mellon University by Foster and Kung 41). A character string of arbitrary length
is searched for the occurrence of a pattern (figure 24). The pattern may contain wild card
characters, which match any character (the X in the figure). The pattern moves over the
character string and when a match is found a logical 1 is output, otherwise the result
is 0.

Fig. 24 The principle of the pattern matching chip, which is a practical implementation of
the systolic array. (In the figure the host sends the pattern AXC and the string
ABCAACCQQ... to the pattern matcher PM, which returns the result 001001100...)
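
The input/output behaviour of such a pattern matcher can be reproduced with a few lines of Python. This sketch models only what goes in and what comes out, not the systolic cell structure of the chip; the function name is hypothetical and the use of the letter X as wild card follows figure 24 (so a literal X cannot be searched for in this simplified version).

```python
WILDCARD = "X"   # wild card symbol as in figure 24 (assumption: never a literal)

def match_stream(pattern, text):
    """Return one result bit per input character.

    The bit is 1 when the characters ending at that position match the
    pattern, with the wild card matching any character, and 0 otherwise.
    """
    k = len(pattern)
    bits = []
    for end in range(len(text)):
        start = end - k + 1
        window = text[start:end + 1] if start >= 0 else ""
        hit = len(window) == k and all(
            p == WILDCARD or p == c for p, c in zip(pattern, window))
        bits.append("1" if hit else "0")
    return "".join(bits)
```

Applied to the data of figure 24, `match_stream("AXC", "ABCAACCQQ")` returns `"001001100"`, the result stream shown there.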

5.2 The Geometry Engine.

The geometry engine, designed by Clark 35), is intended to transform objects to be
displayed on a graphics screen. A complete system requires 12 identical chips, used in
slightly different ways. The functions performed are:
- coordinate transformation of the object
- clipping to the boundaries of the viewing window
- scaling to the viewport.

The chip itself is an arithmetic device, capable of performing floating-point operations
on numbers with an 8-bit exponent and a 20-bit mantissa. There are 4 floating-point units
per chip; multiplication is performed in 12 µs. The overall performance of the 12-chip
system is remarkable: 3500 lines or 900 polygons can be transformed, clipped and scaled
every 1/30 second, the refresh time of the display. At the time of publication of the
description of the geometry engine, a part of the final chip had actually been implemented,
proving the feasibility of the device.

5.3 RISC

Before we describe the Reduced Instruction Set Computer of Patterson and Sequin 42), we
want to repeat some of the arguments used in discussions on optimum instruction sets. The
fundamental question is: should the instruction set of a computer be simple or complex,
i.e. should the set contain powerful instructions such as block moves, which keep the
processor busy for many cycles and which require complex logic, or should it not?

Some of the arguments in favour of CISCs (Complex Instruction Set Computers) are:
- the implementation of a compiler is simplified on a CISC.
- some functions of the operating system can be migrated to the hardware, with increased
performance.
- the code for a CISC is compact.

As we have seen in chapter 3, the present trend is towards CISC. The Z8000, MC 68000 and
NS 16032 are examples. The Intel iAPX 432 and the Hewlett Packard HP 9000 (430 000
transistors!) have gone even further in this direction. By contrast, the Texas TMS 99000
has remained rather simple.

The opponents of this trend advance the following arguments against CISCs:
- maybe the code generation part of a compiler will be simplified, but this is only a
small part of the total job. The lexical and syntax analysis, parsing, optimization and
also loaders and exception handling remain unchanged.
- complex instructions may do the wrong thing for languages other than the one they were
designed for.
- the cost of memory goes down. So why worry about compact code?
- compact programs are not necessarily the fastest.

They further argue that the design of the instruction set should be based on the actual
use of the instructions. So LOAD, STORE and BRANCH instructions should be made faster. The
designer can forget about the rest, the instructions which are seldom used: they should be
replaced by software. The compiler should get the task of optimizing the hardware/software
mix for highest speed.

The RISC-I is an example of a Simple Instruction Set Computer. The design goals were the
following:
- instructions should be executed in a single cycle (register -> operation -> register)
- higher level instructions should be implemented by software, using the basic
instructions.
- all instructions should have the same word length.
- high-level languages should be supported through fast execution of CALL and RETURN,
together with an effective means of parameter passing and allocation of storage for local
variables.

RISC-I has a very elegant scheme of "register windows" to implement the latter
requirement. A register window is shown in figure 25a. It consists of a storage area for
global variables and three other areas: high, local and low. When a procedure A calls
procedure B, the low area of A becomes the high area of B (see figure 25b). Parameters are
passed through this area. In the course of execution of a program the register window
slides upward and downward, according to the nesting of the procedures.
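
The sliding of the register window can be modelled with a small Python class. This is a toy model for illustration: the class and method names are hypothetical, the area sizes and the number of windows are arbitrary defaults chosen for the sketch, and what a real machine does when it runs out of windows is not modelled (the sketch simply raises an error).

```python
class RegisterWindows:
    """Toy model of overlapping register windows (sizes are illustrative)."""

    def __init__(self, n_windows=8, n_global=10, n_overlap=6, n_local=10):
        self.n_windows = n_windows
        self.n_overlap = n_overlap                 # size of the HIGH and LOW areas
        self.n_local = n_local
        self.stride = n_overlap + n_local          # distance between successive windows
        self.globals = [0] * n_global              # area shared by all procedures
        self.file = [0] * (n_windows * self.stride + n_overlap)
        self.current = 0                           # index of the active window

    def _index(self, area, i):
        sizes = {"high": self.n_overlap, "local": self.n_local, "low": self.n_overlap}
        offsets = {"high": 0, "local": self.n_overlap, "low": self.stride}
        if not 0 <= i < sizes[area]:
            raise IndexError(f"{area}[{i}] is outside the window")
        return self.current * self.stride + offsets[area] + i

    def read(self, area, i):
        return self.file[self._index(area, i)]

    def write(self, area, i, value):
        self.file[self._index(area, i)] = value

    def call(self):
        """Slide the window: the caller's LOW area becomes the callee's HIGH area."""
        if self.current + 1 >= self.n_windows:
            raise OverflowError("no free window (overflow handling is not modelled)")
        self.current += 1

    def ret(self):
        """Slide the window back to the caller."""
        if self.current == 0:
            raise IndexError("return without a matching call")
        self.current -= 1
```

A caller writes its parameters into its low area and executes `call()`; the callee finds them in its high area without any copying, and a value written there is visible to the caller again after `ret()`.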

Fig. 25 Register windows used in RISC-I. a) The components of a register window.
b) Parameters are passed to procedures by sliding the register window up and down.
(In the figure: a) a register window with the areas HIGH, LOCAL, LOW and GLOBAL; b) the
windows of procedures A, B and C, where LOW A coincides with HIGH B and LOW B with HIGH C,
above the common GLOBAL area.)

It is interesting to see how well RISC-I performs compared to other processors. Some
results have been published 43). The comparison concerns a pattern matching instruction,
MATCHP, implemented in microcode on the VAX 11/750. The instruction counts the occurrences
(m) of a 16-bit pattern in a stream of n bits. The instruction was rewritten in C and in
assembly language for the VAX 11/750, 11/780 and for RISC-I. The microcoded implementation
is the fastest; RISC-I hand-coded is 5.1 times slower, the C version 5.6 times, the
hand-coded version on the VAX 11/780 8.6 times and on the VAX 11/750 14.3 times slower. In
spite of the slower cycle of RISC (1.25 x slower than the 11/750), its performance is
considerably better than the VAX's, in this particular case at least. Note also that the
code produced by a portable, non-optimizing C compiler can hardly be improved by
hand-coding on RISC-I. These are certainly encouraging results for the proponents of
simple instruction set computers.
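
To make the benchmark task concrete, the following Python sketch counts the occurrences of a 16-bit pattern in a stream of n bits. It is not the MATCHP microcode nor the C version used in the published comparison; the function name, the representation of the stream as an integer with the most significant bit first, and the decision to count overlapping occurrences are all assumptions of this sketch.

```python
def count_pattern(pattern, stream, n):
    """Count occurrences of a 16-bit pattern in the n-bit value stream.

    The stream is scanned from its most significant bit; a 16-bit window
    is shifted along one bit at a time and compared with the pattern.
    """
    count = 0
    window = 0
    for i in range(n):
        bit = (stream >> (n - 1 - i)) & 1
        window = ((window << 1) | bit) & 0xFFFF   # keep the last 16 bits seen
        if i >= 15 and window == pattern:         # window is full from bit 15 onwards
            count += 1
    return count
```

The loop body consists only of shifts, a mask, a comparison and a branch, which are the kinds of simple register operations a reduced instruction set is designed to execute quickly.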

6. Conclusion

In these lecture notes the author has paced rapidly over a wide field, picking a flower
here and there, without bending down to inspect more closely the large variety of herbs
growing in the field. He hopes nevertheless that he has succeeded in giving a glimpse of a
few topics of interest for experimental physicists. The bit slices are the devices to be
used when speed is at a premium or when an existing machine must be emulated. The fixed
instruction set microprocessors are catching up very rapidly, not only in speed and
arithmetic capabilities, but especially in their high-level language and other software
support facilities. Luckily, the times are gone when an engineer painfully aligned zeroes
and ones to be burnt into a PROM, hoping that it would make his microprocessor chip do
something useful. The new developments in VLSI may produce some further happy surprises in
the years to come. But even without new devices, clever use of the old ones can also be of
great help in physics experiments.

7. Acknowledgements.

I would like to thank Mrs. C. Gentet for preparing the manuscript and making it fit to
print and Mrs. O. Marais for her usual fast and accurate production of figures.

8. References.

1. R.N. Noyce, M.E. Hoff Jr., A History of Microprocessor Development at Intel, IEEE
Micro, Vol. 1, no. 1, Febr. 1981, pp. 8-21.
2. T. Lingjaerde, A Fast Microprogrammable Processor, CERN DD/75/17 (1975).
3. E. Barrelet, R. Marbot and P. Matricon, A Versatile Microcomputer for High Rate
Camac Data Acquisition, in Real-time Data Handling and Process Control, Proc. 1st
European Symposium, Berlin 1979 (ed. H. Meyer) (North Holland, Amsterdam 1980),
p. 77.
4. P. Schildt, H.J. Stuckenberg, N. Wermes, MONICA - a Programmable Microprocessor
for Track Recognition in an e+e- Experiment at Petra, in Proc. Topical Conference
on the Application of Microprocessors to High-Energy Physics Experiments,
Geneva 1981, CERN 81-07, p. 38.
5. C. Halatsis, A. van Dam, J. Joosten and M.F. Letheren, Architectural Considerations
for a Microprogrammable Emulating Engine Using Bit-slices, CERN DD/79/7
(1979), in Proc. 7th Int. Symposium on Computer Architecture, La Baule, 1980,
IEEE publ. 80 CH 1494-4C (1980).

6. J. Anthonioz-Blanc, C. Halatsis, J. Joosten, M.F. Letheren, A. van Dam, A. van
Praag and C. Verkerk, MICE, a fast User-Microprogrammable Emulator of the
PDP-11, CERN DD/80/14 (1980).
7. The 370/E was developed at the Weizmann Institute by H. Brafman, R. Fall and R.
Yaari.
8. A.B. Salisbury, Microprogrammable Computer Architectures, Elsevier, New York
1976.
9. J. Lecoq, Thèse d'Etat, Mulhouse 1982.
10. P.F. Kunz, The LASS Hardware Processor, Nucl. Instr. Methods 135, 435 (1976).
11. P.F. Kunz, R.F. Fall, M.F. Gravina and H. Brafman, The LASS Hardware
Processor, in Proc. 11th Annual Microprogramming Workshop, Pacific Grove, Cal.,
1978, IEEE pub. 78 CH 1411-8 (1978).
12. P.F. Kunz, R.M. Fall, M.F. Gravina, J.H. Halpering, L.J. Levinson, G.J. Oxoby
and Q.H. Trang, Experience Using the 168/E Microprocessor for Off-line Data
Analysis, IEEE Trans. Nucl. Science NS-27, 582 (1980).
13. D. Lord et al., The 168/E at CERN and the Mark 2, an Improved Processor Design,
in Proc. Topical Conf. on Application of Microprocessors to High-Energy Physics
Experiments, Geneva 1981, CERN 81-07, p. 341.
14. C. Halatsis, Software Tools for Microprocessor Based Systems, in Proc. 1980 CERN
School of Computing, Vraona, 1980, CERN 81-03, p. 241.
15. M.R. Barbacci and A.W. Nagle, An ISPS Simulator, Technical Report, Dept. of
Computer Science, Carnegie-Mellon University (1978).
16. A.K. Agrawala and T.G. Rauscher, Foundations of Microprogramming; Architecture,
Software and Applications (Academic Press, New York, 1976).
17. S.S. Husson, Microprogramming: Principles and Practices (Prentice-Hall, Englewood
Cliffs, NJ, 1970).
18. G.J. Myers, Digital System Design with LSI Bit-Slice Logic (Wiley Interscience,
1980).
19. A. van Dam, Introduction to Bit Slices and Microprogramming, in Proc. 1980 CERN
School of Computing, Vraona, CERN 81-03, p. 220.
20. S.P. Morse, B.W. Ravenel, S. Mazor and W.B. Pohlman, Intel Microprocessors -
8008 to 8086, Computer, Vol. 13, no. 10, October 1980, p. 42.
21. J.F. Wakerly, Microcomputer Architecture and Programming (John Wiley and Sons,
New York, 1981).
22. B.L. Peuto, Architecture of a New Microprocessor, Computer, Vol. 12, no. 2, Feb.
1979, p. 10.
23. S. Stritter and T. Gunter, A Microprocessor Architecture for a Changing World,
the Motorola 68000, Computer, Vol. 12, no. 2, Feb. 1979, p. 43.
24. S. Stritter and N. Tredennick, Microprogrammed Implementation of a Single Chip
Microprocessor, in Proc. 11th Annual Microprogramming Workshop, Pacific Grove,
1978, IEEE pub. 78 CH 1411-8, p. 8.
25. R.V. Orlando and T.L. Anderson, An Overview of the 9900 Microprocessor Family,
IEEE Micro, Vol. 1, no. 3 (August 1981), p. 38.
26. S. Bal et al., The NS 16000 Family - Advances in Architecture and Hardware,
Computer, Vol. 15, no. 6 (June 1982), p. 58.
27. Hoo-min D. Toong and Amar Gupta, An Architectural Comparison of Contemporary
16-bit Microprocessors, IEEE Micro, Vol. 1, no. 2 (May 1981), p. 26.
28. C. Verkerk, Use of Intelligent Devices in High-Energy Physics Experiments, in
Proc. 1980 CERN School of Computing, Vraona, CERN 81-03, p. 282.
29. H.L. Garner, The Residue Number System, IRE Trans. Electr. Computers,
Vol. EC-8 (1959), p. 140.

30. N.S. Szabo and R.J. Tanaka, Residue Arithmetic and Its Applications to Computer
Technology (McGraw-Hill, New York, 1967).
31. A. Huang, Number Theoretic Processors, Thesis, Dept. of Electrical Eng., Stanford
University, 1980.
32. W.M. Loucks, M. Snelgrove and S.G. Zaky, A Vector Processor based on One-bit
Microprocessors, IEEE Micro, Vol. 2, no. 1 (Febr. 1982), p. 53.
33. H. Bernstein et al., A Microprocessor-based Single Board Computer for High Energy
Physics Event Pattern Recognition, in Proc. Topical Conf. Application of
Microprocessors to High-Energy Physics Experiments, Geneva, 1981, CERN
81-07, p. 479.
34. P.C. Treleaven, VLSI Processor Architectures, Computer, Vol. 15, no. 6 (June
1982), p. 33.
35. J. Clark, A VLSI Geometry Processor for Graphics, Computer, Vol. 13, no. 7 (July
1980), p. 59.
36. G.J. Sussman, J. Holloway, G.L. Steele Jr. and A. Bell, Scheme-79, LISP on a
Chip, Computer, Vol. 14, no. 7 (July 1981), p. 10.
37. Introduction to the iAPX 432 Architecture, Intel Corporation, Santa Clara, Cal.
1981.
38. S. Zeigler, N. Allègre, R. Johnson, J. Morris and G. Burns, Ada for the Intel 432
Microcomputer, Computer, Vol. 14, no. 6 (June 1981), p. 47.
39. H.T. Kung, Let's Design Algorithms for VLSI Systems, Proc. Conf. Very Large
Scale Integration, Architecture, Design, Fabrication, Cal. Inst. of Technology, Los
Angeles, Jan. 1979, pp. 65-90.
40. H.T. Kung, Why Systolic Architectures?, Computer, Vol. 15, no. 1 (Jan.
1982), p. 37.
41. M.J. Foster and H.T. Kung, The Design of Special Purpose VLSI Chips, Computer,
Vol. 13, no. 1 (Jan. 1980), p. 26.
42. D.A. Patterson and C.H. Sequin, A VLSI RISC, Computer, Vol. 15, no. 9 (Sept.
1982), p. 8.
43. J.R. Larus, A Comparison of Microcode, Assembly Code and High-Level Languages
on the VAX-11 and RISC-I, Computer Architecture News (ACM Special Interest
Group), Vol. 10, no. 5, Sept. 1982, p. 10.
