C. Verkerk
CERN, Geneva, Switzerland
1. Introduction
For most readers the title of these lecture notes will evoke microprocessors. The
fixed instruction set microprocessors are however not the only programmable digital
microcircuits and, although a number of pages will be dedicated to them, the aim of these
notes is also to draw attention to other useful microcircuits. A complete survey of
programmable circuits would fill several books and a selection had therefore to be
made. The choice has been to treat a variety of devices rather than to give an
in-depth treatment of a particular circuit. The selected devices have all found useful
applications in high-energy physics, or hold promise for future use.
Table I

PROCESSORS
  - CALCULATOR CHIPS
  - FIXED INSTRUCTION SET MICROPROCESSORS *
      - 1 bit
      - 4 bit
      - 8 bit *
      - 16 bit *
      - 32 bit
  - BIT-SLICES *
      - (R)ALU
      - SEQUENCER
      - SUPPORT (MAU, DMA)
      - SPECIAL (FFT)
  - VLSI
      - SPECIAL PURPOSE (CIPHER, GRAPHICS *, SYSTOLIC *, MOUSE)
      - TREE MACHINES
      - SIMPLE INSTRUCTION SET (LISP, RISC) *
      - NON-VON NEUMANN
Table II

  - MEMORY *
      - RAM *
      - ROM, EPROM (FIRMWARE CHIPS)
  - TIMERS/EVENT COUNTERS
  - ARITHMETIC *
      - MULTIPLIERS
      - FLOATING POINT *
  - PROCESSOR SUPPORT
      - MEMORY MANAGEMENT *
      - PLAs, PALs
      - INTERRUPT CONTROLLER
      - DMA CONTROLLER
  - I/O SUPPORT
      - GENERAL INTERFACE (PARALLEL, SERIAL)
      - KEYBOARD, LED DISPLAY
      - DISC, CASSETTE INTERFACE
      - CRT CONTROLLER, CHARACTER GENERATOR
      - DOT-MATRIX PRINTER CONTROL
      - COMMUNICATION (HDLC, ETHERNET, ETC.)
      - CYCLIC REDUNDANCY CHECK
      - ENCRYPTION, DATA SECURITY
      - INTER-BUS CONNECTION, TRANSCEIVERS
      - IEEE-488 TRANSCEIVERS
      - BURST-ERROR DETECTION
      - STEPPER MOTOR
      - DIGITAL -> ANALOG
      - ANALOG -> DIGITAL
      - SPEECH SYNTHESIS, SPEECH ANALYSIS
Finally, if we were to make some classification scheme of the whole area of programmable
digital microcircuits, we could distinguish between the processors proper and
the support chips. The processors can be further subdivided into user-programmable
and fixed program processors (e.g. the marvels you find inside your electronic watch).
Still finer sub-divisions of the user-programmable processors and the support chips are
shown in Tables I and II. The devices which will receive some attention in the following
pages are marked with an asterisk.
2. Bit-Slice Microprocessors.
[Figure: block diagram of a bit-sliced microprocessor, showing the computer control unit (CCU),
the register file, the arithmetic processor unit (APU, built from RALU slices, with status flags),
the program control unit (PCU), the interrupt controller, the DMA controller and the I/O unit
connecting the peripherals of the external world.]
Fig. 2 Diagram showing how a 16-bit ALU can be built from four 4-bit slices.
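The cascading of slices shown in Fig. 2 can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each slice is modelled as a plain 4-bit adder, and the carry ripples from the Cout of one slice into the Cin of the next. The function names are ours; real slice families usually add carry look-ahead logic and many more ALU operations.

```python
def alu_slice_add(a4, b4, carry_in):
    """One 4-bit slice: add two 4-bit operands and the incoming carry."""
    total = (a4 & 0xF) + (b4 & 0xF) + carry_in
    return total & 0xF, total >> 4          # (4-bit result, carry out)


def add16(a, b, carry_in=0):
    """Chain four slices, least significant first: Cout of one feeds Cin of the next."""
    result, carry = 0, carry_in
    for i in range(4):
        nibble, carry = alu_slice_add(a >> (4 * i), b >> (4 * i), carry)
        result |= nibble << (4 * i)
    return result, carry
```

Calling add16(0x1234, 0x0FFF) passes a carry through two slice boundaries; the second element of the returned pair is the carry out of the most significant slice.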
C. Fetch and decode sequence common to all instructions
[Figure: Wilkes microprogrammed control storage. A decoder selects one of 2^n lines (one line per
microinstruction) of a diode-matrix ROM with two parts, A and B, one of which supplies the next
address. The ALU status allows conditional branches, and the target instruction register (IR)
feeds the address logic.]
Fig. 4 Wilkes microprogramming model. Each microinstruction contains
control information together with the address of the next microinstruction.
the MAR inputs, so that the next clock pulse will actually strobe the data into the
register. All this control information is provided by the micro-instruction's fields. The
micro-instruction itself can also contain the address of the next micro-instruction to be
executed and thus define the flow of control of the microprogram. What has been
described so far is exactly the micro-programming model proposed in 1951 by Wilkes and
shown in figure 4. Each clock pulse causes a new microcode to be strobed into the
decoder and thus new control information to become available together with the next
address.
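To make the model concrete, here is a minimal Python sketch of a control store in the spirit of Wilkes' scheme: each microinstruction carries the control lines to be asserted and the address of its successor, with an optional alternative address for a conditional branch. The control line names (PC->MAR, READ, ...) and the contents of the store are invented for illustration; they do not describe any particular machine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroInstruction:
    controls: Tuple[str, ...]          # control lines asserted during this clock period
    next_addr: Optional[int]           # address of the next microinstruction (None = stop)
    branch_addr: Optional[int] = None  # alternative address, taken when the status flag is set


def run(control_store, start, status_flag, max_cycles=32):
    """Step through the control store, one microinstruction per clock pulse."""
    addr, trace = start, []
    for _ in range(max_cycles):
        micro = control_store[addr]
        trace.append((addr, micro.controls))
        if micro.next_addr is None:
            break
        if micro.branch_addr is not None and status_flag:
            addr = micro.branch_addr
        else:
            addr = micro.next_addr
    return trace


# Hypothetical control store: a fetch sequence followed by a conditional branch.
CONTROL_STORE = {
    0: MicroInstruction(("PC->MAR",), 1),
    1: MicroInstruction(("READ", "PC+1"), 2),
    2: MicroInstruction(("MDR->IR",), 3),
    3: MicroInstruction(("TEST",), 4, branch_addr=5),   # branch on ALU status
    4: MicroInstruction(("ALU_ADD",), None),
    5: MicroInstruction(("ALU_PASS",), None),
}
```

Running run(CONTROL_STORE, 0, False) and run(CONTROL_STORE, 0, True) gives two traces that differ only in the last step, which is the whole effect of the conditional branch.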
We have only touched the surface of microprogramming so far, but it should already
be clear to everyone that writing microcode is not a simple task and that it requires
very intimate knowledge of the hardware of the machine. We can learn still more from
the example: the first four steps are necessary to fetch (from main memory) and
decode instructions of a higher level than the micro-instructions themselves. In the
example a PDP-11 machine instruction was fetched and then executed. Writing short
microcode sequences, one for each conventional machine instruction, is therefore equivalent
to defining a set of such machine instructions. In other words, we have created the
possibility to write programs in assembly language, from where we can build up to
higher level languages. By writing other, longer sequences of microinstructions, we
could define a set of commands of a higher level; for instance a set particularly well
suited for interpretation of Pascal P-code. The task of writing microcode for the
execution of all defined instructions has only to be performed once (by an expert!). Once
it is done the machine presents itself to a programmer just as any other machine. The
microcode can be stored in a Read-Only Memory, in which case we have obtained a fixed
instruction set processor. When the microcode is stored in a Read-Write memory, the
possibility exists to add new instructions or sequences of instructions: the machine is
user-microprogrammable.
The designer of a bit-slice machine is free to define the instruction set his
processor will execute. He can define none. In that case the processor can only be
programmed by writing microcode. This can be done once and for all so that a fixed program
machine will be the result. The solution adopted generally is to leave it to the user to
program the processor. Several examples of such processors-to-be-microprogrammed
exist and are used in high-energy physics experiments: ESOP 2), CAB 3), MONICA 4).
The designer can also choose the instruction set of another, hopefully well-known
and widely used, computer, e.g. PDP-11 or IBM 370/168. In that case he creates an
"emulator". Emulators are also used in high-energy physics experiments; examples
are MICE 5), 6) and 370/E 7).
Of all these possibilities, emulators have the invaluable advantage that software
running on the emulated machine will also run on the emulator (if it does not, then
there is something wrong with the emulator). This means in practice that high-level
languages are available, and that programs can be compiled and debugged on a large
machine with excellent facilities. The final program can however run in a small, cheap
machine (the emulator), without fancy peripherals but dedicated to its task. Emulators
tend however to be slower than the directly microprogrammed machines. A user
microprogrammable emulator makes the best of both worlds.
of which between 20 and 30 are in operation in high-energy physics.
We noted already that writing microcode is difficult and tedious and that it requires
expertise. It is therefore important to use good tools when writing microcode. Several
good microassemblers now exist. In fact they are meta-assemblers, which means
that the code to be generated is not pre-defined inside the assembler, but the user has
to define it. A meta-assembler works in two (or three) phases:
- the definition phase. In this phase the format of the micro-instruction is defined,
symbolic names are given to fields and default values attributed. Macro definitions
are also made in this phase.
- the assembly phase. During this phase the symbolic micro-instructions (which use
the field-names and the macro-definitions) are assembled into binary code.
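The two phases can be illustrated with a toy meta-assembler in Python. This is a sketch of the principle only, not of any existing microassembler: the class name, the field names (ALU, DEST, NEXT), their widths and the macro INC_PC are all hypothetical. In the definition phase the user declares fields, defaults and macros; in the assembly phase symbolic micro-instructions are packed into a binary word.

```python
class MetaAssembler:
    def __init__(self):
        self.fields = []    # (name, width in bits, default), most significant first
        self.macros = {}    # macro name -> {field name: value}

    # --- definition phase -------------------------------------------------
    def define_field(self, name, width, default=0):
        self.fields.append((name, width, default))

    def define_macro(self, name, **settings):
        self.macros[name] = dict(settings)

    # --- assembly phase ---------------------------------------------------
    def assemble(self, *macro_names, **settings):
        values = {name: default for name, _, default in self.fields}
        requested = {}
        for macro in macro_names:
            requested.update(self.macros[macro])
        requested.update(settings)            # explicit settings override macros
        for name in requested:
            if name not in values:
                raise KeyError(f"undefined field {name!r}")
        values.update(requested)
        word = 0
        for name, width, _ in self.fields:
            if not 0 <= values[name] < (1 << width):
                raise ValueError(f"value for {name!r} does not fit in {width} bits")
            word = (word << width) | values[name]
        return word


# Definition phase for a hypothetical 9-bit micro-instruction format.
masm = MetaAssembler()
masm.define_field("ALU", 3)
masm.define_field("DEST", 2)
masm.define_field("NEXT", 4)
masm.define_macro("INC_PC", ALU=1, DEST=2)
```

With these definitions, masm.assemble("INC_PC", NEXT=5) packs ALU=1, DEST=2 and NEXT=5 into one word; fields that are not mentioned keep their default values.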
[Figure: blocks shown include the microinstruction register, MIF, RF, ALU, condition code logic,
MCF, interrupt logic and target instruction decoding, connected by the I-bus.]
Fig. 5 Simplified block diagram of a processor built from the Motorola 10800 bit-slice family.
There are however many applications where the expected volume of sales does not
reach millions, but where adaptability of the product is of great importance. For a long
time this has been the realm of the 8-bit microprocessors, until progress in device
technology made the 16-bit microprocessor possible. With the 16-bit processor came also
a breakthrough in the architecture of the machines, turning them into real computers,
which have no reason to envy the typical minicomputer of the mid-seventies. This
does however not mean that industry has abandoned the 8-bit micro. Its lower cost and
the fact that it is perfectly adequate for a large percentage of all applications (80% ?)
account for its lasting popularity. Enormous progress has also been made to overcome
the initial limitations of the 8-bit micros and the 6809 is a perfect example of how added
Table III
DOMAINS OF USE :
The limitations of the 8-bit processors are largely due to the word-size and restricted
addressing modes and to the need to keep the CPU simple. These limitations can
be summarized as follows:
i) arithmetic operations:
- limited precision; operations on reasonably sized operands are slow.
- hardware multiply/divide do not exist.
ii) the number of internal registers is limited, resulting in:
- slow operation, as intermediate results must be stored in memory.
- restricted indexing operations, which result in explicit address calculations,
slowing down the overall operation.
iii) the majority of instructions require more than one byte, again slowing down
execution.
iv) total address space is limited to 64 Kbytes, precluding the running of large
programs and also to a large extent the use of high-level languages.
v) limitations in the implemented addressing modes impede elegant solutions for
parameter passing in high-level language procedure calls.
vi) the use of absolute addresses is practically unavoidable (ROMs and I/O devices at
fixed addresses, forcing programs to occupy the holes left). In the absence of
universally accepted conventions, software is therefore not easy to transport
between systems.
vii) advanced features, such as multilevel interrupt or protection mechanisms, are
absent or very primitive.
It took industry a while to realize the impact of these restrictions and their
consequences. With advancing technology, emphasis was first put on microcontrollers, i.e.
applications where the total number of chips must be reduced. It is typical for this
trend that the 6802, which is a 6800 with 128 bytes of read/write memory on chip and
thus particularly suited for controller applications, was developed a few years before
the 6809. Both processors have approximately the same number of transistors; the
6809, although still an 8-bit machine, has overcome many of the limitations listed above,
the 6802 has not.
ACCA  Accumulator A
ACCB  Accumulator B
IX    Index Register
PC    Program Counter
SP    Stack Pointer
6800 Registers
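[Figure: the general purpose registers of the Z8000. The first eight 16-bit registers can also be
addressed as byte registers (RH0 ... RH7, RL0 ... RL7); registers can be paired into 32-bit
registers (RR0, RR2, ..., RR12, ...) and grouped into 64-bit registers (RQ0, RQ4, RQ8, ...).
The program counter is shown with an unused part and a PC offset.]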
Intel 8086.
[Figure: 8086 registers and internal organization: general registers, segment registers,
instruction pointer, address generation and bus control (multiplexed bus), operands and
instruction queue.]
[Figure: 8086 logical to physical address translation; the memory address latch holds the
physical address, bits 19 to 0.]
On the other hand, the use of Segment Registers makes it possible to bind a program to a
physical place in memory only at the last moment. A program may also be dynamically
relocated, if this move is accompanied by the appropriate change to the contents
of the segment registers, and the program or its data do not exceed 64K.
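A short Python sketch shows why changing a segment register is enough to relocate a program. It relies on the well-known 8086 rule, not spelled out in the text above, that the 16-bit segment value is shifted left by four bits and added to the 16-bit offset to give a 20-bit physical address. The function name is ours.

```python
def physical_address(segment, offset):
    """8086 address formation: 20-bit result from a 16-bit segment and a 16-bit offset."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# The same program offset, bound to two different places in memory
# simply by loading a different value into the segment register.
before = physical_address(0x1234, 0x0010)
after = physical_address(0x2000, 0x0010)
```

Since the offset is only 16 bits wide, everything reached through one segment register lies within a 64K window, which is the restriction mentioned above.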
The addressing modes implemented on the 8086 are sufficiently extended so that
position-independent or re-entrant code can be written without difficulty. The use of the
stack pointer and base register allows parameter passing in procedure calls. Another
feature of the 8086 is its string handling capabilities, which are fully interruptible,
Zilog Z8000.
The designers of this processor had a number of objectives in mind which they
summarized themselves 22): increase capabilities, provide architectural compatibility
over a range of capabilities, and increase clarity. The first objective is obvious but the
other two merit some explanation. Clarity means that all registers should play the same
role and that operations are not implicitly linked to the use of special purpose
registers. A general register machine, with 16 general purpose registers, each 16 bits wide, is
the result (see fig. 7). The registers can hold addresses or operands. Architectural
compatibility means in fact that there are two compatible models of the Z8000: the
Z8002 or the unsegmented version and the Z8001 or the segmented version. The models
have exactly the same architecture, but their internal structure is different. The
unsegmented version has an address space of 64K; the Z8001 can address 8 Mbytes of
memory, if an external memory management unit is used. The MMU calculates the
physical address from a 16-bit offset value and a 7-bit segment number. The way this is
done is shown in fig. 11. Note that the segment number is converted into a segment
base address by means of a look-up table. Fig. 12 shows an example of this mapping
operation. Note again that logical segments may well overlap physically.
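The mapping can be sketched as follows in Python. We assume here, consistent with the 24-bit physical address in the figure, that the look-up table holds a 16-bit base value giving the upper 16 bits of the physical address (so the base has eight low-order zeros) and that the offset is simply added to it. The table contents are invented, and limit and protection checks are left out.

```python
def translate(segment_table, segment, offset):
    """Map (7-bit segment number, 16-bit offset) to a 24-bit physical address."""
    base = segment_table[segment & 0x7F]        # look-up: segment number -> base
    return ((base << 8) + (offset & 0xFFFF)) & 0xFFFFFF


# Hypothetical table: segment 1 starts inside the 64K range covered by segment 0,
# so the two logical segments overlap physically.
SEGMENT_TABLE = {0: 0x0100, 1: 0x0180}
```

Here translate(SEGMENT_TABLE, 0, 0x8000) and translate(SEGMENT_TABLE, 1, 0x0000) refer to the same physical location, which illustrates the overlap noted above.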
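[Figure: address translation by the memory management unit. The segment number selects a base
address in the look-up table; base address (with eight low-order zeros) and 16-bit offset are
combined into the 24-bit physical address sent to memory.]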
Motorola 68000.
The 68000, more recent than the two preceding processors, has become very popular
in a short time. It is not only widely used in high-energy physics, but also in
many different Personal Work Stations, including IBM's! This is not astonishing,
considering that the 68000 provides all the facilities required for a modern computing
system 21), 23). Internally, it is a 32-bit machine; it interfaces to the external world over
16 data lines. It has two sets of 8 32-bit registers: one set are data registers, the
other address registers. The CPU contains two additional special purpose registers:
the program counter and the 16-bit status register (see fig. 13). In the first version
of the 68000 the PC has 24 bits, but this can be extended in later models. In fact, the
68000 should be considered as the first member of an architectural family which, as
technology advances, can grow with new members having increased capabilities. The
development of such a family is greatly helped by the fact that the 68000 is a
microprogrammed machine. It even has two levels of control: micro- and nano-control (see
figure 14). By separating the control of program flow from the control of the
combinatorial circuits in the CPU, a saving in the total size of the control store has been
obtained 21). The 68000 can operate in one of two states: user or supervisor. The
supervisor state uses a separate stack pointer which cannot be corrupted by a user. The
instruction set, modeled after the PDP-11, is consistent, in the sense that any
instruction which specifies an operand in memory may use any of the addressing modes.

Fig. 13 The register structure of the Motorola 68000 (data registers, address registers,
program counter PC, status register SR). Note that the general registers are 32-bit wide.
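[Figure 14: micro- and nano-control in the 68000. The instruction register (IR) and instruction
decode supply addresses to the micro control store (about 640 x 10), which handles sequencing,
address modification and conditionals; the micro control store addresses the nano control store
(about 280 x 70), which drives the execution unit control (timing + switch).]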
SAMPLE PROGRAM:
    PROGRAM EXAMPLE;
    VAR PARAM1, PARAM2 : INTEGER;
    PROCEDURE PROC (X : INTEGER; VAR Y : INTEGER);
    VAR A, B : INTEGER;
    BEGIN
        <procedure body>
    END;
    BEGIN
        PROC (PARAM1, PARAM2)
    END.
PROGRAM BODY :
    MOVE   PARAM1 TO -SP@             "push first parameter"
    PEA    PARAM2                     "push address of 2nd parameter"
    JSR    PROC                       "call the procedure"
    ADD    #6 TO SP                   "pop parameters from the stack"
PROCEDURE BODY :
    LINK   FP, 4                      "link and allocate three local variables"
    MOVEM  <registerlist> TO -SP@     "push some register contents"
    <procedure body>
    MOVEM  <registerlist> FROM SP@    "restore registers"
    UNLK   FP                         "restore stack"
    RETURN                            "return to calling procedure"
Texas 9900/99000.
The 9900 is a rather old processor, which can be replaced by the faster and fully
backward compatible 99000. The 99000 is one of the fastest microprocessors on the
market. The 9900 is singular in the sense that it has no internal registers to perform
operations on. There is only a program counter, a status register and the workspace
pointer WP. WP points to a block of 16 locations in main memory which act as the 16
general purpose registers of the machine. Generally this slows down the operation of
the machine, except when a context switch must be performed following an interrupt.
To change the context of the machine it is sufficient to load a new value into
WP, after the old value has been preserved.
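The workspace idea is easy to model in Python. In this toy model (class and method names are ours) memory is an array of 16-bit words and WP is a word index rather than a byte address. Saving the old WP in register 13 of the new workspace follows our recollection of the 9900 convention and should be read as an assumption; the real machine also preserves the program counter and the status register.

```python
class TMS9900Model:
    """Toy model: the 16 'registers' are 16 consecutive words of main memory at WP."""

    def __init__(self, memory_words=1024):
        self.memory = [0] * memory_words
        self.wp = 0                       # workspace pointer (word index in this model)

    def read_reg(self, r):
        return self.memory[self.wp + r]

    def write_reg(self, r, value):
        self.memory[self.wp + r] = value & 0xFFFF

    def switch_context(self, new_wp):
        old_wp = self.wp
        self.wp = new_wp                  # the whole register set changes at once
        self.write_reg(13, old_wp)        # preserve the old WP in the new workspace
        return old_wp

    def restore_context(self):
        self.wp = self.read_reg(13)       # back to the interrupted workspace
```

No register contents are copied at any point: the interrupted program's registers simply stay where they are in memory until restore_context() points WP back at them.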
The machine has a few serious shortcomings. For instance, subroutine return
addresses are stored in a dedicated register (inside the workspace of course).
Subroutines can therefore not be nested, unless the return address is moved to another place
before the new call is made. The memory space is limited to 64 Kbytes, with no
possibility of extension. The designers of the 99000 even refused to contemplate the
introduction of some memory management scheme in their design 25), giving absolute
priority to the principle of backwards compatibility.
National 16008/16016 and 16032.
Table IV
Table V
4. Support chips
The variety of microprocessor support chips is astounding (see Table II for what is
probably a "short list"). We will limit ourselves to a few devices which have found
interesting applications in high-energy physics.
4.1 Memory
Memory devices are undoubtedly the most widely used support chips. Here we will
only mention a few unusual applications of memory.
$F = ABC + ACD + BCD$
$R = \sqrt{X^2 + Y^2}$ and $\varphi = \arctan(Y/X)$
The principle is simple: the input values are concatenated to form an address. At that
address in memory the output value is stored. The answer is obtained in one single
memory access time.
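As an illustration of the principle, the following Python sketch fills a table for the conversion to polar coordinates and then obtains R and the angle with a single indexed access. The operand width of 4 bits, the rounding to integers and the use of degrees are arbitrary choices made for this example; in hardware the list would be a ROM or RAM.

```python
import math

BITS = 4  # width of each input value (assumption for this example)


def build_polar_table(bits=BITS):
    """Precompute (R, phi in degrees) for every pair (X, Y); the address is X concatenated with Y."""
    size = 1 << bits
    table = [None] * (size * size)
    for x in range(size):
        for y in range(size):
            address = (x << bits) | y
            r = round(math.hypot(x, y))
            phi = round(math.degrees(math.atan2(y, x)))
            table[address] = (r, phi)
    return table


def polar(table, x, y, bits=BITS):
    """One 'memory access': concatenate the inputs and read the stored answer."""
    return table[(x << bits) | y]


POLAR_TABLE = build_polar_table()
```

The cost is table size: two 4-bit inputs need 256 entries, two 8-bit inputs already 65536.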
[Figure: a 1024 x 1 memory used as a look-up table; the input variables drive the address lines
(A4 ... A0 shown).]
$\alpha \cdot \beta = (A \cdot 2^8 + a)(B \cdot 2^8 + b) = AB \cdot 2^{16} + (Ab + aB) \cdot 2^8 + ab$
Four identical tables with 256 entries are now all that is needed. All linear functions
can be treated this way. Counting the number of bits set in a word is a third example
where a look-up table is faster than any other method. Again the table size can be
reduced at the cost of a few additions.
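The bit-counting case is easily sketched in Python. A table for a full 32-bit word would need 2^32 entries; splitting the word into four bytes reduces this to one 256-entry table (or four identical copies read in parallel in hardware) at the cost of three additions. The 32-bit word length is our choice for the example.

```python
# 256-entry table: number of bits set in each possible byte value.
BYTE_COUNTS = [bin(value).count("1") for value in range(256)]


def popcount32(word):
    """Count the bits set in a 32-bit word with four table look-ups and three additions."""
    return (BYTE_COUNTS[word & 0xFF]
            + BYTE_COUNTS[(word >> 8) & 0xFF]
            + BYTE_COUNTS[(word >> 16) & 0xFF]
            + BYTE_COUNTS[(word >> 24) & 0xFF])
```

The split works because the bit count of a word is the sum of the bit counts of its bytes.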
$R_k = N \bmod B_k$
One would therefore expect to find large sized CAM chips on the market. This is
however not the case: a CAM chip is heavily pin-limited. Whereas a normal RAM of 2^k
words x N bits needs only k + N pins (plus a few more for power, R/W control and
chip enable), a CAM of the same size needs in addition N pins for a mask and 2^k pins
to signal the matches found. These 2^k lines cannot be encoded, as the possibility would
be lost to find multiple matches in a single cycle. A large CAM can of course be built
from many small capacity chips, but a more elegant solution is offered by associative
processors. An associative processor searches all elements of an array simultaneously
for a match. Generally this is done bit-by-bit. The array to be searched may have
1000 entries, typically. As long as the ratio of the word length over the number of
entries to be searched is small, a considerable gain in speed is obtained. In an
associative processor each memory word has a simple processing element attached to it. A
memory word contains in general many bits, so that several attributes can be stored
along with the search field. For instance if the memory contained personal data, a
search could be made for all the Smiths who are between 45 and 55 years old. Once
located, their address and telephone number could then be read from those memory
locations where a match was found.
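A bit-serial search of this kind can be modelled in Python as below. Each memory word has a match flag, which plays the role of its processing element; one bit position of the search field is handled per step, and within a step all words are treated alike (simultaneously in hardware, in a loop here). The function name and the packing of the records are assumptions of this sketch, and it only covers a masked equality search; a range search such as "between 45 and 55" needs further bit-serial comparison steps that are not shown.

```python
def associative_search(words, key, mask, width):
    """Return one match flag per word: True where (word & mask) == (key & mask)."""
    flags = [True] * len(words)                # one flag per processing element
    for bit in range(width - 1, -1, -1):       # bit-serial: one bit position per step
        if not (mask >> bit) & 1:
            continue                           # bit lies outside the search field
        key_bit = (key >> bit) & 1
        # Conceptually all PEs perform this comparison at the same time.
        flags = [flag and ((word >> bit) & 1) == key_bit
                 for flag, word in zip(flags, words)]
    return flags
```

The number of steps depends on the width of the search field, not on the number of entries, and several flags may remain set at the end, which is why the match lines of a CAM cannot be encoded.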
The memory for an associative processor can be built from normal RAM chips, for
instance as indicated in figure 18. The normal address lines are used to select one bit,
for all 1024 elements of the vector. These 1024 bits are treated in the 1024 processing
elements. The processing elements (PEs) must be cheap to make an associative
processor a viable structure. Since one bit is treated at a time, a single-bit microprocessor
would be indicated. An associative processor using the single-bit Motorola MC14500B
has been built at the University of Toronto 32). In addition to the PEs and the working
store there is also a backing store. The different elements communicate as shown in
figure 19 for a single horizontal slice through the machine. The shift register is used
for communication between the different elements of the vector. This communication is
needed for the execution of operations which are more complex than the simple
searches. A block diagram of the microprocessor is shown in figure 20. The MC14500B has a
set of 7 boolean and 9 other instructions. Three of the instructions automatically enable
the write line. Reference 32 gives more details on this cheap associative processor, in
particular examples of search and other operations.
[Figure 18: memory for an associative processor built from normal RAM chips. Each 8-bit chip
serves 8 PEs; 1024 elements in a vector, each with a k-bit search field, are served by 1024 PEs
in total.]
[Figure 19: a single horizontal slice through the machine: backing store (CCD) BK, working
store (RAM) WK, shift register SH and PE, with serial input, constant input and serial output.]
[Figure 20: block diagram of the MC14500B: write control, address lines, flags, OEN and write
enable; 7 boolean + 9 other instructions (3 of which enable write).]
or division. The Intel 8087, a co-processor for the 8086, improves greatly on this
figure, bringing it down to 16 µs (or 24 µs for multiplication of two 64-bit real numbers).
The question arises if it is feasible to replace large number crunching computers by a
reasonable number of 16-bit microprocessors, each with an attached floating point
processor. Can a reasonable performance be obtained at an affordable cost? At
the Brookhaven National Laboratory this problem was investigated 33). An experimental
processor was built from a 68000 and an AMD 9511. A Fortran compiler was developed
for the 68000 and a piece of pattern recognition code for events in the multiparticle
spectrometer was run. Performance measurements were made using this code. The
results were then extrapolated to an 8 MHz 68000 (instead of the 4 MHz version used in
the test) and to an NS 16081 (instead of the 9511). It was found that the combination of
a 68000 and an NS 16081 would have 1/30 of the power of a CDC 7600, or approximately
the power of a DEC-10 (with a KA10 processor). One can conclude from this result that
single microprocessors do not yet provide a solution for applications where
number-crunching is essential.
5. VLSI
Besides these projects, data flow machines, tree machines, systolic arrays and a
number of other non-von Neumann structures are the object of study. Industry tends
to remain on the safe side and its VLSI developments are essentially limited
to fabricating larger memories and more complex (micro)processors (e.g. the iAPX 432
of Intel 37), 38)).
line, instead of cycling the data through a central memory (see figures 21a and b).
[Figure 21: a) a single PE exchanging data with memory; b) data from memory passing through a
row of PEs.]
The analogy with the heart gave the name to this type of computing structure. A
structure as depicted in figure 21b is of course not suited for general computing; in
most instances we would not know what to do with the array of PEs. Systolic arrays
are however very well suited for signal processing and pattern matching. We will show
two examples of how a convolution integral may be evaluated using two differently
structured systolic arrays. Kung himself gives four more structures to compute the
convolution integral 40).
$Y(t) = \int_0^\infty W(t-\tau)\, x(\tau)\, d\tau$
[Figure: the X values are broadcast to all PEs, the W values stay, the Y values move
systolically. Inset: Y_out = Y_in + W.X_in]
Fig. 22 Example of a systolic array to calculate the
convolution integral. The specialized function of a PE
is shown in the inset. For each beat of the clock, new
input data is presented and intermediate results move on.
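The array of Fig. 22 can be simulated beat by beat in Python. The sketch works on the discrete form of the convolution, $y_t = \sum_m h_m x_{t-m}$, with a finite number of weights; this discretization, the function name and the zero padding at the end are our assumptions. Each PE holds one weight, every PE sees the broadcast input, and on each beat the partial sums move one PE further while each PE performs Y_out = Y_in + W.X_in.

```python
def systolic_convolve(h, x):
    """Simulate the broadcast-input systolic array; returns the full convolution of h and x."""
    k = len(h)
    w = h[::-1]                 # PE j holds weight h[k-1-j]; the weights stay in place
    y = [0] * k                 # partial sum at the output of each PE
    out = []
    for x_in in list(x) + [0] * (k - 1):      # trailing zeros flush the pipeline
        # One beat: x_in is broadcast to all PEs; partial sums shift by one PE.
        y = [(y[j - 1] if j > 0 else 0) + w[j] * x_in for j in range(k)]
        out.append(y[-1])       # a finished result leaves the last PE on every beat
    return out
```

A quick check is the impulse response: feeding x = [1, 0, 0] into an array with h = [1, 2, 3] returns the weights themselves, followed by zeros.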
[Figure: a second systolic array for the convolution. The W values stay; the X values and the
Y values move in opposite directions. Inset: Y_out = Y_in + W.X_in]
[Figure: the host sends the pattern (AXC) and the string (ABCAACCQQ...) to the pattern matcher
(PM), which returns the result (001001100...).]
Fig. 24 The principle of the pattern matching chip, which is a practical
implementation of the systolic array.
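The input/output behaviour suggested by Fig. 24 can be reproduced with a few lines of Python. We assume, from the example in the figure, that X in the pattern stands for "any character" and that a result bit is produced for every character of the string, set to 1 when the pattern matches the characters ending at that position. This is a functional model only; it does not imitate the systolic movement of pattern and string through the chip.

```python
def match_stream(pattern, text, wildcard="X"):
    """Return one result bit per text character: 1 where the pattern ends in a match."""
    k = len(pattern)
    bits = []
    for end in range(1, len(text) + 1):
        window = text[end - k:end] if end >= k else ""
        hit = len(window) == k and all(
            p == wildcard or p == c for p, c in zip(pattern, window))
        bits.append("1" if hit else "0")
    return "".join(bits)
```

For the pattern and string of the figure, match_stream("AXC", "ABCAACCQQ") marks the matches ending at the third, sixth and seventh characters (ABC, AAC and ACC).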
5.2 The Geometry Engine.
5.3 RISC
As we have seen in chapter 3, the present trend is towards CISC. The Z8000, MC
68000 and NS 16032 are examples. The Intel iAPX 432 and the Hewlett Packard HP 9000
(430 000 transistors!) have gone even further in this direction. By contrast, the
Texas TMS 99000 has remained rather simple.
They further argue that the design of the instruction set should be based on the
actual use of the instructions. So LOAD, STORE and BRANCH instructions should be
made faster. The designer can forget about the rest, the instructions which are seldom
used: they should be replaced by software. The compiler should get the task to
optimize the hardware/software mix for highest speed.
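[Figure: register windows. a) The registers seen by one procedure: HIGH, LOCAL and LOW
registers of the register window, plus the GLOBAL registers. b) Overlapping windows for
procedures A, B and C: LOW A coincides with HIGH B, and LOW B with HIGH C; each procedure has
its own LOCAL registers.]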
6. Conclusion
In these lecture notes the author has paced rapidly over a wide field, picking a
flower here and there, without bending down to inspect more closely the large variety
of herbs growing in the field. He hopes nevertheless that he has succeeded in giving a
glimpse of a few topics of interest for experimental physicists. The bit slices are the
devices to be used when speed is at a premium or when an existing machine must be
emulated. The fixed instruction set microprocessors are catching up very rapidly, not
only in speed and arithmetic capabilities, but especially in their high-level language
and other software support facilities. Luckily, the times are gone when an engineer
painfully aligned zeroes and ones to be burnt into a PROM, hoping that it would make
his microprocessor chip do something useful. The new developments in VLSI may
produce some further happy surprises in the years to come. But even without new
devices, clever use of the old ones can also be of great help in physics experiments.
7. Acknowledgements.
I would like to thank Mrs C. Gentet for preparing the manuscript and making it fit
to print and Mrs O. Marais for her usual fast and accurate production of figures.
8. References.
30. N.S. Szabo and R.J. Tanaka, Residue Arithmetic and Its Applications to Computer
Technology (McGraw-Hill, New York, 1967).
31. A. Huang, Number Theoretic Processors, Thesis, Dept. of Electrical Eng., Stanford
University, 1980.
32. W.M. Loucks, M. Snelgrove and S.G. Zaky, A Vector Processor based on One-bit
Microprocessors, IEEE Micro, Vol. 2, no. 1 (Feb. 1982), p. 53.
33. H. Bernstein et al., A Microprocessor-based Single Board Computer for High Energy
Physics Event Pattern Recognition, in Proc. Topical Conf. Application of
Microprocessors to High-Energy Physics Experiments, Geneva, 1981; CERN
81-07, p. 479.
34. P.C. Treleaven, VLSI Processor Architectures, Computer, Vol. 15, no. 6 (June
1982), p. 33.
35. J. Clark, A VLSI Geometry Processor for Graphics, Computer, Vol. 13, no. 7 (July
1980), p. 59.
36. G.J. Sussman, J. Holloway, G.L. Steele Jr. and A. Bell, Scheme-79, LISP on a
Chip, Computer, Vol. 14, no. 7 (July 1981), p. 10.
37. Introduction to the iAPX 432 Architecture, Intel Corporation, Santa Clara, Cal.,
1981.
38. S. Zeigler, N. Allègre, R. Johnson, J. Morris and G. Burns, Ada for the Intel 432
Microcomputer, Computer, Vol. 14, no. 6 (June 1981), p. 47.
39. H.T. Kung, Let's Design Algorithms for VLSI Systems, Proc. Conf. Very Large
Scale Integration: Architecture, Design, Fabrication, Cal. Inst. of Technology, Los
Angeles, Jan. 1979, p. 65-90.
40. H.T. Kung, Why Systolic Architectures?, Computer, Vol. 15, no. 1 (Jan.
1982), p. 37.
41. M.J. Foster and H.T. Kung, The Design of Special Purpose VLSI Chips, Computer,
Vol. 13, no. 1 (Jan. 1980), p. 26.
42. D.A. Patterson and C.H. Sequin, A VLSI RISC, Computer, Vol. 15, no. 9 (Sept.
1982), p. 8.
43. J.R. Larus, A Comparison of Microcode, Assembly Code and High-Level Languages
on the VAX-11 and RISC-I, Computer Architecture News (ACM Special Interest
Group), Vol. 10, no. 5 (Sept. 1982), p. 10.