INTRODUCTION
This paper describes the architecture of the CPU and Memory
for the Central Air Data Computer (CADC) System used in the Grumman/
Navy F14A carrier-based fighter aircraft. The CADC performs
specialized computational functions in response to input stimuli
such as pressure sensors, temperature sensors and closed loop
feedback inputs. Outputs from the CADC system are used to drive
pilot visual displays (such as altimeter, temperature indicator,
mach number indicator, etc.) and to provide control inputs for
other aircraft systems. The outputs from the CADC are in the form
of digital and analog signals. Figure 1 illustrates a block
diagram for the CADC.
Operating in a flight environment meant that certain constraints
would greatly affect the architecture of the CPU and Memory.
These constraints were size, power, real-time computing capability
and cost, not necessarily in that order. Other constraints such
as temperature, acceleration and mechanical shock affected the
overall design of the CADC.
The size of the CPU-Memory was limited to a maximum of 40
square inches. This included the arithmetic section, read-only
memory, and read/write memory. Since the unit was to be packaged
on a printed circuit card the number of layers of the p.c. card
was an important consideration. The power consumption had a limit
of 10 watts at ambient 25°C. This was principally a function of
the capabilities of the p.c. card to withstand the heat.
The required computing capacity for the CPU was not defined
at the beginning. This meant that the system had to be somewhat
flexible to changes in computational load. Of course limits had
to be set to be able to work within the other constraints. What
was known about the computation was the form of the equations to
be implemented. This included polynomial evaluations, data
limiting, data comparison and discrete or flag inputs and outputs.
This meant that the arithmetic and logical functions of the
system had to handle at least the following operations:
Multiply
Divide
Add
Subtract
Limits
Square Root
And
Or
Conditional Transfer
Unconditional Transfer
Receive Discrete Data
Receive Digital Data
Output Discrete Data
Output Digital Data
The last constraint of cost was certainly important since
the system would eventually go into high volume production.
FUNCTIONS TO BE IMPLEMENTED
Before we proceed, a better understanding of the functions
to be implemented is necessary. The function that most often
occurred was the polynomial,
P(x) = a_6 x^6 + a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0
where x was the input, either from outside the CPU (sensors) or
from its own memory.
In order to save arithmetic time the polynomial was implemented
in its "nested" form as follows:
P(x) = ((((((a_6 x) + a_5) x + a_4) x + a_3) x + a_2) x + a_1) x + a_0
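The saving from the nested form can be illustrated with a short sketch. The function names and the coefficient values below are hypothetical and chosen only for illustration; the point is that the nested evaluation needs one multiply and one add per coefficient, with no explicit powers of x.

```python
from typing import Sequence


def nested_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial in nested form.

    coeffs are ordered from the highest-degree coefficient down to the
    constant term, e.g. [a_6, a_5, a_4, a_3, a_2, a_1, a_0].
    """
    result = 0.0
    for a in coeffs:
        # One multiply and one add per coefficient.
        result = result * x + a
    return result


def direct_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the same polynomial term by term, for comparison only."""
    degree = len(coeffs) - 1
    return sum(a * x ** (degree - i) for i, a in enumerate(coeffs))


# Hypothetical coefficients a_6 ... a_0 (not taken from the CADC equations).
example_coeffs = [0.5, -0.25, 0.125, 0.0, 0.75, -0.5, 0.25]
example_x = 0.5
nested_value = nested_poly(example_coeffs, example_x)
direct_value = direct_poly(example_coeffs, example_x)
```

For a sixth-degree polynomial the nested form uses six multiplies and six adds, whereas forming each power of x separately would need considerably more multiplies.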
The data limit function was one that would accept 3 binary
inputs, an upper limit (U), a lower limit (L) and a parameter (P).
The output would then be as follows:
P if U ≥ P ≥ L
L if P < L
U if P > U
Even though such a function could be programmed in software, it
was decided to build it in hardware since it was used often
enough.
[Figure 1. Block diagram of the CADC.]
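The data limit function described above can be sketched in a few lines. This is only a software illustration of the selection rule; the function name and the check for an inverted limit pair are assumptions, and the hardware realization in the CADC is not shown here.

```python
def data_limit(p: int, upper: int, lower: int) -> int:
    """Return P when it lies between the limits, otherwise the violated limit."""
    if lower > upper:
        # Assumption: the limits are expected to be ordered.
        raise ValueError("lower limit exceeds upper limit")
    if p > upper:
        return upper
    if p < lower:
        return lower
    return p
```

For example, `data_limit(7, 5, -5)` selects the upper limit, while `data_limit(3, 5, -5)` passes the parameter through unchanged.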
Data conditioning and scaling also had to be accomplished.
This involved the following simple expressions:
ates
ole
Again the occurrence of these was frequent enough to warrant
hardware consideration. This will become apparent later when the
hardware is discussed.
Since size and power consumption were of the ultimate importance,
MOS technology was chosen as the means of circuit implementation.
This allowed greater packaging densities to be obtained than
otherwise would be possible. The slowness of MOS devices and the high
thresholds used allowed a design that was virtually immune to
electrical noise on the ground or transmitted from packages within
close proximity. The higher supply voltages required resulted in
a more efficient power supply design.
NUMBER SYSTEM
The CPU is a fractional fixed point machine with the most
significant bit a sign bit and the other bits representing data.
Negative numbers are represented in two's complement notation.
Two's complementation was chosen to avoid the ambiguity of double
zeros.
The word length chosen for the system was 20 bits: 19 bits
of data and 1 bit for sign. This length was chosen after a thorough
analysis of the accuracy required for certain throughput calculations
such as the rate of change of altitude function.
Early in the architecture study it was realized that
package size and quantity should be kept to a minimum if we were
to meet the size constraints established. With minimum packaging
space requirements it was necessary to use packages with the
fewest possible leads. This would minimize the complex p.c. card
interconnect which was inevitable. Because of this the processor
was designed to transfer data serially throughout the entire system.
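The 20-bit fractional two's complement format described above can be illustrated with a small encode/decode sketch. The function names, the rounding on encode, and the clamping of values that round up to +1.0 are assumptions made for this example; the text specifies only the word length, the sign bit, and the two's complement convention.

```python
WORD_BITS = 20                    # 1 sign bit + 19 data bits
FRACTION_BITS = WORD_BITS - 1
SCALE = 1 << FRACTION_BITS        # weight of 1.0 in the fractional format
WORD_MASK = (1 << WORD_BITS) - 1


def encode_fraction(value: float) -> int:
    """Convert a real value in [-1, 1) to a 20-bit two's complement word."""
    if not -1.0 <= value < 1.0:
        raise ValueError("value outside the fractional range [-1, 1)")
    scaled = int(round(value * SCALE))
    scaled = min(scaled, SCALE - 1)   # assumption: clamp if rounding reaches +1.0
    return scaled & WORD_MASK


def decode_fraction(word: int) -> float:
    """Convert a 20-bit two's complement word back to a real value."""
    word &= WORD_MASK
    if word & SCALE:                  # sign bit set: value is negative
        word -= 1 << WORD_BITS
    return word / SCALE
```

In this format zero has the single representation 0x00000, the largest positive word is 0x7FFFF (just under +1.0), and 0x80000 decodes to -1.0.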
PROCESSOR PARALLELISM
As is known by all computer designers, serial machines are
usually not the best way to go if computational speed is needed.
To get around this it was decided to have several arithmetic or
processing units working at the same time. This resulted in a
technique known as "pipeline processing" or "pipeline concurrency".
As defined by Bell and Newell¹, "pipeline concurrency is the name
given to a system of multiple functional units, each of which is
responsible for partial interpretation and execution of the
instruction stream."
This system uses multiple functional units each dedicated
to a specific task. These functional units are:
1. Parallel Multiplier Unit (PMU)
2. Parallel Divider Unit (PDU)
3. Special Logic Function (SLF)
4. Data Steering Unit (SL)
5. Random Access Storage (Read/Write) (RAS)
6. Read-Only Memory Unit (ROM)
Figure 2 shows a block diagram of the functional units as
they would work together in a typical system. Each unit was designed
to operate as a separate entity and could be used without the need
of any of the other units. This was done to provide maximum
expandability with minimum additional hardware. Each functional
unit is controlled by its own microinstruction ROM. The
microinstructions are also transferred serially to minimize package
pin count. Temporary data storage is provided in the form of
read/write memory.
1. Gordon C. Bell and Allen Newell, Computer Structures: Readings
and Examples, McGraw-Hill Book Co., N.Y.
[Figure 2. Block diagram of the functional units, showing data paths and control paths.]
Before looking at the functions of each of these units a
brief look at the timing is needed. Figure 3 illustrates this
timing. The CPU-Memory clock is 375 kHz. One complete clock period,
defined as a bit time, is 2.66 µsec. Every 20 consecutive bit times
are defined as a word. The first bit time of a word is called T0,
and the last bit time of the word is called T19. Two types of words
are used in the system, WA and WO. In WA, the arithmetical
algorithms operate and instruction words are shifted serially
into each functional unit. In WO, computational inputs and
outputs are shifted serially among the units. A word mark used to
distinguish word times is a signal coincident with T18 of every
word time. Two consecutive word times, WA and WO, are called an
operation (op) time. To distinguish the final operation time
a frame mark is generated in the system executive control. The
time between frame marks is called a frame. A frame includes
one complete cycle of computations. The frame mark is
microprogrammed to allow the user to restart the computational cycle
when all previous computations are complete. Since this system
must operate in real time it was therefore necessary to obtain
the most computation from each functional unit during each frame
time.
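The timing figures above follow directly from the clock rate, as the following sketch shows. The constant and function names are hypothetical, and the frame length used in the usage note is an arbitrary illustrative value, since the frame duration is not stated in this section.

```python
CLOCK_HZ = 375_000        # CPU-Memory clock
BITS_PER_WORD = 20        # bit times T0 through T19
WORDS_PER_OP = 2          # one op time is two consecutive word times

bit_time_us = 1e6 / CLOCK_HZ                 # about 2.67 microseconds
word_time_us = bit_time_us * BITS_PER_WORD   # about 53.3 microseconds
op_time_us = word_time_us * WORDS_PER_OP     # about 106.7 microseconds


def op_times_per_frame(frame_time_ms: float) -> int:
    """Number of complete op times that fit in a frame of the given length."""
    return int((frame_time_ms * 1000.0) // op_time_us)
```

As an illustration only, a hypothetical 50 ms frame would hold `op_times_per_frame(50.0)` complete op times for each functional unit, which shows why each unit has to be kept busy in every op time.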
ARITHMETIC UNITS
The Parallel Multiplier Unit accepts two serial inputs,
multiplicand and multiplier, in one word time (WO) and produces
their properly rounded product by means of a parallel algorithm
in one more word time.
The product is shifted out in the next WO, while inputs for
the next operation are simultaneously shifted in. The multiplication
operation is achieved using Booth's algorithm². The PMU does not
need an instruction word to operate, but is capable of operating
continuously in this manner.
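Booth's algorithm can be sketched in software as follows. This is a textbook radix-2 formulation, not a description of the PMU circuitry; the function names are hypothetical, and the round-half-up step is an assumption, since the text says only that the product is "properly rounded".

```python
WORD_BITS = 20
FRACTION_BITS = WORD_BITS - 1


def booth_multiply(multiplicand: int, multiplier: int, bits: int = WORD_BITS) -> int:
    """Radix-2 Booth multiplication.

    The multiplier must fit in `bits` bits of two's complement.
    Returns the full-precision signed product.
    """
    m = multiplier & ((1 << bits) - 1)   # two's complement bit pattern
    product = 0
    previous_bit = 0                     # implied bit to the right of bit 0
    for i in range(bits):
        current_bit = (m >> i) & 1
        if current_bit == 0 and previous_bit == 1:
            product += multiplicand << i     # end of a run of ones: add
        elif current_bit == 1 and previous_bit == 0:
            product -= multiplicand << i     # start of a run of ones: subtract
        previous_bit = current_bit
    return product


def multiply_fractions(a: int, b: int) -> int:
    """Multiply two fractions scaled by 2**19 and round to the same scale.

    Assumption: round half up. The overflow case (-1.0 times -1.0) is not handled.
    """
    full = booth_multiply(a, b)
    return (full + (1 << (FRACTION_BITS - 1))) >> FRACTION_BITS
```

Because Booth recoding handles the sign of the multiplier directly, no separate sign correction is needed for two's complement operands, which fits the number system used by the CPU.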
2. Yaohan Chu, Digital Computer Design Fundamentals, McGraw-Hill
Book Co., N.Y., 1962, pg. 32
[Figure 3. System timing.]
The Parallel Divider Unit accepts two serial inputs, dividend
and divisor, in one word time (WO) and produces the proper quotient
by means of a parallel algorithm in one more word time (WA). The
quotient is shifted out in the next WO, while inputs for the next
operation are simultaneously shifted in. The division operation
is achieved by using a non-restoring division algorithm.³ An
actual photograph of the PDU chip is shown in Figure 4.
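Non-restoring division can be sketched as follows. This is a generic textbook version for non-negative operands, offered as an illustration rather than the PDU's actual logic; the function names are hypothetical, and sign handling and quotient rounding are left out because this section does not describe them.

```python
FRACTION_BITS = 19


def nonrestoring_divide(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Divide a non-negative `bits`-bit dividend by a positive divisor.

    Returns (quotient, remainder). A negative partial remainder is not
    restored; the divisor is added back on the next step instead.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles non-negative dividend, positive divisor")
    remainder = 0
    quotient = 0
    for i in reversed(range(bits)):
        next_bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + next_bit - divisor
        else:
            remainder = 2 * remainder + next_bit + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor             # single correction at the end
    return quotient, remainder


def divide_fractions(numerator: int, denominator: int) -> int:
    """Quotient of two fractions scaled by 2**19, truncated to the same scale.

    Assumption: 0 <= numerator < denominator, so the result is below 1.0.
    """
    quotient, _ = nonrestoring_divide(numerator << FRACTION_BITS, denominator,
                                      bits=2 * FRACTION_BITS)
    return quotient
```

Each quotient bit costs exactly one add or subtract, with no restore step, which is the property that makes this scheme attractive for a fixed number of bit times.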
The Special Logic Function performs logical operations and
generates specific data and logic outputs. The unit accepts an
instruction word which specifies details of the operation.
The fundamental logical operation of this unit is the limit
function. It consists of three registers (U, P, and L) whose inputs
arrive in WO. One of these registers is picked at the end of WA
by associated comparison logic. Other logic functions such as
ANDs, ORs, GRAY CODE (special), Conditional, and Unconditional
data branching are also included in this logic.
The Data Steering Unit operates as a three channel serial
digital data multiplexer. Information is shifted serially through
the device during WO. A 15 bit instruction word is accepted
during WA that specifies which input or input combinations (add
or subtract) are to be "steered" to each of three data outputs.
The instruction word for this unit is the last 15 bits of the
20-bit instruction word. From the least significant end, the
first four bits specify the selection for Output 1. The next
four bits specify the selection for Output 2 and the last
seven bits specify the selection for Output 3. Addition or
subtraction is performed by specifying that output combination
to be "steered" to the output. By performing additions and
subtractions in this manner the programmer can obtain the sum
and difference during the same word time that the data is being
transferred. This transfer may be either to or from the memory
or arithmetic units. Specific instruction codes are interpreted
as follows:
3. Yaohan Chu, Digital Computer Design Fundamentals, McGraw-Hill
Book Co., N.Y., 1962, pg. 39
6 7 8 9
0 0 0 0
° o 0 1
0 0 1 o
9 a 1 1
0 L o 0
0 1 0 1
0 1 L o
0 1 L 1
1 0 o 0
1 ° 0 1
L ° L a
1 0 L 1
1 1 ° 0
1 1 ° 1
L 1 L 9
1 1 L 1
wo
0 0 0 0
° 0 o L
9 0 L 0
0 0 L 1
0 1 ° 0
0 1 ° L
° 1 1 oe
0 L 1 1
1 0 0 0
1 0 o L
1 0 1 °
1 ° 1 1
1 L 0 0
1 1 0 1
1 1 1 o
1 1 1 L
Selected to Output 1
ExT
ExT
EXT
ext
ExT
ext
EXT
EXT
ext
ExT
EXT
EXT
EXT.
EXT.
ExT
EXT
Selected to Output 2
ExT
EXT
ExT
ExT
xT
xr
ext
EXT
ExT
EXT
0.1111111111111111111 (PTS) (Maximum
Positive Number)
ext
EXT
EXT
EXD
EXT
2
a
4
5
6
7
8
9
10
13
ree
9+ EXT 4
10 + Ext 4
44 ENT 8
2 = EXT B
1
ev sHeEN
9
19
AL
9 + EXT &
10 + EXT 4
4 + EXT B
2 > Ext 8crs
HEE HHH HH ec eeoeoee
20
io
1516
a 0
0 0
o 4
0 L
1 0
1 °
1 L
1 L
° °
0 o
0 1
0 L
1 °
L 0
1 L
zu 1
EXT 12 +
o
1 Sy
o sy
1 ogy
ERT
ExT
ExT
ExT
Ext
senor eHarnororore
a
EXT 8-5,
+
ext 2-2,
EXT 2-29
EXT 4 = £5
EXT 4 = E>
1
ww aren
Ww
Ra
Selected to Output 3
(controlled by bit 18)