Innovation in Computational Architecture and Design
Michael D. Godfrey
ICL Head of Research, Bracknell, Berks.
March 1986
Abstract
The purpose of this note is to discuss some of the motivation for innovation in com-
putational architecture, and, in this context, to review a number of computational
architectures and describe an active memory architecture which is the basis of the ICL
Distributed Array Processor (DAP)* products. Any discussion of new computational
architecture must take account of the pervasive impact of VLSI systems technology.
It will be argued that a key attribute of VLSI as an implementation medium is its
mutability. Using VLSI it is natural to implement directly the computation of spe-
cific problems. This fact will induce a fundamental change in the structure of the
information industry. While many new computational architectures do not fit well
with this VLSI systems driven view of the future, the DAP structure, viewed as an
active memory technology, may prove to be effective for the composition of important
classes of application systems.

1. Introduction
At present, there is a very high level of activity directed toward “new” architecture
definition. Much of this work has a negative motivation in the sense that it is based
on the observation that since it has become increasingly hard to make “conventional”
architecture machines operate faster, one should build a “non-conventional” (or
non-von Neumann) machine.
3. Alternative Architectures
Not only has there been considerable recent discussion about new architectures, but
there has been increasing discussion about the terminology and taxonomy of these
architectures. All this is still pretty confused. I will try to keep things simple, and
avoid as much of the confusion as possible. Approximately, the architectures will
be described in order of increasing specialization, but this is only very rough as the
notion of specialization is itself not simple.
A standard taxonomy uses the following notation:
SISD Single Instruction, Single Data stream: a single conventional (von Neumann)
processor,
SIMD Single Instruction, Multiple Data: a set of processing elements each of which
operates on its own data, but such that a single stream of instructions is
broadcast to all processors,
MIMD Multiple Instruction, Multiple Data: typically, a collection of conventional
processors with some means of communication and some supervisory control.
The remaining possibility in this taxonomy, MISD, has not been much explored
even though, with current technology, it has much to commend it.
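To make the SISD/SIMD distinction concrete, the sketch below writes the same
element-wise sum first as a sequential loop and then as the single array statement
which a SIMD machine broadcasts to all of its processing elements. The sketch is
illustrative only, written in modern Fortran (the standardized form that the
Fortran 8X draft discussed below eventually took); the program name and array
sizes are invented for the example.

   program simd_sketch
   implicit none
   integer, parameter :: n = 1024
   real :: a(n), b(n), c(n)
   integer :: i
   b = 1.0
   c = 2.0
   ! SISD view: a single instruction stream steps through the data.
   do i = 1, n
      a(i) = b(i) + c(i)
   end do
   ! SIMD view: one instruction, n data elements operated on in parallel.
   a = b + c
   end program simd_sketch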
In addition, the “granularity” of the active elements in the system is often used
for classification. A system based on simple processors which operate on small data
fields is termed “fine-grained,” while a system of larger processors, such as 32-bit
microprocessors, is termed “coarse-grained.” This classification tends to conceal
other key distinctions, most prominently the nature and efficiency of communications
between the active elements, and thus it may not improve useful understanding.
* This collaborative project, the largest under the Alvey program, is led by ICL
with Plessey, Manchester University, and Imperial College (London) as partners.
The active memory structure of the Distributed Array Processor (DAP [2, Chapter
12]), first developed by ICL, will be described more fully because it may
be viewed as forming a basis for a distinctive capability which has not been developed
in other systems. The DAP structure by its nature leads to the formulation of prob-
lems in terms of the required rearrangement of the data. Thus, its efficiency tends
to be dependent on the spatial distribution of data-dependent elementary decisions.
(A formal notation for efficient organization of the dynamics of data arrangement is
presented in [1].)
Broadly, the DAP is an active memory mechanism in which an array of processing
elements controls the manipulation of data in the memory structure. The array of
processing elements, each of which addresses a local memory, is operated by a single
instruction stream, and the elements communicate with their nearest neighbors in
some topology.
Progressively narrower definitions also include the restriction that the pro-
cessors have some particular amount of local storage, that the processor width is one
bit, and that particular structures are provided for data movement through the local
store structure. For applications that require communication beyond local neighbors,
it may be essential to have row and column data paths which allow the movement of
a bit from any position to any other position in a (short) time which is independent
of the distance moved.
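To make these two movement regimes concrete, the sketch below uses the standard
Fortran 90 intrinsics CSHIFT and SPREAD as rough analogues of the DAP’s nearest-
neighbor shifts and row highways. This is an illustrative approximation, not
DAP-Fortran; the names and array sizes are invented for the example.

   program data_movement
   implicit none
   integer, parameter :: n = 32
   real :: grid(n,n), north(n,n), rowcast(n,n)
   call random_number(grid)
   ! Nearest-neighbor movement: each element receives the value of its
   ! neighbor one step away along the first dimension (cyclically).
   north = cshift(grid, shift=1, dim=1)
   ! Row-highway style movement: one row replicated to every row position,
   ! a rearrangement whose cost on a DAP is independent of distance.
   rowcast = spread(grid(1,:), dim=1, ncopies=n)
   end program data_movement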
While the above definitions are useful for some purposes, an external definition
is more appropriate for understanding some applications and market opportunities.
A useful external definition is: A DAP is a subsystem which is directly effective for
execution of DAP-Fortran, or of a subset of the Fortran 8X array extensions. The
term “directly effective” is intended to mean that there is a close match between the
language construct and the corresponding architectural feature, and that the resulting
speed of operation is relatively high. The relevant Fortran extensions permit logical
and arithmetic operations on arrays of objects. The DAP performs operations in
parallel on individual fields defined over the array of objects.
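A minimal sketch of these array operations, in the Fortran 90 form that the 8X
extensions eventually took (the array names and sizes are illustrative): a logical
operation over the whole array defines a mask, and a masked assignment then
operates, conceptually in parallel, on the selected fields.

   program masked_ops
   implicit none
   integer, parameter :: n = 64
   real :: x(n,n)
   logical :: active(n,n)
   call random_number(x)
   active = x > 0.5      ! a parallel logical operation over the array
   where (active)
      x = 2.0*x          ! arithmetic applied only to the selected fields
   elsewhere
      x = 0.0
   end where
   end program masked_ops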
4.1 Performance
So far no mention has been made of absolute performance. This is appropriate as it
is assumed that a contemporary DAP will be constructed from contemporary tech-
nology. Therefore, the important question is: what is a DAP relatively good at? The
definitions above are meant to make it clear that a DAP is relatively good at com-
putations which involve a relatively high density of operations, including selection
and conditional operations, on replicated structures and which require parallel rear-
rangements of data structures. The replication may be in terms of the dimensions of
arrays, record structures, tables, or other patterns. For example, routing algorithms
in 2 dimensions satisfy this requirement very nicely.
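As an invented illustration of such a computation (not an example from this paper),
the sketch below performs a Jacobi-style relaxation on a 2-dimensional grid in
Fortran 90: every step applies identical dense arithmetic at all grid points and
rearranges the whole array by nearest-neighbor shifts.

   program relaxation
   implicit none
   integer, parameter :: n = 32, steps = 100
   real :: u(n,n)
   integer :: k
   call random_number(u)
   do k = 1, steps
      ! Four nearest-neighbor shifts of the whole array, followed by
      ! the same arithmetic at every grid point.
      u = 0.25*(cshift(u,1,1) + cshift(u,-1,1) + cshift(u,1,2) + cshift(u,-1,2))
   end do
   end program relaxation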
4.2 Cost
Cost is an important consideration in the definition of a DAP because, if a DAP is defined
as being relatively good at some computation, this must be taken to mean that
it is relatively more cost-efficient. DAP costs are differentially affected by VLSI
technology. The basic DAP structure scales exactly with the circuit density. This
simple correspondence between DAP structure and VLSI structure is a useful feature
which must be taken into account when projecting possible future cost-effective DAP
structures. The main discontinuity occurs at the point where a useful integrated
memory and processor array can be produced. Roughly, the technology to produce
1-megabit RAMs will permit such an integrated implementation.
To indicate present cost characteristics, 2 micron CMOS (2 layer metal) can
support, approximately, an 8x8 DAP processor array. The chip fabrication cost is
of the order of $10. Thus, a 16-chip set to provide a 32x32 array would imply
a chip cost of $160. This structure would require external memory to compose a
subsystem. Using emerging VLSI technology it will be possible to construct memory
and processors on a single chip, thus improving performance and reducing the cost
to approximately the cost of the memory.
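To make the arithmetic explicit: a 32x32 array contains 1024 processing elements,
each 8x8 chip provides 64, and 1024/64 = 16 chips, which at roughly $10 per chip
gives the $160 figure quoted above.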
VLSI is developing into a highly mutable design medium. This will diminish the
need for general-purpose systems. Instead, it will become common practice to create
the application implementation directly in VLSI. (For examples of this approach see
[3] Part 2.) One way of viewing this change is that it raises system architecture to
a set of logical constructs which guide the implementation of application designs.
Much work remains to be done before a good working set of abstract architectural
principles is available. In addition, standardized practices and interface definitions
are required at both abstract and practical implementation levels in order that efficient
composition can be realized in this form. However, the economic benefits of this mode
of working will tend to ensure that the enabling concepts, standards, and supporting
infrastructure will emerge relatively quickly.
6. Conclusion
In the long run, the efficiency of direct implementation of specific computations in
silicon will dominate other techniques. However, before this level of efficiency can
be achieved in a routine manner a number of research problems must be solved and
substantial new infrastructure must be established. In addition to the need for im-
provement in VLSI design methods, a new level of understanding and definition of
software will be required.
7. Acknowledgments
Several people have made very helpful comments on previous drafts of this paper.
Steve McQueen, Peter Flanders, and David Hunt made particularly detailed and
useful comments. As always, remaining faults rest with me.
References
[1] Flanders, P.M., “A Unified Approach to a Class of Data Movements on an Array
Processor,” IEEE Transactions on Computers, Vol. C-31, No. 9, Sept. 1982.
[2] Iliffe, J.K., Advanced Computer Design, Prentice-Hall, 1982.
[3] Denyer, P. and Renshaw, D., VLSI Signal Processing: A Bit-Serial Approach,
Addison-Wesley, 1985.
[4] Mead, C. and Conway, L., Introduction to VLSI Systems, Addison-Wesley, 1980.
[5] Ruttenberg, J.C. and Fisher, J.A., “Lifting the Restriction of Aggregate Data
Motion in Parallel Processing,” IEEE International Workshop on Computer Sys-
tem Organization, New Orleans, LA, USA, 29-31 March 1983 (New York: IEEE,
1983), pp. 211-215.
[6] Hillis, W.D., The Connection Machine, MIT Press, 1985.
[7] Parnas, D.L., “Software Aspects of Strategic Defense Systems,” American Scien-
tist, Vol. 73, No. 5, pp. 432-440, 1985; reprinted in CACM, Vol. 28, No. 12,
pp. 1326-1335, Dec. 1985.