
0137936

EE307
The von Neumann Architecture,
Intel x86, ARM and the Playstation 2.
Ian Norton-Badrul <inorton@iee.org>

John von Neumann

In their 1946 paper “Preliminary Discussion of the Logical Design of an Electronic Computing Instrument”[BGN46], Arthur Burks, Herman Goldstine and the Hungarian mathematician John von Neumann described the concept of “the stored program computer”: the idea that both the instructions for a calculation and the data required by it are stored in the same memory structure, giving a clear separation of storage and processing units. Almost all programmable computers are fully or mostly von Neumann machines. The first von Neumann machine to run was the Manchester I in 1948. Its design was greatly influenced by von Neumann's 1945 report entitled “First Draft of a Report on the EDVAC”[JVN45].

Fig 1 - John von Neumann with EDVAC

The von Neumann architecture treats instructions as data, and vice versa. This concept allows a computing device to be re-programmable, and it makes the program compiler possible. Because programs and their data exist in the same memory, we have been given not only the compiler and self-modifying code but also viruses and exploits based on events such as buffer overflows. A von Neumann machine has distinct processing and storage units, and the two are connected by a single bus. Data or instructions to be processed, and the results of processing, must all pass over this bus. Only one item can be passed over it, in either direction, at any one time.
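To make the shared-memory idea concrete, here is a minimal sketch of a hypothetical accumulator machine. The opcodes LOAD, ADD, STORE and HALT are invented for illustration and are not taken from EDVAC or any real design. Instructions and data sit in one Python list, and every instruction fetch and every data access is counted as one transfer over the single bus.

```python
# Toy stored-program machine. The opcodes LOAD, ADD, STORE and HALT are
# hypothetical and chosen only to illustrate the von Neumann model.

def run(memory, pc=0, max_steps=1000):
    """Run the program held in `memory`, which also holds its data.

    Every instruction fetch and every data read or write is counted as one
    transfer over the single shared bus. Returns (accumulator, transfers).
    """
    acc = 0
    transfers = 0
    for _ in range(max_steps):
        op, addr = memory[pc]      # fetch the instruction over the bus
        transfers += 1
        pc += 1
        if op == "LOAD":
            acc = memory[addr]     # a data read uses the same bus
            transfers += 1
        elif op == "ADD":
            acc += memory[addr]
            transfers += 1
        elif op == "STORE":
            memory[addr] = acc     # a write could just as easily overwrite code
            transfers += 1
        elif op == "HALT":
            return acc, transfers
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached without HALT")


# Cells 0-3 hold instructions and cells 4-6 hold data: one memory for both.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
result, bus_transfers = run(program)
print(result, bus_transfers, program[6])  # 5 7 5
```

Adding two numbers takes seven bus transfers here. Four of them are instruction fetches. Because `STORE` can write to any cell, nothing in the machine itself stops a program from rewriting its own instructions.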

One limitation of the von Neumann architecture is the limited bandwidth between the storage and processing units. Instruction throughput can be severely impaired when large amounts of data are required, because the processing unit sits idle while it waits for data, or for the next instruction, to be fetched from memory. In his 1977 Turing Award lecture, John Backus (who also led the development of FORTRAN) coined the term “von Neumann bottleneck” to describe this limitation of the von Neumann architecture.
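The size of the bottleneck can be estimated with simple arithmetic. The sketch below assumes 32-bit words and a hypothetical loop body of three instructions (load, add, store) for each element of c[i] = a[i] + b[i]. Both the bus speed and the processor speed are illustrative figures and do not describe any real machine.

```python
# All figures are illustrative assumptions, not measurements of a real machine.
WORD_BYTES = 4                # assumed 32-bit words
INSTRUCTIONS_PER_ELEMENT = 3  # assumed: LOAD a[i], ADD b[i], STORE c[i]
DATA_WORDS_PER_ELEMENT = 3    # read a[i], read b[i], write c[i]


def bytes_per_element():
    """Bus traffic for one c[i] = a[i] + b[i]; code and data share the bus."""
    return (INSTRUCTIONS_PER_ELEMENT + DATA_WORDS_PER_ELEMENT) * WORD_BYTES


def bus_limited_rate(bus_bytes_per_second):
    """Most elements per second the bus can deliver, however fast the ALU is."""
    return bus_bytes_per_second / bytes_per_element()


def processor_utilisation(alu_elements_per_second, bus_bytes_per_second):
    """Fraction of time the processing unit does useful work instead of waiting."""
    return min(1.0, bus_limited_rate(bus_bytes_per_second) / alu_elements_per_second)


# Hypothetical: a processor able to do 10 million adds/s on a 96 MB/s bus.
print(bus_limited_rate(96e6))              # 4000000.0 elements per second
print(processor_utilisation(10e6, 96e6))   # 0.4, i.e. idle 60% of the time
```

Under these assumptions a faster ALU does not help at all. Only moving fewer bytes per element, or moving them faster, raises throughput.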

The development of modern computer architecture has brought numerous improvements aimed at the problem posed by the von Neumann bottleneck. The advent of index and general purpose registers, hardware interrupts and multiple processors has reduced the bottleneck, but has not come close to eliminating it.
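Registers help because a value kept inside the processor never has to cross the bus. The sketch below wraps a Python list so that every read and write is counted, and then sums an array in two ways. In the first, the running total lives in memory. In the second, it lives in a register, modelled here as a local variable. The `CountingMemory` class and both functions are hypothetical, and instruction fetches are ignored so that only data traffic is compared.

```python
class CountingMemory:
    """A list wrapper that counts every read and write as one bus transfer."""

    def __init__(self, values):
        self._cells = list(values)
        self.transfers = 0

    def read(self, addr):
        self.transfers += 1
        return self._cells[addr]

    def write(self, addr, value):
        self.transfers += 1
        self._cells[addr] = value


def sum_total_in_memory(mem, n, total_addr):
    """Running total kept in memory: three transfers per element."""
    mem.write(total_addr, 0)
    for i in range(n):
        mem.write(total_addr, mem.read(total_addr) + mem.read(i))
    return mem.read(total_addr)


def sum_total_in_register(mem, n, total_addr):
    """Running total kept in a register: one transfer per element."""
    total = 0                      # stands in for a processor register
    for i in range(n):
        total += mem.read(i)       # only x[i] crosses the bus
    mem.write(total_addr, total)   # a single write of the final result
    return total


slow = CountingMemory([1, 2, 3, 4, 0])   # four values, then a cell for the total
fast = CountingMemory([1, 2, 3, 4, 0])
print(sum_total_in_memory(slow, 4, 4), slow.transfers)    # 10 14
print(sum_total_in_register(fast, 4, 4), fast.transfers)  # 10 5
```

For n elements this is 3n + 2 transfers against n + 1. The register version still reads every element, which is why registers reduce the bottleneck but do not remove it.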

The Intel x86 family

In the early 1970s the first microprocessors were shipped by Intel; 1971 saw the 4004 chip, containing 2250 transistors. The modern x86 processor architecture descends from this early line of Intel chips. The first x86 family chip appeared in 1978 as the 8086, and in the form of the slightly simpler 8088 the family became the processor of choice for the first IBM PCs. The 8088 and 8086 are CISC devices running at roughly 5-10 MHz. They have 16-bit registers and can address 1 MB of memory through a 20-bit address space, accessed in a segmented fashion (this is referred to as 'Real Mode'). The x86 family introduced the concept of different addressing modes (near and far), which allow more memory to be addressed than a single 16-bit register can describe. The 8086 also provided a hardware stack (addressed through the SS and SP registers), allowing true subroutines that can run without altering the variables of the routine that called them.
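Real Mode's segmented addressing works as follows. A 16-bit segment value is shifted left by four bits and added to a 16-bit offset, giving a 20-bit physical address. A near reference supplies only the offset and reuses the current segment. A far reference supplies both. The sketch below models the 8086 behaviour, in which a carry out of bit 19 is simply lost.

```python
def real_mode_address(segment, offset):
    """8086 real-mode translation: physical = segment * 16 + offset, in 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # The 8086 has only 20 address lines, so the sum wraps around at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


# Different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1234, 0x0005)))  # 0x12345
print(hex(real_mode_address(0x1000, 0x2345)))  # 0x12345
print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0 (wrapped past 1 MB)
```

Because segments overlap every 16 bytes, many segment:offset pairs alias the same location.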

The IBM PC systems based on the x86 family of chips are von Neumann machines: they have distinct processing and storage elements, and a bottleneck exists between the processor and its memory.

The x86 chip family grew to include the 80286, which could execute software originally written for earlier x86 chips by operating in 'Real Mode'. The 80286 also introduced 'Protected Mode', which gives access to the processor's extended 24-bit addressing capabilities and allows up to 16 MB of memory to be addressed.
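Each extra address bit doubles the reachable memory. The short sketch below reproduces the figures quoted for the 8086 (20 bits), the 80286 in Protected Mode (24 bits) and the 32-bit 80386. It uses binary units (1 KB = 1024 bytes), and the helper names are illustrative.

```python
UNITS = ("bytes", "KB", "MB", "GB", "TB")


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


def describe(n_bytes):
    """Express a byte count in binary units (1 KB = 1024 bytes)."""
    value = n_bytes
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} {UNITS[-1]}"


for chip, bits in (("8086/8088", 20), ("80286 protected mode", 24), ("80386", 32)):
    print(f"{chip:22} {bits}-bit -> {describe(address_space_bytes(bits))}")
```

This prints 1 MB, 16 MB and 4 GB respectively.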

The arrival of the 80386 processor in 1985 saw the birth of what is the de facto processor architecture in home computers today. The 80386 was the first fully 32-bit x86 processor (instructions, registers and memory address space are all 32 bits wide*), an architecture we now refer to as IA32. This 32-bit protected mode is the basis of what Windows 3.x called '386 Enhanced Mode'. It brought support for paging, which allows areas of RAM that are needed but not currently in use to be 'paged' out to slower magnetic media and read back into RAM when required, so that other data or instructions can be loaded into RAM and executed. This technique is known as virtual memory. Paging also gives each process its own protected address space, supporting the context switching needed by true multitasking operating systems such as Linux**, Windows NT and BSD.
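Classic 80386 paging splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset within a 4 KB page. The sketch below walks such a two-level table. Nested Python dicts stand in for the real in-memory tables, and the `PageFault` exception stands in for the hardware page fault that lets the operating system bring a page back from disk. Present bits, permissions and the TLB are deliberately left out.

```python
PAGE_SIZE = 4096  # 4 KB pages, as in classic 80386 paging


class PageFault(Exception):
    """Raised where real hardware would signal a not-present page."""


def split_linear_address(linear):
    """Return (directory, table, offset) from the 10/10/12-bit fields."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def translate(linear, page_directory):
    """Walk a two-level page table modelled as nested dicts.

    page_directory maps a directory index to a page table, and each page table
    maps a table index to a physical frame number. A missing entry is treated
    as a page that is not currently in RAM.
    """
    directory, table, offset = split_linear_address(linear)
    page_table = page_directory.get(directory)
    if page_table is None or table not in page_table:
        raise PageFault(f"page fault at {linear:#010x}")
    return page_table[table] * PAGE_SIZE + offset


# Hypothetical mapping: linear page 0x00403000 lives in physical frame 0x2A.
directory = {1: {3: 0x2A}}
print(hex(translate(0x00403123, directory)))  # 0x2a123
```

Giving each process its own `page_directory` is what lets two programs use the same linear addresses without interfering with each other.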

Demand grew for faster processors able to perform more complex calculations. On the 80386 and earlier chips, floating point calculations were relatively inefficient compared to integer operations. To improve performance, floating point math co-processors such as the 80287 and 80387 were often added to computer systems that needed better floating point performance. With the 80486 series of chips, dedicated floating point hardware was included in the main CPU package (apart from the cost-reduced 486SX). Around the time of the 486, many other manufacturers began offering x86 compatible devices, notably AMD, Cyrix and IBM with the Am486, Cx486 and Blue Lightning processors. AMD had been producing compatible chips since the 8086, but it was only after 1990 that x86 family processors from companies other than Intel became widely used.

In 1993, Intel broke from its traditional x86 naming scheme and announced the 'Pentium' chip, effectively an 80586. Intel chose a name instead of a number partly for marketing reasons and partly because Intel had never registered 80486 or earlier part numbers as trademarks. The new chip name was the result of a competition in which the winner received a free trip to Hawaii. (Some believe the 'pent' part of the name had something to do with the '5' in 586, or with the chip's other name, 'P5'.) In 1995, Intel released its 'Pentium Pro' chip, based on its P6 design, which also became the basis of the later Pentium II processor.

1997 saw the arrival of 'MMX' (often expanded as 'Matrix Math eXtensions'), both as a processing extension and as a marketing tool (colourful Intel men dancing around in clean room suits to various disco tunes). MMX provides 57 new instructions, designed for digital signal processing, and eight 64-bit wide registers. It also introduced SIMD (Single Instruction, Multiple Data) processing[DAYTON1], in which a single instruction executes over multiple data items at once. MMX provides a more efficient method of processing three-dimensional vectors through matrix manipulation, bringing more realistic 3D games within easier reach of developers. This is perhaps better demonstrated by AMD's '3DNow!' extension, which provides similar extra instructions aimed at the same area as MMX but operates on floating point data. 1997 also saw the introduction of the Accelerated Graphics Port (AGP). AGP gives the graphics card a dedicated channel to the chipset, and through it to the processor and main memory, so that video data no longer has to travel across the main system bus. This greatly improves the bandwidth available for graphics and significantly reduces the effect of the von Neumann bottleneck.

* The 80386SX has a 24-bit address bus and a 16-bit data bus.
** Linux was first written on the 80386 in 1991, using Minix and gcc-1.3.
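MMX's packed arithmetic treats one 64-bit register as several independent lanes. The sketch below models four signed 16-bit lanes in a Python integer. It is written in the spirit of the MMX instructions PADDW (packed add with wrap-around) and PADDSW (packed add with signed saturation), but it is a software illustration only. The helper names are hypothetical.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four signed 16-bit integers into one 64-bit value (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Inverse of pack: return the four lanes as signed 16-bit integers."""
    lanes = []
    for i in range(LANES):
        v = (word >> (i * LANE_BITS)) & LANE_MASK
        lanes.append(v - (1 << LANE_BITS) if v & 0x8000 else v)
    return lanes


def packed_add_wrap(a, b):
    """Lane-wise add with wrap-around, in the spirit of PADDW."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        s = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (s & LANE_MASK) << shift  # a carry never spills into the next lane
    return result


def packed_add_saturate(a, b):
    """Lane-wise signed add clamped to [-32768, 32767], in the spirit of PADDSW."""
    return pack([max(-32768, min(32767, x + y)) for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 30000, -5])
b = pack([10, 20, 10000, -7])
print(unpack(packed_add_wrap(a, b)))      # [11, 22, -25536, -12]
print(unpack(packed_add_saturate(a, b)))  # [11, 22, 32767, -12]
```

On the real hardware all four lane additions happen in one instruction. The loop here only spells out what that instruction computes. Saturation is what makes packed arithmetic useful for audio and pixel data, where wrapping from bright to dark would be a visible error.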

Intel's IA32 (x86) family of processors has continued to grow. The complexity of subsequent chips has followed the predictions made by Moore's Law[INTML], and new extensions such as SSE have appeared. The IA32 architecture changed little until recently, partly because of Intel's pursuit of the IA64 architecture. IA64 is a fully 64-bit architecture with a whole new instruction set, and at the hardware and assembly level its native instructions are not compatible with IA32 chips. Uptake of IA64 systems using Intel's Itanium and Itanium 2 processors has been relatively low, partly because of poor driver support and partly because of the lack of efficient backward compatibility with IA32. Yet it is this same legacy support in x86 chips that many consider the architecture's biggest barrier to improved performance.

In 2003, AMD released the first processors implementing its AMD64 architecture. AMD64 is an extension of the x86 architecture that provides good legacy performance alongside fully 64-bit wide registers, address space and instructions. The main benefit of this system has been the increased address space. First-generation AMD64 processors implement 48-bit virtual and 40-bit physical addresses, far beyond the 4 GB barrier of a 32-bit address in most IA32 class systems.

The Playstation 2 Platform.


The Playstation 2 is not just another ordinary games console. At the time of writing, the Playstation 2 is one of the most technically advanced consumer computing devices around. It sports a 128-bit CPU core named the 'Emotion Engine'[IEEESONY], with two vector processing units and dedicated sound, graphics and I/O processors, all linked via a 128-bit system bus. Although its 300 MHz main clock speed is modest compared to the average home PC, it still outperforms x86 based devices of similar speed in tasks such as vector or floating point computation. The PS2 employs what is known as a 'micro-programmable graphics architecture'[MALINS1]. This system allows the programmer to write program code for the various stages of graphics processing. The result is a parallel, non-blocking system in which each hardware element has its own instruction and data memory, freeing the CPU from controlling graphics processing and output.
The Playstation 2's CPU is a variant of the MIPS processor core containing a floating point unit and SIMD extensions similar to those provided by MMX and SSE. It has three on-chip memory areas: a 16 KB scratch pad, a 16 KB instruction cache (ICACHE) and an 8 KB data cache (DCACHE) (see Fig 2).
The two vector units operate differently. Only VU0 is directly connected to the CPU, which can also use it as a MIPS co-processor. This allows the CPU to call VU instructions directly. Because both VUs can be re-programmed with new microcode, they can perform very different custom operations, in parallel, according to the microcode they are loaded with. VU1 outputs 3D primitives directly to the graphics system for raster display.


The IPU (Image Processing Unit) contains hardware to perform image manipulation functions such as MPEG-2 decoding and the decompression of compressed images.

Fig 2 - Emotion Engine Architecture

The DMA Controller handles the transfer of data between the elements of the system, including the loading of microcode and program data into VU0 and VU1.
The parallel nature of the microcoded graphics architecture allows much more efficient use of the bus between processing elements, in this case reducing the effect of the von Neumann bottleneck.
In many systems, the process of outputting 3D graphics follows what is known as the 3D graphics pipeline (Fig 3). In many games systems, more and more of the lower end of the pipeline has been implemented in hardware for acceleration. The Emotion Engine provides a microcoded hardware option for the sections of the pipeline from Model Transform to Screen Transform. The microcoded solution makes better use of bus bandwidth, because pure hardware implementations often require a fixed data or word size, which can lead to padding and wasted bandwidth. With microcode, the data sent need be only exactly what is needed.

Fig 3 - 3d Graphics Processing Pipeline
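The Model Transform to Screen Transform stages come down to 4x4 matrix products on four-component vectors, followed by a divide and a scale. These are the kind of multiply-add operations the vector units are microcoded to perform. The sketch below assumes a pure translation as the model matrix and a deliberately minimal projection that copies z into w. The matrices, the 640x480 screen and the function names are illustrative assumptions, not the PS2's actual microcode.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Model transform: move an object by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


# Minimal projection (focal length 1, no depth range): copying z into w makes
# the later divide by w shrink distant points towards the centre of the screen.
SIMPLE_PROJECTION = [[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 1, 0]]


def to_screen(vertex, model, width, height):
    """Model transform, projection, perspective divide, then viewport mapping."""
    x, y, z, w = mat_vec(SIMPLE_PROJECTION, mat_vec(model, list(vertex) + [1.0]))
    if w <= 0:
        raise ValueError("vertex is behind the viewer")
    ndc_x, ndc_y = x / w, y / w             # perspective divide
    sx = (ndc_x + 1) * 0.5 * width          # screen transform
    sy = (1 - ndc_y) * 0.5 * height         # screen y grows downwards
    return sx, sy


print(to_screen([1.0, 1.0, 0.0], translation(0, 0, 2), 640, 480))  # (480.0, 120.0)
print(to_screen([1.0, 1.0, 0.0], translation(0, 0, 4), 640, 480))  # (400.0, 180.0)
```

Moving the same vertex twice as far away draws it closer to the centre of the screen, which is the perspective effect this stage of the pipeline exists to produce.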


The Playstation 2 system also provides industry standard interfaces such as IEEE 1394 (FireWire) and USB. It is also the first games console to have an officially supported Linux based development environment. This development package was released in 2002 and included a 10/100 Mbit/s Ethernet adapter, a keyboard and a 40 GB hard drive.
The next generation of the Playstation (Playstation 3) will expand on the parallel aspects of the Emotion Engine in the form of the Cell processor. The design is similar in concept to multi-transputer systems, with each processing element having its own local memory. This potentially gives a very powerful parallel computing system.

ARM - The most popular chip in the world.


The concept behind the ARM processor is that of a simple hardware design that is reliable, fast and offers a reduced instruction set (RISC). The ARM uses fixed length instructions, which makes the hardware required for instruction decoding and execution much simpler than it would be for variable length instructions.
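Fixed-length instructions keep decoding simple because every field sits at a known bit position. The sketch below pulls apart a 32-bit ARM data-processing instruction. The condition code is in bits 31-28, the I bit in 25, the opcode in 24-21, the S bit in 20, Rn in 19-16, Rd in 15-12 and operand2 in 11-0. The check on bits 27-26 is simplified: some other instructions, such as multiplies, share that encoding space and are not distinguished here. The function name is hypothetical.

```python
def decode_data_processing(word):
    """Split a 32-bit ARM data-processing instruction into its fixed fields."""
    if ((word >> 26) & 0b11) != 0b00:
        raise ValueError("not in the data-processing encoding space")
    return {
        "cond": (word >> 28) & 0xF,       # condition code; 0xE means 'always'
        "immediate": (word >> 25) & 0x1,  # I bit: operand2 is an immediate
        "opcode": (word >> 21) & 0xF,     # e.g. 0b0100 = ADD
        "set_flags": (word >> 20) & 0x1,  # S bit: update the condition flags
        "rn": (word >> 16) & 0xF,         # first operand register
        "rd": (word >> 12) & 0xF,         # destination register
        "operand2": word & 0xFFF,         # register or rotated immediate
    }


# ADD r0, r1, r2 encodes as 0xE0810002.
print(decode_data_processing(0xE0810002))
```

Every field is extracted with a shift and a mask that do not depend on any other field, which is why the hardware can decode all of them in parallel.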

ARM began life at Acorn Computers in Cambridge around 1984. The ARM1 was originally intended to be the processor behind a machine that would replace the BBC Micro and meet the needs of businesses. It provided a 32-bit device with higher performance than the BBC's 8-bit 6502 processor and the alternative 16-bit processors of the time. ARM stood for 'Acorn RISC Machine'.

Acorn opted for RISC for a number of reasons. Chief among them was that the design and fabrication of a large (CISC) design was beyond Acorn's budget at the time. Designing, testing and building a RISC device, with its relatively low number of transistors, was within reach of Acorn's resources.

The first model of the instruction set used by the ARM processors (the ARM instruction set) was developed by six Acorn engineers using BBC Micros[ARMCH1]. In April 1985 the first production run produced the first working ARM1 CPUs, each with fewer than 25,000 transistors. The ARM1 provided 16 user accessible 32-bit registers and 26-bit addressing[ARMCL], allowing it to address 64 MB of memory. The ARM1 was improved on in the form of the ARM2, which greatly improved performance in many operations by providing multiplication instructions and support for a math co-processor.

Fig 4 - ARM 3 Stage Pipeline


In 1987 Acorn released the Archimedes computer, the first ARM based personal computer platform. The Archimedes was an 8 MHz machine comparable in performance to 68k CISC machines such as the Atari ST. It ran Arthur, the operating system later developed into RISC OS.


To improve instruction throughput, a three stage (fetch, decode, execute) instruction pipeline was used (see Fig 4). This allows the next instruction of a program to be fetched from memory while the current instruction is being decoded in the CPU core. While that instruction is being executed, the following one is decoded and the one after that is fetched from memory. This prevents any of the three parts of the CPU lying idle during program execution. In the ideal case, with no branches or stalls, it increases instruction throughput by up to a factor of three.
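The gain from the pipeline can be counted directly. The sketch below assumes that each stage takes exactly one cycle and that there are no branches or stalls. Under those assumptions, n instructions need 3n cycles without a pipeline but only n + 2 with one. The function names are illustrative.

```python
STAGES = ("fetch", "decode", "execute")


def cycles_unpipelined(n):
    """Each instruction passes through all three stages before the next starts."""
    return len(STAGES) * n


def cycles_pipelined(n):
    """Ideal pipeline: len(STAGES) cycles to fill, then one completion per cycle."""
    return 0 if n == 0 else len(STAGES) + (n - 1)


def schedule(n):
    """For each cycle, which instruction occupies each stage (None = empty)."""
    table = []
    for cycle in range(cycles_pipelined(n)):
        row = {}
        for depth, stage in enumerate(STAGES):
            i = cycle - depth          # instruction i enters stage `depth` at cycle i + depth
            row[stage] = i if 0 <= i < n else None
        table.append(row)
    return table


for row in schedule(3):
    print(row)
print(cycles_unpipelined(100) / cycles_pipelined(100))  # about 2.94
```

The speed-up only approaches three for long runs of straight-line code. A taken branch empties the pipeline and pays the fill cost again.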

Later developments saw the formation of ARM Ltd, set up in 1990 by Apple, Acorn and VLSI Technology, and the ARM name was reinterpreted as 'Advanced RISC Machine'. ARM Ltd adopted a new numbering scheme for its processors. A single digit after 'ARM' signifies the processor core, and the two digits after that define the versions of the supporting I/O devices in the chip package.

Fig 5 - NetWinder Internet/Office Server

Instead of fabricating its own CPUs, ARM Ltd licensed chip manufacturers to produce ARM processors. The first licence holder was VLSI Technology. ARM processors have continued to be developed and aimed at areas needing low power consumption and at embedded systems (PDAs, routers, modems etc.). ARM processors such as the ARM700, StrongARM and XScale have been used in virtually every arena of computational electronics, from set-top TV boxes through washing machines to dedicated internet servers (Fig 5).

StrongARM brought PCMCIA support and saw widespread use in PDAs and early tablet computing devices such as the Compaq/HP iPAQ, Sharp Zaurus and Siemens SIMpad. The ARM9 series also included a version (the ARM926EJ-S) with hardware support, known as Jazelle, for executing Java bytecode directly. Since the ARM6 core, the ARM family has fully supported 32-bit addressing, allowing it to address 4 GB of memory and devices.


References

[JVN45] von Neumann, J. 1981. “First Draft of a Report on the EDVAC.” In Stern, N., From ENIAC to UNIVAC: An Appraisal of the Eckert-Mauchly Computers. Digital Press, Bedford, Massachusetts.¤

[BGN46] Burks, A. W., Goldstine, H. H., and von Neumann, J. 1963. Preliminary discussion
of the logical design of an electronic computing instrument. In Taub, A. H., editor, John
von Neumann Collected Works, The Macmillan Co., New York, Volume V, 34-79*
(http://www.cs.unc.edu/~adyilie/comp265/vonNeumann.html)

[INT02] The History of Computing, 2002, EDVAC, http://www.thocp.net/hardware/edvac.htm

[INTML] Intel Corp, Moore's Law, http://www.intel.com/technology/silicon/mooreslaw/index.htm

[WPX86] Wikipedia, X86, http://en.wikipedia.org/wiki/X86

[DAYTON1] MMX Technology Technical Overview, University of Dayton, http://www.udayton.edu/~cps/cps560/notes/hardware/mmx/overview/overview.html

[IEEESONY] Paul Holman, The Technology behind the Playstation 2. Sony Computer
entertainment group. September 2002
http://www.ieee.org.uk/docs/sony.pdf

[MALINS1] Dominic Mallinson, Benefits of a Micro-programmable Graphics Architecture, http://www.bringyou.to/games/PS2.htm

[ARMCH1] The History of the ARM CPU, http://www.ot1.com/arm/armchap1.html

[ARMCL] ARM Chips List, University of Maryland, USA, http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/arm/armchip.html#Arm1

¤ Available as a PDF: http://qss.stanford.edu/~godfrey/vonNeumann/vnedvac.pdf


* John von Neumann's original papers

04/02/05 7/7