Chapter 2
Computer Evolution and Performance
ENIAC Programming in the ’40s
IAS Machine
• This architecture came to be known as the “von Neumann” architecture and has been the basis for virtually every machine designed since then

Structure of von Neumann (IAS) machine

Structure of IAS – detail

IAS - details
• 1000 x 40 bit words
— Binary number
— 2 x 20 bit instructions
• Set of registers (storage in CPU)
— Memory Buffer Register
— Memory Address Register
— Instruction Register
— Instruction Buffer Register
— Program Counter
— Accumulator
— Multiplier Quotient
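The register set above can be illustrated with a small simulation. This is a minimal, simplified sketch of the IAS fetch cycle (it loads the left 20-bit instruction and buffers the right one, skipping the real machine's check of the IBR first); the memory contents and class are made up for the example.

```python
# Minimal sketch of the IAS fetch cycle using the registers listed above.
# Word and instruction widths follow the slide (40-bit words, two 20-bit
# instructions per word, 8-bit opcode + 12-bit address per instruction).

class IAS:
    def __init__(self, memory):
        self.memory = memory      # 1000 x 40-bit words
        self.pc = 0               # Program Counter
        self.mar = 0              # Memory Address Register
        self.mbr = 0              # Memory Buffer Register
        self.ibr = 0              # Instruction Buffer Register
        self.ir = 0               # Instruction Register (opcode)

    def fetch(self):
        # Address of the next instruction word goes to MAR;
        # the 40-bit word is read from memory into MBR.
        self.mar = self.pc
        self.mbr = self.memory[self.mar]
        # Left 20-bit instruction is decoded into IR (opcode) and MAR
        # (address); the right 20-bit instruction waits in IBR.
        left = (self.mbr >> 20) & 0xFFFFF
        self.ibr = self.mbr & 0xFFFFF
        self.ir = (left >> 12) & 0xFF     # 8-bit opcode
        self.mar = left & 0xFFF           # 12-bit address
        self.pc += 1

mem = [0] * 1000
mem[0] = 0b00000001_000000000101 << 20   # opcode 1, address 5, in the left half
cpu = IAS(mem)
cpu.fetch()
print(cpu.ir, cpu.mar)   # 1 5
```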
IAS Instruction Set

IAS Operation
Commercial Computers
• 1947 - Eckert-Mauchly Computer Corp
• UNIVAC I (Universal Automatic Computer)
— Commissioned by US Bureau of Census for 1950 calculations
— First successful commercial computer
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
— Faster and with more memory than UNIVAC I
— Backwards compatible
• Developed 1100 series for scientific computation
• Scientific/text processing split

IBM
• The major manufacturer of punched-card processing equipment
• 1953 - the 701
— IBM’s first stored-program computer
— Scientific calculations
• 1955 - the 702
— Business applications
• Led to 700/7000 series
• Established IBM dominance
An IBM 7094 Configuration

Integrated Circuits; 3rd Generation
• Transistors are “discrete components”
• Manufactured in their own container; soldered to circuit board
• Early 2nd generation: 10,000 discrete transistors
• Later computers: several hundred thousand
• 1958: Integrated circuits (ICs)
— Many transistors on a piece of silicon
— Defines 3rd generation of computers
• Two most important 3rd generation computers were the IBM System 360 and the DEC PDP-8
Moore’s Law
• Gordon Moore – co-founder of Intel
• Predicted that the number of transistors on a chip will double every year
• Since the 1970s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability

Generations of Computers
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
— Up to 100 devices on a chip
• Medium scale integration - to 1971
— 100-3,000 devices on a chip
• Large scale integration - 1971-1977
— 3,000-100,000 devices on a chip
• Very large scale integration - 1978-1991
— 100,000-100,000,000 devices on a chip
• Ultra large scale integration - 1991 on
— Over 100,000,000 devices on a chip
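The doubling trend is easy to express numerically. A small sketch, where the base transistor count and time spans are illustrative assumptions, not figures from the slides:

```python
# Moore's-law growth sketch: transistor count doubling at a fixed interval.
# The base count (1,000) and the 10-year horizon are made up for illustration.
def transistors(start_count: float, years: float, doubling_period_years: float) -> float:
    """Project a transistor count that doubles every `doubling_period_years`."""
    return start_count * 2 ** (years / doubling_period_years)

# Doubling every year (Moore's original 1965 prediction):
print(transistors(1_000, 10, 1.0))    # 1024000.0 after 10 years
# Doubling every 18 months (the slower post-1970s trend from the slide):
print(transistors(1_000, 10, 1.5))
```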
DEC PDP-8
• 1964
• First minicomputer (named after the miniskirt!)
• Did not need an air-conditioned room
• Small enough to sit on a lab bench
• $16,000
— vs. $100k+ for IBM 360
• Embedded applications & OEM
• Introduced the BUS structure (vs. the central-switched IBM architecture)

DEC PDP-8 Bus Structure
• 96 separate signal paths carry address, data and control signals
• Easy to plug new modules into bus
• Bus is controlled by CPU
• Now virtually universal in microcomputers
16 and 32 bit processors
• 16-bit processors were introduced in the late 1970s
• First 32-bit processors were introduced in 1981 (Bell Labs and Hewlett-Packard)
• Intel’s 32-bit processor did not arrive until 1985 – and we’re still using that architecture

Intel x86 Processors 1971 - 1989
Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution

Pipelining
• Like an assembly line
• Instruction execution is divided into stages
• When one stage has completed, that circuitry is free to operate on the “next” instruction
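The assembly-line analogy gives a simple first-order timing model: a k-stage pipeline finishes n instructions in k + (n − 1) cycles instead of n × k. A sketch of that model (it ignores stalls and hazards, which the branch-prediction slide below addresses):

```python
# First-order pipeline timing: k stages, n instructions, one stage per cycle.
def unpipelined_cycles(n: int, k: int) -> int:
    # Each instruction runs all k stages to completion before the next starts.
    return n * k

def pipelined_cycles(n: int, k: int) -> int:
    # k cycles to fill the pipe with the first instruction,
    # then one instruction completes per cycle.
    return k + (n - 1)

n, k = 100, 5
print(unpipelined_cycles(n, k))                            # 500
print(pipelined_cycles(n, k))                              # 104
print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))   # speedup ≈ 4.81
```

As n grows, the speedup approaches k, which is why deeper pipelines look attractive until stalls enter the picture.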
Cache Memory
• Very fast memory stores copies of data in slow main memory
• Can be onboard (same chip as processor) or in separate chips

Branch Prediction
• Processor looks ahead in instruction stream and attempts to predict which branch will be taken by conditional code
• Execute predicted branch and hope it’s right
• Wrong guess leads to pipeline stall
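The benefit of a cache can be quantified with the standard effective-access-time formula; the timings and hit rate below are assumptions chosen for illustration, not figures from the slides.

```python
# Effective (average) memory access time with a single cache level.
def avg_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    # On a hit we pay only the cache time; on a miss we pay the cache
    # lookup plus the main-memory access.
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + memory_ns)

# Assumed: 2 ns cache, 60 ns main memory, 95% hit rate.
print(avg_access_time(0.95, 2.0, 60.0))   # 5.0 ns -- far closer to cache speed
```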
Solutions
• Increase number of bits retrieved at one time
— Make DRAM “wider” rather than “deeper”
• Change DRAM interface
— Cache
• Reduce frequency of memory access
— More complex cache and cache on chip
• Increase interconnection bandwidth
— High speed buses
— Hierarchy of buses

I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• But there are problems moving data between processor and peripheral
• Solutions:
— Caching
— Buffering
— Higher-speed interconnection buses
— More elaborate bus structures
— Multiple-processor configurations
Intel Microprocessor Performance

Increased Cache Capacity
• Typically two or three levels of cache between processor and main memory
• Chip density increased
— More cache memory on chip
— Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%
Pentium Evolution (1)
• 8080
— first general purpose microprocessor
— 8 bit data bus
— Used in first personal computer – Altair
• 8086
— much more powerful
— 16 bit processor and bus
— instruction cache, prefetches a few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1 Mbyte
• 80386
— 32 bit
— Support for multitasking

Pentium Evolution (2)
• 80486
— sophisticated powerful cache and instruction pipelining
— built in floating point co-processor
• Pentium
— Superscalar
— Multiple instructions executed in parallel
• Pentium Pro
— Increased superscalar organization
— Aggressive register renaming
— branch prediction
— data flow analysis
— speculative execution
Constraints on Embedded Systems
— Different application characteristics depending on static or dynamic loads, sensor speed, response time requirements, computation versus I/O intensive tasks
— Different models of computation ranging from discrete asynchronous event systems to systems with continuous time dynamics

Environmental Coupling
• Embedded systems are typically tightly coupled with their environment
• Most systems are subject to real-time response constraints generated by interaction with the environment
— speed of motion, measurement precision, reactive latency dictate software requirements
ARM Evolution
• Acorn RISC Machine (ARM1) released 1985
— VLSI Technology was the silicon partner
— High performance with low power consumption
— ARM2 added multiply and swap instructions
— ARM3 added a small cache
• Acorn, VLSI and Apple started ARM Ltd.

Family  Notable features                                                Cache          Typical MIPS @ MHz
ARM2    Integrated memory management unit, graphics and I/O processor   —              —
ARM3    First use of processor cache                                    4 KB unified   12 MIPS @ 25 MHz
ARM6    First to support 32-bit addresses; floating-point unit          4 KB unified   28 MIPS @ 33 MHz
ARM7    Integrated SoC                                                  8 KB unified   60 MIPS @ 60 MHz
ARM8    5-stage pipeline; static branch prediction                      8 KB unified   84 MIPS @ 72 MHz
ARM9    —                                                               16 KB/16 KB    300 MIPS @ 300 MHz
Processor Assessment
• What criteria do we use to assess processors?
— Cost
— Size
— Reliability
— Security
— Backwards compatibility
— Power consumption
— Performance

Performance Assessment
• Performance is not easy to compare; it varies with
— Processor Speed
— Instruction Set
— Application
— Choice of programming language
— Quality of compiler
— Skill of the programmer
Instruction Execution Rate
• If all instructions required the same number of clock ticks then the average execution rate would be a constant dependent on clock speed
• In addition to variation in instruction clocks, we also have to take into account the time needed to read and write memory
— This can be significantly slower than the system clock

Performance Factors and System Attributes
T = Ic × [p + (m × k)] × τ
where Ic = instruction count
p = processor cycles needed to decode / execute an instruction
m = number of memory references needed per instruction
k = ratio between memory cycle time and processor cycle time
τ = 1 / f = processor cycle time (the reciprocal of the clock frequency f)
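A worked example of this model, together with the MIPS rate it implies (Ic / (T × 10⁶)); all parameter values below are made up for illustration:

```python
# Execution time from the slide's model: T = Ic * (p + m * k) * tau, tau = 1/f.
def execution_time(ic: int, p: float, m: float, k: float, f_hz: float) -> float:
    tau = 1.0 / f_hz                  # processor cycle time in seconds
    return ic * (p + m * k) * tau

def mips_rate(ic: int, t_seconds: float) -> float:
    # Millions of instructions executed per second.
    return ic / (t_seconds * 1e6)

ic = 2_000_000   # instructions executed (assumed)
p = 4            # processor cycles to decode/execute an instruction (assumed)
m = 1            # memory references per instruction (assumed)
k = 3            # memory cycle time / processor cycle time (assumed)
f = 400e6        # 400 MHz clock (assumed)

t = execution_time(ic, p, m, k, f)
print(t)                   # ≈ 0.035 s (7 cycles/instruction * 2e6 / 400 MHz)
print(mips_rate(ic, t))    # ≈ 57.1 MIPS
```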
A note on measures of central tendency
• The conventional arithmetic average is only one of many measures of central tendency. The three classical or Pythagorean means are (for n observations):
— Arithmetic mean (sum of the observations divided by n)
— Geometric mean (nth root of the product of the observations)
— Harmonic mean (n divided by the sum of the reciprocals of the observations)
• It is always true that the arithmetic mean ≥ geometric mean ≥ harmonic mean
• Reasons for selecting a particular measure are beyond the scope of this course.

Amdahl’s Law
• Proposed by Gene Amdahl in 1967 to predict the maximum speedup of a system using multiple processors compared to a single processor
• For any given program some fraction f involves code that is amenable to parallel execution while the remainder of the code (1 − f) can only be executed in a serial fashion
• In its simplest form Amdahl’s law for N processors can be expressed as
Speedup = 1 / [(1 − f) + f / N]
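Both slides lend themselves to a short numeric check. A sketch computing the three Pythagorean means (confirming arithmetic ≥ geometric ≥ harmonic) and the Amdahl speedup for an assumed 90%-parallel program on 8 processors:

```python
import math

# The three Pythagorean means from the slide.
def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

# Amdahl's law: f = parallelizable fraction, n = number of processors.
def amdahl_speedup(f: float, n: int) -> float:
    return 1 / ((1 - f) + f / n)

xs = [1.0, 2.0, 4.0]
print(arithmetic_mean(xs))    # ≈ 2.33
print(geometric_mean(xs))     # ≈ 2.0
print(harmonic_mean(xs))      # ≈ 1.71  (arithmetic >= geometric >= harmonic)

print(amdahl_speedup(0.9, 8))  # ≈ 4.7, well short of the 8x processor count
```

Note how the 10% serial fraction caps the speedup: even with n → ∞ the limit is 1 / (1 − f) = 10.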