Overview of Microprocessor Systems: L.Anjaneyulu
MICROPROCESSOR
SYSTEMS
L.ANJANEYULU
Dept of ECE
N.I.T., Warangal
NITW/ECE/LA 1
ENIAC-on-a-Chip
Moore School of Electrical Engineering, University of
Pennsylvania http://www.ee.upenn.edu/~jan/eniacproj.html
Intel
1950s: Shockley leaves Bell Labs to establish Shockley Labs in
California. Some of the best young electronic engineers and solid-
state physicists come to work with him. These include Robert Noyce
and Gordon Moore.
1969: Intel was a tiny start-up company in Santa Clara, headed by
Noyce and Moore.
1970: Busicom places an order with Intel for custom calculator chips.
Intel has no experience in custom-chip design and sets out to design a
general-purpose solution.
1971: Intel has problems translating the architecture into working chip
designs, and the project runs late.
Faggin joins Intel and solves the problems in weeks.
The result is the Intel 4000 family (later renamed MCS-4,
Microcomputer System 4-bit), comprising the 4001 (2k ROM), the
4002 (320-bit RAM), the 4003 (10-bit I/O shift-register) and the 4004,
a 4-bit CPU.
Intel 4004
Introduced in 1971, the Intel 4004 "Computer-on-a-Chip" was a
2,300-transistor device capable of performing 60,000 operations
per second.
The Intel 4004
Federico Faggin designed the Intel 4004 processor.
His initials, "F.F.", were etched onto the chip.
Intel 4004 – First Microcomputer
http://uk.geocities.com/magoos_universe/4004_main.htm
The Busicom Calculator
Intel 8008
1972: Faggin begins work on an 8-bit processor, the
Intel 8008. The prototype has serious problems with
electrical charge leaking out of its memory circuits.
Device physics, circuit design and layout are important
new skills. The 8008 chip layout is completely redesigned
and the chip is released.
There is a sudden surge in microprocessor interest.
Intel's 8008 is well-received, but system designers want
increased speed, easier interfacing, and more I/O and
instructions. The improved version, produced by Faggin,
is the 8080.
Faggin leaves Intel to start his own company, Zilog, which
later produces the Z80.
Federico Faggin : Zilog
Zilog produced the 3.5MHz Zilog Z80 (a very popular processor taught
in many universities) and, later, the 16-bit Z8000.
Another great design, but Zilog struggled to provide good support: it was
a new and inexperienced company with only a few hundred employees,
while Intel at this time had over 10,000.
The Zilog Z80
The Z80 microprocessor is an 8-bit CPU with a 16-bit address bus,
capable of directly accessing 64 KB of memory space.
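The 64 KB figure follows directly from the width of the address bus: n address lines select one of 2^n locations. A minimal Python sketch of that arithmetic, assuming byte-addressable memory (one byte per address, as on the Z80); the function name `address_space_bytes` is our own illustration.

```python
def address_space_bytes(address_bits: int) -> int:
    """Distinct byte addresses reachable with `address_bits` address lines.

    Assumes byte-addressable memory (one byte per address), as on the Z80.
    """
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 1 << address_bits  # 2 ** address_bits


z80_bytes = address_space_bytes(16)  # Z80: 16-bit address bus
print(z80_bytes)          # 65536
print(z80_bytes // 1024)  # 64, i.e. 64 KB
```

The same function gives 4 GB for the 32-bit address space of the 80386 (`address_space_bytes(32)`).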
The IBM PC
1981: IBM, having seen Apple's success, recognises
a new personal computer market. They choose Intel
over Motorola and Zilog (and their own proprietary
processors) because of Intel's long-term commitment
to the 8086 line.
IBM selects the Intel 8088 for their PC, introduced in
August.
Intel brings out the 16-bit 80286 for the IBM PC AT
but it has weaknesses, most notably in virtual memory
support. The newest 'killer' application software,
Microsoft Windows, needs a more powerful
processor.
Contemporary Microprocessors:
16/32-bit Processors
(external 16-bit bus, internal 32-bit
structure)
Motorola MC68010
National Semiconductor NS16032
Additional Functionality on the Chip
Direct Memory Access (DMA) (Intel 80186)
Virtual memory management
(MC68010, Intel 80286)
Optional Coprocessor (Intel 8086/80286,
NS16032)
Extended Address Space
Microprocessor History
32-bit Processors
CISC Processors
• Motorola MC680x0
• Intel i386 / i486 / Pentium
• National Semiconductor NS32x32
• Concept of a Processor Family
• Binary Compatibility
• Compatible with 16 Bit Processors
RISC Processors
• Advanced Micro Devices Am29000 (~1987)
• Sun Microsystems SPARC
• MIPS Technologies MIPS R2000 / MIPS R3000
Moore’s Law
80386 (1985): first member of the IA-32 architecture family
Support for multitasking
Additional registers (including two extra segment registers, FS and GS)
All general-purpose registers now 32 bits
Address space now 32 bits, with several new addressing
modes; provides a logical address space for each
software process
Added paging support under the existing segmented
architecture. Supports both a segmented memory model
and a "flat" memory model
Almost a general-purpose register machine
The Intel386 processor includes six parallel stages: bus
interface unit, code prefetch unit, instruction decode
unit, execution unit, segment unit and paging unit
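With the 80386's 4 KB pages, paging splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset within the page. The sketch below shows only that split plus a toy two-level lookup; the nested dictionaries standing in for the page directory and page tables, and the example frame address, are hypothetical, and real page-table entries (present and permission bits, the CR3 base register, the TLB) are not modelled.

```python
from typing import Dict, Tuple

OFFSET_BITS = 12  # 4 KB pages -> 12-bit byte offset within a page
TABLE_BITS = 10   # 1024 entries per page table
DIR_BITS = 10     # 1024 entries in the page directory


def split_linear_address(linear: int) -> Tuple[int, int, int]:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    if not 0 <= linear < (1 << (DIR_BITS + TABLE_BITS + OFFSET_BITS)):
        raise ValueError("linear address must fit in 32 bits")
    offset = linear & ((1 << OFFSET_BITS) - 1)
    table = (linear >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    directory = linear >> (OFFSET_BITS + TABLE_BITS)
    return directory, table, offset


def translate(linear: int, page_directory: Dict[int, Dict[int, int]]) -> int:
    """Toy two-level walk: directory -> page table -> 4 KB frame base address.

    A missing dictionary key stands in for a "not present" entry
    (a page fault on real hardware).
    """
    directory, table, offset = split_linear_address(linear)
    frame_base = page_directory[directory][table]
    return frame_base | offset


d, t, off = split_linear_address(0x12345678)
print(hex(d), hex(t), hex(off))  # 0x48 0x345 0x678

# Hypothetical mapping: that page lives in the physical frame at 0x00ABC000.
tables = {0x48: {0x345: 0x00ABC000}}
print(hex(translate(0x12345678, tables)))  # 0xabc678
```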
Intel486 processor
More parallel execution capability than the Intel386:
The instruction decode and execution units are organized into five pipelined stages;
each stage (when needed) operates in parallel with the others on up
to five instructions in different stages of execution.
Each stage can do its work on one instruction in one clock, so
the Intel486 processor can execute as rapidly as one instruction per
clock cycle.
An 8-KByte on-chip first-level cache increases the percentage of instructions
that can execute at the scalar rate of one per clock, including
memory-access instructions whose operand is in the first-level cache.
Integrated the x87 floating-point unit onto the processor
New pins, bits and instructions to support more complex and powerful
systems:
Second-level cache support
Multiprocessor support
Power management (for notebooks and laptops)
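The "one instruction per clock" figure can be checked with the usual idealized pipeline arithmetic: with k one-clock stages, the first instruction finishes after k clocks and each later one a clock after its predecessor, so n instructions take k + n - 1 clocks instead of k × n. A minimal sketch, assuming no stalls from hazards, branches or cache misses, which real processors do encounter; the function names are illustrative.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: n_stages clocks to fill, then one instruction completes per clock."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


n, k = 100, 5  # 100 instructions through a five-stage pipeline
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
print(round(unpipelined_cycles(n, k) / pipelined_cycles(n, k), 2))  # 4.81
```

As n grows, 104/100 cycles per instruction approaches exactly one per clock, and the speedup approaches the number of stages.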
Speeding it up
Pipelining
On-board cache
On-board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
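Branch prediction can be illustrated with the classic 2-bit saturating counter found in most computer-architecture textbooks. This is a generic sketch, not the scheme of any particular Intel processor (the slide does not say which one is used); the class name and the loop example are illustrative.

```python
from typing import Iterable


class TwoBitPredictor:
    """2-bit saturating counter for one branch.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    A single surprise moves the counter one step, so one odd outcome
    does not immediately flip a strongly biased prediction.
    """

    def __init__(self, state: int = 2) -> None:
        if state not in (0, 1, 2, 3):
            raise ValueError("state must be 0..3")
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def accuracy(outcomes: Iterable[bool]) -> float:
    """Fraction of branch outcomes predicted correctly, starting weakly taken."""
    predictor = TwoBitPredictor()
    correct = total = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
        total += 1
    return correct / total if total else 0.0


# A loop branch: taken 9 times, then falls through once, repeated 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))  # 0.9
```

Only the loop exit is mispredicted; because the counter drops from 3 to 2 rather than to "not taken", the next pass through the loop is again predicted correctly.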
Pentium Evolution (2)
80486
sophisticated, powerful cache and instruction pipelining
built-in maths co-processor
Pentium
superscalar CPU
multiple instructions executed in parallel
organization: CPU + L1 cache
Pentium Pro
increased superscalar organization
aggressive register renaming
branch prediction
data flow analysis
speculative execution
organization: CPU + L1 cache + L2 cache
Pentium Evolution (3)
Pentium II (1997)
Pentium Pro + MMX (MultiMedia eXtensions)
- Data bus (64-bit), address bus (36-bit)
- L1 cache: 32 KByte, L2 cache: 512 KByte
- Processor core speed (233 MHz to 450 MHz)
- System bus (100 MHz)
MMX targets graphics, video & audio processing
Celeron = Pentium II without L2 cache; Celeron A: 128 KByte L2 cache
Xeon (1998) = Pentium II + Graphic Accelerator + .. (CPU for servers)
Scalability: can be scaled to 2, 4, 8 or more processors,
and used for high-end servers and workstations
Pentium III (1999)
Pentium 4 (2000)
Further floating-point and multimedia enhancements
- Data bus (64-bit), address bus (36-bit)
- Processor core speed (2 GHz to 3.2 GHz)
- System bus (400 MHz to 800 MHz)
Itanium
64-bit
See Intel web pages for detailed information on processors
Technological Development
Model         Year   Number of transistors
4004          1971          2,250
8008          1972          2,500
8080          1974          5,000
8086          1978         29,000
80286         1982        120,000
80386         1985        275,000
80486         1989      1,180,000
Pentium       1993      3,100,000
Pentium II    1997      7,500,000
Pentium III   1999     24,000,000
Pentium 4     2000     42,000,000
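Moore's Law is usually quoted as transistor counts doubling roughly every two years, and the table gives enough data to check this. From the 4004 (1971, 2,250 transistors) to the Pentium 4 (2000, 42,000,000) the count grows about 18,700-fold, which is about 14.2 doublings in 29 years. A minimal Python sketch of that endpoint estimate; it deliberately ignores the intermediate rows, which a least-squares fit of log2(count) against year would use.

```python
import math

# (model, year, transistor count), copied from the table above
CHIPS = [
    ("4004", 1971, 2_250),
    ("8008", 1972, 2_500),
    ("8080", 1974, 5_000),
    ("8086", 1978, 29_000),
    ("80286", 1982, 120_000),
    ("80386", 1985, 275_000),
    ("80486", 1989, 1_180_000),
    ("Pentium", 1993, 3_100_000),
    ("Pentium II", 1997, 7_500_000),
    ("Pentium III", 1999, 24_000_000),
    ("Pentium 4", 2000, 42_000_000),
]


def doubling_time_years(year0: int, count0: int, year1: int, count1: int) -> float:
    """Years per doubling, assuming steady exponential growth between two points."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


(first_name, y0, c0), (last_name, y1, c1) = CHIPS[0], CHIPS[-1]
t2 = doubling_time_years(y0, c0, y1, c1)
print(f"{first_name} -> {last_name}: doubling every {t2:.2f} years")
# 4004 -> Pentium 4: doubling every 2.04 years
```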
[Figure: number of transistors (log scale, 1,000 to 100,000,000) versus year (1971 to 2000) for the 4004, 8008, 8080, 8086, 80286, 80386, 80486, Pentium, Pentium II, Pentium III and Pentium 4]
Performance
1970s Processors:
Processor    4004  8008  8080  8086  8088
Introduced   1971  1972  1974  1978  1979
Performance
1980s Processors:
Performance
Recent Processors:
Pentium III Pentium 4
Contemporary Microprocessors
64/32-bit Processors
SUN Microsystems SuperSPARC
Motorola 88110
IBM, Motorola PowerPC 601 (MPC601)
“Modern” Processors
64-bit Structure
Internal Parallelism
• Instruction pipelining
• Arithmetic Pipelining
Instruction and Data Caches
Advanced Memory and Peripheral
Connections
Performance Mismatch
Processor speed increased
Memory capacity increased
Memory speed lags behind processor
speed!!
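One standard way to quantify how the on-chip caches described earlier soften this mismatch is the average memory access time: AMAT = hit time + miss rate × miss penalty. The figures below (1-cycle cache hit, 5% miss rate, 100-cycle DRAM access) are hypothetical, chosen only to show the shape of the effect, and the function name is illustrative.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty.

    All times must use the same unit (here: processor clock cycles).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures, in processor clock cycles:
DRAM_LATENCY = 100  # cost of every access when there is no cache
with_cache = amat(hit_time=1, miss_rate=0.05, miss_penalty=DRAM_LATENCY)
print(f"no cache: {DRAM_LATENCY} cycles, with cache: {with_cache:.1f} cycles")
# no cache: 100 cycles, with cache: 6.0 cycles
```

The gap between processor and memory speed then shows up as a growing miss penalty, which is why designs keep adding cache levels.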
DRAM and Processor Characteristics
Intel Itanium 2 (McKinley)
• 64-bit processor
• 221 million transistors!
How are they used?
• What will we do as transistor counts continue to grow?
• Most of the chip is used for memories, instruction decoding,
dynamic scheduling…
• Why is it done this way?
• How much more efficient could it be if more of the area
went to actual processing?
Even More Recent Example
• Runs the 64-bit IA-64 ISA
• Die: 3.74 cm²
• 0.13 µm process
• 410M transistors
• 1.5 GHz core
• 1.3 V logic
• 130 W power consumption!
• 6.4 GB/s bus
• Cost: $2,247-$4,226
• 9 MB L3 cache
later this year…
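The power figure is easier to appreciate after a little arithmetic: at 1.3 V, 130 W means roughly 100 A flowing into the chip, and spread over a 3.74 cm² die it is about 35 W/cm². A minimal sketch, assuming (as a simplification) that the whole 130 W is drawn from the 1.3 V logic supply; real processors use several supply rails. The function names are illustrative.

```python
def supply_current_a(power_w: float, voltage_v: float) -> float:
    """I = P / V, assuming all the power is drawn from a single supply rail."""
    return power_w / voltage_v


def power_density_w_per_cm2(power_w: float, die_area_cm2: float) -> float:
    """Average power dissipated per unit of die area."""
    return power_w / die_area_cm2


# Figures from the slide: 130 W, 1.3 V logic, 3.74 cm^2 die.
print(f"{supply_current_a(130, 1.3):.0f} A")               # 100 A
print(f"{power_density_w_per_cm2(130, 3.74):.1f} W/cm^2")  # 34.8 W/cm^2
```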
AMD Opteron (100 Million Transistors)
Cyrix III
• Developed by National Semiconductor
• 133 MHz front-side bus (it also supports 66 MHz and 100 MHz FSB).
• 256 KB integrated L2 cache along with a 64 KB integrated L1 cache.
• 3DNow! SIMD instructions in a dual-pipelined FPU.
• As with the MII, the Cyrix III supports MMX.
• Superscalar design featuring two seven-stage pipelines, allowing two
instruction streams to be processed simultaneously.
• Two-level translation lookaside buffer (TLB) and a 512-entry branch target buffer.
• Out-of-order execution through register renaming, with data forwarding and
bypassing to resolve data dependencies between pipelines.
• Speculative execution after a predicted branch is also supported.
• 15% to 20% cheaper than a comparable Celeron.
• Aimed to capture 10% of the market.
• Subject of extensive legal action by Intel, but VIA is still in business.
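A branch target buffer, like the 512-entry one listed above, caches the targets of recently taken branches so instruction fetch can be redirected before the branch is decoded. The slide does not give the Cyrix III's actual BTB organization (associativity, indexing, tag size), so the sketch below assumes the simplest possible design: a direct-mapped table indexed by low-order address bits and tagged with the full branch address. All names are illustrative.

```python
from typing import List, Optional


class BranchTargetBuffer:
    """Toy direct-mapped branch target buffer.

    Maps the address of a taken branch to the target it jumped to last time.
    """

    def __init__(self, entries: int = 512) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.entries = entries
        self.tags: List[Optional[int]] = [None] * entries  # full branch address as tag
        self.targets: List[int] = [0] * entries

    def _index(self, pc: int) -> int:
        return pc % self.entries  # low-order address bits pick the slot

    def lookup(self, pc: int) -> Optional[int]:
        """Predicted target for the branch at `pc`, or None on a BTB miss."""
        i = self._index(pc)
        return self.targets[i] if self.tags[i] == pc else None

    def update(self, pc: int, target: int) -> None:
        """Record the resolved target of a taken branch, evicting any conflicting entry."""
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target


btb = BranchTargetBuffer()
print(btb.lookup(0x1000))       # None: never seen
btb.update(0x1000, 0x2000)
print(hex(btb.lookup(0x1000)))  # 0x2000
btb.update(0x1200, 0x3000)      # 0x1200 maps to the same slot (0x1200 % 512 == 0)
print(btb.lookup(0x1000))       # None: evicted by the conflict
```

A real design would usually store only part of the address as a tag and might use set associativity to reduce such conflicts.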
First Implementation of Key Features: Montecito
[Figure: Montecito die showing two cores, Core 1 and Core 2, each with its own L3 cache]
Intel’s Latest: The Pentium 4 2.4GHz
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
Speed: how do we evaluate a processor's speed?
Clock speed – but instructions per cycle may differ
Instructions per second – but work per instruction may differ
Dhrystone: Synthetic benchmark, developed in 1984.
Dhrystones/sec.
• MIPS: 1 MIPS = 1757 Dhrystones per second (based on
Digital's VAX-11/780). A.k.a. Dhrystone MIPS. Commonly
used today.
So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per
second
SPEC: set of more realistic benchmarks, but oriented to desktops
EEMBC – EDN Embedded Benchmark Consortium,
www.eembc.org
• Suites of benchmarks: automotive, consumer electronics,
networking, office automation, telecommunications
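The Dhrystone MIPS conversion above is a simple ratio against the VAX-11/780's 1757 Dhrystones per second. A minimal Python sketch of both directions of the conversion; the function names are illustrative. The last line divides by a hypothetical 500 MHz clock to give DMIPS/MHz, a common way of separating per-cycle efficiency from raw clock speed (the "instructions per cycle may differ" point above).

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # the 1-MIPS reference machine


def dmips(dhrystones_per_sec: float) -> float:
    """Dhrystone MIPS rating from a measured Dhrystones-per-second score."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dhrystones_per_sec(dmips_rating: float) -> float:
    """Dhrystones per second implied by a Dhrystone MIPS rating."""
    return dmips_rating * VAX_11_780_DHRYSTONES_PER_SEC


print(dhrystones_per_sec(750))  # 1317750
print(dmips(1_317_750))         # 750.0

# DMIPS per MHz for a hypothetical 750-DMIPS processor clocked at 500 MHz.
print(750 / 500)                # 1.5
```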
Which has higher performance?
Time to do the task (Execution Time)
• execution time, response time, latency
Tasks per day, hour, week, sec, ns, … (Performance)
• throughput, bandwidth
Response time and throughput are often in opposition
Response Time
Time to complete a task
Throughput
Total amount of work done per time
Execution Time (CPU Time)
User CPU time
• Time spent in the program
System CPU time
• Time spent in OS
Elapsed Time
Execution Time + Time of I/O and time sharing
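The time components above can be combined in a few lines. This sketch assumes the standard textbook relations: CPU time = user CPU time + system CPU time; elapsed time = CPU time + time spent waiting for I/O and for other programs; CPU time = instruction count × cycles per instruction (CPI) ÷ clock rate; and performance = 1 / execution time, so "A is n times faster than B" means time_B / time_A = n. All numbers and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunTimes:
    """Timing breakdown for one program run, in seconds."""
    user_cpu: float    # time spent in the program itself
    system_cpu: float  # time spent in the OS on the program's behalf
    other: float       # waiting for I/O and for other programs (time sharing)

    @property
    def cpu_time(self) -> float:
        return self.user_cpu + self.system_cpu

    @property
    def elapsed(self) -> float:
        return self.cpu_time + self.other


def cpu_time_from_counts(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


def times_faster(time_a: float, time_b: float) -> float:
    """How many times faster A is than B, since performance = 1 / execution time."""
    return time_b / time_a


run = RunTimes(user_cpu=0.9, system_cpu=0.1, other=0.5)  # hypothetical run
print(f"CPU time {run.cpu_time:.1f} s, elapsed {run.elapsed:.1f} s")
print(cpu_time_from_counts(2e9, 1.5, 3e9))  # 1.0
print(times_faster(10.0, 15.0))             # 1.5
```

The CPU-time formula also shows why clock speed alone is misleading: a faster clock helps only if the instruction count and CPI do not grow to compensate.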
Performance evaluation
Criteria of Performance
Execution time seems to measure the power of the
CPU
Elapsed time measures the performance of whole
system including OS and I/O
User is interested in elapsed time
Salespeople are interested in the highest performance number
that can be quoted
A performance analyst is interested in both
execution time and elapsed time
Coffee Time!