

L. ANJANEYULU, Dept. of ECE, N.I.T. Warangal


ENIAC-on-a-Chip Moore School of Electrical Engineering, University of



1950s: Shockley leaves Bell Labs to establish Shockley Labs in California. Some of the best young electronic engineers and solid-state physicists come to work with him, including Robert Noyce and Gordon Moore.
1969: Intel is a tiny start-up company in Santa Clara, headed by Noyce and Moore.
1970: Busicom places an order with Intel for custom calculator chips. Intel have no experience of custom-chip design and set out to design a general-purpose solution.
1971: Intel have problems translating the architecture into working chip designs, and the project runs late. Faggin joins Intel and solves the problems in weeks. The result is the Intel 4000 family (later renamed MCS-4, Microcomputer System 4-bit), comprising the 4001 (2-Kbit ROM), the 4002 (320-bit RAM), the 4003 (10-bit I/O shift register) and the 4004, a 4-bit CPU.


Intel 4004
Introduced in 1971, the Intel 4004 "Computer-on-a-Chip" was a 2,300-transistor device capable of performing 60,000 operations per second. It was the first-ever single-chip microprocessor and had approximately the same performance as the 18,000-vacuum-tube ENIAC. The 4-bit Intel C4004 ran at a clock speed of 108 kHz.

The Intel 4004

Federico Faggin designed the Intel 4004 processor. His initials were printed on the circuit.


Intel 4004 First Microcomputer NITW/ECE/LA

The Busicom Calculator

The Busicom calculator used five Intel 4001s, two 4002s, three 4003s and the 4004 CPU.

The original engineering prototype of the Busicom desktop printing calculator, the world's first commercial product to use a microprocessor.


Intel 8008
1972: Faggin begins work on an 8-bit processor, the Intel 8008. The prototype has serious problems with electrical charge leaking out of its memory circuits, and device physics, circuit design and layout become important new skills. The 8008 chip layout is completely redesigned and the chip is released, triggering a sudden surge in microprocessor interest. Intel's 8008 is well received, but system designers want increased speed, easier interfacing, and more I/O and instructions. The improved version, produced by Faggin, is the 8080. Faggin then leaves Intel to start his own company, Zilog, which later produces the Z80.

Federico Faggin : Zilog

Zilog produced the 3.5 MHz Zilog Z80 (a very popular processor taught in many universities) and, later, the 16-bit Z8000. The Z8000 was another great design, but Zilog struggled to provide good support: it was a new, inexperienced company with only a few hundred employees, while Intel had over 10,000 at the time.

The Zilog Z80

The Z80 microprocessor is an 8-bit CPU with a 16-bit address bus, so it can directly address 64 KB of memory. It is based on the 8080 and has a large instruction set. Programming features include an accumulator and six 8-bit registers (B, C, D, E, H, L) that can be paired as three 16-bit registers (BC, DE, HL). In addition to the general registers, a stack pointer, a program counter and two index registers (IX and IY, used as memory pointers) are provided.
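To make the register pairing concrete, here is a minimal Python sketch. It assumes the usual Z80 convention that the first register of a pair (B, D or H) holds the high byte and the second (C, E or L) the low byte; `pair` and `split` are hypothetical helper names for illustration, not part of any Z80 toolchain.

```python
from typing import Tuple


def pair(high: int, low: int) -> int:
    """Combine two 8-bit register values into one 16-bit register-pair value."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("register values must fit in 8 bits")
    return (high << 8) | low


def split(value: int) -> Tuple[int, int]:
    """Split a 16-bit register-pair value back into its (high, low) bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("register-pair value must fit in 16 bits")
    return (value >> 8) & 0xFF, value & 0xFF


# Hypothetical register contents: H = 0x12, L = 0x34
hl = pair(0x12, 0x34)
print(hex(hl))     # 0x1234
print(split(hl))   # (18, 52)
# A 16-bit pair can point at any byte of the 64 KB space (0x0000 to 0xFFFF)
print(0xFFFF + 1)  # 65536
```

Because a pair holds a full 16-bit value, HL (or IX/IY) can serve as a pointer anywhere in the 64 KB address space.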

Early Microcontrollers
1974: Motorola (originally car-radio manufacturers) had introduced transistors in the 1950s and now decide to make a late but serious effort in the microprocessor market. They announce their 8-bit 6800 processor. Though bulky and fraught with production problems, the 6800 has a good design.
1975: General Motors approach Motorola about a custom-built derivative of the 6800. Motorola's long experience with automobile manufacturers pays off, and Ford follow GM's lead.
1976: Intel introduce an 8-bit microcontroller, the MCS-48. They ship 251,000 units that year.
1980: Intel introduce the 8051, an 8-bit microcontroller with on-board EPROM memory. They ship 22 million units that year and 91 million in 1983.

The Intel 8086 (1978)

- 29,000 transistors
- Clock speeds: 5, 8 and 10 MHz
- Approximately 10 times the performance of the 8080
- 16-bit, assembly-language-compatible extension of the 8080 architecture
- All registers 16 bits wide; additional registers all have dedicated uses (extended accumulator architecture)
- IBM selects the 8088, an 8086 with an 8-bit external bus, as the processor for the IBM PC (early 1980)

Early Computers
1979: Motorola also announce the 16-bit 68000, widely regarded as the best microprocessor on the market at the time; it would be used in the Apple Macintosh launched in 1984. Intel look seriously at the competition (Motorola and Zilog) and implement 'Operation CRUSH', a huge campaign with a focused and trained workforce providing customer support, complete solutions and long-term product support. CRUSH proves an excellent strategy and the 8086 becomes the de facto standard. This success helps finance additions to their product range, one of which is the 8088, a 16-bit microprocessor with a reduced (8-bit) external bus.

The early Apple Macintosh


1981: IBM, having seen Apple's success, recognise a new personal computer market. They choose Intel over Motorola and Zilog (and their own proprietary processors) because of Intel's long-term commitment to the 8086 line. IBM selects the Intel 8088 for their PC, introduced in August. Intel bring out the 16-bit 80286 for the IBM PC AT, but it has weaknesses, most notably in virtual memory support. The newest 'killer' application software, Microsoft Windows, needs a more powerful processor.



Contemporary Microprocessors: 16/32-bit Processors (external 16-bit bus, internal 32-bit structure)
Motorola MC68010, National Semiconductor NS16032

Additional Functionality on the Chip

- Direct Memory Access (DMA) (Intel 80186)
- Virtual memory management (MC68010, Intel 80286)
- Optional coprocessor (Intel 8086/80286, NS16032)
- Extended address space

Microprocessor History
32-bit Processors
CISC Processors
Motorola MC680x0, Intel i386 / i486 / Pentium, National Semiconductor NS32x32
Concept of a processor family: binary compatibility, compatible with 16-bit processors

RISC Processors
Advanced Micro Devices Am29000 (~1987), Sun Microsystems SPARC, MIPS Technologies MIPS R2000 / MIPS R3000



Moore's Law

Dr. Gordon E. Moore co-founded Intel in 1968. His observation that the number of transistors on a chip doubles roughly every two years became known as Moore's Law.
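As a rough illustration, the doubling rule can be written as N(t) = N0 · 2^((t - t0)/2), where N0 is the transistor count in a base year t0. The sketch below assumes the 4004's 2,300 transistors in 1971 as the baseline; `moore_projection` is a hypothetical helper, and the output is a rule-of-thumb trend, not a prediction for any particular chip.

```python
def moore_projection(year: int, base_year: int = 1971, base_count: float = 2300,
                     doubling_period: float = 2.0) -> float:
    """Projected transistor count if the count doubles every `doubling_period` years."""
    return base_count * 2 ** ((year - base_year) / doubling_period)


for year in (1971, 1981, 1991, 2001):
    print(year, round(moore_projection(year)))
# 1971 2300
# 1981 73600
# 1991 2355200
# 2001 75366400
```

For 2000 the model gives roughly 53 million, the same order of magnitude as the 42 million listed for the Pentium 4 in the Technological Development table.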



Pentium Evolution (1)

8086: much more powerful, 16-bit, with an instruction cache that prefetches a few instructions
8088: 8-bit external bus, used in the first IBM PC
80286 (1982):
- Added new instructions to support memory management
- Added memory mapping and a multilevel protection scheme
- Added real addressing mode to support legacy 8086 code
- 16 MB of addressable memory (up from 1 MB)
- Protection features include: segment limit checking; read-only and execute-only segment options; up to four privilege levels to protect operating-system code (in several subdivisions, if desired) from application or user programs; and hardware task switching and local descriptor tables, which allow the operating system to protect application or user programs from each other.

80386 (1985): IA-32 architecture family
- Support for multitasking
- Additional registers (segment pointers); all general-purpose registers now 32 bits
- Address space now 32 bits, with several new addressing modes
- Provides a logical address space for each software process
- Added paging support under the existing segmented architecture
- Supports both a segmented memory model and a flat memory model
- Almost a general-purpose register machine
- The Intel386 processor includes six parallel stages: bus interface unit, code prefetch unit, instruction decode unit, execution unit, segment unit and paging unit

Intel486 processor
- More parallel execution capability than the Intel386: the instruction decode and execution units are organized into five pipelined stages, and each stage (when needed) operates in parallel with the others on up to five instructions in different stages of execution. Each stage can do its work on one instruction in one clock, so the Intel486 processor can execute as rapidly as one instruction per clock cycle.
- 8 KB on-chip first-level cache, increasing the percentage of instructions that can execute at the scalar rate of one per clock; memory-access instructions are included if the operand is in the first-level cache
- Integrated the x87 floating-point unit onto the processor
- New pins, bits and instructions to support more complex and powerful systems
- Second-level cache support
- Multiprocessor support
- Power management (for notebooks and laptops)
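The "one instruction per clock" claim can be checked with an idealized pipeline model. This Python sketch assumes every stage takes exactly one clock and that there are no stalls, cache misses or branch penalties, which real processors do not achieve; the function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: filling it takes `stages` clocks, then one instruction completes per clock."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


n, k = 1000, 5  # five stages, as in the Intel486 description
print(cycles_unpipelined(n, k))              # 5000
print(cycles_pipelined(n, k))                # 1004
print(round(cycles_pipelined(n, k) / n, 3))  # 1.004 clocks per instruction
```

Once the pipeline is full, the clocks-per-instruction figure approaches 1, which is the scalar rate the Intel486 aims for.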

Speeding it up
- Pipelining
- On-board cache
- On-board L1 and L2 cache
- Branch prediction
- Data flow analysis
- Speculative execution



Pentium Evolution (2)

80486: sophisticated, powerful cache and instruction pipelining; built-in maths co-processor

Pentium: superscalar, with multiple instructions executed in parallel

Pentium Pro: increased superscalar organization; aggressive register renaming; branch prediction; data flow analysis; speculative execution


Pentium Evolution (3)

Pentium II (1997)
- Pentium Pro + MMX (MultiMedia eXtensions) for graphics, video and audio processing
- Data bus: 64-bit; address bus: 36-bit
- L1 cache: 32 KB; L2 cache: 512 KB
- Processor core speed: 233 MHz - 450 MHz
- System bus: 100 MHz

Celeron = Pentium II without L2 cache; Celeron A: 128 KB L2 cache
Xeon = Pentium II + larger, full-speed L2 cache + .. (server CPU) (1998)

Scalability: can be scaled to 2, 4, 8 or more processors for high-end servers and workstations



Pentium III (1999)
- Data bus: 64-bit; address bus: 36-bit
- Processor core speed: 450 MHz - 1.1 GHz
- System bus: 133 MHz
- Cache speed upgrade (Advanced Transfer Cache)
- 70 new Streaming SIMD Extensions (SSE) instructions: 50 to improve floating-point performance, 12 to improve multimedia processing and 8 to improve the efficiency of the L1 cache
- Pentium III Xeon processor

Additional floating-point instructions for 3D graphics


Pentium 4 (2000)
Further floating-point and multimedia enhancements
- Data bus: 64-bit; address bus: 36-bit
- Processor core speed: 2 GHz - 3.2 GHz
- System bus: 400 MHz - 800 MHz
  - 800 MHz: Pentium 4 C at 3.20, 3, 2.80, 2.60 and 2.40 GHz
  - 533 MHz: Pentium 4 B at 3.06, 2.80, 2.66, 2.53, 2.40 and 2.26 GHz
  - 400 MHz: Pentium 4 A at 2.60, 2.50, 2.40, 2.20 and 2 GHz
- Hyper-Threading Technology

Itanium: 64-bit
See Intel web pages for detailed information on processors

Technological Development

Model         Year   Number of transistors
4004          1971        2,250
8008          1972        2,500
8080          1974        5,000
8086          1978       29,000
80286         1982      120,000
80386         1985      275,000
80486         1989    1,180,000
Pentium       1993    3,100,000
Pentium II    1997    7,500,000
Pentium III   1999   24,000,000
Pentium 4     2000   42,000,000

(Chart: Technological Development, number of transistors on a logarithmic scale vs. year of introduction, 1971-2000, from the 4004 to the Pentium 4)





1970s Processors:
Processor   Introduced  Clock Speeds    Bus Width  Transistors  Addressable Memory  Virtual Memory
4004        1971        108 kHz         4 bits     2,300        640 bytes           --
8008        1972        108 kHz         8 bits     3,500        16 KB               --
8080        1974        2 MHz           8 bits     6,000        64 KB               --
8086        1978        5, 8, 10 MHz    16 bits    29,000       1 MB                --
8088        1979        5, 8 MHz        8 bits     29,000       1 MB                --



1980s Processors:
Processor   Introduced  Clock Speeds    Bus Width  Transistors  Addressable Memory  Virtual Memory
80286       1982        6 - 12.5 MHz    16 bits    134,000      16 MB               1 GB
386 DX      1985        16 - 33 MHz     32 bits    275,000      4 GB                64 TB
386 SX      1988        16 - 33 MHz     16 bits    275,000      4 GB                64 TB
486 DX CPU  1989        25 - 50 MHz     32 bits    1.2 million  4 GB                64 TB


1990s Processors:
Processor    Introduced  Clock Speeds    Bus Width  Transistors    Addressable Memory  Virtual Memory
486 SX       1991        16 - 33 MHz     32 bits    1.185 million  4 GB                64 TB
Pentium      1993        60 - 166 MHz    32 bits    3.1 million    4 GB                64 TB
Pentium Pro  1995        150 - 200 MHz   64 bits    5.5 million    64 GB               64 TB
Pentium II   1997        200 - 300 MHz   64 bits    7.5 million    64 GB               64 TB



Recent Processors:
Processor    Introduced  Clock Speeds    Bus Width  Transistors  Addressable Memory  Virtual Memory
Pentium III  1999        450 MHz         64 bits    9.5 million  64 GB               64 TB
Pentium 4    2000        1.3 - 1.8 GHz   64 bits    42 million   64 GB               64 TB
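The Addressable Memory column follows from the width of the physical address bus: an n-bit byte address can select 2^n bytes. The sketch below assumes address-bus widths of 20 bits (8086/8088), 24 bits (80286), 32 bits (386 DX, 486) and 36 bits (Pentium Pro onward, matching the 36-bit address bus quoted for the Pentium II); `addressable_bytes` and `human` are hypothetical helper names.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]  # binary units: 1 KB = 1024 bytes


def addressable_bytes(address_bits: int) -> int:
    """An n-bit byte address can select 2**n distinct bytes."""
    return 2 ** address_bits


def human(n_bytes: int) -> str:
    """Format a byte count using the largest binary unit that divides it exactly."""
    i = 0
    while n_bytes >= 1024 and n_bytes % 1024 == 0 and i < len(UNITS) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {UNITS[i]}"


# Assumed physical address-bus widths (see the lead-in)
BUS_WIDTHS = [
    ("8086 / 8088", 20),
    ("80286", 24),
    ("386 DX, 486", 32),
    ("Pentium Pro and later", 36),
]

for name, bits in BUS_WIDTHS:
    print(f"{name:<22} {bits}-bit address -> {human(addressable_bytes(bits))}")
```

The same arithmetic explains the 64 TB virtual-memory figure: 64 TB is 2^46 bytes, so `human(2 ** 46)` returns "64 TB".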



Contemporary Microprocessors
64/32-bit Processors
Sun Microsystems SuperSPARC, Motorola 88110, IBM/Motorola PowerPC 601 (MPC601)

Modern Processors
- 64-bit structure
- Internal parallelism: instruction pipelining, arithmetic pipelining
- Instruction and data caches
- Advanced memory and peripheral connections


Performance Mismatch
Processor speed has increased and memory capacity has increased, but memory speed lags behind processor speed!



DRAM and Processor Characteristics



Intel Itanium 2 (McKinley)

A 64-bit processor with 221 million transistors! How are they used? What will we do as transistor counts continue to grow? Most of the chip is used for memories, instruction decoding and dynamic scheduling. Why is it done this way? How much more efficient could it be if more of the area went to actual processing?

Even More Recent Example

- Runs the 64-bit IA-64 ISA
- Die: 3.74 cm², 0.13 µm process, 410M transistors
- 1.5 GHz core, 1.3 V logic
- 130 W power consumption!
- 6.4 GB/s bus
- Cost: $2,247 - $4,226
- 9 MB L3 cache later this year

AMD Opteron (100 Million Transistors)





Cyrix III
- Developed by National Semiconductor
- 133 MHz front-side bus (66 MHz and 100 MHz FSB are also supported)
- 256 KB integrated L2 cache and 64 KB integrated L1 cache
- 3DNow! SIMD instructions in a dual-pipelined FPU; as with the MII, MMX is also supported
- Superscalar design with two seven-stage pipelines, allowing two instruction streams to be processed simultaneously
- Two-level translation lookaside buffer (TLB) and a 512-entry branch target buffer
- Out-of-order execution through register renaming, plus data forwarding and bypassing to resolve data dependencies between pipelines
- Speculative execution after a predicted branch is also supported
- 15% to 20% cheaper than a comparable Celeron; VIA hoped to capture 10% of the market
- Subject of a lot of legal action by Intel, but VIA is still in business



First Implementation of Key Features: Montecito

(Die diagram: Core 1 and Core 2, each with its own L3 cache)

Key Processor Features

- Intel's first dual-core processor
- Intel's first processor with more than 1 billion transistors
- 24 MB L3 cache
- 2-way multi-threading
- Compatible with existing Itanium 2-based systems
- 1 MB L2I (L2 instruction cache)
- Power management / frequency boost (Foxton)
- Targeting H2 2005
- 1.7 billion transistors
- 2 × 12 MB L3 caches with Pellston

Multiple cores, multiple threads and L3 cache on ONE die


Intel's Latest: The Pentium 4 2.4 GHz

478-pin packaging



Selecting a Microprocessor
Issues
- Technical: speed, power, size, cost
- Other: development environment, prior expertise, licensing, etc.
Speed: how do we evaluate a processor's speed?
- Clock speed, but instructions per cycle may differ
- Instructions per second, but work per instruction may differ
- Dhrystone: synthetic benchmark developed in 1984; results are given in Dhrystones/sec
- MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX-11/780), a.k.a. Dhrystone MIPS; commonly used today. So 750 MIPS = 750 × 1757 = 1,317,750 Dhrystones per second
- SPEC: set of more realistic benchmarks, but oriented to desktops
- EEMBC (EDN Embedded Benchmark Consortium): suites of benchmarks for automotive, consumer electronics, networking, office automation and telecommunications
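The Dhrystone MIPS arithmetic and the warning that clock speed alone can mislead can both be sketched in a few lines of Python. The conversion uses the slide's 1757 Dhrystones/s per MIPS. The second part uses the standard relation CPU time = instruction count × cycles per instruction (CPI) / clock rate; the two processors and their CPI values are made-up numbers for illustration, and the function names are hypothetical.

```python
DHRYSTONES_PER_MIPS = 1757  # VAX-11/780 score, defined as 1 Dhrystone MIPS


def mips_to_dhrystones(mips: float) -> float:
    """Convert a Dhrystone MIPS rating to Dhrystones per second."""
    return mips * DHRYSTONES_PER_MIPS


def dhrystones_to_mips(dhrystones_per_sec: float) -> float:
    """Convert Dhrystones per second to a Dhrystone MIPS rating."""
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS


def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instruction count * cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


print(mips_to_dhrystones(750))        # 1317750
print(dhrystones_to_mips(1_317_750))  # 750.0

# Hypothetical processors running the same 10**9-instruction program
n = 10**9
fast_clock = cpu_time(n, cpi=2.0, clock_hz=2.0e9)  # 2 GHz, 2 cycles per instruction
slow_clock = cpu_time(n, cpi=1.0, clock_hz=1.5e9)  # 1.5 GHz, 1 cycle per instruction
print(f"{fast_clock:.3f} s vs {slow_clock:.3f} s")  # 1.000 s vs 0.667 s
```

Despite its lower clock rate, the second processor finishes first because it needs fewer cycles per instruction.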

Which has higher performance?

Time to do the task (execution time): execution time, response time, latency
Tasks per day, hour, week, second, ns, etc. (performance): throughput, bandwidth
Response time and throughput are often in opposition.
- Response time: time to complete a task
- Throughput: total amount of work done per unit time
- Execution time (CPU time) = user CPU time (time spent in the program) + system CPU time (time spent in the OS)
- Elapsed time = execution time + time spent on I/O and time sharing
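The difference between user CPU time, system CPU time and elapsed time can be observed directly. This Python sketch uses `os.times()` for the user and system CPU components and `time.perf_counter()` for elapsed (wall-clock) time. The workload is hypothetical, and the `time.sleep` call stands in for I/O or time sharing: it adds to elapsed time but not to CPU time.

```python
import os
import time


def measure(task):
    """Run task() and return (user CPU, system CPU, elapsed) times in seconds."""
    t0, wall0 = os.times(), time.perf_counter()
    task()
    t1, wall1 = os.times(), time.perf_counter()
    user = t1.user - t0.user          # time spent executing the program itself
    system = t1.system - t0.system    # time spent in the OS on the program's behalf
    elapsed = wall1 - wall0           # wall-clock time, including waiting
    return user, system, elapsed


def busy_then_wait():
    """Hypothetical workload: some computation, then a pause that uses no CPU."""
    sum(i * i for i in range(2_000_000))  # mostly user CPU time
    time.sleep(0.5)                        # counts toward elapsed time only


user, system, elapsed = measure(busy_then_wait)
print(f"user={user:.2f}s system={system:.2f}s elapsed={elapsed:.2f}s")
```

On a typical run, user time reflects the loop, system time stays small, and elapsed time exceeds their sum by roughly the 0.5 s spent sleeping.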

Performance evaluation

Criteria of Performance
- Execution time seems to measure the power of the CPU
- Elapsed time measures the performance of the whole system, including OS and I/O
- The user is interested in elapsed time
- Salespeople are interested in the highest performance figure that can be quoted
- The performance analyst is interested in both execution time and elapsed time



Coffee Time!