
Intel 80186

The Intel 80186, also known as the iAPX 186,[4] or just 186, is a microprocessor and microcontroller introduced in 1982. It was based on the Intel 8086 and, like it, had a 16-bit external data bus multiplexed with a 20-bit address bus. It was also available as the 80188, with an 8-bit external data bus.

An Intel A80186 processor with a gray heat spreader.

General Info
Launched: 1982
Discontinued: September 28, 2007[1]
Common manufacturer(s): Intel, AMD, Fujitsu, Siemens AG[2]
Performance
Max. CPU clock rate: 6 MHz to 25 MHz
FSB speeds: 6 MHz to 25 MHz
Data width: 16 bits
Address width: 20 bits
Architecture and classification
Min. feature size: 3 µm[3]
Instruction set: x86-16
Physical specifications
Transistors: 55,000
Co-processor: 8087 and later the 80187 (for the 80186 only)
Package(s): 68-pin PLCC, 68-pin LCC, 100-pin PQFP (Engineering Sample Only), 68-pin PGA
Socket(s): PGA68, PLCC-68 (variant), LCC-68 (variant)
Products, models, variants
Variant(s): Intel 80188
History
Predecessor: Intel 8088
Successor: Intel 80386 (the 80286 was also introduced in early 1982, and thus contemporary with the 80186)
Description

The 80186 series was generally intended for embedded systems, as microcontrollers with external memory. Therefore, to reduce the number of integrated circuits required, it included features such as a clock generator, interrupt controller, timers, wait state generator, DMA channels, and external chip select lines.
Features and performance

The initial clock rate of the 80186 was 6 MHz, but due to more hardware available for the microcode to use, especially for address calculation, many individual instructions ran faster than on an 8086 at the same clock frequency. For instance, the common register+immediate[5] addressing mode was significantly faster than on the 8086, especially when a memory location was both (one of the) operand(s) and the destination. Multiply and divide also showed great improvement, being several times as fast as on the original 8086, and multi-bit shifts were done almost four times as quickly as in the 8086.
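The address-calculation speedup is easiest to see against what the hardware must compute: every memory reference on the 8086 and 80186 forms a 20-bit physical address from a 16-bit segment and a 16-bit offset. A minimal C sketch of that arithmetic (the function name is illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* 8086/80186 real-mode address translation: the 16-bit segment is
   shifted left four bits and added to the 16-bit offset, producing
   a 20-bit physical address that wraps at 1 MB. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
}

int main(void)
{
    /* F000:FFF0 is the reset vector location on these CPUs. */
    printf("%05X\n", (unsigned)physical_address(0xF000, 0xFFF0)); /* FFFF0 */
    return 0;
}
```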

A few new instructions were introduced with the 80186 (referred to as the 8086-2 instruction set in some datasheets): enter/leave (replacing several instructions when handling stack frames), pusha/popa (push/pop all general registers), bound (check array index against bounds), and ins/outs (input/output of string). A useful immediate mode was added for the push, imul, and multi-bit shift instructions. These instructions were also included in the contemporary 80286 and in successor chips. (The instruction set of the 80286 is a superset of the 80186's, plus new instructions for protected mode.)
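As an illustration of what enter/leave condense, the comments in the following C sketch show the classic multi-instruction stack-frame prologue and epilogue that compilers emitted before the 80186 (the function itself is an arbitrary example):

```c
#include <stdint.h>

/* Before the 80186, a 16-bit compiler's function prologue took three
   instructions:

       push bp        ; save the caller's frame pointer
       mov  bp, sp    ; establish the new frame
       sub  sp, 8     ; reserve 8 bytes for locals

   and the epilogue two more before RET:

       mov  sp, bp
       pop  bp

   On the 80186 and later, these collapse to ENTER 8, 0 and LEAVE.
   Both sequences set up and tear down the frame for a function
   such as: */
int16_t sum_two(int16_t a, int16_t b)
{
    int16_t locals[4];   /* the 8 bytes reserved by "sub sp, 8" / ENTER */
    locals[0] = a;
    locals[1] = b;
    return (int16_t)(locals[0] + locals[1]);
}
```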
The (redesigned) CMOS version, the 80C186, introduced DRAM refresh, a power-save mode, and a direct interface to the 8087 or 80187 floating-point numeric coprocessor.
Intel 80286
The Intel 80286[3] (also marketed as the iAPX 286[4] and often
called Intel 286) is a 16-bit microprocessor that was introduced on
February 1, 1982. It was the first 8086-based CPU with separate,
non-multiplexed address and data buses and also the first with
memory management and wide protection abilities. The 80286 used
approximately 134,000 transistors in its original nMOS (HMOS)
incarnation and, just like the contemporary 80186,[5] it could
correctly execute most software written for the earlier Intel 8086 and
8088 processors.[6]

The 80286 was employed for the IBM PC/AT, introduced in 1984,
and then widely used in most PC/AT compatible computers until the
early 1990s.

An Intel A80286-8 processor with a gray ceramic heat spreader

General Info
Launched: 1982
Discontinued: 1991[1]
Common manufacturer(s): Intel, IBM, AMD, Harris (Intersil), Siemens AG, Fujitsu
Performance
Max. CPU clock rate: 4 MHz to 25 MHz
FSB speeds: 4 MHz to 25 MHz
Data width: 16 bits
Address width: 24 bits
Architecture and classification
Min. feature size: 1.5 µm[2]
Instruction set: x86-16 (with MMU)
Physical specifications
Transistors: 134,000
Co-processor: Intel 80287
Package(s): 68-pin PLCC, 68-pin LCC, 100-pin PQFP (Engineering Sample Only), 68-pin PGA
Socket(s): PGA68, PLCC-68 (variant), LCC-68 (variant)
History
Predecessor: 8086, 8088 (while the 80186 was contemporary)
Successor: Intel 80386
History and performance

Intel's first 80286 chips were specified for a maximum clock rate of 4, 6 or 8 MHz and later releases for 12.5 MHz. AMD and Harris later produced 16 MHz, 20 MHz and 25 MHz parts, respectively. Intersil and Fujitsu also designed fully static CMOS versions of Intel's original depletion-load nMOS implementation, largely aimed at battery-powered devices.

On average, the 80286 was reportedly measured to have a speed of about 0.21 instructions per clock on "typical" programs,[7] although it could be significantly faster on optimized code and in tight loops, as many instructions could execute in 2 clock cycles each. The 6 MHz, 10 MHz and 12 MHz models were reportedly measured to operate at 0.9 MIPS, 1.5 MIPS and 2.66 MIPS respectively.[8]

The later E-stepping level of the 80286 was free of the several significant errata that caused problems for programmers and operating-system writers in the earlier B-step and C-step CPUs (common in the AT and AT clones).[9]
Architecture

Intel did not expect personal computers to use the 286.[10] The CPU was designed for multi-user systems with multitasking applications, including communications (such as automated PBXs) and real-time process control. It had 134,000 transistors and consisted of four independent units: address unit, bus unit, instruction unit and execution unit, organized into a loosely coupled (buffered) pipeline just as in the 8086. The significantly increased performance over the 8086 was primarily due to the non-multiplexed address and data buses, more address-calculation hardware (most importantly, a dedicated adder) and a faster (more hardware-based) multiplier.[11] It was produced in a 68-pin package, including PLCC (plastic leaded chip carrier), LCC (leadless chip carrier) and PGA (pin grid array) packages.[12]

The performance increase of the 80286 over the 8086 (or 8088) could be more than 100% per clock cycle in many programs (i.e., a doubled performance at the same clock speed). This was a large increase, fully comparable to the speed improvements around a decade later when the i486 (1989) or the original Pentium (1993) were introduced. This was partly due to the non-multiplexed address and data buses, but mainly to the fact that address calculations (such as base+index) were less expensive. They were performed by a dedicated unit in the 80286, while the older 8086 had to do effective address computation using its general ALU, consuming several extra clock cycles in many cases. Also, the 80286 was more efficient in the prefetch of instructions, buffering, execution of jumps, and in complex microcoded numerical operations such as MUL/DIV than its predecessor.[11]

AMD 80286 (16 MHz version)

The 80286 included, in addition to all of the 8086 instructions, all of the new instructions of the 80186: ENTER, LEAVE, BOUND, INS, OUTS, PUSHA, POPA, PUSH immediate, IMUL immediate, and immediate shifts and rotates. The 80286 also added new instructions for protected mode: ARPL, CLTS, LAR, LGDT, LIDT, LLDT, LMSW, LSL, LTR, SGDT, SIDT, SLDT, SMSW, STR, VERR, and VERW. Some of the instructions for protected mode can (or must) be used in real mode to set up and switch to protected mode, and a few (such as SMSW and LMSW) are useful for real mode itself.

The Intel 80286 had a 24-bit address bus and was able to address up to 16 MB of RAM, compared to the
1 MB addressability of its predecessor. However, memory cost and the initial rarity of software using the
memory above 1 MB meant that 80286 computers were rarely shipped with more than one megabyte of
RAM.[11] Additionally, there was a performance penalty involved in accessing extended memory from real
mode (in which DOS, the dominant PC operating system until the mid-1990s, ran), as noted below.
Features

Protected mode

The 286 was the first of the x86 CPU family to support protected virtual-address mode, commonly called "protected mode". In addition, it was the first commercially available microprocessor with on-chip MMU capabilities (systems using the contemporaneous Motorola 68010 and NS320xx could be equipped with an optional MMU controller). This would allow IBM compatibles to have advanced multitasking OSes for the first time and compete in the Unix-dominated server/workstation market.

Intel 80286 die shot

Several additional instructions were introduced in the protected mode of the 80286, which are helpful for multitasking operating systems.

Siemens 80286 (10 MHz version)

Another important feature of the 80286 is the prevention of unauthorized access. This is achieved by:

Forming different segments for data, code, and stack, and preventing their overlapping.
Assigning privilege levels to each segment. Segments with lower privilege levels cannot access segments with higher privilege levels.
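Both mechanisms hinge on the segment descriptor, which records a segment's location, size, and privilege level. A C sketch of the 80286's 8-byte descriptor layout, with illustrative field names (the bit positions follow the published 286 format):

```c
#include <stdint.h>

/* 80286 segment descriptor (8 bytes). Field names are illustrative. */
struct descriptor286 {
    uint16_t limit;      /* segment size minus one, in bytes        */
    uint16_t base_low;   /* base address, bits 0-15                 */
    uint8_t  base_high;  /* base address, bits 16-23 (24-bit space) */
    uint8_t  access;     /* P | DPL(2 bits) | S | type(3) | A       */
    uint16_t reserved;   /* must be zero on the 80286               */
};

/* The descriptor privilege level (DPL) sits in bits 5-6 of the
   access byte; lower numbers are more privileged (0 = kernel). */
static unsigned dpl(const struct descriptor286 *d)
{
    return (d->access >> 5) & 3u;
}
```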

In the 80286 (and in its co-processor, the Intel 80287), arithmetic operations can be performed on the following different types of numbers:

unsigned packed decimal,
unsigned binary,
unsigned unpacked decimal,
signed binary,
floating-point numbers (only with an 80287).

By design, the 286 could not revert from protected mode to the basic 8086-compatible real address mode ("real mode") without a hardware-initiated reset. In the PC/AT introduced in 1984, IBM added external circuitry, as well as specialized code in the ROM BIOS and the 8042 peripheral microcontroller, to enable software to cause the reset, allowing real-mode reentry while retaining active memory and returning control to the program that initiated the reset. (The BIOS is necessarily involved because it obtains control directly whenever the CPU resets.) Though it worked correctly, the method imposed a huge performance penalty.

IBM 80286 (8 MHz version)

In theory, real-mode applications could be directly executed in 16-bit protected mode if certain rules (newly
proposed with the introduction of the 80286) were followed; however, as many DOS programs did not
conform to those rules, protected mode was not widely used until the appearance of its successor, the 32-bit
Intel 80386, which was designed to go back and forth between modes easily and to provide an emulation of
real mode within protected mode. When Intel designed the 286, it was not designed to be able to multitask
real-mode applications; real mode was intended to be a simple way for a bootstrap loader to prepare the
system and then switch to protected mode; essentially, in protected mode the 80286 was designed to be a
new processor with many similarities to its predecessors, while real mode on the 80286 was offered for
smaller-scale systems that could benefit from a more advanced version of the 80186 CPU core, with
advantages such as higher clock rates, faster instruction execution (measured in clock cycles), and
unmultiplexed buses, but not the 24-bit (16 MB) memory space.

To support protected mode, new instructions have been added: ARPL, VERR, VERW, LAR, LSL, SMSW,
SGDT, SIDT, SLDT, STR, LMSW, LGDT, LIDT, LLDT, LTR, CLTS. There are also new exceptions
(internal interrupts): invalid opcode, coprocessor not available, double fault, coprocessor segment overrun,
stack fault, segment overrun/general protection fault, and others only for protected mode.

OS support

The protected mode of the 80286 was not utilized until many years after its release, in part because of the
high cost of adding extended memory to a PC, but also because of the need for software to support the large
user base of 8086 PCs. For example, in 1986 the only program that made use of it was VDISK, a RAM disk
driver included with PC DOS 3.0 and 3.1. DOS could utilize the additional RAM available in protected
mode (extended memory) either via a BIOS call (INT 15h, AH=87h), as a RAM disk, or as emulation of
expanded memory.[11] The difficulty lay in the incompatibility of older real-mode DOS programs with
protected mode. They simply could not natively run in this new mode without significant modification. In
protected mode, memory management and interrupt handling were done differently than in real mode. In
addition, DOS programs typically would directly access data and code segments that did not belong to them,
as real mode allowed them to do without restriction; in contrast, the design intent of protected mode was to
prevent programs from accessing any segments other than their own unless special access was explicitly
allowed. While it was possible to set up a protected-mode environment that allowed all programs access to
all segments (by putting all segment descriptors into the GDT and assigning them all the same privilege
level), this undermined nearly all of the advantages of protected mode except the extended (24-bit) address
space. The choice that OS developers faced was either to start from scratch and create an OS that would not
run the vast majority of the old programs, or to come up with a version of DOS that was slow and ugly (i.e.,
ugly from an internal technical viewpoint) but would still run a majority of the old programs. Protected
mode also did not provide a significant enough performance advantage over the 8086-compatible real mode
to justify supporting its capabilities; actually, except for task switches when multitasking, it actually yielded
only a performance disadvantage, by slowing down many instructions through a litany of added privilege
checks. In protected mode, registers were still 16-bit, and the programmer was still forced to use a memory
map composed of 64 kB segments, just like in real mode.[13]

In January 1985, Digital Research previewed the Concurrent DOS 286 1.0 operating system developed in
cooperation with Intel. The product would function strictly as an 80286 native-mode (i.e. protected-mode)
operating system, allowing users to take full advantage of the protected mode to perform multi-user,
multitasking operations while running 8086 emulation.[14][15][16] This worked on the B-1 prototype step of
the chip, but Digital Research discovered problems with the emulation on the production level C-1 step in
May, which would not allow Concurrent DOS 286 to run 8086 software in protected mode. The release of
Concurrent DOS 286 was delayed until Intel would develop a new version of the chip.[14] In August, after
extensive testing on E-1 step samples of the 80286, Digital Research acknowledged that Intel corrected all
documented 286 errata, but said that there were still undocumented chip performance problems with the
prerelease version of Concurrent DOS 286 running on the E-1 step. Intel said that the approach Digital
Research wished to take in emulating 8086 software in protected mode differed from the original
specifications. Nevertheless, in the E-2 step, they implemented minor changes in the microcode that would
allow Digital Research to run emulation mode much faster.[9] IBM originally chose DR Concurrent DOS 286 as the basis of its IBM 4680 computer for IBM Plant System products and point-of-sale terminals in 1986, naming the system IBM 4680 OS.[17] Digital Research's FlexOS 286 version 1.3, a derivative of Concurrent DOS 286, was developed in 1986, introduced in January 1987, and later adopted by IBM for their IBM 4690 OS, but the same limitations affected it.
The problems led to Bill Gates famously referring to the 80286 as a "brain-dead chip",[18] since it was clear
that the new Microsoft Windows environment would not be able to run multiple MS-DOS applications with
the 286. It was arguably responsible for the split between Microsoft and IBM, since IBM insisted that OS/2,
originally a joint venture between IBM and Microsoft, would run on a 286 (and in text mode).

Other operating systems that used the protected mode of the 286 were Microsoft Xenix (around 1984),[19]
Coherent,[20] and Minix.[21] These were less hindered by the limitations of the 80286 protected mode
because they did not aim to run MS-DOS applications or other real-mode programs. In its successor 80386
chip, Intel enhanced the protected mode to address more memory and also added the separate virtual 8086
mode, a mode within protected mode with much better MS-DOS compatibility, in order to satisfy the
diverging needs of the market.[22]
Intel 80386
The Intel 80386, also known as i386 or just 386, is a 32-bit
microprocessor introduced in 1985.[2] The first versions had 275,000
transistors[3] and were the CPU of many workstations and high-end
personal computers of the time. As the original implementation of
the 32-bit extension of the 80286 architecture,[4] the 80386
instruction set, programming model, and binary encodings are still
the common denominator for all 32-bit x86 processors, which is
termed the i386-architecture, x86, or IA-32, depending on context.

The 32-bit 80386 can correctly execute most code intended for the earlier 16-bit processors such as the 8086 and 80286 that were ubiquitous in early PCs. (Following the same tradition, modern 64-bit x86 processors are able to run most programs written for older x86 CPUs, all the way back to the original 16-bit 8086 of 1978.) Over the years, successively newer implementations of the same architecture have become several hundreds of times faster than the original 80386 (and thousands of times faster than the 8086).[5] A 33 MHz 80386 was reportedly measured to operate at about 11.4 MIPS.[6]

The 80386 was introduced in October 1985, while manufacturing of the chips in significant quantities commenced in June 1986.[7][8] Mainboards for 80386-based computer systems were cumbersome and expensive at first, but manufacturing was rationalized upon the 80386's mainstream adoption. The first personal computer to make use of the 80386 was designed and manufactured by Compaq[9] and marked the first time a fundamental component in the IBM PC compatible de facto standard was updated by a company other than IBM.

In May 2006, Intel announced that 80386 production would stop at the end of September 2007.[10] Although it had long been obsolete as a personal computer CPU, Intel and others had continued making the chip for embedded systems. Such systems using an 80386 or one of many derivatives are common in aerospace technology and electronic musical instruments, among others. Some mobile phones also used (later fully static CMOS variants of) the 80386 processor, such as the BlackBerry 950[11] and Nokia 9000 Communicator. Linux continued to support 80386 processors until December 11, 2012, when the kernel cut 386-specific instructions in version 3.8.[12]

An Intel 80386DX 16 MHz processor with a gray ceramic heat spreader.

General Info
Launched: October 1985
Discontinued: September 28, 2007[1]
Common manufacturer(s): Intel, AMD, IBM
Performance
Max. CPU clock rate: 12 MHz to 40 MHz
Data width: 32 bits (386SX: 16 bits)
Address width: 32 bits (386SX: 24 bits)
Architecture and classification
Min. feature size: 1.5 µm to 1 µm
Instruction set: x86-32
Physical specifications
Transistors: 275,000
Co-processor: Intel 80387
Package(s): 132-pin PGA, 132-pin PQFP; SX variant: 88-pin PGA, 100-pin BQFP with 0.635 mm pitch
Socket(s): PGA132
History
Predecessor: Intel 80286
Successor: Intel 80486

Intel A80386DX-20 CPU die image

Architecture
The processor was a significant evolution in the x86
architecture, and extended a long line of processors that
stretched back to the Intel 8008. The predecessor of the
80386 was the Intel 80286, a 16-bit processor with a
segment-based memory management and protection
system. The 80386 added a three-stage instruction pipeline, extended the architecture from 16 bits to 32 bits, and added an on-chip memory management unit.
This paging translation unit made it much easier to
implement operating systems that used virtual memory.
It also offered support for register debugging.

The 80386 featured three operating modes: real mode, protected mode and virtual mode. The protected mode, which debuted in the 286, was extended to allow the 386 to address up to 4 GB of memory. The all-new virtual 8086 mode (or VM86) made it possible to run one or more real-mode programs in a protected environment, although some programs were not compatible.

The ability for the 386 to be set up to act as though it had a flat memory model in protected mode, despite the fact that it uses a segmented memory model in all modes, was arguably the most important feature change for the x86 processor family until AMD released x86-64 in 2003.

Several new instructions were added in the 386: BSF, BSR, BT, BTS, BTR, BTC, CDQ, CWDE, LFS, LGS, LSS, MOVSX, MOVZX, SETcc, SHLD, SHRD.

Two new segment registers (FS and GS) were added for general-purpose programs, the single Machine Status Word of the 286 grew into the eight control registers CR0–CR7, and the debug registers DR0–DR7 were added for hardware breakpoints. New forms of the MOV instruction are used to access them.

Intel 80386 registers (from the accompanying diagram): main registers EAX, EBX, ECX and EDX, each accessible as a 16-bit register (AX, BX, CX, DX) or an 8-bit register (AL, BL, CL, DL); index registers ESI (source index), EDI (destination index), EBP (base pointer) and ESP (stack pointer); the program counter EIP (instruction pointer); the 16-bit segment selectors CS, DS, ES, FS, GS and SS; and the EFLAGS status register, which carries the carry, parity, adjust, zero, sign, trap, interrupt-enable, direction and overflow flags, the IOPL field, the nested-task bit, and the 386-specific resume (RF) and virtual-8086 (VM) bits.

Chief architect in the development of the 80386 was John H. Crawford.[13] He was responsible for extending the 80286 architecture and instruction set to 32 bits, and then led the microprogram development for the 80386 chip. The 80486 and P5 Pentium line of processors were descendants of the 80386 design.

Datatypes of 80386

The following data types are directly supported and thus implemented by one or more 80386 machine instructions; these data types are briefly described here.[14]

Bit (boolean value), bit field (group of up to 32 bits) and bit string (up to 4 Gbit in length).
8-bit integer (byte), either signed (range −128..127) or unsigned (range 0..255).
16-bit integer, either signed (range −32,768..32,767) or unsigned (range 0..65,535).
32-bit integer, either signed (range −2³¹..2³¹−1) or unsigned (range 0..2³²−1).
64-bit integer, either signed (range −2⁶³..2⁶³−1) or unsigned (range 0..2⁶⁴−1).
Offset, a 16- or 32-bit displacement referring to a memory location (using any addressing mode).
Pointer, a 16-bit selector together with a 16- or 32-bit offset.
Character (8-bit character code).
String, a sequence of 8-, 16- or 32-bit words (up to 4 Gbit in length).[15]
BCD, decimal digits (0..9) represented by unpacked bytes.
Packed BCD, two BCD digits in one byte (range 0..99).
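To make the last two items concrete, a small C sketch (helper names are mine) converting between a binary value and packed BCD:

```c
#include <assert.h>
#include <stdint.h>

/* Packed BCD: two decimal digits per byte, tens digit in the high
   nibble, ones digit in the low nibble (range 0..99). */
static uint8_t bcd_pack(uint8_t value)
{
    assert(value <= 99);
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Recover the binary value from a packed BCD byte; unpacked BCD
   would instead store each digit in its own byte. */
static uint8_t bcd_unpack(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}

int main(void)
{
    assert(bcd_pack(42) == 0x42);    /* digits map directly to nibbles */
    assert(bcd_unpack(0x99) == 99);
    return 0;
}
```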
Intel 80486
The Intel 80486, also known as the i486 or 486, is the successor model of 32-bit x86 microprocessor to the Intel 80386. Introduced in 1989, the 80486 improved on the performance of the 80386DX thanks to its on-die L1 cache and floating-point unit, as well as an improved, five-stage, tightly coupled pipelined design. It was the first x86 chip to use more than a million transistors. It represents the fourth generation of binary-compatible CPUs since the original 8086 of 1978.

The initial model, the 80486DX, was introduced with 25 and 33 MHz models. Later a 50 MHz part was added, then clock-doubled DX2/50 and DX2/66 parts, and later still, clock-tripled DX4/75 and DX4/100 ones.

The 486DX was later supplemented with the cheaper 80486SX, which was also available in 16 and 20 MHz variants. The "SX" and "DX" designations matched those of the 80386DX and 80386SX, but had different meanings. For the 486, the SX designation indicated no on-chip FPU. In early 486SX units, the FPU was present but disabled; later models removed it entirely. A supplementary 80487SX upgrade chip was also offered, but this was not an FPU; it was an entire, complete 80486 replacement processor that disabled the original SX part.

A 50 MHz 80486 executes around 40 million instructions per second on average and is able to reach 50 MIPS peak performance.

The exposed die of an Intel 80486DX2 microprocessor

General Info
Launched: April 1989
Discontinued: September 28, 2007
Common manufacturer(s): Intel, IBM, AMD, Texas Instruments, Harris Semiconductor, UMC, SGS-Thomson
Performance
Max. CPU clock rate: 16 MHz to 100 MHz
FSB speeds: 16 MHz to 50 MHz
Data width: 32 bits[1]
Address width: 32 bits[1]
Virtual address width: 32 bits (linear); 46 bits (logical)[1]
Architecture and classification
Min. feature size: 1 µm to 0.6 µm
Instruction set: x86 including x87 (except for "SX" models)
Physical specifications
Co-processor: Intel 80487SX
Package(s): PGA (Socket 1, 2, 3), 196-pin PQFP, 208-pin SQFP
History
Predecessor: Intel 80386
Successor: Pentium (P5)
Background

The 80486 was announced at Spring Comdex in April 1989. At the announcement, Intel stated that samples would be available in the third quarter of 1989 and production quantities would ship in the fourth quarter of 1989.[2] The first 80486-based PCs were announced in late 1989, but some advised that people wait until 1990 to purchase an 80486 PC because there were early reports of bugs and software incompatibilities.[3]

Improvements
The instruction set of the i486 is very similar to its
predecessor, the Intel 80386, with the addition of only a
few extra instructions, such as CMPXCHG which
implements a compare-and-swap atomic operation and
XADD, a fetch-and-add atomic operation returning the
original value (unlike a standard ADD which returns
flags only).
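In C terms, the two instructions perform the operations sketched below. On the i486 each runs as a single atomic instruction; this portable model only illustrates the semantics:

```c
#include <stdbool.h>
#include <stdint.h>

/* CMPXCHG dest, src (with EAX holding the expected value): if *dest
   equals the expected value, store the new value and report success;
   otherwise the observed value is left in EAX. Atomic in hardware. */
static bool compare_and_swap(uint32_t *dest, uint32_t *expected,
                             uint32_t newval)
{
    if (*dest == *expected) {
        *dest = newval;
        return true;
    }
    *expected = *dest;
    return false;
}

/* XADD dest, src: add and hand back the original value, unlike a
   plain ADD, which only updates the flags. */
static uint32_t fetch_and_add(uint32_t *dest, uint32_t addend)
{
    uint32_t old = *dest;
    *dest += addend;
    return old;
}
```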

From a performance point of view, the architecture of the i486 is a vast improvement over the 80386. It has an on-chip unified instruction and data cache, an on-chip floating-point unit (FPU) and an enhanced bus interface unit. Due to the tight pipelining, sequences of simple instructions (such as ALU reg,reg and ALU reg,im) could sustain a single-clock-cycle throughput (one instruction completed every clock). These improvements yielded a rough doubling in integer ALU performance over the 386 at the same clock rate. A 16 MHz 80486 therefore had a performance similar to a 33 MHz 386, and the older design had to reach 50 MHz to be comparable with a 25 MHz 80486 part.[a]

Intel 80486 registers (from the accompanying diagram): the register set matches the 80386 (main registers EAX, EBX, ECX and EDX with their 16-bit and 8-bit subregisters; index registers ESI, EDI, EBP and ESP; the program counter EIP; segment selectors CS, DS, ES, FS, GS and SS; and the EFLAGS status register), plus the eight 80-bit floating-point stack registers ST0–ST7 of the integrated FPU.
Differences between i386 and i486 EDX DH DL D register

An 8 KB on-chip (level 1) SRAM cache stores the most recently used instructions and data (16 KB and/or write-back on some later models). The 386 had no such internal cache but supported a slower off-chip cache (which was not a level 2 cache because there was no internal level 1 cache on the 80386).
An enhanced external bus protocol to enable cache coherency and a new burst mode for memory accesses to fill a cache line of 16 bytes within 5 bus cycles. The 386 needed 8 bus cycles to transfer the same amount of data.
Tightly-coupled[b] pipelining completes a simple instruction like ALU reg,reg or ALU reg,im every clock cycle (after a latency of several cycles). The 386 needed two clock cycles to do this.
Integrated FPU (disabled or absent in SX models) with a dedicated local bus; together with faster algorithms on more extensive hardware than in the i387, this performs floating-point calculations faster compared to the i386+i387 combination.
Improved MMU performance.
New instructions: XADD, BSWAP, CMPXCHG, INVD, WBINVD, INVLPG.

Just as in the 80386, a simple flat 4 GB memory model could be implemented by setting all "segment selector" registers to a neutral value in protected mode, or setting (the same) "segment registers" to zero in real mode, and using only the 32-bit "offset registers" (x86 terminology for general CPU registers used as address registers) as a linear 32-bit virtual address, bypassing the segmentation logic. Virtual addresses were then normally mapped onto physical addresses by the paging system, except when it was disabled. (Real mode had no virtual addresses.) Just as with the 80386, circumventing memory segmentation could substantially improve performance in some operating systems and applications.
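Under the flat model the segmentation step degenerates to adding a zero base, as the following one-function C sketch of the address path shows (names are illustrative):

```c
#include <stdint.h>

/* Segmentation on the 386/486 forms a linear address as segment base
   plus offset. With a flat model every base is 0 and every limit is
   4 GB, so the 32-bit offset register IS the linear (virtual) address,
   which paging then maps to a physical address when enabled. */
static uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;   /* flat model: segment_base == 0 */
}
```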
On a typical PC motherboard, either four matched 30-pin (8-bit) SIMMs or one 72-pin (32-bit) SIMM per bank were required to fit the 80486's 32-bit data bus. The address bus used 30 bits (A31..A2), complemented by four byte-select pins (instead of A0 and A1) to allow for any 8/16/32-bit selection. This meant that the limit of directly addressable physical memory was 4 gigabytes as well (2³⁰ 32-bit words = 2³² 8-bit words).
P5 (microarchitecture)
The first Pentium microprocessor was introduced by Intel on March 22, 1993.[2][3] Its P5 microarchitecture was the fifth generation for Intel, and the first superscalar IA-32 microarchitecture. As a direct extension of the 80486 architecture, it included dual integer pipelines, a faster floating-point unit, a wider data bus, separate code and data caches, and features for further reduced address calculation latency. In October 1996, the Pentium with MMX Technology (often simply referred to as Pentium MMX) was introduced, complementing the same basic microarchitecture with the MMX instruction set, larger caches, and some other enhancements.

The P5 Pentium's competitors included the Motorola 68060 and the PowerPC 601 as well as the SPARC, MIPS, and Alpha microprocessor families, most of which also used a superscalar in-order dual instruction pipeline configuration at some time.

Intel's Larrabee multicore architecture project uses a processor core derived from a P5 core (P54C), augmented by multithreading, 64-bit instructions, and a 16-wide vector processing unit.[4] Intel's low-powered Bonnell microarchitecture employed in early Atom processor cores also uses an in-order dual pipeline similar to P5.[5]

Intel discontinued the P5 Pentium processors (which had been downgraded to an entry-level product since the Pentium II debuted in 1997) in early 2000 in favor of the Celeron processor, which also replaced the 80486 brand.[1]

General Info
Launched: March 22, 1993
Discontinued: February 15, 2000[1]
Performance
Max. CPU clock rate: 60–300 MHz
FSB speeds: 50–66 MHz
Cache
L1 cache: 16–32 KiB
Architecture and classification
Architecture: P5 (IA-32)
Instructions: MMX
Physical specifications
Socket(s): Socket 4, Socket 5, Socket 7
Products, models, variants
Model(s): Pentium series, Pentium OverDrive series, Pentium MMX series
History
Predecessor: Intel 80486
Successor: P6, Pentium II
Development

The P5 microarchitecture was designed by the same Santa Clara team which designed the 386 and 486.[6] Design work started in 1989;[7] the team decided to use a superscalar architecture, with on-chip cache, floating-point, and branch prediction. The preliminary design was first successfully simulated in 1990, followed by the laying-out of the design. By this time, the team had several dozen engineers. The design was taped out, or transferred to silicon, in April 1992, at which point beta-testing began.[8] By mid-1992, the P5 team had 200 engineers.[9] Intel at first planned to demonstrate the P5 in June 1992 at the trade show PC Expo, and to formally announce the processor in September 1992,[10] but design problems forced the demo to be cancelled, and the official introduction of the chip was delayed until the spring of 1993.[11][12]

John H. Crawford, chief architect of the original 386, co-managed the design of the P5,[13] along with Donald Alpert, who managed the architectural team. Dror Avnon managed the design of the FPU.[14] Vinod K. Dham was general manager of the P5 group.[15]

Intel Pentium A80501 66 MHz SX950 die image

Major improvements over the 80486 microarchitecture

The P5 microarchitecture brings several important advancements over the preceding i486 architecture.

Performance:
Superscalar architecture — The Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many
cases. The main pipe (U) can handle any instruction, while the other (V) can handle the most common simple instructions. Some RISC
proponents had argued that the "complicated" x86 instruction set would probably never be implemented by a tightly pipelined
microarchitecture, much less by a dual-pipeline design. The 486 and the Pentium demonstrated that it was indeed feasible.
64-bit external data bus doubles the amount of information possible to read or write on each memory access and therefore allows the
Pentium to load its code cache faster than the 80486; it also allows faster access and storage of 64-bit and 80-bit x87 FPU data.
Separation of code and data caches lessens the fetch and operand read/write conflicts compared to the 486. To reduce access time and
implementation cost, both of them are 2-way associative, instead of the single 4-way cache of the 486. A related enhancement in the
Pentium is the ability to read a contiguous block from the code cache even when it is split between two cache lines (at least 17 bytes in
worst case).
Much faster floating-point unit. Some instructions showed an enormous improvement, most notably FMUL, with up to 15 times higher
throughput than in the 80486 FPU. The Pentium is also able to execute a FXCH ST(x) instruction in parallel with an ordinary (arithmetical or
load/store) FPU instruction.
Four-input address adders enable the Pentium to further reduce the address calculation latency compared to the 80486. The Pentium can calculate full addressing modes with segment-base + base-register + scaled register + immediate offset in a single cycle, while the 486 has a three-input address adder only and must therefore divide such calculations between two cycles (a sketch of this arithmetic follows the list below).
The microcode can employ both pipelines to let auto-repeating instructions such as REP MOVSW perform one iteration every clock
cycle, while the 80486 needed three clocks per iteration (and the earliest x86 chips significantly more than the 486). Also, optimization of the
access to the first microcode words during the decode stages helps in making several frequent instructions execute significantly more
quickly, especially in their most common forms and in typical cases. Some examples are (486→Pentium, in clock cycles): CALL (3→1), RET
(5→2), shifts/rotates (2–3→1).
A faster, fully hardware-based multiplier makes instructions such as MUL and IMUL several times faster (and more predictable) than in the
80486; the execution time is reduced from 13–42 clock cycles down to 10–11 for 32-bit operands.
Virtualized interrupt to speed up virtual 8086 mode.
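The addressing arithmetic that the four-input adder resolves in one cycle is sketched below in C (names are illustrative; the scale factor may be 1, 2, 4 or 8):

```c
#include <stdint.h>

/* The full x86 addressing mode, e.g. mov eax, [ebx + esi*4 + 12]:
   segment base + base register + scaled index + displacement.
   The Pentium sums all four inputs in a single cycle; the 486's
   three-input adder needs two cycles for this case. */
static uint32_t effective_address(uint32_t segment_base, uint32_t base,
                                  uint32_t index, uint32_t scale,
                                  int32_t displacement)
{
    return segment_base + base + index * scale + (uint32_t)displacement;
}
```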
Other features:
Enhanced debug features with the introduction of the Processor-based debug port (see Pentium Processor Debugging in the Developers
Manual, Vol 1).
Enhanced self-test features like the L1 cache parity check (see Cache Structure in the Developers Manual, Vol 1).
New instructions: CPUID, CMPXCHG8B, RDTSC, RDMSR, WRMSR, RSM (CPUID and RDTSC are demonstrated in the sketch after this list).
Test registers TR0–TR7 and MOV instructions for access to them were eliminated.
The later Pentium MMX also added the MMX instruction set, a basic integer SIMD instruction set extension marketed for use in multimedia
applications. MMX could not be used simultaneously with the x87 FPU instructions because the registers were reused (to allow fast context
switches). More important enhancements were the doubling of the instruction and data cache sizes and a few microarchitectural changes for
better performance.
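CPUID and RDTSC are the most commonly encountered of these additions; a minimal C sketch using the GCC/Clang helpers (assuming an x86 target where <cpuid.h> and <x86intrin.h> are available):

```c
#include <stdio.h>
#include <string.h>
#include <cpuid.h>      /* GCC/Clang wrapper for the CPUID instruction */
#include <x86intrin.h>  /* __rdtsc() wrapper for the RDTSC instruction */

int main(void)
{
    unsigned eax, ebx, ecx, edx;

    /* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX. */
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        char vendor[13];
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';
        printf("vendor: %s\n", vendor);   /* e.g. "GenuineIntel" */
    }

    /* RDTSC reads the time-stamp counter, which counts clock cycles. */
    unsigned long long t0 = __rdtsc();
    unsigned long long t1 = __rdtsc();
    printf("delta: %llu cycles\n", t1 - t0);
    return 0;
}
```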

The Pentium was designed to execute over 100 million instructions per second (MIPS),[16] and the 75 MHz model was able to reach 126.5 MIPS in certain
benchmarks.[17] The Pentium architecture typically offered just under twice the performance of a 486 processor per clock cycle in common benchmarks. The
fastest 80486 parts (with slightly improved microarchitecture and 100 MHz operation) were almost as powerful as the first-generation Pentiums, and the AMD
Am5x86 was roughly equal to the Pentium 75 regarding pure ALU performance.

Errata

The early versions of 60–100 MHz P5 Pentiums had a problem in the floating-point unit that resulted in incorrect (but predictable) results from some division
operations. This flaw, discovered in 1994 by professor Thomas Nicely at Lynchburg College, Virginia, became widely known as the Pentium FDIV bug and
caused embarrassment for Intel, which created an exchange program to replace the faulty processors.
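The flaw could be demonstrated with a single division. The widely circulated test case, in a short C sketch, evaluates to 0 on a correct FPU but to 256 on an affected P5:

```c
#include <stdio.h>

int main(void)
{
    /* The published FDIV reproducer: the quotient 4195835.0/3145727.0
       comes back wrong in its low-order bits on a flawed P5 FPU, so
       x - (x / y) * y is 256 there instead of the correct 0. */
    double x = 4195835.0, y = 3145727.0;
    printf("%g\n", x - (x / y) * y);
    return 0;
}
```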

In 1997, another erratum was discovered that could allow a malicious program to crash a system without any special privileges, the "F00F bug". All P5 series processors were affected and no fixed steppings were ever released; however, contemporary operating systems were patched with workarounds to prevent crashes.

Cores and steppings


The Pentium was Intel's primary microprocessor for personal computers during the mid-1990s. The original design was reimplemented in newer processes and
new features were added to maintain its competitiveness as well as to address specific markets such as portable computers. As a result, there were several variants
of the P5 microarchitecture.

P5

The first Pentium microprocessor core was code-named "P5". Its product code was 80501 (80500 for the earliest steppings, Q0399). There were two versions, specified to operate at 60 MHz and 66 MHz respectively, using Socket 4. This first implementation of the Pentium used a traditional 5-volt power supply (descended from the usual TTL logic compatibility requirements). It contained 3.1 million transistors and measured 16.7 mm by 17.6 mm for an area of 293.92 mm².[18] It was fabricated in a 0.8 μm BiCMOS process.[19] The 5-volt design resulted in relatively high energy consumption for its operating frequency when compared to the directly following models.
P54C

The P5 was followed by the P54C (80502) in 1994, with versions specified to operate at 75, 90, or 100 MHz using a 3.3-volt power supply. Marking the switch to Socket 5, this was the first Pentium processor to operate at 3.3 volts, reducing energy consumption, but necessitating voltage regulation on mainboards. As with higher-clocked 486 processors, an internal clock multiplier was employed from here on to let the internal circuitry work at a higher frequency than the external address and data buses, as it is more complicated and cumbersome to increase the external frequency, due to physical constraints. It also allowed two-way multiprocessing and had an integrated local APIC as well as new power management features. It contained 3.3 million transistors and measured 163 mm².[20] It was fabricated in a BiCMOS process which has been described as both 0.5 μm and 0.6 μm due to differing definitions.[20]

P54CQS

Intel Pentium P54C die shot

The P54C was followed by the P54CQS in early 1995, which operated at 120 MHz. It was fabricated in a 0.35 μm BiCMOS process and was the first commercial microprocessor to be fabricated in a 0.35 μm process.[20] Its transistor count is identical to that of the P54C and, despite the newer process, it had an identical die area as well. The chip was connected to the package using wire bonding, which only allows connections along the edges of the chip. A smaller chip would have required a redesign of the package, as there is a limit on the length of the wires and the edges of the chip would be further away from the pads on the package. The solution was to keep the chip the same size, retain the existing pad-ring, and only reduce the size of the Pentium's logic circuitry to enable it to achieve higher clock frequencies.[20]

P54CS

The P54CQS was quickly followed by the P54CS, which operated at 133, 150, 166 and 200 MHz, and introduced Socket 7. It contained 3.3 million transistors, measured 90 mm², and was fabricated in a 0.35 μm BiCMOS process with four levels of interconnect.

P24T

The P24T Pentium OverDrive for 486 systems was released in 1995, based on 3.3 V 0.6 μm versions using a 63 or 83 MHz clock. Since these used Socket 2/3, some modifications had to be made to compensate for the 32-bit data bus and slower on-board L2 cache of 486 motherboards. They were therefore equipped with a 32 KB L1 cache (double that of pre-P55C Pentium CPUs).

P55C

The P55C (or 80503) was developed by Intel's Research & Development Center in Haifa, Israel. It was sold as Pentium with MMX Technology (usually just called Pentium MMX); although it was based on the P5 core, it featured a new set of 57 "MMX" instructions intended to improve performance on multimedia tasks, such as encoding and decoding digital media data. The Pentium MMX line was introduced on October 22, 1996, and released in January 1997.[21]

Pentium logo with MMX enhancement (1993–1999)

The new instructions worked on new data types: 64-bit packed vectors of either eight 8-bit integers, four 16-bit integers, two 32-bit integers, or one 64-bit integer. So, for example, the PADDUSB (Packed ADD Unsigned Saturated Byte) instruction adds two vectors, each containing eight 8-bit unsigned integers, elementwise; each addition that would overflow saturates, yielding 255, the maximal unsigned value that can be represented in a byte. These rather specialized instructions generally require special coding by the programmer for them to be used.
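As a concrete model of the saturating behaviour just described, a scalar C sketch of PADDUSB (the real instruction processes all eight byte lanes in one 64-bit MMX operation):

```c
#include <stdint.h>
#include <stdio.h>

/* Scalar model of PADDUSB: add two vectors of eight unsigned bytes,
   clamping each lane at 255 instead of wrapping around. */
static void paddusb(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

int main(void)
{
    uint8_t a[8] = {250, 10, 100, 0, 255, 1, 128, 200};
    uint8_t b[8] = { 10, 10, 100, 0,   1, 1, 128, 100};
    uint8_t r[8];
    paddusb(r, a, b);
    for (int i = 0; i < 8; i++)
        printf("%u ", r[i]);    /* prints: 255 20 200 0 255 2 255 255 */
    printf("\n");
    return 0;
}
```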

Other changes to the core include a 6-stage pipeline (vs. 5 on the P5) with a return stack (first done on the Cyrix 6x86) and better parallelism, an improved instruction decoder, 32 KB L1 cache with 4-way associativity (vs. 16 KB with 2-way on the P5), 4 write buffers that could now be used by either pipeline (vs. one corresponding to each pipeline on the P5), and an improved branch predictor taken from the Pentium Pro,[22][23] with a 512-entry buffer (vs. 256 on the P5).[24]

Pentium MMX 166 MHz without cover

It contained 4.5 million transistors and had an area of 140 mm². It was fabricated in a 0.28 μm CMOS process with the same metal pitches as the previous 0.35 μm BiCMOS process, so Intel described it as "0.35 μm" because of its similar transistor density.[25] The process has four levels of interconnect.[25]

While the P55C remained compatible with Socket 7, the voltage requirements for powering the chip differ from the standard Socket 7 specifications. Most
motherboards manufactured for Socket 7 prior to the establishment of the P55C standard are not compliant with the dual voltage rail required for proper operation
of this CPU (2.9 volt core voltage, 3.3 volt I/O voltage). Intel addressed the issue with OverDrive upgrade kits that featured an interposer with its own voltage
regulation.
Pentium Pro
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995.[1] It introduced the P6 microarchitecture (sometimes referred to as i686) and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively, the Pentium Pro contained 5.5 million transistors.[2] Later, it was reduced to a narrower role as a server and high-end desktop processor and was used in supercomputers like ASCI Red, the first computer to reach the teraFLOPS performance mark.[3] The Pentium Pro was capable of both dual- and quad-processor configurations. It only came in one form factor, the relatively large rectangular Socket 8. The Pentium Pro was succeeded by the Pentium II Xeon in 1998.

General Info
Launched: November 1, 1995
Discontinued: June 1998
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 150 MHz to 200 MHz
FSB speeds: 60 MHz to 66 MHz
Architecture and classification
Min. feature size: 0.35 μm to 0.50 μm
Microarchitecture: P6
Instruction set: x86
Physical specifications
Cores: 1
Socket(s): Socket 8
History
Successor: Pentium II Xeon

Microarchitecture
The lead architect of the Pentium Pro was Fred Pollack, who specialized in superscalarity and had also worked as the lead engineer of the Intel iAPX 432.[5]
Summary

The Pentium Pro incorporated a new microarchitecture, different from the Pentium's P5 microarchitecture. It has a decoupled, 14-stage superpipelined architecture which used an instruction pool. The Pentium Pro (P6) featured many advanced concepts not found in the Pentium, although it wasn't the first or only x86 processor to implement them (see NexGen Nx586 or Cyrix 6x86). The Pentium Pro pipeline had extra decode stages to dynamically translate IA-32 instructions into buffered micro-operation sequences which could then be analysed, reordered, and renamed in order to detect parallelizable operations that may be issued to more than one execution unit at once. The Pentium Pro thus featured out-of-order execution, including speculative execution via register renaming. It also had a wider 36-bit address bus (usable by PAE), allowing it to access up to 64 GB of memory.

200 MHz Pentium Pro with a 512 KB L2 cache in PGA package

The Pentium Pro has an 8 KB instruction cache, from which up to 16 bytes are fetched on each cycle and sent to the instruction decoders. There are three instruction decoders. The decoders are not equal in capability: only one can decode any x86 instruction, while the other two can only decode simple x86 instructions. This restricts the Pentium Pro's ability to decode multiple instructions simultaneously, limiting superscalar execution. x86 instructions are decoded into 118-bit micro-operations (micro-ops). The micro-ops are RISC-like; that is, they encode an operation, two sources, and a destination. The general decoder can generate up to four micro-ops per cycle, whereas the simple decoders can generate one micro-op each per cycle. Thus, x86 instructions that operate on memory (e.g., add this register to this location in memory) can only be processed by the general decoder, as this operation requires a minimum of three micro-ops. Likewise, the simple decoders are limited to instructions that can be translated into one micro-op. Instructions that require more than four micro-ops are translated with the assistance of a sequencer, which generates the required micro-ops over multiple clock cycles. The Pentium Pro was the first processor in the x86 family to support upgradeable microcode under BIOS and/or operating system control.[6]

200 MHz Pentium Pro with a 1 MB L2 cache in PPGA package

Uncapped Pentium Pro 256 KB
Micro-ops exit the re-order buffer (ROB) and enter a reservation station (RS), where they await dispatch to the execution units. In each clock cycle, up to five micro-ops can be dispatched to five execution units. The Pentium Pro has a total of six execution units: two integer units, one floating-point unit (FPU), a load unit, a store address unit, and a store data unit.[7] One of the integer units shares the same ports as the FPU, and therefore the Pentium Pro can only dispatch one integer micro-op and one floating-point micro-op, or two integer micro-ops, per cycle, in addition to micro-ops for the other three execution units. Of the two integer units, only the one that shares the path with the FPU on port 0 has the full complement of functions such as a barrel shifter, multiplier, divider, and support for LEA instructions. The second integer unit, which is connected to port 1, does not have these facilities and is limited to simple operations such as add, subtract, and the calculation of branch target addresses.[7]

Pentium II OverDrive with heatsink removed. Flip-chip Deschutes core is on the left; the 512 KB cache is on the right.[4]

The FPU executes floating-point operations. Addition and multiplication are pipelined and have a latency of three and five cycles, respectively. Division and square root are not pipelined and are executed in separate units that share the FPU's ports. Division and square root have a latency of 18–36 and 29–69 cycles, respectively. The smaller number is for single-precision (32-bit) floating-point numbers and the larger for extended-precision (80-bit) numbers. Division and square root can operate simultaneously with adds and multiplies, blocking them only when the result has to be stored in the ROB.

After the microprocessor was released, a bug was discovered in the floating-point unit, commonly called the "Pentium Pro and Pentium II FPU bug" and referred to by Intel as the "flag erratum". The bug occurs under some circumstances during floating point-to-integer conversion when the floating-point number won't fit into the smaller integer format, causing the FPU to deviate from its documented behaviour. The bug is considered to be minor and occurs under such special circumstances that very few, if any, software programs are affected.

The Pentium Pro P6 microarchitecture was used in one form or another by Intel for more than a decade. The
pipeline would scale from its initial 150 MHz start, all the way up to 1.4 GHz with the "Tualatin" Pentium
III. The design's various traits would continue after that in the derivative core called "Banias" in Pentium M
and Intel Core (Yonah), which itself would evolve into the Core microarchitecture (Core 2 processor) in
2006 and onward.[8]

Performance

Despite being advanced for the time, the Pentium Pro's out-of-order register renaming architecture had trouble running 16-bit code and mixed code (8/16-bit or 16/32-bit), as the use of partial registers causes frequent pipeline flushes.[9] Specific use of partial registers was a common performance optimization of the day, as it incurred no performance penalty on pre-P6 Intel processors; also, the dominant operating systems at the time of the Pentium Pro's release were 16-bit DOS and the mixed 16/32-bit Windows 3.1x and Windows 95 (although the latter requires a 32-bit 80386 CPU, much of its code is still 16-bit for performance reasons, such as USER.exe). This, together with the high cost of Pentium Pro systems, caused a rather lackluster reception among PC enthusiasts at the time. To take full advantage of the Pentium Pro's P6 microarchitecture, a fully 32-bit OS is needed, such as Windows NT, Linux, Unix, or OS/2. The performance issues on legacy code were later partially mitigated by Intel with the Pentium II.

Compared to RISC microprocessors, the Pentium Pro, when introduced, slightly outperformed the fastest
RISC microprocessors on integer performance when running the SPECint95 benchmark,[10] but floating-
point performance was significantly lower, half of some RISC microprocessors.[10] The Pentium Pro's
integer performance lead disappeared rapidly, first overtaken by the MIPS Technologies R10000 in January
1996, and then by Digital Equipment Corporation's EV56 variant of the Alpha 21164.[11]

Reviewers quickly noted the very slow writes to video memory as the weak spot of the P6 platform, with
performance here being as low as 10% of an identically clocked Pentium system in benchmarks such as
VIDSPEED. Methods to circumvent this included setting VESA drawing to system memory instead of
video memory in games such as Quake,[12] and later on utilities such as FASTVID emerged, which could
double performance in certain games by enabling the write combining features of the CPU.[13][14] MTRRs were set automatically by Windows video drivers starting from around 1997; with that in place, the improved cache/memory subsystem and FPU performance caused the Pentium Pro to outclass the Pentium clock-for-clock in the emerging 3D games of the mid-to-late 1990s, particularly when using NT4. However, its lack of an MMX implementation reduced performance in multimedia applications that made use of those instructions.

Caching

Likely the Pentium Pro's most noticeable addition was its on-package L2 cache, which ranged from 256 KB at
introduction to 1 MB in 1997. At the time, manufacturing technology did not feasibly allow a large L2 cache
to be integrated into the processor core. Intel instead placed the L2 die(s) separately in the package which
still allowed it to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based
cache schemes that shared the main system bus with the CPU, the Pentium Pro's cache had its own back-
side bus (called dual independent bus by Intel). Because of this, the CPU could read main memory and
cache concurrently, greatly reducing a traditional bottleneck. The cache was also "non-blocking", meaning
that the processor could issue more than one cache request at a time (up to 4), reducing cache-miss penalties.
(This is an example of MLP, Memory Level Parallelism.) These properties combined to produce an L2
cache that was immensely faster than the motherboard-based caches of older processors. This cache alone
gave the CPU an advantage in input/output performance over older x86 CPUs. In multiprocessor
configurations, Pentium Pro's integrated cache skyrocketed performance in comparison to architectures
which had each CPU sharing a central cache.

However, this far faster L2 cache did come with some complications. The Pentium Pro's "on-package cache"
arrangement was unique. The processor and the cache were on separate dies in the same package and
connected closely by a full-speed bus. The two or three dies had to be bonded together early in the
production process, before testing was possible. This meant that a single, tiny flaw in either die made it
necessary to discard the entire assembly, which was one of the reasons for the Pentium Pro's relatively low
production yield and high cost. All versions of the chip were expensive, those with 1024 KB being particularly so, since they required two 512 KB cache dies as well as the processor die.
Pentium II
The Pentium II[2] brand refers to Intel's sixth-generation microarchitecture ("P6") and x86-compatible microprocessors introduced on May 7, 1997. Containing 7.5 million transistors (27.4 million in the case of the mobile Dixon with 256 KB L2 cache), the Pentium II featured an improved version of the first P6-generation core of the Pentium Pro, which contained 5.5 million transistors. However, its L2 cache subsystem was a downgrade when compared to the Pentium Pro's.

In 1998, Intel stratified the Pentium II family by releasing the Pentium II-based Celeron line of processors for low-end workstations and the Pentium II Xeon line for servers and high-end workstations. The Celeron was characterized by a reduced or omitted (in some cases present but disabled) on-die full-speed L2 cache and a 66 MT/s FSB. The Xeon was characterized by a range of full-speed L2 cache (from 512 KB to 2048 KB), a 100 MT/s FSB, a different physical interface (Slot 2), and support for symmetric multiprocessing.

Original Pentium II MMX Case Badge

In February 1999, the Pentium II was replaced by the nearly identical Pentium III, which only added the then-new SSE instruction set. However, the older family would continue to be produced until June 2001 for desktop units,[3] September 2001 for mobile units,[4] and the end of 2003 for embedded devices.[1]

General Info
Launched: May 7, 1997
Discontinued: December 26, 2003[1]
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 233 MHz to 450 MHz
FSB speeds: 66 MHz to 100 MHz
Architecture and classification
Min. feature size: 0.35 μm to 0.18 μm
Microarchitecture: P6
Instruction set: IA-32, MMX
Physical specifications
Cores: 1
Socket(s): Slot 1, MMC-1, MMC-2, Mini-Cartridge, PPGA-B615 (μPGA1)
Products, models, variants
Core name(s): Klamath, Deschutes, Tonga, Dixon
History
Predecessor: Pentium, Pentium Pro
Successor: Pentium III

Overview

The Pentium II microprocessor was largely based upon the microarchitecture of its predecessor, the Pentium Pro, but with some significant improvements.[5]

Unlike previous Pentium and Pentium Pro processors, the Pentium II CPU was packaged in a slot-based module rather than a CPU socket. The processor and associated components were carried on a daughterboard similar to a typical expansion board within a plastic cartridge. A fixed or removable heatsink was carried on one side, sometimes using its own fan.[6]

This larger package was a compromise allowing Intel to


separate the secondary cache from the processor while
still keeping it on a closely coupled back-side bus. The
L2 cache ran at half the processor's clock frequency,
unlike the Pentium Pro, whose off die L2 cache ran at
the same frequency as the processor. However, its
associativity was increased to 16-way (compared to 4-
way on the Pentium Pro) and its size was always
512 KB, twice of the smallest option of 256 KB on the
Pentium Pro. Off-package cache solved the Pentium
Pro's low yield issues, allowing Intel to introduce the Pentium II processor with MMX technology, SECC
Pentium II at a mainstream price level.[7][8] cartridge.
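
To make those cache figures concrete, here is a small sketch of set-associative geometry using the sizes and associativities above; the 32-byte line size is an assumption, not stated in the text:

    # Sketch of set-associative cache geometry using the article's figures
    # (512 KB, 16-way for the Pentium II L2; 256 KB, 4-way for the smallest
    # Pentium Pro option). The 32-byte line size is an assumption.
    import math

    def geometry(size_bytes, ways, line_bytes=32):
        sets = size_bytes // (ways * line_bytes)
        return {"sets": sets,
                "offset_bits": int(math.log2(line_bytes)),
                "index_bits": int(math.log2(sets))}

    print(geometry(512 * 1024, 16))  # Pentium II L2: 1024 sets
    print(geometry(256 * 1024, 4))   # Pentium Pro L2: 2048 sets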

Intel improved 16-bit code execution performance on the Pentium II, an area in which the Pentium Pro was
at a notable handicap, by adding segment register caches. Most consumer software of the day was still using
at least some 16-bit code, because of a variety of factors. The issues with partial registers were also
addressed by adding an internal flag to skip pipeline flushes whenever possible.[9] To compensate for the
slower L2 cache, the Pentium II featured 32 KB of L1 cache, double that of the Pentium Pro, as well as four
write buffers (versus two on the Pentium Pro); these could also be used by either pipeline, instead of each
being fixed to one pipeline.[10][11] The Pentium II was also the first P6-based CPU to implement the Intel
MMX integer SIMD instruction set, which had already been introduced on the Pentium MMX.[7]

The Pentium II was basically a more consumer-oriented version of the Pentium Pro. It was cheaper to
manufacture because of the separate, slower L2 cache memory. The improved 16-bit performance and
MMX support made it a better choice for consumer-level operating systems, such as Windows 9x, and
multimedia applications. The slower and cheaper L2 cache's performance penalty was mitigated by the
doubled L1 cache and architectural improvements for legacy code. General processor performance was
increased while costs were cut.[7][12]

All Klamath and some early Deschutes Pentium IIs use a combined L2 cache controller / tag RAM chip that
only allows for 512 MB to be cached; while more RAM could be installed in theory, this would result in
very slow performance. While this limit was practically irrelevant for the average home user at the time, it
was a concern for some workstation or server users. Presumably, Intel put this limitation in place
deliberately to distinguish the Pentium II from the more upmarket Pentium Pro line, which had a full 4 GB
cacheable area. The '82459AD' revision of the chip, found on some 333 MHz and all 350 MHz and faster
Pentium IIs, lifted this restriction and offered the full 4 GB cacheable area.[13][14]
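
The 512 MB versus 4 GB cacheable-area limit follows from how many tag bits the tag RAM stores per cache line. A hedged back-of-envelope, reusing the 512 KB/16-way figures above and the same assumed 32-byte line size (the actual tag widths are not given in the text; the values below are chosen to reproduce the two limits):

    # Cacheable area = 2 ** (tag_bits + index_bits + offset_bits).
    # Tag widths below are illustrative, chosen to reproduce the article's
    # 512 MB and 4 GB limits for a 512 KB, 16-way cache with 32 B lines.
    import math

    def cacheable_bytes(size_bytes, ways, line_bytes, tag_bits):
        sets = size_bytes // (ways * line_bytes)
        addr_bits = tag_bits + int(math.log2(sets)) + int(math.log2(line_bytes))
        return 2 ** addr_bits

    print(cacheable_bytes(512 * 1024, 16, 32, 14))  # 2**29 = 512 MB
    print(cacheable_bytes(512 * 1024, 16, 32, 17))  # 2**32 = 4 GB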
Pentium III
The Pentium III[2] (marketed as Intel Pentium III Processor, informally PIII, and stylized as pentium !!!)
brand refers to Intel's 32-bit x86 desktop and mobile microprocessors based on the sixth-generation P6
microarchitecture, introduced on February 26, 1999. The brand's initial processors were very similar to the
earlier Pentium II-branded microprocessors. The most notable differences were the addition of the SSE
instruction set (to accelerate floating point and parallel calculations) and the introduction of a controversial
serial number embedded in the chip during the manufacturing process.

Even after the release of the Pentium 4 in late 2000, the Pentium III continued to be produced, with new
models introduced until early 2003; it was discontinued in April 2004 for desktop units[3] and May 2007
for mobile units.[1]

General Info
Launched: February 26, 1999
Discontinued: May 18, 2007[1]
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 400 MHz to 1.4 GHz
FSB speeds: 100 MHz to 133 MHz
Architecture and classification
Min. feature size: 0.25 μm to 0.13 μm
Microarchitecture: P6
Instruction set: IA-32, MMX, SSE
Physical specifications
Cores: 1
Socket(s): Slot 1, Socket 370, Socket 479 (mobile)
Products, models, variants
Core name(s): Katmai, Coppermine, Coppermine T, Tualatin
History
Predecessor: Pentium II
Successor: Pentium 4, Xeon, Celeron, Pentium M

Processor cores

Similarly to the Pentium II it superseded, the Pentium III was also accompanied by the Celeron brand for
lower-end versions, and the Xeon for high-end (server and workstation) derivatives. The Pentium III was
eventually superseded by the Pentium 4, but its Tualatin core also served as the basis for the Pentium M
CPUs, which used many ideas from the P6 microarchitecture. Subsequently, it was the Pentium M
microarchitecture of Pentium M branded CPUs, and not the NetBurst found in Pentium 4 processors, that
formed the basis for Intel's energy-efficient Core microarchitecture of CPUs branded Core 2, Pentium
Dual-Core, Celeron (Core), and Xeon.
Pentium 4
Pentium 4[1][2] is a brand by Intel for an entire series of single-core CPUs for desktops, laptops, and
entry-level servers. The processors were shipped from November 20, 2000, until August 8, 2008.[3][4] The
brand remained active from 2000 until May 21, 2010.[5][6]

All Pentium 4 CPUs are based on the NetBurst architecture. The Pentium 4 Willamette (180 nm) introduced
SSE2, while the Prescott (90 nm) introduced SSE3. Later versions introduced Hyper-Threading Technology
(HTT).

The first Pentium 4-branded processor to implement 64-bit was the Prescott (90 nm, February 2004), but
this feature was not enabled. Intel subsequently began selling 64-bit Pentium 4s using the "E0" revision of
the Prescotts, sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute
Disable (XD) (Intel's name for the NX bit) to Intel 64. Intel's official launch of Intel 64 (under the name
EM64T at that time) in mainstream desktop processors was the N0 stepping Prescott-2M.

Intel also marketed a version of their low-end Celeron processors based on the NetBurst microarchitecture
(often referred to as Celeron 4), and a high-end derivative, Xeon, intended for multi-socket servers and
workstations. In 2005, the Pentium 4 was complemented by the dual-core brands Pentium D and Pentium
Extreme Edition.

General Info
Launched: November 20, 2000
Discontinued: August 8, 2008
Performance
Max. CPU clock rate: 1.3 GHz to 3.8 GHz
FSB speeds: 400 MT/s to 1066 MT/s
Architecture and classification
Microarchitecture: NetBurst
Instruction set: x86 (i386), x86-64 (only some chips), MMX, SSE, SSE2, SSE3
Physical specifications
Transistors: 42M (180 nm), 55M (130 nm), 169M (130 nm, P4EE), 125M (90 nm), 188M (65 nm)
Socket(s): Socket 423, Socket 478, LGA 775
History
Predecessor: Pentium III
Successor: Pentium D, Core 2
Microarchitecture
In benchmark evaluations, the advantages of the NetBurst microarchitecture were unclear. With carefully
optimized application code, the first Pentium 4s outperformed Intel's fastest Pentium III (clocked at
1.13 GHz at the time), as expected. But in legacy applications with many branching or x87 floating-point
instructions, the Pentium 4 would merely match or run slower than its predecessor. Its main downfall was a
shared unidirectional bus. The NetBurst microarchitecture consumed more power and emitted more heat
than any previous Intel or AMD microarchitecture.

As a result, the Pentium 4's introduction was met with mixed reviews: developers disliked the Pentium 4, as
it posed a new set of code optimization rules. For example, in mathematical applications, AMD's
lower-clocked Athlon (whose fastest model ran at 1.2 GHz at the time) easily outperformed the Pentium 4,
which would only catch up if software was re-compiled with SSE2 support. Tom Yager of Infoworld
magazine called it "the fastest CPU - for programs that fit entirely in cache". Computer-savvy buyers
avoided Pentium 4 PCs due to their price premium, questionable benefit, and initial restriction to Rambus
RAM. In terms of product marketing, the Pentium 4's singular emphasis on clock frequency (above all else)
made it a marketer's dream. This led to the NetBurst microarchitecture often being referred to as a
"marchitecture" by various computing websites and publications during the life of the Pentium 4. It was
also called "NetBust," a term popular with reviewers who viewed the processor's performance negatively.

The two classical metrics of CPU performance are IPC (instructions per cycle) and clock speed. While IPC
is difficult to quantify due to dependence on the benchmark application's instruction mix, clock speed is a
simple measurement yielding a single absolute number. Unsophisticated buyers would simply consider the
processor with the highest clock speed to be the best product, and the Pentium 4 had the fastest clock speed.
Because AMD's processors had slower clock speeds, it countered Intel's marketing advantage with the
"megahertz myth" campaign. AMD product marketing used a "PR-rating" system, which assigned a merit
value based on relative performance to a baseline machine.
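
A worked example of why clock speed alone misleads; the IPC values here are purely hypothetical, chosen only to show how a lower-clocked CPU can deliver more useful work:

    # Useful throughput is roughly IPC x clock. The IPC values here are
    # hypothetical, chosen only to show a lower-clocked CPU winning.
    def gips(ipc, clock_ghz):
        return ipc * clock_ghz  # billions of instructions per second

    print(gips(0.9, 1.5))  # high clock, lower IPC  -> 1.35
    print(gips(1.3, 1.2))  # lower clock, higher IPC -> 1.56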

At the launch of the Pentium 4, Intel stated that NetBurst-based processors were expected to scale to
10 GHz after several fabrication process generations. However, the clock speed of processors using the
NetBurst microarchitecture reached a maximum of 3.8 GHz. Intel had not anticipated a rapid upward
scaling of transistor power leakage that began to occur as the die reached the 90 nm lithography node and
smaller. This new power leakage phenomenon, along with the standard thermal output, created cooling and
clock scaling problems as clock speeds increased. Reacting to these unexpected obstacles, Intel attempted
several core redesigns ("Prescott" most notably) and explored new technologies, such as using multiple
cores, increasing FSB speeds, increasing the cache size, and using a longer instruction pipeline along with
higher clock speeds. These solutions failed, and from 2003 to 2005, Intel shifted development away from
NetBurst to focus on the cooler-running Pentium M microarchitecture. On January 5, 2006, Intel launched
the Core processors, which put greater emphasis on energy efficiency and performance per clock cycle. The
final NetBurst-derived products were released in 2007, with all subsequent product families switching
exclusively to the Core microarchitecture.

(Image: Pentium 4 Willamette 1.5 GHz on Socket 423)
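
The clock-scaling wall is easiest to see in the first-order dynamic power model, P = C * V^2 * f, on top of which the unanticipated leakage current was added. A sketch with illustrative constants only:

    # First-order dynamic power: P = C * V^2 * f. Constants are
    # illustrative, not measured values; leakage is not modeled.
    def dynamic_power_w(c_farads, v_volts, f_hz):
        return c_farads * v_volts ** 2 * f_hz

    base = dynamic_power_w(1e-9, 1.40, 3.0e9)  # ~5.9 W per modeled unit
    fast = dynamic_power_w(1e-9, 1.50, 3.8e9)  # higher f usually needs higher V
    print(fast / base)  # ~1.45x the power for ~1.27x the clock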
Pentium D
The Pentium D[2] brand refers to two series of desktop dual-core 64-bit x86-64 microprocessors with the
NetBurst microarchitecture, the dual-core variant of the Pentium 4 "Prescott" manufactured by Intel. Each
CPU comprised two dies, each containing a single core, residing next to each other on a multi-chip module
package. The brand's first processor, codenamed Smithfield, was released by Intel on May 25, 2005. Nine
months later, Intel introduced its successor, codenamed Presler,[3] but without offering significant upgrades
in design,[4] still resulting in relatively high power consumption.[5] By 2004, the NetBurst processors had
reached a clock speed barrier at 3.8 GHz due to a thermal (and power) limit exemplified by the Presler's
130 watt thermal design power[5] (a higher TDP requires additional cooling that can be prohibitively noisy
or expensive). The future belonged to more energy-efficient, slower-clocked dual-core CPUs on a single die
instead of two.[6] The final shipment date of the dual-die Presler chips was August 8, 2008,[7] which
marked the end of the Pentium D brand and also of the NetBurst microarchitecture.

General Info
Launched: May 25, 2005
Discontinued: August 8, 2008[1]
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 2.66 GHz to 3.73 GHz
FSB speeds: 533 MT/s to 1,066 MT/s
Architecture and classification
Min. feature size: 90 nm to 65 nm
Microarchitecture: NetBurst
Instruction set: MMX, SSE, SSE2, SSE3, and x86-64
Physical specifications
Cores: 2 (2×1)
Socket(s): LGA 775
Products, models, variants
Core name(s): Smithfield, Presler
History
Predecessor: Pentium 4
Successor: Pentium Dual-Core, Core 2

Pentium D/Extreme Edition


The dual-core CPU is capable of running multi-threaded applications typical in transcoding of audio and
video, compressing, photo and video editing and rendering, and ray-tracing. Single-threaded applications,
including most older games, do not benefit much from a second core compared to an equally clocked single-
core CPU. Nevertheless, the dual-core CPU is useful to run both the client and server processes of a game
without noticeable lag in either thread, as each instance could be running on a different core. Furthermore,
multi-threaded games benefit from dual-core CPUs.
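
As a minimal illustration of that client/server split, the sketch below pins two hypothetical worker processes to different cores. It assumes Linux, since it relies on os.sched_setaffinity; the workload is a stand-in, not a real game:

    # Hypothetical sketch: run two processes pinned to separate cores,
    # as when a game's client and server instances share a dual-core CPU.
    # Linux-only: os.sched_setaffinity is not available elsewhere.
    import os
    from multiprocessing import Process

    def worker(core, label):
        os.sched_setaffinity(0, {core})           # pin this process to one core
        total = sum(i * i for i in range(10**7))  # stand-in for real work
        print(label, "finished on core", core, "->", total)

    if __name__ == "__main__":
        procs = [Process(target=worker, args=(0, "client")),
                 Process(target=worker, args=(1, "server"))]
        for p in procs:
            p.start()
        for p in procs:
            p.join()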

In 2008, many business applications were not optimized for multiple cores. When not multitasking, they ran
at similar speeds on the Pentium D as on older Pentium 4 branded CPUs at the same clock speed. However,
in multitasking environments such as BSD, Linux, and Microsoft Windows operating systems, other
processes are often running at the same time; if they require significant CPU time, each core of the
Pentium D branded processor can handle different programs, improving overall processing speed over its
single-core Pentium 4 counterpart.

The competing Athlon 64 X2, although running at lower clock rates and lacking Hyper-threading, had some
significant advantages over the Pentium D, such as an integrated memory controller, a high-speed
HyperTransport bus, a shorter pipeline (12 stages compared to the Pentium D's 31), and better floating point
performance,[11] more than offsetting the difference in raw clock speed. Also, while the Athlon 64 X2
inherited mature multi-core control logic from the multi-core Opteron, the Pentium D was seemingly rushed
to production and essentially consisted of two CPUs in the same package. Indeed, shortly after the launch of
the mainstream Pentium D branded processors (26 May 2005) and the Athlon 64 X2 (31 May 2005), a
consensus arose that AMD's implementation of multi-core was superior to that of the Pentium D.[12][13] As
a result of this and other factors, AMD surpassed Intel in CPU sales at US retail stores for a period of time,
although Intel retained overall market leadership because of its exclusive relationships with direct sellers
such as Dell.[14]
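
The pipeline-length gap matters mainly on branch mispredicts, which cost roughly a pipeline refill. A rough sketch using the stage counts above; the Athlon clock below is an example value, not from the text:

    # Mispredict penalty ~ pipeline_stages / clock. Stage counts (31 vs 12)
    # come from the text; the Athlon clock below is only an example.
    def mispredict_ns(stages, clock_ghz):
        return stages / clock_ghz

    print(mispredict_ns(31, 3.73))  # Pentium D at its top clock: ~8.3 ns
    print(mispredict_ns(12, 2.40))  # Athlon 64 X2 (example clock): 5.0 ns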

Comparison to Pentium Dual-Core


In 2007, Intel released a new line of desktop processors under the Pentium Dual-Core brand, using the Core
microarchitecture (which was based upon the Pentium M architecture, itself based upon the Pentium III).
The newer Pentium Dual-Core processors give off considerably less heat (65 watt max) than the Pentium D
(95 or 130 watt max). They also run at lower clock rates, have only up to 2 MB of L2 cache while the
Pentium D has up to 2×2 MB, and they lack Hyper-threading.

The Pentium Dual-Core has a wider execution unit (four issues wide compared to the Pentium D's three),
and its 14-stage pipeline is less than half the length of the Pentium D's, allowing it to outperform the
Pentium D in most applications despite lower clock speeds and less L2 cache memory.
Pentium Dual-Core
The Pentium Dual-Core brand was used for mainstream x86-architecture microprocessors from Intel from
2006 to 2009, when it was renamed to Pentium. The processors are based on either the 32-bit Yonah or
(with quite different microarchitectures) the 64-bit Merom-2M, Allendale, and Wolfdale-3M cores, targeted
at mobile or desktop computers.

In terms of features, price, and performance at a given clock frequency, Pentium Dual-Core processors were
positioned above Celeron but below Core and Core 2 microprocessors in Intel's product range. The Pentium
Dual-Core was also a very popular choice for overclocking, as it could deliver high performance (when
overclocked) at a low price.

General Info
Launched: 2006
Discontinued: 2009
Common manufacturer(s): Intel
Performance
Max. CPU clock rate: 1.3 GHz to 3.4 GHz
FSB speeds: 533 MHz to 800 MHz
Architecture and classification
Min. feature size: 65 nm to 45 nm
Microarchitecture: Core
Instruction set: MMX, SSE, SSE2, SSE3, SSSE3, x86-64
Physical specifications
Cores: 2
Socket(s): LGA 775, Socket M, Socket P
Products, models, variants
Core name(s): Yonah, Merom-2M, Allendale, Wolfdale-3M
History
Predecessor: Pentium M, Pentium D
Successor: Pentium (2009)

Processor cores

In 2006, Intel announced a plan[1] to return the Pentium trademark from retirement to the market, as a
moniker of low-cost Core microarchitecture processors based on the single-core Conroe-L but with 1 MiB
of cache. The identification numbers for those planned Pentiums were similar to the numbers of the later
Pentium Dual-Core microprocessors, but with the first digit "1" instead of "2", suggesting their single-core
functionality. A single-core Conroe-L with 1 MiB cache was deemed not strong enough to distinguish the
planned Pentiums from the Celerons, so it was replaced by dual-core CPUs, adding "Dual-Core" to the
line's name. Throughout 2009, Intel changed the name back from Pentium Dual-Core to Pentium in its
publications. Some processors were sold under both names, but the newer E5400 through E6800 desktop
and SU4100/T4x00 mobile processors were not officially part of the Pentium Dual-Core line.
Intel Pentium Dual-Core processor family

Desktop:
  Allendale (65 nm), dual-core, released Jun 2007
  Wolfdale (45 nm), dual-core, released Aug 2008
Laptop:
  Yonah (65 nm), dual-core, released Jan 2007
  Merom (65 nm), dual-core, released Nov 2007
  Penryn (45 nm), dual-core, released Dec 2008

List of Intel Pentium Dual-Core microprocessors

Allendale

Subsequently, on June 3, 2007, Intel released the desktop Pentium Dual-Core branded processors[4] known
as the Pentium E2140 and E2160.[5] An E2180 model was released later, in September 2007. These
processors support the Intel 64 extensions, being based on the newer, 64-bit Allendale core with the Core
microarchitecture. They closely resembled the Core 2 Duo E4300 processor, with the exception of having
1 MB of L2 cache instead of 2 MB.[2] Both had an 800 MHz FSB. They targeted the budget market above
the Intel Celeron (Conroe-L single-core series) processors, which featured only 512 KB of L2 cache. This
step marked a change in the Pentium brand, relegating it to the budget segment rather than its former
position as the mainstream/premium brand.[6] These CPUs are highly overclockable.[7]

(Image: Intel Pentium E2220 @ 2.40 GHz (Allendale) installed)
Rebranding
The Pentium Dual-Core brand was discontinued in early 2010 and replaced by the Pentium name. The
desktop E6000 series and the OEM-only mobile Pentium SU2000 and all later models were always called
Pentium, but the desktop Pentium Dual-Core E2000 and E5000 series processors had to be rebranded.

Comparison to the Pentium D


Although using the Pentium name, the desktop Pentium Dual-Core is based on the Core microarchitecture,
which can clearly be seen when comparing its specification to the Pentium D, which is based on the
NetBurst microarchitecture first introduced in the Pentium 4. Where the Core 2 Duo has 2 or 4 MiB of
shared L2 cache, the desktop Pentium Dual-Core below it has 1 or 2 MiB of shared L2 cache. In contrast,
the Pentium D processors have either 2 or 4 MiB of non-shared L2 cache. Additionally, the fastest-clocked
Pentium D has a factory limit of 3.73 GHz, while the fastest-clocked desktop Pentium Dual-Core reaches
3.2 GHz. A major difference among these processors is that the desktop Pentium Dual-Core processors
have a TDP of only 65 W while the Pentium D ranges between 95 and 130 W. Despite the reduced clock
speed and lower amounts of cache, the Pentium Dual-Core outperformed the Pentium D by a fairly large
margin.
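
The issue-width point can be made with simple arithmetic, using the figures from this section; these are theoretical peaks, which real code never sustains:

    # Peak issue rate = issue_width * clock, in billions of instructions/s.
    # Widths (4 vs 3) and top clocks (3.2 vs 3.73 GHz) are from the text;
    # these are theoretical peaks, not sustained throughput.
    def peak_gips(issue_width, clock_ghz):
        return issue_width * clock_ghz

    print(peak_gips(4, 3.20))  # desktop Pentium Dual-Core: 12.8
    print(peak_gips(3, 3.73))  # Pentium D: ~11.2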
