PH 208 P lecture #6

16-bit microprocessors: INTEL 8086/8088 and 80186/286, Zilog Z8001/2, Motorola M68000,
Digital Equipment LSI-11, National Semiconductor NS16000
Apart from design concept & instruction set, they differ in pin count: the '70s trend was 40 pins, but 48- and 68-pin packages are also used.
Primary objectives of 16-bit μP:
• Increase memory addressing
• Increase execution speed
• Provide powerful instruction set
• Facilitate programming in high level languages
• Function in a multiprocessor environment

• Meeting these objectives requires a change in design concepts
• The memory addressing limit is set by the number of pins on the IC
• INTEL iAPX 8086/8088 – 16-bit μP with 40 pins – 1 Mbyte memory addressing – 5 to 10 MHz
• 8088 – 8-bit data bus – same internal architecture & instruction set as 8086 – 16-bit
data is transferred as two 8-bit bytes – ‘8-bit μP with the power of a 16-bit μP’.
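The 8088's two-step transfer of 16-bit data can be sketched as below. This is a minimal illustration, not Intel's bus logic: the helper names `split_word` and `join_bytes` are hypothetical, and the low-byte-first order is assumed from the usual x86 byte ordering.

```python
def split_word(word):
    """Split a 16-bit word into (low_byte, high_byte), as an 8-bit bus would carry it."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return word & 0xFF, (word >> 8) & 0xFF


def join_bytes(low, high):
    """Rebuild the 16-bit word from its two 8-bit halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


low, high = split_word(0x1234)
print(hex(low), hex(high))         # 0x34 0x12
print(hex(join_bytes(low, high)))  # 0x1234
```

An 8086 with its 16-bit data bus could move the same word in a single transfer; the program sees the same result either way.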

Signals (7 categories)
• Address bus, data bus, control and status signals, external requests, response to
external signals, power and clock + multiprocessor environment
A. Srinivasan, Department of Physics, Indian Institute of Technology Guwahati
• Data bus and status signals are multiplexed with the address bus.
• MN/MX: selects one of 2 operating modes (Min/Max mode). MN is for a single-μP environment. MX is for a multiprocessor environment, such as one with a coprocessor: 8 pins are assigned different tasks and a bus controller (8288) is necessary for generating control signals.
• TEST: for synchronizing in a multiprocessor environment. If a WAIT instruction is executed and TEST = 1, the μP suspends execution and idles until TEST goes low.
• DEN: Data enable – enables the bi-directional buffer that isolates the MPU from the system bus.
• DT/R: Data transmit/receive – sets the direction of the same bi-directional buffer.
• M/IO: distinguishes a memory access from an I/O access.
• BHE/S7: Bus high enable – enables the higher byte of 16-bit data. S7-8088.

• μP asserts ALE (address latch enable) at the start of a machine cycle, when the address needs to be placed on the multiplexed address bus.
• In the 8088, A15-A8 are not multiplexed, so only 2 latches are required.
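The role of ALE can be illustrated with a small model of a transparent latch on the multiplexed pins. This is only a sketch: the class name `AddressLatch` and the example bus values are hypothetical, and real latch timing is ignored.

```python
class AddressLatch:
    """Transparent latch: follows its input while ALE is high, holds it while ALE is low."""

    def __init__(self):
        self.value = 0

    def update(self, ad_bus, ale):
        if ale:                  # ALE high: the address passes through and is captured
            self.value = ad_bus
        return self.value        # ALE low: the held address stays on the output


latch = AddressLatch()
latch.update(0x2F, ale=1)        # start of machine cycle: address on the shared pins
addr = latch.update(0xA5, ale=0) # later in the cycle: the same pins now carry data
print(hex(addr))                 # 0x2f
```

The latch output keeps the address stable for memory while the shared pins are reused for data, which is why each group of multiplexed lines needs its own latch.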


• 8088 MPU in maximum mode (with demultiplexed address bus A19-A0).
• 8 pins are reassigned; the 8288 bus controller generates the control signals from the status signals (to compensate for the lost signals).
• DT/R and DEN are used to buffer the data bus to reduce loading.

Internal architecture
Two processing units: EU (Execution Unit) and BIU (Bus Interface Unit)
[Figure: processing with one processing unit compared with two processing units (BIU and EU)]
• Four 16-bit general-purpose registers: AX, BX, CX, DX (equivalent to four accumulators)
• 4 memory pointers: SP (stack pointer), BP (base pointer), SI (source index), DI (destination index)
• 4 segment registers: CS (code segment), DS (data segment), SS (stack segment), ES (extra segment)
• Segment register + memory pointer = memory address
• IP (instruction pointer): same as the PC in the 8085
• Flag register (9 flags) in 2 groups: 6 data flags + 3 control flags
OF (overflow): for signed numbers
DF (direction flag): string index increment/decrement
IF (interrupt flag): like EI, DI in the 8085
TF (trap flag): single stepping of instructions
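As an illustration of the 9 flags, the sketch below decodes a 16-bit flag word into flag names. The bit positions are the standard 8086 ones (they are not given on the slide), and `decode_flags` is a hypothetical helper name.

```python
# Standard 8086 flag bit positions: 6 data flags and 3 control flags.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,   # data flags
    "TF": 8, "IF": 9, "DF": 10,                    # control flags
    "OF": 11,                                      # data flag (signed overflow)
}


def decode_flags(value):
    """Return the names of the flags that are set in a 16-bit flag word."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


print(decode_flags(0x0241))  # ['CF', 'ZF', 'IF']
```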

Memory segmentation
• In order to access 1 MB, a 20-bit address is required. But the IP register is only 16 bits wide. So memory
is segmented by using segment registers (SRs).
• SRs assign the memory base address: CS (instructions), DS (data), SS (stack), ES (additional data)
• SR + MP (memory pointer) = 20-bit address. The default combination can be overridden by an instruction.
If the program is < 64K, all SRs can be defined at the same base address.
If the program is > 64K, the segments can be kept separate to avoid overlap.

• Memory address format:
physical: the 20-bit address
offset: address of an instruction (or data) relative to the base address in the SR
logical: combination of a segment and an offset address
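The step from a logical to a physical address can be sketched as follows. The 8086 shifts the 16-bit segment value left by 4 bits (multiplies it by 16) and adds the 16-bit offset; the function name `physical_address` and the example values are illustrative only.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    # The segment register holds the upper 16 bits of a 20-bit base address.
    return ((segment << 4) + offset) & 0xFFFFF   # wraps around at 1 MB


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350 (another logical address, same location)
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wrap-around past 1 MB)
```

Because segments start every 16 bytes and each spans 64K, many logical addresses map to the same physical address, which is what allows segments to overlap or coincide for small programs.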

INTEL 8086 INSTRUCTION SET (135 instructions)
Can operate on individual bits, bytes, 16-bit & 32-bit words, signed & BCD numbers, ASCII characters.
• DATA TRANSFER INSTRUCTIONS
GEN-PURPOSE BYTE OR WORD TRANSFER {MOV, PUSH, POP, XCHG, XLAT}
SIMPLE INPUT AND OUTPUT PORT TRANSFER {IN, OUT}
SPECIAL ADDRESS TRANSFER {LEA, LDS, LES}
FLAG TRANSFER {LAHF, SAHF, PUSHF, POPF}

• ARITHMETIC INSTRUCTIONS
ADDITION {ADD, ADC, INC, DAA, AAA}
SUBTRACTION {SUB, SBB, CMP, DEC, NEG, AAS, DAS}
MULTIPLICATION {MUL, IMUL, AAM}
DIVISION {DIV, IDIV, AAD, CBW, CWD}

• BIT MANIPULATION INSTRUCTIONS
LOGICAL {NOT, AND, OR, XOR, TEST}
SHIFT {SHL/SAL, SHR, SAR}
ROTATE {ROL, ROR, RCL, RCR}

• STRING INSTRUCTIONS {REP, REPE/REPZ, REPNE/REPNZ, MOVS/MOVSB/MOVSW,
CMPS/CMPSB/CMPSW, SCAS/SCASB/SCASW,
LODS/LODSB/LODSW, STOS/STOSB/STOSW}
• PROGRAM EXECUTION TRANSFER INSTRUCTIONS
UNCONDITIONAL TRANSFER {JMP, CALL, RET}
CONDITIONAL TRANSFER {JA/JNBE, JAE/JNB, JB/JNAE, JBE/JNA, JC, JE/JZ, JG/JNLE,
JGE/JNL, JL/JNGE, JLE/JNG, JNC, JNE/JNZ, JNO, JNP/JPO,
JNS, JO, JP/JPE, JS}
ITERATION CONTROL {LOOP, LOOPE/LOOPZ, LOOPNE/LOOPNZ, JCXZ}
INTERRUPT {INT, INTO, IRET}

• PROCESS CONTROL INSTRUCTIONS


FLAG SET/CLEAR {STC, CMC, CLC, STD, CLD, STI, CLI}
EXTERNAL H/W SYNCHRONIZATION {HLT, NOP, WAIT, ESC, LOCK}
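To show how the string instructions and the direction flag set by STD/CLD work together, here is a rough Python model of REP MOVSB. It is a simplified sketch: memory is a flat `bytearray`, segment registers are ignored, and the function name `rep_movsb` is hypothetical.

```python
def rep_movsb(mem, si, di, cx, df=0):
    """Model of REP MOVSB: copy CX bytes from [SI] to [DI], stepping by the direction flag."""
    step = -1 if df else 1          # DF = 0 (CLD): increment; DF = 1 (STD): decrement
    while cx != 0:                  # REP repeats the string instruction until CX reaches 0
        mem[di] = mem[si]           # MOVSB: move one byte
        si = (si + step) & 0xFFFF   # index registers are 16 bits wide
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


mem = bytearray(32)
mem[0:5] = b"HELLO"
si, di, cx = rep_movsb(mem, si=0, di=16, cx=5)
print(bytes(mem[16:21]), si, di, cx)  # b'HELLO' 5 21 0
```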

OTHER 16-bit P’s (80x86 family) Prefetched pipeline, II processing, mem. mgt
l

• 80186
68 pin DIP, 80186 (8 & 10 MHz), integrated to reduce chip count rather than improving
memory addr. Multiplexed addr & data bus with addl lines for clk gen, interrupt controllers,
DMA controller, a chip select unit.
• 80286 (also in 68 pin DIP)
Different architectural philosophy, No multiplexing of buses (uses 24 lines to addr 16Mbytes
Can support memory management unit (MMU) to adddress 1Gb of memory (Virtual Mem.),
in-built protection mechanism (to protect user progr.), Multi-user environ & closer to 80386.
High-end high-performance μPs from INTEL (80X86): 32- & 64-bit μPs
Trend of evolution: 1) multiuser, multitasking, time-sharing environments; 2) distributed
processing interconnected with networks.
Single-user – unlimited access to all aspects. But multi-user environment requires

A multi-user operating system & a new architectural design are needed to handle all these!
Intel 80386 and 80486 (32-bit processors):
• 132-pin grid array package + non-multiplexed 32-bit address bus – 20 MHz to 33 MHz
– 4 Gbytes of physical memory – 64 Tbytes (2^46 bytes) of virtual memory through the MMU.
• Can operate in real mode (physical address space is 1 Mbyte with a 20-bit address bus) and protected
mode (physical address space is 4 Gbytes with a 32-bit address bus). The main difference is in memory space & addressing schemes.
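The memory sizes quoted in this lecture follow directly from the number of address bits, since n bits select 2^n byte locations. A small check, with hypothetical helper names `address_space_bytes` and `format_size`:

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def address_space_bytes(address_bits):
    """Number of byte locations reachable with the given number of address bits."""
    return 2 ** address_bits


def format_size(n):
    """Express a power-of-two byte count in binary units (1 KB = 1024 bytes)."""
    index = 0
    while n >= 1024 and index < len(UNITS) - 1:
        n //= 1024
        index += 1
    return f"{n} {UNITS[index]}"


for bits in (20, 24, 32, 46):
    print(bits, format_size(address_space_bytes(bits)))
# 20 1 MB   (8086/8088, 80386 real mode)
# 24 16 MB  (80286)
# 32 4 GB   (80386 protected mode)
# 46 64 TB  (80386 virtual memory)
```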
Functional signal groups in Intel 80386:
• 32-bit registers
• Interrupts and external request signals are similar to 8085.
• Has new signals for co-processor interfacing.
• Some signals for special functions: byte enable signals on the address bus (for identifying which groups of data lines are active in 32-bit data) and, in bus control, BS16# (to connect directly to 32-bit or 16-bit buses).
Instruction set and addressing modes:
• Nine categories of instructions and eleven addressing modes.
• Operands can be single bit, string of bits, signed and unsigned 8-, 16-, 64-bit data,
ASCII characters and BCD numbers.
• ENTER, LEAVE, ARPL (Adjust Requested Privilege Level), VERR/VERW (Verify Segment for Reading/Writing): support for high-level languages & operating systems
• MMU and better segmentation / paging (4-Kbyte pages) simplify swapping between
physical memory & disk; 4-level protection mechanism (0 = highest, 3 = lowest privilege)

• 32-bit registers can be used as 32-, 16- or 8-bit registers (EAX, AX, AL/AH)
• EIP or IP is equivalent to the PC
• 14 bits of the 32-bit flag register are used as flags (8086 has 9 flags, 80286 has 11, 80386 has 13; the I/O privilege level takes 2 bits):
6 for data (S, Z, CY, AC, O, P) and 3 for machine operation (interrupt, single-step, string) + I/O
privilege level (in protected mode, to determine usable I/O instructions), nested task (to show the link
between two tasks), virtual mode (8086-compatible mode) and resume flag, RF (to work
with breakpoints).
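The register sub-division mentioned above (EAX, AX, AL/AH) means the smaller names are views onto parts of one 32-bit value, not separate storage. A minimal model, using a hypothetical class `Reg32` with attributes `e`, `x`, `h` and `l` standing in for EAX, AX, AH and AL:

```python
class Reg32:
    """One 32-bit register with 16-bit (x) and 8-bit (h, l) views of its low half."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32 bits, e.g. EAX

    @property
    def x(self):                             # low 16 bits, e.g. AX
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 15-8, e.g. AH
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 7-0, e.g. AL
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
print(hex(eax.x), hex(eax.h), hex(eax.l))  # 0x5678 0x56 0x78
eax.l = 0xFF                               # writing AL changes only the lowest byte
print(hex(eax.e))                          # 0x123456ff
```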
INTEL 80486:
• Upgraded and faster version with 168-pin grid array package (DX). Clock speed of 25 to 66
MHz. 1.2 million transistors (as compared to 300,000 in 80386!).
• Important additional features over the 80386 are a built-in math coprocessor (3x faster than 80386),
8 Kbyte of on-chip cache memory for code and data, and a highly pipelined execution unit (so execution
time for most instructions is one clock cycle).

INTEL Pentium Processor:
• 32-bit address bus and 64-bit data bus (clock speed of up to 233 MHz). 7 GPRs, each 32-bit long.
• 273-pin grid array package with 3.1 million transistors
• Mainframe features – suited for desktop PC applications including 3D graphics.
• Advanced design features:
1) Superscalar architecture (2 execution units with dual pipeline architecture)
2) On-chip cache memory for code & data (two 8K caches for commonly used code/data – level 1)
3) Branch prediction (mainframe technique – most likely instructions are predicted & pipelined)
4) High-performance floating-point unit (7-stage pipeline & hard-wired codes – 5-10x of 486)
5) Performance monitoring (user can monitor internal events and optimize by removing
bottlenecks in code)
INTEL Pentium processor (1993)
‘μP that could not count’: error in the 6th/9th decimal place in some division sums – 2 simultaneous design errors – was fixed!
7 GP registers, 32 bits wide; 32/16/8 bit (3), 32/16 bit (3)
[Figure: Pentium block diagram; surviving labels: pre-fetch buffers (32b + 32b), 8 kB + 8 kB caches, 256-entry cache]
• 66 MHz
• 8-kbyte data and code caches
• Burst mode to read & write data: a 128-bit cache line is fed by the 64-bit data bus in two fills without μP intervention.
• Two 32-bit pre-fetch buffers, one for each ALU.
• The instruction decoder (ID) has 2 outputs, for the ‘u’ & ‘v’ ALUs.
• The control unit (CU) controls the ALUs, which work in parallel.
• 5-step pipeline can execute 2 instructions in 1T.
• ‘u’ pipe → all commands except floating-point arithmetic; ‘v’ pipe → more limited range.
• Certain drawbacks in parallel processing: say, add 2 numbers and divide by 5 (the second step needs the result of the first).
• FPU has an 8-stage pipeline and multiply & divide hardware; calculations are 10x faster than on the 80486.
• Eight 80-bit-wide, stacked FPU registers.
• Branch prediction: on a JMP, ‘flushing’ the pipeline wastes ~5T. The next instruction is guessed correctly ~85% of the time.
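The benefit of branch prediction can be estimated from the two figures above, reading "85%" as the fraction of branches guessed correctly (an interpretation of the slide). This is a back-of-the-envelope sketch that ignores every other pipeline effect; `expected_flush_cycles` is a hypothetical helper name.

```python
def expected_flush_cycles(accuracy, flush_penalty):
    """Average clock periods lost per branch when only mispredictions flush the pipeline."""
    return (1.0 - accuracy) * flush_penalty


no_prediction = expected_flush_cycles(0.0, 5)     # every branch flushes the pipeline
with_prediction = expected_flush_cycles(0.85, 5)  # only 15% of branches flush it
print(no_prediction, round(with_prediction, 2))   # 5.0 0.75
```

Under these assumptions the average cost of a branch drops from about 5T to under 1T.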

INTEL MMX (MultiMedia eXtension) Pentium:
• An addition to the standard Pentium to increase the speed of multimedia, communication
and other applications with a large number of repetitive calculations.
• SIMD: INTEL looked for time-consuming common operations, e.g. changing the
colour of a pixel. This led to an idea called Single Instruction Multiple Data (SIMD).
SIMD performs the same operation on multiple items of data, and these are
processed in parallel. MMX allows 8 pixels to be moved around and processed
together. SIMD is the heart of MMX.
• FPRs: MMX instructions control the 8 FPRs and 8 more registers for holding
addresses, loop control, data manipulation instructions, etc. FPRs are highly
flexible: the 64-bit mantissa can be used as 8 separate bytes, four 16-bit words,
two 32-bit ‘doublewords’ or a single 64-bit ‘quadword’.
• Saturation arithmetic (SA): In fixed-point arithmetic, 1 + 1111 = 0000 with the
overflow flag being ‘set’. Checking the flag consumes time. In graphics
(say, ‘shading’), a sudden drop to zero gives an unwanted colour change.
SA avoids this wrap-around effect by holding the result at the saturation value.
[Figure: accidental wrap-around and restart]
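The difference between ordinary wrap-around and saturation arithmetic on 8 packed bytes can be sketched as follows. Plain Python lists stand in for a 64-bit register holding 8 pixels; the function names and pixel values are made up for illustration, and the loop only imitates what the hardware does in one instruction.

```python
def packed_add_wrap(a, b):
    """Add 8 unsigned bytes element-wise; results wrap around modulo 256."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def packed_add_saturate(a, b):
    """Add 8 unsigned bytes element-wise; results are held at the maximum, 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


pixels = [250, 100, 255, 0, 10, 200, 128, 64]   # 8 pixel intensities
brighten = [10] * 8                             # same operation applied to all 8

print(packed_add_wrap(pixels, brighten))
# [4, 110, 9, 10, 20, 210, 138, 74]     bright pixels suddenly become dark
print(packed_add_saturate(pixels, brighten))
# [255, 110, 255, 10, 20, 210, 138, 74] bright pixels stay at full brightness
```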
INTEL Pentium 4:
• 32-bit (IA-32) processor; uses 0.13 μm technology
• Operating voltage reduced from 1.75 V to 1.5 V.
• 478 pins and 55 million transistors
• 55 W in a 10 mm x 10 mm circuit, so a heat sink & fan are needed.
As the μP overheats, fan speed increases & the operating
speed of the μP decreases. If T > 69 ˚C, the μP is shut down.
• System bus (Front Side Bus) is ‘Quad
Pumped’ (shifts 4 lots of data along the bus per clock).
133 MHz x 4 ~ like a bus running @ 533 MHz.
• Incoming & outgoing info is stored in the 256 kB
level 2 Advanced Transfer Cache.
• Better branch prediction → hyper pipeline
• Branch predictor loads data to buffer & decoder.
• Up to 12000 decoded instructions (micro-operations,
or μOPs) are stored in the L1 Execution Trace Cache.
• P4 has a 20-stage pipeline → 128 instructions at a time
• μOPs are arranged in order as info to be stored in
memory & arithmetic operations (FP & integer).
• ALU uses MMX and Streaming SIMD Extensions 2 (SSE2) instructions
The Celeron
Trade-off between speed and price! The solution was to use Pentium design and
take out non-essential areas to make the h/w cheaper. e.g. in the 2 GHz Celeron,
price reduction was achieved by reducing the L2 cache from 512 kB to 128 kB, and
the FSB down from 533 to 400 MHz.
INTEL Core Processors:
• The Yonah CPU (65 nm) was launched in Jan. 2006 under the Core brand.
• Single- and dual-core mobile versions were sold under the Core Solo, Core Duo, and Pentium
Dual-Core brands. A server version was released as Xeon LV.
• Features: Streaming SIMD Extensions 3 (SSE3) support; single- & dual-core technology
with 2 MB of shared L2 cache (restructuring the μP organization); increased FSB
speed, with the FSB running at 533 MHz or 667 MHz; a 12-stage instruction pipeline.
In July 2006, the Core 2 processor with Core microarchitecture (65/45 nm) was
launched. Subsequently, Xeon, Pentium and Celeron processors based on Core 2 were released.
The Core microarchitecture is Intel's final mainstream processor line to use the FSB, with
all later Intel processors, based on Nehalem (45 nm) and subsequent Intel
microarchitectures, exclusively using the QuickPath Interconnect (QPI) or Direct
Media Interface (DMI) bus.
Improvements over the Intel Core processors were:
• A 14-stage instruction pipeline to achieve significantly higher clock speeds
• SSE3 support for all models & SSE4.1 support for Core 2 models with 45 nm architecture
• An x86-64 (64-bit) instruction set is added, allowing all Core 2 processors to run 64-bit applications
• Increased FSB speed, with the FSB running at 1.6 GHz.
• Increased L2 cache size (ranging from 1 MB to 12 MB)
• Some mobile Core 2 Duo processors support Dynamic Front Side Bus Throttling,
with the FSB running at half of its full speed in Super Low Frequency Mode,
therefore reducing the core speed to half of its full speed as well. This technique
allows the processors to consume less power, increasing battery life.
• Some mobile Core 2 Duo processors have Dynamic Acceleration Technology, while
mobile Core 2 Quad processors support Dual Dynamic Acceleration Technology.
For a mobile Core 2 Duo, this feature allows the CPU to overclock one processor
core while turning off the other one. As for a mobile Core 2 Quad, two cores can
be overclocked. The processor does this if an application only uses a single core or
two as a minimum requirement to function effectively and the clock multiplier is
only increased by 1.

Trends in high-performance processors
CISC (Complex Instruction Set Computing) → RISC (Reduced Instruction Set Computing)
CISC: The instruction decoder or control unit has a micro program, which provides the steps needed to carry out the instruction. So this μP is actually run by s/w!
RISC: Reduced instructions that are simple & fast in execution. Replaces the micro program with h/w to execute the steps. All instructions are of the same length, so easy to pipeline. Simple load and store instructions; everything else is done internally with a large number of registers instead of RAM.
Executing a logical AND using logic gates takes 8 ns, but 80386 instructions (@ 25 MHz) take 80 ns.
RISC μP: Alpha → Digital Equipment Corp., PA7100 → HP, PowerPC → Apple, SuperSPARC → Sun Microsystems; 32-136
registers, up to 200 MHz, mainly workstations and servers (except PowerPC).
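The timing comparison above (8 ns for the gates against 80 ns for an 80386 instruction at 25 MHz) can be restated in clock periods. A quick check, with hypothetical helper names:

```python
def clock_period_ns(freq_mhz):
    """Length of one clock period in nanoseconds."""
    return 1000.0 / freq_mhz


def clocks_needed(duration_ns, freq_mhz):
    """How many clock periods a given duration spans."""
    return duration_ns / clock_period_ns(freq_mhz)


print(clock_period_ns(25))    # 40.0 ns per clock at 25 MHz
print(clocks_needed(80, 25))  # 2.0 clocks for an 80 ns instruction
print(80 / 8)                 # 10.0 times longer than the gate delay alone
```

So the quoted 80 ns corresponds to 2 clock periods, while the gates themselves need only a fraction of one; this gap is the overhead RISC designs try to remove by executing the steps directly in hardware.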
Some general features of RISC processors are
