Unit 1

The Intel 86 Family of Processors
Processor | Year | Internal Architecture (bits) | External Bus Size (bits) | Transistors | Principal Features
----------------------------------------------------------------------------------------------------------
86 | 1978 | 16 | 16 | 29K | 16-bit architecture, basic segmentation
88 | 1979 | 16 | 8 | 29K | Same as 86, but with an 8-bit external bus (IBM PC)
286 | 1982 | 16 | 16 | 130K | Expands segmentation protection, adds single-instruction task switching (used in IBM PC/AT)
Intel386 DX | 1985 | 32 | 32 | 275K | Adds paging, 32-bit extensions, on-chip address translation, and greater speed to 286 functions
Intel386 SX | 1988 | 32 | 16 | 275K | Same as Intel386 DX, but with a 16-bit data bus
Intel486 DX | 1989 | 32 | 32 | 1,200K | Adds on-chip cache, floating-point unit, and greater speed to Intel386
Intel486 SX | 1991 | 32 | 32 | - | No math coprocessor, lower cost
Intel486 DX2 | 1992 | 32 | 32 | 1.2M | Doubles internal clock speed
Pentium (P5, 60/66 MHz) | 1993 | 32 | 64 | 3.1M | Superscalar, separate code and data caches, 64-bit data bus
Pentium (P54C) | 1994 | 32 | 64 | 3.3M | 3.3 V operation, power management, multiprocessor support
Pentium Pro | 1995 | 32 | 64 | 5.5M (CPU die) | L1 and L2 caches in the CPU package, dynamic execution, GTL logic
FEATURES OF 80486
• The 80486 was introduced in 1989.
• » Improved version of the 386
• » Integrates the coprocessor functions for floating-point arithmetic on the chip
• » Adds parallel execution capability to the instruction decode and execution units
• – Achieves scalar execution of 1 instruction per clock
BUS INTERFACE UNIT (BIU)
• The bus interface unit organizes all the bus activities of the processor.
• The address driver connects the internal 32-bit address output of the cache to the system bus. The data bus transceivers are connected between the internal 32-bit data bus and the system bus. The write data buffer is a queue of four 80-bit registers that holds the data waiting to be written to memory.
• Because write operations are pipelined, the data must be available in advance.
• Bus access and bus operations are controlled with the following bus control and request sequencer signals: ADS#, W/R#, D/C#, M/IO#, PCD, PWT, RDY#, LOCK#, PLOCK#, BOFF#, A20M#, BREQ, HOLD, HLDA, RESET, INTR, NMI, FERR#, and IGNNE#.
EXECUTION UNIT (EU) AND CONTROL UNIT (CU)
• The parity generation and control unit generates parity and performs parity checking during processor operation. The boundary scan control unit performs boundary scan tests to verify the correct operation of the components of the circuit on the motherboard.
• The prefetcher unit fetches code from memory and arranges it in a 32-byte code queue. The instruction decoder receives code from the code queue and decodes the instructions sequentially. The decoder output is fed to the control unit, which derives the control signals used to execute the decoded instructions. Before execution, the protection unit checks all the protection rules; if there is any violation, an appropriate exception is generated.
• The control ROM stores a microprogram that generates the control signals for executing instructions. The register bank and ALU perform their usual operations, just as in the 80286.
• The barrel shifter performs shift and rotate operations.
• The segmentation unit, descriptor registers, paging unit, translation look-aside buffer, and limit and attribute PLA work together to manage virtual memory. These units also protect the op-codes and operands in physical memory.
FLOATING-POINT UNIT (FPU)
• The floating-point unit and the FPU register bank communicate with the bus interface unit (BIU) through a 64-bit internal data bus, under the control of the memory management unit (MMU). The FPU processes mathematical data much faster than the ALU.
PIPELINE STAGES
80586 FEATURES
• Pentium processor
• 32-bit microprocessor
• 32-bit addressing
• 64-bit data bus
• Superscalar architecture
• Two pipelined integer units
• Pipelined floating-point unit
• Dynamic branch prediction
• Separate 8K code and 8K data caches
• Similar to the 80486, but with a 64-bit data bus
• Wider internal data paths: 128 and 256 bits wide
• Superscalar performance allows two instructions per clock cycle
• Intel CPUs up to the 80486 issue one instruction to the execution unit per clock cycle
• To improve on this, architects employ the technique of Multiple Instruction Issue (MII)
FEATURES
• To achieve this, the CPU must have more than one execution channel. This raises two problems:
• A) How to issue multiple instructions
• B) How to execute them concurrently
• MII architectures can be further divided into two types:
• A) Very Long Instruction Word (VLIW) architecture
• B) Superscalar architecture
• In a VLIW processor, the compiler reorders the sequential stream of code coming from memory into fixed-size instruction groups, which are issued in parallel for execution.
• In a superscalar processor, the hardware decides at run time which instructions are issued concurrently.
• The Pentium CPU issues two instructions in parallel to two independent integer pipelines, known as the U and V pipelines.
• Branch prediction is done using the branch target buffer (BTB).
• Other features are the pipelined floating-point unit and the 64-bit external data bus.
• Even-parity checking is implemented for the data bus and the internal RAM arrays (caches and TLBs).
PENTIUM ARCHITECTURE
BLOCK DIAGRAM
CODE AND DATA CACHE
• There are separate code and data caches. The cache line size is 32 bytes (the 80486 uses 16-byte lines).
• Each cache has its own translation look-aside buffer (TLB).
• Therefore, the paging unit of the memory management unit (MMU) can rapidly convert linear code or data addresses into physical addresses.
• Because the caches are separate, instruction prefetches cannot conflict with data access cycles.
BRANCH PREDICTION:
• Branch prediction uses the control unit (CU) and a Branch Target Buffer (BTB).
• Branch Target Buffer: the BTB stores the target address and statistical (history) information about each branch.
• Hence, the branch prediction logic can predict branches and make the Pentium fetch instructions from the most likely target address. Pipeline stalls caused by pipeline flushes and the subsequent refetching are reduced, and program execution is accelerated.
Control Unit
• The control unit controls the five-stage integer pipelines U and V and the eight-stage floating-point pipeline.
• In the Pentium processor, the integer pipelines are used for all instructions that do not involve floating-point operations. The Pentium can issue two integer instructions in the same clock cycle, which improves performance. This method is called superscalar architecture.
• The first four stages of the floating-point pipeline overlap with the integer pipeline, and the integer and floating-point pipelines can operate in parallel only under certain conditions.
• At the same clock frequency as the 80486, the Pentium floating-point unit executes floating-point instructions 3 to 5 times faster than the 80486. This is possible because a hardware multiplier, a hardware divider and faster algorithms are incorporated in the microcoded floating-point unit.
• The Pentium has a microcode support unit for complex functions.
• This unit controls the pipelines with microcode and actually uses both pipelines together. Therefore, complex microcoded instructions run much faster on a Pentium than on an 80486.
SUPERSCALAR
• A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor. It therefore allows higher CPU throughput than would otherwise be possible at a given clock rate.
• A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor.
• Each functional unit is not a separate CPU core but an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.
PIPELINES U AND V
• The Pentium is a superscalar processor with two integer pipelines, called U and V. The process of issuing two instructions in parallel is known as pairing.
• The U pipeline can handle the full instruction set of the Pentium.
• The V pipeline has limited capability: it handles only simple instructions that need no microcode support. The V pipeline executes ‘simple integer instructions’, such as load/store type instructions, and the FPU instruction FXCH.
• The Pentium uses a set of pairing rules to select a simple instruction that can go through the V pipeline. When instructions are paired, the first instruction is issued to the U pipeline and the next sequential instruction is issued to the V pipeline.
Instruction Issue Algorithm
• Decode the two consecutive instructions I1 and I2
• If the following are all true
– I1 and I2 are simple instructions
– I1 is not a jump instruction
– Destination of I1 is not a source of I2
– Destination of I1 is not a destination of I2
• Then issue I1 to u pipeline and I2 to v pipeline
• Else issue I1 to u pipeline
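The issue algorithm above can be sketched in Python. This is a simplified model, not Intel's logic: each decoded instruction is described by a hypothetical `Instr` record whose `simple`, `is_jump`, `dests` and `srcs` fields are filled in by hand, and flag dependencies, prefixes and the other pairing rules listed later in this unit are ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded-instruction record, used only for this sketch."""
    text: str
    simple: bool              # hardwired, needs no microcode
    is_jump: bool
    dests: FrozenSet[str]     # registers written
    srcs: FrozenSet[str]      # registers read


def can_pair(i1: Instr, i2: Instr) -> bool:
    """True if I1 may go down the u pipe while I2 goes down the v pipe."""
    return (i1.simple and i2.simple
            and not i1.is_jump
            and not (i1.dests & i2.srcs)     # destination of I1 is a source of I2
            and not (i1.dests & i2.dests))   # destination of I1 is a destination of I2


def issue(stream: List[Instr]) -> List[Tuple[str, str]]:
    """Return one (u pipe, v pipe) slot per issue cycle; '-' marks an empty v pipe."""
    slots: List[Tuple[str, str]] = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            slots.append((stream[i].text, stream[i + 1].text))
            i += 2
        else:
            slots.append((stream[i].text, "-"))
            i += 1
    return slots


def mk(text: str, dests, srcs, simple: bool = True, is_jump: bool = False) -> Instr:
    return Instr(text, simple, is_jump, frozenset(dests), frozenset(srcs))


prog = [
    mk("mov eax, [esi]", {"eax"}, {"esi"}),
    mk("add ebx, ecx", {"ebx"}, {"ebx", "ecx"}),
    mk("add eax, ebx", {"eax"}, {"eax", "ebx"}),
    mk("inc eax", {"eax"}, {"eax"}),
]
for u, v in issue(prog):
    print(f"u: {u:<16} v: {v}")
```

Here the first two instructions pair. `add eax, ebx` is issued alone because the next instruction, `inc eax`, reads EAX, which `add eax, ebx` writes; `inc eax` then issues alone as the last instruction.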

PIPELINING
• There are two integer pipelines and a floating-point unit in the Pentium processor. Each integer pipeline has five stages:
• PREFETCH (PF)
• DECODE-1 (D1)
• DECODE-2 (D2)
• EXECUTE (EX)
• WRITE BACK (WB)

Integer Pipeline
• The integer pipeline stages are as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction cache.
2. Decode1 (D1):
– Two parallel decoders attempt to decode and issue the next two sequential instructions.
– It decodes the instruction to generate a control word.
• A single control word causes direct execution of an instruction.
• Complex instructions require microcoded control sequencing.
3. Decode2 (D2):
• Decodes the control word.
• Addresses of memory-resident operands are calculated.
4. Execute (EX):
• The instruction is executed in the ALU, and the data cache is accessed at this stage. Instructions that need both an ALU operation and a data cache access require more than one clock.
5. Writeback (WB):
• The CPU stores the result and updates the flags.
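A small Python sketch of how instructions move through the five integer stages. It assumes an ideal pipeline (no stalls, cache misses or branch penalties) and, for the dual-issue case, that every consecutive pair is pairable; the function names are hypothetical.

```python
import math

STAGES = ["PF", "D1", "D2", "EX", "WB"]


def clocks(n_instr: int, per_clock: int = 1) -> int:
    """Clocks to drain n_instr instructions, with per_clock of them entering PF
    together each clock and no stalls."""
    groups = math.ceil(n_instr / per_clock)
    return len(STAGES) + groups - 1


def diagram(n_instr: int, per_clock: int = 1) -> None:
    """Print the stage each instruction occupies in every clock."""
    total = clocks(n_instr, per_clock)
    print("      " + " ".join(f"{c + 1:>3}" for c in range(total)))
    for i in range(n_instr):
        start = i // per_clock              # clock in which instruction i enters PF
        row = ["   "] * total
        for s, name in enumerate(STAGES):
            row[start + s] = f"{name:>3}"
        print(f"I{i + 1:<4} " + " ".join(row))


diagram(4, per_clock=1)   # a single pipeline
diagram(4, per_clock=2)   # u and v pipes, every pair issued together
```

With one pipeline, four instructions finish in 5 + 4 - 1 = 8 clocks; with perfect pairing they finish in 5 + 2 - 1 = 6 clocks.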
Floating-Point Pipeline
• The floating-point pipeline has 8 stages, as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction cache.
2. Instruction Decode (D1):
– Two parallel decoders attempt to decode and issue the next two sequential instructions.
• It decodes the instruction to generate a control word.
• A single control word causes direct execution of an instruction.
• Complex instructions require microcoded control sequencing.
3. Address Generate (D2):
• Decodes the control word.
• Addresses of memory-resident operands are calculated.
4. Memory and Register Read (Execution Stage) (EX):
• Register read, memory read or memory write is performed as required by the instruction to access an operand.
5. Floating-Point Execution Stage 1 (X1):
• Information from a register or memory is written into an FP register.
• Data is converted to floating-point format before being loaded into the floating-point unit.
6. Floating-Point Execution Stage 2 (X2):
• The floating-point operation is performed within the floating-point unit.
7. Write FP Result (WF):
• Floating-point results are rounded, and the result is written to the target floating-point register.
8. Error Reporting (ER):
• If an error is detected, the error is reported and the FPU status word is updated.
FLOATING POINT UNIT
• FRD - Floating Point Rounding
• FDD - Floating Point Division
• FADD - Floating Point Addition
• FEXP - Floating Point Exponent
• FAND - Floating Point And
• FMUL - Floating Point Multiply

Integer Instruction Pairing Rules
• The Pentium processor can issue one or two instructions every clock. To issue two instructions simultaneously, they must satisfy the following conditions:
1. Both instructions in the pair must be "simple". Simple instructions are entirely hardwired; they do not require any microcode control and, in general, execute in one clock.
2. There must be no read-after-write (RAW) or write-after-write (WAW) register dependencies between them.
RAW:
i1. R2 ← R1 + R3
i2. R4 ← R2 + R3
WAW:
i1. R2 ← R4 + R7
i2. R2 ← R1 + R3
3. Neither instruction may contain both a displacement and an immediate.
4. Instructions with prefixes can only occur in the u-pipe.
5. Instruction prefixes are treated as separate 1-byte instructions. Sequencing hardware is used to allow them to function as simple instructions.
• For example, the statements a=b+c; d=e+f; can run in parallel because neither result depends on the other calculation.
• The statements a=b+c; b=e+f; might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.
• As the number of instructions issued in parallel increases, the cost of dependency checking increases extremely rapidly.
• This is made worse by the need to check dependencies at run time and at the CPU's clock rate.
• This leads to extra cost for the additional logic gates required to implement the checks.
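The two examples above can be checked mechanically. This Python sketch models each statement as a (destination, sources) pair, an assumption made for illustration, and classifies RAW, WAW and WAR dependences. `pair_checks` counts only the instruction pairs that must be compared, n(n-1)/2 for n instructions issued together, which shows why the checking cost grows quickly.

```python
from typing import List, Set, Tuple

Stmt = Tuple[str, Set[str]]   # (variable written, variables read)


def hazards(first: Stmt, second: Stmt) -> List[str]:
    """Classify the dependences of `second` on the earlier statement `first`."""
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")   # second reads what first writes
    if d1 == d2:
        found.append("WAW")   # both write the same location
    if d2 in s1:
        found.append("WAR")   # second overwrites a value first still has to read
    return found


def pair_checks(n: int) -> int:
    """Instruction pairs that must be compared when n instructions issue together."""
    return n * (n - 1) // 2


print(hazards(("a", {"b", "c"}), ("d", {"e", "f"})))   # a=b+c; d=e+f
print(hazards(("a", {"b", "c"}), ("b", {"e", "f"})))   # a=b+c; b=e+f
print([pair_checks(n) for n in (2, 4, 8)])
```

This prints `[]` for the first pair, `['WAR']` for the second, and `[1, 6, 28]`. The second case is a write-after-read dependence, which is harmless only if `a=b+c` reads `b` before `b=e+f` writes it; that is why the outcome depends on completion order.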
Instruction Issue for Floating-Point Unit
• The rules for issuing floating-point (FP) instructions on the Pentium processor are:
1. FP instructions are not paired with integer instructions.
2. When a pair of FP instructions is issued to the FPU, only the FXCH instruction can be the second instruction of the pair. The first instruction of the pair must be one of the set F, where F = [FLD, FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS, FCHS].
3. FP instructions other than FXCH and the instructions in set F are always issued singly to the FPU.
4. FP instructions that are not directly followed by an FXCH instruction are issued singly to the FPU.
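Rules 2 to 4 can be expressed as a short grouping loop. This Python sketch works on bare mnemonics only: it assumes the stream already contains only FP instructions (rule 1 keeps integer instructions out) and ignores operand forms. FSQRT is used merely as an example of an instruction outside set F.

```python
from typing import List, Tuple

# Set F from rule 2: instructions that may be first in an FP pair.
F_SET = {"FLD", "FADD", "FSUB", "FMUL", "FDIV",
         "FCOM", "FUCOM", "FTST", "FABS", "FCHS"}


def issue_fp(stream: List[str]) -> List[Tuple[str, ...]]:
    """Group a stream of FP mnemonics into issue slots following rules 2-4."""
    slots: List[Tuple[str, ...]] = []
    i = 0
    while i < len(stream):
        first = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if first in F_SET and nxt == "FXCH":
            slots.append((first, "FXCH"))    # rule 2: set-F instruction followed by FXCH
            i += 2
        else:
            slots.append((first,))           # rules 3 and 4: issued singly
            i += 1
    return slots


print(issue_fp(["FLD", "FXCH", "FMUL", "FSQRT", "FXCH", "FADD"]))
```

Only FLD pairs here. FMUL is not followed by FXCH, and FSQRT is outside set F, so the FXCH after it is issued on its own.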
BUS CYCLE DEFINITION GROUP
• Bus control group:
• ADS#: The ADS# (address strobe) output indicates that the address bus contains a valid memory address.
• RDY#: The RDY# (ready) input acts as the ready signal for the current non-burst cycle.
• BRDY#: The BRDY# (burst ready) input indicates the burst mode of a memory read or memory write operation. In burst mode, the speed of memory access may be doubled compared to a normal memory read/write operation.

• HOLD: The HOLD pin is the local bus hold input. It may be activated by another bus master, such as a DMA controller. This pin is functionally similar to the BREQ pin.
• HLDA: The HLDA output acknowledges receipt of a valid HOLD request.
• BOFF#: When the BOFF# (back off) input is asserted (driven low; the # marks an active-low signal), the 80486 CPU floats its bus buffers. This input forces the current bus master, the 80486 CPU, to release the bus in the next clock cycle.
• KEN#: The KEN# (cache enable) input decides whether the current cycle is cacheable.
• FLUSH#: The FLUSH# input is the cache flush signal. When it is activated, the cache contents and validity bits are cleared.
BUS CYCLES
BUS OPERATION
• Ti: the idle state; no bus cycles are being run.
• T1 (address time): the first clock of a bus cycle when no other cycles are outstanding.
• - The address and bus cycle definition are driven during T1, and ADS# is asserted.
• - T1 indicates that this is the only bus cycle in progress; no bus cycle pipelining is occurring.
• T2 (data time): the second and subsequent clocks of a bus cycle.
• - If a read bus cycle is being run, the data is latched from the data bus when BRDY# is asserted at the end of T2.
• - During T2, no other cycles are outstanding.
• T12 (address time for the 2nd cycle and data time for the 1st cycle, which is already in progress):
• - T12 indicates a pipelined state, in which the address phase of a newly pipelined cycle and the data time of the first cycle already in progress occur simultaneously.
• - The processor is still in the T2 state for the current cycle and has entered the T1 state for the next, pipelined cycle.
• T2P (data time for the 1st cycle and data time for the 2nd, pipelined cycle):
• - Two cycles are outstanding on the bus, and both are in the T2 state.
• - During T2P, BRDY# is asserted for the first cycle initiated by the processor.
• - When the first outstanding cycle completes, the state changes to T2, indicating only one outstanding cycle.
• TD (wait state, or dead clock): indicates one outstanding bus cycle whose address, status and ADS# were already driven some time in the past, while the data and BRDY# pins are not yet active, because the data bus needs one dead clock to turn around between consecutive reads and writes (or writes and reads).
• - The processor enters TD if, in the previous clock, there were two outstanding cycles, the last BRDY# was returned, and a dead clock is needed.
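The bus states described above can be collected into a small table. This Python sketch is a partial model under stated assumptions: it records how many cycles are outstanding in each state and only the transitions the text describes, plus two transitions marked ASSUMED in the comments. It is not the complete Pentium bus state diagram.

```python
from typing import Dict, List, Set

# Bus cycles outstanding in each state, taken from the state descriptions.
OUTSTANDING: Dict[str, int] = {"Ti": 0, "T1": 1, "T2": 1, "T12": 2, "T2P": 2, "TD": 1}

# Next-state sets: transitions named in the text, plus two marked ASSUMED.
NEXT: Dict[str, Set[str]] = {
    "Ti": {"T1"},               # a new cycle begins with its address time
    "T1": {"T2"},               # address time is followed by data time
    "T2": {"T2", "T12", "Ti"},  # more data clocks, or a pipelined cycle starts;
                                # "Ti" is ASSUMED (cycle ends, bus goes idle)
    "T12": {"T2P"},             # both cycles are now in data time
    "T2P": {"T2", "TD"},        # first cycle completes; TD if a dead clock is needed
    "TD": {"T2"},               # ASSUMED: the remaining cycle continues in data time
}


def is_legal(trace: List[str]) -> bool:
    """Check that every step of a state trace is one of the modelled transitions."""
    return all(nxt in NEXT[cur] for cur, nxt in zip(trace, trace[1:]))


trace = ["Ti", "T1", "T2", "T12", "T2P", "TD", "T2", "Ti"]
print(is_legal(trace), [OUTSTANDING[s] for s in trace])
```

The sample trace starts a cycle, pipelines a second one, takes a dead clock, and then finishes the remaining cycle.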
Branch prediction
• Branch prediction is another new feature of the Pentium.
• The performance gained through pipelining can be reduced by program transfer instructions (such as JMP, CALL, RET and conditional jumps).
• These instructions change the program sequence, which invalidates all the instructions that entered the pipeline after the program transfer instruction.
• Suppose instruction I3 is a conditional jump to I50 at some other address (the target address). Then the instructions that entered the pipeline after I3 are invalid, and a new sequence beginning with I50 must be loaded.
• This causes bubbles in the pipeline, where no work is done while the pipeline stages are reloaded.
• The ability to predict branches and avoid the branch penalty, combined with instruction pairing, can substantially reduce the clock count for a given program.
Branch Prediction Logic
• To avoid this problem, the Pentium uses a scheme called dynamic branch prediction.
• In this scheme, a prediction is made for the branch instruction currently in the pipeline.
• The prediction is either taken or not taken.
• If the prediction turns out to be correct, the pipeline is not flushed and no clock cycles are lost.
• If the prediction turns out to be wrong, the pipeline is flushed and restarted with the correct instruction. This costs a 3-cycle penalty if the branch is executed in the U pipeline and a 4-cycle penalty in the V pipeline.
• The scheme is implemented with a 4-way set-associative cache with 256 entries, called the Branch Target Buffer (BTB).
• The BTB is a look-aside cache that sits beside the D1 stages of the two pipelines and monitors them for branch instructions.
• The first time a branch instruction enters either pipeline, the BTB uses its source memory address to perform a lookup in the cache.
• Since the instruction has not been seen before, this results in a BTB miss.
• This means the prediction logic has no history for the instruction.
• It then predicts that the branch will not be taken, and the program flow is not altered.
• Even unconditional jumps are predicted as not taken the first time the BTB sees them.
• When the instruction reaches the execution stage, the branch is either taken or not taken.
• If taken, the next instruction to be executed should be the one fetched from the branch target address.
• If not taken, the next instruction is at the next sequential memory address.
• When the branch is taken for the first time, the execution unit provides feedback to the branch prediction logic.
• The branch target address is sent back and recorded in the BTB.
• A directory entry is made containing the source memory address, with the history bits set to strongly taken.
History Bits | Resulting Description | Prediction Made | If Branch Is Taken | If Branch Is Not Taken
------------------------------------------------------------------------------------------------
11 | Strongly Taken | Branch taken | Remains Strongly Taken | Downgrades to Weakly Taken
10 | Weakly Taken | Branch taken | Upgrades to Strongly Taken | Downgrades to Weakly Not Taken
01 | Weakly Not Taken | Branch not taken | Upgrades to Weakly Taken | Downgrades to Strongly Not Taken
00 | Strongly Not Taken | Branch not taken | Upgrades to Weakly Not Taken | Remains Strongly Not Taken
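The BTB behaviour and the history-bit table above can be combined into one small Python model. It is a sketch, not Intel's design: the set index (low-order address bits) and the replacement choice (evict the oldest entry in the set) are ASSUMPTIONS, and addresses and targets are plain integers. The table's transitions are those of a 2-bit saturating counter, which is how `update` implements them.

```python
from collections import OrderedDict
from typing import List, Optional

SETS, WAYS = 64, 4        # 256 entries = 64 sets x 4 ways
STRONGLY_TAKEN = 0b11


class BTB:
    """Toy 4-way set-associative branch target buffer with 2-bit history."""

    def __init__(self) -> None:
        # One ordered dict per set: branch address -> [target, history bits].
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(SETS)]

    def _set_for(self, addr: int) -> OrderedDict:
        return self.sets[addr % SETS]        # ASSUMED index: low-order address bits

    def predict(self, addr: int) -> Optional[int]:
        """Lookup in D1: the predicted target, or None meaning 'not taken'."""
        entry = self._set_for(addr).get(addr)
        if entry is None:
            return None                      # BTB miss: predict not taken
        target, history = entry
        return target if history & 0b10 else None   # 11, 10 taken; 01, 00 not taken

    def update(self, addr: int, taken: bool, target: int) -> None:
        """Feedback from the execution stage once the branch is resolved."""
        ways = self._set_for(addr)
        entry = ways.get(addr)
        if entry is None:
            if taken:                        # first taken execution creates an entry
                if len(ways) == WAYS:
                    ways.popitem(last=False) # ASSUMED: evict the oldest entry
                ways[addr] = [target, STRONGLY_TAKEN]
            return
        entry[0] = target
        if taken:                            # upgrade, saturating at 11
            entry[1] = min(entry[1] + 1, 0b11)
        else:                                # downgrade, saturating at 00
            entry[1] = max(entry[1] - 1, 0b00)


btb = BTB()
predicted = []
for taken in (True, True, True, False):      # loop branch at 0x40 back to 0x10
    predicted.append(btb.predict(0x40) is not None)
    btb.update(0x40, taken, 0x10)
print(predicted)
```

The first pass through the loop is mispredicted because of the BTB miss. After that the entry sits at 11 and the branch is predicted taken until the loop exits, which downgrades the entry to 10 (weakly taken).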
Branch Prediction
• Branch Target Buffer
• The processor accesses the BTB with the address of the instruction in the D1 stage.
Example:
inner_loop:
    mov  byte ptr flag[edx], al   ; PF D1 D2 EX WB
    add  edx, ecx                 ; PF D1 D2 EX WB
    cmp  edx, FALSE               ; PF D1 D2 EX WB
    jle  inner_loop               ; PF
• 486: 6 clocks
• Pentium: 2 clocks with branch prediction
www.advancedmsinc.com
PAGING:
• The paging system operates only in protected mode (including virtual-8086 mode); it cannot be enabled in real-address mode.
• It is enabled by setting the PG bit (bit 31, the leftmost bit of CR0) to 1.
• (If PG is 0, linear addresses are used directly as physical addresses.)
• CR3 contains the 'physical' base address of the page directory.
• The page directory can reside at any 4K boundary, since the low-order 12 bits of its address are set to zero.
• The page directory contains 1024 directory entries of 4 bytes each.
• Each page directory entry addresses a page table that contains up to 1024 entries.
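The numbers above fix the format of a 32-bit linear address: 1024 directory entries need 10 index bits, 1024 page-table entries need another 10, and a 4 KB page needs 12 offset bits. The Python sketch below walks that two-level structure. It assumes 4 KB pages only and models physical memory as a hypothetical dict of 32-bit doublewords keyed by address; `LookupError` stands in for the page-fault exception.

```python
from typing import Dict, Tuple

PRESENT = 0x1            # P bit (bit 0) of a PDE or PTE
FRAME_MASK = 0xFFFFF000  # bits 31-12: 4 KB-aligned physical address


def split_linear(linear: int) -> Tuple[int, int, int]:
    directory = (linear >> 22) & 0x3FF   # bits 31-22: page directory index
    table = (linear >> 12) & 0x3FF       # bits 21-12: page table index
    offset = linear & 0xFFF              # bits 11-0: byte within the 4 KB page
    return directory, table, offset


def translate(linear: int, cr3: int, mem: Dict[int, int]) -> int:
    directory, table, offset = split_linear(linear)
    pde = mem[(cr3 & FRAME_MASK) + directory * 4]    # each entry is 4 bytes
    if not pde & PRESENT:
        raise LookupError("page fault: page table not present")
    pte = mem[(pde & FRAME_MASK) + table * 4]
    if not pte & PRESENT:
        raise LookupError("page fault: page not present")
    return (pte & FRAME_MASK) | offset


# Example: page directory at 0x1000, one page table at 0x2000, page frame at 0x7000.
mem = {0x1000 + 1 * 4: 0x2000 | PRESENT,    # PDE 1 -> page table at 0x2000
       0x2000 + 3 * 4: 0x7000 | PRESENT}    # PTE 3 -> page frame at 0x7000
print(split_linear(0x00403025), hex(translate(0x00403025, 0x1000, mem)))
```

Linear address 0x00403025 selects directory entry 1, page-table entry 3 and byte 0x25 of the page, so it maps to physical address 0x7025.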
PAGING
D: Dirty. This bit is set if a write has been performed to the page pointed to by the PTE. Dirty bits are used to determine whether the page must be written back to the hard disk when it is swapped out (to make room for a new page coming in).
A: Accessed. This bit is set if a read or write was performed to the page selected by the PDE and PTE. The operating system uses this bit to help choose a victim page to swap out when all pages are in use and a new page must be loaded into RAM. A page that has been accessed is less likely to be swapped out than a page that has not been accessed.
PCD: Cache Disable. This bit determines whether the current memory access is cached.
PWT: Write Through. This bit enables write-through operation between the cache and memory.
U: User. This bit is used when performing protection checks on the current memory address.
W: Writeable. This bit determines whether the page may be written to and is also used in protection checks.
P: Present. This bit indicates whether the page is actually stored in memory. In a demand-paging system, when a new page is needed, one of two conditions is true:
• A free page frame is available.
• No page frames are available.
If a page frame is available, the new page is copied into memory at the appropriate address, the TLBs are updated, and the P bit is set to indicate that the page is in memory.
If no free page frames exist, a victim page must be chosen to make room for the new page. The P bit of the victim's PTE is cleared to show that the page has been swapped out. The page may be copied back to the hard disk (as required by the dirty bit) before the new page is read in.
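The bits described above sit in the low bits of each 32-bit page table entry. The positions used here (P=0, W=1, U=2, PWT=3, PCD=4, A=5, D=6, frame address in bits 31-12) are the standard x86 PTE layout. The `choose_victim` helper is a hypothetical illustration of the policy the text describes (prefer pages that have not been accessed, then pages that are not dirty); it is not the algorithm of any particular operating system.

```python
from typing import Dict

PTE_BITS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}


def decode_pte(pte: int) -> Dict[str, int]:
    """Split a 32-bit PTE into its flag bits and its page frame address."""
    fields = {name: (pte >> bit) & 1 for name, bit in PTE_BITS.items()}
    fields["frame"] = pte & 0xFFFFF000
    return fields


def choose_victim(ptes: Dict[int, int]) -> int:
    """Pick a resident page, preferring A=0 and then D=0 (no disk write-back).
    Raises ValueError if no page is resident."""
    resident = [vpn for vpn, pte in ptes.items() if pte & 1]
    return min(resident, key=lambda vpn: ((ptes[vpn] >> 5) & 1, (ptes[vpn] >> 6) & 1))


print(decode_pte(0x00007067))
print(choose_victim({1: 0x1067, 2: 0x2001, 3: 0x3000}))
```

In the example, page 3 is not present, page 1 is accessed and dirty, and page 2 is neither, so page 2 is chosen.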
REGISTER SETS IN PENTIUM
GENERAL PURPOSE REGISTERS
• The general-purpose registers can hold 8-, 16- or 32-bit data.
• The 8086 microprocessor has byte- and word-sized registers, but the 80386 contains doubleword-sized, or extended, registers.
• The 8- and 16-bit registers can be addressed just as on the 8086 processor.
• The AX, BX, CX, DX, SI, DI, BP, SP, FLAGS and IP registers are 16-bit registers, and they have been extended to 32 bits.
• A 32-bit register is called an extended register and is named with the prefix E.
• Thus, the 32-bit general-purpose registers are EAX, EBX, ECX, EDX, ESI and EDI. The other 32-bit registers are EBP, ESP, EFLAGS and EIP.
Register layout (bits 31-16 | 15-8 | 7-0):
EAX  Accumulator         AX = AH | AL
EBX  Base                BX = BH | BL
ECX  Count               CX = CH | CL
EDX  Data                DX = DH | DL
ESI  Source index        SI (bits 15-0)
EDI  Destination index   DI (bits 15-0)
EBP  Base pointer        BP (bits 15-0)
ESP  Stack pointer       SP (bits 15-0)
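The overlap between the 32-, 16- and 8-bit register names can be modelled with a little bit masking. The class below is a hypothetical Python model, not processor code: `e` is the full extended register (EAX), `word` the 16-bit register (AX), and `high`/`low` the byte registers (AH/AL). ESI, EDI, EBP and ESP have only the 16-bit view (SI, DI, BP, SP), so only `e` and `word` would apply to them.

```python
class Reg32:
    """One 32-bit general-purpose register with its 16- and 8-bit views."""

    def __init__(self, value: int = 0) -> None:
        self.e = value & 0xFFFFFFFF              # e.g. EAX

    @property
    def word(self) -> int:                       # e.g. AX = bits 15-0
        return self.e & 0xFFFF

    @word.setter
    def word(self, v: int) -> None:
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def high(self) -> int:                       # e.g. AH = bits 15-8
        return (self.e >> 8) & 0xFF

    @high.setter
    def high(self, v: int) -> None:
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def low(self) -> int:                        # e.g. AL = bits 7-0
        return self.e & 0xFF

    @low.setter
    def low(self, v: int) -> None:
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
print(hex(eax.word), hex(eax.high), hex(eax.low))   # AX, AH, AL
eax.low = 0xFF                                      # writing AL leaves bits 31-8 unchanged
print(hex(eax.e))
```

Writing a smaller view only replaces its own bits, which is why loading AL does not disturb the upper 24 bits of EAX.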
SEGMENT REGISTER IN PENTIUM
SEGMENT REGISTERS (16-bit, bits 15-0): CS, SS, DS, ES, FS, GS
• Besides the above 32-bit registers, the 80386 also provides two new 16-bit segment registers, FS and GS.
• Therefore, the segment registers of the 80386 are CS, DS, ES, SS, FS and GS:
• CODE SEGMENT (CS)
• DATA SEGMENT (DS)
• EXTRA SEGMENT (ES)
• STACK SEGMENT (SS)
• The FS and GS registers are additional extra segment registers, which allow access to six different segments at a time.
CONTROL REGISTERS
The 80386 processor has four 32-bit control registers, CR0-CR3; the Pentium adds a fifth, CR4 (CR1 is reserved by Intel).
These registers hold global machine status. The load and store instructions are used to access them.
In the 80386, these registers control paged memory management, cache enable/disable and protected-mode operation.
CR4 controls the Pentium processor's extensions for virtual-8086 operation.
CR4 is also used for debugger support.
DEBUG REGISTERS
The 80386 has eight 32-bit debug registers, DR7-DR0, for hardware debugging.
Of the eight debug registers, DR5 and DR4 are reserved by Intel.
The first four registers, DR3 to DR0, store four program-controllable breakpoint addresses.
DR7 and DR6 hold breakpoint control and breakpoint status information, respectively.
TEST REGISTERS
• Two test registers, TR6 and TR7, exist in the 80386 processor for testing page caching (the TLB). TR7 is the test control register and TR6 the test status register.
• On the Pentium, TR12 controls branch prediction and superscalar operation.
• The 80386 has four system address registers, GDTR, IDTR, LDTR and TR, that locate the Global Descriptor Table (GDT), the Interrupt Descriptor Table (IDT), the Local Descriptor Table (LDT) and the Task State Segment (TSS).
• The Global Descriptor Table (GDT) contains descriptors available to all tasks.
• The Local Descriptor Table (LDT) contains descriptors that can be private to a task. Each task may have its own private LDT.
Register | Selector (16-bit) | Base address (32-bit) | Limit
TR | TSS selector | TSS base address | TSS limit
LDTR | LDT selector | LDT base address | LDT limit
IDTR | - | IDT base address | IDT limit
GDTR | - | GDT base address | GDT limit
SYSTEM ADDRESS REGISTERS
• The GDT may contain all descriptor types except interrupt and trap descriptors.
• The LDT contains segment, task gate, and call gate descriptors.
• A segment cannot be accessed by a task if its segment descriptor does not exist in either the GDT or the LDT at the time of access.
FLAG REGISTER
• The flag register (EFLAGS) of the 80386 is a 32-bit register.
• Of these 32 bits, D31 to D18, D15, D5 and D3 are reserved by Intel, and D1 is always 1.
• The lower 16 bits of the flag register are the same as in the 80286.
• Only two flags are newly added in the 80386: the VM and RF flags.
• RF (Resume Flag):
• At the start of each instruction cycle, the status of RF is checked. If RF=1, any debug fault is ignored while the instruction executes. This flag is automatically reset after the execution of every instruction except IRET and POPF.
• VM (Virtual Mode Flag):
• When this flag is set in protected mode, the 80386 enters virtual-8086 mode within protected mode. When this flag is cleared, the 80386 operates in normal protected mode; real-address mode is selected by the PE bit of CR0, not by VM.
OPERATING MODES IN PENTIUM
The Pentium processor architecture supports three operating modes:
• Protected mode,
• Real-address mode, and
• System Management Mode (SMM),
plus one ‘quasi operating mode’, virtual-8086 mode.
Protected Mode:
• Protected mode is the native operating mode of the processor.
• In this mode, all instructions and architectural features are available, and the processor provides its highest performance capability.
Real Addressing Mode:
• The real-address mode provides the programming environment of the 8086 processor, with the ability to switch to protected mode or system management mode.
System Management Mode (SMM):
• The System Management Mode (SMM) of the Pentium processor provides an operating system with a transparent mechanism for implementing power management.
• When the external system management interrupt pin (SMI#) is activated, a System Management Interrupt (SMI) is generated and the processor enters system management mode.
• In this mode, the processor switches to a separate address space while saving the context of the currently running program or task.
• SMM-specific code can then be executed transparently. The RSM (resume) instruction returns the processor from SMM to the real-address, protected or virtual-8086 mode state it was in before the SMI; a RESET also ends SMM.
Virtual-8086 Mode:
• When the processor operates in protected mode, it can support a quasi-operating mode known as virtual-8086 mode.
• This mode allows the processor to execute 8086 software in a protected, multitasking environment.
• Initially, the Pentium processor enters real-address mode after power-up or reset.
• Switching from real-address mode to protected mode requires initialization before the mode is changed. When PE=1 (in CR0), the processor changes from real-address mode to protected mode. When VM=1 (in EFLAGS), the processor changes from protected mode to virtual-8086 mode.
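The mode rules above reduce to a few bit tests. A sketch in Python, assuming the standard bit positions PE = CR0 bit 0 and VM = EFLAGS bit 17; SMM is passed in as a separate flag because it is entered through SMI# and is not visible in either register.

```python
CR0_PE = 1 << 0       # protection enable
EFLAGS_VM = 1 << 17   # virtual-8086 mode


def operating_mode(cr0: int, eflags: int, in_smm: bool = False) -> str:
    """Name the current operating mode from CR0.PE, EFLAGS.VM and an SMM flag."""
    if in_smm:
        return "system management"
    if not cr0 & CR0_PE:
        return "real-address"          # the state after power-up or reset
    if eflags & EFLAGS_VM:
        return "virtual-8086"          # quasi-mode inside protected mode
    return "protected"


print(operating_mode(cr0=0x0, eflags=0x2))        # EFLAGS bit 1 is always 1
print(operating_mode(cr0=0x1, eflags=0x2))
print(operating_mode(cr0=0x1, eflags=0x20002))
```

PE is tested before VM because virtual-8086 mode exists only inside protected mode.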
PROTECTED MODE
• Protected-mode operation of the Pentium processor protects the different tasks of a multitasking operating system from invalid accesses.
• Memory Management of Pentium:
• During protected-mode operation, the Pentium performs memory management in two different ways: segmentation and paging.
• Segmentation: Segmentation is used to isolate individual code, data and stack modules so that multiple programs can run on the same processor without interfering with each other.
• Paging: Paging is a memory management technique that allows the processor to address a range of virtual memory larger than the physical memory addressable through the processor's address bus alone. This is done by swapping pages into and out of main memory, to and from the disk.
• In protected mode, segmentation cannot be disabled, but the use of paging is optional.
• Segmentation divides the processor's addressable memory space, known as the linear address space, into smaller protected address spaces called segments.
• Segments usually hold the code, data and stack of a program, or system data structures.
• When more than one program runs on the processor, each program must be assigned its own set of segments.
• The processor then enforces the boundaries between these segments so that one program does not interfere with the execution of another.
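The boundary enforcement described above amounts to a limit check followed by an addition. A minimal Python sketch, assuming expand-up segments whose limit has already been scaled to bytes; the `Segment` record is hypothetical and `PermissionError` stands in for the processor's protection fault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int    # 32-bit linear base address
    limit: int   # highest valid offset, in bytes


def to_linear(seg: Segment, offset: int) -> int:
    """Translate a segment offset into a linear address, enforcing the limit."""
    if offset > seg.limit:
        raise PermissionError("protection fault: offset beyond segment limit")
    return (seg.base + offset) & 0xFFFFFFFF


# Two programs, each confined to its own code segment.
prog_a = Segment(base=0x00100000, limit=0xFFFF)
prog_b = Segment(base=0x00200000, limit=0xFFFF)
print(hex(to_linear(prog_a, 0x1234)), hex(to_linear(prog_b, 0x1234)))
```

The same offset lands at a different linear address in each program's segment, and an offset past the limit is rejected instead of reaching another program's memory.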
MEMORY MANAGEMENT OF PENTIUM PROCESSOR
MULTITASKING
• One of the most significant features of protected mode is its ability to support the execution of multiple programs, known as tasks.
• The processor can switch from task to task at very high speed, which gives the impression that many tasks are running at the same time.
Figure: time slices: Task 1, Task 2, Task 3, Task 1, Task 2, Task 4 (a task switch occurs between slices; Task 3 completes and Task 4 begins).
• The period of time for which each task executes is called a time slice.
• A task switch is used to switch from one task to the next.
• During a task switch, the contents of all processor registers, as well as other information, are saved, and then the new information is loaded for the next task.
• This information is kept in a special memory structure known as the Task State Segment (TSS).
• The TSS contains the 32-bit registers, the 16-bit segment selectors, and additional storage for the stack pointers of privilege levels 0, 1 and 2.
• When a task is created, the task's LDT selector (offset 60H), the PDBR (offset 1CH), the T bit and the I/O map base address are filled in.
• During a task switch these items are read but not changed.
• The register portion (20H to 5CH) is modified during a task switch: each field is overwritten with the current content of the corresponding register.

TASK STATE SEGMENT (TSS)
Offset  Bits 31-16           Bits 15-0
00H     0                    LINK (previous TSS)
04H     ESP0 (32 bits)
08H     0                    SS0
0CH     ESP1 (32 bits)
10H     0                    SS1
14H     ESP2 (32 bits)
18H     0                    SS2
1CH     CR3 (PDBR)
20H     EIP
24H     EFLAGS
28H     EAX
2CH     ECX
30H     EDX
34H     EBX
38H     ESP
3CH     EBP
40H     ESI
44H     EDI
48H     0                    ES
4CH     0                    CS
50H     0                    SS
54H     0                    DS
58H     0                    FS
5CH     0                    GS
60H     0                    TASK LDT SELECTOR
64H     I/O MAP BASE ADDR    0 ... T (bit 0)
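The offsets in the TSS layout follow a simple rule: every field occupies one 4-byte slot. The Python sketch below (field labels such as `T_IOMAP` are our own, not Intel names) rebuilds the offset table from that rule and picks out the register portion that is overwritten on a task switch.

```python
# Field labels in TSS order; each field occupies one 4-byte slot.
TSS32_FIELDS = [
    "LINK", "ESP0", "SS0", "ESP1", "SS1", "ESP2", "SS2", "CR3",
    "EIP", "EFLAGS", "EAX", "ECX", "EDX", "EBX", "ESP", "EBP",
    "ESI", "EDI", "ES", "CS", "SS", "DS", "FS", "GS",
    "LDT", "T_IOMAP",  # T_IOMAP: T bit (bit 0) and I/O map base (bits 31-16)
]

TSS32_OFFSET = {name: 4 * slot for slot, name in enumerate(TSS32_FIELDS)}
TSS32_SIZE = 4 * len(TSS32_FIELDS)  # 104 bytes = 68H

# Register portion (20H-5CH): overwritten when the task is switched out.
REGISTER_PORTION = [n for n in TSS32_FIELDS if 0x20 <= TSS32_OFFSET[n] <= 0x5C]


def split_t_iomap(dword):
    """Return (T bit, I/O map base address) from the doubleword at offset 64H."""
    return dword & 1, (dword >> 16) & 0xFFFF


print(hex(TSS32_OFFSET["CR3"]), hex(TSS32_OFFSET["LDT"]), hex(TSS32_SIZE))
# -> 0x1c 0x60 0x68
```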
TSS DESCRIPTORS
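High doubleword (offset 4):
  bits 31-24  Base address (24-31)
  bit 23      G
  bits 22-21  0
  bit 20      AVL
  bits 19-16  Segment limit (16-19)
  bit 15      P
  bits 14-13  DPL
  bits 12-8   0 1 0 B 1
  bits 7-0    Base address (16-23)
Low doubleword (offset 0):
  bits 31-16  Base address (0-15)
  bits 15-0   Segment limit (0-15)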
• Base address : 32 bit segment base address

• Segment limit : 20-bit segment size limit
• G (granularity): it determines how the limit field is interpreted
• G=0, segment size from 1 byte to 1MB
• G=1, segment size from 4KB to 4GB.
• AVL: available for use by system software.
• P: segment present (whether the segment is in memory).
• DPL: descriptor privilege level (0 to 3) of the segment.
• B: busy bit (B=1 when the task is currently running or is waiting to run).
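As a sketch of how the fields above fit into the 8-byte descriptor, the hypothetical helper `tss_descriptor` packs them into one 64-bit integer (low doubleword first), and `segment_size` shows how the G bit scales the 20-bit limit. It assumes a 32-bit TSS, so the type field is 10B1.

```python
def tss_descriptor(base, limit, dpl=0, present=True, busy=False, avl=0, g=0):
    """Pack a 32-bit TSS descriptor into a 64-bit integer (bits 31-0 = offset 0)."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("the segment limit is a 20-bit field")
    type_bits = 0b1011 if busy else 0b1001  # bits 11-8 = 1 0 B 1; bit 12 stays 0
    low = ((base & 0xFFFF) << 16) | (limit & 0xFFFF)
    high = (
        ((base >> 16) & 0xFF)               # bits 7-0:   base 16-23
        | (type_bits << 8)                  # bits 11-8:  type
        | ((dpl & 0b11) << 13)              # bits 14-13: DPL
        | (int(present) << 15)              # bit 15:     P
        | (((limit >> 16) & 0xF) << 16)     # bits 19-16: limit 16-19
        | ((avl & 1) << 20)                 # bit 20:     AVL
        | ((g & 1) << 23)                   # bit 23:     G
        | (((base >> 24) & 0xFF) << 24)     # bits 31-24: base 24-31
    )
    return (high << 32) | low


def segment_size(limit, g):
    """Bytes covered by a segment: the limit counts bytes if G=0, 4 KB units if G=1."""
    return (limit + 1) * (4096 if g else 1)


print(f"{tss_descriptor(0x12345678, 0x67):016X}")  # -> 1200893456780067
```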
TSS REGISTER
• When multiple TSS descriptors exist in the GDT, the TSS currently in use is accessed through the task register (TR).
• The selector held in the task register is used as an index into the GDT to locate a TSS descriptor.
TR: Selector | Base address | Segment limit
    (the selector is visible to the programmer; the base address and segment limit are invisible to the programmer)
• The visible portion is accessed by the programmer.
• The invisible portion is automatically loaded with information from the TSS descriptor.
• The task register is loaded with a new TSS selector using the LTR (Load Task Register) instruction.
• LTR takes a 16-bit register (or memory) operand holding the selector.
• The visible portion of the task register may be read with STR (Store Task Register).
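The lookup performed through the task register can be sketched as follows. The helper names are hypothetical; the sketch assumes the standard selector format (index in bits 15-3, table indicator TI in bit 2, RPL in bits 1-0) and 8-byte GDT entries.

```python
def decode_selector(selector):
    """Split a 16-bit selector into (index, TI, RPL)."""
    selector &= 0xFFFF
    return selector >> 3, (selector >> 2) & 1, selector & 0b11


def tss_descriptor_address(gdt_base, tr_selector):
    """Linear address of the TSS descriptor selected by the task register."""
    index, ti, _rpl = decode_selector(tr_selector)
    if ti != 0:
        raise ValueError("a TSS selector must point into the GDT (TI = 0)")
    return gdt_base + 8 * index  # each GDT entry is 8 bytes


print(hex(tss_descriptor_address(0x00001000, 0x0028)))  # -> 0x1028
```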
TASK GATES
• A TSS descriptor contains two DPL bits that specify the privilege level of the segment, so a task switch may result in a privilege violation if the currently executing task does not have sufficient privilege (its CPL is numerically greater than that DPL).
• It may be necessary for an interrupt or exception to cause a task switch to a segment containing handler code.
• Task gates provide a way of performing such task switches.
• Task gates may be stored in the LDT of a task or in the IDT.
• The TSS selector in the task gate points to the TSS descriptor of the handler task.
TASK GATE
  Offset 4: bits 31-16 Reserved | bit 15 P | bits 14-13 DPL | bits 12-8 00101 | bits 7-0 Reserved
  Offset 0: bits 31-16 TSS segment selector | bits 15-0 Reserved
• For example, there may be a separate TSS for the divide error exception handler.
• Many different tasks may generate this exception, so each one requires a task gate in its LDT to access the TSS descriptor of the divide error handler.
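A minimal sketch of packing the task gate layout into a 64-bit value (low doubleword first); `task_gate` is a hypothetical helper and the bit positions are taken from the task gate layout.

```python
def task_gate(tss_selector, dpl=0, present=True):
    """Pack a task gate descriptor into a 64-bit integer (bits 31-0 = offset 0)."""
    low = (tss_selector & 0xFFFF) << 16   # bits 31-16: TSS segment selector
    high = (
        (int(present) << 15)              # bit 15: P
        | ((dpl & 0b11) << 13)            # bits 14-13: DPL
        | (0b00101 << 8)                  # bits 12-8: 00101 marks a task gate
    )
    return (high << 32) | low


print(f"{task_gate(0x0028):016X}")  # -> 0000850000280000
```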
TASK SWITCHING
• A task switch can occur in the following ways:
  – The current task JMPs to or CALLs a TSS descriptor.
  – The current task JMPs to or CALLs a task gate.
  – The current task executes an IRET when the NT flag is set.
  – An interrupt or exception selects a task gate.
• When a task switch takes place, the following steps are taken:
  – The new TSS descriptor or task gate must have the proper privilege to allow the task switch: the DPL (descriptor privilege level), CPL (current privilege level) and RPL (requested privilege level) values are compared before any further processing takes place.
  – The new TSS descriptor must have its present bit set and a valid limit field.
  – The state of the current task is saved (all processor registers are copied into the TSS of the current task).
  – The task register is loaded with the new TSS descriptor selector.
  – The busy bit in the new TSS descriptor is set, and TS = 1 in CR0.
  – The state of the new task is loaded from its TSS, and execution resumes in the new task.
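The checks and state transfer listed above can be modelled in a few lines. This is a simplified sketch under stated assumptions: it covers only a JMP or CALL to a TSS descriptor, treats the minimum valid limit as 67H (a 104-byte 32-bit TSS), stores each task's registers in a dictionary instead of real TSS memory, and does not model clearing the old task's busy bit or the NT/LINK handling of CALL. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

MIN_TSS_LIMIT = 0x67  # assumption: a 32-bit TSS needs at least 104 (68H) bytes


class TaskSwitchError(Exception):
    """One of the task-switch checks failed (a real CPU raises an exception)."""


@dataclass
class TSSDescriptor:
    selector: int            # GDT selector of this descriptor
    dpl: int                 # descriptor privilege level, 0..3
    present: bool            # P bit
    limit: int               # segment limit
    busy: bool = False       # B bit
    saved_regs: dict = field(default_factory=dict)  # stands in for TSS memory


@dataclass
class CPU:
    cpl: int                 # current privilege level
    tr: int                  # visible part of the task register
    regs: dict               # general, segment and EIP/EFLAGS registers
    cr0_ts: bool = False     # TS (task switched) flag in CR0


def task_switch(cpu, current, new, rpl=0):
    # 1. Privilege check for JMP/CALL to a TSS descriptor: max(CPL, RPL) <= DPL.
    if max(cpu.cpl, rpl) > new.dpl:
        raise TaskSwitchError("privilege violation")
    # 2. The new TSS descriptor must be present with a valid limit.
    if not new.present or new.limit < MIN_TSS_LIMIT:
        raise TaskSwitchError("TSS not present or limit too small")
    # 3. Save the state of the current task into its TSS.
    current.saved_regs = dict(cpu.regs)
    # 4. Load TR, mark the new TSS busy and set TS in CR0.
    cpu.tr = new.selector
    new.busy = True
    cpu.cr0_ts = True
    # 5. Load the state of the new task and resume it.
    cpu.regs = dict(new.saved_regs)
    return cpu
```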
LOGICAL TO PHYSICAL ADDRESS MAPPING IN MULTIPLE TASKS
EXCEPTIONS AND INTERRUPTS
• An interrupt is generated in response to a hardware request on the INTR or NMI inputs.
• An exception is generated during the execution of an instruction.
• Example: divide error is an exception generated when a DIV or IDIV instruction is executed with a divisor operand of 0.
Interrupt Descriptor Table

• Real mode uses a 1 KB interrupt vector table (IVT) starting at address 0000H.
• The 8-bit vector number is shifted 2 bits to the left to form an index into the IVT.
• Protected mode relies on an interrupt descriptor table (IDT) to support interrupts and exceptions.
• The IDT consists of 8-byte gate descriptors for task, trap or interrupt gates.
• The size of the IDT is controlled by a 16-bit limit value stored in the interrupt descriptor table register (IDTR).
• This 48-bit register contains the 32-bit base address and the 16-bit size limit.
• The 8-bit vector number of the currently recognized interrupt is shifted 3 bits to the left and used as an index into the IDT.
• For example, vector 10H accesses the descriptor at address 80H in the IDT.
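The index arithmetic for both tables is a pair of shifts. The sketch below (hypothetical helper names) also rejects a vector whose 8-byte descriptor would fall beyond the IDTR limit, assuming the limit holds the offset of the last valid byte of the table.

```python
def ivt_offset(vector):
    """Real mode: 4-byte IVT entries, so the vector is shifted left 2 bits."""
    return (vector & 0xFF) << 2


def idt_offset(vector):
    """Protected mode: 8-byte gate descriptors, so the vector is shifted left 3 bits."""
    return (vector & 0xFF) << 3


def idt_entry_address(idtr_base, idtr_limit, vector):
    """Linear address of the gate for `vector`, checked against the IDTR limit."""
    offset = idt_offset(vector)
    if offset + 7 > idtr_limit:  # the whole 8-byte descriptor must fit
        raise IndexError(f"vector {vector:#x} lies beyond the IDT limit")
    return idtr_base + offset


print(hex(idt_offset(0x10)))  # -> 0x80
```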

Interrupt Descriptor Table (IDT) Descriptors
• The IDT contains three kinds of descriptor: task gates, trap gates and interrupt gates.
• The P bit in each descriptor stands for "present" and indicates whether the segment is present in memory.
• The 2-bit DPL field specifies the descriptor privilege level.
• The 32-bit offset points to the first instruction in the handler's code segment.
• For interrupt and trap gates, the segment selector selects the handler's code segment descriptor from the GDT or LDT.
• For a task gate, the TSS segment selector points to a TSS descriptor in the GDT.
• Interrupt and trap gates operate like a CALL through a call gate; a task gate operates like a CALL to a task gate.
• The Pentium will enter shutdown mode if the limit is exceeded.

TASK GATE
  Offset 4: bits 31-16 Reserved | bit 15 P | bits 14-13 DPL | bits 12-8 00101 | bits 7-0 Reserved
  Offset 0: bits 31-16 TSS segment selector | bits 15-0 Reserved
INTERRUPT GATE
  Offset 4: bits 31-16 OFFSET (16-31) | bit 15 P | bits 14-13 DPL | bits 12-5 01110000 | bits 4-0 Reserved
  Offset 0: bits 31-16 Segment selector | bits 15-0 OFFSET (0-15)
TRAP GATE
  Offset 4: bits 31-16 OFFSET (16-31) | bit 15 P | bits 14-13 DPL | bits 12-5 01111000 | bits 4-0 Reserved
  Offset 0: bits 31-16 Segment selector | bits 15-0 OFFSET (0-15)
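A sketch of how an interrupt or trap gate is assembled from the fields above: the 32-bit handler offset is split across both doublewords, and the only difference between the two gate kinds is the type field (01110 versus 01111 in bits 12-8). `idt_gate` and the constant names are hypothetical.

```python
INTERRUPT_GATE = 0b01110  # bits 12-8 of the high doubleword
TRAP_GATE = 0b01111


def idt_gate(selector, offset, gate_type, dpl=0, present=True):
    """Pack a 32-bit interrupt or trap gate into a 64-bit integer."""
    low = ((selector & 0xFFFF) << 16) | (offset & 0xFFFF)  # selector | offset 0-15
    high = (
        (offset & 0xFFFF0000)        # bits 31-16: offset 16-31
        | (int(present) << 15)       # bit 15: P
        | ((dpl & 0b11) << 13)       # bits 14-13: DPL
        | (gate_type << 8)           # bits 12-8: gate type; bits 7-0 left 0
    )
    return (high << 32) | low


print(f"{idt_gate(0x0008, 0x00101234, INTERRUPT_GATE):016X}")  # -> 00108E0000081234
```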
INTERRUPT AND EXCEPTION VECTORS
Vector  Description             Error code  Cause
0       Divide error            No          Divisor is 0
1       Debug exception         No          Debug condition, e.g. single step or a task switch with the T bit set
2       NMI interrupt           No          Rising edge on the NMI input; cannot be disabled by software
3       Breakpoint              No          Opcode CCH (INT 3)
4       Overflow                No          INTO executed with the overflow flag set
5       Bound check             No          Array subscript out of range
6       Invalid opcode          No          Invalid or reserved opcode (D6H and F1H are reserved by Intel)
7       Device not available    No          Coprocessor (FPU) not available
8       Double fault            Yes         Two exceptions occur in sequence
9       (not used)              -           Not available on the Pentium
10      Invalid TSS             Yes         Invalid TSS during a task switch
11      Segment not present     Yes         Segment referenced by the current descriptor is not present
12      Stack fault             Yes         Stack segment limit exceeded or stack segment not present
13      General protection      Yes         e.g. the segment limit is exceeded
14      Page fault              Yes         Page not in memory, or page-level protection violation
15      (not used)              -           Not available on the Pentium
16      Floating-point error    No          Floating-point error, e.g. round-off
17      Alignment check         Yes         Memory operand larger than 1 byte at an odd address, or at an even address that is not a multiple of 2, 4 or 8 as its size requires
18      Machine check           No          Depends on the CPU model
19-31   Reserved                -           -
32-255  Maskable interrupts     No          Maskable (INTR) interrupts

PAGE FAULT ERROR CODE
Bits 31-3  Reserved
Bit 2      U/S  0 = access occurred in supervisor mode, 1 = access occurred in user mode
Bit 1      W/R  0 = read fault, 1 = write fault
Bit 0      P    0 = page not present, 1 = page-level protection violation
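A page fault handler can read the error code bit by bit. This sketch decodes the three defined bits into named flags; the dictionary keys are our own labels, not processor names.

```python
def decode_page_fault_error(code):
    """Decode bits 2-0 of a page fault error code (bits 31-3 are reserved)."""
    return {
        "user_mode": bool(code & 0b100),             # U/S: 1 = user, 0 = supervisor
        "write": bool(code & 0b010),                 # W/R: 1 = write, 0 = read
        "protection_violation": bool(code & 0b001),  # P: 1 = protection, 0 = not present
    }


print(decode_page_fault_error(0b110))
# -> {'user_mode': True, 'write': True, 'protection_violation': False}
```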
ADDRESSING MODES
• The Pentium processors can support the following addressing modes:
• Register mode
• Immediate mode
• Register direct mode
• Direct mode
• Base displacement mode
• Base indexed mode
• PC relative mode
• The effective address is computed as
  Effective address = Base register + Index register * Scaling factor + Displacement
• where
  – the base register can be any 32-bit general-purpose register: EAX, EBX, ECX, EDX, ESI, EDI, EBP or ESP,
  – the index register can be any 32-bit general-purpose register except ESP,
  – the scaling factor is 1, 2, 4 or 8.
• For example, an operand using all of these components is written [EBX][EDI*2]+FF, that is, EBX + EDI*2 + 0FFH.
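The effective-address formula translates directly into code. This sketch assumes 32-bit addressing, so the sum wraps around at 2^32 as the hardware adder does, and it only accepts the four legal scaling factors; the register values in the usage line are made up.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base=0, index=0, scale=1, disp=0):
    """EA = base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scaling factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32


# [EBX][EDI*2]+FF with EBX = 1000H and EDI = 10H (hypothetical values)
print(hex(effective_address(base=0x1000, index=0x10, scale=2, disp=0xFF)))  # -> 0x111f
```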
REGISTER ADDRESSING MODES
• Register mode: In this mode, the operand is in a register.
• For example, MOV EAX, EBX copies the content of the EBX register into the EAX register.
Example: The effect of executing the MOV BX, CX instruction at the point just before the BX register changes. Note that only the rightmost 16 bits of register EBX change.
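The note that only the rightmost 16 bits of EBX change can be stated as a bit mask. The hypothetical helper below models MOV BX, CX on 32-bit register values.

```python
def mov_16bit(dest32, src32):
    """Model MOV BX, CX on 32-bit register values: only bits 15-0 of the destination change."""
    return (dest32 & 0xFFFF0000) | (src32 & 0x0000FFFF)


ebx = mov_16bit(0x12345678, 0xAAAABBBB)  # hypothetical EBX and ECX contents
print(hex(ebx))  # -> 0x1234bbbb
```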
IMMEDIATE ADDRESSING MODES
• The term immediate implies that the data immediately follow the hexadecimal opcode in memory.
– immediate data are constant data
– data transferred from a register or memory location are variable data
EXAMPLE: The operation of the MOV EAX,13456H instruction. This instruction copies the immediate data (13456H) into EAX.
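Because the immediate data follow the opcode in memory, the encoding of MOV EAX,13456H can be built byte by byte. The sketch assumes 32-bit code, where B8H is the opcode for MOV EAX, imm32 and the immediate is stored little-endian.

```python
import struct


def encode_mov_eax_imm32(value):
    """Opcode B8H (MOV EAX, imm32) followed by the 4-byte immediate, little-endian."""
    return struct.pack("<BI", 0xB8, value & 0xFFFFFFFF)


print(encode_mov_eax_imm32(0x13456).hex(" "))  # -> b8 56 34 01 00
```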
REGISTER DIRECT ADDRESSING MODES
• Direct addressing with a MOV instruction transfers data between a memory location, located
within the data segment, and the AL (8-bit), AX (16-bit), or EAX (32-bit) register.
• It is usually a 3-byte instruction (in 16-bit code).
• MOV AL,DATA loads AL from the data segment memory location DATA (1234H).
• DATA is a symbolic memory location, while 1234H is the actual hexadecimal location.
EXAMPLE: The operation of the MOV AL,[1234H] instruction when DS=1000H (memory location 10000H + 1234H = 11234H).
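In real mode, the memory location used in these examples is the segment register value multiplied by 16 plus the offset. The hypothetical helper below computes that physical address; it does not model the wrap-around at 1 MB.

```python
def real_mode_address(segment, offset):
    """Physical address = segment * 16 + offset (1 MB wrap-around not modelled)."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


print(hex(real_mode_address(0x1000, 0x1234)))  # MOV AL,[1234H] with DS=1000H -> 0x11234
```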
DIRECT ADDRESSING MODES
• In direct addressing mode, the effective address is contained within the instruction.
• For example, in MOV EAX, [address], the content of the memory location at the effective address is copied into the EAX register.
[Figure: the instruction holds the opcode and address A; A selects the operand in memory.]
REGISTER INDIRECT ADDRESSING MODES
• Register indirect addressing allows data to be addressed at any memory location through an offset address held in any of the following registers: BP, BX, DI, and SI.
• In 64-bit mode, the segment registers serve no purpose in addressing a location in the flat model.
EXAMPLE: The operation of the MOV AX,[BX] instruction when BX = 1000H and DS = 0100H (memory location 01000H + 1000H = 02000H). Note that this instruction is shown after the contents of memory are transferred to AX.
BASE DISPLACEMENT ADDRESSING MODES
• The effective address is the sum of the content of a base register and a constant displacement.
• For example, in MOV EAX, [ESP+4], the content of the memory location specified by the effective address (ESP + 4) is copied into the EAX register.
[Figure: the instruction holds the opcode, register R and displacement A; the content of R plus A selects the operand in memory.]
BASE INDEXED ADDRESSING MODES
• The effective address is the sum of the contents of two registers, for example MOV EAX,[ESP][ESI].
• The contents of the ESP and ESI registers are added to generate the effective address of the memory location.
• The content of the memory location specified by the effective address is copied into the EAX register.
EXAMPLE: An example showing how the base-plus-index addressing mode functions for the MOV DX,[BX + DI] instruction. Notice that memory address 02010H is accessed because DS=0100H (segment base 01000H) and BX + DI = 1010H.
PC RELATIVE ADDRESSING MODES
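• PC-relative addressing is used by JMP (to any address) and by the conditional jump instructions: the target address is the address of the next instruction (EIP) plus a signed displacement held in the instruction.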
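A sketch of the target computation for a relative jump. It assumes an 8-bit displacement by default (as in a 2-byte short JMP, opcode EBH) that is sign-extended before it is added to the address of the next instruction; the addresses in the usage lines are made up.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def relative_target(next_eip, disp, disp_bits=8):
    """Jump target = address of the next instruction + sign-extended displacement."""
    return (next_eip + sign_extend(disp, disp_bits)) & 0xFFFFFFFF


# A 2-byte short JMP located at 1000H, so the next instruction starts at 1002H.
print(hex(relative_target(0x1002, 0x10)))  # -> 0x1012
print(hex(relative_target(0x1002, 0xFE)))  # -> 0x1000 (jumps back to itself)
```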
INSTRUCTION SET IN PENTIUM
Meanings of the operand specifications:
reg - register mode operand, 32-bit register
reg8 - register mode operand, 8-bit register
r/m - general addressing mode, 32-bit
r/m8 - general addressing mode, 8-bit
immed - 32-bit immediate is in the instruction
immed8 - 8-bit immediate is in the instruction
m - symbol (label) in the instruction is the effective address
• INTEGER INSTRUCTIONS: Data Transfer Instructions
• MOV Move
• CMOVE/CMOVZ Conditional move if equal/Conditional move if zero
• CMOVNE/CMOVNZ Conditional move if not equal/Conditional move if not zero
• CMOVA/CMOVNBE Conditional move if above/Conditional move if not below or equal
• CMOVAE/CMOVNB Conditional move if above or equal/Conditional move if not below
• CMOVB/CMOVNAE Conditional move if below/Conditional move if not above or equal
• CMOVBE/CMOVNA Conditional move if below or equal/Conditional move if not above
• CMOVG/CMOVNLE Conditional move if greater/Conditional move if not less or equal
• CMOVGE/CMOVNL Conditional move if greater or equal/Conditional move if not less
• CMOVL/CMOVNGE Conditional move if less/Conditional move if not greater or equal
• CMOVLE/CMOVNG Conditional move if less or equal/Conditional move if not greater
• CMOVC Conditional move if carry
• CMOVNC Conditional move if not carry
• CMOVO Conditional move if overflow
• CMOVNO Conditional move if not overflow
• CMOVS Conditional move if sign (negative)
• CMOVNS Conditional move if not sign (non-negative)
• CMOVP/CMOVPE Conditional move if parity/Conditional move if parity even
• CMOVNP/CMOVPO Conditional move if not parity/Conditional move if parity odd
• XCHG Exchange
• BSWAP Byte swap
• XADD Exchange and add
• CMPXCHG Compare and exchange
• CMPXCHG8B Compare and exchange 8 bytes
• PUSH Push onto stack
• POP Pop off of stack
• PUSHA/PUSHAD Push general-purpose registers onto stack
• POPA/POPAD Pop general-purpose registers from stack
• IN Read from a port
• OUT Write to a port
• CWD/CDQ Convert word to doubleword/Convert doubleword to quadword
• CBW/CWDE Convert byte to word/Convert word to doubleword in EAX register
• MOVSX Move and sign extend
• MOVZX Move and zero extend
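The data transfer semantics of a few of the less familiar instructions above can be modelled on 32-bit integers. These helpers are hypothetical models written for illustration, not an emulator; each function returns the value(s) the instruction would leave in its operands.

```python
MASK32 = 0xFFFFFFFF


def bswap(x):
    """BSWAP r32: reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def movzx8(byte):
    """MOVZX r32, r/m8: copy the byte and fill bits 31-8 with zeros."""
    return byte & 0xFF


def movsx8(byte):
    """MOVSX r32, r/m8: copy the byte and fill bits 31-8 with its sign bit."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def xadd(dest, src):
    """XADD dest, src: returns (new dest, new src) = (dest + src, old dest)."""
    return (dest + src) & MASK32, dest


print(hex(bswap(0x12345678)))  # -> 0x78563412
```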

FLOATING POINT INSTRUCTION SET IN PENTIUM
