
EXPERIMENT NO:

Aim: To study the comparison between the x86 architecture and the Pentium processor


Theory:
Features:
● Complex Instruction Set Computer (CISC) architecture with Reduced Instruction Set
Computer (RISC) performance
● 64-bit data bus and 32-bit address bus
● Upward code compatibility
● Pentium processor uses Superscalar architecture and hence can issue multiple
instructions per cycle.
● Multiple Instruction Issue (MII) capability
● Pentium processor executes instructions in five stages. This staging, or pipelining, allows
the processor to overlap multiple instructions so that it takes less time to execute two
instructions in a row.
● The Pentium processor fetches the branch target instruction before it executes the
branch instruction.
● The Pentium processor has two separate 8-kilobyte (KB) caches on chip, one for
instructions and one for data. It allows the Pentium processor to fetch data and
instructions from the cache simultaneously.
● When data is modified, only the data in the cache is changed. Memory data is changed
only when the Pentium processor replaces the modified data in the cache with a
different set of data
● The Pentium processor has been optimized to run critical instructions in fewer clock
cycles than the 80486 processor.
● The Pentium processor has two primary operating modes -
I. Protected Mode - In this mode all instructions and architectural features are available,
providing the highest performance and capability. This is the recommended mode that
all new applications and operating systems should target.
II. Real-Address Mode - This mode provides the programming environment of the Intel
8086 processor, with a few extensions. Reset initialization places the processor in real
mode where, with a single instruction, it can switch to protected mode
● Floating Point Unit - There are eight 80-bit floating-point registers. The floating-point
unit has an 8-stage pipeline. Since errors are more likely in the floating-point
unit (FPU) than in the integer unit, the FPU pipeline has an additional error-checking stage.

● Local advanced programmable interrupt controller


● Data Integrity and Error Detection - The processor has significant error detection and data
integrity capability
● Data parity checking is done on a byte-by-byte basis. Address parity checking and internal
parity checking features are added
● Dual Integer Processor- Allows execution of two instructions per clock cycle
● Functional redundancy check- To provide maximum error detection of the processor and
interface to the processor
● A second ‘checker’ processor executes in lock step with the ‘master’ processor.
It compares the master’s outputs with its own internally computed values
and generates an error signal in case of a mismatch
● Three execution units- One execution unit executes floating point instructions & the
other two (U pipe and V pipe) execute integer instructions

Architecture:
The Pentium processor is a complex machine with many interlocking parts. At the heart of the
processor are the two integer pipelines, the U pipeline and the V pipeline. These pipelines are
responsible for executing 80x86 instructions. A floating-point unit is included on the chip to execute
instructions previously handled by the external 80x87 math coprocessors. During execution, the U and V
pipelines are capable of executing two integer instructions at the same time, under special conditions, or
one floating-point instruction.
The Pentium communicates with the outside world via a 32-bit address bus and a 64-bit data
bus. The bus unit is capable of performing burst reads and writes of 32 bytes to memory, and through
bus cycle pipelining, allows two bus cycles to be in progress simultaneously.
An 8KB instruction cache is used to provide quick access to frequently used instructions.
When an instruction is not found in the instruction cache, it is read from the external data bus and a
copy placed into the instruction cache for future references. The branch target buffer and prefetch
buffers work together with the instruction cache to fetch instructions as fast as possible. The prefetch
buffers maintain a copy of the next 32 bytes of prefetched instruction code, and can be loaded from the
cache in a single clock cycle, due to the 256-bit wide data output of the instruction cache.
A separate 8KB data cache stores a copy of the most frequently accessed memory data. Since
memory accesses are significantly longer than processor clock cycles, it pays to keep a copy of memory
data in a fast-reading cache. The data and instruction caches may both be enabled/disabled with
hardware or software. Each also has its own translation lookaside buffer (TLB), which converts linear
addresses into physical addresses when paging is employed.
The Pentium uses a technique called branch prediction to maintain a steady flow of
instructions into the pipelines. To support branch prediction, the branch target buffer maintains a copy
of instructions in a different part of the program located at an address called the branch target.
The floating-point unit of the Pentium maintains a set of floating-point registers and provides
80-bit precision when performing high-speed math operations. The floating-point unit uses hardware in
the U and V pipelines to perform the initial work during a floating-point instruction (such as fetching a
64-bit operand), and then uses its own pipeline to complete the operation. Since both integer pipelines
are used, only one floating-point instruction may be executed at a time.

Superscalar Architecture and Pipelining
In Pentium processor, the integer instructions traverse a five-stage pipeline. The pipeline
stages are as follows:
PF – Prefetch
D1 – Instruction Decode
D2 – Address Generate
EX – Execute – ALU and Cache Access
WB – Write-Back
Pentium processor is a superscalar machine, capable of executing two instructions in parallel.
The five stage pipelines operate in parallel allowing integer instructions to execute in a single
clock in each pipeline. The pipelines in Pentium processor are called U and V pipes and the
process of issuing two instructions in parallel is termed as 2 Issue superscalar. There are two
execution units in Pentium and the instruction pairing allows each unit to complete the
execution of an instruction at the same time.
Figure 1.2 depicts how ten instructions move through the pipeline of the Pentium processor.

Each instruction takes five clock cycles to pass through the five pipeline stages. In clock cycle 1, the
prefetch (PF) action is performed: a pair of instructions is prefetched from the on-chip code
cache. During clock cycle 2, this first pair is issued in parallel to the U and V pipelines for decoding
(D1 stage), while another pair is being prefetched (PF stage).
In clock cycle 3, the first instruction pair moves to the decode 2 (D2) stage, while the second pair is
issued to the decode 1 (D1) stage of both pipelines and the third pair of instructions is
being prefetched (PF stage). In this way, each pair of instructions can proceed to the next stage in
the pipeline with each cycle of the processor clock (PCLK). During clock cycle 5, the first
instruction pair completes its execution. In the CLK5 column of the figure, the first pair is in the
last stage (WB) of the pipeline, the second pair is in the 4th stage (EX),
the third pair is in the 3rd stage (D2), and so on. Thus, ten different
instructions are present at the various pipeline stages during a single clock cycle. After clock
cycle 5, each succeeding clock cycle sees the completion of another instruction pair.
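
The overlap described above can be sketched in a few lines of Python. This is only an illustration, assuming an ideal pipeline with no stalls in which every pair of instructions is pairable; the names STAGES, stage_of and snapshot are made up for this sketch.

```python
# Idealized model of the five-stage integer pipeline (no stalls assumed).
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def stage_of(pair, clock):
    """Stage occupied by instruction pair `pair` during clock `clock`.

    Both arguments are 1-based. Pair p is assumed to enter PF in clock p.
    Returns None when the pair is not in the pipeline during that clock.
    """
    index = clock - pair
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def snapshot(clock, num_pairs):
    """Map each in-flight pair number to its stage for the given clock."""
    result = {}
    for pair in range(1, num_pairs + 1):
        stage = stage_of(pair, clock)
        if stage is not None:
            result[pair] = stage
    return result


if __name__ == "__main__":
    for clk in range(1, 8):
        print("CLK", clk, snapshot(clk, 5))
```

In clock 5 the snapshot holds five pairs, one per stage, which corresponds to the ten instructions in flight mentioned in the text.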

Integer Pipeline Stages:

1. Prefetch (PF Stage):

There are two prefetch buffer/queues present in Pentium and at a time, one of them is active.
The active queue fetches the instruction codes from the on-chip cache or memory until the
branch prediction logic predicts that a branch will be taken when the branch instruction
reaches the execution stage. During the normal pipeline operation, this active queue
supplies two consecutive instructions to U and V pipelines.

2. Decode 1 (D1) Stage:

The instructions that fill the two pipelines are decoded in the D1 stage. In addition to decoding,
this stage checks the instructions for pairability and performs branch prediction.

▪ Instruction Pairing:
Two instructions are pairable only if they satisfy the following conditions:
a) Both instructions in the pair must be simple. Simple instructions are those that are
completely hardwired. They do not require any microcode control
and execute in 1, 2 or at most 3 clock cycles.
b) There must be no register dependencies/contention between them.
If the two instructions are not pairable, instruction I2 in the V pipeline’s D1 stage is deleted and
shifted to the D1 stage of the U pipeline when I1 moves to the D2 stage of the U pipeline.
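
The two pairing conditions can be expressed as a small check. The sketch below is a deliberately simplified model: the Instr record, its fields and the can_pair function are hypothetical, and the rules used by the actual processor contain further cases that are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified description of one instruction."""
    name: str
    dest: Optional[str]          # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read
    simple: bool = True          # hardwired, no microcode needed


def can_pair(i1, i2):
    """Return True if i1 (U pipe) and i2 (V pipe) may issue together."""
    # Condition (a): both instructions must be simple.
    if not (i1.simple and i2.simple):
        return False
    # Condition (b): no register dependency or contention.
    if i1.dest is not None:
        if i1.dest in i2.srcs:   # i2 reads what i1 writes
            return False
        if i1.dest == i2.dest:   # both write the same register
            return False
    return True


if __name__ == "__main__":
    a = Instr("MOV EAX, EBX", dest="EAX", srcs=("EBX",))
    b = Instr("ADD ECX, EDX", dest="ECX", srcs=("ECX", "EDX"))
    c = Instr("ADD ECX, EAX", dest="ECX", srcs=("ECX", "EAX"))
    print(can_pair(a, b), can_pair(a, c))
```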
▪ Branch Prediction:
The Pentium processor includes branch prediction logic, allowing it to avoid pipeline stalls
if it correctly predicts whether or not the branch will be taken when the branch
instruction is executed. When a branch operation is correctly predicted, no performance
penalty is incurred. However, when branch prediction is not correct, a three cycle
penalty is incurred if the branch is executed in the U pipeline and a four cycle penalty if
the branch is in the V pipeline.
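
As a worked example of these penalties, the average cost per branch can be estimated from the prediction accuracy. The function name and the example figures below (80% accuracy, half of the branches in each pipeline) are assumptions chosen only for illustration; they are not measured values.

```python
U_PIPE_PENALTY = 3  # cycles lost on a misprediction in the U pipeline
V_PIPE_PENALTY = 4  # cycles lost on a misprediction in the V pipeline


def expected_penalty(accuracy, fraction_in_u):
    """Average penalty cycles per branch.

    accuracy      -- fraction of branches predicted correctly (0.0 to 1.0)
    fraction_in_u -- fraction of branches executed in the U pipeline
    """
    miss_rate = 1.0 - accuracy
    per_miss = (fraction_in_u * U_PIPE_PENALTY
                + (1.0 - fraction_in_u) * V_PIPE_PENALTY)
    return miss_rate * per_miss


if __name__ == "__main__":
    # Hypothetical workload: 80% accuracy, branches split evenly.
    print(round(expected_penalty(0.8, 0.5), 3))
```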

3. Decode 2 or D2 Stage:

The D1 stage is followed by the D2 stage, in which the instructions are further decoded and the
addresses of memory-resident operands are calculated. This stage performs segmentation
addressing. Address calculation in this stage is fast: the Pentium requires a single
clock cycle to calculate the address even for instructions that use a base and
index addressing mode with displacement and for those that use an immediate addressing mode. During
the D2 stage, the processor also performs the segmentation protection checks required
when the processor is forming memory addresses in protected mode.

4. Execution or EX-Stage:

The execution stage consists of the arithmetic logic unit, or ALU. The U pipeline’s ALU
incorporates a barrel shifter, while the V pipeline’s does not. As a result, the
U pipeline can handle some instructions that cannot be handled in the V pipeline. When
necessary, data cache accesses (on a cache hit) or memory accesses (on a cache miss)
are performed in this stage. Access to the data cache can be made by the U pipeline and
V pipeline simultaneously. Both instructions enter the execution stage at the same time.
If the instruction in the V pipeline stalls, the U pipeline instruction is permitted to
proceed to the write-back stage (i.e. the last stage in integer pipeline). However, if the U
pipeline instruction stalls, the V pipeline instruction will not proceed to the write-back
stage.

5. Write-Back or WB Stage:
This is the final stage of integer instruction execution. In WB stage, the processor state is
modified by updating target registers and EFLAGS register (if necessary).

Floating Point Instruction Pipeline Stages:

Most floating-point instructions are issued singly to the U pipeline and cannot be paired
with integer instructions. The floating-point pipeline consists of eight stages. The first four stages
are shared with the integer pipeline and the last four reside within the floating-point unit itself.

Instruction Pairing Rules for Floating Point Instructions:

i. FP instructions are normally issued to the U pipeline singly as they do not get paired
with integer instructions. However, a limited pairing of two FP instructions can be
performed.
ii. Pairing can occur only if the first instruction, issued to the U pipeline, belongs to the
simple set (set F) and the second instruction is the floating-point exchange
instruction, FXCH. The set F (simple) instructions are FLD single/double precision, FLD ST(i),
and all forms of FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FABS, and FCHS.

The 8 pipeline stages are:

FPU Internal Pipelining-resources :

Inside the FPU, the resources can be allocated to one or more instructions at a time. This
permits pipelined execution within the FPU. This is illustrated by the following examples:

i. An FDIV instruction cannot be executed with any other instruction, since FDIV requires
all of the FPU resources.
ii. Similarly, two consecutive FMUL instructions cannot be executed simultaneously.
iii. An FMUL instruction can be executed in parallel with one or two FADD instructions.
iv. Three FADD instructions can be executed simultaneously.

The Register Set (Software Model /Architecture)

The Intel x86 architecture's register set is subdivided into the following groups:

1. Base architecture registers (Application register set)

I. General-purpose registers (8x32 bit)


II. Instruction pointer (EIP 32 bit)
III. Flags Register (EFLAGS 32 bit)

IV. Segment registers (6x16 bit)
Six 16-bit segment registers CS, SS, DS, ES, FS, and GS hold segment selector values
identifying the currently addressable memory segments. The selector in CS indicates
the current code segment, the selector in SS indicates the current stack segment, and
the selectors in DS, ES, FS and GS indicate the current four data segments.
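
A selector value can be broken into its fields with a few bit operations. The field layout used below (table index in bits 15 to 3, table indicator in bit 2, requested privilege level in bits 1 to 0) comes from Intel's architecture documentation rather than from the text above, and the function name decode_selector is made up for this sketch.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return {
        "index": selector >> 3,                        # descriptor table entry
        "table": "LDT" if selector & 0x4 else "GDT",   # table indicator bit
        "rpl": selector & 0x3,                         # requested privilege level
    }


if __name__ == "__main__":
    # 0x0023 is an arbitrary example value.
    print(decode_selector(0x0023))
```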

2. System registers

I. Memory management registers:


Four memory management registers are used to control segmented memory management.
The Global Descriptor Table Register (GDTR) and Interrupt Descriptor Table Register
(IDTR) can be loaded with instructions which get a 6-byte data item from memory. The
Local Descriptor Table Register (LDTR) and Task Register (TR) can be loaded with
instructions, which take a 16-bit segment selector as an operand. The remaining bytes
of these registers are then loaded automatically by the processor from the descriptor
referenced by the operand.

II. Control registers:


There are five control registers (CR0, CR1, CR2, CR3 and CR4). Only four of them are used by
the current implementation; register CR1 is reserved for future use. The CR0
register contains system control flags, which control modes of operation or indicate states of
the processor.

3. Floating-point registers

The on-chip FPU includes eight 80-bit data registers R0 to R7, a 16-bit tag word, a 16-bit control
register, a 16-bit status register, a 48-bit instruction pointer, and a 48-bit data pointer.

I. Data registers
II. Tag word
III. Status word
IV. instruction and data pointers

4. Debug registers

The base architecture and floating-point registers are accessible by applications programs. The
system and debug registers are accessible only by system programs (such as OS), running on the
highest privilege level.

The Flags Register (EFLAGS) :

It is a 32-bit register called EFLAGS. The specified bits and bit fields of EFLAGS control a number
of operations and indicate the status of the processor. The lower 16 bits of EFLAGS, called
FLAGS, are used when executing 8086 or 80286 code.
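
The individual flags can be read out of an EFLAGS value by masking single bits. The bit positions below are the standard x86 ones, listed here from Intel's documentation since the text does not give them; only a subset of flags is shown, and the names decode_eflags and low_flags are hypothetical.

```python
# Bit positions of some single-bit flags in EFLAGS.
FLAG_BITS = {
    "CF": 0,   # carry
    "PF": 2,   # parity
    "AF": 4,   # auxiliary carry
    "ZF": 6,   # zero
    "SF": 7,   # sign
    "TF": 8,   # trap
    "IF": 9,   # interrupt enable
    "DF": 10,  # direction
    "OF": 11,  # overflow
}


def decode_eflags(value):
    """Return the listed flags as booleans plus the 2-bit IOPL field."""
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0x3   # I/O privilege level, bits 13..12
    return flags


def low_flags(value):
    """Lower 16 bits of EFLAGS, i.e. the FLAGS register."""
    return value & 0xFFFF


if __name__ == "__main__":
    # 0x00003246 is an arbitrary example value.
    print(decode_eflags(0x00003246))
```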

Memory Management:

The primary functions of the MMU are:

1. Translation of the virtual (logical) address into a physical (real) address


2. Provide for the paging mechanism involved in the virtual memory organization. The
paging unit does this.
3. Provide for the segmentation mechanism by the segmentation unit.
4. Provide for memory protection. This is usually done within the paging or segmentation
unit, or both.
5. Inclusion and management of a fast-access translation look aside buffer (TLB)
▪ Segmentation

Segmented memory is utilized by protected mode to allow tasks to have their own separate
memory spaces, which are protected from access by other tasks. A segment can be from 1
byte to 4 GB long. Segments can start at any base address in memory, and storage
overlapping between segments is allowed.

▪ Address Translation Mechanism


A virtual (logical) address in the x86 architecture is formed out of two components:
1. A 16-bit selector, used to determine the linear base address of the segment (the address of the
first byte of the segment).
2. A 32-bit offset, used to address a location within the segment. The offset of a
given memory location is its distance in bytes from the segment base address.
(For an instruction fetch, the offset is held in EIP.)
● Segment Descriptors

Each segment has a segment descriptor associated with it. The segment descriptor is 8 bytes
long and contains the following information about the segment:

1. A 32-bit segment base linear address.


2. A 20-bit segment limit, specifying the size of the segment
3. Access rights byte, containing protection mechanism information
4. Control bits
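
The sketch below unpacks an 8-byte descriptor into these fields and then forms a linear address from the segment base and an offset. The byte layout follows Intel's documented descriptor format as best understood here; the function names and the example descriptor bytes are made up, and the limit check covers only ordinary (expand-up) segments.

```python
import struct


def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor (little-endian in memory)."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is 8 bytes long")
    limit_lo, base_lo, base_mid, access, flags_limit_hi, base_hi = \
        struct.unpack("<HHBBBB", raw)
    base = base_lo | (base_mid << 16) | (base_hi << 24)    # 32-bit base
    limit = limit_lo | ((flags_limit_hi & 0x0F) << 16)     # 20-bit limit
    page_granular = bool(flags_limit_hi & 0x80)            # G control bit
    # With G set, the limit counts 4 KB units instead of bytes.
    size = ((limit + 1) << 12) if page_granular else (limit + 1)
    return {"base": base, "limit": limit, "access": access,
            "page_granular": page_granular, "size_bytes": size}


def logical_to_linear(descriptor, offset):
    """Linear address = segment base + offset, after a limit check."""
    if offset >= descriptor["size_bytes"]:
        raise ValueError("offset exceeds segment limit")
    return (descriptor["base"] + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    # Example bytes: base 0x12345678, byte-granular limit 0xFF.
    d = decode_descriptor(bytes.fromhex("ff00785634924012"))
    print(hex(logical_to_linear(d, 0x10)))
```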
● Paging Mechanism
The paging mechanism is optional. It is enabled when the PG bit (bit 31) in CR0 is set. Paging
works beneath segmentation and is transparent to it. The standard page
size of the x86 is 4 KB = 2^12 bytes, but can be extended to 4 MB on the Pentium
processor. The x86 uses two levels of tables to translate the linear address into a physical
address. There are three components to the paging mechanism: the page directory, the
page tables, and the page frame. A uniform size for all the elements simplifies memory
allocation and reallocation schemes, since there is no problem with memory
fragmentation.
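
For 4 KB pages, the two-level lookup splits the 32-bit linear address into a 10-bit page directory index, a 10-bit page table index and a 12-bit offset. The sketch below models the tables as plain Python dictionaries, which is an assumption made for illustration only; real entries also carry present and protection bits that are ignored here.

```python
def split_linear(linear):
    """Return (directory index, table index, offset) for a 4 KB page."""
    directory_index = (linear >> 22) & 0x3FF   # bits 31..22
    table_index = (linear >> 12) & 0x3FF       # bits 21..12
    offset = linear & 0xFFF                    # bits 11..0
    return directory_index, table_index, offset


def translate(linear, page_directory, page_tables):
    """Toy two-level translation.

    page_directory -- dict: directory index -> page table identifier
    page_tables    -- dict: identifier -> dict of table index -> frame base
    A missing key (KeyError) stands in for a not-present entry.
    """
    directory_index, table_index, offset = split_linear(linear)
    table_id = page_directory[directory_index]
    frame_base = page_tables[table_id][table_index]   # 4 KB aligned
    return frame_base | offset


if __name__ == "__main__":
    # Hypothetical mappings chosen for the example.
    directory = {1: "table_a"}
    tables = {"table_a": {3: 0x0009A000}}
    print(hex(translate(0x00403025, directory, tables)))
```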
Pentium Processor Cache:
● The cache line size is 32-bytes.
● The Pentium processor has separate code and data caches, each of 8 KB.
● Since the Pentium processor has data bus of 8 bytes (64 – bits), it requires a burst
of four consecutive transfers to fill the cache line of 32 bytes.
● The data cache can be configured as a write-through or a write-back cache on a
line-by-line basis and it follows the MESI protocol
● Each cache is organized as two-way set-associative.
● Each cache has a dedicated translation look aside buffer (TLB) to translate linear
addresses to physical addresses.
● The code cache does not require a write policy, as it is a read-only cache.
● The data cache tags are triple ported to support two data transfers and a snoop
cycle in the same clock.
● The code cache tags are also triple ported to support snooping and split line
access simultaneously.
● Individual pages in the main memory can be configured as cacheable or
non-cacheable by software or hardware.
● The cache can be enabled or disabled by software or hardware
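
The figures in this list fix the cache geometry: 8 KB with 32-byte lines and two ways gives 128 sets, so an address splits into a 5-bit line offset, a 7-bit set index and a 20-bit tag. The sketch below works this out; the constant and function names are chosen for illustration only.

```python
CACHE_BYTES = 8 * 1024   # size of each on-chip cache
LINE_BYTES = 32          # cache line size
WAYS = 2                 # two-way set-associative
BUS_BYTES = 8            # 64-bit data bus

NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)
OFFSET_BITS = (LINE_BYTES - 1).bit_length()
INDEX_BITS = (NUM_SETS - 1).bit_length()
BURST_TRANSFERS = LINE_BYTES // BUS_BYTES   # bus transfers per line fill


def split_address(address):
    """Return (tag, set index, line offset) for a 32-bit address."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS, BURST_TRANSFERS)
    print([hex(field) for field in split_address(0x12345678)])
```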

Protection Mechanism in x86:

In protected mode, the Intel Architecture provides a protection mechanism that operates
at both the segment level and the page level. This protection mechanism provides the
ability to limit access to certain segments or pages based on privilege levels (four
privilege levels for segments and two privilege levels for pages). For example, critical
operating-system code and data can be protected by placing them in more privileged
segments than those that contain applications code. The processor’s protection
mechanism will then prevent application code from accessing the operating-system code
and data in any but a controlled, defined manner.

Segment and page protection can be used at all stages of software development to assist
in localizing and detecting design problems and bugs. It can also be incorporated into
end-products to offer added robustness to operating systems, utilities software, and
applications software. When the protection mechanism is used, each memory reference
is checked to verify that it satisfies various protection checks. All checks are made before
the memory cycle is started; any violation results in an exception. Because checks are
performed in parallel with address translation, there is no performance penalty.

The protection checks that are performed fall into the following categories:

• Limit checks.
• Type checks.
• Privilege level checks.
• Restriction of addressable domain.
• Restriction of procedure entry-points
• Restriction of instruction set

IO Protection in Intel x86:

Two mechanisms provide protection for I/O functions:

1. The IOPL field in the EFLAGS register defines the right to use I/O-related instructions.
2. The I/O permission bit map of a TSS segment defines the right to use ports in the I/O
address space.
These mechanisms operate only in protected mode, including virtual 8086 mode; they do not
operate in real mode. In real mode, there is no protection of the I/O space; any procedure can
execute I/O instructions, and any I/O port can be addressed by the I/O instructions.
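
The way the two mechanisms combine for an access to a single byte-wide port can be sketched as follows. This is a simplified model: the function name io_allowed and its parameters are hypothetical, virtual 8086 mode and multi-byte port accesses are not modelled, and a port lying beyond the end of the bit map is treated as denied.

```python
def io_allowed(cpl, iopl, port, io_bitmap):
    """Decide whether a protected-mode task may access one I/O port.

    cpl       -- current privilege level of the task (0 = most privileged)
    iopl      -- IOPL field from EFLAGS
    port      -- I/O port number
    io_bitmap -- bytes of the I/O permission bit map from the task's TSS
    """
    # Sufficient privilege: the bit map is not consulted.
    if cpl <= iopl:
        return True
    # Otherwise the port's bit in the map must be clear.
    byte_index, bit = divmod(port, 8)
    if byte_index >= len(io_bitmap):
        return False
    return (io_bitmap[byte_index] >> bit) & 1 == 0


if __name__ == "__main__":
    bitmap = bytearray(b"\xff" * 8192)   # deny every port by default
    bitmap[0x3F8 // 8] &= 0xFE           # example: permit port 0x3F8
    print(io_allowed(3, 0, 0x3F8, bitmap), io_allowed(3, 0, 0x3F9, bitmap))
```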

Task Management in x86:

The x86 architecture was particularly designed for efficient handling of tasks in a
multitasking environment. A task can be defined as an instance of the execution of a program. A
very important attribute of any multitasking, multi-user OS is the ability to switch rapidly
between tasks. The x86 supports the task switching operation in hardware. The task switch
operation saves the entire state of the machine (all the registers, the address space, and a link
to the previous task), loads a new execution state, performs protection checks, and begins
execution of the new task.

The task switch operation is invoked by executing an intersegment JMP or CALL instruction,
which refers to a task state segment (TSS) or a task gate descriptor in the GDT or LDT.

Conclusion:
