Processor
Bogdan Ilisie
&
Rika Kanai
These features made the newly introduced chip a very popular choice for desktop computers,
although it was later found that the processor had some notorious implementation
errors.
The Pentium CPU (MMX)
Pipelined Integer Unit
As can be seen from the previous diagram, the integer unit has two pipelines (U and
V), while the floating-point unit (FPU) has one.
Although later processors such as the Pentium MMX modified the five pipeline stages
(prefetch, first decode, second decode, execute, and write-back), for example by adding an
intermediate FIFO queue that buffers instructions, these stages remain the core foundation
of the pipeline.
1) In the prefetch stage, two prefetch buffers read the instructions to be executed. The instructions are then issued
to either the U or the V pipeline; the U pipeline can handle more complex instructions than the V pipeline.
2) In the decode stage, two decoders decode the instructions and try to pair them so that they can run in parallel,
which is possible because the Pentium has a superscalar architecture.
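To make the stage-by-stage flow concrete, here is a minimal Python sketch of an idealized five-stage pipeline using the P5 stage names (PF, D1, D2, EX, WB). It is a simplification under stated assumptions: a single pipe, one new instruction entering per cycle, one cycle per stage, and no stalls, so pairing and hazards are ignored entirely.

```python
# Idealized, single-issue model of a five-stage pipeline.
# Assumptions: one instruction enters per cycle, every stage takes one cycle,
# and there are no stalls, branches, or pairing.
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_schedule(instructions: list[str]) -> list[dict[str, str]]:
    """Return, for each clock cycle, which instruction occupies each stage."""
    n_cycles = len(instructions) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction k enters PF in cycle k and moves one stage per cycle.
            instr_index = cycle - stage_index
            if 0 <= instr_index < len(instructions):
                row[stage] = instructions[instr_index]
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    program = ["i1", "i2", "i3"]
    for cycle, row in enumerate(pipeline_schedule(program), start=1):
        cells = [f"{stage}={row.get(stage, '-')}" for stage in STAGES]
        print(f"cycle {cycle}: " + "  ".join(cells))
```

Under these assumptions the three instructions finish in 7 cycles instead of the 15 (3 × 5) that a non-pipelined design would need, because up to five instructions are in flight at once.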
Even though the Pentium has a superscalar architecture, two instructions can run
concurrently, as in the diagram below, only if they satisfy certain rules. Essentially, the
instructions have to be independent; otherwise, the decode stage cannot pair them. For
example, the second instruction may not read or write a register that the first instruction
writes.
The two paired instructions also have to be simple (basic), hardwired instructions; for example,
an instruction that contains both a memory displacement and an immediate operand cannot be paired.
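The pairing conditions can be sketched as a single predicate. The `Instr` record and its fields are hypothetical, and the model is deliberately simplified: it tracks only register reads and writes (no flags or memory), represents "basic" as a `simple` flag, and assumes the displacement restriction means an instruction may not combine a displacement with an immediate operand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified description of one x86 instruction."""
    text: str
    writes: frozenset[str] = frozenset()   # registers the instruction writes
    reads: frozenset[str] = frozenset()    # registers the instruction reads
    simple: bool = True                    # hardwired, single-cycle instruction
    has_displacement: bool = False
    has_immediate: bool = False


def can_pair(first: Instr, second: Instr) -> bool:
    """Return True if `first` (U pipe) and `second` (V pipe) may issue together."""
    # Both instructions must be simple ("basic").
    if not (first.simple and second.simple):
        return False
    # Assumed rule: displacement + immediate in one instruction prevents pairing.
    for instr in (first, second):
        if instr.has_displacement and instr.has_immediate:
            return False
    # Read-after-write: the second reads a register the first writes.
    if first.writes & second.reads:
        return False
    # Write-after-write: both write the same register.
    if first.writes & second.writes:
        return False
    return True


if __name__ == "__main__":
    load = Instr("mov eax, [ebx]", writes=frozenset({"eax"}), reads=frozenset({"ebx"}))
    inc = Instr("add ecx, 1", writes=frozenset({"ecx"}), reads=frozenset({"ecx"}),
                has_immediate=True)
    use = Instr("add edx, eax", writes=frozenset({"edx"}), reads=frozenset({"edx", "eax"}))
    print(can_pair(load, inc))  # True: no shared registers
    print(can_pair(load, use))  # False: `use` reads eax, which `load` writes
```

When `can_pair` returns False, only the first instruction issues in that cycle and the second waits for the next one, which is why the pipelines sometimes run a single instruction at a time.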
Consequently, the pipelines will sometimes execute only one instruction at a time, despite
the processor's superscalar capability.
If two paired instructions are executing concurrently in the pipeline and one of them stalls
because of a hazard, the other one stalls as well, since paired instructions advance through
the pipeline together.
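A rough way to see the cost of this coupling is a Python sketch that looks only at the execute stage. The per-instruction cycle counts are hypothetical inputs (1 means no stall), and the model assumes that a pair leaves the stage together, so the instruction that finishes first simply waits for its partner.

```python
def lockstep_trace(u_cycles: int, v_cycles: int) -> list[tuple[str, str]]:
    """Per-cycle status of a U/V pair while it occupies the execute stage.

    u_cycles and v_cycles are hypothetical counts of cycles each instruction
    needs in EX (1 means no stall). The pair leaves EX together, so the
    instruction that finishes first waits for its partner.
    """
    trace = []
    for cycle in range(max(u_cycles, v_cycles)):
        u_status = "exec" if cycle < u_cycles else "wait"
        v_status = "exec" if cycle < v_cycles else "wait"
        trace.append((u_status, v_status))
    return trace


def total_ex_cycles(groups: list[tuple[int, ...]]) -> int:
    """Cycles spent in EX by a sequence of issue groups.

    Each group is either a pair (U, V) or a single unpaired instruction;
    a group holds the stage until its slowest member is done.
    """
    return sum(max(group) for group in groups)


if __name__ == "__main__":
    for cycle, (u, v) in enumerate(lockstep_trace(3, 1), start=1):
        print(f"cycle {cycle}: U={u} V={v}")
    # An unstalled pair, a pair whose U instruction stalls for two extra
    # cycles, and one instruction that could not be paired.
    print(total_ex_cycles([(1, 1), (3, 1), (1,)]))
```

In the trace, the V instruction needs only one cycle but occupies the stage for three, because its U partner is stalled.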
Branch Prediction
The prediction of whether or not a jump will occur is based on the branch’s previous behavior.
There are four possible states that describe a branch’s tendency to jump: strongly taken, weakly taken,
weakly not taken, and strongly not taken.
When a branch has its address in the branch target buffer, its
behavior is tracked.
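The four states behave like a 2-bit saturating counter. In the Python sketch below, a dictionary keyed by branch address stands in for the branch target buffer. The state numbering, the initial "weakly taken" state, and the choice to predict "not taken" for untracked branches are assumptions for illustration; the buffer's real size and replacement policy are not modeled.

```python
# States of a 2-bit saturating counter (assumed numbering).
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


class TwoBitPredictor:
    """Sketch: one 2-bit saturating counter per branch address.

    The dict stands in for the branch target buffer; its size and
    replacement policy are not modeled.
    """

    def __init__(self, initial_state: int = WEAK_TAKEN) -> None:
        self.btb: dict[int, int] = {}
        self.initial_state = initial_state

    def predict(self, address: int) -> bool:
        """Predict taken (True) or not taken (False)."""
        state = self.btb.get(address)
        if state is None:
            return False  # assumed: untracked branches are predicted not taken
        return state >= WEAK_TAKEN

    def update(self, address: int, taken: bool) -> None:
        """Move one step toward the real outcome, saturating at both ends."""
        state = self.btb.get(address, self.initial_state)
        if taken:
            state = min(state + 1, STRONG_TAKEN)
        else:
            state = max(state - 1, STRONG_NOT_TAKEN)
        self.btb[address] = state


if __name__ == "__main__":
    predictor = TwoBitPredictor()
    branch = 0x401000  # hypothetical branch address
    for taken in [True, True, False, True]:
        print(predictor.predict(branch), taken)
        predictor.update(branch, taken)
```

Because the counter saturates, a branch in the strongly taken state needs two consecutive not-taken outcomes before its prediction flips, so one unusual outcome does not change the prediction.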
The Intel Pentium branch prediction algorithm is indeed better than a 50% guess, but it has
limitations.
To increase the accuracy of branch prediction, the processors that followed the Pentium
adopted a different algorithm.
Some branches, such as those that control loops, follow repetitive patterns that need to be recognized.
A 2-bit counter only records how strongly a branch tends to jump, not the order of its past outcomes,
so it cannot learn such patterns.
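This limitation can be made concrete with a short Python sketch that feeds outcome sequences to a single 2-bit saturating counter (states 0 and 1 predict not taken, 2 and 3 predict taken; the numbering and the default starting state are assumptions). The loop branch used here jumps three times and then falls through, over and over.

```python
def accuracy_2bit(outcomes: list[bool], initial_state: int = 2) -> float:
    """Fraction of outcomes predicted correctly by one 2-bit saturating counter.

    States 0-1 predict not taken, 2-3 predict taken (assumed numbering).
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [True, True, True, False] * 20  # jumps 3 times, then exits
    alternating = [True, False] * 10
    print(accuracy_2bit(loop_branch))
    print(accuracy_2bit(alternating))
    print(accuracy_2bit(alternating, initial_state=1))
```

The counter stays in the taken states and mispredicts every loop exit (75% accuracy). For a branch that strictly alternates, it reaches only 50% from the weakly taken state and 0% from the weakly not taken state, even though the pattern is perfectly regular.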
Later-generation processors, such as the Pentium MMX, Pentium Pro, and Pentium II, use a different
mechanism for branch prediction.
A 4-bit register records the branch’s behavior over its last four executions. A value of 0001, for
example, means that the branch jumped only on the most recent of those four executions.
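Recording a new outcome in the 4-bit register amounts to a shift: the oldest outcome drops out and the newest one enters. The Python sketch below assumes, consistent with the 0001 example, that the least significant bit holds the most recent outcome and that 1 means the branch jumped.

```python
HISTORY_BITS = 4
MASK = (1 << HISTORY_BITS) - 1  # 0b1111


def update_history(history: int, taken: bool) -> int:
    """Shift the newest outcome into the least significant bit, dropping the oldest."""
    return ((history << 1) | int(taken)) & MASK


if __name__ == "__main__":
    history = 0
    for taken in [False, False, False, True]:
        history = update_history(history, taken)
    print(f"{history:04b}")  # 0001: only the most recent execution jumped
```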
A 4-bit register would not be of much use without additional logic. Alongside it, there are sixteen
2-bit counters like the ones shown previously, one for each of the 16 possible values of the 4-bit register.
Branch Prediction (in later Pentium Models)
Therefore, by combining a 4-bit register that records past behavior with 16 individually updated
2-bit counters, and letting the register's current value select which counter makes the prediction,
we end up with a much stronger prediction mechanism, which is used in the Pentium MMX, Pentium II, and others.
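Putting the two pieces together gives a two-level adaptive predictor. The Python sketch below models a single branch only; the starting history of 0000 and the initial "weakly taken" counters are assumptions, and the way the real processors organize history and counters inside the branch target buffer is not modeled.

```python
class TwoLevelPredictor:
    """Two-level adaptive predictor for one branch (simplified sketch).

    Level 1: a 4-bit register holding the last four outcomes (1 = jumped).
    Level 2: sixteen 2-bit saturating counters, one per history value.
    """

    HISTORY_BITS = 4

    def __init__(self) -> None:
        self.history = 0  # assumed starting history: 0000
        # 2 = weakly taken (assumed initial state); 0-1 predict not taken, 2-3 taken.
        self.counters = [2] * (1 << self.HISTORY_BITS)

    def predict(self) -> bool:
        # The current history selects which counter makes the prediction.
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        # Train only the counter that was just used, then shift the outcome in.
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of outcomes a fresh predictor gets right."""
    predictor = TwoLevelPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [True, True, True, False] * 20  # jumps 3 times, then exits
    print(accuracy(loop_branch))
```

For a loop branch that jumps three times and then falls through, repeated 20 times, this sketch mispredicts only the first loop exit and then gets every outcome right (79 of 80), whereas a single 2-bit counter keeps mispredicting every exit. The gain comes from each history value training its own counter: the history 0111 learns "not taken" while the other histories in the cycle learn "taken".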
Newer Generation Chips
The Pentium MMX includes new instructions, registers, and data types aimed at
speeding up multimedia computations.
Since multimedia workloads manipulate large amounts of data, SIMD (single instruction, multiple
data) instructions were added as part of the MMX instruction set. A SIMD instruction operates on
multiple data values at once, which maximizes the amount of work done by each instruction.
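MMX registers are 64 bits wide and can hold, for example, eight packed bytes. The Python sketch below only emulates the semantics of two MMX byte additions on such a value: `PADDB`, which wraps around modulo 256 in each lane, and `PADDUSB`, which saturates at 255 (useful for pixel values, where overflow should clip rather than wrap). The helper names are hypothetical, and no real MMX instructions are issued; the wraparound version uses a bit-twiddling trick so that all eight lanes are added by one integer addition.

```python
# Emulates two MMX packed-byte additions on a 64-bit value viewed as eight
# unsigned 8-bit lanes (lane 0 = least significant byte).
LANES = 8
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of every lane
HIGH = 0x8080808080808080  # top bit of every lane


def pack_bytes(values: list[int]) -> int:
    """Pack eight unsigned bytes into one 64-bit integer."""
    if len(values) != LANES or not all(0 <= v <= 0xFF for v in values):
        raise ValueError("expected eight values in 0..255")
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_bytes(word: int) -> list[int]:
    """Split a 64-bit integer back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def paddb_emulated(a: int, b: int) -> int:
    """Wraparound add of all eight lanes at once (semantics of PADDB).

    Adding only the low 7 bits of each lane can never carry into the next
    lane; the top bit of each lane is then fixed up with XOR, and any carry
    out of a lane is discarded, giving addition modulo 256 per lane.
    """
    low_sum = (a & LOW7) + (b & LOW7)
    return low_sum ^ ((a ^ b) & HIGH)


def paddusb_emulated(a: int, b: int) -> int:
    """Unsigned saturating add per lane (semantics of PADDUSB): clamps at 255."""
    lanes = [min(x + y, 0xFF) for x, y in zip(unpack_bytes(a), unpack_bytes(b))]
    return pack_bytes(lanes)


if __name__ == "__main__":
    a = pack_bytes([250, 1, 2, 3, 4, 5, 6, 200])
    b = pack_bytes([10, 1, 1, 1, 1, 1, 1, 100])
    print(unpack_bytes(paddb_emulated(a, b)))    # [4, 2, 3, 4, 5, 6, 7, 44]
    print(unpack_bytes(paddusb_emulated(a, b)))  # [255, 2, 3, 4, 5, 6, 7, 255]
```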
The improved multimedia support of the MMX, along with lower power consumption, larger
caches, and new branch prediction mechanisms, paved the way for the next generations of
Pentium processors (the Pentium II and Pentium III).