
Data Hazards

We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.

Hazard occurs:
A <- 3 + A
B <- 4 x A

No hazard:
A <- 5 x C
B <- 20 + C

When two operations depend on each other, they must be executed sequentially in the correct order. Another example:
Mul R2, R3, R4
Add R5, R4, R6

[Figure 8.6 timing diagram: instruction I1 (Mul) proceeds F1, D1, E1, W1. Instruction I2 (Add) is held in its decode stage (D2) until W1 completes, then continues with E2 and W2. Instructions I3 and I4 (F3 D3 E3 W3, F4 D4 E4 W4) follow behind, delayed by the same stall, over clock cycles 1 through 9.]

Figure 8.6. Pipeline stalled by data dependency between D2 and W1.
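The dependence between Mul and Add above can be checked mechanically. This is a minimal sketch of read-after-write (RAW) hazard detection; the tuple format (op, src1, src2, dest) is an assumption chosen to match the "Mul R2,R3,R4" style used in the example (destination register last).

```python
def has_raw_hazard(first, second):
    """True if `second` reads the register that `first` writes."""
    first_dest = first[3]                      # destination of the earlier instruction
    second_sources = (second[1], second[2])    # source registers of the later one
    return first_dest in second_sources

mul = ("Mul", "R2", "R3", "R4")   # R4 <- R2 * R3
add = ("Add", "R5", "R4", "R6")   # R6 <- R5 + R4, reads R4
print(has_raw_hazard(mul, add))   # True: Add depends on Mul's result
```

A pipelined processor performs essentially this comparison in hardware to decide whether the decode stage must stall.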


Operand Forwarding
Instead of reading it from the register file, the second instruction can get the data directly from the output of the ALU as soon as the previous instruction has computed it. A special hardware arrangement is needed to forward the output of the ALU back to the input of the ALU.
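The forwarding decision can be sketched in a few lines. This is an illustrative model, not the actual hardware: when a source register matches the destination of the instruction still in the execute stage, the operand is taken from the ALU output latch instead of the register file.

```python
def read_operand(reg, regfile, ex_dest, alu_out):
    """Return the operand value, using the forwarding path when needed."""
    if reg == ex_dest:        # forwarding path: ALU output -> ALU input
        return alu_out
    return regfile[reg]       # normal path: register file

regfile = {"R4": 0, "R5": 7}  # R4 not yet written back
alu_out, ex_dest = 40, "R4"   # Mul just produced 40 for R4 in the EX stage
a = read_operand("R4", regfile, ex_dest, alu_out)   # forwarded: 40
b = read_operand("R5", regfile, ex_dest, alu_out)   # from register file: 7
print(a + b)   # 47
```

Without the forwarding path, `read_operand("R4", ...)` would return the stale value 0 and the pipeline would have to stall instead.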

Data Hazards

ADD R1, R2, R3   IF  ID  EX  WB
SUB R4, R1, R5       IF  ID  EX  WB

For ADD: the ID stage selects R2 and R3 for the ALU operation, EX adds R2 and R3, and WB stores the sum in R1. SUB selects R1 and R5 for its ALU operation in its own ID stage, before ADD has written R1 back.

Stalling
Stalling involves halting the flow of instructions until the required result is ready to be used. However, stalling wastes processor time, since the processor does nothing while waiting for the result.

Cont.

ADD R1, R2, R3   IF  ID  EX  M   WB
STALL
STALL
STALL
SUB R4, R1, R5                   IF  ID  EX  WB

SUB cannot read R1 until ADD's WB stage has written it back to the register file, so three stall cycles are inserted before SUB can proceed.
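The three stall cycles above can be derived from the stage positions. This is a simplified model (an assumption, not the text's own formulation): the consumer's ID stage must come in a cycle after the producer's WB stage has finished writing.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]

def stall_cycles(producer_issue, consumer_issue):
    """Stall cycles needed when the consumer reads a register the producer writes."""
    wb_done = producer_issue + STAGES.index("WB")    # cycle in which WB completes
    id_wanted = consumer_issue + STAGES.index("ID")  # cycle in which ID would read
    # ID can only read in the cycle after WB has written:
    return max(0, (wb_done + 1) - id_wanted)

# ADD issues in cycle 1, SUB immediately after in cycle 2:
print(stall_cycles(1, 2))   # 3, matching the three STALL rows above
```

With forwarding hardware, the same function would compare against the EX stage instead of WB, reducing or eliminating the stalls.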

Types of Pipelining
Software Pipelining
- Can handle complex instructions
- Allows programs to be reused

Hardware Pipelining
- Helps the designer manage complexity: a complex task can be divided into smaller, more manageable pieces
- Offers higher performance
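The performance advantage of hardware pipelining follows from a standard textbook idealization: with k stages and n instructions, an ideal pipeline needs k + (n - 1) cycles, versus n * k cycles without pipelining. (Real pipelines lose some of this to hazards; the numbers below are illustrative.)

```python
def unpipelined_cycles(n, k):
    """Cycles to run n instructions, k cycles each, with no overlap."""
    return n * k

def pipelined_cycles(n, k):
    """Cycles for an ideal k-stage pipeline: fill once, then one result per cycle."""
    return k + (n - 1)

n, k = 100, 5
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
print(round(speedup, 2))   # 4.81, approaching the stage count k = 5 as n grows
```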

Types of Hardware Pipelines

Instruction Pipeline - An instruction pipeline is very similar to a manufacturing assembly line.
Data Pipeline - A data pipeline is designed to pass data from stage to stage.

Instruction Pipeline Conflicts

Conflicts are divided into two categories:
- Data conflicts: the current instruction changes a register that the next instruction needs.
- Branch conflicts: the current instruction makes a jump.

Control Hazards

Overview
- Branch Penalties
- Branch Prediction

Branch Penalties
Branches are a major problem because you do not know which instruction will come next until the branch instruction is executed. The time to fill the pipeline again is known as the branch penalty. The larger the number of stages in the pipeline, the larger the potential branch penalty.

Branch Penalties Cont..

Instruction 3 is a conditional branch to instruction 20, which is taken. As soon as it is executed in step 6, the pipeline is flushed (instruction 3 is able to complete) and instructions starting at #20 are loaded into the pipeline. Note that no instructions are completed during cycles 7 through 10.
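The cost of such flushes shows up in the average cycles per instruction. This is a rough sketch with illustrative numbers (the fractions below are assumptions, not from the text); the 4-cycle penalty matches the four empty cycles (7 through 10) in the example above.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles):
    """Average cycles per instruction for a base CPI of 1 plus branch flushes."""
    return 1 + branch_fraction * taken_fraction * penalty_cycles

# Assume 20% of instructions are branches and 60% of those are taken,
# each taken branch flushing 4 cycles of work:
print(effective_cpi(0.20, 0.60, 4))   # 1.48
```

A nominally one-instruction-per-cycle pipeline thus loses almost half a cycle per instruction to branches, which is why branch prediction matters.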

Branch Prediction: Improving Branch Performance

Static Prediction
With static branch prediction, the instruction loaded depends on the design of the pipeline. If the prediction is correct, there is no branch penalty. A compiler has to be aware of the type of prediction used on the machine in order to optimize the machine code properly.
- Never taken: the prediction is that the branch will not be taken, so the next instructions in memory are loaded into the pipeline.
- Always taken: the prediction is that the branch will be taken, so the next instruction loaded is at the branch destination.
- Code prediction: the processor determines which prediction to use based on the instruction.

Delayed Branch
The instructions in the delay slots are always fetched. Therefore, we would like to arrange for them to be fully executed whether or not the branch is taken. The objective is to place useful instructions in these slots. The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.

Delayed Branch

(a) Original program loop:
LOOP  Shift_left  R1
      Decrement   R2
      Branch=0    LOOP
NEXT  Add         R1,R3

(b) Reordered instructions:
LOOP  Decrement   R2
      Branch=0    LOOP
      Shift_left  R1
NEXT  Add         R1,R3

Figure 8.12. Reordering of instructions for a delayed branch.

Delayed Branch

[Figure 8.13 timing diagram, clock cycles 1 through 8: on each pass through the loop, Decrement, Branch, and Shift (in the delay slot) are fetched and executed in sequence; on the final pass the branch is not taken, so Add executes after the delay-slot Shift.]

Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12.


Branch Prediction Cont..

Better performance can be achieved if we arrange for some branch instructions to be predicted as taken and others as not taken:
- Use hardware to observe whether the target address is lower or higher than that of the branch instruction.
- Let the compiler include a branch prediction bit.
So far, the branch prediction decision is always the same every time a given instruction is executed: this is static branch prediction.
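The hardware heuristic mentioned above ("backward taken, forward not taken") can be sketched directly; the addresses below are illustrative. A branch whose target is lower than its own address is usually the bottom of a loop, which is taken on every iteration but the last.

```python
def predict_taken(branch_addr, target_addr):
    """Predict taken for backward branches, not taken for forward ones."""
    return target_addr < branch_addr

print(predict_taken(0x1040, 0x1000))   # True: backward, loop-closing branch
print(predict_taken(0x1040, 0x1080))   # False: forward branch, e.g. an if-skip
```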

Influence on Instruction Sets

Overview
Some instructions are much better suited to pipelined execution than others:
- Addressing modes
- Condition code flags

Addressing Modes
Addressing modes include simple ones and complex ones. In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline:
- Side effects
- The extent to which complex addressing modes cause the pipeline to stall
- Whether a given mode is likely to be used by compilers

Addressing Modes
In a pipelined processor, complex addressing modes do not necessarily lead to faster execution.
- Advantage: fewer instructions and less program space
- Disadvantages: cause the pipeline to stall; require more hardware to decode; not convenient for compilers to work with
Conclusion: complex addressing modes are not suitable for pipelined execution.

Addressing Modes
Good addressing modes should satisfy the following:
- Access to an operand does not require more than one access to memory
- Only load and store instructions access memory operands
- The addressing modes used do not have side effects

Modes that meet these criteria: register, register indirect, and index.

Condition Codes
If an optimizing compiler attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies between successive instructions occur, it must ensure that the reordering does not change the outcome of a computation. The dependency introduced by the condition-code flags reduces the flexibility available to the compiler to reorder instructions.

Datapath and Control Considerations

Original Design
[Figure 7.8: three-bus organization of the datapath. The PC, incrementer, register file, constant 4, MUX, ALU (inputs A and B, output R), instruction decoder, IR, MDR, and MAR are connected by buses A, B, and C to the memory bus address and data lines.]
Figure 7.8. Three-bus organization of the datapath.

Pipelined Design
[Figure 8.18: the datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU, separate instruction and data caches, IMAR for instruction fetches and DMAR for data access, separate MDR/Read and MDR/Write registers, an instruction queue, and a control signal pipeline.]
Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU.

Changes in the pipelined design:
- Separate instruction and data caches
- PC is connected to IMAR
- DMAR
- Separate MDR registers for read and write
- Buffers for the ALU
- Instruction queue
- Instruction decoder output buffer

Operations that can proceed independently in the same cycle:
- Reading an instruction from the instruction cache
- Incrementing the PC
- Decoding an instruction
- Reading from or writing into the data cache
- Reading the contents of up to two registers
- Writing into one register in the register file
- Performing an ALU operation