You are on page 1of 5

Branch Prediction Schemes

There are many methods to deal with the pipeline stalls caused by branch delay. We discuss four simple compile-time schemes in which predictions are static - they are fixed for each branch during the entire execution, and the predictions are compile-time guesses. Stall pipeline Predict taken Predict not taken Delayed branch

Stall pipeline
The simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. Advantage: simple both to software and hardware (solution described earlier)

Predict Not Taken
A higher performance, and only slightly more complex, scheme is to predict the branch as not taken, simply allowing the hardware to continue as if the branch were not executed. Care must be taken not to change the machine state until the branch outcome is definitely known. The complexity arises from: we have to know when the state might be changed by an instruction; we have to know how to "back out" a change. The pipeline with this scheme implemented behaves as shown below:

Untaken Branch Instr Instr i+1 Instr i+2

IF

ID IF

EX ID IF

MEM EX ID MEM idle ID

WB MEM EX WB idle EX idle MEM WB WB MEM WB

Taken Branch Instr IF Instr i+1 Branch target

ID IF

EX idle IF

determined during ID. . we have fetched the fall-through and just continue. which has one branch delay slot is shown below: Untaken branch instr Branch delay instr(i+1) Instr i+2 Instr i+3 Instr i+4 Taken branch instr Branch delay instr(i+1) Branch target Branch target+1 IF ID EX MEM WB IF ID EX IF ID IF MEM WB EX ID IF IF ID EX MEM WB IF ID EX IF ID IF MEM WB EX ID MEM WB EX MEM WB MEM WB EX ID MEM WB EX MEM WB . As soon as the branch is decoded and the target address is computed. This causes all instructions following the branch to stall one clock cycle. These instructions are executed whether or not the branch is taken. we assume the branch to be taken and begin fetching and executing at the target address. the execution cycle with a branch delay of length n is Branch instr sequential successor 1 sequential successor 2 . In some machines where the target address is known before the branch outcome a predict-taken scheme might make sense. Predict Taken An alternative scheme is to predict the branch as taken. we restart the fetch at the branch target. Delayed Branch In a delayed branch. The pipeline behavior of the DLX pipeline. sequential successor n Branch target if taken Sequential successors are in the branch-delay slots.Branch target+1 IF ID EX MEM WB When branch is not taken. If the branch is taken during ID. . there is no advantage in this approach. Because in DLX pipeline the target address is not known any earlier than the branch outcome. . .

We will show three branch-scheduling schemes: From before branch From target From fall through Scheduling the branch-delay slot. usually the target instruction will need to be copied because it can be reached by another path. The left box in each pair shows the code before scheduling. In (b) the branch-delay slot is scheduled from the target of the branch. In (c) the branch-delay slot is scheduled from the not-taken fall through To make this optimazation legal for (b) and (c). the use of R1 in the branch condition prevents the ADD instruction (whose destination is R1) from being moved after the branch.Branch target+2 IF ID EX MEM WB The job of the compiler is to make the successor instructions valid and useful. the right box shows the scheduled code In (a) the delay slot is scheduled with an independent instruction from before the branch. it must be OK to execute the SUB instruction when the branch goes in the . This is the best choice. Strategies (b) and (c) are used when (a) is not possible. In the code sequences for (b) and (c).

most machines with conditional branches have introduced a cancelling branch. .if the branch is incorrectly predicted. May enlarge program if instructions are duplicated From before Branch must not depend on the rescheduled branch instructions From target From fall though Must be OK to execute rescheduled instructions if branch is not taken Must be OK to execute instructions if branch When branch is not taken is taken The limitations on delayed-branch scheduling arise from the restrictions on the instructions that are scheduled into the delay slots and our ability to predict at compile time whether a branch is likely to be taken or not. Cancelling Branch To improve the ability of the compiler to fill branch delay slots. The behavior of a predicted-taken cancelling branch depends on whether the branch is taken or not: Untaken branch instr Branch delay instr(i+1) Instr i+2 Instr i+3 Instr i+4 IF ID EX MEM IF ID idle IF ID IF WB idle idle EX ID IF MEM EX ID WB MEM EX WB MEM WB . Scheduling strategy Requirements When Improves Performance Always When branch is taken.g unexpected direction.if the branch behaves as predicted. In a cancelling branch the instruction includes the direction that the branch was predicted. the instruction in the branch delay slot is fully executed. the instruction in the delay slot is turned into no-op(idle). OK means that the work might be wasted but the program will still execute correctly. .

This is the source both of their advantage . . and their disadvantage . Delayed branches are an architecturally visible feature of the pipeline.Taken branch instr Branch delay instr(i+1) Branch target Branch target+1 Branch target+2 IF ID EX MEM WB IF ID EX IF ID IF MEM WB EX ID IF MEM WB EX ID MEM WB EX MEM WB The advantage of cancelling branches is that they eliminate the requirements on the instruction placed in the delay slot.allowing the use of simple compiler scheduling to reduce branch penalties.exposing an aspect of the implementation that is likely to change.