Until now, we have assumed that there is no overlap in the execution of the basic steps of successive instructions. A substantial improvement in performance can be achieved by overlapping the execution of these basic steps, a technique called pipelining.
Example: Suppose that we want to perform the combined multiply and add operation Ai * Bi + Ci on a stream of numbers, for i = 1, 2, 3, ..., 7.
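Each suboperation is implemented in a segment within the pipeline. The suboperations performed in each segment of the pipeline are as follows:
R1 ← Ai, R2 ← Bi             Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci        Multiply and input Ci
R5 ← R3 + R4                 Add Ci to product
[Figure: Ai and Bi are loaded into R1 and R2, which feed the multiplier; the product goes to R3 and Ci to R4, which feed the adder; the sum goes to R5.]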
Clock pulse   Segment 1      Segment 2           Segment 3
number        R1    R2       R3         R4       R5
1             A1    B1       -          -        -
2             A2    B2       A1 * B1    C1       -
3             A3    B3       A2 * B2    C2       A1 * B1 + C1
4             A4    B4       A3 * B3    C3       A2 * B2 + C2
5             A5    B5       A4 * B4    C4       A3 * B3 + C3
6             A6    B6       A5 * B5    C5       A4 * B4 + C4
7             A7    B7       A6 * B6    C6       A5 * B5 + C5
8             -     -        A7 * B7    C7       A6 * B6 + C6
9             -     -        -          -        A7 * B7 + C7
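The register contents for this example can be reproduced with a small simulation. The following is only a sketch: the function name simulate_pipeline and the use of None for a register that holds no valid data are assumptions made for illustration, not part of the original notes.

```python
def simulate_pipeline(a, b, c):
    """Simulate the three-segment pipeline computing a[i] * b[i] + c[i].

    Returns one row per clock pulse: (pulse, R1, R2, R3, R4, R5).
    None marks a register that holds no valid data (hypothetical convention).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    rows = []
    # With 3 segments, n tasks need n + 2 clock pulses.
    for pulse in range(1, n + 3):
        # All registers load on the same clock edge, so the new values
        # are computed from the old register contents first.
        new_r5 = r3 + r4 if r3 is not None else None
        new_r3 = r1 * r2 if r1 is not None else None
        new_r4 = c[pulse - 2] if 2 <= pulse <= n + 1 else None
        new_r1 = a[pulse - 1] if pulse <= n else None
        new_r2 = b[pulse - 1] if pulse <= n else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        rows.append((pulse, r1, r2, r3, r4, r5))
    return rows


# Example with three tasks instead of seven, to keep the output short.
for row in simulate_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    print(row)
```

The first result appears in R5 at clock pulse 3, and one further result appears on every pulse after that, which matches the layout of the register table.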
Content of registers in the pipeline example

Suppose that n tasks are to be executed on a k-segment pipeline with a clock cycle time of tp. The first task T1 requires a time equal to k tp to complete its operation, since it must pass through all k segments of the pipe. The remaining n - 1 tasks emerge from the pipe at the rate of one task per clock cycle, and they are completed after a time equal to (n - 1) tp. Therefore, completing n tasks using a k-segment pipeline requires k + (n - 1) clock cycles.
SPEEDUP
The speedup of pipelined processing over equivalent non-pipelined processing is defined by the ratio

\[ S = \frac{n t_n}{(k + n - 1) t_p} \]

where tn is the time needed to complete one task in the non-pipelined unit. As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1 approaches the value of n. Under this condition, the speedup becomes

\[ S = \frac{t_n}{t_p} \]

If we assume that the time it takes to process a task is the same in the pipeline and non-pipeline circuits, we will have tn = k tp. Including this assumption, the speedup reduces to

\[ S = \frac{k t_p}{t_p} = k \]

This shows that the theoretical maximum speedup that a pipeline can provide is k, where k is the number of segments in the pipeline.

When a finite number of tasks are executed in a pipeline, the space-time diagram clearly shows the pipeline start-up region, where not all stages are yet fully utilized, and the pipeline drainage region, composed of stages that have become idle because the last task has left them. If a pipeline executes many tasks, the overhead of start-up and drainage can be ignored and the effective throughput of the pipeline taken to be 1 task per cycle. In terms of instruction execution, this corresponds to an effective CPI (cycles per instruction) of 1. If the 5-stage pipeline has to be drained after executing 7 instructions (as in Figure 2.b), then 7 instructions are executed in 5 + 7 - 1 = 11 cycles, making the effective CPI equal to 11/7 = 1.57.
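The cycle count, speedup and effective CPI formulas can be checked numerically. This is a minimal sketch; the function names pipeline_cycles, speedup and effective_cpi are chosen here for illustration only.

```python
def pipeline_cycles(k, n):
    """Clock cycles needed to complete n tasks on a k-segment pipeline."""
    return k + n - 1


def speedup(k, n, tn, tp):
    """Speedup S = n * tn / ((k + n - 1) * tp) over a non-pipelined unit."""
    return (n * tn) / (pipeline_cycles(k, n) * tp)


def effective_cpi(k, n):
    """Cycles per instruction when the pipeline is drained after n instructions."""
    return pipeline_cycles(k, n) / n


# The 5-stage pipeline drained after 7 instructions, as in the text.
print(pipeline_cycles(5, 7))
print(round(effective_cpi(5, 7), 2))

# With tn = k * tp, the speedup approaches k as n grows.
print(speedup(4, 10, 80, 20))
print(speedup(4, 1_000_000, 80, 20))
```

For small n the speedup stays well below k because of the start-up and drainage overhead; only for very long task streams does it come close to the theoretical maximum.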
For example: There are 4 segments with delays t1 = 60 ns, t2 = 70 ns, t3 = 100 ns and t4 = 80 ns, and the interface registers have a delay of tr = 10 ns.
Using pipeline: The clock cycle is chosen to be tp = t3 + tr = 100 + 10 = 110 ns, since t3 is the longest segment delay.
Non-pipeline: tn = t1 + t2 + t3 + t4 + tr = 60 + 70 + 100 + 80 + 10 = 320 ns.
The pipelined adder has a speedup of 320/110 = 2.9.
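The same arithmetic can be written as a short calculation. This is a sketch that assumes, as the example does, that the clock cycle is set by the slowest segment plus the register delay; the function names are illustrative.

```python
def clock_cycle(segment_delays, register_delay):
    """Pipeline clock cycle: the slowest segment plus the interface register delay."""
    return max(segment_delays) + register_delay


def non_pipelined_time(segment_delays, register_delay):
    """Time per task without pipelining: all segments in sequence plus one register."""
    return sum(segment_delays) + register_delay


delays_ns = [60, 70, 100, 80]  # t1..t4 from the example
tr_ns = 10

tp = clock_cycle(delays_ns, tr_ns)
tn = non_pipelined_time(delays_ns, tr_ns)
print(tp, tn, round(tn / tp, 1))
```

Note that the speedup is below the segment count of 4 because the segments are unbalanced: every segment is clocked at the pace of the slowest one.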
There are 3 major difficulties that cause the instruction pipeline to deviate from its normal operation:
1. Resource conflict
2. Data dependency
3. Branch difficulties

Resource conflict: Resource conflicts are caused by access to memory by 2 segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
Data dependency: A data dependency conflict arises when an instruction depends on the result of a previous instruction, but this result is not yet available.
Branch difficulties: Branch difficulties arise from branch and other instructions that change the value of the PC.
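Example:
[Figure: space-time diagram of a four-segment instruction pipeline with segments FI, DA, FO and EX, showing instructions 1 to 5; slots in which a segment is stalled are marked "-".]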
In a pipelined system, instruction executions are overlapped. This means that one of the operations required for instructions I(k+1) or I(k+2) may be started and completed before instruction I(k) is completed. This difference from sequential execution can cause problems if it is not properly considered in the design of the control. The existence of such dependencies causes what is called a "hazard". Hazards must be detected and resolved so that the results produced by the machine match the programmer's expectations. The hardware technique that detects and resolves hazards is called an interlock. A hazard occurs whenever an object within the system (i.e. a register, flag or memory location) is accessed or modified by two separate instructions that are close enough in the program that they will be active simultaneously in the pipeline. In computer architecture, a hazard is therefore a potential problem that can happen in a pipelined processor.
Types of hazards:
1. Instruction hazards
2. Data hazards
3. Branching hazards
4. Structural hazards

1. INSTRUCTION HAZARDS:
An instruction RAW hazard occurs when the instruction to be initiated in the pipeline is fetched from a location that is yet to be updated by some uncompleted instruction in the pipeline. The instruction initiation must be suspended until the required change has occurred. To handle this hazard, a centralized controller is required to keep track of the addresses in the range sets of the instructions inside the pipeline. On every instruction fetch, the PC must then be compared against the addresses in the range sets held by the subsequent stages. A match with any of these addresses means there is an instruction RAW hazard, and the instruction fetch must be suspended until the instruction updating the object moves out of the point of hazard in the pipeline.
Types of hazards:
1. RAW (read after write) or true dependency hazard
2. WAR (write after read) or anti-dependency hazard
3. WAW (write after write) or output dependency hazard

We demonstrate the 3 hazards with the help of a sample program:
Instruction no.   Code
1.                STORE R4,A
2.                SUB   R3,A
3.                STORE R3,A
4.                ADD   R3,A
5.                STORE R3,A

RAW hazard: In the above code, the second instruction must use the value of A updated by the first instruction. If the second instruction (SUB) reads the value of A before instruction 1 has had a chance to update it, a wrong value will be used by the CPU. This situation is called a RAW hazard.

WAR hazard: A WAR hazard between 2 instructions i and j occurs when instruction j attempts to write onto some object that is being read by instruction i. A WAR hazard exists between instructions 2 and 3, since an attempt by instruction 3 to record a value in A before instruction 2 has read the value is clearly wrong.

WAW hazard: A WAW hazard between 2 instructions i and j occurs when instruction j attempts to write onto some object that is also required to be modified by instruction i. A WAW hazard exists between instructions 3 and 5, since an attempt by instruction 5 to store before the store of instruction 3 is clearly incorrect.

2. DATA HAZARDS:
Data hazards occur when data is modified. Ignoring potential data hazards can result in race conditions. There are 3 situations in which a data hazard can occur:

1. RAW (read after write): An operand is modified and read soon after.
Example:
Instr 1: R3 ← R1 + R2
Instr 2: R5 ← R3 + R2
The first instruction calculates a value to be saved in register R3, and the second uses this value to compute a result for register R5.

2. WAR (write after read): An operand is read and written soon after.
Example:
Instr 1: R1 ← R2 + R3
Instr 2: R3 ← R4 + R5
We must ensure that the second instruction does not store its result in register R3 before the first instruction has had a chance to fetch its operands.

3. WAW (write after write): Two instructions that write the same operand are performed.
Example:
Instr 1: R2 ← R1 + R3
Instr 2: R2 ← R4 + R5
We must delay the WB (write back) of instr 2 until the execution of instr 1 has completed.
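The three cases can be told apart by comparing the destination and source registers of two instructions. The following is a small sketch under the assumption that each instruction has exactly one destination register; the Instr type and the hazards function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str            # register written by the instruction
    sources: frozenset   # registers read by the instruction


def hazards(first, second):
    """Return the data hazards that `second` has with respect to the earlier `first`."""
    found = set()
    if first.dest in second.sources:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.sources:
        found.add("WAR")   # second writes what first reads
    if first.dest == second.dest:
        found.add("WAW")   # both write the same register
    return found


# The three register examples from the text.
raw = hazards(Instr("R3", frozenset({"R1", "R2"})), Instr("R5", frozenset({"R3", "R2"})))
war = hazards(Instr("R1", frozenset({"R2", "R3"})), Instr("R3", frozenset({"R4", "R5"})))
waw = hazards(Instr("R2", frozenset({"R1", "R3"})), Instr("R2", frozenset({"R4", "R5"})))
print(raw, war, waw)
```

A pair of instructions can exhibit more than one hazard at once, which is why the function returns a set rather than a single label.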
Eliminating hazards: There are several established techniques for preventing hazards:
1. Bubbling the pipeline (pipeline break or pipeline stall)
2. Eliminating data hazards
Pipeline stall (bubbles): Data dependency in pipelines (execution of one instruction depending on completion of a previous instruction) can cause pipeline stalls, which diminish performance. As an instruction is fetched, control logic determines whether a hazard could or will occur. If so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction is executed, the previous one will have had sufficient time to complete, and the hazard is prevented.

Bubble insertion: The first solution is for the assembler to detect this type of data dependency and insert three redundant but harmless instructions (such as adding 0 to a register or shifting a register by 0 bits). They perform no useful work and just take up memory locations to space out the data-dependent instructions. We say that the assembler inserts three bubbles in the pipeline to resolve a read-after-compute data dependency. Actually, inserting two bubbles might suffice if we note that writing into and reading out of a register each take 1 ns. Insertion of bubbles in a pipeline implies reduced throughput; inserting three bubbles hurts performance more than inserting two. Therefore, an objective of pipeline designers is to minimize the instances of bubble insertion. An intelligent compiler can also help mitigate the performance penalty.

A read-after-load data dependency is shown in Figure 4, where the third instruction uses the value that the second instruction loads into register $8. Without data forwarding, the third instruction must read register $8 in cycle 7, one cycle after the second instruction completes the load process. This implies the insertion of three bubbles into the pipeline to avoid using the wrong value in the third instruction. Note that data forwarding alone cannot solve the problem in this case. The value needed by the ALU in the third instruction becomes available at the end of cycle 5; so even if a data forwarding mechanism is available, we still need one bubble to ensure correct operation.
sw: store word, lw: load word
Data Forwarding: In Figure 3, the second instruction writes into register $8 and the fourth instruction needs the result of the third instruction in register $9. Note that writing into register $8 is completed in cycle 6; hence, reading of the new value from register $8 is possible beginning with cycle 7. The third instruction, however, reads out registers $8 and $2 in cycle 4 and will thus not get the intended value for register $8. This data dependency problem can be solved by bubble insertion or via data forwarding. Note that in Figure 3, even though the result of the second instruction is not yet stored in register $8 by the time the third instruction needs it, the result is in fact available at the output of the ALU. Thus, if a bypass path is provided from the output of the ALU to one of its inputs, the third instruction can use the result directly without waiting for it to be written into the register. This approach is known as data forwarding.
Forwarding involves feeding output data into a previous stage of the pipeline. For instance, let's say we want to write the value 3 to register 1, and then add 7 to register 1 and store the result in register 2, i.e.
Instr-0: register 1 = 6
Instr-1: register 1 = 3
Instr-2: register 2 = register 1 + 7 = 10
Following execution, register 2 should contain the value 10. However, if Instr-1 does not completely exit the pipeline before Instr-2 starts execution, register 1 does not yet contain the value 3 when Instr-2 performs its addition. In such an event, Instr-2 adds 7 to the old value of register 1 (6), and so register 2 would contain 13 instead.
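The effect of forwarding on this example can be illustrated with a toy model. This is only a sketch: it assumes that Instr-2 reads its operand while the result of Instr-1 is still waiting to be written back, and the function name run and the `pending` tuple are hypothetical.

```python
def run(forwarding):
    """Return the final value of register 2 with or without a forwarding path."""
    regs = {1: 6, 2: 0}   # Instr-0 has completed: register 1 = 6
    pending = (1, 3)      # Instr-1 result (register 1 = 3), not yet written back

    # Instr-2 fetches its operand while Instr-1 is still in the pipeline.
    dest, value = pending
    if forwarding and dest == 1:
        operand = value    # bypass: take the result straight from the pipeline
    else:
        operand = regs[1]  # stale read from the register file
    result = operand + 7

    regs[dest] = value     # Instr-1 finally writes back
    regs[2] = result       # Instr-2 writes back
    return regs[2]


print(run(forwarding=False))  # stale value of register 1 is used
print(run(forwarding=True))   # forwarded value of register 1 is used
```

Without forwarding the model yields the wrong result 13 described in the text; with the bypass enabled it yields the intended 10.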
Logic to detect hazards in operand fetches can be worked out as follows. From the definition of range and domain sets, we can devise a mechanism to keep track of the domain and range sets for the various instructions passing through the pipeline stages. Each stage is associated with one set of storage registers for the domain set addresses and another for the range set addresses. This storage is required for all the stages beyond the decode stage. When an instruction moves from stage to stage, it carries its range and domain set information with it. Let k be the stage beyond which a hazard can occur. For simplicity, assume that there is only one element in each domain set and each range set. Let Dk and Rk be the domain and range address registers associated with stage k. If a stage needs to detect any of the 3 hazards, it must compare its own domain and range registers with those of the other stages.

The comparators are connected as follows:
1. Stage j detects a RAW hazard by comparing Dj and Ri for all i > j > k
2. Stage j detects a WAR hazard by comparing Rj and Di for all i > j > k
3. Stage j detects a WAW hazard by comparing Rj and Ri for all i > j > k
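The three comparator rules can be sketched in software. This is an illustration under the single-element assumption stated above; the dictionaries `domain` and `rng`, which map a stage number to its D and R address registers, and the function name detect_hazards are hypothetical.

```python
def detect_hazards(domain, rng, j, k):
    """Compare stage j against every stage i with i > j > k.

    domain[s] and rng[s] hold the single domain (read) and range (write)
    address carried by stage s, or None if the stage is empty.
    """
    found = set()
    if j <= k:
        return found  # hazards are only checked beyond stage k
    for i in domain:
        if i <= j:
            continue
        if domain[j] is not None and domain[j] == rng[i]:
            found.add("RAW")  # Dj matches Ri
        if rng[j] is not None and rng[j] == domain[i]:
            found.add("WAR")  # Rj matches Di
        if rng[j] is not None and rng[j] == rng[i]:
            found.add("WAW")  # Rj matches Ri
    return found


# Stage 3 reads address A while stage 4 is about to write A.
domain = {3: "A", 4: "B"}
rng = {3: "C", 4: "A"}
print(detect_hazards(domain, rng, j=3, k=2))
```

In hardware each comparison would be a separate comparator working in parallel; the loop here only mirrors which register pairs are compared.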