
Cache and Pipeline

Prof. R. N. Biswas

Microprocessors & Microcontrollers


Improvement of Speed by Cache
• A cache is a high-speed memory interposed between the processor and the slower main memory, enabling faster access to data/code.
  - Primary or L1 cache is at the chip level.
  - Secondary or L2 cache is at the board level.
• Cache reduces access time by exploiting locality of reference:
  - It holds the more often used data/code.
  - It frees the external bus for other operations.
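Why locality pays off can be seen by counting hits for two typical access patterns. This is a minimal Python sketch under stated assumptions: a block size of 16 words and a cache large enough that no block is ever evicted. The name `count_hits` is hypothetical.

```python
WORDS_PER_BLOCK = 16  # assumed block size (16 words, matching a 4-bit word field)

def count_hits(addresses, words_per_block=WORDS_PER_BLOCK):
    """Count cache hits and misses, assuming a fetched block is never evicted."""
    cached_blocks = set()
    hits = misses = 0
    for addr in addresses:
        block = addr // words_per_block   # block that contains this word
        if block in cached_blocks:
            hits += 1
        else:
            misses += 1                   # whole block is brought in from main memory
            cached_blocks.add(block)
    return hits, misses

# Spatial locality: a sweep over 256 consecutive words touches only 16 blocks.
sweep = count_hits(range(256))
# Temporal locality: the same 32-word loop body executed 10 times.
loop = count_hits(list(range(1000, 1032)) * 10)
print(sweep, loop)   # (240, 16) (317, 3)
```

Only the first word of each block misses in the sweep. In the loop, only the first pass misses.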

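Direct-mapped Cache
[Figure: main memory blocks Block M0 to Block M4095 mapped onto cache blocks Block C0 to Block C127, each cache block stored with its tag.]
• Main memory block Mj can be placed only in cache block C(j mod 128). Block M0, Block M128, ..., Block M3968 all compete for Block C0, and Block M127, Block M255, ..., Block M4095 all compete for Block C127.
• The tag records which of these 32 memory blocks is present: Block M0 to M127 have tag = 0, Block M128 to M255 have tag = 1, ..., Block M3968 to M4095 have tag = 31.
• Structure of Main Memory Address (16 bits): Tag (5) | Block (7) | Word (4)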
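The address split and the single-tag comparison of a direct-mapped cache can be modelled in a few lines. This is a minimal Python sketch that assumes the 5/7/4-bit address layout of the slide. The class and function names are hypothetical, and valid bits are modelled as `None` tags.

```python
TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4   # Tag | Block | Word, as on the slide

def split_direct(addr: int):
    """Split a 16-bit main memory address into its (tag, block, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

class DirectMappedCache:
    def __init__(self):
        # one stored tag per cache block C0..C127; None marks an empty block
        self.tags = [None] * (1 << BLOCK_BITS)

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, the block replaces whatever was there."""
        tag, block, _ = split_direct(addr)
        if self.tags[block] == tag:
            return True
        self.tags[block] = tag
        return False

cache = DirectMappedCache()
print(split_direct(2048))   # (1, 0, 0): first word of M128 maps to C0 with tag 1
print([cache.access(a) for a in (0, 5, 2048, 0)])   # [False, True, False, False]
```

The last access misses even though the cache is almost empty. M0 and M128 both map to C0, so each one evicts the other.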

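Fully Associative Cache
[Figure: any main memory block, Block M0 to Block M4095, can be placed in any cache block, Block C0 to Block C127.]
• The tag is the full memory block number: Block M0 has tag = 0, Block M1 has tag = 1, ..., Block M4095 has tag = 4095.
• Structure of Main Memory Address (16 bits): Tag (12) | Word (4)
• On every access, the tag of the address must be compared with the tags of all cache blocks.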
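In a fully associative cache, any block can go anywhere, so a replacement policy is needed once the cache is full. The slides do not name one. The Python sketch below assumes least-recently-used (LRU) replacement, implemented with `collections.OrderedDict`. The class name is hypothetical.

```python
from collections import OrderedDict

WORD_BITS = 4        # Tag (12 bits) | Word (4 bits), as on the slide
CACHE_BLOCKS = 128   # cache blocks C0..C127

class FullyAssociativeCache:
    def __init__(self, n_blocks: int = CACHE_BLOCKS):
        self.n_blocks = n_blocks
        # stored tags, ordered from least to most recently used
        self.tags = OrderedDict()

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load the block, evicting the LRU one if full."""
        tag = addr >> WORD_BITS            # the whole 12-bit block number is the tag
        if tag in self.tags:               # hardware compares all tags in parallel
            self.tags.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.tags) == self.n_blocks:
            self.tags.popitem(last=False)  # evict the least recently used block
        self.tags[tag] = None
        return False

cache = FullyAssociativeCache()
print([cache.access(a) for a in (0, 2048, 0, 2048)])   # [False, False, True, True]
```

Unlike the direct-mapped case, M0 and M128 can now be resident at the same time. The cost is comparing against all 128 stored tags on every access.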

Set-associative Cache (4 blocks/set)
[Figure: cache blocks Block C0 to Block C127 grouped into 32 sets of four (Set 0 = Block C0 to C3, ..., Set 31 = Block C124 to C127), in front of main memory Block M0 to Block M4095.]
• Main memory block Mj can be placed in any of the four blocks of set (j mod 32). Its tag is j div 32, ranging from 0 to 127.
• Structure of Main Memory Address (16 bits): Tag (7) | Set (5) | Word (4)
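The set-associative cache combines the two previous schemes. The set field selects one set directly, and the tag is then compared only with the four tags in that set. This is a minimal Python sketch assuming the 7/5/4-bit layout above and, as an unstated assumption, LRU replacement within each set. All names are hypothetical.

```python
TAG_BITS, SET_BITS, WORD_BITS = 7, 5, 4   # Tag | Set | Word, as on the slide
WAYS = 4                                  # 4 blocks per set

def split_set_assoc(addr: int):
    """Split a 16-bit main memory address into its (tag, set, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word

class SetAssociativeCache:
    def __init__(self):
        # each of the 32 sets holds up to WAYS tags, least recently used first
        self.sets = [[] for _ in range(1 << SET_BITS)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load the block into its set."""
        tag, s, _ = split_set_assoc(addr)
        ways = self.sets[s]
        if tag in ways:          # compare against the tags of the selected set only
            ways.remove(tag)
            ways.append(tag)     # most recently used goes last
            return True
        if len(ways) == WAYS:
            ways.pop(0)          # set full: evict its LRU block (assumed policy)
        ways.append(tag)
        return False

cache = SetAssociativeCache()
# M0, M32, M64, M96 and M128 all map to Set 0 (tags 0, 1, 2, 3 and 4)
trace = (0, 512, 1024, 1536, 0, 2048, 512, 0)
print([cache.access(a) for a in trace])
```

Four blocks that share a set can now be resident together. Only a fifth block in the same set forces an eviction, which here removes M32, the least recently used.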

Cache Access and Update Sequence
• The CPU floats the memory address (places it on the address bus).
• The cache controller compares the tag field of the address with the tags in the selected set (a single block in a direct-mapped cache, all blocks in a fully associative one):
  - Cache hit: the cache is accessed.
  - Cache miss: main memory is accessed and the fetched contents are stored in the cache.
• A cache write requires main memory to be updated:
  - Write-through: memory is updated on every write.
  - Write-back: memory is updated only when the modified cache block is replaced by a new one from memory.
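The difference between the two write policies shows up in how often main memory is written. The Python sketch below reuses the direct-mapped organisation of 128 blocks of 16 words. It assumes a per-block "dirty" flag to mark modified data in write-back mode, and it assumes write-allocate, meaning a write miss loads the block, for both policies. The class name is hypothetical.

```python
class WritePolicyCache:
    """Direct-mapped cache that counts main-memory writes under either policy.

    memory_writes counts word writes for write-through and
    block write-backs for write-back.
    """
    def __init__(self, write_back: bool, n_blocks: int = 128, words_per_block: int = 16):
        self.write_back = write_back
        self.n_blocks = n_blocks
        self.words_per_block = words_per_block
        self.tags = [None] * n_blocks
        self.dirty = [False] * n_blocks      # assumed: one modified flag per block
        self.memory_writes = 0

    def write(self, addr: int) -> None:
        block_no = addr // self.words_per_block
        index, tag = block_no % self.n_blocks, block_no // self.n_blocks
        if self.tags[index] != tag:                   # miss: this block replaces the old one
            if self.write_back and self.dirty[index]:
                self.memory_writes += 1               # copy the modified block back first
            self.tags[index], self.dirty[index] = tag, False
        if self.write_back:
            self.dirty[index] = True                  # defer the memory update
        else:
            self.memory_writes += 1                   # write-through: update memory now

wt, wb = WritePolicyCache(write_back=False), WritePolicyCache(write_back=True)
for _ in range(100):          # 100 writes to the same word
    wt.write(0)
    wb.write(0)
print(wt.memory_writes, wb.memory_writes)   # 100 0
wt.write(2048)                # M128 replaces M0 in block C0
wb.write(2048)
print(wt.memory_writes, wb.memory_writes)   # 101 1
```

Write-back saves memory traffic when a block is written repeatedly. The trade-off is that main memory is stale until the block is replaced.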

Speed Improvement by Pipelining
• Processor speed can be enhanced by having separate hardware units for the different functional blocks, with buffers between the successive units.
• The number of unit operations into which the instruction cycle of a processor can be divided for this purpose defines the number of stages in the pipeline.
• A processor having an n-stage pipeline can have up to n instructions simultaneously being processed by the different functional units of the processor.
• Effective processor speed increases, ideally (all stages taking equal time and the pipeline never stalling), by a factor equal to the number of pipeline stages.
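The word "ideally" hides a startup cost. With n stages, the first instruction needs n cycles, and each later instruction completes one cycle after the previous one. So k instructions take n + k - 1 cycles instead of n·k. This small Python sketch, with hypothetical function names and the same equal-stage, no-stall assumption, shows the speedup approaching n only as k grows.

```python
def pipeline_cycles(n_stages: int, n_instr: int) -> int:
    """Cycles to complete n_instr instructions on an ideal n-stage pipeline."""
    # the first instruction fills the pipeline; each later one adds one cycle
    return n_stages + n_instr - 1

def speedup(n_stages: int, n_instr: int) -> float:
    unpipelined = n_stages * n_instr   # every instruction runs all stages in sequence
    return unpipelined / pipeline_cycles(n_stages, n_instr)

for k in (1, 4, 100, 10_000):
    print(k, round(speedup(4, k), 3))
```

For a four-stage pipeline, a single instruction gains nothing. Four instructions gain about 2.3 times, and long instruction streams approach 4.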

Typical Pipeline Organisation
• A common choice is to have four such units:
  - Fetch: Fetch the instruction code from the memory.
  - Decode: Decode the op code and fetch the operand(s).
  - Operate: Perform the operation required by the op code.
  - Write: Store the result in the destination location.
• A four-stage pipeline would require three buffers, each separating two functional units of the processor.
• The Write cycle of I1, Operate cycle of I2, Decode cycle of I3 and Fetch cycle of I4 take place in the same time slot, and have to be completed within the same time as prescribed by the pipeline design.
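The overlap described above can be drawn as a space-time diagram. In the sketch below, instruction i enters stage s in cycle i + s, assuming an ideal pipeline with no stalls. The stage letters follow the slide, and the function names are hypothetical.

```python
STAGES = ["F", "D", "O", "W"]   # Fetch, Decode, Operate, Write

def pipeline_table(n_instr: int):
    """Map (instruction, cycle) -> stage label for an ideal 4-stage pipeline."""
    table = {}
    for i in range(1, n_instr + 1):
        for s, name in enumerate(STAGES):
            table[(i, i + s)] = f"{name}{i}"   # instruction i is in stage s during cycle i + s
    return table

def print_diagram(n_instr: int) -> None:
    table = pipeline_table(n_instr)
    last_cycle = n_instr + len(STAGES) - 1
    print("     " + "".join(f"{c:>4}" for c in range(1, last_cycle + 1)))
    for i in range(1, n_instr + 1):
        row = "".join(f"{table.get((i, c), ''):>4}" for c in range(1, last_cycle + 1))
        print(f"I{i:<4}" + row)

print_diagram(4)
# the four units in cycle 4: W1, O2, D3 and F4
print(sorted(v for (i, c), v in pipeline_table(4).items() if c == 4))
```

Column 4 of the printed diagram is the time slot mentioned on the slide, in which all four units are busy with different instructions.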

A Four-stage Pipeline
[Figure]

Data Dependency in Pipelining
• If the input data for an instruction depends on the outcome of the previous instruction, the Write cycle of the previous instruction has to be over before the Operate cycle of the next instruction can start.
• The pipeline effectively idles through one instruction, creating a bubble in the pipeline which persists for several instructions.

Cycle   1     2     3     4     5     6     7     8
I1      F1    D1    O1    W1
I2            F2    D2    idle  O2    W2
I3                  F3    idle  D3    O3    W3
I4                              F4    D4    O4    W4

No instruction completes in cycle 5. From cycle 6 onward the pipeline again completes one instruction per cycle, so the bubble ends there.
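The timing table above can be reproduced by a small scheduler. This Python sketch works under assumed rules: each unit holds one instruction at a time, a stalled instruction waits in its current unit, and a dependent instruction may start Operate only after the previous Write is over. The function names and the `dependent` parameter are hypothetical.

```python
STAGES = ["F", "D", "O", "W"]   # Fetch, Decode, Operate, Write
F, D, O, W = range(4)

def pipeline_schedule(n_instr: int, dependent: set):
    """Return start[i][s], the cycle in which instruction i+1 enters stage s.

    dependent: 1-based numbers of instructions that use the previous one's result.
    """
    start = []
    for i in range(n_instr):
        row = []
        for s in range(len(STAGES)):
            t = 1 if s == 0 else row[s - 1] + 1        # after its own previous stage
            if i > 0:
                prev = start[i - 1]
                # the unit must have been vacated by the previous instruction
                t = max(t, prev[s + 1] if s + 1 < len(STAGES) else prev[s] + 1)
                # data dependency: Operate only after the previous Write is over
                if s == O and (i + 1) in dependent:
                    t = max(t, prev[W] + 1)
            row.append(t)
        start.append(row)
    return start

def render(start) -> None:
    last = start[-1][W]
    for i, row in enumerate(start, 1):
        cells = [""] * (last + 1)                      # index 0 unused
        for s, t in enumerate(row):
            cells[t] = f"{STAGES[s]}{i}"
            end = row[s + 1] if s + 1 < len(row) else t + 1
            for c in range(t + 1, end):                # stalled in this unit
                cells[c] = "idle"
        print(f"I{i:<3}" + "".join(f"{x:>6}" for x in cells[1:]))

render(pipeline_schedule(4, {2}))   # I2 depends on I1, as on the slide
```

Passing an empty set instead of `{2}` gives the stall-free schedule. Comparing the two shows the whole cost of the dependency, one cycle.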

Branch Dependency in Pipelining
• A branch instruction can cause a pipeline stall if the branch is taken, as the next instruction, already in the pipeline, has to be aborted in that case.
• If I1 is an unconditional branch instruction, the next Fetch cycle (F2) can start after D1.
• But if I1 is a conditional branch instruction, F2 has to wait until O1 for the decision as to whether the branch will be taken or not.

Cycle                                1     2     3     4     5     6     7
I1 (branch instruction)              F1    D1    O1    W1
I2, executed if branch is not taken        F2    D2    O2    W2
I2 for unconditional branch                      F2    D2    O2    W2
I2 for conditional branch, if taken                    F2    D2    O2    W2
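The cost of each case in the table can be read off as the delay of F2 relative to cycle 2. This Python sketch encodes the slide's rules for a branch I1 fetched in cycle 1. The function names and the string values for the branch kind are hypothetical.

```python
# Slide's four-stage timing for a branch I1 fetched in cycle 1.
F1, D1, O1 = 1, 2, 3

def next_fetch_cycle(kind: str, taken: bool = True) -> int:
    """Cycle in which the instruction that actually follows branch I1 is fetched."""
    if kind == "unconditional":
        return D1 + 1                  # target address known once I1 is decoded
    if kind == "conditional":
        if taken:
            return O1 + 1              # must wait for the Operate cycle to decide
        return F1 + 1                  # sequential I2, fetched in cycle 2, simply continues
    raise ValueError(f"unknown branch kind: {kind!r}")

def stall_cycles(kind: str, taken: bool = True) -> int:
    """Cycles lost compared with F2 starting right after F1."""
    return next_fetch_cycle(kind, taken) - (F1 + 1)

for kind, taken in [("unconditional", True), ("conditional", False), ("conditional", True)]:
    print(kind, "taken" if taken else "not taken", stall_cycles(kind, taken))
```

A taken conditional branch costs two cycles in this four-stage pipeline, twice the cost of an unconditional one.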

Avoidance of Pipeline Bubbles
• Data dependency: an instruction unaffected by the write operation (one that does not use the value being loaded) has to be placed in the Load Delay Slot, immediately after the load.
• Branch dependency: the branch instruction has to perform a delayed branch, with instructions preceding the branch placed in the Branch Delay Slots.
• Both techniques require optimising compilers to be written along with the design of the microprocessor.
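The compiler's job here can be illustrated with a toy scheduler. This is a minimal Python sketch for a hypothetical instruction set, not any real compiler's algorithm. It assumes that each `LOAD` and each branch (`BEQ`) has exactly one delay slot. It then tries to move the instruction just before that one into the slot, provided the move cannot change any result, and otherwise falls back to a NOP. The `Instr` type and the mnemonics are illustrative assumptions.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str                # mnemonic, e.g. "LOAD", "ADD", "BEQ" (hypothetical ISA)
    dest: Optional[str]    # register written, or None
    srcs: tuple = ()       # registers read

NOP = Instr("NOP", None)
DELAYED_OPS = {"LOAD", "BEQ"}   # assumed: each of these has exactly one delay slot

def independent(x: Instr, i: Instr) -> bool:
    """True if x can move from just before i to just after it without changing results."""
    x_w, i_w = {x.dest} - {None}, {i.dest} - {None}
    return not (x_w & set(i.srcs)      # i would lose the value x produced
                or i_w & set(x.srcs)   # x would read i's new value instead of the old one
                or x_w & i_w)          # the final value of a shared destination would change

def fill_delay_slots(block):
    """Fill each delay slot with the preceding independent instruction, else a NOP."""
    out, locked = [], []               # locked entries may not be moved again
    for ins in block:
        out.append(ins)
        locked.append(ins.op in DELAYED_OPS)
        if ins.op in DELAYED_OPS:
            j = len(out) - 2           # candidate: the instruction just before ins
            if j >= 0 and not locked[j] and independent(out[j], ins):
                filler = out.pop(j)
                locked.pop(j)
                out.append(filler)     # now sits in the delay slot after ins
            else:
                out.append(NOP)
            locked.append(True)
    return out

prog = [
    Instr("ADD", "R4", ("R5", "R6")),
    Instr("LOAD", "R1", ("R2",)),
    Instr("SUB", "R3", ("R1", "R7")),   # uses the loaded R1
    Instr("MUL", "R8", ("R3", "R3")),
    Instr("BEQ", None, ("R3", "R0")),   # branch on R3
]
print([ins.op for ins in fill_delay_slots(prog)])   # ['LOAD', 'ADD', 'SUB', 'BEQ', 'MUL']
```

The `ADD` fills the load delay slot, so `SUB` no longer waits for R1, and the `MUL` fills the branch delay slot. A real compiler would search further back than one instruction. This sketch looks only at the immediate predecessor to keep the dependence check easy to follow.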