Microcomputer Architecture
3. RISC versus CISC characteristics, overlapped register windows, pipelining-general considerations, RISC
pipeline, instruction pipeline, parallel processing, vector processing, array processing, Superscalar processors
– overview, design issues, PowerPC, Pentium, CISC scalar & RISC scalar processors. Linear and nonlinear
pipeline processors.
6. DSP architecture and ASIC design: Basic architecture, operation, pipelining, Application Specific Instruction-Set
Processors (ASIPs) – Micro Controllers and Digital Signal Processors.
REFERENCE BOOKS
1. Computer system Architecture (PHI) by M.Morris Mano
2. Digital Design (PHI) by M.Morris Mano
3. Computer Organization & Architecture – (PHI) by William Stallings.
4. Advanced Computer Architecture (McGraw Hill) by Kai Hwang.
5. ARM Architecture Reference Manual, 2nd Ed. (Addison-Wesley, 2001), edited by David Seal.
Devasi Chocha Mobile -9662739107 Email-chochadevh@gmail.com
19-08-2020 12:46:44 Copyright@2020-21 Electrical Engg. Dept. of The M S University of Baroda 1
The Maharaja Sayajirao University of Baroda
Faculty of Technology and Engineering
Electrical Engineering Department
BE-IV-Electronics Microcomputer Architecture
Characteristics of RISC versus CISC:
1. RISC stands for Reduced Instruction Set Computer. CISC stands for Complex Instruction Set Computer.
2. RISC processors have simple instructions taking about one clock cycle each; the average clock cycles per instruction (CPI) is 1.5. CISC processors have complex instructions that take multiple clocks to execute; the average CPI is in the range of 2 to 15.
Example (RISC approach: multiply the operands at memory locations 2:3 and 5:2 using explicit loads and stores):
LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A
Pipeline principle
1. Fetch Instruction (FI): Read the next expected instruction into a buffer.
2. Decode Instruction (DI): Determine the opcode and the operand specifiers.
3. Calculate Operands (CO): Calculate the effective address of each source operand.
4. Fetch Operands (FO): Fetch each operand from memory. Operands in registers
need not be fetched.
5. Execute Instruction (EI): Perform the indicated operation and store the result, if
any, in the specified destination operand location.
INSTRUCTION PIPELINING
➢ First stage fetches the instruction and buffers it.
➢ When the second stage is free, the first stage passes it the buffered instruction.
➢ While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction.
➢ This is called instruction prefetch or fetch overlap.
❖ Fetch instruction (FI)
❖ Decode instruction (DI)
❖ Calculate operands (CO)
❖ Fetch operands (FO)
❖ Execute instruction (EI)
❖ Write operand (WO)
❖ Fetch Instruction (FI): Read the next expected instruction into a buffer.
❖ Decode Instruction (DI): Determine the opcode and the operand specifiers.
❖ Calculate Operands (CO): Calculate the effective address of each source operand.
❖ Fetch Operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
❖ Execute Instruction (EI): Perform the indicated operation and store the result.
❖ Write Operand (WO): Store the result in memory.
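The six stages above can be visualised with a short sketch. This is an illustrative Python model (not part of the course material) that prints the space-time diagram of an ideal pipeline, assuming one clock per stage and no stalls:

```python
# Illustrative sketch: space-time diagram for the six-stage pipeline
# (FI, DI, CO, FO, EI, WO), assuming one clock per stage and no stalls.

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def timeline(n_instructions):
    """Return one {cycle: stage} mapping per instruction for an ideal pipeline."""
    rows = []
    for i in range(n_instructions):
        # Instruction i enters FI in cycle i+1 and advances one stage per cycle.
        rows.append({i + 1 + s: STAGES[s] for s in range(len(STAGES))})
    return rows

def total_cycles(n_instructions, k=len(STAGES)):
    # The first instruction takes k cycles; each later one adds a single cycle.
    return k + (n_instructions - 1)

if __name__ == "__main__":
    n = 3
    for num, row in enumerate(timeline(n), start=1):
        cells = [row.get(c, "  ") for c in range(1, total_cycles(n) + 1)]
        print(f"I{num}: " + " ".join(f"{c:>2}" for c in cells))
```

Note how instruction i+1 occupies the stage that instruction i has just vacated, which is exactly the fetch overlap described above.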
Stages Si and Si+1 are separated by latches.
Stage delay: τm (delay of the slowest stage); latch delay: d
Clock period: τ = max{τm} + d
Pipeline frequency: f = 1/τ
Tk = [k + (n-1)] τ   (time for n instructions on a k-stage pipeline)
T1 = n k τ           (time for n instructions without pipelining)
Speedup factor:
Sk = T1 / Tk = n k τ / ([k + (n-1)] τ) = n k / [k + (n-1)]
Efficiency:
Ek = Sk / k = n / [k + (n-1)]
Throughput:
Hk = n / ([k + (n-1)] τ) = n f / [k + (n-1)]
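These formulas are easy to check numerically. A minimal sketch, with k stages, n instructions and clock period tau as inputs:

```python
# Minimal numeric check of the pipeline performance formulas.
# k: number of stages, n: number of instructions, tau: clock period.

def pipelined_time(k, n, tau):
    """Tk = [k + (n - 1)] * tau"""
    return (k + (n - 1)) * tau

def nonpipelined_time(k, n, tau):
    """T1 = n * k * tau"""
    return n * k * tau

def speedup(k, n):
    """Sk = n*k / (k + (n - 1)); approaches k as n grows large."""
    return n * k / (k + (n - 1))

def efficiency(k, n):
    """Ek = Sk / k = n / (k + (n - 1))"""
    return speedup(k, n) / k

def throughput(k, n, tau):
    """Hk = n / ([k + (n - 1)] * tau) instructions per unit time."""
    return n / pipelined_time(k, n, tau)

if __name__ == "__main__":
    # With many instructions the speedup approaches the stage count k = 6.
    print(round(speedup(6, 10000), 3))
```

For a single instruction (n = 1) the speedup is 1, as expected: pipelining only pays off when a stream of instructions keeps all stages busy.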
Simple example
Consider a nonpipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60
ns, 60 ns, 50 ns, and 50 ns.
- Find the instruction latency on this machine.
- How much time does it take to execute 100 instructions?
Solution:
Instruction latency = 50+50+60+60+50+50= 320 ns
Time to execute 100 instructions = 100*320 = 32000 ns
Solution:
Remember that in the pipelined implementation, the length of the pipe stages must
all be the same, i.e., the speed of the slowest stage plus overhead. With 5 ns
overhead it comes to: 60 + 5 = 65 ns per stage.
Solution:
Speedup is the ratio of the average instruction time without pipelining to the
average instruction time with pipelining: 320 ns / 65 ns ≈ 4.9.
(Here we do not consider any stalls introduced by different types of hazards, which
we will look at in the next section.)
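The arithmetic of this example can be sketched in a few lines of Python, assuming the 5 ns latch overhead per stage used in the pipelined case:

```python
# Numeric check of the worked example: six stages of
# 50, 50, 60, 60, 50 and 50 ns, with an assumed 5 ns latch overhead.

stage_delays = [50, 50, 60, 60, 50, 50]  # ns
overhead = 5                             # ns latch overhead per stage

# Non-pipelined: one instruction passes through all stages sequentially.
instruction_latency = sum(stage_delays)            # total ns per instruction
time_100_nonpipelined = 100 * instruction_latency  # ns for 100 instructions

# Pipelined: every stage is stretched to the slowest stage plus overhead.
clock = max(stage_delays) + overhead               # ns per clock
# 100 instructions need k + (n - 1) = 6 + 99 clocks.
time_100_pipelined = (len(stage_delays) + 99) * clock

# Speedup = average instruction time without / with pipelining.
speedup = instruction_latency / clock

print(instruction_latency, clock, round(speedup, 2))
```

This reproduces the 320 ns latency of the non-pipelined machine and shows the pipelined clock of 65 ns, for a speedup just under the six-fold ideal.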
Pipeline Hazards
There are situations, called hazards, that prevent the next instruction in the
instruction stream from executing during its designated clock cycle.
Hazards reduce the performance from the ideal speedup gained by pipelining.
Structural Hazards
Structural hazards are those that occur because of resource conflicts.
Example 1
• For cost-saving reasons, a CPU may be designed with a single
interface to memory.
• This interface is always used during IF.
• It is also used during MEM for Load or Store operations.
• When a Load or Store gets to the MEM stage, the instruction in the
IF stage must be stalled.
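Example 1 can be sketched as a tiny scheduling model. This is a simplified illustration (not from the course material): in a hypothetical 5-stage pipeline (IF ID EX MEM WB) with one memory port, an instruction fetch must stall in any cycle where a load or store occupies MEM.

```python
# Simplified sketch of the single-memory-port structural hazard:
# a fetch scheduled for the same cycle as a load/store's MEM stage stalls.
# Each instruction's MEM stage falls 3 cycles after its own IF
# (this sketch ignores any further stalls between an instruction's stages).

def fetch_cycles(instrs):
    """instrs: list of booleans, True if the instruction is a load/store.
    Returns the cycle in which each instruction performs its IF."""
    fetch = []
    cycle = 1
    for i in range(len(instrs)):
        # Stall while some earlier load/store occupies MEM in this cycle.
        while any(fetch[j] + 3 == cycle
                  for j, is_mem in enumerate(instrs[:i]) if is_mem):
            cycle += 1  # memory port busy: stall the fetch
        fetch.append(cycle)
        cycle += 1
    return fetch

# A load in slot 0 reaches MEM in cycle 4, so the 4th fetch slips to cycle 5.
print(fetch_cycles([True, False, False, False]))
```

Without the load, the four fetches would simply occupy cycles 1 through 4.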
Structural Hazards
Example 2
❖ Suppose the branch target address is computed by the main ALU, which the branch needs during its MEM cycle.
❖ In such a case, the MEM cycle of a branch would interfere with the EX cycle of
the following instruction, causing a stall.
❖ In both cases, the problem could be solved with additional CPU hardware.
❖ In the first case, a second memory port.
❖ In the second case, an additional ALU.
❖ Therefore, structural hazards are caused solely by insufficient hardware.
Data Hazards
• Soon we will discuss machines that allow Loads and Stores to be executed
out of order.
Data Hazards
(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard
refers to a situation where an instruction refers to a result that has not yet been
calculated or retrieved. This can occur because even though an instruction is
executed after a prior instruction, the prior instruction has been processed only partly
through the pipeline.
For example:
i1. R2 <- R5 + R3   IF  ID  EX  WB
i2. R4 <- R2 + R3       IF  ID  EX  WB
The first instruction is calculating a value to be saved in register R2, and the
second is going to use this value to compute a result for register R4. However, in a
pipeline, when operands are fetched for the 2nd operation, the results from the first
have not yet been saved, and hence a data dependency occurs.
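The RAW condition on the i1/i2 pair above is mechanical to detect. A minimal sketch, representing each instruction as a (destination, sources) pair:

```python
# Minimal sketch: detecting a RAW hazard between two instructions.
# Each instruction is (destination register, tuple of source registers).

def has_raw_hazard(i1, i2):
    """True if i2 reads a register that i1 writes (read-after-write)."""
    dest1, _ = i1
    _, sources2 = i2
    return dest1 in sources2

# i1. R2 <- R5 + R3 ; i2. R4 <- R2 + R3
i1 = ("R2", ("R5", "R3"))
i2 = ("R4", ("R2", "R3"))
print(has_raw_hazard(i1, i2))  # i2 reads R2 before i1 has written it back
```

Real hazard-detection logic in the ID stage does essentially this comparison in hardware, against every instruction still in flight.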
Data Hazards
•After instruction B has executed, the value of the register should be B's
result, but A's result is stored instead.
•This can only happen with pipelines that write values in more than one
stage, or in variable-length pipelines (e.g., FP pipelines).
•It does not happen in our version of the DLX pipeline, but a modified version
might allow it.
•More on this later.
(i2 tries to write an operand before it is written by i1) A write after write (WAW) data
hazard may occur in a concurrent execution environment.
For example:
Data Hazards
•This type of hazard is rare because most pipelines read values early and write
results late.
•However, it might happen in a CPU that has complex addressing modes, e.g.,
autoincrement.
(i2 tries to write a destination before it is read by i1) A write after read (WAR) data
hazard represents a problem with concurrent execution.
For example:
In any situation with a chance that i2 may finish before i1 (i.e., with concurrent
execution), it must be ensured that the result of register R5 is not stored before i1
has had a chance to fetch the operands.
Data Hazards
•This is NOT a hazard since the register value does NOT change.
The problem with data hazards introduced by this sequence of instructions can
be solved with a simple hardware technique called forwarding.
The key insight in forwarding is that the result is not really needed by SUB until
after the ADD actually produces it. The only problem is to make it available for
SUB when it needs it.
If the result can be moved from where the ADD produces it (EX/MEM register), to
where the SUB needs it (ALU input latch), then the need for a stall can be
avoided.
Forwarding works as follows:
❖ The ALU result from the EX/MEM register is always fed back to the ALU input
latches.
❖ If the forwarding hardware detects that the previous ALU operation has written the
register corresponding to the source for the current ALU operation, control logic
selects the forwarded result as the ALU input rather than the value read from the
register file.
Forwarding of results to the ALU requires the addition of three extra inputs on each
ALU multiplexer and the addition of three paths to the new inputs.
As our example shows, we need to forward results not only from the immediately
previous instruction, but possibly from an instruction that started three cycles earlier.
Forwarding can be arranged from MEM/WB latch to ALU input also. Using those
forwarding paths the code sequence can be executed without stalls:
Cycle:           1   2   3      4       5      6    7
ADD R1, R2, R3   IF  ID  EXadd  MEMadd  WB
SUB R4, R5, R1       IF  ID     EXsub   MEM    WB
AND R6, R1, R7           IF     ID      EXand  MEM  WB
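The forwarding-control decision just described can be sketched as a selection function. This is an illustrative model (latch and register names are simplified, not the actual hardware interface): the ALU source comes from the EX/MEM or MEM/WB latch when a newer result for that register is in flight, otherwise from the register file.

```python
# Illustrative sketch of forwarding control: the ALU source is taken from
# the EX/MEM or MEM/WB latch when it holds a newer result for the register,
# otherwise from the register file.

def alu_input(reg, regfile, ex_mem, mem_wb):
    """Select the value for source register `reg`.
    ex_mem / mem_wb: (destination_register, value) latches, or None."""
    if ex_mem and ex_mem[0] == reg:   # most recent in-flight result wins
        return ex_mem[1]
    if mem_wb and mem_wb[0] == reg:   # result from three cycles earlier
        return mem_wb[1]
    return regfile[reg]               # no forwarding needed

regs = {"R1": 0, "R2": 7, "R3": 5}
# ADD R1, R2, R3 has computed 12 and sits in EX/MEM; SUB needs R1 now.
print(alu_input("R1", regs, ex_mem=("R1", 12), mem_wb=None))
```

Checking EX/MEM before MEM/WB matters: if both latches carry a result for the same register, the newer one must be forwarded.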
Control Hazards
Control hazards can cause a greater performance loss for a pipeline than data
hazards. When a branch is executed, it may or may not change the PC (program
counter) to something other than its current value.
If instruction i is a taken branch, then the PC is normally not changed until the end
of the MEM stage, after the completion of the address calculation and comparison.
The simplest method of dealing with branches is to stall the pipeline as soon as
the branch is detected until we reach the MEM stage, which determines the new
PC. The pipeline behaviour looks like:
Branch instruction   IF  ID  EX  MEM  WB
Branch successor         IF  stall  stall  IF  ID  EX  MEM  WB
Branch successor+1                         IF  ID  EX  MEM  WB
Control Hazards
This control hazard stall must be implemented differently from a data hazard stall,
since the IF cycle of the instruction following the branch must be repeated as soon
as we know the branch outcome. Thus, the first IF cycle is essentially a stall
(because it never performs useful work), which comes to a total of three stall cycles.
Three clock cycles wasted for every branch is a significant loss. With a 30% branch
frequency and an ideal CPI of 1, the machine with branch stalls achieves only half
the ideal speedup from pipelining!
The number of clock cycles can be reduced by two steps:
❖ Find out whether the branch is taken or not taken earlier in the pipeline;
❖ Compute the taken PC (i.e., the address of the branch target) earlier.
Both steps should be taken as early in the pipeline as possible. In some machines,
branch hazards are even more expensive in clock cycles. For example, a machine
with separate decode and register fetch stages will probably have a branch delay -
the length of the control hazard - that is at least one clock cycle longer. The branch
delay, unless it is dealt with, turns into a branch penalty. Many older machines that
implement more complex instruction sets have branch delays of four clock cycles or
more.
In general, the deeper the pipeline, the worse the branch penalty in clock cycles.
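The "only half the ideal speedup" claim above follows from simple CPI arithmetic, which can be checked directly:

```python
# Numeric check: with a 30% branch frequency, an ideal CPI of 1 and
# 3 stall cycles per branch, the pipeline reaches only about half
# of its ideal speedup.

def effective_cpi(ideal_cpi, branch_freq, branch_penalty):
    # Every branch adds branch_penalty stall cycles on average.
    return ideal_cpi + branch_freq * branch_penalty

cpi = effective_cpi(1.0, 0.30, 3)   # 1 + 0.3 * 3 = 1.9
fraction_of_ideal = 1.0 / cpi       # ~0.53: roughly half the ideal speedup
print(cpi, round(fraction_of_ideal, 2))
```

The same function shows why reducing the penalty matters: cutting the branch delay from three cycles to one brings the effective CPI down from 1.9 to 1.3.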
There are many methods to deal with the pipeline stalls caused by the branch delay:
❖ Stall pipeline
❖ Predict taken
❖ Predict not taken
❖ Delayed branch
Stall pipeline
The simplest scheme to handle branches is to freeze or flush the pipeline, holding
or deleting any instructions after the branch until the branch destination is known.
Advantage: simple for both software and hardware.
A higher performance, and only slightly more complex, scheme is to predict the
branch as not taken, simply allowing the hardware to continue as if the branch
were not executed. Care must be taken not to change the machine state until the
branch outcome is definitely known.
Because in this pipeline the target address is not known any earlier than the branch
outcome, there is no advantage in predicting the branch as taken. In some machines
where the target address is known before the branch outcome, a predict-taken
scheme might make sense.
Delayed Branch
Cont..
Taken branch instr        IF  ID  EX  MEM  WB
Branch delay instr (i+1)      IF  ID  EX   MEM  WB
Branch target                     IF  ID   EX   MEM  WB
Branch target+1                       IF   ID   EX   MEM  WB
Branch target+2                            IF   ID   EX   MEM  WB
The job of the compiler is to make the successor instructions valid and useful.
Then those instructions which now follow the branch can be executed while
the branch target is being determined.
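The compiler's delay-slot scheduling described above can be sketched as a list transformation. This is a deliberately simplified illustration: it only checks that the moved instruction does not write a register the branch reads, and ignores dependences among the instructions it moves over.

```python
# Simplified sketch of filling a branch delay slot: move the nearest
# earlier instruction whose destination the branch does not read into
# the slot after the branch. Ignores dependences between the moved
# instruction and the instructions it hops over.

def fill_delay_slot(instrs, branch_idx, branch_sources):
    """instrs: list of (name, destination_register) pairs.
    branch_sources: registers the branch at branch_idx reads."""
    for i in range(branch_idx - 1, -1, -1):
        _, dest = instrs[i]
        if dest not in branch_sources:
            moved = instrs[i]
            # Slide the intervening instructions up, put `moved` after the branch.
            return (instrs[:i] + instrs[i + 1:branch_idx + 1]
                    + [moved] + instrs[branch_idx + 1:])
    return instrs  # nothing safe to move: the slot would hold a NOP

code = [("ADD", "R1"), ("SUB", "R2"), ("BEQZ", None)]
# The branch tests R2, so SUB cannot move, but ADD (writes R1) can.
print(fill_delay_slot(code, 2, {"R2"}))
```

When no candidate is safe, a real compiler would fall back to a NOP in the slot, or to an instruction from the branch target or fall-through path.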
1. Multiple streams
2. Prefetch branch target.
3. Loop buffer
4. Branch prediction
5. Delayed branch (Discussed in last topic)
Multiple Streams
Challenges:
Leads to bus & register contention
Multiple branches lead to further pipelines being needed
Branch Prediction
➢ Predict by opcode
VMIPS
Vector Supercomputers
Epitomized by Cray-1, 1976:
Cray-1 (1976)