
Advanced Computer Systems Architecture
Course Teacher: Dr.-Ing. Shehzad Hasan
CIS, NED University

Lecture # 6

Fall Semester 2015 CS-506 ACSA 1


Recap Lecture – 5
• Computer Arithmetic
• Non-Restoring Unsigned Division
• Non-Restoring Signed Division
– Quotient conversion from {-1, 1} to 2’s Complement

• Floating Point Arithmetic
– IEEE Floating Point Standard



Basic FP Operations
Addition & Subtraction
Assume $e_1 \ge e_2$; an alignment shift (preshift) is needed if $e_1 > e_2$.

$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1}$
$= (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^{e}$

Example: Numbers to be added:
$x = 2^5 \times 1.00101101$
$y = 2^1 \times 1.11101101$ (operand with smaller exponent, to be preshifted)

Operands after alignment shift:
$x = 2^5 \times 1.00101101$
$y = 2^5 \times 0.000111101101$

Result of addition:
$s = 2^5 \times 1.010010111101$ (extra bits to be rounded off)
$s = 2^5 \times 1.01001100$ (rounded sum)
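The alignment-and-round example above can be reproduced with exact rational arithmetic. This is only a minimal sketch for positive operands in base 2; the helper names `parse_binary` and `fp_add` are hypothetical, and round-to-nearest-even is assumed for the final rounding step.

```python
from fractions import Fraction

def parse_binary(text):
    """Convert a binary string such as '1.00101101' to an exact Fraction."""
    whole, _, frac = text.partition(".")
    return Fraction(int(whole + frac, 2), 2 ** len(frac))

def fp_add(s1, e1, s2, e2, frac_bits=8):
    """Add s1 * 2**e1 and s2 * 2**e2 (both positive, significands in [1, 2)).

    Returns (exact_significand, rounded_significand, exponent).
    """
    if e1 < e2:                          # make operand 1 the one with the larger exponent
        s1, e1, s2, e2 = s2, e2, s1, e1
    total = s1 + s2 / 2 ** (e1 - e2)     # preshift the smaller operand, then add
    if total >= 2:                       # sum in [2, 4): one-bit right shift
        total, e1 = total / 2, e1 + 1
    # round() on a Fraction rounds ties to even
    rounded = Fraction(round(total * 2 ** frac_bits), 2 ** frac_bits)
    if rounded >= 2:                     # rounding itself may overflow the significand
        rounded, e1 = rounded / 2, e1 + 1
    return total, rounded, e1

exact, rounded, exponent = fp_add(parse_binary("1.00101101"), 5,
                                  parse_binary("1.11101101"), 1)
```

Here `exact` corresponds to the unrounded sum and `rounded` to the 8-fraction-bit result on the slide.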



FP Addition
• When the operand signs are alike, a single-bit
normalization shift is always enough, because 1 ≤ s < 4. If
the result lies in 2 ≤ s < 4, it has to be reduced by
a factor of 2 through a single-bit right shift (with 1
added to the exponent to compensate).
• When the operands have different signs, the resulting
significand may be very close to 0, and left shifting by
many positions may be needed for normalization.
• Overflow/underflow can occur during the addition
step as well as due to normalization.



FP Addition

[Figure: block diagram of a floating-point adder/subtractor. Operands x and y are unpacked into signs, exponents, and significands. Labelled blocks: Mux, Sub, Selective complement and possible swap, Align significands, Add (with cin and cout), Normalize, Round and selective complement, Add, Normalize, Pack, Control & sign logic (with Add/Sub input). Output: Sum/Difference s with Sign, Exponent, and Significand fields.]

Unpack:
• Isolate the sign, exponent, significand
• Reinstate the hidden 1
• Convert operands to internal format
• Identify special operands, exceptions

Pack:
• Combine sign, exponent, significand
• Hide (remove) the leading 1
• Identify special outcomes, exceptions
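The unpack and pack steps can be sketched for the IEEE 754 single-precision format. The function names `unpack_single` and `pack_single` and the string labels for special operands are hypothetical; the field widths (1 sign bit, 8 exponent bits, 23 fraction bits) are those of the standard single format.

```python
import struct

def unpack_single(x):
    """Isolate sign, biased exponent, and significand; reinstate the hidden 1."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:                       # identify special operands
        kind = "infinity" if fraction == 0 else "NaN"
    elif exponent == 0:
        kind = "zero" if fraction == 0 else "subnormal"
    else:
        kind = "normal"
    # Only normal numbers carry a hidden leading 1
    significand = fraction | (1 << 23) if kind == "normal" else fraction
    return sign, exponent, significand, kind

def pack_single(sign, exponent, significand):
    """Combine the fields again, hiding (removing) the leading 1."""
    bits = (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, `unpack_single(100.0)` yields a biased exponent of 133 and a 24-bit significand whose leading bits are 11001.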



Basic FP Operations
Multiplication
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$

Because $s_1 \times s_2 \in [1, 4)$, postshifting may be needed for normalization

Overflow or underflow can occur during multiplication or normalization

Division
$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = (\pm s_1 / s_2) \times b^{e_1 - e_2}$

Because $s_1 / s_2 \in (0.5, 2)$, postshifting may be needed for normalization

Overflow or underflow can occur during division or normalization
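The postshift for multiplication and division can be illustrated with exact fractions. This is a sketch for positive significands in [1, 2) and base b = 2; `normalize`, `fp_mul`, and `fp_div` are hypothetical helper names, and rounding and overflow/underflow checks are left out.

```python
from fractions import Fraction

def normalize(s, e):
    """Bring a significand from (0.5, 4) back into [1, 2) with a one-bit postshift."""
    if s >= 2:            # product in [2, 4): shift right, bump exponent
        return s / 2, e + 1
    if s < 1:             # quotient in (0.5, 1): shift left, lower exponent
        return s * 2, e - 1
    return s, e

def fp_mul(s1, e1, s2, e2):
    """Multiply significands and add exponents, then postshift if needed."""
    return normalize(s1 * s2, e1 + e2)

def fp_div(s1, e1, s2, e2):
    """Divide significands and subtract exponents, then postshift if needed."""
    return normalize(s1 / s2, e1 - e2)
```

For instance, multiplying 1.5 × 2^3 by 1.5 × 2^2 gives a raw significand of 2.25, which one right shift turns into 1.125 × 2^6.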



Floating-Point Multipliers and Dividers
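$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$
$s_1 \times s_2 \in [1, 4)$: may need postshifting
Overflow or underflow can occur during multiplication or normalization

[Figure: block diagram of a floating-point multiplier. Labelled blocks, top to bottom: Floating-point operands, Unpack, XOR, Add Exponents, Multiply Significands, Adjust Exponent, Normalize, Round, Adjust Exponent, Normalize, Pack, Product.]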



Rounding Schemes
The IEEE 754-2008 standard includes five rounding modes:
1. Round to nearest, ties away from 0 (rtna)
2. Round to nearest, ties to even (rtne) [default rounding mode]
3. Round toward zero (inward)
4. Round toward +∞ (upward)
5. Round toward –∞ (downward)

[Figure: plot of rtne(x) against x, for x from –4 to 4. Caption: Rounding to the nearest even number]
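The five modes can be compared side by side by rounding a value to an integer with Python's `decimal` module. This is only an illustration of the rounding directions, not of binary floating-point hardware; the dictionary keys and the function name `round_all` are hypothetical labels.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# Mapping from the five IEEE 754-2008 modes to decimal's rounding constants
MODES = {
    "rtna": ROUND_HALF_UP,        # nearest, ties away from 0
    "rtne": ROUND_HALF_EVEN,      # nearest, ties to even (default)
    "toward_zero": ROUND_DOWN,    # inward
    "upward": ROUND_CEILING,      # toward +infinity
    "downward": ROUND_FLOOR,      # toward -infinity
}

def round_all(text):
    """Round the decimal string `text` to an integer under every mode."""
    value = Decimal(text)
    return {name: int(value.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}
```

The tie cases are the informative ones: `round_all("2.5")` and `round_all("-2.5")` show where the modes disagree.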



Rounding and Exceptions
Adder result = $(c_{out}\, z_1 z_0 \,.\, z_{-1} z_{-2} \cdots z_{-l}\; G\; R\; S)_{\text{2's-compl}}$
G: guard bit; R: round bit; S: sticky bit (OR of all bits shifted past R)

Why only 3 extra bits? Consider the amount of alignment right-shift:
• One bit: G holds the bit that is shifted out, so no precision is lost.
• Two bits or more: the shifted significand has a magnitude in [0, 1/2) and the unshifted significand has a magnitude in [1, 2), so the difference of the aligned significands has a magnitude in [1/2, 2). The normalization left-shift will therefore be by at most one bit.

A difference in (1/2, 1) requires a left shift; a difference in [1, 2) requires no shift.

If a normalization left-shift actually takes place:
– R = 0: round down, discarded part < ulp/2
– R = 1: round up, discarded part ≥ ulp/2
The only remaining question is establishing whether the discarded part
is exactly ulp/2 (for round to nearest even); S provides this information.
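How the three extra bits drive round-to-nearest-even can be sketched as follows. The function `round_grs` and its argument layout are assumptions made for illustration: `kept` is the retained significand as an unsigned integer, and `shifted_left` says whether a one-bit normalization left shift occurred (in which case G moves into the result and R becomes the first discarded bit).

```python
def round_grs(kept, g, r, s, shifted_left):
    """Round a significand to nearest even using guard, round, and sticky bits."""
    if shifted_left:
        kept = (kept << 1) | g      # G becomes the new least significant bit
        first_discarded, rest = r, s
    else:
        first_discarded, rest = g, r | s
    # Round up when the discarded part exceeds ulp/2, or equals ulp/2
    # and the kept value is odd (ties to even)
    if first_discarded and (rest or (kept & 1)):
        kept += 1
    return kept
```

A carry out of the increment would call for renormalization, which this sketch does not model.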
Examples
• Add
– 0 01111101 00000000000000000000000
– 0 10000101 10010000000000000000000

• Multiply
– 0 10000100 0100…. 00
– 1 00111100 1100…. 00
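The Add example can be checked by decoding the two single-precision bit patterns, adding the values, and encoding the sum again. The helper names `decode_single` and `encode_single` are hypothetical. The Multiply operands are not decoded here because their fraction fields are elided on the slide.

```python
import struct

def decode_single(sign, exponent, fraction):
    """Turn sign/exponent/fraction bit strings into a Python float."""
    bits = int(sign + exponent + fraction, 2)
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

def encode_single(value):
    """Return the (sign, exponent, fraction) bit strings of a single-precision value."""
    bits = format(int.from_bytes(struct.pack(">f", value), "big"), "032b")
    return bits[0], bits[1:9], bits[9:]

x = decode_single("0", "01111101", "00000000000000000000000")
y = decode_single("0", "10000101", "10010000000000000000000")
total = encode_single(x + y)
```

The first operand decodes to 0.25 and the second to 100.0, so the exponents differ by 8 and the smaller operand is preshifted by 8 positions before the addition.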



Invalidated Laws of Algebra
Many laws of algebra do not hold for floating-point arithmetic
(some don’t even hold approximately)
This can be a source of confusion and incompatibility
Associative law of addition: $a + (b + c) = (a + b) + c$
$a = 0.12341 \times 10^5$, $b = -0.12340 \times 10^5$, $c = 0.14321 \times 10^1$

$a +_{fp} (b +_{fp} c)$
$= 0.12341 \times 10^5 +_{fp} (-0.12340 \times 10^5 +_{fp} 0.14321 \times 10^1)$
$= 0.12341 \times 10^5 -_{fp} 0.12339 \times 10^5$
$= 0.20000 \times 10^1$

$(a +_{fp} b) +_{fp} c$
$= (0.12341 \times 10^5 -_{fp} 0.12340 \times 10^5) +_{fp} 0.14321 \times 10^1$
$= 0.10000 \times 10^1 +_{fp} 0.14321 \times 10^1$
$= 0.24321 \times 10^1$

Results differ by more than 20%!
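The associativity example can be replayed with Python's `decimal` module limited to five significant digits. Round-to-nearest-even is an assumption here, since the slide does not state which rounding its five-digit arithmetic uses; the variable names are illustrative.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Five-digit decimal floating-point arithmetic, as in the example
ctx = Context(prec=5, rounding=ROUND_HALF_EVEN)

a = Decimal("0.12341E5")
b = Decimal("-0.12340E5")
c = Decimal("0.14321E1")

left = ctx.add(a, ctx.add(b, c))     # a +fp (b +fp c)
right = ctx.add(ctx.add(a, b), c)    # (a +fp b) +fp c
relative_gap = (right - left) / left
```

The inner sum `b + c` is rounded to -12339 before `a` is added, which is where the low-order digits of `c` are lost.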



Other Invalidated Laws of Algebra with FLP Arithmetic

Associative law of multiplication: $a \times (b \times c) = (a \times b) \times c$

Cancellation law (for a > 0): $a \times b = a \times c$ implies $b = c$

Distributive law: $a \times (b + c) = (a \times b) + (a \times c)$

Multiplication canceling division: $a \times (b / a) = b$

Before the IEEE 754 floating-point standard became available and
widely adopted, these problems were exacerbated by the use of
many incompatible formats.



Instruction Level Parallelism



Basic Pipelining
Consider a 5-stage instruction pipeline as shown below:
IF → ID → EX → M → WB

A time-space diagram is used to describe the progress of instructions through
the pipeline.

Stage  1    2    3    4    5    6    7    8    9    10
WB                         I1   I2   I3   I4   I5   I6
M                     I1   I2   I3   I4   I5   I6   I7
EX               I1   I2   I3   I4   I5   I6   I7   I8
ID          I1   I2   I3   I4   I5   I6   I7   I8   I9
IF     I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
(rows: pipeline stages; columns: clock cycles 1 to 10)

Instruction Latency (the time it takes to complete an instruction) = 5 cycles
Instruction Throughput (for 10 cycles) = 6/10 IPC = 0.6 IPC
Speedup of a k-stage pipeline for a program having n instructions:
$$S_p = \frac{n k \tau}{(k + n - 1)\,\tau} = \frac{n k}{k - 1 + n}$$
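The throughput and speedup figures can be computed directly. This is a small sketch assuming an ideal pipeline with no stalls, in which n instructions finish in k + n - 1 cycles; the function names are hypothetical.

```python
from fractions import Fraction

def total_cycles(k, n):
    """Cycles needed for n instructions on an ideal k-stage pipeline."""
    return k + n - 1

def throughput(k, n):
    """Instructions completed per cycle (IPC) over the whole run."""
    return Fraction(n, total_cycles(k, n))

def speedup(k, n):
    """Speedup over a non-pipelined machine that needs k cycles per instruction."""
    return Fraction(n * k, total_cycles(k, n))
```

With k = 5 and n = 6 this gives the 0.6 IPC of the diagram and a speedup of 3; as n grows, the speedup approaches k.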



MIPS Pipelined Architecture



MIPS Pipelined Architecture
Instruction Fetch (IF) Stage
• Instruction Fetch
The instruction's address in the PC is applied to the
instruction memory, which makes the
addressed instruction available
on the output lines of the instruction memory.
• Updating PC
The value written to the PC is determined by the
control signal PCSrc. Depending on the
state of PCSrc, the PC is
loaded with either the branch target address (BTA)
or the sequential address (PC + 4).



MIPS Pipelined Architecture
Instruction Format



MIPS Pipelined Architecture

Instruction Decode (ID) Stage


• The instruction is decoded by the
control unit, which takes the 6-bit
opcode and generates the control
signals.
• The control signals are buffered
in the pipeline registers until
they are used in the relevant
stage by the corresponding
instruction.



MIPS Pipelined Architecture
Instruction Decode (ID) Stage
• Registers are also read in this
stage.
– The first source register's identifier in
every instruction is at bit positions
[25:21], and the second source register's
identifier (if any) is at bit positions
[20:16].
– The destination register's identifier is
either at bit positions [15:11] (for R-type
instructions) or at [20:16] (for load and
immediate instructions).
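Extracting these fields from a 32-bit MIPS instruction word can be sketched as below. The function name `decode_fields` and the returned dictionary keys are hypothetical; the bit positions for rs, rt, and rd are those listed above, and the opcode [31:26], funct [5:0], and immediate [15:0] positions follow the standard MIPS32 formats.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits [31:26]
        "rs": (word >> 21) & 0x1F,       # bits [25:21], first source register
        "rt": (word >> 16) & 0x1F,       # bits [20:16], second source or load destination
        "rd": (word >> 11) & 0x1F,       # bits [15:11], R-type destination
        "funct": word & 0x3F,            # bits [5:0], R-type function code
        "imm": word & 0xFFFF,            # bits [15:0], I-type immediate
    }

add_fields = decode_fields(0x012A4020)   # add $t0, $t1, $t2
lw_fields = decode_fields(0x8D280004)    # lw  $t0, 4($t1)
```

For the R-type `add` the destination comes from `rd`, while for `lw` it comes from `rt`.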



MIPS Pipelined Architecture

Execution (EX) Stage


• This stage is marked by the
use of the ALU, which performs the
desired operation on registers
(R-type), calculates the address
(memory-reference
instructions), or compares
registers (branch).



MIPS Pipelined Architecture

Execution (EX) Stage


• An ALU control unit accepts the 6-bit
funct field and the 2-bit control
signal ALUOp to generate
the required control signal
for the ALU.
• The Branch Target Address (BTA)
is also calculated in the EX
stage by a separate adder.
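A possible ALU control mapping is sketched below. The slides do not list the encodings, so the ALUOp meanings (00 for load/store, 01 for branch, 10 for R-type) and the 4-bit ALU control values are assumptions taken from the common textbook MIPS datapath; the funct codes are the standard MIPS ones.

```python
# Assumed 4-bit ALU control encodings (textbook convention, not from the slides)
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Standard MIPS funct codes for a few R-type instructions
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,   # add
    0b100010: ALU_SUB,   # sub
    0b100100: ALU_AND,   # and
    0b100101: ALU_OR,    # or
    0b101010: ALU_SLT,   # slt
}

def alu_control(alu_op, funct):
    """Derive the ALU control signal from the 2-bit ALUOp and 6-bit funct field."""
    if alu_op == 0b00:            # load/store: add base and offset
        return ALU_ADD
    if alu_op == 0b01:            # branch: subtract to compare registers
        return ALU_SUB
    if alu_op == 0b10:            # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUOp")
```

The funct field is ignored unless ALUOp signals an R-type instruction.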



MIPS Pipelined Architecture
Memory (M) Stage
• Data memory is read (load)
or written (store) using the
address calculated by the
ALU in the EX stage.
• Branch decisions are taken
in this stage.
– The ZERO output of the ALU and the
BRANCH signal generated by
the control unit are ANDed to
determine the fate of the branch
(taken or not taken).
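The branch decision and the resulting PC update can be written as two one-line functions. The names `pc_src` and `next_pc` are hypothetical; signals are modelled as 0/1 integers.

```python
def pc_src(branch, zero):
    """PCSrc is the AND of the control unit's BRANCH signal and the ALU's ZERO output."""
    return branch & zero

def next_pc(pc, bta, branch, zero):
    """Select the branch target address when the branch is taken, else PC + 4."""
    return bta if pc_src(branch, zero) else pc + 4
```

A non-branch instruction never redirects the PC, even if the ALU happens to produce a zero result.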



MIPS Pipelined Architecture
Write Back (WB) Stage
• The result produced by the ALU
in the EX stage (R-type) or the
data read from data
memory in the M stage (lw)
is written to the destination
register.
• The data to be written
to the destination register is
selected via a multiplexer
controlled by the control
signal MemToReg.



MIPS Pipelined Architecture
Consider pipelined execution of the following MIPS instructions:
ld R1, 10(R2)
dadd R3, R4, R5
The load instruction uses all stages in the pipeline, but the dadd instruction doesn't
access data memory.

       C1   C2   C3   C4   C5
ld     IF   ID   EX   M    WB
dadd        IF   ID   EX   WB

A resource conflict occurs in C5: two different instructions attempt
to use the same hardware (the WB stage) in the same cycle.
This can be averted by ensuring uniformity: make all instructions pass through all
the stages in the same order.
As a consequence, some instructions will do nothing in some stages (accomplished by
disabling the corresponding control signals).
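The conflict can be found mechanically by recording which instruction occupies which stage in each cycle. This is a sketch; `find_conflicts` and its input layout (name, start cycle, list of stages) are assumptions made for illustration.

```python
from collections import defaultdict

def find_conflicts(program):
    """Return {(cycle, stage): [names]} for every stage used twice in one cycle."""
    usage = defaultdict(list)
    for name, start, stages in program:
        for offset, stage in enumerate(stages):
            usage[(start + offset, stage)].append(name)
    return {key: names for key, names in usage.items() if len(names) > 1}

# dadd skips the M stage, so its WB collides with the load's WB in cycle 5
skipping = [("ld", 1, ["IF", "ID", "EX", "M", "WB"]),
            ("dadd", 2, ["IF", "ID", "EX", "WB"])]

# Uniform version: dadd passes through M without doing anything there
uniform = [("ld", 1, ["IF", "ID", "EX", "M", "WB"]),
           ("dadd", 2, ["IF", "ID", "EX", "M", "WB"])]
```

Running `find_conflicts` on the two schedules shows the conflict disappearing once every instruction visits every stage.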



Pipeline Hazards
• A pipeline hazard is a situation that prevents an instruction
from using a pipeline stage during the designated clock cycle.
• Hazards reduce the performance from the ideal speedup
gained by pipelining. There are three classes of hazards:
1. Structural hazards arise from resource conflicts when the
hardware cannot support all possible combinations of instructions
simultaneously in overlapped execution.
2. Data hazards arise when an instruction depends on the results of
a previous instruction in a way that is exposed by the overlapping of
instructions in the pipeline.
3. Control hazards arise from the pipelining of branches and other
instructions that change the PC.



Structural Hazards
Some pipelined processors share a single memory for data
and instructions. As a result, when an instruction contains a data memory
reference, it conflicts with the instruction fetch of a later instruction.



Structural Hazards
Solution: Stall the pipeline for 1 clock cycle when the data memory
access occurs

No instruction is initiated on clock cycle 4 (which normally would initiate instruction i+3).
Because the instruction being fetched is stalled, all other instructions in the pipeline
before the stalled instruction can proceed normally.

In the above figure it is assumed that instructions i+1 and i+2 are not memory references
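The stall can be modelled by tracking the cycles in which the single memory is busy with a data access. This is a sketch under the assumptions that instructions are otherwise issued one per cycle and that the M stage falls three cycles after IF; the function name `fetch_cycles` is hypothetical.

```python
MEM_OFFSET = 3   # IF in cycle c implies the M stage in cycle c + 3

def fetch_cycles(uses_data_memory):
    """Return the fetch cycle of each instruction with one shared memory port.

    uses_data_memory: list of booleans, True for loads and stores.
    """
    busy = set()          # cycles in which the memory serves a data access
    cycles = []
    cycle = 1
    for is_memory_reference in uses_data_memory:
        while cycle in busy:      # stall the fetch while the memory is taken
            cycle += 1
        cycles.append(cycle)
        if is_memory_reference:
            busy.add(cycle + MEM_OFFSET)
        cycle += 1
    return cycles
```

For a load followed by non-memory instructions, the fetch of instruction i+3 moves from cycle 4 to cycle 5.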



Structural Hazards
• Structural hazards are typically averted by
employing replicated resources.
• This structural hazard can be avoided by
using a Harvard architecture, i.e. separate
memory units for instructions and data
– One for instruction fetch and another for data
read/write.



Data Hazards
• Data hazards occur when the pipeline changes the order
of read/write accesses to operands so that the order
differs from the order seen by sequentially executing
instructions on a non-pipelined processor
• Consider the following sequence of instructions:
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11



Data Hazards
There are a number of data dependencies
between various pairs of instructions, as
detailed below.

1) The DADD instruction writes the value of R1
in the WB stage, but the DSUB instruction
reads the value during its ID stage.

2) The AND instruction is also affected by this
hazard. The write of R1 does not complete
until the end of clock cycle 5. Thus, the AND
instruction, which reads the registers during
clock cycle 4, will receive the wrong result.

3) The OR instruction operates without incurring a hazard because the
register file reads are performed in the second half of the cycle and the writes in the first half.

4) The XOR instruction also operates properly because its register read occurs in clock
cycle 6, after the register write.
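The four cases can be checked by comparing each consumer's register-read cycle (ID) with the producer's register-write cycle (WB). This is a sketch without forwarding, assuming one instruction is issued per cycle and that a read in the same cycle as the write sees the new value (split-cycle register file); `raw_hazards` is a hypothetical name, and later redefinitions of a register are not tracked.

```python
def raw_hazards(program):
    """List (producer, consumer, register) triples that read a stale value.

    program: list of (name, destination, sources), issued in cycles 1, 2, 3, ...
    """
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        write_cycle = (i + 1) + 4            # WB is four cycles after IF
        for j in range(i + 1, len(program)):
            consumer, _, sources = program[j]
            read_cycle = (j + 1) + 1         # ID is one cycle after IF
            if dest in sources and read_cycle < write_cycle:
                hazards.append((producer, consumer, dest))
    return hazards

program = [
    ("DADD", "R1", ("R2", "R3")),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]
```

Only DSUB and AND are reported: OR reads R1 in cycle 5, the same cycle as the write, and XOR reads it in cycle 6.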



Data Hazards
• The problem can be solved with a simple hardware
technique called forwarding (also called bypassing).
• Forwarding can be generalized to include passing a
result directly to the functional unit that requires it.
• In the previous example the result is not really
needed by the DSUB instruction until after the DADD
instruction actually produces it.
• If the result can be moved from the pipeline register
where the DADD stores it to where the DSUB needs
it, then the need for a stall can be avoided.



Data Hazards
• Consider another example:
DADD R1, R2, R3
LD R4, 0(R1)
SD R4, 12(R1)
To prevent a stall in this sequence, we would need to forward the values of
the ALU output and memory unit output from the pipeline registers to
the ALU and data memory inputs.



Data Hazards
Where to find the ALU result?
• The ALU result generated in the EX stage is passed
through the pipeline registers to the MEM and WB
stages, before it is finally written to the register file.
• Since the pipeline registers already contain the ALU
result, we could just forward that value to
subsequent instructions, to prevent data hazards.



Forwarding Unit
The forwarding unit selects the correct ALU inputs for the
EX stage.

• If there is no hazard, the ALU's operands will come from the
register file, just like before.
• If there is a hazard, the operands will come from either the
EX/MEM or MEM/WB pipeline registers instead.
• The ALU sources will be selected by two new multiplexers,
with control signals named ForwardA and ForwardB.



Detecting Data Hazards
EX Hazard (ALU-ALU forwarding)
• An EX hazard occurs between the instruction currently in its
EX stage and the previous instruction if:
1. The previous instruction will write to the register file, and
2. The destination is one of the ALU source registers in the EX stage.
The first ALU source comes from the EX/MEM pipeline register when
necessary.
• if (EX/MEM.RegWrite && (EX/MEM.RegisterRd == ID/EX.RegisterRs))
then ForwardA = 10 (binary)
The second ALU source is treated in a similar fashion.
• if (EX/MEM.RegWrite && (EX/MEM.RegisterRd == ID/EX.RegisterRt))
then ForwardB = 10 (binary)



Detecting Data Hazards
MEM hazard (MEM-ALU forwarding)
• A MEM hazard may occur between an instruction in the EX
stage and the instruction from two cycles ago.
To detect and handle a MEM hazard for the first ALU source:
• if (MEM/WB.RegWrite && (MEM/WB.RegisterRd == ID/EX.RegisterRs)
&& ((EX/MEM.RegisterRd != ID/EX.RegisterRs) || !EX/MEM.RegWrite))
then ForwardA = 01
The second ALU operand is handled similarly.
• if (MEM/WB.RegWrite && (MEM/WB.RegisterRd == ID/EX.RegisterRt)
&& ((EX/MEM.RegisterRd != ID/EX.RegisterRt) || !EX/MEM.RegWrite))
then ForwardB = 01
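The EX and MEM hazard conditions can be combined into one small function per ALU source. This is a sketch that follows the conditions as stated on the slides; the names `forward_select` and `forwarding_unit` are hypothetical, and the register-zero check found in some textbook versions is not modelled because the slides do not include it.

```python
def forward_select(src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return the 2-bit mux select for one ALU source register."""
    if ex_mem_regwrite and ex_mem_rd == src:
        return 0b10        # EX hazard: take the value from EX/MEM
    if mem_wb_regwrite and mem_wb_rd == src:
        return 0b01        # MEM hazard, and no EX hazard on the same register
    return 0b00            # no hazard: use the register file value

def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    args = (ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd)
    return forward_select(id_ex_rs, *args), forward_select(id_ex_rt, *args)
```

Testing the EX condition first is equivalent to the extra clause in the MEM condition: when both pipeline registers hold a result for the same register, the more recent one in EX/MEM wins.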



Data Hazards
Consider the following sequence of instructions:
LD R1,0(R2)
DSUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9

The LD instruction does not
have the data until the end
of clock cycle 4 (its MEM
cycle), while the DSUB
instruction needs to have
the data by the beginning
of that clock cycle.



Data Hazards
• The load instruction has a delay or latency that cannot be
eliminated by forwarding alone. Instead, we need to add
hardware, called a pipeline interlock, to preserve the correct
execution pattern.



Detecting Stalls
Consider
– LD R1, 0(R2)
– DSUB R4,R1,R5

• A load-use hazard occurs between the current instruction in
its ID stage and the preceding instruction in the EX stage if:
– The preceding instruction is a load instruction, and
– The load's destination register is one of the current instruction's source registers.
• The condition that tests the above follows:
– if (ID/EX.MemRead == 1 && ((ID/EX.RegisterRt == IF/ID.RegisterRs) ||
(ID/EX.RegisterRt == IF/ID.RegisterRt)))
then stall
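The stall condition translates directly into a predicate. This is a minimal sketch; the function name `must_stall` is hypothetical and the arguments mirror the pipeline-register fields used in the condition above.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

# LD R1, 0(R2) is in EX (destination Rt = 1); DSUB R4, R1, R5 is in ID
load_use = must_stall(1, 1, 1, 5)
```

When the predicate is true, the interlock holds the instruction in ID for one cycle, after which the loaded value can be forwarded from the MEM/WB register.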

