
Chapter 4

The Processor:
Datapath and Control
MIPS Instruction Formats

Name         Fields                                            Comments
Field size   6 bits  5 bits  5 bits  5 bits  5 bits  6 bits    All MIPS instructions 32 bits
R            op      rs      rt      rd      sh      fun       Arithmetic
I            op      rs      rt      address/immediate         Transfer, branch
J            op      target address                            Jump
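The field boundaries in the format table can be illustrated with a small decoder. This is only a sketch: the function name `decode_fields` and the dictionary layout are assumptions made for illustration, and the field names follow the table (op, rs, rt, rd, sh, fun).

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields used by the
    R, I, and J formats (hypothetical helper, for illustration only)."""
    return {
        "op": (instr >> 26) & 0x3F,    # bits 31..26, 6 bits (all formats)
        "rs": (instr >> 21) & 0x1F,    # bits 25..21, 5 bits (R and I)
        "rt": (instr >> 16) & 0x1F,    # bits 20..16, 5 bits (R and I)
        "rd": (instr >> 11) & 0x1F,    # bits 15..11, 5 bits (R only)
        "sh": (instr >> 6) & 0x1F,     # bits 10..6,  5 bits (R only)
        "fun": instr & 0x3F,           # bits 5..0,   6 bits (R only)
        "immediate": instr & 0xFFFF,   # bits 15..0, 16 bits (I only)
        "target": instr & 0x03FFFFFF,  # bits 25..0, 26 bits (J only)
    }


# add $t0, $t1, $t2 is encoded as 0x012A4020 (rs = 9, rt = 10, rd = 8).
fields = decode_fields(0x012A4020)
```

Which of the returned fields are meaningful depends on the format that the op field selects.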

Instruction Execution Steps
• For every instruction, the first two steps are
identical:
– Send the program counter (PC) to the memory
that contains the code and fetch the instruction
from that memory.
– Read one or two registers, using fields of the
instruction to select the registers (e.g., a load
needs to read only one register).

Instruction Execution Steps
• After these two steps, the actions required to
complete the instruction depend on the
instruction class.
– memory-reference
– arithmetic-logical
– branch

The Processor: Datapath & Control
• Generic Implementation:
– use the program counter (PC) to supply the instruction
address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
Instruction Execution Steps
• Even across different instruction classes, we
can find some similarities.
– For example, all instruction classes except jump use
the ALU after reading the registers.
• The memory-reference instructions use the ALU for an
address calculation.
• The arithmetic-logical instructions use the ALU for the
operation execution.
• The branch instructions use the ALU for comparison.

Instruction Execution Steps
• After using the ALU, the actions required to
complete the different instruction classes
differ.
– A memory-reference instruction will need to
access the memory (to write/read).
– An arithmetic instruction must write the data
from the ALU back into a register.
– A branch instruction may need to change the
next instruction address based on the
comparison.

Abstract View of MIPS

[Figure: abstract view of the MIPS datapath: PC -> instruction memory (Address, Instruction) -> registers (Register #, Data) -> ALU -> data memory (Address, Data)]

Building a Datapath

Building Blocks of Implementation

[Figure: a. Instruction memory (Instruction address in, Instruction out); b. Program counter (PC); c. Adder (Add, Sum)]

• The instruction memory stores the instructions of a
program.
• The program counter (PC, a 32-bit register) holds the address
of the current instruction.
• An adder is needed to increment the PC to the address of
the next instruction.
Execution of An Instruction
• We must start by fetching the instruction from
memory.
• To prepare for executing the next instruction,
we must also increment the PC so that it
points at the next instruction, 4 bytes later.
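A minimal sketch of the fetch step, assuming the instruction memory is modeled as a Python dictionary keyed by byte address (the name `fetch` and this memory model are illustrative assumptions, not part of the slides):

```python
def fetch(instruction_memory: dict, pc: int):
    """Return the instruction at address pc and the incremented PC.

    instruction_memory is a hypothetical model: a dict that maps
    word-aligned byte addresses to 32-bit instruction words.
    """
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # the PC is a 32-bit register, so wrap
    return instruction, next_pc


program = {0: 0x012A4020, 4: 0x8D490004}
instruction, next_pc = fetch(program, 0)
```

In hardware the read and the increment happen in parallel; the sequential code only models the result.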

Fetching Instructions and Incrementing the PC

[Figure: the PC drives the Read address input of the instruction memory, which outputs the Instruction; an adder (Add) increments the PC]

R-Type Instructions

• Arithmetic-logical instructions
• Read two registers.
• Perform an ALU operation on the
contents of the registers.
• Write the result into a register.
• Examples: add, sub, and, or, slt, …

Register Files
• A register file is a collection of registers in
which any register can be read or written by
specifying the number of the register in the
file.
• To read one word, we need one input (the register
number) and one output (the data).
• To write one word, we need two inputs (the
register number and the data).
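The read and write interfaces described above can be sketched as a small class. The class name, the 32-register size, the write-enable flag, and the hardwired register 0 are assumptions based on the MIPS register set rather than details stated on this slide.

```python
class RegisterFile:
    """Hypothetical model of a register file of 32 registers, each 32 bits wide."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, number: int) -> int:
        # One input (register number), one output (data).
        return self.regs[number]

    def write(self, number: int, data: int, reg_write: bool) -> None:
        # Two inputs (register number, data) plus a write-enable flag.
        # Register 0 is assumed hardwired to zero, as in MIPS.
        if reg_write and number != 0:
            self.regs[number] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(8, 5, reg_write=True)   # written
rf.write(9, 7, reg_write=False)  # ignored: write enable not asserted
```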

Register Files and ALU

• To write to the register file, the RegWrite signal must be asserted.
• The ALU is controlled by a 4-bit control signal.

Datapath for R-Type Instructions

[Figure: register file (inputs Read register 1, Read register 2, Write register, Write data, RegWrite; outputs Read data 1, Read data 2) connected to the ALU (3-bit ALU operation input; outputs Zero and ALU result)]

I-Type Instructions

load word/store word
• lw $t1, offset_value($t2)
• Memory[$t2 + offset_value] -> $t1
• offset_value: 16 bits
• We need both the register file and the ALU.
• We also need a unit to sign-extend the 16-bit
offset to a 32-bit signed value.
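The sign extension and the address calculation for lw/sw can be sketched as follows. The function names are illustrative assumptions; the arithmetic follows the Memory[$t2 + offset_value] description above.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """Base register value plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# lw $t1, -4($t2) with $t2 = 0x1000 accesses address 0x0FFC.
address = effective_address(0x1000, 0xFFFC)
```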

Data Memory and Sign-Extension Unit

[Figure: a. Data memory unit (inputs Address, Write data, MemWrite, MemRead; output Read data); b. Sign-extension unit (16-bit input, 32-bit output)]


Datapath for Load/Store

[Figure: register file, ALU, sign-extension unit (16 to 32 bits), and data memory. The ALU result is the data memory Address, Read data 2 is the memory Write data, and the memory Read data returns to the register file Write data. Control inputs: 3-bit ALU operation, RegWrite, MemRead, MemWrite]

Branch/Jump Instructions

beq Instruction
• beq $t1, $t2, offset
• The 16-bit offset needs sign extension.
• The base address for the branch address
calculation is the address of the instruction
following the branch (PC+4).
• The offset field is shifted left 2 bits so that it
is a word offset.
• The branch requires two operations:
– compute the branch target address.
– compare the register contents.
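The two operations of beq can be sketched together. The function name `beq_next_pc` and its argument names are assumptions for illustration; the comparison is modeled as a subtraction whose result is tested for zero.

```python
def beq_next_pc(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Return the next PC for beq: the branch target if the registers are equal,
    otherwise PC + 4 (illustrative sketch)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Sign-extend the 16-bit offset field.
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    # Shift left 2 bits (word offset to byte offset) and add to PC + 4.
    target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF
    # Compare the register contents by subtracting and testing for zero.
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return target if zero else pc_plus_4


taken = beq_next_pc(0x100, 5, 5, 3)      # 0x104 + 12
not_taken = beq_next_pc(0x100, 5, 6, 3)  # falls through to 0x104
```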
Datapath for a Branch

j Instruction
• j offset
• It replaces the lower 28 bits of the PC with the
lower 26 bits of the instruction shifted left by
2 bits.
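The jump address formation described above can be written as a short bit manipulation. The helper name `jump_target` is an assumption, and the sketch takes as `pc` whichever PC value supplies the upper 4 bits.

```python
def jump_target(pc: int, instr: int) -> int:
    """Keep the upper 4 bits of pc and replace the lower 28 bits with the
    26-bit target field shifted left by 2 (illustrative sketch)."""
    upper4 = pc & 0xF0000000
    lower28 = (instr & 0x03FFFFFF) << 2
    return upper4 | lower28


# Target field 0x0100000 shifted left by 2 gives 0x0400000.
new_pc = jump_target(0x40000010, 0x08100000)
```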

Single Cycle Datapath Implementation

Single Cycle Datapath
• The implementation attempts to execute all
instructions in 1 clock cycle.
– CPI is always 1.
– No datapath resource can be used more than
once per instruction.
– Any element needed more than once must be
duplicated.
– Separate instruction and data memories are required,
since both may be accessed in the same cycle.

Multiplexor (Data Selector)
• Used to allow multiple connections to the
input of an element
• A control signal selects among the inputs.
• A multiplexor selects from among several
inputs based on the setting of its control lines.

Composing Datapaths
• The key differences between the R-type and
memory-reference (lw, sw) datapaths are:
– The second input to the ALU is either a register
(R-type) or the sign-extended lower half of the
instruction (memory-reference).
– The value stored into the destination register comes
from the ALU (R-type) or the memory (lw).
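Both differences can be resolved with a two-input multiplexor each. The sketch below models them in software; the function names and the select-signal names `alu_src` and `mem_to_reg` are assumptions chosen for illustration.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def alu_second_input(alu_src: int, read_data2: int, sign_extended_imm: int) -> int:
    # R-type uses the register value; lw/sw use the sign-extended immediate.
    return mux2(alu_src, read_data2, sign_extended_imm)


def register_write_data(mem_to_reg: int, alu_result: int, memory_data: int) -> int:
    # R-type writes back the ALU result; lw writes back the data read from memory.
    return mux2(mem_to_reg, alu_result, memory_data)
```

With select signals like these, one datapath can serve both instruction classes without duplicating the ALU or the register file.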

Combining R-Type and Memory

Adding Branch Datapath

ALU Control

ALU control input   Function
000                 AND
001                 OR
010                 add
110                 subtract
111                 set on less than
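A behavioral sketch of an ALU that follows the control encoding in the table. The function name `alu`, the 32-bit wrap-around, the signed comparison for set on less than, and the returned zero flag are modeling assumptions.

```python
MASK = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement value."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the 3-bit ALU control input."""
    a &= MASK
    b &= MASK
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = (a + b) & MASK
    elif control == 0b110:
        result = (a - b) & MASK
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported ALU control input")
    return result, result == 0


difference, zero = alu(0b110, 3, 3)  # subtract equal values: zero is asserted
```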

ALU Control
• Depending on the instruction class, the
ALU will need to perform one of these five
functions.
– lw, sw: compute the memory address by
addition.
– R-type: perform one of the five actions
(AND, OR, add, subtract, set on less than)
depending on the 6-bit funct field.
– Branch: perform a subtraction.

ALU Control
• We generate the 3-bit ALU control input using
a small control unit that has as inputs the
function field of the instruction and a 2-bit
control field (ALUOp).
• ALUOp
– 00 : add for lw and sw
– 01 : subtract for beq
– 10 : determined by the funct field
• The 3-bit output directly controls the ALU.
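The small control unit can be sketched as a function of ALUOp and the funct field. The function and table names are assumptions, and the funct values are the standard MIPS encodings for add, sub, and, or, and slt.

```python
# Standard MIPS funct codes mapped to the 3-bit ALU control input.
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:  # lw, sw: add
        return 0b010
    if alu_op == 0b01:  # beq: subtract
        return 0b110
    if alu_op == 0b10:  # R-type: determined by the funct field
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError("unsupported ALUOp")


slt_control = alu_control(0b10, 0b101010)
```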
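
ALU Control

opcode   ALUOp   operation    Funct    action             ALU control input
lw       00      load word    xxxxxx   add                010
sw       00      store word   xxxxxx   add                010
beq      01      branch eq    xxxxxx   subtract           110
R-type   10      add          100000   add                010
R-type   10      subtract     100010   subtract           110
R-type   10      AND          100100   AND                000
R-type   10      OR           100101   OR                 001
R-type   10      slt          101010   set on less than   111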
Multiple Levels of Decoding
• The main control unit generates the ALUOp
bits, which are then used as input to the
ALU control unit that generates the actual
signals to control the ALU. This style is
called multiple levels of decoding.
• It can reduce the size of the main control
unit.
• Using several smaller control units may also
potentially increase the speed of the
control unit.

Truth Table for ALU Control Bits

ALUOp1 ALUOp2   F5 F4 F3 F2 F1 F0   ALU control input
0      0        X  X  X  X  X  X    010
0      1        X  X  X  X  X  X    110
1      0        1  0  0  0  0  0    010
1      0        1  0  0  0  1  0    110
1      0        1  0  0  1  0  0    000
1      0        1  0  0  1  0  1    001
1      0        1  0  1  0  1  0    111

Single Cycle Datapath (R+I+J-type)

7 Control Signals

Signal     When deasserted (0)              When asserted (1)
RegDst     register destination <- rt       register destination <- rd
RegWrite   None                             register is written
ALUSrc     2nd ALU operand <- register      2nd ALU operand <- sign-extended lower 16 bits of the instruction
PCSrc      PC <- PC + 4                     PC <- branch target address
MemRead    None                             Data memory is read
MemWrite   None                             Data memory is written
MemtoReg   Register Write data <- ALU       Register Write data <- data memory
Single Cycle Datapath with Control

Control Signals for Each Opcode

Inst   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp2
R      1        0        0          1          0         0          0        1        0
lw     0        1        1          1          1         0          0        0        0
sw     0        1        n/a        0          0         1          0        0        0
beq    0        0        n/a        0          0         0          1        0        1
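The table can be modeled as a lookup keyed by opcode. This is a sketch under stated assumptions: the dictionary layout and function name are illustrative, the opcode values are the standard MIPS encodings (R-type 0, lw 35, sw 43, beq 4), n/a entries are stored as None, and the two ALUOp bits are combined into one 2-bit value.

```python
CONTROL_TABLE = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),     # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),     # lw
    0b101011: dict(RegDst=0, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),     # sw
    0b000100: dict(RegDst=0, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),     # beq
}


def main_control(opcode: int) -> dict:
    """Look up the control signals for a 6-bit opcode (illustrative sketch)."""
    return CONTROL_TABLE[opcode]


def pc_src(branch: int, zero: bool) -> int:
    """PCSrc is asserted only for a branch whose comparison produced Zero."""
    return 1 if branch and zero else 0


lw_signals = main_control(0b100011)
```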

Why a single-cycle implementation is not used today
• Inefficient
– The clock cycle must have the same length for
every instruction (CPI = 1).
– The clock cycle is determined by the longest
possible path in the machine.
– The longest path is a load, which uses 5 functional
units in series:
• instruction memory, register file, ALU, data memory,
register file
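To see why the load sets the clock period, the sketch below sums per-unit delays along each instruction's path. The latency values are hypothetical numbers chosen only for illustration (the slides give none), and the paths for instructions other than load are assumptions based on the units each class needs.

```python
# Hypothetical functional-unit latencies in picoseconds (assumed, not from the slides).
LATENCY_PS = {
    "instruction memory": 200,
    "register read": 100,
    "ALU": 200,
    "data memory": 200,
    "register write": 100,
}

# Functional units used in series by each instruction class (assumed paths).
PATHS = {
    "R-type": ["instruction memory", "register read", "ALU", "register write"],
    "lw": ["instruction memory", "register read", "ALU", "data memory", "register write"],
    "sw": ["instruction memory", "register read", "ALU", "data memory"],
    "beq": ["instruction memory", "register read", "ALU"],
}


def path_delay(instruction: str) -> int:
    """Sum the latencies of the units on the instruction's path."""
    return sum(LATENCY_PS[unit] for unit in PATHS[instruction])


# In a single-cycle design the clock period must cover the slowest instruction.
clock_period_ps = max(path_delay(name) for name in PATHS)
slowest = max(PATHS, key=path_delay)
```

Under these assumed latencies every instruction, including the shorter beq, pays the full load-length cycle.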

Single Cycle Disadvantages & Advantages
• Uses the clock cycle inefficiently – the clock cycle must be
timed to accommodate the slowest instruction
– especially problematic for more complex instructions like floating
point multiply

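[Figure: clock timing diagram. lw occupies all of Cycle 1; sw finishes before the end of Cycle 2, and the remainder of that cycle is wasted.]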

• May be wasteful of area since some functional units (e.g.,
adders) must be duplicated since they cannot be shared
during a clock cycle
but
• Is simple and easy to understand
• Alternatives: multiple clock cycle datapath,
pipelining

Multicycle Advantages & Disadvantages
• Uses the clock cycle efficiently – the clock cycle is timed to
accommodate the slowest instruction step
[Figure: clock timing diagram over Cycles 1 to 10. lw uses Cycles 1-5 (IFetch, Dec, Exec, Mem, WB); sw uses Cycles 6-9 (IFetch, Dec, Exec, Mem); the R-type instruction begins with IFetch in Cycle 10.]

• Multicycle implementations allow functional units
to be used more than once per instruction as long
as they are used on different clock cycles
but
• Requires additional internal state registers, more
muxes, and more complicated control.
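The cycle accounting in the multicycle timing diagram can be reproduced with a short sketch. The step lists for lw and sw are taken from the diagram; the function name and the assumption that instructions run strictly one after another are illustrative.

```python
# Steps per instruction, as shown in the timing diagram for lw and sw.
STEPS = {
    "lw": ["IFetch", "Dec", "Exec", "Mem", "WB"],
    "sw": ["IFetch", "Dec", "Exec", "Mem"],
}


def start_cycles(program):
    """Return the starting cycle of each instruction and the next free cycle,
    assuming one step per clock cycle and no overlap between instructions."""
    cycle = 1
    starts = []
    for name in program:
        starts.append(cycle)
        cycle += len(STEPS[name])
    return starts, cycle


starts, next_free_cycle = start_cycles(["lw", "sw"])
```

Because each instruction takes only as many cycles as it has steps, sw finishes one cycle sooner than lw and the following instruction can start in cycle 10.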
