
CENTRAL PROCESSING UNIT

ARCHITECTURE AND
CONTROL UNIT
PII Lecture 6: Processor:
Datapath and Control
 Datapath:
 Single-bus Organization
 Multiple-bus Organization
 MIPS: Multicycle Datapath and Control
 Stages of Instructions
 Datapath Walkthroughs
 Processor and Logic Design

CS1104-P2-6 Processor: Datapath and Control 2


PII Lecture 6: Processor:
Datapath and Control
 Reading:
 Chapter 9 of textbook, which is Chapter 7 in
“Computer Organization” by Hamacher,
Vranesic and Zaky.
 Optional reading: Chapter 5 in “Computer
Organization & Design” by Patterson and
Hennessy.



Datapath



Recap: Organisation

[Figure: block diagram. A bus connects the Processor, Memory and Devices. Labelled blocks: Control, Datapath, Registers, Cache, Input, Output.]


Fundamental Concepts
 Processor (CPU): the active part of the
computer, which does all the work (data
manipulation and decision-making).
 Datapath: portion of the processor which
contains hardware necessary to perform all
operations required by the computer (the
brawn).
 Control: portion of the processor (also in
hardware) which tells the datapath what
needs to be done (the brain).

Fundamental Concepts (2)
 Instruction execution cycle: fetch, decode, execute.
 Fetch: fetch next instruction (using PC) from memory into IR.
 Decode: decode the instruction.
 Execute: execute instruction.
[Figure: instruction cycle. Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.]


Fundamental Concepts (3)
 Fetch: Fetch next instruction into IR
(Instruction Register).
 Assume each word is 4 bytes and each instruction
is stored in a word, and that the memory is byte
addressable.
 PC (Program Counter) contains address of next
instruction.
IR ← [[PC]]
PC ← [PC] + 4
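The fetch step can be sketched in Python under the slide's assumptions (4-byte words, byte-addressable memory). The `memory` contents, the big-endian byte order and the `fetch` helper are illustrative assumptions, not part of the lecture.

```python
# Byte-addressable memory holding two 4-byte words (arbitrary example values).
memory = bytearray(16)
memory[0:4] = (0x012A4020).to_bytes(4, "big")   # word at address 0
memory[4:8] = (0x8D2B0028).to_bytes(4, "big")   # word at address 4

def fetch(pc):
    """Return (IR, new PC), i.e. IR <- [[PC]] and PC <- [PC] + 4."""
    ir = int.from_bytes(memory[pc:pc + 4], "big")  # read the 4 bytes of one word
    return ir, pc + 4                              # next instruction is 4 bytes on

ir, pc = fetch(0)
print(hex(ir), pc)
```

The PC advances by 4 rather than 1 because each instruction occupies four byte addresses.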



Single-bus Organization
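[Figure: single-bus organization of the datapath. The internal processor bus connects PC, MAR (address lines to the memory bus), MDR (data lines to the memory bus), IR (feeding the instruction decoder and control logic, which produce the control signals), Y, registers R0 to R(n–1) and TEMP. A MUX controlled by Select passes either Y or the constant 4 to ALU input A; ALU input B comes from the bus. ALU control lines: Add, Sub, ..., XOR, plus Carry-in.]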


Instruction Execution
 An instruction can be executed by performing
one or more of the following operations in
some specified sequence:
 Transfer a word of data from one register to
another or to the ALU (Arithmetic Logic Unit).
 Perform an arithmetic or a logic operation and
store the result in a register.
 Fetch the contents of a given memory location and
load them into a register.
 Store a word of data from a register into a given
memory location.



Register Transfer
 Register to register transfer:
 For each register Ri, two control signals:
 Riin used to load the data on the bus into the register.
 Riout to place the register’s contents on the bus.
 Example: To transfer contents of R1 to R4:
 Set R1out to 1. This places contents of R1 on the bus.
 Set R4in to 1. This loads data from the processor bus into
R4.
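A minimal Python sketch of this transfer, assuming a dictionary stands in for the register set; `bus_cycle` is a hypothetical helper that models one clock cycle on the bus.

```python
def bus_cycle(regs, out_reg, in_regs):
    """One clock cycle on a single bus: one register drives the bus (its
    Riout = 1) and every register whose Riin = 1 latches the bus value."""
    bus = regs[out_reg]           # Riout: place the register's contents on the bus
    for name in in_regs:          # Riin: load the data on the bus into the register
        regs[name] = bus
    return bus

regs = {"R1": 25, "R4": 0}
bus_cycle(regs, "R1", ["R4"])     # R1out, R4in
print(regs)
```

Only one register may drive the bus in a cycle, which is why the helper takes a single source but a list of destinations.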



Register Transfer (2)
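[Figure: input and output gating for the registers. Switches controlled by Riin and Riout connect register Ri to the internal processor bus; Yin gates the bus into Y; a MUX controlled by Select passes either Y or the constant 4 to ALU input A, while input B comes from the bus; Zin and Zout gate the ALU result register Z.]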


Arithmetic/Logic Operation
 ALU: Performs arithmetic and logic operations on its A and B inputs.
 To perform R3 ← [R1] + [R2]:
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
[Figure: the register gating diagram (Riin, Riout, Yin, Select MUX with constant 4, ALU inputs A and B, Zin, Zout).]
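The three control steps for R3 ← [R1] + [R2] can be traced as follows. This is a sketch: the dictionary of registers and the example values are assumptions.

```python
regs = {"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -- R1 drives the bus, Y latches it
bus = regs["R1"]
regs["Y"] = bus

# Step 2: R2out, SelectY, Add, Zin -- the MUX feeds Y to ALU input A,
# the bus (carrying R2) feeds input B, and Z latches the sum
bus = regs["R2"]
regs["Z"] = regs["Y"] + bus

# Step 3: Zout, R3in -- Z drives the bus, R3 latches it
bus = regs["Z"]
regs["R3"] = bus

print(regs["R3"])
```

Y and Z are needed because the single bus can carry only one operand per cycle: Y holds the first operand and Z holds the result until the bus is free.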



Arithmetic/Logic Operation (2)
 If there are n operations, do we need n
ALU control lines?
 We could use encoding, which requires
⌈log2 n⌉ control lines for n operations.
However, this will increase complexity and
hardware (additional decoder needed).
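The trade-off can be illustrated with a short Python sketch. Both helper names are hypothetical; `decode` plays the role of the additional decoder.

```python
def control_lines(n_ops):
    """Return (one line per operation, lines needed with binary encoding)."""
    encoded = (n_ops - 1).bit_length()     # equals ceil(log2(n_ops)) for n_ops >= 1
    return n_ops, encoded

def decode(code, n_ops):
    """The extra decoder: expand an encoded operation number into one-hot lines."""
    return [1 if i == code else 0 for i in range(n_ops)]

print(control_lines(16))
print(decode(2, 4))
```

With 16 operations, encoding cuts the lines from 16 to 4, at the cost of the decoder in front of the ALU.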

[Figure: ALU with inputs A and B, Carry-in, and one control line per operation: Add, Sub, ..., XOR.]


Reading a Word from Memory
 Move (R1), R2 /* R2 ← [[R1]]
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
 MDR has four control signals: MDRin, MDRout, MDRinE
and MDRoutE.
[Figure: MDR sits between the memory-bus data lines and the internal processor bus. MDRinE and MDRoutE gate the memory-bus side; MDRin and MDRout gate the processor-bus side.]

Reading a Word from Memory (2)
 Move (R1), R2 /* R2 ← [[R1]]
 Sequence of control steps:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
 WMFC: Wait for arrival of MFC (Memory-Function-
Completed) signal.
 MFC: To accommodate variability in response time,
the processor waits until it receives an indication that
the Read/Write operation has been completed. The
addressed device sets MFC to 1 to indicate this.
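The effect of WMFC on timing can be sketched with a toy memory model. The `Memory` class, its fixed latency and the cycle counting are hypothetical, for illustration only.

```python
class Memory:
    """Toy memory that asserts MFC `latency` clock cycles after a Read starts."""

    def __init__(self, contents, latency=3):
        self.contents = contents
        self.latency = latency
        self.countdown = 0

    def start_read(self):
        self.countdown = self.latency

    def tick(self):
        """Advance one clock cycle; return True once MFC is asserted."""
        self.countdown -= 1
        return self.countdown <= 0


def move_indirect(regs, mem):
    """Move (R1), R2. Returns the number of clock cycles used."""
    regs["MAR"] = regs["R1"]          # 1. R1out, MARin, Read
    mem.start_read()
    cycles = 1
    while True:                       # 2. MDRinE, WMFC (stall until MFC = 1)
        cycles += 1
        if mem.tick():
            regs["MDR"] = mem.contents[regs["MAR"]]
            break
    regs["R2"] = regs["MDR"]          # 3. MDRout, R2in
    return cycles + 1


regs = {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}
cycles = move_indirect(regs, Memory({100: 42}, latency=3))
print(regs["R2"], cycles)
```

The control sequence has three steps, but step 2 is stretched for as long as the memory needs, so the cycle count depends on the device's response time.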



Storing a Word in Memory
 Move R2, (R1) /* [R1] ← [R2]
 Sequence of control steps:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC



Executing a Complete Instruction
 Add (R3), R1 /* R1 ← [R1] + [[R3]]
 Adds the contents of a memory location pointed to by
R3 to register R1.
 Sequence of control steps (steps 1 – 3 are the instruction fetch):
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
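The seven steps can be traced in Python. This is a sketch under simplifying assumptions: memory is a dictionary from address to word, each memory read completes by the step that contains WMFC, and the instruction word is stored as a string rather than as an encoding.

```python
def add_indirect(regs, memory):
    """Add (R3), R1 on the single-bus datapath."""
    # 1. PCout, MARin, Read, Select4, Add, Zin
    bus = regs["PC"]
    regs["MAR"] = bus
    regs["Z"] = 4 + bus                       # ALU adds the constant 4 to PC
    # 2. Zout, PCin, Yin, WMFC
    bus = regs["Z"]
    regs["PC"] = bus
    regs["Y"] = bus
    regs["MDR"] = memory[regs["MAR"]]         # read completes (MFC received)
    # 3. MDRout, IRin
    regs["IR"] = regs["MDR"]
    # 4. R3out, MARin, Read
    regs["MAR"] = regs["R3"]
    # 5. R1out, Yin, WMFC
    regs["Y"] = regs["R1"]
    regs["MDR"] = memory[regs["MAR"]]         # operand read completes
    # 6. MDRout, SelectY, Add, Zin
    regs["Z"] = regs["Y"] + regs["MDR"]
    # 7. Zout, R1in, End
    regs["R1"] = regs["Z"]

memory = {0: "Add (R3), R1", 200: 30}
regs = dict(PC=0, MAR=0, MDR=0, IR=None, Y=0, Z=0, R1=12, R3=200)
add_indirect(regs, memory)
print(regs["R1"], regs["PC"])
```

Step 1 overlaps the PC increment with the start of the memory read, which is why the fetch needs only three steps.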



Multiple-Bus Organization
 Single-bus structure: Control sequences are long as
only one data item can be transferred over the bus in
a clock cycle.
 Figure on next slide shows a three-bus structure.
 All registers are combined into a single block called
register file with three ports: 2 outputs allowing 2
registers to be accessed simultaneously and have
their contents put on buses A and B, and 1 input
allowing data on bus C to be loaded into a third
register.
 Buses A and B are used to transfer source operands
to the A and B inputs of ALU, and result transferred
to destination over bus C.



Multiple-Bus Organization (2)
[Figure: three-bus organization of the datapath. Buses A, B and C connect the register file, the PC with its incrementer, IR (feeding the instruction decoder), MDR (memory-bus data lines) and MAR (address lines). A MUX selects either bus A or the constant 4 as ALU input A; the ALU result R drives bus C.]

Multiple-Bus Organization (3)
 For the ALU, R=A (or R=B) means that its A (or B)
input is passed unmodified to bus C.
 Add R4, R5, R6 /* R6 ← [R4] + [R5]
 Adds the contents of R4 and R5 and places the sum in R6.
 Sequence of control steps:
1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4outA, R5outB, SelectA, Add, R6in, End
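The four steps can be traced the same way. In this sketch the variables `bus_a`, `bus_b` and `bus_c` are stand-ins for the three buses, memory is a dictionary, and the instruction word is a string; all of these are assumptions.

```python
def add_three_bus(regs, memory):
    """Add R4, R5, R6 on the three-bus datapath."""
    # 1. PCout, R=B, MARin, Read, IncPC
    bus_b = regs["PC"]
    bus_c = bus_b                     # R=B: ALU passes input B through to bus C
    regs["MAR"] = bus_c
    regs["PC"] += 4                   # the incrementer updates PC without the ALU
    # 2. WMFC
    regs["MDR"] = memory[regs["MAR"]]
    # 3. MDRoutB, R=B, IRin
    bus_b = regs["MDR"]
    bus_c = bus_b
    regs["IR"] = bus_c
    # 4. R4outA, R5outB, SelectA, Add, R6in, End
    bus_a, bus_b = regs["R4"], regs["R5"]
    bus_c = bus_a + bus_b             # both operands reach the ALU in one cycle
    regs["R6"] = bus_c

memory = {0: "Add R4, R5, R6"}
regs = dict(PC=0, MAR=0, MDR=0, IR=None, R4=3, R5=4, R6=0)
add_three_bus(regs, memory)
print(regs["R6"], regs["PC"])
```

The execute phase shrinks to a single step because both source operands and the result travel on separate buses in the same cycle.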



Control
 Hardwired control or microprogrammed control.
 Hardwired control:
[Figure: hardwired control unit. The clock (CLK) drives a control step counter; a decoder/encoder block combines the step count with IR, external inputs and condition codes to generate the control signals.]


Performance Consideration

 What parameters can affect performance?

 Architecture – definitely yes
 Speed of the clock
 Powerful instruction set
 Separate units for fetching and execution
 Faster memory, for example cache memory
 MIPS Using Superscalar Architecture


Control (2)
 Microprogrammed control:
 Control signals generated by a program.
 Control word (CW) is a microinstruction that contains
individual bits that represent the various control signals.
 Vertical organization: highly encoded schemes that use
compact codes to specify only a small number of control
functions in each microinstruction.
 Horizontal organization: minimally encoded scheme in
which many resources can be controlled with a single
microinstruction.
 Popular in Complex Instruction Set Architectures (CISC)
because complex instruction sets require complex
controllers that can more easily be implemented as
microprograms.

Control (3)

 Example of a horizontal organization scheme, for the control sequence:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End

Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1

Select=0: SelectY
Select=1: Select4
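The control words of this horizontal scheme can be generated from the seven control steps. In the sketch below, the column order in `SIGNALS` is an assumption inferred from the bit rows on the slide, and `control_word` is a hypothetical helper.

```python
# One bit per control signal; Select is a single bit (1 = Select4, 0 = SelectY).
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

STEPS = [
    ["PCout", "MARin", "Read", "Select", "Add", "Zin"],   # Select4 -> Select = 1
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "Add", "Zin"],                             # SelectY -> Select = 0
    ["Zout", "R1in", "End"],
]

def control_word(active):
    """Build the control word: 1 for each asserted signal, 0 otherwise."""
    return "".join("1" if signal in active else "0" for signal in SIGNALS)

for number, step in enumerate(STEPS, start=1):
    print(number, control_word(step))
```

Each word is mostly zeros, which is the cost of a horizontal format: it needs no decoding, but most bits in any one microinstruction are unused.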
MIPS: Multicycle Datapath and Control

Adapted from D. Patterson’s CS61C


http://www.cs.berkeley.edu/~pattrsn/61CF00
Copyright 2000 UCB



Stages of a Datapath
 Problem: a single, atomic block which
“executes an instruction” (performs all
necessary operations beginning with fetching
the instruction) would be too bulky and
inefficient.
 Solution: break up the process of “executing
an instruction” into stages, and then connect
the stages to create the whole datapath.
 Smaller stages are easier to design.
 Easy to optimize (change) one stage without
touching the others.



Stages of a Datapath (2)
 There is a wide variety of MIPS instructions:
so what general steps do they have in
common?
 Stages
1. Instruction Fetch
2. Instruction Decode
3. ALU
4. Memory Access
5. Register Write



Stages of a Datapath (3)
 Stage 1: Instruction Fetch.
 No matter what the instruction is, the 32-bit
instruction word must first be fetched from
memory (the cache-memory hierarchy).
 Also, this is where we increment PC
(that is, PC = PC + 4, to point to the next
instruction; byte addressing so + 4).



Stages of a Datapath (4)
 Stage 2: Instruction Decode
 Upon fetching the instruction, we next gather data
from the fields (decode all necessary instruction
data).
 First, read the opcode to determine instruction
type and field lengths.
 Second, read in data from all necessary registers.
 For add, read two registers.
 For addi, read one register.
 For jal, no read necessary.
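Field extraction in the decode stage can be sketched as below, using the standard MIPS field positions (opcode in bits 31 to 26, rs in 25 to 21, rt in 20 to 16, rd in 15 to 11, funct in 5 to 0, immediate in 15 to 0). The `decode` helper and the two example words are illustrative assumptions.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields.
    All fields are extracted; the opcode decides which ones are meaningful."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "funct":  word & 0x3F,
        "imm":    word & 0xFFFF,
    }

r_type = decode(0x012A4020)   # add $t0, $t1, $t2
i_type = decode(0x8D2B0028)   # lw  $t3, 40($t1)
print(r_type["rs"], r_type["rt"], r_type["rd"])
print(i_type["rs"], i_type["rt"], i_type["imm"])
```

Because rs and rt sit at the same bit positions in every format, the register file can start reading them before the opcode has been fully interpreted.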



Stages of a Datapath (5)
 Stage 3: ALU (Arithmetic-Logic Unit)
 The real work of most instructions is done here:
arithmetic (+, -, *, /), shifting, logic (&, |),
comparisons (slt).
 What about loads and stores?
 lw $t0, 40($t1)
 The address we are accessing in memory =
the value in $t1 plus the value 40.
 We do this addition at this stage.



Stages of a Datapath (6)
 Stage 4: Memory Access
 Actually only the load and store instructions do
anything during this stage; the other
instructions remain idle.
 Since these instructions have a unique step, we
need this extra stage to account for them.
 As a result of the cache system, this stage is
expected to be just as fast (on average) as the
others.



Stages of a Datapath (7)
 Stage 5: Register Write
 Most instructions write the result of some
computation into a register.
 Examples: arithmetic, logical, shifts, loads, slt
 What about stores, branches, jumps?
 They do not write anything into a register at
the end.
 These remain idle during this fifth stage.



Datapath: Generic Steps

[Figure: generic datapath. PC (with +4 adder), instruction memory, registers (rs, rt, rd), imm, ALU, data memory. Stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write.]


Datapath Walkthroughs: add
 add $r3,$r1,$r2 # r3 = r1+r2
 Stage 1: Fetch this instruction, increment PC.
 Stage 2: Decode to find that it is an add
instruction, then read registers $r1 and $r2.
 Stage 3: Add the two values retrieved in stage 2.
 Stage 4: Idle (nothing to write to memory).
 Stage 5: Write result of stage 3 into register $r3.



Datapath Walkthroughs: add (2)

[Figure: datapath for add r3, r1, r2. The register file reads registers 1 and 2 and supplies reg[1] and reg[2] to the ALU; the result reg[1]+reg[2] is written to register 3. Data memory is idle; PC is incremented by 4.]


Datapath Walkthroughs: slti
 slti $r3,$r1,17
 Stage 1: Fetch this instruction, increment PC.
 Stage 2: Decode to find it is an slti, then read
register $r1.
 Stage 3: Compare value retrieved in stage 2 with
the integer 17.
 Stage 4: Go idle.
 Stage 5: Write the result of stage 3 in register
$r3.



Datapath Walkthroughs: slti (2)

[Figure: datapath for slti r3, r1, 17. The register file supplies reg[1] and the immediate field supplies 17; the ALU computes reg[1]-17 for the comparison and the result is written to register 3. Data memory is idle.]


Datapath Walkthroughs: sw
 sw $r3, 20($r1)
 Stage 1: Fetch this instruction, increment PC.
 Stage 2: Decode to find it is an sw, then read
registers $r1 and $r3.
 Stage 3: Add 20 to value in register $r1 (retrieved
in stage 2).
 Stage 4: Write value in register $r3 (retrieved in
stage 2) into memory address computed in stage
3.
 Stage 5: Go idle (nothing to write into a register).



Datapath Walkthroughs: sw (2)

[Figure: datapath for sw r3, 20(r1). The register file supplies reg[1] and reg[3]; the ALU computes reg[1]+20; data memory performs MEM[r1+20] <- r3. No register is written.]


Why Five Stages?
 Could we have a different number of stages?
 Yes, and other architectures do.
 So why does MIPS have five stages, if
instructions tend to go idle for at least one
stage?
 There is one instruction that uses all five stages:
the load.



Datapath Walkthroughs: lw
 lw $r3, 40($r1)
 Stage 1: Fetch this instruction, increment PC.
 Stage 2: Decode to find it is a lw, then read
register $r1.
 Stage 3: Add 40 to value in register $r1 (retrieved
in stage 2).
 Stage 4: Read value from memory address
computed in stage 3.
 Stage 5: Write value found in stage 4 into register
$r3.
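The walkthroughs for add, slti, sw and lw can be combined into one small five-stage interpreter. This is a sketch: instructions are passed as pre-decoded tuples instead of fetched 32-bit words, and registers and memory are plain dictionaries, all of which are simplifying assumptions.

```python
def run(instr, regs, memory, pc):
    """Push one instruction through the five stages and return the new PC.
    `instr` is (op, dest, src, last); `last` is a register number for add
    and an immediate for slti, lw and sw."""
    # Stage 1: Instruction Fetch (instruction is passed in), increment PC
    pc += 4
    # Stage 2: Decode, then read the registers the instruction needs
    op, dest, src, last = instr
    a = regs[src]
    b = regs[last] if op == "add" else last
    store_value = regs[dest] if op == "sw" else None
    # Stage 3: ALU (compare for slti, add for everything else)
    result = (1 if a < b else 0) if op == "slti" else a + b
    # Stage 4: Memory Access (only lw and sw do anything)
    if op == "lw":
        result = memory[result]
    elif op == "sw":
        memory[result] = store_value
    # Stage 5: Register Write (sw stays idle)
    if op != "sw":
        regs[dest] = result
    return pc

regs = {1: 100, 2: 5, 3: 0}
memory = {140: 77}
pc = run(("add", 3, 1, 2), regs, memory, 0)     # r3 = r1 + r2
pc = run(("sw", 3, 1, 20), regs, memory, pc)    # MEM[r1 + 20] = r3
pc = run(("lw", 3, 1, 40), regs, memory, pc)    # r3 = MEM[r1 + 40]
print(regs[3], memory[120], pc)
```

Only the `lw` path does real work in every stage, which matches the slide's point that the load is the instruction that uses all five.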



Datapath Walkthroughs: lw (2)
[Figure: datapath for lw r3, 40(r1). The register file supplies reg[1]; the ALU computes reg[1]+40; data memory performs r3 <- MEM[r1+40] and the loaded value is written to register 3.]


What Hardware Is Needed?
 PC: a register which keeps track of address
of the next instruction.
 General Purpose Registers
 Used in stages 2 (read) and 5 (write).
 We are currently working with 32 of these.
 Memory
 Used in stages 1 (fetch) and 4 (R/W).
 Cache system makes these two stages as fast as
the others, on average.



Datapath: Summary
 Construct datapath based on register transfers
required to perform instructions.
 Control part causes the right transfers to happen.

[Figure: the generic datapath (PC, +4, instruction memory, registers with rs/rt/rd, imm, ALU, data memory) with a Controller driven by the opcode and funct fields.]

Where is Logic Design Used?
 Combinational circuits for
ALU and other parts of the
datapath.
 Different control signals are
needed for different clock
cycles and different
instructions for the ALU,
registers and other parts of
the datapath. Sequential
circuits.
[Figure: ALU driven by an ALU Control block.]

Where is Logic Design Used? (2)
[Figure: state diagram. Start, then "Instruction fetch/decode and register fetch", then one of four paths: Memory access instructions, R-type instructions, Branch instruction, Jump instruction.]

 High-level view of finite state machine control.


 Sequential logic design can be used to assert the
correct control signals at the correct times.
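A high-level finite state machine of this kind can be sketched as a next-state function. The state names and the collapse of each instruction class into a single state are simplifying assumptions; a real multicycle controller uses several states per path.

```python
# Which state follows decode depends on the instruction class.
AFTER_DECODE = {
    "memory": "memory_access",
    "rtype":  "rtype_execute",
    "branch": "branch",
    "jump":   "jump",
}

def next_state(state, instr_class):
    """Sequential logic: the next state depends on the current state and inputs."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return AFTER_DECODE[instr_class]
    return "fetch"                    # every path returns to fetch the next instruction

state = "fetch"
trace = [state]
for _ in range(3):
    state = next_state(state, "rtype")
    trace.append(state)
print(trace)
```

The control signals asserted in each cycle would then be a function of the current state, which is what makes this a sequential rather than a combinational design.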



Summary
 Datapath is the hardware that performs
operations necessary to execute programs.
 Control instructs datapath on what to do next.
 Datapath needs:
 access to storage (general purpose registers and
memory)
 computational ability (ALU)
 helper hardware (local registers and PC)



Summary (2)
 Five stages of datapath (executing an
instruction):
 1: Instruction Fetch (Increment PC)
 2: Instruction Decode (Read Registers)
 3: ALU (Computation)
 4: Memory Access
 5: Write to Registers
• ALL instructions must go through ALL five
stages.
• Datapath designed in hardware.