
SANTA CLARA UNIVERSITY

The Processor
A simple processor executing a few RISC-V instructions

The Task
• Design a RISC-V processor that can execute the following instructions:
1. Load word, lw
2. Store word, sw
3. Branch if equal, beq
4. R-type subset, two register operand arithmetic/logic:
• Add, add
• Subtract, sub
• And, and
• Or, or
• This subset, albeit small, is powerful enough to build programs (but
without the use of functions).

7/15/2022

Additional Requirement

• Every instruction will completely execute in exactly one cycle.


The Plan
• Understand what a processor does
• Identify and understand the building blocks used,
• Build the processor gradually,
• Understand the control signal generation and use,
• Figure out how to calculate timing
• Misc.:
• Examples of extra instructions
• Looking inside the register file.


The Lifecycle of an Instruction


• The processor performs the following tasks (ad infinitum)
1. Fetch (retrieve) instruction from instruction memory,
2. Decode (i.e., parse and understand) fetched instruction,
3. Execute instruction (perform what the instruction is “instructing”),
4. Identify address of next instruction and go back to (1).


Building Blocks: The PC


• The Program Counter, PC.
  • 32-bit register which is updated every cycle (because there is a new instruction every cycle per requirement).
• Simplified diagram will be used.
  • Clock will not be shown (implicit).
  • PCWrite (hidden in diagram) = 1 → PC is updated every cycle.
  • Input/Output signals = 32 bits each (implicit).
[Figure: simplified and detailed diagrams of the PC register, with a 32-bit input, a 32-bit output, and clk]


Building Blocks: The Data Memory


• Used by lw and sw instructions.
• Clocked.
• MemRead & MemWrite should never be high (1) at the same time.
• Address and data = 32 bits.
• Clock and signal sizes will not be shown in later diagrams.
[Figure: data memory block with 32-bit Address and Data-In inputs, a 32-bit Data-Out output, clk, and the MemRead and MemWrite control inputs]


Building Blocks: The Instruction Memory


• Read every cycle.
  • Address is from PC
• Clocked.
• MemRead = 1 all the time
• MemWrite = 0 all the time
• Address and data = 32 bits.
• Ignore data-in; it is not needed.
• Clock, MemRead, MemWrite, and signal sizes will not be shown in later diagrams.
[Figure: instruction memory block with a 32-bit Address input, a 32-bit Data-Out output, clk, MemRead = 1, and MemWrite = 0]

Building Blocks: The ALU


• Operations needed
  • +: lw, sw, (r-type & func = add)
  • -: beq
  • &: r-type & func = and
  • |: r-type & func = or
• Arith./logic operation selection: ALU Operation = 4 bits, which provides room for 2^4 = 16 operations.
• Augmented with another 1-bit output, Zero, which is high (1) when the result (Out) is 0.
[Figure: ALU with two 32-bit inputs, a 32-bit output, the Zero output, and the 4-bit ALU Operation control input]
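
The ALU behaviour listed on this slide can be sketched in Python as follows. This is only an illustration: the slides do not give an encoding for the 4-bit ALU Operation, so the codes below are an assumption borrowed from a common textbook convention, and the function name `alu` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Assumed 4-bit ALU Operation codes; the slides do not specify an encoding.
ALU_AND = 0b0000
ALU_OR = 0b0001
ALU_ADD = 0b0010
ALU_SUB = 0b0110


def alu(operation, a, b):
    """Return (out, zero) for two 32-bit operands."""
    if operation == ALU_ADD:
        out = (a + b) & MASK32
    elif operation == ALU_SUB:
        out = (a - b) & MASK32
    elif operation == ALU_AND:
        out = (a & b) & MASK32
    elif operation == ALU_OR:
        out = (a | b) & MASK32
    else:
        raise ValueError("operation not implemented in this subset")
    zero = 1 if out == 0 else 0  # Zero is high when the result is 0
    return out, zero
```

Subtracting two equal values gives `(0, 1)`, which is how beq can use the Zero output to detect equality.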


Building Blocks: Adders


• An adder is simpler than an ALU.
  • Since it is dedicated to addition, it does not need any control to inform it about the operation to be performed.
• An adder with one constant operand (like the lower figure) is even simpler to build.
  • It costs fewer gates because it is specialized.
• The top figure adder is needed to calculate PC + offset for a beq instruction.
• The bottom figure adder is needed to calculate PC+4.
• Every cycle, one of their outputs will have to be selected to update the PC.
  • The selection depends on the instruction executed and its outcome.
[Figure: top, an adder with two 32-bit inputs and a 32-bit output; bottom, a 32-bit adder with one constant operand]
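
A minimal sketch of how the two adder outputs could be selected to form the next PC. The names `next_pc`, `branch_offset`, and `take_branch` are hypothetical; `branch_offset` is assumed to be the final byte offset, and `take_branch` is assumed to be true only for a beq whose two registers are equal.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc, branch_offset, take_branch):
    """Pick one of the two adder outputs as the new PC value."""
    pc_plus_4 = (pc + 4) & MASK32                  # constant-operand adder
    branch_target = (pc + branch_offset) & MASK32  # two-operand adder
    return branch_target if take_branch else pc_plus_4
```

Both sums are always computed, as in the hardware; only the selection changes from one instruction to the next.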


Building Blocks: The Register File


• An array of 32 registers.
  • Every register is made of 32 flip-flops (x0 is all 0).
• Has two read (output) ports.
• Has one write (input) port.
• Reading/writing a register in the register file is slower than accessing a separate register.
• More details provided at end of document.
[Figure: register file with three 5-bit register-number inputs (Rs1, Rs2, and the write register), a 32-bit WriteData input, the RegWrite control input, clk, and the 32-bit ReadData1 and ReadData2 outputs]
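
The register file's ports can be modelled with a short Python class. This is a sketch under assumptions: the class and method names are hypothetical, and ignoring writes to x0 is just one way to keep x0 at 0 (the slides only state that x0 is all 0).

```python
class RegisterFile:
    """32 registers of 32 bits each; x0 always reads as 0."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        # Two read ports: both values are available together.
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd, write_data, reg_write):
        # One write port, gated by RegWrite. Writes to x0 are ignored
        # (assumed mechanism for keeping x0 at 0).
        if reg_write and rd != 0:
            self.regs[rd] = write_data & 0xFFFFFFFF
```

With RegWrite low, `write` leaves the register contents unchanged, which is what sw and beq rely on.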


Building Blocks: Immediate Generation


sw:  Imm[11:5] | Rs2[4:0] | Rs1[4:0] | func3 | Imm[4:0] | Opcode[6:0]
beq: Imm[12] | Imm[10:5] | Rs2[4:0] | Rs1[4:0] | func3 | Imm[4:1] | Imm[11] | Opcode[6:0]
lw:  Imm[11:0] | Rs1[4:0] | func3 | Rd[4:0] | Opcode[6:0]

• Formats of sw, beq, and lw instructions.
[Figure: Immed. Gen block that takes the instruction and outputs a 32-bit immediate]

• The placement of the immediate fields is not identical across instructions.
• The beq instruction immediate fields need some reordering.
• To hide the complexity and simplify the diagram, the whole instruction is shipped to the immediate generation unit, which hides the required work.
• Based on the instruction type, it puts the bits together and generates the appropriate immediate.
• Immediate generation is not hard or slow, just tedious (like sorting out old notes at quarter’s end).
• Behavior is undefined if the instruction is in the r-type family.
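
The bit shuffling can be sketched in Python as below. The opcode values are the standard RV32I ones; the function names are hypothetical. For beq, the sketch assumes the unit outputs Imm[12:1] sign-extended and leaves bit 0 to the separate one-bit left shifter, as the slides describe.

```python
OPCODE_LW = 0b0000011
OPCODE_SW = 0b0100011
OPCODE_BEQ = 0b1100011


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_gen(instr):
    """Return the sign-extended immediate of a 32-bit lw, sw, or beq word."""
    opcode = instr & 0x7F
    if opcode == OPCODE_LW:   # Imm[11:0] sits in bits 31:20
        return sign_extend(instr >> 20, 12)
    if opcode == OPCODE_SW:   # Imm[11:5] in bits 31:25, Imm[4:0] in bits 11:7
        imm = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode == OPCODE_BEQ:  # the four pieces are reordered into Imm[12:1]
        imm12_1 = ((((instr >> 31) & 0x1) << 11)    # Imm[12]
                   | (((instr >> 7) & 0x1) << 10)   # Imm[11]
                   | (((instr >> 25) & 0x3F) << 4)  # Imm[10:5]
                   | ((instr >> 8) & 0xF))          # Imm[4:1]
        return sign_extend(imm12_1, 12)  # bit 0 is added later by <<1
    raise ValueError("undefined for other instruction types")
```

For a beq, shifting the returned value left by one bit gives the byte offset that is added to the PC.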


Building Blocks: One-bit Left Shifter

• Needed to add missing immediate bit 0 to branch displacement (offset).
• Very simple block. No logic gates.
[Figure: <<1 block built from wires only: In0 through In30 connect to Out1 through Out31, Out0 supplies the added bit 0, and the top input bit is discarded]

Building Blocks: Control


• Produce control signals and ALU control.
• Covered after looking at the data path.


Incrementally Building The Datapath


• The building blocks are combined to build the datapath incrementally.
• Earlier versions are incomplete implementations.
• Gradual extension and refinement help understanding.


Instruction Fetch
• Repeated instruction fetch!
  • Nothing more.
• Every clock cycle the PC is incremented by 4 & a word is read off the memory.
• The word (instruction) goes out to a black hole.
• Eventually, when the value of the PC reaches a very large number, it will roll back to 0.
[Figure: the PC output feeds the Instruction Memory address and a +4 adder whose result returns to the PC; the memory outputs the 32-bit instruction]
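
The roll-over follows from the PC being a 32-bit register. A small sketch, with the hypothetical helper `fetch_sequence`, shows the sequence of fetch addresses near the top of the address range:

```python
MASK32 = 0xFFFFFFFF


def fetch_sequence(start_pc, cycles):
    """Yield the PC value used for the fetch in each of `cycles` cycles."""
    pc = start_pc & MASK32
    for _ in range(cycles):
        yield pc
        pc = (pc + 4) & MASK32  # 32-bit register: the carry out is dropped
```

Starting at `0xFFFFFFF8`, three cycles fetch from `0xFFFFFFF8`, `0xFFFFFFFC`, and then `0`.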


Parse, Decode & Register Read


• Still repeated instruction fetch, with no action.
• Instruction fields are sent to
  • Register file,
    • Register results are left hanging with no takers!
    • No worry about register write at the moment.
  • Control unit,
    • Control signals generated
  • Immediate Generation
• Note: some instruction bits are sent to more than one destination (to accommodate different formats).
[Figure: PC and Instruction Memory as before; the Opcode & func* fields go to the Control Unit, which outputs the control signals; the 5-bit Rs1, Rs2, and Rd fields go to the Register File, which outputs 32-bit ReadData1 and ReadData2; the instruction also goes to Immed. Gen, which outputs 32 bits]
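
Because the fields sit at fixed bit positions, parsing amounts to slicing the instruction word. A sketch using the standard RISC-V field positions (the function name `decode_fields` and the dictionary keys are hypothetical):

```python
def decode_fields(instr):
    """Slice a 32-bit instruction word into its fixed-position fields."""
    return {
        "opcode": instr & 0x7F,          # bits 6:0
        "rd": (instr >> 7) & 0x1F,       # bits 11:7
        "funct3": (instr >> 12) & 0x7,   # bits 14:12
        "rs1": (instr >> 15) & 0x1F,     # bits 19:15
        "rs2": (instr >> 20) & 0x1F,     # bits 24:20
        "funct7": (instr >> 25) & 0x7F,  # bits 31:25
    }
```

All slices are produced for every instruction; for formats that lack a field (for example Rd in sw), the corresponding bits are simply part of the immediate and the sliced value is not used.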

Partial Datapath: r-type


• ALU inputs: two register file outputs
• Write port to register file added,
• Control signal needed to register file,
• ALU will perform either +, -, &, or | depending on ALU Operation
• ALU operation derivation from control unit not shown in picture.
• Programs run from one instruction to the next performing r-type instructions (assuming correct opcodes).
[Figure: the previous datapath plus an ALU (with Zero output) fed by the register file read outputs; the result returns to the register file WriteData port. Control signals shown: RegWrite and the 4-bit ALU Operation]


Partial Datapath: r-type, lw, sw


[Figure: datapath for r-type, lw, and sw: PC, Instruction Memory, Control Unit, Register File, Immed. Gen (32-bit immediate), ALU (Zero output, 4-bit ALU Operation), and Data Memory (Address, Data In for sw, Data output to the register file Write Data). Control signals shown: RegWrite, MemRead, MemWrite]


Partial Datapath: Merging r-type with lw and sw


[Figure: the r-type datapath and the lw/sw datapath shown together for comparison]

Differences                     | R-type                         | lw/sw
Lower input to ALU (mux needed) | From Rs2                       | From immediate
ALU out                         | To write port of register file | To address of data memory
Rs2 destination                 | To ALU                         | To DataIn of memory (for sw)
Reg. file Data In (mux needed)  | From ALU                       | From data out of memory (for lw)

Complete Datapath

[Figure: complete datapath: PC, Instruction Memory, Control Unit, Register File, Immed. Gen (32-bit immediate), ALU, Data Memory, the PC+4 adder, the branch adder fed by the <<1 shifter, and multiplexors A, B, and C. Control signals shown: Beq, ALU Op, RegWrite, MemRead, MemWrite]


Adding Control Signals


• Control signals control the flow and the operations of data.
• In the tiny RISC-V datapath, they are needed for:
• Controlling writes to storage elements.
• Controlling multiplexors
• Generating ALU Operation.
• Control signals are derived from the opcode and the function fields.


Control Signals: Storage Elements Writes


Storage Element    | Why Control                                   | Resolution
Register File      | To permit writes only for r-type and lw       | RegWrite signal added
Data Memory        | To permit write only for sw                   | MemWrite (also MemRead added for lw)
PC                 | Needs to be updated only once per instruction | An instruction takes one cycle, so the PC is always updated; hence it can be hardwired to always be updated.
Instruction Memory | Prevent writing                               | Assumption is we always read it and never write to it. Hence its MemWrite is hardwired to 0 and MemRead hardwired to 1; both are hidden and never discussed any further.


Control Signals: Multiplexor Controls


• Needed for multiplexors A, B and C.
• They choose one of the inputs to pass through.
• Minor Clarification:
• Some instructions do not care to use the output of the multiplexor.
• E.g. beq and sw do not care for output of multiplexor A
• When the instruction is beq or sw, multiplexor A could be designed to pass either of its
inputs (it is a “don’t care” condition).


Control Signals: ALU Operation


• The ALU operation control is 4 bits wide, allowing 16 possible operations.
• However, in this design only add, subtract, and, or are needed.
• They are derived from the opcode and function fields.
• Avoiding Boolean expressions, here is how they are set:
1. Add when (r-type and inst. = add) or load or store
2. Subtract when (r-type and inst. = sub) or beq
3. And when r-type and inst. = and
4. Or when r-type and inst. = or
• A more formal definition (using Boolean expressions) is not hard for
someone with a background in logic design.
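
The four rules above can be written out as a small lookup. This is a sketch, not the slides' logic-gate implementation: the opcode, func3, and func7 values are the standard RV32I encodings, while the function name `alu_operation` and the use of operation names instead of 4-bit codes are assumptions made for readability.

```python
OPCODE_LW = 0b0000011
OPCODE_SW = 0b0100011
OPCODE_BEQ = 0b1100011
OPCODE_RTYPE = 0b0110011

# (func3, func7) pairs of the four supported r-type instructions.
R_TYPE_OPS = {
    (0b000, 0b0000000): "add",
    (0b000, 0b0100000): "sub",
    (0b111, 0b0000000): "and",
    (0b110, 0b0000000): "or",
}


def alu_operation(opcode, funct3, funct7):
    """Name the ALU operation selected for one instruction of the subset."""
    if opcode in (OPCODE_LW, OPCODE_SW):
        return "add"  # address = Rs1 + immediate
    if opcode == OPCODE_BEQ:
        return "sub"  # Zero is high when Rs1 == Rs2
    if opcode == OPCODE_RTYPE:
        op = R_TYPE_OPS.get((funct3, funct7))
        if op is not None:
            return op
    raise ValueError("instruction outside the supported subset")
```

Only the r-type case needs the function fields; for lw, sw, and beq the opcode alone decides.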


Control Signal Derivation


• Simplified overview of control unit:
• Two stages
1. Stage one: uses the opcode and function fields and generates one signal for every instruction (good idea to create a signal when no instruction is recognized)
  • Signals: is_lw, is_sw, is_r-type, is_beq, is_error.
2. Stage two: uses these signals to generate RISC-V (blue) control signals
  • MemWrite, MemRead, AluSrc, etc.
[Figure: the Opcode & func fields feed an Instruction Decoder that produces is_lw, is_sw, is_r-type, and is_beq; misc. logic gates turn these into the control signals (AluSrc, etc.)]


Control Signal Equations


• Examples:
• RegWrite high if load or an r-type
• RegWrite = is_lw + is_r-type (here + denotes logical OR)
• MemRead high if load
• MemRead = is_lw
• Multiplexor signals discussed in next slides.
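
A sketch of the second control stage in Python. The RegWrite and MemRead lines mirror the equations above; the MemWrite and Beq lines are assumptions inferred from the storage-element table and the datapath figure, and the function name `control_signals` is hypothetical.

```python
def control_signals(is_lw, is_sw, is_r_type, is_beq):
    """Combine the one-per-instruction signals into control signals."""
    return {
        "RegWrite": is_lw or is_r_type,  # RegWrite = is_lw + is_r-type
        "MemRead": is_lw,                # MemRead = is_lw
        "MemWrite": is_sw,               # assumed: write memory only for sw
        "Beq": is_beq,                   # assumed: passed through unchanged
    }
```

For example, a load sets RegWrite and MemRead and leaves MemWrite and Beq low.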


Equations for Mux Controls


• Using mux A as an example
  • It should let memory output through if instruction is load,
  • It should let ALU output through if instruction is r-type.
• Equation for mux control depends on how inputs are connected to 0 and 1 inputs.
  • Rule to help: set the mux control such that the signal connected to the “1” input passes.
[Figure: two wirings of mux A, whose output goes to the register file. Top: Memory Output on input 0 and ALU Output on input 1, with select = is_r-type. Bottom: Memory Output on input 1 and ALU Output on input 0, with select = is_lw. The slide labels the select signal AluSrc.]
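
The two wirings of mux A can be compared with a short sketch. The helper names `mux2`, `write_data_wiring_top`, and `write_data_wiring_bottom` are hypothetical; the point is only that either wiring works as long as the select equation matches the signal on the “1” input.

```python
def mux2(select, in0, in1):
    """Two-input multiplexor: pass in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def write_data_wiring_top(is_r_type, memory_out, alu_out):
    # Memory output on input 0, ALU output on input 1: select = is_r-type.
    return mux2(is_r_type, memory_out, alu_out)


def write_data_wiring_bottom(is_lw, memory_out, alu_out):
    # Memory output on input 1, ALU output on input 0: select = is_lw.
    return mux2(is_lw, alu_out, memory_out)
```

For a load (is_lw = 1, is_r-type = 0) both wirings pass the memory output; for an r-type instruction both pass the ALU output.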


Estimating the Clock Cycle Length


• To estimate how long a clock cycle this implementation needs:
  • Trace timing for all instructions and find out which instruction takes the longest time.
  • The longest time is the minimum clock cycle length that can be used.
  • The delays of all blocks need to be known.
  • Block output earliest valid time = block delay + valid time of latest input.


Load Timing
• ALU = 200 ps,
• Adders = 100 ps,
• Reg. file rd/wr = 100 ps,
• Mem rd/wr = 400 ps.
• Everything else ≈ 0

Outcome:
• PC can be updated at 700 ps,
• Reg. file write will complete at 1200 ps.
➔ Load needs 1200 ps.
[Figure: complete datapath annotated with the times (0, 100, 400, 500, 700, and 1100 ps) at which each signal becomes valid during a load]
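
The 1200 ps figure can be reproduced by applying the rule "output valid time = block delay + valid time of latest input" along the load path. The sketch below uses the block delays given on this slide; the block names and the helper `valid_times` are hypothetical, and the path is simplified to a chain because the register read (500 ps) is the later of the two ALU inputs.

```python
# Block delays in ps, taken from the slide; everything else is treated as 0.
DELAY = {"instr_mem": 400, "reg_read": 100, "alu": 200,
         "data_mem": 400, "reg_write": 100}

# Blocks on the load path, in the order the data passes through them.
LOAD_PATH = ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"]


def valid_times(path, start=0):
    """Return the time at which each block's output becomes valid."""
    times = {}
    t = start
    for block in path:
        t += DELAY[block]  # output valid = block delay + valid time of input
        times[block] = t
    return times
```

The last entry, the register file write at 1200 ps, is the time the load instruction needs.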


Summing Up
• While other instructions may need less than 1200 ps to complete, the clock
cycle has to be long enough to accommodate the load instruction.
• A pipelined data path will fix this problem.
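
To see how much time the other instructions leave unused, the same delays can be summed per instruction. This is a rough sketch: the per-instruction paths below are an assumption based on the datapath described in these slides, not figures taken from them.

```python
DELAY = {"mem": 400, "reg": 100, "alu": 200}  # ps, from the load-timing slide

# Assumed longest path of each instruction through the datapath.
PATHS = {
    "lw": ["mem", "reg", "alu", "mem", "reg"],
    "sw": ["mem", "reg", "alu", "mem"],
    "r-type": ["mem", "reg", "alu", "reg"],
    "beq": ["mem", "reg", "alu"],
}

latency = {name: sum(DELAY[b] for b in path) for name, path in PATHS.items()}
min_cycle = max(latency.values())  # the slowest instruction sets the clock
```

Under these assumptions only lw uses the full cycle; every other instruction finishes early and then waits for the next clock edge.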
