
EE-222: Microprocessor Systems

RISC-V CPU Datapath and Control Unit

Instructor: Dr. Rehan Ahmed [rehan.ahmed@seecs.edu.pk]


Great Idea #1: Levels of Representation & Interpretation
Higher-Level Language Program (e.g. C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly Language Program (e.g. RISC-V):
    lw t0, 0(x2)
    lw t1, 4(x2)
    sw t1, 0(x2)
    sw t0, 4(x2)
Assembler
Machine Language Program (RISC-V):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
Machine Interpretation
Hardware Architecture Description (e.g. block diagrams)  <- We are here
Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
§4.1 Introduction
The Processor
• We will examine two RISC-V implementations
– A simplified version
– A more realistic pipelined version

• Simple subset, shows most aspects
– Memory reference: ld, sd
– Arithmetic/logical: add, sub, and, or
– Control transfer: beq
Hardware Design Hierarchy
system (today's level: datapath and control)
  datapath: code registers, multiplexer, comparator
  control: state registers, combinational logic
    register, logic (your DLD course: make sure you know this stuff)
      switching networks
Before we build our datapath:
A Review of Key Concepts
Review -- Combinational Logic
• Hardware is permanent: the circuit always computes everything you might want
• Use MUXes to pick from among the inputs
– S select bits choose one of 2^S inputs
• Ex: ALU
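The "compute everything, then select" idea can be sketched in Python. This is only an illustration: the function names and the four-operation ALU are assumptions, and the select encoding (0=add, 1=sub, 2=and, 3=or) is hypothetical apart from Add=0/Sub=1, which the later datapath slides use.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, like a 32-bit datapath

def mux(inputs, select):
    """Pick one of 2**S inputs using the integer value of the S select bits."""
    n = len(inputs)
    if n == 0 or n & (n - 1):
        raise ValueError("a mux needs a power-of-two number of inputs")
    return inputs[select]

def alu(a, b, alu_sel):
    """Compute every candidate result, then let a mux choose one."""
    candidates = [
        (a + b) & MASK32,  # 0: add
        (a - b) & MASK32,  # 1: sub
        a & b,             # 2: and
        a | b,             # 3: or
    ]
    return mux(candidates, alu_sel)
```

With 2 select bits the mux chooses among 2^2 = 4 results, for example `alu(5, 3, 1)` selects the subtraction result.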
Sequential Elements
• Register: stores data in a circuit
– Uses a clock signal to determine when to update
the stored value
– Edge-triggered: update when Clk changes from 0
to 1

[Figure: D flip-flop symbol with inputs D and Clk and output Q, with a timing diagram of D, Clk and Q.]
Sequential Elements
• Register with write control
– Only updates on clock edge when write control
input is 1
– Used when stored value is required later

[Figure: register symbol with inputs D, Clk and Write and output Q, with a timing diagram of Write, D, Clk and Q.]
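A behavioural sketch of an edge-triggered register with write control, assuming an idealized register with no setup or hold time. The class name `Register` and the method `tick` are hypothetical names chosen for this illustration.

```python
class Register:
    """Rising-edge-triggered register with a write-enable input."""

    def __init__(self, value=0):
        self.q = value       # stored value, visible on output Q
        self._prev_clk = 0   # last clock level seen, used to detect edges

    def tick(self, clk, d, write=1):
        """Apply a clock level; Q takes D only on a 0->1 edge with write == 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q
```

Holding `clk` at 1 or changing `d` between edges leaves `q` unchanged, which is what makes the stored value available later.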
Propagation Delay in Gates

Gate Delay (Propagation Delay)
• Time that it takes for combinational gate output to
change after inputs change

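[Figure: a two-input gate with inputs A and B and output Y; in the timing diagram, Y changes t_gate after A and B change.]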
Path Delay
• Delay through a series of combinational gates:
– Specifically, the time it takes for the output of the series of
gates to change after the inputs to the path change.

• Example:
– Propagation delay for each gate is the same (1ns in this
example)

[Figure: combinational circuit with inputs A, B, C and D and outputs X and Y.]
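[Figure: the same circuit with each signal annotated with its arrival time in ns; the inputs are applied at time 0 and the last output settles at 5 ns.]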
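Path delay can be computed by propagating arrival times through the gates. The sketch below uses a hypothetical netlist (it is not the circuit in the figure, whose wiring is not reproduced here) and assumes every gate has the same 1 ns delay, as in the example.

```python
GATE_DELAY_NS = 1  # same delay for every gate, as in the slide's example

# Hypothetical netlist: each gate output maps to the signals it reads.
NETLIST = {
    "n1": ["A", "B"],
    "n2": ["C", "D"],
    "n3": ["n1", "n2"],
    "X": ["n3", "A"],
    "Y": ["n2", "D"],
}

def arrival_time(signal):
    """Time (ns) at which a signal settles if all inputs change at time 0."""
    if signal not in NETLIST:  # primary input, applied at time 0
        return 0
    return GATE_DELAY_NS + max(arrival_time(src) for src in NETLIST[signal])

def critical_path_delay(outputs):
    """The slowest output determines when the whole circuit has settled."""
    return max(arrival_time(out) for out in outputs)
```

In this netlist Y settles before X, which is the "some outputs might settle earlier" situation: the outputs are only all valid after the longest path.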
Comments on the Circuit
• In the circuit above, if we apply all inputs at time 0, then after 5 ns all the outputs have settled to their final values.
• Do you see any problem with this?
– Some outputs might settle earlier
– Some outputs may switch back and forth a few times before settling to a final value
– What about input synchronization for the downstream logic?
How to Synchronize Inputs?
Add flip-flops!
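[Figure: inputs A, B, C and D each pass through a source D flip-flop before the combinational circuit; outputs X and Y are each captured by a destination D flip-flop that feeds other circuitry.]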

• Rising edge on clock at time 0:
– Assuming no delay in the flip-flops, the outputs of the source (left four) flip-flops change at time 0.
• Some time later (one clock cycle), the clock goes high again, and the destination flip-flops read in X and Y.
General Structure of a Digital System
• Digital systems are made up of many stages of flip-flops
and combinational logic.
[Figure: the flip-flop-bounded circuit, generalized to a chain: DFF -> Combinational Logic -> DFF -> Combinational Logic -> DFF -> …]
You’ve Seen this Before
• Finite State Machines, shift registers, counters, etc…

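[Figure: FSM block diagram: FSM Inputs -> Next State Logic -> DFF -> Output Logic -> FSM Outputs.]
[Figure: COUNTER: an adder computes COUNT + 1 on the wire COUNT_INTERNAL, which feeds the D input of a flip-flop clocked by CLK; the flip-flop's Q output is COUNT.]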
Review: A General Sequential Circuit

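[Figure: input w and present state y1, y2 feed a combinational circuit that produces next state Y1, Y2; clocked flip-flops turn Y1, Y2 into y1, y2; a second combinational circuit produces output z.]
Y1, Y2 represent the NEXT state; y1, y2 represent the PRESENT state.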
Review: Sequential Circuit
• In a sequential circuit, the values of the outputs
depend on the past behavior of the circuit, as well
as the present values of its inputs.
– Moore: If the outputs depend only on the present
state.
– Mealy: If the outputs depend on both the present
state and the present values of the inputs.

[Figure: input W -> Combinational circuit -> Flip-flops (clocked by Clock, output Q) -> Combinational circuit -> output Z.]
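The Moore/Mealy distinction can be illustrated with two small detectors for "two consecutive 1s on the input". Both machines and their state encodings are hypothetical examples, not taken from the slides; each loop iteration stands for one clock cycle.

```python
def moore_run(inputs):
    """Moore: output z depends only on the present state."""
    state, outputs = 0, []  # state = number of consecutive 1s seen, capped at 2
    for w in inputs:
        outputs.append(1 if state == 2 else 0)   # output logic: state only
        state = min(state + 1, 2) if w else 0    # next-state logic
    return outputs

def mealy_run(inputs):
    """Mealy: output z depends on the present state and the present input."""
    state, outputs = 0, []  # state = 1 if the previous input was 1
    for w in inputs:
        outputs.append(1 if (state == 1 and w == 1) else 0)  # state and input
        state = 1 if w else 0
    return outputs
```

On the same input sequence the Moore output appears one cycle after the Mealy output, because the Moore machine has to enter a state before it can signal the detection.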
Review -- SDS and Sequential Logic

Clocking Methodology
• Combinational logic transforms data during
clock cycles
– Between clock edges
– Input from state elements, output to state
element
– Longest delay determines clock period
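A rough sketch of how the longest delay sets the clock period. The function names are hypothetical, and the `clk_to_q_ns` and `setup_ns` parameters are assumptions added for illustration; they default to 0, matching the "no delay in the flip-flops" simplification used earlier.

```python
def min_clock_period_ns(path_delays_ns, clk_to_q_ns=0.0, setup_ns=0.0):
    """Smallest clock period that lets the slowest path settle between edges."""
    return clk_to_q_ns + max(path_delays_ns) + setup_ns

def max_frequency_mhz(period_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns
```

For example, paths of 5 ns and 4 ns between two sets of flip-flops give a minimum period of 5 ns under these assumptions.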
Agenda
• Datapath Overview
• Assembling the Datapath Part 1
• Processor Design Process
• Assembling the Datapath Part 2

The Processor
• Processor (CPU): Instruction Set Architecture
(ISA) implemented directly in hardware
– Datapath: part of the processor that contains the
hardware necessary to perform operations
required by the processor (“the brawn”)

– Control: part of the processor (also in hardware) which tells the datapath what needs to be done (“the brain”)
§4.3 Building a Datapath
Building a Datapath
• Datapath
– Elements that process data and addresses
in the CPU
• Registers, ALUs, muxes, memories, …
• We will build a RISC-V datapath incrementally
– Refining the overview design
Executing an Instruction
Very generally, what steps do you take (order matters!) to figure out the effect/result of the next RISC-V instruction?
– Get the instruction: add s0,t0,t1
– What instruction is it? add
– Gather data: read R[t0], R[t1]
– Perform operation: calc R[t0]+R[t1]
– Store result: save into s0
Instruction Fetch

[Figure: the PC register addresses instruction memory; the PC is incremented by 4 to point to the next 32-bit instruction.]
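Instruction fetch can be sketched as a lookup plus an increment. The function name `fetch` and the representation of instruction memory as a Python list of 32-bit words are assumptions for this illustration.

```python
def fetch(imem, pc):
    """Return (instruction word at pc, address of the next instruction)."""
    inst = imem[pc // 4]              # pc is a byte address; words are 4 bytes
    next_pc = (pc + 4) & 0xFFFFFFFF   # next 32-bit instruction
    return inst, next_pc
```

The two example words used in the checks are the standard RV32I encodings of `add x3,x1,x2` (0x002081B3) and `sub x3,x1,x2` (0x402081B3).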
Basic Phases of Instruction Execution

[Figure: datapath blocks in order: PC, IMEM, Reg[] (ports rs1, rs2, rd), imm, ALU, DMEM, with a +4 adder and a mux, all driven by Clock.]
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Register Write
State Required by RV32I ISA
Each instruction reads and updates this state during execution:
• Registers (x0..x31)
− Register file (or regfile) Reg holds 32 registers × 32 bits/register: Reg[0]..Reg[31]
− First register read specified by rs1 field in instruction
− Second register read specified by rs2 field in instruction
− Write register (destination) specified by rd field in instruction
− x0 is always 0 (writes to Reg[0] are ignored)
• Program Counter (PC)
− Holds address of current instruction
• Memory (MEM)
− Holds both instructions & data, in one 32-bit byte-addressed memory space
− We’ll use separate memories for instructions (IMEM) and data (DMEM)
▪ Later we’ll replace these with instruction and data caches
− Instructions are read (fetched) from instruction memory (assume IMEM read-only)
− Load/store instructions access data memory
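The architectural state listed above can be modelled with a small class. This is a sketch only: the class name `RV32State`, its method names and the default DMEM size are hypothetical choices, not part of the ISA.

```python
class RV32State:
    """Registers x0..x31, the PC, and separate instruction and data memories."""

    def __init__(self, imem_words=(), dmem_bytes=1024):
        self.pc = 0
        self._regs = [0] * 32               # Reg[0]..Reg[31], 32 bits each
        self.imem = tuple(imem_words)       # read-only instruction memory
        self.dmem = bytearray(dmem_bytes)   # byte-addressed data memory

    def read_reg(self, n):
        return self._regs[n]

    def write_reg(self, n, value):
        if n != 0:                          # writes to x0 are ignored
            self._regs[n] = value & 0xFFFFFFFF
```

Using a tuple for `imem` mirrors the "assume IMEM read-only" simplification.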
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Implementing the add instruction

add rd, rs1, rs2
• Instruction makes two changes to machine’s state:
− Reg[rd] = Reg[rs1] + Reg[rs2]
− PC = PC + 4
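The two state changes made by add can be written out directly. The function name `execute_add` and the list-based register file are assumptions for this sketch; results are wrapped to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def execute_add(regs, pc, rd, rs1, rs2):
    """Return (new register list, new PC) after add rd, rs1, rs2."""
    new_regs = list(regs)   # copy, so the caller's state is not modified
    if rd != 0:             # x0 always stays 0
        new_regs[rd] = (regs[rs1] + regs[rs2]) & MASK32
    return new_regs, (pc + 4) & MASK32
```

For `add x3,x1,x2` at PC 1000, the call is `execute_add(regs, 1000, 3, 1, 2)`.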
Datapath Walkthroughs (1/3)
• add x3,x1,x2 # x3 = x1 + x2
1) IF: fetch this instruction, increment PC
2) ID: decode as add, then read R[1] and R[2]
3) EX: add the two values retrieved in ID
4) MEM: idle (not using memory)
5) WB: write result of EX into R[3]
Example: add Instruction

[Figure: datapath executing add x3,x1,x2: the instruction supplies register numbers 1, 2 and 3; the register file outputs R[1] and R[2]; the ALU computes R[1] + R[2], which is written to register 3.]
Datapath for add

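[Figure: datapath for add]
• pc addresses IMEM, which outputs inst[31:0]; a +4 adder computes pc+4, which is written back to pc
• inst[11:7] -> AddrD, inst[19:15] -> AddrA, inst[24:20] -> AddrB of Reg[]
• DataA = Reg[rs1] and DataB = Reg[rs2] feed the adder; its output alu -> DataD
• Control Logic reads inst[31:0] and drives RegWriteEnable (RegWEn)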
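The register-address wiring in the datapath corresponds to simple bit-field extraction. The helper name `decode_r_type` is hypothetical; the bit positions are the ones labelled in the figure.

```python
def decode_r_type(inst):
    """Extract the register numbers that drive AddrD, AddrA and AddrB."""
    return {
        "rd": (inst >> 7) & 0x1F,    # inst[11:7]  -> AddrD
        "rs1": (inst >> 15) & 0x1F,  # inst[19:15] -> AddrA
        "rs2": (inst >> 20) & 0x1F,  # inst[24:20] -> AddrB
    }
```

For example, 0x002081B3 is the RV32I encoding of `add x3,x1,x2`, so it decodes to rd=3, rs1=1, rs2=2.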
Timing Diagram for add
[Figure: the add datapath, with a timing diagram of its signals over two clock cycles]

Signal       Cycle 1          Cycle 2
PC           1000             1004
PC+4         1004             1008
inst[31:0]   add x1,x2,x3     add x6,x7,x9
Reg[rs1]     Reg[2]           Reg[7]
Reg[rs2]     Reg[3]           Reg[9]
alu          Reg[2]+Reg[3]    Reg[7]+Reg[9]
Reg[1]       ???              Reg[2]+Reg[3]


Implementing the sub instruction

sub rd, rs1, rs2
• Almost the same as add, except now have to subtract operands instead of adding them
• inst[30] selects between add and subtract
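A sketch of how inst[30] can drive the add/subtract choice. The function names are hypothetical, the ALUSel encoding (Add=0/Sub=1) follows the datapath slide, and the sketch only covers add and sub, not other R-format instructions.

```python
MASK32 = 0xFFFFFFFF

def alu_sel_from_inst(inst):
    """For add/sub, ALUSel is simply bit 30 of the instruction."""
    return (inst >> 30) & 1

def add_sub_alu(a, b, alu_sel):
    """ALUSel = 0 adds, ALUSel = 1 subtracts; result wraps to 32 bits."""
    return (a - b if alu_sel else a + b) & MASK32
```

The encodings of `add x3,x1,x2` (0x002081B3) and `sub x3,x1,x2` (0x402081B3) differ only in bit 30.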
Datapath for add/sub

[Figure: the add datapath with the adder replaced by an ALU. Control Logic reads inst[31:0] and drives RegWEn (1=write, 0=no write) and ALUSel (Add=0/Sub=1).]
Implementing other R-Format instructions

• All implemented by decoding funct3 and funct7 fields and selecting appropriate ALU function
Implementing the addi instruction
• RISC-V Assembly Instruction:
addi x15,x1,-50

imm[11:0]      rs1     funct3   rd      opcode
111111001110   00001   000      01111   0010011
imm=-50        rs1=1   ADD      rd=15   OP-Imm
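The field layout above can be checked by assembling the word by hand. The helper name `encode_i_type` is hypothetical; the field positions are those of the I-format.

```python
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack I-format fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-format immediate must fit in 12 signed bits")
    return (((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12)
            | (rd << 7) | opcode)

# addi x15,x1,-50: funct3 = 000 (ADD), opcode = 0010011 (OP-Imm)
word = encode_i_type(-50, 1, 0b000, 15, 0b0010011)
print(format(word, "032b"))
```

Masking -50 with 0xFFF gives its 12-bit two's complement form, 111111001110, which matches the imm field shown in the slide.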
Adding addi to datapath

[Figure: add/sub datapath extended for addi. An immediate generator (Imm. Gen) turns inst[31:20] into imm[31:0]; a mux on the ALU's B input selects Reg[rs2] (0) or imm[31:0] (1). For addi, Control Logic sets ImmSel=I, RegWEn=1, BSel=1, ALUSel=Add.]
I-Format immediates

[Figure: Imm. Gen with ImmSel=I takes inst[31:20] and produces imm[31:0]]

imm[31:11] = inst[31] (sign extension)    imm[10:0] = inst[30:20]

• High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
• Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])
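The I-format immediate generation described above can be sketched as follows. The function names `imm_gen_i` and `to_signed32` are hypothetical helpers for this illustration.

```python
def imm_gen_i(inst):
    """Build imm[31:0] from an I-format instruction word."""
    imm = (inst >> 20) & 0xFFF   # inst[31:20] -> imm[11:0]
    if (inst >> 31) & 1:         # inst[31] is the sign bit
        imm |= 0xFFFFF000        # copy it into imm[31:12]
    return imm

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    return value - (1 << 32) if value & (1 << 31) else value
```

Applied to 0xFCE08793, the encoding of `addi x15,x1,-50`, the generator yields 0xFFFFFFCE, which is -50 as a signed 32-bit value.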
Adding addi to datapath

[Figure: the addi datapath, with ImmSel=I, RegWEn=1, BSel=1.]
• Also works for all other I-format arithmetic instructions (slti, sltiu, andi, ori, xori, slli, srli, srai) just by changing ALUSel
Why Five Stages?
• Could we have a different number of stages?
– Yes, and other architectures do
• So why does RISC-V have five if instructions tend to idle for at least one stage?
– The five stages are the union of all the operations needed by all the instructions
– There is one instruction that uses all five stages: load (lw/lb)
Administrivia
• Semester Project due on Monday 6th May 2019
− Project display schedule will be out in the coming week

• Assignment-3: the last one ☺ and hopefully a fun one.
− You are required to watch a keynote by John Hennessy and David Patterson and write down the key take-away points in one page [single-sided, not handwritten, font size between 10 and 12 pt]
− Keynote link: https://www.acm.org/hennessy-patterson-turing-lecture
− Due on Monday 6th May 2019
▪ You’ll be interviewed on it along with your project demo
− Submit the soft copy on LMS.
− Also submit the hard copy that day.
