RISC-V Pipeline P3

Dynamic Branch Prediction
• Dynamic branch prediction makes predictions based on data accumulated
at run time, as instructions execute.
• This requires hardware that records the past behaviour of branches and
uses that record to predict them.
• The prediction has two interacting parts:
– Branch outcome/direction prediction: Taken or Not Taken
– Branch target prediction: Branch target address
– (When a branch is predicted taken, the branch target address is
needed at once. It has not been computed yet while the branch
instruction is being fetched, so it must be predicted as well.)
Kuruvilla Varghese
• Simple method: look up the address of the branch instruction to see
whether the conditional branch was taken the last time this instruction
was executed. If so, begin fetching new instructions from the same place
as the last time.
• This is implemented using a branch prediction buffer, or branch history
table: a small memory indexed by the lower portion of the address of
the branch instruction. The memory contains a bit that says whether
the branch was recently taken or not.
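The 1-bit branch history table described above can be sketched in a few lines. This is a minimal illustration, not the hardware design from the slides: the class name `OneBitBHT`, the table size, and the choice to drop the two low address bits (assuming 4-byte instructions) are all assumptions made for the example.

```python
class OneBitBHT:
    """Branch history table with one 'taken last time' bit per entry."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)   # assumed default: not taken

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.taken[self._index(pc)]

    def update(self, pc, taken):
        # Remember only the most recent outcome of the branch.
        self.taken[self._index(pc)] = taken


bht = OneBitBHT(index_bits=4)
bht.update(0x1000, True)        # the branch at 0x1000 was taken
print(bht.predict(0x1000))      # True: predict taken next time
print(bht.predict(0x1040))      # True: 0x1040 maps to the same entry (aliasing)
print(bht.predict(0x1004))      # False: a different entry, never updated
```

Because only the lower address bits select the entry, two branches can share one bit, as the second lookup shows. The table stores no tag, so it cannot tell them apart.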
Branch Instruction (for … loop)
C:
for (i = 0; i < 10; i++)
{
}
RISC-V:
      add  x1, x0, x0     // i = 0
      addi x2, x0, 10     // loop limit
loop:
      addi x1, x1, 1      // i++
      bne  x1, x2, loop   // branch back while i != 10
• With a 1-bit predictor, the steady-state behaviour is to mispredict on
the first and last iterations of the loop.
• The misprediction on the last iteration is inevitable, since the prediction bit
indicates taken: the branch has been taken on every previous iteration.
• The misprediction on the first iteration happens because the bit was
flipped by the last iteration of the previous execution of the loop, when
the branch was not taken on exit.
• That gives two incorrect predictions per execution of the loop.
• To remedy this weakness, 2-bit prediction schemes are often used. In
a 2-bit scheme, a prediction must be wrong twice before it is changed.
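The two-mispredictions-per-loop claim can be checked with a small simulation of the loop branch from the earlier example (taken nine times, then not taken). This is a sketch under stated assumptions: the function name `count_mispredictions` is hypothetical, the 2-bit scheme is modelled as a saturating counter, and the counter counts upward on taken branches and starts in the strongest not-taken state.

```python
def count_mispredictions(outcomes, bits):
    """Count mispredictions of one saturating counter of the given width.

    Assumed convention: the counter counts up on taken branches, and values
    in the upper half of its range predict taken.
    """
    top = (1 << bits) - 1
    counter = 0                      # start in the strongest not-taken state
    misses = 0
    for taken in outcomes:
        predicted_taken = counter > top // 2
        if predicted_taken != taken:
            misses += 1
        counter = min(counter + 1, top) if taken else max(counter - 1, 0)
    return misses


one_pass = [True] * 9 + [False]      # bne outcomes for one execution of the loop
trace = one_pass * 100               # the loop is executed 100 times

one_bit = count_mispredictions(trace, bits=1)
two_bit = count_mispredictions(trace, bits=2)
print(one_bit, two_bit)              # 200 102
```

The 1-bit predictor misses twice per execution of the loop. The 2-bit counter misses once per execution in steady state, plus two extra misses while it warms up.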
2-bit prediction Scheme
[Figure: state diagram of the 2-bit predictor with four states. States 0 and 1
are labelled Predict Taken, states 2 and 3 are labelled Predict Not Taken;
transitions are labelled Taken or Not Taken.]
• Output is indicated in the state name.
• We can call the states:
⁃ 0 Strong Taken
⁃ 1 Weak Taken
⁃ 2 Weak Not Taken
⁃ 3 Strong Not Taken
• ~85-90% accuracy for many programs with 2-bit
counter based prediction.
2-bit prediction: Another scheme
[Figure: state diagram of a second 2-bit predictor with the same four states.
States 0 and 1 are labelled Predict Taken, states 2 and 3 are labelled
Predict Not Taken; transitions are labelled Taken or Not Taken.]
• Output is indicated in the state name.
• We can call the states:
⁃ 0 Strong Taken
⁃ 1 Weak Taken
⁃ 2 Weak Not Taken
⁃ 3 Strong Not Taken
• ~85-90% accuracy for many programs with 2-bit
counter based prediction.
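Since the state diagrams themselves are not reproduced here, the sketch below writes out two commonly used 2-bit schemes as transition tables, using the state numbers from the slides (0 Strong Taken to 3 Strong Not Taken). Which table corresponds to which slide is an assumption, and the table names are hypothetical. The first is a saturating counter. In the second, a misprediction in a weak state jumps straight to the opposite strong state.

```python
# next_state[state] = (next state if taken, next state if not taken)
SATURATING = {0: (0, 1), 1: (0, 2), 2: (1, 3), 3: (2, 3)}
JUMP_FROM_WEAK = {0: (0, 1), 1: (0, 3), 2: (0, 3), 3: (2, 3)}


def predict_taken(state):
    # States 0 and 1 predict taken; states 2 and 3 predict not taken.
    return state <= 1


def step(table, state, taken):
    return table[state][0 if taken else 1]


def run(table, state, outcomes):
    for taken in outcomes:
        state = step(table, state, taken)
    return state


# Start in Strong Taken and observe: not taken, not taken, taken.
sat_state = run(SATURATING, 0, [False, False, True])
jump_state = run(JUMP_FROM_WEAK, 0, [False, False, True])
print(sat_state, predict_taken(sat_state))     # 1 True
print(jump_state, predict_taken(jump_state))   # 2 False
```

Both tables need two wrong predictions before the prediction flips. They differ in how quickly they flip back: after two not-taken outcomes, a single taken outcome restores the taken prediction in the saturating counter but not in the second scheme.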
Branch Target Buffer (BTB)
[Figure: BTB block diagram. Labels: PC (bits 63:8 and 7:0), Address, BTB entry
fields Valid, Tag, Target Address and Dir bits, Branch History Table (BHT),
tag comparator (=), Decode, PC + 4, Branch PC(63:8), Computed Address,
Outcome, FSM-NSL.]
• The Branch Target Buffer structure shown here is of the direct-mapped
type. It can also be set associative or fully associative.
• This scheme can be extended to accommodate Global/Local branch
History.
Branch Target Buffer .
• The Branch Target Buffer, or branch target address cache, stores the
predicted branch target address.
• The BTB is accessed in the IF stage, and the predicted branch target address is
used as the next fetch address if the fetched instruction is a branch or
jump (a jump is always taken).
• Once the branch direction is determined and the branch target address
is computed in a later stage, these can be compared with the predicted
values, and the decision to continue or to flush the pipeline can be taken.
• Once the real values are computed, the BTB values can be updated. For
the direction bits to be updated, the FSM next-state logic (NSL) has to be
implemented.
Branch Target Buffer ..
• The BTB valid bits must be cleared at the start and during a context switch.
Similarly, the 2 bits for the outcome must be initialized to a default (e.g., weakly not
taken) at the start and during a context switch. The state assignment for the weakly
not taken state can be “00” for ease of implementation.
• When a branch address is looked up in the BTB, if the valid bit is ‘0’ or the address tag
does not match, the outcome is predicted as the default (e.g., weakly not taken).
• When a branch address is looked up in the BTB, if the valid bit is ‘1’ and the address tag
matches, the stored outcome bits are fed to the NSL of the FSM. They can be registered at the NSL so that
the outcome can be updated when the branch decision is computed later, because by that time
another branch may be looking up the BTB.
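The lookup and update rules above can be sketched as a small software model of a direct-mapped BTB. The split into PC(7:0) as index and PC(63:8) as tag follows the figure labels. Everything else is an assumption for illustration: the class and method names, the use of a saturating counter for the direction bits, the slide's state numbers (2 = weakly not taken) rather than a hardware encoding, and the choice to allocate an entry for every resolved branch.

```python
INDEX_BITS = 8                      # PC(7:0) selects the entry, PC(63:8) is the tag
MASK = (1 << INDEX_BITS) - 1
WEAK_NOT_TAKEN = 2                  # slide state numbers: 0 strong taken .. 3 strong not taken


class BTB:
    def __init__(self):
        self.flush()

    def flush(self):
        # Clearing every entry models clearing the valid bits (start-up, context switch).
        self.entries = [None] * (1 << INDEX_BITS)

    def lookup(self, pc):
        """Return (predicted_taken, next_fetch_pc) for the instruction at pc."""
        entry = self.entries[pc & MASK]
        if entry is None or entry["tag"] != pc >> INDEX_BITS:
            return False, pc + 4                    # miss: default is not taken
        taken = entry["state"] <= 1
        return taken, (entry["target"] if taken else pc + 4)

    def update(self, pc, taken, target):
        """Called once the real direction and target are known in a later stage."""
        index, tag = pc & MASK, pc >> INDEX_BITS
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            entry = {"tag": tag, "target": target, "state": WEAK_NOT_TAKEN}
            self.entries[index] = entry
        state = entry["state"]
        entry["state"] = max(state - 1, 0) if taken else min(state + 1, 3)
        entry["target"] = target


btb = BTB()
first = btb.lookup(0x1008)          # (False, 0x100C): nothing stored yet
btb.update(0x1008, True, 0x1000)    # branch resolved: taken, target 0x1000
second = btb.lookup(0x1008)         # (True, 0x1000)
other = btb.lookup(0x2008)          # (False, 0x200C): same index, different tag
btb.flush()
after_flush = btb.lookup(0x1008)    # (False, 0x100C)
```

The tag check is what separates this from a plain history table: the branch at 0x2008 maps to the same entry as 0x1008 but is not given its target.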
Global/Local Branch Correlation
• The history of previous branches’ outcomes may influence the outcome
of the current branch.
• E.g., take an if … then … else construct. If the if condition
evaluates as true, all the following else conditions will be false.
Hence, this affects the conditional branches used for the if and else
conditions.
• Similarly, the previous outcomes of the current branch itself may
influence the present outcome of the branch.
• E.g., as discussed earlier, think of the conditional branch at the
end of a for … loop. If the previous outcome was taken, the
current outcome will most probably also be taken.
Global Branch Correlation
• The outcomes of recently executed branches in the execution path are correlated
with the outcome of the next branch.
• If the first branch is not taken, the second is also not taken.
if (cond1)
…
if (cond1 AND cond2)
• If the first branch is taken, the second is not taken.
if cond1 then (a = 2)
…
if (a == 0)
Global Branch Correlation .
if (cond1)
…
if (cond2)
…
if (cond1 AND cond2)
…
• If first and second branches are both taken, then third branch is also
taken.
• If first or second branch is not taken, then third branch is also not
taken.
Global Branch Correlation ..
if (aa == 2) ; B1
aa = 0;
if (bb == 2) ; B2
bb = 0;
if (aa != bb) { ; B3
…
}
• If B1 is taken (i.e., aa == 0 @B3) and B2 is taken (i.e., bb == 0 @B3),
then B3 is certainly not taken.
• Associate branch outcomes with the global outcome history of all
branches.
• Make a prediction based on the outcome of the branch the last time
the same global branch history was encountered.
• Keep track of the global outcome history of all branches in a register
called the Global History Register (GHR).
• Use the GHR to index into the Pattern History Table (a table of 2-bit
counters), which records the outcome that was seen for that GHR value
in the recent past.
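A GHR-indexed pattern history table can be sketched as follows. The example replays the earlier three-branch case (the third branch is taken only when the first two are), with the two conditions cycling through all four combinations. The class name, the 5-bit history length, the counter convention (values 2 and 3 predict taken) and the initial counter value are assumptions chosen for this illustration.

```python
class GlobalPredictor:
    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                              # Global History Register
        self.pht = [1] * (1 << history_bits)      # 2-bit counters, start weakly not taken

    def predict(self):
        return self.pht[self.ghr] >= 2            # upper half of the counter predicts taken

    def update(self, taken):
        c = self.pht[self.ghr]
        self.pht[self.ghr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def run(periods):
    gp = GlobalPredictor(history_bits=5)
    correct = []
    combos = [(True, True), (True, False), (False, True), (False, False)]
    for _ in range(periods):
        for cond1, cond2 in combos:
            # Three branches in program order: cond1, cond2, cond1 AND cond2.
            for taken in (cond1, cond2, cond1 and cond2):
                correct.append(gp.predict() == taken)
                gp.update(taken)
    return correct


results = run(50)
late_misses = results[-120:].count(False)   # mispredictions in the last 10 periods
print(late_misses)                          # 0
```

For this particular outcome stream, five bits of history are enough to make every history value map to one next outcome, so the predictor becomes exact after warm-up. A shorter history would leave some histories ambiguous.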
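[Figure: BTB with global history. Labels: PC (bits 63:8 and 7:0), Address, BTB
entry fields Valid, Tag, Target Address and Dir Bits, Global Branch History
Register (shown holding 1 0 1 1), Branch History Table (BHT), tag
comparator (=), Decode, PC + 4, Computed Address, Outcome, FSM-NSL.]
Note: The sizes of the BTB and BHT may not be the same.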
Local Branch Correlation
• Idea: Have a per-branch history register.
• Associate the predicted outcome of a branch with branch
outcome history of the same branch.
• Make a prediction based on the outcome of the branch the
last time the same local branch history was encountered.
• Local history/branch predictor.
• Uses two levels of history (Per-branch history register +
history at that history register value).
Local Branch Correlation .
• for (i = 1; i <= 4; i++) { }
• If the loop test is done at the end of the body, the
corresponding branch will execute the pattern (1110)^n, where
1 and 0 represent taken and not taken respectively, and n is
the number of times the loop is executed.
• If we know the direction this branch took in its last three
executions, we can predict the next branch direction.
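The claim that three bits of per-branch history are enough for the 1110 pattern can be checked with a small two-level model. This is a sketch under assumptions: the class name, the table sizes, the per-branch pattern tables, the 4-byte instruction alignment and the branch address 0x400 are all illustrative choices, not details from the slides.

```python
class LocalPredictor:
    def __init__(self, history_bits=3, index_bits=4):
        self.hmask = (1 << history_bits) - 1
        self.imask = (1 << index_bits) - 1
        # Level 1: one local branch history register (LBHR) per table entry.
        self.lbhr = [0] * (1 << index_bits)
        # Level 2: 2-bit counters, one per (entry, history value) pair.
        self.pht = [[1] * (1 << history_bits) for _ in range(1 << index_bits)]

    def _index(self, pc):
        return (pc >> 2) & self.imask             # assumes 4-byte instructions

    def predict(self, pc):
        i = self._index(pc)
        return self.pht[i][self.lbhr[i]] >= 2     # upper half predicts taken

    def update(self, pc, taken):
        i = self._index(pc)
        h = self.lbhr[i]
        c = self.pht[i][h]
        self.pht[i][h] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.lbhr[i] = ((h << 1) | int(taken)) & self.hmask


lp = LocalPredictor()
branch_pc = 0x400                                 # hypothetical address of the loop branch
correct = []
for taken in [True, True, True, False] * 50:      # the pattern (1110)^50
    correct.append(lp.predict(branch_pc) == taken)
    lp.update(branch_pc, taken)

late_misses = correct[-40:].count(False)          # last 10 executions of the loop
print(late_misses)                                # 0
```

Each of the four 3-bit histories that occur in the repeating pattern (111, 110, 101, 011) is followed by exactly one outcome, which is why the predictor stops missing once the counters have warmed up.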
Local Branch Correlation
[Figure: BTB with local history. Labels: PC (bits 63:8 and 7:0), Address, BTB
entry fields Valid, Tag, Target Address, Dir Bits and LBHR (Local Branch
History Register), tag comparator (=), Decode, PC + 4, Computed Address,
FSM-NSL, Update History.]
Tournament Branch Predictors
• Uses multiple predictors and tracks, for each branch, which predictor
yields the best results.
• A typical tournament predictor might contain two predictions for each
branch index: one based on local information and one based on global
branch behaviour.
• A selector chooses which predictor to use for any given
prediction.
• The selector can operate similarly to a 1-bit or 2-bit predictor,
favouring whichever of the two predictors has been more accurate.
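The selector can be sketched as a 2-bit counter that moves toward whichever predictor was right when the two disagree. The class name and the stand-in predictors below are hypothetical: predictor A always says taken and predictor B is assumed to be perfect, purely to make the selector's behaviour visible.

```python
class TournamentSelector:
    def __init__(self):
        self.choice = 1      # 0-1 favour predictor A, 2-3 favour predictor B

    def select(self, pred_a, pred_b):
        return pred_b if self.choice >= 2 else pred_a

    def update(self, pred_a, pred_b, taken):
        a_ok, b_ok = pred_a == taken, pred_b == taken
        if a_ok and not b_ok:
            self.choice = max(self.choice - 1, 0)
        elif b_ok and not a_ok:
            self.choice = min(self.choice + 1, 3)
        # When both are right or both are wrong, the selector is left unchanged.


selector = TournamentSelector()
misses = 0
for taken in [True, False] * 4:          # an alternating branch
    pred_a = True                        # stand-in: always predicts taken
    pred_b = taken                       # stand-in: assumed perfect predictor
    if selector.select(pred_a, pred_b) != taken:
        misses += 1
    selector.update(pred_a, pred_b, taken)

print(selector.choice, misses)           # 3 1
```

The selector starts out trusting predictor A, pays for one wrong prediction, and then settles on predictor B. A real design would keep one such selector per branch index rather than a single global one.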
Exceptions - Interrupts
• Events other than branches that alter the flow of instructions.
• Initially designed to handle unexpected events such as undefined
instructions, later extended to let I/O devices communicate
with the processor.
Exceptions
• Exceptional events that occur at run time within a
processor:
– OS calls by user program (Internal)
– Arithmetic overflow, Divide by zero (Internal)
– Using undefined Instructions (Internal)
– Misaligned Instructions (Internal)
– Hardware malfunctions, (e.g., Bus error) (Internal / External)
• Since an exception occurs as a result of executing an
instruction, exceptions are synchronous by nature.
RISC-V Privilege Modes
• A CPU has three privilege modes: Machine, Supervisor, and User.
Each privilege mode has its own user registers, control and status
registers (CSRs) for trap handling, and a dedicated stack area.
• While operating in User mode, a context switch is required to handle
an event in Supervisor mode.
• The software sets up the system for a context switch, and then an
ECALL instruction is executed, which synchronously switches control
to the environment-call-from-User-mode exception handler.
• A trap is the synchronous transfer of control to a trap handler
caused by an exceptional condition occurring within a RISC-V thread.
Trap handlers usually execute in a more privileged environment.
Interrupts
• Interrupts are service requests from hardware blocks to the
processor core.
– External devices / blocks
• Since an interrupt may occur at any time and is typically
not part of the instruction execution sequence, interrupts are
asynchronous by nature.
Exceptions - Interrupts
• E.g., a hardware malfunction during an add instruction.
• The Machine Exception Program Counter (MEPC) contains the
address of the offending instruction.
• Either branch to a single fixed location, with a Cause Register
specifying the source of the exception (RISC-V),
• or use vectored interrupts, branching to a separate location
depending on the source (base register + offset as per the
source).
What to do?
• The operating system can then take the appropriate action,
which may involve providing some service to the user
program, taking some predefined action in response to a
malfunction, or stopping the execution of the program and
reporting an error.
• After performing whatever action is required because of the
exception, the operating system can terminate the program or
continue its execution, using the MEPC to determine
where to restart the execution of the program.
RISC-V Exceptions - Registers
• Machine Exception Program Counter (MEPC): A 64-bit
register used to hold the address of the affected instruction.
(Such a register is needed even when exceptions are
vectored.)
• Machine Cause register (MCAUSE): A register
used to record the cause of the exception. In the RISC-V
architecture, this register is 64 bits wide; most bits are currently
unused.
RISC-V Exceptions
• A pipelined implementation treats exceptions as another form of
control hazard.
• Suppose there is a hardware malfunction in an add instruction.
• The instructions that follow the add instruction must be flushed from
the pipeline and instructions from the exception address should be
fetched.
RISC-V Exceptions .
• To flush the instruction in the IF stage, the instruction in the
IF/ID pipeline register can be turned into a nop.
• To flush the instruction in the ID stage, the control signal ID.Flush
from the hazard detection unit selects the multiplexer input in
the ID stage that zeros the control signals, as is done for stalls.
• To flush the instruction in the EX stage, a new signal
EX.Flush causes new multiplexers to zero the control lines.
• Instructions in the MEM and WB stages complete their
operations.
RISC-V Exceptions ..
• To start fetching instructions from the exception location, an
additional input is added to the PC multiplexer that feeds in the
exception location address.
• The final step is to save the address of the offending
instruction in the machine exception program counter
(MEPC).
[Figure: Exception Control: divide by zero exception. Pipelined datapath with
the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers. Labels: Exception
Unit with Conditions input and IF.Flush, ID.Flush and EX.Flush outputs,
MCAUSE and MEPC registers, Exception Address input to the PC multiplexer,
NOP input, Inst Mem, Control, ALU, ALU Control, Forwarding Unit.]
RISC-V Exceptions …
• With five instructions active in any clock cycle, the hard part is to associate an
exception with the appropriate instruction. Also, multiple exceptions can occur
simultaneously in a single clock cycle.
• The solution is to prioritize the exceptions so that it is easy to determine which is
serviced first.
• In RISC-V implementations, the hardware sorts exceptions so that the earliest
instruction is interrupted.
• The MEPC register captures the address of the interrupted instruction, and the
MCAUSE register records the highest priority exception in a clock cycle if more
than one exception occurs.
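The "earliest instruction wins" rule can be sketched as a scan over the pipeline stages from oldest to youngest. This is an illustrative model, not the hardware: the function name, the dictionary layout, the restriction to the IF, ID and EX stages, and the use of text labels instead of numeric cause codes are all assumptions.

```python
# Stages that can raise an exception in this sketch, oldest instruction first.
STAGES_OLDEST_FIRST = ["EX", "ID", "IF"]


def select_exception(pending):
    """Pick the exception of the oldest instruction.

    pending maps a stage name to (pc, cause) for each exception raised in the
    current clock cycle. Returns the values to record and the stages to flush,
    or None when nothing is pending.
    """
    for position, stage in enumerate(STAGES_OLDEST_FIRST):
        if stage in pending:
            pc, cause = pending[stage]
            return {
                "mepc": pc,
                "mcause": cause,
                # Flush the offending instruction and every younger one.
                "flush": STAGES_OLDEST_FIRST[position:],
            }
    return None


both = select_exception({
    "ID": (0x104, "undefined instruction"),
    "EX": (0x100, "hardware malfunction"),
})
only_id = select_exception({"ID": (0x104, "undefined instruction")})

print(both)      # the EX-stage exception at 0x100 is serviced first
print(only_id)   # only IF and ID are flushed
```

With two exceptions in the same cycle, the one in EX belongs to the earlier instruction, so its address and cause are recorded and the younger exception is discarded along with the flushed instructions.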
RISC-V Exceptions ….
• If the exception is due to the execution of the instruction, PC – 8, or the PC value from the
ID/EX pipeline register, can be registered in MEPC, and the MCAUSE register can
be updated with this cause.
• If the instruction is an undefined instruction, this is detected at decode in the ID
stage. Only the IF and ID stages need to be flushed; PC – 4, or the PC value from the
IF/ID pipeline register, can be registered in MEPC, and the MCAUSE register can be
updated with this cause.
• In the same way, the other sources of exceptions/interrupts need to be worked out.
• I/O device requests and hardware malfunctions are not associated with a
specific instruction, so the implementation has some flexibility as to when to
interrupt the pipeline. The best way is to complete all the instructions in the
pipeline; to do that, it is enough to load the PC with the
Interrupt/Exception address when the external interrupt is detected.
Hardware-Software Interface
• The hardware and the operating system must work in conjunction so that
exceptions behave as you would expect. The hardware contract is normally to
stop the offending instruction in midstream, let all prior instructions complete,
flush all following instructions, set a register to show the cause of the exception,
save the address of the offending instruction, and then branch to a prearranged
address.
• The operating system contract is to look at the cause of the exception and act
appropriately. For an undefined instruction or hardware failure, the operating
system normally kills the program and returns an indicator of the reason. For an
I/O device request or an operating system service call, the operating system saves
the state of the program, performs the desired task, and, at some point in the
future, restores the program to continue execution.
Precise / Imprecise Exceptions or Interrupts
• The difficulty of always associating the proper exception with the correct
instruction in pipelined computers has led some computer designers to relax this
requirement in noncritical cases.
• Such processors are said to have imprecise interrupts or imprecise exceptions.
E.g., suppose an exception is caused by an instruction in the EX stage. If the
hardware registers the current PC value into MEPC, rather than the address of
that instruction, it is an imprecise exception.
• RISC-V and the vast majority of computers today support precise interrupts or
precise exceptions.
RISC-V Interrupt Vector Address
• Many RISC-V computers store the exception entry address in a
special register named Machine Trap Vector (MTVEC), which the OS
can load with a value of its choosing.
• An exception address fixed in the hardware is not accessible to the OS, hence
this register.
• It is a CSR and is accessed using the CSR instructions.
Interrupt Controllers
• Local Interrupt Controllers
– Core Local Interrupter (CLINT)
– Core Local Interrupt Controller (CLIC)
• Global Interrupt Controllers
– Platform-Level Interrupt Controller (PLIC)
Core Local Interrupter (CLINT)
• CLINT offers a compact design with a fixed priority scheme, with
preemption support for interrupts from higher privilege levels only.
• The primary purpose of the CLINT is to serve as a simple CPU
interrupter for software and timer interrupts, since it does not control
other local interrupts wired directly to the CPU.
Core Local Interrupt Controller (CLIC)
• CLIC is a fully featured local interrupt controller with configurations
that support programmable interrupt levels and priorities.
• The CLIC also supports nested interrupts (preemption) within a given
privilege level, based on the interrupt level and priority configuration.
• Both the CLINT and CLIC integrate registers mtime and mtimecmp to
configure timer interrupts, and msip to trigger software interrupts.
• Additionally, both the CLINT and the CLIC run at the core clock
frequency.
Global - Platform-Level Interrupt Controller (PLIC)
• PLIC provides system level flexibility for dispatching interrupts to a
single CPU or multiple CPUs in the system.
• Global interrupts that route through the PLIC arrive at the CPU
through a single interrupt connection with a dedicated interrupt ID.
• Each global interrupt has a programmable priority register available in
the PLIC memory map.
• There is also a system level programmable threshold register which
can be used to mask all interrupts below a certain level.
• The PLIC runs off a different clock than local interrupt controllers,
which is typically an integer divided ratio from the core clock.
Interrupt Control and Status Registers (CSRs)
• There are interrupt related CSRs contained in the CPU, as well as
memory mapped configuration registers in the respective interrupt
controllers.
• Both are used to configure and properly route interrupts to a CPU.
• Machine mode interrupt CSRs are discussed here.
• Many Machine mode interrupt CSRs have Supervisor or User mode
equivalents.
• mstatus: Status register containing interrupt enables for all privilege
modes, previous privilege mode, and other privilege level settings.
• mcause: Status register which indicates whether an exception or
interrupt occurred, along with a code to distinguish details of each
type.
• mie: Interrupt enable register for local interrupts when using CLINT
modes of operation. In CLIC modes, this is hardwired to 0 and
interrupt enables are handled using clicintie[i] memory mapped
registers.
• mip: Interrupt pending register for local interrupts when using CLINT
modes of operation. In CLIC modes, this is hardwired to 0 and
pending interrupts are handled using clicintip[i] memory mapped
registers.
• mtvec: Machine Trap Vector register which holds the base address of
the interrupt vector table, as well as the interrupt mode configuration
(direct or vectored) for CLINT and CLIC controllers.
• All synchronous exceptions also use mtvec as the base address for
exception handling in all CLINT and CLIC modes.
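How the handler address follows from mtvec and mcause can be sketched as below for the two standard modes (direct and vectored). The field layout follows the RISC-V privileged architecture: the low two bits of mtvec hold the mode, the remaining bits hold the 4-byte-aligned base, and the top bit of mcause marks an interrupt. The function name and the example base address are assumptions, and the CLIC-specific modes are not modelled.

```python
def trap_target(mtvec, mcause, xlen=64):
    """Return the address the PC is set to when a trap is taken."""
    base = mtvec & ~0b11                           # BASE field, 4-byte aligned
    mode = mtvec & 0b11                            # 0 = direct, 1 = vectored
    is_interrupt = (mcause >> (xlen - 1)) & 1      # top bit of mcause
    code = mcause & ((1 << (xlen - 1)) - 1)        # exception / interrupt code
    if mode == 1 and is_interrupt:
        return base + 4 * code                     # one table slot per interrupt
    return base                                    # exceptions always use the base


BASE = 0x8000_0000                                 # hypothetical handler base address
MACHINE_TIMER_INTERRUPT = (1 << 63) | 7            # interrupt bit set, code 7
ILLEGAL_INSTRUCTION = 2                            # synchronous exception, code 2

vectored = BASE | 1
print(hex(trap_target(vectored, MACHINE_TIMER_INTERRUPT)))   # 0x8000001c
print(hex(trap_target(vectored, ILLEGAL_INSTRUCTION)))       # 0x80000000
print(hex(trap_target(BASE, MACHINE_TIMER_INTERRUPT)))       # 0x80000000
```

In vectored mode only interrupts are spread across the table. A synchronous exception still enters at the base address, which matches the last bullet above.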
• mtvt: Used only in CLIC modes of operation. Contains the base
address of the interrupt vector table for selectively vectored interrupts
in CLIC direct mode, and for all vectored interrupts in CLIC vectored
mode. This register does not exist on designs with a CLINT.
Common Registers to CLIC and CLINT
• msip: Machine mode software interrupt pending register,
used to assert a software interrupt for a CPU.
• mtime: Machine mode timer register which runs at a constant
frequency. Part of the CLINT and CLIC designs. There is a
single mtime register on designs that contain one or more
CPUs.
• mtimecmp: Memory mapped machine mode timer compare
register, used to trigger an interrupt when mtime is
greater than or equal to mtimecmp. There is an mtimecmp
dedicated to each CPU.
Timer Interrupts
• Timer interrupts always trap to Machine mode, unless
delegated to Supervisor mode using the mideleg register.
• Similarly, Machine mode exceptions may be delegated to
Supervisor mode using the medeleg register.
• For designs that also implement User mode, there exist
sideleg and sedeleg registers to delegate Supervisor interrupts
to User mode.
Entry Behaviour for Interrupt Handlers
• Whenever an interrupt occurs, hardware automatically saves and
restores important registers. The following steps are completed as an
interrupt handler is entered.
• Save pc to mepc
• Save Privilege level to mstatus.mpp
• Save mstatus.mie to mstatus.mpie
• Set pc to interrupt handler address, based on mode of operation
• Disable interrupts by setting mstatus.mie = 0
• At this point control is handed over to software where the interrupt
processing begins.
Exit Behaviour for Interrupt Handlers
• At the end of the interrupt handler, the mret instruction will do the
following.
– Restore mepc to pc
– Restore mstatus.mpp to Privilege level
– Restore mstatus.mpie to mstatus.mie
• There may be additional instructions or functionality contained within
the handler, based on the interrupt controller and mode of operation.
• For example, saving/restoring additional user registers, enabling
preemption, and handling global interrupts routed through the PLIC
which require a claim/complete step.
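The hardware steps on handler entry and on mret can be sketched as operations on a small model of the hart state. The mstatus bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11) and the privilege encodings (User 0, Machine 3) follow the RISC-V privileged architecture. The dictionary layout and function names are assumptions, and the model covers only the steps listed in the slides: mret additionally sets MPIE to 1 and resets MPP, which is omitted here.

```python
MIE = 1 << 3                 # mstatus.mie: machine interrupt enable
MPIE = 1 << 7                # mstatus.mpie: previous interrupt enable
MPP_SHIFT = 11               # mstatus.mpp: previous privilege mode, bits 12:11
MPP_MASK = 0b11 << MPP_SHIFT
USER, MACHINE = 0, 3         # privilege mode encodings


def trap_entry(hart, handler_pc):
    """Steps completed by hardware as the interrupt handler is entered."""
    hart["mepc"] = hart["pc"]                           # save pc to mepc
    old = hart["mstatus"]
    new = old & ~(MPP_MASK | MPIE | MIE)                # clear mpp, mpie and mie
    new |= hart["priv"] << MPP_SHIFT                    # save privilege level to mpp
    if old & MIE:
        new |= MPIE                                     # save mie to mpie
    hart["mstatus"] = new                               # mie is now 0: interrupts disabled
    hart["priv"] = MACHINE
    hart["pc"] = handler_pc                             # jump to the handler


def mret(hart):
    """Steps performed by the mret instruction at the end of the handler."""
    status = hart["mstatus"]
    hart["pc"] = hart["mepc"]                           # restore mepc to pc
    hart["priv"] = (status & MPP_MASK) >> MPP_SHIFT     # restore privilege level
    hart["mstatus"] = (status | MIE) if status & MPIE else (status & ~MIE)


hart = {"pc": 0x1000, "priv": USER, "mstatus": MIE, "mepc": 0}
trap_entry(hart, handler_pc=0x8000_0000)
in_handler = dict(hart)      # pc 0x80000000, priv 3, mstatus 0x80, mepc 0x1000
mret(hart)
after_return = dict(hart)    # pc 0x1000, priv 0, mstatus 0x88
```

Keeping the previous enable bit in mpie is what lets the handler run with interrupts off and still return to code that had them on.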
Thank You