RISC-V Pipeline
CERTIFICATE
Certified that the minor project (18EC64) work titled Design and Simulation of 5-Stage
RISC-V Pipeline is carried out by Ishita Singh (1RV20EC195), Sourab
Somasundar (1RV20EC157), Sudeep Joshi (1RV20EC161) and Wafa Naaz Shaik
(1RV20EC191) who are bonafide students of RV College of Engineering, Bengaluru, in
partial fulfillment of the requirements for the degree of Bachelor of Engineering in
ECE of the Visvesvaraya Technological University, Belagavi during the year 2022-23. It
is certified that all corrections/suggestions indicated for the Internal Assessment have
been incorporated in the minor project. The minor project report has been approved as
it satisfies the academic requirements in respect of minor project work prescribed by the
institution for the said degree.
External Viva
1.
2.
DECLARATION
We, Ishita Singh, Sourab Somasundar, Sudeep Joshi and Wafa Naaz Shaik,
students of sixth semester B.E., Department of ECE, RV College of Engineering, Bengaluru,
hereby declare that the minor project titled ‘Design and Simulation of 5-Stage
RISC-V Pipeline’ has been carried out by us and submitted in partial fulfilment
for the award of degree of Bachelor of Engineering in ECE during the year 2022-23.
Further we declare that the content of the dissertation has not been submitted previously
by anybody for the award of any degree or diploma to any other university.
We also declare that any Intellectual Property Rights generated out of this project carried
out at RVCE will be the property of RV College of Engineering, Bengaluru and we will
be one of the authors of the same.
Place: Bengaluru
Date:
Name Signature
1. Ishita Singh(1RV20EC195)
2. Sourab Somasundar(1RV20EC157)
3. Sudeep Joshi(1RV20EC161)
We also express our gratitude to our panel members Dr. Govinda Raju M, As-
sociate Professor and Prof. Ravishankar Holla, Assistant Professor, Department of
ECE for their valuable comments and suggestions during the phase evaluations.
Our sincere thanks to the project coordinators Dr. Veena Devi S V, Prof.
Sindhu Rajendran and Prof. Subramanya K N for their timely instructions and
support in coordinating the project.
Our gratitude to Prof. Narashimaraja P for the organized latex template which
made report writing easy and interesting.
Our sincere thanks to Dr. H. V. Ravish Aradhya, Professor and Head, Depart-
ment of ECE, RVCE for the support and encouragement.
We thank all the teaching staff and technical staff of ECE department, RVCE for
their help.
Lastly, we take this opportunity to thank our family members and friends who pro-
vided all the backup support throughout the project work.
ABSTRACT
CONTENTS
Abstract
List of Figures
List of Tables
Abbreviations
3.3 Instruction Decode module
3.4 Execute Stage module
3.5 Memory Stage module
3.6 Writeback Stage module
3.7 Summary
LIST OF FIGURES
1.1 Methodology
5.1 Test program for Register, Immediate and Store type instructions
5.2 Output of Unpipelined Processor-1
5.3 Output of Unpipelined Processor-2
5.4 Output of Unpipelined Processor-3
5.5 Test program to check Branch, Jump and Upper type instructions
5.6 Output for the Branch, Jump and Upper type instructions
5.7 Test program for the pipelined processor
5.8 Output of the 5-stage pipelined processor
5.9 Output of the Branch Prediction Unit
5.10 Writing to the mtvec register using CSR write immediate (csri)
5.11 Ecall and mret instruction execution
LIST OF TABLES
ABBREVIATIONS
AI Artificial Intelligence
EX Execute
ID Instruction Decode
IF Instruction Fetch
LB Load Byte
LH Load Half-Word
PC Program Counter
RISC-V Reduced Instruction Set Computing-Five
WB Write-Back
RV College of Engineering® , Bengaluru - 560059
CHAPTER 1
INTRODUCTION TO RISC-V PROCESSOR
1.1 Introduction
Reduced Instruction Set Computing-Five (RISC-V) is a free and open-source Instruction
Set Architecture (ISA) that defines the set of instructions a computer processor can
execute. It is designed to be simple, modular, and extensible, allowing
for customization and innovation. RISC-V enables a wide range of applications, from
embedded systems to high-performance computing, fostering collaboration and driving
advancements in computer architecture [16].
One of the advantages of the RISC-V architecture is its ability to improve compilation
speed compared to Complex Instruction Set Computers (CISC). The simplicity of the
instruction set allows for easier decoding and faster execution [14].
Taking up a project on RISC-V offers the advantages of working with an open-source,
customizable architecture that aligns with industry trends. It provides opportunities for
innovation, skill development, and community contribution, while preparing individuals
for future relevance in areas such as embedded systems, IoT, AI, and high-performance
computing [2].
The project aims to create a Central Processing Unit (CPU) based on the RISC-V
architecture. The CPU design follows a 5-stage pipeline approach, dividing instruction
execution into distinct stages for improved efficiency. The project also focuses on
simulating a branch prediction technique as a separate module to resolve control hazards [7].
1.2 Motivation
The motivation to study RISC-V computer architecture stems from its flexible, open-
source nature, making it a prominent choice for research. RISC-V offers opportunities to
explore innovative designs, optimize performance, and contribute to computer architec-
ture advancement.
By working on a RISC-V project, one can explore open-source technologies, gain
insights into computer architecture, and contribute to the growing ecosystem. RISC-V’s
customizable platform enables experimentation, development, and specialized hardware.
Its widespread interest in academia and industry provides networking and knowledge-
sharing opportunities for potential contributions to the field.
1.4 Objectives
The objectives of the project are:
2. To design and simulate Control and Status Registers (CSR) and Branch Prediction
Unit (BPU) modules.
handle complex scenarios and corner cases. The coverage number is determined using a
methodology, and the coverage graph shows increased coverage with more test cases.
The paper [9] by Arul Rathi, C., et al., introduces a dynamic branch predictor that
utilizes a 2-bit saturation counter. When a branch is encountered for the first time, it
is noted in the Pattern History Table(PHT). The predictor will only predict the branch
again if it confirms the branch in its second appearance based on the saturation counter’s
state.
In paper [12], the researchers Shraddha M. Bhagat et al., have focused on optimal
performance across various process technologies of the RISC-V architecture. It explores
15 instructions within the architecture, implementing them using ModelSim, a tool for
simulation and verification of digital designs.
Saif et al. [20] presented a Field Programmable Gate Array (FPGA) implementation
of an educational RISC-V processor suitable for embedded applications. They
demonstrated the feasibility of using RISC-V for educational purposes by designing and
implementing a processor on FPGA. Their work contributed to enhancing understanding
of RISC-V in teaching environments.
Poli et al. [21] designed and implemented a RISC-V processor on FPGA, with a
specific focus on mobility, sensing, and networking applications. They showcased the
adaptability of RISC-V architecture in different application domains and highlighted its
potential for enabling innovative solutions in the field.
Gayathri and Jaya [22] proposed an area-optimized floating-point coprocessor for
RISC-V processors. Their research addressed the need for efficient floating-point com-
putation in RISC-V architectures, contributing to improved performance in applications
requiring high-precision calculations.
Phangestu et al. [23] developed a five-stage pipelined 32-bit RISC-V base integer
instruction set architecture soft microprocessor core in VHDL. Their work delved into the
intricate details of pipeline architecture and provided insights into enhancing processor
performance through optimized pipelining strategies.
Zheng et al. [24] introduced a soft RISC-V processor IP with high-performance and
low-resource consumption for FPGA. By emphasizing resource efficiency, their research
catered to the requirements of resource-constrained environments, making RISC-V an
attractive option for a wider range of applications.
Lai et al. [25] implemented a 32-bit RISC-V architecture processor using Verilog
HDL, with a particular focus on intelligent signal processing and communication systems.
Their work explored the integration of RISC-V in signal processing, paving the way for
applications in communication and data analysis.
These studies collectively contribute to a deeper understanding of RISC-V architec-
ture, its implementation on FPGA, and its applications across various domains.
1. Understand the RV32I standard base Instruction Set Architecture by studying the
RISC-V Instruction Set Manual. Instructions are selected, and their datapaths are
implemented using Verilog HDL and Vivado design suite.
2. Certain types of instructions, which are discussed in the further chapters, are selected
and the datapaths supporting those instructions are implemented.
• Chapter 2 discusses the RV32I base integer instruction set and the implementation
of unpipelined datapath. It explores each instruction type in the RV32I Instruction
Set.
• Chapter 4 explores the Control & Status Registers (CSR) and their implementa-
tion. In addition to this, a control hazard mitigation technique, dynamic branch
prediction unit has been implemented.
• Chapter 5 discusses the results and evaluates the output obtained during the im-
plementation of the
• Chapter 6 concludes the project and discusses the future scope associated with the
design of open-source RISC-V datapath.
1.7 Summary
This chapter introduces the concept of the RISC-V processor and the aim of this project. It
also states the motivation for this project and its problem statement. A brief literature
review is also presented. Finally, a brief methodology of the project is discussed along with
a brief introduction to what each chapter in this report deals with.
The next chapter deals with the instructions in RV32I Integer Instruction Set and
how to go about the implementation of the unpipelined datapath.
CHAPTER 2
RV32I BASE INTEGER INSTRUCTION SET &
IMPLEMENTATION OF UNPIPELINED
PROCESSOR
This chapter serves as an introductory exploration of the RV32I Instruction Set Archi-
tecture (ISA), focusing on its significance within the realm of RISC-V processor design.
RV32I stands for RISC-V 32-bit Integer ISA, forming a fundamental subset of the broader
RISC-V instruction set. The objective of this chapter is to establish a foundational un-
derstanding of RV32I’s essentials.
The RV32I ISA is designed specifically for 32-bit integer operations, forming the
bedrock of various RISC-V processor designs. A noteworthy mandate, outlined in the
ISA Manual [16], is the requirement for every RISC-V design to encompass support for
the complete instruction set under RV32I. This is explored in the first section of the
chapter. Another crucial aspect highlighted in the instruction set manual’s volume two
[18] is the necessity for every RISC-V design to incorporate support for the Machine
mode. As a cornerstone of the RISC-V privilege architecture, the Machine mode governs
the processor’s interactions with memory, peripherals, and underlying resources.
The subsequent segment of this chapter shifts the focus to the design of an unpipelined
processor. While the RV32I ISA defines the instructions, the way these instructions
are executed profoundly impacts overall processor performance.
An introduction to the unpipelined design provides insights into the sequential
execution of instructions, serving as a precursor to subsequent discussions on pipeline
optimization strategies.
The introduction of RV32I instructions takes precedence within this chapter due to its
pivotal role in shaping hardware design, development, and debugging endeavors. A robust
grasp of RV32I instructions is essential for crafting a functional processor architecture
aligned with the RISC-V standard.
the basic arithmetic, logical, memory access, and control flow operations necessary for
general-purpose computation. The RV32I Base Integer Instruction Set consists of the
following categories of instructions:
various types of comparisons, such as checking if one value is equal to another, checking
if one value is less than another, or checking if one value is less than or equal to another.
The comparison instructions are mainly used for making decisions in control flow and
conditional branching. The main integer comparison instructions are:
slt rd, rs1, rs2 Set register rd to 1 if the value in register rs1 is less than the value
in rs2; otherwise, set rd to 0 (signed comparison).
sltu rd, rs1, rs2 Set register rd to 1 if the value in register rs1 is less than the
value in rs2; otherwise, set rd to 0 (unsigned comparison).
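The difference between the signed and unsigned comparisons only shows up when the most significant bit of an operand is set. The following is a minimal behavioural sketch in Python, not the project's Verilog; the helper names (to_signed32, slt, sltu) are illustrative and register values are assumed to be held as 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def slt(rs1, rs2):
    """Signed set-less-than: 1 if rs1 < rs2 as signed values, else 0."""
    return 1 if to_signed32(rs1) < to_signed32(rs2) else 0

def sltu(rs1, rs2):
    """Unsigned set-less-than: 1 if rs1 < rs2 as unsigned values, else 0."""
    return 1 if (rs1 & MASK32) < (rs2 & MASK32) else 0
```

For the pattern 0xFFFFFFFF compared with 1, slt sees -1 < 1 and returns 1, while sltu sees 4294967295 < 1 and returns 0.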
• Instruction Fetch(IF) Stage: At the heart of the processor, the Instruction Fetch
(IF) stage is responsible for fetching instructions from memory based on the pro-
gram counter (PC). It updates the PC to indicate the address of the subsequent
instruction to be executed.
• Instruction Decode(ID) Stage: The Instruction Decode (ID) stage interprets the
fetched instruction and extracts essential information such as the opcode, source
and destination registers, immediate values, and other relevant fields.
• Execute(EX) Stage: The Execution (EX) stage is where the actual operation speci-
fied by the instruction is performed. It encompasses arithmetic and logic operations
through the Arithmetic and Logic Unit(ALU), which takes inputs from registers as
dictated by the instruction’s opcode.
• Write-Back(WB) Stage: The Write Back (WB) stage finalizes the instruction’s
execution by writing the result back to the destination register specified by the
instruction.
• Control Unit: The Control Unit coordinates the data and instruction flow between
the various stages. Additionally, it manages branching and control flow instructions
(e.g., jump, branch) by updating the program counter accordingly.
• Register File: The Register File is a critical component of the processor, hous-
ing the RISC-V processor’s general-purpose registers. It facilitates read and write
operations to these registers, enabling data manipulation and transfer.
• Arithmetic and Logic Unit(ALU): The ALU is responsible for executing arithmetic
and logic operations based on the opcode received from the Instruction Decode
stage. It forms the core computational element of the processor.
• Memory Unit : The Memory Unit, if required, facilitates data access for load/store
instructions. It interacts with the data memory to fetch or store data from or to
the processor’s register file.
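The interaction of the components listed above can be sketched as a single-step behavioural model. This is an illustrative Python sketch under stated assumptions, not the project's Verilog: it supports only ADDI and ADD, the names RegisterFile and step are hypothetical, and instruction memory is assumed to be a word-indexed list. The field positions and the hardwired-zero x0 register follow the RV32I base specification.

```python
MASK32 = 0xFFFFFFFF

class RegisterFile:
    """32 general-purpose registers; x0 always reads as zero."""
    def __init__(self):
        self.regs = [0] * 32

    def read(self, index):
        return self.regs[index]

    def write(self, index, value):
        if index != 0:  # writes to x0 are discarded
            self.regs[index] = value & MASK32

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def step(pc, imem, regs):
    """Run one instruction through IF, ID, EX and WB; return the next PC."""
    instr = imem[pc // 4]                       # IF: fetch at the PC
    opcode = instr & 0x7F                       # ID: extract fields
    rd = (instr >> 7) & 0x1F
    funct3 = (instr >> 12) & 0x7
    rs1 = (instr >> 15) & 0x1F
    rs2 = (instr >> 20) & 0x1F
    funct7 = instr >> 25
    if opcode == 0x13 and funct3 == 0:          # EX: ADDI
        result = regs.read(rs1) + sign_extend(instr >> 20, 12)
    elif opcode == 0x33 and funct3 == 0 and funct7 == 0:  # EX: ADD
        result = regs.read(rs1) + regs.read(rs2)
    else:
        raise NotImplementedError("only ADDI and ADD are modelled")
    regs.write(rd, result)                      # WB: write the result
    return pc + 4                               # PC points to the next word
```

Running the three words 0x00500093 (addi x1, x0, 5), 0xFFD00113 (addi x2, x0, -3) and 0x002081B3 (add x3, x1, x2) through step leaves 2 in x3.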
2.3 Summary
In summary, this chapter deals with the various instructions of the RV32I Integer
Instruction Set: Arithmetic, Logical, Comparison, Immediate, Load and Store, and Control
CHAPTER 3
DESIGN OPTIMIZATION AND THE PIPELINED
PROCESSOR
A pipelined processor operates by guiding instructions through distinct stages: IF
(Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory Access),
and WB (Write-Back). This architectural approach divides the datapath into stages so that
different instructions are handled concurrently at the various stages.
In contrast, the unpipelined version of the RISC-V processor had a different
configuration. It relied on a multitude of control signals from the control unit, and its
modules were not segmented into discrete stages. Keeping the design simple was a pivotal
consideration, which required reducing the number of signals used to orchestrate
processor operations. The RISC-V ISA offers a solution to this challenge because it encodes
comprehensive instruction details within the instruction code itself, in the opcode and
function3 fields. These fields, spanning 7 and 3 bits of the instruction code respectively,
are used to generate the multiplexer select signals in every stage. This optimization
strategy was incorporated into the design.
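As a rough illustration of why these two fields are enough to steer each stage, the sketch below extracts them from a 32-bit instruction word and classifies the instruction. It is a Python model rather than the project's Verilog; the bit positions (opcode in bits [6:0], function3 in bits [14:12]) and opcode values come from the RV32I base encoding, while the function names and class labels are illustrative.

```python
# Standard RV32I major opcodes (instruction bits [6:0]).
OPCODE_CLASS = {
    0b0110011: "R-type",
    0b0010011: "I-type",
    0b0000011: "Load",
    0b0100011: "Store",
    0b1100011: "Branch",
    0b1101111: "JAL",
    0b1100111: "JALR",
    0b0110111: "LUI",
    0b0010111: "AUIPC",
}

def control_fields(instr):
    """Return (opcode, function3) taken directly from the instruction word."""
    opcode = instr & 0x7F            # 7 bits, [6:0]
    function3 = (instr >> 12) & 0x7  # 3 bits, [14:12]
    return opcode, function3

def instruction_class(instr):
    """Classify an instruction by its opcode alone."""
    opcode, _ = control_fields(instr)
    return OPCODE_CLASS.get(opcode, "unsupported")
```

Because every stage can recompute its own select signals from these 10 bits, only the two fields need to travel through the pipeline registers instead of a wide bundle of decoded control signals.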
In a real-world context, the CPU is positioned at the core, serving as the interface
between instruction memory and data memory. In the devised structure, each stage
is defined based on its operation within the processor. The IF stage, for
instance, was integrated with the cs_i_n signal (chip select for instruction memory, active
low) to facilitate interaction with the instruction memory. Similarly, the MEM stage
incorporated signals such as rd (read), wr (write), and cs_d_n (chip select for data memory,
active low) to accommodate its specific functionalities. All of this is contained in a single
top-level module, Core.v, as shown in Figure 3.1 (a). The signals i_data, cs_i_n, and
i_addr are used to interface the processor with instruction memory. The signals
d_data, cs_d_n, wr, rd, and Data_write_MEM are used to interface the processor with data
memory.
The internal architecture of the IF.v module, as shown in Figure 3.2, takes the current
PC and adds 4 to it, creating a new signal called PC_4. Additionally, the PC is
combined with another value called immOut, which is shifted by one bit to the left, to
form the branch or jump target. These values are the candidates for a signal called Next_PC.
The choice of Next_PC is governed by two signals: PC_src and jal. PC_src
indicates whether a branch or a jump is to be taken. The jal signal, on the other
hand, indicates that the instruction is JALR, in which case the PC must be loaded
with the result produced by the ALU.
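The selection of the next PC described above can be sketched as follows. This Python model is only an illustration: the function name next_pc is hypothetical, the JALR flag is given the parameter name jalr, immOut is assumed to hold the offset before the one-bit left shift, and giving the JALR case priority over PC_src is an assumption rather than something stated in the text.

```python
MASK32 = 0xFFFFFFFF

def next_pc(pc, imm_out, alu_result, pc_src, jalr):
    """Choose the next PC from the three candidates of the IF stage."""
    pc_4 = (pc + 4) & MASK32                 # sequential instruction
    target = (pc + (imm_out << 1)) & MASK32  # PC-relative branch/jump target
    if jalr:
        return alu_result & MASK32           # JALR: PC comes from the ALU
    if pc_src:
        return target                        # taken branch or jump
    return pc_4
```

For example, with pc = 0x100 and imm_out = 8, a taken branch moves to 0x110, while an untaken one moves to 0x104.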
The ID.v module (Instruction Decode), as shown in Figure 3.3, is driven by the
clock (clk) and reset (rst) signals. It relies on three signals from the WB (Write-Back)
stage: RegWrite, rd_WB, and Data_WB. When the RegWrite signal is active,
the Data_WB value is stored in the register at the address given by rd_WB.
The ID stage also takes the instruction code from the IF stage and decodes it. It then
passes three pieces of information (function3, opcode, and rd) to the
stages that come after it. The PC value from the IF stage is carried forward
unchanged to the next stage. Another critical signal, alu_ctrl, is also passed to the next
stage. The alu_ctrl signal tells the ALU (Arithmetic and Logical
Unit) which specific operation to perform. Lastly, immOut represents the
immediate value embedded within the instruction code.
As shown in the internal architecture of the ID stage in Figure 3.4, it comprises
three important components: the register file, the immediate generator, and the control
unit modules. The register file contains all registers from x0 to x31. It also
generates a signal called read_data_valid, indicating when the data retrieved from the
read ports of the register file is valid.
The instruction code is decoded within this module, which
extracts three fields: rs1 (register source 1), rs2 (register
source 2), and rd (register destination). The instruction code is also fed into the immediate
generator module. Here, the immediate value is extracted and sign-extended based
on the type of instruction indicated by the opcode signal.
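The behaviour of an immediate generator can be sketched as below. This is a Python illustration using the standard RV32I I-type, S-type and U-type immediate layouts, not the project's module: the name imm_gen is hypothetical, and branch and jump immediates are deliberately left out because the text does not specify how the design stores them in immOut.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def imm_gen(instr):
    """Extract and sign-extend the immediate selected by the opcode."""
    opcode = instr & 0x7F
    if opcode in (0x13, 0x03, 0x67):      # I-type: ALU immediate, load, JALR
        return sign_extend(instr >> 20, 12)
    if opcode == 0x23:                    # S-type: store, immediate is split
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode in (0x37, 0x17):            # U-type: LUI, AUIPC
        return sign_extend(instr & 0xFFFFF000, 32)
    raise ValueError("immediate format not modelled in this sketch")
```

For instance, the word 0xFE20AE23 (sw x2, -4(x1)) yields -4, assembled from bits [31:25] and [11:7] of the instruction.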
Furthermore, the instruction code is passed to the control unit module. The
control unit generates the alu_ctrl signal, which is then sent to the EX stage.
The alu_ctrl mapping is given in Table 3.1.
The EX.v module, as shown in Figure 3.5, receives inputs from the preceding
ID stage, including Read1, Read2, alu_ctrl, alu_data_valid, rd, immOut, func3,
opcode, and PC.
Several of these inputs, namely PC, rd, immOut, func3, and opcode, are relayed
directly to the subsequent stage so that they remain available to the rest of
the pipeline.
Furthermore, the module feeds the result it generates and
the immOut value back to the IF stage. This feedback path is needed
for branch and jump instructions.
Among the received inputs, the opcode facilitates the intrinsic multiplexers within
the EX stage to judiciously select the appropriate operands to channel to the ALU, thus
orchestrating the precise operation as mandated by the instruction.
In parallel, the func3 signal assumes a position of significance, especially in the context
of store instructions. This value plays a vital role in executing sign extension before the
contents of Read2 are dispatched to the subsequent stage for storage in the data memory
unit.
The internal architecture of the EX stage is shown in Figure 3.6. This stage comprises
three sub-modules, encompassing multiplexers and comparators. At its
core, the Arithmetic and Logical Unit (ALU) performs
operations on operands. These operations are guided by the alu_ctrl signal,
generated within the Control Unit during the Instruction Decode (ID) stage. This signal
reaches the ALU through the pipeline register.
Preceding the ALU, two multiplexers select the operands required
by each instruction. For Register-Register and
Branch type instructions, Read1 and Read2 are sent to the ALU. For Immediate-Register,
Load and Store instructions, Read1 and immOut are passed to
the ALU. For Jump instructions, PC and immOut are passed to the ALU.
The select signals of these two multiplexers
are generated from the opcode signal passed from the ID stage.
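The operand selection performed by the two multiplexers can be sketched as a single function of the opcode. This Python model is illustrative only: the opcode constants are the standard RV32I values, the function name select_operands is hypothetical, and only JAL is treated as a jump here because the text does not spell out the operand choice for JALR, LUI or AUIPC.

```python
OP_R, OP_IMM = 0b0110011, 0b0010011
OP_LOAD, OP_STORE = 0b0000011, 0b0100011
OP_BRANCH, OP_JAL = 0b1100011, 0b1101111

def select_operands(opcode, read1, read2, imm_out, pc):
    """Return the (operand_a, operand_b) pair routed to the ALU."""
    if opcode in (OP_R, OP_BRANCH):
        return read1, read2        # register-register and branch compare
    if opcode in (OP_IMM, OP_LOAD, OP_STORE):
        return read1, imm_out      # base register plus immediate
    if opcode == OP_JAL:
        return pc, imm_out         # PC-relative jump target
    raise ValueError("opcode not modelled in this sketch")
```

Deriving both select lines from the opcode alone is what allows the EX stage to work without extra control signals from the ID stage.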
The Store Sign Extend module plays a pivotal role in extending signs to words, half-
words, or bytes, contingent upon the specific store instruction’s nature. Output stemming
from this module seamlessly progresses to the subsequent stage, particularly the Data
Write bus intended for data memory.
The Branch-Jump module determines whether a branch occurs. This
module examines the flags produced by the ALU in the context of the branch
instruction being executed. For branch instructions, the
decision is based on the flag values.
This decision drives PC_src, which is set to '1' when
a branch is taken. When the branch is not taken, PC_src
is set to '0'. When a jump instruction is encountered, PC_src is set directly to '1'.
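A behavioural sketch of this decision is given below. It is written in Python under stated assumptions: the function3 encodings of the six branch instructions are the standard RV32I ones, the lt and ltu flags are those named in the text, and the eq flag and the function name pc_src are assumptions introduced for the illustration.

```python
OP_BRANCH, OP_JAL, OP_JALR = 0b1100011, 0b1101111, 0b1100111

def pc_src(opcode, funct3, eq, lt, ltu):
    """Return 1 when the PC must leave the sequential path, else 0."""
    if opcode in (OP_JAL, OP_JALR):
        return 1                       # jumps are always taken
    if opcode != OP_BRANCH:
        return 0
    taken = {
        0b000: eq,        # BEQ
        0b001: not eq,    # BNE
        0b100: lt,        # BLT
        0b101: not lt,    # BGE
        0b110: ltu,       # BLTU
        0b111: not ltu,   # BGEU
    }
    return 1 if taken.get(funct3, False) else 0
```

Only three flags are needed because BNE, BGE and BGEU are the logical complements of BEQ, BLT and BLTU.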
The jalr signal signifies the emergence of the JALR instruction, which is the only
instruction which requires PC to be changed to the result produced by the ALU.
The MEM module, as shown in Figure 3.7, takes inputs
from prior stages, including func3, opcode, the ALU flags (lt and ltu), rd_r (read address of
the register file), and PC. These inputs are passed on to the next stage, ensuring their
ongoing use in the pipeline.
This module also handles data memory interactions using signals such as d_data (data
read from the data memory), d_addr (address for the data memory), rd (read signal),
wr (write signal), and cs_d_n (chip select for the data memory). These signals manage
communication with the data memory unit. Data_Store is the data that is to be
stored into the data memory in the case of Store-type instructions.
The internal architecture of the MEM stage is shown in the Figure 3.8. The opcode
undergoes two comparisons within the module. It is matched against the opcodes of
load and store instructions using comparators. In the case of a load instruction match,
the rd (read) signal is activated. For a store instruction match, the wr (write) signal is
activated. When either the wr or rd signal is active, the chip select signal cs_d_n is also
asserted. When memory operations are inactive, the memory interface lines transition to a
high-impedance state (Z-state).
The rest of the signals are forwarded directly to the subsequent stage without alteration.
This ensures the flow of relevant data for further processing.
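The two opcode comparisons and the chip-select logic of the MEM stage can be sketched as follows. This Python model is an illustration only: the opcode constants are the standard RV32I load and store opcodes, the function name mem_control is hypothetical, the chip select is treated as active low as stated earlier in the chapter, and the high-impedance state of the idle bus is not modelled.

```python
OP_LOAD, OP_STORE = 0b0000011, 0b0100011

def mem_control(opcode):
    """Return (rd, wr, cs_d_n) for the data memory interface."""
    rd = 1 if opcode == OP_LOAD else 0    # comparator against the load opcode
    wr = 1 if opcode == OP_STORE else 0   # comparator against the store opcode
    cs_d_n = 0 if (rd or wr) else 1       # active low: 0 selects the memory
    return rd, wr, cs_d_n
```

A load therefore produces (1, 0, 0), a store (0, 1, 0), and any other instruction leaves the memory deselected with (0, 0, 1).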
The WB.v module is shown in Figure 3.9. It generates DataOut_WB to be written
back to the register file depending on the type of the instruction indicated by opcode and
func3. The signals RegWrite and rd_WB instruct the register file to perform a write
operation at the address indicated by rd_WB.
As shown in Figure 3.10, the SIGN EXTEND module plays a critical role in this
phase. It extends the read data sourced from the data memory to 32 bits. This
operation is essential for load instructions such as Load Half-Word (LH),
Load Byte (LB), Load Half-Word Unsigned (LHU), and Load Byte Unsigned (LBU). LH and LB
require the data from the MEM stage’s data memory to be sign-extended, whereas LHU and
LBU require it to be zero-extended. The choice of signed or unsigned extension and of word,
half-word, or byte width is encoded within the func3 field.
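The extension applied to load data can be sketched as below. This is a Python illustration using the standard RV32I func3 encodings for the load instructions; the function name load_extend is hypothetical, and the requested byte or half-word is assumed to be already aligned to the low bits of the data read from memory.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def load_extend(data, funct3):
    """Return the 32-bit value written back for a load instruction."""
    if funct3 == 0b000:                       # LB: sign-extend a byte
        return sign_extend(data, 8) & MASK32
    if funct3 == 0b001:                       # LH: sign-extend a half-word
        return sign_extend(data, 16) & MASK32
    if funct3 == 0b010:                       # LW: full word, no extension
        return data & MASK32
    if funct3 == 0b100:                       # LBU: zero-extend a byte
        return data & 0xFF
    if funct3 == 0b101:                       # LHU: zero-extend a half-word
        return data & 0xFFFF
    raise ValueError("func3 does not encode an RV32I load")
```

The byte 0x80 illustrates the difference: LB writes back 0xFFFFFF80, whereas LBU writes back 0x00000080.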
The lt and ltu flags, representing less than and less than unsigned comparisons,
respectively, are combined through an OR gate to act as the selection line for a multiplexer.
This multiplexer selects between the 32-bit values 1 and 0. This mechanism is
needed for the SET instructions. Depending on the result produced by the
ALU, the appropriate data (1 or 0) must be written back to the destination register
within the register file.
When the opcode corresponds to a jump instruction, the value written back must be the
return address of the jump instruction, namely PC + 4. Consequently, this addition
(PC + 4) is performed and the result is routed back to the ID stage.
For instructions categorized as R-type, I-type, and U-type, the logical course of ac-
tion involves writing back the outcome generated by the ALU. This data is seamlessly
integrated into the register bank, serving as a fundamental step in the broader pipeline
flow.
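The write-back selection described in this section can be summarised in one function. The sketch below is a Python illustration, not the project's WB.v: the opcode and func3 values are the standard RV32I ones (func3 010 and 011 for the SET instructions), the function name writeback_data is hypothetical, load_data is assumed to be already extended, and the ALU is assumed to raise lt only for signed and ltu only for unsigned SET instructions, which is what makes the OR of the two flags usable as a select line.

```python
MASK32 = 0xFFFFFFFF
OP_R, OP_IMM, OP_LOAD = 0b0110011, 0b0010011, 0b0000011
OP_JAL, OP_JALR = 0b1101111, 0b1100111

def writeback_data(opcode, funct3, alu_result, load_data, pc, lt, ltu):
    """Select the value that is written back to the register file."""
    if opcode == OP_LOAD:
        return load_data & MASK32             # extended data from memory
    if opcode in (OP_JAL, OP_JALR):
        return (pc + 4) & MASK32              # return address of the jump
    if opcode in (OP_R, OP_IMM) and funct3 in (0b010, 0b011):
        return 1 if (lt or ltu) else 0        # SET instructions: 1 or 0
    return alu_result & MASK32                # R-type, I-type and U-type
```

Whether the selected value is actually stored is decided separately by RegWrite, so stores and branches simply never assert it.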
3.7 Summary
To summarize, this chapter deals with design optimization and the various
stages of the pipelined processor design: Instruction Fetch, Instruction Decode, Execute,
Memory and Write-Back. The functionality of each stage is explained. All
these stages are included in a single top-level module, Core, which is used to interface
the processor with data memory and instruction memory.
The next chapter deals with the control and status registers, which control the
processor’s behaviour, and the branch prediction unit, a special module that helps to
improve instruction throughput and overall performance by reducing the impact of the
stalling created by the branch instructions on the pipeline.
CHAPTER 4
CONTROL STATUS REGISTERS & BRANCH
PREDICTION UNIT
4.1 Modes in RISC-V
The modes in RISC-V are used to control the access to resources and to protect the
system from unauthorized access. The modes in RISC-V are implemented using a mech-
anism called privilege levels. Each mode has a different privilege level, and instructions
can only be executed in modes with a privilege level equal to or greater than the privilege
level of the instruction. RISC-V has four different modes:
• User Mode: Used by user applications. Fewer privileges than machine mode.
• Supervisor Mode: Used by the operating system. More privileges than user mode.
1. Control and Status Registers (CSRs): Machine mode has access to all control and
status registers (CSRs), which provide information about the processor’s state and
configuration. CSRs are used for managing interrupts, exceptions, memory protec-
tion, and other system-related tasks.
2. Interrupts and Exceptions: Machine mode handles the highest priority interrupts
and exceptions, including external interrupts from peripherals and environment
calls (ECALLs) from user-level programs. It uses the mtvec register to set the
base address of the trap vector, where it can find interrupt and exception handlers.
Machine mode provides the foundation for creating a secure and robust operating sys-
tem that can manage and protect user-level applications and services effectively. It is
crucial for the proper functioning and security of the RISC-V-based systems. In order to
implement the M-mode, the procedure followed is,
1. CSR address, function and opcode extraction: In order to execute CSR instructions,
the address of the register being written, read, set or
cleared has to be known. The addresses of the M-mode registers being used are:
These functions are identified by [31:20] bits of the 32 bit instruction. The various
instructions used in modifying CS registers are:
(a) CSR Read-Write (csrrw): reads the value from the specified CSR into
the destination register (rd) and then writes the value from the source register
into the CSR. This instruction is useful for swapping values between a general-
purpose register and a CSR.
(b) CSR Read and Set (csrrs): reads the value from the specified CSR into
the destination register (rd) and then sets selected bits in the CSR using the
value from the source register. It is commonly used for setting specific control
bits in the CSR without affecting other bits.
(c) CSR Read and Clear (csrrc): reads the value from the specified CSR into
the destination register (rd) and then clears selected bits in the CSR using the
value from the source register. Like csrrs, it is useful for modifying specific
control bits in the CSR.
(d) CSR Read-Write Immediate (csrrwi): reads the value from the specified
CSR into the destination register (rd) and then writes a zero-extended 5-bit
immediate value into the CSR. It is used for directly writing a small immediate
value into the CSR.
(e) CSR Read and Set Immediate (csrrsi): reads the value from the specified
CSR into the destination register (rd) and then sets selected bits in the CSR
using a zero-extended 5-bit immediate. This instruction is used to directly set
specific bits in the CSR.
(f) CSR Read and Clear Immediate (csrrci): reads the value from the specified
CSR into the destination register (rd) and then clears selected bits in the
CSR using a zero-extended 5-bit immediate. It is used to directly clear specific
bits in the CSR.
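The six CSR instructions above share one read-modify-write pattern, which the following sketch makes explicit. It is a Python illustration, not the project's CSR module: the func3 encodings (001 csrrw, 010 csrrs, 011 csrrc, 101 csrrwi, 110 csrrsi, 111 csrrci) are the standard ones, the function name csr_execute is hypothetical, and the rules that suppress the read when rd is x0 or the write when the source is x0 are not modelled.

```python
MASK32 = 0xFFFFFFFF

def csr_execute(funct3, csr_value, rs1_value, uimm):
    """Return (value for rd, new CSR value) for one CSR instruction."""
    # Bit 2 of func3 selects the zero-extended 5-bit immediate as source.
    source = (uimm & 0x1F) if (funct3 & 0b100) else (rs1_value & MASK32)
    operation = funct3 & 0b011
    if operation == 0b01:                    # csrrw / csrrwi: overwrite
        new_value = source
    elif operation == 0b10:                  # csrrs / csrrsi: set bits
        new_value = csr_value | source
    elif operation == 0b11:                  # csrrc / csrrci: clear bits
        new_value = csr_value & ~source & MASK32
    else:
        raise ValueError("func3 does not encode a CSR instruction")
    return csr_value, new_value              # rd always gets the old value
```

For example, csrrs with a CSR value of 0x8 and a source of 0x3 returns 0x8 for rd and leaves 0xB in the CSR.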
2. Establishing the state of the M-mode logic: The processor has 4 states:
(a) Reset: when the reset signal is enabled, it resets the instruction address to
BOOT Address.
(c) Trap taken: this state indicates that a trap has occurred and is being trans-
ferred to the trap handler.
(d) Trap return: this state indicates that a trap has been handled by a trap handler
and the processor is returning to normal execution. The ecall and ebreak instructions
are examples that lead to a trap or exception, while mret returns from one.
3. Assigning the respective addresses and program counter values according to the
given states: The program counter needs to update with the operations of the
processor at all times and to do this, one has to assign addresses to which the
program counter must update to in the above states.
4. Modifying the control status registers according to the state of M-mode: The im-
portance of each register and how they are manipulated is discussed in detail:
• Mstatus: The mstatus register is essential for managing privilege levels and
handling interrupts efficiently in the RISC-V architecture. It allows the proces-
sor to switch between different privilege levels and maintain the state required
for proper exception handling and context switching during interrupts and
traps. Some of the key fields in the mstatus register are:
(a) Machine Previous Interrupt Enable (MPIE): This bit holds the interrupt
enable status that the machine had before the interrupt or trap occurred.
It preserves that status while the trap is handled so that it can be
restored when the handler returns.
• Mie: The mie (Machine Interrupt Enable) register has one bit per interrupt
source, allowing the processor to enable or disable interrupts on a per-source
basis. When a bit in the mie register is set to 1, the corresponding interrupt
source is enabled and the processor can respond to interrupts generated by that
source. Conversely, if a bit is set to 0, interrupts from that source are masked
and the processor ignores them. The mie register is closely related to the mip
(Machine Interrupt Pending) register, which indicates the interrupts pending
at machine level. When a source raises an interrupt, the corresponding bit in
the mip register is set to 1, whether or not that source is enabled. The
interrupt is taken only if the corresponding bit in the mie register is also
set to 1 and interrupts are globally enabled in mstatus.
• Mtvec: The mtvec register holds the trap vector configuration, that is, the
base address of the trap handling code together with the vectoring mode. When
a trap or exception occurs in Machine mode, the processor uses the value in
the mtvec register to determine the address of the exception handling routine
to which it jumps. The mtvec register is divided into two fields:
– Base Address: The upper bits (bits 31:2 in RV32) of the mtvec register hold
the base address. The two low-order bits of the address are implicitly
zero, so the base must be aligned on a 4-byte boundary.
– Mode Control: The lower two bits (bits 1:0) of the mtvec register are used for
mode control, which specifies the mode of the trap vector. There are two
modes: Direct mode and Vectored mode.
∗ In Direct mode (MODE = 0), the processor jumps to the base address
for every trap. The base address must therefore point to the exception
handling routine.
∗ In Vectored mode (MODE = 1), synchronous exceptions still jump to the
base address, while interrupts jump to the base address plus four
times the cause code taken from the mcause register. Each 4-byte
slot of the resulting table typically holds a jump to the appropriate
handling routine.
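The read-modify-write behaviour of the six CSR instructions listed under step 1 can be summarised in a small behavioural model. This is only a sketch in Python, not the Verilog of the project; the function name csr_execute and its argument names are hypothetical. It also ignores the rule that csrrs and csrrc perform no CSR write when the source register is x0.

```python
MASK32 = 0xFFFFFFFF  # RV32: registers and CSRs are 32 bits wide


def csr_execute(op, csr_old, src):
    """Return (rd_value, csr_new) for one CSR instruction.

    op      : 'csrrw', 'csrrs', 'csrrc', 'csrrwi', 'csrrsi' or 'csrrci'
    csr_old : current value of the CSR
    src     : rs1 value (register forms) or the immediate (immediate forms)
    """
    if op.endswith("i"):
        src &= 0x1F           # immediate forms use a zero-extended 5-bit field
        base = op[:-1]        # 'csrrwi' -> 'csrrw', and so on
    else:
        base = op
    src &= MASK32

    if base == "csrrw":       # write: CSR takes the source value
        csr_new = src
    elif base == "csrrs":     # set: bits that are 1 in src become 1
        csr_new = csr_old | src
    elif base == "csrrc":     # clear: bits that are 1 in src become 0
        csr_new = csr_old & ~src & MASK32
    else:
        raise ValueError(f"unknown CSR instruction: {op}")

    # rd always receives the value the CSR held before the write
    return csr_old & MASK32, csr_new
```

For example, csr_execute("csrrc", 0x88, 0x8) returns the old value 0x88 for rd and leaves 0x80 in the CSR, which shows that only the selected bit is cleared.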
The control and status register logic operates in four states, namely the reset,
operating, trap and return states. Initially the processor is in the reset state, from
which it moves to the operating state. The operating state is the normal state of the
processor. When an exception occurs, it moves into the trap state. This is shown in
Figure 4.2. After the exception is handled, the state goes back to the operating state.
When a return instruction is identified, it goes to the trap-return state, which then goes
back to the operating state after the return instruction is executed.
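The choice of program counter value in each of the four states can be sketched as follows. This is a behavioural model in Python under stated assumptions, not the Verilog of the project: BOOT_ADDRESS is an assumed value, the names State, trap_target and next_pc are hypothetical, and the use of mepc as the return address follows the RISC-V privileged specification rather than this report. The mtvec layout assumed here is BASE in bits 31:2 and MODE in bits 1:0.

```python
from enum import Enum

MASK32 = 0xFFFFFFFF
BOOT_ADDRESS = 0x00000000  # assumed value; the report does not state it


class State(Enum):
    RESET = 0
    OPERATING = 1
    TRAP_TAKEN = 2
    TRAP_RETURN = 3


def trap_target(mtvec, mcause):
    """Handler address selected by mtvec (BASE in bits 31:2, MODE in bits 1:0)."""
    base = mtvec & ~0x3 & MASK32
    mode = mtvec & 0x3
    is_interrupt = (mcause >> 31) & 1      # top bit of mcause marks an interrupt
    if mode == 1 and is_interrupt:         # vectored mode applies to interrupts only
        code = mcause & 0x7FFFFFFF
        return (base + 4 * code) & MASK32
    return base                            # direct mode, or a synchronous exception


def next_pc(state, pc, mtvec=0, mcause=0, mepc=0):
    """Program counter value for the next cycle in each M-mode state."""
    if state is State.RESET:
        return BOOT_ADDRESS
    if state is State.TRAP_TAKEN:
        return trap_target(mtvec, mcause)
    if state is State.TRAP_RETURN:
        return mepc                        # resume where the trap was taken
    return (pc + 4) & MASK32               # OPERATING: sequential fetch
```

Branches and jumps are left out of the OPERATING case to keep the sketch focused on the trap-related states.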
• Fixed Branch Prediction: In this approach, the processor consistently makes the
same guess, either predicting that the branch will be taken or not taken. While
simple, fixed branch prediction may not be very accurate as it doesn’t adapt to the
program’s behavior.
• True Branch Prediction: In this approach, the predictor anticipates whether
the branch will be taken or not taken based on information about the branch, such as
the historical patterns observed during program execution. It can be further classified
into Static Prediction and Dynamic Prediction.
In this design, 2-bit dynamic branch prediction is incorporated. The predicted
Program Counter (PC) is stored in a buffer called the Branch Target Buffer (BTB),
as depicted in Figure 4.3. The BTB is a table that stores the target addresses of recently
executed branches, so that the fetch stage can redirect to the predicted target without
waiting for the branch to be resolved. This improves the efficiency of the branch
prediction unit and allows smoother program execution.
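A BTB of this kind can be modelled as a small table indexed by the low-order bits of the branch address. The sketch below is a Python illustration only; the direct-mapped organisation, the default of 16 entries and the class name BranchTargetBuffer are assumptions, since the report does not specify the organisation shown in Figure 4.3.

```python
class BranchTargetBuffer:
    """Direct-mapped table of (branch PC, target PC) pairs."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries      # each slot: (branch_pc, target) or None

    def _index(self, pc):
        # Instructions are 4-byte aligned, so drop the two low-order bits
        return (pc >> 2) % self.entries

    def lookup(self, pc):
        """Return the stored target for this branch, or None on a miss."""
        entry = self.table[self._index(pc)]
        if entry is not None and entry[0] == pc:   # full-PC tag must match
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target of a resolved branch, replacing any older entry."""
        self.table[self._index(pc)] = (pc, target)
```

A lookup that misses returns None, in which case the fetch stage would simply continue with the sequential address.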
A Finite State Machine (FSM) within the branch prediction unit is illustrated in
Figure 4.4. It comprises four distinct states: Strongly Taken, Weakly Taken, Weakly Not
Taken, and Strongly Not Taken.
The FSM’s initial state is Weakly Not Taken. Upon encountering the subsequent
branch, if it is taken, the FSM transitions to the Weakly Taken state. Conversely, if
the branch is not taken, the FSM shifts to the Strongly Not Taken state. Should
a branch be taken following the Strongly Not Taken state, the FSM moves back to the
Weakly Not Taken state. Similarly, if a taken branch follows the Weakly Taken state, the
FSM advances to the Strongly Taken state.
The primary function of this FSM is to predict whether the next branch
will be taken or not, based on the recent history of branch
outcomes. For instance, if the past two branches were taken, the FSM
predicts that the next branch will also be taken.
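The behaviour of a 2-bit predictor can be sketched as a saturating counter. The Python model below assumes the standard scheme in which each outcome moves the state by one step; it is an illustration rather than the Verilog FSM of Figure 4.4, and the names TwoBitPredictor and count_mispredictions are hypothetical.

```python
# States ordered from "least taken" to "most taken"
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)


class TwoBitPredictor:
    def __init__(self, state=WEAKLY_NOT_TAKEN):
        self.state = state                 # initial state: Weakly Not Taken

    def predict(self):
        """Predict taken in either of the two 'taken' states."""
        return self.state >= WEAKLY_TAKEN

    def update(self, taken):
        """Move one step towards the observed outcome, saturating at the ends."""
        if taken:
            self.state = min(self.state + 1, STRONGLY_TAKEN)
        else:
            self.state = max(self.state - 1, STRONGLY_NOT_TAKEN)


def count_mispredictions(outcomes):
    """Run the predictor over a list of branch outcomes (True = taken)."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

For a loop branch that is taken nine times and then falls through, count_mispredictions([True] * 9 + [False]) gives 2: one miss on the first iteration and one on the loop exit.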
This approach reduces the number of mispredictions and improves processor perfor-
mance by accurately predicting branch outcomes. By implementing 2-bit dynamic branch
prediction, the aim is to achieve higher prediction accuracy, reduce pipeline stalls, and
optimize the overall performance of the processor for various applications.
4.4 Summary
To summarize, Control and Status Registers (CSRs) are a
vital component of the RISC-V instruction set architecture. They provide a way to
control and monitor the processor’s behavior, configuration, and operational status.
Branch prediction is a crucial aspect of modern processor design that helps
improve instruction throughput and overall performance by reducing the impact of branch
instructions (such as conditionals and loops) on the pipeline.
The next chapter presents and discusses the results of the processor design.
Chapter 5
Results & Discussions
This chapter discusses the results of the test program run on the Verilog
design of the processor. To test the working of the processor, a test assembly
program consisting of all the supported instructions is written and assembled
using the RARS software. The instruction code generated by RARS is loaded into the
instruction memory of the Verilog design using the Verilog system task $readmemh.
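As an illustration of the file format that $readmemh reads, the sketch below turns a list of 32-bit instruction words into text with one hexadecimal word per line. It is written in Python and assumes a word-wide instruction memory; the function name to_readmemh_text is hypothetical, and in the project the instruction code is produced by RARS rather than by such a script.

```python
def to_readmemh_text(words):
    """Format 32-bit instruction words as $readmemh input, one word per line."""
    lines = []
    for word in words:
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError(f"not a 32-bit word: {word!r}")
        lines.append(f"{word:08x}")        # 8 hex digits, no 0x prefix
    return "\n".join(lines) + "\n"


# Example program: the encodings follow the RV32I I-type format
program = [
    0x00F00293,  # addi t0, x0, 0x00f
    0x00000013,  # addi x0, x0, 0 (nop)
]
hex_text = to_readmemh_text(program)
```

The resulting text can be saved to a file and named in the $readmemh call of the instruction memory module.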
In the first two sections, the outputs of the unpipelined and pipelined versions of
the processor are discussed. In the subsequent sections, the branch prediction unit and
the CSR module are tested as stand-alone modules. They are exercised with appropriate
instructions to verify their working.
Figure 5.1: Test program for Register, Immediate and Store type instructions
The assembly test program covering all the R-type, I-type and S-type instructions is
given in Figure 5.1.
In the subsequent figures, the step outputs from the RARS software are given on the
left side of the figure and the corresponding Vivado outputs are given on the right side.
In the Vivado output two signals are indicated, namely ALU Result and
WriteBack.
The outputs for the first 11 instructions are shown in Figure 5.2. On the left
side, one can observe the values changed in the register file and data memory by the
RARS software. These are compared with the output waveform of the Verilog code on the
right side. It can be observed that the outputs from RARS and the Verilog
model match exactly. This verifies the working of the processor for these instructions.
In the subsequent Figures 5.3 and 5.4, the outputs for the next instructions can be seen.
Figure 5.5: Test program to check Branch, Jump and Upper type instructions
Figure 5.6: Output for the Branch, Jump and Upper type instructions
The test program for the Branch, Jump and Upper type instructions is given in
Figure 5.5. When run on the RARS software, the program shows the PC looping
indefinitely between 00000028H and 00000044H. This can be observed in the output
waveform in Figure 5.6 by looking at the signal i_addr, which is highlighted as PC. This
verifies the working of the Branch, Jump and Upper type instructions of the designed
processor.
The output signals i_data, result and DataOut_WB are shown in Figure 5.8.
It can be observed that the instruction code for the instruction addi t0,x0,0x00f is fetched
by the IF stage (highlighted as a). The result is then calculated in the Execute stage,
arriving two cycles later (highlighted as b), and is finally written back to the register
file in the Write-Back stage (highlighted as c).
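The timing observed above matches an ideal 5-stage pipeline in which an instruction advances one stage per clock cycle. The sketch below computes this schedule in Python; it assumes no stalls or flushes, and the names stage_cycles and total_cycles are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycles(fetch_cycle):
    """Cycle in which an instruction occupies each stage, with no stalls."""
    return {stage: fetch_cycle + offset for offset, stage in enumerate(STAGES)}


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions in an ideal pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction needs len(STAGES) cycles; each later one adds a cycle
    return n_instructions + len(STAGES) - 1
```

An instruction fetched in cycle 0 therefore produces its Execute result in cycle 2 and is written back in cycle 4.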
Figure 5.10: Writing to the mtvec register using CSR write immediate(csri)
5.6 Summary
In summary, the outputs of the unpipelined processor, the pipelined processor, the
branch prediction unit and the CSR module are shown and their results are discussed.
The images are taken from the Vivado simulations used to test the various modules of
the processor.
The next chapter deals with the conclusion of this project and its future scope.
Chapter 6
Conclusion and Future Scope
6.1 Conclusion
In this project, a data-path for the instructions of the RV32I base ISA is designed
using Verilog HDL. To simplify the design, the data-path was developed iteratively
through several versions until it supported all instruction types of RV32I. The design
was later partitioned into pipeline stage modules, adding the necessary ports to interface
with the memory units. The single-cycle processor was then converted into a pipelined
data-path demonstrating the propagation of instructions through the various stages.
Firstly, a single-cycle RISC-V data-path was designed that supports every
instruction type in the RV32I instruction set, namely the R, I, S, B, J and U types. The
next step was to pipeline the data-path into 5 stages: Instruction Fetch (IF),
Instruction Decode (ID), Execute (EX), Memory (MEM) and Write
Back (WB). In addition, a dynamic branch prediction unit was developed to
mitigate control hazards. Furthermore, support for the CSR instructions was
developed and tested as a stand-alone module.
Thus, the project has enabled a deeper understanding of the RISC-V architecture and
demonstrated an effective blending of theoretical concepts and real-world design choices
to create a functional and flexible data-path for a RISC-V processor.
6.4 Limitations
Despite the notable achievements in implementing the RV32I Instruction Set Archi-
tecture data-path, the project work does exhibit certain limitations that warrant acknowl-
edgment.
The current scope of testing for the hardware designs is confined to programs authored
by the project designers themselves. While these tests provide essential validation, they
are not exhaustive and do not encompass the wide range of potential inputs and corner
cases that real-world applications may introduce.
A key limitation pertains to the project’s readiness for integration into the broader
Very Large Scale Integration(VLSI) workflow. The absence of formal verification, a crit-
ical phase in the VLSI design process, impedes the project’s progression to subsequent
stages. Without comprehensive formal verification, the hardware designs lack the rigorous
validation needed for confident advancement.
Another limitation concerns the exploration of pipeline hazards. Although a 5-stage
pipeline version of the processor was developed, the project’s approach to handling
pipeline hazards remains incomplete. These hazards, encompassing structural, data, and
control dependencies, can significantly impact the processor’s execution efficiency and
effectiveness.
The next limitation concerns the handling of traps and interrupts using the CSRs.
Although the CSR logic was developed as a separate module, it could not be integrated
into the pipeline, and trap and interrupt handling was not fully explored.
Additionally, the project’s testing and validation rely primarily on programs authored
by the project designers. This approach may not sufficiently cover the diverse software
workloads and scenarios that a real-world processor may encounter.
In summary, while the project work has yielded valuable insights , the aforementioned
limitations emphasize the need for further refinement and validation. Addressing these
limitations would enhance the project’s readiness for integration into the VLSI work-
flow, enable comprehensive formal verification, and ensure the thorough consideration of
pipeline hazards for a more robust processor design.
6.5 Applications
• Educational Tools: The project’s hardware designs serve as educational tools
for hands-on learning of processor architecture, benefiting students and researchers
alike.
• Skill Development: Aspiring engineers and professionals can leverage the hard-
ware designs for skill development in hardware description languages, circuit design,
and simulation techniques.
• Understanding the use of CSRs to control the state of the processor and to provide
information about its state.
• Usage of a variety of software tools (Vivado, RARS, RIPES) to simulate and verify
processor design.
6.7 Summary
In summary, this section encapsulates the culmination of the project, providing an
overview of key accomplishments, future potential, and learning outcomes.
The project successfully achieved the design and implementation of a single-cycle
RISC-V processor and its conversion into a pipeline with the IF, ID, EX, MEM, and WB
stages to improve instruction throughput. Dynamic branch prediction techniques were
integrated to mitigate control hazards. Future potential lies in the exploration of
advanced processor architectures and optimization strategies. Learning outcomes
encompass a deeper understanding of the RISC-V architecture and hands-on experience
in processor design and dynamic prediction techniques.
In conclusion, the project embodies innovation and continuous learning, with its im-
pact resonating in education, research, and open-source collaboration. As the chapter
concludes, it emphasizes the dual nature of accomplishment and ongoing exploration,
paving the way for a dynamic and evolving future in computer architecture.
The subsequent section delves into the references that have guided and influenced the
project’s progression.
[1] A. Rani and N. Grover, Novel Design of 32-bit Asynchronous (RISC) Microproces-
sor its Implementation on FPGA. International Journal of Information Engineering
and Electronic Business, 2018.
[3] Bharath, P. Vijay, and B. SanthiBhushan, MIPS Based 32-Bit RISC Processor
with Thermal Management Unit and Flexible Pipelining Structure. IEEE 9th Uttar
Pradesh Section International Conference on Electrical, Electronics and Computer
Engineering (UPCON), 2022.
[5] Truong, T. Giang, T. G. Do, and T. D. Do., RISC-V Random Test Generator. IEEE
15th International Conference on Advanced Computing and Applications, 2021.
[6] Miura, Junya, H. Miyazaki, and K. Kise, A Portable and Linux capable RISC-V
computer system in Verilog HDL. arXiv preprint arXiv:2002.03576, 2020.
[7] W. Zhou et al., A Novel Sleep Scheduling Strategy on RISC-V Processor. Journal
of Physics: Conference Series, vol. 1631, no. 1, IOP Publishing, 2020.
[8] Singh, Aslesa, et al., Design and implementation of a 32-bit ISA RISC-V processor
core using Virtex-7 and Virtex-UltraScale. IEEE 5th International Conference on
Computing Communication and Automation (ICCCA), 2020.
[9] A. Rathi, C., et al., Design and development of an efficient branch predictor
for an in-order RISC-V processor. 2020.
[10] Lee, Jaewon, et al., RISC-V FPGA platform toward ROS-based robotics application.
30th International Conference on Field-Programmable Logic and Applications
(FPL). IEEE, 2020.
[11] Höller, Roland, et al., Open-source RISC-V processor IP cores for FPGAs - overview
and evaluation. 8th Mediterranean Conference on Embedded Computing (MECO).
IEEE, 2019.
[12] Bhagat, S. M., and S. U. Bhandari, Design and Analysis of 16-bit RISC Processor.
Fourth International Conference on Computing Communication Control and
Automation (ICCUBEA). IEEE, 2018.
[13] Gür, Etki, et al., FPGA implementation of 32-bit RISC-V processor with web-based
assembler-disassembler. International Symposium on Fundamentals of Electrical
Engineering (ISFEE). IEEE, 2018.
[14] Lee, Yunsup, et al., An agile approach to building RISC-V microprocessors.
IEEE Micro, vol. 36, no. 2, pp. 8-20, 2016.
[15] Patil, Vinayak, et al., Out of order floating point coprocessor for RISC V ISA.
19th International Symposium on VLSI Design and Test. IEEE, 2015.
[16] Waterman, Andrew, et al., The RISC-V Instruction Set Manual, Volume 1: User-Level
ISA, Version 2.0. California Univ Berkeley Dept of Electrical Engineering and
Computer Sciences, 2014.
[17] S. P. Dandamudi, Guide to RISC processors: for programmers and engineers.
Springer Science & Business Media, 2005.
[19] N. Bruns, V. Herdt, D. Große, and R. Drechsler, “Toward RISC-V CSR Compliance
Testing,” IEEE Embedded Systems Letters, vol. 13, no. 4, pp. 202–205, 2021.
[22] G. Gayathri, S Jaya, et al., “An Area Optimized Floating-Point Coprocessor for
RISC-V Processor,” in 2023 International Conference on Control, Communication
and Computing (ICCC), IEEE, 2023, pp. 1–5.
[24] T. Zheng, G. Cai, and Z. Huang, “A Soft RISC-V Processor IP with High-performance
and Low-resource consumption for FPGA,” in 2022 IEEE International Symposium
on Circuits and Systems (ISCAS), IEEE, 2022, pp. 2538–2541.
[25] J.-Y. Lai, C.-A. Chen, S.-L. Chen, and C.-Y. Su, “Implement 32-bit RISC-V Archi-
tecture Processor using Verilog HDL,” in 2021 International Symposium on Intelli-
gent Signal Processing and Communication Systems (ISPACS), IEEE, 2021, pp. 1–
2.