
Design and Simulation of 5-Stage
RISC-V Pipeline
A Minor Project Report (18EC64)

Submitted by,
Ishita Singh 1RV20EC195
Sourab Somasundar 1RV20EC157
Sudeep Joshi 1RV20EC161
Wafa Naaz Shaik 1RV20EC191
Under the guidance of

Prof. Mahendra B M
Assistant Professor
Dept. of ECE
RV College of Engineering
In partial fulfillment of the requirements for the degree of

Bachelor of Engineering in
Electronics and Communication Engineering
2022-23
RV College of Engineering® , Bengaluru
(Autonomous institution affiliated to VTU, Belagavi )
Department of ECE
CERTIFICATE
Certified that the minor project (18EC64) work titled Design and Simulation of 5-Stage
RISC-V Pipeline is carried out by Ishita Singh (1RV20EC195), Sourab
Somasundar (1RV20EC157), Sudeep Joshi (1RV20EC161) and Wafa Naaz Shaik
(1RV20EC191) who are bonafide students of RV College of Engineering, Bengaluru, in
partial fulfillment of the requirements for the degree of Bachelor of Engineering in
ECE of the Visvesvaraya Technological University, Belagavi during the year 2022-23. It
is certified that all corrections/suggestions indicated for the Internal Assessment have
been incorporated in the minor project. The minor project report has been approved as
it satisfies the academic requirements in respect of minor project work prescribed by the
institution for the said degree.
Signature of Guide Signature of Head of the Department Signature of Principal
Prof. Mahendra B M Dr. H. V. Ravish Aradhya Dr. K. N. Subramanya
External Viva
Name of Examiners Signature with Date
1.
2.
DECLARATION
We, Ishita Singh, Sourab Somasundar, Sudeep Joshi and Wafa Naaz Shaik,
students of sixth semester B.E., Department of ECE, RV College of Engineering, Ben-
galuru, hereby declare that the minor project titled ‘Design and Simulation of 5-
Stage RISC-V Pipeline’ has been carried out by us and submitted in partial fulfilment
for the award of degree of Bachelor of Engineering in ECE during the year 2022-23.
Further we declare that the content of the dissertation has not been submitted previously
by anybody for the award of any degree or diploma to any other university.
We also declare that any Intellectual Property Rights generated out of this project carried
out at RVCE will be the property of RV College of Engineering, Bengaluru and we will
be one of the authors of the same.
Place: Bengaluru
Date:
Name Signature
1. Ishita Singh(1RV20EC195)
2. Sourab Somasundar(1RV20EC157)
3. Sudeep Joshi(1RV20EC161)
4. Wafa Naaz Shaik(1RV20EC191)

ACKNOWLEDGEMENTS
We are indebted to our guide, Prof. Mahendra B M, Assistant Professor, RV
College of Engineering, for the wholehearted support, suggestions and invaluable advice
throughout our project work, and for the help in the preparation of this thesis.
We also express our gratitude to our panel members Dr. Govinda Raju M, As-
sociate Professor and Prof. Ravishankar Holla, Assistant Professor, Department of
ECE for their valuable comments and suggestions during the phase evaluations.
Our sincere thanks to the project coordinators Dr. Veena Devi S V, Prof.
Sindhu Rajendran and Prof. Subramanya K N for their timely instructions and
support in coordinating the project.
Our gratitude to Prof. Narashimaraja P for the organized LaTeX template which
made report writing easy and interesting.
Our sincere thanks to Dr. H. V. Ravish Aradhya, Professor and Head, Depart-
ment of ECE, RVCE for the support and encouragement.
We express sincere gratitude to our beloved Principal, Dr. K. N. Subramanya for
the appreciation towards this project work.
We thank all the teaching staff and technical staff of ECE department, RVCE for
their help.
Lastly, we take this opportunity to thank our family members and friends who pro-
vided all the backup support throughout the project work.
ABSTRACT
The rapid advancements in computing demand efficient data-path architectures that
balance performance and power consumption. RISC-V is an open standard Instruction
Set Architecture (ISA) based on established reduced instruction set computer principles.
RISC-V offers the opportunity to explore an open-source instruction set architecture.
It is designed to be simple, modular, and extensible, allowing for customization and
innovation. As it is royalty-free, many silicon vendors are developing RISC-V CPUs as
an alternative to proprietary instruction set architectures.
The objective of this project is to carry out the RTL design and simulation of a 5-stage
RISC-V data-path that implements all the instructions
in the RV32I base ISA. Control & Status Registers (CSRs) and a Branch Prediction Unit
(BPU) have also been designed as separate modules. Functional verification of the single-cycle
and pipelined data-paths is carried out. The pipelined design comprises distinct
pipeline stage modules (instruction fetch, decode, execute, memory, and write-back),
which are brought together in a central pipeline core module.
The single-cycle design executes the complete set of RV32I instructions, while the
pipelined design demonstrates signal propagation across its five pipeline stages. The
interfacing of the architecture with the memory units is also carried out. To verify the functionality
of the data-path, test programs are generated using the RARS software and
loaded into the Verilog code of the design, and all the instructions are
run successfully on the simulated data-path. In summary, the work presents the
design and simulation of a 5-stage RISC-V data-path based on the RV32I Instruction Set
Architecture, verified through RTL simulations, together with the design and simulation
of the Control & Status Registers and the Branch Prediction Unit.
CONTENTS
Abstract
List of Figures
List of Tables
Abbreviations
1 Introduction to RISC-V Processor
1.1 Introduction
1.2 Motivation
1.3 Problem statement
1.4 Objectives
1.5 Literature Review
1.6 Brief Methodology of the project
1.7 Summary
2 RV32I Base Integer Instruction Set & Implementation of Unpipelined Processor
2.1 RV32I Base Integer Instruction Set
2.1.1 Integer Arithmetic Instructions
2.1.2 Integer Logical Instructions
2.1.3 Integer Comparison Instructions
2.1.4 Integer Immediate Instructions
2.1.5 Integer Load and Store Instructions
2.1.6 Integer Control Flow Instructions
2.2 Designing of the Unpipelined Processor
2.3 Summary
3 Design optimization and the Pipelined Processor
3.1 The Central Core.v module
3.2 Instruction Fetch module
3.3 Instruction Decode module
3.4 Execute Stage module
3.5 Memory Stage module
3.6 Writeback Stage module
3.7 Summary
4 Control Status Registers & Branch Prediction Unit
4.1 Modes in RISC-V
4.2 Machine Mode in RISC-V
4.3 Branch Prediction Unit
4.4 Summary
5 Results & Discussions
5.1 Unpipelined Processor
5.2 Pipelined Processor
5.3 Branch Prediction Unit output
5.4 Execution of CSRWI instruction
5.5 Execution of Ecall & Mret
5.6 Summary
6 Conclusion and Future Scope
6.1 Conclusion
6.2 Future Scope
6.3 Significance of the Project Work
6.4 Limitations
6.5 Applications
6.6 Learning Outcomes of the Project
6.7 Summary
LIST OF FIGURES
1.1 Methodology
2.1 Block diagram of the unpipelined processor
3.1 Core.v and IF.v modules
3.2 IF stage Architecture
3.3 Instruction Decode ID.v Module
3.4 ID Stage Architecture
3.5 Execution Stage EX.v Module
3.6 EX stage Architecture
3.7 Memory Stage MEM.v Module
3.8 MEM Stage Architecture
3.9 Writeback Stage WB.v Module
3.10 WB Stage Architecture
4.1 CSR Control Unit
4.2 CSR state diagram
4.3 Block Diagram of Branch Prediction Unit
4.4 Finite State Machine of the Branch Prediction Unit
5.1 Test program for Register, Immediate and Store type instructions
5.2 Output of Unpipelined Processor-1
5.5 Test program to check Branch, Jump and Upper type instructions
5.6 Output for the Branch, Jump and Upper type instructions
5.7 Test program for the pipelined processor
5.8 Output of the 5-stage pipelined processor
5.9 Output of the Branch Prediction Unit
5.10 Writing to the mtvec register using CSR write immediate (csri)
5.11 Ecall and mret instruction execution
LIST OF TABLES
3.1 Operations-to-Bits Mapping
ABBREVIATIONS
AI Artificial Intelligence
ALU Arithmetic and Logic Unit
BPU Branch Prediction Unit
BTB Branch Target Buffer
CISC Complex Instruction Set Computers
CPU Central Processing Unit
CSR Control and Status Registers
EX Execute
FPGA Field Programmable Gate Array
FSM Finite State Machine
GRIFT Galois RISC-V ISA Formal Tools
HDL Hardware Description Language
ID Instruction Decode
IF Instruction Fetch
IoT Internet of Things
ISA Instruction Set Architecture
LB Load Byte
LBU Load Byte Unsigned
LH Load Half-Word
LHU Load Half-Word Unsigned
MEM Memory Access
MIE Machine Interrupt Enable
MPIE Machine Previous Interrupt Enable
PC Program Counter
PHT Pattern History Table
RISC-V Reduced Instruction Set Computing-Five
RTL Register Transfer Level
RV32I RISC-V 32-bit Integer
VLSI Very Large Scale Integration
WB Write-Back
RV College of Engineering® , Bengaluru - 560059
Chapter 1
Introduction to RISC-V Processor
UG Minor Project Report Department of ECE 2022-23

CHAPTER 1
INTRODUCTION TO RISC-V PROCESSOR
1.1 Introduction
Reduced Instruction Set Computing-Five (RISC-V) is a free and open-source Instruction
Set Architecture (ISA) that defines the set of instructions a computer processor can
execute. It is designed to be open-source, simple, modular, and extensible, allowing
for customization and innovation. RISC-V enables a wide range of applications, from
embedded systems to high-performance computing, fostering collaboration and driving
advancements in computer architecture [16].
One of the advantages of RISC-V architecture is its ability to improve compilation
speed compared to Complex Instruction Set Computers (CISC). The simplicity of the
instruction set allows for easier decoding and faster execution [14].
Taking up a project on RISC-V offers the advantages of working with an open-source,
customizable architecture that aligns with industry trends. It provides opportunities for
innovation, skill development, and community contribution, while preparing individuals
for future relevance in areas such as embedded systems, IoT, AI, and high-performance
computing [2].
The project aims to create a Central Processing Unit (CPU) based on the RISC-V
architecture. The CPU design follows a 5-stage pipeline approach, dividing
instruction execution into distinct stages for improved efficiency. The project also
simulates a branch prediction technique as a separate module to mitigate control hazards [7].
1.2 Motivation
The motivation to study RISC-V computer architecture stems from its flexible, open-
source nature, making it a prominent choice for research. RISC-V offers opportunities to
explore innovative designs, optimize performance, and contribute to computer architec-
ture advancement.
By working on a RISC-V project, one can explore open-source technologies, gain
insights into computer architecture, and contribute to the growing ecosystem. RISC-V’s
customizable platform enables experimentation, development, and specialized hardware.
The widespread interest in RISC-V across academia and industry also provides networking
and knowledge-sharing opportunities, and scope for contributing to the field.
Taking up a 5-stage pipeline in RISC-V as a project objective helps in understanding
the fundamental principles of pipelining and their impact on performance. It also
builds familiarity with the individual pipeline stages: instruction fetch, decode,
execute, memory, and writeback.
1.3 Problem statement

To develop a functional RISC-V processor, along with Control and Status
Registers and a Branch Prediction Unit designed as two separate modules.
1.4 Objectives
The objectives of the project are:
1. To design and simulate a RISC-V datapath supporting the instructions in the RV32I ISA.
2. To design and simulate Control and Status Registers (CSR) and Branch Prediction
Unit (BPU) modules.
3. To verify the functionality of RV32I ISA in unpipelined and pipelined versions.
1.5 Literature Review

In paper [1], authors Archana Rani and Naresh Grover have discussed how a 32-bit
fully functional asynchronous processor has been designed using VHDL. The proces-
sor consists of five stages and has undergone functional simulation to ensure efficient
execution of instructions. The open-core design provides advantages over commercial
microprocessors, offering a deeper understanding of the asynchronous microprocessor’s
internal workings.
Lee, Yunsup, et al., in paper [14], discuss the Agile Manifesto, which provides
values and principles for software developers, in the context of hardware design. The case study focuses on
the Raven-5 RISC-V Vector Microprocessor, highlighting unique features such as an on-chip
DC-DC converter and an adaptive clock. The agile hardware design is enabled by the "Chisel"
HDL, which is embedded in the Scala programming language, allowing synthesis and
comprehensive testing.
Paper [7] discusses how open-source RISC-V test suites such as MicroTESK and
Galois RISC-V ISA Formal Tools (GRIFT) assist in verification. It employs a
table-based approach for test generation, supplemented by the random weight method to
handle complex scenarios and corner cases. A coverage number is computed for the
generated tests, and the coverage graph shows increased coverage with more test cases.
The paper [9] by Arul Rathi, C., et al., introduces a dynamic branch predictor that
utilizes a 2-bit saturating counter. When a branch is encountered for the first time, it
is noted in the Pattern History Table (PHT). The predictor will only predict the branch
as taken again if the branch is confirmed on its second appearance, based on the
saturating counter's state.
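The behaviour of a 2-bit saturating counter can be illustrated with a short Python sketch. This is not the predictor from [9]; the function name, the initial state (strongly not taken) and the use of a single counter rather than a full PHT are assumptions made only for illustration.

```python
def simulate_two_bit_counter(outcomes, state=0):
    """Return the prediction made before each branch outcome is known.

    The counter state ranges from 0 (strongly not taken) to 3 (strongly
    taken). The initial state of 0 is an assumption for this sketch.
    """
    predictions = []
    for taken in outcomes:
        # Predict "taken" only in the two upper states.
        predictions.append(state >= 2)
        # Saturating update: move towards 3 on taken, towards 0 otherwise.
        if taken:
            state = min(3, state + 1)
        else:
            state = max(0, state - 1)
    return predictions


# A branch that is taken three times, not taken once, then taken again.
history = [True, True, True, False, True]
print(simulate_two_bit_counter(history))
```

With this starting state, the branch is predicted taken only from its third appearance onwards, and a single not-taken outcome does not flip the prediction.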
In paper [12], the researchers Shraddha M. Bhagat et al., have focused on optimal
performance across various process technologies of the RISC-V architecture. It explores
15 instructions within the architecture, implementing them using ModelSim, a tool for
simulation and verification of digital designs.
Saif et al. [20] presented a Field Programmable Gate Array (FPGA) implementation
of an educational RISC-V processor suitable for embedded applications. They
demonstrated the feasibility of using RISC-V for educational purposes by designing and
implementing a processor on FPGA. Their work contributed to enhancing understanding
of RISC-V in teaching environments.
Poli et al. [21] designed and implemented a RISC-V processor on FPGA, with a
specific focus on mobility, sensing, and networking applications. They showcased the
adaptability of RISC-V architecture in different application domains and highlighted its
potential for enabling innovative solutions in the field.
Gayathri and Jaya [22] proposed an area-optimized floating-point coprocessor for
RISC-V processors. Their research addressed the need for efficient floating-point com-
putation in RISC-V architectures, contributing to improved performance in applications
requiring high-precision calculations.
Phangestu et al. [23] developed a five-stage pipelined 32-bit RISC-V base integer
instruction set architecture soft microprocessor core in VHDL. Their work delved into the
intricate details of pipeline architecture and provided insights into enhancing processor
performance through optimized pipelining strategies.
Zheng et al. [24] introduced a soft RISC-V processor IP with high-performance and
low-resource consumption for FPGA. By emphasizing resource efficiency, their research
catered to the requirements of resource-constrained environments, making RISC-V an
attractive option for a wider range of applications.

Lai et al. [25] implemented a 32-bit RISC-V architecture processor using Verilog
HDL, with a particular focus on intelligent signal processing and communication systems.
Their work explored the integration of RISC-V in signal processing, paving the way for
applications in communication and data analysis.
These studies collectively contribute to a deeper understanding of RISC-V architec-
ture, its implementation on FPGA, and its applications across various domains.
1.6 Brief Methodology of the project

The methodology of the project, as shown in Figure 1.1, is as follows:
1. Understand the RV32I standard base Instruction Set Architecture by studying the
RISC-V Instruction Set Manual. Instructions are selected, and their datapaths are
implemented using Verilog HDL and the Vivado design suite.
2. Select certain types of instructions, which are discussed in the following chapters,
and implement the datapaths supporting those instructions.
3. Combine these individual datapaths into a single-cycle datapath, which
is then scaled to support all instructions in the ISA manual.
4. Divide the datapath into five pipeline stages.
Figure 1.1: Methodology

• Chapter 2 discusses the RV32I base integer instruction set and the implementation
of unpipelined datapath. It explores each instruction type in the RV32I Instruction
Set.
• Chapter 3 discusses the implementation of the 5 stages of pipelining, namely,
instruction fetch, instruction decode, execute, memory and write back stages.
• Chapter 4 explores the Control & Status Registers (CSR) and their implementation.
In addition to this, a dynamic branch prediction unit, which is a control hazard
mitigation technique, has been implemented.
• Chapter 5 discusses the results and evaluates the output obtained during the im-
plementation of the
• Chapter 6 concludes the project and discusses the future scope associated with the
design of open-source RISC-V datapath.
1.7 Summary
This chapter introduces the concept of the RISC-V processor and the aim of this project. It
also states the motivation for this project and its problem statement. A brief literature
review is also presented. Finally, a brief methodology of the project is discussed along with
a brief introduction of what each chapter in this report deals with.
The next chapter deals with the instructions in RV32I Integer Instruction Set and
how to go about the implementation of the unpipelined datapath.

Chapter 2
RV32I Base Integer Instruction Set
& Implementation of Unpipelined
Processor

CHAPTER 2
RV32I BASE INTEGER INSTRUCTION SET &
IMPLEMENTATION OF UNPIPELINED
PROCESSOR
This chapter serves as an introductory exploration of the RV32I Instruction Set Archi-
tecture (ISA), focusing on its significance within the realm of RISC-V processor design.
RV32I stands for RISC-V 32-bit Integer ISA, forming a fundamental subset of the broader
RISC-V instruction set. The objective of this chapter is to establish a foundational un-
derstanding of RV32I’s essentials.
The RV32I ISA is designed specifically for 32-bit integer operations, forming the
bedrock of various RISC-V processor designs. A noteworthy mandate, outlined in the
ISA Manual [16], is the requirement for every RISC-V design to encompass support for
the complete instruction set under RV32I. This is explored in the first section of the
chapter. Another crucial aspect highlighted in the instruction set manual’s volume two
[18] is the necessity for every RISC-V design to incorporate support for the Machine
mode. As a cornerstone of the RISC-V privilege architecture, the Machine mode governs
the processor’s interactions with memory, peripherals, and underlying resources.
The subsequent segment of this chapter shifts the focus to the design of an unpipelined
processor. While the RV32I ISA defines the instructions, the way these instructions are
executed profoundly impacts overall processor performance. An introduction to the
unpipelined design provides insights into the sequential
execution of instructions, serving as a precursor to subsequent discussions on pipeline
optimization strategies.
The introduction of RV32I instructions takes precedence within this chapter due to its
pivotal role in shaping hardware design, development, and debugging endeavors. A robust
grasp of RV32I instructions is essential for crafting a functional processor architecture
aligned with the RISC-V standard.
2.1 RV32I Base Integer Instruction Set

The RV32I Base Integer Instruction Set is the fundamental set of integer instructions
for the RISC-V architecture targeting 32-bit processors. These instructions provide
the basic arithmetic, logical, memory access, and control flow operations necessary for
general-purpose computation. The RV32I Base Integer Instruction Set consists of the
following categories of instructions:
2.1.1 Integer Arithmetic Instructions

These instructions are a fundamental part of the instruction set, allowing arithmetic
operations to be performed on integer data. These instructions operate on 32-bit signed
or unsigned integer values and include addition and subtraction operations. The main integer
arithmetic instructions are:
add rd, rs1, rs2: Add the values in register rs1 and rs2 and store the result in
register rd.
sub rd, rs1, rs2: Subtract the value in register rs2 from the value in rs1 and store
the result in register rd.
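The semantics of add and sub can be sketched in a few lines of Python. This is an illustrative model, not the Verilog of the project; the helper names and the use of a 32-bit mask to model register width are assumptions. It shows that results wrap around modulo 2^32 and that the same bit pattern can be read as signed or unsigned.

```python
MASK32 = 0xFFFFFFFF  # models the 32-bit register width


def add(rs1_val, rs2_val):
    """add rd, rs1, rs2: 32-bit addition, overflow is ignored (wraps)."""
    return (rs1_val + rs2_val) & MASK32


def sub(rs1_val, rs2_val):
    """sub rd, rs1, rs2: 32-bit subtraction, wraps modulo 2**32."""
    return (rs1_val - rs2_val) & MASK32


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


print(hex(add(0xFFFFFFFF, 1)))   # wraps around to 0x0
print(to_signed(sub(3, 5)))      # the pattern 0xFFFFFFFE read as signed: -2
```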
2.1.2 Integer Logical Instructions

In RISC-V, Integer Logical Instructions are a set of instructions that perform bitwise
logical operations on integer data. These instructions operate on 32-bit integer values
and include bitwise AND, OR, XOR, shift left, shift right, and arithmetic shift right
operations. They allow efficient manipulation of individual bits or groups of bits in integer
values. The main integer logical instructions are:
and rd, rs1, rs2: Perform a bitwise AND operation on the values in register rs1
and rs2 and store the result in register rd.
or rd, rs1, rs2: Perform a bitwise OR operation on the values in register rs1 and
rs2 and store the result in register rd.
sll rd, rs1, rs2: Perform a logical left shift on the value in register rs1 by the
number of bits specified in register rs2 and store the result in register rd.
srl rd, rs1, rs2: Perform a logical right shift on the value in register rs1 by the
number of bits specified in register rs2 and store the result in register rd.
sra rd, rs1, rs2: Perform an arithmetic right shift on the value in register rs1 by
the number of bits specified in register rs2 and store the result in register rd.
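The difference between the three shift instructions is easiest to see on a value with the top bit set. The following Python sketch is an assumed software model (the function names are illustrative); it uses only the lower 5 bits of rs2 as the shift amount, as RV32I specifies.

```python
MASK32 = 0xFFFFFFFF


def sll(rs1_val, rs2_val):
    """Logical left shift; only the low 5 bits of rs2 give the amount."""
    return (rs1_val << (rs2_val & 0x1F)) & MASK32


def srl(rs1_val, rs2_val):
    """Logical right shift; zeros are shifted in from the left."""
    return (rs1_val & MASK32) >> (rs2_val & 0x1F)


def sra(rs1_val, rs2_val):
    """Arithmetic right shift; the sign bit is replicated from the left."""
    signed = rs1_val - (1 << 32) if rs1_val & 0x80000000 else rs1_val
    return (signed >> (rs2_val & 0x1F)) & MASK32


value = 0x80000000
print(hex(srl(value, 4)))  # 0x8000000  (zero fill)
print(hex(sra(value, 4)))  # 0xf8000000 (sign fill)
```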
2.1.3 Integer Comparison Instructions

Integer Comparison Instructions compare two integer values and write the result of
the comparison (1 or 0) to a destination register; RISC-V has no condition flags. The base
instructions check if one value is less than another, in signed or unsigned form, and
other relations such as equality or less-than-or-equal can be derived by combining them
with other instructions.
The comparison results are mainly used for making decisions in control flow and
conditional branching. The main integer comparison instructions are:
slt rd, rs1, rs2: Set register rd to 1 if the value in register rs1 is less than the value
in rs2; otherwise, set rd to 0 (signed comparison).
sltu rd, rs1, rs2: Set register rd to 1 if the value in register rs1 is less than the
value in rs2; otherwise, set rd to 0 (unsigned comparison).
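A small Python sketch, given here only as an assumed model with illustrative function names, shows how the same pair of register values can give different results under signed and unsigned comparison.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def slt(rs1_val, rs2_val):
    """Signed less-than: returns the value written to rd (1 or 0)."""
    return 1 if to_signed(rs1_val) < to_signed(rs2_val) else 0


def sltu(rs1_val, rs2_val):
    """Unsigned less-than: returns the value written to rd (1 or 0)."""
    return 1 if (rs1_val & MASK32) < (rs2_val & MASK32) else 0


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
print(slt(0xFFFFFFFF, 1))   # 1
print(sltu(0xFFFFFFFF, 1))  # 0
```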
2.1.4 Integer Immediate Instructions

Integer Immediate Instructions are used to perform arithmetic and logical operations
between a register and an immediate value (constant) in a single instruction. Immediate
instructions provide a convenient way to work with constants without having to load them
from memory explicitly. The immediate value is encoded directly within the instruction
itself, allowing for efficient computations. The main integer immediate instructions are:
addi rd, rs1, imm: Add the immediate value imm to the value in register rs1 and
store the result in register rd.
andi rd, rs1, imm: Perform a bitwise AND operation between the value in register
rs1 and the immediate value imm, storing the result in register rd.
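ori rd, rs1, imm: Perform a bitwise OR operation between the value in register
rs1 and the immediate value imm, storing the result in register rd.
xori rd, rs1, imm: Perform a bitwise XOR operation between the value in register
rs1 and the immediate value imm, storing the result in register rd.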
slti rd, rs1, imm: Set register rd to 1 if the value in register rs1 is less than the
immediate value imm; otherwise, set rd to 0 (signed comparison).
sltiu rd, rs1, imm: Set register rd to 1 if the value in register rs1 is less than the
immediate value imm; otherwise, set rd to 0 (unsigned comparison).
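In the I-type format the immediate is a 12-bit field that is sign-extended to 32 bits before use. The Python sketch below is an assumed model of that step (the helper names are illustrative), applied to addi, andi and sltiu.

```python
MASK32 = 0xFFFFFFFF


def sign_extend12(imm_field):
    """Sign-extend a 12-bit immediate field to a Python integer."""
    imm_field &= 0xFFF
    return imm_field - 0x1000 if imm_field & 0x800 else imm_field


def addi(rs1_val, imm_field):
    return (rs1_val + sign_extend12(imm_field)) & MASK32


def andi(rs1_val, imm_field):
    return rs1_val & sign_extend12(imm_field) & MASK32


def sltiu(rs1_val, imm_field):
    # The immediate is sign-extended first, then compared as unsigned.
    return 1 if (rs1_val & MASK32) < (sign_extend12(imm_field) & MASK32) else 0


print(sign_extend12(0xFFF))   # -1
print(addi(5, 0xFFF))         # 5 + (-1) = 4
print(sltiu(5, 0xFFF))        # 1, since 0xFFF extends to 0xFFFFFFFF
```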
2.1.5 Integer Load and Store Instructions

Integer Load and Store Instructions are used to access and manipulate data stored
in memory. These instructions are essential for reading data from memory into registers
(load) and writing data from registers into memory (store). Memory in RISC-V is
byte-addressable, and RV32I supports loading and storing 32-bit (word) values as well as
byte and half-word variants (such as LB, LBU, LH and LHU). The
main integer load and store instructions are:
lw rd, offset(rs1): Load a 32-bit word from memory at the address (rs1 + offset)
and store it in register rd.
sw rs2, offset(rs1): Store the 32-bit value in register rs2 to memory at the address
(rs1 + offset).
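The address calculation and byte-addressable storage can be sketched in Python as follows. The DataMemory class, its size and its method names are assumptions for illustration and do not reflect the memory module of the project; little-endian byte order is used, which is the default in RISC-V.

```python
MASK32 = 0xFFFFFFFF


class DataMemory:
    """Hypothetical byte-addressable data memory for illustration."""

    def __init__(self, size=64):
        self.data = bytearray(size)

    def sw(self, rs2_val, offset, rs1_val):
        """sw rs2, offset(rs1): store four bytes at rs1 + offset."""
        addr = (rs1_val + offset) & MASK32
        self.data[addr:addr + 4] = (rs2_val & MASK32).to_bytes(4, "little")

    def lw(self, offset, rs1_val):
        """lw rd, offset(rs1): return the word at rs1 + offset."""
        addr = (rs1_val + offset) & MASK32
        return int.from_bytes(self.data[addr:addr + 4], "little")


mem = DataMemory()
mem.sw(0xDEADBEEF, 4, 8)       # effective address 8 + 4 = 12
print(hex(mem.lw(-4, 16)))     # same address via 16 - 4: 0xdeadbeef
print(hex(mem.data[12]))       # lowest byte is stored first: 0xef
```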
2.1.6 Integer Control Flow Instructions

Integer Control Flow Instructions are used to manage the flow of execution in a program.
These instructions allow the program to make decisions based on specific conditions
and perform jumps or branches to different parts of the code. The main integer control
flow instructions are:
beq rs1, rs2, offset: Branch to the instruction at the address (PC + offset) if the
values in registers rs1 and rs2 are equal.
bne rs1, rs2, offset: Branch to the instruction at the address (PC + offset) if the
values in registers rs1 and rs2 are not equal.
jal rd, offset: Jump and link. Store the address of the next instruction (PC + 4)
in register rd and jump to the address (PC + offset).
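The effect of these instructions on the program counter can be summarised with a brief Python sketch. The function names and return conventions are assumptions chosen for illustration; each function returns the next PC, and jal additionally returns the link value written to rd.

```python
MASK32 = 0xFFFFFFFF


def beq(pc, rs1_val, rs2_val, offset):
    """Next PC for beq: branch target if equal, else fall through."""
    return (pc + offset) & MASK32 if rs1_val == rs2_val else (pc + 4) & MASK32


def bne(pc, rs1_val, rs2_val, offset):
    """Next PC for bne: branch target if not equal, else fall through."""
    return (pc + offset) & MASK32 if rs1_val != rs2_val else (pc + 4) & MASK32


def jal(pc, offset):
    """Return (link value for rd, next PC) for jal."""
    return (pc + 4) & MASK32, (pc + offset) & MASK32


print(hex(beq(0x100, 7, 7, -8)))   # taken backward branch: 0xf8
print(hex(bne(0x100, 7, 7, -8)))   # not taken: 0x104
link, target = jal(0x100, 0x20)
print(hex(link), hex(target))      # 0x104 0x120
```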
2.2 Designing of the Unpipelined Processor

The initial objective in designing an unpipelined processor was to understand how the
different instruction types are accommodated, to work out their respective
data paths, and to write Verilog code for verifying their functional
correctness. To ease debugging, the design was carried out
in distinct phases, each involving the selection of specific instructions and the
development of the corresponding data paths. The first iteration was the R-type
processor variant, which integrated selected instructions from the R-type category.
The design then advanced to the I-type processor version, in which specific I-type
instructions were incorporated into the data path and tested.
The third iteration of the proposed processor was the RIS-type (R-type,
I-type and S-type) processor, which integrated instructions from the R,
I, and S types. This covered a comprehensive set of arithmetic,
load, and store instructions. The output of the RIS-type processor is
presented in the results section.

The final iteration was the unpipelined RISC-V processor core, which
added support for Branch, Jump, and Upper type instructions. To ease
later design work, the datapath was divided into distinct stages, laying
the groundwork for the pipelined processor design.
Central to the architecture is the "core.v" module, which has bus access to
interact with the instruction and data memories. The core.v module is explored
in detail in the subsequent chapter.
The design of the unpipelined RISC-V processor involves the following steps:
1. Design of the datapath. The datapath is the physical circuitry that implements
the instructions in the ISA.
2. Design of the control unit. The control unit is responsible for sequencing the
execution of instructions.
3. Simulation of the processor in software.
The focus is on achieving the fundamental stages of instruction processing, covering
instruction fetch, decode, execute, memory access (if applicable), and write back.
• Instruction Fetch(IF) Stage: At the heart of the processor, the Instruction Fetch
(IF) stage is responsible for fetching instructions from memory based on the pro-
gram counter (PC). It updates the PC to indicate the address of the subsequent
instruction to be executed.
• Instruction Decode(ID) Stage: The Instruction Decode (ID) stage interprets the
fetched instruction and extracts essential information such as the opcode, source
and destination registers, immediate values, and other relevant fields.
• Execute(EX) Stage: The Execution (EX) stage is where the actual operation speci-
fied by the instruction is performed. It encompasses arithmetic and logic operations
through the Arithmetic and Logic Unit(ALU), which takes inputs from registers as
dictated by the instruction’s opcode.
• Memory Access(MEM) Stage (if necessary): If the instruction necessitates memory
access, the Memory Access (MEM) stage handles reading from or writing to
memory. This stage is vital for load/store instructions that interact with the data
memory.
• Write-Back(WB) Stage: The Write Back (WB) stage finalizes the instruction’s
execution by writing the result back to the destination register specified by the
instruction.
• Control Unit: The Control Unit coordinates the data and instruction flow between
the various stages. Additionally, it manages branching and control flow instructions
(e.g., jump, branch) by updating the program counter accordingly.
• Register File: The Register File is a critical component of the processor, hous-
ing the RISC-V processor’s general-purpose registers. It facilitates read and write
operations to these registers, enabling data manipulation and transfer.
• Arithmetic and Logic Unit(ALU): The ALU is responsible for executing arithmetic
and logic operations based on the opcode received from the Instruction Decode
stage. It forms the core computational element of the processor.
• Memory Unit : The Memory Unit, if required, facilitates data access for load/store
instructions. It interacts with the data memory to fetch or store data from or to
the processor’s register file.
• Data Path: To create a coherent and functional unpipelined RISC-V processor, all
the components are connected, including the Register File, ALU, Memory Unit (if
applicable), and the Control Unit, to establish a data path. This data path enables
seamless data flow through the processor during instruction execution.

Figure 2.1: Block diagram of the unpipelined processor
2.3 Summary
In summary, this chapter covers the instructions of the RV32I Integer Instruction
Set: Arithmetic, Logical, Comparison, Immediate, Load and Store, and Control Flow
instructions. It then discusses the objective in designing an unpipelined processor and
moves on to the concept of pipelining and the stages required for its implementation.
The next chapter deals with the pipelined processor and its stages in detail.

Chapter 3
Design optimization and the
Pipelined Processor

A pipelined processor operates by guiding instructions through distinct stages: IF
(Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory Access),
and WB (Write-Back). This architectural approach divides the datapath into stages so
that several instructions are handled concurrently, each in a different stage.
In contrast, the unpipelined RISC-V iteration of the processor used a different
configuration. It relied on a large number of control signals from the control unit, and
its modules were not segmented into discrete stages. Keeping the design simple was a key
consideration, which called for reducing the number of signals used to steer processor
operation. The RISC-V ISA helps here because the instruction code itself carries the
required detail in its opcode and function3 fields. These fields, 7 and 3 bits wide
respectively, are used to generate the multiplexer select signals in every stage. This
optimization was incorporated into the design.
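The field extraction that this optimization relies on can be sketched as follows. This is a minimal behavioural sketch in Python rather than the project's Verilog; the function name decode_fields and the dictionary layout are assumptions made for illustration, while the bit positions are those fixed by the RV32I base ISA.

```python
def decode_fields(instr):
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    return {
        "opcode": instr & 0x7F,          # bits 6:0   (7 bits)
        "rd": (instr >> 7) & 0x1F,       # bits 11:7
        "func3": (instr >> 12) & 0x7,    # bits 14:12 (3 bits)
        "rs1": (instr >> 15) & 0x1F,     # bits 19:15
        "rs2": (instr >> 20) & 0x1F,     # bits 24:20 (meaningful for R/S/B types)
    }
```

For example, the word 0x00F00293 (a hand encoding of addi t0,x0,0x00f) yields opcode 0x13, rd 5 and func3 0, which is enough for a stage to derive its own multiplexer selects locally.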
3.1 The Central Core.v module
a) Central CPU Core.v Module b) Instruction Fetch IF.v Module
Figure 3.1: Core.v and IF.v modules
In a real-world context, the CPU sits at the centre, serving as the interface
between instruction memory and data memory. In the devised structure, each stage
is defined by its operation within the processor. The IF stage, for instance, carries the
cs_i_n signal (chip select for instruction memory, active low) to interact with the
instruction memory. Similarly, the MEM stage carries signals such as rd (read), wr (write),
and cs_d_n (chip select for data memory, active low). All of this is contained in a single
top-level module, Core.v, as shown in Figure 3.1 (a). The signals i_data, cs_i_n and
i_addr interface the processor with the instruction memory. The signals d_data, cs_d_n,
wr, rd and Data_write_MEM interface the processor with the data memory.
3.2 Instruction Fetch module

The Instruction Fetch stage manages the Program Counter (PC) and modifies it in
response to instructions. The module takes clk, rst, PC_src, jalr, i_data, result_EX, and
immOut_EX as inputs, as shown in Figure 3.1 (b). The signals cs_i_n, i_addr, instrCode,
and PC are the outputs of the module. During sequential execution, the PC is incremented
by 4 in each cycle. However, when the EX (Execution) stage signals a branch or jump
through the PC_src and jalr signals, the PC is adjusted accordingly. Both result_EX and
immOut_EX originate from the EX stage: immOut_EX is added to the PC as the offset for
branches and jumps, while result_EX supplies the new PC for JALR (Section 3.4).
Figure 3.2: IF stage Architecture

The internal architecture of the IF.v module, as shown in Figure 3.2, takes the current
PC and adds 4 to it, creating a new signal called PC_4. Additionally, the PC is added to
immOut, which is first shifted left by one bit. These values are the candidates from which
the signal Next_PC is chosen.
The choice of Next_PC depends on two signals: PC_src and jalr. PC_src indicates whether
a branch or a jump must be taken. The jalr signal indicates that the instruction is JALR,
in which case the PC is set to the result produced by the ALU (see Section 3.4).
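The next-PC selection described above can be sketched as a small behavioural model. This is a Python sketch under stated assumptions, not the project's Verilog: the function name next_pc, the 32-bit wrap-around and the priority of jalr over PC_src are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit program counter


def next_pc(pc, pc_src, jalr, result_ex, imm_out_ex):
    """Return the next PC for one cycle of the IF stage (behavioural sketch)."""
    pc_4 = (pc + 4) & MASK32                        # sequential path
    pc_target = (pc + (imm_out_ex << 1)) & MASK32   # PC + (immOut shifted left by 1)
    if jalr:
        # JALR: the PC is loaded with the ALU result from the EX stage
        return result_ex & MASK32
    if pc_src:
        # taken branch or jump: PC-relative target
        return pc_target
    return pc_4
```

With neither PC_src nor jalr asserted the model simply returns PC + 4, which corresponds to sequential execution.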
3.3 Instruction Decode module
Figure 3.3: Instruction Decode ID.v Module
The ID.v module (Instruction Decode), as shown in Figure 3.3, is driven by the
clock (clk) and reset (rst) signals. It relies on three signals from the WB (Write-Back)
stage: RegWrite, rd_WB, and Data_WB. When the RegWrite signal is active, the
Data_WB value is stored in the register at the address given by rd_WB.
The ID stage also takes the instruction code from the IF stage and decodes it. It then
passes three pieces of information (function3, opcode, and rd) to the stages that follow.
The PC value from the IF stage is carried forward unchanged to the next stage. Another
signal, alu_ctrl, is also passed to the next stage; it tells the ALU (Arithmetic and Logical
Unit) which operation to perform. Lastly, immOut is the immediate value embedded in the
instruction code.

Figure 3.4: ID Stage Architecture
As shown in the internal architecture of the ID stage in Figure 3.4, it comprises
three components: the register file, the immediate generator, and the control
unit modules. The register file contains all registers from x0 to x31. It also
generates a signal called read_data_valid, indicating when the data retrieved from the
read ports of the register file is valid.
The instruction code is decoded within this module, which extracts three fields: rs1
(register source 1), rs2 (register source 2), and rd (register destination). The instruction
code is also fed into the immediate generator module, where the immediate value is
extracted and sign-extended according to the type of instruction indicated by the opcode
signal.
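The immediate generator can be illustrated with the following sketch, which follows the immediate layouts of the RV32I base ISA. The function names sign_extend and imm_gen are hypothetical. The sketch returns the full byte offset defined by the ISA for branches and jumps; the report's own module may instead drop the implicit zero bit, since the IF stage shifts immOut left by one, so this detail is an assumption.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def imm_gen(instr):
    """Extract the sign-extended immediate of a 32-bit RV32I instruction."""
    opcode = instr & 0x7F
    if opcode in (0x13, 0x03, 0x67):      # I-type: OP-IMM, LOAD, JALR
        return sign_extend(instr >> 20, 12)
    if opcode == 0x23:                    # S-type: STORE
        imm = ((instr >> 25) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(imm, 12)
    if opcode == 0x63:                    # B-type: BRANCH
        imm = ((((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11)
               | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if opcode in (0x37, 0x17):            # U-type: LUI, AUIPC
        return sign_extend(instr & 0xFFFFF000, 32)
    if opcode == 0x6F:                    # J-type: JAL
        imm = ((((instr >> 31) & 0x1) << 20) | (((instr >> 12) & 0xFF) << 12)
               | (((instr >> 20) & 0x1) << 11) | (((instr >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    return 0                              # R-type carries no immediate
```

The opcode alone selects the layout, which is why the immediate generator needs no extra control signal from the control unit.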
ALU Operation    alu_ctrl
ADD              0000
SUB              0001
AND              0010
OR               0100
XOR              1000
SRL              1001
SLL              1010
SRA              1100
BUF              1101
Table 3.1: Operations-to-Bits Mapping
Furthermore, the instruction code is passed to the control unit module. The control
unit generates the alu_ctrl signal, which is then sent to the EX stage. The alu_ctrl
mapping is given in Table 3.1.
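As an illustration, the mapping of Table 3.1 can be exercised with a behavioural ALU model. This is a sketch under stated assumptions: the function name alu is hypothetical, operands are treated as unsigned 32-bit values, shift amounts use the low five bits of the second operand as in RV32I, and BUF is assumed to pass the second operand through unchanged, which the report does not state.

```python
MASK32 = 0xFFFFFFFF

# alu_ctrl encodings copied from Table 3.1
ADD, SUB, AND, OR, XOR = 0b0000, 0b0001, 0b0010, 0b0100, 0b1000
SRL, SLL, SRA, BUF = 0b1001, 0b1010, 0b1100, 0b1101


def alu(alu_ctrl, a, b):
    """Behavioural 32-bit ALU; a and b are unsigned 32-bit operands."""
    shamt = b & 0x1F                       # RV32I shifts use the low 5 bits
    if alu_ctrl == ADD:
        result = a + b
    elif alu_ctrl == SUB:
        result = a - b
    elif alu_ctrl == AND:
        result = a & b
    elif alu_ctrl == OR:
        result = a | b
    elif alu_ctrl == XOR:
        result = a ^ b
    elif alu_ctrl == SRL:
        result = a >> shamt
    elif alu_ctrl == SLL:
        result = a << shamt
    elif alu_ctrl == SRA:
        # reinterpret a as signed so that Python's >> replicates the sign bit
        signed_a = a - (1 << 32) if a & 0x80000000 else a
        result = signed_a >> shamt
    elif alu_ctrl == BUF:
        result = b                         # assumption: BUF forwards operand b
    else:
        raise ValueError("unknown alu_ctrl")
    return result & MASK32                 # wrap to 32 bits
```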
3.4 Execute Stage module
Figure 3.5: Execution Stage EX.v Module
The EX.v module, shown in Figure 3.5, receives its inputs from the preceding ID stage:
Read1, Read2, alu_ctrl, alu_data_valid, rd, immOut, func3, opcode, and PC.
Several of these inputs, namely PC, rd, immOut, func3, and opcode, are passed directly
to the subsequent stage so that they remain available further down the pipeline.
The module also feeds the result it generates and the immOut value back to the IF stage.
This feedback is needed for branch and jump instructions.
Among the received inputs, the opcode drives the multiplexers inside the EX stage that
select the operands sent to the ALU, so that the ALU performs the operation required by
the instruction.
The func3 signal matters in particular for store instructions. It controls the sign
extension applied to the contents of Read2 before they are sent to the subsequent stage
for storage in the data memory unit.

Figure 3.6: EX stage Architecture
The internal architecture of the EX stage is shown in Figure 3.6. This stage comprises
three sub-modules together with multiplexers and comparators. At its core, the
Arithmetic and Logical Unit (ALU) performs operations on the operands. These operations
are selected by the alu_ctrl signal, which is generated by the Control Unit during the
Instruction Decode (ID) stage and reaches the ALU through the pipeline register.
Two multiplexers placed before the ALU select the operands required by each instruction.
For Register-Register and Branch type instructions, Read1 and Read2 are sent to the ALU.
For Immediate-Register, Load and Store instructions, Read1 and immOut are passed to
the ALU. For Jump instructions, PC and immOut are passed to the ALU. The select signals
of the two multiplexers are generated from the opcode signal passed from the ID stage.
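The operand selection performed by the two multiplexers can be sketched as below. The function name select_operands is hypothetical and the opcode constants are the standard RV32I major opcodes. Only the cases stated in the text are covered; JALR and the Upper type instructions are left out because the report does not describe their operand paths.

```python
# RV32I major opcodes (instruction bits 6:0)
OP, OP_IMM, LOAD, STORE = 0x33, 0x13, 0x03, 0x23
BRANCH, JAL = 0x63, 0x6F


def select_operands(opcode, read1, read2, imm_out, pc):
    """Return the (a, b) operand pair fed to the ALU for a given opcode."""
    if opcode in (OP, BRANCH):            # register-register and branch types
        return read1, read2
    if opcode in (OP_IMM, LOAD, STORE):   # immediate, load and store types
        return read1, imm_out
    if opcode == JAL:                     # jump: PC and immediate
        return pc, imm_out
    raise ValueError("opcode not covered by this sketch")
```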
The Store Sign Extend module extends words, half-words, or bytes according to the
specific store instruction. The output of this module is passed to the subsequent stage,
where it drives the Data Write bus of the data memory.
The Branch-Jump module determines whether a branch is taken. It examines the flags
produced by the ALU in the context of the branch instruction being executed. For a
branch instruction, the decision is made from the flag values: PC_src is set to '1' when
the branch is taken and to '0' when it is not. When a jump instruction occurs, PC_src is
set directly to '1'.
The jalr signal indicates a JALR instruction, which is the only instruction that requires
the PC to be changed to the result produced by the ALU.
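The decision logic of the Branch-Jump module can be sketched as follows. The function name branch_jump is hypothetical, and the zero flag is an assumption, since the report names only the lt and ltu flags. The func3 encodings of the six branch instructions are those of the RV32I base ISA.

```python
BRANCH, JAL, JALR = 0x63, 0x6F, 0x67   # RV32I major opcodes


def branch_jump(opcode, func3, zero, lt, ltu):
    """Return (PC_src, jalr) from the opcode, func3 and the ALU flags."""
    if opcode == JAL:
        return 1, 0                     # jumps always redirect the PC
    if opcode == JALR:
        return 1, 1                     # jalr additionally selects the ALU result
    if opcode != BRANCH:
        return 0, 0
    taken = {
        0b000: zero,        # BEQ
        0b001: not zero,    # BNE
        0b100: lt,          # BLT
        0b101: not lt,      # BGE
        0b110: ltu,         # BLTU
        0b111: not ltu,     # BGEU
    }.get(func3, False)     # unused func3 values never branch
    return int(bool(taken)), 0
```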
3.5 Memory Stage module
Figure 3.7: Memory Stage MEM.v Module
The MEM module, shown in Figure 3.7, receives inputs from the prior stages, including
func3, opcode, the ALU flags (lt and ltu), rd_r (read address of the register file), and PC.
These inputs are passed on to the next stage so that they remain available in the pipeline.
This module also handles data memory interactions using the signals d_data (data read
from the data memory), d_addr (address for the data memory), rd (read signal), wr (write
signal), and cs_d_n (chip select for the data memory). These signals manage communication
with the data memory unit. Data Store is the data to be written into the data memory
for Store-type instructions.
Figure 3.8: MEM Stage Architecture
The internal architecture of the MEM stage is shown in Figure 3.8. The opcode
undergoes two comparisons within the module: comparators match it against the opcodes
of the load and store instructions. On a load match, the rd (read) signal is activated. On
a store match, the wr (write) signal is activated. When either wr or rd is active, the chip
select signal cs_d_n is also asserted. When no memory operation is in progress, the memory
interface lines are placed in a high-impedance state (Z-state).
The rest of the signals are forwarded to the subsequent stage without alteration, so that
the relevant data remains available for further processing.
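The two opcode comparisons can be sketched as a small function. The name mem_control is hypothetical; the opcode values are the RV32I LOAD and STORE major opcodes. The high-impedance behaviour of the interface lines is not modelled here, and because cs_d_n is active low, "asserted" is represented by the value 0.

```python
LOAD, STORE = 0x03, 0x23   # RV32I major opcodes


def mem_control(opcode):
    """Return (rd, wr, cs_d_n) for the data-memory interface."""
    rd = int(opcode == LOAD)            # comparator against the load opcode
    wr = int(opcode == STORE)           # comparator against the store opcode
    cs_d_n = 0 if (rd or wr) else 1     # active low: 0 selects the memory
    return rd, wr, cs_d_n
```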

3.6 Writeback Stage module
Figure 3.9: Writeback Stage WB.v Module
The WB.v module is shown in Figure 3.9. It generates DataOut_WB, the value to be
written back to the register file, depending on the type of instruction indicated by opcode
and func3. The signals RegWrite and rd_WB instruct the register file to write to the
address indicated by rd_WB.
Figure 3.10: WB Stage Architecture

As shown in Figure 3.10, the SIGN EXTEND module plays a central role in this
phase. It extends the data read from the data memory. This operation is required for
load instructions such as Load Half-Word (LH), Load Byte (LB), Load Half-Word
Unsigned (LHU), and Load Byte Unsigned (LBU), which need the data from the MEM
stage's data memory to be sign-extended (LH, LB) or zero-extended (LHU, LBU) to 32
bits. The signedness and the width (word, half-word, or byte) are encoded within the
func3 field.
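The extension performed for load instructions can be illustrated as below. The function name load_extend is hypothetical, and the sketch assumes that the addressed byte or half-word has already been aligned into the low bits of the word read from memory. The func3 encodings are those of the RV32I load instructions.

```python
def load_extend(d_data, func3):
    """Extend the word read from data memory according to func3."""
    if func3 == 0b000:                       # LB: sign-extend a byte
        value = d_data & 0xFF
        return value | 0xFFFFFF00 if value & 0x80 else value
    if func3 == 0b001:                       # LH: sign-extend a half-word
        value = d_data & 0xFFFF
        return value | 0xFFFF0000 if value & 0x8000 else value
    if func3 == 0b010:                       # LW: full word, no extension
        return d_data & 0xFFFFFFFF
    if func3 == 0b100:                       # LBU: zero-extend a byte
        return d_data & 0xFF
    if func3 == 0b101:                       # LHU: zero-extend a half-word
        return d_data & 0xFFFF
    raise ValueError("func3 is not a load width")
```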
The lt and ltu flags, representing the less-than and less-than-unsigned comparisons
respectively, are combined through an OR gate to act as the selection line for a multiplexer.
This multiplexer selects between the 32-bit values '1' and '0'. This mechanism is needed
for SET instructions: depending on the result from the ALU, the appropriate value ('1' or
'0') is written back to the designated register in the register file.
When the opcode belongs to a jump instruction, the value written back must be the
return address of the jump instruction, namely PC + 4. This addition (PC + 4) is therefore
performed here and the result is routed back to the ID stage.
For R-type, I-type, and U-type instructions, the result generated by the ALU is written
back to the register bank.
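The write-back selection described in this section can be summarised in a short sketch. The function name writeback_data and its argument list are hypothetical. The sketch assumes that the load data has already been extended, and that the EX stage raises only the flag (lt or ltu) that matches the SET instruction being executed, which is what makes the OR of the two flags a valid select.

```python
OP, OP_IMM, LOAD, JAL, JALR = 0x33, 0x13, 0x03, 0x6F, 0x67   # RV32I opcodes


def writeback_data(opcode, func3, alu_result, load_data, pc, lt, ltu):
    """Choose the value written back to the register file."""
    if opcode in (JAL, JALR):
        return (pc + 4) & 0xFFFFFFFF          # return address of the jump
    if opcode == LOAD:
        return load_data                      # extended data from memory
    if opcode in (OP, OP_IMM) and func3 in (0b010, 0b011):
        return 1 if (lt or ltu) else 0        # SLT/SLTU and their immediate forms
    return alu_result                         # remaining R-, I- and U-type results
```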
3.7 Summary
To summarize, this chapter deals with the design optimization and the stages of the
pipelined processor design: Instruction Fetch, Instruction Decode, Execute, Memory and
Write-back. The functionality of each stage is explained. All these stages are included in
a single top-level module, Core, which interfaces the processor with the data memory and
the instruction memory.
The next chapter deals with the control and status registers, which control the
processor's behaviour, and the branch prediction unit, a special module that improves
instruction throughput and overall performance by reducing the stalls that branch
instructions create in the pipeline.

Chapter 4
Control Status Registers & Branch
Prediction Unit

4.1 Modes in RISC-V
The modes in RISC-V are used to control the access to resources and to protect the
system from unauthorized access. The modes in RISC-V are implemented using a mech-
anism called privilege levels. Each mode has a different privilege level, and instructions
can only be executed in modes with a privilege level equal to or greater than the privilege
level of the instruction. RISC-V has four different modes:
• Machine Mode: The highest-privilege mode and the mode the processor enters at reset.
• User Mode: Used by user applications. Fewer privileges than machine mode.
• Supervisor Mode: Used by the operating system. More privileges than user mode.
• Hypervisor Mode: Used by hypervisors. More privileges than supervisor mode.
4.2 Machine Mode in RISC-V

Machine mode in RISC-V is the highest privilege mode in the RISC-V processor’s
privilege hierarchy. In machine mode, the processor has access to all resources and con-
trol registers and can execute privileged instructions, including those that affect memory
protection, virtual memory management, and interrupt handling. Machine mode can di-
rectly access all memory and I/O devices and perform operations that are not available in
lower privilege modes. Key features of machine mode being implemented in the processor
are:
1. Control and Status Registers (CSRs): Machine mode has access to all control and
status registers (CSRs), which provide information about the processor’s state and
configuration. CSRs are used for managing interrupts, exceptions, memory protec-
tion, and other system-related tasks.
2. Interrupts and Exceptions: Machine mode handles the highest priority interrupts
and exceptions, including external interrupts from peripherals and environment
calls (ECALLs) from user-level programs. It uses the mtvec register to set the
base address of the trap vector, where it can find interrupt and exception handlers.
Machine mode provides the foundation for creating a secure and robust operating sys-
tem that can manage and protect user-level applications and services effectively. It is
crucial for the proper functioning and security of the RISC-V-based systems. In order to
implement the M-mode, the following procedure is used:
1. CSR address, function and opcode extraction: To execute CSR instructions, the
address of the register being written, read, set or cleared has to be known. The
addresses of the M-mode registers being used are:
• Machine status register(mstatus): 300H
• Machine interrupt-enable register (mie): 304H
• Machine trap-handler base address(mtvec): 305H
• Machine exception program counter(mepc): 341H
• Machine trap cause(mcause): 342H
• Machine interrupt pending(mip): 344H
These addresses are carried in bits [31:20] of the 32-bit instruction. The
instructions used to modify the control and status registers are:
(a) CSR Read-Write (csrrw): reads the value from the specified CSR into
the destination register (rd) and then writes the value from the source register
into the CSR. This instruction is useful for swapping values between a general-
purpose register and a CSR.
(b) CSR Read and Set (csrrs): reads the value from the specified CSR into
the destination register (rd) and then sets selected bits in the CSR using the
value from the source register. It is commonly used for setting specific control
bits in the CSR without affecting other bits.
(c) CSR Read and Clear (csrrc): reads the value from the specified CSR into
the destination register (rd) and then clears selected bits in the CSR using the
value from the source register. Like csrrs, it is useful for modifying specific
control bits in the CSR.

(d) CSR Read-Write Immediate (csrrwi): reads the value from the specified
CSR into the destination register (rd) and then writes a zero-extended 5-bit
immediate value into the CSR. It is used for directly writing a small immediate
value into the CSR.
(e) CSR Read and Set Immediate (csrrsi): reads the value from the specified
CSR into the destination register (rd) and then sets selected bits in the CSR
using a zero-extended 5-bit immediate. This instruction is used to directly set
specific bits in the CSR.
(f) CSR Read and Clear Immediate (csrrci): reads the value from the spec-
ified CSR into the destination register (rd) and then clears selected bits in the
CSR using a zero-extended 5-bit immediate. It is used to directly clear specific
bits in the CSR.
2. Establishing the state of the M-mode logic: The processor has four states:
(a) Reset: when the reset signal is enabled, the instruction address is reset to
BOOT_ADDRESS.
(b) Operating: this state represents normal operation.
(c) Trap taken: this state indicates that a trap has occurred and control is being
transferred to the trap handler.
(d) Trap return: this state indicates that the trap handler has handled the trap and
the processor is returning to normal operation. ecall and ebreak are examples of
instructions that cause a trap, while mret triggers the trap return.
3. Assigning the respective addresses and program counter values according to the
given states: The program counter must follow the operation of the processor at all
times. To do this, an address is assigned for the program counter to take in each
of the above states:
(a) PC_BOOT = BOOT_ADDRESS
(b) PC_EPC = exception program counter
(c) PC_TRAP = trap address
(d) PC_NEXT = next address

4. Modifying the control status registers according to the state of M-mode: The im-
portance of each register and how they are manipulated is discussed in detail:
• Mstatus: The mstatus register is essential for managing privilege levels and
handling interrupts efficiently in the RISC-V architecture. It allows the proces-
sor to switch between different privilege levels and maintain the state required
for proper exception handling and context switching during interrupts and
traps. Some of the key fields in the mstatus register are:
(a) Machine Previous Interrupt Enable(MPIE): This bit represents the in-
terrupt enable status of the machine before an interrupt or trap occurs.
It helps save the interrupt enable status when transitioning to a higher
privilege level.
(b) Machine Interrupt Enable(MIE): This bit controls whether machine-level
interrupts are enabled. If MIE is set to 1, machine-level interrupts can be
processed; otherwise, they are ignored.
To modify mstatus, its bit 3 (MIE) and bit 7 (MPIE) are manipulated.
• Mie: The "mie" register has one bit per interrupt source, allowing the processor
to enable or disable interrupts on a per-source basis. When a bit in the
"mie" register is set to 1, it means that the corresponding interrupt source is
enabled, and the processor can respond to interrupts generated by that source.
Conversely, if a bit is set to 0, interrupts from that source are masked, and the
processor will ignore them. The "mie" register is closely related to the "mip"
(Machine Interrupt Pending) register, which indicates the pending interrupts
at the machine-level. When an interrupt is raised by an enabled source, the
corresponding bit in the "mip" register is set to 1. If the corresponding bit in
the "mie" register is also set to 1, the processor will take appropriate actions
to handle the interrupt.
• Mcause: When an exception or trap occurs in the RISC-V processor, the
"mcause" register is automatically updated with a specific value that indicates
the cause of the exception. The value in the "mcause" register helps identify
the type of interrupt or exception that triggered the trap, allowing the system
software to take appropriate actions and handle the event accordingly.

• Mtvec: The "mtvec" register contains the base address of the trap vector
table, which is a contiguous block of memory holding the entry points of the
exception handling routines. When a trap or exception occurs in Machine
mode, the processor uses the value in the "mtvec" register to determine the
starting address of the corresponding exception handling routine. The
"mtvec" register is divided into two parts:
– Base Address: The upper bits (bits 31:2) of the "mtvec" register hold
the base address of the trap vector table. The two low-order address bits are
taken as zero, so the trap vector table is aligned on a 4-byte boundary.
– Mode Control: The lower bits (bits 1:0) of the "mtvec" register are used for
mode control, which specifies the mode of the trap vector. There are two
modes: Direct mode and Vectored mode.
∗ In Direct mode (mode field set to 0), the processor jumps directly to the
base address specified in the "mtvec" register when a trap occurs. This
address must point to the exception handling routine.
∗ In Vectored mode (mode field set to 1), the processor uses the base address
specified in "mtvec" to calculate the address of the handling routine for
interrupts: the interrupt cause code from the "mcause" register, multiplied
by 4, is added to the base address to index into the trap vector table.
Synchronous exceptions still jump to the base address.
Figure 4.1: CSR Control Unit

Figure 4.2: CSR state diagram
The control and status register logic operates in four states, namely the reset,
operating, trap and return states. Initially the processor is in the reset state, from
which it moves to the operating state. The operating state is the normal state of the
processor. When an exception occurs, it moves into the trap state, as shown in Figure 4.2.
After handling the exception, the state goes back to the operating state. When a return
instruction is identified, it goes to the trap-return state, which then goes back to the
operating state after the return is executed.
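The state machine and the program counter assignment of steps 2 and 3 can be sketched together as follows. This is a behavioural sketch, not the project's Verilog: the names State, next_state and select_pc are hypothetical and the value of BOOT_ADDRESS is an assumption, since the report does not give it. The two mstatus helpers follow the RISC-V privileged specification (on a trap MPIE takes the value of MIE and MIE is cleared; on mret MIE is restored from MPIE and MPIE is set), which may differ in detail from the implemented logic.

```python
from enum import Enum

BOOT_ADDRESS = 0x00000000   # assumption: the boot address is not given in the report
MIE_BIT, MPIE_BIT = 3, 7    # mstatus bit positions named in the text


class State(Enum):
    RESET = 0
    OPERATING = 1
    TRAP_TAKEN = 2
    TRAP_RETURN = 3


def next_state(state, trap, mret):
    """Operating leaves on a trap or an mret; every other state returns to it."""
    if state is State.OPERATING:
        if trap:
            return State.TRAP_TAKEN
        if mret:
            return State.TRAP_RETURN
    return State.OPERATING


def select_pc(state, pc_next, trap_address, mepc):
    """Pick the program counter value for the given state."""
    if state is State.RESET:
        return BOOT_ADDRESS     # PC_BOOT
    if state is State.TRAP_TAKEN:
        return trap_address     # PC_TRAP
    if state is State.TRAP_RETURN:
        return mepc             # PC_EPC
    return pc_next              # PC_NEXT


def mstatus_on_trap(mstatus):
    """Save MIE into MPIE, then clear MIE."""
    mie = (mstatus >> MIE_BIT) & 1
    mstatus &= ~((1 << MIE_BIT) | (1 << MPIE_BIT))
    return mstatus | (mie << MPIE_BIT)


def mstatus_on_mret(mstatus):
    """Restore MIE from MPIE, then set MPIE to 1."""
    mpie = (mstatus >> MPIE_BIT) & 1
    mstatus &= ~(1 << MIE_BIT)
    return mstatus | (mpie << MIE_BIT) | (1 << MPIE_BIT)
```

Keeping the state transition and the PC selection as two separate functions mirrors the split between steps 2 and 3 of the procedure.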
4.3 Branch Prediction Unit

Branch prediction is a technique used in processors to enhance performance by fetch-
ing and executing instructions along the predicted path. When a conditional branch is
encountered, the processor must decide whether it will be taken or not taken. Two main
types of branch prediction exist:
• Fixed Branch Prediction: In this approach, the processor consistently makes the
same guess, either predicting that the branch will be taken or not taken. While
simple, fixed branch prediction may not be very accurate as it doesn’t adapt to the
program’s behavior.
• True Branch Prediction: In this approach, the predictor attempts to anticipate whether
the branch will be taken or not taken based on information about the program, such as
its structure or the patterns observed during execution. It can be further classified into
Static Prediction and Dynamic Prediction.
Figure 4.3: Block Diagram of Branch Prediction Unit
1. Static Prediction: Static branch prediction is based on analyzing the control
flow graph of the program. It relies on fixed rules and does not consider past
behavior.
2. Dynamic Prediction: Dynamic branch prediction, on the other hand, relies on
the execution history. It uses historical behavior to make predictions on
the outcome of branches during program execution.
Under the design, 2-bit dynamic branch prediction is incorporated. The predicted
Program Counter (PC) will be stored in a buffer called the Branch Target Buffer(BTB),
as depicted in Figure 4.3. The BTB is a table that stores the target address of recently
executed branches. This crucial component plays a pivotal role in enhancing the effi-
ciency of the branch prediction unit, enabling more accurate predictions and facilitating
smoother program execution.

Figure 4.4: Finite State Machine of the Branch Prediction Unit
A Finite State Machine (FSM) within the branch prediction unit is illustrated in Figure
4.4. It comprises four distinct states: Strongly Taken, Weakly Taken, Weakly Not Taken,
and Strongly Not Taken.
The FSM's initial state is Weakly Not Taken. Upon encountering the subsequent
branch, if it is taken, the FSM transitions to the Weakly Taken state. Conversely, if
the next branch is not taken, the FSM shifts to the Strongly Not Taken state. Should
a branch be taken following the Strongly Not Taken state, the FSM moves to the
Weakly Not Taken state. Similarly, if a taken branch follows the Weakly Taken state, the
FSM advances to the Strongly Taken state.
The primary function of this FSM is to forecast whether the forthcoming branch
will be taken or not. This prognostication hinges on the historical sequence of branch
outcomes. As an instance, if the past two branches were taken, the FSM is inclined to
anticipate that the subsequent branch will follow suit in being taken.
This approach reduces the number of mispredictions and improves processor perfor-
mance by accurately predicting branch outcomes. By implementing 2-bit dynamic branch
prediction, the aim is to achieve higher prediction accuracy, reduce pipeline stalls, and
optimize the overall performance of the processor for various applications.
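A 2-bit dynamic predictor with a Branch Target Buffer can be sketched as below. The sketch uses the standard saturating-counter form of the 2-bit scheme with Weakly Not Taken as the initial state. The class name BranchPredictor, the counter encoding, the use of one counter per branch address and the dictionary-based BTB are assumptions made for illustration and may differ from the implemented unit.

```python
# Counter encoding (assumption): 0 = Strongly Not Taken, 1 = Weakly Not Taken,
# 2 = Weakly Taken, 3 = Strongly Taken
WEAKLY_NOT_TAKEN = 1


class BranchPredictor:
    def __init__(self):
        self.counters = {}   # branch PC -> 2-bit saturating counter
        self.btb = {}        # branch PC -> last known target address

    def predict(self, pc):
        """Return (hit, taken, next_pc) for the instruction at pc."""
        hit = pc in self.btb
        taken = hit and self.counters.get(pc, WEAKLY_NOT_TAKEN) >= 2
        next_pc = self.btb[pc] if taken else pc + 4
        return hit, taken, next_pc

    def update(self, pc, taken, target):
        """Train the counter with the resolved outcome of the branch at pc."""
        counter = self.counters.get(pc, WEAKLY_NOT_TAKEN)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[pc] = counter
        if taken:
            self.btb[pc] = target   # remember where the branch went
```

A branch that has never been seen misses in the BTB and falls through to PC + 4; after one taken outcome the counter reaches Weakly Taken and the stored target is predicted.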

4.4 Summary
To summarize, Control and Status Registers (CSRs) are a vital component of the
RISC-V instruction set architecture. They provide a way to control and monitor the
processor's behavior, configuration, and operational status. Branch prediction is a
crucial aspect of modern processor design that helps improve instruction throughput and
overall performance by reducing the impact of branch instructions (such as conditionals
and loops) on the pipeline.
The next chapter presents and discusses the results of the processor design.

Chapter 5
Results & Discussions

This chapter discusses the results of the test program run on the Verilog design of
the processor. To test the working of the processor, a test assembly program consisting
of all the supported instructions is written and assembled using the RARS software. The
instruction code generated by RARS is loaded into the instruction memory of the Verilog
design using the Verilog system task '$readmemh'.
The first two sections discuss the outputs of the unpipelined and pipelined versions of
the processor. In the subsequent sections, the Branch prediction unit and the CSR
module are tested as stand-alone modules, using appropriate instructions to verify their
working.
5.1 Unpipelined Processor
Figure 5.1: Test program for Register, Immediate and Store type instructions
The assembly test program covering all the R-type, I-type and S-type instructions is
given in Figure 5.1.
In the subsequent figures, the step outputs from the RARS software are given on the
left side of the figure and the corresponding Vivado outputs are given on the right side.
In the Vivado output two signals are indicated, namely ALU Result and WriteBack.
Figure 5.2: Output of Unpipelined Processor-1
The outputs for the first 11 instructions are shown in Figure 5.2. On the left
side, one can observe the values changed in the register file and data memory by the
RARS software; these are compared with the design on the right side, which is the output
waveform of the Verilog code. The outputs from RARS and the Verilog model match
exactly, which verifies the working of the processor. The subsequent Figures 5.3 and 5.4
show the outputs for the next instructions.

Figure 5.5: Test program to check Branch, Jump and Upper type instructions

Figure 5.6: Output for the Branch, Jump and Upper type instructions
The test program for the Branch, Jump and Upper type instructions is given in
Figure 5.5. When run on the RARS software, the program shows the PC looping
indefinitely between 00000028H and 00000044H. This can be observed in the output
waveform in Figure 5.6 by looking at the signal i_addr, which is highlighted as PC. This
verifies the working of the Branch, Jump and Upper type instructions of the designed
processor.
5.2 Pipelined Processor

The design is divided into 5 pipeline stages and a simple test assembly program to
verify the data moving in the pipeline stages is given in the Figure 5.7.

Figure 5.7: Test program for the pipelined processor
The output signals i_data, result and DataOut_WB are shown in Figure 5.8.
The instruction code for the instruction addi t0,x0,0x00f is fetched by the IF stage
(highlighted as a). The result is then calculated in the Execute stage, where it appears
two cycles later (highlighted as b), and it is finally written back to the register file in
the Write-Back stage (highlighted as c).
Figure 5.8: Output of the 5-stage pipelined processor
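The timing described above can be illustrated with a small behavioral sketch in Python. It is not the report's Verilog: the function names are hypothetical, and the schedule assumes an ideal pipeline with no stalls or flushes. It encodes addi t0,x0,0x00f as an I-type word and lists the cycle in which the instruction occupies each stage.

```python
# Illustrative behavioral model only; names are hypothetical, not from the report.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def encode_addi(rd, rs1, imm):
    """Encode an RV32I ADDI (I-type) instruction as a 32-bit word."""
    opcode, funct3 = 0x13, 0x0
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def stage_schedule(fetch_cycle):
    """Map each stage to the cycle in which the instruction occupies it,
    assuming an ideal pipeline (no stalls, no flushes)."""
    return {stage: fetch_cycle + i for i, stage in enumerate(STAGES)}

word = encode_addi(rd=5, rs1=0, imm=0x00F)  # t0 is register x5
schedule = stage_schedule(fetch_cycle=0)
print(f"addi t0,x0,0x00f -> 0x{word:08X}")
print(schedule)
```

Under these assumptions the Execute stage falls two cycles after the fetch and the Write-Back stage four cycles after it, which agrees with the ordering of the markers a, b and c.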
5.3 Branch Prediction Unit output

The predicted PC is compared with PC_IF; if they are equal, it is a hit. The global
history associated with the predicted PC then determines whether the branch is
predicted taken or not taken, and the hit and taken signals together decide the next
PC address. Figure 5.9 shows the output of the Branch Prediction Unit:
Figure 5.9: Output of the Branch Prediction Unit
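The hit and taken decision can be sketched as a small Python model. This is an assumption-laden illustration rather than the report's implementation: the class name, the 2-bit history length, the use of 2-bit saturating counters indexed by the global history, and the example addresses are all chosen here for demonstration.

```python
# Illustrative sketch; table sizes and indexing scheme are assumptions.
class BranchPredictor:
    def __init__(self, history_bits=2):
        self.history_bits = history_bits
        self.ghr = 0                            # global history register
        self.pht = [1] * (1 << history_bits)    # 2-bit counters, start weakly not taken
        self.btb = {}                           # branch PC -> predicted target

    def predict(self, pc_if):
        """Return (hit, taken, next_pc) for the PC currently in the IF stage."""
        hit = pc_if in self.btb
        taken = self.pht[self.ghr] >= 2
        next_pc = self.btb[pc_if] if (hit and taken) else pc_if + 4
        return hit, taken, next_pc

    def update(self, pc, target, was_taken):
        """Train the tables with the resolved branch outcome."""
        self.btb[pc] = target
        counter = self.pht[self.ghr]
        self.pht[self.ghr] = min(counter + 1, 3) if was_taken else max(counter - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.ghr = ((self.ghr << 1) | int(was_taken)) & mask

bp = BranchPredictor()
first = bp.predict(0x44)           # cold: no entry yet, fall through to PC + 4
for _ in range(3):
    bp.update(0x44, 0x28, True)    # backward branch resolved as taken three times
trained = bp.predict(0x44)
print(first, trained)
```

Only when both hit and taken are true does the predictor redirect the fetch to the stored target; in every other case the next PC is simply PC + 4.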
5.4 Execution of CSRWI instruction

The CSR instruction “csrwi mtvec,1” writes a zero-extended immediate (zimm) value of 1
to the mtvec register. The mtvec register is written with the value 1, as shown in Figure 5.10:
Figure 5.10: Writing to the mtvec register using CSR write immediate (csrwi)
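For reference, the encoding and effect of this instruction can be reproduced with a short Python sketch. csrwi is the assembler shorthand for csrrwi with rd = x0, and mtvec has CSR address 0x305. The helper names and the dictionary used as a CSR file are hypothetical, and the model ignores any field restrictions a real mtvec implementation may impose.

```python
# Illustrative sketch; helper names and the dict-based CSR file are assumptions.
MTVEC = 0x305  # CSR address of the machine trap-vector register

def encode_csrrwi(rd, csr, uimm):
    """Encode CSRRWI: csr[31:20] | uimm[19:15] | funct3=101 | rd[11:7] | opcode=SYSTEM."""
    opcode, funct3 = 0x73, 0b101
    return (csr << 20) | ((uimm & 0x1F) << 15) | (funct3 << 12) | (rd << 7) | opcode

def execute_csrrwi(word, csr_file, regs):
    """Write the zero-extended 5-bit immediate to the CSR; read the old value if rd != x0."""
    csr = (word >> 20) & 0xFFF
    uimm = (word >> 15) & 0x1F
    rd = (word >> 7) & 0x1F
    if rd != 0:
        regs[rd] = csr_file.get(csr, 0)
    csr_file[csr] = uimm

word = encode_csrrwi(rd=0, csr=MTVEC, uimm=1)   # csrwi mtvec,1
csr_file, regs = {MTVEC: 0}, [0] * 32
execute_csrrwi(word, csr_file, regs)
print(f"0x{word:08X}", csr_file[MTVEC])
```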
5.5 Execution of Ecall & Mret

The Ecall instruction causes a trap to be taken, which switches the processor from its
normal operating mode to trap mode in order to handle the trap; the Mret instruction
returns the processor from the trap state. Figure 5.11 shows the current state of
M-mode switching from the operating state to the trap-taken state for the ecall
instruction, and from the operating state to the trap-return state for the mret instruction.

Figure 5.11: Ecall and mret instruction execution
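The state switching can be summarised as a small finite state machine. The Python sketch below is only an illustration under stated assumptions: the state and class names are hypothetical, mtvec is assumed to be in direct mode with an arbitrary base of 0x100, and only the mepc, mcause, MIE and MPIE updates are modelled.

```python
# Illustrative FSM sketch; state names, class name and mtvec value are assumptions.
from enum import Enum

class State(Enum):
    OPERATING = 0
    TRAP_TAKEN = 1
    TRAP_RETURN = 2

class TrapFsm:
    def __init__(self, mtvec=0x100):
        self.state = State.OPERATING
        self.mtvec, self.mepc, self.mcause = mtvec, 0, 0
        self.mie, self.mpie = 1, 1

    def step(self, pc, instr):
        """Process one instruction mnemonic and return the next PC."""
        if instr == "ecall":
            self.state = State.TRAP_TAKEN
            self.mepc, self.mcause = pc, 11      # 11 = environment call from M-mode
            self.mpie, self.mie = self.mie, 0    # save and clear interrupt enable
            return self.mtvec                    # direct mode: jump to the base address
        if instr == "mret":
            self.state = State.TRAP_RETURN
            self.mie, self.mpie = self.mpie, 1   # restore interrupt enable
            return self.mepc                     # a real handler would first add 4 to mepc
        self.state = State.OPERATING
        return pc + 4

fsm = TrapFsm()
handler_pc = fsm.step(0x20, "ecall")
state_after_ecall = fsm.state
return_pc = fsm.step(handler_pc, "mret")
print(state_after_ecall, hex(handler_pc), fsm.state, hex(return_pc))
```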
5.6 Summary
In summary, the outputs of the Unpipelined processor, the Pipelined processor, the
Branch Prediction Unit and the Control Status Registers are shown and the results are
discussed. The images are taken from the Vivado simulations used to test the various
modules of the processor. The next chapter presents the conclusion of this project and
its future scope.

Chapter 6
Conclusion and Future Scope

6.1 Conclusion
In this project, a data-path for the instructions of the RV32I base ISA is designed
using Verilog HDL. To simplify the design, the data-path was developed iteratively
through several versions until all instruction types under RV32I were supported. The
design was later partitioned into pipeline stage modules, adding the necessary ports to
interface with the memory units. The single-cycle processor was thus converted into a
pipelined data-path that demonstrates the propagation of instructions through the
various stages.
Firstly, a single-cycle RISC-V data-path was designed that supports every instruction
type in the RV32I Instruction Set, namely the R, I, S, B, J and U types. The next step
was to pipeline the data-path into 5 stages: Instruction Fetch (IF), Instruction
Decode (ID), Execute (EX), Memory (MEM) and Write Back (WB). In addition, a dynamic
branch prediction unit was developed to mitigate control hazards. Furthermore, the
processor was successfully equipped with support for the CSR instructions.
Thus, the project has enabled a deeper understanding of the RISC-V architecture and
demonstrated an effective blending of theoretical concepts and real-world design choices
to create a functional and flexible data-path for a RISC-V processor.
6.2 Future Scope

It is essential to acknowledge that the 5-stage pipelined RISC-V design is not without
its challenges. Pipeline hazards, such as data hazards and structural hazards, must be
carefully managed to ensure correct execution. Techniques such as forwarding and
stalling can be employed to mitigate these hazards and maintain the pipeline’s
efficiency. Moreover, memory access conflicts and cache management require careful
consideration to prevent performance bottlenecks.
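As an indication of what such hazard handling involves, the following Python sketch shows the textbook forwarding selection and load-use stall conditions for a 5-stage pipeline. It is not part of the implemented design; the function names, signal names and string return values are assumptions made for illustration.

```python
# Textbook-style sketch of future work; signal names are hypothetical.
def forward_select(id_ex_rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose the source of an ALU operand, preferring the most recent producer."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"      # result of the immediately preceding instruction
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"      # result of the instruction two positions ahead
    return "REGFILE"         # no hazard: use the value read in ID

def load_use_stall(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """A load followed directly by a consumer needs one bubble, since
    the loaded data is not available until the end of MEM."""
    return bool(id_ex_memread and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2))

print(forward_select(5, 5, True, 5, True))   # newest value wins
print(load_use_stall(True, 5, 5, 0))
```

Register x0 is excluded in both checks because it is hard-wired to zero in RV32I and never needs forwarding.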
There are ample opportunities for further optimization of the designed 5-stage
pipelined RISC-V processor. Improvements in branch prediction algorithms, data
forwarding techniques, and memory hierarchy management can lead to further performance
and efficiency gains. Additionally, exploring deeper pipelines or adopting superscalar
architectures can enhance instruction-level parallelism and provide additional
performance benefits.
6.3 Significance of the Project Work

The successful simulation of the RV32I Instruction Set Architecture data-path, com-
prising both single-cycle and 5-stage pipeline versions of the processor, holds multifaceted
significance within the realm of computer architecture. The translation of theoretical
concepts into tangible hardware not only bridges theory and practice but also nurtures a
deeper understanding among emerging computer engineers.
The realization of two distinct processor versions exemplifies implementation profi-
ciency, providing a platform for insightful comparative analysis that informs future design
strategies.
The project’s engagement with dividing the data-path into stages to simulate a
pipeline helps in breaking down the flow of data in the processor.
6.4 Limitations
Despite the notable achievements in implementing the RV32I Instruction Set Archi-
tecture data-path, the project work does exhibit certain limitations that warrant acknowl-
edgment.
The current scope of testing for the hardware designs is confined to programs authored
by the project designers themselves. While these tests provide essential validation, they
are not exhaustive and do not encompass the wide range of potential inputs and corner
cases that real-world applications may introduce.
A key limitation pertains to the project’s readiness for integration into the broader
Very Large Scale Integration (VLSI) workflow. The absence of formal verification, a
critical phase in the VLSI design process, impedes the project’s progression to
subsequent stages. Without comprehensive formal verification, the hardware designs lack
the rigorous validation needed for confident advancement.
Another limitation concerns the exploration of pipeline hazards. Although a 5-stage
pipeline version of the processor was developed, the project’s approach to handling
pipeline hazards remains incomplete. These hazards, encompassing structural, data, and
control dependencies, can significantly impact the processor’s execution efficiency and
effectiveness.
The next limitation concerns the handling of traps and interrupts using the CSRs.
Although this was developed as a separate module, the module could not be integrated
into the pipeline, and trap and interrupt handling could not be fully understood.
Additionally, the project’s testing and validation rely primarily on programs authored
by the project designers. This approach may not sufficiently cover the diverse software
workloads and scenarios that a real-world processor may encounter.
In summary, while the project work has yielded valuable insights, the aforementioned
limitations emphasize the need for further refinement and validation. Addressing these
limitations would enhance the project’s readiness for integration into the VLSI work-
flow, enable comprehensive formal verification, and ensure the thorough consideration of
pipeline hazards for a more robust processor design.
6.5 Applications
• Educational Tools: The project’s hardware designs serve as educational tools
for hands-on learning of processor architecture, benefiting students and researchers
alike.
• Research Exploration: The designs find utility in research, enabling
investigations into advanced studies and optimization techniques.
• Embedded Systems: Embedded systems development benefits from the single-cycle
version’s efficiency, catering to lightweight applications in IoT devices and
industrial controllers.
• High-Performance Computing: The pipelined version suits high-performance
computing needs, excelling in parallel execution of complex workloads.
• Open-Source Initiatives: The project’s contributions extend to open-source
hardware initiatives, fostering collaboration and innovation within the broader
community.
• Skill Development: Aspiring engineers and professionals can leverage the
hardware designs for skill development in hardware description languages, circuit
design, and simulation techniques.
6.6 Learning Outcomes of the Project

• Understanding of the RISC-V architecture.
• Experience with implementing a pipelined data-path that executes instructions
efficiently.
• Knowledge of how to implement a branch prediction technique to mitigate control
hazards.
• Understanding of the use of CSRs to control the state of the processor and to
provide information about its state.
• Usage of a variety of software tools (Vivado, RARS, RIPES) to simulate and verify
the processor design.
6.7 Summary
In summary, this section encapsulates the culmination of the project, providing an
overview of key accomplishments, future potential, and learning outcomes.
The project successfully achieved the design and implementation of a single-cycle
RISC-V processor and then enhanced instruction execution efficiency by pipelining it
into the IF, ID, EX, MEM, and WB stages. A dynamic branch prediction technique was
developed to mitigate control hazards. Future potential lies in the exploration of
advanced processor architectures and optimization strategies. Learning outcomes
encompass a sound understanding of the RISC-V architecture and hands-on experience in
processor design and dynamic branch prediction techniques.
In conclusion, the project embodies innovation and continuous learning, with its im-
pact resonating in education, research, and open-source collaboration. As the chapter
concludes, it emphasizes the dual nature of accomplishment and ongoing exploration,
paving the way for a dynamic and evolving future in computer architecture.
The subsequent section delves into the references that have guided and influenced the
project’s progression.

BIBLIOGRAPHY
[1] A. Rani and N. Grover, Novel Design of 32-bit Asynchronous (RISC) Microproces-
sor its Implementation on FPGA. International Journal of Information Engineering
and Electronic Business, 2018.
[2] Saif, M. H. Banna, N. U. Sadad, and M. N. I. Mondal, FPGA Implementation of
Educational RISC-V Processor Suitable for Embedded Applications. International
Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE,
2023.
[3] Bharath, P. Vijay, and B. SanthiBhushan, MIPS Based 32-Bit RISC Processor
with Thermal Management Unit and Flexible Pipelining Structure. IEEE 9th Uttar
Pradesh Section International Conference on Electrical, Electronics and Computer
Engineering (UPCON)., 2022.
[4] Soundari, D. V., and et al., Enhancing Network-on-chip Performance by 32-bit
RISC processor based on Power and Area Efficiency. Materials Today: Proceedings,
2021.
[5] Truong, T. Giang, T. G. Do, and T. D. Do., RISC-V Random Test Generator. IEEE
15th International Conference on Advanced Computing and Applications, 2021.
[6] Miura, Junya, H. Miyazaki, and K. Kise, A Portable and Linux capable RISC-V
computer system in Verilog HDL. arXiv preprint arXiv:2002.03576, 2020.
[7] Zhou and W. et al., A Novel Sleep Scheduling Strategy on RISC-V Processor. Jour-
nal of Physics: Conference Series. Vol. 1631. No. 1. IOP Publishing, 2020.
[8] Singh, Aslesa, and et al., Design and implementation of a 32-bit isa risc-v proces-
sor core using virtex-7 and virtex-ultrascale. IEEE 5th International Conference on
Computing Communication and Automation (ICCCA), 2020.
[9] A. Rathi, C., and et al., Design and development of an efficient branch predictor
for an in-order RISC-V processor. 2020.
[10] Lee, Jaewon, and et al., RISC-V FPGA platform toward ROS-based robotics applica-
tion. 30th International Conference on Field-Programmable Logic and Applications
(FPL). IEEE, 2020.
[11] Höller, Roland, and et al., Open-source RISC-V processor ip cores for FPGAs—overview
and evaluation. 8th Mediterranean Conference on Embedded Computing (MECO).
IEEE, 2019.
[12] Bhagat, S. M., and S. U. Bhandari., Design and Analysis of 16-bit RISC Proces-
sor. Fourth International Conference on Computing Communication Control and
Automation (ICCUBEA). IEEE, 2018.
[13] Gür, Etki, and et al., FPGA implementation of 32-bit RISC-V processor with web-
based assembler-disassembler. International Symposium on Fundamentals of Elec-
trical Engineering (ISFEE). IEEE, 2018.
[14] Lee, Yunsup, and et al., An agile approach to building RISC-V microprocessors.
IEEE Micro 36.2 (2016): 8-20., 2016.
[15] Patil, Vinayak, and et al., Out of order floating point coprocessor for RISC V ISA.
19th International Symposium on VLSI Design and Test. IEEE, 2015, 2015.
[16] Waterman, Andrew, and et al., The risc-v instruction set manual. volume 1: User-
level isa, version 2.0. California Univ Berkeley Dept of Electrical Engineering and
Computer Sciences, 2014.
[17] Dandamudi and S. P., Guide to RISC processors: for programmers and engineers.
Springer Science Business Media, 2005.
[18] A. Waterman, Y. Lee, R. Avizienis, D. A. Patterson, and K. Asanović, “The
RISC-V Instruction Set Manual Volume II: Privileged Architecture Version 1.9,” EECS
Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2016-129,
2016. [Online]. Available:
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-129.html.
[19] N. Bruns, V. Herdt, D. Große, and R. Drechsler, “Toward RISC-V CSR Compliance
Testing,” IEEE Embedded Systems Letters, vol. 13, no. 4, pp. 202–205, 2021.
[20] M. H. B. Saif, N. U. Sadad, and M. N. I. Mondal, “FPGA Implementation of
Educational RISC-V Processor Suitable for Embedded Applications,” in 2023
International Conference on Electrical, Computer and Communication Engineering
(ECCE), IEEE, 2023, pp. 1–5.

[21] L. Poli, S. Saha, X. Zhai, and K. D. Mcdonald-Maier, “Design and
Implementation of a RISC V Processor on FPGA,” in 2021 17th International Conference on
Mobility, Sensing and Networking (MSN), IEEE, 2021, pp. 161–166.
[22] G. Gayathri, S Jaya, et al., “An Area Optimized Floating-Point Coprocessor for
RISC-V Processor,” in 2023 International Conference on Control, Communication
and Computing (ICCC), IEEE, 2023, pp. 1–5.
[23] A. E. Phangestu, I. T. Mujiono, M. Kom, et al., “Five-Stage Pipelined 32-bit
RISC-V Base Integer Instruction Set Architecture Soft microprocessor core in VHDL,”
in 2022 International Seminar on Intelligent Technology and Its Applications
(ISITIA), IEEE, 2022, pp. 304–309.
[24] T. Zheng, G. Cai, and Z. Huang, “A Soft RISC-V Processor IP with High-performance
and Low-resource consumption for FPGA,” in 2022 IEEE International Symposium
on Circuits and Systems (ISCAS), IEEE, 2022, pp. 2538–2541.
[25] J.-Y. Lai, C.-A. Chen, S.-L. Chen, and C.-Y. Su, “Implement 32-bit RISC-V
Architecture Processor using Verilog HDL,” in 2021 International Symposium on
Intelligent Signal Processing and Communication Systems (ISPACS), IEEE, 2021,
pp. 1–2.
