Josh Abbott Corey Wright Processor Report Cpre381

Josh Abbott
Corey Wright
CprE 381
5/3/2014
Final Project
Group 25
MIPS BASED PROCESSOR DESIGN

CPRE381: COMPUTER ORGANIZATION AND ASSEMBLY LEVEL PROGRAMMING
JOSH ABBOTT
COREY WRIGHT
Josh Abbott and Corey Wright
CprE 381
Final Group 25
CONTENTS
1. Organization
2. Instruction Set
3. Processor Design
3.1 Processor Modules
3.2 Processor Connections
4. Conclusion
5. Appendix
1. ORGANIZATION
Computer Organization has taught us a great deal about the inner workings of processors
in general. Through the laboratories of this course, we have slowly worked our way
up to this final project: the pipelined processor. This project required us to pull all of the
knowledge of the semester together and put it into one big design. Creating a pipelined
processor requires a solid understanding of state machine concepts and timing,
along with detailed knowledge of every block needed to create a
processor. By the end we were able to build, from the ground up, the pipelined processor
and every module that the system encompasses.
The processor is broken up into 5 stages. Figure 1 below shows the breakdown of tasks.
S1: Instruction Fetch -> S2: Instruction Decode -> S3: Execute -> S4: Memory -> S5: Write Back
Figure 1: Pipeline Stages

These stages let us place the proper functions in each block. Each of
these blocks is relatively independent. We will discuss the parts that exhibit dependencies
in Section 3.1, along with how we use the forwarding and hazard units to deal with them. With
these stages and the material covered in this class we decided what each stage should do.
Table 1 below shows the operations each stage should complete in the design.
Instruction Fetch:  Increment program counter; read program counter; access instruction
                    memory and read address
Instruction Decode: Read register file; distribute controls to each block; sign extension
                    of branch address
Execute:            ALU execution; ALU input control
Memory:             Read/write to memory
Write Back:         Write to register
Table 1: Stage Operations

Through previous labs we were able to build a simple single cycle processor. This
processor was able to load, store, and perform R-type and branch instructions. We decided
that the pipeline processor we intended to build was not too far off from this. When you
consider the single cycle processor in Figure 1, it can be seen that all you really need to do
in order to create a pipeline is carry the instructions and controls over to each stage of
the pipeline.
2. INSTRUCTION SET
In the design of this processor we set out to create an instruction set capable
of expressing the vast majority of basic programs. We decided to keep it simple and use only a
limited number of instructions, but enough to cover anything needed to write basic programs. To do
this we implemented load word, store word, R-type, branch on equal, add immediate, jump and
link, jump register, and jump. With these instructions we were able to write several programs in
various places in memory and produce the correct outputs. Notice that we did not include any
pseudo-instructions. The reason for this is that we wanted the processor to be quick
and easy to understand. Although this may not be the most user friendly choice for writing
software, it ensures speed and control.
R-TYPE INSTRUCTION
The R-type instructions include logical and, logical or, add, and subtract. These
instructions allow the user to access two different registers, perform an operation, and
then put the result in a third register. To do this, the processor takes the two read results
from the register file and inputs them into the arithmetic logic unit (ALU). The ALU result is
then sent through the memory stage to the write back stage, where it is written to the
register file.
Table 2: R-type
LOAD WORD INSTRUCTION

We were able to implement load word in this pipeline processor by the same means as
in the single cycle processor. The instruction is first fetched from
memory like all instructions. After that the instruction is passed to the instruction decode
stage, in which the register file does not necessarily do anything useful. Once the instruction
is in the execution stage, the lower 16 bits of the instruction (instruction[15:0]) are sign
extended to 32 bits. This tells the processor where to go in memory. The value is then added to
zero and passed into the memory stage. Once it is in the memory stage, the address is read and
the data is output to the write back stage, where the register address and the data to be
written are passed back to the register file. We were able to implement a hazard unit
to help control this instruction in the case of a load hazard.
Table 3: Load Word
STORE WORD INSTRUCTION

Store word is very similar to load word. This time, however, the value
read from the register file is stored into a memory location. For this to happen we take
the data from the register and pass it all the way to the memory stage. In other words, it is
not used in the execution stage. The address of the location to which the register value needs
to be written is computed in the execution stage. It goes through the same process as in
load word: it is sign extended, added to zero, and passed into the memory stage.
Table 4: Store Word
BRANCH INSTRUCTION
We designed the branch instruction to be taken when the values of the two chosen
registers are the same. We designed the processor to evaluate the branch in the instruction
decode stage because we wanted to minimize the branch's impact on the
pipeline. By resolving the branch in the instruction decode stage we only need to flush
the instruction in the fetch stage. This calls for inserting a bubble into the pipeline and
allowing it to propagate in place of the instruction directly after the
branch. We compare the two registers and create a branch address by adding the
program counter to the sign-extended lower 16 bits of the branch instruction
(instruction[15:0], the offset). An example of the branch is given in the appendix.
Table 5: Branch on Equals
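The branch address calculation can be sketched in a few lines of Python. This is only a model under stated assumptions, not the report's hardware description: it assumes the program counter is word addressed and has already been incremented by the time the branch is evaluated, which is consistent with the appendix listing, where the beq on line 38 with offset +5 lands on line 44. The function names are hypothetical.

```python
def branch_target(pc, instruction):
    """Compute the branch address for a beq sitting at word address pc.

    Assumption: the PC is word addressed and already points at pc + 1
    when the branch is evaluated in the instruction decode stage.
    """
    offset = instruction & 0xFFFF      # instruction[15:0]
    if offset & 0x8000:                # negative offset in two's complement
        offset -= 0x10000
    return pc + 1 + offset


def branch_taken(rd1, rd2):
    """The branch is taken when both register file outputs are equal."""
    return rd1 == rd2


# Line 38 of the appendix program: beq R1, R2, +5
print(branch_target(38, 0x10220005))
```

Under these assumptions the example prints 44, the line the appendix listing marks as the branch destination.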
JUMP INSTRUCTION
The jump instruction is very simple. All we do is sign extend the lower 26 bits
(instruction[25:0]) and then update the program counter in the fetch stage. This way the
instruction directly after the jump will not be loaded into the pipeline. The program counter
has three muxes with three different controls to ensure the correct address is being indexed
in memory. For the case of jump, we have a control that passes this jump address through to
the program counter.
Table 6: Jump
JUMP AND LINK INSTRUCTION

Jump and link is almost identical to jump, except that it requires us to write the next address
into the last register of the register file (for our design this is register 31). In order to
perform this operation, we had to include a mux at the input of the write address pin of the
register file. Instead of writing in the write back stage, jump and link writes to the
register in the instruction decode stage. This allows us to record the value immediately and
then start performing the instructions at the jump location.
Table 7: Jump and Link
JUMP REGISTER INSTRUCTION

Jump register is a great instruction to use in conjunction with jump and link. This
instruction takes the value of the register named in the instruction and jumps
the pipeline to that location. For instance, if one uses a jump and link to a particular
location, the user can jump right back to the instruction after the one where the
jump occurred by performing a jump register with register 31 (where the jump and link
saves the next address). This is extremely helpful for programming because it provides a
means of moving through memory efficiently. With such an instruction, the
programmer does not need to memorize address locations for jumps but can simply use
register 31 as the return address from a particular routine.
Table 8: Jump Register
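The way jump and link pairs with jump register can be illustrated with a small Python model. It is a sketch under assumptions: the program counter is treated as a word address and the link value is taken to be the line after the jump, which matches the appendix program, where the JAL on line 13 goes to line 35 and the routine later returns to line 14. The function names and the list used as a register file are hypothetical.

```python
def jump_and_link(pc, target, regs):
    """Save the address of the next instruction in register 31, then jump."""
    regs[31] = pc + 1          # return address (assumed word addressed)
    return target              # new program counter


def jump_register(rs, regs):
    """Jump to the address held in register rs."""
    return regs[rs]


regs = [0] * 32                          # 32 registers, all initialized to 0
pc = jump_and_link(13, 0x23, regs)       # JAL on line 13 to line 35 (0x23)
return_pc = jump_register(31, regs)      # later: jump register with register 31
print(pc, return_pc)
```

The caller never has to hard-code the return address; it is recovered from register 31.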
ADD IMMEDIATE INSTRUCTION

The add immediate instruction is an I-type instruction that allows the user to load a value
into a register. We decided this is an extremely useful feature because it gives the user a lot
of control over what is in a register. With an add immediate instruction, it is easy to put a
large range of values into registers. This is important for programs and memory control.
This instruction sign extends the lower 16 bits (instruction[15:0]) and adds them in the
execution stage to the target register's value using the ALU. The register location is then
passed through to the write back stage, where the ALU output value (which is also passed
through) is written into the register file.
Table 9: Add Immediate
3. PROCESSOR DESIGN
3.1 PROCESSOR MODULES

To understand how the processor works and how everything fits together, you should first
know the individual parts of the processor and how they work together. In this section we
will discover how each part works and then see the whole grand design in one remarkable
piece of computer engineering.
MULTIPLEXOR
The multiplexor is the simplest module used in the pipelined datapath. A multiplexor
is, simply put, a block that chooses one of several inputs based on a select (enable) input.
In this design we used a total of ten multiplexors in two different sizes:
32 bit and 5 bit. The number of possible inputs is 2^n, where n is the number of bits in the
enable input. This design does not use more than two bits for any one
enable. Most of the enable bits come from the control unit or from a control bit combined with
logic from another location in the design.
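As a behavioural illustration only (the report's multiplexors are hardware modules, and the names below are hypothetical), a multiplexor and the relationship between enable width and input count can be modelled in Python like this:

```python
def mux(inputs, select):
    """Return the input chosen by the select (enable) value."""
    if not 0 <= select < len(inputs):
        raise ValueError("select value does not address any input")
    return inputs[select]


def max_inputs(select_bits):
    """An n-bit select can choose among 2**n inputs."""
    return 2 ** select_bits


# A two-bit enable, the widest used in this design, addresses four inputs.
print(max_inputs(2))
print(mux([0x10, 0x20, 0x30, 0x40], 2))
```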
INSTRUCTION MEMORY
The instruction memory is a RAM type memory with one input and one output. Its size is
32 bits wide by 256 words deep. The instruction memory is initialized by a mem_init.mif file.
The .mif format is specifically designed for memory initialization and is a
standard text file that is easily opened with programs such as OneNote or Notepad. For this
module it would have been possible to choose a ROM type memory, but RAM was chosen since the
multi-cycle datapath that we had previously made needed both writing and reading. The
only input, PC, chooses which line of the file to read, and the only output, the 32 bit
instruction, is the contents of that line.
SIGN EXTENDER
The sign extender module is used in two variations within the pipelined datapath. Both
take a subsection of the 32 bit instruction and extend that subsection
to the full 32 bit size needed to be compatible with the datapath width. The first instance
takes in the bits instruction[15:0] and extends them to 32 bits to add to the PC for the branch
instruction. Its output is also used as an input to a mux for the ALU input, because it
provides the immediate values for instructions such as addi. The other sign extender is used
by a mux to implement the jump instruction. It extends instruction[25:0] to 32 bits to give a
compatible data width.
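Both variations perform the same operation on different field widths. A minimal Python sketch of that operation, with a hypothetical function name and with the result represented as an unsigned 32 bit word, could look like this:

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit word."""
    value &= (1 << bits) - 1           # keep only the field being extended
    if value & (1 << (bits - 1)):      # top bit of the field set: negative
        value -= 1 << bits
    return value & 0xFFFFFFFF          # 32-bit two's-complement pattern


print(hex(sign_extend(0x0005, 16)))    # positive 16-bit immediate
print(hex(sign_extend(0xFFFC, 16)))    # negative 16-bit immediate
print(hex(sign_extend(0x0000023, 26))) # 26-bit jump field
```

A field whose top bit is clear is padded with zeros, and one whose top bit is set is padded with ones, so the value keeps its sign at the full datapath width.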
REGISTER FILE
The register file is the main storage location within the datapath. This is where LW stores
values, and it is where the values for the ALU generally come from in R-type instructions.
There are five inputs to this module and two outputs. The inputs are as follows: a
regwrite enable bit that turns writing on or off, writereg as the register to be written to,
writedata as the data value to be written, RA1 as the first register to be read, from
instruction[25:21], and RA2 as the second register to be read, from
instruction[20:16]. The outputs are the data from the two registers read, RD1 and
RD2. These are then used as possible inputs to the ALU or compared against each
other for the RD1 = RD2 comparison needed for the branch instruction. The register file is
within the Reg stage.
CONTROL
The control is a module that takes in only one value and outputs a 12 bit value that controls
the entire datapath and how information moves throughout the system. The input is the
instruction saved from the fetch stage of the datapath, and the control is determined from the
upper six bits, instruction[31:26]. This input size restricts the number of possible
opcodes used by the datapath to a total of 2^6, or 64. The control is
evaluated in the second stage, or Reg stage, of the datapath. The control bits are as
follows.
Table 10: Control Bits
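To make the bit ranges concrete, the following Python sketch slices an instruction word into its fields. The opcode, RA1, RA2, immediate, and jump ranges are the ones named in this report; the rd range instruction[15:11] is an assumption taken from the standard MIPS R-type layout, and the function name is hypothetical. It does not reproduce the 12 control bits themselves.

```python
def decode_fields(word):
    """Slice a 32-bit instruction word into its bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # instruction[31:26], input to control
        "rs": (word >> 21) & 0x1F,       # instruction[25:21], RA1
        "rt": (word >> 16) & 0x1F,       # instruction[20:16], RA2
        "rd": (word >> 11) & 0x1F,       # instruction[15:11], assumed R-type destination
        "funct": word & 0x3F,            # instruction[5:0], input to ALU control
        "imm16": word & 0xFFFF,          # instruction[15:0]
        "addr26": word & 0x3FFFFFF,      # instruction[25:0]
    }


# Line 12 of the appendix program, commented there as R1 + R6 = R3
fields = decode_fields(0x00261800)
print(fields["rs"], fields["rt"], fields["rd"])
```

For this word the sliced fields come out as registers 1, 6, and 3, which agrees with the comment in the program listing.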
HAZARD DETECTION UNIT

The hazard detection unit is used to detect possible overlaps in the way the pipelined
datapath is run. The hazard detection unit takes in the instruction along with the
control and detects whether an R-type instruction needs a register that an LW instruction
ahead of it has not yet loaded into the register file. If this is true it will stop the PC from
incrementing and flush the IF/ID and ID/EX stage registers. This is the same as stopping the
pipeline from advancing for one cycle, allowing the LW to finish before the register
file is used again.
An example of the hazard unit in action can be seen below.
Figure 2: Pipeline Hazard Example

Instructions
11 : 8C060006; --LW R6 Test Load Hazard
12 : 00261800; --R1 + R6 = R3
13 : 30000023; --JAL to line 35 and repeat in a loop
These instructions perform a load word and then an add directly after it that uses
the same register we want to load a value into. In order to keep the pipeline from falling
into the trap of this load hazard, we stall the pipeline for a cycle. It is important to note
that in this example, the program counter does not increment when the add instruction is in the
execution stage. The reason for this is that it is waiting for the load word to get to the write
back stage, where it can then forward the data back to the execution stage where the
updated value of R6 is needed.
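The stall condition for this kind of load hazard is commonly written as a comparison between the load's destination register and the source registers of the instruction behind it. The Python sketch below uses that textbook condition; it is not taken from the report's hazard unit, whose exact logic is not shown, and the signal names are hypothetical.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the pipeline should stall for one cycle.

    id_ex_mem_read: the instruction ahead is a load word
    id_ex_rt:       register the load will write
    if_id_rs/rt:    registers the following instruction reads
    """
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# LW R6 followed directly by R1 + R6 = R3 (lines 11 and 12 above)
print(load_use_hazard(True, 6, 1, 6))
```

When the function returns True, the hardware would hold the program counter and insert a bubble, as described in the paragraph above.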
FORWARDING
The forwarding unit is crucial in several circumstances. This unit allows the pipeline to
continue moving without stalls. Typically forwarding occurs when an instruction in
the execution stage needs the updated value of a register that is still in the memory stage or
write back stage. Instead of stalling the pipeline until the needed register is updated, we
simply pull the value that is to be written to the desired register and direct it to the ALU
where the value is needed. Consider the situation below.
Instruction 1: Add $R1, $R2, $R3 //Add R1+R2 and Store result in R3
Instruction 2: Add $R3, $R1, $R4 // Add R3+R1 and Store result in R4
This code requires an updated value of $R3 for use in instruction 2. The forwarding
unit we built takes the value of $R3 and uses it in the next instruction before it is written
to the register file. The forwarding unit does this by forwarding the value of R3 from the
memory stage to the execution stage, where instruction 2 is performed. With this
unit we have been able to keep the pipeline flowing.
Furthermore, if you consider the example for the hazard unit, the data must be forwarded
to the execution stage for the updated value of R6 to be added properly.
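The selection a forwarding unit makes for each ALU operand can be sketched as follows. This is the textbook priority scheme (newest result first), offered as an assumption rather than as the report's exact logic; the signal names are hypothetical, and skipping register 0 assumes it is treated as a constant zero.

```python
def forward_select(ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd, operand_reg):
    """Choose where an ALU operand should come from.

    Returns "MEM" to forward from the memory stage, "WB" to forward
    from the write back stage, or "REG" to use the register file value.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == operand_reg:
        return "MEM"   # newest value, produced by the previous instruction
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == operand_reg:
        return "WB"    # value produced two instructions earlier
    return "REG"


# Instruction 1 writes R3 and is in the memory stage while
# instruction 2 reads R3 in the execution stage.
print(forward_select(True, 3, False, 0, 3))
```

Checking the memory stage first matters when two earlier instructions wrote the same register: the more recent result is the one the ALU needs.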
ALU CONTROL
The ALU control is a simple module that takes in values from the main control along with
values from the instruction and gives an appropriate three bit control for the ALU. The
inputs consist of the two control bits ALUOp0 and ALUOp1 along with the six bits
instruction[5:0]. The third bit in the control output represents the A inversion that is used
for subtract, slt, and zero comparisons. The remaining two bits select among the four main
functions that the ALU can complete. The ALU control bits are as follows.
Table 11: ALU Control Bits
ALU
The ALU is the main logic unit used in the pipelined datapath that we built. The ALU
combines the main logical functions that transistors are capable of into one block that is
then used to run the datapath. The ALU can do four main functions, which are add, and, or,
and set less than. With the ability to invert one of the inputs you can also make the add
behave as a subtract. The ALU takes in three inputs and returns one output. The inputs are
two 32 bit fields for the data into the functions and a three bit operation code coming
from the ALU control. This allows for five functions in total: add, sub, and, or, slt.
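A behavioural Python model of these five functions is sketched below. Because the three bit encoding from Table 11 is not reproduced here, the operations are selected by name instead; that naming, the function names, and the choice of which operand is inverted for subtraction are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def alu(a, b, op):
    """Model the five ALU functions on 32-bit words."""
    a &= MASK32
    b &= MASK32
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        # add with one input inverted plus a carry-in of 1
        return (a + (~b & MASK32) + 1) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unknown ALU operation: {op}")


print(alu(1, 2, "add"), alu(5, 3, "sub"), alu(3, 5, "slt"))
```

The subtract case shows why the adder can be reused: inverting one input and adding one is two's-complement negation, so no separate subtractor is needed.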
DATA REGISTER
The data register is the storage unit for data values within the datapath. The
data register is 32 bits wide and 32 words deep, with each value initialized to 0. These 32
locations each have common names widely used when programming in assembly,
including but not limited to the temporary registers ($t0-$t7), $ra, and the $zero register.
When a store word instruction is executed the values are stored in the data register and
can then be used later in LW operations. The data register takes in four inputs and has a
single output. The inputs are a one bit memwrite that enables writing to the memory,
a one bit memread that enables reading from the data register, a 32 bit wide data input from
the EX stage, and a five bit wide address from the ALU output of the EX stage. The output
is the 32 bit value from the addressed location, which can be used to write to the register
file.
3.2 PROCESSOR CONNECTIONS

Now that we know each module and what it does, we can discover how they fit
together within the datapath. It is imperative that each module is in the correct stage and
has the correct connections, or else even one misplaced letter could
completely break the system that we have worked so hard to build.
In order to effectively assemble all these blocks into our pipeline datapath, we used
the concepts for building a pipeline derived in class. Below is the figure of a pipeline that we
used in class.
Figure 3: Pipeline Used for Theoretical Design

This is a very accurate description of what we implemented. This diagram, however, is
lacking the 2 muxes for jump and link at the inputs of the write data and write address. In
addition there is no hazard or forwarding unit. To see these implemented refer to the
diagram of our pipeline in the appendix, Figure 4.
There are some differences between the reference diagram and the actual implemented code.
We put the instruction[15:0] sign extension, along with the subsequent add to the PC value,
one stage earlier in the Reg stage. This was so that we could then update the ID/EX register
with this value and deal with any possible branches one step earlier. This
eliminated a number of branch timing issues and allowed us to evaluate the
zero value directly from the register file outputs. Since the zero value is evaluated straight
from the register file, we did not need to wait on the ALU to use this value in logic. Also, for
convenience and to avoid worrying about execution timing, we carried the instruction and full
control values through the applicable pipeline registers.
4. CONCLUSION
The pipeline processor has increased our knowledge of processors a great deal. We
have learned so much about how to draft, design, implement, simulate, and troubleshoot.
Through the creation of our pipeline processor we can see how many other instructions
could be created; however, due to the time constraints of this project, we were not able
to add any non-essential instructions except for jump register and jump and link.
Regardless, the ISA we created can express the vast majority of basic programs. The pipeline
processor has given us a better appreciation for timing and how much work goes into
making a processor as fast as possible. Earlier in the class we were able to implement
single and multi cycle processors, but the reality is that the pipeline, although more
complicated, has a much higher throughput than the other two. When you consider that
the brain of the computer is the processor, any little gain in timing can make a major
change in the total speed of your system. Overall we were able to implement a fully
working pipeline processor with a complete ISA. To view the program we wrote to
test all the instructions, as well as additional diagrams and images, refer to the appendix.
The knowledge and skills gained through this course have truly made it an incredible
experience.
5. APPENDIX
Figure 4: Actual Implemented Pipeline

Pipeline Stall on branch example

You can see here that the instructions forward a bubble when the stall is issued. It is first
seen in the third instruction, on line 44.
Figure 5: Branch Stall Example
The beginning of the program with the actual codes used. You can see at instruction 13 the
first hazard that is implemented.
Figure 6: Demo
Below is the program we wrote. This code demonstrates that all the instructions are
working correctly. We first started off the code by checking all instructions. We then jump
to a different area of the memory and begin a doubler program that simply keeps doubling
the value in R1. Once the value equals 16, it returns to the line after the jump and link.
CONTENT BEGIN
0 : 00000000;
1 : 00000000;
2 : 8C010001; --LW R1, (0001)
3 : 8C020002; --LW R2, (0010)
4 : 8C030003; --LW R3, (0011)
5 : 8C040004; --LW R4, (0100)
6 : 8C050005; --LW R5, (0101)
7 : 00221800; --R1 + R2 = R3 = 3
8 : 00231800; --R1 + R3 = R3 = 4 Forward B in EX stage writedata=4, writeaddress=3
9 : 00221800; --R1 + R2 = R3 = 3
10 : 00611800; --R3 + R1 = R3 = 4 Forward A in EX stage
11 : 8C060006; --LW R6 Test Load Hazard
12 : 00261800; --R1 + R6 = R3
13 : 30000023; --JAL to line 35 and repeat in a loop
14 : AC010004; --SW R1, (0100) should load 16 into mem location 4 should see EXALUOUT=4 and EXmuxout2=16
15 : 8C030004; --LW R3, (0100) should load 16 into R3
16 : 00231800; --add R1, R3, R3 =32 this will inflict a load hazard
17 : FC000000; --Finish program jump to zero, stall occurs and r3=32 is written to on line aaaaaaaaaaaaaaaaaa#2
18 : 00200800; --Junk codes through line 34
19 : 10000004;
20 : 8C010001;
21 : 15AB57CD;
22 : 30000013;
23 : 45794326;
24 : 001FF800;
25 : 1000FFFC;
26 : 458BCA6C;
27 : 345FFFFF;
28 : 61A87899;
29 : 15AB57CD;
30 : 15842579;
31 : 45794326;
32 : 61A87899;
33 : 15AB57CD;
34 : 15842579;
35 : 8C010001; --Jump to here to begin doubler program LW R1 (0001)
36 : 8C020010; --LW R2 (10000)
37 : 00210800; --add R1, R1, R1
38 : 10220005; --beq R1, R2, +5 offset to line
39 : FC000025; -- jump -2 to line
40 : 35ABBC54; -- junk through line 43
41 : 24AC567B;
42 : 24784327;
43 : 638A5CB3;
44 : CFE00000; --beq jumps to here, then jump to line 14
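The control flow of the doubler routine on lines 35 to 39 of the listing can be mirrored in a few lines of Python. This is only a software analogy of the loop, not a simulation of the pipeline. It assumes R1 starts at 1 and R2 holds 16, which is what the listing's comments suggest but which depends on data memory contents not shown here; the function name and the iteration guard are hypothetical.

```python
def run_doubler(r1, r2, max_iterations=64):
    """Double r1 until it equals r2, as lines 37 to 39 of the program do.

    Returns the final value and the number of passes through the loop.
    The guard stops the loop if r1 can never equal r2.
    """
    for iterations in range(1, max_iterations + 1):
        r1 = r1 + r1                 # line 37: add R1, R1, R1
        if r1 == r2:                 # line 38: beq R1, R2 leaves the loop
            return r1, iterations
        # line 39: otherwise jump back to line 37
    raise RuntimeError("r1 never became equal to r2")


print(run_doubler(1, 16))
```

Starting from 1, the value passes through 2, 4, and 8 before reaching 16 on the fourth pass, at which point the branch leaves the loop.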
