Corey Wright
CprE 381
5/3/2014
Final Project
Group 25
JOSH ABBOTT
COREY WRIGHT
CprE 381
Final Group 25
CONTENTS
1. Organization
2. Instruction Set
3. Processor Design
3.1 Processor Modules
3.2 Processor Connections
4. Conclusion
5. Appendix
1. ORGANIZATION
Computer Organization has taught us a great deal about the inner workings of processors
in general. Through the laboratories of this course, we have slowly worked our way
up to this final project: the pipelined processor. This project required us to pull all of the
knowledge of the semester together into one large design. Creating a pipelined
processor requires a solid understanding of state machine concepts and timing,
along with detailed knowledge of every block needed to build a
processor. By the end, we were able to build the pipelined processor, and every module
it contains, from the ground up.
The processor is broken up into 5 stages. Figure 1 below shows the breakdown of tasks.
Figure 1: Pipeline stages and their tasks
S1: Instruction Fetch. Read program counter; access instruction memory and read address; increment program counter.
S2: Instruction Decode. Read register file; distribute controls to each block; sign extension of branch address.
S3: Execute. ALU execution; ALU input control.
S4: Memory. Read/write to memory.
S5: Write Back. Write to register.
2. INSTRUCTION SET
In the design of this processor we set out to create an instruction set capable
of expressing the vast majority of programs. We decided to keep it simple and use only a limited
number of instructions, but enough to cover anything needed for basic programs. To do
this we implemented load word, store word, R-type, branch equals, jump and link, jump
register, and jump. With these instructions we were able to write several programs in
various places in memory and produce the correct outputs. Notice that we did not include any
pseudo-instructions. The reason is that we wanted the processor to be quick
and easy to understand. Although this may not be the most user-friendly choice for writing
software, it ensures speed and control.
R-TYPE INSTRUCTION
The R-type instructions include logical and, logical or, add, and subtract. These
instructions let the user access two different registers, perform an operation, and
then put the result in a third register. To do this, the processor takes the two read results
from the register file and feeds them into the arithmetic logic unit (ALU). The ALU result is
then sent through the memory stage to the write back stage, where it is written to the
register file.
Table 2: R-type
BRANCH INSTRUCTION
We designed the branch instruction to be taken when the values of the two chosen
registers are the same. The processor evaluates the branch in the instruction
decode stage because we wanted to minimize the branch's impact on the
pipeline. By resolving the branch in the instruction decode stage we only need to flush
the instruction in the fetch stage. This calls for inserting a bubble into the pipeline and
allowing it to propagate in place of the instruction directly after the
branch. In short, we compare the two registers and create a branch address by adding the
program counter to the first 15 bits of the branch instruction (these bits hold the
offset). An example of the branch will be given in the following sections.
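The decode-stage branch decision described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's hardware description: the function names are hypothetical, the program counter is treated as a word address, and the offset width is passed in as `offset_bits` rather than fixed.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def resolve_branch(pc, rd1, rd2, instruction, offset_bits):
    """Model a branch-equal evaluated in the decode stage.

    Returns (target, flush_fetch). `target` is None when the branch is not
    taken. Assumption: the target is simply pc + offset, with pc being the
    value the design feeds to its branch adder.
    """
    taken = rd1 == rd2
    if not taken:
        return None, False
    offset = sign_extend(instruction, offset_bits)
    # A taken branch replaces the instruction in the fetch stage with a bubble.
    return pc + offset, True
```

For example, `resolve_branch(10, 5, 5, 0x10000004, 15)` models a taken branch with an offset of 4, while unequal register values leave the fetch stage untouched.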
JUMP INSTRUCTION
The jump instruction is very simple. All we do is sign extend the first 25 bits and then
update the program counter in the Fetch stage. This way the instruction directly after the
jump is not loaded into the pipeline. The program counter has three muxes with three
different controls to ensure the correct address is being indexed in memory. For a
jump, one of these controls passes the jump address through to the program counter.
Table 6: Jump
3. PROCESSOR DESIGN
REGISTER FILE
The register file is the main storage location within the datapath. This is where LW stores
values, and it is where the ALU operands generally come from in R-type instructions.
There are five inputs to this module and two outputs. The inputs are as follows: a one-bit
regwrite enable that turns writing on or off, writereg as the register to be written to,
writedata as the data value to be written, RA1 as the first register to be read, from
instruction[25:21], and RA2 as the second register to be read, from
instruction[20:16]. The outputs are the data from the two read locations, RD1 and
RD2. These are then used as possible inputs to the ALU or compared against each
other for the RD1 = RD2 comparison needed by the branch instruction. The register file is
within the Reg stage.
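The port behavior listed above can be sketched as a small Python model. The class and method names are hypothetical, and the size of 32 registers is an assumption inferred from the five-bit read addresses; the model does not say anything about whether register 0 is hardwired to zero.

```python
class RegisterFile:
    """Behavioral model: two read ports (RA1, RA2) and one write port."""

    def __init__(self, num_registers=32):
        # Assumption: 32 registers of 32 bits, all starting at zero.
        self.regs = [0] * num_registers

    def read(self, ra1, ra2):
        """Return (RD1, RD2) for the two read addresses."""
        return self.regs[ra1], self.regs[ra2]

    def write(self, regwrite, writereg, writedata):
        """Store writedata in writereg only when regwrite is asserted."""
        if regwrite:
            self.regs[writereg] = writedata & 0xFFFFFFFF


def registers_equal(register_file, ra1, ra2):
    """The RD1 = RD2 comparison used by the branch instruction."""
    rd1, rd2 = register_file.read(ra1, ra2)
    return rd1 == rd2
```

A write with `regwrite` deasserted leaves the register unchanged, which is what lets instructions such as store word pass through the write back stage without altering the register file.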
CONTROL
The control is a module that takes in only one value and outputs a 12-bit value that controls
the entire datapath and how information moves through the system. The input is the
instruction saved from the fetch stage of the datapath, and the control value is determined by the
first six bits, instruction[31:26]. This input size restricts the number of possible
instructions used by the datapath to a total of 2^6, or 64. The control is
evaluated in the second stage, or Reg stage, of the datapath. The control bits are as
follows.
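Independently of the control bit values, the field extraction that feeds the control and the register file can be sketched in Python. The bit ranges for the opcode, the two read addresses, and instruction[5:0] come from this report; the `rd` position at bits [15:11] and the 16-bit `immediate` are assumptions based on the usual MIPS layout, and the function name is hypothetical.

```python
def decode_fields(instruction):
    """Split a 32-bit instruction word into its fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,   # instruction[31:26], input to the control
        "rs": (instruction >> 21) & 0x1F,       # instruction[25:21], RA1
        "rt": (instruction >> 16) & 0x1F,       # instruction[20:16], RA2
        "rd": (instruction >> 11) & 0x1F,       # assumed instruction[15:11]
        "funct": instruction & 0x3F,            # instruction[5:0], input to the ALU control
        "immediate": instruction & 0xFFFF,      # assumed 16-bit immediate
    }


# Words taken from the test program in the appendix.
load_word = decode_fields(0x8C010001)   # LW R1
r_type = decode_fields(0x00221800)      # R1 + R2 = R3
```

Under these assumptions the load word decodes to opcode 0x23 with `rt` equal to 1, and the R-type word decodes to opcode 0 with `rs`, `rt`, and `rd` equal to 1, 2, and 3, which matches the comments in the appendix listing.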
11 : 8C060006; --LW R6 Test Load Hazard
12 : 00261800; --R1 + R6 =R3
13 : 30000023; --JAL to line 35 and repeat in a loop
These instructions perform a load word and then use an add directly after it with
the same register we want to load a value into. To keep the pipeline from falling
into the trap of this load hazard, we stall the pipeline for a cycle. It is important to note that
in this example, the program counter does not increment when the add instruction is in the
execution stage. The reason is that the add is waiting for the load word to reach the write
back stage, where the data can then be forwarded back to the execution stage where the
updated value of R6 is needed.
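A load hazard check of this kind is commonly written as the condition below. This Python sketch uses the standard textbook formulation and hypothetical signal names; it is an assumption about how such a unit can be built, not a transcription of the hazard unit in this project.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the pipeline must stall for one cycle.

    id_ex_memread: the instruction ahead in the pipeline is a load word.
    id_ex_rt:      the register that load will write.
    if_id_rs/rt:   the registers the following instruction reads.
    """
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


# LW R6 followed directly by R1 + R6 = R3, as in the example above.
stall = load_use_stall(id_ex_memread=True, id_ex_rt=6, if_id_rs=1, if_id_rt=6)
```

While the stall is asserted, the program counter and the fetched instruction are held and a bubble is sent down the pipeline, so the add picks up the loaded value through forwarding one cycle later.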
FORWARDING
The forwarding unit is crucial in several circumstances. This unit allows the pipeline to
continue moving without stalls. Typically, forwarding occurs when an instruction in
the execution stage needs the updated value of a register that is still in the memory stage or
writeback stage. Instead of stalling the pipeline until the needed register is updated, we
simply take the value that is about to be written to that register and direct it to the ALU
where it is needed. Consider the situation below.
Instruction 1:
Instruction 2:
This code requires an updated value of $R3 for use in instruction 2. The forwarding
unit we built simply takes the value of $R3 and uses it in the next instruction before it is
written to the register file. The forwarding unit does this by forwarding the value of R3 from the
memory stage to the execution stage, where instruction 2 is executing. With this
unit the pipeline can continue flowing.
Furthermore, in the example for the hazard unit, the data must be forwarded
to the execution stage for the updated value of R6 to be added properly.
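The selection made for each ALU operand can be sketched in Python as follows. The conditions are the standard textbook forwarding rules, and the signal names, the string labels, and the check that skips register 0 are all assumptions rather than details taken from this design.

```python
def forward_select(id_ex_reg, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Choose the source for one ALU operand in the execution stage.

    Returns "MEM" to take the value from the memory stage, "WB" to take it
    from the write back stage, or "REG" to use the register file output.
    """
    # The memory stage holds the most recent result, so it is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return "MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return "WB"
    return "REG"


# Instruction 2 reads R3 while the instruction that writes R3 is in the memory stage.
source = forward_select(3, True, 3, False, 0)
```

The same function is evaluated once per operand, which corresponds to the Forward A and Forward B cases noted in the appendix program.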
ALU CONTROL
The ALU control is a simple module that takes in values from the main control along with
values from the instruction and produces the appropriate three-bit control for the ALU. The
inputs consist of the two control bits ALUOp0 and ALUOp1 along with the six bits of
instruction[5:0]. The third bit of the control output represents the A inversion that is used
for subtract, slt, and zero comparisons. The remaining two bits select among the four main
functions that the ALU can perform. The instruction set for the ALU control is as follows.
ALU
The ALU is the main logic unit in the pipelined datapath that we built. It
combines the basic arithmetic and logic functions into one block that is
then used to run the datapath. The ALU performs four main functions: add, and, or,
and set less than. With the ability to invert one of the inputs, the add can also
behave as a subtract. The ALU takes in three inputs and returns one output. The inputs are
two 32-bit data operands and a three-bit operation code coming
from the ALU control. This gives five operations in total: add, sub, and, or, slt.
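The five operations can be sketched as a small Python function. The three-bit encoding used here (000 and, 001 or, 010 add, 110 subtract, 111 slt) is the common textbook encoding and is an assumption; the actual codes of this design are not reproduced. The function and constant names are hypothetical, and slt is treated as a signed comparison.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(a, b, op):
    """Return the 32-bit result of the operation selected by the 3-bit op."""
    a &= MASK32
    b &= MASK32
    if op == 0b000:
        return a & b
    if op == 0b001:
        return a | b
    if op == 0b010:
        return (a + b) & MASK32
    if op == 0b110:
        # Subtract is the adder with one input inverted and a carry in of 1.
        return (a - b) & MASK32
    if op == 0b111:
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported ALU operation: {op:03b}")
```

In this encoding the high bit of `op` is set only for subtract and slt, which mirrors the role of the inversion bit described for the ALU control.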
DATA REGISTER
The data register is the storage unit for data values within the datapath. The
data register is 32 bits wide and 32 locations deep, with each value initialized to 0. These 32
locations each have common names used widely when programming in assembly,
including but not limited to the temporary registers ($t0-$t7), $ra, and the $zero register.
When a store word instruction is executed, the value is stored in the data register and
can be used later by LW operations. The data register takes in four inputs and has a
single output. The inputs are: a one-bit memwrite that enables writing to the memory,
a one-bit memread that enables reading from the data register, a 32-bit data input from
the EX stage, and a five-bit address from the ALU output of the EX stage. The output
is the 32-bit value from the addressed location, which can be used to write to the register
file.
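The four inputs and single output described above can be sketched as a Python model. The class and method names are hypothetical, and returning `None` when memread is not asserted is an assumption made only to keep the model simple.

```python
class DataMemory:
    """Behavioral model: 32 locations of 32 bits, initialized to zero."""

    def __init__(self):
        self.cells = [0] * 32

    def access(self, memwrite, memread, address, data_in):
        """Perform one memory-stage access and return the read data."""
        address &= 0x1F  # five-bit address taken from the ALU output
        if memwrite:
            self.cells[address] = data_in & 0xFFFFFFFF
        return self.cells[address] if memread else None


# Store 16 at location 4, then load it back, as in the appendix program.
memory = DataMemory()
memory.access(memwrite=True, memread=False, address=4, data_in=16)
loaded = memory.access(memwrite=False, memread=True, address=4, data_in=0)
```

The value returned by the read is what the write back stage would pass to the register file for a load word.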
4. CONCLUSION
The pipelined processor has increased our knowledge of processors a great deal. We
have learned a lot about how to draft, design, implement, simulate, and troubleshoot.
Through the creation of our pipelined processor we can see how many other instructions
could be created; however, due to the time constraint of this project, we were not able
to add any non-essential instructions except for jump register and jump and link.
Regardless, the ISA we created can express the vast majority of programs. The pipelined
processor has given us a better appreciation for timing and how much work goes into
making a processor as fast as possible. Earlier in the class we implemented
single-cycle and multi-cycle processors, but the reality is that the pipeline, although more
complicated, has a much higher throughput than the other two. When you consider that
the processor is the brain of the computer, any small gain in timing can make a major
change in the total speed of the system. Overall we were able to implement a fully
working pipelined processor with a complete ISA. To view the program we wrote to
test all the instructions, as well as additional diagrams and images, refer to the appendix.
The knowledge and skills gained through this course have truly been an incredible
experience.
5. APPENDIX
The beginning of the program with the actual codes used. You can see at instruction 13 the
first hazard that is implemented.
Figure 6: Demo
Below is the program we wrote. This code demonstrates that all the instructions are
working correctly. We first started off the code by checking all instructions. We then jump
to a different area of the memory and begin a doubler program that simply keeps doubling
the value in R1. Once the value equals 16, it returns to the line after the jump and link.
CONTENT BEGIN
0 : 00000000;
1 : 00000000;
2 : 8C010001; --LW R1, (0001)
3 : 8C020002; --LW R2, (0010)
4 : 8C030003; --LW R3, (0011)
5 : 8C040004; --LW R4, (0100)
6 : 8C050005; --LW R5, (0101)
7 : 00221800; --R1 + R2 = R3 = 3
8 : 00231800; --R1 + R3 = R3 = 4 Forward B in EX stage writedata=4, writeaddress=3
9 : 00221800; --R1 + R2 = R3 =3
10 : 00611800; --R3 + R1 = R3 =4 Forward A in EX stage
11 : 8C060006; --LW R6 Test Load Hazard
12 : 00261800; --R1 + R6 =R3
13 : 30000023; --JAL to line 35 and repeat in a loop
14 : AC010004; --SW R1, (0100) should load 16 into mem location 4 should see EXALUOUT=4 and EXmuxout2=16
15 : 8C030004; --LW R3, (0100) should load 16 into R3
16 : 00231800; --add R1, R3, R3 =32 this will inflict a load hazard
17 : FC000000; --Finish program jump to zero, stall occurs and r3=32 is written to on line #2
18 : 00200800; --Junk codes through line 34
19 : 10000004;
20 : 8C010001;
21 : 15AB57CD;
22 : 30000013;
23 : 45794326;
24 : 001FF800;
25 : 1000FFFC;
26 : 458BCA6C;
27 : 345FFFFF;
28 : 61A87899;
29 : 15AB57CD;
30 : 15842579;
31 : 45794326;
32 : 61A87899;
33 : 15AB57CD;
34 : 15842579;
35 : 8C010001; --Jump to here to begin doubler program LW R1 (0001)
36 : 8C020010; --LW R2 (10000)
37 :
38 :
39 :
40 :
41 :
42 :
43 :
44 :