Implementation of A Verilog Multi Cycle CPU-FinalDraft
Abstract:
gained knowledge in CPU datapath design. The project requirements include having at least 15 different instructions, including branch and jump instructions. Each module should be separately testable. The entire design should be implemented and have the ability to run a small sample program which is easily changeable to demonstrate functionality.
Group Contributions:
Joey Nirschl: High level design, testing, implementation. Benjamin Holland: Individual component design, testing, implementation.
Time Contribution: Joey Nirschl's hours: 25 (50% of work). Ben Holland's hours: 25 (50% of work).
ProjectWorkBreakdown
Table of Contents
1. Purpose of Machine
2. Instruction Set Definition
3. Instruction Format
4. Design Methodology
5. Design
6. Testing Methodology
7. Conclusion
8. Lessons Learned
Appendix A Verilog Code & Testbench
Appendix B Simulation Results
Appendix C Commonly Made Verilog Mistakes
Appendix D Figures and Diagrams
Appendix E Sources
1.
Purpose of Machine:
The multicycle CPU design is an improvement on the single cycle design. In this implementation each instruction is executed over multiple stages, one stage per clock cycle. This improves on the single cycle design because an instruction takes only as many stages as it needs, three to five, rather than always taking as long as the slowest instruction.
Stage 1: Instruction fetch
Stage 2: Instruction decode/register fetch
Stage 3: Memory address computation, execution, branch or jump completion
Stage 4: Memory access load, memory access store, R-type instruction completion
Stage 5: Memory read completion
In our implementation every instruction shares the first two stages: instruction fetch and instruction decode/register fetch. In the first stage the instruction is fetched from memory and stored in the memory data register and the instruction register. The second stage decodes the instruction as an R-type instruction, an I-type instruction, a branch instruction, a jump instruction, or a memory access instruction. At stage three the instruction may take separate logical paths depending on the instruction type decoded in stage two. A finite state machine of these logical paths is described in Figure 1 of Appendix D. Stage three is the last stage for branch and jump instructions. After either of these has completed, the logic cycle restarts at stage one with the fetch of the next instruction. Stage four occurs for R-type and I-type instructions and for instructions which require memory access (load word, store word). Both store word and R/I-type instructions end in stage four: R/I-type instructions store the ALU result in the register file, and store word writes its value to memory. The logic cycle then begins again at stage one with the instruction fetch.
Stage five is used only by the load word instruction, which after reading the word from memory in stage four still needs to write that word to the register file. After the write to the register file is complete, the cycle continues with the fetch of the next instruction in stage one. The control module of the datapath is responsible for sequencing the stages of each instruction. The advantage of breaking instructions into stages is that fast instructions complete in fewer stages than slow ones. In a single cycle design every instruction executes in one long clock cycle, so the clock period must accommodate the slowest instruction and every other instruction waits out that time. Because each multicycle stage fits in a much shorter clock period, and some instructions finish one or two stages sooner than the longest instruction, the average time required to execute an instruction is reduced.
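The stage counts described above can be summarized in a small combinational lookup. This is only an illustrative sketch: the module name stage_count and its ports are hypothetical and are not part of the project code, and the opcode values are the ones listed in the Instruction Format section.

```verilog
// Hypothetical helper (not part of the project code): reports how many
// stages an instruction occupies, following the stage description above.
module stage_count(opcode, stages);
  input  [5:0] opcode;
  output [2:0] stages;
  reg    [2:0] stages;

  always @(*) begin
    case (opcode)
      6'h04,                         // beq: completes in stage 3
      6'h02:   stages = 3'd3;        // j:   completes in stage 3
      6'h23:   stages = 3'd5;        // lw:  needs the register write in stage 5
      default: stages = 3'd4;        // R-type, I-type ALU operations, and sw
    endcase
  end
endmodule
```

A lookup like this is only useful for reasoning about timing or for checking a simulation trace; in the actual design the stage sequencing is done by the control FSM.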
2.
Instruction Set Definition
This implementation of a multicycle CPU supports R-type instructions, I-type instructions, and branch and jump instructions. Special logic has been added to the control unit to support I-type instructions, because they were not implemented in the design by Patterson and Hennessy. The instruction set is modeled on the MIPS (Microprocessor without Interlocked Pipeline Stages) instruction set. Aside from a few minor differences in operation codes, the implemented instruction set follows the MIPS convention. The instruction format is discussed in more detail in the next section.
*Stages were added to the finite state machine to support additional functionality. The FSM can be viewed in Appendix D. (The additional logic of each figure is indicated in red.)
add - Add, stores the addition of register source (rs) and register target (rt) into register destination (rd). R[rd]=R[rs] + R[rt]
sub - Subtract, stores the difference of register source (rs) and register target (rt) into register destination (rd). R[rd]=R[rs] - R[rt]
and - And, stores the bitwise and operation of register source (rs) and register target (rt) into register destination (rd). R[rd]=R[rs] & R[rt]
or - Or, stores the bitwise or operation of register source (rs) and register target (rt) into register destination (rd). R[rd]=R[rs] | R[rt]
xor - Xor, stores the bitwise xor operation of register source (rs) and register target (rt) into register destination (rd). R[rd]=R[rs] ^ R[rt]
slt - Set Less Than, stores 1 in register destination (rd) if register source (rs) is less than register target (rt), and 0 otherwise. if(R[rs]<R[rt]){R[rd]=1} else{R[rd]=0}
beq - Branch Equal, if register source (rs) equals register target (rt), branches to the current pc value + 4 + immediate value. if(R[rs]==R[rt]){PC=PC+4+BranchAddress}
lw - Load Word, loads the 32-bit quantity at memory address register source (rs) + sign extended immediate into register target (rt). R[rt]=M[R[rs]+SignExtendedImmediate]
sw - Store Word, stores the 32-bit quantity in register target (rt) to memory address register source (rs) + sign extended immediate. M[R[rs]+SignExtendedImmediate]=R[rt]
addi - Add Immediate, stores the addition of register source (rs) and the sign extended immediate value into register target (rt). R[rt]=R[rs] + SignExtendedImmediate
andi - And Immediate, stores the bitwise and operation of register source (rs) and the zero extended immediate value into register target (rt). R[rt]=R[rs] & ZeroExtendedImmediate
xori - Xor Immediate, stores the bitwise xor operation of register source (rs) and the zero extended immediate value into register target (rt). R[rt]=R[rs] ^ ZeroExtendedImmediate
ori - Or Immediate, stores the bitwise or operation of register source (rs) and the zero extended immediate value into register target (rt). R[rt]=R[rs] | ZeroExtendedImmediate
slti - Set Less Than Immediate, stores 1 in register target (rt) if register source (rs) is less than the sign extended immediate value, and 0 otherwise. if(R[rs]<SignExtendedImmediate){R[rt]=1} else{R[rt]=0}
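The definitions above distinguish sign extended immediates (lw, sw, addi, slti) from zero extended immediates (andi, ori, xori). A minimal sketch of the difference follows; the module name extend16 and the zero_ext select input are hypothetical, and the project's own sign_extend module may be organized differently.

```verilog
// Hypothetical illustration of the two immediate extensions.
module extend16(imm, zero_ext, out);
  input  [15:0] imm;
  input         zero_ext;   // 1: zero extend, 0: sign extend
  output [31:0] out;

  // Sign extension replicates bit 15 into the upper half;
  // zero extension fills the upper half with zeros.
  assign out = zero_ext ? {16'b0, imm} : {{16{imm[15]}}, imm};
endmodule
```

For example, the immediate 16'hFFFF becomes 32'hFFFFFFFF (that is, -1) when sign extended and 32'h0000FFFF when zero extended, which is why the logical instructions use zero extension.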
3.
Instruction Format
The instruction format is different for each of the three instruction types. An R-type instruction has six fields: opcode, rs, rt, rd, shamt, and function. The opcode for an R-type instruction is always zero. The function field selects the R-type operation (e.g. add, sub, and, or, etc.). The shamt field is used in shifting operations (not implemented in this design). RD is the register destination, where the operation result is stored after execution of the instruction. RS (register source) and RT (register target) are the fields referencing the register values to be used in the computational operation of the instruction. The I-type instruction has four fields. The opcode for an I-type instruction defines the operation of the instruction. RS (register source) and RT (register target) are the fields referencing the register values to be used in the computational operation of the instruction. The immediate field of the instruction can be used either as a constant value or, after sign extension, as an offset for computing a memory address. The J-type instruction has an opcode, like the other two instruction types, to define the instruction operation. The J-type instruction also has an address field which is used to jump to the specified memory address. The table below lists the type, opcode, and function code of each instruction.
Instruction  Type  Opcode  Function
add          R     0x00    0x20
sub          R     0x00    0x22
and          R     0x00    0x24
or           R     0x00    0x25
xor          R     0x00    0x01
slt          R     0x00    0x2A
xori         I     0x0F    N/A
beq          I     0x04    N/A
lw           I     0x23    N/A
sw           I     0x2B    N/A
addi         I     0x08    N/A
andi         I     0x0C    N/A
ori          I     0x0D    N/A
slti         I     0x0A    N/A
j            J     0x02    N/A
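The fields described in this section can be sliced out of a 32-bit instruction word as sketched below. The bit positions are an assumption: they follow the standard MIPS layout, which the report says it adopts apart from some opcode values. The module name field_split is hypothetical, and the project's own InstructionDecode module may differ (for instance, it also takes a clock input).

```verilog
// Hypothetical field splitter, assuming the standard MIPS bit layout.
module field_split(instr, opcode, rs, rt, rd, shamt, funct, immediate, address);
  input  [31:0] instr;
  output [5:0]  opcode;     // all formats
  output [4:0]  rs, rt;     // R-type and I-type
  output [4:0]  rd, shamt;  // R-type only
  output [5:0]  funct;      // R-type only
  output [15:0] immediate;  // I-type only
  output [25:0] address;    // J-type only

  assign opcode    = instr[31:26];
  assign rs        = instr[25:21];
  assign rt        = instr[20:16];
  assign rd        = instr[15:11];
  assign shamt     = instr[10:6];
  assign funct     = instr[5:0];
  assign immediate = instr[15:0];
  assign address   = instr[25:0];
endmodule
```

All fields are extracted unconditionally; it is the control unit's job to use only the fields that are meaningful for the decoded instruction type.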
4. Design Methodology
The general approach to this design was to first map out a high level design of the system. Our design was based on the ideas presented in the Computer Organization and Design textbook written by David A. Patterson and John L. Hennessy. Figures 2 and 3 show a general outline of how our implementation was planned out on paper before implementation. The red markings on the figures in Appendix D are the modifications that were made to the design. After the high level design we broke the datapath down into separate modules so that we could divide work among team members and test the functionality of each module individually. Testing each module individually was extremely important because it allowed us to catch many errors in a controlled environment before they became lost in the traffic of the entire system. Modularizing the code also allowed us to assign responsibility and let each team member specialize in specific areas of the code, producing more efficient code than if the system were not modularized. A modular system also accommodates change more easily: if new functionality is needed, either a new module is added or the logic within an existing module is modified to meet the added requirement. Although modularity is important, the original design must be good enough to allow the functionality to be modularized in the first place. Once the system has been designed and implemented in pieces, it is only a matter of combining the pieces to make the entire CPU. This is easy to do in theory, but with every project there are unforeseen mistakes and logic errors. Thankfully, because the system was designed well to begin with, there was room for changes and modifications to correct the mistakes of the early implementation. After an intensive debug period, the system was complete. At this point we were able to fully document the project and consider features to add or remove, as well as other design changes.
5.
Design
As mentioned earlier, the original design was roughly based on Figures 2 and 3 of Appendix D. Also as mentioned above, each of the core datapath functions was implemented in a separate module. The actual project code can be seen in Appendix A. Below is a top level view of the final datapath implementation. (The additional logic of each figure is indicated in red.)
[Figure: top level RTL schematic of the final datapath. Labeled blocks include ALUMulticycle:ALU, MulticycleControlFSM:mainControl, ALUControlMulti:alucontrol, InstructionDecode:IDStage, sign_extend:extendImmediate, twomux5:writereg, the 32-bit pc register, memDataReg, register_B, ALUOut, and the cycle counter.]
*To examine details of design please use the zoom feature of your PDF viewer
[Figure: RTL schematic of MulticycleControlFSM. The current_state register (states 0000 and B through M) and opcode comparators (6'h00, 6'h02, 6'h04, 6'h08, 6'h0A, 6'h0C, 6'h0D, 6'h0F, 6'h2B) drive the registered control outputs RegWrite, PCWrite, MemRead, IorD, forceAdd, ALUSrcA, ALUSrcB[2..0], RegDst, PCWriteCondition, PCSource[1..0], MemWrite, MemtoReg, and IRWrite.]
[Figure: RTL schematic of ALUMulticycle. The inputs aluctrl[3..0], valueA[31..0], and valueB[31..0] feed two adders (Add0, Add1), a LessThan0 comparator, and a bank of per-bit result multiplexers selected by aluctrl; an Equal0 comparator on the two operands drives zero.]
[Figure: RTL schematic of ALUControlMulti. An opcode decoder and funct comparators (6'h01, 6'h20, 6'h22, 6'h24, 6'h25, 6'h2A), together with forceadd, drive selectors whose outputs are held in the ALUIn[3..0] latches.]
6.
Testing Methodology
Our testing methodology stems directly from our design methodology. In the design methodology we broke the important system functions into separate modules so that we could debug them individually and assign responsibility. This way each module can be tested on its own, eliminating possible interference from other modules. Once each module has been individually tested and is working, the system can be assembled from the smaller modules. At this point it is just a matter of working out any system integration issues and finding any bugs that were missed in the first stage. Once the system was completely integrated, we decided that the best way to test the system as a whole was to write a program which would demonstrate the working functionality of the entire system. After writing our test program, we found that our datapath correctly calculates the nth number of the Fibonacci sequence.
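The per-module testing described above can be made self-checking, so that a failure is reported by the simulator rather than found by reading waveforms. The sketch below is an assumption-laden illustration: mux2_5 is a hypothetical stand-in for a small module such as the project's 5-bit two-input mux, and the testbench name, task, and messages are invented for this example.

```verilog
// Hypothetical module under test: a 5-bit 2-to-1 multiplexer.
module mux2_5(sel, x1, x0, x);
  input        sel;
  input  [4:0] x1, x0;
  output [4:0] x;
  assign x = sel ? x1 : x0;
endmodule

// Self-checking testbench: compares the output against an expected value.
module mux2_5_tb;
  reg        sel;
  reg  [4:0] x1, x0;
  wire [4:0] x;
  integer    errors;

  mux2_5 dut(sel, x1, x0, x);

  // Wait for the output to settle, then compare with the expected value.
  task check;
    input [4:0] expected;
    begin
      #1;
      if (x !== expected) begin
        errors = errors + 1;
        $display("FAIL: sel=%b x=%d expected=%d", sel, x, expected);
      end
    end
  endtask

  initial begin
    errors = 0;
    x1 = 5'd10; x0 = 5'd20;
    sel = 0; check(5'd20);   // sel=0 selects x0
    sel = 1; check(5'd10);   // sel=1 selects x1
    if (errors == 0) $display("PASS: all checks passed");
    else             $display("FAIL: %0d errors", errors);
    $finish;
  end
endmodule
```

The same pattern (drive inputs, wait, compare with !==, count errors) scales to larger modules such as the ALU, where the expected value for each operation can be computed in the testbench itself.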
7.
Conclusion
Our Computer Engineering 305 project came from an accumulation of material from Cpre305 and previous courses. The knowledge we needed to complete this project included an understanding of multicycle CPUs, datapaths, control units, finite state machines, digital logic, and Verilog. With our knowledge, we were able to build individual logic modules and integrate those modules to create our multicycle processor. The processor was capable of supporting fifteen MIPS instructions. In the process of building the CPU, we added logic to the design presented in the textbook by Patterson and Hennessy to fully support our multicycle design.
8.
Lessons Learned
Save often: ModelSim has a bad habit of crashing in the lab. The more often you save, the less work is lost when a program or computer crashes.
Make backups: if all else fails, you have a backup.
Use comments: when working with others, comments allow others to understand your code. The fewer comments provided, the harder it may be for someone to understand your code in the future.
Create block schematics: block schematics help in understanding the big picture. If the block diagram generated from the Verilog code does not look correct, it can clarify the high level design and help locate mistakes in the code.
Appendix A Verilog Code & Testbench
//MultiCycle is our multicycle cpu
module MultiCycle(cycle, pc, clock, alu_out, mem_out, regdst, memread, memwrite,
                  regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite,
                  pcsource, aluscra, alusrcb);
// input/output
input clock;
output [31:0] cycle, alu_out, mem_out, pc;
output regdst, memread, memwrite, regwrite, memtoreg;
output zero;
output pcwritecond, pcwrite, iord, irwrite;
output aluscra;
output [1:0] pcsource;
output [2:0] alusrcb;

// control variables
wire regdst, memread, memwrite, regwrite, memtoreg;
wire pcwritecond, pcwrite, iord, irwrite, aluscra, zero;
wire [1:0] pcsource;
wire [2:0] alusrcb;
wire [31:0] jumpaddress, alu_out, mem_out;
wire [31:0] branchCondition;
wire [3:0] aluCtrl;

// other variables
reg [31:0] pc = 32'b0;
reg [31:0] ALUOut;
reg [31:0] register_A, register_B;
wire [31:0] memAddress;

// Decode control signals
wire [5:0] opCode;
wire [4:0] regToWrite;

//Instruction decode variables
wire [4:0] rs, rt, rd;
wire [15:0] immediatevalue;
wire [4:0] shamt;
wire [5:0] funct;
wire [25:0] address;

reg [31:0] memDataReg;
wire [31:0] regA, regB;
wire [31:0] regWriteData;
wire [31:0] imm_value;
wire [31:0] valueA, valueB;
wire forceadd;

//Data Memory module holds both data and instructions
DataMemory data(memwrite, memread, memAddress[5:0], register_B, mem_out);

//Instruction decode decodes instructions and puts values into appropriate wires
InstructionDecode IDStage(clock, mem_out, opCode, rs, rt, rd, shamt, funct,
                          immediatevalue, address);

//Control FSM: main control of the multicycle cpu
MulticycleControlFSM mainControl(opCode, clock, aluscra, iord, alusrcb, pcsource,
                                 regdst, memtoreg, memread, pcwritecond, pcwrite,
                                 memwrite, irwrite, regwrite, forceadd);

//MemoryDataRegister holds data from memory that may be written into register
always@(posedge clock)
  memDataReg = mem_out;

//Chooses appropriate write register depending on the control
twomux5 writereg(regdst, rd, rt, regToWrite);
//Chooses appropriate data to write depending on the control
twomux32 writedata(memtoreg, memDataReg, ALUOut, regWriteData);
regFileRTL RTL(clock, regwrite, regWriteData, regToWrite, rs, rt, regA, regB);

//Registers hold value until positive edge of clock, when they are updated
always@(posedge clock)
begin
  register_A = regA;
  register_B = regB;
end

//sign extend the immediate value
sign_extend extendImmediate(clock, immediatevalue, imm_value);
//zero extend the immediate value
wire [31:0] zeroextendvalue;

//temp ALU out register holds value from alu until updated on posedge clock
always@(posedge clock)
begin
  ALUOut = alu_out;
end

JumpAddress jumpTo(pc, address, jumpaddress);
//Mux chooses next data to pc depending on control
ThreeToOneMux32 branchesAndJumps(pcsource, jumpaddress, ALUOut, alu_out, branchCondition);
wire brachwritecond, gotoNextPc;
assign brachwritecond = pcwritecond & zero;
assign gotoNextPc = pcwrite | brachwritecond;
//PC update
always @ (posedge clock)
begin
  if(gotoNextPc)
    pc = branchCondition;
end
//The Testbench for our multicycle cpu
module AMultiCycleTest;
reg clock;
wire [31:0] cycle, alu_out, mem_out, pc;
wire regdst, memread, memwrite, regwrite, memtoreg;
wire zero;
wire pcwritecond, pcwrite, iord, aluscra, irwrite;
wire [1:0] pcsource;
wire [2:0] alusrcb;

initial begin
  clock = 1'b0;
end

always begin
  #15 clock = ~clock;
end

MultiCycle testcpu(cycle, pc, clock, alu_out, mem_out, regdst, memread, memwrite,
                   regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite,
                   pcsource, aluscra, alusrcb);
endmodule // END: AMultiCycleTest
//Control for multicycle cpu
module MulticycleControlFSM(opcode, clk, ALUSrcA, IorD, ALUSrcB, PCSource, RegDst, MemtoReg,
                            MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite,
                            RegWrite, forceAdd);
input [5:0] opcode;
input clk;
output ALUSrcA, IorD, RegDst, MemtoReg;
output [2:0] ALUSrcB; //3 bits wide, matching the reg declaration below and the top level alusrcb
output [1:0] PCSource;
output MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite, forceAdd;
reg MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite;
reg ALUSrcA, IorD, RegDst, MemtoReg, forceAdd;
reg [2:0] ALUSrcB;
reg [1:0] PCSource;
reg [3:0] current_state, next_state;
reg [3:0] debug;
parameter A=4'b0000, B=4'b0001, C=4'b0010, D=4'b0011, E=4'b0100, F=4'b0101, G=4'b0110,
          H=4'b0111, I=4'b1000, J=4'b1001, K=4'b1010, L=4'b1011, M=4'b1100;
//parameter A=0, B=1, C=2, D=3, E=4, F=5, G=6, H=7, I=8, J=9; K=10;L=11,M=12;

//forceAdd 1=add (only in states A, B)
initial begin
  current_state = 4'b0000;
  next_state = 4'b0000;
end

always@(posedge clk)
begin
  current_state = next_state;
end

always@(posedge clk or opcode)
begin
case(current_state)
A:begin
forceAdd=1; end
B:begin
debug = 4'b0001;
ALUSrcA=0; ALUSrcB=3'b011;
begin
  next_state=J;
end
//I-type instruction, treated as R-type
//because ALU control will take care of proper execution
else if(opcode==6'b001000 || //addI
        opcode==6'b001010)   //sltI
begin
  next_state=K; //sign extended immediate state
end
else if(opcode==6'b001101 || //orI
        opcode==6'b001100 || //andI
        opcode==6'b001111)   //xorI
begin
  next_state=M; //zero extended immediate state
end
else
  debug = 4'b1111;
end
C:begin
debug = 4'b0010;
ALUSrcA=1; ALUSrcB=3'b010;
IorD=0; PCSource=0;
forceAdd=0;
//if lw nextstate = D or sw nextstate = F
//if(opcode==35 || opcode==43)
if(opcode==6'b100011)
begin
  next_state=D;
end
else if(opcode==6'b101011)
  next_state=F;
else
  debug = 4'b1111;
end
D:begin
debug = 4'b0011;
MemRead = 1; IorD=1;
ALUSrcA=0;
next_state=E;
forceAdd=0; end
E:begin
debug = 4'b0100;
next_state=A;
forceAdd=0;
end
F:begin
debug = 4'b0101;
next_state=A;
ALUSrcA = 0; ALUSrcB = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; IRWrite = 0; RegWrite = 0;
forceAdd=0;
end
G:begin
debug = 4'b0110;
ALUSrcA=1; ALUSrcB=3'b000;
next_state=H;
IorD = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
forceAdd=0;
end
H:begin
//For RType or IType, if not RType, it is IType
//if IType regDst = 0
debug = 4'b0111;
RegDst=1'b1;
RegWrite = 1; MemtoReg=1'b0;
next_state=A;
forceAdd=0;
end
I:begin
debug = 4'b1000;
next_state=A;
forceAdd=0;
end
J:begin
ALUSrcA = 0; IorD = 0; ALUSrcB = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
debug = 4'b1010;
MemtoReg = 0; IorD = 0; RegDst = 0; MemRead = 0; PCWriteCondition = 0; PCWrite=0; MemWrite = 0; IRWrite = 0; RegWrite = 0; PCSource=2'b00;
next_state=L;
MemtoReg = 0;
IorD = 0; MemRead = 0; PCWriteCondition = 0; PCWrite=0; MemWrite = 0; IRWrite = 0; ALUSrcA = 0; ALUSrcB = 3'b000; PCSource=2'b00;
next_state=A;
PCSource=2'b00;
next_state=L;
endmodule//END: MulticycleControlFSM
//Testbench for control
module testbenchMulticycleControlFSM;
reg [5:0] op;
reg clock=0;
wire ALUSrcA, IorD, RegDst, MemtoReg;
wire [2:0] ALUSrcB; //3 bits wide to match the control module
wire [1:0] PCSource;

always begin
  #2 clock=~clock;
end

initial begin
  op=6'b000000;     //add 1
  #10 op=6'b001000; //addi 9
  #10 op=6'b000000; //Sub 2
  #10 op=6'b000100; //branch 10
  #10 op=6'b000000; //And 3
  #10 op=6'b000010; //j 15
  #10 op=6'b000000; //Or 4
  #10 op=6'b100011; //LW 16
  #10 op=6'b000000; //Mult 7
  #10 op=6'b001100; //AndI 12
  #10 op=6'b000000; //Div 8
  #10 op=6'b001111; //XorI 13
  #10 op=6'b001010; //SltI 14
//Total Lines: 186
module ALUMulticycle(aluctrl, valueA, valueB, result, zero);
input [3:0] aluctrl;
input [31:0] valueA;
input [31:0] valueB;
output [31:0] result;
reg [31:0] result;
output zero;
reg zero;

always @(aluctrl or valueA or valueB)
begin
  case(aluctrl)
    4'b0000: //Bitwise And
      begin
        result = valueA & valueB;
      end
    4'b0001: //Bitwise Or
      begin
        result = valueA | valueB;
      end
    4'b0010: //Add
      begin
        result = valueA + valueB;
      end
    4'b0101: //Xor
      begin
        result = valueA ^ valueB;
      end
    4'b0110: //Sub
      begin
        result = valueA - valueB;
      end
    4'b0111: //Slt
      begin
        result = valueA < valueB ? 1:0;
      end
  endcase
  //zero is asserted when the operands are equal
  //(the Equal0 comparator on A and B in the ALU schematic)
  if(valueA==valueB)
    zero = 1;
  else
    zero = 0;
end
endmodule
module testALUMultiCycle;
reg [3:0] aluctrl;
reg [31:0] valueA;
reg [31:0] valueB;
wire [31:0] result;
wire zero;

//unit under test; instance name assumed
ALUMulticycle alu(aluctrl, valueA, valueB, result, zero);

initial begin
  //AND
  aluctrl = 4'b0000; valueA = 0; valueB = 4294967295;
  $monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0000; valueA = 4294967295; valueB = 4294967295;
  $monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  //OR
  #5 aluctrl = 4'b0001; valueA = 4294967295; valueB = 4294967295;
  $monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0001; valueA = 0; valueB = 0;
  $monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  //Add
  #5 aluctrl = 4'b0010; valueA = 5; valueB = 5;
  $monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0010; valueA = 0; valueB = 4294967295;
  $monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  //XOR
  #5 aluctrl = 4'b0101; valueA = 0; valueB = 1;
  $monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0101; valueA = 4294967295; valueB = 0;
  $monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  //Subtract
  #5 aluctrl = 4'b0110; valueA = 5; valueB = 4;
  $monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0110; valueA = 4294967295; valueB = 0;
  $monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  //SLT
  #5 aluctrl = 4'b0111; valueA = 5; valueB = 4;
  $monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
  #5 aluctrl = 4'b0111; valueA = 0; valueB = 4294967295;
  $monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
           aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
end
endmodule
//ALU control
module ALUControlMulti(funct, opcode,forceadd, ALUIn);
input [5:0]funct;
input [5:0] opcode;
input forceadd;
output [3:0]ALUIn;
reg [3:0]ALUIn;

    if(forceadd==1)
    begin
        ALUIn = 4'b0010;
    end
    else
    begin
        //begin case
        case(opcode)
            //R-Type
            6'b000000:
            begin
                //And
                if(funct==6'b100100) begin ALUIn = 4'b0000; end
                //Or
                else if(funct==6'b100101) begin ALUIn = 4'b0001; end
                //Add
                else if(funct==6'b100000) begin ALUIn = 4'b0010; end

                ALUIn = 4'b0101; end
                //Sub
                else if(funct==6'b100010) begin ALUIn = 4'b0110; end
                //Slt
                else if(funct==6'b101010) begin ALUIn = 4'b0111; end
            end//end R-type
            //Begin I-Type
            //AndI
            6'b001100://C
            begin ALUIn = 4'b0000; end
            //OrI
            6'b001101://D
            begin ALUIn = 4'b0001; end
            //XorI
            6'b001111://F
            begin ALUIn = 4'b0101; end
            //SltI
            6'b001010://A
            begin ALUIn = 4'b0111; end
            //AddI
            6'b001000://8
            begin ALUIn = 4'b0010; end
            //Branch
            6'b000100://4
            begin ALUIn = 4'b0010; end
            //LW
            6'b100011:
            begin ALUIn = 4'b0010; end
            //SW
            6'b101011:
            begin ALUIn = 4'b0010; end
            //End I-Type
//ALU control testbench
module testALUControlMulti;
reg clock;
reg [5:0]funct;
reg [5:0]op;
wire [3:0]ALUIn;
initial
begin
    $monitor(" Time=%d,\top=%d,\t funct=%d,\t ALUIn=%d", $time, op,funct, ALUIn);
end
initial
begin
    op=6'b000000;funct=6'b100000;//add 1
    #20 op=6'b000000;funct=6'b100010;//Sub 2
    #20 op=6'b000000;funct=6'b100100;//And 3
    #20 op=6'b000000;funct=6'b100101;//Or 4
    #20 op=6'b000000;funct=6'b000001;//Xor 5
    #20 op=6'b000000;funct=6'b101010;//Slt 6
    #20 op=6'b000000;funct=6'b011000;//Mult 7
    #20 op=6'b000000;funct=6'b011010;//Div 8
    #20 op=6'b001000;funct=6'b010100;//addi 9
    #20 op=6'b000100;funct=6'b000110;//branch (I) 10 ///???
    #20 op=6'b001101;funct=6'bx;//OrI 11
    #20 op=6'b001100;funct=6'bx;//AndI 12
    #20 op=6'b001111;funct=6'bx;//XorI 13
    #20 op=6'b001010;funct=6'bx;//SltI 14
    #20 op=6'b000010;funct=6'bx; //j 15
    #20 op=6'b100011; funct=6'bx;//LW 16
    #20 op=6'b101011; funct=6'bx;//SW 17
    #20 $stop;
end
//Data Memory module
module DataMemory( memWrite,memRead,Address, writeData,readData);
input memWrite, memRead;
input [5:0] Address;
input [31:0] writeData;
output [31:0] readData;
reg [31:0] readData;
reg [31:0]dataMemory[1024:0];
initial
begin
    dataMemory[0] = 32'b00100000000101010000000000010100;//N=20
    dataMemory[4] = 32'b00000000000000001011100000100000;
    dataMemory[8] = 32'b00010010101000000000000000000110;
    dataMemory[12] = 32'b00100000000101100000000000000001;
    dataMemory[16] = 32'b00000010111101101011100000100000;
    dataMemory[20] = 32'b00000010111101101011000000100010;
    dataMemory[24] = 32'b00100010101101011111111111111111;
    dataMemory[28] = 32'b00010010101000000000000000000001;

    if(memWrite == 1'b1)
    begin
        dataMemory[Address]=writeData;
    end
    if(memRead ==1'b1)
    begin
        readData=dataMemory[Address];
    end
end
endmodule//END: DataMemory
reg memWrite,memRead;
reg [5:0] Address;
reg [31:0] writeData;
wire [31:0] readData;
initial
begin
    //One format field per argument, including $time
    $monitor(" Time=%d, memWrite=%d, memRead=%d, Address=%d, writeData=%d,readData=%d ", $time,memWrite,memRead,Address, writeData,readData);
end
initial
begin
    memRead=1;
    #20 Address=4;
    #20 memRead=0;
    #20 Address=1;
    #20 memWrite=1;
    #20 writeData=32'b1;
    #20 $stop;
end
//Register File
module regFileRTL(clock,regWrite,inData,wrReg,readA, readB,regA,regB);
input clock;
input regWrite;
input [31:0] inData;
input [4:0] wrReg;
input [4:0] readA;
input [4:0] readB;
output [31:0] regA;
output [31:0] regB;
reg [31:0] registerFiles[31:0];
end
endmodule//END: regFileRTL
//InstructionDecode decodes the instruction
module InstructionDecode(clock,instruction, opcode, rs,rt,rd,shamt,funct, immediate, address);
input clock;
input [31:0] instruction;
output [5:0] opcode, funct;
output [4:0] rs, rt,rd,shamt;
output [15:0] immediate;
output [25:0] address;
reg [5:0] opcode, funct;
reg [4:0] rs, rt,rd,shamt;
reg [15:0] immediate;
reg [25:0] address;
always@(posedge clock)
begin
    assign opcode = instruction[31:26];
    assign rs = instruction[25:21];
    assign rt = instruction[20:16];
    assign rd = instruction[15:11];
    assign shamt = instruction[10:6];
    assign funct = instruction[5:0];
    assign immediate = instruction[15:0];
    assign address = instruction[25:0];
end
endmodule// END: InstructionDecode
//Instruction decode testbench
module AInstrTest;
reg clock;
reg [31:0] instr;
wire [5:0] opcode, funct;
wire [4:0] rs, rt,rd,shamt;
wire [15:0] immediate;
wire [25:0] address;
initial
begin
    //Each field is printed twice: the expected slice of instr, then the decoder output
    $monitor("Time=%d, instOp=%d,%d,instRs=%d,%d,instRt=%d,%d,instRd=%d,%d,instShT=%d,%d,instFt=%d,%d,instImm=%d,%d,instAdd=%d;%d", $time,instr[31:26], opcode,instr[25:21], rs,instr[20:16],rt,instr[15:11],rd,instr[10:6],shamt,instr[5:0],funct,instr[15:0], immediate,instr[25:0], address);
    clock=0;
end
always #2 clock= ~clock;
initial
begin
    instr = 32'b00100000000101010000000000010001;
    #20 instr = 32'b00000000000000001011100000100000;
    #20 instr = 32'b00010010101000000000000000000110;
    #20 instr = 32'b00100000000101100000000000000001;
    #20 instr = 32'b00000010111101101011100000100000;
    #20 instr = 32'b00000010111101101011000000100010;
    #20 instr = 32'b00100010101101011111111111111111;
    #20 instr = 32'b00010010101000000000000000000001;
    #20 instr = 32'b00010000000000001111111111111011;
    #20 instr = 32'b10101100000101110000000000000001;
    #20 $stop;
end
InstructionDecode testdecode(clock,instr, opcode, rs,rt,rd,shamt,funct, immediate, address);
endmodule//END:AInstrTest
//Sign extension module
module sign_extend(clock,value,signvalue);
input clock;
input [15:0] value;
output [31:0] signvalue;
reg [31:0] signvalue;
always@(posedge clock)
begin
    signvalue[31:16] = 16'b0000000000000000;
    if(value[15] ==1'b1)
//Sign Extend test bench
module testSignExtend;
reg clock;
reg [15:0] value;
wire [31:0] newvalue;
initial
begin
    //%b on the whole vector prints the same bits as one %b per bit
    $monitor(" Time=%d, value=%b, signvalue=%b", $time, value, newvalue);
    clock=0;
end
always #2 clock= ~clock;
initial
begin
    value = 0;
    #20 value = 1;
    #20 value = 2;
    #20 value = 3;
    #20 value = 4;
    #20 value = 5;
    #20 value = 20;
    #20 value = 40;
    #20 value = 500;
    #20 value = 10000;
    #20 value = 16'b1000000000000000;
    #20 value = 16'b1000000000000001;
    #20 value = 16'b0111111111111111;
    #20 value = 16'b1010101010101010;
    #20 value = 16'b1111111111111111;
    #20 value = 16'b1111111111111110;
    #20 $stop;
end
sign_extend testsign(clock,value,newvalue);
endmodule
//Zero extension module
module zero_extend(clock,value, zerovalue);
input clock;
input [15:0] value;
output [31:0] zerovalue;
reg [31:0] zerovalue;
always@(posedge clock)
begin
    zerovalue[31:16] = 16'b0000000000000000;
    zerovalue[15:0] = value;
end
endmodule
module testZeroExtend;
reg [15:0] value;
reg clock;
wire [31:0] newvalue;
initial
begin
    //%b on the whole vector prints the same bits as one %b per bit
    $monitor(" Time=%d, value=%b, zerovalue=%b", $time, value, newvalue);
    clock=0;
end
always #2 clock= ~clock;
initial
begin
    value = 0;
    #20 value = 1;
    #20 value = 2;
    #20 value = 3;
    #20 value = 4;
    #20 value = 5;
    #20 value = 20;
    #20 value = 40;
    #20 value = 500;
    #20 value = 10000;
    #20 value = 16'b1000000000000000;
    #20 value = 16'b1000000000000001;
    #20 value = 16'b0111111111111111;
    #20 value = 16'b1010101010101010;
    #20 value = 16'b1111111111111111;
    #20 value = 16'b1111111111111110;
//JumpAddress
module JumpAddress(pc,address,newAddress);
input [31:0] pc;
input [25:0] address;
output [31:0] newAddress;
reg [31:0] newAddress;
always@(pc or address)
begin
    newAddress[31:28] = pc[31:28];
    newAddress[27:0] = (address <<2);
end
endmodule//END: JumpAddress
//JumpAddress testbench
module AJumpAddressTest;
reg [31:0] pc;
reg [25:0] addr;
wire [31:0] newAddr;
integer x,y;
initial
begin
    $monitor(" Time=%d, pc=%d, addr=%d, newAddr=%d", $time,pc,addr,newAddr);
end
initial
begin
    x=0;
    y=0;
    addr = 32'b0;
    pc = 32'b00010000000000000000000000000000;
    for(x = 0; x < 32; x=x+1)
    begin
        #10 addr=x;
    end
    pc = 32'b00110000000000000000000000000000;
    addr= 32'b11110000000000000000000000000000;
    for(y = 0; y < 32; y=y+1)
    begin
        #10 addr=y;
    end
    #20 $stop;
end
JumpAddress jumptest(pc,addr,newAddr);
endmodule
//Twomux5 has a datapath 5 bits wide and a choice of two elements
module twomux5(a,x1,x0,x);
input a;
input [4:0] x1,x0;
output [4:0]x;
reg [4:0]x;

    if(a == 1'b1)
    begin
        x = x1;
    end
    else if(a==1'b0)
    begin
        x = x0;
    end
//Twomux32 has a datapath 32 bits wide and a choice of two elements
module twomux32(a,x1,x0,x);
input a;
input [31:0] x1,x0;
output [31:0]x;
reg [31:0]x;
always@(a or x1 or x0)
begin
    if(a == 1'b1)
    begin
        x = x1;
    end
//ThreeToOneMux32 has a datapath 32 bits wide and a choice of three elements
module ThreeToOneMux32(select,x2,x1,x0,out);
input [1:0] select;
input [31:0] x2,x1,x0;
output [31:0] out;
reg [31:0] out;
//x1 must be in the sensitivity list so out follows x1 when select is 2'b01
always@(select or x0 or x1 or x2)
begin
    if(select == 2'b00) begin out = x0; end
    if(select == 2'b01) begin out = x1; end
    if(select == 2'b10) begin out = x2; end
end
endmodule// END:ThreeToOneMux32

//FiveToOneMux32 has a datapath 32 bits wide and a choice of five elements
module FiveToOneMux32(select,x4,x3,x2,x1,x0,out);
input [2:0] select;
input [31:0] x4,x3,x2,x1,x0;
output [31:0] out;
reg [31:0] out;
always@(select or x0 or x1 or x2 or x3 or x4)
begin
    if(select == 3'b000) begin out = x0; end
    if(select == 3'b001) begin out = x1; end
    if(select == 3'b010) begin out = x2; end
    if(select == 3'b011) begin out = x3; end
    if(select == 3'b100) begin out = x4; end
end
endmodule
Appendix B Simulation Results
The following simulation results are from a program we wrote that calculates the nth number of the Fibonacci sequence. In this simulation n was set to 20. The simulation produced 6765 as the 20th Fibonacci number, which is correct.
      addi $21,$0,20
      add $23,$0,$0
      beq $21,$0,end
      addi $22,$0,1
loop: add $23,$23,$22
      sub $22,$23,$22
      addi $21,$21,-1
      beq $21,$0,end
      beq $0,$0,loop
end:  sw $23,1($0)
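The register behavior of the listing above can be checked with a short software model. The following Python sketch is an illustration and not part of the project: `run_fibonacci_program` is a hypothetical helper that mirrors each instruction with one statement and ignores 32-bit overflow, which does not occur for N = 20.

```python
def run_fibonacci_program(n):
    """Mirror the register updates of the assembly listing for a given N."""
    r21 = n                  # addi $21,$0,N    loop counter
    r23 = 0                  # add  $23,$0,$0   running Fibonacci value
    if r21 != 0:             # beq  $21,$0,end  skip the loop when N is 0
        r22 = 1              # addi $22,$0,1
        while True:
            r23 = r23 + r22  # loop: add $23,$23,$22
            r22 = r23 - r22  # sub  $22,$23,$22 (the previous value of $23)
            r21 = r21 - 1    # addi $21,$21,-1
            if r21 == 0:     # beq  $21,$0,end
                break
            # beq $0,$0,loop: unconditional branch back to loop
    return r23               # end: sw $23,1($0) stores this value


print(run_fibonacci_program(20))  # prints 6765
```

With N = 20 the model returns 6765, the same value the report gives for the simulation.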
To double-check our binary encoding, we assembled our assembly code in the MIPS simulator SPIM.
[0x00400000]
[0x00400004]
[0x00400008] 0x12a00007 beq $21, $0, 28 [end-0x00400008]   ; 3: beq $21,$0,end
[0x0040000c] 0x20160001 addi $22, $0, 1                    ; 4: addi $22, $0,1
[0x00400010] 0x02f6b820 add $23, $23, $22                  ; 6: add $23,$23,$22
[0x00400014] 0x02f6b022 sub $22, $23, $22                  ; 7: sub $22,$23,$22
[0x00400018] 0x22b5ffff addi $21, $21, -1                  ; 8: addi $21,$21,-1
[0x0040001c] 0x12a00002 beq $21, $0, 8 [end-0x0040001c]
[0x00400020] 0x1000fffc beq $0, $0, -16 [loop-0x00400020]
[0x00400024] 0xac170001 sw $23, 1($0)
dataMemory[0] = 32'b00100000000101010000000000010100;
dataMemory[4] = 32'b00000000000000001011100000100000;
dataMemory[8] = 32'b00010010101000000000000000000110;
dataMemory[12] = 32'b00100000000101100000000000000001;
dataMemory[16] = 32'b00000010111101101011100000100000;
dataMemory[20] = 32'b00000010111101101011000000100010;
dataMemory[24] = 32'b00100010101101011111111111111111;
dataMemory[28] = 32'b00010010101000000000000000000001;
dataMemory[32] = 32'b00010000000000001111111111111011;
dataMemory[36] = 32'b10101100000101110000000000000001;
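The memory image can also be decoded mechanically and compared with the SPIM listing. The Python sketch below is an illustration, not part of the project: `decode` and `branch_target` are hypothetical helpers, the field positions are the standard MIPS ones used by the InstructionDecode module, and the branch rule (target = address + 4 + 4 * immediate) is an assumption about how the datapath resolves `beq`.

```python
# Memory image copied from the dataMemory initialization, keyed by address.
PROGRAM = {
    0: "00100000000101010000000000010100",
    4: "00000000000000001011100000100000",
    8: "00010010101000000000000000000110",
    12: "00100000000101100000000000000001",
    16: "00000010111101101011100000100000",
    20: "00000010111101101011000000100010",
    24: "00100010101101011111111111111111",
    28: "00010010101000000000000000000001",
    32: "00010000000000001111111111111011",
    36: "10101100000101110000000000000001",
}

BEQ = 0b000100  # opcode the ALU control module labels as Branch


def decode(bits):
    """Split a 32-bit word (given as a binary string) into MIPS fields."""
    word = int(bits, 2)
    imm = word & 0xFFFF
    if imm & 0x8000:          # interpret the immediate as a signed 16-bit value
        imm -= 0x10000
    return {
        "opcode": word >> 26,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "funct": word & 0x3F,
        "imm": imm,
    }


def branch_target(address, imm):
    """Assumed convention: the offset counts words from the next instruction."""
    return address + 4 + 4 * imm


for address, bits in PROGRAM.items():
    fields = decode(bits)
    line = f"{address:2d}: 0x{int(bits, 2):08x} {fields}"
    if fields["opcode"] == BEQ:
        line += f" -> target {branch_target(address, fields['imm'])}"
    print(line)
```

Under that assumption the three `beq` words resolve to addresses 36, 36 and 16, which are the `end` and `loop` labels. Their immediates (6, 1 and -5) are each one less than the values in the SPIM encodings (0x0007, 0x0002 and 0xfffc); this would be consistent with SPIM measuring the offset from the branch instruction itself, although that is an inference from the numbers and not something the SPIM output states.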
On the following pages are the results of the simulation running the program described above.
[Waveform, page 1: ModelSim trace of the program above, 0 to 12 us. Entity: AMultiCycleTest. Date: Sun Dec 02 8:30:30 PM Central Standard Time 2007. Signals shown under /AMultiCycleTest/testcpu: clock, cycle, alu_out, mem_out, pc, regdst, memread, memwrite, regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite, aluscra, pcsource, alusrcb, jumpaddress, branchCondition, aluCtrl, ALUOut, register_A, register_B, memAddress, opCode, regToWrite, rs, rt, rd, immediatevalue, shamt, funct, address, memDataReg, regA, regB, regWriteData, imm_value, valueA, valueB, forceadd, zeroextendvalue, brachwritecond.]
[Waveform, page 2: same ModelSim run, 0 to 12 us. Signals shown: /AMultiCycleTest/testcpu/gotoNextPc, /AMultiCycleTest/testcpu/RTL/registerFiles[31] through [0], and /AMultiCycleTest/testcpu/data/dataMemory[1]. The register values step through the Fibonacci numbers up to 6765 while the loop counter counts down from 20, and dataMemory[1] is shown with the value 6765.]
Appendix C Commonly Made Verilog Mistakes
When a register or wire name is misspelled, ModelSim may not report an error or a warning, because Verilog implicitly declares an undeclared name used in a port connection as a one-bit wire. The design still compiles, but it no longer functions correctly.
ModelSim issues a warning, but not an error, when a register is assigned a value that is wider than the register. Only the low-order bits of the value are kept and the upper bits are dropped, which is rarely what the developer intended.
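The truncation can be modeled outside the simulator. The following Python sketch is an illustration only: `assign_to_reg` is a hypothetical helper that keeps the low-order bits, which is what happens in the JumpAddress testbench when the 32-bit literal 32'b1111 followed by 28 zeros is assigned to the 26-bit `addr`.

```python
def assign_to_reg(value, width):
    """Return the bits a `width`-bit reg keeps when `value` is assigned to it."""
    return value & ((1 << width) - 1)


# A 32-bit literal with only its top four bits set, assigned to a 26-bit reg:
# every set bit lies above bit 25, so the reg ends up holding zero.
print(assign_to_reg(0xF0000000, 26))  # prints 0

# A value that fits in the reg is stored unchanged.
print(assign_to_reg(0b10110, 5))      # prints 22
```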
Each module should be given exactly the same name as the file that holds it. Although this is not a strict rule of Verilog, it is good practice because some tools, such as Quartus II, depend on this naming scheme.
Pay close attention to the sensitivity list of an always block. If a signal that is read inside the block is missing from the list, the block is not re-evaluated when that signal changes, so its outputs can hold stale values. This is a confusing issue to track down when debugging.
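The effect of an incomplete sensitivity list can be imitated in software. The Python sketch below is a simplified event model, not Verilog simulation semantics in full: `run_always_block` is a hypothetical function whose body is a three-input mux, and the signal names and starting values are chosen only for illustration.

```python
def run_always_block(sensitivity, events):
    """Apply (signal, value) events and record `out` after each one.

    The block body is re-evaluated only when a signal named in
    `sensitivity` changes value, as in always@(a or b or ...).
    """
    signals = {"select": 0, "x0": 10, "x1": 20, "x2": 30}
    out = None  # stands in for Verilog's unknown value before the first run
    history = []
    for name, value in events:
        changed = signals[name] != value
        signals[name] = value
        if changed and name in sensitivity:
            inputs = (signals["x0"], signals["x1"], signals["x2"])
            out = inputs[signals["select"]]
        history.append(out)
    return history


events = [("select", 1), ("x1", 99)]
print(run_always_block({"select", "x0", "x1", "x2"}, events))  # [20, 99]
print(run_always_block({"select", "x0", "x2"}, events))        # [20, 20]
```

With `x1` left out of the list, the output keeps the old value 20 even though the selected input has changed to 99.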
In Verilog an output that is assigned inside a procedural block must also be declared as a register.
Begin and end statements must be used properly. A begin statement without a matching end statement will cause compilation problems.
To connect the output of one module to another module, a wire must be used. Using a register will cause a compilation error.
Misunderstanding the difference between blocking and non-blocking assignment statements can cause subtle problems in the behavior of Verilog code. This is also a very hard issue to debug.
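The difference between the two assignment styles shows up clearly when two registers exchange values. The following Python sketch is a model of one clock edge, not Verilog itself; both function names are hypothetical, and each mimics an always block containing the two statements shown in its comment.

```python
def clock_edge_blocking(a, b):
    """Model of: a = b; b = a;  Each statement takes effect immediately."""
    a = b      # a now holds the old b
    b = a      # b reads the already-updated a, so it keeps its old value
    return a, b


def clock_edge_nonblocking(a, b):
    """Model of: a <= b; b <= a;  All right-hand sides are sampled first."""
    next_a = b           # sampled from the values before the edge
    next_b = a
    return next_a, next_b  # updates are applied together


print(clock_edge_blocking(1, 2))     # prints (2, 2)
print(clock_edge_nonblocking(1, 2))  # prints (2, 1)
```

Only the non-blocking version swaps the two values; the blocking version copies one register into both.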
Figure 3 High Level Datapath Design with Control Logic (Patterson, Hennessy, page 323)
Appendix E Sources
David A. Patterson, John L. Hennessy. Computer Organization and Design, Revised Printing 3rd Ed. New York: Elsevier, 2007.