
Implementation of a Verilog Multicycle CPU

Joey Nirschl, Benjamin Holland


Iowa State University Department of Computer Engineering Ames, Iowa 50011 (515) 294-4111
jnirsch@iastate.edu, bholland@iastate.edu

Keywords: Verilog, Simulation, Multicycle Processor, CPU, Datapath, Instruction set

Abstract:

This was a semester term project intended to solidify our knowledge of CPU datapath design. The project requirements included at least 15 different instructions, including branch and jump instructions. Each module had to be separately testable. The entire design had to be implemented and able to run a small, easily changeable sample program to demonstrate functionality.

Group Contributions:
Joey Nirschl: High level design, testing, implementation.
Benjamin Holland: Individual component design, testing, implementation.

Time Contribution:
Joey Nirschl's Hours: 25 (50% of work)
Ben Holland's Hours: 25 (50% of work)

Project Work Breakdown:
Design (10%)
Programming (20%)
Testing (50%)
Documentation (20%)

Table of Contents
Purpose of Machine
Instruction Set Definition
Instruction Format
Design Methodology
Design
Testing Methodology
Conclusion
Lessons Learned
Appendix A Verilog Code & Testbench
Appendix B Simulation Results
Appendix C Commonly Made Verilog Mistakes
Appendix D Figures and Diagrams
Appendix E Sources

1. Purpose of Machine

The multicycle CPU design is an improvement on the single cycle design. In this implementation, instructions are executed in multiple stages. This is a significant improvement over the single cycle design because each instruction completes in three to five stages, using only the stages it needs.

These stages include:

Stage 1: Instruction fetch
Stage 2: Instruction decode/register fetch
Stage 3: Memory address computation, execution, branch or jump completion
Stage 4: Memory access load, memory access store, R-type instruction completion
Stage 5: Memory read completion

In our implementation every instruction shares the first two stages: instruction fetch and instruction decode/register fetch. In the first stage the instruction word is fetched from memory and stored in the memory data register and the instruction register. The second stage decodes the instruction as an R-type instruction, an I-type instruction, a branch instruction, a jump instruction, or a memory access instruction (load word or store word). At stage three the instruction may take separate logical paths depending on the instruction type decoded in stage two. A finite state machine of these logical paths is shown in Figure 1 of Appendix D.

Stage three is the last stage for branch and jump instructions. After either of these instructions has completed, the logic cycle restarts at stage one with the fetch of the next instruction.

Stage four occurs for R-type and I-type instructions and for instructions which require memory access (load word, store word). Both store word and R/I-type instructions end in stage four: R/I-type instructions store the ALU result in the register file, and store word writes its value to memory. The logic cycle then begins again at stage one with the instruction fetch.

Stage five is used only by the load word instruction, which, after reading the word from memory in stage four, still needs to write that word to the register file. Once the write to the register file is complete, the cycle continues with the fetch of the next instruction in stage one.

The control module of the datapath is responsible for sequencing the stages of each instruction. The advantage of breaking instructions up into stages is that fast instructions complete in fewer stages than slow instructions. In a single cycle design, every instruction executes in one long clock cycle, so the system must wait during every instruction for the time the longest instruction would take. In the multicycle design the clock period only needs to cover one stage, and some instructions finish one or two stages sooner than load word, so the average time needed to execute an instruction can be reduced.
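As a rough worked example of this trade-off, take the stage counts given above (3 for beq and j, 4 for R-type, I-type, and sw, 5 for lw) and a purely hypothetical instruction mix of 15% branches and jumps, 60% R/I-type and store instructions, and 25% loads. The mix is an assumption chosen for illustration, not a measurement from this project. The average number of stages per instruction is then:

```latex
\text{stages}_{\text{avg}} = 0.15 \times 3 + 0.60 \times 4 + 0.25 \times 5 = 0.45 + 2.40 + 1.25 = 4.10
```

That is below the 5 stages each instruction would need if all of them had to follow the longest (load word) path.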

2. Instruction Set Definition

This implementation of a multicycle CPU supports R-type instructions, I-type instructions, and branch and jump instructions. Special logic has been added to the control unit to support I-type instructions because they were not previously implemented in the design by Patterson and Hennessy. The instruction set is modeled on the MIPS (Microprocessor without Interlocked Pipeline Stages) instruction set. Aside from a few minor differences in operation codes, the implemented instruction set follows the MIPS convention. The instruction format is discussed in more detail in the next section.

*Stages were added to the finite state machine to support additional functionality. The FSM can be viewed in Appendix D. (The additional logic of each figure is indicated in red.)

The instructions included in this set are as listed below:

add - Add, stores the sum of register source (rs) and register target (rt) into register destination (rd). R[rd] = R[rs] + R[rt]

sub - Subtract, stores the difference of register source (rs) and register target (rt) into register destination (rd). R[rd] = R[rs] - R[rt]

and - And, stores the bitwise and of register source (rs) and register target (rt) into register destination (rd). R[rd] = R[rs] & R[rt]

or - Or, stores the bitwise or of register source (rs) and register target (rt) into register destination (rd). R[rd] = R[rs] | R[rt]

xor - Xor, stores the bitwise xor of register source (rs) and register target (rt) into register destination (rd). R[rd] = R[rs] ^ R[rt]

slt - Set Less Than, stores 1 in register destination (rd) if register source (rs) is less than register target (rt), and 0 otherwise. if(R[rs] < R[rt]){R[rd] = 1} else{R[rd] = 0}

beq - Branch Equal, if register source (rs) equals register target (rt), branches to the current PC value + 4 + branch address (the sign extended immediate shifted left by two). if(R[rs] == R[rt]){PC = PC + 4 + BranchAddress}

lw - Load Word, loads the 32-bit quantity at the memory address given by register source (rs) + sign extended immediate into register target (rt). R[rt] = M[R[rs] + SignExtendedImmediate]

sw - Store Word, stores the 32-bit quantity in register target (rt) to the memory address given by register source (rs) + sign extended immediate. M[R[rs] + SignExtendedImmediate] = R[rt]

addi - Add Immediate, stores the sum of register source (rs) and the sign extended immediate value into register target (rt). R[rt] = R[rs] + SignExtendedImmediate

andi - And Immediate, stores the bitwise and of register source (rs) and the zero extended immediate value into register target (rt). R[rt] = R[rs] & ZeroExtendedImmediate

xori - Xor Immediate, stores the bitwise xor of register source (rs) and the zero extended immediate value into register target (rt). R[rt] = R[rs] ^ ZeroExtendedImmediate

ori - Or Immediate, stores the bitwise or of register source (rs) and the zero extended immediate value into register target (rt). R[rt] = R[rs] | ZeroExtendedImmediate

slti - Set Less Than Immediate, stores 1 in register target (rt) if register source (rs) is less than the sign extended immediate value, and 0 otherwise. if(R[rs] < SignExtendedImmediate){R[rt] = 1} else{R[rt] = 0}

j - Jump, unconditionally jumps to the instruction at the specified address. The target is the upper four bits of the PC concatenated with the address field shifted left by two. PC = {PC[31:28], address, 2'b00}
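The extension and jump-address rules used in the definitions above can be sketched in Verilog as follows. This is a minimal illustration and not the project's code: the module name `imm_and_jump_sketch` and all of its port names are hypothetical, and the jump target is formed by concatenation as in the standard MIPS definition.

```verilog
// Hypothetical sketch: immediate extension and jump-target formation.
// Module and port names are illustrative and do not come from Appendix A.
module imm_and_jump_sketch(
    input  wire [15:0] immediate,   // I-type immediate field
    input  wire [25:0] address,     // J-type address field
    input  wire [31:0] pc,          // current program counter
    output wire [31:0] sign_ext,    // used by addi, slti, lw, sw, beq
    output wire [31:0] zero_ext,    // used by andi, ori, xori
    output wire [31:0] branch_off,  // sign_ext shifted left by two
    output wire [31:0] jump_target  // {pc[31:28], address, 2'b00}
);
    // Replicate the sign bit into the upper 16 bits.
    assign sign_ext    = {{16{immediate[15]}}, immediate};
    // Fill the upper 16 bits with zeros.
    assign zero_ext    = {16'b0, immediate};
    // Shift left by two: word offset to byte offset.
    assign branch_off  = {sign_ext[29:0], 2'b00};
    // Keep the upper four PC bits, append the address field and two zeros.
    assign jump_target = {pc[31:28], address, 2'b00};
endmodule
```

For instance, an immediate of 16'hFFFC sign extends to 32'hFFFFFFFC and zero extends to 32'h0000FFFC, which is why the logical instructions (andi, ori, xori) and the arithmetic instructions (addi, slti) need different extension paths.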

3. Instruction Format

The instruction format is different for each of the three instruction types.

An R-type instruction has six fields: opcode, rs, rt, rd, shamt, and function. The opcode for an R-type instruction is always zero. The function field selects the R-type operation (e.g. add, sub, and, or). The shamt field is used in shifting operations (not implemented in this design). RD (register destination) is where the result is stored after the instruction executes. RS (register source) and RT (register target) reference the register values used as operands.

The I-type instruction has four fields: opcode, rs, rt, and immediate. The opcode defines the operation of the instruction. RS (register source) and RT (register target) reference the registers used by the instruction; for most I-type instructions RT receives the result. The immediate field can be used either as a constant value or, after sign extension, as an offset for computing a memory address.

The J-type instruction has an opcode, like the other two types, to define the instruction operation. It also has an address field which is used to jump to the specified memory address. The figures below show the individual fields of each instruction type.

Instruction   Type   Opcode   Function
add           R      0x00     0x20
sub           R      0x00     0x22
and           R      0x00     0x24
or            R      0x00     0x25
xor           R      0x00     0x01
slt           R      0x00     0x2A
xori          I      0x0F     N/A
beq           I      0x04     N/A
lw            I      0x23     N/A
sw            I      0x2B     N/A
addi          I      0x08     N/A
andi          I      0x0C     N/A
ori           I      0x0D     N/A
slti          I      0x0A     N/A
j             J      0x02     N/A
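The field layout described in this section can be sketched as a purely combinational Verilog module. This is an illustration only: the module name `field_split_sketch` is hypothetical, and the bit positions are the standard MIPS ones, which is an assumption based on the statement that the design follows the MIPS convention. The project's own InstructionDecode module in Appendix A takes a clock input and may differ.

```verilog
// Hypothetical sketch: slicing a 32-bit instruction word into its fields.
// Bit positions follow the standard MIPS layout (an assumption).
module field_split_sketch(
    input  wire [31:0] instruction,
    output wire [5:0]  opcode,     // all instruction types
    output wire [4:0]  rs,         // R-type and I-type
    output wire [4:0]  rt,         // R-type and I-type
    output wire [4:0]  rd,         // R-type only
    output wire [4:0]  shamt,      // R-type only (unused in this design)
    output wire [5:0]  funct,      // R-type only
    output wire [15:0] immediate,  // I-type only
    output wire [25:0] address     // J-type only
);
    assign opcode    = instruction[31:26];
    assign rs        = instruction[25:21];
    assign rt        = instruction[20:16];
    assign rd        = instruction[15:11];
    assign shamt     = instruction[10:6];
    assign funct     = instruction[5:0];
    assign immediate = instruction[15:0];
    assign address   = instruction[25:0];
endmodule
```

With this layout the word 32'h00221820 decodes to opcode 0x00, rs 1, rt 2, rd 3, shamt 0, and funct 0x20, which the table above identifies as an add.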

4. Design Methodology

The general approach to this design was to first map out a high level design of the system. Our design was based on the ideas presented in the textbook Computer Organization and Design by David A. Patterson and John L. Hennessy. Figures 2 and 3 show a general outline of how our implementation was planned on paper before implementation. The red markings on the figures in Appendix D are the modifications that we made to the design.

After the high level design we broke the datapath down into separate modules so that we could divide work among team members and test the functionality of each module individually. Testing each module individually was extremely important because it allowed us to catch many errors in a controlled environment before they became lost in the traffic of the entire system. Modularizing the code also lets team members take responsibility for, and specialize in, specific areas of the code, which produces more efficient code than a system that is not modularized. A modular system also adapts more easily to change: if new functionality is needed, either a new module is added or the logic within an existing module is modified to accommodate the added requirement. Although modularity is important, the original design must be good enough to allow functionality to be modularized in the first place.

Once the system has been designed and implemented in pieces, it is only a matter of combining the pieces to make the entire CPU. This is easy in theory, but every project has unforeseen mistakes and logic errors. Thankfully, because the system was designed well to begin with, there was room for modifications to correct the mistakes of the early implementation. After an intensive debug period, the system was complete. At this point we were able to fully document the project and consider features to add or remove as well as other design changes.

5. Design

As mentioned earlier, the original design was roughly based on Figures 2 and 3 of Appendix D. Also as mentioned above, each of the core datapath functions was implemented in a separate module. The project code can be seen in Appendix A. Below is a top level view of the final datapath implementation. (The additional logic of each figure is indicated in red.)

Top Level Design - Final Implementation


[Figure: RTL schematic of the top level datapath. Recoverable block labels: InstructionDecode:IDStage, DataMemory:data, MulticycleControlFSM:mainControl, ALUControlMulti:alucontrol, ALUMulticycle:ALU, regFileRTL:RTL, sign_extend:extendImmediate, zero_extend:extender, twomux32:registerAmux, FiveToOneMux32:registerBmux, twomux32:writedata, twomux5:writereg, and the registers pc[31..0], cycle[31..0], ALUOut[31..0], register_A[31..0], register_B[31..0], and memDataReg[31..0].]

*To examine details of design please use the zoom feature of your PDF viewer

Datapath Control - Final Implementation


[Figure: RTL schematic of MulticycleControlFSM. Recoverable labels: the state machine with states 0000 and B through M; opcode comparators against 6'h00, 6'h02, 6'h04, 6'h08, 6'h0A, 6'h0C, 6'h0D, 6'h0F, 6'h23, and 6'h2B; and the registered outputs RegWrite, PCWrite, PCWriteCondition, PCSource[1..0], MemRead, MemWrite, MemtoReg, IorD, IRWrite, RegDst, ALUSrcA, ALUSrcB[2..0], and forceAdd.]


ALU - Final Implementation


[Figure: RTL schematic of ALUMulticycle. Recoverable labels: inputs valueA[31..0], valueB[31..0], and aluctrl[3..0]; adders (Add0, Add1); a LESS_THAN comparator (LessThan0); a bank of 16-input multiplexers (Mux0 through Mux32) selected by aluctrl; and an EQUAL comparator (Equal0) driving the zero output.]


ALU Control - Final Implementation


[Figure: RTL schematic of ALUControlMulti. Recoverable labels: inputs opcode[5..0], funct[5..0], and forceadd; an opcode decoder (Decoder0); funct comparators against 6'h01, 6'h20, 6'h22, 6'h24, 6'h25, and 6'h2A; selector blocks; and the output ALUIn[3..0].]


6. Testing Methodology

Our testing methodology stems directly from our design methodology. In the design we split the important system functions into separate modules so that we could debug them individually and assign responsibility for them. Each module can then be tested on its own, eliminating possible interference from other modules. Once each module has been individually tested and is working, the system can be assembled from the smaller modules. At that point it is a matter of working out system integration issues and finding any bugs that were missed in the first stage. Once the system was completely integrated, we decided that the best way to test it as a whole was to write a program that would exercise the functionality of the entire system. With this test program we confirmed that our datapath correctly calculates the nth number of the Fibonacci sequence.
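Module-level testing of the kind described here is often written as a self-checking testbench, which compares outputs against expected values instead of relying on inspection of waveforms. The sketch below shows the idea on a 2-to-1 multiplexer. Both modules are hypothetical stand-ins written for this illustration: `mux2_sketch` only assumes the same select convention as the twomux32 instances in Appendix A (select high chooses the first data input), and it is not the project's code.

```verilog
// Hypothetical 2-to-1 mux, used only to illustrate a self-checking testbench.
module mux2_sketch(
    input  wire        sel,
    input  wire [31:0] x1,   // selected when sel = 1
    input  wire [31:0] x0,   // selected when sel = 0
    output wire [31:0] x
);
    assign x = sel ? x1 : x0;
endmodule

module mux2_sketch_tb;
    reg         sel;
    reg  [31:0] x1, x0;
    wire [31:0] x;
    integer     errors;

    mux2_sketch dut(sel, x1, x0, x);

    // Compare the mux output with the expected value and count mismatches.
    task check;
        input [31:0] expected;
        begin
            #1; // let the combinational output settle
            if (x !== expected) begin
                errors = errors + 1;
                $display("FAIL: sel=%b x=%h expected=%h", sel, x, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        x1 = 32'hAAAA_AAAA;
        x0 = 32'h5555_5555;
        sel = 1'b0; check(32'h5555_5555);
        sel = 1'b1; check(32'hAAAA_AAAA);
        if (errors == 0) $display("PASS");
        $finish;
    end
endmodule
```

The same pattern scales to larger modules: drive the inputs, wait for the outputs to settle, and compare with `!==` so that unknown (x) values also count as failures.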

7. Conclusion

Our Computer Engineering 305 project drew on an accumulation of material from Cpre305 and previous courses. The knowledge we needed to complete this project included an understanding of multicycle CPUs, datapaths, control units, finite state machines, digital logic, and Verilog. With this knowledge, we were able to build individual logic modules and integrate those modules to create our multicycle processor. The processor is capable of supporting fifteen MIPS instructions. In the process of building the CPU, we added logic to the design presented in the textbook by Patterson and Hennessy to fully support our multicycle design.

8. Lessons Learned

Save often. ModelSim has a bad habit of crashing in the lab. The more often you save, the less work will be lost after a program or computer crash.

Make backups. If all else fails, you have a backup.

Use comments. When working with others, comments allow them to understand your code. The fewer comments provided, the harder it may be for someone to understand your code in the future.

Create block schematics. Block schematics help in understanding the big picture. If the block diagram generated from the Verilog code does not look correct, examining it can clarify the high level design and help uncover mistakes in the code.

Appendix A Verilog Code & Testbench

//MultiCycle is our multicycle cpu
module MultiCycle(cycle, pc, clock, alu_out, mem_out, regdst, memread, memwrite,
                  regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite,
                  pcsource, aluscra, alusrcb);
    // input/output
    input clock;
    output [31:0] cycle, alu_out, mem_out, pc;
    output regdst, memread, memwrite, regwrite, memtoreg;
    output zero;
    output pcwritecond, pcwrite, iord, irwrite;
    output aluscra;
    output [1:0] pcsource;
    output [2:0] alusrcb;

    // for debug
    reg [31:0] cycle = 0;

    always @(posedge clock) begin
        cycle = cycle + 1;
    end

    // control variables
    wire regdst, memread, memwrite, regwrite, memtoreg;
    wire pcwritecond, pcwrite, iord, irwrite, aluscra, zero;
    wire [1:0] pcsource;
    wire [2:0] alusrcb;
    wire [31:0] jumpaddress, alu_out, mem_out;
    wire [31:0] branchCondition;

    wire [3:0] aluCtrl;

    // other variables
    reg [31:0] pc = 32'b0;
    reg [31:0] ALUOut;
    reg [31:0] register_A, register_B;

    wire [31:0] memAddress;
    // Decode control signals
    wire [5:0] opCode;
    wire [4:0] regToWrite;

    //Instruction decode variables
    wire [4:0] rs, rt, rd;
    wire [15:0] immediatevalue;
    wire [4:0] shamt;
    wire [5:0] funct;
    wire [25:0] address;

    reg [31:0] memDataReg;
    wire [31:0] regA, regB;
    wire [31:0] regWriteData;
    wire [31:0] imm_value;
    wire [31:0] valueA, valueB;
    wire forceadd;

    assign memAddress = iord ? ALUOut : pc;

    //Data Memory module holds both data and instructions
    DataMemory data(memwrite, memread, memAddress[5:0], register_B, mem_out);

    //Instruction decode decodes instructions and puts values into appropriate wires
    InstructionDecode IDStage(clock, mem_out, opCode, rs, rt, rd, shamt, funct, immediatevalue, address);

    //Microcode control FSM, controls the multicycle cpu
    MulticycleControlFSM mainControl(opCode, clock, aluscra, iord, alusrcb, pcsource, regdst, memtoreg,
                                     memread, pcwritecond, pcwrite, memwrite, irwrite, regwrite, forceadd);

    //MemoryDataRegister holds data from memory that may be written into register
    always @(posedge clock)
        memDataReg = mem_out;

    //Chooses appropriate write register depending on the control
    twomux5 writereg(regdst, rd, rt, regToWrite);
    //Chooses appropriate data to write depending on the control
    twomux32 writedata(memtoreg, memDataReg, ALUOut, regWriteData);
    regFileRTL RTL(clock, regwrite, regWriteData, regToWrite, rs, rt, regA, regB);

    //Registers hold value until positive edge of clock, when they are updated
    always @(posedge clock) begin
        register_A = regA;
        register_B = regB;
    end

    //sign extend the immediate value
    sign_extend extendImmediate(clock, immediatevalue, imm_value);
    //zero extend the immediate value
    wire [31:0] zeroextendvalue;
    zero_extend extender(clock, immediatevalue, zeroextendvalue);

    twomux32 registerAmux(aluscra, register_A, pc, valueA);
    FiveToOneMux32 registerBmux(alusrcb, zeroextendvalue, imm_value<<2, imm_value, 4, register_B, valueB);

    // ALU control selects the operation of the alu
    ALUControlMulti alucontrol(funct, opCode, forceadd, aluCtrl);

    //Main ALU
    ALUMulticycle ALU(aluCtrl, valueA, valueB, alu_out, zero);

    //temp ALU out register holds value from alu until updated on posedge clock
    always @(posedge clock) begin
        ALUOut = alu_out;
    end

    JumpAddress jumpTo(pc, address, jumpaddress);
    //Mux chooses next data to pc depending on control
    ThreeToOneMux32 branchesAndJumps(pcsource, jumpaddress, ALUOut, alu_out, branchCondition);
    wire brachwritecond, gotoNextPc;
    assign brachwritecond = pcwritecond & zero;
    assign gotoNextPc = pcwrite | brachwritecond;

    //PC update
    always @(posedge clock) begin
        if (gotoNextPc)
            pc = branchCondition;
    end

endmodule // END: MultiCycle

//The Testbench for our multicycle cpu
module AMultiCycleTest;
    reg clock;
    wire [31:0] cycle, alu_out, mem_out, pc;
    wire regdst, memread, memwrite, regwrite, memtoreg;
    wire zero;
    wire pcwritecond, pcwrite, iord, aluscra, irwrite;
    wire [1:0] pcsource;
    wire [2:0] alusrcb;

    initial begin
        clock = 1'b0;
    end

    always begin
        #15 clock = ~clock;
    end

    MultiCycle testcpu(cycle, pc, clock, alu_out, mem_out, regdst, memread, memwrite,
                       regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite,
                       pcsource, aluscra, alusrcb);
endmodule // END: AMultiCycleTest

//Control for multicycle cpu
module MulticycleControlFSM(opcode, clk, ALUSrcA, IorD, ALUSrcB, PCSource, RegDst, MemtoReg,
                            MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite, forceAdd);
    input [5:0] opcode;
    input clk;
    output ALUSrcA, IorD, RegDst, MemtoReg;
    output [2:0] ALUSrcB; // selects one of the five ALU B inputs
    output [1:0] PCSource;
    output MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite, forceAdd;
    reg MemRead, PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite;
    reg ALUSrcA, IorD, RegDst, MemtoReg, forceAdd;
    reg [2:0] ALUSrcB;
    reg [1:0] PCSource;
    reg [3:0] current_state, next_state;
    reg [3:0] debug;
    parameter A=4'b0000, B=4'b0001, C=4'b0010, D=4'b0011, E=4'b0100, F=4'b0101, G=4'b0110,
              H=4'b0111, I=4'b1000, J=4'b1001, K=4'b1010, L=4'b1011, M=4'b1100;
    //parameter A=0, B=1, C=2, D=3, E=4, F=5, G=6, H=7, I=8, J=9; K=10;L=11,M=12;

    //forceAdd 1=add (only in states A, B)
    initial begin
        current_state = 4'b0000;
        next_state = 4'b0000;
    end

    always @(posedge clk) begin
        current_state = next_state;
    end

    always @(posedge clk or opcode) begin
        case (current_state)

            A: begin
                debug = 4'b0000; //added recently
                MemRead = 1; ALUSrcA = 0; IorD = 1'b0; IRWrite = 1;
                ALUSrcB = 3'b001; PCWrite = 1; PCSource = 2'b00;
                next_state = B;
                RegDst = 0; MemtoReg = 0; PCWriteCondition = 0; MemWrite = 0; RegWrite = 0;
                forceAdd = 1;
            end

            B: begin
                debug = 4'b0001;
                ALUSrcA = 0; ALUSrcB = 3'b011;
                IorD = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; MemRead = 0;
                PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
                forceAdd = 1;

                //if lw or sw nextstate = C
                //if(opcode==35 || opcode==43)
                if (opcode == 6'b100011 || opcode == 6'b101011) begin
                    next_state = C;
                end
                //if r type nextstate = G
                //if(opcode==0)
                else if (opcode == 6'b000000) begin
                    next_state = G;
                end
                //if beq nextstate = I
                //if(opcode==4)
                else if (opcode == 6'b000100) begin
                    next_state = I;
                end
                //if j nextstate = J
                //if(opcode==2)
                else if (opcode == 6'b000010) begin
                    next_state = J;
                end
                //IType instruction, treat as R-Type
                //because ALU control will take care of proper execution
                else if (opcode == 6'b001000 || //addI
                         opcode == 6'b001010)   //sltI
                begin
                    next_state = K; //sign extended immediate state
                end
                else if (opcode == 6'b001101 || //orI
                         opcode == 6'b001100 || //andI
                         opcode == 6'b001111)   //xorI
                begin
                    next_state = M; //zero extended immediate state
                end
                else
                    debug = 4'b1111;
            end

            C: begin

                debug = 4'b0010;
                ALUSrcA = 1; ALUSrcB = 3'b010;
                IorD = 0; PCSource = 0;
                RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
                forceAdd = 0;

                //if lw nextstate = D or sw nextstate = F
                //if(opcode==35 || opcode==43)
                if (opcode == 6'b100011) begin
                    next_state = D;
                end
                else if (opcode == 6'b101011)
                    next_state = F;
                else
                    debug = 4'b1111;
            end

            D: begin
                debug = 4'b0011;
                MemRead = 1; IorD = 1;
                ALUSrcA = 0;
                ALUSrcB = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
                next_state = E;
                forceAdd = 0;
            end

            E: begin
                debug = 4'b0100;
                RegDst = 1'b0; RegWrite = 1; MemtoReg = 1'b1;
                next_state = A;
                ALUSrcA = 0; IorD = 0; ALUSrcB = 0; PCSource = 0; MemRead = 0; PCWriteCondition = 0;
                PCWrite = 0; MemWrite = 0; IRWrite = 0;
                forceAdd = 0;
            end

            F: begin
                debug = 4'b0101;
                MemWrite = 1'b1; IorD = 1'b1;
                next_state = A;
                ALUSrcA = 0; ALUSrcB = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; IRWrite = 0; RegWrite = 0;
                forceAdd = 0;
            end

            G: begin
                debug = 4'b0110;
                ALUSrcA = 1; ALUSrcB = 3'b000;
                next_state = H;
                IorD = 0; PCSource = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;
                forceAdd = 0;
            end

H:begin //For RType or IType, if not RType, it is IType //if IType regDst = 0 debug = 4'b0111;

RegDst=1'b1;

RegWrite = 1; MemtoReg=1'b0;

next_state=A;

ALUSrcA = 0; IorD = 0; ALUSrcB = 0; PCSource = 0; MemRead = 0; PCWriteCondition = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0;

forceAdd=0;

end

I:begin

debug = 4'b1000;

ALUSrcA=1; ALUSrcB=3'b000; PCWriteCondition = 1; PCSource=2'b01;

next_state=A;

IorD = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWrite = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;

forceAdd=0;

end

J:begin

debug = 4'b1001; PCWrite = 1; PCSource=2'b10; next_state=A;

ALUSrcA = 0; IorD = 0; ALUSrcB = 0; RegDst = 0; MemtoReg = 0; MemRead = 0; PCWriteCondition = 0; MemWrite = 0; IRWrite = 0; RegWrite = 0;

forceAdd=0; end K:begin

debug = 4'b1010;

ALUSrcA = 1; ALUSrcB = 3'b010;

MemtoReg = 0; IorD = 0; RegDst = 0; MemRead = 0; PCWriteCondition = 0; PCWrite=0; MemWrite = 0; IRWrite = 0; RegWrite = 0; PCSource=2'b00;

next_state=L;

forceAdd=0; end L: begin

debug = 4'b1011; RegDst = 0; RegWrite = 1;

MemtoReg = 0;

IorD = 0; MemRead = 0; PCWriteCondition = 0; PCWrite=0; MemWrite = 0; IRWrite = 0; ALUSrcA = 0; ALUSrcB = 3'b000; PCSource=2'b00;

next_state=A;

forceAdd=0; end M: begin debug = 4'b1100; ALUSrcA = 1; ALUSrcB = 3'b100;

MemtoReg = 0; IorD = 0; RegDst = 0; MemRead = 0; PCWriteCondition = 0; PCWrite=0; MemWrite = 0; IRWrite = 0; RegWrite = 0;

PCSource=2'b00;

next_state=L;

forceAdd=0; end endcase end

endmodule//END: MulticycleControlFSM

//Testbench for control
module testbenchMulticycleControlFSM;
reg [5:0]op;
reg clock=0;
wire ALUSrcA,IorD,RegDst,MemtoReg;
wire [2:0]ALUSrcB; //3 bits wide: the FSM assigns 3-bit values such as 3'b100
wire [1:0]PCSource;
wire MemRead,PCWriteCondition,PCWrite,MemWrite,IRWrite,RegWrite;

always begin #2 clock=~clock; end

initial begin
    op=6'b000000;//add 1
    #10 op=6'b001000;//addi 9
    #10 op=6'b000000;//Sub 2
    #10 op=6'b000100;//branch 10
    #10 op=6'b000000;//And 3
    #10 op=6'b000010;//j 15
    #10 op=6'b000000;//Or 4
    #10 op=6'b100011;//LW 16
    #10 op=6'b000000;//Xor 5
    #30 op=6'b101011;//SW 17
    #10 op=6'b000000;//Slt 6
    #10 op=6'b001101;//OrI 11
    #10 op=6'b000000;//Mult 7
    #10 op=6'b001100;//AndI 12
    #10 op=6'b000000;//Div 8
    #10 op=6'b001111;//XorI 13
    #10 op=6'b001010;//SltI 14
end

MulticycleControlFSM test(op,clock,ALUSrcA,IorD,ALUSrcB,PCSource,RegDst,MemtoReg,
    MemRead,PCWriteCondition, PCWrite, MemWrite, IRWrite, RegWrite);

endmodule// END: testbenchMulticycleControlFSM

//Total Lines: 186

module ALUMulticycle(aluctrl, valueA, valueB,result,zero);
input [3:0] aluctrl;
input [31:0] valueA;
input [31:0] valueB;
output [31:0] result;
reg [31:0] result;
output zero;
reg zero;

always@(aluctrl or valueA or valueB)
begin
    case(aluctrl)
        4'b0000://Bitwise And
        begin result = valueA & valueB; end
        4'b0001://Bitwise Or
        begin result = valueA | valueB; end
        4'b0010://Add
        begin result = valueA + valueB; end
        4'b0101://Xor
        begin result = valueA ^ valueB; end
        4'b0110://Sub
        begin result = valueA - valueB; end
        4'b0111://Slt
        begin result = valueA < valueB ? 1:0; end
    endcase

    if(valueA==valueB)
    begin zero=1'b1; end
    else
    begin zero=1'b0; end
end
endmodule

module testALUMultiCycle;
reg [3:0] aluctrl;
reg [31:0] valueA;
reg [31:0] valueB;
wire [31:0] result;
wire zero;

initial begin
    //AND
    aluctrl = 4'b0000; valueA = 0; valueB = 4294967295;
    $monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0000; valueA = 4294967295; valueB = 4294967295;
    $monitor("AND -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);

    //OR
    #5 aluctrl = 4'b0001; valueA = 4294967295; valueB = 4294967295;
    $monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0001; valueA = 0; valueB = 0;
    $monitor("OR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);

    //Add
    #5 aluctrl = 4'b0010; valueA = 5; valueB = 5;
    $monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0010; valueA = 0; valueB = 4294967295;
    $monitor("ADD -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);

    //XOR
    #5 aluctrl = 4'b0101; valueA = 0; valueB = 1;
    $monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0101; valueA = 4294967295; valueB = 0;
    $monitor("XOR -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);

    //Subtract
    #5 aluctrl = 4'b0110; valueA = 5; valueB = 4;
    $monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0110; valueA = 4294967295; valueB = 0;
    $monitor("SUBTRACT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);

    //SLT
    #5 aluctrl = 4'b0111; valueA = 5; valueB = 4;
    $monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
    #5 aluctrl = 4'b0111; valueA = 0; valueB = 4294967295;
    $monitor("SLT -> aluctrl: %b | valueA: %b (%d) | valueB: %b (%d) | Result= %b (%d) | Zero = %b",
        aluctrl,valueA,valueA,valueB,valueB,result,result,zero);
end

ALUMulticycle test(aluctrl, valueA, valueB,result,zero);

endmodule

//ALU control
module ALUControlMulti(funct, opcode,forceadd, ALUIn);
input [5:0]funct;
input [5:0] opcode;
input forceadd;
output [3:0]ALUIn;
reg [3:0]ALUIn;

always@(funct or opcode or forceadd)
begin
    if(forceadd==1)
    begin
        ALUIn = 4'b0010;
    end
    else
    begin
        //begin case
        case(opcode)
            //R-Type
            6'b000000:
            begin
                //And
                if(funct==6'b100100)
                begin ALUIn = 4'b0000; end
                //Or
                else if(funct==6'b100101)
                begin ALUIn = 4'b0001; end
                //Add
                else if(funct==6'b100000)
                begin ALUIn = 4'b0010; end
                //Xor
                else if(funct == 6'b000001)
                begin ALUIn = 4'b0101; end
                //Sub
                else if(funct==6'b100010)
                begin ALUIn = 4'b0110; end
                //Slt
                else if(funct==6'b101010)
                begin ALUIn = 4'b0111; end
            end//end R-type

            //Begin I-Type
            //AndI
            6'b001100://C
            begin ALUIn = 4'b0000; end
            //OrI
            6'b001101://D
            begin ALUIn = 4'b0001; end
            //XorI
            6'b001111://F
            begin ALUIn = 4'b0101; end
            //SltI
            6'b001010://A
            begin ALUIn = 4'b0111; end
            //AddI
            6'b001000://8
            begin ALUIn = 4'b0010; end
            //Branch
            6'b000100://4
            begin ALUIn = 4'b0010; end
            //LW
            6'b100011:
            begin ALUIn = 4'b0010; end
            //SW
            6'b101011:
            begin ALUIn = 4'b0010; end
            //End I-Type

            //jump
            6'b000010:
            begin ALUIn=4'b0010; end
        endcase //endcase
    end//end else
end//end always
endmodule

//ALU control testbench
module testALUControlMulti;
reg clock;
reg [5:0]funct;
reg [5:0]op;
reg forceadd; //added: ALUControlMulti has four ports, so forceadd must be driven
wire [3:0]ALUIn;

initial begin
    $monitor(" Time=%d,\top=%d,\t funct=%d,\t ALUIn=%d", $time, op,funct, ALUIn);
end

initial begin
    forceadd=0; //let opcode and funct select the ALU operation
    op=6'b000000;funct=6'b100000;//add 1
    #20 op=6'b000000;funct=6'b100010;//Sub 2
    #20 op=6'b000000;funct=6'b100100;//And 3
    #20 op=6'b000000;funct=6'b100101;//Or 4
    #20 op=6'b000000;funct=6'b000001;//Xor 5
    #20 op=6'b000000;funct=6'b101010;//Slt 6
    #20 op=6'b000000;funct=6'b011000;//Mult 7
    #20 op=6'b000000;funct=6'b011010;//Div 8
    #20 op=6'b001000;funct=6'b010100;//addi 9
    #20 op=6'b000100;funct=6'b000110;//branch (I) 10 ///???
    #20 op=6'b001101;funct=6'bx;//OrI 11
    #20 op=6'b001100;funct=6'bx;//AndI 12
    #20 op=6'b001111;funct=6'bx;//XorI 13
    #20 op=6'b001010;funct=6'bx;//SltI 14
    #20 op=6'b000010;funct=6'bx;//j 15
    #20 op=6'b100011;funct=6'bx;//LW 16
    #20 op=6'b101011;funct=6'bx;//SW 17
    #20 $stop;
end

ALUControlMulti aluctrltest(funct, op, forceadd, ALUIn);
endmodule

//Data Memory module
module DataMemory( memWrite,memRead,Address, writeData,readData);
input memWrite, memRead;
input [5:0] Address;
input [31:0] writeData;
output [31:0] readData;
reg [31:0] readData;
reg [31:0]dataMemory[1024:0];

initial begin
    dataMemory[0] = 32'b00100000000101010000000000010100;//N=20
    dataMemory[4] = 32'b00000000000000001011100000100000;
    dataMemory[8] = 32'b00010010101000000000000000000110;
    dataMemory[12] = 32'b00100000000101100000000000000001;
    dataMemory[16] = 32'b00000010111101101011100000100000;
    dataMemory[20] = 32'b00000010111101101011000000100010;
    dataMemory[24] = 32'b00100010101101011111111111111111;
    dataMemory[28] = 32'b00010010101000000000000000000001;
    dataMemory[32] = 32'b00010000000000001111111111111011;
    dataMemory[36] = 32'b10101100000101110000000000000001;
end

always@(memWrite or memRead or Address)
begin
    if(memWrite == 1'b1)
    begin
        dataMemory[Address]=writeData;
    end
    if(memRead ==1'b1)
    begin
        readData=dataMemory[Address];
    end
end
endmodule//END: DataMemory

//Data Memory testbench
module ADataMemTest;
reg memWrite,memRead;
reg [5:0] Address;
reg [31:0] writeData;
wire [31:0] readData;

initial begin
    //format string now has a Time field to match the $time argument
    $monitor(" Time=%d, memWrite=%d, memRead=%d, Address=%d, writeData=%d,readData=%d ",
        $time,memWrite,memRead,Address, writeData,readData);
end

initial begin
    memRead=1;
    #20 Address=4;
    #20 memRead=0;
    #20 Address=1;
    #20 memWrite=1;
    #20 writeData=32'b1;
    #20 $stop;
end

DataMemory testMem(memWrite,memRead,Address, writeData,readData);
endmodule

//Register File
module regFileRTL(clock,regWrite,inData,wrReg,readA, readB,regA,regB);
input clock;
input regWrite;
input [31:0] inData;
input [4:0] wrReg;
input [4:0] readA;
input [4:0] readB;
output [31:0] regA;
output [31:0] regB;
reg [31:0] registerFiles[31:0];

initial begin
    registerFiles[5'b00000] = 32'b0;
end

always@(posedge clock)
begin
    //register 0 is never written, so it always reads as zero
    if(regWrite && ( wrReg != 5'b00000))
    begin
        registerFiles[wrReg] = inData;
    end
end

assign regA = registerFiles[readA];
assign regB = registerFiles[readB];

endmodule//END: regFileRTL

//InstructionDecode decode the instruction
module InstructionDecode(clock,instruction, opcode, rs,rt,rd,shamt,funct, immediate, address);
input clock;
input [31:0] instruction;
output [5:0] opcode, funct;
output [4:0] rs, rt,rd,shamt;
output [15:0] immediate;
output [25:0] address;

reg [5:0] opcode, funct;
reg [4:0] rs, rt,rd,shamt;
reg [15:0] immediate;
reg [25:0] address;

//Note: these are procedural continuous assignments (assign inside always).
//Once executed on the first clock edge, the outputs keep tracking instruction.
always@(posedge clock)
begin
    assign opcode = instruction[31:26];
    assign rs = instruction[25:21];
    assign rt = instruction[20:16];
    assign rd = instruction[15:11];
    assign shamt = instruction[10:6];
    assign funct = instruction[5:0];
    assign immediate = instruction[15:0];
    assign address = instruction[25:0];
end
endmodule// END: InstructionDecode

//Instruction decode testbench
module AInstrTest;
reg clock;
reg [31:0] instr;
wire [5:0] opcode, funct;
wire [4:0] rs, rt,rd,shamt;
wire [15:0] immediate;
wire [25:0] address;

initial begin
    //rt is compared against instr[20:16], the field the decoder actually uses
    $monitor("Time=%d, instOp=%d,%d,instRs=%d,%d,instRt=%d,%d,instRd=%d,%d,instShT=%d,%d,instFt=%d,%d,instImm=%d,%d,instAdd=%d;%d",
        $time,instr[31:26], opcode,instr[25:21], rs,instr[20:16],rt,instr[15:11],rd,
        instr[10:6],shamt,instr[5:0],funct,instr[15:0], immediate,instr[25:0], address);
    clock=0;
end

always #2 clock= ~clock;

initial begin
    instr = 32'b00100000000101010000000000010001;
    #20 instr = 32'b00000000000000001011100000100000;
    #20 instr = 32'b00010010101000000000000000000110;
    #20 instr = 32'b00100000000101100000000000000001;
    #20 instr = 32'b00000010111101101011100000100000;
    #20 instr = 32'b00000010111101101011000000100010;
    #20 instr = 32'b00100010101101011111111111111111;
    #20 instr = 32'b00010010101000000000000000000001;
    #20 instr = 32'b00010000000000001111111111111011;
    #20 instr = 32'b10101100000101110000000000000001;
    #20 $stop;
end

InstructionDecode testdecode(clock,instr, opcode, rs,rt,rd,shamt,funct, immediate, address);
endmodule//END:AInstrTest

//Sign extension module
module sign_extend(clock,value,signvalue);
input clock;
input [15:0] value;
output [31:0] signvalue;
reg [31:0] signvalue;

always@(posedge clock)
begin
    signvalue[31:16] = 16'b0000000000000000;
    if(value[15] ==1'b1)
    begin
        signvalue[31:16] = 16'b1111111111111111;
    end
    signvalue[15:0] = value;
end
endmodule//END: sign_extend

//Sign Extend test bench
module testSignExtend;
reg clock;
reg [15:0] value;
wire [31:0] newvalue;

initial begin
    //%b on the whole vector prints the same bit string as one %b per bit
    $monitor(" Time=%d, value=%b, signvalue=%b", $time, value, newvalue);
    clock=0;
end

always #2 clock= ~clock;

initial begin
    value = 0;
    #20 value = 1;
    #20 value = 2;
    #20 value = 3;
    #20 value = 4;
    #20 value = 5;
    #20 value = 20;
    #20 value = 40;
    #20 value = 500;
    #20 value = 10000;
    #20 value = 16'b1000000000000000;
    #20 value = 16'b1000000000000001;
    #20 value = 16'b0111111111111111;
    #20 value = 16'b1010101010101010;
    #20 value = 16'b1111111111111111;
    #20 value = 16'b1111111111111110;
    #20 $stop;
end

sign_extend testsign(clock,value,newvalue);
endmodule

//Zero extension module
module zero_extend(clock,value, zerovalue);
input clock;
input [15:0] value;
output [31:0] zerovalue;
reg [31:0] zerovalue;

always@(posedge clock)
begin
    zerovalue[31:16] = 16'b0000000000000000;
    zerovalue[15:0] = value;
end
endmodule

//Zero extension testbench
module testZeroExtend;
reg [15:0] value;
reg clock;
wire [31:0] newvalue;

initial begin
    //%b on the whole vector prints the same bit string as one %b per bit
    $monitor(" Time=%d, value=%b, zerovalue=%b", $time, value, newvalue);
    clock=0;
end

always #2 clock= ~clock;

initial begin
    value = 0;
    #20 value = 1;
    #20 value = 2;
    #20 value = 3;
    #20 value = 4;
    #20 value = 5;
    #20 value = 20;
    #20 value = 40;
    #20 value = 500;
    #20 value = 10000;
    #20 value = 16'b1000000000000000;
    #20 value = 16'b1000000000000001;
    #20 value = 16'b0111111111111111;
    #20 value = 16'b1010101010101010;
    #20 value = 16'b1111111111111111;
    #20 value = 16'b1111111111111110;
    #20 $stop;
end

zero_extend testzero(clock,value,newvalue);
endmodule

//JumpAddress
module JumpAddress(pc,address,newAddress);
input [31:0] pc;
input [25:0] address;
output [31:0] newAddress;
reg [31:0] newAddress;

always@(pc or address)
begin
    //upper four bits come from the PC, the rest is the word address times 4
    newAddress[31:28] = pc[31:28];
    newAddress[27:0] = (address <<2);
end
endmodule//END: JumpAddress

//JumpAddress testbench
module AJumpAddressTest;
reg [31:0] pc;
reg [25:0] addr;
wire [31:0] newAddr;
integer x,y;

initial begin
    $monitor(" Time=%d, pc=%d, addr=%d, newAddr=%d", $time,pc,addr,newAddr);
end

initial begin
    x=0;
    y=0;
    addr = 32'b0;
    pc = 32'b00010000000000000000000000000000;
    for(x = 0; x < 32; x=x+1)
    begin
        #10 addr=x;
    end
    pc = 32'b00110000000000000000000000000000;
    addr= 32'b11110000000000000000000000000000;
    for(y = 0; y < 32; y=y+1)
    begin
        #10 addr=y;
    end
    #20 $stop;
end

JumpAddress jumptest(pc,addr,newAddr);
endmodule

//Twomux5 has a datapath of 5 bits wide and a choice of two elements
module twomux5(a,x1,x0,x);
input a;
input [4:0] x1,x0;
output [4:0]x;
reg [4:0]x;

always@(a or x1 or x0)
begin
    if(a == 1'b1)
    begin x = x1; end
    else if(a==1'b0)
    begin x = x0; end
end
endmodule//END: twomux5

//Twomux32 has a datapath of 32 bits wide and a choice of two elements
module twomux32(a,x1,x0,x);
input a;
input [31:0] x1,x0;
output [31:0]x;
reg [31:0]x;

always@(a or x1 or x0)
begin
    if(a == 1'b1)
    begin x = x1; end
    else if(a == 1'b0)
    begin x=x0; end
end
endmodule//END: twomux32

//ThreeToOneMux has a datapath of 32 bits wide and a choice of three elements
module ThreeToOneMux32(select,x2,x1,x0,out);
input [1:0] select;
input [31:0] x2,x1,x0;
output [31:0] out;
reg [31:0] out;

//x1 added to the sensitivity list (the list previously named x0 twice)
always@(select or x0 or x1 or x2)
begin
    if(select == 2'b00)
    begin out = x0; end
    if(select == 2'b01)
    begin out = x1; end
    if(select == 2'b10)
    begin out = x2; end
end
endmodule// END:ThreeToOneMux32

//FiveToOneMux32 has a datapath of 32 bits and a choice of five elements
module FiveToOneMux32(select,x4,x3,x2,x1,x0,out);
input [2:0] select;
input [31:0] x4,x3,x2,x1,x0;
output [31:0] out;
reg [31:0] out;

always@(select or x0 or x1 or x2 or x3 or x4)
begin
    if(select == 3'b000)
    begin out = x0; end
    if(select == 3'b001)
    begin out = x1; end
    if(select == 3'b010)
    begin out = x2; end
    if(select == 3'b011)
    begin out = x3; end
    if(select == 3'b100)
    begin out = x4; end
end
endmodule

Appendix B Simulation Results

The following simulation results come from a program we wrote that calculates the nth number of the Fibonacci sequence. In this simulation n was set to 20. The simulation produced 6765 as the 20th Fibonacci number, which is indeed correct.

The assembly code to our program is listed below:

      addi $21,$0,20
      add  $23,$0,$0
      beq  $21,$0,end
      addi $22,$0,1
loop: add  $23,$23,$22
      sub  $22,$23,$22
      addi $21,$21,-1
      beq  $21,$0,end
      beq  $0,$0,loop
end:  sw   $23,1($0)
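The loop above can be cross-checked with a small behavioral sketch. This is not part of the CPU design: the module name fib_check and the integers n, a and b are hypothetical stand-ins for registers $21, $23 and $22, and each statement mirrors one instruction of the assembly program.

```verilog
// Behavioral model of the assembly program (illustrative sketch only).
module fib_check;
  integer n, a, b;            // n ~ $21, a ~ $23, b ~ $22 (names assumed)
  initial begin
    n = 20;                   // addi $21,$0,20
    a = 0;                    // add  $23,$0,$0
    if (n != 0) begin         // beq  $21,$0,end
      b = 1;                  // addi $22,$0,1
      while (n != 0) begin    // loop exits through beq $21,$0,end
        a = a + b;            // add  $23,$23,$22
        b = a - b;            // sub  $22,$23,$22
        n = n - 1;            // addi $21,$21,-1
      end
    end
    $display("F(20) = %0d", a);   // prints F(20) = 6765
  end
endmodule
```

After k passes through the loop, a holds the kth Fibonacci number, so twenty passes leave 6765 in a, the value the program stores with sw.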

To double check our binary math, we assembled our assembly code in the MIPS simulator SPIM.

[0x00400000] 0x20150002 addi $21, $0, 20                      ; 1: addi $21,$0,2
[0x00400004] 0x0000b820 add $23, $0, $0                       ; 2: add $23,$0,$0
[0x00400008] 0x12a00007 beq $21, $0, 28 [end-0x00400008]      ; 3: beq $21,$0,end
[0x0040000c] 0x20160001 addi $22, $0, 1                       ; 4: addi $22, $0,1
[0x00400010] 0x02f6b820 add $23, $23, $22                     ; 6: add $23,$23,$22
[0x00400014] 0x02f6b022 sub $22, $23, $22                     ; 7: sub $22,$23,$22
[0x00400018] 0x22b5ffff addi $21, $21, -1                     ; 8: addi $21,$21,-1
[0x0040001c] 0x12a00002 beq $21, $0, 8 [end-0x0040001c]       ; 9: beq $21,$0,end
[0x00400020] 0x1000fffc beq $0, $0, -16 [loop-0x00400020]     ; 10: beq $0,$0, loop
[0x00400024] 0xac170001 sw $23, 1($0)                         ; 12: sw $23,1($0)

In binary representation our program code is as follows:

dataMemory[0] = 32'b00100000000101010000000000010100;
dataMemory[4] = 32'b00000000000000001011100000100000;
dataMemory[8] = 32'b00010010101000000000000000000110;
dataMemory[12] = 32'b00100000000101100000000000000001;
dataMemory[16] = 32'b00000010111101101011100000100000;
dataMemory[20] = 32'b00000010111101101011000000100010;
dataMemory[24] = 32'b00100010101101011111111111111111;
dataMemory[28] = 32'b00010010101000000000000000000001;
dataMemory[32] = 32'b00010000000000001111111111111011;
dataMemory[36] = 32'b10101100000101110000000000000001;
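The three beq words in this listing can be sanity-checked by decoding their immediates. The sketch below assumes the standard MIPS rule, target = PC + 4 + (sign-extended immediate << 2); the document does not state that this design uses exactly that rule, and the module and function names are hypothetical.

```verilog
// Illustrative check of the branch immediates at addresses 8, 28 and 32.
module branch_target_check;
  // Assumed rule: target = pc + 4 + (sign-extended imm << 2)
  function [31:0] branch_target;
    input [31:0] pc;
    input [15:0] imm;
    reg [31:0] ext;
    begin
      ext = {{16{imm[15]}}, imm};          // sign-extend to 32 bits
      branch_target = pc + 32'd4 + (ext << 2);
    end
  endfunction

  initial begin
    // beq at 8,  immediate 0000000000000110 (6)
    $display("%0d", branch_target(32'd8,  16'h0006));   // 36, the sw (end)
    // beq at 28, immediate 0000000000000001 (1)
    $display("%0d", branch_target(32'd28, 16'h0001));   // 36, the sw (end)
    // beq at 32, immediate 1111111111111011 (-5)
    $display("%0d", branch_target(32'd32, 16'hFFFB));   // 16, the loop label
  end
endmodule
```

Under that assumption both forward branches land on the sw at address 36 and the backward branch lands on the add at address 16, which matches the labels in the assembly program.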

On the following pages are the results of the simulation running the program described above.

[Simulation waveform, page 1 of 2. Entity: AMultiCycleTest. Date: Sun Dec 02 8:30:30 PM Central Standard Time 2007. Signals traced under /AMultiCycleTest/testcpu: clock, cycle, alu_out, mem_out, pc, regdst, memread, memwrite, regwrite, memtoreg, zero, pcwritecond, pcwrite, iord, irwrite, aluscra, pcsource, alusrcb, jumpaddress, branchCondition, aluCtrl, ALUOut, register_A, register_B, memAddress, opCode, regToWrite, rs, rt, rd, immediatevalue, shamt, funct, address, memDataReg, regA, regB, regWriteData, imm_value, valueA, valueB, forceadd, zeroextendvalue, brachwritecond. Time axis: 0 to 12 us.]

[Simulation waveform, page 2 of 2. Signals traced: /AMultiCycleTest/testcpu/gotoNextPc, /AMultiCycleTest/testcpu/RTL/registerFiles[31:0] and /AMultiCycleTest/testcpu/data/dataMemory[1]. Time axis: 0 to 12 us. The register traces step through the Fibonacci values 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765 while the loop counter counts down from 20, and dataMemory[1] ends at 6765.]

Appendix C Commonly Made Verilog Mistakes

When the name of a register or wire is misspelled, for example in a module port connection, ModelSim does not report an error or a warning, because Verilog implicitly declares the unknown name as a new one-bit wire. The code compiles, but the design does not function correctly.
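One common guard against this mistake is the standard `default_nettype none` compiler directive, which turns off implicit wire declarations so that a misspelled name becomes a compile error. The sketch below is illustrative only; the module and_gate_sketch and its signals are hypothetical and not part of the CPU.

```verilog
// With implicit nets disabled, every net must be declared explicitly.
`default_nettype none

module and_gate_sketch(input wire a, input wire b, output wire y);
  wire partial;
  assign partial = a & b;
  // Writing "partail" here would now fail to compile instead of
  // silently creating a new, undriven one-bit wire.
  assign y = partial;
endmodule

// Restore the default so files compiled afterwards are unaffected.
`default_nettype wire
```

With the directive in place, port declarations also need an explicit `wire` keyword, as shown in the port list.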

ModelSim issues a warning, but not an error, when a register is assigned a value with more bits than the register is wide. Only the low-order bits of the value are stored and the upper bits are discarded, usually to the inconvenience of the developer.
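A minimal sketch of the truncation, using a hypothetical 4-bit register named narrow:

```verilog
// Assigning an 8-bit value to a 4-bit register keeps only the low 4 bits.
module truncation_demo;
  reg [3:0] narrow;
  initial begin
    narrow = 8'b1011_0110;      // upper four bits (1011) are discarded
    $display("%b", narrow);     // prints 0110
  end
endmodule
```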

A module should have exactly the same name as the file that holds it. Although this is not a strict rule of Verilog, it is good practice because some programs, such as Quartus II, depend on this naming scheme for some applications.

It is important to pay close attention to the sensitivity list of an always block. If a signal that is read inside the block is missing from the list, the block is not re-evaluated when that signal changes, so its outputs can hold stale values. This is a confusing issue to find when debugging code.
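For combinational blocks, Verilog-2001 offers `always @(*)`, which infers the sensitivity list from the signals the block reads. The sketch below is one way to write a three-input multiplexer in that style; the module name mux3_sketch and the default branch are assumptions made for illustration, not the design used in Appendix A.

```verilog
// Three-to-one multiplexer with an inferred sensitivity list.
module mux3_sketch(
  input  wire [1:0]  select,
  input  wire [31:0] x2, x1, x0,
  output reg  [31:0] out
);
  always @(*) begin             // re-evaluated when any input read below changes
    case (select)
      2'b00:   out = x0;
      2'b01:   out = x1;
      2'b10:   out = x2;
      default: out = 32'b0;     // assumed value for the unused select code
    endcase
  end
endmodule
```

Because the list is inferred, adding a new input to the block later cannot leave the list out of date.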

In Verilog an output must also be declared as a register (reg) if its value is assigned inside a procedural block.

Begin and end statements must be used properly. A begin statement without a matching end statement will cause problems in the code.

To connect the output of one module to another module, a wire must be used. Using a register will cause a compilation error.

Blocking and non-blocking assignment statements behave differently, and misunderstanding the difference can cause problems in the inner workings of Verilog code. This is also a very hard issue to debug.
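The difference between the two assignment types shows up clearly when two registers try to exchange values on a clock edge. The sketch below is illustrative only; the module swap_demo and the registers a, b, c and d are hypothetical.

```verilog
// Non-blocking (<=) versus blocking (=) assignment in a clocked block.
module swap_demo;
  reg clock = 0;
  reg [7:0] a = 8'd1, b = 8'd2;   // updated with non-blocking assignments
  reg [7:0] c = 8'd1, d = 8'd2;   // updated with blocking assignments

  always @(posedge clock) begin
    a <= b;                       // both right-hand sides are sampled first,
    b <= a;                       // so a and b swap
  end

  always @(posedge clock) begin
    c = d;                        // c becomes 2 immediately,
    d = c;                        // so d is assigned the new c and stays 2
  end

  initial begin
    #1 clock = 1;                 // one rising edge
    #1 $display("a=%0d b=%0d c=%0d d=%0d", a, b, c, d);  // a=2 b=1 c=2 d=2
  end
endmodule
```

A widely used convention, offered here as general guidance rather than a rule from this project, is to use non-blocking assignments in clocked blocks and blocking assignments in combinational blocks.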

Appendix D Figures and Diagrams

Figure 1 Control FSM Logic Diagram (Patterson, Hennessy, page 338)

Figure 2 High Level Datapath Design (Patterson, Hennessy, page 320)

Figure 3 High Level Datapath Design with Control Logic (Patterson, Hennessy, page 323)

Appendix E Sources

David A. Patterson, John L. Hennessy. Computer Organization and Design, Revised Printing 3rd Ed. New York: Elsevier, 2007.
