3rd International Conference on Advanced Manufacturing Technology (ICAMT 2004), 11-13 May, Kuala Lumpur, Malaysia

Ismail Saad1, Pukhraj Vaya1, Abu Bakar Abd Rahman2
1Lecturer, 2MSc Student
School of Engineering and Information Technology, University Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia
Tel: +60-8-832-0000 x 3147/3066, Fax: +60-8-832-0348
(e-mail: ismail_s@ums.edu.my, vaya@ums.edu.my, abubakar@seit.ums.edu.my)

Abstract

This paper presents the design and simulation of a behavioral model of a 16-bit RISC processor architecture, based on HDL methodology using Verilog-HDL. The processor system consists of ROM, RAM, I/O and a CPU. The CPU module is merely a shell that instantiates the real processor definition in the cpu_core.v, control.v, datapath.v and alsu.v files. The behavioral model of the control module, which comprises the controller state machine, the Instruction Register (IR) and a group of control signals, is explained in detail. The modeling of the read, write and tri-state buffer operations of the datapath module is also explained in detail. The functionality of the processor design was tested by executing three types of instruction. The results show that Verilog-HDL can be used to improve the design process of a new microprocessor architecture.

Keywords: Verilog-HDL, RISC, Behavioral Model, Register, Microprocessor

1. Introduction

Microprocessors are not limited to personal computers; they are also used in specific fields such as robotics, communications and control systems [1-5]. However, designing a new processor for such applications is very complicated, as it involves millions of transistors on a single chip [6-9]. To improve the design process and thereby minimize error, time and cost, the Verilog Hardware Description Language (Verilog-HDL) can be used to simulate and verify the functionality of the microprocessor components before the real device is fabricated [9-13]. This paper therefore presents the design and simulation of a 16-bit RISC processor based on HDL methodology, using Verilog-HDL on the Synopsys front-end compiler.

2. Processor Architecture

The processor has a multiplexed 16-bit data and address path. Instructions have a variable length: one word for instructions that operate on registers only, and two words for instructions that operate on Register/Memory or Register/Immediate operands.
The 16-bit instruction format consists of a 2-bit Mode field, 1 bit each for Set Condition (set_bit) and Test Condition (test_bit), a 3-bit ALU Function field (ALU_func) and 3 bits each for the Destination Register (Rd), Source-1 Register (Rs1) and Source-2 Register (Rs2). The processor can execute 36 instructions, grouped into two instruction types: Arithmetic/Logical and Load/Store. There are six registers in the processor: three are general purpose, while the other three are dedicated registers, namely PC (Program Counter), IR (Instruction Register) and DR (Direct Register). In addition, a dummy register (always zero) is included in the register file.

3. Verilog-HDL Model for Processor System

The processor system consists of 256 words of ROM (addresses 0-255), 256 words of RAM (addresses 256-511) and I/O consisting of a bank of 16 switches (mapped at address 512) and a bank of 16 LEDs (mapped at address 513). The cpu.v file is merely a shell that simulates the pad ring and instantiates the real processor definition in the following files: cpu_core.v, control.v, datapath.v and alsu.v.

4. Verilog HDL Module Codes
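Before turning to the individual modules, the memory map given in Section 3 can be illustrated with a small address decoder. This is only a sketch: the module name addr_decode, the port names and the full 16-bit address width are assumptions, and only the address ranges come from the text.

```verilog
// Address decoder for the memory map described in Section 3.
// Assumed (not from the paper): module name, port names, 16-bit address.
module addr_decode (
    input  wire [15:0] addr,
    output wire        sel_rom,       // ROM, addresses 0-255
    output wire        sel_ram,       // RAM, addresses 256-511
    output wire        sel_switches,  // bank of 16 switches, address 512
    output wire        sel_leds       // bank of 16 LEDs, address 513
);
    assign sel_rom      = (addr <= 16'd255);
    assign sel_ram      = (addr >= 16'd256) && (addr <= 16'd511);
    assign sel_switches = (addr == 16'd512);
    assign sel_leds     = (addr == 16'd513);
endmodule
```

With this decoding, at most one select line is active for any address, and addresses above 513 select nothing.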

4.1 Cpu_Core Module

This module has a single internal system address/data bus. Because there is only one bus, all data from memory (Data_in) must pass through a tri-state buffer, enabled by the TrisMem control signal, before it reaches the system bus. This module also instantiates the control and datapath modules of the processor.

4.2 Control Module

The two main functions of the control module are to execute operations in the proper sequence by means of the controller state machine, and to generate the control signals that cause each instruction to be executed.
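The control module works from the instruction fields listed in Section 2. As a rough illustration, a 16-bit instruction word could be split into those fields as sketched below. The module name ir_fields and, in particular, the bit positions are assumptions: the text gives the field widths (2 + 1 + 1 + 3 + 3 + 3 + 3 = 16 bits) but not their order within the word, so the order in which Section 2 lists them is used here.

```verilog
// Splits a 16-bit instruction word into the fields named in Section 2.
// ASSUMPTION: the fields are packed from bit 15 down to bit 0 in the order
// in which the text lists them; the real layout may differ.
module ir_fields (
    input  wire [15:0] IR,
    output wire [1:0]  ModeBit,  // 2-bit Mode field
    output wire        setbit,   // Set Condition
    output wire        testbit,  // Test Condition
    output wire [2:0]  ALUfunc,  // ALU Function
    output wire [2:0]  Rd,       // Destination Register
    output wire [2:0]  Rs1,      // Source-1 Register
    output wire [2:0]  Rs2       // Source-2 Register
);
    // One concatenation assigns all seven fields at once.
    assign {ModeBit, setbit, testbit, ALUfunc, Rd, Rs1, Rs2} = IR;
endmodule
```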

The control module consists of a 16-bit Instruction Register (IR), a 1-bit zero flag register, the controller state machine with its memory-cycle sub states, and the different types of generated control signals, as illustrated in Figure 1.

Fig. 1. Processor Control Module Architecture (block diagram showing the controller state and sub_state, the IR fields testbit, setbit, ModeBit, Opcode, ALUfunc, Rd, Rs1 and Rs2, the zero flag register and the generated control signals)

4.2.1 Controller State Machine

The controller state machine has three states, Fetch1 (00), Fetch2 (11) and Execute (01), encoded in Gray code. It also has four memory-cycle sub states: address_setup (00), address_hold (01), data_setup (11) and data_hold (10). Generally, the Fetch1 state alone serves the Register + Register instruction type, which takes 4 clock cycles (1 memory cycle) to execute. The Execute state is added for the Register + Immediate instruction type, which takes 8 clock cycles (2 memory cycles). The Load and Store instruction types, which take the longest to execute, use the Fetch1, Fetch2 and Execute states for 3 memory cycles, or 12 clock cycles. Hence, every instruction completes in at most 12 clock cycles. To distinguish the transition from one state to another, the data_hold sub state of the memory cycle and the 2-bit mode field of the instruction are used. The controller state machine is coded in Verilog with a case statement; in general, the algorithm can be viewed as below.

    always @(positive edge of clock)
    begin
        if (nReset = 0) state => high impedance;
        else begin
            case (state)
                0: if (data_hold & ModeBit = 00) state => Fetch1;
                   else if (data_hold & ModeBit = 01) state => Execute;
                   else if (data_hold & (ModeBit = 10 || 11)) state => Fetch2;
                3: if (data_hold & (ModeBit = 10 || 11)) state => Execute;
                   else if (ModeBit = 10 || 11) state => Fetch2;
                   else if (ModeBit = 01 || 00) state => Fetch1;
                1: if (data_hold) state => Fetch1;
            endcase
        end
    end

4.2.2 Instruction Register and Zero Flag Register

Any instruction stored in the rom.v file is placed on the system bus when the tri-state buffer control signal, TrisMem, goes high. The instruction is then taken from the system bus into the Instruction Register (IR) during the Fetch1 state and the data_setup sub state of the memory cycle, so that the IR is updated once per instruction. This operation occurs only on the positive edge of the clock. To model behaviourally the transfer of an instruction from the system bus to the IR in the control module, an if statement is used as below.

    always @(posedge Clock)
    begin
        if ((state == `Fetch1) && (sub_state == `data_setup))
            IR <= #20 Sysbus;
    end

The control module then decodes the instruction in the IR into the Opcode, ALU function, Destination Register (Rd), Source1 Register (Rs1), Source2 Register (Rs2), ModeBit, Set Bit and Test Bit fields.

There is also a 1-bit zero flag register inside the control unit, designed specifically for the execution of conditional instructions. The Zero signal from the ALU is taken into the zero flag register only on the positive edge of the clock, and only when both of the following conditions are satisfied:

1. The setbit of the instruction is TRUE.
2. The processor is in the Execute state and the data_hold sub state with ModeBit 01, or in the Fetch1 state and the data_setup sub state with ModeBit 00.

The behavioural model of the zero flag register is written with an if statement as below:

    always @(posedge Clock)
    begin
        if ((((state == `Execute && ModeBit == 2'b01) && sub_state == `data_hold) ||
             (state == `Fetch1 && ModeBit == 2'b00 && sub_state == `data_setup)) &&
            setbit == 1'b1)
            zero_flag_reg <= Zero;
    end

4.2.3 Control Signals

There are 4 groups of control signals that must be generated in the control unit, as listed below:

1. The memory control signals (nME, nALE, RnW, nOE and ENB) and the signal that identifies a memory write
2. Tri-state buffer control signals for the system bus
3. Datapath control signals
4. ALU function control signal
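Returning to the controller state machine of Section 4.2.1, the outline given there can be turned into a small module that simulates as written. The sketch below is one possible reading, not the authors' code. It assumes that the sub states advance in the Gray-code order address_setup, address_hold, data_setup, data_hold, that the state changes only at the end of the data_hold sub state, and that reset is synchronous and returns both registers to 0 (Fetch1 and address_setup). The module name ctrl_fsm and the localparam names are also assumptions.

```verilog
// One possible simulatable reading of the controller state machine.
// ASSUMPTIONS: sub-state order, synchronous active-low reset, module name.
module ctrl_fsm (
    input  wire       Clock,
    input  wire       nReset,    // active low; synchronous in this sketch
    input  wire [1:0] ModeBit,   // 00: R+R, 01: R+Immediate, 10/11: Load/Store
    output reg  [1:0] state,
    output reg  [1:0] sub_state
);
    // Encodings taken from Section 4.2.1 (Gray code).
    localparam FETCH1 = 2'b00, EXECUTE = 2'b01, FETCH2 = 2'b11;
    localparam ADDRESS_SETUP = 2'b00, ADDRESS_HOLD = 2'b01,
               DATA_SETUP    = 2'b11, DATA_HOLD    = 2'b10;

    always @(posedge Clock) begin
        if (!nReset) begin
            state     <= FETCH1;
            sub_state <= ADDRESS_SETUP;
        end else begin
            // One memory cycle = four clocks: 00 -> 01 -> 11 -> 10 -> 00.
            case (sub_state)
                ADDRESS_SETUP: sub_state <= ADDRESS_HOLD;
                ADDRESS_HOLD:  sub_state <= DATA_SETUP;
                DATA_SETUP:    sub_state <= DATA_HOLD;
                default:       sub_state <= ADDRESS_SETUP;  // from DATA_HOLD
            endcase

            // The state only changes at the end of a memory cycle.
            if (sub_state == DATA_HOLD) begin
                case (state)
                    FETCH1:
                        case (ModeBit)
                            2'b00:   state <= FETCH1;   // R+R: done in 4 clocks
                            2'b01:   state <= EXECUTE;  // R+Immediate
                            default: state <= FETCH2;   // Load/Store
                        endcase
                    FETCH2:  state <= EXECUTE;
                    default: state <= FETCH1;           // from EXECUTE
                endcase
            end
        end
    end
endmodule
```

Under these assumptions a Register + Register instruction occupies 4 clocks, a Register + Immediate instruction 8 clocks and a Load or Store 12 clocks, which agrees with the cycle counts stated in Section 4.2.1.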

The behavioural model of the first group of control signals is written in Verilog using continuous assignments as follows:

    assign memory_write = (Opcode == `ST) && (state == `Execute);
    assign nME  = (sub_state == `address_setup) || (sub_state == `data_hold);
    assign nALE = (sub_state == `address_setup);
    assign RnW  = (sub_state == `address_setup) || (sub_state == `address_hold) || ~memory_write;
    assign nOE  = (sub_state == `address_setup) || (sub_state == `address_hold) || memory_write;
    assign ENB  = ~nOE;

For the second group of control signals, there are 5 tri-state buffer controls (TrisALU, TrisPC, TrisRs2, TrisRd and nTrisRd) for the datapath unit and only 1 (TrisMem) for memory. The behavioural model of each of these signals is written in Verilog as a continuous assignment statement. The conditions for each of the tri-state buffer control signals for the system bus are as follows:

1. TrisALU: sub state is either address_setup or address_hold, and state is Execute with ModeBit either 11 or 10.
2. TrisPC: sub state is either address_setup or address_hold, and state is Fetch1, Fetch2, or Execute with ModeBit 01.
3. TrisRs2: sub state is either data_setup or data_hold, memory_write is active and ModeBit is 11.
4. TrisRd: sub state is data_setup, state is Execute and ModeBit is 10.
5. nTrisRd: inverse of TrisRd (~TrisRd).
6. TrisMem: sub state is either data_setup or data_hold, and state is Fetch1, Fetch2, or Execute with ModeBit either 10 or 01.

For example, the behavioural model of the TrisMem tri-state buffer control is written as follows:

    assign TrisMem = ((sub_state == `data_setup || sub_state == `data_hold) &&
                      (state == `Fetch1 || state == `Fetch2 ||
                       (state == `Execute && (ModeBit == 2'b10 || ModeBit == 2'b01))));

To model behaviourally the third group of control signals, the datapath unit control signals, two decoders for the read and write control signals are needed. These decoders control the reading of the contents of any one of the 5 registers in the datapath unit (PC, R0, R1, R2 and R3) and the writing of any result or computed data into any one of 4 registers (R1, R2, R3 and PC). Continuous assignment statements are used to code the two decoder signals (En_read_dec and En_wrt_dec), and multiplexors written as continuous assignments select which register in the datapath unit is read or written. The conditions for the En_read_dec and En_wrt_dec control signals are as follows:

1. En_read_dec: state is Fetch1 and ModeBit is 00, or state is Execute and ModeBit is 01, 10 or 11, or state is Fetch2 and ModeBit is either 11 or 10.
2. En_wrt_dec: state is Fetch1, sub state is data_hold and ModeBit is 00; or state is Execute, sub state is data_setup and ModeBit is 10; or state is Execute, sub state is data_hold and ModeBit is 01; and zero_flag_reg is TRUE and testbit is either 1 or 0.

For example, the behavioural model of the read decoder (En_read_dec) is written in Verilog as below:

    assign En_read_dec = ((state == `Fetch1 && ModeBit == 2'b00) ||
                          (state == `Execute && (ModeBit == 2'b01 || ModeBit == 2'b10 || ModeBit == 2'b11)) ||
                          (state == `Fetch2 && (ModeBit == 2'b11 || ModeBit == 2'b10)));

The control signals for reading the source1 register (Rs1) are named ReadR0_1, ReadR1_1, ReadR2_1, ReadR3_1 and ReadPC_1, and those for the source2 register (Rs2) are named ReadR0_2, ReadR1_2, ReadR2_2, ReadR3_2 and ReadPC_2. The control signals for the write operation are named WriteR1, WriteR2, WriteR3 and WritePC. Both the read and the write multiplexors must satisfy the En_read_dec and En_wrt_dec control signals respectively, together with the source1 register (Rs1) and source2 register (Rs2) instruction fields for a read and the destination register (Rd) field for a write. For example, the behavioural models for a read from R0 on the Rs1 path and a write to R1 are given below:

    assign ReadR0_1 = (En_read_dec && Rs1 == 3'b000) ? 1 : 0;
    assign WriteR1  = (En_wrt_dec && Rd == 3'b001) ? 1 : 0;

The other datapath control signals generated in the control unit are Rs2_sel, PC_inc, LoadDR and LoadPC. All of these control signals are coded behaviourally in Verilog as continuous assignment statements. For example, the behavioural model of PC_inc is:

    assign PC_inc = ((sub_state == `address_hold) &&
                     ((state == `Fetch1) || (state == `Fetch2) ||
                      (state == `Execute && ModeBit == 2'b01)));

The control signal for the ALSU function, called Function, is simply the 3-bit ALUfunc field of the instruction, defined with a continuous assignment statement as follows:

    assign Function = ALUfunc;

Finally, an asynchronous reset must be included in the control unit. Procedural "assign" statements override the synchronous action on state, IR and the zero flag register, which is a normal method for describing an asynchronous reset within a synchronous sequential system.
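An asynchronous reset of a clocked register is also commonly described by adding the reset to the sensitivity list of the clocked block, rather than by overriding the register with procedural assign and deassign. The sketch below shows that alternative style for the zero flag register only. It is a swapped-in technique for comparison, not the authors' code; the module name zero_flag and the port load_en, which stands in for the load condition given in Section 4.2.2, are assumptions.

```verilog
// Zero flag register with an asynchronous active-low reset, written in the
// sensitivity-list style instead of procedural assign/deassign.
// ASSUMPTIONS: module name and the load_en port are hypothetical.
module zero_flag (
    input  wire Clock,
    input  wire nReset,        // asynchronous, active low
    input  wire load_en,       // hypothetical: load condition of Section 4.2.2
    input  wire Zero,          // Zero output of the ALSU
    output reg  zero_flag_reg
);
    always @(posedge Clock or negedge nReset) begin
        if (!nReset)
            zero_flag_reg <= 1'b0;   // reset wins regardless of the clock
        else if (load_en)
            zero_flag_reg <= Zero;   // otherwise load on the rising edge
    end
endmodule
```

This form keeps the reset and the clocked behaviour of a register in one block, which is the pattern synthesis tools generally expect for flip-flops with asynchronous reset.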

The asynchronous reset of the control unit is coded as follows:

    always @(nReset)
        if (!nReset) begin
            assign state = 0;
            assign sub_state = 0;
            assign IR = 0;
            assign zero_flag_reg = 0;
        end else begin
            deassign state;
            deassign sub_state;
            deassign IR;
            deassign zero_flag_reg;
        end

4.3 Datapath Module

The following are the main tasks of the datapath unit that have to be modelled behaviourally in Verilog to suit the definition of the designed processor, as shown in Figure 2:

1. To model the read operation for both Rs1 (source1 register) and Rs2 (source2 register) from any one of the 5 available registers (R0, R1, R2, R3, PC).
2. To model the tri-state buffers (TrisALU, TrisPC, TrisRs2, TrisRd) for the system bus.
3. To model the write operation for Rd (destination register) to any one of the 4 available registers (R1, R2, R3, PC).

Fig. 2. Processor Datapath Module Architecture

For the first task, a multiplexor structure built from tri-state continuous assignments is used. All of the previously defined datapath control signals for the read operation are used in this structure. As can be seen from Figure 2, each register read is controlled by a tri-state buffer, so if no register is selected the source path (Rs1 or Rs2) is in the high impedance state. For example, the continuous assignment that drives Rs1 from R2 when the control signal ReadR2_1 is '1', and leaves Rs1 in the high impedance state when ReadR2_1 is '0', is coded in Verilog as follows:

    assign Rs1 = (ReadR2_1) ? R2 : 16'bz;

The same structure applies to all the other registers (R0, R1, R3 and PC), for both Rs1 and Rs2.

For the second task, the same tri-state continuous assignment structure used for the first task is applied. The Verilog code for each of the tri-state buffers in the datapath unit is as follows:

    assign Sysbus = (TrisALU) ? result : 16'bz;
    assign Sysbus = (TrisPC)  ? PC     : 16'bz;
    assign Sysbus = (TrisRs2) ? Rs2    : 16'bz;
    assign Sysbus = (TrisRd)  ? Rd     : 16'bz;
    assign Rd     = (nTrisRd) ? result : 16'bz;

For the third task, a procedural block with if statements is used. A write can only happen on the rising (positive) edge of the clock, hence the procedural block. For a write to the destination register (Rd), all the control signals generated by the control unit for this operation (LoadPC, WriteR1, WriteR2, WriteR3 and LoadDR) are used as conditions inside the if statements. The complete Verilog code for this task is given below:

    always @(posedge Clock)
    begin
        if (LoadPC)  PC <= Mux1_out;
        if (WriteR1) R1 <= Rd;
        if (WriteR2) R2 <= Rd;
        if (WriteR3) R3 <= Rd;
        if (LoadDR)  DR <= Sysbus;
    end

Note that the LoadPC and LoadDR control signals do not write directly from the destination register path (Rd). LoadDR controls a write from the system bus to the Data Register (DR), while LoadPC controls a write from the output of multiplexor 1 (Mux1_out) to the Program Counter (PC).

As Figure 2 shows, there are 2 multiplexors: one selects the PC function and is called Mux1_out, and the other selects between the Rs2 (source2 register) and DR (Data Register) values and is called Mux2_out. Mux1_out is necessary because the architecture allows any value to be written into the PC, so a multiplexor is needed to select between the 2 PC functions: automatic increment, or accepting a value written to it as a destination register (Rd). Mux2_out is necessary to distinguish between the execution of an R+R instruction (Rs2 is selected) and an R+I instruction (DR is selected). Mux2_out uses the Rs2_sel control signal generated by the control unit to select Rs2 or DR, while Mux1_out uses the PC_inc control signal. Both multiplexors are coded in Verilog as continuous assignments:

    assign Mux1_out = (PC_inc) ? PC + 1 : Rd;
    assign Mux2_out = (Rs2_sel) ? Rs2 : DR;
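The three datapath tasks and the two multiplexors described above can be collected into one small module to see how the pieces fit together. The sketch below is a reduced illustration under stated assumptions, not the authors' datapath.v: the module name datapath_sketch is hypothetical, only one read path (R2 onto Rs1) and one bus driver (PC onto Sysbus) are shown, and Rd and Rs2 are taken as plain inputs, whereas in the paper they are driven by tri-state buffers inside the datapath.

```verilog
// Reduced datapath sketch: register writes, one tri-state read path, one
// tri-state bus driver and the two multiplexors.
// ASSUMPTIONS: module name; Rd and Rs2 modelled as inputs; reset omitted.
module datapath_sketch (
    input  wire        Clock,
    input  wire        LoadPC, LoadDR, WriteR1, WriteR2, WriteR3,
    input  wire        PC_inc, Rs2_sel,
    input  wire        ReadR2_1, TrisPC,
    input  wire [15:0] Rd,        // destination value
    input  wire [15:0] Rs2,       // source-2 value
    inout  wire [15:0] Sysbus,    // shared address/data bus
    output wire [15:0] Rs1,       // source-1 path (tri-stated when unselected)
    output wire [15:0] Mux2_out   // second ALSU operand: Rs2 or DR
);
    // R1 and R3 are written but not read in this reduced sketch.
    reg  [15:0] PC, R1, R2, R3, DR;
    wire [15:0] Mux1_out;

    // Multiplexors: PC increment versus PC as destination, and Rs2 versus DR.
    assign Mux1_out = PC_inc  ? PC + 16'd1 : Rd;
    assign Mux2_out = Rs2_sel ? Rs2        : DR;

    // Tri-state read path and bus driver: high impedance when not selected.
    assign Rs1    = ReadR2_1 ? R2 : 16'bz;
    assign Sysbus = TrisPC   ? PC : 16'bz;

    // Writes happen only on the rising clock edge.
    always @(posedge Clock) begin
        if (LoadPC)  PC <= Mux1_out;
        if (WriteR1) R1 <= Rd;
        if (WriteR2) R2 <= Rd;
        if (WriteR3) R3 <= Rd;
        if (LoadDR)  DR <= Sysbus;
    end
endmodule
```

Because several continuous assignments may drive the same net, each driver must release it with 16'bz whenever its enable is low; two enables active at once would produce contention (x) in simulation.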

Lastly, the asynchronous reset "assign" that overrides the synchronous action of all registers in the datapath unit must be included, as the normal method for describing an asynchronous reset within a synchronous sequential system.

4.4 ALSU Module

The final module in the microprocessor is the ALSU (Arithmetic & Logic Shift Unit), which performs seven basic arithmetic, logic and shift operations. A Zero flag is also included. The arithmetic, logic and shift operations are coded in a procedural block with a case statement as below.

    always @(input1 or input2 or Function)
        case (Function)
            `ADD : result = input1 + input2;
            `SUB : result = input1 - input2;
            `AND : result = input1 & input2;
            `OR  : result = input1 | input2;
            `XOR : result = input1 ^ input2;
            `NOT : result = ~input1;
            `SRA : result = input1 >> 1;
            default : result = input1;
        endcase

The Zero flag is modelled using an assign statement as below:

    assign Zero = (result == 0);

5. Processor Functionality

The processor functionality has been verified for the basic operations, which include arithmetic, logic and shift operations. The processor architecture offers 36 types of instruction. As examples, only 3 types of instruction are shown here: Register + Immediate, Register + Register and Load.

5.1 Register + Immediate Value operation

Rd ← Rs1 (R1) Addi Imm16 (259 / 103hex). This instruction performs an add operation between Register 1 (R1) and the immediate value 259, and the immediate value 259 is stored into Register 1. Details of the process are shown in Figure 3.

Fig. 3. Timing Diagram of Register + Immediate operation

5.2 Register + Register operation

Rd (R3) ← Rs1 (R1) ADDr Rs2 (R2). This instruction performs an add operation between registers. In this example the instruction performs an arithmetic ADD between Register 1 and Register 2, and the output is stored in Register 3. The value 259 (103hex) in Register 1 is added to the value initially stored in Register 2, 93 (5Dhex), and the result, 352 (160hex), is stored in Register 3. Details of the process are shown in Figure 4.

Fig. 4. Timing Diagram of Register + Register operation

5.3 Load operation

Rd (R2) ← mem[Rs1 (R0) + SWITCHES]. This instruction performs a load operation. In this example the instruction loads from `SWITCHES to Register 2. In the processor system, SWITCHES is mapped at memory address 512 and its content is the unsigned value 7. The content of the memory location addressed by [`SWITCHES + Register 0 (always zero)] is loaded into the destination register (Register 2). Details of the process are shown in Figure 5.

Fig. 5. Timing Diagram of Load operation

6. Conclusions

A new and simple 16-bit RISC processor architecture has been designed successfully based on HDL methodology, and its functionality has been simulated and verified using Verilog-HDL on the Synopsys compiler. At the simulation level, the functionality of the processor has been verified through the timing diagrams of every module. This simple processor model can be used as a basic platform for designing application-specific processors in a given field. The advantage of using an HDL methodology, i.e. Verilog-HDL, to design any system is that it improves the design process by minimizing error, time and cost; in addition, the design model is fully reusable, as the code can be changed to suit any specific application.

1994. 2001. 1997 [5] I. A-E. Papagiannis. Design and validation with HDL of a complex input/output processor for an ATM switch : the CMC. Scarfone. [13] Hebert. Morgan Kaufmann. 1995. Principles of Digital Design. pg 67-71. pg 643-647. M. The Role of Custom Design In ASIC Chips. 2000. pg 14-145. 7. Malaysia process by minimizing error.ie/hompage/ian_grout/publications. Winner. Digital Logic and Computer Design. PLL based ASIC system for DSP real-time analogue interface. A purely data structure for accurate high level timing simulation of synchronous designs. Bailey. Proceedings of the 37th conference on design automation. Digital System Design With VHDL. ACM Press. United States. Kuala Lumpur. [10] Diaz.ht ml . www. 1999 [4] M. ACM Press. ASIC microprocessor.D Gajski. 2000 [3] D. T-A. P.ece. A.. time and cost and also the design system model are fully reusable as the code can be changes accordingly for any specific need of application. A-W. [11] Mahdi. Morris Mano.3rd International Conference on Advanced Manufacturing Technology (ICAMT 2004). Patterson & J. D. Savaria. Grout. I-C. International Conference on Hardware Software Codesign. [12] Arnold.. Merayo. Computer Organization and Design . California. Cupal. M-J. Chang. Zwolinski. A. [7] Flynn. Prentice Hall. San Diego. A. A Systematic approach to software peripherals for embedded system. D. A Method to Derive Application-Specific Embedded Processing Cores. Verilog HDL A Guide to Digital Design and Synthesis. O. ACM Press. Kraljic. 1995 [9] Lioupis. pg 88-92. EZ431/631 VLSI Group Design Project-Microprocessor Specification Document. Plaza. JJ.2002. Prentice Hall. 2000. Cowles. University of Southampton. R-I. 11-13 May. ACM Press. 2000 [6] Dally. P. Verilog HDL Conference.The Hardware/ Software Interface. I-A. Wallace. pg 237-243. J-R. References [1] D. Printice Hall. L-A. Proceedings of the 22nd annual international workshop on microprogramming and microarchitecture. Psihogiou. 6 . 1989. Y. J-C. 
Verilog HDL conference.ul.L. Hennesy. Zamboni. Proceedings. pg 101-107. Prentice Hall. Proceedings of the ninth International symposium on hardware/software codesign. M-G. W-J. 1997 [2] M. [8] Samir Palnitkar. McNally.