
CARLETON UNIVERSITY

Design of a 32-bit RISC Microprocessor with Floating Point Unit
Design of a Floating Point Unit
Author: Adam Parsons S/N: 100653270

ELEC 4907

Supervisor: M. Shams

April 5, 2010

Department of Electronics 2009-2010

Microprocessor Design

April 5, 2010

Abstract
This fourth-year project presents and examines the design of a microprocessor. The project is to design a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons. This report covers professional engineering practices and project management techniques, but it centers mainly on the microprocessor and its design. It provides background on the microprocessor and its importance to today's society. The more technical portion of the report focuses heavily upon the Floating Point Unit, which can be viewed as a coprocessor to the microprocessor that was designed. It starts with how a microprocessor operates, followed by a more in-depth study of how a floating point unit is designed and operated. Furthermore, the results of the successful digital design testing are presented and explained, with suggestions for improvements and further optimization techniques.


Acknowledgements
My immediate thanks go to Maitham Shams (project supervisor), for his constant guidance. Under his instruction for this project, I have gained valuable skills that can be applied in the workplace.

I would also like to thank my group members Chaiya See-toh and Zain Zia, without whom this project would not have been completed. Their patience and dedication to hard work made this project a success, and they were indeed a true pleasure to work with.

I would also like to thank all those I have met in my abundance of years at Carleton University. You have all kept me on the right track, as you constantly remind me of things I had often forgotten. I would also like to thank the creators of ASICWORLD.com, as well as AJDESIGNER.com, for without their guidance, I would be lost in the language of Verilog and floating point calculations.

Most of all I would like to thank my parents, who patiently stood by me in all my years of studies, although they don't always understand what I am supposed to be learning.

April 2010
Adam Parsons


This is for those who are patient. We're here for the long haul.


Table of Contents
Abstract
Acknowledgements
Table of Figures
Table of Equations
Table of Tables
List of Abbreviations
1.0 Introduction
  1.1 Purpose
    1.1.1 Motivation
    1.1.2 Applications
  1.2 Report Overview
2.0 Health and Safety
  2.1 Engineering Professionalism
  2.2 Project Management
3.0 Project Overview
  3.1 Design Specifications
  3.2 Design Methodology
4.0 Background of Floating Point Representation
  4.1 Floating Point Unit
  4.2 Addition and Subtraction
    4.2.1 Addition
    4.2.2 Subtraction
  4.3 Multiplication and Division
    4.3.1 Multiplier
    4.3.2 Division
  4.4 Float to Integer
  4.5 Integer to Float
  4.6 Power Approximation
  4.7 Square-Root
  4.8 Floating Point Control Unit
5.0 Digital Testing
  5.1 Structural Analysis
  5.2 Timing Analysis
  5.3 Implementation
6.0 Concluding Remarks
  6.1 Summary of Project Accomplishments
  6.2 Considerations for Future Work
References
Appendix A: Verilog Design Code
  Addition Module
  Subtraction Module
  Normalization Module
  24-bit Addition Module
  Multiplication Module
  Division Module
  Floating Point to Integer Conversion Module
  Integer to Floating Point Conversion Module
  Power Module
  Square Root Module
  Control Module
Appendix B: Digital Testing Results
  Standard Case Waveforms
  Corner Case Tables


Table of Figures
FIGURE 1: PROJECT SCHEDULE
FIGURE 2: PROCESSOR OVERVIEW
FIGURE 3: WORKLOAD PARTITIONING CHART
FIGURE 4: FLOATING POINT BINARY
FIGURE 5: FLOATING POINT BLOCK DIAGRAM
FIGURE 6: ADDITION/SUBTRACTION MODULE
FIGURE 7: CARRY LOOK-AHEAD ADDER
FIGURE 8: TWO'S COMPLEMENT
FIGURE 9: MULTIPLIER AND DIVIDER MODULE
FIGURE 10: MULTIPLICATION ALGORITHM
FIGURE 11: MULTIPLICATION BLOCK DIAGRAM
FIGURE 12: DIVISION BLOCK DIAGRAM
FIGURE 13: DIVISION ALGORITHM
FIGURE 14: FLOAT TO INTEGER BLOCK
FIGURE 15: INTEGER TO FLOAT DIAGRAM
FIGURE 16: LOG2 VS IEEE ESTIMATE
FIGURE 17: POWER UNIT
FIGURE 18: SQUAREROOT UNIT
FIGURE 19: FLOATING POINT CONTROL UNIT
FIGURE 20: ALTERA DE2 IMPLEMENTATION

Table of Equations
EQUATION 1
EQUATION 2
EQUATION 3
EQUATION 4
EQUATION 5
EQUATION 6
EQUATION 7
EQUATION 8
EQUATION 9
EQUATION 10


Table of Tables
TABLE 1: IEEE-754 SPECIAL REPRESENTATIONS
TABLE 2: LOG ESTIMATE ERROR
TABLE 3: LOG ESTIMATE ERROR CORRECTION
TABLE 4: STANDARD TEST CASE
TABLE 5: SPECIAL TEST CASES
TABLE 6: FAST TIMING ANALYSIS
TABLE 7: SLOW TIMING ANALYSIS

List of Abbreviations

CPU      Central Processing Unit
RISC     Reduced Instruction Set Computer
FPGA     Field Programmable Gate Array
OPCODE   Operational Code
ALU      Arithmetic Logic Unit
FPU      Floating Point Unit
MIPS     Microprocessor without Interlocked Pipeline Stages
NaN      Not a Number
INF      Infinity
FMAX     Maximum Frequency
TCO      Clock Output Time
TH       Hold Time
TSU      Clock Setup Time


Chapter 1
1.0 Introduction
The purpose of this report is to present and examine the design of a microprocessor. The project is to design a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons.

1.1 Purpose
Microprocessors are extremely small electrical devices built on an integrated circuit. They are the cornerstone that today's automated systems are built upon. Most notably, the microprocessor is used in the common computer, be it a PC or a Mac. There are many more applications in the modern world, and there is often a microprocessor designed specifically for each task. Their uses range from simple household devices such as washing machines and mobile phones to the automatic check-in booths at the airport.

1.1.1 Motivation
As the microprocessor becomes more integrated into every aspect of daily life, it becomes more important to understand the design and implementation of the device. This allows for improvements and optimizations in order to maintain a competitive marketplace, as well as a constant progression of modern technology. Modern applications of microprocessors require them to be faster, more precise, and designed with minimal hardware.

1.1.2 Applications
The 32-bit RISC microprocessor with floating point unit is a more specialized device, but it still maintains a wide range of possible implementations. It can store and manipulate large data sets, and handle real number calculations that may be necessary in the field. These applications would tend to be directed to math-intensive operations, such as data processing, where its more specialized functionality provides faster and more accurate outputs than a general microprocessor. Due to the specialty of the processor, it is often encouraged to implement it as part of a multi-core processing set. This particular processor can be implemented within web controllers, graphics processors, and mobile GPS devices.

1.2 Report Overview


Chapter 2 outlines the engineering project as a whole. This ranges from the Health and Safety concerns involved with designing a microprocessor to the procedures taken to ensure that the respective Health and Safety requirements are met. It also addresses the engineering professionalism pertaining to the project, through project management, workload partitioning, and workplace synergy.

Chapter 3 begins to present the more technical aspects of the microprocessor and its design. This chapter gives an overview of the project, providing background information regarding the microprocessor, the design specifications, and the partitioning of the microprocessor components among the project members.

The specialized main topic of the project is presented in Chapter 4. For this specific report, it provides in-depth technical details regarding the floating point unit. The individual modules of the device are explained, along with the algorithms and optimizations that were used to produce a high-performing floating point unit.

In Chapter 5 the results from the digital design testing are displayed and analyzed. This chapter also contains explanations of the performance analysis and performance restrictions of the floating point unit.

Chapter 6 concludes the report by summarizing the project's work and accomplishments, and possible applications for the 32-bit RISC microprocessor with floating point unit, or even simply the floating point coprocessor. This chapter also states proposals for future improvements to be made to the processor.


Chapter 2
2.0 Health and Safety
Microprocessors are relatively safe devices to operate, but within the computer design lab it is still important to follow and respect general health and safety principles as regulated by the Carleton University Health-And-Safety document. Some of the relevant health and safety principles from the document include: using personal protective equipment at all times, using the equipment only for its designed purpose, keeping the lab supervisor informed of any unsafe condition, keeping track of the location and correct use of safety equipment, and determining potential hazards and appropriate safety precautions before beginning new operations.

As the microprocessor was implemented and tested on the ALTERA DE2 Development Board, extra precautions needed to be considered to ensure a safe work environment. The following measures ensure that the board operates within its normal operating conditions while maintaining the health and safety of all project members.


Automatic testing was incorporated to check the integrity of the following units before the first execution: the system's Memory Units (RAM and ROM), Input and Output signal processing circuitry, the Arithmetic Logic Unit (ALU), Control Unit, and Registers.

Software was developed which, at predetermined time intervals, monitors electrical parameters such as Current or Voltage in the Circuit. When a fault is sensed, it sends a signal to the board which halts further execution and terminates the program. This circuitry continually tests for proper supply voltage to the microprocessor.

Overcurrent is an abnormal current greater than the full load value of the circuit. This can occur due to short circuits or overload currents in any unit.

Overload is an overcurrent which persists long enough to cause dangerous overheating. This can occur during a long start time, during multiple restarts in a short interval, and if the normal duty cycle of the processor is exceeded.

An Alarm Signal is generated by the board and the program execution is halted if an overload occurs.

The board was implemented in such a fashion that failure to execute the program disconnects the Voltage Source to prevent any false leakage of Current.

An asynchronous Reset Signal for the Microprocessor was designed for manual override to reset all units in case of a danger of overload.


The microprocessor is designed so that the algorithm can't be altered by anyone except the designers themselves.

2.1 Engineering Professionalism


To meet the requirements for professionalism in engineering, all engineers must abide by the Professional Engineers Act (PEA) and the Professional Engineers Ontario (PEO) Code of Ethics. As engineering is a self-regulated profession with strict rigor on its code of ethics, it is of utmost importance that we follow the principles of fairness, integrity and honesty. During the project design there were minimal ethical dilemmas from a professional standpoint. As the project work was fairly separate for each individual, there were never any conflicts of points of view, as we all trusted each other to be working to the best of our respective abilities. Professional conduct was maintained at all times, as the only reasonable way for this project to be completed was for each group partner to operate without impeding the work flow of the other group members. The only major difficulty was meeting specific preset deadlines, as previously outlined by the project proposal. The proposal may have produced an unreasonable timeline for the group to keep pace with. This may have been caused by our minimal communication outside of our weekly meetings. Consistent contact was maintained through emails to keep each other up to date with status reports and questions regarding project difficulties or confusion. Although the development of a microprocessor offers reduced chances for unprofessional behavior, there was none that truly impeded the quality of work or the professional decisions that had to be made for the completion of the project. Each group member's professional responsibilities aided in meeting each member's individually designated goals. This also enabled the achievement of the group's goal, which was to successfully design a microprocessor.

2.2 Project Management


Several project management techniques were used in order to coordinate, manage and perform the project. Weekly group meetings with Prof. M. Shams kept the objectives and progress of the design project clear. It was here that we could clarify any individual misconceptions about the design of the project with the supervisor. This portion of the project management was fairly relaxed, which is important so that group members are not intimidated by or fearful of the supervisor. The relatively loose supervision encouraged the group's members to improve communication with each other, instead of being completely autonomous with very little knowledge of each other's involvement in the project. Open communication was encouraged (via email and phone) to enable the clear flow of design concepts and ideas. This also promoted the project's success, because when any group member arrived at a difficult design decision or had any other difficulty, either of the other group members was able to assist. The ability to perform the project is not something that could truly fall under project management of the group. This ability rests heavily upon the individual group member, as the software required to complete the project is available in several laboratories within the Department of Electronics at Carleton University; a free web version of the program was also available for use at home. The performance expectations were clearly displayed within the initial project proposal, as shown in Figure 1 below.

Figure 1: Project Schedule

The partitioning of the workload relating to the project was decided during one of the initial group meetings supervised by M. Shams. Each portion was selected, or agreed upon by compromise, by the individual group members so as to encourage each individual to work in the field that sparked the most personal interest, which would in turn increase productivity.


Chapter 3
3.0 Project Overview
Before discussing the more technical side of the design of a 32-bit RISC microprocessor with floating point unit, it is important to have a clear overview of the components of a microprocessor. A simple microprocessor is built from five basic integrated blocks, as shown in Figure 2. These are: Inputs/Outputs, Memory, Datapath, Control Unit, and Arithmetic Logic Unit.

Figure 2: Processor Overview


Figure 2 shows the organization of the microprocessor, which is consistent throughout all types of processors. Every processor performs the same basic functions of fetching, decoding, and executing, which require all five blocks. The processor receives instructions from the Memory, which is responsible for storing the instruction sets as well as data sets. The flow of data between the Memory and the processor follows the implementation of the Datapath. The Datapath interprets the instruction signals between the Control Unit, the Memory, and the Input/Output devices. This interpretation of data is regulated by the Control Unit's output signals, which then branch to the Input/Output devices. The input and output devices usually consist of hardware such as a keyboard or a graphics display.

3.1 Design Specifications


The Microprocessor design requires the implementation of a memory and register unit which temporarily stores data within the microprocessor. The memory was given a specified size of 512 x 32 bits. The size of each register in the microprocessor is specified as 32 bits. The standard set of instruction classes to be performed by the microprocessor was also specified. A description of these classes follows.


R-type Instruction: Arithmetic Instructions (Addition, Subtraction, Multiplication and Division of two operands) and Logical Instructions (a comparison of two operands).

Branch Instruction: Makes a jump to the provided Memory address by comparing two operands. Operands are compared for equality, and if they are equal the branch is executed.

Load Instruction: Loads a data word from Memory into one of the specified registers in the processor.

Store Instruction: Stores a data word from a specified register into the specified Memory address.
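The four instruction classes above can be illustrated with a toy fetch, decode, execute loop. This is only a sketch under stated assumptions: the tuple encoding `(op, a, b, c)`, the register count, and the function name `run` are hypothetical and are not the project's instruction format, and the program is held in a separate list for brevity rather than in the shared memory. Only the 512-word memory and the 32-bit word width come from the specification.

```python
# Toy fetch-decode-execute loop. The tuple encoding (op, a, b, c) is a
# hypothetical illustration, not the instruction format used in the project.
MEM_WORDS = 512          # memory size from the specification: 512 x 32 bits
WORD_MASK = 0xFFFFFFFF   # registers and memory words are 32 bits wide


def run(program, data, num_regs=8, max_steps=1000):
    """Execute `program` (a list of tuples) and return (registers, memory)."""
    mem = [0] * MEM_WORDS
    for addr, value in data.items():
        mem[addr] = value & WORD_MASK
    regs = [0] * num_regs
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, a, b, c = program[pc]      # fetch
        pc += 1
        if op == "add":                # R-type: regs[a] = regs[b] + regs[c]
            regs[a] = (regs[b] + regs[c]) & WORD_MASK
        elif op == "sub":              # R-type: regs[a] = regs[b] - regs[c]
            regs[a] = (regs[b] - regs[c]) & WORD_MASK
        elif op == "beq":              # branch to c if regs[a] == regs[b]
            if regs[a] == regs[b]:
                pc = c
        elif op == "load":             # regs[a] = mem[b]
            regs[a] = mem[b]
        elif op == "store":            # mem[b] = regs[a]
            mem[b] = regs[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, mem


program = [
    ("load", 1, 100, 0),    # r1 = mem[100]
    ("load", 2, 101, 0),    # r2 = mem[101]
    ("add", 3, 1, 2),       # r3 = r1 + r2
    ("store", 3, 102, 0),   # mem[102] = r3
]
regs, mem = run(program, {100: 7, 101: 5})
print(mem[102])  # 12
```

Masking every result with `WORD_MASK` mimics the wrap-around of a 32-bit register; a hardware design would get this behavior from the register width itself.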


3.2 Design Methodology


The microprocessor was designed using the Verilog Hardware Description Language (Verilog-HDL). This allows the designer to operate comfortably within a single language for design, testing, and synthesis of the overall microprocessor design. The Quartus II software was used to compile and simulate the Verilog-HDL code, as it connects fairly easily to the ALTERA DE2 development boards upon which the design must be implemented. The design was partitioned into three distinct portions, as mentioned in Section 2 and shown in Figure 3.

Figure 3: Workload Partitioning Chart

The design follows the Von Neumann Architecture, which follows the standard FETCH, DECODE, EXECUTE pattern of microprocessors. This particular architecture allows the instructions and data to be stored within the same memory. It was chosen for its highly optimized instruction set, high performance implementations, programmability (programs are easy to express) and reduction in the required hardware. It achieves this by sharing the functional units while also implementing pipelining, and as a result a chip with a smaller silicon area and a lower operating power can be fabricated.


Chapter 4
4.0 Background of Floating Point Representation
Many basic microprocessors are unable to handle real number arithmetic, and support only integer manipulations. Real number manipulation allows the processor to handle rational, as well as possibly irrational, numbers. This is very important for data analysis and manipulation of various signals within Digital Signal Processing (DSP) devices. An important part of handling real numbers is scientific notation, which is a way of writing real numbers that may be too large to be conveniently expressed in decimal notation. This notation is presented as

$$[\text{fraction}] \times 10^{[\text{exponent}]}$$

$$[\text{real}] \times 10^{[\text{integer}]}$$

More often than not, scientific notation is expressed in its normalized format. In this format, the most significant digit of the real number is the only digit to the left of the decimal point. This allows for easy comparison of the magnitudes of two numbers, as they are expressed solely within the exponent of the notation.

Examples of real numbers:

$11/5 = 2.2_{\text{ten}}$
$3.141593_{\text{ten}}$
$5.73_{\text{ten}} \times 10^{-4}$ (normalized scientific notation)
$235.9722 \times 10^{8}$ (scientific notation)

The floating point representation of real binary values allows microprocessors to manipulate real numbers. This notation deals with the fractions created by real numbers through the placement of binary points [1] as well as scientific notation.

[1] Binary point is the binary term for a decimal point, as we are now working in binary notation instead of decimal notation.

Examples of real binary numbers:

$110111.11_{\text{two}} = 55.75$
$1011_{\text{two}} \times 2^{3}$ (scientific notation)
$1.0001_{\text{two}} \times 2^{-7}$ (normalized scientific notation)

There are different formats for handling floating point binary, such as the MIPS and IEEE-754 standards. In the design of a floating point unit these both require specific sizes of both the exponent and fraction. The sizes of the exponent and fraction (commonly referred to as the mantissa) are determined by the size of the fixed word. A large exponent would be ideal for a large range of numbers, while a larger fraction allows for a more precise representation of the numbers within the reduced range. For a 32-bit word neither of these is much of a problem, as there is a relatively large range with capabilities of significant precision.
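The binary examples earlier in this section can be checked numerically. The sketch below is an illustration only; the helper names `binary_to_float` and `normalize` are hypothetical and do not come from the project code. It converts a binary string containing a binary point to its value, then rewrites a positive value in normalized form, with exactly one nonzero digit to the left of the binary point.

```python
import math


def binary_to_float(text):
    """Convert a binary string such as '110111.11' to its numeric value."""
    whole, _, frac = text.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += 2.0 ** -position   # weight of the bit after the point
        elif bit != "0":
            raise ValueError(f"not a binary digit: {bit!r}")
    return value


def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2 and
    value == significand * 2**exponent. Positive values only."""
    if value <= 0:
        raise ValueError("only positive values are handled in this sketch")
    mantissa, exponent = math.frexp(value)   # 0.5 <= mantissa < 1
    return mantissa * 2, exponent - 1


print(binary_to_float("110111.11"))   # 55.75
print(normalize(binary_to_float("1011") * 2 ** 3))   # (1.375, 6)
```

The second result says that 1011 (binary) times 2 to the power 3 normalizes to 1.011 (binary) times 2 to the power 6, since 1.375 is 1.011 in binary.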

MIPS floating point representation was designed by MIPS Technologies:

$$(-1)^{\text{sign}} \times [\text{fraction}] \times 2^{[\text{exponent}]}$$

With 32-bit MIPS representation, floating point binary is expressed as:

[Bit-field diagram: bit positions numbered from 31 downward, with the fields EXPONENT [8 bits] and FRACTION [23 bits]]

Figure 4: Floating Point Binary

This format allows 23 bits to express the fraction and 8 bits to express the exponent. The exponent holds a bias of 127, which allows the exponent to range from +127 to -127.

The MIPS format may not have many limitations, but it is not the best representation of floating point numbers for binary computing. A more commonly used standard is the IEEE-754 representation of floating point binary:

$(-1)^{sign} \times [1 + fraction] \times 2^{[exponent]}$

It still uses the same 32-bit layout as the MIPS format, but it assumes that the fraction is always normalized, which allows the most significant bit to be implied. This hidden bit allows the fraction to be effectively 24 bits long instead of 23.

This format is preferred over the MIPS format mainly because it allows for special representations of certain values such as Inf, and NaN to prevent interrupts.

Value            Exponent    Fraction    Binary
Zero             Zero        Zero        0000000000000000000000000000000
Signaling NaN    255         nonzero     1111111100000000000000000000001
Quiet NaN        255         nonzero     1111111110000000000000000000000
Infinity         255         Zero        1111111100000000000000000000000

Table 1: IEEE-754 Special Representations

These special representations do not cover overflow and underflow exceptions. Overflow occurs when the exponent is too large to be represented, while underflow occurs when a negative exponent is too large in magnitude to be represented.
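A minimal sketch of how the special representations in Table 1 can be recognized from the exponent and fraction fields is given below. The function name `classify` and the label "other" are illustrative assumptions; the split between quiet and signaling NaN follows the bit patterns shown in the table (most significant fraction bit set for a quiet NaN).

```python
def classify(bits):
    """Classify a 32-bit pattern according to Table 1 (the sign bit is ignored)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction == 0:
        return "zero"
    if exponent == 255:
        if fraction == 0:
            return "infinity"
        # The top fraction bit separates quiet NaNs from signaling NaNs.
        return "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    return "other"  # ordinary values (and denormals, which the table does not list)

for pattern in (0x00000000, 0x7F800000, 0x7FC00000, 0x7F800001, 0x40A00000):
    print(hex(pattern), classify(pattern))
```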


4.1 Floating Point Unit


The floating point unit designed in this project utilizes the IEEE-754 format for design optimization. The unit performs the standard ALU operations, as well as a few extra operations that can only be done in floating point format. These operations include:

Addition
Subtraction
Multiplication
Division
Power
Square Root
Floating Point to Integer
Integer to Floating Point

Many of the algorithms that were utilized throughout the design of the floating point unit were created through basic arithmetic that can be done by hand.


Figure 5: Floating Point Block Diagram

4.2 Addition and Subtraction


The addition and subtraction modules follow very similar algorithms, as it is very easy to switch between the two functions. The two functions were nevertheless kept as separate modules rather than merged into one, so as to ease a pipelined implementation in which multiple instructions can be issued before earlier ones complete. The two algorithms follow the same basic steps:

Step 1: Compare the exponents of the two numbers and shift the smaller number to the right until the exponents match.
The shift gives the two numbers the same exponent, which enables them to be easily added together with a basic arithmetic adder/subtractor of the kind found in an ALU.

Step 2: Add or subtract the significands.
The specific addition or subtraction module is called according to the instruction being executed.

Step 3: Normalize the sum by shifting right or left.
Normalization of the sum adjusts for overflow or underflow. This must be done because every floating point number is kept normalized in order to maintain the consistency of the arithmetic algorithms.

Step 4: Round the significand.
Rounding the significand can be done to increase accuracy, but it was decided that it would slow down the device for little benefit, given the relatively high accuracy already available from a 23-bit mantissa. Truncation was performed instead, so as to maintain the high speed at which the unit can operate.

Figure 6: Addition/Subtraction Module
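The four steps of the addition algorithm can be sketched as a behavioural model in Python. This is an illustration under simplifying assumptions, not the Verilog from Appendix A: it handles only positive, normalized inputs, performs no exponent overflow check, and the helper names (`to_bits`, `from_bits`, `fp_add_positive`) are hypothetical.

```python
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_add_positive(a_bits, b_bits):
    """Add two positive, normalized single-precision patterns with truncation."""
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    sig_a = (a_bits & 0x7FFFFF) | 0x800000   # restore the hidden bit
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    # Step 1: shift the significand of the smaller number to the right.
    if exp_a >= exp_b:
        exp, sig_b = exp_a, sig_b >> (exp_a - exp_b)
    else:
        exp, sig_a = exp_b, sig_a >> (exp_b - exp_a)
    # Step 2: add the significands.
    total = sig_a + sig_b
    # Step 3: normalize; a carry into bit 24 means one shift to the right.
    if total & 0x1000000:
        total >>= 1
        exp += 1
    # Step 4: truncate by keeping only the 23 fraction bits.
    return (exp << 23) | (total & 0x7FFFFF)

print(from_bits(fp_add_positive(to_bits(5.0), to_bits(0.75))))
```

The bits shifted out in Step 1 and Step 3 are simply dropped, which is the truncation choice described above.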


4.2.1 Addition

For the sake of simplicity, the addition of the significands can be done with a basic ripple-carry adder. However, a Carry-Look Ahead Adder (CLA) produces results faster, as it calculates both the propagate and the generate signals for each group, which avoids waiting for a carry to ripple through the group. The generate signal indicates that a bit position produces a carry of its own; it is formed by passing the two input bits through an AND gate. In parallel, the propagate signal determines whether an incoming carry will be passed along; it is formed by passing the two input bits through an OR gate (the Verilog in Appendix A uses an XOR gate instead, which gives the same carries).

In this project a 24-bit CLA was used in order to increase the speed of the function.

Figure 7: Carry Look-Ahead Adder
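As a rough behavioural illustration of the look-ahead idea, the sketch below models one 4-bit group in Python. The function name `cla4` is hypothetical; the carry equations are the standard look-ahead expansions, with the propagate signal taken as XOR, as in the 4-bit Addition Module of Appendix A.

```python
def cla4(a, b, carry_in):
    """4-bit carry look-ahead addition; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate:  a AND b
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: a XOR b
    # Every carry is written directly in terms of g, p and carry_in,
    # so no carry has to wait for the previous one to ripple through.
    c1 = g[0] | (p[0] & carry_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & carry_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & carry_in)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & carry_in))
    carries = [carry_in, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4

print(cla4(0b1011, 0b0110, 0))
```

Six such groups chained through their carry outputs give a 24-bit adder of the kind used for the significands.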


4.2.2 Subtraction

The subtraction of the significands reuses the CLA from the addition module. As the difference between addition and subtraction is minimal, it was straightforward to turn the addition module into a subtraction module.

The only technical change from addition to subtraction is that the mantissa of the subtrahend is converted into a negative value through two's complement manipulation.

Figure 8: Two's Complement
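The two's complement step can be illustrated with a few lines of Python working on 24-bit significands. This is a sketch only; the names `MASK24`, `twos_complement` and `sub24` are hypothetical and do not come from the project code.

```python
MASK24 = (1 << 24) - 1

def twos_complement(x):
    """Negate a 24-bit value: invert every bit, then add one."""
    return (~x + 1) & MASK24

def sub24(a, b):
    """Compute a - b modulo 2**24 using only an addition."""
    return (a + twos_complement(b)) & MASK24

# Significands of 5 and of 0.75 after alignment to the exponent of 5.
print(hex(sub24(0xA00000, 0x180000)))
```

The printed value is the significand of 4.25 at the same exponent as 5, which corresponds to the Sub row of Table 4.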


4.3 Multiplication and Division

As with the Addition/Subtraction modules, the Multiplication and Division modules follow similar premises when dealing with floating point notation.

Step 1: Add or subtract the exponents without the bias.
The exponents are added (for multiplication) or subtracted (for division), just as would be done by hand.

Step 2: Manipulate the significands.
The multiplication or division of the significands is done at this stage, where a separate module is called to perform the specified operation.

Step 3: Check for normalization and for overflow.
As binary multiplication/division produces an output whose size is the sum of the sizes of the inputs, it is important to check whether the product/quotient is normalized, as well as to check the exponents for overflow.

Step 4: Round or truncate.
Due to the large size of the mantissa, as well as for the sake of speed, truncation was chosen, as rounding was deemed unnecessary for a floating point number that already holds such precision.

Step 5: Set the sign.
The sign is set by passing the two sign bits through an XOR gate to produce the appropriate value.

Figure 9: Multiplier and Divider Module
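The five steps can be sketched for multiplication as a behavioural Python model. It is an illustration under simplifying assumptions (normalized inputs, no handling of special values, no exponent overflow check), and the helper names `to_bits`, `from_bits` and `fp_mul` are hypothetical.

```python
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_mul(a_bits, b_bits):
    """Multiply two normalized single-precision patterns with truncation."""
    # Step 1: add the exponents; subtracting 127 leaves a single bias.
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    # Step 2: multiply the 24-bit significands (hidden bit restored).
    sig_a = (a_bits & 0x7FFFFF) | 0x800000
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b                    # up to 48 bits wide
    # Step 3: a product of 2.0 or more needs one normalizing shift.
    if product & (1 << 47):
        product >>= 1
        exp += 1
    # Step 4: truncate to the 23 bits just below the hidden bit.
    fraction = (product >> 23) & 0x7FFFFF
    # Step 5: the sign is the XOR of the two sign bits.
    sign = (a_bits >> 31) ^ (b_bits >> 31)
    return (sign << 31) | ((exp & 0xFF) << 23) | fraction

print(from_bits(fp_mul(to_bits(5.0), to_bits(0.75))))
```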


4.3.1 Multiplier

There are various algorithms for multiplication, but the rolled out (unrolled) binary multiplier was used because, like the addition/subtraction modules, it was the most relatable and the clearest to understand and explain. A simple binary adder performs a shift and a summation for each bit of the operand. This can be implemented within a loop to conserve space in the chip design, which produces a synchronous circuit that relies upon 24 clock edges to complete. The rolled out version implements the same basic algorithm, but instead of the synchronous loop each stage is laid out separately, producing the product in far fewer than 24 clock edges. This format also allows for easier implementation of pipelining circuitry, so as to support multiple function calls simultaneously.


Step 1: Check the multiplier bit [n].
Step 2: If the multiplier bit [n] holds a value of 1, the product is summed with the multiplicand and placed within the product register.
Step 3: Shift the multiplicand left by 1 bit.
Step 4: Shift the multiplier right by 1 bit.
Step 5: Check whether the loop has stepped through each multiplier bit; if not, step to the next bit (n+1) and repeat.
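The loop form of these steps can be sketched in Python as follows. The function name `shift_add_multiply` and the `width` parameter are illustrative assumptions; the rolled out hardware version performs the same additions without the loop.

```python
def shift_add_multiply(multiplicand, multiplier, width=24):
    """Shift-and-add multiplication over `width` multiplier bits."""
    product = 0
    for _ in range(width):          # Step 5: one pass per multiplier bit
        if multiplier & 1:          # Steps 1-2: add when the current bit is 1
            product += multiplicand
        multiplicand <<= 1          # Step 3: shift the multiplicand left
        multiplier >>= 1            # Step 4: shift the multiplier right
    return product

# Significands of 5 and 0.75 with the hidden bit restored.
print(hex(shift_add_multiply(0xA00000, 0xC00000)))
```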


Figure 11: Multiplication Block Diagram

4.3.2 Division
The division algorithm closely mirrors the multiplication algorithm, and can be implemented in a very similar manner. Unlike the multiplier that was implemented, however, the divider was kept as an iterative loop.

Step 1: Check the remainder.
Step 2a: If the remainder is zero or greater, the quotient is shifted left by 1 bit and the new LSB is set to a value of one.
Step 2b: If the remainder is less than zero, the quotient is shifted left by 1 bit, the new LSB is set to a value of zero, and the remainder is restored.
Step 5: Check whether the loop has stepped through each remainder bit; if not, step to the next bit (n+1) and repeat.

Figure 12: Division Block Diagram


Figure 13: Division Algorithm

The loop was maintained because, with the rolled out multiplier already built, a looped divider would provide a useful comparison during simulations and timing analysis.
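A looped restoring divider of this kind can be sketched in Python on plain integers. This is a textbook-style illustration, not a transcription of the Division Module in Appendix A; the function name `restoring_divide` and the `width` parameter are assumptions, and the dividend is assumed to fit in `width` bits.

```python
def restoring_divide(dividend, divisor, width=24):
    """Restoring division; returns (quotient, remainder) for divisor > 0."""
    remainder = dividend
    divisor_reg = divisor << width        # divisor starts in the upper half
    quotient = 0
    for _ in range(width + 1):
        remainder -= divisor_reg          # trial subtraction
        if remainder >= 0:
            quotient = (quotient << 1) | 1    # shift left, new LSB = 1
        else:
            remainder += divisor_reg          # restore the remainder
            quotient = quotient << 1          # shift left, new LSB = 0
        divisor_reg >>= 1                 # move the divisor one place right
    return quotient, remainder

print(restoring_divide(7, 2, width=4))
```

Each pass produces exactly one quotient bit, which is why the looped form needs one clock edge per bit.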

4.4 Float to Integer


The float to integer unit was designed primarily for use within the Power Module. It separates the 23-bit fraction into an integer, a numerator and a denominator.

It does this by placing the fraction into a shift register that is twice as large as an integer register (2 x 32 bits), so as to maximize the size of the integers that can be produced. In order to produce an integer the exponent must be zero; the large register is therefore shifted left or right according to the value of the exponent until the exponent reaches zero. If the exponent is too large for the shift register to handle, the register is shifted to the far right or the far left and the exponent is adjusted accordingly.

Figure 14: Float to Integer Block

The numerator and denominator are formed by stepping through the bottom segment (32 bits) of the shift register while counting the bits. As the bits are counted they follow Equation 1 and Equation 2.

Equation 1 (a summation over the bit index, starting at 0, with terms in 1/2)

Equation 2 (a piecewise update, with one case for a bit value of 0 and one for a bit value of 1)

4.5 Integer to Float

Figure 15: Integer to Float Diagram

The Integer to Float Module accepts its inputs in signed binary integer format and normalizes the integer, which provides it with an exponent value of its own. The importance of normalization was previously discussed in Section 4.0.
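The normalization of an integer can be sketched in Python as below. It is an illustration only: the names `from_bits` and `int_to_float_bits` are hypothetical, truncation is assumed for integers wider than 24 bits (matching the truncation used elsewhere in the unit), and the input is assumed to fit in a signed 32-bit integer.

```python
import struct

def from_bits(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def int_to_float_bits(n):
    """Convert a signed integer to a single-precision bit pattern (truncating)."""
    if n == 0:
        return 0
    sign = 1 if n < 0 else 0
    magnitude = abs(n)
    msb = magnitude.bit_length() - 1      # position of the leading 1
    # Normalize: move the leading 1 to bit 23, where it becomes the hidden bit.
    if msb > 23:
        significand = magnitude >> (msb - 23)
    else:
        significand = magnitude << (23 - msb)
    exponent = msb + 127                  # the leading-1 position sets the exponent
    return (sign << 31) | (exponent << 23) | (significand & 0x7FFFFF)

print(from_bits(int_to_float_bits(5)), from_bits(int_to_float_bits(-25)))
```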


4.6 Power Approximation


The Power Module of the FPU was initially to use a recursive (looping) algorithm, but the loop raised more issues than were acceptable for determining the power of a floating point number. The first issue was the loop itself. As floating point representation handles very large real numbers, it would be unwise to loop for extremely large numbers with large exponents; the loop method would prove far too slow. The second issue was the difficulty of raising a number to a real power (for example 2.52^3.194). The looped algorithm had initially dealt only with integer exponents, and with the introduction of real exponents the situation became more difficult to handle.

The first issue was addressed by changing the Power Module into a Power Approximation Module. The Power Approximation Module uses the IEEE-754 binary representation of a 32-bit floating point number in its estimation of LOG2(X).

$\log_2(x) \approx \frac{X_{integer}}{2^{23}} - 127$

Equation 3

Here Xinteger is the 32-bit floating point pattern of x read as an integer. This approximation method is fairly accurate for its speed.

Figure 16: Log2 vs IEEE Estimate
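Equation 3 can be tried out directly in Python. The sketch below assumes the standard `struct` module for reading the bit pattern; the function names `to_bits` and `log2_estimate` are hypothetical.

```python
import math
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def log2_estimate(x):
    """Estimate log2(x) from the bit pattern of a positive single-precision x."""
    return to_bits(x) / 2**23 - 127

for x in (1.0, 4.0, 5.0, 100.0):
    print(x, log2_estimate(x), math.log2(x))
```

The estimate is exact at powers of two and interpolates linearly between them, which is why the error stays small but is never zero in between.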

However, a problem occurs when the logarithmic value is manipulated further: the precision is greatly reduced in comparison to the actual value.

Quantity       Real      Estimate step                Estimate      Lossy Estimate
X = 5          5         Xinteger                     1084227584    1084227584
Y = Log2(X)    2.3219    Y = Xinteger/2^23 - 127      2.25          2
Z = 2*Y        4.6439    Z = 2*Y                      4.5           4
                         (Z + 127) x 2^23             1103101952    1065353220
2^Z            25        XFloat                       16            1

Table 2: Log Estimate Error

This issue can be resolved by scaling the value of Xinteger up by a few places before passing it through the logarithmic estimate function, so that the fractional part of the estimate survives the integer arithmetic. In this implementation of the algorithm Xinteger was scaled by two places (a factor of 100), and the results can be seen in the table below.

Quantity       Real      Estimate step                            Estimate        Lossy Estimate
X = 5          5         Xinteger*100                             108422758400    108422758400
Y = Log2(X)    2.3219    Y = (Xinteger*100)/2^23 - (127*100)      225             225
Z = 2*Y        4.6439    Z = 2*Y                                  450             450
                         (Z + 127*100) x 2^23 / 100               1103101952      1103101952
2^Z            25        XFloat                                   24              24

Table 3: Log Estimate Error Correction

The accuracy of the power module's estimate is greatly increased by this change. It can be improved further by scaling the initial Xinteger by several more places. The second issue was resolved by utilizing the Float to Integer Converter Module. This module converts the real-valued binary exponent into an integer format that is easier to manipulate.

+ 10

Equation 4

With the logarithmic estimate provided, the manipulation into a power module becomes as simple as the multiplication and division of an integer. Example:

Equations 5 to 8 (worked example expressed in terms of log 2 [ ])

These calculations are within the block diagram in Figure 17 which shows the flow of the individual steps to produce the power approximation module.


Figure 17: Power Unit

The power block is incapable of handling exponents outside the range of -4.2950e+009 to +4.2950e+009, as these numbers are too large for the algorithm to operate properly.
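The overall idea of the power approximation, taking the logarithmic estimate, multiplying by the exponent and converting back, can be sketched in floating point Python as follows. This is a simplified illustration, not the integer-based hardware flow of Figure 17: the names `to_bits`, `from_bits`, `log2_estimate` and `pow_estimate` are hypothetical, and no range checks are made.

```python
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def log2_estimate(x):
    return to_bits(x) / 2**23 - 127

def pow_estimate(x, y):
    """Estimate x**y for positive x as 2**(y * log2(x)), using bit patterns only."""
    z = y * log2_estimate(x)
    # Inverse of the log estimate: rebuild the bit pattern from z.
    return from_bits(int((z + 127) * 2**23))

print(pow_estimate(5.0, 2.0))   # real value: 25
```

For x = 5 and y = 2 the intermediate values are 2.25, 4.5 and the integer 1103101952, matching the Estimate column of Table 3.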


4.7 Square-Root
There are several different iterative methods (e.g. Newton's Method) for developing the square-root estimate of a binary real number. The issue, once again, was that these methods take several iterations. For this reason, the Square-Root Module utilizes the same method of logarithmic approximation as the Power Module. It is much faster than the Power Module, as it does not rely upon the Float to Integer Converter. It simply follows the formulas:

$y = \frac{1}{2}\log_2(x)$

Equation 9

$\sqrt{x} = 2^{y}$

Equation 10


Figure 18: Squareroot Unit
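The square-root approximation can be sketched in the same way: halve the logarithmic estimate and convert back. As before, this is a floating point illustration with hypothetical names (`to_bits`, `from_bits`, `sqrt_estimate`), not the project's Verilog, and it assumes a positive input.

```python
import struct

def to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def sqrt_estimate(x):
    """Estimate sqrt(x) as 2**(log2(x) / 2) using the bit-pattern log estimate."""
    y = (to_bits(x) / 2**23 - 127) / 2      # half of the log2 estimate
    return from_bits(int((y + 127) * 2**23))

print(sqrt_estimate(4.0), sqrt_estimate(5.0))
```

The hardware result for 5 reported in Table 4 differs slightly from this sketch, since the unit works on scaled integers rather than on Python floats.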

4.8 Floating Point Control Unit


The Floating Point Control Unit is the most vital portion of the coprocessor, as it is responsible for organizing the various operations of the coprocessor. It does this by handling only six opcode signals, each of which represents the specific module called to produce an output value. The control module handles the input instructions and checks for special cases. Although IEEE-754 floating point representation was designed to handle certain special cases, it was deemed better to err on the side of caution.

Figure 19: Floating Point Control Unit

The various exceptions the control unit is designed to catch are cases when the inputs or outputs would be clearly: Zeros, NaNs or INFs.


For example:

Input x Zero = Zero
Input + Inf = Inf
Input / Zero = NaN

After the control unit checks for special cases, it calls the individual module that corresponds to the opcode received.
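The special-case screening can be sketched as a small Python function that is consulted before any arithmetic module is called. The function name `special_case` and the string opcodes are illustrative assumptions; the real control unit works on opcode signals and bit patterns, and only the three example rules above are modelled.

```python
import math

def special_case(op, a, b):
    """Return the result for a listed special case, or None if none applies."""
    if op == "mul" and (a == 0.0 or b == 0.0):
        return 0.0                              # Input x Zero = Zero
    if op == "add" and (math.isinf(a) or math.isinf(b)):
        return a if math.isinf(a) else b        # Input + Inf = Inf
    if op == "div" and b == 0.0:
        return math.nan                         # Input / Zero = NaN
    return None                                 # hand over to the normal module

print(special_case("mul", 5.0, 0.0), special_case("div", 5.0, 0.0))
```

A return value of `None` stands for the ordinary path, in which the module selected by the opcode computes the result.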


Chapter 5
5.0 Digital Testing
After the complete coprocessor was designed, the overall digital testing began. Two types of digital design testing were carried out on the design: structural analysis and timing analysis.

5.1 Structural Analysis


Structural testing is a form of testing in which specific inputs are applied to the circuit. These inputs gauge the range of the design and detect flaws within it. This differs from functional testing because in structural testing the design is known, and so points along the designated testing paths can be probed.

The first test case, shown in the table below, is a standard test case that is comfortably within the operational range of the floating point unit's parameters. This test case shows that the floating point unit operates properly under reasonable conditions.

Standard case: (Input1 = 5, Input2 = 0.75)

        Real Value    Floating Point Value                  FPU Value
A       5             0_10000001_01000000000000000000000    5
B       0.75          0_01111110_10000000000000000000000    0.75
Add     5.75          0_10000001_01110000000000000000000    5.75
Sub     4.25          0_10000001_00010000000000000000000    4.25
Mul     3.75          0_10000000_11100000000000000000000    3.75
Div     6.6667        0_10000001_10101010101010101010100    6.6667
Pow     3.3437        0_10000000_10101110000101000111101    3.3599
SQRT    2.2361        0_10000000_00011110101110000101000    2.2399

Table 4: Standard Test Case

More specific cases were also used to test the corners of the design. A few results of the specific cases that were used are shown in the table below:

              A     B      ADD     SUB     MUL     DIV     POWER
Real Value    5     5      10      0       25      1       3125
FPU Value     -            10      0       25      1       2560
Real Value    5     -5     0       10      -25     -1      3.1605e-018
FPU Value     -            0       10      -25     -1      NaN
Real Value    5     0      5       5       0       NaN     1
FPU Value     -            5       5       0       NaN     1
Real Value    5     Inf    Inf     -Inf    Inf     0       Inf
FPU Value     -            Inf     -Inf    Inf     0       Inf

Table 5: Special Test Cases

Several more extra cases were tested with the results posted within Appendix B. These cases test the corners of the design, which range from the smallest numbers the FPU should be able to handle all the way to the largest.


5.2 Timing Analysis


Two versions of timing analysis were used on the digital design. The first, shown in Table 6, is the Fast Model Timing Analyzer. The second, shown in Table 7, is the Slow Model Timing Analyzer. The fast timing model uses the best-case timing model of the fastest device to analyze and report the shortest delays for the design, while the slow timing model uses the worst-case scenario for the design's timing characteristics.

Type                             Time                                From                        To
Worst-case tsu                   4.702 ns                            opcode[0]                   subB[30]
Worst-case tco                   11.824 ns                           mulA[30]                    valueout[30]
Worst-case tpd                   10.560 ns                           B[31]                       valueout[29]
Worst-case th                    4.808 ns                            A[0]                        mulA[0]
Worst-case Minimum tco           4.231 ns                            floatmul:floatmulA|e[25]    valueout[3]
Worst-case Minimum tpd           4.286 ns                            opcode[1]                   valueout[7]
Fast Model Clock Setup: 'clk'    4.88 MHz (period = 204.804 ns)      Power:power|normFr[0]       Power:power|float2int:float2pow|denominator[29]

Table 6: Fast Timing Analysis

The maximum operating frequency in the Fast Timing Model is a slow 4.88 MHz, while in the Slow Timing Model the maximum operating frequency is an even slower 2.21 MHz. Table 6 clearly shows that the Float to Integer Module used within the Power Module is by far the slowest module, and that it greatly affects the highest operating frequency of the device.

Type                             Time                                From                                              To
Worst-case tsu                   9.168 ns                            opcode[0]                                         subB[30]
Worst-case tco                   24.432 ns                           mulB[24]                                          valueout[30]
Worst-case tpd                   20.806 ns                           B[31]                                             valueout[29]
Worst-case th                    9.778 ns                            A[0]                                              mulA[0]
Slow Model Clock Setup: 'clk'    2.21 MHz (period = 453.352 ns)      Power:power|float2int:float2pow|numerator[18]     Power:power|normFr[0]

Table 7: Slow Timing Analysis
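The maximum operating frequencies quoted in the two tables follow directly from the reported clock periods, since fmax is the reciprocal of the period. The short Python check below illustrates the arithmetic; the function name `fmax_mhz` is hypothetical.

```python
def fmax_mhz(period_ns):
    """Maximum clock frequency in MHz for a clock period given in nanoseconds."""
    return 1000.0 / period_ns

print(round(fmax_mhz(204.804), 2))   # fast model period from Table 6
print(round(fmax_mhz(453.352), 2))   # slow model period from Table 7
```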

The Slow Model Analysis was then repeated without the Power Module's float to integer converter, and produced a maximum frequency of 88.75 MHz, with a Fast Timing Analysis fmax of 199.80 MHz. The slowest clock setup time was then due to the Subtractor Module needing to form a two's complement before the value passes through the binary adder. Although removing the two's complement turns the subtractor into just another floating point adder, curiosity took over, and the removal resulted in impressive improvements in speed. The Slow Model Analyzer produced an fmax of 144.7 MHz, while the Fast Model Analyzer produced more than double that speed, with an fmax of 320.82 MHz.

5.3 Implementation
The coprocessor was implemented on the ALTERA DE2 development board, as shown in Figure 20. Due to the limited number of inputs provided by the board, it was impractical to create a complex means of setting the input values for live testing. Instead, a set of preset inputs was assigned for the purpose of presentation.


Figure 20: Altera DE2 Implementation

The switches determined the opcode and so set the operation to be performed by the device. The push buttons served as the reset input for the device, used whenever a new opcode was to be entered into the board. The outputs were displayed on both the small LCD and the 18 LEDs located above the switches. Because the size of the LCD did not allow the floating point unit to display large real numbers, the 18 LEDs displayed the output in floating point binary format. The eight green LEDs showed the exponent value, while the rest displayed a truncated version of the mantissa.

Chapter 6
6.0 Concluding Remarks
This concluding chapter gives a brief review of the project and emphasizes a few key points that developed during the course of the year.

6.1 Summary of Project Accomplishments


The coprocessor was successfully designed and implemented upon the ALTERA DE2 Development board, using 32-bit data registers.

The addition and subtraction modules utilized the fastest basic binary addition algorithm. The multiplication module is optimized for the ability to be pipelined, while the divider utilized a slow looping algorithm. The Power module used IEEE logarithmic estimation to improve performance, but was slowed down considerably by the Float to Integer Converter that it required to fully operate.

The digital design was put under test and analyzed in order to optimize its performance characteristics. There were a few small bugs here and there, but the floating point unit successfully passed the rigorous digital device testing, although it is noticeably slower than commercial FPUs, which operate at speeds around 250 MHz.

6.2 Considerations for Future Work


There are still many more possibilities for a faster Floating Point Unit coprocessor. Improvements of fmax within the Float to Integer Module would increase the speed by a minimum factor of four, and with improvements in the speed of the two's complement of the Subtractor, the maximum operating frequency would, even in the worst case, be somewhat close to the standard operating frequency of commercial FPUs.

The multiplier is ready to be pipelined, and several tests are required to see how well the coprocessor would combine with the regular 32-bit RISC microprocessor.


References
[1] Carleton University, Laboratory Health and Safety Manual. [Online]. Available at: http://www.doe.carleton.ca/undergrads/health-and-safety.pdf [Accessed: March 28 2010].
[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 3rd Ed. San Francisco: Morgan Kaufmann Publishers.
[3] Carleton University, Microprocessor Systems, ELEC 4601. [Online]. Available at: http://www.doe.carleton.ca/~shams/ELEC4601/Course_Notes.pdf [Accessed: Oct 17 2009].
[4] Carleton University, Digital Design Flow, ELEC 4706. [Online]. Available at: http://www.doe.carleton.ca/courses/ELEC4706/protected/class%20material/08-0910%20LECTURES [Accessed: Oct 13 2009].
[5] Carleton University, Binary Manipulation, SYSC 3006. [Online]. Available at: http://www.sce.carleton.ca/courses/sysc-3006/f09/Part3-BinaryManipulations.pdf [Accessed: Oct 12 2009].
[6] ASIC WORLD, Verilog Tutorials, Deepak Kumar Tala. [Online]. Available at: http://www.asic-world.com/verilog/veritut.html [Accessed: Sept 25 2009].
[7] D. Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic, 1991. [Online]. Available at: http://delivery.acm.org/10.1145/110000/103163/p5-goldberg.pdf [Accessed: Oct 5 2009].


Appendix A: Verilog Design Code


Addition Module
module adder (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;

  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow;
  wire [24:0] addoutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout;

  assign fractionA[22:0]=A[22:0];
  assign fractionB[22:0]=B[22:0];
  assign fractionA[23]=1;
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];

  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8(shift[23:0],shiftout[23:0],diff);

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0;
      expLarge<=8'b0; diffreg<=8'b0;
      sign<=1'b0; snorm<=1'b0;
    end else if(start) begin
      if(expA==expB) begin
        shift<=fractionA; noshift<=fractionB;
        expLarge<=expA; diffreg<=8'b0;
        sign<=signA; snorm<=1'b1;
      end else if(expA>expB) begin
        shift<=fractionB; noshift<=fractionA;
        expLarge<=expA; diffreg<=expA-expB;
        sign<=signA; snorm<=1'b1;
      end else if(expB>expA) begin
        shift<=fractionA; noshift<=fractionB;
        expLarge<=expB; diffreg<=expB-expA;
        sign<=signB; snorm<=1'b1;
      end else begin
        shift<=shift; noshift<=noshift;
        expLarge<=expLarge; diffreg<=diffreg;
        sign<=sign; snorm<=1'b0;
      end
    end else begin
      shift<=shift; noshift<=noshift;
      expLarge<=expLarge; diffreg<=diffreg;
      sign<=sign; snorm<=1'b0;
    end
  end

  // Add Significands
  bitadder add(noshift,shiftout,1'b0,addoutput);

  // Normalize
  normalizer addnorm(expLarge,addoutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);

  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;//expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule

Subtraction Module
module subtractor (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output overflow,finish;

  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow,fnorm;
  wire [24:0] suboutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout,shiftout1;

  assign fractionA[22:0]=A[22:0];
  assign fractionA[23]=1;
  assign fractionB[22:0]=B[22:0];
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];

  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8sub(shift[23:0],shiftout[23:0],diff);

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0;
      expLarge<=8'b0; diffreg<=8'b0;
      sign<=1'b0; snorm<=1'b0;
    end else if(start) begin
      if(expA==expB) begin
        shift<=fractionA; noshift<=fractionB;
        expLarge<=expA; diffreg<=8'b0;
        sign<=1'b0; snorm<=1'b1;
      end else if(expA>expB) begin
        shift<=fractionB; noshift<=fractionA;
        expLarge<=expA; diffreg<=expA-expB;
        sign<=1'b0; snorm<=1'b1;
      end else if(expB>expA) begin
        shift<=fractionA; noshift<=fractionB;
        expLarge<=expB; diffreg<=expB-expA;
        sign<=1'b1; snorm<=1'b1;
      end else begin
        shift<=shift; noshift<=noshift;
        expLarge<=expLarge; diffreg<=diffreg;
        sign<=sign; snorm<=1'b0;
      end
    end else begin
      shift<=shift; noshift<=noshift;
      expLarge<=expLarge; diffreg<=diffreg;
      sign<=sign; snorm<=1'b0;
    end
  end

  //2.0 Add Significands
  // this is the slowest part by 100MHz i blame the INV
  // NOTE: negating each slice separately is not the same as a two's
  // complement of the whole 24-bit word; the commented-out bitadder
  // line below shows the whole-word form.
  wire [23:0] negtemp;
  assign negtemp[23:20]=~shiftout[23:20]+1'b1;
  assign negtemp[19:15]=~shiftout[19:15]+1'b1;
  assign negtemp[14:12]=~shiftout[14:12]+1'b1;
  assign negtemp[11:8]=~shiftout[11:8]+1'b1;
  assign negtemp[7:4]=~shiftout[7:4]+1'b1;
  assign negtemp[3:0]=~shiftout[3:0]+1'b1;
  bitadder sub(noshift,negtemp,1'b0,suboutput);
  //bitadder sub(noshift,(~shiftout+1'b1),1'b0,suboutput);

  // Normalize
  normalizer addnorm(expLarge,suboutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);

  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;//expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule

Normalization Module

module normalizer(expin,in,expout,out,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [7:0] expin;
  input [24:0] in;
  output [23:0] out;
  output [7:0] expout;
  output finish,overflow;

  reg active,first;
  reg [24:0] regF,fregF;
  reg [8:0] regE,fregE;

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      regF<=25'b0; regE[7:0]<=8'b0;
      fregF<=25'b0; fregE<=9'b0;
      active<=1'b0; first<=1'b0;
    end else if(start) begin
      if(!first) begin
        // first cycle: load the inputs
        fregF<=fregF; fregE<=fregE;
        regF<=in[24:0]; regE[7:0]<=expin[7:0];
        active<=1'b1; first<=1'b1;
      end else if(regF[24]==1'b1) begin
        regF<=regF>>1'b1;
        regE<=regE+1'b1; // Increment Exponent
        active<=1'b1; first<=1'b1;
      end else if(regF[23]==1'b0 && regF[24]==1'b0) begin
        //shift left
        regF<=regF<<1'b1;
        regE<=regE-1'b1; // Decrement Exponent
        active<=1'b1; first<=1'b1;
      end else begin
        // normalized: latch the result
        regE<=regE; regF<=regF;
        fregE<=regE; fregF<=regF;
        active<=1'b0; first<=1'b1;
      end
    end else begin
      regE<=regE; regF<=regF;
      fregE<=fregE; fregF<=fregF;
      active<=1'b0; first<=1'b0;
    end
  end

  assign out=fregF[23:0];
  assign expout=fregE[7:0];
  // NOTE: the listing reads bit 8 of the fraction register here;
  // fregE[8], the exponent carry, may have been intended.
  assign overflow=fregF[8];
  assign finish=~active;
endmodule

24-bit Addition Module

module bitadder(addinA,addinB,carryin,sum);
  input [23:0] addinA,addinB;
  input carryin;
  output [24:0] sum;

  wire carryout1,carryout2,carryout3,carryout4,carryout5,carryout6;
  wire [3:0] sum1,sum2,sum3,sum4,sum5,sum6;

  fourbitadder adder1(addinA[3:0],addinB[3:0],carryin,sum1,carryout1);
  fourbitadder adder2(addinA[7:4],addinB[7:4],carryout1,sum2,carryout2);
  fourbitadder adder3(addinA[11:8],addinB[11:8],carryout2,sum3,carryout3);
  fourbitadder adder4(addinA[15:12],addinB[15:12],carryout3,sum4,carryout4);
  fourbitadder adder5(addinA[19:16],addinB[19:16],carryout4,sum5,carryout5);
  fourbitadder adder6(addinA[23:20],addinB[23:20],carryout5,sum6,carryout6);

  assign sum[24] = carryout6;
  assign sum[23:20] = sum6;
  assign sum[19:16] = sum5;
  assign sum[15:12] = sum4;
  assign sum[11:8] = sum3;
  assign sum[7:4] = sum2;
  assign sum[3:0] = sum1;
  assign test=addinA+addinB;
endmodule

4-bit Addition Module

module fourbitadder(addinA,addinB,carryin,sum,carryout);
  input [3:0] addinA,addinB;
  input carryin;
  output [3:0] sum;
  output carryout;

  wire [3:0] generation,propagation;
  wire [2:0] carrybit;

  assign sum[0] = propagation[0]^carryin;
  assign generation = addinA&addinB;
  assign propagation = addinA^addinB;
  assign carrybit[0] = generation[0]|(propagation[0]&carryin);
  assign carrybit[1] = generation[1]|(generation[0]&propagation[1])|(propagation[0]&propagation[1]&carryin);
  assign carrybit[2] = generation[2]|(generation[1]&propagation[2])|(generation[0]&propagation[1]&propagation[2])
                       |(propagation[0]&propagation[1]&propagation[2]&carryin);
  assign sum[3:1] = propagation[3:1]^carrybit[2:0];
  // The listing leaves carryout undriven; the line below is the standard
  // carry out of bit 3, added here so that the 4-bit groups can be chained.
  assign carryout = generation[3]|(propagation[3]&carrybit[2]);
endmodule

Multiplication Module
module floatmul(A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;
  reg active;
  reg [47:0] Mplier,Mcand,product,d,e;
  reg [7:0]counter;
  wire [23:0] fractionA,fractionB;
  wire [7:0] expA,expB;
  wire [8:0] expsum;
  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  assign fractionA={1'b1,A[22:0]};
  assign fractionB={1'b1,B[22:0]};
  // adding exponents without bias
  assign expsum = ((A[30:23]-127)+(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsum[8];
  // multiplying significands
  always@(posedge clk)begin
    if(rst)begin
      d=0;
      e=0;
      active=1'b0;
    end
    else if(start) begin
      active=1'b1;
      d={({32{fractionA[1]}}&fractionB)&({32{fractionA[0]}}&fractionB),({32{fractionA[1]}}&fractionB)^({32{fractionA[0]}}&fractionB)};
      e[0]=d[0];
      d={({32{fractionA[2]}}&fractionB)&d[32:1],({32{fractionA[2]}}&fractionB)^d[32:1]};
      e[1]=d[0];
      d={({32{fractionA[3]}}&fractionB)&d[32:1],({32{fractionA[3]}}&fractionB)^d[32:1]};
      e[2]=d[0];
      d={({32{fractionA[4]}}&fractionB)&d[32:1],({32{fractionA[4]}}&fractionB)^d[32:1]};
      e[3]=d[0];
      d={({32{fractionA[5]}}&fractionB)&d[32:1],({32{fractionA[5]}}&fractionB)^d[32:1]};
      e[4]=d[0];
      d={({32{fractionA[6]}}&fractionB)&d[32:1],({32{fractionA[6]}}&fractionB)^d[32:1]};
      e[5]=d[0];
      d={({32{fractionA[7]}}&fractionB)&d[32:1],({32{fractionA[7]}}&fractionB)^d[32:1]};
      e[6]=d[0];
      d={({32{fractionA[8]}}&fractionB)&d[32:1],({32{fractionA[8]}}&fractionB)^d[32:1]};
      e[7]=d[0];
      d={({32{fractionA[9]}}&fractionB)&d[32:1],({32{fractionA[9]}}&fractionB)^d[32:1]};
      e[8]=d[0];
      d={({32{fractionA[10]}}&fractionB)&d[32:1],({32{fractionA[10]}}&fractionB)^d[32:1]};
      e[9]=d[0];
      //-----------10-----------
      d={({32{fractionA[11]}}&fractionB)&d[32:1],({32{fractionA[11]}}&fractionB)^d[32:1]};
      e[10]=d[0];
      d={({32{fractionA[12]}}&fractionB)&d[32:1],({32{fractionA[12]}}&fractionB)^d[32:1]};
      e[11]=d[0];
      d={({32{fractionA[13]}}&fractionB)&d[32:1],({32{fractionA[13]}}&fractionB)^d[32:1]};
      e[12]=d[0];
      d={({32{fractionA[14]}}&fractionB)&d[32:1],({32{fractionA[14]}}&fractionB)^d[32:1]};
      e[13]=d[0];
      d={({32{fractionA[15]}}&fractionB)&d[32:1],({32{fractionA[15]}}&fractionB)^d[32:1]};
      e[14]=d[0];
      d={({32{fractionA[16]}}&fractionB)&d[32:1],({32{fractionA[16]}}&fractionB)^d[32:1]};
      e[15]=d[0];
      d={({32{fractionA[17]}}&fractionB)&d[32:1],({32{fractionA[17]}}&fractionB)^d[32:1]};
      e[16]=d[0];
      d={({32{fractionA[18]}}&fractionB)&d[32:1],({32{fractionA[18]}}&fractionB)^d[32:1]};
      e[17]=d[0];
      d={({32{fractionA[19]}}&fractionB)&d[32:1],({32{fractionA[19]}}&fractionB)^d[32:1]};
      e[18]=d[0];
      //---------20-----------
      d={({32{fractionA[20]}}&fractionB)&d[32:1],({32{fractionA[20]}}&fractionB)^d[32:1]};
      e[19]=d[0];
      d={({32{fractionA[21]}}&fractionB)&d[32:1],({32{fractionA[21]}}&fractionB)^d[32:1]};
      e[20]=d[0];
      d={({32{fractionA[22]}}&fractionB)&d[32:1],({32{fractionA[22]}}&fractionB)^d[32:1]};
      e[21]=d[0];
      d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]};
      e[22]=d[0];
      //---again!!! for N+1 iterations or good luck
      d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]};
      e[22]=d[0];
      //---------
      e[47:23]=d;
      active=1'b0;
    end
    else begin
      d=0;
      e=0;
      active=1'b0;
    end
  end
  // truncation
  // output the mantissa
  assign OUT[22:0]=e[45:22];//e[45:23];//46:22
  // output exponent
  assign OUT[30:23]=expsum[7:0];
  // set the sign
  // xor the signs together
  assign OUT[31]={A[31] ^ B[31]};
  assign finish=~active;
endmodule
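
The exponent and sign handling in the multiplier follows the usual single-precision product rule. As a worked sketch, assuming normalized IEEE 754 operands with a bias of 127 (the symbols s, E, f and m are notation introduced here, not names from the listing):

```latex
\begin{aligned}
A &= (-1)^{s_A}\, m_A\, 2^{E_A-127}, \qquad m_A = 1.f_A \in [1,2) \\
A \cdot B &= (-1)^{s_A \oplus s_B}\,(m_A m_B)\, 2^{(E_A-127)+(E_B-127)} \\
E_{\mathrm{out}} &= (E_A-127)+(E_B-127)+127 = E_A + E_B - 127 \\
\text{e.g. } 5 \times 5:&\quad E_A=E_B=129,\; m_A=m_B=1.25
  \;\Rightarrow\; E_{\mathrm{out}}=131,\; m_A m_B=1.5625,\; 1.5625\cdot 2^{4}=25
\end{aligned}
```

Because the significand product lies in [1,4), a general multiplier also has to renormalize by one bit whenever the product reaches 2. That requirement is a standard property of the format and is stated here as background, not as a description of the listing.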


Division Module
module floatdiv(A,B,OUT,clk,rst,overflow,start,finish);//floatdiv
  input clk,rst,start;
  input[31:0] A,B;
  output[31:0] OUT;
  output overflow,finish;
  wire [7:0] expA,expB;
  wire [8:0] expsub;
  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  reg active;
  reg [46:0] remainder,divisorreg;//46:0
  reg [23:0] quotientreg,outreg;
  reg [7:0] counter;
  //adding exponents without bias
  assign expsub =((A[30:23]-127)-(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsub[8];
  //the divider starts here
  always@(posedge clk or posedge rst) begin
    if(rst)begin
      remainder<={22'b0,1'b1,A[22:0]};
      quotientreg<=24'b0;
      divisorreg<={1'b1,B[22:0],23'b0};
      counter<=7'b0;
      active<='b0;
      outreg<=24'b0;
    end
    else if(start)begin
      if(counter<25)begin//25
        remainder<=remainder-divisorreg;
        if(remainder[46])begin
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b0};
        end
        else begin// restore if less than zero
          remainder<=remainder+divisorreg;
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b1};
        end
        // shift divisor to the right
        divisorreg<={1'b0,divisorreg[46:1]};
        counter<=counter+1'b1;
        active<=1'b1;
        outreg<=outreg;
      end
      else begin
        quotientreg<=quotientreg;
        divisorreg<=divisorreg;
        remainder<=remainder;
        counter<=counter;
        active<=1'b0;
        outreg<=quotientreg;
      end
    end
    else begin
      quotientreg<=quotientreg;
      outreg<=outreg;
      divisorreg<=divisorreg;
      remainder<=remainder;
      counter<=counter;
      active<=1'b0;
    end
  end
  assign OUT[30:23]=expsub[7:0];
  assign finish=~active;
  assign OUT[22:0]=outreg[22:0];
  // set the sign
  // xor the signs together
  assign OUT[31]={A[31] ^ B[31]};
endmodule
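
The divider appears to follow a shift-and-subtract (restoring) scheme on the 24-bit significands, with the exponents subtracted separately. In textbook form, and only as a sketch (R, D, q, m and E are symbols introduced here; the listing's cycle-by-cycle behaviour may differ from this idealized recurrence):

```latex
\begin{aligned}
R_0 &= m_A, \qquad D_0 = m_B \\
q_i &= \begin{cases} 1 & R_i - D_i \ge 0 \\ 0 & \text{otherwise} \end{cases} \\
R_{i+1} &= R_i - q_i D_i, \qquad D_{i+1} = D_i / 2 \\
E_{\mathrm{out}} &= (E_A-127)-(E_B-127)+127 = E_A - E_B + 127 \\
\text{e.g. } 5 / 2:&\quad E_A=129,\; E_B=128,\; m_A=1.25,\; m_B=1
  \;\Rightarrow\; q = 1.01_2 = 1.25,\; E_{\mathrm{out}}=128,\; 1.25\cdot 2^{1}=2.5
\end{aligned}
```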


Floating Point to Integer Conversion Module


module float2int(IN,clk,rst,integerOUT,numerator,denominator,sign,INTexp,start,finish);
  input [31:0] IN;//the_float
  input clk,rst,start;
  output [31:0] integerOUT,numerator,denominator;
  output sign,finish;
  output [7:0] INTexp;
  wire signed [7:0] diff;
  wire unsigned [7:0]expIN;
  wire [63:0]fraction;
  wire [31:0] fractionIN,integerIN,integerOUT;
  reg active;
  reg [31:0] bincount,denominator,numerator;
  reg [63:0]fractionshift;
  reg [7:0] counter,intexp;
  assign fraction[31:9]=IN[22:0];
  assign fraction[32]=1;
  assign expIN=IN[30:23];
  assign diff=expIN-127;                  // normalize the exponent
  assign integerIN=fractionshift[63:32];
  assign fractionIN=fractionshift[31:0];
  assign integerOUT=fractionshift[63:32]; //the integer
  assign sign=IN[31];                     //the positive/negative sign
  assign INTexp=intexp-127;
  assign finish=~active;
  //shift A into integer and fraction
  always@(posedge rst or posedge clk)begin
    if (rst) begin
      fractionshift<=64'b0;
      intexp<=8'b0;
    end
    else if(expIN<= 159 && expIN>= 95)begin
      if(expIN<127)begin
        fractionshift<=fraction>>(-diff);
        intexp<=expIN+(-diff);
      end
      else begin
        fractionshift<=fraction<<diff;
        intexp<=expIN-diff;
      end
    end
    else if(expIN>159)begin // for a large integer
      fractionshift<=fraction<<31;
      intexp<=expIN-5'b11111; // decrement exponent
    end
    else if(expIN<95)begin // for a small fraction
      fractionshift<=fraction>>31;
      intexp<=expIN+5'b11111; // increment exponent
    end
    else begin
      fractionshift<=fraction;
      intexp<=intexp;
    end
  end
  // find the numerator and denominator integers of the floating point
  // by adding the fractions 1/2+1/4+1/8..etc = 0.875=7/8
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      counter<=32;//0
      bincount<=1;
      numerator<=1'b0;
      denominator<=1'b1;
      active<=1'b0;
    end
    else if(start)begin
      if(counter>0)begin
        counter<=counter-1'b1;
        bincount<=bincount*2'b10;
        active<=1'b1;
        if(fractionIN[counter])begin //cross multiplying
          denominator<=bincount*denominator;
          numerator<=bincount*numerator+denominator;
        end
        else begin
          numerator<=numerator;
          denominator<=denominator;
        end
      end
      else begin
        counter<=counter;
        bincount<=bincount;
        numerator<=numerator;
        denominator<=denominator;
        active<=1'b0;
      end
    end
  end
endmodule
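
The cross-multiplying step in the converter's comments ("1/2+1/4+1/8 = 0.875 = 7/8") can be written out as a worked equation. This is a sketch of the arithmetic identity only; n, d and k are symbols introduced here, with n/d the running fraction and 2^{-k} the weight of the next set bit:

```latex
\begin{aligned}
\frac{n}{d} + \frac{1}{2^{k}} &= \frac{2^{k} n + d}{2^{k} d} \\
0.111_2:\quad \frac{0}{1}
 &\;\to\; \frac{2\cdot 0 + 1}{2\cdot 1} = \frac{1}{2}
 \;\to\; \frac{4\cdot 1 + 2}{4\cdot 2} = \frac{6}{8}
 \;\to\; \frac{8\cdot 6 + 8}{8\cdot 8} = \frac{56}{64} = \frac{7}{8} = 0.875
\end{aligned}
```

The fraction is not reduced between steps, so the denominator grows with every set bit; this is worth keeping in mind when sizing the numerator and denominator registers.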


Integer to Floating Point Conversion Module


module INT2FLOAT(in,out,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0]in;
  output [31:0] out;
  output finish;
  reg [64:0] shiftreg,fshiftreg;
  reg [7:0] shiftexp,fshiftexp;
  reg active,first,sign;
  always@(posedge clk or posedge rst)begin
    if (rst)begin
      shiftreg<=65'b0;
      shiftexp<=8'b10111111;// 159=8'b10011111 //191=10111111
      active<=1'b0;
      first<=1'b1;
      fshiftreg<=65'b0;
      fshiftexp<=8'b10001110;
      sign<=1'b0;
    end
    else if(start)begin
      if(first)begin
        if(in[31])begin// if negative
          shiftreg[31:0]<=~in[31:0]+1'b1;
          sign<=1'b1;
        end
        else begin
          shiftreg[31:0]<=in[31:0];
          sign<=1'b0;
        end
        shiftexp<=shiftexp;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
      end
      else if(!shiftreg[64])begin
        shiftreg<=shiftreg<<1'b1;
        shiftexp<=shiftexp-1'b1;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
        sign<=sign;
      end
      else begin
        shiftreg<=shiftreg;
        shiftexp<=shiftexp;
        fshiftreg<=shiftreg;
        fshiftexp<=shiftexp;
        active<=1'b0;
        first<=1'b0;
        sign<=sign;
      end
    end
    else begin
      shiftreg<=shiftreg;
      shiftexp<=shiftexp;
      fshiftreg<=fshiftreg;
      fshiftexp<=fshiftexp;
      active<=active;
      first<=first;
      sign<=sign;
    end
  end
  assign out={sign,fshiftexp[7:0],fshiftreg[63:41]};
  assign finish=~active;
endmodule


Power Module
module Power(A,B,OUT,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output finish;
  output [31:0] OUT;
  wire [63:0] log,mullog,mullog2;
  wire [31:0] integerOUT,numerator,denominator,OUTmul;
  wire [7:0] expPow;
  wire ffloat;
  reg [63:0] normInt,normFr;
  reg [31:0] check,checkout;
  reg active;
  float2int float2pow(B,clk,rst,integerOUT,numerator,denominator,sign,expPow,start,ffloat);
  assign log=(A*100)/8388608-127*100;        // convert to log A/(2^(23))-127;
  assign mullog=(numerator*log/denominator); //+log apply to the power of B
  assign mullog2=log*integerOUT;             //separately include the integer
  //check for invalids
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      normInt<=63'b0;
      normFr<=63'b0;
      active<=1'b1;
    end
    else if(ffloat)begin
      if(A<=10'd1065353216)begin // if negative or zero
        normInt<=32'b1111111100000000000000000000000;// NaN
        normFr<=32'b0;
        active<=1'b0;
      end
      else if(numerator==0)begin
        normFr<=32'b0;
        normInt<=((mullog2+127*100)*8388608)/100;
        active<=1'b0;
      end
      else if(integerOUT==0)begin
        normInt<=32'b0;
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end
      else begin //because it can't do log2(0)
        normInt<=((mullog2+127*100)*8388608)/100; // convert from log (A+127)*(2^(23));
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end
    end
    else begin
      normInt<=normInt;
      normFr<=normInt;
      active<=1'b1;
    end
  end
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      check<=32'b0;
      checkout<=32'b0;
    end
    else if(!active)begin
      if(B[31]==1'b1)begin
        check<=normFr+normInt;
        checkout<={check[31],(~check[30:23]+1'b1),check[22:0]};
      end
      else begin
        check<=check;
        checkout<=normFr+normInt;
      end
    end
    else begin
      check<=check;
      checkout<=checkout;
    end
  end
  assign OUT=checkout;
  assign finish=~active;
endmodule


Square Root Module


module SQRT(A,OUT,clk,rst);
  input [31:0]A;
  input clk,rst;
  output [31:0]OUT;
  wire [63:0] logrt,mulrt;
  reg [31:0] normIntr;
  assign logrt=(A*100)/8388608-127*100; // convert to log A/(2^(23))-127;
  assign mulrt=logrt/2;                 // apply the root
  always@(posedge clk or posedge rst)begin
    if(rst)
      normIntr<=0;
    else if(A<=10'd1065353216) // if negative or zero
      normIntr<=32'b1111111100000000000000000000000;// NaN
    else
      normIntr<=((mulrt+127*100)*8388608)/100; // convert from log (A+127)*(2^(23));
  end
  assign OUT=normIntr;
endmodule
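
The "convert to log" and "convert from log" comments rely on reading the float's bit pattern as an integer. A worked sketch of that approximation, assuming a positive normalized input x with stored exponent E and fraction f (I, E and f are symbols introduced here; the factor of 100 in the listing only keeps two decimal places in integer arithmetic):

```latex
\begin{aligned}
x &= 2^{E-127}(1+f), \qquad I = 2^{23}\,(E + f) \\
\log_2 x &= (E-127) + \log_2(1+f) \;\approx\; (E-127) + f = \frac{I}{2^{23}} - 127 \\
I_{\sqrt{x}} &\approx 2^{23}\left(\tfrac{1}{2}\left(\frac{I}{2^{23}} - 127\right) + 127\right) \\
\text{e.g. } x=4:&\quad E=129,\; f=0 \;\Rightarrow\; \log_2 x = 2,\;
  \tfrac{1}{2}\cdot 2 = 1,\; I_{\sqrt{x}} = 128\cdot 2^{23} \;\Rightarrow\; \sqrt{x}=2
\end{aligned}
```

The same idea gives a power function by multiplying the approximate logarithm by the exponent B instead of halving it. The approximation log2(1+f) ≈ f is exact only at f = 0 and f = 1, so some error should be expected for inputs that are not powers of two.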


Control Module
module Control(opcode,A,B,clk,rst,valueout);
  input clk,rst;
  input[2:0] opcode;
  input[31:0] A,B;
  output[31:0] valueout;
  reg [31:0]OUT;
  reg [31:0] addA,addB,subA,subB,divA,divB,mulA,mulB,powA,powB,sqrtA;
  reg sdiv,spow,sadd,ssub,smul,ssqrt,finish;
  wire [31:0] addOUT,subOUT,OUTdiv,OUTmul,OUTpow,root;
  // declare constants
  wire[31:0] Inf,NaN,Zero,One;
  wire /*fpow,fdiv,fadd,fsub,fmul,fsqrt,*/addof,subof,mulof,divof;
  assign Inf=32'b1111111100000000000000000000000;
  assign NaN=32'b1111111110000000000000000000000;
  assign One=32'b0011111110000000000000000000000;
  assign Zero=32'b0000000000000000000000000000000;
  adder addition(addA,addB,addOUT,clk,rst,addof,sadd,fadd);          //A+B
  subtractor subtraction(subA,subB,subOUT,clk,rst,subof,ssub,fsub);  //A-B
  floatmul floatmulA(mulA,mulB,OUTmul,clk,rst,mulof,smul,fmul);      // A*B
  floatdiv floatdivA(divA,divB,OUTdiv,clk,rst,divof,sdiv,fdiv);      // A/B
  Power power(powA,powB,OUTpow,clk,rst,spow,fpow);                   //A^B
  SQRT squareroot(sqrtA,root,clk,rst);
  // check for Zeros NaN & INFs inputs
  // check for Special Case Statements
  always@(posedge clk or posedge opcode)begin
    // opcode case statements
    case(opcode)
      0: begin
        sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
        addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
        OUT<=NaN;
      end
      //For the Adder ===============================================================
      1: begin
        if(A[30:0]==Zero[30:0])
          OUT<=B;
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A==Inf || B==Inf)
          OUT<=Inf[30:0];
        else if(A[30:0]==B[30:0] && A[31]!=B[31]) //A+(-A) or (-A)+A
          OUT<=Zero;
        else if(A[31]==1'b1 && B[31]==1'b0)begin //-A+B = B-A
          subB<={1'b0,A[30:0]};
          subA<=B;
          OUT<=subOUT;
        end
        else if (A[31]==1'b0 && B[31]==1'b1)begin //A+-B = A-B
          subB<={1'b0,B[30:0]};
          subA<=A;
          ssub<=1'b1;
          OUT<=subOUT;
        end
        else if (A[31]==1'b0 && B[31]==1'b1)begin //-A + -B = -(A+B)
          addB<={1'b0,B[30:0]};
          addA<={1'b0,A[30:0]};
          OUT<={1'b1,addOUT[30:0]};
          sadd<=1'b1;
        end
        else begin
          addA<=A;
          addB<=B;
          sadd<=1'b1;
          OUT<=addOUT;
        end
      end
      //For the Subtractor =============================================================
      2: begin
        if(A==B)
          OUT<=Zero;// just make it zero
        else if(A[30:0]==Zero[30:0])
          OUT<={~B[31],B[30:0]};
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A[31]==1'b1 && B[31]==1'b0)begin //-A - B = -(B+A)
          addA<={1'b0,A[30:0]};
          addB<=B;
          sadd<=1'b1;
          OUT<={1'b1,addOUT[30:0]};
        end
        else if (A[31]==1'b0 && B[31]==1'b1)begin //A - -B = A+B
          addB<={1'b0,B[30:0]};
          addA<=A;
          sadd<=1'b1;
          OUT<={1'b0,addOUT[30:0]};
        end
        else if (A[31]==1'b1 && B[31]==1'b1)begin //- A - -B = B-A
          subA<={1'b0,B[30:0]};
          subB<={1'b0,A[30:0]};
          ssub<=1'b1;
          OUT<=subOUT;
        end
        else begin
          subA<=A;
          subB<=B;
          ssub<=1'b1;
          OUT<=subOUT;
        end
      end
      // For the Multiplier =============================================================
      3: begin
        if (A[30:0]==Zero[30:0]|| B[30:0]==Zero[30:0]) //if(A*Zero)
          OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
        else if(A[30:0]==Inf[30:0]||B[30:0]==Inf[30:0]) //if(A*Inf)
          OUT<=Zero; // Zero
        else begin
          mulA<=A;
          mulB<=B;
          smul<=1'b1;
          OUT<=OUTmul;
        end
      end
      // For the Divider ============================================================
      4: begin
        // varieties of Zero or NaN or Inf
        if(A[30:0]==Zero[30:0])
          OUT<=Zero; // Zero
        else if(B[30:0]==Zero[30:0])
          OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
        else if(B[30:0]==Inf[30:0]) //if(A/Inf)
          OUT<=Zero; // Zero
        else if(A[30:0]==B[30:0]) // 1
          OUT[31:0]<={{A[31]^B[31]},One[30:0]};//One
        else begin
          divA<=A;
          divB<=B;
          sdiv=1'b1;
          OUT<=OUTdiv;
        end
      end
      // For the Power ==============================================================
      5: begin
        if(A[31])
          OUT<=NaN;
        else if (A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        if(B[30:0]==Zero[30:0])
          OUT<=One;
        else begin
          powA<=A;
          powB<=B;
          spow<=1'b1;
          if(fpow)
            OUT<=OUTpow;
        end
      end
      // For the SquareRoot ============================================================
      6: begin
        if(A[31])
          OUT<=NaN;
        else if (A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        else begin
          sqrtA<=A;
          OUT<=root;
        end
      end
      // Default Case ===============================================================
      default: begin
        sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
        addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
        OUT<=NaN;
      end
    endcase
  end
  // output the output value
  assign valueout=OUT;
endmodule


Appendix B: Digital Testing Results


Standard Case Waveforms
Addition

Subtraction

Multiplication

Division

Power

Square-root


Corner Case Tables


A = SMALLEST, B = 5

         Real Value        Floating Point Value                  FPU Value
A        SMALLEST          0_00000001_00000000000000000000000    5.8774717e-39
B        5                 0_10000001_01000000000000000000000    5
Add      5                 0_10000001_01000000000000000000000    5
Sub      5                 0_10000001_01000000000000000000000    5
Mul      2.9387e-038       0_00000011_11000000000000000000000    8.2284604e-38
Div      1.1755e-039       0_11111111_11001100110011001100101    INF
Pow*     7.0138e-192       0_01110011_10000101000111101011100    3.7109374e-4
SQRT     2.6484e-096       0_01000000_00000000000000000000000    1.0842021e-19

A = LARGEST, B = 5

         Real Value        Floating Point Value                  FPU Value
A        LARGEST           0_11111110_11111111111111111111110    3.4028232e+38
B        5                 0_10000001_01000000000000000000000    5
Add                        0_11111110_11111111111111111111110    3.4028232e+38
Sub                        1_11111110_11111111111111111111110    3.4028232e+38
Mul      1.7014e+039       0_00000000_01111111111111111111100*   Overflow(INF)
Div      6.8056e+037       0_11111100_11001100110011001100100    7.6563520e+37
Pow      4.5624e+192       1_01111101_11110011001100110011001    -4.8749998e-1
SQRT     2.1360e+096       0_10111110_11111101011100001010001    1.8354509e+19

A = -SMALLEST, B = 5

         Real Value        Floating Point Value                  FPU Value
A        -SMALLEST         1_00000001_00000000000000000000000    -5.8774717e-39
B        5                 0_10000001_01000000000000000000000    5
Add      5                 0_10000001_01000000000000000000000    5
Sub      5                 1_10000001_01000000000000000000000    5
Mul      -2.9387e-038      1_00000011_01000000000000000000000    -8.2284604e-38
Div      -1.1755e-039      1_11111111_11001100110011001100101    -INF
Pow      -7.0138e-192      1_10001000_00000000000000000000000    -5.1200000e+2
SQRT     NaN               0_11111111_10000000000000000000000    NaN

A = -LARGEST, B = 5

         Real Value        Floating Point Value                  FPU Value
A        -LARGEST          1_11111110_11111111111111111111110    -3.4028232e+38
B        5                 0_10000001_01000000000000000000000    5
Add      -3.4028232e+38    1_11111110_11111111111111111111110    -3.4028232e+38
Sub      -3.4028232e+38    1_11111110_11111111111111111111110    -3.4028232e+38
Mul      -1.7014e+039      1_00000000_01111111111111111111100*   Overflow(INF)
Div      -6.8056e+037      1_11111100_11001100110011001100100    7.6563520e+37
Pow      4.5624e+192       0_01111101_11110011001100110011001    4.8749998e-1
SQRT     NaN               0_11111111_10000000000000000000000    NaN

* note: the corner cases are too large for the power unit algorithm to handle
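
The bit patterns in the Floating Point Value column are written as sign_exponent_fraction. As a worked sketch of how such a pattern maps to a real number, assuming the normalized IEEE 754 single-precision interpretation (s, E and f are symbols introduced here), two entries from the tables decode as follows:

```latex
\begin{aligned}
x &= (-1)^{s}\,(1 + f)\,2^{E-127} \\
\texttt{0\_10000001\_0100\ldots0}:&\quad s=0,\; E=129,\; f=0.25
  \;\Rightarrow\; x = 1.25\cdot 2^{2} = 5 \\
\texttt{0\_00000011\_1100\ldots0}:&\quad s=0,\; E=3,\; f=0.75
  \;\Rightarrow\; x = 1.75\cdot 2^{-124} \approx 8.2284604\times 10^{-38}
\end{aligned}
```

The first line reproduces the operand B = 5 and the second reproduces the FPU Value listed for Mul in the SMALLEST case.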
