
2009 International Joint Conference on Artificial Intelligence

32 bit Multiplication and Division ALU Design Based on RISC Structure

Kui YI
Department of Computer Science and Information Engineer, WuHan Polytechnic University Wuhan, HuBei Province 430023, China Email: ykll1903@126.com

Yue-Hua DING
Department of Computer Science and Information Engineer, WuHan Polytechnic University Wuhan, HuBei Province 430023, China Email: ykll1903@126.com

Abstract: This paper analyses the structure and algorithm of a floating-point ALU and implements multiplication and division in the same hardware circuit. The floating-point multiplication and division ALU supports floating-point numbers that conform to the IEEE-754 standard. The ALU adopts a 4-level pipelining structure: (1) 0 operation number check, (2) exponent addition and subtraction, (3) fraction multiplication and division, and (4) result normalization and rounding. Each step can act as a single module. Between these modules there are registers that prepare the necessary data for the next operation.

Keywords: pipelining, IEEE-754 Standard, VHDL

978-0-7695-3615-6/09 $25.00 2009 IEEE DOI 10.1109/JCAI.2009.159

Authorized licensed use limited to: SRM University. Downloaded on July 17, 2009 at 03:45 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

In the age of digitalization and information, digital integrated circuits are used very widely. With the development of microelectronics technology, digital integration has progressed from vacuum tubes, transistors, and small- and medium-scale integration through VLSI (Very-Large-Scale Integration) to today's ASIC (Application Specific IC). ASICs reduce product cost, improve the physical size of products, and accelerate the digitalization of society. However, because of their long design cycle, the high cost of design revisions, and their low flexibility, the range of ASIC applications is restricted. As market demand grows, new very-large-scale, high-speed, low-power FPGA/CPLD devices are introduced constantly. Newer FPGAs even integrate a CPU or DSP (Digital Signal Processing) core. Such an FPGA allows software and hardware to be co-designed on the same FPGA chip, which provides strong hardware support for SOPC (System On Programmable Chip) [1].

This paper introduces a multiplication and division ALU that implements the floating-point multiplication and division operations of a CPU ALU, designed in VHDL with QuartusII. The ALU is designed in accordance with RISC design characteristics. The relations among the modules of the arithmetic unit and the function of each module are also explained in detail. This article provides the top-level structure of the multiplication and division implementation. The pipelining structure lets the modules work in parallel in time and raises the arithmetic speed of the whole CPU.

II. MULTIPLICATION AND DIVISION WORKING THEORY AND STRUCTURE

The 32 bit multiplication and division ALU design uses a 4-level pipeline: (1) 0 operation number check, (2) exponent addition and subtraction, (3) fraction multiplication and division, and (4) result normalization and rounding. Floating-point numbers conform to the IEEE 754 standard. The fraction is expressed in sign-magnitude form and the exponent is expressed in biased form. This ALU can raise CPU arithmetic speed and the performance of the whole system.

A. Key Questions to be Resolved

(1) Although multiplication and division are similar, their arithmetic processes differ in some details, so the operation signal INOP is designed to distinguish the arithmetic type.
(2) The multiplication and division processes must be separated correctly. The process is divided into 4 pipeline segments, and each of the 4 modules must have its own independent function.
(3) Because a pipelining structure is adopted, tasks must pass through the pipeline in sequence and enter it without interruption, so that the advantage of pipelining, higher CPU performance, is realized.

B. Key Technology and Complexity Analysis

1) Adopted Key Technology

The key technology adopted in the design is the pipelining structure. A pipeline in a computer works like a factory assembly line. To implement pipelining, each input task must first be divided into a series of sub-tasks, so that the sub-tasks can run in parallel at the different steps of the pipeline. Feeding tasks into the pipeline without interruption keeps the sub-tasks running in parallel. Pipelining therefore improves computer system performance substantially, and it is a very economical way to achieve time parallelism in a computer.

2) Complexity Analysis

(1) To avoid overflow, the design must check whether the divisor is 0. If the divisor is 0, an error must be signaled.
(2) Because IEEE 754 floating-point numbers are adopted in the design, the exponent is expressed in biased form. When exponents are added or subtracted, $[x + y]_{\text{biasing}} = [x]_{\text{biasing}} + [y]_{\text{complement}}$ and $[x - y]_{\text{biasing}} = [x]_{\text{biasing}} + [-y]_{\text{complement}}$, so both $[y]_{\text{complement}}$ and $[-y]_{\text{complement}}$ must be formed correctly for the subtraction [7].
(3) In fraction multiplication and division the fraction actually has the form 1.m, so the omitted 1 is restored for the actual arithmetic. The result of the arithmetic must also be in 1.m form, so the fraction of the final result must be processed.

C. Structure and Design of Floating-Point Number

1) Structure of Floating-Point Number

Because the multiplication and division ALU operates on 32 bit floating-point numbers and the floating-point numbers conform to the IEEE 754 standard, the structure of the floating-point numbers in the design is as shown in Figure. 1:
Figure. 1 Structure of Floating-Point Numbers
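
As a minimal sketch of the layout in Figure. 1, the following Python helpers split a 32 bit IEEE 754 single-precision word into S, E, and M and restore the hidden bit to give the 24 bit 1.m value. The function names are illustrative and do not come from the design, and only normalized numbers are handled.

```python
import struct

def float_to_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def split_fields(word):
    """Split a 32-bit word into (S, E, 24-bit fraction with the hidden 1)."""
    sign = (word >> 31) & 0x1            # S: bit 31
    exponent = (word >> 23) & 0xFF       # E: bits 30..23, biased
    fraction = word & 0x7FFFFF           # M: bits 22..0, hidden bit not stored
    significand = (1 << 23) | fraction   # restore the hidden 1 (normalized inputs only)
    return sign, exponent, significand

print(hex(float_to_bits(1.5)))            # 0x3fc00000
print(split_fields(float_to_bits(-2.0)))  # (1, 128, 8388608)
```

The 24 bit value returned last is the 1.m form that the fraction multiplication and division steps operate on.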

(Caption: In the floating-point number structure, M is the value of the fraction, which is expressed in sign-magnitude form with a hidden bit; E is the value of the exponent, which is expressed in biased form; S is the fraction sign.)

2) Floating-Point Numbers Multiplication and Division Arithmetic Flow

Figure. 2 Floating-Point Numbers Multiplication and Division Arithmetic Flow

Based on an analysis of the multiplication and division of two floating-point numbers, the multiplication and division ALU adopts a pipelining structure and divides the whole arithmetic process into 4 steps: (1) 0 operation number check, (2) exponent addition and subtraction, (3) fraction multiplication and division, and (4) result normalization and rounding. Registers are placed between the modules to store the result of the previous operation. The four modules act as four different sub-tasks. As long as operands are fed into the pipeline, the sub-tasks execute in parallel and time parallelism is achieved. Adding registers between the modules meets the needs of the pipelining structure, because time parallelism requires that sub-tasks can enter the pipeline without interruption, yet the execution times of the sub-tasks in the pipeline are not all the same. So that the arithmetic on data that entered the pipeline earlier is not disturbed, data that enters the pipeline later is stored temporarily in the registers. The abstract structure of the design is shown in Figure. 3.

Figure. 3 ALU Design Structure

III. TOP CONNECTION GRAPH AND MODULE DESIGN

A. Top Connection Graph

The top connection graph of the floating-point multiplication and division ALU is shown in Figures 4, 5, 6, and 7:

Figure. 4 the First Part of Connection Graphic

Figure. 5 the Second Part of Connection Graphic

Figure. 6 the Third Part of Connection Graphic

Figure. 7 the Fourth Part of Connection Graphic
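
The top-level arrangement of four modules separated by registers can be pictured with a small software model. The sketch below is only an illustration of time parallelism under the assumption that every stage takes one clock tick; the stage bodies are placeholders that record the module name, and `run_pipeline` is a hypothetical helper, not part of the design.

```python
def run_pipeline(stages, inputs):
    """Clock a linear pipeline and return (results, ticks).

    regs[i] models the register that latches the output of stage i.
    Stage outputs must not be None, which marks an empty register here.
    """
    regs = [None] * len(stages)
    pending = list(inputs)
    results, ticks = [], 0
    while pending or any(r is not None for r in regs[:-1]):
        new_regs = [None] * len(stages)
        if pending:                              # a new task enters stage 0
            new_regs[0] = stages[0](pending.pop(0))
        for i in range(1, len(stages)):          # each later stage reads the
            if regs[i - 1] is not None:          # register in front of it
                new_regs[i] = stages[i](regs[i - 1])
        regs = new_regs
        ticks += 1
        if regs[-1] is not None:                 # last register holds a finished task
            results.append(regs[-1])
    return results, ticks

# Placeholder stages: each one only records that the task passed through it.
STAGES = [lambda t, name=name: t + [name]
          for name in ("CHECKO", "ECODEOP", "MULT/DIVIDE", "OUTRESULT")]

results, ticks = run_pipeline(STAGES, [["op1"], ["op2"], ["op3"]])
print(ticks)       # 6 ticks for 3 tasks, instead of 3 * 4 = 12 without overlap
print(results[0])  # ['op1', 'CHECKO', 'ECODEOP', 'MULT/DIVIDE', 'OUTRESULT']
```

With n tasks and 4 stages the model finishes in n + 3 ticks, which is the gain that uninterrupted input to the pipeline is meant to deliver.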

(Caption for Figures 4 to 7: because the connection graph is too large, the whole graph is divided into 4 parts in order.)

B. 0 Operation Number Check Module (CHECKO)

This module is responsible for checking whether either of the two operation numbers is 0. In multiplication, if one operation number is 0 the result must be 0, whatever the other number is. In division, if the divisor is 0, overflow happens. If the divisor is not 0 but the dividend is 0, the final result must be 0. The abstract chip design is shown in Fig.4:
OUTDATA1, OUTDATA2: output the first and second data, respectively, when the result is not 0;
OUTERROR: indicates that the divisor is 0, that is, it activates the error alarm;
OUTDATA3: indicates that the final result is 0;
INOP: because multiplication and division differ, which leads to differences in the operation of each module, a pin is needed to distinguish the operation type. INOP = 0 denotes multiplication and INOP = 1 denotes division;
OUTOP: outputs the operation signal;
INDATA1, INDATA2: the two source operation numbers, which come from the preceding register.
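
The decision rules of the 0 operation number check can be sketched in a few lines of Python. This is a behavioural illustration only: it assumes that INDATA1 is the dividend and INDATA2 the divisor, treats both +0 and -0 as zero, and returns the pin values in a dictionary keyed by the pin names listed above.

```python
def is_zero(word):
    """True for +0 or -0: every bit except the sign bit is 0."""
    return (word & 0x7FFFFFFF) == 0

def check_zero(indata1, indata2, inop):
    """Model of the zero check. inop: 0 = multiplication, 1 = division.

    Assumption: indata1 is the dividend and indata2 the divisor.
    """
    out = {"OUTDATA1": indata1, "OUTDATA2": indata2,
           "OUTDATA3": False, "OUTERROR": False, "OUTOP": inop}
    if inop == 1 and is_zero(indata2):
        out["OUTERROR"] = True      # division by zero: raise the error alarm
    elif is_zero(indata1) or (inop == 0 and is_zero(indata2)):
        out["OUTDATA3"] = True      # the result is known to be 0
    return out

print(check_zero(0x3FC00000, 0x00000000, 1)["OUTERROR"])  # True
print(check_zero(0x00000000, 0x3FC00000, 1)["OUTDATA3"])  # True
```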

C. Exponent Addition and Subtraction Operation Module (ECODEOP)

The main function of this module is to add or subtract the exponents of the two numbers. The rule for floating-point multiplication is exponent addition and fraction multiplication; the rule for floating-point division is exponent subtraction and fraction division. In a computer the exponent is usually expressed in biased form. A biased code and the corresponding complement code have the same data bits but opposite sign bits, so converting the exponent to complement form makes exponent addition and subtraction possible. Suppose there are two floating-point numbers x and y. The complement form of y is obtained by negating the highest bit of the 8 bit exponent of y. For multiplication, $[x + y]_{\text{biasing}} = [x]_{\text{biasing}} + [y]_{\text{complement}}$. For division, the highest bit of the 8 bit exponent of y is negated and the complement of the result is then taken, which gives $[-y]_{\text{complement}}$, and $[x - y]_{\text{biasing}} = [x]_{\text{biasing}} + [-y]_{\text{complement}}$. When INOP equals 0, $[y]_{\text{complement}}$ is output and added to $[x]_{\text{biasing}}$, which is the multiplication operation. When INOP equals 1, $[-y]_{\text{complement}}$ is output and added to $[x]_{\text{biasing}}$, which is the division operation.

If the exponent result overflows, the rules above no longer hold. A double sign bit can be used for the exponent, with the second sign bit defined as the sign bit of the biased code. The highest bit always takes part in the addition and subtraction as 0, so the overflow condition is that the highest sign bit of the result is 1. In that case, if the low sign bit is 0 the result has overflowed, and if the low sign bit is 1 the result has underflowed. If the highest sign bit is 0, no overflow has happened: a low sign bit of 1 means the result is positive, and a low sign bit of 0 means the result is negative.

The two input data of the chip are INDATA1 and INDATA2. For the exponent addition and subtraction, the exponent bits are taken from bit 23 to bit 30 of each number. For the first number, the 8 bits that take part in the arithmetic are INDATA1(30 DOWNTO 23). For the second number, INDATA2(30) is negated first and then combined with the bits from bit 23 to bit 29, INDATA2(29 DOWNTO 23), to take part in the arithmetic. In the IEEE 754 standard the fraction of every floating-point number has the default form 1.m, but the input number does not store the 1. To ensure the correctness of the following fraction multiplication and division, the output data of this module must include the 1. The 1 is added to the fraction of each output number to give the 1.m form, so the fraction takes part in the next multiplication or division with 24 bits. The detailed chip design is shown in Fig.5.

D. Register 3 (FLOATERG3)

This register prepares the data for the following fraction multiplication and division. The signs of the two fractions must be known so that the sign of the result can be produced correctly. The sign bits of the two numbers are fetched and combined with a nonequivalence (XOR) operation. In the floating-point structure, bit 31 expresses the fraction sign. Because all input numbers are merged together into REGVALUE, whose bits up to REGVALUE(65) hold all of the data bits, the sign of the first number is REGVALUE(32) and the sign of the second number is REGVALUE(65). If the result of the nonequivalence operation is 1, the sign bits of the two numbers are opposite and the final result is a negative number. If the result is 0, the sign bits are the same and the final result is a positive number. The chip has one output pin, OUTCODE, on which the final sign bit and the final exponent are combined and output together. The detailed chip design is shown in Fig. 6.

E. Multiplication Arithmetic Module (MULT)

This module is responsible for multiplying the two input numbers. The MULT function unit is called in the design. Its two input pins are dataa[23......0] and datab[23......0]. Both are 24 bits wide, so the omitted 1 must be added to the fraction of each input number. The two numbers enter the unit in sign-magnitude form. The result of the arithmetic is 48 bits, [47......0], and the output pin is defined as result[47......0]. The detailed chip design is shown in Fig. 6.
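
The exponent, sign, and fraction steps described for ECODEOP, FLOATERG3, and MULT can be sketched as follows. This is a behavioural illustration, not the VHDL of the design, and the function names are hypothetical. One assumption should be stated loudly: inverting the highest bit converts an excess-128 code to complement form exactly, whereas IEEE 754 single precision uses a bias of 127, so the sketch adds 1 after the inversion. Only normalized exponent fields (1 to 254) are considered, and the status follows the double-sign-bit rule only.

```python
def exponent_complement(e_biased):
    """9-bit two's-complement pattern of the true exponent of an 8-bit biased field.

    Inverting the top bit converts an excess-128 code directly; with the
    IEEE 754 bias of 127 the pattern is one too small, hence the + 1
    (an assumption that is not stated in the text).
    """
    comp8 = ((e_biased ^ 0x80) + 1) & 0xFF
    return ((comp8 >> 7) << 8) | comp8      # copy the sign into the second sign bit

def exponent_op(ex, ey, inop):
    """Add (inop = 0) or subtract (inop = 1) exponents with a double sign bit."""
    y = exponent_complement(ey)
    if inop == 1:
        y = (-y) & 0x1FF                    # [-y]complement for division
    total = (ex + y) & 0x1FF                # [x]biasing enters with highest bit 0
    high, low = (total >> 8) & 1, (total >> 7) & 1
    if high == 1:
        return total & 0xFF, "overflow" if low == 0 else "underflow"
    return total & 0xFF, "ok"

def result_sign(sign_x, sign_y):
    """Nonequivalence (XOR) of the two sign bits: 1 means a negative result."""
    return sign_x ^ sign_y

def multiply_fractions(mx, my):
    """48-bit product of two 23-bit fraction fields with the hidden 1 restored."""
    return ((1 << 23) | mx) * ((1 << 23) | my)

print(exponent_op(128, 129, 0))   # 2.0 * 4.0 -> (130, 'ok'), i.e. 2**3
print(exponent_op(127, 128, 1))   # 1.0 / 2.0 -> (126, 'ok'), i.e. 2**-1
print(exponent_op(254, 254, 0))   # -> (125, 'overflow')
print(hex(multiply_fractions(0x400000, 0x400000)))  # 1.5 * 1.5 -> 0x900000000000
```

The 48 bit product of 1.5 and 1.5 has two integer bits, which is why a later normalization step is needed before the result can be stored in 1.m form.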

F. Division Arithmetic Module (DIVIDE)

This module is responsible for dividing the two input numbers. The DIVIDE function unit is called in the design. The two input numbers must also be in sign-magnitude form in this function unit. The chip has two input pins, number[23......0] and denom[23......0]. The arithmetic produces the quotient on quotient[23......0] and the remainder on remain[23......0]. Because the remainder is not considered in the design, only one output pin, quotient[23......0], is defined to express the final arithmetic result. The detailed chip structure is shown in Fig.6.

G. Register 4 (FLOATERG4)

This register examines the input data before it is output. Because the design has a 32 bit data bus, the results enter the register in sequence. The register chooses different data bits as the final output result according to INOP. The data in the register is ordered from the lowest bit to the highest bit; for example, the first data item occupies bit 0 to bit 23, and the other data items follow in the same way. The pins of the register are listed below, and the detailed chip design is shown in Fig. 7.
Input pin INOP: selects the arithmetic type;
Input pin CLK: clock signal;
Input pin INCODE[8......0]: sign and exponent of the final result;
Input pin INDATA3[47......0]: result of the multiplication arithmetic;
Input pin INDATA4[23......0]: final result of the division arithmetic;
Output pin OUTDATA[23......0]: the data value once the correct data bits have been selected.
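
A rough behavioural sketch of the fraction division and of the selection by INOP is given below. Two points are assumptions rather than facts from the design: the dividend is shifted left by 23 bits before the division so that the quotient keeps 23 fractional bits (the text does not say how the dividend is scaled), and for multiplication the upper 24 bits of the 48 bit product are the ones selected. The function names are hypothetical.

```python
def divide_fractions(mx, my):
    """Quotient of two 23-bit fraction fields with the hidden 1 restored.

    Assumption: the dividend is pre-shifted left by 23 bits so that the
    quotient keeps 23 fractional bits; the remainder is discarded.
    """
    numer = ((1 << 23) | mx) << 23
    denom = (1 << 23) | my
    return numer // denom               # fits in 24 bits for normalized inputs

def select_result(inop, product48, quotient24):
    """Pick 24 result bits according to INOP (0 = multiply, 1 = divide).

    Assumption: for multiplication the upper 24 bits of the product are kept.
    """
    return quotient24 if inop == 1 else (product48 >> 24) & 0xFFFFFF

print(hex(divide_fractions(0x400000, 0)))  # 1.5 / 1.0 -> 0xc00000
print(hex(divide_fractions(0, 0x400000)))  # 1.0 / 1.5 -> 0x555555
```

The second quotient, 0x555555, has a 0 in its top bit, so it is not yet in 1.m form.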

H. Normalization and Rounding Module (OUTRESULT)

This module is responsible for normalization and rounding of the final fraction result. The result cannot be output directly, because whether it comes from multiplication or division, the fraction must be in 1.m form to conform to the IEEE 754 format. A fraction that is not in 1.m form must therefore be normalized by left shifting until the final result is in 1.m form. This is handled in the register: if the fraction has the form 0.1m, it is shifted left and the vacated lowest bit is filled with 0; if the fraction has the form 0.01m, it is shifted left and the vacated lowest bits are filled with 00. Because the fraction of a floating-point number lies between 1/2 and 1, a multiplication result needs to be left-shifted by at most two bits during normalization. In fraction multiplication the low-order half of the double-length product, and in fraction division the extra quotient bits produced when the division is inexact, extend beyond the defined word length. When the arithmetic result is normalized, these retained bits can be shifted left into the defined word length of the fraction. Rounding toward +infinity is adopted in the design, so no guard digits need to be kept. The final result is placed on the output pin OUTDATA[31......0]. This is the final result of the multiplication or division arithmetic. The detailed chip design is shown in Fig. 7.

IV. CONCLUSION

The final hardware, a RISC microprocessor based on an FPGA, was tested on the GW48 EDA system produced by Hangzhou Kang-Xin Corp. Simulation, integration, and synthesis in QuartusII 4.1 show that the floating-point multiplication and division ALU conforming to the IEEE 754 standard performs its expected function. The pipelining structure lets the sub-tasks operate in parallel and improves the system performance of the computer.

REFERENCE
[1] Mo-JianKun, Gao-JianSheng, Computer Organization, Huazhong University of Science and Technology Press, 1996;
[2] Zheng-WeiMin, Computer System Structure, Tsinghua University Press, 2004.10;
[3] Bai-ZhongYing, Computer Organization, Science Press, 2000.11;
[4] Pan-Song, Huang-JiYe, EDA Technology Utility Tutorial [M], BeiJing: Science Press, 2002;
[5] Zhang-XiuJuan, Chen-XinHua, EDA Design and emulation Practice [M], BeiJing: Engine Industry Press, 2003, WeiPu Information;
[6] "IEEE Standard of Binary Floating-Point Arithmetic", IEEE Standard 754, IEEE Computer Society, 1985;
[7] Jan M. Rabaey, Digital Integrated Circuits - A Design Perspective, Prentice Hall International, Inc., Tsinghua University Press, 1999.2;
[8] John L. Hennessy, David A. Patterson, Computer Organization & Design: The Hardware/Software Interface, Engine Industry Press, 1999.9;
[9] Aldec Active-HDL, the Design Verification Company, Online Help;
