Floating-Point Representation

http://fourier.eng.hmc.edu/e85/lectures/arithmetic_html/node11.html

Decimal Cases

In general, a floating-point number can be written as

$$\pm M \times B^{E}$$

where M is the fraction, mantissa, or significand; E is the exponent; and B is the base, with $B = 10$ in the decimal case. In programming, a floating-point number is expressed in the same mantissa-and-exponent form.

Binary Cases

As an example, a 32-bit word is used in the MIPS computer to represent a floating-point number:

sign: 1 bit | exponent E: 8 bits | significand M: 23 bits

representing $\pm M \times 2^{E}$, with the sign given by the sign bit.

The implied base is 2 (not explicitly shown in the representation). The exponent can be represented in signed 2's complement (but also see biased notation later). The implied binary point is between the exponent field E and the significand field M. More bits in field E mean a larger range of representable values. More bits in field M mean higher precision. Zero is represented by all bits equal to 0.

Normalization

To use the bits available for the significand efficiently, it is shifted to the left until all leading 0's disappear.
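To make the 32-bit layout above concrete before going further into normalization, here is a minimal Python sketch that splits a word into its three fields and evaluates it. It assumes, as stated above, an exponent in signed 2's complement and a significand read as a pure fraction with the binary point immediately to its left (no implied 1). The function name `decode_word` and the sample bit patterns are illustrative only and do not come from the original notes.

```python
def decode_word(word: int) -> float:
    """Evaluate a 32-bit word laid out as 1 sign bit | 8 exponent bits | 23 significand bits.

    Assumptions (for illustration): exponent in signed 2's complement,
    significand read as the fraction 0.M, no implied 1.
    """
    sign = (word >> 31) & 0x1
    exp_field = (word >> 23) & 0xFF
    sig_field = word & 0x7FFFFF

    # Interpret the 8-bit exponent field as a signed 2's complement integer.
    exponent = exp_field - 256 if exp_field & 0x80 else exp_field

    # The binary point sits just left of the 23 significand bits.
    fraction = sig_field / 2**23

    value = fraction * 2.0**exponent
    return -value if sign else value


# 0 | 00000011 | 1010...0  ->  0.101b * 2^3 = 101b = 5.0
word = (0 << 31) | (3 << 23) | (0b101 << 20)
print(decode_word(word))   # 5.0
print(decode_word(0))      # 0.0 (all bits zero represents zero)
```

Widening the exponent field in such a sketch would enlarge the range of values, while widening the significand field would add precision, matching the trade-off described above.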


6/3/2012 7:17 AM

Leading 0's make no contribution to the precision, so nothing is lost by shifting them out of the significand, and the value can be kept unchanged by adjusting the exponent accordingly. Moreover, as the MSB of a normalized significand is always 1, it does not need to be shown explicitly. The significand can therefore be shifted to the left by 1 more bit to gain one more bit of precision; the first bit 1 before the binary point is then implicit. The actual value represented is

$$\pm 1.M \times 2^{E}$$

However, to avoid possible confusion, in the following the default normalization does not assume this implicit 1 unless otherwise specified. Zero is represented by all 0's and is not (and cannot be) normalized.

Example: A binary number can be represented in 14-bit floating-point form (1 sign bit, a 4-bit exponent field and a 9-bit significand field) in several ways, including one with an implied 1. By normalization, the highest precision can be achieved.

Biased Notation for Exponent

To simplify the hardware for comparing two exponents (to use simpler integer sorting rather than subtraction), we may want to avoid 2's complement representation for the exponent. This can be done by simply adding 1 (a bias) at the MSB of the exponent field; the resulting representation is called biased notation. Consider a 5-bit exponent field (range of exponents: $-16 \le E \le 15$), for which the bias is $10000_2 = 16$.
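The 5-bit case can be tabulated with a few lines of Python. This is only an illustrative sketch (the helper names `twos_complement` and `biased` are not from the original notes). It prints each exponent in both encodings and checks that biased codes, read as unsigned integers, sort in the same order as the exponents they represent, which is what allows a plain integer comparison.

```python
E_BITS = 5
BIAS = 1 << (E_BITS - 1)          # 10000b = 16: a 1 at the MSB of the field


def twos_complement(e: int, bits: int = E_BITS) -> str:
    """Bit pattern of exponent e in signed 2's complement."""
    return format(e & ((1 << bits) - 1), f"0{bits}b")


def biased(e: int, bits: int = E_BITS) -> str:
    """Bit pattern of exponent e in biased notation (e + bias, unsigned)."""
    return format(e + (1 << (bits - 1)), f"0{bits}b")


exponents = range(-BIAS, BIAS)    # -16 .. 15
print("exp  2's compl  biased")
for e in exponents:
    print(f"{e:>3}  {twos_complement(e):>9}  {biased(e):>6}")

# Biased codes, read as unsigned integers, are ordered like the exponents.
codes = [int(biased(e), 2) for e in exponents]
assert codes == sorted(codes)

# The two encodings differ only in the MSB.
assert all(
    int(biased(e), 2) ^ int(twos_complement(e), 2) == BIAS for e in exponents
)
```

In the printed table the most negative exponent, -16, maps to 00000 and the most positive, 15, maps to 11111, while in 2's complement the negative exponents would sort above the positive ones.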

The bias depends on the number of bits in the exponent field. If there are e bits in this field, the bias is $2^{e-1}$, which lifts the representation (not the actual exponent) by half of the range to get rid of the negative parts represented by 2's complement. The range of actual exponents represented is still the same. With the biased exponent $E_b = E + 2^{e-1}$, the value represented by the notation is

$$\pm M \times 2^{E_b - 2^{e-1}}$$

Floating-Point Notation of IEEE 754

The single-precision format of the IEEE 754 floating-point standard uses 32 bits to represent a floating-point number, including 1 sign bit, 8 exponent bits and 23 bits for the significand. As the implied base is 2, an implied 1 is used, i.e., the significand has effectively 24 bits, including 1 implied bit to the left of the binary point that is not explicitly represented in the notation. Note in particular that in IEEE 754 notation the bias for the 8-bit exponent is $2^{8-1} - 1 = 127$ (instead of $2^{8-1} = 128$). The 8-bit exponent field is interpreted with this bias, with the exceptions noted next.
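As an illustration of the IEEE 754 layout just described, the following Python sketch uses the standard `struct` module to obtain the 32-bit pattern of a number and split it into sign, biased exponent and fraction. The function names `ieee754_fields` and `value_from_fields` are hypothetical, and the reconstruction handles only normalized numbers (it ignores the reserved all-zeros and all-ones exponent fields).

```python
import struct


def ieee754_fields(x: float):
    """Return (sign, biased_exponent, fraction_bits) of x as a 32-bit IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exp = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, biased_exp, fraction


def value_from_fields(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild a normalized value: (-1)^sign * 1.fraction * 2^(biased_exp - 127)."""
    significand = 1 + fraction / 2**23     # restore the implied 1
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)


s, e, f = ieee754_fields(6.5)              # 6.5 = 1.101b * 2^2
print(s, e, format(f, "023b"))             # 0 129 10100000000000000000000
print(value_from_fields(s, e, f))          # 6.5
```

The biased exponent 129 corresponds to the actual exponent 129 - 127 = 2, and the leading 1 of 1.101 does not appear in the 23 stored fraction bits.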

Note: Zero exponent is represented by 127, the bias of the notation. The range of exponents representable is from -126 to 127. The smallest exponent field, 00000000, is reserved to represent zero and denormalized numbers (magnitudes smaller than $2^{-126}$, which cannot be normalized). The largest exponent field, 11111111, is reserved to represent infinities (with an all-zero significand) or not-a-number (NaN, with a nonzero significand), which may occur when, e.g., a number is divided by zero.

Other Implied Bases

Given e bits for the exponent field, the range of exponent values representable is $-2^{e-1} \le E \le 2^{e-1} - 1$ and the range of magnitudes representable is roughly $2^{-2^{e-1}}$ to $2^{2^{e-1}-1}$. This range can be extended by (a) increasing the number of bits for the exponent, or (b) increasing the implied base from 2 to 4, 8, 16, etc. (or in general, $2^q$). For example, when the implied base is $4 = 2^2$, the range of exponent values representable is still $-2^{e-1} \le E \le 2^{e-1} - 1$, but the range of magnitudes representable is about $4^{-2^{e-1}} = 2^{-2^{e}}$ to $4^{2^{e-1}-1} = 2^{2^{e}-2}$.

Normalization: If the implied base is $2^q$, the significand must be shifted a multiple of q bits at a time so that the exponent can be correspondingly adjusted to keep the value unchanged. If at least one of the first q bits of the significand is 1, the representation is normalized. Obviously, the implied 1 can no longer be used.

Examples: Normalize a significand when the base is 4 (instead of 2).
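The normalization rule for an implied base of $2^q$ can be sketched in Python as below. The function names `normalize` and `value`, the 8-bit significand width and the sample operands are assumptions chosen for illustration. The significand is treated as a fraction with the binary point at its left and no implied 1.

```python
def normalize(significand: int, exponent: int, q: int, width: int = 8):
    """Normalize a width-bit fractional significand for implied base 2**q.

    Shifts left q bits at a time (decrementing the exponent by 1 per shift)
    until at least one of the top q bits is 1. Zero cannot be normalized
    and is returned unchanged.
    """
    if significand == 0:
        return significand, exponent
    top_q_mask = ((1 << q) - 1) << (width - q)
    while (significand & top_q_mask) == 0:
        significand <<= q          # multiply the fraction by 2**q ...
        exponent -= 1              # ... and divide base**exponent by 2**q
    return significand, exponent


def value(significand: int, exponent: int, q: int, width: int = 8) -> float:
    """Value of the fraction 0.significand times (2**q)**exponent."""
    return significand / 2**width * (2.0**q) ** exponent


sig, exp = 0b00001011, 3

# Base 4 (q = 2): shift two bits at a time.
n_sig, n_exp = normalize(sig, exp, q=2)
print(format(n_sig, "08b"), n_exp)     # 10110000 1

# Base 8 (q = 3): shift three bits at a time; a leading 0 may remain.
m_sig, m_exp = normalize(sig, exp, q=3)
print(format(m_sig, "08b"), m_exp)     # 01011000 2
```

In the base-8 case the normalized significand still starts with a 0, which is why an implied 1 cannot be assumed once the base exceeds 2.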

Note that the significand has to be shifted to the left two bits at a time during normalization, because the smallest reduction of the exponent necessary to keep the value represented unchanged is 1, corresponding to dividing the value by 4. Similarly, if the implied base is 8, the significand has to be shifted 3 bits at a time. In general, if the implied base is $2^q$, normalization means left-shifting the significand q bits at a time until there is at least one 1 in the highest q bits of the significand. Obviously the implied 1 cannot be used.

Represent a value in biased notation with e bits for the exponent field and implied base 2: the bias is $2^{e-1}$, the biased exponent is the actual exponent plus the bias, and the notation can be written either without or with the implied 1.

Find the value represented in this biased notation: the biased exponent is 17, so the actual exponent is 17 minus the bias, and the value can again be read either without or with the implied 1.

Examples of IEEE 754, starting with -0.3125:

$-0.3125 = -0.0101_2 = -1.01_2 \times 2^{-2}$. The biased exponent is $-2 + 127 = 125 = 01111101_2$.

$1.0 = 1.0_2 \times 2^{0}$. The biased exponent is $0 + 127 = 127 = 01111111_2$.

$37.5 = 100101.1_2 = 1.001011_2 \times 2^{5}$. The biased exponent is $5 + 127 = 132 = 10000100_2$.

$-78.25 = -1001110.01_2 = -1.00111001_2 \times 2^{6}$. The biased exponent is $6 + 127 = 133 = 10000101_2$.

A magnitude smaller than $2^{-126}$: as the most negative exponent representable is -126, this value is a denorm which cannot be normalized.

Can you answer the following questions regarding 32-bit IEEE 754 floating-point representation and explain why?

What is the largest magnitude (absolute value) representable?
What is the smallest magnitude (absolute value) representable?
What is the largest gap between two consecutive numbers?
What is the smallest gap between two consecutive numbers?

Ruye Wang 2003-10-24