Floating-Point Representation




Decimal Cases

In programming, a floating-point number is expressed as a significand together with an exponent. In general, a floating-point number can be written as

$$\pm M \times B^{E}$$

where M is the fraction, mantissa, or significand, E is the exponent, and B is the base; in the decimal case $B = 10$.

Binary Cases

As an example, a 32-bit word is used in the MIPS computer to represent a floating-point number:

| sign | exponent E | significand M |
 1 bit     8 bits        23 bits

representing $\pm M \times 2^{E}$.

The implied base is 2 (not explicitly shown in the representation). The exponent can be represented in signed 2's complement (but also see biased notation later). The implied binary point is between the exponent field E and the significand field M. More bits in field E mean a larger range of representable values. More bits in field M mean higher precision. Zero is represented by all bits equal to 0.

Normalization

To use the bits available for the significand efficiently, the significand is shifted to the left until all leading 0's disappear (as they make no contribution to the precision). The value can be kept unchanged by adjusting the exponent accordingly. Moreover, as the MSB of the significand is then always 1, it does not need to be shown explicitly. The significand could be further shifted to the left by 1 bit to gain one more bit for precision. The first bit 1 before the binary point is implicit. The actual value represented is then $\pm 1.M \times 2^{E}$. However, to avoid possible confusion, in the following the default normalization does not assume this implicit 1 unless otherwise specified. Zero is represented by all 0's and is not (and cannot be) normalized.

Example: A binary number can be represented in 14-bit floating-point form (1 sign bit, a 4-bit exponent field and a 9-bit significand field) in several ways, either without or with an implied 1.0. By normalization, the highest precision can be achieved.

Biased Notation for Exponent

To simplify the hardware for comparing two exponents (to use simpler integer sorting rather than subtraction), we may want to avoid 2's complement representation for the exponent. This can be done by simply adding 1 (a bias) at the MSB of the exponent field, and the resulting representation is called biased notation. Consider a 5-bit exponent field (range of exponents: $-16$ to $15$):

actual exponent    2's complement    biased
     -16               10000          00000
      -1               11111          01111
       0               00000          10000
       1               00001          10001
      15               01111          11111
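The motivation for biased notation, comparing exponents by simple integer ordering, can be checked with a short sketch. The helper names `twos_complement_code` and `biased_code` are hypothetical and introduced here only for illustration; the bias is assumed to be $2^{e-1}$ for an e-bit field, i.e. a 1 added at the MSB.

```python
def twos_complement_code(exponent, bits):
    """Unsigned bit pattern of `exponent` in `bits`-bit two's complement."""
    return exponent & ((1 << bits) - 1)


def biased_code(exponent, bits):
    """Unsigned bit pattern of `exponent` with an assumed bias of 2**(bits - 1)."""
    return exponent + (1 << (bits - 1))


BITS = 5
exponents = list(range(-(1 << (BITS - 1)), 1 << (BITS - 1)))  # -16 .. 15

# Sort the exponents by their raw bit pattern, read as an unsigned integer.
by_biased = sorted(exponents, key=lambda e: biased_code(e, BITS))
by_twos = sorted(exponents, key=lambda e: twos_complement_code(e, BITS))

# Biased patterns keep the numeric order; two's complement patterns
# place every negative exponent after the non-negative ones.
print("biased order is numeric:          ", by_biased == exponents)
print("two's complement order is numeric:", by_twos == exponents)
```

With the biased code, an unsigned integer comparison of the two exponent fields is enough to decide which exponent is larger; with two's complement the sign bit has to be treated separately.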

The bias depends on the number of bits in the exponent field. If there are e bits in this field, the bias is $2^{e-1}$, which lifts the representation (not the actual exponent) by half of the range to get rid of the negative parts represented by 2's complement. The range of actual exponents represented is still the same. With the biased exponent, the value represented by the notation is:

$$\pm M \times 2^{E - 2^{e-1}}$$

where E is now the biased exponent stored in the field.

Floating-Point Notation of IEEE 754

The IEEE 754 single-precision floating-point format uses 32 bits to represent a floating-point number, including 1 sign bit, 8 exponent bits and 23 bits for the significand. As the implied base is 2, an implied 1 is used, i.e., the significand has effectively 24 bits, including 1 implied bit to the left of the binary point that is not explicitly represented in the notation. Note in particular that in IEEE 754 notation, the bias for the 8-bit exponent is $2^{8-1} - 1 = 127$ (instead of $2^{8-1} = 128$).

The 8-bit exponent field:

exponent field    biased value    actual exponent
   00000000             0         reserved (zero and denormalized numbers)
   00000001             1         -126
   01111111           127            0
   11111110           254          127
   11111111           255         reserved (infinities and NaN)
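As a rough illustration of the IEEE 754 layout described here, the sketch below splits a number into its sign, biased exponent and fraction fields using Python's standard `struct` module. The function names `float32_fields` and `float32_value` are hypothetical, and the sample values are the ones used in this section's IEEE 754 examples; only normalized numbers are handled.

```python
import struct


def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x stored as IEEE 754 single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # the 32 bits as an unsigned integer
    sign = word >> 31                      # 1 bit
    biased_exponent = (word >> 23) & 0xFF  # 8 bits
    fraction = word & 0x7FFFFF             # 23 bits, without the implied 1
    return sign, biased_exponent, fraction


def float32_value(sign, biased_exponent, fraction):
    """Value of a normalized number: (-1)^sign * 1.fraction * 2^(biased exponent - 127)."""
    significand = 1 + fraction / 2**23     # restore the implied 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)


for x in (1.0, -0.3125, 37.5, -78.25):
    s, e, f = float32_fields(x)
    print(f"{x:8}: sign={s} exponent={e:08b} (actual {e - 127:+d}) fraction={f:023b}")
```

Subtracting 127 from the stored exponent field recovers the actual exponent, and adding the implied 1 to the 23 stored fraction bits recovers the 24-bit significand.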

Note: Zero exponent is represented by 127 (01111111), the bias of the notation. The range of exponents representable is from -126 to 127. The largest exponent field (11111111) is reserved: with an all-zero significand it represents infinities, and otherwise not-a-number (NaN), which may occur when, e.g., a number is divided by zero. The smallest exponent field (00000000) is reserved to represent denormalized numbers (smaller than $2^{-126}$, which cannot be normalized) and zero.

Other Implied Bases

Given e bits for the exponent field, the range of exponent values representable is $-2^{e-1}$ to $2^{e-1}-1$, and the range of magnitudes representable is about $2^{-2^{e-1}}$ to $2^{2^{e-1}-1}$. For example, if $e = 8$, the range of exponent values representable is $-128$ to $127$, and the range of magnitudes representable is about $2^{-128}$ to $2^{127}$.

This range can be extended by (a) increasing the number of bits for the exponent, or (b) increasing the implied base from 2 to 4, 8, 16, etc. (or in general, $2^q$). For example, when the implied base is $4 = 2^2$ and $e = 8$, the range of magnitudes representable is about $4^{-128} = 2^{-256}$ to $4^{127} = 2^{254}$.

Normalization: If the implied base is $2^q$, the significand must be shifted a multiple of q bits at a time so that the exponent can be correspondingly adjusted to keep the value unchanged. If at least one of the first q bits of the significand is 1, the representation is normalized. Obviously, the implied 1 can no longer be used.

Example: normalization when the implied base is 4 (instead of 2).
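Normalization with an implied base of $2^q$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the notes' own procedure: the function names `normalize` and `value` are hypothetical, the significand is held as an unsigned integer of `width` bits read as the fraction 0.M, and exponent underflow is ignored.

```python
def normalize(significand, exponent, width, q):
    """Normalize (significand, exponent) for an implied base of 2**q.

    `significand` is a `width`-bit unsigned integer read as the fraction 0.M,
    so the value represented is significand / 2**width * (2**q)**exponent.
    """
    if significand == 0:
        return significand, exponent        # zero cannot be normalized
    top_q_mask = ((1 << q) - 1) << (width - q)
    while (significand & top_q_mask) == 0:  # highest q bits are all 0
        significand <<= q                   # multiply the fraction by 2**q ...
        exponent -= 1                       # ... and divide by the base once
    return significand, exponent


def value(significand, exponent, width, q):
    """Value represented by the unsigned fraction 0.M and the exponent."""
    return significand / 2**width * (2.0**q) ** exponent


# Base 4 (q = 2), 8-bit significand: 0.00000110 x 4^3 becomes 0.01100000 x 4^1.
m, e = normalize(0b00000110, 3, width=8, q=2)
print(f"{m:08b} x 4^{e} = {value(m, e, 8, 2)}")
```

In the base-4 result the leading bit of the significand is still 0, which is why an implied 1 cannot be assumed once the base is larger than 2.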

Note that the significand has to be shifted to the left two bits at a time during normalization, because the smallest reduction of the exponent necessary to keep the value represented unchanged is 1, corresponding to dividing the value by 4. Similarly, if the implied base is $8 = 2^3$, the significand has to be shifted 3 bits at a time. In general, if $B = 2^q$, normalization means to left shift the significand q bits at a time until there is at least one 1 in the highest q bits of the significand. Obviously the implied 1 cannot be used.

Examples of biased notation (implied base 2):

Represent a given number in biased notation with e bits for the exponent field. The bias is $2^{e-1}$, the biased exponent is the actual exponent plus the bias, and the notation can be written either without or with the implied 1.

Find the value represented in a given biased notation. In the example the biased exponent is 17; the actual exponent is the biased exponent minus the bias, and the value can be read either without or with the implied 1.

Examples of IEEE 754:

-0.3125: $-0.3125 = -1.01_2 \times 2^{-2}$. The biased exponent is $-2 + 127 = 125 = 01111101_2$.

1.0: $1.0 = 1.0_2 \times 2^{0}$. The biased exponent is $0 + 127 = 127 = 01111111_2$.

37.5: $37.5 = 100101.1_2 = 1.001011_2 \times 2^{5}$. The biased exponent is $5 + 127 = 132 = 10000100_2$.

-78.25: $-78.25 = -1001110.01_2 = -1.00111001_2 \times 2^{6}$. The biased exponent is $6 + 127 = 133 = 10000101_2$.

A value smaller in magnitude than $2^{-126}$: as the most negative exponent representable is -126, such a value is a denorm which cannot be normalized.

Can you answer the following questions regarding 32-bit IEEE 754 floating-point representation and explain why?

What is the largest magnitude (absolute value) representable?
What is the smallest magnitude (absolute value) representable?
What is the largest gap between two consecutive numbers?
What is the smallest gap between two consecutive numbers?

Ruye Wang 2003-10-24
http://fourier.eng.hmc.edu/e85/lectures/arithmetic_html/node11.html
