
02/11/2011 02/11/2011 1 1

Data Representation
Floating Point
Learning Objectives:
Demonstrate an understanding of floating
point representation of a real binary
number.
Normalise a real binary number
Discuss the trade-off between accuracy
and range when representing numbers.

Binary Range
Limited by the number of bits used to represent a number.
More bits means a wider range.
But even using 4 bytes (32 bits) to represent a number means that 4,294,967,295 (2^32 - 1) is the largest unsigned number which can be held.
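
A quick way to check these limits is to compute them. The sketch below is illustrative only (the function names are not from the slides); it shows the largest unsigned value and the two's complement range for a given number of bits.

```python
def unsigned_max(bits):
    """Largest value an unsigned binary number with this many bits can hold."""
    return 2 ** bits - 1


def twos_complement_range(bits):
    """Smallest and largest values of a two's complement number with this many bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16, 32):
    print(bits, unsigned_max(bits), twos_complement_range(bits))
```

Each extra bit doubles the number of patterns, so the range roughly doubles too.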

Fixed Point Binary
A number with a decimal point is known (strangely!) as a real number, as opposed to an integer, which is a whole number.
We can extend the binary system to represent real numbers by reserving some bits for the fractional part.
8    4    2    1    .    1/2  1/4  1/8  1/16
0    1    1    0    .    1    1    0    0
6.75 = 0110.1100
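
The place-value table can be checked with a few lines of code. This is a minimal sketch (the name `fixed_to_denary` is made up for illustration) that adds up the place values of every 1 bit either side of the binary point.

```python
from fractions import Fraction


def fixed_to_denary(bits):
    """Convert a fixed point binary string such as '0110.1100' to an exact fraction."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    # Bits after the point are worth 1/2, 1/4, 1/8, ... in turn.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value


print(float(fixed_to_denary("0110.1100")))  # 6.75
```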

Fixed Point Binary - Decimal Converter
Try using it to 'play' with fixed point binary.

Fixed Point Binary Range
The range is now even more limited, as some bits are reserved for the fractional part and so are no longer being used to hold higher numbers!

Fixed Point Binary Precision
Also the fractional part can only hold 4 places; any places after the first 4 will be either rounded or truncated, so precision will be lost.
This might first appear to be accurate enough for most purposes.
However, each binary digit after the point is worth half of the last, not 1/10 of it as in decimal values.
Example shown on next slide.

Fixed Point Binary Precision
110.1 = 6.5
110.11 = 6.75
We have missed out 6.51 to 6.74!
This means accuracy is poor.
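
To see the gaps for yourself, the sketch below lists every value a fixed point number can hold between two limits for a chosen number of fractional bits. The function name `representable` is an assumption made for this example.

```python
import math


def representable(low, high, fractional_bits):
    """List every fixed point value in [low, high] for this many bits after the point."""
    scale = 2 ** fractional_bits          # number of steps per whole unit
    first = math.ceil(low * scale)
    last = math.floor(high * scale)
    return [k / scale for k in range(first, last + 1)]


print(representable(6.5, 6.75, 2))  # [6.5, 6.75]
print(representable(6.5, 6.75, 4))  # [6.5, 6.5625, 6.625, 6.6875, 6.75]
```

Even with 4 bits after the point, the values are 1/16 apart, so most decimal fractions still cannot be held exactly.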

Floating Point (Fractional Real Numbers)
This increases the possible range of stored real numbers but not accuracy (this is only achieved by using more bytes (bits)):
e.g. 1,200,000,000,000 (decimal) = 0.12 * 10^13
0.12 = mantissa, 13 = exponent
A number is therefore held in two parts:
Mantissa
Exponent
Could be represented: 12 13 if it was understood that the first part is the mantissa and the second part is the exponent.

Mantissa & Exponent - 1 byte each
Most exam questions appear to use 8 bits for the mantissa and 8 bits for the exponent.
Try 'playing' with the Floating Point Binary - Decimal Converter.
Also use it whenever you need to for the rest of this presentation.

Mantissa
Represents the magnitude of the number and is the fractional part of the representation.
Place value of MSB is -1 and the other bits are 1/2, 1/4, 1/8 and so on.

Exponent
Represents the power of 2 by which the mantissa must be multiplied to give the original value.

Positive Mantissa & Positive Exponent
Denary -> Floating Point Binary

6.5
6.5 = 6 + 1/2
= 110.1 (fixed point binary)
= 0.1101 * 2^3
= 0.1101 * 2^11 (exponent in binary)
mantissa = 0 1101, exponent = 0 11 (sign bit first in each)
Add 0's to right of the mantissa and to left (before the sign bit) of the exponent.
= 01101000 00000011

Using an 8 bit byte for the mantissa and another 8 bit byte for the exponent, show 1.75 as a 2 byte, floating point number in two's complement form.
Try this independently first.

1.75
1.75 = 1 + 1/2 + 1/4
= 1.11 (binary fixed point)
= 0.111 * 2^1
= 0.111 * 2^00000001
01110000 00000001
mantissa exponent

Positive Mantissa & Positive Exponent
Floating Point Binary -> Denary

01101000 00000011
Exponent: 00000011 = 3
Mantissa: 0.1101000 (assumed binary point between sign bit and 2nd bit)
0.1101000 * 2^3
= 110.1
= 6.5

Using 8 bits for the mantissa, 8 bits for the exponent and storing the mantissa and the exponent in two's complement form.
Give the denary number which would have 01011000 00000011 as its binary, floating point representation.
Try this independently first.

01011000 00000011
Exponent: 00000011 = 3
Mantissa: 0.1011000 (assumed binary point between sign bit and 2nd bit)
0.1011000 * 2^3
= 101.1000
= 5.5

Positive Mantissa & Negative Exponent
Denary -> Floating Point Binary

0.125
0.125 = 1/8
= 0.001 (binary fixed point)
= 0.1 * 2^-2
-2 = -00000010
= 11111110 (two's complement)
01000000 11111110
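
The two's complement step for the exponent can be sketched in code. The helper names below are hypothetical, chosen for this example; the default of 8 bits matches the 1 byte exponent used in these slides.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit pattern of an integer as a string."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value does not fit in the given number of bits")
    # Taking the value modulo 2^bits maps negatives onto patterns with a leading 1.
    return format(value % (2 ** bits), f"0{bits}b")


def from_twos_complement(pattern):
    """Return the integer held by a two's complement bit pattern."""
    value = int(pattern, 2)
    if pattern[0] == "1":              # sign bit set, so the value is negative
        value -= 2 ** len(pattern)
    return value


print(to_twos_complement(-2))            # 11111110
print(from_twos_complement("11111110"))  # -2
```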

Using an 8 bit byte for the mantissa and another 8 bit byte for the exponent, show 0.375 as a 2 byte, floating point number in two's complement form.
Try this independently first.

0.375
0.375 = 1/4 + 1/8
= 0.011 (binary fixed point)
= 0.11 * 2^-1
-1 = -00000001
= 11111111 (two's complement)
01100000 11111111

Positive Mantissa & Negative Exponent
Floating Point Binary -> Denary

01000000 11111110
Exponent: 11111110 is negative, so undo two's complement
= -00000010
= -2
0.1000000 * 2^-2
= 0.001 (binary fixed point)
= 1/8
= 0.125

Using 8 bits for the mantissa, 8 bits for the exponent and storing the mantissa and the exponent in two's complement form.
Give the denary number which would have 01100000 11111111 as its binary, floating point representation.
Try this independently first.

01100000 11111111
Exponent: 11111111, undo two's complement
= -00000001
= -1 (decimal)
0.1100000 * 2^-1
= 0.01100000
= 1/4 + 1/8
= 0.25 + 0.125
= 0.375

Negative Mantissa & Positive Exponent
Denary -> Floating Point Binary

-1.5
1.5 = 1.1 (binary)
-1.1 = -0.11 * 2^1
= 1.01 * 2^00000001 (mantissa in two's complement)
= 10100000 00000001
mantissa exponent

Using an 8 bit byte for the mantissa and another 8 bit byte for the exponent, show -1.25 as a 2 byte, floating point number in two's complement form.
Try this independently first.

-1.25
-1.25 = -(1 + 1/4)
= -1.01 (binary fixed point)
= -0.101 * 2^1
= 1.011 * 2^00000001 (mantissa in two's complement)
10110000 00000001
mantissa exponent

Negative Mantissa & Positive Exponent
Floating Point Binary -> Denary

11101000 00000011
Exponent: 00000011 = 3
Mantissa: 1.1101000, undo two's complement
= -0.0011000
-0.0011000 * 2^3
= -0001.1
= -1.5
You may notice that, as shown previously, -1.5 can also be shown as 10100000 00000001.
This is because 11101000 00000011 is not normalised, which is something we will look at later.

Using 8 bits for the mantissa, 8 bits for the exponent and storing the mantissa and the exponent in two's complement form.
Give the denary number which would have 10111010 00000011 as its binary, floating point representation.
Try this independently first.

10111010 00000011
Exponent: 00000011 = 3
Mantissa: 1.0111010, undo two's complement
= -0.1000110
-0.1000110 * 2^3
= -0100.0110
= -(4 + 1/4 + 1/8)
= -4.375

Negative Mantissa & Negative Exponent
Denary -> Floating Point Binary

-0.125
-0.125 = -1/8
= -0.001 (binary fixed point)
= -0.1 * 2^-2
= -0.1 * 2^-00000010
= 1.1000000 * 2^11111110 (both in two's complement)
11000000 11111110

Using an 8 bit byte for the mantissa and another 8 bit byte for the exponent, show -0.25 as a 2 byte, floating point number in two's complement form.
Try this independently first.

-0.25
-0.25 = -1/4
= -0.01 (binary fixed point)
= -0.1 * 2^-1
= -0.1000000 * 2^-00000001
= 1.1000000 * 2^11111111 (both in two's complement)
11000000 11111111

Negative Mantissa & Negative Exponent
Floating Point Binary -> Denary

10000000 11111101
Exponent: 11111101, undo two's complement
= -00000011
= -3
Mantissa: 1.0000000, undo two's complement
= -1.0000000
Note that the mantissa looks the same in two's complement form as in non two's complement form because the last 1 is at the beginning.
-1.0000000 * 2^-3
= -0.001
= -1/8
= -0.125

Using 8 bits for the mantissa, 8 bits for the exponent and storing the mantissa and the exponent in two's complement form.
Give the denary number which would have 10000000 11111110 as its binary, floating point representation.
Try this independently first.

10000000 11111110
Exponent: 11111110, undo two's complement
= -00000010
= -2
Mantissa: 1.0000000, undo two's complement
= -1.0000000
-1.0000000 * 2^-2
= -0.01
= -0.25
You may notice that, as shown previously, -0.25 can also be shown as 11000000 11111111.
This is because 10000000 11111110 is normalised, which is something we will look at next.

Denary -> Floating Point
1. Convert fractional part of denary number to fractions.
2. Convert to fixed point binary (keep - sign if exists).
3. Move binary point to left hand side of first 1 and count how many places and note direction needed.
4. 0.number * 2^no. of places needed in step 3 (denary).
5. 0.number * 2^no. of places needed in step 3 (binary). If moved right then use - sign and then flip (invert the bits and add 1) for two's complement.
6. 1st binary number - Remove binary point (keeping 1st 0) and add any necessary 0's to right (to make 8 bits). Convert to two's complement if -ve and remove - sign. This is the Mantissa.
7. Add any necessary 0's (before sign bit) to left of 2nd binary number (to make 8 bits, including sign bit). This is the Exponent.
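
The steps above can be sketched in code. This is only an illustration under stated assumptions: the function name `encode` is made up, it works on the magnitude and then takes the two's complement (as the slides do), and it refuses values that do not fit exactly in the mantissa. Like the hand method, it does not always give the normalised form for negative numbers (for example -0.25 comes out as 11000000 11111111).

```python
from fractions import Fraction


def encode(value, mantissa_bits=8, exponent_bits=8):
    """Return 'mantissa exponent' bit strings for a denary value (slide procedure)."""
    number = Fraction(value)
    magnitude = abs(number)
    if magnitude == 0:
        raise ValueError("zero has no first 1 to move the binary point to")
    exponent = 0
    while magnitude >= 1:                  # point moved left: exponent goes up
        magnitude /= 2
        exponent += 1
    while magnitude < Fraction(1, 2):      # point moved right: exponent goes down
        magnitude *= 2
        exponent -= 1
    scaled = magnitude * 2 ** (mantissa_bits - 1)
    if scaled.denominator != 1:
        raise ValueError("value needs more mantissa bits than are available")
    if not -(2 ** (exponent_bits - 1)) <= exponent < 2 ** (exponent_bits - 1):
        raise ValueError("exponent does not fit")
    mantissa = int(scaled) if number > 0 else -int(scaled)
    # Modulo arithmetic gives the two's complement pattern for negative values.
    mantissa_pattern = format(mantissa % 2 ** mantissa_bits, f"0{mantissa_bits}b")
    exponent_pattern = format(exponent % 2 ** exponent_bits, f"0{exponent_bits}b")
    return f"{mantissa_pattern} {exponent_pattern}"


print(encode(6.5))     # 01101000 00000011
print(encode(-1.25))   # 10110000 00000001
```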

Floating Point -> Denary
1. Convert exponent to denary.
2. If the exponent's sign bit = 1 then flip to convert from two's complement.
3. Mantissa * 2^exponent (denary). Convert mantissa from two's complement if sign bit = 1 and insert for our benefit a - sign. Insert assumed binary point after the sign bit.
4. Move the binary point the exponent number of places (right if +, left if -).
5. Convert to denary as fixed point binary.
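
As a cross-check on hand working, the conversion can also be done arithmetically. The sketch below does not move the binary point as the steps above do; instead it reads both bytes as two's complement integers and multiplies. The names `decode` and `from_twos_complement` are assumptions made for this example.

```python
from fractions import Fraction


def from_twos_complement(pattern):
    """Return the integer held by a two's complement bit pattern."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 2 ** len(pattern)
    return value


def decode(pattern):
    """Return the exact denary value of a 'mantissa exponent' bit pattern."""
    mantissa_bits, exponent_bits = pattern.split()
    # The binary point sits after the sign bit, so divide by 2^(bits - 1).
    mantissa = Fraction(from_twos_complement(mantissa_bits),
                        2 ** (len(mantissa_bits) - 1))
    exponent = from_twos_complement(exponent_bits)
    return mantissa * Fraction(2) ** exponent


print(float(decode("01101000 00000011")))  # 6.5
print(float(decode("10000000 11111101")))  # -0.125
```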

Decimal Normalisation
4,568,000 = 0.4568 x 10^7
= 0.04568 x 10^8
The first way is obviously more efficient, as no digit is wasted on a leading 0.
This form is called the normalised form.

Binary Normalisation
In binary the normalised form is used to maximise efficiency and to have only one way to represent a number.
The mantissa is said to be normalised if the first two digits are different.
For positive numbers, the first digit is always 0 and the second is always 1.
For negative numbers, the first digit is always 1 and the second is always 0.
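
The rule is simple enough to test in one line. A minimal sketch, with `is_normalised` as an illustrative name:

```python
def is_normalised(mantissa):
    """A two's complement mantissa is normalised if its first two bits differ."""
    return mantissa[0] != mantissa[1]


for mantissa in ("01101000", "11101000", "10000000", "11000000"):
    print(mantissa, is_normalised(mantissa))
```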

Normalising Floating Point Numbers
1. Convert the exponent to denary.
2. Shift the mantissa (not the sign bit) as many places to left as necessary to achieve a leading 1 (if positive i.e. sign bit = 0) or a leading 0 (if negative i.e. sign bit = 1).
3. Subtract the number of places that were necessary from the exponent and convert back to binary.
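
These three steps can be sketched as follows. The function names are hypothetical and the code assumes the 'mantissa exponent' layout used in these slides, with both parts in two's complement.

```python
def from_twos_complement(pattern):
    """Return the integer held by a two's complement bit pattern."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 2 ** len(pattern)
    return value


def normalise(pattern):
    """Normalise a 'mantissa exponent' bit pattern by shifting the mantissa left."""
    mantissa, exponent_bits = pattern.split()
    if set(mantissa) == {"0"}:
        raise ValueError("zero cannot be normalised")
    exponent = from_twos_complement(exponent_bits)       # step 1
    sign, rest = mantissa[0], mantissa[1:]
    shifts = 0
    while rest[0] == sign:                               # step 2: shift until bits differ
        rest = rest[1:] + "0"
        shifts += 1
    exponent -= shifts                                   # step 3
    size = len(exponent_bits)
    if not -(2 ** (size - 1)) <= exponent < 2 ** (size - 1):
        raise ValueError("exponent no longer fits")
    return f"{sign}{rest} " + format(exponent % 2 ** size, f"0{size}b")


print(normalise("00001101 00000010"))  # 01101000 11111111
print(normalise("11101000 00000011"))  # 10100000 00000001
```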

0 0001101 00000010
1. The exponent 00000010 = 2
2. The mantissa 0 0001101 has to be shifted (3x) left to achieve a leading 1 (not including the sign bit) i.e. 0 1101000
3. So exponent should be 2 - 3 = -1
= -00000001
= 11111111
So normalised = 01101000 11111111

1 1111001 00000011
1. 00000011 = 3
2. 1 1111001 has to be shifted (4x) left to achieve a leading 0 (not including the sign bit) i.e. 1 0010000
3. So exponent should be 3 - 4 = -1
= -00000001
= 11111111
So normalised = 10010000 11111111

Normalise these floating point binary numbers.
Try this independently first.
11000000 11111111
11101000 00000011

11101000 00000011
1. 00000011 = 3
2. 1101000 has to be shifted (2x) left to achieve a leading 0 (not including the sign bit).
3. So exponent should be 3 - 2 = 1
= 00000001
So normalised = 10100000 00000001

11000000 11111111
1. 11111111 = -00000001
= -1 (denary)
2. 1000000 has to be shifted (1x) left to make the 2nd bit 0.
3. So exponent should be -1 - 1 = -2
= -00000010
= 11111110
So normalised = 10000000 11111110

If you are asked to give the floating point binary form of a decimal, make sure it is normalised.
Convert as practised and then normalise if necessary.

Numbers are held in floating point form with one byte for the mantissa (fraction) and one byte for the exponent (characteristic). All values are held in two's complement form and the mantissa is normalised.
Using this format, write down the binary floating point values and the denary values of
(i) the largest magnitude, positive number;
(ii) the smallest magnitude, positive number;
(iii) the largest magnitude, negative number;
(iv) the smallest magnitude, negative number.
(The denary values may be left as a product of a power of 2).

Floating Point Binary - Decimal Converter
Either use your own understanding or the Floating Point Binary - Decimal Converter to help you, but try the last slide independently first.

The largest magnitude, positive number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent.
0 1111111 * 2^01111111
= 127/128 * 2^127

The smallest magnitude, positive number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent.
0 1000000 * 2^10000000
= 0.5 * 2^-128

The largest magnitude, negative number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent.
1 0000000 * 2^01111111
= -1 * 2^127

The smallest magnitude, negative number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent.
1 0111111 * 2^10000000
= -65/128 * 2^-128
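
These four answers can be confirmed by brute force. The sketch below (an illustration only, with made-up names) lists every normalised 8 bit mantissa, then pairs the extreme mantissas with the extreme exponents.

```python
from fractions import Fraction

MANTISSA_BITS = 8
EXPONENT_BITS = 8


def signed(raw, bits):
    """Interpret an unsigned pattern value as a two's complement integer."""
    return raw - 2 ** bits if raw >= 2 ** (bits - 1) else raw


def normalised_mantissas(bits=MANTISSA_BITS):
    """Yield the value of every mantissa whose first two bits differ."""
    for raw in range(2 ** bits):
        if (raw >> (bits - 2)) in (0b01, 0b10):
            yield Fraction(signed(raw, bits), 2 ** (bits - 1))


max_exponent = 2 ** (EXPONENT_BITS - 1) - 1      # 127
min_exponent = -(2 ** (EXPONENT_BITS - 1))       # -128

mantissas = list(normalised_mantissas())
positives = [m for m in mantissas if m > 0]
negatives = [m for m in mantissas if m < 0]

extremes = {
    "largest positive": (max(positives), max_exponent),
    "smallest positive": (min(positives), min_exponent),
    "largest negative": (min(negatives), max_exponent),
    "smallest negative": (max(negatives), min_exponent),
}
for name, (mantissa, exponent) in extremes.items():
    print(f"{name}: {mantissa} * 2^{exponent}")
```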

Improving Accuracy of Binary Floating Point Numbers
If we want to improve accuracy, we must use more bits for the mantissa by reducing the number of bits for the exponent.
This is because more digits can then be represented after the binary point.
However, the range would be decreased, as the exponent could not be as large as before.
So the largest power of two by which the mantissa can be multiplied is decreased.
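
The trade-off can be put into numbers. The sketch below is a rough illustration (the function name and the choice of 16 bits in total are assumptions): for each split it reports the place value of the last mantissa bit, the largest exponent, and the largest positive value.

```python
from fractions import Fraction


def format_summary(mantissa_bits, exponent_bits):
    """Return (mantissa step, largest exponent, largest positive value)."""
    step = Fraction(1, 2 ** (mantissa_bits - 1))   # place value of the last mantissa bit
    max_exponent = 2 ** (exponent_bits - 1) - 1
    largest = (1 - step) * Fraction(2) ** max_exponent
    return step, max_exponent, largest


for mantissa_bits, exponent_bits in ((8, 8), (10, 6), (12, 4)):
    step, max_exponent, largest = format_summary(mantissa_bits, exponent_bits)
    print(f"{mantissa_bits}+{exponent_bits} bits: step {step}, "
          f"largest exponent {max_exponent}, largest value about {float(largest):.3g}")
```

Moving bits from the exponent to the mantissa makes the step smaller (better accuracy) but the largest exponent, and so the range, smaller too.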

Representing Zero
Using the Floating Point Binary - Decimal Converter:
Try representing 0 as a non-normalised binary floating point number.
Now try representing 0 as a normalised floating point number.
Can you? Why?

Representing Zero
A normalised value must have the first two bits of the mantissa different.
Therefore one of them must be a 1, which must represent either -1 or +1/2, but not zero.
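
This can be checked exhaustively for the 8 bit mantissa. The sketch below is illustrative only (names are made up): it evaluates every normalised mantissa and confirms none of them is zero. Since a power of 2 is never zero either, no normalised pattern can represent 0.

```python
from fractions import Fraction


def mantissa_value(raw, bits=8):
    """Value of a two's complement mantissa with the point after the sign bit."""
    signed = raw - 2 ** bits if raw >= 2 ** (bits - 1) else raw
    return Fraction(signed, 2 ** (bits - 1))


# Patterns whose first two bits are 01 or 10 are the normalised ones.
normalised = [raw for raw in range(256) if (raw >> 6) in (0b01, 0b10)]
values = [mantissa_value(raw) for raw in normalised]

has_zero = any(value == 0 for value in values)
closest_to_zero = min(values, key=abs)
print(has_zero, closest_to_zero)
```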

Floating Point Binary
You may now be thinking 'If the range is so large, why don't we use floating point binary representation for all numbers (including integers)?'
However, it is more complicated to perform arithmetic on floating point numbers than on integers, and so they are slower to work with.
Because of this, floating point representation is only used with real fractional numbers or integers outside the range of +2 billion to -2 billion (which is the limit for 4 byte normal binary representation).

Plenary
Give the denary number which would have 01000000 00000000 as its binary, floating point representation in this computer.

Plenary
1/2 or 0.5

Plenary
Show
10.5
-10.5
as 2 byte, normalised, floating point numbers.

Plenary
10.5 = 01010100 00000100
-10.5 = 10101100 00000100

Plenary
Explain the effect on the
range
accuracy
of the numbers that can be stored if the number of bits in the exponent is reduced (and the freed bits are given to the mantissa).

Plenary
Range is decreased because the largest power of two by which the mantissa can be multiplied is decreased.
Accuracy is increased because more digits are represented after the binary point.
