Professional Documents
Culture Documents
Overflow Carry
Sign bit
01111 + 100+
00111 111
-------- -------------
10110 1011
Sign bit
Carry
Fractional Fixed Point Rep
Rather than using the integer values just
discussed, a fractional fixed-point number
that has values between +0.99 . . . and -1
can be used.
Data types
1.Short:
it is of size 16 bits represented as 2s complement with
a range from -215 to (215 -1)
2.Int or signed int:
it is of size 32 bits represented as 2s complement with
a range from -231 to ( 231-1)
3.Float:
it is of size 32 bits represented as IEEE 32 bit with a
range from 2-126(1.175494x10-38) to 2+128
(3.40282346x1038)
4.Double:
it is of size 64 bits represented as IEEE 64 bit with a
range from 2-1022(2.22507385x10-308) to 2
1024(1.79769313x10308)
Floating-point representation
X= M. be
where M is the value of the significand (mantissa),
b is the base
e is the exponent.
Mantissa determines the accuracy of the number
Exponent determines the range of numbers that can be
represented
Floating point numbers can be represented as:
Single precision :
called "float" in the C language family
it is a binary format that occupies 32 bits
its significand has a precision of 24 bits
Double precision :
called "double" in the C language family
it is a binary format that occupies 64 bits
its significand has a precision of 53 bits
Single Precision (SP):
31 30 23 22 0
S e f
31 30 20 19 0 31 0
s e f f