Information can come in different formats, such as text, audio, and video. However, the computer can only
process binary strings, so it must be possible to convert between internal and external formats.
Note that both integers and floating-point numbers are represented internally as fixed-length binary
strings.
Mathematical background
Integer division: A / B delivers two results, the quotient q and the remainder r, such that
A = B × q + r
o Div: A div B = q
o Mod: A mod B = r
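The two results can be checked against the identity A = B × q + r with a short Python sketch. The function name `int_divide` is made up for illustration, and the example assumes non-negative operands (Python's `//` and `%` round toward negative infinity, which differs from some other languages when an operand is negative).

```python
def int_divide(a, b):
    """Return (q, r) with a == b * q + r, for non-negative a and positive b."""
    q = a // b  # A div B
    r = a % b   # A mod B
    return q, r


q, r = int_divide(17, 5)
print(q, r)             # 3 2
print(5 * q + r == 17)  # True
```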
Polynomials – functions of powers of x
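A polynomial representation is of the form: $a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x^1 + a_0 x^0 + a_{-1} x^{-1} + \dots + a_{-m} x^{-m}$
where the coefficients $a_i$ are constant.
A number N written in base R has the same form: $N_R = r_n R^n + r_{n-1} R^{n-1} + \dots + r_1 R^1 + r_0 R^0 + r_{-1} R^{-1} + \dots + r_{-m} R^{-m}$ with the coefficients $r_i$ in the range $[0, R-1]$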
Decimal numbers have base R = 10 (r in range 0-9)
Binary numbers have base R = 2 (r in range 0-1)
Hexadecimal numbers have base R = 16 (r in range 0-15)
Examples:
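As an illustration of the positional formula, the following Python sketch evaluates a list of digits in a given base. The helper `positional_value` and its argument layout (integer digits and fraction digits as separate lists, most significant first) are assumptions made for this example only.

```python
def positional_value(int_digits, frac_digits, base):
    """Evaluate the sum of r_i * base**i for the given digits."""
    for d in list(int_digits) + list(frac_digits):
        if not 0 <= d < base:
            raise ValueError("digit out of range for this base")
    value = 0
    top = len(int_digits) - 1
    for i, d in enumerate(int_digits):
        value += d * base ** (top - i)  # non-negative powers
    for j, d in enumerate(frac_digits, start=1):
        value += d * base ** (-j)       # negative powers
    return value


print(positional_value([1, 0, 1], [1], 2))  # 101.1 in base 2  -> 5.5
print(positional_value([2, 15], [], 16))    # 2F in base 16    -> 47
```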
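Another common system is Unicode, which aims to cover all written languages. Unicode was originally
designed with a fixed length of 16 bits per character; current versions define code points up to U+10FFFF, stored using the UTF-8, UTF-16 or UTF-32 encodings.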
Unsigned integers can be converted directly into binary and processed without any special care. Adding a
sign complicates the issue, as there is no obvious way of representing it in binary.
Signed integers are usually represented using 2's complement. The most significant bit is always the
sign bit: 1 for negative numbers, 0 for zero and positive numbers.
To create a negative number using 2's complement:
Given a positive number N and a word length of n bits, the negative is created by: $2^n - N$
Example: using 8-bit integers
$N = 5_{10} = 00000101_2$
To represent -5: $2^8 - 5 = 256 - 5 = 251_{10} = 11111011_2$
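The subtraction from $2^n$ can be reproduced in a few lines of Python. This is a minimal sketch; the name `twos_complement_negate` and the default word length of 8 bits are choices made for this example.

```python
def twos_complement_negate(n_value, bits=8):
    """Bit pattern of -n_value in `bits`-bit 2's complement, using 2**bits - N."""
    pattern = (2 ** bits - n_value) % (2 ** bits)  # modulo keeps 0 mapped to 0
    return format(pattern, f"0{bits}b")


print(twos_complement_negate(5))  # 11111011
```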
Another way: create positive integer, invert all bits, add 1.
Example: $N = 5_{10} = 00000101_2$
Inverting all bits in 00000101 produces 11111010.
After adding 1 the result is 11111011.
Simplification: given a positive integer, to negate it, copy all bits starting from the right, up
to and including the first 1, then invert the rest. In 00000101 the rightmost bit is the first 1, so
copy it and invert the remaining bits: 11111011.
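Both shortcuts can be sketched in Python on bit strings to confirm that they agree. The function names below are illustrative only, and the inputs are assumed to be strings of '0' and '1' of the intended word length.

```python
def flip(bit_string):
    """Invert every bit of a string of '0'/'1' characters."""
    return "".join("1" if b == "0" else "0" for b in bit_string)


def negate_invert_add_one(bit_string):
    """Invert all bits, then add 1 (discarding any carry out)."""
    bits = len(bit_string)
    result = (int(flip(bit_string), 2) + 1) % (2 ** bits)
    return format(result, f"0{bits}b")


def negate_copy_then_invert(bit_string):
    """Copy from the right up to and including the first 1, invert the rest."""
    last_one = bit_string.rfind("1")
    if last_one == -1:
        return bit_string  # zero negates to zero
    return flip(bit_string[:last_one]) + bit_string[last_one:]


print(negate_invert_add_one("00000101"))    # 11111011
print(negate_copy_then_invert("00000101"))  # 11111011
```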
The range of numbers that can be represented using n bits is $[-2^{n-1}, 2^{n-1} - 1]$.
Example: Using 8-bit integers, it is possible to represent numbers between -128 and 127. There is one
more negative value than positive because 0 takes up one of the bit patterns whose sign bit is 0.
Overflow might occur when adding 2 positive or 2 negative values. It can be detected by checking the
sign bit of the answer: if both operands have the same sign but the result has the opposite sign, overflow has occurred.
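The sign-bit check can be sketched in Python by simulating a fixed word length. The helper `add_signed` is hypothetical and assumes both inputs already lie within the representable range for the chosen number of bits.

```python
def add_signed(a, b, bits=8):
    """Add two signed integers in `bits`-bit 2's complement.

    Returns (result, overflow), where result is what the fixed-length
    hardware word would hold, reinterpreted as a signed value.
    """
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    raw = (a + b) & mask  # keep only the low `bits` bits
    result = raw - (1 << bits) if raw & sign_bit else raw
    # Overflow: the operands share a sign, but the result's sign differs.
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow


print(add_signed(100, 50))    # (-106, True)
print(add_signed(-100, -50))  # (106, True)
print(add_signed(100, -50))   # (50, False)
```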
Multiplication: multiplying two 8-bit strings produces a product of at most 8+8=16 bits in length, so
overflow occurs if the result is longer than the length allocated by the machine.
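A small Python sketch can show the product length for the largest 8-bit operands. It treats the values as unsigned for simplicity, and `multiply_fixed` is an illustrative name, not a standard function.

```python
def multiply_fixed(a, b, bits=8):
    """Multiply two unsigned values and keep only `bits` bits of the product.

    Returns (stored_value, overflow).
    """
    product = a * b
    overflow = product.bit_length() > bits
    return product & ((1 << bits) - 1), overflow


print((0b11111111 * 0b11111111).bit_length())  # 16
print(multiply_fixed(255, 255))                # (1, True)
print(multiply_fixed(12, 10))                  # (120, False)
```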
Carry: another problem of fixed-length integers. Addition of two fixed-length binary strings can
generate a carry out of the most significant bit, which does not fit in the fixed length.
00101100 + 11100111 = 100010011
Adding 2 negative integers always generates a carry: 11110110 + 11100111 = 111011101
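The two additions above can be reproduced with a short Python sketch that separates the carry bit from the 8 bits that remain in the word. The name `add_with_carry` is chosen for this example.

```python
def add_with_carry(x, y, bits=8):
    """Add two bit strings; return (low `bits` bits as a string, carry bit)."""
    total = int(x, 2) + int(y, 2)
    carry = total >> bits              # anything above the word length
    low = total & ((1 << bits) - 1)    # what stays in the fixed-length word
    return format(low, f"0{bits}b"), carry


print(add_with_carry("00101100", "11100111"))  # ('00010011', 1)
print(add_with_carry("11110110", "11100111"))  # ('11011101', 1)
```

In the second case the operands are -10 and -25 in 2's complement, and the 8 bits left after dropping the carry, 11011101, represent -35.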
Rounding error
Most conversions will not be exact, as the number must fit into a fixed number of bits (23 mantissa bits for
single precision).
Example: $0.60125_{10} = 0.100110011110101110000101000111\ldots_2$, which cannot be represented precisely.
The number can be either truncated after 23 bits or rounded using the 24th bit. Rounding generally introduces less
error than truncation.
To round a 24-bit number into 23 bits:
Remove the least significant bit.
If that bit was 1, add 1 to the remaining 23-bit number.
Rounding introduces a small error (bounded by $2^{-24}$ for rounding from 24 to 23 bits).
To truncate the number, simply cut it at the required number of bits and throw away the rest.
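The rounding and truncation steps above can be compared on the 0.60125 example with a Python sketch. It uses exact fractions so that the measured errors are not themselves affected by floating-point rounding. The helper names are illustrative, and only the fraction bits are modelled (no exponent or normalisation).

```python
from fractions import Fraction


def first_bits(x, n):
    """First n fraction bits of 0 <= x < 1, packed into an integer."""
    return int(x * 2 ** n)  # int() drops the remaining bits


def round_24_to_23(m24):
    """Remove the least significant bit; if it was 1, add 1 to what remains."""
    return (m24 >> 1) + (m24 & 1)


x = Fraction(60125, 100000)
m24 = first_bits(x, 24)          # 24-bit approximation
truncated = m24 >> 1             # cut to 23 bits
rounded = round_24_to_23(m24)    # round to 23 bits

err_trunc = abs(x - Fraction(truncated, 2 ** 23))
err_round = abs(x - Fraction(rounded, 2 ** 23))

print(format(m24, "024b"))                  # 100110011110101110000101
print(err_round < err_trunc)                # True
print(err_round <= Fraction(1, 2 ** 24))    # True
```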
When adding floating-point numbers, the order of the additions can be important. If a
very large number is added to a very small one, some or all of the small one's bits may be lost. The general rule is to
add small values together first, to let them accumulate.
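The effect of addition order can be demonstrated in Python. Note the assumption: Python floats are double precision (53 significant bits) rather than single precision, so the values below were picked to trigger the same effect at that width.

```python
big = 1e16
smalls = [1.0] * 10

# Large value first: each 1.0 is too small to change the running total.
big_first = big
for s in smalls:
    big_first += s

# Small values first: they accumulate to 10.0 before meeting the large value.
small_first = sum(smalls) + big

print(int(big_first))    # 10000000000000000
print(int(small_first))  # 10000000000000010
```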