
CSCI-UA.0201
Computer Systems Organization


Bits and Bytes:
Data Representation
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com

Some slides adapted and modified from:
• Clark Barrett
• Jinyang Li
• Bryant and O'Hallaron
Bits and Bytes
• Representing information as bits
• How are bits manipulated?
• Integers
• Floating points
Our First Steps…
How do we represent data in a computer?
• How do we represent information using
electrical signals?
• At the lowest level, a computer is an
electronic machine.
• Easy to recognize two conditions:
– presence of a voltage – we call this state “1”
– absence of a voltage – we call this state “0”
Binary Representations

[Figure: a voltage waveform encoding the bit sequence 0 1 0; levels near 3.3V–2.8V are read as "1", levels near 0.5V–0.0V are read as "0"]
A Computer is a Binary Digital Machine
• Basic unit of information is the binary digit, or bit.
• Values with more than two states require multiple
bits.
– A collection of two bits has four possible states:
00, 01, 10, 11
– A collection of three bits has eight possible states:
000, 001, 010, 011, 100, 101, 110, 111
– A collection of n bits has 2^n possible states.
George Boole
• (1815-1864)
• English mathematician and
philosopher
• Inventor of Boolean Algebra
• Now we can use things like:
AND, OR, NOT, XOR,
XNOR, NAND, NOR, ….

Source: http://history-computer.com/ModernComputer/thinkers/Boole.html
Claude Shannon
• (1916–2001)
• American mathematician
and electronic engineer
• His work is the foundation
for using switches (i.e.
transistors), and hence
binary numbers, to
implement Boolean functions.

Source: http://history-computer.com/ModernComputer/thinkers/Shannon.html
…. Simply Speaking … 
So, we use transistors to implement logic gates.
Logic gates manipulate binary numbers to implement Boolean functions.
Boolean functions solve problems.
Encoding Byte Values
• Byte = 8 bits
  – Binary: 00000000₂ to 11111111₂
  – Decimal: 0₁₀ to 255₁₀
  – Hexadecimal: 00₁₆ to FF₁₆
• Base-16 number representation
  • Every 4 bits → 1 hexadecimal digit
  • Use characters '0' to '9' and 'A' to 'F'
• Write FA1D37B₁₆ in the C language as
  – 0xFA1D37B
  – 0xfa1d37b

  Hex  Decimal  Binary
  0    0        0000
  1    1        0001
  2    2        0010
  3    3        0011
  4    4        0100
  5    5        0101
  6    6        0110
  7    7        0111
  8    8        1000
  9    9        1001
  A    10       1010
  B    11       1011
  C    12       1100
  D    13       1101
  E    14       1110
  F    15       1111
Data Representations (sizes in bytes)

  C Data Type   Typical 32-bit   Intel IA32   x86-64
  char          1                1            1
  short         2                2            2
  int           4                4            4
  long          4                4            8
  long long     8                8            8
  float         4                4            4
  double        8                8            8
  pointer       4                4            8

The most widely used today is x86-64.

Byte Ordering
• How are bytes within a multi-byte word
ordered in memory?
• Conventions
– Big Endian: Sun, PPC, Internet
• Most significant byte has lowest address
– Little Endian: x86
• Most significant byte has highest address
Byte Ordering Example
• Big Endian
  – Most significant byte has lowest address
• Little Endian
  – Most significant byte has highest address
• Example
  – Variable x has 4-byte representation 0x01234567
  – Address given by &x is 0x100

  Big Endian      0x100  0x101  0x102  0x103
                  01     23     45     67

  Little Endian   0x100  0x101  0x102  0x103
                  67     45     23     01
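A minimal sketch (mine, not from the slides) showing how a program can test the host's byte order at run time by looking at the byte stored at the lowest address of a known 4-byte value:

#include <stdio.h>

int main(void) {
    unsigned int x = 0x01234567;
    unsigned char *first = (unsigned char *) &x;   /* byte at lowest address */

    if (*first == 0x67)
        printf("little endian\n");   /* least significant byte stored first */
    else if (*first == 0x01)
        printf("big endian\n");      /* most significant byte stored first */
    return 0;
}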
Reading Byte-Reversed Listings
• Disassembly
– given the binary file, get the assembly

• Example Fragment
Address Instruction Code Assembly Rendition
8048365: 5b pop %ebx
8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx
804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

• Deciphering Numbers
– Value: 0x12ab
– Pad to 32 bits (int is 4 bytes): 0x000012ab
– Split into bytes: 00 00 12 ab
– Reverse (little endian): ab 12 00 00
Examining Data Representations
• Code to print Byte Representation of
data

void show_bytes(unsigned char *start, int len) {
    int i;
    for (i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *) (start + i), start[i]);
    printf("\n");
}

printf directives:
%p: Print pointer
%x: Print Hexadecimal
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((unsigned char *) &a, sizeof(int));

Result (Linux):
int a = 15213;
0x11ffffcb8 0x6d
0x11ffffcb9 0x3b
0x11ffffcba 0x00
0x11ffffcbb 0x00

Note: 15213 in decimal is 3B6D in hexadecimal


Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
int A = 15213;                      long int C = 15213;

(bytes listed from lowest to highest address)

         IA32, x86-64   Sun                  IA32   x86-64   Sun
         6D             00                   6D     6D       00
         3B             00                   3B     3B       00
         00             3B                   00     00       3B
         00             6D                   00     00       6D
                                                    00
                                                    00
                                                    00
                                                    00
Representing Strings
• Strings in C:  char S[6] = "18243";
  – Represented by an array of characters
  – Each character encoded in ASCII format
    • Standard 7-bit encoding of character set
    • Character '0' has code 0x30
      – Digit i has code 0x30 + i
  – String should be null-terminated
• Byte ordering not an issue

         Little Endian   Big Endian
  '1'    31              31
  '8'    38              38
  '2'    32              32
  '4'    34              34
  '3'    33              33
  '\0'   00              00

Byte ordering is an issue for a single data item.
An array is a group of data items, so its bytes appear in the same order either way.
How to Manipulate Bits?
Boolean Algebra
• Developed by George Boole in the 19th century
  – Algebraic representation of logic
• Encode "True" as 1 and "False" as 0

And: A&B = 1 when both A=1 and B=1
  A B  A&B
  0 0   0
  0 1   0
  1 0   0
  1 1   1

Or: A|B = 1 when either A=1 or B=1
  A B  A|B
  0 0   0
  0 1   1
  1 0   1
  1 1   1

Not: ~A = 1 when A=0
  A  ~A
  0   1
  1   0

Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
  A B  A^B
  0 0   0
  0 1   1
  1 0   1
  1 1   0
Application of Boolean Algebra
• Applied to Digital Systems by Claude
Shannon
– 1937 MIT Master’s Thesis
– Reason about networks of relay switches
• Encode closed switch as 1, open switch as 0

[Image: a transistor]
General Boolean Algebras
• Operate on bit vectors (e.g. an integer is a bit vector of 4 bytes = 32 bits)
  – Operations applied bitwise

    01101001     01101001     01101001
  & 01010101   | 01010101   ^ 01010101   ~ 01010101
    01000001     01111101     00111100     10101010
Bit-Level Operations in C
• Operations &, |, ~, ^ Available in C
– Apply to any “integral” data type
• long, int, short, char, unsigned
• Examples (char data type)
  – ~0x41 = 0xBE
    • ~01000001₂ = 10111110₂
  – ~0x00 = 0xFF
    • ~00000000₂ = 11111111₂
  – 0x69 & 0x55 = 0x41
    • 01101001₂ & 01010101₂ = 01000001₂
  – 0x69 | 0x55 = 0x7D
    • 01101001₂ | 01010101₂ = 01111101₂
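A small C sketch (mine, not from the slides) that reproduces the examples above; the casts back to unsigned char are needed because ~ and the other operators promote their operands to int, and the cast keeps only the low 8 bits:

#include <stdio.h>

int main(void) {
    unsigned char a = 0x41, x = 0x69, y = 0x55;

    printf("~0x%02x        = 0x%02x\n", a, (unsigned char) ~a);        /* 0xbe */
    printf("0x%02x & 0x%02x = 0x%02x\n", x, y, (unsigned char)(x & y)); /* 0x41 */
    printf("0x%02x | 0x%02x = 0x%02x\n", x, y, (unsigned char)(x | y)); /* 0x7d */
    printf("0x%02x ^ 0x%02x = 0x%02x\n", x, y, (unsigned char)(x ^ y)); /* 0x3c */
    return 0;
}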
Contrast: Logic Operations in C
• Contrast to Logical Operators
&&, ||, !
• View 0 as “False”
• Anything nonzero as “True”
• Always return 0 or 1
• Examples (char data type)
– !0x41 = 0x00
– !0x00 = 0x01
– !!0x41 = 0x01

– 0x69 && 0x55 = 0x01


– 0x69 || 0x55 = 0x01
– p && *p (avoids null pointer access)
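A short sketch (mine, not from the slides; the helper name is made up) of the p && *p idiom: && evaluates left to right and stops as soon as the result is known, so *p is never dereferenced when p is NULL:

#include <stdio.h>

int safe_nonzero(int *p) {
    /* && short-circuits: if p is NULL, *p is never evaluated. */
    return p && *p;
}

int main(void) {
    int v = 42;
    printf("%d\n", safe_nonzero(&v));    /* prints 1 */
    printf("%d\n", safe_nonzero(NULL));  /* prints 0, no crash */
    return 0;
}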
Boolean in C
• Did not exist in standard C89/90
• It was introduced in C99 standard
• You may need to use the following switch
with gcc:
gcc -std=c99 …

#include <stdbool.h>
bool x;
x = false;   /* note: lower case */
x = true;
Shift Operations
• Left Shift: x << y
  – Shift x left by y positions
    • Throw away extra bits on the left
    • Fill with 0's on the right
• Right Shift: x >> y
  – Shift x right by y positions
    • Throw away extra bits on the right
  – Type 1: Logical shift
    • Fill with 0's on the left
  – Type 2: Arithmetic shift (covered later)
    • Replicate the most significant bit on the left
• Undefined behavior
  – Shift amount < 0 or ≥ size of x

  Argument x    01100010        Argument x    10100010
  << 3          00010000        << 3          00010000
  Log.  >> 2    00011000        Log.  >> 2    00101000
  Arith. >> 2   00011000        Arith. >> 2   11101000
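A hedged C sketch (mine, not from the slides) reproducing the tables above. Note two assumptions: operands narrower than int are promoted before shifting, so results are masked back to 8 bits; and right-shifting a negative signed value is implementation-defined, though common compilers use an arithmetic shift:

#include <stdio.h>

int main(void) {
    unsigned char ux = 0xA2;                  /* 1010 0010 */
    signed char   sx = (signed char) 0xA2;

    printf("ux << 3 : 0x%02x\n", (ux << 3) & 0xFF);   /* 0x10 */
    printf("ux >> 2 : 0x%02x\n", ux >> 2);            /* logical shift: 0x28 */
    printf("sx >> 2 : 0x%02x\n", (sx >> 2) & 0xFF);   /* typically arithmetic: 0xe8 */
    return 0;
}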
How do we represent integers?
(unsigned and signed)
Two Types of Integers
• Unsigned
– positive numbers and 0
• Signed numbers
– negative numbers as well as positive
numbers and 0
Unsigned Integers

  B2U(X) = Σ (i = 0 to w−1)  x_i · 2^i

Example:
  bits     1    0    1    1    1    0    1    1
  weights  128  64   32   16   8    4    2    1

  B2U = 128 + 32 + 16 + 8 + 2 + 1 = 187
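A small sketch (mine, not from the slides; the function name is made up) that computes B2U for a string of '0'/'1' characters, most significant bit first:

#include <stdio.h>

unsigned b2u(const char *bits) {
    unsigned value = 0;
    for (const char *p = bits; *p; p++)
        value = value * 2 + (*p - '0');   /* shift in one bit at a time */
    return value;
}

int main(void) {
    printf("%u\n", b2u("10111011"));   /* prints 187 */
    return 0;
}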
Unsigned Integers
• An n-bit unsigned integer represents 2^n values:
  from 0 to 2^n − 1.

  2² 2¹ 2⁰   value
  0  0  0    0
  0  0  1    1
  0  1  0    2
  0  1  1    3
  1  0  0    4
  1  0  1    5
  1  1  0    6
  1  1  1    7
Unsigned Binary Arithmetic
• Base-2 addition – just like base-10
  – add from right to left, propagating the carry

    10010      10010       1111      10111
  +  1001    +  1011     +    1    +   111
    11011      11101      10000     (exercise)
What About Negative Numbers?
People have tried several options:
Sign Magnitude      One's Complement    Two's Complement
000 = +0 000 = +0 000 = +0
001 = +1 001 = +1 001 = +1
010 = +2 010 = +2 010 = +2
011 = +3 011 = +3 011 = +3
100 = -0 100 = -3 100 = -4
101 = -1 101 = -2 101 = -3
110 = -2 110 = -1 110 = -2
111 = -3 111 = -0 111 = -1

• Issues: balance, number of zeros, ease of operations


• Which one is best? Why?
Signed Integers
• With n bits, we have 2^n distinct values.
– assign about half to positive integers and
about half to negative
• Positive integers
– just like unsigned: zero in most significant
(MS) bit
00101 = 5
• Negative integers
– In two’s complement form
In general: a 0 at the MS bit indicates positive
and a 1 indicates negative.
Two’s Complement
• Two’s complement representation
developed to make circuits easy for
arithmetic.
– for each positive number (X), assign value to
its negative (-X),
such that X + (-X) = 0 with “normal” addition,
ignoring carry out

    00101 (5)        01001 (9)
  + 11011 (-5)     + 10111 (-9)
    00000 (0)        00000 (0)
Two’s Complement Signed Integers
• MS bit is the sign bit.
• Range of an n-bit number: −2^(n−1) through 2^(n−1) − 1.
  – The most negative number (−2^(n−1)) has no positive counterpart.

  -2³ 2² 2¹ 2⁰  value     -2³ 2² 2¹ 2⁰  value
   0  0  0  0    0         1  0  0  0    -8
   0  0  0  1    1         1  0  0  1    -7
   0  0  1  0    2         1  0  1  0    -6
   0  0  1  1    3         1  0  1  1    -5
   0  1  0  0    4         1  1  0  0    -4
   0  1  0  1    5         1  1  0  1    -3
   0  1  1  0    6         1  1  1  0    -2
   0  1  1  1    7         1  1  1  1    -1
Converting Binary (2’s C) to Decimal
1. If the MS bit is one (i.e. the number is negative), take the two’s
   complement to get a positive number.
2. Get the decimal value as if the number were unsigned (using powers of 2).
3. If the original number was negative, add a minus sign.

  n    2^n
  0    1
  1    2
  2    4
  3    8
  4    16
  5    32
  6    64
  7    128
  8    256
  9    512
  10   1024
Examples
X  = 00100111₂
   = 2⁵ + 2² + 2¹ + 2⁰ = 32 + 4 + 2 + 1
   = 39₁₀

X  = 11100110₂
-X = 00011010
   = 2⁴ + 2³ + 2¹ = 16 + 8 + 2
   = 26₁₀
 X = -26₁₀
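A minimal C sketch (mine, not from the slides; the function name is made up) that follows the three conversion steps above and reproduces these two examples for 8-bit patterns:

#include <stdio.h>

/* Interpret the low 8 bits of 'bits' as an 8-bit two's complement number. */
int tc8_to_decimal(unsigned char bits) {
    if (bits & 0x80) {                                   /* step 1: MS bit set => negative */
        unsigned char pos = (unsigned char)(~bits + 1);  /* two's complement */
        return -(int) pos;                               /* steps 2 and 3 */
    }
    return (int) bits;                                   /* non-negative: read as unsigned */
}

int main(void) {
    printf("%d\n", tc8_to_decimal(0x27));    /* 00100111 ->  39 */
    printf("%d\n", tc8_to_decimal(0xE6));    /* 11100110 -> -26 */
    return 0;
}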
Numeric Ranges
Example: Assume 16-bit numbers

                Decimal    Hex      Binary
  Unsigned Max   65535     FF FF    11111111 11111111
  Signed Max     32767     7F FF    01111111 11111111
  Signed Min    -32768     80 00    10000000 00000000
  -1                -1     FF FF    11111111 11111111
   0                 0     00 00    00000000 00000000
Values for Different Sizes

               w = 8    w = 16     w = 32           w = 64
  Unsig. Max   255      65,535     4,294,967,295    18,446,744,073,709,551,615
  Signed Max   127      32,767     2,147,483,647    9,223,372,036,854,775,807
  Signed Min   -128     -32,768    -2,147,483,648   -9,223,372,036,854,775,808

• C Programming
  #include <limits.h>
  Declares constants, e.g.,
    INT_MAX
    LONG_MAX
    INT_MIN
    UINT_MAX
    …
What happens if you change the
type of a variable
(aka type casting)?
Signed vs. Unsigned in C
• Constants
– By default, signed integers
– Unsigned with “U” as suffix
0U, 4294967259U
• Casting
– Explicit casting between signed & unsigned
int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;

– Implicit casting also occurs via assignments and


procedure calls
tx = ux;
uy = ty;
General Rule for Casting:
signed <-> unsigned
Follow these two steps:
1. Keep the bit representation
2. Re-interpret

Effect:
• Numerical value may change.
• Bit pattern stays the same.
Mapping Signed ↔ Unsigned

  Bits   Signed   Unsigned
  0000     0        0
  0001     1        1
  0010     2        2
  0011     3        3
  0100     4        4
  0101     5        5
  0110     6        6
  0111     7        7
  1000    -8        8
  1001    -7        9
  1010    -6       10
  1011    -5       11
  1100    -4       12
  1101    -3       13
  1110    -2       14
  1111    -1       15

For 0000–0111 the two interpretations are equal (=);
for 1000–1111 they differ by 16 (+/− 2⁴).
Casting Surprises
• Expression Evaluation
  – If there is a mix of unsigned and signed in a single expression,
    signed values are implicitly cast to unsigned
  – Including comparison operations <, >, ==, <=, >=

If an expression mixes several types, the compiler follows these rules.
Example
#include <stdio.h>

int main() {
    int i = -7;
    unsigned j = 5;

    if (i > j)                 /* Condition is TRUE! i is converted to unsigned */
        printf("Surprise!\n");
    return 0;
}
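One way to avoid the surprise (my sketch, not from the slides) is to cast explicitly so the comparison happens in the signed domain; this is safe here because the unsigned value fits in an int:

#include <stdio.h>

int main(void) {
    int i = -7;
    unsigned j = 5;

    if (i > (int) j)           /* both sides signed: -7 > 5 is false */
        printf("Surprise!\n");
    else
        printf("No surprise: %d < %u\n", i, j);
    return 0;
}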
Expanding & Truncating a Variable

Expanding
• Convert a w-bit signed integer to w+k bits with the same value
• Convert unsigned: pad with k 0 bits in front
• Convert signed: make k copies of the sign bit

[Figure: the original w bits of X are kept; k copies of the pad bit are
prepended to form the (k + w)-bit result]
Sign Extension Example
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

Decimal Hex Binary


x 15213 3B 6D 00111011 01101101
ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101
y -15213 C4 93 11000100 10010011
iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011

• Converting from a smaller to a larger integer data type
• C automatically performs sign extension
Truncating
• Example: from int to short (i.e. from
32-bit to 16-bit)
• High-order bits are truncated
• Value is altered → must reinterpret
• This non-intuitive behavior can lead to
  buggy code! So don’t do it!
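A small C sketch (mine, not from the slides) showing both directions with the slide's values; the exact truncated result assumes the usual 32-bit int and 16-bit short sizes:

#include <stdio.h>

int main(void) {
    short x = 15213, y = -15213;
    int ix = x;                  /* expanding: sign extension, value preserved */
    int iy = y;

    int big = 0x12345678;
    short t = (short) big;       /* truncating: keeps the low 16 bits -> 0x5678 */

    printf("ix = %d, iy = %d\n", ix, iy);    /* 15213, -15213 */
    printf("t  = 0x%hx (%d)\n", t, t);       /* 0x5678 (22136) */
    return 0;
}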
Addition, negation, multiplication,
and shifting
Negation: Complement & Increment
• The complement of x satisfies
Two’sComp(x) + x = 0
Two’sComp(x) = ~x + 1

• Proof sketch
– Observation: ~x + x == 1111…111 == -1
 ~x + x + 1 = 0
 (~x + 1) + x = 0
 Two’sComp(x) + x = 0

     x   1 0 0 1 1 1 0 1
  + ~x   0 1 1 0 0 0 1 0
    -1   1 1 1 1 1 1 1 1
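A quick sketch (mine, not from the slides) checking the identity on a concrete value:

#include <stdio.h>

int main(void) {
    int x = 15213;
    printf("-x     = %d\n", -x);        /* -15213 */
    printf("~x + 1 = %d\n", ~x + 1);    /* -15213: same bit pattern as -x */
    printf("~x + x = %d\n", ~x + x);    /* -1, i.e. all ones */
    return 0;
}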
Unsigned Addition
  Operands: w bits          u, v
  True sum: w+1 bits        u + v
  Discard carry: w bits     UAdd_w(u, v)

Two’s Complement Addition
  Operands: w bits          u, v
  True sum: w+1 bits        u + v
  Discard carry: w bits     TAdd_w(u, v)

  – If sum ≥ 2^(w−1), the result becomes negative (positive overflow)
  – If sum < −2^(w−1), the result becomes positive (negative overflow)
Multiplication
• Exact product of w-bit numbers x, y
  – Either signed or unsigned
• Ranges
  – Unsigned: 0 ≤ x * y ≤ (2^w − 1)² = 2^(2w) − 2^(w+1) + 1
  – Two’s complement min: x * y ≥ (−2^(w−1)) * (2^(w−1) − 1) = −2^(2w−2) + 2^(w−1)
  – Two’s complement max: x * y ≤ (−2^(w−1))² = 2^(2w−2)
Power-of-2 Multiply with Shift
• Operation
  – u << k gives u * 2^k
  – Both signed and unsigned
• Examples
  – u << 3 == u * 8
  – (u << 5) – (u << 3) == u * 24
  – Most machines shift and add faster than they multiply
    • The compiler generates this code automatically
Compiled Multiplication Code

C Function:
int mul12(int x)
{
    return x*12;
}

Compiled Arithmetic Operations         Explanation
leal (%eax,%eax,2), %eax               t = x + x*2
sall $2, %eax                          return t << 2;

• The C compiler automatically generates shift/add
  code when multiplying by a constant
Unsigned Power-of-2 Divide with Shift
• Quotient of unsigned by power of 2
  – u >> k gives ⌊ u / 2^k ⌋

Examples:
  Division   Exact         Computed   Hex     Binary
  x          15213         15213      3B 6D   00111011 01101101
  x >> 1     7606.5        7606       1D B6   00011101 10110110
  x >> 4     950.8125      950        03 B6   00000011 10110110
  x >> 8     59.4257813    59         00 3B   00000000 00111011
Compiled Unsigned Division Code

C Function:
unsigned udiv8(unsigned x)
{
    return x/8;
}

Compiled Arithmetic Operations           Explanation
shrl $3, %eax      # Logical shift       return x >> 3;

• Uses logical shift for unsigned
• For Java users
  – Logical shift is written as >>>
Signed Power-of-2 Divide with Shift
• Quotient of signed by power of 2
  – x >> k gives ⌊ x / 2^k ⌋ (rounds toward −∞)
  – Uses arithmetic shift

Examples:
  Division   Exact         Computed   Hex     Binary
  y          -15213        -15213     C4 93   11000100 10010011
  y >> 1     -7606.5       -7607      E2 49   11100010 01001001
  y >> 4     -950.8125     -951       FC 49   11111100 01001001
  y >> 8     -59.4257813   -60        FF C4   11111111 11000100
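A short sketch (mine, not from the slides) contrasting the arithmetic shift with C's / operator, which rounds toward zero; note that whether >> on a negative int is an arithmetic shift is implementation-defined, though it is on common compilers:

#include <stdio.h>

int main(void) {
    int y = -15213;
    printf("y >> 1 = %d\n", y >> 1);   /* -7607: floor(-7606.5), arithmetic shift */
    printf("y / 2  = %d\n", y / 2);    /* -7606: C division truncates toward zero */
    return 0;
}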
Floating Points
Some slides and information about FP are adapted from
Prof. Michael Overton's book:
Numerical Computing with IEEE Floating Point Arithmetic
Turing Award 1989 to William Kahan for design of the
IEEE Floating Point Standards 754 (binary) and 854
(decimal)
Background: Fractional Binary Numbers
• What is 1011.101₂?
Background: Fractional Binary Numbers

  b_i  b_(i-1)  ···  b_2  b_1  b_0 . b_(-1)  b_(-2)  b_(-3)  ···  b_(-j)

  Bits to the left of the binary point have weights 2^i, …, 4, 2, 1;
  bits to the right have weights 1/2, 1/4, 1/8, …, 2^(-j).

• Value:  Σ (k = −j to i)  b_k × 2^k
Fractional Binary Numbers: Examples

  Value    Representation
  5 3/4    101.11₂
  2 7/8    10.111₂
Why not fractional binary numbers?
• Not efficient
  – 3 × 2^100 would be written as 11 followed by 100 zeros
• Given a finite length (e.g. 32 bits), we cannot represent very large
  numbers, nor numbers very close to 0 (ε → 0)
IEEE Floating Point
• IEEE Standard 754
  – Supported by all major CPUs
  – The IEEE standards committee consisted mostly of hardware people,
    plus a few academics led by W. Kahan at Berkeley.
• Main goals:
  – Consistent representation of floating point numbers by all machines.
  – Correctly rounded floating point operations.
  – Consistent treatment of exceptional situations such as division by zero.
Floating Point Representation
• Numerical form:
    (−1)^s × M × 2^E
  – Sign bit s determines whether the number is negative or positive
  – Significand M is a fractional value
  – Exponent E weights the value by a power of two
• Encoding
  – MSB s is the sign bit
  – exp field encodes E
  – frac field encodes M

  | s | exp | frac |
Precisions
• Single precision: 32 bits
    | s (1 bit) | exp (8 bits) | frac (23 bits) |
• Double precision: 64 bits
    | s (1 bit) | exp (11 bits) | frac (52 bits) |
• Extended precision: 80 bits (Intel only)
    | s (1 bit) | exp (15 bits) | frac (63 or 64 bits) |
Based on exp we have 3 encoding schemes:
• exp ≠ 00…0 and exp ≠ 11…1  →  normalized encoding
• exp = 00…0                 →  denormalized encoding
• exp = 11…1                 →  special value encoding
  – frac = 000…0
  – frac = something else
1. Normalized Encoding
• Condition: exp ≠ 000…0 and exp ≠ 111…1

• Exponent is: E = exp − Bias, where Bias = 2^(k−1) − 1 and k is the
  number of exponent bits
  – Single precision: E = exp − 127,  Range(E) = [−126, 127]
  – Double precision: E = exp − 1023, Range(E) = [−1022, 1023]

• Significand is: M = 1.xxx…x₂, where the x's are the frac bits
  – Range(M) = [1.0, 2.0 − ε)
  – Get an extra leading bit for free
Normalized Encoding Example
• Value: float F = 15213.0;
    15213₁₀ = 11101101101101₂
            = 1.1101101101101₂ × 2¹³

• Significand
    M    = 1.1101101101101₂
    frac = 11011011011010000000000₂

• Exponent
    E = exp − Bias = exp − 127 = 13
    → exp = 140 = 10001100₂

• Result:
    0 10001100 11011011011010000000000
    s exp      frac
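A small sketch (mine, not from the slides) that checks this encoding by reinterpreting the float's bits as an unsigned integer through a union, assuming 32-bit float and unsigned int:

#include <stdio.h>

int main(void) {
    union { float f; unsigned u; } v;
    v.f = 15213.0f;

    unsigned s    = v.u >> 31;             /* sign bit */
    unsigned exp  = (v.u >> 23) & 0xFF;    /* 8 exponent bits */
    unsigned frac = v.u & 0x7FFFFF;        /* 23 fraction bits */

    printf("s = %u, exp = %u (0x%x), frac = 0x%x\n", s, exp, exp, frac);
    /* Expected: s = 0, exp = 140 (0x8c), frac = 0x6db400 */
    return 0;
}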
2. Denormalized Encoding
(called subnormal in the revised standard, IEEE 754-2008)
• Condition: exp = 000…0

• Exponent value: E = 1 − Bias (instead of E = 0 − Bias)
• Significand is: M = 0.xxx…x₂ (instead of M = 1.xxx…x₂), where the x's
  are the frac bits
• Cases
  – exp = 000…0, frac = 000…0
    • Represents zero
    • Note distinct values: +0 and −0
  – exp = 000…0, frac ≠ 000…0
    • Numbers very close to 0.0
3. Special Values Encoding
• Condition: exp = 111…1

• Case: exp = 111…1, frac = 000…0
  – Represents value ∞ (infinity)
  – Result of an operation that overflows
  – E.g., 1.0/0.0 = −1.0/−0.0 = +∞,  1.0/−0.0 = −∞

• Case: exp = 111…1, frac ≠ 000…0
  – Not-a-Number (NaN)
  – Represents the case when no numeric value can be determined
  – E.g., sqrt(−1), ∞ − ∞, ∞ × 0
Visualization: Floating Point Encodings

  NaN | −∞ | −Normalized | −Denorm | −0 +0 | +Denorm | +Normalized | +∞ | NaN
Floating Point in C
• C:
  – float    single precision
  – double   double precision
• Conversions/Casting
  – Casting between int, float, and double changes the bit
    representation. Examples:
    – double/float → int
      • Truncates the fractional part
      • Not defined when out of range or NaN
    – int → double
      • Exact conversion
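A brief sketch (mine, not from the slides) illustrating these conversions, assuming 32-bit int:

#include <stdio.h>

int main(void) {
    double d = 3.99;
    int i = (int) d;             /* truncates toward zero -> 3 */

    int big = 2147483647;        /* INT_MAX */
    double back = (double) big;  /* exact: fits in double's 52-bit frac field */
    float f = (float) big;       /* may round: float has only 24 significand bits */

    printf("i    = %d\n", i);
    printf("back = %.1f\n", back);   /* 2147483647.0 */
    printf("f    = %.1f\n", f);      /* 2147483648.0 after rounding */
    return 0;
}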
Conclusions
• Everything is stored in memory as 1s and 0s
• The binary representation by itself does not carry a meaning;
  it depends on the interpretation.
• IEEE Floating Point has clear mathematical properties
  – Represents numbers as: (−1)^s × M × 2^E
