
CSCI-UA.0201
Computer Systems Organization


Bits and Bytes:
Data Representation
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com

Some slides adapted and modified from:
• Clark Barrett
• Jinyang Li
• Bryant and O'Hallaron
Bits and Bytes
• Representing information as bits
• How are bits manipulated?
• Integers
• Floating points
Our First Steps…
How do we represent data in a computer?
• How do we represent information using
electrical signals?
• At the lowest level, a computer is an
electronic machine.
• Easy to recognize two conditions:
– presence of a voltage – we call this state “1”
– absence of a voltage – we call this state “0”
Binary Representations

[Figure: a voltage waveform encoding the bit sequence 0 1 0; levels near 3.3V–2.8V are read as "1", levels near 0.5V–0.0V are read as "0"]
A Computer is a Binary Digital Machine
• Basic unit of information is the binary digit, or bit.
• Values with more than two states require multiple
bits.
– A collection of two bits has four possible states:
00, 01, 10, 11
– A collection of three bits has eight possible states:
000, 001, 010, 011, 100, 101, 110, 111
– A collection of n bits has 2^n possible states.
George Boole
• (1815-1864)
• English mathematician and
philosopher
• Inventor of Boolean Algebra
• Now we can use things like:
AND, OR, NOT, XOR,
XNOR, NAND, NOR, ….

Source: http://history-computer.com/ModernComputer/thinkers/Boole.html
Claude Shannon
• (1916–2001)
• American mathematician
and electronic engineer
• His work is the foundation
for using switches (i.e.
transistors), and hence
binary numbers, to
implement Boolean functions.

Source: http://history-computer.com/ModernComputer/thinkers/Shannon.html
…. Simply Speaking … 
So, we use transistors to implement logic gates.
Logic gates manipulate binary numbers to implement Boolean functions.
Boolean functions solve problems.
Encoding Byte Values
• Byte = 8 bits
  – Binary: 00000000₂ to 11111111₂
  – Decimal: 0₁₀ to 255₁₀
  – Hexadecimal: 00₁₆ to FF₁₆
• Base-16 number representation
  • Every 4 bits → 1 hexadecimal digit
  • Use characters '0' to '9' and 'A' to 'F'
• Write FA1D37B₁₆ in the C language as
  – 0xFA1D37B
  – 0xfa1d37b

  Hex  Decimal  Binary
  0    0        0000
  1    1        0001
  2    2        0010
  3    3        0011
  4    4        0100
  5    5        0101
  6    6        0110
  7    7        0111
  8    8        1000
  9    9        1001
  A    10       1010
  B    11       1011
  C    12       1100
  D    13       1101
  E    14       1110
  F    15       1111
Data Representations (sizes in bytes)

  C Data Type   Typical 32-bit   Intel IA32   x86-64
  char          1                1            1
  short         2                2            2
  int           4                4            4
  long          4                4            8
  long long     8                8            8
  float         4                4            4
  double        8                8            8
  pointer       4                4            8

The most widely used today is x86-64.

Byte Ordering
• How are bytes within a multi-byte word
ordered in memory?
• Conventions
– Big Endian: Sun, PPC, Internet
• Most significant byte has lowest address
– Little Endian: x86
• Most significant byte has highest address
Byte Ordering Example
• Big Endian
  – Most significant byte has lowest address
• Little Endian
  – Most significant byte has highest address
• Example
  – Variable x has 4-byte representation 0x01234567
  – Address given by &x is 0x100

  Big Endian      0x100  0x101  0x102  0x103
                  01     23     45     67

  Little Endian   0x100  0x101  0x102  0x103
                  67     45     23     01
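A minimal sketch (mine, not from the slides) showing how a program can test the host's byte order at run time by looking at the byte stored at the lowest address of a known 4-byte value:

#include <stdio.h>

int main(void) {
    unsigned int x = 0x01234567;
    unsigned char *first = (unsigned char *) &x;   /* byte at lowest address */

    if (*first == 0x67)
        printf("little endian\n");   /* least significant byte stored first */
    else if (*first == 0x01)
        printf("big endian\n");      /* most significant byte stored first */
    return 0;
}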
Reading Byte-Reversed Listings
• Disassembly
– given the binary file, get the assembly

• Example Fragment
Address Instruction Code Assembly Rendition
8048365: 5b pop %ebx
8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx
804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

• Deciphering Numbers
– Value: 0x12ab
– Pad to 32 bits (int is 4 bytes): 0x000012ab
– Split into bytes: 00 00 12 ab
– Reverse (little endian): ab 12 00 00
Examining Data Representations
• Code to print Byte Representation of
data

void show_bytes(unsigned char *start, int len) {
    int i;
    for (i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *) (start + i), start[i]);
    printf("\n");
}

printf directives:
%p: Print pointer
%x: Print Hexadecimal
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((unsigned char *) &a, sizeof(int));

Result (Linux):
int a = 15213;
0x11ffffcb8 0x6d
0x11ffffcb9 0x3b
0x11ffffcba 0x00
0x11ffffcbb 0x00

Note: 15213 in decimal is 3B6D in hexadecimal


Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
int A = 15213;                      long int C = 15213;

(bytes listed from lowest to highest address)

         IA32, x86-64   Sun                  IA32   x86-64   Sun
         6D             00                   6D     6D       00
         3B             00                   3B     3B       00
         00             3B                   00     00       3B
         00             6D                   00     00       6D
                                                    00
                                                    00
                                                    00
                                                    00
Representing Strings
• Strings in C:  char S[6] = "18243";
  – Represented by an array of characters
  – Each character encoded in ASCII format
    • Standard 7-bit encoding of character set
    • Character '0' has code 0x30
      – Digit i has code 0x30 + i
  – String should be null-terminated
• Byte ordering not an issue

         Little Endian   Big Endian
  '1'    31              31
  '8'    38              38
  '2'    32              32
  '4'    34              34
  '3'    33              33
  '\0'   00              00

Byte ordering is an issue for a single data item.
An array is a group of data items, so its bytes appear in the same order either way.
How to Manipulate Bits?
Boolean Algebra
• Developed by George Boole in the 19th century
  – Algebraic representation of logic
• Encode "True" as 1 and "False" as 0

And: A&B = 1 when both A=1 and B=1
  A B  A&B
  0 0   0
  0 1   0
  1 0   0
  1 1   1

Or: A|B = 1 when either A=1 or B=1
  A B  A|B
  0 0   0
  0 1   1
  1 0   1
  1 1   1

Not: ~A = 1 when A=0
  A  ~A
  0   1
  1   0

Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
  A B  A^B
  0 0   0
  0 1   1
  1 0   1
  1 1   0
Application of Boolean Algebra
• Applied to Digital Systems by Claude
Shannon
– 1937 MIT Master’s Thesis
– Reason about networks of relay switches
• Encode closed switch as 1, open switch as 0

[Image: a transistor]
General Boolean Algebras
• Operate on bit vectors (e.g. an integer is a bit vector of 4 bytes = 32 bits)
  – Operations applied bitwise

    01101001     01101001     01101001
  & 01010101   | 01010101   ^ 01010101   ~ 01010101
    01000001     01111101     00111100     10101010
Bit-Level Operations in C
• Operations &, |, ~, ^ Available in C
– Apply to any “integral” data type
• long, int, short, char, unsigned
• Examples (char data type)
  – ~0x41 = 0xBE
    • ~01000001₂ = 10111110₂
  – ~0x00 = 0xFF
    • ~00000000₂ = 11111111₂
  – 0x69 & 0x55 = 0x41
    • 01101001₂ & 01010101₂ = 01000001₂
  – 0x69 | 0x55 = 0x7D
    • 01101001₂ | 01010101₂ = 01111101₂
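A small C sketch (mine, not from the slides) that reproduces the examples above; the casts back to unsigned char are needed because ~ and the other operators promote their operands to int, and the cast keeps only the low 8 bits:

#include <stdio.h>

int main(void) {
    unsigned char a = 0x41, x = 0x69, y = 0x55;

    printf("~0x%02x        = 0x%02x\n", a, (unsigned char) ~a);        /* 0xbe */
    printf("0x%02x & 0x%02x = 0x%02x\n", x, y, (unsigned char)(x & y)); /* 0x41 */
    printf("0x%02x | 0x%02x = 0x%02x\n", x, y, (unsigned char)(x | y)); /* 0x7d */
    printf("0x%02x ^ 0x%02x = 0x%02x\n", x, y, (unsigned char)(x ^ y)); /* 0x3c */
    return 0;
}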
Contrast: Logic Operations in C
• Contrast to Logical Operators
&&, ||, !
• View 0 as “False”
• Anything nonzero as “True”
• Always return 0 or 1
• Examples (char data type)
– !0x41 = 0x00
– !0x00 = 0x01
– !!0x41 = 0x01

– 0x69 && 0x55 = 0x01


– 0x69 || 0x55 = 0x01
– p && *p (avoids null pointer access)
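A short sketch (mine, not from the slides; the helper name is made up) of the p && *p idiom: && evaluates left to right and stops as soon as the result is known, so *p is never dereferenced when p is NULL:

#include <stdio.h>

int safe_nonzero(int *p) {
    /* && short-circuits: if p is NULL, *p is never evaluated. */
    return p && *p;
}

int main(void) {
    int v = 42;
    printf("%d\n", safe_nonzero(&v));    /* prints 1 */
    printf("%d\n", safe_nonzero(NULL));  /* prints 0, no crash */
    return 0;
}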
Boolean in C
• Did not exist in standard C89/90
• It was introduced in C99 standard
• You may need to use the following switch
with gcc:
gcc -std=c99 …

#include <stdbool.h>
bool x;
x = false;   /* note: lower case */
x = true;
Shift Operations
• Left Shift: x << y
  – Shift x left by y positions
    • Throw away extra bits on the left
    • Fill with 0's on the right
• Right Shift: x >> y
  – Shift x right by y positions
    • Throw away extra bits on the right
  – Type 1: Logical shift
    • Fill with 0's on the left
  – Type 2: Arithmetic shift (covered later)
    • Replicate the most significant bit on the left
• Undefined behavior
  – Shift amount < 0 or ≥ size of x

  Argument x    01100010        Argument x    10100010
  << 3          00010000        << 3          00010000
  Log.  >> 2    00011000        Log.  >> 2    00101000
  Arith. >> 2   00011000        Arith. >> 2   11101000
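A hedged C sketch (mine, not from the slides) reproducing the tables above. Note two assumptions: operands narrower than int are promoted before shifting, so results are masked back to 8 bits; and right-shifting a negative signed value is implementation-defined, though common compilers use an arithmetic shift:

#include <stdio.h>

int main(void) {
    unsigned char ux = 0xA2;                  /* 1010 0010 */
    signed char   sx = (signed char) 0xA2;

    printf("ux << 3 : 0x%02x\n", (ux << 3) & 0xFF);   /* 0x10 */
    printf("ux >> 2 : 0x%02x\n", ux >> 2);            /* logical shift: 0x28 */
    printf("sx >> 2 : 0x%02x\n", (sx >> 2) & 0xFF);   /* typically arithmetic: 0xe8 */
    return 0;
}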
How do we represent integers?
(unsigned and signed)
Two Types of Integers
• Unsigned
– positive numbers and 0
• Signed numbers
– negative numbers as well as positive
numbers and 0
Unsigned Integers

  B2U(X) = Σ (i = 0 to w−1)  x_i · 2^i

Example:
  bits     1    0    1    1    1    0    1    1
  weights  128  64   32   16   8    4    2    1

  B2U = 128 + 32 + 16 + 8 + 2 + 1 = 187
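A small sketch (mine, not from the slides; the function name is made up) that computes B2U for a string of '0'/'1' characters, most significant bit first:

#include <stdio.h>

unsigned b2u(const char *bits) {
    unsigned value = 0;
    for (const char *p = bits; *p; p++)
        value = value * 2 + (*p - '0');   /* shift in one bit at a time */
    return value;
}

int main(void) {
    printf("%u\n", b2u("10111011"));   /* prints 187 */
    return 0;
}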
Unsigned Integers
• An n-bit unsigned integer represents 2^n values:
  from 0 to 2^n − 1.

  2² 2¹ 2⁰   value
  0  0  0    0
  0  0  1    1
  0  1  0    2
  0  1  1    3
  1  0  0    4
  1  0  1    5
  1  1  0    6
  1  1  1    7
Unsigned Binary Arithmetic
• Base-2 addition – just like base-10
  – add from right to left, propagating the carry

    10010      10010       1111      10111
  +  1001    +  1011     +    1    +   111
    11011      11101      10000     (exercise)
What About Negative Numbers?
People have tried several options:
Sign Magnitude      One's Complement    Two's Complement
000 = +0 000 = +0 000 = +0
001 = +1 001 = +1 001 = +1
010 = +2 010 = +2 010 = +2
011 = +3 011 = +3 011 = +3
100 = -0 100 = -3 100 = -4
101 = -1 101 = -2 101 = -3
110 = -2 110 = -1 110 = -2
111 = -3 111 = -0 111 = -1

• Issues: balance, number of zeros, ease of operations


• Which one is best? Why?
Signed Integers
• With n bits, we have 2^n distinct values.
– assign about half to positive integers and
about half to negative
• Positive integers
– just like unsigned: zero in most significant
(MS) bit
00101 = 5
• Negative integers
– In two’s complement form
In general: a 0 at the MS bit indicates positive
and a 1 indicates negative.
Two’s Complement
• Two’s complement representation
developed to make circuits easy for
arithmetic.
– for each positive number (X), assign value to
its negative (-X),
such that X + (-X) = 0 with “normal” addition,
ignoring carry out

    00101 (5)        01001 (9)
  + 11011 (-5)     + 10111 (-9)
    00000 (0)        00000 (0)
Two’s Complement Signed Integers
• MS bit is the sign bit.
• Range of an n-bit number: −2^(n−1) through 2^(n−1) − 1.
  – The most negative number (−2^(n−1)) has no positive counterpart.

  -2³ 2² 2¹ 2⁰  value     -2³ 2² 2¹ 2⁰  value
   0  0  0  0    0         1  0  0  0    -8
   0  0  0  1    1         1  0  0  1    -7
   0  0  1  0    2         1  0  1  0    -6
   0  0  1  1    3         1  0  1  1    -5
   0  1  0  0    4         1  1  0  0    -4
   0  1  0  1    5         1  1  0  1    -3
   0  1  1  0    6         1  1  1  0    -2
   0  1  1  1    7         1  1  1  1    -1
Converting Binary (2’s C) to Decimal
1. If the MS bit is one (i.e. the number is negative), take the two’s
   complement to get a positive number.
2. Get the decimal value as if the number were unsigned (using powers of 2).
3. If the original number was negative, add a minus sign.

  n    2^n
  0    1
  1    2
  2    4
  3    8
  4    16
  5    32
  6    64
  7    128
  8    256
  9    512
  10   1024
Examples
X  = 00100111₂
   = 2⁵ + 2² + 2¹ + 2⁰ = 32 + 4 + 2 + 1
   = 39₁₀

X  = 11100110₂
-X = 00011010
   = 2⁴ + 2³ + 2¹ = 16 + 8 + 2
   = 26₁₀
 X = -26₁₀
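A minimal C sketch (mine, not from the slides; the function name is made up) that follows the three conversion steps above and reproduces these two examples for 8-bit patterns:

#include <stdio.h>

/* Interpret the low 8 bits of 'bits' as an 8-bit two's complement number. */
int tc8_to_decimal(unsigned char bits) {
    if (bits & 0x80) {                                   /* step 1: MS bit set => negative */
        unsigned char pos = (unsigned char)(~bits + 1);  /* two's complement */
        return -(int) pos;                               /* steps 2 and 3 */
    }
    return (int) bits;                                   /* non-negative: read as unsigned */
}

int main(void) {
    printf("%d\n", tc8_to_decimal(0x27));    /* 00100111 ->  39 */
    printf("%d\n", tc8_to_decimal(0xE6));    /* 11100110 -> -26 */
    return 0;
}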
Numeric Ranges
Example: Assume 16-bit numbers

                Decimal    Hex      Binary
  Unsigned Max   65535     FF FF    11111111 11111111
  Signed Max     32767     7F FF    01111111 11111111
  Signed Min    -32768     80 00    10000000 00000000
  -1                -1     FF FF    11111111 11111111
   0                 0     00 00    00000000 00000000
Values for Different Sizes

               w = 8    w = 16     w = 32           w = 64
  Unsig. Max   255      65,535     4,294,967,295    18,446,744,073,709,551,615
  Signed Max   127      32,767     2,147,483,647    9,223,372,036,854,775,807
  Signed Min   -128     -32,768    -2,147,483,648   -9,223,372,036,854,775,808

• C Programming
  #include <limits.h>
  Declares constants, e.g.,
    INT_MAX
    LONG_MAX
    INT_MIN
    UINT_MAX
    …
What happens if you change the
type of a variable
(aka type casting)?
Signed vs. Unsigned in C
• Constants
– By default, signed integers
– Unsigned with “U” as suffix
0U, 4294967259U
• Casting
– Explicit casting between signed & unsigned
int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;

– Implicit casting also occurs via assignments and


procedure calls
tx = ux;
uy = ty;
General Rule for Casting:
signed <-> unsigned
Follow these two steps:
1. Keep the bit representation
2. Re-interpret

Effect:
• Numerical value may change.
• Bit pattern stays the same.
Mapping Signed ↔ Unsigned

  Bits   Signed   Unsigned
  0000     0        0
  0001     1        1
  0010     2        2
  0011     3        3
  0100     4        4
  0101     5        5
  0110     6        6
  0111     7        7
  1000    -8        8
  1001    -7        9
  1010    -6       10
  1011    -5       11
  1100    -4       12
  1101    -3       13
  1110    -2       14
  1111    -1       15

For 0000–0111 the two interpretations are equal (=);
for 1000–1111 they differ by 16 (+/− 2⁴).
Casting Surprises
• Expression Evaluation
  – If there is a mix of unsigned and signed in a single expression,
    signed values are implicitly cast to unsigned
  – Including comparison operations <, >, ==, <=, >=

If an expression mixes several types, the compiler follows these rules.
Example
#include <stdio.h>

int main() {
    int i = -7;
    unsigned j = 5;

    if (i > j)                 /* Condition is TRUE! i is converted to unsigned */
        printf("Surprise!\n");
    return 0;
}
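One way to avoid the surprise (my sketch, not from the slides) is to cast explicitly so the comparison happens in the signed domain; this is safe here because the unsigned value fits in an int:

#include <stdio.h>

int main(void) {
    int i = -7;
    unsigned j = 5;

    if (i > (int) j)           /* both sides signed: -7 > 5 is false */
        printf("Surprise!\n");
    else
        printf("No surprise: %d < %u\n", i, j);
    return 0;
}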
Expanding & Truncating a Variable

Expanding
• Convert a w-bit signed integer to w+k bits with the same value
• Convert unsigned: pad with k 0 bits in front
• Convert signed: make k copies of the sign bit

[Figure: the original w bits of X are kept; k copies of the pad bit are
prepended to form the (k + w)-bit result]
Sign Extension Example
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

Decimal Hex Binary


x 15213 3B 6D 00111011 01101101
ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101
y -15213 C4 93 11000100 10010011
iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011

• Converting from a smaller to a larger integer data type
• C automatically performs sign extension
Truncating
• Example: from int to short (i.e. from
32-bit to 16-bit)
• High-order bits are truncated
• Value is altered → must reinterpret
• This non-intuitive behavior can lead to
  buggy code! So don’t do it!
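A small C sketch (mine, not from the slides) showing both directions with the slide's values; the exact truncated result assumes the usual 32-bit int and 16-bit short sizes:

#include <stdio.h>

int main(void) {
    short x = 15213, y = -15213;
    int ix = x;                  /* expanding: sign extension, value preserved */
    int iy = y;

    int big = 0x12345678;
    short t = (short) big;       /* truncating: keeps the low 16 bits -> 0x5678 */

    printf("ix = %d, iy = %d\n", ix, iy);    /* 15213, -15213 */
    printf("t  = 0x%hx (%d)\n", t, t);       /* 0x5678 (22136) */
    return 0;
}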
Addition, negation, multiplication,
and shifting
Negation: Complement & Increment
• The complement of x satisfies
Two’sComp(x) + x = 0
Two’sComp(x) = ~x + 1

• Proof sketch
– Observation: ~x + x == 1111…111 == -1
 ~x + x + 1 = 0
 (~x + 1) + x = 0
 Two’sComp(x) + x = 0

     x   1 0 0 1 1 1 0 1
  + ~x   0 1 1 0 0 0 1 0
    -1   1 1 1 1 1 1 1 1
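A quick sketch (mine, not from the slides) checking the identity on a concrete value:

#include <stdio.h>

int main(void) {
    int x = 15213;
    printf("-x     = %d\n", -x);        /* -15213 */
    printf("~x + 1 = %d\n", ~x + 1);    /* -15213: same bit pattern as -x */
    printf("~x + x = %d\n", ~x + x);    /* -1, i.e. all ones */
    return 0;
}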
Unsigned Addition
  Operands: w bits          u, v
  True sum: w+1 bits        u + v
  Discard carry: w bits     UAdd_w(u, v)

Two’s Complement Addition
  Operands: w bits          u, v
  True sum: w+1 bits        u + v
  Discard carry: w bits     TAdd_w(u, v)

  – If sum ≥ 2^(w−1), the result becomes negative (positive overflow)
  – If sum < −2^(w−1), the result becomes positive (negative overflow)
Multiplication
• Exact product of w-bit numbers x, y
  – Either signed or unsigned
• Ranges
  – Unsigned: 0 ≤ x * y ≤ (2^w − 1)² = 2^(2w) − 2^(w+1) + 1
  – Two’s complement min: x * y ≥ (−2^(w−1)) * (2^(w−1) − 1) = −2^(2w−2) + 2^(w−1)
  – Two’s complement max: x * y ≤ (−2^(w−1))² = 2^(2w−2)
Power-of-2 Multiply with Shift
• Operation
  – u << k gives u * 2^k
  – Both signed and unsigned
• Examples
  – u << 3 == u * 8
  – (u << 5) – (u << 3) == u * 24
  – Most machines shift and add faster than they multiply
    • The compiler generates this code automatically
Compiled Multiplication Code

C Function:
int mul12(int x)
{
    return x*12;
}

Compiled Arithmetic Operations         Explanation
leal (%eax,%eax,2), %eax               t = x + x*2
sall $2, %eax                          return t << 2;

• The C compiler automatically generates shift/add
  code when multiplying by a constant
Unsigned Power-of-2 Divide with Shift
• Quotient of unsigned by power of 2
  – u >> k gives ⌊ u / 2^k ⌋

Examples:
  Division   Exact         Computed   Hex     Binary
  x          15213         15213      3B 6D   00111011 01101101
  x >> 1     7606.5        7606       1D B6   00011101 10110110
  x >> 4     950.8125      950        03 B6   00000011 10110110
  x >> 8     59.4257813    59         00 3B   00000000 00111011
Compiled Unsigned Division Code

C Function:
unsigned udiv8(unsigned x)
{
    return x/8;
}

Compiled Arithmetic Operations           Explanation
shrl $3, %eax      # Logical shift       return x >> 3;

• Uses logical shift for unsigned
• For Java users
  – Logical shift is written as >>>
Signed Power-of-2 Divide with Shift
• Quotient of signed by power of 2
  – x >> k gives ⌊ x / 2^k ⌋ (rounds toward −∞)
  – Uses arithmetic shift

Examples:
  Division   Exact         Computed   Hex     Binary
  y          -15213        -15213     C4 93   11000100 10010011
  y >> 1     -7606.5       -7607      E2 49   11100010 01001001
  y >> 4     -950.8125     -951       FC 49   11111100 01001001
  y >> 8     -59.4257813   -60        FF C4   11111111 11000100
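A short sketch (mine, not from the slides) contrasting the arithmetic shift with C's / operator, which rounds toward zero; note that whether >> on a negative int is an arithmetic shift is implementation-defined, though it is on common compilers:

#include <stdio.h>

int main(void) {
    int y = -15213;
    printf("y >> 1 = %d\n", y >> 1);   /* -7607: floor(-7606.5), arithmetic shift */
    printf("y / 2  = %d\n", y / 2);    /* -7606: C division truncates toward zero */
    return 0;
}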
Floating Points
Some slides and information about FP are adapted from
Prof. Michael Overton's book:
Numerical Computing with IEEE Floating Point Arithmetic
Turing Award 1989 to William Kahan for design of the
IEEE Floating Point Standards 754 (binary) and 854
(decimal)
Background: Fractional Binary Numbers
• What is 1011.101₂?
Background: Fractional Binary Numbers

  b_i  b_(i-1)  ···  b_2  b_1  b_0 . b_(-1)  b_(-2)  b_(-3)  ···  b_(-j)

  Bits to the left of the binary point have weights 2^i, …, 4, 2, 1;
  bits to the right have weights 1/2, 1/4, 1/8, …, 2^(-j).

• Value:  Σ (k = −j to i)  b_k × 2^k
Fractional Binary Numbers: Examples

  Value    Representation
  5 3/4    101.11₂
  2 7/8    10.111₂
Why not fractional binary numbers?
• Not efficient
  – 3 × 2^100 would be written as 11 followed by 100 zeros
• Given a finite length (e.g. 32 bits), we cannot represent very large
  numbers, nor numbers very close to 0 (ε → 0)
IEEE Floating Point
• IEEE Standard 754
  – Supported by all major CPUs
  – The IEEE standards committee consisted mostly of hardware people,
    plus a few academics led by W. Kahan at Berkeley.
• Main goals:
  – Consistent representation of floating point numbers by all machines.
  – Correctly rounded floating point operations.
  – Consistent treatment of exceptional situations such as division by zero.
Floating Point Representation
• Numerical form:
    (−1)^s × M × 2^E
  – Sign bit s determines whether the number is negative or positive
  – Significand M is a fractional value
  – Exponent E weights the value by a power of two
• Encoding
  – MSB s is the sign bit
  – exp field encodes E
  – frac field encodes M

  | s | exp | frac |
Precisions
• Single precision: 32 bits
    | s (1 bit) | exp (8 bits) | frac (23 bits) |
• Double precision: 64 bits
    | s (1 bit) | exp (11 bits) | frac (52 bits) |
• Extended precision: 80 bits (Intel only)
    | s (1 bit) | exp (15 bits) | frac (63 or 64 bits) |
Based on exp we have 3 encoding schemes:
• exp ≠ 00…0 and exp ≠ 11…1  →  normalized encoding
• exp = 00…0                 →  denormalized encoding
• exp = 11…1                 →  special value encoding
  – frac = 000…0
  – frac = something else
1. Normalized Encoding
• Condition: exp ≠ 000…0 and exp ≠ 111…1

• Exponent is: E = exp − Bias, where Bias = 2^(k−1) − 1 and k is the
  number of exponent bits
  – Single precision: E = exp − 127,  Range(E) = [−126, 127]
  – Double precision: E = exp − 1023, Range(E) = [−1022, 1023]

• Significand is: M = 1.xxx…x₂, where the x's are the frac bits
  – Range(M) = [1.0, 2.0 − ε)
  – Get an extra leading bit for free
Normalized Encoding Example
• Value: float F = 15213.0;
    15213₁₀ = 11101101101101₂
            = 1.1101101101101₂ × 2¹³

• Significand
    M    = 1.1101101101101₂
    frac = 11011011011010000000000₂

• Exponent
    E = exp − Bias = exp − 127 = 13
    → exp = 140 = 10001100₂

• Result:
    0 10001100 11011011011010000000000
    s exp      frac
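A small sketch (mine, not from the slides) that checks this encoding by reinterpreting the float's bits as an unsigned integer through a union, assuming 32-bit float and unsigned int:

#include <stdio.h>

int main(void) {
    union { float f; unsigned u; } v;
    v.f = 15213.0f;

    unsigned s    = v.u >> 31;             /* sign bit */
    unsigned exp  = (v.u >> 23) & 0xFF;    /* 8 exponent bits */
    unsigned frac = v.u & 0x7FFFFF;        /* 23 fraction bits */

    printf("s = %u, exp = %u (0x%x), frac = 0x%x\n", s, exp, exp, frac);
    /* Expected: s = 0, exp = 140 (0x8c), frac = 0x6db400 */
    return 0;
}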
2. Denormalized Encoding
(called subnormal in the revised standard, IEEE 754-2008)
• Condition: exp = 000…0

• Exponent value: E = 1 − Bias (instead of E = 0 − Bias)
• Significand is: M = 0.xxx…x₂ (instead of M = 1.xxx…x₂), where the x's
  are the frac bits
• Cases
  – exp = 000…0, frac = 000…0
    • Represents zero
    • Note distinct values: +0 and −0
  – exp = 000…0, frac ≠ 000…0
    • Numbers very close to 0.0
3. Special Values Encoding
• Condition: exp = 111…1

• Case: exp = 111…1, frac = 000…0
  – Represents value ∞ (infinity)
  – Result of an operation that overflows
  – E.g., 1.0/0.0 = −1.0/−0.0 = +∞,  1.0/−0.0 = −∞

• Case: exp = 111…1, frac ≠ 000…0
  – Not-a-Number (NaN)
  – Represents the case when no numeric value can be determined
  – E.g., sqrt(−1), ∞ − ∞, ∞ × 0
Visualization: Floating Point Encodings

  NaN | −∞ | −Normalized | −Denorm | −0 +0 | +Denorm | +Normalized | +∞ | NaN
Floating Point in C
• C:
  – float    single precision
  – double   double precision
• Conversions/Casting
  – Casting between int, float, and double changes the bit
    representation. Examples:
    – double/float → int
      • Truncates the fractional part
      • Not defined when out of range or NaN
    – int → double
      • Exact conversion
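A brief sketch (mine, not from the slides) illustrating these conversions, assuming 32-bit int:

#include <stdio.h>

int main(void) {
    double d = 3.99;
    int i = (int) d;             /* truncates toward zero -> 3 */

    int big = 2147483647;        /* INT_MAX */
    double back = (double) big;  /* exact: fits in double's 52-bit frac field */
    float f = (float) big;       /* may round: float has only 24 significand bits */

    printf("i    = %d\n", i);
    printf("back = %.1f\n", back);   /* 2147483647.0 */
    printf("f    = %.1f\n", f);      /* 2147483648.0 after rounding */
    return 0;
}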
Conclusions
• Everything is stored in memory as 1s and 0s
• The binary representation by itself does not carry a meaning;
  it depends on the interpretation.
• IEEE Floating Point has clear mathematical properties
  – Represents numbers as: (−1)^s × M × 2^E
