
Increasing Information per Bit

• Information in a source
– Mathematical Models of Sources
– Information Measures

• Compressing information
– Huffman encoding
• Optimal Compression for DMS?
– Lempel-Ziv-Welch Algorithm
• For Stationary Sources?
• Practical Compression
• Quantization of analog data
– Scalar Quantization
– Vector Quantization
– Model Based Coding
– Practical Quantization
• μ-law encoding
• Delta Modulation
• Linear Predictor Coding (LPC)
Huffman encoding
• Variable length binary code for DMS
– finite alphabet, fixed probabilities
• Code satisfies the Prefix Condition
– Codewords can be decoded instantaneously and unambiguously as they arrive
e.g., {0, 10, 110, 111} is OK
{0, 01, 011, 111} is not OK: a received 0111… could begin with 0, 01, or 011, so it cannot be decoded as it arrives
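A quick way to test the prefix condition is to check whether any codeword is a prefix of another. A minimal Python sketch (illustrative, not from the slides), applied to the two example code sets above:

```python
# Prefix-condition check: no codeword may be a prefix of another codeword.
def is_prefix_free(codewords):
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True  -> instantaneously decodable
print(is_prefix_free(["0", "01", "011", "111"]))  # False -> "0" is a prefix of "01" and "011"
```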
Huffman encoding
• Use Probabilities to order coding priorities of letters
– Low-probability letters get codewords first (more bits)
– This smooths out the information per bit
Huffman encoding
• Use a code tree to make the code
• Combine the two symbols with the lowest probabilities into a new block symbol
• Assign a 1 to one of the old symbols' codewords and a 0 to the other's
• Now reorder and combine the two lowest-probability symbols of the new set
• Each time the synthesized block symbol has the lowest probability, the
codewords get shorter (a construction sketch in code follows the table below)

Symbol   Codeword
x1       00
x2       01
x3       10
x4       110
x5       1110
x6       11110
x7       11111

[Code tree figure: intermediate nodes D0, D1, D2, D3, D4]
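The merge-and-relabel procedure above can be sketched with a priority queue. The probabilities below are taken from the seven-symbol example on the following slides; since Huffman codes are not unique, the 0/1 labels may differ from the tree shown here, but the codeword lengths (and hence the average length) match.

```python
# Minimal sketch of the code-tree construction: repeatedly merge the two
# lowest-probability (block) symbols and prepend a bit to each side.
import heapq

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # lowest probability
        p1, _, group1 = heapq.heappop(heap)   # second lowest
        # Assign 0 to one group's codewords and 1 to the other's
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

# Seven-symbol DMS from the example that follows
probs = {"x1": 0.35, "x2": 0.3, "x3": 0.2, "x4": 0.1,
         "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman_code(probs)
for sym in sorted(code):
    print(sym, code[sym])   # lengths: 2, 2, 2, 3, 4, 5, 5
```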
Huffman encoding
• Result:
– Self Information or Entropy
• H(X) = 2.11 (The best possible average number of bits)
– Average number of bits per letter
$\bar{R} = \sum_{k=1}^{7} n_k P(x_k) = 2.21$

$n_k$ = number of bits per symbol

So the efficiency $= \dfrac{H(X)}{\bar{R}} = \dfrac{2.11}{2.21} = 95.5\%$
Huffman encoding
• Let's compare to a simple 3-bit fixed-length code

Symbol   P(xk)   Huffman code   nk   3-bit code   nk
x1       0.35    00             2    000          3
x2       0.3     01             2    001          3
x3       0.2     10             2    010          3
x4       0.1     110            3    011          3
x5       0.04    1110           4    100          3
x6       0.005   11110          5    101          3
x7       0.005   11111          5    110          3

                 R = 2.21            R = 3
Efficiency       95%                 70%
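The numbers on the two slides above can be checked directly from the probabilities and codeword lengths in the table; a short verification sketch:

```python
# Quick check of H(X), the average lengths R, and the efficiencies above.
from math import log2

p = [0.35, 0.3, 0.2, 0.1, 0.04, 0.005, 0.005]
n_huffman = [2, 2, 2, 3, 4, 5, 5]     # codeword lengths from the table
n_fixed   = [3] * 7

H = -sum(pk * log2(pk) for pk in p)                      # ~2.11 bits/letter
R_huff  = sum(nk * pk for nk, pk in zip(n_huffman, p))   # ~2.21
R_fixed = sum(nk * pk for nk, pk in zip(n_fixed, p))     # 3.0

print(f"H(X)      = {H:.2f} bits")
print(f"R Huffman = {R_huff:.2f}, efficiency = {H / R_huff:.1%}")
print(f"R 3-bit   = {R_fixed:.2f}, efficiency = {H / R_fixed:.1%}")
```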
Huffman encoding
• Another example

Symbol   Codeword
x1       00
x2       010
x3       011
x4       100
x5       101
x6       110
x7       1110
x8       1111

[Code tree figure: intermediate nodes D0, D1, D2, D3]
Huffman encoding
• Multi-symbol Block codes
– Use symbols made of original symbols
$x_1 x_1,\, x_1 x_2,\, \ldots,\, x_7 x_7 \;\Rightarrow\; L^J = L^2$ block symbols

Can show the new code's average number of bits per original letter, $\bar{R}_J / J$, satisfies:

$H(X) \le \dfrac{\bar{R}_J}{J} < H(X) + \dfrac{1}{J}$

$\lim_{J \to \infty} \dfrac{\bar{R}_J}{J} = H(X)$

So a large enough block code gets you as close to H(X) as you want
Huffman encoding

• Let's consider a J = 2 block code example (a sketch in code follows below)
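A minimal sketch of the J = 2 idea, reusing the huffman_code function from the earlier sketch and the same seven-symbol DMS (pair probabilities are products because the source is memoryless). The exact codewords depend on tie-breaking, but the resulting bits per original letter land between H(X) and H(X) + 1/2, as the bound above says:

```python
# J = 2 block code: Huffman-code all 49 symbol pairs and compare
# bits per original letter to H(X).  Requires huffman_code from the
# earlier sketch in this handout.
from math import log2

p = {"x1": 0.35, "x2": 0.3, "x3": 0.2, "x4": 0.1,
     "x5": 0.04, "x6": 0.005, "x7": 0.005}

pairs = {(a, b): p[a] * p[b] for a in p for b in p}   # L^2 = 49 block symbols
code = huffman_code(pairs)                            # from the earlier sketch
R2 = sum(len(code[s]) * pairs[s] for s in pairs)      # bits per *pair*

H = -sum(pk * log2(pk) for pk in p.values())
print(f"H(X) = {H:.3f},  R2/2 = {R2 / 2:.3f} bits per original letter")
```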
Encoding Stationary Sources
• Now there are joint probabilities of blocks of symbols that depend on the previous symbols

$P(x_1 x_2 \cdots x_k) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1 x_2)\cdots P(x_k \mid x_1 \cdots x_{k-1})$
$\ne P(x_1)\,P(x_2)\,P(x_3)\cdots P(x_k)$ unless the source is a DMS

Can show the joint entropy is:

$H(X_1 X_2 \cdots X_k) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le \sum_{m=1}^{k} H(X_m)$

Which means fewer bits can be used than with a symbol-by-symbol code (see the numeric sketch below)
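The inequality can be illustrated numerically. The sketch below assumes a simple binary first-order Markov source with made-up transition probabilities (not from the slides) and computes H(X1X2) via the chain rule, showing it is less than H(X1) + H(X2):

```python
# Chain-rule illustration for a binary first-order Markov source.
# The transition probabilities below are assumed for the example only.
from math import log2

def H(dist):
    """Entropy (bits) of a probability distribution given as a list."""
    return -sum(p * log2(p) for p in dist if p > 0)

p1 = [0.5, 0.5]            # stationary distribution of X1 (assumed)
P_cond = [[0.9, 0.1],      # P(x2 | x1 = 0)
          [0.1, 0.9]]      # P(x2 | x1 = 1)

H_X1 = H(p1)
H_X2_given_X1 = sum(p1[i] * H(P_cond[i]) for i in range(2))
H_joint = H_X1 + H_X2_given_X1          # chain rule: H(X1X2) = H(X1) + H(X2|X1)

# Marginal of X2 (the chain is stationary, so it equals p1)
p2 = [sum(p1[i] * P_cond[i][j] for i in range(2)) for j in range(2)]

print(f"H(X1X2)       = {H_joint:.3f} bits")
print(f"H(X1) + H(X2) = {H_X1 + H(p2):.3f} bits  (>= H(X1X2) unless DMS)")
```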
Encoding Stationary Sources
• H(X | Y) is the conditional entropy
$H(X \mid Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i \mid y_j)\,P(y_j)\,\log_2 P(x_i \mid y_j)$

Here $P(x_i \mid y_j)\,P(y_j)$ is the joint probability of $x_i$ and $y_j$, and $-\log_2 P(x_i \mid y_j)$ is the information in $x_i$ given $y_j$.

Can show (using Bayes' rule, $P(x_i \mid y_j) = P(y_j \mid x_i)P(x_i)/P(y_j)$):

$H(X \mid Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} P(y_j \mid x_i)\,P(x_i)\,\log_2\!\left(\dfrac{P(x_i)\,P(y_j \mid x_i)}{P(y_j)}\right)$
Conditional Entropy
• Plotting this for n = m = 2 we see that when
Y depends strongly on X then H(X|Y) is low

[Plot: "Conditional Entropy H(X|Y) vs P(X=0) for various P(Y|X)", with curves for
P(Y=0|X=0) = P(Y=1|X=1) ∈ {0.05, 0.15, 0.35, 0.5, 0.75, 0.9999}]
Conditional Entropy
• To see how P(X|Y) and P(Y|X) relate consider:
• They are very similar when P(X=0) ≈ 0.5
[Plot: P(X=0|Y=0) vs P(X=0), with curves for
P(Y=0|X=0) ∈ {0.05, 0.15, 0.35, 0.5, 0.75, 0.95}]
Optimal Codes for Stationary Sources
• Can show that for large blocks of symbols
Huffman encoding is efficient
Define $H_J(X) = \dfrac{1}{J} H(X_1 X_2 \cdots X_J)$

Then the Huffman code gives: $H_J(X) \le \bar{R} < H_J(X) + \dfrac{1}{J}$

Now as $J \to \infty$:

$H(X) \le \bar{R} < H(X) + \epsilon$, with $\epsilon \to 0$ as $J \to \infty$, i.e., Huffman is optimal
Lempel-Ziv-Welch Code
• Huffman encoding is efficient but requires knowing the joint
probabilities of large blocks of symbols
• Finding joint probabilities is hard
• LZW is independent of source statistics
• Is a universal source code algorithm
• Is not optimal
Lempel-Ziv-Welch
• Build a table from strings not already in the table
• Output table location for strings in the table
• Build the table again to decode
Input String = /WED/WE/WEE/WEB/WET
Characters input   Code output   New code value   New string
/                  /             256              /W
W                  W             257              WE
E                  E             258              ED
D                  D             259              D/
/W                 256           260              /WE
E                  E             261              E/
/WE                260           262              /WEE
E/                 261           263              E/W
WE                 257           264              WEB
B                  B             265              B/
/WE                260           266              /WET
T                  T

Source: http://dogma.net/markn/articles/lzw/lzw.htm
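A minimal Python sketch of the encoder traced in the table above (illustrative only; practical LZW implementations add variable code widths and table-reset handling not shown here):

```python
# LZW compression: grow the longest match, emit its code when the match breaks,
# and add the broken string to the table (codes 256 and up).
def lzw_compress(data):
    table = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    string = ""
    output = []
    for ch in data:
        if string + ch in table:
            string += ch                      # keep extending the match
        else:
            output.append(table[string])      # emit code for longest match
            table[string + ch] = next_code    # add the new string to the table
            next_code += 1
            string = ch
    if string:
        output.append(table[string])
    return output

codes = lzw_compress("/WED/WE/WEE/WEB/WET")
print(codes)   # single characters appear as their ASCII codes; /W -> 256, etc.
```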
Lempel-Ziv-Welch
• Decode
Input Codes: / W E D 256 E 260 261 257 B 260 T

Input/      OLD_CODE   STRING/   CHARACTER   New table entry
NEW_CODE               Output

/                      /         /
W           /          W         W           256 = /W
E           W          E         E           257 = WE
D           E          D         D           258 = ED
256         D          /W        /           259 = D/
E           256        E         E           260 = /WE
260         E          /WE       /           261 = E/
261         260        E/        E           262 = /WEE
257         261        WE        W           263 = E/W
B           257        B         B           264 = WEB
260         B          /WE       /           265 = B/
T           260        T         T           266 = /WET
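And a matching sketch of the decoder, rebuilding the same table on the fly from the code stream shown above. The `old + old[0]` branch handles the one special case where a received code is not yet in the table (it does not occur in this example):

```python
# LZW decompression: rebuild the string table while decoding.
def lzw_decompress(codes):
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    old = table[codes[0]]
    result = [old]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            entry = old + old[0]            # code not yet in table (special case)
        result.append(entry)
        table[next_code] = old + entry[0]   # new entry: old string + first char of entry
        next_code += 1
        old = entry
    return "".join(result)

# The input codes from the decode table above
codes = [ord(c) for c in "/WED"] + [256, ord("E"), 260, 261, 257, ord("B"), 260, ord("T")]
print(lzw_decompress(codes))   # -> /WED/WE/WEE/WEB/WET
```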


Lempel-Ziv-Welch
• Typically takes hundreds of table entries
before compression occurs
• Some nasty patents make licensing an issue
