Communication Systems

Technology Embedded in Daily Life

VSP 2019, Dr. Lutz Lampe & Dr. Paul Lusina


How compact can you get?

Coding theorem for a single source symbol:


Consider a source X whose symbols are represented by a
binary prefix-free code. Then (1) the average codeword length
satisfies
H(X) ≤ Lav bit/symbol
and (2) there exists a code for which
Lav < H(X) + 1 bit/symbol.
How compact can you get?

H(X) ≤ Lav < H(X) + 1 bit/symbol

1) The lower bound follows from the Kraft inequality: Σi 2^(-li) ≤ 1

2) Note: H(X) = -Σi pi log2(pi) and Lav = Σi pi li

Hence, if li = -log2(pi) then Lav = H(X). So these are the
optimum codeword lengths.

3) Upper bound: Choose li = ⌈-log2(pi)⌉ and construct a
prefix-free code
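A small numerical illustration of 2) and 3) (a sketch with a made-up distribution, not from the slides): compute the lengths li = ⌈-log2(pi)⌉, check the Kraft inequality, and verify H(X) ≤ Lav < H(X) + 1.

```python
import math

# Made-up example distribution (not from the slides)
p = [0.5, 0.25, 0.15, 0.1]

# Shannon code lengths: l_i = ceil(-log2(p_i)), as in step 3)
lengths = [math.ceil(-math.log2(pi)) for pi in p]

H = -sum(pi * math.log2(pi) for pi in p)           # entropy in bit/symbol
L_av = sum(pi * li for pi, li in zip(p, lengths))  # average codeword length
kraft = sum(2.0 ** -li for li in lengths)          # Kraft sum

print("lengths:", lengths)
print("Kraft sum:", kraft)                         # <= 1, so a prefix-free code exists
print(f"H = {H:.3f} <= L_av = {L_av:.3f} < H + 1 = {H + 1:.3f}")
```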
Average word length
Example: C = {00, 10, 11, 011}
with probabilities p1, …, p4

[Figure: binary code tree for C with branch labels 0/1; leaves carry p1, …, p4 and the internal (parent) nodes carry sums such as p1 + p4 and p2 + p3]

The average codeword length is equal to the sum of the
probabilities of all parent nodes in the tree, including the root
node.
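A quick worked check for C = {00, 10, 11, 011}, using the tree implied by the codewords: Lav = 2p1 + 2p2 + 2p3 + 3p4 = (p1 + p2 + p3 + p4) + (p1 + p4) + (p2 + p3) + p4, i.e. the root plus the parent nodes reached by the prefixes 0, 1 and 01.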
Optimal codes: Preliminaries
Fact 1: There is no unused leaf in a binary tree of an optimal
binary prefix-free code.

Imagine there was one … then we could delete it and move its sibling subtree up one level, shortening codewords (so the code was not optimal).
Optimal codes: Preliminaries
Fact 2: There exists an optimal binary prefix-free code such that
the two least likely codewords differ only in the last digit,
i.e., they are siblings.

The least likely symbols get the longest codewords: for p1 ≥ p2 ≥ … ≥ pr the lengths satisfy l1 ≤ l2 ≤ … ≤ lr

[Figure: Option 1 and Option 2, two placements of the leaves pr-1 and pr at the deepest level of the tree]
Optimal codes: Huffman code
Facts 1 + 2: Start from leaves and build towards root as follows

Step 1 Create r leaf nodes and assign probabilities p1, …, pr.
Mark those nodes as active.

Step 2 Create a parent node to the two least likely active nodes,
and assign it the corresponding probability.
Activate this new node and deactivate the children nodes.
Step 3 If there is only one active node left, it becomes the root.
Otherwise, go to Step 2.
Optimal codes: Huffman code
Example: AX = {A, B, C, D}
pA = 0.2, pB = 0.5, pC = 0.2, pD = 0.1

L = 1, r = 4 codewords with probabilities
p1 = 0.2, p2 = 0.5, p3 = 0.2, p4 = 0.1

Huffman code: [Figure: Huffman tree built by repeatedly merging the two least likely nodes]

Lav = 1.8 bit/symbol
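A minimal sketch of Steps 1-3 applied to this example (heap-based; the function and variable names are mine, and tie-breaking may produce different but equally optimal codeword lengths):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Huffman construction (Steps 1-3): return the codeword length of each symbol."""
    counter = itertools.count()  # tie-breaker so the heap never compares the symbol lists
    # Step 1: one active leaf node per symbol: (probability, tie, symbols in this subtree)
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:                     # Step 3: stop when only the root is active
        p1, _, s1 = heapq.heappop(heap)      # Step 2: take the two least likely active nodes,
        p2, _, s2 = heapq.heappop(heap)      # create their parent and deactivate them
        for i in s1 + s2:                    # every leaf below the new parent gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths

probs = [0.2, 0.5, 0.2, 0.1]                 # pA, pB, pC, pD
lengths = huffman_lengths(probs)
L_av = sum(p * l for p, l in zip(probs, lengths))
print(lengths, L_av)                         # L_av = 1.8 bit/symbol (lengths depend on tie-breaking)
```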
For a large file encoded using the following Huffman code, what is the
probability of randomly selecting a 1?

Symbol Probability Code


a 1/2 0
b 1/4 10
c 1/8 110
d 1/8 111

A p1>1/2
B p1=1/2
C p1<1/2
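One way to check the answer (a worked calculation, not shown on the slide): count the expected number of 1s per source symbol and divide by the expected codeword length.

E[# of 1s per symbol] = (1/2)·0 + (1/4)·1 + (1/8)·2 + (1/8)·3 = 7/8
Lav = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4
p1 = (7/8) / (7/4) = 1/2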
Where Symbol (Huffman) Codes Fail (I)

Entropy:
H = Σ_{i=1..6} pi log2(1/pi)
  = 0.95 · log2(1/0.95) + 5 · 0.01 · log2(1/0.01)
  = 0.4025 bits

Expected length:
L(C, X) = Σ_{i=1..6} pi li
  = 0.95 · 1 + 3 · (0.01 · 3) + 2 · (0.01 · 4)
  = 1.12 bits

The 1 bit 'overhead' on symbol codes can be costly for distributions with
entropy << 1
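A quick numerical check of these figures (probabilities from the slide: one symbol with p = 0.95 and five with p = 0.01; the codeword lengths 1, 3, 3, 3, 4, 4 are a valid Huffman assignment consistent with the calculation above):

```python
import math

p = [0.95] + [0.01] * 5          # probabilities from the slide
lengths = [1, 3, 3, 3, 4, 4]     # Huffman codeword lengths used in the calculation above

H = sum(pi * math.log2(1 / pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))
print(f"H = {H:.4f} bits, L = {L:.2f} bits")   # ~0.4025 vs 1.12: large per-symbol overhead
```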
Where Symbol (Huffman) Codes Fail (II)

Predict the next symbol in the following sequences:


1. abacbb___
2. req____
3. We delivered it at his req _ _ _ _

Symbol codes cannot exploit redundancy (context) in the overall


message.
Where Symbol (Huffman) Codes Fail (III)

Predict the next symbol in the following sequences:

1. I am twice as ha_ _ _ _ _ _ _ _.

2. Ich bin zwei mal glück _ _ _ _ _ _ _ _ _ _ _ _.

Symbol codes cannot easily adapt if the symbol probabilities
change, e.g. letter frequencies differ between English and German.
Stream Codes

Use patterns or data context to compress multiple symbols of a message.


Lempel-Ziv codes:
- Identify patterns and replace them with binary indices.
- For infinitely large message length N, the compression rate approaches the source entropy.
- Agnostic of the source probabilities, i.e. the receiver need only know the
encoding algorithm and not the symbol probabilities.

Arithmetic codes (sketched below):
- Bayesian model which uses the conditional probabilities of strings of
outcomes to partition the outcome space [0, 1).
- The partition label describes the compressed file.
- Closely matches the Shannon information content of the source string.
- The receiver needs to know the symbol conditional probabilities.
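A minimal sketch of the interval-narrowing idea behind arithmetic coding (the fixed symbol model, its probabilities, and the example message are made up; a real arithmetic coder would use conditional/adaptive probabilities and emit bits incrementally):

```python
# Made-up fixed symbol model: cumulative probability ranges inside [0, 1)
model = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

def arithmetic_interval(message, model):
    """Narrow [0, 1) down to the subinterval that identifies the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = model[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arithmetic_interval("abca", model)
print(low, high, high - low)
# Any number in [low, high) identifies "abca"; describing it takes about
# -log2(high - low) bits, close to the Shannon information content of the string.
```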
Lempel Ziv Coding
- learn dictionary on the fly
- encode streams of data
- asymptotically optimal

Example¹: AABABBBABAABABBBABBABB
A|ABABBBABAABABBBABBABB
A|AB|ABBBABAABABBBABBABB
A|AB|ABB|BABAABABBBABBABB

A|AB|ABB|B|ABA|ABAB|BB|ABBA|BB

¹ http://math.mit.edu/~goemans/18310S15/lempel-ziv-notes.pdf
Lempel Ziv Coding
Example¹: AABABBBABAABABBBABBABB

phrase   codebook entry   encoding   binary
∅        0
A        1                A          0
AB       2                1B         11
ABB      3                2B         10 1
B        4                0B         00 1
ABA      5                2A         010 0
ABAB     6                5B         101 1
BB       7                4B         100 1
ABBA     8                3A         011 0
BB       7                           0111

Encoded output: ,0|1,1|10,1|00,1|010,0|101,1|100,1|011,0|0111

¹ http://math.mit.edu/~goemans/18310S15/lempel-ziv-notes.pdf
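A minimal sketch of the dictionary-building parse used in the example above (LZ78-style, following the cited notes; the bit-level encoding of the pointers is omitted, and the output is the list of (codebook index, new symbol) pairs):

```python
def lz78_parse(message):
    """Parse the message into phrases and emit (codebook index of prefix, new symbol) pairs."""
    codebook = {"": 0}                    # entry 0 is the empty phrase
    output = []
    phrase = ""
    for sym in message:
        if phrase + sym in codebook:      # keep extending while the phrase is already known
            phrase += sym
        else:                             # new phrase: emit (known prefix, new symbol) and store it
            output.append((codebook[phrase], sym))
            codebook[phrase + sym] = len(codebook)
            phrase = ""
    if phrase:                            # leftover phrase is already in the codebook
        output.append((codebook[phrase], None))
    return output

print(lz78_parse("AABABBBABAABABBBABBABB"))
# [(0,'A'), (1,'B'), (2,'B'), (0,'B'), (2,'A'), (5,'B'), (4,'B'), (3,'A'), (7,None)]
```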
Lempel Ziv Coding
Many versions and extensions
e.g. Lempel-Ziv-Welch, see
https://en.wikipedia.org/wiki/Lempel–Ziv–Welch

Used in gzip, compress, tiff, …


Some extensions
- Concept of typical sequences as a way to conceptualize
data compression and prove that Shannon entropy is the
relevant measure of information (formalized in the
“Shannon Source Coding Theorem”)

- Arithmetic coding as a practical lossless compression


mechanism used in many compression methods (e.g.
JPEG2000, JBIG, MPEG4)
Lossless vs lossy compression
So far: lossless compression (entropy coding)
= data can be fully recovered from codewords

Lossy compression: part of the information content cannot


be restored

When do we do this?
- We do not care about the lost information (irrelevance)
e.g., sound we cannot hear, contrast we cannot see
- We do not have the resources for all the information
e.g., low resolution content over low-bandwidth links
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Quantization

[Figure: staircase quantizer characteristic with 2-bit output levels 00, 01, 10, 11; the quantization error is the gap between input and output within each step]
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Assume
- quantization using m bits → Q = 2^m quantization intervals
- quantization range [-xmax, xmax] and uniform quantization → interval size Δ = 2·xmax/Q
Quantization

[Figure: uniform quantizer characteristic xq versus x with step size Δ over [-xmax, xmax]; clipping for |x| > xmax]
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Assume
- negligible clipping: -xmax ≤ x ≤ xmax approximately
- nq is uniformly distributed: Pnq = E(nq²) = Δ²/12

Define: modulation level A = √Px / xmax
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Measure of distortion:
Signal-to-quantization-noise power ratio (SQNR)
SQNR = E(x²)/E(nq²) = Px/Pnq

With the above assumptions:
SQNR = 12·A²·xmax²/Δ² = 3·A²·Q² = 3·A²·2^(2m)
Quantization

In decibels (dB):
SQNR = 10 log10(3·A²·2^(2m)) = 10 log10(3) + 10 log10(A²) + m · 20 log10(2)

SQNR ≈ 4.77 dB + 10 log10(A²) + m · 6 dB

Each additional bit in the ADC gives a 6 dB gain in SQNR.
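A quick simulation check of the 6 dB/bit rule (a sketch with made-up parameters: a full-scale random-phase sine, so A = 1/√2, a simple rounding quantizer, and no clipping):

```python
import numpy as np

rng = np.random.default_rng(0)

def sqnr_db(m, n_samples=200_000):
    """Uniformly quantize a full-scale random-phase sine with m bits and measure SQNR in dB."""
    x_max = 1.0
    x = x_max * np.sin(2 * np.pi * rng.random(n_samples))     # random-phase sine samples
    delta = 2 * x_max / 2 ** m                                # step size D = 2 xmax / Q
    xq = np.clip(np.round(x / delta) * delta, -x_max, x_max)  # rounding (mid-tread) quantizer
    nq = xq - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(nq ** 2))

A = 1 / np.sqrt(2)   # modulation level of a full-scale sine: sqrt(Px)/xmax
for m in (8, 10, 12):
    theory = 4.77 + 10 * np.log10(A ** 2) + 6.02 * m
    print(m, round(sqnr_db(m), 1), round(theory, 1))   # measured vs 4.77 dB + 10 log10(A^2) + m*6 dB
```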


Quantization

[Plot: SQNR (dB) versus 20 log10(A) for m = 4, …, 16 bits; regions marked 'Telephone quality' and 'CD quality']
Quantization

[Plot: SQNR (dB) versus 20 log10(A) as above, now showing the SQNR drop at large A caused by clipping distortion]
Companding

[Figure: companded (non-uniform) quantizer characteristic xq versus x over [-xmax, xmax], with clipping for |x| > xmax]
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Prediction

Reduce the dynamic range of the signal by predicting the next
sample from previous samples and encoding the difference.

1) Simplest case

[Block diagram: xk minus a one-sample-delayed copy gives yk]

yk = xk - xk-1, k = 0, 1, 2, …
Prediction

Reduce the dynamic range of the signal by predicting the next
sample from previous samples and encoding the difference.

2) Linear prediction with a finite-impulse-response (FIR) filter

[Block diagram: xk minus the output of a prediction filter with coefficients a1, …, aK gives yk]

yk = xk - Σ_{i=1..K} ai xk-i , for all k
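A minimal sketch comparing the signal's dynamic range before and after prediction (the test signal and the filter coefficients a1, a2 are made-up choices; case 1 is the first difference, case 2 a K = 2 FIR predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up slowly varying test signal: a strongly correlated first-order autoregressive process
x = np.zeros(2000)
for k in range(1, len(x)):
    x[k] = 0.95 * x[k - 1] + 0.1 * rng.standard_normal()

# 1) Simplest case: first difference y_k = x_k - x_{k-1}
y1 = x - np.concatenate(([0.0], x[:-1]))

# 2) FIR linear prediction y_k = x_k - sum_i a_i x_{k-i}, here with K = 2 hand-picked coefficients
a = [1.2, -0.3]
y2 = x.copy()
for i, ai in enumerate(a, start=1):
    y2[i:] -= ai * x[:-i]

print(f"std of x : {x.std():.3f}")
print(f"std of y1: {y1.std():.3f}")   # much smaller dynamic range -> fewer quantization levels needed
print(f"std of y2: {y2.std():.3f}")
```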
Prediction

3) Example:
https://www.mathworks.com/help/dsp/examples/lpc-analysis-and-synthesis-of-speech.html

[Plot: speech signal amplitude (about ±0.4) versus sample index (0 to 1600)]
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Transformation

Represent the signal through elements in a suitable 'dictionary':

x = A r

where x = [x1, x2, …, xn]^T is the signal, A = [aij] is the n×n matrix of dictionary entries, and r = [r1, r2, …, rn]^T holds the coefficients.
Transformation

Represent the signal through elements in a suitable 'dictionary'.
Compression through different quantization levels:

r = [r1, r2, …, rn]^T = A^(-1) x

r1 is the most important coefficient and rn the least important; reduce the number of quantization levels for the less important coefficients.
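A minimal sketch of this idea using an orthonormal DCT matrix as the dictionary A (so A^(-1) = A^T), a made-up smooth test signal, and arbitrary step sizes Δ that grow for the less important coefficients:

```python
import numpy as np

n = 64
idx = np.arange(n)

# Orthonormal DCT-II matrix as the dictionary A (columns are the dictionary entries), so A^{-1} = A^T
A = np.sqrt(2 / n) * np.cos(np.pi * (2 * idx[:, None] + 1) * idx[None, :] / (2 * n))
A[:, 0] /= np.sqrt(2)

x = np.cos(2 * np.pi * idx / 32) + 0.2 * np.cos(2 * np.pi * idx / 8)   # made-up smooth test signal
r = A.T @ x                              # r = A^{-1} x : coefficients

# Coarser quantization (larger step size) for the less important, higher-frequency coefficients
delta = np.linspace(0.05, 1.0, n)
r_hat = np.round(r / delta) * delta

x_hat = A @ r_hat                        # reconstruct x ~ A r_hat
print(f"RMS reconstruction error: {np.sqrt(np.mean((x - x_hat) ** 2)):.4f}")
print("nonzero coefficients kept:", int(np.count_nonzero(r_hat)), "of", n)
```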
Transformation

Used widely for image, sound, and video compression.

For example, JPEG:

[Figure: JPEG quantization table; the quantization step size (our Δ!) increases toward higher frequencies]

see https://en.wikipedia.org/wiki/JPEG
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
