Communication Systems

Technology Embedded in Daily Life

VSP 2019, Dr. Lutz Lampe & Dr. Paul Lusina


How compact can you get?

Coding theorem for a single source symbol:


Consider a source X whose symbols are represented by a
binary prefix-free code. Then (1) the average codeword length
satisfies
H(X) ≤ Lav bit/symbol
and (2) there exists a code for which
Lav < H(X) + 1 bit/symbol.
How compact can you get?

H(X) ≤ Lav < H(X) + 1 bit/symbol

1) The lower bound follows from the Kraft inequality: Σi 2^(-li) ≤ 1

2) Note: H(X) = -Σi pi log2(pi) and Lav = Σi pi li

Hence, if li = -log2(pi) then Lav = H(X). So these are the
optimum codeword lengths.

3) Upper bound: Choose li = ⌈-log2(pi)⌉ and construct a
prefix-free code
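A small numerical illustration of 2) and 3) (a sketch with a made-up distribution, not from the slides): compute the lengths li = ⌈-log2(pi)⌉, check the Kraft inequality, and verify H(X) ≤ Lav < H(X) + 1.

```python
import math

# Made-up example distribution (not from the slides)
p = [0.5, 0.25, 0.15, 0.1]

# Shannon code lengths: l_i = ceil(-log2(p_i)), as in step 3)
lengths = [math.ceil(-math.log2(pi)) for pi in p]

H = -sum(pi * math.log2(pi) for pi in p)           # entropy in bit/symbol
L_av = sum(pi * li for pi, li in zip(p, lengths))  # average codeword length
kraft = sum(2.0 ** -li for li in lengths)          # Kraft sum

print("lengths:", lengths)
print("Kraft sum:", kraft)                         # <= 1, so a prefix-free code exists
print(f"H = {H:.3f} <= L_av = {L_av:.3f} < H + 1 = {H + 1:.3f}")
```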
Average word length
Example: C = {00, 10, 11, 011}
with probabilities p1, …, p4

[Figure: binary code tree for C with branch labels 0/1; leaves carry p1, …, p4 and the internal (parent) nodes carry sums such as p1 + p4 and p2 + p3]

The average codeword length is equal to the sum of the
probabilities of all parent nodes in the tree, including the root
node.
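A quick worked check for C = {00, 10, 11, 011}, using the tree implied by the codewords: Lav = 2p1 + 2p2 + 2p3 + 3p4 = (p1 + p2 + p3 + p4) + (p1 + p4) + (p2 + p3) + p4, i.e. the root plus the parent nodes reached by the prefixes 0, 1 and 01.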
Optimal codes: Preliminaries
Fact 1: There is no unused leaf in a binary tree of an optimal
binary prefix-free code.

Imagine there was one … then we could delete it and move its sibling subtree up one level, shortening codewords (so the code was not optimal).
Optimal codes: Preliminaries
Fact 2: There exists an optimal binary prefix-free code such that
the two least likely codewords differ only in the last digit,
i.e., they are siblings.

The least likely symbols get the longest codewords: for p1 ≥ p2 ≥ … ≥ pr the lengths satisfy l1 ≤ l2 ≤ … ≤ lr

[Figure: Option 1 and Option 2, two placements of the leaves pr-1 and pr at the deepest level of the tree]
Optimal codes: Huffman code
Facts 1 + 2: Start from leaves and build towards root as follows

Step 1 Create r leaf nodes and assign probabilities p1, …, pr.
Mark those nodes as active.

Step 2 Create a parent node to the two least likely active nodes,
and assign it the corresponding probability.
Activate this new node and deactivate the children nodes.
Step 3 If there is only one active node left, it becomes the root.
Otherwise, go to Step 2.
Optimal codes: Huffman code
Example: AX = {A, B, C, D}
pA = 0.2, pB = 0.5, pC = 0.2, pD = 0.1

L = 1, r = 4 codewords with probabilities
p1 = 0.2, p2 = 0.5, p3 = 0.2, p4 = 0.1

Huffman code: [Figure: Huffman tree built by repeatedly merging the two least likely nodes]

Lav = 1.8 bit/symbol
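A minimal sketch of Steps 1-3 applied to this example (heap-based; the function and variable names are mine, and tie-breaking may produce different but equally optimal codeword lengths):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Huffman construction (Steps 1-3): return the codeword length of each symbol."""
    counter = itertools.count()  # tie-breaker so the heap never compares the symbol lists
    # Step 1: one active leaf node per symbol: (probability, tie, symbols in this subtree)
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:                     # Step 3: stop when only the root is active
        p1, _, s1 = heapq.heappop(heap)      # Step 2: take the two least likely active nodes,
        p2, _, s2 = heapq.heappop(heap)      # create their parent and deactivate them
        for i in s1 + s2:                    # every leaf below the new parent gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths

probs = [0.2, 0.5, 0.2, 0.1]                 # pA, pB, pC, pD
lengths = huffman_lengths(probs)
L_av = sum(p * l for p, l in zip(probs, lengths))
print(lengths, L_av)                         # L_av = 1.8 bit/symbol (lengths depend on tie-breaking)
```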
For a large file encoded using the following Huffman code, what is the
probability of randomly selecting a 1?

Symbol Probability Code


a 1/2 0
b 1/4 10
c 1/8 110
d 1/8 111

A p1>1/2
B p1=1/2
C p1<1/2
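One way to check the answer (a worked calculation, not shown on the slide): count the expected number of 1s per source symbol and divide by the expected codeword length.

E[# of 1s per symbol] = (1/2)·0 + (1/4)·1 + (1/8)·2 + (1/8)·3 = 7/8
Lav = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4
p1 = (7/8) / (7/4) = 1/2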
Where Symbol (Huffman) Codes Fail (I)

Entropy:
H = Σ_{i=1..6} pi log2(1/pi)
  = 0.95 · log2(1/0.95) + 5 · 0.01 · log2(1/0.01)
  = 0.4025 bits

Expected length:
L(C, X) = Σ_{i=1..6} pi li
  = 0.95 · 1 + 3 · (0.01 · 3) + 2 · (0.01 · 4)
  = 1.12 bits

The 1 bit 'overhead' on symbol codes can be costly for distributions with
entropy << 1
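A quick numerical check of these figures (probabilities from the slide: one symbol with p = 0.95 and five with p = 0.01; the codeword lengths 1, 3, 3, 3, 4, 4 are a valid Huffman assignment consistent with the calculation above):

```python
import math

p = [0.95] + [0.01] * 5          # probabilities from the slide
lengths = [1, 3, 3, 3, 4, 4]     # Huffman codeword lengths used in the calculation above

H = sum(pi * math.log2(1 / pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))
print(f"H = {H:.4f} bits, L = {L:.2f} bits")   # ~0.4025 vs 1.12: large per-symbol overhead
```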
Where Symbol (Huffman) Codes Fail (II)

Predict the next symbol in the following sequences:


1. abacbb___
2. req____
3. We delivered it at his req _ _ _ _

Symbol codes cannot exploit redundancy (context) in the overall


message.
Where Symbol (Huffman) Codes Fail (III)

Predict the next symbol in the following sequences:

1. I am twice as ha_ _ _ _ _ _ _ _.

2. Ich bin zwei mal glück _ _ _ _ _ _ _ _ _ _ _ _.

Symbol codes cannot easily adapt if the symbol probabilities
change, e.g. letter frequencies differ between English and German.
Stream Codes

Use patterns or data context to compress multiple symbols of a message.


Lempel-Ziv codes:
- Identify patterns and replace them with binary indices.
- For infinitely large message length N, the compression rate approaches the source entropy.
- Agnostic of the source probabilities, i.e. the receiver need only know the
encoding algorithm and not the symbol probabilities.

Arithmetic codes (sketched below):
- Bayesian model which uses the conditional probabilities of strings of
outcomes to partition the outcome space [0, 1).
- The partition label describes the compressed file.
- Closely matches the Shannon information content of the source string.
- The receiver needs to know the symbol conditional probabilities.
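A minimal sketch of the interval-narrowing idea behind arithmetic coding (the fixed symbol model, its probabilities, and the example message are made up; a real arithmetic coder would use conditional/adaptive probabilities and emit bits incrementally):

```python
# Made-up fixed symbol model: cumulative probability ranges inside [0, 1)
model = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

def arithmetic_interval(message, model):
    """Narrow [0, 1) down to the subinterval that identifies the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = model[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arithmetic_interval("abca", model)
print(low, high, high - low)
# Any number in [low, high) identifies "abca"; describing it takes about
# -log2(high - low) bits, close to the Shannon information content of the string.
```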
Lempel Ziv Coding
- learn dictionary on the fly
- encode streams of data
- asymptotically optimal

Example¹: AABABBBABAABABBBABBABB
A|ABABBBABAABABBBABBABB
A|AB|ABBBABAABABBBABBABB
A|AB|ABB|BABAABABBBABBABB

A|AB|ABB|B|ABA|ABAB|BB|ABBA|BB

¹ http://math.mit.edu/~goemans/18310S15/lempel-ziv-notes.pdf
Lempel Ziv Coding
Example¹: AABABBBABAABABBBABBABB

phrase   codebook entry   encoding   binary
∅        0
A        1                A          0
AB       2                1B         11
ABB      3                2B         10 1
B        4                0B         00 1
ABA      5                2A         010 0
ABAB     6                5B         101 1
BB       7                4B         100 1
ABBA     8                3A         011 0
BB       7                           0111

Encoded output: ,0|1,1|10,1|00,1|010,0|101,1|100,1|011,0|0111

¹ http://math.mit.edu/~goemans/18310S15/lempel-ziv-notes.pdf
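A minimal sketch of the dictionary-building parse used in the example above (LZ78-style, following the cited notes; the bit-level encoding of the pointers is omitted, and the output is the list of (codebook index, new symbol) pairs):

```python
def lz78_parse(message):
    """Parse the message into phrases and emit (codebook index of prefix, new symbol) pairs."""
    codebook = {"": 0}                    # entry 0 is the empty phrase
    output = []
    phrase = ""
    for sym in message:
        if phrase + sym in codebook:      # keep extending while the phrase is already known
            phrase += sym
        else:                             # new phrase: emit (known prefix, new symbol) and store it
            output.append((codebook[phrase], sym))
            codebook[phrase + sym] = len(codebook)
            phrase = ""
    if phrase:                            # leftover phrase is already in the codebook
        output.append((codebook[phrase], None))
    return output

print(lz78_parse("AABABBBABAABABBBABBABB"))
# [(0,'A'), (1,'B'), (2,'B'), (0,'B'), (2,'A'), (5,'B'), (4,'B'), (3,'A'), (7,None)]
```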
Lempel Ziv Coding
Many versions and extensions
e.g. Lempel-Ziv-Welch, see
https://en.wikipedia.org/wiki/Lempel–Ziv–Welch

Used in gzip, compress, tiff, …


Some extensions
- Concept of typical sequences as a way to conceptualize
data compression and prove that Shannon entropy is the
relevant measure of information (formalized in the
“Shannon Source Coding Theorem”)

- Arithmetic coding as a practical lossless compression


mechanism used in many compression methods (e.g.
JPEG2000, JBIG, MPEG4)
Lossless vs lossy compression
So far: lossless compression (entropy coding)
= data can be fully recovered from codewords

Lossy compression: part of the information content cannot


be restored

When do we do this?
- We do not care about the lost information (irrelevance)
e.g., sound we cannot hear, contrast we cannot see
- We do not have the resources for all the information
e.g., low resolution content over low-bandwidth links
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Quantization

[Figure: staircase quantizer characteristic with 2-bit output levels 00, 01, 10, 11; the quantization error is the gap between input and output within each step]
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Assume
- quantization using m bits → Q = 2^m quantization intervals
- quantization range [-xmax, xmax] and uniform quantization → interval size Δ = 2·xmax/Q
Quantization

[Figure: uniform quantizer characteristic xq versus x with step size Δ over [-xmax, xmax]; clipping for |x| > xmax]
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Assume
- negligible clipping: -xmax ≤ x ≤ xmax approximately
- nq is uniformly distributed: Pnq = E(nq²) = Δ²/12

Define: modulation level A = √Px / xmax
Quantization

Model: xq = x + nq
(quantized signal = signal sample + quantization error)

Measure of distortion:
Signal-to-quantization-noise power ratio (SQNR)
SQNR = E(x²)/E(nq²) = Px/Pnq

With the above assumptions:
SQNR = 12·A²·xmax²/Δ² = 3·A²·Q² = 3·A²·2^(2m)
Quantization

In decibels (dB):
SQNR = 10 log10(3·A²·2^(2m)) = 10 log10(3) + 10 log10(A²) + m · 20 log10(2)

SQNR ≈ 4.77 dB + 10 log10(A²) + m · 6 dB

Each additional bit in the ADC gives a 6 dB gain in SQNR.
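A quick simulation check of the 6 dB/bit rule (a sketch with made-up parameters: a full-scale random-phase sine, so A = 1/√2, a simple rounding quantizer, and no clipping):

```python
import numpy as np

rng = np.random.default_rng(0)

def sqnr_db(m, n_samples=200_000):
    """Uniformly quantize a full-scale random-phase sine with m bits and measure SQNR in dB."""
    x_max = 1.0
    x = x_max * np.sin(2 * np.pi * rng.random(n_samples))     # random-phase sine samples
    delta = 2 * x_max / 2 ** m                                # step size D = 2 xmax / Q
    xq = np.clip(np.round(x / delta) * delta, -x_max, x_max)  # rounding (mid-tread) quantizer
    nq = xq - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(nq ** 2))

A = 1 / np.sqrt(2)   # modulation level of a full-scale sine: sqrt(Px)/xmax
for m in (8, 10, 12):
    theory = 4.77 + 10 * np.log10(A ** 2) + 6.02 * m
    print(m, round(sqnr_db(m), 1), round(theory, 1))   # measured vs 4.77 dB + 10 log10(A^2) + m*6 dB
```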


Quantization

[Plot: SQNR (dB) versus 20 log10(A) for m = 4, …, 16 bits; regions marked 'Telephone quality' and 'CD quality']
Quantization

[Plot: SQNR (dB) versus 20 log10(A) as above, now showing the SQNR drop at large A caused by clipping distortion]
Companding

[Figure: companded (non-uniform) quantizer characteristic xq versus x over [-xmax, xmax], with clipping for |x| > xmax]
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Prediction

Reduce the dynamic range of the signal by predicting the next
sample from previous samples and encoding the difference.

1) Simplest case

[Block diagram: xk minus a one-sample-delayed copy gives yk]

yk = xk - xk-1, k = 0, 1, 2, …
Prediction

Reduce the dynamic range of the signal by predicting the next
sample from previous samples and encoding the difference.

2) Linear prediction with a finite-impulse-response (FIR) filter

[Block diagram: xk minus the output of a prediction filter with coefficients a1, …, aK gives yk]

yk = xk - Σ_{i=1..K} ai xk-i , for all k
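A minimal sketch comparing the signal's dynamic range before and after prediction (the test signal and the filter coefficients a1, a2 are made-up choices; case 1 is the first difference, case 2 a K = 2 FIR predictor):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up slowly varying test signal: a strongly correlated first-order autoregressive process
x = np.zeros(2000)
for k in range(1, len(x)):
    x[k] = 0.95 * x[k - 1] + 0.1 * rng.standard_normal()

# 1) Simplest case: first difference y_k = x_k - x_{k-1}
y1 = x - np.concatenate(([0.0], x[:-1]))

# 2) FIR linear prediction y_k = x_k - sum_i a_i x_{k-i}, here with K = 2 hand-picked coefficients
a = [1.2, -0.3]
y2 = x.copy()
for i, ai in enumerate(a, start=1):
    y2[i:] -= ai * x[:-i]

print(f"std of x : {x.std():.3f}")
print(f"std of y1: {y1.std():.3f}")   # much smaller dynamic range -> fewer quantization levels needed
print(f"std of y2: {y2.std():.3f}")
```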
Prediction

3) Example:
https://www.mathworks.com/help/dsp/examples/lpc-analysis-and-synthesis-of-speech.html

[Plot: speech signal amplitude (about ±0.4) versus sample index (0 to 1600)]
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
Transformation

Represent the signal through elements in a suitable 'dictionary':

x = A r

where x = [x1, x2, …, xn]^T is the signal, A = [aij] is the n×n matrix of dictionary entries, and r = [r1, r2, …, rn]^T holds the coefficients.
Transformation

Represent the signal through elements in a suitable 'dictionary'.
Compression through different quantization levels:

r = [r1, r2, …, rn]^T = A^(-1) x

r1 is the most important coefficient and rn the least important; reduce the number of quantization levels for the less important coefficients.
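A minimal sketch of this idea using an orthonormal DCT matrix as the dictionary A (so A^(-1) = A^T), a made-up smooth test signal, and arbitrary step sizes Δ that grow for the less important coefficients:

```python
import numpy as np

n = 64
idx = np.arange(n)

# Orthonormal DCT-II matrix as the dictionary A (columns are the dictionary entries), so A^{-1} = A^T
A = np.sqrt(2 / n) * np.cos(np.pi * (2 * idx[:, None] + 1) * idx[None, :] / (2 * n))
A[:, 0] /= np.sqrt(2)

x = np.cos(2 * np.pi * idx / 32) + 0.2 * np.cos(2 * np.pi * idx / 8)   # made-up smooth test signal
r = A.T @ x                              # r = A^{-1} x : coefficients

# Coarser quantization (larger step size) for the less important, higher-frequency coefficients
delta = np.linspace(0.05, 1.0, n)
r_hat = np.round(r / delta) * delta

x_hat = A @ r_hat                        # reconstruct x ~ A r_hat
print(f"RMS reconstruction error: {np.sqrt(np.mean((x - x_hat) ** 2)):.4f}")
print("nonzero coefficients kept:", int(np.count_nonzero(r_hat)), "of", n)
```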
Transformation

Used widely for image, sound, and video compression.

For example, JPEG:

[Figure: JPEG quantization table; the quantization step size (our Δ!) increases toward higher frequencies]

see https://en.wikipedia.org/wiki/JPEG
Lossy compression

[Block diagram: analog signal → Processing (Companding) → Sampling → Quantization → digital signal; Sampling and Quantization together form the analog-to-digital converter (ADC). Then: digital signal → Processing (Prediction, Transformation) → Quantization → Entropy coding → compressed digital signal]
