
Compression

For sending and storing information


Text, audio, images, videos

Common Applications
Text compression
lossless; gzip uses Lempel-Ziv coding, 3:1 compression,
better than Huffman
Audio compression
lossy; MPEG audio gives 3:1 to 24:1 compression
MPEG = Moving Picture Experts Group
Image compression
lossy; JPEG gives 3:1 compression
JPEG = Joint Photographic Experts Group
Video compression
lossy; MPEG gives 27:1 compression

Text Compression
Prefix codes: one of many approaches
no code is a prefix of any other code
constraint: lossless
tasks
encode: text (string) -> code
decode: code -> text
main goal: maximally reduce storage, measured by
compression ratio
minor goals:
simplicity
efficiency: time and space
some require a code dictionary or 2 passes over the data

Simplest Text Encoding


Run-length encoding
Requires special character, say @
Example Source:
ACCCTGGGGGAAAACCCCCC
Encoding:
A@C3T@G5@A4@C6
Method
any run of 3 or more identical characters is replaced by @char# (see the sketch below)
+: simple
-: special characters, non-optimal
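
A minimal sketch of this scheme in Python (the @ escape and the 3-character threshold follow the example above; the function name is just for illustration):

def rle_encode(text, escape="@"):
    # replace runs of 3 or more identical characters by @<char><count>
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= 3:
            out.append(escape + text[i] + str(run))
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)

# rle_encode("ACCCTGGGGGAAAACCCCCC") -> "A@C3T@G5@A4@C6"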

Shannon's Information theory (1948)


How well can we encode?
Shannon's goal: reduce the size of messages for improved
communication
What messages would be easiest/hardest to send?
Random bits hardest - no redundancy or pattern
Formal definition: S, a set of symbols si, each with probability pi
Information content of S = -sum over i of pi*log(pi)
measure of randomness
more random, less predictable, higher information content!
Theorem: this is the only measure satisfying several natural properties
Information is not knowledge
Compression relies on finding regularities or
redundancies.

Example

Send ACTG each occurring 1/4 of the time


Code: A--00, C--01, T--10, G--11
2 bits per letter: no surprise
Average message length:
prob(A)*codelength(A) + prob(C)*codelength(C) + ...
= 1/4*2 + 1/4*2 + 1/4*2 + 1/4*2 = 2 bits.
Now suppose:
prob(A) = 13/16 and the others 1/16 each
Codes: A = 1, C = 00, G = 010, T = 011 (a prefix code)
Average: 13/16*1 + 1/16*2 + 1/16*3 + 1/16*3 = 21/16 ≈ 1.31 bits
What is the best possible result? Part of the answer:
the information content! But how do we achieve it?
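
A quick check of these numbers in Python (a sketch; the probabilities and code lengths are the ones from the example above):

from math import log2

def entropy(probs):
    # information content H = -sum p_i * log2(p_i)
    return -sum(p * log2(p) for p in probs if p > 0)

# uniform case: A, C, T, G each with probability 1/4
print(entropy([0.25] * 4))                          # 2.0 bits per symbol

# skewed case: prob(A) = 13/16, the others 1/16 each
probs = [13/16, 1/16, 1/16, 1/16]
lengths = [1, 2, 3, 3]                              # codes 1, 00, 010, 011
print(sum(p * l for p, l in zip(probs, lengths)))   # 21/16 = 1.3125 bits
print(entropy(probs))                               # about 0.99 bits: the lower bound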

Understanding Entropy/Information

Suppose a set S is divided into k classes


Let ni be the number of elements in class i
Let N be the sum of all ni.
Let pi be ni/N (the frequency of class i)
Entropy(S) = -p1*log(p1) - p2*log(p2) - ... - pk*log(pk).
Note if k = 2, same as before.
If all classes equally likely (pi = 1/k) then
Entropy(S) = -(1/k)*log(1/k) - ... - (1/k)*log(1/k) = -log(1/k) = log(k)
If k = power of 2, then this is number of bits to distinguish all classes

If one class has probability 1, then


Entropy(S) = -0*log(0) - ... - 1*log(1) = 0   (taking 0*log(0) = 0)
The set isn't mixed up at all.

Intuitively, entropy gives the right answers.


Learning Hint: To understand equations, try special cases.

The Shannon-Fano Algorithm

Earliest algorithm: Heuristic divide and conquer


Illustration: source text with only letters ABCDE
Symbol:   A    B    C    D    E
Count:   15    7    6    6    5
Intuition: frequent letters get short codes
1. Sort symbols according to their
frequencies/probabilities, i.e. ABCDE.
2. Recursively divide into two parts, each with approx.
same number of counts.
This is an instance of the balancing (partition) problem, which is NP-complete.
Note: variable length codes.
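
A recursive sketch of the two steps in Python (the greedy "closest to half the total" split below is just one simple way to approximate the NP-complete balancing step):

def shannon_fano(symbols):
    # symbols: list of (symbol, count) sorted by decreasing count
    # returns a dict symbol -> bit string
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    best_i, best_diff, running = 1, float("inf"), 0
    for i in range(1, len(symbols)):        # find the most balanced split point
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_i, best_diff = i, diff
    left = shannon_fano(symbols[:best_i])   # prefix 0
    right = shannon_fano(symbols[best_i:])  # prefix 1
    codes = {s: "0" + c for s, c in left.items()}
    codes.update({s: "1" + c for s, c in right.items()})
    return codes

codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
# gives A=00, B=01, C=10, D=110, E=111: 89 bits total
# (the entropy bound is about 2.19 bits/char, i.e. about 85 bits)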

Shannon-Fano Tree
(Tree diagram; note the prefix property. Reading 0 for left and 1 for right gives:
A = 00, B = 01, C = 10, D = 110, E = 111)
Result for this distribution

Symbol   Count   log(1/p)   Code   # of bits
A          15      1.38       00        30
B           7      2.48       01        14
C           6      2.70       10        12
D           6      2.70      110        18
E           5      2.96      111        15
TOTAL (# of bits): 89
average message length = 89/39 ≈ 2.3 bits/char
Note: Prefix property for decoding
Can you do better?
Theoretical optimum = -sum pi*log(pi) = entropy (≈ 2.19 bits/char here)

Code Tree Method/Analysis


Binary tree method
Internal nodes have left/right references:
0 means go to the left
1 means go to the right
Leaf nodes store the value
Decode time-cost is O(logN) per decoded character
Decode space-cost is O(N)
quick argument: number of leaves > number of internal
nodes.
Proof: induction on the number of internal nodes.
Prefix Property: each prefix uniquely defines char.
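
A sketch of decoding with such a tree (the Node class is hypothetical; go left on 0, right on 1, emit a character at each leaf):

class Node:
    def __init__(self, char=None, left=None, right=None):
        self.char = char      # non-None only at leaves
        self.left = left      # followed on bit 0
        self.right = right    # followed on bit 1

def decode(bits, root):
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.char is not None:   # reached a leaf
            out.append(node.char)
            node = root             # prefix property: restart at the root
    return "".join(out)

# example with codes A=0, B=10, C=11:
# root = Node(left=Node("A"), right=Node(left=Node("B"), right=Node("C")))
# decode("01011", root) -> "ABC"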

Code Encode(character)
Again can use binary prefix tree
For encode and decode could use hashing
yields O(1) encode/decode time
O(N) space cost ( N is size of alphabet)
For compression, main goal is reducing storage size
in the example it's the total number of bits
code size for a single character = depth of that character's leaf in the tree
code size for a document =
sum of (frequency of char * depth of char)   (see the sketch below)
different trees yield different storage efficiency
What's the best tree?
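
A small sketch tying the last two ideas together (dictionary-based encoding and the storage-cost formula; the code dictionary is assumed to come from whatever tree we build):

def encode(text, codes):
    # codes: dict char -> bit string, e.g. {"A": "00", "B": "01", ...}
    return "".join(codes[ch] for ch in text)

def storage_cost(freq, codes):
    # total bits = sum over chars of frequency(char) * code length (= depth)
    return sum(freq[ch] * len(codes[ch]) for ch in freq)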

Huffman Code
Provably optimal: i.e. yields minimum storage cost
Algorithm: CodeTree huff(document)
1. Compute the frequency and a leaf node for each char
leaf node has a count field and a character
2. Remove the 2 nodes with the least counts and create a new
node whose count is the sum of their counts and whose sons
are the removed nodes.
internal node has 2 node ptrs and count field
3. Repeat 2 until only 1 node left.
4. That's it!
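
A sketch of steps 1-4 using Python's heapq (representing internal nodes as (left, right) tuples is just one convenient choice):

import heapq
from collections import Counter

def huffman_codes(document):
    freq = Counter(document)                      # step 1: counts per char
    heap = [(count, i, char) for i, (char, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:                          # step 3: repeat until one node
        c1, _, n1 = heapq.heappop(heap)           # step 2: two least counts
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, i, (n1, n2)))   # new internal node
        i += 1
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):                       # read codes off the tree
        if isinstance(node, tuple):               # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                     # leaf: a character
            codes[node] = prefix or "0"
        return codes
    return walk(root, "")

# huffman_codes("aaaabbc") -> {"c": "00", "b": "01", "a": "1"}
# (swapping 0/1 at any node gives an equally optimal code)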

Bad code example


char   code   frequency   bits
a      000        10       30
e      001        15       45
i      010        12       36
s      011         3        9
t      10          4        8
Total                      128

Tree, a la Huffman
(Tree diagram. Repeat: merge the two lowest-frequency nodes.
Leaves 3, 4, 10, 12, 15; merges: 3+4 = 7, 7+10 = 17, 12+15 = 27, 17+27 = 44 = root.)

Tree with codes: note Prefix property

(Tree diagram, freq/code/char at each leaf: 10/00/a, 3/010/s, 4/011/t, 15/10/e, 12/11/i;
internal node counts 7, 17, 27, 44.)

Tree Cost

b it s / n o d e t o t a l b its : 9 5 ( b e f o r e 1 2 8 )
44
17
1 0 /2 /2 0

27
7

3 /3 /9

1 5 /2 /3 0
4 /3 /1 2

1 2 /2 /2 4

Analysis
Intuition: least frequent chars get longest codes or
most frequent chars get shortest codes.
Let T be a minimal code tree. (Induction)
All nodes have 2 sons. (by construction)
Lemma: if c1 and c2 are the two least frequently used chars, then
they are at the deepest level
Proof:
if they were not at the deepest level, exchanging them with the
deepest nodes would decrease the total cost (number of bits),
contradicting minimality

Analysis (continued)
Sk : Huffman algorithm on k chars produces optimal
code.
S2: obvious
Sk => Sk+1
Let T be an optimal code tree on k+1 chars
By the lemma, the two least frequent chars are deepest
Replace the two least frequent chars by a new char with freq
equal to the sum of their freqs
Now we have a tree on k chars
By induction, Huffman yields an optimal tree on these k chars;
expanding the new char back into its two sons gives an optimal
tree on k+1 chars, which is exactly the tree Huffman builds.

Lempel-Ziv

Input: string of characters


Internal: dictionary of (codewords, words)
Output: string of codewords and characters.
Codewords are distinct from characters.
In the algorithm, w is a string, c is a character, and w+c
means concatenation.
When adding a new word to the dictionary, a new code
word needs to be assigned.

Lempel-Ziv Algorithm
w = NIL;
while ( read a character c )
{
    if w+c exists in the dictionary
        w = w+c;
    else
        add w+c to the dictionary;
        output the code for w;
        w = c;
}
output the code for w;    /* flush the final w */
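
A runnable Python sketch of the LZW variant of this idea (the dictionary is seeded with all byte values, so the output is codes only; this differs slightly from the pseudocode above, which mixes codes and raw characters):

def lzw_compress(text):
    dictionary = {chr(i): i for i in range(256)}   # seed with single characters
    next_code = 256
    w, output = "", []
    for c in text:
        if w + c in dictionary:
            w = w + c                      # keep extending the current word
        else:
            output.append(dictionary[w])   # emit code for the longest known word
            dictionary[w + c] = next_code  # remember the new, longer word
            next_code += 1
            w = c
    if w:
        output.append(dictionary[w])       # flush the final word
    return output

# lzw_compress("ABABABA") -> [65, 66, 256, 258]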

Adaptive Encoding
Webster's has 157,000 entries: could encode each entry in about 18 bits (2^18 > 157,000)
but only works for this document
Don't want to do two passes
Adaptive Huffman
modify model on the fly
Lempel-Ziv 1977 (LZ77)
LZW: Lempel-Ziv-Welch
1984, used in compress (UNIX)
uses dictionary method
variable number of symbols to fixed length code
better with large documents: finds repetitive patterns

Audio Compression
Sounds can be represented as a vector valued function
At any point in time, a sound is a combination of
different frequencies of different strengths
For example, each note on a piano yields a specific
frequency.
Also, our ears, like pianos, have cilia that respond to
specific frequencies.
Just as sin(x) can be approximated by a small number
of terms, e.g. x - x^3/6 + x^5/120, so can sound.
Transforming a sound into its spectrum is done
mathematically by a Fourier transform (sketched below).
The spectrum can be played back, as on computer with
sound cards.
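
A sketch of this with NumPy's FFT (a made-up one-second signal of 440 Hz + 880 Hz; keeping only the strongest frequencies is the compression step):

import numpy as np

rate = 8000                                    # samples per second
t = np.arange(rate) / rate                     # one second of "audio"
signal = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

spectrum = np.fft.rfft(signal)                 # Fourier transform: time -> frequencies
strongest = np.argsort(np.abs(spectrum))[-2:]  # keep only the 2 strongest frequencies
kept = np.zeros_like(spectrum)
kept[strongest] = spectrum[strongest]

approx = np.fft.irfft(kept)                    # the spectrum played back
print(np.max(np.abs(signal - approx)))         # tiny reconstruction error here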

Audio
Using many frequencies, as in CDs, yields a good
approximation; using few frequencies, as in telephones,
yields a poor approximation
Sampling frequencies yields compression ratios
between 6:1 and 24:1, depending on sound and quality
High-priced electronic pianos store and reuse
samples of concert pianos
High filter: removes/reduces high frequencies (losing high
frequencies is also a common problem with aging ears)
Low filter: removes/reduces low frequencies
Can use differential methods:
only report change in sounds
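
A tiny sketch of the differential idea with NumPy (store the first sample plus successive differences, which are small numbers and compress well):

import numpy as np

samples = np.array([100, 102, 103, 103, 101, 98])
deltas = np.diff(samples)                       # [2, 1, 0, -2, -3]: small changes
restored = np.concatenate(([samples[0]], samples[0] + np.cumsum(deltas)))
assert np.array_equal(restored, samples)        # lossless round trip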

Image Compression
with or without loss, mostly with:
who cares about what the eye can't see?
Black-and-white images can be regarded as functions from
the plane (R^2) into the reals (R), as in old TVs
positions vary continuously, but our eyes can't see the
discreteness beyond around 100 pixels per inch.
Color images can be regarded as functions from the
plane into R^3, the RGB space.
Colors vary continuously, but our eyes sample colors
with only 3 different receptors (RGB)
Mathematical theory yields close approximations:
there are spatial analogues of Fourier transforms

Image Compression

faces can be done with eigenfaces


images can be regarded as points in R^(big)
choose good bases and use the most important vectors,
i.e. approximate with fewer dimensions (see the sketch below)
JPEG, MPEG, GIF are compressed image/video formats
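
A sketch of the "few important basis vectors" idea as a low-rank SVD approximation of an image matrix (random data stands in for a real grayscale image, and the rank k is an arbitrary choice):

import numpy as np

image = np.random.rand(256, 256)          # stand-in for a grayscale image
U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 20                                    # keep the 20 most important directions
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k approximation of the image

# storage: k*(256 + 256 + 1) numbers instead of 256*256
print(np.linalg.norm(image - approx) / np.linalg.norm(image))

For a real image with spatial structure, the relative error falls off much faster with k than it does for this random stand-in.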

Video Compression
Uses DCT (discrete cosine transform)
Note: nice functions can be approximated by
a sum of x, x^2, ... with appropriate coefficients
a sum of sin(x), sin(2x), ... with the right coefficients
almost any infinite family of basis functions
DCT is good because few terms give good results on
images (see the sketch below).
Differential methods used:
only report changes from frame to frame
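
A sketch of "few DCT terms give good results" using SciPy's DCT on one smooth row of samples (the block size and the threshold are illustrative):

import numpy as np
from scipy.fft import dct, idct

x = np.linspace(0, 1, 64)
block = np.cos(2 * np.pi * x) + 0.5 * x       # a smooth "row of pixels"

coeffs = dct(block, norm="ortho")             # discrete cosine transform
coeffs[np.abs(coeffs) < 0.1] = 0              # drop the many tiny coefficients
approx = idct(coeffs, norm="ortho")           # inverse transform

print(np.count_nonzero(coeffs), "of", len(coeffs), "coefficients kept")
print(np.max(np.abs(block - approx)))         # reconstruction stays close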

Summary
Issues:
Context: what problem are you solving and what is an
acceptable solution.
evaluation: compression ratios
fidelity, if lossy
approximation, quantization, transforms, differential
adaptive, if on-the-fly, e.g. movies, tv
Different sources yield different best approaches
cartoons versus cities versus outdoors
code book separate or not
fixed or variable length codes
