
Compression

For sending and storing information


Text, audio, images, videos

Common Applications
Text compression
lossless; gzip uses Lempel-Ziv coding, 3:1 compression,
better than Huffman
Audio compression
lossy; MPEG audio gives 3:1 to 24:1 compression
MPEG = Moving Picture Experts Group
Image compression
lossy; JPEG gives 3:1 compression
JPEG = Joint Photographic Experts Group
Video compression
lossy; MPEG gives 27:1 compression

Text Compression
Prefix codes: one of many approaches
no code is a prefix of any other code
constraint: lossless
tasks
encode: text (string) -> code
decode: code -> text
main goal: maximally reduce storage, measured by
compression ratio
minor goals:
simplicity
efficiency: time and space
some require a code dictionary or 2 passes over the data

Simplest Text Encoding


Run-length encoding
Requires special character, say @
Example Source:
ACCCTGGGGGAAAACCCCCC
Encoding:
A@C3T@G5@A4@C6
Method
any run of 3 or more identical characters is replaced by @char# (see the sketch below)
+: simple
-: special characters, non-optimal
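
A minimal sketch of this scheme in Python (the @ escape and the 3-character threshold follow the example above; the function name is just for illustration):

def rle_encode(text, escape="@"):
    # replace runs of 3 or more identical characters by @<char><count>
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= 3:
            out.append(escape + text[i] + str(run))
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)

# rle_encode("ACCCTGGGGGAAAACCCCCC") -> "A@C3T@G5@A4@C6"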

Shannon's Information theory (1948)


How well can we encode?
Shannon's goal: reduce the size of messages for improved
communication
What messages would be easiest/hardest to send?
Random bits hardest - no redundancy or pattern
Formal definition: S, a set of symbols si, each with probability pi
Information content of S = -sum over i of pi*log(pi)
measure of randomness
more random, less predictable, higher information content!
Theorem: this is the only measure satisfying several natural properties
Information is not knowledge
Compression relies on finding regularities or
redundancies.

Example

Send ACTG each occurring 1/4 of the time


Code: A--00, C--01, T--10, G--11
2 bits per letter: no surprise
Average message length:
prob(A)*codelength(A) + prob(C)*codelength(C) + ...
= 1/4*2 + 1/4*2 + 1/4*2 + 1/4*2 = 2 bits.
Now suppose:
prob(A) = 13/16 and the others 1/16 each
Codes: A = 1, C = 00, G = 010, T = 011 (a prefix code)
Average: 13/16*1 + 1/16*2 + 1/16*3 + 1/16*3 = 21/16 ≈ 1.31 bits
What is the best possible result? Part of the answer:
the information content! But how do we achieve it?
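
A quick check of these numbers in Python (a sketch; the probabilities and code lengths are the ones from the example above):

from math import log2

def entropy(probs):
    # information content H = -sum p_i * log2(p_i)
    return -sum(p * log2(p) for p in probs if p > 0)

# uniform case: A, C, T, G each with probability 1/4
print(entropy([0.25] * 4))                          # 2.0 bits per symbol

# skewed case: prob(A) = 13/16, the others 1/16 each
probs = [13/16, 1/16, 1/16, 1/16]
lengths = [1, 2, 3, 3]                              # codes 1, 00, 010, 011
print(sum(p * l for p, l in zip(probs, lengths)))   # 21/16 = 1.3125 bits
print(entropy(probs))                               # about 0.99 bits: the lower bound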

Understanding Entropy/Information

Suppose a set S is divided into k classes


Let ni be the number of elements in class i
Let N be the sum of all ni.
Let pi be ni/N (the frequency of class i)
Entropy(S) = -p1*log(p1) - p2*log(p2) - ... - pk*log(pk).
Note if k = 2, same as before.
If all classes equally likely (pi = 1/k) then
Entropy(S) = -(1/k)*log(1/k) - ... - (1/k)*log(1/k) = -log(1/k) = log(k)
If k = power of 2, then this is number of bits to distinguish all classes

If one class has probability 1, then


Entropy(S) = -0*log(0) - ... - 1*log(1) = 0   (taking 0*log(0) = 0)
The set isn't mixed up at all.

Intuitively, entropy gives the right answers.


Learning Hint: To understand equations, try special cases.

The Shannon-Fano Algorithm

Earliest algorithm: Heuristic divide and conquer


Illustration: source text with only letters ABCDE
Symbol:   A    B    C    D    E
Count:   15    7    6    6    5
Intuition: frequent letters get short codes
1. Sort symbols according to their
frequencies/probabilities, i.e. ABCDE.
2. Recursively divide into two parts, each with approx.
same number of counts.
This is an instance of the balancing (partition) problem, which is NP-complete.
Note: variable length codes.
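
A recursive sketch of the two steps in Python (the greedy "closest to half the total" split below is just one simple way to approximate the NP-complete balancing step):

def shannon_fano(symbols):
    # symbols: list of (symbol, count) sorted by decreasing count
    # returns a dict symbol -> bit string
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(count for _, count in symbols)
    best_i, best_diff, running = 1, float("inf"), 0
    for i in range(1, len(symbols)):        # find the most balanced split point
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_i, best_diff = i, diff
    left = shannon_fano(symbols[:best_i])   # prefix 0
    right = shannon_fano(symbols[best_i:])  # prefix 1
    codes = {s: "0" + c for s, c in left.items()}
    codes.update({s: "1" + c for s, c in right.items()})
    return codes

codes = shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)])
# gives A=00, B=01, C=10, D=110, E=111: 89 bits total
# (the entropy bound is about 2.19 bits/char, i.e. about 85 bits)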

Shannon-Fano Tree
(Tree diagram; note the prefix property. Reading 0 for left and 1 for right gives:
A = 00, B = 01, C = 10, D = 110, E = 111)
Result for this distribution

Symbol   Count   log(1/p)   Code   # of bits
A          15      1.38       00        30
B           7      2.48       01        14
C           6      2.70       10        12
D           6      2.70      110        18
E           5      2.96      111        15
TOTAL (# of bits): 89
average message length = 89/39 ≈ 2.3 bits/char
Note: Prefix property for decoding
Can you do better?
Theoretical optimum = -sum pi*log(pi) = entropy (≈ 2.19 bits/char here)

Code Tree Method/Analysis


Binary tree method
Internal nodes have left/right references:
0 means go to the left
1 means go to the right
Leaf nodes store the value
Decode time-cost is O(logN) per decoded character
Decode space-cost is O(N)
quick argument: number of leaves > number of internal
nodes.
Proof: induction on the number of internal nodes.
Prefix Property: each prefix uniquely defines char.
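
A sketch of decoding with such a tree (the Node class is hypothetical; go left on 0, right on 1, emit a character at each leaf):

class Node:
    def __init__(self, char=None, left=None, right=None):
        self.char = char      # non-None only at leaves
        self.left = left      # followed on bit 0
        self.right = right    # followed on bit 1

def decode(bits, root):
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.char is not None:   # reached a leaf
            out.append(node.char)
            node = root             # prefix property: restart at the root
    return "".join(out)

# example with codes A=0, B=10, C=11:
# root = Node(left=Node("A"), right=Node(left=Node("B"), right=Node("C")))
# decode("01011", root) -> "ABC"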

Code Encode(character)
Again can use binary prefix tree
For encode and decode could use hashing
yields O(1) encode/decode time
O(N) space cost ( N is size of alphabet)
For compression, main goal is reducing storage size
in the example it's the total number of bits
code size for a single character = depth of that character's leaf in the tree
code size for a document =
sum of (frequency of char * depth of char)   (see the sketch below)
different trees yield different storage efficiency
What's the best tree?
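
A small sketch tying the last two ideas together (dictionary-based encoding and the storage-cost formula; the code dictionary is assumed to come from whatever tree we build):

def encode(text, codes):
    # codes: dict char -> bit string, e.g. {"A": "00", "B": "01", ...}
    return "".join(codes[ch] for ch in text)

def storage_cost(freq, codes):
    # total bits = sum over chars of frequency(char) * code length (= depth)
    return sum(freq[ch] * len(codes[ch]) for ch in freq)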

Huffman Code
Provably optimal: i.e. yields minimum storage cost
Algorithm: CodeTree huff(document)
1. Compute the frequency and a leaf node for each char
leaf node has a count field and a character
2. Remove the 2 nodes with the least counts and create a new
node whose count is the sum of their counts and whose sons
are the removed nodes.
internal node has 2 node ptrs and count field
3. Repeat 2 until only 1 node left.
4. That's it!
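
A sketch of steps 1-4 using Python's heapq (representing internal nodes as (left, right) tuples is just one convenient choice):

import heapq
from collections import Counter

def huffman_codes(document):
    freq = Counter(document)                      # step 1: counts per char
    heap = [(count, i, char) for i, (char, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:                          # step 3: repeat until one node
        c1, _, n1 = heapq.heappop(heap)           # step 2: two least counts
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, i, (n1, n2)))   # new internal node
        i += 1
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):                       # read codes off the tree
        if isinstance(node, tuple):               # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                     # leaf: a character
            codes[node] = prefix or "0"
        return codes
    return walk(root, "")

# huffman_codes("aaaabbc") -> {"c": "00", "b": "01", "a": "1"}
# (swapping 0/1 at any node gives an equally optimal code)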

Bad code example


char   code   frequency   bits
a      000        10       30
e      001        15       45
i      010        12       36
s      011         3        9
t      10          4        8
Total                      128

Tree, a la Huffman
(Tree diagram. Repeat: merge the two lowest-frequency nodes.
Leaves 3, 4, 10, 12, 15; merges: 3+4 = 7, 7+10 = 17, 12+15 = 27, 17+27 = 44 = root.)

Tree with codes: note Prefix property

(Tree diagram, freq/code/char at each leaf: 10/00/a, 3/010/s, 4/011/t, 15/10/e, 12/11/i;
internal node counts 7, 17, 27, 44.)

Tree Cost

b it s / n o d e t o t a l b its : 9 5 ( b e f o r e 1 2 8 )
44
17
1 0 /2 /2 0

27
7

3 /3 /9

1 5 /2 /3 0
4 /3 /1 2

1 2 /2 /2 4

Analysis
Intuition: least frequent chars get longest codes or
most frequent chars get shortest codes.
Let T be a minimal code tree. (Induction)
All nodes have 2 sons. (by construction)
Lemma: if c1 and c2 are the two least frequently used chars, then
they are at the deepest level
Proof:
if they were not at the deepest level, exchanging them with the
deepest nodes would decrease the total cost (number of bits),
contradicting minimality

Analysis (continued)
Sk : Huffman algorithm on k chars produces optimal
code.
S2: obvious
Sk => Sk+1
Let T be an optimal code tree on k+1 chars
By the lemma, the two least frequent chars are deepest
Replace the two least frequent chars by a new char with freq
equal to the sum of their freqs
Now we have a tree on k chars
By induction, Huffman yields an optimal tree on these k chars;
expanding the new char back into its two sons gives an optimal
tree on k+1 chars, which is exactly the tree Huffman builds.

Lempel-Ziv

Input: string of characters


Internal: dictionary of (codewords, words)
Output: string of codewords and characters.
Codewords are distinct from characters.
In the algorithm, w is a string, c is a character, and w+c
means concatenation.
When adding a new word to the dictionary, a new code
word needs to be assigned.

Lempel-Ziv Algorithm
w = NIL;
while ( read a character c )
{
    if w+c exists in the dictionary
        w = w+c;
    else
        add w+c to the dictionary;
        output the code for w;
        w = c;
}
output the code for w;    /* flush the final w */
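
A runnable Python sketch of the LZW variant of this idea (the dictionary is seeded with all byte values, so the output is codes only; this differs slightly from the pseudocode above, which mixes codes and raw characters):

def lzw_compress(text):
    dictionary = {chr(i): i for i in range(256)}   # seed with single characters
    next_code = 256
    w, output = "", []
    for c in text:
        if w + c in dictionary:
            w = w + c                      # keep extending the current word
        else:
            output.append(dictionary[w])   # emit code for the longest known word
            dictionary[w + c] = next_code  # remember the new, longer word
            next_code += 1
            w = c
    if w:
        output.append(dictionary[w])       # flush the final word
    return output

# lzw_compress("ABABABA") -> [65, 66, 256, 258]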

Adaptive Encoding
Webster's has 157,000 entries: could encode each entry in about 18 bits (2^18 > 157,000)
but only works for this document
Don't want to do two passes
Adaptive Huffman
modify model on the fly
Lempel-Ziv 1977 (LZ77)
LZW: Lempel-Ziv-Welch
1984, used in compress (UNIX)
uses dictionary method
variable number of symbols to fixed length code
better with large documents: finds repetitive patterns

Audio Compression
Sounds can be represented as a vector valued function
At any point in time, a sound is a combination of
different frequencies of different strengths
For example, each note on a piano yields a specific
frequency.
Also, our ears, like pianos, have cilia that respond to
specific frequencies.
Just as sin(x) can be approximated by a small number
of terms, e.g. x - x^3/6 + x^5/120, so can sound.
Transforming a sound into its spectrum is done
mathematically by a Fourier transform (sketched below).
The spectrum can be played back, as on computer with
sound cards.
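
A sketch of this with NumPy's FFT (a made-up one-second signal of 440 Hz + 880 Hz; keeping only the strongest frequencies is the compression step):

import numpy as np

rate = 8000                                    # samples per second
t = np.arange(rate) / rate                     # one second of "audio"
signal = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

spectrum = np.fft.rfft(signal)                 # Fourier transform: time -> frequencies
strongest = np.argsort(np.abs(spectrum))[-2:]  # keep only the 2 strongest frequencies
kept = np.zeros_like(spectrum)
kept[strongest] = spectrum[strongest]

approx = np.fft.irfft(kept)                    # the spectrum played back
print(np.max(np.abs(signal - approx)))         # tiny reconstruction error here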

Audio
Using many frequencies, as in CDs, yields a good
approximation; using few frequencies, as in telephones,
yields a poor approximation
Sampling frequencies yields compression ratios
between 6:1 and 24:1, depending on sound and quality
High-priced electronic pianos store and reuse
samples of concert pianos
High filter: removes/reduces high frequencies (losing high
frequencies is also a common problem with aging ears)
Low filter: removes/reduces low frequencies
Can use differential methods:
only report change in sounds
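
A tiny sketch of the differential idea with NumPy (store the first sample plus successive differences, which are small numbers and compress well):

import numpy as np

samples = np.array([100, 102, 103, 103, 101, 98])
deltas = np.diff(samples)                       # [2, 1, 0, -2, -3]: small changes
restored = np.concatenate(([samples[0]], samples[0] + np.cumsum(deltas)))
assert np.array_equal(restored, samples)        # lossless round trip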

Image Compression
with or without loss, mostly with:
who cares about what the eye can't see?
Black-and-white images can be regarded as functions from
the plane (R^2) into the reals (R), as in old TVs
positions vary continuously, but our eyes can't see the
discreteness beyond around 100 pixels per inch.
Color images can be regarded as functions from the
plane into R^3, the RGB space.
Colors vary continuously, but our eyes sample colors
with only 3 different receptors (RGB)
Mathematical theory yields close approximations:
there are spatial analogues of Fourier transforms

Image Compression

faces can be done with eigenfaces


images can be regarded as points in R^(big)
choose good bases and use the most important vectors,
i.e. approximate with fewer dimensions (see the sketch below)
JPEG, MPEG, GIF are compressed image/video formats
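
A sketch of the "few important basis vectors" idea as a low-rank SVD approximation of an image matrix (random data stands in for a real grayscale image, and the rank k is an arbitrary choice):

import numpy as np

image = np.random.rand(256, 256)          # stand-in for a grayscale image
U, s, Vt = np.linalg.svd(image, full_matrices=False)

k = 20                                    # keep the 20 most important directions
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k approximation of the image

# storage: k*(256 + 256 + 1) numbers instead of 256*256
print(np.linalg.norm(image - approx) / np.linalg.norm(image))

For a real image with spatial structure, the relative error falls off much faster with k than it does for this random stand-in.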

Video Compression
Uses DCT (discrete cosine transform)
Note: nice functions can be approximated by
a sum of x, x^2, ... with appropriate coefficients
a sum of sin(x), sin(2x), ... with the right coefficients
almost any infinite family of basis functions
DCT is good because few terms give good results on
images (see the sketch below).
Differential methods used:
only report changes from frame to frame
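
A sketch of "few DCT terms give good results" using SciPy's DCT on one smooth row of samples (the block size and the threshold are illustrative):

import numpy as np
from scipy.fft import dct, idct

x = np.linspace(0, 1, 64)
block = np.cos(2 * np.pi * x) + 0.5 * x       # a smooth "row of pixels"

coeffs = dct(block, norm="ortho")             # discrete cosine transform
coeffs[np.abs(coeffs) < 0.1] = 0              # drop the many tiny coefficients
approx = idct(coeffs, norm="ortho")           # inverse transform

print(np.count_nonzero(coeffs), "of", len(coeffs), "coefficients kept")
print(np.max(np.abs(block - approx)))         # reconstruction stays close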

Summary
Issues:
Context: what problem are you solving and what is an
acceptable solution.
evaluation: compression ratios
fidelity, if lossy
approximation, quantization, transforms, differential
adaptive, if on-the-fly, e.g. movies, tv
Different sources yield different best approaches
cartoons versus cities versus outdoors
code book separate or not
fixed or variable length codes
