
Data Compression (1)
Hai Tao
Department of Computer Engineering
University of California at Santa Cruz
Data Compression: Why?
Storing or transmitting multimedia data requires large storage space or bandwidth:
One hour of 44,000-samples/sec, 16-bit stereo (two-channel) audio is 3600 x 44000 x 2 x 2 = 633.6 MB, which fills one CD (650 MB). MP3 compression can reduce this by a factor of 10
A 500x500 color image is 500 x 500 x 3 = 750 KB without compression. JPEG can reduce this by a factor of 10 to 20
One minute of real-time, full-size color video is 60 x 30 x 640 x 480 x 3 = 1.659 GB, so a two-hour movie requires about 200 GB. MPEG-2 compression can bring this down to 4.7 GB (one DVD)
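The sizes quoted above are simple products; as a quick check (not part of the slides), they can be reproduced directly, using the slide's own figures (44,000 samples/sec, 640x480 at 30 frames/sec):

```python
# Back-of-the-envelope sizes for the uncompressed media quoted above.
audio_bytes = 3600 * 44000 * 2 * 2        # 1 hour, 16-bit (2-byte) stereo
image_bytes = 500 * 500 * 3               # 500x500 RGB, 1 byte per channel
video_bytes = 60 * 30 * 640 * 480 * 3     # 1 minute at 30 frames/sec

print(audio_bytes / 1e6)   # 633.6 (MB)
print(image_bytes / 1e3)   # 750.0 (KB)
print(video_bytes / 1e9)   # ~1.659 (GB)
```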
Compression methods
Entropy Coding: Run-length Coding, Huffman Coding, Arithmetic Coding
Source Coding:
  Prediction: DPCM, DM
  Transformation: FFT, DCT
  Layered Coding: Bit Position, Sub-sampling, Sub-band Coding
  Vector Quantization
Hybrid Coding: JPEG, MPEG, H.261, DVI RTV, DVI PLV
Run-length coding
Example:
A scanline of a binary image is
00000 00000 00000 00000 00010 00000 00000 01000 00000 00000
for a total of 50 bits
However, runs of consecutive 0s or 1s can be represented more efficiently:
0(23) 1(1) 0(12) 1(1) 0(13)
If each run is stored as a 1-bit value and a 5-bit count, the five runs take 5 + 5x5 = 30 bits, reducing the data to 60% of its original size (a 40% saving)
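The run-length scheme above can be sketched in a few lines of Python (a minimal illustration, not from the slides), reproducing the run list and the 30-bit cost:

```python
from itertools import groupby

def rle_encode(bits):
    """Encode a bit string as (value, run_length) pairs."""
    return [(b, len(list(g))) for b, g in groupby(bits)]

scanline = "0" * 23 + "1" + "0" * 12 + "1" + "0" * 13   # 50 bits
runs = rle_encode(scanline)
print(runs)   # [('0', 23), ('1', 1), ('0', 12), ('1', 1), ('0', 13)]

# Cost: 1 bit for each run's value plus a 5-bit count per run
cost = sum(1 + 5 for _ in runs)
print(cost)   # 30 bits, versus 50 uncompressed
```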
Huffman coding
Example: a language with 4 letters, A, B, S, Z
To uniquely encode each letter with fixed-length codewords, we need two bits:
A-00, B-01, S-10, Z-11
The message AAABSAAAAZ is then encoded with 20 bits
Now consider the assignment
A-0, B-100, S-101, Z-11
The same message can be encoded using only 15 bits
The basic idea behind the Huffman coding algorithm is to assign shorter codewords to more frequently used symbols
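The two encodings of AAABSAAAAZ can be compared directly (a quick check, not from the slides); A occurs 7 times, so the variable-length code wins:

```python
fixed = {"A": "00", "B": "01", "S": "10", "Z": "11"}
huff  = {"A": "0",  "B": "100", "S": "101", "Z": "11"}

msg = "AAABSAAAAZ"
fixed_len = sum(len(fixed[c]) for c in msg)   # 2 bits per letter
huff_len  = sum(len(huff[c])  for c in msg)   # 7x1 + 3 + 3 + 2
print(fixed_len, huff_len)   # 20 15
```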
Huffman coding: Problem statement
Given a set of N symbols S = {s_i, i = 1, ..., N} with probabilities of occurrence P_i, i = 1, ..., N, find the optimal encoding of the symbols that achieves the minimum transmission rate (bits/symbol)
Example: Five symbols, A,B,C,D,E with probabilities of
P(A)=0.16,
P(B)=0.51
P(C)=0.09
P(D)=0.13
P(E)=0.11
Without Huffman coding, 3 bits are needed for each symbol
Huffman Coding - Algorithm
Algorithm:
Each symbol is a leaf node of a tree
Combine the two symbols or composite symbols with the smallest probabilities into a new parent composite symbol whose probability is the sum of the two. Assign bits 0 and 1 to the two links
Continue this process until all symbols are merged into one root node
For each symbol, the sequence of 0s and 1s on the path from the root to its leaf is its codeword
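These steps can be sketched with a priority queue (a minimal illustration, not from the slides; the 0/1 link assignment may differ from the slide's tree, but the code lengths and rate are the same). Applied to the five-symbol example, it yields 1.98 bits/symbol:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman codewords from a {symbol: probability} dict."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword})
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}   # prefix 0 on one link
        merged.update({s: "1" + w for s, w in c1.items()})  # 1 on the other
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
codes = huffman_codes(probs)
rate = sum(probs[s] * len(w) for s, w in codes.items())
print(round(rate, 2))   # 1.98 bits/symbol
```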
Example
Huffman Coding - Example
Step 1: combine the two least probable symbols, C (P=0.09) and E (P=0.11), into a composite node CE with P(CE)=0.20; label the link to E with 0 and the link to C with 1
Step 2: combine D (P=0.13) and A (P=0.16) into AD with P(AD)=0.29; label the link to A with 0 and the link to D with 1
Step 3: combine CE (P=0.20) and AD (P=0.29) into ACDE with P(ACDE)=0.49; label the link to AD with 0 and the link to CE with 1
Huffman Coding - Example
Step 4: combine ACDE (P=0.49) and B (P=0.51) into the root ABCDE with P(ABCDE)=1; label the link to ACDE with 0 and the link to B with 1
Step 5: read the codewords off the tree:
A=000, B=1, C=011, D=001, E=010
Expected bits/symbol:
3*(0.16+0.09+0.13+0.11) + 1*0.51 = 3*0.49 + 1*0.51 = 1.98 bits/symbol
Compared with a fixed 3-bit code, the saving is 1.02/3 = 34%
