Professional Documents
Culture Documents
CSEP 590 Data Compression: Adaptive Huffman Coding
CSEP 590 Data Compression: Adaptive Huffman Coding
Data Compression
Autumn 2007
c b
11
5 6
a
3 3
d
1 2
c b
11 7
5 5 6 6
a
3 3 3 4
d
1 1 2 2
c b
Number the nodes as they are removed from the
priority queue.
CSEP 590 - Lecture 2 - Autumn 2007 5
Adaptive Huffman Principle
• In an optimal tree for n symbols there is a
numbering of the nodes y1<y2<... <y2n-1 such
that their corresponding weights x1,x2, ... , x2n-1
satisfy:
– x1 < x2 < ... < x2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal. We call this the node number invariant.
0 1 0 1
0 1 0 1 0 1 0 1
a b c d e f 0 1 0 1
g h i j
CSEP 590 - Lecture 2 - Autumn 2007 7
Initialization
• The tree will encode up to m + 1 symbols
including NYT.
• We reserve numbers 1 to 2m + 1 for node
numbering.
• The initial Huffman tree consists of a single
node weight
0 2m + 1
NYT node number
0 21
NYT
0 19 1 20
NYT a
output = 000
0 19 1 20
NYT a
output = 0001
0 19 2 20
NYT a
output = 0001
0 19 2 20
NYT a
NYT
fixed code for b
output = 00010001
0 19 2 20
0 1 a
0 17 0 18
a
NYT b
output = 00010001
0 19 2 20
0 1 a
0 17 1 18
a
NYT b
output = 00010001
1 19 2 20
0 1 a
0 17 1 18
a
NYT b
output = 00010001
1 19 2 20
0 1 a
0 17 1 18
a
NYT b
fixed code for c
NYT
output = 0001000100010
1 19 2 20
0 1 a
0 17 a 18
1
0 1 b
0 15 0 16
a
NYT c
output = 0001000100010
1 19 2 20
0 1 a
0 17 a 18
1
0 1 b
0 15 1 16
a
NYT c
output = 0001000100010
1 19 2 20
0 1 a
1 17 a 18
1
0 1 b
0 15 1 16
a
NYT c
output = 0001000100010
2 19 2 20
0 1 a
1 17 a 18
1
0 1 b
0 15 1 16
a
NYT c
output = 0001000100010
2 19 2 20
0 1 a
1 17 a 18
1
0 1 b
0 15 1 16
a
NYT c fixed code for d
NYT
output = 0001000100010000011
2 19 2 20
0 1 a
1 17 a 18
1
0 1 b
0 15 1 16
a
0 1 c
0 13 0 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 26
Example
4 21
• aabcdad 0 1
2 19 2 20
0 1 a
1 17 a 18
1
0 1 b
0 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 27
Example
4 21
• aabcdad 0 1
2 19 2 20
0 1 a
1 17 a 18
1
exchange!
0 1 b
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 28
Example
4 21
• aabcdad 0 1
2 19 2 20
0 1 a
1 17 1 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 29
Example
4 21
• aabcdad 0 1
2 19 2 20
0 1 a
1 17 2 18
exchange!
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 30
Example
4 21
• aabcdad 0 1
2 19 2 20
a 0 1
1 17 2 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 31
Example
4 21
• aabcdad 0 1
2 19 3 20
a 0 1
1 17 2 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 0001000100010000011
CSEP 590 - Lecture 2 - Autumn 2007 32
Example
5 21 Note: the first a is coded
• aabcdad 0 1 as 000, the second as 1,
and the third as 0
2 19 3 20
a 0 1
1 17 2 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 00010001000100000110
CSEP 590 - Lecture 2 - Autumn 2007 33
Example
5 21
• aabcdad 0 1
3 19 3 20
a 0 1
1 17 2 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 00010001000100000110
CSEP 590 - Lecture 2 - Autumn 2007 34
Example
6 21
• aabcdad 0 1
3 19 3 20
a 0 1
exchange!
1 17 2 18
b 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT d
output = 000100010001000001101101
CSEP 590 - Lecture 2 - Autumn 2007 35
Example
6 21
• aabcdad 0 1
3 19 3 20
a 0 1
1 17 2 18
d 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT b
output = 000100010001000001101101
CSEP 590 - Lecture 2 - Autumn 2007 36
Example
6 21
• aabcdad 0 1
3 19 3 20
a 0 1
2 17 2 18
d 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT b
output = 000100010001000001101101
CSEP 590 - Lecture 2 - Autumn 2007 37
Example
6 21
• aabcdad 0 1
3 19 4 20
a 0 1
2 17 2 18
d 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT b
output = 000100010001000001101101
CSEP 590 - Lecture 2 - Autumn 2007 38
Example
7 21
• aabcdad 0 1
3 19 4 20
a 0 1
2 17 2 18
d 0 1
1 15 1 16
a
0 1 c
0 13 1 14
a
NYT b
output = 000100010001000001101101
CSEP 590 - Lecture 2 - Autumn 2007 39
Data Structure for Adaptive Huffman
7 1. Fixed code table
a 0 1 2. Binary tree with
b parent pointers
3 4 3. Table of pointers
c
a 0 1 nodes into tree
d 4. Doubly linked list
. 2 2 to rank the nodes
. d 0 1
. 1 1
a
j 0 1 c
0 a
1
NYT NYT b
0 1 0 1
0 1 0 1 0 1 0 1
a b c d e f g h
• 00110000