
Dictionaries for Data Compression

CSE 326
Autumn 2005
Lecture 19

Dictionary Coding

• Does not use statistical knowledge of data.
• Encoder: As the input is processed, develop a dictionary and transmit the index of strings found in the dictionary.
• Decoder: As the code is processed, reconstruct the dictionary to invert the process of encoding.
• Examples: LZW, LZ77, Sequitur
• Applications: Unix Compress, gzip, GIF

Dictionary Data Compression - Lecture 19 2

LZW Encoding Algorithm

Repeat
    find the longest match w in the dictionary
    output the index of w
    put wa in the dictionary where a was the unmatched symbol

LZW Encoding Example (1)

Input: ababababa

Dictionary
0 a
1 b
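The encoding loop can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `lzw_encode` and the `alphabet` parameter (the base dictionary, one symbol per index) are assumptions made for the example, and indices are returned as plain integers rather than packed into bits.

```python
def lzw_encode(text, alphabet):
    """Return the list of LZW dictionary indices for text.

    alphabet is the base dictionary: alphabet[i] gets index i.
    """
    dictionary = {symbol: index for index, symbol in enumerate(alphabet)}
    codes = []
    w = ""  # longest match found so far
    for a in text:
        if w + a in dictionary:
            w += a  # the match can still be extended
        else:
            codes.append(dictionary[w])          # output the index of w
            dictionary[w + a] = len(dictionary)  # put wa in the dictionary
            w = a                                # a starts the next match
    if w:
        codes.append(dictionary[w])  # flush the final match
    return codes


print(lzw_encode("ababababa", "ab"))  # [0, 1, 2, 4, 3]
```

Growing the match one symbol at a time is equivalent to "find the longest match" because every prefix of a dictionary entry is itself in the dictionary.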

LZW Encoding Example (2)

Input:  ababababa
Output: 0

Dictionary
0 a
1 b
2 ab

LZW Encoding Example (3)

Input:  ababababa
Output: 0 1

Dictionary
0 a
1 b
2 ab
3 ba

LZW Encoding Example (4)

Input:  ababababa
Output: 0 1 2

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

LZW Encoding Example (5)

Input:  ababababa
Output: 0 1 2 4

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Encoding Example (6)

Input:  ababababa
Output: 0 1 2 4 3

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Decoding Algorithm

• Emulate the encoder in building the dictionary. Decoder is slightly behind the encoder.

initialize dictionary;
decode first index to w;
put w? in dictionary;
repeat
    decode the first symbol s of the index;
    complete the previous dictionary entry with s;
    finish decoding the remainder of the index;
    put w? in the dictionary where w was just decoded;
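A minimal Python sketch of the decoding loop, under the assumption that indices arrive as a list of integers and that the base dictionary is passed as `alphabet` (both names are illustrative). Instead of storing an incomplete entry `w?` and completing it later, this version adds the completed entry one step late, which has the same effect. The one case where the decoder is truly "behind" is an index that refers to the entry still being completed; its first symbol must then equal the first symbol of the previous string.

```python
def lzw_decode(codes, alphabet):
    """Invert LZW encoding; the dictionary is simply a list of strings."""
    dictionary = list(alphabet)
    if not codes:
        return ""
    w = dictionary[codes[0]]  # decode first index to w
    output = [w]
    for code in codes[1:]:
        if code < len(dictionary):
            entry = dictionary[code]
        elif code == len(dictionary):
            # The index names the entry w? that is not complete yet:
            # it starts with w, and its last symbol is its own first symbol.
            entry = w + w[0]
        else:
            raise ValueError(f"invalid index {code}")
        dictionary.append(w + entry[0])  # complete the previous entry
        output.append(entry)
        w = entry
    return "".join(output)


print(lzw_decode([0, 1, 2, 4, 3, 6], "ab"))  # abababababab
```

In the index sequence 0 1 2 4 3 6, both 4 and 6 hit the incomplete-entry case.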

LZW Decoding Example (1)

Code:    0 1 2 4 3 6
Decoded: a

Dictionary
0 a
1 b
2 a?

LZW Decoding Example (2a)

Code:    0 1 2 4 3 6
Decoded: a b

Dictionary
0 a
1 b
2 ab

LZW Decoding Example (2b)

Code:    0 1 2 4 3 6
Decoded: a b

Dictionary
0 a
1 b
2 ab
3 b?

LZW Decoding Example (3a)

Code:    0 1 2 4 3 6
Decoded: a b a

Dictionary
0 a
1 b
2 ab
3 ba

LZW Decoding Example (3b)

Code:    0 1 2 4 3 6
Decoded: a b ab

Dictionary
0 a
1 b
2 ab
3 ba
4 ab?

LZW Decoding Example (4a)

Code:    0 1 2 4 3 6
Decoded: a b ab a

Dictionary
0 a
1 b
2 ab
3 ba
4 aba

LZW Decoding Example (4b)

Code:    0 1 2 4 3 6
Decoded: a b ab aba

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 aba?

LZW Decoding Example (5a)

Code:    0 1 2 4 3 6
Decoded: a b ab aba b

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab

LZW Decoding Example (5b)

Code:    0 1 2 4 3 6
Decoded: a b ab aba ba

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 ba?

LZW Decoding Example (6a)

Code:    0 1 2 4 3 6
Decoded: a b ab aba ba b

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab

LZW Decoding Example (6b)

Code:    0 1 2 4 3 6
Decoded: a b ab aba ba bab

Dictionary
0 a
1 b
2 ab
3 ba
4 aba
5 abab
6 bab
7 bab?

Decoding Exercise

Code: 0 1 4 0 2 0 3 5 7

Base Dictionary
0 a
1 b
2 c
3 d
4 r

Trie Data Structure for Encoder’s Dictionary

• Fredkin (1960)

Dictionary
0 a     9 ca
1 b    10 ad
2 c    11 da
3 d    12 abr
4 r    13 raa
5 ab   14 abra
6 br
7 ra
8 ac

Trie (each node is a symbol and its dictionary index; children in parentheses)
a 0: b 5 (r 12 (a 14)), c 8, d 10
b 1: r 6
c 2: a 9
d 3: a 11
r 4: a 7 (a 13)

Encoder Uses a Trie (1)

Trie
a 0: b 5 (r 12 (a 14)), c 8, d 10
b 1: r 6
c 2: a 9
d 3: a 11
r 4: a 7 (a 13)

Input:  abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12

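The way the encoder walks the trie can be sketched as below. This is an illustration under assumptions: `TrieNode` and `lzw_encode_trie` are hypothetical names, each node keeps its children in a Python `dict`, and symbols outside `alphabet` are not handled.

```python
class TrieNode:
    """One dictionary entry: its index plus children keyed by the next symbol."""

    def __init__(self, index):
        self.index = index
        self.children = {}


def lzw_encode_trie(text, alphabet):
    root = TrieNode(None)  # the root stands for the empty string
    for index, symbol in enumerate(alphabet):
        root.children[symbol] = TrieNode(index)
    next_index = len(alphabet)

    codes = []
    node = root
    for symbol in text:
        child = node.children.get(symbol)
        if child is not None:
            node = child  # the match extends one level down the trie
        else:
            codes.append(node.index)  # longest match ends at this node
            node.children[symbol] = TrieNode(next_index)  # new entry as a leaf
            next_index += 1
            node = root.children[symbol]  # restart from the unmatched symbol
    if node is not root:
        codes.append(node.index)
    return codes


print(lzw_encode_trie("abracadabraabracadabra", "abcdr"))
# [0, 1, 4, 0, 2, 0, 3, 5, 7, 12, 8, 10, 14]
```

Each input symbol costs one child lookup, so finding the longest match never rescans the text, and a new entry is always added as a leaf below the node where the match ended.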
Encoder Uses a Trie (2)

Trie
a 0: b 5 (r 12 (a 14)), c 8 (a 15), d 10
b 1: r 6
c 2: a 9
d 3: a 11
r 4: a 7 (a 13)

Input:  abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12 8

Decoder’s Data Structure

• Simply an array of strings

Code:    0 1 4 0 2 0 3 5 7 12 8 ...
Decoded: a b r a c a d ab ra abr

0 a     9 ca
1 b    10 ad
2 c    11 da
3 d    12 abr
4 r    13 raa
5 ab   14 abr?
6 br
7 ra
8 ac

Bounded Size Dictionary

• Bounded Size Dictionary
  – n bits of index allows a dictionary of size 2^n
  – Doubtful that long entries in the dictionary will be useful.
• Strategies when the dictionary reaches its limit.
  1. Don’t add more, just use what is there.
  2. Throw it away and start a new dictionary.
  3. Double the dictionary, adding one more bit to indices.
  4. Throw out the least recently visited entry to make room for the new entry.

Implementing the LRV Strategy

[Figure: the encoder's trie, threaded by a doubly linked queue running from Least Recent to Most Recent, with circular sibling lists and parent pointers.]

Trie
a 0: b 5 (r 12 (a 14)), c 8, d 10
b 1: r 6
c 2: a 9
d 3: a 11
r 4: a 7 (a 13)

Input:  abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12
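Strategy 4 can be sketched in Python as follows. The details here are assumptions, not taken from the slides: an `OrderedDict` stands in for the doubly linked queue, a child counter stands in for the sibling lists, every node on the match path counts as visited, a new entry enters at the most recent end, and only a leaf that is neither a base symbol nor the current match may be thrown out (its index is reused). The name `lzw_encode_lrv` is illustrative, and a matching decoder would have to mirror exactly the same bookkeeping.

```python
from collections import OrderedDict


def lzw_encode_lrv(text, alphabet, max_size):
    """LZW encoding with at most max_size entries, evicting the LRV leaf."""
    children = {}          # (parent index or None, symbol) -> entry index
    parent = {}            # entry index -> (parent index, symbol); base symbols absent
    child_count = {}       # entry index -> number of children (0 means leaf)
    queue = OrderedDict()  # entry indices, least recently visited first
    for index, symbol in enumerate(alphabet):
        children[(None, symbol)] = index
        child_count[index] = 0
        queue[index] = None

    codes, pos = [], 0
    while pos < len(text):
        # Walk down the trie for the longest match, marking each node as visited.
        node = None
        while pos < len(text) and (node, text[pos]) in children:
            node = children[(node, text[pos])]
            queue.move_to_end(node)
            pos += 1
        if node is None:
            raise ValueError(f"symbol {text[pos]!r} is not in the alphabet")
        codes.append(node)
        if pos == len(text):
            break
        if len(queue) < max_size:
            new = len(queue)  # dictionary not yet full: take the next free index
        else:
            # Least recently visited leaf that is not a base symbol or the match.
            new = next((i for i in queue
                        if i in parent and child_count[i] == 0 and i != node), None)
            if new is None:
                continue  # nothing can be evicted, so skip this insertion
            old_parent, old_symbol = parent.pop(new)
            del children[(old_parent, old_symbol)]
            child_count[old_parent] -= 1
            del queue[new]
        children[(node, text[pos])] = new
        parent[new] = (node, text[pos])
        child_count[new] = 0
        child_count[node] += 1
        queue[new] = None
    return codes, parent


codes, parent = lzw_encode_lrv("abracadabraabracadabra", "abcdr", max_size=15)
print(codes[:11])  # [0, 1, 4, 0, 2, 0, 3, 5, 7, 12, 8]
print(parent[6])   # (8, 'a'): index 6 now names aca, a child of ac
```

With a limit of 15 entries, the entry br (index 6) is the least recently visited leaf when aca has to be added after index 8 is output, so index 6 is reused for aca.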

Implementing the LRV Strategy

[Figure: the encoder's trie, threaded by a doubly linked queue running from Least Recent to Most Recent, with circular sibling lists and parent pointers. Node r 6 below b is gone and a new node a 6 appears below c 8.]

Trie
a 0: b 5 (r 12 (a 14)), c 8 (a 6), d 10
b 1
c 2: a 9
d 3: a 11
r 4: a 7 (a 13)

Input:  abracadabraabracadabra
Output: 0 1 4 0 2 0 3 5 7 12 8

Notes on LZW

• Extremely effective when there are repeated patterns in the data that are widely spread.
• Negative: Creates entries in the dictionary that may never be used.
• Applications:
  – Unix compress, GIF, V.42 bis modem standard

LZ77

• Ziv and Lempel, 1977
• Dictionary is implicit
• Use the string coded so far as a dictionary.
• Given that x_1 x_2 ... x_n has been coded we want to code x_{n+1} x_{n+2} ... x_{n+k} for the largest k possible.

Solution A

• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n then x_{n+1} x_{n+2} ... x_{n+k} can be coded by <j,k> where j is the beginning of the match.
• Example
    ababababa babababababababab....
    coded
    ababababa babababa babababab....
              <2,8>

Solution A Problem

• What if there is no match at all in the dictionary?
    ababababa cabababababababab....
    coded
• Solution B. Send tuples <j,k,x> where
  – If k = 0 then x is the unmatched symbol
  – If k > 0 then the match starts at j and is k long and the unmatched symbol is x.

Solution B

• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}> where j is the beginning of the match.
• Examples
    ababababa cabababababababab....
    ababababa c ababababab ababab....
              <0,0,c> <1,9,b>

Solution B Example

a bababababababababababab.....
<0,0,a>
a b ababababababababababab.....
<0,0,b>
a b aba bababababababababab.....
<1,2,a>
a b aba babab ababababababab.....
<2,4,b>
a b aba babab abababababa bab.....
<1,10,a>

Surprise Code!

a bababababababababababab$
<0,0,a>
a b ababababababababababab$
<0,0,b>
a b ababababababababababab$
<1,22,$>

Surprise Decoding

<0,0,a><0,0,b><1,22,$>

<0,0,a>    a
<0,0,b>    b
<1,22,$>   a
<2,21,$>   b
<3,20,$>   a
<4,19,$>   b
...
<22,1,$>   b
<23,0,$>   $
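The symbol-by-symbol decoding above can be sketched in Python. The function name `lz77_decode` is an assumption for illustration; tuples are taken to be `(j, k, x)` with j a 1-based position in the decoded text, as in the examples.

```python
def lz77_decode(tuples):
    """Decode a sequence of <j,k,x> tuples (j is a 1-based start position)."""
    out = []
    for j, k, x in tuples:
        for i in range(k):
            # Copy one symbol at a time: when the match runs past the text that
            # existed before this tuple, it reads symbols this loop just wrote.
            out.append(out[j - 1 + i])
        out.append(x)  # the unmatched symbol
    return "".join(out)


print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (1, 22, "$")]))
# ababababababababababababab$  (24 symbols, then $)
```

Copying one symbol at a time is what makes <1,22,$> decodable even though only two symbols exist when the tuple is read.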

Solution C

• The matching string can include part of itself!
• If x_{n+1} x_{n+2} ... x_{n+k} is a substring of x_1 x_2 ... x_n x_{n+1} x_{n+2} ... x_{n+k} that begins at j < n and x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} is not then x_{n+1} x_{n+2} ... x_{n+k} x_{n+k+1} can be coded by <j,k,x_{n+k+1}>

In Class Exercise

• Use Solution C to code the string
  – abaabaaabaaaab$
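A brute-force Python sketch of Solution C follows. It makes several assumptions: the name `lz77_encode` is illustrative, j is a 1-based start position, ties between equally long matches go to the earliest start, the match is cut one symbol short of the end so that an unmatched symbol always remains, and a match may start at any already coded position (including the last one). The quadratic search is only meant to show the rule.

```python
def lz77_encode(text):
    """Greedy <j,k,x> coding where a match may overlap the text it encodes."""
    tuples = []
    n = 0  # number of symbols coded so far
    while n < len(text):
        best_j, best_k = 0, 0
        for start in range(n):  # 0-based start anywhere in the coded text
            k = 0
            # start + k may run past n: the match then includes part of itself.
            while n + k < len(text) - 1 and text[start + k] == text[n + k]:
                k += 1
            if k > best_k:
                best_j, best_k = start + 1, k
        tuples.append((best_j, best_k, text[n + best_k]))
        n += best_k + 1
    return tuples


print(lz77_encode("ab" * 12 + "$"))  # [(0, 0, 'a'), (0, 0, 'b'), (1, 22, '$')]
```

Run on the alternating string, the sketch produces the three-tuple "surprise" code, because the third match is allowed to continue into the text it is itself encoding.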

Bounded Buffer – Sliding Window

• We want the triples <j,k,x> to be of bounded size. To achieve this we use bounded buffers.
  – Search buffer of size s is the symbols x_{n-s+1} ... x_n; j is then the offset into the buffer.
  – Look-ahead buffer of size t is the symbols x_{n+1} ... x_{n+t}
• Match pointer can start in search buffer and go into the look-ahead buffer but no farther.

[Figure: sliding window over aaaabababaaab$ with tuple <2,5,a>. The match pointer is in the search buffer (coded text) and the uncoded text pointer is at the start of the look-ahead buffer (uncoded text).]

Search in the Sliding Window

                 offset  length
aaaabababaaab$   1       0
aaaabababaaab$   2       1
aaaabababaaab$   2       2
aaaabababaaab$   2       3
aaaabababaaab$   2       4
aaaabababaaab$   2       5       tuple <2,5,a>

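The sliding-window search can be sketched in Python as below. Assumptions: the name `lz77_window_encode` is illustrative, the offset is counted backward from the first uncoded symbol (offset 1 is the last coded symbol), and `max_len` is a hypothetical cap on the match length supplied by the caller rather than derived from the look-ahead size.

```python
def lz77_window_encode(text, s, max_len):
    """Code text as <offset,length,symbol> tuples using a search buffer of size s."""
    tuples = []
    n = 0  # index of the first uncoded symbol
    while n < len(text):
        best_offset, best_len = 0, 0
        for offset in range(1, min(s, n) + 1):  # only look inside the search buffer
            k = 0
            while (k < max_len and n + k < len(text) - 1
                   and text[n - offset + k] == text[n + k]):
                k += 1  # the match may run on into the uncoded text
            if k > best_len:
                best_offset, best_len = offset, k
        tuples.append((best_offset, best_len, text[n + best_len]))
        n += best_len + 1
    return tuples


print(lz77_window_encode("aaaabababaaab$", s=4, max_len=8))
# [(0, 0, 'a'), (1, 3, 'b'), (2, 5, 'a'), (4, 2, '$')]
```

The third tuple is the <2,5,a> found by the search above: the match starts two symbols back and continues into the uncoded text.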
Coding Example

s = 4, t = 4, a = 3

                 tuple
aaaabababaaab$   <0,0,a>
aaaabababaaab$   <1,3,b>
aaaabababaaab$   <2,5,a>
aaaabababaaab$   <4,2,$>

Coding the Tuples

• Simple fixed length code
    ⌈log2(s + 1)⌉ + ⌈log2(s + t + 1)⌉ + ⌈log2 a⌉ bits per tuple

    s = 4, t = 4, a = 3
    tuple     fixed code
    <2,5,a>   010 0101 00

• Variable length code using adaptive Huffman or arithmetic code on tuples
  – Two passes, first to create the tuples, second to code the tuples
  – One pass, by pipelining tuples into a variable length coder
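The fixed-length code can be checked with a few lines of Python. This is a sketch under assumptions: the name `fixed_code` is illustrative, the symbols are numbered in the order given by `alphabet` (here a = 0, b = 1, $ = 2), and each field is rounded up to a whole number of bits.

```python
def fixed_code(j, k, x, s, t, alphabet):
    """Return the fixed-length bit string for the tuple <j,k,x>."""
    offset_bits = s.bit_length()                # offsets 0..s    -> ceil(log2(s+1)) bits
    length_bits = (s + t).bit_length()          # lengths 0..s+t  -> ceil(log2(s+t+1)) bits
    symbol_bits = (len(alphabet) - 1).bit_length()  # a symbols   -> ceil(log2 a) bits
    return (f"{j:0{offset_bits}b} "
            f"{k:0{length_bits}b} "
            f"{alphabet.index(x):0{symbol_bits}b}")


print(fixed_code(2, 5, "a", s=4, t=4, alphabet="ab$"))  # 010 0101 00
```

For s = 4, t = 4, a = 3 the fields are 3, 4, and 2 bits wide, so every tuple costs 9 bits.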

Zip and Gzip

• Search Window
  – Search buffer 32KB
  – Look-ahead buffer 258 Bytes
• How to store such a large dictionary
  – Hash table that stores the starting positions for all three byte sequences.
  – Hash table uses chaining with newest entries at the beginning of the chain. Stale entries can be ignored.
• Second pass for Huffman coding of tuples.
• Coding done in blocks to avoid disk accesses.

Example

aaaabababaaabaaaababababaaabba$
Current position: 12

Hash chains (newest first)
aba: 8 6 4
bab: 7 5
aab: 11 3
aaa: 10 2 1
baa: 9

Offset = 12 - 8 = 4
Length = 5
Tuple = <4,5,a>
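The chained hash table can be sketched in Python as follows. This is a simplification for illustration, not the code used by zip or gzip: a `dict` keyed by the three-symbol string replaces a real hash function, positions are 1-based as in the example, and the names `add_position`, `find_match`, `window`, and `max_len` are assumptions.

```python
from collections import defaultdict, deque


def add_position(text, pos, chains):
    """Record that a three-symbol sequence starts at 1-based position pos."""
    key = text[pos - 1:pos + 2]
    if len(key) == 3:
        chains[key].appendleft(pos)  # newest entries at the front of the chain


def find_match(text, pos, chains, window, max_len):
    """Return (offset, length) of the longest match for the text starting at pos."""
    best = (0, 0)
    for start in chains.get(text[pos - 1:pos + 2], ()):
        if pos - start > window:
            break  # this entry and all older ones are stale
        k = 0
        while (k < max_len and pos - 1 + k < len(text) - 1
               and text[start - 1 + k] == text[pos - 1 + k]):
            k += 1
        if k > best[1]:
            best = (pos - start, k)
    return best


text = "aaaabababaaabaaaababababaaabba$"
chains = defaultdict(deque)
for p in range(1, 12):
    add_position(text, p, chains)
print(list(chains["aba"]))                                   # [8, 6, 4]
print(find_match(text, 12, chains, window=32, max_len=258))  # (4, 5)
```

Because each chain is ordered newest first, the search can stop at the first entry that has fallen out of the search buffer.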

Example

aaaabababaaabaaaababababaaabba$
Current position: 18

Hash chains (newest first)
bab: 7 5
aba: 17 12 8 6 4
baa: 13 9
aab: 16 11 3
aaa: 15 14 10 2 1

No match
Tuple = <0,0,b>

Notes on LZ77

• Very popular, especially in the Unix world
• Many variants and implementations
  – Zip, Gzip, PNG, PKZip, Lharc, ARJ
• Tends to work better than LZW
  – LZW has dictionary entries that are never used
  – LZW has past strings that are not in the dictionary
  – LZ77 has an implicit dictionary. Common tuples are coded with few bits.