LEMPEL-ZIV COMPRESSION

ESE 751-SPEECH, IMAGE AND CODING
GROUP MEMBERS:
‡ SUHANA BINTI SABUDIN
‡ HARYANTI BINTI NORHAZMAN
‡ NURULAZLINA BINTI RAMLI
‡ FARHAN HANI BINTI GHAZALI

INTRODUCTION
‡ Lempel-Ziv coding is a lossless data compression scheme. The variant that follows the modifications of Welch is referred to as Lempel-Ziv-Welch (LZW) coding.
‡ It is not a single algorithm but a family of algorithms developed by Abraham Lempel and Jacob Ziv, e.g. LZW (Lempel-Ziv-Welch), which is used in the compress command of the Unix operating system.
‡ Adopted in a variety of imaging file formats, such as the graphic interchange format (GIF), tagged image file format (TIFF) and the portable document format (PDF).

BASIC PRINCIPLES OF ENCODING:
1) It assigns fixed-length codewords to variable-length sequences of source symbols.
2) Unlike Huffman coding and arithmetic coding, this coding scheme does not require a priori knowledge of the probabilities of the source symbols.
3) The coding is based on a dictionary or codebook containing the source symbols to be encoded. The coding starts with an initial dictionary, which is enlarged with the arrival of new symbol sequences.
4) There is no need to transmit the dictionary from the encoder to the decoder. A Lempel-Ziv decoder builds an identical dictionary during the decoding process.

LEMPEL AND ZIV INTRODUCED DYNAMIC DICTIONARY ENCODERS KNOWN AS:
‡ LZ77 : An adaptive dictionary-based compression algorithm developed in 1977. The algorithm uses a sliding window as its dictionary, where each entry is a character. LZ77 codewords consist of an offset into the sliding window and the number of characters following the offset to include in the encoded string.

‡ LZ78 : Due to the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978. The technique replaces phrases with a pointer to where they have occurred earlier in the text.
‡ LZW : If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise it inserts two- or multi-character, overlapping, distinct patterns of the message to be encoded in a dictionary.

1) LZ77 ALGORITHM
‡ First paper by Lempel and Ziv in 1977 about lossless compression with an adaptive dictionary.
‡ Adaptive dictionary: entries are taken from the text itself and created on the fly. LZ77 uses previously seen text to build the dictionary; strings of symbols are added to it.
‡ A search buffer containing the encoded character sequence that precedes the current coding position can be considered as a dictionary.
‡ The encoder matches the input sequence through a sliding window.

LZ77 CONT.
‡ Main data structure is a sliding window divided into two parts:
   - A search buffer, which holds a large block of already processed text.
   - A look-ahead buffer, which holds characters read in from the input but not yet encoded.
‡ The algorithm tries to match the contents of the look-ahead buffer to a string in the search buffer: symbols within the look-ahead buffer are compared with data in the search buffer.


LZ77: ENCODING
‡ To encode the sequence in the look-ahead buffer, the search buffer is searched to find the longest match with a prefix of the look-ahead buffer.
‡ The match can overlap with the look-ahead buffer, but it cannot be the look-ahead buffer itself: it must start inside the search buffer.
‡ Once the longest match is found, it is coded into a fixed-length codeword consisting of three elements: (position, length, the character following the prefix in the look-ahead buffer).
‡ The window is shifted left by length+1 symbols to begin the next search.
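The longest-match search, including a match that runs on into the look-ahead buffer, can be sketched as below. This is a minimal illustration and not a reference implementation: the function name `longest_match` and the choice to report the position as a backward distance from the end of the search buffer are assumptions made for this example.

```python
def longest_match(search, lookahead):
    """Return (distance, length) of the longest prefix of `lookahead`
    that starts somewhere inside `search`.

    Assumption: `distance` is counted backwards from the end of the
    search buffer; (0, 0) means no match was found.
    """
    window = search + lookahead
    best = (0, 0)
    for start in range(len(search)):
        length = 0
        # Keep one look-ahead symbol free for the literal in the codeword.
        while (length < len(lookahead) - 1
               and window[start + length] == lookahead[length]):
            length += 1
        if length > best[1]:
            best = (len(search) - start, length)
    return best


# The match starts in the search buffer but runs on into the look-ahead.
print(longest_match("a", "aaab"))       # (1, 3)
print(longest_match("abrac", "adabr"))  # (5, 1)
```

The first call shows the overlap case: only one "a" has been seen so far, yet three symbols can be matched because the copy is allowed to read symbols it has just produced.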

LZ77: SLIDING WINDOW
‡ This is where we spotted the weakness of the outlined algorithm: what happens if the input is very long and therefore references (and lengths) become very large numbers?
‡ The LZ77 algorithm employs a principle called sliding window: it looks at the data through a search buffer of limited size; anything outside this window can neither be referenced nor encoded.
‡ As more data is being encoded, the window slides along, removing the oldest encoded data from the view and adding new unencoded data to it.
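Because the window is bounded, each codeword fits in a fixed number of bits. The arithmetic can be sketched as follows; the function name `codeword_bits`, the 8-bit symbol size, and the buffer sizes in the usage lines are illustrative assumptions, not values from the slides.

```python
def codeword_bits(search_size, lookahead_size, symbol_bits=8):
    """Bits needed for one (position, length, next symbol) codeword.

    Assumptions: the position takes values 0..search_size (0 = no match)
    and the length takes values 0..lookahead_size - 1.
    """
    position_bits = search_size.bit_length()          # search_size + 1 values
    length_bits = (lookahead_size - 1).bit_length()   # lookahead_size values
    return position_bits + length_bits + symbol_bits


print(codeword_bits(4095, 16))  # 12 + 4 + 8 = 24
print(codeword_bits(7, 5))      # 3 + 3 + 8 = 14
```

Without a bounded window, the position and length fields would have to grow with the input, which is the weakness the sliding window avoids.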

EXAMPLE OF LZ77 (ENCODER PART)
Input string = abracadabrad
Steps:
1) Read an unencoded string (in the look-ahead buffer).
2) Search the search buffer for the longest match with the current look-ahead buffer. If a match is found, write the encoded output (fixed-length codeword) following this concept:

Concept of encoding: encode = < x , y , z > where
x = location of the matched prefix found in the search buffer
y = length of the matched prefix
z = next character after the matched prefix in the look-ahead buffer

Reminder!!! If there is a matched prefix in the search buffer, then at the next sliding window the matched prefix, together with the character that follows it, is moved into the search buffer.

3) Go through the look-ahead buffer until the encoding process is finished.
4) Finally, we get the string encoded by this compression algorithm.
Unencoded string: S = abracadabrad
Encoded output: (not preserved in the source)
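The encoder steps can be sketched in Python as below. This is a sketch under stated assumptions rather than the presenters' worked table: the name `lz77_encode`, the buffer sizes (7 and 5), reporting x as the backward distance from the current coding position, and preferring the oldest match on ties are all choices made for this example.

```python
def lz77_encode(data, search_size=7, lookahead_size=5):
    """Encode `data` as a list of <x, y, z> triples.

    Assumptions: x is the backward distance to the match (0 = no match),
    y is the match length, z is the next character after the match.
    """
    out = []
    pos = 0
    while pos < len(data):
        best_x, best_y = 0, 0
        start = max(0, pos - search_size)
        # Reserve one symbol of the look-ahead buffer for z.
        max_len = min(lookahead_size - 1, len(data) - pos - 1)
        for cand in range(start, pos):
            length = 0
            while length < max_len and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_y:
                best_x, best_y = pos - cand, length
        out.append((best_x, best_y, data[pos + best_y]))
        pos += best_y + 1  # slide the window by length + 1 symbols
    return out


for triple in lz77_encode("abracadabrad"):
    print(triple)
# (0, 0, 'a')
# (0, 0, 'b')
# (0, 0, 'r')
# (3, 1, 'c')
# (5, 1, 'd')
# (7, 4, 'd')
```

With other buffer sizes or a different position convention the triples change, so this output should not be read as the table from the original slides.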

LZ77 : DECODING PROCESS
‡ The process described here is that of LZSS, an LZ77 variant in which each item carries an encoded/not encoded flag.
‡ The LZSS decoding process is less resource intensive than the LZSS encoding process. The encoding process requires that the dictionary is searched for matches to the string to be encoded.
‡ Decoding an offset and length combination only requires going to a dictionary offset and copying the specified number of symbols. No searching is required.

DECODING INPUT REQUIRES THE FOLLOWING STEPS:
Step 1. Initialize the dictionary to a known value.
Step 2. Read the encoded/not encoded flag.
Step 3. If the flag indicates an encoded string:
   Step 3a. Read the encoded length and offset, then copy the specified number of symbols from the dictionary to the decoded output.
   Step 3b. Otherwise, read the next character and write it to the decoded output.
Step 4. Shift a copy of the symbols written to the decoded output into the dictionary.
Step 5. Repeat from Step 2, until the entire input has been decoded.
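The five decoding steps can be sketched as follows. The token layout is an assumption made for illustration: `(0, char)` stands for a flagged literal and `(1, offset, length)` for a flagged offset/length pair, with the offset counted backwards from the end of the dictionary. The name `lzss_decode`, the window size, and the fill character are likewise hypothetical.

```python
def lzss_decode(tokens, window_size=16, fill=" "):
    """Decode a list of flagged tokens (assumed layout, see lead-in)."""
    dictionary = [fill] * window_size              # Step 1: known initial value
    decoded = []
    for token in tokens:                           # Step 5: repeat for all input
        if token[0] == 1:                          # Steps 2-3: read the flag
            _, offset, length = token              # Step 3a: offset and length
            symbols = []
            for _ in range(length):
                # Read through the symbols copied so far, so that a copy
                # may overlap the text it is producing.
                symbols.append((dictionary + symbols)[-offset])
        else:
            symbols = [token[1]]                   # Step 3b: literal character
        decoded.extend(symbols)
        # Step 4: shift the written symbols into the dictionary.
        dictionary = (dictionary + symbols)[-window_size:]
    return "".join(decoded)


tokens = [(0, "a"), (0, "b"), (0, "r"), (0, "a"), (0, "c"),
          (0, "a"), (0, "d"), (1, 7, 4)]
print(lzss_decode(tokens))  # abracadabra
```

No searching takes place anywhere in the loop, which is why decoding is the cheaper direction.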

EXAMPLE OF LZ77 (DECODER PART)
To get the original input string, we need to carry out the decompression process.
Steps:
1) Take the encoded output obtained from the compression process.
2) Construct a table of the encoded triples. (table not preserved in the source)
3) Decode the encoded input level by level, following the decoding concept:

Concept of decoding: decode = < x , y , z > where
x = location of the matched prefix that was found previously
y = length of the matched prefix
z = next character after the matched prefix

4) Repeat step 3 until all of the encoded input is finished.
5) Finally, we get the decoded output. It represents the original input string.
Encoded input: (not preserved in the source)
The string (decoded output): S = abracadabrad
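A decoder for < x , y , z > triples can be sketched as below. It assumes x is the backward distance from the end of the text decoded so far (0 when there is no match); the slides do not preserve their own convention, so both this convention and the triples in the usage example are illustrative. The name `lz77_decode` is hypothetical.

```python
def lz77_decode(triples):
    """Rebuild the text from (x, y, z) triples.

    Assumption: x is a backward distance into the already decoded text.
    """
    out = []
    for x, y, z in triples:
        start = len(out) - x
        for i in range(y):
            # Copy one symbol at a time so a match may overlap its own output.
            out.append(out[start + i])
        out.append(z)
    return "".join(out)


triples = [(0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
           (3, 1, "c"), (5, 1, "d"), (7, 4, "d")]
print(lz77_decode(triples))  # abracadabrad
```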

2) LZ78 ALGORITHM
‡ Due to the inefficiency of LZ77, Lempel and Ziv developed a different form of dictionary-based compression in 1978 called LZ78.
‡ Instead of having a limited-size window into the preceding text, LZ78 builds its dictionary out of all of the previously seen symbols in the input text.
‡ The basic idea of this method is to build a dictionary of strings while encoding.

LZ78 CONT.
‡ Both the encoder and decoder start off with an empty dictionary.
‡ The dictionary is built progressively, one character at a time. As each character is read in, it is added to the current string. As long as the current string matches some existing phrase in the dictionary, this process continues.

LET'S BUILD THE DICTIONARY!!!

EXAMPLE OF LZ78 (ENCODER PART)
The string S = bacababbaabbabbaaacbbc is to be encoded. Show the encoding process.
STEPS:
1) Initially the dictionary is empty. Go through the given string character by character until a unique input symbol (one with no match in the dictionary) or a new phrase is encountered, and then add it to the dictionary.
2) Encode each phrase by checking whether it matches an existing phrase in the dictionary.

Concept of encoding: encode = < x , y > where
x = index number of the matching phrase in the dictionary
y = the last character of the phrase (unique symbol)

3) Repeat step 2 until the string is finished.
4) Finally, we get the string encoded by this compression algorithm.
Unencoded string: S = bacababbaabbabbaaacbbc
Encoded output: (not preserved in the source)
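The dictionary-building procedure can be sketched in Python as follows. The name `lz78_encode`, numbering dictionary entries from 1, and using index 0 for "no matching phrase" are assumptions made for this sketch; the slides' own table is not preserved, so the printed pairs are the output of this sketch only.

```python
def lz78_encode(text):
    """Encode `text` as a list of <x, y> pairs.

    Assumptions: dictionary indices start at 1 and x = 0 means the
    phrase consists of the single new symbol y.
    """
    dictionary = {}   # phrase -> index
    out = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no symbol.
        out.append((dictionary[phrase], ""))
    return out


for pair in lz78_encode("bacababbaabbabbaaacbbc"):
    print(pair)
# (0, 'b') (0, 'a') (0, 'c') (2, 'b') (4, 'b')
# (2, 'a') (1, 'b') (5, 'a') (6, 'c') (7, 'c')   (one pair per line)
```

The ten phrases added to the dictionary here are b, a, c, ab, abb, aa, bb, abba, aac, bbc.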

EXAMPLE OF LZ78 (DECODER PART)
To get the original input string, we need to carry out the decompression process.
Steps:
1) Decode the encoded output obtained from the compression process.

Concept of decoding: decode = < x1 , y1 > where
x1 = the index number; take the contents of that earlier index number in the decoded column as the start of the decoded output
y1 = the last character in the encoded output, which is kept as it is

* Again, as each character is read in, it is added to the current string.

2) Repeat step 1 until the encoded output is finished.
3) Finally, we get the decoded output. It represents the original input string.
Encoded input: (not preserved in the source)
The string (decoded output): S = bacababbaabbabbaaacbbc
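The decoding concept can be sketched as below. The pair layout (index numbering from 1, with 0 meaning an empty prefix) is an assumption of this sketch, and the pairs in the usage example are illustrative input in that layout rather than the slide's lost table. The name `lz78_decode` is hypothetical.

```python
def lz78_decode(pairs):
    """Rebuild the text from <x1, y1> pairs.

    Assumption: index 0 is the empty phrase and each decoded phrase is
    appended to the dictionary in order, exactly as the encoder did.
    """
    phrases = [""]   # phrases[0] is the empty phrase
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol   # earlier phrase plus last character
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


pairs = [(0, "b"), (0, "a"), (0, "c"), (2, "b"), (4, "b"),
         (2, "a"), (1, "b"), (5, "a"), (6, "c"), (7, "c")]
print(lz78_decode(pairs))  # bacababbaabbabbaaacbbc
```

The decoder never receives the dictionary; it rebuilds the same entries in the same order from the pairs alone.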

3) LZW : LEMPEL-ZIV-WELCH ALGORITHM
‡ Lempel-Ziv-Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch.
‡ Published by Welch in 1984 as an improved implementation of the LZ78 algorithm.
‡ The algorithm is designed to be fast to implement but is not usually optimal because it performs only limited analysis of the data.

HOW LZW WORKS?
‡ The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character (ASCII codes).
‡ The remaining codes (256 through 4095) are assigned to strings as the algorithm proceeds.
‡ The example runs as shown with 12-bit codes. This means codes 0-255 refer to individual bytes, while codes 256-4095 refer to substrings.
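To make the 12-bit code size concrete, the sketch below packs a list of codes into bytes at 12 bits per code. The name `pack_12bit` and the most-significant-bit-first layout are assumptions for illustration; real file formats define their own packing rules.

```python
def pack_12bit(codes):
    """Pack codes (each 0..4095) into bytes, 12 bits per code, MSB first."""
    bits = 0      # bit accumulator
    nbits = 0     # number of valid bits in the accumulator
    out = bytearray()
    for code in codes:
        bits = (bits << 12) | code
        nbits += 12
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1   # keep only the bits not yet written
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)   # pad the final byte with zeros
    return bytes(out)


packed = pack_12bit([66, 65, 256, 257, 65, 260])
print(len(packed))  # 9
```

Six codes take 72 bits, or 9 bytes, so a 9-character input that compresses to six codes gains nothing at this code size; savings only appear once longer dictionary strings are reused.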

LZW CONT.
‡ Produces only a list of dictionary entry indexes.
‡ Encoding:
1. Start with an initial dictionary, for example the possible ASCII characters (0..255).
2. From the input, find the longest string that exists in the dictionary.
3. Output this string's index in the dictionary.
4. Append the next character in the input to that string and add it to the dictionary.
5. Continue from that character on from (2).
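The five encoding steps can be sketched as below. The name `lzw_encode` is hypothetical, and the sketch assumes every input character has a code point in 0..255 and that the table simply stops growing once code 4095 has been assigned.

```python
def lzw_encode(text, max_code=4095):
    """Return the list of dictionary indexes for `text`."""
    table = {chr(i): i for i in range(256)}    # 1. initial dictionary
    codes = []
    p = ""                                     # longest match found so far
    for c in text:
        if p + c in table:                     # 2. extend the match
            p += c
        else:
            codes.append(table[p])             # 3. output its index
            if len(table) <= max_code:
                table[p + c] = len(table)      # 4. add match + next character
            p = c                              # 5. continue from that character
    if p:
        codes.append(table[p])                 # flush the final match
    return codes


print(lzw_encode("BABAABAAA"))  # [66, 65, 256, 257, 65, 260]
```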

EXAMPLE 1: COMPRESSION USING LZW
Use the LZW algorithm to compress the string BABAABAAA.

STEP | ENCODER OUTPUT (code : representing) | STRING TABLE (codeword : string)
1    | 66 : B                               | 256 : BA
2    | 65 : A                               | 257 : AB
3    | 256 : BA                             | 258 : BAA
4    | 257 : AB                             | 259 : ABA
5    | 65 : A                               | 260 : AA
6    | 260 : AA                             | none (P = AA, C = empty)

Encoder output: <66><65><256><257><65><260>

LZW DECOMPRESSION
‡ The LZW decompressor creates the same string table during decompression.
‡ It starts with the first 256 table entries initialized to single characters.
‡ The string table is updated for each code in the input stream, except the first one.
‡ Decoding is achieved by reading codes and translating them through the code table being built.

EXAMPLE 2: LZW DECOMPRESSION
Use LZW to decompress the output sequence of Example 1: <66><65><256><257><65><260>.

The first code, 66, is output directly as B. Each following step reads one code:

STEP | CODE READ | OUTPUT STRING | STRING TABLE (codeword : string)
1    | 65        | A             | 256 : BA
2    | 256       | BA            | 257 : AB
3    | 257       | AB            | 258 : BAA
4    | 65        | A             | 259 : ABA
5    | 260       | AA            | 260 : AA

In step 1: Old = 66, New = 65, S = A, C = A, so the entry added is the string of Old plus C, i.e. BA.
In step 5 the code 260 is not yet in the table when it is read, so S is the string of Old plus its own first character: A + A = AA.
Decoded output: BABAABAAA
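The decompression procedure, including the case where a code arrives before its table entry exists, can be sketched as follows. The function name `lzw_decode` and the variable names are illustrative, and the sketch assumes the code sequence came from a matching encoder that started from the same 256 single-character entries.

```python
def lzw_decode(codes):
    """Rebuild the text from a list of LZW codes."""
    table = {i: chr(i) for i in range(256)}   # same initial table as the encoder
    if not codes:
        return ""
    old = codes[0]
    out = [table[old]]                        # first code is output directly
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:
            # Code not in the table yet: it can only be the previous
            # string followed by that string's own first character.
            s = table[old] + table[old][0]
        out.append(s)
        table[len(table)] = table[old] + s[0]  # add old string + first char of s
        old = new
    return "".join(out)


print(lzw_decode([66, 65, 256, 257, 65, 260]))  # BABAABAAA
```

The table is updated once per code after the first, which is why the decoder's entries always lag one step behind the encoder's.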

ADVANTAGES AND DISADVANTAGES OF LZW
‡ LZW compression works best for files containing lots of repetitive data. This is often the case with text and monochrome images.
‡ Files that are compressed but that do not contain any repetitive information at all can even grow bigger!
‡ LZW compression is fast.
‡ LZW is a fairly old compression technique. All recent computer systems have the horsepower to use more efficient algorithms.

WHERE IS LZW COMPRESSION USED?
‡ LZW compression can be used in a variety of file formats:
   ‡ TIFF files
   ‡ GIF files
   ‡ PDF files
‡ In recent applications LZW has been replaced by the more efficient Flate algorithm.

CONCLUSION
‡ The Lempel-Ziv algorithms belong to the adaptive category of dictionary coders.
‡ The dictionary is built in a single pass, while at the same time also encoding the data.
‡ It is not necessary to explicitly transmit/store the dictionary because the decoder can build up the dictionary in the same way as the encoder while decompressing the data.

CONCLUSION (CONT)
‡ LZ77 : Memory/speed constraints require restrictions: it uses a fixed-size window (sliding window principle) and uses previously processed data as the dictionary.
‡ LZ78 : Keeps an explicit dictionary and gradually builds the dictionary during encoding.
‡ LZW : Most popular modification of LZ78; the dictionary is pre-filled with the input alphabet.

THE END - THANK YOU
