Lempel-Ziv, or LZ77, is an adaptive dictionary-based compression technique. LZ77 exploits the fact that words and phrases within a text file are likely to be repeated. When there is repetition, they can be encoded as a pointer to an earlier occurrence, with the pointer accompanied by the number of characters to be matched. The encoder examines the input sequence through a sliding window. This window consists of two parts: a search buffer that contains a portion of the recently encoded sequence, and a look-ahead buffer that contains the next portion of the sequence to be encoded. In practice the sizes of the two buffers are considerably larger. Each match is encoded as a triple <o, l, c>, where o is the offset (the distance of the pointer from the look-ahead buffer), l is the length of the longest match, and c is the codeword corresponding to the symbol in the look-ahead buffer that follows the match. It is a very simple adaptive scheme that requires no prior knowledge of the source and seems to require no assumptions about the characteristics of the source.
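To make the windowed matching concrete, the following is a minimal Python sketch of an LZ77 encoder that emits <o, l, c> triples; the buffer sizes, function name, and details are illustrative choices, not taken from the original description.

```python
def lz77_encode(data, search_size=255, lookahead_size=15):
    """Encode `data` as a list of (offset, length, next_char) triples."""
    triples = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - search_size)
        # Scan the search buffer for the longest match with the look-ahead.
        for j in range(start, i):
            length = 0
            while (length < lookahead_size
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        # c is the symbol in the look-ahead buffer that follows the match.
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples

# Example: lz77_encode("abrabrabra")
# -> [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 6, 'a')]
# The repeated "abrabr" is replaced by a single pointer into the window.
```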
LZW is a universal lossless data compression technique created by Abraham Lempel, Jacob Ziv, and Terry Welch. The technique is simple to implement and has the potential for very high throughput in hardware implementations. LZW compression builds a table of strings commonly occurring in the data being compressed and replaces the actual data with references into the table. The table is formed during compression at the same time as the data is encoded, and during decompression at the same time as the data is decoded. LZW removes the necessity of encoding the second element of the pair <i, c>; that is, the encoder sends only the index into the dictionary. For this to work, the dictionary has to be primed with all the letters of the source alphabet. The technique is surprisingly simple: it replaces strings of characters with single codes and does not perform any analysis of the incoming text.
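As a concrete illustration, here is a small Python sketch of LZW encoding under these assumptions: the dictionary is primed with all 256 single-byte strings, so the output is a sequence of indices only. The function name and details are illustrative.

```python
def lzw_encode(data):
    """Encode `data` (bytes) as a list of dictionary indices."""
    table = {bytes([b]): b for b in range(256)}  # primed with the alphabet
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                     # extend the current string
        else:
            out.append(table[w])       # emit index of longest known string
            table[wc] = next_code      # add the new string to the table
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

# Example: lzw_encode(b"ABABABA") -> [65, 66, 256, 258]
# "AB" and "ABA" enter the table as codes 256 and 258 and are reused.
```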
This scheme was initiated by Ziv and Lempel; an implementation using a binary tree was proposed by Bell. The technique is quite simple: a ring buffer is kept, which initially contains only "space" characters. Several letters are read from the file into the buffer. The buffer is then searched for the longest string that matches the letters just read, and its length and position in the buffer are sent. If the buffer size is 4096 bytes, the position can be encoded in 12 bits. If the match length is represented in four bits, the <position, length> pair is two bytes long. If the longest match is no more than two characters, just one character is sent without encoding, and the process is restarted with the next letter. One extra bit must be sent each time to tell the decoder whether a <position, length> pair or the code of a character follows. A sketch of this scheme is given below.
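The following Python sketch follows the scheme just described, with the parameters from the text (a 4096-byte window, four-bit lengths covering matches of 3 to 18 bytes, and a two-character threshold). For readability it returns symbolic tokens instead of packed bits; names are illustrative.

```python
WINDOW = 4096       # 12-bit positions
MIN_MATCH = 3       # matches of <= 2 characters are sent as literals
MAX_MATCH = 18      # 4-bit length field stores (length - MIN_MATCH)

def lzss_encode(data):
    """Return a list of ('lit', char) or ('pair', position, length) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_pos, best_len = 0, 0
        start = max(0, i - WINDOW)
        for j in range(start, i):
            length = 0
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = j, length
        if best_len >= MIN_MATCH:       # long enough to pay for the pair
            tokens.append(('pair', best_pos, best_len))
            i += best_len
        else:                           # too short: send one raw character
            tokens.append(('lit', data[i]))
            i += 1
    return tokens
```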
In each step the LZSS technique sends either a character or a [position, length] pair. Among characters, "e" is likely to appear more frequently than "x", and a [position, length] pair of length 3 is likely to be more common than one of length 18. Thus, if the more frequent items are encoded in fewer bits and the less frequent ones in more bits, the total length of the encoded text will be reduced. This observation suggests using arithmetic coding, preferably of an adaptive kind, together with LZSS.
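The bit-budget argument can be made concrete with the ideal code lengths that an arithmetic coder approaches: a symbol of probability p costs about -log2(p) bits. The counts below are hypothetical.

```python
import math

# Hypothetical adaptive frequency counts for two characters.
counts = {'e': 120, 'x': 2}
total = sum(counts.values())
for sym, c in counts.items():
    # Ideal cost under an arithmetic coder: -log2(probability) bits.
    print(sym, round(-math.log2(c / total), 2), "bits")
# 'e' (p ~ 0.98) costs ~0.02 bits; 'x' (p ~ 0.016) costs ~5.93 bits.
```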
LZHUF, the technique of Haruyasu Yoshizaki, replaces LZARI's adaptive arithmetic coding with adaptive Huffman coding. LZHUF encodes the most significant 6 bits of the position in its 4096-byte buffer by table lookup, with more recent (and hence more probable) positions coded in fewer bits; the remaining 6 bits are sent verbatim. Because Huffman coding encodes each letter into a fixed number of bits, table lookup can be easily implemented.
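A Python sketch of the position split described above: the upper 6 bits of a 12-bit position are looked up in a code table (shorter codes for more recent positions) and the lower 6 bits are passed through verbatim. The table here is a made-up stand-in, not Yoshizaki's actual code table.

```python
def encode_position(pos, code_table):
    """Split a 12-bit position into a table-coded part and a verbatim part."""
    upper = pos >> 6          # most significant 6 bits: coded by table lookup
    lower = pos & 0x3F        # least significant 6 bits: sent verbatim
    codeword, nbits = code_table[upper]   # (code bits, code length in bits)
    return codeword, nbits, lower

# Hypothetical table fragment: upper-6-bit value -> (codeword, bit length),
# with shorter codewords assigned to more recent positions.
# code_table = {0: (0b0, 1), 1: (0b10, 2), 2: (0b110, 3), ...}
```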
PPM, or prediction by partial matching, is an adaptive statistical modeling technique based on blending together context models of different lengths to predict the next character in the input sequence. A series of improvements, called PPMC, was described, tuned to improve compression and increase execution speed; the exclusion principle is also used to improve performance. PPM relies on arithmetic coding to obtain very good compression performance. It combines several fixed-order context models to predict the next character in an input sequence. The prediction probabilities for each context in the model are calculated from frequency counts, which are updated adaptively, and the symbols that occur are encoded relative to their predicted distribution using arithmetic coding.
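The following is a very small PPM-style model sketch in Python (orders 2 down to 0, frequency counts, longest-context-first prediction). It omits the arithmetic coder and escape probabilities, and all names are illustrative.

```python
from collections import defaultdict, Counter

class TinyPPM:
    def __init__(self, max_order=2):
        self.max_order = max_order
        # One frequency table per order: context string -> symbol counts.
        self.tables = [defaultdict(Counter) for _ in range(max_order + 1)]

    def update(self, history, symbol):
        # Update the counts of every context order, as PPM does.
        for order in range(self.max_order + 1):
            if order > len(history):
                continue
            ctx = history[len(history) - order:]
            self.tables[order][ctx][symbol] += 1

    def predict(self, history):
        # Use the longest context seen so far; fall back (escape) downward.
        for order in range(self.max_order, -1, -1):
            if order > len(history):
                continue
            ctx = history[len(history) - order:]
            counts = self.tables[order].get(ctx)
            if counts:
                total = sum(counts.values())
                return {s: c / total for s, c in counts.items()}
        return {}   # nothing seen yet

# Usage: feed the text symbol by symbol, then query a context.
# model = TinyPPM()
# text = "abracadabra"
# for i, ch in enumerate(text):
#     model.update(text[:i], ch)
# model.predict("ra")   # -> {'c': 1.0}, since "ra" was followed by "c"
```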
PPMC (prediction by partial matching without exclusion) assigns a probability to the escape character using what is called method C, as follows: at any level, within the current context, let the total number of symbols seen previously be n and let n_d be the total number of distinct symbols. Then the probability of the escape character is given by n_d / (n + n_d), and any character which appeared in this context n_i times will have probability n_i / (n + n_d). The intuitive explanation of this technique, based on experimental evidence, is that if many distinct symbols are encountered, the escape character will have a higher probability, but if these distinct symbols tend to appear many times each, the probability of the escape character decreases. The PPM technique using method C for probability estimation is called the PPMC technique.
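A short worked example of method C, with hypothetical counts:

```python
# Suppose the current context has been seen n = 10 times, covering
# n_d = 3 distinct symbols, and symbol 'a' appeared n_a = 6 of those times.
n, n_d, n_a = 10, 3, 6
p_escape = n_d / (n + n_d)    # 3 / 13 ~ 0.231
p_a = n_a / (n + n_d)         # 6 / 13 ~ 0.462
```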
PPMC with Exclusion Technique
PPMC can be modified by using exclusion. This modification improves the compression ratio, but it is slower than the first variant. The exclusion principle states that if a symbol has already been predicted, and rejected via an escape, in a higher-order context, it is excluded from the probability estimates of the lower-order contexts.
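A minimal sketch of how exclusion changes the method C estimates, assuming a set of symbols already rejected in higher-order contexts (all names hypothetical):

```python
def probabilities_with_exclusion(counts, excluded):
    """Method C probabilities after removing symbols rejected higher up."""
    counts = {s: c for s, c in counts.items() if s not in excluded}
    if not counts:
        return {'<esc>': 1.0}          # everything excluded: must escape
    n = sum(counts.values())           # symbols seen in this context
    n_d = len(counts)                  # distinct symbols (method C count)
    probs = {s: c / (n + n_d) for s, c in counts.items()}
    probs['<esc>'] = n_d / (n + n_d)
    return probs
```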