blanks in the sequence. The number of occurrences can again be offset (by −3). Sequences of between three and 257 bytes can thus be reduced to two bytes. Further variations are tabulators used to replace a specific number of null bytes and the definition of different M-bytes to specify different numbers of null bytes. For example, an M5-byte could replace 16 null bytes, while an M4-byte could replace 8 null bytes. An M5-byte followed by an M4-byte would then represent 24 null bytes.

7.4.3 Vector Quantization

In the case of vector quantization, a data stream is divided into blocks of n bytes each (n > 1). A predefined table contains a set of patterns. For each block, the table is consulted to find the most similar pattern (according to a fixed criterion). Each pattern in the table is associated with an index. Thus, each block can be assigned an index. Such a table can also be multidimensional, in which case the index will be a vector. The corresponding decoder has the same table and uses the vector to generate an approximation of the original data stream. For further details see [Gra84], for example.

7.4.4 Pattern Substitution

A technique that can be used for text compression substitutes single bytes for patterns that occur frequently. This pattern substitution can be used to code, for example, the terminal symbols of high-level languages (begin, end, if). By using an M-byte, a larger number of words can be encoded: the M-byte indicates that the next byte is an index representing one of 256 words. The same technique can be applied to still images, video, and audio. In these media, it is not easy to identify small sets of frequently occurring patterns. It is thus better to perform an approximation that looks for the most similar (instead of the same) pattern. This is the vector quantization described above.

7.4.5 Diatomic Encoding

Diatomic encoding is a variation based on combinations of two data bytes.
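A minimal sketch of combining two data bytes into one, assuming the pairs are counted from the input itself and each chosen pair is replaced by a byte value that does not occur in the input. The function names (`most_frequent_pairs`, `build_table`, `encode`, `decode`) and the greedy left-to-right scan are illustrative assumptions, not part of the text.

```python
from collections import Counter


def most_frequent_pairs(data: bytes, count: int = 8) -> list[bytes]:
    """Return the `count` most frequent adjacent byte pairs in `data`."""
    pairs = Counter(data[i:i + 2] for i in range(len(data) - 1))
    return [pair for pair, _ in pairs.most_common(count)]


def build_table(data: bytes, pairs: list[bytes]) -> dict[bytes, int]:
    """Map each pair to a byte value that does not occur in `data`."""
    used = set(data)
    unused = [value for value in range(256) if value not in used]
    if len(unused) < len(pairs):
        raise ValueError("not enough unused byte values for the pair table")
    return dict(zip(pairs, unused))


def encode(data: bytes, table: dict[bytes, int]) -> bytes:
    """Scan left to right, replacing each known pair with its single byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decode(encoded: bytes, table: dict[bytes, int]) -> bytes:
    """Expand each special byte back into the pair it stands for."""
    reverse = {code: pair for pair, code in table.items()}
    out = bytearray()
    for value in encoded:
        out += reverse.get(value, bytes([value]))
    return bytes(out)


if __name__ == "__main__":
    text = b"THE THEME OF THE ESSAY IS THE THEORY"
    table = build_table(text, most_frequent_pairs(text))
    packed = encode(text, table)
    print(len(text), len(packed), decode(packed, table) == text)
```

The decoder needs the same pair table as the encoder. In this sketch the table is passed in directly; a real format would have to fix the table in advance or transmit it with the data.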
This technique determines the most frequently occurring pairs of bytes. Studies have shown that the eight most frequently occurring pairs in the English language include “TH” and “HE,” as well as the letters “E,” “T,” and “A” each paired with a blank. Replacing these pairs by special single bytes that otherwise do not occur in the text leads to a data reduction of more than ten percent.

7.4.6 Statistical Coding

There is no fundamental reason that different characters need to be coded with a fixed number of bits. Morse code is based on this: frequently occurring characters are coded with short strings, while seldom-occurring characters are coded with longer strings. Such statistical coding depends on how frequently individual characters or
