Chapter 7 • Data Compression
blanks in the sequence. The number of occurrences can again be offset (by -3).
Sequences of between three and 257 bytes can thus be reduced to two bytes. Further
variations are tabulators used to replace a specific number of null bytes and the
definition of different M-bytes to specify different numbers of null bytes. For example,
an M5-byte could replace 16 null bytes, while an M4-byte could replace 8 null bytes. An
M5-byte followed by an M4-byte would then represent 24 null bytes.
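
The suppression of blank runs described above can be sketched as follows. This is
only an illustration under stated assumptions: the marker value 0xFF, the blank
value 0x20, and the function names are hypothetical choices, and the marker is
assumed not to occur in the input.

```python
M_BYTE = 0xFF      # assumed marker (M-byte); must not occur in the input
BLANK = 0x20       # the byte value being suppressed
MIN_RUN = 3        # shortest run worth replacing (the offset of -3)
MAX_RUN = 257      # longest run coded by one M-byte/count pair, as in the text


def suppress_blanks(data: bytes) -> bytes:
    """Replace each run of 3 to 257 blanks by the M-byte plus a count byte."""
    if M_BYTE in data:
        raise ValueError("marker byte occurs in the data")
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != BLANK:
            out.append(data[i])
            i += 1
            continue
        # Measure the run of blanks starting at i, capped at MAX_RUN.
        run = 1
        while run < MAX_RUN and i + run < len(data) and data[i + run] == BLANK:
            run += 1
        if run >= MIN_RUN:
            out += bytes([M_BYTE, run - MIN_RUN])   # count stored with offset
        else:
            out += data[i:i + run]                  # too short, copy as is
        i += run
    return bytes(out)


def restore_blanks(packed: bytes) -> bytes:
    """Inverse of suppress_blanks."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == M_BYTE:
            out += bytes([BLANK]) * (packed[i + 1] + MIN_RUN)
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

Runs longer than 257 blanks are simply split into several M-byte/count pairs in
this sketch.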
7.4.3 Vector Quantization
In the case of vector quantization, a data stream is divided into blocks of n bytes
each (n > 1). A predefined table contains a set of patterns. For each block, the table
is consulted to find the most similar pattern (according to a fixed criterion). Each
pattern in the table is associated with an index. Thus, each block can be assigned an
index. Such a table can also be multidimensional, in which case the index will be a
vector. The corresponding decoder has the same table and uses the vector to generate
an approximation of the original data stream. For further details see [Gra84], for example.
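
A minimal sketch of this table lookup is given below. The squared-error distance
stands in for the "fixed criterion", which the text does not specify, and the
function names and the tiny codebook in the usage example are hypothetical.

```python
def block_distance(block: bytes, pattern: bytes) -> int:
    """Squared error between a block and a table pattern (assumed criterion)."""
    return sum((a - b) ** 2 for a, b in zip(block, pattern))


def vq_encode(data: bytes, codebook: list[bytes]) -> list[int]:
    """Map each n-byte block to the index of its most similar pattern."""
    n = len(codebook[0])
    if len(data) % n != 0:
        raise ValueError("data length must be a multiple of the block size")
    indices = []
    for start in range(0, len(data), n):
        block = data[start:start + n]
        best = min(range(len(codebook)),
                   key=lambda k: block_distance(block, codebook[k]))
        indices.append(best)
    return indices


def vq_decode(indices: list[int], codebook: list[bytes]) -> bytes:
    """Rebuild an approximation of the stream from the same table."""
    return b"".join(codebook[k] for k in indices)


# Usage: blocks of n = 2 bytes and a three-entry table.
codebook = [bytes([0, 0]), bytes([128, 128]), bytes([255, 255])]
data = bytes([3, 1, 120, 140, 250, 251, 10, 5])
indices = vq_encode(data, codebook)
approximation = vq_decode(indices, codebook)
```

The decoder output differs from the input wherever a block is not itself a table
entry, which is why the method is lossy.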
7.4.4 Pattern Substitution
A technique that can be used for text compression substitutes single bytes for
patterns that occur frequently. This pattern substitution can be used to code, for
example, the terminal symbols of high-level languages (begin, end, if). By using an
M-byte, a larger number of words can be encoded: the M-byte indicates that the next
byte is an index representing one of 256 words. The same technique can be applied to
still images, video, and audio. In these media, it is not easy to identify small sets of
frequently occurring patterns. It is thus better to perform an approximation that looks
for the most similar (instead of the same) pattern. This is the vector quantization
described above.
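
The M-byte variant for text can be sketched as follows. The marker value 0xFF, the
three-entry word table, and the function names are assumptions made for
illustration; a real table could hold up to 256 words.

```python
MARKER = 0xFF                          # assumed M-byte; must not occur in the text
WORDS = [b"begin", b"end", b"if"]      # hypothetical word table (at most 256 entries)


def substitute_patterns(text: bytes, words: list[bytes] = WORDS) -> bytes:
    """Replace each table word by the M-byte followed by its one-byte index."""
    if MARKER in text:
        raise ValueError("marker byte occurs in the text")
    # Try longer words first so a short word cannot shadow a longer one.
    order = sorted(range(len(words)), key=lambda k: -len(words[k]))
    out = bytearray()
    i = 0
    while i < len(text):
        for k in order:
            if text.startswith(words[k], i):
                out += bytes([MARKER, k])
                i += len(words[k])
                break
        else:
            out.append(text[i])        # no table word starts here
            i += 1
    return bytes(out)


def expand_patterns(packed: bytes, words: list[bytes] = WORDS) -> bytes:
    """Inverse of substitute_patterns."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == MARKER:
            out += words[packed[i + 1]]
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)
```

This sketch matches words anywhere in the text, including inside longer
identifiers; decoding stays exact either way. Words of two bytes or fewer bring no
saving, since each substitution itself costs two bytes.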
7.4.5 Diatomic Encoding
Diatomic encoding is a variation based on combinations of two data bytes. This
technique determines the most frequently occurring pairs of bytes. Studies have shown
that the eight most frequently occurring pairs in the English language are "E ", "T ",
"TH", " A", [...], and "HE". Replacing these pairs by special single bytes that
do not occur in the text leads to a data reduction of more than ten
percent.
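
One way to sketch this technique is shown below. Instead of a fixed list of English
pairs, the sketch counts the pairs in the given input and assigns them byte values
that the input does not use; the function names and the limit of eight pairs are
assumptions chosen for illustration.

```python
from collections import Counter


def build_pair_table(data: bytes, max_pairs: int = 8) -> dict[bytes, int]:
    """Assign unused single-byte codes to the most frequent byte pairs."""
    present = set(data)
    unused = [b for b in range(256) if b not in present]
    counts = Counter(data[i:i + 2] for i in range(len(data) - 1))
    # zip stops early if fewer unused byte values than pairs are available.
    return {pair: code
            for (pair, _), code in zip(counts.most_common(max_pairs), unused)}


def diatomic_encode(data: bytes, table: dict[bytes, int]) -> bytes:
    """Scan left to right, replacing each table pair by its single-byte code."""
    out = bytearray()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def diatomic_decode(packed: bytes, table: dict[bytes, int]) -> bytes:
    """Expand each special byte back into its pair."""
    reverse = {code: pair for pair, code in table.items()}
    out = bytearray()
    for b in packed:
        out += reverse.get(b, bytes([b]))
    return bytes(out)
```

The decoder needs the same pair table as the encoder, so the table must either be
fixed in advance or transmitted along with the data.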
7.4.6 Statistical Coding
There is no fundamental reason that different characters need to be coded with a
fixed number of bits. Morse code is based on this: frequently occurring characters are
coded with short strings, while seldom-occurring characters are coded with longer
strings. Such statistical coding depends on how frequently individual characters or