3.12 LZRW4
LZRW4 is a variant of LZ77, based on ideas of Ross Williams about possible ways to
combine a dictionary method with prediction (Section 3.30). LZRW4 also borrows some
ideas from LZRW1. It uses a 1 Mbyte buffer where both the search and look-ahead
buffers slide from left to right. At any point in the encoding process, the order-2 context
of the current symbol (the two most recent symbols in the search buffer) is used to
predict the current symbol. The two symbols constituting the context are hashed to an
index I into an array A of partitions.
Each partition contains 32 pointers to the input data in the 1 Mbyte buffer (each pointer
is thus 20 bits long, since 2^20 bytes = 1 Mbyte).
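The mapping from an order-2 context to a partition can be sketched as follows. This is only an illustration: the hash function, its multiplier, and the table size HASH_BITS = 12 are assumptions made for the example and are not specified in the text above; only the 32 pointers per partition and the 20-bit pointer width come from the description.

```python
# Illustrative sketch only. HASH_BITS and the hash function are assumptions;
# PARTITION_SIZE and BUFFER_BITS follow the description in the text.
HASH_BITS = 12            # assumed number of bits in the index I
PARTITION_SIZE = 32       # pointers per partition (from the text)
BUFFER_BITS = 20          # 1 Mbyte buffer, so a pointer needs 20 bits


def context_hash(a: int, b: int) -> int:
    """Hash the two most recent bytes (a, b) to a partition index I.

    A simple multiplicative hash, chosen here for illustration.
    """
    context = (a << 8) | b                      # 16-bit order-2 context
    return ((context * 40543) >> 4) & ((1 << HASH_BITS) - 1)


# The array A of partitions; every pointer starts at position 0.
A = [[0] * PARTITION_SIZE for _ in range(1 << HASH_BITS)]


def partition_for(a: int, b: int) -> list:
    """Return the partition A[I] selected by the context (a, b)."""
    return A[context_hash(a, b)]
```

Any hash that spreads the 65,536 possible two-byte contexts evenly over the table would serve the same purpose; the point is that the context alone selects which 32 candidate positions are examined.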
The 32 pointers in partition A[I] are checked to find the longest match between
the look-ahead buffer and the input data seen so far. The longest match is selected and
is coded in 8 bits. The first 3 bits code the match length according to Table 3.17; the
remaining 5 bits identify the pointer in the partition. Such an 8-bit number is called a
copy item. If no match is found, a literal is encoded in 8 bits. For each item, an extra
bit is prepared, a 0 for a literal and a 1 for a copy item. The extra bits are accumulated
in groups of 16, and each group is output, as in LZRW1, preceding the 16 items it refers
to.
code:   000 001 010 011 100 101 110 111
length:   2   3   4   5   6   7   8  16
Table 3.17: The 3-bit match-length codes.
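The item format described above can be sketched in a few lines. The function names are hypothetical, and several details are assumptions made for the example: that the 3-bit codes 000 through 111 are assigned to the lengths in the order listed, that a match whose length is not in the table is truncated to the largest codable length, and that the 16 flag bits are packed into a little-endian 2-byte word with the first item in the lowest bit.

```python
# Illustrative sketch only; bit and byte ordering are assumptions.
LENGTHS = (2, 3, 4, 5, 6, 7, 8, 16)   # codable match lengths, in table order


def codable_length(match_len: int) -> int:
    """Largest codable length not exceeding match_len (0 if none fits)."""
    best = 0
    for n in LENGTHS:
        if n <= match_len:
            best = n
    return best


def pack_copy_item(length: int, slot: int) -> int:
    """Build an 8-bit copy item: 3 bits of length code, 5 bits of pointer slot."""
    code = LENGTHS.index(length)       # assumed: codes follow table order
    return (code << 5) | (slot & 0x1F)


def unpack_copy_item(item: int) -> tuple:
    """Recover (length, slot) from an 8-bit copy item."""
    return LENGTHS[item >> 5], item & 0x1F


def emit_group(items: list) -> bytes:
    """Emit up to 16 (is_copy, byte) items preceded by their flag bits.

    Flag bit i is 1 for a copy item and 0 for a literal.
    """
    control = 0
    for i, (is_copy, _) in enumerate(items):
        if is_copy:
            control |= 1 << i
    return control.to_bytes(2, "little") + bytes(b for _, b in items)
```

Under these assumptions a match of 10 symbols would be coded as a length-8 copy item, leaving the remaining two symbols for the next item.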
The partitions are updated all the time by moving "good" pointers toward the start
of their partition. When a match is found, the encoder swaps the selected pointer with
the pointer halfway toward the start of the partition (Figure 3.18a,b). If no match is found, the
entire 32-pointer partition is shifted to the left and the new pointer is entered on the
right.
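The partition update can be sketched as below. This is a reading of the rule rather than a definitive implementation: the function names are hypothetical, the start of a partition is taken to be index 0, "halfway toward the start" is taken to mean index slot // 2, and on a shift the leftmost pointer is assumed to be the one discarded while the new pointer enters at the right end.

```python
# Illustrative sketch only; index conventions are assumptions.

def promote(partition: list, slot: int) -> None:
    """After a match found through partition[slot], swap that pointer with
    the one halfway between it and the start of the partition."""
    target = slot // 2                 # assumed meaning of "halfway"
    partition[slot], partition[target] = partition[target], partition[slot]


def insert_new(partition: list, pointer: int) -> None:
    """No match found: shift every pointer one place to the left and
    enter the new pointer at the right end."""
    del partition[0]                   # assumed: leftmost pointer is dropped
    partition.append(pointer)
```

With this scheme a pointer that keeps producing matches moves from slot 20 to 10, then 5, 2, 1 and 0 on successive hits, while the partition length stays fixed at 32.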
The Red Queen shook her head, "You may call it 'nonsense' if you like," she said,
"but I've heard nonsense, compared with which that would be as sensible as a
dictionary!"