0-7803-6725-1/01/$10.00 © 2001 IEEE
facilitate the task of compression, we believe that the effect of sub-optimal block-based segmentation is less significant than that of sub-optimal layer-based segmentation, especially if we consider a prioritized segmentation and compression approach.

The principal motivation for prioritizing the block types is to design coding algorithms that can gracefully handle the potential penalty of a wrong decision made at the segmentation stage. In order from high priority to low priority, we have the smooth block (one-color), the text block (two-color), the graphics block (four-color) and the image block (wavelet-based). For example, if a block with low priority (e.g. graphics) is coded by a two-color coding algorithm, some colors might get lost; while if a block with high priority (e.g. text) is coded by a four-color coding algorithm, we can still achieve nearly optimal compression. Based on the above observations, we decide the block type through a sequential mode test in the order of decreasing priority. The prioritized segmentation and compression strategies offer better control of the tradeoff between rate and distortion. Experimental results show that we can achieve visually lossless compression of scanned documents at a bit rate of around 1-1.5 bpp, though no explicit rate-distortion optimization is performed yet. For more sophisticated rate-distortion optimized segmentation, the reader is referred to [4].

2. Block-based Segmentation
The segmentation of a compound image is often a computation-demanding problem. In order to keep the computational complexity of segmentation low, we propose a simple but effective segmentation strategy based on the histogram of the block. Figure 1 shows examples of typical histogram distributions of the four different types of block. The varying characteristics of the histogram from one type to another make it a natural choice for block-based segmentation.

[Figure 1: (a) smooth block (b) text block]

For the reason of prioritization, we propose the following segmentation scheme based on a series of decision rules, from the block type with the highest priority to the block type with the lowest priority. The decision for smooth and text blocks is relatively straightforward: the histogram of a smooth or text block is typically dominated by one or two intensity values (modes). The distinction between graphics and image blocks is often tricky. Since the maximum size of the color palette in our color indexing strategy is four, a block is decided to be a graphics block only if the number of modes is no larger than four. When the block size is reasonably small, the likelihood that there are more than four different colors within a graphics block is small. Overall, we provide a detailed step-by-step implementation of the proposed segmentation scheme in Fig. 2.

empirical statistics collection:
- calculate the probability of each intensity value i (i = 0, 1, ..., 255): p_i = freq(i)/B, where B is the block size;
- find the modes m_1, ..., m_N and compute the cumulative probability c_n around each mode m_n: c_n = sum of p_i over the intensities i in a neighborhood of m_n.

sequential decision:
- if N = 1 and c_1 > Th, declare a smooth block;
- else if N = 2 and c_1 + c_2 > Th and |c_1 - c_2| > T, declare a text block;
- else if N <= 4 and c_1 + c_2 + c_3 + c_4 > Th, declare a graphics block;
- else declare an image block.

Figure 2. Step-by-step implementation of block-based segmentation.

It can be estimated from Fig. 2 that the average computations required for each block are quite modest. Experiments have confirmed that the complexity of the proposed block-based segmentation is at least one order of magnitude lower than that of the color-clustering based segmentation used by DjVu. Though the information contained in neighboring blocks could help the segmentation, we choose to rely only on the data within the block, due to memory considerations. For scanned documents, which typically have fairly high resolution, it is often impractical to store the whole image in the buffer. Meanwhile, since the goal of segmentation is to facilitate the task of compression, we believe that the coding efficiency loss due to sub-optimal segmentation results is small, especially after we consider the prioritization of the block types.
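The statistics collection and sequential decision of Fig. 2 can be sketched in a few lines of code. This is only an illustrative sketch: the mode-neighborhood width is assumed to correspond to the parameter A of the Fig. 3 caption, th is assumed to be the minimum probability for a value to count as a mode, and the scale and role of the gap threshold T are assumptions, since the text does not fully specify them.

```python
from collections import Counter

def classify_block(pixels, nbhd=8, th=0.05, Th=0.95, T=0.5):
    """Histogram-based block classification (sketch of Fig. 2).

    pixels: flat sequence of 8-bit intensities for one block.
    nbhd:   half-width of the neighborhood around each mode (assumed
            interpretation of the parameter A in the Fig. 3 caption).
    th:     minimum probability for a value to count as a mode (assumed).
    Th, T:  decision thresholds of Fig. 2; using T as a gap between the
            two mode probabilities is an assumption.
    """
    B = len(pixels)
    freq = Counter(pixels)
    p = {i: n / B for i, n in freq.items()}          # p_i = freq(i) / B

    # Modes m_1..m_N: dominant intensity values, strongest first.
    # Keep at most 5 so that N > 4 still falls through to "image".
    modes = [i for i, _ in freq.most_common() if p[i] > th][:5]
    N = len(modes)

    # Cumulative probability c_n around each mode m_n.
    c = [sum(p[i] for i in p if abs(i - m) <= nbhd) for m in modes]

    # Sequential decision, from the highest to the lowest priority.
    if N == 1 and c[0] > Th:
        return "smooth"
    if N == 2 and c[0] + c[1] > Th and abs(c[0] - c[1]) > T:
        return "text"
    if N <= 4 and sum(c[:4]) > Th:
        return "graphics"
    return "image"
```

Note that a block whose histogram spreads over many weak values produces no modes at all, so it falls through every test and is declared an image block, matching the lowest-priority default of Fig. 2.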
Figure 3. Original JOSA image (left) and block-based segmentation result (right; B=16, A=8, T=128, th=0.05, Th=0.95).

3. Adaptive Coding
After the segmentation, the block type is coded by a context-based adaptive arithmetic coder. The context includes the nearest four causal neighbors, which gives 4^4 = 256 contexts in total. The overhead is about 0.002-0.02 bpp for block sizes of 8-32. Depending on the block type, we have designed prioritized coding algorithms to adaptively compress each block. The coding of a smooth block is straightforward: we simply plug the average intensity value into an adaptive arithmetic coder. Next, we detail the coding of text, graphics and image blocks.

1) Text block:
The block is first quantized to a binary map, and then both the modes (min, max) and the binary map are transmitted to the decoder. The (min, max) values are compressed in a similar fashion to the average value in the smooth block. The binary map is compressed by a 6th-order (2^6 = 64 contexts) context-based adaptive binary arithmetic coder (e.g. the QM coder). Resolution scalability for the binary image is provided by the resolution reduction method specified in progressive JBIG [8]. Instead of taking the average of each 2x2 block, we obtain a low-resolution image by calculating the following quantity for each X (a capital letter denotes a low-resolution pixel):

Q = 4x + 2(a + b + c + d) + (e + f + g + h) - 3(B + C) - A

If Q > 4.5, then X = 1; otherwise X = 0.

Fig. 4. The context used to determine the value of a low-resolution pixel.

2) Graphics block:
The coding of a graphics block is similar to the coding of a palette-based image. Since the segmentation has put a limitation on the number of modes in a graphics block, we only deal with the case of a small palette (M = 4) here. In previous work [9], a color re-indexing algorithm is proposed to find the optimal palette for a given image. However, we find that when the palette size is small, we can simply assign index 0 to the most probable color, index 1 to the second most probable color, and so on. Such an ad-hoc indexing scheme often achieves nearly optimal compression performance without the need to exhaustively search all 4! = 24 possible palettes. Meanwhile, since the empirical probability distribution is already available from the segmentation stage, histogram-based color indexing does not require any additional computation. The palette is compressed in a similar way to the (min, max) values in the text block, and the indexed image is compressed by a 4th-order (4^4 = 256 contexts) context-based adaptive arithmetic coder.

Resolution scalability is another tricky issue for graphics block coding. Classical linear transforms such as wavelet transforms do not preserve the level set and thus often increase the entropy of palette-based images [10]. In this work, we propose a level-set preserving multi-resolution decomposition of the palette image based on the lifting scheme. For simplicity, we obtain the low-resolution image directly by down-sampling the high-resolution image, i.e. X(i,j) = x(2i,2j). Then the interlaced lattice x(2i+1,2j+1) is adaptively predicted from the low-resolution image x(2i,2j):

g_1 = x(2i,2j) - x(2i+2,2j+2)
g_2 = x(2i+2,2j) - x(2i,2j+2)

if |g_1| < |g_2|, x̂(2i+1,2j+1) = [x(2i,2j) + x(2i+2,2j+2)]/2;
otherwise, x̂(2i+1,2j+1) = [x(2i+2,2j) + x(2i,2j+2)]/2.

The prediction of the interlaced lattice x(i,j) (i+j odd) from x(i,j) (i+j even) can be performed in a similar fashion. The prediction residue is generated by e = x - x̂ (mod M) and its reversibility is achieved by x = e + x̂ (mod M).

3) Image block:
Wavelet-based coding algorithms have been proven effective for compressing photographic images. We use the normalized S+P transform [X] for its computational efficiency. To exploit the nonstationarity of the high bands, wavelet coefficients are coded in two stages: the positions of significant coefficients are first compressed by a context-based adaptive binary arithmetic coder, and the signs/magnitudes of significant coefficients are plugged into another arithmetic coder. Resolution scalability is guaranteed by the wavelet transform.

4. Experiment Results
We use the scanned document JOSA (768x928, Fig. 3) in our experiment. The Block-based Segmentation and Adaptive Coder (BSAC) is compared with the popular DjVu coder. The executables of the DjVu encoder/decoder can be downloaded from http://www.lizardtech.com/.
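The JBIG-style resolution reduction used for text-block binary maps can be sketched as follows. The weights 4/2/1, the causal low-resolution terms -3(B+C)-A, and the threshold Q > 4.5 follow the text; however, the exact pixel positions of x, a-h and A, B, C are defined by the context figure, which is not legible here, so the layout chosen below (the pixel x with its high-resolution ring for a-h, and causal low-resolution neighbors for A, B, C) is an assumption.

```python
def reduce_binary(hi):
    """One level of JBIG-style resolution reduction of a binary map.

    hi: list of equal-length rows of 0/1 pixels (even dimensions).
    Returns the half-resolution binary map. Pixel layout is assumed,
    since the defining context figure is not reproduced here.
    """
    H, W = len(hi), len(hi[0])
    lo = [[0] * (W // 2) for _ in range(H // 2)]

    def h(r, c):                      # high-res pixel, 0 outside the map
        return hi[r][c] if 0 <= r < H and 0 <= c < W else 0

    def L(r, c):                      # already-computed low-res pixel
        return lo[r][c] if 0 <= r < H // 2 and 0 <= c < W // 2 else 0

    for i in range(H // 2):
        for j in range(W // 2):
            x = h(2 * i, 2 * j)                                   # weight 4
            a, b = h(2 * i, 2 * j + 1), h(2 * i + 1, 2 * j)       # weight 2
            c_, d = h(2 * i + 1, 2 * j + 1), h(2 * i, 2 * j - 1)
            e, f = h(2 * i - 1, 2 * j), h(2 * i - 1, 2 * j + 1)   # weight 1
            g, k = h(2 * i + 1, 2 * j - 1), h(2 * i - 1, 2 * j - 1)
            A, B, C = L(i, j - 1), L(i - 1, j), L(i - 1, j + 1)   # causal low-res
            Q = 4 * x + 2 * (a + b + c_ + d) + (e + f + g + k) - 3 * (B + C) - A
            lo[i][j] = 1 if Q > 4.5 else 0
    return lo
```

The negative terms make the rule context-dependent: a high-resolution neighborhood that only barely favors 1 is pulled back toward 0 when the already-decided low-resolution neighbors are 1, which is what preserves thin strokes better than plain 2x2 averaging.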
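The level-set preserving prediction for graphics blocks translates almost directly into code. A minimal sketch, assuming palette indices in 0..M-1 and integer halving for the diagonal average (the text does not specify how a fractional average is rounded); the function names are illustrative, not from the paper.

```python
def predict_diagonal(x, i, j):
    """Predict the interlaced sample x(2i+1, 2j+1) from the down-sampled
    lattice, averaging along the diagonal with the smaller gradient."""
    g1 = x[2 * i][2 * j] - x[2 * i + 2][2 * j + 2]      # main diagonal
    g2 = x[2 * i + 2][2 * j] - x[2 * i][2 * j + 2]      # anti-diagonal
    if abs(g1) < abs(g2):
        return (x[2 * i][2 * j] + x[2 * i + 2][2 * j + 2]) // 2
    return (x[2 * i + 2][2 * j] + x[2 * i][2 * j + 2]) // 2

def encode_residue(x_val, pred, M=4):
    return (x_val - pred) % M          # e = x - x̂ (mod M)

def decode_residue(e, pred, M=4):
    return (e + pred) % M              # x = e + x̂ (mod M)
```

Because the residue is formed modulo M, it stays in the palette's index range 0..M-1 regardless of the prediction, and the mod-M addition at the decoder recovers the sample exactly, which is the reversibility property claimed in the text.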
Fig. 4 shows the comparison of different portions of the decoded JOSA image at a bit rate of around 1.0 bpp. The parameter setting for the DjVu coder is: a resolution of 300 dpi for all three layers and a quality factor of 88 for the background and foreground layers. In the BSAC coder, we use the same parameters at the segmentation stage as shown in Fig. 3 and a quantization step size (q) of 16 in the coding of image blocks. Though resolution scalability is supported by BSAC, we only show the coding results at full resolution, to compare them with DjVu under a similar parameter setting. Significant improvement in visual quality has been achieved by BSAC over DjVu. Meanwhile, the overall computational complexity of the BSAC coder is several times less than that of DjVu. For example, it takes around 1 second for the JPEG2000 or BSAC coder to compress the JOSA image, while DjVu spends over 10 seconds with the above parameter setting on the same Pentium-III 866MHz machine.

[Figure 4 panels: DjVu (left), BSAC (right)]
Figure 4. Comparison of the text (top), graphics (middle) and image (bottom) regions in the decoded JOSA images between the DjVu coder (96,352 bytes) and the BSAC coder (85,806 bytes).

As we mentioned in the introduction, the DjVu coder fails to offer visually lossless compression even with the most conservative parameter settings, due to the fundamental limitations of the MRC-based model. Fig. 5 shows a portion of the JOSA image decoded by DjVu at full resolution for all layers and with the highest quality factor (100). It can be observed that some text is still smeared out by the DjVu coder at a bit rate of nearly 2.0 bpp. Though the coding results of JPEG-2000 are not included here, we note that JPEG-2000 needs at least 1.5 bpp to fully preserve all the fine structures (including those coming from the other side of the sheet during the scanning process). In contrast, through careful inspection of the image decoded by BSAC, we find that all the fine details are well preserved at a rate of around 1.3 bpp (q=8). More extensive experiments demonstrate that BSAC is able to achieve visually lossless compression for typical scanned documents at a bit rate of 1.0-1.5 bpp.

5. References
[1] L. Bottou et al., "High quality document image compression using DjVu", Journal of Electronic Imaging, Vol. 7, pp. 410-425, July 1998.
[2] R. L. de Queiroz et al., "Optimizing block-thresholding segmentation for multilayer compression of compound images", IEEE Trans. on Image Processing, Vol. 9, pp. 1461-1471, Sep. 2000.
[3] A. Said and A. Drukarev, "Simplified segmentation for compound image compression", Proceedings of ICIP'1999.
[4] D. Cheng and C. Bouman, "Document compression using rate-distortion optimized segmentation", to appear in the Journal of Electronic Imaging.
[5] Mixed Raster Content, ITU-T Study Group 8, Question 5, Draft Recommendation T.44, May 1997.
[6] R. L. de Queiroz, "On data filling algorithms for MRC layers", Proceedings of ICIP'2000, Vancouver, Sep. 2000.
[7] L. Bottou and S. Pigeon, "Lossy compression of partially masked still images", Proceedings of DCC'98, Snowbird, March 1998.
[8] JBIG, "Progressive bi-level image compression", International Standard, ISO/IEC 11544, 1993.
[9] Z. Weng et al., "An efficient color re-indexing scheme for palette-based compression", Proceedings of ICIP'2000.
[10] X. Li and S. Lei, "On the study of lossless compression of computer generated compound images", Proceedings of ICIP'2001.