
Block-based Segmentation and Adaptive Coding for Visually Lossless Compression of Scanned Documents


Xin Li and Shawmin Lei
Digital Video Department, Sharp Labs of America
5750 NW Pacific Rim Blvd., Camas, WA 98607, USA

Abstract

This paper presents a novel block-based segmentation and adaptive coding (BSAC) algorithm for visually lossless compression of scanned documents that contain not only photographic images but also text and graphics. For such a compound image source, we structure the image into non-overlapping blocks and classify each block into four different classes based on the empirical statistics within the block. Different coding strategies are applied to the different classes in order to achieve the best compression performance. Our new block-based image coder is able to provide visually lossless compression of scanned documents at a bit rate of around 1-1.5bpp with modest computational complexity and a very low memory requirement.

1. Introduction

With the fast development of computer technologies, the storage and distribution of documents have started to experience the transition from conventional paper formats to everlasting digital formats. According to the generation process, digital documents can be classified into either computer-generated or scanned documents. Though both of them can be modeled by a compound image source, their noise characteristics vary and lead to different compression goals. Unlike computer-generated documents, scanned documents typically contain visually noticeable noise. Therefore visually lossless compression becomes a more reasonable goal than lossless compression for scanned documents.

The most popular algorithm for lossy compression of scanned documents so far is probably the Djvu coder [1] developed by AT&T researchers. It is based on the so-called "Mixed Raster Content" (MRC) model [5], which decomposes a compound image into background, foreground and mask layers. In Djvu, the mask layer is compressed by the JBIG2 coder at the original spatial resolution, and the background/foreground layers are compressed by a wavelet-based image coder (IW44) at a reduced resolution. Though Djvu achieves high compression ratios, the quality of the decoded images is merely satisfactory for web-browsing purposes. Many important applications, such as intelligent document management systems, require the stored document to be visually lossless with respect to the original. Unfortunately, Djvu does not meet this requirement even with the parameter setting that attempts to best preserve the image quality. Another disadvantage of Djvu is that it does not fully support resolution scalability, because the mask layer is always compressed at the full resolution while the foreground/background layers are compressed at a different resolution. Though documents are often scanned at a high resolution (e.g. 600dpi), users might want to print them out at a reduced resolution (e.g. 300dpi).

In spite of the popularity of the MRC-based model for representing compound images, it has the following fundamental limitations. First, the MRC-based model rests on the idealized assumption that text/graphics can be accurately separated from images by segmentation. However, no perfect segmentation algorithm exists that can fulfill such a requirement. Second, layer-based approaches have to address the coding of the partially masked foreground/background in order to avoid potential information leakage. The ultimate solution to this challenging problem is not known yet. Intuitively, the masked data should not cost any bits, because they do not help resolve the uncertainty about the image at the decoder. However, existing approaches such as data filling [2,6] and successive projection [7] can only provide sub-optimal solutions. Third, and most importantly, the rate-distortion optimization problem becomes almost intractable within the framework of the MRC-based model when the segmentation and coding strategies we deal with are both sub-optimal. This difficulty contributes to the failure of Djvu to preserve visual quality even at a relatively high bit rate.

Besides layer-based segmentation, simplified approaches such as block-based segmentation [3,4] have also been proposed recently. It is well known that block-based coders suffer from blocking artifacts when compressing photographic images at low bit rates. However, scanned documents often contain a large portion of text and graphics, which reduces the risk of generating blocking artifacts. Meanwhile, since scanned documents typically have fairly high spatial resolution, block-based segmentation offers an effective way of singling out text and graphics regions, and the overhead needed to inform the decoder about the block-based segmentation results is negligible with a reasonable choice of block size. Moreover, when the image is structured into blocks instead of layers, most existing coding algorithms with desirable features (e.g. resolution scalability, rate-distortion optimization) can be easily applied. Due to these advantages, we advocate block-based segmentation.

In this paper, we propose a simple but effective block-based segmentation scheme based on histogram analysis of local data. Depending on the number of distinct modes, we classify each block into four different types: smooth block (only one mode), text block (exactly two modes), graphics block (no more than four modes) and image block (more than four modes). In contrast, previous approaches such as multi-scale Bayesian segmentation [4] and color clustering [1] require knowledge of the whole image and are therefore much more computationally complicated. For example, in the Djvu coder, the segmentation stage often takes more time than the following compression stage. Meanwhile, for scanned color documents at a resolution of 600dpi, the memory requirement for storing the whole image in the buffer is prohibitive (nearly 100M bytes). Moreover, recognizing that the goal of segmentation is to facilitate the task of compression, we believe that the effect of sub-optimal block-based segmentation is less significant than that of sub-optimal layer-based segmentation, especially if we consider a prioritized segmentation and compression approach.

The principal motivation for prioritizing the block types is to design coding algorithms that can gracefully handle the potential penalty incurred by a wrong decision made at the segmentation stage. In order from high priority to low priority, we have the smooth block (one-color), text block (two-color), graphics block (four-color) and image block (wavelet-based). For example, if a block with low priority (e.g. graphics) is coded by a two-color coding algorithm, some colors might get lost; while if a block with high priority (e.g. text) is coded by a four-color coding algorithm, we can still achieve nearly optimal compression. Based on the above observations, we decide the block type through a sequential mode test in the order of decreasing priority. The prioritized segmentation and compression strategies offer better control of the tradeoff between rate and distortion. Experiment results show that we can achieve visually lossless compression of scanned documents at a bit rate of around 1-1.5bpp, though no explicit rate-distortion optimization is performed yet. For more complicated rate-distortion optimized segmentation, we refer the reader to [4].

2. Block-based Segmentation

The segmentation of a compound image is often a computationally demanding problem. In order to keep the computational complexity of segmentation low, we propose a simple but effective segmentation strategy based on the histogram of the block. Figure 1 shows examples of typical histogram distributions of the four different types of blocks. The varying characteristics of the histogram from one type to another make it a natural choice for block-based segmentation.

[Figure 1: four histogram plots.]
Figure 1. The histogram of four different types of blocks: (a) smooth block, (b) text block, (c) graphics block, (d) image block ("o" denotes the location of a mode).

An intensity value is defined to be a mode if its frequency satisfies two conditions: it is a local maximum and the cumulative probability around it is above a pre-selected threshold th. For the reason of prioritization, we propose the following segmentation scheme based on a series of decision rules, from the block type with the highest priority to the block type with the lowest priority. The decision for smooth and text blocks is relatively straightforward: the histogram of a smooth or text block is typically dominated by one or two intensity values (modes). The distinction between graphics and image blocks is often trickier. Since the maximum size of the color palette in our color indexing strategy is four, a block is declared to be a graphics block only if the number of modes is no larger than four. When the block size is reasonably small, the likelihood that there are more than four different colors within a graphics block is small. A detailed step-by-step implementation of the proposed segmentation scheme is given in Fig. 2.

Empirical statistics collection:
- calculate the probability of intensity value i (i = 0, 1, ..., 255): p_i = freq(i)/B, where B is the block size;
- find the modes m_1, ..., m_N and compute the cumulative probability around each mode m_k: c_k = sum of p_i over |i - m_k| <= Δ.

Sequential decision:
- if N=1 and c_1 > Th, declare a smooth block;
- else if N=2 and c_1 + c_2 > Th and |m_1 - m_2| > T, declare a text block;
- else if N<=4 and c_1 + c_2 + c_3 + c_4 > Th, declare a graphics block;
- else declare an image block.

Figure 2. Step-by-step implementation of block-based segmentation.

It can be estimated from Fig. 2 that the average computations required for each block are quite modest. Experiments have confirmed that the complexity of the proposed block-based segmentation is at least one order of magnitude lower than that of the color-clustering based segmentation used by Djvu. Though information contained in the neighboring blocks could help the segmentation, we choose to rely only on the data within the block, due to memory considerations. For scanned documents, which typically have fairly high resolution, it is often impractical to store the whole image in the buffer. Meanwhile, since the goal of segmentation is to facilitate the task of compression, we believe that the coding efficiency loss due to sub-optimal segmentation results is small, especially after we consider the prioritization of the block types. For example, we allow a text block to be treated as a graphics block, because the coding strategy we design for graphics blocks is also suitable for text blocks, which have a higher priority.

As an example, Figure 3 shows the original test image JOSA used in our experiment and its segmentation results. Intensity values from dark to bright (0, 80, 160, 240) denote smooth block, text block, graphics block and image block, respectively. Empirical studies show that a block size of 16 achieves a reasonable tradeoff between complexity and performance. The noise threshold Δ is chosen to be 8 for typical scanned documents. The segmentation results are explicitly transmitted to the decoder as overhead; therefore the segmentation is performed only at the encoder. Such an asymmetric allocation of computational resources is desirable in many applications such as online document browsing and printing.
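To make the decision rules of Fig. 2 concrete, the following Python sketch implements the classifier for a single 8-bit grayscale block. This is an illustration, not the authors' implementation: the mode-search details, the tie-breaking, and the reading of the text-block separation test as a distance |m_1 - m_2| between mode locations (consistent with T=128 on a 0-255 intensity scale) are our assumptions.

```python
import numpy as np

def classify_block(block, delta=8, th=0.05, Th=0.95, T=128):
    """Classify an 8-bit grayscale block as 'smooth', 'text', 'graphics'
    or 'image' following the decision rules of Fig. 2."""
    pixels = block.ravel()
    p = np.bincount(pixels, minlength=256) / pixels.size  # p_i = freq(i) / #pixels

    # Empirical statistics collection: a mode is a local maximum of the
    # histogram whose cumulative probability within +/-delta exceeds th.
    modes, cum = [], []
    for i in range(256):
        if p[i] == 0:
            continue
        if (i > 0 and p[i] < p[i - 1]) or (i < 255 and p[i] < p[i + 1]):
            continue                                      # not a local maximum
        c = p[max(i - delta, 0):i + delta + 1].sum()
        if c > th:
            modes.append(i)
            cum.append(c)

    # Sort the modes by decreasing cumulative probability.
    order = sorted(range(len(modes)), key=lambda k: cum[k], reverse=True)
    modes = [modes[k] for k in order]
    cum = [cum[k] for k in order]
    N = len(modes)

    # Sequential decision, from the highest-priority type to the lowest.
    if N == 1 and cum[0] > Th:
        return 'smooth'
    if N == 2 and cum[0] + cum[1] > Th and abs(modes[0] - modes[1]) > T:
        return 'text'
    if N <= 4 and sum(cum[:4]) > Th:
        return 'graphics'
    return 'image'
```

Running this over all 16x16 blocks of a page and entropy-coding the resulting type map is all the encoder-side segmentation amounts to.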

[Figure 3: the original JOSA page and its block-based segmentation map.]
Figure 3. Original JOSA image (left) and block-based segmentation result (right; B=16, Δ=8, T=128, th=0.05, Th=0.95).

3. Adaptive Coding

After the segmentation, the block type is coded by a context-based adaptive arithmetic coder. The context includes the nearest four causal neighbors, which gives 4^4 = 256 contexts in total. The overhead is about 0.002-0.02bpp for block sizes of 8-32 (a block type costs at most 2 bits, i.e. 2/B^2 bpp, before arithmetic-coding gains). Depending on the block type, we have designed prioritized coding algorithms to adaptively compress each block. The coding of the smooth block is straightforward: we simply feed the average intensity value into an adaptive arithmetic coder. Next, we detail the coding of the text block, the graphics block and the image block.
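For illustration, the context index for the block-type coder can be formed by packing the four causal neighbors into a base-4 number. This is a sketch: the neighbor layout and packing order are our assumptions, and out-of-image neighbors are arbitrarily mapped to type 0.

```python
def block_type_context(types, r, c):
    """Pack the four causal neighbors (west, north-west, north, north-east)
    of block (r, c) into a single index in [0, 4**4).  Each neighbor type
    is one of 0..3 (smooth, text, graphics, image)."""
    def t(rr, cc):
        if 0 <= rr < len(types) and 0 <= cc < len(types[0]):
            return types[rr][cc]
        return 0                                  # out-of-image default
    neighbors = (t(r, c - 1), t(r - 1, c - 1), t(r - 1, c), t(r - 1, c + 1))
    idx = 0
    for n in neighbors:
        idx = idx * 4 + n                         # base-4 packing: 256 contexts
    return idx
```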
1) Text block:

The block is first quantized to a binary map, and then both the modes (min, max) and the binary map are transmitted to the decoder. The (min, max) values are compressed in a similar fashion to the average value of the smooth block. The binary map is compressed by a 6th-order (2^6 = 64 contexts) context-based adaptive binary arithmetic coder (e.g. the QM coder). Resolution scalability for the binary image is provided by the resolution reduction method specified in progressive JBIG [8]. Instead of taking the average of each 2x2 block, we obtain a low-resolution image by calculating the following quantity for each X (a capital letter denotes a low-resolution pixel):

Q = 4x + 2(a+b+c+d) + (e+f+g+h) - 3(B+C) - A

If Q > 4.5, then X=1; otherwise X=0.

[Figure: the context used to determine the value of a low-resolution pixel (lowercase letters denote high-resolution pixels, capital letters low-resolution pixels).]
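The following sketch applies this reduction rule to a binary map. The positions assigned to the high-resolution pixels x, a-h and the causal low-resolution pixels A-C follow the usual progressive-JBIG reduction template and are our assumption; only the weighted sum and the 4.5 threshold come from the text.

```python
import numpy as np

def reduce_resolution(hi):
    """Compute a low-resolution binary image from a binary (0/1) image `hi`
    using the weighted sum Q.  Pixel positions are our assumption, modeled
    on the progressive-JBIG reduction template."""
    H, W = hi.shape
    lo = np.zeros((H // 2, W // 2), dtype=hi.dtype)

    def h(r, c):                     # high-res pixel with border replication
        return hi[min(max(r, 0), H - 1), min(max(c, 0), W - 1)]

    def l(r, c):                     # causal low-res pixel, 0 outside the image
        return lo[r, c] if (0 <= r < H // 2 and 0 <= c < W // 2) else 0

    for R in range(H // 2):
        for C in range(W // 2):
            r, c = 2 * R, 2 * C
            x = h(r, c)                                   # top-left of the 2x2 cell
            a, b, cc, d = h(r, c + 1), h(r + 1, c), h(r + 1, c + 1), h(r, c - 1)
            e, f, g, hh = h(r - 1, c - 1), h(r - 1, c), h(r - 1, c + 1), h(r + 1, c - 1)
            A, B_, C_ = l(R - 1, C - 1), l(R - 1, C), l(R, C - 1)
            Q = 4 * x + 2 * (a + b + cc + d) + (e + f + g + hh) - 3 * (B_ + C_) - A
            lo[R, C] = 1 if Q > 4.5 else 0
    return lo
```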
2) Graphics block:

The coding of a graphics block is similar to the coding of a palette-based image. Since the segmentation has put a limit on the number of modes in a graphics block, we only deal with the case of a small palette (M=4) here. In previous work [9], a color re-indexing algorithm was proposed to find the optimal palette for a given image. However, we find that when the palette size is small, we can simply assign index 0 to the most probable color, index 1 to the second most probable color, and so on. Such an ad-hoc indexing scheme often achieves nearly optimal compression performance without the need to exhaustively search all 4! = 24 possible palettes. Meanwhile, since the empirical probability distribution is already available from the segmentation stage, histogram-based color indexing does not require any additional computation. The palette is compressed in a similar way to the (min, max) values of the text block, and the indexed image is compressed by a 4th-order (4^4 = 256 contexts) context-based adaptive arithmetic coder.
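Since the mode probabilities are already computed during segmentation, the histogram-based indexing amounts to a single sort. A minimal sketch:

```python
def index_palette(modes, cum):
    """Order the palette by decreasing empirical probability: index 0 goes
    to the most probable color, index 1 to the next, and so on."""
    order = sorted(range(len(modes)), key=lambda k: cum[k], reverse=True)
    palette = [modes[k] for k in order]                # palette[index] = color
    color_to_index = {color: i for i, color in enumerate(palette)}
    return palette, color_to_index
```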
Resolution scalability is another tricky issue for graphics block coding. Classical linear transforms such as wavelet transforms do not preserve the level sets and thus often increase the entropy of palette-based images [10]. In this work, we propose a level-set preserving multi-resolution decomposition for the palette image based on the lifting scheme. For simplicity, we obtain the low-resolution image directly by down-sampling the high-resolution image, i.e. X(i,j) = x(2i,2j). Then the interlaced lattice x(2i+1,2j+1) is adaptively predicted from the low-resolution lattice x(2i,2j):

g_1 = |x(2i,2j) - x(2i+2,2j+2)|
g_2 = |x(2i+2,2j) - x(2i,2j+2)|

if g_1 < g_2, x̂(2i+1,2j+1) = [x(2i,2j) + x(2i+2,2j+2)]/2;
otherwise, x̂(2i+1,2j+1) = [x(2i+2,2j) + x(2i,2j+2)]/2.

The prediction of the interlaced lattice x(i,j) (i+j odd) from x(i,j) (i+j even) can be performed in a similar fashion. The prediction residue is generated by e = x - x̂ (mod M) and its reversibility is guaranteed by x = e + x̂ (mod M).
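A sketch of one step of this adaptive prediction on the diagonal lattice, assuming an M=4 palette-index image; the integer average and the boundary handling are our simplifications, and the entropy coding of the residues is omitted.

```python
import numpy as np

def predict_diagonal(x, M=4):
    """Level-set preserving prediction of the interlaced lattice
    x(2i+1,2j+1) from the down-sampled lattice x(2i,2j), with mod-M
    residues so the step is exactly invertible on palette indices.
    Boundary pixels are left unpredicted in this sketch."""
    H, W = x.shape
    e = np.zeros_like(x)
    for i in range(0, H - 2, 2):
        for j in range(0, W - 2, 2):
            g1 = abs(int(x[i, j]) - int(x[i + 2, j + 2]))    # main diagonal
            g2 = abs(int(x[i + 2, j]) - int(x[i, j + 2]))    # anti-diagonal
            if g1 < g2:                  # predict along the smoother diagonal
                pred = (int(x[i, j]) + int(x[i + 2, j + 2])) // 2
            else:
                pred = (int(x[i + 2, j]) + int(x[i, j + 2])) // 2
            # residue; the decoder inverts it as x = (e + pred) mod M
            e[i + 1, j + 1] = (int(x[i + 1, j + 1]) - pred) % M
    return e
```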

3) Image block:

Wavelet-based coding algorithms have been proven effective for compressing photographic images. We use the normalized S+P transform [8] for its computational efficiency. To exploit the nonstationarity of the high bands, the wavelet coefficients are coded in two stages: the positions of the significant coefficients are first compressed by a context-based adaptive binary arithmetic coder, and the signs/magnitudes of the significant coefficients are fed into another arithmetic coder. Resolution scalability is guaranteed by the wavelet transform.

4. Experiment Results

We use the scanned document JOSA (768x928, Fig. 3) in our experiment. The Block-based Segmentation and Adaptive Coder (BSAC) is compared with the popular Djvu coder. Executables of the Djvu encoder/decoder can be downloaded from http://www.lizardtech.com/.

Fig. 4 shows the comparison of different portions of the decoded JOSA image at a bit rate of around 1.0bpp. The parameter setting for the Djvu coder is: resolution of 300dpi for all three layers and quality factor of 88 for the background and foreground layers. In the BSAC coder, we use the same parameters at the segmentation stage as shown in Fig. 3 and a quantization step size (q) of 16 in the coding of the image block. Though resolution scalability is supported by BSAC, we only show the coding results at full resolution in order to compare them with Djvu under a similar parameter setting. Significant improvement in visual quality has been achieved by BSAC over Djvu. Meanwhile, the overall computational complexity of the BSAC coder is several times less than that of Djvu. For example, it takes around 1 second for the JPEG2000 or BSAC coder to compress the JOSA image, while Djvu spends over 10 seconds with the above parameter setting on the same Pentium-III 866MHz machine.

[Figure 4: text, graphics and image regions of the decoded JOSA images, Djvu (left) vs. BSAC (right).]
Figure 4. Comparison of the text (top), graphics (middle) and image (bottom) regions in the decoded JOSA images between the DjVu coder (96,352 bytes) and the BSAC coder (85,806 bytes).

As we mentioned in the introduction, the Djvu coder fails to offer visually lossless compression even with the most conservative parameter settings, due to the fundamental limitations of the MRC-based model. Fig. 5 shows a portion of the JOSA image decoded by Djvu at full resolution for all layers and with the highest quality factor (100). It can be observed that some texts are still smeared out by the Djvu coder at a bit rate of nearly 2.0bpp. Though the coding results of JPEG-2000 are not included here, we note that JPEG-2000 needs at least 1.5bpp to fully preserve all the fine structures (including those coming from the other side of the sheet during the scanning process). In contrast, through careful inspection of the image decoded by BSAC, we find that all the fine details are well preserved at a rate of around 1.3bpp (q=8; the 117,017 bytes of Fig. 5 correspond to 117,017 x 8 / (768 x 928) ≈ 1.31bpp). More extensive experiments demonstrate that BSAC is able to achieve visually lossless compression for typical scanned documents at a bit rate of 1.0-1.5bpp.

[Figure 5: portions of the decoded JOSA images: original (left), Djvu (middle) and BSAC (right).]
Figure 5. Comparison of portions of decoded JOSA images among the original, DjVu (174,426 bytes) and BSAC (117,017 bytes).

5. References
[1] L. Bottou et al., "High quality document image compression using DjVu", Journal of Electronic Imaging, Vol. 7, pp. 410-425, July 1998.
[2] R.L. de Queiroz et al., "Optimizing block-thresholding segmentation for multilayer compression of compound images", IEEE Trans. on Image Processing, Vol. 9, pp. 1461-1471, Sep. 2000.
[3] A. Said and A. Drukarev, "Simplified segmentation for compound image compression", Proceedings of ICIP'1999.
[4] D. Cheng and C. Bouman, "Document compression using rate-distortion optimized segmentation", to appear in the Journal of Electronic Imaging.
[5] ITU-T Study Group 8, Question 5, "Mixed Raster Content", Draft Recommendation T.44, May 1997.
[6] R.L. de Queiroz, "On data filling algorithms for MRC layers", Proceedings of ICIP'2000, Vancouver, Sep. 2000.
[7] L. Bottou and S. Pigeon, "Lossy compression of partially masked still images", Proceedings of DCC'98, Snowbird, March 1998.
[8] JBIG, "Progressive bi-level image compression", International Standard, ISO/IEC 11544, 1993.
[9] Z. Weng et al., "An efficient color re-indexing scheme for palette-based compression", Proceedings of ICIP'2000.
[10] X. Li and S. Lei, "On the study of lossless compression of computer generated compound images", Proceedings of ICIP'2001.

