You are on page 1of 10

Recent Developments in Context-Based Predictive Techniques for Lossless Image Compression

N ASIR M EMON 1
1 2

AND

X IAOLIN W U 2

Department of Computer Science, Northern Illinois University, DeKalb, IL, 60115, USA Department of Computer Science, The University of Western Ontario, London, Ontario, Canada N6A 5B7 Email: memon@cs.niu.edu

In this paper we describe some recent developments that have taken place in context-based predictive coding, in response to the JPEG/JBIG committee's recent call for proposals for a new international standard on lossless compression of continuous-tone images. We describe the different prediction techniques that were proposed and give a performance comparison. We describe the notion of context-based bias cancellation, which represents one of the key ideas that was proposed and incorporated in the "nal standard. We also describe the different error modelling and entropy coding techniques that were proposed for encoding prediction errors, the most important development here being an ingeniously simple and effective technique for adaptive Golomb–Rice coding. We conclude with a short discussion on avenues for future research. Received July, 1996; revised May, 1997

1.

INTRODUCTION

We have seen an increased level of activity in image and video compression in recent years; however, most of this activity has been restricted to lossy compression. Many applications, such as medical imaging, image archiving, high-precision image analysis, remote sensing, pre-press imaging, preservation of art work and historical documents, require lossless compression. Despite the importance of lossless image compression of continuous-tone images there is a paucity of standard algorithms. Current standards for lossless compression include 1. 2. 3. Lossless JPEG (Huffman and arithmetic). JBIG, Group 4 Fax. GIF, Photo CD, PNG etc.

It is generally accepted that the Huffman-coding-based JPEG lossless standard provides poor compression and a host of better techniques have been reported in the literature [1– 3]. The JPEG arithmetic coding version does provide about 10% better compression, but is not available in the public domain and hence has seen little use. JBIG and the CCITT Group 3 and 4 standards are primarily designed for bi-level data and do not provide good compression when used on greyscale images by compressing individual bit planes. GIF and PNG are essentially suitable for synthesized images and are known not to work well with natural continuous-tone images acquired through an array of sensors. Due to the perceived inadequacy of current standards for lossless image compression, the JBIG/JPEG committee of T HE C OMPUTER J OURNAL,

the International Standards Organization (ISO) approved a new work item proposal in early 1994, titled Next Generation Lossless Compression of Continuous-tone Still Pictures. A call was issued in March 1994 soliciting proposals specifying algorithms for lossless and near-lossless compression of continuous-tone (2–16 bit) still pictures. A number of requirements were imposed on submissions; for details the reader is referred to [4]. For instance, exploitation of interband correlations (in colour and satellite images for example) was prohibited. This call for proposals resulted in renewed activity focused on the development of lossless image compression techniques. A large part of this activity has focused on a speci"c type of compression technique, loosely referred to in the literature as lossless DPCM or lossless predictive coding. Since the baseline algorithm that has been standardized [5] and the proposed high-performance extension both employ a predictive coding approach, we restrict our discussion to predictive coding techniques. We describe some of the important new developments that emerged in response to the call for proposals, and have contributed signi"cantly to the advancement in the state of the art of predictive coding techniques for lossless image compression. The paper is structured as follows. In the next section we begin by giving an introduction to predictive coding techniques for lossless image compression and describe the current lossless JPEG standard. We also introduce and establish some terminology and notation that is used throughout the rest of the paper. In Section 3 we describe various predictors that were proposed. We describe three speci"c predictors, Vol. 40, No. 2/3, 1997

a subset of which is generally used for prediction and/or context determination by lossless image compression techniques. In the Huffman coding version. which is arrived at from the values of previously encoded neighbouring pixels. j]. needs to be transmitted. The arithmetically coded version uses quantized prediction errors at neighbouring pixels as contexts for Vol. The error modelling techniques employed by most lossless compression schemes proposed in the literature can be captured within the context modelling framework described in [9] and applied in [9. PREDICTIVE CODING TECHNIQUES AND THE CURRENT JPEG LOSSLESS STANDARD standard variable-length entropy coding techniques. No. Viewed in this framework. the convergence of which led to the development of the new standard. the role of the error model is essentially to provide estimates of the conditional probability of the prediction error. left to right within a row) and predict the value of the current pixel on the basis of the pixels which have already been transmitted (received). Notation used for specifying neighbouring pixels of the current pixel P[i. then it can be coded ef"ciently using any of the T HE C OMPUTER J OURNAL. This can be done by estimating the PDF by maintaining counts of symbol occurrences within each context [10] or by estimating the parameters (variance. which was one of the key ideas that contributed towards the development of the "nal standard.128 NN i WW NW W N P[i. M EMON AND X. 7]. 40. Here the transmitter (and receiver) process the image in some "xed order (say. In the remainder of the paper we use the notation speci"ed in Figure 1 to denote speci"c neighbours of the pixel P[i. Unfortunately. The current lossless JPEG standard Among various methods which have been devised for lossless compression. predictive techniques are perhaps the simplest and most ef"cient. In Figure 1 we show a template of two-dimensional neighbourhood pixels. j] 0 (no prediction) N W NW N + W − NW W + (N − NW)/2 N + (W − NW/2 (N + W )/2 j FIGURE 1. j] and its predicted value by P[i. Hence. In this approach. nor do we intend this to be a thorough treatise on lossless image compression in general. 2/3. even after applying the most sophisticated prediction techniques. We conclude in Section 7 with a discussion on avenues for further research in lossless image compression. the prediction error at each pixel is encoded with respect to a conditioning state or context.1. W U TABLE 1. 10]. The current JPEG standard uses a predictive scheme when used in its lossless mode. j]. an important discovery made during the standardization process was the surprising ef"cacy of the MED predictor despite its apparent simplicity. generally the residual image has ample structure which violates the IID assumption. 1997 . In Section 5 we outline the different error modelling techniques that were proposed for encoding prediction errors and in Section 6 we describe the speci"c entropy coding techniques employed. j]. in order to encode prediction errors ef"ciently we need a model that captures the structure that remains after prediction. which is then encoded along with the compressed image. such as Huffman coding or arithmetic coding. The latter approach requires two passes through the data. In fact. j] in the ith row and jth column. The most important contribution here came from the revised LOCO algorithm proposed by HP laboratories [6. This step is often referred to as error modelling [8]. e = P[i. Our intention is to describe the main ideas that were proposed in response to the JPEG committee's call for proposals. essentially no error model is used. If the residual image consisting of prediction errors is treated as an Independent and Identically Distributed (IID) source. If we denote the current ˆ pixel by P[i. The prediction errors are then encoded either by Huffman coding or arithmetic coding—codecs for both are provided by the standard. 2. then only ˆ the prediction error. GAP and ALCM in detail and then present a performance comparison which clearly establishes the choice of MED as the default predictor for the standard. raster order going row by row. The prediction errors are assumed to be IID and either a static default Huffman table is used or a custom Huffman code table can be speci"ed. for example) of an assumed Probability Density Function (PDF) (Laplacian. given the context in which it occurs. in the form of a very simple and effective parameter estimation technique for Golomb–Rice coding. MED. Table 1 lists the eight predictors used. In Section 4 we describe the notion of context-based bias cancellation. If the prediction is reasonably accurate then the distribution of prediction errors is concentrated near zero and has a signi"cantly lower zero-order entropy than the original image. j] − P[i. JPEG predictors for lossless coding Mode 0 1 2 3 4 5 6 7 Prediction for P[i. for example) as in [8]. It provides eight different predictors from which the user can select. j] NNE NE N. 2. At this point we would like to note that it is not the intention of this paper to give a detailed description of the new lossless JPEG standard.

3. planar interpolation is used to compute the prediction value. j] + N )/2 ELSE IF (dv − dh < −8) {weak vertical edge} ˆ ˆ P[i. In this paper we restrict our discussion to these seven proposals. the MED predictor was observed to give superior performance over most linear predictors [16]. 40. Vol. NE and NNE are as de"ned in Figure 1. employed prediction followed by conditional encoding of the prediction error. NW. LOCO-I (low-complexity lossless coder) [6]. NN. j] =  N + W − NW otherwise. In an extensive evaluation. In GAP the gradient of the intensity function at the current pixel P[i. that adapts in the presence of local edges. 2/3. The subsequent steps assist in classifying the magnitude of the prediction error into one of a set of ranges and the "nal bits that determine the exact prediction error magnitude within the range are sent uncoded. 1. The other two proposals [12. The weights are adapted on the #y as encoding progresses. N W N + W − NW. THE PREDICTION STEP 129 candidate predictors. 1997 . if neither a vertical edge nor a horizontal edge is detected. j] according to the estimated gradients in the neighbourhood. NW. in response to the call for proposals for a new lossless image compression standard. j] = N ELSE { ˆ P[i. A detailed description of the coder and the standard can be found in [11]. These thresholds were arrived at after extensive experimentation with a large set of test images. Given the success of predictive techniques for lossless image compression. W ) if NW ≤ min(N . j] = (3 P[i. Speci"cally. j] = (N + W )/2 + (NE − NW)/4. The "rst binary decision determines whether the prediction error is zero. The ALCM proposal [18] and the JSLUG proposal [19] included an adaptive predictor that used a weighted combination of "ve neighbourhood pixels in order to predict the current pixel. IF (dv − dh > 32) {horizontal edge} ˆ ˆ P[i. j] is estimated by computing the following quantities: dh = |W − WW| + |N − NW| + |N − NE| dv = |W − NW| + |N − NN| + |NE − NNE|. The QM-Coder is used for encoding each binary decision. If it is not zero. W ) P[i. 13] were based on transform coding. However. W. Binary arithmetic coding is used within each context by decomposing the prediction error into a sequence of binary decisions. Martucci reported the best results with the following three predictors. then the second step determines the sign of the error. j] = ( P[i. The neighbourhood used consisted of the N. 2. W )  min(N . used the median edge detection (MED) predictor. The MED predictor has also been called MAP (median adaptive predictor) and was "rst proposed by Martucci [15]. The ALCM and JSLUG predictor In this section we brie#y describe the predictors that were proposed. The thresholds given in the above procedure are for 8bit data and are adapted on the #y for higher resolution images. The North pixel is used as a prediction in the case of a vertical edge being detected. j] + W )/2 ELSE IF (dv − dh > 8) {weak horizontal edge} ˆ ˆ P[i. No. j] = ( P[i. MED detects horizontal or vertical edges by examining the North (N ). 3. j] + W )/4 ELSE IF (dv − dh < −32) {vertical edge} ˆ ˆ P[i. GAP weights the neighbouring pixels of P[i. it was no surprise that seven out of the nine proposals submitted to ISO. At the end of the section we give a performance comparison and make a few observations. j] = W ELSE IF (dv − dh < −80) {sharp vertical edge} ˆ P[i. j] + N )/4 } where N. The MED predictor Hewlett Packard's proposal.3.1. 3. NE and WW pixels as speci"ed in Figure 1. (1) ˆ A prediction P[i. W. in which case it is easy to see that MAP turns out to be the MED predictor. West (W ) and North-West (NW) neighbours of the current pixel P[i. W ) ˆ max(N . right from the "rst-round evaluations it was clear that the transform-codingbased proposals did not provide as good compression ratios as algorithms proposed based on predictive techniques [14]. One way of interpreting such a predictor is that it always chooses either the best or the second-best predictor among the three T HE C OMPUTER J OURNAL. The West pixel is used as a prediction in the case of a horizontal edge.L OSSLESS I MAGE C OMPRESSION conditioning the prediction error. j] as illustrated in Figure 1 in Section 2. Finally. In the remainder of this paper we describe in more detail the speci"c contributions that were made which have led to the development of the new proposed standard. j] = (3 P[i. prediction is performed according to the following equations:  if NW ≥ max(N . The GAP predictor The CALIC proposal [17] included a gradient-adjusted predictor (GAP) which adapts the prediction according to local gradients and hence gives a more robust performance compared to standard linear predictors. j] is then made by the following procedure: IF (dv − dh > 80) {sharp horizontal edge} ˆ P[i. 3. 3.2. Martucci presented the MAP predictor as a non-linear adaptive predictor that selects the median of a set of three predictions in order to predict the current pixel.

1131 4. If the prediction was too high then the largest neighbouring pixel is decremented and the smallest one is incremented. Cafe and Tools are SCID images (CMYK). adapted to horizontal and vertical gradients in the neighbourhood of the pixel being predicted.9088 1.9985 4. but ALCM has much higher computational complexity. Another predictor.9290 5.4044 4.1119 4. Woman. based on some recent work by Wu [23].4834 4. M EMON AND X.5372 1. the Mitsubishi proposal.130 N. but a detailed evaluation revealed their performance to be inferior [20]. we see that the performance of the three techniques is very similar. W U TABLE 2. 2 N N + NE + .0589 5.6175 1. The image Faxballs is a graphics image. The images Bike. there were a few more predictors proposed in different submissions.229 Initially. However.5307 4. 1997 .5321 2. MED and GAP have comparable complexity.5755 6.259 MED 8. The following set of predictors was used: N.4453 4. After prediction. but fares poorly in compound images that have both text and image data. ALCM too fares poorly with compound images. adaptively switched between a "xed set of predictors based on the texture and gradients in the neighbourhood of the target pixel.1680 2.7669 4. 10]. W. MRI. 3.5491 2. CT.6794 6. Finger and US are T HE C OMPUTER J OURNAL. There is no clear winner that outperforms others on all test images.0362 2. Zero-order entropy of prediction errors with the MED. CONTEXT-BASED BIAS CANCELLATION 3. j] and its surrounding. Conditioning of the ˆ prediction error e = P[i.2754 3.1057 4.2469 3.0553 4.2587 4.9934 2. CLARA [21]. Performance comparison In Table 2 we give the zero-order entropy of prediction errors with the three different predictors described above on the ISO test image set.5371 4.1238 2. 2/3. the weights are changed as follows. On examining the results.0823 3.6515 4. N +W . Chart s and Chart are compound RGB images containing text and pictures.4696 3.5248 2. If the prediction was lower than the actual value then the weight 1 of the largest neighbouring pixel is decremented by 256 and the weight of the smallest neighbour is decremented by the same amount.4048 4. monochrome medical images and "nally.4173 4. Instead of estimating the PDF of prediction Vol.1504 6.8476 2.6517 4.0301 2. The images Air1 and Air2 are RGB aerial images.6006 4. ˆ ˆ given that the current pixel is predicted to be P[i.4548 5.4226 3.2442 5. ties are broken by using the following priority scheme. No.4514 4. GAP performs better in smooth images. The images Compound1.8578 4. Water and Bike3 are scanned RGB images.7711 4. 4. j] to its context can exploit higher-order structures such as texture patterns and local activity in the image for further compression gains. the large number of possible contexts can lead to the `sparse context' or `high model cost' problem [9. The images Cats.204 GAP 8.4713 2. The set X-ray. given in the DARC proposal by Kodak [22]. j] − P[i. Local gradients alone cannot adequately characterize some of the more complex relationships between the predicted pixel P[i. Speci"cally. 2 4 Details of the exact manner in which the selection was made are given in [21].7922 1.0845 6.5847 6.8935 2.2967 4.6389 1. 1 5 4 2 P[i. CR. the MED predictor was adopted by the committee as the default predictor for the baseline algorithm of the proposed standard.6380 4. The CALIC proposal employed a novel and effective solution to this problem.2726 5. j] = αW + (1 − α)N where α= v v h+ v = |W − NW| h = |N − NW|.2557 5.1297 5. 40. Compound2. This test set was made available to all proposers and comprised of >160 Mbyte of image data.3845 3.0503 5. all pixels are assigned an equal weight. Other predictors Besides MED.1145 5. Given these facts.1285 5. For example. j] 3 The pixel labelled 1 is changed with highest priority and the pixel labelled 5 with least priority. GAP and ALCM.0132 6.3469 3.2951 5. MED has the lowest average rate over the entire data set.1057 5.4.8577 3. Hotel and Gold are YUV video images.9356 5.8478 1.0651 3.2113 4.4483 2. GAP and ALCM predictors Image Air1 Air2 Bike Bike3 Cafe Cats Chart Chart s Compound1 Compound2 CR CT Faxballs Finger Gold Graphic Hotel MRI Tools Ultra-Sound Water Woman X-ray Average ALCM 8. In case more than one pixel has the highest weight.5.

T HE C OMPUTER J OURNAL. b0 using the prediction value P[i. . The observed Laplacian distribution without conditioning on contexts is a composition of many contextsensitive distributions of different means and different variances (see Figure 2). . we would seemingly use 4 × 28 = 1024 different compound contexts. errors. j] = P[i. p(e|C). namely bk = 0 1 ˆ if xk ≥ P[i. j] + ¯ (C). in practice the new prediction ˜ error = P[i. By careful counting one determines that the total number of valid compound contexts is only 576 [24]. j] to P[i. 2N − NN. denoted by C(δ. the two proposals that employed bias cancellation. which we are currently investigating. We describe below the details by which contexts were formed and quantized by CALIC and LOCO-I1 p . A realistic distribution for prediction errors (bottom) which is in fact a weighted combination of nine different Laplacians (top). NW. This scheme can be viewed as a product quantization of two independently treated image features: spatial texture patterns and the energy of prediction errors. At a glance. Hence contexts used for bias cancellation are also called prediction contexts. It can be a function of some neighbouring pixels. We call this process bias cancellation. . we can correct the bias in the prediction by feeding back e(C) and adjusting the ¯ ˆ ˜ ˆ prediction P[i. Therefore.5 1 0. j] ˆ if xk < P[i. j]. by an error feedback mechanism that cancels prediction biases in different contexts. On the other hand. the more biased e(C) is from zero. only its conditional expectation E{e|C} is estimated using the corresponding sample means e(C). . .5 0 -10 -8 -6 -4 -2 0 2 4 6 8 10 In CALIC. NN. contexts for error modelling are formed by embedding 144 texture contexts into four error energy contexts to form a total of 576 compound contexts. j] is estimated rather than e = ˆ j]. NN and WW are de"ned as in Figure 1. is then quantized to four levels yielding a quantized error energy context Q( ) which is combined with the quantized texture pattern 0 ≤ B < 2 K to form compound modelling contexts. Error energy contexts are computed by using an error energy estimator de"ned as = dh + dv + 2|ew |. 1997 . NE. not all 28 binary codewords of the B quantizer de"ned by (4) are possible. In order not ¯ to over-adjust the predictor. j] as the threshold. (3) (2) 1. signi"cant improvements can be made. This in turn leads to an improved predictor P[i. 40. 2/3.L OSSLESS I MAGE C OMPRESSION 1. where ¯ (C) is the sample mean of conditioned on context C. Conditioning of the prediction error to its context provides a means to separate these distributions. FIGURE 2. WW.5 0 -10 -8 -6 -4 -2 0 2 4 6 8 10 where N. j]: P[i. x6 and x7 . x7 } {N . This leads to the need for selective feedback techniques. The idea of gaining coding ef"ciency by bias cancellation arises from the observation that the conditional mean e(C) is ¯ generally not zero in a given context C. represent ˆ the events whether the prediction value P[i. Texture contexts are formed by quantization of a local neighbourhood of pixel values to a binary vector C = = {x0 . Also note that an event xi in a prediction context need not be a neighbouring pixel to P[i. No.5 131 4. for example. NE. NW. j] − P[i − 1. ˜ ˆ for P[i. W. Context formation and quantization in CALIC 1 0. However. Since the variability of neighbouring pixels also in#uences the error distribution. j] + e(C). the more ¯ effective is the process of bias cancellation. (5) where dh and dv are as de"ned in Equation (1) and ew = ˆ P[i − 1. j] = P[i. (4) Clearly. It can be seen that for some images.1. Conceptually. This does not contradict the well-known fact that the prediction errors without conditioning on contexts. β). j] 0 ≤ k < K = 8. j] − P[i. Since the conditional mean e(C) is the most likely pre¯ diction error in a given context C. . W. In Table 3 we show the reduction in zero-order entropy when the error feedback mechanism described above is used along with the GAP predictor. C is then quantized to an 8-bit binary number B = ˆ b7 b6 . within each context C. j] forms a convex or concave waveform with respect to the neighbouring pixels in the vertical and horizontal directions. bias cancellation can also be viewed as a two-stage adaptive prediction scheme via conditioning of prediction errors to contexts and the subsequent error feedback. follow a zero-mean Laplacian (symmetric exponential) distribution for most continuoustone images. j] (|ew | is chosen because large errors tend to occur consecutively). 2W − WW}. j] − P[i. x6 . Vol. B captures the texture patterns in the modelling context which are indicative of the behaviour of the prediction error e. the texture contexts are combined with quantized error energy to form compound modelling contexts. performance can actually degrade by a little in some instances. These estimates are then used to ¯ further re"ne the prediction prior to entropy coding.

Larger L only improves coding ef"ciency marginally.1. However. However. Furthermore. The variance of prediction errors strongly correlates to the smoothness of the image around the predicted pixel P[i. No. −q3 . Reducing the number of contexts could be one way to address this problem. However. 5.0823 3.6389 1. Context formation and quantization in LOCO-I1p Inspired by the success of the CALIC algorithm in the "rst round of evaluations. but merged the bias cancellation contexts into a few conditioning states for entropy coding of errors. −q4 ) are merged T HE C OMPUTER J OURNAL.2557 5.9356 5. in practice it is observed that prediction does not completely remove the statistical redundancy in the image even after context-based bias cancellation. faces two major dif"culties: the use of prohibitively large memory space for error modelling. Coding contexts in CALIC (6) The differences D1.5077 4. and the lack of suf"cient samples in each context during adaptive coding in order to make reliable probability estimations.0191 3. a direct implementation of this approach. contexts of the type (q1 .1720 5.5485 6. q3 . M EMON AND X.2344 5.2252 3. COMPUTING CODING CONTEXTS TABLE 3.0132 6. this leads to poorer performance due to loss in modelling ef"ciency.229 GAP (with feedback) 8. Coding contexts in CALIC were computed by "rst computing an error energy estimator as de"ned in (5).9441 5. For time and space ef"ciency. q2 .4834 4. Thus entropy coding of errors using estimated conditional probability p(e| ) improves coding ef"ciency over using p(e).6517 4. being further away from the current pixel. 5. due to the large number of conditioning states or contexts and the large alphabet size of prediction errors.2.7518 4.174 4. These conditioning states are also called coding contexts in order to distinguish them from bias cancellation contexts.9934 2. Zero-order entropy of prediction errors with and without error feedback Image Air1 Air2 Bike Bike3 Cafe Cats Chart Chart s Compound1 Compound2 CR CT Faxballs Finger Gold Graphic Hotel MRI Tools Ultra-Sound Water Woman X-ray Average GAP (No feedback) 8. while obtaining compression ratios that were only 3% inferior to CALIC's on a majority of ISO test images.3845 3. Another key contribution of the CALIC proposal was a modelling paradigm that employed a large number of contexts for bias cancellation. In fact it was simpli"ed further before adoption in the "nal committee draft of the standard by dropping the difference D4 and thereby obtaining a reduced context count of 364 [5]. The total number of contexts turns out to be 1094 within each of which the bias in prediction error is estimated in a manner similar to CALIC.7127 4. Many of the lossless compression techniques reported in the literature have adopted this approach and use only a small number of contexts for conditioning the encoding of prediction errors.2951 5.2469 3. q4 ) = P(−e|−q1 .4818 2.1057 5.2967 4. 1997 . Although the quantizer Q( ) can be optimized off-line Vol.1114 5. is quantized into only three regions (labelled −1 to +1).0000 2. W U based on the assumption that P(e|q1 . q3 .8578 4. q2 .1057 4.4439 4.132 N.0629 4. In practice.0045 2.4483 2. j].0553 4. As mentioned earlier in Section 2. Extensive evaluation of the two context formation and quantization techniques described above showed little difference in compression performance for typical images. 40.9088 1. The difference D4.0301 2. has to be quantized to a small number of (L) levels. L = 8 is found to be suf"cient. −q4 ). Contexts in LOCO-I1 p are formed by "rst computing the following differences: D1 = NE − N D2 = N − NW D3 = NW − W D4 = WW − W. D2 and D3 are then quantized into nine regions (labelled −4 to +4) symmetric about the origin with one of the quantization regions (region 0) consisting of only the difference value 0. The second set of techniques was adopted by the committee for the proposed standard since it is simpler.0185 4. q4 ) and (−q1 .6006 4. −q2 .5924 2. Conditioning the error distribution on leads to separation of prediction errors into classes of different variances.2528 4. 2/3.7669 4.5093 3.9475 6.5847 6.3966 3. To model this correlation predictive techniques usually condition the encoding of the prediction error on local image activity level and on quantized prediction errors incurred in neighbouring pixels.5321 2.6672 1. the HP group submitted a signi"cantly different algorithm [7] which was one-pass (as opposed to their original two-pass submission [6]) and incorporated the context-based bias cancellation mechanism proposed in CALIC.4086 2. they considerably simpli"ed the context formation and quantization techniques and combined them with a very simple and ef"cient entropy coding technique (described in Section 6). −q2 . −q3 .

In Table 4 we list the zero-order entropy of prediction errors using the GAP predictor and the entropy after conditioning on the coding contexts described above. yielding 147 contexts in which adaptive binary arithmetic coding is performed.2528 4. and entropy coding of prediction errors.1098 5.2.4550 5. No.9305 3.1.2.4742 1. in practice.0629 4.9441 5. Some binary decisions are encoded conditioned on the sign of the current prediction error after it has been revealed and thus utilize 147 × 2 = 294 contexts.4086 2. the count N of the prediction errors seen so far and the accumulated sum of magnitudes of prediction errors A seen so far.9475 6.3658 3.0000 2.7127 4. 7]. 6. the small number of conditional error probabilities involved means that even small images will provide enough samples to learn p(e|Q( )) quickly to facilitate an adaptive entropy coding technique. Considering this fact. Instead it "rst remaps prediction errors into an alphabet of size 2z Vol.0546 1. q4 = 42.7810 1.174 GAP Conditioned on Q( ) 8. q5 = 60. binary or m-ary. static or adaptive.4818 2. CALIC used an adaptive m-ary arithmetic coder. Coding contexts in LOCO-I1p An advantage of the techniques that employ prediction followed by error modelling is the clean separation between prediction. CACM++ package that was developed and made publicly available by Carpinelli and Salamonsen.4989 3.4355 2.5518 1.4439 4.8957 1.2252 3. The strategy employed is an approximation to the optimal parameter selection for this entropy coder. However.1720 5. 5.8202 4. 1997 . ENTROPY CODING by standard dynamic programming techniques in order to minimize the conditional entropy of prediction errors over a training set of images [24]. on the other hand. For details the reader is referred to [6.9331 5. quantizes the prediction errors incurred in the North. CALIC does not feed an m-ary arithmetic coder with prediction errors directly.2344 5. q3 = 25.1114 5. In the rest of the section we brie#y describe some of the coding techniques that were proposed in the CALIC. q2 = 15.7738 5. q6 = 85. The compression results that we report in the next section were obtained by coupling CALIC with CACM++.4508 4.772 133 Image Air1 Air2 Bike Bike3 Cafe Cats Chart Chart s Compound1 Compound2 CR CT Faxballs Finger Gold Graphic Hotel MRI Tools Ultra-Sound Water Woman X-ray Average estimation context.7518 4. The software is based on the work in [25]. 2/3. LOCO and ALCM proposals. West and NorthEast neighbours into 7. which is described in the next section. be it Huffman or arithmetic. The coding context k is then computed as k = min{k |2k N ≥ A}. it was found that an image-independent quantizer with bins which are "xed. q1 = 5.8731 2. Entropy of prediction errors before and after conditioning on coding contexts GAP No conditioning 8. any entropy coder. The reason for doing this is tied in to the speci"c entropy coder that LOCO-I1 p uses. modelling of prediction errors. q7 = 140. (7) worked almost as well as the optimal image-dependent quantizer. can usually be interfaced with such a system. 40.L OSSLESS I MAGE C OMPRESSION TABLE 4. which is computed as described in Subsection 4.5924 2. Entropy coding in the CALIC system LOCO-I1 p uses k-coding contexts for a k bit/pixel image.5940 4.0437 2.5995 4. The main contribution came from the HP group's revised LOCO-I1 p proposal in the form of an ingeniously simple and effective usage of Golomb–Rice coding.3966 3. Coding contexts in ALCM and JSLUG Coding contexts in the ALCM proposal were obtained by quantizing the maximum prediction error in the four nearest neighbours of the current pixel.5093 3. m-ary and binary arithmetic coding and Golomb–Rice coding. The speci"c coding context is computed from the expected magnitude of the prediction error within the current bias T HE C OMPUTER J OURNAL. Furthermore. This is done by maintaining in each context. a variety of entropy coding techniques was proposed including Huffman coding.5485 6.0045 2.1861 4. 5.3. Estimating L = 8 conditional error probabilities p(e|Q( )) requires only a modest amount of memory while estimating probabilities for entropy coding.5077 4.1220 3. 7 and 3 levels respectively.7753 2.0185 4. This maximum error was quantized into seven levels using "xed thresholds in order to form seven coding contexts in which adaptive binary arithmetic coding was performed. Quite often.0191 3. The JSLUG proposal.6672 1.0241 5. 6. It can be clearly seen that signi"cant improvement is obtained.

CONCLUSION AND AVENUES FOR FUTURE WORK In Table 5 we show "nal bit rates that were reported by the CALIC.92 6.97 3. and compensates for such biases through an error feedback mechanism. W U TABLE 5.43 4.34 5. For simplicity of implementation m is always a power of 2.2 and for details the reader is referred to [7].43 4.47 3.50 – – 1.69 2.69 3.90 4. Entropy coding in the ALCM system The ALCM and JSLUG proposals used adaptive binary arithmetic coding to encode prediction errors within each coding context.81 4. Golomb codes of parameter m encode a positive integer n by encoding n mod m in binary followed by an encoding of n div m in unary.134 N. Image Air1 Air2 Bike Bike3 Cafe Cats Chart Chart s Compound1 Compound2 CR CT Faxballs Finger Gold Graphic Hotel MRI Tools Ultra-Sound Water Woman X-ray Average CALIC 8. both CALIC and LOCO-I1 p included alphabet extension mechanisms for low-entropy images or regions where it is potentially bene"cial to encode runs of uniform T HE C OMPUTER J OURNAL. the compression performance achieved is surprisingly close to that obtained by arithmetic coding. 6.35 5.74 4.02 2.23 3.66 1. Also.2.79 4. a vital issue in lossless image coding.23 4. Entropy coding in the LOCO-I1p system In LOCO-I1 p the prediction errors are encoded using a special case of Golomb codes [26] which is also known as Rice coding [27]. One of these was CALIC's modelling paradigm that uses a large number of contexts to estimate conditional prediction biases.96 LOCO-I1 p 8.07 2. For an image with z bits/pixel.63 3.51 2.86 2. A separate binary decision tree is maintained for each of the seven coding contexts and the "rst binary decision encoded is whether the symbol is more or less than a parameter value m. 1997 . essentially. One can see from the results that the three proposed algorithms listed signi"cantly outperform the current lossless standard.09 0.92 – 3.98 5.30 6.15 5. Also.73 4.50 5. No. In ALCM.22 2.94 4.83 2.00 3.63 0.08 3.24 3. In this paper we brie#y surveyed some of the key ideas that emerged in the process.82 4.75 5.17 2.78 6.51 1.84 – 3.28 2. the tails of error distributions are truncated and an escape mechanism is used to further reduce the number of code symbols. Also included in the table are bit rates obtained by a publicly available lossless JPEG implementation of Cornell University (LJPEG). If the symbol is greater m is subtracted and the procedure repeated until a negative branch is taken. 6. 40. 6. In this case the binarization is done by a decision tree for m equally probable symbols. prediction errors are "rst mapped to the range 0 to 2z − 1 and then binarized using a decision tree.77 4.59 1. Overall compression performance results symbols.06 LJPEG – 4.80 2.33 5.83 3.50 4. Bit rates (bits/pixel) of some proposed schemes on ISO test set and comparison with lossless JPEG (Huffman).24 1. We have brie#y described this parameter estimation procedure in Subsection 4.17 3.30 1. The actual bit rates achieved are mostly very close and sometimes even better than the corresponding entropy "gures.85 4.31 3.17 5.59 4.99 2.34 1. The bit rates shown are after a few additional tricks that were used by each technique to improve compression performance. M EMON AND X.26 3.04 5. The reader is referred to the original proposals for details.69 4. 7. prediction errors can be mapped to the range 0 to 2z − 1 and the coding parameter k can vary from 0 to z − 1.3. instead of attempting codes with each parameter on a block of symbols and selecting the one which results in the shortest code [27]. the bit rates of LOCO-I1 p are remarkable.67 1.63 2.4.21 instead of 2z+1 for a z-bit image.83 3. Although CALIC gives the best overall performance.77 1. 2/3. However.37 4. in LOCO-I1 p . When m = 2k the encoding has a very simple realization and has been referred to as Rice coding in the literature. the procedure involves adaptive binary arithmetic coding of the Golomb m code of the prediction error.71 5. LOCO-I1 p and the ALCM proposals.67 1.69 2.24 5. Despite the simplicity of the coding and estimation procedures.41 3. The baseline algorithm that has been "nalized is essentially the one given in LOCO-I1 p with minor modi"cations. the coding parameter k is estimated on the #y for each prediction error.22 – 5.84 0.63 3. For example.27 2. the CALIC proposal included a sign prediction technique for reducing the conditional entropy of prediction errors.62 4.05 5. Note that averages were only taken over those images with no missing entries in any column. Since binary coding is used the prediction error needs to be binarized before encoding. Different values of m are used for each coding context.74 1.18 5. This approach offers an effective means of reducing model cost.33 2.32 1. The standardization project for lossless image compression of continuous-tone images has resulted in signi"cant advances in the state of the art of such techniques.95 2.27 3. For these reasons the bit rates for CALIC in Table 5 are lower than the corresponding rates in Table 4.60 5.29 1. given its simplicity. by using a large number of contexts for bias Vol.50 4.18 ALCM 8.

Although the MED predictor has been known for a long time. [11] Pennebaker. We have reported the recent advances in lossless image coding. SPIE Proc. C. Sippy. adaptive. For both theoretical and practical interests one would like to know how much gap still exists between the lossless bit rates obtainable by the new JPEG lossless standard and the ultimate image compressibility regardless of computational complexity. M. G. ISO Working Document ISO/IEC/ SC29/WG1/N256. Maeder. IEEE Trans. Symp. W. Comp. I. Van Rostrand Reinhold. 272–282. and Ono. G. lossless/nearly-lossless coding scheme for continuous-tone images. Theory. [7] Weinberger. D. Speck. IEEE Press. LOCO-I1 p . New York. M. (1996) A comparison of the prediction schemes proposed for a new standard on lossless coding of continuous-tone still images. [12] Boliek. SPIE. (1993) Context-based lossless image compression.. P. Memon. J. was partially supported by NSF Career award NCR 9703969. T. (1995) LOCOI: new developments. J. G. and Macy. and Rissanen. Data Compression Conf. 1997 . (1992) Error modeling for hierarchical lossless image compression.. D. the simple context formation and quantization mechanisms presented in the LOCO proposals were also important contributions and were adopted into the baseline standard. on Circuits and Systems. T. Lossless and near-lossless compression of continuous-tone still images (JPEG-LS).12. J. [8] Howard. (1981) Universal modeling and coding. Vol. 1444. D. N. S. J. In fact. S. N. and Zandi. A. 68–77. CA. E. J. Data Compression Conf. (1995) A contextbased.. IEEE Computer Society Press. pp. S. In Storer.. In Medical Imaging V: Image Capture. lossy ±3. 2418. pp. F. and Sapiro. G. K. ISO Working Document ISO/IEC JTC1/SC29/WG1 N196. the coding performance comes within a few per cent of much more complex arithmetic-coding-based techniques.. and Cohn. G. G. S. G. [18] Speck..12: JSLUG. Formatting and Display. and Wu. (1995) Compression results—lossless. J. II-309–312. 351–360.. M. Seroussi. (1995) CREW: lossless/lossy image compression—contribution to ISO/IEC JTC 1. Vol. In addition. 135 [3] Tischer. Worley. [14] Urban. J. 2/3. pp. J and Goodwin. its effectiveness for prediction in lossless image compression had not been realized. S. S. (eds).. (1985) Parameter reduction and context selection for compression of gray scale images. In IEEE Int. [20] Memon. lossy ±1.. The second. IT-27. In spite of being extremely simple to implement both in software and hardware.. ISO Working Document ISO/IEC JTC1/SC29/WG1 N281. G. and Cohn. 8–20. Los Alamitos. Seroussi. and Vitter. G. We can identify two possible problems that may prevent further improvement in coding ef"ciency: (i) there may exist undiscovered structures of prediction errors associated with some events other than the local intensity gradients and neighbouring errors which have already been exploited by the current methods.. J. (1993) Fast and ef"cient lossless image compression. L. Vol. Res. Another key observation that emerged in the convergence process was the ef"cacy of the MED predictor used in the LOCO submissions. X. ISO Working Document ISO/IEC JTC1/SC29/WG1 N522. N. M. ISO Working Document ISO/IEC JTC1/SC29/WG1 N41. 36. [6] Weinberger. A. J. K. [10] Todd. In Storer. [4] ISO/IEC JTC 1/SC 29/WG 1 (1994) Call for contributions— lossless compression of continuous-tone still pictures. The latter is a highly complex but principled algorithm with a provable asymptotical optimality in compressibility. Langdon. Inform. [17] Wu. and Vitter. and Sayood. Develop. N. G. Proc. A recent study [23] seemed to suggest that the ratio of compression gains versus computational complexity is diminishing. T HE C OMPUTER J OURNAL. (1991) Sunset: a hardware oriented algorithm for lossless compression of gray scale images. IEEE Computer Society Press. and Sayood. X. 40. P. New York. G. CA. (1995) Contribution to JTC 1. [2] Howard.. 12–22. Proc. ACKNOWLEDGEMENTS The authors would like to thank the reviewers for their substantive and informed review which led to signi"cant improvements in the manuscript. ISCAS 96.29. REFERENCES [1] Langdon. ISO Working Document ISO/IEC JTC1/SC29/WG1 N245. ISO Working Document ISO/IEC JTC1/SC29/WG1 N196. ISO Working Document ISO/IEC JTC1/SC29/WG1 N197. (eds). 1310–1313. 7]. IBM J.. (1995) Proposal for next generation lossless compression of continuous-tone still pictures: activity level classi"cation model (ALCM). C. G. No. (1993) JPEG Still Image Data Compression Standard. J. G.29. A. [19] Langdon. (1990) Reversible compression of HDTV images using median adaptive prediction and arithmetic coding.. ISO Working Document ISO/IEC JTC1/SC29/WG1 N203. D. ISO Working Document ISO/IEC JTC1/SC29/WG1 N198. (1995) Lossless image compression—a comparative study. and Langdon. M. pp. and (ii) the context quantizers employed by the current methods may deviate signi"cantly from an optimal error classi"er that minimizes conditional entropy. pp. P. (1995) CLARA: continuous-tone lossless coding with edge analysis and range amplitude detection. A. pp. V. and Sapiro. [15] Martucci. (1995) LOCOI: a low complexity lossless image compression algorithm. 269– 278. D. A. Los Alamitos. M. Haidinyak. and Mitchell. [9] Rissanen. J. [16] Memon. [5] ISO/IEC JTC 1/SC 29/WG 1 (1997) CD 14495. ISO Working Document ISO/IEC JTC1/SC29/WG1 N199. it is interesting to ask why MED yields such good performance. J. [13] Mochizuki. IEEE Press. New York. The question becomes even more tantalizing considering that the best (also the most expensive) version of CALIC [23] has reached a 2% shorter average code length than the universal context modelling (UCM) algorithm [6. In Still Image Compression.L OSSLESS I MAGE C OMPRESSION cancellation but merging these contexts into a few contexts for conditional entropy coding of prediction errors. 29. [21] Ueno. B. M. (1995) Proposal for lossless compression of continuous-tone still pictures: lossless transform coding for still pictures (LTC). G. 188–193. and perhaps most important key contribution was the ingenious and effective technique of Golomb–Rice coding using sequential parameter estimation in the revised HP proposal. R. In Proc.

(1997) Ef"cient and effective lossless compression of continuous-tone images via context selection and quantization. F. IEEE Trans. and Smith. California Institute of Technology. Technical Report 79-22. M EMON AND X.12 [JTC1/SC29/WG1 N41] ISO Working Document ISO/IEC JTC1/SC29/WG1 N204. W. [23] Wu. Honsinger. [27] Rice. Pasadena. In Proc. D. C. 656–664. pp.136 N. 202–211. Neal. IP-6.. Theory. IEEE Trans. (1966) Run-length codings. N. M... C.. A. [26] Golomb. I. Rabbani. Jet Propulsion Laboratory. [24] Wu. (1979) Some Practical Universal Noiseless Coding Techniques. X.29. 40. 2/3. B. and Witten. Commun. R. IEEE Trans. 1997 . [22] Gandhi. (1995) A proposal submitted in response to call for contributions for JTC 1. S. T HE C OMPUTER J OURNAL. CA. Image Processing. X. (1995) Arithmetic coding revisited. IT-12. W U [25] Moffat. Data Compression Conf. 399–401. 45. 437–444. and Memon. Inform. R. Vol. (1997) Context-based adaptive lossless image coding. No..