CONTENTS
Abstract
1. Introduction
4. JPEG
4.1 JPEG
4.1.1 JPEG Encoder
4.1.2 The 2D 8x8 DCT
4.1.3 Quantization
4.1.4 Differential Coding of DC Coefficients
4.1.5 Zigzag Scanning of AC Coefficients
4.1.6 Entropy Coding
5. Other Commonly Used Techniques
5.1 JPEG2000
5.2 Others
Summary
References
1. Introduction to Image Compression
1.1.1 Pixels

Each pixel is a sample of an original image. More samples typically provide a more accurate representation of the original. The intensity of each pixel is variable. In color image systems, a color is typically represented by three or four component intensities, such as red, green, and blue.
1.1.2 RGB
When the eye perceives an image on a computer monitor, it is actually perceiving a large collection of finite color elements, or pixels [1]. Each of these pixels is itself composed of three dots of light: a red dot, a green dot, and a blue dot. The color the eye perceives at each pixel is the result of varying intensities of red, green, and blue light emanating from that location. A color image can therefore be represented as three matrices of values, each corresponding to the brightness of one color in each pixel, and a full color image can be reconstructed by superimposing these three "RGB" matrices, as shown in Fig. 2.
1.1.3 Grayscale
If an image is instead represented by a single intensity matrix, with each value mapped to a shade between black and white, it appears as a grayscale image, as shown in Fig. 3.
Fig. 3 A grayscale image
1.1.4 YUV
In the case of a color RGB picture a point-wise transform is made to the YUV
(luminance, blue chrominance, red chrominance) color space. This space in
some sense is more efficient than the RGB space and allows better
quantization. The forward transform is given by

Y = 0.299 R + 0.587 G + 0.114 B
U = -0.147 R - 0.289 G + 0.436 B    (1)
V = 0.615 R - 0.515 G - 0.100 B

and the inverse transform by

R = Y + 1.140 V
G = Y - 0.395 U - 0.581 V    (2)
B = Y + 2.032 U
The camera captures the reflected light from the surface of the object, and the
received light will be converted into three primary color components R, G and
B. These three primary color components are processed by coding algorithms
afterward.
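The point-wise RGB-to-YUV conversion and its inverse can be sketched in Python. This is a minimal illustration using the standard BT.601 coefficients; the function names are ours, not part of any standard library:

```python
def rgb_to_yuv(r, g, b):
    """Point-wise RGB -> YUV conversion (standard BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b      # luminance
    u = -0.147 * r - 0.289 * g + 0.436 * b     # blue chrominance
    v = 0.615 * r - 0.515 * g - 0.100 * b      # red chrominance
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse point-wise transform back to RGB."""
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return r, g, b
```

A round trip through the two functions recovers the original RGB values to within small rounding error, which is why the transform itself introduces essentially no loss; the gain comes from quantizing U and V more coarsely than Y.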
1.2 Image Compression

Image compression addresses the problem of reducing the amount of data required to represent a digital image. It is a process intended to yield a compact representation of an image, thereby reducing the image storage/transmission requirements. Compression is achieved by the removal of one or more of the following three basic data redundancies:
1. Coding Redundancy
2. Inter-pixel Redundancy
3. Perceptual Redundancy
Coding redundancy occurs when the codes assigned to a set of events such as
the pixel values of an image have not been selected to take full advantage of
the probabilities of the events [2].
Inter-pixel redundancy usually results from correlations between the pixels.
Due to the high correlation between the pixels, any given pixel can be predicted
from its neighboring pixels.
Perceptual redundancy is due to data that is ignored by the human visual system. In other words, all the neighboring pixels in the smooth region of a natural image have a high degree of similarity, and this insignificant variation in the values of the neighboring pixels is not noticeable to the human eye.
1.2.1 System
Image compression techniques reduce the number of bits required to represent
an image by taking advantage of these redundancies. An inverse process
called decoding is applied to the compressed data to get the reconstructed
image. The objective of compression is to reduce the number of bits as much as
possible, while keeping the resolution and the quality of the reconstructed
image as close to the original image as possible.
Image compression systems are composed of two distinct structural blocks: an
encoder and a decoder, as shown in Fig. 6.
Image f(x,y) is fed into the encoder, which creates a set of symbols from the input data and uses them to represent the image. If we let n1 and n2 denote the number of information-carrying units in the original and encoded images respectively, the compression that is achieved can be quantified numerically via the compression ratio, CR = n1/n2.
As shown in Fig. 6, the encoder is responsible for reducing the coding, inter-pixel and perceptual redundancies of the input image. In the first stage, the mapper transforms the input image into a format designed to reduce inter-pixel redundancies. In the second stage, the quantizer block reduces the accuracy of the mapper's output in accordance with a predefined criterion. In the third and final stage, a symbol coder creates a code for the quantizer output and maps the output in accordance with that code. The decoder's blocks perform, in reverse order, the inverse operations of the encoder's symbol coder and mapper blocks. As quantization is irreversible, an inverse quantizer is not included.
1.2.2 Benefits
The benefits of image compression can be listed as follows:
1. It provides potential cost savings when sending data over a switched telephone network, where the cost of a call is usually based on its duration.
2. It not only reduces storage requirements but also overall execution time.
3. It can reduce transmission errors, since fewer bits are transferred.
4. It also provides a level of security against illicit monitoring.
1.2.3 Categories
Image compression techniques are broadly classified into two categories, depending on whether or not an exact replica of the original image can be reconstructed from the compressed image. These are:
1. Lossy technique
2. Lossless technique
2. Lossy Compression

2.1 Introduction
Lossy schemes provide much higher compression ratios than lossless schemes, and they are widely used since the quality of the reconstructed images is adequate for most applications. Under such a scheme, the decompressed image is not identical to the original image, but reasonably close to it.
In a typical lossy scheme, the image is first transformed, the transform coefficients are then quantized, and the quantized data is finally entropy coded; decoding is the reverse process. Firstly, entropy decoding is applied to the compressed data to get the quantized data. Secondly, de-quantization is applied to it, and finally the inverse transformation yields the reconstructed image.
Major performance considerations of a lossy compression scheme include:
1. Compression ratio
2. Signal-to-noise ratio
3. Speed of encoding and decoding
Lossy compression includes the following schemes:
1. Transform coding
2. Vector quantization
3. Fractal coding
4. Block truncation coding
5. Subband coding
2.2 Techniques
2.2.1 Transform Coding
In this coding scheme, transforms such as the DFT (Discrete Fourier Transform) and DCT (Discrete Cosine Transform) are used to change the pixels of the original image into frequency-domain coefficients. These coefficients have several desirable properties. One is the energy compaction property, whereby most of the energy of the original data is concentrated in only a few significant transform coefficients. Only those few significant coefficients are selected and the rest are discarded. The selected coefficients are then quantized and entropy encoded. DCT coding has been the most common approach to transform coding. (See Fig. 8)
Fig. 8 DCT
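The energy compaction property is easy to observe in code. The sketch below implements the standard 2-D DCT of an 8x8 block directly from its definition in pure Python; it is an illustration, not an optimized implementation:

```python
import math

def dct2(block):
    """2-D DCT of an 8x8 block (list of lists), computed from the definition."""
    n = 8
    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero-frequency term.
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out
```

For a perfectly flat block (all pixels equal to 100), every AC coefficient is zero and all the energy lands in the single DC coefficient, which is the extreme case of energy compaction described above.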
2.2.2 Vector Quantization

In this scheme, a dictionary (codebook) of representative vectors, called code vectors, is first constructed. As shown in Fig. 9, a vector is usually a block of pixel values. A given image is then partitioned into non-overlapping blocks (vectors) called image vectors. For each image vector, the closest code vector in the dictionary is determined, and its index in the dictionary is used as the encoding of the original image vector. Thus, each image is represented by a sequence of indices that can be further entropy coded.
Fig. 9 (a) Vector quantization coding procedure (b) decoding procedure
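The encoding and decoding procedures of Fig. 9 can be sketched as follows. This is a minimal illustration with a hand-built codebook; in practice the codebook is trained (e.g. with the LBG algorithm), which is not shown here:

```python
def vq_encode(image_vectors, codebook):
    """Map each image vector to the index of its nearest code vector."""
    def dist2(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            for v in image_vectors]

def vq_decode(indices, codebook):
    """Reconstruct each vector by a simple table lookup in the codebook."""
    return [codebook[i] for i in indices]
```

Decoding is just a table lookup, which is why vector quantization is attractive when decoding speed matters more than encoding speed.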
2.2.3 Fractal Coding

Fractal coding exploits self-similarity within an image. We want to find a map W which takes an input image and yields an output image. To know when W is contractive, we must first define a distance between two images. The distance is defined as
δ(f, g) = sup_{(x,y) ∈ P} |f(x, y) − g(x, y)|
where f and g are the grey-level values of a pixel (for a greyscale image), P is the space of the image, and x and y are the coordinates of any pixel. This distance identifies the position (x, y) where images f and g differ the most.
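This sup-distance is simple to compute for images stored as nested lists; a minimal sketch (the function name is ours):

```python
def sup_distance(f, g):
    """delta(f, g): the largest pixel-wise grey-level difference over the image."""
    return max(abs(fv - gv)
               for frow, grow in zip(f, g)   # iterate rows of both images
               for fv, gv in zip(frow, grow))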
Natural images are not exactly self-similar. The Lena image (see Fig. 10), a typical image of a face, does not contain the type of self-similarity that can be found in the Sierpinski triangle. But the next image shows that we can find self-similar portions of the image: a part of her hat is similar to a portion of the reflection of the hat in the mirror.
2.2.4 Block Truncation Coding
In this scheme, the image is divided into non-overlapping blocks of pixels. For each block, a threshold and reconstruction values are determined. The threshold is usually the mean of the pixel values in the block. A bitmap of the block is then derived by replacing all pixels whose values are greater than or equal to (less than) the threshold by a 1 (0). Then, for each segment (group of 1s and 0s) in the bitmap, the reconstruction value is determined as the average of the values of the corresponding pixels in the original block. (See Fig. 11)
Fig. 11 Block truncation coding procedure
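The procedure of Fig. 11 can be sketched for a single block as follows; a minimal illustration with our own function names:

```python
def btc_block(block):
    """Block truncation coding of one block: a bitmap plus two levels."""
    pixels = [p for row in block for p in row]
    threshold = sum(pixels) / len(pixels)              # block mean
    bitmap = [[1 if p >= threshold else 0 for p in row] for row in block]
    highs = [p for p in pixels if p >= threshold]      # the 1-group
    lows = [p for p in pixels if p < threshold]        # the 0-group
    high = sum(highs) / len(highs)                     # reconstruction value for 1s
    low = sum(lows) / len(lows) if lows else high      # reconstruction value for 0s
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    """Replace each bitmap entry by its group's reconstruction value."""
    return [[high if b else low for b in row] for row in bitmap]
```

Each block is thus stored as one bit per pixel plus two reconstruction values, regardless of the original bit depth.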
2.2.5 Subband Coding

In subband coding, the image is filtered into several frequency bands (subbands), and each subband is then coded separately, as shown in Fig. 12.

Fig. 12 Subband coding
3. Lossless Compression

3.1 Introduction
In lossless compression techniques, the original image can be perfectly recovered from the compressed image. These techniques are also called noiseless, since they do not add noise to the signal. Lossless compression is also known as entropy coding, since it uses decomposition techniques to minimize redundancy. The following techniques are included in lossless compression:
1. Run length encoding
2. Huffman encoding
3. LZW coding
4. Area coding
3.2 Techniques
3.2.1 Run Length Encoding
This is a very simple compression method used for sequential data, and it is very useful for repetitive data. The technique replaces sequences of identical pixels, called runs, by shorter symbols. The run length code for a gray-scale image is represented by a sequence {Vi, Ri}, where Vi is the intensity of a pixel and Ri is the number of consecutive pixels with intensity Vi, as shown in the figure. If both Vi and Ri are represented by one byte, a span of 12 pixels coded using eight bytes yields a compression ratio of 1.5:1. (See Fig. 13)
Fig. 13 Run-length encoding
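The {Vi, Ri} scheme can be sketched in a few lines; a minimal illustration:

```python
def rle_encode(pixels):
    """Encode a scanline as (value Vi, run length Ri) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [(v, r) for v, r in runs]

def rle_decode(runs):
    """Expand each (value, count) pair back into pixels."""
    return [v for v, r in runs for _ in range(r)]
```

Note that run length encoding only pays off on repetitive data: on a scanline with no repeated values it doubles the size, which is why it is typically combined with other techniques.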
3.2.2 Huffman Encoding

The pixels in the image are treated as symbols. The symbols that occur more frequently are assigned a smaller number of bits, while the symbols that occur less frequently are assigned a relatively larger number of bits. The Huffman code is a prefix code: the binary code of any symbol is not the prefix of the code of any other symbol. Most image coding standards use lossy techniques in the earlier stages of compression and use Huffman coding as the final step.
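A Huffman code can be built with a small heap-based sketch: repeatedly merge the two least frequent subtrees, prepending a bit to every code in each. This is a minimal illustration, not a production implementation:

```python
import heapq
from collections import Counter

def huffman_codes(pixels):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(pixels)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tiebreaker, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, i, merged))
        i += 1
    return heap[0][2]
```

The resulting code is a prefix code by construction: symbols sit at the leaves of the merge tree, so no code word can be the beginning of another.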
3.2.3 LZW Coding
LZW (Lempel-Ziv-Welch) is a dictionary-based coding. Dictionary-based coding can be static or dynamic. In static dictionary coding, the dictionary is fixed during the encoding and decoding processes. In dynamic dictionary coding, the dictionary is updated on the fly. LZW is widely used in the computer industry and is implemented as the compress command on UNIX. (See Fig. 15)
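The dynamic-dictionary behavior of LZW is visible in a short encoder sketch, shown here for strings; a minimal illustration without the fixed-width code packing a real implementation would add:

```python
def lzw_encode(data):
    """Dynamic-dictionary LZW: the dictionary grows as the input is scanned."""
    dictionary = {chr(i): i for i in range(256)}   # initial single-byte entries
    next_code = 256
    w, out = "", []
    for ch in data:
        if w + ch in dictionary:
            w += ch                                # keep extending the match
        else:
            out.append(dictionary[w])              # emit code for longest match
            dictionary[w + ch] = next_code         # dictionary updated on the fly
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out
```

Repeated substrings quickly acquire their own codes, so "ABABAB" compresses to four codes instead of six, and the decoder can rebuild the same dictionary from the code stream alone.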
4. JPEG
4.1 JPEG
This section briefly describes the JPEG (Joint Photographic Experts Group) standard and introduces its flow chart and algorithm.
4.1.1 JPEG Encoder
Fig. 16 shows the block diagram of the JPEG standard. The YCbCr color transform and the chrominance subsampling format are not defined in the JPEG standard, but most JPEG software performs this processing because it allows the encoder to reduce the data quantity more efficiently. However, we do not discuss the basic concept of the subsampling format here; we focus on the JPEG encoder in what follows.
4.1.2 The 2D 8x8 DCT

The forward DCT of an 8x8 block is given by

F(u, v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]    (3)

where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise.
The discrete cosine transform shown is closely related to the Discrete Fourier Transform (DFT). Both take a set of points from the spatial domain and transform them into an equivalent representation in the frequency domain. The difference is that the DFT takes a discrete signal in one spatial dimension and transforms it into a set of points in one frequency dimension, while the Discrete Cosine Transform (for an 8x8 block of values) takes a 64-point discrete signal, which can be thought of as a function of two spatial dimensions x and y, and turns it into 64 DCT coefficients expressed in terms of the 64 unique orthogonal 2D spectra shown in Fig. 17.
The DCT coefficient values are the relative amounts of the 64 spatial frequencies present in the original 64-point input. The element in the upper left, corresponding to zero frequency in both directions, is the "DC coefficient"; the rest are called "AC coefficients."
Because pixel values typically vary slowly from point to point across an image, the FDCT processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies. For a typical 8x8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded.
At the decoder the IDCT reverses this processing step. It takes the 64 DCT
coefficients and reconstructs a 64-point output image signal by summing the
basis signals. Mathematically, the DCT is a one-to-one mapping of 64-point vectors between the image and the frequency domains. In principle, the DCT
introduces no loss to the source image samples; it merely transforms them to a
domain in which they can be more efficiently encoded.
4.1.3 Quantization
After the DCT, the encoder performs quantization to reduce the precision of the data and discard the less important high-frequency coefficients. As mentioned above, the human eye is more sensitive to the low-frequency components than to the high-frequency components, so JPEG quantization assigns a large quantization step size to the high-frequency components to discard redundant information, and a small quantization step size to the low-frequency components to preserve the significant information. Fig. 18 shows two quantization tables defined in JPEG, where Qr is the luminance quantization table and Qc is the chrominance quantization table.

The quantization step size is smaller in the upper left region to preserve the low-frequency components. On the other hand, the quantization step is larger in the lower right region to reduce the less important high-frequency components to zero. Since the human eye is less sensitive to distortion in the high-frequency features of the image, it is not easy to observe the difference between the original image and the quantized image.
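The quantization step itself is a per-coefficient divide-and-round, with dequantization as the corresponding multiply. A minimal sketch (shown on a 2x2 fragment for brevity; real JPEG tables are 8x8):

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its quantization step and round."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]

def dequantize(levels, qtable):
    """Multiply each quantized level back by its step (the lossy inverse)."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]
```

With a large step, a small high-frequency coefficient rounds to zero and is lost; with a small step, a large low-frequency coefficient survives with little error, which is exactly the behavior the tables in Fig. 18 are designed to produce.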
4.1.4 Differential Coding of DC Coefficients
After the 2-D DCT and quantization, most of the AC coefficients in an 8x8 block are zero, as in Fig. 19. The DC coefficient is the mean of the corresponding block, and the current DC coefficient is usually very similar to the DC coefficients of its neighboring blocks. Thus, the JPEG encoder performs predictive coding on the DC coefficients to reduce this redundancy, as shown in Fig. 20. The differential coding of the DC coefficient is the difference of DCi and DCi-1.
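The DC prediction is just a running difference, and its inverse is a running sum; a minimal sketch (assuming, as is conventional, that the predictor starts at 0):

```python
def dc_differences(dc_values):
    """DIFF_i = DC_i - DC_(i-1); the first block is predicted from 0."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_reconstruct(diffs):
    """Invert the prediction by accumulating the differences."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out
```

Because neighboring DC values are similar, the differences are small numbers, which take fewer bits under the entropy coder than the raw DC values would.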
Zero run length coding encodes and represents the AC coefficients in the form

(RUN, VALUE)

where RUN is the number of consecutive zeros before the nonzero coefficient VALUE. For example, if we want to encode the AC coefficients

57, 0, 0, 0, 3, 0, 0, -2, 0, 0, 0, ..., 0

then the zero run length encoded AC coefficients become

(0, 57), (3, 3), (2, -2), EOB

where EOB (end of block) means that the remaining coefficients are all zeros.
For zero run length coding to work well, we must also determine the order of scanning. Since the energy of an image usually concentrates at low frequencies, the nonzero coefficients appear in the low-frequency region with higher probability, while the zeros appear at high frequencies. We adopt zigzag scanning, which groups the zeros together at the high-frequency end of the scan. (See Fig. 21.)
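Zigzag scanning and the (RUN, VALUE) coding can be sketched together. The zigzag order walks the anti-diagonals of the block, alternating direction; the sketch below expresses that as a sort key, and the function names are ours:

```python
def zigzag_order(n=8):
    """Index pairs of an n x n block in zigzag (low- to high-frequency) order."""
    # Sort by anti-diagonal d = x + y; within a diagonal the scan direction
    # alternates, which the secondary key (x on odd d, y on even d) encodes.
    return sorted(((x, y) for x in range(n) for y in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_ac(block):
    """(RUN, VALUE) pairs over the zigzag-scanned AC coefficients, then EOB."""
    scanned = [block[x][y] for x, y in zigzag_order(len(block))][1:]  # skip DC
    pairs, run = [], 0
    for v in scanned:
        if v == 0:
            run += 1                  # count zeros preceding the next value
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")               # remaining coefficients are all zero
    return pairs
```

Because the zigzag scan pushes the (mostly zero) high-frequency coefficients to the end, a typical block collapses to a handful of pairs followed by a single EOB marker.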
VALUE | Binary codes for VALUE | SIZE
0 | | 0
-1, 1 | 0, 1 | 1
-3, -2, 2, 3 | 00, 01, 10, 11 | 2
-7, -6, -5, -4, 4, 5, 6, 7 | 000, 001, 010, 011, 100, 101, 110, 111 | 3
-15, ..., -8, 8, ..., 15 | 0000, ..., 0111, 1000, ..., 1111 | 4
-31, ..., -16, 16, ..., 31 | 00000, ..., 01111, 10000, ..., 11111 | 5
-63, ..., -32, 32, ..., 63 | 000000, ..., 011111, 100000, ..., 111111 | 6
-127, ..., -64, 64, ..., 127 | 0000000, ..., 0111111, 1000000, ..., 1111111 | 7
-255, ..., -128, 128, ..., 255 | ... | 8
-511, ..., -256, 256, ..., 511 | ... | 9
-1023, ..., -512, 512, ..., 1023 | ... | 10
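The SIZE column of the table above is simply the number of bits needed for the magnitude of VALUE, which in Python is the bit length of |VALUE|; a one-line sketch:

```python
def size_category(value):
    """SIZE category of a coefficient: bits needed for |VALUE| (0 for zero)."""
    return abs(value).bit_length()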
RUN/SIZE | Code length | Code word
0/1 | 2 | 00
...
0/6 | 7 | 1111000
...
0/10 | 16 | 1111111110000011
1/1 | 4 | 1100
1/2 | 5 | 11011
...
1/10 | 16 | 1111111110001000
2/1 | 5 | 11100
...
4/5 | 16 | 1111111110011000
...
15/10 | 16 | 1111111111111110
5.1 JPEG2000
JPEG 2000 is an image compression standard and coding system. It was
created by the Joint Photographic Experts Group committee in 2000 with the
intention of superseding their original discrete cosine transform-based JPEG
standard (created in 1992) with a newly designed, wavelet-based method.
The aim of JPEG 2000 is not only to improve compression performance over JPEG but also to add features such as scalability. JPEG 2000's improvement in compression performance relative to the original JPEG standard is actually rather modest and should not ordinarily be the primary consideration for evaluating the design. Very low and very high compression rates are supported in JPEG 2000; the ability of the design to handle a very large range of effective bit rates is one of its strengths. The following sections describe the algorithm of JPEG 2000. (See Fig. 21 and Fig. 22)
Fig. 21 JPEG2000
SUMMARY
This tutorial started with background: why we need image compression, how an image is formed, what kinds of image processing tools exist, and what we can do with image compression techniques, before turning to how each method is implemented.

I did not say much about how to implement JPEG and JPEG2000, but focused on introducing different kinds of compression techniques. The drawback is that none of them can be discussed in detail; but since the topic is the development of compression techniques, I preferred to cover the field broadly, so many different methods are covered.
There are still many other analysis tools with more complex algorithms that
have not been introduced here. Readers who are interested can find more
information in the references.
REFERENCES
[1] Trutna et al. “An Introduction to JPEG Compression”, 2001
[2] J.-J. Ding and J.-D. Huang, "Image Compression by Segmentation and
Boundary Description", National Taiwan University, Taipei, 2007
[3] Subramanya, “Image Compression Technique,” Potentials IEEE, Vol. 20,
Issue 1, pp 19-23, Feb-March 2001.
[4] Jackson and Hannah, “Comparative Analysis of Image Compression
Techniques,” System Theory, Proceedings SSST’93, 25th Southeastern
Symposium, pp 513-517, 7–9 March 1993
[5] Yang and Bourbakis, “An Overview of Lossless Digital Image
Compression Techniques,” Circuits & Systems, 2005 48th Midwest
Symposium ,vol. 2 IEEE, pp 1099-1102, 7–10 Aug 2005
[6] Avcibas, Memon, Sankur, Sayood, “A Progressive Lossless / Near
Lossless Image Compression Algorithm,” IEEE Signal Processing
Letters, vol. 9, No. 10, pp 312-314, October 2002.
[7] Chen, Yang and Zhang, “A New Efficient Image Compression Technique
with Index-Matching Vector Quantization,” Consumer Electronics, IEEE
Transactions, Vol. 43, Issue 2, pp 173- 182, May 1997.
[8] Kumar, “A Study of Various Image Compression Techniques”
[9] Kil and Shin, “Reduced Dimension Image Compression and Its
Applications,” Image Processing Proceedings, International Conference,
Vol. 3 , pp 500-503, 23-26 Oct.,1995
[10] Suzuki et al., “Intra Prediction by Template Matching,” ICIP 2006
[11] T.K. Tan, C.S. Boon, and Y. Suzuki, “Intra Prediction by Averaged
Template Matching Predictors,” IEEE 2007
[12] Y. Guo, Y.K. Wang, and H. Li, “Priority-Based Template Matching Intra
Prediction,” ICME 2008.
[13] L.Wei, and M. Levoy, “Fast Texture Synthesis using Tree-Structure
Vector Quantization,” SIGGRAPH 2000, pp.479-488, Jul. 2000.
[14] Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing",
Third Edition, Prentice Hall.
[15] Skodras, Christopoulos, and Ebrahimi, "The JPEG 2000 Still Image
Compression Standard", IEEE Signal Processing Magazine, September 2001.
[16] Gray, “The JPEG2000 Standard”, Technische Universität München.
[17] Gaetano Impoco, “JPEG2000 – A Short Tutorial”, April 1, 2004.
[18] D. Taubman, E. Ordentlich, M. Weinberger, G. Seroussi, I. Ueno, and F.
Ono, "Embedded Block Coding in JPEG2000", in Proc. Int. Conf. Image
Process. (ICIP'00), Sep. 2000.
[19] Kuei-Lan Lin, “Analysis and Architecture Design for JPEG2000 Still
Image Encoding System”, National Central University, Taiwan, ROC.