Basic Image Compression Concepts
Presenter: Guan-Chen Pan
Research Advisor: Jian-Jiun Ding, Ph.D.
Assistant Professor
Outline
Introduction
Basic concepts of image compression
Proposed method for arbitrary-shape image segment compression
Improvement of the boundary region by morphology
JPEG2000
Triangular and trapezoid regions and modified JPEG image compression
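Introduction
Image compression can be lossless or lossy; lossy compression is more widely used.
(Figure: JPEG encoder. The R, G, B components are converted to YCbCr, the chrominance is downsampled (4:2:2 or 4:2:0), and each 8×8 block goes through the FDCT and the quantizer (quantization table). The DC coefficient is coded by differential coding and the AC coefficients by zigzag scanning and run-length encoding; both are then Huffman encoded into the bit-stream.)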
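Color transform (RGB to YCbCr):
$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.16875 & -0.33126 & 0.5 \\ 0.5 & -0.41869 & -0.08131 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$$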
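A minimal Python sketch of this color transform for one 8-bit pixel, using the coefficients of the matrix and the chroma offset of 128. The inverse uses the common JFIF coefficients (1.402, 0.344136, 0.714136, 1.772); those are an assumption, not something given on the slide.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward transform; Cb and Cr are offset by 128 so they stay non-negative."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.16875 * r - 0.33126 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.41869 * g - 0.08131 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Approximate inverse (JFIF-style coefficients, assumed here)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```

A gray pixel such as (100, 100, 100) maps to Y ≈ 100 and Cb ≈ Cr ≈ 128: all of the information sits in the luminance.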
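Chrominance Subsampling
The name of the format is not always directly related to the subsampling ratio (for example, 4:2:0 halves the chrominance both horizontally and vertically).
(Figure: sampling patterns (a) 4:4:4, (b) 4:2:2, (c) 4:2:0, (d) 4:1:1. The Y plane is W × H in every case; the Cb and Cr planes are W × H in 4:4:4, W/2 × H in 4:2:2, W/2 × H/2 in 4:2:0, and W/4 × H in 4:1:1.)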
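A small sketch of 4:2:2 and 4:2:0 chroma downsampling on a plane stored as a list of rows. Averaging neighboring samples is one common choice and is an assumption here; the slide does not specify the downsampling filter.

```python
def subsample_422(plane):
    """4:2:2: halve the horizontal chroma resolution by averaging horizontal pairs."""
    return [[(row[j] + row[j + 1]) / 2 for j in range(0, len(row), 2)] for row in plane]


def subsample_420(plane):
    """4:2:0: halve both directions by averaging each 2x2 block (even sizes assumed)."""
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]
```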
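Compression ratio (CR):
$$\mathrm{CR} = \frac{n_1}{n_2},$$
where n1 and n2 are the amounts of data (e.g., bits) needed to represent the original and the compressed image, respectively.
Root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{WH}\sum_{x=0}^{W-1}\sum_{y=0}^{H-1}\left[f(x,y) - f'(x,y)\right]^2},$$
where f and f′ are the original and the reconstructed images, and H and W are the height and the width of the images, respectively.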
Peak signal-to-noise ratio (PSNR), for 8-bit images:
$$\mathrm{PSNR} = 20\log_{10}\frac{255}{\mathrm{RMSE}} \ \text{(dB)}$$
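The three quality measures above can be computed directly. This sketch assumes images stored as lists of rows and a peak value of 255 (8-bit samples).

```python
import math


def rmse(f, f_rec):
    """Root-mean-square error between two images of equal size (lists of rows)."""
    h, w = len(f), len(f[0])
    se = sum((f[y][x] - f_rec[y][x]) ** 2 for y in range(h) for x in range(w))
    return math.sqrt(se / (w * h))


def psnr(f, f_rec, peak=255):
    """PSNR in dB; peak=255 assumes 8-bit samples. Identical images give infinity."""
    e = rmse(f, f_rec)
    return math.inf if e == 0 else 20 * math.log10(peak / e)


def compression_ratio(n1, n2):
    """CR = n1 / n2, with n1 and n2 the sizes (e.g. in bits) before and after coding."""
    return n1 / n2
```

For example, a 512 × 512 8-bit image (2,097,152 bits) stored in 65,536 bits has CR = 32.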
Reduce the Correlation between Pixels
Transform coding
1. Coordinate rotation
2. Karhunen-Loeve transform
3. Discrete cosine transform
4. Discrete wavelet transform
Predictive coding
Coordinate rotation
Draw the line that has the minimum mean square error to all data points, and rotate the coordinates onto it: Y = AX.
Original sequence X = (x0, x1), with x0 = height and x1 = weight:
Height (x0)   Weight (x1)
65            170
75            188
60            150
70            170
56            130
80            203
68            160
50            110
40            80
(Figure: scatter plot of the data, height x0 on the horizontal axis and weight x1 on the vertical axis.)
Rotated sequence Y = (y0, y1):
New height y0   New weight y1
181.971         3.416
203.406         0.887
161.554         0.560
183.844         -1.220
141.512         -3.223
206.133         -2.999
173.823         -3.111
120.721         -5.152
89.159          -7.112
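A minimal sketch of the coordinate rotation (a 2 × 2 KLT) on the height/weight data. It assumes, as the rotated values suggest, that the fitted line passes through the origin, so the uncentered second moments are used and no mean is removed. The angle maximizes the energy of y0, which is the same as minimizing the energy of y1.

```python
import math


def principal_rotation(points):
    """Rotate 2-D points so that y0 lies along the least-squares line through the origin.

    Assumption: no mean removal (uncentered second moments), matching Y = AX.
    """
    sxx = sum(x0 * x0 for x0, _ in points)
    syy = sum(x1 * x1 for _, x1 in points)
    sxy = sum(x0 * x1 for x0, x1 in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)   # angle of the principal axis
    c, s = math.cos(theta), math.sin(theta)
    # A = [[c, s], [-s, c]] is a pure rotation, so lengths are preserved
    return theta, [(c * x0 + s * x1, -s * x0 + c * x1) for x0, x1 in points]


data = [(65, 170), (75, 188), (60, 150), (70, 170), (56, 130),
        (80, 203), (68, 160), (50, 110), (40, 80)]   # (height, weight) from the table
theta, rotated = principal_rotation(data)
```

After the rotation almost all of the energy is in y0, so y1 can be coded with far fewer bits.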
Karhunen-Loeve transform (KLT)
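Discrete cosine transform
The DCT is a close approximation of the KLT for highly correlated image data. It is more widely used in image and video compression because, unlike the KLT, its basis does not depend on the data.
The DCT concentrates more energy in the low-frequency bands than the DFT does.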
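A direct (O(N⁴)) sketch of the orthonormal 8 × 8 2-D DCT, written for clarity rather than speed; practical codecs use fast separable algorithms. It shows the energy compaction described above: a smooth block ends up with nearly all its energy in the DC and the lowest AC coefficients.

```python
import math

N = 8


def alpha(k):
    """Normalization that makes the DCT-II basis orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * acc
    return out
```

A constant block of value 10 gives F[0][0] = 80 and zero everywhere else.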
Discrete wavelet transform
The wavelet transform is similar to the conventional Fourier transform, but it is based on small waves, called wavelets, which have varying frequency and limited duration.
The 2-D discrete wavelet transform is used in image compression.
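(Figure: one analysis stage of the 2-D DWT. The input W_φ(j+1, m, n) is filtered with h_ψ(−n) and h_φ(−n) and downsampled by 2 along one dimension, then each result is filtered with h_ψ(−m) and h_φ(−m) and downsampled by 2 along the other dimension, giving the detail subbands W_ψ^D(j, m, n), W_ψ^V(j, m, n), W_ψ^H(j, m, n) and the approximation W_φ(j, m, n). Applying the stage again to the approximation gives the two-level subband layout LL2, HL2, LH2, HH2, HL1, LH1, HH1.)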
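One analysis stage of the separable filter bank, sketched with the Haar filters. The slide does not name the wavelet; Haar is assumed here only because it is the simplest orthonormal choice. Rows are filtered first and columns second (for a separable transform the order does not change the result), and the band names use the assumed convention "first letter = horizontal filter".

```python
import math

S = 1 / math.sqrt(2)   # Haar filter gain (Haar filters assumed)


def analyze_1d(v):
    """Lowpass and highpass Haar filtering, each followed by downsampling by 2."""
    lo = [(v[2 * i] + v[2 * i + 1]) * S for i in range(len(v) // 2)]
    hi = [(v[2 * i] - v[2 * i + 1]) * S for i in range(len(v) // 2)]
    return lo, hi


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def haar_dwt2(img):
    """One 2-D analysis stage; returns (LL, HL, LH, HH). Assumes even dimensions."""
    row_lo, row_hi = zip(*(analyze_1d(row) for row in img))   # horizontal filtering

    def along_columns(half):                                   # vertical filtering
        lo, hi = zip(*(analyze_1d(col) for col in transpose(half)))
        return transpose(lo), transpose(hi)

    ll, lh = along_columns(row_lo)
    hl, hh = along_columns(row_hi)
    return ll, hl, lh, hh
```

Calling `haar_dwt2` again on the LL band produces the second level (LL2, HL2, LH2, HH2).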
Quantization
Luminance quantization matrix
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
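Quantization divides each DCT coefficient by the matching entry of the luminance matrix above and rounds; the decoder multiplies back. A minimal sketch:

```python
Q_LUM = [  # luminance quantization matrix from the slide
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(F, Q=Q_LUM):
    """Fq[u][v] = round(F[u][v] / Q[u][v]); Python's round() sends exact .5 ties to even."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]


def dequantize(Fq, Q=Q_LUM):
    """Decoder side: multiply back by the step size."""
    return [[Fq[u][v] * Q[u][v] for v in range(8)] for u in range(8)]
```

The reconstruction error of each coefficient is at most half its step size, which is why the large high-frequency steps discard the most information.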
Entropy Coding Algorithms
1. Huffman coding
◦ Differential coding (for the DC coefficients)
◦ Zero run-length coding (for the AC coefficients)
2. Arithmetic coding
3. Golomb coding
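Huffman Coding
Huffman coding is the most popular technique for removing coding redundancy. Its properties:
◦ Unique prefix property
◦ Instantaneous decoding property
◦ Optimality (minimum average code length among symbol-by-symbol prefix codes)
JPEG typically uses fixed Huffman tables, which are not optimal for a particular image.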
Symbol   Probability   Code
a2       0.4           1
a6       0.3           00
a1       0.1           011
a4       0.1           0100
a3       0.06          01010
a5       0.04          01011
Source reductions (the two least probable entries are merged at each step):
1: 0.4, 0.3, 0.1, 0.1, 0.1
2: 0.4, 0.3, 0.2, 0.1
3: 0.4, 0.3, 0.3
4: 0.6, 0.4
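The source-reduction procedure can be sketched with a priority queue: repeatedly merge the two least probable groups, prefixing "1" to one group's codewords and "0" to the other's. Tie-breaking may produce different codewords (or even different lengths for equally probable symbols) than the table, but every Huffman code has the same minimum average length, 2.2 bits/symbol for this source.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a binary Huffman code {symbol: codeword} from {symbol: probability}."""
    if len(probs) == 1:
        return {s: "0" for s in probs}
    tick = count()  # tie-breaker so the heap never compares the dicts
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)    # least probable group
        p2, _, high = heapq.heappop(heap)   # second least probable group
        merged = {s: "1" + w for s, w in low.items()}
        merged.update({s: "0" + w for s, w in high.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]
```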
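Difference Coding
Used for the DC coefficients.
The DC coefficient of a block is very close to those of its neighboring blocks and is usually much larger than the AC coefficients, so only the difference from the previous block's DC coefficient is encoded.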
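A minimal sketch of differential (DPCM) coding of the DC coefficients. The predictor starts at 0, as at the start of a JPEG scan; the DC values in the usage line are hypothetical.

```python
def dc_differences(dc_values, predictor=0):
    """Differential coding of DC coefficients: DIFF_i = DC_i - DC_(i-1)."""
    diffs, prev = [], predictor    # the first block is predicted from `predictor`
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def dc_reconstruct(diffs, predictor=0):
    """Decoder side: accumulate the differences."""
    out, prev = [], predictor
    for d in diffs:
        prev += d
        out.append(prev)
    return out


diffs = dc_differences([-26, -24, -25, -20])   # hypothetical DC values of four blocks
```

The differences (-26, 2, -1, 5) are much smaller than the DC values themselves after the first block, so they need shorter codes.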
Zero Run Length Coding
Each nonzero value is encoded together with the number of consecutive zeros in front of it.
EOB (End of Block) = (0,0)
The run length is stored in only 4 bits, so it can be at most 15.
[57,45,0,0,0,0,23,0,-30,-16,0,……,0]
⇒[(0,57)(0,45)(4,23)(1,-30)(0,-16)EOB]
"Eighteen zeros, 3" ⇒ (15,0); (2,3)
where (15,0) represents 16 consecutive zeros.
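A sketch of the zero run-length step for the 63 AC coefficients of one block (already in zigzag order). Runs longer than 15 are split with (15,0), and trailing zeros are replaced by EOB; when the last coefficient is nonzero, no EOB is needed.

```python
def run_length_ac(ac):
    """Encode AC coefficients as (run, value) pairs with ZRL = (15, 0) and EOB = (0, 0)."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
            continue
        while run > 15:            # the run field has only 4 bits
            pairs.append((15, 0))  # (15, 0) stands for 16 zeros
            run -= 16
        pairs.append((run, v))
        run = 0
    if run > 0:                    # the block ends in zeros
        pairs.append((0, 0))
    return pairs
```

Applied to the example block it returns [(0,57), (0,45), (4,23), (1,-30), (0,-16), (0,0)], and eighteen zeros followed by 3 give [(15,0), (2,3)].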
JPEG luminance AC Huffman codes (excerpt):
run/category   code length   code word
0/0 (EOB)      4             1010
15/0 (ZRL)     11            11111111001
0/1            2             00
...
0/6            7             1111000
...
0/10           16            1111111110000011
1/1            4             1100
1/2            5             11011
...
1/10           16            1111111110001000
2/1            5             11100
...
4/5            16            1111111110011000
...
15/10          16            1111111111111110

Category   Values                           Bits for the value
1          -1, 1                            0, 1
2          -3, -2, 2, 3                     00, 01, 10, 11
3          -7, -6, -5, -4, 4, 5, 6, 7       000, 001, 010, 011, 100, 101, 110, 111
4          -15, ..., -8, 8, ..., 15         0000, ..., 0111, 1000, ..., 1111
5          -31, ..., -16, 16, ..., 31       00000, ..., 01111, 10000, ..., 11111
6          -63, ..., -32, 32, ..., 63       000000, ..., 011111, 100000, ..., 111111
7          -127, ..., -64, 64, ..., 127     0000000, ..., 0111111, 1000000, ..., 1111111
8          -255, ..., -128, 128, ..., 255   ...
9          -511, ..., -256, 256, ..., 511   ...
10         -1023, ..., -512, 512, ..., 1023 ...
11         -2047, ..., -1024, 1024, ..., 2047 ...
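A sketch of how a (run, value) pair becomes bits: the category is the number of bits of |value|, the Huffman code is looked up by run/category, and the value bits follow. Negative values use v + 2^category − 1 (one's complement), which reproduces the "Bits for the value" column. `AC_CODES` holds only the excerpt shown above, so other pairs raise `KeyError`.

```python
AC_CODES = {  # excerpt of the luminance AC table above; not the full table
    (0, 0): "1010", (15, 0): "11111111001", (0, 1): "00", (0, 6): "1111000",
    (1, 1): "1100", (1, 2): "11011", (2, 1): "11100",
}


def category(v):
    """Number of bits needed for |v| (0 for v == 0)."""
    return abs(v).bit_length()


def value_bits(v):
    """Extra bits: v itself if positive, v + 2**cat - 1 if negative."""
    cat = category(v)
    if cat == 0:
        return ""
    code = v if v > 0 else v + (1 << cat) - 1
    return format(code, f"0{cat}b")


def encode_ac(run, value):
    """Huffman code of (run, category) followed by the value bits."""
    return AC_CODES[(run, category(value))] + value_bits(value)
```

For instance, (0, 57) has category 6, so it is sent as 1111000 followed by 111001.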
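Arithmetic Coding
Arithmetic coding is another coding method widely used in image and video compression, and its performance is usually better than that of Huffman coding.
The whole input sequence is treated as a single symbol, and one codeword is found for it.
Huffman coding spends at least 1 bit on every symbol, which is wasteful when a symbol's probability is very close to 1.0: its ideal code length, $\log_2 \frac{1}{p_i}$, is then far below 1 bit.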
Arithmetic Coding Algorithm
Input symbol: S
PreviousLow: lower bound of the old interval
PreviousHigh: upper bound of the old interval
Range = PreviousHigh - PreviousLow
Let PreviousLow = 0, PreviousHigh = 1 (so Range = 1)
WHILE (input symbol != EOF)
    get input symbol S
    Range = PreviousHigh - PreviousLow
    NewLow = PreviousLow + Range × IntervalLow(S)
    NewHigh = PreviousLow + Range × IntervalHigh(S)
    PreviousLow = NewLow; PreviousHigh = NewHigh
END
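The loop above translates almost line for line into Python. This sketch uses exact rational arithmetic (`fractions.Fraction`) so the interval bounds match hand calculation; practical coders use finite-precision integer ranges with renormalization instead. It assumes the symbol "?" marks the end of the message, and the message "lluure?" is only an illustrative input.

```python
from fractions import Fraction as Fr

# Sub-intervals from the example table; "?" is assumed to mark the end of the message.
INTERVALS = {
    "k": (Fr("0.00"), Fr("0.05")), "l": (Fr("0.05"), Fr("0.25")),
    "u": (Fr("0.25"), Fr("0.35")), "w": (Fr("0.35"), Fr("0.40")),
    "e": (Fr("0.40"), Fr("0.70")), "r": (Fr("0.70"), Fr("0.90")),
    "?": (Fr("0.90"), Fr("1.00")),
}


def arithmetic_encode(message, intervals=INTERVALS):
    """Return the final [low, high) interval; any number inside it identifies the message."""
    low, high = Fr(0), Fr(1)
    for s in message:
        rng = high - low
        s_low, s_high = intervals[s]
        # both new bounds are computed from the old low
        low, high = low + rng * s_low, low + rng * s_high
    return low, high
```

Encoding "lluure?" ends in [0.0713336, 0.071336), and any number in that interval, such as 0.071334, can be sent as the codeword.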
Symbol   Probability   Sub-interval
k        0.05          [0.00, 0.05)
l        0.2           [0.05, 0.25)
u        0.1           [0.25, 0.35)
w        0.05          [0.35, 0.40)
e        0.3           [0.40, 0.70)
r        0.2           [0.70, 0.90)
?        0.1           [0.90, 1.00)
(PL = PreviousLow, PH = PreviousHigh, R = Range, NPL and NPH = the new PreviousLow and PreviousHigh.)
First symbol l: NPL = 0 + 1 × 0.05 = 0.05, NPH = 0 + 1 × 0.25 = 0.25
Second symbol l:
1. PL = 0.05
2. PH = 0.25
3. R = 0.2
4. NPL = 0.05 + 0.2 × 0.05 = 0.06
5. NPH = 0.05 + 0.2 × 0.25 = 0.1
(Figure: the nested intervals after each symbol, with lower bounds 0, 0.05, 0.06, 0.070, 0.0710, 0.07128, 0.07132, 0.0713336.)
Decoding: the received value 0.071334 lies in l's sub-interval [0.05, 0.25) of the table above, so the first symbol is l.
Subdividing [0.05, 0.25) in the same proportions:
Symbol   Probability   Sub-interval
k        0.05          [0.05, 0.06)
l        0.2           [0.06, 0.10)
u        0.1           [0.10, 0.12)
w        0.05          [0.12, 0.13)
e        0.3           [0.13, 0.19)
r        0.2           [0.19, 0.23)
?        0.1           [0.23, 0.25)
0.071334 lies in [0.06, 0.10), so the second symbol is also l.
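Decoding repeats the lookup shown above: find the sub-interval that contains the value, output its symbol, then rescale the value back into [0, 1) so the same table can be reused. This sketch again uses exact fractions and assumes "?" ends the message.

```python
from fractions import Fraction as Fr

INTERVALS = {  # same table as the encoder; "?" is assumed to terminate the message
    "k": (Fr("0.00"), Fr("0.05")), "l": (Fr("0.05"), Fr("0.25")),
    "u": (Fr("0.25"), Fr("0.35")), "w": (Fr("0.35"), Fr("0.40")),
    "e": (Fr("0.40"), Fr("0.70")), "r": (Fr("0.70"), Fr("0.90")),
    "?": (Fr("0.90"), Fr("1.00")),
}


def arithmetic_decode(value, intervals=INTERVALS, eof="?"):
    """Decode a codeword (a number in [0, 1)) symbol by symbol until `eof`."""
    value = Fr(value)
    out = []
    while True:
        for s, (lo, hi) in intervals.items():
            if lo <= value < hi:
                break
        else:
            raise ValueError("value outside [0, 1)")
        out.append(s)
        if s == eof:
            return "".join(out)
        value = (value - lo) / (hi - lo)   # rescale into [0, 1)
```

Rescaling 0.071334 out of [0.05, 0.25) gives 0.10667, which again falls in l's interval; continuing this way yields "lluure?".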
Golomb Coding
Golomb coding is a special case of Huffman coding.
It is optimal for data with a geometric distribution:
$$\Pr(y = a) = (1-p)\,p^{a}, \quad a = 0, 1, 2, \ldots$$
No code table is needed.
1. First, determine m from p: $m = \left\lceil -\dfrac{\log(1+p)}{\log p} \right\rceil$
2. Divide a by m: a = q×m + r, with 0 ≤ r < m.
3. Convert q into the prefix. The prefix is composed of q "1" bits followed by a "0" bit.
4. Convert r into the suffix using the binary code, with threshold parameter $\theta(m) = 2^{\lceil \log_2 m \rceil} - m$:
◦ If r < θ(m), the suffix is r written with $\lfloor \log_2 m \rfloor$ bits.
◦ If r ≥ θ(m), r is updated to r + θ(m) and encoded into a $\lceil \log_2 m \rceil$-bit suffix.
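Steps 1 to 4 as a Python sketch. ⌈log₂ m⌉ is computed as `(m - 1).bit_length()` to avoid floating-point rounding, and m = 1 (no suffix at all) is handled separately.

```python
import math


def golomb_m(p):
    """Step 1: m = ceil(-log(1 + p) / log(p)) for Pr(y = a) = (1 - p) p**a."""
    return math.ceil(-math.log(1 + p) / math.log(p))


def golomb_encode(a, m):
    """Steps 2 to 4: unary prefix for q, truncated-binary suffix for r."""
    q, r = divmod(a, m)                 # a = q*m + r, 0 <= r < m
    prefix = "1" * q + "0"              # q ones followed by a zero
    k = (m - 1).bit_length()            # ceil(log2 m) for m >= 1
    theta = (1 << k) - m                # threshold 2**ceil(log2 m) - m
    if k == 0:                          # m == 1: unary code only
        suffix = ""
    elif r < theta:                     # short suffix: floor(log2 m) bits
        suffix = format(r, f"0{k - 1}b")
    else:                               # long suffix: ceil(log2 m) bits
        suffix = format(r + theta, f"0{k}b")
    return prefix + suffix
```

With p = 0.93 this gives m = 10, and `golomb_encode(19, 10)` returns "101111", matching the example.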
Example
◦ p = 0.93, m = 10, a = 19
q = 1, r = 9
Prefix = "10"
Threshold parameter $\theta(m) = 2^{\lceil \log_2 m \rceil} - m = 16 - 10 = 6$
r ≥ threshold
⇒ r = r + threshold = 9 + 6 = 15
⇒ encode 15 into a $\lceil \log_2 m \rceil$-bit suffix, $\lceil \log_2 m \rceil = 4$
⇒ Suffix = "1111"
Code = "101111"
Decode "101111" (m = 10)
Encoding of the quotient part:
q    output bits
0    0
1    10
2    110
3    1110
4    11110
5    111110
6    1111110
:    :
N    <N repetitions of 1>0
Encoding of the remainder part:
r    offset   binary   output bits
0    0        0000     000
1    1        0001     001
2    2        0010     010
3    3        0011     011
4    4        0100     100
5    5        0101     101
6    12       1100     1100
7    13       1101     1101
8    14       1110     1110
9    15       1111     1111
The prefix "10" gives q = 1. The next three bits "111" = 7 are not below the threshold 6, so one more bit is read: "1111" = 15, and r = 15 − 6 = 9.
⇒ a = 10 × 1 + 9 = 19
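The decoding procedure as a sketch: count the leading ones, skip the zero, read ⌊log₂ m⌋ bits, and read one extra bit only if that value is not below the threshold. It returns the decoded value and the number of bits consumed, so codewords can be decoded back to back.

```python
def golomb_decode(bits, m):
    """Decode one Golomb codeword at the start of `bits`; returns (a, bits_used)."""
    q = 0
    while bits[q] == "1":               # unary prefix
        q += 1
    pos = q + 1                         # skip the terminating "0"
    k = (m - 1).bit_length()            # ceil(log2 m)
    theta = (1 << k) - m                # threshold
    if k == 0:                          # m == 1: no suffix
        return q, pos
    x = int(bits[pos:pos + k - 1], 2) if k > 1 else 0
    if x < theta:                       # short suffix of floor(log2 m) bits
        return q * m + x, pos + k - 1
    x = int(bits[pos:pos + k], 2)       # long suffix of ceil(log2 m) bits
    return q * m + x - theta, pos + k
```

`golomb_decode("101111", 10)` gives (19, 6): q = 1, then "1111" = 15 and r = 15 − 6 = 9.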
However, Golomb coding achieves optimal coding efficiency only when the data is geometrically distributed.
To address this, the adaptive Golomb code lets the parameter p depend on the context x:
Pr(y = a) = (1 − p)p^a ⇒ Pr(y = a) = (1 − p(x))p(x)^a
Method            Without codeword table   Flexibility and adaptation
Huffman           NO                       GOOD
Golomb            YES                      MIDDLE
Adaptive Golomb   YES                      GOOD
Proposed Method for Arbitrary-Shape Image Segment Compression
An arbitrary-shape image segment f and its shape matrix.
Segment f ("-" marks pixels outside the segment):
-    -    -    -    75   96   -    -
105  98   99   101  73   85   66   60
-    100  97   89   94   87   64   55
-    -    84   94   90   81   71   66
-    -    93   86   94   81   70   -
-    -    -    86   86   81   72   -
-    -    -    98   97   78   -    -
-    -    -    105  104  -    -    -
Shape matrix:
0 0 0 0 1 1 0 0
1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1
0 0 1 1 1 1 1 0
0 0 0 1 1 1 1 0
0 0 0 1 1 1 0 0
0 0 0 1 1 0 0 0
(Figure: the standard 8×8 DCT bases restricted to the shape of f.)
The 37 arbitrary-shape orthonormal DCT bases obtained by the Gram-Schmidt process
(Figure: the 37 resulting basis images, numbered 1 to 37.)
$$F(k) = \sum_{x=0}^{W-1}\sum_{y=0}^{H-1} f(x,y)\,\varphi'_{x,y}(k), \quad k = 1, 2, \ldots, M$$
where φ′_{x,y}(k) is the k-th orthonormal basis and M is the number of pixels in the segment (M = 37 here).
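A sketch of building the arbitrary-shape bases: restrict each standard 8×8 DCT basis to the pixels of the shape, orthonormalize with Gram-Schmidt, and drop candidates that are already spanned, until M bases remain. The candidate order (low frequency first, by p + q) is an assumption; the slide does not state it. A second Gram-Schmidt pass is used for numerical orthogonality.

```python
import math


def dct_basis(p, q, n=8):
    """Standard orthonormal n x n DCT-II basis image C_{p,q}[x][y]."""
    def a(k):
        return math.sqrt((1 if k == 0 else 2) / n)
    return [[a(p) * a(q) * math.cos((2 * x + 1) * p * math.pi / (2 * n))
             * math.cos((2 * y + 1) * q * math.pi / (2 * n)) for y in range(n)]
            for x in range(n)]


def shape_adaptive_bases(shape, n=8, tol=1e-6):
    """Gram-Schmidt on the DCT bases restricted to the pixels where shape == 1."""
    pixels = [(x, y) for x in range(n) for y in range(n) if shape[x][y]]
    # assumed candidate order: low frequency first
    order = sorted(((p, q) for p in range(n) for q in range(n)),
                   key=lambda pq: (pq[0] + pq[1], pq[0]))
    bases = []
    for p, q in order:
        if len(bases) == len(pixels):
            break
        c = dct_basis(p, q, n)
        v = [c[x][y] for x, y in pixels]
        for _ in range(2):                   # second pass re-orthogonalizes
            for b in bases:
                dot = sum(vi * bi for vi, bi in zip(v, b))
                v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:                       # skip candidates already spanned
            bases.append([vi / norm for vi in v])
    return pixels, bases


def forward(values, bases):
    """F(k): inner product of the segment values with basis k."""
    return [sum(f * b for f, b in zip(values, basis)) for basis in bases]
```

For the shape matrix above this yields 37 orthonormal bases, and summing F(k) times basis k reconstructs the segment exactly.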
Quantization
$$Q(k) = Q_a k + Q_c, \quad k = 1, 2, \ldots, M$$
$$F_q(k) = \mathrm{Round}\left(\frac{F(k)}{Q(k)}\right), \quad k = 1, 2, \ldots, M$$
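A sketch of this linearly increasing quantization step. The slide gives no values for Qa and Qc, so the ones in the usage line (Qa = 1, Qc = 10) are hypothetical.

```python
def quantize_segment(F, qa, qc):
    """Fq(k) = Round(F(k) / Q(k)) with Q(k) = qa * k + qc and k = 1, ..., M."""
    return [round(Fk / (qa * k + qc)) for k, Fk in enumerate(F, start=1)]


def dequantize_segment(Fq, qa, qc):
    """Decoder side: F'(k) = Fq(k) * Q(k)."""
    return [v * (qa * k + qc) for k, v in enumerate(Fq, start=1)]


# qa = 1 and qc = 10 are hypothetical values chosen only for illustration
Fq = quantize_segment([100.0, -40.0, 9.0], qa=1, qc=10)
```

Because Q(k) grows with k, later (higher-frequency) bases are quantized more coarsely, just as the JPEG matrix does for high frequencies.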
Improvement of the Boundary Region by Morphology
JPEG2000
JPEG 2000 is a newer standard than JPEG and achieves better image compression performance.
Advantages
◦ Efficient lossy and lossless compression
◦ Superior image quality
◦ Additional features such as spatial scalability and region of interest
Disadvantage
◦ Higher complexity
JPEG 2000 encoder and decoder
(Figure: encoder: original image → forward component transform → forward 2D DWT → quantization → Tier-1 encoder → Tier-2 encoder → coded image, with a rate-control block. Decoder: coded image → Tier-2 decoder → Tier-1 decoder → dequantization → inverse 2D DWT → inverse component transform → reconstructed image.)
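Irreversible component transform (ICT)
Irreversible and real-to-real:
$$\begin{bmatrix} V_0(x,y) \\ V_1(x,y) \\ V_2(x,y) \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.16875 & -0.33126 & 0.5 \\ 0.5 & -0.41869 & -0.08131 \end{bmatrix} \begin{bmatrix} U_0(x,y) \\ U_1(x,y) \\ U_2(x,y) \end{bmatrix}$$
Reversible component transform (RCT), integer-to-integer:
$$V_0(x,y) = \left\lfloor \frac{U_0(x,y) + 2U_1(x,y) + U_2(x,y)}{4} \right\rfloor$$
$$V_1(x,y) = U_2(x,y) - U_1(x,y)$$
$$V_2(x,y) = U_0(x,y) - U_1(x,y)$$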
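A sketch of both component transforms for one pixel (U0, U1, U2) = (R, G, B). The RCT inverse recovers G first because V0 = G + ⌊(V1 + V2)/4⌋; Python's `//` is floor division, so the floors also hold for the negative, DC-level-shifted values JPEG 2000 feeds in.

```python
def rct_forward(r, g, b):
    """Reversible component transform: integers in, integers out."""
    v0 = (r + 2 * g + b) // 4
    v1 = b - g
    v2 = r - g
    return v0, v1, v2


def rct_inverse(v0, v1, v2):
    """Exact inverse of the RCT."""
    g = v0 - (v1 + v2) // 4
    return v2 + g, g, v1 + g


def ict_forward(r, g, b):
    """Irreversible component transform (real-valued), coefficients from the matrix."""
    return (0.299 * r + 0.587 * g + 0.114 * b,
            -0.16875 * r - 0.33126 * g + 0.5 * b,
            0.5 * r - 0.41869 * g - 0.08131 * b)
```

The RCT round-trips every integer pixel exactly, which is what makes lossless JPEG 2000 possible; the ICT decorrelates slightly better but is only approximately invertible.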
Tiling and DWT in each tile
(Figure: each image component is divided into tiles, and the DWT is applied to each tile separately.)
(Figure: the one-stage 2-D DWT analysis filter bank, producing W^D(j, m, n), W^V(j, m, n), W^H(j, m, n), and W(j, m, n) from W(j+1, m, n), applied within each tile.)
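The slide does not name the wavelet filters. JPEG 2000 Part 1 uses the reversible LeGall 5/3 wavelet (and the irreversible 9/7), so the sketch below shows one 1-D level of the 5/3 transform in lifting form; the 2-D transform applies it to rows and then columns. Whole-sample symmetric extension at the borders and an even input length are assumed.

```python
def lift53_forward(x):
    """One level of the reversible 5/3 DWT; returns (lowpass s, highpass d)."""
    n, half = len(x), len(x) // 2
    d = []
    for i in range(half):   # predict step: d[i] = x[2i+1] - floor((x[2i] + x[2i+2]) / 2)
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]   # mirror x[n] -> x[n-2]
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    s = []
    for i in range(half):   # update step: s[i] = x[2i] + floor((d[i-1] + d[i] + 2) / 4)
        left = d[i - 1] if i > 0 else d[0]                    # mirror d[-1] -> d[0]
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d


def lift53_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x
```

Because each lifting step is undone exactly with the same integer rounding, the transform is lossless, which is why it is paired with the RCT for reversible coding.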
Tier-1 Encoder
(Figure: quantized DWT coefficients → bit-plane conversion → fractional bit-plane coding, which forms context and decision pairs → arithmetic coding.)
Bit-plane conversion example (each coefficient is written in binary, and the bits are split into planes from the MSB down to the LSB):
17   22   33   48   64   80   96   112
22   28   38   52   67   81   96   112
33   38   48   62   75   86   100  116
48   52   62   70   83   96   110  125
64   67   75   83   96   108  118  132
80   81   86   96   108  117  128  142
96   96   100  110  118  128  140  150
112  112  116  125  132  142  150  160
For example, 17 = (00010001)₂ and 160 = (10100000)₂.
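Bit-plane conversion in a few lines. This sketch assumes non-negative magnitudes, as in the example block (in JPEG 2000 the sign is coded separately), and returns the planes from the MSB to the LSB.

```python
def bit_planes(block, nbits=8):
    """Split non-negative integer magnitudes into bit-planes, MSB plane first."""
    return [[[(v >> b) & 1 for v in row] for row in block]
            for b in range(nbits - 1, -1, -1)]
```

For the value 17 the planes read 0, 0, 0, 1, 0, 0, 0, 1 from MSB to LSB; for 160 they read 1, 0, 1, 0, 0, 0, 0, 0.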
Stripe and Scan Order
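Zero Coding
d v d
h D h
d v d
D: the current bit being encoded (binary: 0 or 1)
h: number of significant horizontal neighbors (0~2)
v: number of significant vertical neighbors (0~2)
d: number of significant diagonal neighbors (0~4)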
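A sketch of gathering the zero-coding neighborhood counts h, v and d from a significance map. Neighbors outside the code-block are treated as insignificant (an assumption of this sketch). The further mapping from (h, v, d) to one of the context labels depends on the subband and is not reproduced here.

```python
def neighbor_counts(sig, x, y):
    """Return (h, v, d): significant horizontal, vertical and diagonal neighbors.

    `sig` is a 2-D list of 0/1 significance states, x is the row and y the column;
    out-of-range neighbors count as insignificant.
    """
    rows, cols = len(sig), len(sig[0])

    def s(i, j):
        return sig[i][j] if 0 <= i < rows and 0 <= j < cols else 0

    h = s(x, y - 1) + s(x, y + 1)
    v = s(x - 1, y) + s(x + 1, y)
    d = s(x - 1, y - 1) + s(x - 1, y + 1) + s(x + 1, y - 1) + s(x + 1, y + 1)
    return h, v, d
```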
Sign Coding
  v
h D h
  v
h, v: neighborhood sign status
◦ -1: one or both neighbors negative
◦ 0: both insignificant, or both significant but with opposite signs
◦ +1: one or both neighbors positive
D = X ⊗ X̂, where X is the sign bit, X̂ is the sign bit predicted from (h, v), and ⊗ = XOR
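The three-valued h and v contributions can be computed by clipping the sum of the two neighbors' states to [−1, 1], where each neighbor is +1 (significant, positive), −1 (significant, negative), or 0 (insignificant). This sketch covers only that rule and the XOR decision; the table that maps (h, v) to a context and a predicted sign bit is not given on the slide, so the predicted bit is taken as an input here.

```python
def contribution(a, b):
    """Combine two neighbor states (+1, -1 or 0) into the -1 / 0 / +1 status."""
    return max(-1, min(1, a + b))


def sign_decision(sign_bit, predicted_bit):
    """D = sign bit XOR predicted sign bit (the prediction table is not shown)."""
    return sign_bit ^ predicted_bit
```

For example, h = contribution(left, right): one positive neighbor and one insignificant one gives +1, while one positive and one negative neighbor cancel to 0.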
Magnitude Refinement Coding
σ′[x,y] is initialized to 0 and becomes 1 after magnitude refinement coding has been applied at [x,y] for the first time.
Run-Length Coding
If all four samples of a stripe column are zero, (CX,D) is (0,0).
Otherwise, (0,1) is coded, followed by two UNIFORM decisions (CX=18) that record the position of the first 1.
◦ Example: (0110)
◦ The first nonzero position is (01)₂
⇒ (0,1), (18,0), (18,1)
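A sketch of the run-length decisions for one stripe column of four bits, using the context labels exactly as written on the slide (0 for the run-length decision and 18 for UNIFORM). Only the run-length decisions are shown; the coding of the samples after the first 1 is not.

```python
RL_CX, UNIFORM_CX = 0, 18   # context labels as written on the slide


def run_length_decisions(column):
    """(CX, D) decisions for one stripe column of four bits in run-length mode."""
    if not any(column):
        return [(RL_CX, 0)]                  # four zeros: a single decision
    first = column.index(1)                  # position of the first 1, 0..3
    # two UNIFORM decisions send the position, most significant bit first
    return [(RL_CX, 1), (UNIFORM_CX, first >> 1), (UNIFORM_CX, first & 1)]
```

For the column (0110) the first 1 is at position (01)₂, giving (0,1), (18,0), (18,1).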
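(Figure: each (CX, D) pair, with decision D ∈ {0, 1} and context CX taking 19 values in total, is fed to the arithmetic encoder, which outputs the compressed data.)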
Why Called Fractional?
Tier-2 Encoder
Rate/Distortion optimized truncation
Triangular and Trapezoid Regions and Modified JPEG Image Compression
Divide an image into 3 parts:
1. Lower frequency regions
2. Traditional image blocks
3. Arbitrarily-shaped image blocks
(Figure: a binary region, the number of sections in each row, and the resulting zones.)
Row                   Sections   Zone
1 1 1 1 1 1 1 1 1 0   1          Zone 1
0 1 1 1 1 1 1 1 1 1   1          Zone 1
0 1 1 1 1 1 1 1 1 0   1          Zone 1
0 1 1 1 1 1 1 1 0 0   1          Zone 1
0 0 1 1 0 1 1 1 1 0   2          Zone 2
0 0 1 0 0 1 1 1 0 0   2          Zone 2
0 0 0 0 0 0 1 1 0 0   1          Zone 3
0 0 0 0 0 0 1 1 0 0   1          Zone 3
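A sketch that reproduces the section counts and zones of the figure. A section is taken to be a maximal run of consecutive 1s in a row, and the zone rule (consecutive rows with the same number of sections form one zone) is inferred from the figure; both are assumptions rather than the method's stated definition.

```python
def count_sections(row):
    """Number of maximal runs of consecutive 1s in a binary row."""
    return sum(1 for i, v in enumerate(row) if v == 1 and (i == 0 or row[i - 1] == 0))


def group_zones(mask):
    """Group consecutive rows with equal section counts: (first_row, last_row, sections).

    The grouping rule is an assumption inferred from the figure.
    """
    zones = []
    for i, row in enumerate(mask):
        n = count_sections(row)
        if zones and zones[-1][2] == n:
            zones[-1][1] = i
        else:
            zones.append([i, i, n])
    return [tuple(z) for z in zones]
```

On the region above this gives zones covering rows 0 to 3 (1 section), rows 4 to 5 (2 sections), and rows 6 to 7 (1 section).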
α-distance
Corner too close
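(Figure: a region with N = 10 columns and rows numbered from the 0th row at the bottom to the (M-1)th row at the top.)
N = K(m) + K(M-1-m), where K(m) is the number of samples in the m-th row.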
(a) 1. Construct the rectangular region (rows m = 0, 1, ..., M-1; columns n = 0, 1, 2, ...) and obtain the orthonormal DCT basis C_{p,q}[m, n].
(b) Region A and Region B together form the rectangular region; one is the other rotated by 180°.
2. Select the DCT bases C_{p,q}[m, n] that satisfy p+q = even.
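Why p + q = even works can be checked numerically. A DCT-II basis satisfies C_{p,q}[M−1−m, N−1−n] = (−1)^(p+q) C_{p,q}[m, n], so bases with p + q even are unchanged by a 180° rotation. If a region A and its rotated copy tile the rectangle, the inner product of two such bases over A is half their inner product over the rectangle, so after scaling by √2 they are orthonormal on A, and there are exactly |A| of them. The region used here (the first K(m) samples of row m, with K = [7, 6, 4, 3] and N = 10) is a hypothetical example satisfying N = K(m) + K(M−1−m).

```python
import math


def dct_basis(p, q, M, N):
    """Orthonormal M x N DCT-II basis C_{p,q}[m][n]."""
    ap = math.sqrt((1 if p == 0 else 2) / M)
    aq = math.sqrt((1 if q == 0 else 2) / N)
    return [[ap * aq * math.cos((2 * m + 1) * p * math.pi / (2 * M))
             * math.cos((2 * n + 1) * q * math.pi / (2 * N)) for n in range(N)]
            for m in range(M)]


def even_bases_on_region(K, N):
    """Restrict the p+q-even bases to region A = first K[m] samples of row m (assumed shape)."""
    M = len(K)
    assert all(K[m] + K[M - 1 - m] == N for m in range(M))   # A and its rotation tile the rectangle
    region = [(m, n) for m in range(M) for n in range(K[m])]
    bases = []
    for p in range(M):
        for q in range(N):
            if (p + q) % 2 == 0:
                c = dct_basis(p, q, M, N)
                # each even basis keeps half its energy on A, so rescale by sqrt(2)
                bases.append([math.sqrt(2) * c[m][n] for m, n in region])
    return region, bases


region, bases = even_bases_on_region([7, 6, 4, 3], 10)   # hypothetical trapezoid-like region
```

With M = 4 and N = 10 there are 20 pairs with p + q even and 20 samples in A, so the selected bases form a complete orthonormal basis of the region without any Gram-Schmidt step.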