Signal Processing
journal homepage: www.elsevier.com/locate/sigpro
Article info

Article history:
Received 22 October 2013
Received in revised form 21 December 2013
Accepted 24 December 2013
Available online 1 January 2014

Keywords:
Approximate DCT
Image compression
Integer functions
Low-complexity algorithms

Abstract

The discrete cosine transform (DCT) is a central mathematical operation in several digital signal processing methods and image/video standards. In this paper, we propose a collection of twelve approximations for the 8-point DCT based on integer functions. The considered functions include the floor, ceiling, truncation, and rounding-off functions. The sought approximations are required to meet the following criteria: (i) very low arithmetic complexity, (ii) orthogonality or quasi-orthogonality, and (iii) low-complexity inversion. By varying a scaling parameter, approximations could be systematically obtained, and several existing approximations were identified as particular cases of the proposed methodology. These particular cases include the signed DCT and the rounded DCT. Four new quasi-orthogonal approximations were introduced and their practical relevance was demonstrated. All approximations were given fast algorithms based on matrix factorization methods. The proposed approximations are multiplierless: their computation requires only additions and bit-shifting operations. Additive complexity ranged from 18 to 24 additions. The obtained approximations were compared with the exact DCT and assessed in the context of JPEG-like image compression. As quality assessment measures, we considered the peak signal-to-noise ratio and the structural similarity index. Because of their low complexity and good performance, the proposed approximations are suitable for hardware implementation in dedicated architectures.

© 2013 Elsevier B.V. All rights reserved.
0165-1684/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sigpro.2013.12.027
202 R.J. Cintra et al. / Signal Processing 99 (2014) 201–214
Wang factorization [30], Lee DCT for power-of-two blocklengths [31], Arai DCT scheme [32], Loeffler algorithm [33], Vetterli–Nussbaumer algorithm [28], Hou algorithm [29], and Feig–Winograd factorization [34]. All these methods are classical results in the field and have been considered for practical applications [18,35,36]. For instance, the Arai DCT scheme was employed in a number of recent hardware implementations of the DCT [37–39].

Naturally, DCT fast algorithms that yield major computational savings compared to direct computation of the DCT were already developed a few decades ago. In fact, intense research in the field has led to methods that are in the vicinity of the theoretical complexity of the exact DCT [8,32,33,40,41]. Thus, the computation of the exact DCT offers little scope for major further reductions in computational complexity by means of standard methods.

On the other hand, DCT approximations, i.e., operations that closely emulate the DCT, are mathematical tools that can furnish an alternative avenue for DCT evaluation. Indeed, DCT approximations have already been considered in a number of works [2,9,42–44]. Moreover, although usual fast algorithms can reduce the computational complexity significantly, they still need floating-point operations [2]. In contrast, approximate DCT methods can be tailored to require very low arithmetic complexity and simple number representation systems.

A comprehensive list of approximate methods for the DCT is found in [2]. Prominent techniques include the signed DCT (SDCT) [9], the binDCT [8], the level 1 approximation by Lengwehasatit–Ortega [45], the Bouguezel–Ahmad–Swamy (BAS) series of algorithms [43,44,46–49], the DCT round-off approximation [50], the modified DCT round-off approximation [42], and the multiplier-free DCT approximation for RF imaging [51].

The goal of this paper is two-fold. First, we aim at proposing a systematic procedure for deriving low-complexity approximations for the 8-point DCT. To this end, we consider several types of rounding-off functions applied to the scaled exact 8-point DCT matrix. The entries of the sought approximate DCT matrices are required to possess null multiplicative complexity; only additions and simple bit-shifting operations are allowed. Second, we focus on suggesting practical approximations and assessing them as tools for JPEG-like image compression.

The paper unfolds as follows. In Section 2, we describe the mathematical framework of the DCT and we discuss the polar decomposition method for DCT approximation [52]. In Section 3, we propose a systematic method based on integer functions for obtaining low-complexity matrices useful for generating DCT approximations. The discussed method is based on a computational search over a subset of candidate matrices under the constraints of low computational complexity and orthogonality or near-orthogonality. In Section 4, we provide details of the computational search and list the obtained approximations. Section 5 presents fast algorithms for the obtained approximations. In Section 6, the resulting approximations are subjected to performance assessment in the context of image compression, using image quality measures as figures of merit. In Section 7, we state concluding remarks.

2. Exact and approximate DCT

2.1. Mathematical preliminaries

The N-point DCT is algebraically represented by the $N \times N$ transformation matrix $C_N$ whose elements are given by [1,2]

$$c_{m,n} = \frac{1}{\sqrt{N}}\,\beta_{m-1}\cos\!\left(\frac{\pi (m-1)(2n-1)}{2N}\right),$$

where $m, n = 1, 2, \ldots, N$, $\beta_0 = 1$, and $\beta_k = \sqrt{2}$ for $k \neq 0$. Let $x = [x_0\ x_1\ \cdots\ x_{N-1}]^\top$ be an input vector, where the superscript $\top$ denotes the transposition operation. The one-dimensional (1-D) DCT of $x$ is the N-point vector $X = [X_0\ X_1\ \cdots\ X_{N-1}]^\top$ given by $X = C_N\,x$. Because $C_N$ is an orthogonal matrix, i.e., $C_N^{-1} = C_N^\top$, the inverse transformation can be written as $x = C_N^\top X$. The DCT is mathematically relevant because of its relationship with the KLT. Indeed, if the input signal $x$ is modeled as a first-order Markov process with correlation coefficient $\rho$, then the DCT matrix converges to the KLT matrix as $\rho$ tends to one ($\rho \to 1$) [2, p. 61].

Let $A$ and $B$ be square matrices of size $N$. For a two-dimensional (2-D) signal $A$ with 2-D DCT $B$, the forward and inverse 2-D DCT operations are given, respectively, by

$$B = C_N\,A\,C_N^\top \quad\text{and}\quad A = C_N^\top\,B\,C_N. \tag{1}$$

Although the procedures described in this work can be applied to any blocklength, we focus exclusively on the 8-point DCT. Thus, for simplicity, the 8-point DCT matrix is denoted as $C$ and is given by

$$
C \triangleq C_8 = \frac{1}{2}
\begin{bmatrix}
\gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 \\
\gamma_0 & \gamma_2 & \gamma_4 & \gamma_6 & -\gamma_6 & -\gamma_4 & -\gamma_2 & -\gamma_0 \\
\gamma_1 & \gamma_5 & -\gamma_5 & -\gamma_1 & -\gamma_1 & -\gamma_5 & \gamma_5 & \gamma_1 \\
\gamma_2 & -\gamma_6 & -\gamma_0 & -\gamma_4 & \gamma_4 & \gamma_0 & \gamma_6 & -\gamma_2 \\
\gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 & \gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 \\
\gamma_4 & -\gamma_0 & \gamma_6 & \gamma_2 & -\gamma_2 & -\gamma_6 & \gamma_0 & -\gamma_4 \\
\gamma_5 & -\gamma_1 & \gamma_1 & -\gamma_5 & -\gamma_5 & \gamma_1 & -\gamma_1 & \gamma_5 \\
\gamma_6 & -\gamma_4 & \gamma_2 & -\gamma_0 & \gamma_0 & -\gamma_2 & \gamma_4 & -\gamma_6
\end{bmatrix},
$$

where $\gamma_k = \cos(2\pi(k+1)/32)$, $k = 0, 1, \ldots, 6$. These quantities are algebraic numbers (each $2\gamma_k$ is an algebraic integer) and are explicitly given by [37]

$$
\gamma_0 = \frac{\sqrt{2+\sqrt{2+\sqrt{2}}}}{2} = 0.9808\ldots, \qquad
\gamma_1 = \frac{\sqrt{2+\sqrt{2}}}{2} = 0.9239\ldots,
$$
$$
\gamma_2 = \frac{\sqrt{2+\sqrt{2-\sqrt{2}}}}{2} = 0.8315\ldots, \qquad
\gamma_3 = \frac{\sqrt{2}}{2} = 0.7071\ldots,
$$
$$
\gamma_4 = \frac{\sqrt{2-\sqrt{2-\sqrt{2}}}}{2} = 0.5556\ldots, \qquad
\gamma_5 = \frac{\sqrt{2-\sqrt{2}}}{2} = 0.3827\ldots,
$$
$$
\gamma_6 = \frac{\sqrt{2-\sqrt{2+\sqrt{2}}}}{2} = 0.1951\ldots
$$

Here, we adopt the following terminology. A matrix $A$ is said to be orthogonal if $A\,A^\top$ is a diagonal matrix. In particular, if $A\,A^\top$ is the identity matrix, then $A$ is said to be orthonormal.
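The definitions above can be checked numerically. The following minimal Python sketch (standard library only; the helper names `dct_matrix`, `C8`, `dct2`, and `idct2` are illustrative, not from the paper) builds $C_N$ from the cosine formula using 0-based indices, rebuilds the 8-point matrix from the $\gamma_k$ pattern, and applies the 2-D transform pair of Eq. (1).

```python
import math

def dct_matrix(N):
    """Exact N-point DCT matrix C_N with 0-based indices m, n.

    c[m][n] = beta_m / sqrt(N) * cos(pi * m * (2n + 1) / (2N)),
    where beta_0 = 1 and beta_m = sqrt(2) otherwise.
    """
    return [[(1.0 if m == 0 else math.sqrt(2.0)) / math.sqrt(N)
             * math.cos(math.pi * m * (2 * n + 1) / (2 * N))
             for n in range(N)] for m in range(N)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# gamma_k = cos(2*pi*(k+1)/32), k = 0, ..., 6
GAMMA = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]

# Entry (i, j) of C is SIGN[i][j] * GAMMA[INDEX[i][j]] / 2
INDEX = [[3, 3, 3, 3, 3, 3, 3, 3],
         [0, 2, 4, 6, 6, 4, 2, 0],
         [1, 5, 5, 1, 1, 5, 5, 1],
         [2, 6, 0, 4, 4, 0, 6, 2],
         [3, 3, 3, 3, 3, 3, 3, 3],
         [4, 0, 6, 2, 2, 6, 0, 4],
         [5, 1, 1, 5, 5, 1, 1, 5],
         [6, 4, 2, 0, 0, 2, 4, 6]]
SIGN = ["++++++++", "++++----", "++----++", "+---+++-",
        "+--++--+", "+-++--+-", "+-+--+-+", "+-+-+-+-"]
C8 = [[(1 if s == "+" else -1) * GAMMA[k] / 2 for k, s in zip(INDEX[i], SIGN[i])]
      for i in range(8)]

def dct2(C, A):
    """Forward 2-D transform of Eq. (1): B = C A C^T."""
    return matmul(matmul(C, A), transpose(C))

def idct2(C, B):
    """Inverse 2-D transform of Eq. (1): A = C^T B C."""
    return matmul(matmul(transpose(C), B), C)
```

Because $C$ is orthonormal, `idct2(C, dct2(C, A))` returns `A` up to floating-point rounding. The same $\gamma$ pattern reappears in every integer approximation discussed below.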
In order for the entries of $\operatorname{int}(\alpha\,C)$ to be defined on $\mathcal{C}$, we must restrict the range of $\alpha$. The largest element of $C$ is $\gamma_0/2$. Thus, it is sufficient to solve the following inequality for $\alpha$: $0 \le \operatorname{int}(\alpha\,\gamma_0/2) \le 3$. For the ceiling, floor, truncation, round-away-from-zero, and all nearest integer functions, we obtain the following ranges of $\alpha$: $[0, 6/\gamma_0]$, $[2/\gamma_0, 8/\gamma_0]$, $[2/\gamma_0, 8/\gamma_0]$, $[0, 6/\gamma_0]$, and $[1/\gamma_0, 7/\gamma_0]$, respectively. For computational purposes, we employed $10^6$ uniformly spaced values of $\alpha$ (cf. (4)) over each considered interval.
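The candidate-generation step can be sketched in Python using only the standard library. This is a minimal illustration, not the authors' code. The following choices are assumptions made here: the dictionary of integer functions, reading round-away-from-zero as $\operatorname{sign}(x)\lceil |x| \rceil$, modeling the nearest-integer variants with `floor(x + 0.5)`, `ceil(x - 0.5)` and Python's `round()`, and a much coarser sampling (2001 points rather than $10^6$).

```python
import math

N = 8
# Exact 8-point DCT matrix C (0-based indices)
C = [[(1 if m == 0 else math.sqrt(2)) / math.sqrt(N)
      * math.cos(math.pi * m * (2 * n + 1) / (2 * N)) for n in range(N)]
     for m in range(N)]
GAMMA0 = math.cos(math.pi / 16)  # the largest entry of C is GAMMA0 / 2

# Integer functions (assumed definitions); "away" is sign(x) * ceil(|x|)
INT_FUNCS = {
    "ceiling": math.ceil,
    "floor": math.floor,
    "truncation": math.trunc,
    "away": lambda x: int(math.copysign(math.ceil(abs(x)), x)),
    "half_up": lambda x: math.floor(x + 0.5),
    "half_down": lambda x: math.ceil(x - 0.5),
    "half_even": round,  # Python's round() breaks ties towards the even integer
}

def int_approx(alpha, f):
    """Candidate matrix int(alpha * C) for the integer function f."""
    return [[int(f(alpha * c)) for c in row] for row in C]

def is_orthogonal(T):
    """True if T T^T is diagonal (the paper's definition of orthogonal)."""
    return all(sum(a * b for a, b in zip(T[i], T[j])) == 0
               for i in range(N) for j in range(i + 1, N))

def sweep(f, lo, hi, samples=2001):
    """Distinct matrices int(alpha * C) over uniformly spaced alpha in [lo, hi]."""
    found = {}
    for s in range(samples):
        alpha = lo + (hi - lo) * s / (samples - 1)
        key = tuple(map(tuple, int_approx(alpha, f)))
        found.setdefault(key, alpha)  # remember the first alpha that yields it
    return found

# Example: the truncation interval [2/gamma_0, 8/gamma_0]
for T, alpha in sweep(math.trunc, 2 / GAMMA0, 8 / GAMMA0).items():
    print(f"alpha = {alpha:.4f}  orthogonal = {is_orthogonal(T)}")
```

For instance, `int_approx(3.8, math.trunc)` (with $\alpha$ inside $(2/\gamma_4, 4/\gamma_0)$) reproduces the rounded DCT $T_0$ of Table 2. The interval endpoints behave as stated: for the floor function, the largest entry is 3 just below $\alpha = 8/\gamma_0$ and becomes 4 just above it.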
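Table 1
Ceiling function.

$$
\tilde{T}_0 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0
\end{bmatrix},
\qquad \alpha \in (0,\ 2/\gamma_4), \qquad \text{orthogonal: No}
$$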
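Table 2
Truncation function.

$$
T_0\ [50] =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & -1 & -1 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 0 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
0 & -1 & 1 & -1 & 1 & -1 & 1 & 0
\end{bmatrix},
\qquad \alpha \in (2/\gamma_4,\ 4/\gamma_0), \qquad \text{orthogonal: Yes}
$$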
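$$
T_1 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix},
\qquad \alpha \in (4/\gamma_0,\ 4/\gamma_1), \qquad \text{orthogonal: Yes}
$$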
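$$
T_2 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
2 & 0 & 0 & -2 & -2 & 0 & 0 & 2 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
0 & -2 & 2 & 0 & 0 & 2 & -2 & 0 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix},
\qquad \alpha \in (4/\gamma_1,\ 4/\gamma_2), \qquad \text{orthogonal: Yes}
$$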
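$$
T_3 =
\begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
3 & 2 & 2 & 0 & 0 & -2 & -2 & -3 \\
3 & 1 & -1 & -3 & -3 & -1 & 1 & 3 \\
2 & 0 & -3 & -2 & 2 & 3 & 0 & -2 \\
2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\
2 & -3 & 0 & 2 & -2 & 0 & 3 & -2 \\
1 & -3 & 3 & -1 & -1 & 3 & -3 & 1 \\
0 & -2 & 2 & -3 & 3 & -2 & 2 & 0
\end{bmatrix},
\qquad \alpha \in (4/\gamma_4,\ 6/\gamma_2), \qquad \text{orthogonal: Yes}
$$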
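$$
\tilde{T}_1 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & -1 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 0 & -1 & 0 & 0 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
0 & -1 & 0 & 1 & -1 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 & 1 & -1 & 0 & 0
\end{bmatrix},
\qquad \alpha \in (2/\gamma_3,\ 2/\gamma_4), \qquad \text{orthogonal: No}
$$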
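The orthogonality column of Table 2 can be checked with a short sketch. It takes one $\alpha$ from the middle of each listed interval, truncates $\alpha C$, and tests whether $T\,T^\top$ is diagonal. For orthogonal matrices it also forms an orthonormal approximation $\hat{C} = S\,T$ with $S = \sqrt{(T\,T^\top)^{-1}}$, which is diagonal in that case. Two points are assumptions of this sketch rather than statements from this excerpt: the choice of midpoints, and the use of the polar-decomposition-style normalization of [52] (described on pages not reproduced here).

```python
import math

N = 8
C = [[(1 if m == 0 else math.sqrt(2)) / math.sqrt(N)
      * math.cos(math.pi * m * (2 * n + 1) / (2 * N)) for n in range(N)]
     for m in range(N)]
G = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]  # gamma_0..gamma_6

def trunc_approx(alpha):
    """Truncation-based candidate int(alpha * C)."""
    return [[math.trunc(alpha * c) for c in row] for row in C]

def gram(T):
    """T T^T computed with exact integer arithmetic."""
    return [[sum(a * b for a, b in zip(r, s)) for s in T] for r in T]

def is_orthogonal(T):
    P = gram(T)
    return all(P[i][j] == 0 for i in range(N) for j in range(N) if i != j)

def orthonormalize(T):
    """C_hat = S T with S = sqrt((T T^T)^{-1}); S is diagonal when T is orthogonal."""
    P = gram(T)
    return [[v / math.sqrt(P[i][i]) for v in row] for i, row in enumerate(T)]

def frobenius_error(A, B):
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

# One alpha from the middle of each interval listed in Table 2 (assumed choice)
ALPHA = {
    "T0": (2 / G[4] + 4 / G[0]) / 2,
    "T1": (4 / G[0] + 4 / G[1]) / 2,
    "T2": (4 / G[1] + 4 / G[2]) / 2,
    "T3": (4 / G[4] + 6 / G[2]) / 2,
    "T~1": (2 / G[3] + 2 / G[4]) / 2,
}
APPROX = {name: trunc_approx(a) for name, a in ALPHA.items()}

for name, T in APPROX.items():
    if is_orthogonal(T):
        err = frobenius_error(orthonormalize(T), C)
        print(f"{name}: orthogonal, ||C_hat - C||_F = {err:.4f}")
    else:
        print(f"{name}: not orthogonal")
```

For $T_0$, the diagonal of $T_0\,T_0^\top$ is $(8, 6, 4, 6, 8, 6, 4, 6)$, so $S$ is simply a row-wise scaling. That scaling can be merged into the quantization step instead of being computed explicitly.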
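Table 3
Round-away-from-zero function.

$$
T_4 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 0 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
0 & -1 & 1 & -1 & 1 & -1 & 1 & 0
\end{bmatrix},
\qquad \text{range of } \alpha:\ 1/\gamma_0, \qquad \text{orthogonal: Yes}
$$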
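$$
\tilde{T}_2\ [9] =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix},
\qquad \alpha \in (0,\ 2/\gamma_0), \qquad \text{orthogonal: No}
$$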
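$$
\tilde{T}_3 =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & -1 & -1 & -2 & -2 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
2 & -1 & -2 & -1 & 1 & 2 & 1 & -2 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 1 & 2 & -2 & -1 & 2 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
1 & -1 & 2 & -2 & 2 & -2 & 1 & -1
\end{bmatrix},
\qquad \alpha \in (2/\gamma_2,\ 2/\gamma_3), \qquad \text{orthogonal: No}
$$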
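$$
\tilde{T}_4 =
\begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 2 & 1 & 1 & -1 & -1 & -2 & -2 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
2 & -1 & -2 & -1 & 1 & 2 & 1 & -2 \\
2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\
1 & -2 & 1 & 2 & -2 & -1 & 2 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
1 & -1 & 2 & -2 & 2 & -2 & 1 & -1
\end{bmatrix},
\qquad \alpha \in (2/\gamma_3,\ 2/\gamma_4), \qquad \text{orthogonal: No}
$$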
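The round-away-from-zero entries of Table 3 can be regenerated with the sketch below. Its main purpose is to make one structural fact visible: $\tilde{T}_4$ differs from $\tilde{T}_3$ only by doubling rows 0 and 4. The sketch assumes round-away-from-zero means $\operatorname{sign}(x)\lceil |x| \rceil$ and takes each $\alpha$ from the middle of the listed interval. Both are choices made for this illustration.

```python
import math

N = 8
C = [[(1 if m == 0 else math.sqrt(2)) / math.sqrt(N)
      * math.cos(math.pi * m * (2 * n + 1) / (2 * N)) for n in range(N)]
     for m in range(N)]
G = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]  # gamma_0..gamma_6

def away_from_zero(x):
    """sign(x) * ceil(|x|): every nonzero input is pushed away from zero."""
    return int(math.copysign(math.ceil(abs(x)), x))

def approx(alpha):
    return [[away_from_zero(alpha * c) for c in row] for row in C]

def is_orthogonal(T):
    """True if T T^T is diagonal."""
    return all(sum(a * b for a, b in zip(T[i], T[j])) == 0
               for i in range(N) for j in range(i + 1, N))

sdct = approx(1.0)                       # alpha in (0, 2/gamma_0): T~2, the SDCT
t3 = approx((2 / G[2] + 2 / G[3]) / 2)   # alpha in (2/gamma_2, 2/gamma_3): T~3
t4 = approx((2 / G[3] + 2 / G[4]) / 2)   # alpha in (2/gamma_3, 2/gamma_4): T~4

# T~4 equals diag(2, 1, 1, 1, 2, 1, 1, 1) times T~3
SCALE = [2, 1, 1, 1, 2, 1, 1, 1]
print(t4 == [[s * v for v in row] for s, row in zip(SCALE, t3)])
print([is_orthogonal(T) for T in (sdct, t3, t4)])
```

Because the two matrices differ only by a diagonal row scaling, their inverses differ by the inverse scaling. This is the relation used later to discard $\tilde{T}_4$ in favour of the cheaper $\tilde{T}_3$.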
Table 4
Round-half-up and round-half-down functions.

Table 5
Nearest integer functions.

Table 6
Orthogonal approximations.

Table 7
Deviation from diagonality measure.

$$
\begin{array}{lc}
\text{Approximation } T & \delta(T\,T^\top) \\ \hline
\tilde{T}_0 & 0.4548 \\
\tilde{T}_1 & 0.0646 \\
\tilde{T}_2\ [9] & 0.1056 \\
\tilde{T}_3 & 0.0063 \\
\tilde{T}_4 & 0.0036
\end{array}
$$

characteristic. Indeed, we have the following inverse matrices:

$$
\tilde{T}_1^{-1} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 0 & -1 & -1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & 0 & 1 \\
1 & -1 & 0 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 0 & 1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 0 & -1
\end{bmatrix}
\cdot \operatorname{diag}\!\left(\tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right)
= \frac{1}{8}
\begin{bmatrix}
1 & 2 & 2 & 2 & 1 & 2 & 0 & 2 \\
1 & 2 & 0 & -2 & -1 & -2 & -2 & -2 \\
1 & 2 & 0 & -2 & -1 & 2 & 2 & 2 \\
1 & 2 & -2 & -2 & 1 & 2 & 0 & -2 \\
1 & -2 & -2 & 2 & 1 & -2 & 0 & 2 \\
1 & -2 & 0 & 2 & -1 & -2 & 2 & -2 \\
1 & -2 & 0 & 2 & -1 & 2 & -2 & 2 \\
1 & -2 & 2 & -2 & 1 & -2 & 0 & -2
\end{bmatrix},
$$

$$
\tilde{T}_2^{-1} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & -1 & -1 & -1 & 0 \\
1 & 0 & -1 & -1 & -1 & 0 & 1 & 1 \\
1 & 0 & -1 & 0 & 1 & 1 & -1 & -1 \\
1 & 0 & -1 & 0 & 1 & -1 & -1 & 1 \\
1 & 0 & -1 & 1 & -1 & 0 & 1 & -1 \\
1 & -1 & 1 & 0 & -1 & 1 & -1 & 0 \\
1 & -1 & 1 & -1 & 1 & 0 & 1 & 0
\end{bmatrix}
\cdot \operatorname{diag}\!\left(\tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}\right)
= \frac{1}{8}
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 & 1 & 0 \\
1 & 2 & 1 & 0 & -1 & -2 & -1 & 0 \\
1 & 0 & -1 & -2 & -1 & 0 & 1 & 2 \\
1 & 0 & -1 & 0 & 1 & 2 & -1 & -2 \\
1 & 0 & -1 & 0 & 1 & -2 & -1 & 2 \\
1 & 0 & -1 & 2 & -1 & 0 & 1 & -2 \\
1 & -2 & 1 & 0 & -1 & 2 & -1 & 0 \\
1 & -2 & 1 & -2 & 1 & 0 & 1 & 0
\end{bmatrix},
$$

$$
\tilde{T}_3^{-1} =
\begin{bmatrix}
1 & 3 & 2 & 3 & 1 & 1 & 1 & 1 \\
1 & 3 & 1 & -1 & -1 & -3 & -2 & -1 \\
1 & 1 & -1 & -3 & -1 & 1 & 2 & 3 \\
1 & 1 & -2 & -1 & 1 & 3 & -1 & -3 \\
1 & -1 & -2 & 1 & 1 & -3 & -1 & 3 \\
1 & -1 & -1 & 3 & -1 & -1 & 2 & -3 \\
1 & -3 & 1 & 1 & -1 & 3 & -2 & 1 \\
1 & -3 & 2 & -3 & 1 & -1 & 1 & -1
\end{bmatrix}
\cdot \operatorname{diag}\!\left(\tfrac{1}{8}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}, \tfrac{1}{8}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}\right),
$$

$$
\tilde{T}_4^{-1} =
\begin{bmatrix}
1 & 3 & 2 & 3 & 1 & 1 & 1 & 1 \\
1 & 3 & 1 & -1 & -1 & -3 & -2 & -1 \\
1 & 1 & -1 & -3 & -1 & 1 & 2 & 3 \\
1 & 1 & -2 & -1 & 1 & 3 & -1 & -3 \\
1 & -1 & -2 & 1 & 1 & -3 & -1 & 3 \\
1 & -1 & -1 & 3 & -1 & -1 & 2 & -3 \\
1 & -3 & 1 & 1 & -1 & 3 & -2 & 1 \\
1 & -3 & 2 & -3 & 1 & -1 & 1 & -1
\end{bmatrix}
\cdot \operatorname{diag}\!\left(\tfrac{1}{16}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}, \tfrac{1}{16}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}\right)
= \tilde{T}_3^{-1} \cdot \operatorname{diag}\!\left(\tfrac{1}{2}, 1, 1, 1, \tfrac{1}{2}, 1, 1, 1\right).
$$
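Table 7 and the factored inverses can be cross-checked with exact arithmetic. The definition of $\delta$ is not included in this excerpt. The sketch therefore assumes the deviation-from-diagonality measure $\delta(A) = 1 - \lVert \operatorname{diag}(A) \rVert_F / \lVert A \rVert_F$; under that assumption, the rounded results coincide with Table 7. Each $\tilde{T}_k$ is rebuilt from an $\alpha$ inside its interval, which is a choice made here. `Fraction` keeps the inverse check exact.

```python
import math
from fractions import Fraction

N = 8
C = [[(1 if m == 0 else math.sqrt(2)) / math.sqrt(N)
      * math.cos(math.pi * m * (2 * n + 1) / (2 * N)) for n in range(N)]
     for m in range(N)]
G = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]  # gamma_0..gamma_6

def away(x):
    """Round away from zero: sign(x) * ceil(|x|) (assumed definition)."""
    return int(math.copysign(math.ceil(abs(x)), x))

def approx(alpha, f):
    return [[int(f(alpha * c)) for c in row] for row in C]

def gram(T):
    return [[sum(a * b for a, b in zip(r, s)) for s in T] for r in T]

def delta(A):
    """Assumed deviation from diagonality: 1 - ||diag(A)||_F / ||A||_F."""
    diag = math.sqrt(sum(A[i][i] ** 2 for i in range(len(A))))
    full = math.sqrt(sum(v * v for row in A for v in row))
    return 1 - diag / full

# Non-orthogonal approximations, each rebuilt from an alpha inside its interval
TT = {
    "T~0": approx(1.0, math.ceil),
    "T~1": approx((2 / G[3] + 2 / G[4]) / 2, math.trunc),
    "T~2": approx(1.0, away),
    "T~3": approx((2 / G[2] + 2 / G[3]) / 2, away),
    "T~4": approx((2 / G[3] + 2 / G[4]) / 2, away),
}
for name, T in TT.items():
    print(name, round(delta(gram(T)), 4))

# Integer factor and diagonal of the inverse of T~3 listed above
M3 = [[1, 3, 2, 3, 1, 1, 1, 1],
      [1, 3, 1, -1, -1, -3, -2, -1],
      [1, 1, -1, -3, -1, 1, 2, 3],
      [1, 1, -2, -1, 1, 3, -1, -3],
      [1, -1, -2, 1, 1, -3, -1, 3],
      [1, -1, -1, 3, -1, -1, 2, -3],
      [1, -3, 1, 1, -1, 3, -2, 1],
      [1, -3, 2, -3, 1, -1, 1, -1]]
D3 = [Fraction(1, d) for d in (8, 28, 20, 28, 8, 28, 20, 28)]

def is_inverse(T, M, D):
    """True if T * (M * diag(D)) is exactly the identity matrix."""
    inv = [[Fraction(M[i][j]) * D[j] for j in range(N)] for i in range(N)]
    prod = [[sum(T[i][k] * inv[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]
    return all(prod[i][j] == (1 if i == j else 0) for i in range(N) for j in range(N))

print("T~3 inverse verified:", is_inverse(TT["T~3"], M3, D3))
```

Replacing the diagonal by $\operatorname{diag}(1/16, 1/28, 1/20, 1/28, 1/16, 1/28, 1/20, 1/28)$ in the same check confirms the factored form of $\tilde{T}_4^{-1}$.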
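Table 9
Arithmetic complexity of the obtained approximations.

$$
\begin{array}{lccc}
\text{Approximation} & \text{Multiplications} & \text{Additions} & \text{Bit-shifts} \\ \hline
T_0\ [50] & 0 & 22 & 0 \\
T_1 & 0 & 22 & 4 \\
T_2 & 0 & 22 & 6 \\
T_3 & 0 & 30 & 16 \\
T_4 & 0 & 24 & 0 \\
T_5 & 0 & 24 & 4 \\
T_6\ [51] & 0 & 24 & 6 \\
T_7 & 0 & 32 & 12 \\
\tilde{T}_1 & 0 & 18 & 0 \\
\tilde{T}_2\ [9] & 0 & 28 & 0 \\
\tilde{T}_3 & 0 & 28 & 10 \\
\tilde{T}_4 & 0 & 28 & 12
\end{array}
$$

Fig. 2. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average PSNR, and (b) average PSNR absolute percentage error relative to the DCT.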
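The counts in Table 9 can be reproduced from a factorization of the form $T = P\,K\,B_1\,B_2\,B_3$. Here $B_3$, $B_2$, $B_1$ are butterfly and permutation stages and $K$ holds the constants $m_0, \ldots, m_6$ that replace $\gamma_0, \ldots, \gamma_6$. The stage order, the sign placement inside $K$, and the output permutation $P$ are assumptions of this sketch. They were chosen so that the product reproduces each integer matrix exactly, and the sketch checks this for every case. Following the text, a multiplication by 2 costs one bit-shift and a multiplication by 3 costs one addition plus one bit-shift. The $m$ vectors are read off the matrices in Tables 2 and 3; $T_5$, $T_6$, and $T_7$ are omitted because their matrices are not reproduced in this excerpt.

```python
import math

N = 8
C = [[(1 if m == 0 else math.sqrt(2)) / math.sqrt(N)
      * math.cos(math.pi * m * (2 * n + 1) / (2 * N)) for n in range(N)]
     for m in range(N)]
G = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]  # gamma_0..gamma_6

def t_from_m(m):
    """Replace every entry +/-gamma_k/2 of C by +/-m[k]."""
    T = []
    for row in C:
        out = []
        for c in row:
            k = min(range(7), key=lambda j: abs(2 * abs(c) - G[j]))  # which gamma_k
            out.append(int(math.copysign(m[k], c)))
        T.append(out)
    return T

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, col)) for col in zip(*B)] for r in A]

# Butterflies: B3 forms x_n + x_{7-n} and x_n - x_{7-n}; B2 and B1 act on the even part
B3 = ([[1 if j in (i, 7 - i) else 0 for j in range(N)] for i in range(4)]
      + [[1 if j == i else -1 if j == 7 - i else 0 for j in range(N)] for i in range(4)])
B2 = [[1, 0, 0, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0],
      [1, 0, 0, -1, 0, 0, 0, 0], [0, 1, -1, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0],
      [0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1]]
B1 = [[1, 1, 0, 0, 0, 0, 0, 0], [1, -1, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1],
      [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0]]

def k_matrix(m):
    m0, m1, m2, m3, m4, m5, m6 = m
    return [[m3, 0, 0, 0, 0, 0, 0, 0], [0, m3, 0, 0, 0, 0, 0, 0],
            [0, 0, m5, m1, 0, 0, 0, 0], [0, 0, -m1, m5, 0, 0, 0, 0],
            [0, 0, 0, 0, m4, m6, m2, m0], [0, 0, 0, 0, -m0, -m4, -m6, m2],
            [0, 0, 0, 0, m2, -m0, -m4, m6], [0, 0, 0, 0, m6, m2, -m0, m4]]

ORDER = [0, 4, 2, 6, 1, 3, 7, 5]  # row i of K yields output coefficient X[ORDER[i]]
P = [[1 if ORDER[j] == i else 0 for j in range(N)] for i in range(N)]

def fast(m):
    return matmul(P, matmul(k_matrix(m), matmul(B1, matmul(B2, B3))))

def op_count(m):
    adds, shifts = 8 + 4 + 2, 0                 # butterflies of B3, B2 and B1
    for row in k_matrix(m):
        terms = [abs(v) for v in row if v != 0]
        adds += max(len(terms) - 1, 0)          # combine the nonzero terms
        adds += terms.count(3)                  # 3x = (x << 1) + x
        shifts += sum(1 for v in terms if v in (2, 3))
    return adds, shifts

M_VALUES = {"T0": (1, 1, 1, 1, 1, 0, 0), "T1": (2, 1, 1, 1, 1, 0, 0),
            "T2": (2, 2, 1, 1, 1, 0, 0), "T3": (3, 3, 2, 2, 2, 1, 0),
            "T4": (1, 1, 1, 1, 1, 1, 0), "T~1": (1, 1, 1, 1, 0, 0, 0),
            "T~2": (1, 1, 1, 1, 1, 1, 1), "T~3": (2, 2, 2, 1, 1, 1, 1),
            "T~4": (2, 2, 2, 2, 1, 1, 1)}
for name, m in M_VALUES.items():
    print(name, fast(m) == t_from_m(m), op_count(m))
```

All these variants share the same 14-addition butterfly front end. They differ only in the constants of $K$, which is why a single signal flow graph serves all of them.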
Fig. 3. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average SSIM, and (b) average SSIM absolute percentage error relative to the DCT.

Fig. 4. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average SR-SIM, and (b) average SR-SIM absolute percentage error relative to the DCT.

$$
K =
\begin{bmatrix}
m_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & m_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & m_5 & m_1 & 0 & 0 & 0 & 0 \\
0 & 0 & -m_1 & m_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & m_4 & m_6 & m_2 & m_0 \\
0 & 0 & 0 & 0 & -m_0 & -m_4 & -m_6 & m_2 \\
0 & 0 & 0 & 0 & m_2 & -m_0 & -m_4 & m_6 \\
0 & 0 & 0 & 0 & m_6 & m_2 & -m_0 & m_4
\end{bmatrix},
\qquad
B_1 =
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix},
$$

$$
B_2 =
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
\qquad
B_3 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0
\end{bmatrix},
$$

where the constants $m_i$, $i = 0, 1, \ldots, 6$, depend on the particular choice of the transformation matrix $T$. In Table 8, these constants are listed for each of the discussed transformations. As a consequence, all transformations also share the same
fast algorithm and signal flow structure, as presented in Fig. 1. For each transform, we assessed the arithmetic complexity, as measured by multiplication, addition, and bit-shifting operation counts. A multiplication by 3 was counted as one addition and one bit-shift operation. Results are shown in Table 9. All proposed algorithms are multiplierless, requiring only additions and bit-shift operations. As shown in (5), transforms $\tilde{T}_3$ and $\tilde{T}_4$ lead to the same approximation. Since $\tilde{T}_4$ requires more arithmetic operations than $\tilde{T}_3$, we do not consider $\tilde{T}_4$ for further analysis.

6. Image compression

6.1. JPEG-like compression

sequence [16]. Only the r initial coefficients in each block were retained; the remaining coefficients were discarded. We adopted $1 \le r \le 45$. Subsequently, the inverse 2-D transform was applied and the compressed images were obtained. Original and compressed images were then evaluated for image degradation. As quality assessment measures, we considered (i) the peak signal-to-noise ratio (PSNR) [69], (ii) the structural similarity index (SSIM) [70], and (iii) the spectral residual based similarity index (SR-SIM) [71]. For each value of r, the image quality measures were averaged over the 45 images. As opposed to analyzing particular images as in [43,44,46–48], taking average measurements makes the suggested approach less prone to variance effects and fortuitous data; therefore, the proposed methodology is more robust [50,72].
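The JPEG-like experiment can be sketched on a single synthetic image. The procedure is: apply a forward 2-D transform to each 8×8 block, keep the first r coefficients, apply the inverse transform, and measure PSNR. This sketch makes several assumptions. Coefficients are ranked in the standard JPEG zigzag order, which the truncated phrase "…sequence [16]" suggests but does not fully state. The orthogonal $T_0$ is normalized row by row. The 16×16 test image is synthetic and stands in for the 45 images used in the paper.

```python
import math

# Rounded DCT T0 from Table 2 (orthogonal, so normalizing its rows makes it orthonormal)
T0 = [[1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 0, 0, -1, -1, -1],
      [1, 0, 0, -1, -1, 0, 0, 1],
      [1, 0, -1, -1, 1, 1, 0, -1],
      [1, -1, -1, 1, 1, -1, -1, 1],
      [1, -1, 0, 1, -1, 0, 1, -1],
      [0, -1, 1, 0, 0, 1, -1, 0],
      [0, -1, 1, -1, 1, -1, 1, 0]]

def orthonormalize(T):
    """Scale each row to unit norm (valid when T T^T is diagonal)."""
    return [[v / math.sqrt(sum(x * x for x in row)) for v in row] for row in T]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def zigzag(n=8):
    """JPEG zigzag scan order of an n x n block, as (row, col) pairs."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

ZIGZAG = zigzag()

def compress_block(M, block, r):
    """Forward 2-D transform, keep the first r zigzag coefficients, inverse transform."""
    B = matmul(matmul(M, block), transpose(M))
    keep = set(ZIGZAG[:r])
    B = [[B[i][j] if (i, j) in keep else 0.0 for j in range(8)] for i in range(8)]
    return matmul(matmul(transpose(M), B), M)

def compress_image(img, M, r):
    """Apply compress_block to each 8x8 block (image sides must be multiples of 8)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, 8):
        for bj in range(0, w, 8):
            block = [row[bj:bj + 8] for row in img[bi:bi + 8]]
            rec = compress_block(M, block, r)
            for i in range(8):
                out[bi + i][bj:bj + 8] = rec[i]
    return out

def psnr(ref, test, peak=255.0):
    mse = sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb))
    mse /= len(ref) * len(ref[0])
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Synthetic 16x16 test image (hypothetical stand-in for the test images)
IMG = [[128 + 60 * math.sin(i / 3) + 40 * math.cos(j / 5) for j in range(16)]
       for i in range(16)]
M = orthonormalize(T0)
for r in (1, 6, 10, 45):
    print(r, round(psnr(IMG, compress_image(IMG, M, r)), 2))
```

Because the normalized $T_0$ is orthonormal, keeping all 64 coefficients reconstructs the image up to rounding. By Parseval's relation, retaining more coefficients can only lower the error. A non-orthogonal approximation such as $\tilde{T}_3$ would instead use its explicit inverse in the decoding step.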
Fig. 5. Compressed ‘boat’ image using (a) $T_0$ [50], (b) $T_4$, (c) $\tilde{T}_1$, (d) $\tilde{T}_2$ [9], (e) $\tilde{T}_3$, and (f) DCT, for $r = 10$.
Fig. 6. Compressed ‘Lena’ image using (a) $T_0$ [50], (b) $T_4$, (c) $\tilde{T}_1$, (d) $\tilde{T}_2$ [9], (e) $\tilde{T}_3$, and (f) DCT, for $r = 25$.
PSNR, the APE is calculated according to the following expression:

$$\operatorname{APE}(\text{PSNR}) = \frac{\lvert \text{PSNR}_C - \text{PSNR}_T \rvert}{\text{PSNR}_C},$$

where $\text{PSNR}_C$ and $\text{PSNR}_T$ are the values of PSNR obtained with the exact DCT and with a given approximation $T$, respectively. The values of $\operatorname{APE}(\text{SSIM})$ and $\operatorname{APE}(\text{SR-SIM})$ are calculated in a similar manner.

Approximation $T_7$ outperformed all other approximations in terms of PSNR. This is partially expected because, by using all possible elements of $\mathcal{C}$, it can potentially approximate the actual DCT basis vectors better. Nevertheless, $T_7$ also has the highest computational cost among the examined transformations. On the other hand, the non-orthogonal approximation $\tilde{T}_3$ outperformed all other approximations in terms of SSIM and SR-SIM in high compression scenarios ($r \le 9$). Approximation $\tilde{T}_1$ has the lowest computational complexity, requiring only 18 additions. The orthogonal approximation $T_3$ showed performance comparable to the non-orthogonal approximation $\tilde{T}_3$. However, the non-orthogonal approximation is less computationally expensive, requiring 28 additions and 10 bit-shifts, whereas $T_3$ requires 30 additions and 16 bit-shifting operations.

Comparing the approximations with 22 additions, $T_0$ outperformed approximations $T_1$ and $T_2$ in terms of PSNR, SSIM, and SR-SIM for all considered values of r. Among the approximations with 24 additions, $T_4$ performed better than approximations $T_5$ and $T_6$ according to the considered measures. Moreover, $T_4$ has the advantage of requiring no bit-shifting operations. Among the non-orthogonal transforms, $\tilde{T}_3$ presented the best performance in terms of PSNR, SSIM, and SR-SIM. However, $\tilde{T}_1$ and the SDCT have lower computational complexity.

The preceding discussion allows us to identify the approximations with the best performance and complexity trade-off. Thus, we single out the following approximations: $T_0$, $T_4$, $\tilde{T}_1$, and $\tilde{T}_3$. Considering this restricted set of transformations, we processed two particular images for qualitative analysis. The SDCT and DCT were also considered for comparison purposes. Figs. 5 and 6 show the ‘boat’ and ‘Lena’ images after being submitted to the JPEG-like compression experiment for $r = 10$ and $r = 25$, respectively. We also included the associated difference images with respect to the original image in order to qualitatively illustrate the small amount of error introduced by the proposed approximations. Measurements for PSNR, SSIM, and SR-SIM are also included.
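The APE expression above is a one-line computation. The sketch below applies it to a pair of scores and also averages it over several images. Whether the paper averages per-image APEs or applies APE to the already averaged scores is not settled by this excerpt, so `mean_ape` is only one possible (hypothetical) reading. The result is a fraction; multiply by 100 to express it as a percentage.

```python
def ape(score_dct, score_approx):
    """Absolute percentage error of a quality score relative to the exact DCT.

    Returned as a fraction; multiply by 100 to express it in percent.
    """
    return abs(score_dct - score_approx) / score_dct

def mean_ape(scores_dct, scores_approx):
    """Average of per-image APE values (one possible reading of 'average APE')."""
    pairs = list(zip(scores_dct, scores_approx))
    return sum(ape(c, t) for c, t in pairs) / len(pairs)

# Hypothetical PSNR values (dB) for the exact DCT and an approximation on three images
print(mean_ape([40.0, 35.0, 30.0], [38.0, 34.3, 29.4]))
```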
7. Conclusion

This paper introduces a collection of DCT approximations derived from the application of common integer functions to the exact DCT. The proposed mathematical formalism encompasses, as particular cases, several transforms already reported in the literature. In particular, the well-known SDCT was derived as a result of the suggested systematic procedure. All proposed transforms were given fast algorithms that share the same structure. This suggests a common algebraic framework shared among all discussed approximations. Only additions and simple bit-shifting operations are necessary for their evaluation. The low complexity of the obtained approximations makes them suitable for hardware implementation in dedicated architectures employing fixed-point arithmetic. The proposed approximations were assessed in terms of computational complexity and performance in JPEG-like compression, exhibiting a good balance between cost and performance.

Acknowledgments

This research was partially supported by CNPq, CAPES, FACEPE, and FAPERGS.

References

[1] K.R. Rao, P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, San Diego, CA, 1990.
[2] V. Britanak, P. Yip, K.R. Rao, Discrete Cosine and Sine Transforms, Academic Press, Boston, MA, 2007.
[3] N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform, IEEE Trans. Comput. C-23 (1) (1974) 90–93.
[4] J.L. Doob, Stochastic Processes, first edition 1953, John Wiley & Sons, New York, 1990.
[5] I.I. Gikhman, A.V. Skorohod, The Theory of Stochastic Processes, Springer-Verlag, New York, 1974.
[6] K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Annales Academiae scientiarum Fennicae: Mathematica-Physica, Universitat Helsinki, 1947.
[7] R.J. Clarke, Relation between the Karhunen–Loève and cosine transforms, IEE Proc. F Commun. Radar Signal Process. 128 (6) (1981) 359–360.
[8] J. Liang, T.D. Tran, Fast multiplierless approximation of the DCT with the lifting scheme, IEEE Trans. Signal Process. 49 (2001) 3032–3044.
[9] T.I. Haweel, A new square wave transform based on the DCT, Signal Process. 82 (2001) 2309–2319.
[10] F. Gianfelici, RBF-based technique for statistical demodulation of pathological tremor, IEEE Trans. Neural Netw. Learn. Syst. 24 (10) (2013) 1565–1574.
[11] F. Gianfelici, D. Farina, An effective classification framework for brain–computer interfacing based on a combinatoric setting, IEEE Trans. Signal Process. 60 (3) (2012) 1446–1459.
[12] C. Turchetti, G. Biagetti, F. Gianfelici, P. Crippa, Nonlinear system identification: an effective framework based on the Karhunen–Loève transform, IEEE Trans. Signal Process. 57 (2) (2009) 536–550.
[13] F. Gianfelici, C. Turchetti, P. Crippa, A non-probabilistic recognizer of stochastic signals based on KLT, Signal Process. 89 (4) (2009) 422–437.
[14] M. Gastpar, P.-L. Dragotti, M. Vetterli, The distributed Karhunen–Loève transform, IEEE Trans. Inf. Theory 52 (12) (2006) 5177–5196.
[15] V. Bhaskaran, K. Konstantinides, Image and Video Compression Standards, Kluwer Academic Publishers, Boston, 1997.
[16] G.K. Wallace, The JPEG still picture compression standard, IEEE Trans. Consum. Electron. 38 (1) (1992) xviii–xxxiv.
[17] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, 1992.
[18] N. Roma, L. Sousa, Efficient hybrid DCT-domain algorithm for video spatial downscaling, EURASIP J. Adv. Signal Process. 2007 (2) (2007) 1–16.
[19] International Organisation for Standardisation, Generic Coding of Moving Pictures and Associated Audio Information – Part 2: Video, ISO/IEC JTC1/SC29/WG11 – Coding of Moving Pictures and Audio, ISO, 1994.
[20] International Telecommunication Union, ITU-T Recommendation H.261 Version 1: Video Codec for Audiovisual Services at p × 64 kbits, Technical Report, ITU-T, 1990.
[21] International Telecommunication Union, ITU-T Recommendation H.263 Version 1: Video Coding for Low Bit Rate Communication, Technical Report, ITU-T, 1995.
[22] T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 560–576.
[23] Joint Video Team, Recommendation H.264 and ISO/IEC 14 496–10 AVC: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, Technical Report, ITU-T, 2003.
[24] A. Luthra, G.J. Sullivan, T. Wiegand, Introduction to the special issue on the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 557–559.
[25] M.T. Pourazad, C. Doutre, M. Azimi, P. Nasiopoulos, HEVC: the new gold standard for video compression: How does HEVC compare with H.264/AVC? IEEE Consum. Electron. Mag. 1 (3) (2012) 36–46.
[26] F. Bossen, B. Bross, K. Suhring, D. Flynn, HEVC complexity and implementation analysis, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1685–1696.
[27] G.J. Sullivan, J. Ohm, W.-J. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1649–1668.
[28] M. Vetterli, H. Nussbaumer, Simple FFT and DCT algorithms with reduced number of operations, Signal Process. 6 (1984) 267–278.
[29] H.S. Hou, A fast recursive algorithm for computing the discrete cosine transform, IEEE Trans. Acoust. Speech Signal Process. 6 (10) (1987) 1455–1461.
[30] Z. Wang, Fast algorithms for the discrete W transform and for the discrete Fourier transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984) 803–816.
[31] B.G. Lee, A new algorithm for computing the discrete cosine transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984) 1243–1245.
[32] Y. Arai, T. Agui, M. Nakajima, A fast DCT-SQ scheme for images, Trans. IEICE E-71 (11) (1988) 1095–1097.
[33] C. Loeffler, A. Ligtenberg, G. Moschytz, Practical fast 1D DCT algorithms with 11 multiplications, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 988–991.
[34] E. Feig, S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process. 40 (9) (1992) 2174–2193.
[35] B. Vasudev, N. Merhav, DCT mode conversions for field/frame coded MPEG video, in: IEEE Second Workshop on Multimedia Signal Processing, 1998, pp. 605–610.
[36] M.C. Lin, L.R. Dung, P.K. Weng, An ultra-low-power image compressor for capsule endoscope, BioMed. Eng. OnLine 5 (1) (2006) 1–8.
[37] H.L.P.A. Madanayake, R.J. Cintra, D. Onen, V.S. Dimitrov, L.T. Bruton, Algebraic integer based 8×8 2-D DCT architecture for digital video processing, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2011, pp. 1247–1250.
[38] N. Rajapaksha, A. Edirisuriya, A. Madanayake, R.J. Cintra, D. Onen, I. Amer, V.S. Dimitrov, Asynchronous realization of algebraic integer-based 2D DCT using Achronix Speedster SPD60 FPGA, J. Electr. Comput. Eng. 2013 (2013) 1–9.
[39] A. Edirisuriya, A. Madanayake, V. Dimitrov, R.J. Cintra, J. Adikari, VLSI architecture for 8-point AI-based Arai DCT having low area-time complexity and power at improved accuracy, J. Low Power Electron. Appl. 2 (2) (2012) 127–142.
[40] A. Edirisuriya, A. Madanayake, R.J. Cintra, V.S. Dimitrov, N.T. Rajapaksha, A single-channel architecture for algebraic integer based 8×8 2-D DCT computation, IEEE Trans. Circuits Syst. Video Technol. 23 (12) (2013) 2083–2089.
[41] M.T. Heideman, C.S. Burrus, Multiplicative Complexity, Convolution, and The DFT, Signal Processing and Digital Filtering, Springer-Verlag, New York, NY, 1988.
[42] F.M. Bayer, R.J. Cintra, DCT-like transform for image compression requires 14 additions only, Electron. Lett. 48 (15) (2012) 919–921.
[43] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A multiplication-free transform for image compression, in: 2nd International Conference on Signals, Circuits and Systems, 2008, pp. 1–4.
[44] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A low-complexity parametric transform for image compression, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2011.
[45] K. Lengwehasatit, A. Ortega, Scalable variable complexity approximate forward DCT, IEEE Trans. Circuits Syst. Video Technol. 14 (11) (2004) 1236–1248.
[46] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, Low-complexity 8×8 transform for image compression, Electron. Lett. 44 (21) (2008) 1249–1250.
[47] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A fast 8×8 transform for image compression, in: International Conference on Microelectronics (ICM), 2009, pp. 74–77.
[48] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A novel transform for image compression, in: 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010, pp. 509–512.
[49] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, Binary discrete cosine and Hartley transforms, IEEE Trans. Circuits Syst. I: Regul. Pap. 60 (4) (2013) 989–1002.
[50] R.J. Cintra, F.M. Bayer, A DCT approximation for image compression, IEEE Signal Process. Lett. 18 (10) (2011) 579–582.
[51] U.S. Potluri, A. Madanayake, R.J. Cintra, F.M. Bayer, N. Rajapaksha, Multiplier-free DCT approximations for RF multi-beam digital aperture-array space imaging and directional sensing, Meas. Sci. Technol. 23 (11) (2012) 114003.
[52] R.J. Cintra, An integer approximation method for discrete sinusoidal transforms, J. Circuits Syst. Signal Process. 30 (6) (2011) 1481–1501.
[53] G.A.F. Seber, A Matrix Handbook for Statisticians, John Wiley & Sons, Inc., Hoboken, NJ, 2008.
[54] N.J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl. 88/89 (1987) 405–430.
[55] MATLAB, version 8.1 (R2013a) Documentation, The MathWorks Inc., Natick, MA, 2013.
[56] F.M. Bayer, R.J. Cintra, A. Edirisuriya, A. Madanayake, A digital hardware fast algorithm and FPGA-based prototype for a novel 16-point approximate DCT for image compression applications, Meas. Sci. Technol. 23 (8) (2012) 114010.
[57] S. Ishwar, P.K. Meher, M.N.S. Swamy, Discrete Tchebichef transform: a fast 4×4 algorithm and its application in image/video compression, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 260–263.
[58] K. Nakagaki, R. Mukundan, A fast 4×4 forward discrete Tchebichef transform algorithm, IEEE Signal Process. Lett. 14 (10) (2007) 684–687.
[59] F.M. Bayer, R.J. Cintra, Image compression via a fast DCT approximation, IEEE Lat. Am. Trans. 8 (6) (2010) 708–713.
[60] J.W. Eaton, D. Bateman, S. Hauberg, GNU Octave Manual Version 3, Network Theory Limited, 2008.
[61] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, 2nd edition, Addison-Wesley, Upper Saddle River, NJ, 2008.
[62] G. Plonka, A global method for invertible integer DCT and integer wavelet algorithms, Appl. Comput. Harmon. Anal. 16 (2004) 90–110.
[63] K. Oldham, J. Myland, J. Spanier, An Atlas of Functions, 2nd edition, Springer, 2008.
[64] International Organization for Standardization, ISO/IEC/IEEE 60559:2011, 2011.
[65] Wolfram Research, Round: nearest integer function 〈http://functions.wolfram.com/IntegerFunctions/Round/27/01/01/01/〉, September 2013.
[66] B.N. Flury, W. Gautschi, An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form, SIAM J. Sci. Stat. Comput. 7 (1) (1986) 169–184.
[67] R.E. Blahut, Fast Algorithms for Signal Processing, Cambridge University Press, Cambridge, UK, 2010.
[68] The USC-SIPI Image Database 〈http://sipi.usc.edu/database/〉, University of Southern California, Signal and Image Processing Institute, 2011.
[69] Q. Huynh-Thu, M. Ghanbari, Scope of validity of PSNR in image/video quality assessment, Electron. Lett. 44 (13) (2008) 800–801.
[70] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[71] L. Zhang, H. Li, SR-SIM: a fast and high performance IQA index based on spectral residual, in: 2012 19th IEEE International Conference on Image Processing (ICIP), 2012, pp. 1473–1476. http://dx.doi.org/10.1109/ICIP.2012.6467149.
[72] S.M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall Signal Processing Series, vol. 1, Prentice Hall, Upper Saddle River, NJ, 1993.