
Signal Processing 99 (2014) 201–214

Contents lists available at ScienceDirect

Signal Processing
journal homepage: www.elsevier.com/locate/sigpro

Low-complexity 8-point DCT approximations based on integer functions
R.J. Cintra a,b,*, F.M. Bayer c, C.J. Tablada a
a Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Brazil
b Department of Electrical and Computer Engineering, University of Akron, OH, United States
c Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Brazil

Article info

Article history:
Received 22 October 2013
Received in revised form 21 December 2013
Accepted 24 December 2013
Available online 1 January 2014

Keywords:
Approximate DCT
Image compression
Integer functions
Low-complexity algorithms

Abstract

The discrete cosine transform (DCT) is a central mathematical operation in several digital signal processing methods and image/video standards. In this paper, we propose a collection of twelve approximations for the 8-point DCT based on integer functions. Considered functions include the floor, ceiling, truncation, and rounding-off functions. Sought approximations are required to meet the following specific criteria: (i) very low arithmetic complexity, (ii) orthogonality or quasi-orthogonality, and (iii) low-complexity inversion. By varying a scaling parameter, approximations could be systematically obtained and several existing approximations were identified as particular cases of the proposed methodology. Particular cases include the signed DCT and the rounded DCT. Four new quasi-orthogonal approximations were introduced and their practical relevance was demonstrated. All approximations were given fast algorithms based on matrix factorization methods. Proposed approximations are multiplierless; their computation requires only additions and bit-shifting operations. Additive complexity ranged from 18 to 24 additions. Obtained approximations were compared with the exact DCT and assessed in the context of JPEG-like image compression. As quality assessment measures, we considered the peak signal-to-noise ratio and the structural similarity index. Because of their low complexity and good performance, the proposed approximations are suitable for hardware implementation in dedicated architectures.

© 2013 Elsevier B.V. All rights reserved.

* Corresponding author at: Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Brazil.
E-mail addresses: rjdsc@stat.ufpe.org (R.J. Cintra), bayer@ufsm.br (F.M. Bayer).

0165-1684/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.sigpro.2013.12.027

1. Introduction

The discrete cosine transform (DCT) is widely regarded as a key operation in digital signal processing [1,2]. In fact, the DCT is asymptotically equivalent to the Karhunen–Loève transform (KLT) [3–5], which is an optimal tool in terms of decorrelation and energy compaction properties [1–3,6–9]. The KLT has been considered in several contemporary applications [10–13]. In [14], a distributed version of the KLT, which is suitable for sensor network data, was introduced. However, when highly correlated first-order Markov signals are considered [1,2], such as natural images [8], the DCT can closely emulate the KLT. This approximation is desirable because the DCT is equipped with fast algorithms and is adequate for hardware implementation.

The DCT has been considered and effectively adopted in a number of methods for image and video coding [15]. As a result, the DCT is the central mathematical step for the following standards: JPEG [16,17], MPEG-1 [18], MPEG-2 [19], H.261 [20], H.263 [21], H.264 [22–24], and the recent HEVC [25–27]. All the above standards consider the particular 8-point DCT.

Thus, developing fast algorithms for the efficient evaluation of the 8-point DCT is a main task in the circuits, systems, and signal processing communities. Archived literature contains a multitude of DCT fast algorithms for this particular blocklength [28,29]. Remarkably extensive reports have been generated amalgamating scattered methods for the 8-point DCT [1,2]. Among the most popular techniques, we mention the following algorithms:

Wang factorization [30], Lee DCT for power-of-two blocklengths [31], Arai DCT scheme [32], Loeffler algorithm [33], Vetterli–Nussbaumer algorithm [28], Hou algorithm [29], and Feig–Winograd factorization [34]. All these methods are classical results in the field and have been considered for practical applications [18,35,36]. For instance, the Arai DCT scheme was employed in a number of recent hardware implementations of the DCT [37–39].

Naturally, DCT fast algorithms that result in major computational savings compared to direct computation of the DCT were already developed a few decades ago. In fact, the intense research in the field has led to methods that are in the vicinity of the theoretical complexity of the exact DCT [8,32,33,40,41]. Thus, the computation of the exact DCT is a task with little scope for major improvements in terms of minimization of computational complexity by means of standard methods.

On the other hand, DCT approximations, that is, operations that closely emulate the DCT, are mathematical tools that can furnish an alternative avenue for DCT evaluation. Effectively, DCT approximations have already been considered in a number of works [2,9,42–44]. Moreover, although usual fast algorithms can reduce the computational complexity significantly, they still need floating-point operations [2]. In contrast, approximate DCT methods can be tailored to require very low arithmetic complexity and simple number representation systems.

A comprehensive list of approximate methods for the DCT is found in [2]. Prominent techniques include the signed DCT (SDCT) [9], the binDCT [8], the level 1 approximation by Lengwehasatit–Ortega [45], the Bouguezel–Ahmad–Swamy (BAS) series of algorithms [43,44,46–49], the DCT round-off approximation [50], the modified DCT round-off approximation [42], and the multiplier-free DCT approximation for RF imaging [51].

The goal of this paper is two-fold. First, we aim at proposing a systematic procedure for deriving low-complexity approximations for the 8-point DCT. To this end, we consider several types of rounding-off functions applied to the scaled exact 8-point DCT matrix. The entries of the sought approximate DCT matrices are required to possess null multiplicative complexity; only additions and simple bit-shifting operations are allowed. Second, we focus on suggesting practical approximations and assessing them as tools for JPEG-like image compression.

The paper unfolds as follows. In Section 2, we describe the mathematical framework of the DCT and we discuss the polar decomposition method for DCT approximation [52]. In Section 3, we propose a systematic method based on integer functions for obtaining low-complexity matrices useful for generating DCT approximations. The discussed method is based on a computational search over a subset of candidate matrices under constraints of low computational complexity and orthogonality or near orthogonality. In Section 4, we provide details of the computational search and list the obtained approximations. Section 5 presents fast algorithms for the obtained approximations. In Section 6, the resulting approximations are subject to performance assessment in the context of image compression using image quality measures as figures of merit. In Section 7, we state concluding remarks.

2. Exact and approximate DCT

2.1. Mathematical preliminaries

The N-point DCT is algebraically represented by the $N \times N$ transformation matrix $C_N$ whose elements are given by [1,2]

\[
c_{m,n} = \frac{1}{\sqrt{N}} \, \beta_{m-1} \cos\left( \frac{\pi (m-1)(2n-1)}{2N} \right),
\]

where $m, n = 1, 2, \ldots, N$, $\beta_0 = 1$, and $\beta_k = \sqrt{2}$ for $k \neq 0$. Let $x = [x_0 \; x_1 \; \cdots \; x_{N-1}]^\top$ be an input vector, where the superscript $\top$ denotes the transposition operation. The one-dimensional (1-D) DCT of $x$ is the N-point vector $X = [X_0 \; X_1 \; \cdots \; X_{N-1}]^\top$ given by $X = C_N \cdot x$. Because $C_N$ is an orthogonal matrix, the inverse transformation can be written according to $x = C_N^\top \cdot X$. The DCT is mathematically relevant because of its relationship with the KLT. Indeed, if the input signal $x$ is modeled after a first-order Markov process with correlation coefficient $\rho$, then the DCT matrix converges to the KLT matrix as $\rho$ converges to one ($\rho \to 1$) [2, p. 61].

Let $A$ and $B$ be square matrices of size $N$. For two-dimensional (2-D) signals, we have the following expressions that relate the forward and inverse 2-D DCT operations, respectively:

\[
B = C_N \cdot A \cdot C_N^\top \quad \text{and} \quad A = C_N^\top \cdot B \cdot C_N. \tag{1}
\]

Although the procedures described in this work can be applied to any blocklength, we focus exclusively on the 8-point DCT. Thus, for simplicity, the 8-point DCT matrix is denoted as $C$ and is given by

\[
C \triangleq C_8 = \frac{1}{2}
\begin{bmatrix}
\gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 \\
\gamma_0 & \gamma_2 & \gamma_4 & \gamma_6 & -\gamma_6 & -\gamma_4 & -\gamma_2 & -\gamma_0 \\
\gamma_1 & \gamma_5 & -\gamma_5 & -\gamma_1 & -\gamma_1 & -\gamma_5 & \gamma_5 & \gamma_1 \\
\gamma_2 & -\gamma_6 & -\gamma_0 & -\gamma_4 & \gamma_4 & \gamma_0 & \gamma_6 & -\gamma_2 \\
\gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 & \gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 \\
\gamma_4 & -\gamma_0 & \gamma_6 & \gamma_2 & -\gamma_2 & -\gamma_6 & \gamma_0 & -\gamma_4 \\
\gamma_5 & -\gamma_1 & \gamma_1 & -\gamma_5 & -\gamma_5 & \gamma_1 & -\gamma_1 & \gamma_5 \\
\gamma_6 & -\gamma_4 & \gamma_2 & -\gamma_0 & \gamma_0 & -\gamma_2 & \gamma_4 & -\gamma_6
\end{bmatrix},
\]

where $\gamma_k = \cos(2\pi(k+1)/32)$, $k = 0, 1, \ldots, 6$. These quantities are algebraic integers explicitly given by [37]

\[
\begin{aligned}
\gamma_0 &= \frac{\sqrt{2+\sqrt{2+\sqrt{2}}}}{2} \approx 0.9808\ldots, &
\gamma_1 &= \frac{\sqrt{2+\sqrt{2}}}{2} \approx 0.9239\ldots, \\
\gamma_2 &= \frac{\sqrt{2+\sqrt{2-\sqrt{2}}}}{2} \approx 0.8315\ldots, &
\gamma_3 &= \frac{\sqrt{2}}{2} \approx 0.7071\ldots, \\
\gamma_4 &= \frac{\sqrt{2-\sqrt{2-\sqrt{2}}}}{2} \approx 0.5556\ldots, &
\gamma_5 &= \frac{\sqrt{2-\sqrt{2}}}{2} \approx 0.3827\ldots, \\
\gamma_6 &= \frac{\sqrt{2-\sqrt{2+\sqrt{2}}}}{2} \approx 0.1951\ldots
\end{aligned}
\]

Here, we adopt the following terminology. A matrix $A$ is said to be orthogonal if $A \cdot A^\top$ is a diagonal matrix. In particular, if $A \cdot A^\top$ is the identity matrix, then $A$ is said to be orthonormal.
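As a quick numerical check of the definitions above, the following Python sketch builds $C_N$ from the closed-form entries and tests two properties stated in the text: $C \cdot C^\top$ is the identity, and the first entry of the second row equals $\gamma_0/2$. The function names `dct_matrix`, `gamma`, and `max_orthonormality_error` are illustrative and not taken from the paper; indices are shifted to start at zero.

```python
import math

def dct_matrix(n=8):
    """Build the n-point DCT matrix with 0-based row index m and column index k."""
    c = []
    for m in range(n):
        beta = 1.0 if m == 0 else math.sqrt(2.0)
        c.append([beta / math.sqrt(n) * math.cos(math.pi * m * (2 * k + 1) / (2 * n))
                  for k in range(n)])
    return c

def gamma(k):
    """gamma_k = cos(2*pi*(k+1)/32) for k = 0..6."""
    return math.cos(2.0 * math.pi * (k + 1) / 32.0)

def max_orthonormality_error(c):
    """Largest absolute entry of C*C^T - I."""
    n = len(c)
    err = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(c[i][k] * c[j][k] for k in range(n))
            err = max(err, abs(dot - (1.0 if i == j else 0.0)))
    return err

C = dct_matrix(8)
print("orthonormal:", max_orthonormality_error(C) < 1e-12)
print("C[1][0] == gamma_0 / 2:", abs(C[1][0] - gamma(0) / 2.0) < 1e-12)
```

Plain lists are used instead of an array library so that the sketch runs with the standard library only.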

2.2. DCT approximations

Generally, a DCT approximation is a transformation $\hat{C}$ that, according to some specified metric, behaves similarly to the exact DCT matrix $C$. An approximation matrix $\hat{C}$ is usually based on a transformation matrix $T$ of low computational complexity. Indeed, matrix $T$ is the key component of a given DCT approximation.

Often the elements of the transformation matrix $T$ possess null multiplicative complexity. For instance, this property can be satisfied by restricting the entries of $T$ to the set of powers of two $\{0, \pm 1, \pm 2, \pm 4, \pm 8, \ldots\}$. In fact, multiplications by such elements are trivial and require only bit-shifting operations.

Approximations for the DCT can be classified into two categories depending on whether $\hat{C}$ is orthonormal or not. We start by discussing the orthogonal case. In principle, given a low-complexity matrix $T$ it is possible to derive an orthonormal matrix $\hat{C}$ based on $T$ by means of the polar decomposition [52,53]. Indeed, if $T$ is a full rank real matrix, then the following factorization is uniquely determined:

\[
\hat{C} = S \cdot T, \tag{2}
\]

where $S$ is a symmetric positive definite matrix [53, p. 348]. Matrix $S$ is explicitly related to $T$ according to the following relation:

\[
S = \sqrt{(T \cdot T^\top)^{-1}},
\]

where $\sqrt{\cdot}$ denotes the matrix square root operation [54,55]. Being orthonormal, such an approximation satisfies $\hat{C}^{-1} = \hat{C}^\top$. Therefore, we have that

\[
\hat{C}^{-1} = T^\top \cdot S^\top.
\]

As a consequence, the inverse transformation $\hat{C}^{-1}$ inherits the same computational complexity as the forward transformation.

From the computational point of view, it is desirable that $S$ be a diagonal matrix. In this case, the computational complexity of $\hat{C}$ is the same as that of $T$, except for the scale factors in the diagonal matrix $S$. Moreover, depending on the considered application, even the constants in $S$ can be disregarded in terms of computational complexity assessment. This occurs when the involved constants are trivial multiplicands, such as the powers of two. Another more practical possibility for neglecting the complexity of $S$ arises when it can be absorbed into other sections of a larger procedure. This is the case in JPEG-like compression, where the quantization step is present [16]. Thus, matrix $S$ can be incorporated into the quantization matrix [42,44–47,50,56]. In terms of the inverse transformation, it is also beneficial that $S$ is diagonal, because the complexity of $\hat{C}^{-1}$ becomes essentially that of $T^\top$.

In order that $S$ be a diagonal matrix, it is sufficient that $T$ satisfies the orthogonality condition:

\[
T \cdot T^\top = D, \tag{3}
\]

where $D$ is a diagonal matrix [52].

Now we address the non-orthogonal case. If (3) is not satisfied, then $S$ is not diagonal and the advantageous properties of the resulting DCT approximation are in principle lost. In this case, the off-diagonal elements contribute to a computational complexity increase and the absorption of matrix $S$ in JPEG-like schemes cannot be easily done. However, at the expense of not providing an orthonormal approximation, one may consider approximating $S$ itself by replacing the off-diagonal elements of $D = T \cdot T^\top$ by zeros. Thus, the resulting matrix $\hat{S}$ is given by

\[
\hat{S} = \sqrt{[\operatorname{diag}(T \cdot T^\top)]^{-1}},
\]

where $\operatorname{diag}(\cdot)$ returns a diagonal matrix with the diagonal elements of its matrix argument. Thus, the non-orthogonal approximation is furnished by

\[
\tilde{C} = \hat{S} \cdot T.
\]

Matrix $\tilde{C}$ can be a meaningful approximation if $\hat{S}$ is, in some sense, close to $S$; or, alternatively, if $T$ is almost orthogonal.

From the algorithm design perspective, proposing non-orthogonal approximations may be a less demanding task, since (3) is not required to be satisfied. However, since $\tilde{C}$ is not orthonormal, the inverse transformation must be cautiously examined. Indeed, the inverse transformation does not directly employ the low-complexity matrix $T$ and is given by

\[
\tilde{C}^{-1} = T^{-1} \cdot \hat{S}^{-1}.
\]

Even if $T$ is a low-complexity matrix, it is not guaranteed that $T^{-1}$ also possesses low computational complexity. Nevertheless, it is possible to obtain non-orthogonal approximations whose direct and inverse transformation matrices both have low computational complexity. This situation is illustrated by two prominent approximations: the SDCT [9] and the BAS approximation described in [46].

3. Scaling and integer mapping

Approximations archived in the literature often possess transformation matrices with entries defined on the set $\mathcal{C}_0 = \{0, \pm 1, \pm 2\}$ [9,42,44,49,50]. Thus such transformations possess null multiplicative complexity, because the required arithmetic operations can be implemented exclusively by means of additions and bit-shifting operations. However, in [57,58], an image compression scheme based on the Tchebichef transform was advanced. This particular method employs a discrete transformation referred to as the discrete Tchebichef transform (DTT) [57]. The implied DTT low-complexity matrix possesses entries defined on $\mathcal{C}_0$ but also considers the elements $\pm 3$. Multiplications by constants $\pm 3$ can be implemented by means of one addition and one bit-shifting operation ($3 \cdot x = 2 \cdot x + x$). Thus, as suggested in [57], we adopt $\mathcal{C} = \mathcal{C}_0 \cup \{\pm 3\}$ as the domain set of the entries for the sought DCT approximations. Nevertheless, we emphasize that approximations with entries $\pm 3$ are expected to possess a higher computational complexity.

In [9] Haweel introduced a simple approach for designing a DCT approximation. The DCT approximation termed SDCT was defined as follows [9]:

\[
\operatorname{sign}(C),
\]

where $\operatorname{sign}(\cdot)$ is the signum function applied to each entry of $C$ and is given by

\[
\operatorname{sign}(x) =
\begin{cases}
+1 & \text{if } x > 0, \\
0 & \text{if } x = 0, \\
-1 & \text{if } x < 0.
\end{cases}
\]

The SDCT can be regarded as a seminal work in the field of DCT approximations.

Additionally, in [50,52,59] a low-complexity DCT approximation was proposed based on the following matrix:

\[
\operatorname{round}(2 \cdot C),
\]

where $\operatorname{round}(\cdot)$ is the entrywise rounding-off function as implemented in the Matlab/Octave programming languages [55,60]. In this work, our goal is to expand and generalize the methods employed to derive the two approximations above.

As a means to design DCT approximations, we consider integer functions [61, Chapter 3]. An integer function is simply a function whose values are integers. We aim at mapping the exact entries of the DCT matrix into integer quantities. The resulting matrix is sought to approximate the DCT. To this end, we adopt the following general mapping:

\[
\begin{aligned}
\mathbb{R} &\longrightarrow \mathcal{M}_8(\mathbb{Z}) \\
\alpha &\longmapsto \operatorname{int}(\alpha \cdot C),
\end{aligned}
\tag{4}
\]

where $\mathcal{M}_8(\mathbb{Z})$ is the space of $8 \times 8$ matrices over the set of integers $\mathbb{Z}$ and $\operatorname{int}(\cdot)$ is a prototype integer function [61, p. 67]. Function $\operatorname{int}(\cdot)$ operates entrywise over its matrix argument. Parameter $\alpha$ is termed the expansion factor and scales the exact DCT matrix, allowing a wide range of possible integer mappings [62].

Particular examples of integer functions are the floor, ceiling, truncation (round towards zero), and round-away-from-zero functions. These functions are defined, respectively, as follows:

\[
\begin{aligned}
\operatorname{floor}(x) &= \lfloor x \rfloor = \max\{ m \in \mathbb{Z} \mid m \leq x \}, \\
\operatorname{ceil}(x) &= \lceil x \rceil = \min\{ n \in \mathbb{Z} \mid n \geq x \}, \\
\operatorname{trunc}(x) &= \operatorname{sign}(x) \cdot \lfloor |x| \rfloor, \\
\operatorname{round}_{\mathrm{AFZ}}(x) &= \operatorname{sign}(x) \cdot \lceil |x| \rceil,
\end{aligned}
\]

where $|\cdot|$ returns the absolute value of its argument.

Another particularly useful integer function is the round to nearest integer function [63, p. 73]. This function possesses various definitions depending on its behavior for input arguments whose fractional part is exactly $1/2$. Thus, relevant rounding-off methods are the round-half-up, round-half-down, round-half-away-from-zero, round-half-towards-zero, round-half-to-even, and round-half-to-odd functions. These different nearest integer functions are, respectively, given by

\[
\begin{aligned}
\operatorname{round}_{\mathrm{HU}}(x) &= \lfloor x + \tfrac{1}{2} \rfloor, \qquad
\operatorname{round}_{\mathrm{HD}}(x) = \lceil x - \tfrac{1}{2} \rceil, \\
\operatorname{round}_{\mathrm{HAFZ}}(x) &= \operatorname{sign}(x) \cdot \lfloor |x| + \tfrac{1}{2} \rfloor, \\
\operatorname{round}_{\mathrm{HTZ}}(x) &= \operatorname{sign}(x) \cdot \lceil |x| - \tfrac{1}{2} \rceil, \\
\operatorname{round}_{\mathrm{EVEN}}(x) &=
\begin{cases}
x - \tfrac{1}{2} & \text{if } \tfrac{2x-1}{4} \in \mathbb{Z}, \\
\lfloor x + \tfrac{1}{2} \rfloor & \text{otherwise},
\end{cases} \\
\operatorname{round}_{\mathrm{ODD}}(x) &=
\begin{cases}
x + \tfrac{1}{2} & \text{if } \tfrac{2x-1}{4} \in \mathbb{Z}, \\
\lceil x - \tfrac{1}{2} \rceil & \text{otherwise}.
\end{cases}
\end{aligned}
\]

The round-half-away-from-zero function is the implementation employed in the round function in Matlab/Octave. The international technical standard ISO/IEC/IEEE 60559:2011 recommends $\operatorname{round}_{\mathrm{EVEN}}(\cdot)$ as the nearest integer function of choice [64]. This latter implementation is adopted in the scientific computation software Mathematica [65].

4. Computational search

4.1. Problem setup

In this section, we exhaustively compute (4) for judiciously chosen values of $\alpha$ such that the following conditions about $T = \operatorname{int}(\alpha \cdot C)$ are satisfied:

(a) matrix $T$ must possess its elements defined on $\mathcal{C}$;
(b) $T \cdot T^\top$ must be a diagonal matrix or must exhibit a small deviation from diagonality in the sense described in [66];
(c) if $T$ is not orthogonal (cf. (3)), but $T \cdot T^\top$ is approximately a diagonal matrix, then the inverse matrix $T^{-1}$ must possess low complexity with its elements defined on $\mathcal{C}$.

Condition (a) ensures that the forward transformation is a low-complexity operation. Therefore, in terms of implementation, it may require simple hardware structures. If (3) is satisfied, then the inverse transformation is guaranteed to have low computational complexity. This is because $T^{-1} = T^\top \cdot D^{-1}$, i.e., the inverse is the transpose of $T$ apart from the multiplication by a diagonal matrix. On the other hand, if (3) is not satisfied, then the resulting matrices may not be useful in contexts that depend on orthogonalization. Nevertheless, if $T \cdot T^\top$ is 'almost' diagonal, then $T$ can be of interest. In that case, one may explicitly check whether $T^{-1}$ has low complexity. Thus, Condition (c) is considered.

To quantify the deviation from diagonality, as required in Condition (b), we adopt the following measure.

Definition 1. Let $A$ be a square matrix. Then its deviation from diagonality is given by [66]

\[
\delta(A) = 1 - \frac{\| \operatorname{diag}(A) \|_F}{\| A \|_F},
\]

where $\| \cdot \|_F$ denotes the Frobenius norm for matrices [53].

As a design criterion, we adopt the deviation from diagonality exhibited by the SDCT as the maximum deviation acceptable for non-orthogonal approximations. This threshold value is equal to $1 - 2/\sqrt{5}$. The SDCT was chosen as a reference transformation because (i) it has proven good properties [9] and (ii) it is widely employed in performance comparisons [42,43,46–50]. Thus, according to this criterion, Condition (b) becomes:

(b) $\delta(T \cdot T^\top) \leq 1 - \frac{2}{\sqrt{5}} \approx 0.1056$.
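The threshold in Condition (b) can be reproduced numerically. The sketch below, a minimal illustration rather than the authors' code, forms the SDCT as the entrywise sign of the DCT cosines, computes $T \cdot T^\top$, and evaluates the deviation from diagonality of Definition 1. The helper names (`gram`, `deviation_from_diagonality`, `sign`) are hypothetical.

```python
import math

def sign(x):
    """Signum function: +1, 0, or -1."""
    return (x > 0) - (x < 0)

def gram(t):
    """Return T * T^T for a matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in t] for ri in t]

def deviation_from_diagonality(a):
    """delta(A) = 1 - ||diag(A)||_F / ||A||_F."""
    total = math.sqrt(sum(x * x for row in a for x in row))
    diag = math.sqrt(sum(a[i][i] ** 2 for i in range(len(a))))
    return 1.0 - diag / total

# Only the signs of the DCT entries matter here, so the positive
# normalization factors of C are omitted.
cosines = [[math.cos(math.pi * m * (2 * n + 1) / 16.0) for n in range(8)]
           for m in range(8)]
sdct = [[sign(c) for c in row] for row in cosines]

delta = deviation_from_diagonality(gram(sdct))
print(delta, 1.0 - 2.0 / math.sqrt(5.0))
```

Both printed values should agree up to floating-point rounding, matching the value $1 - 2/\sqrt{5}$ quoted in the text.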

In order that the entries of $\operatorname{int}(\alpha \cdot C)$ are defined on $\mathcal{C}$, we must restrict the range of $\alpha$. We notice that the largest element of $C$ is $\gamma_0/2$. Thus, it is sufficient to solve the following inequality for $\alpha$: $0 \leq \operatorname{int}(\alpha \cdot \gamma_0/2) \leq 3$. For the ceiling, floor, truncation, round-away-from-zero, and all nearest integer functions, we have the following ranges of $\alpha$: $[0, 6/\gamma_0]$, $[2/\gamma_0, 8/\gamma_0]$, $[2/\gamma_0, 8/\gamma_0]$, $[0, 6/\gamma_0]$, and $[1/\gamma_0, 7/\gamma_0]$, respectively. For computational purposes, we employed $10^6$ uniformly spaced values of $\alpha$ (cf. (4)) over each considered interval.

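Table 1
Ceiling function.

Approximation: $\tilde{T}_0$. Range of $\alpha$: $(0, 2/\gamma_4)$. Orthogonal: no.

\[
\tilde{T}_0 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}
\]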

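Table 2
Truncation function.

Approximation: $T_0$ [50]. Range of $\alpha$: $(2/\gamma_4, 4/\gamma_0)$. Orthogonal: yes.

\[
T_0 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & -1 & -1 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 0 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
0 & -1 & 1 & -1 & 1 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $T_1$. Range of $\alpha$: $(4/\gamma_0, 4/\gamma_1)$. Orthogonal: yes.

\[
T_1 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
0 & 1 & -1 & 0 & 0 & -1 & 1 & 0 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $T_2$. Range of $\alpha$: $(4/\gamma_1, 4/\gamma_2)$. Orthogonal: yes.

\[
T_2 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
2 & 0 & 0 & -2 & -2 & 0 & 0 & 2 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
0 & -2 & 2 & 0 & 0 & 2 & -2 & 0 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $T_3$. Range of $\alpha$: $(4/\gamma_4, 6/\gamma_2)$. Orthogonal: yes.

\[
T_3 = \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
3 & 2 & 2 & 0 & 0 & -2 & -2 & -3 \\
3 & 1 & -1 & -3 & -3 & -1 & 1 & 3 \\
2 & 0 & -3 & -2 & 2 & 3 & 0 & -2 \\
2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\
2 & -3 & 0 & 2 & -2 & 0 & 3 & -2 \\
1 & -3 & 3 & -1 & -1 & 3 & -3 & 1 \\
0 & -2 & 2 & -3 & 3 & -2 & 2 & 0
\end{bmatrix}
\]

Approximation: $\tilde{T}_1$. Range of $\alpha$: $(2/\gamma_3, 2/\gamma_4)$. Orthogonal: no.

\[
\tilde{T}_1 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & -1 & -1 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 \\
1 & 0 & -1 & 0 & 0 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
0 & -1 & 0 & 1 & -1 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 & 1 & -1 & 0 & 0
\end{bmatrix}
\]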

4.2. Obtained approximations

In terms of Condition (a), the ceiling function could supply only one low-complexity candidate matrix for DCT approximation as displayed in Table 1. The specific range of $\alpha$ for which the obtained matrix arises is also shown in Table 1. Henceforth, we adopt the following notation: orthogonal matrices are denoted as $T_k$, $k = 0, 1, 2, \ldots$; and non-orthogonal ones are referred to as $\tilde{T}_k$, $k = 0, 1, 2, \ldots$. However, notice that this particular matrix (Table 1) is non-orthogonal and its deviation from diagonality is exceedingly high. In fact, we have that $\delta(\tilde{T}_0) \approx 0.4548$. For this reason, this approximation will not be further considered in this paper. Hereafter, we only list matrices that satisfy all prescribed conditions.

The floor function could not furnish any matrix under the prescribed requirements. On the other hand, when considering the truncation function, five matrices were obtained, which are listed in Table 2. Both orthogonal and non-orthogonal matrices were found. Similarly, for the round-away-from-zero function, a set of four distinct matrices was derived. These matrices are shown in Table 3. From Tables 2 and 3, we notice that matrices $T_0$ and $\tilde{T}_2$ coincide with the rounded DCT reported in [50] and the SDCT [9], respectively. Moreover, from Table 3, we have that

\[
\tilde{T}_4 = \operatorname{diag}(2, 1, 1, 1, 2, 1, 1, 1) \cdot \tilde{T}_3. \tag{5}
\]

As a consequence, by means of (2), both $\tilde{T}_3$ and $\tilde{T}_4$ lead to the same DCT approximation.

The nearest integer function may result in different approximations when the choice of $\alpha$ results in a matrix $\alpha \cdot C$ whose entries are possibly half-integers. The values of $\alpha$ that effect half-integers are of the form $l/\gamma_k$, $k = 0, 1, \ldots, 6$, where $l \in \mathbb{Z}$. Apart from these critical points, the different types of nearest integer functions behave identically. By examining these boundary cases, we could establish intervals for which each of the discussed nearest integer functions results in meaningful DCT approximations. Table 4 brings the obtained low-complexity matrices derived from the round-half-up and round-half-down

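Table 3
Round-away-from-zero function.

Approximation: $T_4$. Range of $\alpha$: $1/\gamma_0$. Orthogonal: yes.

\[
T_4 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 0 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
0 & -1 & 1 & -1 & 1 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $\tilde{T}_2$ [9]. Range of $\alpha$: $(0, 2/\gamma_0)$. Orthogonal: no.

\[
\tilde{T}_2 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}
\]

Approximation: $\tilde{T}_3$. Range of $\alpha$: $(2/\gamma_2, 2/\gamma_3]$. Orthogonal: no.

\[
\tilde{T}_3 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & -1 & -1 & -2 & -2 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
2 & -1 & -2 & -1 & 1 & 2 & 1 & -2 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 1 & 2 & -2 & -1 & 2 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
1 & -1 & 2 & -2 & 2 & -2 & 1 & -1
\end{bmatrix}
\]

Approximation: $\tilde{T}_4$. Range of $\alpha$: $(2/\gamma_3, 2/\gamma_4)$. Orthogonal: no.

\[
\tilde{T}_4 = \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
2 & 2 & 1 & 1 & -1 & -1 & -2 & -2 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
2 & -1 & -2 & -1 & 1 & 2 & 1 & -2 \\
2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\
1 & -2 & 1 & 2 & -2 & -1 & 2 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
1 & -1 & 2 & -2 & 2 & -2 & 1 & -1
\end{bmatrix}
\]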

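Table 4
Round-half-up and round-half-down functions.

Approximation: $T_0$ [50]. Transformation matrix: see Table 2. Range of $\alpha$: $(1/\gamma_4, 1/\gamma_5)$. Orthogonal: yes.

Approximation: $T_4$. Transformation matrix: see Table 3. Range of $\alpha$: $(1/\gamma_5, 3/\gamma_0)$. Orthogonal: yes.

Approximation: $T_5$. Range of $\alpha$: $(3/\gamma_0, 3/\gamma_1)$. Orthogonal: yes.

\[
T_5 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $T_6$ [51]. Range of $\alpha$: $(3/\gamma_1, 3/\gamma_2)$. Orthogonal: yes.

\[
T_6 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 1 & 0 & 0 & -1 & -1 & -2 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
1 & 0 & -2 & -1 & 1 & 2 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -2 & 0 & 1 & -1 & 0 & 2 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
0 & -1 & 1 & -2 & 2 & -1 & 1 & 0
\end{bmatrix}
\]

Approximation: $T_7$. Range of $\alpha$: $(1/\gamma_6, 3/\gamma_4)$. Orthogonal: yes.

\[
T_7 = \begin{bmatrix}
2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
3 & 2 & 1 & 1 & -1 & -1 & -2 & -3 \\
2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\
2 & -1 & -3 & -1 & 1 & 3 & 1 & -2 \\
2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\
1 & -3 & 1 & 2 & -2 & -1 & 3 & -1 \\
1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\
1 & -1 & 2 & -3 & 3 & -2 & 1 & -1
\end{bmatrix}
\]

Approximation: $\tilde{T}_1$. Transformation matrix: see Table 2. Range of $\alpha$: $(1/\gamma_3, 1/\gamma_4)$. Orthogonal: no.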
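The orthogonality flag of an entry in Table 4 can be checked directly, and the same computation yields the scaling $S = \sqrt{D^{-1}}$ that orthonormalizes $T$ as in (2). The sketch below does this for $T_5$. The matrix is transcribed with minus signs placed according to the sign pattern of $C$, which is an assumption about the source rather than a verified listing; `gram` is a hypothetical helper.

```python
import math

# T5 as read from Table 4; signs follow the sign pattern of the DCT matrix C (assumed).
T5 = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [2,  1,  1,  0,  0, -1, -1, -2],
    [1,  1, -1, -1, -1, -1,  1,  1],
    [1,  0, -2, -1,  1,  2,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [1, -2,  0,  1, -1,  0,  2, -1],
    [1, -1,  1, -1, -1,  1, -1,  1],
    [0, -1,  1, -2,  2, -1,  1,  0],
]

def gram(t):
    """Return T * T^T for a matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in t] for ri in t]

G = gram(T5)
d = [G[i][i] for i in range(8)]                      # diagonal of D = T * T^T
is_diagonal = all(G[i][j] == 0 for i in range(8) for j in range(8) if i != j)

# Orthonormal approximation C_hat = S * T with S = sqrt(D^{-1}) (diagonal case).
C_hat = [[x / math.sqrt(d[i]) for x in row] for i, row in enumerate(T5)]

print("T5 * T5^T diagonal:", is_diagonal)
print("diagonal elements:", d)
```

In a JPEG-like codec the division by $\sqrt{d_i}$ would not be carried out explicitly, since the text notes that such diagonal factors can be absorbed into the quantization step.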

Table 5
Nearest integer functions.

Function                                   Range of $\alpha$              Transformation matrix
$\operatorname{round}_{\mathrm{HAFZ}}(\cdot)$    $[1/\gamma_4, 1/\gamma_5)$     $T_0$
                                           $[1/\gamma_5, 3/\gamma_0)$     $T_4$
                                           $(3/\gamma_0, 3/\gamma_1)$     $T_5$
                                           $(3/\gamma_1, 3/\gamma_2)$     $T_6$
                                           $[1/\gamma_6, 3/\gamma_4)$     $T_7$
                                           $[1/\gamma_3, 1/\gamma_4)$     $\tilde{T}_1$
$\operatorname{round}_{\mathrm{HTZ}}(\cdot)$     $(1/\gamma_4, 1/\gamma_5)$     $T_0$
                                           $[1/\gamma_5, 3/\gamma_0)$     $T_4$
                                           $[3/\gamma_0, 3/\gamma_1)$     $T_5$
                                           $[3/\gamma_1, 3/\gamma_2)$     $T_6$
                                           $(1/\gamma_6, 3/\gamma_4)$     $T_7$
                                           $[1/\gamma_3, 1/\gamma_4)$     $\tilde{T}_1$
$\operatorname{round}_{\mathrm{EVEN}}(\cdot)$    $(1/\gamma_4, 1/\gamma_5)$     $T_0$
                                           $[1/\gamma_5, 3/\gamma_0)$     $T_4$
                                           $(3/\gamma_0, 3/\gamma_1)$     $T_5$
                                           $(3/\gamma_1, 3/\gamma_2)$     $T_6$
                                           $[1/\gamma_6, 3/\gamma_4)$     $T_7$
                                           $[1/\gamma_3, 1/\gamma_4)$     $\tilde{T}_1$
$\operatorname{round}_{\mathrm{ODD}}(\cdot)$     $[1/\gamma_4, 1/\gamma_5)$     $T_0$
                                           $[1/\gamma_5, 3/\gamma_0)$     $T_4$
                                           $[3/\gamma_0, 3/\gamma_1)$     $T_5$
                                           $[3/\gamma_1, 3/\gamma_2)$     $T_6$
                                           $[1/\gamma_6, 3/\gamma_4)$     $T_7$
                                           $[1/\gamma_3, 1/\gamma_4)$     $\tilde{T}_1$

Table 6
Orthogonal approximations.

Approximation $T$    Diagonal elements of $T \cdot T^\top$
$T_0$ [50]           [8 6 4 6 8 6 4 6]
$T_1$                [8 12 4 12 8 12 4 12]
$T_2$                [8 12 16 12 8 12 16 12]
$T_3$                [32 34 40 34 32 34 40 34]
$T_4$                [8 6 8 6 8 6 8 6]
$T_5$                [8 12 8 12 8 12 8 12]
$T_6$ [51]           [8 12 20 12 8 12 20 12]
$T_7$                [32 30 20 30 32 30 20 30]

functions. Among the resulting matrices, we identify $T_6$ as the approximation described in [51]. The remaining nearest integer functions could not supply any distinct matrix apart from the above listed ones. In Table 5, we show the resulting matrices.

Considering orthogonal matrices, as shown in (2), the diagonal matrix $S = \sqrt{D^{-1}}$ is required to orthonormalize $T$. In Table 6, the required diagonal elements are listed. As described in [42,44–47,50,56], in the context of JPEG-like image compression, these diagonal matrices represent no additional arithmetic complexity. This is because they can be absorbed into the image quantization step [16].

Regarding the obtained non-orthogonal approximations, in Table 7, we show the deviation from diagonality values for the associated matrix $D$ as in (3). The proposed matrices $\tilde{T}_3$ and $\tilde{T}_4$ showed significantly lower deviation from diagonality when compared to the well-known SDCT.

We explicitly computed the inverse of the non-orthogonal matrices, confirming their low-complexity

characteristic. Indeed, we have the following inverse matrices:

\[
\tilde{T}_1^{-1} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 0 & -1 & -1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & 0 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & 0 & 1 \\
1 & -1 & 0 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 0 & 1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 0 & -1
\end{bmatrix}
\cdot \operatorname{diag}\left( \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4} \right)
= \frac{1}{8}
\begin{bmatrix}
1 & 2 & 2 & 2 & 1 & 2 & 0 & 2 \\
1 & 2 & 0 & -2 & -1 & -2 & -2 & -2 \\
1 & 2 & 0 & -2 & -1 & 2 & 2 & 2 \\
1 & 2 & -2 & -2 & 1 & 2 & 0 & -2 \\
1 & -2 & -2 & 2 & 1 & -2 & 0 & 2 \\
1 & -2 & 0 & 2 & -1 & -2 & 2 & -2 \\
1 & -2 & 0 & 2 & -1 & 2 & -2 & 2 \\
1 & -2 & 2 & -2 & 1 & -2 & 0 & -2
\end{bmatrix},
\]

\[
\tilde{T}_2^{-1} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & -1 & -1 & -1 & 0 \\
1 & 0 & -1 & -1 & -1 & 0 & 1 & 1 \\
1 & 0 & -1 & 0 & 1 & 1 & -1 & -1 \\
1 & 0 & -1 & 0 & 1 & -1 & -1 & 1 \\
1 & 0 & -1 & 1 & -1 & 0 & 1 & -1 \\
1 & -1 & 1 & 0 & -1 & 1 & -1 & 0 \\
1 & -1 & 1 & -1 & 1 & 0 & 1 & 0
\end{bmatrix}
\cdot \operatorname{diag}\left( \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{4} \right)
= \frac{1}{8}
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 & 1 & 0 \\
1 & 2 & 1 & 0 & -1 & -2 & -1 & 0 \\
1 & 0 & -1 & -2 & -1 & 0 & 1 & 2 \\
1 & 0 & -1 & 0 & 1 & 2 & -1 & -2 \\
1 & 0 & -1 & 0 & 1 & -2 & -1 & 2 \\
1 & 0 & -1 & 2 & -1 & 0 & 1 & -2 \\
1 & -2 & 1 & 0 & -1 & 2 & -1 & 0 \\
1 & -2 & 1 & -2 & 1 & 0 & 1 & 0
\end{bmatrix},
\]

\[
\tilde{T}_3^{-1} =
\begin{bmatrix}
1 & 3 & 2 & 3 & 1 & 1 & 1 & 1 \\
1 & 3 & 1 & -1 & -1 & -3 & -2 & -1 \\
1 & 1 & -1 & -3 & -1 & 1 & 2 & 3 \\
1 & 1 & -2 & -1 & 1 & 3 & -1 & -3 \\
1 & -1 & -2 & 1 & 1 & -3 & -1 & 3 \\
1 & -1 & -1 & 3 & -1 & -1 & 2 & -3 \\
1 & -3 & 1 & 1 & -1 & 3 & -2 & 1 \\
1 & -3 & 2 & -3 & 1 & -1 & 1 & -1
\end{bmatrix}
\cdot \operatorname{diag}\left( \tfrac{1}{8}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}, \tfrac{1}{8}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28} \right),
\]

\[
\tilde{T}_4^{-1} =
\begin{bmatrix}
1 & 3 & 2 & 3 & 1 & 1 & 1 & 1 \\
1 & 3 & 1 & -1 & -1 & -3 & -2 & -1 \\
1 & 1 & -1 & -3 & -1 & 1 & 2 & 3 \\
1 & 1 & -2 & -1 & 1 & 3 & -1 & -3 \\
1 & -1 & -2 & 1 & 1 & -3 & -1 & 3 \\
1 & -1 & -1 & 3 & -1 & -1 & 2 & -3 \\
1 & -3 & 1 & 1 & -1 & 3 & -2 & 1 \\
1 & -3 & 2 & -3 & 1 & -1 & 1 & -1
\end{bmatrix}
\cdot \operatorname{diag}\left( \tfrac{1}{16}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28}, \tfrac{1}{16}, \tfrac{1}{28}, \tfrac{1}{20}, \tfrac{1}{28} \right)
= \tilde{T}_3^{-1} \cdot \operatorname{diag}\left( \tfrac{1}{2}, 1, 1, 1, \tfrac{1}{2}, 1, 1, 1 \right).
\]

Table 7
Deviation from diagonality measure.

Approximation $T$     $\delta(T \cdot T^\top)$
$\tilde{T}_0$         0.4548
$\tilde{T}_1$         0.0646
$\tilde{T}_2$ [9]     0.1056
$\tilde{T}_3$         0.0063
$\tilde{T}_4$         0.0036
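The inverse of $\tilde{T}_3$ can be verified in exact rational arithmetic. In the sketch below, the entries are transcribed from the matrices in this section, with minus signs placed by the sign pattern of $C$ and the symmetry of the rows; treat them as a reading of the source rather than authoritative values. The names `T3_TILDE`, `M3`, and `SCALE` are illustrative.

```python
from fractions import Fraction

# Forward matrix (round-away-from-zero family); signs assumed from the DCT sign pattern.
T3_TILDE = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [2,  2,  1,  1, -1, -1, -2, -2],
    [2,  1, -1, -2, -2, -1,  1,  2],
    [2, -1, -2, -1,  1,  2,  1, -2],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [1, -2,  1,  2, -2, -1,  2, -1],
    [1, -2,  2, -1, -1,  2, -2,  1],
    [1, -1,  2, -2,  2, -2,  1, -1],
]

# Integer factor of the inverse; column j is scaled by SCALE[j].
M3 = [
    [1,  3,  2,  3,  1,  1,  1,  1],
    [1,  3,  1, -1, -1, -3, -2, -1],
    [1,  1, -1, -3, -1,  1,  2,  3],
    [1,  1, -2, -1,  1,  3, -1, -3],
    [1, -1, -2,  1,  1, -3, -1,  3],
    [1, -1, -1,  3, -1, -1,  2, -3],
    [1, -3,  1,  1, -1,  3, -2,  1],
    [1, -3,  2, -3,  1, -1,  1, -1],
]
SCALE = [Fraction(1, 8), Fraction(1, 28), Fraction(1, 20), Fraction(1, 28),
         Fraction(1, 8), Fraction(1, 28), Fraction(1, 20), Fraction(1, 28)]

# Candidate inverse: M3 * diag(SCALE).
inverse = [[M3[i][j] * SCALE[j] for j in range(8)] for i in range(8)]

# Exact product T3_TILDE * inverse; it should equal the identity.
product = [[sum(T3_TILDE[i][k] * inverse[k][j] for k in range(8)) for j in range(8)]
           for i in range(8)]
is_identity = all(product[i][j] == (1 if i == j else 0)
                  for i in range(8) for j in range(8))
print("T3_TILDE * inverse == I:", is_identity)
```

Using `Fraction` avoids any floating-point tolerance: the check either holds exactly or fails. The integer factor has entries of magnitude at most 3, so applying it needs only additions and shifts.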

Table 8
Constants required for the fast algorithm.

Approximation        m0  m1  m2  m3  m4  m5  m6
$T_0$ [50]            1   1   1   1   1   0   0
$T_1$                 2   0   1   1   1   1   0
$T_2$                 2   2   1   1   1   0   0
$T_3$                 3   3   2   2   2   1   0
$T_4$                 1   1   1   1   1   1   0
$T_5$                 2   1   1   1   1   1   0
$T_6$ [51]            2   2   1   1   1   1   0
$T_7$                 3   2   2   2   1   1   1
$\tilde{T}_1$         1   1   1   1   0   0   0
$\tilde{T}_2$ [9]     1   1   1   1   1   1   1
$\tilde{T}_3$         2   2   2   1   1   1   1
$\tilde{T}_4$         2   2   2   2   1   1   1

5. Fast algorithm

Considering usual decimation-based techniques and matrix factorization [67], fast algorithms for the obtained transformations could be derived. All discussed matrices share the same factorization structure described below:

$$T = P \cdot K \cdot B_1 \cdot B_2 \cdot B_3,$$

where $P$ is a permutation matrix, $K$ is a multiplicative matrix, and $B_1$, $B_2$, and $B_3$ are additive matrices. These matrices are given by

$$P = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},$$
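Because $P$ is a permutation matrix, applying it only reorders the eight values produced by the preceding stages and costs no arithmetic. The following Python sketch illustrates this by reading the position of the single 1 in each row of $P$ as an index map. The names `P_COLS`, `permutation_matrix`, and `apply_permutation` are illustrative and do not come from the paper.

```python
# Column index of the single 1 in each row of P, read off the matrix above.
P_COLS = [0, 4, 2, 5, 1, 7, 3, 6]


def permutation_matrix(cols):
    """Build the explicit 0/1 matrix whose row i has a 1 in column cols[i]."""
    n = len(cols)
    return [[1 if j == cols[i] else 0 for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product, used only to cross-check the index map."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def apply_permutation(cols, y):
    """Compute P @ y without any arithmetic: output i is simply y[cols[i]]."""
    return [y[c] for c in cols]


y = [10, 11, 12, 13, 14, 15, 16, 17]
P = permutation_matrix(P_COLS)

# The index map and the explicit matrix product agree.
assert matvec(P, y) == apply_permutation(P_COLS, y)
print(apply_permutation(P_COLS, y))  # [10, 14, 12, 15, 11, 17, 13, 16]
```

In a hardware or fixed-point implementation this step would typically amount to wiring or index bookkeeping rather than computation.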

Fig. 1. General signal flow graph for proposed transformations. Input data $x_n$, $n = 0, 1, \ldots, 7$, relates to output $X_k$, $k = 0, 1, \ldots, 7$, according to $X = T \cdot x$. Dashed arrows represent multiplications by $-1$. (a) Full diagram, (b) Block A.

Fig. 2. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average PSNR, and (b) average PSNR absolute percentage error relative to the DCT.

Table 9
Arithmetic complexity of the obtained approximations.

Approximation        Multiplications  Additions  Bit-shifts
$T_0$ [50]                         0         22           0
$T_1$                              0         22           4
$T_2$                              0         22           6
$T_3$                              0         30          16
$T_4$                              0         24           0
$T_5$                              0         24           4
$T_6$ [51]                         0         24           6
$T_7$                              0         32          12
$\tilde{T}_1$                      0         18           0
$\tilde{T}_2$ [9]                  0         28           0
$\tilde{T}_3$                      0         28          10
$\tilde{T}_4$                      0         28          12

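$$K = \begin{bmatrix}
m_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & m_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & m_5 & m_1 & 0 & 0 & 0 & 0 \\
0 & 0 & -m_1 & m_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & m_4 & -m_6 & m_2 & m_0 \\
0 & 0 & 0 & 0 & -m_0 & m_4 & -m_6 & m_2 \\
0 & 0 & 0 & 0 & -m_2 & -m_0 & m_4 & m_6 \\
0 & 0 & 0 & 0 & m_6 & -m_2 & -m_0 & m_4
\end{bmatrix},$$

$$B_1 = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix},$$

$$B_2 = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},$$

$$B_3 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0
\end{bmatrix},$$

where constants $m_i$, $i = 0, 1, \ldots, 6$, depend on the particular choice of transformation matrix $T$. In Table 8, these constants are listed for each of the discussed transformations. As a consequence, all transformations also share the same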
fast algorithm and signal flow structure, as presented in Fig. 1. For each transform, we could assess the arithmetic complexity, as measured by multiplication, addition, and bit-shifting operation counts. A multiplication by 3 was counted as one addition and one bit-shift operation. Results are shown in Table 9. All proposed algorithms are multiplierless, requiring only additions and bit-shift operations. As shown in (5), transforms $\tilde{T}_3$ and $\tilde{T}_4$ lead to the same approximations. Since $\tilde{T}_4$ requires more arithmetic operations than $\tilde{T}_3$ does, we do not consider $\tilde{T}_4$ for further analysis.

6. Image compression

6.1. JPEG-like compression

Discussed transformations were considered as tools for JPEG-like image compression. Adopting the computational experiment described in [42,50,56], we employed 45 512 × 512 8-bit images obtained from a public image bank [68]. All images were subdivided into 8 × 8 blocks and were submitted to a 2-D transformation similar to (1), where the exact DCT matrix was replaced with a selected DCT approximation. The resulting 64 coefficients in the transform domain were ordered in the standard zigzag sequence [16]. Only the r initial coefficients in each block were retained, the remaining coefficients being discarded. We adopted 1 ≤ r ≤ 45.

Subsequently, the inverse 2-D transform was applied and the compressed images were obtained. Original and compressed images were then evaluated for image degradation. As quality assessment measures, we considered (i) the peak signal-to-noise ratio (PSNR) [69], (ii) the structural similarity index (SSIM) [70], and (iii) the spectral residual based similarity index (SR-SIM) [71]. For each value of r, average image quality measures based on the 45 images were considered. As opposed to analyzing particular images as in [43,44,46–48], by taking average measurements, the suggested approach is less prone to variance effects and fortuitous data. Therefore, the proposed methodology is more robust [50,72].

Fig. 3. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average SSIM, and (b) average SSIM absolute percentage error relative to the DCT.

Fig. 4. Quality measures of the considered approximations for several values of r according to the following figures of merit: (a) average SR-SIM, and (b) average SR-SIM absolute percentage error relative to the DCT.

6.2. Results and discussion

Figs. 2(a), 3(a), and 4(a) show the obtained plots based on the selected quality assessment measures. In order to enhance visualization of the results, we considered the absolute percentage error (APE) relative to the DCT as shown in Figs. 2(b), 3(b), and 4(b). The APE measures simply represent relative errors in percentage. For instance, considering the
PSNR, the APE is calculated according to the following expression:

$$\mathrm{APE}(\mathrm{PSNR}) = \left| \frac{\mathrm{PSNR}_C - \mathrm{PSNR}_T}{\mathrm{PSNR}_C} \right|,$$

where $\mathrm{PSNR}_C$ and $\mathrm{PSNR}_T$ are the values of PSNR considering the exact DCT and a given approximation $T$, respectively. The values of APE(SSIM) and APE(SR-SIM) are calculated in a similar manner.

Approximation $T_7$ outperformed all other approximations in terms of PSNR. This is partially expected because, by using all possible elements in C, it may potentially better approximate the actual DCT vector basis. Nevertheless, it should be noticed that $T_7$ also possesses the highest computational cost among the examined transformations. On the other hand, the non-orthogonal approximation $\tilde{T}_3$ outperformed all other approximations in terms of SSIM and SR-SIM in high compression scenarios (r ≤ 9). Approximation $\tilde{T}_1$ has the lowest computational complexity, requiring only 18 additions. The orthogonal approximation $T_3$ showed comparable performance to the non-orthogonal approximation $\tilde{T}_3$. However, the non-orthogonal approximation is less computationally expensive, requiring 28 additions and 10 bit-shifts, whereas $T_3$ requires 30 additions and 16 bit-shifting operations. Comparing the approximations with 22 additions, we have that $T_0$ could outperform approximations $T_1$ and $T_2$ in terms of PSNR, SSIM, and SR-SIM measures for all considered values of r. In terms of the approximations with 24 additions, we have that $T_4$ showed better behavior than approximations $T_5$ and $T_6$ according to the considered measures. Moreover, in comparison, $T_4$ has the advantage of requiring no bit-shifting operations. Focusing on the non-orthogonal transforms, $\tilde{T}_3$ presented the best performance in terms of PSNR, SSIM, and SR-SIM measures. However, $\tilde{T}_1$ and the SDCT showed lower computational complexities.

The preceding discussion permits us to identify the approximations with the better trade-off between performance and complexity. Thus, we separate the following approximations: $T_0$, $T_4$, $\tilde{T}_1$, and $\tilde{T}_3$. Considering this restricted set of transformations, we processed two particular images for qualitative analysis. The SDCT and DCT were also considered for comparison purposes. Figs. 5 and 6 show the ‘boat’ and ‘Lena’ images after being submitted to the JPEG-like compression experiment for r = 10 and r = 25, respectively. We also included the associated difference images with respect to the original image in order to qualitatively illustrate the small amount of error effected by the proposed approximations. Measurements for PSNR, SSIM, and SR-SIM are also included.

Fig. 5. Compressed ‘boat’ image using (a) $T_0$ [50], (b) $T_4$, (c) $\tilde{T}_1$, (d) $\tilde{T}_2$ [9], (e) $\tilde{T}_3$, and (f) DCT, for r = 10.

Fig. 6. Compressed ‘Lena’ image using (a) $T_0$ [50], (b) $T_4$, (c) $\tilde{T}_1$, (d) $\tilde{T}_2$ [9], (e) $\tilde{T}_3$, and (f) DCT, for r = 25.

7. Conclusion

This paper introduces a collection of DCT approximations derived from the application of common integer functions to the exact DCT. The proposed mathematical formalism could encompass, as particular cases, several transforms already archived in the literature. In particular, the well-known SDCT was derived as a result of the suggested systematic procedure. All proposed transforms were given fast algorithms that have the same structure. This suggests a common algebraic framework shared among all discussed approximations. Only additions and simple bit-shifting operations were necessary for their evaluation. The low-complexity property of the obtained approximations makes them suitable for hardware implementation in dedicated architecture employing fixed-point arithmetic. The proposed approximations were assessed in terms of computational complexity and performance in JPEG-like compression, exhibiting a good balance between cost and performance.

Acknowledgments

This research was partially supported by CNPq, CAPES, FACEPE, and FAPERGS.

References

[1] K.R. Rao, P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, San Diego, CA, 1990.
[2] V. Britanak, P. Yip, K.R. Rao, Discrete Cosine and Sine Transforms, Academic Press, Boston, MA, 2007.
[3] N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform, IEEE Trans. Comput. C-23 (1) (1974) 90–93.
[4] J.L. Doob, Stochastic Processes, first edition 1953, John Wiley & Sons, New York, 1990.
[5] I.I. Gikhman, A.V. Skorohod, The Theory of Stochastic Processes, Springer-Verlag, New York, 1974.
[6] K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Annales Academiae scientiarum Fennicae: Mathematica-Physica, Universitat Helsinki, 1947.
[7] R.J. Clarke, Relation between the Karhunen–Loève and cosine transforms, IEEE Proc. F Commun. Radar Signal Process. 128 (6) (1981) 359–360.
[8] J. Liang, T.D. Tran, Fast multiplierless approximation of the DCT with the lifting scheme, IEEE Trans. Signal Process. 49 (2001) 3032–3044.
[9] T.I. Haweel, A new square wave transform based on the DCT, Signal Process. 82 (2001) 2309–2319.
[10] F. Gianfelici, RBF-based technique for statistical demodulation of pathological tremor, IEEE Trans. Neural Netw. Learn. Syst. 24 (10) (2013) 1565–1574.
[11] F. Gianfelici, D. Farina, An effective classification framework for brain–computer interfacing based on a combinatoric setting, IEEE Trans. Signal Process. 60 (3) (2012) 1446–1459.
[12] C. Turchetti, G. Biagetti, F. Gianfelici, P. Crippa, Nonlinear system identification: an effective framework based on the Karhunen–Loève transform, IEEE Trans. Signal Process. 57 (2) (2009) 536–550.
[13] F. Gianfelici, C. Turchetti, P. Crippa, A non-probabilistic recognizer of stochastic signals based on KLT, Signal Process. 89 (4) (2009) 422–437.
[14] M. Gastpar, P.-L. Dragotti, M. Vetterli, The distributed Karhunen–Loève transform, IEEE Trans. Inf. Theory 52 (12) (2006) 5177–5196.
[15] V. Bhaskaran, K. Konstantinides, Image and Video Compression Standards, Kluwer Academic Publishers, Boston, 1997.
[16] G.K. Wallace, The JPEG still picture compression standard, IEEE Trans. Consum. Electron. 38 (1) (1992) xviii–xxxiv.
[17] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, 1992.
[18] N. Roma, L. Sousa, Efficient hybrid DCT-domain algorithm for video spatial downscaling, EURASIP J. Adv. Signal Process. 2007 (2) (2007) 1–16.
[19] International Organisation for Standardisation, Generic Coding of Moving Pictures and Associated Audio Information – Part 2: Video, ISO/IEC JTC1/SC29/WG11 – Coding of Moving Pictures and Audio, ISO, 1994.
[20] International Telecommunication Union, ITU-T Recommendation H.261 Version 1: Video Codec for Audiovisual Services at p × 64 kbits, Technical Report, ITU-T, 1990.
[21] International Telecommunication Union, ITU-T Recommendation H.263 Version 1: Video Coding for Low Bit Rate Communication, Technical Report, ITU-T, 1995.
[22] T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 560–576.
[23] Joint Video Team, Recommendation H.264 and ISO/IEC 14 496–10 AVC: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, Technical Report, ITU-T, 2003.
[24] A. Luthra, G.J. Sullivan, T. Wiegand, Introduction to the special issue on the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 557–559.
[25] M.T. Pourazad, C. Doutre, M. Azimi, P. Nasiopoulos, HEVC: the new gold standard for video compression: How does HEVC compare with H.264/AVC? IEEE Consum. Electron. Mag. 1 (3) (2012) 36–46.
[26] F. Bossen, B. Bross, K. Suhring, D. Flynn, HEVC complexity and implementation analysis, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1685–1696.
[27] G.J. Sullivan, J. Ohm, W.-J. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1649–1668.
[28] M. Vetterli, H. Nussbaumer, Simple FFT and DCT algorithms with reduced number of operations, Signal Process. 6 (1984) 267–278.
[29] H.S. Hou, A fast recursive algorithm for computing the discrete cosine transform, IEEE Trans. Acoust. Signal Speech Process. 6 (10) (1987) 1455–1461.
[30] Z. Wang, Fast algorithms for the discrete W transform and for the discrete Fourier transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984) 803–816.
[31] B.G. Lee, A new algorithm for computing the discrete cosine transform, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984) 1243–1245.
[32] Y. Arai, T. Agui, M. Nakajima, A fast DCT-SQ scheme for images, Trans. IEICE E-71 (11) (1988) 1095–1097.
[33] C. Loeffler, A. Ligtenberg, G. Moschytz, Practical fast 1D DCT algorithms with 11 multiplications, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 988–991.
[34] E. Feig, S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process. 40 (9) (1992) 2174–2193.
[35] B. Vasudev, N. Merhav, DCT mode conversions for field/frame coded MPEG video, in: IEEE Second Workshop on Multimedia Signal Processing, 1998, pp. 605–610.
[36] M.C. Lin, L.R. Dung, P.K. Weng, An ultra-low-power image compressor for capsule endoscope, BioMed. Eng. OnLine 5 (1) (2006) 1–8.
[37] H.L.P.A. Madanayake, R.J. Cintra, D. Onen, V.S. Dimitrov, L.T. Bruton, Algebraic integer based 8 × 8 2-D DCT architecture for digital video processing, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2011, pp. 1247–1250.
[38] N. Rajapaksha, A. Edirisuriya, A. Madanayake, R.J. Cintra, D. Onen, I. Amer, V.S. Dimitrov, Asynchronous realization of algebraic integer-based 2D DCT using Achronix Speedster SPD60 FPGA, J. Electr. Comput. Eng. 2013 (2013) 1–9.
[39] A. Edirisuriya, A. Madanayake, V. Dimitrov, R.J. Cintra, J. Adikari, VLSI architecture for 8-point AI-based Arai DCT having low area-time complexity and power at improved accuracy, J. Low Power Electron. Appl. 2 (2) (2012) 127–142.
[40] A. Edirisuriya, A. Madanayake, R.J. Cintra, V.S. Dimitrov, N.T. Rajapaksha, A single-channel architecture for algebraic integer based 8 × 8 2-D DCT computation, IEEE Trans. Circuits Syst. Video Technol. 23 (12) (2013) 2083–2089.
[41] M.T. Heideman, C.S. Burrus, Multiplicative Complexity, Convolution, and The DFT, Signal Processing and Digital Filtering, Springer-Verlag, New York, NY, 1988.
[42] F.M. Bayer, R.J. Cintra, DCT-like transform for image compression requires 14 additions only, Electron. Lett. 48 (15) (2012) 919–921.
[43] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A multiplication-free transform for image compression, in: 2nd International Conference on Signals, Circuits and Systems, 2008, pp. 1–4.

[44] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A low-complexity parametric transform for image compression, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2011.
[45] K. Lengwehasatit, A. Ortega, Scalable variable complexity approximate forward DCT, IEEE Trans. Circuits Syst. Video Technol. 14 (11) (2004) 1236–1248.
[46] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, Low-complexity 8 × 8 transform for image compression, Electron. Lett. 44 (21) (2008) 1249–1250.
[47] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A fast 8 × 8 transform for image compression, in: International Conference on Microelectronics (ICM), 2009, pp. 74–77.
[48] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, A novel transform for image compression, in: 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010, pp. 509–512.
[49] S. Bouguezel, M.O. Ahmad, M.N.S. Swamy, Binary discrete cosine and Hartley transforms, IEEE Trans. Circuits Syst. I: Regul. Pap. 60 (4) (2013) 989–1002.
[50] R.J. Cintra, F.M. Bayer, A DCT approximation for image compression, IEEE Signal Process. Lett. 18 (10) (2011) 579–582.
[51] U.S. Potluri, A. Madanayake, R.J. Cintra, F.M. Bayer, N. Rajapaksha, Multiplier-free DCT approximations for RF multi-beam digital aperture-array space imaging and directional sensing, Meas. Sci. Technol. 23 (11) (2012) 114003.
[52] R.J. Cintra, An integer approximation method for discrete sinusoidal transforms, J. Circuits Syst. Signal Process. 30 (6) (2011) 1481–1501.
[53] G.A.F. Seber, A Matrix Handbook for Statisticians, John Wiley & Sons, Inc., Hoboken, NJ, 2008.
[54] N.J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl. 88/89 (1987) 405–430.
[55] MATLAB, version 8.1 (R2013a) Documentation, The MathWorks Inc., Natick, MA, 2013.
[56] F.M. Bayer, R.J. Cintra, A. Edirisuriya, A. Madanayake, A digital hardware fast algorithm and FPGA-based prototype for a novel 16-point approximate DCT for image compression applications, Meas. Sci. Technol. 23 (8) (2012) 114010.
[57] S. Ishwar, P.K. Meher, M.N.S. Swamy, Discrete Tchebichef transform: A fast 4 × 4 algorithm and its application in image/video compression, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 260–263.
[58] K. Nakagaki, R. Mukundan, A fast 4 × 4 forward discrete Tchebichef transform algorithm, IEEE Signal Process. Lett. 14 (10) (2007) 684–687.
[59] F.M. Bayer, R.J. Cintra, Image compression via a fast DCT approximation, IEEE Lat. Am. Trans. 8 (6) (2010) 708–713.
[60] J.W. Eaton, D. Bateman, S. Hauberg, GNU Octave Manual Version 3, Network Theory Limited, 2008.
[61] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics, 2nd edition, Addison-Wesley, Upper Saddle River, NJ, 2008.
[62] G. Plonka, A global method for invertible integer DCT and integer wavelet algorithms, Appl. Comput. Harmon. Anal. 16 (2004) 90–110.
[63] K. Oldham, J. Myland, J. Spanier, An Atlas of Functions, 2nd edition, Springer, 2008.
[64] International Organization for Standardization, ISO/IEC/IEEE 60559:2011, 2011.
[65] Wolfram Research, Round: nearest integer function, http://functions.wolfram.com/IntegerFunctions/Round/27/01/01/01/, September 2013.
[66] B.N. Flury, W. Gautschi, An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form, SIAM J. Sci. Stat. Comput. 7 (1) (1986) 169–184.
[67] R.E. Blahut, Fast Algorithms for Signal Processing, Cambridge University Press, Cambridge, UK, 2010.
[68] The USC-SIPI Image Database, http://sipi.usc.edu/database/, University of Southern California, Signal and Image Processing Institute, 2011.
[69] Q. Huynh-Thu, M. Ghanbari, Scope of validity of PSNR in image/video quality assessment, Electron. Lett. 44 (13) (2008) 800–801.
[70] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[71] L. Zhang, H. Li, SR-SIM: a fast and high performance IQA index based on spectral residual, in: 2012 19th IEEE International Conference on Image Processing (ICIP), 2012, pp. 1473–1476. http://dx.doi.org/10.1109/ICIP.2012.6467149.
[72] S.M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall Signal Processing Series, vol. 1, Prentice Hall, Upper Saddle River, NJ, 1993.
