This paper has been accepted by TCSVT.

LETRIST: Locally Encoded Transform Feature Histogram for Rotation-Invariant Texture Classification

Tiecheng Song, Member, IEEE, Hongliang Li, Senior Member, IEEE, Fanman Meng, Member, IEEE,
Qingbo Wu, Member, IEEE, and Jianfei Cai, Senior Member, IEEE

Abstract—Classifying texture images, especially those with significant rotation, illumination, scale and viewpoint changes, is a fundamental and challenging problem in computer vision. This paper proposes a simple yet effective image descriptor, called Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. LETRIST is a histogram representation that explicitly encodes the joint information within an image across feature and scale spaces. The proposed representation is training-free, low-dimensional, yet discriminative and robust for texture description. It consists of the following major steps. 1) A set of transform features is constructed to characterize local texture structures and their correlation by applying linear and non-linear operators to the extremum responses of directional Gaussian derivative filters in scale space. Established on the basis of steerable filters, the constructed transform features are exactly rotationally invariant as well as computationally efficient. 2) Scalar quantization via binary or multi-level thresholding is adopted to quantize these transform features into texture codes. Two quantization schemes are designed, both of which are robust to image rotation and illumination changes. 3) Cross-scale joint coding is explored to aggregate the discrete texture codes into a compact histogram representation, i.e., LETRIST. Experimental results on the Outex, CUReT, KTH-TIPS and UIUC texture datasets show that LETRIST consistently produces classification results that are better than or comparable to those of the state-of-the-art approaches. Impressively, recognition rates of 100.00% and 99.00% have been achieved on the Outex and KTH-TIPS datasets, respectively. In addition, the noise robustness is evaluated on the Outex and CUReT datasets. The source code is publicly available at https://github.com/stc-cqupt/letrist.

Index Terms—Texture classification, texture analysis, rotation invariance, steerable filter, texton, local binary pattern (LBP).

(Copyright © 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org. This work was supported in part by the National Natural Science Foundation of China (No. 61525102, No. 61601102 and No. 61502084). T. Song is with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China (email: tggwin@gmail.com). H. Li, F. Meng and Q. Wu are with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (email: hlli@uestc.edu.cn; fmmeng@uestc.edu.cn; wqb.uestc@gmail.com). H. Li is the corresponding author. J. Cai is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (email: asjfcai@ntu.edu.sg).)

I. INTRODUCTION

EXTRACTION of texture features serves as a fundamental building block in a variety of visual applications such as texture classification [1], face recognition [2], medical diagnosis [3], image matching [4], [5], and scene/object recognition [6], [7]. In the real world, textures not only have complex intrinsic patterns which are difficult to describe using a universal model, but also exhibit large extrinsic variations (e.g., rotation, illumination, and deformation) due to different imaging conditions (as shown in Fig. 6). If no a priori information about these imaging conditions is available, successful classification of these textures is a highly challenging task. In such a scenario, a good image feature representation should have the following desirable properties for texture classification:

1) Discriminative to distinguish different classes of textures [1], [8]–[10];
2) Invariant to image transformations such as rotation, illumination, scale, and viewpoint changes [1], [9], [11];
3) Insensitive to noise [10], [12]–[16];
4) Low-dimensional to facilitate subsequent operations (e.g., storage, matching, or classification) [17], [18];
5) Efficient for implementation [1], [10], [19], [20].

Although a large number of approaches have been proposed for texture analysis, to our knowledge, most of them cannot well balance the aforementioned properties [9]. For instance, the texton-based feature representation introduced by Varma and Zisserman [11] can extract discriminative yet robust features when classifying texture images, especially those with significant scale and viewpoint changes (see Section IV). However, it usually involves a high-dimensional feature representation, and requires an extra learning phase and a costly nearest-neighbor calculation. As for the local binary pattern (LBP) introduced by Ojala et al. [1], it is simple to implement and also robust to rotation and illumination changes. However, it is very sensitive to noise. Recently, the binary rotation invariant and noise tolerant (BRINT) descriptor was proposed in [10]. Nonetheless, its discriminative power is compromised by the overemphasized noise robustness, and it also needs a high-dimensional feature representation.

To better handle these five properties, in this paper we propose a novel image feature representation, called Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. In the proposed method, we first compute the extremum responses of the first and second directional Gaussian derivative filters at multiple scales. This is accomplished by the extremum filtering (in this paper, extremum filtering refers to the filtering operations for computing the directional extremum, i.e., maximum and minimum, responses) established on the basis of steerable filters [21].
The resulting extremum responses are exactly rotation-invariant and also easy to implement. Then, we construct a set of transform features by performing linear and non-linear operators on these extremum responses to capture discriminative texture information. Derived from the first- and second-order image derivatives, the constructed transform features can characterize not only local texture structures (e.g., edges, lines and blobs) but also local second-order curvatures (e.g., caps, ridges, ruts and cups). Next, we directly quantize these transform features into discrete texture codes via simple binary or multi-level thresholding. In particular, two types of scalar quantization, i.e., the ratio and uniform quantizers, are designed, both of which are robust to image rotation and illumination changes. Finally, we jointly encode these texture codes across scales to build feature histograms, which are further concatenated to form the image descriptor, i.e., LETRIST. All of the above steps contribute to a simple, low-dimensional, yet discriminative and robust image descriptor. Experimental results demonstrate the effectiveness of our method in classifying textures under various image transformations and even in the presence of Gaussian noise.

There are some existing works involving Gaussian derivative filters. Varma and Zisserman [11] employed the Maximum Response 8 (MR8) filter bank to build a texton-based image representation. Zhang et al. [20] used the second directional derivative of Gaussian filter to build a local energy pattern (LEP) histogram. In [22], the maximum responses of Gaussian derivative filters were also used to extract continuous rotation-invariant features for a texton-based representation. In our previous work [23], we adopted the edge and bar filters to build a binary coding based texture representation. All these works differ from the proposed method in that they either model images as a texton-based representation or use multiple oriented filters to extract approximately rotation-invariant features. In contrast, the proposed method is training-free and can achieve exact rotation invariance via the extremum filtering.

Our method is related to the work [19], where six basic image features (BIF) are computed from the responses of Gaussian derivative filters to build a BIF-column representation. However, there are three major differences. Firstly, our feature construction is based on the extremum filtering while [19] is based on the study of local symmetry types. Secondly, unlike [19] which uses simple combinations of filter responses, we introduce non-linear operators and consider the correlation between the first- and second-order features. Thirdly, instead of using the BIF-column representation, we couple our transform features with the tailored scalar quantization and joint coding, making our image descriptor very compact.

The proposed LETRIST also differs from our previous work [24], which explores joint space-frequency features for texture representation. Firstly, we generalize [24] in spatial filtering and develop new pixel-wise transform features via the extremum filtering. Secondly, beyond the mean-based quantization used in [24], we design the ratio and uniform quantizers to fit our constructed features. Thirdly, by extending the joint scale coding, we propose the adjacent-scale coding to form a much more compact image representation. Lastly, without resorting to the Fourier transform used in [24], we make our method computationally more efficient by taking advantage of steerable filters to construct the transform features.

Recently, there has been increasing interest in using deep neural networks [25] to compute invariant image representations, such as DeCAF [26], the wavelet scattering networks (ScatNet) [27], and PCANet [28]. A typical architecture of deep convolutional neural networks (CNNs) [29] alternates several layers of convolutional filtering, non-linear rectification and feature pooling, followed by a classification module. Thus, a hierarchy of features corresponding to different levels of abstraction can be learned. In particular, the convolutional filters used in ScatNet and PCANet are respectively predefined wavelets and PCA filters. In the proposed LETRIST, we sequentially perform extremum filtering, feature transform, scalar quantization and cross-scale joint coding to obtain histogram representations. These operators can be viewed as special processing layers of a CNN with a shallow feature extraction architecture. However, unlike traditional CNNs, our method is training-free and the final image features are low-dimensional.

The rest of this paper is organized as follows. Section II reviews the related work. Section III elaborates the proposed method. Section IV evaluates our method, compares classification results with the state-of-the-art, and also tests the noise robustness. Finally, the conclusion is drawn in Section V.

II. RELATED WORK

During the past decades, various approaches to extracting texture features have been proposed, most of which fall into one of three major categories, i.e., statistical, model-based, and filtering approaches [30]. The statistical approaches characterize textures by some statistical measures. For instance, the spatial-dependence co-occurrence matrices introduced by Haralick et al. [31] are built by counting the gray-level pairs of pixels at a specified distance and orientation. Some statistical measures are then computed from these matrices as texture features. In model-based approaches, an image model is first assumed, and the model parameters are then estimated and used as texture features. The circular symmetric autoregressive (CSAR) model [32] and the anisotropic circular Gaussian Markov random field (ACGMRF) model [33] are two representative works for rotation-invariant texture classification. In filtering approaches, texture features are generally derived from the local "energy" of filtered images. This category of methods has been extensively studied in the literature, including Laws filters, Gabor filters, and wavelet transforms. A comprehensive analysis and evaluation can be found in [30].

Recently, texton-based and local pattern-based representations, as a generalized combination of the statistical and filtering approaches [11], [34], have shown promising performance in texture classification and face recognition. The texton-based representations [11], [17], [35], [36] involve a learning process to generate a texton dictionary (i.e., a set of textons). This is typically accomplished by clustering (via k-means) local feature vectors extracted from the training images and choosing the cluster centers as the textons. Given a novel image, a frequency histogram of textons is built by assigning each feature vector, and thereby each pixel within this image, to the closest texton in the dictionary. In the existing literature,
Fig. 1. The pipeline of the proposed method. The extremum response maps of I^θ_{1max}, I^θ_{2max}, and I^θ_{2min} are respectively shown in three stacks. In each stack, the extremum responses are computed at three scales: σ1 = 1, σ2 = 2, σ3 = 4 (top to bottom). The transform feature maps of g, d, s, r and their quantized code maps are shown at these three scales (top to bottom). The final image descriptor is obtained by concatenating three feature histograms. The blue digits shown in the code maps indicate the total numbers of quantization levels.

local feature vectors are obtained from local descriptors [37]–[39], the Maximum Response 8 (MR8) filter bank [11], the responses of local derivative filters [22], [34], source image patches [11], random projection (RP) patches [17], and so on. For texton-based representations, it is problematic to choose a proper dictionary size and it is costly to perform a nearest-neighbor calculation for each pixel [17], [19], [20], [40].

Without requiring a dictionary learning stage, local pattern-based representations [1], [40]–[44] have been attracting increasing attention. The influential work of local binary pattern (LBP) introduced by Ojala et al. [1] leads the way in this regard. In LBP, the intensity value of a center pixel is compared with its neighbors, and then the resultant sign information (a bit string) is encoded into an integer code. Finally, a frequency histogram of such codes is computed to represent the image/region. Due to its computational efficiency and texture discrimination, LBP has been widely used in texture classification [1], face recognition [2], video description [45], and object detection [46]–[48]. A lot of variants of LBP have been developed. Tan and Triggs [41] introduced the local ternary pattern (LTP) to encode the pixel differences into three levels. Liao et al. [49] suggested using the most frequently occurring patterns, called dominant LBP (DLBP), as texture features. It is noted that only the dominant pattern frequencies are considered in DLBP while the informative pattern types are discarded. To better explore the information in patterns, Guo et al. [8] presented a three-layered model to learn the optimal pattern subset. LBP is a local operator which fails to capture larger-scale structural features. Therefore, patch sampling [10], [50]–[52] and geometric sampling structures [47], [53] were proposed to encode the macro-structure information. The original LBP is sensitive to noise. Consequently, many attempts were made to alleviate this issue by employing local averaging [10], [12], [13], the frequency or transform-domain components [14], [15], or the error-correction mechanism [16]. In addition, alternative coding rules were designed for specific applications, such as the completed local binary count (CLBC) [42] for texture classification, local tetra patterns [54] for image retrieval, and local high-order derivative patterns [43], [55] and local directional number (LDN) [56] for face recognition. Other local pattern-based representations include the Weber local descriptor (WLD) [44], the local energy pattern (LEP) [20], and the local N-ary pattern (LNP) [57].

Combining the information that reflects different aspects of an image can contribute to a discriminative feature representation. Great efforts have been made in this direction by exploiting LBP. For instance, Guo et al. [40] proposed the complete LBP (CLBP) by encoding three complementary components, i.e., the center pixel, and the signs and magnitudes of local differences. Qi et al. [58] developed the pairwise rotation invariant co-occurrence LBP (PRICoLBP) by encoding two LBPs collaboratively. By further incorporating multi-scale and multi-orientation color information, they demonstrated the effectiveness of PRICoLBP for visual classification. Hong et al. [18] introduced a numerical variant of LBP, the LBP difference (LBPD), so that it can be effectively combined with other features through a covariance matrix. The resulting texture descriptor COV-LBPD achieved competitive performance compared with the state-of-the-art methods [19], [40], [45]. Quan et al. [59] performed the lacunarity analysis on multi-scale LBPs to characterize the spatial distribution of texture structures. In our previous work [23], we leveraged locally enhanced binary coding (LEBC) to augment LBP codes for texture representation and obtained promising results. Other efforts include applying the idea of LBP to encode the neighboring information of Gabor features [60]–[63] and monogenic signal features [64] for robust face recognition.

III. PROPOSED METHOD

The proposed method explicitly encodes the joint information within an image across feature and scale spaces. Fig. 1 shows the pipeline of the proposed method, which is summarized into four steps: (1) extremum filtering, (2) feature transform, (3) scalar quantization, and (4) cross-scale joint coding and image representation. In particular, the input image is first convolved with a family of Gaussian derivative filters to compute the extremum responses at multiple scales. Then, based on these extremum responses, a set of transform features
is constructed which is rotationally invariant according to steerable filters. Next, these transform features are quantized into discrete texture codes via scalar quantization. Finally, the texture codes are jointly encoded across scales to build multiple histograms, which are further concatenated to form the image feature representation.

The reasons for adopting these four sequential steps are as follows. Step (1) is the foundation of the subsequent steps and is mainly used to extract multi-scale rotationally invariant features. These features, however, are not sufficiently discriminative for texture description (see Section IV-C-2)). Therefore, Step (2) performs the feature transform to construct a more discriminative rotation-invariant feature set. Note that it is inappropriate to directly concatenate all transform features over the whole image as the texture representation — this would lead to a high-dimensional and rotation-sensitive image descriptor. An alternative method is to use the transform features as pixel-level descriptors, followed by k-means clustering to build a texton-based texture descriptor [36]. However, as mentioned in Sections I and II, such a method requires a learning stage and a costly nearest-neighbor calculation. To overcome the aforementioned problems, Step (3) adopts scalar quantization to obtain discrete pixel-wise codes to build histogram-based image features. (Like the LBP codes [1], our quantized pixel-wise codes are rotationally invariant and hence the resulting histogram is also rotationally invariant.) For a compact image feature representation, Step (4) is specially designed by effectively encoding the generated texture codes across features and scales.

A. Multi-scale Extremum Filtering

Local image derivatives contain rich structural information, which has shown great potential in texture analysis [1], [11], [13], [19], [20], [22], [23], [44]. Inspired by [11], [22], in this work we employ the steerability of Gaussian derivative filters [21] (up to second order) to compute the extremum (maximum and minimum) responses in scale space. We refer to these operators as "extremum filtering". The motivation for introducing the extremum filtering is multi-fold. The first is to capture the useful information contained in the first- and second-order differential structures at a range of scales. The second is to extract local features that are exactly rotation-invariant. The third is to make our method efficient to implement.

According to the theory of steerable filters [21], any orientation of the first or second derivative of a Gaussian can be synthesized by taking a linear combination of several basis filters. Formally, consider a two-dimensional circularly symmetric Gaussian function:

G(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)   (1)

where σ is the standard deviation or the scale. The first and second Gaussian derivatives at an arbitrary orientation θ are

G_1^\theta = \cos(\theta) G_x + \sin(\theta) G_y   (2)

and

G_2^\theta = \cos^2(\theta) G_{xx} - \sin(2\theta) G_{xy} + \sin^2(\theta) G_{yy}   (3)

where Gx and Gxx are respectively the scale-normalized first and second derivatives of G along the x-axis (the scale normalization operators are Gx ← σGx, Gy ← σGy, Gxx ← σ²Gxx, Gxy ← σ²Gxy, Gyy ← σ²Gyy), and similarly for Gy, Gxy and Gyy. Note that we have omitted the arguments (x, y; σ) for simplicity, similarly hereinafter. The five basis filters used in Eqns. (2) and (3) are shown in Fig. 2 (b).

Fig. 2. The MR8 filter bank [11] and the basis filters. (a) The MR8 filter bank consists of 38 filters: an edge (elongated first derivative) filter and a bar (elongated second derivative) filter, both at 6 orientations and 3 scales (σx, σy) = {(1, 3), (2, 6), (4, 12)}, plus a Gaussian and a LOG filter, both at the scale σ = 10. For the oriented edge and bar filters, only the maximal filter response across all orientations is used at each scale. This gives a total of 8 rotationally invariant filter responses. (b) The 5 basis filters (Gx, Gy, Gxx, Gxy, and Gyy) at 3 scales σ = {1, 2, 4} are used to synthesize the directional Gaussian derivative filters in the proposed method.

Given an image I, we first compute the first- and second-order image derivatives by Lx = Gx ∗ I, Ly = Gy ∗ I, Lxx = Gxx ∗ I, Lxy = Gxy ∗ I, Lyy = Gyy ∗ I, where ∗ denotes convolution. Thus, the responses of the first and second Gaussian derivative filters at orientation θ [21], [22] are

I_1^\theta = G_1^\theta * I = \cos(\theta) L_x + \sin(\theta) L_y = \sqrt{L_x^2 + L_y^2}\,\sin(\theta + \phi)   (4)

where \phi = \arctan(L_x / L_y), and

I_2^\theta = G_2^\theta * I = \cos^2(\theta) L_{xx} - \sin(2\theta) L_{xy} + \sin^2(\theta) L_{yy} = \frac{1}{2}\left[L_{xx} + L_{yy} + \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\,\cos(2\theta - \psi)\right]   (5)

where \psi = \arctan\left(\frac{2L_{xy}}{L_{yy} - L_{xx}}\right). Then, we can compute the extremum responses in a similar way as in [22]. That is, the maximum value of I_1^\theta over all θ is

I_{1max}^\theta = \sqrt{L_x^2 + L_y^2}   (6)

The maximum and minimum values of I_2^\theta over all θ are

I_{2max}^\theta = \frac{1}{2}\left[L_{xx} + L_{yy} + \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\right]   (7)

and

I_{2min}^\theta = \frac{1}{2}\left[L_{xx} + L_{yy} - \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\right]   (8)

Without loss of generality, let us rotate image I counter-clockwise around its center point by an arbitrary angle α° and denote the rotated image as I′. Under this rotation, point (x, y) in I is changed to (x′, y′) in I′. Assume point (x, y) in I achieves the extremum response (as defined in Eqns. (6)-(8)) through the first or second directional Gaussian derivative filter at angle θ̂°. Then, point (x′, y′) in I′ can also achieve the same extremum response through this directional derivative filter at angle (θ̂ + α)°. That is, the extremum response for two corresponding points between arbitrarily rotated images is exactly invariant.
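The extremum filtering above can be illustrated with a short sketch. The following is a minimal NumPy/SciPy illustration of Eqns. (4)-(8), not the authors' released implementation; it assumes scipy.ndimage.gaussian_filter is an acceptable stand-in for the Gaussian derivative basis filters, and all function and variable names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extremum_responses(image, sigma):
    """Scale-normalized extremum responses of Eqns. (6)-(8) at one scale."""
    I = image.astype(np.float64)
    # First- and second-order Gaussian derivatives; order = (derivative along axis 0 (y), axis 1 (x)).
    Lx  = sigma * gaussian_filter(I, sigma, order=(0, 1))
    Ly  = sigma * gaussian_filter(I, sigma, order=(1, 0))
    Lxx = sigma ** 2 * gaussian_filter(I, sigma, order=(0, 2))
    Lxy = sigma ** 2 * gaussian_filter(I, sigma, order=(1, 1))
    Lyy = sigma ** 2 * gaussian_filter(I, sigma, order=(2, 0))
    disc = np.sqrt((Lxx - Lyy) ** 2 + 4.0 * Lxy ** 2)
    return {
        "I1max": np.sqrt(Lx ** 2 + Ly ** 2),  # Eqn. (6)
        "I2max": 0.5 * (Lxx + Lyy + disc),    # Eqn. (7)
        "I2min": 0.5 * (Lxx + Lyy - disc),    # Eqn. (8)
    }

# Usage: extremum filtering at the three scales used in the paper.
# responses = [extremum_responses(img, s) for s in (1.0, 2.0, 4.0)]
```

Because the extrema are obtained in closed form from the five basis responses, no explicit sweep over orientations is needed, which is what makes the responses exactly rotation-invariant.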
In contrast, the traditional MR8 filter bank [11], which uses 6-directional filtering (see Fig. 2 (a)), can only achieve discrete rotation invariance with additional computational costs. To capture the multi-scale feature properties, we compute the extremum responses at Nσ scales. Following [11], [19], [20], [22], [24], we empirically set Nσ = 3: σ1 = 1, σ2 = 2, and σ3 = 4 (see the discussion in Section IV-C).

Fig. 3. The shape-index values in the range [0, 1] and their corresponding local shapes. For more shape categories, please refer to [65].

Fig. 4. Illustration of the rotation invariance of transform features. Column 1: four texture images with rotations of 0°, 60°, 120° and 200°, respectively (http://sipi.usc.edu/database/database.php?volume=textures); Columns 2-4: image histograms of transform features g, d, s and r (σ = 2).

B. Transform Feature Construction

With the obtained extremum responses, we would like to construct a compact yet discriminative transform feature set by taking into account the subsequent quantization and coding. To this end, we apply linear and non-linear operators on the extremum responses to quantitatively characterize local texture structures and their correlation. The transform feature set is constructed as follows.

1) The gradient magnitude g, which is the maximum response of the first directional Gaussian derivative filter:

g = I_{1max}^\theta = \sqrt{L_x^2 + L_y^2}   (9)

2) The extrema difference d, which is the difference of the maximum and minimum responses of the second directional Gaussian derivative filter:

d = I_{2max}^\theta - I_{2min}^\theta = \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}   (10)

3) The shape index s, which provides a quantitative measure of local second-order curvature:

s = \frac{1}{2} - \frac{1}{\pi} \arctan\left(-\frac{I_{2max}^\theta + I_{2min}^\theta}{I_{2max}^\theta - I_{2min}^\theta}\right) = \frac{1}{2} - \frac{1}{\pi} \arctan\left(\frac{-L_{xx} - L_{yy}}{\sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}}\right)   (11)

Note that the shape index is originally defined in classical differential geometry [65] and it can be derived from the principal curvatures of a surface [3], [66], i.e., the eigenvalues κ1 and κ2 of the local Hessian matrix H:

H = \begin{bmatrix} L_{xx} & L_{xy} \\ L_{xy} & L_{yy} \end{bmatrix}   (12)

and

s' = \frac{2}{\pi} \arctan\left(\frac{\kappa_2 + \kappa_1}{\kappa_2 - \kappa_1}\right) \quad (\kappa_1 \ge \kappa_2)   (13)

With our definition, all values of s′ in Eqn. (13) have been mapped to the interval [0, 1] for the ease of subsequent processing. As shown in Fig. 3, the shape index s defined on a finite interval can quantitatively characterize various local second-order curvatures (e.g., caps, ridges, saddles, ruts and cups), making it a good choice for constructing the discriminative transform features.

4) The mixed extrema ratio r, which captures the correlation information of the first- and second-order differential structures. To make the output have a low dynamic range, we employ the arctangent function as the rectifier:

r = \frac{2}{\pi} \arctan\left(c \cdot \frac{d}{g}\right) = \frac{2}{\pi} \arctan\left(c \cdot \frac{I_{2max}^\theta - I_{2min}^\theta}{I_{1max}^\theta}\right) = \frac{2}{\pi} \arctan\left(c \cdot \sqrt{\frac{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}{L_x^2 + L_y^2}}\right)   (14)

where c is a scale factor to adjust the ratio of d and g.

The constructed transform features, denoted as F = {g, d, s, r}, have the following properties. First, they are compact and rotationally invariant. This is achieved by performing linear/non-linear operators on the rotation-invariant extremum responses I^θ_{1max}, I^θ_{2max} and I^θ_{2min}, i.e., linear combinations of the extremum responses to obtain {g, d} and non-linear combinations to obtain {s, r}. Fig. 4 illustrates the rotation invariance of these transform features. Second, they are discriminative and complementary. The feature set F encompasses information about the first-order (i.e., {g}) and second-order (i.e., {d, s}) differential structures and a mix of both (i.e., {r}) in scale space. As shown in Fig. 1, these transform features have strong discriminative power in describing different local structures and curvatures, such as edges, lines, blobs and ridge/valley-like regions. Also, the feature set F provides complementary information by employing both absolute amplitude values (i.e., {g, d}) and relative angle/ratio values (i.e., {s, r}). Third, the feature subset {s, r}, whose values are in the range [0, 1], lends itself to the simple uniform quantization as defined in Eqn. (16) in the next subsection.
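As an illustration only, the four transform features can be computed from the per-scale extremum responses as follows. This is a minimal sketch under our own naming conventions (it reuses the extremum_responses output sketched in Section III-A), not the authors' code; the small eps guard for flat regions is our addition.

```python
import numpy as np

def transform_features(ext, c=1.0, eps=1e-12):
    """Transform features of Eqns. (9)-(14) for one scale's extremum responses."""
    g = ext["I1max"]                          # gradient magnitude, Eqn. (9)
    d = ext["I2max"] - ext["I2min"]           # extrema difference, Eqn. (10)
    # Shape index mapped to [0, 1], Eqn. (11); eps avoids division by zero in flat regions.
    s = 0.5 - np.arctan(-(ext["I2max"] + ext["I2min"]) / (d + eps)) / np.pi
    # Mixed extrema ratio with the arctangent rectifier, Eqn. (14).
    r = (2.0 / np.pi) * np.arctan(c * d / (g + eps))
    return g, d, s, r
```

Since arctan maps into (−π/2, π/2), both s and r fall in [0, 1], which is what allows the uniform quantizer of the next subsection to be applied to them directly.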
C. Scalar Quantization

We now consider quantizing the transform features into discrete texture codes. Our key considerations for designing a good quantizer are the computational efficiency, the discriminative power, as well as the robustness to rotation and illumination changes. Toward these goals, two types of scalar quantization via simple binary or multi-level thresholding are designed by taking into account different feature properties.

For transform features {g, d}, whose values lie in a non-negative interval, we adopt a mean-value based binary ratio quantizer Q1(·):

y = Q_1(x) = \begin{cases} 0, & \text{if } x / m_x > k \\ 1, & \text{otherwise} \end{cases}   (15)

where x ∈ {g, d}, k is a tuning parameter, and m_x is the mean value of the transform feature map of x (see Fig. 1). Here, the use of m_x is robust to image rotation, and a similar averaging operator was previously used in [13], [23], [24], [40]. Fig. 5 illustrates the binary ratio quantizer for {g, d}. (In this work, we have straightforwardly quantized {g, d} into two levels and achieved good classification performance. It is possible to quantize {g, d} into multiple levels to improve the performance, which may require some effort to design the quantization thresholds and to keep the compactness of the encoded features. In fact, how to theoretically seek the optimal quantization levels to achieve the best classification result is an open problem, and we leave it as our future work.)

For transform features {s, r}, whose values are in the range [0, 1], we adopt a uniform quantizer Q2(·):

y = Q_2(x) = \begin{cases} 0, & x \in [0, \Delta] \\ 1, & x \in [\Delta, 2\Delta] \\ \cdots \\ L - 1, & x \in [(L-1)\Delta, 1] \end{cases}   (16)

where x ∈ {s, r}, L is the quantization level, and ∆ = 1/L is the quantization step. The quantization levels for s and r, denoted by L_s and L_r, are not necessarily the same, but both are closely related to the texture discrimination and feature dimension. As can be seen from Fig. 5, if the quantization level L_s or L_r is too small, the quantized feature codes will be coarse and lack discrimination. If too large, however, the resulting feature codes will be noisy and tend to produce a high-dimensional feature representation. To keep a balance among them, we set L_s = 3 and L_r = 5. These parameter settings will be discussed in Section IV-C.

Fig. 5. Illustration of the binary ratio quantizer for {g, d} and the uniform quantizer for {s, r}. Given a texture image, histograms of g, d, s and r are computed at three scales: σ1 = 1, σ2 = 2, σ3 = 4. The red vertical lines indicate the quantization thresholds: for {g, d}, they are computed as k·m_x; for {s, r}, they are determined by the quantization levels L_s and L_r.

Based on the above steps, the computational efficiency is achieved by using the simple scalar quantization. Meanwhile, the discriminative power is maintained by choosing the proper quantization level L via multi-level thresholding. Also, the rotation invariance is inherited from the feature set F = {g, d, s, r} used in Eqns. (15) and (16). Finally, the robustness to illumination changes is gained due to the following facts: i) a local/global brightness change (i.e., a constant is added to each pixel value) is cancelled by the difference operation in the Gaussian derivatives; ii) a local/global contrast change (i.e., each pixel value is multiplied by a constant) is already eliminated by the division operation in Eqns. (11) and (14) for features {s, r}; a global contrast change is also eliminated by the division operation in Eqn. (15) for {g, d}.

D. Cross-scale Joint Coding and Image Representation

Next, we consider aggregating the generated texture codes into a histogram-based feature representation. A straightforward way is to jointly encode all of these texture codes at all scales, but the resulting feature histogram is extremely high-dimensional and sparse, and may not be discriminative. To deal with this problem, we propose a cross-scale joint coding to first construct multiple feature histograms, and then concatenate these histograms for the image feature representation.

The cross-scale joint coding is performed as follows (see the diagram of this process in Fig. 1).

• Adjacent-scale coding (ASC): Transform features {g, d, s} are jointly encoded across two adjacent scales, e.g., (σ1, σ2), (σ2, σ3), etc. Formally, for the adjacent-scale pair (σi, σi+1) (i = 1, 2, ..., Nσ − 1), the ASC value of pixel (x, y) in image I is computed as

c_i(x, y) = \sum_{j=1}^{2} (L_s)^{j-1} y_s(x, y; \sigma_{i+j-1}) + \sum_{j=1}^{2} \left[(L_s)^2 (L_d)^{j-1}\right] y_d(x, y; \sigma_{i+j-1}) + \sum_{j=1}^{2} \left[(L_s)^2 (L_d)^2 (L_g)^{j-1}\right] y_g(x, y; \sigma_{i+j-1})   (17)

where y_s(x, y; σi), y_d(x, y; σi) and y_g(x, y; σi) are respectively the quantized texture codes (with corresponding quantization levels L_s, L_d and L_g) for features s, d and g at the scale σi.

• Full-scale coding (FSC): Transform features {r} are jointly encoded across all Nσ scales (σ1, ..., σNσ). The FSC value of pixel (x, y) in image I is computed by

c_{N_\sigma}(x, y) = \sum_{j=1}^{N_\sigma} (L_r)^{j-1} y_r(x, y; \sigma_j)   (18)

where y_r(x, y; σj) is the quantized texture code for feature r at the scale σj and L_r is the quantization level.
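A compact sketch of the two quantizers and the two coding rules is given below. It is an illustrative re-expression of Eqns. (15)-(18) in NumPy, assuming the per-scale code maps are stored in Python lists indexed from 0 (the paper indexes scales from 1); the function names and the bincount-based histogram helper are ours, not part of the released code.

```python
import numpy as np

def quantize_ratio(x, k=2.0):
    """Binary ratio quantizer Q1, Eqn. (15): 0 if x / mean(x) > k, else 1."""
    return (x / x.mean() <= k).astype(np.int64)

def quantize_uniform(x, L):
    """Uniform quantizer Q2, Eqn. (16), for features valued in [0, 1]."""
    return np.minimum((x * L).astype(np.int64), L - 1)

def asc_code(y_s, y_d, y_g, i, L_s=3, L_d=2, L_g=2):
    """Adjacent-scale code c_i of Eqn. (17) for the scale pair (sigma_i, sigma_{i+1})."""
    c = y_s[i] + L_s * y_s[i + 1]
    c = c + L_s ** 2 * (y_d[i] + L_d * y_d[i + 1])
    c = c + L_s ** 2 * L_d ** 2 * (y_g[i] + L_g * y_g[i + 1])
    return c  # 3*3*2*2*2*2 = 144 possible values with the default levels

def fsc_code(y_r, L_r=5):
    """Full-scale code of Eqn. (18) across all scales."""
    c = np.zeros_like(y_r[0])
    for j, y in enumerate(y_r):
        c = c + (L_r ** j) * y
    return c  # 5^3 = 125 possible values for three scales

def code_histogram(code, n_bins):
    """Frequency histogram of a joint code map over the whole image."""
    return np.bincount(code.ravel(), minlength=n_bins)
```

Concatenating the histograms of the two ASC code maps (144 bins each) and the FSC code map (125 bins) reproduces the 413-dimensional descriptor summarized in the next subsection.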
Then, Nσ feature histograms {Hi | i = 1, 2, ..., Nσ} can be built for image I:

H_i(l) = \sum_{(x, y) \in I} f(c_i(x, y), l)   (19)

where l ∈ {0, 1, ..., Ci} (Ci is the maximum value of ci) and

f(m, n) = \begin{cases} 1, & \text{if } m = n \\ 0, & \text{otherwise} \end{cases}   (20)

Note that in our implementation Nσ = 3, L_s = 3, L_d = L_g = 2, and L_r = 5. The resulting histograms H1, H2 and H3 are of dimensions 144, 144 and 125, respectively. From these three histograms, the following feature representations can be obtained:

• LETRIST_ASC1: the histogram H1 based on the ASC at the two adjacent scales (σ1, σ2) = (1, 2).
• LETRIST_ASC2: the histogram H2 based on the ASC at the two adjacent scales (σ2, σ3) = (2, 4).
• LETRIST_ASC: the concatenated histogram [H1, H2].
• LETRIST_FSC: the histogram H3 based on the FSC across all three scales (σ1, σ2, σ3) = (1, 2, 4).
• LETRIST: the concatenated histogram [H1, H2, H3].

We choose LETRIST as the final image feature representation, which is a 413-dimensional image descriptor.

E. Properties of LETRIST

Local image derivatives contain important information for characterizing intrinsic image structures such as edges, lines and blobs. The proposed LETRIST enriches such information via the feature transform and comprehensively encodes each pixel via the cross-scale joint coding, while ensuring invariance to image rotation and robustness to illumination changes. The resulting LETRIST descriptor has many desirable properties for texture classification, which are summarized as follows:

1) It is discriminative because it takes advantage of: i) the informative transform features (the first- and second-order structures and a mix of both); ii) the descriptive texture codes via multi-level thresholding; and iii) the combined information across feature and scale spaces.
2) It is robust to image rotation because: i) it is built upon the rotation-invariant extremum responses; and ii) it uses the global averaging-based quantization in Eqn. (15).
3) It is also robust to illumination changes by employing image derivatives (difference operation) and their ratio features (division operation).
4) It is insensitive to noise. The main reasons are that: i) the low-order Gaussian derivatives are to some degree insensitive to noise; ii) the extremum responses are stable and robust; and iii) the global averaging operator used in Eqn. (15) can reduce the influence of noise.
5) It is low-dimensional, mainly due to the use of a compact feature set coupled with the cross-scale joint coding.
6) It is computationally efficient thanks to the exploitation of steerable filters and scalar quantization.

IV. EXPERIMENTAL RESULTS

To evaluate the proposed method, we perform extensive texture classification experiments on four popular and challenging datasets: Outex [67], CUReT [68], KTH-TIPS [69], and UIUC [38]. We discuss the parameter settings, analyze the influences of different processing techniques, compare our method with the state-of-the-art, and also evaluate the noise robustness.

Fig. 6. Image examples of the Outex [67], CUReT [68], KTH-TIPS [69], and UIUC [38] datasets. Each row presents images of the same texture class (in total 3 classes per dataset). We can see that classifying these images is challenging due to the presence of large intraclass changes (rotation, illumination, scale, viewpoint, and deformation) and small interclass changes (e.g., the images in the first and second rows).

A. Datasets and Experimental Setup

In this subsection, we first describe the four benchmark texture datasets and then present the experimental setup. Fig. 6 shows some image examples of each dataset.

1) Outex Dataset [67]: The Outex dataset (http://www.outex.oulu.fi/) is widely used for rotation- and illumination-invariant texture classification. It consists of 24 texture classes, and each class contains 20 non-overlapping texture images (128×128 pixels). These images are acquired under 3 different illumination conditions ("inca", "tl84" and "horizon") with 9 different rotation angles (0°, 5°, 10°, 15°, 30°, 45°, 60°, 75° and 90°). Two test suites, Outex_TC_00010 (TC10) and Outex_TC_00012 (TC12), are used in our experiments.

• The TC10 test suite is designed for evaluating the rotation invariance. For this test suite, there are a total of 4320 (24×20×9) texture samples taken under the "inca" illumination condition: 480 (24×20) samples with angle 0° are used as the training set, and 3840 (24×20×8) samples with the other eight rotation angles are used as the test set.
• The TC12 test suite is designed for evaluating both the rotation and illumination invariance. For this test suite, the same samples as in TC10 are used for training, and all other samples with the illumination "tl84" or "horizon" are used for testing. Hence, there are a total of 480 (24×20) training samples and 4320 (24×20×9) test samples for both TC12-tl84 and TC12-horizon.

2) CUReT Dataset [68]: The CUReT dataset contains 61 texture classes which are captured under different viewing and illumination conditions. We use the same subset of images (i.e., the cropped CUReT dataset, http://www.robots.ox.ac.uk/~vgg/research/texclass/index.html) as in [10], [11], [14], [17], [39], [40], [42], [58]: 61 texture classes with 92 images (of size 200×200) per class. Following common practice, 46 images per class are randomly chosen for training and the remaining 46 images per class for testing.

3) KTH-TIPS Dataset [69]: The KTH-TIPS dataset (http://www.nada.kth.se/cvap/databases/kth-tips/) is designed to supplement the CUReT database with scale variations. This dataset includes 10 texture classes with images acquired at 9 scales, viewed under 3 different illumination directions and 3 different poses.
This produces a total of 81 images per class, of which 40 images are randomly chosen for training and the remaining 41 for testing.

4) UIUC Dataset [38]: The UIUC dataset (http://www-cvr.ai.uiuc.edu/ponce_grp/data/index.html) contains 25 texture classes. Each class has 40 images (640×480 pixels) with significant viewpoint and scale variations. In addition, non-rigid deformations, uncontrolled illuminations and viewpoint-dependent appearance variations are present in this dataset. As in [11], [19], [39], half of the samples per class are randomly chosen for training and the remaining half for testing.

In our experiments, each texture sample is converted into gray scale and then normalized to have zero mean and unit standard deviation [10], [11], [14], [40]. This normalization removes global affine illumination changes. Since our focus is on the image feature representation, we use a simple nearest-neighbor (NN) classifier for texture classification [10], [11], [14], [17]–[20], [40], [42], [52]. The distance between two feature histograms is measured using the chi-square statistic:

\chi^2(H_1, H_2) = \sum_k \frac{[H_1(k) - H_2(k)]^2}{H_1(k) + H_2(k)}

where H1 and H2 are two feature histograms with bins indexed by k. The performance is measured in terms of the (mean) classification accuracy, which is defined as the number of correctly classified test samples divided by the total number of test samples. For the CUReT, KTH-TIPS and UIUC datasets, the results are reported as the average accuracy over 100 random splits of the training and test sets.

B. Comparison Methods and Implementation Details

We compare our method with 2 baselines and 20 state-of-the-art approaches. The details are given as follows.

1) LBP [1]: The rotation-invariant uniform patterns LBP^{riu2}_{P,R} (P sampling neighbors on a circle of radius R). We implement LBP^{riu2}_{24,3} as one baseline, which has 26-dimensional features.
2) LTP [41]: The local ternary patterns based on the split ternary coding. We implement LTP^{riu2}_{24,3} as the other baseline, which has 52-dimensional features.
3) CLBP [40]: The completed LBP by encoding the center pixel, and the signs and magnitudes of local differences. We implement the 3-scale joint histogram representation CLBP_S^{riu2}_{8,1}/M^{riu2}_{8,1}/C + CLBP_S^{riu2}_{16,2}/M^{riu2}_{16,2}/C + CLBP_S^{riu2}_{24,3}/M^{riu2}_{24,3}/C (2200 dimensions).
4) CLBC [42]: The completed LBC by encoding three complementary components similar to CLBP. We implement the 3-scale joint histogram representation CLBC_S^{riu2}_{8,1}/M^{riu2}_{8,1}/C + CLBC_S^{riu2}_{16,2}/M^{riu2}_{16,2}/C + CLBC_S^{riu2}_{24,3}/M^{riu2}_{24,3}/C (1990 dimensions).
5) disCLBP [8]: A discriminative texture descriptor obtained by learning the optimal pattern subset. The best results were reported by integrating the learning model with CLBP.
6) LBP-PTP [52]: A rotation-invariant texture descriptor using the Pixel-To-Patch (PTP) sampling structure to encode the neighboring intensity relationship. The best results were reported using the 3-scale joint histogram representation LNIRP/LBP/DCI_PTP (600 dimensions).
7) BRINT [10]: A rotation-invariant texture descriptor exploring the idea of averaging before binarization. The best results were reported using the 9-scale histogram representation BRINT2_CS_CM (1296 dimensions).
8) PRICoLBP [58]: A rotation-invariant texture descriptor encoding spatial co-occurrence LBP features. We directly take the results obtained by PRICoLBPg for comparison (1180 dimensions).
9) COV-LBPD [18]: A covariance descriptor fusing the numerical LBP variant (LBPD) with other features. In [18], different (rotation-invariant or rotation-variant) feature sets were designed for different datasets.
10) PLS [59]: A texture descriptor that exploits the lacunarity analysis to characterize the spatial distribution of local image patterns (i.e., LBPs) from multiple scales.
11) VZ_MR8 [36]: A texton-based texture representation using 8 filter responses derived from 38 filters.
12) VZ_Joint [11]: A texton-based texture representation based on local image patches around each pixel.
13) VZ_MRF [11]: A texton-based texture representation using a two-dimensional histogram — one dimension for the quantized center pixel, and the other for the texton assigned to each pixel's neighbourhood (i.e., the N×N image patch with the center pixel discarded).
14) RP [17]: A texton-based texture representation based on local image patches using random projection.
15) CMR [22]: A texton-based texture representation based on continuous maximum responses of Gaussian derivative filters computed at 4 scales.
16) BIF [19]: A BIF-based texture representation. A set of 6 BIFs is computed at 4 scales to populate a 1296-dimensional histogram. The histogram stack is generated based on 8 base scales. The multi-scale metric and histogram-stack scale-shifting are adopted to obtain the minimum histogram matching distance.
17) LEP [20]: The pyramid LEP with the shifting scheme PLEP_{6,3}. Similar to [19], the 3-scale LEP features are computed at 3 pyramid levels to obtain the minimum histogram matching distance.
18) LFD [14]: A rotation-invariant local frequency descriptor extracting the magnitude and phase features from low-frequency components (264 dimensions).
19) RTL [70]: An approach for iterative rotation-covariant texture learning using steerable Riesz wavelets. We take the best results reported in [70] for comparison.
20) LNP [57]: The local N-ary pattern, which generalizes the local pattern representation. In [57], different (rotation-invariant or rotation-variant) LNP descriptors were used for different datasets and we report the best results.
21) LEBC [23]: A rotation-invariant texture representation using a binary code ensemble. The source image and the responses of the edge and bar filters at 8 orientations are used as the feature set, followed by local binary coding. This yields a 1280-dimensional feature representation.
22) SFC [24]: A rotation-invariant texture representation exploring space-frequency co-occurrences (SFC) via local quantized patterns. We report the results using the setting of G3_L1_R22 (1728 dimensions).
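For reference, the classification protocol used throughout Section IV (nearest-neighbor matching under the chi-square distance of Section IV-A) is simple enough to sketch in a few lines. The snippet below is our own illustrative version, not the authors' evaluation code; the small eps term guarding empty bins is an assumption.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two feature histograms."""
    h1 = h1.astype(np.float64)
    h2 = h2.astype(np.float64)
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(train_hists, train_labels, test_hists):
    """Nearest-neighbor prediction under the chi-square distance."""
    predictions = []
    for h in test_hists:
        distances = [chi_square(h, t) for t in train_hists]
        predictions.append(train_labels[int(np.argmin(distances))])
    return predictions
```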
Fig. 7. Classification performance of the proposed LETRIST_ASC with respect to k and L_s. (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.

Fig. 8. Classification performance of the proposed LETRIST_FSC with respect to c and L_r. (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.

Fig. 9. Classification performance of the proposed LETRIST as a function of k and c (L_s = 3 and L_r = 5). (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.

Among the above 22 approaches, 5), 11)-15), 17) and 19) require a learning phase to extract texture features while the others are training-free. In our experiments, the results of LBP, LTP, CLBP, CLBC, LEBC and SFC are reported according to our own implementations. The results of the other methods are directly taken from the literature.

C. Parameter Settings and Influences of Different Processing Techniques

1) Parameter Settings: The proposed method involves four main parameters that need to be evaluated, i.e., c in Eqn. (14), k in Eqn. (15), and L (L_s and L_r) in Eqn. (16). We notice that the parameters k and L_s are associated with the LETRIST_ASC representation while c and L_r are associated with LETRIST_FSC. Thus, we first test the classification performance of LETRIST_ASC with respect to k and L_s, and then test the classification performance of LETRIST_FSC with respect to c and L_r. Finally, we test the overall classification performance of LETRIST (i.e., the concatenated LETRIST_ASC and LETRIST_FSC) with different combinations of k, c, L_s and L_r. We conduct the experiments on the Outex (TC10), CUReT and KTH-TIPS datasets and report the test results in Fig. 7, Fig. 8 and Fig. 9, respectively.

From Fig. 7 and Fig. 8, it can be seen that there are significant performance gaps when L (L_s and L_r) is varied from 2 to 3 (or more) across all three datasets. This is because when the quantization level L is set to 2, the quantizer is too coarse and most of the potentially discriminative information is discarded. As L increases, more structure information is preserved, hence leading to improved classification performance. Furthermore, it can be seen that when L_s > 2 and L_r > 3, LETRIST_ASC and LETRIST_FSC obtain stable and high performance over a wide range of values of k and c, i.e., k ∈ [1, 2.6] and c ∈ [0.4, 1.8]. This gives us more freedom to choose the values of k and c. From Fig. 7 (b) and (c), one can see that increasing L_s does not always lead to better classification results. The reason is that when the quantization level increases, the quantization step becomes smaller, making the output more vulnerable to noise. Also, a large quantization level tends to give a high-dimensional representation.
According to our experiments, it is preferable to set L_s ∈ {3, 4} and L_r ∈ {3, 4, 5}. When L_s = 3 and L_r = 5, one can observe from Fig. 9 that our method is not sensitive to k and c — the classification performance does not vary much when varying k ∈ {1.4, 1.6, 1.8, 2, 2.2} and c ∈ {0.6, 0.8, 1, 1.2, 1.4}. Thus, in this work we set k = 2, c = 1, L_s = 3 and L_r = 5 by default.

2) Influences of Different Processing Techniques: In this part, we evaluate the contributions of different processing techniques to the proposed method, including the possible scale, feature and coding schemes. We test on Outex_TC12 (tl84) and CUReT and plot the results in Fig. 10.

Fig. 10 (a) shows the classification performance of the proposed method obtained by varying the scale parameters (σ1, ..., σNσ) used in the multi-scale extremum filtering. It can be seen that increasing the number of scales does not always improve the classification accuracy. With the same number of scales, using the smaller scales obtains better results. Therefore, in this paper we empirically set σ1 = 1, σ2 = 2 and σ3 = 4 to obtain a low-dimensional image descriptor while preserving high classification performance. These parameters will be fixed and used for all datasets in the following experiments.

Fig. 10 (b) shows the results of the proposed method using different feature and coding schemes. By comparing F0 with F10, F11, F12 and F13, we can observe the reduced classification performance if any one of the four features {g, d, s, r} is not used. In particular, there is a relatively large drop in classification accuracy without using {s} or {r}. These results indicate the effectiveness of each transform feature for texture description. By comparing F0 with F20 and F21, we can observe the advantages of the proposed cross-scale joint coding over other schemes. Also, F22, F23 and F24 show very poor performance by using the extremum responses. Therefore, the proposed transform features coupled with the cross-scale joint coding are effective for texture classification.

Fig. 10. Classification performance of the proposed method using different processing techniques. (a) Scale schemes. "Smi" denotes the i-th scheme using Nσ = m scales. The values of {σ} for these schemes: S30 = (1, 2, 4) (i.e., LETRIST), S31 = (2, 4, 8), S32 = (4, 8, 16), S33 = (8, 16, 32), S40 = (1, 2, 4, 8), S41 = (2, 4, 8, 16), S42 = (4, 8, 16, 32), S50 = (1, 2, 4, 8, 16), S51 = (2, 4, 8, 16, 32). (b) Feature and coding schemes. F0 = {g, d, s, r} (i.e., LETRIST), F10 = {d, s, r}, F11 = {g, s, r}, F12 = {g, d, r}, F13 = {g, d, s} (i.e., LETRIST_ASC). F20 jointly encodes features {g, d, s, r} at each scale and then concatenates three histograms. F21 jointly encodes each feature across three scales and then concatenates four histograms. F22 and F23 perform the coding similar to F20 and F21 respectively, and F24 performs adjacent-scale coding (ASC). All of F22, F23 and F24 operate on the binarized (thresholded by the mean here) extremum responses I^θ_{1max}, I^θ_{2max} and I^θ_{2min}.

TABLE I
CLASSIFICATION ACCURACY (%) ON THE OUTEX DATASET. THE RESULTS OF VZ_MR8 AND VZ_JOINT ARE TAKEN FROM [40]. THE RESULT OF RTL [70] WAS OBTAINED USING THE SUPPORT VECTOR MACHINE (SVM) CLASSIFIER. THE HIGHEST CLASSIFICATION ACCURACIES ARE IN BOLD.

Method          TC10     TC12-tl84   TC12-horizon   Published in
LBP [1]         95.08    85.05       80.79          PAMI02
LTP [41]        98.20    93.59       89.42          TIP10
CLBP [40]       99.17    95.23       95.58          TIP10
CLBC [42]       99.04    94.10       95.14          TIP12
disCLBP [8]     -        97.0        96.5           PR12
LBP-PTP [52]    99.77    98.87       98.70          SPL13
BRINT [10]      99.35    97.69       98.56          TIP14
VZ_MR8 [36]     93.59    92.55       92.82          IJCV05
VZ_Joint [11]   92.00    91.41       92.06          PAMI09
LFD [14]        99.38    98.77       98.66          TIP13
LEBC [23]       99.45    98.36       98.15          VCIP13
RTL [70]        98.4     97.8        98.4           TIP14
LNP [57]        94.97    80.93       83.77          TCSVT15
SFC [24]        100.00   99.77       99.93          PR15
LETRIST_ASC1    99.71    96.90       98.98
LETRIST_ASC2    99.64    96.90       97.69
LETRIST_FSC     98.23    97.18       96.74
LETRIST         100.00   99.81       100.00

D. Classification Results

In this subsection, we perform a comparative evaluation of the proposed method against the state-of-the-art in terms of the classification accuracy. Tables I, II, III and IV present the comparison results on the Outex, CUReT, KTH-TIPS and UIUC datasets, respectively.

Results on Outex Dataset. From Table I, one can observe that LETRIST performs best for all test suites on the Outex dataset. Impressively, perfect recognition rates of 100% have been achieved for the TC10 and TC12-horizon test suites. The second best method is SFC, followed by LBP-PTP. When tested from TC10 to TC12, LBP-PTP suffers a performance drop whereas the proposed method shows steady classification performance. This demonstrates the robustness of our method to mixed variations of rotation and illumination. The training-free LFD, LEBC and BRINT perform fairly well, and they all produce better results than the LBP variants CLBP and CLBC. Among the learning-based methods, RTL works better than disCLBP for the TC12 test suite, and both outperform VZ_MR8 and VZ_Joint by a large margin. In particular, the classification accuracies of VZ_MR8 and VZ_Joint are even lower than those of the two baselines, LBP and LTP, for TC10. Among the proposed features, LETRIST_ASC1 performs best for the TC10 and TC12-horizon test suites whereas LETRIST_FSC performs best for TC12-tl84. When these three features are combined, the proposed LETRIST leads to the best performance, demonstrating the complementarity of the three features for texture description. Fig. 11 shows the test samples and their nearest neighbours wrongly classified by our method for the TC12-tl84 test suite. One can see that the visual similarity of these 8 image pairs accounts for the misclassification.

Results on CUReT Dataset. As can be seen from Table II, most of the state-of-the-art approaches are competitive on this dataset. Among the top six approaches are SFC, BIF, LETRIST, LEBC, RP and CMR, all of which produce about
11

Fig. 11. Texture samples and their nearest neighbours wrongly classified by our method for the Outex TC12 (tl84) test suite. Top: test samples; Bottom:
the nearest neighbours found from the training samples. “ID” denotes the image index and “T” denotes the image class index.

TABLE II
CLASSIFICATION ACCURACY (%) ON THE CUReT DATASET. THE RESULTS OF VZ_MR8 AND VZ_JOINT ARE TAKEN FROM [40]. THE RESULT OF PRICoLBP [58] WAS OBTAINED USING THE SUPPORT VECTOR MACHINE (SVM) CLASSIFIER. THE NUMBER MARKED WITH † IS THE DIMENSION OF ONE SINGLE HISTOGRAM FROM HISTOGRAM STACKS. THE FOUR BEST RESULTS ARE IN BOLD.

Method              Accuracy   Dimension   Published in
LBP [1]             87.14      26          PAMI02
LTP [41]            92.51      52          TIP10
CLBP [40]           97.00      2200        TIP10
CLBC [42]           96.76      1990        TIP12
disCLBP [8]         98.3       -           PR12
VZ_MR8 [36]         97.79      2440        IJCV05
VZ_Joint [11]       97.66      2440        PAMI09
VZ_MRF [11]         98.03      219600      PAMI09
RP [17]             98.52      2440        PAMI12
CMR [22]            98.48      2440        CVIU13
BIF [19]            98.6       1296(†)     IJCV10
LFD [14]            97.90      264         TIP13
LEBC [23]           98.54      1280        VCIP13
BRINT [10]          97.86      1296        TIP14
PRICoLBP [58]       98.4       1180        PAMI14
SFC [24]            98.74      1728        PR15
LETRIST_ASC1        95.61      144
LETRIST_ASC2        94.38      144
LETRIST_FSC         89.96      125
LETRIST             98.54      413

TABLE III
CLASSIFICATION ACCURACY (%) ON THE KTH-TIPS DATASET. THE RESULTS OF VZ_MR8 AND VZ_JOINT ARE TAKEN FROM [20]. THE RESULTS OF PLS [59] AND PRICoLBP [58] WERE OBTAINED USING THE SUPPORT VECTOR MACHINE (SVM) CLASSIFIER. THE NUMBER MARKED WITH † (‡) IS THE DIMENSION OF ONE SINGLE HISTOGRAM FROM HISTOGRAM STACKS (PYRAMID HISTOGRAMS). THE HIGHEST CLASSIFICATION ACCURACY IS IN BOLD.

Method              Accuracy   Dimension   Published in
LBP [1]             90.08      26          PAMI02
LTP [41]            93.51      52          TIP10
CLBP [40]           96.77      2200        TIP10
CLBC [42]           96.71      1990        TIP12
COV-LBPD [18]       98.00      210         TIP14
VZ_MR8 [36]         93.50      400         IJCV05
VZ_Joint [11]       95.46      400         PAMI09
CMR [22]            95.13      400         CVIU13
BIF [19]            98.5       1296(†)     IJCV10
LEP [20]            97.56      729(‡)      TIP13
LEBC [23]           97.93      1280        VCIP13
PLS [59]            98.40      140         CVPR14
PRICoLBP [58]       98.4       1180        PAMI14
SFC [24]            98.11      1728        PR15
LETRIST_ASC1        97.91      144
LETRIST_ASC2        97.51      144
LETRIST_FSC         93.98      125
LETRIST             99.00      413
Results on CUReT Dataset. As can be seen from Table II, most of the state-of-the-art approaches are competitive on this dataset. Among the top six approaches are SFC, BIF, LETRIST, LEBC, RP and CMR, all of which produce about 98.5∼98.7% classification accuracies. They are followed by PRICoLBP, disCLBP, VZ_MRF, LFD, BRINT, VZ_MR8, and VZ_Joint. Two baselines, LBP and LTP, give the lowest performances by using very low-dimensional feature representations. Regarding the proposed features, LETRIST_ASC1 yields the best results and LETRIST_FSC the worst. However, similar to what has been shown on the Outex dataset, the combination of three complementary features produces remarkably improved performance on this dataset. Although the state-of-the-art approaches have achieved high classification accuracies, they typically require a large number of feature dimensions. For instance, the dimensions of SFC, BIF, LEBC, RP, CMR and PRICoLBP are 1728, 1296, 1280, 2440, 2440 and 1180, respectively. In contrast, LFD and LETRIST respectively enjoy 264- and 413-dimensional features. In this situation, the proposed LETRIST outperforms LFD by about 0.6%. Note that the good result of 98.6% achieved by BIF on this dataset is based on the multi-scale metric and scale-shifting. Without such "post-processing" techniques, the classification accuracy of BIF is only 98.1% [19], in contrast to our result of 98.54%. It is also noted that our method is training-free, requiring no additional learning as in disCLBP, VZ_MR8, VZ_Joint, VZ_MRF, RP, and CMR.

Results on KTH-TIPS Dataset. From Table III, one can clearly see that the proposed LETRIST performs best among all the compared methods. The next top methods are BIF, PLS and PRICoLBP, with classification accuracies all being about 0.5∼0.6% lower than ours. After these methods, SFC, COV-LBPD and LEBC obtain similar results, and they work better than LEP, CLBP and CLBC. The texton-based VZ_MR8, VZ_Joint and CMR have not shown great advantages on this dataset. For the proposed features, LETRIST_ASC1 and LETRIST_ASC2 (both 144 dimensions) produce better results than CLBP (2200 dimensions) and CLBC (1990 dimensions). Moreover, our compact LETRIST_ASC1 performs competitively with the 1728-dimensional SFC. It should be mentioned that the rotation-variant features were used in COV-LBPD and LEP to produce the reported results in Table III. This puts them at a great advantage for the KTH-TIPS dataset, in which the images have no significant rotation. Nonetheless, the proposed method surpasses all other methods investigated.
TABLE IV
CLASSIFICATION ACCURACY (%) ON THE UIUC DATASET. THE RESULTS OF PLS [59] AND LNP [57] WERE OBTAINED USING THE SUPPORT VECTOR MACHINE (SVM) CLASSIFIER. THE NUMBER MARKED WITH † IS THE DIMENSION OF ONE SINGLE HISTOGRAM FROM HISTOGRAM STACKS. THE THREE BEST RESULTS ARE IN BOLD.

Method              Accuracy   Dimension   Published in
LBP [1]             64.06      26          PAMI02
LTP [41]            82.08      52          TIP10
CLBP [40]           91.56      2200        TIP10
CLBC [42]           92.30      1990        TIP12
VZ_MR8 [36]         92.94      2500        IJCV05
VZ_Joint [11]       97.83      2500        PAMI09
BIF [19]            98.8       1296(†)     IJCV10
LEBC [23]           94.29      1280        VCIP13
PLS [59]            96.57      140         CVPR14
LNP [57]            89.2       -           TCSVT15
SFC [24]            96.71      1728        PR15
LETRIST_ASC1        96.45      144
LETRIST_ASC2        96.52      144
LETRIST_FSC         84.84      125
LETRIST             97.63      413

TABLE V
CLASSIFICATION ACCURACY (%) ON OUTEX_TC40_A AND OUTEX_TC40_BC (TC40_BC REPRESENTS TC40_B AND TC40_C, AND THE RESULT FOR TC40_BC IS THE AVERAGE ACCURACY OBTAINED ON TC40_B AND TC40_C). THE RESULTS FOR THE METHOD MARKED WITH † (‡) WERE OBTAINED USING THE PCA (SVM) CLASSIFIER. THE RESULTS FOR THE COMPARED METHODS ARE TAKEN FROM [74].

Dataset     ScatNet(†)   ScatNet   PCANet   FV-VGGVD(‡)   LETRIST
TC40_A      94.07        87.55     59.49    93.7          98.59
TC40_BC     77.93        72.45     44.39    71.6          98.35

Fig. 12. Left: noise-free images; Right: noisy images with SNR=15, 10, and 5 (from left to right).
performs best on the UIUC dataset and VZ_Joint comes second, closely followed by LETRIST. By using a much more compact feature representation, the proposed LETRIST improves SFC by about 0.9%. Based on the LBP operator, PLS works better than LEBC, while LEBC works better than CLBC and CLBP. The texton-based VZ_Joint outperforms its competitor VZ_MR8, and the LBP variant CLBC outperforms its competitor CLBP on this dataset. Regarding the low-dimensional descriptors, LETRIST_ASC1, LETRIST_ASC2 and PLS produce comparable results which are much better than those of the high-dimensional LEBC, VZ_MR8, CLBC and CLBP⁹. The baselines LBP and LTP perform worst, despite using far fewer features than the other methods. Finally, it needs to be pointed out that LETRIST has achieved the promising recognition rate of 97.63% on the challenging UIUC dataset without using any "post-processing" techniques (e.g., histogram-stack scale-shifting in BIF). The potential reasons why LETRIST has shown robustness to scale changes on this dataset are that: i) LETRIST captures the salient structure (e.g., edge and line) and curvature features (e.g., cap, ridge, rut and cup); these features imply some scale invariance even when the image varies in size. ii) The cross-scale joint coding is, to some extent, robust to scale changes. In addition, LETRIST has the advantages of computational simplicity in feature extraction and low dimensionality in feature representation. The average time required for constructing an image descriptor from each image (640×480 pixels) in the UIUC dataset is: 0.35s for LETRIST, 0.36s for LBP, 0.34s for LTP, 0.82s for CLBP, 0.72s for CLBC, 2.91s for LEBC, and 0.85s for SFC, by running MATLAB (R2013a) on a PC with a 2.3GHz Intel(R) Xeon(R) CPU, 32GB RAM, and Windows 7.

⁹ The result of PLS was obtained using the support vector machine (SVM) classifier. The superiority of the SVM classifier over the NN classifier has been shown in [10], [53], [69].

E. Comparison with SIFT and CNN-based Features

In this section, we compare LETRIST with the popular SIFT [37] features on ImageNet [71] for image classification. We collect a subset of ImageNet by retrieving 10 image classes: Bird, Equine, Aircraft, Brick, Building, Newspaper, Fruit, Face, Flower, and Tree. There are at least 1045 images and at most 1603 images in each class, leading to a collection of 13460 images in total. We randomly select 800 images per class as the training set and 200 images per class as the test set. For SIFT, we adopt the popular Bag-of-Words representation: we extract dense SIFT features using the code provided by Lazebnik in [72] and learn 1000 visual words via k-means clustering from 640000 SIFT features in the training set. The classification accuracies of LETRIST and SIFT are 57.81% and 56.12%, respectively. Among the 10 classes, LETRIST outperforms SIFT by 16.18% for Newspaper and 15.65% for Flower, while SIFT outperforms LETRIST by 18.40% for Aircraft and 13.88% for Brick, demonstrating the power of LETRIST for describing textured scenes.
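For reference, the sketch below outlines the Bag-of-Words baseline described above, assuming dense SIFT descriptors have already been extracted (e.g., with the code of [72]); the use of scikit-learn's MiniBatchKMeans is our illustrative choice and not necessarily the clustering implementation used in this experiment.

```python
# Sketch of the Bag-of-Words baseline described above: 1000 visual words
# learned by k-means over training SIFT descriptors, each image represented
# by a normalized word-occurrence histogram. Dense SIFT extraction itself is
# assumed to be done beforehand; MiniBatchKMeans is an illustrative choice.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def learn_vocabulary(train_descriptors, num_words=1000, seed=0):
    """train_descriptors: (N, 128) array of SIFT descriptors pooled from the
    training images (640000 descriptors in the experiment above)."""
    kmeans = MiniBatchKMeans(n_clusters=num_words, random_state=seed)
    kmeans.fit(train_descriptors)
    return kmeans

def bow_histogram(kmeans, image_descriptors, num_words=1000):
    words = kmeans.predict(image_descriptors)   # assign each SIFT to a word
    hist = np.bincount(words, minlength=num_words).astype(np.float64)
    return hist / max(hist.sum(), 1.0)          # L1-normalize
```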
We also compare LETRIST with the recent CNN-based features including ScatNet [27], PCANet [28] and FV-VGGVD [73] on Outex_TC40_A and Outex_TC40_BC [74] for large-scale rotation (and illumination) invariant texture classification. As shown in Table V, the proposed LETRIST descriptor using the nearest-neighbour classifier shows better results than the CNN-based features (even though they use the PCA/SVM classifiers). In particular, LETRIST outperforms the CNN-based features on TC40_BC by a large margin. Since the filters used in traditional CNNs (e.g., the convolutional filters in FV-VGGVD and the PCA filters in PCANet) lack a built-in mechanism to deal with rotation changes [75], the resulting CNN-based features show inferior classification performance on Outex_TC40_A and Outex_TC40_BC. These results demonstrate the superiority of LETRIST for texture classification with significant rotation and illumination changes.
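A minimal sketch of nearest-neighbour classification over histogram descriptors, as used for LETRIST in Table V, is given below; the chi-square distance is an assumed choice for illustration, and the exact distance measure follows the experimental setup of the paper.

```python
# Minimal sketch of nearest-neighbour classification over histogram
# descriptors. The chi-square distance below is an assumed choice for
# illustration; the paper defines its own distance in the experimental setup.
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(test_hists, train_hists, train_labels):
    predictions = []
    for h in test_hists:
        dists = [chi_square_distance(h, t) for t in train_hists]
        predictions.append(train_labels[int(np.argmin(dists))])
    return np.asarray(predictions)
```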
F. Robustness to Noise

In this subsection, we evaluate the robustness of LETRIST to Gaussian noise. For this purpose, each texture image in the Outex and CUReT datasets is corrupted by additive Gaussian
noise with zero mean and standard deviation determined by the signal-to-noise ratio (SNR) [10]. Fig. 12 shows two example images and their noisy versions, and Fig. 13 presents the classification results of different methods with respect to different SNR levels. One can see that the proposed LETRIST shows very strong "anti-noise" ability compared with LBP, LTP, CLBP, CLBC and LEBC at all noise levels on both datasets. Moreover, LETRIST works much better than the state-of-the-art BRINT when SNR>5 for all three Outex test suites, while BRINT works much better than LETRIST on both the Outex and CUReT datasets when SNR=5. In addition, LETRIST and SFC perform competitively at higher SNR levels (e.g., SNR=100 and 30) on Outex and CUReT. In the case of SNR=5, LETRIST significantly outperforms SFC on both datasets. The noise robustness of LETRIST mainly lies in that: i) LETRIST is built upon the extremum responses derived from low-order Gaussian derivative filters, and thus it is noise-robust by design. ii) It uses the global averaging operator for scalar quantization, which is also robust to Gaussian noise.

Fig. 13. Performance comparison of different methods in the presence of different levels of Gaussian noise on the Outex and CUReT datasets.
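The corruption protocol described in this subsection can be sketched as follows, assuming that the SNR is the ratio between the standard deviation of the image and that of the added noise; this reading of the convention in [10] is an assumption and should be checked against that reference.

```python
# Sketch of the noise corruption used in this experiment: additive zero-mean
# Gaussian noise whose standard deviation is set by the target SNR. SNR is
# interpreted here as std(signal) / std(noise); this reading of [10] is an
# assumption and should be verified against that reference.
import numpy as np

def add_gaussian_noise(image, snr, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    image = image.astype(np.float64)
    noise_std = image.std() / float(snr)   # e.g., SNR = 100, 30, 15, 10, 5
    return image + rng.normal(0.0, noise_std, size=image.shape)
```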
V. CONCLUSION

In this paper, we have presented a simple yet effective image descriptor, Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. LETRIST is built by quantizing and encoding a set of transform features derived from the extremum responses of the first and second directional Gaussian derivative filters. The transform features are constructed to characterize local texture structures and their correlation. The scalar quantization, i.e., binary or multi-level thresholding, is adopted to generate informative texture codes. The cross-scale joint coding is explored to build the compact image feature representation. The proposed LETRIST is training-free and efficient to implement. It is also low-dimensional, yet discriminative and robust for texture description. Experimental results demonstrate that our method is not only robust to rotation, illumination, scale and viewpoint changes, but also robust to Gaussian noise. In future work, we plan to extend the proposed method from the 2D plane to the 3D volume by extracting and encoding transform features in the spatio-temporal domain. We believe that our feature representation will have great potential in a number of challenging computer vision tasks such as dynamic texture classification, object tracking, and action recognition.

ACKNOWLEDGMENT

The authors would like to thank MVG, Z. Guo and Y. Zhao for providing the source codes of LBP, CLBP and CLBC.

REFERENCES

[1] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.
[2] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
[3] A. Larsen, J. Vestergaard, and R. Larsen, "HEp-2 cell classification using shape index histograms with donut-shaped spatial pooling," IEEE Trans. Med. Imag., vol. 33, no. 7, pp. 1573–1580, Jul. 2014.
[4] T. Song and H. Li, "Local polar DCT features for image description," IEEE Signal Process. Lett., vol. 20, no. 1, pp. 59–62, 2013.
[5] T. Song, F. Meng, Q. Wu, B. Luo, T. Zhang, and Y. Xu, "L2SSP: Robust keypoint description using local second-order statistics with soft-pooling," Neurocomput., vol. 230, pp. 230–242, 2017.
[6] T. Song and H. Li, "WaveLBP based hierarchical features for image classification," Pattern Recogn. Lett., vol. 34, no. 12, pp. 1323–1328, 2013.
[7] Y. Zhang, J. Wu, and J. Cai, "Compact representation of high-dimensional feature vectors for large-scale image recognition and retrieval," IEEE Trans. Image Process., vol. 25, no. 5, pp. 2407–2419, May 2016.
[8] Y. Guo, G. Zhao, and M. Pietikäinen, "Discriminative features for texture description," Pattern Recogn., vol. 45, no. 10, pp. 3834–3843, Oct. 2012.
[9] U. Kandaswamy, S. Schuckers, and D. Adjeroh, "Comparison of texture analysis schemes under nonideal conditions," IEEE Trans. Image Process., vol. 20, no. 8, pp. 2260–2275, Aug. 2011.
[10] L. Liu, Y. Long, P. Fieguth, S. Lao, and G. Zhao, "BRINT: Binary rotation invariant and noise tolerant texture classification," IEEE Trans. Image Process., vol. 23, no. 7, pp. 3071–3084, Jul. 2014.
[11] M. Varma and A. Zisserman, "A statistical approach to material classification using image patch exemplars," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 2032–2047, Nov. 2009.
[12] F. Khellah, "Texture classification using dominant neighborhood structure," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3270–3279, 2011.
[13] T. Song, H. Li, F. Meng, Q. Wu, B. Luo, B. Zeng, and M. Gabbouj, "Noise-robust texture description using local contrast patterns via global measures," IEEE Signal Process. Lett., vol. 21, no. 1, pp. 93–96, 2014.
[14] R. Maani, S. Kalra, and Y. Yang, "Rotation invariant local frequency descriptors for texture classification," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2409–2419, Jun. 2013.
[15] J. He, H. Ji, and X. Yang, "Rotation invariant texture descriptor using local shearlet-based energy histograms," IEEE Signal Process. Lett., vol. 20, no. 9, pp. 905–908, Sep. 2013.
[16] J. Ren, X. Jiang, and J. Yuan, "Noise-resistant local binary pattern with an embedded error-correction mechanism," IEEE Trans. Image Process., vol. 22, no. 10, pp. 4049–4060, Oct. 2013.
[17] L. Liu and P. Fieguth, "Texture classification from random features," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 574–586, Mar. 2012.
[18] X. Hong, G. Zhao, M. Pietikäinen, and X. Chen, "Combining LBP difference and feature correlation for texture description," IEEE Trans. Image Process., vol. 23, no. 6, pp. 2557–2568, Jun. 2014.
[19] M. Crosier and L. D. Griffin, "Using basic image features for texture classification," Int. J. Comput. Vision, vol. 88, no. 3, pp. 447–460, 2010.
[20] J. Zhang, J. Liang, and H. Zhao, "Local energy pattern for texture classification using self-adaptive quantization thresholds," IEEE Trans. Image Process., vol. 22, no. 1, pp. 31–42, Jan. 2013.
[21] W. Freeman and E. Adelson, "The design and use of steerable filters," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 9, pp. 891–906, Sep. 1991.
[22] J. Zhang, H. Zhao, and J. Liang, "Continuous rotation invariant local descriptors for texton dictionary-based texture classification," Comput. Vis. Image Underst., vol. 117, no. 1, pp. 56–75, Jan. 2013.
[23] T. Song, F. Meng, B. Luo, and C. Huang, "Robust texture representation by using binary code ensemble," in Proc. VCIP, 2013, pp. 1–6.
[24] T. Song, H. Li, F. Meng, Q. Wu, and B. Luo, "Exploring space-frequency co-occurrences via local quantized patterns for texture representation," Pattern Recognit., vol. 48, no. 8, pp. 2621–2632, 2015.
[25] L. Deng and D. Yu, "Deep learning: Methods and applications," Found. Trends Signal Process., vol. 7, no. 3-4, pp. 197–387, Jun. 2014.
[26] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. ICML, 2014, pp. 647–655.
[27] J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872–1886, Aug. 2013.
[28] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A simple deep learning baseline for image classification?" IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017–5032, Dec. 2015.
[29] Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," in Proc. ISCAS, May 2010, pp. 253–256.
[30] T. Randen and J. Husoy, "Filtering for texture classification: a comparative study," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 4, pp. 291–310, Apr. 1999.
[31] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[32] R. Kashyap and A. Khotanzad, "A model-based method for rotation invariant texture classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 4, pp. 472–481, Jul. 1986.
[33] H. Deng and D. Clausi, "Gaussian MRF rotation-invariant features for image classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 7, pp. 951–955, Jul. 2004.
[34] T. Ahonen and M. Pietikäinen, "Image description using joint distribution of filter bank responses," Pattern Recogn. Lett., vol. 30, no. 4, pp. 368–376, Mar. 2009.
[35] T. Leung and J. Malik, "Representing and recognizing the visual appearance of materials using three-dimensional textons," Int. J. Comput. Vision, vol. 43, no. 1, pp. 29–44, Jun. 2001.
[36] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," Int. J. Comput. Vision, vol. 62, no. 1-2, pp. 61–81, Apr. 2005.
[37] D. Lowe, "Distinctive image features from scale-invariant key-points," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[38] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using local affine regions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1265–1278, Aug. 2005.
[39] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local features and kernels for classification of texture and object categories: A comprehensive study," Int. J. Comput. Vis., vol. 73, no. 2, pp. 213–238, Jun. 2007.
[40] Z. Guo, L. Zhang, and D. Zhang, "A completed modeling of local binary pattern operator for texture classification," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010.
[41] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010.
[42] Y. Zhao, D.-S. Huang, and W. Jia, "Completed local binary count for rotation invariant texture classification," IEEE Trans. Image Process., vol. 21, no. 10, pp. 4492–4497, Oct. 2012.
[43] B. Zhang, Y. Gao, S. Zhao, and J. Liu, "Local derivative pattern versus local binary pattern: Face recognition with high-order local pattern descriptor," IEEE Trans. Image Process., vol. 19, no. 2, pp. 533–544, Feb. 2010.
[44] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, and W. Gao, "WLD: A robust local image descriptor," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720, Sep. 2010.
[45] G. Zhao, T. Ahonen, J. Matas, and M. Pietikäinen, "Rotation-invariant image and video description with local binary pattern features," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1465–1477, Apr. 2012.
[46] M. Heikkilä and M. Pietikäinen, "A texture-based method for modeling the background and detecting moving objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 657–662, Apr. 2006.
[47] S. ul Hussain and B. Triggs, "Visual recognition using local quantized patterns," in Proc. ECCV, 2012, pp. 716–729.
[48] A. Satpathy, X. Jiang, and H.-L. Eng, "LBP-based edge-texture features for object recognition," IEEE Trans. Image Process., vol. 23, no. 5, pp. 1953–1964, May 2014.
[49] S. Liao, M. Law, and A. Chung, "Dominant local binary patterns for texture classification," IEEE Trans. Image Process., vol. 18, no. 5, pp. 1107–1118, May 2009.
[50] S. Liao, X. Zhu, Z. Lei, L. Zhang, and S. Z. Li, "Learning multi-scale block local binary patterns for face recognition," in Proc. ICB, 2007, pp. 828–837.
[51] L. Wolf, T. Hassner, and Y. Taigman, "Effective unconstrained face recognition by combining multiple descriptors and learned background statistics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 1978–1990, Oct. 2011.
[52] K. Wang, C.-E. Bichot, C. Zhu, and B. Li, "Pixel to patch sampling structure and local neighboring intensity relationship patterns for texture classification," IEEE Signal Process. Lett., vol. 20, no. 9, pp. 853–856, Sep. 2013.
[53] L. Liu, P. Fieguth, G. Kuang, and H. Zha, "Sorted random projections for robust texture classification," in Proc. ICCV, Nov. 2011, pp. 391–398.
[54] S. Murala, R. Maheshwari, and R. Balasubramanian, "Local tetra patterns: A new feature descriptor for content-based image retrieval," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2874–2886, May 2012.
[55] K.-C. Fan and T.-Y. Hung, "A novel local pattern descriptor—local vector pattern in high-order derivative space for face recognition," IEEE Trans. Image Process., vol. 23, no. 7, pp. 2877–2891, Jul. 2014.
[56] A. Ramirez Rivera, R. Castillo, and O. Chae, "Local directional number pattern for face analysis: Face and expression recognition," IEEE Trans. Image Process., vol. 22, no. 5, pp. 1740–1752, May 2013.
[57] S. Wang, Q. Wu, X. He, J. Yang, and Y. Wang, "Local N-ary pattern and its extension for texture classification," IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, pp. 1495–1506, Sep. 2015.
[58] X. Qi, R. Xiao, C. Li, Y. Qiao, J. Guo, and X. Tang, "Pairwise rotation invariant co-occurrence local binary pattern," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 11, pp. 2199–2213, Nov. 2014.
[59] Y. Quan, Y. Xu, Y. Sun, and Y. Luo, "Lacunarity analysis on image patterns for texture classification," in Proc. CVPR, 2014.
[60] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, "Local Gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition," in Proc. ICCV, 2005.
[61] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition," IEEE Trans. Image Process., vol. 16, no. 1, pp. 57–68, Jan. 2007.
[62] S. Xie, S. Shan, X. Chen, and J. Chen, "Fusing local patterns of Gabor magnitude and phase for face recognition," IEEE Trans. Image Process., vol. 19, no. 5, pp. 1349–1361, May 2010.
[63] Z. Lei, S. Liao, M. Pietikäinen, and S. Li, "Face recognition by exploring information jointly in space, scale and orientation," IEEE Trans. Image Process., vol. 20, no. 1, pp. 247–256, Jan. 2011.
[64] M. Yang, L. Zhang, S. C.-K. Shiu, and D. Zhang, "Monogenic binary coding: An efficient local feature extraction approach to face recognition," IEEE Trans. Inf. Forensics Security, vol. 7, no. 6, pp. 1738–1751, Dec. 2012.
[65] J. J. Koenderink and A. J. van Doorn, "Surface shape and curvature scales," Image Vision Comput., vol. 10, no. 8, pp. 557–565, Oct. 1992.
[66] K. Pedersen, K. Stensbo-Smidt, A. Zirm, and C. Igel, "Shape index descriptors applied to texture-based galaxy analysis," in Proc. ICCV, 2013, pp. 2440–2447.
[67] T. Ojala, T. Mäenpää, M. Pietikäinen, J. Viertola, J. Kyllönen, and S. Huovinen, "Outex—new framework for empirical evaluation of texture analysis algorithms," in Proc. ICPR, 2002, pp. 701–706.
[68] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink, "Reflectance and texture of real-world surfaces," ACM Trans. Graph., vol. 18, no. 1, pp. 1–34, Jan. 1999.
[69] M. Fritz, E. Hayman, B. Caputo, and J.-O. Eklundh, "On the significance of real-world conditions for material classification," in Proc. ECCV, 2004, pp. 253–266.
[70] A. Depeursinge, A. Foncubierta-Rodriguez, D. Van De Ville, and H. Müller, "Rotation-covariant texture learning using steerable Riesz wavelets," IEEE Trans. Image Process., vol. 23, no. 2, pp. 898–908, Feb. 2014.
[71] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. CVPR, June 2009, pp. 248–255.
[72] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. CVPR, vol. 2, 2006, pp. 2169–2178.
[73] M. Cimpoi, S. Maji, and A. Vedaldi, "Deep filter banks for texture recognition and segmentation," in Proc. CVPR, 2015, pp. 3828–3836.
[74] L. Liu, P. W. Fieguth, Y. Guo, X. Wang, and M. Pietikäinen, "Local binary features for texture classification: Taxonomy and experimental study," Pattern Recogn., vol. 62, pp. 135–160, 2017.
[75] V. Chandrasekhar, J. Lin, O. Morère, H. Goh, and A. Veillard, "A practical guide to CNNs and Fisher Vectors for image instance retrieval," Signal Process., vol. 128, pp. 426–439, 2016.

Tiecheng Song received his Ph.D. degree in Signal and Information Processing from the University of Electronic Science and Technology of China (UESTC) in 2015. From October 2015 to April 2016, he joined the Multimedia Lab of Nanyang Technological University, Singapore, as a visiting student. He is currently working in the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications (CQUPT). His research interests include feature extraction, texture analysis, and image representation.

Hongliang Li (SM'12) received his Ph.D. degree in Electronics and Information Engineering from Xi'an Jiaotong University, China, in 2005. From 2005 to 2006, he joined the Visual Signal Processing and Communication Laboratory (VSPC) of the Chinese University of Hong Kong (CUHK) as a Research Associate. From 2006 to 2008, he was a Postdoctoral Fellow at the same laboratory in CUHK. He is currently a Professor in the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image segmentation, object detection, image and video coding, visual attention, and multimedia communication systems.
Dr. Li has authored or co-authored numerous technical articles in well-known international journals and conferences. He is a co-editor of a Springer book titled Video Segmentation and Its Applications. Dr. Li has been involved in many professional activities. He is a member of the Editorial Board of the Journal on Visual Communications and Image Representation, and the Area Editor of Signal Processing: Image Communication, Elsevier Science. He served as a Technical Program Co-chair of ISPACS 2009, General Co-chair of ISPACS 2010, Publicity Co-chair of IEEE VCIP 2013, Local Chair of IEEE ICME 2014, and as a TPC member for a number of international conferences, e.g., ICME 2013, ICME 2012, ISCAS 2013, PCM 2007, PCM 2009, and VCIP 2010. He serves as a Technical Program Co-chair for IEEE VCIP 2016. He is now a senior member of IEEE.

Fanman Meng (S'12-M'14) received the Ph.D. degree in Signal and Information Processing from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2014. From July 2013 to July 2014, he joined the Division of Visual and Interactive Computing of Nanyang Technological University in Singapore as a Research Assistant. He is currently an Associate Professor in the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China. His research interests include image segmentation and object detection.
Dr. Meng has authored or co-authored numerous technical articles in well-known international journals and conferences. He received the "Best Student Paper Honorable Mention Award" at the 12th Asian Conference on Computer Vision (ACCV 2014) in Singapore and the "Top 10% Paper Award" at the IEEE International Conference on Image Processing (ICIP 2014) in Paris, France. He is now a member of IEEE and the IEEE CAS Society.

Qingbo Wu (S'12-M'13) received the B.E. degree in Education of Applied Electronic Technology from Hebei Normal University in 2009, and the Ph.D. degree in Signal and Information Processing from the University of Electronic Science and Technology of China in 2015. From February 2014 to May 2014, he was a Research Assistant with the Image and Video Processing (IVP) Laboratory at the Chinese University of Hong Kong. Then, from October 2014 to October 2015, he served as a visiting scholar with the Image & Vision Computing (IVC) Laboratory at the University of Waterloo. He is currently a Lecturer in the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image/video coding, quality evaluation, and perceptual modeling and processing.

Jianfei Cai (S'98-M'02-SM'07) received his PhD degree from the University of Missouri-Columbia. He is currently an Associate Professor and has served as the Head of the Visual & Interactive Computing Division and the Head of the Computer Communication Division at the School of Computer Engineering, Nanyang Technological University, Singapore. His major research interests include computer vision, visual computing and multimedia networking. He has published more than 180 technical papers in international journals and conferences. He has been actively participating in program committees of various conferences. He served as the leading Technical Program Chair for the IEEE International Conference on Multimedia & Expo (ICME) 2012 and the leading General Chair for the Pacific-Rim Conference on Multimedia (PCM) 2012. Since 2013, he has been serving as an Associate Editor for IEEE Transactions on Image Processing (T-IP). He has also served as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT) from 2006 to 2013.