Article in IEEE Transactions on Circuits and Systems for Video Technology · February 2017
DOI: 10.1109/TCSVT.2017.2671899
Abstract—Classifying texture images, especially those with significant rotation, illumination, scale and viewpoint changes, is a fundamental and challenging problem in computer vision. This paper proposes a simple yet effective image descriptor, called Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. LETRIST is a histogram representation that explicitly encodes the joint information within an image across feature and scale spaces. The proposed representation is training-free, low-dimensional, yet discriminative and robust for texture description. It consists of the following major steps. 1) A set of transform features is constructed to characterize local texture structures and their correlation by applying linear and non-linear operators to the extremum responses of directional Gaussian derivative filters in scale space. Established on the basis of steerable filters, the constructed transform features are exactly rotationally invariant as well as computationally efficient. 2) Scalar quantization via binary or multi-level thresholding is adopted to quantize these transform features into texture codes. Two quantization schemes are designed, both of which are robust to image rotation and illumination changes. 3) Cross-scale joint coding is explored to aggregate the discrete texture codes into a compact histogram representation, i.e., LETRIST. Experimental results on the Outex, CUReT, KTH-TIPS and UIUC texture datasets show that LETRIST consistently produces classification results that are better than or comparable to those of state-of-the-art approaches. Impressively, recognition rates of 100.00% and 99.00% have been achieved on the Outex and KTH-TIPS datasets, respectively. In addition, the noise robustness is evaluated on the Outex and CUReT datasets. The source code is publicly available at https://github.com/stc-cqupt/letrist.

Index Terms—Texture classification, texture analysis, rotation invariance, steerable filter, texton, local binary pattern (LBP).

---
Copyright © 20xx IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
This work was supported in part by the National Natural Science Foundation of China (No. 61525102, No. 61601102 and No. 61502084).
T. Song is with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China (email: tggwin@gmail.com). H. Li, F. Meng and Q. Wu are with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (email: hlli@uestc.edu.cn; fmmeng@uestc.edu.cn; wqb.uestc@gmail.com). H. Li is the corresponding author. J. Cai is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore (email: asjfcai@ntu.edu.sg).
¹ In this paper, the extremum filtering refers to the filtering operations for computing the directional extremum (maximum and minimum) responses.
---

I. INTRODUCTION

EXTRACTION of texture features serves as a fundamental building block in a variety of visual applications such as texture classification [1], face recognition [2], medical diagnosis [3], image matching [4], [5], and scene/object recognition [6], [7]. In the real world, textures not only have complex intrinsic patterns which are difficult to describe with a universal model, but also exhibit large extrinsic variations (e.g., rotation, illumination, and deformation) due to different imaging conditions (as shown in Fig. 6). If no a priori information about these imaging conditions is available, successful classification of these textures is a highly challenging task. In such a scenario, a good image feature representation should have the following desirable properties for texture classification:

1) Discriminative to distinguish different classes of textures [1], [8]–[10];
2) Invariant to image transformations such as rotation, illumination, scale, and viewpoint changes [1], [9], [11];
3) Insensitive to noise [10], [12]–[16];
4) Low-dimensional to facilitate subsequent operations (e.g., storage, matching, or classification) [17], [18];
5) Efficient to implement [1], [10], [19], [20].

Although a large number of approaches have been proposed for texture analysis, to our knowledge most of them cannot balance all of the aforementioned properties [9]. For instance, the texton-based feature representation introduced by Varma and Zisserman [11] can extract discriminative yet robust features when classifying texture images, especially those with significant scale and viewpoint changes (see Section IV). However, it usually involves a high-dimensional feature representation, and requires an extra learning phase and a costly nearest-neighbor calculation. As for the local binary pattern (LBP) introduced by Ojala et al. [1], it is simple to implement and also robust to rotation and illumination changes. However, it is very sensitive to noise. Recently, the binary rotation invariant and noise tolerant (BRINT) descriptor was proposed in [10]. Nonetheless, its discriminative power is compromised by the overemphasized noise robustness, and it also needs a high-dimensional feature representation.

To better handle these five properties, in this paper we propose a novel image feature representation, called Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. In the proposed method, we first compute the extremum responses of the first and second directional Gaussian derivative filters at multiple scales. This is accomplished by the extremum filtering¹ established on the basis of steerable filters [21]. The resulting extremum responses
are exactly rotation-invariant and also easy to implement. Then, we construct a set of transform features by applying linear and non-linear operators to these extremum responses to capture discriminative texture information. Derived from the first- and second-order image derivatives, the constructed transform features can characterize not only local texture structures (e.g., edges, lines and blobs) but also local second-order curvatures (e.g., caps, ridges, ruts and cups). Next, we directly quantize these transform features into discrete texture codes via simple binary or multi-level thresholding. In particular, two types of scalar quantization, i.e., the ratio and uniform quantizers, are designed, both of which are robust to image rotation and illumination changes. Finally, we jointly encode these texture codes across scales to build feature histograms, which are further concatenated to form the image descriptor, i.e., LETRIST. All of the above steps contribute to a simple, low-dimensional, yet discriminative and robust image descriptor. Experimental results demonstrate the effectiveness of our method in classifying textures under various image transformations and even in the presence of Gaussian noise.

There are some works involving Gaussian derivative filters. Varma and Zisserman [11] employed the Maximum Response 8 (MR8) filter bank to build a texton-based image representation. Zhang et al. [20] used the second directional derivative of the Gaussian filter to build a local energy pattern (LEP) histogram. In [22], the maximum responses of Gaussian derivative filters were also used to extract continuous rotation-invariant features for a texton-based representation. In our previous work [23], we adopted the edge and bar filters to build a binary-coding-based texture representation. All these works differ from the proposed method in that they either model images as a texton-based representation or use multiple oriented filters to extract approximately rotation-invariant features. In contrast, the proposed method is training-free and can achieve exact rotation invariance via the extremum filtering.

Our method is related to the work [19], where six basic image features (BIF) are computed from the responses of Gaussian derivative filters to build a BIF-column representation. However, there are three major differences. Firstly, our feature construction is based on the extremum filtering, while [19] is based on the study of local symmetry types. Secondly, unlike [19], which uses simple combinations of filter responses, we introduce non-linear operators and consider the correlation between the first- and second-order features. Thirdly, instead of using the BIF-column representation, we couple our transform features with the tailored scale quantization and joint coding, making our image descriptor very compact.

The proposed LETRIST also differs from our previous work [24], which explores joint space-frequency features for texture representation. Firstly, we generalize [24] in spatial filtering and develop new pixel-wise transform features via the extremum filtering. Secondly, beyond the mean-based quantization used in [24], we design the ratio and uniform quantizers to fit our constructed features. Thirdly, by extending the joint scale coding, we propose the adjacent-scale coding to form a much more compact image representation. Lastly, without resorting to the Fourier transform used in [24], we make our method computationally more efficient by taking advantage of steerable filters to construct the transform features.

Recently, there has been increasing interest in using deep neural networks [25] to compute invariant image representations, such as DeCAF [26], the wavelet scattering networks (ScatNet) [27], and PCANet [28]. A typical architecture of deep convolutional neural networks (CNNs) [29] alternates several layers of convolutional filtering, non-linear rectification and feature pooling, followed by a classification module. Thus, a hierarchy of features corresponding to different levels of abstraction can be learned. In particular, the convolutional filters used in ScatNet and PCANet are respectively predefined wavelets and PCA filters. In the proposed LETRIST, we sequentially perform extremum filtering, feature transform, scalar quantization and cross-scale joint coding to obtain histogram representations. These operators can be viewed as special processing layers of a CNN with a shallow feature extraction architecture. However, unlike traditional CNNs, our method is training-free and the final image features are low-dimensional.

The rest of this paper is organised as follows. Section II reviews the related work. Section III elaborates the proposed method. Section IV evaluates our method, compares classification results with the state of the art, and also tests the noise robustness. Finally, the conclusion is drawn in Section V.

II. RELATED WORK

During the past decades, various approaches to extracting texture features have been proposed, most of which fall into one of three major categories, i.e., statistical, model-based, and filtering approaches [30]. The statistical approaches characterize textures by some statistical measures. For instance, the spatial-dependence co-occurrence matrices introduced by Haralick et al. [31] are built by counting the gray-level pairs of pixels at a specified distance and orientation. Then some statistical measures are computed from these matrices as texture features. In model-based approaches, an image model is assumed, and the model parameters are then estimated and used as texture features. The circular symmetric autoregressive (CSAR) model [32] and the anisotropic circular Gaussian Markov random field (ACGMRF) model [33] are two representative works for rotation-invariant texture classification. In filtering approaches, texture features are generally derived from the local "energy" of filtered images. This category of methods has been extensively studied in the literature, including Laws filters, Gabor filters, and wavelet transforms. A comprehensive analysis and evaluation can be found in [30].

Recently, texton-based and local pattern-based representations, as a generalized combination of the statistical and filtering approaches [11], [34], have shown promising performance in texture classification and face recognition. The texton-based representations [11], [17], [35], [36] involve a learning process to generate a texton dictionary (i.e., a set of textons). This is typically accomplished by clustering (via k-means) local feature vectors extracted from the training images and choosing the cluster centers as the textons. Given a novel image, a frequency histogram of textons is built by assigning each feature vector, and thereby each pixel within this image, to the closest texton in the dictionary. In the existing literature,
Fig. 1. The pipeline of the proposed method. The extremum response maps of $I_{1\max}^{\theta}$, $I_{2\max}^{\theta}$, and $I_{2\min}^{\theta}$ are respectively shown in three stacks. In each stack, the extremum responses are computed at three scales: $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 4$ (top to bottom). The transform feature maps of g, d, s, r and their quantized code maps are shown at these three scales (top to bottom). The final image descriptor is obtained by concatenating three feature histograms. The blue digits shown in the code maps indicate the total numbers of quantization levels.
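The Gaussian derivative filter bank underlying this pipeline can be sampled as follows. This is an illustrative sketch, not the authors' implementation: the kernel truncation at a radius of 3σ is an assumption, while the scale normalization (multiplying the first-order kernels by σ and the second-order kernels by σ²) follows the paper's footnote on scale normalization.

```python
import numpy as np

def gaussian_derivative_kernels(sigma, radius=None):
    """Sampled 2-D Gaussian derivative kernels G_x, G_y, G_xx, G_xy, G_yy
    (the steerable basis filters), scale-normalized by sigma / sigma**2."""
    r = radius if radius is not None else int(round(3 * sigma))
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    # Circularly symmetric Gaussian G(x, y; sigma).
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return {
        "x": sigma * (-x / sigma ** 2) * g,
        "y": sigma * (-y / sigma ** 2) * g,
        "xx": sigma ** 2 * (x ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g,
        "xy": sigma ** 2 * (x * y / sigma ** 4) * g,
        "yy": sigma ** 2 * (y ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g,
    }

def steer_first_order(kernels, theta):
    # Steerability: any orientation of the first Gaussian derivative is
    # a linear combination of the two basis kernels.
    return np.cos(theta) * kernels["x"] + np.sin(theta) * kernels["y"]
```

Convolving an image with these kernels at σ ∈ {1, 2, 4} yields the multi-scale derivative responses from which the extremum response maps in the three stacks of Fig. 1 are computed.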
local feature vectors are obtained from local descriptors [37]–[39], the Maximum Response 8 (MR8) filter bank [11], the responses of local derivative filters [22], [34], source image patches [11], random projection (RP) patches [17], and so on. For texton-based representations, it is problematic to choose a proper dictionary size, and it is costly to perform a nearest-neighbor calculation for each pixel [17], [19], [20], [40].

Without requiring a dictionary learning stage, local pattern-based representations [1], [40]–[44] have been attracting increasing attention. The influential work on the local binary pattern (LBP), introduced by Ojala et al. [1], leads the way in this regard. In LBP, the intensity value of a center pixel is compared with its neighbors, and the resultant sign information (a bit string) is then encoded into an integer code. Finally, a frequency histogram of such codes is computed to represent the image/region. Due to its computational efficiency and texture discrimination, LBP has been widely used in texture classification [1], face recognition [2], video description [45], and object detection [46]–[48]. Many variants of LBP have been developed. Tan and Triggs [41] introduced the local ternary pattern (LTP) to encode the pixel differences into three levels. Liao et al. [49] suggested using the most frequently occurring patterns, called dominant LBP (DLBP), as texture features. It is noted that only the dominant pattern frequencies are considered in DLBP, while the informative pattern types are discarded. To better explore the information in patterns, Guo et al. [8] presented a three-layered model to learn the optimal pattern subset. LBP is a local operator which fails to capture larger-scale structural features. Therefore, patch sampling [10], [50]–[52] and geometric sampling structures [47], [53] were proposed to encode the macro-structure information. The original LBP is sensitive to noise. Consequently, many attempts were made to alleviate this issue by employing local averaging [10], [12], [13], frequency or transform-domain components [14], [15], or an error-correction mechanism [16]. In addition, alternative coding rules were designed for specific applications, such as the completed local binary count (CLBC) [42] for texture classification, local tetra patterns [54] for image retrieval, and local high-order derivative patterns [43], [55] and the local directional number (LDN) [56] for face recognition. Other local pattern-based representations include the Weber local descriptor (WLD) [44], the local energy pattern (LEP) [20], and the local N-ary pattern (LNP) [57].

Combining information that reflects different aspects of an image can contribute to a discriminative feature representation. Great efforts have been made in this direction by exploiting LBP. For instance, Guo et al. [40] proposed the complete LBP (CLBP) by encoding three complementary components, i.e., the center pixel, and the signs and magnitudes of local differences. Qi et al. [58] developed the pairwise rotation invariant co-occurrence LBP (PRICoLBP) by encoding two LBPs collaboratively. By further incorporating multi-scale and multi-orientation color information, they demonstrated the effectiveness of PRICoLBP for visual classification. Hong et al. [18] introduced a numerical variant of LBP, the LBP difference (LBPD), so that it can be effectively combined with other features through a covariance matrix. The resulting texture descriptor, COV-LBPD, achieved competitive performance compared with state-of-the-art methods [19], [40], [45]. Quan et al. [59] performed a lacunarity analysis on multi-scale LBPs to characterize the spatial distribution of texture structures. In our previous work [23], we leveraged locally enhanced binary coding (LEBC) to augment LBP codes for texture representation and obtained promising results. Other efforts include applying the idea of LBP to encode the neighboring information of Gabor features [60]–[63] and monogenic signal features [64] for robust face recognition.

III. PROPOSED METHOD

The proposed method explicitly encodes the joint information within an image across feature and scale spaces. Fig. 1 shows the pipeline of the proposed method, which is summarized in four steps: (1) extremum filtering, (2) feature transform, (3) scalar quantization, and (4) cross-scale joint coding and image representation. In particular, the input image is first convolved with a family of Gaussian derivative filters to compute the extremum responses at multiple scales. Then, based on these extremum responses, a set of transform features
Local image derivatives contain rich structural information, which has shown great potential in texture analysis [1], [11], [13], [19], [20], [22], [23], [44]. Inspired by [11], [22], in this work we employ the steerability of Gaussian derivative filters [21] (up to second order) to compute the extremum (maximum and minimum) responses in scale space. We refer to these operators as "extremum filtering". The motivation for introducing the extremum filtering is multi-fold. The first is to capture the useful information contained in the first- and second-order differential structures at a range of scales. The second is to extract local features that are exactly rotation-invariant. The third is to make our method efficient to implement.

According to the theory of steerable filters [21], any orientation of the first or second derivative of a Gaussian can be synthesized by taking a linear combination of several basis filters. Formally, consider a two-dimensional circularly symmetric Gaussian function:

    G(x, y; \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)    (1)

where σ is the standard deviation or the scale. The first and second Gaussian derivatives at an arbitrary orientation θ are

    G_1^{\theta} = \cos(\theta)G_x + \sin(\theta)G_y    (2)

and

    G_2^{\theta} = \cos^2(\theta)G_{xx} - \sin(2\theta)G_{xy} + \sin^2(\theta)G_{yy}    (3)

where $G_x$, $G_y$, $G_{xx}$, $G_{xy}$ and $G_{yy}$ are the Gaussian partial derivatives³. Convolving the image I with these basis filters yields the derivative responses $L_x = G_x * I$, $L_y = G_y * I$, $L_{xx} = G_{xx} * I$, $L_{xy} = G_{xy} * I$ and $L_{yy} = G_{yy} * I$. The response to the first directional derivative filter can then be written as

    I_1^{\theta} = G_1^{\theta} * I = \cos(\theta)L_x + \sin(\theta)L_y = \sqrt{L_x^2 + L_y^2}\,\cos(\theta - \varphi)    (4)

where $\varphi = \arctan(L_y / L_x)$. Similarly,

    I_2^{\theta} = G_2^{\theta} * I = \cos^2(\theta)L_{xx} - \sin(2\theta)L_{xy} + \sin^2(\theta)L_{yy}
                 = \frac{1}{2}\left(L_{xx} + L_{yy} + \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\,\cos(2\theta - \psi)\right)    (5)

where $\psi = \arctan\left(\frac{2L_{xy}}{L_{yy} - L_{xx}}\right)$. Then, we can compute the extremum responses in a similar way as in [22]. That is, the maximum value of $I_1^{\theta}$ over all θ is

    I_{1\max}^{\theta} = \sqrt{L_x^2 + L_y^2}    (6)

The maximum and minimum values of $I_2^{\theta}$ over all θ are

    I_{2\max}^{\theta} = \frac{1}{2}\left(L_{xx} + L_{yy} + \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\right)    (7)

and

    I_{2\min}^{\theta} = \frac{1}{2}\left(L_{xx} + L_{yy} - \sqrt{(L_{xx} - L_{yy})^2 + 4L_{xy}^2}\right)    (8)

Without loss of generality, let us rotate image I counter-clockwise around its center point by an arbitrary angle α° and denote the rotated image as I′. Under this rotation, point (x, y) in I is changed to (x′, y′) in I′. Assume point (x, y) in I achieves the extremum response (as defined in Eqns. (6)-(8)) through the first or second directional Gaussian derivative filter at angle θ̂°. Then, point (x′, y′) in I′ can also achieve the same extremum response through this directional derivative filter at angle (θ̂ + α)°. That is, the extremum response for

---
² Like the LBP codes [1], our quantized pixel-wise codes are rotationally invariant and hence the resulting histogram is also rotationally invariant.
³ The scale normalization operators: $G_x \leftarrow \sigma G_x$, $G_y \leftarrow \sigma G_y$, $G_{xx} \leftarrow \sigma^2 G_{xx}$, $G_{xy} \leftarrow \sigma^2 G_{xy}$, $G_{yy} \leftarrow \sigma^2 G_{yy}$.
---
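As a sanity check on the closed-form extrema of Eqns. (6)-(8), they can be compared against a brute-force search over densely sampled orientations. This is a minimal sketch; the derivative responses used here are arbitrary test values, not taken from a real image.

```python
import math

def extremum_responses(Lx, Ly, Lxx, Lxy, Lyy):
    # Closed-form orientation extrema (Eqns. (6)-(8)): no search over
    # theta is needed, which is what makes extremum filtering cheap.
    i1max = math.hypot(Lx, Ly)
    disc = math.sqrt((Lxx - Lyy) ** 2 + 4.0 * Lxy ** 2)
    i2max = 0.5 * (Lxx + Lyy + disc)
    i2min = 0.5 * (Lxx + Lyy - disc)
    return i1max, i2max, i2min

def brute_force_extrema(Lx, Ly, Lxx, Lxy, Lyy, n=3600):
    # Evaluate I1^theta and I2^theta (Eqns. (4)-(5)) on a dense grid of
    # orientations and take the extrema directly.
    i1, i2 = [], []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        i1.append(math.cos(t) * Lx + math.sin(t) * Ly)
        i2.append(math.cos(t) ** 2 * Lxx
                  - math.sin(2 * t) * Lxy
                  + math.sin(t) ** 2 * Lyy)
    return max(i1), max(i2), min(i2)
```

For any test values the two routes agree up to the angular sampling error; with (Lx, Ly) = (1, 2), for example, both give I1max = √5.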
Fig. 3. The shape-index values in the range [0, 1] and their corresponding local shapes. For more shape categories, please refer to [65].

Fig. 5. Illustration of the binary ratio quantizer for {g, d} and the uniform quantizer for {s, r}. Given a texture image, histograms of g, d, s and r are computed at three scales: $\sigma_1 = 1$, $\sigma_2 = 2$, $\sigma_3 = 4$. The red vertical lines indicate the quantization thresholds: for {g, d}, they are computed as $km_x$; for {s, r}, they are determined by the quantization levels $L_s$ and $L_r$.

The desirable properties of a good quantizer are the computational efficiency, the discriminative power, as well as the robustness to rotation and illumination changes. Toward these goals, two types of scalar quantization via simple binary or multi-level thresholding are designed, taking into account the different feature properties. For transform features {g, d}, whose values lie in a non-negative interval, we adopt a mean-value based binary ratio quantizer $Q_1(\cdot)$:

    y = Q_1(x) = \begin{cases} 0, & \text{if } x/m_x > k \\ 1, & \text{otherwise} \end{cases}    (15)

where x ∈ {g, d}, k is a tuning parameter, and $m_x$ is the mean value of the transform feature map of x (see Fig. 1). Here, the use of $m_x$ is robust to image rotation, and a similar averaging operator was previously used in [13], [23], [24], [40]. Fig. 5 illustrates the binary ratio quantizer for {g, d}⁴. For transform features {s, r}, whose values lie in the range [0, 1], we adopt a uniform quantizer $Q_2(\cdot)$:

    y = Q_2(x) = \begin{cases} 0, & x \in [0, \Delta) \\ 1, & x \in [\Delta, 2\Delta) \\ \cdots \\ L - 1, & x \in [(L-1)\Delta, 1] \end{cases}    (16)

where x ∈ {s, r}, L is the quantization level, and Δ = 1/L is the quantization step. The quantization levels for s and r, denoted by $L_s$ and $L_r$, are not necessarily the same, but both are closely related to the texture discrimination and the feature dimension. As can be seen from Fig. 5, if the quantization level $L_s$ or $L_r$ is too small, the quantized feature codes will be coarse and lack discrimination. If it is too large, however, the resulting feature codes will be noisy and tend to produce a high-dimensional feature representation. To strike a balance, we set $L_s = 3$ and $L_r = 5$. These parameter settings will be discussed in Section IV-C.

Based on the above steps, the computational efficiency is achieved by using the simple scalar quantization. Meanwhile, the discriminative power is maintained by choosing the proper quantization level L via multi-level thresholding. Also, the rotation invariance is inherited from the feature set F = {g, d, s, r} used in Eqns. (15) and (16). Finally, the robustness to illumination changes is gained due to the following facts: i) a local/global brightness change (i.e., a constant is added to each pixel value) is cancelled by the difference operation in the Gaussian derivatives; ii) a local/global contrast change (i.e., each pixel value is multiplied by a constant) is already eliminated by the division operation in Eqns. (11) and (14) for the features {s, r}; a global contrast change is also eliminated by the division operation in Eqn. (15) for {g, d}.

D. Cross-scale Joint Coding and Image Representation

Next, we consider aggregating the generated texture codes into a histogram-based feature representation. A straightforward way is to jointly encode all of these texture codes at all scales, but the resulting feature histogram is extremely high-dimensional and sparse, and may not be discriminative. To deal with this problem, we propose a cross-scale joint coding to first construct multiple feature histograms, and then concatenate these histograms for the image feature representation.

The cross-scale joint coding is performed as follows (see the diagram of this process in Fig. 1).

• Adjacent-scale coding (ASC): Transform features {g, d, s} are jointly encoded across two adjacent scales, e.g., $(\sigma_1, \sigma_2)$, $(\sigma_2, \sigma_3)$, etc. Formally, for the adjacent-scale pair $(\sigma_i, \sigma_{i+1})$ ($i = 1, 2, \ldots, N_\sigma - 1$), the ASC value of pixel (x, y) in image I is computed as

    c_i(x, y) = \sum_{j=1}^{2} (L_s)^{j-1} y_s(x, y; \sigma_{i+j-1})
              + (L_s)^2 \Big[\sum_{j=1}^{2} (L_d)^{j-1} y_d(x, y; \sigma_{i+j-1})\Big]
              + (L_s)^2 (L_d)^2 \Big[\sum_{j=1}^{2} (L_g)^{j-1} y_g(x, y; \sigma_{i+j-1})\Big]    (17)

where $y_s(x, y; \sigma_i)$, $y_d(x, y; \sigma_i)$ and $y_g(x, y; \sigma_i)$ are respectively the quantized texture codes (with corresponding quantization levels $L_s$, $L_d$ and $L_g$) for features s, d and g at the scale $\sigma_i$.

• Full-scale coding (FSC): Transform features {r} are jointly encoded across all $N_\sigma$ scales $(\sigma_1, \ldots, \sigma_{N_\sigma})$. The FSC value of pixel (x, y) in image I is computed by

    c_{N_\sigma}(x, y) = \sum_{j=1}^{N_\sigma} (L_r)^{j-1} y_r(x, y; \sigma_j)    (18)

where $y_r(x, y; \sigma_j)$ is the quantized texture code for feature r at the scale $\sigma_j$ and $L_r$ is the quantization level.

Then, $N_\sigma$ feature histograms $\{H_i \mid i = 1, 2, \ldots, N_\sigma\}$ can be built for image I:

    H_i(l) = \sum_{(x,y) \in I} f(c_i(x, y), l)    (19)

---
⁴ In this work, we have straightforwardly quantized {g, d} into two levels and achieved good classification performance. It is possible to quantize {g, d} into multiple levels to improve the performance, which may require some effort to design the quantization thresholds and to keep the compactness of the encoded features. In fact, how to theoretically seek the optimal quantization levels to achieve the best classification result is an open problem, and we leave it as future work.
---
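The quantization and mixed-radix coding steps above can be sketched as follows. This is an illustrative sketch, not the authors' code: 0-based scale indices and per-pixel code lists are assumptions; $L_s = 3$ and $L_r = 5$ follow the settings in the text, and $L_d = L_g = 2$ follow the binary quantization of {g, d}.

```python
def uniform_quantize(x, L):
    # Uniform quantizer Q2 of Eqn. (16): L levels of width 1/L on [0, 1];
    # the right edge x = 1 falls into the top level L - 1.
    return min(int(x * L), L - 1)

def asc_code(ys, yd, yg, i, Ls=3, Ld=2, Lg=2):
    # Adjacent-scale coding (Eqn. (17)) for the scale pair (i, i+1):
    # the quantized codes of s, d, g at the two scales are packed into one
    # integer in mixed radix, so distinct code tuples map to distinct
    # integers in [0, (Ls * Ld * Lg) ** 2).
    c = sum(Ls ** j * ys[i + j] for j in range(2))
    c += Ls ** 2 * sum(Ld ** j * yd[i + j] for j in range(2))
    c += Ls ** 2 * Ld ** 2 * sum(Lg ** j * yg[i + j] for j in range(2))
    return c

def fsc_code(yr, Lr=5):
    # Full-scale coding (Eqn. (18)): the codes of feature r at all scales
    # are packed into one base-Lr integer.
    return sum(Lr ** j * y for j, y in enumerate(yr))
```

With $L_s = 3$ and $L_d = L_g = 2$, each ASC histogram has $(3 \cdot 2 \cdot 2)^2 = 144$ bins; with $L_r = 5$ and $N_\sigma = 3$, the FSC histogram has $5^3 = 125$ bins, which keeps the concatenated descriptor compact.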
7
images per class, of which 40 images are randomly chosen for 7) BRINT [10]: A rotation-invariant texture descriptor by
training and the remaining 41 for testing. exploring the idea of averaging before binarization. The
4) UIUC Dataset [38]: The UIUC dataset8 contains 25 tex- best results were reported using the 9-scale histogram
ture classes. Each class has 40 images (640×480 pixels) with representation BRINT2 CS CM (1296 dimensions).
significant viewpoint and scale variations. In addition, non- 8) PRICoLBP [58]: A rotation-invariant texture descriptor
rigid deformations, uncontrolled illuminations and viewpoint- by encoding spatial co-occurrence LBP features. We
dependent appearance variations are present in this dataset. directly take the results obtained by P RICoLBPg for
As in [11], [19], [39], half of samples per class are randomly comparison (1180 dimensions).
chosen for training and the remaining half for testing. 9) COV-LBPD [18]: A covariance descriptor fusing the
In our experiments, each texture sample is converted into numerical LBP variant (LBPD) with other features.
gray scale and then normalized to have zero mean and standard In [18], different (rotation-invariant or rotation-variant)
deviation [10], [11], [14], [40]. This normalization removes feature sets were designed for different datasets.
global affine illumination changes. Since our focus is on image 10) PLS [59]: A texture descriptor that exploits the lacu-
feature representation, we use a simple nearest-neighborhood narity analysis to characterize the spatial distribution of
(NN) classifier for texture classification [10], [11], [14], [17]– local image patterns (i.e., LBPs) from multiple scales.
[20], [40], [42], [52]. The distance between two feature 11) VZ MR8 [36]: A texton-based texture representation
histograms is measured using the chi-square statistic: using 8 filter responses derived from 38 filters.
∑ [H1 (k) − H2 (k)]2 12) VZ Joint [11]: A texton-based texture representation
χ2 (H1 , H2 ) = based on local image patches around each pixel.
H1 (k) + H2 (k)
k 13) VZ MRF [11]: A texton-based texture representation
where H1 and H2 are two feature histograms with bins using a two-dimensional histogram — one dimension
indexed by k. The performance is measured in terms of the for the quantized center pixel, and the other dimension
(mean) classification accuracy, which is defined as the number for the assigned texton for each pixel’s neighbour (i.e.,
of correctly classified test samples divided by the total number the N ×N image patch with the center pixel discarded).
of all test samples. For the CUReT, KTH-TIPS and UIUC 14) RP [17]: A texton-based texture representation based on
datasets, the results are reported as the average accuracy over local image patches using random project.
100 random splits of the training and test sets. 15) CMR [22]: A texton-based texture representation based
on continuous maximum responses of Gaussian deriva-
B. Comparison Methods and Implementation Details tives filters computed at 4 scales.
We compare our method with 2 baselines and 20 state-of- 16) BIF [19]: A BIF-based texture representation. A set of
the-art approaches. The details are given as follows. 6 BIFs is computed at 4 scales to populate a 1296-
dimensional histogram. The histogram stack is generated
1) LBP [1]: The rotation-invariant uniform patterns
riu2 based on 8 base scales. The multi-scale metric and
LBPP,R (P sampling neighbors on a circle of radius
…R). We implement LBP_{24,3}^{riu2} as one baseline, which has 26-dimensional features.
2) LTP [41]: The local ternary patterns based on the split ternary coding. We implement LTP_{24,3}^{riu2} as the other baseline, which has 52-dimensional features.
3) CLBP [40]: The completed LBP obtained by encoding the center pixel and the signs and magnitudes of local differences. We implement the 3-scale joint histogram representation CLBP S_{8,1}^{riu2}/M_{8,1}^{riu2}/C + CLBP S_{16,2}^{riu2}/M_{16,2}^{riu2}/C + CLBP S_{24,3}^{riu2}/M_{24,3}^{riu2}/C (2200 dimensions).
4) CLBC [42]: The completed LBC obtained by encoding three complementary components similar to CLBP. We implement the 3-scale joint histogram representation CLBC S_{8,1}^{riu2}/M_{8,1}^{riu2}/C + CLBC S_{16,2}^{riu2}/M_{16,2}^{riu2}/C + CLBC S_{24,3}^{riu2}/M_{24,3}^{riu2}/C (1900 dimensions).
5) disCLBP [8]: A discriminative texture descriptor obtained by learning the optimal pattern subset. The best results were reported by integrating the learning model with CLBP.
6) LBP-PTP [52]: A rotation-invariant texture descriptor using the Pixel-To-Patch (PTP) sampling structure to encode the neighboring intensity relationship. The best results were reported using the 3-scale joint histogram representation LNIRP/LBP/DCI PTP (600 dimensions).
…histogram-stack scale-shifting are adopted to obtain the minimum histogram matching distance.
17) LEP [20]: The pyramid LEP with the shifting scheme PLEP_{6,3}^{riu2}. Similar to [19], the 3-scale LEP features are computed at 3 pyramid levels to obtain the minimum histogram matching distance.
18) LFD [14]: A rotation-invariant local frequency descriptor obtained by extracting the magnitude and phase features from low frequency components (264 dimensions).
19) RTL [70]: An approach for iterative rotation-covariant texture learning using steerable Riesz wavelets. We take the best results reported in [70] for comparison.
20) LNP [57]: The local n-ary pattern, which generalizes the local pattern representation. In [57], different (rotation-invariant or rotation-variant) LNP descriptors were used for different datasets, and we report the best results.
21) LEBC [23]: A rotation-invariant texture representation using a binary code ensemble. The source image and the responses of the edge and bar filters at 8 orientations are used as the feature set, followed by local binary coding. This yields a 1280-dimensional feature representation.
22) SFC [24]: A rotation-invariant texture representation obtained by exploring space-frequency co-occurrences (SFC) via local quantized patterns. We report the results using the setting of G3 L1 R22 (1728 dimensions).

8 http://www-cvr.ai.uiuc.edu/ponce_grp/data/index.html
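Several of the baselines above build on the rotation-invariant uniform operator LBP_{P,R}^{riu2} [1], which maps each circular neighborhood of P sampled pixels to one of P + 2 codes. A minimal sketch of the coding rule (not the authors' implementation; circular interpolation of neighbors is omitted for brevity):

```python
def lbp_riu2(neighbors, center):
    """Rotation-invariant uniform LBP code for one pixel.

    neighbors: intensities of the P circularly sampled neighbors.
    center:    intensity of the center pixel.
    Uniform patterns (at most 2 circular 0/1 transitions) are mapped
    to their number of 1s (0..P); all other patterns share label P + 1.
    """
    bits = [1 if n >= center else 0 for n in neighbors]
    p = len(bits)
    # count circular 0/1 transitions in the bit string
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    return sum(bits) if transitions <= 2 else p + 1
```

With P = 24 this yields 26 distinct codes, matching the 26-dimensional LBP_{24,3}^{riu2} histogram used as a baseline.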
Fig. 7. Classification performance of the proposed LETRIST ASC with respect to k and L_s. (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.
Fig. 8. Classification performance of the proposed LETRIST FSC with respect to c and L_r. (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.
Fig. 9. Classification performance of the proposed LETRIST as a function of k and c (L_s = 3 and L_r = 5). (a) Outex TC10. (b) CUReT. (c) KTH-TIPS.
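Figs. 7-9 sweep the parameters k and c and the quantization levels L_s and L_r, which control scalar quantizers whose thresholds are tied to global image statistics (Eqns. (14)-(16) are not reproduced in this excerpt). Purely as an illustration of multi-level scalar quantization against global-mean-based thresholds, a hypothetical stand-in for those equations:

```python
import numpy as np

def quantize_global(x, levels, k=2.0):
    """Illustrative L-level scalar quantizer (NOT the paper's Eqns. (14)-(16)).

    Places L - 1 thresholds at equal fractions of k times the global
    mean of the response map x, then assigns each response an integer
    code in {0, ..., L - 1}.
    """
    x = np.asarray(x, dtype=np.float64)
    t = k * x.mean() * np.arange(1, levels) / levels  # L - 1 thresholds
    return np.digitize(x, t)
```

Coarse quantization (L = 2) discards most of the response range, while very fine quantization shrinks the step between thresholds, which is consistent with the sensitivity to noise discussed below.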
Among the above 22 approaches, 5), 11)-15), 17) and 19) require a learning phase to extract texture features, while the others are training-free. In our experiments, the results of LBP, LTP, CLBP, CLBC, LEBC and SFC are reported according to our own implementation. The results of the other methods are directly taken from the literature.

C. Parameter Settings and Influences of Different Processing Techniques

1) Parameter Settings: The proposed method involves four main parameters that need to be evaluated, i.e., c in Eqn. (14), k in Eqn. (15), and L (L_s and L_r) in Eqn. (16). We notice that the parameters k and L_s are associated with the LETRIST ASC representation, while c and L_r are associated with LETRIST FSC. Thus, we first test the classification performance of LETRIST ASC with respect to k and L_s, and then test the classification performance of LETRIST FSC with respect to c and L_r. Finally, we test the overall classification performance of LETRIST (i.e., the concatenated LETRIST ASC and LETRIST FSC) with different combinations of k, c, L_s and L_r. We conduct the experiments on the Outex (TC10), CUReT and KTH-TIPS datasets, and the test results are reported in Fig. 7, Fig. 8 and Fig. 9, respectively.

From Fig. 7 and Fig. 8, it can be seen that there are significant performance gaps when L (L_s and L_r) is varied from 2 to 3 (or more) across all three datasets. This is because when the quantization level L is set to 2, the quantizer is too coarse and most of the potentially discriminative information is discarded. As L increases, more structure information is preserved, leading to improved classification performance. Furthermore, it can be seen that when L_s > 2 and L_r > 3, LETRIST ASC and LETRIST FSC obtain stable and high performance over a wide range of values of k and c, i.e., k ∈ [1, 2.6] and c ∈ [0.4, 1.8]. This gives us more freedom in choosing the values of k and c. From Fig. 7 (b) and (c), one can see that increasing L_s does not always lead to better classification results. The reason is that when the quantization level increases, the quantization step becomes smaller, making the output more vulnerable to noise. Also, a large quantization level tends to give a high-dimensional representation. According to our experiments, it is preferable
to set L_s ∈ {3, 4} and L_r ∈ {3, 4, 5}. When L_s = 3 and L_r = 5, one can observe from Fig. 9 that our method is not sensitive to k and c: the classification performance does not vary much when varying k ∈ {1.4, 1.6, 1.8, 2, 2.2} and c ∈ {0.6, 0.8, 1, 1.2, 1.4}. Thus, in this work we set k = 2, c = 1, L_s = 3 and L_r = 5 by default.

2) Influences of Different Processing Techniques: In this part, we evaluate the contributions of different processing techniques to the proposed method, including the possible scale, feature and coding schemes. We test on Outex TC12 (tl84) and CUReT and plot the results in Fig. 10.

Fig. 10 (a) shows the classification performance of the proposed method when varying the scale parameters (σ1, ..., σ_{Nσ}) used in multi-scale extremum filtering. It can be seen that increasing the number of scales does not always improve the classification accuracy. With the same number of scales, using smaller scales obtains better results. Therefore, in this paper we empirically set σ1 = 1, σ2 = 2 and σ3 = 4 to obtain a low-dimensional image descriptor while preserving high classification performance. These parameters are fixed and used for all datasets in the following experiments.

Fig. 10 (b) shows the results of the proposed method using different feature and coding schemes. Comparing F0 with F10, F11, F12 and F13, we observe reduced classification performance if any one of the four features {g, d, s, r} is left out. In particular, there is a relatively large drop in classification accuracy without {s} or {r}. These results indicate the effectiveness of each transform feature for texture description. Comparing F0 with F20 and F21, we observe the advantages of the proposed cross-scale joint coding over the other schemes. Also, F22, F23 and F24 show very poor performance by directly using the extremum responses. Therefore, the proposed transform features coupled with the cross-scale joint coding are effective for texture classification.

D. Classification Results

In this subsection, we perform a comparative evaluation of the proposed method against the state-of-the-art in terms of classification accuracy. Tables I, II, III and IV present the comparison results on the Outex, CUReT, KTH-TIPS and UIUC datasets, respectively.

Results on Outex Dataset. From Table I, one can observe that LETRIST performs best for all test suites on the Outex dataset. Impressively, perfect recognition rates of 100% have been achieved for the TC10 and TC12-horizon test suites. The second best method is SFC, followed by LBP-PTP. When tested from TC10 to TC12, LBP-PTP suffers a performance drop, whereas the proposed method shows steady classification performance. This demonstrates the robustness of our method to the mixed variations of rotation and illumination. The training-free LFD, LEBC and BRINT perform fairly well, and they all produce better results than the LBP variants CLBP and CLBC. For learning-based methods, RTL works better than disCLBP for the TC12 test suite, and both outperform VZ MR8 and VZ Joint by a large margin. Notably, the classification accuracies of VZ MR8 and VZ Joint are even lower than those of the two baselines, LBP and LTP, for TC10. Among the proposed features, LETRIST ASC1 performs best for the TC10 and TC12-horizon test suites, whereas LETRIST FSC performs best for TC12-tl84. When these three features are combined, the proposed LETRIST achieves the best performance, demonstrating the complementarity of the three features for texture description. Fig. 11 shows the test samples and their nearest neighbours wrongly classified by our method for the TC12-tl84 test suite. One can see that the visual similarity of these 8 image pairs accounts for the misclassification.

Results on CUReT Dataset. As can be seen from Table II, most of the state-of-the-art approaches are competitive on this dataset. Among the top six approaches are SFC, BIF, LETRIST, LEBC, RP and CMR, all of which produce about
Fig. 11. Texture samples and their nearest neighbours wrongly classified by our method for the Outex TC12 (tl84) test suite. Top: test samples; Bottom:
the nearest neighbours found from the training samples. “ID” denotes the image index and “T” denotes the image class index.
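The nearest neighbours in Fig. 11 come from the nearest-neighbour classification protocol used throughout the experiments: a test histogram is assigned the class of the training histogram with the minimum matching distance. A minimal sketch using the chi-square distance, one common histogram matching distance (the paper's exact distance is not restated in this excerpt):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(test_hist, train_hists, train_labels):
    """Return the label of the training histogram with the
    minimum histogram matching distance to the test histogram."""
    d = [chi_square(test_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]
```

Misclassifications such as those in Fig. 11 arise when the closest training histogram under this distance belongs to a visually similar but different class.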
98.5∼98.7% classification accuracies. They are followed by PRICoLBP, disCLBP, VZ MRF, LFD, BRINT, VZ MR8, and VZ Joint. The two baselines, LBP and LTP, give the lowest performances while using very low-dimensional feature representations. Regarding the proposed features, LETRIST ASC1 yields the best results and LETRIST FSC the worst. However, similar to what has been shown on the Outex dataset, the combination of the three complementary features produces remarkably improved performance on this dataset. Although the state-of-the-art approaches achieve high classification accuracies, they typically require a large number of feature dimensions. For instance, the dimensions of SFC, BIF, LEBC, RP, CMR and PRICoLBP are 1728, 1296, 1280, 2440, 2440 and 1180, respectively. In contrast, LFD and LETRIST use only 264- and 413-dimensional features, respectively. In this situation, the proposed LETRIST outperforms LFD by about 0.6%. Note that the good result of 98.6% achieved by BIF on this dataset relies on the multi-scale metric and scale-shifting. Without such "post-processing" techniques, the classification accuracy of BIF is only 98.1% [19], in contrast to our result of 98.54%. It is also noted that our method is training-free, requiring no additional learning as in disCLBP, VZ MR8, VZ Joint, VZ MRF, RP, and CMR.

Results on KTH-TIPS Dataset. From Table III, one can clearly see that the proposed LETRIST performs best among all the compared methods. The next top methods are BIF, PLS and PRICoLBP, with classification accuracies all about 0.5∼0.6% lower than ours. After these methods, SFC, COV-LBPD and LEBC obtain similar results, and they work better than LEP, CLBP and CLBC. The texton-based VZ MR8, VZ Joint and CMR do not show great advantages on this dataset. For the proposed features, LETRIST ASC1 and LETRIST ASC2 (both 144 dimensions) produce better results than CLBP (2200 dimensions) and CLBC (1990 dimensions). Moreover, our compact LETRIST ASC1 performs competitively with the 1728-dimensional SFC. It should be mentioned that rotation-variant features were used in COV-LBPD and LEP to produce the reported results in Table III. This gives them a great advantage on the KTH-TIPS dataset, in which the images exhibit no significant rotation. Nonetheless, the proposed method surpasses all other methods investigated.

Results on UIUC Dataset. As shown in Table IV, BIF
TABLE IV
CLASSIFICATION ACCURACY (%) ON THE UIUC DATASET. THE RESULTS OF PLS [59] AND LNP [57] WERE OBTAINED USING THE SUPPORT VECTOR MACHINE (SVM) CLASSIFIER. THE NUMBER MARKED WITH † IS THE DIMENSION OF ONE SINGLE HISTOGRAM FROM HISTOGRAM STACKS. THE THREE BEST RESULTS ARE IN BOLD.

Method          Accuracy  Dimension  Published in
LBP [1]         64.06     26         PAMI02
LTP [41]        82.08     52         TIP10
CLBP [40]       91.56     2200       TIP10
CLBC [42]       92.30     1990       TIP12
VZ MR8 [36]     92.94     2500       IJCV05
VZ Joint [11]   97.83     2500       PAMI09
BIF [19]        98.8      1296(†)    IJCV10
LEBC [23]       94.29     1280       VCIP13
PLS [59]        96.57     140        CVPR14
LNP [57]        89.2      -          TCSVT15
SFC [24]        96.71     1728       PR15
LETRIST ASC1    96.45     144
LETRIST ASC2    96.52     144
LETRIST FSC     84.84     125
LETRIST         97.63     413

TABLE V
CLASSIFICATION ACCURACY (%) ON OUTEX TC40 A AND OUTEX TC40 BC (TC40 BC REPRESENTS TC40 B AND TC40 C, AND THE RESULT FOR TC40 BC IS THE AVERAGE ACCURACY OBTAINED ON TC40 B AND TC40 C). THE RESULTS FOR THE METHOD MARKED WITH † (‡) WERE OBTAINED USING THE PCA (SVM) CLASSIFIER. THE RESULTS FOR THE COMPARED METHODS ARE TAKEN FROM [74].

Dataset   ScatNet(†)  ScatNet  PCANet  FV-VGGVD(‡)  LETRIST
TC40 A    94.07       87.55    59.49   93.7         98.59
TC40 BC   77.93       72.45    44.39   71.6         98.35

Fig. 12. Left: noise-free images; Right: noisy images with SNR = 15, 10, and 5 (from left to right).
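The noisy images in Fig. 12 are obtained by adding zero-mean Gaussian noise whose standard deviation is set by the target SNR, following [10]. A sketch under the assumption that the SNR denotes the linear ratio between the signal and noise standard deviations (an assumption about the convention of [10], not a quote of it):

```python
import numpy as np

def add_gaussian_noise(image, snr, rng=None):
    """Corrupt an image with zero-mean Gaussian noise.

    The noise standard deviation is image.std() / snr, i.e. the SNR
    is interpreted here as a linear amplitude ratio (an assumption;
    see [10] for the exact definition used in the experiments).
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=np.float64)
    sigma = img.std() / snr
    return img + rng.normal(0.0, sigma, img.shape)
```

Smaller SNR values thus inject proportionally stronger noise, which is why the SNR = 5 setting in Fig. 12 is visibly the most degraded.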
Fig. 13. Performance comparison of different methods in the presence of different levels of Gaussian noise on the Outex and CUReT datasets.

…noise with zero mean and standard deviation determined by the signal-to-noise ratio (SNR) [10]. Fig. 12 shows two example images and their noisy versions, and Fig. 13 presents the classification results of different methods with respect to different SNR levels. One can see that the proposed LETRIST shows very strong "anti-noise" ability compared with LBP, LTP, CLBP, CLBC and LEBC at all noise levels on both datasets. Moreover, LETRIST works much better than the state-of-the-art BRINT when SNR > 5 for all three Outex test suites, while BRINT works much better than LETRIST on both the Outex and CUReT datasets when SNR = 5. In addition, LETRIST and SFC perform competitively at higher SNR levels (e.g., SNR = 100 and 30) on Outex and CUReT. In the case of SNR = 5, LETRIST significantly outperforms SFC on both datasets. The noise robustness of LETRIST mainly lies in two aspects: i) LETRIST is built upon the extremum responses derived from low-order Gaussian derivative filters, so it is noise-robust by design; ii) it uses the global averaging operator for scalar quantization, which is also robust to Gaussian noise.

V. CONCLUSION

In this paper, we have presented a simple yet effective image descriptor, Locally Encoded TRansform feature hISTogram (LETRIST), for texture classification. LETRIST is built by quantizing and encoding a set of transform features derived from the extremum responses of the first and second directional Gaussian derivative filters. The transform features are constructed to characterize local texture structures and their correlation. Scalar quantization, i.e., binary or multi-level thresholding, is adopted to generate informative texture codes, and cross-scale joint coding is explored to build the compact image feature representation. The proposed LETRIST is training-free and efficient to implement. It is also low-dimensional, yet discriminative and robust for texture description. Experimental results demonstrate that our method is not only robust to rotation, illumination, scale and viewpoint changes, but also robust to Gaussian noise. In future work, we plan to extend the proposed method from the 2D plane

ACKNOWLEDGMENT

The authors would like to thank MVG, Z. Guo and Y. Zhao for providing the source codes of LBP, CLBP and CLBC.

REFERENCES

[1] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.
[2] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
[3] A. Larsen, J. Vestergaard, and R. Larsen, "HEp-2 cell classification using shape index histograms with donut-shaped spatial pooling," IEEE Trans. Med. Imag., vol. 33, no. 7, pp. 1573–1580, Jul. 2014.
[4] T. Song and H. Li, "Local polar DCT features for image description," IEEE Signal Process. Lett., vol. 20, no. 1, pp. 59–62, 2013.
[5] T. Song, F. Meng, Q. Wu, B. Luo, T. Zhang, and Y. Xu, "L2SSP: Robust keypoint description using local second-order statistics with soft-pooling," Neurocomput., vol. 230, pp. 230–242, 2017.
[6] T. Song and H. Li, "WaveLBP based hierarchical features for image classification," Pattern Recogn. Lett., vol. 34, no. 12, pp. 1323–1328, 2013.
[7] Y. Zhang, J. Wu, and J. Cai, "Compact representation of high-dimensional feature vectors for large-scale image recognition and retrieval," IEEE Trans. Image Process., vol. 25, no. 5, pp. 2407–2419, May 2016.
[8] Y. Guo, G. Zhao, and M. Pietikäinen, "Discriminative features for texture description," Pattern Recogn., vol. 45, no. 10, pp. 3834–3843, Oct. 2012.
[9] U. Kandaswamy, S. Schuckers, and D. Adjeroh, "Comparison of texture analysis schemes under nonideal conditions," IEEE Trans. Image Process., vol. 20, no. 8, pp. 2260–2275, Aug. 2011.
[10] L. Liu, Y. Long, P. Fieguth, S. Lao, and G. Zhao, "BRINT: Binary rotation invariant and noise tolerant texture classification," IEEE Trans. Image Process., vol. 23, no. 7, pp. 3071–3084, Jul. 2014.
[11] M. Varma and A. Zisserman, "A statistical approach to material classification using image patch exemplars," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 2032–2047, Nov. 2009.
[12] F. Khellah, "Texture classification using dominant neighborhood structure," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3270–3279, 2011.
[13] T. Song, H. Li, F. Meng, Q. Wu, B. Luo, B. Zeng, and M. Gabbouj, "Noise-robust texture description using local contrast patterns via global measures," IEEE Signal Process. Lett., vol. 21, no. 1, pp. 93–96, 2014.
[14] R. Maani, S. Kalra, and Y. Yang, "Rotation invariant local frequency descriptors for texture classification," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2409–2419, Jun. 2013.
[15] J. He, H. Ji, and X. Yang, "Rotation invariant texture descriptor using local shearlet-based energy histograms," IEEE Signal Process. Lett., vol. 20, no. 9, pp. 905–908, Sep. 2013.
[16] J. Ren, X. Jiang, and J. Yuan, "Noise-resistant local binary pattern with an embedded error-correction mechanism," IEEE Trans. Image Process., vol. 22, no. 10, pp. 4049–4060, Oct. 2013.
[17] L. Liu and P. Fieguth, "Texture classification from random features," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 574–586, Mar. 2012.
[18] X. Hong, G. Zhao, M. Pietikäinen, and X. Chen, "Combining LBP difference and feature correlation for texture description," IEEE Trans. Image Process., vol. 23, no. 6, pp. 2557–2568, Jun. 2014.
[19] M. Crosier and L. D. Griffin, "Using basic image features for texture classification," Int. J. Comput. Vision, vol. 88, no. 3, pp. 447–460, 2010.
[20] J. Zhang, J. Liang, and H. Zhao, "Local energy pattern for texture classification using self-adaptive quantization thresholds," IEEE Trans. Image Process., vol. 22, no. 1, pp. 31–42, Jan. 2013.
[21] W. Freeman and E. Adelson, "The design and use of steerable filters," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 9, pp. 891–906, Sep. 1991.
[22] J. Zhang, H. Zhao, and J. Liang, "Continuous rotation invariant local descriptors for texton dictionary-based texture classification," Comput. Vis. Image Underst., vol. 117, no. 1, pp. 56–75, Jan. 2013.
[23] T. Song, F. Meng, B. Luo, and C. Huang, "Robust texture representation by using binary code ensemble," in Proc. VCIP, 2013, pp. 1–6.
[24] T. Song, H. Li, F. Meng, Q. Wu, and B. Luo, "Exploring space-frequency co-occurrences via local quantized patterns for texture representation," Pattern Recognit., vol. 48, no. 8, pp. 2621–2632, 2015.
[25] L. Deng and D. Yu, "Deep learning: Methods and applications," Found. Trends Signal Process., vol. 7, no. 3-4, pp. 197–387, Jun. 2014.
[26] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. ICML, 2014, pp. 647–655.
[27] J. Bruna and S. Mallat, "Invariant scattering convolution networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1872–1886, Aug. 2013.
[28] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A simple deep learning baseline for image classification?" IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017–5032, Dec. 2015.
[29] Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," in Proc. ISCAS, May 2010, pp. 253–256.
[30] T. Randen and J. Husoy, "Filtering for texture classification: A comparative study," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 4, pp. 291–310, Apr. 1999.
[31] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[32] R. Kashyap and A. Khotanzad, "A model-based method for rotation invariant texture classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 4, pp. 472–481, Jul. 1986.
[33] H. Deng and D. Clausi, "Gaussian MRF rotation-invariant features for image classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 7, pp. 951–955, Jul. 2004.
[34] T. Ahonen and M. Pietikäinen, "Image description using joint distribution of filter bank responses," Pattern Recogn. Lett., vol. 30, no. 4, pp. 368–376, Mar. 2009.
[35] T. Leung and J. Malik, "Representing and recognizing the visual appearance of materials using three-dimensional textons," Int. J. Comput. Vision, vol. 43, no. 1, pp. 29–44, Jun. 2001.
[36] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," Int. J. Comput. Vision, vol. 62, no. 1-2, pp. 61–81, Apr. 2005.
[37] D. Lowe, "Distinctive image features from scale-invariant key-points," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[38] S. Lazebnik, C. Schmid, and J. Ponce, "A sparse texture representation using local affine regions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1265–1278, Aug. 2005.
[39] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local features and kernels for classification of texture and object categories: A comprehensive study," Int. J. Comput. Vis., vol. 73, no. 2, pp. 213–238, Jun. 2007.
[40] Z. Guo, L. Zhang, and D. Zhang, "A completed modeling of local binary pattern operator for texture classification," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010.
[41] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010.
[42] Y. Zhao, D.-S. Huang, and W. Jia, "Completed local binary count for rotation invariant texture classification," IEEE Trans. Image Process., vol. 21, no. 10, pp. 4492–4497, Oct. 2012.
[43] B. Zhang, Y. Gao, S. Zhao, and J. Liu, "Local derivative pattern versus local binary pattern: Face recognition with high-order local pattern descriptor," IEEE Trans. Image Process., vol. 19, no. 2, pp. 533–544, Feb. 2010.
[44] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, and W. Gao, "WLD: A robust local image descriptor," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1705–1720, Sep. 2010.
[45] G. Zhao, T. Ahonen, J. Matas, and M. Pietikäinen, "Rotation-invariant image and video description with local binary pattern features," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1465–1477, Apr. 2012.
[46] M. Heikkilä and M. Pietikäinen, "A texture-based method for modeling the background and detecting moving objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 657–662, Apr. 2006.
[47] S. ul Hussain and B. Triggs, "Visual recognition using local quantized patterns," in Proc. ECCV, 2012, pp. 716–729.
[48] A. Satpathy, X. Jiang, and H.-L. Eng, "LBP-based edge-texture features for object recognition," IEEE Trans. Image Process., vol. 23, no. 5, pp. 1953–1964, May 2014.
[49] S. Liao, M. Law, and A. Chung, "Dominant local binary patterns for texture classification," IEEE Trans. Image Process., vol. 18, no. 5, pp. 1107–1118, May 2009.
[50] S. Liao, X. Zhu, Z. Lei, L. Zhang, and S. Z. Li, "Learning multi-scale block local binary patterns for face recognition," in Proc. ICB, 2007, pp. 828–837.
[51] L. Wolf, T. Hassner, and Y. Taigman, "Effective unconstrained face recognition by combining multiple descriptors and learned background statistics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 1978–1990, Oct. 2011.
[52] K. Wang, C.-E. Bichot, C. Zhu, and B. Li, "Pixel to patch sampling structure and local neighboring intensity relationship patterns for texture classification," IEEE Signal Process. Lett., vol. 20, no. 9, pp. 853–856, Sep. 2013.
[53] L. Liu, P. Fieguth, G. Kuang, and H. Zha, "Sorted random projections for robust texture classification," in Proc. ICCV, Nov. 2011, pp. 391–398.
[54] S. Murala, R. Maheshwari, and R. Balasubramanian, "Local tetra patterns: A new feature descriptor for content-based image retrieval," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2874–2886, May 2012.
[55] K.-C. Fan and T.-Y. Hung, "A novel local pattern descriptor: Local vector pattern in high-order derivative space for face recognition," IEEE Trans. Image Process., vol. 23, no. 7, pp. 2877–2891, Jul. 2014.
[56] A. Ramirez Rivera, R. Castillo, and O. Chae, "Local directional number pattern for face analysis: Face and expression recognition," IEEE Trans. Image Process., vol. 22, no. 5, pp. 1740–1752, May 2013.
[57] S. Wang, Q. Wu, X. He, J. Yang, and Y. Wang, "Local N-ary pattern and its extension for texture classification," IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, pp. 1495–1506, Sep. 2015.
[58] X. Qi, R. Xiao, C. Li, Y. Qiao, J. Guo, and X. Tang, "Pairwise rotation invariant co-occurrence local binary pattern," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 11, pp. 2199–2213, Nov. 2014.
[59] Y. Quan, Y. Xu, Y. Sun, and Y. Luo, "Lacunarity analysis on image patterns for texture classification," in Proc. CVPR, 2014.
[60] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, "Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition," in Proc. ICCV, 2005.
[61] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition," IEEE Trans. Image Process., vol. 16, no. 1, pp. 57–68, Jan. 2007.
[62] S. Xie, S. Shan, X. Chen, and J. Chen, "Fusing local patterns of Gabor magnitude and phase for face recognition," IEEE Trans. Image Process., vol. 19, no. 5, pp. 1349–1361, May 2010.
[63] Z. Lei, S. Liao, M. Pietikäinen, and S. Li, "Face recognition by exploring information jointly in space, scale and orientation," IEEE Trans. Image Process., vol. 20, no. 1, pp. 247–256, Jan. 2011.
[64] M. Yang, L. Zhang, S. C.-K. Shiu, and D. Zhang, "Monogenic binary coding: An efficient local feature extraction approach to face recognition," IEEE Trans. Inf. Forensics Security, vol. 7, no. 6, pp. 1738–1751, Dec. 2012.
[65] J. J. Koenderink and A. J. van Doorn, "Surface shape and curvature scales," Image Vision Comput., vol. 10, no. 8, pp. 557–565, Oct. 1992.
[66] K. Pedersen, K. Stensbo-Smidt, A. Zirm, and C. Igel, "Shape index descriptors applied to texture-based galaxy analysis," in Proc. ICCV, 2013, pp. 2440–2447.
[67] T. Ojala, T. Mäenpää, M. Pietikäinen, J. Viertola, J. Kyllönen, and S. Huovinen, "Outex: New framework for empirical evaluation of texture analysis algorithms," in Proc. ICPR, 2002, pp. 701–706.
[68] K. J. Dana, B. van Ginneken, S. K. Nayar, and J. J. Koenderink, "Reflectance and texture of real-world surfaces," ACM Trans. Graph., vol. 18, no. 1, pp. 1–34, Jan. 1999.
[69] M. Fritz, E. Hayman, B. Caputo, and J.-O. Eklundh, "On the significance of real-world conditions for material classification," in Proc. ECCV, 2004, pp. 253–266.
[70] A. Depeursinge, A. Foncubierta-Rodriguez, D. Van De Ville, and H. Muller, "Rotation-covariant texture learning using steerable Riesz wavelets," IEEE Trans. Image Process., vol. 23, no. 2, pp. 898–908, Feb. 2014.
[71] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. CVPR, Jun. 2009, pp. 248–255.
[72] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. CVPR, vol. 2, 2006, pp. 2169–2178.
[73] M. Cimpoi, S. Maji, and A. Vedaldi, "Deep filter banks for texture recognition and segmentation," in Proc. CVPR, 2015, pp. 3828–3836.
[74] L. Liu, P. W. Fieguth, Y. Guo, X. Wang, and M. Pietikäinen, "Local binary features for texture classification: Taxonomy and experimental study," Pattern Recogn., vol. 62, pp. 135–160, 2017.
[75] V. Chandrasekhar, J. Lin, O. Morère, H. Goh, and A. Veillard, "A practical guide to CNNs and Fisher Vectors for image instance retrieval," Signal Process., vol. 128, pp. 426–439, 2016.

Tiecheng Song received his Ph.D. degree in Signal and Information Processing from the University of Electronic Science and Technology of China (UESTC) in 2015. From October 2015 to April 2016, he joined the Multimedia Lab of Nanyang Technological University, Singapore, as a visiting student. He is currently working in the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications (CQUPT). His research interests include feature extraction, texture analysis, and image representation.

Qingbo Wu (S'12-M'13) received the B.E. degree in Education of Applied Electronic Technology from Hebei Normal University in 2009, and the Ph.D. degree in signal and information processing from the University of Electronic Science and Technology of China in 2015. From February 2014 to May 2014, he was a Research Assistant with the Image and Video Processing (IVP) Laboratory at the Chinese University of Hong Kong. From October 2014 to October 2015, he served as a visiting scholar with the Image & Vision Computing (IVC) Laboratory at the University of Waterloo. He is currently a lecturer in the School of Electronic Engineering, University of Electronic Science and Technology of China. His research interests include image/video coding, quality evaluation, and perceptual modeling and processing.