
J. Vis. Commun. Image R. 17 (2006) 701-716
www.elsevier.com/locate/jvci

Evaluation and comparison of texture descriptors proposed in MPEG-7


Feng Xu *, Yu-Jin Zhang
Department of Electronic Engineering, Tsinghua University, Beijing 100084, PR China
Received 1 March 2004; accepted 20 October 2005
Available online 29 November 2005

Abstract
Texture description is one of the most important low-level features in content-based image retrieval. In MPEG-7, the homogeneous texture descriptor (HTD), texture browsing descriptor (TBD), and edge histogram descriptor (EHD) have been proposed as texture descriptors. However, no comprehensive evaluation and comparison of these three descriptors has been made. In this paper, we propose a comprehensive evaluation and comparison benchmark for feature descriptors, especially for visual descriptors in MPEG-7. Within the proposed benchmark, the three MPEG-7 texture descriptors are evaluated and compared. First, the descriptors are analyzed according to standard criteria. Second, experiments are carried out on the Brodatz texture image database. Analysis of the experimental results shows that each descriptor has specific characteristics and performs better than the other two in certain applications. The applicability of each descriptor is also summarized. The survey, performance evaluation, and comparison in this paper provide several guidelines for using these descriptors in image retrieval and other applications.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Content-based image retrieval; MPEG-7; Homogeneous texture descriptor; Texture browsing descriptor; Edge histogram descriptor

1. Introduction
MPEG-7, an international standard whose official name is Multimedia Content Description Interface, aims at providing fundamental tools for describing multimedia content. MPEG-7 defines the syntax and semantics for describing multimedia content. The standard consists of seven parts: system, description definition language (DDL), visual descriptor, audio descriptor, multimedia description scheme (MDS), reference software, and conformance testing. In the visual descriptor part, the visual descriptors are specified as normative descriptors, basic descriptors, and descriptors for localization [1]. Some core visual descriptors are defined to describe the color, texture, shape, and motion features of visual data.

* Corresponding author. Fax: +86 10 62770317.
E-mail address: f-xu02@mails.tsinghua.edu.cn (F. Xu).

1047-3203/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2005.10.002

F. Xu, Y.-J. Zhang / J. Vis. Commun. Image R. 17 (2006) 701-716

In MPEG-7, three texture descriptors are defined: the homogeneous texture descriptor (HTD), the texture browsing descriptor (TBD), and the edge histogram descriptor (EHD) [2,3]. Abundant research on texture description has accompanied the development of MPEG-7 [1,4-9]. Texture is one of the most important low-level features in content-based image retrieval (CBIR). As Zhang [10] summarized, there are several models for texture representation, including texture models based on spatial relationships (such as auto-correlation, gray-level co-occurrence matrix, fractal, and stochastic field), texture models based on frequency relationships (such as power spectrum and wavelet), and texture models based on perceptual structure. An effective texture descriptor can significantly improve the performance of image retrieval. Therefore, evaluation and comparison of different texture descriptors are necessary. Although MPEG-7 has documented many characteristics of the three texture descriptors, and a significant amount of previous work has proposed algorithms to improve their performance, to the best of our knowledge no comprehensive evaluation and comparison has been performed. With the wide application of MPEG-7, an analytical and comprehensive evaluation and comparison among a series of descriptors is needed so that the most appropriate descriptor can be used effectively and efficiently in a given application. In this paper, the three texture descriptors in MPEG-7 are evaluated and compared in an overall and hierarchical framework. First, some common evaluation criteria, including good retrieval accuracy, compact features, general application, low computation complexity, robust retrieval performance, and hierarchical coarse-to-fine representation, are used in the analytical study, as in [11]. Second, evaluation and comparison experiments are carried out.
Since texture-based image retrieval is one of the most important applications of texture descriptors, we carried out image retrieval experiments for evaluation and comparison. Image retrieval is performed on a texture image database containing a significant number of images so that the performance of each descriptor can be examined adequately. The widely accepted Brodatz texture database [12] is used, and several negative influencing factors are considered. First, the robustness of the retrieval performance to noise, rotation, and compression is considered. Second, the efficiency of the descriptors is compared in terms of computing time. Finally, the retrieval performance with different distance measures is investigated. All results are reported with precision and recall indicators to compare the descriptors' performance [13]. Based on the evaluation and comparison of effectiveness and efficiency, we summarize the applicability of each descriptor. As a survey and performance testing paper, the main contributions of this paper are threefold. First, a comprehensive framework for the evaluation and comparison of feature descriptors is proposed, in which other series of descriptors, especially MPEG-7 visual descriptors such as color descriptors and shape descriptors, can also be evaluated and compared. It provides a standard benchmark for descriptor evaluation and comparison. Second, the three texture descriptors defined in MPEG-7 are evaluated and compared, which provides a basis for selecting among them in applications. Third, the applicability of the three texture descriptors is summarized. The rest of the paper is organized as follows. In Section 2, the three descriptors are briefly introduced and explained. In Section 3, an analytical discussion and evaluation of the three descriptors are provided. In Section 4, experimental comparisons based on image retrieval are given in detail. In Section 5, the applicability of each descriptor is summarized.
Conclusions are presented in Section 6.

2. Texture descriptors
In this section, the principles of the three texture descriptors HTD, TBD, and EHD are described and discussed.

2.1. Homogeneous texture descriptor
The homogeneous texture descriptor characterizes the region texture using the energy and energy deviation in a set of frequency channels [2]. In MPEG-7, it is designed for texture-based image search and retrieval. HTD is extracted by Gabor filter banks, which partition the frequency space in equal steps of 30° in the angular direction and in octave divisions in the radial direction. According to previous results [1,6], the best numbers of angular and radial divisions are 6 and 5, respectively, resulting in 30 channels in total. The channels are illustrated in Fig. 1.

Fig. 1. Frequency space partition of HTD.
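
The partition in Fig. 1 can be tabulated from the parameter values the paper gives alongside its Gabor function in Eq. (1): a highest center frequency of 3/4, a largest bandwidth of 1/2, octave scaling over five radial indices, and six orientations spaced 30° apart. The following is a minimal Python sketch under those values; the function and field names are illustrative assumptions, and the angular term ignores the periodicity of the orientation axis.

```python
import math

OMEGA_0 = 3 / 4            # center frequency of the highest-frequency channel
B_0 = 1 / 2                # largest radial bandwidth
ANGLE_STEP_DEG = 30.0      # angular spacing between channels
# Ratio between a Gaussian's full width at half maximum and its sigma.
FWHM_TO_SIGMA = 2 * math.sqrt(2 * math.log(2))


def channel_parameters():
    """Return one dict per channel: 5 radial scales x 6 orientations = 30."""
    channels = []
    for s in range(5):
        omega_s = OMEGA_0 * 2 ** (-s)   # octave-spaced center frequency
        b_s = B_0 * 2 ** (-s)           # octave-spaced bandwidth
        for r in range(6):
            channels.append({
                "s": s,
                "r": r,
                "omega_s": omega_s,
                "theta_r_deg": ANGLE_STEP_DEG * r,
                "sigma_rho": b_s / FWHM_TO_SIGMA,
                "sigma_theta_deg": ANGLE_STEP_DEG / FWHM_TO_SIGMA,
            })
    return channels


def gabor_response(omega, theta_deg, ch):
    """Evaluate the product of the radial and angular Gaussians of one channel."""
    radial = math.exp(-((omega - ch["omega_s"]) ** 2) / (2 * ch["sigma_rho"] ** 2))
    angular = math.exp(
        -((theta_deg - ch["theta_r_deg"]) ** 2) / (2 * ch["sigma_theta_deg"] ** 2)
    )
    return radial * angular
```

With these standard deviations, each channel response falls to one half at a radial offset of half the bandwidth and at an angular offset of 15°, so neighbouring channels meet at their half-maximum points.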

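In each channel, the following 2-D Gabor function is applied to filter the image:
$$G_{P_{s,r}}(\omega,\theta) = \exp\left[\frac{-(\omega-\omega_s)^2}{2\sigma_{\rho_s}^2}\right]\cdot\exp\left[\frac{-(\theta-\theta_r)^2}{2\sigma_{\theta_r}^2}\right], \tag{1}$$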

where $\{\omega_s = \omega_0 \cdot 2^{-s},\ s = 0, 1, 2, 3, 4\}$ are the center frequencies in the radial direction, and $\omega_0$ is the center frequency of the highest frequency channel, specified as 3/4. The corresponding bandwidths are $\{B_s = B_0 \cdot 2^{-s},\ s = 0, 1, 2, 3, 4\}$, and $B_0$ is the largest bandwidth, specified as 1/2. $\{\theta_r = 30^\circ \times r,\ r = 0, 1, 2, 3, 4, 5\}$ are the center angles in the angular direction. In addition, $\sigma_{\rho_s} = B_s / (2\sqrt{2\ln 2})$, and $\sigma_{\theta_r}$ is the constant $30^\circ / (2\sqrt{2\ln 2})$. After filtering, the first and second moments in the 30 frequency channels are computed, together with the intensity mean and standard deviation of the original image, to compose the HTD, represented as a 62-dimensional vector. Each component of the descriptor is then quantized to 8 bits according to the standardized tables. A detailed introduction to HTD is given in [1]. When HTD is used for texture-based image retrieval and indexing, the usual distance measure is the city block distance (absolute value distance):
$$d_{ij} = \sum \left( |e_i - e_j| + |d_i - d_j| \right), \tag{2}$$
where $d_{ij}$ denotes the distance between two images i and j; $e_i$ and $e_j$ denote the corresponding energy means of the descriptor vectors; and $d_i$ and $d_j$ denote the corresponding energy deviations.

2.2. Texture browsing descriptor
TBD relates to a perceptual characterization of texture, similar to human visual characterization in terms of regularity, coarseness, and directionality. This descriptor, which is also computed from Gabor filter banks, is useful for browsing applications and coarse classification of texture [2]. The dominant orientations and scales are determined by projection, and then the regularity is computed. Finally, the TBD vector, also called the Perceptual Browsing Component (PBC) in [8], is assembled. The TBD is a 5-dimensional vector expressed as:
$$\mathrm{PBC} = [\,\text{Regularity}\ \ \text{Directionality}_1\ \ \text{Directionality}_2\ \ \text{Scale}_1\ \ \text{Scale}_2\,], \tag{3}$$

where Regularity represents the degree of periodic structure of the texture: the larger the Regularity value, the more regular the pattern. The Directionality components represent the two dominant orientations of the texture, while the Scale components represent its two dominant scales. In Eq. (3), Directionality1 denotes the primary dominant orientation and Directionality2 denotes the secondary dominant orientation. Analogously, Scale1 denotes the primary dominant scale and Scale2 denotes the secondary dominant scale. In MPEG-7, regularity is cast into four degrees: irregular, slightly regular, regular, and highly regular. The TBD vector can also be quantized according to the recommended tables. The detailed computing process for TBD can be found in [8]. When TBD is used in texture-based image retrieval and browsing, the Euclidean distance shown below can be applied as the similarity measure:
$$d_{ij} = \sqrt{\sum_{k=1}^{5} \left[ \mathrm{PBC}_i(k) - \mathrm{PBC}_j(k) \right]^2}, \tag{4}$$

where $d_{ij}$ denotes the distance between two images i and j.

2.3. Edge histogram descriptor
The edge histogram descriptor represents the spatial distribution of five types of edges in local image regions, defined as four directional edges and one non-directional edge. The four directional edges are obtained by counting edges in the 0°, 45°, 90°, and 135° directions, respectively. In the implementation of the descriptor, an image is divided into 4 × 4 non-overlapping sub-images. Each sub-image is further divided into image-blocks (the number of blocks depends on the specific application). The five types of edge information are extracted from the image-blocks by edge detection operators. Thus, a local edge histogram with 5 bins is generated for each sub-image, giving a total of 80 histogram bins (16 sub-images × 5 bins) for the whole image. The division into sub-images and image-blocks is illustrated in Fig. 2. Here, an image-block is denoted as B and the coefficients of the edge filter (a 2 × 2 matrix) are denoted as f(k), k = 0, 1, 2, 3. The magnitude m of each edge can be calculated as follows:
$$m = \left| \sum_{k=0}^{3} B(k)\, f(k) \right|. \tag{5}$$
If the maximum among the five edge strengths is greater than a pre-determined threshold, the image-block is considered to contain the corresponding edge. Otherwise, the image-block contains no edge. After all edge counts of the same type are summed within one sub-image, the five bins for the different edge types are obtained for each sub-image. The values of the edge bins are normalized by the total number of blocks. Finally, the whole histogram can be quantized according to the recommended tables. A more detailed description of the computation can be found in [9]. When EHD is used in texture-based image retrieval and indexing, the Euclidean distance can be used before quantization. However, the histogram intersection shown below is preferred after quantization:
$$P(i,j) = \frac{\sum_{k=0}^{L-1} \min\left(H_i(k), H_j(k)\right)}{\sum_{k=0}^{L-1} H_i(k)}, \tag{6}$$
where H denotes the histogram of an image, and L denotes the total number of bins.
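
The three matching measures named in this section (city block distance for HTD, Euclidean distance for TBD, and histogram intersection for EHD) can be sketched in a few lines of Python. This is only an illustration on plain lists of numbers; the function names are assumptions, and the handling of an empty histogram is a choice made here rather than something the paper specifies.

```python
import math


def city_block_distance(u, v):
    """Sum of absolute component differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Square root of the summed squared component differences (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def histogram_intersection(h_i, h_j):
    """Overlap of two non-negative histograms, normalized by the mass of h_i."""
    total = sum(h_i)
    if total == 0:
        return 0.0  # assumption: an empty histogram overlaps with nothing
    return sum(min(a, b) for a, b in zip(h_i, h_j)) / total
```

Note that the first two functions are distances (smaller means more similar), whereas histogram intersection is a similarity (larger means more similar), so a ranking based on it has to be sorted in the opposite order.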

Fig. 2. Definition of sub-image and image-block of EHD. The image is divided into a 4 × 4 grid of sub-images indexed (0,0) to (3,3), and each sub-image is further divided into image-blocks.
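
The extraction steps described for EHD (4 × 4 sub-images, image-blocks, five 2 × 2 edge filters, a threshold, and normalization by the number of blocks) can be sketched as follows. The paper does not list the filter coefficients or the threshold, so the values below are assumptions: the coefficients follow the commonly cited MPEG-7 edge filters, and the default `threshold` and `block_size` are purely illustrative. The image is taken to be a list of rows of gray levels.

```python
import math

SQRT2 = math.sqrt(2.0)
# Assumed 2x2 filter coefficients f(k), k = 0..3, ordered top-left, top-right,
# bottom-left, bottom-right. They are not listed in the paper.
EDGE_FILTERS = [
    (1.0, -1.0, 1.0, -1.0),      # vertical edge
    (1.0, 1.0, -1.0, -1.0),      # horizontal edge
    (SQRT2, 0.0, 0.0, -SQRT2),   # 45-degree diagonal edge
    (0.0, SQRT2, -SQRT2, 0.0),   # 135-degree diagonal edge
    (2.0, -2.0, -2.0, 2.0),      # non-directional edge
]


def _mean(image, top, left, size):
    """Average gray level of a size x size square starting at (top, left)."""
    total = sum(image[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size))
    return total / (size * size)


def edge_histogram(image, block_size=4, threshold=11.0):
    """Return 80 values: 5 normalized edge bins for each of 16 sub-images."""
    height, width = len(image), len(image[0])
    sub_h, sub_w = height // 4, width // 4
    half = block_size // 2
    histogram = []
    for sub_row in range(4):
        for sub_col in range(4):
            bins = [0] * 5
            n_blocks = 0
            row_end = (sub_row + 1) * sub_h - block_size + 1
            col_end = (sub_col + 1) * sub_w - block_size + 1
            for top in range(sub_row * sub_h, row_end, block_size):
                for left in range(sub_col * sub_w, col_end, block_size):
                    # Mean gray level of the four quadrants of the image-block.
                    means = (
                        _mean(image, top, left, half),
                        _mean(image, top, left + half, half),
                        _mean(image, top + half, left, half),
                        _mean(image, top + half, left + half, half),
                    )
                    strengths = [abs(sum(a * f for a, f in zip(means, filt)))
                                 for filt in EDGE_FILTERS]
                    best = max(range(5), key=lambda t: strengths[t])
                    if strengths[best] > threshold:
                        bins[best] += 1   # block is assigned its strongest edge
                    n_blocks += 1
            histogram.extend(b / n_blocks if n_blocks else 0.0 for b in bins)
    return histogram
```

In this sketch a block whose strongest response does not exceed the threshold contributes to no bin, so the five bins of a sub-image need not sum to one.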

3. Analytical discussions
In the previous section, the three texture descriptors were described in detail. An analytical discussion of their similarities and differences is given below.

The similarities among HTD, TBD, and EHD are as follows:
(a) HTD, TBD, and EHD are all perceptually meaningful. HTD captures energy features in all directions and scales in the Gabor transform domain, which represents texture exactly and comprehensively; TBD captures regularity as well as two dominant directions and scales, which is consistent with human visual perception; EHD captures edge information after dividing the image into sub-images, which is a statistical description of texture.
(b) HTD, TBD, and EHD are all application independent. No prior knowledge or information about the particular type of texture is assumed.
(c) HTD, TBD, and EHD all have constant dimensions. The dimension is fixed once the feature type, i.e., the texture descriptor, is selected and used.

The differences among HTD, TBD, and EHD are as follows:
(a) Feature domain. HTD is obtained from the Gabor transform domain, i.e., the energy domain; TBD is obtained from both the Gabor transform domain and the spatial domain, since the perceptual browsing component is computed from the filtered image and expressed in the spatial domain; EHD is obtained from the spatial domain.
(b) Feature representation. The representations of HTD and EHD are not very compact. HTD is 62-dimensional and EHD is 80-dimensional, which may make image-to-image matching time-consuming. The representation of TBD is more compact. Its dimension is only 5, which allows quick browsing.
(c) Feature computation complexity. The computation of TBD is more complex than that of HTD, since Gabor filtering is only the pre-processing step for TBD. The computation of TBD needs extra steps to obtain the regularity as well as the dominant directions and scales, as described in Section 2. The computation of EHD is the least complex among the three descriptors, as it can be obtained directly in the spatial domain by edge detection operators. TBD is therefore the most time-consuming to compute.
(d) Type of features captured. HTD captures only global features, providing a holistic characteristic. TBD captures both global and local features. However, it provides only a perceptual characteristic instead of a precise numerical characteristic. EHD captures only local features, which provide a particular numerical characteristic.
(e) Influence of parameters or thresholds. For HTD, the two crucial parameters are the total numbers of orientations and scales of the Gabor filters, which are recommended as 6 and 5, respectively, in MPEG-7. For TBD, besides the same parameters as for HTD, a threshold for the candidate selection [8] is significantly influential. For EHD, a threshold is important in determining which type of edge an image-block belongs to.
(f) Hierarchical representation. HTD supports hierarchical representation, including a base representation (a 32-dimensional vector without the second energy moments) and an enhanced representation (a 62-dimensional vector), while TBD and EHD do not support hierarchical representation.
(g) Suitability for efficient indexing. HTD is extracted by Gabor filters, which are successful in texture representation. It is therefore quite suitable for texture-based image retrieval and indexing. TBD is represented according to human perception and is recommended for texture browsing. However, it is quite unsuitable for image retrieval. EHD is a histogram-based descriptor and can be used either as an individual indexing tool or integrated with any other histogram.

4. Experimental comparison and evaluation
4.1. Experiment setting
To test the performance of the three descriptors, we implemented a texture-based image retrieval and indexing system, which mainly consists of feature extraction and image ranking based on a similarity measure. In feature extraction, the feature vectors for each of the descriptors are computed and stored in the database. In image ranking, a distance measure between the query image and the database images is computed, and all the images in the database are ranked by their distance to the query. The detailed process is discussed below. The well-known Brodatz texture image database, comprising 112 images, is used in the experiment. Every image, of size 640 × 640, is divided into 25 non-overlapping sub-images that naturally form a class. Thus, an image database with 2800 images belonging to 112 classes is prepared for the image retrieval experiment. Common performance measures, i.e., precision and recall, are used as the evaluation criteria. They are defined as [13]:
$$\text{precision} = \frac{\text{No. relevant images retrieved}}{\text{Total No. images retrieved}}, \tag{7}$$
$$\text{recall} = \frac{\text{No. relevant images retrieved}}{\text{Total No. relevant images}}. \tag{8}$$
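
Eqs. (7) and (8) can be computed for a single query from the class labels of the ranked database images. The Python sketch below is only illustrative: the function name and arguments are assumptions, and whether the query image itself counts among the relevant images is left to the caller because the text does not say.

```python
def precision_recall(ranked_labels, query_label, n, total_relevant):
    """Precision and recall over the top-n entries of a ranked label list.

    ranked_labels: class labels of database images, most similar first.
    total_relevant: number of database images in the query's class.
    """
    top_n = ranked_labels[:n]
    hits = sum(1 for label in top_n if label == query_label)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall
```

For example, a query of class "a" whose top four results carry the labels a, b, a, c, with five relevant images in the database, yields a precision of 0.5 and a recall of 0.4.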

Every image in the image database is used as a query in turn to drive image retrieval, and the corresponding precision and recall are calculated for each image. For the whole image database, the precision and recall are computed by averaging the precisions and recalls of all the images. Through image ranking, the top N images are returned according to their similarity measures. Each returned image that is relevant, i.e., belongs to the same class as the query image, increases the number of relevant images retrieved by 1. Finally, the corresponding precision and recall values are computed and recorded, and the average precision and recall are obtained over all 2800 images. Three types of negative influencing factors are considered and investigated in detail here: robustness to noise, rotation, and compression. Then the computational efficiency and the performance with different similarity measures are explored. For robustness, we present experimental results in two parts. First, the same influences on different texture descriptors are evaluated and compared. Column graphs are used to evaluate the performance of the different descriptors under the same influence. The horizontal axis is the degree of the influencing factor, and the precision/recall changes with the degree of influence. Second, the different influencing factors on each descriptor are investigated and compared. Precision-recall curves are used to illustrate the performance of each descriptor. The horizontal axis is the recall, and the precision changes as the recall increases.

4.2. Robustness
4.2.1. Influence of noise
Noise is often unavoidable in image retrieval and indexing. De-noising filters can be used, but some texture information may also be eliminated by the filtering. Therefore, whether a descriptor is robust to noise is quite important.
To test the robustness of the descriptors to noise, Gaussian noise with zero mean and several different standard deviations was added to each image in the database in our retrieval experiments. The three descriptors are used as the retrieval features, respectively, and the Euclidean distance is used as the similarity measure. The standard deviation of the noise ranges from 0, which has no effect on the images, to 76.5, which makes the image texture significantly illegible. Several discrete values are used in the experiments; the corresponding noise standard deviations are 6.375, 19.125, 25.5, 38.25, and 76.5, respectively. When the number of returned images is 20, Fig. 3 illustrates the experimental results, in which the horizontal axis

Fig. 3. Precision and recall comparison with noise. (A) Precision with noise; (B) recall with noise. Horizontal axis: noise standard deviation (0, 6.375, 19.125, 25.5, 38.25, 76.5); vertical axis: precision or recall (%); series: HTD, TBD, EHD.

denotes the increase of noise standard deviation and the vertical axis denotes precision/recall. Fig. 3(A) presents the precision and (B) presents the recall. From Fig. 3, it can be concluded that noise has a weak influence on the three texture descriptors, especially on HTD and TBD. For HTD, when the noise standard deviation increases only a little (such as to 6.375), the retrieval precision and recall hardly decrease. Then, the precision and recall decrease as the noise standard deviation increases. Even when the noise standard deviation reaches 76.5, at which the image textures are significantly blurred, the precision and recall decrease by less than 5%. Although the precision and recall are quite low for TBD, they also decrease little as the noise standard deviation increases. So HTD and TBD are quite robust to noise and suitable for texture-based image retrieval and browsing, respectively, in noisy circumstances, such as web image retrieval and fabric image retrieval. A possible reason is that the Gabor filter is stable in detecting orientations and scales. In noisy circumstances, although the texture patterns are blurred, the energy distribution does not change much, so the dominant orientations and scales can still be detected correctly. In comparison, EHD is affected by noise more than HTD and TBD. Since EHD is extracted in the spatial domain, the spatial distribution and edges change dramatically with the noise. So EHD is not suitable for noisy circumstances.

4.2.2. Influence of rotation
Rotation is another important factor for image retrieval. It is desirable that a texture description be invariant to rotation. Thus, it is necessary to investigate whether the three descriptors are robust to rotation. A rotation experiment as in [14] is carried out. The pre-processing includes two steps. First, every image in the database is rotated by a certain angle. For the 25 images in the same class, the orientations range from 0° to 360° with different interval angles, respectively.
The interval angle ranges from 0° to 28.8°, and several discrete values are used in the experiments. The corresponding interval angles are 3.6°, 7.2°, 14.4°, and 28.8°. Second, each rotated image is cropped and interpolated with the bilinear method so that it has the same size as the original image. Then, the descriptor can be extracted from the rotated

images. In the image ranking, the Euclidean distance measure is applied. When the number of returned images is 20, the experimental results are illustrated in Fig. 4, in which the horizontal axis denotes the increasing interval angle and the vertical axis denotes precision/recall. Fig. 4(A) presents the precision and (B) presents the recall. From Fig. 4, it can be found that rotation affects the performance of the three texture descriptors significantly. For all three texture descriptors, the precision and recall decrease sharply when the images are rotated. There is not much difference in precision/recall when the interval angle ranges from 3.6° to 28.8°. However, the precision and recall almost reach their minimum at the 14.4° interval, since the total interval angle of all the 25 images in one image class is then 360°. When the interval angle is smaller than 14.4°, the textures of two adjacent images do not differ from each other as significantly as at 14.4°. When the interval angle is larger than 14.4°, some images are rotated back to the orientation of the original images, leading to a small increase in precision/recall. On the whole, the performance of the three texture descriptors decreases dramatically even with a small rotation. A possible reason is that a texture pattern is regarded as another texture pattern after rotation. For HTD, the precision decreases by more than 15% and the recall by more than 10% because the dominant orientations are shifted; for TBD, although the precision/recall is quite low even without rotation, the precision/recall also decreases; for EHD, the precision and recall both decrease by nearly 10%. Although EHD can express edge information to some extent, it is not as stably rotation-invariant as some shape descriptors (such as the Fourier descriptor). It can be concluded that the three texture descriptors are not robust to rotation, especially HTD.

4.2.3. Influence of compression
As more and more images are stored in the JPEG compressed format, it is important to investigate how to obtain the three descriptors directly from the compressed domain. The DCT is the main compression technique used in the existing JPEG image compression standard. Images are first divided into 8 × 8 blocks and

Fig. 4. Precision and recall comparison with rotation. (A) Precision with rotation; (B) recall with rotation. Horizontal axis: interval angle in degrees (0, 3.6, 7.2, 14.4, 28.8); vertical axis: precision or recall (%); series: HTD, TBD, EHD.

decomposed into the DCT domain, where the pixel energy is packed into the same number of DCT coefficients. Subsequently, the quantized DCT coefficients are zigzag-scanned and entropy coded (Huffman codes). As both DCT quantization and Huffman encoding in JPEG are implemented with simple look-up-table operations, the major computing cost for compression/decompression lies in the forward or inverse DCT. If the three descriptors can be extracted directly from the compressed domain, the computing cost required for decompression can be avoided [15]. Here, we first investigate the influence of DCT compression on the three texture descriptors. In general, a 2-dimensional DCT can be expressed as:
$$C(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \tag{9}$$
where
$$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u = 1, 2, \ldots, N-1 \end{cases}$$
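
Eq. (9) can be evaluated directly for a small block. The Python sketch below is a straightforward, unoptimized transcription of the formula for an N × N block given as a list of rows; it is meant to illustrate the definition, not to reflect how a JPEG codec or the authors' program computes the transform, and the function name is an assumption.

```python
import math


def dct2(block):
    """Direct evaluation of the 2-D DCT of Eq. (9) for an N x N block."""
    n = len(block)

    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    # cos_table[u][x] = cos((2x + 1) * u * pi / (2N)), shared by both axes.
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
                 for u in range(n)]
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += block[x][y] * cos_table[u][x] * cos_table[v][y]
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs
```

With this normalization the transform is orthonormal, so the sum of squared coefficients equals the sum of squared pixel values, and a constant block produces a single non-zero coefficient at (0, 0).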

Since the DCT concentrates the energy, only a few DCT coefficients are needed to represent the image in the compressed domain. In our experiments, the DCT coefficients over a certain threshold are preserved. Thresholds of 10, 100, and 300 are considered, respectively. Through the experiments on the Brodatz database, it is found that more than 90% of the larger coefficients are preserved when the coefficients over 10 are kept, while about 10% of the larger coefficients are preserved when the coefficients over 100 are kept. Only 1% of the larger coefficients are preserved when the coefficients over 300 are kept. In Eq. (9), when u = 0 and v = 0, the following DC coefficient of an 8 × 8 image-block is obtained:
$$C(0,0) = \frac{1}{8} \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j). \tag{10}$$
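
Because Eq. (10) makes the DC coefficient of an 8 × 8 block equal to 8 times the block's mean gray level, an image kept only as DC coefficients can be approximated by filling every block with its mean. The Python sketch below illustrates this under the assumption that the image is a list of rows; the function names are hypothetical, and partial blocks at the image border are simply averaged over the pixels they contain.

```python
def dc_coefficient(block):
    """C(0, 0) of an 8 x 8 block per Eq. (10): one eighth of the pixel sum."""
    return sum(sum(row) for row in block) / 8.0


def dc_approximation(image, block=8):
    """Fill each block x block tile of the image with its mean gray level."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            mean = sum(image[y][x] for y in rows for x in cols) / (len(rows) * len(cols))
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out
```

The result has the same size as the input but only one distinct value per block, which is why fine texture detail is lost in such an approximation.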

Eq. (10) is related to the average pixel value inside the block. If we ignore all the AC coefficients and repeat the average pixel value 8 times along the row and column directions, respectively, we can reconstruct, directly from the DC coefficients, an approximated image that has exactly the same size as the original one. Although it is difficult to extract reliable texture descriptors due to the loss of detail in the compressed domain, this DC coefficient algorithm provides a primary foundation for content-based image indexing and retrieval in the compressed domain. For JPEG images, this algorithm just needs to complete the entropy (Huffman) decoding of the image-blocks, followed by de-zigzagging and de-quantization of the compressed data in turn [16], and then extraction of descriptors from the DCT coefficients without a full IDCT. From Eq. (10), it is seen that this type of extraction applies only one multiplication for each block, while a full IDCT requires 4032 additions and 4096 multiplications for each block [17]. The descriptor extraction can then be performed on the compressed images when the DCT coefficients over different thresholds, or only the DC coefficients, are preserved. In the image ranking, the Euclidean distance measure is again applied. When the number of returned images is 20, the experimental results are illustrated in Fig. 5, in which the horizontal axis denotes the threshold of preserved DCT coefficients and the vertical axis denotes precision/recall. Fig. 5(A) presents the precision and (B) presents the recall. From Fig. 5, it can be found that HTD and TBD are more robust than EHD to DCT compression. For HTD, the precision/recall is almost non-decreasing when the coefficient threshold is smaller than 100. For TBD, the precision/recall decreases little even when only the DC coefficient is preserved. For EHD, the precision/recall decreases little when the coefficient threshold is smaller than 10. When more coefficients are discarded, the precision/recall decreases significantly.
On the whole, compression does not affect the three descriptors significantly because of the energy compaction of the DCT. The more of the larger coefficients are preserved, the higher the precision/recall that is maintained. The observation that HTD and TBD maintain better performance than EHD is also due to the Gabor filter. Although some information is discarded, the main pattern is unchanged, so the Gabor filter can still detect the dominant orientations and scales. On the

Fig. 5. Precision and recall comparison with compression. (A) Precision with compression; (B) recall with compression. Horizontal axis: compression coefficient threshold (0, 10, 100, 300, DC); vertical axis: precision or recall (%); series: HTD, TBD, EHD.

other hand, the image retrieval performance obtained with only the DC coefficients shows that descriptor extraction directly from the DCT compressed domain is possible, though it is quite difficult and unreliable due to the low resolution of the DC images. In the work by Huang et al. [18], an algorithm for retrieving JPEG compressed images based on weighted texture features is proposed. In this algorithm, all texture features are extracted in the DCT compressed domain. First, the weight for each query is selected by training, and then a weighted distance measure is applied to the matching. For the Brodatz texture database, the average precision is significantly higher than that of the above experiments. This shows that, for image retrieval in the compressed domain, descriptors other than the three proposed in MPEG-7 are required.

4.3. Performance of the three texture descriptors
Different influences weaken the same texture descriptor to different degrees. It is therefore necessary to compare the performance of the same descriptor under different influences, which can guide the selection of the appropriate texture descriptor for a given application. The set of negative influencing factors in Section 4.2 is investigated again. First, Gaussian noise with zero mean and a standard deviation of 19.125 is added to all the images, and image retrieval is then carried out with the three texture descriptors, respectively. Second, all images are rotated with a 14.4° interval angle between two adjacent images in the same image class, and retrieval is carried out. Finally, the three descriptors are extracted directly from images represented by the DC coefficients in the DCT domain, computed from Eq. (10), and retrieval is then carried out. The experimental results are illustrated as precision-recall curves in Fig. 6, in which (A) shows the performance of HTD, (B) shows the performance of TBD, and (C) shows the performance of EHD. The horizontal axis denotes recall and the vertical axis denotes precision.
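
A precision-recall curve for one query can be traced by walking down the ranked list and recording a point each time a relevant image appears. The paper does not state how its curves were sampled or averaged, so the Python sketch below is only one plausible construction; the function name is an assumption.

```python
def precision_recall_curve(ranked_labels, query_label):
    """Return (recall, precision) points, one per relevant image in rank order."""
    total_relevant = sum(1 for label in ranked_labels if label == query_label)
    points = []
    hits = 0
    for n, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # After n returned images: hits relevant out of n retrieved.
            points.append((hits / total_relevant, hits / n))
    return points
```

When the relevant images are ranked first, precision starts at 1 and falls as recall grows; when they are ranked late, as in the labels b, b, a, a for a query of class "a", precision instead rises with recall.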


Fig. 6. Precision-recall curves of image retrieval using (a) HTD, (b) TBD, and (c) EHD. Each panel plots precision (vertical axis) against recall (horizontal axis) for the original, noise, rotated, and compressed images.
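For reference, precision-recall points like those plotted in Fig. 6 can be obtained from a ranked result list as sketched below. This is a generic illustration under stated assumptions, not the authors' evaluation code: `relevant_flags` is a hypothetical list marking, in rank order, whether each returned image belongs to the query's class, and one point is recorded after each returned image.

```python
def precision_recall_points(relevant_flags, total_relevant):
    """Return a (recall, precision) pair after each returned image.

    relevant_flags[k] is True when the image at rank k + 1 is relevant.
    """
    points = []
    hits = 0
    for k, is_relevant in enumerate(relevant_flags, start=1):
        hits += bool(is_relevant)
        points.append((hits / total_relevant, hits / k))
    return points


# Relevant images ranked first: precision never increases as recall grows.
good = precision_recall_points([True, True, False, False], total_relevant=2)

# Relevant images ranked late: precision dips to a minimum, then rises again.
late = precision_recall_points([True, False, False, True, True], total_relevant=3)
```

In the second toy list the precision falls and then partly recovers once the remaining relevant images are finally returned, which is one way a precision-recall curve can show a minimum.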

From Fig. 6, some fundamental properties can be concluded for each descriptor. HTD is quite robust to noise. However, rotation and compression (using only the DC coefficients) affect HTD more significantly. In particular, when the recall is lower than 0.1, the precision decreases dramatically relative to retrieval with the


original image data. For TBD, the retrieval precision with the original image data is already quite low. None of the retrievals under the different influences performs well, although the precision does not decrease much. It should be emphasized that the curve under compression is ascending on the whole, which indicates that TBD is quite unsuitable for image retrieval because a small number of returned images cannot provide an acceptable precision. EHD is sensitive to all the negative influencing factors, of which rotation has the least effect. The performance of all three descriptors is worst under compression, which confirms the difficulty of extracting descriptors directly from the compressed domain.

In Fig. 6, some precision-recall curves decrease quickly to a minimum point and then increase slightly. This is explained by the ranks of the relevant images: when a large number of relevant images are not ranked near the top of the returned list, the precision-recall curve can have a minimum. Conversely, a curve in which precision decreases monotonically with recall indicates good retrieval performance.

4.4. Computational efficiency

To compare the computational efficiency of the three descriptors, we recorded the computation time for descriptor extraction and for image ranking by similarity measure on the Brodatz texture database, using a PC with a 2.0 GHz Pentium IV processor and 512 MB of memory. The program is written in C.

4.4.1. Time consumption for descriptor extraction

To eliminate the variation in time caused by algorithm parameters and by different images, 2800 descriptor extraction processes, covering all the images in the database, were run for each descriptor. The average running time of these processes is given in Table 1. It can be seen from Table 1 that HTD is the most efficient in descriptor extraction. Although it is not the shortest descriptor, HTD requires the least computation among the three descriptors. TBD is the most time-consuming, since a higher-level feature is more complex to compute. It can therefore be concluded that the more perceptual the descriptor, the more complex its extraction.

4.4.2. Time consumption for ranking

In practical applications, ranking time matters more to users. If the feature data (or descriptors) of all images are stored in the database of the system, the query time is just the retrieval time. To make the average retrieval time more precise, we ran 2800 query and image-ranking processes, covering all the images in the database. The average running time is given in Table 2. It can be seen from Table 2 that HTD is the least efficient in image retrieval. From the point of view of the distance measure, computing the similarity measure for TBD is much faster than for the other two descriptors since its dimension is only five. Although the computation time for ranking is in general directly proportional to the dimension of the vectors, HTD is more time-consuming than EHD because HTD consists of a real part and an imaginary part, which doubles the computational complexity. HTD is therefore less efficient in a real-time system even though it has high retrieval performance, whereas TBD is less suitable for retrieval because of its low performance even though it is computationally efficient. EHD offers a relative balance between effectiveness and efficiency.
Table 1
The average elapsed time of feature extraction

Texture descriptor    Average running time of feature extraction (ms)
HTD                   585
TBD                   814
EHD                   664

Table 2
The average running time of ranking

Texture descriptor    Average running time of ranking (ms)
HTD                   920
TBD                   10
EHD                   322
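To illustrate why ranking time grows with descriptor dimension, the sketch below ranks a database by city block distance and averages the time over all queries, in the spirit of the measurement reported in Table 2. It is an illustration under assumptions: the descriptors are random vectors, the dimension 80 is only a placeholder (five is the TBD dimension given in the text), and timings measured this way in Python will not resemble those of the original C program.

```python
import random
import time


def city_block(a, b):
    """City block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_database(query, database):
    """Indices of database vectors sorted by increasing distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: city_block(query, database[i]))


def average_ranking_time(database):
    """Mean time in seconds to rank the database, using every vector once as query."""
    start = time.perf_counter()
    for query in database:
        rank_database(query, database)
    return (time.perf_counter() - start) / len(database)


def random_database(n_images, dim, seed=0):
    """Hypothetical stand-in for stored descriptors: random vectors of one dimension."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n_images)]


if __name__ == "__main__":
    # dim=5 matches the TBD dimension from the text; dim=80 is a placeholder.
    for dim in (5, 80):
        seconds = average_ranking_time(random_database(100, dim))
        print(f"dim={dim}: {seconds * 1e3:.3f} ms per query")
```

Averaging over every image as a query mirrors the idea of smoothing out per-image variation, although the database size used here is arbitrary.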


4.5. Different similarity measures

Although the similarity measure is not included in the normative part of the MPEG-7 standard, it is important in retrieval. Recently, a large number of successful measure functions based on different principles (from statistics, psychology, medicine, and the social and economic sciences, among others) have been proposed and tested for their suitability for retrieval. The Minkowski distance, which has a specific meaning in physics and mathematics, is often used as a similarity measure. In this family of distance measures, the two most familiar are the Euclidean distance and the city block distance. Histogram intersection is another important similarity measure for low-level features, used especially for histogram features. There are also more complex similarity measures that involve perception, recognition, and subjectivity.

Following the recommendation of MPEG-7, several distance measures are applied to the three texture descriptors in our experiments. The Mahalanobis distance is not considered because different descriptors would require different covariance matrices, and for some descriptors it is simply impossible to define a covariance matrix. The precisions obtained when the number of returned images is 20 are shown in Table 3. The table shows that although the city block distance (defined in Eq. (2)) and the Euclidean distance (defined in Eq. (4)) are different, there is no essential difference between their performances; both are quite effective. For HTD, the weighted city block distance is weighted by the normalized deviation of the energy moment, which improves the precision significantly at the cost of additional computation. For TBD, it is very difficult to find a good matching measure, as mentioned in [19]. For EHD, the Euclidean distance is efficient enough without quantization. After quantization, histogram intersection is more efficient and is convenient for combined retrieval with other histogram features.

Table 3
The precisions of the different measures

Measure                        HTD      TBD      EHD
City block distance            35.42%   11.76%   16.14%
Weighted city block distance   50.61%   -        -
Euclidean distance             30.06%   10.90%   16.50%
Histogram intersection         -        -        22.83%

5. Applicability

The three texture descriptors have been used widely in many applications. In the following, we summarize the applicability of the three descriptors on the basis of some typical existing applications. As recommended in MPEG-7, HTD is suitable for texture-based image retrieval, TBD is suitable for texture-based browsing, and EHD is suitable for texture- and shape-based image retrieval. Detailed discussions are given below.

5.1. HTD

(a) Suitable for texture-based image retrieval. This is the general application of HTD because it performs much better than many other texture descriptors. Some detailed comparison results can be found in [6].
(b) Used in texture-based segmentation. Since HTD can capture the most salient features of a texture pattern, different texture patterns in one image can be distinguished by it. For instance, a remote sensing image can be divided into many regions according to texture, in which vegetation tends to have a distinct texture while ocean areas are almost smooth [6].

5.2. TBD

(a) Texture browsing and indexing. This is the application for which the descriptor was initially proposed in MPEG-7. The TBD vector represents the regularity, coarseness (scales), and orientations of a texture pattern, and thus correlates more closely with the human visual system (HVS). Several examples of texture patterns and


their TBD are illustrated in Fig. 7, in which each texture pattern has a regularity value and dominant orientations and scales.
(b) Texture classification. Some similar textures belong to the same pattern class for the HVS and as a perceptual concept, and have the same or similar TBD. Texture patterns can therefore be classified in terms of their TBD. As recommended by MPEG-7, textures can be classified into four categories: irregular, slightly regular, regular, and highly regular. In our experiment, a classification of the images in the Brodatz database according to regularity is shown in Table 4.
(c) Detection of texture structural defects, as in [20]. Humans have a surprising ability to find imperfections in spatial structures easily, but this ability to perceive local disorder is quite weak in computer vision. A survey of defect detection in texture is given in [21]. TBD can be used to detect texture structural defects because its description is consistent with human perception. We define structural defects as irregularities, that is, regions where the regularity is significantly different from the dominant values of most other regions. In our defect detection experiment, the image is divided into appropriate blocks in which the

D75 [3.448000 4 4 2 3]

D46 [4.168000 2 3 3 3]

D102 [5.680000 5 5 4 2]

Fig. 7. TBD expressions of some Brodatz textures.

Table 4
Texture classification for Brodatz database

Regularity > 5 (highly regular)
Images: D6 D21 D34 D52 D53 D102

4 < Regularity < 5 (regular)
Images: D2 D3 D4 D5 D10 D14 D17 D18 D20 D22 D24 D25 D28 D33 D35 D36 D37 D38 D39 D40 D42 D43 D44 D46 D47 D50 D51 D54 D55 D56 D57 D60 D64 D65 D66 D67 D69 D73 D76 D80 D82 D84 D85 D86 D87 D93 D95 D97 D99 D101 D103 D104 D106 D107 D111

3 < Regularity < 4 (slightly regular or irregular)
Images: D1 D7 D8 D9 D11 D12 D13 D15 D16 D19 D23 D26 D27 D29 D30 D31 D32 D41 D45 D48 D49 D58 D59 D61 D62 D63 D68 D70 D71 D72 D74 D75 D77 D78 D79 D81 D83 D88 D89 D90 D91 D92 D94 D96 D98 D100 D105 D108 D109 D110 D112

Fig. 8. Result of defect detection (D35).


regularity of TBD is computed. After obtaining the TBD vectors of all the blocks, we consider that the defect lies in the blocks whose regularity values differ significantly from those of most of the other blocks. An example is shown in Fig. 8.

5.3. EHD

(a) Suitable for texture- and shape-based as well as color- and texture-based image retrieval. EHD represents local edge features while some shape descriptors (such as the Fourier descriptor) capture global features; hence their combination can describe the edges of an object more precisely. Color is another important low-level image feature, often described by a color histogram. Color and texture histograms are therefore convenient to combine and integrate for more effective image retrieval.
(b) Used as a coarse representation of object edges. EHD consists of one non-directional and four directional (vertical, horizontal, 45°, and 135°) edge types, so it can be considered a local representation of an object. Although it cannot be applied independently, it is a good supplement to object representation. An enhanced EHD is introduced in [9], in which global information is included and better retrieval performance is achieved.
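As a hedged sketch of the combination mentioned in (a), the following code scores two images by a weighted sum of histogram intersections, one over a color histogram and one over an edge histogram. The weighting scheme, the function names, and the toy histograms are assumptions made for illustration; the paper does not specify how the two histograms are combined.

```python
def normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no counts")
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms of equal length."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_similarity(color_a, edge_a, color_b, edge_b, color_weight=0.5):
    """Weighted sum of the color and edge histogram intersections.

    color_weight is a hypothetical tuning parameter, not a value from the paper.
    """
    color_sim = histogram_intersection(normalize(color_a), normalize(color_b))
    edge_sim = histogram_intersection(normalize(edge_a), normalize(edge_b))
    return color_weight * color_sim + (1.0 - color_weight) * edge_sim
```

Because both terms are histogram intersections on normalized histograms, each lies in [0, 1] and the two can be mixed without further rescaling, which is one reason histogram features are convenient to combine.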

6. Conclusions

In this paper, we propose a comprehensive evaluation and comparison benchmark for feature descriptors, especially for visual descriptors in MPEG-7. In this benchmark, the three texture descriptors defined in MPEG-7 (HTD, TBD, and EHD) are evaluated and compared. From the experiments on the Brodatz texture image database, the robustness, computational efficiency, and similarity measures of the three descriptors are investigated. HTD is the most robust to noise. The performance of TBD is weakened dramatically by all types of negative influencing factors. Generally, it is difficult to make the three descriptors effective in the compressed domain. In terms of computational efficiency, HTD is less time-consuming than the other two descriptors in feature extraction, while TBD is much less time-consuming than the other two in image ranking. The Euclidean distance and the city block distance are both effective for these three texture descriptors.

In the future, the combination of the descriptors will be studied to handle complex texture images. Combinations of these texture descriptors with other content descriptors are also planned.

Acknowledgments

This work has been supported by the Grants SRFDP-20050003013 and NJUPT-K02089.

References
[1] Y.M. Ro, M. Kim, H.K. Kang, B.S. Manjunath, J. Kim, MPEG-7 homogeneous texture descriptor, ETRI J. 23 (2) (2001) 41-51.
[2] ISO/IEC JTC1/SC29/WG11, MPEG-7 overview, V. 8, Doc. N4980, 2002.
[3] ISO/IEC JTC1/SC29/WG11, MPEG-7 overview, V. 9, Doc. N5525, 2003.
[4] M. Grgic, M. Ghanbari, S. Grgic, in: Texture-based Image Retrieval in MPEG-7 Multimedia System, EUROCON2001, vol. 2, 2001, 365-368.
[5] P. Kruizinga, N. Petkov, S.E. Grigorescu, Comparison of texture features based on Gabor filters, in: Proceedings of the 10th International Conference on Image Analysis and Processing, Venice, Italy, September 27-29, 1999, pp. 142-147.
[6] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 837-842.
[7] B.S. Manjunath, J.R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circ. Syst. Vid. Technol. 11 (6) (2001) 703-715.
[8] P. Wu, B.S. Manjunath, S. Newsam, H.D. Shin, A texture descriptor for browsing and similarity retrieval, Signal Process. Image Commun. 16 (2000) 33-43.
[9] C.S. Won, D.K. Park, S.J. Park, Efficient use of MPEG-7 edge histogram descriptor, ETRI J. 24 (1) (2002) 23-30.
[10] Y.J. Zhang, Content-Based Visual Information Retrieval (in Chinese), Science Publisher, Beijing, 2003.


[11] D.S. Zhang, G.J. Lu, A comparative study of curvature scale space and Fourier descriptors for shape-based image retrieval, J. Vis. Commun. Image Representation 14 (2003) 41-60.
[12] P. Brodatz, Texture: A Photographic Album for Artists and Designers, Dover, New York, 1996.
[13] H. Müller, W. Müller, D.M. Squire, S.M. Maillet, T. Pun, Performance evaluation in content-based image retrieval: overview and proposals, Pattern Recognit. Lett. 22 (2001) 593-601.
[14] C.M. Pun, Rotation-invariant texture feature for image retrieval, Comput. Vis. Image Understanding 89 (2003) 24-43.
[15] H. Wang, A. Divakaran, A. Vetro, S.F. Chang, H. Sun, Survey of compressed-domain features used in audiovisual indexing and analysis, J. Vis. Commun. Image Representation 14 (2003) 150-183.
[16] A.R. McIntyre, M.I. Heywood, in: Exploring Content-based Image Indexing Techniques in the Compressed Domain, Proceedings of the 2002 IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, 2002, 957-962.
[17] J. Jiang, A. Armstrong, G.C. Feng, Direct content access and extraction from JPEG compressed images, Pattern Recognit. 35 (2002) 2511-2519.
[18] X.Y. Huang, Y.J. Zhang, D. Hu, Image Retrieval Based on Weighted Texture Features Using DCT Coefficients of JPEG images, ICICS-PCM 2003, 3B2.2 P0367 (1-5).
[19] H. Eidenberger, Distance measures for MPEG-7-based retrieval, Technical Report TR-188-2-2003-20, Vienna University of Technology, Austria, 2003.
[20] D. Chetverikov, Pattern regularity as a visual key, Image Vis. Comput. 18 (2000) 975-985.
[21] K.Y. Song, M. Petrou, J. Kittler, Texture defect detection: a review, SPIE: Applications of Artificial Intelligence X: Machine Vision and Robotics 1708 (SPIE) (1992) 99-106.
