
Content-Based Image Retrieval Incorporating Models of Human Perception

M. Emre Celebi
Dept. of Computer Science and Engineering
University of Texas at Arlington
Arlington, TX 76019 U.S.A.
celebi@cse.uta.edu

Y. Alp Aslandogan
Dept. of Computer Science and Engineering
University of Texas at Arlington
Arlington, TX 76019 U.S.A.
alp@cse.uta.edu

Abstract


In this work, we develop a system for retrieving medical images with focus objects, incorporating models of human perception. The approach is to guide the search for an optimum similarity function using human perception. First,
the images are segmented using an automated
segmentation tool. Then, 20 shape features are computed
from each image to obtain a feature matrix. Principal
component analysis is performed on this matrix to reduce
the number of dimensions. Principal components obtained
from the analysis are used to select a subset of variables
that best represents the data. A human perception of
similarity experiment is designed to obtain an aggregated
human response matrix. Finally, an optimum weighted
Manhattan distance function is designed using a genetic
algorithm utilizing the Mantel test as a fitness function. The
system is tested for content-based retrieval of skin lesion
images. The results show significant agreement between
the computer assessment and human perception of
similarity. Since the features extracted are not specific to
skin lesion images, the system can be used to retrieve other
types of images.

1. Introduction
Current content-based retrieval systems use low-level
image features based on color, texture, and shape to
represent images. However, except in some constrained
applications such as human face and fingerprint
recognition, these low-level features do not capture the
high-level semantic meaning of images [14]. In most of the
systems, another aspect that is as important as the features
themselves has been neglected: The processing and
interpretation of those features by human cognition.
Although the ultimate goal of all image similarity metrics is
to be consistent with human perception, little work has
been done to systematically examine this consistency.
Commonly, the performance of similarity metrics is evaluated based on anecdotal accounts of good and poor matching results [6].

In this work, we develop a system for retrieving medical images with focus objects, incorporating models of human perception. The approach is to guide the search for an optimum similarity function using human perception.

Figure 1 shows an overview of the system. First, the
images are segmented using an automated segmentation
tool. Then, 20 shape features are computed from each
image to obtain a feature matrix. Principal component
analysis is performed on this matrix to reduce its
dimension. The principal components obtained from the
analysis are used to select a subset of variables that best
represents the data. A human perception of similarity
experiment is designed to obtain an aggregated human
response matrix. Finally, an optimum weighted Manhattan
distance function is designed using a genetic algorithm
utilizing the Mantel test as a fitness function.
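As a rough sketch of this final optimization step, the toy example below evolves the weights of a weighted Manhattan distance toward agreement with a "human" dissimilarity matrix. All data are synthetic, a simple (1+1)-style mutation search stands in for the genetic algorithm, and a plain Pearson correlation over the upper-triangular entries stands in for the Mantel statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins: an images x features matrix and a "human" dissimilarity
# matrix built so that the subjects attend only to feature 0.
F = rng.random((10, 4))
H = np.abs(F[:, 0][:, None] - F[:, 0][None, :])

def weighted_l1_matrix(F, w):
    """Pairwise weighted Manhattan distances between the rows of F."""
    return np.abs(F[:, None, :] - F[None, :, :]).dot(w)

def fitness(w):
    """Correlation of upper-triangular entries (stand-in for the Mantel statistic)."""
    D = weighted_l1_matrix(F, w)
    iu = np.triu_indices(len(F), k=1)
    return np.corrcoef(D[iu], H[iu])[0, 1]

# (1+1)-style evolutionary search over the weight vector.
w = np.ones(4)
best = fitness(w)
for _ in range(300):
    cand = np.clip(w + rng.normal(0, 0.1, size=4), 0, None)
    f = fitness(cand)
    if f > best:                 # keep the candidate only if it improves fitness
        w, best = cand, f
```

The search should shift weight toward feature 0, increasing the agreement between the computed distances and the synthetic human judgments.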
[Figure 1: block diagram. Segmentation -> Feature Computation -> Dimensionality Reduction -> Feature Selection produce the feature matrix; the human experiments produce the human response matrix; both matrices are fed to a GA, which outputs the weights.]

Figure 1. System overview
The system is tested for content-based retrieval of skin
lesion images. The results show significant agreement
between the computer assessment and human perception of
similarity. Since the features extracted are not specific to
skin lesion images, the system can be used to retrieve other
types of images.

2. Segmentation and feature computation
We use 184 skin lesion images obtained from various
online image databases [1,4] as our data set. All images

Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’04)
0-7695-2108-8/04 $ 20.00 © 2004 IEEE

have a resolution of about 500 pixels per centimeter. For segmentation of the lesion images we have used an automated tool, SkinSeg, developed by Goshtasby et al. [16]. Some of the segmented lesion images are shown in figure 2.

Figure 2. Segmented skin lesion images

The ABCD rule of dermatoscopy [12], recommended by the American Cancer Society, summarizes the clinical features of pigmented lesions suggestive of melanoma (a deadly form of skin cancer): asymmetry (A), border irregularity (B), color variegation (C), and diameter greater than 6 mm (D). Interestingly, three of these features are shape-related. For each image in the database, we compute 20 shape features: diameter, lesion area and perimeter, convex hull area and perimeter, major and minor axis length, eccentricity, orientation, extent, compactness, convexity, solidity, asymmetry, border irregularity, bending energy, and contour sequence moments.

3. Dimensionality reduction and feature selection

Dimensionality reduction can be achieved in two different ways. One approach, called feature extraction, is a linear or non-linear transformation of the original feature space that optimizes a particular class separability criterion. The other approach, called feature subset selection, is the selection of a subset of features that minimizes intra-class variance and maximizes inter-class variance.

3.1 Principal component analysis of skin lesion data

Principal Component Analysis (PCA) is an unsupervised linear feature extraction technique that transforms a set of variables into a substantially smaller set of uncorrelated variables that represents most of the information in the original set of variables [5]. Our data consists of 184 samples, each having 20 features. Since the features have arbitrary units, PCA is applied on the correlation matrix. The eigenvalues and percentages of explained variance for the first 10 PCs are given in table 1. From table 1 it can be seen that 8 PCs account for 95.77% of the variation in the data; in our analysis, we therefore choose to retain 8 principal components.

Table 1. Eigenvalues and explained variances

PC    Eigenvalue   % Variance   Cum. % Var.
1     7.17062      35.85        35.85
2     4.17871      20.89        56.74
3     2.06138      10.31        67.05
4     1.53324       7.67        74.72
5     1.44458       7.22        81.94
6     1.03291       5.16        87.10
7     0.91134       4.56        91.66
8     0.82177       4.11        95.77
9     0.38006       1.90        97.67
10    0.21225       1.06        98.73

3.2 Using principal components to select a subset of variables

Substantial dimensionality reduction can be achieved by using n << m PCs instead of the m original variables, but the values of all m variables are usually still needed to calculate the PCs, since each PC is a linear combination of all m variables [8]. Furthermore, while the original variables are readily interpretable, the constructed PCs may not be easy to interpret. Some variables may also be difficult or expensive to measure, so collecting data on them in future studies may be impractical [5]. Therefore, it might be preferable if, instead of using n PCs, we could use n of the original variables to account for most of the variation in the data [8].

The correlations between the components and the variables constitute the principal component loading matrix. It is useful to examine the sums of squares of the loadings in each row of this matrix, because the row sum of squares indicates how much of that variable's variance is explained by the retained PCs [5]. The percentage of variance explained for each variable is presented in table 2. As can be seen, 8 PCs account for more than 90% of the variation in 16 of the variables, and for more than 85% of the variation in all of them.

Table 2. Explained variances for each variable

Variable     1   2   3   4   5   6   7   8   9  10
% Variance  99  95  96  98  89  98  98 100  98  99

Variable    11  12  13  14  15  16  17  18  19  20
% Variance  96  86  99  89  99  99  98  95 100  85
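The analysis of section 3.1 can be reproduced with a few lines of linear algebra. In the sketch below, synthetic data stands in for the 184 x 20 feature matrix, and the 95% retention threshold is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((184, 20))             # synthetic stand-in for the feature matrix

R = np.corrcoef(X, rowvar=False)      # 20 x 20 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
eigvals = eigvals[::-1]               # reorder: largest eigenvalue first
eigvecs = eigvecs[:, ::-1]

# For a correlation matrix the eigenvalues sum to the number of variables,
# so explained-variance percentages follow directly.
pct_var = 100.0 * eigvals / eigvals.sum()
cum_var = np.cumsum(pct_var)

# Retain the smallest number of PCs explaining at least 95% of the variance.
k = int(np.searchsorted(cum_var, 95.0) + 1)
```

With the real feature matrix, `cum_var` would reproduce the cumulative column of table 1 and `k` would come out as 8.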

We use Jolliffe's B4 method [8] for variable subset selection. In this procedure, starting with the largest retained PC, the variable with the highest loading on the relevant PC is selected to represent that component, unless it has already been chosen to represent a larger PC. As a result, we retain the following variables: 10 (Perimeter), 12 (Solidity), 14 (Asymmetry), 18 (Eccentricity), 19 (Orientation), 20 (Extent), 6 (4th Moment), and 5 (3rd Moment), and discard the remaining ones.

The total amount of variation the selected variables account for can be used as a criterion to evaluate how efficiently a particular subset represents the original set of variables [5]. The total amount of variation that a subset explains is the sum of the variation it explains in each of the discarded variables plus the sum of the variances of the variables comprising the subset. Each discarded variable is regressed on the retained variables, and the corresponding squared multiple correlations are summed:

    Σ_{i=1}^{m-n} R²(i)

where R²(i) is the squared multiple correlation of the ith discarded variable with the retained variables. Adding to this the variances of the retained variables (in our case, 1 for each variable, since PCA is applied on the correlation matrix) gives a measure of the total amount of variation that the subset explains. In our case, the percentage of variation that the selected subset of variables explains is 88.29%.

4. Human perception of similarity experiments

Since the ultimate user of an image retrieval system is human, studying human perception of image content at a psychophysical level is crucial [14]. However, few content-based image retrieval systems have taken into account the characteristics of human visual perception and the underlying similarities between images that it implies [7]. We design a human perception of similarity experiment to incorporate human perception into our shape-based image retrieval system.

In the segmentation, feature computation, and PCA steps we have used all of the images in our data set. However, using all 184 images in the experiments is impractical, since the number of trials is proportional to the square of the number of images. We therefore select 48 images from the database, which means C(48, 2) = 1128 trials (comparisons). The experiment consists of 3 sessions, each comprising approximately 376 trials. Nine subjects participated in the experiments, and each session took about 19 minutes on the average.

Figure 3 shows a snapshot of the graphical user interface of the experiment. The image on the left is the reference image, and the one on the right is the test image. The subjects are asked to rate the similarity between pairs of images on a scale of four: extremely similar, considerably similar, partially similar, and not similar. This scale is adapted from a psychophysical study [9].

Figure 3. Experiment GUI

We construct the human response matrix following the approach described in Guyader et al. [7]. The overall dissimilarity matrix S is a weighted average of the individual matrices SK = {SK(i,j)}, one for each level of similarity: S1 is for "Extremely similar", S2 for "Considerably similar", S3 for "Partially similar", and S4 for "Not similar". Each time a subject associates a test image j with a reference image i, the corresponding SK(i,j) is increased by 1. We take the weighted average of the SK as follows:

    S(i,j) = (1/64) S1(i,j) + (1/27) S2(i,j) + (1/8) S3(i,j) + S4(i,j)

This weighting is necessary because we want to emphasize dissimilarity more than similarity. Since S is a dissimilarity matrix, a large entry represents a pair of dissimilar images, whereas a small entry corresponds to a pair of similar images.
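A minimal sketch of this aggregation, with synthetic response counts standing in for the subjects' tallies (the weights 1/64, 1/27, 1/8, and 1 are the ones given in the text, and the averaging of mirrored entries anticipates the symmetrization described in section 5):

```python
import numpy as np

n = 48                                    # images used in the experiment
rng = np.random.default_rng(2)

# S1..S4: per-level response counts; S1 tallies "extremely similar" votes,
# S4 tallies "not similar" votes (synthetic values here).
S_levels = [rng.integers(0, 10, size=(n, n)) for _ in range(4)]

weights = [1 / 64, 1 / 27, 1 / 8, 1.0]    # emphasize dissimilarity over similarity
S = sum(w * Sk for w, Sk in zip(weights, S_levels))

# The raw response matrix need not be symmetric; average (i, j) with (j, i).
S_sym = 0.5 * (S + S.T)
```

Large entries of `S_sym` then correspond to pairs the subjects judged dissimilar, matching the interpretation given above.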

5. Optimization of the similarity function using a genetic algorithm

Genetic algorithms are stochastic optimization methods that can handle high-dimensional optimization problems [3]. We have used the Parallel Genetic Algorithm Library (PGAPack), developed by David Levine [10], to implement our algorithm.

5.1 Mantel test

The Mantel test is a statistical technique for comparing two n x n dissimilarity matrices. The statistic used for measuring the correlation between two matrices A and B is the classical Pearson coefficient:

    r = (1/(N-1)) ΣΣ [(Aij - mean(A)) / sA] [(Bij - mean(B)) / sB]    (Equation 1)

where N is the number of elements in the lower (or upper) triangular part of the matrix, and mean(A) and sA are the mean and standard deviation of the elements of A, respectively. The testing procedure for the Mantel test is as follows [2]:
1) Compute the Pearson coefficient rAB using eq. 1.
2) Randomly permute the rows and the corresponding columns of one of the matrices, creating a new matrix A'.
3) Compute the rA'B statistic between A' and B using eq. 1.
4) Repeat steps 2 and 3 a great number of times (>5000).
We have used "zt", a GNU General Public License software tool, for performing the Mantel test [2].

We use the Mantel test as the fitness function of our genetic algorithm. In each generation, we transform the feature matrix into a dissimilarity matrix by taking the weighted Manhattan distances between its rows, using the gene values of an individual as the weights. The output of the Mantel test, which is the correlation between the two matrices, is then used as the fitness value indicating the goodness of that particular set of weights. Since the Mantel test works with symmetric dissimilarity matrices, both matrices must be symmetric. Because the Manhattan distance is a symmetric function, the feature-based matrix is guaranteed to be symmetric. The human response matrix is already a dissimilarity matrix; however, it is not necessarily symmetric. To symmetrize it, we take the average of the symmetric entries [i,j] and [j,i].

5.2 Results of the optimization

Initially (i.e., with all weights equal to 1.0), the correlation between the aggregated human matrix and the feature matrix is 0.56. After optimization the correlation becomes 0.73. In other words, the optimization yields a 31% improvement in the agreement between the computer assessment and human perception. Figure 4 shows a selection of matching results (query image, best computer match, and best human match).

Figure 4. A selection of matching results
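The Mantel procedure of section 5.1 can be sketched as follows. This is a simplified stand-in for the "zt" tool: it correlates the upper-triangular entries of the two matrices and permutes rows and columns jointly (the permutation count and seed are illustrative):

```python
import numpy as np

def mantel(A, B, n_perm=5000, seed=0):
    """Mantel test between two symmetric n x n dissimilarity matrices."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)                 # upper triangle, no diagonal
    r_obs = np.corrcoef(A[iu], B[iu])[0, 1]      # Equation 1 (Pearson r)

    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        A_perm = A[np.ix_(p, p)]                 # permute rows and columns together
        if np.corrcoef(A_perm[iu], B[iu])[0, 1] >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)     # permutation p-value

# Sanity check: identical matrices correlate perfectly.
D = np.abs(np.subtract.outer(np.arange(6.0), np.arange(6.0)))
r, p = mantel(D, D, n_perm=200)
```

In the genetic algorithm, `r_obs` computed between the weighted-Manhattan matrix and the symmetrized human matrix plays the role of the fitness value.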
6. Related work

To the authors' knowledge, relatively little work has been done to incorporate human perception of similarity into CBIR systems in a systematic manner. Frese et al. [6] have developed a methodology for designing image similarity metrics based on human visual system models. The metric they propose is based on color and spatial attributes, and they optimize a set of weights by analyzing the results of a psychophysical experiment with multidimensional scaling. However, their model is suitable only for global similarity retrieval.

Rogowitz et al. [13] have studied how human observers judge image similarity. They conduct two psychophysical scaling experiments aimed at uncovering the dimensions human observers use in rating the similarity of photographic images, and compare the results with two algorithmic image similarity methods. Although their work provides insight into how humans judge similarity, this information is not used to improve the performance of a CBIR system.

For the specific case of shape similarity, Scassellati et al. [15] have used shape similarity judgments from human subjects to evaluate the performance of several shape distance metrics.

Mojsilovic et al. [11] have developed a perceptually based image retrieval system using color and texture attributes. A psychophysical experiment was designed to measure the perceived similarity of each image with every other image in the database, and the results were analyzed using multidimensional scaling to extract the relevant dimensions. They also design distance metrics for color and texture matching; however, these metrics are not optimized for maximum agreement with human perception of similarity.

Guyader et al. [7] have developed a natural scene classification system that incorporates human perception and is based on Gabor features. However, classes such as people or animals cannot be discriminated by their system, so it cannot be used for general similarity-based retrieval.

7. Conclusions

Content-based image retrieval has been an active research area in the past 10 years, and since the early 90s numerous image retrieval systems, both research and commercial, have been developed [14]. In this work, we used aggregated human perception as a guide in optimizing a visual similarity function in a content-based image retrieval system. In this system we focus on shape similarity. The weights of the similarity function were optimized by means of a genetic algorithm, using the dissimilarity matrix obtained from the human experiments. As a result of the optimization, agreement between the computer assessment and human perception is improved by 31 percent. The main contribution of this work is the incorporation of human perception into this task in a systematic and generalizable manner.

The same approach can be used to develop similarity functions based on other low-level features such as color or texture. Alternatively, another content-based image retrieval system that is powerful in the color or texture aspects can be combined with our system. Currently, the human experiment requires approximately one hour to complete; further work is needed to shorten it so that experts can be included in the experiments.

8. References

[1] Bezzant L. "Dermatology Image Bank," University of Utah Health Sciences Library. Available at http://wwwmedlib.med.utah.edu/kw/derm/
[2] Bonnet E. and Van de Peer Y. (2002) "zt: A Software Tool for Simple and Partial Mantel Tests," Journal of Statistical Software, 7: 1-12
[3] Chambers L. (Ed.) (2001) "The Practical Handbook of Genetic Algorithms," Chapman & Hall/CRC
[4] Cohen B. and Lehmann C. "DermAtlas," Johns Hopkins University of Medicine. Available at http://dermatlas.med.jhmi.edu/derm
[5] Dunteman G. H. (1989) "Principal Component Analysis," Sage Publications
[6] Frese T., Bouman C. A., and Allebach J. P. (1997) "A Methodology for Designing Image Similarity Metrics Based on Human Visual System Models," Proceedings of the SPIE Conference on Human Vision and Electronic Imaging II, 3016: 472-483
[7] Guyader N., Herve L., Herault J., and Guerin A. (2002) "Towards the Introduction of Human Perception in a Natural Scene Classification System," IEEE International Workshop on Neural Network for Signal Processing
[8] Jolliffe I. T. (1986) "Principal Component Analysis," Springer-Verlag New York Inc.
[9] Kurtz D. B., White T. L., and Hayes M. (2000) "The labeled dissimilarity scale: A metric of perceptual dissimilarity," Perception & Psychophysics, 62: 152-161
[10] Levine D. (1996) "Users Guide to the PGAPack Parallel Genetic Algorithm Library." Available at ftp://ftp.mcs.anl.gov/pub/pgapack/user_guide.ps
[11] Mojsilovic A., Kovacevic J., Hu J., Safranek R. J., and Ganapathy S. K. (2000) "Matching and Retrieval Based on the Vocabulary and Grammar of Color Patterns," IEEE Transactions on Image Processing, 9: 38-54
[12] NIH Consensus Conference (1992) "Diagnosis and treatment of early melanoma," The Journal of the American Medical Association, pp. 1314-1319
[13] Rogowitz B. E., Frese T., Smith J. R., Bouman C. A., and Kalin E. (1998) "Perceptual Image Similarity Experiments," Proceedings of the SPIE Conference on Human Vision and Electronic Imaging, 3299: 26-29
[14] Rui Y., Huang T. S., and Chang S. (1999) "Image Retrieval: Current Techniques, Promising Directions and Open Issues," Journal of Visual Communication and Image Representation, 10: 39-62
[15] Scassellati B., Alexopoulos S., and Flickner M. (1994) "Retrieving images by 2D shape: a comparison of computation methods with human perceptual judgments," in Niblack W. and Jain R. (Eds.), Proceedings of the SPIE Conference on Storage and Retrieval for Image and Video Databases II, 2185: 2-14
[16] Xu L., Jackowski M., Goshtasby A., Roseman D., Bines S., Yu C., Dhawan A., and Huntley A. (1999) "Segmentation of Skin Cancer Images," Image and Vision Computing, 17: 65-74