International Journal of Synthetic Emotions

Volume 11 • Issue 1 • January-June 2020

Hybrid Features Extraction for Adaptive Face Images Retrieval
Adel Alti, LRSD Laboratory, Computer Science Department, Sciences Faculty, University Ferhat Abbas Setif-1, Setif, Algeria
https://orcid.org/0000-0001-8348-1679

ABSTRACT

Existing methods of face emotion recognition have been limited in terms of recognition accuracy and execution time, so it is highly important to use efficient techniques to improve this performance. In this article, the authors present an automatic facial image retrieval approach that combines the advantages of color normalization by texture estimators with the gradient vector. Starting from a query face image, an efficient algorithm for human face retrieval based on hybrid feature extraction provides very promising results.

Keywords
Color Normalization, Gradient Vector, Similarity Distance, Texture Estimators

1. INTRODUCTION

Information Technologies (IT) provide a crucial communications framework for transferring a large number of facial images over computer networks. Searching for relevant facial images in large video material has long been a tedious issue. This problem has drawn the attention of experts and researchers, prompting new and innovative solutions to address it.
Content-based face image indexing and retrieval has been widely used as an effective solution that can help achieve video/image transmission with rate control. It consists of finding relevant images in a large video (Karmakar, 2019). Current systems combine various features to improve discrimination and classification. Other systems, such as Photobook (MIT's Vision and Modeling Group) and VisualSEEK (Columbia University), involve the user through different interaction modalities to refine their searches. These systems offer search results where the query is made up of the whole image. In fact, VisualSEEK is known to be particularly efficient at coping with high-dimensional image data spaces. Indeed, it helps to improve emotion recognition, improve search results and enhance flexibility by allowing the user to designate an object within the image (Ashraf et al., 2018).
This work focuses on feature-based emotion image indexing and retrieval. The main steps of an emotion classification system are feature extraction (the extraction of distinctive facial features) and classification (categorizing the extracted data, or patterns, through a learning process). Our challenge in this paper is to build computational models that select relevant features and apply similarity models to accurately detect the relevant requested images. The end goal of this research is to build user-aware preferences that automatically respond and adapt to human needs. To reach this goal, we chose to work with a standard dataset specially designed for emotion recognition. We mainly focus on

DOI: 10.4018/IJSE.2020010102

Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.



the problems related to linear transformations (rotation, scaling and translation) of images, and to structuring the content of the image. We propose a feature-based modeling approach that combines the gradient vector and normalized steering kernels from each color channel with covariance estimators. The proposed methodology combines the advantages of color normalization by texture estimators with the gradient vector.
The paper is organized as follows: the next section reviews the state of the art of recognition and search models. In section 3, we introduce the proposed methodology for feature modeling and selection. The similarity evaluation strategy is presented in section 4. Classification results, along with a discussion of the main findings, are presented in section 5. Section 6 summarizes our contributions and concludes the paper.

2. RELATED WORK

Considerable work has been carried out on content-based image indexing. Existing works on content-based image retrieval use dominant colors as well as the complexity of their content. They rely on generic attributes such as color, shape or texture (Israel et al., 2004; Zhou et al., 2017). Other systems use XML schemas to search for images by their semantic and visual content (Hong & Nah, 2004). These visual primitives can be categorized into three main types: color-based descriptors, texture-based descriptors and shape-based descriptors. Histograms (Boujemaa, Boughorbel, & Vertan, 2001) and Color Angles (Wang et al., 2010) are typical examples of the first type. In particular, color angles (Wang et al., 2010) are considered one of the most powerful discriminative algorithms and have been applied to diverse classification problems, including face recognition. In fact, color angles are known to be particularly efficient at coping with high-dimensional data spaces (Costa, Humpire-Mamani, & Traina, 2012). However, the problem with color angles is that they are frame-based classifiers, i.e., they are inherently unable to model pixel dependencies.
The co-occurrence matrix (Eleyan & Demirel, 2011) is another well-known discriminative technique for texture-based description and face recognition. Gabor filters (Abhishree, Latha, Manikantan, & Ramachandran, 2015) and wavelet transformations (Ashraf et al., 2018) are other types of models that belong to the transform-based domains, especially the texture-oriented ones. In spite of having less discriminative power than Color Angles and other discriminative classifiers, they have the advantage of efficiently modeling sequences and temporal data due to their internal network configuration. This property has made Color Angles very popular in the face recognition literature (Wang et al., 2010). For instance, Color Angles were applied in (Mahoor & Abdel-Mottaleb, 2008) for multimodal face modeling and video indexing in a smart environment.
Other texture analysis and classification techniques, such as the Fourier–Mellin transform (Goecke, Asthana, Pettersson, & Petersson, 2007), algebraic moments (De Siqueira, Schwartz, & Pedrini, 2013) and contour models (Bouhini, Géry, & Largeron, 2013), were designed to detect specific situations, and their classification results were quite interesting. Other face recognition studies applying the Fourier–Mellin transform can be found in (Derrode & Ghorbel, 2001). However, it seems difficult to find attributes that can model an image according to all of the aspects described above. More recently, Karmakar (2019) proposed a retrieval technique for medical images; the main idea is to find the requested data in the DWT domain, where a simple linear function is used. A comparison and analysis of image retrieval algorithms was carried out on a large image dataset.
The major problem with all these approaches is that they do not consider all image features during retrieval. Many queries suffer from a lack of exploration of the information available about the various image features and the various computed signatures. The association between face images and their signatures has not been studied with a view to deriving useful satisfaction. To cope with this limitation, other approaches are based on the combination of three descriptor-based techniques for extracting image features, which is the aim of our approach.


The purpose of this research paper is to present a novel approach that enriches image retrieval with semantic content. Users then also have the opportunity to add key terms in order to guide and filter relevant images depending on their interests and preferences, thereby adapting the search process to the specific needs of users.

3. PROPOSED METHODOLOGY

We propose segmenting color images into coherent regions using a gradient method. This method is applied to all regions to extract local maxima. We obtain a shape descriptor formed from the regions of the segmented image, combined with other descriptors: color channel normalization and covariance estimators. Then, all descriptor vectors (De Siqueira, Schwartz, & Pedrini, 2013) are concatenated to obtain the image feature vector. Finally, the Euclidean distance is applied to find the input images based on their feature vectors. The benefits include faster processing, reduced cost and an efficient index.
The focus of this paper is to apply similarity-based techniques over the various visual attributes to derive a complete image signature and validate the relevance and significance of each face image, which facilitates the effective use of the extracted features for each image by validating their relevance automatically.
The proposed method improves facial recognition accuracy and execution time by using a reduced feature vector size. The three signatures (color, texture and shape), computed on the statistical distribution of the image, highlight the potential differences between the images.

3.1. Color Channel Normalization


The initial color image I(x, y) → (R(x, y), V(x, y), B(x, y)) is normalized. We use "chromatic" to refer to the color components and "achromatic" to refer to the brightness component. The new representation is a linear space transformation obtained as follows:

I1 = (R(x, y) + V(x, y) + B(x, y)) / 3 : intensity at pixel location (x, y)

I2 = (R(x, y) − V(x, y) + 1) / 2 : color difference (red/green) (1)

I3 = (R(x, y) + V(x, y) − 2B(x, y) + 2) / 4 : color difference (yellow/blue)

This normalization clearly describes the color content of the image. This content is translated into the feature vector and saved as a signature together with the initial image in the image database.
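As an illustration, the transformation of Eq. (1) can be sketched as follows. This is a minimal sketch assuming float RGB channels in [0, 1]; the function name `normalize_color` is an illustrative choice, not code from the paper.

```python
import numpy as np

def normalize_color(image):
    """Transform an (H, W, 3) RGB image into the (I1, I2, I3) space of Eq. (1).

    Channels are assumed to be R, V (green), B, as floats in [0, 1].
    """
    R, V, B = image[..., 0], image[..., 1], image[..., 2]
    I1 = (R + V + B) / 3.0               # intensity
    I2 = (R - V + 1.0) / 2.0             # red/green difference
    I3 = (R + V - 2.0 * B + 2.0) / 4.0   # yellow/blue difference
    return np.stack([I1, I2, I3], axis=-1)
```

The offsets (+1, +2) and divisors (2, 4) keep each channel within [0, 1], so the three components can be histogrammed on a common scale.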

3.2. Estimators of the Covariance


The image covariance estimators are computed directly on the image matrices. These estimators measure the average difference between the gray levels of adjacent pixels. The estimated covariance value is used to determine the quantitative and qualitative characteristics of the image texture. The proposed method measures co-occurrences of pixels in a window of 32 × 32 pixels. The offset between two successive displacements is 4 pixels, with scales of 1, 2, 4, 8 and 16 pixels, according to four
directions given by θ = 0°, 45°, 90° and 135°. These estimators are obtained from the co-occurrence matrix P_d,θ(x, y). By selecting the same values of d and orientations θ equal to 0°, 45°, 90° and 135°, the texture feature vector is estimated as follows:

COV(d, θ) = Σ_{i=1..N} Σ_{j=1..N} |i − j| · P_d,θ(i, j) · 6d_i² / (n(n² − 1)) (2)
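The co-occurrence computation underlying Eq. (2) can be sketched as follows. This is a sketch under assumptions: the quantization to 8 gray levels, the boundary handling and the helper names (`cooccurrence`, `texture_feature`) are illustrative choices rather than the paper's implementation, and the scaling factor of Eq. (2) is omitted here, keeping only the |i − j|-weighted sum over P_d,θ.

```python
import numpy as np

def cooccurrence(gray, d=4, angle=0, levels=8):
    """Normalized co-occurrence matrix P_{d,theta} of a quantized gray image."""
    dy, dx = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}[angle]
    # Quantize [0, 256) gray values into `levels` bins.
    q = np.floor(gray / 256.0 * levels).astype(int).clip(0, levels - 1)
    P = np.zeros((levels, levels))
    H, W = q.shape
    # Count pairs (pixel, displaced pixel) that stay inside the image.
    for y in range(max(0, -dy), min(H, H - dy)):
        for x in range(max(0, -dx), min(W, W - dx)):
            P[q[y, x], q[y + dy, x + dx]] += 1
    return P / max(P.sum(), 1)

def texture_feature(gray, d, angle):
    """|i - j|-weighted sum over P_{d,theta}, the core term of Eq. (2)."""
    P = cooccurrence(gray, d, angle)
    i, j = np.indices(P.shape)
    return np.sum(np.abs(i - j) * P)
```

A perfectly uniform window yields a feature value of 0, since all co-occurring pairs share the same gray level.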
3.3. Gradient Vector
Our goal is to find homogeneous regions defined through the boundaries given by the gradient vector. The gradient norm at each pixel of a window W of size d is defined as follows:

1. By direct application of discrete derivation at each pixel (x, y) of an image I, we compute the partial derivatives Gx and Gy with respect to x and y:

Gx = I(x + 1, y) − I(x, y)
Gy = I(x, y + 1) − I(x, y)    (3)

These gradients are obtained by convolving the image with the following two masks:

Hx = [−1  1],  Hy = [−1  1]ᵀ    (4)

The amplitude G(x, y) is formulated as follows:

G(x, y) = √(Gx² + Gy²)    (5)

The gradient direction is defined by:

D(x, y) = arctan(Gx / Gy)    (6)

2. The color gradient image is also created;

3. Feature-based classification attempts to classify the curves that represent object shapes into several classes using a local maximum maxC.
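The gradient steps of Eqs. (3), (5) and (6) can be sketched as follows. This is a minimal sketch assuming a float grayscale image; `gradient_features` is a hypothetical helper name, and `np.arctan2(Gx, Gy)` is used to follow the Gx/Gy ratio of Eq. (6) while staying defined when Gy = 0.

```python
import numpy as np

def gradient_features(I):
    """Forward-difference gradients of Eq. (3), amplitude (5) and direction (6)."""
    Gx = np.zeros_like(I, dtype=float)
    Gy = np.zeros_like(I, dtype=float)
    Gx[:, :-1] = I[:, 1:] - I[:, :-1]   # I(x+1, y) - I(x, y)
    Gy[:-1, :] = I[1:, :] - I[:-1, :]   # I(x, y+1) - I(x, y)
    G = np.sqrt(Gx ** 2 + Gy ** 2)      # amplitude, Eq. (5)
    D = np.arctan2(Gx, Gy)              # direction following Eq. (6)
    return G, D
```

On a horizontal intensity ramp the amplitude is 1 everywhere away from the right border, and the direction is π/2, i.e., a purely horizontal gradient.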

4. SIMILARITY-BASED EVALUATION STRATEGY

An information retrieval system that uses images as queries belongs to the Query by Example category. Most such systems use an image as the query, and the result is usually a set of similar images. Our approach tries to merge the benefits of both visual and textual querying. First, the user uploads a face image. Then our Content-Based Emotional Image Retrieval (CBEIR) system generates a list of the most similar images. Finally, our approach presents a list of face images that satisfies the user's search.


4.1. The Database Construction Phase


The indexing phase is the phase during which signatures are calculated and assigned to the images captured from a sequence of images. We randomly selected videos based on the keywords "face images, emotion faces, face drawing expression". The result is a set of 5742 images captured from video sequences. To refine the selection, we kept only images about human emotions. Finally, we obtained a random set of 1123 emotional images.

4.2. The Retrieval Phase


The retrieval task has a primordial step, which is the search phase. The main objective of video and emotional image retrieval is the association of color, shape and texture descriptors with a feature vector that is automatically assigned to the query image or derived from its content.

4.3. Similarity and Display Search Results


The extracted features are widely used to analyze and find the request image by one or more criteria: color, texture and shape. Table 1 shows the different Euclidean distances used.

Table 1. Similarity distances

Color:
Sc(I, Q) = Σ_i [min(I1_i, Q1_i) + min(I2_i, Q2_i) + min(I3_i, Q3_i)] / (3 · min(|I|, |Q|)), i = 0 to 255

Texture:
St(I, Q) = Σ_i COV(d, θ)_i

Shape:
Sf(I, Q) = Σ_i (maxC_I(d, i) − maxC_Q(d, i))²

Color, texture, and shape:
Sctf(I, Q) = (wc · Sc(I, Q) + wt · St(I, Q) + wf · Sf(I, Q)) / (wc + wt + wf)
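The distances in Table 1 can be sketched as follows, assuming the inputs are per-channel histograms over the (I1, I2, I3) space. The helper names and the histogram-intersection reading of the color distance are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def color_similarity(hists_i, hists_q):
    """Histogram intersection over the three (I1, I2, I3) channel histograms,
    a sketch of the color distance Sc of Table 1."""
    inter = sum(np.minimum(hi, hq).sum() for hi, hq in zip(hists_i, hists_q))
    # Normalize by 3 times the smaller image size (each histogram sums to
    # its image's pixel count), so identical images score 1.0.
    return inter / (3.0 * min(h.sum() for h in hists_i + hists_q))

def combined_similarity(sc, st, sf, wc=1.0, wt=1.0, wf=1.0):
    """Weighted fusion S_ctf of the three per-attribute similarities."""
    return (wc * sc + wt * st + wf * sf) / (wc + wt + wf)
```

With equal weights, the fused score is simply the mean of the three per-attribute similarities; the weights let the user bias the ranking toward color, texture or shape.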

5. EXPERIMENTAL RESULTS

The proposed approach, based on three feature descriptors, is implemented with the help of the C++ Builder tool. During implementation, the multimedia indexing and search system is evaluated on 30 video sequences of 30 images each, which helps to analyze the efficiency of the proposed feature extraction system. From each collected region, various features (color, shape and texture) are extracted and trained using the SVM method, then stored as templates in the database. Figure 1 shows the list of face images used in the experimentation. The efficiency of the system is evaluated using accuracy and response time.
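The precision and recall figures reported in the tables below follow the usual retrieval definitions; a minimal sketch, with `precision_recall` as a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved image set against the relevant set.

    Precision = fraction of retrieved images that are relevant;
    recall = fraction of relevant images that are retrieved.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, retrieving 4 images of which 2 are among the 3 relevant ones gives a precision of 0.5 and a recall of 2/3.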

5.1. Color Attribute


Once a query image is entered, the system automatically extracts the color descriptor. The newly extracted features are compared with the trained features using the Euclidean distance. These results are obtained on a Dell Latitude E5410 laptop (Intel Core i5, 2.67 GHz, 4 GB RAM) using C++ Builder. It is noticed from Table 3 that the execution time of our technique is acceptable, and the system displays all images similar to the request image whose similarity distance is greater than 0.75 (see Table 2).

Figure 1. Sample of facial images used in experimentation

5.2. Texture Attribute


To evaluate the efficiency of the presented method, the texture descriptor is computed on the request image. Table 4 shows the similarity distances obtained from the indexed classified images

Table 2. Accuracy of the proposed face image recognition using color attribute

Facial Images Precision Recall


A 0.903 0.938
B 0.896 0.917
C 0.967 0.967

Table 3. Response time of the proposed face image recognition using color attribute

Facial Images Execution Time


A 8s
B 10s
C 6s

Table 4. Accuracy of the proposed face image recognition using texture attribute

Facial Images Precision Recall


A 0.952 0.983
B 0.918 0.967
C 0.880 0.917

using the texture attribute. The system displays all images similar to the request image when the similarity is greater than 0.75. The obtained results are encouraging and demonstrate high efficiency (Table 5).


Table 5. Response time of proposed image search using texture attribute

Facial Images Execution Time


A 25s
B 20s
C 13s

5.3. Shape Attribute


Table 6 shows that the proposed method effectively trains the extracted features in minimal time when compared to the other training methods. In addition, Table 7 shows a response time of about 5 seconds using the shape attribute.

Table 6. Accuracy of the proposed face image recognition using shape attribute

Facial Images Precision Recall


A 0.550 0.687
B 0.166 0.112
C 0.457 0.589

Table 7. Response time of proposed image search using shape attribute

Facial Images Execution Time


A 5s
B 6s
C 4s

5.4. Combination of Image Attributes


The combination of different image features helps in restricting and personalizing the set of pertinent images. In other terms, image captioning helps to enhance the flexibility of the indexing and retrieval process for pertinent images. A query image formulated using an index constructed from the three attributes color-texture-shape helps to reduce search time and enhances retrieval by returning a restricted set of documents based on user preferences. The algorithm could be further accelerated on more powerful supercomputers offering massive parallelism with fine granularity.
After extracting the set of combined attributes (color, texture and shape) for each face image, the next goal was to retrieve it. The accuracy results obtained after applying face image recognition using hybrid feature retrieval are given in Table 8. It is noticed from Table 8 that, during the retrieval stage, most of the face images record an accuracy between 93% and 96%. This accuracy depends on the hybrid feature extraction and demonstrates high efficiency.
From the table, it can be seen that the number of combined attributes is 3, because repeated experiments showed that maximum accuracy was achieved with 3 attributes. These results were validated on a real-time medical dataset. In the future, more features can be extracted and further experiments performed to obtain more validation of the retrieval tests for emotional facial images (Table 9).


Table 8. Accuracy of the proposed face image recognition-using hybrid features extraction

Facial Images Precision Recall


A 0.911 0.932
B 0.927 0.982
C 0.937 0.968

Table 9. Response time of the proposed face image recognition-using hybrid features extraction

Facial Images Execution Time


A 52ms
B 42ms
C 35ms

6. CONCLUSION

In this paper, we proposed an efficient hybrid facial image indexing and retrieval approach based on three descriptors to improve the selection and matching process. The proposed image features benefit from the combination of color normalization, covariance estimators and the gradient vector for retrieving pertinent facial images. Our approach offers users several possibilities for querying an image database and returns, as a response, the images most similar to the query image. The numerical results are encouraging and demonstrate high efficiency and flexibility. Future work will attempt to enhance this method by using different word embeddings in order to investigate their impact on the retrieval task.


REFERENCES

Abhishree, T. M., Latha, J., Manikantan, K., & Ramachandran, S. (2015). Face recognition using Gabor filter
based feature extraction with anisotropic diffusion as a pre-processing technique. Procedia Computer Science,
45, 312–321. doi:10.1016/j.procs.2015.03.149
Ashraf, R., Ahmed, M., Jabbar, S., Khalid, S., Ahmad, A., Din, S., & Jeon, G. (2018). Content based image
retrieval by using color descriptor and discrete wavelet transform. Journal of Medical Systems, 42(3), 44.
doi:10.1007/s10916-017-0880-7 PMID:29372327
Bouhini, C., Géry, M., & Largeron, C. (2013). User-Centered Social Information Retrieval Model Exploiting
Annotations and Social Relationships. In Asia Information Retrieval Symposium (pp. 356-367). Springer.
doi:10.1007/978-3-642-45068-6_31
Boujemaa, N., Boughorbel, S., & Vertan, C. (2001). Color Soft Signature for Image Retrieval by Content.
Eusflat, 2, 394–401.
Costa, A. F., Humpire-Mamani, G., & Traina, A. J. M. (2012). An efficient algorithm for fractal analysis of
textures. In 2012 IEEE 25th SIBGRAPI Conference on Graphics, Patterns and Images (pp. 39-46). doi:10.1109/
SIBGRAPI.2012.15
De Siqueira, F. R., Schwartz, W. R., & Pedrini, H. (2013). Multi-scale gray level co-occurrence matrices for
texture description. Neurocomputing, 120, 336–345. doi:10.1016/j.neucom.2012.09.042
Derrode, S., & Ghorbel, F. (2001). Robust and efficient Fourier–Mellin transform approximations for gray-level
image reconstruction and complete invariant description. Computer Vision and Image Understanding, 83(1),
57–78. doi:10.1006/cviu.2001.0922
Eleyan, A., & Demirel, H. (2011). Co-occurrence matrix and its statistical features as a new approach for face
recognition. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 97–107.
Goecke, R., Asthana, A., Pettersson, N., & Petersson, L. (2007). Visual vehicle egomotion estimation using
the fourier-mellin transform. In 2007 IEEE Intelligent Vehicles Symposium (pp. 450-455). doi:10.1109/
IVS.2007.4290156
Hong, S., & Nah, Y. (2004). An intelligent image retrieval system using XML. In IEEE 10th International
Conference on Multimedia Modelling (p. 363). doi:10.1109/MULMM.2004.1265010
Israel, M., Broek, E. L., Putten, P. V., & Den, M. J. (2004). Automating the construction of scene classifiers for
content-based video retrieval. Seattle, WA: Academic Press.
Karmakar, D. (2019). Multimodal Biometric Recognition in Feature Level Fusion using Statistical Moment
Measure of Color Values. Journal of Computer and Mathematical Sciences, 10(3), 584–592. doi:10.29055/
jcms/1041
Mahoor, M. H., & Abdel-Mottaleb, M. (2008). A multimodal approach for face modeling and recognition. IEEE
Transactions on Information Forensics and Security, 3(3), 431–440. doi:10.1109/TIFS.2008.924597
Van Den Broek, E. L., & van Rikxoort, E. V. (2004). Evaluation of color representation for texture analysis.
Proceedings of the 16th Belgium-Netherlands Artificial Intelligence Conference, 35-42.
Van Rikxoort, E. M., van den Broek, E. L., & Schouten, T. E. (2005). Object based image retrieval: Utilizing
color and texture. Academic Press.
Wang, K., Wu, D., Chen, F., Liu, Z., Luo, X., & Liu, S. (2010). Angular color uniformity enhancement of
white light-emitting diodes integrated with freeform lenses. Optics Letters, 35(11), 1860–1862. doi:10.1364/
OL.35.001860 PMID:20517442
Zhou, D., Wu, X., Zhao, W., Lawless, S., & Liu, J. (2017). Query expansion with enriched user profiles for
personalized search utilizing folksonomy data. IEEE Transactions on Knowledge and Data Engineering, 29(7),
1536–1548. doi:10.1109/TKDE.2017.2668419


Adel Alti obtained a Master's degree from the University of Setif (UFAS), Algeria, in 1998, and a Ph.D. degree in software engineering from UFAS University of Sétif, Algeria, in 2011. He is currently an associate professor (HDR) at the University of Sétif and head of the Smart Semantic Context-aware Services research group at LRSD. His areas of interest include mobility; ambient, pervasive and ubiquitous computing; automated software engineering; mapping multimedia concepts into UML; semantic integration of architectural descriptions into MDA platforms; context-aware quality software architectures; and automated service management, context and QoS. He has published a number of papers on these subjects.
