
A Model Combining C-Tree and Neighbor Graph for Semantic-based Image Retrieval
Nguyen Thi Uyen Nhi
University of Science - Hue University, Vietnam
University of Economics, The University of Danang, Vietnam
Thanh The Van
HCMC University of Food Industry, Vietnam

ABSTRACT
In this paper, a semantic-based image retrieval (SBIR) system is built to search for a set of similar images and the semantics of a query image. To solve this problem, firstly, a method is proposed to improve the efficiency of image retrieval based on the combination of a C-Tree and a neighbor graph at the leaf level. The neighbor graph limits the omission of similar elements that are separated into different branches during node splitting and improves the accuracy of image retrieval. Then, an ontology framework for the image dataset is created semi-automatically so that SPARQL queries can be executed to retrieve a set of similar images and the semantics of the query image. To evaluate the effectiveness of the proposed method, the experimental results are assessed and compared with some recently published methods on image datasets such as COREL (1,000 images), Wang (10,800 images), ImageCLEF (20,000 images), and Stanford Dogs (20,580 images).

Keywords: C-Tree, Graph-CTree, SBIR_grCT, SPARQL, Ontology

1. INTRODUCTION
The problem of discovering, looking up, and analyzing image semantics has become a popular research topic that attracts much attention. Identifying a desired image in a large and diverse collection is a daunting task. Therefore, it is essential to develop high-precision image retrieval systems for large datasets. However, the low-level content of an image, including color, shape, and texture, cannot express the user's high-level semantics (Allani et al., 2017; Cao et al., 2016). Therefore, it is essential to have a structure that maps low-level features to ontology-based high-level semantics (Liu et al., 2008; Sarwara et al., 2013) through classification and the creation of SPARQL queries (Gombos & Kiss, 2017).
A model mapping the low-level features of each image to high-level semantics is proposed to solve two problems: (1) searching for a set of images similar to the input image by low-level features; (2) converting the low-level content of the image into high-level semantics based on an ontology and SPARQL queries. In a previous study, a balanced multi-branch clustering tree, C-Tree (Nhi et al., 2020), was built with semi-supervised learning to automatically index images by their low-level features. Each node of the C-Tree is clustered based on a similarity measure, combining partitional and hierarchical clustering, to efficiently search for similar images and classify the input query image. Querying on a C-Tree has advantages: it is a multi-branch tree that clusters feature vectors, so large amounts of data can be stored. However, each inner node and leaf node holds a maximum of M elements and splits into k nodes when this maximum is exceeded, making the tree training process highly complex and the training time long. At the same time, node splitting on the C-Tree can separate similar elements into different nodes; in the worst case, these similar elements end up in separate branches. A search that follows only the most similar branch will miss similar elements that have been moved to other branches. Therefore, query performance on the C-Tree is not high.
This paper proposes an improved model that combines the C-Tree with a neighbor clustering graph to improve the efficiency of image retrieval on the C-Tree. Specifically: (1) each inner node has a maximum of N elements and each leaf node a maximum of M elements; when a node exceeds its maximum number of elements, it is split into 2 new nodes, so the C-Tree construction process is simpler and faster; (2) a neighbor graph, called Graph-CTree, is created from the leaf clusters during the construction of the C-Tree to limit the omission of similar elements separated during splitting. At query time, the most suitable leaf node for the input query image is found on the C-Tree, and the graph of neighboring clusters associated with this leaf node supplies additional candidates, improving search efficiency. For semantic-based image retrieval, a semi-automatic ontology is built upon the RDF triple language (Furche et al., 2006; Haase et al., 2004) for the multi-content, multi-object ImageCLEF dataset. An input image is queried on the improved C-Tree to find a set of similar images. The k-NN (k-Nearest Neighbors) algorithm is then applied to this image set to obtain the categories of the image, which are saved in a visual word vector. A SPARQL query is automatically generated from the visual word vector and executed on the ontology. The result of this query is a set of semantic descriptions and a set of images that are similar according to the semantic classification of the input image. Experiments were carried out on the COREL, Wang, ImageCLEF, and Stanford Dogs datasets and compared with our previous results and related works to evaluate the efficiency and accuracy of the proposals.
The contributions of the paper include: (1) improving node splitting on the C-Tree; (2) proposing Graph-CTree, a model combining the C-Tree and a neighbor clustering graph, to improve the efficiency of image retrieval; (3) building a framework for a semi-automatic ontology for image data to associate the low-level features of images with high-level semantic concepts; (4) proposing the SBIR_grCT model and building an experimental application on the ImageCLEF dataset.
The paper is organized as follows. Part 2 presents related research. In Part 3, we describe a model combining a C-Tree with a neighbor clustering graph. We propose a framework for a semi-automatic ontology for semantic-based image retrieval in Part 4. Part 5 presents an application of the SBIR_grCT query system based on the proposed model; experiments were performed on the COREL, Wang, ImageCLEF, and Stanford Dogs datasets and compared with other methods to demonstrate the effectiveness of the proposed system. Conclusions and directions for future research are presented in Part 6.

2. RELATED WORK
Semantic-based image retrieval has become an active research topic in recent times. Many techniques for semantic-based image retrieval have been implemented in various systems, such as techniques that build a model for mapping between low-level features and high-level semantics (Allani et al., 2015; Sethi et al., 2001), ontology-based query techniques to accurately describe the semantics of images (Hooi et al., 2014; Sarwara et al., 2013; Zhong et al., 2020), techniques for data classification (Erwin et al., 2017; Garg et al., 2019), and content-based semantic image retrieval systems for hierarchical databases (Allani et al., 2017; Pandey et al., 2016).
Jalila Filali proposed building a visual vocabulary and ontology based on the annotations of the ImageCLEF dataset. The image retrieval process is accomplished by integrating both low-level features and visual semantic similarity. The ontologies are enriched with concepts and relationships extracted from the BabelNet lexical resource. However, this system does not combine low-level feature-based image search with the semantic concepts and semantic querying of the image (Filali et al., 2017).
Hakan Cevikalp proposed an image retrieval method that uses a binary hierarchical tree of classes trained with a low-level feature-based TSVM technique to separate labeled and unlabeled data samples at each node of the binary hierarchy. A hashing algorithm was proposed to perform both data learning and fast image search, and the hyperplanes produced by the TSVM are used to generate binary codes. The method was evaluated on the ImageCLEF and CIFAR100 datasets and compared with other methods (Cevikalp et al., 2018).
M. Jiu used deep multi-layer networks based on nonlinear activation functions for image annotation. The SVM technique is applied at the output layer to classify images and extract a semantic level from the visual information of BoW (Bag-of-Words) representations. The method was evaluated on the ImageCLEF dataset. However, the deep multi-layer network has a fixed number of layers, so the image classification is limited (Jiu & Sahbi, 2017).
In 2017, Allani Olfa proposed SemVisIR, a pattern-based image retrieval system that combines semantic and visual features. The image dataset is organized in a graph of patterns that are automatically built for different domains by clustering algorithms. SemVisIR models the visual aspects of images through graphs of regions and assigns them to automatically built ontology modules for each domain. The system was implemented and evaluated on ImageCLEF. Its performance is not high compared to previous methods because the semantics of images are retrieved directly on the ontology (Allani et al., 2017).
Cong Bai proposed a method for semantic-based image retrieval that integrates a visual model into the BoW model to reduce the semantic gap. This integration brings the proposal closer to the physiological characteristics of the human visual system. Images are segmented into foreground objects and background areas using visual segmentation, and different features are extracted from the merged objects and the background areas, which increases the ability of the features to express the semantic content of the image. The query system was evaluated on two popular datasets, COREL 5K and VOC 2006, to demonstrate its efficiency. However, the computational complexity of the segmentation is still high, and segmentation quality has a great influence on query performance (Bai et al., 2018).
Barz, B., & Denzler, J. proposed a method to integrate semantic relationships between classes, given as a class hierarchy, into deep neural networks. The hierarchy-based semantics preserve the semantic similarity of the classes in the common embedding space of images and classes. Hence, it allows retrieving images from a database that are not only visually but also semantically similar to a given query image. The proposed method, tested on CIFAR-100, significantly improved the semantic consistency of the image retrieval results (Barz & Denzler, 2020).
Recent approaches have focused on methods for mapping low-level features to semantic concepts using supervised or unsupervised machine learning techniques; on building data models such as graphs, trees, or deep learning networks to store the low-level content of images; and on building ontologies to define high-level concepts. Therefore, semantic-based image retrieval (SBIR) is a topic that is receiving increasing attention.

3. MODEL COMBINING C-TREE AND NEIGHBOR GRAPH


3.1. Improving node splitting on C-Tree
The C-Tree (Nhi et al., 2020) consists of a root node, a set of inner nodes I, and a set of leaf nodes L. Nodes are connected through parent-child paths. There are two types of elements stored in the tree: the Element-Data ED, stored in leaf nodes, and the Element-Center EC, stored in inner nodes. A data element ED contains the following components: index, the position of the ED in the storage file; f, the feature vector of an image; ID, the identifier of the image; file, the file containing the image caption; and cla, the classes of the image. Thus, a data element is ED = ⟨index, f, ID, file, cla⟩. A center element EC contains the following components: index, the position of the EC in the storage file; fc, the center vector of the feature vectors of the child node linked to the EC via path; and isNextLeaf, a flag indicating whether that child node is a leaf. Thus, a center element is EC = ⟨index, fc, isNextLeaf, path⟩.
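Since the system is implemented in C# (Section 5.2.1), the two element types can be sketched as follows; the class and field types are our assumptions, with the components taken from the definitions above.

using System.Collections.Generic;

public class ElementData                  // ED = <index, f, ID, file, cla>
{
    public long Index;                    // position of the ED in the storage file
    public double[] F;                    // feature vector of the image
    public string ID;                     // image identifier
    public string File;                   // file containing the image caption
    public List<string> Cla;              // classes of the image
}

public class ElementCenter                // EC = <index, fc, isNextLeaf, path>
{
    public long Index;                    // position of the EC in the storage file
    public double[] Fc;                   // center of the child node's feature vectors
    public bool IsNextLeaf;               // whether the linked child node is a leaf
    public string Path;                   // link to the child node
}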

The C-Tree includes leaf nodes L, which are nodes without children, containing data elements ED, so that L = {ED1, ED2, ..., EDn}, 1 ≤ n ≤ M, in which M is the maximum number of elements in a leaf node, and inner nodes I, which are nodes with at least two children, containing center elements EC, so that I = {EC1, EC2, ..., ECm}, 1 ≤ m ≤ N, in which N is the maximum number of elements in an inner node. Each element of an inner node links to its child node via its path, and all leaf nodes have the same height (balance condition). When a leaf node or an inner node of the C-Tree is full, that is, its number of elements exceeds the threshold M or N respectively, the node is split based on the K-Means algorithm. Figure 1 describes the process of splitting a node in a C-Tree.

Figure 1. Splitting a node on the C-Tree

The process of splitting a node in the tree is implemented as follows:

• Choose k = 2 and take the two elements farthest from each other according to the Euclidean measure. First, define the node center as the mean of the feature vectors of the node to be split; select an element EDi (for a leaf node) or ECi (for an inner node) farthest from this center as the center of the first new cluster NL; then select the element EDj (or ECj) farthest from EDi (or ECi) as the center of the second new cluster NR.
• If the split node is the root, create a new root. If it is an inner node or a leaf node, create a new parent node (if none exists) or add the two new center elements to the existing parent node.
• Distribute the elements of the original node into the two new nodes by assigning each element to the closer new node according to Euclidean distance.
• Delete the old center element of the split node and delete the text file of the split node.
• Update the center element at the parent node and recurse up to the root.
Therefore, by selecting different maximum numbers of elements for inner and leaf nodes and fixing the number of split nodes at k = 2, the C-Tree training process has lower complexity and computational cost, and its efficiency is improved.
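As an illustration, the following C# sketch implements the split rule just described for the feature vectors of one over-full node; the helper names are ours, and only the seeding and distribution logic is shown, under the assumption that vectors all have the same dimension.

using System;
using System.Collections.Generic;
using System.Linq;

static double Euclidean(double[] a, double[] b)
{
    double s = 0;
    for (int i = 0; i < a.Length; i++) { double d = a[i] - b[i]; s += d * d; }
    return Math.Sqrt(s);
}

// Split one over-full node (k = 2) by farthest-pair seeding, as described above.
static (List<double[]> NL, List<double[]> NR) SplitNode(List<double[]> vectors)
{
    // Node center = mean of the feature vectors of the node being split.
    int dim = vectors[0].Length;
    double[] center = new double[dim];
    foreach (double[] v in vectors)
        for (int i = 0; i < dim; i++) center[i] += v[i] / vectors.Count;

    // First seed: the element farthest from the node center.
    double[] seedL = vectors.OrderByDescending(v => Euclidean(v, center)).First();
    // Second seed: the element farthest from the first seed.
    double[] seedR = vectors.OrderByDescending(v => Euclidean(v, seedL)).First();

    // Distribute every element to the closer of the two new clusters.
    var nl = new List<double[]>();
    var nr = new List<double[]>();
    foreach (double[] v in vectors)
        (Euclidean(v, seedL) <= Euclidean(v, seedR) ? nl : nr).Add(v);
    return (nl, nr);
}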
3.2. Neighbor graph
The Graph-CTree neighbor graph is created during the node-splitting process when training a C-Tree. Each time a leaf node is split, the neighboring levels of the newly split leaf nodes are marked. The clustering problem on the tree is thus reduced to a graph clustering problem whose main task is to connect related clusters to avoid missing similar image data during search. Figure 2 describes the structure of the Graph-CTree neighbor clustering graph.

Figure 2. Structure of Graph-CTree neighbor clustering graph

The Graph-CTree neighbor clustering graph is described as follows: Graph-CTree G = (V, E) is an undirected graph, in which:
• The set of vertices V consists of the leaf-node clusters of the C-Tree;
• The set of edges E ⊆ V × V consists of the links between pairs of leaf nodes that stand in a neighboring-level relationship;
• Neighboring levels of clusters (a C# sketch of the three tests follows this list):
o Neighboring level 1: let fc1, fc2 be the center feature vectors of the clusters (nodes) L1, L2. If the Euclidean distance between the two cluster centers satisfies dE(fc1, fc2) ≤ θ, where θ is a threshold value, then L1, L2 are neighbors at level 1.
o Neighboring level 2: let C1, C2 be the representative classes of the clusters (nodes) L1, L2, respectively, where the representative class is the most frequent class in a cluster. If C1 = C2, then L1, L2 are neighbors at level 2.
o Neighboring level 3: let C1, C2 be the representative classes of the clusters (nodes) L1, L2, and let S(C1), S(C2) be the sets of images of C1, C2, respectively. If S(C1) ⊂ S(C2) or S(C2) ⊂ S(C1), then L1, L2 are neighbors at level 3.
After the neighboring levels of a leaf node are determined, each level is stored in a file named Neighbor[i]Leaf[j].txt, in which i is the neighboring level and j is the index of the leaf node; if a level has no neighbors, no file is created. Figure 3 shows the neighbor files of the leaf node Leaf37: it has neighbors at levels 1 and 2 but none at level 3, so no level-3 file is created. The level-1 neighbors of Leaf37 are the leaf nodes Leaf17 and Leaf38; at the same time, Leaf17 is also a level-2 neighbor of this leaf cluster. Figure 4 shows the neighbor clustering graph built from the Leaf37 node.

Figure 3. An example of a neighbor file

Figure 4. Neighbor clustering graph of leaf node Leaf37

Thus, the structure of Graph-CTree still follows the construction rules of the C-Tree, while the node-splitting process is improved: (1) each inner node has a maximum of N elements and each leaf node a maximum of M elements; when a node exceeds its maximum number of elements, it is split into 2 new nodes, so the C-Tree training process is simpler and faster; (2) a neighbor graph, Graph-CTree, is created from the leaf clusters during node splitting on the C-Tree. Searching the Graph-CTree neighbor graph limits the omission of similar data on the C-Tree because it can reach the relevant leaf clusters, so retrieval accuracy is enhanced.

4. BUILDING AN ONTOLOGY FRAMEWORK


4.1. Building a framework for ontology for the image dataset
To query images with a semantic approach, a framework for an ontology for the dataset is built on the RDF triple language. The proposed ontology framework uses the ImageCLEF dataset (Escalante et al., 2010), a multi-object image dataset rich in subject domains. It is capable of scaling the domain, inheriting existing ontology systems for individual images, and describing the relationships of an image with its classes. Figure 5 shows the proposed model for building the image ontology.

Figure 5. Model for building the framework for ontology from the ImageCLEF image dataset

The proposed model consists of the following steps:


Step 1: Inherit the list of classes (Escalante et al., 2010) from the ImageCLEF dataset, then add to it and create a hierarchy of classes (1);
Step 2: Add captions and semantic descriptions for the images (2);
Step 3: Render URI identifiers and descriptions (4) and image annotations taken from the WWW (5);
Step 4: Create the ontology from the information of the image dataset (3) and the WWW (6).
Each image is an individual/instance of one or more classes in the ontology. Figure 6 shows an example of a framework for a visual ontology built in Protégé for the ImageCLEF dataset. The output of the ontology built in Protégé is a data model in the RDF/XML triple language, depicted in Figure 7. Each attribute is stored as a triple, which creates a large number of triples. However, the image data is enormous and is added to regularly to improve and enrich the image ontology, and the ontology from Protégé does not manage such large knowledge bases effectively. Therefore, we propose to build a framework for a semi-automatic ontology that is regularly updated, can scale the domain, and can inherit existing ontology systems.
4.2. Creating a semi-automatic ontology
To construct a framework for a semi-automatic ontology, it is first necessary to prepare sample data for classes, the class hierarchy, and attributes; to prepare literals for class and image instances; and to prepare class definitions for the underlying ImageCLEF dataset. These sample data are automatically obtained from sources such as WWW image collections. After that, experts edit and update the data before the ontology is created automatically by software built on the dotNET Framework 4.8 platform in the C# programming language. RDF and OWL libraries in C# are used to model the RDF/OWL data.

Figure 6. An example of a visual ontology on Protégé

Figure 7. Ontology appears in RDF/XML triple language

Figure 8. Model of creating the framework for semi-automatic ontology

Figure 8 shows the proposed model for creating the framework for the semi-automatic ontology. The process includes the following steps:
Step 1: Define the image dataset used to create the ontology framework: inherit the list of classes from the ImageCLEF dataset, then add to it and create a hierarchy of classes (1);
Step 2: Get images from the WWW, render resource identifiers, and describe the images (2). Then, take the image identifiers and URL descriptions (3) and add this input data (4) to the ontology learning process;
Step 3: Create sample data for the elements of the framework ontology (5), including:
• Automatic phase: create classes inherited from the image dataset; create literals for individuals (images) and classes; create individuals according to each class of images.
• Manual phase: create a hierarchy from the automatically generated classes; create properties for classes and images and relationships between images and classes; combine this with information such as URI resource identifiers taken from WWW sources like WordNet, BabelNet, Wikipedia, DBpedia, etc. to create the dictionary. In this phase, experts edit and update the data before the framework ontology is created.
Step 4: Based on the updated data samples, create the framework for the ontology with software written in the C# language (6).
The ontology is stored in the Notation3 (N3) syntax. N3 is a serialization of RDF models that is not based on XML; it is designed to be compact and easier to read than the RDF/XML notation. Tim Berners-Lee and others from the Semantic Web community developed the format in 2008. The N3 data model supports the semantic modeling of RDF data.
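As a small illustration of this storage step, the following C# sketch writes a class assertion and a caption for one image as N3 triples; the ex: prefix, the property choice, and the helper name are illustrative assumptions, not the ontology's actual vocabulary.

using System.IO;
using System.Text;

static string ImageToN3(string imageId, string className, string caption)
{
    var sb = new StringBuilder();
    sb.AppendLine("@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .");
    sb.AppendLine("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .");
    sb.AppendLine("@prefix ex: <http://example.org/imageclef#> .");  // assumed base URI
    sb.AppendLine("ex:" + className + " rdf:type rdfs:Class .");     // class triple
    sb.AppendLine("ex:img" + imageId + " rdf:type ex:" + className + " ;");
    sb.AppendLine("    rdfs:comment \"" + caption + "\" .");         // caption literal
    return sb.ToString();
}

// Usage: File.AppendAllText("ontology.n3", ImageToN3("15372", "Dog", "a dog on grass"));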
Figure 9 shows an example of the ontology according to the N3 language. Figure 10 represents a
synonym dictionary that defines the classification vocabulary and properties to extend the semantics of
the ontology extracted from WordNet.
Each classification term in the dictionary carries information such as a representative image, meta-data, a description, and the concepts of that term. Thus, the framework for the semi-automatic ontology is built from the ImageCLEF dataset, creating the foundation for adding further image sets and for semantic-based image retrieval.

Figure 9. Ontology in N3 format

Figure 10. An example of a conceptual vocabulary in the dictionary

4.3. The SPARQL query on a framework for semi-automatic ontology
A SPARQL query is applied on the N3 store for semantic-based image retrieval on the ontology. An input query image can contain one or more objects. The system extracts the low-level features of the image, then searches the combined model of a C-Tree and a neighbor graph (Graph-CTree) for a set of similar images. The k-NN algorithm is used to classify the image, and the classes representing the query image are stored in a visual word vector. The SPARQL query is automatically generated from the image classes of the visual word vector and executed on the ontology. The result of this query is a set of similar images based on semantics and a set of semantic descriptions of the images, including meta-data, URI identifiers, and semantic concepts mapped from the synonym dictionary of the ontology. The SPARQL query is automatically generated from the visual word vector according to the following algorithm:
Algorithm CSP: automatically generate SPARQL queries
Input: visual word vector W
Output: SPARQL query
Function CSP(W)
Begin
    n = W.count;
    SPARQL = "SELECT DISTINCT ?Img WHERE {";
    for i = 0 .. n - 1 do
        SPARQL += "{ " + W(i) + " rdf:type ?Img }";
        if i < n - 1 then SPARQL += " UNION ";   // "UNION" for an OR query, "." for an AND query
    EndFor;
    SPARQL += "}";
    Return SPARQL;
End.
The input of the algorithm is the visual word vector W, and the output is a SPARQL query to execute on the ontology. The algorithm counts the number of words in the visual word vector and builds the SPARQL query from those words.
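A C# rendering of the CSP algorithm might look as follows; the triple-pattern shape mirrors the pseudocode above, and whether UNION or "." (AND) joins the patterns is chosen by the caller. The words are assumed to already be valid ontology terms.

using System.Collections.Generic;
using System.Text;

static string CreateSparql(IList<string> w, bool useUnion)
{
    var sb = new StringBuilder("SELECT DISTINCT ?Img WHERE { ");
    for (int i = 0; i < w.Count; i++)
    {
        sb.Append("{ " + w[i] + " rdf:type ?Img } ");       // one pattern per visual word
        if (i < w.Count - 1)
            sb.Append(useUnion ? "UNION " : ". ");          // OR vs AND semantics
    }
    sb.Append("}");
    return sb.ToString();
}

// Example (UNION form): SELECT DISTINCT ?Img WHERE { { w1 rdf:type ?Img } UNION { w2 rdf:type ?Img } }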
Figure 11 shows an example of the SPARQL query generated from the visual word vector of the input image 15372.jpg of the ImageCLEF dataset. The SPARQL query follows the W3C standard and can be executed in two forms: a UNION query or an AND query.

Figure 11. An example of creating a SPARQL query from a visual word vector

5. THE SEMANTIC-BASED IMAGE RETRIEVAL
5.1. A model of the semantic-based image retrieval based on Graph-CTree
The semantic-based image retrieval system built on the Graph-CTree model and the ontology is called SBIR_grCT. The query system consists of two phases: (1) the pre-processing phase extracts the features of the images in the dataset, segments the images to create conceptual classes, organizes the storage in the C-Tree, and generates a neighbor clustering graph of the leaf-node clusters; at the same time, it builds a framework for a semi-automatic ontology based on the RDF triple language and stores it in N3 format; (2) the query phase searches for a set of similar images on the Graph-CTree model, classifies the image with the k-NN algorithm to find the visual word vector, and automatically creates the SPARQL query, which is executed on the ontology to render the high-level semantics of the image.

Figure 12. Model of the semantic-based image retrieval system SBIR_grCT

Figure 12 shows the architecture of the semantic-based image retrieval system SBIR_grCT based on the
Graph-CTree model with two specific phases as follows:
Phase One: pre-processing phase:
Step 1: From the image dataset, segment the images and extract low-level features (1);
Step 2: Create the data samples (2), each consisting of a feature vector and an image class obtained from the segmentation and feature extraction;
Step 3: Create the model combining the C-Tree and the neighbor graph of leaf nodes (Graph-CTree) from the data samples (3);
Step 4: Build a framework for a semi-automatic ontology from the WWW (4) and the ImageCLEF dataset (5).
Phase Two: image retrieval phase:
Step 1: Extract the feature vector of each input query image (6);
Step 2: Perform the query (7) on the Graph-CTree: first, use the Euclidean measure to find the most suitable leaf node on the C-Tree, then take the set of nodes neighboring this leaf node on the neighbor graph (8). The result is a set of similar images ranked by the similarity measure (9);
Step 3: Classify the image from the similar image set with the k-NN algorithm (10) to create a visual word vector (11) (steps 2 and 3 are sketched in code after this list);
Step 4: Automatically create the SPARQL query from the visual word vector (12) and execute the query on the ontology framework (13);
Step 5: The result of the semantic-based image retrieval on the ontology (14) is meta-data, URIs (15), a set of similar images, and their semantics (16).
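Steps 2 and 3 of the retrieval phase can be sketched in C# as follows, reusing ElementData and Euclidean from the earlier sketches; the leaf and neighbor collections stand in for the tree and graph lookups, which are not shown here.

using System.Collections.Generic;
using System.Linq;

// Rank the candidate images of the best leaf and its graph neighbors by
// distance to the query, then k-NN-vote a visual word vector of class names.
static List<string> VisualWordVector(double[] query,
                                     List<ElementData> bestLeaf,
                                     IEnumerable<List<ElementData>> neighborLeaves,
                                     int k)
{
    var candidates = new List<ElementData>(bestLeaf);
    foreach (var leaf in neighborLeaves) candidates.AddRange(leaf);

    return candidates
        .OrderBy(e => Euclidean(e.F, query))   // step 2: rank by similarity
        .Take(k)                               // step 3: k nearest neighbors
        .SelectMany(e => e.Cla)                // collect their class labels
        .GroupBy(c => c)
        .OrderByDescending(g => g.Count())     // most frequent classes first
        .Select(g => g.Key)
        .ToList();
}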

5.2. Experiment
5.2.1. Experimental environment
In the experiment, the SBIR_grCT query system is built on the dotNET Framework 4.8 platform with the C# programming language. The graphs are plotted in Matlab 2015. The SBIR_grCT query system runs on a computer with an Intel(R) Core(TM) i7-8750H 2.70GHz CPU, 8GB RAM, and the Windows 10 Professional operating system. The image datasets used in the experiments are COREL, WANG, ImageCLEF, and Stanford Dogs, as described in Table 1. ImageCLEF is a multi-object image dataset used both to build the framework ontology for semantic-based image retrieval and in the experiments on content-based image retrieval (CBIR) on the C-Tree and Graph-CTree. In addition, single-object image datasets, in which only the main object of the image is considered, such as COREL, WANG, and Stanford Dogs, are also used to evaluate the accuracy of CBIR on the C-Tree and Graph-CTree.

Table 1. Information about the image datasets


No.  Dataset        Number of images  Number of subjects  Size
1    COREL          1,000             10                  30.3 MB
2    WANG           10,800            80                  62.2 MB
3    ImageCLEF      20,000            276                 1.64 GB
4    Stanford Dogs  20,580            120                 778 MB
5.2.2. Experimental application
The image retrieval system SBIR_grCT performs two processes: (1) content-based image retrieval on the C-Tree and Graph-CTree, and (2) semantic-based image retrieval on the ontology. For an input image, the SBIR_grCT system first extracts the feature vector and performs content-based search on the C-Tree and Graph-CTree, respectively, to find a set of similar images. Figure 13 shows a content-based retrieval result of the SBIR_grCT query system.

Figure 13. A result of finding the content-based image of the query system SBIR_grCT

Then, from the set of content-based similar images, the SBIR_grCT query system performs image classification on the C-Tree or Graph-CTree to find the visual word vector. The SPARQL query is automatically generated from the visual word vector to retrieve semantically similar images on the ontology. A UNION or AND query is created depending on the purpose of the user.

Figure 14. A result of the semantic-based image retrieval of the query system SBIR_grCT

The query result on the ontology framework is a set of semantically similar images, including a list of similar images retrieved on the ontology by their WWW addresses or their names on the server. Figure 14 shows a result of semantic-based image retrieval and classification of the SBIR_grCT query system on the ontology. Each image in the set of similar images is semantically described with meta-data for its caption and its URIs. At the same time, the classes obtained in the classification of the similar image set are mapped to the synonym dictionary of the ontology to extract general concepts, equivalent keywords, the representative image of each class, and its meta-data. Figure 15 is an example of the concept of a class in the dictionary of the ontology.

Figure 15. The semantic concepts for class

5.2.3. Experimental results and evaluation


In this paper, the proposal has been implemented as the SBIR_grCT system. Building the improved C-Tree with the proposed node-splitting improvement makes training faster than for the previously built C-Tree. Table 2 compares the training times on the image datasets for the C-Tree and the improved C-Tree; the results show the effectiveness of the improved node splitting. The paper uses precision, recall, F-measure, and query time (milliseconds) to evaluate retrieval efficiency. The average query times and performance values for COREL, WANG, ImageCLEF, and Stanford Dogs on the improved C-Tree and Graph-CTree are presented in Table 3, Table 4, Table 5, and Table 6.
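For reference, these measures follow their standard definitions, recalled here for the reader; they are not specific to this paper:

\[
\text{Precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}, \quad
\text{Recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}, \quad
F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]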

Table 2. Training times for image datasets

Data structure     Training time (s): COREL   WANG     ImageCLEF  Stanford Dogs
C-Tree             6.168   18.720   44.871   45.080
Improved C-Tree    5.142   15.480   31.618   32.560

Table 3. Performance of SBIR_grCT on the COREL dataset


Performance Avg. precision Avg. recall Avg. F-measure Avg. query time (ms)
Improved C-Tree 0.677655 0.699885 0.688521 19.91437
Graph-CTree 0.888473 0.884555 0.886346 133.51876

Table 4. Performance of SBIR_grCT on the WANG dataset


Performance Avg. precision Avg. recall Avg. F-measure Avg. query time (ms)
Improved C-Tree 0.607222 0.489184 0.545011 39.746901
Graph-CTree 0.7664731 0.667659 0.713353 202.793863

Table 5. Performance of SBIR_grCT on the ImageCLEF dataset


Performance Avg. precision Avg. recall Avg. F-measure Avg. query time (ms)
Improved C-Tree 0.606217 0.409401 0.474718 44.08912
Graph-CTree 0.839814365 0.780735832 0.806435717 239.285867

Table 6. Performance of SBIR_grCT on the Stanford Dogs dataset


Performance Avg. precision Avg. recall Avg. F-measure Avg. query time (ms)
Improved C-Tree 0.570385 0.37308 0.450394 48.30427
Graph-CTree 0.826416 0.513825 0.630859 296.9064
The tables above show that the model combining a C-Tree and a neighbor clustering graph yields better query precision on the COREL, WANG, ImageCLEF, and Stanford Dogs datasets. Searching the neighbor graph limits the omission of similar data on the C-Tree because it can reach the relevant leaf clusters; therefore, querying on the Graph-CTree is much more accurate than on the C-Tree. However, the query time on the Graph-CTree is higher than on the improved C-Tree because the neighbor graph requires additional work to track and visit all clusters of neighboring nodes.
In addition, Receiver Operating Characteristic (ROC) curves were plotted to evaluate the query results. The area under the ROC curve (AUC, Area Under the Curve) is a measure of query accuracy: the larger the area, the higher the accuracy. Combining precision and recall creates another metric, the Precision-Recall (PR) curve, to evaluate the effectiveness of the image query system; the AUC of the PR curve is interpreted like that of the ROC curve, that is, the larger the area under the curve, the higher the accuracy. Based on the experimental data, Precision-Recall and ROC curves were plotted to evaluate the performance of the SBIR_grCT query system on the C-Tree and Graph-CTree for the COREL, WANG, ImageCLEF, and Stanford Dogs image sets (Figure 16, Figure 17, Figure 18, Figure 19).
In a PR curve plot, each curve corresponds to one image folder of the image set. The PR plots show that the area under the curve for Graph-CTree is larger than for C-Tree, indicating that our proposed improvements give better precision. In an ROC plot, a baseline diagonal divides the space into two parts: points above the diagonal represent correct classification results, points below it represent incorrect ones, and points farther from the baseline correspond to better classification results than points near it. In our ROC plots, all points lie above the baseline, showing that the image classification results are good. The plots also show that the points of Graph-CTree are farther from the baseline than those of C-Tree for all image sets; consequently, Graph-CTree's image classification results are better than C-Tree's, confirming that the proposed improvements are effective and correct. At the same time, we use precision and semantic classification time to evaluate the image classification efficiency of the SBIR_grCT retrieval system based on the C-Tree and Graph-CTree with the k-NN algorithm; the more accurate the image classification, the more correct the query results on the ontology. Table 7 summarizes the precision and classification time on the C-Tree and Graph-CTree for the COREL, WANG, ImageCLEF, and Stanford Dogs datasets, and shows that the classification accuracy of Graph-CTree is better than that of C-Tree. Therefore, our proposed improvement of the C-Tree enhances image classification accuracy, which in turn improves the accuracy of queries on the ontology.

Figure 16. Precision-Recall curve and ROC curve plots on C-Tree and Graph-CTree (COREL dataset)

Figure 17. Precision-Recall curve and ROC curve on C-Tree and Graph-CTree (WANG)

Figure 18. Precision-Recall curve and ROC curve on C-Tree and Graph-CTree (ImageCLEF)

Figure 19. Precision-Recall curve and ROC curve on C-Tree and Graph-CTree (StanfordDogs)

To evaluate the accuracy and efficiency of the SBIR_grCT query system, we compared the obtained performance with other studies on the same image datasets. Table 8 shows the Mean Average Precision (MAP) comparison on the COREL set (1,000 images) between our proposed method and the following: (1) the method of Huneiti (Huneiti & Daoud, 2015), a content-based image query method that extracts color and texture feature vectors using the Discrete Wavelet Transform (DWT) and Self-Organizing Maps (SOM); (2) the Fuzzy-NN method of Garg (Garg et al., 2019), which combines Neuro-Fuzzy and Deep Neural Network techniques to effectively classify and query a given set of images; and (3) the Fused Information Feature-based Image Retrieval System (FIF-IRS) of Bella M. I. T. (Bella & Vasuki, 2019), which uses 8-Directional Gray Level Co-occurrence Matrix (8D-GLCM) and HSV Color Moments (HSVCM) features. Table 8 shows that our proposed method gives better performance on the COREL image set than the works above, which proves that our proposal is effective for this set.

Table 8. Comparison of mean average precision (MAP) of methods on COREL dataset


Methods Mean Average Precision (MAP)
A. Huneiti (Huneiti & Daoud, 2015) 0.559
Garg, M. (Garg et al., 2019) 0.602
Bella M. I. T. (Bella & Vasuki, 2019) 0.609
Graph-CTree 0.888473
Table 9 compares our proposed method with studies on the WANG image set (10,800 images), including: (1) the Signature-Based Bag of Visual Words (SBoVW) method with the Sorted Dominant Local Color (SDLC) extension, which uses the weight of each visual word in each image and selects the best parameter for it, by Dos Santos (Santos et al., 2015); (2) the SVM method of Das (Das et al., 2017), which classifies images using feature extraction with the Top-Hat morphological operation; and (3) the method of Chhabra (Chhabra et al., 2020), which uses a decision tree to classify images described by ORB (Oriented FAST and Rotated BRIEF) features. The results show that the MAP of our proposed method is higher than those of the mentioned methods; therefore, it is effective for the WANG image set.

Table 9. Comparison of mean average precision (MAP) of methods on WANG dataset


Methods Mean Average Precision (MAP)
Dos Santos (Santos et al., 2015) 0.570
Das (Das et al., 2017) 0.559
Chhabra (Chhabra et al., 2020) 0.577
Graph-CTree 0.766473

Table 10. Comparison of mean average precision (MAP) of methods on ImageCLEF dataset
Methods Mean Average Precision (MAP)
Vijayarajan (Vijayarajan et al., 2016) 0.4618
Cevikalp (Cevikalp et al., 2018) 0.4678
Seymour (Seymour & Zhang, 2018) 0.420
Graph-CTree 0.839814

Table 11. Comparison of mean average precision (MAP) of methods on StanfordDogs dataset
Methods Mean Average Precision (MAP)
Zhang (Zhang et al., 2016) 0.5083
Wang (Wang et al., 2018) 0.5349
Graph-CTree 0.826416

Table 10 shows the MAP comparison between our proposed method and methods studied on the ImageCLEF image set (20,000 images): (1) the team of Vijayarajan (Vijayarajan et al., 2016) constructed a domain ontology from image descriptions, from which a document retrieval system and an image retrieval system were built for semantic image retrieval; (2) the method of Hakan Cevikalp (Cevikalp et al., 2018) used Support Vector Machines (SVM) and a binary hierarchical tree to classify low-level image features extracted based on a graph-cut structure; (3) Seymour (Seymour & Zhang, 2018) introduced a method for constructing textual kernel matrices using word vectors, which improves tag retrieval results for both user-generated tags and expert labels, and used kernel canonical correlation analysis (KCCA) to learn a semantic space. These results show that our proposed method is effective for multi-subject image sets.
Table 11 shows the MAP comparison between our proposed method and research works on the Stanford Dogs image set (20,580 images): (1) the research team of Zhang (Zhang et al., 2016) proposed a fine-grained image categorization model in which a dense graph mining algorithm is developed to discover graphlets representative of each super-/sub-category; (2) the method of Wang (Wang et al., 2018) proposed a simple and effective feature aggregation method using generalized-mean pooling (GeM pooling), which makes better use of information from the output tensor of the convolutional layer. These results also show that our proposed method is effective for the Stanford Dogs image set. The data in the above tables show that the proposed method achieves higher MAP than other methods on the same image datasets. It can be seen that our proposed method effectively solves the problems of querying and semantic analysis for both single-object and multi-object images.

6. CONCLUSION
This paper proposes a method to improve the balanced clustering tree C-Tree built in our previous study in order to improve the efficiency of semantic-based image retrieval. First, the paper proposes an improvement of node splitting on the C-Tree: the number of nodes per split is fixed at k = 2, and the maximum numbers of elements for leaf and inner nodes are distinct. This improvement reduces the training time and complexity of the C-Tree. A model combining a C-Tree and a neighbor clustering graph, named Graph-CTree, is then proposed. It improves on the C-Tree by creating a neighbor clustering graph over the leaf nodes to avoid missing equivalent elements moved to other nodes or branches during node splitting. The Graph-CTree model is built with a semi-supervised learning technique that combines hierarchical and partitional clustering to map the low-level features of an image into high-level semantics. The k-NN algorithm is used to classify images based on the set of content-similar images and neighboring elements; the classification result is a list of class names for the image called the visual word vector. At the same time, we propose building a semi-automatic ontology for the image set to store the descriptive data of the image sets in the RDF triple language. SPARQL queries are automatically generated from the visual word vector for semantic retrieval of the input image on the semi-automatic ontology framework. On this basis, the semantic-based image retrieval system SBIR_grCT is built from the Graph-CTree and ontology models. The experiments were performed on the COREL (1,000 images), Wang (10,800 images), ImageCLEF (20,000 images), and Stanford Dogs (20,580 images) datasets. The average query accuracy on the Graph-CTree is higher than on the C-Tree, and the experimental performance is compared with other methods on the same datasets to evaluate the proposed model, method, and algorithms. The comparison results show that the SBIR_grCT query system is more accurate than other studies on the same image datasets, which shows that the proposals in this paper are effective and correct. In the future, we will build a semi-automatic ontology from WWW image collections; the ontology framework will then be extended and enriched with other image sets such as COREL, WANG, ImageCLEF, and Stanford Dogs. In addition, we will improve the construction and query algorithms on the C-Tree and Graph-CTree to further enhance the efficiency of semantic-based image retrieval.

ACKNOWLEDGMENT
The authors would like to thank the Faculty of Information Technology, University of Science - Hue
University for their professional advice for this study. We would also like to thank HCMC University of
Food Industry, University of Economics - The University of Danang, University of Education HCMC,
and research group SBIR_HCM, which are sponsors of this research. This work has been sponsored and
funded by Ho Chi Minh City University of Food Industry under Contract No. 147/HD-DCT. We also
thank anonymous reviewers for helpful comments on this article.

REFERENCES
Allani, O., Mellouli, N., Zghal, H. B., Akdag, H., & Ghzala, H. B. (2015). A Relevant Visual
Feature Selection Approach for Image Retrieval. In Proceedings of the International
Conference on Computer Vision Theory and Applications (Vol. 10, pp. 377-384). Berlin,
Germany.
Allani, O., Zghal, H. B., Mellouli, N., & Akdag, H. (2017). Pattern graph-based image retrieval
system combining semantic and visual features. Multimedia Tools and Applications,
76(19), 20287–20316.
Bai, C., Chen, J.-n., Huang, L., Kpalma, K., & Chen, S. (2018). Saliency-based multi-feature
modeling for semantic image retrieval. Journal of Visual Communication and Image
Representation, 50, 199-204.
Barz, B., & Denzler, J. (2020). Deep Learning on Small Datasets without Pre-Training using
Cosine Loss. In Proceedings of the IEEE Winter Conference on Applications of Computer
Vision (WACV) (pp. 1371-1380). Snowmass Village, CO, USA.
Bella, M. I. T., & Vasuki, A. (2019). An efficient image retrieval framework using fused
information feature. Computers & Electrical Engineering, 75, 46-60.
Cao, Y., Long, M., Wang, J., Yang, Q., & Yu, P. S. (2016). Deep Visual-Semantic Hashing for Cross-
Modal Retrieval. In Proceedings of the SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 1445-1454). San Francisco, California, USA.
Cevikalp, H., Elmas, M., & Ozkan, S. (2018). Large-scale image retrieval using transductive
support vector machines. Computer Vision and Image Understanding, 173, 2-12.
Chhabra, P., Garg, N. K., & Kumar, M. (2020). Content-based image retrieval system using
ORB and SIFT features. Neural Computing and Applications, 32, 2725–2733.
Das, R., Thepade, S., & Ghosh, S. (2017). Novel feature extraction technique for content-based
image recognition with query classification. International Journal of Computational Vision
and Robotics, 7(1-2), 123-147.
Erwin, Fachrurrozi, M., Fiqih, A., Saputra, B. R., Algani, R., & Primanita, A. (2017). Content
based image retrieval for multi-objects fruits recognition using k-means and k-nearest
neighbor. In Proceedings of the International Conference on Data and Software Engineering
(ICoDSE) (pp. 1-6). Palembang, Indonesia.
Escalante, H. J., Hernández, C. A., Gonzalez, J. A., López-López, A., Montes, M., Morales, E.
F., . . . Grubinger, M. (2010). The segmented and annotated IAPR TC-12 benchmark.
Computer Vision and Image Understanding, 114(4), 419-428.
Filali, J., Zghal, H., & Martinet, J. (2017). Towards visual vocabulary and ontology-based image
retrieval system. In Proceedings of the International Conference on Agents and Artificial
Intelligence (pp. 560-565). Rome, Italy.
Furche, T., Linse, B., Bry, F., Plexousakis, D., & Gottlob, G. (2006). RDF Querying: Language
Constructs and Evaluation Methods Compared. Reasoning Web: Lecture Notes in
Computer Science (Vol. 4126, pp. 1-52). Springer, Berlin, Heidelberg.

Garg, M., Singh, H., & Malhotra, M. (2019). Fuzzy-NN approach with statistical features for
description and classification of efficient image retrieval. Modern Physics Letters A,
34(3), 1-24.
Gombos, G., & Kiss, A. (2017). P-Spar(k)ql: SPARQL Evaluation Method on Spark GraphX with
Parallel Query Plan. In Proceedings of the IEEE 5th International Conference on Future
Internet of Things and Cloud (FiCloud) (pp. 212-219). Prague, Czech Republic.
Haase, P., Broekstra, J., Eberhart, A., & Volz, R. (2004). A Comparison of RDF Query
Languages. International Semantic Web Conference: Lecture Notes in Computer
Science (Vol. 3298, pp. 502-517). Springer, Berlin, Heidelberg.
Hooi, Y. K., Hassan, M. F., & Shariff, A. M. (2014). A Survey on Ontology Mapping Techniques.
Advances in Computer Science and its Applications: Lecture Notes in Electrical
Engineering (Vol. 279, pp. 829-836). Springer, Berlin, Heidelberg.
Huneiti, A., & Daoud, M. (2015). Content-based image retrieval using SOM and DWT. Journal of
Software Engineering and Applications, 8(2), 51-61.
Jiu, M., & Sahbi, H. (2017). Nonlinear Deep Kernel Learning for Image Annotation. IEEE
Transactions on Image Processing, 26(4), 1820-1832.
Liu, Y., Zhang, D., & Lu, G. (2008). Region-based image retrieval with high-level semantics
using decision tree learning. Pattern Recognition, 41(8), 2554-2570.
Nhi, N. T. U., Van, T. T., & Thanh, L. M. (2020). A self-balanced clustering tree for semantic-
based image retrieval. Journal of Computer Science and Cybernetics, 36(1), 49-67.
Pandey, S., Khanna, P., & Yokota, H. (2016). A semantics and image retrieval system for
hierarchical image databases. Information Processing and Management, 52(4), 571-591.
Santos, J. M. d., Moura, E. S. d., Silva, A. S. d., Cavalcanti, J. M. B., Torres, R. d. S., & Vidal,
M. L. A. (2015). A signature-based bag of visual words method for image indexing and
search. Pattern Recognition Letters, 65, 1-7.
Sarwara, S., Qayyum, Z. U., & Majeed, S. (2013). Ontology based Image Retrieval Framework
Using Qualitative Semantic Image Descriptions. Procedia Computer Science, 22, 285-
294.
Sethi, I. K., Coman, I. L., & Stan, D. (2001). Mining association rules between low-level image
features and high-level concepts. Data Mining and Knowledge Discovery: Theory, Tools,
and Technology, III, 279-290.
Seymour, Z., & Zhang, Z. M. (2018). Image Annotation Retrieval with Text-Domain Label
Denoising. In Proceedings of the International Conference on Multimedia Retrieval (pp. 240-
248).
Vijayarajan, V., Dinakaran, M., Tejaswin, P., & Lohani, M. (2016). A generic framework for
ontology-based information retrieval and image retrieval in web data. Human-centric
Computing and Information Sciences, 6(18), 1-30.
Wang, Z., Li, Z., Sun, J., & Xu, Y. (2018). Selective Convolutional Features based Generalized-
mean Pooling for Fine-grained Image Retrieval. In Proceedings of Visual Communications
and Image Processing (VCIP) (pp. 1-4). Taichung, Taiwan.
Zhang, L., Yang, Y., Wang, M., Hong, R., Nie, L., & Li, X. (2016). Detecting Densely Distributed
Graph Patterns for Fine-Grained Image Categorization. IEEE Transactions on Image
Processing, 25(2), 553-565.
Zhong, B., Li, H., Luo, H., & Zhou, J. (2020). Ontology-Based Semantic Modeling of Knowledge
in Construction: Classification and Identification of Hazards Implied in Images. Journal of
Construction Engineering and Management, 146(4), e04020013.

