
Pattern Recognition Letters 30 (2009) 789–793


A fusion neural network classifier for image classification


Sanggil Kang a, Sungjoon Park b,*
a Department of Computer Science and Information Engineering, INHA University, 253 Younghyun-dong, Nam-gu, Incheon 402-751, Republic of Korea
b Department of Mobile Game, Kongju Communication and Arts College, Gongju, Chungnam 314-713, South Korea

Article history: Available online 20 June 2008

Keywords: Image classification; Fusion neural network classifier; Sensitivity; MPEG-7 descriptor

Abstract

Neural networks have commonly been used for image classification by fusing input features extracted from multiple MPEG-7 descriptors, because fused features provide better performance than features extracted from a single descriptor. However, the input feature dimension varies across MPEG-7 descriptors, and input features with a large dimension usually dominate those with a small dimension in generating the outputs of the neural network, even when their contributions to the output should be almost the same. To solve this problem, we propose a fusion neural network classifier that divides each descriptor's inputs by the number of its input features and takes the importance of the input features within each descriptor into account during training. In the experimental section, we analyze our method and compare its sports image classification performance against a conventional neural network classifier, using six classes of sports images collected from the Internet.

© 2008 Elsevier B.V. All rights reserved.

This work was supported by INHA University Research Grant (INHA-35035-01).
* Corresponding author. Tel.: +82 32 860 8377; fax: +82 32 874 1435.
E-mail addresses: sgkang@inha.ac.kr (S. Kang), sjpark@kcac.ac.kr (S. Park).
0167-8655/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2008.06.009

1. Introduction

Recently, rich multimedia data from many different sources has accumulated on the Internet, and there is an enormous quantity of image data on the World Wide Web, so finding a user's preferred images requires long search times. An appropriate image classification method can solve this problem to some extent.

Neural networks have been used to improve image classification because of their 'black-box' learning property. The input features for image classification have typically been retrieved using MPEG-7 descriptors such as color layout (CL), edge histogram (EH), and region-based shape (RS). In many image classification problems addressed with neural networks, fusing input features extracted from multiple descriptors gives better performance than features from a single descriptor. However, the dimension of the extracted input features varies across descriptors, and due to the 'black-box' training style (Haykin, 1999), a descriptor generating many input features can dominate one generating few input features in the outputs of the neural network classifier (NNC). In the light of expert opinion, however, classification performance can be improved if each descriptor contributes equivalently to the network output regardless of its input feature dimension. To take this into consideration, we introduce a fusion neural network classifier (FNNC) that makes the contribution of each input descriptor equivalent in the hidden layer: we divide each descriptor's inputs by the number of its input features, and we account for the importance of the input features within each descriptor while training the FNNC. The importance of each input is obtained using the input sensitivity, a well-known measure, and we update the input sensitivity dynamically during training because it usually changes as the epochs proceed. In Section 4, we show that our system outperforms the conventional NNC.

The rest of the paper is organized as follows: Section 2 discusses related work, Section 3 presents our method, and experimental results and conclusions are given in Sections 4 and 5, respectively.

2. Related works

MPEG-7 (ISO/IEC/JTC1/SC29/WG11/N6828, 2004; Martinez et al., 2002; ISO/IEC/JTC1/SC29/WG11/N4358, 2001), formally named the "multimedia content description interface", is a standard for describing multimedia content data. It provides low-level descriptors to identify, categorize, or filter images and video. The MPEG-7 standard specifies a set of descriptors, each defining the syntax and semantics of an elementary visual low-level feature, e.g., color, texture, or shape. As explained in (ISO/IEC/JTC1/SC29/WG11/N6828, 2004; Martinez et al., 2002; ISO/IEC/JTC1/SC29/WG11/N4358, 2001), the CL descriptor represents the spatial layout of color images in a very compact form based on generating a tiny (8 × 8) thumbnail of an image. The EH descriptor represents the spatial distribution of five types of edges (four directional and one non-directional). It consists of local histograms of these edge directions, which may optionally be aggregated into global or
semi-global histograms (Won et al., 2001). The homogeneous texture (HT) descriptor is designed to characterize the properties of texture in an image (or region), based on the assumption that the texture is homogeneous. Because of these characteristics, MPEG-7 descriptors have been used as input features for image classifiers modeled by machine learning techniques such as neural networks and fuzzy systems. Zhou and Huang (2001), Banerjee and Kundu (2003), and Jain and Vailaya (1996) used edge points of a scene for indexing images: Zhou and Huang proposed structural features extracted from edge maps for content-based image retrieval, and Jain and Vailaya proposed a technique for shape comparison using a directional histogram of edges.

Various neural network techniques have been employed for image classification problems because of their generalization ability. Park et al. (2004) proposed a method of content-based image classification using a neural network, extracting the object region with a region segmentation technique. Machado and Neveds (1996) introduced a multi-model neural network (MMNN) for interpreting satellite images of the Amazon region for deforestation monitoring and showed that the MMNN is suitable for applications requiring high performance. Spyrou et al. (2005) proposed three content-based image classification techniques that fuse various MPEG-7 descriptors: a merging fusion combined with an SVM classifier (Ye et al., 2004), a back-propagation fusion combined with a KNN classifier (Li et al., 2004), and a fuzzy-ART neurofuzzy network (Cowie et al., 2005). Kim et al. (2005, 2006) applied a single feed-forward neural network to adult image classification and multiple neural networks to sports image classification using multiple MPEG-7 features, comparing classification performance across combinations of features extracted from various MPEG-7 descriptors.

In the literature above, fusing multiple descriptors provides better performance than a single descriptor. However, these works did not consider the varying dimension of the input features extracted from different descriptors. To take this into consideration, we introduce a fusion neural network classifier, which is explained in detail in the following sections.

3. Proposed sports image classification method

3.1. Overall architecture of our system

Fig. 1 shows the overall architecture of our image classification system. The system is composed of two parts: a feature extraction module and a classification module. In the feature extraction module, the MPEG-7 XM engine extracts the features of an image in the XML description format; the engine provides various types of descriptors, such as CL, EH, HT, and RS. The parsing engine parses the raw descriptions to transform them into numerical values suitable for a neural network implementation, and the preprocess engine normalizes these numerical values to the 0–1 range. Normalizing the input features prevents features with a large numeric scale from dominating the NNC output over features with a small numeric scale when classifying sports images.

The classification module builds the NNC by training it from its initial state to classify images, taking into account the input feature dimensions produced by the feature extraction module. The algorithm is explained in detail in the next section.

3.2. Classification module

The dimension of the input features used for classifying images varies according to the descriptors; for example, the feature dimension of the EH descriptor is 80, while that of the RS descriptor is 35. According to opinions from MPEG-7 experts, the importance of these two descriptors for image classification problems is almost the same.

[Fig. 2. The conventional neural network classifier: inputs x_{11}, …, x_{1N_1} from descriptor 1 and x_{21}, …, x_{2N_2} from descriptor 2 are connected through weights w_{k,i} and w_{l,i} to hidden nodes 1, …, n, whose outputs y_{h,i} feed the output nodes y_1, …, y_s.]

However, in this case, the input features from the EH descriptor can dominate those from the RS descriptor if we use a conventional NNC, as seen in Fig. 2. The reason is as follows. The output of a node in the hidden layer can be expressed as

[Fig. 1. Overall architecture of our system: an image file enters the feature extraction module, where the MPEG-7 XM engine extracts color layout, edge histogram, homogeneous texture, and region-based shape descriptions, which are parsed and normalized; the classification module then applies the neural network classifier with input sensitivity.]



  y_{h,i} = f_h\left( \sum_{k=1}^{N_1} x_{1k} w_{k,i} + \sum_{l=1}^{N_2} x_{2l} w_{l,i} \right),   (1)

where N_1 and N_2 are the input dimensions of the feature sets X_1 and X_2, respectively, with N_1 > N_2. Here X_1 = [x_{11}, x_{12}, ..., x_{1k}, ..., x_{1N_1}] is extracted from descriptor 1 and X_2 = [x_{21}, x_{22}, ..., x_{2l}, ..., x_{2N_2}] from descriptor 2. f_h is the activation function in the hidden layer, x_{1k} and x_{2l} denote input k of descriptor 1 and input l of descriptor 2, and w_{k,i} and w_{l,i} are the corresponding weights connecting x_{1k} and x_{2l} to node i. As seen in Eq. (1), the contribution of input group X_1 to the hidden layer is larger than that of X_2 because N_1 > N_2, which can degrade training performance. To avoid this problem, we make the contribution of each input group in the hidden layer equivalent by dividing input group 1 by N_1 and input group 2 by N_2. Eq. (1) is thus modified to

  y_{h,i} = f_h\left( \frac{1}{N_1} \sum_{k=1}^{N_1} x_{1k} w_{k,i} + \frac{1}{N_2} \sum_{l=1}^{N_2} x_{2l} w_{l,i} \right).   (2)

However, the factors 1/N_1 and 1/N_2 in Eq. (2) are applied to all inputs of a feature group, regardless of the importance of each input within the group. To take this into consideration, we weight each input according to its importance, which can be obtained from the input sensitivity, a well-known measure of input importance. The input sensitivity is defined as the ratio of the variation of the hidden-layer output to the variation of the input:

  s_{x_{gk}} = \frac{1}{H} \sum_{i=1}^{H} \frac{\partial y_{h,i}}{\partial x_{gk}},   g = 1 or 2,   (3)
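A minimal Python sketch of Eqs. (1) and (2) may clarify the dimension-dominance problem. The variable names and toy dimensions are illustrative, and tanh stands in for f_h (Section 4.1 states a hyperbolic tangent sigmoid is used in the hidden layer):

```python
import math

def hidden_node_plain(x1, w1, x2, w2):
    """Eq. (1): hidden-node output without scaling -- the group with more
    inputs (here x1, dimension N1 > N2) dominates the weighted sum."""
    s = sum(x * w for x, w in zip(x1, w1)) + sum(x * w for x, w in zip(x2, w2))
    return math.tanh(s)  # f_h: hyperbolic tangent sigmoid

def hidden_node_scaled(x1, w1, x2, w2):
    """Eq. (2): each descriptor group's weighted sum is divided by its own
    dimension, equalizing the two groups' contributions."""
    s = (sum(x * w for x, w in zip(x1, w1)) / len(x1)
         + sum(x * w for x, w in zip(x2, w2)) / len(x2))
    return math.tanh(s)

# Toy dimensions mirroring the paper: N1 = 80 (EH), N2 = 35 (RS).
x1, w1 = [0.5] * 80, [0.1] * 80   # descriptor 1 inputs and weights
x2, w2 = [0.5] * 35, [0.1] * 35   # descriptor 2 inputs and weights
```

With identical per-input values and weights, Eq. (1) lets group 1 contribute 80/35 ≈ 2.3 times as much as group 2 to the pre-activation, whereas Eq. (2) gives both groups the same contribution.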

[Fig. 3. An example of a fusion neural network classifier: between the input and hidden layers, each input x_{gk} is weighted by its sensitivity s_{x_{gk}} before reaching the H hidden nodes, whose outputs y_{h,1}, …, y_{h,H} feed the output nodes y_1, …, y_s.]

[Fig. 4. Six classes of sports image data for training and test: soccer, swimming, skiing, field & track, baseball, and basketball.]

Table 1
The classification accuracies of input features extracted by the edge histogram (EH) and region-based shape (RS) descriptors, for training and test data at different epochs (%)

Epoch    Descriptor   Soccer   Swimming   Skiing   Field & track   Baseball   Basketball   Mean

(a) Training data
1000     EH           82.6     80.4       89.5     84.3            80.5       86.7         84.0
         RS           80.2     84.1       92.4     86.8            81.3       90.4         85.9
5000     EH           87.3     80.8       93.1     91.8            82.7       83.6         86.6
         RS           89.1     85.2       86.9     90.4            85.3       91.5         88.1
10,000   EH           89.5     87.3       90.4     95.1            89.5       90.1         90.3
         RS           90.2     88.8       90.7     90.5            85.3       92.7         89.7
20,000   EH           94.7     96.3       91.2     93.1            94.1       89.5         93.2
         RS           92.3     91.7       92.8     94.9            93.5       94.1         93.2

(b) Test data
1000     EH           73.2     80.1       79.5     68.6            64.8       79.4         74.3
         RS           75.3     76.7       76.5     72.9            70.1       79.3         74.8
5000     EH           80.2     85.3       79.8     73.1            70.5       76.6         77.6
         RS           79.6     82.4       80.2     75.2            72.3       81.5         78.5
10,000   EH           81.8     83.3       82.2     80.1            79.5       76.2         80.5
         RS           79.8     82.8       83.6     78.4            80.9       78.7         80.7
20,000   EH           80.8     82.3       80.6     75.3            72.7       75.3         77.8
         RS           81.5     79.5       80.5     77.2            78.6       80.1         79.6
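The overall accuracies quoted in Section 4.2 (88.5% and 89.2% for training, 77.6% and 78.4% for test) are consistent with averaging the epoch-wise "Mean" column of Table 1. A quick consistency check, under our assumption that "overall accuracy" means this average:

```python
# Epoch-wise mean accuracies from the "Mean" column of Table 1
# (epochs 1000, 5000, 10,000, 20,000).
eh_train = [84.0, 86.6, 90.3, 93.2]
rs_train = [85.9, 88.1, 89.7, 93.2]
eh_test  = [74.3, 77.6, 80.5, 77.8]
rs_test  = [74.8, 78.5, 80.7, 79.6]

def overall(means):
    """Average the per-epoch mean accuracies into one overall figure."""
    return sum(means) / len(means)
```

Here overall(eh_train) ≈ 88.5 and overall(rs_train) ≈ 89.2, matching the figures in Section 4.2.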

where s_{x_{gk}} is the input sensitivity of input x_{gk}, i.e., input k in group g. For convenience, the input sensitivity is normalized as

  \tilde{s}_{x_{gk}} = \frac{s_{x_{gk}}}{\sum_{k=1}^{N_g} s_{x_{gk}}},   g = 1 or 2,   (4)

where \tilde{s}_{x_{gk}} is the normalized sensitivity of input x_{gk}. Eq. (5) is the modified version of Eq. (2) obtained by applying the normalized input sensitivities:

  y_{h,i} = f_h\left( \frac{1}{N_1} \sum_{k=1}^{N_1} x_{1k} \tilde{s}_{x_{1k}} w_{k,i} + \frac{1}{N_2} \sum_{l=1}^{N_2} x_{2l} \tilde{s}_{x_{2l}} w_{l,i} \right).   (5)

Table 2
The classification accuracies of FNNC and NNC for training and test data according to different epochs (%)

              Soccer            Swimming          Skiing            Field & track     Baseball          Basketball
              Training  Test    Training  Test    Training  Test    Training  Test    Training  Test    Training  Test
(a) Epoch = 1000
Soccer NNC 91.6 79.7 1.3 4.2 1.7 2.2 2.5 8.1 2.4 4.3 0.5 1.5
FNNC 92.1 90.2 1.8 2.3 0.8 1.1 1.7 1.5 2.3 4.1 1.3 0.8
Swimming NNC 1.3 2.5 91.5 83.5 1.6 3.2 2.8 3.9 1.1 4.2 1.7 2.7
FNNC 2.1 2.1 90.2 86.1 1.8 3.7 2.1 4.2 2.5 2.6 1.3 1.3
Skiing NNC 0.8 2.4 1.4 5.3 95.4 85.2 1.3 4.2 0.8 2.1 0.3 0.8
FNNC 0.5 1.5 1.3 2.8 96.3 90.8 0.2 1.2 0.5 1.5 1.2 2.2
Field & track NNC 2.3 6.4 1.2 3.5 1.8 5.3 92.2 78.9 0.9 2.7 1.6 3.2
FNNC 0.8 3.4 1.3 2.3 1.2 1.2 95.7 89.4 0.8 2.6 0.2 1.1
Baseball NNC 0.9 7.2 0.5 2.4 2.6 4.8 0.8 5.3 93.6 79.1 1.6 1.2
FNNC 1.7 2.7 0.5 1.5 0.9 0.9 1.9 3.4 93.1 89.6 1.9 1.9
Basketball NNC 1.8 2.2 1.6 2.5 2.2 3.9 1.8 4.9 1.4 4.2 91.2 82.3
FNNC 1.4 1.8 1.8 2.6 2.3 2.5 2.2 3.2 1.7 3.6 90.6 86.3

(b) Epoch = 5000
Soccer NNC 94.2 84.3 0.8 1.2 0.2 3.1 2.7 7.3 1.4 2.8 0.7 1.3
FNNC 93.3 90.1 1.3 2.1 1.4 2.4 1.7 2.7 1.4 1.8 0.9 0.9
Swimming NNC 2.8 3.1 91.8 87.5 1.5 2.1 1.2 3.3 1.1 1.2 1.6 2.8
FNNC 2.2 4.2 92.1 84.4 1.9 3.7 2.5 4.8 0.3 1.3 1 1.6
Skiing NNC 1.4 2.4 1.3 4.5 92.9 83.3 1.1 2.3 2.1 2.1 1.2 5.4
FNNC 1.4 1.4 1.1 2.1 94.6 89.8 1.4 3.4 0.3 1.5 1.2 1.8
Field & track NNC 0.8 2.5 0.5 1.2 2.2 4.3 92.3 83.7 1.8 3.8 2.4 4.5
FNNC 1.8 1.8 1.6 2.7 0.9 1.4 93.7 90.9 1 1.5 1 1.7
Baseball NNC 2.5 7.8 0.7 3.9 0.9 3.6 1.8 4.8 92.8 76.5 1.3 3.4
FNNC 1.4 1.8 2 2.1 3.5 4.1 1.2 1.8 90.3 88.4 1.6 1.8
Basketball NNC 1.2 3.8 1.1 1.4 1.7 2.9 1.4 2.5 0 2.1 94.6 87.3
FNNC 1.8 4.3 1.7 1.7 1.1 3.1 1.2 2.2 1.8 3.7 92.4 85

(c) Epoch = 10,000
Soccer NNC 96.3 79.6 1.2 4.8 0.7 2.8 1.8 4.2 0 6.2 0 2.4
FNNC 91.2 87.8 2.2 2.6 1.3 2.2 1.9 3.5 1.6 1.6 1.8 2.3
Swimming NNC 2.1 6.9 93.8 83.6 1.1 2.8 1.3 2.7 1.2 2.2 0.5 1.8
FNNC 1.9 5.5 89.6 81.5 2.3 2.8 3.3 6.3 1.3 2.3 1.6 1.6
Skiing NNC 1.3 3.8 0.4 2.2 94.1 84.7 1 3.2 1.4 3.4 1.8 2.7
FNNC 1.9 1.8 1.3 2.3 92.5 88.7 1.6 2.6 1.1 2.1 1.6 2.5
Field & track NNC 0.7 1.7 1 1.5 1.5 3.5 93.6 87.8 1.7 2.7 1.5 2.8
FNNC 2.3 3.2 1.7 2.8 1.6 3.5 91.3 85.1 1.2 1.5 1.9 3.9
Baseball NNC 2.7 6.1 0.2 2.2 0.3 3.6 0.5 3.8 95.2 81.2 1.1 3.1
FNNC 0.9 1.9 2.2 4.2 1.6 3.6 1.5 1.8 92.3 86.6 1.5 1.9
Basketball NNC 0.9 3.7 0.4 2.3 0 3.2 0.8 3.8 0.5 3.1 97.4 83.9
FNNC 2.7 3.5 1.5 1.9 2.6 4.2 1.1 3.3 1.8 3.8 90.3 83.3

(d) Epoch = 20,000
Soccer NNC 95.8 89.3 1.3 1.3 0.8 2.7 1.6 2.8 0.5 1.5 0 2.4
FNNC 91.8 85.1 1.3 2.6 1.1 2 2.7 3.4 1.6 4.6 1.5 2.3
Swimming NNC 1.2 4.3 95.6 82.5 0.6 3.4 0.8 3.7 1.5 3.2 0.3 2.9
FNNC 1.4 2 91.7 86.9 1.9 2.8 3.3 4.2 0.8 2.5 0.9 1.6
Skiing NNC 1.5 2.8 0.8 5.6 93.7 80.2 1.1 3.9 1.3 4.2 1.6 3.3
FNNC 0.8 3.8 1.8 3.3 93.5 83.5 0.8 3.6 1.6 2.1 1.5 3.7
Field & track NNC 1 2.7 1.6 3.8 0.4 3.2 95.4 82.8 1.3 3.7 0.3 3.8
FNNC 1.4 2.8 0.9 2.6 1.5 3.5 94 86.3 0.5 1.5 1.7 3.3
Baseball NNC 1.2 4.1 0.9 3.2 0.5 4.6 0.8 3.9 95.1 79.7 1.5 4.5
FNNC 1.7 0.9 1.3 4.2 0.9 3.6 1.5 1.6 93.7 87.8 0.9 1.9
Basketball NNC 0.8 3.7 1 2.3 0.6 1.8 1.4 2.6 1.7 3.2 94.5 86.4
FNNC 1.7 4.8 1.2 2.9 1.5 3.2 0 4.1 1.4 2.8 94.2 82.2
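The overall accuracies and the training/test deviations discussed in Section 4.3 can be reproduced from the diagonal (correct-classification) entries of Table 2. For example, at epoch = 1000:

```python
def overall_accuracy(diagonal):
    """Average the per-class correct-classification rates, i.e. the
    diagonal entries of one classifier/epoch block in Table 2."""
    return sum(diagonal) / len(diagonal)

# Diagonal entries at epoch = 1000, from Table 2(a)
# (soccer, swimming, skiing, field & track, baseball, basketball).
nnc_train  = [91.6, 91.5, 95.4, 92.2, 93.6, 91.2]
nnc_test   = [79.7, 83.5, 85.2, 78.9, 79.1, 82.3]
fnnc_train = [92.1, 90.2, 96.3, 95.7, 93.1, 90.6]
fnnc_test  = [90.2, 86.1, 90.8, 89.4, 89.6, 86.3]

nnc_deviation  = overall_accuracy(nnc_train) - overall_accuracy(nnc_test)
fnnc_deviation = overall_accuracy(fnnc_train) - overall_accuracy(fnnc_test)
```

This gives roughly 92.6% training and 81.5% test accuracy for the NNC (deviation ≈ 11.1 points) versus 93% and 88.7% for the FNNC (deviation ≈ 4.3 points), matching the figures in Section 4.3.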

Also, we dynamically update the input sensitivity while training the classifier, because the importance of each input can change as the hidden-layer weights are updated at each epoch. Fig. 3 shows an example of the FNNC described above.

4. Experiment

4.1. Experimental conditions

We compared the classification performance of the FNNC with the conventional NNC using two MPEG-7 descriptors, EH and RS, because the difference between their feature dimensions is large: EH has 80 feature dimensions and RS has 35. In our experiment we used six classes of sports image data collected from the Internet, namely soccer, swimming, skiing, field & track, baseball, and basketball, as seen in Fig. 4.

Using MPEG-7 XM (the eXperimentation Model), we extracted input features with the EH and RS descriptors from the query images in the feature extraction module. The EH input features capture the spatial distribution of five types of edges, consisting of four directional edges (one horizontal, one vertical, and two diagonal) and one non-directional edge, over 16 local regions of the image: we divided the image space into 16 non-overlapping sub-images and defined a local edge histogram for each sub-image, yielding a total of 16 × 5 = 80 histogram bins. We also extracted 35 input features from the region-based shape of each image by capturing the distribution of all pixels within a region, which describes the shape of the objects. All input features were normalized to the 0–1 range as a preprocessing step.

For training, 1200 images (200 per sport) were used, and for testing, 600 images (100 per sport). We structured the FNNC in the classification module as a three-layer network, using the hyperbolic tangent sigmoid function in the hidden layer and the hard limit function in the output layer. The Levenberg–Marquardt back-propagation algorithm in MATLAB was used because of its training ability. We set the learning rate to 0.1 and used 76 neurons in the hidden layer. For a fair comparison, the FNNC and NNC were trained under identical conditions, and we trained them exhaustively by varying the initial weights and the number of epochs, i.e., 100 repetitions for each epoch setting.

4.2. Analysis of the contribution of two MPEG-7 descriptors to image classification

In this section, we show that descriptors can contribute equivalently to the neural network output even though the difference between their feature dimensions is large. Table 1 shows the classification accuracies of NNCs trained with EH features and RS features, respectively. For training data, the overall accuracy for EH and RS is 88.5% and 89.2%, respectively; for test data, it is 77.6% for EH and 78.4% for RS. From these results, we can deduce that the contributions of EH and RS to image classification performance are equivalent.

4.3. Analysis and comparison of the performances of FNNC and NNC

In this section, we compare and analyze the performances of the FNNC and the conventional NNC using a fusion of the input features extracted from EH and RS. Table 2 shows the training and test performance of both classifiers. From the table, the deviation between training and test performance is high for the NNC compared to the FNNC. For instance, at epoch = 1000, the overall accuracy of the NNC is 92.6% for training data but 81.5% for test data, while the FNNC achieves 93% and 88.7%, respectively; the deviation between training and test performance is thus 11.1% for the NNC but only 4.3% for the FNNC. For epoch = 5000, 10,000, and 20,000, the NNC deviation is 9.3% (93.1 − 83.8%), 11.6% (95.1 − 83.5%), and 11.5% (95% − 83.5%), respectively, whereas for the FNNC it is 4.6% (92.7 − 88.1%), 5.7% (91.2 − 85.5%), and 7.9% (93.2 − 85.3%). From these results, we can deduce that, compared to the NNC, our method is robust to data unseen during training for the sports image classification problem, because the importance of each input feature is taken into consideration while training the classifier, as explained in the previous section. In particular, for a small number of epochs (1000) our method gave the best result, while for large epochs there is little difference between the two methods.

5. Conclusion

This paper proposed a fusion neural network classifier for image classification problems that enforces the equivalent contribution of MPEG-7 descriptors. The experimental results showed that our method provides more robust training performance than a conventional neural network classifier. Our method can be useful when features of different dimensions, generated from the same source, are fused as input to an NNC. A drawback of our method is that the structure of the FNNC is more complicated than that of the conventional NNC because of its added functionality. As future work, we intend to apply our method to other classification problems using an FNNC whose input features are extracted from the same resource.

References

Banerjee, M., Kundu, M.K., 2003. Edge based features for content based image retrieval. Pattern Recognition Lett. 24 (3), 2649–2661.
Cowie, R., Douglas, E., Taylor, J.G., Loannou, S., Wallace, M., Kollias, S., 2005. An intelligent system for facial emotion recognition. In: IEEE International Conference on Multimedia and Expo, 2005.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Prentice-Hall.
ISO/IEC/JTC1/SC29/WG11/N4358, 2001. Text of ISO/IEC 15938-3/FDIS Information Technology – Multimedia Content Description Interface – Part 3: Visual. MPEG Video Group, Sydney, July 2001.
ISO/IEC/JTC1/SC29/WG11/N6828, 2004. MPEG-7 Overview (version 10). MPEG Video Group, October 2004.
Jain, A.K., Vailaya, A., 1996. Image retrieval using color and shape. Pattern Recognition 29, 1233–1244.
Kim, W., Lee, H.K., Park, J., Yoon, K., 2005. Multiclass adult image classification using neural networks. In: Canadian Conference on AI, pp. 222–226.
Kim, W., Oh, S., Kang, S., Kim, D., 2006. Multi-module image classification system. In: 7th International Conference on Flexible Query Answering Systems, pp. 498–506.
Li, X., Chitti, S., Liu, L., 2004. k-Nearest classification across multiple private databases. In: Third International Conference on Image and Graphics, pp. 377–380.
Machado, R.J., Neveds, P.E.C.S.A., 1996. Multi-model neural network for image classification. In: Proceedings of the Second Workshop on Cybernetics Vision, pp. 57–59.
Martinez, J.M., 2002. MPEG-7: Overview of MPEG-7 description tools, Part 2. IEEE Multimedia, pp. 83–93.
Park, S.B., Lee, J.W., Kim, S.K., 2004. Content-based image classification using a neural network. Pattern Recognition Lett. 25 (3), 287–300.
Spyrou, E., Borgne, H.L., Mailis, T., Cooke, E., Avrithis, Y., Connor, N., 2005. Fusing MPEG-7 visual descriptors for image classification. In: International Conference on Artificial Neural Networks – ICANN 2005, Lecture Notes in Computer Science, vol. 3697, pp. 847–852.
Won, C.S., Park, D.K., Park, S., 2001. Efficient use of MPEG-7 edge histogram descriptor. ETRI J. 24 (1).
Ye, J., Yao, H., Jiang, F., 2004. Based on HMM and SVM multilayer architecture classifier for Chinese sign language recognition with large vocabulary. In: Third International Conference on Image and Graphics, pp. 377–380.
Zhou, X.S., Huang, T.S., 2001. Edge-based structural features for content-based image retrieval. Pattern Recognition Lett. 22, 457–468.
