
The Use of Genetic Algorithm for Feature Selection

in Video Concept Detection

Marjan Momtazpour*, Mohammad Hossein Saraee**, and Maziar Palhang***

Department of Electrical and Computer Engineering
Isfahan University of Technology,
Isfahan, Iran, 84156831112

Abstract—Video semantic concept detection has been regarded as an important research problem by the multimedia industry in recent years. Classification is the most widely accepted method for concept detection: the output of the classification system is interpreted as semantic concepts. These concepts can be employed for automatic indexing, searching, and retrieval of video objects. However, the employed features are high-dimensional, and thus concept detection with existing classifiers incurs high computational complexity. In this paper, a new approach is proposed to reduce the classification complexity and the time required for learning and classification by choosing the most important features. For this purpose, genetic algorithms are employed as a feature selector. Simulation results illustrate improvements in the behavior of the classifier.

Keywords: Video Concept Detection, Genetic Algorithm, Feature Selection, Classification, Support Vector Machine.

I. INTRODUCTION
With the explosive growth of data (including text, sound, and video), the need for automatic indexing, searching, organizing, and categorizing of data has increased [1], [2]. Most of this data is multimedia, including sound and video. Hence, one of the most important research problems in this field is video concept detection.
The main goal of video concept detection is to determine the presence (or absence) of certain semantic concepts such as “outdoor”, “airplane”, “car”, and “person” in a video frame. Considerable effort has been reported in the literature to narrow the semantic gap between low-level features and high-level ones. Examples include automatic concept detection and high-level feature extraction [3], [4], [5], [6], and [7].
The most popular approach in concept detection is to use a classification method to determine whether or not a concept exists in a video frame. The most widely used classifier for this type of problem is the Support Vector Machine (SVM) [3]. The output of the classification system can be interpreted as high-level semantic concepts used for retrieval and filtering in different types of videos. A list of important concepts, a machine learning algorithm, and a large-scale dataset are required for these classification systems [4].
Significant developments in video concept detection have been reported in some domains [3], [4], [5], [6], and [7]. The semantic concepts can come from a wide range of topics including objects, scenes, and events. Certain standard datasets are available for research purposes. For example, NIST TRECVID [8] provides a large-scale set of video data. The TRECVID 2005 dataset includes hundreds of hours of broadcast news video from different TV channels.
The existing classifiers have some disadvantages. Because of high-dimensional features, they suffer from high computational complexity with respect to time and memory consumption. Even with the long time spent on the learning process, these algorithms are unreliable with respect to different performance metrics [3].
In this paper, we propose a novel framework to decrease the time required for learning and classification by selecting the most important features from the traditional low-level features used in the existing classifiers. For this purpose, a Genetic Algorithm (GA) is employed to select a subset of features. The goal of the paper is to improve the behavior of the classification in terms of both classification speed and classification accuracy.
The rest of the paper is organized as follows. Related works are reviewed in Section II. In Section III, an introduction to feature selection and genetic algorithms is presented. In Section IV, the proposed method is introduced. Experimental and simulation results are presented in Section V. Finally, Section VI concludes the paper.

II. RELATED WORKS
Semantic concept detection is of great importance in video indexing and retrieval, and several studies have been reported in the literature. In this section, a literature review of concept detection is presented.
First of all, to process the raw film, the raw video should be converted to relational form. Video frames should be converted into a relational database in order to apply data mining methods such as classification, clustering, and so on. The types of extracted features can differ. Examples include extracting sound from video and converting it to text, or obtaining the subtitles of videos to learn the actions and speeches during the movie.
The most important part of information in videos is visual
features. Visual features can be grouped into two parts: global
Proceedings of ICEE 2010, May 11-13, 2010
978-1-4244-6760-0/10/$26.00 ©2010 IEEE
features and local ones. Global features such as the color histogram and edge direction histogram (EDH) provide information about the whole video frame, whereas local features like SIFT and Bag-of-Features provide information about specific objects in the images. Most concept detection systems are based on visual features, especially global ones.
In [6], an SVM-based classifier is employed for learning and classification of concepts with global visual features. Fusions of local and global features, such as the ones considered in [6], have been employed in [7]. Statistical approaches based on global/local visual and audio features and their combinations have been employed in [4], where several fusion frameworks have been proposed for concept detection. It is shown that the fused multimodal models considerably reduce detection errors compared to single models.
The framework proposed in [9] combines the Apriori algorithm and association rule mining to find frequent item-sets in the dataset. This model builds classification rules to classify video shots into semantic concepts.
Bayesian networks have also been employed to build relations between different concepts [10]. The structure of the semantic network can then be learned by an appropriate learning method; an improved version of the K2 algorithm has been proposed in [10]. Parallel Self-Organizing Maps have also been used for concept detection in [11].
In [12], the authors target a concept detection system for web-based videos such as those on YouTube. They propose an automatic graph model generator from online video-sharing websites for concept relationship learning based on a known ontology (such as LSCOM). An SVM-based learning method is employed to extract concepts from massive video collections like YouTube.
In [1], for each individual concept, the prior probability of the concept is incorporated with the detection score of an individual SVM detector. Then, probabilistic estimates of the target concept are computed using all of the individual SVM detectors. Finally, these estimates are linearly combined using weights learned from the training set.
Using all of the features for learning a classifier is computationally complex. Therefore, in the next sections, a new framework is presented to learn a classifier based on a subset of the visual features.

III. FEATURE SELECTION AND GENETIC ALGORITHM
Since each extra feature can increase the cost, space, and time of a classification system, image processing researchers have a major motivation to design and implement classifiers with small feature sets. However, a sufficient set of features is needed to achieve accurate recognition rates. Hence, this leads to a search for an optimum subset of features from the available ones [13].
Feature selection strategies fall into two main categories. The first selects features independently of their effect on classification performance. The second directly selects a subset of the available features in such a way as to not significantly degrade the performance of the classifier system [13].
Genetic Algorithm (GA) is one of the most effective tools for data analysis and pattern recognition [14]. GAs are suitable for feature selection due to their adaptive behavior and randomized search technique [15]. Feature selection can improve classification accuracy.
The major issues in applying GAs to a given problem are selecting a good representation of the chromosomes and a suitable evaluation function, as well as appropriate crossover and mutation operators. In feature selection problems, the chromosome is a binary array of length l, with one bit per feature indicating the presence (1) or absence (0) of that feature in the candidate feature set [13]. In the following section, the application of GAs for feature selection in video concept detection is analyzed.

IV. PROPOSED METHOD
In video concept detection through machine learning approaches, a large number of features have been considered in the literature. However, classification based on a large number of features not only increases the classification time but can also cause the learner to get stuck in local minima. Therefore, feature selection can be used as a good solution to decrease the number of features. In this section, a new method is proposed to decrease the number of features with a Genetic Algorithm.
A. Classification Strategy
Classification is one of the most accepted methods applicable to video concept detection. While different classifiers could be employed for this purpose, SVM is the most popular. In this paper, SVM is used as the base classifier.
While SVM classifiers behave well in complex classification problems, the learning phase can be a time-consuming process, especially when a large number of features must be considered. Also, it has been shown that, in video concept detection, SVM just memorizes the concept patterns, behaving similarly to memory-based classifiers such as KNN [3]. Therefore, reducing the number of features can be beneficial from both the time and performance perspectives.
The proposed classification strategy is illustrated in Fig. 1. In the first step, GA is used to select a subset of features. At each round of the GA, the selected features are evaluated with respect to an appropriate performance measure. For this purpose, a randomly generated test set is temporarily classified with the selected features, and the classification performance is measured in terms of correct rate. The temporary classifier used in this paper is classification based on the Mahalanobis distance. Since classification based on the Mahalanobis distance is faster than SVM, this shortens the feature selection phase.
In the second step of the proposed strategy, the selected features are employed for classification with an SVM classifier. Similar to [16], F-Score, accuracy, recall, and precision are the performance metrics measured at the end of the second step.
B. Genetic Algorithm
When GAs are used for feature selection, a chromosome is represented as a string of bits with length equal to the number of features. A 1 or 0 in position i indicates whether the i-th feature is selected or not, respectively.
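The bit-string representation can be illustrated with a minimal sketch. The helper names `apply_chromosome` and `selected_fraction` and the toy feature values are hypothetical, introduced only for illustration; the convention that bit 1 means "selected" follows Section III.

```python
from typing import List, Sequence


def apply_chromosome(sample: Sequence[float], chromosome: Sequence[int]) -> List[float]:
    """Keep only the feature values whose chromosome bit is 1."""
    if len(sample) != len(chromosome):
        raise ValueError("chromosome length must equal the number of features")
    return [value for value, bit in zip(sample, chromosome) if bit == 1]


def selected_fraction(chromosome: Sequence[int]) -> float:
    """Fraction of features switched on by the chromosome."""
    return sum(chromosome) / len(chromosome)


# Toy 6-dimensional feature vector (hypothetical values, not from the paper).
sample = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5]
chromosome = [1, 0, 1, 1, 0, 0]

reduced = apply_chromosome(sample, chromosome)  # features 0, 2 and 3 survive
fraction = selected_fraction(chromosome)        # 3 of 6 bits are set
```

With this encoding, a 73-dimensional EDH vector would be paired with a 73-bit chromosome, and the classifier would only ever see the columns whose bits are set.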
One of the most essential steps in applying GAs to a given application is choosing an appropriate evaluation function [13]. To select the best feature set, the evaluation function should evaluate the selected features on the test data. Hence, for each chromosome, which encodes a subset of features, the classifier is trained and tested on the dataset using only the specified features. The evaluation function is the error rate of the classifier, which the GA seeks to minimize in each new generation. The best chromosome is therefore the one with the minimum fitness value among all chromosomes. Fig. 2 illustrates the pseudo code of the evaluation function.
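Since the pseudo code of Fig. 2 is not reproduced in the text, the following is only a sketch of what such an evaluation function might look like. It assumes a stand-in classifier: a nearest-class-mean rule with per-feature variance scaling, which is a diagonal simplification of the Mahalanobis distance rather than the authors' implementation. The fold assignment by index and all names (`fitness`, `_column_means`, `_column_variances`) are likewise assumptions.

```python
from typing import List, Sequence


def _column_means(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _column_variances(rows: List[List[float]]) -> List[float]:
    means = _column_means(rows)
    # Small constant avoids division by zero for constant columns.
    return [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(zip(*rows), means)]


def fitness(chromosome: Sequence[int], X: List[List[float]], y: List[int],
            folds: int = 3) -> float:
    """Cross-validated error rate using only features whose bit is 1 (lower is better)."""
    cols = [j for j, bit in enumerate(chromosome) if bit == 1]
    if not cols:
        return 1.0  # assumption: an empty feature set gets the worst score
    Xs = [[row[j] for j in cols] for row in X]
    errors = 0
    for f in range(folds):
        train = [i for i in range(len(Xs)) if i % folds != f]
        test = [i for i in range(len(Xs)) if i % folds == f]
        variances = _column_variances([Xs[i] for i in train])
        class_means = {c: _column_means([Xs[i] for i in train if y[i] == c])
                       for c in {y[i] for i in train}}
        for i in test:
            def dist(c: int) -> float:
                return sum((x - m) ** 2 / v
                           for x, m, v in zip(Xs[i], class_means[c], variances))
            errors += min(class_means, key=dist) != y[i]
    return errors / len(Xs)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [1.0, 1.2], [1.1, 4.6], [0.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
informative_error = fitness([1, 0], X, y)
noise_error = fitness([0, 1], X, y)
```

On this toy data the chromosome that keeps only the informative feature receives the lower error, which is the behavior the GA relies on when ranking chromosomes.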
Figure 2. Pseudo code of evaluation function.
In any GA-based system, besides an appropriate evaluation function, proper Crossover and Mutation operations are also required. The Crossover operation defines how the genetic algorithm combines two individuals, or parents, to generate a child for the next generation. Mutation specifies how the genetic algorithm makes small random changes to individuals in the population to create mutated children. The Mutation and Crossover operations used in this paper are illustrated in Figs. 3 and 4, respectively. The selection function in this research is Tournament; the selection function chooses the population members that will be used as parents for the Crossover operation.
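The three operators can be sketched as follows. Because Figs. 3 and 4 are not available in the text, single-point crossover and independent bit-flip mutation are assumed here as common choices, not confirmed as the authors' exact operators; the tournament size, mutation rate, and function names are also hypothetical.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def tournament_select(population: List[Chromosome],
                      fitness: Callable[[Chromosome], float],
                      rng: random.Random, size: int = 2) -> Chromosome:
    """Draw `size` random individuals and return the one with the lowest error."""
    contestants = rng.sample(population, size)
    return min(contestants, key=fitness)


def single_point_crossover(a: Chromosome, b: Chromosome,
                           rng: random.Random) -> Chromosome:
    """Child takes the head of parent `a` and the tail of parent `b`."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def bit_flip_mutation(chromosome: Chromosome, rng: random.Random,
                      rate: float = 0.05) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def next_generation(population: List[Chromosome],
                    fitness: Callable[[Chromosome], float],
                    rng: random.Random) -> List[Chromosome]:
    """Build a new population of the same size via selection, crossover, mutation."""
    children = []
    for _ in population:
        parent_a = tournament_select(population, fitness, rng)
        parent_b = tournament_select(population, fitness, rng)
        children.append(bit_flip_mutation(
            single_point_crossover(parent_a, parent_b, rng), rng))
    return children


# Toy run: fitness is the Hamming distance to a hypothetical target mask.
rng = random.Random(0)
target = [1, 0, 1, 1, 0, 0]
toy_fitness = lambda c: sum(x != t for x, t in zip(c, target))
population = [[rng.randint(0, 1) for _ in target] for _ in range(8)]
new_population = next_generation(population, toy_fitness, rng)
```

In the paper's setting the toy fitness would be replaced by the classifier error rate, and the loop would be repeated for the stated number of generations.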
Figure 1. The proposed classification strategy.
Figure 3. Example of Mutation operator.
Figure 4. Example of Crossover operator.

V. EXPERIMENTAL RESULTS
In order to evaluate the proposed method, extensive experiments have been performed. In this section, some experimental results are presented.
A. Data Sets and Features
The experiments are performed on the development set of the 2005 (TREC05) video collection of the TREC Video Retrieval Evaluation [8]. TREC05 is a collection of about one hundred hours of broadcast news video from six channels: CNN (English), NBC (English), MSNBC (English), CCTV (English), NTDTV (Chinese), and LBC (Arabic). This dataset contains around 60,000 video shots, which are equally distributed across the six channels.
After shot segmentation, each video shot is represented by its middle frame, called the “key-frame”. Three visual features are extracted from each key-frame: Edge Direction Histogram (EDH) with 73 dimensions, Gabor Texture (GBR) with 48 dimensions, and Grid Color Moment (GCM) with 225 dimensions. A detailed description of these features can be found in [17].
B. Semantic Concepts
A number of semantic concepts are chosen from the LSCOM-Lite lexicon [18], which covers different concepts present in broadcast news videos. These concepts have been manually annotated by people to describe the visual content of the key-frames in the TRECVID 2005 data sets [19]. The features and annotations of TRECVID 2005 are available at:
There is a large difference between the frequencies of different concepts. Common concepts like “Outdoor” have frequencies around 50%, while rare ones (like “Prisoner”) have frequencies below 1% [3]. All of the concepts are categorized into three sets [2]:
1) High-Frequency Concepts, which are the concepts with a frequency of positive examples greater than 0.05, such as “Person”, “Face”, “Outdoor”, and “Sky”.
2) Medium-Frequency Concepts, which are the concepts with a frequency of positive examples between 0.01 and 0.05, such as “Meeting”, “Car”, “Animal”, and “Sports”.
3) Low-Frequency Concepts, which are the concepts with a frequency of positive examples below 0.01, such as “Airplane”, “Snow”, “Maps”, and “Prisoner”.
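The three frequency bands can be expressed as a small helper. The function name `frequency_category` is hypothetical, and the handling of the exact boundary values (0.01 and 0.05 are placed in the medium band) is an assumption, since the text does not say which side the boundaries fall on.

```python
def frequency_category(positive_fraction: float) -> str:
    """Map the fraction of positive examples of a concept to its frequency band."""
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")
    if positive_fraction > 0.05:
        return "high"
    if positive_fraction >= 0.01:  # assumption: boundaries belong to "medium"
        return "medium"
    return "low"


# Illustrative fractions: about 50% for a common concept, below 1% for a rare one.
common = frequency_category(0.50)
middle = frequency_category(0.03)
rare = frequency_category(0.005)
```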
The frequency of negative examples in the experimental dataset is large. Therefore, in this paper, the idea of [6] is used for balancing the data. To balance the frequency of the positive and negative samples, 20% of the negative samples are taken at random. Then, one of the following situations may occur:
1. If the number of examples in this 20% of negative samples is more than the number of positive examples, then this 20% of negative samples is used for training the classifier.
2. If the number of examples in this 20% of negative samples is less than the number of positive examples, but the total number of negative examples in the dataset is greater than the number of positive examples, a subset of negative examples with size equal to the number of positive examples is chosen randomly for training.
3. If the number of negative examples is less than the number of positive examples, all negative examples are used for training.
Figure 5. Learning and classification time for different concepts.
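The three balancing cases can be sketched as a single function. This is an illustration under stated assumptions rather than the procedure of [6]: the name `choose_negatives` is hypothetical, ties (for example, a 20% sample exactly as large as the positive set) are resolved by falling through to the next case, and the third case is taken to mean that all negative examples are kept.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def choose_negatives(negatives: Sequence[T], n_positive: int,
                     rng: random.Random, fraction: float = 0.2) -> List[T]:
    """Pick the negative training examples according to the three cases."""
    pool = list(negatives)
    sample_size = int(round(len(pool) * fraction))
    if sample_size > n_positive:
        # Case 1: the random 20% already outnumbers the positives.
        return rng.sample(pool, sample_size)
    if len(pool) > n_positive:
        # Case 2: draw as many negatives as there are positives.
        return rng.sample(pool, n_positive)
    # Case 3 (assumed): too few negatives, so keep all of them.
    return pool


rng = random.Random(0)
negatives = list(range(1000))  # hypothetical negative example ids
case1 = choose_negatives(negatives, 50, rng)
case2 = choose_negatives(negatives, 500, rng)
case3 = choose_negatives(negatives, 2000, rng)
```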
C. Experiment Environment
The simulations have been performed on a Pentium IV computer with a 2.5 GHz Dual Core CPU and 2 GB of RAM. The simulation software is developed in MATLAB 2008. The SVM implementation employed in the experiments uses a Gaussian Radial Basis Function (RBF) kernel with the default sigma equal to 1. To find the separating hyper-planes in MATLAB, we use the Sequential Minimal Optimization (SMO) method. In each experiment, the number of generations explored by the GA sub-system is 50. The population size should be greater than the number of features: for EDH and GBR this value is 100, and for GCM it is 250. In the evaluation function of the GA sub-system, 3-fold cross validation is used to ensure that the system works independently of the data.
D. Numerical Results
For each concept, the GA has to select the best features. The outcome of the GA is the input for an SVM learner. The performance of this SVM is then compared to the performance of a regular SVM that learns with all the available features. In this paper, only six concepts are examined. These concepts are drawn from the three categories introduced in Section V-B: “airplane” and “snow”, “car” and “sports”, and “outdoor” and “person” belong to the Low-Frequency, Medium-Frequency, and High-Frequency categories, respectively.
Simulation results are illustrated in Figs. 5 to 10. The time required for learning and classification in the regular SVM and in the proposed method (SVM with GA) is compared in Fig. 5. As is clear from this figure, applying GA for feature selection reduces the learning and classification time. This is because the number of features is smaller in the proposed method, which decreases the dimension of the problem.
Precision, recall, F-score, and accuracy of the regular SVM are compared with those of the proposed solution in Figs. 6, 7, 8, and 9, respectively. The outcomes of the proposed method are comparable with those of the regular SVM. In some cases the performance of the existing method is the same as that of the new one, while in others one of the two solutions performs better than the other.
Figure 6. Precision of classification for different concepts.
While the performance of the proposed solution is almost similar to that of the traditional method, it reaches its results in a shorter period of time. Hence, the achievable accuracy of different classifiers can be compared in conjunction with learning and classification time. For this purpose, we define the following metric as the achievable accuracy per unit time, or Accuracy Density:
\[
\text{Accuracy Density} = \frac{\text{Classification Accuracy}}{\text{Learning and Classification Time}} \tag{1}
\]
Fig. 10 compares the regular SVM with the proposed solution with respect to the Accuracy Density. The proposed method results in greater values of accuracy density compared with the traditional one.
Table 1 shows the percentage of the features selected by the GA sub-system for each individual concept. As an example, for the EDH features of the Airplane concept, GA selects about 68.49% of the features.
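The Accuracy Density metric of Eq. (1) is a plain ratio and can be computed as in the sketch below. The function name, the use of seconds as the time unit, and the numbers in the example are assumptions for illustration only; they are not results from the experiments.

```python
def accuracy_density(accuracy: float, total_time_s: float) -> float:
    """Eq. (1): classification accuracy divided by learning-plus-classification time."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return accuracy / total_time_s


# Hypothetical figures: a slightly less accurate classifier that runs twice as fast.
regular_svm = accuracy_density(0.90, 120.0)
svm_with_ga = accuracy_density(0.89, 60.0)
```

Under these made-up numbers the faster classifier has the higher density even though its accuracy is marginally lower, which is the trade-off the metric is meant to expose.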
Figure 7. Recall of classification for different concepts.
Figure 8. F-score of classification for different concepts.
Figure 9. Accuracy of classification for different concepts.
Figure 10. Accuracy Density of classification for different concepts.

Table 1. Percentage of features selected by the GA sub-system for each concept.
Concept - Feature    Selected features (%)    Concept - Feature    Selected features (%)
Airplane - EDH       68.49                    Snow - EDH           63.01
Airplane - GBR       93.75                    Snow - GBR           87.5
Airplane - GCM       40.89                    Snow - GCM           65.33
Car - EDH            38.36                    Sports - EDH         52.05
Car - GBR            54.17                    Sports - GBR         75
Car - GCM            54.67                    Sports - GCM         38.66
Outdoor - EDH        41.10                    Person - EDH         34.25
Outdoor - GBR        58.33                    Person - GBR         64.58
Outdoor - GCM        34.66                    Person - GCM         12.44

VI. CONCLUSION
In recent years, the need for automatic indexing, searching, and categorizing of multimedia information has increased dramatically. One of the most important challenges in this field is automatic image and video concept detection. Due to the high-dimensional features that are employed, concept detection with existing classifiers suffers from high computational complexity, in addition to unacceptable performance in some cases. In this paper, a new approach has been proposed to reduce the classification complexity and the time required for learning and classification by choosing the most important features. For this purpose, genetic algorithms are employed for feature selection. Experimental results show that the proposed method reduces the learning and classification time while the performance of the classification remains almost unchanged.

ACKNOWLEDGMENT
We would like to thank Dr. Pejman Khadivi and Mr. Mahmoud Momtazpour for their comments and reviews.
REFERENCES
[1] Y. Aytar, O.B. Orhan, M. Shah, “Improving Semantic Concept Detection and Retrieval using Contextual Estimates”, ICME 2007, pp. 536-539, 2007.
[2] W. Jiang, E. Zavesky, S. Chang, A. Loui, “Cross-Domain Learning Methods for High-Level Visual Concept Classification”, In IEEE International Conference on Image Processing, October 2008.
[3] J. Yang and A. Hauptmann, “(Un)Reliability of video concept detection”, Proc. Int. Conf. Image and Video Retrieval, p. 85–94, IV-D2,
[4] S. Chang, D. Ellis, W. Jiang, K. Lee, A. Yanagawa, A.C. Loui, J. Luo, “Large-scale multimodal semantic concept detection for consumer video”, MIR, pp. 255-264, 2007.
[5] A. Yanagawa, A.C. Loui, J. Luo, S. Chang, D. Ellis, W. Jiang, L. Kennedy, and K. Lee, “Kodak consumer video benchmark data set: concept definition and annotation”, Columbia University ADVENT Technical Report 246-2008-4, Sep, 2008.
[6] A. Yanagawa, S. Chang, L. Kennedy, W. Hsu, “Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts”, Columbia University ADVENT Technical Report #222-2006-8, March 20, 2007.
[7] Y. Jiang, A. Yanagawa, S. Chang, C. Ngo, “CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection”, Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008.
[8] NIST. TREC video retrieval evaluation (TRECVID), http://www-
[9] L. Lin, G. Ravitz, M. Shyu, and S. Chen, “Video Semantic Concept Discovery using Multimodal-based Association Classification”, Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pp. 859-862, Beijing, China, July 2-5, 2007.
[10] F. Wang, D. Xu, H. Xu, W. Wu, “Construction of Semantic Network for Videos”, ICICIC (2), pp. 217-220, 2006.
[11] M. Koskela, J. Laaksonen, “Semantic concept detection from news videos with self-organizing maps”, In Proceedings of 3rd IFIP Conference on Artificial Intelligence Applications and Innovations, pp. 591-599, June 2006.
[12] P. Yuan, B. Zhang, J. Li, “Semantic Concept Learning through Massive Internet Video Mining”, ICDM Workshops, pp. 847-853, 2008.
[13] H. Vafaie, K.A. De Jong, “Genetic Algorithms as a Tool for Feature Selection in Machine Learning”, ICTAI, 1992.
[14] M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, A.K. Jain, “Dimensionality Reduction Using Genetic Algorithms”, IEEE Transactions on Evolutionary Computation, Vol. 4, pp. 164-171, 2000.
[15] J. Jarmulak, S. Craw, “Genetic algorithms for feature selection and weighting”, In Proceedings of the IJCAI'99 workshop on Automating the Construction of Case Based Reasoners, pp. 28-33, 1999.
[16] L. Lin, G. Ravitz, M. Shyu, S. Chen, “Correlation-based video semantic concept detection using multiple correspondence analysis”, Proc. of 10th IEEE International Symposium on Multimedia, pp. 316-321, 2008.
[17] A. Yanagawa, W. Hsu, and S. Chang, “Brief Descriptions of Visual Features for Baseline TRECVID Concept Detectors”, Columbia University ADVENT Technical Report #219-2006-5, July 2006.
[18] Large Scale Concept Ontology for Multimedia,
[19] L. Duan, I.W. Tsang, D. Xu, S.J. Maybank, “Domain Transfer SVM for Video Concept Detection”, Proceedings of the 21st IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), June 2009.