Expert Systems with Applications 39 (2012) 12291–12301


Two-Tier genetic programming: towards raw pixel-based image classification
Harith Al-Sahaf b, Andy Song a,*, Kourosh Neshatian b,c, Mengjie Zhang b

a School of Computer Science and Information Technology, RMIT University, G.P.O. Box 2476, Melbourne 3001, Australia
b School of Engineering and Computer Science, Victoria University of Wellington, P.O. Box 600, Wellington 6140, New Zealand
c Department of Computer Science and Software Engineering, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand

Abstract
Classifying images is of great importance in machine vision and image analysis applications such as object recognition and face detection. Conventional methods build classifiers based on certain types of image features instead of raw pixels, because the dimensionality of the raw input is often too large. Determining an optimal set of features for a particular task is usually the focus of conventional image classification methods. In this study we propose a Genetic Programming (GP) method by which raw images can be directly fed as the classification inputs. It is named Two-Tier GP, as every classifier evolved by it has two tiers: one for computing features based on raw pixel input, the other for making decisions. Relevant features are expected to be self-constructed by GP along the evolutionary process. This method is compared with feature-based image classification by GP and with another GP method which also aims to automatically extract image features. Four different classification tasks are used in the comparison, and the results show that the highest accuracies are achieved by Two-Tier GP. Further analysis of the evolved solutions reveals that there are genuine features formulated by the evolved solutions which can classify target images accurately.
© 2012 Elsevier Ltd. All rights reserved.

Keywords: Evolutionary computation; Genetic programming; Feature extraction; Feature selection; Image classification

1. Introduction

The aim of this investigation is to propose a Genetic Programming (GP) methodology for image classification by which the conventional feature extraction step can be avoided. The importance of image classification has become self-evident in recent years, as machine vision and image processing applications increasingly spread into our daily lives. For example, an ordinary modern point-and-shoot camera can detect the presence or absence of human faces in real time. With image classification, a personal hand-held mobile phone can be turned into a museum guide or an information booth: museum visitors may access information on a displayed item simply by showing it to their phones. Image classification is frequently found in commercial applications as well, for example identifying pedestrians in security surveillance systems, labeling types of cells or detecting anomalies in medical imaging systems, and differentiating various terrains in satellite imagery applications.

Other than image acquisition, the two main components in a conventional image classification approach are feature extraction and classification. The classification is based on the features that have been
* Corresponding author. Tel.: +61399259761.

extracted. In other words, the image classifier does not directly operate on the input image itself, but on some kind of feature values. Feature extraction is essentially a transformation process which converts images into corresponding feature values. The main purpose of this process is dimensionality reduction, as an image often contains a large number of pixels, many of which are redundant in terms of their contribution towards classification. There are numerous image features available in the literature, for example features based on edges or contours; features based on image histograms; features extracted from transformed domains such as the Fourier, Wavelet and Hough domains; features generated by templates; texture features; and so on. Distinctive characteristics of images from different classes are expected to be captured by these features, so that the images can be labeled accurately by a classifier. Without features, conventional classifiers would not be able to perform well on images, especially when these images come from real-world scenarios.

The negative side of the feature extraction process is that domain knowledge is required. Features are often constructed to fit a specific application. If not, then the designer of the application needs to select a set of existing features suited to the task. Therefore an understanding of the application itself is essential, either for designing features or for choosing features. As a result, features are problem dependent: there are no universal features applicable to all kinds of applications. One workaround for the domain dependence issue is generating a large set of features to cover all

the possible good ones, then performing feature selection on them to pick out the prominent features. This approach is computationally expensive. Additionally, the classification accuracy would largely rely on the performance of the feature selection algorithm. A domain-independent image classification method is therefore highly desirable, which is exactly the motivation of the GP method presented in this study. The GP methodology itself is domain independent: the feature extraction process happens implicitly inside a GP program, and relevant features are expected to be self-constructed by GP along the evolutionary process. Hence no domain knowledge would be required in this approach.

1.1. Goals of this study

In this paper we aim to enable GP to build classifiers based on raw pixels instead of feature values. The specific research questions of this study are as follows:

- How can image classification problems be represented in GP to generate classifiers which operate directly on raw image pixels rather than extracted feature values?
- How would this approach perform on a collection of image classification tasks, especially when compared with other approaches?
- How would the evolved GP classifiers achieve good accuracies? Can this approach automatically evolve genuine features relevant to a problem without human intervention?

1.2. Organization

The rest of this paper is organized as follows: Section 2 briefly discusses GP and the related previous work. Section 3 presents the GP methodology called Two-Tier GP. In Section 4 four image classification tasks are introduced, along with the features manually designed for these domains. Section 5 reports the experiments and the corresponding results. Section 6 studies some of the evolved GP classifiers to reveal the captured features. This paper is concluded in Section 7.

2. Background

Genetic Programming (GP), pioneered by John Koza (Koza, 1992), is a member of the Evolutionary Computing methods. It has been successfully applied in many domains and proven to be a powerful problem-solving paradigm (Poli et al., 2008). Fundamentally, GP is an evolutionary method for automatic program generation: its key aspect is to evolve programs as the solutions for a particular problem. A solution built by GP is essentially a computer program. Each solution is represented as a program tree, as shown in Fig. 1. The expression in the figure is actually ((x − y) × z) + (x × (y × y)), which can be written as the Lisp S-expression (+ (× (− x y) z) (× x (× y y))). A suitable representation, including a set of functions (internal nodes on a program tree) and a set of terminals (leaf nodes), is critical to the success of GP evolution.

Fig. 1. An example of a program tree in GP.

During the evolutionary process of GP, a group of program trees are randomly generated as the initial population. They are evaluated against the problem to be solved, in our case an image classification task. Each program is assigned a fitness value, which is its performance on solving the task. A program with better fitness is more likely to be selected for reproducing the next generation; the more successful programs, or individuals, are more likely to form their own offspring. This implements the survival-of-the-fittest principle proposed by Darwin. Selected programs may swap their tree branches to generate new programs; this is known as crossover in GP. A selected program may randomly change one of its branches to create a new program; this is known as mutation. A selected program may also be directly copied into the new generation; this is known as elitism. The new population produced by crossover, mutation and elitism will be evaluated against the same fitness measure. Such a process is iterative and will stop when one of the termination conditions is met, for example when a perfect solution is found or the maximum number of generations is reached.

As a powerful problem-solving paradigm, GP has been adapted to many complex domains such as timetable scheduling, circuit design and stock market prediction (Poli et al., 2008). The programs generated by GP are often creative, in ways not thought of by human experts; some solutions found by GP are even patentable. GP has also demonstrated its advantages in performance.

Image classification is also an area in which GP has been extensively studied (dos Santos et al., 2010; Espejo, Ventura, & Herrera, 2010; Pasolli et al., 2011). For instance, Bhanu and Lin trained GP programs to identify road, lake and field regions from radar images (Lin & Bhanu, 2003); their approach can handle this complex task well. Zhang et al. were able to classify haemorrhage and micro-aneurysms in retina images by GP (Zhang et al., 2003). Bhowan et al. studied the use of a GP approach to classify images in unbalanced datasets (Urvesh et al., 2008); their results suggest that GP is capable of handling this difficult problem. Song et al. used GP to classify texture images, basing the classification on raw image pixel values rather than on texture features (Song et al., 2002). GP has also been used for various image-related tasks such as image segmentation (Poli, 1996), edge detection (Fu et al., 2011), motion detection (Pinto & Song, 2009), texture analysis (Song & Ciesielski, 2008) and finding interest points (Olague & Trujillo, 2011). These studies show that GP can achieve better, or at least comparable, results in these tasks without being provided much domain knowledge. It can also be seen that GP is able to construct classifiers which are comparable to conventional classifiers such as decision trees (Loveard & Ciesielski, 2001). However, most of these previous works require domain-dependent features as well; GP is just responsible for generating the classifiers.

GP can not only deal with classification, but also address the feature extraction component (Tackett, 1993; Kowaliw et al., 2009). For example, Guo and Nandi developed a GP-based approach to extract features which actually outperformed the features designed by domain experts (Guo, Jack, & Nandi, 2005); furthermore, the time required for feature extraction was significantly reduced because of these GP-generated features. Zhang and Rockett proposed a GP-based methodology to extract features for edge detection (Zhang & Rockett, 2005); the one-dimensional feature vector generated by their GP method performed better than the classical Canny algorithm.
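To make the representation concrete, a program tree such as the one discussed above can be interpreted recursively: each internal node applies its operator to the values of its children, and leaves supply variables or constants. Below is a minimal illustrative sketch in Python; the nested-tuple encoding and function names are our own, not code from the paper.

```python
# Evaluate a GP program tree represented as nested tuples:
# (operator, child, child, ...); leaves are variable names or constants.
def eval_tree(node, env):
    if not isinstance(node, tuple):              # terminal node
        return env.get(node, node) if isinstance(node, str) else node
    op, *children = node
    args = [eval_tree(c, env) for c in children]  # evaluate children first
    if op == '+': return args[0] + args[1]
    if op == '-': return args[0] - args[1]
    if op == '*': return args[0] * args[1]
    raise ValueError(f"unknown operator {op!r}")

# The expression ((x - y) * z) + (x * (y * y)) as an S-expression-like tree
tree = ('+', ('*', ('-', 'x', 'y'), 'z'), ('*', 'x', ('*', 'y', 'y')))
print(eval_tree(tree, {'x': 2.0, 'y': 1.0, 'z': 3.0}))  # ((2-1)*3) + (2*1) = 5.0
```

A GP system evaluates many thousands of such trees per generation, which is why compact representations and cheap node operations matter in practice.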

Similarly, Lam and Ciesielski reported texture features generated by GP (Lam & Ciesielski, 2004). Their evolved programs could achieve similar or slightly better results compared to manually constructed texture features.

The aim of this study is to utilize GP for both feature extraction and classification, although the feature extraction part is not explicit. Similar investigations do exist in the literature. Oechsle and Clark proposed a two-stage GP scheme for this purpose (Oechsle & Clark, 2008). It involves two GP programs: the system performs the operation of feature extraction in the first stage, and classification in the second stage. Both of these stages are GP-based. However, the system requires human intervention to reformulate the extracted features from the first stage so that they can be used in the second stage. Atkins et al. proposed a GP method to evolve a single program which can perform both feature extraction and classification (Atkins, Neshatian, & Zhang, 2011). There are three tiers in each evolved program: an image filtering tier, a feature extraction tier and a classification tier. The experiments showed that similar classification performance could be achieved by this Three-Tier approach compared to the manually designed features. Our study here extends and compares with this work.

3. Methodology

This section describes the details of the GP representation for raw pixel based image classification. Firstly we introduce the Two-Tier program structure. Then we present the function set and the terminal set, followed by the fitness measure.

3.1. Two tier program structure

As suggested by the name, program trees generated by this method have two tiers: the Aggregation tier and the Classification tier. This method is referred to as "Two-Tier GP". The former is constructed out of one layer of aggregation functions, while the latter consists of one or more layers of classification functions. Such a structure is illustrated in Fig. 2, which shows a 4-layer program tree. Based on this two-tier structure, an image can be transformed into several values which are eventually turned into a single decision variable. Such a GP program tree performs feature extraction as well as classification.

Fig. 2. Tree structure in Two-Tier GP.

A classification function can be the root node. It can also take other classification nodes as its children; consequently there might be multiple layers of functions in the classification tier. The classification functions operate on double precision floating point numbers, which may come from an aggregation function node, a terminal node, or another classification function node. The aggregation functions, in contrast, are prohibited from becoming the root node. It should also be pointed out that the aggregation layer does not permit any functions as its child nodes, due to our grammar constraint; the only children which can be attached to that layer are terminals, such as raw image inputs. Therefore the Aggregation tier is always located at the bottom level of a program tree, but above the input terminals. Grammatically, a program tree could have no aggregation tier but just the classification tier. However, such a tree is not able to receive images as its input, hence it can not survive during the evolution due to its poor performance.

Once an image is sent to this GP tree for classification, numeric values propagate from the bottom to the top. The output from the root is then used as the decision parameter: all positive returns from the tree are considered one class, while an output equal to or below zero is treated as the other class. In this study we only concern ourselves with binary classification.

3.2. Function set

Both types of functions are a part of the function set of Two-Tier GP. All functions produce a numeric output: for aggregation functions it is transformed from an image, and for classification functions it is transformed from a couple of numeric inputs. More details of the aggregation functions and the classification functions are discussed below.

Our GP function set contains the functions listed in Table 1 as the classification functions. They are the building blocks of the classification tier. Four of the classification functions are arithmetic operators which take two double numbers as their input parameters and return the corresponding result in double. The division operator ÷ is protected: it returns zero if the denominator happens to be zero. The conditional IF function returns the value of the second parameter if the value of the first parameter is negative; otherwise, it returns the value of the third parameter.

Table 1
Classification functions.

Function    Input parameters           Return
+           double, double             Double
−           double, double             Double
×           double, double             Double
÷           double, double             Double
IF          double, double, double     Double

The aggregation functions take images as inputs in raw pixel values, each of which is a 2D array of pixel intensity values. The output from such a function, corresponding to one 2D input image, is one numeric value. A function of this kind is actually performing dimensionality reduction, since it aggregates multiple pixel intensity values into a single value. These functions can therefore be viewed as feature extractors of which the major purpose is reducing data dimensionality. However, the exact behavior of these functions is not predefined but rather established as the result of GP evolution. We expect these different functions to automatically locate different regions which are relevant. All these functions on the same tree operate on the same input image; in Fig. 2 the images under the different functions are identical. This setting is very similar to the related work on GP feature extraction, GP image classification and our previous works (Zhang et al., 2000, 2003, 2008; Song & Ciesielski, 2004).

In terms of aggregation functions, we propose three sets. The basic set is named 2TGP, in which a function takes a square sub-image as input. The second one, 2TGP-line, is a variation of 2TGP: it reads in lines, or 1-D arrays of pixels, as input. On top of these two, the third set, 2TGP-mix, accepts additional shapes like circles and rectangles.
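The classification-tier primitives of Table 1 and the sign-based decision rule can be sketched as follows. This is an illustrative Python rendering under names of our own choosing, not the authors' implementation.

```python
# Protected division: returns 0 when the denominator is zero, so the
# evolved program never raises a division error.
def protected_div(a, b):
    return 0.0 if b == 0 else a / b

# Conditional IF: second argument when the condition is negative,
# third argument otherwise (as described for Table 1).
def if_func(cond, then_val, else_val):
    return then_val if cond < 0 else else_val

# Sign-based decision on the root output: positive -> one class,
# zero or negative -> the other class.
def decide(root_output):
    return 'positive-class' if root_output > 0 else 'negative-class'

print(protected_div(6.0, 0.0))    # 0.0 rather than an exception
print(if_func(-1.0, 10.0, 20.0))  # 10.0
print(decide(0.0))                # boundary value falls in the negative class
```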

3.3. Aggregation functions: 2TGP

The aggregation functions for 2TGP are listed in Table 2, which also shows the input data types and the output data type. These five functions all have four input parameters. The first parameter is the original input image for classification. The second and third are X, Y coordinates in integers. The fourth is also an integer, which indicates the size S. These functions sample a square sub-image of size S × S, with the position (X, Y) as its top-left corner; up to this point the behaviors of the five functions are identical. They then respectively calculate the mean, the median, the standard deviation, the maximum and the minimum of the sampled sub-image, and return the result as a double precision floating point number. Functions like mean and standard deviation are widely used, for example in Smart and Zhang (2003) and Oechsle and Clark (2008). The effectiveness of functions min, max and median has been shown as well (Atkins et al., 2011).

Table 2
Aggregation functions (2TGP).

Function    P1 (Image)   P2 (X)    P3 (Y)    P4 (Size S)   Return
AggMean     Image        Integer   Integer   Integer       Double
AggMed      Image        Integer   Integer   Integer       Double
AggStDev    Image        Integer   Integer   Integer       Double
AggMax      Image        Integer   Integer   Integer       Double
AggMin      Image        Integer   Integer   Integer       Double

The parameter values X, Y and S are input from corresponding terminals. To avoid out-of-range errors, they are not allowed to go beyond the image boundaries: X is always in the range [0, Imagewidth], Y is never outside of [0, Imageheight], and S lies in [3, min(Imagewidth, Imageheight)]. The smallest possible value for size S is 3, as a sub-image smaller than 3 × 3 can not sufficiently represent the entire image. The sampling window will be truncated if it exceeds the image boundaries.

These aggregation functions operate in a way similar to feature extraction in the literature. However, it should be pointed out that there is not just one such function on a GP tree, but a multitude of them. The responsibility of GP evolution here is more or less to find the most prominent sub-images, with an optimal size at the optimal position, so that the characteristics of original images in different classes can be captured. Furthermore, due to the small number of pixels in these sampling windows, the execution of GP programs is computationally less intensive.

3.4. Aggregation functions: 2TGP-line

One might think that square sampling windows, which are not necessarily the most suited for all problems, would not be enough to catch important characteristics of input images, so we expect a collection of horizontal lines and vertical lines to be more flexible in capturing the most important regions. The aggregation functions used for 2TGP-line are shown in Table 3. They are similar to those for 2TGP in Table 2, except for one extra input parameter: the 4th parameter randomly provides an enumerated value, either "column" or "row", to specify the shape of the sampling region under an aggregation function. When the value is "column", the function samples a column of pixels as a vertical 1-D array of length S. For value "row", the function samples a horizontal 1-D array of length S from the original image to perform the corresponding calculation. The thickness of sampled sub-images, either in "row" shape or in "column" shape, is just 1.

This is illustrated in Fig. 3, which contains two cases of sampling. The first case, the blue horizontal bar, is a shape in "row" started at X = 1 and Y = 2, with size S = 10. The sampling window is truncated here to prevent readings outside of the image boundaries; only 6 pixels, not 10 pixels, are included in this case. The second case, the red vertical bar, is a shape in "column" started at X = 4 and Y = 5, with size S = 9.

3.5. Aggregation functions: 2TGP-mix

There are not many differences in aggregation functions between 2TGP-mix and 2TGP-line; they are also shown in Table 3. There are five possible values for the 4th parameter in 2TGP-mix: "square", "column", "row", "circle" and "rectangle". The first three behave as in 2TGP and 2TGP-line: they sample an S × S square, or a vertical or horizontal array of size S, respectively.

When the shape is "circle", the pair of coordinates defines the center of the circle; for the other shapes it is the top-left corner of the sampling window. The Bresenham circle algorithm is used to generate a circle of which the diameter is S (Hearn & Baker, 1994). The calculations of mean, median and the three other values are then based on the pixels under the circle but within the image boundaries; the areas beyond the image boundaries are not used. Fig. 4 is an example of that: the coordinates are (5, 4) and the size S is 7. All pixels in gray participate in the calculation of the aggregation functions.

For shape "rectangle", the size S is unused. Instead, two extra values are generated to determine the width and height of the window, and the sampling window takes (X, Y) as its top-left corner. Fig. 5 illustrates two examples: the left rectangle starts at (3, 1) with a size of 4 × 7, while the right rectangle starts at (6, 4) with a size of 8 × 3. The motivation behind this shape is to enable sampling windows that are not square, and hence more expressive than the 2TGP sampling windows. The limits on parameters X, Y and the size for 2TGP apply here as well; for an aggregation function operating under "circle", the maximum for size S is min(Imagewidth, Imageheight). The combination of these five shapes may provide more opportunities to construct an optimal sampling region which would lead to high accuracies in classification. The intention of this function set is to see whether introducing more shapes is beneficial or not.

3.6. Terminal set

The terminal set in conventional GP methods for image classification usually just includes a random number generator and nodes for receiving feature values as the inputs. However, the terminal set in our methodology is a little more complicated due to the Two-Tier program structure. The terminals are shown in Table 4. The "RandDoub" terminal is the usual random number generator; it is the terminal for the classification functions. The other terminals are all for the aggregation functions described in the previous sub-sections. The Image terminal is the input of GP classifiers, which is the current image under classification; it reads in the original image in raw pixels as a 2D-array. Terminal Size returns a value in between 3 and the minimum of the image width and height. Terminals X, Y are responsible for generating random integers as coordinates for the aggregation functions. The last terminal, Shape, is not used by 2TGP. Under the 2TGP-line method, it randomly returns one of two values: "column" and "row". Under the 2TGP-mix method, it has five possible return values: "square", "column", "row", "circle" and "rectangle". The aggregation function attached to this terminal behaves differently according to the return value from this terminal.
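The boundary-truncated sampling behind these aggregation functions can be sketched as follows. This is a hedged illustration covering the "square", "row" and "column" shapes only; the function and parameter names are ours, not the paper's.

```python
# Sample pixels from a 2D intensity array, truncating the window at the
# image boundaries. "square" uses (x, y) as the top-left corner of an
# S x S window; "row"/"column" sample a 1-pixel-thick line of length S.
def sample(image, x, y, size, shape="square"):
    h, w = len(image), len(image[0])
    if shape == "square":
        rows, cols = range(y, min(y + size, h)), range(x, min(x + size, w))
    elif shape == "row":
        rows, cols = [y], range(x, min(x + size, w))
    elif shape == "column":
        rows, cols = range(y, min(y + size, h)), [x]
    else:
        raise ValueError(f"unsupported shape {shape!r}")
    return [image[r][c] for r in rows for c in cols]

# One aggregation function built on the sampler (the mean; median,
# standard deviation, max and min would follow the same pattern).
def agg_mean(image, x, y, size, shape="square"):
    pixels = sample(image, x, y, size, shape)
    return sum(pixels) / len(pixels)

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(agg_mean(img, 0, 0, 2))            # mean of {10, 20, 40, 50} = 30.0
print(agg_mean(img, 1, 0, 5, "column"))  # column truncated to {20, 50, 80} -> 50.0
```

Note how an oversized window is silently clipped rather than rejected, mirroring the truncation behavior described for Fig. 3.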

Table 3
Aggregation functions (2TGP-line and 2TGP-mix).

Function    P1 (Image)   P2 (X)    P3 (Y)    P4 (Shape)   P5 (Size S)   Return
AggMean     Image        Integer   Integer   Enum         Integer       Double
AggMed      Image        Integer   Integer   Enum         Integer       Double
AggStDev    Image        Integer   Integer   Enum         Integer       Double
AggMax      Image        Integer   Integer   Enum         Integer       Double
AggMin      Image        Integer   Integer   Enum         Integer       Double

Table 4
Terminal set.

Terminal    Type      Description
RandDoub    Double    Randomly generates a number in between [0, 1]
Image       Image     2D-array containing raw image pixel values
Size        Integer   Size of the sampling window
X, Y        Integer   Returns a value as a coordinate
Shape       Enum      Returns one from {square, column, row, circle, rectangle}

Fig. 3. Examples of input in 2TGP-line.

Fig. 4. Examples of circular input in 2TGP-mix.

Fig. 5. Examples of rectangular input in 2TGP-mix.

3.7. Fitness measure

An essential part of evolutionary methods is the fitness measure. Obviously a GP program with higher accuracy should be assigned a better fitness, and hence be more likely to participate in the process of reproducing the next generation. In the case of image classification, the performance criterion is quite straightforward: the classification accuracy, as shown in Eq. (1):

Fitness = Classification Accuracy = (TP + TN) / TOTAL × 100%    (1)

where TP is the total number of True Positives, i.e. the positive examples correctly classified as positive; TN is the total number of True Negatives, i.e. the negative examples correctly classified; and TOTAL is the total number of examples in the dataset. This classification accuracy measure is not only used in evolving GP classifiers on the training set, but also in evaluating the performance of evolved classifiers on test data.

4. Image classification tasks

To evaluate the performance of our proposed Two-Tier GP methodology, four different sets of binary image classification

tasks are introduced. Furthermore, a group of manually defined features is also presented along with each dataset. These features are used for comparison with the Two-Tier GP approaches, which do not use any pre-defined features. The tasks are described in this section in order of problem difficulty.

4.1. Coins

The first dataset is Coins (Smart & Zhang, 2003), which is considered a relatively easy problem. This dataset consists of 384 grey-scale images of size 55 × 55. Fig. 6 shows some samples of the two classes: Head and Tail. Note that the coins of both classes are placed at different angles to prevent the classifiers from becoming specific to a particular coin position.

Fig. 6. Examples of head (top row) and tail (bottom row).

4.1.1. Features for coin images

There were ten domain-specific features manually designed for the coin problem. As shown in Fig. 7, these coin images are divided into small regions for feature extraction. The features are the Mean and Standard Deviation of pixel intensities for the four quadrants ABED, BCFE, DEHG and EFIH, and the central square JKML, respectively.

Fig. 7. Pre-defined features for coin images.

4.2. Face detection

The second dataset is the MIT Faces dataset (Sung, 1995), which contains 60000 grey-scale examples of size 19 × 19 that fall into two classes: face and non-face. The difficulty of this task is considered medium because of the similarity between face images. For example, dark regions are always present around the areas of the eyes on face images, but not on non-face images. Fig. 8 shows some examples of both classes.

Fig. 8. Examples of face images (top row) and non-face images (bottom row).

4.2.1. Features for face detection

The domain-specific features for face images are designed based on the work of Urvesh et al. (2010). As shown in Fig. 9, an image is divided into seven regions: the quadrants ABED, BCFE, DEHG and EFIH, and three specific areas ACML, JKON and PQSR which represent the eyes, the nose, and the mouth respectively. The Mean and Standard Deviation of these seven areas are calculated to form the fourteen features for this task.

Fig. 9. Pre-defined features for face detection.

4.3. Cell recognition

The third dataset is Microscopic Cells (Lezoray, Elmoataz, & Cardot, 2003). The original dataset consists of 3900 color images of eighteen classes, manually categorized by domain experts. Since our focus is binary classification, only two classes are selected for this investigation: Lymphocytes non activés and Mésothéliales. There are in total 1297 instances, which have been converted into grey-scale images of size 22 × 22 pixels. Fig. 10 shows examples of both classes. This task is considered a hard problem.

Fig. 10. Examples of lymphocytes non activés (top row) and mésothéliales (bottom row).

4.3.1. Features for cell image classification

The manually designed features for the cell problem are similar to those for the coins dataset. Ten features (Mean and Standard Deviation) have been extracted from five different areas as shown in Fig. 11: the quadrants ABED, BCFE, DEHG and EFIH, and the central square area JKLM.

Fig. 11. Pre-defined features for cell image classification.

4.4. Pedestrian detection

The fourth dataset is for pedestrian detection (Munder & Gavrila, 2006). This dataset consists of 10002 grey-scale examples of size 18 × 36. It is a difficult task because both classes, pedestrian images and non-pedestrian images, contain large variations. Fig. 12 shows some examples.

Fig. 12. Examples of pedestrian (top row) and non-pedestrian (bottom row).

4.4.1. Features for the pedestrian detection problem

The domain-specific features are based on the work of Atkins et al. (2011). There are in total 22 features (Mean and Standard Deviation) extracted from eleven regions, as shown in Fig. 13. These regions are: the octets ABED, BCFE, DEHG, EFIH, GHKJ, HILK, JKMN and KLON, and three middle areas PQSR, RSUT and TUWV, which roughly cover the head, the torso and the legs of a pedestrian.

Fig. 13. Pre-defined features for pedestrian detection.

5. Experiments and results

The experiments are presented in this section, which includes the data preparation, the other methods used for comparison, and the GP run time parameters. The experimental results obtained from the four tasks are listed and discussed at the end of this section.

5.1. Data preparation

For each of the four tasks, we split the dataset into two halves. The first half is for evolving the classifiers, while the second half is for testing the performance of the best GP classifier trained by the evolution.
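The pre-defined features used throughout this section are simply the Mean and Standard Deviation of pixel intensities over fixed regions. The sketch below illustrates the idea; the region coordinates are arbitrary examples of our own, not the actual quadrants from the figures.

```python
import statistics

# Mean and population standard deviation of a rectangular region
# (top-left (x, y), width w, height h) of a 2D intensity array.
def region_features(image, x, y, w, h):
    pixels = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return statistics.mean(pixels), statistics.pstdev(pixels)

img = [[0, 0, 255, 255],
       [0, 0, 255, 255]]
# Two hypothetical "quadrants" give two features each: 4 features in total
left = region_features(img, 0, 0, 2, 2)   # uniform dark region: mean 0, std 0
right = region_features(img, 2, 0, 2, 2)  # uniform bright region: mean 255, std 0
print(left, right)
```

Ten coin features, fourteen face features and so on are just this computation repeated over the fixed region layouts of Figs. 7, 9, 11 and 13.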

 à Means being significantly > 3TGP. feature extraction and classification.2. Gabriel Balan. The ratios of new individuals generated by crossover. T-test has been conducted on pairs of one Two-Tier GP method with the FeEx + GP method. Their accuracies in training and test are comparable to that of the FeEx + GP approach. There are 30  5 methods  4 datasets = 600 runs in total.. There are four blocks stacked together in the table. 19% and 1% respec- . Each population consists of 1024 individual GP classifiers to reduce the chance of early convergence. The results of classifying blood cells show that FeEx + GP is the worst performer this time. The outcomes are presented using small marks in Table 7:  § Means being significantly < FeEx + GP. The initial populations are created by ramped half-and-half method. So the column and row approach can not capture the characteristics as well as the human designed features used by FeEx + GP. 5.4. the maximum and the mean training/test accuracies obtained from the 30 runs for each task are presented along with the respective method. Parameter Generations Population size Crossover rate Mutation rate Elitism rate Tree depth Selection type Tournament size Value 50 1024 0. For the task of face detection. which consist of tiers for image filtering. all the methods performed well except 3TGP. GP run time parameters All the methods applied on the four classification tasks involved GP. The random seeds for each of the 30 runs in one set of experiments are all different. from coins problem to pedestrian detection. This suggests that the pre-defined features might not be very suited for this problem. In other words. mutation and elitism are 80%. The depth of each individual is kept in between 2 and 10. Then the training and test are both conducted based on the feature data. It was clearly outperformed by the three Two-Tier methods. Liviu Panait. However 2TGP-line could not match with FeEx + GP. 
The second method for comparison is another classification method based on GP. It seems what is required in this problem is some form of blob information which can be obtained by squares. Each evolutionary process stops at the maximum generation 50 unless a perfect classifier with accuracy 100% is found. It is proposed by Atkins et al. The two Two-Tier approaches. 5. This is quite likely because the features were designed based on human hypothesis rather than built on the true nature of cell images. image division and so on. 2TGP-line and evolution.01 2–10 Tournament 7 tively. two other methods are applied on the same four datasets for comparison. In this case both 2TGP-line and 2TGP-mix achieved the highest training and test accuracies. Each block is the outcomes for one classification task. Training set Pos Coins Faces Blood cells Pedestrians 96 1500 349 2501 Neg 96 1500 299 2501 Total 192 3000 648 5001 H. feature extraction then classification. 5. The GP implementation used in all the experiments is the Evolutionary Computing Java-based (ECJ) package (Luke. All the automatic feature discovery methods here achieved higher accuracies in both training and test. The classification method is the canonical GP as GP is able to match other non-evolutionary classification methods (Espejo et al. To confirm the significance of the differences showing in Table 7. Therefore 2TGP and 2TGP-mix are able to perform well. A three-tier structure is used by the evolved classification programs. 3TGP is again the worst performer. Among the three Two-Tier methods. The sizes of these datasets are listed in Table 5. tournament selection is applied to pick good individuals for reproducing new generations.    Means being significantly > FeEx + GP. At the end of each training. Therefore each method receives 30 training accuracies and 30 corresponding test accuracies on one dataset. not on the raw images. 
the best program evolved from the last generation is then evaluated against the test set. 3TGP To evaluate the effectiveness of the Two-Tier approaches. Every training is repeated 30 times for each method for each dataset. For the comparison purpose all the GP run time parameters in these methods are made identical. The standard deviation of the 30 accuracies is also listed. (2011) whose aim is also to establish a domain independent classification method which does not require pre-defined image features. namely ‘‘2TGP’’. There is no class imbalance between the positives and the negatives to bias the final result. The above two methods are compared to the three variations of Two-Tier GP.80 0. The first is the conventional two-step approach. although 3TGP also managed to achieve perfect training and test in some runs. which are listed in Table 6. The left half of the table is for training while the right half is for the corresponding tests.19 0. Atkins et al. circles or rectangles. 2010). These functions aim to enhance the input images for subsequent processes. ‘‘2TGP-line’’ and ‘‘2TGP-mix’’ which use slightly different sampling windows to process input images as discussed in Section 3. This feature-based approach is referred as the ‘‘FeEx + GP’’ method. Al-Sahaf et al. & Zbigniew Skolicki. One of the reasons is that ECJ can be easily extended to support the multi-tier structures. The 2TGP method was not so accurate. The columns Pos and Neg show how many positive examples and negative examples each dataset has. Domain dependent features described in Section 4 are extracted for each dataset. For the coin problem. Functions of image filtering support operations like image subtraction. and on pairs of one Two-Tier GP method with the 3TGP method. Results and discussions The experimental results from the 600 runs of the five methods are summarized in Table 7.3. 
these two methods are able to automatically define certain type of features which can compete with those pre-defined features. Such results suggest the Two-Tier approaches are able to define features which can lead to good classification outcomes. The minimum. Methods for comparison: FeEx + GP. 2010). However its performance is still comparable to that of FeEx + GP. For instance the use of median filter can remove the salt and pepper noise. named as ‘‘3TGP’’.12298 Table 5 Datasets for the four tasks. 2TGP and 2TGP-mix performed similarly. To maintain population diversity. One possible reason is that both heads and tails on coin images do not contain many lines or edges. Sean Paus. / Expert Systems with Applications 39 (2012) 12291–12301 Test set Pos 96 1500 349 2500 Neg 96 1500 300 2500 Total 192 3000 649 5001 Table 6 GP run time parameters.
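The t-tests above compare two sets of 30 run accuracies at a time. A back-of-the-envelope sketch of the two-sample (Welch) t statistic underlying such a comparison is shown below; the experiments would of course use a full test with p-values, and the toy accuracy lists are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's two-sample t statistic for two lists of run accuracies."""
    nx, ny = len(xs), len(ys)
    vx, vy = variance(xs), variance(ys)  # sample variance (ddof = 1)
    return (mean(xs) - mean(ys)) / sqrt(vx / nx + vy / ny)

# Toy example: two sets of five "accuracies" differing by a constant offset.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(a, b))  # → -1.0 (mean difference of -1 over a standard error of 1)
```

The resulting statistic (with the Welch degrees of freedom) is then compared against the t distribution to decide whether a mark such as § or ⁎ is warranted.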

Table 7
Results from the four datasets. For each method the minimum, maximum and mean accuracies (%) over the 30 runs are given together with the standard deviation; the left half reports training and the right half the corresponding tests, with the significance marks from the t-tests attached. The four blocks correspond to the Coins, face detection, blood cell classification and pedestrian detection tasks, and the five rows of each block to FeEx + GP, 3TGP, 2TGP, 2TGP-line and 2TGP-mix.

In the pedestrian detection problem, the Two-Tier approaches again achieved high accuracies, and their performance is consistent. Among the three Two-Tier methods, 2TGP and 2TGP-mix performed similarly; both are significantly better than FeEx + GP and 3TGP, and 2TGP-mix stays the best method. Such results suggest that the Two-Tier approaches are able to define features which can lead to good classification outcomes.

From what is presented in Table 7, we can see that the difficulties of these classification problems gradually increase, as indicated by the drop in accuracies from the coin problem to pedestrian detection. However, the Two-Tier approaches are still able to reach good accuracies. Additionally, the classifiers evolved by the Two-Tier approaches do not differ much in performance: the standard deviations of their training and test results are not big, while higher deviations are found for FeEx + GP and 3TGP on the relatively difficult problems.

6. Program analysis

One issue embedded in the GP paradigm is the understandability of evolved solutions: the program trees are usually difficult to interpret despite their good performance. In this part of the study we aim to manually analyze some of the classifiers generated from the experiments in Section 5.

Fig. 14(a) shows a classifier trained for the coins dataset by the 2TGP-mix method. It scored 100% accuracy in both training and test. The three aggregation functions on the tree happen to all be standard deviation functions operating on a square. We name them F1, F2 and F3, from left to right, for the sake of analysis. The sampling windows under them are marked as sub-figures (b) to (d) in Fig. 14. Function F1 operates on the middle area of coin images; functions F2 and F3, on the contrary, operate on the rim areas of coins. A close inspection of the classifier reveals that the program tree computes the following equation:

2 × F1 − F2 − F3

So this classifier is actually calculating the difference between a coin's central area and its rim areas in terms of the standard deviation. This could well be the defining feature to separate heads from tails, and probably explains the high accuracy of this classifier.

Fig. 14. An evolved coin classifier with its sampling regions.
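The evolved expression 2 × F1 − F2 − F3 can be written out directly. The sketch below re-implements that coin classifier; the window coordinates and the sign-based decision threshold are illustrative assumptions, since the paper's figure, not the text, carries the exact geometry.

```python
from statistics import pstdev

def window_std(img, top, left, size):
    """Standard deviation of pixel intensities in a square sampling window."""
    vals = [img[i][j] for i in range(top, top + size)
                      for j in range(left, left + size)]
    return pstdev(vals)

def coin_score(img):
    # F1: central area; F2, F3: rim areas (illustrative coordinates on 8 x 8).
    f1 = window_std(img, 3, 3, 2)
    f2 = window_std(img, 0, 0, 2)
    f3 = window_std(img, 6, 6, 2)
    return 2 * f1 - f2 - f3           # the evolved tree: 2*F1 - F2 - F3

# A coin-like toy image: textured centre, flat rim.
img = [[100] * 8 for _ in range(8)]
img[3][3], img[3][4], img[4][3], img[4][4] = 0, 255, 255, 0
print(coin_score(img) > 0)  # → True: the centre varies much more than the rim
```

A large positive score therefore means "high texture in the centre relative to the rim", exactly the contrast the analysis attributes to the heads/tails distinction.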

Fig. 15(a) shows one of the best classifiers evolved for the face detection task using 2TGP-mix. The training and test accuracies were 97.69% and 97.07% respectively. This tree is more difficult to understand than the tree in Fig. 14(a) because it has more function nodes and uses a mixture of different aggregation functions. However, by analyzing its sampling regions we can still gain some insights. Fig. 15(b)-(h) show the regions selected by the aggregation functions. It can be seen that the focus of this classifier is around the eyes and the nose. Obviously these areas are among the most distinctive regions of human faces, and features around them would support good classification. Moreover, the feature used by this classifier is quite concise: only a small proportion of the image is used to reach the decision. The excellent performance of this classifier indicates that the majority of the image is not necessary for the classification. In comparison, manually designed features tend to cover most of the pixels on an image; domain experts would not be confident in removing most of the pixels from the feature calculation. The opposite is often true for the evolved classifiers.

Fig. 15. Sample program (faces): (a) tree; (b)-(h) feature regions.

The classifiers evolved for the blood cell problem and for pedestrian detection are more complex to interpret. However, features are still expected to be captured by these programs. For example, the sampling regions of a pedestrian classifier are shown in Fig. 16. It was one of the best during the experiment and achieved 89.69% at test. We can see that it heavily relies on areas where the standing body of a pedestrian is likely to appear. Most of the regions found by GP are vertical bars, which may be more suitable for this particular domain. Although no human intervention nor domain knowledge is involved, these programs are still able to find prominent regions. Such performance is not by chance, as genuine features can be revealed by analysis of these evolved classifiers.

Fig. 16. Sample program (pedestrians): feature regions.

Note that one of the stumbling blocks in tree analysis is the bloat problem: trees growing unnecessarily large during the evolution. One may simplify these programs to remove the redundant nodes on the tree; however, combating bloat and simplifying evolved programs are not in the scope of this work.

7. Conclusions

This paper proposes a Two-Tier GP methodology for image classification. The evolved GP classifiers take raw pixel values directly as the inputs. These classifiers have a tier of aggregation functions, which transform an image into single numeric values, and a tier of classification functions, which transform the outputs of the aggregation functions into a class label. Three different variations of this approach are introduced, 2TGP, 2TGP-line and 2TGP-mix, which sample images in slightly different ways. These three methods have been compared with a traditional approach which performs classification on manually defined feature values, FeEx + GP, and with another GP method for directly classifying raw images, 3TGP. The comparisons are conducted on four sets of image classification tasks.

The results show that the proposed Two-Tier GP method is able to achieve better or at least comparable performance compared with classification based on pre-defined features. All three representations, 2TGP, 2TGP-line and 2TGP-mix, performed well on the four datasets. In comparison, the Three-Tier method is not as effective, possibly due to the excessive tier for image filtering. With the flexibility of using a range of sampling windows, important features are indeed automatically discovered by the Two-Tier approach. In a way, implicit and effective feature discovery is achieved through this Two-Tier GP method.
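The two-tier structure summarized above, an aggregation tier turning pixel windows into numbers and a classification tier combining those numbers into a label, can be sketched as a small expression-tree interpreter. The node names, window coordinates and sign-based decision rule below are illustrative assumptions rather than the exact function set of 2TGP.

```python
from statistics import mean, pstdev

AGG = {"mean": mean, "std": pstdev}  # aggregation tier primitives

def evaluate(node, img):
    """Recursively evaluate a two-tier program tree on a raw image."""
    op = node[0]
    if op == "const":                  # numeric constant terminal
        return node[1]
    if op in AGG:                      # aggregation node: (op, top, left, h, w)
        _, t, l, h, w = node
        vals = [img[i][j] for i in range(t, t + h) for j in range(l, l + w)]
        return AGG[op](vals)
    a, b = (evaluate(k, img) for k in node[1:])   # classification tier
    return {"+": a + b, "-": a - b, "*": a * b}[op]

def classify(tree, img):
    # Illustrative decision rule: non-negative output -> class 1, else class 0.
    return 1 if evaluate(tree, img) >= 0 else 0

# The coin classifier of Section 6 (2*F1 - F2 - F3) written as such a tree,
# with hypothetical 8 x 8 window coordinates:
coin_tree = ("-", ("-", ("*", ("const", 2), ("std", 3, 3, 2, 2)),
                        ("std", 0, 0, 2, 2)),
                  ("std", 6, 6, 2, 2))
print(classify(coin_tree, [[5] * 8 for _ in range(8)]))  # → 1 (flat image scores 0)
```

Evolution then amounts to searching over such trees, with the window positions and sizes evolved alongside the arithmetic that combines them.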

In the near future we will further analyze the evolved programs to see the underlying reasons why these Two-Tier programs are accurate in classification. The knowledge revealed from this analysis would be helpful in solving more complex problems by GP. Moreover, new components for Two-Tier GP will be proposed to enhance the applicability and performance of this method.

References

Atkins, D., Neshatian, K., & Zhang, M. (2011). A domain independent genetic programming approach to automatic feature extraction for image classification. In Proceedings of the 2011 IEEE congress on evolutionary computation, 5-8 June 2011. New Orleans, USA: IEEE Computational Intelligence Society.

Bhowan, U., Johnston, M., & Zhang, M. (2010). Genetic programming for classification with unbalanced data. In A. I. Esparcia-Alcázar, A. Ekárt, S. Silva, S. Dignum, & A. Sima Etaner-Uyar (Eds.), EuroGP, Lecture notes in computer science (Vol. 6021, pp. 1-13). Springer.

dos Santos, J. A., Ferreira, C. D., da Silva Torres, R., Gonçalves, M. A., & Lamparelli, R. A. (2011). A relevance feedback method based on genetic programming for classification of remote sensing images. Information Sciences, 181, 2671-2684.

Espejo, P. G., Ventura, S., & Herrera, F. (2010). A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 40(2), 121-144.

Fu, W., Johnston, M., & Zhang, M. (2011). Genetic programming for edge detection: A global approach. In IEEE congress on evolutionary computation (pp. 254-261). IEEE.

Guo, H., Jack, L. B., & Nandi, A. K. (2005). Feature generation using genetic programming with application to fault classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(1), 89-99.

Guo, H., & Nandi, A. K. (2006). Breast cancer diagnosis using genetic programming generated feature. Pattern Recognition, 39(5), 980-987.

Hearn, D., & Baker, M. P. (1994). Computer graphics (2nd ed.). Prentice Hall.

Kowaliw, T., Banzhaf, W., Kharma, N., & Harding, S. (2009). Evolving novel image features using genetic programming-based image transforms. In IEEE congress on evolutionary computation (pp. 2502-2507). IEEE.

Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA, USA: MIT Press.

Lam, B., & Ciesielski, V. (2004). Discovery of human-competitive image texture feature extraction programs using genetic programming. In K. Deb et al. (Eds.), GECCO (Vol. 2, pp. 1114-1125). Springer.

Lezoray, O., Elmoataz, A., & Cardot, H. (2003). A color object recognition scheme: Application to cellular sorting. Machine Vision and Applications, 14, 166-171.

Lin, Y., & Bhanu, B. (2005). Object detection via feature synthesis using MDL-based genetic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(3), 538-547.

Luke, S. (2010). ECJ: A Java-based evolutionary computation research system (with G. Balan, L. Panait, S. Paus, Z. Skolicki, et al.). Available from <http://cs.>.

Munder, S., & Gavrila, D. M. (2006). An experimental study on pedestrian classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1863-1868.

Oechsle, O., & Clark, A. F. (2008). Feature extraction and classification by genetic programming. In ICVS (pp. 131-140).

Olague, G., & Trujillo, L. (2011). Evolutionary-computer-assisted design of image operators that detect interest points using genetic programming. Image and Vision Computing, 29(7), 484-498.

Pasolli, E., Melgani, F., Donelli, M., Attoui, & de Vos, et al. (2008). Automatic detection and classification of buried objects in GPR images using genetic algorithms and support vector machines. In IGARSS (2) (pp. 525-528). IEEE.

Pinto, B., & Song, A. (2009). Detecting motion from noisy scenes using genetic programming. In Proceeding of the 24th international conference image and vision computing New Zealand, IVCNZ '09 (pp. 322-327). Wellington: IEEE.

Poli, R. (1996). Genetic programming for feature detection and image segmentation. Technical report, School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.

Poli, R., Langdon, W. B., & McPhee, N. F. (2008). A field guide to genetic programming. Lulu.com.

Smart, W., & Zhang, M. (2003). Classification strategies for image classification in genetic programming. In Proceeding of image and vision computing conference (pp. 402-407).

Smart, W., & Zhang, M. (2005). Using genetic programming for multiclass classification by simultaneously solving component binary classification problems. In EuroGP, Lecture notes in computer science (pp. 227-239). Springer.

Song, A., & Ciesielski, V. (2008). Texture segmentation by genetic programming. Evolutionary Computation, 16(4), 461-481.

Song, A., Ciesielski, V., & Williams, H. (2001). Towards genetic programming for texture classification. In M. Stumptner, D. Corbett, & M. Brooks (Eds.), Australian joint conference on artificial intelligence, Lecture notes in computer science (Vol. 2256, pp. 461-472). Springer.

Song, A., Ciesielski, V., & Williams, H. (2002). Texture classifiers generated by genetic programming. In Proceedings of the 2002 congress on evolutionary computation, CEC '02 (Vol. 1, pp. 243-248). IEEE.

Sung, K. (1995). Learning and example selection for object and pattern detection. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA.

Szymanski, J. J., Brumby, S. P., Pope, P., Eads, D., Esch-Mosher, D., Galassi, M., et al. (2002). Feature extraction from multiple data sources using genetic programming. In S. S. Shen & P. E. Lewis (Eds.), Algorithms and technologies for multispectral, hyperspectral, and ultraspectral imagery VIII (Vol. 4725, pp. 338-345). SPIE.

Tackett, W. A. (1993). Genetic programming for feature discovery and image discrimination. In ICGA (pp. 303-311).

Zhang, M., Ciesielski, V., & Andreae, P. (2003). A domain independent window approach to multiclass object detection using genetic programming. EURASIP Journal on Applied Signal Processing, (8), 841-859.

Zhang, Y., & Rockett, P. (2005). Evolving optimal feature extraction using multiobjective genetic programming: A methodology and preliminary study on edge detection. In H.-G. Beyer & U.-M. O'Reilly (Eds.), GECCO (pp. 795-802). ACM.