Genetic Programming based Image Segmentation with Applications to Biomedical Object Detection

Tarundeep Singh Dhot, Nawwaf Kharma
Department of Electrical and Computer Engineering Concordia University, Montreal, QC H3G 1M8

t_dhot@encs.concordia.ca, kharma@ece.concordia.ca

Mohammad Daoud
Department of Electrical and Computer Engineering University of Western Ontario London, ON, N6A 3K7

mohammad.dauod@gmail.com

Rabab Ward
Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, V6T 1Z4

rababw@ece.ubc.ca

ABSTRACT
Image segmentation is an essential process in many image analysis applications and is mainly used for automatic object recognition purposes. In this paper, we define a new genetic programming based image segmentation algorithm (GPIS). It uses a primitive image-operator based approach to produce linear sequences of MATLAB® code for image segmentation. We describe the evolutionary architecture of the approach and present results obtained after testing the algorithm on a biomedical image database for cell segmentation. We also compare our results with those of another EC-based image segmentation tool called GENIE Pro. We found the results obtained using GPIS to be more accurate than those of GENIE Pro. In addition, our approach is simpler to apply and the evolved programs are available to anyone with access to MATLAB®.

1. INTRODUCTION
Image segmentation is the process of extracting objects of interest from a given image. It allows certain regions in the image to be identified as an object based on some distinguishing criterion, for example, pixel intensity or texture. It is an important part of many image analysis techniques, as it is a crucial first step of the imaging process and greatly impacts any subsequent feature extraction or classification. It plays a critical role in automatic object recognition systems for a wide variety of applications like medical image analysis [8, 9, 14, 15], geosciences and remote sensing [2, 3, 4, 5, 10, 11], and target detection [10, 11, 16]. However, image segmentation is an ill-defined problem. Even though numerous approaches have been proposed in the past [7, 12, 13], there is still no general segmentation framework that can perform adequately across a diverse set of images [1]. In addition, most image segmentation techniques exhibit a strong domain or application-type dependency [7, 12, 17]. Automated segmentation algorithms often include a priori information about their subjects [8], restricting the use of such well-designed segmentation techniques to a small set of imagery. In this paper, we propose a new, simple image segmentation algorithm called Genetic Programming based Image Segmentation (GPIS) that uses a primitive image-operator based approach for segmentation, and present results. The algorithm does not require any a priori information about the objects to be segmented other than a set of training images. In addition, the algorithm is implemented in MATLAB® and uses its standard image-function library. This allows easy access to anyone with MATLAB®. In the following sections, we provide a brief introduction to relevant work in GP based image segmentation and image analysis, followed by an overview of our approach in Section 1.3. Section 2 describes the methodology of our algorithm and the

Categories and Subject Descriptors
I.4.6 [Image Processing and Computer Vision]: Segmentation – pixel classification.

General Terms
Algorithms, Experimentation.

Keywords
Image Segmentation, Genetic Programming.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO’09, July 8–12, 2009, Montréal Québec, Canada. Copyright 2009 ACM 978-1-60558-325-9/09/07...$5.00.

experimental setup for compiling results. Finally, Section 3 presents the results of the experiments conducted on a biomedical image database for cell segmentation purposes. We also compare our results with another EC-based image segmentation algorithm called GENIE Pro.

1.1 Related Work
One of the initial works in this field was published by Tackett [16] in 1993. He applied GP to develop a processing tree capable of classifying features extracted from IR images. These evolved features were later used to construct a classifier for target detection. Along the same lines, in 1995, Daida et al. [5, 6] used GP to derive spatial classifiers for remote sensing purposes. This was the first time GP was used for image processing applications in geosciences and remote sensing. In 1996, Poli [14] proposed an interesting approach to image analysis based on evolving optimal filters. The approach viewed image segmentation, image enhancement and feature detection purely as a filtering problem. In addition, he outlined key criteria for building terminal sets, function sets and fitness functions for an image analysis application. In 1999, Howard et al. [10, 11] presented a series of works using GP for automatic object detection in real-world and military image analysis applications. They proposed a staged evolutionary approach for the evolution of target detectors or discriminators, which resulted in practical evolution times. In 1999, another interesting approach was proposed by Brumby et al. [4]. They used a hybrid evolutionary approach to evolve image extraction algorithms for remote sensing applications. These algorithms were evolved using a pool of low-level image processing operators. Along similar lines, Bhanu et al. [2, 3] used GP to evolve composite operators for object detection. These operators were synthesized from combinations of primitive image processing operations used in object detection. In order to control the code-bloat problem, they also proposed size limits for the composite operators. In 2003, Roberts and Claridge [15] proposed a GP based image segmentation technique for segmenting skin lesion images. A key feature of their work was the ability of the GP to generalize based on a small set of training images.
Our approach is motivated by the works of Tackett [16], Brumby et al. [4] and Bhanu et al. [2, 3]. They all effectively implemented a primitive image operator based approach for image analysis. This is similar to our approach. In addition, we have used the key criteria outlined by Poli [14] as references while building our algorithm.

and mutation. In order to compute the fitness of a pipeline, the resultant segmentation produced by the pipeline is compared to a set of training images. These training images are produced by manual labeling of pixels by the user as True (feature) or False (non-feature) pixels using an in-built mark-up tool called ALLADIN. Finally, when a run of GENIE Pro concludes, the fittest pipeline in the population is selected and combined using a linear classifier (Fisher Discriminant) to form an evolved solution that can be used to segment new images. GENIE Pro was developed for analyzing multispectral satellite data. It has also been applied to biomedical feature-extraction problems [9]. We have used it for comparison purposes.

1.3 Overview of Our Work
In this paper, we describe a new genetic programming based image segmentation algorithm, GPIS, that uses a primitive image-operator based approach for segmentation. Each segmentation algorithm can be viewed as a unique combination of image analysis operators that are able to extract the desired regions from an image. If we can describe a sufficient set of these image analysis operators, it is possible to build multiple segmentation algorithms that segment a wide variety of images. In GPIS, we define a pool of low-level image analysis operators. The GP searches the solution space for the best possible combination of these operators, i.e. the one that performs the most accurate segmentation. From now on, we refer to these image analysis operators as primitives. Each individual in a population is a combination of these primitives and represents an image segmentation program. Therefore, GPIS breeds a population of segmentation programs in order to evolve one accurate image segmentation program.

2. METHODOLOGY
The proposed algorithm GPIS is designed as a general tool for learning based segmentation of images. In this paper, particular attention is given to testing it on biomedical images. Our approach does not require a particular image format or size and works equally well on both color and grayscale images in any MATLAB® compatible format. For the purpose of learning, a directory with both input images and matching ground truths (GTs) must be provided. From this point onwards, we call this a training set. Every input image must have a corresponding GT of the same size and format. The GT image is a binary image showing the best assessment of the boundaries of the objects of interest; all pixels inside those boundaries are by definition object pixels and all pixels outside the boundaries are by definition non-object pixels. Pixels on the boundary itself are by definition also object pixels. GPIS has two stages of operation. Stage 1 is a learning phase in which GPIS uses the training set to evolve a MATLAB® program which meets a user-defined threshold of segmentation accuracy relative to the input images of the training set. In the second stage, this evolved individual is evaluated for its ability to segment unseen images of the same type as the training images. The accuracy results achieved here are from here on called validation accuracy. In a real world situation, due to the lack of GTs for unseen images, validation accuracy will take the form of the subjective assessment

1.2 GENIE Pro
GENIE Pro [4, 9] is a general purpose, interactive and adaptive GA-based image segmentation and classification tool. GENIE Pro uses a hybrid GA to assemble image-processing algorithms, or pipelines, from a collection of low-level image processing operators (for example, edge detectors, texture measures, spectral orientations and morphological filters). The role of each evolved pipeline is to classify each pixel as feature or non-feature. The GA begins with a population of random pipelines, performs fitness evaluation for each pipeline in the population and selects the fitter pipelines to produce offspring pipelines using crossover

of a human user. However, for this paper, the authors evaluate the quality, i.e. the validation accuracy, of the individuals evolved by GPIS by comparing their segmentation results to their matching GT images. We report the results of our evaluation in the Results section (Section 3) of this paper.

chromosome represents a complete MATLAB® segmentation program. There is a one-to-one mapping between the genome and the phenome as shown in Figure 2 (c). It also shows the representation of the knowledge structure used by the genetic learning system.
[Operator Name, Input Plane 1, Input Plane 2, Weights, SE/FP]

2.1 Stage 1: Learning phase of GPIS
GPIS operates in a typical evolutionary cycle in which a population of potential program solutions (each meant to segment images) is subjected to repeated selection and diversification until at least one of the individuals meets the termination criteria. The flowchart of the learning stage is presented in Figure 1.
[Figure 1 shows the learning-stage flowchart: Initialization → Fitness Evaluation → Termination Criteria check; if the criteria are not met, Parent Selection, Elitism, Genetic Diversification (parents copied, elite preserved, offspring produced), Survivor Aggregation (Σ) and Injection build the next generation; if they are met, the fittest individual is output.]

Figure 1. Flowchart of GPIS

(a) Gene: [Operator Name, Input Plane 1, Input Plane 2, Weights, SE/FP]

(b) Chromosome: [G1] [G2] [G3] [G4] [G5] ......... [Gn]

(c) Genome → Phenome (equivalent MATLAB® program):

    d1 = input;
    h1 = fspecial('disk', [6 6]);
    io1 = imfilter(d1, h1);
    SE1 = strel('square', 2);
    io2 = imerode(io1, SE1);
    io3 = imclose(io2, SE1);
    io4 = imadd(io2, io3);
    out = im2bw(io4, 0.55);

Figure 2. (a) Typical layout of a gene (b) Typical layout of a chromosome comprising n genes (c) One-to-one mapping of the genome and phenome

We use a pool of 20 primitive operators. Table 1 provides the complete list of all primitive image analysis operators in the gene pool along with the typical number of inputs required for each operator. Initialization creates a starting population for the GP. The initial population is randomly generated, i.e. chromosomes are formed by a randomly assigned sequence of operators. The genomic initialization is also random, i.e. parameter values of operators are assigned randomly, based on the operator type. For practical reasons, the size of each chromosome is limited to a maximum length of 15. In addition, at the time of initialization, the size of the population along with the values of the crossover and mutation rates are assigned by the user.
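The random initialization described above can be sketched in Python (an illustrative translation, not the authors' MATLAB® implementation; the gene fields follow Figure 2 (a) and the operator pool follows Table 1, while the helper names, parameter ranges and structuring-element choices are our own assumptions):

```python
import random

# Operator pool from Table 1: (name, number of input planes)
OPERATORS = [
    ("ADDP", 2), ("SUBP", 2), ("MULTP", 2), ("DIFF", 2),
    ("AVER", 1), ("DISK", 1), ("GAUS", 1), ("LAPL", 1), ("UNSHARP", 1),
    ("LP", 1), ("HP", 1),
    ("DIL", 1), ("ERODE", 1), ("OPEN", 1), ("CLOSE", 1),
    ("OPCL", 1), ("CLOP", 1),
    ("HISTEQ", 1), ("ADJUST", 1), ("THRES", 1),
]

MAX_GENES = 15  # chromosome length limit used by GPIS


def random_gene(rng):
    """A gene: [operator, input plane 1, input plane 2, weights, SE/FP]."""
    name, n_inputs = rng.choice(OPERATORS)
    return {
        "op": name,
        "in1": rng.randint(0, MAX_GENES - 1),  # index of an image plane (simplified)
        "in2": rng.randint(0, MAX_GENES - 1) if n_inputs == 2 else None,
        "weights": rng.random(),               # operator parameter value(s)
        "se_fp": rng.choice(["disk", "square", "line"]),  # SE shape or filter param
    }


def random_chromosome(rng):
    """A chromosome is a random-length sequence of random genes."""
    return [random_gene(rng) for _ in range(rng.randint(1, MAX_GENES))]


def init_population(mu, seed=0):
    """Create the starting population of mu random programs."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(mu)]
```

Keeping genes as small records of this shape makes the later diversification operators (swap, insert, delete, alter) simple list manipulations.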

2.1.1 Representation and Initialization
In our scheme, the genome of an individual encodes a MATLAB® program that processes an image. The input to the MATLAB® program is an image file and the output of the execution of the program is an image of the same size and format. This output image file is a segmented version of the input image.

The general layout of a gene is shown in Figure 2 (a). As seen in the figure, each gene specifies information about the primitive operator it encodes, the input images to the operator and the parameter settings for the operator. This corresponds to a few lines (1-3) of the equivalent MATLAB® program. The gene consists of five parts. The first part contains the name of the primitive operator, and the second and third parts contain the possible input images to the operator. Based on the nature of the primitive operator, a gene may have one or two input images. The fourth part contains weights or parameter values for the primitive operator, and the fifth part encodes the nature of the Structuring Element or SE (only in the case of morphological operations) or a secondary Filter Parameter or FP (only in the case of filter operators). The phenomic representation (chromosome) is a linear combination of the genes, as shown in Figure 2 (b). The

2.1.2 Fitness Evaluation
A segmented image consists of positive (object) and negative (non-object) pixels. Ideally, the segmentation of an image would result in an output image where positive pixels cover object pixels perfectly and negative pixels cover non-object pixels perfectly. Based on this idea, we can view segmentation as a pixel-classification problem. The task of the segmentation program then becomes the assignment of the right class to every pixel in the image. As such, we can apply measures of classification accuracy to the problem of image segmentation. Every segmentation program can be expected to identify not only pixels belonging to the objects of interest (True Positives, TPs), but will also miss some object pixels (False Negatives, FNs). Further, in addition to identifying non-object pixels (True Negatives, TNs), some pixels belonging to non-objects can be identified as object pixels

Table 1. Primitive image analysis operators in the gene pool

Operator   Description             Inputs   Operator Type
ADDP       Add Planes              2        Arithmetic
SUBP       Subtract Planes         2        Arithmetic
MULTP      Multiply Planes         2        Arithmetic
DIFF       Absolute Difference     2        Arithmetic
AVER       Averaging Filter        1        Filter
DISK       Disk Filter             1        Filter
GAUS       Gaussian Filter         1        Filter
LAPL       Laplacian Filter        1        Filter
UNSHARP    Unsharp Filter          1        Filter
LP         Lowpass Filter          1        Filter
HP         Highpass Filter         1        Filter
DIL        Image Dilate            1        Morphological
ERODE      Image Erode             1        Morphological
OPEN       Image Open              1        Morphological
CLOSE      Image Close             1        Morphological
OPCL       Image OpenClose         1        Morphological
CLOP       Image CloseOpen         1        Morphological
HISTEQ     Histogram Equalization  1        Enhancement
ADJUST     Image Adjust            1        Enhancement
THRES      Thresholding            1        Post-processing

Accuracy = (1 − FPR) × (1 − FNR)    (1)

where FPR represents the False Positive Rate and FNR represents the False Negative Rate. The above formula casts the image segmentation problem as a pixel-classification problem. Ideally, the value of accuracy should therefore be 1 (or 100%) for a perfectly segmented image. We also see that the formula is monotonic, i.e. if image A is better segmented than image B, then Accuracy(A) > Accuracy(B). However, we further extend this formula by introducing a term that penalizes longer programs. The fitness function for GPIS is as follows:

Fitness = (1 − FPR) × (1 − FNR) − β × len    (2)

where FPR represents the False Positive Rate, FNR represents the False Negative Rate, len represents the length of the program, and β is a scaling factor for the length of a program, such that β ∈ [0.004, 0.008]. We found this range sufficient for our purpose.
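Under this reading, formulas (1) and (2) can be computed as a short sketch (Python for illustration; `rates`, `accuracy` and `fitness` are our own names, and the subtractive form of the length penalty in (2) is inferred from the surrounding text):

```python
def rates(seg, gt):
    """FPR and FNR from a binary segmentation and its ground truth.
    seg, gt: flat sequences of 0/1 pixel labels (1 = object)."""
    fp = sum(1 for s, g in zip(seg, gt) if s == 1 and g == 0)
    fn = sum(1 for s, g in zip(seg, gt) if s == 0 and g == 1)
    n_obj = sum(gt)
    n_bg = len(gt) - n_obj
    fpr = fp / n_bg if n_bg else 0.0    # non-object pixels called object
    fnr = fn / n_obj if n_obj else 0.0  # object pixels called non-object
    return fpr, fnr


def accuracy(seg, gt):
    """Formula (1): pixel-based segmentation accuracy."""
    fpr, fnr = rates(seg, gt)
    return (1 - fpr) * (1 - fnr)


def fitness(seg, gt, length, beta=0.005):
    """Formula (2): accuracy minus a penalty on program length."""
    return accuracy(seg, gt) - beta * length
```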

2.1.3 Termination Criteria
Termination of the GP is purely fitness based: the evolutionary cycle continues until there is no major change in fitness over 10 generations. To implement this, we first calculate a minimum acceptable fitness value based on our trial runs. This value was found to be 95% for the database in use (if GPIS is used on any other database, a default value of 90% is set). Until this fitness value is achieved, the GP keeps running. Once it is reached, we compute the cumulative mean of the fitness over successive generations. If the absolute difference between the current fitness and the mean fitness over the last 10 generations is less than 5% of the highest fitness achieved, the GP stops. The termination criterion can be written as follows:

|current fitness − mean fitness(10 gen)| < 0.05 × highest fitness
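A minimal sketch of this two-part rule, under our reading that the window compares the current fitness against the mean of the last 10 generations (Python; `should_stop` and its parameter names are our own):

```python
def should_stop(history, min_fit=0.95, window=10, tol=0.05):
    """history: best fitness per generation, most recent last.
    Stop only once min_fit has been reached, and the current fitness is
    within tol * (highest fitness) of the mean over the last `window`
    generations."""
    if len(history) < window or max(history) < min_fit:
        return False
    mean_w = sum(history[-window:]) / window
    return abs(history[-1] - mean_w) < tol * max(history)
```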

2.1.4 Parent Selection
Parent selection chooses the chromosomes that undergo diversification operations. For this, we use a tournament selection scheme; it is chosen over rank selection as it is computationally more efficient. The size of the tournament window λ is kept at 10% of the population size, and the number of parents selected is 50% of the population size.
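Tournament selection with these settings can be sketched as follows (illustrative Python; `lam` stands in for the tournament window λ):

```python
import random


def tournament_select(population, fitnesses, n_parents, lam, rng):
    """Pick n_parents winners; each tournament samples lam individuals
    uniformly without replacement and keeps the fittest."""
    parents = []
    for _ in range(n_parents):
        contenders = rng.sample(range(len(population)), lam)
        winner = max(contenders, key=lambda i: fitnesses[i])
        parents.append(population[winner])
    return parents

# GPIS settings: lam = 10% of the population, n_parents = 50% of the population.
```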

(False Positives, FPs). Therefore, for an ideal segmentation, the number of FPs and FNs should be zero while the number of TPs and TNs should be exactly equal to the number of object and non-object pixels. If we normalize the values of TPs and TNs by the total number of object and non-object pixels respectively, their individual values would be 1 in the best-case scenario and 0 in the worst-case scenario. However, for the segmentation problem, achieving this is a challenging task; thus we define two more measures based on TPs, TNs, FPs and FNs, called the False Positive Rate (FPR) and the False Negative Rate (FNR). FPR is the proportion of non-object pixels that were erroneously reported as being object pixels. FNR is the proportion of object pixels that were erroneously reported as non-object pixels. Therefore, for an ideal segmentation, the values of FPR and FNR should be zero. For finding the accuracy of a segmentation program, we use a pixel-based accuracy formula based on FPR and FNR. This formula reflects the training and validation accuracy for GPIS. It is as follows:

2.1.5 Elitism
We use elitism as a means of preserving the top 1% of chromosomes in a population: copies of the best 1% of chromosomes are passed unchanged to the next generation.

2.1.6 Diversification
We employ five genetic operators in total: one crossover operator and four mutation operators. These are selected probabilistically based on their respective crossover and mutation rates. Crossover: We use a 1-point crossover for our GP. Two parents are chosen randomly from the parent pool. A random location is chosen in each of the parent chromosomes. The subsequences before and after this location in the parents are exchanged, creating two offspring chromosomes. Mutation: We use four mutation operators for our GP. There are three inter-genomic mutation operators, namely, swap, insert and

delete and one intra-genomic mutation operator, alter, which typically alters the weight element of the selected gene. The gene to be mutated is randomly chosen from the selected parent chromosome.
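The diversification operators can be sketched as follows (illustrative Python; chromosomes are lists of gene records as in Section 2.1.1, and `fresh_gene` is a hypothetical factory for a new random gene):

```python
import random


def one_point_crossover(p1, p2, rng):
    """Exchange the tails of two parent chromosomes at random cut points."""
    i, j = rng.randrange(len(p1)), rng.randrange(len(p2))
    return p1[:i] + p2[j:], p2[:j] + p1[i:]


def mutate(chrom, rng, fresh_gene):
    """Apply one of the four GPIS mutations at a random gene position."""
    c = list(chrom)
    k = rng.randrange(len(c))
    op = rng.choice(["swap", "insert", "delete", "alter"])
    if op == "swap":                       # inter-genomic: exchange two genes
        j = rng.randrange(len(c))
        c[k], c[j] = c[j], c[k]
    elif op == "insert":                   # inter-genomic: add a new gene
        c.insert(k, fresh_gene())
    elif op == "delete" and len(c) > 1:    # inter-genomic: drop a gene
        del c[k]
    else:                                  # intra-genomic alter (also the
        c[k] = {**c[k], "weights": rng.random()}  # fallback for length-1 delete)
    return c
```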

algorithm. From here on, we refer to the above as training accuracy and validation accuracy respectively. The output of Stage 2 is a chromosome that performs equally well on both training and validation sets and produces high overall validation accuracy.

2.1.7 Injection
In order to overcome loss of diversity in a population, we use an injection mechanism. We inject a fixed percentage of new, randomly initialized programs into the population every n generations. In the current configuration, we inject 20% new programs every 5 generations.

2.3 Experimental Setup
In order to test the effectiveness and efficiency of our algorithm, we tested it on a biomedical image database consisting of HeLa cell images (in culture) of size 512 × 384 pixels. The task of the algorithm was to segment the cells present in the images. The procedure for obtaining results using our algorithm is given in Section 2.3.1.1. We also compare the results of our algorithm with those produced by GENIE Pro; the procedure used for obtaining results using GENIE Pro is given in Section 2.3.1.2. The final parameter values used for GPIS are given in Table 2.

Table 2. Parameter settings for GPIS

Population size: µ                 200
Crossover Rate: Pc                 0.45
Swap Mutation Rate: Pms            0.25
Insert Mutation Rate: Pmi          0.25
Delete Mutation Rate: Pmd          0.20
Alter Mutation Rate: Pma           0.70
Scaling factor for length: β       0.005

2.1.8 Survivor Aggregation
The aim of this phase is to collect the chromosomes that have qualified to be part of the next generation (parent, offspring, elite, injected) in order to build the population for the next generation. This phase works in two modes: non-injection and injection mode. In the non-injection mode, copies of all parent chromosomes (50%), offspring chromosomes (49%) and elite chromosomes (1%) form the population of the next generation. In the injection mode, since a fixed proportion (20%) of new chromosomes is inserted into the population, the top 79% of the parent-offspring population is selected along with the elite set (1%) to form the population of the next generation.
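The two aggregation modes can be sketched as follows (illustrative Python; we assume the combined parent/offspring pool is already sorted by descending fitness, so truncation keeps the top fraction):

```python
def next_generation(parents, offspring, elite, gen, mu, new_program,
                    inject_every=5, inject_frac=0.20):
    """Assemble the next population per Sections 2.1.5-2.1.8.
    Non-injection mode: parents + offspring + elite fill the mu slots.
    Injection mode (every `inject_every` generations): top portion of the
    parent/offspring pool, plus the elite, plus 20% random new programs."""
    if gen % inject_every == 0:                       # injection mode
        n_new = round(mu * inject_frac)
        pool = (parents + offspring)[: mu - n_new - len(elite)]
        return elite + pool + [new_program() for _ in range(n_new)]
    return elite + (parents + offspring)[: mu - len(elite)]
```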

2.1.9 Output (Fittest Individual)
Once the termination criterion has been satisfied, the output of the GP is the "fittest" chromosome present in the final population. This chromosome is then tested on a set of unseen test images, as explained in Section 2.2. Our aim is to create a pool of such outputs (segmentation programs), which allows us to have multiple segmentation algorithms for the same database. This pool is created by subsequent runs of the GP. Note: when we apply percentages, the results are rounded to the closest integers. In the case of elitism, if 1% of the population is less than 1, one individual is copied.

2.3.1 Procedure for Training and Validation
In order to plan a run of the algorithm, we first decide the size of the training and validation sets. To do so, we define G as the total number of images in a database, T as the training set, V as the validation set, and R as the number of times optimal individuals are evolved for the same database. The values used in the present configuration are: G = 1026, T = 30, V = 100 and R = 28.

2.2 Stage 2: Evaluation Methodology
As mentioned in the previous section, the output of Stage 1 is one chromosome, the fittest chromosome in the population of the final generation. The accuracy of the segmentations produced by this chromosome on the training images is known as the training accuracy of the run. The actual challenge for this individual is to produce similar segmentation accuracies on an unseen set of images, known as the validation images. In order to do this, we randomly select a fixed number of new images from outside the training set, along with their corresponding GTs, from the image database. From this point onwards, we call this the validation set. Once the validation set is chosen, the "fittest chromosome" is applied to the entire set of images, one by one, and the segmentation accuracy for each image is calculated based on the accuracy formula (1) given in Section 2.1.2. Once this process ends, the average segmentation accuracy of the set, or validation accuracy of the run, is calculated. We repeat the above process over several runs and calculate the overall training accuracy (average of the training accuracies of the runs) and validation accuracy (average of the validation accuracies of the runs) for the

2.3.1.1 Procedure for Obtaining Results using GPIS
Step 1. Randomly select T images and another V images from the G images in the database.
Step 2. Perform training on the T images to choose the fittest individual for validation.
Step 3. Validate this individual on the V images to check its applicability to unseen images. If the individual produces high validation accuracy, save it in the result set; otherwise discard it.
Step 4. Repeat Steps 1 to 3 R times, producing a set of optimal individuals (the result set).
Step 5. Calculate the average training and validation accuracy of the result set.
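Steps 1-5 can be sketched as a loop (illustrative Python; `train_fn` and `validate_fn` stand in for the GP run and the accuracy evaluation of Section 2.1.2, and the `min_val_acc` cutoff is our own assumption since the text does not fix a threshold):

```python
import random


def gpis_experiment(images, train_fn, validate_fn, T=30, V=100, R=28,
                    min_val_acc=0.95, seed=0):
    """Steps 1-5 of Section 2.3.1.1: R independent runs, each training on
    T images and validating the evolved program on V unseen images."""
    rng = random.Random(seed)
    result_set = []
    for _ in range(R):
        sample = rng.sample(images, T + V)        # Step 1: pick T + V images
        train, val = sample[:T], sample[T:]
        program = train_fn(train)                 # Step 2: evolve the fittest individual
        val_acc = validate_fn(program, val)       # Step 3: check on unseen images
        if val_acc >= min_val_acc:                # keep only high-accuracy runs
            result_set.append((program, val_acc))
    accs = [acc for _, acc in result_set]         # Steps 4-5: aggregate over R runs
    return result_set, (sum(accs) / len(accs) if accs else 0.0)
```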

2.3.1.2 Procedure for Obtaining Results using GENIE Pro
Step 1. Select the same T and V images from the G images in the database, used for the corresponding GPIS run.

Step 2. Load each of the T images as a base image and create a training overlay for each image by marking Foreground (object) and Background (non-object) pixels manually.
Step 3. Train on these manually marked training overlays using the in-built Ifrit Pixel Classifier.
Step 4. Apply the learned solution to the V images to produce the corresponding segmented images.
Step 5. Calculate the validation accuracy for these V images using formula (1).
Step 6. Repeat Steps 1 to 5 R times, as for GPIS.
Step 7. Calculate the average training and validation accuracy of the result set.

Table 3. Segmentation accuracy: GPIS vs GENIE Pro

Algorithm    Training Data   Validation Data
GPIS         98.76%          97.01%
GENIE Pro    94.12%          93.12%

Table 4. Cell count rate: GPIS vs GENIE Pro

                    GPIS                       GENIE Pro
Measure             Training    Validation    Training    Validation
Detected Cells      98.24%      97.98%        97.02%      96.56%
Type 1 Cells        100%        100%          100%        100%
Type 2 Cells        98.78%      98.22%        97.49%      96.89%
Undetected Cells    1.32%       1.55%         2.12%       2.25%

3. RESULTS
We have based our results on two criteria: the effectiveness of the algorithm in accurately segmenting the given images, and the efficiency of the algorithm in doing so. Effectiveness is based on two measures, the pixel accuracy of the evolved solution and the cell count rate (percentage of cell structures correctly identified). In order to calculate the cell count rate, we have categorized cells into two types: Type 1 and Type 2. Type 1 cells are those which can be identified by eye with relative ease. Type 2 cells are those which are relatively difficult to identify by eye. We also provide comparative effectiveness results for GENIE Pro. This is presented in Section 3.1. Efficiency reflects the time the algorithm takes to produce one individual of acceptable fitness, measured in terms of the number of generations. These results are presented in Section 3.2. We also briefly discuss one evolved program and provide the segmented images produced. This is presented in Section 3.3 and Figures 5 and 6.

Table 5. Performance of GPIS based on number of generations

Statistical Measure     Number of Generations
Mean                    122.07
Median                  122
Standard Deviation      6.85
Upper Bound             138
Lower Bound             112

3.1 Effectiveness
Table 3 presents the results obtained for the training and validation accuracies of segmentation achieved by GPIS and GENIE Pro. These values represent each algorithm's ability to correctly classify each pixel in an image as an object or non-object pixel. We found that our algorithm performed better in segmenting the cells in the images as compared to GENIE Pro. The second measure of effectiveness that we used was the cell count rate. We extend the concept of TPs, TNs, FPs and FNs to object detection, where a TP denotes an object that is correctly identified by the algorithm as a cell, an FN denotes a cell that the algorithm fails to identify, an FP denotes a non-object incorrectly identified as a cell, and a TN denotes a non-object correctly identified as background. In order to consider an object as belonging to any of the above four options, a minimum of 70% of its pixels must correspond to one of the four options mentioned above. Cells identified were manually counted. Similar to the accuracy formula, based on TPs, TNs, FPs and FNs, we can define the FPR and FNR for the cell count. FPR is the proportion of non-cell structures that were erroneously reported as being cell structures. FNR is the proportion of cell structures that were erroneously reported as non-cell structures. The cell count rate formula used is as follows:

Cell Count Rate = (1 − FPR) × (1 − FNR)    (3)
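The 70% rule and formula (3) can be sketched as follows (illustrative Python; representing detected regions and ground-truth cells as sets of pixel coordinates is our own simplification):

```python
def overlap_fraction(region, gt_cells):
    """Fraction of a detected region's pixels that fall inside ground-truth
    cell pixels. region, gt_cells: sets of (row, col) coordinates."""
    return len(region & gt_cells) / len(region)


def count_cells(detected_regions, gt_cell_pixels, thresh=0.70):
    """A region counts as a true-positive cell only if at least 70% of its
    pixels overlap ground-truth cell pixels (Section 3.1)."""
    tp = sum(1 for r in detected_regions
             if overlap_fraction(r, gt_cell_pixels) >= thresh)
    fp = len(detected_regions) - tp
    return tp, fp


def cell_count_rate(fpr, fnr):
    """Formula (3)."""
    return (1 - fpr) * (1 - fnr)
```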

3.2 Efficiency
Table 5 reflects the efficiency of the process in producing the required results. We measure efficiency based on the number of generations taken by GPIS to produce one individual of minimum acceptable fitness. This acceptable fitness is 95% training accuracy. In our runs, we observed that GPIS never failed to produce an acceptable individual. The experiments were performed on an Intel Pentium (R) 4 CPU, 3.06 GHz, 2 GB RAM computer. To execute one generation, GPIS took 4.21 minutes on average. The average time taken for a complete run was approximately 513 minutes; the maximum time taken was 580 minutes. Since GPIS is designed to run as an offline tool and the time it takes to execute an evolved program is between 1-3 seconds, the period of evolution of an optimal program is within reasonable real-world constraints. Also, the standard deviation of the number of generations is low. This shows that GPIS runs consistently, producing an optimal program within a tight window.

3.3 Evolved Program
Figure 5 shows the chromosomal and genomic structure of an evolved program. The program evolved is a combination of filters

and morphological operators. The first gene is a 6 × 6 Gaussian low pass filter with a sigma value of 0.8435, followed by a 4 × 4 averaging filter. The output image from gene 2 is eroded with a flat, disk-shaped structuring element of radius 2. A 6 × 6 averaging filter is then applied to the eroded image. Its output undergoes a composite morphological operation of closing and opening with the same structuring element as above. Finally, this image is converted to a binary output image using a threshold of 0.09022. The validation accuracy is calculated for this image. Figure 6 shows the application of this evolved program to two validation images, along with the corresponding results from GENIE Pro.
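For readers without MATLAB®, the evolved pipeline described above can be approximated with SciPy (a rough sketch, not the authors' code: `fspecial`/`strel` parameter semantics do not map exactly onto `scipy.ndimage`, and the 5 × 5 square footprint only approximates a disk of radius 2):

```python
import numpy as np
from scipy import ndimage as ndi


def evolved_segment(img):
    """Rough SciPy analogue of the evolved MATLAB pipeline in Section 3.3.
    img: 2-D grayscale array scaled to [0, 1]."""
    x = ndi.gaussian_filter(img, sigma=0.8435)   # ~6x6 Gaussian low pass filter
    x = ndi.uniform_filter(x, size=4)            # 4x4 averaging filter
    se = np.ones((5, 5), dtype=bool)             # ~disk-shaped SE of radius 2
    x = ndi.grey_erosion(x, footprint=se)        # erode
    x = ndi.uniform_filter(x, size=6)            # 6x6 averaging filter
    x = ndi.grey_opening(ndi.grey_closing(x, footprint=se), footprint=se)
    return x > 0.09022                           # binary threshold
```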

[5] J. M. Daida, J. D. Hommes, T. F. Bersano-Begey, S. J. Ross, and J. F. Vesecky, "Algorithm Discovery using the Genetic Programming Paradigm: Extracting Low-contrast Curvilinear Features from SAR Images of Arctic Ice", in Advances in Genetic Programming II, P. J. Angeline and K. E. Kinnear (Eds.), Chapter 21, The MIT Press, 1996, pp. 417-442.
[6] B. Bhanu and Y. Lin, "Learning Composite Operators for Object Detection", Proceedings of the Conference on Genetic and Evolutionary Computation, July 2002, pp. 1003-1010.
[7] S. P. Brumby, J. P. Theiler, S. J. Perkins, N. R. Harvey, J. J. Szymanski, and J. J. Bloch, "Investigation of Image Feature Extraction by a Genetic Algorithm", Proceedings of SPIE, Vol. 3812, 1999, pp. 24-31.
[8] B. Bhanu, S. Lee, and S. Das, "Adaptive Image Segmentation using Genetic and Hybrid Search Methods", IEEE Transactions on Aerospace and Electronic Systems, Vol. 31, Issue 4, Oct 1995, pp. 1268-1291.
[9] B. Bhanu and Y. Lin, "Object Detection in Multi-modal Images using Genetic Programming", Applied Soft Computing, Vol. 4, Issue 2, 2004, pp. 175-201.
[15] J. M. Daida, J. D. Hommes, S. J. Ross, A. D. Marshall, and J. F. Vesecky, "Extracting Curvilinear Features from SAR Images of Arctic Ice: Algorithm Discovery Using the Genetic Programming Paradigm", Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Italy, IEEE Press, 1995, pp. 673-675.
[16] K. S. Fu and J. K. Mui, "A Survey on Image Segmentation", Pattern Recognition, 13, 1981, pp. 3-16.
P. Ghosh and M. Mitchell, "Segmentation of Medical Images using a Genetic Algorithm", Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, 2006, pp. 1171-1178.
[17] N. Harvey, R. M. Levenson, and D. L. Rimm, "Investigation of Automated Feature Extraction Techniques for Applications in Cancer Detection from Multi-spectral Histopathology Images", Proceedings of SPIE, Vol. 5032, 2003, pp. 557-556.

4. CONCLUSIONS
In this paper, we propose a simple approach to the complex problem of image segmentation. The proposed algorithm, GPIS, uses genetic programming to evolve image segmentation programs from a pool of primitive image analysis operators. The evolved solutions are simple MATLAB® based image segmentation programs. They are easy to read and implement. In addition, the algorithm does not require any a priori information of objects to be segmented from the images. We have tested our algorithm on a biomedical image database. We also compare the results to another GA-based image segmentation algorithm, GENIE Pro. We found that our algorithm consistently produced better results. Both the segmentation accuracy and cell count rate were higher than GENIE Pro. It also produced an optimal solution within a reasonable time window. In addition, GPIS never failed to produce an optimal solution.

5. ACKNOWLEDGMENTS
We are grateful to Ms Aida Abu-Baker and Ms Janet Laganiere from CHUM Research Centre, Notre-Dame Hospital, Montreal for providing us with the images for the cell database. We would also like to thank Dr James Lacefield from University of Western Ontario, London for his help on this project.

6. REFERENCES
[1] Bhanu, B.; Sungkee Lee; Das, S., ―Adaptive image segmentation using genetic and hybrid search methods”, IEEE Transactions on Aerospace and Electronic Systems, Vol. 31, Issue 4, Oct 1995 Page(s):1268 – 1291. [2] B. Bhanu and Y. Lin, ―Object Detection in Multi-modal Images using Genetic Programming‖, Applied Soft Computing, Vol. 4, Issue 2, 2004, pp. 175-201. [3] B. Bhanu, Y. Lin, ―Learning Composite Operators for Object Detection‖, Proceedings of the Conference on Genetic and Evolutionary Computation, July 2002, pp. 1003–1010. [4] S. P. Brumby, J. P. Theiler, S. J. Perkins, N. R. Harvey, J. J. Szymanski, and J. J. Bloch, ―Investigation of Image Feature Extraction by a Genetic Algorithm‖, Proceedings of SPIE, Vol. 3812, 1999, pp. 24-31.

[18] D. Howard and S. C. Roberts, ―A Staged Genetic Programming Strategy for Image Analysis‖, Proceedings of the Genetic and Evolutionary Computation Conference, 1999, pp. 1047—1052. [19] D. Howard, S. C. Roberts, and R. Brankin, ―Evolution of Ship Detectors for Satellite SAR Imagery‖, Proceedings of EuroGP'99, Vol. 1598, 1999, pp. 135- 148. [20] N. R. Pal, and S. K. Pal, ―A Review on Image Segmentation Techniques‖, Pattern Recognition, 26, 1993, pp. 1277-1294. [21] D. L. Pham, C. Xu, J. L. Prince, ―Survey of Current Methods in Medical Image Segmentation‖, Annual Review of Biomedical Engineering, 2, 2000, pp. 315—337. [22] R. Poli, ―Genetic Programming for Feature Detection and Image Segmentation‖, T.C. Forgarty (Ed.), Evolutionary Computation, Springer- Verlag, Berlin, Germany, 1996, pp. 110–125. GAUS AVER EROD AVER CLOP THRES (a) Genomic Structure

[23] M. E. Roberts and E. Claridge, ―An Artificially Evolved Vision System for Segmenting Skin Lesion Images‖, Proceedings of the 6th International Conference on Medical Image Computing and Computer-Assisted Intervention, Vol. 2878, 2003, pp. 655- 662. [24] W. Tackett, ―Genetic Programming for Feature Discovery and Image Discrimination‖, In S. Forrest, editor, Proceedings of 5th International Conference on Genetic Algorithm, 1993, pp. 303–311. [25] W. Tackett, ―Genetic Programming for Feature Discovery and Image Discrimination‖, In S. Forrest, editor, Proceedings of 5th International Conference on Genetic Algorithm, 1993, pp. 303–311. [26] Y. J. Zhang, ―Influence of Segmentation over Feature Measurement‖, Pattern Recognition Letters, 16(2), 1992, 201-206. [GAUSS, d1, 0, 6, 0.8435] [AVER, io1, 0, 4, 0] [EROD, io2, 0, 0, 1] [AVER, io3, 0, 6, 0] [CLOP, io4, 0, 0, 1] [THRESH, io5, 0, 0.09022, 0]

MATLAB® Implementation d1 = input;

[GAUSS, d1, 0, 6, 0.8435] [AVER, io1, 0, 4, 0] [EROD, io2, 0, 0, 1] [AVER, io3, 0, 6, 0] [CLOP, io4, 0, 0, 1] [THRESH, io5, 0, 0.09022, 0]

h1 = fspecial(‘gaussian’, [6 6], 0.8435); io1 = imfilter(d1, h1); h2 = fspecial(‘average’, [4 4]); io2 = imfilter(io1,h2); SE1 = strel(‘disk’, 2); io3 = imerode(io2, SE1); h3 = fspecial(‘average’, [6 6]); io4 = imfilter(io3,h3); io5 = imclose(io4, SE1); output = im2bw(io5, 0.09022);

Segmentation accuracy on validation set: 99.04 %; Number of operators used = 6; Average execution time = 1.252 seconds; Number of generation needed to converge = 114; Number of fitness evaluation = 10,532 (b) Figure 5. An evolved program: (a) Chromosomal and genomic structure for the evolved program, (b) Genomic structure and equivalent MATLAB® implementation of the evolved program with corresponding performance results

[Figure 6 image panels (a)-(d)]

Figure 6. (a) Segmentation produced by GPIS using evolved program shown above on validation image 1 (Validation Accuracy = 99.21%, Cell Count Rate = 100%), (b) Segmentation produced by GENIE Pro on validation image 1 (Validation Accuracy = 95.46%, Cell Count Rate = 97.89%), (c) Segmentation produced by GPIS using evolved program shown above on validation image 2 (Validation Accuracy = 98.93%, Cell Count Rate = 100%), (d) Segmentation produced by GENIE Pro on validation image 2 (Validation Accuracy = 94.22%, Cell Count Rate = 96.45%)
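The two quality measures quoted in Figures 5 and 6 can be computed with a few lines of Python. This is a hedged sketch: pixel_accuracy is read as plain pixel-wise agreement with the hand-labelled ground-truth mask, and cell_count_rate as the detected-cell count relative to the true count; both function names and exact formulations are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Percentage of pixels on which the binary segmentation matches the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    return 100.0 * np.mean(pred == truth)

def cell_count_rate(n_detected, n_true):
    """Detected cells as a percentage of the true count, capped at 100%
    (a hypothetical reading of the 'cell count rate' metric)."""
    return 100.0 * min(n_detected, n_true) / n_true
```

Under this reading, a 99.21% validation accuracy means roughly 8 mislabelled pixels per thousand, which is why the GPIS and GENIE Pro outputs can look similar at a glance yet differ by several percentage points.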
