
Genetic Programming based Image Segmentation with Applications to Biomedical Object Detection


Tarundeep Singh Dhot, Nawwaf Kharma
Department of Electrical and Computer Engineering
Concordia University, Montreal, QC H3G 1M8
t_dhot@encs.concordia.ca, kharma@ece.concordia.ca

Mohammad Daoud
Department of Electrical and Computer Engineering
University of Western Ontario
London, ON, N6A 3K7
mohammad.dauod@gmail.com

Rabab Ward
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, BC, V6T 1Z4
rababw@ece.ubc.ca

ABSTRACT
Image segmentation is an essential process in many image analysis applications and is mainly used for automatic object recognition. In this paper, we define a new genetic programming based image segmentation algorithm (GPIS). It uses a primitive image-operator based approach to produce linear sequences of MATLAB® code for image segmentation. We describe the evolutionary architecture of the approach and present results obtained after testing the algorithm on a biomedical image database for cell segmentation. We also compare our results with another EC-based image segmentation tool called GENIE Pro. We found the results obtained using GPIS to be more accurate than those of GENIE Pro. In addition, our approach is simpler to apply, and evolved programs are available to anyone with access to MATLAB®.

Categories and Subject Descriptors
I.4.6 [Image Processing and Computer Vision]: Segmentation – pixel classification.

General Terms
Algorithms, Experimentation.

Keywords
Image Segmentation, Genetic Programming.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO’09, July 8–12, 2009, Montréal Québec, Canada.
Copyright 2009 ACM 978-1-60558-325-9/09/07...$5.00.

1. INTRODUCTION
Image segmentation is the process of extracting objects of interest from a given image. It allows certain regions in the image to be identified as an object based on some distinguishing criteria, for example, pixel intensity or texture. It is an important part of many image analysis techniques as it is a crucial first step of the imaging process and greatly impacts any subsequent feature extraction or classification. It plays a critical role in automatic object recognition systems for a wide variety of applications like medical image analysis [8, 9, 14, 15], geosciences and remote sensing [2, 3, 4, 5, 10, 11], and target detection [10, 11, 16].

However, image segmentation is an ill-defined problem. Even though numerous approaches have been proposed in the past [7, 12, 13], there is still no general segmentation framework that can perform adequately across a diverse set of images [1]. In addition, most image segmentation techniques exhibit a strong domain or application-type dependency [7, 12, 17]. Automated segmentation algorithms often include a priori information about their subjects [8], which restricts the use of well-designed segmentation techniques to a small set of imagery.

In this paper, we propose a new, simple image segmentation algorithm called Genetic Programming based Image Segmentation (GPIS) that uses a primitive image-operator based approach for segmentation, and we present results. The algorithm does not require any a priori information about objects to be segmented other than a set of training images. In addition, the algorithm is implemented in MATLAB® and uses its standard image-function library. This allows easy access to anyone with MATLAB®.

In the following sections, we provide a brief introduction to relevant work in GP based image segmentation and image analysis, followed by an overview of our approach in Section 1.3. Section 2 describes the methodology of our algorithm and the
experimental setup for compiling results. Finally, Section 3 presents the results of the experiments conducted on a biomedical image database for cell segmentation purposes. We also compare our results with another EC-based image segmentation algorithm called GENIE Pro.

1.1 Related Work
One of the initial works in this field was published by Tackett [16] in 1993. He applied GP to develop a processing tree capable of classifying features extracted from IR images. These evolved features were later used to construct a classifier for target detection. On the same lines, in 1995, Daida et al. [5, 6] used GP to derive spatial classifiers for remote sensing purposes. This was the first time GP was used for image processing applications in geosciences and remote sensing.

In 1996, Poli [14] proposed an interesting approach to image analysis based on evolving optimal filters. The approach viewed image segmentation, image enhancement and feature detection purely as a filtering problem. In addition, he outlined key criteria for building terminal sets, function sets and fitness functions for an image analysis application.

In 1999, Howard et al. [10, 11] presented a series of works using GP for automatic object detection in real world and military image analysis applications. They proposed a staged evolutionary approach for evolution of target detectors or discriminators. This resulted in practical evolution times.

In 1999, another interesting approach was proposed by Brumby et al. [4]. They used a hybrid evolutionary approach to evolve image extraction algorithms for remote sensing applications. These algorithms were evolved using a pool of low level image processing operators. On the same lines, Bhanu et al. [2, 3] used GP to evolve composite operators for object detection. These operators were synthesized from combinations of primitive image processing operations used in object detection. In order to control the code-bloat problem, they also proposed size limits for the composite operators.

In 2003, Roberts and Claridge [15] proposed a GP based image segmentation technique for segmenting skin lesion images. A key feature of their work was the ability of the GP to generalize based on a small set of training images.

Our approach is motivated by the works of Tackett [16], Brumby et al. [4] and Bhanu et al. [2, 3]. They all effectively implemented a primitive image operator based approach for image analysis. This is similar to our approach. In addition, we have used the key criteria outlined by Poli [14] as references while building our algorithm.

1.2 GENIE Pro
GENIE Pro [4, 9] is a general purpose, interactive and adaptive GA-based image segmentation and classification tool. GENIE Pro uses a hybrid GA to assemble image-processing algorithms or pipelines from a collection of low-level image processing operators (for example edge detectors, texture measures, spectral orientations and morphological filters). The role of each evolved pipeline is to classify each pixel as feature or non-feature.

The GA begins with a population of random pipelines, performs fitness evaluation for each pipeline in the population and selects the fitter pipelines to produce offspring pipelines using crossover and mutation. In order to compute the fitness of a pipeline, the resultant segmentation produced by the pipeline is compared to a set of training images. These training images are produced by manual labeling of pixels by the user as True (feature) or False (non-feature) pixels using an in-built mark-up tool called ALLADIN. Finally, when a run of GENIE Pro is concluded, the fittest pipeline in the population is selected and combined using a linear classifier (Fisher Discriminant) to form an evolved solution that can be used to segment new images.

GENIE Pro was developed for analyzing multispectral satellite data. It has also been applied to biomedical feature-extraction problems [9]. We have used it for comparison purposes.

1.3 Overview of Our Work
In this paper, we describe a new genetic programming based image segmentation algorithm, GPIS, that uses a primitive image-operator based approach for segmentation. Each segmentation algorithm can be viewed as a unique combination of image analysis operators that are successfully able to extract desired regions from an image. If we are able to describe a sufficient set of these image analysis operators, it is possible to build multiple segmentation algorithms that segment a wide variety of images. In GPIS, we define a pool of low level image analysis operators. The GP searches the solution space for the combination of these operators that performs the most accurate segmentation. From now on, we refer to these image analysis operators as primitives. Each individual in a population is a combination of these primitives and represents an image segmentation program. Therefore, GPIS typically breeds a population of segmentation programs in order to evolve one accurate image segmentation program.

2. METHODOLOGY
The proposed algorithm GPIS is designed as a general tool for learning based segmentation of images. In this paper, particular attention is given to testing it on biomedical images. Our approach does not require a particular image format or size and works equally well on both color and grayscale images in any MATLAB® compatible format.

For the purpose of learning, a directory with both input images and matching ground truths (GTs) must be provided. From this point onwards, we call this a training set. Every input image must have a corresponding GT of the same size and format. The GT image is a binary image showing the best assessment of the boundaries of the objects of interest; all pixels inside those boundaries are by definition object pixels and all pixels outside the boundaries are by definition non-object pixels. Pixels on the boundary itself are by definition also object pixels.

GPIS has two stages of operation. Stage 1 is a learning phase in which GPIS uses the training set to evolve a MATLAB® program which meets a user-defined threshold of segmentation accuracy relative to the input images of the training set.

In the second stage, this evolved individual is evaluated for its ability to segment unseen images of the same type as the training images. The accuracy achieved here is from here on called validation accuracy.

In a real world situation, due to lack of GTs for unseen images, validation accuracy will take the form of the subjective assessment
of a human user. However, for this paper, the authors evaluate the quality, i.e. the validation accuracy, of the individual evolved by GPIS by comparing its segmentation results to the matching GT images. We report the results of our evaluation in the Results section (Section 3) of this paper.

2.1 Stage 1: Learning phase of GPIS
GPIS operates in a typical evolutionary cycle in which a population of potential program solutions (each meant to segment images) is subjected to repeated selection and diversification until at least one of the individuals meets the termination criteria. The flowchart of the learning stage is presented in Figure 1.

[Figure 1: flowchart. START → Initialization → Fitness Evaluation → Termination criteria met? If yes: Output (fittest individual) → STOP. If no: Elitism (elite, copied) and Parent Selection (parents) → Genetic Diversification (offspring) → Survivor Aggregation (Σ), with Injection → next generation → Fitness Evaluation.]
Figure 1. Flowchart of GPIS

2.1.1 Representation and Initialization
In our scheme, the genome of an individual encodes a MATLAB® program that processes an image. The input to the program is an image file and the output of the MATLAB® program is an image of the same size and format. This output image file is a segmented version of the input image.

The general layout of a gene is shown in Figure 2 (a). As seen in the figure, each gene specifies information about the primitive operator it encodes, the input images to the operator and parameter settings for the operator. This corresponds to a few lines (1-3) of the equivalent MATLAB® program. The gene consists of five parts. The first part contains the name of the primitive operator, and the second and third parts contain the possible input images to the operator. Based on the nature of the primitive operator, a gene may have one or two input images. The fourth part contains weights or parameter values for the primitive operator, and the fifth part encodes the nature of the Structuring Element or SE (only in case of morphological operations) or a secondary Filter Parameter or FP (only in case of filter operators).

The phenomic representation (chromosome) is a linear combination of the genes, as shown in Figure 2 (b). The chromosome represents a complete MATLAB® segmentation program. There is a one-to-one mapping between the genome and the phenome as shown in Figure 2 (c). It also shows the representation of the knowledge structure used by the genetic learning system.

(a) [Operator Name, Input Plane 1, Input Plane 2, Weights, SE/FP]

(b) [G1] [G2] [G3] [G4] [G5] ......... [Gn]

(c) GENOME (sequence of genes) mapped to PHENOME (MATLAB® program):
    d1 = input;
    h1 = fspecial('disk',[6 6]);
    io1 = imfilter(d1, h1);
    SE1 = strel('square', 2);
    io2 = imerode(io1, SE1);
    io3 = imclose(io2, SE1);
    io4 = imadd(io2,io3);
    out = im2bw(io4, 0.55);

Figure 2. (a) Typical layout of a gene (b) Typical layout of a chromosome comprising n genes (c) One-to-one mapping of the genome and phenome

We use a pool of 20 primitive operators. Table 1 provides the complete list of all primitive image analysis operators in the gene pool along with the typical number of inputs required for each operator.

Initialization creates a starting population for the GP. The initial population of the GP is randomly generated, i.e. chromosomes are formed by a randomly assigned sequence of operators. The genomic initialization is also random, i.e. parameter values of operators are also assigned randomly, based on the operator type. For practical reasons, the size of each chromosome is limited to a maximum length of 15. In addition, at the time of initialization, the size of the population along with the values of crossover rates and mutation rates are assigned by the user.

2.1.2 Fitness Evaluation
A segmented image consists of positive (object) and negative (non-object) pixels. Ideally the segmentation of an image would result in an output image where positive pixels cover object pixels perfectly and the negative pixels cover non-object pixels perfectly. Based on this idea, we can view segmentation as a pixel-classification problem. The task of the segmentation program now becomes assignment of the right class to every pixel in the image. As such, we can apply a measure of classification accuracy to the problem of image segmentation. Every segmentation program can be expected not only to identify pixels belonging to the objects of interest (True Positives, TPs), but also to label some object pixels as non-object pixels (False Negatives, FNs). Further, in addition to identifying non-object pixels (True Negatives, TNs), some pixels belonging to non-objects can be identified as object pixels (False Positives, FPs).
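The four pixel categories above amount to a cell-by-cell comparison of a segmented mask with its ground truth. The following is a minimal Python sketch of that bookkeeping, not the authors' MATLAB implementation; all names are illustrative. It also derives the two error rates used later in this section (FPR over non-object pixels, FNR over object pixels). The `assumed_accuracy` helper is an assumption: it borrows the product form that the paper states for the cell count rate in formula (3) and should not be read as a quotation of formula (1).

```python
from typing import List, NamedTuple, Tuple

Mask = List[List[bool]]  # True = object pixel, False = non-object pixel


class PixelCounts(NamedTuple):
    tp: int  # object pixels labelled object
    fn: int  # object pixels labelled non-object
    tn: int  # non-object pixels labelled non-object
    fp: int  # non-object pixels labelled object


def count_pixels(segmented: Mask, ground_truth: Mask) -> PixelCounts:
    """Compare a segmented mask with a GT mask of the same size, pixel by pixel."""
    tp = fn = tn = fp = 0
    for seg_row, gt_row in zip(segmented, ground_truth):
        for seg, gt in zip(seg_row, gt_row):
            if gt and seg:
                tp += 1
            elif gt:
                fn += 1
            elif seg:
                fp += 1
            else:
                tn += 1
    return PixelCounts(tp, fn, tn, fp)


def error_rates(c: PixelCounts) -> Tuple[float, float]:
    """FPR = FP / non-object pixels, FNR = FN / object pixels."""
    non_object = c.fp + c.tn
    obj = c.fn + c.tp
    fpr = c.fp / non_object if non_object else 0.0
    fnr = c.fn / obj if obj else 0.0
    return fpr, fnr


def assumed_accuracy(fpr: float, fnr: float) -> float:
    # ASSUMPTION: product form borrowed from the cell count rate, formula (3).
    return (1.0 - fpr) * (1.0 - fnr)


# Toy 2 x 4 example: the object occupies the two left-hand columns.
gt = [[True, True, False, False],
      [True, True, False, False]]
seg = [[True, False, False, False],
       [True, True, True, False]]

counts = count_pixels(seg, gt)
fpr, fnr = error_rates(counts)
print(counts, fpr, fnr, assumed_accuracy(fpr, fnr))
```

In the toy example one object pixel is missed and one background pixel is wrongly marked, so both rates are 1/4.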
Table 1. Primitive image analysis operators in the gene pool

Operator Name   Description              Inputs   Operator Type
ADDP            Add Planes               2        Arithmetic
SUBP            Subtract Planes          2        Arithmetic
MULTP           Multiply Planes          2        Arithmetic
DIFF            Absolute Difference      2        Arithmetic
AVER            Averaging Filter         1        Filter
DISK            Disk Filter              1        Filter
GAUS            Gaussian Filter          1        Filter
LAPL            Laplacian Filter         1        Filter
UNSHARP         Unsharp Filter           1        Filter
LP              Lowpass Filter           1        Filter
HP              Highpass Filter          1        Filter
DIL             Image Dilate             1        Morphological
ERODE           Image Erode              1        Morphological
OPEN            Image Open               1        Morphological
CLOSE           Image Close              1        Morphological
OPCL            Image Open-Close         1        Morphological
CLOP            Image Close-Open         1        Morphological
HISTEQ          Histogram Equalization   1        Enhancement
ADJUST          Image Adjust             1        Enhancement
THRES           Thresholding             1        Post-processing

Therefore, for an ideal segmentation, the number of false positives (FPs) and false negatives (FNs) should be zero while the number of TPs and TNs should be exactly equal to the number of object and non-object pixels. If we normalize the value of TPs and TNs by the total number of object and non-object pixels respectively, their individual values would be 1 in the best case scenario and 0 in the worst case scenario. However, for the segmentation problem, achieving this is a challenging task, thus we define two more measures based on TPs, TNs, FPs and FNs called the False Positive Rate (FPR) and False Negative Rate (FNR). FPR is the proportion of non-object pixels that were erroneously reported as being object pixels. FNR is the proportion of object pixels that were erroneously reported as non-object pixels. Therefore, for an ideal segmentation, the values of FPR and FNR should be zero. For finding the accuracy of a segmentation program, we use a pixel-based accuracy formula based on FPR and FNR. This formula reflects the training and validation accuracy for GPIS. It is as follows:

[Equation (1): Accuracy expressed in terms of FPR and FNR]    (1)

where FPR represents False Positive Rate and FNR represents False Negative Rate. The above formula for accuracy extends the image segmentation problem to a pixel-classification problem. Therefore, ideally the value of accuracy should be 1 (or 100%) for a perfectly segmented image. We also see that the formula is mono-modal, i.e. if image A is better segmented than image B, then Accuracy(A) > Accuracy(B).

However, we further extend this formula by introducing a term that penalizes longer programs. The fitness function for GPIS is as follows:

[Equation (2): Fitness expressed in terms of FPR, FNR, len and β]    (2)

where FPR represents False Positive Rate, FNR represents False Negative Rate, len represents the length of the program, and β is a scaling factor for the length of a program, such that β ∈ [0.004, 0.008]. We found this range sufficient for our purpose.

2.1.3 Termination Criteria
Termination of the GP is purely fitness based: the evolutionary cycle continues until there is no major change in fitness over 10 generations. To do this, we first calculate a minimum acceptable fitness value based on our trial runs. This value was found to be 95% for the database in use. Until this fitness is achieved, the GP keeps running. Once it is reached, we calculate cumulative means of the fitness of successive generations. If the absolute difference between the means of 10 successive generations is less than 5% of the highest fitness achieved, the GP stops. If, however, the GP is used on any other database, a default value of 90% is set. The termination criterion can be defined as follows:

|current fitness – mean fitness(10 gen)| < 0.05 × highest fitness

2.1.4 Parent Selection
Parent selection is done to select chromosomes that undergo diversification operations. In order to do this, we use a tournament selection scheme. It is chosen instead of rank selection as it is computationally more efficient. The size of the tournament window λ is kept at 10% of the size of the population. The number of parents selected is 50% of the size of the population.

2.1.5 Elitism
We use elitism as a means of saving the top 1% of the chromosomes of a population. The best 1% of the chromosomes in the population are copied without change to the next generation.

2.1.6 Diversification
We employ five genetic operators in total: one crossover and four mutation operators. These are selected probabilistically based on their respective rates of crossover and mutation.

Crossover: We use a 1-point crossover for our GP. Two parents are chosen randomly from the parent pool. A random location is chosen in each of the parent chromosomes. The subsequences before and after this location in the parents are exchanged, creating two offspring chromosomes.

Mutation: We use four mutation operators for our GP. There are three inter-genomic mutation operators, namely, swap, insert and delete, and one intra-genomic mutation operator, alter, which typically alters the weight element of the selected gene. The gene to be mutated is randomly chosen from the selected parent chromosome.
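As an illustration of Sections 2.1.4 and 2.1.6, the following Python sketch shows tournament selection, 1-point crossover with an independent cut point in each parent, and the four mutation operators acting on a linear chromosome. It is a sketch under stated assumptions rather than the authors' MATLAB code: the `Gene` record is reduced to an operator name and one weight, the exact behaviour of swap, insert and delete is not spelled out in the paper and is guessed here, and truncating over-long offspring to the 15-gene limit is also an assumption.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

MAX_LEN = 15  # maximum chromosome length from Section 2.1.1


@dataclass(frozen=True)
class Gene:
    op: str        # primitive name, e.g. "GAUS" (hypothetical reduced gene)
    weight: float  # stand-in for the gene's weight field


Chromosome = List[Gene]


def tournament_select(pop: List[Chromosome], fitness: Callable[[Chromosome], float],
                      n_parents: int, window: int, rng: random.Random) -> List[Chromosome]:
    """Each parent is the fittest of `window` chromosomes drawn at random."""
    return [max(rng.sample(pop, window), key=fitness) for _ in range(n_parents)]


def one_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Cut each parent at its own random point and exchange the tails."""
    i = rng.randint(1, max(1, len(a) - 1))
    j = rng.randint(1, max(1, len(b) - 1))
    # ASSUMPTION: offspring longer than MAX_LEN are truncated.
    return (a[:i] + b[j:])[:MAX_LEN], (b[:j] + a[i:])[:MAX_LEN]


def swap(ch: Chromosome, rng: random.Random) -> Chromosome:
    """Inter-genomic: exchange the positions of two genes."""
    ch = list(ch)
    if len(ch) >= 2:
        i, j = rng.sample(range(len(ch)), 2)
        ch[i], ch[j] = ch[j], ch[i]
    return ch


def insert(ch: Chromosome, gene: Gene, rng: random.Random) -> Chromosome:
    """Inter-genomic: add a new gene at a random position, if room remains."""
    ch = list(ch)
    if len(ch) < MAX_LEN:
        ch.insert(rng.randint(0, len(ch)), gene)
    return ch


def delete(ch: Chromosome, rng: random.Random) -> Chromosome:
    """Inter-genomic: remove one randomly chosen gene, keeping at least one."""
    ch = list(ch)
    if len(ch) > 1:
        del ch[rng.randrange(len(ch))]
    return ch


def alter(ch: Chromosome, rng: random.Random) -> Chromosome:
    """Intra-genomic: change only the weight of one randomly chosen gene."""
    ch = list(ch)
    k = rng.randrange(len(ch))
    ch[k] = replace(ch[k], weight=rng.random())
    return ch


rng = random.Random(0)
ops = ["GAUS", "AVER", "ERODE", "CLOSE", "THRES"]
pop = [[Gene(rng.choice(ops), rng.random()) for _ in range(rng.randint(2, 6))]
       for _ in range(20)]
toy_fitness = lambda ch: -len(ch)  # placeholder fitness, not the GPIS fitness

# Window = 10% and parents = 50% of a 20-member toy population.
parents = tournament_select(pop, toy_fitness, n_parents=10, window=2, rng=rng)
child1, child2 = one_point_crossover(parents[0], parents[1], rng)
print(len(parents), len(child1), len(child2))
```

Because the cut points are chosen independently in each parent, the two offspring generally differ in length from their parents, which is why a length cap is needed at all.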

2.1.7 Injection
In order to overcome loss of diversity in a population, we use an injection mechanism. We inject a fixed percentage of new randomly initialized programs into the population after every n generations. In the current configuration, we inject 20% new programs every 5 generations.

2.1.8 Survivor Aggregation
The aim of this phase is to collect chromosomes that have qualified to be part of the next generation (parent, offspring, elite, injected) in order to build the population for the next generation.

This phase works in two modes: non-injection and injection mode. In the non-injection mode, copies of all parent chromosomes (50%), offspring chromosomes (49%) and elite chromosomes (1%) form the population of the next generation. In the injection mode, since a fixed size population (20%) of new chromosomes is inserted into the population, the top 79% of the parent-offspring population is selected along with the elite set (1%) to form the population of the next generation.

2.1.9 Output (Fittest Individual)
Once the termination criterion has been satisfied, the output of the GP is typically the "fittest" chromosome present in the final population. This chromosome is then chosen to be tested on a set of unseen test images, as explained in Section 2.2. Our aim is to create a pool of such outputs (segmentation programs) which allows us to have multiple segmentation algorithms for the same database. This is created by subsequent runs of the GP.

Note: When we apply percentages, the results are rounded to the closest integers. In case of elitism, if 1% < 1, 1 individual is copied.

2.2 Stage 2: Evaluation Methodology
As mentioned in the previous section, the output of Stage 1 gives us one chromosome, which was the fittest chromosome amongst the population of the final generation. The accuracy of the segmentations produced by this chromosome on the training images is known as the training accuracy of the run. The actual challenge for this individual is to produce similar segmentation accuracies on an unseen set of images known as the validation images.

In order to do this, we randomly select a fixed number of new images from outside the training set along with their corresponding GTs, from the image database. From this point onwards, we call this the validation set. Once the validation set is chosen, the "fittest chromosome" is applied on the entire set of images, one by one, and the segmentation accuracy for each image is calculated based on the accuracy formula (1) given in Section 2.1.2. Once this process ends, the average segmentation accuracy of the set, or validation accuracy of the run, is calculated.

We repeat the above process for various runs and calculate the overall training accuracy (average training accuracies of runs) and validation accuracy (average validation accuracies of runs) for the algorithm. From here on, we refer to the above as training accuracy and validation accuracy respectively.

The output of Stage 2 is a chromosome that performs equally well on both training and validation sets and produces high overall validation accuracy.

2.3 Experimental Setup
In order to test the effectiveness and efficacy of our algorithm, we tested the algorithm on a biomedical image database that consisted of HeLa cell images (in culture) of size 512 pixels × 384 pixels. The task of the algorithm was to segment the cells present in the images. The procedure for obtaining results using our algorithm is given in Section 2.3.1.1. We also compare the results of our algorithm with those produced by GENIE Pro. The procedure used for obtaining results using GENIE Pro is given in Section 2.3.1.2. The final parameter values used for GPIS are given in Table 2.

Table 2. Parameter settings for GPIS

Population size: µ                 200
Crossover Rate: Pc                 0.45
Swap Mutation Rate: Pms            0.25
Insert Mutation Rate: Pmi          0.25
Delete Mutation Rate: Pmd          0.2
Alter Mutation Rate: Pma           0.7
Scalability factor for length: β   0.005

2.3.1 Procedure for Training and Validation
In order to plan a run of the algorithm, we first decide the size of the training and validation sets. To do so, we define G as the global total number of images in a database, T as the training set, V as the validation set, and R as the number of times optimal individuals are evolved for the same database. The final values for the above used in the present configuration are: G = 1026, T = 30, V = 100 and R = 28.

2.3.1.1 Procedure for Obtaining Results using GPIS
Step 1. Randomly select T images and other V images from the G images in the database.
Step 2. Perform training on T images to choose the fittest individual for validation.
Step 3. Validate this individual on V images to check the applicability of this individual on unseen images. If the individual produces high validation accuracy, save it in the result set, else discard it.
Step 4. Repeat Steps 1 to 3, R times, producing a set of optimal individuals (result set).
Step 5. Calculate values of average training and validation accuracy of the result set.
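The five steps of the GPIS procedure can be sketched as a small Python driver. This is a hedged outline only: `train` and `validate` are hypothetical callables standing in for a GPIS learning run and for the Stage 2 evaluation, and `min_valid_acc` is an assumed threshold, since the paper says an individual is kept when its validation accuracy is "high" without giving a number.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple


def split_database(image_ids: Sequence[int], n_train: int, n_valid: int,
                   rng: random.Random) -> Tuple[List[int], List[int]]:
    """Step 1: draw T training images and V other images at random."""
    chosen = rng.sample(list(image_ids), n_train + n_valid)
    return chosen[:n_train], chosen[n_train:]


def run_protocol(image_ids: Sequence[int], n_train: int, n_valid: int, n_runs: int,
                 train: Callable[[List[int]], Tuple[object, float]],
                 validate: Callable[[object, List[int]], float],
                 min_valid_acc: float,
                 rng: random.Random) -> Optional[Tuple[float, float]]:
    """Steps 1 to 5: returns (average training, average validation) accuracy."""
    result_set = []
    for _ in range(n_runs):                               # Step 4: R repetitions
        t_set, v_set = split_database(image_ids, n_train, n_valid, rng)
        individual, train_acc = train(t_set)              # Step 2
        valid_acc = validate(individual, v_set)           # Step 3
        if valid_acc >= min_valid_acc:                    # keep or discard
            result_set.append((train_acc, valid_acc))
    if not result_set:
        return None
    return (mean(t for t, _ in result_set),               # Step 5
            mean(v for _, v in result_set))


# Stub learner and evaluator so that the sketch runs end to end.
def fake_train(t_set: List[int]) -> Tuple[object, float]:
    return "evolved program", 0.98


def fake_validate(individual: object, v_set: List[int]) -> float:
    return 0.97


# G = 1026, T = 30, V = 100 and R = 28 are the values given in Section 2.3.1.
summary = run_protocol(range(1026), 30, 100, 28, fake_train, fake_validate,
                       min_valid_acc=0.90, rng=random.Random(1))
print(summary)
```

Drawing T + V images in a single sample keeps the training and validation sets disjoint, which is what "other V images" in Step 1 requires.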

2.3.1.2 Procedure for Obtaining Results using GENIE Pro
Step 1. Select the same T and V images from the G images in the database that were used for the corresponding GPIS run.
Step 2. Load each of the T images as a base image and create a training overlay for each image by marking Foreground (object) and Background (non-object) pixels manually.
Step 3. Train on these manually marked training overlays using the in-built Ifrit Pixel Classifier.
Step 4. Apply the learned solution on V images to produce corresponding segmented images.
Step 5. Calculate validation accuracy for these V images using formula (1).
Step 6. Repeat Steps 1 to 5, R times, as for GPIS.
Step 7. Calculate values of average training and validation accuracy of the result set.

3. RESULTS
We have based our results on two criteria: effectiveness of the algorithm in accurately segmenting the given images, and efficiency of the algorithm in doing so.

Effectiveness is based on two measures, pixel accuracy of the evolved solution and the cell count rate (percentage of cell structures correctly identified). In order to calculate the cell count rate, we have categorized cells into two types: Type 1 and Type 2. Type 1 cells are those which can be identified by eye with relative ease. Type 2 cells are those which are relatively difficult to identify by eye. We also provide comparative results for effectiveness for GENIE Pro. This is presented in Section 3.1.

Efficiency reflects the time the algorithm takes to produce one individual of acceptable fitness. This is measured in terms of number of generations. These results are presented in Section 3.2. We also briefly discuss one evolved program and provide segmented images produced. This is presented in Section 3.3 and Figures 5 and 6.

3.1 Effectiveness
Table 3 presents results obtained for training and validation accuracies of segmentation achieved for GPIS and GENIE Pro. These values represent each algorithm’s ability to correctly classify each pixel in an image as an object or non-object pixel. We found that our algorithm performed better in segmenting the cells in the images as compared to GENIE Pro.

Table 3. Segmentation accuracy: GPIS vs GENIE Pro

Algorithm    Training Data   Validation Data
GPIS         98.76%          97.01%
GENIE Pro    94.12%          93.12%

The second measure for effectiveness that we used was cell count rate. We extend the concept of TPs, TNs, FPs and FNs to object detection, where a TP denotes an object that is correctly identified by the algorithm as a cell, an FN denotes a cell incorrectly identified as background, an FP denotes a non-object incorrectly identified as a cell, and a TN denotes a non-object correctly identified as the background. In order to consider an object as belonging to any of the above four options, a minimum of 70% of object pixels must correspond to that option. Cells identified were manually counted.

Similar to the accuracy formula, based on TPs, TNs, FPs and FNs, we can define the FPR and FNR for cell count. FPR is the proportion of non-cell structures that were erroneously reported as being cell structures. FNR is the proportion of cell structures that were erroneously reported as non-cell structures. The cell count rate formula used is as follows:

Cell Count Rate = (1 − FPR) × (1 − FNR)    (3)

Table 4. Cell count rate: GPIS vs GENIE Pro

                     GPIS                        GENIE Pro
Cell Count Measure   Training   Validation      Training   Validation
                     Data       Data            Data       Data
Detected Cells       98.24%     97.98%          97.02%     96.56%
Type 1 Cells         100%       100%            100%       100%
Type 2 Cells         98.78%     98.22%          97.49%     96.89%
Undetected Cells     1.32%      1.55%           2.12%      2.25%

3.2 Efficiency
Table 5 reflects the efficiency of the process to produce the required results. We measure efficiency based on the number of generations taken by GPIS to produce one individual of minimum acceptable fitness. This acceptable fitness is 95% training accuracy. In our runs, we observed that GPIS never failed to produce an acceptable individual.

Table 5. Performance of GPIS based on number of generations

Statistical Measure    Number Of Generations
MEAN                   122.07
MEDIAN                 122
STANDARD DEVIATION     6.85
UPPER BOUND            138
LOWER BOUND            112

The experiments were performed on an Intel Pentium (R) 4 CPU, 3.06 GHz, 2GB RAM computer. To execute 1 generation, GPIS took on average 4.21 minutes. The average time taken for a complete run was approximately 513 minutes. The maximum time taken for a complete run was 580 minutes.

Since GPIS is designed to run as an offline tool and the time it takes to execute an evolved program is between 1 and 3 seconds, the period of evolution of an optimal program is within reasonable real world constraints. Also, the standard deviation for the number of generations is low. This shows that GPIS runs consistently to produce an optimal program within a tight window.
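The generation counts in Table 5 and the timings quoted in Section 3.2 can be cross-checked with a few lines of arithmetic. The Python sketch below simply multiplies the reported generation counts by the reported average time per generation; it assumes that time per generation is roughly constant across a run, which the paper does not state explicitly.

```python
MINUTES_PER_GENERATION = 4.21  # average time per generation (Section 3.2)

# Generation counts from Table 5.
GENERATIONS = {"mean": 122.07, "lower bound": 112, "upper bound": 138}
STD_DEV = 6.85

# Estimated run time for each generation count.
run_minutes = {name: g * MINUTES_PER_GENERATION for name, g in GENERATIONS.items()}

# Spread of the generation count relative to its mean.
relative_spread = STD_DEV / GENERATIONS["mean"]

for name, minutes in run_minutes.items():
    print(f"{name:>11}: {minutes:6.1f} min")
print(f"std dev / mean = {relative_spread:.1%}")
```

The mean works out to about 513.9 minutes and the upper bound to about 581 minutes, in line with the reported average of approximately 513 minutes and maximum of 580 minutes; the standard deviation is about 5.6% of the mean.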

3.3 Evolved Program
Figure 5 shows the chromosomal and genomic structure of an evolved program. The program evolved is a combination of filters and morphological operators. The first gene is a 6 × 6 Gaussian low pass filter with a sigma value of 0.8435, followed by a 4 × 4 averaging filter. The output image from gene 2 is eroded with a flat, disk-shaped structuring element of radius 2. A 6 × 6 averaging filter is again applied to the eroded image. Its output image undergoes a composite morphological operation of closing and opening with the same structuring element as above. Finally this image is converted to a binary output image using a threshold of 0.09022. The validation accuracy is calculated for this image.
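To make the genome-to-program mapping concrete, the evolved chromosome described in this section can be written as a list of gene records and decoded into MATLAB source text. The Python sketch below is illustrative only: the three-field `Gene` record is a simplification of the five-part gene layout, the operator names come from Table 1, and the choice of MATLAB calls (`fspecial`, `imfilter`, `strel`, `imerode`, `imclose`, `imopen`, `im2bw`) is an assumption modelled on the calls shown in Figure 2 (c), not the authors' actual evolved code.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    op: str                          # primitive name from Table 1
    weights: Tuple[float, ...] = ()  # numeric parameters of the primitive
    se: Optional[str] = None         # structuring element (morphological genes only)


DISK2 = "strel('disk', 2)"  # flat, disk-shaped structuring element of radius 2

# The six genes described in the text, in order.
EVOLVED: List[Gene] = [
    Gene("GAUS", (6, 6, 0.8435)),
    Gene("AVER", (4, 4)),
    Gene("ERODE", se=DISK2),
    Gene("AVER", (6, 6)),
    Gene("CLOP", se=DISK2),
    Gene("THRES", (0.09022,)),
]


def to_matlab(chromosome: List[Gene]) -> str:
    """Decode a linear chromosome into MATLAB source, one line per gene."""
    lines = ["io0 = input;"]  # `input` stands for the input image, as in Figure 2 (c)
    for k, g in enumerate(chromosome, start=1):
        src, dst = f"io{k - 1}", f"io{k}"
        if g.op == "GAUS":
            rows, cols, sigma = g.weights
            expr = f"imfilter({src}, fspecial('gaussian', [{rows} {cols}], {sigma}))"
        elif g.op == "AVER":
            rows, cols = g.weights
            expr = f"imfilter({src}, fspecial('average', [{rows} {cols}]))"
        elif g.op == "ERODE":
            expr = f"imerode({src}, {g.se})"
        elif g.op == "CLOP":
            # Close-open: closing followed by opening with the same element.
            expr = f"imopen(imclose({src}, {g.se}), {g.se})"
        elif g.op == "THRES":
            expr = f"im2bw({src}, {g.weights[0]})"
        else:
            raise ValueError(f"primitive not covered by this sketch: {g.op}")
        lines.append(f"{dst} = {expr};")
    lines.append(f"out = io{len(chromosome)};")
    return "\n".join(lines)


program = to_matlab(EVOLVED)
print(program)
```

Each gene consumes the output of the previous one, which is the simplest reading of a linear chromosome; genes with two input planes (for example ADDP) would need the input-plane fields that this sketch leaves out.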
Figure 6 shows implementation of this evolved program on two [8] Bhanu, B.; Sungkee Lee; Das, S., ―Adaptive image
validation images along with corresponding results from GENIE segmentation using genetic and hybrid search methods”,
Pro. IEEE Transactions on Aerospace and Electronic Systems,
Vol. 31, Issue 4, Oct 1995 Page(s):1268 – 1291.
4. CONCLUSIONS
In this paper, we propose a simple approach to the complex
problem of image segmentation. The proposed algorithm, GPIS,
uses genetic programming to evolve image segmentation programs
from a pool of primitive image analysis operators. The evolved
solutions are simple MATLAB® based image segmentation programs
that are easy to read and implement. In addition, the algorithm
does not require any a priori information about the objects to be
segmented from the images. We have tested our algorithm on a
biomedical image database and compared the results with those of
another GA-based image segmentation algorithm, GENIE Pro. Our
algorithm consistently produced better results: both its
segmentation accuracy and its cell count rate were higher than
those of GENIE Pro. It also produced an optimal solution within a
reasonable time window. In addition, GPIS never failed to produce
an optimal solution.

5. ACKNOWLEDGMENTS
We are grateful to Ms Aida Abu-Baker and Ms Janet Laganiere from
CHUM Research Centre, Notre-Dame Hospital, Montreal, for
providing us with the images for the cell database. We would also
like to thank Dr James Lacefield from the University of Western
Ontario, London, for his help on this project.

6. REFERENCES
[1] B. Bhanu, S. Lee, and S. Das, "Adaptive Image Segmentation
    using Genetic and Hybrid Search Methods", IEEE Transactions
    on Aerospace and Electronic Systems, Vol. 31, Issue 4, Oct
    1995, pp. 1268-1291.
[2] B. Bhanu and Y. Lin, "Object Detection in Multi-modal Images
    using Genetic Programming", Applied Soft Computing, Vol. 4,
    Issue 2, 2004, pp. 175-201.
[3] B. Bhanu and Y. Lin, "Learning Composite Operators for Object
    Detection", Proceedings of the Conference on Genetic and
    Evolutionary Computation, July 2002, pp. 1003-1010.
[4] S. P. Brumby, J. P. Theiler, S. J. Perkins, N. R. Harvey, J. J.
    Szymanski, and J. J. Bloch, "Investigation of Image Feature
    Extraction by a Genetic Algorithm", Proceedings of SPIE,
    Vol. 3812, 1999, pp. 24-31.
[5] J. M. Daida, J. D. Hommes, T. F. Bersano-Begey, S. J. Ross,
    and J. F. Vesecky, "Algorithm Discovery using the Genetic
    Programming Paradigm: Extracting Low-contrast Curvilinear
    Features from SAR Images of Arctic Ice", Advances in Genetic
    Programming II, P. J. Angeline, K. E. Kinnear (Eds.),
    Chapter 21, The MIT Press, 1996, pp. 417-442.
[6] J. M. Daida, J. D. Hommes, S. J. Ross, A. D. Marshall, and J.
    F. Vesecky, "Extracting Curvilinear Features from SAR Images
    of Arctic Ice: Algorithm Discovery Using the Genetic
    Programming Paradigm", Proceedings of the IEEE International
    Geoscience and Remote Sensing Symposium, Italy, IEEE Press,
    1995, pp. 673-75.
[7] K. S. Fu and J. K. Mui, "A Survey on Image Segmentation",
    Pattern Recognition, 13, 1981, pp. 3-16.
[8] P. Ghosh and M. Mitchell, "Segmentation of Medical Images
    using a Genetic Algorithm", Proceedings of the 8th Annual
    Conference on Genetic and Evolutionary Computation, 2006,
    pp. 1171-1178.
[9] N. Harvey, R. M. Levenson, and D. L. Rimm, "Investigation of
    Automated Feature Extraction Techniques for Applications in
    Cancer Detection from Multi-spectral Histopathology Images",
    Proceedings of SPIE, Vol. 5032, 2003, pp. 557-556.
[10] D. Howard and S. C. Roberts, "A Staged Genetic Programming
    Strategy for Image Analysis", Proceedings of the Genetic and
    Evolutionary Computation Conference, 1999, pp. 1047-1052.
[11] D. Howard, S. C. Roberts, and R. Brankin, "Evolution of Ship
    Detectors for Satellite SAR Imagery", Proceedings of
    EuroGP'99, Vol. 1598, 1999, pp. 135-148.
[12] N. R. Pal and S. K. Pal, "A Review on Image Segmentation
    Techniques", Pattern Recognition, 26, 1993, pp. 1277-1294.
[13] D. L. Pham, C. Xu, and J. L. Prince, "Survey of Current
    Methods in Medical Image Segmentation", Annual Review of
    Biomedical Engineering, 2, 2000, pp. 315-337.
[14] R. Poli, "Genetic Programming for Feature Detection and
    Image Segmentation", T. C. Fogarty (Ed.), Evolutionary
    Computation, Springer-Verlag, Berlin, Germany, 1996, pp.
    110-125.
[15] M. E. Roberts and E. Claridge, "An Artificially Evolved
    Vision System for Segmenting Skin Lesion Images",
    Proceedings of the 6th International Conference on Medical
    Image Computing and Computer-Assisted Intervention, Vol.
    2878, 2003, pp. 655-662.
[16] W. Tackett, "Genetic Programming for Feature Discovery and
    Image Discrimination", in S. Forrest (Ed.), Proceedings of
    5th International Conference on Genetic Algorithm, 1993, pp.
    303-311.
[17] Y. J. Zhang, "Influence of Segmentation over Feature
    Measurement", Pattern Recognition Letters, 16(2), 1992, pp.
    201-206.

GAUS  AVER  EROD  AVER  CLOP  THRES

[GAUSS, d1, 0, 6, 0.8435] [AVER, io1, 0, 4, 0] [EROD, io2, 0, 0, 1]
[AVER, io3, 0, 6, 0] [CLOP, io4, 0, 0, 1] [THRESH, io5, 0, 0.09022, 0]
(a)

Genomic Structure               MATLAB® Implementation

                                d1 = input;
[GAUSS, d1, 0, 6, 0.8435]       h1 = fspecial('gaussian', [6 6], 0.8435);
                                io1 = imfilter(d1, h1);
[AVER, io1, 0, 4, 0]            h2 = fspecial('average', [4 4]);
                                io2 = imfilter(io1, h2);
[EROD, io2, 0, 0, 1]            SE1 = strel('disk', 2);
                                io3 = imerode(io2, SE1);
[AVER, io3, 0, 6, 0]            h3 = fspecial('average', [6 6]);
                                io4 = imfilter(io3, h3);
[CLOP, io4, 0, 0, 1]            io5 = imclose(io4, SE1);
[THRESH, io5, 0, 0.09022, 0]    output = im2bw(io5, 0.09022);

Segmentation accuracy on validation set: 99.04%; Number of operators used = 6; Average execution time = 1.252 seconds; Number of
generations needed to converge = 114; Number of fitness evaluations = 10,532
(b)
Figure 5. An evolved program: (a) chromosomal and genomic structure of the evolved program, (b) genomic structure and
equivalent MATLAB® implementation of the evolved program, with corresponding performance results
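
The gene-to-statement mapping in Figure 5(b) can be sketched as a small decoder. The Python sketch below is illustrative only and is not the GPIS implementation. The field meanings (operator, input image, unused second input, size or threshold, sigma or structuring-element index) are inferred from the figure and are assumptions, as are the function name decode and the se_radius parameter. CLOP is emitted as imclose only, following the listing in the figure.

```python
def decode(genome, se_radius=2):
    """Translate Figure 5-style genes into MATLAB statements (as strings).

    Each gene is (op, input, second_input, p1, p2). The meaning of p1 and
    p2 per operator is an assumption inferred from Figure 5(b).
    """
    lines = ["d1 = input;"]
    n_filters = 0          # running index for kernels h1, h2, ...
    emitted_se = set()     # structuring elements already declared
    last = len(genome)
    for k, (op, src, _second_input, p1, p2) in enumerate(genome, start=1):
        # The final gene writes to 'output'; earlier genes write io1, io2, ...
        out = "output" if k == last else f"io{k}"
        if op in ("GAUSS", "AVER"):
            n_filters += 1
            h = f"h{n_filters}"
            if op == "GAUSS":   # p1 = kernel size, p2 = sigma
                lines.append(f"{h} = fspecial('gaussian', [{p1} {p1}], {p2});")
            else:               # p1 = kernel size
                lines.append(f"{h} = fspecial('average', [{p1} {p1}]);")
            lines.append(f"{out} = imfilter({src}, {h});")
        elif op in ("EROD", "CLOP"):
            se = f"SE{p2}"      # p2 = structuring-element index (assumed)
            if se not in emitted_se:
                lines.append(f"{se} = strel('disk', {se_radius});")
                emitted_se.add(se)
            func = "imerode" if op == "EROD" else "imclose"
            lines.append(f"{out} = {func}({src}, {se});")
        elif op == "THRESH":    # p1 = threshold
            lines.append(f"{out} = im2bw({src}, {p1});")
        else:
            raise ValueError(f"unknown operator: {op}")
    return lines


# Genome of the evolved program in Figure 5.
GENOME = [
    ("GAUSS", "d1", 0, 6, 0.8435),
    ("AVER", "io1", 0, 4, 0),
    ("EROD", "io2", 0, 0, 1),
    ("AVER", "io3", 0, 6, 0),
    ("CLOP", "io4", 0, 0, 1),
    ("THRESH", "io5", 0, 0.09022, 0),
]

if __name__ == "__main__":
    print("\n".join(decode(GENOME)))
```

Because each gene names its own input image, the decoder needs no state beyond the counters for kernels and structuring elements, which is consistent with the linear program structure shown in the figure.
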

(a) (b) (c) (d)


Figure 6. (a) Segmentation produced by GPIS using evolved program shown above on validation image 1 (Validation Accuracy =
99.21%, Cell Count Rate = 100%), (b) Segmentation produced by GENIE Pro on validation image 1 (Validation Accuracy =
95.46%, Cell Count Rate = 97.89%), (c) Segmentation produced by GPIS using evolved program shown above on validation image
2 (Validation Accuracy = 98.93%, Cell Count Rate = 100%), (d) Segmentation produced by GENIE Pro on validation image 2
(Validation Accuracy = 94.22%, Cell Count Rate = 96.45%)