Constructing Models from Microarray Data with Swarm Algorithms
Mrs. Aruchamy Rajini
Lecturer in Computer Applications
Hindusthan College of Arts & Science, Coimbatore
aruchamy_rajini@yahoo.co.in

Dr. (Mrs.) Vasantha Kalayani David
Associate Professor, Department of Computer Science
Avinashilingam Deemed University, Coimbatore
vasanthadavid@yahoo.com
Abstract--- Building a model plays an important role in DNA microarray data analysis. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. In this paper, a Flexible Neural Tree (FNT) model is applied to the classification of gene expression profiles. Based on pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. This framework allows input variable selection, over-layer connections and different activation functions for the various nodes involved. The FNT structure is developed using Ant Colony Optimization (ACO), and the free parameters embedded in the neural tree are optimized by the Particle Swarm Optimization (PSO) algorithm and its enhancement (EPSO). The purpose of this research is to find an appropriate model for feature selection and tree-based ensemble models capable of delivering high-performance classification models for microarray data.

Keywords--- DNA, FNT, ACO, PSO, EPSO
I. INTRODUCTION

A DNA microarray (also commonly known as a DNA chip or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as glass, plastic or a silicon chip, forming an array for the purpose of expression profiling, i.e., monitoring expression levels for thousands of genes simultaneously. Microarrays provide a powerful basis to monitor the expression of thousands of genes in order to identify mechanisms that govern the activation of genes in an organism [1].

Recent advances in DNA microarray technology allow scientists to measure expression levels of thousands of genes simultaneously in a biological organism. Since cancer cells usually evolve from normal cells due to mutations in genomic DNA, comparison of the gene expression levels of cancerous and normal tissues, or of different cancerous tissues, may be useful to identify those genes that might anticipate the clinical behavior of cancers. Microarray technology has transformed modern biological research by permitting the simultaneous study of genes comprising a large part of the genome [2]. In response to the development of DNA microarray technologies, classification methods and gene selection techniques have been developed for better use of classification algorithms on microarray gene expression data [3][4].

Variable selection refers to the problem of selecting input variables that are most predictive of a given outcome. Appropriate variable selection can greatly enhance the effectiveness and potential interpretability of an inference model. Variable selection problems are found in all supervised and unsupervised machine learning tasks including classification, regression, time-series prediction, and clustering [5].

This paper develops a Flexible Neural Tree (FNT) [6] for selecting the input variables. Based on pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. FNT allows input variable selection, over-layer connections and different activation functions for different nodes. The tuning of the parameters encoded in the structure is accomplished using the Particle Swarm Optimization (PSO) algorithm and its enhancement.

The proposed method interleaves both optimizations. Starting with random structures and corresponding parameters, it first tries to improve the structure; as soon as an improved structure is found, it tunes the parameters of that structure. It then goes back to improving the structure again, and then tunes the structure and rules' parameters. This loop continues until a satisfactory solution is found or a time limit is reached.
 
II. THE FLEXIBLE NEURAL TREE MODEL

The function set F and terminal instruction set T used for generating a FNT model are described as S = F ∪ T = {+2, +3, ..., +N} ∪ {x1, ..., xn}, where +i (i = 2, 3, ..., N) denote non-leaf nodes' instructions taking i arguments, and x1, x2, ..., xn are leaf node instructions taking no arguments. The output of a non-leaf node is calculated as a flexible neuron model (see Fig. 1). From this point of view, the instruction +i is also called a flexible neuron operator with i inputs.

In the creation process of a neural tree, if a non-terminal instruction, i.e., +i (i = 2, 3, 4, ..., N), is selected, i real values are randomly generated and used to represent the connection strengths between the node +i and its children. In addition, two adjustable parameters a_i and b_i are randomly created as flexible activation function parameters, with values in the range [0, 1]. For developing the forecasting model, the flexible activation function

    f(a_i, b_i, x) = e^(-((x - a_i)/b_i)^2)

is used. The total excitation of +n is

    net_n = Σ_(j=1..n) w_j x_j,

where x_j (j = 1, 2, ..., n) are the inputs to node +n and the weights w_j are generated randomly in the range [0, 1]. The output of the node +n is then calculated by

    out_n = f(a_n, b_n, net_n) = e^(-((net_n - a_n)/b_n)^2).

The overall output of the flexible neural tree can be computed recursively from left to right by a depth-first method [7].
 
Fig. 1. A flexible neuron operator (left), and a typical representation of the FNT with function instruction set F = {+2, +3, +4, +5, +6} and terminal instruction set T = {x1, x2, x3} (right).
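To make the computation concrete, the following is a minimal Python sketch of evaluating an FNT depth-first, as described above; the tree encoding (tuples for flexible neuron operators, strings for input variables) is our own illustrative choice, not the paper's representation.

    import math

    # Sketch of depth-first FNT evaluation. A node is either a terminal
    # ("x1", "x2", ...) or a tuple (a, b, weights, children) representing
    # a flexible neuron operator +n; this encoding is an assumption.

    def evaluate(node, sample):
        """Return the output of `node` for an input dict like {"x1": 0.3, ...}."""
        if isinstance(node, str):                 # leaf: an input variable
            return sample[node]
        a, b, weights, children = node            # non-leaf: flexible neuron +n
        net = sum(w * evaluate(c, sample)         # net_n = sum_j w_j * x_j
                  for w, c in zip(weights, children))
        return math.exp(-(((net - a) / b) ** 2))  # out_n = e^(-((net-a)/b)^2)

    # Example: a +2 operator over x1 and a +2 sub-operator over (x2, x3)
    tree = (0.4, 0.9, [0.2, 0.7],
            ["x1", (0.1, 0.5, [0.3, 0.6], ["x2", "x3"])])
    print(evaluate(tree, {"x1": 1.0, "x2": 0.3, "x3": 0.8}))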
III. SWARM INTELLIGENCE ALGORITHMS

Swarm Intelligence (SI) has recently emerged as a family of nature-inspired algorithms, especially known for their ability to produce low-cost, fast and reasonably accurate solutions to complex search problems [1]. This section gives an introduction to swarm intelligence with special emphasis on two specific SI algorithms: Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO).

PSO originated from computer simulations of the coordinated motion in flocks of birds or schools of fish. As these animals wander through a three-dimensional space, searching for food or evading predators, the corresponding algorithms make use of particles moving at velocities dynamically adjusted according to their historical behavior and that of their companions in an n-dimensional space, in order to search for solutions to an n-variable function optimization problem. The Particle Swarm Optimization algorithm includes some tuning parameters that greatly influence its performance, often stated as the exploration-exploitation trade-off. Exploration is the ability to test various regions of the problem space in order to locate a good optimum, hopefully the global one. Exploitation is the ability to concentrate the search around a promising candidate solution in order to locate the optimum precisely [8][9][10][11].

El-Desouky et al. [10] proposed a more enhanced particle swarm algorithm that varies the inertia weight exponentially instead of linearly, which gives better results when applied to some benchmark functions. In this paper three models are compared: 1) a tree structure is created with ACO; 2) a tree structure is created with ACO and the parameters are optimized with PSO; 3) a tree structure is created with ACO and the parameters are optimized with EPSO. Comparisons of the three models are presented in order to propose an efficient methodology.
 
IV. ANT COLONY OPTIMIZATION (ACO) FOR EVOLVING THE ARCHITECTURE OF FNT

ACO is a probabilistic technique for solving computational problems by finding optimal paths. It is a paradigm for designing metaheuristic algorithms for combinatorial optimization problems. The main underlying idea, inspired by the behavior of real ants, is that of a parallel search over several constructive threads, based on local problem data and on a dynamic memory structure containing information on the quality of previously obtained results.

In this algorithm, each ant builds and modifies the trees according to the quantity of pheromone at each node. Each node memorizes its rate of pheromone. First, a population of programs is generated randomly. Each node is initialized at 0.5, which means that the probability of choosing each terminal and function is initially equal. The higher the rate of pheromone, the higher the probability of being chosen. Each ant is then evaluated using a predefined objective function, given by the Mean Square Error (MSE) [7]:

    Fit(i) = (1/p) Σ_(j=1..p) (At - Ex)^2    (1)
 
 
where p is the total number of samples, and At and Ex are the actual and expected outputs of the j-th sample. Fit(i) denotes the fitness value of the i-th ant.

The pheromone is updated by two mechanisms:

1. Trail evaporation: evaporation decreases the rate of pheromone for every instruction on every node, in order to avoid unlimited accumulation of trails, according to the following formula:

    P_g = (1 - α) P_(g-1)    (2)

where P_g denotes the pheromone value at generation g and α is a constant (α = 0.15).

2. Daemon actions: for each tree, the components of the tree are reinforced according to the fitness of the tree. The formula is

    P_(i,si) = P_(i,si) + α Fit(s)    (3)

where s is a solution (tree), Fit(s) its fitness, si the function or terminal selected at node i in this individual, α is a constant (α = 0.1), and P_(i,si) is the value of the pheromone for the instruction si at node i [7].

A brief description of the AP algorithm is as follows: (1) every component of the pheromone tree is set to an average value; (2) trees are randomly generated based on the pheromone; (3) the ants are evaluated; (4) the pheromone is updated; (5) go to step (2) unless some criterion is satisfied [7].
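A minimal Python sketch of this bookkeeping, covering the MSE fitness of Eq. (1) and the two pheromone-update mechanisms of Eqs. (2) and (3), follows; the pheromone table layout (pheromone[node][instruction]) and all names are our own assumptions, not the paper's implementation.

    # Sketch of ACO bookkeeping for FNT evolution; layout is assumed.
    EVAP_ALPHA = 0.15    # alpha in Eq. (2)
    REINF_ALPHA = 0.10   # alpha in Eq. (3)

    def fitness(actual, expected):
        """Fit(i) = (1/p) * sum_j (At - Ex)^2, Eq. (1)."""
        p = len(actual)
        return sum((a - e) ** 2 for a, e in zip(actual, expected)) / p

    def evaporate(pheromone):
        """Trail evaporation, Eq. (2): every entry decays by (1 - alpha)."""
        for instructions in pheromone.values():
            for instr in instructions:
                instructions[instr] *= (1.0 - EVAP_ALPHA)

    def reinforce(pheromone, used, fit):
        """Daemon actions, Eq. (3): reinforce each (node, instruction) of a tree."""
        for node, instr in used:
            pheromone[node][instr] += REINF_ALPHA * fit

    # Every entry starts at 0.5 so all instructions are equally likely at first.
    pheromone = {0: {"+2": 0.5, "x1": 0.5, "x2": 0.5},
                 1: {"+2": 0.5, "x1": 0.5, "x2": 0.5}}
    evaporate(pheromone)
    reinforce(pheromone, used=[(0, "+2"), (1, "x1")],
              fit=fitness([0.9, 0.1], [1.0, 0.0]))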
 
V. PARAMETER OPTIMIZATION WITH PSO

PSO [12] is in principle a multi-agent parallel search technique. It does not require any gradient information about the function to be optimized and uses only primitive mathematical operators. Particles are conceptual entities which fly through the multi-dimensional search space. PSO was inspired by the social behavior of a bird flock or fish school, and it conducts searches using a population of particles which correspond to individuals [13].

In the PSO algorithm, the birds in a flock are symbolically represented as particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location represents a potential solution for the problem in the multi-dimensional problem space; a different problem solution is generated when a particle moves to a new location.

The PSO model consists of a swarm of particles which are initialized with a population of random positions. They move iteratively through the d-dimensional problem space to search for new solutions, where the fitness f (Eqn. (1)) can be calculated as the quality measure. Each particle has a position represented by a position vector x_i (i is the index of the particle) and a velocity represented by a velocity vector v_i; the velocity must be bounded by parameters v_min and v_max. Each particle keeps track of its own best position so far, associated with the best fitness it has achieved, in a vector p_i, and the best position among all the particles obtained so far in the population is kept track of as p_g. At each time step t, using the individual best position p_i(t) and the global best position p_g(t), a new velocity for particle i is computed by [1]

    v_i(t+1) = w v_i(t) + c_1 φ_1 (p_i(t) - x_i(t)) + c_2 φ_2 (p_g(t) - x_i(t))    (4)

where w is the inertia weight, whose range is [0.4, 0.9], and c_1 and c_2 are positive constants, the learning factors called, respectively, the cognitive parameter and the social parameter. Proper fine-tuning of these parameters may result in faster convergence and alleviation of local minima. The default values c_1 = c_2 = 2 are usually used, although c_1 = c_2 = 1.49 has been reported to give better results. φ_1 and φ_2 are uniformly distributed random numbers in the range [0, 1].

During iteration t, the update of the velocity from the previous velocity to the new velocity is determined by Eqn. (4). The new position is then determined by the sum of the previous position and the new velocity, according to the formula:

    x_i(t+1) = x_i(t) + v_i(t+1)    (5)
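A minimal sketch of one PSO update step, implementing Eqs. (4) and (5) in Python under illustrative settings (w = 0.7 within the stated [0.4, 0.9] range, c_1 = c_2 = 1.49), is given below; the function name, the list-based vectors and the clamping details are our own assumptions, not the paper's code.

    import random

    # Sketch of one PSO update step, Eqs. (4) and (5); names are illustrative.
    W  = 0.7     # inertia weight, within the [0.4, 0.9] range given in the text
    C1 = 1.49    # cognitive parameter
    C2 = 1.49    # social parameter

    def pso_step(x, v, p_best, g_best, v_min=-1.0, v_max=1.0):
        """Return updated (position, velocity) vectors for one particle."""
        new_x, new_v = [], []
        for xi, vi, pi, gi in zip(x, v, p_best, g_best):
            phi1, phi2 = random.random(), random.random()  # uniform in [0, 1]
            vel = W * vi + C1 * phi1 * (pi - xi) + C2 * phi2 * (gi - xi)  # Eq. (4)
            vel = max(v_min, min(v_max, vel))   # bound velocity by [v_min, v_max]
            new_v.append(vel)
            new_x.append(xi + vel)              # Eq. (5)
        return new_x, new_v

    x, v = pso_step(x=[0.2, 0.5], v=[0.0, 0.1],
                    p_best=[0.3, 0.4], g_best=[0.6, 0.1])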
Various methods are used to determine which particles influence a given individual. Two basic approaches to PSO exist, based on the interpretation of the neighborhood of particles: (1) the global best (gbest) version of PSO, in which the neighborhood of each particle is the entire swarm; the social component then causes particles to be drawn toward the best particle in the swarm; and (2) the local best (lbest) PSO model, in which particles have information only of their own best and their nearest neighbors' best (lbest), rather than that of the entire group. The gbest model converges quickly but has the weakness of being easily trapped in local optima; it is strongly recommended for unimodal objective functions [1].

The PSO is executed with repeated application of equations (4) and (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations. The PSO algorithm works as follows: 1) The initial population is generated randomly, and the learning parameters c_1, c_2 are assigned in advance. 2) The objective function value for each particle is calculated. 3) The search point is modified: the current search point of each particle is changed using equations (4) and (5). 4) If Fit(s)
