
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

**Constructing Models from Microarray Data with Swarm Algorithms**

Mrs. Aruchamy Rajini

Lecturer in Computer Applications, Hindusthan College of Arts & Science, Coimbatore, aruchamy_rajini@yahoo.co.in

**Dr. (Mrs.) Vasantha Kalyani David**

Associate Professor, Department of Computer Science, Avinashilingam Deemed University, Coimbatore, vasanthadavid@yahoo.com

Abstract: Model building plays an important role in the analysis of DNA microarray data. An essential feature of DNA microarray data sets is that the number of input variables (genes) is far greater than the number of samples. As such, most classification schemes employ variable selection or feature selection methods to pre-process DNA microarray data. In this paper, a Flexible Neural Tree (FNT) model is applied to the classification of gene expression profiles. Based on predefined instruction/operator sets, a flexible neural tree model can be created and evolved. This framework allows input variable selection, over-layer connections and different activation functions for the various nodes involved. The FNT structure is developed using Ant Colony Optimization (ACO), and the free parameters embedded in the neural tree are optimized by the Particle Swarm Optimization (PSO) algorithm and its enhancement, Exponential PSO (EPSO). The purpose of this research is to find a model that is appropriate for feature selection, together with tree-based ensemble models capable of delivering high-performance classification of microarray data.

Keywords: DNA, FNT, ACO, PSO, EPSO

I. INTRODUCTION

A DNA microarray (also commonly known as a DNA chip or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as glass, plastic or a silicon chip, forming an array for expression profiling, that is, for monitoring the expression levels of thousands of genes simultaneously. Microarrays provide a powerful basis for monitoring the expression of thousands of genes in order to identify the mechanisms that govern gene activation in an organism [1]. Recent advances in DNA microarray technology allow scientists to measure the expression levels of thousands of genes simultaneously in a biological organism. Since cancer cells usually evolve from normal cells through mutations in genomic DNA, comparing the gene expression levels of cancerous and normal tissues, or of different cancerous tissues, may help identify the genes that anticipate the clinical behavior of cancers. Microarray technology has advanced modern biological research by permitting the simultaneous study of genes comprising a large part of the genome [2]. In response to the development of DNA microarray technologies, classification methods and gene selection techniques have been proposed to make better use of classification algorithms on microarray gene expression data [3] [4].

Variable selection refers to the problem of selecting the input variables that are most predictive of a given outcome. Appropriate variable selection can greatly enhance the effectiveness and potential interpretability of an inference model. Variable selection problems arise in all supervised and unsupervised machine learning tasks, including classification, regression, time-series prediction and clustering [5].

This paper develops a Flexible Neural Tree (FNT) [6] for selecting the input variables. Based on pre-defined instruction/operator sets, a flexible neural tree model can be created and evolved. The FNT allows input variable selection, over-layer connections and different activation functions for different nodes. The parameters encoded in the structure are tuned with the Particle Swarm Optimization (PSO) algorithm and its enhancement. The proposed method interleaves both optimizations. Starting with random structures and corresponding parameters, it first tries to improve the structure; as soon as an improved structure is found, it tunes that structure's parameters. It then goes back to improving the structure and again tunes the parameters of the new structure. This loop continues until a satisfactory solution is found or a time limit is reached.

II. THE FLEXIBLE NEURAL TREE MODEL

The function set F and the terminal instruction set T used for generating an FNT model are described as $S = F \cup T = \{+_2, +_3, \ldots, +_N\} \cup \{x_1, \ldots, x_n\}$, where $+_i$ ($i = 2, 3, \ldots, N$) denotes a non-leaf node instruction taking $i$ arguments, and $x_1, x_2, \ldots, x_n$ are leaf node instructions taking no arguments. The output of a non-leaf node is calculated as a flexible neuron model (see Fig. 1). From this point of view, the instruction $+_i$ is also called a flexible neuron operator with $i$ inputs. In the creation of a neural tree, if a nonterminal instruction $+_i$ ($i = 2, 3, 4, \ldots, N$) is selected, $i$ real values are randomly generated and used to represent the connection strengths between the node $+_i$ and its children. In addition, two adjustable parameters $a_i$ and $b_i$ are randomly created as flexible activation function parameters, with values in the range [0, 1]. For developing the forecasting model, the flexible activation function

$$f(a_i, b_i, x) = e^{-\left(\frac{x - a_i}{b_i}\right)^2}$$

is used. The total excitation of $+_n$ is

$$\mathrm{net}_n = \sum_{j=1}^{n} w_j x_j,$$

where $x_j$ ($j = 1, 2, \ldots, n$) are the inputs to node $+_n$ and the weights $w_j$ are generated randomly with values in the range [0, 1]. The output of node $+_n$ is then calculated by

$$\mathrm{out}_n = f(a_n, b_n, \mathrm{net}_n) = e^{-\left(\frac{\mathrm{net}_n - a_n}{b_n}\right)^2}.$$

The overall output of the flexible neural tree can be computed recursively, from left to right, by the depth-first method [7].
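The node computation above can be illustrated with a short Python sketch of the depth-first evaluation. The `Leaf`/`Node` classes, the 0-based input indexing and the 0.05 lower bound on `b` (added so that a randomly drawn `b` near 0 cannot cause a division by zero) are our assumptions rather than part of the FNT definition.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Terminal instruction: returns input x_(index+1) (0-based index here)."""
    index: int


@dataclass
class Node:
    """Flexible neuron operator +_i: i weights w_j, activation parameters a and b."""
    weights: List[float]
    a: float
    b: float
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Leaf, Node]


def flexible_activation(a: float, b: float, x: float) -> float:
    """f(a, b, x) = exp(-((x - a) / b) ** 2)."""
    return math.exp(-((x - a) / b) ** 2)


def evaluate(tree: Tree, inputs: Sequence[float]) -> float:
    """Output of the tree, computed recursively, depth-first, left to right."""
    if isinstance(tree, Leaf):
        return inputs[tree.index]
    # net_n = sum_j w_j * x_j, where x_j is the output of the j-th child
    net = sum(w * evaluate(child, inputs) for w, child in zip(tree.weights, tree.children))
    return flexible_activation(tree.a, tree.b, net)


def random_node(children: List[Tree], rng: random.Random) -> Node:
    """Create +_i for i = len(children) with random weights and a, b."""
    weights = [rng.uniform(0.0, 1.0) for _ in children]
    # Assumption: b is drawn from [0.05, 1] instead of [0, 1] to avoid b = 0.
    return Node(weights, a=rng.uniform(0.0, 1.0), b=rng.uniform(0.05, 1.0), children=children)
```

For instance, a $+_2$ node with weights 0.5 and 0.5, $a = 0$ and $b = 1$ over $x_1 = x_2 = 1$ has $\mathrm{net} = 1$ and output $e^{-1} \approx 0.368$. Because `evaluate` recurses before applying `f`, each child's output becomes one of the inputs $x_j$ of its parent, which is how a tree realizes the layered structure of the FNT.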

Fig. 1. A flexible neuron operator (left), in which inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ feed the node $+_n$ with activation $f(a, b)$ and output $y$; and a typical representation of the FNT (right), arranged as an input layer, a first and a second hidden layer and an output layer, with function instruction set $F = \{+_2, +_3, +_4, +_5, +_6\}$ and terminal instruction set $T = \{x_1, x_2, x_3\}$.

III. SWARM INTELLIGENCE ALGORITHMS

Swarm Intelligence (SI) has recently emerged as a family of nature-inspired algorithms, especially known for their ability to produce low-cost, fast and reasonably accurate solutions to complex search problems [1]. Reference [1] gives an introduction to swarm intelligence with special emphasis on two well-known SI algorithms, Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). PSO originated from computer simulations of the coordinated motion of flocks of birds or schools of fish. Just as these animals wander through a three-dimensional space searching for food or evading predators, PSO moves particles through an n-dimensional space, with velocities dynamically adjusted according to each particle's own history and that of its companions, to search for solutions of an n-variable function optimization problem. The PSO algorithm includes some tuning parameters that greatly influence its performance; their setting is often stated as the exploration-exploitation trade-off. Exploration is the ability to test various regions of the problem space in order to locate a good optimum, hopefully the global one. Exploitation is the ability to concentrate the search around a promising candidate solution in order to locate the optimum precisely [8][9][10][11]. El-Desouky et al. [10] proposed an enhanced particle swarm algorithm that varies the inertia weight exponentially instead of linearly, which gives better results on some benchmark functions. In this paper, three models are compared: 1) a tree structure created with ACO; 2) a tree structure created with ACO whose parameters are optimized with PSO; 3) a tree structure created with ACO whose parameters are optimized with EPSO. The comparison of these three models is used to propose an efficient methodology.

IV. ANT COLONY OPTIMIZATION (ACO) FOR EVOLVING THE ARCHITECTURE OF FNT

ACO is a probabilistic technique for solving computational problems by finding optimal paths. It is a paradigm for designing metaheuristic algorithms for combinatorial optimization problems. The main underlying idea, inspired by the behavior of real ants, is a parallel search over several constructive threads based on local problem data and on a dynamic memory structure containing information on the quality of previously obtained results. In this algorithm, each ant builds and modifies trees according to the quantity of pheromone at each node, and each node memorizes its rate of pheromone. First, a population of programs is generated randomly. The pheromone rate at each node is initialized at 0.5, which means that every terminal and function initially has the same probability of being chosen. The higher the rate of pheromone, the higher the probability of being chosen. Each ant is then evaluated using a predefined objective function, the Mean Square Error (MSE) [7].

$$\mathrm{Fit}(i) = \frac{1}{p} \sum_{j=1}^{p} \left(\mathrm{At}_j - \mathrm{Ex}_j\right)^2 \tag{1}$$

where p is the total number of samples, $\mathrm{At}_j$ and $\mathrm{Ex}_j$ are the actual and expected outputs of the j-th sample, and Fit(i) denotes the fitness value of the i-th ant.
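The two ingredients of the ant search described so far, pheromone-biased choice of node instructions with all rates starting at 0.5, and the MSE fitness of Eq. (1), can be sketched in Python as follows. Reading "the higher the rate of pheromone, the higher the probability to be chosen" as roulette-wheel (pheromone-proportional) selection is our assumption, as are the one-dictionary-per-node layout and the instruction labels such as `"+2"` and `"x1"`.

```python
import random
from typing import Dict, Sequence


def init_pheromone(instructions: Sequence[str], value: float = 0.5) -> Dict[str, float]:
    """Pheromone rate of every instruction at one tree node; equal rates mean equal odds."""
    return {ins: value for ins in instructions}


def choose_instruction(pheromone: Dict[str, float], rng: random.Random) -> str:
    """Roulette-wheel choice: P(instruction) = its rate / sum of all rates at the node."""
    names = list(pheromone)
    return rng.choices(names, weights=[pheromone[n] for n in names], k=1)[0]


def mse_fitness(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Eq. (1): Fit(i) = (1/p) * sum over j of (At_j - Ex_j)^2 for p samples."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("need two equally long, non-empty output sequences")
    p = len(actual)
    return sum((at - ex) ** 2 for at, ex in zip(actual, expected)) / p
```

Note that Fit(i) is an error, so a lower value means a better ant; with every rate at 0.5, each instruction at a node starts with the same chance of being picked.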

The pheromone is updated by two mechanisms:

1. Trail evaporation: evaporation decreases the rate of pheromone for every instruction on every node, in order to avoid unlimited accumulation of trails, according to

$$P_g = (1 - \alpha)\, P_{g-1} \tag{2}$$

where $P_g$ denotes the pheromone value at generation g and $\alpha$ is a constant ($\alpha = 0.15$).

2. Daemon actions: for each tree, the components of the tree are reinforced according to the fitness of the tree:

$$P_{i,s_i} = P_{i,s_i} + \frac{\alpha}{\mathrm{Fit}(s)} \tag{3}$$

where s is a solution (tree), Fit(s) is its fitness, $s_i$ is the function or terminal at node i in this individual, $\alpha$ is here a separate constant ($\alpha = 0.1$), and $P_{i,s_i}$ is the value of the pheromone for the instruction $s_i$ at node i [7].

A brief description of the AP (ant programming) algorithm is as follows: (1) every component of the pheromone tree is set to an average value; (2) trees are generated randomly based on the pheromone; (3) the ants are evaluated; (4) the pheromone is updated; (5) go to step (1) unless some criterion is satisfied [7].

V. PARAMETER OPTIMIZATION WITH PSO

PSO [12] is in principle a multi-agent parallel search technique. It does not require any gradient information about the function to be optimized and uses only primitive mathematical operators. Particles are conceptual entities that fly through the multi-dimensional search space. PSO was inspired by the social behavior of a bird flock or fish school. PSO [13] conducts searches using a population of particles, which correspond to individuals; the birds in a flock are symbolically represented as particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location represents a potential solution to the problem in the multi-dimensional problem space, and a different solution is generated when a particle moves to a new location. The PSO model consists of a swarm of particles initialized with a population of random positions. They move iteratively through the d-dimensional problem space to search for new solutions, where the fitness f (Eq. (1)) is used as the quality measure. Each particle has a position represented by a position vector $x_i$ (i is the index of the particle) and a velocity represented by a velocity vector $v_i$. Each particle keeps track of its own best position so far, which is associated with the best fitness it has achieved, in a vector $p_i$. The best position obtained so far among all particles in the population is tracked as $p_g$. Thus each particle i maintains its current position $x_i$ and its current velocity $v_i$, which must be bounded by the parameters $v_{\min}$ and $v_{\max}$. At each time step t, using the individual best position $p_i(t)$ and the global best position $p_g(t)$, the new velocity of particle i is computed by [1]

$$v_i(t+1) = w\, v_i(t) + c_1 \varphi_1 \left(p_i(t) - x_i(t)\right) + c_2 \varphi_2 \left(p_g(t) - x_i(t)\right) \tag{4}$$

where $w$ is the inertia weight, whose range is [0.4, 0.9], and $c_1$ and $c_2$ are positive constants, the learning factors called the cognitive parameter and the social parameter, respectively. Proper fine-tuning of these parameters may result in faster convergence and alleviation of local minima. The default values $c_1 = c_2 = 2$ are usually used, although $c_1 = c_2 = 1.49$ can give better results. $\varphi_1$ and $\varphi_2$ are random numbers uniformly distributed in the range [0, 1]. At each iteration t, Eq. (4) determines the new velocity from the previous one. The new position is then the sum of the previous position and the new velocity:

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{5}$$
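A minimal gbest PSO following Eqs. (4) and (5) is sketched below in Python. Several details are our assumptions rather than the paper's: a fixed $w = 0.7$ inside the stated [0.4, 0.9] range, $c_1 = c_2 = 1.49$ taken from the text, a symmetric velocity clamp $v_{\min} = -v_{\max}$, uniform initialization within illustrative bounds, and an asynchronous update of $p_g$. The function name `pso_minimize` and its arguments are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def pso_minimize(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,          # inertia weight, inside the [0.4, 0.9] range
    c1: float = 1.49,        # cognitive parameter
    c2: float = 1.49,        # social parameter
    v_max: float = 0.5,      # velocity clamp; v_min = -v_max (assumption)
    bounds: Tuple[float, float] = (0.0, 1.0),  # used only for initialization
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize `fitness` with gbest PSO using Eq. (4) and Eq. (5)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                    # personal best positions p_i
    p_fit = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_fit[i])
    g_pos, g_fit = p[g][:], p_fit[g]           # global best position p_g

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                phi1, phi2 = rng.random(), rng.random()  # uniform in [0, 1)
                vel = (w * v[i][d]
                       + c1 * phi1 * (p[i][d] - x[i][d])
                       + c2 * phi2 * (g_pos[d] - x[i][d]))   # Eq. (4)
                v[i][d] = max(-v_max, min(v_max, vel))
                x[i][d] += v[i][d]                           # Eq. (5)
            f = fitness(x[i])
            if f < p_fit[i]:                    # lower fitness (e.g. MSE) is better
                p[i], p_fit[i] = x[i][:], f
                if f < g_fit:                   # asynchronous gbest update
                    g_pos, g_fit = x[i][:], f
    return g_pos, g_fit
```

In an FNT run, `fitness` would decode the particle into the fixed tree's weights and $(a, b)$ pairs, evaluate the tree on the training samples and return the MSE of Eq. (1). Updating $p_g$ as soon as any particle improves, rather than once per sweep, is one common design choice; both variants apply Eqs. (4) and (5) unchanged.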

Various methods are used to identify which particles influence an individual. Two basic approaches to PSO exist, based on the interpretation of the neighborhood of particles: (1) the global best (gbest) version of PSO, where the neighborhood of each particle is the entire swarm, so that the social component draws particles toward the best particle in the swarm; and (2) the local best (lbest) PSO model, where particles have information only about their own best and their nearest array neighbors' best (lbest), rather than that of the entire group. The gbest model converges quickly but has the weakness of being trapped in local optima. The gbest model is strongly recommended for unimodal objective functions [1]. PSO is executed by repeated application of Equations (4) and (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations. The PSO algorithm works as follows:
1) An initial population is generated randomly. The learning parameters c1 and c2 are assigned in advance.
2) The objective function value of each particle is calculated.
3) The search point is modified: the current search point of each particle is changed using Equations (4) and (5).
4) If the maximum number of iterations is reached, stop; otherwise go to step 2).

VI. EXPONENTIAL PARTICLE SWARM OPTIMIZATION (EPSO)

In linear PSO, the particles tend to fly towards the gbest position found so far by all particles. This social cooperation helps them discover fairly good solutions rapidly. However, it is exactly this instant social collaboration that makes particles stagnate on local optima and fail to converge to the global optimum. Once a new gbest is found, it spreads over the particles immediately, so all particles are attracted to this position in the subsequent iterations until another, better solution is found. Therefore, the stagnation of PSO is caused by the overall speed diffusion of the newly found gbest [10]. An improvement to the original PSO is that w is not kept constant during execution; rather, starting from a maximal value, it is linearly decremented as the number of iterations increases, down to a minimal value [4]. It is initially set to 0.9 and decreases to 0.4 over the first 1500 iterations; if there are more than 1500 iterations, it remains at 0.4 over the remainder of the run, according to

$$W = (w - 0.4)\,\frac{\mathrm{MAXITER} - \mathrm{ITERATION}}{\mathrm{MAXITER}} + 0.4 \tag{6}$$

where MAXITER is the maximum number of iterations and ITERATION is the current iteration number. EPSO has a great impact on global and local exploration: it is intended to make the search behavior fast and intelligent by varying the inertia weight exponentially, which keeps the particles from stagnating in local optima, as given by

$$W = (w - 0.4)\, e^{\left(\frac{\mathrm{MAXITER} - \mathrm{ITERATION}}{\mathrm{MAXITER}}\right)^{-1}} + 0.4 \tag{7}$$

By using Equation (7), the particles move faster and remain more distant from each other.

A. General Learning Procedure

The general learning procedure for constructing the FNT model can be described as follows:
1) Create an initial population randomly (FNT trees and their corresponding parameters).
2) Structure optimization is achieved by the Ant Colony Optimization algorithm.
3) If a better structure is found, go to step 4); otherwise go to step 2).
4) Parameter optimization is achieved by the EPSO algorithm. In this stage, the architecture of the FNT model is fixed: it is the best tree developed by the end of the run of the structure search. The parameters (weights and flexible activation function parameters) encoded in the best tree form a particle.
5) If the maximum number of local search steps is reached, or no better parameter vector is found for a significantly long time, go to step 6); otherwise go to step 4).
6) If a satisfactory solution is found, its corresponding informative genes are extracted and the algorithm stops; otherwise go to step 2).

VII. RESULTS

As a preliminary study, the Wisconsin Prognostic Breast Cancer (WPBC) [18] data set, which has 34 attributes (32 real-valued) and 198 instances, was used, and the methodology described above was applied to it. Half of the observations were selected for training and the remaining samples for testing the performance of the different models. All the models were trained and tested with the same set of data. The instruction set used to create an optimal FNT classifier is $S = F \cup T = \{+_2, \ldots, +_N\} \cup \{x_0, x_1, \ldots, x_{31}\}$, where $x_i$ ($i = 0, 1, \ldots, 31$) denotes the 32 input features. To obtain an optimal tree structure, an ACO algorithm is applied. In this experiment, the inputs are the number of ants and the number of iterations; each ant runs for the specified number of iterations and constructs a neural tree whose objective function is the MSE. The ant that gives the lowest MSE is taken to be the best tree, whose parameters are then optimized with PSO and EPSO. The tree that produces the lowest error is the optimized neural tree, and it extracts the informative genes. On the breast cancer data set, the results indicate that the tree structure built with ACO and parameter-optimized with EPSO achieves better accuracy than the other models. The main purpose is to compare the quality of the models, where quality is measured by the error rate, the mean absolute percentage error and the accuracy. The ACO-EPSO model has the smallest error rate of the models compared. All three models were run for the same number of iterations, and the results show that ACO-EPSO succeeded in reaching the optimal minimum in all runs, giving better minimum points than the other models. This is depicted in the following figures: in Figures 1 and 2, the error rate and the mean absolute percentage error of the ACO-EPSO model are lower than those of ACO and ACO-PSO.

In Figure 3, the accuracy of the ACO-EPSO model is high, which suggests that the proposed model is efficient and could be used to obtain faster convergence and a lower error rate.

Fig. 1: Comparison of models in terms of error rate

Fig. 2: Comparison of models in terms of mean absolute percentage error

Fig. 3: Comparison of models in terms of accuracy

VIII. CONCLUSION

A new forecasting model based on a neural tree representation evolved by ACO, with its parameters optimized by EPSO, was proposed in this paper; the combined ACO-EPSO approach operates on the parameters encoded in the neural tree. Other tree-structure-based evolutionary algorithms and parameter optimization algorithms could be employed to accomplish the same task, but the proposed model demonstrates feasibility and effectiveness. The proposed model helps find optimal solutions with faster convergence. EPSO converges more slowly, but to a low error, whereas the other methods converge faster to a larger error. The proposed method increases the possibility of finding optimal solutions as it decreases the error rate.

REFERENCES

[1] Swagatam Das, Ajith Abraham, Amit Konar, "Swarm Intelligence Algorithms in Bioinformatics", Studies in Computational Intelligence (SCI) 94, pp. 113-147, Springer-Verlag Berlin Heidelberg, 2008.
[2] Per Broberg, "Statistical methods for ranking differentially expressed genes", Molecular Sciences, AstraZeneca Research and Development Lund, S-221 87 Lund, Sweden, 7 May 2003.
[3] Hong Chai and Carlotta Domeniconi, "An Evaluation of Gene Selection Methods for Multiclass Microarray Data Classification", Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics.
[4] J. Jager, R. Sengupta, W. L. Ruzzo, "Improved gene selection for classifications of microarrays", Pacific Symposium on Biocomputing 8, pp. 53-64, 2003.
[5] Yuehui Chen, Ajith Abraham and Bo Yang, "Feature Selection & Classification using Flexible Neural Tree", Elsevier Science, 15 January 2006.
[6] Y. Chen, B. Yang, J. Dong, "Nonlinear systems modelling via optimal design of neural trees", International Journal of Neural Systems, 14, pp. 125-138, 2004.
[7] Yuehui Chen, Bo Yang and Jiwen Dong, "Evolving Flexible Neural Networks using Ant Programming and PSO Algorithm", Springer-Verlag Berlin Heidelberg, 2004.
[8] Z. Li-ping, Y. Huan-jun, H. Shang-xu, "Optimal Choice of Parameters for Particle Swarm Optimization", Journal of Zhejiang University Science, Vol. 6(A)6, pp. 528-534, 2004.
[9] T. Sousa, A. Silva, A. Neves, "Particle Swarm Based Data Mining Algorithms for Classification Tasks", Parallel Computing 30, pp. 767-783, 2004.
[10] N. El-Desouky, N. Ghali, M. Zaki, "A New Approach to Weight Variation in Swarm Optimization", Proceedings of Al-Azhar Engineering, the 9th International Conference, April 12-14, 2007.
[11] Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N. and Lamiaa Bakrawi, "Exponential Particle Swarm Optimization Approach for Improving Data Clustering", World Academy of Science, Engineering & Technology 42, 2008.
[12] J. Kennedy, R. Eberhart and Y. Shi, "Swarm Intelligence", Morgan Kaufmann Academic Press, 2001.
[13] J. Kennedy, R. Eberhart, "Particle Swarm Optimization", Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 4, pp. 1942-1948, 1995.
[14] Y. Shi, R. Eberhart, "Parameter Selection in Particle Swarm Optimization", Proceedings of the 7th International Conference on Evolutionary Programming VII, pp. 591-600, 1998.
[15] Vasantha Kalyani David, Sundaramoorthy Rajasekaran, "Pattern Recognition using Neural and Functional Networks", Springer, 2009.
[16] Yuehui Chen, Ajith Abraham and Lizhi Peng, "Gene Expression Profiling using Flexible Neural Trees", Springer-Verlag Berlin Heidelberg, 2006.
[17] V. Sarvanan, R. Mallika, "An Effective Classification Model for Cancer Diagnosis using Micro Array Gene Expression Data", IEEE Xplore.
[18] Wisconsin Prognostic Breast Cancer (WPBC) data set, available on the UW CS ftp server: ftp ftp.cs.wisc.edu, cd math-prog/cpo-dataset/machine-learn/WPBC/
[19] Yuehui Chen, Ajith Abraham and Yong Zhang, "Ensemble of Flexible Neural Trees for Breast Cancer Detection", 2003.
