
2011 Seventh International Conference on Natural Computation

A Study of Hybrid Parallel Genetic Algorithm Model
WANG Zhu-rong, JU Tao, CUI Du-wu, HEI Xin-hong
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China
Abstract: Genetic algorithms suffer from a low evolution rate and have difficulty meeting real-time requirements when handling large-scale combinatorial optimization problems. In this paper, we propose a coarse-grained-master-slave hybrid parallel genetic algorithm model based on multi-core cluster systems. The model integrates the message-passing model and the shared-memory model. We use the message-passing model (MPI) among nodes, which corresponds to the coarse-grained Parallel Genetic Algorithm (PGA), and the shared-memory model (OpenMP) within each node, which corresponds to the master-slave PGA. The model can therefore effectively combine the high parallel computing ability of a multi-core cluster system with the inherent parallelism of PGA. On the basis of the proposed model, we implemented a hybrid parallel genetic algorithm (HPGA) based on two-layer parallelism of processes and threads, and used it to solve several benchmark functions. Theoretical analysis and experimental results show that the proposed model is superior in versatility and convenience for parallel genetic algorithm design.

Keywords: Genetic Algorithm; Parallel Programming Model; Multi-core cluster system; OpenMP; MPI

I. INTRODUCTION

Genetic Algorithms (GA), a kind of global optimization algorithm, have many advantages, such as independence from the problem domain and strong robustness to the type of problem, so they are widely applied in many disciplines. With the rapid development of science and technology, optimization problems are growing in scale and their search spaces are growing in complexity, which places higher requirements on the solution quality and processing speed of traditional GA. Solving such complex optimization problems with traditional GA requires more individuals and a large number of calculations, and its slow evolution does not meet real-time requirements, so the traditional GA has proved inadequate [1][2]. The inherent parallelism of GA makes it well suited to implementation on large-scale parallel machines. If Parallel Genetic Algorithms (PGA) are realized by effectively combining the inherent parallelism of PGA with the high parallel computing ability of parallel machines, we can overcome the deficiencies of traditional GA, greatly increase its solution quality, and speed up its convergence.

In recent years, PGA research and applications have received widespread attention. Many researchers have dedicated themselves to designing various parallel strategies and have applied PGA to a variety of parallel computing problems. Because traditional parallel machines are very expensive, general users cannot afford them, which has undoubtedly kept ordinary users from using parallel genetic algorithms to solve large-scale combinatorial optimization problems. As technology advances and prices fall, multi-core CPUs are increasingly popular. A cluster system built from high-performance multi-core PCs features low investment risk, structural flexibility, scalability, easy implementation, and a high performance-to-cost ratio, so it can readily deliver high computing performance. How to combine GA well with existing parallel computer systems, and how to design an effective parallel genetic algorithm and realize the corresponding system, is of positive significance for both the theory and the application of GA. The realization of parallel genetic algorithm models has thus become an important research direction.

II. RELATED WORK

Research on PGA mainly covers the following aspects: population size, encoding, and parameter settings that affect the efficiency of PGA [3][4]; implementation models of PGA [5]; theoretical research on hybrid parallel genetic algorithms [6][7]; and applications of PGA [8][9]. The research listed above mainly focuses on the theory of the inherent characteristics of PGA, and only combines parallel genetic algorithms with parallel programming techniques. Can we fundamentally change the previous implementations, which use serial methods to simulate parallel genetic algorithms, and realize a real PGA? Arunadevi et al. [9] studied the hybrid MPI+OpenMP programming model on a cluster of multi-core nodes and combined the advantages of the two parallel programming models to obtain better performance. Xiaoping et al. [10] realized a master-slave parallel genetic algorithm framework on the basis of MPI, but the master-slave parallel genetic algorithm cannot fully exploit the computing performance of each cluster node because of its communication restrictions. It therefore cannot make good use of the inherent parallelism of PGA and the high-speed parallel computing performance of the cluster system, and cannot deal well with large-scale complex combinatorial optimization problems. What is needed is a model that gives full play to the high-speed parallel computing performance of multi-core PCs.

978-1-4244-9953-3/11/$26.00 ©2011 IEEE


To give full play as well to the inherent parallelism of hybrid parallel genetic algorithms, we propose a hybrid parallel genetic algorithm realization model on the multi-core PC cluster. By combining the physical topology of the current multi-core PC cluster with the logical structure of the “coarse-grained-master-slave” hybrid parallel genetic algorithm, and by using hybrid programming of MPI and OpenMP, we achieved both process-level and thread-level parallelism, and realized the coarse-grained PGA among nodes and the master-slave PGA within each node. The proposed model improves the solution quality and convergence speed of PGA, and presents an effective, low-cost method for a general user to handle complex combinatorial optimization problems with PGA.

III. DESIGN AND IMPLEMENTATION OF HPGA MODEL

A. PGA Model Analysis

Parallel genetic algorithms seek to combine the high-speed parallel computing of parallel computers with the inherent parallelism of GA, and can thus accelerate the convergence speed and improve the solution quality of genetic algorithms. The basic idea of PGA is to parallelize traditional genetic algorithms through multi-group parallel evolution and a transport operator that exchanges information among populations. The evolution of the populations is assigned to different computing nodes of a parallel computer system to realize distributed evolution, and the transport operator is used to exchange excellent genes, maintain and enrich the diversity of the population, reduce the likelihood of premature convergence, speed up the search process, and drive the evolution toward the global optimum. The population can be assigned to the computing nodes by different methods, which results in different types of parallel genetic algorithm models.

Parallel genetic algorithms currently have the following four models: master-slave PGA (MPGA), coarse-grained PGA (CPGA), fine-grained PGA (FPGA), and hybrid PGA (HPGA) [1]. The hybrid genetic algorithm is a multi-layer parallel model that combines the first three basic parallel genetic algorithm models. It can overcome the barrier of local convergence and complete the search of complex problems effectively and rapidly. HPGA is generally designed as a hierarchical structure: the upper layer commonly uses the coarse-grained model, and the lower layer can use any of the other models. There are currently three models of HPGA: the “coarse-grained-fine-grained” model, the “coarse-grained-coarse-grained” model, and the “coarse-grained-master-slave” model [2]. Most practical applications use the coarse-grained-master-slave model.

B. HPGA Model Design

Existing realizations of HPGA are either simulated by a serial method on a single computer or implemented on a distributed system. However, these methods do not fully utilize the inherent parallelism of PGA because of communication delay, load balancing problems, synchronization that is not easy to control, and poor real-time behavior and scalability. PGA has also been implemented directly on parallel computers, but the computing cost is too high. Because the physical structure of a multi-core PC cluster system coincides with the logical structure of the coarse-grained-master-slave hybrid parallel genetic algorithm, the algorithm can be mapped well onto a multi-core PC cluster, and the combination can give full play to the parallelism of the “coarse-grained-master-slave” hybrid PGA. Figure 1 and Figure 2 illustrate the two topology structures.

Figure 1. Physical topology of multi-core cluster

Figure 2. HPGA model

In Figure 1, p1, p2, p3, and p4 represent four multi-core computer nodes of the PC cluster, and c1 and c2 represent the processor cores of each multi-core computing node (the cluster used in this paper is composed of four dual-core nodes). In Figure 2, p1, p2, p3, and p4 represent the sub-populations of the upper layer of the HPGA structure, and each sub-population is divided into several smaller populations. The evolution within a sub-population is implemented according to the master-slave PGA, and the evolution among the sub-populations is implemented according to the coarse-grained PGA, so that excellent genetic information spreads rapidly among the sub-populations, which further accelerates the convergence speed and improves the solution quality of the algorithm.

Based on the analysis of the two structures above, we can map the coarse-grained evolution among the sub-populations of HPGA shown in Figure 2 to the computing nodes of the multi-core PC cluster for parallel execution. Meanwhile, we map the master-slave evolution within each sub-population of HPGA to the multiple processing cores of each computing node. This realizes two-layer parallelism among computing nodes and between the processing cores within each computing node, and thus two-layer parallelism among and within populations.
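The two-layer structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the loop over islands stands in for the MPI processes (here they run one after another), a thread pool stands in for the OpenMP threads of a node, and the objective `sphere`, the ring migration, the genetic operators, and all parameter values are hypothetical choices made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NODES = 4            # sub-populations p1..p4, one per cluster node
CORES_PER_NODE = 2   # worker threads per node (dual-core nodes)


def sphere(ind):
    """Placeholder objective (assumption): minimum 0 at the origin."""
    return sum(v * v for v in ind)


def evaluate(pop, pool):
    """Master-slave step: the master hands individuals to worker threads."""
    return list(pool.map(sphere, pop))


def next_generation(pop, fits, rng):
    """Master-side operators (assumed): tournament, blend crossover, mutation."""
    def pick():
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if fits[a] < fits[b] else pop[b]

    elite = pop[fits.index(min(fits))]
    children = [list(elite)]                 # keep the best individual unchanged
    while len(children) < len(pop):
        p, q = pick(), pick()
        w = rng.random()
        child = [w * u + (1 - w) * v for u, v in zip(p, q)]
        children.append([g + rng.gauss(0, 0.1) for g in child])
    return children


def migrate(islands, fits):
    """Coarse-grained step: each island receives its ring neighbour's best."""
    best = [isl[f.index(min(f))] for isl, f in zip(islands, fits)]
    for i, (isl, f) in enumerate(zip(islands, fits)):
        isl[f.index(max(f))] = list(best[(i - 1) % len(islands)])


def run(generations=40, size=20, interval=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-20, 20) for _ in range(2)] for _ in range(size)]
               for _ in range(NODES)]
    history = []                             # global best fitness per generation
    with ThreadPoolExecutor(max_workers=CORES_PER_NODE) as pool:
        for gen in range(generations):
            fits = [evaluate(isl, pool) for isl in islands]
            history.append(min(min(f) for f in fits))
            if gen % interval == interval - 1:   # migrate every `interval` generations
                migrate(islands, fits)
                fits = [evaluate(isl, pool) for isl in islands]
            islands = [next_generation(isl, f, rng)
                       for isl, f in zip(islands, fits)]
    return history, islands
```

Calling `run()` returns the best fitness recorded in each generation together with the final sub-populations. In CPython the threads illustrate the master-slave structure rather than a real speedup; on a cluster each island would live in its own MPI process and the fitness loop would be an OpenMP parallel loop.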

C. HPGA Model Implementation

The implementation of the HPGA model on the multi-core PC cluster is divided into a physical layer and a logic layer. The physical layer is responsible for constructing the specific parallel programming model, and the logic layer for implementing the HPGA.

(1) Physical layer implementation

In order to take full advantage of the hierarchical structure of the multi-core cluster, we combine the message-passing and shared-memory parallel programming models to achieve MPI and OpenMP hybrid programming. The physical layer is divided into two levels: the upper level refers to the parallelism between nodes, and the lower level refers to the parallelism within a node. The tasks of the computing nodes are handled in parallel through the message-passing model (MPI); MPICH2 is used as the specific message-passing implementation. Within a node, the shared-memory parallel programming model (OpenMP) is used. The OpenMP programming model provides a group of platform-independent pragmas, run-time library functions, and environment variables that guide the compiler to execute operations in parallel according to the parallelism of the program. Application developers therefore do not need to deal explicitly with thread creation, synchronization, load balancing, destruction, and other technical details; these are handled by the compiler and the OpenMP thread library. Developers only need to consider problems such as which code should be executed in a multi-threaded manner and how to restructure the algorithm to obtain better performance on a multi-core processor.

The main purpose of combining the two parallel programming models is to make full use of the resources of the multi-core cluster. The hybrid parallel programming model can greatly reduce the number of communicating processes, because MPI is only used for coarse-grained communication among the processes. Furthermore, OpenMP can quickly share data through shared memory, and therefore handles the interaction between processor cores within a multi-core node well. With the hybrid model, multiple lightweight threads can be generated with OpenMP within an MPI process; the main thread or an assigned thread then performs communication while the other threads perform calculations. This overlaps communication and computation, reduces the communication overhead to a large extent, and thereby enhances the performance of parallel programs.

To achieve two-level parallelism of processes and threads, in the physical layer implementation we let each node in the multi-core cluster correspond to a process, and the process is divided into multiple threads according to the number of processor cores, so that the model can effectively achieve process-level and thread-level parallelism.

a) Process-level parallelism implementation: By way of task decomposition, the problem is divided into small parts that communicate infrequently, and each part is assigned to a multi-core computing node (that is, a process). The parts are handled rationally and uniformly to ensure load balancing. At each constant time interval, or when certain conditions are met, information is exchanged between nodes so that the task is handled effectively; communication between nodes is completed by calling the basic functions of MPI.

b) Thread-level parallelism implementation: After the process-level task decomposition is completed, the task of each node is further divided into smaller tasks by data decomposition; the decomposed tasks are then assigned to the different computing cores of the multi-core processor, which generates multiple threads for parallel execution. This shortens the response time of the task and largely improves performance. Thread-level parallelism is implemented by calling the pragmas and run-time library functions provided by OpenMP in order to parallelize the loop part within a node. Before parallel implementation, it must be ensured that there is no data dependency in the parallel execution. For each process, only the loop part is computed by multiple threads in parallel; the other code segments are executed by a single MPI process. Each thread decides which part of the loop to operate on according to the number of threads and its own thread ID in the process. At the same time, the load-balancing strategy of OpenMP is used to balance the load of every processor as far as possible, so that all processor cores of the multi-core computer are utilized effectively.

(2) Logic layer implementation

The logic layer is used to implement the hybrid parallel genetic algorithm. We adopt the “coarse-grained-master-slave” HPGA model in this paper, which includes two layers. The upper layer consists of the coarse-grained PGA; it maps to the multi-core nodes of the physical layer, each of which independently executes the GA operations. The lower layer consists of the master-slave PGA; it maps to the inside of each node of the multi-core cluster to implement the specific GA. When implementing the HPGA, we first divide the whole population into a number of sub-populations according to the number of nodes of the multi-core cluster, and then assign each sub-population to a computing node. Each multi-core computing node independently executes the GA operations of one sub-population, which implements the coarse-grained parallelism between populations. After a certain interval of evolution, individuals migrate between the populations according to a migration strategy, in order to exchange population information and maintain population diversity. Inside each multi-core computing node, we use the master-slave PGA approach.
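The thread-level decomposition in b), where each thread derives its share of the loop from the thread count and its own thread ID, can be sketched as follows. This is a minimal illustration in Python rather than OpenMP; `static_block` and `parallel_fitness` are hypothetical names, and the way the remainder is spread over the first threads is one common convention, not something the text prescribes.

```python
import threading


def static_block(n_items, n_threads, thread_id):
    """Return the half-open index range [start, stop) owned by one thread."""
    base, extra = divmod(n_items, n_threads)
    # The first `extra` threads take one additional item each.
    start = thread_id * base + min(thread_id, extra)
    stop = start + base + (1 if thread_id < extra else 0)
    return start, stop


def parallel_fitness(population, fitness, n_threads=2):
    """Evaluate fitness with one contiguous block of individuals per thread."""
    results = [None] * len(population)

    def worker(tid):
        start, stop = static_block(len(population), n_threads, tid)
        for i in range(start, stop):
            # Blocks are disjoint, so no two threads write the same slot.
            results[i] = fitness(population[i])

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `static_block(10, 4, 1)` gives the second of four threads the indices 3 to 5. Because the blocks do not overlap, the loop body has no data dependency between threads, which is the precondition stated in b).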

The master-slave approach implements the master-slave evolution within the node. In particular, it uses multi-core programming technology to generate multiple parallel execution threads, so that individual fitness computation and the genetic operations can be executed by different threads. The main thread is responsible for the main genetic operations, while the other threads calculate the individual fitness in parallel. The information exchanged between the main thread and the other threads consists of the tasks distributed from the main thread to the other threads and the computing results sent from the other threads back to the main thread.

IV. HPGA MODEL VERIFICATION AND ANALYSIS

A. HPGA Model Verification

On the basis of the programming model above, we designed and implemented a hybrid parallel genetic algorithm (HPGA). Several functions referred to in [14] are solved by using HPGA. The expressions of two of these functions are as follows:

$f_1(x, y) = 100(x^2 - y)^2 + (x - 1)^2, \quad x, y \in [-20, 20]$

$f_2(x, y) = (x^2 + y^2)^{0.25}\left(\sin^2\left(50(x^2 + y^2)^{0.1}\right) + 1\right), \quad x, y \in [-100, 100]$

The main experimental results are as follows.

Figure 3. Evolution process for function f1

Figure 4. Evolution process for function f2

Table I. Optimization result of test functions (algorithm types SGA, HPGA, and ICSOA; columns BS, MBS, and SD for f1 and f2)

Table II. Computation time required in SGA and HPGA to get the same accuracy (in seconds, for F1 and F2 at accuracies E-9 to E-14)

Table III. The speedup between SGA and HPGA to get the same accuracy (for F1 and F2 at accuracies E-9 to E-14; the speedups range from 3.60023 to 3.86867)

The results are averaged over 30 runs of the same experiment for f1 and f2, and consist mainly of the best solution (BS), the mean best solution (MBS), and the standard deviation (SD). In Table I, ICSOA represents the experimental results in [14] under the same experimental conditions. From the experimental results above, we can see that HPGA has better performance.

B. Analysis and Discussion

When implementing the thread-level parallelism of the HPGA model, the OpenMP pragma is invoked to parallelize the FOR loop sections of the program. The iterations of the FOR loop are divided into small iteration blocks and assigned to different threads for parallel execution, so the probability of conflict when several processors simultaneously access the same memory area is significantly reduced. Because the problem size of PGA is fixed before execution, the static scheduling strategy used by default in OpenMP ensures that the task is divided into blocks of equal size, which also guarantees load balancing between threads. We also analyzed the parallel genetic algorithm program for the benchmark functions selected in this paper.
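For reference, the two benchmark functions from the verification subsection can be written as short Python functions. This is a sketch under an assumption: the extracted formulas are damaged, so the reading used here (a Rosenbrock-type f1 on [-20, 20] and a Schaffer-type f2 on [-100, 100], with the sine term squared) is our interpretation, and the names `f1` and `f2` are only labels for the example.

```python
import math


def f1(x, y):
    """Rosenbrock-type function; global minimum 0 at (1, 1)."""
    return 100.0 * (x * x - y) ** 2 + (x - 1.0) ** 2


def f2(x, y):
    """Schaffer-type function; global minimum 0 at (0, 0)."""
    r2 = x * x + y * y
    return r2 ** 0.25 * (math.sin(50.0 * r2 ** 0.1) ** 2 + 1.0)
```

Both functions have a minimum value of 0, so the best-solution values reported for a run can be read directly as the distance from the optimum.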

From this analysis, we can estimate that 10 percent of the program must be executed serially. This serial part includes the initialization of the parallel environment, the broadcast of the parallel genetic algorithm's parameters, the receiving of the optimal migration individuals, the output of results, and so on. If we use only the MPI parallel programming model, there is only process-level parallelism, and each process corresponds to a node. The number of dual-core PCs is 4, so the maximum theoretical speedup given by Amdahl's law is

$\psi_1 = \frac{1}{f + (1 - f)/p} = \frac{1}{0.1 + (1 - 0.1)/4} \approx 3.08$    (1)

where $\psi_1$ is the speedup, $f$ is the proportion of the serial section in the entire algorithm, and $p$ is the number of processors.

If we use the hybrid parallel programming model, each process can generate two parallel threads, and the total number of processor cores is 8. By analyzing each process of the program, we see that 80 percent of the process-level parallel part can be executed in parallel by two threads, so this share runs on all 8 cores while the remaining 20 percent runs on 4. The multi-threaded share includes the genetic operations of selection, crossover, and mutation, and the calculation of the individual fitness. The speedup given by Amdahl's law under the hybrid parallel programming model is

$\psi_2 = \frac{1}{0.1 + (1 - 0.1)(0.2/4 + 0.8/8)} \approx 4.26$    (2)

Comparing the two speedups, we conclude that the hybrid parallel programming model increases the theoretical speedup by 38.31% over the pure MPI model. In the experimental test, the maximum measured speedup is 3.86867, which is 90.8 percent of the ideal theoretical value above; the theoretical value does not consider the communication cost.

The communication cost is the bottleneck for further optimizing parallel genetic algorithms. It is mainly consumed by the migration of selected individuals among the populations. Because communication within a node uses shared memory, its cost can be ignored. Since we introduced the population pool in the algorithm, the communication between different populations is significantly reduced. At the same time, migration is carried out only at a certain interval, which prevents the sub-populations from spreading individual information before they have fully evolved and also reduces the communication cost. With these strategies, the communication cost of the algorithm is greatly reduced.

V. CONCLUSION

In this paper, we combined the physical topology of the multi-core PC cluster with the logical structure of the coarse-grained-master-slave PGA, integrated the MPI and OpenMP parallel programming models, and put forward a specific hybrid parallel genetic algorithm implementation model. We used two-layer parallelism of processes and threads to implement HPGA, and introduced the concept of a “population pool” in the migration operation to reduce communication costs. Our next step is to do further research on the proposed model and the corresponding HPGA, so that HPGA can deal with large-scale complex combinatorial optimization and real-world problems.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60873035) and the Scientific Research Program Fund of the Education Department of Shaanxi Province (No. 2010JK713).

REFERENCES

[1] Zdeněk Konfrst. Parallel genetic algorithms: advances, computing trends, applications and perspectives. Proceedings of the 18th International Parallel and Distributed Processing Symposium, 2004.
[2] Xue Shengjun, Guo Shaoyong. The analysis and research of parallel genetic algorithm. IEEE Xplore, 2008: 1-4.
[3] THOMASZEWSKI B, PABST S, BLOCHINGER W. Parallel techniques for physically based simulation on multi-core processor architectures. Computers and Graphics, 2008(32): 25-40.
[4] Cantú-Paz E. A Survey of Parallel Genetic Algorithms. Urbana-Champaign: Illinois Genetic Algorithms Laboratory, 1996: 897-902.
[5] GUAN Yu, XU Bao-wen. Parallel Genetic Algorithms with Schema Migration. Chinese Journal of Computer, 2003, 26(3): 294-301.
[6] LAI Xin-sheng, ZHANG Ming-yi. Parallel genetic algorithms with migration scheme based on penetration theory. Chinese Journal of Computer, 2005, 28(7): 146-152.
[7] WU Hao-yang, Chang Bing-guo. A multi-group parallel genetic algorithm based on simulated annealing method. JOURNAL OF SOFTWARE, 2000, 11(3): 416-420.
[8] Rabenseifner R, Hager G, Jost G. Hybrid MPI/OpenMP parallel programming on cluster of multi-core SMP nodes. The 17th EUROMICRO International Conference on Parallel, Distributed and Network-based Processing, 2009: 427-436.
[9] Meghanathan N, Skelton G.W. Intelligent transport route planning using parallel genetic algorithms and MPI in high performance computing cluster. The 15th International Conference Advanced Computing and Communications, 2007: 578-583.
[10] LIU Xiao-ping, AN Zhu-lin. Master-Slave parallel genetic algorithm framework on MPI. JOURNAL OF SYSTEM SIMULATION, 2004, 16(9): 38-41. (in Chinese)
[11] Salman Yussof, Rina Azlin Razali, Ong Hang See, Marina Md Din. A coarse-grained parallel genetic algorithm with migration for shortest path routing problem. The 11th IEEE International Conference on High Performance Computing and Communications, 2009: 615-621.
[12] Dong Li, de Supinski B.R., Nikolopoulos D.S. Hybrid MPI/OpenMP power-aware computing. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010: 1-12.
[13] Ren Zi-wu, San Ye. Improvement of Real-Valued Genetic Algorithm and Performance Study. ACTA ELECTRONICA SINICA, 2007, 35(2): 269-274. (in Chinese)
[14] Xu Guang-hua, Liu Dan, Liang Lin. Immune clonal selection optimization method with combining mutation strategies. Journal of Xian Jiao Tong University, 2007, 19(2): 177-181.