
A

SEMINAR REPORT
ON

GENETIC ALGORITHM

Submitted to: Er. Richa Dutta, Lecturer

Submitted by: Neh Yadav (4108030), Kamini (4108021), CSE 8th Sem

YAMUNA INSTITUTE OF ENGINEERING AND TECHNOLOGY GADHOLI


Abstract
Genetic algorithms provide heuristic solutions for combinatorial-optimization problems and have found applications in many areas with outstanding success. A genetic algorithm (GA) is an optimization technique for searching very large spaces that models the role of the genetic material in living organisms; it is used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics and are a part of evolutionary computing, a rapidly growing area of artificial intelligence. They use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. A small population of individual exemplars can effectively search a large space because the individuals contain schemata, useful substructures that can potentially be combined to make fitter individuals. Formal studies of competing schemata show that the best policy for replicating them is to increase them exponentially according to their relative fitness, which turns out to be the policy used by genetic algorithms. Fitness is determined by examining a large number of individual fitness cases. This process can be very efficient if the fitness cases also evolve by their own GAs.


1. Introduction

1.1 A Biology Lesson
Every organism has a set of rules, a blueprint so to speak, describing how that organism is built up from the tiny building blocks of life. These rules are encoded in the genes of an organism, which in turn are connected together into long strings called chromosomes. Each gene represents a specific trait of the organism, like eye colour or hair colour, and has several different settings. For example, the settings for a hair colour gene may be blonde, black or auburn. These genes and their settings are usually referred to as an organism's genotype. The physical expression of the genotype, the organism itself, is called the phenotype.

When two organisms mate they share their genes. The resultant offspring may end up having half the genes from one parent and half from the other. This process is called recombination. Very occasionally a gene may be mutated. Normally this mutated gene will not affect the development of the phenotype, but very occasionally it will be expressed in the organism as a completely new trait.

1.2 About Genetic Algorithms
Genetic Algorithms are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. The basic concept of Genetic Algorithms is designed to simulate the processes in natural systems necessary for evolution, specifically those that follow the principle of survival of the fittest first laid down by Charles Darwin. As such they represent an intelligent exploitation of a random search within a defined search space to solve a problem.

First pioneered by John Holland in the 60s, Genetic Algorithms have been widely studied, experimented with and applied in many fields of engineering. Not only do Genetic Algorithms provide an alternative method of solving problems, they can also outperform traditional methods on many of them. Many real world problems involve finding optimal parameters, which might prove difficult for traditional methods but ideal for Genetic Algorithms. However, because of their outstanding performance in optimisation, Genetic Algorithms have been wrongly regarded as merely a function optimiser. In fact, there are many ways to view Genetic Algorithms. Perhaps most users come to Genetic Algorithms looking for a problem solver, but this is a restrictive view.

1.3 Brief Overview
• Genetic algorithms are inspired by Darwin's theory about evolution. The solution to a problem solved by genetic algorithms is evolved.
• The algorithm is started with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population. This is motivated by the hope that the new population will be better than the old one. Solutions which are selected to form new solutions (offspring) are selected according to their fitness: the more suitable they are, the more chances they have to reproduce.
• This is repeated until some condition (for example the number of populations or the improvement of the best solution) is satisfied.

Example
Problem solving can often be expressed as looking for the extreme of a function. This is exactly what the problem shown here is. Some function is given and the Genetic Algorithm tries to find the minimum of the function.

Fig 1.1 Graph represents some search space; the goal is to travel from the gray cell to the green cell in the shortest number of steps.
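To make the "extreme of a function" idea concrete, a minimization problem can be turned into a fitness measure by negating the function, so that a lower function value means a fitter candidate. The function `g` and the sample candidates here are illustrative assumptions, not taken from the report.

```python
def g(x):
    """Hypothetical function to minimize; its minimum is at x = 3."""
    return (x - 3) ** 2 + 1


def fitness(x):
    """Higher is better, so negate the value being minimized."""
    return -g(x)


# A tiny "population" of candidate solutions (arbitrary sample values).
candidates = [-2.0, 0.5, 2.5, 4.0, 7.0]

# The fittest candidate is the one with the smallest g(x).
best = max(candidates, key=fitness)
```

A genetic algorithm would not stop at picking the best of a fixed list; it would use these fitness values to decide which candidates reproduce.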

2. Genetic Algorithm

1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete:
   • [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance to be selected). There are several methods for selecting the chromosomes.
   • [Crossover] In genetic algorithms, crossover is a genetic operator used to vary the programming of a chromosome or chromosomes from one generation to the next. It is analogous to reproduction and biological crossover, upon which genetic algorithms are based. Crossover is the process of taking more than one parent solution and producing a child solution from them. With a crossover probability, cross over the parents to form new offspring (children). If no crossover was performed, the offspring is an exact copy of the parents.

     Fig 2.1 Crossover between parent 1 and parent 2.

     As Fig 2.1 shows, the children take one section of the chromosome from each parent. The point at which the chromosome is broken depends on the randomly selected crossover point. This particular method is called single point crossover because only one crossover point exists.
   • [Mutation] In genetic algorithms, mutation is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. It is analogous to biological mutation. After selection and crossover, you have a new population full of individuals. Some are directly copied, and others are produced by crossover. In order to ensure that the individuals are not all exactly the same, you allow for a small chance of mutation. With a mutation probability, mutate the new offspring at each locus (position in the chromosome). Mutation alters one or more gene values in a chromosome from its initial state, so the solution may change entirely from the previous solution. Hence Genetic Algorithms can come to a better solution by using mutation.

     Fig 2.2
   • [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
6. [Loop] Go to step 2.
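The outline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: chromosomes are bit lists, the fitness function simply counts 1 bits, selection is a roulette wheel, and every parameter value (`n`, `p_cross`, `p_mut` and so on) is an arbitrary choice made for the example.

```python
import random


def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1 bits in the chromosome."""
    return sum(chromosome)


def select(population, rng):
    """[Selection] Roulette wheel: the chance of being picked grows with fitness."""
    weights = [fitness(c) + 1 for c in population]  # +1 keeps every weight positive
    return rng.choices(population, weights=weights, k=2)


def crossover(p1, p2, p_cross, rng):
    """[Crossover] Single point crossover; the parents are copied if none happens."""
    if rng.random() < p_cross:
        point = rng.randint(1, len(p1) - 1)  # randomly selected crossover point
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]


def mutate(chromosome, p_mut, rng):
    """[Mutation] Flip each locus independently with probability p_mut."""
    return [1 - gene if rng.random() < p_mut else gene for gene in chromosome]


def run_ga(n=20, length=16, p_cross=0.7, p_mut=0.01, max_generations=200, seed=0):
    rng = random.Random(seed)
    # [Start] random population of n chromosomes
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(max_generations):
        new_population = []  # [New population]
        while len(new_population) < n:
            p1, p2 = select(population, rng)  # uses [Fitness] of each chromosome
            for child in crossover(p1, p2, p_cross, rng):
                new_population.append(mutate(child, p_mut, rng))  # [Accepting]
        population = new_population[:n]  # [Replace]
        # [Test] end condition: some chromosome consists only of 1 bits
        if fitness(max(population, key=fitness)) == length:
            break
    return max(population, key=fitness)
```

Calling `run_ga()` returns the fittest chromosome of the final population. Changing the `fitness` function and the chromosome encoding is what adapts the same loop to a different problem.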

3. Who can benefit from Genetic Algorithms
• Nearly everyone can gain benefits from Genetic Algorithms, once they can encode solutions of a given problem as chromosomes and compare the relative performance (fitness) of solutions.
• An effective representation and a meaningful fitness evaluation are the keys to success in Genetic Algorithm applications.
• The appeal of Genetic Algorithms comes from their simplicity and elegance as robust search algorithms, as well as from their power to discover good solutions rapidly for difficult high-dimensional problems. They are particularly suited to problems where:
  • The search space is large, complex or poorly understood.
  • Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space.
  • No mathematical analysis is available.
  • Traditional search methods fail.
• Genetic Algorithms have been used for problem-solving and for modeling. They are applied to many scientific and engineering problems, including the traveling salesman problem, as well as in business and entertainment.
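As a small illustration of the two keys named above (a representation and a fitness evaluation), the traveling salesman problem can be encoded with a chromosome that is a permutation of cities and a fitness that rewards shorter tours. The city names and coordinates are made up for this sketch.

```python
import math
import random

# Hypothetical city coordinates, used only for illustration.
CITIES = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}


def tour_length(tour):
    """Total length of the closed tour that visits the cities in the given order."""
    total = 0.0
    for i, city in enumerate(tour):
        x1, y1 = CITIES[city]
        x2, y2 = CITIES[tour[(i + 1) % len(tour)]]  # wrap around to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def fitness(tour):
    """Shorter tours are fitter, so use the negated length."""
    return -tour_length(tour)


def random_chromosome(rng):
    """A chromosome is one random permutation of the city names."""
    tour = list(CITIES)
    rng.shuffle(tour)
    return tour
```

With this encoding, crossover and mutation operators have to keep each chromosome a valid permutation (for example, by swapping two cities rather than overwriting one).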

4. Applications

4.1 Automotive Design
• Using Genetic Algorithms to design both composite materials and aerodynamic shapes for race cars and regular means of transportation (including aviation) can return combinations of best materials and best engineering to provide faster, lighter, more fuel efficient and safer vehicles for all the things we use vehicles for.
• Rather than spending years in laboratories working with polymers, wind tunnels and balsa wood shapes, the processes can be done much quicker and more efficiently by computer modeling, using Genetic Algorithm searches to return a range of options human designers can then put together however they please.

4.2 Engineering Design
• Getting the most out of a range of materials to optimize the structural and operational design of buildings, factories, machines, etc. is a rapidly expanding application of Genetic Algorithms. These are being created for such uses as optimizing the design of heat exchangers, robot gripping arms, satellite booms, building trusses, flywheels, turbines, and just about any other computer-assisted engineering design application.
• There is work to combine Genetic Algorithms optimizing particular aspects of engineering problems to work together, and some of these can not only solve design problems, but also project them forward to analyze weaknesses and possible point failures in the future so these can be avoided.

4.3 Robotics
• Robotics involves human designers and engineers trying out all sorts of things in order to create useful machines that can do work for humans. Each robot's design is dependent on the job or jobs it is intended to do, so there are many different designs out there.
• Genetic Algorithms can be programmed to search for a range of optimal designs and components for each specific use, or to return results for entirely new types of robots that can perform multiple tasks and have more general application.
• Genetic Algorithm designed robotics just might get us those nifty multi-purpose, learning robots we've been expecting any year now since we watched the Jetsons as kids, who will cook our meals, do our laundry and even clean the bathroom for us!

4.4 Optimized Telecommunications Routing
• Do you find yourself frustrated by slow LAN performance, inconsistent internet access, a FAX machine that only sends faxes sometimes, your land line's number of 'ghost' phone calls every month? Well, Genetic Algorithms are being developed that will allow for dynamic and anticipatory routing of circuits for telecommunications networks.
• These could take notice of your system's instability and anticipate your re-routing needs. Using more than one Genetic Algorithm circuit-search at a time, soon your interpersonal communications problems may really be all in your head rather than in your telecommunications system.
• Other Genetic Algorithms are being developed to optimize placement and routing of cell towers for best coverage and ease of switching, so your cell phone and BlackBerry will be thankful for Genetic Algorithms too.

4.5 Trip, Traffic and Shipment Routing
• Genetic Algorithms applied to the "Traveling Salesman Problem" (TSP) can be used to plan the most efficient routes and scheduling for travel planners, traffic routers and even shipping companies. This covers the shortest routes for traveling, the timing to avoid traffic tie-ups and rush hours, and the most efficient use of transport for shipping, even including pickup loads and deliveries along the way.
• The program can be modeling all this in the background while the human agents do other things, improving productivity as well! Chances are increasing steadily that when you get that trip plan packet from the travel agency, a Genetic Algorithm contributed more to it than the agent did.

6. Cancer gene search with data-mining and genetic algorithms

6.1 Introduction
• Cancer leads to approximately 25% of all mortalities, making it the second leading cause of death in the United States. Early and accurate detection of cancer is critical to the well being of patients.
• Cancer develops mainly in epithelial cells, connecting/muscle tissue (sarcomas), and white blood cells. It arises from successive mutations in a normal cell that damage the DNA and impair the cell replication mechanism. There are a number of carcinogens, such as tobacco smoke, radiation, certain microbes, synthetic chemicals, polluted water, and air, that may accelerate the mutations. Thus, there is a need to identify the mutated genes that contribute to a cancerous state.
• One of the methods for cancer identification is through the analysis of genetic data. The human genome contains approximately 10 million single nucleotide polymorphisms. These single nucleotide polymorphisms are responsible for the variation that exists between human beings. Analysis of gene expression data leads to cancer identification and classification, which will facilitate proper treatment selection and drug development.
• Due to the high cost, genetic data (containing as many as 15,000 genes per patient) is normally collected on a limited number of patients (100–300 patients). There is a need to select the most informative genes from such wide data sets. Removal of uninformative genes decreases noise, confusion, and complexity, and increases the chances for identification of the most important genes, classification of diseases, and prediction of various outcomes, e.g., cancer type.
• An integrated gene-search algorithm for genetic expression data analysis was proposed. This integrated algorithm involves a genetic algorithm and correlation-based heuristics for data preprocessing (on partitioned data sets) and data mining (decision tree and support vector machine algorithms) for making predictions. Knowledge derived by the proposed algorithm has high classification accuracy with the ability to identify the most significant genes. Gene expression data sets for ovarian, prostate, and lung cancer were analyzed in this research.
• A genetic algorithm is a search algorithm based on the concept of natural genetics. A genetic algorithm is initiated with a set of solutions (chromosomes) called the population. Each solution in the population is evaluated based on its fitness. Solutions chosen to form new chromosomes (offspring) are selected according to the fitness, i.e., the more suitable the solution, the higher the likelihood it will reproduce. This is repeated until some condition (for example, the number of populations or the quality of the best solution) is satisfied. A genetic algorithm searches the solution space without following crisp constraints and takes into account potentially all feasible solution regions. This provides a chance of searching previously unexplored regions, and there is a high possibility of achieving an overall optimal or near optimal solution, making the genetic algorithm a global search algorithm.

6.2 Integrated algorithm
• The integrated gene-search algorithm consists of two phases. The iterative Phase I includes data partitioning, execution of the Decision Tree algorithm (or other data-mining algorithms) on the partitioned data set, the genetic algorithm, and the correlation-based heuristics for gene reduction. The set of significant genes is utilized in Phase II for validation of the quality of genes.
• In Phase I, the cancer training gene data set is initially partitioned into several subsets with approximately 1000 genes in each subset (Fig. 6.1). The partitioning of the data sets can be performed arbitrarily or randomly. The Decision Tree algorithm is applied to each partitioned data set to determine the classification accuracy. A data-mining (i.e., classification) algorithm takes a training expression data set as input and predicts whether the test sample is normal or cancerous. The genes selected from all the partitioned data sets are merged to formulate a single gene set (Fig. 6.1). The total number of genes selected (most significant as well as medium significant genes) from all the partitioned data sets is an overestimate of the actual significant gene set. If the current gene set is larger than the user-defined threshold (e.g., 1000 genes), then the gene set is repartitioned to form the next iteration of data-mining and GA-CFS (Genetic Algorithm-Correlation Based Feature Selection) algorithms. Phase I is repeated until the number of significant genes is less than the threshold. To further reduce the number of genes, the GA-CFS algorithm can be re-applied to the reduced gene data sets.
• In Phase II, data-mining algorithms such as Decision Tree and Support Vector Machine algorithms are then applied to the training data set for only the significant genes (Fig. 6.1). Thus, data-mining algorithms are applied to the training and testing data sets and their results are evaluated to determine the most significant gene set. The classification accuracy obtained from this reduced gene data set is not smaller than the maximum classification accuracy from the previous partitioned data sets. This step validates the fact that the proposed gene selection algorithm preserves the information/knowledge.

Fig. 6.1 Integrated gene-search algorithm. (Flowchart: the complete data set for cancer is partitioned into data sets 00001 to 01000, 01001 to 02000, 0i001 to 0i+1000, ..., 1n001 to 1n+1000. In Phase I, data mining and GA-CFS are applied to each data set and the gene set is identified; if it contains more than 1000 genes (YES) the loop repeats, otherwise (NO) Phase II applies data mining to give the training results, the testing results and the most significant genes.)
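The control flow of Phase I (partition, select per subset, merge, repeat while the merged set is above the threshold) can be sketched as follows. This is only an outline under assumptions: `select_genes` is a hypothetical callable standing in for the data-mining plus GA-CFS step, which is not reproduced here, and the early exit when no genes are removed is a safeguard added for the sketch.

```python
def partition(genes, size):
    """Split the gene list into consecutive subsets of at most `size` genes."""
    return [genes[i:i + size] for i in range(0, len(genes), size)]


def phase_one(genes, select_genes, threshold=1000, subset_size=1000):
    """Repeat partition -> per-subset selection -> merge until the merged
    gene set is no larger than `threshold`.

    `select_genes` is a hypothetical stand-in for data mining plus GA-CFS:
    it takes a list of genes and returns the ones it keeps.
    """
    current = list(genes)
    while True:
        merged = []
        for subset in partition(current, subset_size):
            merged.extend(select_genes(subset))
        if len(merged) >= len(current):
            return merged  # nothing was removed; stop rather than loop forever
        current = merged
        if len(current) <= threshold:
            return current
```

For instance, a stand-in selector that keeps a quarter of each subset would shrink 15,000 genes to 3,750 in the first pass and to fewer than 1,000 in the second, at which point the loop ends and Phase II would take over.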

6.3 Conclusion
The integrated gene-search algorithm (Genetic Algorithm-Correlation Based Feature Selection algorithm with data mining) was proposed and successfully applied to the training and test genetic expression data sets of ovarian, prostate, and lung cancers. This uniformly applicable algorithm not only provided high classification accuracy but also determined a set of the most significant genes for each of the three cancers. These gene sets require further investigation for their medical relevance, as the prediction power attained from these gene sets is statistically equivalent to that reported in the literature. The integrated gene-search algorithm is capable of identifying significant genes by partitioning the data set with a correlation-based heuristic. The overestimate of the actual significant gene set using this algorithm allows the investigation of potentially useful genes or their combinations. This leads to multiple models and supports the underlying hypothesis that genetic expression data sets can be used in diagnosis of various cancers.

5. Genetic Algorithm Problems

5.1 The Algorithm
A genetic algorithm can be thought of as a search. Given some initial state, the algorithm is searching for an optimal state. It does this in a way that mimics nature (hence the name). Say you have a population of a certain species. The first generation of these creatures may not be optimally suited for their environment. Over time the individuals who are less suited die off while those that are well suited reproduce and dominate the others. In addition to reproduction between well suited individuals (cross-over in the context of a Genetic Algorithm), the offspring of those individuals experience mutation, usually at a very low probability, meaning that the child of individuals A and B is not purely a cross between the two, but has its own unique traits.

In the context of programming, a Genetic Algorithm can be expressed as a function that takes as input a population and a fitness function. The population is a collection of individuals and the fitness function is a means of determining how fit an individual is. At generation zero the population is usually randomly generated. In order to get from generation zero to generation one, the algorithm uses the fitness function to determine which individuals to include in the cross-over (reproduce), leaving out the rest. The children of those individuals are then passed to a mutate function that alters them in some way. Generally mutation occurs at a low probability.
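A function of the kind described above, taking a population and a fitness function as input, might be sketched like this in Python. It is a hypothetical illustration rather than the report's own listing: the parameter names, the "fitness at least average" selection rule and the stopping test `is_solution` are all assumptions.

```python
import random


def genetic_algorithm(population, fitness, reproduce, mutate, is_solution,
                      max_generations=100, rng=None):
    """Evolve `population` and return the fittest individual found.

    All parameter names are illustrative: `reproduce(a, b, rng)` builds a child
    from two parents, `mutate(child, rng)` returns a possibly altered child,
    and `is_solution(individual)` says when to stop early.
    """
    rng = rng or random.Random()
    for _ in range(max_generations):
        best = max(population, key=fitness)
        if is_solution(best):
            return best
        # Keep the individuals whose fitness is at least the average.
        average = sum(fitness(i) for i in population) / len(population)
        parents = [i for i in population if fitness(i) >= average]
        # Refill the population with mutated children of random parent pairs.
        population = [
            mutate(reproduce(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(len(population))
        ]
    return max(population, key=fitness)
```

Passing the operators in as arguments keeps the loop independent of how individuals are represented; only `fitness`, `reproduce` and `mutate` need to know the encoding.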

5.2 How They Are Used
• There are a variety of problems that can be solved with genetic algorithms. Genetic Algorithms are adept at optimization problems in particular. K-SATISFIABILITY problems, for example, can be solved with a genetic algorithm (though other means exist).
• For anyone not familiar with K-SATISFIABILITY problems I'll give a short explanation. SATISFIABILITY (or satisfaction) problems attempt to assign values to the variables of a boolean formula in such a way that it evaluates to true. So if my SATISFIABILITY problem consists of two variables, A and B, and one clause, A OR B, then one solution would be A = true, B = true.
• Clauses are the components of the boolean formula; in the example I gave, the formula consists of only one clause. A larger SATISFIABILITY problem may consist of hundreds of variables and thousands of clauses and cannot be solved on paper in a reasonable amount of time. Here is an example of a larger SAT problem: (A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)
• This formula consists of three variables (A, B, C) and three clauses. A solution to this problem would be A = true, B = true, C = false. Notice that there are many different assignments of these variables that satisfy the formula. If there were more clauses this might not be the case.
• To solve a SATISFIABILITY problem with a genetic algorithm you start off with a population of randomly generated "solutions", each solution consisting of a random assignment of true or false to each variable. This population is generation zero. At generation zero for a large problem, there is very little chance of a solution existing.
• Using the fitness function, a fitness value is determined for each individual in generation zero. In this context the fitness function is defined as the number of satisfied (or unsatisfied) clauses in the boolean formula. It might be the case that one of these individuals satisfies the formula, in which case you're done. Otherwise, in order to get from generation zero to generation one, we must choose a portion of the population to "reproduce", for example, those having a fitness above the average.
• Once we've made our selection we perform the cross over by producing a new individual with a portion of its assignments coming from each parent (the size of the portion may be determined randomly). For example, for individuals X and Y with X(A,B,C) = {True, False, False} and Y(A,B,C) = {False, False, True}, a possible child would be Child(A,B,C) = {True, False, True}.
• After we've generated a new population we then randomly mutate each individual at a very low probability. A mutation takes an assignment and flips it. So for the individual X(A,B,C) = {True, False, False}, if a mutation event occurs on the variable B, it will become X(A,B,C) = {True, True, False}. Without this mutation the algorithm does not approach a solution. At probabilities above 5%, however, in many cases a solution will not be found in a reasonable amount of time.
• After each passing generation, the average fitness increases and it becomes likely that an individual satisfies the formula.

5.3 Problems with Genetic Algorithms
• After each generation the individuals of a population begin to approach the solution. In the context of a SATISFIABILITY problem this means they satisfy more and more clauses. There is, however, no guarantee that they will ever satisfy all of them. This is because individuals that have a fitness near the maximum may actually be very different from the solution.
• For example, say a SATISFIABILITY problem has the solution 000011000, where each character in the bit string represents a variable, the 0s represent false, and the 1s represent true. The string 111100111, however, might satisfy 90% of the clauses. If this is the case, the children produced by this individual will look similar to it, and the likelihood of it being mutated into the solution is essentially zero. The following graph illustrates this problem:

Fig: Local Max Problem

• From the graph you can see that there are two peaks, one reaching 100, the other 75. The higher one represents the solution to the problem, while the other is called a local maximum. A genetic algorithm may reach the peak of a local maximum and become stuck because all similar solutions have a lower fitness, while the actual solution is dissimilar to the current state.

5.4 Possible Solutions
• A possible way to fix this problem would be to reset the search: generate a new set of random solutions, as the algorithm did at generation zero, and proceed from there. This is called a random-reset. Hopefully after the reset the search will approach the solution rather than a local max. Another similar solution would be to mutate each individual in the current population at a much higher rate. This would produce a population that is very different from the one that existed at the local maximum.
• These solutions would fix the problem in a case where there were only a few local maximums, but for some problems it might be the case that there are numerous local maximums. For these problems, genetic algorithms with random-reset might find solutions that have very high fitness, possibly close to 100%, but never the solution.
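The SATISFIABILITY approach of section 5.2 and the random-reset idea of section 5.4 can be combined in one short sketch. It uses the three-clause formula given earlier; the population size, mutation rate and `reset_after` stall limit are arbitrary assumptions, and the stall counter is just one possible way to decide when the search is stuck.

```python
import random

# Each clause is a list of (variable index, wanted value) literals.
# This encodes (A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C).
CLAUSES = [
    [(0, True), (1, True), (2, True)],
    [(0, True), (1, False), (2, False)],
    [(0, False), (1, True), (2, False)],
]
NUM_VARS = 3


def fitness(assignment):
    """Number of clauses satisfied by the assignment."""
    return sum(any(assignment[v] == want for v, want in clause) for clause in CLAUSES)


def random_population(size, rng):
    """Generation zero: a random true/false value for every variable."""
    return [[rng.random() < 0.5 for _ in range(NUM_VARS)] for _ in range(size)]


def crossover(x, y, rng):
    """Child takes a random-sized leading portion from x and the rest from y."""
    point = rng.randint(0, NUM_VARS)
    return x[:point] + y[point:]


def mutate(assignment, rate, rng):
    """Flip each variable independently with probability `rate`."""
    return [not v if rng.random() < rate else v for v in assignment]


def solve(size=10, rate=0.02, max_generations=200, reset_after=20, seed=0):
    rng = random.Random(seed)
    population = random_population(size, rng)
    best_seen, stalled = -1, 0
    for _ in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == len(CLAUSES):
            return best
        # Count generations without improvement of the best fitness.
        if fitness(best) > best_seen:
            best_seen, stalled = fitness(best), 0
        else:
            stalled += 1
        if stalled >= reset_after:
            # Random-reset: start again from a fresh random population.
            population, best_seen, stalled = random_population(size, rng), -1, 0
            continue
        average = sum(map(fitness, population)) / size
        parents = [p for p in population if fitness(p) >= average]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(size)
        ]
    return None  # no satisfying assignment found within the generation limit
```

On a formula this small almost any random population already contains a solution, so the reset branch matters only for larger problems with many local maxima.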

6. CONCLUSION & FUTURE SCOPE
A genetic algorithm is a probabilistic optimization technique modeled on the process of genetic evolution in biology, and is regarded as an effective algorithm for finding a globally optimal solution to many types of problem. The algorithm is widely applicable in different artificial intelligence approaches as well as in other basic areas such as object oriented technology and robotics. In future we shall concentrate on the development of hybrid approaches using genetic algorithms and object oriented technology. Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things and solutions which we might otherwise never be able to find. The use of genetic algorithms to solve large and often complex computational problems has given rise to many new applications in a variety of disciplines. They have discovered powerful, high quality solutions to difficult practical problems in a diverse variety of fields.

7. References
[1] http://lancet.mit.edu/~mbwall/presentations/IntroToGAs
[2] http://www.ai-junkie.com/ga/intro/gat1.html
[3] http://en.wikipedia.org/wiki/Genetic_algorithm
[4] http://css.engineering.uiowa.edu/~ankusiak/Journal-papers/Gene_07.pdf
[5] http://brainz.org/15-real-world-applications-genetic-algorithms