
1. Introduction
Motivation

Scope of the project

Introduction

In this project, we evaluate various genetic operators on the Traveling Salesman Problem (TSP). The project applies different combinations of genetic operators, together with a specific fitness function, to the problem for a specified number of iterations, and reports the most suitable traveling salesman path found at the end of the run. The genetic operators used in the project are:

1. Selection techniques: tournament selection, roulette wheel selection, linear ranking selection

2. Crossover techniques: PMX crossover, single-point crossover, double-point crossover

3. Mutation techniques: single mutation, double mutation

The project is implemented in the C language, with a graphical interface also built in C.

Motivation

The motivation of the project is to create a system that can produce a well-defined, high-quality result for the Traveling Salesman Problem. The Traveling Salesman Problem (TSP) is a problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once.

Testing every possibility for an N-city tour would mean examining N! tours. For a 30-city tour, that means measuring the total distance of about 2.65 x 10^32 different tours. Assuming a trillion additions per second, this would take 252,333,390,232,297 years. Adding one more city would increase the time by a factor of 31. Obviously, this is not a practical approach. A genetic algorithm can be used to find a solution in much less time.

So, the main motive behind this project is to solve the complex Traveling Salesman Problem in a reasonable amount of time. Genetic algorithms (GAs) are an evolutionary optimization approach and an alternative to traditional optimization methods, which is the main reason for solving the traveling salesman problem with a genetic algorithm. The problem has direct practical importance, since quite a lot of practical applications can be put in this form. It also has theoretical importance in complexity theory, since the TSP is one of the class of NP-hard combinatorial problems. We take up this problem to improve our knowledge of the various graph theory algorithms that are applied to it, and also to learn the C language. The deceptive simplicity of the problem, along with its NP-hardness, is the major motivation behind us taking up this problem.

Problem Definition
The traveling salesman problem consists of finding the shortest (or a nearly shortest) path connecting a number of locations (perhaps hundreds), such as cities visited by a traveling salesman on his sales route. The Traveling Salesman Problem is typical of a large class of "hard" optimization problems that have intrigued mathematicians and computer scientists for years. Most importantly, it has applications in science and engineering. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. Even though the problem is computationally difficult, a large number of heuristics and exact methods are known, so that some instances with tens of thousands of cities can be solved. A genetic algorithm can be used to find a solution in much less time. Although it might not find the best solution, it can find a near perfect solution for a 100-city tour in less than a minute. There are a couple of basic steps to solving the traveling salesman problem using a GA.

Let us number the cities from 1 to n, and let city 1 be the base city of the salesman. Also, let c(i, j) be the cost of travelling from city i to city j. The costs need not be symmetric, so c(i, j) may differ from c(j, i). There are (n-1)! possible solutions. One could enumerate them systematically, find the cost of each and every one of these solutions, and finally keep the one with the minimum cost. This requires at least (n-1)! steps. If, for example, there were 21 cities, the number of steps required would be (n-1)! = (21-1)! = 20!.

A much cheaper alternative is the nearest neighbour algorithm. According to this algorithm, whenever the salesman is in city i, he chooses as his next city the city j for which the cost c(i, j) is the minimum among all costs c(i, k), where k ranges over the cities the salesman has not visited yet. There is also a simple rule in case more than one city gives the minimum cost: in such a case the city with the smaller k is chosen. This is a greedy algorithm, which selects the cheapest visit at every step and does not care whether this leads to a poor overall tour.
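The nearest neighbour rule described above can be sketched in C as follows. This is only an illustration, not the project's code: the function name `nearest_neighbour_tour`, the row-major cost matrix layout and the 0-based city numbering are assumptions made for the example, and the return leg to the base city is included on the assumption that the salesman comes back home.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds a greedy nearest neighbour tour (hypothetical helper).
 * cost is an n*n row-major matrix: cost[i * n + j] = c(i, j); it may be
 * asymmetric. tour receives the n city indices in visiting order.
 * Returns the total cost, including the return leg to the start city. */
double nearest_neighbour_tour(const double *cost, int n, int start, int *tour)
{
    assert(n > 0 && start >= 0 && start < n);
    char *visited = calloc((size_t)n, 1);
    assert(visited != NULL);

    double total = 0.0;
    int current = start;
    tour[0] = start;
    visited[start] = 1;

    for (int step = 1; step < n; step++) {
        int best = -1;
        for (int k = 0; k < n; k++) {
            if (visited[k]) continue;
            /* Strict '<' keeps the smaller index k when costs are tied. */
            if (best < 0 || cost[current * n + k] < cost[current * n + best])
                best = k;
        }
        total += cost[current * n + best];
        visited[best] = 1;
        tour[step] = best;
        current = best;
    }
    total += cost[current * n + start]; /* go back to the base city */
    free(visited);
    return total;
}
```

Each step scans the unvisited cities once, so the whole tour costs on the order of n^2 comparisons, compared with the (n-1)! tours of exhaustive enumeration.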

Background

The project deals with the implementation of genetic operators for the travelling salesman problem. Because the TSP is an NP-hard problem, no known exact computational method gives satisfactory results on large instances; using a genetic algorithm and its operators, we can obtain better results for this problem in practice.

In spite of 40 years of extensive research by some very smart people, no efficient exact algorithm has been found. For example, the TSP can be solved by listing all possible tours covering the cities once, and picking the one with the least length. This takes an eternity, as the number of such tours (n!) grows faster than exponentially. These problems are the (in)famous NP-hard problems. Stephen Cook at the University of Toronto showed that if one of the NP-complete problems can be solved efficiently, all of them can be, a very intuitively pleasing fact. Tragically, most combinatorial optimization problems in nature turn out to be NP-hard.

As 40 years of research have not found a good solution to these problems, people have begun to approximate the solutions. Say, for the TSP, if the length of the best tour is around 100 km, a procedure that finds a tour of 150 km would be considered good. Of course, if the length of the best tour is around 5000 km, the same procedure should return a tour of length 7500 km. This means that an approximation procedure for the TSP should return a solution that is always within some fixed factor of the best solution. Most NP-hard problems have reasonable approximation algorithms; the general TSP does not. It has been proven that, unless P = NP, no polynomial-time procedure can take an arbitrary set of cities and always give a tour that is within a fixed factor of the optimal solution. The TSP seems to be too hard. But of course, most of this theory would collapse if someone came up with a good way to solve one NP-hard problem.

Scope of the Project

The scope of a project generally lies in the area where the project can be applied, or where the computation it performs can be used for a specific task. Overall, the scope of the project should be specific and should relate to the general environment. As the Traveling Salesman Problem is an NP-hard problem, it is difficult to find its solution.

Benefits of the Project


As the project is based on a genetic algorithm, it produces a good (though not necessarily optimal) solution in polynomial time. Genetic algorithms are one of the best ways to solve a problem about which little is known. They are very general algorithms and so can be applied to almost any search space. All we need to know is what the solution needs to be able to do well, and a genetic algorithm will often be able to create a high quality solution. Genetic algorithms use the principles of selection and evolution to produce several solutions to a given problem.

The most common type of genetic algorithm works like this: a population is created with a group of individuals created randomly. The individuals in the population are then evaluated. The evaluation function is provided by the programmer and gives the individuals a score based on how well they perform at the given task. Two individuals are then selected based on their fitness: the higher the fitness, the higher the chance of being selected. These individuals then "reproduce" to create one or more offspring, after which the offspring are mutated randomly. This continues until a suitable solution has been found or a certain number of generations have passed, depending on the needs of the programmer.

BACKGROUND STUDY AND RELATED WORKS

GENETIC ALGORITHM

Genetic algorithms are a family of computational models inspired by evolution. These algorithms encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination (crossover) operators to these structures so as to preserve critical information. Genetic algorithms are often viewed as function optimizers, although the range of problems to which genetic algorithms have been applied is quite broad. Genetic algorithms (GAs) are an evolutionary optimization approach and an alternative to traditional optimization methods.

Diagram- Genetic Algorithm for Evolutionary Programs

GA is one of the most appropriate methods for complex non-linear models where location of the global optimum is a difficult task.

GA follows the concept of solution evolution by stochastically developing generations of solution populations using a given fitness function. GAs are particularly applicable to problems that are large, non-linear and possibly discrete in nature. Genetic algorithms are a probabilistic search approach founded on the ideas of evolutionary processes.

The basic components of GA are illustrated in the left figure: gene, chromosome, and population. Usually the chromosome is represented as a binary string. The real trick of GA lies in the encoding of the problem domain and the selection of the next generation. Genetic algorithms (GAs) can be seen as a software tool that tries to find structure in data that might seem random, or to make a seemingly unsolvable problem more or less "solvable". GAs can be applied to domains about which there is insufficient knowledge, or whose size and/or complexity is too high for an analytic solution. A variety of evolutionary algorithms have been proposed, of which the major ones are: GAs, evolutionary programming, evolution strategies, classifier systems, and genetic programming.

Genetic algorithms operate on a set of possible solutions. Because of the random nature of the genetic algorithm, solutions found by the algorithm can be good, poor or infeasible [defective, erroneous], so there should be a way to specify how good a solution is. This is done by assigning a fitness value [or just fitness] to the solution. Chromosomes represent solutions within the genetic algorithm. The two basic components of a chromosome are the coded solution and its fitness value. Chromosomes are grouped into a population [set of solutions] on which the genetic algorithm operates.

In each step [generation] the genetic algorithm selects chromosomes from the population [selection is usually based on the fitness value of the chromosome] and combines them to produce new chromosomes [offspring]. These offspring chromosomes form a new population [or replace some of the chromosomes in the existing population] in the hope that the new population will be better than the previous one. Populations keep track of the worst and the best chromosomes and store additional statistical information which can be used by the genetic algorithm to determine the stop criteria. A chromosome in some way stores the solution which it represents. This is called the representation [encoding] of the solution. There are a number of possible ways to represent a solution so that it is suitable for a genetic algorithm [binary, real number, vector of real numbers, permutations, and so on], and they mostly depend on the nature of the problem.

How does GA work?

1. Start with a randomly generated population of n l-bit strings (candidate solutions to a problem). (These "solutions" are not to be confused with "answers" to the problem; think of them as possible characteristics that the system would employ in order to reach the answer (1).)
2. Calculate the fitness f(x) of each string in the population.
3. Repeat the following steps until n new strings have been created:
o Select a pair of parent strings from the current population, the probability of selection being an increasing function of fitness. Selection is done "with replacement", meaning that the same string can be selected more than once to become a parent.
o With the crossover probability, cross over the pair at a randomly chosen point to form two new strings. If no crossover takes place, form two new strings that are exact copies of their respective parents.
o Mutate the two new strings at each locus with the mutation probability, and place the resulting strings in the new population.
4. Replace the current population with the new population.
5. Go to step 2.

Why does GA work?

The question that most people ask about GA is why such a select-recombine-mutate process should do anything useful. The most widely given answer is from the book "Adaptation in Natural and Artificial Systems" by John Holland, who is regarded as the father of the genetic algorithm. Consider a function over a single variable, whose search space is one-dimensional, with function maximization as the goal. The following figure demonstrates how the search space is partitioned (sampled). The chromosome form (hyperplane) 0***...* spans the first half of the search space and 1***...* spans the second half of the space.


Since the strings in the 0***...* partition are on average better than those in the 1***...* partition, we would like the search to be proportionally biased toward this partition.

A typical genetic algorithm requires:
1. A genetic representation of the solution domain.
2. A fitness function to evaluate the solution domain.

A standard representation of the solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. Once the genetic representation and the fitness function are defined, the GA proceeds to initialize a population of solutions randomly, and then improves it through repeated application of the mutation, crossover, inversion and selection operators.

To gain a general understanding of genetic algorithms, it is useful to examine their components. Before a GA can be run, it must have the following five components:
1. A chromosomal representation of solutions to the problem.
2. A function that evaluates the performance of solutions.
3. A population of initialized solutions.
4. Genetic operators that evolve the population.
5. Parameters that specify the probabilities with which these genetic operators are applied.

PROCESS OF GENETIC ALGORITHM


GAs are population-based search techniques that maintain populations of potential solutions during searches. A string with a fixed bit-length usually represents a potential solution. In order to evaluate each potential solution, GAs need a payoff (or reward, objective) function that assigns a scalar payoff to any particular solution. Once the representation scheme and evaluation function are determined, a GA can start searching. Initially, often at random, the GA creates a certain number of strings, called the population size, to form the first generation. Next, the payoff function is used to evaluate each solution in this first generation. Better solutions obtain higher payoffs. Then, on the basis of these evaluations, some genetic operations are employed to generate the next generation. The procedures of evaluation and generation are performed iteratively until the optimal solution is found or the time allotted for computation ends.
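The evaluate-and-generate loop described above can be sketched as follows. This is a toy sketch under stated assumptions, not the project's implementation: the payoff is the OneMax function (number of 1 bits), parents are picked by binary tournament, and the names `run_ga`, `payoff`, `pick_parent`, `POP` and `LEN` are all made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POP 20 /* population size (assumed value) */
#define LEN 16 /* bits per string (assumed value) */

/* Uniform random number in [0, 1). */
static double rnd(void) { return (double)rand() / ((double)RAND_MAX + 1.0); }

/* Toy payoff function (OneMax): the number of 1 bits in the string. */
static int payoff(const unsigned char *s)
{
    int sum = 0;
    for (int i = 0; i < LEN; i++) sum += s[i];
    return sum;
}

/* Picks two random individuals and returns the index of the fitter one. */
static int pick_parent(const int *fit)
{
    int a = rand() % POP, b = rand() % POP;
    return fit[a] >= fit[b] ? a : b;
}

/* Runs up to max_gen generations with crossover probability pc and
 * per-bit mutation probability pm; returns the best payoff seen. */
int run_ga(int max_gen, double pc, double pm, unsigned seed)
{
    static unsigned char pop[POP][LEN], next[POP][LEN];
    int fit[POP], best = 0;
    assert(max_gen >= 0 && POP % 2 == 0);
    srand(seed);
    for (int i = 0; i < POP; i++) /* random first generation */
        for (int j = 0; j < LEN; j++) pop[i][j] = (unsigned char)(rand() % 2);
    for (int g = 0; ; g++) {
        for (int i = 0; i < POP; i++) { /* evaluation step */
            fit[i] = payoff(pop[i]);
            if (fit[i] > best) best = fit[i];
        }
        if (best == LEN || g == max_gen) break; /* optimum or budget spent */
        for (int i = 0; i < POP; i += 2) { /* generation step */
            const unsigned char *pa = pop[pick_parent(fit)];
            const unsigned char *pb = pop[pick_parent(fit)];
            /* cut == LEN means no crossover: children copy their parents */
            int cut = rnd() < pc ? 1 + rand() % (LEN - 1) : LEN;
            for (int j = 0; j < LEN; j++) {
                next[i][j] = j < cut ? pa[j] : pb[j];
                next[i + 1][j] = j < cut ? pb[j] : pa[j];
                if (rnd() < pm) next[i][j] ^= 1;
                if (rnd() < pm) next[i + 1][j] ^= 1;
            }
        }
        memcpy(pop, next, sizeof pop);
    }
    return best;
}
```

A call such as `run_ga(50, 0.7, 0.01, 1u)` runs at most 50 generations; the rates 0.7 and 0.01 are illustrative values only.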


A. Representation: Usually, only two components of a GA are problem dependent: the representation and the evaluation function. Representation is a key genetic algorithm issue because genetic algorithms directly manipulate coded representations of problems. In principle, any character set and coding scheme can be used. However, a binary character set is preferred because it yields the largest number of schemata for any given parameter resolution.

In most GAs, the individuals are represented by fixed-length binary strings. A schema is a pattern defined over the alphabet {0, 1, *} that describes a set of binary strings in the search space. Thus, each string belongs to 2^L schemata (L is the length of the binary string).
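The 2^L count can be checked with a small sketch: at each position a matching schema holds either the string's own bit or `*`, giving two choices per position. The helper names `matches_schema` and `count_matching_schemata` are hypothetical, and the brute-force count over all 3^L schemata is only practical for short strings.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the bit string is an instance of the schema, else 0.
 * '*' in the schema matches either bit. */
int matches_schema(const char *bits, const char *schema)
{
    size_t len = strlen(bits);
    assert(strlen(schema) == len);
    for (size_t i = 0; i < len; i++)
        if (schema[i] != '*' && schema[i] != bits[i]) return 0;
    return 1;
}

/* Counts, by enumerating all 3^L schemata over {0, 1, *}, how many of
 * them the given string belongs to. Intended for short strings only. */
unsigned long count_matching_schemata(const char *bits)
{
    size_t len = strlen(bits);
    assert(len > 0 && len <= 12);
    char schema[13];
    unsigned long total = 1, count = 0;
    for (size_t i = 0; i < len; i++) total *= 3;
    for (unsigned long code = 0; code < total; code++) {
        unsigned long c = code;
        for (size_t i = 0; i < len; i++) { /* decode base-3 digits */
            schema[i] = "01*"[c % 3];
            c /= 3;
        }
        schema[len] = '\0';
        count += (unsigned long)matches_schema(bits, schema);
    }
    return count;
}
```

For the 4-bit string 1011 the count is 2^4 = 16, in agreement with the statement above.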


B. Evaluation Function

Along with the representation scheme, the evaluation function is problem dependent. GAs are search techniques based on feedback received from their exploration of solutions. The judge of the GA's exploration is called an evaluation function. The notions of evaluation and fitness are sometimes used interchangeably.

It is important to distinguish between the evaluation function and the fitness function. While evaluation functions provide a measure of an individual's performance, fitness functions provide a measure of an individual's reproduction opportunities. In fact, the evaluation of an individual is independent of other individuals, while an individual's fitness is always dependent on other individuals.

C. Initial Population

Choosing an appropriate population size for a genetic algorithm is a necessary but difficult task for all GA users. If the population size is too small, the genetic algorithm will converge too quickly, before it can find the optimal solution. On the other hand, if the population size is too large, the computation cost may be prohibitive.

Reeves described an approach to specifying a minimal population size for the application of GAs. Smith suggested an algorithm for adaptively resizing GA populations.


Robertson's investigation of population size on parallel machines found that performance increased monotonically with population size. The initial population for a genetic algorithm is usually chosen at random.

D. Operators

From a mechanistic point of view, a GA is an iterative process in which each iteration has two steps, evaluation and generation.
o In the evaluation step, domain information is used to evaluate the quality of an individual.
o The generation step includes a selection phase and a recombination phase. In the selection phase, fitness is used to guide the reproduction of new candidates for the following iterations. The fitness function maps an individual to a real number that indicates how many offspring that individual is expected to breed. High-fitness individuals are given more emphasis in subsequent generations because they are selected more often. In the recombination phase, crossover and mutation perform mixing. Crossover recombines a pair of selected individuals to create two new offspring. Mutation is responsible for reintroducing inadvertently "lost" gene values.

So, there are three primary operators: selection, crossover, and mutation. While selection according to fitness is an exploitative resource, the crossover and mutation operators are exploratory resources. The GA combines the exploitation of past results with the exploration of new areas of the search space. The effectiveness of a GA depends on an appropriate mix of exploration and exploitation. The following sections describe these three operators:


1. Selection: The selection phase plays an important role in driving the search towards better individuals and in maintaining a high genotypic diversity in the population. The selection phase can be divided into the selection algorithm and the sampling algorithm.
i. The selection algorithm assigns each individual x a real number, called the target sampling rate, tsr(x, t), to indicate the expected number of offspring x will reproduce by time t.
ii. The sampling algorithm actually reproduces, based on the target sampling rate, copies of individuals to form the intermediate population.

There are two types of selection algorithms: (a) explicit fitness remapping, and (b) implicit fitness remapping. The first one re-maps the fitness onto a new scale, which is then used by the sampling algorithm. Proportional selection and fitness ranking belong to this category. The second one fills the mating pool without passing through the intermediate step of remapping.

Three algorithms are used for the selection operator: tournament selection, roulette wheel selection and linear ranking selection.

TOURNAMENT SELECTION

Tournament selection is a method of selecting an individual from a population of individuals in a genetic algorithm. Tournament selection involves running several "tournaments" among a few individuals chosen at random from the population. The winner of each tournament (the one with the best fitness) is selected for crossover. Selection pressure is easily adjusted by changing the tournament size. If the tournament size is larger, weak individuals have a smaller chance of being selected.

Tournament selection pseudo code:
1. Choose k (the tournament size) individuals from the population at random.
2. Choose the best individual from the pool/tournament with probability p.
3. Choose the second best individual with probability p*(1-p).
4. Choose the third best individual with probability p*((1-p)^2), and so on.

Deterministic tournament selection selects the best individual (when p = 1) in any tournament. A 1-way tournament (k = 1) selection is equivalent to random selection. The chosen individual can be removed from the population that the selection is made from if desired; otherwise individuals can be selected more than once for the next generation. Tournament selection has several benefits: it is efficient to code, works on parallel architectures and allows the selection pressure to be easily adjusted.
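A minimal sketch of the deterministic case (p = 1) in C is shown below. It is an illustration under assumptions rather than the project's code: the name `tournament_select` is hypothetical, contestants are drawn with replacement using the standard `rand()` function, and higher fitness is taken to be better.

```c
#include <assert.h>
#include <stdlib.h>

/* Deterministic tournament selection (p = 1): draws k individuals at
 * random, with replacement, and returns the index of the fittest one. */
int tournament_select(const double *fitness, int n, int k)
{
    assert(n > 0 && k > 0);
    int best = rand() % n;
    for (int i = 1; i < k; i++) {
        int challenger = rand() % n;
        if (fitness[challenger] > fitness[best]) best = challenger;
    }
    return best;
}
```

With k = 1 the loop body never runs and the function reduces to random selection; raising k makes it increasingly unlikely that a weak individual wins, which is how the tournament size controls selection pressure.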

ROULETTE WHEEL SELECTION


Simple reproduction allocates offspring strings using a roulette wheel with slots sized according to fitness. This is a way of choosing members from the population of chromosomes in a way that is proportional to their fitness. Parents are selected according to their fitness. The better the fitness of the chromosome, the greater the chance it will be selected; however, it is not guaranteed that the fittest member goes to the next generation. Imagine a roulette wheel on which all chromosomes in the population are placed according to fitness, as in the picture below.
1.) The wheel is "spun" and the marble falls into one of the slots.
2.) The wheel is spun to select each chromosome that is to mate.
3.) The wheel is "spun" once for each chromosome that is chosen.

This can be simulated by the following algorithm:
1. [Sum] Calculate the sum S of the fitness of all chromosomes in the population.
2. [Select] Generate a random number r from the interval (0, S).
3. [Loop] Go through the population and accumulate the fitness values into a running sum s, starting from 0. When the sum s is greater than r, stop and return the chromosome where you are.
Of course, step 1 is performed only once for each population.
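The three steps above translate almost directly into C. The sketch below is illustrative: the names `roulette_pick` and `roulette_select` are made up, fitness values are assumed to be non-negative with a positive sum, and the random number r is passed in separately so the loop can be checked by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* [Loop] step: walks the population, accumulating fitness into s, and
 * returns the first index at which s exceeds r (0 <= r < S expected). */
int roulette_pick(const double *fitness, int n, double r)
{
    assert(n > 0);
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        s += fitness[i];
        if (s > r) return i;
    }
    return n - 1; /* guards against rounding when r is very close to S */
}

/* [Sum] and [Select] steps: computes S, draws r in [0, S), then picks. */
int roulette_select(const double *fitness, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) total += fitness[i];
    assert(total > 0.0);
    double r = total * ((double)rand() / ((double)RAND_MAX + 1.0));
    return roulette_pick(fitness, n, r);
}
```

For fitness values 1, 2, 3 and 4 the running sums are 1, 3, 6 and 10, so r = 3.5 lands in the third slot; the last chromosome owns 4/10 of the wheel.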

RANK SELECTION

Rank selection is roulette wheel selection with a pre-sort. The roulette method of selection has problems when the fitness values differ greatly. For example, if the best chromosome's fitness is 90% of the entire roulette wheel, then the other chromosomes will have a slim chance of being selected. Rank selection first ranks the population, and then every chromosome receives its fitness from this ranking. The worst will have fitness 1, the second worst 2, and so on, and the best will have fitness N (the number of chromosomes in the population). To begin, chromosomes are sorted in order of highest fitness to lowest fitness, with the highest being awarded a rank of 10 (since our population size is 10). The higher the rank, the higher the fitness.

Diagram- Before ranking selection

Diagram- After ranking selection

As a rule of thumb, rank selection allows average-fitness chromosomes to do a bit more breeding. We've not had much success with this particular selection method, but others have. Like everything else in genetic algorithms, it's problem dependent. So, it's worth implementing and testing out to see if it works for your project.
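The ranking step described for rank selection can be sketched as below. The function name `assign_ranks` and the tie-breaking rule (the earlier index gets the lower rank) are assumptions made for the illustration; the resulting ranks would then be fed to a roulette wheel in place of the raw fitness values.

```c
#include <assert.h>

/* Writes rank[i] = 1 for the worst chromosome up to n for the best.
 * Ties are broken by index, so every rank from 1 to n is used once.
 * A simple O(n^2) count is used instead of a sort to keep it short. */
void assign_ranks(const double *fitness, int n, int *rank)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int r = 1;
        for (int j = 0; j < n; j++) {
            if (fitness[j] < fitness[i] ||
                (fitness[j] == fitness[i] && j < i))
                r++;
        }
        rank[i] = r;
    }
}
```

For raw fitness values 0.90, 0.01, 0.05 and 0.04 the best chromosome owns 90% of a plain roulette wheel; after ranking it has rank 4 out of a rank total of 10, that is, 40% of the wheel.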
2. Crossover:


In order to explore other points in the search space, variation is introduced into the intermediate population by means of some idealized genetic recombination operators. The most important recombination operator is called crossover.

One-point crossover

A single crossover point on both parents' organism strings is selected. All data beyond that point in either organism string is swapped between the two parent organisms. The resulting organisms are the children:

EXAMPLE:

Parent 1:    XX|XXXXX
Parent 2:    YY|YYYYY
Offspring 1: XXYYYYY
Offspring 2: YYXXXXX
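A minimal C sketch of one-point crossover on character strings follows. The function name `one_point_crossover` and the convention that the caller supplies output buffers of at least length + 1 characters are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Copies the first `point` genes from one parent and the rest from the
 * other. c1 and c2 must each have room for strlen(p1) + 1 characters. */
void one_point_crossover(const char *p1, const char *p2, size_t point,
                         char *c1, char *c2)
{
    size_t len = strlen(p1);
    assert(strlen(p2) == len && point <= len);
    memcpy(c1, p1, point);                       /* head of parent 1 */
    memcpy(c1 + point, p2 + point, len - point); /* tail of parent 2 */
    memcpy(c2, p2, point);                       /* head of parent 2 */
    memcpy(c2 + point, p1 + point, len - point); /* tail of parent 1 */
    c1[len] = '\0';
    c2[len] = '\0';
}
```

Calling it with the parents XXXXXXX and YYYYYYY and a crossover point of 2 reproduces the offspring XXYYYYY and YYXXXXX from the example.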


Two-point crossover

Two-point crossover calls for two points to be selected on the parent organism strings. Everything between the two points is swapped between the parent organisms, rendering two child organisms:

EXAMPLE:

P1 = 1234 | 567 | 8
P2 = 8521 | 364 | 7

We would get:

Q1 = 1234 | 364 | 8
Q2 = 8521 | 567 | 7
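The two-point case can be sketched in the same way. Again this is only an illustration: the name `two_point_crossover` is hypothetical, the segment swapped is the half-open range from index a up to (but not including) index b, and the caller is assumed to provide large enough output buffers.

```c
#include <assert.h>
#include <string.h>

/* Each child starts as a copy of one parent and then receives the
 * segment [a, b) from the other parent. c1 and c2 must each have room
 * for strlen(p1) + 1 characters. */
void two_point_crossover(const char *p1, const char *p2,
                         size_t a, size_t b, char *c1, char *c2)
{
    size_t len = strlen(p1);
    assert(strlen(p2) == len && a <= b && b <= len);
    memcpy(c1, p1, len + 1); /* full copy, including the terminator */
    memcpy(c2, p2, len + 1);
    memcpy(c1 + a, p2 + a, b - a); /* middle segment from parent 2 */
    memcpy(c2 + a, p1 + a, b - a); /* middle segment from parent 1 */
}
```

With a = 4 and b = 7 this reproduces Q1 and Q2 from the example. Note that if the digits are read as city numbers, Q1 visits cities 3 and 4 twice and never visits 5 or 7, so plain two-point crossover does not by itself preserve a valid tour.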


PMX Crossover

PMX crossover is a genetic algorithm operator. For some problems it offers better performance than most other crossover techniques. Basically, parent 1 donates a swath of genetic material, and the corresponding swath from the other parent is sprinkled about in the child. Once that is done, the remaining alleles are copied directly from parent 2.

1. Randomly select a swath of alleles from parent 1 and copy them directly to the child. Note the indexes of the segment.
2. Looking in the same segment positions in parent 2, select each value that hasn't already been copied to the child. For each of these values:
i. Note the index of this value in parent 2. Locate the value, V, from parent 1 in this same position.
ii. Locate this same value in parent 2.
iii. If the index of this value in parent 2 is part of the original swath, go to step i using this value.
iv. If the position isn't part of the original swath, insert the value selected in step 2 into the child in this position.
3. Copy any remaining positions from parent 2 to the child.
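One way to code the PMX steps above is sketched below. It is an illustration under assumptions, not the project's routine: the name `pmx_child`, the bound `MAX_CITIES`, the encoding of a tour as a permutation of the values 0 to n-1, and the explicit swath bounds `lo` and `hi` (inclusive) are all choices made for the example.

```c
#include <assert.h>

#define MAX_CITIES 64 /* assumed upper bound for this sketch */

/* Builds one PMX child. p1 and p2 are permutations of 0..n-1; the swath
 * is the inclusive index range [lo, hi] taken from parent 1. */
void pmx_child(const int *p1, const int *p2, int n, int lo, int hi,
               int *child)
{
    int pos2[MAX_CITIES];     /* pos2[v] = index of value v in parent 2 */
    int in_swath[MAX_CITIES]; /* 1 if value v was copied from parent 1 */
    assert(n > 0 && n <= MAX_CITIES && 0 <= lo && lo <= hi && hi < n);

    for (int i = 0; i < n; i++) {
        child[i] = -1; /* -1 marks an empty position */
        pos2[p2[i]] = i;
        in_swath[i] = 0;
    }
    /* Step 1: copy the swath from parent 1. */
    for (int i = lo; i <= hi; i++) {
        child[i] = p1[i];
        in_swath[p1[i]] = 1;
    }
    /* Step 2: place the values of parent 2's swath not yet in the child. */
    for (int i = lo; i <= hi; i++) {
        int v = p2[i];
        if (in_swath[v]) continue;
        int j = i;
        while (j >= lo && j <= hi) /* follow the mapping out of the swath */
            j = pos2[p1[j]];
        child[j] = v;
    }
    /* Step 3: fill the remaining positions from parent 2. */
    for (int i = 0; i < n; i++)
        if (child[i] < 0) child[i] = p2[i];
}
```

For example, with parent 1 = 8 4 7 3 6 2 5 1 9 0, parent 2 = 0 1 2 3 4 5 6 7 8 9 and the swath at indexes 3 to 7, the child is 0 7 4 3 6 2 5 1 8 9, which is again a valid permutation. A second child can be produced by calling the function with the parents swapped.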


3. Mutation:

When individuals are represented as bit strings, mutation consists of reversing a randomly chosen bit. For example, assume that the individuals are represented as binary strings. In bit complement, once a bit is selected to mutate that bit will be flipped to be the complement of the original bit value.

Single mutation
Single mutation is a mutation operator that simply inverts the value of the chosen gene (0 goes to 1 and 1 goes to 0). This mutation operator can only be used for binary genes.


E. Parameters

1. Running a genetic algorithm entails setting a number of parameter values. However, finding good settings that work well on one's problem is not a trivial task.
2. Two primary parameters govern the behaviour of genetic algorithms: the crossover rate (Cr) and the mutation rate (Mr). The crossover rate controls the frequency with which the crossover operator is applied. If there are N individuals (population size = N) in each generation, then in each generation N*Cr individuals undergo crossover. The higher the crossover rate, the more quickly new individuals are added to the population. If the crossover rate is too high, high-performance individuals are discarded faster than selection can produce improvements. However, a low crossover rate may stagnate the search due to loss of exploration power.


Mutation is the operator that maintains diversity in the population. After the selection phase, each bit position of each individual in the intermediate population undergoes a random change with a probability equal to the mutation rate Mr. Consequently, approximately Mr*N*L mutations occur per generation, where L is the length of the chromosome. A genetic algorithm with too high a mutation rate will become a random search.
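The per-bit mutation described above can be sketched as follows. The function name `mutate_bits` and the representation of a chromosome as an array of '0' and '1' characters are assumptions made for the illustration; the standard `rand()` function supplies the random numbers.

```c
#include <assert.h>
#include <stdlib.h>

/* Flips each of the first len bits independently with probability rate
 * (the mutation rate Mr). Returns the number of bits that were flipped. */
int mutate_bits(char *bits, int len, double rate)
{
    assert(len >= 0 && rate >= 0.0 && rate <= 1.0);
    int flips = 0;
    for (int i = 0; i < len; i++) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0); /* [0, 1) */
        if (u < rate) {
            bits[i] = (bits[i] == '0') ? '1' : '0'; /* bit complement */
            flips++;
        }
    }
    return flips;
}
```

Applied to all N chromosomes of length L, the expected total number of flips per generation is Mr*N*L. For instance, with Mr = 0.01, N = 100 and L = 50 this gives about 50 mutations per generation (illustrative numbers only).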

TRAVELING SALESMAN PROBLEM


The Traveling Salesman Problem (TSP) is a problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. Even though the problem is computationally difficult, a large number of heuristics and exact methods are known, so that some instances with tens of thousands of cities can be solved.


Testing every possibility for an N-city tour would mean examining N! tours. For a 30-city tour, that means measuring the total distance of about 2.65 x 10^32 different tours. Assuming a trillion additions per second, this would take 252,333,390,232,297 years. Adding one more city would increase the time by a factor of 31. Obviously, this is not a practical approach. A genetic algorithm can be used to find a solution in much less time. Although it might not find the best solution, it can find a near perfect solution for a 100-city tour in less than a minute. There are a couple of basic steps to solving the traveling salesman problem using a GA.

PROCEDURE:

In the first step, an initial set of chromosomes is created, where each chromosome represents a distinct path between two given cities that satisfies the given conditions. In the second step, the cost of traversing each path is calculated, and the weight of the respective chromosome is determined by taking the reciprocal of the cost. In the third step, the chromosomes with the greater weights are selected for further processing. In the fourth step, crossover of the selected set of chromosomes is done to generate a new population.

Crossover is done by employing the PMX algorithm. Whenever necessary, the running generation is mutated to keep the result from converging to a local minimum. In the program prepared here, we have scheduled mutation to occur after the completion of every fourth generation. The second to fourth steps are executed in a loop until the hundredth generation. The final generation has a set of chromosomes with the highest weights, that is, the paths with the shortest distances. Finally, from this set, the chromosome with the highest weight is selected and provided to the user. Thus, by using this algorithm, a best possible path covering all the cities in the given graph can be determined between any two cities. The algorithm is not perfect, but it tries to give the best possible result in polynomial time.
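The weight computation in the second step of the procedure can be sketched as below. This is a sketch under assumptions: the names `tour_cost` and `tour_weight` are hypothetical, the distances are stored in a row-major n*n matrix, and the path is treated as a closed tour that returns to its first city (for an open path between two cities, the last leg would simply be left out).

```c
#include <assert.h>

/* Sums the distances along the tour, including the leg from the last
 * city back to the first. dist[i * n + j] is the distance from i to j. */
double tour_cost(const double *dist, int n, const int *tour)
{
    assert(n > 1);
    double cost = 0.0;
    for (int i = 0; i < n; i++) {
        int from = tour[i];
        int to = tour[(i + 1) % n]; /* wraps around to the first city */
        cost += dist[from * n + to];
    }
    return cost;
}

/* Weight (fitness) of a chromosome: the reciprocal of its tour cost,
 * so that shorter tours receive larger weights. */
double tour_weight(const double *dist, int n, const int *tour)
{
    double cost = tour_cost(dist, n, tour);
    assert(cost > 0.0);
    return 1.0 / cost;
}
```

Because the weight is the reciprocal of the cost, a tour of length 80 gets weight 1/80 and beats a tour of length 95 with weight 1/95, which is what lets the selection step favour shorter paths.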


2. Analysis
Analysis of Existing Methods

Requirement & Specification

Proposed Solution

Analysis

Computing a solution for Traveling Salesman Problem

The traditional lines of attack for the NP-hard problems are the following:

Devising algorithms for finding exact solutions (they will work reasonably fast only for relatively small problem sizes).


Devising "suboptimal" or heuristic algorithms, i.e., algorithms that deliver either seemingly or probably good solutions, but which could not be proved to be optimal.

Finding special cases for the problem ("sub-problems") for which either better or exact heuristics are possible.

Analysis of Existing Methods:

The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (brute-force search). The running time of this approach lies within a polynomial factor of O(n!), the factorial of the number of cities, so it becomes impractical even for only 20 cities. One of the earliest applications of dynamic programming is an algorithm that solves the problem in time O(n^2 2^n). The dynamic programming solution requires exponential space. Using inclusion-exclusion, the problem can be solved in time within a polynomial factor of 2^n and in polynomial space.

Other approaches include the following:

1. Various branch-and-bound algorithms, which can be used to process TSPs containing 40 to 60 cities.

2. Progressive improvement algorithms, which use techniques reminiscent of linear programming and work well for up to 200 cities.

3. Implementations of branch-and-bound with problem-specific cut generation; this is the method of choice for solving large instances. This approach holds the current record, solving an instance with 85,900 cities.

Heuristic and Approximation Algorithms:

Various heuristic and approximation algorithms that quickly yield good solutions have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time that are, with high probability, just 2-3% away from the optimal solution. Several categories of heuristics are recognized.

Constructive heuristics

The nearest neighbour (NN) algorithm (a greedy algorithm) lets the salesman choose the nearest unvisited city as his next move. This algorithm quickly yields a reasonably short route. For N cities randomly distributed on a plane, the algorithm on average yields length = 1.25 * exact_shortest_length, i.e. a route about 25% longer than the shortest possible one. However, there exist many specially arranged city distributions which make the NN algorithm give the worst route (Gutin, Yeo, and Zverovich, 2002). This is true for both asymmetric and symmetric TSP (Gutin and Yeo, 2007). Rosenkrantz et al. [1977] showed that the NN algorithm has the approximation factor Θ(log |V|) for instances satisfying the triangle inequality.

The bitonic tour of a set of points is the minimum-perimeter monotone polygon that has the points as its vertices; it can be computed efficiently by dynamic programming.

Another constructive heuristic, Match Twice and Stitch (MTS) (Kahng, Reda 2004 [18]), performs two sequential matchings, where the second matching is executed after deleting all the edges of the first matching, to yield a set of cycles. The cycles are then stitched to produce the final tour.

Randomized Improvement

Optimized Markov chain algorithms which use local searching heuristic sub-algorithms can find a route extremely close to the optimal route for 700 to 800 cities. Random path change algorithms are currently the state-of-the-art search algorithms and work for up to 100,000 cities. The concept is quite simple: choose a random path, choose four nearby points, and swap the way they are connected to create a new random path, while in parallel decreasing the upper bound of the path length.
If this is repeated until a certain number of trials of random path changes fail due to the upper bound, one has found a local minimum with high probability, and further it is a global minimum with high probability (where "high" means that the remaining probability decreases exponentially in the size of the problem; thus for 10,000 or more nodes, the chance of failure is negligible).

TSP is a touchstone for many general heuristics devised for combinatorial optimization, such as genetic algorithms, simulated annealing, Tabu search, ant colony optimization, and the cross-entropy method.

Ant Colony Optimization:

Artificial intelligence researcher Marco Dorigo described in 1997 a method of heuristically generating "good solutions" to the TSP using a simulation of an ant colony called ACS (Ant Colony System).[19] It uses some of the same ideas used by real ants to find short paths between food sources and their nest, an emergent behaviour resulting from each ant's preference to follow trail pheromones deposited by other ants. ACS sends out a large number of virtual ant agents to explore many possible routes on the map. Each ant probabilistically chooses the next city to visit based on a heuristic combining the distance to the city and the amount of virtual pheromone deposited on the edge to the city. The ants explore, depositing pheromone on each edge that they cross, until they have all completed a tour. At this point the ant which completed the shortest tour deposits virtual pheromone along its complete tour route (global trail updating). The amount of pheromone deposited is inversely proportional to the tour length; the shorter the tour, the more it deposits.
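The nearest neighbour heuristic described under the constructive heuristics can be sketched in a few lines of C. This is only an illustrative sketch, not part of the project code: the function name nearest_neighbour_tour, the limit NN_MAX_CITIES and the row-by-row matrix layout are assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>

#define NN_MAX_CITIES 64 /* hypothetical upper bound used only by this sketch */

/* Builds a tour with the nearest-neighbour rule and returns its closed length.
   dist is an n x n distance matrix stored row by row; tour receives the order
   in which the cities are visited, starting from city "start". */
int nearest_neighbour_tour(const int *dist, int n, int start, int *tour)
{
    int visited[NN_MAX_CITIES] = {0};
    int current = start, total = 0, step, city;

    assert(n > 0 && n <= NN_MAX_CITIES && start >= 0 && start < n);
    tour[0] = start;
    visited[start] = 1;
    for (step = 1; step < n; step++) {
        int best = -1, best_d = INT_MAX;
        /* pick the closest city that has not been visited yet */
        for (city = 0; city < n; city++) {
            int d = dist[current * n + city];
            if (!visited[city] && d < best_d) {
                best = city;
                best_d = d;
            }
        }
        visited[best] = 1;
        tour[step] = best;
        total += best_d;
        current = best;
    }
    return total + dist[current * n + start]; /* return to the start city */
}
```

For a 4-city matrix {0,2,9,10, 2,0,6,4, 9,6,0,3, 10,4,3,0} and start city 0, the sketch visits the cities in the order 0, 1, 3, 2 and returns a closed tour length of 18. Such a tour could also serve as one seed chromosome for a genetic algorithm, although the project itself starts from random chromosomes.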

PROPOSED SOLUTION

We have tried to solve the Traveling Salesman Problem using an alternative method, the genetic algorithm.

Code the problem

The problem has to be coded into a data structure which can be handled like a chromosome. For TSP, the chromosome is an ordered set of indexes of the cities through which the traveler goes. For other problems, it could be an integer or a real number; for difficult tasks, there is the idea of using neural networks for coding the problem.

Define fitness function

The fitness function evaluates each chromosome and assigns a numeric value to it, which represents the quality of the chromosome, i.e. of the solution which the chromosome represents. The fitness function is then used for evaluating the population and preferring the higher-quality individuals for mating and creating offspring. For TSP, the fitness of a chromosome is computed as the total distance of the represented solution. But even here, computing the cost of a solution is not so simple and could itself be a subject of research, as the total distance does not have to be the best fitness function. As a TSP algorithm sometimes tends to create quite long connections, a root mean square (RMS) style value could be used to compute the cost. This value is simply the sum of the squares of the distances between consecutive cities in the path encoded in the chromosome. In this way, we would prefer the more even solution with distances 2 3 2 2 (total distance 9, RMS cost 4 + 9 + 4 + 4 = 21) over the chromosome with distances 1 2 1 5 (total distance 9, but RMS cost 1 + 4 + 1 + 25 = 31). For more difficult applications, the fitness function could be defined in a complex, abstract and inexact way that only tries to compare the quality of chromosomes against each other; the fitness value by itself then does not carry any meaningful information.
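The two cost measures compared above can be written down as a small C sketch. The function names total_distance and squared_cost are hypothetical and do not appear in the project code; the arrays hold the lengths of the individual legs of a path.

```c
#include <assert.h>

/* Total path length: the plain sum of the leg lengths. */
long total_distance(const int *legs, int count)
{
    long sum = 0;
    int i;

    assert(legs != 0 && count >= 0);
    for (i = 0; i < count; i++)
        sum += legs[i];
    return sum;
}

/* Alternative cost: the sum of the squared leg lengths, which penalises
   paths that contain a few very long connections. */
long squared_cost(const int *legs, int count)
{
    long sum = 0;
    int i;

    assert(legs != 0 && count >= 0);
    for (i = 0; i < count; i++)
        sum += (long)legs[i] * legs[i];
    return sum;
}
```

For the legs 2 3 2 2 both measures give 9 and 21; for the legs 1 2 1 5 they give 9 and 31, so only the squared cost separates the two paths.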


3. Design
Early and Basic Design Flow Charts of the Project

Design

The purpose of the design phase is to plan a solution to the problem specified by the requirement document. This phase is the first step in moving from the problem domain to the solution domain. In other words, starting with what is needed, design takes us towards how to satisfy the needs. In this project, we are making a graphical user interface (GUI) in C language so that inputs can be taken from the user easily.

Early Design

Functioning of the Project

Functioning of the menu bar (MENU, EXIT):

(A) MENU: It performs the task of running the GA. It can be subdivided as follows:

(a) START: This starts the genetic process and asks the user to give the requirements of his problem for solving the TSP.

(b) RESULT: This shows the result of the most recent run, including the number of iterations used and the combination of genetic operators chosen by the user to solve the TSP.

(B) EXIT: This exits the project window.

Basic Design

[Diagram: The layout of the project (Project, Menu Bar, MENU, EXIT)]

[Diagram: The components of the menu bar]

Modules

The project consists of the following modules:

i) Enter the number of cities required.

ii) Selection techniques: 1- Tournament, 2- Roulette Wheel and 3- Linear Ranking.

iii) Crossover techniques: 4- Single Crossover, 5- Double Crossover and 6- PMX.

iv) Mutation techniques: 7- Single Mutation and 8- Double Mutation.

v) Last module, for solving the Traveling Salesman Problem by genetic algorithm using the above inputs.

Designing of the Flow Chart of the Project

To gain a general understanding of genetic algorithms, it is useful to examine their components. Before a GA can be run, it must have the following five components:

1. A chromosomal representation of solutions to the problem.
2. A function that evaluates the performance of solutions.
3. A population of initialized solutions.
4. Genetic operators that evolve the population.
5. Parameters that specify the probabilities with which these genetic operators are applied.

A chromosome stores, in some way, the solution which it represents. This is called the representation [encoding] of the solution. There are a number of possible ways to represent a solution so that it is suitable for a genetic algorithm [binary, real number, vector of real numbers, permutations, and so on], and the choice mostly depends on the nature of the problem.
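For the TSP the natural representation is a permutation of city indexes. The following is a minimal sketch of creating one random chromosome, assuming (as the listing in the Coding chapter appears to do) that city 0 is the fixed start and the chromosome holds the remaining cities 1..n-1. The names random_chromosome, is_valid_chromosome and CHROM_MAX_CITIES are hypothetical, and the shuffle used here is a Fisher-Yates shuffle rather than the retry loop of the project code.

```c
#include <assert.h>
#include <stdlib.h>

#define CHROM_MAX_CITIES 100 /* hypothetical limit used only by this sketch */

/* Fills genes[0..n-2] with a random permutation of the cities 1..n-1.
   rand() % k is slightly biased, which is acceptable for an illustration. */
void random_chromosome(int *genes, int n)
{
    int i;

    assert(n >= 2 && n <= CHROM_MAX_CITIES);
    for (i = 0; i < n - 1; i++)
        genes[i] = i + 1;
    for (i = n - 2; i > 0; i--) {
        int j = rand() % (i + 1); /* position in 0..i */
        int t = genes[i];
        genes[i] = genes[j];
        genes[j] = t;
    }
}

/* Returns 1 if genes[0..n-2] contains every city 1..n-1 exactly once. */
int is_valid_chromosome(const int *genes, int n)
{
    int seen[CHROM_MAX_CITIES] = {0};
    int i;

    assert(n >= 2 && n <= CHROM_MAX_CITIES);
    for (i = 0; i < n - 1; i++) {
        if (genes[i] < 1 || genes[i] >= n || seen[genes[i]])
            return 0;
        seen[genes[i]] = 1;
    }
    return 1;
}
```

Calling random_chromosome once per individual gives an initial population; is_valid_chromosome is a convenient check after crossover or mutation, since both can break the permutation property if no repair is applied.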

Genetic algorithms produce new chromosomes [solutions] by combining existing chromosomes. This operation is called crossover. The crossover operation takes parts of the solution encodings from two existing chromosomes [parents] and combines them into a single solution [new chromosome]. This operation depends on the chromosome representation and can be very complicated. Although general crossover operations are easy to implement, building a specialized crossover operation for a specific problem can greatly improve the performance of the genetic algorithm.
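PMX (partially mapped crossover) is an example of such a specialized crossover for permutation chromosomes. The sketch below follows the usual textbook description of PMX and is not the project's pmx() routine, whose body is not shown in this report. The name pmx_child, the limit PMX_MAX_GENES and the convention that genes are a permutation of 1..len are assumptions of the example.

```c
#include <assert.h>

#define PMX_MAX_GENES 128 /* hypothetical limit used only by this sketch */

/* Builds one PMX child: the segment p1[a..b] is copied unchanged, and the
   remaining positions are filled from p2 through the PMX mapping.
   Both parents must be permutations of 1..len. */
void pmx_child(const int *p1, const int *p2, int len, int a, int b, int *child)
{
    int pos_in_p2[PMX_MAX_GENES + 1];
    int in_segment[PMX_MAX_GENES + 1] = {0};
    int i;

    assert(len > 0 && len <= PMX_MAX_GENES && 0 <= a && a <= b && b < len);

    for (i = 0; i < len; i++) {
        pos_in_p2[p2[i]] = i;
        child[i] = 0; /* 0 marks an empty slot; valid genes are 1..len */
    }
    for (i = a; i <= b; i++) {
        child[i] = p1[i];
        in_segment[p1[i]] = 1;
    }
    /* place the genes of p2's segment that the copied segment displaced */
    for (i = a; i <= b; i++) {
        int v = p2[i];
        int pos = i;
        if (in_segment[v])
            continue;
        while (pos >= a && pos <= b)
            pos = pos_in_p2[p1[pos]]; /* follow the mapping out of the segment */
        child[pos] = v;
    }
    /* every slot that is still empty takes the gene of p2 directly */
    for (i = 0; i < len; i++)
        if (child[i] == 0)
            child[i] = p2[i];
}
```

With p1 = 1 2 3 4 5 6 7 8 9, p2 = 9 3 7 8 2 6 5 1 4 and the segment a = 3, b = 6, the child is 9 3 2 4 5 6 7 1 8. Unlike single point crossover, the result is always a valid permutation, so no separate repair step is needed.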


After the genetic algorithm performs the crossover operation, and before it finishes the production of a new chromosome, it performs the mutation operation. The mutation operation makes random but small changes to the encoded solution. This prevents all solutions from falling into a local optimum and extends the search space of the algorithm. Mutation, like crossover, depends on the chosen representation.


Diagram - Mutation operation examples [swap mutation is performed on the first chromosome and invert mutation on the second]
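The swap and invert mutations named in the diagram caption can be sketched as follows. The function names are hypothetical, and the positions are passed in explicitly so that the behaviour is easy to follow; in a genetic algorithm the caller would draw them at random. Both operations keep the chromosome a valid permutation.

```c
#include <assert.h>

/* Swap mutation: exchanges the genes at positions i and j. */
void swap_mutation(int *genes, int len, int i, int j)
{
    int t;

    assert(0 <= i && i < len && 0 <= j && j < len);
    t = genes[i];
    genes[i] = genes[j];
    genes[j] = t;
}

/* Invert mutation: reverses the order of the genes in positions i..j. */
void invert_mutation(int *genes, int len, int i, int j)
{
    assert(0 <= i && i <= j && j < len);
    while (i < j) {
        int t = genes[i];
        genes[i] = genes[j];
        genes[j] = t;
        i++;
        j--;
    }
}
```

Applied to the chromosome 1 2 3 4 5, swap_mutation with positions 1 and 3 gives 1 4 3 2 5, and invert_mutation with positions 0 and 3 gives 4 3 2 1 5.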

The last operations defined by genetic algorithms to manipulate the chromosomes are the fitness operation and the fitness comparator. The fitness operation measures the quality of a produced solution [chromosome]. This operation is specific to the problem, and it actually tells the genetic algorithm what to optimize. Fitness comparators [as their name suggests] are used to compare chromosomes based on their fitness. Basically, the fitness comparator tells the genetic algorithm whether it should minimize or maximize the fitness values of the chromosomes. The selected chromosomes [parents] are paired for mating. The mating is done by performing the crossover operation over the paired parents and applying the mutation operation to the newly produced chromosome. This kind of pairing gives better control over the production of new chromosomes, but it can be skipped and new chromosomes can be produced as the selection operation selects parents from the population.
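As an illustration of how selection and the fitness comparator work together, here is a minimal sketch of tournament selection for a minimization problem such as the TSP, where a lower tour cost means a fitter chromosome. It is not the project's tournament() routine; the names tournament_winner, tournament_select and TOURNAMENT_MAX are assumptions of the example.

```c
#include <assert.h>
#include <stdlib.h>

#define TOURNAMENT_MAX 16 /* hypothetical limit on the tournament size */

/* Returns the index of the candidate with the lowest cost.
   The comparison "<" plays the role of the fitness comparator:
   for the TSP a shorter tour is the better chromosome. */
int tournament_winner(const float *cost, const int *candidates, int k)
{
    int best, i;

    assert(k > 0);
    best = candidates[0];
    for (i = 1; i < k; i++)
        if (cost[candidates[i]] < cost[best])
            best = candidates[i];
    return best;
}

/* Draws k random individuals (with replacement) and returns the winner. */
int tournament_select(const float *cost, int pop_size, int k)
{
    int candidates[TOURNAMENT_MAX];
    int i;

    assert(pop_size > 0 && k > 0 && k <= TOURNAMENT_MAX);
    for (i = 0; i < k; i++)
        candidates[i] = rand() % pop_size;
    return tournament_winner(cost, candidates, k);
}
```

Calling tournament_select twice yields a pair of parents for mating. A larger tournament size k increases the selection pressure, because weak chromosomes then win a tournament less often.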


Diagram- Selection Operation Flow chart


Diagram - Flowchart of a genetic algorithm


4. CODING

Coding

Once the design is complete, most of the major decisions about the system have been made. The goal of the coding phase is to translate the design of the system into code in a given programming language.

Code of the project:

#include<graphics.h>
#include<dos.h>
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<stdlib.h>
#include<math.h>
#include"c:\pro\loding.c"

#define maxs 100   /* maximum number of cities */
#define pop 36     /* number of chromosomes in the population */

int pos[pop];
int n,chrom[pop][maxs-1],cross_m[pop],cp[pop];
static float best_fit;
static int best_chrom_temp[maxs-1],best_cost;
int rnd[maxs],cf[maxs],acc[maxs];
float fitf[pop],avg,hi,lo,exc[pop],tot;
char s[100],sr[100],st[100];
int set=0;

void newsele();
void newmut();
void newcros();
void cities();
void final();
void result();
void init_rnd(int * ,int);
void myprint(char , int);
float high(float *,int *);
float low (float *);
void swap(int *, int*);
void myran(int *, int , int );
void chrom_print(int * , int);
void tournament(int n);
void roulette(int n);
void rank(int n);
void cal_cost(int);
void exc_to_acc();
void print_start();
void print_cf_ff(int);
void acc_to_pop();
void s_crossover(int);
void rep_cross(int);
void d_crossover(int);
void d_rep_cross(int);
void mutate(int);
void d_mutate(int);
void pmx(int);

void main()
{
    union REGS i,o;
    int gd,gm,x,y;
    gd=DETECT;
    initgraph(&gd,&gm,"c:\\tc\\bgi");
    loding();
    clrscr();
    i.x.ax=0x01;                 /* show the mouse pointer */
    int86(0x33,&i,&o);
    setcolor(YELLOW);
    outtextxy(10,10,"MENU");
    outtextxy(270,10,"EXIT");
    while(!kbhit())
    {
        i.x.ax=0x03;             /* read mouse position and buttons */
        int86(0x33,&i,&o);
        if(o.x.bx==1)
        {
            /* click on MENU */
            if(o.x.cx>=10&&o.x.cx<=56&&o.x.dx>=10&&o.x.dx<=17)
            {
                setcolor(YELLOW);
                outtextxy(3,30,"START");
                outtextxy(3,50,"RESULT");
                while(!kbhit())
                {
                    i.x.ax=0x03;
                    int86(0x33,&i,&o);
                    if(o.x.bx==1)
                    {
                        /* click on START */
                        if(o.x.cx>=10&&o.x.cx<=87&&o.x.dx>=30&&o.x.dx<=37)
                        {
                            cities();
                            setcolor(RED);
                            setfillstyle(1,BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            floodfill(20,150,RED);
                            setcolor(BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            newsele();
                            setcolor(RED);
                            setfillstyle(1,BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            floodfill(20,150,RED);
                            setcolor(BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            newcros();
                            setcolor(RED);
                            setfillstyle(1,BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            floodfill(20,150,RED);
                            setcolor(BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            newmut();
                            setcolor(RED);
                            setfillstyle(1,BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            floodfill(20,150,RED);
                            setcolor(BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            settextstyle(1,0,0);
                            final();
                            main();
                        }
                        /* click on RESULT */
                        else if(o.x.cx>=10&&o.x.cx<=80&&o.x.dx>=50&&o.x.dx<=57)
                        {
                            result();
                            main();
                            setfillstyle(1,BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                            floodfill(20,150,RED);
                            setcolor(BLACK);
                            rectangle(0,20,getmaxx(),getmaxy());
                        }
                    }
                }
            }
            else if(o.x.cx>=150&&o.x.cx<=200&&o.x.dx>=10&&o.x.dx<=17)
            {
                setcolor(BLACK);
                outtextxy(3,30,"START");
                setcolor(YELLOW);
                outtextxy(120,30,"10");
                outtextxy(120,50,"20");
                outtextxy(120,70,"42");
                while(!kbhit())
                {
                    i.x.ax=0x03;
                    int86(0x33,&i,&o);
                    if(o.x.bx==1)
                    {
                        if(o.x.cx>=150&&o.x.cx<=200&&o.x.dx>=30&&o.x.dx<=37)
                        {
                            break;
                        }
                    }
                }
            }   /* closing brace missing in the printed listing */
            /* click on EXIT */
            else if(o.x.cx>=270&&o.x.cx<=300&&o.x.dx>=10&&o.x.dx<=17)
            {
                setcolor(YELLOW);
                exit(0);
            }
        }
    }
    getch();
    closegraph();
}

void newsele()
{
    int gd,gm,x,y,array[]={230,230,230,260,286,260,286,230};
    x=getmaxx();
    y=getmaxy();
    setcolor(WHITE);
    settextstyle(1,0,1);
    outtextxy(65,100,"1:Tournament 2:Roulette Wheel 3:Ranking");
    outtextxy(100,150,"Selection:");
    setcolor(WHITE);
    rectangle(240,150,390,180);
    outtextxy(x/340+200,y/400+100,"");
    setcolor(WHITE);
    rectangle(60,70,450,300);
    outtextxy(x/340+200,y/400+100,"");
    fillpoly(4,array);
    setcolor(BLUE);
    outtextxy(250,235,"OK");
    moveto(130,160);
    setcolor(WHITE);
    fflush(stdin);
    gotoxy(34,11);
    sleep(3);
    gets(s);                     /* selection choice typed by the user */
    return;                      /* the listing had return(0) in a void function */
}

void final()
{
    int i,j,temp;
    clrscr();
    for(i=0;i<pop;i++)
        acc[i]=cf[i]=rnd[i]=0;
    /* build the initial population of random tours over cities 1..n-1 */
    for(i=0;i<pop;i++)
    {
        for(j=0;j<n-1 ; )
        {
            temp= random(n);
            if(temp && rnd[temp]!=1)
            {
                rnd[temp]=1;
                chrom[i][j++]=temp;
            }
        }
        init_rnd(rnd,n);
    }
    clrscr();
    printf("\n\t\t\t--- POPULATION SET #1\n");
    myprint('.',80);
    printf("\n%15s%40s%9s %8s\n","CHROMOSOMES","CF","EC","AC");
    myprint('.',80);
    printf("\n");
    /* main GA loop: evaluate, select, cross over, repair, mutate */
    for(set=0;set<460;set++)
    {
        cal_cost(n);
        exc_to_acc();
        print_start();
        print_cf_ff(n);
        acc_to_pop();
        s_crossover(n);
        rep_cross(n);
        mutate(n);
    }
    printf("\n\t\t\tNo. of iteration used are:%d\n",set);
    myprint('.',80);
    myprint('.',80);
    printf("\n");
    printf("\n%15s%28s %19s\n","SELECTION","CROSSOVER","MUTATION");
    myprint('*',80);
    printf("\n%15s%28s%19s\n",s,st,sr);
    getch();
}

void s_crossover(int n)
{
    int temp,temp_chrom[(maxs-1)/2+1][maxs-1];
    int k,l,m,j,i;
    printf("\n\t\t ---GENETIC OPERATION (CROSSOVER)--- \n");
    myprint('.',80);
    printf("\n CHROMOSOMES \t\t\tCM\t\tCP\n");
    myprint('.',80);
    printf("\n");
    /* keep a copy of the parents for printing and mark all as unmated */
    for(i=0;i<pop;i++ )
    {
        cross_m[i]=-1;
        for(j=0;j<n-1;j++)
            temp_chrom[i][j]=chrom[i][j];
    }
    /* pair each unmated chromosome with a random unmated partner */
    for(i=0;i<pop; )
    {
        if(cross_m[i]<0)
        {
            temp = rand() % pop;
            if(i!=temp && cross_m[temp]<0 )
            {
                cross_m[i]=temp;
                cross_m[temp]=i;
                cp[i]=cp[temp]=random(n-2);
                /* exchange the tails after the crossover point */
                for(k=cp[i]+1;k<n-1;k++)
                    swap(&chrom[i][k],&chrom[temp][k]);
                i++;
            }
        }
        else
            i++;
    }
    for(i=0;i<pop;i++)
    {
        printf("%2d|",i+1);
        for(j=0;j<n-1;j++)
            printf(" %d",temp_chrom[i][j]);
        printf("\t\t\t\t %d",cross_m[i]+1);
        printf("\t\t %d",cp[i]+1);
        printf("\n");
    }
    myprint('.',80);
    printf("\n");
    getch();
    printf("\n\t\t\t ---AFTER CROSSOVER--- \n");
    myprint('.',80);
}

void mutate(int n)
{
    float pm=0.010,temp;         /* pm: mutation probability per gene */
    int i,j,temp_chrom[pop][maxs-1];
    int mal_gene[pop][maxs-1];
    for(i=0;i<pop;i++)
        for(j=0;j<n-1;j++)
        {
            temp_chrom[i][j]=chrom[i][j];
            mal_gene[i][j]=0;
        }
    for(i=0;i<pop;i++)
        for(j=0;j<n-1;j++)
        {
            temp=(float)random(1000)*(0.001f);
            if( temp < pm)
            {
                int t1,t2,k;
                myran(&t1,1,n);
                if(t1!=chrom[i][j])
                {
                    mal_gene[i][j]=1;
                    t2=chrom[i][j];
                    temp_chrom[i][j]=chrom[i][j]=t1;
                    /* repair: the other occurrence of t1 receives the old gene */
                    for(k=j+1;k<n-1;k++)
                        if(chrom[i][k]==t1)
                        {
                            chrom[i][k]=t2;
                            break;
                        }
                    for(k=0;k<j;k++)
                        if(chrom[i][k]==t1)
                        {
                            chrom[i][k]=t2;
                            break;
                        }
                }
            }
        }
    getch();
    printf("\n\t\t ---GENETIC OPERATION (MUTATION)--- \n");
    myprint('.',80);
    printf("\n\t\t\t ---AFTER MUTATION--- \n");
    myprint('.',80);
    for(i=0;i<pop;i++)
    {
        for(j=0;j<n-1;j++)
        {
            if(mal_gene[i][j])
                printf("[%d] ",temp_chrom[i][j]);
            else
                printf("%d ",temp_chrom[i][j]);
        }
        printf("\n");
    }
    getch();
    printf("\n\t\t\t ---GENETIC REPAIR APPLIED FOR MUTATION--- \n");
    myprint('.',80);
    for(i=0;i<pop;i++)
    {
        for(j=0;j<n-1;j++)
            printf("%d ",chrom[i][j]);
        printf("\n");
    }
}


5. TESTING

Testing

Testing is the process of exercising or evaluating a system or its components by manual or automated means to verify that it satisfies the specified requirements. Software testing is the process of executing a program or system with the intent of finding errors. The following kinds of testing are distinguished:

1.) INTEGRATION AND SYSTEM TESTING: Software testing is the process used to help identify the correctness, security, and quality of developed computer software. Testing is a process of technical investigation, performed on behalf of stakeholders, that is intended to reveal quality-related information about the product with respect to the context in which it is intended to operate. This includes, but is not limited to, the process of executing a program or application with the intent of finding errors. Integration testing is performed to detect design errors by focusing on the interconnection between modules. After the modules are designed, the main issue is putting them together (interfacing). It should be thoroughly checked that no data is lost at the time of integration, that one module does not have an inadvertent, adverse effect on another, and that sub-functions, when combined, produce the desired major function. All these problems are taken care of during integration testing. During the integration testing phase we have put the administrator-side modules on one side and the user-side modules on the other.

2.) WHITE BOX TESTING (Structural Testing): It uses an internal perspective of the system to design test cases based on the internal structure. It requires programming skills to identify all paths through the software. The tester chooses test case inputs to exercise all paths and determines the appropriate outputs. While white box testing is applicable at the unit, integration and system levels of the software testing process, it is typically applied to the unit. So while it normally tests paths within a unit, it can also test paths between units during integration, and between subsystems during a system-level test.

3.) BLACK BOX TESTING (Functional Testing): It takes an external perspective of the test object to derive test cases. Black box testing uses external descriptions of the software, including specifications, requirements, and designs, to derive test cases. These tests can be functional or non-functional, though usually functional. The test designer selects valid and invalid inputs and determines the correct output. There is no knowledge of the test object's internal structure. This method of test design is applicable to all levels of software testing: unit, integration, functional testing, system and acceptance. The higher the level, and hence the bigger and more complex the box, the more one is forced to use black box testing to simplify. While this method can uncover unimplemented parts of the specification, one cannot be sure that all existing paths are tested.

4.) ACCEPTANCE TESTING: It allows the end user or customer to decide whether or not to accept the product. ALPHA testing is simulated or actual operational testing by potential users/customers or an independent test team at the developer's site. BETA testing comes after alpha testing. Versions of the software, known as beta versions, are released to a limited audience outside of the company.

We have performed white box testing. We have used an internal perspective of the system to design test cases based on the internal structure. We have chosen test case inputs to exercise all paths and determined the appropriate outputs.

Test cases:

10 cities
{{0,0,0,0,0,0,0,0,0,0,0},
{0,0,60,62,65,65,57,67,62,76,50},
{0,60,0,75,52,72,54,60,62,55,79},
{0,62,75,0,76,56,62,73,61,62,58},
{0,65,52,76,0,51,61,79,63,51,50},
{0,65,72,56,51,0,79,78,52,52,72},
{0,57,54,62,61,79,0,60,53,57,53},
{0,67,60,73,79,78,60,0,74,76,70},
{0,62,62,61,63,52,53,74,0,72,53},
{0,76,55,62,51,52,57,76,72,0,55},
{0,50,79,58,50,72,53,70,53,55,0}}

Best Answers:
549 549 548 546 549 549

20 cities
{{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
{0,0,36,69,46,77,41,35,36,62,65,84,52,31,87,30,56,52,57,87,38,},
{0,36,0,56,86,73,29,42,56,88,28,38,29,56,61,37,57,74,53,50,84,},
{0,69,56,0,49,46,46,39,74,65,52,60,49,75,69,25,56,70,82,85,61,},
{0,46,86,49,0,22,66,89,86,43,85,71,36,61,69,69,78,57,50,41,20,},
{0,77,73,46,22,0,67,73,24,64,60,86,44,23,58,28,69,83,34,48,46,},
{0,41,29,46,66,67,0,46,65,72,87,42,76,58,65,84,62,48,56,39,51,},
{0,35,42,39,89,73,46,0,31,55,22,58,33,56,34,54,62,79,23,86,73,},
{0,36,56,74,86,24,65,31,0,82,70,81,88,82,39,31,45,33,53,80,46,},
{0,62,88,65,43,64,72,55,82,0,35,79,51,38,69,73,73,35,23,85,31,},
{0,65,28,52,85,60,87,22,70,35,0,24,31,29,88,21,30,47,80,88,31,},
{0,84,38,60,71,86,42,58,81,79,24,0,43,40,46,29,31,48,64,87,48,},
{0,52,29,49,36,44,76,33,88,51,31,43,0,48,38,72,59,69,25,57,23,},
{0,31,56,75,61,23,58,56,82,38,29,40,48,0,54,38,66,71,79,72,88,},
{0,87,61,69,69,58,65,34,39,69,88,46,38,54,0,48,51,63,75,31,36,},
{0,30,37,25,69,28,84,54,31,73,21,29,72,38,48,0,78,38,26,58,35,},
{0,56,57,56,78,69,62,62,45,73,30,31,59,66,51,78,0,26,45,73,59,},
{0,52,74,70,57,83,48,79,33,35,47,48,69,71,63,38,26,0,66,75,36,},
{0,57,53,82,50,34,56,23,53,23,80,64,25,79,75,26,45,66,0,41,64,},
{0,87,50,85,41,48,39,86,80,85,88,87,57,72,31,58,73,75,41,0,81,},
{0,38,84,61,20,46,51,73,46,31,31,48,23,88,36,35,59,36,64,81,0}}

Best Answer
572 572 573 572 572 572

575 575 574 572

Maintenance
The project has been coded with simple programming constructs. The logical operators used make it easy to maintain. The modularization of the project makes it easy to test. The project size has been kept quite compact, which makes errors easy to detect. The project has been made such that the maintenance cost will be quite low. The project can be implemented successfully to do the needful.

6. SNAPSHOTS

First Population set for 20 Cities

After Crossover Applied on the Population set

After Genetic Repair Applied

After Genetic Mutation Applied

Example of Output

Output of 42 cities

CONCLUSION


Conclusion

To conclude, we have attained the basic purpose of our project, which was to solve the Traveling Salesman Problem using a genetic algorithm. We have shown the best results obtained by applying different combinations of selection, crossover and mutation. Our project gives its best result for the 20-city and 42-city problems after 500 and 10,000 iterations respectively. It is simple, easy to use and user friendly. A GUI in C language has been developed which helps the user to select the operators of his choice. The end results are shown in a representation consisting of the fitness value, average value and expected value. We have tried to minimize the tour length in the Traveling Salesman Problem. The basic functionalities have been implemented well, but as nothing is perfect and everything needs improvement with time, so is the case with our project. Although it fulfills all the basic requirements, there is still a lot of scope for improvement.


References
[1] D.E. Goldberg and R. Lingle, Alleles, Loci, and the Traveling Salesman Problem, in: J.J. Grefenstette (ed), Proceedings of the First International Conference on Genetic Algorithms and Their Applications, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985, pp. 154-159.

[2] http://cs.felk.cvut.cz/~xobitko/ga/

[3] Abdollah Homaifar, Shengchuan Guan, and Gunar E. Liepins, Schema analysis of the traveling salesman problem using genetic algorithms. Complex Systems, 6(2): 183-217, 1992.
