
Solving Large Knapsack Problems with a Genetic Algorithm

Dr. Richard Spillman Department of Computer Science Pacific Lutheran University Tacoma, WA 98447
SPILLMRJ@PLU.edu
ABSTRACT
This paper develops a new approach to finding solutions to the subset sum problem. The subset sum problem is an important NP-complete problem in computer science which has applications in operations research, cryptography, and bin packing. A genetic algorithm is developed which easily solves this problem. The genetic algorithm begins with a randomly generated population of solutions and breeds a new population using the best elements of the previous population. Each generation of solutions produces better solutions to the subset-sum problem than the previous generation. It is shown that this approach will efficiently produce solutions to large (10,000 elements or more) subset sum problems. Various parameters of the algorithm are varied in order to improve its performance.

1.0 Introduction

Given n positive integers w1, . . . , wn and a positive integer W, the subset-sum problem (SSP) asks for the subset of the wi's whose sum is W, or as close as possible to W without exceeding W. Formally, the SSP seeks to

maximize    sum_{i=1..n} w_i x_i

subject to  sum_{i=1..n} w_i x_i <= W

where each x_i is 0 or 1. This problem is known to be NP-complete [1] and hence, in its most general form, it is a difficult problem to solve. Yet, it is also a problem which offers many practical applications in computer science, operations research, and management science. It also serves as the basis for several public key cryptosystems. Because of the importance of the potential applications and the general difficulty of solving large SSPs, several algorithms have been developed which solve the problem directly and several
0-7803-2559-1/95 $4.00 © 1995 IEEE

fast algorithms have been produced which provide good approximate solutions. For example, Ibarra and Kim [2] developed a fully polynomial approximation scheme for the SSP in 1975. It was improved upon by Lawler [3] and later by Martello and Toth [4]. Fischetti [5] confirmed the performance of Martello and Toth's scheme, although he found that the worst case performance was not quite as good as suggested in [4]. Martello and Toth reported very good results for several approximation schemes in their survey and experimental analysis [6]. In 1985, Lagarias and Odlyzko [7] reported the development of an algorithm for the solution of low density knapsacks. Their algorithm could solve low density knapsacks of size 50 using about 14 minutes of CRAY-1 time. Coster et al. [8] reported an improvement of the Lagarias and Odlyzko algorithm which both speeded it up and found more solutions. However, their experimental analysis only considered knapsacks up to size 66. Both of these sizes are, of course, considerably less than the 10,000 element knapsack generated by DES. Balas and Zemel [9], on the other hand, produced an algorithm for large knapsacks which performs quite well. However, as they noted in their report, their results apply only to knapsacks with bounded coefficients. When the coefficients are allowed to grow with the size of the problem (as seems to be the case with the knapsack embedding of S-boxes), they concluded that the solution time grows exponentially with knapsack size. Martello and Toth [10] have also suggested a solution to the knapsack problem which works well on large knapsacks. They also noted a limitation to their approach: their best results for large knapsacks occurred when a large number of optimal solutions to the knapsack exist, so that their branch and bound procedure will terminate early. The Martello and Toth algorithm which performed the best in their survey is based on the use of a greedy algorithm for solution of the SSP.
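To make the problem statement concrete, a small SSP instance can be solved exactly by exhaustive search. The sketch below is illustrative only (the weights and target are made up); exhaustive search is feasible only for small n, which is why the approximation algorithms above, and the genetic algorithm of this paper, are needed for large instances.

```python
from itertools import product

def best_subset(weights, target):
    """Return the 0/1 vector maximizing the subset sum without exceeding target."""
    best_x, best_sum = None, -1
    for x in product([0, 1], repeat=len(weights)):
        s = sum(w * xi for w, xi in zip(weights, x))
        if best_sum < s <= target:     # better feasible solution found
            best_x, best_sum = x, s
    return best_x, best_sum

# Hypothetical 5-element instance with target 27 (19 + 8 = 27 is exact):
x, s = best_subset([12, 7, 19, 3, 8], 27)
print(x, s)  # -> (0, 0, 1, 0, 1) 27
```

The loop examines all 2^n bit vectors, so its cost doubles with each added element.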
This paper presents an entirely different approach to the SSP, one based on the use of a genetic algorithm which provides a

directed random search of the SSP solution space. It turns out that the algorithm is quite simple to implement and the run times are short for even large problems.

2.0 Genetic Algorithms

Genetic algorithms (GAs) were first suggested by John Holland in the early seventies [11]. Over the last 20 years they have been used to solve a wide range of search, optimization, and machine learning problems. As the name indicates, these algorithms attempt to solve problems by modeling the way in which human genetic processes seem to operate (at least at a simple level). A good survey of the nature and use of genetic algorithms can be found in the book by Goldberg [12]. Holland's idea was to construct a search algorithm modeled on the concepts of natural selection in the biological sciences. The result is a directed random search procedure. The process begins by constructing a random population of possible solutions. This population is used to create a new generation of possible solutions, which is then used to create another generation of solutions, and so on. The best elements of the current generation are used to create the next generation. It is hoped that the new generation will contain "better" solutions than the previous generation. Remarkably, this turns out to be the case in many applications.

2.1 A Genetic Algorithm for the SSP

Many applications of genetic algorithms have been suggested, including the development of genetic algorithms to attack NP-complete problems. Yet, most of that effort has been directed to the Traveling Salesman Problem (TSP) [13,14,15]. The TSP has posed several problems for genetic algorithms, most of which fall in the area of representation. On the other hand, little consideration has been given to the class of subset sum problems, for which the representation issue is easy to solve. Recently, Falkenauer and Delchambre [16] did report some success using a genetic algorithm to solve the bin packing problem, which is related to the subset sum problem.
Spillman has also suggested using genetic algorithms to solve the knapsack cipher [17]. When constructing a genetic algorithm for a specific problem area, there are three systems which need to be defined. The first is the representation scheme. The second is a mating process consistent with the representation. The third is a mutation process. All three systems for the Subset Sum Problem are defined in this section.

2.2 Key Representation

The representation structure for the SSP is easy to generate because the problem naturally suggests a scheme. The binary bit pattern which represents the summation terms is perhaps the best structure. Hence, a typical population for an 8 element SSP would be:

1 0 1 0 1 0 1 0 - add terms 1, 3, 5, and 7
0 1 1 0 1 1 0 0 - add terms 2, 3, 5, and 6
0 0 0 0 1 0 0 1 - add terms 5 and 8
etc.
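Decoding this bit-pattern representation is straightforward; the following sketch (in Python rather than the paper's Pascal, with made-up weights) shows how each chromosome selects terms of the knapsack:

```python
# Hypothetical 8-element knapsack; the weight values are invented for illustration.
weights = [12, 7, 19, 3, 8, 25, 4, 15]

def decode(chromosome, weights):
    """Sum the weights whose corresponding bit in the chromosome is 1."""
    return sum(w for bit, w in zip(chromosome, weights) if bit == 1)

# The three example chromosomes from the text:
print(decode([1, 0, 1, 0, 1, 0, 1, 0], weights))  # terms 1, 3, 5, 7 -> 43
print(decode([0, 1, 1, 0, 1, 1, 0, 0], weights))  # terms 2, 3, 5, 6 -> 59
print(decode([0, 0, 0, 0, 1, 0, 0, 1], weights))  # terms 5 and 8   -> 23
```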

Once a representation scheme is selected for a genetic algorithm, it is also necessary to supply an evaluation function. This function is used to determine the "best" representations. Again, for the SSP the basic structure of the evaluation function is easy to determine. It should measure how close a given sum of terms is to the target sum. Within this general guideline there is a wide range of possible variations. For this paper, the actual evaluation function should have three other properties. First, the function should range between 0 and 1, with 1 indicating an exact match with the target sum for the knapsack. Requiring the evaluation function to fall within a set range gives a better picture of global performance. Second, chromosomes which produce a sum greater than the target should, in general, have a lower fitness than chromosomes which produce an equivalent sum less than the target. In this way, infeasible solutions (solutions which produce a sum greater than the sought-after target) are penalized, while feasible solutions have a greater chance of being followed by the algorithm. Third, it should be difficult to produce a high fitness value: small differences between the current chromosome and the target sum should be amplified. This is accomplished by the use of the square root in the evaluation function chosen for this research. It should be noted that none of these three conditions are required by the nature of the genetic algorithm. In fact, research in genetic algorithm design has shown that any reasonable choice of an evaluation function will work [18]. The actual evaluation function used in this research effort is determined as follows:
(1) Calculate the maximum difference that could occur between a chromosome and the target sum: MaxDiff = max(Target, Full-Sum - Target), where Full-Sum is the sum of all the components in the knapsack.
(2) Determine the value of the current chromosome; call it Sum.
(3) If Sum <= Target, the fitness of the chromosome is given by: Fit = 1 - sqrt(|Sum - Target| / Target)
(4) If Sum > Target, the fitness of the chromosome is given by: Fit = 1 - 6th-root(|Sum - Target| / MaxDiff)

2.3 The Mating Process

Given a population of chromosomes, each one with a fitness value, the algorithm progresses by randomly selecting two for mating. The selection is weighted in favor of chromosomes with a high fitness value. That is, such chromosomes have a greater chance of being selected to generate children for the next generation. The two parents generate two children using the standard crossover operation.

2.4 The Mutation Process

After the new generation has been determined, the chromosomes are subjected to a low-rate mutation


function which involves three different processes in an attempt to vary the genes. Half of the time, bits are randomly mutated. The other half of the time, bits have a low probability of being swapped with their neighbor. The final mutation process is one of inverting the order of a set of bits between two random points. In this inversion process, which also has a small probability of occurring, two random points in the chromosome are selected and the order of the bits between those points is reversed. For example, using ( ) to note the two random points, chromosome A = 0 1 1 ( 0 1 1 0 1 1 ) 0 0 1 becomes 0 1 1 ( 1 1 0 1 1 0 ) 0 0 1. These are all low probability mutation processes, but between them they help to prevent the algorithm from becoming stuck in a local optimum point.

2.5 The Complete Algorithm

These processes are combined to create the complete genetic algorithm:
1. A random population of chromosomes (binary strings of 0's and 1's) is generated.
2. A fitness value for each chromosome in the population is determined.
3. A biased (based on fitness) random selection of two parents is conducted.
4. The crossover operation is applied to the selected parents.
5. The mutation process is applied to the children.
6. The new population is scanned and used to update the "best" chromosome across the generations.
This process will stop after a fixed number of generations or when the best chromosome has a fitness which exceeds the approximation level.

3.0 Experimental Results

A SUN Pascal program was written for a SPARCstation 1+ which implemented the algorithm described in Section 2.0. The fundamental problem inputs to the program included n, the size of the problem, the set of n weights, and the target sum. The inputs to the program which describe the genetic algorithm included the population size, the maximum number of generations, the probability of mutation, the probability of inversion, the probability of a swap operation, and the approximation level.
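The evaluation function and generation loop described above can be sketched in Python as follows. This is an illustrative reconstruction, not the author's Pascal program: the parameter values follow the text, but the one-point crossover cut, the roulette-wheel selection, and all function names are assumptions.

```python
import random

def fitness(chrom, weights, target, max_diff):
    """Evaluation function from steps (1)-(4) of Section 2.2."""
    s = sum(w for bit, w in zip(chrom, weights) if bit)
    if s <= target:                          # feasible: square-root penalty
        return 1 - (abs(s - target) / target) ** 0.5
    return 1 - (abs(s - target) / max_diff) ** (1 / 6)  # infeasible: 6th root

def select(pop, fits):
    """Fitness-biased random choice of one parent."""
    return random.choices(pop, weights=fits, k=1)[0]

def crossover(p1, p2):
    """Standard one-point crossover producing two children."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, p_mut=0.0001, p_swap=0.001, p_inv=0.001):
    chrom = list(chrom)
    if random.random() < 0.5:                # half the time: random bit mutation
        chrom = [b ^ 1 if random.random() < p_mut else b for b in chrom]
    else:                                    # other half: neighbor swaps
        for i in range(len(chrom) - 1):
            if random.random() < p_swap:
                chrom[i], chrom[i + 1] = chrom[i + 1], chrom[i]
    if random.random() < p_inv:              # inversion between two random points
        i, j = sorted(random.sample(range(len(chrom)), 2))
        chrom[i:j] = reversed(chrom[i:j])
    return chrom

def solve(weights, target, pop_size=40, generations=250, approx=1.0):
    max_diff = max(target, sum(weights) - target)
    pop = [[random.randint(0, 1) for _ in weights] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [fitness(c, weights, target, max_diff) for c in pop]
        for c, f in zip(pop, fits):          # track the best chromosome seen
            if f > best_fit:
                best, best_fit = c[:], f
        if best_fit >= approx:               # fitness exceeds approximation level
            break
        nxt = []
        while len(nxt) < pop_size:
            c1, c2 = crossover(select(pop, fits), select(pop, fits))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt
    return best, best_fit
```

For example, `solve([12, 7, 19, 3, 8, 25, 4, 15], 42)` searches for a subset summing to 42; a returned fitness of 1.0 indicates an exact match.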
A typical run was set for a population of 40; a maximum of 250 generations; a probability of mutation of 0.0001; a probability of inversion of 0.001; and a probability of a swap of 0.001. In general, the genetic algorithm was able to solve large SSPs to a high degree of accuracy in a short time. This experimental analysis will present the results in several ways. First, the general performance of the algorithm across a range of large SSPs will be presented. Second, the effect of the initial population size on the performance of the algorithm will be considered.

3.1 General Performance

Overall, the algorithm efficiently solved large SSPs in a short time. On more than 1000 runs of the algorithm on problems of 100 to 17,000 variables, it rarely failed to find an acceptable solution. Even in

those cases that failed, it found a very good approximate solution but ran into the maximum number of generations (which was set to a low 250 for that run). Clearly, given more generations, every case would have been solved. It was found that the genetic algorithm could easily and quickly find a solution that was close to the optimal. However, it was slow to move from a close solution to the optimal. As a result, a local search routine was added to the basic algorithm. Whenever a population element was within .95 of the solution, the system would examine the element one bit at a time to determine if complementing that bit would improve the solution. This local search method did not add much overhead to the algorithm, yet it greatly improved the time to solution. In fact, the algorithm with the local search routine routinely solved SSPs in the range of 10,000 elements in usually less than 20 minutes on a SUN SPARC 1+. Figure 1 is a graph of the fitness of the best element in a population of 40 for a typical run on a 10,000 element SSP. The exact solution was found in 12.2 minutes. This run required only 30 generations, which means that the algorithm examined only 1200 elements of the solution space. Figure 2 shows the results for single runs on three different sized SSPs. The similarity in the three solutions does not seem to be unusual. In fact, the average time to solution across a set of 50 random SSPs for each of 21 different sized problems did not vary significantly. These results are shown in Figure 3.

3.2 Population Size Effects

While the main purpose of this report is to establish the genetic algorithm as a viable approach to solving the SSP, it is also interesting to consider the effects of various genetic parameters on the performance of the algorithm. As an illustration, this section will briefly look at the impact of population size on the solution of the SSP.
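The bit-complementing local search of Section 3.1 can be sketched as follows. This is a hypothetical reconstruction: the function name is invented, and it minimizes the absolute gap to the target, a simplification of "improving the solution" as measured by the fitness value.

```python
def local_search(chrom, weights, target):
    """Repeatedly flip single bits, keeping any flip that narrows the gap to target."""
    best = list(chrom)
    best_gap = abs(sum(w for b, w in zip(best, weights) if b) - target)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            trial = best[:]
            trial[i] ^= 1                  # complement one bit
            gap = abs(sum(w for b, w in zip(trial, weights) if b) - target)
            if gap < best_gap:             # keep the flip if it helps
                best, best_gap = trial, gap
                improved = True
    return best, best_gap

# Hypothetical instance: start near the target 27 and let the search close the gap.
print(local_search([1, 1, 0, 0, 0], [12, 7, 19, 3, 8], 27))  # -> ([1, 1, 0, 0, 1], 0)
```

Each sweep costs one sum evaluation per bit, which is why the routine adds little overhead to the genetic algorithm while sharply reducing the time to an exact solution.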
As expected, the number of generations to a solution decreases as the population size increases, which is clearly shown in Figure 8 (for a 10,000 element SSP). This occurs because more elements are examined within each generation. Of course, this also implies that as the population size increases, the time to process each generation also increases. The result seems to be a relatively stable processing time (within the range of 14 to 20 minutes), as shown in Figure 4.
4.0 Conclusions

While it was found that a genetic algorithm could efficiently solve a large SSP, many questions remain open. For example, what is the effect of changing other genetic parameters, such as mutation or crossover rates, on the performance of the algorithm? This paper only considered the basic form of a genetic algorithm. Other approaches to genetic algorithms exist, so the question of which genetic structure is best for the solution of SSPs needs to be investigated.


Finally, the application of genetic algorithms to other versions and modifications of the standard Sum of Subsets problem should be considered.
References

1. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman & Company, New York (1979).
2. O.H. Ibarra and C.E. Kim, "Fast approximation algorithms for knapsack and sum of subset problems," Journal of the ACM, 22, 463-468 (1975).
3. E.L. Lawler, "Fast approximation algorithms for knapsack problems," Mathematics of Operations Research, 4, 339-356 (1979).
4. S. Martello and P. Toth, "Worst-case analysis of greedy algorithms for the subset-sum problem," Mathematical Programming, 28, 198-205 (1984).
5. M. Fischetti, "Worst-case analysis of an approximation scheme for the subset sum problem," Operations Research Letters, 5, 283-284 (1986).
6. S. Martello and P. Toth, "Approximation schemes for the subset-sum problem: Survey and experimental analysis," European Journal of Operational Research, 22, 56-69 (1985).
7. J.C. Lagarias and A.M. Odlyzko, "Solving low-density subset sum problems," Journal of the ACM, 32, 229-246 (1985).
8. M.J. Coster, A. Joux, B.A. LaMacchia, A.M. Odlyzko, C.P. Schnorr, and J. Stern, "Improved low-density subset sum algorithms," Computational Complexity, 2, 111-128 (1992).
9. E. Balas and E. Zemel, "An algorithm for large zero-one knapsack problems," Operations Research, 28, 1130-1154 (1980).
10. S. Martello and P. Toth, "A new algorithm for the 0-1 knapsack problem," Management Science, 34, 633-644 (1988).
11. J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press (1975).
12. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading (1989).
13. H. Braun, "On solving traveling salesman problems by genetic algorithms," in Parallel Problem Solving from Nature, Lecture Notes in Computer Science 496, 129-133 (1990).
14. A. Homaifar, S. Guan, and G. Liepins, "A new approach on the traveling salesman problem by genetic algorithms," Proceedings of the 5th International Conference on Genetic Algorithms, 460-466 (1993).
15. J. Grefenstette, "Incorporating problem specific knowledge into genetic algorithms," in Genetic Algorithms and Simulated Annealing, ed. L. Davis, Morgan Kaufmann, 42-60 (1987).
16. E. Falkenauer and A. Delchambre, "A genetic algorithm for bin packing and line balancing," Proceedings of the IEEE International Conference on Robotics and Automation, 1186-1192 (1992).
17. R. Spillman, "Cryptanalysis of knapsack ciphers using genetic algorithms," Cryptologia, 17, 367-377 (1993).
18. G. Rawlins, Foundations of Genetic Algorithms, Morgan Kaufmann Publishers, San Mateo (1991).


Figure 1: Typical 10,000 Element Knapsack Run (fitness vs. time in minutes)

Figure 2: Knapsack Solutions (fitness vs. time in minutes)


Figure 3: Average time to solution across knapsacks (time vs. problem size)

Figure 4: Population effects (processing time in minutes vs. population size)

