Search For Maximal Snake-in-the-Box Using New Genetic Algorithm
Kim-Hang Ruiz
International MIS
1065 Waltons Pass
Evans, GA 30809
kimhangruiz@aol.com
ABSTRACT
1. INTRODUCTION
General Terms
Algorithms, Theory
Keywords
Genetic Algorithm, Mitosis Genetic Algorithm, Heuristics, Snake,
Hypercube
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by
others than the author(s) must be honored. Abstracting with credit is permitted. To
copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from
Permissions@acm.org.
GECCO '14, July 12-16, 2014, Vancouver, BC, Canada.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2662-9/14/07$15.00.
http://dx.doi.org/10.1145/2576768.2598296
S11 and 3-S13 of lengths 125, 158, and 509, longer than the best
previously known value, 103, 157 and 493 respectively [6]. It
also found all the previously known optimal snakes from spread 2
to spread 5; and the known longest 3-S9 snake of length 63
without priming. Remarkably, the searches that found
these optimal and longest maximal k-Sn snakes took minutes or
hours, significantly less than the days to weeks required by
other techniques. A list of transition sequences of the three record-breaking snakes is provided in an appendix (section 9).
2.1 Terminology
k         spread.
k-Sn      n-dimensional k-spread snake, e.g. 3-S8 denotes an 8-dimensional 3-spread snake. In the literature, Sn is used for 2-Sn.
n         number of dimensions.
Qn        n-dimensional hypercube, e.g. Q9 denotes the 9-dimensional hypercube.
O(n)      search space.
OGA(t)    search time for each Genetic Algorithm in the MGA.
O(t)      search time for the Mitosis Genetic Algorithm.
Ok(t)     search time for k-spread snakes.
Ok-1(t)   search time for (k-1)-spread snakes.
3. HYPERCUBE FUNDAMENTALS
One of the most important properties of the hypercube is that the
number of vertices in Qn is always twice the number in Qn-1.
Figure 1 shows how the sole node in Q0 duplicates itself by
translating in coordinate 0 (transition 0) to form a line in Q1,
which continues to duplicate in transitions 1, 2, and 3 to form Q2,
Q3, and Q4 respectively.
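The doubling construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `hypercube` and the encoding of each vertex as an integer bit mask are assumptions made for the example, not notation from the paper.

```python
def hypercube(n):
    """Build Qn from Q0 by duplicating the cube once per coordinate.

    Vertices are integers whose bits are the coordinates (an assumed encoding).
    """
    vertices = [0]   # Q0 has a single node
    edges = []
    for t in range(n):
        bit = 1 << t
        # Translate every existing vertex along coordinate t (transition t).
        copies = [v | bit for v in vertices]
        # Old edges, their translated copies, and the links between the halves.
        edges = edges + [(u | bit, v | bit) for u, v in edges] + list(zip(vertices, copies))
        vertices = vertices + copies
    return vertices, edges


for n in range(5):
    vertices, edges = hypercube(n)
    print(n, len(vertices), len(edges))
```

Each pass through the loop doubles the vertex list, which mirrors the step from Qn-1 to Qn.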
2.2 Definitions
4.3 Selection
Based on the fitness evaluated by the objective function,
chromosomes with a higher fitness have a better chance of being
selected as parents to produce the offspring of the next
generation. Of the many selection methods, the two most common
are weighted Roulette-Wheel and Tournament. In
Roulette-Wheel selection, the probability of a member being
selected is proportional to its fitness. In Tournament selection,
four or six members of the current population are randomly
selected, and the fittest of them becomes a parent
(Tournament-4 or Tournament-6).
The two selected parent
chromosomes may or may not be returned to the current
population. If they are, the selection is with replacement; otherwise,
it is without replacement. In selection with replacement,
the fitness of a parent is unchanged after it is selected. In
selection without replacement, the fitness of the parent is set to 0
to eliminate its chance of being selected again. It is also
recommended that the same selection type be used in all GAs.
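The two selection methods and the replacement option can be illustrated with a minimal sketch. The function names, the list-based population, and the `rng` parameter are assumptions introduced for this example; the paper does not give its implementation.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Pick one member with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def tournament_select(fitness, size=4, rng=random):
    """Return the index of the fittest of `size` randomly drawn members."""
    contenders = rng.sample(range(len(fitness)), size)
    return max(contenders, key=lambda i: fitness[i])


def select_parents(population, fitness, size=4, replacement=True, rng=random):
    """Select two parents by tournament, with or without replacement."""
    scores = list(fitness)          # work on a copy of the fitness values
    parents = []
    for _ in range(2):
        winner = tournament_select(scores, size, rng)
        parents.append(population[winner])
        if not replacement:
            scores[winner] = 0      # a selected parent should not win again
    return parents
```

With `replacement=False` the first winner's fitness is zeroed, so the second tournament is won by a different member whenever any contender has positive fitness.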
The first two notices indicate that a tight k-Sn-1 snake with fewer
unavailable nodes in vicinity V1 is more likely to produce
a longer k-Sn snake. The third notice indicates that the number of
unavailable nodes in vicinity Vk-1 can be used to distinguish the
ability of a k-Sn-1 snake to extend into a longer k-Sn snake. If
two k-Sn-1 snakes have the same length but different numbers of
unavailable nodes in vicinity Vk-1, the snake with the larger
number has a higher ability to extend into a longer snake
(since nodes in Vk-1 become available nodes in Qn). The
fourth notice indicates that there are 2 available nodes in Qn for
every available node in Qn-1. These important properties are
incorporated in calculating the fitness function in the MGA.
The replication of the snake path and unavailable paths in the left
half of the hypercube (Qn-1) into the right half, in preparing for
extending k-Sn snakes in Qn is analogous to the natural mitosis
4.4 Crossover
Each time two parents are selected, their chromosomes are
interchanged to generate new chromosomes that are different but
5.1 Representation
4.5 Mutation
A common single-point mutation is used in the MGA. An arbitrary
gene in the chromosomes of the new population is changed from
its original state with a given probability. The algorithm visits
each gene in the chromosomes of the new population and
generates a random number from 0 to 1. If the
generated number is smaller than the predetermined mutation
probability, the gene is replaced with a different gene. The
predetermined mutation probability can differ between GAs,
but it is often set to be the same.
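The per-gene mutation described above can be sketched as follows. The sketch assumes that each gene is a transition number in the range 0 to n-1 (the transition-sequence representation); the function name `mutate` and its parameters are hypothetical.

```python
import random


def mutate(chromosome, n, rate, rng=random):
    """Visit every gene and replace it with a different one with probability `rate`.

    Assumes genes are transition numbers 0..n-1 and n >= 2.
    """
    mutated = []
    for gene in chromosome:
        if rng.random() < rate:
            # Choose among the other n-1 transitions so the gene really changes.
            alternatives = [t for t in range(n) if t != gene]
            gene = rng.choice(alternatives)
        mutated.append(gene)
    return mutated


print(mutate([0, 1, 2, 0, 3], n=4, rate=0.05))
```

The input chromosome is left untouched and a new list is returned, which keeps the parent available for further selection.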
5. MGA IMPLEMENTATION
Figure 3 illustrates the MGA procedure for searching for the
longest maximal k-Sn. First, an initial population of transition
sequences of k-Sn-1 is generated. The GA is applied to it to find a
population of near-optimal k-Sn-1 snakes, which are then replicated
and extended to form an initial population of transition sequences
of k-Sn. The GA is applied again to this new initial
population to find the longest maximal k-Sn, the solution to the
problem. The first initial population can be from any lower
5.3 GA Operators
In this study, both Tournament-4 and Tournament-6 selections,
with and without replacement, are used. Both one-point
and two-point crossover are utilized. Since the chromosome is
represented as a transition sequence, no special
work is needed during or after the crossover operators to
keep node adjacency intact. Single-point mutation is used.
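A small sketch may help show why crossover on transition sequences needs no repair step. The function names and the explicit cut-point arguments are assumptions for illustration; in a GA the cut points would normally be drawn at random.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two transition sequences after index `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def two_point_crossover(parent_a, parent_b, first, second):
    """Swap the middle segment [first, second) of two transition sequences."""
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


def decode(transitions, start=0):
    """Turn a transition sequence into the visited nodes (integer bit masks)."""
    nodes = [start]
    for t in transitions:
        nodes.append(nodes[-1] ^ (1 << t))   # flip coordinate t
    return nodes


child, _ = one_point_crossover([0, 1, 2, 0], [3, 2, 3, 1], point=2)
path = decode(child)
# Every step flips exactly one bit, so consecutive nodes are always adjacent.
print(child, path)
```

Because each gene names the coordinate to flip, any recombination of genes still decodes to a walk along hypercube edges. Whether that walk satisfies the spread constraint is a separate check.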
5.
Fitness = length + E    (1)
Fitness = length    (2)
where E is the least potential extended snake nodes in the next-higher-dimensional hypercube.
Function (1) is used in the GAs that evolve k-Sn-1 populations in
lower-dimensional hypercubes, which are then extended into
k-Sn populations. This function must therefore account for the ability of
a k-Sn-1 snake to extend into a k-Sn snake. Each time one more node is added
to the snake path in the extension procedure, (n-k)*(k-1) nodes
become unavailable. Based on the notices in the hypercube
fundamentals above, the total number of available nodes in Qn is equal to the
number of unavailable nodes in vicinity Vk-1 plus two times the number of
available nodes in Qn-1. The least potential extended snake nodes
in the next-higher-dimensional hypercube, E, is roughly equal to the
ratio of the total number of available nodes in Qn to (n-k)*(k-1).
Therefore, E is approximately (unavailable nodes in Vk-1 + 2 * available nodes in Qn-1) / ((n-k)*(k-1)).
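The estimate of E described in this paragraph can be written out as a short sketch. The function and argument names are hypothetical, and the formula is a direct reading of the prose above rather than a confirmed copy of the paper's equation.

```python
def least_potential_extension(unavailable_in_vicinity, available_in_lower, n, k):
    """Estimate E for a k-Sn-1 snake that is about to be extended into Qn.

    unavailable_in_vicinity: unavailable nodes in vicinity Vk-1 (assumed given)
    available_in_lower:      available nodes in Qn-1 (assumed given)
    """
    cost_per_node = (n - k) * (k - 1)   # nodes made unavailable per added snake node
    if cost_per_node <= 0:
        raise ValueError("the estimate needs n > k and k > 1")
    available_in_qn = unavailable_in_vicinity + 2 * available_in_lower
    return available_in_qn / cost_per_node


def fitness(length, extension_estimate=0.0):
    """Function (1) when an estimate of E is supplied, function (2) otherwise."""
    return length + extension_estimate


e = least_potential_extension(unavailable_in_vicinity=12, available_in_lower=4, n=8, k=3)
print(e, fitness(20, e))
```

The input counts here are made-up numbers chosen only to exercise the arithmetic.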
The snake length calculation stops when the next transition in the
chromosome leads to an unavailable node. Since the aim of
this study is to find the longest maximal k-Sn snake, an extension
procedure is then applied to the snake to determine whether it
can be extended and, if so, what the extended length would be.
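The length calculation described above can be sketched as follows. The sketch assumes the usual spread-k rule (two snake nodes fewer than k steps apart on the path differ in exactly that many coordinates, and all other pairs differ in at least k coordinates) and counts length in edges. It re-checks the whole path for each new node instead of marking unavailable nodes, so it illustrates the stopping rule rather than the paper's bookkeeping. All names are hypothetical.

```python
def hamming(u, v):
    """Number of coordinates in which two nodes differ."""
    return bin(u ^ v).count("1")


def can_append(path, node, k):
    """Check the assumed k-spread rule for placing `node` after path[-1]."""
    for steps_back, earlier in enumerate(reversed(path), start=1):
        distance = hamming(node, earlier)
        if steps_back < k:
            if distance != steps_back:
                return False
        elif distance < k:
            return False
    return True


def snake_length(transitions, k, start=0):
    """Follow the chromosome until a transition leads to an unavailable node."""
    path = [start]
    for t in transitions:
        node = path[-1] ^ (1 << t)
        if not can_append(path, node, k):
            break
        path.append(node)
    return len(path) - 1, path


def available_transitions(path, n, k):
    """Transitions that could still extend the snake from its tail in Qn."""
    return [t for t in range(n) if can_append(path, path[-1] ^ (1 << t), k)]


length, path = snake_length([0, 1, 2, 0, 1], k=2)
print(length, available_transitions(path, n=3, k=2))
```

An empty list from `available_transitions` means the snake cannot be extended from its tail.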
The extension procedure begins with the calculation of the
number of available transitions from the snake tail. If the number
of available transitions is zero, the snake is maximal (it cannot be
extended), and the length of the maximal k-Sn snake is
reported in the fitness function. If the number of available
transitions is greater than zero, the unavailable transition is replaced
with an available one. Any one of the four extension
procedures listed below can be used to select an available
transition:
Even though the MGA found the optimal 2-S7 of length 50 and
the maximal 2-S8 of length 97 within 5 minutes, it could not find
maximal 2-Sn snakes for n > 8 near the previously known records. It
found 2-S9, 2-S10, 2-S11, 2-S12 and 2-S13 snakes of lengths 185, 350, 595,
1033, and 1887, while the previously established records are 190,
370, 695, 1274, and 2466 respectively. This indicates that, to
search more effectively for these 2-spread snakes, the
parameter settings in the GAs, the fitness functions and/or the MGA
procedure should be modified.
6. RESULTS
A summary of the results is given in Table 1, where the lengths of
the best known k-Sn values for dimension n ≤ 13 and spread k ≤
5 are listed. The values in parentheses are the best previously
known values published in [2, 6, 8, 15]. Recent tests reported at
http://ai.uga.edu/sib/sibwiki/doku.php/records have found longer 2-S11, 2-S12 and 2-S13 snakes, but those results have not been published yet.
Table 1. Longest known k-Sn lengths (best previously known values in parentheses)
Dimension n    Spread 2    Spread 5
6              (26*)       (7*)
7              (50*)       (9*)
8              (98)        (11*)
9              (190)       (19*)
10             (370)       (25*)
11             (695)       (39*)
12             (1274)      (56)
13             (2466)      (79)
* optimal value

7. DISCUSSION
It is noticed that the extension type, the length of chromosomes,
the number of replications, the population size, the number of
generations, and the probabilities of crossover and mutation all affect
the MGA's ability to find the longest k-Sn snakes.
The MGA also found all the previously known optimal k-Sn
snakes and the best previously known maximal 3-S9 of length 63.
Any optimal snake shorter than 26 was found within a
few seconds, and longer ones within a few minutes. DFS took
only a few milliseconds to find the former and days to find the
latter [2, 8]. The MGA found the optimal 2-S7 of length 50
within 5 minutes, much less than the days needed by the GA in
[13]. These results indicate that DFS is best suited to searching
for optimal snakes shorter than 27, and the MGA to searching for optimal
and longest snakes longer than 26.
The MGA found values close to the best previously known
maximal 2-S8, 4-S11, 4-S12, 5-S10, 5-S11 and 5-S12 within minutes to
hours. In an attempt to reach the best previously known values,
the search for these snakes was repeated with much larger
population sizes and larger numbers of generations. The
snake lengths did not improve and in some cases
were actually shorter. Thus, the assumption that a larger
population size and a larger number of generations give a better
chance of finding the longest maximal snake is not always true, especially
for excessively large population sizes. This might be
due to convergence in the selection operator. When the
population size is too large, duplicate members are more likely to
occur. Duplicates of the better-fit members quickly flood the
next population, which causes premature convergence, and a
snake close to the longest is consequently found instead of
the longest. This phenomenon happens more often in the
Let's look at the search space and the search time of the MGA to
understand why. The search space is clearly proportional to the
population size p, the number of generations g, and the repeating
time r. Thus
O(n) = pgr
Each time a new generation is built, the selection, crossover and
mutation operators in the GA are carried out; therefore the search
time in the GA can be formulated as follows:
OGA(t) = prg(s+c+m)
where, for Tournament-6 selection,
s = 6p
Accordingly, s = 4p for Tournament-4 selection.
During the crossover operator, each gene in the two parent
chromosomes is copied to the next generation regardless of the
type of crossover operator. Thus the search time for c is
proportional to the length of the chromosome l and the
population size p.
c = lp
Likewise, each gene is visited during the mutation operator, so
m = lp    (3)
For any Qn, the maximum length of k-Sn snakes must be less than
the depth of the search tree, 2^n/k. Thus
O(t) < (50*R + g) * 2rp^2 * (2^n/k)
O(t) < (50*R + g) * rp^2 * (2^(n+1)/k)    (4)
Functions (3) and (4) indicate that the MGA search time is
linearly proportional to the number of generations, the repeating
time, the length of the chromosomes, and the square of the
population size, and grows exponentially with n. Thus, when n is
raised by one increment, the number of vertices doubles and the
search time grows exponentially with n (not double).
The results show that the real search time for 4-S10 is 35 minutes,
much longer than the search time for 2-S7 of 5 minutes, even
though the length of the chromosome representing 4-S10 (52) is
slightly shorter than the length of the chromosome representing 2-S7
(55) and the population size in the search for 4-S10 (3000) is
smaller than the population size in the search for 2-S7 (5000). The
search time functions above do not include the run time needed
to mark unavailable nodes each time a snake node is assigned or
extended in the chromosome. In general, the higher the spread,
the more unavailable nodes need to be marked. The number of
unavailable nodes that need to be marked in the MGA is:
(n - 1) for spread-2,
(n - 1)*(n - 2) / 2 for spread-3,
(n - 1)*(n - 2)*(n - 3) / 4 for spread-4, and
(n - 1)*(n - 2)*(n - 3)*(n - 4) / 8 for spread-5.
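As a rough numerical illustration, the search-time expressions and the marking counts given in this section can be evaluated directly. The function names are hypothetical, the results are unitless operation counts rather than seconds, and R is taken as given because its definition is not part of this excerpt.

```python
def ga_search_time(p, r, g, length, tournament=6):
    """OGA(t) = prg(s+c+m), with s = 6p or 4p and c = m = lp."""
    s = tournament * p
    c = length * p
    m = length * p
    return p * r * g * (s + c + m)


def mga_time_bound(R, g, r, p, n, k):
    """Upper bound (4): (50*R + g) * r * p^2 * 2^(n+1) / k."""
    return (50 * R + g) * r * p ** 2 * (2 ** (n + 1) / k)


def marked_nodes(n, k):
    """Unavailable nodes to mark, using the four listed cases (spread 2 to 5)."""
    counts = {
        2: (n - 1),
        3: (n - 1) * (n - 2) / 2,
        4: (n - 1) * (n - 2) * (n - 3) / 4,
        5: (n - 1) * (n - 2) * (n - 3) * (n - 4) / 8,
    }
    if k not in counts:
        raise ValueError("only spreads 2 to 5 are listed")
    return counts[k]


# Marking cost for the two searches compared in the text.
print(marked_nodes(7, 2), marked_nodes(10, 4))
```

For the two searches compared in the text, the counts are 6 for 2-S7 and 126 for 4-S10, which is consistent with the longer run time observed for 4-S10 despite its shorter chromosome and smaller population.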
[3] Casella, D. and Potter, W., New lower bounds for the snake-in-the-box problem: Using evolutionary techniques to hunt
for snakes and coils, in Proceedings of the Florida Artificial
Intelligence Research Society Conference, (2005).
8. CONCLUSIONS
The MGA found three new record-breaking 3-spread snakes in
Q10, Q11 and Q13, all the previously known optimal snakes from
spread 2 to spread 5, and the known longest maximal 3-S9 snake
of length 63. Remarkably, it found them within minutes
to hours without being seeded with any previously known longest
snake. This indicates that the MGA is a very effective technique for
tackling the SIB problem. Modifications to the MGA procedure or
settings have been researched to improve its search for 2-spread
snakes in dimensions higher than 8.
Preliminary results are
promising, but more tests need to be done before those results can
be reported.
9. APPENDIX
81748675837486728175847683758470827684738576847182738
67485738679817486758374867281758476837584708276847385
7684718273867485738
10. REFERENCES
[1] Abbott, H. L. and Katchalski, M., On the construction of
snake in the box codes, Utilitas Mathematica, 40, (1991),
97-116.
[13] Potter, W., Robinson, R., Miller, J., Kochut, K., and Redys,
D. Using the genetic algorithm to find snake-in-the-box
codes. In Proceedings of the 7th International Conference on
Industrial & Engineering Applications of Artificial
Intelligence and Expert Systems, (1994) 307-314.