
Engineering complex systems: Ant Colony Optimization (and, in general, nature-inspired metaheuristics) to model and solve complex dynamic problems

Luca Maria Gambardella


IDSIA, Istituto Dalle Molle di studi sull’Intelligenza Artificiale
Galleria 2, 6928 Manno, Lugano, Switzerland
luca@idsia.ch

Many problems of practical importance can be modeled as combinatorial optimisation problems. Examples are: vehicle routing for the distribution of goods and services, scheduling of jobs in a production line, generating timetables or scheduling meetings, multiple fault location, routing in dynamic networks, and many more. Combinatorial optimisation problems are formulated by defining a model of the problem and an objective function, based on some cost measure, to be optimised. Finding an optimal solution means selecting the best alternative among a finite or countably infinite number of possible solutions. Hence, to solve large problem instances in practice one often has to resort to approximate methods that return near-optimal solutions in relatively short time. Algorithms of this type are loosely called metaheuristics. Metaheuristics are usually inspired by natural processes, as in genetic algorithms or simulated annealing.

One of the most successful recent metaheuristics is Ant Colony Optimization (ACO) ([2], [3], [8]). The natural metaphor on which ACO algorithms are based is the behaviour of real ant colonies. Real ants are capable of finding the shortest path from a food source to their nest without using visual cues, by exploiting a chemical substance called pheromone. While walking, ants deposit pheromone on the ground and follow, in probability, pheromone previously deposited by other ants. This real ant behaviour has inspired ACO, where a set of artificial ants cooperate to solve a combinatorial problem by exchanging indirect information via artificial pheromone. The artificial pheromone is accumulated at run-time through a learning mechanism that rewards good problem solutions. ACO is based on a set of parallel artificial ants that iteratively build and update problem solutions. These solutions are defined by sequences of states and are constructed using a probabilistic search. Artificial ants start from random states; next, each ant probabilistically chooses the new state to visit, using a probabilistic function mainly based on the pheromone intensity. At the end of each iteration the pheromone on the best solution is increased according to a learning rule. The rationale is that in this way the structure of "preferred sequences" emerges in the pheromone trail, and future ants will use this information to generate new and better solutions.
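To make this build-and-update loop concrete, the sketch below applies it to a symmetric travelling salesman problem. It is a minimal illustration of the general scheme described above, not a reimplementation of any of the specific algorithms cited below; the function name, the parameter values, and the inverse-distance heuristic are illustrative choices.

```python
import random

def aco_tsp(dist, n_ants=10, n_iter=100, alpha=1.0, beta=2.0, rho=0.1):
    """Minimal ACO sketch for a symmetric TSP given a distance matrix `dist`."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]            # pheromone trails
    eta = [[0.0 if i == j else 1.0 / dist[i][j]    # heuristic: inverse distance
            for j in range(n)] for i in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            start = random.randrange(n)            # ants start from random states
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:                       # build a solution state by state
                i = tour[-1]
                cand = list(unvisited)
                # probabilistic choice biased by pheromone intensity and heuristic
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                j = random.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
        for i in range(n):                         # evaporate all trails ...
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for k in range(n):                         # ... then reinforce the best solution
            i, j = best_tour[k], best_tour[(k + 1) % n]
            tau[i][j] += rho / best_len
            tau[j][i] = tau[i][j]                  # keep the trail matrix symmetric
    return best_tour, best_len

# toy usage on a 4-city instance
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(aco_tsp(dist, n_ants=5, n_iter=50))
```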
Recently, many ant-based algorithms have been proposed to solve different types of combinatorial optimization problems, such as symmetric and asymmetric travelling salesman problems (TSP/ATSP, [7], [10], [11]), the sequential ordering problem (SOP, [14]), the quadratic assignment problem (QAP, [12]), the vehicle routing problem with time windows (VRPTW, [13]), and information routing in dynamic Internet and telephone networks (AntNet, [5]). In these situations ACO algorithms show robustness, scalability, and self-organization capabilities, since they are able to adapt dynamically to topological changes in the problem structure and to react rapidly to failure conditions. More recently, applications of Ant Colony Optimization have come to include routing in ad hoc networks (AntHocNet, [6]).

Our interest, and our possible contribution to the meeting, is to investigate how ACO-based techniques (and metaheuristics in general) can be used to solve complex problems in which dynamic and stochastic situations are the key issue. We are also interested in methodological and theoretical aspects of swarm intelligence applied to dynamic problems.
ACO algorithms show similarities with some optimization, learning, and simulation approaches, such as heuristic graph search, Monte Carlo simulation, neural networks, and evolutionary computation. These similarities are briefly discussed in the following.

Heuristic graph search. In ACO algorithms each ant performs a heuristic graph search in the space of the components of a solution: ants take biased probabilistic decisions to choose the next component to move to, where the bias is given by a heuristic evaluation function which favours components that are perceived as more promising. It is interesting to note that this is different from what happens, for example, in stochastic hill climbers or in simulated annealing [15], where (i) an acceptance criterion is defined and only those randomly generated moves which satisfy the criterion are executed, and (ii) the search is usually performed in the space of the solutions.
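The contrast can be made explicit in code. The simulated-annealing-style search sketched below (a minimal sketch; the 2-opt-style move and the cooling schedule are illustrative choices, following [15] only loosely) perturbs a complete tour and filters moves through an acceptance criterion, whereas the ants in the earlier sketch construct a solution component by component, with no accept/reject step:

```python
import math
import random

def sa_tsp(dist, n_iter=20000, t0=10.0, cooling=0.9995):
    """Simulated-annealing-style search in the space of complete tours."""
    n = len(dist)
    tour = list(range(n))
    random.shuffle(tour)                       # start from a complete solution
    tour_len = lambda t: sum(dist[t[k]][t[(k + 1) % n]] for k in range(n))
    cur, temp = tour_len(tour), t0
    for _ in range(n_iter):
        i, j = sorted(random.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt-style move
        delta = tour_len(cand) - cur
        # acceptance criterion: execute the move only if it passes this test
        if delta < 0 or random.random() < math.exp(-delta / temp):
            tour, cur = cand, cur + delta
        temp *= cooling                        # cool down: fewer uphill moves over time
    return tour, cur
```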
Monte Carlo simulation. ACO algorithms can be interpreted as parallel, replicated Monte Carlo systems. Monte Carlo systems [17] are general stochastic simulation systems, that is, techniques that perform repeated sampling experiments on a model of the system under consideration, making use of a stochastic component in the state sampling and/or transition rules. The results of these experiments are used to update some statistical knowledge about the problem, as well as the estimates of the variables the researcher is interested in. In turn, this knowledge can also be used iteratively to reduce the variance in the estimation of the desired variables, directing the simulation process towards the most interesting regions of the state space. Analogously, in ACO algorithms the ants sample the problem's solution space by repeatedly applying a stochastic decision policy until a feasible solution of the considered problem is built. The sampling is realized concurrently by a collection of differently instantiated replicas of the same ant type. Each ant "experiment" adaptively modifies the local statistical knowledge of the problem structure (i.e., the pheromone trails). The recursive transmission of such knowledge by means of stigmergy reduces the variance of the whole search process: the most interesting transitions explored so far probabilistically bias the future search, preventing the ants from wasting resources in unpromising regions of the search space.
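As a minimal illustration of this statistics-directed sampling idea, separate from ACO itself, the sketch below (a cross-entropy-style sampler of our own; the Gaussian sampling distribution and parameters are illustrative) repeatedly draws candidate solutions and then refits its sampling distribution to the best ones, so that the accumulated statistics concentrate future samples, with reduced variance, in promising regions:

```python
import random
import statistics

def adaptive_sampler(score, mu=0.0, sigma=5.0, n_samples=50, n_elite=10, n_iter=30):
    """Repeated sampling experiments whose statistics redirect future sampling."""
    for _ in range(n_iter):
        xs = [random.gauss(mu, sigma) for _ in range(n_samples)]  # sampling experiment
        elite = sorted(xs, key=score)[:n_elite]                   # best-scoring samples
        mu = statistics.mean(elite)             # update the statistical knowledge ...
        sigma = statistics.stdev(elite) + 1e-9  # ... variance shrinks around good regions
    return mu

# toy usage: locate the minimum of (x - 3)^2 by adaptive sampling
print(adaptive_sampler(lambda x: (x - 3.0) ** 2))
```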
Neural networks. Ant colonies, being composed of numerous concurrently and locally interacting units, can be seen as "connectionist" systems [9], the most famous examples of which are neural networks. From a structural point of view, the parallel between the ACO meta-heuristic and a generic neural network is obtained by putting each state i visited by ants in correspondence with a neuron i, and the problem-specific neighborhood structure of state i in correspondence with the set of synaptic-like links exiting neuron i. The ants themselves can be seen as input signals concurrently propagating through the neural network and modifying the strength of the synaptic-like inter-neuron connections. Signals (ants) are propagated locally by means of a stochastic transfer function, and the more a synapse is used, the more the connection between its two end-neurons is reinforced. The ACO synaptic learning rule can be interpreted as an a posteriori rule: signals related to good examples, that is, ants which discovered a good-quality solution, reinforce the synaptic connections they traverse more than signals related to poor examples. It is interesting to note that the ACO neural network algorithm does not correspond to any existing neural network model. The ACO neural network is also reminiscent of networks solving reinforcement learning problems [18]. In reinforcement learning the only feedback available to the learner is a numeric signal (the reinforcement) which scores the result of actions. This is also the case in the ACO meta-heuristic: the signals (ants) fed into the network can be seen as input examples with an associated approximate score measure. The strength of the pheromone updates and the level of stochasticity in signal propagation play the role of a learning rate, controlling the balance between exploration and exploitation. Finally, it is worth making a reference to the work of Chen [4], who proposed a neural network approach to the TSP which bears important similarities with the ACO approach. As in ACO algorithms, Chen builds a tour incrementally, according to synaptic strengths. His approach also makes use of candidate lists and 2-opt local optimization. The strengths of the synapses of the current tour and of all previous tours are updated according to a Boltzmann-like rule and a learning rate playing the role of an evaporation coefficient. Although there are some differences, the common features are, in this case, striking.
Evolutionary computation. There are some general similarities between the ACO meta-heuristic and evolutionary computation (EC). Both approaches use a population of individuals which represent problem solutions, and in both approaches the knowledge about the problem collected by the population is used to stochastically generate a new population of individuals. A main difference is that in EC algorithms all the knowledge about the problem is contained in the current population, while in ACO a memory of past performance is maintained in the form of pheromone trails. An EC algorithm which is very similar to ACO algorithms in general, and to Ant System (AS) in particular, is Baluja and Caruana's Population-Based Incremental Learning (PBIL) [1]. PBIL maintains a vector of real numbers, the generating vector, which plays a role similar to that of the population in genetic algorithms. Starting from this vector, a population of binary strings is randomly generated: each string in the population will have the i-th bit set to 1 with a probability which is a function of the i-th value in the generating vector. Once a population of solutions is created, the generated solutions are evaluated, and this evaluation is used to increase (or decrease) the probabilities of each separate component in the generating vector, so that good (bad) solutions will be produced with higher (lower) probability in future generations. It is clear that in ACO algorithms the pheromone trail values play a role similar to Baluja and Caruana's generating vector, and pheromone updating has the same goal as updating the probabilities in the generating vector. A main difference between ACO algorithms and PBIL is that in PBIL all the probability vector components are evaluated independently, so the approach works well only when the solution is separable into its components.
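The PBIL scheme just described is compact enough to sketch directly (a minimal version, assuming the common push-towards-the-best update; the learning rate, population size, and toy objective are illustrative):

```python
import random

def pbil(score, n_bits, pop_size=50, lr=0.1, n_iter=100):
    """Minimal PBIL sketch: a vector of bit probabilities replaces the population."""
    prob = [0.5] * n_bits                          # the generating vector
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        # sample a population of binary strings from the generating vector
        pop = [[1 if random.random() < p else 0 for p in prob]
               for _ in range(pop_size)]
        leader = max(pop, key=score)               # evaluate the generated solutions
        if score(leader) > best_score:
            best, best_score = leader, score(leader)
        # move each probability towards the corresponding bit of the best string
        prob = [p + lr * (b - p) for p, b in zip(prob, leader)]
    return best, best_score

# toy usage: maximize the number of 1-bits ("one-max")
print(pbil(sum, n_bits=20))
```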
The (1, λ) evolution strategy is another EC algorithm which is related to ACO algorithms, and in particular to Ant Colony System (ACS). In fact, in the (1, λ) evolution strategy the following steps are iteratively repeated: (i) a population of λ solutions (ants) is initially generated; then (ii) the best individual of the population is saved for the next generation, while all the other solutions are discarded; (iii) starting from the best individual, λ − 1 new solutions are stochastically generated by mutation; and finally (iv) the process is iterated by going back to step (ii). The similarity with ACS is striking.
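These four steps translate almost line for line into code (a minimal sketch on a continuous toy objective; the Gaussian mutation and step size are illustrative choices):

```python
import random

def evolution_strategy(score, x0, lam=20, step=0.5, n_iter=200):
    """Keep-the-best (1, lambda)-style loop, following steps (i)-(iv) in the text."""
    def mutate(x):
        return [xi + random.gauss(0, step) for xi in x]
    pop = [mutate(x0) for _ in range(lam)]    # (i) initial population of lam solutions
    for _ in range(n_iter):
        best = min(pop, key=score)            # (ii) save the best, discard the rest
        pop = [best] + [mutate(best) for _ in range(lam - 1)]  # (iii) lam - 1 mutants
        # (iv) iterate
    return min(pop, key=score)

# toy usage: minimize the sphere function sum(x_i^2)
print(evolution_strategy(lambda x: sum(xi * xi for xi in x), x0=[5.0, -3.0]))
```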
Stochastic learning automata. This is one of the oldest approaches to machine learning (see [16] for a review). An automaton is defined by a set of possible actions and a vector of associated probabilities, a continuous set of inputs, and a learning algorithm to learn input-output associations. Automata are connected in a feedback configuration with the environment, and a set of penalty signals from the environment to the actions is defined. The similarity between stochastic learning automata and ACO approaches can be made clear as follows. The set of pheromone trails available on each arc/link is seen as a set of concurrent stochastic learning automata. Ants play the role of the environment signals, while the pheromone update rule is the automaton learning rule. The main difference lies in the fact that in ACO the "environment signals" (i.e., the ants) are stochastically biased, by means of their probabilistic transition rule, to direct the learning process towards the most interesting regions of the search space. That is, the whole environment plays a key, active role in learning good state-action pairs.
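For reference, a classical automaton learning rule of this kind is the linear reward-inaction scheme, sketched below (a minimal version; the two-action environment and learning rate are illustrative):

```python
import random

def learning_automaton(reward_prob, a=0.05, n_steps=5000):
    """Linear reward-inaction automaton choosing among len(reward_prob) actions."""
    n = len(reward_prob)
    p = [1.0 / n] * n                               # action probability vector
    for _ in range(n_steps):
        i = random.choices(range(n), weights=p)[0]  # pick an action
        if random.random() < reward_prob[i]:        # the environment rewards the action
            # shift probability mass towards the rewarded action; on penalty, do nothing
            p = [pj + a * (1.0 - pj) if j == i else pj * (1.0 - a)
                 for j, pj in enumerate(p)]
    return p

# toy usage: the automaton learns to prefer the action rewarded 80% of the time
print(learning_automaton([0.2, 0.8]))
```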
References

1. Baluja S. and Caruana R. Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, ML-95, pp. 38-46. Palo Alto, CA: Morgan Kaufmann, 1995.
2. Bonabeau E., Dorigo M., Theraulaz G. Inspiration for optimization from social insect behaviour. Nature, 406(6791), pp. 39-42, 2000.
3. Bonabeau E., Theraulaz G. Swarm Smarts. Scientific American, March 2000, pp. 55-61, 2000.
4. Chen K. A simple learning algorithm for the traveling salesman problem. Physical Review E, 55, 1997.
5. Di Caro G. and Dorigo M. AntNet: Distributed Stigmergetic Control for Communications Networks. Journal of Artificial Intelligence Research (JAIR), 9, pp. 317-365, 1998.
6. Di Caro G., Ducatelle F., Gambardella L.M. AntHocNet: an Ant-Based Hybrid Routing Algorithm for Mobile Ad Hoc Networks. Technical Report IDSIA 42, 2004.
7. Dorigo M., Gambardella L.M. Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation, 1(1), pp. 53-66, 1997.
8. Dorigo M., Di Caro G., and Gambardella L.M. Ant Algorithms for Discrete Optimization. Artificial Life, 5(2), pp. 137-172, 1999.
9. Feldman J.A. and Ballard D.H. Connectionist models and their properties. Cognitive Science, 6, pp. 205-254, 1982.
10. Gambardella L.M., Dorigo M. Ant-Q: a reinforcement learning approach to the traveling salesman problem. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, ML-95, Morgan Kaufmann, 1995, pp. 252-260.
11. Gambardella L.M., Dorigo M. Solving Symmetric and Asymmetric TSPs by Ant Colonies. Proceedings of the IEEE Conference on Evolutionary Computation, ICEC96, Nagoya, Japan, May 20-22, 1996, pp. 622-627.
12. Gambardella L.M., Taillard E., Dorigo M. Ant colonies for the Quadratic Assignment Problem. Journal of the Operational Research Society, 50, pp. 167-176, 1999.
13. Gambardella L.M., Taillard E., Agazzi G. MACS-VRPTW: A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows. In D. Corne, M. Dorigo and F. Glover, editors, New Ideas in Optimization. McGraw-Hill, London, UK, pp. 63-76, 1999.
14. Gambardella L.M., Dorigo M. An Ant Colony System Hybridized with a New Local Search for the Sequential Ordering Problem. INFORMS Journal on Computing, 12(3), pp. 237-255, 2000.
15. Kirkpatrick S., Gelatt C.D., and Vecchi M.P. Optimization by simulated annealing. Science, 220(4598), pp. 671-680, 1983.
16. Narendra K. and Thathachar M. Learning Automata: An Introduction. Prentice-Hall, 1989.
17. Rubinstein R.Y. Simulation and the Monte Carlo Method. John Wiley & Sons, 1981.
18. Sutton R.S. and Barto A.G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
