An Introduction to Genetic Algorithms and
Particle Swarm Optimization
and their Application to Single-Robot Design and Optimization
Outline
• Machine-learning-based methods
• Comparison between GA and PSO
• Noise-resistance
• Application to single-robot shaping
  – Examples in control design and optimization (obstacle avoidance, homing)
  – Examples in system design and optimization (locomotion)

[Figure: loop flowchart — Start → … → end criterion met? (N: loop back, Y: End)]
Rationale and Classification
Why Machine-Learning?
• Complementary to model-based/engineering approaches: when low-level
  details matter (optimization) and/or good models do not exist (design)!
• When the design/optimization space is too big (infinite) / too
  computationally expensive (e.g., NP-hard) to be searched systematically
• Automatic design and optimization techniques
• Role of the engineer refocused on performance specification, problem
  encoding, and customization of algorithmic parameters (and perhaps
  operators)
Why Machine-Learning?
Unsupervised Learning
– Off-line
– No teacher available, no distinction between training and test data sets
– Goal: structure extraction from the data set
– Examples: data clustering, Principal Component Analysis (PCA), and
  Independent Component Analysis (ICA)
Reinforcement-based (or Evaluative) Learning
– On-line
– No pre-established training or test data sets
– The system judges its performance according to a given metric (e.g.,
  fitness function, objective function, performance, reinforcement) to be
  optimized
– The metric does not refer to any specific input-to-output mapping
– The system tries out possible design solutions, makes mistakes, and
  tries to learn from its mistakes
ML Techniques: Classification Axis 2

[Figure: evolutionary loop — initialize population; generation loop:
fitness measurement, selection, population replenishing; decoding
(genotype → phenotype) and encoding (phenotype → genotype) mediate
between the population manager and the evaluation of individual
candidate solutions on the system]
GA: Encoding & Decoding
phenotype → genotype (chromosome) [encoding]; genotype → phenotype [decoding]
• phenotype: usually represented by the whole system, which can be
  evaluated; the whole system or a specific part of it (problem
  formulation done by the engineer) is represented by a vector of
  dimension D; vector components are usually real numbers in a bounded
  range
• genotype: chromosome = string of genotypical segments, i.e., genes, or,
  mathematically speaking, again a vector of real or binary numbers; the
  vector dimension varies according to the coding schema (≥ D); the
  algorithm searches in this hyperspace

G1 G2 G3 G4 … Gn    Gi = gene = binary or real number

Encoding: real-to-real, or real-to-binary via Gray code (minimizes
nonlinear jumps between phenotype and genotype)
Decoding: the inverse operation
Remarks:
• Artificial evolution: usually a one-to-one mapping between phenotypic and genotypic space
• Natural evolution: 1 gene codes for several functions, 1 function is coded by several genes
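The real-to-binary encoding via Gray code can be sketched as follows. The binary-reflected Gray code guarantees that adjacent integers differ in exactly one bit, which is what limits nonlinear jumps between phenotype and genotype. The quantization scheme in `real_to_gray` (uniform over the bounded range) is an illustrative assumption, not necessarily the exact encoding used in the lecture.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code: adjacent integers differ in exactly one bit."""
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    """Invert the Gray code by cumulative XOR of right-shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def real_to_gray(x: float, lo: float, hi: float, bits: int = 8) -> int:
    """Quantize a bounded real phenotype value uniformly, then Gray-encode it
    (real-to-binary encoding; quantization scheme is an assumption)."""
    levels = (1 << bits) - 1
    q = round((x - lo) / (hi - lo) * levels)
    return gray_encode(q)
```

Decoding is the inverse chain: `gray_decode` followed by rescaling the integer back into `[lo, hi]`.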
GA: Basic Operators
• Selection: roulette wheel (selection probability determined by
  normalized fitness), ranked selection (selection probability determined
  by fitness order), elitist selection (highest-fitness individuals always
  selected)
• Crossover (e.g., 1-point, p_crossover = 0.2): the two parent chromosomes
  are cut at a gene boundary (Gk | Gk′) and their tails are swapped
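The two operators above can be sketched minimally as follows, assuming non-negative fitness values and chromosomes stored as plain Python lists; function names are illustrative.

```python
import random

def roulette_select(population, fitnesses):
    """Roulette wheel: selection probability proportional to normalized fitness."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return individual
    return population[-1]  # guard against floating-point rounding

def one_point_crossover(parent_a, parent_b, p_crossover=0.2):
    """1-point crossover: with probability p_crossover, cut both parents at a
    random gene boundary and swap the tails."""
    if random.random() < p_crossover and len(parent_a) > 1:
        cut = random.randrange(1, len(parent_a))
        return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]
    return parent_a[:], parent_b[:]
```

Ranked and elitist selection differ only in how the selection probabilities are derived (fitness order, or deterministic survival of the best individuals).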
Particle Swarm Optimization

Flocking and Swarming Inspiration
Principles (more on flocking in Week 8):
• Imitate
• Evaluate
• Compare
PSO: Terminology
• Population: set of candidate solutions tested in one time step, consists of m
particles (e.g., m = 20)
• Particle: represents a candidate solution; it is characterized by a velocity
vector v and a position vector x in the hyperspace of dimension D
• Evaluation span: evaluation period of each candidate solution during a
single time step (algorithm iteration); as in GA the evaluation span might
take more or less time depending on the experimental scenario
• Life span: number of iterations a candidate solution survives
• Fitness function: measurement of efficacy of a given candidate solution
during the evaluation span
• Population manager: updates velocities and positions for each particle
  according to the main PSO loop
Evolutionary Loop: Several Generations
Start → initialize particles → main loop: update the velocity of each
particle i along each dimension j:

v_ij(t+1) = w·v_ij(t) + c_p·rand()·(x*_ij − x_ij) + c_n·rand()·(x*_i′j − x_ij)

where x*_ij is particle i's personal best position and x*_i′j is the best
position found in its neighborhood (geographical or social).
Neighborhood Example: Indexed and Circular (lbest)

[Figure: 8 particles indexed 1–8 on a virtual circle; particle 1's
3-neighbourhood consists of particles 8, 1, and 2]
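The loop above, with the indexed circular (lbest) neighborhood, can be sketched as follows. The parameter defaults (w, c_p, c_n), the initialization range, and the function name are illustrative assumptions.

```python
import random

def lbest_pso(fitness, dim, m=20, iters=100, w=0.7, c_p=1.5, c_n=1.5, k=1):
    """Minimize `fitness` with lbest PSO: each particle sees k ring neighbours
    per side (k=1 gives the 3-neighbourhood of the figure)."""
    x = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(m)]
    v = [[0.0] * dim for _ in range(m)]
    pbest = [xi[:] for xi in x]               # personal best positions x*_i
    pbest_f = [fitness(xi) for xi in x]
    for _ in range(iters):
        for i in range(m):
            # Index i' of the best personal best in the circular neighbourhood.
            hood = [(i + d) % m for d in range(-k, k + 1)]
            n = min(hood, key=lambda j: pbest_f[j])
            for j in range(dim):
                # v_ij(t+1) = w v_ij(t) + c_p rand()(x*_ij - x_ij) + c_n rand()(x*_i'j - x_ij)
                v[i][j] = (w * v[i][j]
                           + c_p * random.random() * (pbest[i][j] - x[i][j])
                           + c_n * random.random() * (pbest[n][j] - x[i][j]))
                x[i][j] += v[i][j]
            f = fitness(x[i])
            if f < pbest_f[i]:
                pbest_f[i], pbest[i] = f, x[i][:]
    best = min(range(m), key=lambda i: pbest_f[i])
    return pbest[best], pbest_f[best]
```

Note that, unlike the GA generation loop, no individual is ever discarded: each particle keeps its memory (personal best) across iterations.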
A Simple PSO Animation
© M. Clerc
GA vs. PSO

GA vs. PSO – Qualitative
Parameter/function | GA | PSO

Function (no noise) | GA (mean ± std dev) | PSO (mean ± std dev)
Sphere              | 0.02 ± 0.01         | 0.00 ± 0.00
Griewank            | 0.01 ± 0.01         | 0.01 ± 0.03
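For reference, the two benchmark functions in the table have standard textbook definitions; the sketch below implements those formulas and is not code from the cited experiments.

```python
import math

def sphere(x):
    """Sphere benchmark: unimodal, global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def griewank(x):
    """Griewank benchmark: many local minima, global minimum 0 at the origin.
    f(x) = 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i)))."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + s - p
```

The Sphere function rewards pure gradient-following, while Griewank's cosine term creates a lattice of local minima, which is why the two together probe different strengths of GA and PSO.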
GA vs. PSO – Overview
• According to most recent research, PSO outperforms GA on most (but not
  all!) continuous optimization problems
• GA is still much more widely used in the general research community and
  is robust to both continuous and discrete optimization problems
• Because of their random aspects, it is very difficult to analyze either
  metaheuristic or to make guarantees about performance
Design and Optimization of Obstacle Avoidance Behavior
using Genetic Algorithms
Evolving a Neural Controller

[Figure: Khepera robot with proximity sensors S1–S8 and motors M1, M2;
each output neuron N_i has a sigmoid transfer function f(x) and receives
inputs I_j through synaptic weights w_ij via excitatory and inhibitory
connections]

O_i = f(x_i)
f(x) = 2 / (1 + e^(−x)) − 1
x_i = Σ_{j=1..m} w_ij I_j + I_0

Note: In our case we evolve the synaptic weights, but Hebbian rules for
dynamic change of the weights, transfer-function parameters, etc. can
also be evolved (see Floreano's course)
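The neuron model above can be computed directly; a minimal sketch of one output neuron (the function name is illustrative):

```python
import math

def neuron_output(weights, inputs, bias=0.0):
    """O_i = f(x_i) with x_i = sum_j w_ij I_j + I_0 and the sigmoid
    f(x) = 2/(1 + e^(-x)) - 1, which maps into the open interval (-1, 1)."""
    x = sum(w * i for w, i in zip(weights, inputs)) + bias
    return 2.0 / (1.0 + math.exp(-x)) - 1.0
```

With this transfer function each motor neuron outputs a signed value, directly usable as a normalized wheel-speed command; evolving the controller means evolving the `weights` (and `bias`) vectors.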
Evolving Obstacle Avoidance
(Floreano and Mondada 1996)
Defining performance (fitness function):
Φ = V (1 − √Δv) (1 − i)
• V = mean speed of the wheels, 0 ≤ V ≤ 1
• Δv = absolute algebraic difference between the wheel speeds, 0 ≤ Δv ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1
Note: Fitness is accumulated during the evaluation span and normalized
over the number of control loops (actions).
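A sketch of how such a fitness could be accumulated over control steps. The normalizations of V and Δv from raw wheel speeds in [−1, 1] are assumptions for illustration, not the exact formulation of Floreano and Mondada.

```python
import math

def obstacle_avoidance_fitness(steps):
    """Accumulate phi = V * (1 - sqrt(dv)) * (1 - i) over control steps,
    normalized by the number of steps.

    Each step is (v_left, v_right, i_max) with wheel speeds in [-1, 1] and
    i_max = highest proximity-sensor activation in [0, 1]."""
    total = 0.0
    for v_l, v_r, i_max in steps:
        V = (abs(v_l) + abs(v_r)) / 2.0   # mean wheel speed, assumed in 0..1
        dv = abs(v_l - v_r) / 2.0         # speed difference, assumed in 0..1
        total += V * (1.0 - math.sqrt(dv)) * (1.0 - i_max)
    return total / len(steps)
```

The three factors respectively reward moving fast, moving straight (the square root penalizes even small speed differences), and staying away from obstacles.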
Shaping Robot Controllers
Note: the controller architecture can be of any type, but GA/PSO are
worth using if the number of parameters to be tuned is large
Evolved Obstacle Avoidance Behavior
Evolved path
Fitness evolution
Expensive Optimization and Noise Resistance

Expensive Optimization Problems
Two fundamental reasons make robot control design and optimization
expensive in terms of time:
Actual results: [Pugh J., EPFL PhD Thesis No. 4256, 2008]
Similar results: [Pugh et al., IEEE SIS 2005]
Baseline Experiment: Extended-Time Adaptation
• Compare the basic algorithms with their corresponding noise-resistant
  versions
• Population size 100, 100 iterations, evaluation span 300 seconds
  (150 s for the noise-resistant algorithms) → 34.7 days
• Fair test: same total evaluation time for all the algorithms
• Realistic simulation (Webots)
• Best evolved solutions averaged over 30 evolutionary runs
• Best candidate solution in the final pool selected based on 5 runs of
  30 s each; performance tested over 40 runs of 30 s each
• Similar performance for all algorithms
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
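A common noise-resistance mechanism is to re-evaluate candidate solutions and aggregate the measurements, so a single lucky evaluation cannot dominate the ranking; this is why each noise-resistant evaluation span above is halved (the time budget is spent on repeated evaluations). The sketch below is generic and illustrative, not the exact aggregation of the cited thesis; `noisy_fitness` is a stand-in for a noisy robot evaluation.

```python
import random

def noisy_fitness(candidate):
    """Stand-in for a noisy robot evaluation (illustrative only):
    a true score plus Gaussian measurement noise."""
    true_value = -sum((g - 0.5) ** 2 for g in candidate)
    return true_value + random.gauss(0.0, 0.2)

def noise_resistant_estimate(candidate, old_estimate=None, n_eval=2):
    """Average several fresh evaluations; optionally blend with the running
    estimate carried over from previous iterations."""
    fresh = sum(noisy_fitness(candidate) for _ in range(n_eval)) / n_eval
    if old_estimate is None:
        return fresh
    return 0.5 * (old_estimate + fresh)
```

The blending option suits population-based methods with memory (e.g., PSO personal bests), where an old fitness estimate for the same candidate is available and can be refined rather than discarded.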
Where Can Noise-Resistant Algorithms Make the Difference?
Notes:
• all examples from shaping obstacle avoidance behavior
• best evolved solution averaged over multiple runs
• fair tests: same total amount of evaluation time for all the different
  algorithms (standard and noise-resistant)

Limited-Time Adaptation Trade-Offs
• Noise-resistant versions especially good with small populations
• Total adaptation time = 8.3 hours (1/100 of the previous learning time)
• Trade-offs: population size, number of iterations, evaluation span
• Realistic simulation (Webots)
Varying population size vs. number of iterations
[Pugh J., EPFL PhD Thesis No. 4256, 2008]
Hybrid Adaptation with Real Robots
• Move from realistic simulation (Webots) to real robots after 90% of the
  learning (even faster evolution)
• Compromise between time and accuracy
• Noise resistance helps manage the transition
Idea: more robots → more noise (as perceived by an individual robot); no
"standard" communication between the robots, but in scenario 3 there is
information sharing through the population manager!
[Pugh et al., IEEE SIS 2005]
Increasing Noise Level – Sample Results

[Figure: experimental set-up and the robot's sensors]
Evolving Homing Behavior
• Fitness function:
  Φ = V (1 − i)
• V = mean speed of the wheels, 0 ≤ V ≤ 1
• i = activation value of the sensor with the highest activity, 0 ≤ i ≤ 1
• Fitness accumulated during the life span, normalized over the maximal
  number (150) of control loops (actions)
• No explicit expression of battery level/duration in the fitness
  function (implicit)
• Chromosome length: 102 parameters (real-to-real encoding)
• Generations: 240; 10 days of embedded evolution on a Khepera
Evolving Homing Behavior
Fitness evolution
Battery energy
Reach the nest → battery recharging → turn on the spot → out of the nest
Evolved Homing Behavior
Not only Control Shaping: Automatic Hardware-Software Co-Design and
Optimization in Simulation and Validation with Real Robots
Moving Beyond Controller-Only Evolution
• Evidence: Nature evolves HW and SW at the same time …
• Faithful, realistic simulators make it possible to explore design
  solutions which encompass co-evolution (co-design) of control and
  morphological characteristics (body shape, number of sensors,
  placement of sensors, etc.)
• GA (PSO?) are powerful enough for this job and the methodology remains
  the same; only the encoding changes
Evolving Control and Robot Morphology
(Lipson and Pollack, 2000)
http://www.mae.cornell.edu/ccsl/research/golem/index.html
• Arbitrary recurrent ANN
• Passive and active (linear actuators) links
• Fitness function: net distance traveled by the centre of mass in a
  fixed duration
Example of an evolutionary sequence:
Examples of Evolved Machines
http://ccsl.mae.cornell.edu/emergent_self_models
Morphological Estimation and Damage Recovery
Papers
• Lipson H., Pollack J. B., "Automatic Design and Manufacture of Robotic
  Lifeforms", Nature, 406: 974–978, 2000.
• Bongard J., Zykov V., Lipson H., "Resilient Machines Through Continuous
  Self-Modeling", Science, 314(5802): 1118–1121, 2006.