Psikharpax

Robotics and Autonomous Systems xxx (2004) xxx–xxx
The Psikharpax project: towards building an artificial rat

Jean-Arcady Meyer (a,∗), Agnès Guillot (a), Benoît Girard (a,c), Mehdi Khamassi (a,c), Patrick Pirim (b), Alain Berthoz (c)

(a) AnimatLab/LIP6, Université Paris 6, CNRS, France
(b) BEV, France
(c) LPPA, CNRS, Collège de France, France

Received 6 September 2004; accepted 9 September 2004

Abstract

Drawing inspiration from biology, the Psikharpax project aims at endowing a robot with sensory-motor equipment and a neural control architecture that will afford some of the capacities of autonomy and adaptation that are exhibited by real rats. The paper summarizes the current state of achievement of the project. It successively describes the robot’s future sensors and actuators, and several biomimetic models of the anatomy and physiology of structures in the rat’s brain, like the hippocampus and the basal ganglia, which have already been at work on various robots, and that make navigation and action selection possible. Preliminary results on the implementation of learning mechanisms in these structures are also presented. Finally, the article discusses the potential benefits that a biologically inspired approach affords to traditional autonomous robotics.

© 2004 Published by Elsevier B.V.

Keywords: Artificial rat; Navigation; Action selection; Learning

∗ Corresponding author. E-mail address: jean-arcady.meyer@lip6.fr (J.-A. Meyer).
0921-8890/$ – see front matter © 2004 Published by Elsevier B.V.
doi:10.1016/j.robot.2004.09.018
ROBOT 1194 1–13

1. Introduction

Since the 2-month workshop in Dartmouth College that founded the field of artificial intelligence in 1956, and since the enthusiastic comments on the prospects of the discipline that this event triggered [47,19], serious doubts have been raised (e.g., [16,17]) about the chances that an artificial system might compete in the near future with the amazing capacities exhibited by the human brain. In particular, several researchers consider that it is quite premature trying to understand and reproduce human intelligence – whatever this expression really means – and that one should first try to understand and reproduce the probable roots of this intelligence, i.e., the basic adaptive capacities of animals [9]. In other words, before attempting to reproduce unique capacities that characterize man, like logical reasoning or natural language understanding, it might be wise to concentrate first on simpler abilities that human beings share with other animals, like navigating, seeking food and avoiding dangers. In this
spirit, several research efforts are devoted to the design of so-called animats, i.e., simulated animals or real robots whose sensors, actuators and control architectures are as closely inspired by those of animals as possible, and that are able to “survive” or fulfill their mission in changing and unpredictable environments [38,39,30].

This article describes one such endeavor, the Psikharpax¹ project, which aims at designing an artificial rat that will exhibit at least some of the capacities of autonomy and adaptation that characterize its natural counterpart – the living creature for which the product (brain complexity) × (biological knowledge) is the highest. In particular, this robot will be endowed with internal needs – such as hunger, rest, or curiosity – which it will try to satisfy in order to survive within the challenging environment of a laboratory populated with humans and, possibly, other robots. To this end, it will sense and act on its environment in pursuit of its own goals and in the service of its needs, without help or interpretation from outside the system.

¹ Psikharpax was the king of the rats – i.e., an intelligent and adaptive character – in the Batrachomyomachy, a parody of the Iliad written in Greek verses and (falsely) attributed to Homer. The name means “crumb robber”.

This article summarizes the current state of this project. In particular, it describes the robot’s future sensory-motor equipment and the major modules of its control architecture. It also describes the behaviors that the robot Psikharpax already exhibits in simulation.

2. Sensory-motor equipment

Psikharpax will be a 50 cm long robot (Fig. 1) equipped with three sets of allothetic sensors: a two-eyed visual system, an auditory system calling upon two electronic cochleas, and a haptic system made of 32 whiskers on each side of its head. Sensor fusion will be realized through the use of a Generic Visual Perception Processor (GVPP), a biomimetic chip dedicated to low-level real-time signal processing that already serves robot vision [28].

Fig. 1. The overall design of Psikharpax.

Psikharpax will also be endowed with three sets of idiothetic sensors: a vestibular system reacting to linear and angular accelerations of its head, an odometry system monitoring the length and direction of its displacements, and capacities to assess its current energy level.

Psikharpax will be equipped with several motors and actuators. In particular – despite the fact that such a device is not really biomimetic – two wheels will allow the robot to move at a maximum speed of 0.3 m/s. Although it will usually lie flat on the ground, it will also have the possibility of rearing, as well as of seizing objects with two forelegs. Likewise, its head will be
able to rotate, and three pairs of motors will actuate each of its eyes (Fig. 2).

Fig. 2. An eye equipped with a camera and a log-polar sensor, which is actuated by three motors.

Several low-level reflexes will connect Psikharpax’s sensors to its actuators, thus making it possible, for instance, to keep looking at an object even when its head is moving, and to avoid an obstacle detected by its whiskers or by its visual or auditory systems.

3. Control architecture

Likewise, several models of nervous circuits that contribute to the adaptive capacities of the rat are currently simulated or tested on real robots, and will be implemented in the final control architecture of Psikharpax. In particular, this artificial rat will be endowed with the capacity of effecting visual or auditory saccades towards salient objects, of relying on optical flow to determine whether a given landmark is close or distant, and of merging visual and vestibular information to permanently monitor its own orientation. Among such circuits, those that afford capacities for navigation and action selection have already been validated on preliminary versions of the future Psikharpax. The corresponding realizations will now be briefly described.

3.1. Navigation

Many simulation models – see [51] for a review – call upon so-called place cells and head direction cells to implement navigation systems that are inspired by the anatomy and physiology of dedicated structures in the rat’s brain, like the hippocampus and the postsubiculum. The model described here implements a multiple-hypothesis tracking navigation strategy, maintaining a set of hypotheses about the robot’s position that are all updated in parallel [21,40]. It serves to build a dense topological map [20], in which nodes store the allothetic data that the robot can perceive at the corresponding places in the environment. A link between two nodes memorizes how far off and in what direction the corresponding places are positioned relative to each other, as measured by the robot’s idiothetic sensors. The robot’s position is represented by an activity distribution over the nodes, the activity level of a given node representing the probability that the robot is currently located at the corresponding position (Fig. 3). As the robot moves in its environment, the activity level of neurons in the map changes accordingly. Not only is this activity propagation within the map coherent with the robot’s actual moves (Fig. 4), but it also affords useful relocalization capacities when the robot is passively moved from one place to the other (Fig. 5).

Fig. 3. While exploring the environment shown on the left, the robot built the cognitive map shown on the right. This map is made of interconnected neurons (the corresponding links are not shown) whose activation level depends upon what the robot perceives in its surroundings. Thus, when the robot is situated in zone A of its environment, according to the various landmarks it perceives from this place, some neurons in its map become more activated than others, thus affording the robot the capacity of locating itself.

Fig. 4. Activity updates within the map as the robot moves to successive places in its environment. Labels a, b, ..., e indicate both the actual position of the robot and the corresponding map activity. The grey level of each small node in a map indicates its activity, ranging from 0 for white nodes to 1 for black nodes. Larger black dots indicate the robot’s most probable current localization.

Fig. 5. Activity updates within the map after a translocation process during which the robot has been moved passively from place a to place b in the environment. After a few mistakenly recognized positions (b–d), the robot correctly recognizes its actual position (e).

Fig. 6. A single channel within the basal ganglia in the GPR model. D1 and D2: striatal neurons with different dopamine receptors; STN: sub-thalamic nucleus; EP/SNr: entopeduncular nucleus and substantia nigra reticulata; GP: globus pallidus. Solid arrows represent excitatory connections, dotted arrows represent inhibitory connections.

In [20,22], this navigation model has been implemented on a Pioneer 2 mobile robot and proved to be efficient in the unprepared environment of an ordinary laboratory, notably for detour planning.
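
As an illustration of the localization scheme of Section 3.1, the following minimal Python sketch maintains an activity distribution over the nodes of a small topological map and updates it from odometry and perception. It is not the model of [20,21]: the three-node map, the Gaussian matching functions, and the parameters sigma, stay and leak are assumptions introduced only for this example.

```python
import math

# Hypothetical three-node map (A - B - C on a line); all values are illustrative.
SIGNATURES = {"A": (0.0, 1.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}  # stored allothetic data
LINKS = {  # stored idiothetic data: displacement from source node to target node
    ("A", "B"): (1.0, 0.0), ("B", "A"): (-1.0, 0.0),
    ("B", "C"): (1.0, 0.0), ("C", "B"): (-1.0, 0.0),
}


def _normalize(activity):
    total = sum(activity.values())
    return {node: value / total for node, value in activity.items()}


def _gauss(mismatch, sigma):
    return math.exp(-(mismatch ** 2) / (2.0 * sigma ** 2))


def predict(activity, links, move, sigma=0.3, stay=0.1, leak=0.05):
    """Idiothetic step: push activity along links whose stored displacement
    matches the displacement reported by odometry."""
    spread = {node: stay * value for node, value in activity.items()}
    for (src, dst), (dx, dy) in links.items():
        mismatch = math.hypot(dx - move[0], dy - move[1])
        spread[dst] += activity[src] * _gauss(mismatch, sigma)
    spread = _normalize(spread)
    # Assumed small uniform leak so that no hypothesis is ever ruled out completely.
    uniform = 1.0 / len(spread)
    return {node: (1.0 - leak) * value + leak * uniform for node, value in spread.items()}


def correct(activity, signatures, observation, sigma=0.5):
    """Allothetic step: re-weight each node by the match between its stored
    data and the current observation."""
    weighted = {}
    for node, value in activity.items():
        sx, sy = signatures[node]
        mismatch = math.hypot(sx - observation[0], sy - observation[1])
        weighted[node] = value * _gauss(mismatch, sigma)
    return _normalize(weighted)


def most_active(activity):
    return max(activity, key=activity.get)


def run(steps):
    """Apply (move, observation) pairs in turn; return the most active node after each."""
    activity = {node: 1.0 / len(SIGNATURES) for node in SIGNATURES}
    trace = []
    for move, observation in steps:
        activity = correct(predict(activity, LINKS, move), SIGNATURES, observation)
        trace.append(most_active(activity))
    return trace


# The robot starts at A, drives to B, then is passively moved to C
# (odometry reports no displacement, but the percepts are those of C).
steps = [((0.0, 0.0), SIGNATURES["A"]), ((1.0, 0.0), SIGNATURES["B"])]
steps += [((0.0, 0.0), SIGNATURES["C"])] * 5
trace = run(steps)
```

With these assumed settings, the most active node follows the robot from A to B, and after the passive displacement the activity migrates to C within a few updates, qualitatively in the spirit of Figs. 4 and 5.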

3.2. Action selection

To survive, the rat must be able to solve the so-called action selection problem, i.e., it must be able to decide at every moment what to do next in the service of its needs. Some of the circuits involved in this task are known to be located in basal ganglia–thalamus–cortex loops and have inspired the GPR model designed by Gurney et al. [31]. Basically, this model is implemented as a network of leaky-integrator neurons, and assumes that the numerous segregated channels observed in each nucleus of the basal ganglia each correspond to a discrete motor action (the granularity of which has still not been deciphered) that is inhibited by default and thus prevented from being executed (Fig. 6). Inputs to these channels are so-called saliences that take into account both internal and external perceptions to assess the relevance of each action with respect to the robot’s needs. A positive feedback loop involving the thalamus serves to introduce some persistence in such assessments. Two parallel selection and control circuits within the basal ganglia act to modulate interactions between channels. Finally, at the output of these circuits, the action that is the least inhibited by others is selected and allowed to be executed by the motor system.

This model has been implemented in a Lego robot whose task was to select efficiently between four actions – wandering, avoiding obstacles, “feeding” and “resting” – in order to “survive” in an environment where it could find “food” and “rest” places [25,26] (Fig. 7).

Experimental results demonstrate the model’s ability to promote survival, in the sense that it permanently keeps two essential variables [4] above lethal levels: Potential Energy (obtained via “feeding”) and Energy (converted from Potential Energy via “resting”). Moreover, the model ensures clean and efficient switching between actions and, because it adds a persistence loop to a classical winner-takes-all (WTA) architecture, it maintains the Potential Energy level at its maximum charge more often (25% of the time) than in the absence of such a loop (less than 10% of the time) (Fig. 8).

However, the robot’s survival depends on its chances of getting to the right place at the right moment, i.e., to a food place when its Potential Energy level is low, or to a rest place when it lacks Energy. Obviously, additional adaptive capacities would depend on the robot’s aptitude to record the position of such places on its map and to use this map to reach such places when needed. This
Fig. 7. Left: The environment showing “food” (A) and “rest” (B) places. Right: A Lego robot equipped with light sensors (A) and bumpers (B).
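
The role of the persistence loop described in Section 3.2 can be illustrated with a deliberately reduced Python sketch. It replaces the leaky-integrator GPR network by a plain winner-takes-all choice over saliences and models persistence as a fixed bonus added to the salience of the action selected at the previous step; the action names, the salience values and the bonus of 0.1 are illustrative assumptions, not parameters of the model in [25,26].

```python
def select_actions(salience_stream, persistence=0.0):
    """Winner-takes-all over saliences at each time step.

    `persistence` is an assumed bonus granted to the previous winner; with
    persistence=0.0 this reduces to a plain winner-takes-all rule.
    """
    winner = None
    chosen = []
    for saliences in salience_stream:
        scores = dict(saliences)
        if winner is not None:
            scores[winner] += persistence
        winner = max(scores, key=scores.get)
        chosen.append(winner)
    return chosen


def count_switches(chosen):
    """Number of time steps at which the selected action changes."""
    return sum(1 for previous, current in zip(chosen, chosen[1:]) if previous != current)


# Hypothetical saliences: two nearly equal needs, then "rest" clearly dominates.
stream = [
    {"feed": 0.52, "rest": 0.50},
    {"feed": 0.50, "rest": 0.52},
    {"feed": 0.52, "rest": 0.50},
    {"feed": 0.50, "rest": 0.52},
    {"feed": 0.20, "rest": 0.90},
]

plain = select_actions(stream)
persistent = select_actions(stream, persistence=0.1)
```

In this toy stream the plain rule changes action at every small fluctuation of the saliences, whereas the persistent variant keeps feeding until resting becomes clearly more relevant and therefore switches only once.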
has been made possible thanks to a model combining navigation and action selection capacities.

3.3. Navigation and action selection

The connection of the previously described navigation and action selection models and their implementation on a simulated robot were inspired by recent hypotheses concerning the role of dedicated structures within the basal ganglia – the nucleus accumbens in particular – and the interaction of basal ganglia–thalamus–cortex loops in the rat’s brain [24,26]. The corresponding model (Fig. 9) basically involves two such loops: a ventral loop that selects locomotor actions, like moving north or east, and a dorsal loop that selects non-locomotor actions, like feeding or resting. Each of these loops has been modeled as a GPR system like the one shown in Fig. 6. The STN of the dorsal loop provides the interconnection between them because it sends excitatory projections to the output of the ventral loop. Consequently, when the dorsal loop is active and triggers some non-locomotor action, the excitatory signal that is sent towards the ventral loop raises the inhibition level of every locomotor action and prevents it from being selected. Hence the robot cannot move and eat at the same time.

As usual, saliences in both loops depend upon both internal and external perceptions. However, saliences in the ventral loop also depend upon four direction profiles (Fig. 10) that are generated by two different navigation strategies, i.e., a simple guidance (or taxon) strategy, and a more elaborate topological navigation strategy [51]. This allows the robot either to be attracted by an object that it directly perceives (guidance profile) or to move towards a region where such an object is located in its map (planning profile). The latter possibility puts some constraints on action selection because the robot is committed to regularly returning to previously mapped areas in its environment in order to check the accuracy of the current map (homing profile). This need
Fig. 8. Histogram showing the percentage of overall time during which Potential Energy is reloaded at the values shown on the abscissa.

Fig. 9. Interconnection of the ventral and dorsal loops in the basal ganglia. The ventral loop selects locomotor actions, the dorsal loop selects non-locomotor actions. The latter subsumes the former.

Fig. 10. Three direction profiles (out of the four used by the model) that call upon the current map of the environment. According to the planning profile, the robot is motivated to move in two broad directions that correspond to two resources recorded in its map. According to the homing profile, the robot is motivated to return to already explored regions of the environment. According to the exploration profile, the robot is motivated to wander in as yet unexplored regions of its environment.

Fig. 11. The model integrating navigation and action selection calls upon two basal ganglia–thalamus–cortex loops. Each loop is managed by a GPR model, and the coordination between loops is provided by the subthalamic nucleus of the dorsal loop, which is connected to the ventral loop (connection not shown here). The dorsal loop selects one of the two possible reloading actions, the ventral loop selects one of the 36 directions of motion (simplified here to four cardinal directions, whereas the homing and exploration input profiles are not shown). Inhibitory connections are represented by dotted arrows, excitatory connections by solid arrows.
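
How several direction profiles might be merged into saliences for the 36 locomotor actions can be sketched as follows. The weighted sum, the Gaussian shape of each profile and the bearings used are assumptions made for illustration; only the number of directions and the weight values (those listed in Table 2) come from the text, and the actual model selects actions through the GPR loops rather than through a plain maximum.

```python
import math

N_DIRECTIONS = 36  # one locomotor action per 10 degrees, as stated in the text


def bump(bearing_deg, width_deg=30.0):
    """Assumed profile shape: a Gaussian bump centred on a bearing (degrees)."""
    profile = []
    for k in range(N_DIRECTIONS):
        angle = k * 360.0 / N_DIRECTIONS
        # Smallest absolute angular difference, in [0, 180].
        diff = abs((angle - bearing_deg + 180.0) % 360.0 - 180.0)
        profile.append(math.exp(-(diff ** 2) / (2.0 * width_deg ** 2)))
    return profile


def direction_saliences(profiles, weights):
    """Assumed combination rule: weighted sum of the profiles, direction by direction."""
    return [
        sum(weights[name] * profiles[name][k] for name in profiles)
        for k in range(N_DIRECTIONS)
    ]


def preferred_direction(saliences):
    """Bearing (degrees) of the direction with the highest salience."""
    best = max(range(N_DIRECTIONS), key=lambda k: saliences[k])
    return best * 360.0 / N_DIRECTIONS


# Hypothetical situation: a mapped resource lies at bearing 90 (planning profile)
# while another resource is directly perceived at bearing 270 (guidance profile).
profiles = {"planning": bump(90.0), "guidance": bump(270.0)}

map_driven = preferred_direction(
    direction_saliences(profiles, {"planning": 0.65, "guidance": 0.55})
)
opportunistic = preferred_direction(
    direction_saliences(profiles, {"planning": 0.45, "guidance": 0.55})
)
```

Under these assumptions the robot heads for the mapped resource when the topological navigation weight exceeds the guidance weight, and diverts towards the perceived resource when the weights are reversed. The stochastic detour counts of Table 2 are not reproduced by such a deterministic rule.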
is expressed by a Disorientation variable managed by the model, which increases when the robot enters unexplored areas, and decreases when it returns to known areas. When this variable is not too high, the robot is motivated to explore its environment (exploration profile). This model has been implemented in a version that manages 36 locomotor actions, i.e., moving in each of 36 possible directions, and two non-locomotor actions, i.e., reloading actions on food and rest places that change the robot’s Energy and Potential Energy levels.

In the simplified example described by Fig. 11, only locomotor actions are selected because external perceptions are not strong enough to suppress the inhibition of reloading actions on resource spots A or B. Because the level of the B battery is the lowest, the robot’s main motivation is to move north, according to one of the directions advocated by the current planning profile. In all probability, the robot following this navigation strategy will later turn east, then south and, when it gets close to the B object, this fact will be detected by its sensors, a refueling action will be triggered, and an excitatory signal will be sent to the ventral loop in order to inhibit locomotor actions during this period.

Fig. 12. Two environments used to test the connection of navigation and action selection models. E: “rest” place, Ep: “food” place. Left: In this environment, it is impossible to see one resource place from the other. Right: The resource at Ep2 is not present during the elaboration of the map.

Several experiments have been performed to test the capacities of this control architecture to ensure survival by maintaining the robot’s essential variables above lethal levels. In particular, it has been shown that, in the environment on the left of Fig. 12, a robot (robot A) calling upon both topological navigation and guidance strategies survived longer than another (robot B)
Table 1
Statistical comparison of survival times measured along 10 runs for two robots tested in the environment on the left of Fig. 12 over a 6-h interval

            Survival times (s)
            Median      Range
Robot A     14431.5     2531:17274
Robot B     4908.0      2518:8831
Mann–Whitney U test: U = 15, P < 0.01
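
The significance level reported in Table 1 can be cross-checked from the statistic alone. The sketch below applies the usual large-sample normal approximation to the Mann–Whitney U distribution; it assumes 10 runs per robot and no tied survival times, neither of which the table states explicitly, and an exact small-sample test may give a slightly different value.

```python
import math


def mann_whitney_normal_approx(u, n1, n2, continuity=True):
    """Two-sided p-value for a Mann-Whitney U statistic under the normal
    approximation (no correction for ties)."""
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    deviation = abs(u - mean_u)
    if continuity:
        # Standard continuity correction of 0.5, floored at zero.
        deviation = max(deviation - 0.5, 0.0)
    z = deviation / sd_u
    p_two_sided = math.erfc(z / math.sqrt(2.0))
    return z, p_two_sided


# U = 15 is the value reported in Table 1; n1 = n2 = 10 is an assumption.
z, p = mann_whitney_normal_approx(15, 10, 10)
```

For U = 15 the approximate two-sided p-value falls below 0.01, in line with the table.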
relying on guidance only (Table 1). Hence, being able to build a map of an unknown environment and to use this map to navigate between places E and Ep where Energy and Potential Energy could be acquired enhanced the robot’s survival. A new action being selected every 0.87 s on average, survival times equal to 4 h – the maximum simulation length – were attained. However, on some occasions, premature deaths occurred because of the difficulty of elaborating a map from scratch that was precise enough to help the robot reach the right resource before exhaustion.

Likewise, in the environment on the right of Fig. 12, it has been shown that, if place Ep1 is the only one previously encountered by the robot and recorded on its map, the robot may decide to move towards that place to reload its Potential Energy. However, if on its way it detects the proximity of another food place like Ep2, it will occasionally give up navigating towards Ep1 and opportunistically divert via Ep2. Then, having consumed the corresponding resource, it will register the position of Ep2 on its map. Thus, next time it needs to reload its Potential Energy, it will have the choice of navigating towards Ep1 or Ep2. Table 2 shows that such opportunistic behavior depends upon the respective weights that were assigned to topological navigation and guidance profiles and that served to compute saliences, i.e., upon the respective attractiveness of a resource perceived directly and one whose location is recorded on the map.

Table 2
Number of opportunistic detours towards Ep2 out of 15 runs in the environment on the right of Fig. 12

Number of     Weight of topological   Weight of    Opportunistic detours
Ep sources    navigation              guidance     towards Ep2
1             0.65                    0.55         0/15
2             0.65                    0.55         2/15
2             0.55                    0.55         8/15
2             0.45                    0.55         13/15
These numbers increase when greater weights are given to guidance than to topological navigation.

In the environment of Fig. 13, the robot has the choice between two trajectories leading to a “food” place. The first one is shorter but entails passing through a “dangerous” place. The second one is longer, but safer. Experiments show that the robot is able to decide to navigate via the longer path when its Potential Energy level is not so low that a long journey would compromise its survival, but that otherwise it chooses the shorter path, at the risk of having to face the potential danger recorded on its map. Path choice depends on the respective weights of two internal states respectively related to the need of Potential Energy and to Fear (Table 3).

Fig. 13. Two trajectories leading to “food” places (Ep). One is shorter than the other, but entails passing through a dangerous place (Danger).

Table 3
Number of trips via the short and long paths out of 20 runs in the Fig. 13 environment

Fear level    Potential Energy level    Short path    Long path
0.2           0.1                       13            7
0.2           0.5                       2             18
The greater the need for Potential Energy, the more often the short path is preferred.

Finally, in the complex and challenging environment of Fig. 14, in which several possibilities exist of getting lost, of encountering dangers, of discovering newly created resources, and of reaching places in which a given resource is no longer available, experiments show that the robot survives autonomously for long periods, thanks to the many adaptive mechanisms and behaviors that have just been described [24]. The longest survival time thus obtained was 21 h of simulated time. In this case, the major cause preventing indefinite survival does not seem to be the navigational issues mentioned above, but rather an intrinsic drawback in the GPR model, which gets locked in particular circumstances, with no action being disinhibited enough to be selected. Current work aims at suppressing such limitations and at further enhancing the robot’s lifetime.

Fig. 14. A complex environment with four “rest” places (E), four “food” places (Ep) and two dangerous places (ZD). Resource E1 appears and disappears every 30 min.

3.4. Learning

In an unknown environment, a rat is able to explore it and to incrementally build a map that describes its topology. Such associative learning, which combines both allothetic and idiothetic data, was implemented in the navigation model described above.

However, a rat is also able to improve its behavior over time through reinforcement learning, i.e., thanks to adaptive mechanisms that increase its chances of exhibiting behaviors leading to rewards and that lower those of behaviors leading to punishments. Concerning action selection, a recently debated hypothesis [6,34,46] postulates that such mechanisms could be mediated by dopamine signals within so-called actor-critic architectures. According to this hypothesis, a GPR module could play the role of an actor, while a critic module could call upon a dopamine reinforcement signal ř_t that is assumed to evaluate both the episodic primary reward signal r_t occasionally generated by the robot’s actions, and a secondary signal computed as the difference gP_t − P_{t−1} between currently expected and future rewards (g being a discount factor which determines how far in the future expected rewards are taken into account). This reinforcement signal would be used in the GPR module to adapt the way saliences are computed in order to select the most appropriate action, i.e., the action the most likely to maximize the reward it will lead to (Fig. 15).

Fig. 15. An actor-critic model of reinforcement learning. In this version, the actor module is a GPR model (involving matrisomes in the dorsal striatum) that is segregated into different channels, with saliences as inputs and actions as outputs. The critic module (involving striosomes in the dorsal striatum) propagates towards the actor module an estimate ř of the instantaneous reinforcement triggered by the selected action. Th: Thalamus; SNc: substantia nigra compacta.

Several variants of this learning architecture [36] have been implemented in a simulated robot that must learn in a plus-maze, and through successive trials, which action to perform in order to get to the end of a corridor where a door may provide access to a reward (Fig. 16). This setting reproduces an experiment performed with real rats [1] that must learn to reach the center of the maze, to seek which door gets lighted by an overhanging lamp, to move to that door and gain access to a water dispenser, to return to the center, and so on.

Fig. 16. The plus-maze in which the robot has to learn which action to perform in order to reach a rewarding resource that is delivered at a given end. The doors at three such ends are colored dark-grey and do not deliver any reward, whereas the fourth one is colored white and leads to the resource. The center of the maze is characterized by light-grey walls, corridors by black ones. The position of the rewarding extremity is chosen at random at the beginning of each new trial, when the robot succeeds in returning to its starting point, i.e., at the intersection of the four corridors. The robot is equipped with a visual system that serves to detect the colors, bearings and distances of walls and doors, and with infrared sensors that trigger low-level obstacle-avoidance reflexes.

To learn this task, 12 variables describing the robot’s visual input were used by the GPR to select among six possible actions (drink, move forward, turn to white, turn to light-grey, turn to dark-grey, do nothing – see Fig. 15). Hence, instead of being hand-crafted as in the experiments described in Sections 3.2 and 3.3, the saliences of the action selection module were self-adapted so as to maximize the incoming rewards.

Experimental results demonstrate that the most effective learning architecture is obtained when the whole maze is arbitrarily partitioned in several zones, and when a specific actor-critic pair learns the right action to be accomplished in each zone. This architecture, inspired by [48] and [14], yielded the learning curve shown in Fig. 17, which compares favorably with the best hand-crafted action selection module that we have been able to design. As expected, after learning, the turn to dark-grey and do nothing actions were never selected. Future work will aim at self-adapting the arbitrary partition that has been used here.

Fig. 17. Evolution over successive trials of the number of actions that the robot triggers before getting to the reward.

4. Discussion

It is clear from recent reviews [5,33,52] that, although many research efforts have been devoted to the design of biomimetic sensors or effectors for robots, relatively little work has been done on control system architectures, and what has been done has focused primarily on invertebrate models. Only a few groups are currently building biomimetic robot control architectures modeled on mammalian nervous systems and, moreover, their efforts are often centered on isolated behaviors, like locomotion in cats [44] or feeding in mice [29], which are not dealt with in an integrated perspective. To the best of our knowledge, the scope of the Psikharpax project is unique. First, it draws inspiration from a vertebrate instead of an invertebrate. Second, it aims at designing both biomimetic sensors and control architectures. Third, because it capitalizes on a dedicated robotic platform, it will integrate a variety of sensors, actuators and control systems making it possible to assess its adaptive capacities in much more challenging circumstances than those that characterize seemingly comparable biomimetic robotic approaches [11,23,32,41,49].

However, from the point of view of biological realism, several improvements could be made to the models just described. In particular, the current navigation model could be replaced by a version reproducing more faithfully the anatomy and physiology of the hippocampus and related areas (e.g., [50,3]), or several hypotheses about the way cortical columns are connected to these areas and afford planning capacities (e.g., [10,37]) could be explored. Likewise, other hypotheses about how the navigation and action selection models could be connected should be implemented and compared to the control architecture described herein. For instance, instead of selecting mere locomotor actions from the saliences of several direction profiles, the ventral loop in the basal ganglia might play a higher-level role and use current internal and external perceptions to select the most appropriate among the possible navigation strategies [43]. Finally, there are many possibilities for extending the results obtained so far on learning. Actor-critic models calling upon dopaminergic neurons should be compared to competing hypotheses involving the role of glutamate [45] in reinforcement learning. Also, useful distinctions are probably to be made between habit learning and goal-directed learning [12]. Finally, the role of neuromodulators in the setting of learning meta-parameters should be investigated, as suggested in [15].

From an engineering point of view, the system described above raises several important issues. Considering that living systems are the product of roughly 3.5 billion years of evolutionary tinkering [35], one may wonder if there is the slightest chance that artificial systems exhibiting adaptive behaviors of a comparable efficiency might call upon simplistic neurons (e.g., threshold-gate models) and homogeneous architectures (e.g., perceptrons). In other words, if nature has invented the highly complex and still imperfectly understood processes at work in real neurons and if it has connected them in highly heterogeneous networks, it is probably for good reasons, worth deciphering. Likewise, there are certainly good reasons why the output

ing which solution, the natural or the artificial, is better adapted to which decision problem.

From the point of view of robotics, the Psikharpax project substantially extends the scope of traditional approaches to robot control because it will involve a wide variety of behaviors, because it will implement different learning mechanisms, and because it is not exclusively devoted to tasks that serve humans. As noted in [2], the majority of research conducted so far on autonomous mobile robots has concentrated on developing vehicles that exhibit a single type of behavior supporting the specific task of moving between two positions in the environment while avoiding collisions with obstacles. Clearly such limited capacities may not afford the degree of autonomy required in situations where human intervention is undesirable or impossible, and raise the question of how the concept of “self-autonomy”, i.e., behavior which may be characterized as supporting self survival, differs from that of “imposed autonomy”, i.e., behavior which does not benefit the robot but fulfils some desired task which we impose upon the system. In this perspective, being able to integrate the past (through its recorded map), the present (through its sensors) and the future (through its planning capacities), Psikharpax will represent an embodied example of a motivationally autonomous animat whose control complexity may well challenge the possibilities of external control and, hence, its capacities to withstand any imposed autonomy [18]. Conversely, following Moravec [42], one may hope that its mobility, its sensor equipment, and its ability to carry out survival-related tasks in a dynamic environment will
444 neurons of so many structures in the brain (e.g., the provide a necessary basis for surpassing the current 492
445 cerebellum, the frontal and prefrontal cortex, the basal “reptile-stage” of robots and eventually reaching the 493
446 ganglia, the striatum, etc.) play an inhibitory role. As general competence and “true” intelligence of human 494
EC
447 described by Berthoz, decision in the brain is often the beings. 495
448 selective suppression of actions that are irrelevant with

449 respect to the goal, to the context, to past experience. “I
450 think, therefore I inhibit. [. . .] Whereas engineers tend 5. Conclusion 496
451 to formalize the problem of action selection in terms

RR
452 of probabilities of winning more than loosing, nature’s The Psikharpax project aims at designing an artifi- 497
453 solution is in terms of excitation/inhibition. Underly- cial rat able to “survive” in a laboratory populated by 498
454 ing such competition, there is a wealth of possibilities humans and other robots. It has two main objectives: 499
455 that we are a long way from understanding, but that far better understand the control mechanisms of rats, im- 500
456 transcends the cold estimate of probability calculus, prove the autonomy of robots. The robot Psikharpax 501
how Bayesian it may be. A new neurocognitive the- will be endowed with many sensors and motors that
CO
457 502
458 ory of decision remains to be elaborated” [7]. Again, are currently under development and that will serve 503
459 such considerations lead to the question of understand- to implement various reflexes. Its control architecture 504
has already been tested in simulation and implemented on simpler versions of the future robot. In particular, models for navigation and action selection, which afford capacities of associative and reinforcement learning, have been successfully tested. It thus appears that Psikharpax will be able to explore an unknown environment, to build a topological map of it, and to plan trajectories to places where it will fulfill various internal needs, like “eating”, “resting”, “exploring” or “avoiding danger”. The first version of such an efficient robot is expected to be available at the end of 2005: still a long way to the “whole rat” that Dennett might have advocated [13], and an even longer way to the intelligence of man.

Uncited references

[8,27].

Acknowledgments

This research was funded by the Project Robotics and Artificial Entities (ROBEA) of the Centre National de la Recherche Scientifique, France. The authors wish to thank David Filliat, Loïc Lachèze, and Sidney Wiener for their valuable contribution to the project.

References

[1] S.V. Albertin, A.B. Mulder, E. Tabuchi, M.B. Zugaro, S.I. Wiener, Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior, Behavioral Brain Research 117 (1/2) (2000) 173–183.
[2] T. Anderson, M. Donath, Animal behaviour as a paradigm for developing robot autonomy, Robotics and Autonomous Systems 6 (1990) 145–168.
[3] A. Arleo, W. Gerstner, Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity, Biological Cybernetics 83 (2000) 287–299.
[4] W. Ashby, Design for a Brain, Wiley, New York, 1952.
[5] Y. Bar-Cohen, C. Breazeal, Biologically-inspired Intelligent Robots, SPIE, Washington, 2003.
[6] A.G. Barto, Adaptive critics and the basal ganglia, in: Models of Information Processing in the Basal Ganglia, The MIT Press, Cambridge, 1995, pp. 215–232.
[7] A. Berthoz, La décision, Odile Jacob, Paris, 2003.
[8] R. Brooks, A robust layered control system for a mobile robot, IEEE Robotics and Automation 2 (1) (1986) 14–23.
[9] R. Brooks, Cambrian Intelligence. The Early History of the New AI, The MIT Press, Cambridge, 1999.
[10] Y. Burnod, An Adaptive Neural Network: The Cerebral Cortex, Masson, Paris, 1993.
[11] G. Capi, E. Uchibe, K. Doya, Selection of neural architecture and the environment complexity, in: Proceedings of the Third International Symposium on Human and Artificial Intelligence Systems: Dynamic Systems Approach for Embodiment and Sociality, Fukui, Japan, 2002, pp. 231–237.
[12] P. Dayan, B.W. Balleine, Reward, motivation, and reinforcement learning, Neuron 36 (2002) 285–298.
[13] D.C. Dennett, Why not the Whole Iguana? Behavioral and Brain Sciences 1 (1978) 103–104.
[14] K. Doya, K. Samejima, K. Katagiri, M. Kawato, Multiple model-based reinforcement learning, Neural Computation 14 (6) (2002) 1347–1369.
[15] K. Doya, Metalearning and neuromodulation, Neural Computation 12 (1) (2003) 219–245.
[16] H.L. Dreyfus, What Computers Can’t Do: The Limits of Artificial Intelligence, Harper and Row, 1972.
[17] H.L. Dreyfus, What Computers Still Can’t Do: A Critique of Artificial Reason, The MIT Press, Cambridge, 1992.
[18] D. McFarland, T. Bösser, Intelligent Behavior in Animals and Robots, The MIT Press, Cambridge, 1993.
[19] E.A. Feigenbaum, J. Feldman, Computers and Thought, McGraw-Hill, 1963.
[20] D. Filliat, J.-A. Meyer, Active perception and map learning for robot navigation, in: From Animals to Animats 6, Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2000, pp. 246–255.
[21] D. Filliat, J.-A. Meyer, Map-based navigation in mobile robots. I. A review of localization strategies, Journal of Cognitive Systems Research 4 (4) (2003) 243–282.
[22] D. Filliat, J.-A. Meyer, Global localization and topological map learning for robot navigation, in: From Animals to Animats 7, Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2002, pp. 131–140.
[23] P. Gaussier, S. Leprêtre, M. Quoy, A. Revel, C. Joulain, J.P. Banquet, Experiments and models about cognitive map learning for motivated navigation, Robotics and Intelligent Systems 24 (2000) 53–94.
[24] B. Girard, Intégration de la navigation et de la sélection de l’action dans une architecture de contrôle inspirée des ganglions de la base, Ph.D. Thesis, University of Paris 6, France, 2003.
[25] B. Girard, V. Cuzin, A. Guillot, K. Gurney, T.J. Prescott, Comparing a bio-inspired robot action selection mechanism with winner-takes-all, in: From Animals to Animats 7, Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2002, pp. 75–84.
[26] B. Girard, V. Cuzin, A. Guillot, K. Gurney, T.J. Prescott, A basal ganglia inspired model of action selection evaluated in a robotic survival task, Journal of Integrative Neuroscience 2 (22) (2003) 179–200.
[27] B. Girard, D. Filliat, J.-A. Meyer, A. Berthoz, A. Guillot, An integration of two control architectures of action selection and navigation inspired by neural circuits in the vertebrates: the basal ganglia, in: Connectionist Models of Cognition and Perception II, Proceedings of the Eighth Neural Computation and Psychology Workshop, World Scientific, 2004, pp. 72–81.
[28] S. Gourichon, J.-A. Meyer, P. Pirim, Using colored snapshots for short-range guidance in mobile robots, International Journal of Robotics and Automation 17 (4) (2002) 154–162.
[29] A. Guillot, J.-A. Meyer, A test of the Booth energy flow model (Mark 3) on feeding patterns of mice, Appetite 8 (1987) 67–78.
[30] A. Guillot, J.-A. Meyer, The animat contribution to cognitive systems research, Journal of Cognitive Systems Research 2 (2) (2001) 157–165.
[31] K.N. Gurney, T.J. Prescott, P. Redgrave, A computational model of action selection in the basal ganglia. I. A new functional anatomy, Biological Cybernetics 84 (2001) 401–410.
[32] V.V. Hafner, M. Fend, M. Lungarella, R. Pfeifer, P. König, K.P. Körding, Optimal coding for naturally occurring whisker deflections, in: Proceedings of the Joint International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), Springer Lecture Notes in Computer Science, 2003, pp. 805–812.
[33] O. Holland, D. McFarland, Artificial Ethology, Oxford University Press, Oxford, 2003.
[34] J.C. Houk, J.L. Adams, A.G. Barto, A model of how the basal ganglia generate and use neural signals that predict reinforcement, in: Models of Information Processing in the Basal Ganglia, The MIT Press, Cambridge, 1995, pp. 249–270.
[35] F. Jacob, Evolution and tinkering, Science 196 (1977) 116–166.
[36] M. Khamassi, B. Girard, A. Berthoz, A. Guillot, Comparing three critic models of reinforcement learning in the basal ganglia connected to a detailed actor in a S-R task, in: Proceedings of the Eighth Intelligent Autonomous Systems Conference, IOS Press, 2004, pp. 430–437.
[37] R.A. Koene, A. Gorchetchnikov, R.C. Cannon, M.E. Hasselmo, Modeling goal-directed spatial navigation in the rat based on physiological data from the hippocampal formation, Neural Networks 16 (5/6) (2003) 577–584.
[38] J.-A. Meyer, The animat approach to cognitive science, in: Comparative Approaches to Cognitive Science, The MIT Press, Cambridge, 1995, pp. 27–44.
[39] J.-A. Meyer, Artificial life and the animat approach to artificial intelligence, in: Artificial Intelligence, Academic Press, London, 1996, pp. 325–354.
[40] J.-A. Meyer, D. Filliat, Map-based navigation in mobile robots. II. A review of map-learning and path-planning strategies, Journal of Cognitive Systems Research 4 (4) (2003) 283–317.
[41] F. Montes Gonzalez, T.J. Prescott, K. Gurney, M. Humphries, P. Redgrave, An embodied model of action selection mechanisms in the vertebrate brain, in: From Animals to Animats 6, Proceedings of the Sixth International Conference on Simulation of Adaptive Behaviour, The MIT Press, Cambridge, 2000, pp. 157–166.
[42] H.P. Moravec, Robot: Mere Machine to Transcendent Mind, Oxford University Press, Oxford, 1998.
[43] A.B. Mulder, E. Tabuchi, S.I. Wiener, Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation, European Journal of Neuroscience 19 (7) (2004) 1923–1932.
[44] A.E. Patla, T.W. Calvert, R.B. Stein, Model of a pattern generator for locomotion in mammals, American Journal of Physiology: Regulatory, Integrative and Comparative Physiology 248 (1985) 484–494.
[45] C.M.A. Pennartz, B.L. McNaughton, A.B. Mulder, The glutamate hypothesis of reinforcement learning, Progress in Brain Research 126 (2000) 231–253.
[46] W. Schultz, P. Dayan, P.R. Montague, A neural substrate of prediction and reward, Science 275 (1997) 1593–1599.
[47] H.A. Simon, Models of Man, Wiley, New York, 1957.
[48] R.E. Suri, W. Schultz, Temporal difference model reproduces anticipatory neural activity, Neural Computation 13 (2001) 841–862.
[49] D. Touretzky, L. Saksida, Operant conditioning in skinnerbots, Adaptive Behavior 5 (3/4) (1997) 219–247.
[50] O. Trullier, J.-A. Meyer, Animat navigation using a cognitive graph, Biological Cybernetics 83 (3) (2000) 271–285.
[51] O. Trullier, S.I. Wiener, A. Berthoz, J.-A. Meyer, Biologically-based artificial navigation systems: review and prospects, Progress in Neurobiology 51 (1997) 483–544.
[52] B. Webb, T.R. Consi, Biorobotics: Methods and Applications, The MIT Press, Cambridge, 2001.

Jean-Arcady Meyer is Research Director at CNRS and heads the AnimatLab within the Laboratoire d’Informatique de Paris 6 (LIP6) in Paris. He trained as an engineer, graduated in Human and Animal Psychology, and received his PhD in Biology. He is the founder of the journal Adaptive Behavior and a former Director of the International Society for Adaptive Behavior. His primary scientific interests are in adaptive behaviors in natural and artificial systems. He is the main coordinator of the Psikharpax project.

Agnès Guillot is Associate Professor of Psychophysiology at Paris X University. She graduated in Human and Animal Psychology, and holds PhDs in Psychophysiology and Biomathematics. Her main scientific interests are in action selection in animals and robots. She coordinates the biomimetic modelling of Psikharpax.

Benoît Girard is a Postdoctoral Fellow at the Laboratoire de Physiologie de la Perception et de l’Action (LPPA) at the Collège de France. He trained as an engineer and received his PhD in Computer Science from the University of Paris 6. His current research interests are focussed on computational models of action selection in the rat brain, and notably in the control of oculomotor saccades.

Mehdi Khamassi is working both at the AnimatLab and at the LPPA as a PhD student. He trained as an engineer and received a Master of Science degree in Cognitive Science from the University of Paris 6. His current research interests include electrophysiology experiments and computational modelling of learning processes in the rat brain.

Patrick Pirim trained as an engineer. He founded BEV, a company that brings fully developed new technologies to the marketplace. He invented the GVPP chip that is dedicated to low-level real-time signal processing. He coordinates the sensory-motor design of Psikharpax.

Alain Berthoz trained as an engineer. He graduated in Human Psychology and received his PhD in Biology. He is Professor at the Collège de France, where he heads the Laboratoire de Physiologie de la Perception et de l’Action (LPPA). He is a member of many academic societies, an expert sitting on several international committees, and he has been awarded many prizes. His main scientific interests are in the multisensory control of gaze, of equilibrium, of locomotion and of spatial memory. He coordinates the neurophysiological experiments that inspire the Psikharpax project.
