
Robotics and Autonomous Systems xxx (2004) xxx–xxx

The Psikharpax project: towards building an artificial rat

Jean-Arcady Meyer a,∗, Agnès Guillot a, Benoît Girard a,c, Mehdi Khamassi a,c, Patrick Pirim b, Alain Berthoz c

a AnimatLab/LIP6, Université Paris 6, CNRS, France
b BEV, France
c LPPA, CNRS, Collège de France, France

Received 6 September 2004; accepted 9 September 2004

Abstract

Drawing inspiration from biology, the Psikharpax project aims at endowing a robot with sensory-motor equipment and a neural control architecture that will afford some of the capacities of autonomy and adaptation that are exhibited by real rats. The paper summarizes the current state of achievement of the project. It successively describes the robot’s future sensors and actuators, and several biomimetic models of the anatomy and physiology of structures in the rat’s brain, like the hippocampus and the basal ganglia, which have already been at work on various robots, and that make navigation and action selection possible. Preliminary results on the implementation of learning mechanisms in these structures are also presented. Finally, the article discusses the potential benefits that a biologically inspired approach affords to traditional autonomous robotics.

© 2004 Published by Elsevier B.V.

Keywords: Artificial rat; Navigation; Action selection; Learning

∗ Corresponding author. E-mail address: jean-arcady.meyer@lip6.fr (J.-A. Meyer).
0921-8890/$ – see front matter © 2004 Published by Elsevier B.V.
doi:10.1016/j.robot.2004.09.018
ROBOT 1194 1–13

1. Introduction

Since the 2-month workshop at Dartmouth College that founded the field of artificial intelligence in 1956, and since the enthusiastic comments on the prospects of the discipline that this event triggered [47,19], serious doubts have been raised (e.g., [16,17]) about the chances that an artificial system might compete in the near future with the amazing capacities exhibited by the human brain. In particular, several researchers consider that it is quite premature to try to understand and reproduce human intelligence, whatever this expression really means, and that one should first try to understand and reproduce the probable roots of this intelligence, i.e., the basic adaptive capacities of animals [9]. In other words, before attempting to reproduce unique capacities that characterize man, like logical reasoning or natural language understanding, it might be wise to concentrate first on simpler abilities that human beings share with other animals, like navigating, seeking food and avoiding dangers. In this
spirit, several research efforts are devoted to the design of so-called animats, i.e., simulated animals or real robots whose sensors, actuators and control architectures are as closely inspired by those of animals as possible, and that are able to “survive” or fulfill their mission in changing and unpredictable environments [38,39,30].

This article describes one such endeavor, the Psikharpax¹ project, which aims at designing an artificial rat that will exhibit at least some of the capacities of autonomy and adaptation that characterize its natural counterpart, the living creature for which the product (brain complexity) × (biological knowledge) is the highest. In particular, this robot will be endowed with internal needs, such as hunger, rest, or curiosity, which it will try to satisfy in order to survive within the challenging environment of a laboratory populated with humans and, possibly, other robots. To this end, it will sense and act on its environment in pursuit of its own goals and in the service of its needs, without help or interpretation from outside the system.

¹ Psikharpax was the king of the rats, i.e., an intelligent and adaptive character, in the Batrachomyomachy, a parody of the Iliad written in Greek verse and (falsely) attributed to Homer. The name means “crumb robber”.

This article summarizes the current state of this project. In particular, it describes the robot’s future sensory-motor equipment and the major modules of its control architecture. It also describes the behaviors that the robot Psikharpax already exhibits in simulation.

2. Sensory-motor equipment

Psikharpax will be a 50 cm long robot (Fig. 1) equipped with three sets of allothetic sensors: a two-eyed visual system, an auditory system calling upon two electronic cochleas, and a haptic system made of 32 whiskers on each side of its head. Sensor fusion will be realized through the use of the Generic Visual Perception Processor (GVPP), a biomimetic chip dedicated to low-level real-time signal processing that already serves robot vision [28].

Fig. 1. The overall design of Psikharpax.

Psikharpax will also be endowed with three sets of idiothetic sensors: a vestibular system reacting to linear and angular accelerations of its head, an odometry system monitoring the length and direction of its displacements, and capacities to assess its current energy level.

Psikharpax will be equipped with several motors and actuators. In particular, despite the fact that such a device is not really biomimetic, two wheels will allow the robot to move at a maximum speed of 0.3 m/s. Although it will usually lie flat on the ground, it will also have the possibility of rearing, as well as of seizing objects with two forelegs. Likewise, its head will be
able to rotate, and three pairs of motors will actuate each of its eyes (Fig. 2).

Fig. 2. An eye equipped with a camera and a log-polar sensor, which is actuated by three motors.

Several low-level reflexes will connect Psikharpax’s sensors to its actuators, thus making it possible, for instance, to keep looking at an object even when its head is moving, and to avoid an obstacle detected by its whiskers or by its visual or auditory systems.

3. Control architecture

Likewise, several models of nervous circuits that contribute to the adaptive capacities of the rat are currently simulated or tested on real robots, and will be implemented in the final control architecture of Psikharpax. In particular, this artificial rat will be endowed with the capacity of effecting visual or auditory saccades towards salient objects, of relying on optical flow to determine whether a given landmark is close or distant, and of merging visual and vestibular information to permanently monitor its own orientation. Among such circuits, those that afford capacities for navigation and action selection have already been validated on preliminary versions of the future Psikharpax. The corresponding realizations will now be briefly described.

3.1. Navigation

Many simulation models (see [51] for a review) call upon so-called place cells and head direction cells to implement navigation systems that are inspired by the anatomy and physiology of dedicated structures in the rat’s brain, like the hippocampus and the postsubiculum. The model described here implements a multiple-hypothesis tracking navigation strategy, maintaining a set of hypotheses about the robot’s position that are all updated in parallel [21,40]. It serves to build a dense topological map [20], in which nodes store the allothetic data that the robot can perceive at the corresponding places in the environment. A link between two nodes memorizes how far off and in what direction the corresponding places are positioned relative to each other, as measured by the robot’s idiothetic sensors. The robot’s position is represented by an activity distribution over the nodes, the activity level of a given node representing the probability that the robot is currently located at the corresponding position (Fig. 3). As the robot moves in its environment, the activity level of neurons in the map changes accordingly. Not only is this activity propagation within the map coherent with the robot’s actual moves (Fig. 4), but it also affords useful relocalization capacities when the robot is passively moved from one place to another (Fig. 5).

Fig. 3. While exploring the environment shown on the left, the robot built the cognitive map shown on the right. This map is made of interconnected neurons (the corresponding links are not shown) whose activation level depends upon what the robot perceives in its surroundings. Thus, when the robot is situated in zone A of its environment, according to the various landmarks it perceives from this place, some neurons in its map become more activated than others, thus affording the robot the capacity of locating itself.

Fig. 4. Activity updates within the map as the robot moves to successive places in its environment. Labels a, b, ..., e indicate both the actual position of the robot and the corresponding map activity. The grey level of each small node in a map indicates its activity, ranging from 0 for white nodes to 1 for black nodes. Larger black dots indicate the robot’s most probable current localization.

Fig. 5. Activity updates within the map after a translocation process during which the robot has been moved passively from place a to place b in the environment. After a few mistakenly recognized positions (b–d), the robot correctly recognizes its actual position (e).

Fig. 6. A single channel within the basal ganglia in the GPR model. D1 and D2: striatal neurons with different dopamine receptors; STN: subthalamic nucleus; EP/SNr: entopeduncular nucleus and substantia nigra pars reticulata; GP: globus pallidus. Solid arrows represent excitatory connections, dotted arrows represent inhibitory connections.

In [20,22], this navigation model has been implemented on a Pioneer 2 mobile robot and proved to be efficient in the unprepared environment of an ordinary laboratory, notably for detour planning.

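
The activity update over the topological map described in Section 3.1 can be illustrated by a minimal sketch. It is not the implementation reported in [20,22]: the Gaussian weighting of the odometric error, the `sigma` and `floor` parameters, and the names `update_activity`, `links` and `match` are all assumptions introduced here for illustration. Each step propagates activity along the links whose stored displacement agrees with the measured move (idiothetic prediction), weights every node by how well its stored view matches the current percept (allothetic correction), and normalizes the result so that activities can be read as probabilities.

```python
import math


def update_activity(activity, links, move, match, sigma=0.5, floor=1e-6):
    """One predict/correct step over a topological map (illustrative sketch).

    activity: dict node -> probability that the robot is at that node
    links:    dict (src, dst) -> (dx, dy), displacement stored on the link
    move:     (dx, dy) displacement measured by odometry since the last step
    match:    dict node -> similarity in [0, 1] between the current percept
              and the allothetic data stored in the node
    sigma, floor: hypothetical tuning parameters, not taken from the paper
    """
    # Prediction: a small floor keeps every hypothesis alive, which is what
    # allows recovery after a passive translocation.
    predicted = {node: floor for node in activity}

    # Hypothesis "the robot stayed on the same node".
    stay = math.exp(-math.hypot(*move) ** 2 / (2 * sigma ** 2))
    for node, a in activity.items():
        predicted[node] += a * stay

    # Hypothesis "the robot followed a link": weight by odometric agreement.
    for (src, dst), (dx, dy) in links.items():
        err = math.hypot(move[0] - dx, move[1] - dy)
        predicted[dst] += activity[src] * math.exp(-err ** 2 / (2 * sigma ** 2))

    # Correction: weight each node by the match with what is perceived.
    posterior = {n: p * match.get(n, 0.0) for n, p in predicted.items()}
    total = sum(posterior.values())
    if total == 0.0:
        # Nothing matches at all: fall back to a uniform distribution.
        return {n: 1.0 / len(activity) for n in activity}
    return {n: p / total for n, p in posterior.items()}


if __name__ == "__main__":
    # Three places aligned one metre apart (hypothetical toy map).
    links = {("A", "B"): (1.0, 0.0), ("B", "A"): (-1.0, 0.0),
             ("B", "C"): (1.0, 0.0), ("C", "B"): (-1.0, 0.0)}
    activity = {"A": 1.0, "B": 0.0, "C": 0.0}
    activity = update_activity(activity, links, (1.0, 0.0),
                               {"A": 0.1, "B": 0.9, "C": 0.1})
    print(max(activity, key=activity.get), activity)
```

In this toy map, a one-metre move from A shifts the activity peak to B. If the activity is instead concentrated on A while the percepts repeatedly match C, as after a passive translocation, the peak migrates to C within a few steps, which is the qualitative behavior illustrated in Figs. 4 and 5.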

3.2. Action selection

To survive, the rat must be able to solve the so-called action selection problem, i.e., it must be able to decide at every moment what to do next in the service of its needs. Some of the circuits involved in this task are known to be located in basal ganglia–thalamus–cortex loops and have inspired the GPR model designed by Gurney et al. [31]. Basically, this model is implemented as a network of leaky-integrator neurons, and assumes that the numerous segregated channels observed in each nucleus of the basal ganglia each correspond to a discrete motor action (the granularity of which has still not been deciphered) that is inhibited by default and thus prevented from being executed (Fig. 6). Inputs to these channels are so-called saliences that take into account both internal and external perceptions to assess the relevance of each action with respect to the robot’s needs. A positive feedback loop involving the thalamus serves to introduce some persistence in such assessments. Two parallel selection and control circuits within the basal ganglia act to modulate interactions between channels. Finally, at the output of these circuits, the action that is the least inhibited by others is selected and allowed to be executed by the motor system.

This model has been implemented in a Lego robot whose task was to select efficiently between four actions (wandering, avoiding obstacles, “feeding” and “resting”) in order to “survive” in an environment where it could find “food” and “rest” places [25,26] (Fig. 7).

Experimental results demonstrate the model’s ability to promote survival, in the sense that it permanently keeps two essential variables [4] above lethal levels: Potential Energy (obtained via “feeding”) and Energy (converted from Potential Energy via “resting”). Moreover, the model ensures clean and efficient switching between actions and, because it adds a persistence loop to a classical winner-takes-all (WTA) architecture, it maintains the Potential Energy level at its maximum charge more often (25% of the time) than in the absence of such a loop (less than 10% of the time) (Fig. 8).

However, the robot’s survival depends on its chances of getting to the right place at the right moment, i.e., to a food place when its Potential Energy level is low, or to a rest place when it lacks Energy. Obviously, additional adaptive capacities would depend on the robot’s aptitude to record the position of such places on its map and to use this map to reach such places when needed. This
has been made possible thanks to a model combining navigation and action selection capacities.

Fig. 7. Left: The environment showing “food” (A) and “rest” (B) places. Right: A Lego robot equipped with light sensors (A) and bumpers (B).

Fig. 8. Histogram showing the percentage of overall time during which Potential Energy is reloaded at the values shown on the abscissa.

3.3. Navigation and action selection

The connection of the previously described navigation and action selection models and their implementation on a simulated robot were inspired by recent hypotheses concerning the role of dedicated structures within the basal ganglia, the nucleus accumbens in particular, and the interaction of basal ganglia–thalamus–cortex loops in the rat’s brain [24,26]. The corresponding model (Fig. 9) basically involves two such loops: a ventral loop that selects locomotor actions, like moving north or east, and a dorsal loop that selects non-locomotor actions, like feeding or resting. Each of these loops has been modeled as a GPR system like the one shown in Fig. 6. The STN of the dorsal loop provides the interconnection between them because it sends excitatory projections to the output of the ventral loop. Consequently, when the dorsal loop is active and triggers some non-locomotor action, the excitatory signal that is sent towards the ventral loop raises the inhibition level of every locomotor action and prevents it from being selected. Hence the robot cannot move and eat at the same time.

Fig. 9. Interconnection of the ventral and dorsal loops in the basal ganglia. The ventral loop selects locomotor actions, the dorsal loop selects non-locomotor actions. The latter subsumes the former.

As usual, saliences in both loops depend upon both internal and external perceptions. However, saliences in the ventral loop also depend upon four direction profiles (Fig. 10) that are generated by two different navigation strategies, i.e., a simple guidance (or taxon) strategy, and a more elaborate topological navigation strategy [51]. This allows the robot either to be attracted by an object that it directly perceives (guidance profile) or to move towards a region where such an object is located in its map (planning profile). The latter possibility puts some constraints on action selection because the robot is committed to regularly returning to previously mapped areas in its environment in order to check the accuracy of the current map (homing profile). This need
is expressed by a Disorientation variable managed by the model, which increases when the robot enters unexplored areas, and decreases when it returns to known areas. When this variable is not too high, the robot is motivated to explore its environment (exploration profile). This model has been implemented in a version that manages 36 locomotor actions, i.e., moving in each of 36 possible directions, and two non-locomotor actions, i.e., reloading actions on food and rest places that change the robot’s Energy and Potential Energy levels. In the simplified example described by Fig. 11, only locomotor actions are selected because external perceptions are not strong enough to suppress the inhibition of reloading actions on resource spots A or B. Because the level of the B battery is the lowest, the robot’s main motivation is to move north, according to one of the directions advocated by the current planning profile. In all probability, the robot following this navigation strategy will later turn east, then south and, when it gets close to the B object, this fact will be detected by its sensors, a refueling action will be triggered, and an excitatory signal will be sent to the ventral loop in order to inhibit locomotor actions during this period.

Fig. 10. Three direction profiles (out of the four used by the model) that call upon the current map of the environment. According to the planning profile, the robot is motivated to move in two broad directions that correspond to two resources recorded in its map. According to the homing profile, the robot is motivated to return to already explored regions of the environment. According to the exploration profile, the robot is motivated to wander in as yet unexplored regions of its environment.

Fig. 11. The model integrating navigation and action selection calls upon two basal ganglia–thalamus–cortex loops. Each loop is managed by a GPR model, and the coordination between loops is provided by the subthalamic nucleus of the dorsal loop, which is connected to the ventral loop (connection not shown here). The dorsal loop selects one of the two possible reloading actions, the ventral loop selects one of the 36 directions of motion (simplified here to four cardinal directions, whereas the homing and exploration input profiles are not shown). Inhibitory connections are represented by dotted arrows, excitatory connections by solid arrows.

Fig. 12. Two environments used to test the connection of navigation and action selection models. E: “rest” place, Ep: “food” place. Left: In this environment, it is impossible to see one resource place from the other. Right: The resource at Ep2 is not present during the elaboration of the map.

Several experiments have been performed to test the capacities of this control architecture to ensure survival by maintaining the robot’s essential variables above lethal levels. In particular, it has been shown that, in the environment on the left of Fig. 12, a robot (robot A) calling upon both topological navigation and guidance strategies survived longer than another (robot B)
relying on guidance only (Table 1). Hence, being able to build a map of an unknown environment and to use this map to navigate between places E and Ep where Energy and Potential Energy could be acquired enhanced the robot’s survival. A new action being selected every 0.87 s on average, survival times equal to 4 h, the maximum simulation length, were attained. However, on some occasions, premature deaths occurred because of the difficulty of elaborating a map from scratch that was precise enough to help the robot reach the right resource before exhaustion.

Table 1
Statistical comparison of survival times measured along 10 runs for two robots tested in the environment on the left of Fig. 12 over a 6-h interval

                      Survival times (s)
                      Median       Range
Robot A               14431.5      2531:17274
Robot B               4908.0       2518:8831
Mann–Whitney U test   U = 15       P < 0.01

Likewise, in the environment on the right of Fig. 12, it has been shown that, if place Ep1 is the only one previously encountered by the robot and recorded on its map, the robot may decide to move towards that place to reload its Potential Energy. However, if on its way it detects the proximity of another food place like Ep2, it will occasionally give up navigating towards Ep1 and opportunistically divert via Ep2. Then, having consumed the corresponding resource, it will register the position of Ep2 on its map. Thus, next time it needs to reload its Potential Energy, it will have the choice of navigating towards Ep1 or Ep2. Table 2 shows that such opportunistic behavior depends upon the respective weights that were assigned to topological navigation and guidance profiles and that served to compute saliences, i.e., upon the respective attractiveness of a resource perceived directly and one whose location is recorded on the map.

Table 2
Number of opportunistic detours towards Ep2 out of 15 runs in the environment on the right of Fig. 12

Number of Ep sources   Weight of topological navigation   Weight of guidance   Opportunistic detours towards Ep2
1                      0.65                               0.55                 0/15
2                      0.65                               0.55                 2/15
2                      0.55                               0.55                 8/15
2                      0.45                               0.55                 13/15

These numbers increase when greater weights are given to guidance than to topological navigation.

In the environment of Fig. 13, the robot has the choice between two trajectories leading to a “food” place. The first one is shorter but entails passing through a “dangerous” place. The second one is longer, but safer. Experiments show that the robot is able to decide to navigate via the longer path when its Potential Energy level is not so low that a long journey would compromise its survival, but that otherwise it chooses the shorter path, risking having to face the potential danger recorded on its map. Path choice depends on the respective weights of two internal states respectively related to the need for Potential Energy and to Fear (Table 3).

Fig. 13. Two trajectories leading to “food” places (Ep). One is shorter than the other, but entails passing through a dangerous place (Danger).

Table 3
Number of trips via the short and long paths out of 20 runs in the Fig. 13 environment

Fear level   Potential Energy level   Short path   Long path
0.2          0.1                      13           7
0.2          0.5                      2            18

The greater the need for Potential Energy, the more often the short path is preferred.

Finally, in the complex and challenging environment of Fig. 14, in which several possibilities exist of getting lost, of encountering dangers, of discovering newly created resources, and of reaching places in which a given resource is no longer available, experiments show that the robot survives autonomously for long periods, thanks to the many adaptive mechanisms and behaviors that have just been described [24]. The longest survival time thus obtained was 21 h of simulated time. In this case, the major cause preventing indefinite survival does not seem to be the navigational issues mentioned above, but rather an intrinsic drawback in the GPR model, which gets locked in particular circumstances, with no action being disinhibited enough to be selected. Current work aims at suppressing such limitations and at further enhancing the robot’s lifetime.

Fig. 14. A complex environment with four “rest” places (E), four “food” places (Ep) and two dangerous places (ZD). Resource E1 appears and disappears every 30 min.

3.4. Learning

In an unknown environment, a rat is able to explore it and to incrementally build a map that describes its topology. Such associative learning, which combines both allothetic and idiothetic data, was implemented in the navigation model described above.

However, a rat is also able to improve its behavior over time through reinforcement learning, i.e., thanks to adaptive mechanisms that increase its chances of exhibiting behaviors leading to rewards and that lower those of behaviors leading to punishments. Concerning action selection, a recently debated hypothesis [6,34,46] postulates that such mechanisms could be mediated by dopamine signals within so-called actor-critic architectures. According to this hypothesis, a GPR module could play the role of an actor, while a critic module could call upon a dopamine reinforcement signal $\check{r}_t$ that is assumed to evaluate both the episodic primary reward signal $r_t$ occasionally generated by the robot’s actions, and a secondary signal computed as the difference $g P_t - P_{t-1}$ between currently expected and future rewards ($g$ being a discount factor which determines how far in the future expected rewards are taken into account). This reinforcement signal would be used in the GPR module to adapt the way saliences are computed in order to select the most appropriate action, i.e., the action most likely to maximize the reward it will lead to (Fig. 15).

Fig. 15. An actor-critic model of reinforcement learning. In this version, the actor module is a GPR model (involving matrisomes in the dorsal striatum) that is segregated into different channels, with saliences as inputs and actions as outputs. The critic module (involving striosomes in the dorsal striatum) propagates towards the actor module an estimate $\check{r}$ of the instantaneous reinforcement triggered by the selected action. Th: Thalamus; SNc: substantia nigra pars compacta.

Several variants of this learning architecture [36] have been implemented in a simulated robot that must learn in a plus-maze, and through successive trials, which action to perform in order to get to the end of a corridor where a door may provide access to a reward (Fig. 16). This setting reproduces an experiment performed with real rats [1] that must learn to reach the center of the maze, to seek which door gets lighted by an overhanging lamp, to move to that door and gain access to a water dispenser, to return to the center, and so on.

Fig. 16. The plus-maze in which the robot has to learn which action to perform in order to reach a rewarding resource that is delivered at a given end. The doors at three such ends are colored dark-grey and do not deliver any reward, whereas the fourth one is colored white and leads to the resource. The center of the maze is characterized by light-grey walls, corridors by black ones. The position of the rewarding extremity is chosen at random at the beginning of each new trial, when the robot succeeds in returning to its starting point, i.e., at the intersection of the four corridors. The robot is equipped with a visual system that serves to detect the colors, bearings and distances of walls and doors, and with infrared sensors that trigger low-level obstacle-avoidance reflexes.

To learn this task, 12 variables describing the robot’s visual input were used by the GPR to select among six possible actions (drink, move forward, turn to white, turn to light-grey, turn to dark-grey, do nothing; see Fig. 15). Hence, instead of being hand-crafted as in the experiments described in Sections 3.2 and 3.3, the saliences of the action selection module were self-adapted so as to maximize the incoming rewards.

Experimental results demonstrate that the most effective learning architecture is obtained when the whole maze is arbitrarily partitioned into several zones, and when a specific actor-critic pair learns the right action to be accomplished in each zone. This architecture, inspired by [48] and [14], yielded the learning curve shown in Fig. 17, which compares favorably with the best hand-crafted action selection module that we have been able to design. As expected, after learning, the turn to dark-grey and do nothing actions were never selected. Future work will aim at self-adapting the arbitrary partition that has been used here.

Fig. 17. Evolution over successive trials of the number of actions that the robot triggers before getting to the reward.

4. Discussion

It is clear from recent reviews [5,33,52] that, although many research efforts have been devoted to the design of biomimetic sensors or effectors for robots, relatively little work has been done on control system architectures, and what has been done has focused primarily on invertebrate models. Only a few groups are currently building biomimetic robot control architectures modeled on mammalian nervous systems and, moreover, their efforts are often centered on isolated behaviors, like locomotion in cats [44] or feeding in mice [29], which are not dealt with in an integrated perspective. To the best of our knowledge, the scope of the Psikharpax project is unique. First, it draws inspiration from a vertebrate instead of an invertebrate. Second, it aims at designing both biomimetic sensors and control architectures. Third, because it capitalizes on a dedicated robotic platform, it will integrate a variety of sensors, actuators and control systems making it possible to assess its adaptive capacities in much more challenging circumstances than those that characterize seemingly comparable biomimetic robotic approaches [11,23,32,41,49].

However, from the point of view of biological realism, several improvements could be made to the models just described. In particular, the current navigation model could be replaced by a version reproducing more faithfully the anatomy and physiology of the hippocampus and related areas (e.g., [50,3]), or several hypotheses about the way cortical columns are connected to these areas and afford planning capacities (e.g., [10,37]) could be explored. Likewise, other hypotheses about how the navigation and action selection models could be connected should be implemented and compared to the control architecture described herein. For instance, instead of selecting mere locomotor actions from the saliences of several direction profiles, the ventral loop in the basal ganglia might play a higher-level role and use current internal and external perceptions to select the most appropriate among the possible navigation strategies [43]. Finally, there are many possibilities for extending the results obtained so far on learning. Actor-critic models calling upon dopaminergic neurons should be compared to concurrent hypotheses involving the role of glutamate [45] in reinforcement learning. Also, useful distinctions are probably to be made between habit learning and goal-directed learning [12]. Finally, the role of neuromodulators in the setting of learning meta-parameters should be investigated, as suggested in [15].

From an engineering point of view, the system described above raises several important issues. Considering that living systems are the product of roughly 3.5 billion years of evolutionary tinkering [35], one may wonder if there is the slightest chance that artificial systems exhibiting adaptive behaviors of a comparable efficiency might call upon simplistic neurons

ing which solution, the natural or the artificial, is better adapted to which decision problem.

From the point of view of robotics, the Psikharpax project substantially extends the scope of traditional approaches to robot control because it will involve a wide variety of behaviors, because it will implement different learning mechanisms, and because it is not exclusively devoted to tasks that serve humans. As noted in [2], the majority of research conducted so far on autonomous mobile robots has concentrated on developing vehicles that exhibit a single type of behavior supporting the specific task of moving between two positions in the environment while avoiding collisions with obstacles. Clearly such limited capacities may not afford the degree of autonomy required in situations where human intervention is undesirable or impossible, and this raises the question of how the concept of “self-autonomy”, i.e., behavior which may be characterized as supporting self-survival, differs from that of “imposed autonomy”, i.e., behavior which does not benefit the robot but fulfils some desired task which we impose upon the system. In this perspective, being able to integrate the past (through its recorded map), the present (through its sensors) and the future (through its planning capacities), Psikharpax will represent an embod-

437 (e.g., threshold-gate models) and homogeneous archi- ied example of a motivationally autonomous animat 485

438 tectures (e.g., perceptrons). In other words, if nature has whose control complexity may well challenge the pos- 486

439 invented the highly complex and still imperfectly un- sibilities of external control and, hence, its capacities 487
D

440 derstood processes at work in real neurons and, if it has to withstand any imposed autonomy [18]. Conversely, 488

441 connected them in highly heterogeneous networks, it following Moravec [42], one may hope that its mobil- 489

is probably for good reasons, worth deciphering. Like- ity, its sensor equipment, and its ability to carry out
TE

442 490

443 wise, there are certainly good reasons why the output survival-related tasks in a dynamic environment will 491

444 neurons of so many structures in the brain (e.g., the provide a necessary basis for surpassing the current 492

445 cerebellum, the frontal and prefrontal cortex, the basal “reptile-stage” of robots and eventually reaching the 493

446 ganglia, the striatum, etc.) play an inhibitory role. As general competence and “true” intelligence of human 494
EC

447 described by Berthoz, decision in the brain is often the beings. 495

448 selective suppression of actions that are irrelevant with


449 respect to the goal, to the context, to past experience. “I
450 think, therefore I inhibit. [. . .] Whereas engineers tend 5. Conclusion 496

451 to formalize the problem of action selection in terms


RR

452 of probabilities of winning more than loosing, nature’s The Psikharpax project aims at designing an artifi- 497

453 solution is in terms of excitation/inhibition. Underly- cial rat able to “survive” in a laboratory populated by 498

454 ing such competition, there is a wealth of possibilities humans and other robots. It has two main objectives: 499

455 that we are a long way from understanding, but that far better understand the control mechanisms of rats, im- 500

456 transcends the cold estimate of probability calculus, prove the autonomy of robots. The robot Psikharpax 501

how Bayesian it may be. A new neurocognitive the- will be endowed with many sensors and motors that
CO

457 502

458 ory of decision remains to be elaborated” [7]. Again, are currently under development and that will serve 503

459 such considerations lead to the question of understand- to implement various reflexes. Its control architecture 504


has already been tested in simulation and implemented on simpler versions of the future robot. In particular, models for navigation and action selection – which afford capacities of associative and reinforcement learning – have been successfully tested. It thus appears that Psikharpax will be able to explore an unknown environment, to build a topological map of it, and to plan trajectories to places where it will fulfill various internal needs, like “eating”, “resting”, “exploring” or “avoiding danger”. The first version of such an efficient robot is expected to be available at the end of year 2005: still a long way to the “whole rat” that Dennett might have advocated [13]. An even longer way to the intelligence of man.

Uncited references

[8,27].

Acknowledgments

This research was funded by the Project Robotics and Artificial Entities (ROBEA) of the Centre National de la Recherche Scientifique, France. The authors wish to thank David Filliat, Loïc Lachèze, and Sidney Wiener for their valuable contribution to the project.

References

[1] S.V. Albertin, A.B. Mulder, E. Tabuchi, M.B. Zugaro, S.I. Wiener, Lesions of the medial shell of the nucleus accumbens impair rats in finding larger rewards, but spare reward-seeking behavior, Behavioral Brain Research 117 (1/2) (2000) 173–183.
[2] T. Anderson, M. Donath, Animal behaviour as a paradigm for development robot autonomy, Robotics and Autonomous Systems 6 (1990) 145–168.
[3] A. Arleo, M. Gerstner, Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity, Biological Cybernetics 83 (2000) 287–299.
[4] W. Ashby, Design for a Brain, Wiley, New York, 1952.
[5] Y. Bar-Cohen, C. Breazeal, Biologically-inspired Intelligent Robots, SPIE, Washington, 2003.
[6] A.G. Barto, Adaptive critics and the basal ganglia, in: Models of Information Processing in the Basal Ganglia, The MIT Press, Cambridge, 1995, pp. 215–232.
[7] A. Berthoz, La décision, Odile Jacob, Paris, 2003.
[8] R. Brooks, A robust layered control system for a mobile robot, IEEE Robotics and Automation 2 (1) (1986) 14–23.
[9] R. Brooks, Cambrian Intelligence. The Early History of the New AI, The MIT Press, Cambridge, 1999.
[10] Y. Burnod, An Adaptive Neural Network: The Cerebral Cortex, Masson, Paris, 1993.
[11] G. Capi, E. Uchibe, K. Doya, Selection of neural architecture and the environment complexity, in: Proceedings of the Third International Symposium on Human and Artificial Intelligence Systems: Dynamic Systems Approach for Embodiment and Sociality, Fukui, Japan, 2002, pp. 231–237.
[12] P. Dayan, B.W. Balleine, Reward, motivation, and reinforcement learning, Neuron 36 (2002) 285–298.
[13] D.C. Dennett, Why not the Whole Iguana? Behavioral and Brain Sciences 1 (1978) 103–104.
[14] K. Doya, K. Samejima, K. Katagiri, M. Kawato, Multiple model-based reinforcement learning, Neural Computation 14 (6) (2002) 1347–1369.
[15] K. Doya, Metalearning and neuromodulation, Neural Computation 12 (1) (2003) 219–245.
[16] H.L. Dreyfus, What Computers Can’t Do: The Limits of Artificial Intelligence, Harper and Row, 1972.
[17] H.L. Dreyfus, What Computers Still Can’t Do: A Critique of Artificial Reason, The MIT Press, Cambridge, 1992.
[18] D. Mc Farland, T. Bösser, Intelligent Behavior in Animals and Robots, The MIT Press, Cambridge, 1993.
[19] E.A. Feigenbaum, J. Feldman, Computers and Thought, McGraw-Hill, 1963.
[20] D. Filliat, J.-A. Meyer, Active perception and map learning for robot navigation, in: From Animals to Animats 6, Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2000, pp. 246–255.
[21] D. Filliat, J.-A. Meyer, Map-based navigation in mobile robots. I. A review of localization strategies, Journal of Cognitive Systems Research 4 (4) (2003) 243–282.
[22] D. Filliat, J.-A. Meyer, Global localization and topological map learning for robot navigation, in: From Animals to Animats 7, Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2002, pp. 131–140.
[23] P. Gaussier, S. Leprêtre, M. Quoy, A. Revel, C. Joulain, J.P. Banquet, Experiments and models about cognitive map learning for motivated navigation, Robotics and Intelligent Systems 24 (2000) 53–94.
[24] B. Girard, Intégration de la navigation et de la sélection de l’action dans une architecture de contrôle inspirée des ganglions de la base, Ph.D. Thesis, University of Paris 6, France, 2003.
[25] B. Girard, V. Cuzin, A. Guillot, K. Gurney, T.J. Prescott, Comparing a bio-inspired robot action selection mechanism with winner-takes-all, in: From Animals to Animats 7, Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior, The MIT Press, Cambridge, 2002, pp. 75–84.
[26] B. Girard, V. Cuzin, A. Guillot, K. Gurney, T.J. Prescott, A basal ganglia inspired model of action selection evaluated in a robotic survival task, Journal of Integrative Neuroscience 2 (22) (2003) 179–200.
[27] B. Girard, D. Filliat, J.-A. Meyer, A. Berthoz, A. Guillot, An integration of two control architectures of action selection and navigation inspired by neural circuits in the vertebrates: the basal ganglia, in: Connectionist Models of Cognition and Perception II, Proceedings of the Eighth Neural Computation and Psychology Workshop, World Scientific, 2004, pp. 72–81.
[28] S. Gourichon, J.-A. Meyer, P. Pirim, Using colored snapshots for short-range guidance in mobile robots, International Journal of Robotics and Automation 17 (4) (2002) 154–162.
[29] A. Guillot, J.-A. Meyer, A test of the Booth energy flow model (Mark 3) on feeding patterns of mice, Appetite 8 (1987) 67–78.
[30] A. Guillot, J.-A. Meyer, The animat contribution to cognitive systems research, Journal of Cognitive Systems Research 2 (2) (2001) 157–165.
[31] K.N. Gurney, T.J. Prescott, P. Redgrave, A computational model of action selection in the basal ganglia. I. A new functional anatomy, Biological Cybernetics 84 (2001) 401–410.
[32] V.V. Hafner, M. Fend, M. Lungarella, R. Pfeifer, P. König, K.P. Körding, Optimal coding for naturally occurring whisker deflections, in: Proceedings of the Joint International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP), Springer Lecture Notes in Computer Science, 2003, pp. 805–812.
[33] O. Holland, D. McFarland, Artificial Ethology, Oxford University Press, Oxford, 2003.
[34] J.C. Houk, J.L. Adams, A.G. Barto, A model of how the basal ganglia generate and use neural signals that predict reinforcement, in: Models of Information Processing in the Basal Ganglia, The MIT Press, Cambridge, 1995, pp. 249–270.
[35] F. Jacob, Evolution and tinkering, Science 196 (1977) 116–166.
[36] M. Khamassi, B. Girard, A. Berthoz, A. Guillot, Comparing three critic models of reinforcement learning in the basal ganglia connected to a detailed actor in a S-R task, in: Proceedings of the Eighth Intelligent Autonomous Systems Conference, IOS Press, 2004, pp. 430–437.
[37] R.A. Koene, A. Gorchetchnikov, R.C. Cannon, M.E. Hasselmo, Modeling goal-directed spatial navigation in the rat based on physiological data from the hippocampal formation, Neural Networks 16 (5/6) (2003) 577–584.
[38] J.-A. Meyer, The animat approach to cognitive science, in: Comparative Approaches to Cognitive Science, The MIT Press, Cambridge, 1995, pp. 27–44.
[39] J.-A. Meyer, Artificial life and the animat approach to artificial intelligence, in: Artificial Intelligence, Academic Press, London, 1996, pp. 325–354.
[40] J.-A. Meyer, D. Filliat, Map-based navigation in mobile robots. II. A review of map-learning and path-planning strategies, Journal of Cognitive Systems Research 4 (4) (2003) 283–317.
[41] F. Montes Gonzalez, T.J. Prescott, K. Gurney, M. Humphries, P. Redgrave, An embodied model of action selection mechanisms in the vertebrate brain, in: From Animals to Animats 6, Proceedings of the Sixth International Conference on Simulation of Adaptive Behaviour, The MIT Press, Cambridge, 2000, pp. 157–166.
[42] H.P. Moravec, Robot: Mere Machine to Transcendent Mind, Oxford University Press, Oxford, 1998.
[43] A.B. Mulder, E. Tabuchi, S.I. Wiener, Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation, European Journal of Neuroscience 19 (7) (2004) 1923–1932.
[44] A.E. Patla, T.W. Calvert, R.B. Stein, Model of a pattern generator for locomotion in mammals, American Journal of Physiology: Regulatory, Integrative and Comparative Physiology 248 (1985) 484–494.
[45] C.M.A. Pennartz, B.L. McNaughton, A.B. Mulder, The glutamate hypothesis of reinforcement learning, Progress in Brain Research 126 (2000) 231–253.
[46] W. Schultz, P. Dayan, P.R. Montague, A neural substrate of prediction and reward, Science 275 (1997) 1593–1599.
[47] H.A. Simon, Models of Man, Wiley, New York, 1957.
[48] R.E. Suri, W. Schultz, Temporal difference model reproduces anticipatory neural activity, Neural Computation 13 (2001) 841–862.
[49] D. Touretzky, L. Saksida, Operant conditioning in skinnerbots, Adaptive Behavior 5 (3/4) (1997) 219–247.
[50] O. Trullier, J.-A. Meyer, Animat navigation using a cognitive graph, Biological Cybernetics 83 (3) (2000) 271–285.
[51] O. Trullier, S.I. Wiener, A. Berthoz, J.-A. Meyer, Biologically-based artificial navigation systems: review and prospects, Progress in Neurobiology 51 (1997) 483–544.
[52] B. Webb, T.R. Consi, Biorobotics: Methods and Applications, The MIT Press, Cambridge, 2001.

Jean-Arcady Meyer is Research Director at CNRS and heads the AnimatLab within the Laboratoire d’Informatique de Paris 6 (LIP6) in Paris. He trained as an engineer, graduated in Human and Animal Psychology, and received his PhD in Biology. He is the founder of the journal Adaptive Behavior and a former Director of the International Society for Adaptive Behavior. His primary scientific interests are in adaptive behaviors in natural and artificial systems. He is the main coordinator of the Psikharpax project.

Agnès Guillot is Associate Professor of Psychophysiology at Paris X University. She graduated in Human and Animal Psychology, and holds PhDs in Psychophysiology and Biomathematics. Her main scientific interests are in action selection in animals and robots. She coordinates the biomimetic modelling of Psikharpax.


Benoît Girard is a Postdoctoral Fellow at the Laboratoire de Physiologie de la Perception et de l’Action (LPPA) at the Collège de France. He trained as an engineer and received his PhD in Computer Science from the University of Paris 6. His current research interests are focussed on computational models of action selection in the rat brain, and notably in the control of oculomotor saccades.

Mehdi Khamassi is working both at the AnimatLab and at the LPPA as a PhD student. He trained as an engineer and received a Master of Science degree in Cognitive Science from the University of Paris 6. His current research interests include electrophysiology experiments and computational modelling of learning processes in the rat brain.

Patrick Pirim trained as an engineer. He founded BEV, a company that brings fully developed new technologies to the marketplace. He invented the GVPP chip that is dedicated to low-level real-time signal processing. He coordinates the sensory-motor design of Psikharpax.

Alain Berthoz trained as an engineer. He graduated in Human Psychology and received his PhD in Biology. He is Professor at the Collège de France, where he heads the Laboratoire de Physiologie de la Perception et de l’Action (LPPA). He is a member of many academic societies, an expert sitting on several international committees, and he has been awarded many prizes. His main scientific interests are in the multisensory control of gaze, of equilibrium, of locomotion and of spatial memory. He coordinates the neurophysiological experiments that inspire the Psikharpax project.
