Reinforcement Learning
R. S. Sutton and A. G. Barto
Cambridge, MA: MIT Press, 1998.
Hardbound, $40.00. ISBN 0-262-19398-1
Reviewed by C. R. Gallistel
Reinforcement learning, as understood by Sutton and Barto, is a fusion of the trial-and-error “law-of-effect” tradition in psychology, optimal control theory in engineering, the secondary reinforcement tradition in learning, and the use of decaying stimulus traces in, for example, Hull’s (1952) concept of a goal gradient and, more recently, Wagner’s (1981) model of conditioning. This fusion has given researchers in artificial intelligence a number of ideas for computer algorithms that learn a policy that maximizes the agent’s long-term return (amount of reward) from the performance of a task. Although many of the ideas behind reinforcement learning originated in psychological theorizing, in recent years these ideas have been most extensively developed within the artificial learning community, particularly by the authors of this important summary, their students, and colleagues. The book is intended for use in a one-semester course for students interested in machine learning and artificial intelligence. It would probably not be suitable for a course intended for psychology and neuroscience students, because it does not present models of experimentally established behavioral or neuroscientific phenomena, and the problems given to illustrate how reinforcement learning algorithms may be applied are not necessarily problems that animals (even human animals) are notably good at solving (e.g., efficient scheduling problems). However, reinforcement learning, incentive, and utility remain central concepts in contemporary work on the neurobiological basis of learned, goal-directed behavior (Schultz, Dayan, & Montague, 1997; Shizgal, 1997), and this book is the place to look for the latest ideas on how these concepts may be developed into effective models for the direction of action.

The preface says that the only mathematical background assumed is familiarity with elementary concepts of probability, such as expectations of random variables. Most students will feel that rather more than that is in fact assumed. Nonetheless, the material is presented in an intuitively understandable form, emphasizing the basic ideas and giving helpful illustrations of their application, rather than elaborating proofs. It can be read with profit by motivated neuroscience and psychology students interested in a more rigorous development of these psychologically important ideas. The bibliographical and historical sections at the end of each chapter are useful for the perspective they give. The many suggested exercises are challenging, open-ended, and thought provoking.

Basic Concepts in Reinforcement Learning

The reward function is the objective feedback from the environment. Rewards are integer or scalar variables associated with some states or state-action pairs. This association (the reward function) defines the goal of the agent in a given situation. For example, a reward of 1 might be associated with the state of having moved all one’s pieces off the board in a backgammon game, while a reward of 0 is associated with all other states of the game (all the states leading up to the winning state). The agent’s sole objective is to maximize net long-term reward (e.g., in a backgammon-playing agent, number of games won). The reward function defines what is objectively good and bad for the agent. The reward function is unalterable by the agent. In traditional psychological terms, the reward function specifies the states associated with primary reinforcement.
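To make the backgammon example concrete, a minimal sketch of such a reward function might look like this (the state is simplified, purely for illustration, to the number of the agent’s pieces still on the board):

    def reward(state):
        # 'state' here is just the number of the agent's pieces still on the
        # board, a deliberately simplified stand-in for a full backgammon
        # position. The reward function is fixed by the environment, not
        # learned: it returns 1 only for the winning state (no pieces left)
        # and 0 for every other state.
        return 1 if state == 0 else 0

Everything the agent learns, it learns in the service of maximizing the long-run sum of the numbers such a function returns.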
The concepts of state and action are very general. An action is any decision an agent might need to learn how to make, and a state is any factor that the agent might take into consideration in making that decision. The state on which an action is predicated may include a model of the environment. This model might represent past states of the actor’s environment. This representation would be considered part of the agent’s state at the time it decides on an action. Thus, reinforcement learning, unlike the more extreme (or pure) forms of neural net learning, is not necessarily asymbolic. It seems to be, at least in principle, neutral on the issue of what knowledge is and how it determines action.

The agent’s policy (the policy function) is the mapping from possible states to possible actions. The mapping may be a simple look-up table, what is sometimes called a stored policy. A psychologist would call it a set of stimulus-response associations. Alternatively, the agent may rely on a computed policy. A computed policy may involve a search through an ever-changing tree of values, using a model of the environment to look for action sequences that produce the greatest value.
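The distinction between a stored and a computed policy can be caricatured in a few lines of Python (the states, actions, and values below are invented for illustration; a real computed policy would typically search many steps deep rather than one):

    # A stored policy: a look-up table mapping each state directly to an
    # action, the computational analogue of a set of stimulus-response
    # associations.
    stored_policy = {"s1": "left", "s2": "right", "s3": "left"}

    # A computed policy stores nothing per state; the choice is derived at
    # decision time from a model of the environment and the current value
    # estimates. Here 'model' maps (state, action) to the predicted next state.
    model = {("s1", "left"): "s2", ("s1", "right"): "s3"}
    value = {"s2": 0.7, "s3": 0.2}

    def computed_policy(state):
        # Choose the action whose predicted next state has the greatest value.
        actions = [a for (s, a) in model if s == state]
        return max(actions, key=lambda a: value[model[(state, a)]])

    print(stored_policy["s1"])    # -> 'left'
    print(computed_policy("s1"))  # -> 'left', because s2 is valued above s3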
The value function is the agent’s current mapping from the set of possible states (or state-action pairs) to its estimates of the net long-term reward to be expected after visiting a state (or a state-action pair) and continuing to act according to its current policy thereafter. In psychological terms, the value function specifies the secondary reinforcement values for the many states that are not associated with primary reinforcement. The value function is continually updated by the agent based on the experienced consequences of its actions. These experienced consequences are ultimately the rewards it receives, but the consequences that most immediately matter to the updating process are the values of the states to which the agent gets in the next few actions. In an episodically structured task, the current value of a state is the current estimate of the probability of getting to a reward state before the end of an episode. In a continuous task, the current value of a state is the estimate of the time-discounted amount of reward to be expected starting from that state and continuing to act according to its current policy. With time discounting, the contribution from a reward to the value of a state diminishes with the delay of that reward, so that rewards obtained after long delays contribute negligibly to value. Time discounting is another venerable idea in psychological theorizing. In reinforcement learning models, this discounting is generally exponential (a constant percentage discount per unit of delay). Empirically, however, animals of all kinds, including humans, appear to use hyperbolic discounting (Fantino, 1969; Green, Fristoe, & Myerson, 1994; Killeen, 1982; Mazur, 1984; McDiarmid & Rilling, 1965). For each reward, they compute a rate of reward by dividing the magnitude of the reward by the delay to obtain it; then, they average these rates to determine the value of an option. It would be nice if reinforcement learning simulations could shed light on why agents formed by natural selection use this peculiar form of discounting.
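The difference between the two schemes is easy to state in code. The sketch below follows the rate-averaging description above; the discount factor and the example magnitudes and delays are illustrative only:

    # Exponential discounting, the form usually assumed in reinforcement
    # learning models: a constant percentage discount per unit of delay
    # (gamma is the discount factor).
    def exponential_value(magnitude, delay, gamma=0.9):
        return magnitude * gamma ** delay

    # The empirical pattern described above: for each reward, compute a rate
    # (magnitude divided by the delay to obtain it), then average the rates
    # to get the value of an option. Valuing rewards this way discounts them
    # hyperbolically rather than exponentially.
    def rate_based_value(rewards_and_delays):
        rates = [magnitude / delay for magnitude, delay in rewards_and_delays]
        return sum(rates) / len(rates)

    print(exponential_value(10, delay=5))         # 10 * 0.9**5, about 5.9
    print(rate_based_value([(10, 5), (10, 10)]))  # mean of 2.0 and 1.0 -> 1.5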
What reinforcement learning is most centrally concerned with are different methods for computing and updating the value function, the agent’s current estimate of the return to be expected once one has arrived at a given state if one continues to pursue the current policy. The general approach is to back up value in accord with the Bellman equation, which sets the value of a state equal to the discounted value of its successor states, weighted by the probabilities of visiting each of those successor states. Value (secondary reinforcement) propagates backwards from the states associated with primary reinforcement to the many intrinsically neutral states or state-action pairs that precede them. These intrinsically neutral states acquire secondary reinforcing value insofar as they lead sooner or later to primary reinforcement.
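One such back-up, sketched in roughly the standard form (with rewards attached to states, as in the backgammon example above, and an illustrative discount factor):

    # One Bellman back-up for state s. p_next maps each possible successor
    # state to the probability of reaching it from s under the current
    # policy; the value of s is reset to the expected reward plus the
    # discounted value of its successors.
    def bellman_backup(s, value, reward, p_next, gamma=0.9):
        value[s] = sum(p * (reward(s2) + gamma * value[s2])
                       for s2, p in p_next.items())
        return value[s]

    # Sweeping such back-ups over a simple chain of states shows value
    # (secondary reinforcement) propagating backwards from the rewarded state.
    value = {"s1": 0.0, "s2": 0.0, "goal": 0.0}
    reward = lambda s: 1 if s == "goal" else 0
    bellman_backup("s2", value, reward, {"goal": 1.0})  # s2 is now worth 1.0
    bellman_backup("s1", value, reward, {"s2": 1.0})    # s1 is now worth 0.9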
In general, the policies considered here are rules for choosing actions given the values of the states to which those actions will immediately lead (the values of the possible next states). A greedy policy always selects the action that leads to the state with the greatest value. The problem with a completely greedy policy is that it does not enable the agent to learn where little-tried and so-far low-value actions might lead if pursued further. That is, it does not allow exploration of the consequences of so-far untried action sequences. If the agent is to acquire an accurate value function (see below), greedy policies need to be modified to allow exploratory actions, at least occasionally. Greedy policies thus modified are called ε-greedy.
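A minimal sketch of an ε-greedy choice rule (the function and parameter names are invented for illustration; next_state is assumed to predict the immediate successor of a state-action pair):

    import random

    def epsilon_greedy(state, actions, next_state, value, epsilon=0.1):
        # With probability epsilon, explore: pick an action at random,
        # regardless of its current estimated worth.
        if random.random() < epsilon:
            return random.choice(actions)
        # Otherwise act greedily: pick the action whose immediate successor
        # state has the greatest current estimated value.
        return max(actions, key=lambda a: value[next_state(state, a)])

The occasional random choice is what lets the agent discover that a so-far low-valued or untried action may in fact lead somewhere worth going.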
A central concern in the design of reinforcement learning algorithms is finding the optimal trade-off between exploratory actions, which lead to more accurate knowledge of the value function, and goal-oriented actions, which use the current value function to generate actions that produce reward. Policy improvement and the determination of the value function are interdependent processes, because the secondary reinforcing value of a state depends on the average amount of primary reinforcement obtained after visiting that state, which depends in turn on what the agent has done when visiting that state and subsequent states. Thus, the value of a state depends on the policy being followed, while the actions a given policy produces themselves depend on the values of the next possible states. The path to the best policy is traced out by a dance between a changing value function and a changing policy, each dancer responding to the other.
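Schematically, the dance can be caricatured in a few lines (a sketch of the generalized policy iteration idea, not the book’s pseudocode; the evaluation and improvement routines are assumed to be supplied):

    # Alternate between (1) policy evaluation, which moves the value function
    # toward consistency with the current policy, and (2) policy improvement,
    # which makes the policy greedy with respect to the updated values,
    # until neither changes any further.
    def policy_value_dance(evaluate, improve, policy, value):
        while True:
            value = evaluate(policy, value)   # the values chase the policy
            new_policy = improve(value)       # the policy chases the values
            if new_policy == policy:
                return policy, value
            policy = new_policy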
A model in reinforcement learning is a representation of the environment, which is used for planning, that is, for determining what values a contemplated sequence of actions will lead to. Insofar as they make use of models as part of the agent’s state space, the reinforcement learning approaches in artificial intelligence depart considerably from the reinforcement learning tradition in psychological theorizing. In psychological theorizing, reinforcement learning has often been offered as an alternative to model building. In practice, however, the reinforcement learning approach in artificial intelligence, like the reinforcement learning tradition in psychology, has little to say about how models of the environment are acquired. (“We do not address the issues of constructing, changing, or learning the state signal” [p. 61].) Moreover, learned models of the world are rarely incorporated into the state space of the agent, because doing so is thought to lead to computational intractabilities. I suspect that these intractabilities arise from deep and not carefully examined difficulties with the kinds of representational schemes that are considered, in particular with the tendency to represent all structure in discrete and probabilistic form.