
Open access, freely available online

Book Review/Science in the Media

Palmistry
Peter Dayan

Hawkins J, with Blakeslee S (2004) On intelligence. New York: Times Books. 272 p. ISBN (hardcover) 0805074562. US$25.00. (Figure DOI: 10.1371/journal.pbio.0020394.g001)

Citation: Dayan P (2004) Palmistry. PLoS Biol 2(11): e394.

Copyright: © 2004 Peter Dayan. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Peter Dayan is in the Gatsby Computational Neuroscience Unit at University College London, London, United Kingdom. E-mail: dayan@gatsby.ucl.ac.uk

DOI: 10.1371/journal.pbio.0020394

PLoS Biology | www.plosbiology.org | November 2004 | Volume 2 | Issue 11 | e394

Is Michael Moore liberal America’s Rush Limbaugh? If so, is he filling a much needed, or a much lamented, gap in turning issues that are really cast in pastel shades into Day-Glo relief? In this hale monograph, Jeff Hawkins (rendered by Sandra Blakeslee) plays exactly this role for theoretical neuroscience. As a pastel practitioner myself, but furtively sharing many of Hawkins’ prejudices and hunches about computational modelling in neuroscience, I am caught between commendation and consternation.

Hawkins is an engineer, entrepreneur, and scientist who founded and led the companies Palm and then Handspring. He created, against what must have been considerable obstacles, the first widely successful PDA, and continued the development of this platform. He has thus amply earned a bully pulpit. The autobiographical segments of this book detail that, throughout his career, he has been interested in understanding how the brain works, using his substantial knowledge and intuition about the architecture and design of conventional computers as a counterpoint.

More recently, Hawkins has generously put his money where his ideas about mentation dictate, founding the Redwood Neuroscience Institute and also funding various conferences and workshops. The institute is dedicated to ‘studying and promoting biologically accurate mathematical models of memory and cognition.’ Despite its youth, the Institute already has attracted notable attention as a centre for theoretical neuroscience. Hawkins’ quest and, depending on which statements of the book you read, its endpoint (‘... a comprehensive theory of how the brain works ... describ[ing] what intelligence is and how your brain creates it’) or just its tipping point (‘join me, along with others who take up the challenge’), are the subject here.

There are really three books jostling inside the covers. One is the (highly abbreviated) autobiography. The history of modern computing is very brief and (at least judging by the sales) very glorious, and this story is most entertaining. Don’t miss the wonderfully faux naive letter from Hawkins to Gordon Moore asking, in 1980, to set up a research group within Intel devoted to the brain. That Hawkins prospered in clear opposition to accepted wisdom is perhaps one of the key subtexts of the book.

The second, and rather less satisfying, book is about the philosophy of mind and the history of artificial intelligence and neural network approaches to understanding the brain and replicating cognition. With respect to the fields of artificial intelligence and neural nets, the text seems rather to be fighting yesterday’s battles. The importance of learning, flexibility in representation and inference, and even decentralisation of control has been more than amply recognised in the inexorable rise of probabilistic approaches in both fields.

With respect to the philosophy of mind, there seems to be something of an enthusiast’s disdain for the niceties of philosophical pettifogging, even arguing by assertion. The discussions at the end on creativity and consciousness all seem a bit gossamer. The book is somewhat careless about functionalism, a key doctrine for computational theorists about how brains give rise to minds. According to this doctrine, at least roughly, it is the functional roles of, and functional interactions among, the physical elements of brain that matter, and not their precise physical nature. If you can capture those functional aspects correctly, for instance, in a computer program, then you can (re-)create what’s important about mental states. Functionalism licenses a form of inquiry into the computational jobs played by structures in the brain. However, although formally agreeing that ‘there’s nothing inherently special or magical about the brain that allows it to be intelligent,’ the book slips into statements such as ‘brains and computers do fundamentally different things,’ which are, at best, unfortunate shorthand.

The book is a little apt to sneak plausible, but misleading, claims under the radar. Just to give one instance, it compellingly compares a six-year-old hopping from rock to rock in a streambed with a lumbering robot failing to do the same task. However, this is a bit unfair. One of Hawkins’ self-denying ordinances is to consider the cortex pretty much by itself. As aficionados of the cerebellum (an evolutionarily ancient brain region with a special role in the organisation of smooth, precise, well-timed, and task-sensitive motor output) would be quick to point out, the singular role for the cortex in such graceful behaviour is rather questionable.


The third book is what I think is intended to be the real contribution. This contains a (not wholly convincing) attempt to conceptualise the definition of intelligence in terms of prediction rather than behaviour, and then to describe its possible instantiation in the anatomy (and mostly only the anatomy) of the cortex.

Unsupervised Learning

To situate Hawkins’ suggestions, it is instructive to consider current models of how the cerebral cortex represents, and learns to represent, information about the world without being explicitly taught. Being a popular account, the book fairly breezes by these so-called unsupervised learning models (see Hinton and Ghahramani 1997; Rao et al. 2002), in which the neocortex is treated as a general device for finding relationships or structure in its input. The algorithms are called unsupervised since they have to work without detailed information from a teacher or a supervisor about the actual structure in each input. Rather, they must rely on general, statistical characteristics.

First, where does the structure in the inputs come from? For the sake of concreteness, think of the input as being something like movies on a television screen. Movies don’t look like white noise, or ‘snow’, because of their statistical structure. For instance, in movies, pixel activities tend to change rather slowly over time, and pixels that are close to each other on the screen tend to have relatively similar activities at any given time. Neither of these is true of white noise. More technically, movies constitute only a tiny fraction of the space of all possible activations of all the pixels on your screen. They (and indeed real visual scenes) have a particular statistical structure that the cortex is supposed to extract.

What is the cortex supposed to do with this structure? The idea is that the cortex learns to model, or ‘parameterize’, it. Then, the activities of cortical cells over time for a particular input, for example, a particular face in a movie, indicate the values of the parameters associated with that face. Thereby the cortical activities represent the input. The parameters for a face might include one set for its physical structure (e.g., the separation between the eyes and whether it is more round or more square), another set for the expression, and yet others, too.

Cortical representations are thus intended to reflect directly the statistical structure in the input. Importantly, for inputs such as movies, this structure is thought to be hierarchical and, concomitantly, to provide an account of the observed hierarchical structure of sensory cortical areas. One source of hierarchical structure in movies is the simple fact that objects (such as the faces) have parts (such as eyes and cheeks) whose form and changes in form over time are interdependent. Another source of hierarchical structure is that the same face can appear in many different poses, under many different forms of illumination, and so on. Pattern theory (Grenander 1995), one of the parent disciplines of the field, calls these dimensions of variation deformations. Loosely, the deformations are independent of the objects themselves, and we might expect this independence to be reflected in the cortical representations. Indeed, there is neurophysiological evidence for just such invariant neural responses to deformations of a stimulus.

How does the cortex do all this? Of course, some fraction of this structure was built in over evolution. However, the unsupervised learning tradition concentrates on ontogenic adaptation, based on multiple presented input movies. An additional facet of the lack of supervision is that this adaptation is taken as not depending on any particular behavioural task.

Finally, what does this process allow the cortex to do? The whole representational structure is intended to support inference. Crudely, this involves turning partial or noisy inputs into the completed, cleaned-up patterns they imply, using connections between areas in the cortical hierarchy. Construed this way, probabilistic inference actually instantiates a very general form of computation. Crucially, over the course of the development of unsupervised learning methods, it has been realised that the best way to approach the extraction of input structure, and inference with it, is through the language and tools of probability theory and statistics. The same realisation has driven substantial developments in artificial intelligence, machine learning, computer vision, and a host of other disciplines.

Predictive Auto-Association

We can now return to the book. Hawkins compactly sums up his thesis in the following way. ‘To make predictions of future events, your neocortex has to store sequences of patterns. To recall appropriate memories, it has to retrieve patterns by their similarity to past patterns (auto-associative recall). And finally, memories have to be stored in an invariant form so that the knowledge of past events can be applied to new situations that are similar but not identical to the past.’ In fact, to take the latter points first, the sort of auto-associative storage and recall to which Hawkins refers is a theoretically and practically hobbled version of unsupervised learning’s probabilistic inference. Invariance is closely related to the deformations we described above in the context of pattern theory.

Unsupervised learning has certainly paid substantial attention to sequences of inputs and prediction, and to some good effect. For instance, (artificial) speech recognition programs are based on a probabilistic device called a hidden Markov model, which is a key element in a wealth of unsupervised learning approaches to prediction. However, despite heroic efforts, these modelling methods are incapable of capturing the sort of complex structure seen in inputs such as natural languages. They fail on phenomena like long-distance dependencies, for example, the agreement between the cases of subjects and verbs, which are rife. This does tend to offer a vaccine against Hawkins’ otherwise infectious optimism.

One place in which Hawkins goes beyond existing unsupervised learning models is in an extension to actions and control, and in an ascription of parts of the model to cortical anatomy. The hierarchical conception of cortex here goes all the way down to primary motor cortex (the neocortical area most directly associated with motor output). This allows auto-associative recall of sequences of past inputs and outputs to be used to specify actions that have formerly been successful. The discussion of this possibility is, unfortunately, rather brief. Central issues are omitted, such as the way that planning over multiple actions might happen. Also, the way that value is assigned to outcomes to determine success or failure is not discussed. The latter is widely believed to involve the neuromodulatory systems that lie below the cortex and that the book’s cortical chauvinism leads it cheerfully to ignore.

By contrast, the book has a rather detailed description of how the model should map onto the anatomy of the cerebral cortex. Like many unsupervised learning modellers, Hawkins is a self-confessed ‘lumper’. He ignores huge swathes of complexity and specificity in cortical structure and connections in favour of a scheme of crystalline regularity. Though this will doubtless irk many readers (as will the lack of citations to some influential prior proponents such as Douglas and Martin [1991]), some (though not necessarily this) strong form of abstraction and omission is necessary to get to clear functional ideas. This part has interesting suggestions, such as a neat solution for a persistent dilemma for proponents of hierarchical models. The battle comes between cases in which information in a higher cortical area, acting as prior information, boosts activities in a lower cortical area, and cases of predictive coding, in which the higher cortical area informs the lower cortical area about what it already knows and therefore suppresses the information that the lower area would otherwise just repeat up the hierarchy. The proposed solution involves the invention (or rather prediction) of two different sorts of neurons in a particular layer of cortex.

Unsupervised learning models of cortex are without doubt very elegant. However, if pushed, purveyors of this approach will often admit to being kept awake at night by a number of critical concerns even apart from the difficulty of getting the models to work in interestingly rich sensory domains. Does the book provide computational Halcyon? First, the representations acquired by unsupervised learning are intended to be used for something, such as accomplishing more specific learning tasks, for example, making predictions of reward. However, most aspects of the statistical structure of inputs are irrelevant. This might be called the ‘carpet’ problem: there is a wealth of statistical structure in the visual texture of carpets; however, this structure is irrelevant for almost any task. Capturing it might therefore (a) constitute a terrible waste of cortical representational power, or, worse, (b) interfere with, or warp, the parameterization of the aspects of the input that are important, making it harder to extract critical distinctions. The book does not address this issue, relying on there being enough predictive power to capture any and all predictions, including predictive characterisation of motor control.

Second, although our subjective sense is that we build a sophisticated predictive model of the entire sensory input, experiments into such phenomena as change blindness (Rensink 2002) show this probably isn’t true. A classic example involves alternating the presentation of two pictures, which differ in some significant way (e.g., the colour of the trousers of one of the main protagonists). Subjects have great difficulty in identifying the difference between the pictures, even though (a) they are explicitly told to look for it, (b) they have the subjective sense that they have represented all the information in each picture, and (c) if the location of the change is pointed out, they see it as blindingly obvious. This, and other attentional phenomena, suggests that substantially less is actually represented than we might naively think. In fact, elaborate computations go into selecting aspects of the input to which the models might be applied, and sophisticated models of these computations, such as Li’s salience circuit (2002), involve aspects of cortical anatomy and physiology ignored in the book.

As a final example of a spur to insomnia, unsupervised learners worry that Damasio (1994) might be somewhat right. That is, cool logic and hot emotion may be tightly coupled in a way that a model such as this that is rigidly confined to cortical processing, ignoring key subcortical contributions to practical decision making, will find hard to capture.

To sum up, in terms of the adage that genius is 1% inspiration and 99% perspiration, the book’s enthymematic nature suggests that not quite enough sweat has been broken. Were it 1% inspiration and 99% aspiration, though, then the appealing call to arms for a new generation of modellers should more than suffice.

Acknowledgments

Funding was from the Gatsby Charitable Foundation. I am most grateful to a large number of colleagues for comments.

References and Further Reading

Damasio AR (1994) Descartes’ error: Emotion, reason, and the human brain. New York: Putnam. 312 p.
Douglas RJ, Martin KA (1991) Opening the grey box. Trends Neurosci 14: 286–293.
Grenander U (1995) Elements of pattern theory. Baltimore (Maryland): Johns Hopkins University Press. 222 p.
Hinton GE, Ghahramani Z (1997) Generative models for discovering sparse distributed representations. Philos Trans R Soc Lond B Biol Sci 352: 1177–1190.
Li Z (2002) A saliency map in primary visual cortex. Trends Cogn Sci 6: 9–16.
Rao RPN, Olshausen BA, Lewicki MS, editors (2002) Probabilistic models of the brain: Perception and neural function. Cambridge (Massachusetts): MIT Press. 324 p.
Rensink RA (2002) Change detection. Annu Rev Psychol 53: 245–277.
Wolpert DM, Ghahramani Z, Flanagan JR (2001) Perspectives and problems in motor learning. Trends Cogn Sci 5: 487–494.

