

HUMAN-LIKE MACHINE INTELLIGENCE



Human-Like Machine Intelligence


Edited by

Stephen Muggleton
Imperial College London

Nick Chater
University of Warwick


Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Oxford University Press 2021
The moral rights of the author have been asserted
First Edition published in 2021
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2021932529
ISBN 978–0–19–886253–6
DOI: 10.1093/oso/9780198862536.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.

Preface

Recently there has been increasing excitement about the potential for artificial intelligence
to transform human society. This book addresses the leading edge of research in
this area, research that aims to resolve the present incompatibilities between human and
machine approaches to reasoning and learning.
According to the influential US funding agency DARPA (originator of the Internet
and self-driving cars), this new area represents the Third Wave of Artificial Intelligence
(3AI, 2020s–2030s), and is being actively investigated in the United States, Europe
and China. The UK’s Engineering and Physical Sciences Research Council (EPSRC)
network on Human-Like Computing (HLC) was one of the first networks internationally
to initiate and support research specifically in this area. Starting activities in 2018, the
network represents around 60 leading UK artificial intelligence and cognitive scientists
involved in the development of the interdisciplinary area of HLC. The research of net-
work groups aims to address key unsolved problems at the interface between psychology
and computer science.
The chapters of this book have been authored by a mixture of these UK and other
international specialists based on recent workshops and discussions at the Machine
Intelligence 20 and 21 workshops (2016, 2019) and the Third Wave Artificial Intelligence
workshop (2019). Some of the key questions addressed by the human-like computing
programme include how AI systems might (1) explain their decisions effectively, (2)
interact with human beings in natural language, (3) learn from small numbers of
examples and (4) learn with minimal supervision. Solving such fundamental problems
involves new foundational research in both the psychology of perception and interaction
as well as the development of novel algorithmic approaches in artificial intelligence.
The book is arranged in five parts. The first part describes central challenges of
human-like computing, ranging from the issues involved in developing a beneficial form
of AI (Russell, Berkeley) to a modern philosophical perspective on Alan
Turing’s seminal model of computation and his view of its potential for intelligence
(Millican, Oxford). Two chapters then address the promising new approaches of virtual
bargaining and representational revision as technologies for supporting implicit
human–machine interaction (Chater, Warwick; Bundy, Edinburgh).
Part 2 addresses human-like social cooperation issues, providing the AI perspective
of dialectic explanations (Toni, Imperial) alongside relevant psychological research
on the limitations and biases of human explanations (Hahn, Birkbeck) and the challenges
human-like communication poses for AI systems (Healey, Queen Mary). The possibility
of reverse engineering human cooperation is described (Kleiman-Weiner, Harvard) and
contrasted with issues in the use of explanations in machine teaching (Hernandez-Orallo,
Politècnica de València). Part 3 concentrates on Human-Like Perception and Language,
including new approaches to human-like computer vision (Muggleton, Imperial), and
the related new area of apperception (Evans, DeepMind), as well as suggestions on combining
human and machine vision in analysing complex signal data (Jay, Manchester). An
ongoing UK study on social interaction is described (Pickering, Edinburgh), together
with a chapter exploring the use of multi-modal communication (Vigliocco, UCL).
In Part 4, issues related to human-like representation and learning are discussed. This
starts with a description of work on human–machine scientific discovery (Tamaddoni-
Nezhad, Imperial) which is related to models of fast and slow learning in humans
(Mareschal), followed by a chapter on machine-learning methods for generating mutual
explanations (Schmid, Bamberg). Issues relating to graphical and symbolic representation
are described (Jamnik, Cambridge). This has potential relevance to applications for
inductively generating programs for use with spreadsheets (De Raedt, Leuven). Lastly,
Part 5 considers challenges for evaluating and explaining the strength of human-like
reasoning. Evaluations are necessarily context dependent, as shown in the paper on
automated common-sense spatial reasoning (Cohn, Leeds), though a second paper
argues that Bayesian-inspired approaches which avoid probabilities are powerful for
explaining human brain activity (Sanborn, Warwick). Bayesian approaches are also
shown to be capable of explaining various oddities of human reasoning, such as the
conjunction fallacy (Tentori, Trento). By contrast, for situated AI systems
there are clear advantages and difficulties in evaluating robot football players using
objective probabilities within a competitive environment (Sammut, UNSW). The book
closes with a chapter demonstrating the ongoing challenges of evaluating the relative
strengths of human and machine play in chess (Bratko, Ljubljana).
June 2020 Stephen Muggleton
and Nick Chater
Editors

Acknowledgements

This book would not have been possible without a great deal of help. We would like
to thank Alireza Tamaddoni-Nezhad for his valuable help in organising the meetings
which led to this book and in finalizing the book itself, as well as
Francesca McMahon, our editor at OUP, for her advice and encouragement. We also
thank our principal funder, the EPSRC, for backing the Network on Human-Like
Computing (HLC, grant number EP/R022291/1); and acknowledge additional support
from the ESRC Network for Integrated Behavioural Science (grant number
ES/P008976/1). Finally, special thanks are due to Bridget Gundry for her hard
work, tenacity, and cheerfulness in driving the book through to a speedy and successful
conclusion.

Contents

Part 1 Human-like Machine Intelligence


1 Human-Compatible Artificial Intelligence 3
Stuart Russell
1.1 Introduction 3
1.2 Artificial Intelligence 4
1.3 1001 Reasons to Pay No Attention 6
1.4 Solutions 9
1.5 Reasons for Optimism 17
1.6 Obstacles 18
1.7 Looking Further Ahead 21
1.8 Conclusion 22
References 22
2 Alan Turing and Human-Like Intelligence 24
Peter Millican
2.1 The Background to Turing’s 1936 Paper 24
2.2 Introducing Turing Machines 26
2.3 The Fundamental Ideas of Turing’s 1936 Paper 29
2.4 Justifying the Turing Machine 31
2.5 Was the Turing Machine Inspired by Human Computation? 32
2.6 From 1936 to 1950 36
2.7 Introducing the Imitation Game 38
2.8 Understanding the Turing Test 40
2.9 Does Turing’s “Intelligence” have to be Human-Like? 43
2.10 Reconsidering Standard Objections to the Turing Test 46
References 49
3 Spontaneous Communicative Conventions through Virtual Bargaining 52
Nick Chater and Jennifer Misyak
3.1 The Spontaneous Creation of Conventions 52
3.2 Communication through Virtual Bargaining 54
3.3 The Richness and Flexibility of Signal-Meaning Mappings 58
3.4 The Role of Cooperation in Communication 61
3.5 The Nature of the Communicative Act 63
3.6 Conclusions and Future Directions 65
References 66
4 Modelling Virtual Bargaining using Logical Representation Change 68
Alan Bundy, Eugene Philalithis, and Xue Li
4.1 Introduction—Virtual Bargaining 68
4.2 What’s in the Box? 69
4.3 Datalog Theories 71
4.4 SL Resolution 75
4.5 Repairing Datalog Theories 77
4.6 Adapting the Signalling Convention 80
4.7 Conclusion 87
References 88

Part 2 Human-like Social Cooperation


5 Mining Property-driven Graphical Explanations for Data-centric AI
from Argumentation Frameworks 93
Oana Cocarascu, Kristijonas Cyras, Antonio Rago, and Francesca Toni
5.1 Introduction 93
5.2 Preliminaries 95
5.3 Explanations 99
5.4 Reasoning and Explaining with BFs Mined from Text 100
5.5 Reasoning and Explaining with AFs Mined from Labelled Examples 103
5.6 Reasoning and Explaining with QBFs Mined from Recommender Systems 106
5.7 Conclusions 109
References 110
6 Explanation in AI systems 114
Marko Tesic and Ulrike Hahn
6.1 Machine-generated Explanation 114
6.2 Good Explanation 120
6.3 Bringing in the user: bi-directional relationships 126
6.4 Conclusions 129
References 130
7 Human-like Communication 137
Patrick G. T. Healey
7.1 Introduction 137
7.2 Face-to-face Conversation 139
7.3 Coordinating Understanding 143
7.4 Real-time Adaptive Communication 145
7.5 Conclusion 146
References 147
8 Too Many Cooks: Bayesian Inference for Coordinating Multi-Agent
Collaboration 152
Rose E. Wang, Sarah A. Wu, James A. Evans, David C. Parkes,
Joshua B. Tenenbaum, and Max Kleiman-Weiner
8.1 Introduction 152
8.2 Multi-Agent MDPs with Sub-Tasks 154
8.3 Bayesian Delegation 156
8.4 Results 160
8.5 Discussion 167
References 168
9 Teaching and Explanation: Aligning Priors between Machines
and Humans 171
Jose Hernandez-Orallo and Cesar Ferri
9.1 Introduction 171
9.2 Teaching Size: Learner and Teacher Algorithms 174
9.3 Teaching and Explanations 179
9.4 Teaching with Exceptions 182
9.5 Universal Case 185
9.6 Feature-value Case 187
9.7 Discussion 191
References 192

Part 3 Human-like Perception and Language


10 Human-like Computer Vision 199
Stephen Muggleton and Wang-Zhou Dai
10.1 Introduction 199
10.2 Related Work 201
10.3 Logical Vision 203
10.4 Learning Low-level Perception through Logical Abduction 208
10.5 Conclusion and Future Work 213
References 214
11 Apperception 218
Richard Evans
11.1 Introduction 218
11.2 Method 219
11.3 Experiment: Sokoban 228
11.4 Related Work 234
11.5 Discussion 235
11.6 Conclusion 236
References 237
12 Human–Machine Perception of Complex Signal Data 239
Alaa Alahmadi, Alan Davies, Markel Vigo, Katherine Dempsey, and Caroline Jay
12.1 Introduction 239
12.2 Human–Machine Perception of ECG Data 241
12.3 Human–Machine Perception: Differences, Benefits,
and Opportunities 250
References 256
13 The Shared-Workspace Framework for Dialogue and Other
Cooperative Joint Activities 260
Martin Pickering and Simon Garrod
13.1 Introduction 260
13.2 The Shared Workspace Framework 260
13.3 Applying the Framework to Dialogue 262
13.4 Bringing Together Cooperative Joint Activity and Communication 267
13.5 Relevance to Human-like Machine Intelligence 269
13.6 Conclusion 272
References 273
14 Beyond Robotic Speech: Mutual Benefits to Cognitive Psychology
and Artificial Intelligence from the Study of Multimodal
Communication 274
Beata Grzyb and Gabriella Vigliocco
14.1 Introduction 274
14.2 The Use of Multimodal Cues in Human Face-to-face Communication 276
14.3 How Humans React to Embodied Agents that Use Multimodal Cues? 278
14.4 Can Embodied Agents Recognize Multimodal Cues Produced by Humans? 280
14.5 Can Embodied Agents Produce Multimodal Cues? 282
14.6 Summary and Way Forward: Mutual Benefits from Studies on Multimodal
Communication 285
References 288

Part 4 Human-like Representation and Learning


15 Human–Machine Scientific Discovery 297
Alireza Tamaddoni-Nezhad, David Bohan, Ghazal Afroozi Milani,
Alan Raybould, and Stephen Muggleton
15.1 Introduction 297
15.2 Scientific Problem and Dataset: Farm Scale Evaluations (FSEs)
of GMHT Crops 299
15.3 The Knowledge Gap for Modelling Agro-ecosystems: Ecological Networks 301
15.4 Automated Discovery of Ecological Networks from FSE Data and
Ecological Background Knowledge 302
15.5 Evaluation of the Results and Subsequent Discoveries 308
15.6 Conclusions 313
References 314
16 Fast and Slow Learning in Human-Like Intelligence 316
Denis Mareschal and Sam Blakeman
16.1 Do Humans Learn Quickly and Is This Uniquely Human? 316
16.2 What Makes for Rapid Learning? 324
16.3 Reward Prediction Error as the Gateway to Fast and Slow Learning 327
16.4 Conclusion 330
References 332
17 Interactive Learning with Mutual Explanations in Relational Domains 338
Ute Schmid
17.1 Introduction 338
17.2 The Case for Interpretable and Interactive Learning 339
17.3 Types of Explanations—There is No One-Size Fits All 341
17.4 Interactive Learning with ILP 345
17.5 Learning to Delete with Mutual Explanations 346
17.6 Conclusions and Future Work 350
References 350
18 Endowing machines with the expert human ability to select
representations: why and how 355
Mateja Jamnik and Peter Cheng
18.1 Introduction 355
18.2 Example of selecting a representation 357
18.3 Benefits of switching representations 359
18.4 Why selecting a good representation is hard 361
18.5 Describing representations: rep2rep 363
18.6 Automated analysis and ranking of representations 370
18.7 Applications and future directions 373
References 375
19 Human–Machine Collaboration for Democratizing Data Science 379
Clément Gautrais, Yann Dauxais, Stefano Teso, Samuel Kolb,
Gust Verbruggen, and Luc De Raedt
19.1 Introduction 379
19.2 Motivation 380
19.3 Data Science Sketches 383
19.4 Related Work 397
19.5 Conclusion 399
References 399
Part 5 Evaluating Human-like Reasoning


20 Automated Common-sense Spatial Reasoning: Still a Huge Challenge 405
Brandon Bennett and Anthony G. Cohn
20.1 Introduction 405
20.2 Common-sense Reasoning 406
20.3 Fundamental Ontology of Space 412
20.4 Establishing a Formal Representation and its Vocabulary 414
20.5 Formalizing Ambiguous and Vague Spatial Vocabulary 416
20.6 Implicit and Background Knowledge 419
20.7 Default Reasoning 420
20.8 Computational Complexity 421
20.9 Progress towards Common-sense Spatial Reasoning 423
20.10 Conclusions 423
References 425
21 Sampling as the Human Approximation to Probabilistic Inference 430
Adam Sanborn, Jian-Qiao Zhu, Jake Spicer, Joakim Sundh,
Pablo León-Villagrá, and Nick Chater
21.1 A Sense of Location in the Human Sampling Algorithm 432
21.2 Key Properties of Cognitive Time Series 435
21.3 Sampling Algorithms to Explain Cognitive Time Series 437
21.4 Making the Sampling Algorithm more Bayesian 441
21.5 Conclusions 443
References 445
22 What Can the Conjunction Fallacy Tell Us about Human Reasoning? 449
Katya Tentori
22.1 The Conjunction Fallacy 449
22.2 Fallacy or No Fallacy? 450
22.3 Explaining the Fallacy 452
22.4 The Pre-eminence of Impact Assessment over Probability Judgements 455
22.5 Implications for Effective Human-like Computing 457
22.6 Conclusion 460
References 461
23 Logic-based Robotics 465
Claude Sammut, Reza Farid, Handy Wicaksono, and Timothy Wiley
23.1 Introduction 465
23.2 Relational Learning in Robot Vision 466
23.3 Learning to Act 471
23.4 Conclusion 484
References 484
24 Predicting Problem Difficulty in Chess 487
Ivan Bratko, Dayana Hristova, and Matej Guid
24.1 Introduction 487
24.2 Experimental Data 489
24.3 Analysis 491
24.4 More Subtle Sources of Difficulty 500
24.5 Conclusions 502
References 503

Index 505

Part 1
Human-like Machine Intelligence

1
Human-Compatible Artificial
Intelligence
Stuart Russell
University of California, Berkeley, USA

1.1 Introduction
Artificial intelligence (AI) has as its aim the creation of intelligent machines. An entity is
considered to be intelligent, roughly speaking, if it chooses actions that are expected to
achieve its objectives, given what it has perceived.1 Applying this definition to machines,
one can deduce that AI aims to create machines that choose actions that are expected to
achieve their objectives, given what they have perceived.
Now, what are these objectives? To be sure, they are—up to now, at least—objectives
that we put into them; but, nonetheless, they are objectives that operate exactly as if they
were the machines’ own and about which they are completely certain. We might call
this the standard model of AI: build optimizing machines, plug in the objectives, and off
they go. This model prevails not just in AI but also in control theory (minimizing a cost
function), operations research (maximizing a sum of rewards), economics (maximizing
individual utilities, gross domestic product (GDP), quarterly profits, or social welfare),
and statistics (minimizing a loss function). The standard model is a pillar of twentieth-
century technology.
Unfortunately, this standard model is a mistake. It makes no sense to design machines
that are beneficial to us only if we write down our objectives completely and correctly.
If the objective is wrong, we might be lucky and notice the machine’s surprisingly
objectionable behaviour and be able to switch it off in time. Or, if the machine is more
intelligent than us, the problem may be irreversible. The more intelligent the machine,
the worse the outcome for humans: the machine will have a greater ability to alter the

1 This definition can be elaborated and made more precise in various ways—particularly with respect to
whether the choosing and expecting occur within the agent, within the agent’s designer, or some combination
of both. The latter certainly holds for human agents, viewing evolution as the designer. The word ‘objective’ here
is also used informally, and does not refer just to end goals. For most purposes, an adequately general formal
definition of ‘objective’ covers preferences over lotteries over complete state sequences. Moreover, ‘state’ here
includes mental state as well as the world state external to the entity.

Stuart Russell, Human-Compatible Artificial Intelligence In: Human-Like Machine Intelligence. Edited by: Stephen Muggleton and Nick Chater,
Oxford University Press. © Oxford University Press (2021). DOI: 10.1093/oso/9780198862536.003.0001
world in ways that are inconsistent with our true objectives and greater skill in foreseeing
and preventing any interference with its plans.
In 1960, after seeing Arthur Samuel’s checker-playing program learn to play checkers
far better than its creator, Norbert Wiener (1960) gave a clear warning:

If we use, to achieve our purposes, a mechanical agency with whose operation


we cannot efficiently interfere . . . we had better be quite sure that the purpose
put into the machine is the purpose which we really desire.

Echoes of Wiener’s warning can be discerned in contemporary assertions that ‘su-


perintelligent AI’ may present an existential risk to humanity. (In the context of the
standard model, ‘superintelligent’ means having a superhuman capacity to achieve given
objectives.) Concerns have been raised by such observers as Nick Bostrom (2014), Elon
Musk (Kumparak 2014), Bill Gates (2015),2 and Stephen Hawking (Osborne 2017).
There is very little chance that as humans we can specify our objectives completely and
correctly in such a way that the pursuit of those objectives by more capable machines is
guaranteed to result in beneficial outcomes for humans.
The mistake comes from transferring a perfectly reasonable definition of intelligence
from humans to machines. The definition is reasonable for humans because we are
entitled to pursue our own objectives—indeed, whose would we pursue, if not our own?
The definition of intelligence is unary, in the sense that it applies to an entity by itself.
Machines, on the other hand, are not entitled to pursue their own objectives.
A more sensible definition of AI would have machines pursuing our objectives. Thus,
we have a binary definition: entity A chooses actions that are expected to achieve
the objectives of entity B, given what entity A has perceived. In the unlikely event
that we (entity B) can specify the objectives completely and correctly and insert them
into the machine (entity A), then we can recover the original, unary definition. If not,
then the machine will necessarily be uncertain as to our objectives while being obliged to
pursue them on our behalf. This uncertainty—with the coupling between machines and
humans that it entails—is crucial to building AI systems of arbitrary intelligence that are
provably beneficial to humans. We must, therefore, reconstruct the foundations of AI
along binary rather than unary lines.

1.2 Artificial Intelligence


The goal of AI research has been to understand the principles underlying intelligent
behaviour and to build those principles into machines that can then exhibit such
behaviour. In the 1960s and 1970s, the prevailing theoretical definition of intelligence
was the capacity for logical reasoning, including the ability to derive plans of action

2 Gates wrote, ‘I am in the camp that is concerned about superintelligence. . . . I agree with Elon Musk and
some others on this and don’t understand why some people are not concerned.’
guaranteed to achieve a specified goal. A popular variant was the problem-solving
paradigm, which requires finding a minimum-cost sequence of actions guaranteed to
reach a goal state. More recently, a consensus has emerged in AI around the idea of a
rational agent that perceives and acts in order to maximize its expected utility. (In Markov
decision processes and reinforcement learning, utility is further decomposed into a
sum of rewards accrued through the sequence of transitions in the environment state.)
Subfields such as logical planning, robotics, and natural-language understanding are
special cases of the general paradigm. AI has incorporated probability theory to handle
uncertainty, utility theory to define objectives, and statistical learning to allow machines
to adapt to new circumstances. These developments have created strong connections
to other disciplines that build on similar concepts, including control theory, economics,
operations research, and statistics.
In both the logical-planning and rational-agent views of AI, the machine’s objective—
whether in the form of a goal, a utility function, or a reward function—is specified
exogenously. In Wiener’s words, this is ‘the purpose put into the machine’. Indeed,
it has been one of the tenets of the field that AI systems should be general-purpose—
that is, capable of accepting a purpose as input and then achieving it—rather than
special-purpose, with their goal implicit in their design. For example, a self-driving car
should accept a destination as input instead of having one fixed destination. However,
some aspects of the car’s ‘driving purpose’ are fixed, such as that it shouldn’t hit
pedestrians. This is built directly into the car’s steering algorithms rather than being
explicit: no self-driving car in existence today ‘knows’ that pedestrians prefer not to be
run over.
Putting a purpose into a machine that optimizes its behaviour according to clearly de-
fined algorithms seems an admirable approach to ensuring that the machine’s behaviour
furthers our own objectives. But, as Wiener warns, we need to put in the right purpose.
We might call this the King Midas problem: Midas got exactly what he asked for—
namely, that everything he touched would turn to gold—but, too late, he discovered the
drawbacks of drinking liquid gold and eating solid gold. The technical term for putting in
the right purpose is value alignment. When it fails, we may inadvertently imbue machines
with objectives counter to our own. Tasked with finding a cure for cancer as fast as
possible, an AI system might elect to use the entire human population as guinea pigs
for its experiments. Asked to de-acidify the oceans, it might use up all the oxygen in the
atmosphere as a side effect. This is a common characteristic of systems that optimize:
variables not included in the objective may be set to extreme values to help optimize that
objective.
Unfortunately, neither AI nor other disciplines built around the optimization of
objectives have much to say about how to identify the purposes ‘we really desire’. Instead,
they assume that objectives are simply implanted into the machine. AI research, in its
present form, studies the ability to achieve objectives, not the design of those objectives.
In the 1980s the AI community abandoned the idea that AI systems could have definite
knowledge of the state of the world or of the effects of actions, and they embraced
uncertainty in these aspects of the problem statement. It is not at all clear why, for
the most part, they failed to notice that there must also be uncertainty in the objective.
Although some AI problems such as puzzle solving are designed to have well-defined
goals, many other problems that were considered at the time, such as recommending
medical treatments, have no precise objectives and ought to reflect the fact that the
relevant preferences (of patients, relatives, doctors, insurers, hospital systems, taxpayers,
etc.) are not known initially in each case.
Steve Omohundro (2008) has pointed to a further difficulty, observing that any
sufficiently intelligent entity pursuing a fixed, known objective will act to preserve its
own existence (or that of an equivalent successor entity with an identical objective). This
tendency has nothing to do with a self-preservation instinct or any other biological notion;
it’s just that an entity usually cannot achieve its objectives if it is dead. According to
Omohundro’s argument, a superintelligent machine that has an off-switch—which some,
including Alan Turing (1951) himself, have seen as our potential salvation—will take
steps to disable the switch in some way. Thus we may face the prospect of superintelligent
machines—their actions by definition unpredictable and their imperfectly specified
objectives conflicting with our own—whose motivation to preserve their existence in
order to achieve those objectives may be insuperable.

1.3 1001 Reasons to Pay No Attention


Objections have been raised to these arguments, primarily by researchers within the AI
community. The objections reflect a natural defensive reaction, coupled perhaps with a
lack of imagination about what a superintelligent machine could do. None hold water on
closer examination. Here are some of the more common ones:

• Don’t worry, we can just switch it off:3 This is often the first thing that pops
into a layperson’s head when considering risks from superintelligent AI—as if a
superintelligent entity would never think of that. It is rather like saying that the risk
of losing to Deep Blue or AlphaGo is negligible—all one has to do is make the right
moves.
• Human-level or superhuman AI is impossible:4 This is an unusual claim for AI
researchers to make, given that, from Turing onward, they have been fending off
such claims from philosophers and mathematicians. The claim, which is backed by
no evidence, appears to concede that if superintelligent AI were possible, it would
be a significant risk. It is as if a bus driver, with all of humanity as his passengers,
said, ‘Yes, I’m driving toward a cliff—in fact, I’m pressing the pedal to the metal.
But trust me, we’ll run out of gas before we get there.’ The claim also represents
a foolhardy bet against human ingenuity. We’ve made such bets before and lost.

3 AI researcher Jeff Hawkins, for example, writes, ‘Some intelligent machines will be virtual, meaning they
will exist and act solely within computer networks. . . . It is always possible to turn off a computer network, even
if painful.’ https://www.recode.net/2015/3/2/11559576/.
4 The AI100 report (Stone et al. 2016) includes the following assertion: ‘Unlike in the movies, there is no
race of superhuman robots on the horizon or probably even possible.’
On 11 September 1933, renowned physicist Ernest Rutherford stated, with utter
confidence, ‘Anyone who expects a source of power from the transformation of
these atoms is talking moonshine’. On 12 September 1933, Leo Szilard invented the
neutron-induced nuclear chain reaction. A few years later, he demonstrated such a
reaction in his laboratory at Columbia University. As he recalled in a memoir: ‘We
switched everything off and went home. That night, there was very little doubt in
my mind that the world was headed for grief.’
• It’s too soon to worry about it: The right time to worry about a potentially serious
problem for humanity depends not just on when the problem will occur but also
on how much time is needed to devise and implement a solution that avoids the
risk. For example, if we were to detect a large asteroid predicted to collide with the
Earth in 2070, would we say, ‘It’s too soon to worry’? And if we consider the global
catastrophic risks from climate change predicted to occur later in this century, is it
too soon to take action to prevent them? On the contrary, it may be too late. The
relevant timescale for human-level AI is less predictable, but, like nuclear fission, it
might arrive considerably sooner than expected. Moreover, the technological path
to mitigate the risks is also arguably less clear. These two aspects in combination
do not argue for complacency; instead, they suggest the need for hard thinking to
occur soon. Wiener (1960) amplifies this point, writing,

The individual scientist must work as a part of a process whose time


scale is so long that he himself can only contemplate a very limited
sector of it. . . . Even when the individual believes that science contributes
to the human ends which he has at heart, his belief needs a continual
scanning and re-evaluation which is only partly possible. For the individual
scientist, even the partial appraisal of this liaison between the man and the
process requires an imaginative forward glance at history which is difficult,
exacting, and only limitedly achievable. And if we adhere simply to the
creed of the scientist, that an incomplete knowledge of the world and of
ourselves is better than no knowledge, we can still by no means always
justify the naive assumption that the faster we rush ahead to employ the
new powers for action which are opened up to us, the better it will be. We
must always exert the full strength of our imagination to examine where
the full use of our new modalities may lead us.

One variation on the ‘too soon to worry about it’ argument is Andrew Ng’s
statement that it’s ‘like worrying about overpopulation on Mars’. This appeals to
a convenient analogy: not only is the risk easily managed and far in the future but
also it’s extremely unlikely that we’d even try to move billions of humans to Mars
in the first place. The analogy is a false one, however. We’re already devoting huge
scientific and technical resources to creating ever more capable AI systems. A more
apt analogy would be a plan to move the human race to Mars with no consideration
for what we might breathe, drink, or eat once we arrived.
• It’s a real issue but we cannot solve it until we have superintelligence: One would not
propose developing nuclear reactors first and only then developing methods to contain the
reaction safely. Indeed, safety should guide how we think about reactor design. It’s
worth noting that Szilard almost immediately invented and patented a feedback
control system for maintaining a nuclear reaction at the subcritical level for power
generation, despite having absolutely no idea of which elements and reactions could
sustain the fission chain.
By the same token, had racial and gender bias been anticipated as an issue with
statistical learning systems in the 1950s, when linear regression began to be used
for all kinds of applications, the analytical approaches that have been developed in
recent years could easily have been developed then, and would apply equally well
to today’s deep learning systems.
In other words, we can make progress on the basis of general properties
of systems—e.g., systems designed within the standard model—without nec-
essarily knowing the details. Moreover, the problem of objective misspecifica-
tion applies to all AI systems developed within the standard model, not just
superintelligent ones.
• Human-level AI isn’t really imminent, in any case: The AI100 report, for example,
assures us, ‘contrary to the more fantastic predictions for AI in the popular press,
the Study Panel found no cause for concern that AI is an imminent threat to
humankind’. This argument simply misstates the reasons for concern, which are
not predicated on imminence. In his 2014 book, Superintelligence: Paths, Dangers,
Strategies, Nick Bostrom, for one, writes, ‘It is no part of the argument in this
book that we are on the threshold of a big breakthrough in artificial intelligence,
or that we can predict with any precision when such a development might occur.’
Bostrom’s estimate that superintelligent AI might arrive within this century is
roughly consistent with my own, and both are considerably more conservative than
those of the typical AI researcher.
• Any machine intelligent enough to cause trouble will be intelligent enough to have
appropriate and altruistic objectives:5 This argument is related to Hume’s is–ought
problem and G. E. Moore’s naturalistic fallacy, suggesting that somehow the
machine, as a result of its intelligence, will simply perceive what is right given its
experience of the world. This is implausible; for example, one cannot perceive,
in the design of a chessboard and chess pieces, the goal of checkmate; the same
chessboard and pieces can be used for suicide chess, or indeed many other games
still to be invented. Put another way: where Bostrom imagines humans driven

5 Rodney Brooks (2017), for example, asserts that it’s impossible for a program to be ‘smart enough that
it would be able to invent ways to subvert human society to achieve goals set for it by humans, without
understanding the ways in which it was causing problems for those same humans’. Often, the argument adds
the premise that people of greater intelligence tend to have more altruistic objectives, a view that may be
related to the self-conception of those making the argument. Chalmers (2010) points to Kant’s view that an
entity necessarily becomes more moral as it becomes more rational, while noting that nothing in our current
understanding of AI supports this view when applied to machines.
extinct by a putative robot that turns the planet into a sea of paperclips, we
humans see this outcome as tragic, whereas the iron-eating bacterium Thiobacillus
ferrooxidans is thrilled. Who’s to say the bacterium is wrong? The fact that a machine
has been given a fixed objective by humans doesn’t mean that it will automatically
take on board as additional objectives other things that are important to humans.
Maximizing the objective may well cause problems for humans; the machine may
recognize those problems as problematic for humans; but, by definition, they are not
problematic within the standard model from the point of view of the given objective.
• Intelligence is multidimensional, ‘so smarter than humans’ is a meaningless concept: This
argument, due to Kevin Kelly (2017), draws on a staple of modern psychology—
the fact that a scalar IQ does not do justice to the full range of cognitive skills
that humans possess to varying degrees. IQ is indeed a crude measure of human
intelligence, but it is utterly meaningless for current AI systems because their
capabilities across different areas are uncorrelated. How do we compare the IQ
of Google’s search engine, which cannot play chess, to that of Deep Blue, which
cannot answer search queries? None of this supports the argument that because
intelligence is multifaceted, we can ignore the risk from superintelligent machines.
If ‘smarter than humans’ is a meaningless concept, then ‘smarter than gorillas’ is
also meaningless, and gorillas therefore have nothing to fear from humans. Clearly,
that argument doesn’t hold water. Not only is it logically possible for one entity to
be more capable than another across all the relevant dimensions of intelligence, it
is also possible for one species to represent an existential threat to another even if
the former lacks an appreciation for music and literature.

1.4 Solutions
Can we tackle Wiener’s warning head-on? Can we design AI systems whose purposes
don’t conflict with ours, so that we’re sure to be happy with how they behave? On the
face of it, this seems hopeless because it will doubtless prove infeasible to write down
our purposes correctly or imagine all the counterintuitive ways a superintelligent entity
might fulfil them.
If we treat superintelligent AI systems as if they were black boxes from outer space,
then indeed there is no hope. Instead, the approach we seem obliged to take, if we are to
have any confidence in the outcome, is to define some formal problem F and design AI
systems to be F-solvers, such that the closer the AI system comes to solving F perfectly,
the greater the benefit to humans. In simple terms, the more intelligent the machine, the
better the outcome for humans: we hope the machine’s intelligence will be applied both
to learning our true objectives and to helping us achieve them. If we can work out an
appropriate F that has this property, we will be able to create provably beneficial AI.
There is, I believe, an approach that may work. Humans can reasonably be described
as having (mostly implicit and partially formed) preferences over their future lives—that
is, given enough time and unlimited visual aids, a human could express a preference
(or indifference) when offered a choice between two future lives laid out before him
or her in all their aspects. (This idealization ignores the possibility that our minds are
composed of subsystems with effectively incompatible preferences; if true, that would
limit a machine’s ability to satisfy our preferences optimally, but it doesn’t seem to prevent
us from designing machines that avoid catastrophic outcomes.) The formal problem F to
be solved by the machine in this case is a game-theoretic one: to maximize human future-
life preferences subject to its initial uncertainty as to what they are, in an environment
that includes human participants. Furthermore, although the future-life preferences are
hidden variables, they’re grounded in a voluminous source of evidence, namely, all of the
human choices ever made.
This formulation sidesteps Wiener’s problem, because we do not put a fixed purpose
in the machine according to which it can rank all possible futures. Instead, the machine
knows that it doesn’t know the true preference ranking, so it naturally acts cautiously to
avoid violating potentially important but unknown preferences. (We can certainly include
fairly strong priors on the positive value of life, health, etc., to make the machine more
useful more quickly.) The machine may learn more about human preferences as it goes
along, of course, but it will never achieve complete certainty. Such a machine will be
motivated to ask questions, to seek permission or additional feedback before undertaking
any potentially risky course of action, to defer to human instruction, and to allow itself
to be switched off. These behaviours are not built in via preprogrammed scripts or rules;
rather, they fall out as solutions of the formal problem F.
As noted in the introduction, this involves a shift from a unary view of AI to a
binary one. The classical view, in which a fixed objective is given to the machine, is
illustrated qualitatively in Figure 1.1. Once the machine has a fixed objective, it will act
to optimize the achievement of the objective; its behaviour is effectively independent of
the human’s behaviour.6 On the other hand, when the human objective is unobserved
by the machine (see Figure 1.2), the human and machine behaviours remain coupled
information-theoretically, because human behaviour provides further information about
human objectives.


Figure 1.1 (a) The classical AI situation in which the human objective is considered fixed and known
by the machine, depicted as a notional graphical model. Given the objective, the machine’s behaviour is
(roughly speaking) independent of any subsequent human behaviour, as depicted in (b). This unary view
of AI is tenable only if the human objective can be completely and correctly stated.

6 The independence is not strict because the human’s behaviour can provide information about the state of
the world. Thus, a passenger in an automated taxi could tell the taxi that snipers have been reported on the
road it intends to take, picking off passengers for fun; but this might affect the taxi’s behaviour only if it already
knows that death by gunfire is undesirable for humans.

Figure 1.2 When the human objective is unobserved, machine behaviour is no longer independent of
human behaviour, because the latter provides more information about the human objective.

1.4.1 Assistance games


This basic idea is made more precise in the framework of assistance games—originally
known as cooperative inverse reinforcement learning (CIRL) games in the terminology
of Hadfield-Menell et al. (2017a). The simplest case of an assistance game involves two
agents, one human and the other a robot. It is a game of partial information, because,
while the human knows the reward function, the robot does not—even though the robot’s
job is to maximize it. It may involve a form of inverse reinforcement learning (Russell
1998; Ng and Russell 2000) because the robot can learn more about human preferences
from the observation of human behaviour—a process that is the dual of reinforcement
learning, wherein behaviour is learned from rewards and punishments.
To illustrate assistance games, I’ll use the paperclip game. It’s a very simple game
in which Harriet the human has an incentive to ‘signal’ to Robbie the robot some
information about her preferences. Robbie is able to interpret that signal because he
can solve the game and therefore he can understand what would have to be true about
Harriet’s preferences in order for her to signal in that way.
The steps of the game are depicted in Figure 1.3. It involves making paperclips and
staples. Harriet’s preferences are expressed by a payoff function that depends on the
number of paperclips and the number of staples produced, with a certain ‘exchange
rate’ between the two. Harriet’s preference parameter θ denotes the relative value (in
dollars) of a paperclip; for example, she might value paperclips at θ = 0.45 dollars,
which means staples are worth 1 − θ = 0.55 dollars. So, if p paperclips and s staples
are produced, Harriet’s payoff will be pθ + s(1 − θ) dollars in all. Robbie’s prior is
P(θ) = Uniform(θ; 0, 1). In the game itself, Harriet goes first and can choose to make two
paperclips, two staples, or one of each. Then Robbie can choose to make 90 paperclips,
90 staples, or 50 of each.
Notice that if she were doing this by herself, Harriet would just make two staples, with
a value of $1.10. (See the annotations at the first level of the tree in Figure 1.3.) But
Robbie is watching, and he learns from her choice. What exactly does he learn? Well,
that depends on how Harriet makes her choice. How does Harriet make her choice?
That depends on how Robbie is going to interpret it. One can resolve this circularity
by finding a Nash equilibrium. In this case, it is unique and can be found by applying
the iterated-best-response algorithm: pick any strategy for Harriet; pick the best strategy

Figure 1.3 The paperclip game. Each branch is labeled [p, s] denoting the number of paperclips and
staples manufactured on that branch. Harriet the human can choose to make two paperclips, two staples,
or one of each. (The values in green italics are the values for Harriet if the game ended there, assuming
θ = 0.45.) Robbie the robot then has a choice to make 90 paperclips, 90 staples, or 50 of each.

for Robbie, given Harriet’s strategy; pick the best strategy for Harriet, given Robbie’s
strategy; and so on. The process unfolds as follows:

1. Start with the greedy strategy for Harriet: make two paperclips if she prefers paper-
clips; make one of each if she is indifferent; make two staples if she prefers staples.
2. There are three possibilities Robbie has to consider, given this strategy for Harriet:
(a) If Robbie sees Harriet make two paperclips, he infers that she prefers pa-
perclips, so he now believes the value of a paperclip is uniformly distributed
between 0.5 and 1.0, with an average of 0.75. In that case, his best plan is to
make 90 paperclips with an expected value of $67.50 for Harriet.
(b) If Robbie sees Harriet make one of each, he infers that she values paperclips
and staples at 0.50, so the best choice is to make 50 of each.
(c) If Robbie sees Harriet make two staples, then by the same argument as in
(a), he should make 90 staples.
3. Given this strategy for Robbie, Harriet’s best strategy is now somewhat different
from the greedy strategy in step 1. If Robbie is going to respond to her making
one of each by making 50 of each, then she is better off making one of each not
just if she is exactly indifferent, but if she is anywhere close to indifferent. In fact,
the optimal policy is now to make one of each if she values paperclips anywhere
between about 0.446 and 0.554.
4. Given this new strategy for Harriet, Robbie’s strategy remains unchanged. For
example, if she chooses one of each, he infers that the value of a paperclip is
uniformly distributed between 0.446 and 0.554, with an average of 0.50, so the
best choice is to make 50 of each. Because Robbie’s strategy is the same as in
step 2, Harriet’s best response will be the same as in step 3, and we have found the
equilibrium.
With her strategy, Harriet is, in effect, teaching Robbie about her preferences using
a simple code—a language, if you like—that emerges from the equilibrium analysis.
Note also that Robbie never learns Harriet’s preferences exactly, but he learns enough
to act optimally on her behalf—that is, he acts (given his limited options) just as he
would if he did know her preferences exactly. He is provably beneficial to Harriet under
the assumptions stated, and under the assumption that Harriet is playing the game
correctly.
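To make the iterated-best-response analysis above concrete, here is a small numerical sketch. The discretization of θ, the tie-breaking rule, and all names below are my own assumptions, not taken from the chapter. It starts from Harriet's greedy strategy and alternates best responses until a fixed point, recovering the indifference band of roughly [0.446, 0.554] and Robbie's responses described in steps 2–4.

```python
import numpy as np

# A minimal numerical sketch of iterated best response for the paperclip game.
thetas = np.linspace(0.001, 0.999, 2001)      # grid over Harriet's preference theta
h_acts = [(2, 0), (1, 1), (0, 2)]             # Harriet's options: (paperclips, staples)
r_acts = [(90, 0), (50, 50), (0, 90)]         # Robbie's options

def payoff(act, theta):
    """Harriet's payoff in dollars for a (paperclips, staples) bundle."""
    p, s = act
    return p * theta + s * (1 - theta)

# Harriet's initial greedy strategy: her best immediate action at each theta.
h_strategy = np.array([max(range(3), key=lambda i: payoff(h_acts[i], t)) for t in thetas])

for _ in range(10):                           # alternate best responses to a fixed point
    # Robbie's best response to each observed action, using the induced posterior mean
    # (payoffs are linear in theta, so the posterior mean is sufficient).
    r_strategy = {}
    for i in range(3):
        post = thetas[h_strategy == i]
        mean = post.mean() if len(post) else 0.5
        r_strategy[i] = max(range(3), key=lambda j: payoff(r_acts[j], mean))
    # Harriet's best response, anticipating how Robbie will react to her signal.
    new_h = np.array([
        max(range(3), key=lambda i: payoff(h_acts[i], t) + payoff(r_acts[r_strategy[i]], t))
        for t in thetas
    ])
    if np.array_equal(new_h, h_strategy):
        break
    h_strategy = new_h

band = thetas[h_strategy == 1]                # thetas for which Harriet makes one of each
print(f"Harriet makes one of each for theta in roughly [{band.min():.3f}, {band.max():.3f}]")
print("Robbie's responses:", {h_acts[i]: r_acts[r_strategy[i]] for i in range(3)})
```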
It is possible to prove that provided there are no ties that cause coordination problems,
finding an optimal strategy for the robot in an assistance game can be done by solving a
single-agent partially observable Markov decision process (POMDP) whose state space
is the underlying state space of the game plus the human preference parameters θ.
POMDPs in general are very hard to solve, but the POMDPs that represent assistance
games have additional structure that enables more efficient algorithms (Malik et al. 2018).

1.4.2 The off-switch game


Within the same basic framework, one can also show that a robot solving an assistance
game will defer to a human and allow itself to be switched off. This property is illustrated
in the off-switch game shown in Figure 1.4 (Hadfield-Menell et al. 2017b). Robbie is now
helping Harriet find a hotel room for the International Paperclip Convention in Geneva.
Robbie can act now—let’s say he can book Harriet into a very expensive hotel near the
meeting venue. He is quite unsure how much Harriet will like the hotel and its price;
let’s say he has a uniform probability for its net value to Harriet between −40 and +60,
with an average of +10. He could also ‘switch himself off’—less melodramatically, take
himself out of the hotel booking process altogether—which is defined (without loss of
generality) to have value 0 to Harriet. If those were his two choices, he would go ahead
and book the hotel, incurring a significant risk of making Harriet unhappy. (If the range
were −60 to +40, with average −10, he would switch himself off instead.) I’ll give Robbie


Figure 1.4 The off-switch game. R, the robot, can choose to act now, with a highly uncertain payoff; to
switch itself off; or to defer to H , the human. H can switch R off or let it go ahead. R now has the same
choice again. Acting still has an uncertain payoff, but now R knows the payoff is nonnegative.
a third choice, however: explain his plan, wait, and let Harriet switch him off. Harriet can
either switch him off or let him go ahead and book the hotel. What possible good could
this do, one might ask, given that he could make both of those choices himself?
The point is that Harriet’s choice—to switch Robbie off or let him go ahead—provides
Robbie with information about Harriet’s preferences. I’ll assume, for now, that Harriet
is rational, so if Harriet lets Robbie go ahead, it means the value to Harriet is positive.
Now, as shown in Figure 1.4, Robbie’s belief changes: it is uniform between 0 and +60,
with an average of +30.
So, if one evaluates Robbie’s initial choices from his point of view:

1. Acting now and booking the hotel has an expected value of +10.
2. Switching himself off has a value of 0.
3. Waiting and letting Harriet switch him off leads to two possible outcomes:
(a) There is a 40% chance, based on Robbie’s uncertainty about Harriet’s
preferences, that she will hate the plan and will switch Robbie off, with
value 0.
(b) There is a 60% chance Harriet will like the plan and allow Robbie to go ahead,
with expected value +30.
Thus, waiting has expected value (0.4 × 0) + (0.6 × 30) = +18, which is better than the
+10 Robbie expects if he acts now.
The upshot is that Robbie has a positive incentive to defer to Harriet—that is, to allow
himself to be switched off. This incentive comes directly from Robbie’s uncertainty about
Harriet’s preferences. Robbie is aware that there’s a chance (40% in this example) that
he might be about to do something that will make Harriet unhappy, in which case being
switched off would be preferable to going ahead. Were Robbie already certain about
Harriet’s preferences, he would just go ahead and make the decision (or switch himself
off); there would be absolutely nothing to be gained from consulting Harriet, because,
according to Robbie’s definite beliefs, he can already predict exactly what she is going to
decide.
In fact, it is possible to prove the same result in the general case: as long as Robbie is
not completely certain that he’s about to do what Harriet herself would do, he is better off
allowing her to switch him off. Intuitively, her decision provides Robbie with information,
and the expected value of information is always nonnegative. Conversely, if Robbie
is certain about Harriet’s decision, her decision provides no new information, and so
Robbie has no incentive to allow her to decide.
Formally, let P (u) be Robbie’s prior probability density over Harriet’s utility for the
proposed action a. Then the value of going ahead with a is

$$EU(a) \;=\; \int_{-\infty}^{\infty} P(u)\,u\,du \;=\; \int_{-\infty}^{0} P(u)\,u\,du \;+\; \int_{0}^{\infty} P(u)\,u\,du.$$
On the other hand, the value of action d, deferring to Harriet, is composed of two parts:
if u > 0 then Harriet lets Robbie go ahead, so the value is u, but if u < 0 then Harriet
switches Robbie off, so the value is 0:
$$EU(d) \;=\; \int_{-\infty}^{0} P(u)\cdot 0\,du \;+\; \int_{0}^{\infty} P(u)\,u\,du.$$

Comparing the expressions for EU(a) and EU(d), it follows immediately that

$$EU(d) \;\geq\; EU(a)$$

because the expression for EU(d) has the negative-utility region zeroed out. The two
choices have equal value only when the negative region has zero probability—that is,
when Robbie is already certain that Harriet likes the proposed action.
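As a quick sanity check on these expressions, the following Monte Carlo sketch (my own set-up, assuming the Uniform(−40, +60) prior from the hotel example above) estimates EU(a) and EU(d) numerically. It reproduces the +10 versus +18 comparison and illustrates why EU(d) ≥ EU(a) holds in general: max(u, 0) ≥ u pointwise, so its expectation can never be smaller.

```python
import numpy as np

# Robbie's prior over Harriet's net utility u for the proposed action: Uniform(-40, +60).
rng = np.random.default_rng(0)
u = rng.uniform(-40, 60, size=1_000_000)   # samples of Harriet's unknown utility

EU_act = u.mean()                          # act now regardless: E[u], about +10
EU_off = 0.0                               # switch self off: defined as 0
EU_defer = np.maximum(u, 0).mean()         # defer: Harriet vetoes exactly when u < 0

print(f"EU(act)   = {EU_act:6.2f}")        # ~ +10
print(f"EU(defer) = {EU_defer:6.2f}")      # ~ +18 = 0.6 * 30
assert EU_defer >= EU_act - 1e-9           # E[max(u,0)] >= E[u] for any prior on u
```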
There are some obvious elaborations on the model that are worth exploring imme-
diately. The first elaboration is to impose a cost for Harriet’s time. In that case, Robbie
is less inclined to bother Harriet if the downside risk is small. This is as it should be.
And if Harriet is really grumpy about being interrupted, she shouldn’t be too surprised
if Robbie occasionally does things she doesn’t like.
The second elaboration is to allow for some probability of human error—that is,
Harriet might sometimes switch Robbie off even when his proposed action is reasonable,
and she might sometimes let Robbie go ahead even when his proposed action is
undesirable. It is straightforward to fold this error probability into the model (Hadfield-
Menell et al. 2017b). As one might expect, the solution shows that Robbie is less inclined
to defer to an irrational Harriet who sometimes acts against her own best interests. The
more randomly she behaves, the more uncertain Robbie has to be about her preferences
before deferring to her. Again, this is as it should be: for example, if Robbie is a self-
driving car and Harriet is his naughty two-year-old passenger, Robbie should not allow
Harriet to switch him off in the middle of the highway.
The off-switch example suggests some templates for controllable-agent designs and
provides a simple example of a provably beneficial system in the sense introduced above.
The overall approach resembles principal–agent problems in economics, wherein the
principal (e.g., an employer) needs to incentivize another agent (e.g., an employee) to
behave in ways beneficial to the principal. The key difference here is that we are building
one of the agents in order to benefit the other. Unlike a human employee, the robot should
have no interests of its own whatsoever.
Assistance games can be generalized to allow for imperfectly rational humans
(Hadfield-Menell et al. 2017b), humans who don’t know their own preferences (Chan
et al. 2019), multiple human participants, multiple robots, and so on. Scaling up to
complex environments and high-dimensional perceptual inputs may be possible using
methods related to deep inverse reinforcement learning. By providing a factored or
structured action space, as opposed to the simple atomic actions in the paperclip
game, the opportunities for communication can be greatly enhanced. Few of these
variations have been explored so far, but I expect the key property of assistance games to
remain true: robots that solve such games will be beneficial (in expectation) to humans
(Hadfield-Menell et al. 2017a).
While the basic theory of assistance games assumes perfectly rational robots that can
solve the assistance game exactly, this is unlikely to be possible in practical situations.
Indeed, one expects to find qualitatively different phenomena occurring when the robot
is much less capable than, roughly as capable as, or much more capable than the human.
There is good reason to hope that in all cases improving the robot’s capability will be
beneficial to the human, because it will do a better job of learning human preferences
and a better job of satisfying them.

1.4.3 Acting with unknown preferences


Multiattribute utility theory (Keeney and Raiffa 1976) views the world as composed
of a set of attributes {X1 , . . . , Xn }, with preferences defined on lotteries over complete
assignments to the attributes. This is clearly an oversimplification, but it suffices for our
purpose in exploring some basic phenomena.
In some cases, a machine’s scope of action is strictly limited. For example, a (non-
Internet-connected) thermostat can only turn the heating on and off, and, to a first
approximation, affects the temperature in the house and the owner’s bank balance.7 It is
plausible in this case to imagine that the thermostat might develop a decent model of the
user’s preferences over temperature and cost attributes.
In the great majority of circumstances, however, the AI system’s knowledge of human
preferences will be extremely incomplete compared to its scope of action. How can it be
useful if this is the case? Can it even fetch the coffee?
It turns out that the answer is yes, if we understand ‘fetch the coffee’ the right way.
‘Fetch the coffee’ does not divide the world into goal states (where the human has coffee)
and non-goal states. Instead, it says that the human’s current preferences rank coffee
states above non-coffee states all other things being equal. This idea of goals as ceteris
paribus comparatives is well-established (von Wright 1972; Wellman and Doyle 1991). In
this context, it suggests that the machine should act in a minimally invasive fashion—that
is, satisfy the preferences it knows about (coffee) without disturbing any other attributes
of the world.
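A toy sketch of this minimally invasive reading of 'fetch the coffee': among candidate plans whose outcomes are coffee states, prefer the one that disturbs the fewest other attributes. The attribute names and candidate plans below are invented purely for illustration.

```python
# Ceteris paribus goal with a minimal-disturbance tie-break (illustrative only).
CURRENT = {'has_coffee': False, 'kitchen_tidy': True, 'cat_alive': True, 'bank_balance': 100}

PLAN_OUTCOMES = {
    'raid_the_kitchen': {'has_coffee': True, 'kitchen_tidy': False, 'cat_alive': True, 'bank_balance': 95},
    'buy_from_cafe':    {'has_coffee': True, 'kitchen_tidy': True,  'cat_alive': True, 'bank_balance': 97},
    'do_nothing':       dict(CURRENT),
}

def disturbance(before, after):
    """How many non-goal attributes the plan changes."""
    return sum(before[k] != after[k] for k in before if k != 'has_coffee')

coffee_plans = {p: s for p, s in PLAN_OUTCOMES.items() if s['has_coffee']}
best = min(coffee_plans, key=lambda p: disturbance(CURRENT, coffee_plans[p]))
print(best)  # -> 'buy_from_cafe': gets the coffee while leaving most of the world alone
```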
There remains the question of why the machine should assume that leaving other
attributes unaffected is better than disturbing them in some random way. One possible
answer is some form of risk aversion, but I suspect this is not enough. One has a sense
that a machine that does nothing is better than one that acts randomly, but this is certainly
not implicit in the standard formulation of MDPs. I think one has to add the assumption
that the world is not in an arbitrary state; rather, it resembles a state sampled from the

7 In reality, it is very difficult to limit the effects of the agent’s actions to a small set of attributes. Turning
the heat off may make the occupants more susceptible to viral infection, and profligate heating may tip the
occupants into bankruptcy, and so on. A device connected to the Internet, with the ability to send character
streams, can affect the entire planet through propaganda, online trading, etc.
stationary distribution that results from the actions of human agents operating according
to their preferences (Shah et al. 2019). In that case, one expects a random action to make
things worse for the humans.
There is another kind of action that is beneficial to humans even when the machine
knows nothing at all about human preferences: an action that simply expands the set
of actions available to the human. For example, if Harriet has forgotten her password,
Robbie can give her the password, enabling a wider range of actions than Harriet could
otherwise execute.

1.5 Reasons for Optimism


There are some reasons to think this approach may work in practice. First, there is
abundant written and filmed information about humans doing things (and other humans
reacting). More or less every book ever written contains evidence on this topic. Even
the oldest clay tablets, tediously recording the exchange of N sheep for M oxen, give
information about human preferences between sheep and oxen. Technology to build
models of human preferences from this storehouse will presumably be available long
before superintelligent AI systems are created.
Second, there are strong near-term economic incentives for robots to understand
human preferences, which also come into play well before the arrival of superintelligence.
Already, computer systems record one’s preferences for an aisle seat or a vegetarian meal.
More sophisticated personal assistants will need to understand their user’s preferences
for cost, luxury, and convenient location when booking hotels, and how these preferences
depend on the nature and schedule of the user’s planned activities. Managing a busy
person’s calendar and screening calls and emails requires an even more sophisticated
understanding of the user’s life, as does the management of an entire household when
entrusted to a domestic robot. For all such roles, trust is essential but easily lost if the
machine reveals itself to lack a basic understanding of human preferences. If one poorly
designed domestic robot cooks the cat for dinner, not realizing that its sentimental value
outweighs its nutritional value, the domestic-robot industry will be out of business.
For companies and governments to adopt the new model of AI, a great deal of
research must be done to replace the entire toolbox of AI methods, all of which have
been developed on the assumption that the objective is known exactly. There are two
primary issues for each class of task environments: how to relax the assumption of a
known objective and what form of interaction to assume between the machine and the
human. For example, problem-solving task environments have an objective defined by
a goal test G(s) and a stepwise cost function c(s, a, s′). Perhaps the machine knows a
relaxed predicate G′ ⊃ G and upper and lower bounds c+ and c− on the cost function,
and can ask the human (1) whether any given state s satisfies G and (2) whether one
trajectory to s is preferred to another. Design considerations include formal precision,
algorithmic complexity, feasibility of the interaction protocol from the human point of
view, and applicability in real-world circumstances.
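As a rough sketch of what such an interaction protocol might look like (the function names and the ask_human stub are my own assumptions, not a proposal from this chapter), the machine can use its relaxed goal test and optimistic cost bound to filter candidates, and reserve its two kinds of question for the human:

```python
# Hedged sketch of the two-query interaction suggested above: the machine knows
# only a relaxed goal test (true of every real goal state, and possibly more)
# and bounds on trajectory cost, and asks the human the rest.
def search_with_human(candidates, g_relaxed, cost_lower, cost_upper, ask_human):
    """candidates: iterable of (state, trajectory) pairs proposed by the planner."""
    best_traj, best_bound = None, float('inf')
    for state, trajectory in candidates:
        if not g_relaxed(state):
            continue                      # cannot possibly satisfy the true goal test G
        if cost_lower(trajectory) >= best_bound:
            continue                      # even the optimistic bound is no improvement
        # Query type (1): does this state satisfy the true goal?
        if not ask_human(f"Is this what you wanted? {state}"):
            continue
        # Query type (2): is this trajectory preferred to the current best?
        if best_traj is None or ask_human(f"Prefer {trajectory} over {best_traj}?"):
            best_traj, best_bound = trajectory, cost_upper(trajectory)
    return best_traj
```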
The standard model of AI as maximizing objectives does not imply that all AI systems
have to solve some particular problem formulation such as an influence diagram or a
factored MDP. For example, it is entirely consistent with the standard model to build an
AI system directly as a policy, expressed as a set of condition–action rules specifying the
optimal action in each category of states.8 By the same token, I am not proposing that all
AI systems under the new model have to solve some explicitly formulated representation
of the assistance game. It is important to maintain a broad conception of the approach
and how it applies to the design of AI systems for any particular task environment. The
crucial elements are (1) acknowledgement that there is partial and uncertain information
about the true human preferences that are relevant in the task environment; (2) a
means for information to flow at run-time from humans to machines concerning those
preferences; and (3) allowance for the human to be a joint participant in the run-time
process.

1.6 Obstacles
There are obvious difficulties with an approach that expects machines to learn underlying
preferences from observing human behaviour. The first is that humans are irrational, in
the sense that our actions do not reflect our preferences. This irrationality arises in part
from our computational limitations relative to the complexity of the decision problems we
face. For example, if two humans are playing chess and one of them loses, it’s because the
loser (and possibly the winner too) made a mistake—a move that led inevitably to a forced
loss. A machine observing that move and assuming perfect rationality on the part of the
human might well conclude that the human preferred to lose. Thus, to avoid reaching
such conclusions, the machine must take into account the actual cognitive mechanisms
of humans.
As yet, we do not know enough about human cognitive mechanisms to invert real
human behaviour to get at the underlying preferences. One thing that seems intuitively
clear, however, is that one of our principal methods for coping with the complexity
of the world is to organize our behaviour hierarchically. That is, we make (defeasible)
commitments to higher-level goals such as ‘write an essay on a human-compatible
approach to AI’; then, rather than considering all possible sequences of words, from
‘aardvark aardvark aardvark . . .’ to ‘zyzzyva zyzzyva zyzzyva . . .’ as a chess program would
do, we choose among subtasks such as ‘write the introduction’ and ‘read more about
preference elicitation’. Eventually, we get down to the choice of words, and then typing
each word involves a sequence of keystrokes, each of which is in turn a sequence of motor
control commands to the muscles of the arms and hands. At any given point, then, a
human is embedded at various particular levels of multiple deep and complex hierarchies
of partially overlapping activities and subgoals. This means that for the machine to

8 Many applications of control theory work exactly this way: the control theorist works offline with a
mathematical model of the system and the objective to derive a control law that is then implemented in the
controller.
understand human actions, it probably needs to understand a good deal about what
these hierarchies are and how we use them to navigate the real world.
Machines might try to discover more about human cognitive mechanisms by an
inductive learning approach. Suppose that in some given state s Harriet’s action a
depends on her preferences θ according to mechanism h, that is, a = h(θ, s). (Here,
θ represents not a single parameter such as the exchange rate between staples and
paperclips, but Harriet’s preferences over future lives, which could be a structure of
arbitrary complexity.) By observing many examples of s and a, is it possible eventually to
recover h and θ? At first glance, the answer seems to be no (Armstrong and Mindermann
2019). For example, one cannot distinguish between the following hypotheses about how
Harriet plays chess:

1. h maximizes the satisfaction of preferences, and θ is the desire to win games.


2. h minimizes the satisfaction of preferences, and θ is the desire to lose games.

From the outside, Harriet plays perfect chess under either hypothesis.9 If one is merely
concerned with predicting her next move, it doesn’t matter which formulation one
chooses. On the other hand, for a machine whose goal is to help Harriet realize her
preferences, it really does matter! The machine needs to know which explanation
holds. From this viewpoint, something is seriously wrong with the second explanation
of behaviour. If Harriet’s cognitive mechanism h were really trying to minimize the
satisfaction of preferences θ, it wouldn’t make sense to call θ her preferences. It is, then,
simply a mistake to suppose that h and θ are separately and independently defined. I have
already argued that the assumption of perfect rationality—that is, h is maximization—is
too strong; yet, for it to make sense to say that Harriet has preferences, h will have to
satisfy (or nearly satisfy) some basic properties associated with rationality. These might
include choosing correctly according to preferences in situations that are computationally
trivial—for example, choosing between vanilla and bubble-gum ice cream at the beach.
Cherniak (1986) presents an in-depth analysis of these minimal conditions on rationality.
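The maximizer/minimizer ambiguity can be made concrete in a few lines: the two (h, θ) hypotheses below disagree completely about what Harriet wants, yet they predict identical choices on every observable decision problem, so behaviour alone cannot separate them. The toy decision problems are invented for illustration.

```python
# Two (h, theta) hypotheses that are observationally equivalent: a maximizer of
# "win" preferences and a minimizer of "lose" preferences choose identically.
decision_problems = [
    {'resign': -1.0, 'play_safe': 0.2, 'winning_move': 1.0},
    {'blunder': -0.5, 'solid_move': 0.4},
]

def h_maximize(theta, options):   # hypothesis 1: h maximizes, theta = desire to win
    return max(options, key=lambda a: theta(options[a]))

def h_minimize(theta, options):   # hypothesis 2: h minimizes, theta = desire to lose
    return min(options, key=lambda a: theta(options[a]))

theta_win = lambda value: value          # ranks outcomes by how winning they are
theta_lose = lambda value: -value        # ranks outcomes by how losing they are

for options in decision_problems:
    assert h_maximize(theta_win, options) == h_minimize(theta_lose, options)
print("Both hypotheses predict exactly the same moves.")
```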
Further difficulties arise if the machine succeeds in identifying Harriet’s preferences,
but finds them to be inconsistent. For example, suppose she prefers vanilla to bubble gum
and bubble gum to pistachio, but prefers pistachio to vanilla. In that case her preferences
violate the axiom of transitivity and there is no way to maximally satisfy her preferences.
(That is, whatever ice cream the machine gives her, there is always another that she
would prefer.) In such cases, the machine could attempt to satisfy Harriet’s preferences
up to inconsistency; for example, if Harriet strictly prefers all three of the aforementioned
flavors to licorice, then it should avoid giving her licorice ice cream.
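A minimal sketch of satisfying preferences 'up to inconsistency': represent Harriet's stated pairwise preferences directly, tolerate the cycle, and rule out only those options to which every remaining alternative is strictly preferred. The set-of-pairs representation is an illustrative assumption.

```python
# Detect which options survive once strictly dominated ones are excluded,
# even though the stated preferences contain an intransitive cycle.
PREFERS = {('vanilla', 'bubble gum'), ('bubble gum', 'pistachio'),
           ('pistachio', 'vanilla'),                       # the intransitive cycle
           ('vanilla', 'licorice'), ('bubble gum', 'licorice'),
           ('pistachio', 'licorice')}
OPTIONS = {'vanilla', 'bubble gum', 'pistachio', 'licorice'}

def acceptable(options, prefers):
    """Options that are not strictly dispreferred to every alternative."""
    return {x for x in options
            if not all((y, x) in prefers for y in options - {x})}

print(acceptable(OPTIONS, PREFERS))   # licorice is excluded; the cycle members remain
```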

9 Of course, the Harriet who prefers to lose might grumble when she keeps winning, thereby giving a clue
as to which Harriet she is. One response to this is that grumbling is just more behaviour, and equally subject
to multiple interpretations. Another response is to say that Harriet might feel grumbly but, in keeping with her
minimizing h, would instead jump for joy. This is not to say that there is no fact of the matter as to whether
Harriet is pleased or displeased with the outcome.
Of course, the inconsistency in Harriet's preferences could be of a far more radical
nature. Many theories of cognition, such as Minsky's Society of Mind (1986), posit
multiple cognitive subsystems that, in essence, have their own preference structures and
compete for control—and these seem to be manifested in addictive and self-destructive
behaviours, among others. Such inconsistencies place limits on the extent to which the
idea of machines helping humans even makes sense.
Also difficult, from a philosophical viewpoint, is the apparent plasticity of human
preferences—the fact that they seem to change over time as the result of experiences. It is
hard to explain how such changes can be made rationally, because they make one’s future
self less likely to satisfy one’s present preferences about the future. Yet plasticity seems
fundamentally important to the entire enterprise, because newborn infants certainly lack
the rich, nuanced, culturally informed preference structures of adults. Indeed, it seems
likely that our preferences are at least partially formed by a process resembling inverse
reinforcement learning, whereby we absorb preferences that explain the behaviour of
those around us. Such a process would tend to give cultures some degree of autonomy
from the otherwise homogenizing effects of our dopamine-based reward system.
Plasticity also raises the obvious question of which Harriet the machine should try
to help: Harriet2020, Harriet2035, or some time-averaged Harriet? (See Pettigrew (2020)
for a full treatment of this approach, wherein decisions for individuals who change over
time are made as if they were decisions made on behalf of multiple distinct individuals.)
Plasticity is also problematic because of the possibility that the machine may, by subtly
influencing Harriet’s environment, gradually mould her preferences in directions that
make them easier to satisfy, much as certain political forces have been said to do with
voters in recent decades.
I am often asked, ‘Whose values should we align AI with?’ (The question is usually
posed in more accusatory language, as if my secret, Silicon-Valley-hatched plan is to align
all the world’s AI systems with my own white, male, Western, cisgender, Episcopalian
values.) Of course, this is simply a misunderstanding. The kind of AI system proposed
here is not ‘aligned’ with any values, unless you count the basic principle of helping
humans realize their preferences. For each of the billions of humans on Earth, the
machine should be able to predict, to the extent that its information allows, which life
that person would prefer.
Now, practical and social constraints will prevent all preferences from being maximally
satisfied simultaneously. We cannot all be Ruler of the Universe. This means that
machines must mediate among conflicting preferences—something that philosophers
and social scientists have struggled with for millennia. At one extreme, each machine
could pay attention only to the preferences of its owner, subject to legal constraints on its
actions. This seems undesirable, as it would have a machine belonging to a misanthrope
refuse to aid a severely injured pedestrian so that it can bring the newspaper home more
quickly. Moreover, we might find ourselves needing many more laws as machines satisfy
their owners’ preferences in ways that are very annoying to others even if not strictly
illegal. At the other extreme, if machines consider equally the preferences of all humans,
they might focus a larger fraction of their energies on the least fortunate than their owners
might prefer—a state of affairs not conducive to investment in AI. Presumably, some
middle ground can be found, perhaps combining a degree of obligation to the machine’s
owner with public subsidies that support contributions to the greater good. Determining
the ideal solution for this issue is an open problem.
Another common question is, ‘What if machines learn from evil people?’ Here, there
is a real issue. It is not that machines will learn to copy evil actions. The machine’s
actions need not resemble in any way the actions of those it observes, any more than
a criminologist’s actions resemble those of the criminals she observes. The machine
is learning about human preferences; it is not adopting those preferences as its own
and acting to satisfy them. For example, suppose that a corrupt passport official in a
developing country insists on a bribe for every transaction, so that he can afford to pay
for his children to go to school. A machine observing this will not learn to take bribes
itself: it has no need of money and understands (and wishes to avoid) the toll imposed
on others by the taking of bribes. The machine will instead find other, socially beneficial
ways to help send the children to school. Similarly, a machine observing humans killing
each other in war will not learn that killing is good: obviously, those on the receiving end
very much prefer not to be dead.
The difficult issue that remains is this: what should machines learn from humans
who enjoy the suffering of others? In such cases, any simple aggregation scheme for
preferences (such as adding utilities) would lead to some reduction in the utilities of
others in order to satisfy, at least partially, these perverse preferences. It seems reasonable
to require that machines simply ignore positive weights in the preferences of some for
the suffering of others (Harsanyi 1977).

1.7 Looking Further Ahead


If we assume, for the sake of argument, that all of these obstacles can be overcome, as well
as all of the obstacles to the development of truly capable AI systems, are we home free?
Would provably beneficial, superintelligent AI usher in a golden age for humanity? Not
necessarily. There remains the issue of adoption: how can we obtain broad agreement on
suitable design principles, and how can we ensure that only suitably designed AI systems
are deployed?
On the question of obtaining agreement at the policy level, it is necessary first
to generate consensus within the research community on the basic ideas of—and
design templates for—provably beneficial AI, so that policy-makers have some concrete
guidance on what sorts of regulations might make sense. The economic incentives noted
earlier are of the kind that would tend to support the installation of rigorous standards
at the early stages of AI development, because failures would be damaging to entire
industries, not just to the perpetrator and victim. We already see this in miniature with
the imposition of machine-checkable software standards for cell-phone applications.
On the question of enforcement of policies for AI software design, I am less sanguine.
If Dr Evil wants to take over the world, he or she might remove the safety catch, so to
speak, and deploy an AI system that ends up destroying the world instead. This problem
is a hugely magnified version of the problem we currently face with malware. Our track
record in solving the latter problem does not provide grounds for optimism concerning
the former. In Samuel Butler’s Erewhon and in Frank Herbert’s Dune, the solution is to
ban all intelligent machines, as a matter of both law and cultural imperative. Perhaps if
we find institutional solutions to the malware problem, we will be able to devise some
less drastic approach for AI.
The problem of misuse is not limited to evil masterminds. One possible future for
humanity in the age of superintelligent AI is that of a race of lotus eaters, progressively
enfeebled as machines take over the management of our entire civilization. This is the
future imagined in E. M. Forster’s story The Machine Stops, written in 1909. We may
say, now, that such a future is undesirable; the machines may agree with us and volunteer
to stand back, requiring humanity to exert itself and maintain its vigour. But exertion
is tiring, and we may, in our usual myopic way, design AI systems that are not quite so
concerned about the long-term vigour of humanity and just a little more helpful than they
would otherwise wish to be. Unfortunately, this process continues in a direction that is
hard to resist.

1.8 Conclusion
Finding a solution to the AI control problem is an important task; it may be, in Bostrom’s
words, ‘the essential task of our age’. It involves building systems that are far more
powerful than ourselves while still guaranteeing that those systems will remain powerless,
forever.
Up to now, AI research has focused on systems that are better at making decisions, but
this is not the same as making better decisions. No matter how excellently an algorithm
maximizes, and no matter how accurate its model of the world, a machine’s decisions
may be ineffably stupid, in the eyes of an ordinary human, if it fails to understand human
preferences.
This problem requires a change in the definition of AI itself—from a field concerned
with a unary notion of intelligence as the optimization of a given objective, to a field
concerned with a binary notion of machines that are provably beneficial for humans.
Taking the problem seriously seems likely to yield new ways of thinking about AI, its
purpose, and our relationship to it.

References
Armstrong, S. and Mindermann, S. (2019). Occam’s razor is insufficient to infer the preferences
of irrational agents, in Advances in Neural Information Processing Systems 31.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press.
Brooks, R. (2017). The seven deadly sins of AI predictions. MIT Technology Review,
6 October.
Chalmers, D. J. (2010). The singularity: a philosophical analysis. Journal of Consciousness Studies,
17, 7–65.
Chan, L., Hadfield-Menell, D., Srinivasa, S. et al. (2019). The assistive multi-armed bandit, in
Proceedings of the Fourteenth ACM/IEEE International Conference on Human–Robot Interaction.
Daegu, Republic of Korea; 11–14 March.
Cherniak, C. (1986). Minimal Rationality. Cambridge, MA: MIT Press.
Gates, W. (2015). Ask me anything. Reddit, 28 January. https://www.reddit.com/r/IAmA/
comments/2tzjp7/hi_reddit_im_bill_gates_and_im_back_for_my_third/
Hadfield-Menell, D., Dragan, A. D., Abbeel, P. et al. (2017a). Cooperative inverse reinforcement
learning, in Advances in Neural Information Processing Systems 29.
Hadfield-Menell, D., Dragan, A. D. et al. (2017b). The off-switch game, in Proceedings of the Twenty-
Sixth International Joint Conference on Artificial Intelligence. Melbourne; August 19–25.
Harsanyi, J. (1977). Morality and the theory of rational behavior. Social Research, 44, 623–56.
Keeney, R. L. and Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value
Tradeoffs. New York: Wiley.
Kelly, K. (2017). The myth of a superhuman AI. Wired, 25 April.
Kumparak, G. (2014). Elon Musk compares building artificial intelligence to ‘summoning the
demon’. TechCrunch, 26 October. https://techcrunch.com/2014/10/26/elon-musk-compares-
building-artificial-intelligence-to-summoning-the-demon/
Malik, D., Palaniappan, M., Fisac, J. F. et al. (2018). An efficient, generalized Bellman update
for cooperative inverse reinforcement learning, in Proceedings of the Thirty-Fifth International
Conference on Machine Learning. Stockholm; 10–15 July.
Minsky, M. L. (1986). The Society of Mind. New York, NY: Simon and Schuster.
Ng, A. Y. and Russell, S. J. (2000). Algorithms for inverse reinforcement learning, in Proceedings
of the Seventeenth International Conference on Machine Learning.
Omohundro, S. (2008). The basic AI drives, in AGI-08 Workshop on the Sociocultural, Ethical and
Futurological Implications of Artificial Intelligence. Memphis, TN; 4 March.
Osborne, H. (2017). Stephen Hawking AI warning: artificial intelligence could destroy civilization.
Newsweek, 7 November.
Pettigrew, R. (2020). Choosing for Changing Selves. Oxford: Oxford University Press.
Russell, S. J. (1998). Learning agents for uncertain environments, in Proceedings of the Eleventh
ACM Conference on Computational Learning Theory. Madison, WI; 24–26 July.
Shah, R., Krasheninnikov, D., Alexander, J. et al. (2019). The implicit preference information in
an initial state, in Proceedings of the Seventh International Conference on Learning Representations.
New Orleans; 6–9 May.
Stone, P., Brooks, R. A., Brynjolfsson, E. et al. (2016). Artificial intelligence and life in 2030.
Technical report, Stanford University One Hundred Year Study on Artificial Intelligence:
Report of the 2015–2016 Study Panel.
Turing, A. (1951). Can digital machines think? Lecture broadcast on BBC Third Programme.
Typescript available at http://www.turingarchive.org.
von Wright, G. (1972). The logic of preference reconsidered. Theory and Decision, 3, 140–67.
Wellman, M. P. and Doyle, J. (1991). Preferential semantics for goals, in Proceedings of the Ninth
National Conference on Artificial Intelligence. Anaheim, CA; 14–19 July.
Wiener, N. (1960). Some moral and technical consequences of automation. Science, 131(3410),
1355–8.
2
Alan Turing and Human-Like
Intelligence
Peter Millican
Hertford College, Oxford, UK

The idea of Human-Like Computing became central to visions of Artificial Intelligence


through the work of Alan Turing, whose model of computation (1936) is explicated
in terms of the potential operations of a human “computer”, and whose famous test
for intelligent machinery (1950) is based on indistinguishability from human verbal
behaviour. But here I shall challenge the apparent human-centredness of the 1936 model
(now known as the Turing machine), and suggest a different genesis with a primary
focus on the foundations of mathematics, and with human comparisons making an
entrance only in retrospective justification of the model. It will also turn out, more
surprisingly, that the 1950 account of intelligence is ultimately far less human-centred
than it initially appears to be, because the universality of computation—as established in
the 1936 paper—makes human intelligence just one variety amongst many. It is only
when Turing considers consciousness that he goes seriously astray in suggesting that
machine intelligence must be understood on the human model. But a better approach
is clearly revealed through his own earlier work, which gave ample reason to reinterpret
intelligence as sophisticated information processing for some purpose, and to divorce
this from the subjective consciousness with which it is humanly associated.

2.1 The Background to Turing’s 1936 Paper


Alan Turing’s remarkable 1936 paper, “On Computable Numbers, with an Application
to the Entscheidungsproblem”, introduced the first model of an all-purpose, programmable
digital computer, now universally known as the Turing machine. And the paper, as noted
above, gives the impression that this model is inspired by considering the potential
operations of a human “computer”. Yet the title and organisation of the paper suggest
instead that Turing is approaching the topic from the direction of fundamental issues
in the theory of mathematics, rather than any abstract analysis of human capabilities. It
will be useful to start with an overview of two essential components of this theoretical
background.

Peter Millican, Alan Turing and Human-Like Intelligence In: Human-Like Machine Intelligence. Edited by: Stephen Muggleton and Nick Chater,
Oxford University Press. © Oxford University Press (2021). DOI: 10.1093/oso/9780198862536.003.0002
The first of these components is Georg Cantor’s pioneering work on the countability
or enumerability of various infinite sets of numbers: the question of whether the elements
of these sets could in principle be set out—or enumerated—in a single list that contains
every element of the set at least once. Cantor had shown in 1891 that such enumeration of
rational numbers (i.e. fractions of integers) is indeed possible, since they can be exhaus-
tively ordered by the combined magnitude of their numerator and denominator.1 Real
numbers, however, cannot be enumerated, as demonstrated by his celebrated diagonal
proof, which proceeds by reductio ad absurdum. Focusing on real numbers between 0 and
1 expressed as infinite decimals,2 we start by assuming that an enumeration of these is
possible, and imagine them laid out accordingly in an infinite list R (so we are faced with
an array which is infinite both horizontally, owing to the infinite decimals, and vertically,
owing to the infinite list). We then imagine constructing another infinite decimal α by
taking its first digit α[1] from the first real number in the list r1 (so α[1] = r1 [1]), its
second digit from the second real number in the list (α[2] = r2 [2]), its third digit from
the third real number in the list (α[3] = r3 [3]), and so on. Thus α is the infinite decimal
that we get by tracing down the diagonal of our imagined array: in every case α has its nth
digit in common with rn . We now imagine constructing another infinite decimal number
β from α, by systematically changing every single digit according to some rule (e.g. if
α[n] = 0, then β[n] = 1, else β[n] = 0). For any would-be enumeration R, this gives a
systematic method of constructing a number β whose nth digit β[n] must in every case
be different from the nth digit of rn . Thus β cannot be identical with any number in the
list, contradicting our assumption that R was a complete enumeration, and it follows that
no such enumeration is possible.
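The diagonal construction can be illustrated with a finite fragment of any purported enumeration: given the first n digits of the first n listed numbers, the flipping rule produces a number that differs from each listed number in the relevant digit. The particular digit strings below are arbitrary.

```python
# A finite illustration of Cantor's diagonal construction described above.
def diagonal_counterexample(listed_digits):
    """listed_digits[n] is the digit string of the n-th listed real (0.d1d2d3...)."""
    beta = []
    for n, digits in enumerate(listed_digits):
        alpha_n = digits[n]                             # the n-th digit of the n-th number
        beta.append('1' if alpha_n == '0' else '0')     # the flipping rule from the text
    return '0.' + ''.join(beta)

R = ['14159265', '33333333', '50000000', '71828182',
     '12345678', '99999999', '27182818', '86421357']
print(diagonal_counterexample(R))   # differs from the k-th listed number in digit k
```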
The second essential component in the background of Turing’s paper is David
Hilbert’s decision problem or Entscheidungsproblem: can a precise general procedure be
devised which is able, in finite time and using finite resources, to establish whether any
given formula of first-order predicate logic is provable or not? A major goal of Hilbert’s
influential programme in the philosophy of mathematics was to show that such decid-
ability was achievable, and his 1928 book with Wilhelm Ackermann even declared that
the Entscheidungsproblem should be considered the main problem of mathematical logic
(p. 77). It accordingly featured prominently in Max Newman’s Cambridge University
course on the Foundations of Mathematics, attended by Alan Turing in spring 1935. But
by then Gödel’s incompleteness theorems of 1931—also covered in Newman’s course—
had shown that two other major goals of Hilbert’s programme (proofs of consistency and
completeness) could not both be achieved, and Turing’s great paper of 1936 would show
that decidability also was unachievable.
A potentially crucial requirement in tackling the Entscheidungsproblem—especially if
a negative answer is to be given to the question of decidability—is to pin down exactly
what types of operation are permitted within the would-be general decision procedure.

1 For example, 1/1 (sum 2); 1/2, 2/1 (sum 3); 1/3, 2/2, 3/1 (sum 4); 1/4, 2/3, 3/2, 4/1 (sum 5); and so
on—every possible fraction of positive integers will appear somewhere in this list. To include negative fractions
of integers, we could simply insert each negative value immediately after its positive twin.
2 Or alternatively binimals, as discussed below.
For without some such circumscription of what is permissible (e.g. ruling out appeal to
an all-knowing oracle or deity, if such were to exist), it is hard to see much prospect
of delimiting the range of what can theoretically be achieved. An appropriate limit
should clearly prohibit operations that rely on unexplained magic, inspiration, or external
intervention, and should include only those that are precisely specifiable, performable by
rigorously following specific instructions in a “mechanical” manner, and reliably yielding
the same result given the same inputs. A procedure defined in these terms is commonly
called an effective method, and results thus achievable are called effectively computable.
These concepts remain so far rather vague and intuitive, but what Turing does with the
“computing machines” that he introduces in §1 of his 1936 paper is to define a precise
concept of effective computability in terms of what can be achieved by a specific kind of
machine whose behaviour is explicitly and completely determined by a lookup table of
conditions and actions. Different tables give rise to different behaviour, but the scope
of possible conditions and actions is circumscribed precisely by the limits that Turing
lays down.

2.2 Introducing Turing Machines


As its title suggests, the 1936 paper starts from the concept of a computable number:

“The ‘computable’ numbers may be described briefly as the real numbers whose
expressions as a decimal are calculable by finite means.” (p. 58)

Turing’s terminology is potentially confusing here, in view of what follows. The numeri-
cal expressions he will actually be concerned with express real numbers between 0 and 1,
interpreted in binary rather than decimal (i.e. sequences of “0” and “1”, following an
implicit binary point, as opposed to sequences of decimal digits following a decimal
point). To provide a distinctive term for such binary fractions, let us call them binimals.
For example, the first specific example that Turing gives (§3, p. 61)3 generates the infinite
sequence of binary digits:

0 1 0 1 0 1 0 1 0 1 ...

which is to be understood as expressing the recurring binimal fraction 0.0101…, numerically
equivalent to 1/3.⁴ That the binimal recurs to infinity is no difficulty: on the contrary,

3 In what follows, references of this form, citing section and page numbers, are always either to the 1936
paper or—later—to the 1950 paper. Note also that for convenience, all page references to Turing’s publications
are to the relevant reprint in Copeland (2004).
4 The “1” in the second binimal place represents 1/4, and the value of each subsequent “1” is 1/4 of the
previous one. So we have a geometric series 1/4 + 1/16 + 1/64 + . . . whose first term is 1/4 and common ratio
1/4, yielding a sum of 1/3 by the familiar formula a/(1-r).
all of Turing’s binimal expressions will continue to infinity, whether recurring (as in the
binimal for 1/2, i.e. 0.1000 . . . ) or not (as in the binimal for π /4).5
That Turing’s binimal expressions continue to infinity may well seem surprising, how-
ever, given his declared aim to explore those that are computable, glossed as “calculable
by finite means”. At the very beginning of §1 of his paper he acknowledges that the term
“requires rather more explicit definition”, referring forward to §9 and then commenting
“For the present I shall only say that the justification lies in the fact that the human
memory is necessarily limited.” (p. 59). In the next paragraph he compares “a man in
the process of computing a real number to a machine which is only capable of a finite
number of conditions”. So the finitude he intends in the notion of computable numbers
does not apply to the ultimate extent of the written binimal number, nor therefore to
the potentially infinite tape—divided into an endless horizontal sequence of squares—on
which the digits of that number (as well as intermediate workings) are to be printed.
Rather, what is crucial to Turing’s concept of computability “by finite means” is that
the choice of behaviour at each stage of computation is tightly defined by a finite set
of machine memory states,6 a finite set of symbol types, and a limited range of resulting
actions. Each possible combination of state and symbol—the latter being read from the
particular square on the tape that is currently being scanned—is assigned a specific
repertoire of actions. Computation takes place through a repeated sequence of scanning
the current square on the tape, identifying any symbol (at most one per square) that it
contains, then performing the action(s) assigned to the relevant combination of current
state and symbol.
For theoretical purposes, the actions for each state/symbol combination are very
tightly constrained, limited to printing a symbol (or blank) on the current square, moving
the scanner one square left or right, and changing the current state. For much of his
paper, however, Turing slightly relaxes these constraints, allowing multiple printings and
movements, as in the example illustrated below. This shows the machine defined by a
table specified in §3 of the 1936 paper (p. 62), running within a Turing machine simulator
program.7 Starting with an empty tape in state 1 (or “b” in Turing’s paper), the machine
prints the sequence of symbols we see at the left of the tape (“P@” prints “@”; “R” moves
right; “P0” prints “0”), then moves left twice (“L,L”) to return to the square containing
the first “0”, before transitioning into state 2. Next, finding itself now scanning an “0” in
state 2, it transitions into state 3. Next, still scanning an “0” but now in state 3, it moves
right twice and stays in state 3. Then, scanning a blank square (“None”) in state 3, it
prints a “1” and moves left, transitioning into state 4. And so on.
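For readers who want to see such a table in executable form, here is a minimal sketch of a Turing-machine interpreter of the kind just described. It runs Turing's simpler first example (the machine that prints 0 and 1 on alternate squares, yielding the 0 1 0 1 . . . sequence quoted earlier), not the more elaborate machine in the illustration; the Python encoding of the table and operations is my own, not Turing's notation.

```python
# A minimal Turing-machine interpreter running Turing's first example machine.
# table[(state, scanned_symbol)] = (operations, next_state);
# 'Px' prints symbol x, 'R'/'L' move the scanner right/left, 'E' erases.
from collections import defaultdict

TABLE = {
    ('b', None): (['P0', 'R'], 'c'),
    ('c', None): (['R'], 'e'),
    ('e', None): (['P1', 'R'], 'f'),
    ('f', None): (['R'], 'b'),
}

def run(table, start_state='b', steps=20):
    tape = defaultdict(lambda: None)   # an unbounded tape of squares
    head, state = 0, start_state
    for _ in range(steps):
        ops, next_state = table[(state, tape[head])]
        for op in ops:
            if op == 'R':
                head += 1
            elif op == 'L':
                head -= 1
            elif op == 'E':
                tape[head] = None
            elif op.startswith('P'):
                tape[head] = op[1]
        state = next_state
    written = [tape[i] for i in range(0, max(tape) + 1)]
    return ''.join(s if s is not None else '_' for s in written)

print(run(TABLE))   # -> 0_1_0_1_... : digits appear on alternating squares
```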

5 Note that a decimal or binimal will recur if, and only if, it represents a rational number, i.e. a fraction of
integers.
6 Turing’s term for what is now generally called a state is an “m-configuration” (§1, p. 59).
7 By far the best way of familiarising oneself with the operation of Turing machines is to see them in action.
The Turing machine simulator illustrated here is one of the example programs built into the Turtle System,
freely downloadable from www.turtle.ox.ac.uk. Within the simulator program, this particular machine table is
available from the initial menu, which also includes some other relevant examples taken from Petzold’s excellent
book on the 1936 paper (2008).
[Figure: A Turing Machine in Action]

Turing’s machine table has been cleverly designed to print out an infinite binimal
sequence, which has the interesting property of never recurring:

0 0 1 0 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 ...

Adding an initial binimal point, this yields a binimal number which is clearly irrational
(precisely because it never recurs), but which has been shown to be computable in the
sense that there exists a Turing machine which generates it, digit by digit, indefinitely.8
This notion is trivially extendable to binimals that also have digits before the binimal
point. And in this extended sense it is clear—given that numerical division can be
mechanised on a Turing machine—that all rational numbers are computable, so it now
follows that the computable numbers are strictly more extensive than the rational numbers.
Indeed, Turing will go on later, in §10 of the paper, to explain that all of the familiar
irrational numbers, such as π, e, and √2, and indeed all real roots of rational algebraic
equations, are computable.

8 To avoid reference to the infinite sequence of digits, we can formally define a binimal number as computable
if and only if there exists a Turing machine such that, for all positive integers n, there is some point during the
execution of that machine at which the tape will display the subsequence consisting of its first n digits.
Another random document with
no related content on Scribd:
formerly head of a department in the Ministry for Foreign Affairs in
Paris. There could be no mistake about the nature of the instructions
with which this personage was provided. If the condition of the
commercial relations between the two states was the official pretext
for his embassy, an investigation into the affairs of the émigrés was
its real object. The Senate of the town were quick to realize this.
However, Reinhard’s conciliatory bearing and his expressed dislike
for the police duties imposed upon him by the Directoire prevented
his mission from having too uncompromising an aspect. He could not
shut his eyes, of course, to what was going on, and, in spite of his
repugnance to such methods, he was forced to employ some of the
tale-bearers and spies always numerous among the émigrés. In a
short period a complete system of espionage was organized. It did
not attain to the state of perfection secured by Bourienne later under
the direction of Fouché, but its existence was enough to enhance the
uneasiness of the Hamburg Senate. Their refusal to acquiesce in
certain steps taken by the Directoire forced Reinhard to quit the town
previously, in the month of February, and to take up his abode at
Bremen, afterwards at Altona. This suburb of Hamburg, separated
from it only by an arm of the river, was yet outside the limits of the
little republic, and suited his purpose excellently as a place from
which to conduct his observations. Everything that went on in
Hamburg was known there within a few hours.
It was at this period that Reinhard received a visit from a
somewhat sinister individual, named Colleville, who came to offer his
services to the Directoire. He volunteered to keep Reinhard informed
as to the doings of the émigrés, to whom he had easy access. On
March 5, 1796, he turned up with a lengthy document containing a
wealth of particulars regarding one of the principal agents of the
princes—no other than our friend d’Auerweck, for the moment a long
way from Hamburg, but soon expected back. “He is one of the best-
informed men to be met anywhere,” Colleville reports. “He has
travelled a great deal, and is au courant with the feeling of the
various courts and ministers.”
It must be admitted that the spy was well informed as to the
character and record of the “little Baron.” D’Auerweck would seem in
intimate relations with a certain Pictet, “Windham’s man.” Through
him he was in correspondence with Verona. He was known to be the
“friend of the Baron de Wimpfen and of a M. de Saint-Croix, formerly
Lieutenant-General in the Bayeux district. In his report upon
d’Auerweck, Colleville had occasion inevitably to mention his friend
Cormier. He stated, in fact, that at the moment d’Auerweck was
located at Mme. Cormier’s house in Paris in the Rue Basse-du-
Rempart.
Colleville could not have begun his work better. D’Auerweck was
not unknown to Reinhard, who, five months before, in a letter to
Delacroix, the Minister for Foreign Affairs, had mentioned the fact of
his presence in London, “where he was in frequent touch with du
Moustier and the former minister Montciel.”
By a curious coincidence, on the same day that Reinhard got his
information, the Minister of Police in Paris, the Citoyen Cochon, had
been made aware that a congress of émigrés was shortly to be held
at Hamburg. The agent who sent him this announcement drew his
attention at the same time to the presence at Hamburg of a person
named Cormier.
“It should be possible to find out through him the names of
those who will be taking part in the Congress. He is a
magistrate of Rennes who has been continually mixed up in
intrigue. His wife has remained in Paris.... The
correspondence of this Cormier ought to be amusing, for he is
daring and has esprit.”
Reference is made in the same communication to “the baron
Varweck, a Hungarian, passing himself off as an American, living in
Paris for the past five months.”
This was enough to arouse the attention of the Directoire. The
persistence with which the two names reappeared proved that their
efforts had not slackened. By force of what circumstances had they
been drawn into the great intrigue against the Revolutionary Party? It
is difficult to say. For some months past Cormier’s letters to Lady
Atkyns had been gradually becoming fewer, at last to cease
altogether. Having lost all hope in regard to the affair of the Temple,
the ex-magistrate, placing trust in the general belief as to what had
happened, came to the conclusion that it was vain to attempt to
penetrate further into the mystery, and he decided to place his
services at the disposal of the Princes.
The Minister for Foreign Affairs lost no time about sending
instructions to Reinhard, charging him to keep a sharp watch on the
meeting of the émigrés and to learn the outcome of their infamous
manœuvres. He should get Colleville, moreover, to establish
relations with Cormier, “that very adroit and clever individual.” In the
course of a few days Reinhard felt in a position to pull the strings of
his system of espionage.
Two very different parties were formed among the émigrés at
Hamburg. That of the “Old Royalists,” or of the “ancien régime,”
would hear of nothing but the restoration of the ancient monarchy;
that of the “new régime” felt that it was necessary, in order to
reinstate the monarchy, to make concessions to Republican ideas.
Cormier would seem to have belonged to the former, of which he
was the only enterprising member. His brother-in-law, Butler, kept on
the move between Paris and Boulogne, and Calais and Dunkirk, with
letters and supplies of money from England. D’Auerweck had left
Paris now and was in England, eager to join Cormier at Hamburg,
but prevented by illness.
Cormier was now in open correspondence with the King, to whom
he had proposed the publication of a gazette in the Royalist interest.
He was in frequent communication, too, with the Baron de Roll, the
Marquis de Nesle, Rivarol, and the Abbé Louis, and all the
“monarchical fanatics.” Despite his age, in short, he was becoming
more active and enterprising than ever. Too clever not to perceive
that he was being specially watched, he was not long in getting the
spy into his own service by means of bribes, and making him
collaborate in the hoodwinking of the Minister. The report that had
got about concerning his actions, however, disquieted the Princes,
and at the end of June Cormier is said to have received a letter from
the Comte d’Artois forbidding him “to have anything more to say to
his affairs,” and reproaching him in very sharp terms. At the same
period, Butler, to whose ears the same report had found its way,
wrote to rebuke him severely for his indiscretion, and broke off all
communication with him. Meanwhile, he was in pecuniary difficulties,
and borrowing money from any one who would lend, so altogether
his position was becoming critical. Soon he would have to find a
refuge elsewhere.
When, in the autumn, Baron d’Auerweck managed to get to
Hamburg, he found his old friend in a state of great discouragement,
and with but one idea in his head—that of getting back somehow to
Paris and living the rest of his days there in obscurity.
The arrival of the “18th Brumaire” and the establishment of the
Consulate facilitated, probably, the realization of this desire. There is
no record of how he brought his sojourn at Hamburg to an end.
D’Auerweck we find offering his services to Reinhard, who formed a
high estimate of his talents. His offer, however, was not entertained.
At this point the “little Baron” also disappears for a time from our
sight.
It is about this period that Cormier and d’Auerweck fall definitively
apart, never again to cross each other’s path.
Reassured by the calm that began to reign now in Paris, and by
the fact that other émigrés who had returned to the capital were
being left unmolested, Cormier made his way back furtively one day
to the Rue Basse-du-Rempart, where the Citoyenne Butler still
resided. The former president of the Massiac Club returned to his
ancient haunts a broken-down old man. Like so many others, he
found it difficult to recognize the Paris he now saw, transformed as it
was, and turned inside out by the Revolution. Wherever he turned,
his ears were met with the sound of one name—Bonaparte, the First
Consul. What did it all matter to him? His return had but one object,
that of re-establishing his health and letting his prolonged absence
sink into oblivion. The continual travelling and his ups and downs in
foreign countries had brought him new maladies in addition to his old
enemy the gout. He had lost half his fortune, through the pillaging of
his estates in San Domingo. Thus, such of his acquaintances as had
known him in the old days, seeing him now on his return,
sympathized with him in his misfortunes and infirmities.
He seemed warranted, therefore, in counting upon security in
Paris. The one thing that threatened him was that unfortunate entry
in the list of émigrés, in which his name figured with that of his son.
In the hope of getting the names erased, he set out one day early in
November, 1800, for the offices of the Prefecture of the Seine. There
he took the oath of fidelity to the Constitution. It was a step towards
getting the names definitely erased. His long stay in Hamburg was a
serious obstacle in the way, but both he and his son looked forward
confidently now to the success of their efforts.
Suddenly, on August 21, 1801, a number of police officials made
their appearance at Cormier’s abode to arrest him by order of the
Minister of Police. His first feeling was one of stupefaction. With what
was he charged? Had they got wind of his doings in England? Had
some indiscretion betrayed him? He recovered himself, however,
and led his visitors into all the various apartments, they taking
possession of all the papers discovered, and sealing up the glass
door leading into Achille’s bedroom, he being absent at the time.
This investigation over, Cormier and the officials proceeded to the
Temple, and a few hours later he found himself imprisoned in the
Tower.
What thoughts must have passed through his mind as he
traversed successively those courts and alleys, and then mounted
the steps of the narrow stairway leading to the upper storeys of the
dungeon!
In the anguish of his position had he room in his mind for thoughts
of those days in London when the name of the grim edifice was so
often on his lips?
Three days passed before he could learn any clue as to the cause
of his arrest. At last, on August 24, he was ordered to appear before
a police magistrate to undergo his trial. An account of this trial, or
interrogatoire, is in existence, and most curious it is to note the way
in which it was conducted. The warrant for his arrest recorded that
he was accused “of conspiracy, and of being in the pay of the
foreigner.” These terms suggested that Cormier’s residence in
England, or at least in Hamburg, was known to his accusers. Had
not the Minister of Police in one of his portfolios a dossier of some
importance, full of all kinds of particulars calculated to “do” for him?
Strange to relate, there is to be found no allusion to this doubtful past
of his in the examination.
After the usual inquiries as to name, age, and dwelling-place, the
magistrate proceeds—
“What is your occupation?”
“I have none except trying to get rid of gout and gravel.”
“Have you not been away from France during the
Revolution?”
“I have served in the war in La Vendée against the Republic
from the beginning down to the capitulation. You will find my
deed of amnesty among my papers.”
“What was your grade?”
“I was entrusted with correspondence.”
“With whom did you correspond abroad?”
“With the different agents of the Prince—the Bishop of
Arras, the Duc d’Harcourt, Gombrieul, etc.”
“Did you not keep up this correspondence after you were
amnested?”
“I gave up all the correspondence eight months before
peace was declared.”
“Do you recognize this sealed cardboard box?”
“Yes, citoyen.”
And that was all! Cormier’s replies, however, so innocent on the
surface, seem to have evoked suspicion, for on August 30 (12
Fructidor) he was brought up again for a second examination.
“With whom did you correspond especially in the West?” he
was asked.
“With Scépeaux, d’Antichamp, Boigny, and Brulefort.”
“And now what correspondence have you kept in this
country?”
“None whatever.”
“What are your relations with the Citoyen Butler?”
“I have had no communication these last two years, though
he is my brother-in-law.”
“Where is he now?”
“I have no idea. I know he passed through Philadelphia on
his way to San Domingo. I don’t know whether he ever got
there or whether he returned.”
“When was he at Philadelphia?”
“He must have been there or somewhere in the United
States not more than two years ago.”
Thus no effort was made in the second inquiry any more than in
the first to search into his past. It should be mentioned that
immediately on his return Cormier had made haste to destroy all
documents that could compromise him in any way.
After a detention of three weeks he was set free, his age and
infirmities doubtless having won him some sympathy. He and his son
—for Achille had been arrested at the same time—were, however,
not accorded complete liberty, being placed en surveillance, and
obliged to live outside Paris. On September 20 they were provided
with a passport taking them to Etampes, whence they were not to
move away without permission from the police.
At this period Fouché had immense powers, and was organizing
and regulating the enormous administrative machine which
developed under his rule into the Ministry of Police. The prisons
overflowed with men under arrest who had never appeared before
the ordinary tribunal, “on account of the danger there was of their
being acquitted in the absence of legal evidence against them.” He
was reduced to keeping the rest under what was styled “une demi-
surveillance.” His army of spies and secret agents enabled him to
keep au courant with their every step.
The reports furnished as to Cormier’s behaviour seem to have
satisfied the authorities, for at the end of a certain time he was
enabled to return to Paris. Having learnt by experience how
unsatisfactory it was to be continually at the mercy of informers, he
now set himself energetically to trying to secure a regular and
complete amnesty. His petition was addressed to the First Consul on
June 18, 1803, and in it he described himself as “crippled with
infirmities,” and it was covered with marginal notes strongly
recommending him to the mercy of the chief of the state.
At last, on October 10, the Minister of Police acceded to his
request, and Cormier received a certificate of amnesty, freeing him
henceforth from all prosecution “on the score of emigration.” With
what a sense of relief must not this document have been welcomed
in the Rue Basse-du-Rempart! Bent under the weight of his
sufferings, Cormier enjoyed the most devoted care at the hands of
his family. His younger son, Patrice, had returned to Paris after an
existence not less adventurous than his father’s. He had thrown
himself into the insurrection in La Vendée, and for three years had
served in the Royalist army of the Maine. Benefiting, like his father,
by the general amnesty, he found his way back to the paternal roof in
Paris, and went into business, so as to throw a veil over his past,
until the day should come when he might appear in uniform again.
Achille, the elder, devoted himself entirely to his father, but the old
man was not to enjoy much longer the peace he had at length
secured for himself. The loss of almost all his income forced him,
moreover, to quit his residence in the Rue Basse-du-Rempart, and to
betake himself to a modest pension in the Faubourg Saint Antoine,
in which he occupied a single room, where he kept only a few
items from the furniture of his old home—some rose-wood chairs, a
writing-table, a desk with a marble top, a prie-dieu, and a small
wooden desk, “dit à la Tronchin.” The rest of his furniture he sold. It
was in this humble lodging that he died on April 16, 1805, aged sixty-
five. Some months later Mme. Cormier died at the house in Rue
Basse-du-Rempart.
It is strange to reflect that Lady Atkyns, in the course of her many
visits to Paris, should never have sought to meet her old
friend again. The Emperor’s rule was gaining in strength from day to day.
Of those who had played notable parts in the Revolution, some, won
over to the new Government, were doing their utmost to merit by
their zeal the confidence reposed in them; the others, irreconcilable,
but crushed by the remorseless watchfulness of a police force
unparalleled in its powers, lived on forgotten, and afraid to take any
step that might attract attention to them. This, perhaps, is the
explanation of the silence of the various actors in the drama of the
Temple, once the Empire had been established.
CHAPTER VII
THE “LITTLE BARON”

Cormier’s departure did not for a single moment interrupt the fiery
activity of Baron d’Auerweck, nor his co-operation in the most
audacious enterprises of the agents of the Princes and of the Princes
themselves. He lost, it is true, a mentor whose advice was always
worthy of attention, and who had guided him up to that time
with a certain amount of success; but the ingenious fellow was by no
means at the end of his resources. The life which he had led for the
past five years was one which exactly suited him. A practically
never-ending list might be drawn up of acquaintances made in the
course of his continual comings and goings, of encounters in this
army of emissaries serving the counter-Revolution, and of
particularly prosperous seasons. Besides the d’Antraigues, the
Fauche-Borels, and the Dutheils, there was a regular army of
subordinates, bustling about Europe as though it were a vast anthill.
Amongst them d’Auerweck could not fail to be prominent, and he
was soon marked as a clever and resourceful agent. His sojourn in
Hamburg also continued to arouse curiosity and observation on the
part of the representatives of the Directoire. They recognized now
that he was employed and paid by England. “He serves her with an
activity worthy of the Republican Government,” Reinhard wrote to
Talleyrand; and it was well known that Peltier’s former collaborator,
always an energetic journalist, assisted in editing the Spectateur du
Nord.
An unlooked-for opportunity to exploit his talent soon offered itself
to d’Auerweck.
The Deputies of the ten states, which at that time formed the
Empire, had been brought together by the congress which opened at
Rastadt on December 9, 1797, and for eighteen months there was
an extraordinary number of visits to and departures from the little
town in Baden. The presence of Bonaparte, who had arrived some
days before the commencement of the conference in an eight-horse
coach, with a magnificent escort, and who had been welcomed throughout his
journey as the victor of Arcole, increased the solemnity and scope of
the negotiations. All the diplomatists, with their advisers, their
secretaries, and their clerks, crowded anxiously round him. Agents
from all the European Powers came to pick up greedily any scraps of
information, and to try to worm out any secrets that might exist.
Rastadt was full to overflowing of spies and plotters, and the name of
this quiet, peaceful city, hitherto so undisturbed, was in every one’s
mouth.
From Hamburg the “little Baron” followed attentively the first
proceedings of the Congress through the medium of the
newspapers, but the sedentary life which he was leading began to
worry him. In vain he wrote out all day long never-ending political
treatises, crammed with learned notes on the European situation,
wove the most fantastic systems, and drew up “a plan for the
partition of France, which he proposed to a certain M. du Nicolay;” all
this was not sufficient for him. D’Auerweck was on friendly terms with
the Secretary of the French Legation, Lemaître by name (who, by
the way, had no scruples about spying on him some years later, and
informed against him without a blush), and, giving full play “to his
romantic imagination and to his taste for sensational enterprises,” he
one day submitted to his confidant a scheme to “kidnap the Minister,
Reinhard, and carry him off to London; his attendants were to be
made intoxicated, his coachman to be bribed, ten English sailors to
be hidden on the banks of the Elbe!” At the back of these schemes
of mystery there figured a certain “Swiss and Genevan Agency,”
which at the proper time would, he declared, generously reimburse
them for all their expenditure. But, for all these foolish imaginings,
d’Auerweck displayed a knowledge of the world and a sound
judgment which struck all those who came in contact with him, and it
was certain that with strong and firm guidance he was capable of
doing much good and useful work. In the winter of 1798 we are told
that “he left Hamburg secretly” for an unknown destination. Lemaître
believed that he had buried himself “in the depths of Silesia,” but he
had no real knowledge of his man. For, as a matter of course,
d’Auerweck was bound to be attracted to such a centre of affairs as
Rastadt then was, in order to make the most profitable use of his
ingenuity, seeing that, according to report, the British Government,
which was making use of his services, in fear of being kept in the
dark as to what was going on, had begged him “to go and exercise
his wits in another place.”
He made Baden his headquarters; the proximity of Rastadt,
and his intimacy with the de Gelb family, which has already been
mentioned, led him to prefer Baden to the actual field of battle, where
he was bound to come under suspicion as an old English
agent. One of the Austrian envoys at the Congress was Count
Lehrbach; and d’Auerweck managed to get into relations with him,
and even to be allowed to do secretarial work for him, on the
strength of the connection which he declared he had possessed with
Minister Thugut during the early days of the Revolution and the
confidence which had lately been reposed in him. He had reason to
hope that with the help of his ability and his gift of languages he
would soon be able to secure active employment. And, indeed, it
was in this way that d’Auerweck succeeded in re-establishing himself
at once, to his great satisfaction, as an active agent, with a footing in
the highest places, ferreting out the secrets of the Ambassadors, and
carrying on an underhand correspondence openly. His intention was,
doubtless, to return to Austria as soon as the Congress was over, by
the help of Count Lehrbach, and there to regain the goodwill of his
former patron, the Minister Thugut.
But the sanguinary drama which brought the Conference to such
an abrupt conclusion completely spoiled his plans and undid his
most brilliant combinations. We can realize the universal feeling of
consternation throughout the whole of Europe which was caused by
the news that on the evening of April 28, 1799, the French Ministers,
Bonnier, Roberjot, and Debry, who had just made up their minds to
betake themselves to Strasburg, along with their families, their
servants, and their records—a party filling eight carriages—had been
openly attacked by Barbaczy’s Hussars as they were leaving
Rastadt; that the two first-mentioned gentlemen had been dragged
from their carriages and treacherously murdered, and that the third,
Debry, had alone escaped by a miracle. Even if the outrage of
Rastadt was “neither the cause nor the pretext of the war of 1799,”
its consequences were, nevertheless, very serious.
One of these consequences, and not the least important, was that
Bonaparte’s police, magnificently reorganized by Fouché, redoubled
its shepherding of émigrés and agents of the Princes, who swarmed
in the country-side between Basle, the general headquarters of the
spies, and Mayence. Once an arrest took place, the accused was
certain to be suspected of having had a hand in the assassination of
the plenipotentiaries, and if by any bad luck he was unable to deny
having been present in the district, he found it a very difficult task to
escape from the serious results of this accusation.
A few months after this stirring event, Baron d’Auerweck, tired of
such a stormy existence, and seeing, perhaps, a shadow of the
sword of Damocles hanging over his head, determined to
break away from this life of agitation, and to settle down with a wife.
During the last days of the year 1799 he was married at Baden to
Mademoiselle Fanny de Gelb, a native of Strasburg, whose father
had lately served under Condé; she also had a brother who was an
officer in the army of the Princes. But, in spite of a pension which the
mother, Madame de Gelb, was paid by England on account of her
dead husband’s services, the available resources of the future
establishment were very meagre indeed, for the “little Baron” had not
learned to practise economy while rushing about Europe; so, as
soon as the marriage had been celebrated, the turn of the wheel of
fortune forced the young couple to leave the Grand Duchy of Baden
and to wander from town to town in Germany and Austria.
They travelled first to Munich and then to Nuremburg, but
d’Auerweck’s plans were to establish himself in Austria close to all
his belongings. He had the fond hope of obtaining employment from
Minister Thugut, to whom he reintroduced himself. But he
experienced a bitter disappointment, for on his first attempt to submit
to his Excellency the greater part of his last work (in which he had
embodied, as the result of desperate toil, his views on the present
political situation, the outcome of his conversations with the
representatives of the different European states, his reflections and
his forecast of events) d’Auerweck found himself unceremoniously
dismissed. Thugut flatly refused, if the story is to be believed, to
have anything further to do with a man who was still suspected of
being an English emissary. Consequently he was obliged to abandon
his idea of establishing himself in Austria, and to hunt for other
means of existence, more particularly as Madame d’Auerweck had
just presented him with his first child at Nuremburg. He turned his
steps once again in the direction of the Grand Duchy, and after
successive visits to Friburg, Basle, and Baden, he decided to make
his home in Schutterwald, a village on the outskirts of the town of
Offenburg. There he determined to lead the life of a simple, honest
citizen, and renting a very humble peasant’s cottage, he installed his
wife and his mother-in-law therein. He himself set to work on the
cultivation of his garden, devoting his spare moments, so as not to
lose the knack, to writing the sequel to his Philosophical and
Historical Reflections.
He soon got to know his neighbours and all the inhabitants of the
country very well. He was considered to be a quiet, unenterprising
man, “with a positive dislike for politics, although loquacious and
vain.” It was impossible to find out anything about his past life, for the
prudent Baron considered it inadvisable to talk of this subject, but he
was always looked upon “as an argumentative man, who wanted to
know all that was going on, whether in reference to agriculture, to
thrift, or to politics.” In spite of the apparent tranquillity in which he
was allowed to remain, d’Auerweck followed with a certain amount of
anxiety all the events which were happening not far from him, on the
frontier of the Rhine. Troops were continually passing to and fro in
this district; the French were close at hand, and their arrival at
Offenburg inspired a feeling of vague unrest in him, although he
never recognized, to tell the truth, the danger which threatened him.
He had taken the precaution to destroy, before coming to
Switzerland, his vast collection of papers: all that mass of
correspondence which had been accumulating for the last few years,
those reports and instructions, all of which constituted a very
compromising record. At last, after a residence of some months, to
make matters safe, he contrived, thanks to his marriage, to be
enrolled as a freeman of the Grand Duchy; for it seemed to him that
as a subject of Baden he would be relieved of all further cause of
alarm.
But all d’Auerweck’s fears were reawakened by the much-talked-of
news of the Duc d’Enghien’s arrest on March 15, 1804, and by the
details of how the Prince had been captured openly in the jurisdiction
of Baden, at Ettenheim, that is to say, only a short distance from
Offenburg. He absented himself for some days from Schutterwald,
so the story goes, and took himself to the mountains.
Just at the same time there arrived at the offices of the Ministry of
Police in Paris a succession of memoranda, mostly anonymous,
referring to Baron d’Auerweck, and to his presence in the
neighbourhood of the Rhine.
Some of them came from Lemaître, Reinhard’s former secretary at
Hamburg. Many of them, inexact and inaccurate as they were with
regard to the details of the alleged facts, agreed on this point, viz.
that the individual “was one of those men, who are so powerful for
good or bad, that the security of every Government requires
complete information as to their resting-places and their doings.”
Then followed a medley of gossiping insinuations, the precise import
of which it was difficult to discover.
“I shall never forget,” said one, “that, when d’Auerweck left
Hamburg two months before the assassination of the French
Ministers in order to take up his quarters only three leagues
away from Rastadt, he said: ‘I am about to undertake an
operation which will make a great sensation, and which will
render great service to the cause of the Coalition.’”
“Now supervenes a whole year, during which his doings
and his whereabouts are most carefully concealed,” wrote
another; “however, I am certain that he is acting and working
pertinaciously against the interests of France. I have heard
him make this remark: ‘We shall take some time doing it, but
at last we shall conquer you.’” A third added: “His tranquillity
and his silence are but masks for his activity, and I, for one,
could never be persuaded that he has all of a sudden ceased
to correspond with Lord Grenville in London, with the Count
de Romanzof, with a certain Nicolai in St. Petersburg, with
Prince Belmonte, with the Chevalier de Saint-Andre, with
Roger de Damas in Italy, with Dumoustier, who is, I believe, a
Hohenlohe Prince in Berlin, and directly with the Count de
Lille.” Finally, d’Auerweck, according to the same report,
“complaisantly displayed a spot in the shape of the fleur-de-
lys, inside his fist, declaring that ‘this is a sign of descent; it is
a mark of predestination; I of all men am bound to devote
myself and assist in the return of the Bourbons!’”
It would be a fatal mistake to believe that these fairy tales, all
vague and absurd as they often were, remained lost and forgotten in
the despatch-boxes of the Ministry of Police. The region, near as it
was to Rastadt, where d’Auerweck was reported to have made his
appearance, was a valuable and important indication, which of itself
was sufficient to make the man an object for watchful suspicion. The
ominous nature of the times must, of course, be remembered.
Fouché, who had just been restored to favour, and had been placed
for the second time at the head of the Ministry of Police, was anxious
to prove his zeal afresh, to please the Emperor and to deserve his
confidence, while his mind was still troubled by the execution of the
Duc d’Enghien, by the exploits of Georges Cadoudal, and by the
discovery of the English Agency at Bordeaux, which were all fitting
reasons for attracting the Minister’s attention and for exciting his
curiosity. So, when on October 11, 1804, his Excellency decided to
make further searching investigations into d’Auerweck’s case, and
gave precise orders to the prefects of the frontier departments of the
Grand Duchy, it is doubtful whether he was careful to note in his
charge the principal reasons for attaching suspicion to the Baron,
viz. those which had to do with the assassination of the
plenipotentiaries at Rastadt.
It was some time before the required information could be
obtained, and though the first inquiries about d’Auerweck made by
Desportes, the prefect of the Upper Rhine, added little in the way of
news, they agreed, nevertheless, in certifying that the Baron lived
very quietly in the outskirts of Offenburg—
“that he there devoted himself entirely to his agricultural
occupations, and that the kind of life he led did not foster any
suspicion that he kept up his old campaigns of intrigue.”
Six months later, Desportes, in returning to the subject, showed
himself more positive than ever, for he affirmed—
“that no active correspondence can be traced to
d’Auerweck, and that he saw scarcely any one. He is a man
of a caustic and critical turn of mind, who often lets himself go
in conversation without reflection in his anxiety to talk
brilliantly. A point which is particularly reassuring about him is
that he is without credit, without fortune, and of no personal
account, and that if he wanted to mix himself up afresh in
intrigues he would choose some other place than Offenburg,
where there are now only three émigrés, the youngest of
whom is seventy-seven years of age.”
But, in spite of these very positive statements, the Minister
preserved his attitude of mistrust, which was strengthened by the
arrival of fresh notes, in which the same denunciations of the “little
Baron” were repeated. He was described as being “restless by
inclination, violently fanatical in all his opinions, and longing to make
himself notorious by some startling act.” But his position was made
worse by the information which was received, that in the autumn of
1805 d’Auerweck had absented himself from home for several days,
frightened, no doubt, by the proximity of the French armies, which
were dotted about on the banks of the Rhine. How could this sudden
flight be accounted for? And his alarm at the sight of the Emperor’s
soldiers at close quarters? Such conduct struck Fouché as being
very suspicious. He ordered a supplementary inquiry, and this time
he did not content himself with the information afforded by the
prefect of the Upper Rhine, but let loose one of his best bloodhounds
on the Baron’s scent. Two years earlier, when preparations were
being made for the kidnapping of the Duc d’Enghien and for
watching his residence at Ettenheim, recourse had been had to the
services of the Commissary of Police, Popp by name, who was
stationed at Strasburg. In this frontier town near Basle an active and
intelligent man was needed, who could maintain a constant watch on
the underhand practices of the Royalist agents. Commissary Popp
seemed to be made for the job. His handling of the Duc d’Enghien’s
affair had earned the approval of Napoleon; and Fouché, since his
reinstatement in the Ministry, recognized in him a clever and expert
functionary, on whom he could always count.
This was the man who was charged with the task of spying on
d’Auerweck, and throughout the whole of 1806 Popp was hard at
work on this mission. His first findings differed very little
from what Desportes had written, and there was nothing to prove
that the Baron had in any way departed from his passive attitude.
“I have not discovered,” wrote Popp on April 22, 1806, “that
he is in correspondence with the English agitation, or that he
shows any inclination to excite and embitter people’s tempers.
I believe that he, like many others, is more to be pitied than to
be feared.”
Some weeks later Popp managed to loosen the tongue of an
ecclesiastic, a dweller in those parts, and from him he got
information about the business and movements of the Baron. “He is
quite absorbed in rural economy, which is his chief thought to all
appearances,” he reported to Fouché; but then, stung to the quick by
the repeated orders of his chief (who never ceased from impressing
upon him the necessity for the closest watch on d’Auerweck’s
traffickings), Popp, impatient for an opportunity to prove his zeal,
began to embellish his reports by introducing subtle insinuations.
By this time d’Auerweck had come to the conclusion that his stay
at Schutterwald was too uncomfortable, and having heard of a bit of
land at a reasonable price in Elgersweier, which was not far from
Offenburg, indeed about the same distance from the town as Schutterwald, he made
up his mind to take shelter there and to build a little house, which
would be his own property. The question arose how he,
whom every one looked upon as a penniless man, could obtain the funds
required to complete this bargain. Without doubt he borrowed from
his mother-in-law, Madame de Gelb, who had always lived with him,
and whose modest income was so pleasantly augmented by the
pension which she received from the English Government. And so,
in the middle of the summer of 1806, the “little Baron” transported his
penates to Elgersweier, where he settled his belongings very
comfortably. By this time two other sons, Armand and Louis, had
been added to the one born at Munich, and shortly after arriving at
the new home Madame d’Auerweck gave birth to a daughter, who
was named Adelaide.
Commissary Popp knew all about these happenings, and his
supervision never slackened for an instant. Encouraged by his
success in arranging the preliminaries for the affair at Ettenheim, he
was perfectly prepared to repeat the operation. With this in view, he
began to show the Minister, in ambiguous language at first, his very
good and sufficient reasons for desiring d’Auerweck’s presence in
France. If necessary, he urged, we could easily get permission from
the Grand Duke of Baden to arrest him in his own home. This
suggestion was expressed very cautiously at first, but was soon
made more explicit, although there was not the slightest shadow of
an excuse for such violence, for all his statements “agreed in
demonstrating the perfectly peaceful nature of the ‘little Baron’s’
existence.”
“It would be advisable to make certain of his person,” wrote
Popp, “and my opinion will always be the same if certain
difficulties with the House of Austria happen to be renewed;
for d’Auerweck, posted as a sentinel on the opposite bank,
and doubtless possessing friends on our side, would be one
of the very first bearers of information about our military
position and political topography.”
About the same time, Bourrienne, one of Minister Reinhard’s
successors at Hamburg, arrested an émigré who had lately landed
from London, and who was supposed to be in possession of
important secrets. This was the Viscount de Butler, Cormier’s half-
brother, who, after having “worked,” as we have seen, for the
Royalist Committee in London, now found himself stranded in
Hamburg in the greatest misery. It was decided to send him to Paris,
as he offered to give up certain documents. He was imprisoned in
the Temple, and there questioned by Desmarets, who extracted from
him all kinds of information with regard to his missions. Naturally,
Butler related all he knew about d’Auerweck, how he had made his
acquaintance, and what sort of terms he was on with Dutheil and
with Lord Grenville. As his answers proved satisfactory he was sent
back to Hamburg, where Bourrienne continued to make use of him
for many years.
Finally, to complete the bad luck, the police were warned of a
certain Sieur de Gelb, a former officer in the army of the Princes,
whose behaviour had been discovered to be very mysterious, and
who paid frequent visits to the frontier. Now, this émigré was none other
than Baron d’Auerweck’s brother-in-law.
All these stories, cleverly made the most of and carefully improved
upon, served to greatly excite the curiosity of the Minister of Police,
all the more as the Royalists were showing much increased activity
in many places. To add to the effect, Normandy became the theatre
of several audacious surprises, such as the robbing of coaches, the
plundering of convoys, and attacks on the high road, many of which
were the handiwork of the inhabitants of the castle of Tournebut, led
by the Viscount d’Aché and the famous Chevalier. Besides, the
Emperor was waging war in Prussia at the head of his armies, a
thousand leagues from Paris, and in his absence the conspirators’
audacity redoubled; but he did not lose sight of them, and from his
distant camps he kept so closely in touch with all that was happening
in France that he compelled Fouché’s incessant vigilance. An event
which took place the following year, when war with Germany broke out
afresh, clearly demonstrated once more the danger of attracting for
too long the attention of his Excellency the Minister of Police.
One evening, in the month of June, 1807, a policeman on his
rounds noticed in one of the squares in the town of Cassel a young
man behaving very strangely, and speechifying in the middle of a
crowd. He drew near, and ascertained that the individual, who was
very excited, was pouring forth a stream of insults and threats
against Napoleon, whom he went so far as to call “a good-for-
nothing scamp.” This was quite enough to decide the representative
of public order upon arresting the silly fellow. He was taken off to the
police station and questioned. He stated that his name was Jean-
Rodolphe Bourcard, “formerly a ribbon manufacturer,” aged twenty-
