Turing’s Vision and Deep Learning

Martin Davis

Courant Institute, New York University, New York, USA


martin@eipye.com

Abstract. The success of the AlphaGo computer program in playing
world class Go is examined in connection with what Turing had envisioned
more than 70 years ago. A critical discussion is given of Searle's
comments on intelligent computer programs and of what Copeland had
to say about this.

Keywords: Turing · AlphaGo · Searle · Copeland · Convolutional neural network

An issue of the British newspaper The Guardian in October 2017 featured
an article on the achievements of DeepMind's most recent incarnation of their
Go-playing machine. Their AlphaGo program had previously learned to play
this deep ancient Chinese game well enough to defeat human grandmasters. It
accomplished this after being trained using thousands of games that had been
played between grandmasters. The newer version “AlphaGo Zero” learned by
playing thousands of games against itself, thereby steadily improving. Today it
beats all comers including the original AlphaGo. Both versions made use of multi-
layer neural networks. These are imagined as consisting of a large number of
individual elements called “neurons” interconnected so that signals output from
each became the input to others. One may imagine a tangle of interconnected
wires resembling the tangle of neural connections in a brain. However, entering
a room in which a device like AlphaGo is housed, one would see only
ordinary-looking computer equipment.
In this paper, I will discuss how Turing imagined the future of computation
at a time when the first computers were yet to be built, and how this relates to
contemporary developments.

1 Alan Turing in 1945


The Second World War was over, but the contributions Turing had made to
victory at Bletchley Park were to remain secret for a long time. But he still had
knowledge he had acquired there of the use of vacuum-tube (British: valve)
circuits in carrying out logical operations. And he understood the relevance of
the new expansive view of the nature of computation that had emerged from
his theoretical investigation of computability before the war. This led to his
ACE report outlining the construction and use of a general purpose digital
computer [13]. A comparison of this report with von Neumann's EDVAC report
is revealing of Turing's very different outlook. Whereas, along with addition and
multiplication, von Neumann would provide circuitry for arithmetic division as
well as, astonishingly, even for obtaining square roots, the ACE report suggests
performing division via software, but did specify circuitry for basic digit-by-digit
Boolean logical operations. Also, two of the ten problems Turing discusses as
possibly appropriate for the ACE, solving simple jigsaw puzzles and playing chess,
are quite remote from numerical computation.

© Springer International Publishing AG, part of Springer Nature 2018.
F. Manea et al. (Eds.): CiE 2018, LNCS 10936, pp. 146–155, 2018.
https://doi.org/10.1007/978-3-319-94418-0_15
In 1947 in concluding an address on the ACE to the London Mathematical
Society [14], Turing said:
It has been said that computing machines can only carry out the processes
that they are instructed to do. This is certainly true in the sense that if
they do something other than what they were instructed then they have
just made some mistake. It is also true that the intention in constructing
these machines in the first instance is to treat them as slaves, giving them
only jobs which have been thought out in detail, jobs such that the user
of the machine fully understands what in principle is going on all the
time. Up till the present machines have only been used in this way. But
is it necessary that they should always be used in such a manner? Let us
suppose we have set up a machine with certain initial instruction tables,1
so constructed that these tables might on occasion, if good reason arose,
modify those tables. One can imagine that after the machine had been
operating for some time, the instructions would have altered out of all
recognition, but nevertheless still be such that one would have to admit
that the machine was still doing very worthwhile calculations. Possibly it
might still be getting results of the type desired when the machine was
first set up, but in a much more efficient manner. In such a case one would
have to admit that the progress of the machine had not been foreseen
when its original instructions were put in. It would be like a pupil who
had learnt much from his master, but had added much more by his own
work. When this happens I feel that one is obliged to regard the machine
as showing intelligence. As soon as one can provide a reasonably large
memory capacity it should be possible to begin to experiment on these
lines. The memory capacity of the human brain is probably of the order
of ten thousand million binary digits. But most of this is probably used in
remembering visual impressions, and other comparatively wasteful ways.
One might reasonably hope to be able to make some real progress with a
few million digits, especially if one confined one’s investigations to some
rather limited field such as the game of chess. It would probably be quite
easy to find instruction tables which would enable the ACE to win against
an average player. Indeed Shannon of Bell Telephone laboratories tells me
that he has won games, playing by rule of thumb: the skill of his opponents
is not stated. But I would not consider such a victory very significant. What
we want is a machine that can learn from experience. The possibility of
letting the machine alter its own instructions provides the mechanism for
this, but this of course does not get us very far.

1 Turing had introduced the term instruction table for what came to be called a
computer program. As I write, one usually speaks of an app.

2 Turing and Machine Intelligence

Famously Turing proposed as a goal the ability of a machine to carry on a con-
versation with a person so effectively that it would be difficult or impossible for
the person to tell whether the conversation had been with a person or a machine
[15]. This has led to a huge discussion of “Turing’s test,” and a proliferation of
claims that various programs had “passed” the test, mainly based on conver-
sations severely restricted in subject matter. In a brief foreword to the Turing
Centenary reprint of Sara Turing’s biography of her son [18], I wrote, “The cri-
terion he chose was the ability of such a machine to carry on a conversation
that could not reasonably be distinguished from one by a person. He predicted
that this would be achieved by the end of the twentieth century, but was far
too optimistic about the task of programming computers to achieve a command
of natural language equivalent to that of every normal person”. I’m grateful to
Jack Copeland for calling my attention to the fact that Turing’s prediction of
what would be accomplished by the end of the century was far more modest:2
I believe that in about fifty years’ time it will be possible to programme
computers . . . to play the imitation game so well that an average inter-
rogator will have not more than a 70 per cent chance of making the right
identification after five minutes of questioning. . . . I believe that at the
end of the century the use of words and generally educated opinion will
have altered so much that one will be able to speak of machines thinking
without expecting to be contradicted.

I would claim that even this modest claim was too optimistic if the machine is
to exhibit anything like the verbal fluency of an adolescent or adult. Reading
what Turing had to say about teaching a “child computer” with rewards and
punishments, that is not at all surprising. Very young children seem to acquire
language with an ease that adults seeking to learn a “second” language can only
envy. Referring to my error, Copeland ([4] p. 272) refers to me as “one of Turing’s
critics”, and to what I said as “misguided criticism”. It wasn’t criticism at all, it
was an error, evidence of sloppy scholarship on my part. As Turing wisely said
at the end of his provocative essay, “We can only see a short distance ahead”; I
could never imagine criticizing Turing concerning his skill as a prophet.
I, one of Turing's critics? In 1987, Turing was still being totally ignored in
discussions of the early history of computers, discussions that revolved around the
question of whether von Neumann had improperly denied the credit for their
contribution to the "stored program concept" that the engineers Eckert and
Mauchly purportedly deserved.

2 I was embarrassed to find that I had made the same error in my book [6, 7].
Fortunately I've been able to correct it for the third edition, in press as I write.

My essay [5], published in that year, explained
the significance of Turing's theoretical investigations as providing
the conceptual basis for modern computers. David Leavitt, in his biography of
Turing [10], mentioned “von Neumann often being given credit for ideas origi-
nating with Turing”, and wrote: “Martin Davis must be credited with setting
the record straight on this account”.
Although I cannot fairly be called a critic of Turing, and although I very much
admire Copeland’s extensive and enthusiastic work in presenting and advocating
for Turing’s crucial contributions, I certainly have been a critic of Copeland. His
advocacy has at times led him to farfetched notions, as when he proclaimed that
"the search is on" for the construction of a Turing oracle that could provide
answers to uncomputable problems [1,9].
I was astonished to find that Copeland takes seriously Searle's fanciful Chinese
room as having anything significant to say about Turing’s ideas. Copeland takes
Searle seriously as “one of AI’s greatest critics”. He seems to believe that the
Chinese room provides a serious challenge with “a subtle fallacy” ([4], pp. 274–
275). Copeland responds with a clever verbal riposte such as one can hear in
the discussion following an academic philosophy seminar talk. I prefer to discuss
what Searle has to say in terms of a domain in which computers can already
perform quite creditably: chess. Fortunately, Searle has also provided us with a
chess room [12]:

Imagine that a man who does not know how to play chess is locked inside a
room, and there he is given a set of, to him, meaningless symbols. Unknown
to him, these represent positions on a chessboard. He looks up in a book
what he is supposed to do, and he passes back more meaningless symbols.
We can suppose that if the rule book, i.e., the program, is skillfully written,
he will win chess games. People outside the room will say, “This man
understands chess, and in fact he is a good chess player because he wins”.
They will be totally mistaken. The man understands nothing of chess, he
is just a computer. And the point of the parable is this: if the man does
not understand chess on the basis of running the chess-playing program,
neither does any other computer solely on that basis.

As with the Chinese room, Searle and Copeland can assure us that “the
individual steps” are “simple binary operations that a human being can easily
carry out . . . given enough time”. Because chess playing programs (and very good
ones at that) exist, one can calculate how much time is “enough time”. As I write,
I have no doubt that chess playing programs exist that are a lot better than Deep
Blue, the program that created a sensation when it defeated Kasparov. And Deep
Blue was a better player than its predecessor Deep Thought. After obtaining
some data from a member of the team that had designed Deep Thought, I
calculated that it would require an average of more than a year of the man’s
time to carry out enough of those “simple steps” for making a single move. Searle
nevertheless insists [12]:

Here is what is going on inside Deep Blue. The computer has a bunch of
meaningless symbols that the programmers use to represent the positions
of the pieces on the board. It has a bunch of equally meaningless symbols
that the programmers use to represent options for possible moves. The
computer does not know that the symbols represent chess pieces and chess
moves, because it does not know anything.
Some academic philosophers like Searle may enjoy regarding computers as know-
ing nothing, but a human chess player who has lost his queen to a fiendishly clever
trap, will retort, “Well it certainly knows how to play chess”. I have provided
the following version of Searle’s parable [6–8]:
A precocious child whose mother is passionate about chess becomes tired
of watching her play and demands that he be allowed to play her opponent.
His mother agrees on the condition that he move the pieces only when she
tells him to and exactly where she says. He does as requested and doing
what his mother whispers in his ear achieves a checkmate. Observing the
scene, Searle tells us that the child doesn’t know anything about chess,
and is certainly not playing chess. Who could disagree?
And here is my analogous parable for the famous Chinese room itself, a spy
story:
A double agent Nak Moon, who is Korean, is pretending to be fluent in
Chinese. Actually he doesn’t understand a word of it, but is an excellent
mimic. He is being questioned in Chinese by his interlocutor via a video
connection. Amy Chung, who was born and grew up in Shanghai, is in an
adjacent room. She hears everything, and can speak to Nak without his
interlocutor knowing, by means of a hidden earphone. She tells Nak what
to reply to the questions being thrown at him. The dialog ends with the
interlocutor satisfied and Nak Moon still totally ignorant of the meaning
of what he has been saying.
Searle tells us that Nak Moon knows no Chinese. Who could disagree? Copeland
concludes his discussion of Searle’s Chinese room, by writing:
Alan Turing’s test has been attacked by some of the sharpest minds in
the business. To date, however, it stands unrefuted. In fact, it is the only
viable proposal on the table for testing whether a computer is capable of
thought ([4] p. 275).
I cannot claim to share Copeland’s apparent ability to assess the relative “sharp-
ness” of the minds of those who engage in discussions regarding computer
thought. However, I will note that multi-level neural network software together
with the sheer power of contemporary hardware has enabled computer systems
to perform as well as and even better than people, in doing things that, when
people do them, certainly require thought. I don’t believe that one can have a
meaningful discussion of whether what these devices are doing constitutes think-
ing in the human sense until neuroscientists have some real understanding of what
it is that we do, at the level of individual neurons, when we think. In the mean-
time, we can safely leave discussions of whether these remarkable systems are
really thinking to those philosophers who regard this as worthwhile.

3 Neural Networks and AlphaGo


The idea of a "neural network" that sought to emulate a brain by imagining
an interconnected network of objects, each having some of the characteristics of
biological neurons, occurred to a number of researchers. The foundational article
[11] by McCulloch and Pitts was the first to elaborate a mathematical theory
of such structures. Turing himself in his 1948 memo to his boss at the National
Physical Laboratory [17],3 considered several models of such neural networks.
Other early work in this area was done by Marvin Minsky, Frank Rosenblatt,
and Michael Arbib. The neurons considered in contemporary networks have sev-
eral input channels that receive numerical signals from other neurons or from
their external environment and one output channel that transmits such signals.
Each input channel has a number associated with it called its weight; at each
stage the signal transmitted by a neuron is the weighted average of the input
values. Although weights are already present in [11], there were no weights in
Turing’s neurons.
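The neuron model just described can be sketched in a few lines of Python. This is a hedged illustration only: the function name and the example values are invented, not taken from any real library.

```python
# A minimal sketch of the neuron model described above: each input channel has
# a weight, and the neuron's output is the weighted average of its input
# signals. Illustrative only.

def neuron_output(inputs, weights):
    """Weighted average of the input signals."""
    total = sum(weights)
    assert len(inputs) == len(weights) and total != 0
    return sum(x * w for x, w in zip(inputs, weights)) / total

print(neuron_output([1.0, 0.0, 1.0], [0.5, 0.25, 0.25]))  # a weighted average
```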
Contemporary neural networks usually consist of at least three intercon-
nected layers. In a three-layer network, the neurons in the first layer receive
input signals and transmit their output to neurons in the second layer. These
neurons in turn transmit their output to the third layer which generates the out-
put signal. In order to obtain worthwhile results, at least some of the neurons
must apply a suitable non-linear activation function to the weighted average of
their input signals to produce their outputs. The choice of an appropriate activation
function in the design is crucial. The hyperbolic tangent

    tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})
is an example of an activation function with desirable properties. Learning takes
place by training the network with a collection of examples. A back propagation
algorithm is applied after each example. This algorithm is designed to reduce the
error between the desired output and the actual output. This process is what is
meant by deep learning.
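As a rough illustration of this training scheme, here is a minimal Python sketch of a tiny network, one input, two tanh hidden neurons, one linear output, whose weights are adjusted by gradient descent to shrink the squared error on a single example. The architecture, starting weights, and data are invented for illustration; real systems use large networks and optimized libraries.

```python
import math

# Hedged sketch: a 1-2-1 network with tanh hidden units, trained by the
# gradient-descent / back propagation scheme the text outlines.

def forward(x, w1, b1, w2, b2):
    z = [w1[i] * x + b1[i] for i in range(2)]       # hidden pre-activations
    h = [math.tanh(zi) for zi in z]                 # tanh activation
    y = sum(w2[i] * h[i] for i in range(2)) + b2    # linear output neuron
    return h, y

def train_step(x, target, w1, b1, w2, b2, lr=0.1):
    h, y = forward(x, w1, b1, w2, b2)
    dy = 2.0 * (y - target)                         # d(error)/dy for (y - t)^2
    for i in range(2):
        dz = dy * w2[i] * (1.0 - h[i] ** 2)         # back through tanh'
        w2[i] -= lr * dy * h[i]                     # update output weights
        w1[i] -= lr * dz * x                        # update hidden weights
        b1[i] -= lr * dz
    b2 -= lr * dy
    return b2, (y - target) ** 2

w1, b1, w2, b2 = [0.5, -0.3], [0.0, 0.1], [0.4, 0.6], 0.0
for _ in range(50):
    b2, err = train_step(1.0, 0.8, w1, b1, w2, b2)
print("squared error after 50 steps:", err)
```

Each pass propagates the output error backward through the tanh units to apportion blame among the weights, which is the essential content of the back propagation algorithm mentioned above.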
Such deep learning techniques have been very successful in training neural
networks to identify specific objects in a video image. Video images exist in
the form of an array of pixels stored in a computer. Neural networks have been
trained to distinguish a bird from a squirrel and to identify individual human
faces.

3 Jack Copeland, in a careful detailed introduction to this essay, pointed out
that its previous publication, [16] pp. 107–132, contained serious errors. I
should also mention that in [1], the article I have criticized for its embrace of
hypercomputation, information about Turing's early comments on neural nets was
brought to public attention.

The networks that accomplish such feats are usually designed to be "convolutional":
what people and animals accomplish by scanning a visual field through
eye movements, a convolutional neural network accomplishes by carrying out the
enormous number of computations needed to recognize that objects in different
parts of a video image are the same kind of thing. These computations are
greatly facilitated by having special “graphical processing unit” chips as part
of the hardware. These GPU chips were originally designed and manufactured
for machines intended for playing computer games, and make very fast complex
numerical computations feasible. It needs to be emphasized that these networks
are algorithms, typically implemented, not as a large box of wriggling intercon-
nected little gadgets, but rather as lines of code written in a modern high level
programming language. The network can be thought of as a kind of dynamic
data structure.
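The convolutional idea just described, applying one small pattern detector at every position of the pixel array so that the same object is recognized wherever it appears, can be sketched as a plain cross-correlation. The toy image and kernel below are illustrative.

```python
# Hedged sketch of the convolutional idea: a single small filter is slid across
# the whole image, so it responds to the same pattern at any location.

def correlate2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# The same 2x2 "blob detector" fires equally at both locations of the pattern.
img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
kernel = [[1, 1], [1, 1]]
print(correlate2d(img, kernel))
```

The two strong responses, one in each corner of the output, show translation invariance: the filter needs no knowledge of where the pattern sits.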
When I was preparing the third edition of my The Universal Computer [6–
8], just published as I write, there was general astonishment over a remarkable
achievement by AlphaGo, a machine for playing the ancient game of Go. Go is
much more complex than chess. Where chess is played on a board with 64 squares
on which the pieces can be placed, Go is played on a 19 × 19 board with 361
places where pieces, called stones, can be placed. Like chess, the winner is entirely
determined by the moves the players make, taking alternate turns placing a stone
in an unoccupied place. An algorithm to play Go must at each stage choose a
branch from a tree that is very long and extremely wide. Playing against the
Go master Lee Sedol in a five game match, AlphaGo won four of the games.
The prevailing opinion had been that a Go machine capable of beating human
master players was still many years in the future. I wanted to write something
about this, but I was ignorant. I had been a skeptic about neural networks and
had never learned much about them. Fortunately Thore Graepel, one of the
computer engineers in the DeepMind group that had developed AlphaGo, agreed
to an hour-long interview from London by video connection, courtesy of Google.
He turned out to be a clear and patient explainer.
AlphaGo uses a Monte Carlo tree search algorithm taking advice from two
auxiliary convolutional neural networks: a policy network and a value network.
Both networks were trained by giving them access to a huge library of games
played between expert players over many years. The policy network estimates
how likely various possible moves are to be played by an expert Go player
starting from a particular arrangement of pieces on the Go board. Thore told us,
“This policy network was already a pretty good Go player. I am a pretty good
amateur player myself. On my first day at DeepMind, David Silver invited me
to play against this network, and I was sure I would win. But it beat me! Then
I was sure I wanted to be part of the AlphaGo team”. The other network, the
value network, starting with any Go position, estimates for one of the players,
the probability that player will win. Convolution plays a role similar to that in
object recognition: just as a bird in one corner of a video image needs to be
recognized as still a bird when it’s in a different part of the image, a certain
configuration of the stones in one part of the Go board needs to be recognized
as similar to one in a different part. It should be emphasized that the success
of AlphaGo is due not only to the effectiveness of the software design, but also
to the power of contemporary hardware. In particular, AlphaGo uses GPUs for
intensive computation.
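One hedged sketch of how a tree search can consult such networks: a generic PUCT-style selection rule in which the policy network supplies a prior probability for each move and the value network's estimates accumulate in each move's statistics. This is an illustrative stand-in, not DeepMind's actual algorithm; all names and constants are invented.

```python
import math

# Generic PUCT-style move selection: balance the mean value observed so far
# (from value-network evaluations) against an exploration bonus scaled by the
# policy network's prior. Illustrative only.

def select_move(stats, c_puct=1.0):
    """stats: move -> (prior, visit_count, total_value)."""
    total_visits = sum(n for _, n, _ in stats.values())
    def score(move):
        p, n, w = stats[move]
        q = w / n if n else 0.0                                  # mean value
        u = c_puct * p * math.sqrt(total_visits + 1) / (1 + n)   # exploration
        return q + u
    return max(stats, key=score)

# A rarely visited move with a high policy prior can outrank a well-explored one.
stats = {"a": (0.6, 1, 0.4), "b": (0.1, 10, 6.0), "c": (0.3, 2, 0.5)}
print(select_move(stats))
```

The rule makes concrete how the two networks cooperate: the prior steers the search toward moves an expert would consider, while accumulated value estimates gradually dominate as visit counts grow.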
In 2016 AlphaGo was brought to Korea to challenge Lee Sedol to a five game
match. Thore said, “Although we had great confidence in AlphaGo, you never
know what chance will come up with in a tournament. Fortunately, AlphaGo
won the first three games. By the fourth game, our team was actually rooting
for Lee Sedol, and was happy for him when he succeeded”. Then AlphaGo won
the fifth game. In May 2017 at the Future of Go Summit in China, AlphaGo won
all three games against world champion Ke Jie. Thore continued, “Go players
have started to pick up patterns of play that AlphaGo invented. In training
for the May competition, we let AlphaGo play against itself and create new
games of higher quality, resulting in a set of training data for a stronger version
of AlphaGo. With this kind of bootstrapping, a machine learning system can
continue to grow”.

4 Turing’s Vision

When Turing began writing about the capabilities of computers in the 1940s,
there were still no working computers. When the first computers did come on
line, they had severe limitations. Comparing them to the objects we call phones
and carry in our pockets would be, we might say metaphorically, like comparing
an earthworm to a monkey. Programming them had to be done in terms of
binary machine code. Yet his audacious comments spoke of far-reaching accom-
plishments to come:
In his 1945 detailed proposal for his Automatic Computing Engine (ACE)
[13], Turing listed chess as one of ten problems that might be appropriate for the
machine. He suggested that it could be programmed to “display intelligence” and
to play “very good chess”. We have already mentioned Turing’s 1947 lecture on
the ACE to the London Mathematical Society [14]. He suggested that given the
possibility of changing its own program, the machine might be able to improve
its performance. He asserted, “When this happens I feel that one is obliged to
regard the machine as showing intelligence”. He ended his address mentioning
chess as an appropriate arena for the machine to learn from exposure to human
opponents.
In 1948, Turing presented an essay entitled Intelligent Machinery as an inter-
nal memo to his superior at The National Physical Laboratory [17]. The far-
sighted comments are particularly astonishing considering the state of computer
hardware, with the first crude true stored program computers just being built. As
problems appropriate for machines like the ACE, Turing mentioned games like
chess or bridge, learning languages, translation, and cryptography. He discussed
several kinds of simple networks constructed from neuron-like elements, and pointed
out that the networks could be fully realized by programming a machine like the
ACE. He envisioned programming a computer to carry out formal deductions
in a system like Principia Mathematica, and using it to search for solutions of
mathematical problems. He suggested both the possibility of a computer mod-
ifying its own program and also that it might be useful to introduce a random
element. It can’t be said that Turing imagined something quite like AlphaGo.
However, with its two neural nets and its Monte Carlo search algorithm, one can
find an echo of Turing's imaginings.
Given what has already been achieved it is exciting to imagine the further
developments that will now be coming. May we hope that human society will
find itself able to deal intelligently and equitably with them as they arrive?

References
1. Copeland, B.J., Proudfoot, D.: Alan Turing’s forgotten ideas in computer science.
Sci. Am. 253(4), 98–103 (1999)
2. Copeland, B.J. (ed.): The Essential Turing. Oxford University Press, New York
(2004)
3. Copeland, B.J., et al.: Alan Turing’s Electronic Brain: The Struggle to Build the
ACE, the World’s Fastest Computer. Oxford University Press, Oxford (2005)
4. Copeland, B.J., et al.: The Turing Guide. Oxford University Press, Oxford (2017)
5. Davis, M.: Mathematical logic and the origin of modern computers. In: Studies
in the History of Mathematics, pp. 137–165. Mathematical Association of Amer-
ica (1987). Reprinted in The Universal Turing Machine - A Half-Century Survey,
Herken, R. (ed.), pp. 149–174. Verlag Kemmerer & Unverzagt/Oxford University
Press, Hamburg/Oxford (1988)
6. Davis, M.: The Universal Computer: The Road from Leibniz to Turing. W.W.
Norton, New York City (2000). Second (Turing Centenary) Edition. CRC Press,
Taylor & Francis (2012)
7. Davis, M.: Engines of Logic: Mathematicians and the Origin of the Computer.
W.W. Norton, New York City (2001). Paperback edition of [6]
8. Davis, M.: The Universal Computer: The Road from Leibniz to Turing, 3rd edn.
Taylor & Francis/CRC Press, Boca Raton (2018)
9. Davis, M.: The myth of hypercomputation. In: Teuscher, C. (ed.) Alan Turing: Life
and Legacy of a Great Thinker, pp. 195–212. Springer, Heidelberg (2004). https://
doi.org/10.1007/978-3-662-05642-4_8
10. Leavitt, D.: The Man Who Knew Too Much: Alan Turing and the Invention of the
Computer. W.W. Norton, New York City (2006)
11. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous
activity. Bull. Math. Biophys. 5, 115–133 (1943). Reprinted in McCulloch, W.S.,
Embodiments of Mind, pp. 19–39. M.I.T. Press, Cambridge (1965)
12. Searle, J.R.: I married a computer. N. Y. Rev. Books 46, 34–38 (1999)
13. Turing, A.: Proposed Electronic Calculator. Proposal submitted to the Mathemat-
ics Division of the National Physical Laboratory (1945). Reprinted in [16], pp.
1–86. Reprinted in [3], pp. 369–454
14. Turing, A.: The Automatic Computing Engine. Lecture delivered to the London
Mathematical Society, 20 February 1947. Reprinted in [16], pp. 87–105. Reprinted
in [2], pp. 378–394
15. Turing, A.: Computing machinery and intelligence. Mind LIX, 433–460 (1950).
Reprinted in [16], pp. 133–160. Reprinted in [2], pp. 433–464
16. Turing, A.: Collected Works: Mechanical Intelligence, Ince, D.C. (ed.) North-
Holland, Amsterdam (1992)
17. Turing, A.: Intelligent Machinery (1948). Reprinted in [2], pp. 410–432, and in [16], pp. 107–127
18. Turing, S.: Alan M. Turing, Centenary Edition. Cambridge University Press, Cam-
bridge (2012)
