
FEBRUARY 25, 2015

Artificial Intelligence Goes to the Arcade


BY NICOLA TWILLEY
A shaky video, recorded with a mobile phone and smuggled out of the inaugural First Day of Tomorrow technology conference, in April, 2014, shows an artificially intelligent computer program in its first encounter with Breakout, the classic Atari arcade game. Prototyped in 1975 by Steve Wozniak, the co-founder of Apple, with assistance from Steve Jobs, the other co-founder of Apple, Breakout is a variant of Pong, in which a player knocks bricks from a wall by hitting a ball against it. After half an hour of play, the A.I. program is doing about as well as I would, which is to say not very well, but it is trying to move its paddle toward the ball, apparently grasping the rudiments of the game. After another thirty minutes, two hundred rounds in, the A.I. has become a talented amateur: it misses the ball only every third or fourth time. The audience laughs; isn't this cool?
Then something happens. By the three hundredth game, the A.I. has stopped missing the ball. The auditorium begins to buzz. Demis Hassabis, the program's creator, advances to the next clip in his video presentation. The A.I. uses four quick rebounds to drill a hole through the left-hand side of the wall above it. Then it executes a killer bank shot, driving the ball into the hole and through to the other side, where it ricochets back and forth, destroying the entire wall from within. Now there are exclamations, applause, and shocked whispers from the crowd. Hours after encountering its first video game, and without any human coaching, the A.I. has not only become better than any human player but has also discovered a way to win that its creator never imagined.
Today, in a paper published in Nature, Hassabis and his colleagues Volodymyr Mnih, Koray Kavukcuoglu, and David Silver reveal that their A.I. has since achieved the same feat with an angling game (Fishing Derby, 1980), a chicken-crossing-the-road game (Freeway, 1981), an armored-vehicle game (Robot Tank, 1983), a martial-arts game (Kung-Fu Master, 1984), and twenty-five others.* In more than a dozen of them, including Stargunner and Crazy Climber, from 1982, it made the best human efforts look pathetic. The Nature article appears just over a year after Hassabis's company, DeepMind, made its public début; Google bought the firm for six hundred and fifty million dollars in January, 2014, soon after Hassabis first demonstrated his program's superhuman gaming abilities, at a machine-learning workshop in a Harrah's casino on the edge of Lake Tahoe. That program, the DeepMind team now claims, is "a novel artificial agent" that combines two existing forms of brain-inspired machine intelligence: a deep neural network and a reinforcement-learning algorithm.
Deep neural networks rely on layers of connections, known as nodes, to filter raw sensory data into meaningful patterns, just as neurons do in the brain. Apple's Siri uses such a network to decipher speech, sorting sounds into recognizable chunks before drawing on contextual clues and past experiences to guess at how best to group them into words. Siri's deductive powers improve (or ought to) every time you speak to her or correct her mistakes. The same technique can be applied to decoding images. To a computer with no preexisting knowledge of brick walls or kung fu, the pixel data that it receives from an Atari game is meaningless. Rather than staring uncomprehendingly at the noise, however, a program like DeepMind's will start analyzing those pixels, sorting them by color, finding edges and patterns, and gradually developing an ability to recognize complex shapes and the ways in which they fit together.
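The edge-finding that this kind of layered filtering begins with can be sketched in a few lines of Python. The toy "screen" and the hand-written filter below are purely illustrative, not DeepMind's actual network; a real deep network stacks many such learned filters and adjusts them from experience:

```python
import numpy as np

# A toy 8x8 "screen": 1s form a brick-like block, 0s are background.
# (Illustrative data, not actual Atari pixel output.)
screen = np.zeros((8, 8))
screen[2:5, 1:7] = 1.0  # a solid "wall" of bricks

# One layer of a convolutional network is, at heart, a small filter
# slid across the image. This 3x3 kernel responds to horizontal edges.
kernel = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [ 1,  1,  1]])

def convolve(image, k):
    """Slide the kernel over the image, recording its response at each spot."""
    h, w = image.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

edges = convolve(screen, kernel)
# Strong positive responses mark the wall's top edge; negative ones, its bottom.
print(edges)
```

Later layers would combine such edge maps into corners, bricks, and paddles, which is what lets the program eventually "see" the game rather than a field of noise.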
The program's second, complementary form of intelligence, reinforcement learning, allows for a kind of unsupervised obedience training. DeepMind's A.I. starts each game like an unhousebroken puppy. It is programmed to find a score rewarding, but is given no instruction in how to obtain that reward. Its first moves are random, made in ignorance of the game's underlying logic. Some are rewarded with a treat (a score) and some are not. Buried in the DeepMind code, however, is an algorithm that allows the juvenile A.I. to analyze its previous performance, decipher which actions led to better scores, and change its future behavior accordingly. Combined with the deep neural network, this gives the program more or less the qualities of a good human gamer: the ability to interpret the screen, a knack for learning from past mistakes, and an overwhelming drive to win.
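The learning loop described above can be shown at miniature scale. The sketch below is a bare-bones Q-learning agent on an invented two-state stand-in for a paddle game; the state summary, reward values, and parameters are illustrative, not DeepMind's (their agent learns from raw pixels, through the neural network):

```python
import random

random.seed(0)

# A toy stand-in for an Atari game: the "state" is which side the ball is on,
# and moving the paddle toward the ball earns a point.
STATES = ["ball_left", "ball_right"]
ACTIONS = ["move_left", "move_right"]

def play(state, action):
    """Reward of 1 for moving toward the ball, 0 otherwise."""
    return 1 if (state, action) in {("ball_left", "move_left"),
                                    ("ball_right", "move_right")} else 0

# Q-table: the agent's running estimate of each action's value in each state.
# It starts at zero -- the "unhousebroken puppy" knows nothing.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, epsilon = 0.1, 0.1  # learning rate and exploration rate

for _ in range(2000):
    state = random.choice(STATES)
    # Epsilon-greedy: mostly exploit what has worked, occasionally explore.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward = play(state, action)
    # Nudge the estimate toward the observed reward.
    Q[(state, action)] += alpha * (reward - Q[(state, action)])

# After training, the learned policy moves the paddle toward the ball.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES})
```

The first moves really are random; the table of value estimates is what lets the agent "decipher which actions led to better scores" and change its behavior.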
Whipping humanity's ass at Fishing Derby may not seem like a particularly noteworthy achievement for artificial intelligence (nearly two decades ago, after all, I.B.M.'s Deep Blue computer beat Garry Kasparov, a chess grandmaster, at his own more intellectually aspirational game), but according to Zachary Mason, a novelist and computer scientist, it actually is. Chess, he noted, has an "extremely limited feature space"; the only information that Deep Blue needed to consider was the positions of the pieces on the board, during a span of not much more than a hundred turns. It could play to its strengths of perfect memory and brute-force computing power. But in an Atari game, Mason said, there is "a byte or so of information per pixel" and hundreds of thousands of turns, which adds up to much more, and much messier, data for the DeepMind A.I. to process. In this sense, a game like Crazy Climber is a closer analogue to the real world than chess is, and in the real world humans still have the edge. Moreover, whereas Deep Blue was highly specialized, and preprogrammed by human grandmasters with a library of moves and rules, DeepMind is able to use the same all-purpose code for a wide array of games.
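Mason's back-of-the-envelope comparison can be made concrete. The frame size below is the 210-by-160-pixel Atari screen described in the Nature paper; the chess figures and the turn count are round illustrative numbers, not his:

```python
# Chess: Deep Blue's "feature space" per turn is essentially the board.
chess_squares = 64            # one piece code per square
chess_turns = 100             # roughly, per game

# Atari: a byte or so per pixel, for hundreds of thousands of frames.
atari_pixels = 210 * 160      # one emulator frame
atari_turns = 100_000

print(chess_squares * chess_turns)   # board-squares of state per chess game
print(atari_pixels * atari_turns)    # pixel-bytes per Atari game: billions
```

Even with these rough numbers, the Atari agent confronts roughly half a million times as much raw state per game as Deep Blue did.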
That adaptability holds promise. Hassabis has begun partnering with satellite operators and financial institutions to see whether his A.I. could eventually "play" their data sets, perhaps learning to make weather predictions or trade oil futures. In the short term, though, his team has a more modest next step in mind: to design a program that can play video games from the nineteen-nineties. Hassabis, who began working as a game designer in 1994, at the age of seventeen, and whose first project was the Golden Joystick-winning Theme Park, in which players got ahead by, among other things, hiring restroom-maintenance crews and oversalting snacks in order to boost beverage sales, is well aware that DeepMind's current system, despite being state of the art, is at least five years away from being a decade behind the gaming curve. Indeed, the handful of games in which DeepMind's A.I. failed to achieve human-level performance were the ones that required longer-term planning or more sophisticated pathfinding: Ms. Pac-Man, Private Eye, and Montezuma's Revenge. One solution, Hassabis suggested, would be to make the A.I. bolder in its decision-making, and more willing to take risks. Because of the rote reinforcement learning, he said, "it's overexploiting the knowledge that it already knows."
In the longer term, after DeepMind has worked its way through Warcraft, StarCraft, and the rest of the Blizzard Entertainment catalogue, the team's goal is to build an A.I. system with the capability of a toddler. But this, Hassabis said, they are nowhere near reaching. For one thing, he explained, toddlers can do transfer learning: they can bring prior knowledge to bear on a new situation. In other words, a toddler who masters Pong is likely to be immediately good at Breakout, whereas the A.I. has to learn both from scratch. Beyond that challenge lies the much thornier question of whether DeepMind's chosen combination of a deep neural network and reinforcement learning could, on its own, ever lead to conceptual cognition: not only a fluency with the mechanics of, say, 2001's Sub Command but also an understanding of what a submarine, water, or oxygen are. For Hassabis, this is an open question.
Zachary Mason is less sanguine. "Their current line of research leads to StarCraft in five or ten years, and Call of Duty in maybe twenty, and controllers for drones in live battle spaces in maybe fifty," he told me. "But it never, ever leads to a toddler." Most toddlers cannot play chess or StarCraft. But they can interact with the real world in sophisticated ways. "They can find their way across a room," Mason said. "They can see stuff, and as the light and shadows change they can recognize that it's still the same stuff. They can understand and manipulate objects in space." These kinds of tasks, the things that a toddler does with ease but that a machine struggles to grasp, cannot, Mason is convinced, be solved by a program that excels at teaching itself Breakout. They require a model of cognition that is much richer than what Atari, or perhaps any gaming platform, can offer. Hassabis's algorithm represents a genuine breakthrough, but it is one that reinforces just how much distance remains between artificial intelligence and the human mind.
*Correction: An earlier version of this post mischaracterized the video game Freeway.
