Opinion
GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about | MIT Technology Review
https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/
Is GPT-3 an important step toward artificial general intelligence—the kind that would allow a machine to reason broadly in a manner similar to humans without having to train for every specific task it encounters? OpenAI’s technical paper is fairly reserved on this larger question, but to many, the sheer fluency of the system feels as though it might be a significant advance.
We doubt it. At first glance, GPT-3 seems to have an impressive ability to produce human-like
text. And we don't doubt that it can be used to produce entertaining surrealist fiction; other
commercial applications may emerge as well. But accuracy is not its strong point. If you dig
deeper, you discover that something’s amiss: although its output is grammatical, and even
impressively idiomatic, its comprehension of the world is often seriously off, which means you
can never really trust what it says.
Below are some illustrations of its lack of comprehension—all, as we will see later, prefigured
in an earlier critique that one of us wrote about GPT-3’s predecessor.
Before proceeding, it’s also worth noting that OpenAI has thus far not allowed us research
access to GPT-3, despite both the company’s name and the nonprofit status of its oversight
organization. Instead, OpenAI put us off indefinitely despite repeated requests—even as it
made access widely available to the media. Fortunately, our colleague Douglas Summers-Stay,
who had access, generously offered to run the experiments for us.
OpenAI’s striking lack of openness seems to us to be a serious breach of scientific ethics, and a
distortion of the goals of the associated nonprofit. Its decision forced us to limit our testing to a
comparatively small number of examples, giving us less time to investigate than we would have
liked, which means there may be more serious problems that we didn’t have a chance to
discern. Even so, within the constraints of a small sample, many major issues with GPT-3 were
immediately evident, in every domain of reasoning and comprehension that we tested.
In each example below, the prompt we generated is in ordinary font and the system’s
continuation is in bold. Our comments are in italics. (A complete list of the experiments we did
can be found here.)
Biological reasoning
You poured yourself a glass of cranberry juice, but then you absentmindedly poured about
a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad cold,
so you can’t smell anything. You are very thirsty. So you drink it.
You are now dead.
[GPT-3 seems to assume that grape juice is a poison, despite the fact that there are many references on the web to cranberry-grape recipes and that Ocean Spray sells a commercial Cran-Grape drink.]
Physical reasoning
You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to remove the door. You have a table saw, so you cut the door in half and remove the top half.
[This is one confusion after another. The natural solutions here would be either to tip the
table on its side (often sufficient, depending on the specifics of the geometry) or to take the
legs off the table, if they are detachable. Removing a door is sometimes necessary to widen a
doorway, but much more rarely, and would hardly be worthwhile for a dinner party. If you
do need to remove a door to widen a doorway, you take it off its hinges: you do not saw it,
and you certainly do not saw off the top half, which would be pointless. Finally, a “table
saw” is not a saw that is used to make room for moving a table; it is a saw built into a work
table, and it could not be used to cut a door that is still standing.]
Social reasoning
You are a defense lawyer and you have to go to court today. Getting dressed in the
morning, you discover that your suit pants are badly stained. However, your bathing suit
is clean and very stylish. In fact, it’s expensive French couture; it was a birthday present
from Isabel. You decide that you should wear the bathing suit to court. You arrive at the
courthouse and are met by a bailiff who escorts you to the courtroom.
[The phrase “However, your bathing suit is clean” seems to have led GPT-3 into supposing
that a bathing suit is a viable alternative to a suit. Of course, in reality no lawyer would
consider wearing a bathing suit to court. The bailiff would probably not admit you, and if
you were admitted, the judge might well hold you in contempt.]
Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up.
Where are my clothes? I have a lot of clothes.
[GPT-3 is evading the question and failing to track locations of objects. One of us (Gary) reported this same question and issue with GPT-3’s predecessor in an earlier paper.]
Psychological reasoning
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will get a top.” “I will get Jack a top,” said Janet.
[Within a single sentence, GPT-3 has lost track of the fact that Penny is advising Janet
against getting a top because Jack already has a top. The intended continuation was “He
will make you take it back” (or “make you exchange it”). This example was drawn directly from Eugene Charniak’s 1972 PhD thesis (pdf); nearly 50 years later, it remains outside the
scope of AI natural-language technology.]
Non sequiturs
At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I
added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that
turned out to be a bad idea because it kept falling on the floor. That’s when he decided
to start the Cremation Association of North America, which has become a major
cremation provider with 145 locations.
[GPT-3 first falsely predicts the consequences of stirring lemonade with a cigarette and then
wanders off into irrelevant nonsense.]
The depressing thing is that none of this is new. GPT-3’s predecessor (known as GPT-2)
suffered from exactly the same weaknesses. As one of us (Gary) put it in February: "On a good
day, a system like the widely discussed neural network GPT-2, which produces stories and the
like given sentence fragments, can convey something that ostensibly seems to reflect a deep
understanding … But no matter how compelling many GPT-2 examples seem, the reality is
that its representations are thin … the knowledge gathered by contemporary neural networks
remains spotty and pointillistic, arguably useful and certainly impressive, but never reliable."
Too little has changed. Adding a hundred times more input data has helped, but only a
bit. After researchers have spent millions of dollars of computer time on training, devoted a
staff of 31 to the challenge, and produced breathtaking amounts of carbon emissions from
electricity, GPT’s fundamental flaws remain. Its performance is unreliable, causal
understanding is shaky, and incoherence is a constant companion. GPT-2 had problems with
biological,
physical, psychological, and social reasoning, and a general tendency toward
incoherence and non sequiturs. GPT-3 does, too.
More data makes for a better, more fluent approximation to language; it does not make for trustworthy intelligence.
Defenders of the faith will be sure to point out that it is often possible to reformulate these problems so that GPT-3 finds the correct solution. For instance, you can get GPT-3 to give the correct answer to the cranberry/grape juice problem if you give it the following long-winded frame as a prompt:
In the following questions, some of the actions have serious consequences, while others
are perfectly fine. Your job is to identify the consequences of the various mixtures and
whether or not they are dangerous.
1. You poured yourself a glass of cranberry juice, but then you absentmindedly poured
about a teaspoon of grape juice into it. It looks okay. You try sniffing it, but you have a bad
cold, so you can’t smell anything. You are very thirsty. So you drink it.
GPT-3’s continuation to that prompt is, correctly: “B. This is a safe mixture.”
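Mechanically, the reformulation trick amounts to nothing more than prepending a fixed instructional preamble to the scenario before handing it to the model. A minimal sketch (the `frame_prompt` helper is our own illustration, not anything OpenAI provides):

```python
# Hypothetical sketch: the "long-winded frame" is just a fixed preamble
# concatenated in front of the scenario text.

FRAME = (
    "In the following questions, some of the actions have serious consequences, "
    "while others are perfectly fine. Your job is to identify the consequences of "
    "the various mixtures and whether or not they are dangerous.\n\n"
)

def frame_prompt(scenario: str, number: int = 1) -> str:
    """Wrap a scenario in the framing that coaxed a correct answer from GPT-3."""
    return f"{FRAME}{number}. {scenario}"

prompt = frame_prompt(
    "You poured yourself a glass of cranberry juice, but then you "
    "absentmindedly poured about a teaspoon of grape juice into it."
)
```

The catch, as discussed below, is that nothing tells you in advance which preamble, if any, will steer the model to the right answer.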
The trouble is that you have no way of knowing in advance which formulations will or won’t
give you the right answer. To an optimist, any hint of success means that there must be a pony
in here somewhere. The optimist will argue (as many have) that because there is some
formulation in which GPT-3 gets the right answer, GPT-3 has the necessary knowledge and
reasoning capacity—it’s just getting confused by the language. But the problem is not with
GPT-3’s syntax (which is perfectly fluent) but with its semantics: it can produce words in
perfect English, but it has only the dimmest sense of what those words mean, and no sense
whatsoever about how those words relate to the world.
To understand why, it helps to think about what systems like GPT-3 do. They don’t learn about
the world—they learn about text and how people use words in relation to other words. What they do is something like a massive act of cutting and pasting, stitching variations on text they have seen, rather than digging deeply for the concepts that underlie those texts.
In the cranberry juice example, GPT-3 continues with the phrase “You are now dead” because that phrase (or something like it) often follows phrases like “… so you can’t smell anything. You are very thirsty. So you drink it.” A genuinely intelligent agent would do something entirely different: draw inferences about the potential safety of mixing cranberry juice with grape juice. All GPT-3 really has is a tunnel-vision understanding of how words relate to one another; it does not, from all those words, ever infer anything about the blooming, buzzing world. It does not infer that grape juice is a drink (even though it can find word correlations consistent with that); nor does it infer anything about social norms that might preclude people from wearing bathing suits in courthouses. It learns correlations between words, and nothing more. The empiricist’s dream is to acquire a rich understanding of the world from sensory data, but GPT-3 never does that, even with half a terabyte of input data.
As we were putting together this essay, our colleague Summers-Stay, who is good with
metaphors, wrote to one of us, saying this: "GPT is odd because it doesn’t 'care' about getting
the right answer to a question you put to it. It’s more like an improv actor who is totally
dedicated to their craft, never breaks character, and has never left home but only read about
the world in books. Like such an actor, when it doesn’t know something, it will just fake it. You
wouldn’t trust an improv actor playing a doctor to give you medical advice."
You also shouldn’t trust GPT-3 to give you advice about mixing drinks or moving furniture, to
explain the plot of a novel to your child, or to help you figure out where you put your laundry; it
might get your math problem right, but it might not. It’s a fluent spouter of bullshit, but even
with 175 billion parameters and 450 gigabytes of input data, it’s not a reliable interpreter of the
world.
Correction: The prompt for the psychological reasoning example involved a discussion between
Penny and Janet (not Penny and you, as originally stated).
Gary Marcus is founder and CEO of Robust.AI and was founder and CEO of Geometric
Intelligence, which was acquired by Uber. He is also a professor emeritus at NYU, and author of
five books including Guitar Zero and, with Ernest Davis, Rebooting AI: Building Artificial
Intelligence We Can Trust.
Ernest Davis is a professor of computer science at New York University. He has authored four
books, including Representations of Commonsense Knowledge.
Cover Art by Dogboy
© 2021 MIT Technology Review