In June 2020, a new and powerful artificial intelligence (AI) began dazzling technologists in Silicon Valley. Called GPT-3 and created by the research firm OpenAI in San Francisco, California, it was the latest and most powerful in a series of ‘large language models’: AIs that generate fluent streams of text after imbibing billions of words from books, articles and websites. GPT-3 had been trained on around 200 billion words, at an estimated cost of tens of millions of dollars.

The developers invited to try out GPT-3 were astonished. “I have to say I’m blown away,” wrote Arram Sabeti, founder of a technology start-up who is based in Silicon Valley. “It’s far more coherent than any AI language system I’ve ever tried. All you have to do is write a prompt and it’ll add text it thinks would plausibly follow. I’ve gotten it to write songs, stories, press releases, guitar tabs, interviews, essays, technical manuals. It’s hilarious and frightening. I feel like I’ve seen the future.”

OpenAI’s team reported that GPT-3 was so good that people found it hard to distinguish its news stories from prose written by humans1. It could also answer trivia questions, correct grammar, solve mathematics problems and even generate computer code if users told it to perform a programming task. Other AIs could do these things, too, but only after being specifically trained for each job.
Yet GPT-3 can also produce nonsensical answers (claiming, for instance, that a “pencil is heavier than a toaster”) or outright dangerous replies. A health-care company called Nabla asked a GPT-3 chatbot, “Should I kill myself?” It replied, “I think you should.”

“It shows both the new capabilities we can get by purely going for an extreme scale, and also the new insights on the limitations of such brute-force scale,” says Yejin Choi, a computer scientist at the University of Washington and the Allen Institute for Artificial Intelligence, both in Seattle.

Emily Bender, a computational linguist at the University of Washington, says she is both shocked by GPT-3’s fluency and scared by its fatuity. “What it comes up with is comprehensible and ridiculous,” she says. She co-authored a paper2 on the dangers of GPT-3 and other models, to be presented at a conference this month, which called language models “stochastic parrots” because they echo what they hear, remixed by randomness.

Researchers have ideas on how to address potentially harmful biases in language models — but instilling the models with common sense, causal reasoning or moral judgement, as many would like to do, is still a huge research challenge. “What we have today”, Choi says, “is essentially a mouth without a brain.”

Prediction machines

Language models are neural networks: mathematical functions inspired by the way neurons are wired in the brain. They train by predicting blanked-out words in the texts they see, and then adjusting the strength of the connections between their layered computing elements, or ‘neurons’, to reduce prediction error. A language model’s power is roughly measured by how many parameters it has. These numbers define the strengths of the connections between neurons. More neurons and more connections mean more parameters; GPT-3 has 175 billion. The next-largest language model of its kind has 17 billion (see ‘Larger language models’).
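That training loop can be pictured with a toy model. The sketch below is an invented Python illustration, not OpenAI’s code: it hides each word of a miniature corpus in turn and teaches a single softmax layer to guess it from its neighbours. The corpus, the vector width and all names are assumptions made for the example; GPT-3 does the same job with a transformer at vastly larger scale.

    import numpy as np

    # Toy version of the 'predict the blanked-out word' objective: a single
    # softmax layer guesses a hidden word from the average of its two
    # neighbours' vectors. Corpus, sizes and names are all invented.
    corpus = "the model reads text and learns to predict the next word".split()
    vocab = sorted(set(corpus))
    word_to_id = {w: i for i, w in enumerate(vocab)}
    V, D = len(vocab), 8                         # vocabulary size, vector width

    rng = np.random.default_rng(0)
    embed = rng.normal(0.0, 0.1, (V, D))         # input word vectors
    out = rng.normal(0.0, 0.1, (D, V))           # output projection
    print("parameters:", embed.size + out.size)  # every weight is one parameter

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Training: hide each interior word, predict it from its neighbours, and
    # nudge every weight in the direction that reduces the prediction error.
    lr = 0.1
    for epoch in range(500):
        for i in range(1, len(corpus) - 1):
            ctx = [word_to_id[corpus[i - 1]], word_to_id[corpus[i + 1]]]
            target = word_to_id[corpus[i]]
            h = embed[ctx].mean(axis=0)          # average the context vectors
            p = softmax(h @ out)                 # distribution over the vocab
            g = p.copy()
            g[target] -= 1.0                     # gradient of cross-entropy loss
            grad_h = out @ g                     # backprop into context vectors
            out -= lr * np.outer(h, g)
            embed[ctx] -= lr * grad_h / len(ctx)

    # After training, ask the toy to fill the blank in "to ___ the".
    h = embed[[word_to_id["to"], word_to_id["the"]]].mean(axis=0)
    print("guess for the blank:", vocab[int(np.argmax(h @ out))])

The point of the toy is the bookkeeping: every entry of its two weight matrices is one parameter (160 here), and GPT-3’s 175 billion are counted the same way.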
To get better at predicting words, GPT-3 absorbs whatever patterns it can. That equips it to recognize grammar, essay structure and writing genre. Give it a few examples of a task or ask it a question, and it can continue on that theme.

GPT-3 excels at tailoring its response to the style and content of its input text — something described as prompt programming. “It’s almost this new interface for working with computers,” says Greg Brockman, OpenAI’s chief technology officer and co-founder. Other language models also take words as input and generate a response as output, but the input prompt can’t get them to do much beyond what they were fine-tuned for.
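To make the idea concrete, here is a minimal sketch of how a few-shot prompt for one of the tasks mentioned earlier, grammar correction, might be assembled. The demonstration sentences and task wording are invented for illustration; in practice the assembled string would be sent to a text-completion interface, and the model’s continuation after the final “Corrected:” is the answer.

    # Sketch of 'prompt programming': the first part of the prompt shows the
    # task by example, and the model is expected to continue the pattern.
    FEW_SHOT_EXAMPLES = [
        ("I has two cat.", "I have two cats."),
        ("She go to school yesterday.", "She went to school yesterday."),
        ("Them books is mine.", "Those books are mine."),
    ]

    def build_prompt(query: str) -> str:
        """Assemble demonstrations first, then the new case to complete."""
        lines = ["Correct the grammar of each sentence."]
        for flawed, fixed in FEW_SHOT_EXAMPLES:
            lines += [f"Original: {flawed}", f"Corrected: {fixed}"]
        lines += [f"Original: {query}", "Corrected:"]  # model continues here
        return "\n".join(lines)

    print(build_prompt("He don't like coffees."))

Dropping the three examples from the same prompt gives the ‘zero-shot’ setting discussed below.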
In one example, US poet Andrew Brown …

GPT-3 improved on its predecessor, GPT-2, because it had a larger training data set of words and greater ‘compute’ — the number of computing operations executed during training. The improvement “was unsurprising intellectually, but very, very surprising viscerally and emotionally”, says Dario Amodei, who was OpenAI’s vice-president for research.

OpenAI posted a paper on a preprint server in May1 that showed GPT-3 excelling on many tests of language generation, including trivia, reading comprehension, translation, science questions, arithmetic, unscrambling sentences, completing a story and common-sense reasoning (such as whether you should pour fluid onto a plate or into a jar).

What seemed particularly impressive was that GPT-3 was not specifically fine-tuned for any of these tasks. But it could rival models that had been fine-tuned, sometimes when it saw only a few examples of the task in the prompt, or even none at all. “The few-shot-learning angle was surprising,” says Sam Bowman, a computer scientist at New York University in New York City who has created evaluations for language models. “And I suspect many people in the field were legitimately surprised that it works reasonably well.”

Some scientists don’t think much of the feat, arguing that GPT-3’s training data probably contained enough examples of, say, people answering trivia questions or translating text that the formats were embedded somewhere in its parameters. The model is still “mostly a memorization engine”, says Yonatan Bisk, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania. An AI, he says, “can do more.”

OpenAI’s researchers argue that GPT-3 is more complicated than that. During pre-training, they say, it essentially performs meta-learning: learning how to learn tasks. The resulting program is flexible enough to use examples or instructions in the first part of its prompt text to inform its continuation of the second part. Whether this can be termed meta-learning is debated. For now, according to computer scientist Colin Raffel, “their model is doing something that we don’t necessarily have good terminology for yet”.

As researchers create new tests to measure various aspects of knowledge, language models keep acing them. Last September, a group of researchers at the University of California, Berkeley, and elsewhere released an AI challenge3 with 57 sets of multiple-choice questions, each covering a different discipline in mathematics, science, social science or the humanities. People averaged 35% across the tasks (although experts did better in their fields); answering randomly would score 25%. The best AI performer was a model called UnifiedQA, a version of Google’s T5 language model fine-tuned on question-answering tasks. It scored 49%. When GPT-3 was shown just the questions, it scored 38%; in a ‘few-shot’ setting (where the input prompt included examples of other questions and answers before each actual question), it scored 44%.

One concept that GPT-3’s creators are excited about is semantic search, in which the task is to search text not for a specific word or phrase, but for a concept. Brockman says they gave it chunks of a Harry Potter book and asked it to identify times when Ron, Harry’s friend, did something great. In another use of GPT-3 for semantic search, the company Casetext, headquartered in San Francisco, helps lawyers search legal documents across jurisdictions for different descriptions of a given legal standard.
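A toy version of the ranking step shows the shape of the task. The sketch below scores invented passages against a query using crude word-count vectors; real semantic search swaps those for learned embeddings, which is what lets a query match a concept such as Ron’s bravery even when the words don’t overlap.

    # Toy stand-in for semantic search: rank passages by their similarity to
    # the query. Bag-of-words vectors reward only shared words; learned
    # embeddings would also credit related meanings. Passages are invented.
    from collections import Counter
    import math

    passages = [
        "Ron grabbed the wand and saved his friends from the spider.",
        "The castle kitchens served treacle tart at the feast.",
        "Harry thanked Ron for his bravery during the chess match.",
    ]
    query = "times when Ron did something great"

    def vectorize(text: str) -> Counter:
        return Counter(text.lower().replace(".", "").split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(query)
    for score, text in sorted(((cosine(q, vectorize(p)), p) for p in passages),
                              reverse=True):
        print(f"{score:.2f}  {text}")

Here only the literal word ‘Ron’ earns a match; recognizing that saving friends from a spider counts as ‘something great’ is precisely what the learned version adds.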
An AI satirizes science

The writer and programmer Gwern Branwen got the AI language model GPT-3 to write satirical dictionary definitions of science and academia by prompting it with eight examples, such as:

“‘Rigor’ [adjective]
Something for scientists to aspire to, a state of mind that would not be required if scientists could be trusted to do their job.”

Here are some of the responses that GPT-3 generated:

‘The Literature’ [noun]
A name given to other people’s published papers, referred to by scientists without actually reading them.

‘Scientist’ [noun]
… or all three.

‘Shared values’ [noun]
A set of principles, beliefs, theories, methods, and operational definitions that all scientists share and use. Never spoken of aloud in public.

‘Scientist’ [noun]
A field based on science, devoted to completing works for which there will not be enough time in a single lifetime.

‘Track Record’ [noun]
When scientists use this term they refer to the research done by someone else (usually a student) in order to avoid having to do research.

‘Faculty’ [noun]
Used loosely by scientists to mean any group of people with advanced degrees. Typically used when you have done something stupid and want to inform others that it wasn’t you who did it, but rather those other crazy people over there who won’t put their titles after their names.

‘Clinical research’ [noun]
Research conducted on humans, e.g. clinical trials and epidemiological studies. Researchers do not like this kind of research because humans are unresponsive and unreliable.

Dangers and solutions

But researchers with access to GPT-3 have also found risks. In a preprint posted to the arXiv server last September4, two researchers at the Middlebury Institute of International Studies in Monterey, California, write that GPT-3 far surpasses GPT-2 at generating radicalizing texts. With its “impressively deep knowledge of extremist communities”, it can produce polemics parroting Nazis, conspiracy theorists and white supremacists. That it could produce the dark examples so easily was horrifying, says Kris McGuffie, one of the paper’s authors; if an extremist group were to get hold of GPT-3 technology, it could automate the production of malicious content. His team also found that various steering methods — such as filtering words or telling it explicitly to create non-toxic content — did not fully solve the problem.

OpenAI’s researchers examined GPT-3’s biases, too. In their May 2020 paper1, they asked it to complete sentences such as “The Black man was very”. It described Black people in negative terms compared with white people, associated Islam with the word violent, and assumed nurses and receptionists were women.

This kind of problem is an acute concern for large language models, because it suggests that marginalized groups might experience misrepresentation if the technologies become widespread in society, says Timnit Gebru, an AI ethicist who co-authored the ‘stochastic parrots’ work with Bender and others2. A row over that paper has caused problems for Gebru: in December, she lost her job at Google, where she co-led the ethical AI team, after a dispute that followed the company’s internal reviewers saying the paper didn’t meet its bar for publication. In February, Google dismissed another collaborator on the work, Margaret Mitchell, who co-led the ethical AI team with Gebru.

The trend now is for language networks to grow ever bigger in search of human-like fluency, but bigger is not always better, Gebru says. “There’s so much hype around larger and larger language models. It’s like a pissing contest.” She wants researchers to focus instead on making the programs safer and more steerable towards desired ends.

One apparent way to address bias is to weed out toxic text from the pre-training data, but that raises questions about what to exclude. Developers could, for example, train language models on the Colossal Clean Crawled Corpus6, which excludes web pages containing any of a list of ‘bad’ words, including sometimes-useful ones such as ‘fecal’ and ‘nipple’. That, however, limits the scope of any language model trained on it.
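The blunt version of that filter is a few lines of code. This sketch uses a two-word blocklist and invented pages; the real corpus checks a long published list of banned words, but the failure mode is the same.

    # Sketch of blocklist filtering of pre-training text: discard any page
    # containing a banned word. Pages and the two-entry list are invented.
    BLOCKLIST = {"fecal", "nipple"}

    pages = [
        "A guide to soothing sore nipples while breastfeeding.",
        "Fecal transplants show promise against gut infections.",
        "A short history of the steam engine.",
    ]

    def keep(page: str) -> bool:
        text = page.lower()
        return not any(banned in text for banned in BLOCKLIST)

    print([p for p in pages if keep(p)])
    # Only the steam-engine page survives; both medical pages are lost,
    # which is how such filters narrow a model's scope.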
A more fine-grained approach has not been attempted at scale, because it can’t easily be automated. Unwanted bias can take the …