Comment

ChatGPT: five priorities for research

Conversational AI is a game-changer for science. Here's how to respond.

Eva A. M. van Dis, Johan Bollen, Robert van Rooij, Willem Zuidema & Claudi L. Bockting

[Image: A chatbot called ChatGPT can help to write text for essays, scientific abstracts and more. Credit: Vitor Miranda/Alamy]

Since a chatbot called ChatGPT was released late last year, it has become apparent that this type of artificial-intelligence (AI) technology will have huge implications for the way in which researchers work.

ChatGPT is a large language model (LLM), a machine-learning system that autonomously learns from data and can produce sophisticated and seemingly intelligent writing after training on a massive data set of text. It is the latest in a series of such models released by OpenAI, an AI company in San Francisco, California, and by other firms. ChatGPT has caused excitement and controversy because it is one of the first models that can convincingly converse with its users in English and other languages on a wide range of topics. It is free, easy to use and continues to learn.

This technology has far-reaching consequences for science and society. Researchers and others have already used ChatGPT and other large language models to write essays and talks, summarize literature, and draft and improve papers, as well as to identify research gaps and write computer code, including statistical analyses. Soon this technology will evolve to the point that it can design experiments, write and complete manuscripts, conduct peer review and support editorial decisions to accept or reject manuscripts.

Conversational AI is likely to revolutionize research practices and publishing, creating both opportunities and concerns. It might accelerate the innovation process, shorten time-to-publication and, by helping people to write fluently, make science more equitable and increase the diversity of scientific perspectives. However, it could also degrade the quality and transparency of research and fundamentally alter our autonomy as human researchers. ChatGPT and other LLMs produce text that is convincing, but often wrong, so their use can distort scientific facts and spread misinformation.

We think that the use of this technology is inevitable; banning it will therefore not work. It is imperative that the research community engage in a debate about the implications of this potentially disruptive technology. Here, we outline five key issues and suggest where to start.

Hold on to human verification

LLMs have been in development for years, but continuous increases in the quality and size of data sets, and sophisticated methods to calibrate these models with human feedback, have suddenly made them much more powerful than before. LLMs will lead to a new generation of search engines [1] that are able to produce detailed and informative answers to complex user questions.

But using conversational AI for specialized research is likely to introduce inaccuracies, bias and plagiarism. We presented ChatGPT with a series of questions and assignments that required an in-depth understanding of the literature and found that it often generated false and misleading text. For example, when we asked 'how many patients with depression experience relapse after treatment?', it generated an overly general text arguing that treatment effects are typically long-lasting. However, numerous high-quality studies show that treatment effects wane and that the risk of relapse ranges from 29% to 51% in the first year after treatment completion [2-4]. Repeating the same query generated a more detailed and accurate answer (see Supplementary information, Figs S1 and S2).

Next, we asked ChatGPT to summarize a systematic review that two of us authored in JAMA Psychiatry [5] on the effectiveness of cognitive behavioural therapy (CBT) for anxiety-related disorders. ChatGPT fabricated a convincing response that contained several factual errors, misrepresentations and wrong data (see Supplementary information, Fig. S3). For example, it said the review was based on 46 studies (it was actually based on 69) and, more worryingly, it exaggerated the effectiveness of CBT.

Such errors could be due to an absence of the relevant articles in ChatGPT's training set, a failure to distil the relevant information or an inability to distinguish between credible and less-credible sources. It seems that the same biases that often lead humans astray, such as availability, selection and confirmation biases, are reproduced and often even amplified in conversational AI [6].

Researchers who use ChatGPT risk being misled by false or biased information, and incorporating it into their thinking and papers. Inattentive reviewers might be hoodwinked into accepting an AI-written paper by its beautiful, authoritative prose owing to the halo effect, a tendency to over-generalize from a few salient positive impressions [7]. And, because this technology typically reproduces text without reliably citing the original sources or authors, researchers using it are at risk of not giving credit to earlier work, unwittingly plagiarizing a multitude of unknown texts and perhaps even giving away their own ideas. Information that researchers reveal to ChatGPT and other LLMs might be incorporated into the model, which the chatbot could serve up to others with no acknowledgement of the original source.

Assuming that researchers use LLMs in their work, scholars need to remain vigilant. Expert-driven fact-checking and verification processes will be indispensable. Even when LLMs are able to accurately expedite summaries, evaluations and reviews, high-quality journals might decide to include a human verification step or even ban certain applications that use this technology. To prevent human automation bias (an over-reliance on automated systems), it will become even more crucial to emphasize the importance of accountability [8]. We think that humans should always remain accountable for scientific practice.

Develop rules for accountability

Tools are already available to predict the likelihood that a text originates from machines or humans. Such tools could be useful for detecting the inevitable use of LLMs to manufacture content by paper mills and predatory journals, but such detection methods are likely to be circumvented by evolved AI technologies and clever prompts. Rather than engage in a futile arms race between AI chatbots and AI-chatbot detectors, we think the research community and publishers should work out how to use LLMs with integrity, transparency and honesty.
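
Many such detectors rest on a simple statistical idea: text sampled from a language model tends to be less 'surprising' to a similar model than human prose is. The Python sketch below is an illustration of that idea only, not the method of any particular detection tool; it scores a passage by its perplexity under the openly available GPT-2 model from the Hugging Face transformers library, and the short sample passage is our own invented example.

```python
# Illustrative sketch of perplexity-based machine-text scoring.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token surprise of `text` under GPT-2."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing the inputs as labels yields the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Hypothetical sample passage, for demonstration only.
sample = "The risk of relapse after treatment for depression is substantial."
# Lower perplexity is weakly suggestive of machine generation;
# no fixed cutoff is reliable in practice.
print(f"perplexity = {perplexity(sample):.1f}")
```

As the sketch's comments note, such scores are only weak evidence, which is one reason we expect detection to lose the arms race described above.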

Author-contribution statements and acknowledgements in research papers should state clearly and specifically whether, and to what extent, the authors used AI technologies such as ChatGPT in the preparation of their manuscript and analysis. They should also indicate which LLMs were used. This will alert editors and reviewers to scrutinize manuscripts more carefully for potential biases, inaccuracies and improper source crediting. Likewise, scientific journals should be transparent about their use of LLMs, for example when selecting submitted manuscripts.

Research institutions, publishers and funders should adopt explicit policies that raise awareness of, and demand transparency about, the use of conversational AI in the preparation of all materials that might become part of the published record. Publishers could request author certification that such policies were followed.

For now, LLMs should not be authors of manuscripts because they cannot be held accountable for their work. But it might be increasingly difficult for researchers to pinpoint the exact role of LLMs in their studies. In some cases, technologies such as ChatGPT might generate significant portions of a manuscript in response to an author's prompts. In others, the authors might have gone through many cycles of revisions and improvements using the AI as a grammar- or spellchecker, but not have used it to author the text. In the future, LLMs are likely to be incorporated into text-processing and editing tools, search engines and programming tools. They might therefore contribute to scientific work without authors necessarily being aware of the nature or magnitude of the contributions. This defies today's binary definitions of authorship, plagiarism and sources, in which someone is either an author or not, and a source has either been used or not. Policies will have to adapt, but full transparency will always be key.

Inventions devised by AI are already causing a fundamental rethink of patent law [9], and lawsuits have been filed over the copyright of code and images that are used to train AI, as well as over those generated by AI (see go.nature.com/3y4aery). In the case of AI-written or AI-assisted manuscripts, the research and legal community will also need to work out who holds the rights to the texts. Is it the individual who wrote the text with which the AI system was trained, the corporations who produced the AI, or the scientists who used the system to guide their writing? Again, definitions of authorship must be considered and defined.

Invest in truly open LLMs

Currently, nearly all state-of-the-art conversational AI technologies are proprietary products of a small number of big technology companies that have the resources for AI development. OpenAI is funded largely by Microsoft, and other major tech firms are racing to release similar tools. Given the near-monopolies of a few tech companies in search, word processing and information access, this raises considerable ethical concerns.

One of the most immediate issues for the research community is the lack of transparency. The underlying training sets and LLMs for ChatGPT and its predecessors are not publicly available, and tech companies might conceal the inner workings of their conversational AIs. This goes against the move towards transparency and open science, and makes it hard to uncover the origin of, or gaps in, chatbots' knowledge [10]. For example, we prompted ChatGPT to explain the work of several researchers. In some instances, it produced detailed accounts of scientists who could be considered less influential on the basis of their h-index (a way of measuring the impact of their work). Although it succeeded for a group of researchers with an h-index of around 20, it failed to generate any information at all on the work of several highly cited and renowned scientists, even those with an h-index of more than 80.
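
For readers unfamiliar with the metric: a researcher has an h-index of h if h of their papers have each been cited at least h times. A minimal sketch of the computation, using made-up citation counts:

```python
def h_index(citations: list[int]) -> int:
    """A researcher has index h if h of their papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

# Hypothetical citation counts, for illustration only.
print(h_index([50, 30, 22, 20, 19, 8, 4, 4, 1]))  # prints 6
```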

To counter this opacity, the development and implementation of open-source AI technology should be prioritized. Non-commercial organizations such as universities typically lack the computational and financial resources needed to keep up with the rapid pace of LLM development. We therefore advocate that scientific-funding organizations, universities, non-governmental organizations (NGOs), government research facilities and organizations such as the United Nations, as well as tech giants, make considerable investments in independent non-profit projects. This will help to develop advanced open-source, transparent and democratically controlled AI technologies.

Critics might say that such collaborations will be unable to rival big tech, but at least one mainly academic collaboration, BigScience, has already built an open-source language model, called BLOOM. Tech companies might benefit from such a programme by open-sourcing relevant parts of their models and corpora in the hope of creating greater community involvement, facilitating innovation and reliability. Academic publishers should ensure that LLMs have access to their full archives so that the models produce results that are accurate and comprehensive.

Embrace the benefits of AI

As the workload and competition in academia increase, so does the pressure to use conversational AI. Chatbots provide opportunities to complete tasks quickly, from PhD students striving to finalize their dissertation, to researchers needing a quick literature review for their grant proposal, to peer reviewers under time pressure to submit their analysis.

If AI chatbots can help with these tasks, results can be published faster, freeing academics up to focus on new experimental designs. This could significantly accelerate innovation and potentially lead to breakthroughs across many disciplines. We think this technology has enormous potential, provided that the current teething problems related to bias, provenance and inaccuracies are ironed out. It is important to examine and advance the validity and reliability of LLMs, so that researchers know how to use the technology judiciously for specific research practices.
Some argue that because chatbots merely learn statistical associations between words in their training set, rather than understand their meanings, LLMs will only ever be able to recall and synthesize what people have already done, and will not exhibit human aspects of the scientific process, such as creative and conceptual thought. We argue that this is a premature assumption, and that future AI tools might be able to master aspects of the scientific process that seem out of reach today. In a seminal 1991 paper, researchers wrote that "intelligent partnerships" between people and intelligent technology can outperform the intellectual ability of people alone [11]. These intelligent partnerships could exceed human abilities and accelerate innovation to previously unthinkable levels. The question is how far can, and should, automation go?

AI technology might rebalance the academic skill set. On the one hand, AI could optimize academic training, for example by providing feedback to improve student writing and reasoning skills. On the other hand, it might reduce the need for certain skills, such as the ability to perform a literature search. It might also introduce new skills, such as prompt engineering: the process of designing and crafting the text that is used to prompt conversational AI models (a sketch follows below). The loss of certain skills might not necessarily be problematic (for example, most researchers do not perform statistical analyses by hand any more), but as a community we need to carefully consider which academic skills and characteristics remain essential to researchers.
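
To make the idea of prompt engineering concrete, here is a minimal, purely illustrative sketch. The template wording and the build_review_prompt helper are our own hypothetical examples, not part of any tool or model's interface; the same structure could be pasted into any conversational AI.

```python
# Hypothetical illustration of prompt engineering: the same request,
# restructured to constrain scope, format and sourcing behaviour.
def build_review_prompt(topic: str, n_sentences: int = 5) -> str:
    """Assemble a structured prompt for a literature-style summary."""
    return (
        f"You are assisting with an academic literature review on {topic}. "
        f"Summarize the main findings in at most {n_sentences} sentences. "
        "Only state claims you can attribute to a named, verifiable source, "
        "and say 'I do not know' when you cannot."
    )

naive_prompt = "Tell me about relapse after depression treatment."
engineered = build_review_prompt("relapse rates after treatment for depression")
print(engineered)
```

Even a carefully engineered prompt does not guarantee factual output, which is why we argue above that human verification must remain in place.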

If we care only about performance, people's contributions might become more limited and obscure as AI technology advances. In the future, AI chatbots might generate hypotheses, develop methodology, create experiments [12], analyse and interpret data and write manuscripts. In place of human editors and reviewers, AI chatbots could evaluate and review the articles, too. Although we are still some way from this scenario, there is no doubt that conversational AI technology will increasingly affect all stages of the scientific publishing process.

Therefore, it is imperative that scholars, including ethicists, debate the trade-off between the use of AI creating a potential acceleration in knowledge generation and the loss of human potential and autonomy in the research process. People's creativity and originality, education, training and productive interactions with other people will probably remain essential for conducting relevant and innovative research.

Widen the debate

Given the disruptive potential of LLMs, the research community needs to organize an urgent and wide-ranging debate. First, we recommend that every research group immediately has a meeting to discuss and try ChatGPT for themselves (if they haven't already). And educators should talk about its use and ethics with undergraduate students. During this early phase, in the absence of any external rules, it is important for responsible group leaders and teachers to determine how to use it with honesty, integrity and transparency, and to agree on some rules of engagement. All contributors to research should be reminded that they will be held accountable for their work, whether it was generated with ChatGPT or not. Every author should be responsible for carefully fact-checking their text, results, data, code and references.

Second, we call for an immediate, continuing international forum on the development and responsible use of LLMs for research. As an initial step, we suggest a summit for relevant stakeholders, including scientists of different disciplines, technology companies, big research funders, science academies, publishers, NGOs, and privacy and legal specialists. Similar summits have been organized to discuss and develop guidelines in response to other disruptive technologies, such as human gene editing. Ideally, this discussion should result in quick, concrete recommendations and policies for all relevant parties. We present a non-exhaustive list of questions that could be discussed at this forum (see 'Questions for debate').

Questions for debate
Issues for discussion at a forum about conversational AIs.
• Which research tasks should or should not be outsourced to large language models (LLMs)?
• Which academic skills and characteristics remain essential to researchers?
• What steps in an AI-assisted research process require human verification?
• How should research integrity and other policies be changed to address LLMs?
• How should LLMs be incorporated into the education and training of researchers?
• How can researchers and funders aid the development of independent open-source LLMs and ensure that the models represent scientific knowledge accurately?
• What quality standards should be expected of LLMs (for example, transparency, accuracy, bias and source crediting), and which stakeholders are responsible for the standards as well as for the LLMs?
• How can researchers ensure that LLMs promote equity in research and avoid risks of widening inequities?
• How should LLMs be used to enhance principles of open science?
• What legal implications do LLMs have for scientific practice (for example, laws and regulations related to patents, copyright and ownership)?

One key issue to address is the implications for diversity and inequalities in research. LLMs could be a double-edged sword. They could help to level the playing field, for example by removing language barriers and enabling more people to write high-quality text. But the likelihood is that, as with most innovations, high-income countries and privileged researchers will quickly find ways to exploit LLMs in ways that accelerate their own research and widen inequalities. Therefore, it is important that debates include people from under-represented groups in research and from communities affected by the research, to use people's lived experiences as an important resource.

Science, similar to many other domains of society, now faces a reckoning induced by AI technology infringing on its most dearly held values, practices and standards. The focus should be on embracing the opportunity and managing the risks. We are confident that science will find a way to benefit from conversational AI without losing the many important aspects that render scientific work one of the most profound and gratifying enterprises: curiosity, imagination and discovery.

The authors

Claudi L. Bockting is a professor of clinical psychology in the Department of Psychiatry at Amsterdam UMC, University of Amsterdam, the Netherlands, and co-director of the Centre for Urban Mental Health at the Institute for Advanced Study, University of Amsterdam. A full list of author affiliations for Eva A. M. van Dis, Johan Bollen, Robert van Rooij and Willem Zuidema, together with the Supplementary information, accompanies this Comment online (see go.nature.com/3ww5fyz).
e-mail: c.l.bockting@amsterdamumc.nl

References
1. Grant, N. & Metz, C. The New York Times (21 December 2022).
2. Jelovac, A., Kolshus, E. & McLoughlin, D. M. Neuropsychopharmacol. 38, 2467-2474 (2013).
3. Kato, M. et al. Mol. Psychiatry 26, 118-133 (2021).
4. Vittengl, J. R., Clark, L. A., Dunn, T. W. & Jarrett, R. B. J. Consult. Clin. Psychol. 75, 475-488 (2007).
5. van Dis, E. A. M. et al. JAMA Psychiatry 77, 265-273 (2020).
6. Rich, A. S. & Gureckis, T. M. Nature Mach. Intell. 1, 174-180 (2019).
7. Thorndike, E. L. J. Appl. Psychol. 4, 25-29 (1920).
8. Skitka, L. J., Mosier, K. & Burdick, M. D. Int. J. Human-Comp. Stud. 52, 701-717 (2000).
9. George, A. & Walsh, T. Nature 605, 616-618 (2022).
10. Rudin, C. Nature Mach. Intell. 1, 206-215 (2019).
11. Salomon, G., Perkins, D. N. & Globerson, T. Edu. Res. 20, 2-9 (1991).
12. Melnikov, A. A. et al. Proc. Natl Acad. Sci. USA 115, 1221-1226 (2018).

The authors declare no competing interests.
