Dual Use of Artificial-Intelligence-Powered Drug Discovery: Comment

comment
Dual use of artificial-intelligence-powered drug discovery

An international security conference explored how artificial intelligence (AI) technologies for drug discovery could be misused for de novo design of biochemical weapons. A thought experiment evolved into a computational proof.
Fabio Urbina, Filippa Lentzos, Cédric Invernizzi and Sean Ekins

The Swiss Federal Institute for NBC (nuclear, biological and chemical) Protection, Spiez Laboratory, convenes the ‘convergence’ conference series¹ set up by the Swiss government to identify developments in chemistry, biology and enabling technologies that may have implications for the Chemical and Biological Weapons Conventions. Meeting every two years, the conferences bring together an international group of scientific and disarmament experts to explore the current state of the art in the chemical and biological fields and their trajectories, to think through potential security implications and to consider how these implications can most effectively be managed internationally. The meeting convenes for three days of discussion on the possibilities of harm, should the intent be there, from cutting-edge chemical and biological technologies. Our drug discovery company received an invitation to contribute a presentation on how AI technologies for drug discovery could potentially be misused.

Risk of misuse

The thought had never previously struck us. We were vaguely aware of security concerns around work with pathogens or toxic chemicals, but that did not relate to us; we primarily operate in a virtual setting. Our work is rooted in building machine learning models for therapeutic and toxic targets to better assist in the design of new molecules for drug discovery. We have spent decades using computers and AI to improve human health, not to degrade it. We were naive in thinking about the potential misuse of our trade, as our aim had always been to avoid molecular features that could interfere with the many different classes of proteins essential to human life. Even our projects on Ebola and neurotoxins, which could have sparked thoughts about the potential negative implications of our machine learning models, had not set our alarm bells ringing.

Our company, Collaborations Pharmaceuticals, Inc., had recently published computational machine learning models for toxicity prediction in different areas, and, in developing our presentation to the Spiez meeting, we opted to explore how AI could be used to design toxic molecules. It was a thought exercise we had not considered before that ultimately evolved into a computational proof of concept for making biochemical weapons.

Generation of new toxic molecules

We had previously designed a commercial de novo molecule generator that we called MegaSyn², which is guided by machine learning model predictions of bioactivity for the purpose of finding new therapeutic inhibitors of targets for human diseases. This generative model normally penalizes predicted toxicity and rewards predicted target activity. We simply proposed to invert this logic by using the same approach to design molecules de novo, but now guiding the model to reward both toxicity and bioactivity instead. We trained the AI with molecules from a public database using a collection of primarily drug-like molecules (that are synthesizable and likely to be absorbed) and their bioactivities. We opted to score the designed molecules with an organism-specific lethal dose (LD50) model³ and a specific model using data from the same public database that would ordinarily be used to help derive compounds for the treatment of neurological diseases (details of the approach are withheld but were available during the review process). The underlying generative software is built on, and similar to, other open-source software that is readily available⁴. To narrow the universe of molecules, we chose to drive the generative model towards compounds such as the nerve agent VX, one of the most toxic chemical warfare agents developed during the twentieth century: a few salt-sized grains of VX (6–10 mg)⁵ are sufficient to kill a person. Other nerve agents with the same mechanism, such as the Novichoks, have also been in the headlines recently and used in poisonings in the UK and elsewhere⁶.

In less than 6 hours on our in-house server, our model generated 40,000 molecules that scored within our desired threshold. In the process, the AI designed not only VX, but also many other known chemical warfare agents that we identified through visual confirmation with structures in public chemistry databases. Many new molecules were also designed that looked equally plausible. These new molecules were predicted to be more toxic, based on the predicted LD50 values, than publicly known chemical warfare agents (Fig. 1). This was unexpected because the datasets we used for training the AI did not include these nerve agents. The virtual molecules even occupied a region of molecular property space that was entirely separate from the many thousands of molecules in the organism-specific LD50 model, which comprises mainly pesticides, environmental toxins and drugs (Fig. 1). By inverting the use of our machine learning models, we had transformed our innocuous generative model from a helpful tool of medicine to a generator of likely deadly molecules.

[Fig. 1 image: t-SNE scatter plot (axes t-SNE 1 and t-SNE 2) of the generated compounds, VX and the LD50 dataset; histogram of counts versus predicted LD50 (mg kg⁻¹) with the VX LD50 cut-off; 2D chemical structure of VX.]
Fig. 1 | A t-SNE plot visualization of the LD50 dataset and top 2,000 MegaSyn AI-generated and predicted toxic molecules illustrating VX. Many of the molecules generated are predicted to be more toxic in vivo in the animal model than VX (histogram at right shows cut-off for VX LD50). The 2D chemical structure of VX is shown on the right.

Our toxicity models were originally created for use in avoiding toxicity, enabling us to better virtually screen molecules (for pharmaceutical and consumer product applications) before ultimately confirming their toxicity through in vitro testing. The inverse, however, has always been true: the better we can predict toxicity, the better we can steer our generative model to design new molecules in a region of chemical space populated by predominantly lethal molecules. We did not assess the virtual molecules for synthesizability or explore how to make them with retrosynthesis software. For both of these processes, commercial and open-source software is readily available that can be easily plugged into the de novo design process of new molecules⁷. We also did not physically synthesize any of the molecules, but with a global array of hundreds of commercial companies offering chemical synthesis, that is not necessarily a very big step, and this area is poorly regulated, with few if any checks to prevent the synthesis of new, extremely toxic agents that could potentially be used as chemical weapons. Importantly, we had a human in the loop with a firm moral and ethical ‘don’t-go-there’ voice to intervene. But what if the human were removed or replaced with a bad actor? With current breakthroughs and research into autonomous synthesis⁸, a complete design–make–test cycle applicable to making not only drugs, but toxins, is within reach. Our proof of concept thus highlights that a nonhuman autonomous creator of a deadly chemical weapon is entirely feasible.

A wake-up call

Without being overly alarmist, this should serve as a wake-up call for our colleagues in the ‘AI in drug discovery’ community. Although some domain expertise in chemistry or toxicology is still required to generate toxic substances or biological agents that can cause significant harm, when these fields intersect with machine learning models, where all you need is the ability to code and to understand the output of the models themselves, they dramatically lower technical thresholds. Open-source machine learning software is the primary route for learning and creating new models like ours, and toxicity datasets⁹ that provide a baseline model for predictions for a range of targets related to human health are readily available.

Our proof of concept was focused on VX-like compounds, but it is equally applicable to other toxic small molecules with similar or different mechanisms, with minimal adjustments to our protocol. Retrosynthesis software tools are also improving in parallel, allowing new synthesis routes to be investigated for known and unknown molecules. It is therefore entirely possible that novel routes can be predicted for chemical warfare agents, circumventing national and international lists of watched or controlled precursor chemicals for known synthesis routes.

The reality is that this is not science fiction. We are but one very small company in a universe of many hundreds of companies using AI software for drug discovery and de novo design. How many of them have even considered the possibilities of repurposing or misuse? Most will work on small molecules, and many of the companies are very well funded and likely using the global chemistry network to make their AI-designed molecules. How many people have the know-how to find the pockets of chemical space that can be filled with molecules predicted to be orders of magnitude more toxic than VX? We do not currently have answers to these questions.

There has not previously been significant discussion in the scientific community about this dual-use concern around the application of AI for de novo molecule design, at least not publicly. Discussion of societal impacts of AI has principally focused on aspects such as safety, privacy, discrimination and potential criminal misuse¹⁰, but not on national and international security. When we think of drug discovery, we normally do not consider technology misuse potential. We are not trained to consider it, and it is not even required for machine learning research, but we can now share our experience with other companies and individuals. AI generative machine learning tools are equally applicable to larger molecules (peptides, macrolactones, etc.) and to other industries, such as consumer products and agrochemicals, that also have interests in designing and making new molecules with specific physicochemical and biological properties. This greatly increases the breadth of the potential audience that should be paying attention to these concerns.

For us, the genie is out of the medicine bottle when it comes to repurposing our machine learning. We must now ask: what are the implications? Our own commercial tools, as well as open-source software tools and many datasets that populate public databases, are available with no oversight.

If the threat of harm, or actual harm, occurs with ties back to machine learning, what impact will this have on how this technology is perceived? Will hype in the press on AI-designed drugs suddenly flip to concern about AI-designed toxins, public shaming and decreased investment in these technologies? As a field, we should open a conversation on this topic. The reputational risk is substantial: it only takes one bad apple, such as an adversarial state or other actor looking for a technological edge, to cause actual harm by taking what we have vaguely described to the next logical step.

How do we prevent this? Can we lock away all the tools and throw away the key? Do we monitor software downloads or restrict sales to certain groups? We could follow the example set with machine learning models like GPT-3¹¹, which was initially waitlist-restricted to prevent abuse and has an API for public usage. Even today, without a waitlist, GPT-3 has safeguards in place to prevent abuse: Content Guidelines, a free content filter and monitoring of applications that use GPT-3 for abuse. We know of no recent toxicity or target model publications that similarly discuss such dual-use concerns. As responsible scientists, we need to ensure that misuse of AI is prevented, and that the tools and models we develop are used only for good.

By going as close as we dared, we have still crossed a grey moral boundary, demonstrating that it is possible to design potentially toxic virtual molecules without much in the way of effort, time or computational resources. We can easily erase the thousands of molecules we created, but we cannot delete the knowledge of how to recreate them.

Broader effects on society

There is a need for discussions across traditional boundaries and multiple disciplines to allow for a fresh look at AI for de novo design and related technologies from different perspectives and with a wide variety of mindsets. Here, we give some recommendations that we believe will reduce potential dual-use concerns for AI in drug discovery. Scientific conferences, such as those of the Society of Toxicology and the American Chemical Society, should actively foster a dialogue among experts from industry, academia and policy making on the implications of our computational tools. There has been recent discussion in this journal regarding requirements for broader impact statements from authors submitting to conferences, institutional review boards and funding bodies as well as addressing potential challenges¹². Making increased visibility a continuous effort and a key priority would greatly assist in raising awareness about potential dual-use aspects of cutting-edge technologies and would generate the outreach necessary to have everyone active in our field engage in responsible science.

We can take inspiration from examples such as The Hague Ethical Guidelines¹³, which promote a culture of responsible conduct in the chemical sciences and guard against the misuse of chemistry, in order to have AI-focused drug discovery, pharmaceutical and possibly other companies agree to a code of conduct to train employees, secure their technology, and prevent access and potential misuse. The use of a public-facing API for models, with code and data available upon request, would greatly enhance security and control over how published models are utilized without adding much hindrance to accessibility. Although MegaSyn is a commercial product and thus we have control over who has access to it, going forward, we will implement restrictions or an API for any forward-facing models. A reporting structure or hotline to authorities, for use if there is a lapse or if we become aware of anyone working on developing toxic molecules for non-therapeutic uses, may also be valuable. Finally, universities should redouble their efforts toward the ethical training of science students and broaden the scope to other disciplines, and particularly to computing students, so that they are aware of the potential for misuse of AI from an early stage of their career, as well as understanding the potential for broader impact¹².

We hope that by raising awareness of this technology, we will have gone some way toward demonstrating that although AI can have important applications in healthcare and other industries, we should also remain vigilant against the potential for dual use, in the same way that we would with physical resources such as molecules or biologics.

Fabio Urbina¹, Filippa Lentzos², Cédric Invernizzi³ and Sean Ekins¹ ✉
¹Collaborations Pharmaceuticals, Inc., Raleigh, NC, USA. ²Department of War Studies and Department of Global Health & Social Medicine, King’s College London, London, UK. ³Spiez Laboratory, Federal Department of Defence, Civil Protection and Sports, Spiez, Switzerland.
✉e-mail: sean@collaborationspharma.com

Published online: 7 March 2022
https://doi.org/10.1038/s42256-022-00465-9
Nature Machine Intelligence | VOL 4 | March 2022 | 189–191 | www.nature.com/natmachintell 189

References
1. Spiez Convergence Conference https://www.spiezlab.admin.ch/en/home/meta/refconvergence.html (2021).
2. Urbina, F., Lowden, C. T., Culberson, J. C. & Ekins, S. https://doi.org/10.33774/chemrxiv-2021-nlwvs (2021).
3. Mansouri, K. et al. Environ. Health Perspect. 129, 047013 (2021).
4. Blaschke, T. et al. J. Chem. Inf. Model. 60, 5918–5922 (2020).
5. National Research Council Committee on Toxicology. https://www.ncbi.nlm.nih.gov/books/NBK233724/ (National Academies Press, 1997).
6. Aroniadou-Anderjaska, V., Apland, J. P., Figueiredo, T. H., De Araujo Furtado, M. & Braga, M. F. Neuropharmacology 181, 108298 (2020).
7. Genheden, S. et al. J. Cheminform. 12, 70 (2020).
8. Coley, C. W. et al. Science 365, eaax1566 (2019).
9. Dix, D. J. et al. Toxicol. Sci. 95, 5–12 (2007).
10. Hutson, M. The New Yorker https://www.newyorker.com/tech/annals-of-technology/who-should-stop-unethical-ai (2021).
11. Brown, T. B. et al. Preprint at arXiv https://arxiv.org/abs/2005.14165 (2020).
12. Prunkl, C. E. A. et al. Nat. Mach. Intell. 3, 104–110 (2021).
13. Organisation for the Prohibition of Chemical Weapons. The Hague Ethical Guidelines https://www.opcw.org/hague-ethical-guidelines (2021).

Acknowledgements
We are grateful to the organizers and participants of the Spiez Convergence conference 2021 for their feedback and questions. C.I. contributed to this article in his personal capacity. The views expressed in this article are those of the authors only and do not necessarily represent the position or opinion of Spiez Laboratory or the Swiss Government.
We kindly acknowledge US National Institutes of Health funding under grant R44GM122196-02A1 from the National Institute of General Medical Sciences and 1R43ES031038-01 and 1R43ES033855-01 from the National Institute of Environmental Health Sciences for our machine learning software development and applications. Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under grants R43ES031038 and 1R43ES033855-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests
F.U. and S.E. work for Collaborations Pharmaceuticals, Inc. F.L. and C.I. have no conflicts of interest.

Additional information
Peer review information Nature Machine Intelligence thanks Gisbert Schneider and Carina Prunkl for their contribution to the peer review of this work.